DBFS_HS – Database Hierarchical Storage, Part 2
In part 1 we set up a hierarchical store under the user hstest, created a table with SecureFile LOBs, and archived 30 SecureFiles. So what’s going on under the covers and how do we retrieve the data? Is the data even really out of the database? In this article I will try to answer these questions as we continue to explore our hierarchical store.
How it all works
At the simplest level the hierarchical store provides various APIs to accomplish a put, a get, and some informational queries. Intermixed in these functions are the ability to create “links” to archived data such that a query of an archived SecureFile LOB that is no longer in database storage will be retrieved without application intervention (i.e. application transparency). The mechanism of a put involves moving the SecureFile LOB to a cached staging area (i.e. the cache table) until it can be archived out to the store’s defined storage media. Once archived it may be eligible to be “flushed” from the cache table in the database and therefore no longer reside in database storage. Once SecureFile LOB’s are moved to a store cache, or really staging, table they are eventually written to the store’s storage media in batches or “tarballs”. The size of these tarballs is controlled by a property defined for the store in the DBFS_HS$_STOREPROPERTIES table. If a “get” is issued and the SecureFile LOB is in the cache table then it can be retrieved directly. If not then the SecureFile LOB must be recalled from the tarball and written back into the SecureFile LOB’s row in the cache table.
This raises a question about storage. Ideally you would expect to see the cache table steady state at some fairly constant size as the rate of archiving and retrieval evens out. But it seems pretty obvious that the cache table is a good candidate for a dedicated tablespace. Since its space usage is not necessarily predictable it would be nice to keep it separate from other objects so that it’s size could be capped and not allowed to grow uncontrollably. It’s not clear if the cache table could be partitioned, but this would probably only be practical if it was very large. The partitioning scheme would also need to be set in a way that new and old SecureFile LOBs were being read and written to from each partition in order to insure that space was reused. The cache table also seems to be a good candidate for a second tier of storage. If the source table(s) are on a primary storage tier then clearly the cache table is not going to need to meet the same performance criteria, otherwise why archive? Since the actual store will be on a media outside the database this raises the possibility of having at least three storage tiers.
Database Objects
Next let’s look at what objects have been created. The following SYS dictionary tables exist to support dbms_dbfs:
- DBFS_HS$_BACKUPFILETABLE
- DBFS_HS$_CONTENTFILEMAPTBL
- DBFS_HS$_FS
- DBFS_HS$_PROPERTY
- DBFS_HS$_SFLOCATORTABLE
- DBFS_HS$_STORECOMMANDS
- DBFS_HS$_STOREID2POLICYCTX
- DBFS_HS$_STOREIDTABLE
- DBFS_HS$_STOREPROPERTIES
- DBFS_SFS$_FS
- DBFS_SFS$_FST
- DBFS_SFS$_FSTO
- DBFS_SFS$_FSTP
- DBFS_SFS$_SNAP
- DBFS_SFS$_TAB
- DBFS_SFS$_VOL
- DBFS$_MOUNTS
- DBFS$_STATS
- DBFS$_STORES
Additionally for each store created the following tables are created in the SYS schema:
- HS$AT<dbname><seq>
- HS$CT<dbname><seq>
- HS$DB<dbname><seq>
- HS$DT<dbname><seq>
- HS$JT<dbname><seq>
- HS$TT<dbname><seq>
When a store is created running the dbms_dbfs_hs.createStore procedure a table is created from the “tblname” input parameter. This table is used as a metadata table and stores the SecureFile LOB in the “filedata” column. A second table is also created with the name in the format of SFS$_FSTP_<seq>. This table holds more information about how the SecureFile LOB is archived and provides the gateway into the SYS metadata tables.
A view is also created with the name in the format of VSFS$_<seq> which exposes store based metadata about the archived SecureFile LOB.
In my database the following tables and view have been created:
The following view can be used to provide a mapping between the archived SecureFile LOB and it’s location in the store and cache. A separate view is required for each store due to the use of separate cache and metadata tables for each store.
create or replace force view hs_store_vas select c.pathname AS path, c.item AS name, c.std_guid AS guid, sf2.propvalue AS file_size, dbms_lob.getlength(c.filedata) AS cache_size, sf1.propvalue AS contentid, l.tarballid, b.backupfilename, st.storeowner, st.storename from hstest.hs_cache c, hstest.sfs$_fstp_12 sf1, hstest.sfs$_fstp_12 sf2, sys.dbfs_hs$_contentfnmaptbl m, sys.dbfs_hs$_sflocatortable l, sys.dbfs_hs$_backupfiletable b, sys.dbfs_hs$_storeidtable st where c.std_guid = sf1.std_guid and c.pathtype = 1 and sf1.propname = 'HS:CONTENTID' and sf1.propvalue = m.contentfilename(+) and sf1.std_guid = sf2.std_guid and sf2.propname = 'HS:FILESIZE' and m.archiverefid = l.archiverefid(+) and l.tarballid = b.tarballid(+) and b.storeid = st.storeid(+);
Usage Examples
Now that everything is setup let’s walk through some examples of how DBFS HS works. First let’s look at what we have. In part 1 of this series I archived 30 SecureFile LOBs and that created one “tarball” operating system file:
380:oracle@testvm:/ora_src/bkup [db11g] $ ls -lrt total 8212 -rw-r--r-- 1 oracle oinstall 2136 Jun 6 23:46 Oracle_Disk_SBT_Catalog -rw-r--r-- 1 oracle oinstall 8388616 Jun 6 23:46 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 381:oracle@testvm:/ora_src/bkup [db11g]
Now we’ll archive 15 more SecureFile LOBs, but we’ll just start at the beginning to show that if the SecureFile LOB has already been archived then re-archiving it does not invoke an error:
SQL> SET SERVEROUTPUT ON;
DECLARE
v_mountpoint VARCHAR2(32) := '/hs_store/';
v_LOB BLOB;
v_LOBsize NUMBER;
v_debug CHAR(1) := 'N';
BEGIN
FOR arch_rec IN
(
SELECT hs.* FROM hs_lobtable hs
where dbms_lob.getlength(hs.lobcol) != 0
and rownum <= 45 ) LOOP BEGIN SELECT hs.lobcol INTO v_LOB FROM hs_lobtable hs WHERE hs.id = arch_rec.id FOR UPDATE; -- v_LOBsize := SYS.DBMS_LOB.GETLENGTH(v_LOB); DBMS_OUTPUT.PUT_LINE('Archive ID: ' || arch_rec.id); DBMS_OUTPUT.PUT_LINE('LOB size: ' || v_LOBsize); -- -- Write the LOB to the store -- DBMS_LOB.MOVE_TO_DBFS_LINK ( lob_loc => v_LOB,
storage_path => v_mountpoint || arch_rec.id,
flags => DBMS_LOB.DBFS_LINK_NOCACHE);
--
COMMIT;
EXCEPTION
WHEN OTHERS THEN
ROLLBACK;
DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
END;
END LOOP;
END;
/
SQL> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39
Archive ID: 1
LOB size: 16384
Archive ID: 2
LOB size: 16384
Archive ID: 3
LOB size: 16384
Archive ID: 4
LOB size: 16384
Archive ID: 5
LOB size: 16384
Archive ID: 6
LOB size: 16384
Archive ID: 7
LOB size: 16384
Archive ID: 8
LOB size: 16384
Archive ID: 9
LOB size: 16384
Archive ID: 10
LOB size: 16384
Archive ID: 11
LOB size: 16384
Archive ID: 12
LOB size: 16384
Archive ID: 13
LOB size: 16384
Archive ID: 14
LOB size: 16384
Archive ID: 15
LOB size: 16384
Archive ID: 16
LOB size: 16384
Archive ID: 17
LOB size: 16384
Archive ID: 18
LOB size: 16384
Archive ID: 19
LOB size: 16384
Archive ID: 20
LOB size: 16384
Archive ID: 21
LOB size: 16384
Archive ID: 22
LOB size: 16384
Archive ID: 23
LOB size: 16384
Archive ID: 24
LOB size: 16384
Archive ID: 25
LOB size: 16384
Archive ID: 26
LOB size: 16384
Archive ID: 27
LOB size: 16384
Archive ID: 28
LOB size: 16384
Archive ID: 29
LOB size: 16384
Archive ID: 30
LOB size: 16384
Archive ID: 31
LOB size: 16384
Archive ID: 32
LOB size: 16384
Archive ID: 33
LOB size: 16384
Archive ID: 34
LOB size: 16384
Archive ID: 35
LOB size: 16384
Archive ID: 36
LOB size: 16384
Archive ID: 37
LOB size: 16384
Archive ID: 38
LOB size: 16384
Archive ID: 39
LOB size: 16384
Archive ID: 40
LOB size: 16384
Archive ID: 41
LOB size: 16384
Archive ID: 42
LOB size: 16384
Archive ID: 43
LOB size: 16384
Archive ID: 44
LOB size: 16384
Archive ID: 45
LOB size: 16384
PL/SQL procedure successfully completed.
SQL>
Now let’s run a store push and see if we’ve create another archive file:
SQL> DECLARE store_name varchar2(32) := 'HS_STORE'; BEGIN dbms_dbfs_hs.storePush(store_name) ; commit ; END; / 2 3 4 5 6 7 PL/SQL procedure successfully completed. SQL> !ls -lrt total 16420 -rw-r--r-- 1 oracle oinstall 8388616 Jun 6 23:46 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 -rw-r--r-- 1 oracle oinstall 3204 Jun 20 09:17 Oracle_Disk_SBT_Catalog -rw-r--r-- 1 oracle oinstall 8388616 Jun 20 09:17 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 SQL>
OK, we’ve created a new archive file in the OS. Let’s look at what our metadata view says about our store and the cache for the store:
SQL> select name, cache_size, backupfilename from hs_store_v; NAME CACHE_SIZE BACKUPFILENAME --------------- ---------- ------------------------------------------------------------ LOB1 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB10 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB11 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB12 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB13 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB14 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB15 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB16 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB17 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB18 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB19 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB2 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB20 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB21 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB22 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB23 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB24 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB25 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB26 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB27 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB28 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB29 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB3 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB30 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB4 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB5 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB6 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB7 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB8 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 LOB9 0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 31 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 32 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 33 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 34 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 35 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 36 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 37 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 38 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 39 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 40 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 41 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 42 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 43 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 44 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 45 0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 45 rows selected. SQL>
It looks like all 45 of our SecureFile LOBs have been archived to a file and are not currently in the cache. Now let’s see if we can access one of the SecureFile LOBs:
SQL> SET SERVEROUTPUT ON; DECLARE l_LOB BLOB; l_buffer RAW(4000); l_ctr NUMBER := 1; l_amount NUMBER := 100; l_id NUMBER := 45; BEGIN SELECT hs.lobcol INTO l_LOB FROM hs_lobtable hs WHERE hs.id = l_id; -- dbms_lob.read(l_LOB, l_amount, l_ctr, l_buffer); dbms_output.put_line( RAWTOHEX(l_buffer) ); END; /SQL> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 45506C45676E4761545861514265575641797571534A77687A4F79634D5A6E765771484867736C456B634C6845587466566377747A7A4745646A6774 4E7456704E69596857485A6B6146585767704D567275564D5A74417A616F41746C4874784B6A5061 PL/SQL procedure successfully completed. SQL>
We were able to read the first 100 bytes of the SecureFile LOB with name or id of ’45′. If we query our metadata we now see:
SQL> select name, cache_size, backupfilename from hs_store_v where name = '45'; NAME CACHE_SIZE BACKUPFILENAME --------------- ---------- ------------------------------------------------------------ 45 16384 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 SQL>
And now the cache_size shows the size of the SecureFile LOB. For those doubters out there I’ll rename the backup file and we’ll try accessing another SecureFile LOB:
SQL> !ls -lrt total 16420 -rw-r--r-- 1 oracle oinstall 8388616 Jun 6 23:46 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1 -rw-r--r-- 1 oracle oinstall 3204 Jun 20 09:17 Oracle_Disk_SBT_Catalog -rw-r--r-- 1 oracle oinstall 8388616 Jun 20 09:17 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 SQL> !mv __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2.BAK SQL> DECLARE l_LOB BLOB; l_buffer RAW(4000); l_ctr NUMBER := 1; l_amount NUMBER := 100; l_id NUMBER := 44; BEGIN SELECT hs.lobcol INTO l_LOB FROM hs_lobtable hs WHERE hs.id = l_id; -- dbms_lob.read(l_LOB, l_amount, l_ctr, l_buffer); dbms_output.put_line( RAWTOHEX(l_buffer) ); END; / 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 DECLARE * ERROR at line 1: ORA-19507: failed to retrieve sequential file, handle="__DB11G_2_A6248CC964C16A45E040A8C0570111A4_2", parms="" ORA-27029: skgfrtrv: sbtrestore returned error ORA-19511: Error received from media manager layer, error text: sbtpvt_open_input: file /ora_src/bkup/__DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 does not exist or cannot be accessed, errno = 2 ORA-06512: at "SYS.DBMS_APBACKEND", line 60 ORA-06512: at "SYS.DBMS_APBACKEND", line 1378 ORA-06512: at "SYS.DBMS_DBFS_HS", line 1042 ORA-06512: at "SYS.DBMS_DBFS_HS", line 1203 ORA-06512: at "SYS.DBMS_DBFS_HS", line 1481 ORA-06512: at "SYS.DBMS_DBFS_HS", line 1592 ORA-06512: at line 1 ORA-06512: at "SYS.DBMS_DBFS_CONTENT", line 3108 ORA-06512: at "SYS.DBMS_LOB_AM_PRIVATE", line 65 ORA-06512: at line 1 ORA-06512: at "SYS.DBMS_LOB", line 1056 ORA-06512: at line 13
The access failed because the archive file could not be found. This proves that the database is capable of accessing secondary storage to transparently retrieve the SecureFile LOB. Now we’ll rename the file back and see if we can successfully retrieve the SecureFile LOB:
SQL> !mv __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2.BAK __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 SQL> DECLARE l_LOB BLOB; l_buffer RAW(4000); l_ctr NUMBER := 1; l_amount NUMBER := 100; l_id NUMBER := 44; BEGIN SELECT hs.lobcol INTO l_LOB FROM hs_lobtable hs WHERE hs.id = l_id; -- dbms_lob.read(l_LOB, l_amount, l_ctr, l_buffer); dbms_output.put_line( RAWTOHEX(l_buffer) ); END; / 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 6F4A6A796978784F576F54524261634979444256535747644772655952614D4E6646735657675545566B48577762596C43707162426F7975414F5841 594B6242754D6E5948784E546E49724F78664F4968564154504C4966555156757277475077757544 PL/SQL procedure successfully completed. SQL> select name, cache_size, backupfilename from hs_store_v where name = '44'; NAME CACHE_SIZE BACKUPFILENAME --------------- ---------- ------------------------------------------------------------ 44 16384 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 SQL>
And we see that we can indeed access the SecureFile LOB and that it has been retrieved back into the cache.
To wrap this up Oracle has made available a feature that can be used to move closer to an ILM environment. One that can transparently support multiple tiers of storage for SecureFile LOBs.

Entries
June 23rd, 2011 at 7:25 am
Excellent article!
What will hs_lobtable.lobcol contains after the LOB moved to the HS store? An id points to the staging table, or the DBFS Link itself?
And how can we move the LOBs from HS store back to the original table segment in database? Thanks
December 16th, 2011 at 2:37 am
The LOB can be retrieved from the HS using the DBMS_LOB call COPY_FROM_DBFS_LINK.