DBFS_HS – Database Hierarchical Storage, Part 2

In part 1 we set up a hierarchical store under the user hstest, created a table with SecureFile LOBs, and archived 30 SecureFiles. So what’s going on under the covers and how do we retrieve the data? Is the data even really out of the database? In this article I will try to answer these questions as we continue to explore our hierarchical store.

How it all works

At the simplest level the hierarchical store provides various APIs to accomplish a put, a get, and some informational queries. Intermixed in these functions are the ability to create “links” to archived data such that a query of an archived SecureFile LOB that is no longer in database storage will be retrieved without application intervention (i.e. application transparency). The mechanism of a put involves moving the SecureFile LOB to a cached staging area (i.e. the cache table) until it can be archived out to the store’s defined storage media. Once archived it may be eligible to be “flushed” from the cache table in the database and therefore no longer reside in database storage. Once SecureFile LOB’s are moved to a store cache, or really staging, table they are eventually written to the store’s storage media in batches or “tarballs”. The size of these tarballs is controlled by a property defined for the store in the DBFS_HS$_STOREPROPERTIES table. If a “get” is issued and the SecureFile LOB is in the cache table then it can be retrieved directly. If not then the SecureFile LOB must be recalled from the tarball and written back into the SecureFile LOB’s row in the cache table.

This raises a question about storage. Ideally you would expect to see the cache table steady state at some fairly constant size as the rate of archiving and retrieval evens out. But it seems pretty obvious that the cache table is a good candidate for a dedicated tablespace. Since its space usage is not necessarily predictable it would be nice to keep it separate from other objects so that it’s size could be capped and not allowed to grow uncontrollably. It’s not clear if the cache table could be partitioned, but this would probably only be practical if it was very large. The partitioning scheme would also need to be set in a way that new and old SecureFile LOBs were being read and written to from each partition in order to insure that space was reused. The cache table also seems to be a good candidate for a second tier of storage. If the source table(s) are on a primary storage tier then clearly the cache table is not going to need to meet the same performance criteria, otherwise why archive? Since the actual store will be on a media outside the database this raises the possibility of having at least three storage tiers.

Database Objects

Next let’s look at what objects have been created. The following SYS dictionary tables exist to support dbms_dbfs:

  • DBFS_HS$_BACKUPFILETABLE
  • DBFS_HS$_CONTENTFILEMAPTBL
  • DBFS_HS$_FS
  • DBFS_HS$_PROPERTY
  • DBFS_HS$_SFLOCATORTABLE
  • DBFS_HS$_STORECOMMANDS
  • DBFS_HS$_STOREID2POLICYCTX
  • DBFS_HS$_STOREIDTABLE
  • DBFS_HS$_STOREPROPERTIES
  • DBFS_SFS$_FS
  • DBFS_SFS$_FST
  • DBFS_SFS$_FSTO
  • DBFS_SFS$_FSTP
  • DBFS_SFS$_SNAP
  • DBFS_SFS$_TAB
  • DBFS_SFS$_VOL
  • DBFS$_MOUNTS
  • DBFS$_STATS
  • DBFS$_STORES

Additionally for each store created the following tables are created in the SYS schema:

  • HS$AT<dbname><seq>
  • HS$CT<dbname><seq>
  • HS$DB<dbname><seq>
  • HS$DT<dbname><seq>
  • HS$JT<dbname><seq>
  • HS$TT<dbname><seq>

When a store is created running the dbms_dbfs_hs.createStore procedure a table is created from the “tblname” input parameter. This table is used as a metadata table and stores the SecureFile LOB in the “filedata” column. A second table is also created with the name in the format of SFS$_FSTP_<seq>. This table holds more information about how the SecureFile LOB is archived and provides the gateway into the SYS metadata tables.

A view is also created with the name in the format of VSFS$_<seq> which exposes store based metadata about the archived SecureFile LOB.

In my database the following tables and view have been created:

Store View

The following view can be used to provide a mapping between the archived SecureFile LOB and it’s location in the store and cache. A separate view is required for each store due to the use of separate cache and metadata tables for each store.

create or replace force view hs_store_vas
select
  c.pathname AS path,
  c.item AS name,
  c.std_guid AS guid,
  sf2.propvalue AS file_size,
  dbms_lob.getlength(c.filedata) AS cache_size,
  sf1.propvalue AS contentid,
  l.tarballid,
  b.backupfilename,
  st.storeowner,
  st.storename
from
  hstest.hs_cache c,
  hstest.sfs$_fstp_12 sf1,
  hstest.sfs$_fstp_12 sf2,
  sys.dbfs_hs$_contentfnmaptbl m,
  sys.dbfs_hs$_sflocatortable l,
  sys.dbfs_hs$_backupfiletable b,
  sys.dbfs_hs$_storeidtable st
where
  c.std_guid = sf1.std_guid
  and c.pathtype = 1
  and sf1.propname = 'HS:CONTENTID'
  and sf1.propvalue = m.contentfilename(+)
  and sf1.std_guid = sf2.std_guid
  and sf2.propname = 'HS:FILESIZE'
  and m.archiverefid = l.archiverefid(+)
  and l.tarballid = b.tarballid(+)
  and b.storeid = st.storeid(+);

Usage Examples

Now that everything is setup let’s walk through some examples of how DBFS HS works. First let’s look at what we have. In part 1 of this series I archived 30 SecureFile LOBs and that created one “tarball” operating system file:

380:oracle@testvm:/ora_src/bkup [db11g]
$ ls -lrt
total 8212
-rw-r--r-- 1 oracle oinstall    2136 Jun  6 23:46 Oracle_Disk_SBT_Catalog
-rw-r--r-- 1 oracle oinstall 8388616 Jun  6 23:46 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
381:oracle@testvm:/ora_src/bkup [db11g]

Now we’ll archive 15 more SecureFile LOBs, but we’ll just start at the beginning to show that if the SecureFile LOB has already been archived then re-archiving it does not invoke an error:

SQL> SET SERVEROUTPUT ON;
DECLARE
  v_mountpoint         VARCHAR2(32) := '/hs_store/';
  v_LOB                BLOB;
  v_LOBsize            NUMBER;
  v_debug              CHAR(1) := 'N';
BEGIN
  FOR arch_rec IN
  (
    SELECT hs.* FROM hs_lobtable hs
    where dbms_lob.getlength(hs.lobcol) != 0
    and rownum <= 45   )   LOOP     BEGIN       SELECT hs.lobcol       INTO v_LOB       FROM hs_lobtable hs       WHERE hs.id = arch_rec.id FOR UPDATE;       --       v_LOBsize := SYS.DBMS_LOB.GETLENGTH(v_LOB);       DBMS_OUTPUT.PUT_LINE('Archive ID: ' || arch_rec.id);       DBMS_OUTPUT.PUT_LINE('LOB size: ' || v_LOBsize);       --       -- Write the LOB to the store       --       DBMS_LOB.MOVE_TO_DBFS_LINK (         lob_loc => v_LOB,
        storage_path => v_mountpoint || arch_rec.id,
        flags => DBMS_LOB.DBFS_LINK_NOCACHE);
      --
      COMMIT;
    EXCEPTION
      WHEN OTHERS THEN
        ROLLBACK;
        DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    END;
  END LOOP;
END;
/
SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25

26   27   28   29   30   31   32   33   34   35   36   37   38   39

Archive ID: 1
LOB size: 16384
Archive ID: 2
LOB size: 16384
Archive ID: 3
LOB size: 16384
Archive ID: 4
LOB size: 16384
Archive ID: 5
LOB size: 16384
Archive ID: 6
LOB size: 16384
Archive ID: 7
LOB size: 16384
Archive ID: 8
LOB size: 16384
Archive ID: 9
LOB size: 16384
Archive ID: 10
LOB size: 16384
Archive ID: 11
LOB size: 16384
Archive ID: 12
LOB size: 16384
Archive ID: 13
LOB size: 16384
Archive ID: 14
LOB size: 16384
Archive ID: 15
LOB size: 16384
Archive ID: 16
LOB size: 16384
Archive ID: 17
LOB size: 16384
Archive ID: 18
LOB size: 16384
Archive ID: 19
LOB size: 16384
Archive ID: 20
LOB size: 16384
Archive ID: 21
LOB size: 16384
Archive ID: 22
LOB size: 16384
Archive ID: 23
LOB size: 16384
Archive ID: 24
LOB size: 16384
Archive ID: 25
LOB size: 16384
Archive ID: 26
LOB size: 16384
Archive ID: 27
LOB size: 16384
Archive ID: 28
LOB size: 16384
Archive ID: 29
LOB size: 16384
Archive ID: 30
LOB size: 16384
Archive ID: 31
LOB size: 16384
Archive ID: 32
LOB size: 16384
Archive ID: 33
LOB size: 16384
Archive ID: 34
LOB size: 16384
Archive ID: 35
LOB size: 16384
Archive ID: 36
LOB size: 16384
Archive ID: 37
LOB size: 16384
Archive ID: 38
LOB size: 16384
Archive ID: 39
LOB size: 16384
Archive ID: 40
LOB size: 16384
Archive ID: 41
LOB size: 16384
Archive ID: 42
LOB size: 16384
Archive ID: 43
LOB size: 16384
Archive ID: 44
LOB size: 16384
Archive ID: 45
LOB size: 16384

PL/SQL procedure successfully completed.

SQL>

Now let’s run a store push and see if we’ve create another archive file:

SQL> DECLARE
  store_name varchar2(32) := 'HS_STORE';
BEGIN
  dbms_dbfs_hs.storePush(store_name) ;
  commit ;
END;
/  2    3    4    5    6    7

PL/SQL procedure successfully completed.

SQL> !ls -lrt
total 16420
-rw-r--r-- 1 oracle oinstall 8388616 Jun  6 23:46 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
-rw-r--r-- 1 oracle oinstall    3204 Jun 20 09:17 Oracle_Disk_SBT_Catalog
-rw-r--r-- 1 oracle oinstall 8388616 Jun 20 09:17 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2

SQL>

OK, we’ve created a new archive file in the OS. Let’s look at what our metadata view says about our store and the cache for the store:

SQL> select name, cache_size, backupfilename from hs_store_v;

NAME            CACHE_SIZE BACKUPFILENAME
--------------- ---------- ------------------------------------------------------------
LOB1                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB10                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB11                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB12                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB13                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB14                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB15                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB16                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB17                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB18                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB19                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB2                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB20                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB21                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB22                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB23                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB24                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB25                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB26                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB27                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB28                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB29                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB3                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB30                    0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB4                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB5                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB6                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB7                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB8                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
LOB9                     0 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
31                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
32                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
33                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
34                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
35                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
36                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
37                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
38                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
39                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
40                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
41                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
42                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
43                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
44                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2
45                       0 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2

45 rows selected.

SQL>

It looks like all 45 of our SecureFile LOBs have been archived to a file and are not currently in the cache. Now let’s see if we can access one of the SecureFile LOBs:

SQL> SET SERVEROUTPUT ON;
DECLARE
  l_LOB                BLOB;
  l_buffer             RAW(4000);
  l_ctr                NUMBER := 1;
  l_amount             NUMBER := 100;
  l_id                 NUMBER := 45;
BEGIN
  SELECT hs.lobcol
  INTO l_LOB
  FROM hs_lobtable hs
  WHERE hs.id = l_id;
  --
  dbms_lob.read(l_LOB, l_amount, l_ctr, l_buffer);
  dbms_output.put_line( RAWTOHEX(l_buffer) );
END;
/SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
45506C45676E4761545861514265575641797571534A77687A4F79634D5A6E765771484867736C456B634C6845587466566377747A7A4745646A6774
4E7456704E69596857485A6B6146585767704D567275564D5A74417A616F41746C4874784B6A5061

PL/SQL procedure successfully completed.

SQL>

We were able to read the first 100 bytes of the SecureFile LOB with name or id of ’45′. If we query our metadata we now see:

SQL> select name, cache_size, backupfilename from hs_store_v where name = '45';

NAME            CACHE_SIZE BACKUPFILENAME
--------------- ---------- ------------------------------------------------------------
45                   16384 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2

SQL>

And now the cache_size shows the size of the SecureFile LOB. For those doubters out there I’ll rename the backup file and we’ll try accessing another SecureFile LOB:

SQL> !ls -lrt
total 16420
-rw-r--r-- 1 oracle oinstall 8388616 Jun  6 23:46 __DB11G_1_A5180F78E736D02AE040A8C057011B2C_1
-rw-r--r-- 1 oracle oinstall    3204 Jun 20 09:17 Oracle_Disk_SBT_Catalog
-rw-r--r-- 1 oracle oinstall 8388616 Jun 20 09:17 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2

SQL> !mv __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2.BAK

SQL> DECLARE
  l_LOB                BLOB;
  l_buffer             RAW(4000);
  l_ctr                NUMBER := 1;
  l_amount             NUMBER := 100;
  l_id                 NUMBER := 44;
BEGIN
  SELECT hs.lobcol
  INTO l_LOB
  FROM hs_lobtable hs
  WHERE hs.id = l_id;
  --
  dbms_lob.read(l_LOB, l_amount, l_ctr, l_buffer);
  dbms_output.put_line( RAWTOHEX(l_buffer) );
END;
/  2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
DECLARE
*
ERROR at line 1:
ORA-19507: failed to retrieve sequential file, handle="__DB11G_2_A6248CC964C16A45E040A8C0570111A4_2", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
sbtpvt_open_input: file /ora_src/bkup/__DB11G_2_A6248CC964C16A45E040A8C0570111A4_2 does not exist or cannot be
accessed, errno = 2
ORA-06512: at "SYS.DBMS_APBACKEND", line 60
ORA-06512: at "SYS.DBMS_APBACKEND", line 1378
ORA-06512: at "SYS.DBMS_DBFS_HS", line 1042
ORA-06512: at "SYS.DBMS_DBFS_HS", line 1203
ORA-06512: at "SYS.DBMS_DBFS_HS", line 1481
ORA-06512: at "SYS.DBMS_DBFS_HS", line 1592
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_DBFS_CONTENT", line 3108
ORA-06512: at "SYS.DBMS_LOB_AM_PRIVATE", line 65
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_LOB", line 1056
ORA-06512: at line 13

The access failed because the archive file could not be found. This proves that the database is capable of accessing secondary storage to transparently retrieve the SecureFile LOB. Now we’ll rename the file back and see if we can successfully retrieve the SecureFile LOB:

SQL> !mv __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2.BAK __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2

SQL> DECLARE
  l_LOB                BLOB;
  l_buffer             RAW(4000);
  l_ctr                NUMBER := 1;
  l_amount             NUMBER := 100;
  l_id                 NUMBER := 44;
BEGIN
  SELECT hs.lobcol
  INTO l_LOB
  FROM hs_lobtable hs
  WHERE hs.id = l_id;
  --
  dbms_lob.read(l_LOB, l_amount, l_ctr, l_buffer);
  dbms_output.put_line( RAWTOHEX(l_buffer) );
END;
/  2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
6F4A6A796978784F576F54524261634979444256535747644772655952614D4E6646735657675545566B48577762596C43707162426F7975414F5841
594B6242754D6E5948784E546E49724F78664F4968564154504C4966555156757277475077757544

PL/SQL procedure successfully completed.

SQL> select name, cache_size, backupfilename from hs_store_v where name = '44';

NAME            CACHE_SIZE BACKUPFILENAME
--------------- ---------- ------------------------------------------------------------
44                   16384 __DB11G_2_A6248CC964C16A45E040A8C0570111A4_2

SQL>

And we see that we can indeed access the SecureFile LOB and that it has been retrieved back into the cache.

To wrap this up Oracle has made available a feature that can be used to move closer to an ILM environment. One that can transparently support multiple tiers of storage for SecureFile LOBs.

2 Responses to “DBFS_HS – Database Hierarchical Storage, Part 2”

  1. newbie Says:

    Excellent article!

    What will hs_lobtable.lobcol contains after the LOB moved to the HS store? An id points to the staging table, or the DBFS Link itself?

    And how can we move the LOBs from HS store back to the original table segment in database? Thanks

  2. Andy Rivenes Says:

    The LOB can be retrieved from the HS using the DBMS_LOB call COPY_FROM_DBFS_LINK.

Leave a Reply