The LAG Function
The Oracle LAG function gives the value of the previous row within a table without the need for a self join. The LAG and LEAD functions have been available since Oracle 8.1.6 when analytic functions were first introduced. Tom Kyte dedicated an entire chapter to analytic functions in his first book, “Expert One-on-One Oracle” and they have obviously been part of the Oracle documentation as well. Their adoption has not been as quick though, and I will admit to being slow when it comes to converting code from using a self join and subtracting n and n+1 values rather than using LAG. In the open source SYSMON code many of the views used for interval summation used a function of the form:
DECODE( SIGN(e.phyrds - b.phyrds),-1,0,
(e.phyrds - b.phyrds) ),
This required a self join to get the begin and end values for each interval. This was done to insure that no negative values occurred when an instance restart occurred. The full SQL looked similar to this:
WITH g_fileio AS
(
select
fs.snap_id,
fs.dbid,
fs.instance_number,
fs.tsname,
fs.filename,
fs.phyrds
from
dba_hist_filestatxs fs
)
SELECT
b.begin_interval_time,
e.begin_interval_time,
e.snap_id,
e.dbid,
e.instance_number,
e.tsname,
e.filename,
DECODE( SIGN(e.phyrds - b.phyrds),-1,0,
(e.phyrds - b.phyrds) ),
DECODE( SIGN(e.phyblkrd - b.phyblkrd),-1,0,
(e.phyblkrd - b.phyblkrd) ),
DECODE( SIGN(e.readtim - b.readtim),-1,0,
(e.readtim - b.readtim) ),
DECODE( SIGN(e.singleblkrds - b.singleblkrds),-1,0,
(e.singleblkrds - b.singleblkrds) ),
DECODE( SIGN(e.singleblkrdtim - b.singleblkrdtim),-1,0,
(e.singleblkrdtim - b.singleblkrdtim) ),
DECODE( SIGN(e.phywrts - b.phywrts),-1,0,
(e.phywrts - b.phywrts) ),
DECODE( SIGN(e.phyblkwrt - b.phyblkwrt),-1,0,
(e.phyblkwrt - b.phyblkwrt) ),
DECODE( SIGN(e.writetim - b.writetim),-1,0,
(e.writetim - b.writetim) )
FROM
g_fileio b,
g_fileio e
WHERE
e.snap_id = b.snap_id + 1
AND b.tsname = e.tsname
AND b.filename = e.filename
AND b.dbid = e.dbid
AND b.instance_number = e.instance_number
The following will compare the “old” way of using a self join with the LAG function.
LAG Function Usage
The following example will show how the LAG function can be used when accessing AWR tables in Oracle 11g. Initially just one “value” will be accessed to show how the LAG function works. The following SQL illustrates the example:
column filename format a50;
WITH g_fileio AS
(
select
fs.snap_id,
fs.dbid,
fs.instance_number,
fs.tsname,
fs.filename,
fs.phyrds
from
dba_hist_filestatxs fs
)
SELECT
e.snap_id,
e.instance_number,
e.filename,
e.phyrds as curr_rds,
LAG(e.phyrds,1,0) OVER(
PARTITION BY e.dbid, e.instance_number, e.tsname, e.filename
ORDER BY e.snap_id ) as prev_rds
FROM
g_fileio e
WHERE
e.snap_id between 83578 and 83590
and e.tsname = 'DATA_TBS'
and e.instance_number = 2
ORDER BY
e.snap_id
/
This LAG function has been written in a manner that if the previous row is outside the scope of our result set it will return a zero, otherwise it will return the previous row’s value for phyrds. When this SQL is run we see that for the first interval that the “previous” value is 0. This makes sense since there is no previous row in our set, as constrained by “e.snap_id between 83578 and 83590″, and as we said previously our LAG function has been defined to return a zero in that case. The rest of the rows return the value of the previous row’s phyrds column. Notice that we’ve order by snap_id to make this easy to see in the example below:
SNAP_ID INSTANCE_NUMBER FILENAME CURRRDS PREVRDS
---------- --------------- ------------------------------------- ---------- ----------
83578 2 +ORADATA01/db11g/data_tbs_01.dbf 36477 0
83579 2 +ORADATA01/db11g/data_tbs_01.dbf 36484 36477
83580 2 +ORADATA01/db11g/data_tbs_01.dbf 36484 36484
83581 2 +ORADATA01/db11g/data_tbs_01.dbf 36512 36484
83582 2 +ORADATA01/db11g/data_tbs_01.dbf 36512 36512
83583 2 +ORADATA01/db11g/data_tbs_01.dbf 36519 36512
83584 2 +ORADATA01/db11g/data_tbs_01.dbf 36519 36519
83585 2 +ORADATA01/db11g/data_tbs_01.dbf 36541 36519
83586 2 +ORADATA01/db11g/data_tbs_01.dbf 36541 36541
83587 2 +ORADATA01/db11g/data_tbs_01.dbf 36542 36541
83588 2 +ORADATA01/db11g/data_tbs_01.dbf 36544 36542
83589 2 +ORADATA01/db11g/data_tbs_01.dbf 36544 36544
83590 2 +ORADATA01/db11g/data_tbs_01.dbf 36545 36544
This is an important observation that we will explore further in a subsequent section.
Efficiency
So an interesting question is how much better, if at all, is the LAG function over the original self join?
In order to make the “lag” example produce the same output as the “self-join” example the LAG function was moved into the “WITH” clause so that the row previous to the first row of the result set would be available. In our first example we didn’t do this to illustrate how the LAG function handles the situation where there is no previous row.
The “self-join” SQL:
column filename format a50;
SELECT
e.snap_id,
e.instance_number,
e.filename,
DECODE( SIGN(e.phyrds - b.phyrds),-1,0,
(e.phyrds - b.phyrds) ) AS delta_phyrds
FROM
dba_hist_filestatxs b,
dba_hist_filestatxs e
WHERE
e.snap_id = b.snap_id + 1
AND b.filename = e.filename
AND b.dbid = e.dbid
AND b.instance_number = e.instance_number
and e.snap_id between 83578 and 83590
and e.tsname = 'DATA_TBS'
and e.instance_number = 2
ORDER BY
e.snap_id
/
The “lag” SQL:
column filename format a50;
WITH g_fileio AS
(
select
fs.snap_id,
fs.dbid,
fs.instance_number,
fs.tsname,
fs.filename,
fs.phyrds - LAG(fs.phyrds,1,0)
OVER( PARTITION BY fs.dbid, fs.instance_number, fs.tsname, fs.filename
ORDER BY fs.snap_id ) as delta_phyrds
from
dba_hist_filestatxs fs
)
SELECT
e.snap_id,
e.instance_number,
e.filename,
e.delta_phyrds
FROM
g_fileio e
WHERE
e.snap_id BETWEEN 83578 AND 83590 AND
e.tsname = 'DATA_TBS' AND
e.instance_number = 2
ORDER BY
e.snap_id
/
Using Tom Kyte’s “runstats” utility makes testing the difference between these two methods easy. The following will use a slightly modified version available here that summarizes the output into a “workload” format. This was used here because it shows just what is needed to see the difference between the two methods.
The test was run several times to try and filter out the affects of whether the data was cached or not. In fact, the “self-join” version is particularly sensitive to caching and can be expected to run much more slowly when the data is not if the buffer cache.
Test Script
exec runStats_pkg.rs_start;
column filename format a50;
SELECT
e.snap_id,
e.instance_number,
e.filename,
DECODE( SIGN(e.phyrds - b.phyrds),-1,0,
(e.phyrds - b.phyrds) ) AS delta_phyrds
FROM
dba_hist_filestatxs b,
dba_hist_filestatxs e
WHERE
e.snap_id = b.snap_id + 1
AND b.filename = e.filename
AND b.dbid = e.dbid
AND b.instance_number = e.instance_number
and e.snap_id between 83578 and 83590
and e.tsname = 'CMSPUB_DATA_TBS'
and e.instance_number = 2
ORDER BY
e.snap_id
/
exec runStats_pkg.rs_middle;
column filename format a50;
WITH g_fileio AS
(
select
fs.snap_id,
fs.dbid,
fs.instance_number,
fs.tsname,
fs.filename,
fs.phyrds - LAG(fs.phyrds,1,0) OVER( PARTITION BY fs.dbid, fs.instance_number, fs.tsname, fs.filename
ORDER BY fs.snap_id ) as delta_phyrds
from
dba_hist_filestatxs fs
)
SELECT
e.snap_id,
e.instance_number,
e.filename,
e.delta_phyrds
FROM
g_fileio e
WHERE
e.snap_id between 83578 and 83590
and e.tsname = 'CMSPUB_DATA_TBS'
and e.instance_number = 2
ORDER BY
e.snap_id
/
exec runStats_pkg.rs_stop(p_output=>'WORKLOAD');
Workload Results
The following are the results from three tests run one right after another. In each case the LAG is faster and does less work. In the third run a I reversed the order just to see if running either one first made any difference.
SQL> Run1 ran in 118 hsecs Run2 ran in 11 hsecs run 1 ran in 1072.73% of the time Name Run1 Run2 Diff STAT...sorts (memory) 2 3 1 STAT...physical reads 29 0 -29 STAT...physical reads cache 29 0 -29 STAT...CPU used by this session 89 13 -76 STAT...Elapsed Time 121 14 -107 STAT...redo size 4,568 4,816 248 STAT...sorts (rows) 5,860 11,707 5,847 STAT...session logical reads 170,866 19,517 -151,349 Run1 latches total versus runs -- difference and pct Run1 Run2 Diff Pct 1,793,976 135,443 -1,658,533 1,324.52% PL/SQL procedure successfully completed. SQL> SQL> Run1 ran in 85 hsecs Run2 ran in 24 hsecs run 1 ran in 354.17% of the time Name Run1 Run2 Diff STAT...recursive cpu usage 3 2 -1 STAT...sorts (memory) 2 3 1 STAT...physical reads cache 8 0 -8 STAT...physical reads 8 0 -8 STAT...CPU used by this session 75 25 -50 STAT...Elapsed Time 88 28 -60 STAT...redo size 5,952 5,108 -844 STAT...sorts (rows) 5,861 11,709 5,848 STAT...session logical reads 170,942 19,546 -151,396 Run1 latches total versus runs -- difference and pct Run1 Run2 Diff Pct 1,895,158 701,188 -1,193,970 270.28% PL/SQL procedure successfully completed. SQL> SQL> Run1 ran in 24 hsecs Run2 ran in 88 hsecs run 1 ran in 27.27% of the time Name Run1 Run2 Diff STAT...sorts (memory) 3 2 -1 STAT...recursive cpu usage 2 4 2 STAT...Elapsed Time 28 90 62 STAT...CPU used by this session 25 90 65 STAT...redo size 4,288 5,908 1,620 STAT...sorts (rows) 11,709 5,861 -5,848 STAT...session logical reads 19,500 170,948 151,448 Run1 latches total versus runs -- difference and pct Run1 Run2 Diff Pct 510,514 2,151,923 1,641,409 23.72% PL/SQL procedure successfully completed. SQL>
Caveats
The biggest problem with using the LAG function is to insure that the row set that it operates on is complete. This means that the leading row prior to the first row used in the result set must be available. This usually means that an inline view or a with statement must be used to insure that the resulting query has a prior row to work with. The following example will try to illustrate this. In the first case the LAG function is calculated as part of a single query. The where clause restricts the data to a specific range of snap_ids and the resulting lag entry is incorrect for the first row:
select
snap_id,
wait_class,
total_waits_fg AS waits_fg,
total_waits_fg - LAG(total_waits_fg,1,0)
OVER(PARTITION BY dbid, instance_number, event_id, wait_class_id
ORDER BY snap_id) AS waits_fg_lag
from
dba_hist_system_event
where
snap_id between 28452 and 28462
and wait_class = 'Commit'
and instance_number = 1
order by
snap_id
/
SNAP_ID WAIT_CLASS WAITS_FG WAITS_FG_LAG
---------- ------------------------------------------------ ---------- ------------
28452 Commit 1241804 1241804
28453 Commit 1244080 2276
28454 Commit 1246347 2267
28455 Commit 1248675 2328
28456 Commit 1250995 2320
28457 Commit 1253477 2482
28458 Commit 1255821 2344
28459 Commit 1258202 2381
28460 Commit 1260383 2181
28461 Commit 1262600 2217
28462 Commit 1264982 2382
11 rows selected.
SQL>
The next example uses an inline view to allow the LAG function to have the row prior to the first row available to operate on:
compute sum of waits_fg_lag on wait_class;
break on report;
select *
from (
select
snap_id,
wait_class,
total_waits_fg AS waits_fg,
total_waits_fg - LAG(total_waits_fg,1,0)
OVER(PARTITION BY dbid, instance_number, event_id, wait_class_id
ORDER BY snap_id) AS waits_fg_lag
from
dba_hist_system_event
where
instance_number = 1
and wait_class = 'Commit'
)
where
snap_id between (28452 + 1) and 28462
order by
snap_id
/
SNAP_ID WAIT_CLASS WAITS_FG WAITS_FG_LAG
---------- ------------------------------------------------ ---------- ------------
28453 Commit 1244080 2276
28454 Commit 1246347 2267
28455 Commit 1248675 2328
28456 Commit 1250995 2320
28457 Commit 1253477 2482
28458 Commit 1255821 2344
28459 Commit 1258202 2381
28460 Commit 1260383 2181
28461 Commit 1262600 2217
28462 Commit 1264982 2382
10 rows selected.
SQL>
In this case the lag result for the first row is correct. Again, this is because the row prior to the first snap_id is available to the LAG function.
Conclusion
The LAG function is much more efficient in these tests than the self-join. The session logical reads average about 19,500 for the lag test and around 171,000 for the self-join test. The run time is always less for the lag test and the self-join test seems particularly sensitive to the benefits of caching when run continually. When initially run the time to run was significantly worse (see Full Runstats Test):
Run1 ran in 8296 hsecs Run2 ran in 1913 hsecs run 1 ran in 433.66% of the time
The only disadvantage is the more care must be taken to code the query in such a way that the row prior to the first row wanted is available to the LAG function.
Entries