Archive for the ‘Oracle DBA’ Category

ASMLIB and RPM Versions

April 26th, 2010

When determining the proper versions of ASMLIB that should be installed, sometimes it’s useful to know the source RPM as well as the name. Typically I’ll run the rpm command with the “-qa” options:


$ rpm -qa | grep asm
oracleasm-2.6.18-164.0.0.0.1.el5xen-2.0.5-1.el5
oracleasmlib-2.0.4-1.el5
oracleasm-support-2.1.3-1.el5
$

But there are times when deciphering the kernel version that it’s nice to see the source rpm name as well. This can be accomplished with the “-i” option:


$ rpm -q -i -a | grep asm
Name        : oracleasm-2.6.18-164.0.0.0.1.el5xen  Relocations: (not relocatable)
Group       : System Environment/Kernel     Source RPM: oracleasm-2.6.18-164.0.0.0.1.el5-2.0.5-1.el5.src.rpm
URL         : http://oss.oracle.com/projects/oracleasm/
Name        : oracleasmlib                 Relocations: (not relocatable)
Group       : System Environment/Kernel     Source RPM: oracleasmlib-2.0.4-1.el5.src.rpm
Name        : oracleasm-support            Relocations: (not relocatable)
Group       : System Environment/Kernel     Source RPM: oracleasm-support-2.1.3-1.el5.src.rpm
URL         : http://oss.oracle.com/projects/oracleasm/
$

When installing ASMLIB there are three rpms that need to be verified. See http://www.oracle.com/technology/tech/linux/asmlib/driver_matrix.html for more information. There are also two versions available, one for the 2.4 kernel and one for 2.6 kernels. See http://www.oracle.com/technology/tech/linux/asmlib/driver_matrix.html.

When verifying which kernel version to use a “uname -r” command will show the currently running kernel version:


$ uname -r
2.6.18-164.0.0.0.1.el5xen
$

Typically I use “uname -a” for completeness:


$ uname -a
Linux testvm 2.6.18-164.0.0.0.1.el5xen #1 SMP Wed Sep 2 22:48:20 EDT 2009 i686 athlon i386 GNU/Linux
$

To double check the kernel rpms installed:


$ rpm -qa | grep kernel
kernel-xen-devel-2.6.18-164.0.0.0.1.el5
kernel-xen-2.6.18-164.0.0.0.1.el5
kernel-headers-2.6.18-164.0.0.0.1.el5
453:oracle@testvm:/home/oracle []
$

Once the kernel version is established, the appropriate rpm can be downloaded starting from the main ASMLIB page.

The entry for the kernel shown here looks like:

It can get confusing since there are also entries for higher levels:
Notice the 164.2.1, currently there are also 6, 9, 10, 11 and 15 versions available as well so it’s important to verify the exact kernel version.

RMAN Tape Simulation

April 8th, 2010

I needed a way to simulate an SBT channel recently using RMAN and was reminded of a slick technique that is documented in Oracle Support Note: 312737.1, RMAN and Specific Media Managers Environment Variables. The following is a simple example:


$ORACLE_HOME/bin/rman target / nocatalog

run {
allocate channel ch00 device type 'SBT_TAPE'
PARMS="SBT_LIBRARY=oracle.disksbt,
ENV=(BACKUP_DIR=/app/local/oracle/orabkup/db11g/tape)";
backup datafile 1 format='%U';
}

When this backup is run it creates a disk file in the directory listed in the BACKUP_DIR variable that emulates a tape file. This is especially handy if you’re testing the new database file system hierarchical store (e.g. dbfs_hs) features in Oracle 11gR2, since only SBT output is supported.

The following output shows the results of running this in 11.2:


$ $ORACLE_HOME/bin/rman target / nocatalog

Recovery Manager: Release 11.2.0.1.0 - Production on Thu Apr 8 15:00:53 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DB11G (DBID=179279599)
using target database control file instead of recovery catalog

RMAN> run {
allocate channel ch00 device type 'SBT_TAPE'
PARMS="SBT_LIBRARY=oracle.disksbt,
ENV=(BACKUP_DIR=/app/local/oracle/orabkup/db11g/tape)";
backup datafile 1 format='%U';
}2> 3> 4> 5> 6>

allocated channel: ch00
channel ch00: SID=51 device type=SBT_TAPE
channel ch00: WARNING: Oracle Test Disk API

Starting backup at 08-APR-10
channel ch00: starting full datafile backup set
channel ch00: specifying datafile(s) in backup set
input datafile file number=00001 name=/app/local/oradata/db11g/system01.dbf
channel ch00: starting piece 1 at 08-APR-10
channel ch00: finished piece 1 at 08-APR-10
piece handle=12lak79g_1_1 tag=TAG20100408T150104 comment=API Version 2.0,MMS Version 8.1.3.0
channel ch00: backup set complete, elapsed time: 00:00:35
Finished backup at 08-APR-10

Starting Control File and SPFILE Autobackup at 08-APR-10
piece handle=c-179279599-20100408-01 comment=API Version 2.0,MMS Version 8.1.3.0
Finished Control File and SPFILE Autobackup at 08-APR-10
released channel: ch00

RMAN> exit

Recovery Manager complete.
$

The file output looks like this:


$ ls -lrt
total 610920
-rw-r--r-- 1 oracle oinstall 608174088 Apr  8 15:01 12lak79g_1_1
-rw-r--r-- 1 oracle oinstall      3204 Apr  8 15:01 Oracle_Disk_SBT_Catalog
-rw-r--r-- 1 oracle oinstall  16777224 Apr  8 15:01 c-179279599-20100408-01
447:oracle@testvm:/app/local/oracle/orabkup/db11g/tape [db11g]
$

The file can be read back by Oracle and I’ll have an example of this using the dbfs_content API soon.

The other interesting thing about this is that I believe it’s been around since Oracle8 when RMAN first appeared.

11gR2 RAC Observations – crs_stat

April 3rd, 2010

If you’ve ever used the command “crs_stat -t” then you probably know that the output leaves a little to be desired. Well I stumbled upon this gem of a utility a while back and thought I’d give it a little more attention. It’s called crsstat, appropriately enough, and is a shell script available here:

http://www.rachelp.nl/index_kb.php?menu=scripts&actie=show&id=14

As it turns out, the script is actually buried in Oracle Support Note: 259301.1. Not sure who the original author is, but the script is just great.

The following shows the output of the “crs_stat -t” command:


$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....DATA.dg ora....up.type ONLINE    ONLINE    racdb1
ora....GRID.dg ora....up.type ONLINE    ONLINE    racdb1
ora....ER.lsnr ora....er.type ONLINE    ONLINE    racdb1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    racdb2
ora.asm        ora.asm.type   ONLINE    ONLINE    racdb1
ora....SM1.asm application    ONLINE    ONLINE    racdb1
ora....B1.lsnr application    ONLINE    ONLINE    racdb1
ora....db1.gsd application    ONLINE    ONLINE    racdb1
ora....db1.ons application    ONLINE    ONLINE    racdb1
ora....db1.vip ora....t1.type ONLINE    ONLINE    racdb1
ora....SM2.asm application    ONLINE    ONLINE    racdb2
ora....B2.lsnr application    ONLINE    ONLINE    racdb2
ora....db2.gsd application    ONLINE    ONLINE    racdb2
ora....db2.ons application    ONLINE    ONLINE    racdb2
ora....db2.vip ora....t1.type ONLINE    ONLINE    racdb2
ora....SM3.asm application    ONLINE    ONLINE    racdb3
ora....B3.lsnr application    ONLINE    ONLINE    racdb3
ora....db3.gsd application    ONLINE    ONLINE    racdb3
ora....db3.ons application    ONLINE    ONLINE    racdb3
ora....db3.vip ora....t1.type ONLINE    ONLINE    racdb3
ora.racdb.db   ora....se.type ONLINE    ONLINE    racdb2
ora....taf.svc ora....ce.type ONLINE    ONLINE    racdb3
ora....taf.svc ora....ce.type ONLINE    ONLINE    racdb3
ora.eons       ora.eons.type  ONLINE    ONLINE    racdb1
ora.gsd        ora.gsd.type   ONLINE    ONLINE    racdb1
ora....network ora....rk.type ONLINE    ONLINE    racdb1
ora.oc4j       ora.oc4j.type  OFFLINE   OFFLINE
ora.ons        ora.ons.type   ONLINE    ONLINE    racdb1
ora....ry.acfs ora....fs.type ONLINE    ONLINE    racdb1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    racdb2
$

On the same system with the crsstat script:

$ ./crsstat
HA Resource                                   Target     State
-----------                                   ------     -----
ora.racdbDATA.dg                              ONLINE     ONLINE on racdb1
ora.racdbGRID.dg                              ONLINE     ONLINE on racdb1
ora.LISTENER.lsnr                             ONLINE     ONLINE on racdb1
ora.LISTENER_SCAN1.lsnr                       ONLINE     ONLINE on racdb2
ora.asm                                       ONLINE     ONLINE on racdb1
ora.racdb1.ASM1.asm                           ONLINE     ONLINE on racdb1
ora.racdb1.LISTENER_RACDB1.lsnr               ONLINE     ONLINE on racdb1
ora.racdb1.gsd                                ONLINE     ONLINE on racdb1
ora.racdb1.ons                                ONLINE     ONLINE on racdb1
ora.racdb1.vip                                ONLINE     ONLINE on racdb1
ora.racdb2.ASM2.asm                           ONLINE     ONLINE on racdb2
ora.racdb2.LISTENER_RACDB2.lsnr               ONLINE     ONLINE on racdb2
ora.racdb2.gsd                                ONLINE     ONLINE on racdb2
ora.racdb2.ons                                ONLINE     ONLINE on racdb2
ora.racdb2.vip                                ONLINE     ONLINE on racdb2
ora.racdb3.ASM3.asm                           ONLINE     ONLINE on racdb3
ora.racdb3.LISTENER_RACDB3.lsnr               ONLINE     ONLINE on racdb3
ora.racdb3.gsd                                ONLINE     ONLINE on racdb3
ora.racdb3.ons                                ONLINE     ONLINE on racdb3
ora.racdb3.vip                                ONLINE     ONLINE on racdb3
ora.cmssb.db                                  ONLINE     ONLINE on racdb2
ora.cmssb.pref_all_basictaf.svc               ONLINE     ONLINE on racdb3
ora.cmssb.pref_all_notaf.svc                  ONLINE     ONLINE on racdb3
ora.eons                                      ONLINE     ONLINE on racdb1
ora.gsd                                       ONLINE     ONLINE on racdb1
ora.net1.network                              ONLINE     ONLINE on racdb1
ora.oc4j                                      OFFLINE    OFFLINE
ora.ons                                       ONLINE     ONLINE on racdb1
ora.registry.acfs                             ONLINE     ONLINE on racdb1
ora.scan1.vip                                 ONLINE     ONLINE on racdb2
$

Just a tremendous difference!

11gR2 RAC Observations – OCR and Voting Disk in ASM

April 2nd, 2010

When installing 11g Release 2 clusterware you are required to install the OCR and voting disks into an ASM diskgroup(s). However when upgrading to 11.2 this is actually optional, and if you’re planning on running databases at lower releases on the same cluster then putting the OCR and voting disks into ASM actually causes problems. This is a problem because pre-11.2 utilities don’t know how to access the OCR when it’s in ASM. The databases seem to run fine but srvctl won’t work correctly. The 11.2 version is of no help either:

$ ./srvctl status database -d racdb
PRCD-1027 : Failed to retrieve database racdb
PRKP-1088 : Failed to retrieve configuration of cluster database racdb
PRKR-1078 : Database racdb of version 11.0.0.0.0 cannot be administered using current version of srvctl.
Instead run srvctl from /u01/app/oracle/product/11.1.0/db_1

It is relatively easy to move between ASM and block devices though so this doesn’t really present much of a problem. The following shows the OCR in ASM:

[root@racdb1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     519920
         Used space (kbytes)      :      13852
         Available space (kbytes) :     506068
         ID                       :   33748369
         Device/File Name         : +GRID01
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

To convert back to block devices first we add back our original two OCR devices:

[root@racdb1 bin]# ./ocrconfig -add /dev/mapper/CRS01p1
[root@racdb1 bin]# ./ocrconfig -add /dev/mapper/CRS02p1
[root@racdb1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     519920
         Used space (kbytes)      :      13852
         Available space (kbytes) :     506068
         ID                       :   33748369
         Device/File Name         : +GRID01
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/mapper/CRS01p1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/mapper/CRS02p1
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

Then we replace our voting disk devices:

[root@racdb1 grid]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a4be2c392efd4f9fbff47545e09168ae (ORCL:GRID01) [DATA]
Located 1 voting disk(s).
[root@racdb1 bin]# ./crsctl replace votedisk /dev/mapper/VOTE01p1 /dev/mapper/VOTE02p1 /dev/mapper/VOTE03p1

Now formatting voting disk: /dev/mapper/VOTE01p1.
Now formatting voting disk: /dev/mapper/VOTE02p1.
Now formatting voting disk: /dev/mapper/VOTE03p1.
CRS-4256: Updating the profile
Successful addition of voting disk 6f67e53306844f79bf2f3cf2c9976d5d.
Successful addition of voting disk a69f48d9a6c54f95bf7eab415ba05510.
Successful addition of voting disk 9c48ad75bcc84fe6bf0e3fe17c6d0852.
Successful deletion of voting disk 691f7c1c8a554f25bfa603421c4b2a6b.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
[root@racdb1 bin]# ./crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   6f67e53306844f79bf2f3cf2c9976d5d (/dev/mapper/VOTE01p1) []
2. ONLINE   a69f48d9a6c54f95bf7eab415ba05510 (/dev/mapper/VOTE02p1) []
3. ONLINE   9c48ad75bcc84fe6bf0e3fe17c6d0852 (/dev/mapper/VOTE03p1) []
Located 3 voting disk(s).

Now we’ll remove the OCR from ASM:

[root@racdb1 bin]# ./ocrconfig -delete +GRID01

[root@racdb1 bin]# ./ocrcheck

Status of Oracle Cluster Registry is as follows :
Version                  :          3
Total space (kbytes)     :     519920
Used space (kbytes)      :      13852
Available space (kbytes) :     506068
ID                       :   33748369
Device/File Name         : /dev/mapper/CRS01p1
Device/File integrity check succeeded
Device/File Name         : /dev/mapper/CRS02p1
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

So let’s dig a little further into what the problems might be with older versions of srvctl. The following is part of a log file from the racg directory:

1037:oracle@racdb1:/u01/app/oracle/product/11.1.0/db_1/log/racdb1/racg [racdb1]
$ tail -f imon_racdb.log
2009-11-24 14:53:16.866: [    RACG][2142504896] [29100][2142504896][ora.default]: racgimon exiting clsz init failed

2009-11-24 14:54:04.871: [  OCROSD][312616896]utopen:6':failed in stat OCR file/disk +GRID01, errno=2, os err string=No such file or directory
2009-11-24 14:54:04.871: [  OCROSD][312616896]utopen:7:failed to open OCR file/disk +GRID01 , errno=2, os err string=No such file or directory
2009-11-24 14:54:04.871: [  OCRRAW][312616896]proprinit: Could not open raw device
2009-11-24 14:54:04.871: [ default][312616896]a_init:7!: Backend init unsuccessful : [26]
2009-11-24 14:54:04.871: [ CSSCLNT][312616896]clsssinit: Unable to access OCR device in OCR init.PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

2009-11-24 14:54:04.871: [    RACG][312616896] [29338][312616896][ora.default]: racgimon exiting clsz init failed

2009-11-24 15:36:02.227: [    RACG][22218688] [11666][22218688][ora.racdb.racdb1.inst]: racgimon started

2009-11-24 15:36:38.744: [    RACG][1251604800] [11666][1251604800][ora.racdb.racdb1.inst]:
SQL*Plus: Release 11.1.0.7.0 - Production on Tue Nov 24 15:36:03 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Enter user-name:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

2009-11-24 15:36:38.744: [    RACG][1251604800] [11666][1251604800][ora.racdb.racdb1.inst]: With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> Database closed.
Database dismounted.
ORACLE instance shut down.

2009-11-24 15:36:38.744: [    RACG][1251604800] [11666][1251604800][ora.racdb.racdb1.inst]: SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

So from this we can see that the 11.1 version can’t read our ASM OCR. Once the OCR is moved back to a block device then both 11.1 and 11.2 function correctly.

11g Release 2 RAC Observations

December 4th, 2009

So I’ve done a couple of 11gR2 RAC installs and an upgrade and I don’t think one of them has been the same. The two big things so far have been SCAN addresses and the OCR and voting disk placement.

SCAN addresses are Oracle’s new way of solving the “connection” and load balancing problems inherent with VIPs. For installs it’s required, but on upgrades it’s optional. Quite a difference and I think it should be optional for installs as well. That way it allows for an implementation once it’s fully understood.

As for the OCR and voting disk(s) Oracle has made it possible to put both in ASM. This is great if you’re running just 11gR2, but if you’re upgrading a cluster with any other releases then the OS utilities like “srvctl” won’t work because they can’t read the OCR, and you can’t use the 11.2 versions on older releases. If this is your situation then you have to leave the OCR and voting disks where they are and can’t take advantage of this new feature.

Trace Event Timeline, Wrap Up

October 11th, 2009

I’ve published a new utility to generate a trace file event timeline. It’s available on appsdba.com here. I’ve also updated the original interval resource profiler with the same code as the event profiler utility so that it displays CPU events separated by PARSE, EXEC and FETCH with full recursive accounting. The previous version grouped all CPU time into one “event” called “CPU Service” and only distinguished between non-recursive (e.g. dep=0) and recursive (e.g. dep>0) time. The new utility also sorts events, which my original attempt did not, but so far I have not seen a significant difference in the recursive time accounting. The event sort has changed the look of the graphs slightly from the originals, so comparisons are a little more difficult. The time accounting is correct so far, so it looks like this version is accurate. I will be publishing a new utility shortly that can generate the “execution tree” that I showed in my previous article “Trace Event Timeline -  Recursive, Non-Recursive and Exclusive Time“. This utility makes it easier to reconcile time to insure proper time accounting.

As for the graph comparison? Here’s the new view of the same graph that I showed in Part 3 of this trace event timeline series.

new_20172

Trace Event Timeline – Recursive, Non-Recursive and Exclusive Time

August 30th, 2009

Non-recursive and recursive cpu and elapsed time in extended SQL trace files are distinguished by whether the “dep=” value for PARSE, EXEC and FETCH statements is zero (i.e. non-recursive) or greater than zero (i.e. recursive). Millsap and Holt in Optimizing Oracle Performance [Pgs. 90-94] do an excellent job explaining the differentiation, and it is important in profiling resource time since all recursive time is invoked by non-recursive parent statements. What is typically not discussed, and is a little more complicated to categorize, is something called exclusive time. Exclusive time is simply the time that can be directly attributed to executing a particular PARSE, EXEC or FETCH statement. In other words it is the statement time less any recursive time incurred for that statement. It’s actually a little more complicated in that what we really have for each motivating statement is four time categories, the exclusive time for that statement and any recursive PARSE, EXEC or FETCH time. However, this is a recursive, no pun intended, relationship for each recursive PARSE, EXEC or FECTH statement since they may also have recursive children. In order to make the time accounting a little easier I wrote what I call an “execution tree” for PARSE, EXEC or FETCH statements in a trace file. It basically creates an indented tree listing based on recursive level and sums the four types of time when a parent recursive level is reached. This makes it much easier to determine the proper time accounting.

Partial trace file:


=====================

PARSING IN CURSOR #4 len=740 dep=1 uid=61 oct=3 lid=61 tim=1220394068956400 hv=1020616855 ad='e3eaf9e0'
select  tab.row_id, tab.msgid, tab.corrid, tab.priority, tab.delay,   tab.expiration, tab.retry_count, tab.exception_qschema,   tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time,   tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no,   tab.sender_name, tab.sender_address, tab.sender_protocol,   tab.dequeue_msgid, tab.user_prop, tab.user_data   from "AE2"."AQ$_ANALYSIS_ENGINE_IN_F" tab  where (subscriber_id = :1 and subscriber_name = :2 and queue_evtid = :3         and delivery_mode = 'PERSISTENT') and (tab.user_data.extract('/ns:NIFMessage/ns:payload/aee:analysisSequenceId', 'xmlns:ns="http://nif.llnl.gov" xmlns:aee="http://www.oracle.com/aq/XMLType"').getStringVal() like '%TAS_WFLZfocusToTCC_Sequencer%')
END OF STMT
EXEC #4:c=0,e=626,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394068956393
FETCH #4:c=69989,e=68532,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069024985
WAIT #6: nam='Streams AQ: waiting for messages in the queue' ela= 147396 queue id=73580 process#=4038849208 wait time=1 obj#=-1 tim=1220394069172592
EXEC #4:c=0,e=317,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069173174
FETCH #4:c=71990,e=70466,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069243668
ERROR #6:err=25228 tim=783947179
WAIT #6: nam='SQL*Net break/reset to client' ela= 6 driver break?=1 p3=0 obj#=-1 tim=1220394069244129
WAIT #6: nam='SQL*Net break/reset to client' ela= 271 driver break?=0 p3=0 obj#=-1 tim=1220394069244422
WAIT #6: nam='SQL*Net message to client' ela= 2 driver #bytes=1 p3=0 obj#=-1 tim=1220394069244444
WAIT #6: nam='SQL*Net message from client' ela= 807 driver #bytes=1 p3=0 obj#=-1 tim=1220394069245294
=====================
PARSING IN CURSOR #6 len=174 dep=0 uid=61 oct=47 lid=61 tim=1220394069245484 hv=562127231 ad='ef4b8648'
begin dbms_aqin.aq$_dequeue_in( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29); end;
END OF STMT
PARSE #6:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394069245479
EXEC #4:c=0,e=284,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069247543
FETCH #4:c=73988,e=72207,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069319777
EXEC #4:c=0,e=291,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069320320
WAIT #4: nam='latch: cache buffers chains' ela= 41 address=3983055408 number=122 tries=0 obj#=-1 tim=1220394069344643
FETCH #4:c=71989,e=70324,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069390667
WAIT #6: nam='Streams AQ: waiting for messages in the queue' ela= 978084 queue id=73580 process#=4038849208 wait time=1 obj#=-1 tim=1220394070368953
EXEC #4:c=0,e=322,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070369487
FETCH #4:c=71990,e=70318,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070439833
EXEC #6:c=220966,e=1194523,p=0,cr=1896,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394070440145

Resulting execution tree:

  Line 39: EXEC #4:c=0,e=626,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394068956393,hv=1020616855
  Line 40: FETCH #4:c=69989,e=68532,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069024985,hv=1020616855
  Line 42: EXEC #4:c=0,e=317,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069173174,hv=1020616855
  Line 43: FETCH #4:c=71990,e=70466,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069243668,hv=1020616855
  PARSE: cpu=          0 ela=          0
  EXEC: cpu=          0 ela=        943
  FETCH: cpu=     141979 ela=     138998
  Recursive: cpu=     141979 ela=     139941; Exclusive: cpu=    -141979 ela=    -139903
Line 53: PARSE #6:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394069245479,hv=562127231
  Line 54: EXEC #4:c=0,e=284,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069247543,hv=1020616855
  Line 55: FETCH #4:c=73988,e=72207,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069319777,hv=1020616855
  Line 56: EXEC #4:c=0,e=291,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069320320,hv=1020616855
  Line 58: FETCH #4:c=71989,e=70324,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069390667,hv=1020616855
  Line 60: EXEC #4:c=0,e=322,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070369487,hv=1020616855
  Line 61: FETCH #4:c=71990,e=70318,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070439833,hv=1020616855
  PARSE: cpu=          0 ela=          0
  EXEC: cpu=          0 ela=        897
  FETCH: cpu=     217967 ela=     212849
  Recursive: cpu=     217967 ela=     213746; Exclusive: cpu=       2999 ela=     980777
Line 62: EXEC #6:c=220966,e=1194523,p=0,cr=1896,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394070440145,hv=562127231
PARSE: cpu=          0 ela=         38
EXEC: cpu=     220966 ela=    1194523
FETCH: cpu=          0 ela=          0

As you can see, when you “go up a recursive level” the PARSE, EXEC and FETCH totals of the previous level are summed by category, totaled as recursive time, and the resulting exclusive time based on the parent statement’s time.

This whole concept of exclusive time becomes important when considering how to most accurately reflect “interval” time in the trace event timeline. My first attempt at categorizing time depended on only using non-recursive totals, that is recursive depth zero, PARSE, EXEC and FETCH statements along with event elapsed time. This had the convenience of not requiring complicated tracking of all the recursive time allocations, and with the proper disclosure was a perfectly valid method of creating the timeline. Of course there are some problems with only representing recursive depth zero time, the biggest one being tracing applications that are predominantly PL/SQL activity. This can lead to few, if any, recursive depth zero timings and can therefore create a misleading timeline. Another big problem is that the distribution of time can change significantly if the actual recursive PARSE, EXEC and FETCH is just lumped in to the parent statement’s recursive depth zero parse time rather than being categorized through each recursive level. A more accurate picture can be obtained by summarizing all recursive and non-recursive time for each time interval. I believe that this is an accurate representation of time from the standpoint of the entire trace file timeline, but can create some thorny issues for those trying to categorize time at the statement level. At the statement level we get into the issues that were mentioned in Part 1 of my Trace Event Timeline series.

To implement this new accuracy level requires that for each recursive level that we track the PARSE, EXEC and FETCH time of each statement’s recursive children along with that statement’s exclusive time, and then track that based on the interval time. This then provides the current totals for each time category as each statement, or wait event for that matter, is emitted. Since interval boundaries are checked as each trace file statement is encountered, when a boundary is crossed we now have an accounting of the current cpu time regardless of recursive depth. I will post a comparison of the trace event timeline that I first proposed and this new method of categorizing all recursive time shortly.

Trace Event Timeline, Part 3

August 11th, 2009

This is part 3 of my series discussing a new technique for analyzing extended SQL trace files that I’m calling a “Trace Event Timeline”. The timeline is presented in a stacked bar chart format based on non-uniform intervals. Each stacked bar is the sum of the time spent in each recorded event during that particular timed interval. The intervals are non-uniform because they are taken only at points in between either WAIT events or recursive depth zero PARSE, EXEC or FETCH statements. A target interval length is derived by dividing the total trace file duration by the number of specified intervals, currently 20. When the running timeline of the trace file exceeds the target interval a new interval is started.

The purpose of the technique is to provide more information about when events occurred during the tracing period. Having just a resource profile of the trace file leaves out information about the event duration skew. An example might be having a large amount of ”SQL*Net message from client time” that exists almost completely at the start and end of the trace period. This might be in contrast to having that relatively evenly distributed across the entire interval. In the first case it might simply be a “scoping” problem with the start and end of the trace whereas the second example might be part of the actual performance issue (i.e. long client side processing separated by row fetches from the database).

As I mentioned in part 1 of this series, the idea for an event timeline occurred to me while I looking at a process that would do a bunch of database work very fast and then pause for a bit and would then speed up again. All sorts of conjecture about database overload were made, but all indications from a workload perspective showed that the database server was barely doing any work at all. The following shows the resource profile from a trace taken during the period that this behavior occurred:

                               Total Event  % Event      Total  --------------- Cumulative ----------------
Event Name                      Time (sec)     Time     Events  Avg Time(sec)  Min Time(sec)  Max Time(sec)
----------------------------   -----------  -------   --------  -------------  -------------  -------------
SQL*Net message from client     255.000799    48.11    152,647       0.001671       0.000138       0.290118
CPU Service                     244.962734    46.22    864,046       0.000284       0.000000       0.054992
unaccounted-for                  18.943126     3.57
db file sequential read          10.345820     1.95      3,262       0.003172       0.000017       0.295407
SQL*Net message to client         0.364421     0.07    152,647       0.000002       0.000000       0.002573
log file switch completion        0.285168     0.05          4       0.071292       0.003946       0.239785
log file sync                     0.130075     0.02         28       0.004646       0.000005       0.017599
latch: library cache              0.002779     0.00         36       0.000077       0.000005       0.001123
latch: cache buffers chains       0.002359     0.00          8       0.000295       0.000005       0.000621
SQL*Net more data from client     0.001583     0.00        201       0.000008       0.000005       0.000051
buffer busy waits                 0.000276     0.00          6       0.000046       0.000017       0.000144
latch: session allocation         0.000082     0.00          1       0.000082       0.000082       0.000082
latch: shared pool                0.000062     0.00          1       0.000062       0.000062       0.000062
SQL*Net more data to client       0.000034     0.00          1       0.000034       0.000034       0.000034
-----------------------------  -----------  -------  ---------  -------------  -------------  -------------
Total                           530.039318   100.00  1,172,888

Notice that 94% of the time was spent almost equally in just two events, “SQL*Net message from client” and “CPU service”. One might assume from looking at this resource profile, and even the individual events with a tkprof output, that the “SQL*Net message from client” could just be a scoping problem. Perhaps we started the trace too early or ended it too late (i.e. before the process started or after it ended).[1] However, from the trace event timeline shown below we see that this is not the case at all:

ora_20172_9294129

What we do see is that the “EXEC” event (i.e. CPU service for EXEC calls) dominates the timeline and then virtually stops for a short period of time, when we see the “SQL*Net message from client” event accumulate most of the time, and then the “EXEC” event takes over again. In fact, this mirrors the behavior that was observed during the actual running of the process, and from this information we can be pretty sure that the delay or “slowdown” is not happening because of the database. If there were contention or some resource shortage we would see that event and not the “SQL*Net message from client” event.

Hopefully I have provided a reasonably clear picture of what a trace event timeline is, how it can be created, and what it can be used for. We’ll have to see if this becomes a useful technique for those problems that are obscured by the skew of event time across a trace file timeline.


[1] Typically an “idle” database process will accumulate “SQL*Net message from client” time waiting for work to be initiated by the client connection.

Trace Event Timeline, Part 2

August 10th, 2009

In Part 2 of this post I’m going to describe how the trace event timeline is created. I think it’s important to discuss which events are being captured and how the intervals are derived since it affects the output of the graph and of course, the validity of the technique. There will also be a Part 3 which will discuss the event timeline graph and how it may be used.

There are several issues that crop up when thinking about how to slice a trace file into intervals and what is valid to use as event totals. When I initially came up with the idea of an event timeline I posed the question to Jeff Holt at Method-R.com in their forum as to what he thought about the idea. He raised these same issues, and suffice it to say that trying to do this at the “statement” level would be very hard. Reconciling the recursive SQL and trying to account for the time across multiple intervals was beyond the scope of what I was trying to achieve. That’s the beauty of the Interval Resource Profiler. It only summarizes the entire trace file or interval, and it does not attempt the much more detailed analysis of statement level attribution. This makes it fairly easy from a scripting perspective to use non-uniform intervals based on statement timings for wait events and recursive depth zero parse, execute and fetch statements.

Now for some background on the current state of extended SQL trace files. As of 11.1.0.6 a SQL trace file looks something like this (this is just a small part of one trace file):

EXEC #6:c=0,e=83,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1219708745416872
WAIT #6: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745416904
FETCH #6:c=0,e=84,p=0,cr=8,cu=0,mis=0,r=1,dep=0,og=1,tim=1219708745417024
STAT #6 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=8 pr=0 pw=0 time=0 us cost=5 size=89 card=1)'
STAT #6 cnt=1 pid=1 pos=1 obj=71003 op='TABLE ACCESS BY INDEX ROWID ODM_PUBLICOBJECT (cr=4 pr=0 pw=0 time=0 us cost=3 size=73 card=1)'
STAT #6 cnt=1 pid=2 pos=1 obj=71009 op='INDEX UNIQUE SCAN SYS_C009848 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=1)'
STAT #6 cnt=1 pid=1 pos=2 obj=70836 op='TABLE ACCESS BY INDEX ROWID ODM_DOCUMENT (cr=4 pr=0 pw=0 time=0 us cost=2 size=16 card=1)'
STAT #6 cnt=1 pid=4 pos=1 obj=70839 op='INDEX UNIQUE SCAN SYS_C009781 (cr=3 pr=0 pw=0 time=0 us cost=1 size=0 card=1)'
WAIT #6: nam='SQL*Net message from client' ela= 1202 driver #bytes=1 p3=0 obj#=-1 tim=1219708745418336
FETCH #6:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1219708745418358
WAIT #6: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745418375
WAIT #6: nam='SQL*Net message from client' ela= 87423 driver #bytes=1 p3=0 obj#=-1 tim=1219708745505813
EXEC #3:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1219708745505944
WAIT #3: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745506104
FETCH #3:c=0,e=174,p=0,cr=11,cu=0,mis=0,r=3,dep=0,og=1,tim=1219708745506134
WAIT #3: nam='SQL*Net message from client' ela= 2607 driver #bytes=1 p3=0 obj#=-1 tim=1219708745508784

Oracle continues to add events and fields but the basic trace file format remains essentially the same (see Metalink note 39817.1). The first thing to notice is that the WAIT, PARSE, EXEC and FETCH statements all have “tim=” values. These values provide the basis for a running timeline for the interval being traced (see Optimizing Oracle Performance for the best description of a trace file). Unfortunately the “tim=” values for the WAIT statements do not yet appear to make sense in the overall timing (this is apparently due to Oracle bug 8342329), and aren’t available in older releases. However this can be dealt with by keeping a running timeline and using the “ela=” or elapsed time values to supplement the timeline between PARSE, EXEC and FETCH statements. This is, in fact, the approach taken with the Interval Resource Profiler.

Now, what about slicing a trace file into intervals? Given that we’re only interested in the entire trace file interval and that we have a running timeline, we can create non-uniform intervals as long as we don’t “double count” recursive CPU time (again see Optimizing Oracle Performance for a description of double counting recursive statements). That leaves us with creating our interval “splits” between WAIT events or recursive depth zero PARSE, EXEC and FETCH statements. The side effect of this is that the event totals for each interval may or may not be the same, hence the non-uniform part. This ultimately affects the graph representation as well, and I’ll cover that in part 3. Adding this code to the Interval Resource Profiler was actually pretty easy since the timeline tracking and event summarization was already there.

The basic timeline tracking consists of checking to see if the start time has been set for statements with a “tim=” value that are not WAIT events. If the start time has not been set then the timeline is set to the current time minus any previously recorded WAIT event time. If it has been set then if the current time is less than the start time the start time is reset to the current time. Otherwise the end time is set (the normal situation) and a check is made to see if we’ve passed over an event time boundary, and if so then a new boundary time is set.

Timeline

#
# Track interval time start and end
#
if ($time_start == 0) {
  $time_start = ($8 - $time_offset);
  $time_offset = 0;
  $time_end = $8;
  $time_skew = $time_end;
}
elsif ($8 < $time_start) {
  $time_start = $8;
}
else {
  $time_end = $8;
  #
  # Adjust the skew interval
  #
  if (($time_end-$time_skew) > $time_skew_intvl) {
    $time_skew = $time_end;
  }
}
#

If a PARSE, EXEC or FETCH statement and the recursive depth is zero (i.e. dep = 0) then the CPU time “c=” is added to the appropriate event total for this interval (i.e. PARSE, EXEC or FETCH).

PARSE|EXEC|FETCH

#
# Track the interval events for P|E|F dep=0
#
if ($10 == 0) {
  my $cmd = "'" . $1 . "'";
  if (defined $intvl_skew->{$time_skew}->{$cmd}) {
    $intvl_skew->{$time_skew}->{$cmd} += $3;
  }
  else {
    $intvl_skew->{$time_skew}->{$cmd} = $3;
  }
}
#

If this is a WAIT statement then the timeline consists of either adding the time to the “time_offset”, if the start time hasn’t been set, or adding it to the end time (in case no more tim= values are seen). The end time is then checked to see if we’ve passed  over an event time boundary and if so then we set a new one.


WAITs

#
if ($time_start == 0) {
  $time_offset = $ela;
}
$time_end += $ela;
#
# Adjust the skew interval
#
if (($time_end-$time_skew) > $time_skew_intvl) {
  $time_skew = $time_end;
}
#

The elapsed time (i.e. “ela=”) is then added to the event time for this interval.

#
# Track the interval events
#
if (defined $intvl_skew->{$time_skew}->{$nam}) {
  $intvl_skew->{$time_skew}->{$nam} += $ela;
}
else {
  $intvl_skew->{$time_skew}->{$nam} = $ela;
}
#

The only hard part left is determining what the base interval time should be for the event timeline. The Interval Resource Profiler is a two pass parser, so currently the event timeline interval is calculated in pass one by taking the entire trace file interval time and dividing it by the desired number of event intervals, currently 20. Then in pass 2 the previously described logic is applied to calculate the event timeline using the interval as $time_skew_intvl.

Once the event timeline has been calculated, it’s just a matter of graphing the results. This can be done in Perl using the GD Graphics Library package, this was how the graph in part 1 was created, or the results can be output to a CSV file and input into an Excel spreadsheet. In either case a pivot on the data must be performed in order to get the event time as a column for each interval row in order to create the graph. This will be part of the discussion of Part 3.

Trace Event Timeline, Part 1

August 7th, 2009

I was working on a problem recently where a process was “slow”. So naturally I got one of the DBAs to trace a sample of one of the runs and we took a look at it. The interesting part was that it was about half and half between SQL*Net message from client and CPU (EXEC statements). It turns out that the process would do a bunch of database work very fast and then just pause, issuing statements but much more slowly, and then it would speed up again. Naturally everyone blamed the database, but from the trace file one might assume that it was at least partly a client issue. The problem was that from the trace file there was no way to tell that the behavior was “skewed”, running fast then slowing down then running fast again.

This led me to thinking that it would be very handy in some cases to know a little bit more about when certain events occur within a traced session. Some of our famous brethren will usually say that that’s just a scoping problem, and while that is probably true, in the real world it’s not always so obvious, or easy to do something about. So I decided to modify the AppsDBA Interval Resource Profiler to see if I could create an “Event Timeline”. The following is the result:

cmsdev1_ora_1825_3507313

The idea is that the graph displays the event totals in a timeline as they occurred. It seems to make it easier to determine roughly when actions took place. In other words, is there event skew? Not skew for a particular event, which is what the Hotsos/Method-R Profiler gives you, but across the entire trace interval. The timeline should help determine if all of the activity took place in a small set of intervals, or evenly across the whole interval, or perhaps like my original problem in spurts. The resource profile that goes along with this graph looks like the following:


                          Total Event  % Event    Total  --------------- Cumulative ----------------
Event Name                 Time (sec)     Time   Events  Avg Time(sec)  Min Time(sec)  Max Time(sec)
------------------------  -----------  -------  -------  -------------  -------------  -------------
SQL*Net message from         0.801698    77.92      262       0.003060       0.000280       0.150290
client
SQL*Net more data from       0.132364    12.87      520       0.000255       0.000003       0.019935
client
CPU Service                  0.058989     5.73      478       0.000123       0.000000       0.010998
unaccounted-for              0.033580     3.26
library cache pin            0.000746     0.07        4       0.000187       0.000147       0.000254
library cache lock           0.000610     0.06        3       0.000203       0.000161       0.000237
gc current grant busy        0.000563     0.05        1       0.000563       0.000563       0.000563
SQL*Net message to           0.000285     0.03      262       0.000001       0.000000       0.000031
Client
SQL*Net more data to         0.000016     0.00        1       0.000016       0.000016       0.000016
client
------------------------  -----------  -------  -------  -------------  -------------  -------------
Total                        1.028851   100.00    1,531

I’ll have more to say about this in the near future.