Archive for August, 2009

Trace Event Timeline – Recursive, Non-Recursive and Exclusive Time

August 30th, 2009

Non-recursive and recursive cpu and elapsed time in extended SQL trace files are distinguished by whether the “dep=” value for PARSE, EXEC and FETCH statements is zero (i.e. non-recursive) or greater than zero (i.e. recursive). Millsap and Holt in Optimizing Oracle Performance [Pgs. 90-94] do an excellent job explaining the differentiation, and it is important in profiling resource time since all recursive time is invoked by non-recursive parent statements. What is typically not discussed, and is a little more complicated to categorize, is something called exclusive time. Exclusive time is simply the time that can be directly attributed to executing a particular PARSE, EXEC or FETCH statement. In other words it is the statement time less any recursive time incurred for that statement. It’s actually a little more complicated in that what we really have for each motivating statement is four time categories, the exclusive time for that statement and any recursive PARSE, EXEC or FETCH time. However, this is a recursive, no pun intended, relationship for each recursive PARSE, EXEC or FECTH statement since they may also have recursive children. In order to make the time accounting a little easier I wrote what I call an “execution tree” for PARSE, EXEC or FETCH statements in a trace file. It basically creates an indented tree listing based on recursive level and sums the four types of time when a parent recursive level is reached. This makes it much easier to determine the proper time accounting.

Partial trace file:


=====================

PARSING IN CURSOR #4 len=740 dep=1 uid=61 oct=3 lid=61 tim=1220394068956400 hv=1020616855 ad='e3eaf9e0'
select  tab.row_id, tab.msgid, tab.corrid, tab.priority, tab.delay,   tab.expiration, tab.retry_count, tab.exception_qschema,   tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time,   tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no,   tab.sender_name, tab.sender_address, tab.sender_protocol,   tab.dequeue_msgid, tab.user_prop, tab.user_data   from "AE2"."AQ$_ANALYSIS_ENGINE_IN_F" tab  where (subscriber_id = :1 and subscriber_name = :2 and queue_evtid = :3         and delivery_mode = 'PERSISTENT') and (tab.user_data.extract('/ns:NIFMessage/ns:payload/aee:analysisSequenceId', 'xmlns:ns="http://nif.llnl.gov" xmlns:aee="http://www.oracle.com/aq/XMLType"').getStringVal() like '%TAS_WFLZfocusToTCC_Sequencer%')
END OF STMT
EXEC #4:c=0,e=626,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394068956393
FETCH #4:c=69989,e=68532,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069024985
WAIT #6: nam='Streams AQ: waiting for messages in the queue' ela= 147396 queue id=73580 process#=4038849208 wait time=1 obj#=-1 tim=1220394069172592
EXEC #4:c=0,e=317,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069173174
FETCH #4:c=71990,e=70466,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069243668
ERROR #6:err=25228 tim=783947179
WAIT #6: nam='SQL*Net break/reset to client' ela= 6 driver break?=1 p3=0 obj#=-1 tim=1220394069244129
WAIT #6: nam='SQL*Net break/reset to client' ela= 271 driver break?=0 p3=0 obj#=-1 tim=1220394069244422
WAIT #6: nam='SQL*Net message to client' ela= 2 driver #bytes=1 p3=0 obj#=-1 tim=1220394069244444
WAIT #6: nam='SQL*Net message from client' ela= 807 driver #bytes=1 p3=0 obj#=-1 tim=1220394069245294
=====================
PARSING IN CURSOR #6 len=174 dep=0 uid=61 oct=47 lid=61 tim=1220394069245484 hv=562127231 ad='ef4b8648'
begin dbms_aqin.aq$_dequeue_in( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29); end;
END OF STMT
PARSE #6:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394069245479
EXEC #4:c=0,e=284,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069247543
FETCH #4:c=73988,e=72207,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069319777
EXEC #4:c=0,e=291,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069320320
WAIT #4: nam='latch: cache buffers chains' ela= 41 address=3983055408 number=122 tries=0 obj#=-1 tim=1220394069344643
FETCH #4:c=71989,e=70324,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069390667
WAIT #6: nam='Streams AQ: waiting for messages in the queue' ela= 978084 queue id=73580 process#=4038849208 wait time=1 obj#=-1 tim=1220394070368953
EXEC #4:c=0,e=322,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070369487
FETCH #4:c=71990,e=70318,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070439833
EXEC #6:c=220966,e=1194523,p=0,cr=1896,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394070440145

Resulting execution tree:

  Line 39: EXEC #4:c=0,e=626,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394068956393,hv=1020616855
  Line 40: FETCH #4:c=69989,e=68532,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069024985,hv=1020616855
  Line 42: EXEC #4:c=0,e=317,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069173174,hv=1020616855
  Line 43: FETCH #4:c=71990,e=70466,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069243668,hv=1020616855
  PARSE: cpu=          0 ela=          0
  EXEC: cpu=          0 ela=        943
  FETCH: cpu=     141979 ela=     138998
  Recursive: cpu=     141979 ela=     139941; Exclusive: cpu=    -141979 ela=    -139903
Line 53: PARSE #6:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394069245479,hv=562127231
  Line 54: EXEC #4:c=0,e=284,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069247543,hv=1020616855
  Line 55: FETCH #4:c=73988,e=72207,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069319777,hv=1020616855
  Line 56: EXEC #4:c=0,e=291,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069320320,hv=1020616855
  Line 58: FETCH #4:c=71989,e=70324,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069390667,hv=1020616855
  Line 60: EXEC #4:c=0,e=322,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070369487,hv=1020616855
  Line 61: FETCH #4:c=71990,e=70318,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070439833,hv=1020616855
  PARSE: cpu=          0 ela=          0
  EXEC: cpu=          0 ela=        897
  FETCH: cpu=     217967 ela=     212849
  Recursive: cpu=     217967 ela=     213746; Exclusive: cpu=       2999 ela=     980777
Line 62: EXEC #6:c=220966,e=1194523,p=0,cr=1896,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394070440145,hv=562127231
PARSE: cpu=          0 ela=         38
EXEC: cpu=     220966 ela=    1194523
FETCH: cpu=          0 ela=          0

As you can see, when you “go up a recursive level” the PARSE, EXEC and FETCH totals of the previous level are summed by category, totaled as recursive time, and the resulting exclusive time based on the parent statement’s time.

This whole concept of exclusive time becomes important when considering how to most accurately reflect “interval” time in the trace event timeline. My first attempt at categorizing time depended on only using non-recursive totals, that is recursive depth zero, PARSE, EXEC and FETCH statements along with event elapsed time. This had the convenience of not requiring complicated tracking of all the recursive time allocations, and with the proper disclosure was a perfectly valid method of creating the timeline. Of course there are some problems with only representing recursive depth zero time, the biggest one being tracing applications that are predominantly PL/SQL activity. This can lead to few, if any, recursive depth zero timings and can therefore create a misleading timeline. Another big problem is that the distribution of time can change significantly if the actual recursive PARSE, EXEC and FETCH is just lumped in to the parent statement’s recursive depth zero parse time rather than being categorized through each recursive level. A more accurate picture can be obtained by summarizing all recursive and non-recursive time for each time interval. I believe that this is an accurate representation of time from the standpoint of the entire trace file timeline, but can create some thorny issues for those trying to categorize time at the statement level. At the statement level we get into the issues that were mentioned in Part 1 of my Trace Event Timeline series.

To implement this new accuracy level requires that for each recursive level that we track the PARSE, EXEC and FETCH time of each statement’s recursive children along with that statement’s exclusive time, and then track that based on the interval time. This then provides the current totals for each time category as each statement, or wait event for that matter, is emitted. Since interval boundaries are checked as each trace file statement is encountered, when a boundary is crossed we now have an accounting of the current cpu time regardless of recursive depth. I will post a comparison of the trace event timeline that I first proposed and this new method of categorizing all recursive time shortly.

Trace Event Timeline, Part 3

August 11th, 2009

This is part 3 of my series discussing a new technique for analyzing extended SQL trace files that I’m calling a “Trace Event Timeline”. The timeline is presented in a stacked bar chart format based on non-uniform intervals. Each stacked bar is the sum of the time spent in each recorded event during that particular timed interval. The intervals are non-uniform because they are taken only at points in between either WAIT events or recursive depth zero PARSE, EXEC or FETCH statements. A target interval length is derived by dividing the total trace file duration by the number of specified intervals, currently 20. When the running timeline of the trace file exceeds the target interval a new interval is started.

The purpose of the technique is to provide more information about when events occurred during the tracing period. Having just a resource profile of the trace file leaves out information about the event duration skew. An example might be having a large amount of ”SQL*Net message from client time” that exists almost completely at the start and end of the trace period. This might be in contrast to having that relatively evenly distributed across the entire interval. In the first case it might simply be a “scoping” problem with the start and end of the trace whereas the second example might be part of the actual performance issue (i.e. long client side processing separated by row fetches from the database).

As I mentioned in part 1 of this series, the idea for an event timeline occurred to me while I looking at a process that would do a bunch of database work very fast and then pause for a bit and would then speed up again. All sorts of conjecture about database overload were made, but all indications from a workload perspective showed that the database server was barely doing any work at all. The following shows the resource profile from a trace taken during the period that this behavior occurred:

                               Total Event  % Event      Total  --------------- Cumulative ----------------
Event Name                      Time (sec)     Time     Events  Avg Time(sec)  Min Time(sec)  Max Time(sec)
----------------------------   -----------  -------   --------  -------------  -------------  -------------
SQL*Net message from client     255.000799    48.11    152,647       0.001671       0.000138       0.290118
CPU Service                     244.962734    46.22    864,046       0.000284       0.000000       0.054992
unaccounted-for                  18.943126     3.57
db file sequential read          10.345820     1.95      3,262       0.003172       0.000017       0.295407
SQL*Net message to client         0.364421     0.07    152,647       0.000002       0.000000       0.002573
log file switch completion        0.285168     0.05          4       0.071292       0.003946       0.239785
log file sync                     0.130075     0.02         28       0.004646       0.000005       0.017599
latch: library cache              0.002779     0.00         36       0.000077       0.000005       0.001123
latch: cache buffers chains       0.002359     0.00          8       0.000295       0.000005       0.000621
SQL*Net more data from client     0.001583     0.00        201       0.000008       0.000005       0.000051
buffer busy waits                 0.000276     0.00          6       0.000046       0.000017       0.000144
latch: session allocation         0.000082     0.00          1       0.000082       0.000082       0.000082
latch: shared pool                0.000062     0.00          1       0.000062       0.000062       0.000062
SQL*Net more data to client       0.000034     0.00          1       0.000034       0.000034       0.000034
-----------------------------  -----------  -------  ---------  -------------  -------------  -------------
Total                           530.039318   100.00  1,172,888

Notice that 94% of the time was spent almost equally in just two events, “SQL*Net message from client” and “CPU service”. One might assume from looking at this resource profile, and even the individual events with a tkprof output, that the “SQL*Net message from client” could just be a scoping problem. Perhaps we started the trace too early or ended it too late (i.e. before the process started or after it ended).[1] However, from the trace event timeline shown below we see that this is not the case at all:

ora_20172_9294129

What we do see is that the “EXEC” event (i.e. CPU service for EXEC calls) dominates the timeline and then virtually stops for a short period of time, when we see the “SQL*Net message from client” event accumulate most of the time, and then the “EXEC” event takes over again. In fact, this mirrors the behavior that was observed during the actual running of the process, and from this information we can be pretty sure that the delay or “slowdown” is not happening because of the database. If there were contention or some resource shortage we would see that event and not the “SQL*Net message from client” event.

Hopefully I have provided a reasonably clear picture of what a trace event timeline is, how it can be created, and what it can be used for. We’ll have to see if this becomes a useful technique for those problems that are obscured by the skew of event time across a trace file timeline.


[1] Typically an “idle” database process will accumulate “SQL*Net message from client” time waiting for work to be initiated by the client connection.

Trace Event Timeline, Part 2

August 10th, 2009

In Part 2 of this post I’m going to describe how the trace event timeline is created. I think it’s important to discuss which events are being captured and how the intervals are derived since it affects the output of the graph and of course, the validity of the technique. There will also be a Part 3 which will discuss the event timeline graph and how it may be used.

There are several issues that crop up when thinking about how to slice a trace file into intervals and what is valid to use as event totals. When I initially came up with the idea of an event timeline I posed the question to Jeff Holt at Method-R.com in their forum as to what he thought about the idea. He raised these same issues, and suffice it to say that trying to do this at the “statement” level would be very hard. Reconciling the recursive SQL and trying to account for the time across multiple intervals was beyond the scope of what I was trying to achieve. That’s the beauty of the Interval Resource Profiler. It only summarizes the entire trace file or interval, and it does not attempt the much more detailed analysis of statement level attribution. This makes it fairly easy from a scripting perspective to use non-uniform intervals based on statement timings for wait events and recursive depth zero parse, execute and fetch statements.

Now for some background on the current state of extended SQL trace files. As of 11.1.0.6 a SQL trace file looks something like this (this is just a small part of one trace file):

EXEC #6:c=0,e=83,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1219708745416872
WAIT #6: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745416904
FETCH #6:c=0,e=84,p=0,cr=8,cu=0,mis=0,r=1,dep=0,og=1,tim=1219708745417024
STAT #6 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=8 pr=0 pw=0 time=0 us cost=5 size=89 card=1)'
STAT #6 cnt=1 pid=1 pos=1 obj=71003 op='TABLE ACCESS BY INDEX ROWID ODM_PUBLICOBJECT (cr=4 pr=0 pw=0 time=0 us cost=3 size=73 card=1)'
STAT #6 cnt=1 pid=2 pos=1 obj=71009 op='INDEX UNIQUE SCAN SYS_C009848 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=1)'
STAT #6 cnt=1 pid=1 pos=2 obj=70836 op='TABLE ACCESS BY INDEX ROWID ODM_DOCUMENT (cr=4 pr=0 pw=0 time=0 us cost=2 size=16 card=1)'
STAT #6 cnt=1 pid=4 pos=1 obj=70839 op='INDEX UNIQUE SCAN SYS_C009781 (cr=3 pr=0 pw=0 time=0 us cost=1 size=0 card=1)'
WAIT #6: nam='SQL*Net message from client' ela= 1202 driver #bytes=1 p3=0 obj#=-1 tim=1219708745418336
FETCH #6:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1219708745418358
WAIT #6: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745418375
WAIT #6: nam='SQL*Net message from client' ela= 87423 driver #bytes=1 p3=0 obj#=-1 tim=1219708745505813
EXEC #3:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1219708745505944
WAIT #3: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745506104
FETCH #3:c=0,e=174,p=0,cr=11,cu=0,mis=0,r=3,dep=0,og=1,tim=1219708745506134
WAIT #3: nam='SQL*Net message from client' ela= 2607 driver #bytes=1 p3=0 obj#=-1 tim=1219708745508784

Oracle continues to add events and fields but the basic trace file format remains essentially the same (see Metalink note 39817.1). The first thing to notice is that the WAIT, PARSE, EXEC and FETCH statements all have “tim=” values. These values provide the basis for a running timeline for the interval being traced (see Optimizing Oracle Performance for the best description of a trace file). Unfortunately the “tim=” values for the WAIT statements do not yet appear to make sense in the overall timing (this is apparently due to Oracle bug 8342329), and aren’t available in older releases. However this can be dealt with by keeping a running timeline and using the “ela=” or elapsed time values to supplement the timeline between PARSE, EXEC and FETCH statements. This is, in fact, the approach taken with the Interval Resource Profiler.

Now, what about slicing a trace file into intervals? Given that we’re only interested in the entire trace file interval and that we have a running timeline, we can create non-uniform intervals as long as we don’t “double count” recursive CPU time (again see Optimizing Oracle Performance for a description of double counting recursive statements). That leaves us with creating our interval “splits” between WAIT events or recursive depth zero PARSE, EXEC and FETCH statements. The side effect of this is that the event totals for each interval may or may not be the same, hence the non-uniform part. This ultimately affects the graph representation as well, and I’ll cover that in part 3. Adding this code to the Interval Resource Profiler was actually pretty easy since the timeline tracking and event summarization was already there.

The basic timeline tracking consists of checking to see if the start time has been set for statements with a “tim=” value that are not WAIT events. If the start time has not been set then the timeline is set to the current time minus any previously recorded WAIT event time. If it has been set then if the current time is less than the start time the start time is reset to the current time. Otherwise the end time is set (the normal situation) and a check is made to see if we’ve passed over an event time boundary, and if so then a new boundary time is set.

Timeline

#
# Track interval time start and end
#
if ($time_start == 0) {
  $time_start = ($8 - $time_offset);
  $time_offset = 0;
  $time_end = $8;
  $time_skew = $time_end;
}
elsif ($8 < $time_start) {
  $time_start = $8;
}
else {
  $time_end = $8;
  #
  # Adjust the skew interval
  #
  if (($time_end-$time_skew) > $time_skew_intvl) {
    $time_skew = $time_end;
  }
}
#

If a PARSE, EXEC or FETCH statement and the recursive depth is zero (i.e. dep = 0) then the CPU time “c=” is added to the appropriate event total for this interval (i.e. PARSE, EXEC or FETCH).

PARSE|EXEC|FETCH

#
# Track the interval events for P|E|F dep=0
#
if ($10 == 0) {
  my $cmd = "'" . $1 . "'";
  if (defined $intvl_skew->{$time_skew}->{$cmd}) {
    $intvl_skew->{$time_skew}->{$cmd} += $3;
  }
  else {
    $intvl_skew->{$time_skew}->{$cmd} = $3;
  }
}
#

If this is a WAIT statement then the timeline consists of either adding the time to the “time_offset”, if the start time hasn’t been set, or adding it to the end time (in case no more tim= values are seen). The end time is then checked to see if we’ve passed  over an event time boundary and if so then we set a new one.


WAITs

#
if ($time_start == 0) {
  $time_offset = $ela;
}
$time_end += $ela;
#
# Adjust the skew interval
#
if (($time_end-$time_skew) > $time_skew_intvl) {
  $time_skew = $time_end;
}
#

The elapsed time (i.e. “ela=”) is then added to the event time for this interval.

#
# Track the interval events
#
if (defined $intvl_skew->{$time_skew}->{$nam}) {
  $intvl_skew->{$time_skew}->{$nam} += $ela;
}
else {
  $intvl_skew->{$time_skew}->{$nam} = $ela;
}
#

The only hard part left is determining what the base interval time should be for the event timeline. The Interval Resource Profiler is a two pass parser, so currently the event timeline interval is calculated in pass one by taking the entire trace file interval time and dividing it by the desired number of event intervals, currently 20. Then in pass 2 the previously described logic is applied to calculate the event timeline using the interval as $time_skew_intvl.

Once the event timeline has been calculated, it’s just a matter of graphing the results. This can be done in Perl using the GD Graphics Library package, this was how the graph in part 1 was created, or the results can be output to a CSV file and input into an Excel spreadsheet. In either case a pivot on the data must be performed in order to get the event time as a column for each interval row in order to create the graph. This will be part of the discussion of Part 3.

Trace Event Timeline, Part 1

August 7th, 2009

I was working on a problem recently where a process was “slow”. So naturally I got one of the DBAs to trace a sample of one of the runs and we took a look at it. The interesting part was that it was about half and half between SQL*Net message from client and CPU (EXEC statements). It turns out that the process would do a bunch of database work very fast and then just pause, issuing statements but much more slowly, and then it would speed up again. Naturally everyone blamed the database, but from the trace file one might assume that it was at least partly a client issue. The problem was that from the trace file there was no way to tell that the behavior was “skewed”, running fast then slowing down then running fast again.

This led me to thinking that it would be very handy in some cases to know a little bit more about when certain events occur within a traced session. Some of our famous brethren will usually say that that’s just a scoping problem, and while that is probably true, in the real world it’s not always so obvious, or easy to do something about. So I decided to modify the AppsDBA Interval Resource Profiler to see if I could create an “Event Timeline”. The following is the result:

cmsdev1_ora_1825_3507313

The idea is that the graph displays the event totals in a timeline as they occurred. It seems to make it easier to determine roughly when actions took place. In other words, is there event skew? Not skew for a particular event, which is what the Hotsos/Method-R Profiler gives you, but across the entire trace interval. The timeline should help determine if all of the activity took place in a small set of intervals, or evenly across the whole interval, or perhaps like my original problem in spurts. The resource profile that goes along with this graph looks like the following:


                          Total Event  % Event    Total  --------------- Cumulative ----------------
Event Name                 Time (sec)     Time   Events  Avg Time(sec)  Min Time(sec)  Max Time(sec)
------------------------  -----------  -------  -------  -------------  -------------  -------------
SQL*Net message from         0.801698    77.92      262       0.003060       0.000280       0.150290
client
SQL*Net more data from       0.132364    12.87      520       0.000255       0.000003       0.019935
client
CPU Service                  0.058989     5.73      478       0.000123       0.000000       0.010998
unaccounted-for              0.033580     3.26
library cache pin            0.000746     0.07        4       0.000187       0.000147       0.000254
library cache lock           0.000610     0.06        3       0.000203       0.000161       0.000237
gc current grant busy        0.000563     0.05        1       0.000563       0.000563       0.000563
SQL*Net message to           0.000285     0.03      262       0.000001       0.000000       0.000031
Client
SQL*Net more data to         0.000016     0.00        1       0.000016       0.000016       0.000016
client
------------------------  -----------  -------  -------  -------------  -------------  -------------
Total                        1.028851   100.00    1,531

I’ll have more to say about this in the near future.