11g Release 2 RAC Observations

December 4th, 2009

So I’ve done a couple of 11gR2 RAC installs and an upgrade and I don’t think one of them has been the same. The two big things so far have been SCAN addresses and the OCR and voting disk placement.

SCAN addresses are Oracle’s new way of solving the “connection” and load balancing problems inherent with VIPs. For installs it’s required, but on upgrades it’s optional. Quite a difference and I think it should be optional for installs as well. That way it allows for an implementation once it’s fully understood.

As for the OCR and voting disk(s) Oracle has made it possible to put both in ASM. This is great if you’re running just 11gR2, but if you’re upgrading a cluster with any other releases then the OS utilities like “srvctl” won’t work because they can’t read the OCR, and you can’t use the 11.2 versions on older releases. If this is your situation then you have to leave the OCR and voting disks where they are and can’t take advantage of this new feature.

Rates and Waits Revisited

December 4th, 2009

Recently I’ve been reminded about why it’s so important to consider the rate of work in addition to any “wait events” when optimizing response time. I first coined the term “Rates and Waits” for the paper, “Oracle Workload Management Using Time Based Optimization Techniques” way back in 2003, and the point I was making then, as now, is that it’s the rate of work performed that’s really important.

After all, service time in the formula “response time = service time + wait time”[1] is all about the rate of work performed. The trap of considering only wait events for “tuning” is just that, a trap. Deciding between important wait events or so called idle wait events, and trying to determine the real impact of them is impossible unless the interval being measured and how much work was performed is known. Why? Because there is no context to measure against. Without knowing the impact of the time measured there is no way to judge the time spent waiting. If the interval was 1 second then waiting for 0.5 seconds is probably significant, but if the interval was 10 seconds then it probably isn’t. This is why time based profiles are so important.

When considering system level workload, rates become even more important, since wait events at the system level are essentially useless. Recall that at the system level there is an infinite capacity to wait. After all, more than one process can wait on something, but each service (whether CPU, I/O or network) can only service one request at a time. With system level workload it’s all about capacity and throughput. Capacity is the maximum amount of “service” that a system or resource can perform and throughput is the measure of that service.


[1] See YAPP or Gunther or pretty much any performance material for a further explanation of R = S + W.

Trace Event Timeline, Wrap Up

October 11th, 2009

I’ve published a new utility to generate a trace file event timeline. It’s available on appsdba.com here. I’ve also updated the original interval resource profiler with the same code as the event profiler utility so that it displays CPU events separated by PARSE, EXEC and FETCH with full recursive accounting. The previous version grouped all CPU time into one “event” called “CPU Service” and only distinguished between non-recursive (e.g. dep=0) and recursive (e.g. dep>0) time. The new utility also sorts events, which my original attempt did not, but so far I have not seen a significant difference in the recursive time accounting. The event sort has changed the look of the graphs slightly from the originals, so comparisons are a little more difficult. The time accounting is correct so far, so it looks like this version is accurate. I will be publishing a new utility shortly that can generate the “execution tree” that I showed in my previous article “Trace Event Timeline -  Recursive, Non-Recursive and Exclusive Time“. This utility makes it easier to reconcile time to insure proper time accounting.

As for the graph comparison? Here’s the new view of the same graph that I showed in Part 3 of this trace event timeline series.

new_20172

Trace Event Timeline – Recursive, Non-Recursive and Exclusive Time

August 30th, 2009

Non-recursive and recursive cpu and elapsed time in extended SQL trace files are distinguished by whether the “dep=” value for PARSE, EXEC and FETCH statements is zero (i.e. non-recursive) or greater than zero (i.e. recursive). Millsap and Holt in Optimizing Oracle Performance [Pgs. 90-94] do an excellent job explaining the differentiation, and it is important in profiling resource time since all recursive time is invoked by non-recursive parent statements. What is typically not discussed, and is a little more complicated to categorize, is something called exclusive time. Exclusive time is simply the time that can be directly attributed to executing a particular PARSE, EXEC or FETCH statement. In other words it is the statement time less any recursive time incurred for that statement. It’s actually a little more complicated in that what we really have for each motivating statement is four time categories, the exclusive time for that statement and any recursive PARSE, EXEC or FETCH time. However, this is a recursive, no pun intended, relationship for each recursive PARSE, EXEC or FECTH statement since they may also have recursive children. In order to make the time accounting a little easier I wrote what I call an “execution tree” for PARSE, EXEC or FETCH statements in a trace file. It basically creates an indented tree listing based on recursive level and sums the four types of time when a parent recursive level is reached. This makes it much easier to determine the proper time accounting.

Partial trace file:


=====================

PARSING IN CURSOR #4 len=740 dep=1 uid=61 oct=3 lid=61 tim=1220394068956400 hv=1020616855 ad='e3eaf9e0'
select  tab.row_id, tab.msgid, tab.corrid, tab.priority, tab.delay,   tab.expiration, tab.retry_count, tab.exception_qschema,   tab.exception_queue, tab.chain_no, tab.local_order_no, tab.enq_time,   tab.time_manager_info, tab.state, tab.enq_tid, tab.step_no,   tab.sender_name, tab.sender_address, tab.sender_protocol,   tab.dequeue_msgid, tab.user_prop, tab.user_data   from "AE2"."AQ$_ANALYSIS_ENGINE_IN_F" tab  where (subscriber_id = :1 and subscriber_name = :2 and queue_evtid = :3         and delivery_mode = 'PERSISTENT') and (tab.user_data.extract('/ns:NIFMessage/ns:payload/aee:analysisSequenceId', 'xmlns:ns="http://nif.llnl.gov" xmlns:aee="http://www.oracle.com/aq/XMLType"').getStringVal() like '%TAS_WFLZfocusToTCC_Sequencer%')
END OF STMT
EXEC #4:c=0,e=626,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394068956393
FETCH #4:c=69989,e=68532,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069024985
WAIT #6: nam='Streams AQ: waiting for messages in the queue' ela= 147396 queue id=73580 process#=4038849208 wait time=1 obj#=-1 tim=1220394069172592
EXEC #4:c=0,e=317,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069173174
FETCH #4:c=71990,e=70466,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069243668
ERROR #6:err=25228 tim=783947179
WAIT #6: nam='SQL*Net break/reset to client' ela= 6 driver break?=1 p3=0 obj#=-1 tim=1220394069244129
WAIT #6: nam='SQL*Net break/reset to client' ela= 271 driver break?=0 p3=0 obj#=-1 tim=1220394069244422
WAIT #6: nam='SQL*Net message to client' ela= 2 driver #bytes=1 p3=0 obj#=-1 tim=1220394069244444
WAIT #6: nam='SQL*Net message from client' ela= 807 driver #bytes=1 p3=0 obj#=-1 tim=1220394069245294
=====================
PARSING IN CURSOR #6 len=174 dep=0 uid=61 oct=47 lid=61 tim=1220394069245484 hv=562127231 ad='ef4b8648'
begin dbms_aqin.aq$_dequeue_in( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29); end;
END OF STMT
PARSE #6:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394069245479
EXEC #4:c=0,e=284,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069247543
FETCH #4:c=73988,e=72207,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069319777
EXEC #4:c=0,e=291,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069320320
WAIT #4: nam='latch: cache buffers chains' ela= 41 address=3983055408 number=122 tries=0 obj#=-1 tim=1220394069344643
FETCH #4:c=71989,e=70324,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069390667
WAIT #6: nam='Streams AQ: waiting for messages in the queue' ela= 978084 queue id=73580 process#=4038849208 wait time=1 obj#=-1 tim=1220394070368953
EXEC #4:c=0,e=322,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070369487
FETCH #4:c=71990,e=70318,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070439833
EXEC #6:c=220966,e=1194523,p=0,cr=1896,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394070440145

Resulting execution tree:

  Line 39: EXEC #4:c=0,e=626,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394068956393,hv=1020616855
  Line 40: FETCH #4:c=69989,e=68532,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069024985,hv=1020616855
  Line 42: EXEC #4:c=0,e=317,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069173174,hv=1020616855
  Line 43: FETCH #4:c=71990,e=70466,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069243668,hv=1020616855
  PARSE: cpu=          0 ela=          0
  EXEC: cpu=          0 ela=        943
  FETCH: cpu=     141979 ela=     138998
  Recursive: cpu=     141979 ela=     139941; Exclusive: cpu=    -141979 ela=    -139903
Line 53: PARSE #6:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394069245479,hv=562127231
  Line 54: EXEC #4:c=0,e=284,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069247543,hv=1020616855
  Line 55: FETCH #4:c=73988,e=72207,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069319777,hv=1020616855
  Line 56: EXEC #4:c=0,e=291,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069320320,hv=1020616855
  Line 58: FETCH #4:c=71989,e=70324,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394069390667,hv=1020616855
  Line 60: EXEC #4:c=0,e=322,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070369487,hv=1020616855
  Line 61: FETCH #4:c=71990,e=70318,p=0,cr=630,cu=0,mis=0,r=0,dep=1,og=1,tim=1220394070439833,hv=1020616855
  PARSE: cpu=          0 ela=          0
  EXEC: cpu=          0 ela=        897
  FETCH: cpu=     217967 ela=     212849
  Recursive: cpu=     217967 ela=     213746; Exclusive: cpu=       2999 ela=     980777
Line 62: EXEC #6:c=220966,e=1194523,p=0,cr=1896,cu=0,mis=0,r=0,dep=0,og=1,tim=1220394070440145,hv=562127231
PARSE: cpu=          0 ela=         38
EXEC: cpu=     220966 ela=    1194523
FETCH: cpu=          0 ela=          0

As you can see, when you “go up a recursive level” the PARSE, EXEC and FETCH totals of the previous level are summed by category, totaled as recursive time, and the resulting exclusive time based on the parent statement’s time.

This whole concept of exclusive time becomes important when considering how to most accurately reflect “interval” time in the trace event timeline. My first attempt at categorizing time depended on only using non-recursive totals, that is recursive depth zero, PARSE, EXEC and FETCH statements along with event elapsed time. This had the convenience of not requiring complicated tracking of all the recursive time allocations, and with the proper disclosure was a perfectly valid method of creating the timeline. Of course there are some problems with only representing recursive depth zero time, the biggest one being tracing applications that are predominantly PL/SQL activity. This can lead to few, if any, recursive depth zero timings and can therefore create a misleading timeline. Another big problem is that the distribution of time can change significantly if the actual recursive PARSE, EXEC and FETCH is just lumped in to the parent statement’s recursive depth zero parse time rather than being categorized through each recursive level. A more accurate picture can be obtained by summarizing all recursive and non-recursive time for each time interval. I believe that this is an accurate representation of time from the standpoint of the entire trace file timeline, but can create some thorny issues for those trying to categorize time at the statement level. At the statement level we get into the issues that were mentioned in Part 1 of my Trace Event Timeline series.

To implement this new accuracy level requires that for each recursive level that we track the PARSE, EXEC and FETCH time of each statement’s recursive children along with that statement’s exclusive time, and then track that based on the interval time. This then provides the current totals for each time category as each statement, or wait event for that matter, is emitted. Since interval boundaries are checked as each trace file statement is encountered, when a boundary is crossed we now have an accounting of the current cpu time regardless of recursive depth. I will post a comparison of the trace event timeline that I first proposed and this new method of categorizing all recursive time shortly.

Trace Event Timeline, Part 3

August 11th, 2009

This is part 3 of my series discussing a new technique for analyzing extended SQL trace files that I’m calling a “Trace Event Timeline”. The timeline is presented in a stacked bar chart format based on non-uniform intervals. Each stacked bar is the sum of the time spent in each recorded event during that particular timed interval. The intervals are non-uniform because they are taken only at points in between either WAIT events or recursive depth zero PARSE, EXEC or FETCH statements. A target interval length is derived by dividing the total trace file duration by the number of specified intervals, currently 20. When the running timeline of the trace file exceeds the target interval a new interval is started.

The purpose of the technique is to provide more information about when events occurred during the tracing period. Having just a resource profile of the trace file leaves out information about the event duration skew. An example might be having a large amount of ”SQL*Net message from client time” that exists almost completely at the start and end of the trace period. This might be in contrast to having that relatively evenly distributed across the entire interval. In the first case it might simply be a “scoping” problem with the start and end of the trace whereas the second example might be part of the actual performance issue (i.e. long client side processing separated by row fetches from the database).

As I mentioned in part 1 of this series, the idea for an event timeline occurred to me while I looking at a process that would do a bunch of database work very fast and then pause for a bit and would then speed up again. All sorts of conjecture about database overload were made, but all indications from a workload perspective showed that the database server was barely doing any work at all. The following shows the resource profile from a trace taken during the period that this behavior occurred:

                               Total Event  % Event      Total  --------------- Cumulative ----------------
Event Name                      Time (sec)     Time     Events  Avg Time(sec)  Min Time(sec)  Max Time(sec)
----------------------------   -----------  -------   --------  -------------  -------------  -------------
SQL*Net message from client     255.000799    48.11    152,647       0.001671       0.000138       0.290118
CPU Service                     244.962734    46.22    864,046       0.000284       0.000000       0.054992
unaccounted-for                  18.943126     3.57
db file sequential read          10.345820     1.95      3,262       0.003172       0.000017       0.295407
SQL*Net message to client         0.364421     0.07    152,647       0.000002       0.000000       0.002573
log file switch completion        0.285168     0.05          4       0.071292       0.003946       0.239785
log file sync                     0.130075     0.02         28       0.004646       0.000005       0.017599
latch: library cache              0.002779     0.00         36       0.000077       0.000005       0.001123
latch: cache buffers chains       0.002359     0.00          8       0.000295       0.000005       0.000621
SQL*Net more data from client     0.001583     0.00        201       0.000008       0.000005       0.000051
buffer busy waits                 0.000276     0.00          6       0.000046       0.000017       0.000144
latch: session allocation         0.000082     0.00          1       0.000082       0.000082       0.000082
latch: shared pool                0.000062     0.00          1       0.000062       0.000062       0.000062
SQL*Net more data to client       0.000034     0.00          1       0.000034       0.000034       0.000034
-----------------------------  -----------  -------  ---------  -------------  -------------  -------------
Total                           530.039318   100.00  1,172,888

Notice that 94% of the time was spent almost equally in just two events, “SQL*Net message from client” and “CPU service”. One might assume from looking at this resource profile, and even the individual events with a tkprof output, that the “SQL*Net message from client” could just be a scoping problem. Perhaps we started the trace too early or ended it too late (i.e. before the process started or after it ended).[1] However, from the trace event timeline shown below we see that this is not the case at all:

ora_20172_9294129

What we do see is that the “EXEC” event (i.e. CPU service for EXEC calls) dominates the timeline and then virtually stops for a short period of time, when we see the “SQL*Net message from client” event accumulate most of the time, and then the “EXEC” event takes over again. In fact, this mirrors the behavior that was observed during the actual running of the process, and from this information we can be pretty sure that the delay or “slowdown” is not happening because of the database. If there were contention or some resource shortage we would see that event and not the “SQL*Net message from client” event.

Hopefully I have provided a reasonably clear picture of what a trace event timeline is, how it can be created, and what it can be used for. We’ll have to see if this becomes a useful technique for those problems that are obscured by the skew of event time across a trace file timeline.


[1] Typically an “idle” database process will accumulate “SQL*Net message from client” time waiting for work to be initiated by the client connection.

Trace Event Timeline, Part 2

August 10th, 2009

In Part 2 of this post I’m going to describe how the trace event timeline is created. I think it’s important to discuss which events are being captured and how the intervals are derived since it affects the output of the graph and of course, the validity of the technique. There will also be a Part 3 which will discuss the event timeline graph and how it may be used.

There are several issues that crop up when thinking about how to slice a trace file into intervals and what is valid to use as event totals. When I initially came up with the idea of an event timeline I posed the question to Jeff Holt at Method-R.com in their forum as to what he thought about the idea. He raised these same issues, and suffice it to say that trying to do this at the “statement” level would be very hard. Reconciling the recursive SQL and trying to account for the time across multiple intervals was beyond the scope of what I was trying to achieve. That’s the beauty of the Interval Resource Profiler. It only summarizes the entire trace file or interval, and it does not attempt the much more detailed analysis of statement level attribution. This makes it fairly easy from a scripting perspective to use non-uniform intervals based on statement timings for wait events and recursive depth zero parse, execute and fetch statements.

Now for some background on the current state of extended SQL trace files. As of 11.1.0.6 a SQL trace file looks something like this (this is just a small part of one trace file):

EXEC #6:c=0,e=83,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1219708745416872
WAIT #6: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745416904
FETCH #6:c=0,e=84,p=0,cr=8,cu=0,mis=0,r=1,dep=0,og=1,tim=1219708745417024
STAT #6 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=8 pr=0 pw=0 time=0 us cost=5 size=89 card=1)'
STAT #6 cnt=1 pid=1 pos=1 obj=71003 op='TABLE ACCESS BY INDEX ROWID ODM_PUBLICOBJECT (cr=4 pr=0 pw=0 time=0 us cost=3 size=73 card=1)'
STAT #6 cnt=1 pid=2 pos=1 obj=71009 op='INDEX UNIQUE SCAN SYS_C009848 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=1)'
STAT #6 cnt=1 pid=1 pos=2 obj=70836 op='TABLE ACCESS BY INDEX ROWID ODM_DOCUMENT (cr=4 pr=0 pw=0 time=0 us cost=2 size=16 card=1)'
STAT #6 cnt=1 pid=4 pos=1 obj=70839 op='INDEX UNIQUE SCAN SYS_C009781 (cr=3 pr=0 pw=0 time=0 us cost=1 size=0 card=1)'
WAIT #6: nam='SQL*Net message from client' ela= 1202 driver #bytes=1 p3=0 obj#=-1 tim=1219708745418336
FETCH #6:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1219708745418358
WAIT #6: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745418375
WAIT #6: nam='SQL*Net message from client' ela= 87423 driver #bytes=1 p3=0 obj#=-1 tim=1219708745505813
EXEC #3:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1219708745505944
WAIT #3: nam='SQL*Net message to client' ela= 1 driver #bytes=1 p3=0 obj#=-1 tim=1219708745506104
FETCH #3:c=0,e=174,p=0,cr=11,cu=0,mis=0,r=3,dep=0,og=1,tim=1219708745506134
WAIT #3: nam='SQL*Net message from client' ela= 2607 driver #bytes=1 p3=0 obj#=-1 tim=1219708745508784

Oracle continues to add events and fields but the basic trace file format remains essentially the same (see Metalink note 39817.1). The first thing to notice is that the WAIT, PARSE, EXEC and FETCH statements all have “tim=” values. These values provide the basis for a running timeline for the interval being traced (see Optimizing Oracle Performance for the best description of a trace file). Unfortunately the “tim=” values for the WAIT statements do not yet appear to make sense in the overall timing (this is apparently due to Oracle bug 8342329), and aren’t available in older releases. However this can be dealt with by keeping a running timeline and using the “ela=” or elapsed time values to supplement the timeline between PARSE, EXEC and FETCH statements. This is, in fact, the approach taken with the Interval Resource Profiler.

Now, what about slicing a trace file into intervals? Given that we’re only interested in the entire trace file interval and that we have a running timeline, we can create non-uniform intervals as long as we don’t “double count” recursive CPU time (again see Optimizing Oracle Performance for a description of double counting recursive statements). That leaves us with creating our interval “splits” between WAIT events or recursive depth zero PARSE, EXEC and FETCH statements. The side effect of this is that the event totals for each interval may or may not be the same, hence the non-uniform part. This ultimately affects the graph representation as well, and I’ll cover that in part 3. Adding this code to the Interval Resource Profiler was actually pretty easy since the timeline tracking and event summarization was already there.

The basic timeline tracking consists of checking to see if the start time has been set for statements with a “tim=” value that are not WAIT events. If the start time has not been set then the timeline is set to the current time minus any previously recorded WAIT event time. If it has been set then if the current time is less than the start time the start time is reset to the current time. Otherwise the end time is set (the normal situation) and a check is made to see if we’ve passed over an event time boundary, and if so then a new boundary time is set.

Timeline

#
# Track interval time start and end
#
if ($time_start == 0) {
  $time_start = ($8 - $time_offset);
  $time_offset = 0;
  $time_end = $8;
  $time_skew = $time_end;
}
elsif ($8 < $time_start) {
  $time_start = $8;
}
else {
  $time_end = $8;
  #
  # Adjust the skew interval
  #
  if (($time_end-$time_skew) > $time_skew_intvl) {
    $time_skew = $time_end;
  }
}
#

If a PARSE, EXEC or FETCH statement and the recursive depth is zero (i.e. dep = 0) then the CPU time “c=” is added to the appropriate event total for this interval (i.e. PARSE, EXEC or FETCH).

PARSE|EXEC|FETCH

#
# Track the interval events for P|E|F dep=0
#
if ($10 == 0) {
  my $cmd = "'" . $1 . "'";
  if (defined $intvl_skew->{$time_skew}->{$cmd}) {
    $intvl_skew->{$time_skew}->{$cmd} += $3;
  }
  else {
    $intvl_skew->{$time_skew}->{$cmd} = $3;
  }
}
#

If this is a WAIT statement then the timeline consists of either adding the time to the “time_offset”, if the start time hasn’t been set, or adding it to the end time (in case no more tim= values are seen). The end time is then checked to see if we’ve passed  over an event time boundary and if so then we set a new one.


WAITs

#
if ($time_start == 0) {
  $time_offset = $ela;
}
$time_end += $ela;
#
# Adjust the skew interval
#
if (($time_end-$time_skew) > $time_skew_intvl) {
  $time_skew = $time_end;
}
#

The elapsed time (i.e. “ela=”) is then added to the event time for this interval.

#
# Track the interval events
#
if (defined $intvl_skew->{$time_skew}->{$nam}) {
  $intvl_skew->{$time_skew}->{$nam} += $ela;
}
else {
  $intvl_skew->{$time_skew}->{$nam} = $ela;
}
#

The only hard part left is determining what the base interval time should be for the event timeline. The Interval Resource Profiler is a two pass parser, so currently the event timeline interval is calculated in pass one by taking the entire trace file interval time and dividing it by the desired number of event intervals, currently 20. Then in pass 2 the previously described logic is applied to calculate the event timeline using the interval as $time_skew_intvl.

Once the event timeline has been calculated, it’s just a matter of graphing the results. This can be done in Perl using the GD Graphics Library package, this was how the graph in part 1 was created, or the results can be output to a CSV file and input into an Excel spreadsheet. In either case a pivot on the data must be performed in order to get the event time as a column for each interval row in order to create the graph. This will be part of the discussion of Part 3.

Trace Event Timeline, Part 1

August 7th, 2009

I was working on a problem recently where a process was “slow”. So naturally I got one of the DBAs to trace a sample of one of the runs and we took a look at it. The interesting part was that it was about half and half between SQL*Net message from client and CPU (EXEC statements). It turns out that the process would do a bunch of database work very fast and then just pause, issuing statements but much more slowly, and then it would speed up again. Naturally everyone blamed the database, but from the trace file one might assume that it was at least partly a client issue. The problem was that from the trace file there was no way to tell that the behavior was “skewed”, running fast then slowing down then running fast again.

This led me to thinking that it would be very handy in some cases to know a little bit more about when certain events occur within a traced session. Some of our famous brethren will usually say that that’s just a scoping problem, and while that is probably true, in the real world it’s not always so obvious, or easy to do something about. So I decided to modify the AppsDBA Interval Resource Profiler to see if I could create an “Event Timeline”. The following is the result:

cmsdev1_ora_1825_3507313

The idea is that the graph displays the event totals in a timeline as they occurred. It seems to make it easier to determine roughly when actions took place. In other words, is there event skew? Not skew for a particular event, which is what the Hotsos/Method-R Profiler gives you, but across the entire trace interval. The timeline should help determine if all of the activity took place in a small set of intervals, or evenly across the whole interval, or perhaps like my original problem in spurts. The resource profile that goes along with this graph looks like the following:


                          Total Event  % Event    Total  --------------- Cumulative ----------------
Event Name                 Time (sec)     Time   Events  Avg Time(sec)  Min Time(sec)  Max Time(sec)
------------------------  -----------  -------  -------  -------------  -------------  -------------
SQL*Net message from         0.801698    77.92      262       0.003060       0.000280       0.150290
client
SQL*Net more data from       0.132364    12.87      520       0.000255       0.000003       0.019935
client
CPU Service                  0.058989     5.73      478       0.000123       0.000000       0.010998
unaccounted-for              0.033580     3.26
library cache pin            0.000746     0.07        4       0.000187       0.000147       0.000254
library cache lock           0.000610     0.06        3       0.000203       0.000161       0.000237
gc current grant busy        0.000563     0.05        1       0.000563       0.000563       0.000563
SQL*Net message to           0.000285     0.03      262       0.000001       0.000000       0.000031
Client
SQL*Net more data to         0.000016     0.00        1       0.000016       0.000016       0.000016
client
------------------------  -----------  -------  -------  -------------  -------------  -------------
Total                        1.028851   100.00    1,531

I’ll have more to say about this in the near future.

Protecting Instrumentation With Conditional Compilation

October 5th, 2008

Conditional compilation was a feature added to Oracle 10gR2 for PL/SQL, but appears to be available in releases 9.2 and 10.1. It seems like a cool feature but I had never needed it for anything that I could think of. However as I’ve spent more and more time writing PL/SQL lately I’ve realized that it is an excellent way of protecting instrumentation that may be dependent on external calls. My example involves the Hotsos Instrumentation Library for Oracle (ILO) which I have been using a lot lately. Without conditional compilation, code written that calls the ILO will fail to compile if the ILO hasn’t been installed. However with conditional compilation the code can be “hidden” and can be used without change even if the ILO has not been installed. The following will show an example. I’ve taken the liberty of renaming the HOTSOS_ILO_TASK call to HOTSOS1_ILO_TASK simply because the package is installed in my test database.

My first example attempts to compile my procedure after setting my selection directive (i.e. hotsos_ilo) to true. The code will fail to compile due to the lack of the HOTSOS1_ILO_TASK package.

SQL> alter session set plsql_ccflags = 'hotsos_ilo:true';

Session altered.

SQL> create or replace procedure test_cond
2 is
3 begin
4 $IF $$hotsos_ilo $THEN
5 HOTSOS1_ILO_TASK.BEGIN_TASK(
6 module => 'Set Module',
7 action => 'Set Action');
8 $END
9 --
10 dbms_lock.sleep(5);
11 --
12 $IF $$hotsos_ilo $THEN
13 HOTSOS1_ILO_TASK.END_TASK;
14 $END
15 end;
16 /

Warning: Procedure created with compilation errors.

SQL> show errors
Errors for PROCEDURE TEST_COND:

LINE/COL ERROR
-------- -----------------------------------------------------------------
5/5 PL/SQL: Statement ignored
5/5 PLS-00201: identifier 'HOTSOS1_ILO_TASK.BEGIN_TASK' must be
declared
13/5 PL/SQL: Statement ignored
13/5 PLS-00201: identifier 'HOTSOS1_ILO_TASK.END_TASK' must be
declared

The next example shows a successful compilation after setting the selection directive (i.e. hotsos_ilo) to false.

SQL> alter session set plsql_ccflags = 'hotsos_ilo:false';

Session altered.

SQL> --
SQL> create or replace procedure test_cond
2 is
3 begin
4 $IF $$hotsos_ilo $THEN
5 HOTSOS1_ILO_TASK.BEGIN_TASK(
6 module => 'Set Module',
7 action => 'Set Action');
8 $END
9 --
10 dbms_lock.sleep(5);
11 --
12 $IF $$hotsos_ilo $THEN
13 HOTSOS1_ILO_TASK.END_TASK;
14 $END
15 end;
16 /

Procedure created.

SQL>

Services and backups

June 29th, 2008

I’ve been a proponent of using services both to help categorize database connections and to use them to help classify workload. For applications where “you can’t touch the code” this can be the only way to come up with categories to measure application workload. Recently I’ve started using a “backup” service to identify the cost backups contribute to overall system workload. The problem with services however is that there are only limited ways to associate a service to a connection. For “internal” connections a service can be associated with a scheduler job through a job class, and can also be associated with parallel execution or parallel DML, or with advanced queuing (although I’ve only used scheduler jobs with services). As for client connections, the service is assigned with the “SERVICE_NAME” parameter on the connect string. This implies a connection through the listener since this is typically part of Oracle net services. This can pose a problem for backups since this creates, in my opinion, an unnecessary dependence on the listener being up in order to take advantage of running the backup process through a service.

Typically a backup program, and we’ll talk about RMAN here but it could be any product really, will connect to the database through a “bequeath” connection by setting the ORACLE_SID environment variable as part of the environment setup. This type of connection doesn’t require that a listener be available, and also doesn’t need a tnsnames.ora entry. Unfortunately there is also no way to associate the connection with a service that I’ve found. However, I have stumbled upon a solution. It turns out that it is possible to define a “bequeath” connection in the tnsnames.ora file and subsequently use the SERVICE_NAME parameter as part of the connect string. This type of connection, even though part of the tnsnames.ora file, doesn’t require a listener to work. I stumbled upon this in Metalink note 132764.1, and it seems that it has always existed, at least since Oracle7 and SQL Net V2, based on some of the other notes available on Metalink.

The following will show an example of setting up the TNS entry and then setting up the backup service and running RMAN connected to the new backup service.

The first task is to create the service in the database. The following example is for a single instance database, in RAC you should use srvctl.

SQL> show parameters service

NAME                         TYPE        VALUE
---------------------------- ----------- -------------------------------------------
service_names                string      orcl.appsdba.com
SQL> alter system set service_names = "orcl.appsdba.com,backup.appsdba.com" scope=both;

System altered.

SQL> show parameters service

NAME                         TYPE        VALUE
---------------------------- ----------- -------------------------------------------
service_names                string      orcl.appsdba.com,backup.appsdba.com
SQL>

Verify the listener service registration:

$ lsnrctl services

LSNRCTL for Linux: Version 10.1.0.5.0 - Production on 25-JUN-2008 10:12:00

Copyright (c) 1991, 2004, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=orcl)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "backup.appsdba.com" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:1 refused:0 state:ready
         LOCAL SERVER
Service "orcl" has 1 instance(s).
  Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "orcl.appsdba.com" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:1 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
$

Let’s show the service registration in the database:

SQL> select * from v$services where service_id = 10;

SERVICE_ID NAME                       NAME_HASH
---------- -------------------------- ----------
NETWORK_NAME
------------------------------------------------
CREATION_ CREATION_DATE_HASH
--------- ------------------
        10 backup.appsdba.com         833319543
backup.appsdba.com
25-JUN-08          638794112

SQL>

Now, we’ll show the tnsnames.ora entry. Note that the protocol is “beq” for bequeath and that we’ve used the SERVICE_NAME for our backup service. Also note that as part of the definition we’ve concatenated the “oracle” executable name and our oracle SID name for the ARGV0 value.

ORCL_BACKUP =
  (DESCRIPTION =
    (ADDRESS =
      (PROTOCOL = BEQ)
      (PROGRAM = oracle)
      (ARGV0 = oracleorcl)
      (ARGS = '(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')
    )
    (CONNECT_DATA =
      (SERVICE_NAME = backup.appsdba.com)
    )
  )

Now let’s test the database connection:

$ sqlplus system@orcl_backup

SQL*Plus: Release 10.1.0.5.0 - Production on Wed Jun 25 10:19:06 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Enter password: 

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.5.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select USERNAME, SERVER, PROGRAM, STATUS, SERVICE_NAME
from v$session where username = 'SYSTEM';

USERNAME                       SERVER    PROGRAM                            STATUS
------------------------------ --------- ---------------------------------- --------
SERVICE_NAME
----------------------------------------------------------------
SYSTEM                         DEDICATED sqlplus@appsdba.com (TNS V1-V3)    ACTIVE
backup.appsdba.com

SQL>

So, we’ve connected to the database and verified through the v$session view that we’ve connected to our backup service.
Now let’s connect with RMAN:

$ $ORACLE_HOME/bin/rman nocatalog target /@orcl_backup

Recovery Manager: Release 10.1.0.5.0 - Production

Copyright (c) 1995, 2004, Oracle.  All rights reserved.

connected to target database: ORCL (DBID=3806185278)
using target database controlfile instead of recovery catalog

RMAN>

In another session we’ll query v$session again:

SQL> l
  1  select username, status, program, service_name from v$session
  2* where username = 'SYS'
SQL> /

USERNAME                       STATUS   PROGRAM
------------------------------ -------- --------------------------------
SERVICE_NAME
----------------------------------------------------------------
SYS                            ACTIVE   sqlplus@appsdba.com (TNS V1-V3)
SYS$USERS

SYS                            INACTIVE rman@appsdba.com (TNS V1-V3)
backup.appsdba.com

SYS                            INACTIVE rman@appsdba.com (TNS V1-V3)
backup.appsdba.com

SQL>

Now let’s verify that we can still connect with no listener:

/u01/app/oracle/local/bkup$ lsnrctl stop

LSNRCTL for Linux: Version 10.1.0.5.0 - Production on 25-JUN-2008 10:33:09

Copyright (c) 1991, 2004, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=orcl)))
The command completed successfully
/u01/app/oracle/local/bkup$

Now let’s see if we can connect with our normal TNS alias:

/u01/app/oracle/local/bkup$ sqlplus system@orcl

SQL*Plus: Release 10.1.0.5.0 - Production on Wed Jun 25 10:33:23 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Enter password:
ERROR:
ORA-12541: TNS:no listener

Enter user-name:

And no, we can’t because the listener is down. Now let’s see if our backup connection will work:

/u01/app/oracle/local/bkup$ $ORACLE_HOME/bin/rman nocatalog target /@orcl_backup

Recovery Manager: Release 10.1.0.5.0 - Production

Copyright (c) 1995, 2004, Oracle.  All rights reserved.

connected to target database: ORCL (DBID=3806185278)
using target database controlfile instead of recovery catalog

RMAN> exit

Recovery Manager complete.
/u01/app/oracle/local/bkup$

And indeed it does. We now have a backup service and it is independent of the Oracle network listener.

Subpartition Statistics

February 3rd, 2008

I’ve been working on a project with a very large subpartitioned table and I’ve been trying to come up with a sensible statistics generation strategy. This application is on a Linux based RAC system currently running 10g R2 (10.2.0.3). I’ve been trying to use every source of information available and of course, many 10053 traces. I came across this little blurb on Jonathon Lewis’ web site:

Sub-Partition stats (12th June)

Page 39, Second paragraph:

The problem of partitions and table-level statistics echoes on down the chain to subpartitions. If you want to query exactly one subpartition of one partition, then the optimizer uses the statistics for that one subpartition.

This is wrong. It looks as if Oracle hardly uses sub-partition statistics at all. I have a test case on 10.2.0.1 which shows the optimizer using the partition level statistics to calculate the cardinality of a very simple query that has been identified as targeting exactly one sub-partition of one partition.  However, the access path is a full table scan of that sub-partition, and the cost of the scan is clearly derived from the number of blocks in the sub-partition.

Thanks to Adrian Billington for bringing this to my attention.

I must say that this mirrors what I’ve found. For the queries that I’ve been modeling, the 10053 traces lead me to believe that the optimizer only considers the row and block counts for the pruned subpartitions in its calculations. All other information, including histogram statistics, all appear to be taken from the partition level only.   

The following excerpt from a 10053 trace is the only place that I’ve found where the optimizer considers the subpartition statistics:

Table Stats::
Table: mytab Alias: mytab (making adjustments for partition skews)
ORIGINAL VALUES:: #Rows: 286200000 #Blks: 1598304 AvgRowLen: 82.00
SUBPARTITIONS::
PRUNED: 1
ANALYZED: 1 UNANALYZED: 0
Partition [5]
#Rows: 286200000 #Blks: 1598304 AvgRowLen: 82.00
#Rows: 25004700 #Blks: 135309 AvgRowLen: 82.00 <== This is the only place that reflects subpartition stats.