Archive for December, 2009

11g Release 2 RAC Observations

December 4th, 2009

So I’ve done a couple of 11gR2 RAC installs and an upgrade, and no two of them have gone the same way. The two big things so far have been SCAN addresses and the placement of the OCR and voting disks.

SCAN (Single Client Access Name) addresses are Oracle’s new way of solving the “connection” and load balancing problems inherent with VIPs. A SCAN is required for new installs but optional on upgrades. That’s quite a difference, and I think it should be optional for installs as well, so that it can be implemented once it’s fully understood.

As for the OCR and voting disk(s), Oracle has made it possible to put both in ASM. This is great if you’re running just 11gR2, but if you’re upgrading a cluster that also runs older releases, then the utilities from those releases, like “srvctl”, won’t work because they can’t read the OCR, and you can’t use the 11.2 versions against the older releases. If this is your situation then you have to leave the OCR and voting disks where they are, and you can’t take advantage of this new feature.

Rates and Waits Revisited

December 4th, 2009

Recently I’ve been reminded about why it’s so important to consider the rate of work in addition to any “wait events” when optimizing response time. I first coined the term “Rates and Waits” for the paper, “Oracle Workload Management Using Time Based Optimization Techniques” way back in 2003, and the point I was making then, as now, is that it’s the rate of work performed that’s really important.

After all, service time in the formula “response time = service time + wait time”[1] is all about the rate of work performed. Considering only wait events when “tuning” is a trap. Telling important wait events from so-called idle wait events, and determining their real impact, is impossible unless you know the interval being measured and how much work was performed in it. Why? Because there is no context to measure against. Without knowing how long the interval was and how much work got done, there is no way to judge the time spent waiting. If the interval was 1 second then waiting for 0.5 seconds is probably significant, but if the interval was 10 seconds then it probably isn’t. This is why time based profiles are so important.
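To make the 1 second versus 10 second comparison concrete, here is a minimal sketch in Python. It assumes a single session whose whole measured interval is response time, so that service time is simply the interval minus the wait. The function name `profile` and the figure of 100 units of work are made up for illustration and are not taken from any Oracle view.

```python
def profile(interval_s, wait_s, units_of_work):
    """Summarize one measured interval as a very simple time based profile.

    Assumption: the whole interval is response time for one session,
    so R = S + W gives S = R - W.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if not 0 <= wait_s <= interval_s:
        raise ValueError("wait_s must be between 0 and interval_s")
    service_s = interval_s - wait_s
    return {
        "service_s": service_s,
        "wait_fraction": wait_s / interval_s,      # share of the interval spent waiting
        "work_rate": units_of_work / interval_s,   # units of work per second
    }


# The same 0.5 seconds of waiting, measured over two different intervals.
short_interval = profile(interval_s=1.0, wait_s=0.5, units_of_work=100)
long_interval = profile(interval_s=10.0, wait_s=0.5, units_of_work=100)

for name, p in (("1s interval", short_interval), ("10s interval", long_interval)):
    print(f"{name}: waited {p['wait_fraction']:.0%} of the time, "
          f"rate {p['work_rate']:.1f} units/s")
```

The wait is identical in both cases. Only the interval and the work done in it tell you whether that wait matters.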

When considering system level workload, rates become even more important, since wait events at the system level are essentially useless. Recall that at the system level there is an infinite capacity to wait. After all, more than one process can wait on something, but each service (whether CPU, I/O or network) can only service one request at a time. With system level workload it’s all about capacity and throughput. Capacity is the maximum amount of “service” that a system or resource can perform in a given time, and throughput is the measure of how much of that service is actually being performed.
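As a rough illustration of capacity versus throughput, the sketch below assumes a single resource that services one request at a time with a fixed service time. Under that assumption its capacity is one request per service time, and utilization is throughput divided by capacity. The function names and the numbers (5 ms service time, 150 requests per second) are hypothetical.

```python
def capacity_per_second(service_time_s):
    """Maximum requests per second for a resource that handles one request at a time."""
    if service_time_s <= 0:
        raise ValueError("service_time_s must be positive")
    return 1.0 / service_time_s


def utilization(throughput_per_second, service_time_s):
    """Fraction of capacity in use: measured throughput divided by capacity."""
    return throughput_per_second / capacity_per_second(service_time_s)


# Hypothetical disk: 5 ms per I/O, currently doing 150 I/Os per second.
disk_capacity = capacity_per_second(0.005)
disk_utilization = utilization(150.0, 0.005)

print(f"capacity:    {disk_capacity:.0f} requests/s")
print(f"utilization: {disk_utilization:.0%}")
```

However many processes are queued behind this resource, the numbers above do not change. That is why the rate of work, not the amount of waiting, describes the system.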


[1] See YAPP or Gunther or pretty much any performance material for a further explanation of R = S + W.