Rates and Waits Revisited
Recently I’ve been reminded why it’s so important to consider the rate of work in addition to any “wait events” when optimizing response time. I coined the term “Rates and Waits” for the paper “Oracle Workload Management Using Time Based Optimization Techniques” way back in 2003, and the point I was making then, as now, is that it’s the rate of work performed that’s really important.
After all, service time in the formula “response time = service time + wait time”[1] is all about the rate of work performed. Considering only wait events for “tuning” is a trap. Deciding which wait events are important and which are so-called idle wait events, and determining their real impact, is impossible unless both the interval being measured and the amount of work performed in it are known. Why? Because there is no context to measure against. Without knowing how long the interval was and how much work was done in it, there is no way to judge the time spent waiting. If the interval was 1 second then waiting for 0.5 seconds is probably significant, but if the interval was 10 seconds then it probably isn’t. This is why time-based profiles are so important.
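To make the 1 second versus 10 second comparison concrete, here is a minimal sketch of a time-based profile for a single measured interval. It assumes the interval is the elapsed response time of one session, so that service time can be derived as S = R - W. The function name `profile` and its parameters are made up for illustration and are not taken from any Oracle view or tool.

```python
def profile(interval_s, wait_s, units_of_work):
    """Summarize one measured interval (hypothetical helper).

    interval_s    -- elapsed response time R of the interval, in seconds
    wait_s        -- time W spent waiting within the interval, in seconds
    units_of_work -- work completed in the interval (e.g. executions)
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= wait_s <= interval_s:
        raise ValueError("wait time must lie within the interval")

    # R = S + W, so service time is what remains after waiting.
    service_s = interval_s - wait_s
    return {
        "service_s": service_s,
        "wait_fraction": wait_s / interval_s,      # context for the wait
        "work_rate": units_of_work / interval_s,   # the "rate" in Rates and Waits
    }


# The same 0.5 seconds of waiting, judged against two different intervals.
short = profile(interval_s=1.0, wait_s=0.5, units_of_work=100)
long_ = profile(interval_s=10.0, wait_s=0.5, units_of_work=100)
print(short["wait_fraction"], long_["wait_fraction"])
```

The wait time is identical in both calls. Only the interval and the resulting rate of work tell you whether it matters.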
When considering system-level workload, rates become even more important, since wait events at the system level are essentially useless. Recall that at the system level there is an infinite capacity to wait. After all, any number of processes can wait on something at once, but each service resource (whether CPU, I/O or network) can only service one request at a time. With system-level workload it’s all about capacity and throughput. Capacity is the maximum amount of “service” that a system or resource can perform in a given time, and throughput is the amount of service it actually performs in that time.
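The following sketch illustrates why summed wait time says so little at the system level, and why throughput measured against capacity says more. The numbers are invented, and `system_summary` and its parameters are hypothetical names. It assumes capacity is expressed in requests per second for the system as a whole.

```python
def system_summary(interval_s, per_process_wait_s, completed_requests, capacity_per_s):
    """Summarize a system-level interval (hypothetical helper).

    interval_s         -- length of the measured interval, in seconds
    per_process_wait_s -- wait time accumulated by each process, in seconds
    completed_requests -- requests serviced during the interval
    capacity_per_s     -- assumed maximum requests the system can service per second
    """
    if interval_s <= 0 or capacity_per_s <= 0:
        raise ValueError("interval and capacity must be positive")

    # Waits add up across processes, so the total is not bounded by the interval.
    total_wait_s = sum(per_process_wait_s)
    throughput_per_s = completed_requests / interval_s
    return {
        "total_wait_s": total_wait_s,
        "wait_exceeds_interval": total_wait_s > interval_s,
        "throughput_per_s": throughput_per_s,
        "utilization": throughput_per_s / capacity_per_s,
    }


# 20 processes each wait 8 seconds during the same 10 second interval.
summary = system_summary(
    interval_s=10.0,
    per_process_wait_s=[8.0] * 20,
    completed_requests=400,
    capacity_per_s=50.0,
)
print(summary)
```

Here the system "waited" 160 seconds in a 10 second interval, a figure that grows with the number of processes and has no ceiling. Throughput relative to capacity, by contrast, is bounded and directly comparable from one interval to the next.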
[1] See YAPP or Gunther or pretty much any performance material for a further explanation of R = S + W.