I have a 2 node RAC cluster under test, and the throughput isn’t as much as hoped. I needed to work out what SQL was causing all my cluster waits…
First step – identify the snapshots that cover the time period of the testing by looking in dba_hist_snapshot (I’ve not put in the SQL for this as it’s fairly obvious)
My two snapshots are 541 and 542.
Next do something with a dual purpose, check the number of waits associated with each class (why chase things that won’t have a big impact) and get the wait_class_id..
WAIT_CLASS_ID WAIT_CLASS CNT
————- ——————– ———-
2000153315 Network 1
3290255840 Configuration 1
1740759767 User I/O 2
1893977003 Other 22
3875070507 Concurrency 27
4108307767 System I/O 47
4217450380 Application 48
3386400367 Commit 265
3871361733 Cluster 1273
10 rows selected.
Well, the majority of my waits are cluster related (though next on my hit list will be the commit related events). Let’s find out which cluster waits are giving me problems
EVENT_ID EVENT CNT
———- —————————— ———-
1742950045 gc current retry 1
661121159 gc cr multi block request 1
3905407295 gc current request 1
2705335821 gc cr block congested 3
3897775868 gc current multi block request 3
105117041 gc buffer busy release 3
1520064534 gc cr block busy 5
3785617759 gc current block congested 7
2685450749 gc current grant 2-way 12
1912606394 gc buffer busy acquire 62
2277737081 gc current grant busy 73
2701629120 gc current block busy 112
737661873 gc cr block 2-way 360
111015833 gc current block 2-way 630
14 rows selected.
So most of my waits are ‘gc cr block 2-way’ and ‘gc current block 2-way’. Take their event_id and see which SQL was doing most of the waiting for those events.
So now we have the SQL ID, we can get the text associated with it, and hopefully come up with a fix on the SQL.
So the majority of our cluster waits is caused by the insert into the transaction log…
The two waits are related to transferring and requesting blocks in memory from the other instance. The Current block is where we’re competing for blocks to update. If a block has fewer rows, then the chance that both instances will need the same block is reduced.
This leaves me with 2 simple fixes that doesn’t need the application fiddling with:-
1. Reduce row density (tweak PCTFREE)
2. Reduce my blocksize from 8k to 2k.
I chose the 2k blocksize option by creating a new tablespace with the smaller blocksize and a new buffer pool to support it. It increased throughput by 10%