Hacking SLOB to run on Solaris

Kevin Closson’s Silly Little Oracle Benchmark https://kevinclosson.net/slob/ is not ported to Solaris. This means that to get it working is not as easy as it should be. Kevin himself suggests that you use a small linux host to run the tool if your database is running on a non-supported operating system.

I’m going to track the changes I make here.

setup.sh

First problem – grep!

./setup.sh test 16
SLOB 2.4.0
FATAL : 2018.05.15-14:40:57 : Usage : ./setup.sh.orig: <tablespace name> <number of SLOB schemas to create and load>
FATAL : 2018.05.15-14:40:57 : Option 2 must be an integer

This is caused by the function f_is_int requiring the gnu grep command behaviour. Simplest way to this is to change the grep to point to /usr/gnu/bin/grep in this script.

Next problem..

Once this change was made, I was then able to create the first schema ( yay!) but then when it came to make the remaining 15, it failed.

NOTIFY : 2018.05.15-14:37:27 : Waiting for background batch 1. Loading up to user11
FATAL : 2018.05.15-14:37:29 : 
FATAL : 2018.05.15-14:37:29 : f_flag_abort: Triggering abort
FATAL : 2018.05.15-14:37:29 : 
FATAL : 2018.05.15-14:37:30 : 
FATAL : 2018.05.15-14:37:30 : f_flag_abort: Triggering abort
FATAL : 2018.05.15-14:37:30 : 
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : f_flag_abort: Triggering abort
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : f_check_abort_flag: discovered abort flag
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : Aborting SLOB setup. See /export/home/oracle/mel/SLOB/cr_tab_and_load.out

 

Handily there is an error message in the logfile

ALTER TABLE cf1 MINIMIZE RECORDS_PER_BLOCK
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00039: error during periodic action
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT




Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
FATAL : 2018.05.15-14:37:40 :
FATAL : 2018.05.15-14:37:40 : f_setup: Failed to load user4 SLOB table
FATAL : 2018.05.15-14:37:40 :

I’m going to assume the clue is in the error and increase the PGA aggregate limit

SQL> alter system set pga_aggregate_limit=36G scope=both sid='*';

After this change, the script ran to the end successfully.

 

$ diff setup.sh setup.sh.orig
30,31d29
< 
< 
86c84
< if ( ! echo $s | /usr/gnu/bin/grep -q "^-\?[0-9]*$" )
---
> if ( ! echo $s | grep -q "^-\?[0-9]*$" )
250c248
< if ( echo "$cdb" | /usr/gnu/bin/grep -q "YES" > /dev/null 2>&1 )
---
> if ( echo "$cdb" | grep -q "YES" > /dev/null 2>&1 )
254c252
< if ( echo "$cdb" | /usr/gnu/bin/grep -q "NO" > /dev/null 2>&1 )
---
> if ( echo "$cdb" | grep -q "NO" > /dev/null 2>&1 )
410c408
< done | sqlplus -s "$constring" 2>&1 | tee -a $fname | /usr/gnu/bin/grep -i "dropped" | wc -l | while read num_processed
---
> done | sqlplus -s "$constring" 2>&1 | tee -a $fname | grep -i "dropped" | wc -l | while read num_processed

runit.sh

So – we already know that the scripts need gnu grep, especially in the function f_is_int. Safest option will be  to change every occurrence of grep with /usr/gnu/bin/grep.

Try running it with a single schema

 ./runit.sh 1

The duration of the run is based on the slob.conf variable RUN_TIME. The default is 300 seconds, but I am dropping it to 90 while I am doing debugging.

This ran successfully (I think!). Certainly AWR reports were created.

There was one obvious failure – the mpstat output was not generated.

$ cat mpstat.out 
mpstat: -P expects a number
Usage: mpstat [-aqm] [-A core|soc|bin] [-k key1,[key2,...]] [-o num] [-p | -P processor_set] [-T d|u] [-I statfile | -O statfile ] [interval [count]]

Ok – so this is the line causing the problem.

 ( mpstat -P ALL 3 > mpstat.out 2>&1) &

On Linux mpstat without the -P flag will give you summary output, -P separates the output by processor.  On Solaris the default is to show one line per CPU, so I just need to change the line to

 ( mpstat 3 > mpstat.out 2>&1) &

also check for other invocations of mpstat

Next - try with 2 sessions

./runit.sh 2

Arrgh. That fails with

: List of monitored sqlplus PIDs written to /tmp/.SLOB.2018.05.15.153524/13174.f_wait_pids.out.
usage: ps [ -aAdefHlcjLPyZ ] [ -o format ] [ -t termlist ]
 [ -u userlist ] [ -U userlist ] [ -G grouplist ]
 [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ] [-h lgrplist]
 'format' is one or more of:
 user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
 pri opri pcpu pmem vsz rss rssprivate rssshared osz nice class time etime stime zone zoneid env
 f s c lwp nlwp psr tty addr wchan fname comm args projid project pset lgrp

This wasn’t obvious where the fault was happening. So lets add the -x flag to the first line of our script to try and find the line causing our problems.

++ sed /PID/d
++ wc -l
+++ cat /tmp/.SLOB.2018.05.15.153852/15451.f_wait_pids.out
++ sed 's/[^0-9]//g'
++ ps -p 32526 32527
usage: ps [ -aAdefHlcjLPyZ ] [ -o format ] [ -t termlist ]
 [ -u userlist ] [ -U userlist ] [ -G grouplist ]
 [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ] [-h lgrplist]
 'format' is one or more of:
 user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
 pri opri pcpu pmem vsz rss rssprivate rssshared osz nice class time etime stime zone zoneid env
 f s c lwp nlwp psr tty addr wchan fname comm args projid project pset lgrp
++ return 0
+ tmp=0

So there is something with the ps -p command that is not happy in solaris. I  can reproduce the problem at my command line, by providing ps -p with multiple pids.


# ps -p 7 8
usage: ps [ -aAdefHlcjLPyZ ] [ -o format ] [ -t termlist ].....

If we look at the ps man page, on solaris it says

Some options accept lists as arguments. Items in a list can be either
separated by commas or else enclosed in quotes and separated by commas
or spaces. Values for proclist and grplist must be numeric.

so this means the following will work, but not the format within runit.sh

# ps -p 7,8
 PID TTY TIME CMD
 7 ? 127:47 zpool-rp
 8 ? 1:39 kmem_tas
# ps -p "7 8"
 PID TTY TIME CMD
 7 ? 127:47 zpool-rp
 8 ? 1:39 kmem_tas

 

So – I need to find the line with ps command, and try to change the format presented.

while ( ps -p $pidstring > /dev/null 2>&1 )

 ps -fp $pidstring

both occur in function f_wait_pids

but, they are actually being presented with the list of pids built up in line 1440

sqlplus_pids="${sqlplus_pids} $!"

So – how do I go about getting either a comma or a quote in the string that is passed to the function f_wait_pids?

Well, rather than destroying something that Kevin may rely on later, I add a line below 1440 to generate the pid list with commas

 melsqlplus_pids="${melsqlplus_pids},$!"

This is pretty ugly, as it means my string leads with a comma. However it seems that Solaris doesn’t care about this.

Now I need to add this to the function call on line 1493

if ( ! f_wait_pids "$(( SCHEMAS * THREADS_PER_SCHEMA ))" "$RUN_TIME" "$WORK_LOOP" "$sqlplus_pids"  "$melsqlplus_pids")

Of course I need to add something to pick up that new 5th argument to the function, so within the function itself

local melpidstring="$5"

while ( ps -p "$melpidstring" > /dev/null 2>&1 )

ps -fp "$melpidstring"

So this all seems ok.. but then I discover some more little rats in f_count_pid.

So once again – create my own custom usage of the pid list. There probably is a MUCH simpler way to do this, I’m just building it up as I go along. So – here are the changes I have made to runit.sh to get it to execute without error

 

oracle@sc8avm-25:~/mel/SLOB$ diff runit.sh runit.sh.orig
1c1
< #!/bin/bash 
---
> #!/bin/bash
45c45
< if ( ! echo "$s" | /usr/gnu/bin/grep -q "^-\?[0-9]*$" ) 
---
> if ( ! echo "$s" | grep -q "^-\?[0-9]*$" ) 
275c275
< for string in 'iostat -xm 3' 'mpstat 3' 'vmstat 3'
---
> for string in 'iostat -xm 3' 'mpstat -P ALL 3' 'vmstat 3'
300c300
< ls -l /proc/${tmp}/fd | /usr/gnu/bin/grep deleted
---
> ls -l /proc/${tmp}/fd | grep deleted
416c416
< sqlplus $user/${user}${non_admin_connect_string} <<EOF 2>/dev/null | sed 's/^.* FATAL/FATAL/g' | /usr/gnu/bin/grep FATAL > $tmpfile
---
> sqlplus $user/${user}${non_admin_connect_string} <<EOF 2>/dev/null | sed 's/^.* FATAL/FATAL/g' | grep FATAL > $tmpfile
457c457
< if ( /usr/gnu/bin/grep FATAL "$tmpfile" > /dev/null 2>&1 )
---
> if ( grep FATAL "$tmpfile" > /dev/null 2>&1 )
485d484
< local melpidstring="$5"
488d486
< local meltmpfile="${SLOB_TEMPDIR}/${RANDOM}.MEL${FUNCNAME}.out"
502d499
< echo "$melpidstring" > $meltmpfile 2>&1
508d504
< f_msg NOTIFY "List of monitored sqlplus PIDs with commas written to ${meltmpfile}."
514c510
< tmp=`f_count_pids "$meltmpfile"`
---
> tmp=`f_count_pids "$tmpfile"`
542,544c538,539
< echo "This is the pidstring value $pidstring"
< echo "This is the melpidstring value $melpidstring"
< while ( ps -p "$melpidstring" > /dev/null 2>&1 )
---
> 
> while ( ps -p $pidstring > /dev/null 2>&1 )
551c546
< ps -fp "$melpidstring"
---
> ps -fp $pidstring
875c870
< if ( ! echo "$tmp" | /usr/gnu/bin/grep -q '\-' > /dev/null 2>&1 )
---
> if ( ! echo "$tmp" | grep -q '\-' > /dev/null 2>&1 )
1397c1392
< ( mpstat 3 > mpstat.out 2>&1) &
---
> ( mpstat -P ALL 3 > mpstat.out 2>&1) &
1446d1440
< melsqlplus_pids="${melsqlplus_pids},$!"
1496c1490,1491
< if ( ! f_wait_pids "$(( SCHEMAS * THREADS_PER_SCHEMA ))" "$RUN_TIME" "$WORK_LOOP" "$sqlplus_pids" "$melsqlplus_pids" )
---
> 
> if ( ! f_wait_pids "$(( SCHEMAS * THREADS_PER_SCHEMA ))" "$RUN_TIME" "$WORK_LOOP" "$sqlplus_pids" )

Tomorrow – time to get someone else to run it and see if it behaves how they expect.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s