vmstat – who is in kthr w status

This post leans heavily on the work of clever people before me, especially this blog post https://blogs.oracle.com/swan/entry/find_out_process_es_with

I had a zone which had been heavily used until recently, but had been mostly quiescent for the past few weeks. In vmstat, I noticed approximately 150 LWPs in the w state.

# vmstat 5 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr sd sd sd vc   in   sy   cs us sy id
 0 0 188 873445368 54537152 472 699 13 0 0 0 0 20 0 32 0 17621 32317 17008 0 0 99
 0 0 149 863818176 9628832 42 733 38 0 0 0 0 31 0 33 0 19384 147817 16927 20 1 79
 0 0 149 864814392 10389536 125 919 0 0 0 0 0 14 0 20 0 19523 135382 16785 26 1 73
 0 0 149 864914688 10422784 63 544 0 0 0 0 0 16 0 33 0 19072 128355 16572 26 1 73
 0 0 149 863905112 9505416 46 579 0 0 0 0 0 14 0 29 0 19057 130498 16582 25 1 75

 

The vmstat man page says:

 w the number of swapped out lightweight processes (LWPs)
   that are waiting for processing resources to finish.
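
To keep an eye on just that column while investigating, the vmstat output can be filtered down to its third field. This is a minimal sketch, assuming the column order shown above (r, b, w come first); w_column is a name made up for this example.

```shell
# w_column: print only the "w" (third) column from vmstat output on stdin.
# Header lines are skipped by requiring the first field to be numeric.
w_column() {
    awk '$1 ~ /^[0-9]+$/ { print $3 }'
}

# Sample run against the two header lines and two data lines from the output above.
printf '%s\n' \
    ' kthr memory page disk faults cpu' \
    ' r b w swap free re mf pi po fr de sr sd sd sd vc in sy cs us sy id' \
    ' 0 0 188 873445368 54537152 472 699 13 0 0 0 0 20 0 32 0 17621 32317 17008 0 0 99' \
    ' 0 0 149 863818176 9628832 42 733 38 0 0 0 0 31 0 33 0 19384 147817 16927 20 1 79' |
    w_column
```

On a live system the same function would be fed directly, for example: vmstat 5 | w_column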

Hmm..

So I had a look at this blog post, which uses mdb and some knowledge of the Solaris source to find the PIDs: https://blogs.oracle.com/swan/entry/find_out_process_es_with

 

Of course, as I’m in a Zone, I need to do my investigations at the Global Zone level, since mdb -k needs access to the running kernel.

First, get the list of PIDs and their swapped-out LWP counts, in hex:

 

# echo '::walk proc|::print -t proc_t p_pidp->pid_id p_swapcnt'|mdb -k|awk '{if(NR%2){printf("%s\t",$0);}else{printf("%s\n",$0);}}'|awk '{if($NF!=0){printf("pid: %s\tp_swapcnt: %s\n",$4,$NF);}}'

giving an output like

pid: 0x2f p_swapcnt: 0x1
pid: 0x25 p_swapcnt: 0x3
pid: 0x11 p_swapcnt: 0x1
pid: 0xf p_swapcnt: 0x5
pid: 0xcb7d p_swapcnt: 0x1

which I saved to a text file (walk.txt). Now, the blog post only had 17 to play with. I’ve got over 150, so I’m not going to be looking up all the individual PIDs by hand. There is almost certainly a more elegant way of doing this, through cunning use of pipes and awk or maybe DTrace, but I was pressed for time.
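
On the "more elegant" front, the two awk stages in the one-liner could probably be folded into a single pass that remembers the last pid seen. This is only a sketch: it assumes ::print -t emits one "type name = value" line per member, which is what the field positions in the original awk imply, and pair_swapcnt is a made-up name.

```shell
# pair_swapcnt: read mdb "::print -t" output on stdin and emit one line per
# process whose p_swapcnt is non-zero, in the same format as the one-liner.
# Assumption: each member is printed on its own line as "type name = value".
pair_swapcnt() {
    awk '
        /p_pidp->pid_id/ { pid = $NF }    # remember the most recent pid
        /p_swapcnt/ && $NF != 0 {         # skip processes with nothing swapped
            printf("pid: %s\tp_swapcnt: %s\n", pid, $NF)
        }'
}
```

It would slot in after mdb in place of both awk commands, i.e. the mdb -k output piped straight into pair_swapcnt.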

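#!/bin/bash
# Read lines of the form "pid: 0x2f p_swapcnt: 0x1" from walk.txt,
# look up each process with ps, and total the swapped-out LWP counts.
runcounter=0
while read -r pidlabel pidder cntlabel counter
do
    # printf converts the hex values printed by mdb into decimal
    outpid=$(printf "%d" "$pidder")
    outcounter=$(printf "%d" "$counter")
    echo "Number of LWPS swapped : $outcounter"
    echo "Process=$(ps -fp "$outpid")"
    echo "-------------------------------------------"
    runcounter=$((runcounter + outcounter))
done < walk.txt
echo "Total number of lwps in state w: $runcounter"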

This gave output similar to:

<snip>

-------------------------------------------
Number of LWPS swapped : 5
Process= UID PID PPID C STIME TTY TIME CMD
 root 15 1 0 Feb 27 ? 1:22 /lib/svc/bin/svc.startd
-------------------------------------------
Number of LWPS swapped : 1
Process= UID PID PPID C STIME TTY TIME CMD
 root 52093 15 0 Mar 09 console 0:00 /usr/sbin/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p sc7ach00pd01-d2 console login: 
-------------------------------------------
Total number of lwps in state w: 149
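
If only the total is wanted, to compare against the w column in vmstat, the ps lookup can be skipped altogether. This is a minimal sketch, assuming the same "pid: 0x.. p_swapcnt: 0x.." lines as saved in walk.txt; sum_swapcnt is a hypothetical helper name.

```shell
# sum_swapcnt: add up the hex p_swapcnt values from lines of the form
# "pid: 0x2f p_swapcnt: 0x1" read on stdin and print the decimal total.
sum_swapcnt() {
    total=0
    while read -r pidlabel pid cntlabel cnt
    do
        # printf %d accepts C-style hex constants such as 0x5
        total=$((total + $(printf '%d' "$cnt")))
    done
    echo "$total"
}

# The five sample lines shown earlier add up to 1 + 3 + 1 + 5 + 1.
printf '%s\n' \
    'pid: 0x2f p_swapcnt: 0x1' \
    'pid: 0x25 p_swapcnt: 0x3' \
    'pid: 0x11 p_swapcnt: 0x1' \
    'pid: 0xf p_swapcnt: 0x5' \
    'pid: 0xcb7d p_swapcnt: 0x1' |
    sum_swapcnt
```

Against the saved file it would be run as: sum_swapcnt < walk.txt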


From the Solaris Internals manual (2.4.1 The Process Structure, Table 10.3 and 10.3.6 The Memory Scheduler), processes with p_swapcnt > 0 are those that have had LWPs swapped out by the memory scheduler to free up memory pages. This is a separate operation from page-out, and is relatively inexpensive, though it does dramatically affect the process’s performance. Swapping out a process involves removing all of the process’s thread structures and private pages from memory and setting flags in the process table to show that the process has been swapped out. The memory scheduler is started at boot time and doesn’t do anything until free memory is consistently less than desfree over a 30 second average. Desfree is a calculated value (https://docs.oracle.com/cd/E53394_01/html/E54818/chapter2-10.html#OSTUNchapter2-103), set at 1/128th of the memory of the system, with a minimum of 256K.
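
As a rough worked example of that sizing rule (1/128th of physical memory, with a floor of 256K), the arithmetic looks like this. It is a back-of-envelope sketch only: the kernel works in pages rather than bytes, the 128 GiB figure is a hypothetical machine size, and desfree_bytes is a name invented here.

```shell
# desfree_bytes: approximate desfree for a given amount of physical memory,
# using the rule quoted above: memory / 128, but never below 256K.
# Argument: physical memory in bytes.
desfree_bytes() {
    floor=$((256 * 1024))
    value=$(($1 / 128))
    if [ "$value" -lt "$floor" ]; then
        value=$floor
    fi
    echo "$value"
}

# A hypothetical 128 GiB machine: 137438953472 / 128 = 1073741824 bytes (1 GiB)
desfree_bytes 137438953472
```

So on a box of that size, the memory scheduler would only start swapping once free memory averaged below roughly 1 GiB.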

 

At some point in the recent past, this system suffered extreme memory pressure due to someone starting up a huge SGA + PGA on the system. It looks like the memory scheduler does not automatically swap the processes back in when the memory pressure eases, instead waiting for the process to do ‘something’ and need to run those LWPs. This makes sense: it’s better not to do work unless it’s needed, and as desfree is not actually a lot of free memory, if you’re bumping along that threshold the last thing you need is the scheduler un-swapping something and tipping you back into a memory shortage.

I basically ‘touched’ each one of these PIDs by using the pfiles command … and now I have no processes sitting in state ‘w’:
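
The exact loop used for this step was not recorded, so the following is only one way it could be scripted, assuming the walk.txt format from earlier. swapped_pids and touch_swapped are made-up names; pfiles is the real Solaris command.

```shell
# swapped_pids: turn "pid: 0x2f p_swapcnt: 0x1" lines on stdin into
# decimal PIDs, one per line.
swapped_pids() {
    while read -r pidlabel pid rest
    do
        printf '%d\n' "$pid"
    done
}

# touch_swapped: run pfiles against every PID listed in the given file,
# discarding the output; only the side effect of inspecting each process matters.
touch_swapped() {
    swapped_pids < "$1" | while read -r pid
    do
        pfiles "$pid" > /dev/null 2>&1
    done
}
```

Run from the global zone as touch_swapped walk.txt, then check vmstat again.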

# vmstat 5 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr sd sd sd vc   in   sy   cs us sy id
 0 0 187 873530456 54500568 471 698 13 0 0 0 0 20 0 32 0 17621 32769 17006 0 0 99
 0 0 0 909716856 43286880 585 1103 0 0 0 0 0 28 0 32 0 18478 206015 16744 2 0 98
 0 0 0 909647064 43275192 62 260 0 0 0 0 0 14  0 27  0 17117 202019 15744 2 0 98

 

 
