vmstat – who is in kthr w status

This post leans heavily on the work of clever people before me, especially this blog post https://blogs.oracle.com/swan/entry/find_out_process_es_with

I had a zone which had been heavily used until recently, but had mostly been quiescent for the past few weeks. In vmstat, I noticed that I had approximately 150 LWPs in the w state.

# vmstat 5 5
 kthr memory page disk faults cpu
 r b w swap free re mf pi po fr de sr sd sd sd vc in sy cs us sy id
 0 0 188 873445368 54537152 472 699 13 0 0 0 0 20 0 32 0 17621 32317 17008 0 0 99
 0 0 149 863818176 9628832 42 733 38 0 0 0 0 31 0 33 0 19384 147817 16927 20 1 79
 0 0 149 864814392 10389536 125 919 0 0 0 0 0 14 0 20 0 19523 135382 16785 26 1 73
 0 0 149 864914688 10422784 63 544 0 0 0 0 0 16 0 33 0 19072 128355 16572 26 1 73
 0 0 149 863905112 9505416 46 579 0 0 0 0 0 14 0 29 0 19057 130498 16582 25 1 75

 

The vmstat man page says..

 w the number of swapped out lightweight processes (LWPs)
   that are waiting for processing resources to finish.

Hmm..

So I had a look at this blog post, which uses mdb and some knowledge of the Solaris source to find the PIDs https://blogs.oracle.com/swan/entry/find_out_process_es_with

 

Of course as I’m in a Zone, I need to do my investigations at the Global Zone level.

First get the list of PIDs and their swapped count in hex

 

# echo '::walk proc|::print -t proc_t p_pidp->pid_id p_swapcnt'|mdb -k|awk '{if(NR%2){printf("%s\t",$0);}else{printf("%s\n",$0);}}'|awk '{if($NF!=0){printf("pid: %s\tp_swapcnt: %s\n",$4,$NF);}}'

giving an output like

pid: 0x2f p_swapcnt: 0x1
pid: 0x25 p_swapcnt: 0x3
pid: 0x11 p_swapcnt: 0x1
pid: 0xf p_swapcnt: 0x5
pid: 0xcb7d p_swapcnt: 0x1

which I saved to a text file (walk.txt). Now, the blog post only had 17 to play with, but I’ve got over 150, so I’m not going to be looking up all the individual PIDs by hand. There is almost certainly a more elegant way of doing this, through cunning use of pipe and awk or maybe dtrace, but I was pressed for time.

#!/bin/bash
# Read lines of the form "pid: 0x2f p_swapcnt: 0x1" from walk.txt
runcounter=0
while read -r _ pidder _ counter
do
 # Convert the hex values printed by mdb to decimal
 outpid=$(printf "%d" "$pidder")
 outcounter=$(printf "%d" "$counter")
 echo "Number of LWPS swapped : $outcounter"
 echo "Process=$(ps -fp "$outpid")"
 echo "-------------------------------------------"
 # Keep a running total of swapped-out LWPs
 runcounter=$((runcounter + outcounter))
done < walk.txt
echo "Total number of lwps in state w: $runcounter"

This gave an output similar to :

<snip>

-------------------------------------------
Number of LWPS swapped : 5
Process= UID PID PPID C STIME TTY TIME CMD
 root 15 1 0 Feb 27 ? 1:22 /lib/svc/bin/svc.startd
-------------------------------------------
Number of LWPS swapped : 1
Process= UID PID PPID C STIME TTY TIME CMD
 root 52093 15 0 Mar 09 console 0:00 /usr/sbin/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p sc7ach00pd01-d2 console login: 
-------------------------------------------
Total number of lwps in state w: 149
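
If you only want the decimal PIDs, the per-process counts and the total, without the ps lookup, a minimal sketch along these lines might do. The function name summarise_swapcnt is made up for illustration, and it assumes the input has the same "pid: 0x.. p_swapcnt: 0x.." layout as the saved file.

```shell
# Sketch: summarise "pid: 0x2f p_swapcnt: 0x1" lines read from stdin.
# summarise_swapcnt is a hypothetical helper name, not part of any tool.
summarise_swapcnt() {
    total=0
    while read -r _label pid_hex _label2 count_hex; do
        # Skip blank or malformed lines.
        [ -n "$count_hex" ] || continue
        # printf converts the 0x-prefixed hex values to decimal.
        pid=$(printf '%d' "$pid_hex")
        count=$(printf '%d' "$count_hex")
        printf '%s %s\n' "$pid" "$count"
        total=$((total + count))
    done
    printf 'total %s\n' "$total"
}
```

Run it as summarise_swapcnt < walk.txt; each output line is a decimal PID followed by its swapped LWP count, and the last line is the overall total.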


From the Solaris Internals manual (2.4.1 The Process Structure, Table 10.3 and 10.3.6 The Memory Scheduler), processes with p_swapcnt > 0 are those which have been swapped out by the memory scheduler to free up memory pages. This is a separate operation from page-out, and is relatively inexpensive, though it does dramatically affect the process’s performance. Swapping out a process involves removing all of a process’s thread structures and private pages from memory and setting flags in the process table to show that this process has been swapped out. The memory scheduler is started at boot time and doesn’t do anything until free memory is consistently less than desfree over a 30 second average. Desfree is a calculated value https://docs.oracle.com/cd/E53394_01/html/E54818/chapter2-10.html#OSTUNchapter2-103 , set at 1/128th of the memory of the system, at a minimum of 256K.
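
To get a feel for how small desfree is, here is a rough sketch of the arithmetic described above (memory divided by 128, with a 256K floor). The helper name desfree_bytes and the 1 TB example size are assumptions for illustration, and a system where the tunable has been set explicitly will not follow this default.

```shell
# Sketch of the default described above: desfree = physical memory / 128,
# with a floor of 256 KB. desfree_bytes is a hypothetical helper name.
desfree_bytes() {
    physmem_bytes=$1
    floor_bytes=$((256 * 1024))
    value=$((physmem_bytes / 128))
    if [ "$value" -lt "$floor_bytes" ]; then
        value=$floor_bytes
    fi
    echo "$value"
}

# Example (assumed size): a machine with 1 TB of RAM gives 8 GB of desfree.
desfree_bytes $((1024 * 1024 * 1024 * 1024))
```

So even on a very large system the threshold is well under 1% of memory, which is why the scheduler only wakes up under real pressure.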

 

At some point in the recent past, this system suffered extreme memory pressure due to someone starting up a huge SGA + PGA on the system. It looks like the memory scheduler does not automatically swap the processes back in when the memory pressure eases, instead waiting for the process to do ‘something’ and need to run those LWPs. This makes sense: it’s better to not do work unless it’s needed, and as desfree is not actually a lot of free memory, if you’re bumping along that threshold the last thing you need is the scheduler to un-swap something and tip you back into a memory shortage.

I basically ‘touched’ each one of these pids by using the pfiles command … and now I have no processes sitting in state ‘w’

# vmstat 5 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr sd sd sd vc   in   sy   cs us sy id
 0 0 187 873530456 54500568 471 698 13 0 0 0 0 20 0 32 0 17621 32769 17006 0 0 99
 0 0 0 909716856 43286880 585 1103 0 0 0 0 0 28 0 32 0 18478 206015 16744 2 0 98
 0 0 0 909647064 43275192 62 260 0 0 0 0 0 14  0 27  0 17117 202019 15744 2 0 98
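
The ‘touch’ step could be scripted roughly as below. This is only a sketch: touch_pids is a made-up helper name, it assumes the same walk.txt layout as before, and whether pfiles alone is always enough to bring the LWPs back in is based purely on what I saw here.

```shell
# Sketch: run an inspection command against every PID in the list.
# touch_pids is a hypothetical helper; the command defaults to pfiles,
# which is what was used above. Input lines look like "pid: 0x2f p_swapcnt: 0x1".
touch_pids() {
    cmd=${1:-pfiles}
    while read -r _label pid_hex _rest; do
        [ -n "$pid_hex" ] || continue
        pid=$(printf '%d' "$pid_hex")
        # Discard the output; only the side effect matters, and a process
        # may have exited since the list was captured.
        "$cmd" "$pid" > /dev/null 2>&1 || echo "could not inspect pid $pid" >&2
    done
}
```

From the global zone, as root, this would be run as touch_pids < walk.txt. Passing a different command name as the first argument makes it easy to dry-run.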

 

 


Adding an Oracle Exadata Storage Server to Enterprise Manager using the command line

 

Ok, I’m just noodling around here… I have some ‘spare’ storage servers that are in the same fabric as my SuperCluster and I wanted to discover them in EM.

oracle@odc-em-sc7a:/u01/app/oracle/agent13/agent_inst/sysman/log$ emcli add_target -type=oracle_exadata -name="expbcel09.osc.uk.oracle.com" -host="sc7ach00pd00-d1.osc.uk.oracle.com" -properties="CellName:expbcel09.osc.uk.oracle.com;MgmtIPAddr:138.3.3.82"
Target "expbcel09.osc.uk.oracle.com:oracle_exadata" added successfully

Changing attributes on ASM Diskgroups

You can see the attributes set on your ASM diskgroups in 2 ways, via the view  v$asm_attribute

SQL> select a.name, b.name, b.value from v$asm_attribute b, v$asm_diskgroup a where a.group_number=b.group_number and a.name='DATAX6C1' ;

Or via asmcmd and the lsattr command

grid@sc7ach00pd00-d1:~$ asmcmd

ASMCMD> cd DATAX6C1
ASMCMD> lsattr -G DATAX6C1 -l

Attributes you might want to pay attention to in an Exadata environment are

  • compatible.advm
  • cell.smart_scan_capable
  • appliance.mode
  • compatible.asm
  • compatible.rdbms

If you manually create a diskgroup via asmca these attributes will not normally be set, and so you may want to go manually set them.

SQL> ALTER DISKGROUP DATAX6C1 SET ATTRIBUTE 'appliance.mode'='TRUE';

The attribute is set immediately, but based on my experience, it does not come into effect until the disk group has  been rebalanced.

SQL> alter diskgroup DATAX6C1 rebalance power 2;

Creating a ramdisk in Solaris 11

Ramdisks are a great way to ‘prove’ that it’s not the performance of the underlying disk device that is stopping a process from writing a file quickly (it doesn’t prove anything about the filesystem though…). Ramdisks are transient and are lost on system reboot. They also consume the memory on your system, so if you make them too large you can cause yourself other problems.

Creating a Ramdisk

The ramdiskadm command is used to create a ramdisk. In this example I am creating a 20G ramdisk called ‘idisk’

# ramdiskadm -a idisk 20g

Then you create the filesystem on the ramdisk (in this case UFS)

# newfs /dev/ramdisk/idisk

newfs: construct a new file system /dev/ramdisk/idisk: (y/n)? y
Warning: 2688 sector(s) in last cylinder unallocated
/dev/ramdisk/idisk:    41942400 sectors in 6827 cylinders of 48 tracks, 128 sectors
        20479.7MB in 427 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
........
super-block backups for last 10 cylinder groups at:
 40997024, 41095456, 41193888, 41292320, 41390752, 41489184, 41587616,
 41686048, 41784480, 41882912

Now you have a filesystem, you can mount it onto the correct location

# mkdir /export/home/tuxedo/DATA2
# mount /dev/ramdisk/idisk /export/home/tuxedo/DATA2 

Remember to set the ownership/permissions to allow the non-root users to write to the device

# chown tuxedo:oinstall /export/home/tuxedo/DATA2

Maintaining Ramdisks

You can check if a ramdisk exists by just running ramdiskadm without parameters

# ramdiskadm

Block Device                                                  Size  Removable 
/dev/ramdisk/idisk                                     21474836480    Yes
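
The sizes printed by ramdiskadm and newfs can be cross-checked with a little arithmetic. A quick sketch follows; the helper names are made up, and it assumes 512-byte sectors, which is consistent with the newfs output above.

```shell
# Sketch: convert the sizes printed above for a quick sanity check.
# Both helper names are hypothetical. Assumes 512-byte sectors.
bytes_to_gib() {
    echo $(($1 / 1024 / 1024 / 1024))
}
sectors_to_mib() {
    echo $(($1 * 512 / 1024 / 1024))
}

bytes_to_gib 21474836480     # size reported by ramdiskadm
sectors_to_mib 41942400      # sector count reported by newfs
```

These print 20 and 20479 respectively, so both tools agree on a device of roughly 20G (integer division drops the fractional part).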

You can remove a ramdisk by unmounting the filesystem and using ramdiskadm -d

# umount /export/home/tuxedo/DATA2 
# ramdiskadm -d idisk

Expanding a zpool backed by an iSCSI LUN

So, you have a zpool provided by an iscsi LUN which is tight on space, and you’ve done all the tidying you can think of.. what do you do next? Well if you’re lucky, you have space to expand the iscsi LUN and then make it available to your zpool.

First – find the LUN that holds the zpool using zpool status <poolname>

 

# zpool status zonepool
  pool: zonepool
 state: ONLINE
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        zonepool                                 ONLINE       0     0     0
          c0t600144F0A22997000000574BFAA90004d0  ONLINE       0     0     0

errors: No known data errors

Note the lun identifier, starting with c0, ending with ‘d0’
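
The part between the controller/target prefix and the trailing ‘d0’ is the GUID you will be matching on the storage side. A small sketch to pull it out of the device name (lun_guid_from_device is a hypothetical helper name, and it assumes the usual cXtGUIDdN naming):

```shell
# Sketch: strip the controller/target prefix and the trailing "d0" from a
# Solaris device name to recover the LUN GUID. Hypothetical helper name.
lun_guid_from_device() {
    name=$1
    name=${name#c*t}        # drop the leading "c0t"
    name=${name%d[0-9]*}    # drop the trailing "d0"
    echo "$name"
}

lun_guid_from_device c0t600144F0A22997000000574BFAA90004d0
```

For the device shown above this prints 600144F0A22997000000574BFAA90004.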

Locate the LUN on your storage appliance. If you are on a ZFS appliance there is a really handy script in Oracle Support Document 1921605.1. Otherwise you’ll have to use the tools supplied with your storage array or your eyes 😉

So, I’ve located my lun on my ZFS appliance by matching the LUN identifier and then I need to change the LUN size..

shares> select sc1-myfs
shares sc1-myfs> 
shares sc1-myfs> select zoneshares_zonepool 
shares sc1-myfs/zoneshares_zonepool> get lunguid
 lunguid = 600144F0A22997000000574BFAA90004
shares sc1-myfs/zoneshares_zonepool> set volsize=500G
 volsize = 500G (uncommitted)
shares sc1-myfs/zoneshares_zonepool> commit

 

Now I just need to get my zpool to expand into the available space on the lun

# zpool online -e zonepool c0t600144F0A22997000000574BFAA90004d0

And now we’re done

 

Installing Cisco AnyConnect VPN on Ubuntu 16.04

I was struggling setting up a new VPN to connect to my servers at the office as vpnsetup.sh was failing

# ./vpnsetup.sh 
Installing Cisco AnyConnect Secure Mobility Client...
Extracting installation files to /tmp/vpn.0Zgby3/vpninst625702875.tgz...
Unarchiving installation files to /tmp/vpn.0Zgby3...
Starting Cisco AnyConnect Secure Mobility Client Agent...
Failed to start vpnagentd.service: Unit vpnagentd.service not found.

I found a bunch of articles on the internet saying that this was due to missing libraries so started with the first batch of recommendations…

# apt install -y lib32z1 lib32ncurses5

This still didn’t work.

So I tried the next one, which was to also install the network-manager-openconnect package and reload the daemons

# apt install network-manager-openconnect

# systemctl daemon-reload

Success!

# ./vpnsetup.sh 
Installing Cisco AnyConnect Secure Mobility Client...
Removing previous installation...
mv: cannot stat '/opt/cisco/vpn/*.log': No such file or directory
Extracting installation files to /tmp/vpn.yUyv15/vpninst922924093.tgz...
Unarchiving installation files to /tmp/vpn.yUyv15...
Starting Cisco AnyConnect Secure Mobility Client Agent...
Warning: vpnagentd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Done!

 

 

 

Really simple HTTP server with python

I hit a situation where I needed to be able to serve a firmware update via http and didn’t want to configure a full apache (or similar) webserver.

You can create a very simple http server using python

Move the file you want to share to a directory that doesn’t contain anything you wouldn’t want being shared with the rest of your network

# cp Sun_System_Firmware-9_5_1_c-SPARC_M5-32+M6-32.pkg /meltmp

Change directory to the directory you want to share and start the webserver (I’m choosing to use the port 8080)

# cd /meltmp

# python -m SimpleHTTPServer 8080
Serving HTTP on 0.0.0.0 port 8080 ...
10.10.7.40 - - [19/May/2016 16:34:35] "GET /Sun_System_Firmware-9_5_1_c-SPARC_M5-32+M6-32.pkg HTTP/1.1" 200 -

This server will run until you Ctrl-C out of the session, writing logging information to the screen. You could of course nohup/background it, but I think this is really designed as a quick and dirty solution so I would not leave it running long term.

This works on Solaris and Linux
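
The SimpleHTTPServer module is Python 2 only; in Python 3 the same functionality lives in the http.server module. A small sketch that prints the right one-liner for a given Python major version (http_server_cmd is a made-up helper name, and the python3 binary name is an assumption about your system):

```shell
# Sketch: print the one-line HTTP server command for a given Python major
# version. http_server_cmd is a hypothetical helper name.
http_server_cmd() {
    major=$1
    port=${2:-8080}
    if [ "$major" -ge 3 ]; then
        echo "python3 -m http.server $port"
    else
        echo "python -m SimpleHTTPServer $port"
    fi
}

http_server_cmd 3
http_server_cmd 2 8080
```

Either way the server shares whatever directory you start it from, so the same caution about what is in that directory applies.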

Redirecting Oracle Management agent to another OMS

I have an Enterprise Manager 13.1 system that is regularly reset to a ‘clean’ state using zfs snapshots. If you already have the management agents installed on other hosts you have the choice of

  1. De-installing the existing agent and installing a clean one
  2. Persuading the agent to talk to your ‘new’ server

A way of doing 2) is to follow the steps outlined in the MOS note EM 12c : Redirecting Oracle Management Agent to Another Oracle Management Service (Doc ID 1390717.1) which despite being for 12c appears to work for Enterprise Manager 13.

 

agent13@$ rm -rf agent_inst
agent13@$ mkdir agent_inst
agent13@$ /u01/app/agent13/agent_13.1.0.0.0/sysman/install/agentDeploy.sh \
AGENT_BASE_DIR=/u01/app/agent13 OMS_HOST=odc-em-ssc \
EM_UPLOAD_PORT=1159 AGENT_REGISTRATION_PASSWORD=welcome1 \
AGENT_INSTANCE_HOME=/u01/app/agent13/agent_inst \
AGENT_PORT=3872 -configOnly

 

This takes about 5 minutes to re-configure the agent to talk to the new server and then promote the targets. You will then be prompted to re-run the root.sh associated with the agent’s Oracle Home.

 

Creating an Application Zone on a SuperCluster

Application Zones on a SuperCluster Solaris 11 LDOM are subject to a lot fewer restrictions than the Exadata Database Zones. This also means that the documentation is less prescriptive and detailed.

This post will show a simple Solaris 11 zone creation; it is meant for example purposes only and not as a product. I am going to use a T5 SuperCluster for this walkthrough. The main differences you will need to consider for an M7 SuperCluster are:

  1. both heads of the ZFS-ES are active so you will need to select the correct head and infiniband interface name.
  2. there is only 1 QGBE card available per PDOM. This means you may need to present vnics from the domain that owns the card for the management network if you require this connectivity.

 

Useful Related MOS Notes

Considerations

As per note 2041460.1, the best practice for the zone root filesystems is to create 1 LUN per LDOM and create a filesystem on this shared pool for each application zone. Reservations and quotas can be used to prevent a zone from using more than its share.

You need to make sure you calculate the minimum number of cores required for the global zone as per note 1625055.1.

You need to make sure that the IPS repos are all available, and that any IDRs you have applied to your global zone are available.

Preparation


Put entries into the global zone’s hosts file for your new zone. I will use 3 addresses: one for the 1gbit management network, one for the 10gbit client network and one for the infiniband network on the storage partition (p8503).

 

10.10.14.15     sc5bcn01-d4.blah.mydomain.com   sc5bcn01-d4
10.10.10.78     sc5b01-d4.blah.mydomain.com sc5b01-d4
192.168.28.10   sc5b01-d4-storIB.blah.mydomain.com      sc5b01-d4-storIB

 

Create an iscsi LUN for the zone root filesystem if you do not already have one defined to hold zone roots. I am going to use the iscsi-lun.sh script that is designed for use by other tools which create the Exadata Database Zones. The good thing about using this is that it follows the naming convention etc. used for the other zones. However, it is not installed by default on Application zones (it is provided by the system/platform/supercluster/iscsi package in the exa-family repository) and this is not a supported use of the script.

  • -z is the name of my ZFS-ES
  • -i is the 1gbit hostname of my globalzone
  • -n and -N are used by the exavm utility to create the LUNs. In our case they will both be set to 1.
  • -s The size of the LUN to be created.
  • -l the volume block size. I have selected 32K, you may have other performance metrics that lead you to a different block size.

root@sc5bcn01-d3:/opt/oracle.supercluster/bin# ./iscsi-lun.sh create  \
-z sc5bsn01 -i sc5bcn01-d3  -n 1 -N 1 -s 500G -l 32K
Verifying sc5bcn01-d3 is an initiator node
The authenticity of host 'sc5bcn01-d3 (10.10.14.14)' can't be established.
RSA key fingerprint is 72:e6:d1:a1:be:a3:b3:d9:96:ea:77:61:bd:c7:f8:de.
Are you sure you want to continue connecting (yes/no)? yes
Password: 
Getting IP address of IB interface ipmp1 on sc5bsn01
Password: 
Setting up iscsi service on sc5bcn01-d3
Password: 
Setting up san object(s) and lun(s) for sc5bcn01-d3 on sc5bsn01
Password: 
Setting up iscsi devices on sc5bcn01-d3
Password: 
c0t600144F0F0C4EECD00005436848B0001d0 has been formatted and ready to use

Create a zpool to hold all of your zone roots

root@sc5bcn01-d3:/# zpool create zoneroots c0t600144F0F0C4EECD00005436848B0001d0

Now create a filesystem for your zone root and set a quota on it (optional).

root@sc5bcn01-d3:/# zfs create zoneroots/sc5b01-d4-rpool 
root@sc5bcn01-d3:/# zfs set quota=100G zoneroots/sc5b01-d4-rpool

Create partitions so your zone can access the IB Storage Network (optional, but nice to have, and my example will include them). First locate the interfaces that have access to the IB Storage Network partition  (PKEY=8503) using dladm and then create partitions using these interfaces.

root@sc5bcn01-d3:~# dladm show-part
LINK         PKEY  OVER         STATE    FLAGS
stor_ipmp0_0 8503  net7         up       f---
stor_ipmp0_1 8503  net8         up       f---
bondib0_0    FFFF  net8         up       f---
bondib0_1    FFFF  net7         up       f---
root@sc5bcn01-d3:~# dladm create-part -l net8 -P 8503 sc5b01d4_net8_p8503
root@sc5bcn01-d3:~# dladm create-part -l net7 -P 8503 sc5b01d4_net7_p8503

Create the Zone

Prepare your zone configuration file; here is mine. Note, I have non-standard link names to make it more readable. You will need to use dladm (for example, dladm show-phys) to determine the lower-link names that match your system.

create -b
set brand=solaris
set zonepath=/zoneroots/sc5b01-d4-rpool
set autoboot=true
set ip-type=exclusive
add net
set configure-allowed-address=true
set physical=sc5b01d4_net7_p8503
end
add net
set configure-allowed-address=true
set physical=sc5b01d4_net8_p8503
end
add anet
set linkname=net0
set lower-link=auto
set configure-allowed-address=true
set link-protection=mac-nospoof
set mac-address=random
end
add anet
set linkname=mgmt0
set lower-link=net0
set configure-allowed-address=true
set link-protection=mac-nospoof
set mac-address=random
end
add anet
set linkname=mgmt1
set lower-link=net1
set configure-allowed-address=true
set link-protection=mac-nospoof
set mac-address=random
end
add anet
set linkname=client0
set lower-link=net2
set configure-allowed-address=true
set link-protection=mac-nospoof
set mac-address=random
end
add anet
set linkname=client1
set lower-link=net5
set configure-allowed-address=true
set link-protection=mac-nospoof
set mac-address=random
end
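
The anet stanzas only differ in linkname and lower-link, so if you have many zones to build you could generate them. A rough sketch (anet_stanza is a hypothetical helper name; the fixed property values are simply the ones used in my file above):

```shell
# Sketch: emit one zonecfg anet stanza for a given linkname and lower-link.
# anet_stanza is a hypothetical helper name.
anet_stanza() {
    linkname=$1
    lower_link=$2
    cat <<EOF
add anet
set linkname=$linkname
set lower-link=$lower_link
set configure-allowed-address=true
set link-protection=mac-nospoof
set mac-address=random
end
EOF
}

# Example: the two client network links.
anet_stanza client0 net2
anet_stanza client1 net5
```

Redirect the output into your zone configuration file and review it before feeding it to zonecfg.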

 

Implement the zone configuration using your pre-configured file or type it in manually..

root@sc5bcn01-d3:~# zonecfg -z sc5b01-d4 -f <yourzonefile>

 

Install the zone. Optionally you can specify a template to install required packages on top of the standard solaris-small-server group, or specify another package group. I base this on the standard xml file used by zone installs and customize the <software_data> section (see this blog post https://blogs.oracle.com/zoneszone/entry/automating_custom_software_installation_in for an example).

root@sc5bcn01-d3:~# cp /usr/share/auto_install/manifest/zone_default.xml myzone.xml
root@sc5bcn01-d3:~# zoneadm -z sc5b01-d4 install -m myzone.xml

Next you boot the zone, and use zlogin -C to login to the console and answer the usual Solaris configuration questions about root password, timezone, locale. I do not usually configure the networking at this time, and add it later.

root@sc5bcn01-d3:~# zoneadm -z sc5b01-d4 boot
root@sc5bcn01-d3:~# zlogin -C sc5b01-d4

Create the required networking

# ipadm create-ip  mgmt0
# ipadm create-ip  mgmt1
# ipadm create-ip  client1
# ipadm create-ip  client0
# ipadm create-ipmp -i mgmt0 -i mgmt1 scm_ipmp0
# ipadm create-ipmp -i client0 -i client1 sc_ipmp0
# ipadm create-addr -T static -a local=10.10.10.78/22 sc_ipmp0/v4
# ipadm create-addr -T static -a local=10.10.14.15/24 scm_ipmp0/v4
# route -p add default 10.10.8.1
# ipadm create-ip sc5b01d4_net8_p8503
# ipadm create-ip sc5b01d4_net7_p8503
# ipadm create-ipmp -i sc5b01d4_net8_p8503 -i sc5b01d4_net7_p8503 stor_ipmp0
# ipadm set-ifprop -p standby=on -m ip sc5b01d4_net8_p8503
# ipadm create-addr -T static -a local=192.168.28.10/22 stor_ipmp0/v4
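
The default router has to sit on one of the subnets you have just configured. A quick sketch to check which network an address falls in (network_of is a made-up helper name; the addresses are the ones from my example):

```shell
# Sketch: compute the network address for an IPv4 address and prefix length.
# network_of is a hypothetical helper name.
network_of() {
    ip=$1
    prefix=$2
    # Split the dotted quad into four octets.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

network_of 10.10.10.78 22   # client address
network_of 10.10.8.1 22     # default router
```

Both print 10.10.8.0, so in this example the default router is reachable over the client network IPMP group.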

Optional Post Install steps

Root Login

Allow root to login over ssh by editing /etc/ssh/sshd_config and changing PermitRootLogin=no to PermitRootLogin=yes.

# svcadm restart ssh

Configure DNS support

# svccfg -s dns/client setprop config/search = astring: "blah.mydomain.com"
# svccfg -s dns/client setprop config/nameserver = net_address: \(10.10.34.4 10.10.34.5\)
# svccfg -s dns/client refresh 
# svccfg -s dns/client:default  validate
# svccfg -s dns/client:default  refresh 
# svccfg -s /system/name-service/switch setprop config/default = astring: \"files dns\"
# svccfg -s system/name-service/switch:default refresh
# svcadm enable dns/client

 

 Resource Capping

At the time of writing (20/04/16) virtual and physical memory capping is not supported on SuperCluster. This is mentioned in Oracle Support Document 1452277.1 (SuperCluster Critical Issues) as issue SOL_11_1.

Creating Processor sets and associating with your zone

See more detail about pools and processor sets on my blog here and here, and of course in the Solaris 11.3 manuals.

A quick summary of the commands follows.

This creates a fixed size processor set, consisting of 64 threads.

poolcfg -c "create pset pset_sc5bcn02-d4.osc.uk.oracle.com_id_6160 (uint pset.min = 64; uint pset.max = 64)" /etc/pooladm.conf

Then a pool is created, and associated with the processor set.

poolcfg -c "create pool pool_sc5bcn02-d4.osc.uk.oracle.com_id_6160" /etc/pooladm.conf
poolcfg -c "associate pool pool_sc5bcn02-d4.osc.uk.oracle.com_id_6160 (pset pset_sc5bcn02-d4.osc.uk.oracle.com_id_6160)" /etc/pooladm.conf
poolcfg -c 'modify pool pool_sc5bcn02-d4.osc.uk.oracle.com_id_6160 (string pool.scheduler="TS")' /etc/pooladm.conf

Enable the pool configuration saved in /etc/pooladm.conf

pooladm -c

Modify the zone config to set the pool

zonecfg -z sc5bcn02-d4
zonecfg:sc5bcn02-d4> set pool=pool_sc5bcn02-d4.osc.uk.oracle.com_id_6160
zonecfg:sc5bcn02-d4> verify
zonecfg:sc5bcn02-d4> commit

Then you can stop and restart the zone to associate it with the processor set.