Category Archives: SPARC

What packages did that incorporation just install?

So you just installed a Solaris package incorporation and you want to work out what it actually included..

First .. find out when your package was installed

root@host-8-200:/var/log/pkg# pkg history 
START OPERATION CLIENT OUTCOME
2017-10-12T14:51:03 set-property transfer module Succeeded
2017-10-12T14:51:03 image-create transfer module Succeeded
2017-10-12T14:51:04 refresh-publishers transfer module Succeeded
2017-10-12T14:51:20 rebuild-image-catalogs transfer module Succeeded
2017-10-12T14:51:27 install transfer module Succeeded
2017-10-12T15:33:18 install pkg Succeeded
2017-10-12T15:39:07 install pkg Succeeded

We are going to dig into the install that occurred at 15:39

root@host-8-200:/var/log/pkg# pkg history -t 2017-10-12T15:39:07 -l

This gives a  really long listing.. but the key part for me was headed

 

Package version changes:
None -> pkg://solaris/developer/build/make@0.5.11,5.11-0.175.2.0.0.34.0:20140303T132010Z
None -> pkg://solaris/developer/assembler@0.5.11,5.11-0.175.3.9.0.2.0:20160528T012706Z
None -> pkg://solaris/group/prerequisite/oracle/oracle-rdbms-server-12-1-preinstall@0.5.11,5.11-0.175.3.11.0.4.0:20160804T020607Z

This shows that 3 packages were installed as they went from version ‘None’ to an actual version number.

Advertisements

Daxstat in Solaris 11.3/SuperCluster problems

One of the big challenges with the Software in Silicon features is actually monitoring them to see if they are actually doing anything. To monitor the DAX on the SPARC M7 chip you used to have to use busstat commands, and then interpret them (not easy! the documentation is not very thorough).

In later releases of Solaris 11.3 (SRU 19 onwards?) there is a command called daxstat which you can use to see in a much more human readable form the activity on your DAX.https://docs.oracle.com/cd/E86824_01/html/E54764/daxstat-1m.html

However, when I went to use it on my SuperCluster I hit a problem… it was failing with an error I couldn’t understand.

# daxstat
Traceback (most recent call last):
 File "/usr/bin/daxstat", line 969, in <module>
 sys.exit(main())
 File "/usr/bin/daxstat", line 962, in main
 return process_opts()
 File "/usr/bin/daxstat", line 905, in process_opts
 dax_ids, dax_queue_ids = derive_dax_opts(args, parser)
 File "/usr/bin/daxstat", line 844, in derive_dax_opts
 dax_ids = find_ids(query, parser, None)
 File "/usr/bin/daxstat", line 683, in find_ids
 all_dax_kstats = RCU.list_objects(kbind.Kstat(), query)
 File "/usr/lib/python2.7/vendor-packages/rad/connect.py", line 391, in list_objects
 a RADInterface object
 File "/usr/lib/python2.7/vendor-packages/rad/client.py", line 213, in _raise_error
 packer.pack_int((timestamp % 1000000) * 1000)
rad.client.NotFoundError: Error listing com.oracle.solaris.rad.kstat:type=Kstat: not found (3)

It *should* have worked on my current version of Solaris

# pkg list entire
NAME (PUBLISHER) VERSION IFO
entire 0.5.11-0.175.3.22.0.3.0 i--

 

So, I did some tweaking. I am not sure which of steps 1 or 2 actually fixed my problem, as it seemed to need the reboot to activate my ‘fix’

Step 1 – Make sure you have the Remote Administration Daemon packages installed. https://docs.oracle.com/cd/E53394_01/html/E54825/index.html I installed the package group group/system/management/rad/rad-server-interfaces to make sure I wasn’t missing anything.

 # pkg list |grep rad
group/system/management/rad/rad-server-interfaces 0.5.11-0.175.3.0.0.30.0 i--
system/management/rad 0.5.11-0.175.3.21.0.4.0 i--
system/management/rad/client/rad-c 0.5.11-0.175.3.21.0.3.0 i--
system/management/rad/client/rad-java 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/client/rad-python 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-dlmgr 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-files 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-kstat 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-network 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-panels 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-smf 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-time 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-usermgr 0.5.11-0.175.3.17.0.4.0 i--
system/management/rad/module/rad-zfsmgr 0.5.11-0.175.3.17.0.1.0 i--
system/management/rad/module/rad-zonemgr 0.5.11-0.175.3.22.0.1.0 i--

Step 2 – Make sure the RAD service is running (mine was disabled)

# svcs -a |grep rad

online 9:22:10 svc:/system/rad:local
online 9:22:10 svc:/system/logadm-upgrade:default
online 9:22:10 svc:/system/rad:local-http

It was still failing to work though.. and I couldn’t work out why. In desperation I tried a reboot and the command stopped failing.

# daxstat 1
No data available to display.

Run workload that uses the DAX and then it will populate the output.

# daxstat 1
DAX commands fallbacks input output %busy
 0 466 9 4.0M 0.0M 0
 1 469 13 4.0M 0.0M 0
 2 462 12 2.0M 0.0M 0
 3 461 16 4.0M 0.0M 0
 4 473 9 4.0M 0.0M 0
 5 457 8 2.0M 0.0M 0
 6 459 6 2.0M 0.0M 0
 7 465 10 4.0M 0.0M 0

Installing Enterprise Manager agent on Oracle Linux 6.7 SPARC

I have access to an EM13 Enterprise Manager server, and I am going to add my Oracle Linux 6.7 SPARC to this system for monitoring.

First – check that you have the latest plugins and agents installed for the platform.

Screenshot-Self Update: Agent Software - Oracle Enterprise Manager - Mozilla Firefox

Next, on the hosts, create a user to ‘own’ the agent software

[root@host-8-160 ~]# groupadd -g 10001 oinstall
[root@host-8-160 ~]# useradd -g oinstall -s /bin/bash -d /home/agent13 -m agent13
[root@host-8-160 ~]# passwd agent13
Changing password for user agent13.

Create a directory structure for the software

[root@host-8-160 ~]# mkdir -p /u01/app
[root@host-8-160 ~]# chgrp -R oinstall /u01
[root@host-8-160 ~]# chmod g+rwx /u01

Now, back in Enterprise Manager go.. Setup -> Add Target -> Add Target Manually -> Install Agent on Host.

Enter the fully qualified domain name of your host, and the correct Platform

Screenshot-Add Host Targets : Host and Platform - Mozilla Firefox

 

Enter the installation location

Enter the credentials for agent13 and root user and hit next.

Then you can hit deploy agent.

 

Post install configuration/Worries

 

The agent installed successfully – but the host target is not being marked as available.

Looking at the output of emctl status agent I have 2 concerns.

[agent13@host-8-160 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 13.2.0.0.0
OMS Version : 13.2.0.0.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/agent13/ngc13/agent_inst
Agent Log Directory : /u01/app/agent13/ngc13/agent_inst/sysman/log
Agent Binaries : /u01/app/agent13/ngc13/agent_13.2.0.0.0
Core JAR Location : /u01/app/agent13/ngc13/agent_13.2.0.0.0/jlib
Agent Process ID : 20636
Parent Process ID : 20497
Agent URL : https://host-8-160.blah.com:3876/emd/main/
Local Agent URL in NAT : https://host-8-160.blah.com:3876/emd/main/
Repository URL : https://ngc13c.blah.com:4901/empbs/upload
Started at : 2017-06-13 10:52:26
Started by user : agent13
Operating System : Linux version 4.1.12-94.3.4.el6uek.sparc64 (sparcv9)
Number of Targets : (none)
Last Reload : (none)
Last successful upload : (none)
Last attempted upload : (none)
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 98.09%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2017-06-13 10:57:31
Last successful heartbeat to OMS : 2017-06-13 10:57:31
Next scheduled heartbeat to OMS : 2017-06-13 10:58:31

 

There are no targets, and there has not been a successful upload.

[agent13@host-8-160 bin]$ ./emctl pingOMS
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD pingOMS completed successfully

[agent13@host-8-160 bin]$ ./emctl upload agent
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully

If I look in my targets.xml it is pretty empty

[agent13@host-8-160 ngc13]$ cat ./agent_inst/sysman/emd/targets.xml
<Targets AGENT_TOKEN="67DBE4C8ECBA03FA5DC991893B75619C55C9B1CEACAA6ED68074AB9C65CFF973"/>

[agent13@host-8-160 bin]$ ./emctl config agent listtargets
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
[agent13@host-8-160 bin]$

On the enterprise manager server I had errors similar to this

 
Metric evaluation error start - Unable to connect to the agent at https://host-8-161.blah.com:3876/emd/main/ [No route to host]

Tried putting that URL into my browser… cannot connect to it.

Firewall! DOH! Of course!

Temporarily disabled the iptables firewall

[root@host-8-161 /]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]

Now I can connect to the agent address in my browser.

 

So.. the choice is disable the firewall, or alter the rules. As I’m in a lab, I’m going straight to disabling the firewall.

[root@host-8-161 /]# chkconfig iptables off

Now, try to get the agent to generate the internal target list (host, ORACLE_HOME)

[agent13@host-8-161 bin]$ ./emctl config agent listtargets
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.

[agent13@host-8-161 bin]$ ./emctl config agent addinternaltargets
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
2017-06-13 12:00:37,234 [main] WARN oracle.sysman.gcagent.comm.agent.http.SSLInit - User requested cipher suite SSL_RSA_WITH_RC4_128_MD5, which is not supported for SSLContext TLSv1.2
2017-06-13 12:00:37,242 [main] WARN oracle.sysman.gcagent.comm.agent.http.SSLInit - User requested cipher suite SSL_RSA_WITH_RC4_128_SHA, which is not supported for SSLContext TLSv1.2

[agent13@host-8-161 bin]$ ./emctl config agent listtargets
Oracle Enterprise Manager Cloud Control 13c Release 2 
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.

Now when I look at my targets.xml it has entries

[agent13@host-8-161 agent_inst]$ cat ./sysman/emd/targets.xml
<Targets AGENT_TOKEN="6A415CAF76EC952756AE3BC675B0080ADAEE066B3F9B10B1B4A6410870130843">
 <Target TYPE="host" NAME="host-8-161.blah.com" DISPLAY_NAME="host-8-161.osc.uk.oracle.com" EMD_URL="https://host-8-161.blah.com:3876/emd/main/" TIMEZONE_REGION="" IDENTIFIER="TARGET_GUID=51D4595ED5982DE8E0539011038AD7DB"/>
 <Target TYPE="oracle_emd" NAME="host-8-161.blah.com:3876" DISPLAY_NAME="host-8-161.blah.com:3876" EMD_URL="https://host-8-161.blah.com:3876/emd/main/" TIMEZONE_REGION="" IDENTIFIER="TARGET_GUID=0958C84AFB17CE4D3F9FB85C81250615"/>
 <Target TYPE="oracle_home" NAME="agent13c1_1_host-8-161.blah.com_1639" DISPLAY_NAME="agent13c1_1_host-8-161.blah.com_1639" EMD_URL="https://host-8-161.blah.com:3876/emd/main/" TIMEZONE_REGION="" IDENTIFIER="TARGET_GUID=C13E4BCE40F4509C3FC788A3C08EED68">
 <Property NAME="HOME_TYPE" VALUE="O"/>
 <Property NAME="INVENTORY" VALUE="/u01/app/oraInventory"/>
 <Property NAME="INSTALL_LOCATION" VALUE="/u01/app/agent13/ngc13/agent_13.2.0.0.0"/>
 </Target>
</Targets>

When I look at the hosts in Enterprise Manager they are now marked as up.

 

linux

Thoughts and other questions..

The agent13 user on the primary domain has automatically been given the permission to run read only ldm commands (similar to the privileges that need to be manually applied to the user on Solaris).

Unlike on other platforms (e.g. SuperCluster) the hierachy of LDOMs does not seem to be recorded.

Installing and configuring DTRACE on Oracle Linux SPARC

DTRACE is one of the killer features of Solaris, and allows you to programmatically monitor system statistics and diagnose performance issues.  See https://github.com/opendtrace/toolkit for toolkit scripts so you do not have to write your own.

Dtrace is not shipped with the install media. You need to manually download the rpms from

http://www.oracle.com/technetwork/server-storage/linux/downloads/linux-dtrace-2800968.html

 

 

Dtrace is very kernel version dependent. Do not yum update your kernel without checking that dtrace is available for that release or you will have problems!

You can use yum to install the rpms

[root@host-8-161 sfw]# yum localinstall dtrace*
Loaded plugins: downloadonly, ulninfo
Setting up Local Package Process
Examining dtrace-utils-0.6.0-3.el6.sparc64.rpm: dtrace-utils-0.6.0-3.el6.sparc64
Marking dtrace-utils-0.6.0-3.el6.sparc64.rpm to be installed
Examining dtrace-utils-devel-0.6.0-3.el6.sparc64.rpm: dtrace-utils-devel-0.6.0-3.el6.sparc64
Marking dtrace-utils-devel-0.6.0-3.el6.sparc64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package dtrace-utils.sparc64 0:0.6.0-3.el6 will be installed
---> Package dtrace-utils-devel.sparc64 0:0.6.0-3.el6 will be installed
--> Processing Dependency: libdtrace-ctf-devel > 0.4.0 for package: dtrace-utils-devel-0.6.0-3.el6.sparc64
--> Running transaction check
---> Package libdtrace-ctf-devel.sparc64 0:0.5.0-3.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package Arch Version Repository Size
================================================================================
Installing:
 dtrace-utils sparc64 0.6.0-3.el6 /dtrace-utils-0.6.0-3.el6.sparc64 766 k
 dtrace-utils-devel
 sparc64 0.6.0-3.el6 /dtrace-utils-devel-0.6.0-3.el6.sparc64 77 k
Installing for dependencies:
 libdtrace-ctf-devel
 sparc64 0.5.0-3.el6 public_ol6_latest 15 k

Transaction Summary
================================================================================
Install 2 Packages (+1 Dependent package)

Total size: 857 k
Total download size: 15 k
Installed size: 877 k
Is this ok [y/N]: y
Downloading Packages:
libdtrace-ctf-devel-0.5.0-3.el6.sparc64.rpm | 15 kB 00:00 
warning: rpmts_HdrFromFdno: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Public key for libdtrace-ctf-devel-0.5.0-3.el6.sparc64.rpm is not installed
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
Importing GPG key 0xEC551F03:
 Userid : "Oracle OSS group (Open Source Software group) <build@oss.oracle.com>"
 Fingerprint: 4214 4123 fecf c55b 9086 313d 72f9 7b74 ec55 1f03
 Package : 6:oraclelinux-release-6Server-7.0.8.sparc64 (@anaconda-OracleLinuxServer-201705232044.sparc64/6.7)
 From : /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
Is this ok [y/N]: y
Running Transaction Check
Running Transaction Test
Transaction Test Succeeded
Running Transaction
 Installing : dtrace-utils-0.6.0-3.el6.sparc64 1/3 
 Installing : libdtrace-ctf-devel-0.5.0-3.el6.sparc64 2/3 
 Installing : dtrace-utils-devel-0.6.0-3.el6.sparc64 3/3 
 Verifying : dtrace-utils-devel-0.6.0-3.el6.sparc64 1/3 
 Verifying : libdtrace-ctf-devel-0.5.0-3.el6.sparc64 2/3 
 Verifying : dtrace-utils-0.6.0-3.el6.sparc64 3/3

Installed:
 dtrace-utils.sparc64 0:0.6.0-3.el6 dtrace-utils-devel.sparc64 0:0.6.0-3.el6

Dependency Installed:
 libdtrace-ctf-devel.sparc64 0:0.5.0-3.el6

Complete!

 

 

At this point when you run dtrace it doesn’t show anything useful and has no probes available.

[root@host-8-160 sfw]# dtrace -l
dtrace: module license 'CDDL' taints kernel.
Disabling lock debugging due to kernel taint
 ID PROVIDER MODULE FUNCTION NAME
 1  dtrace                   BEGIN
 2  dtrace                   END
 3  dtrace                   ERROR

 

You need to manually load the kernel modules for the probes and providers you want to use.  There is a list of providers in the the Oracle Linux Dtrace Tutorial manual and the Oracle Linux Dtrace Guide

A summary of what is available at the time of writing (June 2017) is below.

Provider Kernel Module Description
dtrace dtrace Provides probes that relate to DTrace itself, such as BEGIN, ERROR, and END. You can use these probes to initialize DTrace’s state before tracing begins, process its state after tracing has completed, and handle unexpected execution errors in other probes.
fasttrap fasttrap Supports user-space tracing of DTrace-enabled applications.
io sdt Provides probes that relate to data input and output. The io provider enables quick exploration of behavior observed through I/O monitoring tools such as iostat.
proc sdt Provides probes for monitoring process creation and termination, LWP creation and termination, execution of new programs, and signal handling.
profile profile Provides probes associated with an interrupt that fires at a fixed, specified time interval. These probes are associated with the asynchronous interrupt event rather than with any particular point of execution. You can use these probes to sample some aspect of a system’s state.
sched sdt Provides probes related to CPU scheduling. Because CPUs are the one resource that all threads must consume, the sched provider is very useful for understanding systemic behavior.
syscall systrace Provides probes at the entry to and return from every system call. Because system calls are the primary interface between user-level applications and the operating system kernel, these probes can offer you an insight into the interaction between applications and the system.

You can manually load the probes

[root@host-8-160 log]# modprobe -a dtrace profile systrace sdt dt_test fasttrap

However, you may want to write  startup script to automatically load the probes at boot time if the dtrace device exists.

[root@host-8-160 sfw]# cat /etc/sysconfig/modules/dtrace.modules
 

#!/bin/sh
if [ ! -c /dev/dtrace/dtrace ] ; then
         exec /sbin/modprobe -a dtrace profile systrace sdt dt_test
 fi

[root@host-8-160 sfw]# chmod 755 /etc/sysconfig/modules/dtrace.modules

 

Once the module has been loaded into the kernel, you can list all probes using

[root@host-8-160 sfw]# dtrace -l

or for just a single provider

[root@host-8-160 etc]# dtrace -l -P io
 ID  PROVIDER MODULE    FUNCTION NAME
 266 io       vmlinux   end_bio_bh_io_sync done
 267 io       vmlinux   _submit_bh start
 269 io       vmlinux   __wait_on_buffer wait-start
 270 io       vmlinux   __wait_on_buffer wait-done


You may also want to look at the information in the manual about setting the permissions on the dtrace helper device to allow code that runs as a user other than root to be recorded.

Scribbled notes on installing the Oracle database on Oracle Linux 6.7 SPARC

I had a very short time to play with my Oracle Linux SPARC box before I handed it to my customers, so I only had a very quick attempt to install the Oracle RDBMS and start a database. I did only a very basic install using database storage on filesystem, and allowed the installer to create the DB. So these notes are even more scrappy than usual.

While not yet a certified platform, you can download the Oracle Database 12.1.0.2 for Oracle Linux 6.7 SPARC images on e-delivery. There is not a publicly available install document, so I’m going to follow the install guide for Linux

Preparing for the Install

https://docs.oracle.com/database/121/LADBI/olinrpm.htm#LADBI7477

I couldn’t find the pre-installation rpm for Linux-SPARC on ULN. So I am going to have to follow the documentation and hope I have all the packages.

Verify openssh is installed

[root@host-8-161 yum.repos.d]# rpm -qa |grep ssh
 openssh-clients-5.3p1-117.el6.sparc64
 openssh-5.3p1-117.el6.sparc64
 openssh-server-5.3p1-117.el6.sparc64
 libssh2-1.4.2-2.el6_7.1.sparc64

Check that the required packages are installed.

https://docs.oracle.com/database/121/LADBI/pre_install.htm#LADBI7534

I added the following packages..

  • compat-libcap1
  • compat-libstdc++-33

On the installation media the is an additional rpm to install cvuqdisk-1.0.9-1.rpm – but this requires an oracle user….

So.. lets create my user and groups for now

[root@host-8-161 rpm]# groupadd -g 1001 oinstall
 [root@host-8-161 rpm]# groupadd -g 1002 dba
 [root@host-8-161 rpm]# useradd -g dba -G oinstall -s /bin/bash -d /home/oracle/ oracle
 [root@host-8-161 rpm]# passwd oracle

Now retry installing the package

[root@host-8-161 rpm]# rpm -i cvuqdisk-1.0.9-1.rpm

Using default group oinstall to install package

Create directories

[root@host-8-161 rpm]# mkdir /u01
[root@host-8-161 rpm]# mkdir -p /u01/app/oracle
[root@host-8-161 rpm]# chown -R oracle:dba /u01

 

Set the Oracle user resource limits

[root@host-8-161 rpm]# cat /etc/security/limits.conf

# End of file
 oracle soft nofile 1024
 oracle hard nofile 65536
 oracle soft nproc 2047
 oracle hard nproc 16384
 oracle soft stack 10240
 oracle hard stack 10240
 oracle soft memlock 3145728
 oracle hard memlock 3145728

 

  • Set the display and try running runinstaller
  • .. fails with PRVF-0002 Unable to retrieve local node name
  • Added the hostname and IP to the local /etc/hosts and the install continued.
  • Pre-installation checks give warning about kernel parameters and swap size… but it does offer me a fixit script for the kernel parameters. Need to ensure that these changes to parameters are added to /etc/sysctl.conf
  • Ran the fixit, and still some parameters giving a warning – semms etc – I guess these would need a reboot. So the kernel parameters will need reviewing

Things I might want to consider adding to the /etc/sysctl.conf (stolen from another system)


kernel.msgmni = 2878
kernel.msgmax = 8192
kernel.msgmnb = 65536
kernel.shmmni = 4096
kernel.shmmax = 229916494233
kernel.shmall = 28065978

 

After the install completed, the database started ok.

 

 

 

 

Creating LDOMs on Oracle Linux 6.7 SPARC

Lots of things to work out in advance

  1. What disks are available for use by my LDOM? I have a couple of disks, but I’m going to try creating the LDOM virtual disks on logical volumes hosted on the disk /dev/sdc
  2. What networking can I use? This is fairly simple, I only have 1 active network connection on eth0 so this will have to be virtualised
  3. How much resource is in the server, and how much can I give to my guest domain? You can see the total resource available in ldm ls.  I know I have 2 x SPARC M7 CPU, each with 32 cores, and 8 threads per core.
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -ndcv- UART 512 958G 0.0% 0.0% 4h 40m

Use GNU-Parted to partition the disk

Sorry this part was written retrospectively – I had multiple problems with the creating of the device to host the LDOM operating system. This manifested itself as

  1. No stability of boot device. If I gave my LDOM an entire disk to use as a boot device, it would get a boot sector installed in the expected place. On power on/poweroff of the server, the Grub boot loader did use the OS image for the guest LDOM to boot the primary LDOM. I got round this by giving the guest LDOMs slices on a disk.
  2. Not very stable device tree. If you use the /dev/sdX type names to refer to devices in your LDOM definition,  this device  name can change on reboot. So use something more stable like the WWN of the device.

You can see which disks are available using lsscsi

[root@host-8-160 ~]# lsscsi
[0:0:0:0] disk HITACHI H109060SESUN600G A690 /dev/sda 
[0:0:1:0] disk HGST HSCAC2DA4SUN400G A29A /dev/sdb 
[0:0:2:0] disk HGST H101812SFSUN1.2T A770 /dev/sdc 
[0:0:3:0] disk HGST H101812SFSUN1.2T A770 /dev/sdd 
[1:0:0:0] disk HGST HSCAC2DA4SUN400G A122 /dev/sde 
[1:0:1:0] disk HGST HSCAC2DA4SUN400G A122 /dev/sdf 
[8:0:0:0] cd/dvd SUN Remote ISO CDROM 1.01 /dev/sr0 
[9:0:0:0] cd/dvd TEAC DV-W28S-B AT11 /dev/sr1 
[10:0:0:0] disk MICRON eUSB DISK 1112 /dev/sdg

I am going to use one of the 1.2 TB disks as the boot device for the guest LDOM.

I used GNU Parted to label the disk with 2 partitions. The tool works in both GB/MB (1000 bytes to a kb) and GiB/MiB (1024 bytes to a KiB)

parted /dev/sdc

(parted) p
Model: HGST H101812SFSUN1.2T (scsi)
Disk /dev/sdc: 1200GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
 1 1049kB 537GB 537GB ext3 host854
 2 537GB 1075GB 538GB host161

(parted) quit

I created filesystems on the partitions – I don’t think this required, but sometimes OS installers are unhappy if the disk is completely blank.

 

 

[root@host-8-160 ~]# mkfs -t ext4 -L host8161 /dev/sdc2
mke2fs 1.43-WIP (20-Jun-2013)
Filesystem label=host8161
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
32833536 inodes, 131334144 blocks
6566707 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4008 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
 102400000

Allocating group tables: done 
Writing inode tables: done 
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

 

[root@host-8-160 ~]# parted /dev/sdc
GNU Parted 2.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p 
Model: HGST H101812SFSUN1.2T (scsi)
Disk /dev/sdc: 1200GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
 1 1049kB 537GB 537GB ext4 host854
 2 537GB 1075GB 538GB ext4 host161

 

As I had problems with device tree stability, I looked for a more stable naming method. Under the /dev/disk/ directory there are more stable naming interfaces that refer to characteristics that do not change, such as wwn. This is under the /dev/disk/by-id

 

[root@host-8-160 by-id]# ls -l wwn-0x5000cca02d021474-part1
lrwxrwxrwx. 1 root root 10 Jun 12 16:19 wwn-0x5000cca02d021474-part1 -> ../../sdc1
[root@host-8-160 by-id]# ls -l wwn-0x5000cca02d021474-part2
lrwxrwxrwx. 1 root root 10 Jun 12 16:19 wwn-0x5000cca02d021474-part2 -> ../../sdc2

 

 

Add the default services

Enable bridge control – there is a bunch of stuff in the release notes about the difference in virtual switch architecture for ldoms in linux.

http://docs.oracle.com/cd/E37670_01/E86243/html/ConfigureServicesControlDomain.html

The process is to change a file as follows

 

# sed -i '/SUBSYSTEM/ s/^#//' /etc/udev/rules.d/99-vsw.rules

and reboot.

You will not be able to see anything in the output from brctl show until the domain is bound.

Create the Virtual Consoles, Virtual Network Switch and Virtual disk Service

[root@host-8-160 ~]# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
LDom primary does not support dynamic reconfiguration of IO devices
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.
[root@host-8-160 ~]# ldm add-vds primary-vds0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

[root@host-8-160 ~]# ldm add-vsw net-dev=eth0 primary-vsw0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.


[root@host-8-160 ~]# ldm list-services
VCC
 NAME LDOM PORT-RANGE
 primary-vcc0 primary 5000-5100

VSW
 NAME LDOM MACADDRESS NET-DEV DVID|PVID|VIDs
 ---- ---- ---------- ------- --------------
 primary-vsw0 primary 00:14:4f:fb:cd:dc eth0 1|1|--

VDS
 NAME LDOM VOLUME OPTIONS MPGROUP DEVICE
 primary-vds0 primary

Reconfigure primary to free resources for the guest domain

I am going to assign 96 cores to the primary domain and 100GB memory.

[root@host-8-160 ~]# ldm set-vcpu 96 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

------------------------------------------------------------------------------

[root@host-8-160 ~]# ldm set-memory 100G primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

Now reboot to activate the new configuration.

Create the Guest

This guest will have fully virtual I/O.

Create a virtual device to act as the DVD to allow the OS to be booted.

[root@host-8-160 ~]# ldm add-vdsdev /sfw/OL-201705232017-R6-U7-sparc-dvd.iso iso_vol@primary-vds0

Not all ldm commands have been implemented on Linux, so you cannot do some things such as add vcpu by core.. e.g

[root@host-8-160 ~]# ldm add-vcpu --core 16 host-8-161

Usage:
 ldm add-vcpu <number> <ldom>

My domain will be called host-8-161 to match the planned unix hostname.

[root@host-8-160 ~]# ldm add-domain host-8-161
[root@host-8-160 ~]# ldm add-vcpu 96 host-8-161
[root@host-8-160 ~]# ldm add-memory 100G host-8-161
[root@host-8-160 ~]# ldm add-vnet linkprop=phys-state vnet1 primary-vsw0 host-8-161
[root@host-8-160 ~]# ldm add-vdsdev /dev/disk/by-id/wwn-0x5000cca02d021474-part2 boot-8-161@primary-vds0
[root@host-8-160 ~]# ldm add-vdisk boot-8-161 boot-8-161@primary-vds0 host-8-161

I am also going to add the dvd device to allow the OS to be booted from here

[root@host-8-160 ~]# ldm add-vdisk vdisk_iso iso_vol@primary-vds0 host-8-161

Now bind the domain.

[root@host-8-160 ~]# ldm bind host-8-161
At this point you will be able to see output in the brctl show command
[root@host-8-160 ~]# brctl show
bridge name   bridge id              STP enabled  interfaces
              vsw0 8000.0010e08a4806 no           eth0
                                                  vif0.0

Start the domain

[root@host-8-160 ~]# ldm start host-8-161
LDom host-8-161 started

Connect to the console. This is different than on Solaris SPARC in that you use the ldmconsole command. To exit from this you use <ctrl>q

[root@host-8-160 ~]# ldmconsole host-8-161

{0} ok banner

SPARC T7-2, No Keyboard
Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.40.5, 256.0000 GB memory installed, Serial #83519177.
Ethernet address 0:14:4f:fa:66:c9, Host ID: 84fa66c9.


Booting and installing the Guest

Now we have the Open Boot Prom (OBP) ‘ok’ prompt which is familiar to people who work on SPARC Solaris.

We can see what device aliases have been created

{0} ok devalias
vdisk_iso       /virtual-devices@100/channel-devices@200/disk@1
boot-8-161      /virtual-devices@100/channel-devices@200/disk@0
vnet1           /virtual-devices@100/channel-devices@200/network@0
net             /virtual-devices@100/channel-devices@200/network@0
disk            /virtual-devices@100/channel-devices@200/disk@0
virtual-console /virtual-devices/console@1
name            aliases

 

I can now boot from my virtual iso device and install Linux

{0} ok boot vdisk_iso - install

After that the install is similar to the process documented in Installing Oracle Linux for SPARC on a T7-2

You will need to manually configure the networking and hostname, yum updates and install dtrace if required.

Check the name assigned to the virtualised network interface

[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host 
 valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
 link/ether 00:14:4f:fb:91:5d brd ff:ff:ff:ff:ff:ff

Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 and set the correct parameters for your environment.

[root@host-8-161 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:14:4F:FB:91:5D
TYPE=Ethernet
UUID=eb521b4c-7e70-4963-af78-550163d2b214
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=1.2.3.161
PREFIX=22
GATEWAY=1.2.3.1
DNS1=1.2.34.4
DNS2=1.2.34.5
DOMAIN=blah.com
[root@host-8-161 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host-8-161.blah.com

 

Troubleshooting communication problems with the service processor

I couldn’t contact the service processor to save the spconfig

[root@host-8-160 ~]# ldm ls-spconfig
The requested operation could not be performed because the communication
channel between the LDoms Manager and the system controller is down.
The ILOM interconnect may be disabled or down.

[root@host-8-160 ~]# ip addr show usb0
Device "usb0" does not exist.

 

My current settings

-> show /SP/network/interconnect hostmanaged

/SP/network/interconnect
 Properties:
 hostmanaged = true

-> show /SP/network/interconnect state

/SP/network/interconnect
 Properties:
 state = disabled

 

 

It should be..

-> show /SP/network/interconnect hostmanaged

/SP/network/interconnect
 Properties:
 hostmanaged = false

-> show /SP/network/interconnect state

/SP/network/interconnect
 Properties:
 state = enabled

After changing this – the usb0 network device should be available in the operating system.

[root@host-8-160 ~]# ip addr show usb0
10: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
 link/ether 02:21:28:57:47:17 brd ff:ff:ff:ff:ff:ff

Hmm… still not quite right. It doesn’t have a network address assigned.

Try resetting the SP… no difference

rebooted the OS..

 

Success.

[root@host-8-160 ~]# ip addr show usb0
8: usb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
 link/ether 02:21:28:57:47:17 brd ff:ff:ff:ff:ff:ff
 inet 169.254.182.77/24 brd 169.254.182.255 scope global usb0
 valid_lft forever preferred_lft forever
 inet6 fe80::21:28ff:fe57:4717/64 scope link 
 valid_lft forever preferred_lft forever
[root@host-8-160 ~]# ldm ls-spconfig

Installing Oracle Linux for SPARC on a T7-2

Preparation

Booting from the install media

We do not have Oracle Linux integrated with our standard install server so I am going to do this via CDROM redirection.

I have copied the ISO image to a NFS shared directory on my network, now I have to setup the redirection so the server will boot from this.

Make sure the ILOM timeout is set to at least 2 hours

-> show /SP/cli timeout

/SP/cli
 Properties:
 timeout = 0

 

The timeout is expressed in minutes – if it is set to 0 then the session will never timeout.

Make sure the KVMS services are enabled in the ILOM

-> show /SP/services/kvms/ servicestate

/SP/services/kvms
 Properties:
 servicestate = enabled

Set the host_storage_device to remote, and set the remote location to your NFS server

-> cd /SP/services/kvms/host_storage_device
/SP/services/kvms/host_storage_device

-> set mode=remote
Set 'mode' to 'remote'

-> cd remote
/SP/services/kvms/host_storage_device/remote

-> set server_uri=nfs://1.3.4.78:/Factory/OL-201705232017-R6-U7-sparc-dvd.iso
Set 'server_URI' to 'nfs://1.3.4.78:/Factory/OL-201705232017-R6-U7-sparc-dvd.iso'

Verify that the status is operational – if not check that your uri is correctly set

-> show /SP/services/kvms/host_storage_device status

/SP/services/kvms/host_storage_device
 Properties:
 status = operational

 

 

Set the system to not autoboot the existing solaris OS

-> set /HOST/bootmode script="setenv auto-boot? false"

Now start the system and login to the console to monitor

-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y

 

At the ok prompt

{0} ok boot rcdrom - install
Boot device: /pci@308/pci@1/usb@0/hub@1/storage@1/disk@0 File and args: - install

 

GNU GRUB version 2.02~beta3

+----------------------------------------------------------------------------+
 |*Install linux using text mode (use DHCP) | 
 | Install linux using VNC (graphical) mode (use DHCP) |
 | Rescue mode (use DHCP) |
 | |
 | |
 | |
 | |
 | |
 | |
 | |
 | |
 | | 
 +----------------------------------------------------------------------------+

Use the ^ and v keys to select which entry is highlighted. 
 Press enter to boot the selected OS, `e' to edit the commands 
 before booting or `c' for a command-line.

At this point you can select to test your install media – I said no at this point

I selected the install language of English

On the next screen I received an error

Error processing drive: ↑ │ 
 │ ▮ │ 
 │ platform-f03527f8-pci-0009:01:00.0-usb-0:1.3:1.0-scsi-0:0:0:0 ▒ │ 
 │ 1936MB ▒ │ 
 │ MICRON eUSB DISK ▒ │ 
 │ ▒ │ 
 │ This device may need to be reinitialized. ▒ │ 
 │ ▒ │ 
 │ REINITIALIZING WILL CAUSE ALL DATA TO BE LOST! ▒ │ 
 │ ▒ │ 
 │ This action may also be applied to all other disks needing ▒ │ 
 │ reinitialization. ↓ │ 
 │

the eUSB disk is used for the failback miniroot – I tried ‘ignore’ on this

On the next device which is the internal disk I selected ‘re-initialize’

Then I selected the correct timezone ‘Europe/London’

I entered a root password.

On the next screen I selected one disk out of the available devices and allowed it to completely re-initialize.

After that, the install completed in approximately 30 minutes.

Configure Networking

You cannot currently use nm-tool to configure the networking on  Oracle Linux 6 Update 7 (SPARC) as it can conflict with the ldomsmanager packages.

So – manually edit the /etc/sysconfig/network-scripts/ifcfg-eth0 and set the parameters to match your environment.

 

DEVICE=eth0
HWADDR=00:10:E0:8A:48:06
TYPE=Ethernet
UUID=57dfbae6-1cbc-4d25-8f9c-081238563128
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
IPADDR=1.2.3.160
PREFIX=22
GATEWAY=1.2.3.1
DNS1=1.2.3.4
DNS2=1.2.3.5
DOMAIN=blah.com

Restart the network stack.

 

 

[root@localhost network-scripts]# service network restart
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: pps pps0: new PPS source ptp0
ixgbe 0001:01:00.0: registered PHC device on eth0
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Determining if ip address 138.3.8.160 is already in use for device eth0...
[ OK ]
[root@localhost network-scripts]# ixgbe 0001:01:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: None
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

So now I have networking, but still no hostname set… so now I need to edit /etc/sysconfig/network and change the hostname. This will not be picked up until reboot..

[root@host-8-160 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host-8-160.blah.com

 

Yum configuration

Next up.. get yum configured. I’m bound to need more packages so being able to get hold of the latest will be important.

I am going to connect to Oracle’s public yum servers, and my computer can access the internet via a proxy. So all I need to do is tell yum which proxy to use

edit /etc/yum.conf and add the proxy information

proxy=http://1.3.3.194:80

 

Verify that there is a file listing the repositories to use in /etc/yum.repos.d (this should be automatically configured at install)

ls -l /etc/yum.repos.d
total 4
-rw-r--r--. 1 root root 486 Jun 7 16:54 public-yum-ol6.repo

[root@host-8-160 yum.repos.d]# cat /etc/yum.repos.d/public-yum-ol6.repo 
[public_ol6_latest]
name=Oracle Linux $releasever Latest ( SPARC64 )
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/latest/sparc64/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1




[public_ol6_software_collections]
name=Software Collection Library release 1.2 packages for Oracle Linux 6 (SPARC64)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/SoftwareCollections/sparc64/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

ok.. now I can install additional packages as required.

[root@host-8-160 yum.repos.d]# which wget
/usr/bin/which: no wget in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@host-8-160 yum.repos.d]# yum install wget
Loaded plugins: downloadonly, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package wget.sparc64 0:1.12-5.el6_6.1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

<snip>

Post install configuration / Issues from the release notes

In the release notes it says that to upgrade the OFED packages you need to remove them and re-install, also to get a new yum repo file.

 

 

 

 

 

 

vmstat – who is in kthr w status

This post leans heavily on the work of clever people before me, especially this blog post https://blogs.oracle.com/swan/entry/find_out_process_es_with

I had a zone with has been heavily used until recently, but had mostly been quiescent for the past few weeks. In vmstat, it had been noticed that I had approximately 150 lwps in the w state.

# vmstat 5 5
 kthr memory page disk faults cpu
 r b w swap free re mf pi po fr de sr sd sd sd vc in sy cs us sy id
 0 0 188 873445368 54537152 472 699 13 0 0 0 0 20 0 32 0 17621 32317 17008 0 0 99
 0 0 149 863818176 9628832 42 733 38 0 0 0 0 31 0 33 0 19384 147817 16927 20 1 79
 0 0 149 864814392 10389536 125 919 0 0 0 0 0 14 0 20 0 19523 135382 16785 26 1 73
 0 0 149 864914688 10422784 63 544 0 0 0 0 0 16 0 33 0 19072 128355 16572 26 1 73
 0 0 149 863905112 9505416 46 579 0 0 0 0 0 14 0 29 0 19057 130498 16582 25 1 75

 

The vmstat man page says..

 w the number of swapped out lightweight processes (LWPs)
   that are waiting for processing resources to finish.

Hmm..

So I had a look at this blog post which uses mdb and some knowledge of the solaris source to find the PIDs https://blogs.oracle.com/swan/entry/find_out_process_es_with

 

Of course as I’m in a Zone, I need to do my investigations at the Global Zone level.

First get the list of PIDs and their swapped count in hex

 

# echo '::walk proc|::print -t proc_t p_pidp->pid_id p_swapcnt'|mdb -k|awk '{if(NR%2){printf("%s\t",$0);}else{printf("%s\n",$0);}}'|awk '{if($NF!=0){printf("pid: %s\tp_swapcnt: %s\n",$4,$NF);}}'

giving an output like

pid: 0x2f p_swapcnt: 0x1
pid: 0x25 p_swapcnt: 0x3
pid: 0x11 p_swapcnt: 0x1
pid: 0xf p_swapcnt: 0x5
pid: 0xcb7d p_swapcnt: 0x1

which I saved to a text file. Now, the blog post only had 17 to play with.. I’ve got over 150 so I’m not going to be looking up all the individual PIDs by hand. There is almost certainly a more elegant way of doing this, through cunning use of pipe and awk or maybe dtrace, but I was pressed for time.

#!/bin/bash
runcounter=0
while read blah pidder blah1 counter
do
 outpid=`printf "%d\n" $pidder`
 outcounter=`printf "%d\n" $counter`
 echo "Number of LWPS swapped : $outcounter"
 echo "Process=`ps -fp $outpid`" 
 echo "-------------------------------------------"
 runcounter=$(($runcounter+$outcounter))
done < walk.txt 
echo "Total number of lwps in state w: $runcounter"

This gave an output similar to :

<snip>

-------------------------------------------
Number of LWPS swapped : 5
Process= UID PID PPID C STIME TTY TIME CMD
 root 15 1 0 Feb 27 ? 1:22 /lib/svc/bin/svc.startd
-------------------------------------------
Number of LWPS swapped : 1
Process= UID PID PPID C STIME TTY TIME CMD
 root 52093 15 0 Mar 09 console 0:00 /usr/sbin/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p sc7ach00pd01-d2 console login: 
-------------------------------------------
Total number of lwps in state w: 149


From the Solaris Internals manual ( 2.4.1 The Process Structure,  Table 10.3 and 10.3.6 The Memory Scheduler),   processes with p_swapcnt > 0 are those who have been swapped out by the memory scheduler to free up memory pages.  This is a separate operation from page-out, and is relatively inexpensive, though does dramatically affect the process’s performance.  Swapping out a process involves removing all of a process’s thread structures and private pages from memory and setting flags in the process to table to show that this process has been swapped out.  The memory scheduler is started at boot time and doesn’t do anything until the memory is consistently less than desfree memory over a 30 second average.  Desfree is a calculated value https://docs.oracle.com/cd/E53394_01/html/E54818/chapter2-10.html#OSTUNchapter2-103 , set at 1/128th of the memory of the system, at a minimum of 256K.

 

At some point in the recent past, this system suffered extreme memory pressure due to someone starting up a huge SGA + PGA on the system. It looks like the memory scheduler does not automatically swap the processes back in when the memory pressure eases, instead waiting for the process to do ‘something’ and need to run those LWPs (this makes sense – it’s better to not do work unless it’s needed, and as desfree is not actually a lot of memory free, if you’re bumping along that threshhold the last thing you need is the scheduler to un-swap something and tip you back into a memory shortage)

I basically ‘touched’ each one of these pids by using the pfiles command … and now I have no processes sitting in state ‘w’

# vmstat 5 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr sd sd sd vc   in   sy   cs us sy id
 0 0 187 873530456 54500568 471 698 13 0 0 0 0 20 0 32 0 17621 32769 17006 0 0 99
 0 0 0 909716856 43286880 585 1103 0 0 0 0 0 28 0 32 0 18478 206015 16744 2 0 98
 0 0 0 909647064 43275192 62 260 0 0 0 0 0 14  0 27  0 17117 202019 15744 2 0 98

 

 

Expanding a zpool backed by an iSCSI LUN

So, you have a zpool provided by an iscsi LUN which is tight on space, and you’ve done all the tidying you can think of.. what do you do next? Well if you’re lucky, you have space to expand the iscsi LUN and then make it available to your zpool.

First – find the LUN that holds the zpool using zpool status <poolname>

 

# zpool status zonepool
  pool: zonepool
 state: ONLINE
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        zonepool                                 ONLINE       0     0     0
          c0t600144F0A22997000000574BFAA90004d0  ONLINE       0     0     0

errors: No known data errors

Note the lun identifier, starting with c0, ending with ‘d0’

Locate the LUN on your storage appliance. If you are on a ZFS appliance there is a really handy script  in Oracle Support Document 1921605.1 Otherwise you’ll have to use the tools supplied with your storage array or your eyes 😉

So, I’ve located my lun on my ZFS appliance by matching the LUN identifier and then I need to change the LUN size..

shares> select sc1-myfs
shares sc1-myfs> 
shares sc1-myfs> select zoneshares_zonepool 
shares sc1-myfs/zoneshares_zonepool> get lunguid
 lunguid = 600144F0A22997000000574BFAA90004
shares sc1-myfs/zoneshares_zonepool> set volsize=500G
 volsize = 500G (uncommitted)
shares sc1-myfs/zoneshares_zonepool> commit

 

Now I just need to get my zpool to expand into the available space on the lun

# zpool online -e zonepool c0t600144F0A22997000000574BFAA90004d0

And now we’re done

 

Passing boot arguments via the reboot command

This is probably something *so* obvious to most people who use solaris all the time, but I had previously never joined the dots.

The reboot command can pass arguments to the boot program and kernel on restart. If the boot arguments start with a hypen you need to include a delimiter of — (two hyphens). You also may need to quote the command if whitespace is important.

So if you want to issue a boot -r command you could do the 2 step process of bringing the system down to the OBP and then issuing boot -r or you could just

reboot -- -r

To reboot and install solaris from the network

reboot  -- 'net:dhcp - install'