Installing Oracle Linux for SPARC on a T7-2

Preparation

Booting from the install media

We do not have Oracle Linux integrated with our standard install server so I am going to do this via CDROM redirection.

I have copied the ISO image to an NFS shared directory on my network. Now I have to set up the redirection so the server will boot from it.

Make sure the ILOM timeout is set to at least 2 hours

-> show /SP/cli timeout

/SP/cli
 Properties:
 timeout = 0

 

The timeout is expressed in minutes. If it is set to 0 then the session will never time out.

Make sure the KVMS services are enabled in the ILOM

-> show /SP/services/kvms/ servicestate

/SP/services/kvms
 Properties:
 servicestate = enabled

Set the host_storage_device to remote, and set the remote location to your NFS server

-> cd /SP/services/kvms/host_storage_device
/SP/services/kvms/host_storage_device

-> set mode=remote
Set 'mode' to 'remote'

-> cd remote
/SP/services/kvms/host_storage_device/remote

-> set server_uri=nfs://1.3.4.78:/Factory/OL-201705232017-R6-U7-sparc-dvd.iso
Set 'server_URI' to 'nfs://1.3.4.78:/Factory/OL-201705232017-R6-U7-sparc-dvd.iso'

Verify that the status is operational. If it is not, check that your server_uri is set correctly.

-> show /SP/services/kvms/host_storage_device status

/SP/services/kvms/host_storage_device
 Properties:
 status = operational

 

 

Set the system to not autoboot the existing Solaris OS

-> set /HOST/bootmode script="setenv auto-boot? false"

Now start the system and log in to the console to monitor

-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y

 

At the ok prompt

{0} ok boot rcdrom - install
Boot device: /pci@308/pci@1/usb@0/hub@1/storage@1/disk@0 File and args: - install

 

GNU GRUB version 2.02~beta3

+----------------------------------------------------------------------------+
 |*Install linux using text mode (use DHCP) | 
 | Install linux using VNC (graphical) mode (use DHCP) |
 | Rescue mode (use DHCP) |
 | |
 | |
 | |
 | |
 | |
 | |
 | |
 | |
 | | 
 +----------------------------------------------------------------------------+

Use the ^ and v keys to select which entry is highlighted. 
 Press enter to boot the selected OS, `e' to edit the commands 
 before booting or `c' for a command-line.

At this point you can choose to test your install media. I said no.

I selected English as the install language.

On the next screen I received an error:

Error processing drive:

platform-f03527f8-pci-0009:01:00.0-usb-0:1.3:1.0-scsi-0:0:0:0
1936MB
MICRON eUSB DISK

This device may need to be reinitialized.

REINITIALIZING WILL CAUSE ALL DATA TO BE LOST!

This action may also be applied to all other disks needing
reinitialization.

The eUSB disk is used for the fallback miniroot, so I chose ‘ignore’ on this one.

On the next device, which is the internal disk, I selected ‘re-initialize’.

Then I selected the correct timezone, ‘Europe/London’.

I entered a root password.

On the next screen I selected one disk out of the available devices and allowed it to completely re-initialize.

After that, the install completed in approximately 30 minutes.

Configure Networking

You cannot currently use nm-tool to configure the networking on Oracle Linux 6 Update 7 (SPARC) as it can conflict with the ldomsmanager packages.

So, manually edit /etc/sysconfig/network-scripts/ifcfg-eth0 and set the parameters to match your environment.

 

DEVICE=eth0
HWADDR=00:10:E0:8A:48:06
TYPE=Ethernet
UUID=57dfbae6-1cbc-4d25-8f9c-081238563128
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
IPADDR=1.2.3.160
PREFIX=22
GATEWAY=1.2.3.1
DNS1=1.2.3.4
DNS2=1.2.3.5
DOMAIN=blah.com
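
As a quick sanity check on values like these, the PREFIX and GATEWAY settings can be cross-checked with Python's standard ipaddress module. This is only a sketch using the example addresses from the file above; the helper name describe_interface is made up for illustration.

```python
import ipaddress


def describe_interface(ipaddr, prefix, gateway):
    """Return the netmask, the network, and whether the gateway is on-link."""
    iface = ipaddress.ip_interface(f"{ipaddr}/{prefix}")
    return {
        "netmask": str(iface.netmask),
        "network": str(iface.network),
        "gateway_on_link": ipaddress.ip_address(gateway) in iface.network,
    }


# Values taken from the ifcfg-eth0 example above.
info = describe_interface("1.2.3.160", 22, "1.2.3.1")
print(info)
```

If gateway_on_link comes back False, the GATEWAY or PREFIX value is probably mistyped.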

Restart the network stack.

 

 

[root@localhost network-scripts]# service network restart
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: pps pps0: new PPS source ptp0
ixgbe 0001:01:00.0: registered PHC device on eth0
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Determining if ip address 138.3.8.160 is already in use for device eth0...
[ OK ]
[root@localhost network-scripts]# ixgbe 0001:01:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: None
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

So now I have networking, but still no hostname set, so I need to edit /etc/sysconfig/network and change the hostname. This will not be picked up until reboot.

[root@host-8-160 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host-8-160.blah.com

 

Yum configuration

Next up: get yum configured. I’m bound to need more packages, so being able to get hold of the latest will be important.

I am going to connect to Oracle’s public yum servers, and my computer can access the internet via a proxy, so all I need to do is tell yum which proxy to use.

Edit /etc/yum.conf and add the proxy information

proxy=http://1.3.3.194:80
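
The proxy line is normally placed in the [main] section of yum.conf. A minimal sketch for checking what is configured, assuming the file parses as plain INI; the helper name yum_proxy and the sample file contents are hypothetical, and on a real system you would pass in the text of /etc/yum.conf.

```python
import configparser


def yum_proxy(conf_text):
    """Return the proxy configured in the [main] section, or None."""
    # Interpolation is disabled so '%' or '$' in values is left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get("main", "proxy", fallback=None)


# Hypothetical minimal yum.conf; a real file has more [main] options.
sample = """\
[main]
cachedir=/var/cache/yum/$basearch/$releasever
gpgcheck=1
proxy=http://1.3.3.194:80
"""
print(yum_proxy(sample))
```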

 

Verify that there is a file listing the repositories to use in /etc/yum.repos.d (this should be automatically configured at install)

ls -l /etc/yum.repos.d
total 4
-rw-r--r--. 1 root root 486 Jun 7 16:54 public-yum-ol6.repo

[root@host-8-160 yum.repos.d]# cat /etc/yum.repos.d/public-yum-ol6.repo 
[public_ol6_latest]
name=Oracle Linux $releasever Latest ( SPARC64 )
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/latest/sparc64/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1




[public_ol6_software_collections]
name=Software Collection Library release 1.2 packages for Oracle Linux 6 (SPARC64)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL6/SoftwareCollections/sparc64/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

OK, now I can install additional packages as required.

[root@host-8-160 yum.repos.d]# which wget
/usr/bin/which: no wget in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@host-8-160 yum.repos.d]# yum install wget
Loaded plugins: downloadonly, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package wget.sparc64 0:1.12-5.el6_6.1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

<snip>

Post install configuration / Issues from the release notes

The release notes say that to upgrade the OFED packages you need to remove and re-install them, and that you also need to get a new yum repo file.

 

 

 

 

 

 


vmstat – who is in kthr w status

This post leans heavily on the work of clever people before me, especially this blog post https://blogs.oracle.com/swan/entry/find_out_process_es_with

I had a zone which had been heavily used until recently, but had mostly been quiescent for the past few weeks. In vmstat, I noticed that approximately 150 LWPs were in the w state.

# vmstat 5 5
 kthr memory page disk faults cpu
 r b w swap free re mf pi po fr de sr sd sd sd vc in sy cs us sy id
 0 0 188 873445368 54537152 472 699 13 0 0 0 0 20 0 32 0 17621 32317 17008 0 0 99
 0 0 149 863818176 9628832 42 733 38 0 0 0 0 31 0 33 0 19384 147817 16927 20 1 79
 0 0 149 864814392 10389536 125 919 0 0 0 0 0 14 0 20 0 19523 135382 16785 26 1 73
 0 0 149 864914688 10422784 63 544 0 0 0 0 0 16 0 33 0 19072 128355 16572 26 1 73
 0 0 149 863905112 9505416 46 579 0 0 0 0 0 14 0 29 0 19057 130498 16582 25 1 75

 

The vmstat man page says:

 w the number of swapped out lightweight processes (LWPs)
   that are waiting for processing resources to finish.

Hmm..

So I had a look at this blog post, which uses mdb and some knowledge of the Solaris source to find the PIDs: https://blogs.oracle.com/swan/entry/find_out_process_es_with

 

Of course as I’m in a Zone, I need to do my investigations at the Global Zone level.

First get the list of PIDs and their swapped count in hex

 

# echo '::walk proc|::print -t proc_t p_pidp->pid_id p_swapcnt'|mdb -k|awk '{if(NR%2){printf("%s\t",$0);}else{printf("%s\n",$0);}}'|awk '{if($NF!=0){printf("pid: %s\tp_swapcnt: %s\n",$4,$NF);}}'

giving an output like

pid: 0x2f p_swapcnt: 0x1
pid: 0x25 p_swapcnt: 0x3
pid: 0x11 p_swapcnt: 0x1
pid: 0xf p_swapcnt: 0x5
pid: 0xcb7d p_swapcnt: 0x1

which I saved to a text file (walk.txt). Now, the blog post only had 17 to play with. I’ve got over 150, so I’m not going to look up all the individual PIDs by hand. There is almost certainly a more elegant way of doing this, through cunning use of pipes and awk or maybe dtrace, but I was pressed for time.

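#!/bin/bash
# Read lines of the form "pid: 0x2f p_swapcnt: 0x1" from walk.txt,
# convert the hex values to decimal and show the owning process.
runcounter=0
while read -r _ pidder _ counter
do
 outpid=$(printf "%d" "$pidder")        # hex PID -> decimal
 outcounter=$(printf "%d" "$counter")   # hex swapped LWP count -> decimal
 echo "Number of LWPS swapped : $outcounter"
 echo "Process=$(ps -fp "$outpid")"
 echo "-------------------------------------------"
 runcounter=$((runcounter + outcounter))
done < walk.txt
echo "Total number of lwps in state w: $runcounter"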

This gave an output similar to :

<snip>

-------------------------------------------
Number of LWPS swapped : 5
Process= UID PID PPID C STIME TTY TIME CMD
 root 15 1 0 Feb 27 ? 1:22 /lib/svc/bin/svc.startd
-------------------------------------------
Number of LWPS swapped : 1
Process= UID PID PPID C STIME TTY TIME CMD
 root 52093 15 0 Mar 09 console 0:00 /usr/sbin/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p sc7ach00pd01-d2 console login: 
-------------------------------------------
Total number of lwps in state w: 149


From the Solaris Internals manual (2.4.1 The Process Structure, Table 10.3 and 10.3.6 The Memory Scheduler), processes with p_swapcnt > 0 are those that have been swapped out by the memory scheduler to free up memory pages. This is a separate operation from page-out, and is relatively inexpensive, though it does dramatically affect the process’s performance. Swapping out a process involves removing all of the process’s thread structures and private pages from memory and setting flags in the process table to show that the process has been swapped out. The memory scheduler is started at boot time and doesn’t do anything until free memory is consistently below desfree over a 30 second average. Desfree is a calculated value (https://docs.oracle.com/cd/E53394_01/html/E54818/chapter2-10.html#OSTUNchapter2-103), set at 1/128th of the memory of the system, with a minimum of 256K.
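
To get a feel for how small desfree is, here is the arithmetic as a short sketch. It only encodes the rule stated above (1/128th of memory, minimum 256K); the function name is made up, and the 1 TiB memory size is a hypothetical figure, not the size of the system in this post.

```python
def desfree_bytes(physmem_bytes):
    """desfree as described above: 1/128th of memory, at least 256K."""
    return max(physmem_bytes // 128, 256 * 1024)


GIB = 1024 ** 3
# 1 TiB is a hypothetical memory size used purely for illustration.
print(desfree_bytes(1024 * GIB) // GIB, "GiB")
```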

 

At some point in the recent past, this system suffered extreme memory pressure due to someone starting up a huge SGA + PGA on the system. It looks like the memory scheduler does not automatically swap the processes back in when the memory pressure eases, instead waiting for the process to do ‘something’ and need to run those LWPs. This makes sense: it’s better not to do work unless it’s needed, and as desfree is not actually a lot of free memory, if you’re bumping along that threshold the last thing you need is the scheduler un-swapping something and tipping you back into a memory shortage.

I basically ‘touched’ each one of these pids by using the pfiles command … and now I have no processes sitting in state ‘w’

# vmstat 5 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr sd sd sd vc   in   sy   cs us sy id
 0 0 187 873530456 54500568 471 698 13 0 0 0 0 20 0 32 0 17621 32769 17006 0 0 99
 0 0 0 909716856 43286880 585 1103 0 0 0 0 0 28 0 32 0 18478 206015 16744 2 0 98
 0 0 0 909647064 43275192 62 260 0 0 0 0 0 14  0 27  0 17117 202019 15744 2 0 98
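
The lookup and the ‘touch’ step can also be sketched in Python. This is an illustrative alternative rather than what I actually ran: it parses lines in the walk.txt format shown earlier, totals the swapped LWPs, and prints the pfiles commands instead of executing them. The function name parse_walk is made up.

```python
def parse_walk(lines):
    """Parse 'pid: 0x2f p_swapcnt: 0x1' lines into (pid, swapcnt) ints."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        result.append((int(fields[1], 16), int(fields[3], 16)))
    return result


sample = [
    "pid: 0x2f p_swapcnt: 0x1",
    "pid: 0xf p_swapcnt: 0x5",
    "pid: 0xcb7d p_swapcnt: 0x1",
]
entries = parse_walk(sample)
print("Total swapped LWPs:", sum(count for _, count in entries))
for pid, _ in entries:
    # Print the command rather than running it, so it can be reviewed first.
    print(f"pfiles {pid} > /dev/null")
```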

 

 

Adding an Oracle Exadata Storage Server to Enterprise Manager using the command line

 

Ok, I’m just noodling around here… I have some ‘spare’ storage servers that are in the same fabric as my SuperCluster and I wanted to discover them in EM.

oracle@odc-em-sc7a:/u01/app/oracle/agent13/agent_inst/sysman/log$ emcli add_target -type=oracle_exadata -name="expbcel09.osc.uk.oracle.com" -host="sc7ach00pd00-d1.osc.uk.oracle.com" -properties="CellName:expbcel09.osc.uk.oracle.com;MgmtIPAddr:138.3.3.82"
Target "expbcel09.osc.uk.oracle.com:oracle_exadata" added successfully

Changing attributes on ASM Diskgroups

You can see the attributes set on your ASM diskgroups in two ways. The first is via the view v$asm_attribute

SQL> select a.name, b.name, b.value from v$asm_attribute b, v$asm_diskgroup a where a.group_number=b.group_number and a.name='DATAX6C1' ;

The second is via asmcmd and its lsattr command

grid@sc7ach00pd00-d1:~$ asmcmd

ASMCMD> cd DATAX6C1
ASMCMD> lsattr -G DATAX6C1 -l

Attributes you might want to pay attention to in an Exadata environment are

  • compatible.advm
  • cell.smart_scan_capable
  • appliance.mode
  • compatible.asm
  • compatible.rdbms

If you manually create a diskgroup via asmca these attributes will not normally be set, so you may want to set them manually.

SQL> ALTER DISKGROUP DATAX6C1 SET ATTRIBUTE 'appliance.mode'='TRUE';

The attribute is set immediately, but based on my experience, it does not come into effect until the disk group has been rebalanced.

SQL> alter diskgroup DATAX6C1 rebalance power 2;

Creating a ramdisk in Solaris 11

Ramdisks are a great way to ‘prove’ that it’s not the performance of the underlying disk device that is stopping a process from writing a file quickly (it doesn’t prove anything about the filesystem though…). Ramdisks are transient and are lost on system reboot. They also consume memory on your system, so if you make them too large you can cause yourself other problems.

Creating a Ramdisk

The ramdiskadm command is used to create a ramdisk. In this example I am creating a 20G ramdisk called ‘idisk’

# ramdiskadm -a idisk 20G

Then you create the filesystem on the ramdisk (in this case UFS)

# newfs /dev/ramdisk/idisk

newfs: construct a new file system /dev/ramdisk/idisk: (y/n)? y
Warning: 2688 sector(s) in last cylinder unallocated
/dev/ramdisk/idisk:    41942400 sectors in 6827 cylinders of 48 tracks, 128 sectors
        20479.7MB in 427 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
........
super-block backups for last 10 cylinder groups at:
 40997024, 41095456, 41193888, 41292320, 41390752, 41489184, 41587616,
 41686048, 41784480, 41882912

Now that you have a filesystem, you can mount it at the correct location

# mkdir /export/home/tuxedo/DATA2
# mount /dev/ramdisk/idisk /export/home/tuxedo/DATA2 

Remember to set the ownership/permissions to allow the non-root users to write to the device

# chown tuxedo:oinstall /export/home/tuxedo/DATA2

Maintaining Ramdisks

You can check if a ramdisk exists by just running ramdiskadm without parameters

# ramdiskadm

Block Device                                                  Size  Removable 
/dev/ramdisk/idisk                                     21474836480    Yes
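
The sizes reported by ramdiskadm and newfs can be cross-checked with a little arithmetic. This is just a sketch using the numbers from the output above, and it assumes newfs is counting 512-byte sectors.

```python
SECTOR_BYTES = 512  # assumed sector size for the newfs report

ramdisk_bytes = 21474836480   # Size column from ramdiskadm
newfs_sectors = 41942400      # sector count from newfs

ramdisk_gib = ramdisk_bytes / 1024 ** 3
newfs_mib = newfs_sectors * SECTOR_BYTES / 1024 ** 2
print(f"ramdisk: {ramdisk_gib:.1f} GiB, filesystem: {newfs_mib:.1f} MiB")
```

Both figures come out at roughly 20 GiB, which is a handy reminder of how much memory the ramdisk is holding.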

You can remove a ramdisk by unmounting the filesystem and using ramdiskadm -d

# umount /export/home/tuxedo/DATA2 
# ramdiskadm -d idisk

Expanding a zpool backed by an iSCSI LUN

So, you have a zpool provided by an iSCSI LUN which is tight on space, and you’ve done all the tidying you can think of. What do you do next? Well, if you’re lucky, you have space to expand the iSCSI LUN and then make it available to your zpool.

First – find the LUN that holds the zpool using zpool status <poolname>

 

# zpool status zonepool
  pool: zonepool
 state: ONLINE
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        zonepool                                 ONLINE       0     0     0
          c0t600144F0A22997000000574BFAA90004d0  ONLINE       0     0     0

errors: No known data errors

Note the LUN identifier, starting with ‘c0’ and ending with ‘d0’.

Locate the LUN on your storage appliance. If you are on a ZFS appliance there is a really handy script in Oracle Support Document 1921605.1. Otherwise you’ll have to use the tools supplied with your storage array or your eyes 😉

So, I’ve located my LUN on my ZFS appliance by matching the LUN identifier, and now I need to change the LUN size.

shares> select sc1-myfs
shares sc1-myfs> 
shares sc1-myfs> select zoneshares_zonepool 
shares sc1-myfs/zoneshares_zonepool> get lunguid
 lunguid = 600144F0A22997000000574BFAA90004
shares sc1-myfs/zoneshares_zonepool> set volsize=500G
 volsize = 500G (uncommitted)
shares sc1-myfs/zoneshares_zonepool> commit
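
Matching the device name from zpool status against the lunguid on the appliance can be sketched as below. This assumes the device name always has the form c<N>t<GUID>d<N>, as in the example above; the function name is made up.

```python
import re


def lun_guid_from_device(device):
    """Extract the GUID from a name like c0t<GUID>d0, or return None."""
    match = re.fullmatch(r"c\d+t([0-9A-Fa-f]+)d\d+", device)
    return match.group(1).upper() if match else None


device = "c0t600144F0A22997000000574BFAA90004d0"
lunguid = "600144F0A22997000000574BFAA90004"
print(lun_guid_from_device(device) == lunguid)
```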

 

Now I just need to get my zpool to expand into the available space on the LUN

# zpool online -e zonepool c0t600144F0A22997000000574BFAA90004d0

And now we’re done

 

Installing Cisco AnyConnect VPN on Ubuntu 16.04

I was struggling setting up a new VPN to connect to my servers at the office as vpnsetup.sh was failing

# ./vpnsetup.sh 
Installing Cisco AnyConnect Secure Mobility Client...
Extracting installation files to /tmp/vpn.0Zgby3/vpninst625702875.tgz...
Unarchiving installation files to /tmp/vpn.0Zgby3...
Starting Cisco AnyConnect Secure Mobility Client Agent...
Failed to start vpnagentd.service: Unit vpnagentd.service not found.

I found a bunch of articles on the internet saying that this was due to missing libraries so started with the first batch of recommendations…

# apt install -y lib32z1 lib32ncurses5

This still didn’t work.

So I tried the next one, which was to also install the network-manager-openconnect package and reload the daemons

# apt install network-manager-openconnect

# systemctl daemon-reload

Success!

# ./vpnsetup.sh 
Installing Cisco AnyConnect Secure Mobility Client...
Removing previous installation...
mv: cannot stat '/opt/cisco/vpn/*.log': No such file or directory
Extracting installation files to /tmp/vpn.yUyv15/vpninst922924093.tgz...
Unarchiving installation files to /tmp/vpn.yUyv15...
Starting Cisco AnyConnect Secure Mobility Client Agent...
Warning: vpnagentd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Done!

 

 

 

Really simple HTTP server with python

I hit a situation where I needed to be able to serve a firmware update via HTTP and didn’t want to configure a full Apache (or similar) webserver.

You can create a very simple HTTP server using Python.

Move the file you want to share to a directory that doesn’t contain anything you wouldn’t want being shared with the rest of your network

# cp Sun_System_Firmware-9_5_1_c-SPARC_M5-32+M6-32.pkg /meltmp

Change directory to the directory you want to share and start the webserver (I’m choosing to use the port 8080)

# cd /meltmp

# python -m SimpleHTTPServer 8080
Serving HTTP on 0.0.0.0 port 8080 ...
10.10.7.40 - - [19/May/2016 16:34:35] "GET /Sun_System_Firmware-9_5_1_c-SPARC_M5-32+M6-32.pkg HTTP/1.1" 200 -

This server will run until you Ctrl-C out of the session, writing logging information to the screen. You could of course nohup/background it, but I think this is really designed as a quick and dirty solution, so I would not leave it running long term.
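
SimpleHTTPServer is the Python 2 module name. On Python 3 the equivalent one-liner is python3 -m http.server 8080. If you would rather pin the shared directory explicitly than rely on the current directory, a minimal sketch (assuming Python 3.7 or later; make_server is a made-up helper name) looks like this:

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8080, bind="0.0.0.0"):
    """Build an HTTP server that serves only files under `directory`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((bind, port), handler)


# Usage (blocks until interrupted with Ctrl-C):
#   make_server("/meltmp").serve_forever()
```

Binding to a specific address instead of 0.0.0.0 limits which network interface the files are exposed on.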

This works on Solaris and Linux

Redirecting Oracle Management agent to another OMS

I have an Enterprise Manager 13.1 system that is regularly reset to a ‘clean’ state using ZFS snapshots. If you already have the management agents installed on other hosts, you have the choice of:

  1. De-installing the existing agent and installing a clean one
  2. Persuading the agent to talk to your ‘new’ server

A way of doing 2) is to follow the steps outlined in the MOS note EM 12c : Redirecting Oracle Management Agent to Another Oracle Management Service (Doc ID 1390717.1) which despite being for 12c appears to work for Enterprise Manager 13.

 

agent13@$ rm -rf agent_inst
agent13@$ mkdir agent_inst
agent13@$ /u01/app/agent13/agent_13.1.0.0.0/sysman/install/agentDeploy.sh \
AGENT_BASE_DIR=/u01/app/agent13 OMS_HOST=odc-em-ssc \
EM_UPLOAD_PORT=1159 AGENT_REGISTRATION_PASSWORD=welcome1 \
AGENT_INSTANCE_HOME=/u01/app/agent13/agent_inst \
AGENT_PORT=3872 -configOnly

 

This takes about 5 minutes to re-configure the agent to talk to the new server and then promote the targets. You will then be prompted to re-run the root.sh associated with the agent’s Oracle Home.