
Creating LDOMs on Oracle Linux 6.7 SPARC

Lots of things to work out in advance

  1. What disks are available for use by my LDOM? I have a couple of disks, but I’m going to try creating the LDOM virtual disks on logical volumes hosted on the disk /dev/sdc
  2. What networking can I use? This is fairly simple, I only have 1 active network connection on eth0 so this will have to be virtualised
  3. How much resource is in the server, and how much can I give to my guest domain? You can see the total resource available with ldm ls. I know I have 2 x SPARC M7 CPUs, each with 32 cores and 8 threads per core, giving 512 virtual CPUs in total.
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary  active  -ndcv-  UART  512   958G    0.0%  0.0%  4h 40m

Use GNU Parted to partition the disk

Sorry, this part was written retrospectively – I had multiple problems creating the device to host the LDOM operating system. These manifested themselves as:

  1. No stable boot device. If I gave my LDOM an entire disk to use as a boot device, it would get a boot sector installed in the expected place, but after powering the server off and on, the GRUB boot loader used the guest LDOM’s OS image to boot the primary LDOM. I got round this by giving the guest LDOMs slices on a disk.
  2. An unstable device tree. If you use /dev/sdX style names to refer to devices in your LDOM definition, the device name can change on reboot, so use something more stable such as the WWN of the device.

You can see which disks are available using lsscsi

[root@host-8-160 ~]# lsscsi
[0:0:0:0]  disk    HITACHI  H109060SESUN600G  A690  /dev/sda
[0:0:1:0]  disk    HGST     HSCAC2DA4SUN400G  A29A  /dev/sdb
[0:0:2:0]  disk    HGST     H101812SFSUN1.2T  A770  /dev/sdc
[0:0:3:0]  disk    HGST     H101812SFSUN1.2T  A770  /dev/sdd
[1:0:0:0]  disk    HGST     HSCAC2DA4SUN400G  A122  /dev/sde
[1:0:1:0]  disk    HGST     HSCAC2DA4SUN400G  A122  /dev/sdf
[8:0:0:0]  cd/dvd  SUN      Remote ISO CDROM  1.01  /dev/sr0
[9:0:0:0]  cd/dvd  TEAC     DV-W28S-B         AT11  /dev/sr1
[10:0:0:0] disk    MICRON   eUSB DISK         1112  /dev/sdg

I am going to use one of the 1.2 TB disks as the boot device for the guest LDOM.

I used GNU Parted to label the disk with two partitions. Note that the tool works in both GB/MB (1000-based units) and GiB/MiB (1024-based units).

parted /dev/sdc

(parted) p
Model: HGST H101812SFSUN1.2T (scsi)
Disk /dev/sdc: 1200GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size   File system  Name     Flags
 1      1049kB  537GB   537GB  ext3         host854
 2      537GB   1075GB  538GB               host161

(parted) quit
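
For reference, the two partitions were created with commands along the following lines. This is reconstructed after the fact rather than a verbatim transcript; the names, sizes and units simply match what the print output above shows:

(parted) mklabel gpt
(parted) mkpart host854 ext3 1MiB 500GiB
(parted) mkpart host161 ext4 500GiB 1000GiB
(parted) p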

I created filesystems on the partitions – I don’t think this is required, but sometimes OS installers are unhappy if the disk is completely blank.

[root@host-8-160 ~]# mkfs -t ext4 -L host8161 /dev/sdc2
mke2fs 1.43-WIP (20-Jun-2013)
Filesystem label=host8161
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
32833536 inodes, 131334144 blocks
6566707 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4008 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
 102400000

Allocating group tables: done 
Writing inode tables: done 
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
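
The first partition was given a filesystem in the same way. I did not keep the output, but the command would have been along these lines (the ext4 type matches the later parted listing; the label is a guess based on the partition name):

[root@host-8-160 ~]# mkfs -t ext4 -L host854 /dev/sdc1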

 

[root@host-8-160 ~]# parted /dev/sdc
GNU Parted 2.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p 
Model: HGST H101812SFSUN1.2T (scsi)
Disk /dev/sdc: 1200GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size   File system  Name     Flags
 1      1049kB  537GB   537GB  ext4         host854
 2      537GB   1075GB  538GB  ext4         host161

 

As I had problems with device tree stability, I looked for a more stable naming method. Under the /dev/disk/ directory there are naming schemes based on characteristics that do not change, such as the device WWN; these live under /dev/disk/by-id.

 

[root@host-8-160 by-id]# ls -l wwn-0x5000cca02d021474-part1
lrwxrwxrwx. 1 root root 10 Jun 12 16:19 wwn-0x5000cca02d021474-part1 -> ../../sdc1
[root@host-8-160 by-id]# ls -l wwn-0x5000cca02d021474-part2
lrwxrwxrwx. 1 root root 10 Jun 12 16:19 wwn-0x5000cca02d021474-part2 -> ../../sdc2

 

 

Add the default services

Enable bridge control – the release notes cover the differences in virtual switch architecture for LDOMs on Linux.

http://docs.oracle.com/cd/E37670_01/E86243/html/ConfigureServicesControlDomain.html

The process is to uncomment the rule in the udev rules file as follows

 

# sed -i '/SUBSYSTEM/ s/^#//' /etc/udev/rules.d/99-vsw.rules

and reboot.

You will not be able to see anything in the output from brctl show until the domain is bound.

Create the Virtual Console Concentrator, Virtual Network Switch and Virtual Disk Service

[root@host-8-160 ~]# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
LDom primary does not support dynamic reconfiguration of IO devices
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.
[root@host-8-160 ~]# ldm add-vds primary-vds0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

[root@host-8-160 ~]# ldm add-vsw net-dev=eth0 primary-vsw0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

[root@host-8-160 ~]# ldm list-services
VCC
 NAME LDOM PORT-RANGE
 primary-vcc0 primary 5000-5100

VSW
 NAME LDOM MACADDRESS NET-DEV DVID|PVID|VIDs
 ---- ---- ---------- ------- --------------
 primary-vsw0 primary 00:14:4f:fb:cd:dc eth0 1|1|--

VDS
 NAME LDOM VOLUME OPTIONS MPGROUP DEVICE
 primary-vds0 primary

Reconfigure primary to free resources for the guest domain

I am going to assign 96 virtual CPUs and 100 GB of memory to the primary domain.

[root@host-8-160 ~]# ldm set-vcpu 96 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

[root@host-8-160 ~]# ldm set-memory 100G primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

Now reboot to activate the new configuration.
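
Once the control domain is back up, ldm ls should reflect the new allocation. The output below is illustrative only (reconstructed, not captured), but the VCPU and MEMORY columns should read 96 and 100G:

[root@host-8-160 ~]# ldm ls
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary  active  -ndcv-  UART  96    100G    0.0%  0.0%  5m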

Create the Guest

This guest will have fully virtual I/O.

Create a virtual device to act as the DVD to allow the OS to be booted.

[root@host-8-160 ~]# ldm add-vdsdev /sfw/OL-201705232017-R6-U7-sparc-dvd.iso iso_vol@primary-vds0

Not all ldm subcommands have been implemented on Linux, so you cannot do some things, such as adding vCPUs by whole core, e.g.

[root@host-8-160 ~]# ldm add-vcpu --core 16 host-8-161

Usage:
 ldm add-vcpu <number> <ldom>

My domain will be called host-8-161 to match the planned Unix hostname.

[root@host-8-160 ~]# ldm add-domain host-8-161
[root@host-8-160 ~]# ldm add-vcpu 96 host-8-161
[root@host-8-160 ~]# ldm add-memory 100G host-8-161
[root@host-8-160 ~]# ldm add-vnet linkprop=phys-state vnet1 primary-vsw0 host-8-161
[root@host-8-160 ~]# ldm add-vdsdev /dev/disk/by-id/wwn-0x5000cca02d021474-part2 boot-8-161@primary-vds0
[root@host-8-160 ~]# ldm add-vdisk boot-8-161 boot-8-161@primary-vds0 host-8-161

I am also going to add the DVD device so that the OS can be booted from it.

[root@host-8-160 ~]# ldm add-vdisk vdisk_iso iso_vol@primary-vds0 host-8-161
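
Before binding, it can be worth sanity-checking the guest’s configuration. I have not verified every ldm option on the Linux port, but assuming the long listing behaves as it does on Solaris, this should show the assigned vcpu, memory, vnet and vdisk resources:

[root@host-8-160 ~]# ldm ls -l host-8-161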

Now bind the domain.

[root@host-8-160 ~]# ldm bind host-8-161
At this point you will be able to see output from the brctl show command.
[root@host-8-160 ~]# brctl show
bridge name  bridge id          STP enabled  interfaces
vsw0         8000.0010e08a4806  no           eth0
                                             vif0.0

Start the domain

[root@host-8-160 ~]# ldm start host-8-161
LDom host-8-161 started

Connect to the console. This is different from Solaris SPARC in that you use the ldmconsole command. To exit, press <ctrl>q.

[root@host-8-160 ~]# ldmconsole host-8-161

{0} ok banner

SPARC T7-2, No Keyboard
Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.40.5, 256.0000 GB memory installed, Serial #83519177.
Ethernet address 0:14:4f:fa:66:c9, Host ID: 84fa66c9.


Booting and installing the Guest

Now we have the OpenBoot PROM (OBP) ‘ok’ prompt, which is familiar to people who work on SPARC Solaris.

We can see what device aliases have been created

{0} ok devalias
vdisk_iso       /virtual-devices@100/channel-devices@200/disk@1
boot-8-161      /virtual-devices@100/channel-devices@200/disk@0
vnet1           /virtual-devices@100/channel-devices@200/network@0
net             /virtual-devices@100/channel-devices@200/network@0
disk            /virtual-devices@100/channel-devices@200/disk@0
virtual-console /virtual-devices/console@1
name            aliases

 

I can now boot from my virtual iso device and install Linux

{0} ok boot vdisk_iso - install

After that, the install is similar to the process documented in Installing Oracle Linux for SPARC on a T7-2.

You will need to manually configure the networking and hostname, apply yum updates, and install DTrace if required.
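
Once networking is up, the updates and DTrace install are standard yum operations. Something along these lines should work, although the dtrace-utils package name is an assumption on my part; check what your configured repositories actually provide:

[root@host-8-161 ~]# yum update -y
[root@host-8-161 ~]# yum install -y dtrace-utils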

Check the name assigned to the virtualised network interface

[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host 
 valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
 link/ether 00:14:4f:fb:91:5d brd ff:ff:ff:ff:ff:ff

Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 file and set the correct parameters for your environment.

[root@host-8-161 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:14:4F:FB:91:5D
TYPE=Ethernet
UUID=eb521b4c-7e70-4963-af78-550163d2b214
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=1.2.3.161
PREFIX=22
GATEWAY=1.2.3.1
DNS1=1.2.34.4
DNS2=1.2.34.5
DOMAIN=blah.com
[root@host-8-161 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host-8-161.blah.com
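
On Oracle Linux 6 the interface settings can then be applied without a reboot using the old-style service command (a reboot achieves the same thing and also picks up the new hostname):

[root@host-8-161 ~]# service network restart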

 

Troubleshooting communication problems with the service processor

I couldn’t contact the service processor to save the spconfig:

[root@host-8-160 ~]# ldm ls-spconfig
The requested operation could not be performed because the communication
channel between the LDoms Manager and the system controller is down.
The ILOM interconnect may be disabled or down.

[root@host-8-160 ~]# ip addr show usb0
Device "usb0" does not exist.

 

My current settings

-> show /SP/network/interconnect hostmanaged

/SP/network/interconnect
 Properties:
 hostmanaged = true

-> show /SP/network/interconnect state

/SP/network/interconnect
 Properties:
 state = disabled

 

 

It should be:

-> show /SP/network/interconnect hostmanaged

/SP/network/interconnect
 Properties:
 hostmanaged = false

-> show /SP/network/interconnect state

/SP/network/interconnect
 Properties:
 state = enabled
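
The properties can be changed from the ILOM CLI; I believe the commands are along these lines, but check the syntax against your ILOM firmware version:

-> set /SP/network/interconnect hostmanaged=false
-> set /SP/network/interconnect state=enabled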

After changing this, the usb0 network device should be available in the operating system.

[root@host-8-160 ~]# ip addr show usb0
10: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
 link/ether 02:21:28:57:47:17 brd ff:ff:ff:ff:ff:ff

Hmm… still not quite right. It doesn’t have a network address assigned.

I tried resetting the SP… no difference. I then rebooted the OS…

Success.

[root@host-8-160 ~]# ip addr show usb0
8: usb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
 link/ether 02:21:28:57:47:17 brd ff:ff:ff:ff:ff:ff
 inet 169.254.182.77/24 brd 169.254.182.255 scope global usb0
 valid_lft forever preferred_lft forever
 inet6 fe80::21:28ff:fe57:4717/64 scope link 
 valid_lft forever preferred_lft forever
[root@host-8-160 ~]# ldm ls-spconfig

Changing the number of CPUs in a zone’s processor set (pooladm) Solaris 11.1

This post is mainly related to SuperCluster configuration, but it can be applied to other Solaris-based systems.

In SuperCluster you can run the Oracle RDBMS in zones, and the zones are CPU capped. You may want to change the number of CPUs assigned to your zone for a couple of reasons:

1) You have changed the number of CPUs available in the LDOM supporting this zone using a tool such as setcoremem, and want to resize the zone to take this into account.

2) You need to change the zone size because you need more or less capacity.

Determine the number of processors that need to be reserved for the global zone (and all of your other zones!)


For SuperCluster you should reserve a minimum of 2 cores per IB HCA in domains with a single HCA, and a minimum of 4 cores in domains with 2 or more HCAs. You also need to take into account other zones running on the system.


Find out how many CPUs are in the global zone

# psrinfo -pv

The physical processor has 16 cores and 128 virtual processors (0-127)

(... snipped output)
The core has 8 virtual processors (488-495)
The core has 8 virtual processors (496-503)
The core has 8 virtual processors (504-511)
SPARC-T5 (chipid 3, clock 3600 MHz)

So in my case there are 512 virtual CPUs, of which I need to reserve at least 32 (4 cores x 8 threads) for my global zone, and as I will also have an app zone on there, possibly a lot more. Let’s say I keep 16 cores (128 virtual CPUs) in the global zone; that leaves 384 for my dbzone.
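
A quick way to get the total virtual CPU count is to count the lines of psrinfo output, since it prints one line per virtual processor:

# psrinfo | wc -l
512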

Get the name of the pool

The SuperCluster db zone creation tools create the pools with logical names based on the zonename. However, the way to be sure what pool is in use is to look at the zone’s definition

# zonecfg -z sc5acn01-d5 export |grep pool
set pool=pool_sc5acn01-d5.blah.uk.mydomain.com_id_6135

If this doesn’t return the name of a pool, your zone is not using resource pools and may be using one of the other methods of capping CPU usage, such as dedicated CPU resources (dedicated-cpu with ncpus=X). If so, stop reading here, as this is not the blog post you are looking for.
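
If you want to check for that case explicitly, zonecfg can report on a single resource type; for example:

# zonecfg -z sc5acn01-d5 info dedicated-cpu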

Find the processor set associated with your pool, either by looking in /etc/pooladm.conf or by running pooladm with no parameters to get the current running config. The pooladm output is far more readable, so that is my preferred method.

# pooladm |more

system default
        string  system.comment
        int     system.version 1
        boolean system.bind-default true
        string  system.poold.objectives wt-load

        pool pool_sc5acn01-d5.blah.uk.mydomain.com_id_6135
                int     pool.sys_id 1
                boolean pool.active true
                boolean pool.default false
                string  pool.scheduler TS
                int     pool.importance 1
                string  pool.comment
                pset    pset_sc5acn01-d5.blah.uk.mydomain.com_id_6135

Here’s an extract from my /etc/pooladm.conf:

<pool name="pool_sc5acn01-d5.blah.uk.mydomain.com_id_6135" active="true" default="false" importance="1" comment="" res="pset_1" ref_id="pool_1">
  <property name="pool.sys_id" type="int">1</property>
  <property name="pool.scheduler" type="string">TS</property>
</pool>
<pool name="pool_default" active="true" default="true" importance="1" comment="" res="pset_-1" ref_id="pool_0">
  <property name="pool.sys_id" type="int">0</property>
  <property name="pool.scheduler" type="string">TS</property>
</pool>
<res_comp type="pset" sys_id="1" name="pset_sc5acn01-d5.blah.uk.mydomain.com_id_6135" default="false" min="8" max="8" units="population" comment="" ref_id="pset_1">

It looks to me that the res="pset_1" in the pool definition points to the ref_id="pset_1" in the pset definition.

OK, so now I know my pset is called pset_sc5acn01-d5.blah.uk.mydomain.com_id_6135 and it currently has 8 CPUs. I also know that my running config and my persistent config are synchronised.

Shutdown the database zone.

This may not be necessary, but since Oracle can make assumptions based on CPU count at startup, I think it is safest.

# zoneadm -z sc5acn01-d5 shutdown

Change the pset configuration

I’m going to do this by changing the config file to make it persistent, as there’s nothing more embarrassing than making a change that is lost on a reboot. I set the processor set to have a minimum of 384 CPUs and a maximum of 384 CPUs.

# poolcfg -c 'modify pset pset_sc5acn01-d5.blah.uk.mydomain.com_id_6135 ( uint pset.min=384 ; uint pset.max=384 )' /etc/pooladm.conf

Check that the change has been applied to the config file:
# grep pset_sc5acn01-d5 /etc/pooladm.conf

Activate the new configuration from the file, then save the running configuration back to it so the two stay in sync

# pooladm -c
# pooladm -s

Now you can run pooladm without any arguments to get the running config. If it all looks OK, go ahead and boot your zone (the boot command follows the output below).

# pooladm

(snipped output)


        pset pset_sc5acn01-d5.blah.uk.mydomain.com_id_6135
                int     pset.sys_id 1
                boolean pset.default false
                uint    pset.min 384
                uint    pset.max 384
                string  pset.units population
                uint    pset.load 723
                uint    pset.size 384
                string  pset.comment
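
Then boot the zone:

# zoneadm -z sc5acn01-d5 boot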

Specifying a disk slice in an AI manifest

After installing T5-8 firmware patch 17264131 to upgrade to version 910b, my AI install was failing on an LDOM:

 

InstallationLogger DEBUG      Executing:  ['/usr/sbin/zpool', 'create', '-f', '-B', 'rpool', 'c2d0']
InstallationLogger ERROR      cannot label 'c2d0':  try using fdisk(1M) and then provide a specific slice
Unable to build pool from specified devices: invalid vdev configuration
InstallationLogger ERROR      Error occurred during execution of 'target-instantiation' checkpoint.

 

As a workaround, I had to amend the manifest file for that specific LDOM and specify the exact disk slice to use…

<target>
  <disk whole_disk="true">
    <disk_name name="c2d0" name_type="ctd"/>
    <slice name="0" in_zpool="rpool"/>
  </disk>
</target>

 

I’m not sure whether whole_disk needs to be set to true or false, as it seems illogical for it to be true when a specific slice is named. On the other hand, it worked, so I’m good to get on with my other tasks.