Category Archives: Exadata

Changing Exadata Cell access parameters

One challenge with Exadata cells in a lab environment is that they are secure! This means long lockout times in the event of an incorrect login and tough account-lock settings. You can change these manually, but every time you update your cell there is a chance the settings will be reset.

A more permanent way is to use /opt/oracle.cellos/host_access_control on each storage cell. https://docs.oracle.com/cd/E58626_01/html/E58630/z40036a01393423.html#scrolltoc

For example, if you want to drop the lock time in the event of a failed login from 10 minutes to a more manageable 60 seconds, you would issue the command

/opt/oracle.cellos/host_access_control pam-auth --lock=60

You can combine multiple pam-auth options on the same line. For example, if I also want the cell to remember only one previous password, I could say

/opt/oracle.cellos/host_access_control pam-auth --lock=60 --remember=1


There are a lot of options for this tool – you can set the system back to secure defaults, or make it even more secure, such as locking an account after a single failed login!
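
For instance, locking an account after a single failed attempt while keeping the shorter lock time could look something like the line below. This is a hedged sketch: I am assuming the pam-auth --deny option is present on your cell image, so check the tool's own usage output on your release first.

# option names assumed from the pam-auth documentation - verify against the tool's help on your cell software release
/opt/oracle.cellos/host_access_control pam-auth --deny=1 --lock=60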


Adding an Oracle Exadata Storage Server to Enterprise Manager using the command line


Ok, I’m just noodling around here… I have some ‘spare’ storage servers that are in the same fabric as my SuperCluster and I wanted to discover them in EM.

oracle@odc-em-sc7a:/u01/app/oracle/agent13/agent_inst/sysman/log$ emcli add_target -type=oracle_exadata -name="expbcel09.osc.uk.oracle.com" -host="sc7ach00pd00-d1.osc.uk.oracle.com" -properties="CellName:expbcel09.osc.uk.oracle.com;MgmtIPAddr:138.3.3.82"
Target "expbcel09.osc.uk.oracle.com:oracle_exadata" added successfully
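
To sanity-check the result you can list the Exadata targets back out of EM. A hedged example using emcli get_targets with a type filter (the exact filter syntax may vary between EM releases):

# list registered targets of type oracle_exadata
emcli get_targets -targets="oracle_exadata"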

Changing attributes on ASM Diskgroups

You can see the attributes set on your ASM diskgroups in two ways: via the view v$asm_attribute

SQL> select a.name, b.name, b.value from v$asm_attribute b, v$asm_diskgroup a where a.group_number=b.group_number and a.name='DATAX6C1' ;

Or via asmcmd and the lsattr command

grid@sc7ach00pd00-d1:~$ asmcmd

ASMCMD> cd DATAX6C1
ASMCMD> lsattr -G DATAX6C1 -l

Attributes you might want to pay attention to in an Exadata environment are

  • compatible.advm
  • cell.smart_scan_capable
  • appliance.mode
  • compatible.asm
  • compatible.rdbms

If you manually create a diskgroup via asmca these attributes will not normally be set, so you may want to set them manually.

SQL> ALTER DISKGROUP DATAX6C1 SET ATTRIBUTE 'appliance.mode'='TRUE';

The attribute is set immediately but, based on my experience, it does not come into effect until the disk group has been rebalanced.

SQL> alter diskgroup DATAX6C1 rebalance power 2;
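
To watch the rebalance you can query v$asm_operation until it returns no rows. A minimal sketch, assuming your environment is pointed at the grid home and you are connecting to the ASM instance:

sqlplus -s / as sysasm <<'EOF'
-- outstanding rebalance operations and their estimated completion time
select group_number, operation, state, power, est_minutes from v$asm_operation;
EOF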

Enabling DNFS and configuring a RMAN backup to a ZFS 7320

DNFS Configuration process

This process is based on the setup required to attach a ZFS-BA to an Exadata. Unlike the ZFS-7320, a ZFS-BA has more infiniband links connected to the system and so can support greater throughput.

On the ZFS appliance

Create a new project to hold the backup destination ‘MyCompanyBackuptest’

Edit project ‘MyCompanyBackuptest’
General Tab

→ Set ‘Synchronous write bias’ to Throughput
→ Set ‘Mountpoint’ to /export/mydb

Protocols Tab

→ Add NFS exceptions for all of the ‘MyCompany’ servers for read/write and root access, using ‘Network’ and giving the individual IP addresses.

192.168.28.7/32
192.168.28.6/32
192.168.28.3/32
192.168.28.2/32

Shares Tab

→ Create filesystems backup1 to backup8

On SPARC node

As root

Check that the required kernel parameters are set in /etc/system (done automatically by the ssctuner service)

set rpcmod:clnt_max_conns = 8
set nfs:nfs3_bsize = 131072

Set the suggested ndd parameters by creating a script in /etc/rc2.d so they are applied after every boot.

root@sc5acn01-d1:/etc/rc2.d# cat S99ndd
/usr/sbin/ndd -set /dev/tcp tcp_max_buf 4194304
/usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 2097152
/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 2097152
/usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 16384
/usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q0 16384
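
A couple of hedged follow-up steps: make the script executable so it actually runs at boot, and read the values back (on Solaris, ndd without -set reports the current setting).

# make the boot script executable
chmod +x /etc/rc2.d/S99ndd
# read the current values back to confirm they took effect
/usr/sbin/ndd /dev/tcp tcp_max_buf
/usr/sbin/ndd /dev/tcp tcp_xmit_hiwat
/usr/sbin/ndd /dev/tcp tcp_recv_hiwat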

Create mountpoints for the backup directories

root@sc5acn01-d1:/# for i in 1 2 3 4 5 6 7 8 
do 
mkdir /backup${i} 
done

Add /etc/vfstab entries for the mountpoints

sc5a-storIB:/export/mydb/backup1 - /backup1 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup2 - /backup2 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup3 - /backup3 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup4 - /backup4 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup5 - /backup5 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup6 - /backup6 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup7 - /backup7 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio
sc5a-storIB:/export/mydb/backup8 - /backup8 nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,proto=tcp,vers=3,forcedirectio

Mount the filesystems and set ownership to oracle:dba

root@sc5acn01-d1:/# for i in 1 2 3 4 5 6 7 8 
do 
mount /backup${i} 
done
root@sc5acn01-d1:/# for i in 1 2 3 4 5 6 7 8
do
chown oracle:dba /backup${i}
done
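
As a quick sanity check that the shares mounted with the intended options, something like the following should do (on Solaris, nfsstat -m lists each NFS mount with its negotiated options):

# list each NFS mount with its options
nfsstat -m
# confirm the size and ownership of one of the new mountpoints
df -h /backup1
ls -ld /backup1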

As Oracle

Stop any databases running from the ORACLE_HOME where you want to enable DNFS.
Ensure you can remotely authenticate as sysdba, creating a password file using orapwd if required.
Relink for dnfs support

oracle@sc5acn01-d1:/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib$ make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk dnfs_on

I was a little uncertain about the oranfstab entries, as most examples relate to a ZFS-BA, which has many IB connections and two active heads, whereas the 7320 in this case was configured Active/Passive. I created $ORACLE_HOME/dbs/oranfstab with the following entries.

server:sc5a-storIB path:192.168.28.1
export: /export/mydb/backup1 mount:/backup1
export: /export/mydb/backup2 mount:/backup2
export: /export/mydb/backup3 mount:/backup3
export: /export/mydb/backup4 mount:/backup4
export: /export/mydb/backup5 mount:/backup5
export: /export/mydb/backup6 mount:/backup6
export: /export/mydb/backup7 mount:/backup7
export: /export/mydb/backup8 mount:/backup8

Restart your database and check the alert log to see if DNFS has been enabled by grepping for NFS.
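
For example (the alert log path below is illustrative for an 11.2 diag destination; substitute your own database and instance names):

# path is illustrative - adjust the diag destination, database and instance names for your environment
grep -i "Direct NFS" /u01/app/oracle/diag/rdbms/mydb/MYDB1/trace/alert_MYDB1.log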

Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
 Wed Mar 26 16:50:43 2014

Backup and restore scripts will need to be adjusted to set suggested underscore parameters and to use the new locations.

oracle@sc5acn01-d1:~/mel$ cat dnfs_backup.rman
startup mount
run
{
sql 'alter system set "_backup_disk_bufcnt"=64';
sql 'alter system set "_backup_disk_bufsz"=1048576';
ALLOCATE CHANNEL ch01 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup1/mydb/%U';
ALLOCATE CHANNEL ch02 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup2/mydb/%U';
ALLOCATE CHANNEL ch03 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup3/mydb/%U';
ALLOCATE CHANNEL ch04 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup4/mydb/%U';
ALLOCATE CHANNEL ch05 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup5/mydb/%U';
ALLOCATE CHANNEL ch06 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup6/mydb/%U';
ALLOCATE CHANNEL ch07 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup7/mydb/%U';
ALLOCATE CHANNEL ch08 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup8/mydb/%U';
backup database TAG='dnfs-backup';
backup current controlfile format '/backup/dnfs-backup/backup-controlfile';
}
oracle@sc5acn01-d1:~/mel$ cat dnfs_restore.rman
startup nomount
restore controlfile from '/backup/dnfs-backup/backup-controlfile';
alter database mount;
configure device type disk parallelism 2;
run
{
sql 'alter system set "_backup_disk_bufcnt"=64';
sql 'alter system set "_backup_disk_bufsz"=1048576';
ALLOCATE CHANNEL ch01 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup1/mydb/%U';
ALLOCATE CHANNEL ch02 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup2/mydb/%U';
ALLOCATE CHANNEL ch03 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup3/mydb/%U';
ALLOCATE CHANNEL ch04 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup4/mydb/%U';
ALLOCATE CHANNEL ch05 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup5/mydb/%U';
ALLOCATE CHANNEL ch06 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup6/mydb/%U';
ALLOCATE CHANNEL ch07 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup7/mydb/%U';
ALLOCATE CHANNEL ch08 DEVICE TYPE DISK connect 'sys/welcome1@mydb' format '/backup8/mydb/%U';
restore database from TAG='dnfs-backup';
}

Results of the changes

The timings are based on the longest-running backup piece, rather than the wall-clock time, as the latter could include other RMAN operations such as re-cataloguing files.

          Standard NFS    DNFS
Backup    2:32:09         44:58
Restore   33:42           24:46

So, it’s clear from these results that DNFS can have a huge impact on the backup performance and also a positive effect on restore performance.

If you look at the ZFS analytics for the backup, you can see that we were writing approximately 2 GB/s.

[ZFS analytics screenshot: backup throughput]

Also we were seeing approximately 1.2 GB/s of reads for the restore.

[ZFS analytics screenshot: restore throughput]

Check if a griddisk is being cached

Check that flashcache is created

CellCLI> list flashcache
expacell01_FLASHCACHE normal

Check that the flashcache is in writeback mode

CellCLI> list cell attributes name, flashcachemode
expacell01 WriteBack

List which parts of the flash cache are caching each griddisk

CellCLI> list griddisk attributes name, cachedby
DBFS_DG_CD_02_expacell01 "FD_11_expacell01, FD_08_expacell01, FD_10_expacell01, FD_09_expacell01"
DBFS_DG_CD_03_expacell01 "FD_01_expacell01, FD_00_expacell01, FD_03_expacell01, FD_02_expacell01"
DBFS_DG_CD_04_expacell01 "FD_04_expacell01, FD_07_expacell01, FD_06_expacell01, FD_05_expacell01"
DBFS_DG_CD_05_expacell01 "FD_04_expacell01, FD_07_expacell01, FD_06_expacell01, FD_05_expacell01"
DBFS_DG_CD_06_expacell01 "FD_13_expacell01, FD_14_expacell01, FD_12_expacell01, FD_15_expacell01"
DBFS_DG_CD_07_expacell01 "FD_11_expacell01, FD_08_expacell01, FD_10_expacell01, FD_09_expacell01"
DBFS_DG_CD_08_expacell01 "FD_01_expacell01, FD_00_expacell01, FD_03_expacell01, FD_02_expacell01"
DBFS_DG_CD_09_expacell01 "FD_04_expacell01, FD_07_expacell01, FD_06_expacell01, FD_05_expacell01"
DBFS_DG_CD_10_expacell01 "FD_11_expacell01, FD_08_expacell01, FD_10_expacell01, FD_09_expacell01"
DBFS_DG_CD_11_expacell01 "FD_01_expacell01, FD_00_expacell01, FD_03_expacell01, FD_02_expacell01"
EXPA_DATA_CD_00_expacell01 "FD_13_expacell01, FD_14_expacell01, FD_12_expacell01, FD_15_expacell01"
EXPA_DATA_CD_01_expacell01 "FD_13_expacell01, FD_14_expacell01, FD_12_expacell01, FD_15_expacell01"
EXPA_DATA_CD_02_expacell01 "FD_11_expacell01, FD_08_expacell01, FD_10_expacell01, FD_09_expacell01"
EXPA_DATA_CD_03_expacell01 "FD_01_expacell01, FD_00_expacell01, FD_03_expacell01, FD_02_expacell01"
EXPA_DATA_CD_04_expacell01 "FD_04_expacell01, FD_07_expacell01, FD_06_expacell01, FD_05_expacell01"
EXPA_DATA_CD_05_expacell01 "FD_04_expacell01, FD_07_expacell01, FD_06_expacell01, FD_05_expacell01"
EXPA_DATA_CD_06_expacell01 "FD_13_expacell01, FD_14_expacell01, FD_12_expacell01, FD_15_expacell01"
EXPA_DATA_CD_07_expacell01 "FD_11_expacell01, FD_08_expacell01, FD_10_expacell01, FD_09_expacell01"
EXPA_DATA_CD_08_expacell01 "FD_01_expacell01, FD_00_expacell01, FD_03_expacell01, FD_02_expacell01"
EXPA_DATA_CD_09_expacell01 "FD_04_expacell01, FD_07_expacell01, FD_06_expacell01, FD_05_expacell01"
EXPA_DATA_CD_10_expacell01 "FD_11_expacell01, FD_08_expacell01, FD_10_expacell01, FD_09_expacell01"
EXPA_DATA_CD_11_expacell01 "FD_01_expacell01, FD_00_expacell01, FD_03_expacell01, FD_02_expacell01"
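
On a multi-cell system it is easier to run this across every cell in one go. A hedged sketch using dcli (the cell_group file path is an assumption from a typical deployment; point it at whatever file lists your storage cell hostnames):

# cell_group is assumed to be a file listing your storage cell hostnames
dcli -g /root/cell_group -l root "cellcli -e list griddisk attributes name, cachedby"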

Infiniband healthchecking

I was double-checking the health of my infiniband setup.

A really useful command is ‘ibdiagnet’, available on Exadata, Exalogic and Sparc SuperCluster. It has several command-line options; here I am asking for the simplest test, with 100 packets being used for each link.

#ibdiagnet -c 100

It gives a summary table at the end of the run showing whether any problems were encountered during the execution.

----------------------------------------------------------------
-I- Stages Status Report:
STAGE                                    Errors Warnings
Bad GUIDs/LIDs Check                     0      0
Link State Active Check                  0      0
General Devices Info Report              0      0
Performance Counters Report              0      1
Partitions Check                         0      0
IPoIB Subnets Check                      0      1

Please see /tmp/ibdiagnet.log for complete log
----------------------------------------------------------------

So I have two warning areas on my report which I’ll investigate separately.

IPoIB Subnets Check

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

This warning looks pretty scary; however, it is just saying that the multicast group was created at the 4x SDR rate (10Gb) rather than QDR speed (40Gb), even though all the nodes are QDR. OpenSM defaults to a 10Gb group rate for multicast groups.

Performance counters report

If you look in the logfile /tmp/ibdiagnet.log and search for -W- you will be able to find the Port(s) with the problem

-V- PM Port=9 lid=0x0075 guid=0x002128e8adaba0a0 dev=48438 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 159 80
-W- lid=0x0075 guid=0x002128e8adaba0a0 dev=48438 Port=9
Performance Monitor counter     : Value
link_error_recovery_counter     : 0xff (overflow)

Now to look for the actual device that matches this.

If you look in the file /tmp/ibdiagnet.lst it can give more information about the port with the problem..

{ SW Ports:24 SystemGUID:002128e8adaba0a3 NodeGUID:002128e8adaba0a0 PortGUID:002128e8adaba0a0 VenID:000002C9 DevID:BD36 Rev:000000A0 {SUN DCS 36P QDR sscasw-ib2.blah.com} LID:0075 PN:09 } { CA Ports:02 SystemGUID:0021280001ef508d NodeGUID:0021280001ef508a PortGUID:0021280001ef508b VenID:000002C9 DevID:673C Rev:000000B0 {MT25408 ConnectX Mellanox Technologies} LID:0035 PN:01 } PHY=4x LOG=ACT SPD=10

So this says LID 0x0075, port 9 on switch sscasw-ib2 is the one with the problem.

Login to the switch and check out this port

[root@sscasw-ib2 opensm]# perfquery 0x075 9
# Port counters: Lid 117 port 9
PortSelect:......................9
CounterSelect:...................0x1b01
SymbolErrors:....................0
LinkRecovers:....................256
LinkDowned:......................0
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................0
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................4294967295
RcvData:.........................4294967295
XmtPkts:.........................160825707
RcvPkts:.........................218355355

You can also check this out (look for amber lights!) through the management BUI on the switch. You can also use the BUI to work out which physical port on the switch matches this LID/port combination (13A in this case).

I reseated the cable in port 13A, and cleared the error counter

[root@sscasw-ib2 opensm]# ibclearcounters

Now I’m monitoring the status of the port – if the error count increases again I will replace the cable.
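
A minimal sketch of how I could keep an eye on it from the switch, assuming perfquery remains available there and the LID/port stay the same:

# re-read the LinkRecovers counter for LID 0x0075 port 9 every 10 minutes; Ctrl-C to stop
while true
do
  perfquery 0x075 9 | grep LinkRecovers
  sleep 600
done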


Strange behaviour of listener_networks and scan listener

I had an Exalogic and a Sparc SuperCluster T4-4 connected together by infiniband for a set of tests. This meant that I was able to enable SDP and IP over the infiniband network.

To configure it, I had followed the instructions in the Exalogic Manual.

After setting the listener_networks parameter I checked whether the services had registered correctly with the scan listeners. The expected behaviour is to see all instances registered with all 3 scan listeners.

- Set your environment to the GRID_HOME

- Check which nodes are running the scan listeners, as you can only interrogate a listener from the node it is running on

 oracle@ssca01:~$ srvctl status scan
 SCAN VIP scan1 is enabled
 SCAN VIP scan1 is running on node ssca03
 SCAN VIP scan2 is enabled
 SCAN VIP scan2 is running on node ssca04
 SCAN VIP scan3 is enabled
 SCAN VIP scan3 is running on node ssca01

So I checked the status of LISTENER_SCAN1 (from the node where it was running) and it had no services registered. Strange. I checked all of my scan listeners and only LISTENER_SCAN3 had any services registered:

oracle@ssca01:~$ /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER_SCAN3
LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 06-SEP-2012 15:20:43
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))
 STATUS of the LISTENER
 ------------------------
 Alias                     LISTENER_SCAN3
 Version                   TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
 Start Date                30-AUG-2012 15:59:53
 Uptime                    6 days 23 hr. 20 min. 49 sec
 Trace Level               off
 Security                  ON: Local OS Authentication
 SNMP                      OFF
 Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
 Listener Log File         /u01/app/11.2.0.3/grid/log/diag/tnslsnr/ssca01/listener_scan3/alert/log.xml
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN3)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=137.3.16.87)(PORT=1521)))
 Services Summary...
 Service "ibs" has 3 instance(s).
 Instance "IBS2", status READY, has 1 handler(s) for this service...
 Instance "IBS3", status READY, has 1 handler(s) for this service...
 Instance "IBS4", status READY, has 1 handler(s) for this service...
 The command completed successfully

I was confident my scan address was registered correctly in the DNS

oracle@ssca01:~$ nslookup ssca-scan
 Server:         138.4.34.5
 Address:        138.4.34.5#53
Name:   ssca-scan.blah.com
 Address: 137.3.16.89
 Name:   ssca-scan.blah.com
 Address: 137.3.16.88
 Name:   ssca-scan.blah.com
 Address: 137.3.16.87

I looked on Oracle Support and I could find no other reports of this problem, but then only a small proportion of customers will be running in this configuration.

However, I did find note 1448717.1, which documented a similar problem with the remote_listener parameter.

So, I amended my tnsnames.ora file so that my LISTENER_IPREMOTE alias included the three SCAN IP addresses

#LISTENER_IPREMOTE =
#  (DESCRIPTION =
#    (ADDRESS = (PROTOCOL = TCP)(HOST = ssca-scan.blah.com)(PORT = 1521))
#  )

LISTENER_IPREMOTE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 137.3.16.87)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 137.3.16.88)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 137.3.16.89)(PORT = 1521))
  )

You can trace the PMON registration process by setting the following database event

alter system set events='immediate trace name listener_registration level 3';

and then issue an alter system register; to force PMON to re-register with the listeners.

This will produce a trace file in background_dump_dest.
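
If you are not sure where that is, a quick hedged check from sqlplus (11.2 shown; on later releases the trace files live under the ADR trace directory):

sqlplus -s / as sysdba <<'EOF'
-- where PMON will write the listener_registration trace
show parameter background_dump_dest
EOF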

Looking through this trace file I saw it was still trying to register with the SCAN address.

 Remote listeners:
  0 - (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ssca-scan.osc.uk.oracle.com)(PORT=1521)))
       state=1, err=0
       nse[0]=0, nse[1]=0, nte[0]=0, nte[1]=0, nte[2]=0
       ncre=0
       endp=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ssca-scan.osc.uk.oracle.com)(PORT=1521)))
         flg=0x0 nse=12545

At this point I realised that the tnsnames.ora must only be checked at certain times, such as database startup. So, I restarted my database.

Success! On checking all of my scan listeners, they all had services registered.

Logins, Pam and sorting it out..

A colleague reported a problem with a server: when he tried to ssh to it as the oracle user it constantly failed with:

oracle@ed2qcomp05's password:
Permission denied, please try again.

He could su to oracle as root, and he could ssh as oracle from another server with user equivalency, so he was confident that the home directory was intact.

When we looked in /var/log/secure we saw the following messages:

Nov  7 12:23:20 ed2qcomp05 sshd[27305]: pam_tally2(sshd:auth): user oracle (1000) tally 49, deny 5
Nov  7 12:23:21 ed2qcomp05 sshd[27305]: Failed password for oracle from 10.130.3.216 port 39519 ssh2

In /etc/pam.d/sshd it was configured to deny access after 5 attempts

auth       required     pam_tally2.so deny=5 onerr=fail
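
As an aside, pam_tally2 also supports an unlock_time parameter so the lock clears itself after a while. A hedged example line (the 600-second value is just an illustration; test any change on a non-production system first):

# example only - 600s auto-unlock; adjust deny/unlock_time to your security policy
auth       required     pam_tally2.so deny=5 onerr=fail unlock_time=600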

So, it looked like pam had locked out the oracle user due to multiple failed login attempts. At this point on a production system you should start to investigate who has been trying to access your system; however, we knew what had caused the problem.

First, check how many failed logins pam had counted for that user.

[root@ed2qcomp05 pam.d]# pam_tally2 --user oracle
Login           Failures Latest failure     From
oracle             49    11/07/11 12:23:20  c1718-3-216-mgt.ssclabs.net

Then you reset the ‘tally’ for oracle

[root@ed2qcomp05 pam.d]# pam_tally2 --user oracle --reset
Login           Failures Latest failure     From
oracle             49    11/07/11 12:23:20  c1718-3-216-mgt.ssclabs.net

Verify that it has been reset

[root@ed2qcomp05 pam.d]# pam_tally2 --user oracle
Login           Failures Latest failure     From
oracle              0

And now the oracle user can log in to the system.

Ways of monitoring ASM disk performance

I have the feeling I should call this post ‘part 1’ as I’m writing things as I discover new features and how to use them.

In 11g Release 2, ASMCMD has an iostat feature that allows you to list the reads and writes per disk (either as I/O operations or as bytes)

iostat [-et] [--io] [--suppressheader] [--region] [-G diskgroup] [interval]

-e display error statistics (read/write errors)

-t display time statistics, giving the total I/O time in hundredths of a second (requires TIMED_STATISTICS to be true)

-G diskgroup restrict the output to the named diskgroup

interval repeat the command every X seconds

As with most iostat commands, the first report in a run shows the total statistics up to now, and subsequent reports cover the time since the previous report.

ASMCMD> iostat -t -G data_upper 5
Group_Name  Dsk_Name                    Reads         Writes        Read_Time     Write_Time
DATA_UPPER  DATA_DM01_CD_00_ED2HCELL12  368823115776  398765133824  15652.064264  115609.999195
DATA_UPPER  DATA_DM01_CD_00_ED2HCELL13  360830513152  399665251328  15293.415546  108496.371997

Group_Name  Dsk_Name                    Reads     Writes    Read_Time  Write_Time

DATA_UPPER  DATA_DM01_CD_00_ED2HCELL12  0.00      0.00      0.00       0.00
DATA_UPPER  DATA_DM01_CD_00_ED2HCELL13  0.00      6553.60   0.00       0.00

This information is extracted from the V$ASM_DISK_IOSTAT dynamic performance view.
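
If you would rather query the view directly, here is a minimal sketch (column names as documented for 11.2; run it against the ASM instance and adjust as needed):

sqlplus -s / as sysasm <<'EOF'
-- per-database, per-disk I/O counters maintained by ASM
select dbname, group_number, disk_number, reads, writes, bytes_read, bytes_written
from v$asm_disk_iostat
order by dbname, group_number, disk_number;
EOF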


Looking at Exadata Cell Disks

An Exadata X2-2 storage cell contains 12 disks. You should not need to look at them via iostat to measure performance, as metrics are available via dbconsole, Enterprise Manager and the ASM disk views, but it can sometimes be useful to be able to take a quick look at the system.

Ways to map your Exadata devices to /dev/sd*

lsscsi

[root@ed2hcell01 ~]# lsscsi
[0:0:20:0]   enclosu SUN      HYDE12           0341  -
[0:2:0:0]    disk    LSI      MR9261-8i        2.90  /dev/sda
[0:2:1:0]    disk    LSI      MR9261-8i        2.90  /dev/sdb
[0:2:2:0]    disk    LSI      MR9261-8i        2.90  /dev/sdc
[0:2:3:0]    disk    LSI      MR9261-8i        2.90  /dev/sdd
[0:2:4:0]    disk    LSI      MR9261-8i        2.90  /dev/sde
[0:2:5:0]    disk    LSI      MR9261-8i        2.90  /dev/sdf
[0:2:6:0]    disk    LSI      MR9261-8i        2.90  /dev/sdg
[0:2:7:0]    disk    LSI      MR9261-8i        2.90  /dev/sdh
[0:2:8:0]    disk    LSI      MR9261-8i        2.90  /dev/sdi
[0:2:9:0]    disk    LSI      MR9261-8i        2.90  /dev/sdj
[0:2:10:0]   disk    LSI      MR9261-8i        2.90  /dev/sdk
[0:2:11:0]   disk    LSI      MR9261-8i        2.90  /dev/sdl
[1:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdm
[1:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdn
[1:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdo
[1:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdp
[2:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdq
[2:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdr
[2:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sds
[2:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdt
[3:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdu
[3:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdv
[3:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdw
[3:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdx
[4:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdy
[4:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdz
[4:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdaa
[4:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdab

The LSI devices are the physical hard disks; the MARVELL devices are the flash devices.
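
So a quick, hedged way to watch just the 12 hard disks is to filter iostat on those device names (sda through sdl in the listing above; the exact column layout depends on your sysstat version):

# watch the 12 hard disks only (sda..sdl); 3 reports at 5-second intervals
iostat -x 5 3 | egrep '^Device|^sd[a-l] '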

cellcli

The ‘list celldisk detail’ command gives information about the celldisks. Note, the command output is very long, so pipe it to a file!

[root@ed2hcell01 ~]# cellcli -e list celldisk detail
name:                   CD_00_ed2hcell01
 comment:
 creationTime:           2011-02-15T12:22:02+00:00
 deviceName:             /dev/sda
 devicePartition:        /dev/sda3
 diskType:               HardDisk
 errorCount:             0
 freeSpace:              0
 id:                     8f63def6-0ca7-448a-a3bd-76e5572af5f5
 interleaving:           none
 lun:                    0_0
 raidLevel:              0
 size:                   528.734375G
 status:                 normal

 name:                   CD_01_ed2hcell01
 comment:
 creationTime:           2011-02-15T12:22:06+00:00
 deviceName:             /dev/sdb
 devicePartition:        /dev/sdb3
 diskType:               HardDisk
 errorCount:             0
 freeSpace:              0
 id:                     41f37430-2beb-4062-9a03-ae801f411520
 interleaving:           none
 lun:                    0_1
 raidLevel:              0
 size:                   528.734375G
 status:                 normal

 name:                   CD_02_ed2hcell01
 comment:
 creationTime:           2011-02-15T12:22:06+00:00
 deviceName:             /dev/sdc
 devicePartition:        /dev/sdc
 diskType:               HardDisk
 errorCount:             0
 freeSpace:              0
 id:                     a9fc2181-10fc-498e-a902-9bb5cb47441c
 interleaving:           none
 lun:                    0_2
 raidLevel:              0
 size:                   557.859375G
 status:                 normal

<snip>

Note the devicePartition lines: on the first two disks the celldisks sit on partition /dev/sd*3, to allow space for the software and operating system. On the remaining disks the celldisks have access to the whole disk.

This means that in iostat you cannot separate out the load for different ASM diskgroups whose griddisks share the same celldisk. You can see which celldisks are used by your griddisks by using

[root@ed2hcell01 ~]#cellcli -e list griddisk detail

Again, the output of this command can be long, and it is better piped to a file.

name:                   DATA_DM01_CD_00_ed2hcell01
 availableTo:
 cellDisk:               CD_00_ed2hcell01
 comment:
 creationTime:           2011-02-15T12:26:29+00:00
 diskType:               HardDisk
 errorCount:             0
 id:                     a8c155f9-64c4-4604-adfd-3771ac93c35a
 offset:                 32M
 size:                   392G
 status:                 active
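
If all you want is the griddisk-to-celldisk mapping without the full detail listing, you can ask cellcli for just those attributes. A hedged shortcut using the attribute names shown in the detail output above:

# compact griddisk-to-celldisk mapping
cellcli -e "list griddisk attributes name, cellDisk, offset, size"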