Category Archives: Technical

TCPAttemptFails

A customer reported receiving a lot of these errors in their logs.

TCPAttemptFails is defined (in the SNMP MIB-II, RFC 1213) as:

“The number of times that TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times that TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.”

When you look in netstat -s you can also see this counter incrementing –

[root@host-8-149 net]# netstat -s |grep 'failed connection'
    439395 failed connection attempts
[root@host-8-149 net]# netstat -s |grep 'failed connection'
    439462 failed connection attempts
[root@host-8-149 net]# netstat -s |grep 'failed connection'
    439502 failed connection attempts

This is occurring because some process is trying to open a connection and is getting a reset packet in response, which transitions the connection straight to CLOSED.
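
You can also watch the raw counter straight from the kernel. A minimal sketch, assuming a Linux host with /proc/net/snmp (the counter is the AttemptFails field on the Tcp: lines):

while sleep 5
do
    awk '/^Tcp:/ { if (!n) { for (i=1; i<=NF; i++) name[i]=$i; n=1 }
                   else    { for (i=1; i<=NF; i++) if (name[i]=="AttemptFails") print $i }}' /proc/net/snmp
done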

Using tcpdump to look for packets with the reset flag set on the public network interface:

[root@host-8-149 net]# tcpdump -ieno2 -n -v 'tcp[tcpflags] & (tcp-rst) != 0'
dropped privs to tcpdump
tcpdump: listening on eno2, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
179 packets received by filter
41 packets dropped by kernel
[root@host-8-149 net]#

No packets are showing up – so it must be local to the server.

If I look on loopback, however:

[root@host-8-149 net]# tcpdump -ilo -n -v 'tcp[tcpflags] & (tcp-rst) != 0'
dropped privs to tcpdump
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
16:10:03.623336 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    127.0.0.1.9411 > 127.0.0.1.42450: Flags [R.], cksum 0xff00 (correct), seq 0, ack 2066902277, win 0, length 0
16:10:03.625842 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    127.0.0.1.9411 > 127.0.0.1.42454: Flags [R.], cksum 0xbe62 (correct), seq 0, ack 3236259820, win 0, length 0
16:10:03.626131 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    127.0.0.1.9411 > 127.0.0.1.42456: Flags [R.], cksum 0x01b1 (correct), seq 0, ack 3517060063, win 0, length 0
16:10:03.782660 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    127.0.0.1.9411 > 127.0.0.1.42486: Flags [R.], cksum 0x5428 (correct), seq 0, ack 4287861592, win 0, length 0
16:10:03.880794 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    127.0.0.1.9411 > 127.0.0.1.42504: Flags [R.], cksum 0x2ae2 (correct), seq 0, ack 2701138720, win 0, length 0

So something running on this system is trying to connect to port 9411 on this server, via the loopback interface.

When I look in netstat there is nothing listening on port 9411:

[root@host-8-149 proc]# netstat -anpe |grep 9411
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN      0          1094116    506938/sshd: root@p 
tcp6       0      0 ::1:6010                :::*                    LISTEN      0          1094115    506938/sshd: root@p 
unix  2      [ ACC ]     STREAM     LISTENING     1094117  506938/sshd: root@p  /tmp/ssh-7RSGI64Efz/agent.506938

(The lines that do match are sshd’s port-6010 forwarding sockets: grep is matching the substring 9411 inside their inode numbers, not a listening port.) So this leads me to think there is a misconfigured application on the system, which is either missing a process that should be listening on port 9411, or is trying to connect to the ‘wrong’ port number. For what it’s worth, 9411 is the default port of the Zipkin tracing collector, which would fit a tracing client configured to talk to a collector that isn’t running.
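
To catch the guilty process, one option is to sample the client sockets as they appear. A rough sketch, assuming iproute2’s ss is available (the connections are short-lived, so even a tight poll like this can miss some):

while true
do
    ss -ntap 'dst 127.0.0.1:9411' | grep -v '^State'
    sleep 0.1
done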

Hacking SLOB to run on Solaris

Kevin Closson’s Silly Little Oracle Benchmark https://kevinclosson.net/slob/ has not been ported to Solaris, so getting it working is not as easy as it should be. Kevin himself suggests using a small Linux host to run the tool if your database is running on an unsupported operating system.

I’m going to track the changes I make here.

setup.sh

First problem – grep!

./setup.sh test 16
SLOB 2.4.0
FATAL : 2018.05.15-14:40:57 : Usage : ./setup.sh.orig: <tablespace name> <number of SLOB schemas to create and load>
FATAL : 2018.05.15-14:40:57 : Option 2 must be an integer

This is caused by the function f_is_int relying on GNU grep behaviour: the \? (optional match) in its regular expression is a GNU extension to basic regular expressions that the default Solaris grep does not understand. The simplest fix is to change the grep in this script to point to /usr/gnu/bin/grep.
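
An alternative that avoids the external grep dependency altogether would be a pure-shell integer check. A sketch, not SLOB’s actual code:

f_is_int() {
    local s="${1#-}"               # strip one optional leading minus sign
    case "$s" in
        '' | *[!0-9]*) return 1 ;; # empty, or contains a non-digit
        *)             return 0 ;;
    esac
}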

Next problem..

Once this change was made, I was able to create the first schema (yay!), but when it came to creating the remaining 15, it failed.

NOTIFY : 2018.05.15-14:37:27 : Waiting for background batch 1. Loading up to user11
FATAL : 2018.05.15-14:37:29 : 
FATAL : 2018.05.15-14:37:29 : f_flag_abort: Triggering abort
FATAL : 2018.05.15-14:37:29 : 
FATAL : 2018.05.15-14:37:30 : 
FATAL : 2018.05.15-14:37:30 : f_flag_abort: Triggering abort
FATAL : 2018.05.15-14:37:30 : 
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : f_flag_abort: Triggering abort
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : f_check_abort_flag: discovered abort flag
FATAL : 2018.05.15-14:37:40 : 
FATAL : 2018.05.15-14:37:40 : Aborting SLOB setup. See /export/home/oracle/mel/SLOB/cr_tab_and_load.out

Handily, there is an error message in the logfile:

ALTER TABLE cf1 MINIMIZE RECORDS_PER_BLOCK
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00039: error during periodic action
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
FATAL : 2018.05.15-14:37:40 :
FATAL : 2018.05.15-14:37:40 : f_setup: Failed to load user4 SLOB table
FATAL : 2018.05.15-14:37:40 :

I’m going to assume the clue is in the error and increase the PGA aggregate limit

SQL> alter system set pga_aggregate_limit=36G scope=both sid='*';
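
To confirm the new value has taken effect, something like this works (assuming a local sysdba connection):

sqlplus -s "/ as sysdba" <<EOF
show parameter pga_aggregate_limit
EOF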

After this change, the script ran to the end successfully.

For reference, here is the complete set of changes to setup.sh:

$ diff setup.sh setup.sh.orig
30,31d29
< 
< 
86c84
< if ( ! echo $s | /usr/gnu/bin/grep -q "^-\?[0-9]*$" )
---
> if ( ! echo $s | grep -q "^-\?[0-9]*$" )
250c248
< if ( echo "$cdb" | /usr/gnu/bin/grep -q "YES" > /dev/null 2>&1 )
---
> if ( echo "$cdb" | grep -q "YES" > /dev/null 2>&1 )
254c252
< if ( echo "$cdb" | /usr/gnu/bin/grep -q "NO" > /dev/null 2>&1 )
---
> if ( echo "$cdb" | grep -q "NO" > /dev/null 2>&1 )
410c408
< done | sqlplus -s "$constring" 2>&1 | tee -a $fname | /usr/gnu/bin/grep -i "dropped" | wc -l | while read num_processed
---
> done | sqlplus -s "$constring" 2>&1 | tee -a $fname | grep -i "dropped" | wc -l | while read num_processed

runit.sh

So – we already know that the scripts need GNU grep, especially in the function f_is_int. The safest option will be to change every occurrence of grep to /usr/gnu/bin/grep.
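
One way to do that wholesale, keeping a pristine copy for the diff later, is GNU sed (a sketch; Solaris 11 ships it as /usr/gnu/bin/sed, and the substitution is blunt, so review the result):

cp runit.sh runit.sh.orig
/usr/gnu/bin/sed -i 's|\bgrep\b|/usr/gnu/bin/grep|g' runit.sh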

Try running it with a single schema

 ./runit.sh 1

The duration of the run is based on the slob.conf variable RUN_TIME. The default is 300 seconds, but I am dropping it to 90 while I am doing debugging.

This ran successfully (I think!). Certainly AWR reports were created.

There was one obvious failure – the mpstat output was not generated.

$ cat mpstat.out 
mpstat: -P expects a number
Usage: mpstat [-aqm] [-A core|soc|bin] [-k key1,[key2,...]] [-o num] [-p | -P processor_set] [-T d|u] [-I statfile | -O statfile ] [interval [count]]

Ok – so this is the line causing the problem.

 ( mpstat -P ALL 3 > mpstat.out 2>&1) &

On Linux, mpstat without the -P flag gives you summary output, and -P ALL separates the output by processor. On Solaris the default is already one line per CPU, so I just need to change the line to

 ( mpstat 3 > mpstat.out 2>&1) &

I also checked for other invocations of mpstat.
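
If you wanted a single copy of the script to cover both operating systems, a small uname guard would do it (a sketch, not what I committed):

case "$(uname -s)" in
    SunOS) ( mpstat 3 > mpstat.out 2>&1 ) & ;;
    *)     ( mpstat -P ALL 3 > mpstat.out 2>&1 ) & ;;
esac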

Next - try with 2 sessions

./runit.sh 2

Arrgh. That fails with

: List of monitored sqlplus PIDs written to /tmp/.SLOB.2018.05.15.153524/13174.f_wait_pids.out.
usage: ps [ -aAdefHlcjLPyZ ] [ -o format ] [ -t termlist ]
 [ -u userlist ] [ -U userlist ] [ -G grouplist ]
 [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ] [-h lgrplist]
 'format' is one or more of:
 user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
 pri opri pcpu pmem vsz rss rssprivate rssshared osz nice class time etime stime zone zoneid env
 f s c lwp nlwp psr tty addr wchan fname comm args projid project pset lgrp

It wasn’t obvious where the fault was happening, so let’s add the -x flag to the first line of the script (#!/bin/bash -x) to try and find the line causing the problem.

++ sed /PID/d
++ wc -l
+++ cat /tmp/.SLOB.2018.05.15.153852/15451.f_wait_pids.out
++ sed 's/[^0-9]//g'
++ ps -p 32526 32527
usage: ps [ -aAdefHlcjLPyZ ] [ -o format ] [ -t termlist ]
 [ -u userlist ] [ -U userlist ] [ -G grouplist ]
 [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ] [-h lgrplist]
 'format' is one or more of:
 user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
 pri opri pcpu pmem vsz rss rssprivate rssshared osz nice class time etime stime zone zoneid env
 f s c lwp nlwp psr tty addr wchan fname comm args projid project pset lgrp
++ return 0
+ tmp=0

So there is something about the ps -p command that is not happy on Solaris. I can reproduce the problem at my command line by giving ps -p multiple space-separated pids.


# ps -p 7 8
usage: ps [ -aAdefHlcjLPyZ ] [ -o format ] [ -t termlist ].....

If we look at the ps man page on Solaris, it says

Some options accept lists as arguments. Items in a list can be either
separated by commas or else enclosed in quotes and separated by commas
or spaces. Values for proclist and grplist must be numeric.

So this means the following will work, but not the format used within runit.sh

# ps -p 7,8
 PID TTY TIME CMD
 7 ? 127:47 zpool-rp
 8 ? 1:39 kmem_tas
# ps -p "7 8"
 PID TTY TIME CMD
 7 ? 127:47 zpool-rp
 8 ? 1:39 kmem_tas

 

So – I need to find the lines with the ps command and change the format of the pid list they are given. Both of the following occur in the function f_wait_pids:

while ( ps -p $pidstring > /dev/null 2>&1 )

 ps -fp $pidstring

But they are actually being presented with the list of pids built up on line 1440:

sqlplus_pids="${sqlplus_pids} $!"

So – how do I go about getting either a comma or a quote into the string that is passed to the function f_wait_pids?

Well, rather than destroying something that Kevin may rely on later, I added a line below line 1440 to build a second pid list with commas:

 melsqlplus_pids="${melsqlplus_pids},$!"

This is pretty ugly, as it means my string leads with a comma. However, Solaris’s ps doesn’t seem to care about that.

Now I need to add this to the function call on line 1493:

if ( ! f_wait_pids "$(( SCHEMAS * THREADS_PER_SCHEMA ))" "$RUN_TIME" "$WORK_LOOP" "$sqlplus_pids" "$melsqlplus_pids" )

Of course I need to add something to pick up that new fifth argument, so within the function itself:

local melpidstring="$5"

while ( ps -p "$melpidstring" > /dev/null 2>&1 )

ps -fp "$melpidstring"

(Quoting $melpidstring passes the whole list to ps as a single argument, which is the quoted form the man page allows.) So this all seems ok.. but then I discover some more little rats in f_count_pids.

So once again I create my own custom copy of the pid list. There is probably a MUCH simpler way to do this; I’m just building it up as I go along.
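
For instance, converting the existing space-separated list at the point of use would probably have avoided the parallel variable entirely (an untested sketch):

ps -p "$(echo $pidstring | tr ' ' ',')"

So – here are the changes I have made to runit.sh to get it to execute without error: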

 

oracle@sc8avm-25:~/mel/SLOB$ diff runit.sh runit.sh.orig
1c1
< #!/bin/bash 
---
> #!/bin/bash
45c45
< if ( ! echo "$s" | /usr/gnu/bin/grep -q "^-\?[0-9]*$" ) 
---
> if ( ! echo "$s" | grep -q "^-\?[0-9]*$" ) 
275c275
< for string in 'iostat -xm 3' 'mpstat 3' 'vmstat 3'
---
> for string in 'iostat -xm 3' 'mpstat -P ALL 3' 'vmstat 3'
300c300
< ls -l /proc/${tmp}/fd | /usr/gnu/bin/grep deleted
---
> ls -l /proc/${tmp}/fd | grep deleted
416c416
< sqlplus $user/${user}${non_admin_connect_string} <<EOF 2>/dev/null | sed 's/^.* FATAL/FATAL/g' | /usr/gnu/bin/grep FATAL > $tmpfile
---
> sqlplus $user/${user}${non_admin_connect_string} <<EOF 2>/dev/null | sed 's/^.* FATAL/FATAL/g' | grep FATAL > $tmpfile
457c457
< if ( /usr/gnu/bin/grep FATAL "$tmpfile" > /dev/null 2>&1 )
---
> if ( grep FATAL "$tmpfile" > /dev/null 2>&1 )
485d484
< local melpidstring="$5"
488d486
< local meltmpfile="${SLOB_TEMPDIR}/${RANDOM}.MEL${FUNCNAME}.out"
502d499
< echo "$melpidstring" > $meltmpfile 2>&1
508d504
< f_msg NOTIFY "List of monitored sqlplus PIDs with commas written to ${meltmpfile}."
514c510
< tmp=`f_count_pids "$meltmpfile"`
---
> tmp=`f_count_pids "$tmpfile"`
542,544c538,539
< echo "This is the pidstring value $pidstring"
< echo "This is the melpidstring value $melpidstring"
< while ( ps -p "$melpidstring" > /dev/null 2>&1 )
---
> 
> while ( ps -p $pidstring > /dev/null 2>&1 )
551c546
< ps -fp "$melpidstring"
---
> ps -fp $pidstring
875c870
< if ( ! echo "$tmp" | /usr/gnu/bin/grep -q '\-' > /dev/null 2>&1 )
---
> if ( ! echo "$tmp" | grep -q '\-' > /dev/null 2>&1 )
1397c1392
< ( mpstat 3 > mpstat.out 2>&1) &
---
> ( mpstat -P ALL 3 > mpstat.out 2>&1) &
1446d1440
< melsqlplus_pids="${melsqlplus_pids},$!"
1496c1490,1491
< if ( ! f_wait_pids "$(( SCHEMAS * THREADS_PER_SCHEMA ))" "$RUN_TIME" "$WORK_LOOP" "$sqlplus_pids" "$melsqlplus_pids" )
---
> 
> if ( ! f_wait_pids "$(( SCHEMAS * THREADS_PER_SCHEMA ))" "$RUN_TIME" "$WORK_LOOP" "$sqlplus_pids" )

Tomorrow – time to get someone else to run it and see if it behaves how they expect.

 

Dimstat and add on stats

In my working life I regularly use a tool called Dimstat, a Unix performance monitoring tool which stores its statistics in a MySQL database. The tool is written by a former colleague and is available from http://dimitrik.free.fr/

It comes with ‘built in’ support for gathering standard Unix statistics such as vmstat and mpstat, but one of its great features is the ability to add new statistics collection methods. It can handle single-line output (e.g. vmstat) or multi-line output like mpstat. Let’s say we want to gather filesystem utilisation information based on the output of df -k. This will be a multi-line STAT, as we will have more than one filesystem reported.

Creating your script

First you create the script that gathers the information you want. You must remember to include a record separator in your output (here, the blank line from echo "") so dimstat knows each sample is complete, and the script must also accept the sampling interval as an argument.

So here’s my basic script; you may want to adjust the filesystems I am excluding from the list (or make the egrep more elegant!)

# cat mel_df.sh
#!/bin/bash
# Every $1 seconds: print mountpoint, total KB, used KB and available KB
# for each filesystem of interest, then a blank line as the record separator.
while true
do
df -k |egrep -v 'fd|mnttab|objfs|sharetab|Filesystem|volatile|proc|devices|contract|dev' | sort | awk  '{ print $6 " " $2 " " $3 " " $4}'
echo ""
sleep $1
done

Test your script interactively

# ./mel_df.sh 5
/logpool 1717567488 31 1262824069
/var/fmw/app 1717567488 454729292 1262824069
/rpool 429391872 73 164531945
/ 429391872 14039895 164531945
/var 429391872 125827192 164531945
/var/share 429391872 195 164531945
/export 429391872 32 164531945
/export/home 429391872 35 164531945
/export/home/oracle 429391872 1415887 164531945
/export/home/otd_user 429391872 35 164531945
/export/home/weblogic 429391872 36 164531945
/zones 429391872 32 164531945
/zones/otd-zone 429391872 35 164531945
/tmp 235826528 48 235826480

/logpool 1717567488 31 1262824069
/var/fmw/app 1717567488 454729292 1262824069
/rpool 429391872 73 164531769
/ 429391872 14039895 164531769
/var 429391872 125827196 164531769
/var/share 429391872 195 164531769
/export 429391872 32 164531769
/export/home 429391872 35 164531769
/export/home/oracle 429391872 1415887 164531769
/export/home/otd_user 429391872 35 164531769
/export/home/weblogic 429391872 36 164531769
/zones 429391872 32 164531769
/zones/otd-zone 429391872 35 164531769
/tmp 235826528 48 235826480

Now you copy the script into /etc/STATsrv/bin on the hosts you want to capture the statistics from.

Then you edit the /etc/STATsrv/access file and add a line pointing to your script, giving it the name ‘DF_check’:

command  DF_check       /etc/STATsrv/bin/mel_df.sh

On the host with the STATsrv daemon we can check whether this stat is advertised and available:

 ./STATcmd -h localhost -c STAT_LIST
STAT *** OK CONNECTION 0 sec.
STAT *** LIST COMMAND (STAT_LIST)
STAT: vmstat
STAT: mpstat
STAT: netstat
STAT: ForkExec
STAT: MEMSTAT
STAT: tailX
STAT: ioSTAT.sh
STAT: netLOAD.sh
STAT: netLOAD
STAT: psSTAT
STAT: UserLOAD
STAT: ProcLOAD
STAT: bsdlink
STAT: bsdlink.sh
STAT: sysinfo
STAT: SysINFO
STAT: Siostat
STAT: ProjLOAD
STAT: PoolLOAD
STAT: TaskLOAD
STAT: ZoneLOAD
STAT: IOpatt
STAT: CPUSet
STAT: UDPstat
STAT: DF_check
STAT *** LIST END (STAT_LIST)

We can also test whether the STATsrv daemon can run the script:

 ./STATcmd -h localhost -c "DF_check 1"
STAT *** OK CONNECTION 0 sec.
STAT *** OK COMMAND (cmd: DF_check)
/logpool 1717567488 31 891420077
/var/fmw/app 1717567488 826058863 891420077
/rpool 429391872 73 55609192
/ 429391872 10794969 55609192
/var 429391872 237518790 55609192
/var/share 429391872 169 55609192
/export 429391872 32 55609192
/export/home 429391872 35 55609192
/export/home/mel 429391872 2871626 55609192
/export/home/oracle 429391872 136058 55609192
/export/home/weblogic 429391872 36 55609192
/tmp 194877488 296 194877192

/logpool 1717567488 31 891420077
/var/fmw/app 1717567488 826058863 891420077
/rpool 429391872 73 55609192
/ 429391872 10794969 55609192
/var 429391872 237518790 55609192
/var/share 429391872 169 55609192
/export 429391872 32 55609192
/export/home 429391872 35 55609192
/export/home/mel 429391872 2871626 55609192
/export/home/oracle 429391872 136058 55609192
/export/home/weblogic 429391872 36 55609192
/tmp 194877488 296 194877192

Declaring your script to the Dimstat server

There are two ways to declare your script to the server: via the GUI, or by importing a stat description file (the format of these files makes this an option for experienced users only).

Via the GUI you select ADD-on STATS -> Integrate new ADD-on STAT

Enter the name of your STAT (DF_check) and complete the information about the column names and data types.

[screenshot: the dim_STAT ADD-on STAT definition form]

Once you have declared your add-on stat, you should be able to start a new collect on the host using your new statistic. Once it has collected some data, the button for your statistic will become visible in the Analyze page.

Sample stat description file

# =======================================================================
# DF_check: dim_STAT New STAT Description
# =======================================================================
DF_check
4
1
DF_check Statistic(s)
DF_check %i


# =======================================================================
# Column: v_df_check_att (mountpoint)
# =======================================================================
v_df_check_att
64
1
mountpoint
mountpoint
0
# =======================================================================
# Column: v_column4 (size_kb)
# =======================================================================
v_column4
1
2
size_kb
size_kb
0
# =======================================================================
# Column: v_column5 (used_kb)
# =======================================================================
v_column5
1
3
used_kb
used_kb
0
# =======================================================================
# Column: v_column6 (free_kb)
# =======================================================================
v_column6
1
4
free_kb
free_kb
0

mp3splt

mp3splt is a really useful little tool for chopping big mp3s into smaller ones. I use it mainly to split audio books into finer-grained sections for easier navigation in the car or on my iPod.

To achieve this I use the following command

C:\Program Files (x86)\mp3splt>mp3splt d:\temp\audio\TheAdventureoftheBlueCarbuncle.mp3 -a -t 30.00 -d d:\temp\audio\bluecarbuncle

This splits the mp3 into 30-minute chunks (-t 30.00), auto-adjusting each cut point to fall in a silent part (-a), and writes the output files into the given directory (-d), creating it if necessary.

CURL – curl: (9) Server denied you to change to the given directory

I hit this error trying to import some VMs into OVM.

To get more detail, I tried the command from the command line:

curl -v "ftp://root:blahroot@192.168.8.192/var/tmp/taxud-disk1.vmdk"
* About to connect() to 192.168.8.192 port 21
* Trying 192.168.8.192... connected
* Connected to 192.168.8.192 (192.168.8.192) port 21
< 220 (vsFTPd 2.0.5)
> USER root
< 331 Please specify the password.
> PASS blahroot
< 230 Login successful.
> PWD
< 257 "/root"
* Entry path is '/root'
> CWD var
< 550 Failed to change directory.
* Server denied you to change to the given directory
* Connection #0 to host 192.168.8.192 left intact
curl: (9) Server denied you to change to the given directory
> QUIT
< 221 Goodbye.

This is because the pathname I have given is relative rather than absolute, so I was unknowingly trying to change directory to /root/var/tmp, which did not exist.

To give an absolute pathname, you need an extra slash:

curl "ftp://root:blahroot@192.168.8.192//var/tmp/taxud-disk1.vmdk"
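
The RFC 1738 spelling of the same thing uses %2F for the leading slash of the absolute path; as far as I know, curl accepts either form:

curl "ftp://root:blahroot@192.168.8.192/%2Fvar/tmp/taxud-disk1.vmdk"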

Strange behaviour of listener_networks and scan listener

I had an Exalogic and a SPARC SuperCluster T4-4 connected together by InfiniBand for a set of tests. This meant I was able to enable SDP and IP over the InfiniBand network.

To configure it, I had followed the instructions in the Exalogic Manual.

After setting the listener_networks parameter I checked whether the services had registered correctly with the scan listeners. The expected behaviour is to see all instances registered with all 3 scan listeners.

– Set your environment to the GRID_HOME

– Check which nodes are running the scan listeners, as you can only interrogate a listener from the node it is running on:

 oracle@ssca01:~$ srvctl status scan
 SCAN VIP scan1 is enabled
 SCAN VIP scan1 is running on node ssca03
 SCAN VIP scan2 is enabled
 SCAN VIP scan2 is running on node ssca04
 SCAN VIP scan3 is enabled
 SCAN VIP scan3 is running on node ssca01

So I checked the status of LISTENER_SCAN1 (from ssca03, where it runs), and it had no services registered. Strange. I checked all of my listeners, and only LISTENER_SCAN3 had any services registered.
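
A quick way to sweep all three is a loop like this (a sketch; the node-to-listener mapping comes from the srvctl output above, and it assumes ssh equivalence between the nodes):

for pair in "ssca03 LISTENER_SCAN1" "ssca04 LISTENER_SCAN2" "ssca01 LISTENER_SCAN3"
do
    set -- $pair
    echo "== $2 on $1 =="
    ssh "$1" "/u01/app/11.2.0.3/grid/bin/lsnrctl status $2" | /usr/gnu/bin/grep -A4 "Services Summary"
done

Here is the one listener that did have services registered: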

oracle@ssca01:~$ /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER_SCAN3
LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 06-SEP-2012 15:20:43
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))
 STATUS of the LISTENER
 ------------------------
 Alias                     LISTENER_SCAN3
 Version                   TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
 Start Date                30-AUG-2012 15:59:53
 Uptime                    6 days 23 hr. 20 min. 49 sec
 Trace Level               off
 Security                  ON: Local OS Authentication
 SNMP                      OFF
 Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
 Listener Log File         /u01/app/11.2.0.3/grid/log/diag/tnslsnr/ssca01/listener_scan3/alert/log.xml
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN3)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=137.3.16.87)(PORT=1521)))
 Services Summary...
 Service "ibs" has 3 instance(s).
 Instance "IBS2", status READY, has 1 handler(s) for this service...
 Instance "IBS3", status READY, has 1 handler(s) for this service...
 Instance "IBS4", status READY, has 1 handler(s) for this service...
 The command completed successfully

I was confident my scan address was registered correctly in the DNS

oracle@ssca01:~$ nslookup ssca-scan
 Server:         138.4.34.5
 Address:        138.4.34.5#53
 Name:   ssca-scan.blah.com
 Address: 137.3.16.89
 Name:   ssca-scan.blah.com
 Address: 137.3.16.88
 Name:   ssca-scan.blah.com
 Address: 137.3.16.87

I looked on Oracle Support and I could find no other reports of this problem, but then only a small proportion of customers will be running in this configuration.

However, I did find MOS note 1448717.1, which documents a similar problem with the remote_listener parameter.

So, I amended my tnsnames.ora file so that my LISTENER_IPREMOTE alias included the 3 SCAN IP addresses:

#LISTENER_IPREMOTE =
#  (DESCRIPTION =
#    (ADDRESS = (PROTOCOL = TCP)(HOST = ssca-scan.blah.com)(PORT = 1521))
#  )

LISTENER_IPREMOTE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 137.3.16.87)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 137.3.16.88)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 137.3.16.89)(PORT = 1521))
  )

You can trace the PMON registration process by setting the following database event

alter system set events='immediate trace name listener_registration level 3';

and then issuing an alter system register; to force PMON to re-register with the listeners.

This will produce a trace file in background_dump_dest.
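
To find the resulting trace quickly, searching by content is easier than guessing the filename (a sketch; the diag path is an assumption, so adjust it for your diagnostic_dest):

grep -l "Remote listeners" $ORACLE_BASE/diag/rdbms/*/*/trace/*.trc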

Looking through this trace file, I saw it was still trying to register using the SCAN hostname rather than the individual IP addresses.

 Remote listeners:
  0 - (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ssca-scan.osc.uk.oracle.com)(PORT=1521)))
       state=1, err=0
       nse[0]=0, nse[1]=0, nte[0]=0, nte[1]=0, nte[2]=0
       ncre=0
       endp=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ssca-scan.osc.uk.oracle.com)(PORT=1521)))
         flg=0x0 nse=12545

At this point I realised that the tnsnames.ora must only be checked at certain times, such as database startup. So, I restarted my database.

Success! On checking all of my scan listeners, they all had services registered.

Zoning a Brocade Switch Using WWNs

Way back in the mists of time I used to use port-based zoning on Brocade switches; however, I started having problems with this on newer storage systems (almost certainly pilot error!). I needed to zone some switches for a customer’s piece of work, and this time I thought I’d get with the future and use WWN-based zoning.

So, in my setup I have 2 hosts, each with 2 connections to the switch, and 2 storage arrays, each with 1 connection to the switch.

swd77:admin> switchshow
switchName:     swd77
switchType:     34.0
switchState:    Online
switchMode:     Native
switchRole:     Principal
switchDomain:   1
switchId:       fffc01
switchWwn:      10:00:00:05:1e:02:a2:08
zoning:         OFF
switchBeacon:   OFF

Area Port Media Speed State
==============================
  0   0   id    N4   Online    F-Port  20:14:00:a0:b8:29:f5:56 <- Storage Array 1
  1   1   id    N4   Online    F-Port  20:16:00:a0:b8:29:cd:b4 <- Storage Array 2
  2   2   id    N4   Online    F-Port  21:00:00:24:ff:20:3a:f6 <- Host A
  3   3   id    N4   Online    F-Port  21:00:00:24:ff:20:3a:e0 <- Host A
  4   4   --    N4   No_Module
  5   5   --    N4   No_Module
  6   6   id    N4   No_Light
  7   7   id    N4   No_Light
  8   8   id    N4   Online    F-Port  21:00:00:24:ff:20:3b:92 <- Host B
  9   9   id    N4   Online    F-Port  21:00:00:24:ff:25:6d:ac <- Host B
 10  10   id    N4   No_Light
 11  11   id    N4   No_Light
 12  12   id    N4   No_Light
 13  13   id    N4   No_Light
 14  14   --    N4   No_Module
 15  15   --    N4   No_Module

Create aliases for your hosts and storage arrays

swd77:admin> alicreate host1_a,"21:00:00:24:ff:20:3b:92"
swd77:admin> alicreate host1_b,"21:00:00:24:ff:25:6d:ac"
swd77:admin> alicreate host2_a,"21:00:00:24:ff:20:3a:f6"
swd77:admin> alicreate host2_b,"21:00:00:24:ff:20:3a:e0"
swd77:admin> alicreate "a6140","20:14:00:a0:b8:29:f5:56"
swd77:admin> alicreate "b6140","20:16:00:a0:b8:29:cd:b4"

Create zones to include your aliases

swd77:admin> zonecreate "port2","host1_a; a6140; b6140"
swd77:admin> zonecreate "port3","host1_b; a6140; b6140"
swd77:admin> zonecreate "port8","host2_a;  a6140; b6140"
swd77:admin> zonecreate "port9","host2_b;  a6140; b6140"
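
At this point it is worth a sanity check before committing; Brocade’s zoneshow will list the defined aliases and zones (output omitted here):

swd77:admin> zoneshow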

Create a configuration for your zones and save it

swd77:admin> cfgcreate "customer1","port2; port3; port8; port9"
swd77:admin> cfgsave
You are about to save the Defined zoning configuration. This
action will only save the changes on Defined configuration.
Any changes made on the Effective configuration will not
take effect until it is re-enabled.
Do you want to save Defined zoning configuration only?  (yes, y, no, n): [no] yes

When you’re happy with your configuration, enable it.

swd77:admin> cfgenable customer1
You are about to enable a new zoning configuration.
This action will replace the old zoning configuration with the
current configuration selected.
Do you want to enable 'customer1' configuration  (yes, y, no, n): [no] y
zone config "customer1" is in effect
Updating flash ...

Check at the OS level to see if you can see all your required volumes.
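
On a Linux host, for example, forcing a link reset on each FC port and then listing the SCSI devices is enough to confirm (a sketch; it assumes the standard fc_host sysfs tree and the lsscsi package, and a Solaris host would use cfgadm -al and format instead):

for h in /sys/class/fc_host/host*/issue_lip
do
    echo 1 > "$h"     # trigger a loop initialisation / rescan on this port
done
lsscsi                # list the SCSI devices now visible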

Android Calendar Synchronization

My work phone has just been upgraded to an HTC Legend. The phone is very attractive, the screen clear and bright, and the choice of applications dizzying.

I did discover one thing that Android does less well than the Nokia E71: calendar synchronization. The native app seems to be totally geared around using Google Calendar, which is fine for personal use, but no good if your company uses another calendar technology and prevents you from storing your corporate diary in external services.

The answer I’ve found is the application at http://www.hypermatix.com/products/calendar_sync_for_android?q=faq, which is able to connect to my corporate calendar and use it to populate the local calendar on my phone. It is currently only in beta, but it seems to be an effective solution to the problem.