Category Archives: RAC

Who shot ssca01?

Previously I discovered that my cluster node had been crashed by cssdagent.

But why?
First stop on my tour is the clusterware alert log for the node:

root@ssca01:/u01/app/11.2.0.3/grid/log/ssca01# vi alertssca01.log

The first error of note is:

[cssd(27689)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0.3/grid/log/ssca01/cssd/ocssd.log
2012-09-07 17:30:44.042
[cssd(27689)]CRS-1652:Starting clean up of CRSD resources.
2012-09-07 17:30:44.347

So, time to look at the indicated logfile (since rotated to ocssd.l01):

root@ssca01:/u01/app/11.2.0.3/grid/log/ssca01/cssd# vi ocssd.l01

2012-09-07 17:30:44.042: [    CSSD][5]###################################
2012-09-07 17:30:44.042: [    CSSD][5]clssscExit: CSSD aborting from thread GMClientListener
2012-09-07 17:30:44.042: [    CSSD][5]###################################
2012-09-07 17:30:44.042: [    CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-09-07 17:30:44.042: [    CSSD][5]clssnmSendMeltdownStatus: node ssca01, number 1, has experienced a failure in thread number 15 and is shutting down
2012-09-07 17:30:44.042: [    CSSD][5]clssgmThreadRecovery:recovering clntlsnr mutex
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent:  Sending Event(6), type 6, incarn 240566197
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[1] state = 3, birth = 240566195, unique = 1346338630
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[2] state = 3, birth = 240566197, unique = 1346691206
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[3] state = 3, birth = 240566195, unique = 1346338628
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[4] state = 3, birth = 240566195, unique = 1346338626
2012-09-07 17:30:44.043: [    CSSD][1]clssnmStartNMMon: Received a fault event

Are there any OS errors around that time? According to /var/adm/messages, yes:

Sep  7 17:30:44 ssca01 in.routed[702]: [ID 970160 daemon.notice] unable to get interface flags for sc_ipmp0:1: No such device or address
Sep  7 17:30:44 ssca01 in.routed[702]: [ID 472501 daemon.notice] sc_ipmp0:1 has no ifIndex: No such device or address
Sep  7 17:30:44 ssca01 in.routed[702]: [ID 970160 daemon.notice] unable to get interface flags for bondib0:1: No such device or address
Sep  7 17:30:44 ssca01 in.routed[702]: [ID 472501 daemon.notice] bondib0:1 has no ifIndex: No such device or address

Which is strange. Now in.routed, as one of its many jobs, does some network health checking. Could it have been a bit trigger-happy during a network glitch?
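Lining up the two logs by hand is fiddly because they stamp their lines differently: ocssd.log uses "2012-09-07 17:30:44.042", while /var/adm/messages uses "Sep  7 17:30:44" (note the two spaces before a single-digit day). A minimal sketch of how the same minute could be pulled out of each file is below. The helper name lines_at is my own invention, not part of Grid Infrastructure, and it assumes a grep that supports -F (on older Solaris that may mean /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Sketch only: lines_at is a hypothetical helper, not an Oracle or Solaris tool.

# Print every line of FILE that contains the literal string STAMP.
# -F makes grep treat STAMP as fixed text, so dots and colons are not regex.
# Usage: lines_at STAMP FILE
lines_at() {
    stamp=$1
    file=$2
    grep -F -- "$stamp" "$file"
}

# Example calls (paths as in the post; adjust for your own node):
#   lines_at '2012-09-07 17:30:' /u01/app/11.2.0.3/grid/log/ssca01/cssd/ocssd.l01
#   lines_at 'Sep  7 17:30:'     /var/adm/messages
```

Matching on a prefix that stops at the minute keeps a little context either side of the crash; lengthen the prefix to the second to narrow it down.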

 


How to find out if you’re accessing a RAC database

How can you tell if you’re accessing a RAC database? Simple!

You can tell if it is a cluster database by looking to see if the cluster_database parameter is set:-

SQL> select name, value from v$parameter where name='cluster_database';

NAME                  VALUE
--------------------- ---------------------
cluster_database      TRUE

or

set serveroutput on
BEGIN
  -- TRUE when this instance is running in cluster database mode
  IF dbms_utility.is_cluster_database THEN
    dbms_output.put_line('Running in SHARED/RAC mode.');
  ELSE
    dbms_output.put_line('Running in EXCLUSIVE mode.');
  END IF;
END;
/

You can tell how many instances are active by:-

SQL> SELECT * FROM V$ACTIVE_INSTANCES;

INST_NUMBER INST_NAME
----------- -----------------------
          1 c1718-6-45:AXIOSS1
          2 c1718-6-46:AXIOSS2