Who shot ssca01?

So previously I discovered that my cluster node was crashed by cssdagent.

But why?
First stop on my tour is the clusterware alert log for the node:

root@ssca01:/u01/app/11.2.0.3/grid/log/ssca01# vi alertssca01.log

The first error of note is:

[cssd(27689)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0.3/grid/log/ssca01/cssd/ocssd.log
2012-09-07 17:30:44.042
[cssd(27689)]CRS-1652:Starting clean up of CRSD resources.
2012-09-07 17:30:44.347
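When the alert log is long, the pointer to the detail file can be pulled out with a few lines of script instead of paging through vi. Here is a minimal sketch in Python, assuming the message layout seen in the excerpt above. The function name `find_detail_pointers` and the regular expression are my own illustration, not part of any Oracle tooling.

```python
import re

# Matches lines such as
#   [cssd(27689)]CRS-1656:... Details at (:CSSSC00012:) in /path/to/ocssd.log
# The pattern is an assumption based only on the excerpt in this post.
POINTER = re.compile(
    r"\[(?P<daemon>\w+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):"
    r".*Details at \((?P<tag>:[A-Z0-9]+:)\) in (?P<path>\S+)"
)

def find_detail_pointers(lines):
    """Return (code, tag, path) for every alert line that names a detail file."""
    hits = []
    for line in lines:
        m = POINTER.search(line)
        if m:
            hits.append((m.group("code"), m.group("tag"), m.group("path")))
    return hits

# The two messages from the excerpt above.
sample = [
    "[cssd(27689)]CRS-1656:The CSS daemon is terminating due to a fatal error; "
    "Details at (:CSSSC00012:) in /u01/app/11.2.0.3/grid/log/ssca01/cssd/ocssd.log",
    "[cssd(27689)]CRS-1652:Starting clean up of CRSD resources.",
]

for code, tag, path in find_detail_pointers(sample):
    print(code, tag, path)
```

Only the CRS-1656 line carries a pointer, so the CRS-1652 line is skipped. The tag (here `:CSSSC00012:`) is the string to search for once the detail file is open.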

So, time to look at the indicated logfile (which by the time I looked had presumably been rotated to ocssd.l01):

root@ssca01:/u01/app/11.2.0.3/grid/log/ssca01/cssd# vi ocssd.l01

2012-09-07 17:30:44.042: [    CSSD][5]###################################
2012-09-07 17:30:44.042: [    CSSD][5]clssscExit: CSSD aborting from thread GMClientListener
2012-09-07 17:30:44.042: [    CSSD][5]###################################
2012-09-07 17:30:44.042: [    CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-09-07 17:30:44.042: [    CSSD][5]clssnmSendMeltdownStatus: node ssca01, number 1, has experienced a failure in thread number 15 and is shutting down
2012-09-07 17:30:44.042: [    CSSD][5]clssgmThreadRecovery:recovering clntlsnr mutex
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent:  Sending Event(6), type 6, incarn 240566197
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[1] state = 3, birth = 240566195, unique = 1346338630
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[2] state = 3, birth = 240566197, unique = 1346691206
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[3] state = 3, birth = 240566195, unique = 1346338628
2012-09-07 17:30:44.042: [    CSSD][5]clssnmQueueClientEvent: Node[4] state = 3, birth = 240566195, unique = 1346338626
2012-09-07 17:30:44.043: [    CSSD][1]clssnmStartNMMon: Received a fault event
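The key facts in that excerpt are the timestamp, the thread number in the second pair of brackets, and the name of the thread that CSSD says it is aborting from. As a rough sketch, assuming the line layout shown above, they can be extracted like this in Python. The names `LINE`, `ABORT` and `find_abort` are hypothetical helpers of mine, and the regular expressions are inferred from this one excerpt only.

```python
import re
from datetime import datetime

# "2012-09-07 17:30:44.042: [    CSSD][5]message"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\](?P<msg>.*)$"
)
ABORT = re.compile(r"clssscExit: CSSD aborting from thread (?P<name>\S+)")

def find_abort(lines):
    """Return (timestamp, thread_id, thread_name) of the first abort line, or None."""
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip anything that is not a standard trace line
        a = ABORT.search(m.group("msg"))
        if a:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            return ts, int(m.group("thread")), a.group("name")
    return None

# Three lines taken from the excerpt above.
sample = [
    "2012-09-07 17:30:44.042: [    CSSD][5]###################################",
    "2012-09-07 17:30:44.042: [    CSSD][5]clssscExit: CSSD aborting from thread GMClientListener",
    "2012-09-07 17:30:44.043: [    CSSD][1]clssnmStartNMMon: Received a fault event",
]

print(find_abort(sample))
```

The returned timestamp is the one to carry over when checking the OS logs for anything that happened at the same moment.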

Are there any OS errors around that time? According to /var/adm/messages:

Sep  7 17:30:44 ssca01 in.routed[702]: [ID 970160 daemon.notice] unable to get interface flags for sc_ipmp0:1: No such device or address
Sep  7 17:30:44 ssca01 in.routed[702]: [ID 472501 daemon.notice] sc_ipmp0:1 has no ifIndex: No such device or address
Sep  7 17:30:44 ssca01 in.routed[702]: [ID 970160 daemon.notice] unable to get interface flags for bondib0:1: No such device or address
Sep  7 17:30:44 ssca01 in.routed[702]: [ID 472501 daemon.notice] bondib0:1 has no ifIndex: No such device or address

Which is strange. Now in.routed, as one of its many jobs, does some network health checking. Could it have been a bit gun crazy during a network glitch?
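Lining the two logs up by eye works for a handful of lines, but a small filter makes the comparison repeatable. The sketch below, in Python, keeps only the syslog lines that fall within a few seconds of the CSSD abort time. Syslog timestamps carry no year and only whole seconds, so the year is passed in as an assumption, and the five-second window is an arbitrary choice of mine. The helper names `parse_syslog_time` and `near` are hypothetical.

```python
from datetime import datetime, timedelta

def parse_syslog_time(line, year):
    """Parse the leading 'Sep  7 17:30:44' of a syslog line; None if it does not fit."""
    parts = line.split()
    if len(parts) < 3:
        return None
    month, day, clock = parts[:3]
    try:
        # %b expects English month abbreviations in the default C locale.
        return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None

def near(lines, when, year, window_seconds=5):
    """Return the syslog lines logged within window_seconds of 'when'."""
    lo = when - timedelta(seconds=window_seconds)
    hi = when + timedelta(seconds=window_seconds)
    kept = []
    for line in lines:
        ts = parse_syslog_time(line, year)
        if ts is not None and lo <= ts <= hi:
            kept.append(line)
    return kept

# Abort time taken from the ocssd log; two of the /var/adm/messages lines above.
abort_time = datetime(2012, 9, 7, 17, 30, 44, 42000)
messages = [
    "Sep  7 17:30:44 ssca01 in.routed[702]: [ID 970160 daemon.notice] unable to get interface flags for sc_ipmp0:1: No such device or address",
    "Sep  7 17:30:44 ssca01 in.routed[702]: [ID 472501 daemon.notice] bondib0:1 has no ifIndex: No such device or address",
]

for line in near(messages, abort_time, year=2012):
    print(line)
```

A match in time is only a hint, not proof of cause. Both logs come from the same host, so at least clock skew is not a concern here.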

 
