Crash dump analysis – or who shot ssca01?

So, I had a system fail with

Sep  7 17:30:48 ssca01 SC Alert: [ID 665947 daemon.notice] Audit | minor: root : Close Session : object = "/SP/session/type" : value = "shell" : success
 Sep  7 17:31:40 ssca01 unix: [ID 836849 kern.notice]
 Sep  7 17:31:40 ssca01 ^Mpanic[cpu243]/thread=302adde5c20:
 Sep  7 17:31:40 ssca01 unix: [ID 156897 kern.notice] forced crash dump initiated at user request

So I need to know which process initiated the crash dump.
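
The thread address in the panic line is the key to everything that follows, so it is worth grabbing it programmatically. A minimal sketch, assuming the messages above were saved to a file; the function name panic_thread and the path /var/adm/messages in the usage note are my own choices, not something the log itself tells us:

```shell
#!/bin/sh
# panic_thread LOGFILE
# Print the kernel thread address from the most recent
# "panic[cpuN]/thread=ADDR:" line found in LOGFILE.
panic_thread() {
    sed -n 's|.*panic\[cpu[0-9]*\]/thread=\([0-9a-f]*\):.*|\1|p' "$1" | tail -n 1
}
```

Called as `panic_thread /var/adm/messages`, this should print the address to search for later (302adde5c20 for the panic shown above).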

root@ssca01: cd /var/crash
root@ssca01:/var/crash# file *.0
 vmdump.0:       SunOS 5.11 11.0 64-bit SPARC compressed crash dump from 'ssca01'

First, unpack the compressed crash dump with savecore, which produces unix.0 and vmcore.0:

root@ssca01:/var/crash# savecore -vf vmdump.0
root@ssca01:/var/crash# mdb -k unix.0 vmcore.0
 Loading modules: [ unix genunix specfs dtrace zfs scsi_vhci sd mpt_sas mac px ldc crypto ip hook neti arp usba kssl sockfs qlc fctl random niumx idm fcp cpc mdesc fcip logindmux ptm sppp nsmb ufs ipc nfs ]
 > :: status
 mdb: syntax error near ":"
 > ::status
 debugging crash dump vmcore.0 (64-bit) from ssca01
 operating system: 5.11 11.0 (sun4v)
 image uuid: 8d9326c5-a01c-e66e-d5dc-fe299173999d
 panic message: forced crash dump initiated at user request
 dump content: kernel pages only
> ::showrev
 Hostname: ssca01
 Release: 5.11
 Kernel architecture: sun4v
 Application architecture: sparcv9
 Kernel version: SunOS 5.11 sun4v 11.0
 Platform: sun4v
> ::panicinfo
 cpu              243
 thread      302adde5c20
 message forced crash dump initiated at user request
 tstate       9900001605
 g1                4
 g2                4
 g3          193e800
 g4                1
 g5          183f800
 g6                0
 g7      302adde5c20
 o0          12b7820
 o1      2a11bb1b9b8
 o2          7100000
 o3               32
 o4                2
 o5      2a11bb1b9d8
 o6      2a11bb1b081
 o7          107ed4c
 pc          105650c
 npc          1056510
 y                0

Then you can use this shell script to dump the stack of every thread of every process, so the panicking thread can be matched to its process:

#!/usr/bin/env sh
# List every process in the dump, then print the stack of each of its threads.
# ::ps columns: S PID PPID PGID SID UID FLAGS ADDR NAME
echo "::ps" | mdb -k unix.0 vmcore.0 | \
 nawk '$8 !~ /ADDR/ {print $8" "$NF}' > /tmp/.core.$$
cat /dev/null > /tmp/core.$$
while read addr name; do
 echo "process name: ${name}" >> /tmp/core.$$
 # Walk the threads of the process at ${addr} and print each stack
 echo "${addr}::walk thread | ::findstack" | \
 mdb -k unix.0 vmcore.0 >> /tmp/core.$$
 echo >> /tmp/core.$$
done < /tmp/.core.$$
rm /tmp/.core.$$
exit 0

Have a look at the generated listing and search for the thread address from the panic message (302adde5c20):

vi /tmp/core.*

process name: cssdagent
 [snip of lots of threads]
stack pointer for thread 302adde5c20: 2a11bb1b081
 000002a11bb1b131 kadmin+0x5a0()
 000002a11bb1b201 uadmin+0x1c0()
 000002a11bb1b2d1 syscall_trap+0xac()

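So this was initiated by cssdagent, part of Oracle Clusterware: the panicking thread belongs to that process, and its stack shows it entering the kernel through the uadmin system call and ending up in kadmin().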
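
Rather than paging through the listing in vi, the lookup can also be scripted. This is a sketch only, assuming the listing has the layout produced by the script above ("process name:" headers followed by "stack pointer for thread" lines); the function name find_owner and the AWK variable are mine. On Solaris you would probably want to run it with AWK=nawk.

```shell
#!/bin/sh
# find_owner THREAD LISTING
# Print the name of the process whose section of LISTING contains
# the stack of kernel thread THREAD.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris

find_owner() {
    "$AWK" -v thr="$1" '
        # Remember which process section we are currently in
        /process name: / { name = $0; sub(/^.*process name: /, "", name) }
        # "stack pointer for thread ADDR: SP" -> field 5 is "ADDR:"
        /stack pointer for thread/ {
            t = $5
            sub(/:$/, "", t)
            if (t == thr) { print name; exit }
        }
    ' "$2"
}
```

For the dump above, `find_owner 302adde5c20 /tmp/core.<pid>` (with the real file name filled in) should print cssdagent. It prints nothing if the thread is not in the listing.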

<to be continued>
