2011-04-06 21:13:57.781: [ CSSD]clssnmvDHBValidateNCopy: node 1, tptrac1, has a disk HB, but no network HB, DHB has rcfg 183608157, wrtcnt, 41779628, LATS 3108751988, lastSeqNo 41779625, uniqueness 1294698909, timestamp 1302104569/3108688378

The error message clearly says that node 1 has a disk heartbeat but no network heartbeat, so there was no network heartbeat between the two nodes. Indeed, a ping to the private IP address failed. On node 1, eth1 had no IPv4 address assigned (note that the output has no "inet addr" line), as shown below:
[root@tptrac1 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr F4:CE:46:84:F7:CA
          inet6 addr: fe80::f6ce:46ff:fe84:f7ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3244634 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6800251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2828581056 (2.6 GiB)  TX bytes:1669807875 (1.5 GiB)
          Interrupt:162 Memory:f6000000-f6012800

[root@tptrac1 ~]#

I reassigned the private IP address as shown below:
ifconfig eth1 10.28.177.1 netmask 255.255.255.128 up

After setting the private IP address on node 1, I was able to ping again. It was now time to stop and start the cluster.
-bash-3.2$ ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
-bash-3.2$

Well, the cluster happily came up without reporting any errors. I could have saved all that troubleshooting time if I had checked node reachability in the first place. Anyway, it was a good troubleshooting exercise, and I also got something to share on my blog.
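
For next time, the reachability check I skipped can be scripted in a few lines. This is only a rough sketch, not an Oracle-supplied tool: the function names (has_ipv4, peer_reachable, check_interconnect) are my own, and it assumes a Linux box with ifconfig and ping available. The peer's private IP is a parameter, because it is whatever your other node uses on the interconnect.

```shell
#!/bin/sh
# Hypothetical pre-check for the private interconnect (names are illustrative).

# Reads `ifconfig <iface>` output on stdin and succeeds only if an IPv4
# address line is present. Matches both the older "inet addr:a.b.c.d" format
# and the newer "inet a.b.c.d" format; "inet6" lines do not match.
has_ipv4() {
    grep -Eq 'inet (addr:)?[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Sends two pings to the given address, waiting up to 2 seconds for a reply
# (-W is the Linux ping timeout option).
peer_reachable() {
    ping -c 2 -W 2 "$1" >/dev/null 2>&1
}

# Usage: check_interconnect <interface> <peer-private-ip>
# Returns 0 if all is well, 1 if the interface has no IPv4 address,
# 2 if the peer does not answer pings.
check_interconnect() {
    iface=$1
    peer=$2
    if ! ifconfig "$iface" 2>/dev/null | has_ipv4; then
        echo "FAIL: $iface has no IPv4 address"
        return 1
    fi
    if ! peer_reachable "$peer"; then
        echo "FAIL: cannot ping $peer"
        return 2
    fi
    echo "OK: $iface is configured and $peer is reachable"
}
```

With these functions loaded, I would run check_interconnect eth1 followed by the other node's private IP on each node before digging into the CSSD logs. In my case the first check would have failed straight away, since eth1 only had an inet6 address.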