Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up (Doc ID 466975.1)
Modified: 16-Nov-2012    Type: HOWTO
In this Document
Goal
Fix
Summary
Example Configuration
Initial Stage
Step 1 Remove oifcfg information for the failed node
Step 2 Remove ONS information
Step 3 Remove resources
Step 4 Execute rootdeletenode.sh
Step 5 Update the Inventory
References
APPLIES TO:
Oracle Server - Enterprise Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]
Oracle Server - Standard Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]
Information in this document applies to any platform.
Oracle Clusterware
GOAL
This document provides the steps to remove a node from an Oracle cluster when the node itself is unavailable due to an OS or hardware issue that prevents it from starting up. Once removed and repaired, the node can be added back to the cluster.
The steps to remove a node from a cluster are already documented in the Oracle documentation at
Version Documentation Link
10gR2 http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/adddelunix.htm#BEIFDCAF
11gR1 http://download.oracle.com/docs/cd/B28359_01/rac.111/b28255/adddelclusterware.htm#BEIFDCAF
This note is different because the documentation covers the scenario where the node is accessible and the removal is a planned procedure. This note covers the scenario where the node is unable to boot up and it is therefore not possible to run the clusterware commands from that node.
For 11gR2, refer to note 1262925.1
FIX
Summary
Essentially, all the steps documented in the Oracle Clusterware Administration and Deployment Guide must be followed. The difference here is that the steps to be executed on the unavailable node are skipped, and a few extra commands are run on a node which is going to remain in the cluster to remove the resources of the node that is to be removed.
Example Configuration
All steps outlined in this document were executed on a cluster with the following configuration:
Item Value
Node Names lc2n1, lc2n2, lc2n3
Operating System Oracle Enterprise Linux 5 Update 4
Oracle Clusterware Release 10.2.0.5.0
ASM & Database Release 10.2.0.5.0
Clusterware Home /u01/app/oracle/product/10.2.0/crs ($CRS_HOME)
ASM Home /u01/app/oracle/product/10.2.0/asm
Database Home /u01/app/oracle/product/10.2.0/db_1
Cluster Name lc2
Assume that node lc2n3 is down due to a hardware failure and cannot even boot up. The plan is to remove it from the Clusterware, fix the issue and then add it back to the Clusterware. This document covers the steps to remove the node from the Clusterware.
Please note that for better readability the sample script 'crsstat' from Doc ID 259301.1 (CRS and 10g/11.1 Real Application Clusters) was used instead of 'crs_stat -t' to query the state of the CRS resources. This script is not part of a standard CRS installation.
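If that script is not at hand, a simplified stand-in can be put together from the standard 'crs_stat' command. The following is only a rough sketch (it is not the script from Doc ID 259301.1, and the CRS_HOME path is the one from the example configuration below); it prints the name, target and state of each resource in a similar tabular layout:
#!/bin/sh
# Simplified stand-in for the 'crsstat' sample script:
# prints NAME, TARGET and STATE of every CRS resource in tabular form.
CRS_HOME=/u01/app/oracle/product/10.2.0/crs
$CRS_HOME/bin/crs_stat | awk -F= '
  /^NAME=/   { name   = $2 }
  /^TARGET=/ { target = $2 }
  /^STATE=/  { printf "%-40s %-10s %s\n", name, target, $2 }'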
Initial Stage
At this stage, the Oracle Clusterware is up and running on nodes lc2n1 and lc2n2 (the good nodes). Node lc2n3 is down and cannot be accessed. Note that the virtual IP of lc2n3 has failed over and is running on node lc2n1. The rest of the lc2n3 resources are OFFLINE:
[oracle@lc2n1 ~]$ crsstat
Name Target State Host
-------------------------------------------------------------------------------
ora.LC2DB1.LC2DB11.inst ONLINE ONLINE lc2n1
ora.LC2DB1.LC2DB12.inst ONLINE ONLINE lc2n2
ora.LC2DB1.LC2DB13.inst ONLINE OFFLINE
ora.LC2DB1.LC2DB1_SRV1.LC2DB11.srv ONLINE ONLINE lc2n1
ora.LC2DB1.LC2DB1_SRV1.LC2DB12.srv ONLINE ONLINE lc2n2
ora.LC2DB1.LC2DB1_SRV1.LC2DB13.srv ONLINE OFFLINE
ora.LC2DB1.LC2DB1_SRV1.cs ONLINE ONLINE lc2n1
ora.LC2DB1.db ONLINE ONLINE lc2n2
ora.lc2n1.ASM1.asm ONLINE ONLINE lc2n1
ora.lc2n1.LISTENER_LC2N1.lsnr ONLINE ONLINE lc2n1
ora.lc2n1.gsd ONLINE ONLINE lc2n1
ora.lc2n1.ons ONLINE ONLINE lc2n1
ora.lc2n1.vip ONLINE ONLINE lc2n1
ora.lc2n2.ASM2.asm ONLINE ONLINE lc2n2
ora.lc2n2.LISTENER_LC2N2.lsnr ONLINE ONLINE lc2n2
ora.lc2n2.gsd ONLINE ONLINE lc2n2
ora.lc2n2.ons ONLINE ONLINE lc2n2
ora.lc2n2.vip ONLINE ONLINE lc2n2
ora.lc2n3.ASM3.asm ONLINE OFFLINE
ora.lc2n3.LISTENER_LC2N3.lsnr ONLINE OFFLINE
ora.lc2n3.gsd ONLINE OFFLINE
ora.lc2n3.ons ONLINE OFFLINE
ora.lc2n3.vip ONLINE ONLINE lc2n1
[oracle@lc2n1 ~]$
Step 1 Remove oifcfg information for the failed node
Most installations use the global flag of the oifcfg command and can therefore skip this step. This can be confirmed using:
[oracle@lc2n1 bin]$ $CRS_HOME/bin/oifcfg getif
eth0 192.168.56.0 global public
eth1 192.168.57.0 global cluster_interconnect
If the output of the command returns 'global' as shown above, then this step can be skipped (executing the command below on a global definition will return an error, as shown below).
If the output of the 'oifcfg getif' command does not return 'global', then use the following command:
[oracle@lc2n1 bin]$ $CRS_HOME/bin/oifcfg delif -node lc2n3
PROC-4: The cluster registry key to be operated on does not exist.
PRIF-11: cluster registry error
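To double-check that no node-specific interface definitions remain for lc2n3, the node-level configuration can be queried again; an empty result (or an error about a missing registry key, as above) is the expected outcome:
$CRS_HOME/bin/oifcfg getif -node lc2n3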
Step 2 Remove ONS information
Execute the following command to find out the remote port number to be used:
[oracle@lc2n1 bin]$ cat $CRS_HOME/opmn/conf/ons.config
localport=6113
remoteport=6200
loglevel=3
useocr=on
and remove the information pertaining to the node to be deleted using:
[oracle@lc2n1 bin]$ $CRS_HOME/bin/racgons remove_config lc2n3:6200
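For reference only: when the repaired node is later added back to the cluster, its ONS configuration would be re-registered with the counterpart subcommand, assuming the same remote port found in ons.config:
$CRS_HOME/bin/racgons add_config lc2n3:6200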
Step 3 Remove resources
In this step, the resources that were defined on the failed node have to be removed. These include the database instance, ASM, listener and nodeapps resources. A list of these can be obtained by running the crsstat (crs_stat -t) command from any remaining node:
[oracle@lc2n1 ~]$ crsstat |grep OFFLINE
ora.LC2DB1.LC2DB13.inst ONLINE OFFLINE
ora.LC2DB1.LC2DB1_SRV1.LC2DB13.srv ONLINE OFFLINE
ora.lc2n3.ASM3.asm ONLINE OFFLINE
ora.lc2n3.LISTENER_LC2N3.lsnr ONLINE OFFLINE
ora.lc2n3.gsd ONLINE OFFLINE
ora.lc2n3.ons ONLINE OFFLINE
Before removing any resources, it is recommended to take a backup of the OCR:
[root@lc2n1 ~]# cd $CRS_HOME/cdata/lc2
[root@lc2n1 lc2]# $CRS_HOME/bin/ocrconfig -export ocr_before_node_removal.exp
[root@lc2n1 lc2]# ls -l ocr_before_node_removal.exp
-rw-r--r-- 1 root root 151946 Nov 15 15:24 ocr_before_node_removal.exp
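Should the changes made in the following steps ever need to be backed out, this export can be restored with 'ocrconfig -import'. The command below is only a sketch; an actual restore must be run as root and follow the OCR restore procedure in the Clusterware documentation:
# as root, with Oracle Clusterware stopped on all remaining nodes
$CRS_HOME/bin/ocrconfig -import $CRS_HOME/cdata/lc2/ocr_before_node_removal.exp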
Use 'srvctl' from the database home to delete the database instance on node 3:
[oracle@lc2n1 ~]$ . oraenv
ORACLE_SID = [oracle] ? LC2DB1
[oracle@lc2n1 ~]$ $ORACLE_HOME/bin/srvctl remove instance -d LC2DB1 -i LC2DB13
Remove instance LC2DB13 from the database LC2DB1? (y/[n]) y
Use 'srvctl' from the ASM home to delete the ASM instance on node 3:
[oracle@lc2n1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
[oracle@lc2n1 ~]$ $ORACLE_HOME/bin/srvctl remove asm -n lc2n3
Next remove the listener resource.
Please note that there is no 'srvctl remove listener' subcommand prior to 11.1, so this command will not work in 10.2. Using 'netca' to delete the listener is also not an option, as netca would need to remove the listener configuration from the listener.ora on the down node, which cannot be accessed.
10.2 only:
The only way to remove the listener resource is to use the 'crs_unregister' command. Please use this command only in this particular scenario:
[oracle@lc2n1 lc2]$ $CRS_HOME/bin/crs_unregister ora.lc2n3.LISTENER_LC2N3.lsnr
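The removal can be verified by querying the resource directly; once it has been unregistered, crs_stat is expected to report CRS-0210 (resource not found):
$CRS_HOME/bin/crs_stat ora.lc2n3.LISTENER_LC2N3.lsnr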
11.1 only:
Set the environment to the home from which the listener runs (ASM or database):
[oracle@lc2n1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
[oracle@lc2n1 lc2]$ $ORACLE_HOME/bin/srvctl remove listener -n lc2n3
As the root user, stop the nodeapps resources:
[root@lc2n1 oracle]# $CRS_HOME/bin/srvctl stop nodeapps -n lc2n3
[root@lc2n1 oracle]# crsstat |grep OFFLINE
ora.lc2n3.LISTENER_LC2N3.lsnr OFFLINE OFFLINE
ora.lc2n3.gsd OFFLINE OFFLINE
ora.lc2n3.ons OFFLINE OFFLINE
ora.lc2n3.vip OFFLINE OFFLINE
Now remove them:
[root@lc2n1 oracle]# $CRS_HOME/bin/srvctl remove nodeapps -n lc2n3
Please confirm that you intend to remove the node-level applications on node lc2n3 (y/[n]) y
At this point, all resources of the failed node should be gone:
[oracle@lc2n1 ~]$ crsstat
Name Target State Host
-------------------------------------------------------------------------------
ora.LC2DB1.LC2DB11.inst ONLINE ONLINE lc2n1
ora.LC2DB1.LC2DB12.inst ONLINE ONLINE lc2n2
ora.LC2DB1.LC2DB1_SRV1.LC2DB11.srv ONLINE ONLINE lc2n1
ora.LC2DB1.LC2DB1_SRV1.LC2DB12.srv ONLINE ONLINE lc2n2
ora.LC2DB1.LC2DB1_SRV1.cs ONLINE ONLINE lc2n1
ora.LC2DB1.db ONLINE ONLINE lc2n2
ora.lc2n1.ASM1.asm ONLINE ONLINE lc2n1
ora.lc2n1.LISTENER_LC2N1.lsnr ONLINE ONLINE lc2n1
ora.lc2n1.gsd ONLINE ONLINE lc2n1
ora.lc2n1.ons ONLINE ONLINE lc2n1
ora.lc2n1.vip ONLINE ONLINE lc2n1
ora.lc2n2.ASM2.asm ONLINE ONLINE lc2n2
ora.lc2n2.LISTENER_LC2N2.lsnr ONLINE ONLINE lc2n2
ora.lc2n2.gsd ONLINE ONLINE lc2n2
ora.lc2n2.ons ONLINE ONLINE lc2n2
ora.lc2n2.vip ONLINE ONLINE lc2n2
Step 4 Execute rootdeletenode.sh
From a node that is not being deleted, execute the following command to find out the node number of the node that you want to delete:
[oracle@lc2n1 ~]$ $CRS_HOME/bin/olsnodes -n
lc2n1 1
lc2n2 2
lc2n3 3
This number is passed to the rootdeletenode.sh script, which is to be executed as root from any node that is going to remain in the cluster:
[root@lc2n1 ~]# cd $CRS_HOME/install
[root@lc2n1 install]# ./rootdeletenode.sh lc2n3,3
CRS-0210: Could not find resource 'ora.lc2n3.ons'.
CRS-0210: Could not find resource 'ora.lc2n3.vip'.
CRS-0210: Could not find resource 'ora.lc2n3.gsd'.
CRS-0210: Could not find resource ora.lc2n3.vip.
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 14 values from OCR.
Key SYSTEM.css.interfaces.nodelc2n3 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
'lc2n3,3' deleted successfully
[root@lc2n1 install]# $CRS_HOME/bin/olsnodes -n
lc2n1 1
lc2n2 2
Step 5 Update the Inventory
From a node which is going to remain in the cluster, run the following command as the owner of the CRS_HOME. The argument passed to CLUSTER_NODES is a comma-separated list of the node names which are going to remain in the cluster. This step needs to be performed once per home (Clusterware, ASM and RDBMS homes).
[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/crs "CLUSTER_NODES={lc2n1,lc2n2}" CRS=TRUE
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm "CLUSTER_NODES={lc2n1,lc2n2}"
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1 "CLUSTER_NODES={lc2n1,lc2n2}"
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
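As an optional check, the node list recorded for each home can be inspected in the central inventory (located at /u01/app/oracle/oraInventory, as reported above); every NODE_LIST should now contain only lc2n1 and lc2n2:
grep -A 4 "<NODE_LIST>" /u01/app/oracle/oraInventory/ContentsXML/inventory.xml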