ORACLE/RAC  2014. 2. 22. 12:10

Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up (Doc ID 466975.1)

Modified: 16-Nov-2012    Type: HOWTO



In this Document

Goal

Fix

  Summary

  Example Configuration

  Initial Stage

  Step 1 Remove oifcfg information for the failed node

  Step 2 Remove ONS information

  Step 3 Remove resources

  Step 4 Execute rootdeletenode.sh

  Step 5 Update the Inventory

References

APPLIES TO:


Oracle Server - Enterprise Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]

Oracle Server - Standard Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]

Information in this document applies to any platform.


Oracle Clusterware



GOAL


This document is intended to provide the steps to remove a node from an Oracle cluster when the node itself is unavailable due to an OS or hardware issue that prevents it from starting up. It provides the steps to remove such a node so that it can be added back after the node is fixed.


The steps to remove a node from a cluster are already documented in the Oracle documentation at


Version Documentation Link

10gR2 http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/adddelunix.htm#BEIFDCAF

11gR1 http://download.oracle.com/docs/cd/B28359_01/rac.111/b28255/adddelclusterware.htm#BEIFDCAF

This note is different because the documentation covers the scenario where the node is accessible and the removal is a planned procedure. This note covers the scenario where the Node is unable to boot up and therefore it is not possible to run the clusterware commands from this node.


For 11gR2, refer to note 1262925.1


 


FIX


Summary


Basically, all the steps documented in the Oracle Clusterware Administration and Deployment Guide must be followed. The difference here is that we skip the steps that would be executed on the node which is not available, and we run some extra commands on a node that is going to remain in the cluster to remove the resources belonging to the node being removed.
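
For orientation, the commands executed in the steps below (all run from a surviving node, using the example node names of this note, with lc2n3 as the failed node) boil down to the following; see the individual steps for prerequisites, prompts and expected output.

$CRS_HOME/bin/oifcfg delif -node lc2n3                        # Step 1 - only if 'oifcfg getif' does not show global
$CRS_HOME/bin/racgons remove_config lc2n3:6200                # Step 2 - remote port taken from ons.config
srvctl remove instance -d LC2DB1 -i LC2DB13                   # Step 3 - database instance, from the database home
srvctl remove asm -n lc2n3                                    # Step 3 - ASM instance, from the ASM home
$CRS_HOME/bin/crs_unregister ora.lc2n3.LISTENER_LC2N3.lsnr    # Step 3 - listener, 10.2 only (11.1: srvctl remove listener -n lc2n3)
srvctl stop nodeapps -n lc2n3                                 # Step 3 - as root
srvctl remove nodeapps -n lc2n3                               # Step 3 - as root
$CRS_HOME/install/rootdeletenode.sh lc2n3,3                   # Step 4 - as root, node number from 'olsnodes -n'
$CRS_HOME/oui/bin/runInstaller -updateNodeList "CLUSTER_NODES={lc2n1,lc2n2}"   # Step 5 - once per home, with the matching ORACLE_HOME (add CRS=TRUE for the clusterware home)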


Example Configuration


 All steps outlined in this document were executed on a cluster with the following configuration:


Item Value

Node Names lc2n1, lc2n2, lc2n3

Operating System Oracle Enterprise Linux 5 Update 4

Oracle Clusterware Release 10.2.0.5.0

ASM & Database Release 10.2.0.5.0

Clusterware Home /u01/app/oracle/product/10.2.0/crs ($CRS_HOME)

ASM Home /u01/app/oracle/product/10.2.0/asm

Database Home /u01/app/oracle/product/10.2.0/db_1

 Cluster Name lc2

 


Assume that node lc2n3 is down due to a hardware failure and cannot even boot up. The plan is to remove it from the clusterware, fix the issue and then add it back to the clusterware. In this document, we will cover the steps to remove the node from the clusterware.


Please note that, for better readability, the sample script 'crsstat' from

  Doc ID 259301.1 CRS and 10g/11.1 Real Application Clusters

was used instead of 'crs_stat -t' to query the state of the CRS resources. This script is not part of a standard CRS installation.
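
The actual script is attached to that note. Purely as an illustration, and not the supported script itself, a minimal wrapper producing similar Name/Target/State/Host columns could look roughly like this, assuming the record-style output of plain 'crs_stat':

#!/bin/bash
# Hypothetical sketch only - use the script from Doc ID 259301.1 in practice.
# crs_stat (without -t) prints NAME=/TARGET=/STATE= records for each resource;
# this reformats them into one line per resource, splitting the host off the STATE field.
$CRS_HOME/bin/crs_stat | awk -F= '
  /^NAME=/   { name = $2 }
  /^TARGET=/ { target = $2 }
  /^STATE=/  { state = $2; host = "";
               if (split(state, parts, " on ") == 2) { state = parts[1]; host = parts[2] }
               printf "%-40s %-10s %-10s %s\n", name, target, state, host }'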

 


Initial Stage


At this stage, the Oracle Clusterware is up and running on nodes lc2n1 & lc2n2 (the good nodes). Node lc2n3 is down and cannot be accessed. Note that the virtual IP of lc2n3 is running on node 1. The rest of the lc2n3 resources are OFFLINE:


[oracle@lc2n1 ~]$ crsstat

Name                                     Target     State      Host      

-------------------------------------------------------------------------------

ora.LC2DB1.LC2DB11.inst                  ONLINE     ONLINE     lc2n1     

ora.LC2DB1.LC2DB12.inst                  ONLINE     ONLINE     lc2n2     

ora.LC2DB1.LC2DB13.inst                  ONLINE     OFFLINE              

ora.LC2DB1.LC2DB1_SRV1.LC2DB11.srv       ONLINE     ONLINE     lc2n1     

ora.LC2DB1.LC2DB1_SRV1.LC2DB12.srv       ONLINE     ONLINE     lc2n2     

ora.LC2DB1.LC2DB1_SRV1.LC2DB13.srv       ONLINE     OFFLINE              

ora.LC2DB1.LC2DB1_SRV1.cs                ONLINE     ONLINE     lc2n1     

ora.LC2DB1.db                            ONLINE     ONLINE     lc2n2     

ora.lc2n1.ASM1.asm                       ONLINE     ONLINE     lc2n1     

ora.lc2n1.LISTENER_LC2N1.lsnr            ONLINE     ONLINE     lc2n1     

ora.lc2n1.gsd                            ONLINE     ONLINE     lc2n1     

ora.lc2n1.ons                            ONLINE     ONLINE     lc2n1     

ora.lc2n1.vip                            ONLINE     ONLINE     lc2n1     

ora.lc2n2.ASM2.asm                       ONLINE     ONLINE     lc2n2     

ora.lc2n2.LISTENER_LC2N2.lsnr            ONLINE     ONLINE     lc2n2     

ora.lc2n2.gsd                            ONLINE     ONLINE     lc2n2     

ora.lc2n2.ons                            ONLINE     ONLINE     lc2n2     

ora.lc2n2.vip                            ONLINE     ONLINE     lc2n2     

ora.lc2n3.ASM3.asm                       ONLINE     OFFLINE              

ora.lc2n3.LISTENER_LC2N3.lsnr            ONLINE     OFFLINE              

ora.lc2n3.gsd                            ONLINE     OFFLINE              

ora.lc2n3.ons                            ONLINE     OFFLINE              

ora.lc2n3.vip                            ONLINE     ONLINE     lc2n1     

[oracle@lc2n1 ~]$

 


Step 1 Remove oifcfg information for the failed node


Generally, most installations use the global flag of the oifcfg command and can therefore skip this step. This can be confirmed using:


[oracle@lc2n1 bin]$ $CRS_HOME/bin/oifcfg getif

eth0  192.168.56.0  global  public

eth1  192.168.57.0  global  cluster_interconnect

If the output of the command returns global as shown above, then you can skip the following step (executing the command below on a global definition will return an error, as shown below).


If the output of the oifcfg getif command does not return global then use the following command


[oracle@lc2n1 bin]$ $CRS_HOME/bin/oifcfg delif -node lc2n3 

PROC-4: The cluster registry key to be operated on does not exist.

PRIF-11: cluster registry error

 


Step 2 Remove ONS information


Execute the following command to find out the remote port number to be used:


[oracle@lc2n1 bin]$ cat $CRS_HOME/opmn/conf/ons.config

localport=6113 

remoteport=6200 

loglevel=3

useocr=on

and remove the information pertaining to the node to be deleted using:


[oracle@lc2n1 bin]$ $CRS_HOME/bin/racgons remove_config lc2n3:6200

 


Step 3 Remove resources


In this step, the resources that were defined on this node have to be removed. These resources include the database instance, ASM, listener and nodeapps resources. A list of these can be acquired by running the crsstat (crs_stat -t) command from any node:


[oracle@lc2n1 ~]$ crsstat |grep OFFLINE

ora.LC2DB1.LC2DB13.inst                  ONLINE     OFFLINE              

ora.LC2DB1.LC2DB1_SRV1.LC2DB13.srv       ONLINE     OFFLINE              

ora.lc2n3.ASM3.asm                       ONLINE     OFFLINE              

ora.lc2n3.LISTENER_LC2N3.lsnr            ONLINE     OFFLINE              

ora.lc2n3.gsd                            ONLINE     OFFLINE              

ora.lc2n3.ons                            ONLINE     OFFLINE             

 Before removing any resource it is recommended to take a backup of the OCR:


[root@lc2n1 ~]# cd $CRS_HOME/cdata/lc2

[root@lc2n1 lc2]# $CRS_HOME/bin/ocrconfig -export ocr_before_node_removal.exp

[root@lc2n1 lc2]# ls -l ocr_before_node_removal.exp

-rw-r--r-- 1 root root 151946 Nov 15 15:24 ocr_before_node_removal.exp
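
If the OCR ever needs to be restored from this logical export (for example, to roll back the resource removal), keep in mind that for a 10.2/11.1 logical export the clusterware has to be stopped on all remaining nodes before importing; roughly:

[root@lc2n1 lc2]# $CRS_HOME/bin/crsctl stop crs                                 # as root, on every remaining node
[root@lc2n1 lc2]# $CRS_HOME/bin/ocrconfig -import ocr_before_node_removal.exp   # as root, on one node
[root@lc2n1 lc2]# $CRS_HOME/bin/crsctl start crs                                # as root, on every remaining node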

 Use 'srvctl' from the database home to delete the database instance on node 3:


[oracle@lc2n1 ~]$ . oraenv

ORACLE_SID = [oracle] ? LC2DB1

[oracle@lc2n1 ~]$ $ORACLE_HOME/bin/srvctl remove instance -d LC2DB1 -i LC2DB13

Remove instance LC2DB13 from the database LC2DB1? (y/[n]) y

 Use 'srvctl' from the ASM home to delete the ASM instance on node 3:


[oracle@lc2n1 ~]$ . oraenv

ORACLE_SID = [oracle] ? +ASM1

[oracle@lc2n1 ~]$ $ORACLE_HOME/bin/srvctl remove asm -n lc2n3

Next remove the listener resource.


Please note that there is no 'srvctl remove listener' subcommand prior to 11.1, so this command will not work in 10.2. Using 'netca' to delete the listener from a down node is also not an option, as netca needs to remove the listener configuration from the listener.ora on that node.

10.2 only:


The only way to remove the listener resource is to use the command 'crs_unregister'; please use this command only in this particular scenario:


[oracle@lc2n1 lc2]$ $CRS_HOME/bin/crs_unregister ora.lc2n3.LISTENER_LC2N3.lsnr

 11.1 only:


 Set the environment to the home from which the listener runs (ASM or database):


[oracle@lc2n1 ~]$ . oraenv

ORACLE_SID = [oracle] ? +ASM1

[oracle@lc2n1 lc2]$ $ORACLE_HOME/bin/srvctl remove listener -n lc2n3 

  As user root stop the nodeapps resources:


[root@lc2n1 oracle]# $CRS_HOME/bin/srvctl stop nodeapps -n lc2n3

[root@lc2n1 oracle]# crsstat |grep OFFLINE

ora.lc2n3.LISTENER_LC2N3.lsnr            OFFLINE    OFFLINE              

ora.lc2n3.gsd                            OFFLINE    OFFLINE              

ora.lc2n3.ons                            OFFLINE    OFFLINE              

ora.lc2n3.vip                            OFFLINE    OFFLINE        

 Now remove them:


[root@lc2n1 oracle]#  $CRS_HOME/bin/srvctl remove nodeapps -n lc2n3

Please confirm that you intend to remove the node-level applications on node lc2n3 (y/[n]) y

 At this point all resources from the bad node should be gone:


[oracle@lc2n1 ~]$ crsstat 

Name                                     Target     State      Host      

-------------------------------------------------------------------------------

ora.LC2DB1.LC2DB11.inst                  ONLINE     ONLINE     lc2n1     

ora.LC2DB1.LC2DB12.inst                  ONLINE     ONLINE     lc2n2     

ora.LC2DB1.LC2DB1_SRV1.LC2DB11.srv       ONLINE     ONLINE     lc2n1     

ora.LC2DB1.LC2DB1_SRV1.LC2DB12.srv       ONLINE     ONLINE     lc2n2     

ora.LC2DB1.LC2DB1_SRV1.cs                ONLINE     ONLINE     lc2n1     

ora.LC2DB1.db                            ONLINE     ONLINE     lc2n2     

ora.lc2n1.ASM1.asm                       ONLINE     ONLINE     lc2n1     

ora.lc2n1.LISTENER_LC2N1.lsnr            ONLINE     ONLINE     lc2n1     

ora.lc2n1.gsd                            ONLINE     ONLINE     lc2n1     

ora.lc2n1.ons                            ONLINE     ONLINE     lc2n1     

ora.lc2n1.vip                            ONLINE     ONLINE     lc2n1     

ora.lc2n2.ASM2.asm                       ONLINE     ONLINE     lc2n2     

ora.lc2n2.LISTENER_LC2N2.lsnr            ONLINE     ONLINE     lc2n2     

ora.lc2n2.gsd                            ONLINE     ONLINE     lc2n2     

ora.lc2n2.ons                            ONLINE     ONLINE     lc2n2     

ora.lc2n2.vip                            ONLINE     ONLINE     lc2n2  

 


Step 4 Execute rootdeletenode.sh


From a node that you are not deleting, execute the following command, which will help find out the node number of the node that you want to delete:


[oracle@lc2n1 ~]$ $CRS_HOME/bin/olsnodes -n

lc2n1   1

lc2n2   2

lc2n3   3

This number can be passed to the rootdeletenode.sh command, which is to be executed as root from any node that is going to remain in the cluster.


[root@lc2n1 ~]# cd $CRS_HOME/install

[root@lc2n1 install]# ./rootdeletenode.sh lc2n3,3

CRS-0210: Could not find resource 'ora.lc2n3.ons'.

CRS-0210: Could not find resource 'ora.lc2n3.vip'.

CRS-0210: Could not find resource 'ora.lc2n3.gsd'.

CRS-0210: Could not find resource ora.lc2n3.vip.

CRS nodeapps are deleted successfully

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Successfully deleted 14 values from OCR.

Key SYSTEM.css.interfaces.nodelc2n3 marked for deletion is not there. Ignoring.

Successfully deleted 5 keys from OCR.

Node deletion operation successful.

'lc2n3,3' deleted successfully

[root@lc2n1 install]# $CRS_HOME/bin/olsnodes -n

lc2n1   1

lc2n2   2

 


Step 5 Update the Inventory


From a node which is going to remain in the cluster, run the following command as the owner of the CRS_HOME. The argument to be passed to CLUSTER_NODES is a comma-separated list of the node names that are going to remain in the cluster. This step needs to be performed once per home (Clusterware, ASM and RDBMS homes).


[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/crs "CLUSTER_NODES={lc2n1,lc2n2}" CRS=TRUE  

Starting Oracle Universal Installer...


No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /u01/app/oracle/oraInventory

'UpdateNodeList' was successful.


[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm "CLUSTER_NODES={lc2n1,lc2n2}"

Starting Oracle Universal Installer...


No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /u01/app/oracle/oraInventory

'UpdateNodeList' was successful.

[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1 "CLUSTER_NODES={lc2n1,lc2n2}"

Starting Oracle Universal Installer...


No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /u01/app/oracle/oraInventory

'UpdateNodeList' was successful.

Posted by [PineTree]
ORACLE/RAC  2013. 5. 30. 11:27


Source: http://leejehong.tistory.com/170


[root@RAC1 ~]# srvctl config database -d devdb -v
Database unique name: devdb
Database name: devdb
Oracle home: /u01/app/11.2.0/db
Oracle user: oracle

Spfile: /dev/raw/raw6
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: devdb
Database instances: devdb1,devdb2
Disk Groups: 
Mount point paths: 
Services: 
Type: RAC
Database is administrator managed
[root@RAC1 ~]#


oracle@RAC1:/>crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1        
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac2        
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac1        
ora.devdb.db   ora....se.type OFFLINE   OFFLINE               
ora.gsd        ora.gsd.type   ONLINE    ONLINE    rac1        
ora....network ora....rk.type ONLINE    ONLINE    rac1        
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac1        
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1        
ora....C2.lsnr application    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac2        
ora....ry.acfs ora....fs.type OFFLINE   OFFLINE               
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac2        
oracle@RAC1:/>
oracle@RAC1:/>

The database does not start automatically when the server is rebooted.


oracle@RAC1:/>crs_stat -p ora.devdb.db |grep -i start
AUTO_START=restore
GEN_START_OPTIONS@SERVERNAME(rac1)=open
GEN_START_OPTIONS@SERVERNAME(rac2)=open
RESTART_ATTEMPTS=2
START_TIMEOUT=600
oracle@RAC1:/>

Because AUTO_START is set to restore, the database is not started automatically.

# Note
* AUTO_START=1 (always)  -> when CRS restarts, the resource is brought online regardless of its previous state, as long as the hardware is healthy.
* AUTO_START=2 (never)   -> all resources must be started manually.
* AUTO_START=0 (restore) -> each resource is returned to the state it was in before it was brought down.
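
To check the AUTO_START setting of all registered resources at once rather than one resource at a time, the profile output can simply be filtered, for example:

oracle@RAC1:/>crs_stat -p | egrep '^NAME=|^AUTO_START='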

[root@RAC1 profile]# crs_stat -p ora.devdb.db |grep -i start
AUTO_START=restore
GEN_START_OPTIONS@SERVERNAME(rac1)=open
GEN_START_OPTIONS@SERVERNAME(rac2)=open
RESTART_ATTEMPTS=2
START_TIMEOUT=600

[root@RAC1 profile]# 
[root@RAC1 profile]# crsctl modify resource "ora.devdb.db" -attr "AUTO_START=always"
[root@RAC1 profile]# crs_stat -p ora.devdb.db |grep -i start
AUTO_START=always
GEN_START_OPTIONS@SERVERNAME(rac1)=open
GEN_START_OPTIONS@SERVERNAME(rac2)=open
RESTART_ATTEMPTS=2
START_TIMEOUT=600
[root@RAC1 profile]#

Changing Resource Attributes in 11gR2 Grid Infrastructure 
In 11gR2 grid infrastructure installations, certain resources may have AUTO_START set to never or restore.
This has been observed both in environments where the clusterware was upgraded to 11.2 and in newly installed environments.
Depending on the situation these settings may not be desirable. The AUTO_START attribute can be changed as follows.

1. Check the current auto start values

# crsctl stat res -p
NAME=ora.FLASH.dg
TYPE=ora.diskgroup.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=never     

NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=never    

NAME=ora.clusdb.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore

2. Since the ASM diskgroups that the database depends on will never auto start, the database will also be unavailable (the dependency can be seen in the database resource profile, as shown below).
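
For example, with the resource names used above, the dependency can be listed with:

# crsctl stat res ora.clusdb.db -p | grep -i dependencies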

3. Change the resource start attribute with

# crsctl modify resource "ora.FLASH.dg" -attr "AUTO_START=always"
# crsctl modify resource "ora.DATA.dg" -attr "AUTO_START=always"
# crsctl modify resource ora.clusdb.db -attr "AUTO_START=always"
The attribute name must be in upper case; if not, the command will fail:
crsctl modify resource ora.clusdb.db -attr "auto_start=always"
CRS-0160: The attribute 'auto_start' is not supported in this resource type.
CRS-4000: Command Modify failed, or completed with errors.


4. Verify the status change with 
# crsctl stat res -p
NAME=ora.clusdb.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=always

Posted by [PineTree]