ORACLE/RAC2012. 5. 18. 16:38

VIP Failover Take Long Time After Network Cable Pulled [ID 403743.1]
--------------------------------------------------------------------------------
 
  Modified 05-JAN-2011     Type PROBLEM     Status PUBLISHED

In this Document
  Symptoms
  Changes
  Cause
  Solution
  References

--------------------------------------------------------------------------------

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.7 - Release: 10.2 to 11.1
Information in this document applies to any platform.
***Checked for relevance on 05-Jan-2011***
Symptoms
This example is based on SUN Solaris platform, with IPMP configured for the public network. In this case, VIP failover takes almost 4 minutes to complete when both network cables of the public network are pulled from one node.

crsd.log shows:

2006-12-07 13:14:05.401: [ CRSAPP][4588] CheckResource error for ora.node1.vip error code = 1
2006-12-07 13:14:05.408: [ CRSRES][4588] In stateChanged, ora.node1.vip target is ONLINE
2006-12-07 13:14:05.409: [ CRSRES][4588] ora.node1.vip on node1 went OFFLINE unexpectedly
<<< detect network cable failure and VIP OFFLINE immediately

2006-12-07 13:14:05.410: [ CRSRES][4588] StopResource: setting CLI values
2006-12-07 13:14:05.420: [ CRSRES][4588] Attempting to stop `ora.node1.vip` on member `node1`
2006-12-07 13:14:06.651: [ CRSRES][4588] Stop of `ora.node1.vip` on member `node1` succeeded.
2006-12-07 13:14:06.652: [ CRSRES][4588] ora.node1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2006-12-07 13:14:06.667: [ CRSRES][4588] ora.node1.vip failed on node1 relocating.
2006-12-07 13:14:06.758: [ CRSRES][4588] StopResource: setting CLI values
2006-12-07 13:14:06.766: [ CRSRES][4588] Attempting to stop `ora.node1.LISTENER_NODE1.lsnr` on member `node1`
2006-12-07 13:17:41.399: [ CRSRES][4588] Stop of `ora.node1.LISTENER_NODE1.lsnr` on member `node1` succeeded.
<<< takes 3.5 minutes to stop listener

2006-12-07 13:17:41.402: Attempting to stop `ora.node1.ASM1.asm` on member `node1`
<<< stop dependant inst and ASM
2006-12-07 13:17:55.610: [ CRSRES][4588] Stop of `ora.node1.ASM1.asm` on member `node1` succeeded.

2006-12-07 13:17:55.661: [ CRSRES][4588] Attempting to start `ora.node1.vip` on member `node2`
2006-12-07 13:18:00.260: [ CRSRES][4588] Start of `ora.node1.vip` on member `node2` succeeded.
<<< now VIP failover complete after almost 4 mins


ora.node1.LISTENER_NODE1.lsnr.log shows:

2006-12-07 13:17:41.329: [ RACG][1] [23916][1][ora.node1.LISTENER_NODE1.lsnr]:
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1vip)(PORT=1521)(IP=FIRST)))
TNS-12535: TNS:operation timed out
   TNS-12560: TNS:protocol adapter error
     TNS-00505: Operation timed out
     Solaris Error: 145: Connection timed out
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.10.100)(PORT=1521)(IP=FIRST)))
The command completed successfully


Client connections hang during this failover window.


Changes
This may be a new setup, or a setup that was migrated from an earlier release.
Cause
This problem is caused by the first address in the listener.ora configuration being an address that uses the TCP protocol.

In this circumstance, when a network cable is pulled, "lsnrctl stop" has to wait for the TCP timeout before it can try the next address. On Solaris, this timeout is defined by tcp_ip_abort_cinterval, with a default value of 180000 ms (3 minutes), which is why stopping the listener took almost 3.5 minutes. (The TCP timeout on other platforms may vary.) The error message "Solaris Error: 145: Connection timed out" in ora.node1.LISTENER_NODE1.lsnr.log also indicates it is waiting for the TCP timeout.
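On Solaris this timeout can be inspected, and if desired tuned, with ndd; a minimal sketch, assuming root access on a Solaris 10 host (values are in milliseconds):

```shell
# Show the current TCP connect-abort interval (default 180000 ms = 3 minutes)
ndd -get /dev/tcp tcp_ip_abort_cinterval

# Optionally lower it, e.g. to 60 seconds. This takes effect immediately
# but does not persist across reboots (add it to an init script if needed).
ndd -set /dev/tcp tcp_ip_abort_cinterval 60000
```

Tuning this parameter is not required, however: reordering listener.ora as described in the Solution section avoids the wait entirely.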

The listener.ora in this scenario is defined as:

 

LISTENER_NODE1 =
 (DESCRIPTION_LIST =
   (DESCRIPTION =
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
     )
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.10.100)(PORT = 1521)(IP = FIRST))
     )
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
     )
   )
 )

Solution
To prevent this, move the IPC address so that it is the first address for the listener in listener.ora, e.g.:


LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
       (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
       )
       (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
        )
       (ADDRESS_LIST =
           (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.10.100)(PORT = 1521)(IP = FIRST))
        )
     )
  )

When lsnrctl tries to stop the listener, it now connects to the IPC address first, which remains available even when the network cable is pulled, so it does not have to wait for the TCP timeout.

After the above change, the VIP failover takes only 48 to 50 seconds to complete, regardless of the tcp_ip_abort_cinterval setting.

Please note: listener.ora files newly created by 10.2.0.3 through 11.1.0.7 should in most cases already have the IPC protocol as the first address. However, if you upgraded from a previous release, or manually modified or copied over a listener.ora from a previous install, the IPC protocol may not be the first address, regardless of your version. In that case, manual modification is required to move the IPC protocol to the first address and avoid the problem described in this note.
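A quick way to check an existing listener.ora is to look at the protocol of its first ADDRESS entry; a sketch using a throwaway sample file (the /tmp path and file contents are illustrative only — point the grep at your own $ORACLE_HOME/network/admin/listener.ora):

```shell
# Write a small sample listener.ora (illustration only) with IPC first,
# as recommended above.
cat > /tmp/listener_check.ora <<'EOF'
LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
    )
  )
EOF

# The protocol of the first ADDRESS entry should be IPC; if it is TCP,
# "lsnrctl stop" can block on the TCP timeout when the cable is pulled.
first_proto=$(grep -o 'PROTOCOL = [A-Z]*' /tmp/listener_check.ora | head -1 | awk '{print $3}')
echo "first protocol: $first_proto"
```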


Posted by [PineTree]
ORACLE/RAC2012. 5. 18. 16:24

How to Configure Solaris Link-based IPMP for Oracle VIP [ID 730732.1]

  Modified 23-NOV-2011     Type REFERENCE     Status PUBLISHED

In this Document
  Purpose
  Scope
  How to Configure Solaris Link-based IPMP for Oracle VIP
  References


Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on x86-64 (64-bit)

Purpose

This note gives a sample configuration of the link-based failure detection mode for IPMP, which was introduced in Solaris 10.

Before Solaris 10, only probe-based failure detection was available for IPMP; a configuration example can be found in Note 283107.1.

The main differences between probe-based IPMP and link-based IPMP:
- In probe-based IPMP, besides the host's physical IP address you also need to assign a test IP address to each NIC, plus one target system (normally the default gateway) to which the multipathing daemon sends ICMP probe messages.

- In link-based IPMP, only the host's physical IP address is required.

Scope

By default, link-based failure detection is always enabled in Solaris 10, provided that the driver for the interface supports this type of failure detection. The following Sun network drivers are supported in the current release:


hme
eri
ce
ge
bge
qfe
dmfe
e1000g
ixgb
nge
nxge
rge
xge



Network Requirement
--------------------------------
There is no difference between probe-based and link-based IPMP in terms of hardware requirements.

Only one physical IP address is required per cluster node. The following NIC cards and IP addresses are used in the example below:
- Public Interface: ce0 and ce1
- Physical IP: 130.35.100.123
- Oracle RAC VIP: 130.35.100.124

How to Configure Solaris Link-based IPMP for Oracle VIP

IPMP Configuration
-----------------------------
1. ifconfig ce0 group racpub
2. ifconfig ce0 addif 130.35.100.123 netmask + broadcast + up
3. ifconfig ce1 group racpub

To preserve the IPMP configuration across reboots, update the /etc/hostname.* files as follows:
1. The entry of /etc/hostname.ce0 file
130.35.100.123 netmask + broadcast + group racpub up

2. The entry of /etc/hostname.ce1 file
group racpub up

Before CRS installation, the 'ifconfig -a' output will be:

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.1.1 netmask ffffff00 broadcast 192.168.1.255
ether 0:19:b9:3f:87:11
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 130.35.100.123 netmask ffffff00 broadcast 130.35.100.255
groupname racpub
ether 0:14:d1:13:7b:7e
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname racpub
ether 0:18:e7:8:c5:8b


Since no test IP is assigned to the public interfaces, ce0 carries the physical IP address and ce1 shows 0.0.0.0.

CRS / VIPCA configuration
----------------------------------------
Upon successful completion of root.sh during the CRS installation, vipca will make only the primary interface the public interface. If you start the vipca application manually, the second screen (VIP Configuration Assistant, 1 of 2) will list only ce0 as an available public interface.

After that, you need to update CRS with the second NIC's (ce1) information using the srvctl command:

# srvctl modify nodeapps -n tsrac1 -A 130.35.100.124/255.255.255.0/ce0\|ce1
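The modified configuration can then be verified with srvctl; a sketch, where the node name tsrac1 comes from the example above:

```shell
# Show the nodeapps configuration for the node, including the VIP
# address, netmask, and the interface list (ce0|ce1) set above.
srvctl config nodeapps -n tsrac1 -a
```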

After CRS is installed and the Oracle VIP is running, the 'ifconfig -a' output will be:

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.1.1 netmask ffffff00 broadcast 192.168.1.255
ether 0:19:b9:3f:87:11
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 130.35.100.123 netmask ffffff00 broadcast 130.35.100.255
groupname racpub
ether 0:14:d1:13:7b:7e
ce0:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 3
inet 130.35.100.124 netmask ffffff00 broadcast 130.35.100.255
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname racpub
ether 0:18:e7:8:c5:8b


When the primary interface on the public network fails, whether due to a faulty NIC or a broken LAN cable, the Oracle VIP follows the physical IP in failing over to the standby interface, as the following 'ifconfig -a' output shows:

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.1.1 netmask ffffff00 broadcast 192.168.1.255
ether 0:19:b9:3f:87:11
ce0: flags=19000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 3
inet 0.0.0.0 netmask 0
groupname racpub
ether 0:14:d1:13:7b:7e
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname racpub
ether 0:18:e7:8:c5:8b
ce1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 130.35.100.123 netmask ffffff00 broadcast 130.35.100.255
ce1:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 4
inet 130.35.100.124 netmask ffffff00 broadcast 130.35.100.255

References

NOTE:283107.1 - Configuring Solaris IP Multipathing (IPMP) for the Oracle 10g VIP
NOTE:368464.1 - How to Setup IPMP as Cluster Interconnect
docs.oracle.com/cd/E19253-01/816-4554/mpoverview/index.html



Products
  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
Keywords
CLUSTERWARE; CONFIGURATION; SOLARIS; VIP

Posted by [PineTree]