ORACLE/ADMIN2011. 5. 2. 16:58
반응형

Problem Description:
====================

Established connections fails intermittently.
ORA-03113 is returned to the application.
Application must start a new connection.
No errors are generated in the database alert.log


Solution Description:
=====================

WARNING: Misuse of Windows Registry/Editor can cause serious problems
that may require you to reinstall your operating system.

Increase TcpMaxDataRetransmissions parameter in the Windows Registry


On Windows NT/2000:

HKEY_LOCAL_MACHINE
      SYSTEM       
        CurrentControlSet         
          Services
            Tcpip             
             Parameters

add a key "TcpMaxDataRetransmissions" (REG_DWORD) and set its value to 10.

On the Edit menu, click Add Value, and then add the following:

      Value Name: TcpMaxDataRetransmissions
      Value Type: REG_DWORD - Number
      Valid Range: 0 - 0xFFFFFFFF
      Default Value: 5 Decimal
      New Value: 10 to  Decimal

The value may vary as it is dynamically adjusted.


On Windows 95:

HKEY_LOCAL_MACHINE     
   SYSTEM       
     CurrentControlSet         
       Services
            Winsock

add a key "TcpMaxDataRetransmissions" and set its value to 10.


On the Edit menu, click Add Value, and then add the following:

      Value Name: TcpMaxDataRetransmissions
      Value Type: REG_DWORD - Number
      Valid Range: 0 - 0xFFFFFFFF
      Default Value: 5 Decimal
      New Value: 10  Decimal


Solution Explanation:
=====================

When a TCP/IP packet in a given segment is sent, the sender waits for
an acknowledgment (ACK) before proceeding to the next task.
If an ACK is not received before the retransmission timer expires,
the retransmission timer value is doubled and the packet is resent.

A retrans counter is incremented and the process repeats until the
retrans counter is equal to the TcpMaxDataRetransmissions value.
At this point the network is considered timed-out and the socket is
closed by the Microsoft Operating System.

Oracle SQLNET/Net8 tracing reports this as an ntt2err: SOC xxxx error
as well as several other Oracle network transport errors (NT).

The default initial retransmission timer value is set to 3 seconds.
This value is doubled and it's product is doubled on the next iteration,
until iterations equal the TcpMaxDataRetransmissions value.

The total time to time-out with default values looks like this:
  (3s)+(3s*2)+(3s*4)+(3s*8)+(3s*16)=93s OR 1 min 33 seconds to network
  time-out.

If you see something very close to the following a SQL*Net/Net8 trace
file, the most likely cause would be that the network has timed-out.
The ntt2err SOC error is the real error - all errors appearing after
this are secondary or misleading errors.

ntt2err: soc xxxxx error - operation=5, ntresnt[0]=517, ntresnt[1]=54,
ntresnt[2]=0

Following the appearance of ntt2err, you may expect to see something like the
following. These errors are symptoms of the real problem.  They tell us
that the network or server went down when it only timed-out.

-<ERROR>- osnqrc:  wanted 1 got 0, type 0
-<ERROR>- osnqper:  error from osnqrc
-<ERROR>- osnqper:    nr err code: 0
-<ERROR>- osnqper:    ns main err code: 12547
-<ERROR>- osnqper:    ns (2)  err code: 12560
-<ERROR>- osnqper:    nt main err code: 517
-<ERROR>- osnqper:    nt (2)  err code: 54
-<ERROR>- osnqper:    nt OS   err code: 0
osnqer: entry
osnqer:  incoming err = 12151


A little later on you may see the following:
osnqer:  returning err = 3113

A clear sign of a busy network is:
ntt2err: soc xxxx - operation=5,
ntresnt[0]=517, ntresnt[1]=54, ntresnt[2]=0
This is very consistent.
       
The ORA-3113 tells us:
Text:   end-of-file on communication channel

You can look in the trace but it will be impossible to find an EOF
packet dump. You must review the second meaning of ora-3113 - the
network or server machine went down. Once again somewhat misleading
as the network was only busy and the server is really fine.


Note:
Increasing TcpMaxDataRetransmissions is only a workaround for a bad or over
stressed network. The workaround should gives you some time to address the
underlying network issues.
It is time to get out the network sniffer and measure the degree of the
problem.
It is more than likely that it will get worse over time, particularly with
increased network usage.


Search words:

Tcp Max Data Retransmissions
TcpMaxDataRetransmissions
3113
ntt2err: soc error -
operation=5, ntresnt[0]=517,
ntresnt[1]=54, ntresnt[2]=0


ref: {4950.102}    BUG-561277

반응형
Posted by [PineTree]