ORACLE/ADMIN2011. 5. 2. 22:35
반응형

SQL*NET DCD(DEAD CONNECTION DETECTION)과 KEEPALIVE의 관계
========================================================


PURPOSE 
---------

Client PC의 process가 사라지고 난 후 server에서 접속이 close가 안 되었을 경우
문제가 발생할 수 있다. 전형적인 예로는 Oracle에 접속되어 있는 상황에서 
사용자가 기계를 reboot 하거나 전원을 끊거나 할 경우 client process는 정확히
종료되었을 지 모르나 서버상의 background process는 client가 아직까지 접속
되어 있는것으로 생각하고 처리를 계속 수행하고 있으므로 이 결과로 인하여 서버
상에서는 많은 자원을 낭비할 수 있다.

이는 Oracle의 문제가 아니며 TCP/IP의 제한에 따른 것이다.

Dead Conection을 발견하여 보다 빠르게 close시키고 server상에서 사용하고 
있는resource를 보다 빠르게 free시키기 위해서는 keepalive를 정확하게 설정
하여야 한다.

Problem Description 
-------------------

** KEEPALIVE **

TCP/IP접속을 하고 있는 양측에서 접속을 하고 있는 동안 정해진 시간내에 
다른 반대편으로 부터 아무런 데이터를 받지 못했음을 감지하는 경우 
그것으로 인하여 transparent layer에서 "dummy(test)" packet을 application 
layer로 보내고 응답(acknowledgement)가 올 때가지 다른 한쪽에서는 기다리게
된다.
만일 접속된 process에서 응답(acknowledgement)를 기다리는 중에 timeout이 되는
경우 TCP/IP에서는 접속을 포기하기전에 보내진 "dummy(test)" packet에 대한 
응답(acknowledgement)를 기다리는 일련의 일들을 몇번이고 재시도 할 수 있도록
 구현되어 있다.

즉 keepalive가 설정되어 있을 경우에는 client process들이 실제로 connection을
유지하고 있는 지 확인하기 위해 모든 비활성 TCP/IP 접속을 주기적으로 조사
(Polling)한다. 접속이 되어 있는 process들은 정해진 비활성 주기 이후에 서버의
TCP/IP s/w에 의해 "dummy(test)"을 보내기 시작하며 반대편으로 부터 반드시 
응답(acknowledgement)을 받아야 한다. 만일 보낸 시도 packet들이  client에서 
무시되어 버린다면  서버는 접속이 최악의 상태인것으로 간주하고 접속을 
close한다. 그런 다음 ORACLE에서는 connection과 관련된 shadow process를 안전
하게 제거한다.

SQL*NET의 TCP/IP level에서는 listener가 startup될때 setsockopt()라는 함수를
호출함으로서 listener에서는 keepalive가 구현되어 있다.


* Dead mans Handle *

 일반적으로 네트워크를  이용하여 양쪽에 도달하기까지에 대한 algorithm의
 이름이며 keepalive와 매우 동일하고 이는 TCP/IP에서 사용하는 전문용어는 
 아니다. 
 SQL*NET에서 사용하는 용어로서 SQL*NET v2.1이상에 포함되어 있는 dead man's
 handle에 근간을 둔 algorithm으로 모든 platforms들과 protocol들에서 일반적
 으로 사용하고 있으며 이를 DCD(DEAD CONNECTION DECTION)라고도 부른다.
 SQL*NET V2.1.3 이상 (RDBMS V7.1.3이상)에서 사용할 수 있다.

* Timeout *

 Timeout은 orasrv의 기능으로 client가 connection handhake (client가 요청한 
 break-mode, buffersize, SID에 관한 정보를 어디로 전달할 것이지를 찾는 
 일련의 작업)를 하는 동안 응답이 올때까지 계속 기다리는 것을 방지하기 
 위해  orasrv에 추가된 기능이다.
 이 기능은 SQL*NET V1.2.7.2.3이상에서 가능하며 다음의 명령어를 수행하여 
 사용할 수 있다. (Default = 0)

 $ tcpctl timeout 5


만일 사용자가 application에서 정상적으로 종료하지 않고 전원을 끄는 경우
process들은 "Clean-up"되지 않고 이러한 shadow process들은 고아(oraphaned)
process가 될것이다. 
만일 수많은 사용자들이 이렇게 비정상 종료를 할 경우 서버에서는 이러한 defunct 
process들로 인하여 과부하가 걸리게 될 것이며 이때 이들 process들은 parent 
process id (PPID)를 1로 갖는 daemon 또는 init 소유의 process들이다.

이러한 상황을 해결하기 위해, 사용자는 TCP/IP level에서 KEEPALIVE option을 
설정할 수 있으며 이 parameter는
/usr/include/netinet디렉토리내의 tcp_timer.h
참조 할 수 있다.

일반적으로 Keepalive는 다음에서 보는 parameters에 의해 제어 할 수 있으며 
물론 system마다 parameters이름은 약간의 차이가 있을 수 있다.
                              
        tcp_keepidle       :  time before keepalive probes begin 
        tcp_keepintvl      :  time between keepalive probes 
        tcp_maxidle        :  time to drop (after probes)
        tcp_ttl            :  time to live for TCP segs 
        PR_SLOWHZ          :  usually 2 times per second


KEEPALIVE의 기존 interval은 다음과 같이 정의 되어 있다.

#define TCPTV_KEEP_IDLE (120*60*PR_SLOWHZ)  /* dflt time before probing */

여기서 PR_SLOWHZ을 1초로 정의되어 있다고 가정하면 (120*60*PR_SLOWHZ)은 
2시간을 나타내는 것이다. KEEPALIVE가 시작되면 "dummy" packet을 connection
들에 보내어 만일 connection이 active상태라면 KEEPALIVE가 작동하지 않을 
것이고 만일 더 이상 active상태가 아니라면 KEEPALIVE에 의해 connection이 
close될것 이다. 그런 다음Dedicated server process(shadow process)는 kill 
signal을 받는 것과 동시에 종료될 것이다.

주의 : KEEPALIVE의 특징을 모든 TCP/IP Verdor에서 제공하는 것은 아니며,
        이는 TCP/IP level에서 설정하는 것으로 Oracle의 기능은 아니다.
       이 기능은 vendor마다 다를 수 있으므로 만일 값을 설정하려면 TCP/IP 
        vendor로 문의하기 바란다. 
       이 기능은 문제를 피해갈 수 있는 하나의 방법일뿐 문제해결을 위한
       해결책이 아니므로 이로 인해 생기는 문제에 대해 는 보장하지 않는다.

SQL*NET V2.1이상의 서버와 SQL*NET V2.0.14이상의 client에서 제공하는 DCD
(Dead Connection Detection) 기능
은 TCP/IP의 KEEPALIVE 특징처럼 동일한 
기능을 수행한다.

DCD는 서버의 $ORACLE_HOME/network/admin/sqlnet.ora파일내에 있는 parameter인 
sqlnet.expire_time에 정의할 수 있으며 단위는 분이며 1 ~ 무한대까지 가능하다.

만일 sqlnet.ora에 정의되어 있지 않았다면 DCD의 기능은 사용되지 않는 다 
이 parameter는 client와 server사이에서 connection을 통하여 연속되는 시도
packet들을 전달하는 시간 주기를 결졍한다.


Workaround 
-----------

지금까지 설명한 keepalive에 대한 기능을 각 H/W에서는 어떻게 
설정하는 지 알아 보도록 한다. (일반적으로 만이 사용하는 H/W를 기준)
그리고 다음의 OS parameter를 수정시에는 반드시 OS engineer와 상의는 
하는 것이 바람직 하겠다.


< Windows NT >

 1. Regedt32를 수행한다.

 2. HKEY_LOCAL_MACHINE -> SYSTEM -> CurrentControlSet -> Services -> 
    
       Tcpip -> Parameters으로 간다.

 3. 편집메뉴 -> 값추가를 선택하면 값추가 윈도우가 나타난다.

 4. 값이름을 KeepAliveTime으로 넣고 데이타 형식을 REG_DWORD로 선택한후 
    확인버튼을 누르게 되면 DWORD편집기 윈도우가 나타나며 여기서 기수
    (이진, 십진, 16진)를 선택하는 데 십진을 선택하고 난후 데이타 값을
    180000 (3분) 설정한 후 확인 버튼을 누른다. 


< Netware >

   Netware에서는 DCD기능이 적용 안됨


< SunOS, Sun Solaris >

 # ndd /dev/tcp \?                      ----->  parameter 확인하는 방법 
 # ndd /dev/tcp tcp_keepalive_interval  ----->  설정된 값 확인 
 7200000     (기본값 :  7200000 ms)
 # ndd /dev/tcp
 tcp_keepalive_interval 10000
 # ndd /dev/tcp tcp_keepalive_interval 
 10000


< Digital Unix >

 # dbx -k /vmunix

 (dbx) p tcp_maxidle
 1200
 (dbx) p tcp_keepidle                ------> print value
 14400         ------> 설정된 값 확인 
 (dbx) assign tcp_keepidle = 1200    ------> new value assign 
 1200
 (dbx) assign tcp_maxidle = 600      ------> new value assign 
 600
 (dbx) quit

 두 parameter의 단위는 0.5초 입니다. tcp_keepidle의 기본값은 2시간이며 
 tcp_maxidle의 기본값은 20분입니다.
 tcp_keepidle은 keepalive code가 적용되기 전까지의 시간이며, tcp_maxidle은 
 connection이 drop될때까지 keepalive가 적용되는 시간과의  사이를 정해놓은
 시간입니다.
 즉 예를 들자면 tcp_keepidle을 10분으로, tcp_maxidle을 5분으로 설정하여 
 놓았다면 defaunct connection들은 15분후애 종료됩니다.

< IBM AIX >

 # no -a       ------->  설정된 값 확인
   .....         (units of half seconds)
   tcp_keepidle  = 14400       (2  hours)
   tcp_keepintvl = 150         (75 seconds)
   .....
 # no -o tcp_keepidle=1200
 # no -o tcp_keepintvl=40


  tcp_keepidle는 keepalive 메세지를 보내기 전에 활동하고 있지 않은 시간을 
  결정하는 parameter이고 tcp_keepintvl은 keepalive시도 메세지를 얼마나 자주 
  보내는지를 결정하는 parameter이다. 
  접속은 8회의 비응답 시도후에 broken이 고려된다.

  이러한 parameter의 값을 설정하는 것은 어려운 부분으로 불필요한 traffic을 
  감소시키기 위해 적은 값으로 설정하는 게 일반적이다. 실제로 이러한 
  parameter들은 시스템에 상당히 의존적이다. 
  아무튼 IBM AIX에서 oracle을 사용하는 경우 1200(10분)과 400(20초)으로 설정
  하여 사용하기를 추천한다.

  이 parameter의 값을 영구보존하기 위해서는 /etc/rc.tcpip에 두개의 명령어를 
  추가한다.


 < HP-UX  HP9000 Ver 10.01  I70 series >

  조정 할 수 있는 parameter를 확인한다.
 
  # nettune -h

  조정 할 수 있는 TCP parameter의 이름들을 확인한다.

  #  nettune -h tcp | grep


     tcp_localsubnets:
     tcp_receive:
     tcp_send:
     tcp_defaultttl:
     tcp_keepstart:
     tcp_keepfreq:
     tcp_keepstop:
     tcp_maxretrans:
     tcp_urgent_data_ptr:
     tcp_pmtu:
     tcp_random_seq:

  TCP에서 keepalives의 전송을 시작하기 전에 idle time을 변경하기위해 
  tcp_keepstart를 설정한다 
  그리고 keepalives사이의 시간을 변경하기 위해 tcp_keepfreq를 설정한다.


  # nettune tcp_keepfreq    --> This shows the current value: 75.

  # nettune -l tcp_keepfreq --> This shows the range of values.

    tcp_keepfreq = 75 default = 75 min = 5 max = 2000 units = seconds
    
  # nettune -s tcp_keepfreq 1000  -->    set it to 1000

  # nettune tcp_keepfreq          -->    verify it as 1000


  keepalive를 수정하기 위하여

   # adb -w /ph-ux /dev/kmem

    0d10$d              (Change to base 10)
    tcp_keepidle?D      (Display value of keepidle - 14400 = 2 hours)
    tcp_keepidle?W 1200 (This command updates the HP-UX kernel on disk)
    tcp_keepidle/W 1200 (This command updates the HP-UX kernel in
                         memory)
    tcp_keepidle?D      (This will display the value used for the next
                         boot)
    tcp_keepidle/D      (This will display the value used for future
                         connections)
    ^d                  (CRTL-D to exit adb)

    Again, this will modify the HP-UX kernel on disk and in memory.

    사용자의 시스템을 위해 PHNE_8420=Nettune Cumulative Patch 와
    PHNE_11758=FDDI cumulative patch를 찾아보기 바란다.


 < HP-UX  HP9000 Ver 10.10  K210 series >

  TCP에서 keepalives의 전송을 시작하기 전에 idle time을 변경하기위해 
  tcp_keepstart를 설정한다 
  그리고 keepalives사이의 시간을 변경하기 위해 tcp_keepfreq를 설정한다.


  # nettune tcp_keepfreq              # what's the current value?
    75

  # nettune -l tcp_keepfreq           # what are the range of values?
    tcp_keepfreq = 75 default = 75 min = 5 max = 2000 units = seconds

  # nettune -s tcp_keepfreq 1000      # set it to 1000

  # nettune tcp_keepfreq              # verify it
    1000

  변경할 수 있는 모든 값들은 help를 이용하여 얻을 수 있다.

  # nettune -h                # what can I tune?
  # nettune -h tcp            # what TCP variables can I tune?
  # nettune -h tcp | grep :   # what are the TCP tunables named?

  # adb -w /ph-ux /dev/kmem
    0d10$d              (Change to base 10)
    tcp_keepidle?D      (Display value of keepidle - 14400 = 2 hours)
    tcp_keepidle?W 1200 (This command updates the HP-UX kernel on disk)
    tcp_keepidle/W 1200 (This command updates the HP-UX kernel in
                         memory)
    tcp_keepidle?D      (This will display the value used for the next
                         boot)
    tcp_keepidle/D      (This will display the value used for future
                        connections)
    ^d                  (CRTL-D to exit adb)


이외의 H/W에 대해서는 OS verdor로 문의하여 알아 보시기 바랍니다.

반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 22:32
반응형
KeepAlive 란 Idle TCP/IP Connection 의 Time out 기능을 말한다.

TCP/IP 기반으로 remote 로 연결된 상태에서 클라이언트의 갑작스런 리부팅이나
스위치 Off 의 경우 해당 연결을 서버에서 Closing 하지 못하게 된다.
이것은 TCP/IP의 제한인 것이지 오라클의 문제가 아니다.
이 때 KeepAlive가 정확하게 configure되어 있다면 좀 더 빨리 Dead connection
을 발견하고 빨리 Closing할 것이다.
TCP/IP 에 관련된 것이므로 KeepAlive를 각 플랫폼 별로 지정하는 방법이 모두
틀리다.

예를들면 유저가 PC 에서 Unix Oracle Server로 연결 후 실수로 다음 쿼리 문을
실행했다.

SQL> select * from bigtable, hugetable, massivetable;

적게는 수십분, 많게는 몇시간이 걸릴 작업이라면 유저는 쿼리를 인터럽트하기
보다 reset 버튼을 누르게 될 것이다. 이때 클라이언트 프로세스는 없어졌지만
백그라운드 서버 프로세스는 계속 running 중일것이다.
이런 현상은 수시로 일어날 수 있다. 또한 당연한 것이다.
유저가 사용을 잘못하는 것이기 때문이다.

하지만 SQL*NET 2.1 이상에서 Dead-Connection-Detection 이라 불리는 기능을
지원한다. 이것은 곧 KeepAlive 를 지정할 수 있는 기능이다.
반드시 서버 사이드의 $ORACLE_HOME/network/admin/sqlnet.ora 파일에 다음의
파라미터를 지정해야 한다. 클라이언트에는 지정해봐야 전혀 소용이 없다.

sqlnet.expire_time = 1 (단위는 분)

Netware를 제외한 모든 Platform, Protocol에서 지원이 된다.

# Windows NT 에서 KeepAlive 를 지정하는 방법.
(타 플랫폼은 제외)

Regedt32 를 실행한다.

레지스트리에서

HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/Tcpip/Parameters

까지 이동한다.
여기서 키 추가가 아닌 값 추가를 한다.

파라미터는 KeepAliveTime (REG_DWORD) 로 입력한다.

값은 Decimal -> 180000 (for example = 3 minutes)

으로 지정한다.
반응형
Posted by [PineTree]
ORACLE/TroubleShooting2011. 5. 2. 22:10
반응형

An ORA-3113 "end of file on communication channel" error is a general error usually reported by a client process connected to an Oracle database. The error basically means 'I cannot communicate with the Oracle shadow process'. As it is such a general error more information must be collected to help determine what has happened - this error by itself does not indicate the cause of the problem. For example, ORA-3113 could be signalled for any of these scenarios:

    Server machine crashed
    Your server process was killed at O/S level
    Network problems
    Oracle internal errors / aborts on the server
    Client incorrectly handling multiple connections
    etc.. etc.. etc.. - a lot of possible causes !!

This short article describes what information to collect for an ORA-3113 error. It is common for this error to be accompanied by other errors such as:

    ORA-1041 internal error. hostdef extension doesn't exist
    ORA-3114 not connected to ORACLE
    ORA-1012 not logged on

These are all symptomatic of being disconnected from Oracle.

Please collect as much information as possible from the items listed below and submit this information to Oracle support. Where a step produces an output file / trace file this may be needed by Oracle Support to help determine the cause of the error. The more information you can present in one go the better your chance of a speedy solution. Note that some sections may not be applicable.

If viewing this article on a Web Browser you can follow the links, otherwise manually go to the relevant section.

 
What Scenario does the ORA-3113 occur in ?

  A. When attempting to startup Oracle ?                    -> Section A
  B. When attempting to make a connection to the database ? -> Section B
  C. Client gets the error running SQL / PLSQL ?            -> Section C
  D. Server trace file reports ORA-3113 ?                   -> Section D

You may find it useful to scan the checklist in Section E at the end of this article. This covers some questions / issues relevant to all problem sections.

NOTE: References to SVRMGR are outdated with Oracle9i and Oracle 10g. If you are on an earlier version of the database, you will use 'svrmgrl' instead of SQL*Plus to connect as the internal user.

e.g. svrmgrl
SVRMGR> connect / as sysdba (or connect internal)
(A) ORA-3113 when attempting to STARTUP Oracle

  
  There are several phases involved in starting up a database. If ORA-3113
  occurs during startup then abort the instance and start up using the
  sequence below. If an error occurs at any step then see the related notes
  below.

    a. Start any required services.        On error see A1
       Eg: On NT start the OracleServiceSID

    b. Connect as a SYSDBA user.          On error see A1
       Eg:  sqlplus /nolog
        SQL> connect / as sysdba

    c. Startup nomount.              On error see A1
       Eg:
        SQL> startup nomount

    d. Mount the database.             On error see A1 and A2
       Eg:
        SQL> alter database mount;

    e. Recover the database         On error see A3
       Eg:
        SQL> recover database

    f. Open the database             On error see A4
       Eg:    
        SQL> alter database open;

    g. Wait 3 minutes then issue a select.   On error see A4
       Eg:    
        SQL> select count(*) from DBA_OBJECTS;



A1) Errors connecting as SYSDBA / Internal OR on startup nomount
   
    There is something fundamental wrong with the software / environment
    if you cannot connect to Server Manager as a DBA user.
    The steps here cover errors such as ORA-3113, ORA-12547: TNS:lost contact
    or similar errors connecting to Oracle or starting the instance NOMOUNT. 
    Check the following items:

    A1.1)    If possible reboot the server disabling any automatic
        startup of Oracle before you do so. This may seem drastic
        but helps make sure you are working from a consistent
        starting point.

    A1.2)     Check your environment points at the expected ORACLE_HOME
        and ORACLE_SID and that TWO_TASK is not set (Unix) or
        LOCAL is not set (NT registry).
                Check the USER_DUMP_DEST and BACKGROUND_DUMP_DEST and default
                trace directories under this environment for any user trace
                files or alert log entries generated. These may help indicate
                the cause of the problem.
                Eg: ORA-600[SKGMINVALID] may indicate a problem with the
                    shared memory Unix parameters on Unix systems.
                Try to show that any trace file / alert log entry you
                find is truely related to the "CONNECT" command by re-issuing
                the "connect" and checking for a new trace file / alert entry
                at the time of the error.

        A1.3)   Unix only:
        Some Unix platforms need LD_LIBRARY_PATH to be set
                correctly to resolve any dynamically linked libraries.
                As the user with the problem:

                        % script /tmp/ldd.out
                        % id
                        % cd $ORACLE_HOME/bin
                        % ldd oracle
                        % exit

                If the 'ldd' command does not exist go to the next step below.
                Check that all lines listed show a full library file. If there
                are any 'not found' lines reported contact Oracle support
                with the output of /tmp/ldd.out .


        A1.4)   Unix only:
        Your 'oracle' executable may be corrupt. Relink it thus:

                        Log in as the 'oracle' user.
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
            % mv $ORACLE_HOME/bin/oracle $ORACLE_HOME/bin/oracle.dd.mon.yy
                        % rm -f ./oracle       
                        % make -f ins_rdbms.mk ioracle
                        % exit

        Prior to Oracle7.3 the relink command was:
            make -f oracle.mk ioracle

                If this reports any errors Oracle support will need to see
                the contents of the file /tmp/relink.out .

   
    A1.5)   Have you installed the Parallel Server Option ?
                ORA-3113 can occur if you have installed the Parallel
        Server Option but do NOT have a Distributed Lock Manager
        installed or running correctly.

                Unix:
          If the Parallel Server Option was installed by accident
          then it can be de-installed by relinking.
          Eg:
                        Shut down any Oracle instances
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
                        # 'oracle' should not exist so delete it if it present
                        % rm -f oracle
                        % make -f ins_rdbms.mk no_parropt ioracle
                        % exit

          NB: Do NOT deinstall the Parallel Server Option if the
              database is using Parallel Server unless both nodes
              are shut down otherwise database corruption could occur.

    A1.6)   If the error is on STARTUP NOMOUNT:

            Check the init.ora file used to start the database.
            This provides the configuration details used
            configure the instance. To help isolate the problem
            it may be useful use a very basic init.ora file
            when starting the instance. If this works then
            parameters can be increased / introduced one at a
            time to see if there is a problem with a particular
            setting.

    A1.7)     Check for server side trace files which may give more
        indication what the underlying problem is.
        See section C for details on how to check
        for server trace files.
       

    A1.8)     Ensure there is free disk space in:

          a. Your USER_DUMP_DEST and BACKGROUND_DUMP_DEST locations

          b. Your AUDIT destination (Unix)
            The default is $ORACLE_HOME/rdbms/audit

          c. Your Oracle Trace directory if Oracle Trace is enabled
            See Note:45482.1

        A1.9) Windows 2000 - Only
              If the Server's sqlnet.ora file contains Authnetication services
              which are NOT reachable by Oracle, then an ORA-3113 error will
              result.
              For example, if the sqlnet.ora file contains the parameter:
              SQLNET.AUTHENTICATION_SERVICES = (NTS) and the Oracle database
              is moved from a Windows NT Domain to an Active Directory one,
              or if a Domain Controller is introduced, then an error will
              result trying to start the database.
              Remove the sqlnet.authentication_services line so that Oracle
              does not look for a non-existent KDC (Kerberos Domain Controller).


           
A2) Errors Mounting the database

    Check all the items in A1 first.

    If an error occurs when mounting the database there may be problems
    with the control-files or data files, or with resources required to
    open these files.

    A2.1)     The location of the control files are specified in the
        init.ora file.  Try mounting using each controlfile in
        turn.
        eg: "Shutdown abort",
            edit the init.ora to refer to ONE of the controlfiles only,
            "startup nomount",
            "alter database mount"
        Repeat for each controlfile to see if any controlfile works.
       
    A2.2)    It is possible to re-create the controlfiles if you know the
        location of all datafiles and online logs, or to restore an old
        backup controlfile. Always back up the current controlfiles before
        restoring any backup copies or issuing a CREATE CONTROLFILE
        command.
        The steps for this are not documented here.

        A2.3)   Unix:
                Some unix platforms have a 'truss' command (or 'tusc').
        If available this can be used to help trace how far Oracle
        gets before the error occurs.
                Eg: 
                        % truss -o /tmp/truss.out -f svrmgrl

                Keep the file /tmp/truss.out safe - Oracle Support MAY need to see it.


A3) Errors on RECOVER DATABASE

      ORA-3113 during recover database is often related to a corruption on the
    database or redo stream which causes the shadow process to die. There should
    be a server side trace file produced for this sort of problem.
    See Section C for details on how to locate any trace files
    from both USER_DUMP_DEST and BACKGROUND_DUMP_DEST.

    A3.1)    If the "recover database" fails fairly quickly then it
        may help to collect the redo up to the point of failure as this
        may help identify where the problem is.

        Use the following commands just prior to the RECOVER DATABASE
        command:

          SQL> alter session set max_dump_file_size=unlimited;
          SQL> alter session set events
          2> '10228 trace name context forever, level 10';
          SQL> RECOVER DATABASE
       
        This causes redo information to be written to the user trace
        file. The last items of redo may help determine which file
        has problems.

    A3.2)    If you do not have many datafiles in the database it may be
        as quick to recover each file in turn to see if this narrows
        down the problem.
        Eg:
          SQL> select name from v$datafile;
       
        and then for each file:

          SQL> RECOVER DATAFILE 'full_file_name'

        If this gets to a problem file then back up the file and
        use standard recovery options as if the file was lost.
       

A4) Errors on ALTER DATABASE OPEN

    Database open performs very many operations and so it is necessary
    to collect any trace information before determining the next steps.
    However, the following may help isolate the problem more quickly:

    A4.1)     Move files out of your USER_DUMP_DEST and BACKGROUND_DUMP_DEST
        directory as these steps will generate a lot of trace.

    A4.2)     Edit the init.ora file and add the lines:

            event="10046 trace name context forever, level 12"
            event="10015 trace name context forever, level 1"
            event="10228 trace name context forever, level 1"

        If you already have "EVENT=" lines in the init.ora file
        this MUST directly below the other "Event=" lines.
        These lines will trace:

            SQL and BIND activity during startup
            REDO applied
            Information about transactional rollback required

    A4.3)     Startup the instance as described at the top of this section.
       
        As soon as the error occurs REMOVE the above events from the init.ora
        file and shutdown. Collect together the trace files and alert logs
        as described in Section C

       


(B) ORA-3113 attempting to connect to the database

  
   Which SQL*Net layer are you using ?

    a. SQL*Net2 or Net8  -> Goto B1
       This is the default for Oracle 7.0.15 onwards

    b. SQL*Net 1         -> Goto B2
       Note: This can only be used with Oracle releases before 7.3


B1) Connecting to Oracle using SQL*Net V2 / Net8

   Net8 or SQL*Net2 should report network related errors if a problem
   occurs whilst establishing a connection to a remote Oracle shadow
   process. An ORA-3113 implies that the connection itself has been established
   but then is lost. As such follow the steps in Section C

B2) Connecting to Oracle using SQL*Net V1

   This section is included for historical reasons.


   Is the tool you are trying to connect from

    a. A local connection ?    -> Goto BL
       ie: The client is on the same machine as the database and you are
        not trying to connect over a network connection.

    b. A remote connection ? -> Goto BR


   BL) Local SQL*Net V1 Connections

      For local connections check the following:

        BL1)    Have you installed the Parallel Server Option ?
                ORA-3113 will occur if you have installed the Parallel
                Server Option but do NOT have a Distributed Lock Manager
                installed or running.

                To deinstall the Parallel Server Option:

                        Shut down any Oracle instances
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
                        # 'oracle' should not exist so delete it if it present
                        % rm -f oracle
                        % make -f oracle.mk no_parropt ioracle
                        % exit

                If this reports any errors Oracle support will need to see
                the contents of the file /tmp/relink.out .


        BL2)    Try using the SQL*Net V1 driver for local connections:

                        setenv TWO_TASK P:

                Then try the client tool. If this now works you may have a
                problem with the default SQL*Net driver.


        BL3)    Your 'oracle' executable may be corrupt. Relink it thus:

                        Log in as the 'oracle' user.
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
                        % rm -f oracle       
                        % make -f oracle.mk ioracle
                        % exit

                If this reports any errors Oracle support will need to see
                the contents of the file /tmp/relink.out .


        BL4)    Some Unix platforms need LD_LIBRARY_PATH to be set
                correctly to resolve any dynamically linked libraries.
                As the user with the problem:

                        % script /tmp/ldd.out
                        % id
                        % cd $ORACLE_HOME/bin
                        % ldd oracle
                        % exit

                If the 'ldd' command does not exist go to the next step below.
                Check that all lines listed show a full library file. If there
                are any 'not found' lines reported contact Oracle support
                with the output of /tmp/ldd.out .


        BL5)    If you cannot connect as the Oracle user AND your system has
                the 'truss' command try the following when logged in as
                'oracle' (using the relevant client tool):
               
                        % truss -o /tmp/truss.out -f sqlplus user/password
                        Exit from sqlplus (or the problem tool)

                Keep the file /tmp/truss.out safe - Oracle MAY need to see it.


   BR) Remote SQL*Net v1 Connections

      For remote connections check the following:

        BR1)    Check if you can make LOCAL connections. If not then follow
                the steps above for LOCAL connection problems.


        BR2)    Which SQL*Net protocol are you using ?

                Table B.1
                ~~~~~~~~~
                SYMBOL     SQL*Net V1 Layer  Prefix  Debug       Listener
                ------     ----------------  ------  -----       --------
                osnptt      PIPE Two Task     P:     OSNTTD      none
                osnasy      ASYNC             A:     OSNADBUG    none
                osnttt      TCP/IP Two Task   T:     OSNTDBUG    orasrv
                osntlitcp   TCP/TLI Two Task  TT:    OSNTLIDBUG  tcptlisrv
                osntlispx   SPX Two Task      X:     OSNTLIDBUG  spxsrv
                osndnt      DECNET Two Task   D:     OSNDDBUG    none


                For the protocol you are using check that the 'oracle'
                executable has this linked in thus:

                        Log in as oracle on the server
                        % script /tmp/drivers.out
                        % cd $ORACLE_HOME/bin
                        % drivers oracle
                        % exit

                        Eg: If you are using TCP/IP it should list TCP/IP.

                If the 'drivers' command does not exist on your machine check
                the 'oracle' executable as below substituting the relevant
                symbol from Table C.1 for the word 'SYMBOL'. If you get *NO*
                output the driver is probably *NOT* installed.

                        % script /tmp/symbols.out
                        % cd $ORACLE_HOME/bin
                        % nm oracle | grep -i SYMBOL    # Use relevant SYMBOL
                        % exit

                        Eg: For SQL*Net TCP/IP you would use the command:

                                % nm oracle | grep -i osnttt

                If the required driver is not installed you should:

                        a) Relink Oracle (See step (B3) above).
                        b) Re-check the 'oracle' executable for the
                           relevant driver. If it is still missing then
                           the relevant SQL*Net driver has probably not
                           been installed. Reinstall the required SQL*Net
                           driver.


        BR3)    Check your /etc/oratab or /var/opt/oracle/oratab file
                is of the form:

                        # Comments begin with a HASH
                        SID:/path/to/oracle/home:N

                And confirm:
               
                        [ ] There are no blank lines.
                        [ ] The PATH to ORACLE_HOME is correct and contains
                             no environment variables.
                        [ ] There are no ':'s in the ORACLE_HOME path.
                        [ ] There is NOTHING at the end of the line.
                            The last character on a line should be Y or N.
                            There should NOT be a fourth field.


        BR4)    Unix only:
        If you have 'truss' available try to truss the Oracle
                connection. You will normally need 'root' privilege to do this
                and should use truss on the relevant listener process (see
                Table B1)
                Eg: For TCP/IP the listener is 'orasrv' so enter these commands
                    as 'root':

                        % truss -o /tmp/truss.out -f -eall -p <PID_of_orasrv>
               
                        Attempt the connection to reproduce the ORA-3113 then
                        interrupt this 'truss' session.

                This will produce a LOT of output so keep it but do not send it
                to Oracle initially - just make a note that it is available.


(C) Client sees ORA-3113 running SQL / PLSQL

  
   If the ORA-3113 error occurs AFTER you have connected to Oracle then
   it is most likely that the 'oracle' executable has terminated unexpectedly.

        C1)     Determine which database you were connected to and
                obtain the following init.ora parameter values:
       
                        Parameter               Default
                        ~~~~~~~~~               ~~~~~~~
                        USER_DUMP_DEST          $ORACLE_HOME/rdbms/log
                        BACKGROUND_DUMP_DEST    $ORACLE_HOME/rdbms/log
                        CORE_DUMP_DEST          $ORACLE_HOME/dbs

                Eg: To find these log into Server Manager and:

                        SQL> show parameter dump


        C2)     Check in your 'USER_DUMP_DEST' for any Oracle trace file.
                It is important to find the correct trace file.
        On Unix:
          Use the command 'ls -ltr' to list files in time order with the
                  latest trace files appearing LAST. The trace file will typically
          be of the form '<SID>_ora_<PID>.trc'.
        On NT:
          Click on the "Modified" column in Windows Explorer to sort the
          files by their modified date. Files will typically be of the form
          'ORA<PID>.TRC'.
       
        If you are not sure which trace file may be relevant MOVE all
        the current trace files to a different directory and reproduce
        the problem.

        C3)     Check in your 'BACKGROUND_DUMP_DEST' for your alert log and
        any other trace files produced close to the time of the error.
        On Unix this should be named 'alert_<SID>.log'.
        On NT this should be named '<SID>ALRT.LOG'

        C4)     Unix Only:
        If there is no trace file check for a 'core' dump in the
                CORE_DUMP_DEST. Check thus:

                        % cd $ORACLE_HOME/dbs   # Or your CORE_DUMP_DEST
                        % ls -l core*

                If there is a file called 'core' check its time matches the
                time of the problem. If there are directories called
                'core_<PID>' check for core files in each of these. It is
                IMPORTANT to get the correct core file. Now obtain a stack
                trace from this 'core' file. Check each of the sequences below
                for how to do this - one of these should work for your
                platform:


                    If you have dbx:
                        % script /tmp/core.stack
                        % dbx $ORACLE_HOME/bin/oracle core
                        (dbx) where
                        ...
                        (dbx) quit
                        % exit
                       
                    If you have sdb:
                        % script /tmp/core.stack
                        % sdb $ORACLE_HOME/bin/oracle core
                        * t
                        ...
                        * q
                        % exit
                       
                    If you have xdb:
                        % script /tmp/core.stack
                        % xdb $ORACLE_HOME/bin/oracle core
                        (xdb) t
                        ...
                        (xdb) q
                        % exit


                    If you have adb:
                        % script /tmp/core.stack
                        % adb $ORACLE_HOME/bin/oracle core
                        $c
                        ...
                        $q
                        % exit


        C5)     Try to isolate the SQL command that is executing when
                the error occurs. Eg: Is it a particular SQL statement
                or PL/SQL block that causes the error ?
        In many cases this will be listed in the trace file
        produced under the heading "Current SQL statement", or
         near the middle of the trace file under the cursor referred
        to by the "Current cursor NN" line.

        If the trace does not show the failing statement then
        then SQL_TRACE may be used to help determine this provided
          the problem reproduces. SQL_TRACE can be enabled in most
        client tools:

                Eg: Product     Action
                    ~~~~~~~     ~~~~~~
                    SQL*Plus    Issue 'ALTER SESSION SET SQL_TRACE TRUE;'
                    Pro*        EXEC SQL ALTER SESSION SET SQL_TRACE TRUE;

                This should force a server side SQL trace file as detailed
                in C2 above. The trace file should give a clue as to what
                SQL was being executed.


        C6)     If no trace file can be found and the problem is reproducible
        then SQL*Net trace may help to show what the latest operation sent
        to the 'oracle' process was.
                For SQL*Net V2 / V8 tracing see the article Note:16564.1
   
        Historical note:
                     For SQL*Net V1 check which SQL*Net protocol you are using
                    and note the 'Debug' environment variable from table B1 below.
                    Then catch SQL*Net trace from the client. Eg: For SQL*Net
                    TCP/IP and sqlplus:
   
                            % setenv OSNTDBUG -1    # Use correct OSN*DBUG variable
                            % sh
                            % sqlplus scott/tiger@t:hostname:sid 2>/tmp/net1.out
       

        C7)     Based on information collected above try to determine a small
                test case which will reproduce the problem. This is important
                for two reasons:

                        a) It gives Oracle support a small test case if the
                           problem does not look like a known problem.
               
                        b) It gives you a simple way to check if any patch
                           supplied will fix the problem.



    C8)    If a statement can be isolated which consistently raised an
        ORA-3113 error then it is worth spending some time collecting
        additional information such as:
           
            - An execution plan for the statement
            - Table definitions, column definitions
            - Information on constraints, triggers etc..

        ie: Any additional information about the statement which fails.
            eg: If a SELECT fails then it may succeed if run under a
            different optimizer mode.



    C9)     Check if your server administrator has any scripts which abort
        long running or CPU intensive processes. An ORA-3113 process
        can occur if someone kills your Oracle shadow process at O/S
        level (Eg: kill -9 on Unix).


(D) Server trace reports ORA-3113

    D1)     It is unusual for a server trace to report ORA-3113.
        However, this can occur if the server loses contact with
        the client OR a database link connection.
        Treat this the same as an ORA-3113 in a client process
        and follow the steps in Section C.

    D2)     The following line may be added to the init.ora file
        to help collect maximum information when the error occurs:

            event="3113 trace name errorstack level 3"

        If you already have "EVENT=" lines in the init.ora file
        this MUST directly below the other "Event=" lines.

(E) Additional Checks / Information

        E1)     Is it just this one tool that encounters the error or
                do you get ORA-3113 from any tool doing a similar operation ?
                If the problem reproduces in SQL*Plus use this in all tests
                below.

        E2)     Unix only:
        Check if the problem is just restricted to:

                        [ ] One particular UNIX user,
                        [ ] Any UNIX user
                    or  [ ] Any UNIX user EXCEPT as the Oracle user.


        E3)     Check if the problem is just restricted to:

                        [ ] One particular ORACLE logon
                    or  [ ] Any ORACLE logon that has access to the
                                relevant tables.

        E4)     If this is a client-server set up does this occur from:

                        [ ] Any client
                        [ ] Just one particular client
                    or  [ ] Just one group of clients ?
                            If so what do these clients have in common ?
                            Eg: Software release .
               
        E5)     Do you have a second server or database version where the
                same operation works correctly ?


    E6)     Ensure there is free disk space in:

          a. Your USER_DUMP_DEST and BACKGROUND_DUMP_DEST locations

          b. Your AUDIT destination (Unix)
            The default is $ORACLE_HOME/rdbms/audit

          c. Your Oracle Trace directory if Oracle Trace is enabled
            See Note:45482.1

반응형

'ORACLE > TroubleShooting' 카테고리의 다른 글

ora 4031 에러 처리 방안  (0) 2011.06.02
ORA-4031 에러 만들기  (0) 2011.06.02
DIAGNOSING ORA-3113 ERRORS [ID 1020463.6]  (0) 2011.05.02
Thread 1 cannot allocate new log, sequence  (0) 2011.03.25
CHECKPOINT NOT COMPLETE  (0) 2011.01.30
Posted by [PineTree]
ORACLE/TroubleShooting2011. 5. 2. 21:48
반응형

Applies to:

Oracle Server - Personal Edition
Oracle Server - Enterprise Edition
Oracle Server - Standard Edition
Information in this document applies to any platform.
*** Checked for relevance on 05-Feb-2007 ***
*** Checked for relevance on 27-Jul-2010 ***

Purpose

The ORA-3113 error is a general error reported by Oracle client tools,
which signifies that they cannot communicate with the oracle shadow
process. As it is such a general error more information must be collected
to help determine what has happened.

This short article describes what information to collect for an
ORA-3113 error when the Oracle server is on a Unix platform.

Scope and Application

This article is intended for DBAs at all levels of experience.

DIAGNOSING ORA-3113 ERRORS

General Issues:
===============
1) Is it only one tool that encounters the error or do you get an ORA-3113 from any tool doing a similar operation?  If the problem reproduces in SQL*Plus use this in all tests below.

2) Check if the problem is just restricted to:
     [ ] One particular UNIX user,
     [ ] Any UNIX user
 or [ ] Any UNIX user EXCEPT as the Oracle user.

3) Check if the problem is just restricted to:
     [ ] One particular ORACLE logon
 or [ ] Any ORACLE logon that has access to the relevant tables.

4) If you have a client-server configuration does this occur from:
    [ ] Any client
    [ ] Just one particular client
or [ ] Just one group of clients ?

If so what do these clients have in common ?
Eg: Software release .

5) Do you have a second server or database version where the same operation works correctly?

Connecting to Oracle
================

If the ORA-3113 error occurs when actually connecting to Oracle then follow the section below. If you connect to Oracle successfully and get the error on an established connection, please go to the section 'An Established Connection'.

Local Connections
==============

For local connections check the following:

1) Try using the SQL*Net driver for local connections:

setenv TWO_TASK P:

Then try the client tool. If this now works you may have a problem with the default SQL*Net driver.

2) Your 'oracle' executable may be corrupt. Relink it as follows:

For 7.3.X thru 8.1.7

Log in as the 'oracle' user.
% script /tmp/relink.out
% cd $ORACLE_HOME/rdbms/lib
% rm -f oracle
% make -f oracle.mk ioracle
% exit

For 9.2.X thru 11.2.X use the relink executable
relink
usage:
relink
parameters: all, oracle, network,
client, client_sharedlib interMedia,
ctx, precomp, utilities, oemagent, ldap

If this reports any errors Oracle support will need to see the contents of the file /tmp/relink.out .

3) If you cannot connect as the Oracle user AND your system has system call tracing such as truss, trace the problem connection when logged in as 'oracle' (using the relevant client tool). The example given below is for truss:

% truss -o /tmp/truss.out -f sqlplus user/password

Keep the file /tmp/truss.out safe - Oracle MAY need to see it.

Remote Connections
================

For remote connections check the following:

1) Check if you can make LOCAL connections. If not then follow the steps above for LOCAL connection problems.

2) If you have truss or an equivalent tracing tool available try to trace the Oracle connection. You will normally need the root privilege to do this and should trace the listener process.

Eg: For TCP/IP the listener is 'tnslsnr' so enter these commands as 'root':

% truss -o /tmp/truss.out -f -eall -p <pid of tnslsnr>

Attempt the connection to reproduce the ORA-3113 then interrupt this 'truss' session.

An Established Connection:

===================

If the ORA-3113 error occurs AFTER you have connected to Oracle, then it is most likely that the oracle executable has terminated unexpectedly.

1) Determine which database you were connected to and obtain the following init.ora parameter values:

Parameter                                               Default

USER_DUMP_DEST                            $ORACLE_HOME/rdbms/log
BACKGROUND_DUMP_DEST          $ORACLE_HOME/rdbms/log
CORE_DUMP_DEST                           $ORACLE_HOME/dbs

To find these log into SQL*Plus:

SQL> show parameter dump

2) Check your alert_<SID>.log for errors coinciding with the ORA-3113.

3) Check in your 'USER_DUMP_DEST' for any Oracle trace file. It is important to find the correct trace file. Use the command 'ls -ltr' to list files in time order with the latest trace files appearing LAST. If you are not sure which trace file may be relevant, move all the current trace files to a different directory and reproduce the problem. The trace file will typically be of the form 'ora_<spid>.trc'. If there is an error in the alert.log the relevant trace file name will be referenced.

4) If there is no trace file check for a core dump in the CORE_DUMP_DEST. Check as follows:

% cd $ORACLE_HOME/dbs # Or your CORE_DUMP_DEST
% ls -l core*

If there is a file called core, check that its time matches the time of the problem. If there are directories called 'core_' check for core files in each of these. It is IMPORTANT to get the correct core file. Now obtain a stack trace from this core file using Note 1812.1 'TECH: Getting a Stack Trace from a CORE file'
5) Try to isolate the SQL command that is executing when the error occurs. Eg: Is it a particular SQL statement or PL/SQL block that causes the error? To help establish this turn on SQL_TRACE for the client tool.

Eg: Product     Action
~~~~~~~       ~~~~~~
SQL*Plus       Issue 'ALTER SESSION SET SQL_TRACE TRUE;'
Pro*               EXEC SQL ALTER SESSION SET SQL_TRACE TRUE;

This should force a server side SQL trace file as detailed in #3 above. The trace file should give a clue as to what SQL was being executed.

6) Try to obtain any SQL*Net trace to show what the latest operation sent to the Oracle process was (Note 16564.1 'TECH: SQL*Net V2 on Unix - A Quick Guide to Setting Up Client Side Tracing')

7) Based on information collected above try to create a small test case which will reproduce the problem. This is important for two reasons:

a) It allows Oracle to test in a debug environment if the problem does not look like a known problem.
b) It gives you a simple way to check if any patch supplied will fix the problem.








관련 정보 표시 관련 자료


제품
  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Personal Edition
  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Standard Edition
키워드
UNIXDIAG
오류
ORA-3113; 3113 ERROR
반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 21:42
반응형

1.서버/클라이언트 로그기록

2.테스트 프로그램 로그기록(truss)

3.DB login/logout 로그기록(audit)

-client 요청에 의한 logout 인지 pmon에 의한 logout 인지 확인가능

4.event trace

 

1.trace 및 로그기록

 

Client | sqlnet.ora
----------------------------------

TRACE_LEVEL_CLIENT=ADMIN
TRACE_FILE_CLIENT=client.trc
TRACE_DIRECTORY_CLIENT=/user1/ora9i/dbConn
TRACE_UNIQUE_CLIENT=on

 

Server | sqlnet.ora
----------------------------------

TRACE_LEVEL_SERVER=ADMIN
TRACE_FILE_SERVER=server.trc
TRACE_DIRECTORY_SERVER =/data1/oracle/app/oracle/product/817/network/log/200710
LOG_FILE_SERVER=server.log
LOG_DIRECTORY_SERVER = /data1/oracle/app/oracle/product/817/network/log 

 

# listener.ora
----------------------------------

TRACE_LEVEL_mercury=support
TRACE_DIRECTORY_mercury=/data1/oracle/app/oracle/product/817/network/log/200710
TRACE_FILE_mercury=dbSrv
TRACE_TIMESTAMP_mercury=ON
TRACE_FILENO_mercury=3
TRACE_FILELEN_mercury=1024768

 

2.Client

$>truss -aeof <output> <execute>

 

3.audit

sql>audit create session username by access;

 

4.event trace(서버 alert.log에는 아무것도 안남습니다만 혹시나해서~)
alter system set event="3113 trace name errorstack level 3"
alter system set max_dump_file_size=unlimited;
alter system set events '3113 trace name errorstack level 3';

->'3113 trace name errorstack level 3'; => '03113 trace name ...  이렇게 해야되는가요? '0' 없어도 되죠?

반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 21:38
반응형
TECH: SQL*Net V2 on Unix - A Quick Guide to Setting Up Client Side Tracing [ID 16564.1]


Modified 20-OCT-2005 Type REFERENCE Status PUBLISHED

( For a more detailed discussion of SQL*Net V2 tracing see Note:16658.1 )

Client Tracing
~~~~~~~~~~~~~~
1) Set the environment variable TNS_ADMIN to the directory where the 
   tnsnames.ora and listener.ora files exist.

   The default location is $ORACLE_HOME/network/admin. Set $TNS_ADMIN to this
   if it is not set. This ENSURES you know which files you are using.

2) Start the listener:          lsnrctl
                                > set password <password>
                                > start

   Note any errors. If you do not have a password set then ignore the 
   set password command.

3) If the listener started, start the database.

4) Create a file in $HOME called .sqlnet.ora and add the lines:

        trace_level_client= 16
        trace_file_client=client
        trace_directory_client= /tmp            (or similar)
        trace_unique_client=true

5) Try to connect from SQL*Plus thus:

        sqlplus username/password@alias

   or
        sqlplus username/password

   substituting a suitable alias.

6) If you get an error we may need to see the client trace file
   /tmp/client_<PID>.trc where <PID> is the process ID of the 
   client process (*1).
   This will be quite large so it is best to FAX or EMAIL it.

*1 Note: On earlier versions of SQL*Net the filename may NOT have
         the process ID appended to it.


Listener Tracing:
~~~~~~~~~~~~~~~~~

1) Edit your $TNS_ADMIN/listener.ora file and add the lines:

        TRACE_LEVEL_LISTENER     = 16    
        TRACE_DIRECTORY_LISTENER = /tmp
        TRACE_FILE_LISTENER      = "listener"   

2) Stop and restart the listener:

        lsnrctl stop
        lsnrctl start

   Output should go to /tmp/listener.trc


반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 21:36
반응형

In this Document
  Goal
  Solution
     Launch the Net Manager utility
     Enabling Oracle Net Client / Server tracing (or both).
      Enabling Oracle SQLNet Listener tracing.
     Cyclic Tracing
     Tracing Kerberos
     Tracing Externel Procedures (Extproc)
     Tnsping Trace
     Enabling Dynamic Listener Tracing
     Only for 11g
  References


Applies to:

Oracle Net Services - Version: 9.2.0.1.0 to 10.2.0.2.0
Information in this document applies to any platform.

Goal

This document covers how to use Oracle Net Manager tool to enable Oracle SQLNet Client, Server, Listener, Kerberos and External Procedure tracing.  It also includes how to control the size and amount of trace files produced and steps to start dynamic listener tracing.

Solution

Launch the Net Manager utility

 

Windows : Go to Start button, select the Oracle Home on which we are going to generate the tracing for. Net Manager is located under 'Configuration and Migration Tools' of the Oracle Home menu.

Unix : The tool can also be started remotely from a Microsoft Windows system with the help a terminal emulation software. You need to ensure this software is up and running first.  Ensure that the $DISPLAY, $ORACLE_HOME, and $PATH environment variables are properly set. The Net Manager tool is started by the command "netmgr".

[oracle@testnet ~]$ export DISPLAY=152.69.172.167:0.0
[oracle@testnet ~]$ . ora10gr2.env
The environment settings are performed for 10gr2 home
[oracle@testnet bin]$ pwd
/u01/ora10gr2/bin
[oracle@testnet ~]$ netmgr

 

After launching the Net Manager, the location of the network configuration files shown in the top of the Net Manager window.  Ensure this is the expected location. If it is not, you have TNS_ADMIN environment variable set, which is pointing the session towards a different location.

 

Enabling Oracle Net Client / Server tracing (or both).

 

A. Click on Profiles in the left panel. The right panel of Net Manager will now show more details.

B. Select General from drop down menu on the right panel of the Net Manager.

C. Select Tracing tab in the right panel. Client Information and Server Information is now seen.

D. Select the required Trace Level. Oracle Support usually requests level SUPPORT (also known as level 16). It is optional to provide a value for the Trace Directory and Trace File. If you do,  ensure that the directory exists and that the owner of the Oracle software is able to write to that directory. The default location for the trace files is the $ORACLE_HOME/network/trace directory. Default name is CLI.TRC

The Unique Trace File Name option for the Client Information section, will make unique trace files for every connection. Without this option selected, all client tracing goes into the one file.  If running Microsoft Windows, always set this parameter or there is the potential that the trace file with the error will be overwritten.  The Server does not have this parameter available.

E. If tracing is to be done on the Server and the connection model is Shared Server, then the Instance will have to be restarted for the Dispatchers to pick up the changes to the SQLNET.ORA file.

The SQLNET.ORA will now look like the following :

TRACE_LEVEL_CLIENT = SUPPORT
TRACE_UNIQUE_CLIENT = on
TRACE_LEVEL_SERVER = SUPPORT
TRACE_DIRECTORY_CLIENT = C:\TEMP
TRACE_FILE_CLIENT = CLIENT
TRACE_DIRECTORY_SERVER = C:\TEMP
TRACE_FILE_SERVER = SERVER
TRACE_TIMESTAMP_ CLIENT = ON
Note:If tracing is to be done on the Server and the connection model is Shared Server(MTS), then the Instance will have to be restarted for the Dispatchers to pick up the changes to the SQLNET.ORA file.

 Enabling Oracle SQLNet Listener tracing.

A. In the left panel of the Net Manager, select Listeners view.

B. Select the Listener you want to trace. Once selected the right hand panel will show more details.

C. Select General Parameters from the drop down button, in the right hand panel.

D. Select Logging & Tracing tab.

E. Click the Tracing Enabled radio button and select Trace Level. SUPPORT level is normally requested by Oracle Support.

Select a directory for the trace to be generated.

G. Listener will need to be restarted or reloaded to enable or disable tracing.
 

Note: Stopping a Listener does not effect existing connections.

 

The LISTENER.ORA will now have the following lines.

TRACE_FILE_LISTENER = LISTENER.TRC
TRACE_DIRECTORY_LISTENER = C:\TEMP
TRACE_LEVEL_LISTENER = SUPPORT

Cyclic Tracing

Adding parameters TRACE_FILENO_ and TRACE_FILELEN_ to the SQLNET.ORA or LISTENER.ORA files enables Oracle Net trace files to be limited a certain size and a certain number of files. Following example is for Listener traces of 10 MB, with a maximum of 10 files for a maximum of 100 Mb of trace files. Cyclic tracing can be configured for the Client, Server and Listener

TRACE_FILELEN_LISTENER=10240
TRACE_FILENO_LISTENER=10

TRACE_FILENO_{CLIENT/SERVER/listener_name}: When this parameter is set it will determine the maximum number of trace files that will be generated. Trace data will continue to generate but in a cyclic fashion. The maximum size of each file is set by configuring the appropriate TRACE_FILELEN parameter.

TRACE_FILELEN_{CLIENT/SERVER/listener_name}: When set determines the maximum size of each trace file before either starting a new trace file (if the TRACE_FILENO parameter is set) or will overwrites an existing file (cyclic tracing). Once the last trace file has been written, up to the size specified by TRACE_FILELEN_<CLIENT/SERVER/listener_name>, tracing will continue by overwriting the first trace file in the cycle.

Tracing Kerberos

Kerberos can be traced using the following parameters in the SQLNET.ORA file.
See Note 185897.1 Kerberos Troubleshooting Guide for more details.

TRACE_LEVEL_OKINIT=SUPPORT
TRACE_DIRECTORY_OKINIT=<VALID_DIRECTORY>
TRACE_FILE_OKINIT=<VALID_FILE_NAME>

 

Tracing Externel Procedures (Extproc)

External procedure can be traced using the following parameter in the SQLNET.ORA file, where extproc is setup.

TRACE_LEVEL_AGENT= SUPPORT
TRACE_DIRECTORY_AGENT = <VALID_DIRECTORY_PATH>
TRACE_TIMESTAMP_AGENT =ON

 

Tnsping Trace

TNSPING.TRACE_LEVEL = SUPPORT
TNSPING.TRACE_DIRECTORY = <path where tnsping.trc will go>

 

Enabling Dynamic Listener Tracing

There might be situations when you cannot reload or restart the listener to enable the traces.Such times, you can fmake use of the dynamic tracing for the listener .

LSNRCTL> set current_listener listener
Current Listener is listener
LSNRCTL> set trc_level 16
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER parameter "trc_level" set to support
The command completed successfully
LSNRCTL> set trc_directory /home/oracle
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER parameter "trc_directory" set to /home/oracle
The command completed successfully
LSNRCTL> exit

This would create sqlnet support level tracing on the listener under directory /home/oracle with name listener.trc

Only for 11g

11g has new method of tracing Automatic Diagnostic Repository(ADR)  which will create the trace to the ADR directories
Reference
Note 454927.1Using and Disabling the Automatic Diagnostic Repository (ADR) with Oracle Net for 11g

To create the trace files to the desired directory,
For client tracing & server tracing add:
DIAG_ADR_ENABLED= OFF


For listener tracing:
DIAG_ADR_ENABLED_<listener name> = OFF


Unable to start Net Manager

If Net Manager is unable to start, check that you are able to launch other GUI tools from the same server, where Net Manager is failing. For example the X-Window utility "xclock". If you are not able to launch xclock, you need to address this problem, before attempting to start Oracle GUI tools. If xclock works, check other Oracle GUI tools, like DBCA before reporting the issue to Oracle Support.

References

NOTE:374116.1 - How to Match Oracle Net Client and Server Trace Files
NOTE:454927.1 - Using and Disabling the Automatic Diagnostic Repository (ADR) with Oracle Net for 11g


반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 21:32
반응형

Applies to:

Oracle Application Object Library
Information in this document applies to any platform.
Checked for relevance on 15-Feb-2010

Solution

  • goal: How to Run and View a Listener Trace
  • fact: Oracle Application Object Library 10.7 G
  • fact: Oracle Application Object Library 10.7 N
  • fact: Oracle Application Object Library 11

fix: 1. Set the parameter TRACE_LEVEL_LISTENER=16 in the listener.ora file and bounce the listener to start the trace. (or) At the command line type: lsnrctl trace 16 Note: This level of tracing generates a huge amount of information in the trace file. Turn it off once enough trace information is obtained. 2. Attempt to view a report from the client. This will produce a very detailed trace file of the connection attempt. 3. Set the parameter TRACE_LEVEL_LISTENER=0 in the listener.ora file and bounce the listener to stop the trace. (or) At the command line type: lsnrctl trace OFF 4. Navigate to $ORACLE_HOME/bin 5. Run trcasst Ex: trcasst filename.trc>[outfile] Where [outfile] is the name of the output file to contain the modified trace file. 6. View the output file Reference: See the Net8 Administrator's manual for details on Trace Assistant.
반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 21:28
반응형

In this Document
  Goal
  Solution
  References


Applies to:

Oracle Net Services - Version: 9.2.0.1.0 to 11.2.0.1 - Release: 9.2 to 11.2
Information in this document applies to any platform.
Checked for relevance on 7-MAY-2010.

Goal

When attempting to limit the amount of trace generated when TNS listener tracing is enabled, cyclic tracing can be used. But to further restrict the amount of time TNS listener is enabled, tracing can be enabled and disabled, dynamically via the listener control, lsnrctl. This means tracing can be enabled only when required and allows the listener to run without generating trace until required.But when enabled via this method, Time stamp is missing from the trace file(s) generated. The following steps explain how to workaround the time stamp restriction.

Solution

To workaround the current restriction of TNS listener tracing enabled via listener control, generating traces without time stamp, the following steps can be used. Steps 1 to 3 must be inplaced before attempting to capture any problem with tracing.

 1. Backup the Listener.ora file, add the following parameters and save.

TRACE_LEVEL_<Listener Name> = 16
TRACE_FILE_<Listener Name> = <File Name>
TRACE_DIRECTORY_<Listener Name> = <Directory>
TRACE_TIMESTAMP_<Listener Name> = ON
TRACE_FILELEN_<Listener Name> = <Size in KiloBytes>
TRACE_FILENO_<Listener Name> = <Number>

TRACE_FILELEN and TRACE_FILENO are the cyclic tracing parameters.Ensure values used for these are large enough to capture information,without being overwritten.

2.Reload the listener, with command lsnrctl reload <listener name>

$lsnrctl reload 

LSNRCTL for Solaris: Version 10.2.0.4.0 - Production on 21-NOV-2008 13:32:15 
Copyright (c) 1991, 2004, Oracle.  All rights reserved. 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=sample.com)(PORT=1521))) 
The command completed successfully

With a default listener name of "listener", the listener name can be excluded from commands.

3. Stop listener tracing, run set trc_level 0 via lsnrctl

LSNRCTL> set trc_level 0 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=sample.com)(PORT=1521))) 
listener parameter "trc_level" set to off 
The command completed successfully

If your listener name is not using the default name of "listener", then run command "set current_listener <listener name> in listener control, before any other command is used.

4. When TNS listener tracing is required to be re-enabled. run set trc_level 16 via lsnrctl.

LSNRCTL> set trc_level 16  
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=sample.com)(PORT=1521)))  
listener parameter "trc_level" set to support  
The command completed successfully

 
5. Turn off tracing as soon as the problem / issue has been captured with command "set trc_level 0" from lsnrctl. (As per step 3). Remove or hash out the parameters added to the listener.ora in Step 1 and run reload command. This will stop tracing from being re-activated.

Please note for 11g TNS listener tracing, ADR tracing can be disabled by following Note 454927.1Using and Disabling the Automatic Diagnostic Repository (ADR) with Oracle Net for 11g
반응형
Posted by [PineTree]
ORACLE/ADMIN2011. 5. 2. 16:58
반응형

Problem Description:
====================

Established connections fails intermittently.
ORA-03113 is returned to the application.
Application must start a new connection.
No errors are generated in the database alert.log


Solution Description:
=====================

WARNING: Misuse of Windows Registry/Editor can cause serious problems
that may require you to reinstall your operating system.

Increase TcpMaxDataRetransmissions parameter in the Windows Registry


On Windows NT/2000:

HKEY_LOCAL_MACHINE
      SYSTEM       
        CurrentControlSet         
          Services
            Tcpip             
             Parameters

add a key "TcpMaxDataRetransmissions" (REG_DWORD) and set its value to 10.

On the Edit menu, click Add Value, and then add the following:

      Value Name: TcpMaxDataRetransmissions
      Value Type: REG_DWORD - Number
      Valid Range: 0 - 0xFFFFFFFF
      Default Value: 5 Decimal
      New Value: 10 to  Decimal

The value may vary as it is dynamically adjusted.


On Windows 95:

HKEY_LOCAL_MACHINE     
   SYSTEM       
     CurrentControlSet         
       Services
            Winsock

add a key "TcpMaxDataRetransmissions" and set its value to 10.


On the Edit menu, click Add Value, and then add the following:

      Value Name: TcpMaxDataRetransmissions
      Value Type: REG_DWORD - Number
      Valid Range: 0 - 0xFFFFFFFF
      Default Value: 5 Decimal
      New Value: 10  Decimal


Solution Explanation:
=====================

When a TCP/IP packet in a given segment is sent, the sender waits for
an acknowledgment (ACK) before proceeding to the next task.
If an ACK is not received before the retransmission timer expires,
the retransmission timer value is doubled and the packet is resent.

A retrans counter is incremented and the process repeats until the
retrans counter is equal to the TcpMaxDataRetransmissions value.
At this point the network is considered timed-out and the socket is
closed by the Microsoft Operating System.

Oracle SQLNET/Net8 tracing reports this as an ntt2err: SOC xxxx error
as well as several other Oracle network transport errors (NT).

The default initial retransmission timer value is set to 3 seconds.
This value is doubled and it's product is doubled on the next iteration,
until iterations equal the TcpMaxDataRetransmissions value.

The total time to time-out with default values looks like this:
  (3s)+(3s*2)+(3s*4)+(3s*8)+(3s*16)=93s OR 1 min 33 seconds to network
  time-out.

If you see something very close to the following a SQL*Net/Net8 trace
file, the most likely cause would be that the network has timed-out.
The ntt2err SOC error is the real error - all errors appearing after
this are secondary or misleading errors.

ntt2err: soc xxxxx error - operation=5, ntresnt[0]=517, ntresnt[1]=54,
ntresnt[2]=0

Following the appearance of ntt2err, you may expect to see something like the
following. These errors are symptoms of the real problem.  They tell us
that the network or server went down when it only timed-out.

-<ERROR>- osnqrc:  wanted 1 got 0, type 0
-<ERROR>- osnqper:  error from osnqrc
-<ERROR>- osnqper:    nr err code: 0
-<ERROR>- osnqper:    ns main err code: 12547
-<ERROR>- osnqper:    ns (2)  err code: 12560
-<ERROR>- osnqper:    nt main err code: 517
-<ERROR>- osnqper:    nt (2)  err code: 54
-<ERROR>- osnqper:    nt OS   err code: 0
osnqer: entry
osnqer:  incoming err = 12151


A little later on you may see the following:
osnqer:  returning err = 3113

A clear sign of a busy network is:
ntt2err: soc xxxx - operation=5,
ntresnt[0]=517, ntresnt[1]=54, ntresnt[2]=0
This is very consistent.
       
The ORA-3113 tells us:
Text:   end-of-file on communication channel

You can look in the trace but it will be impossible to find an EOF
packet dump. You must review the second meaning of ora-3113 - the
network or server machine went down. Once again somewhat misleading
as the network was only busy and the server is really fine.


Note:
Increasing TcpMaxDataRetransmissions is only a workaround for a bad or over
stressed network. The workaround should gives you some time to address the
underlying network issues.
It is time to get out the network sniffer and measure the degree of the
problem.
It is more than likely that it will get worse over time, particularly with
increased network usage.


Search words:

Tcp Max Data Retransmissions
TcpMaxDataRetransmissions
3113
ntt2err: soc error -
operation=5, ntresnt[0]=517,
ntresnt[1]=54, ntresnt[2]=0


ref: {4950.102}    BUG-561277

반응형
Posted by [PineTree]