ORACLE/TroubleShooting2011. 5. 2. 22:10
반응형

An ORA-3113 "end of file on communication channel" error is a general error usually reported by a client process connected to an Oracle database. The error basically means 'I cannot communicate with the Oracle shadow process'. As it is such a general error more information must be collected to help determine what has happened - this error by itself does not indicate the cause of the problem. For example, ORA-3113 could be signalled for any of these scenarios:

    Server machine crashed
    Your server process was killed at O/S level
    Network problems
    Oracle internal errors / aborts on the server
    Client incorrectly handling multiple connections
    etc.. etc.. etc.. - a lot of possible causes !!

This short article describes what information to collect for an ORA-3113 error. It is common for this error to be accompanied by other errors such as:

    ORA-1041 internal error. hostdef extension doesn't exist
    ORA-3114 not connected to ORACLE
    ORA-1012 not logged on

These are all symptomatic of being disconnected from Oracle.

Please collect as much information as possible from the items listed below and submit this information to Oracle support. Where a step produces an output file / trace file this may be needed by Oracle Support to help determine the cause of the error. The more information you can present in one go the better your chance of a speedy solution. Note that some sections may not be applicable.

If viewing this article on a Web Browser you can follow the links, otherwise manually go to the relevant section.

 
What Scenario does the ORA-3113 occur in ?

  A. When attempting to startup Oracle ?                    -> Section A
  B. When attempting to make a connection to the database ? -> Section B
  C. Client gets the error running SQL / PLSQL ?            -> Section C
  D. Server trace file reports ORA-3113 ?                   -> Section D

You may find it useful to scan the checklist in Section E at the end of this article. This covers some questions / issues relevant to all problem sections.

NOTE: References to SVRMGR are outdated with Oracle9i and Oracle 10g. If you are on an earlier version of the database, you will use 'svrmgrl' instead of SQL*Plus to connect as the internal user.

e.g. svrmgrl
SVRMGR> connect / as sysdba (or connect internal)
(A) ORA-3113 when attempting to STARTUP Oracle

  
  There are several phases involved in starting up a database. If ORA-3113
  occurs during startup then abort the instance and start up using the
  sequence below. If an error occurs at any step then see the related notes
  below.

    a. Start any required services.        On error see A1
       Eg: On NT start the OracleServiceSID

    b. Connect as a SYSDBA user.          On error see A1
       Eg:  sqlplus /nolog
        SQL> connect / as sysdba

    c. Startup nomount.              On error see A1
       Eg:
        SQL> startup nomount

    d. Mount the database.             On error see A1 and A2
       Eg:
        SQL> alter database mount;

    e. Recover the database         On error see A3
       Eg:
        SQL> recover database

    f. Open the database             On error see A4
       Eg:    
        SQL> alter database open;

    g. Wait 3 minutes then issue a select.   On error see A4
       Eg:    
        SQL> select count(*) from DBA_OBJECTS;



A1) Errors connecting as SYSDBA / Internal OR on startup nomount
   
    There is something fundamental wrong with the software / environment
    if you cannot connect to Server Manager as a DBA user.
    The steps here cover errors such as ORA-3113, ORA-12547: TNS:lost contact
    or similar errors connecting to Oracle or starting the instance NOMOUNT. 
    Check the following items:

    A1.1)    If possible reboot the server disabling any automatic
        startup of Oracle before you do so. This may seem drastic
        but helps make sure you are working from a consistent
        starting point.

    A1.2)     Check your environment points at the expected ORACLE_HOME
        and ORACLE_SID and that TWO_TASK is not set (Unix) or
        LOCAL is not set (NT registry).
                Check the USER_DUMP_DEST and BACKGROUND_DUMP_DEST and default
                trace directories under this environment for any user trace
                files or alert log entries generated. These may help indicate
                the cause of the problem.
                Eg: ORA-600[SKGMINVALID] may indicate a problem with the
                    shared memory Unix parameters on Unix systems.
                Try to show that any trace file / alert log entry you
                find is truely related to the "CONNECT" command by re-issuing
                the "connect" and checking for a new trace file / alert entry
                at the time of the error.

        A1.3)   Unix only:
        Some Unix platforms need LD_LIBRARY_PATH to be set
                correctly to resolve any dynamically linked libraries.
                As the user with the problem:

                        % script /tmp/ldd.out
                        % id
                        % cd $ORACLE_HOME/bin
                        % ldd oracle
                        % exit

                If the 'ldd' command does not exist go to the next step below.
                Check that all lines listed show a full library file. If there
                are any 'not found' lines reported contact Oracle support
                with the output of /tmp/ldd.out .


        A1.4)   Unix only:
        Your 'oracle' executable may be corrupt. Relink it thus:

                        Log in as the 'oracle' user.
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
            % mv $ORACLE_HOME/bin/oracle $ORACLE_HOME/bin/oracle.dd.mon.yy
                        % rm -f ./oracle       
                        % make -f ins_rdbms.mk ioracle
                        % exit

        Prior to Oracle7.3 the relink command was:
            make -f oracle.mk ioracle

                If this reports any errors Oracle support will need to see
                the contents of the file /tmp/relink.out .

   
    A1.5)   Have you installed the Parallel Server Option ?
                ORA-3113 can occur if you have installed the Parallel
        Server Option but do NOT have a Distributed Lock Manager
        installed or running correctly.

                Unix:
          If the Parallel Server Option was installed by accident
          then it can be de-installed by relinking.
          Eg:
                        Shut down any Oracle instances
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
                        # 'oracle' should not exist so delete it if it present
                        % rm -f oracle
                        % make -f ins_rdbms.mk no_parropt ioracle
                        % exit

          NB: Do NOT deinstall the Parallel Server Option if the
              database is using Parallel Server unless both nodes
              are shut down otherwise database corruption could occur.

    A1.6)   If the error is on STARTUP NOMOUNT:

            Check the init.ora file used to start the database.
            This provides the configuration details used
            configure the instance. To help isolate the problem
            it may be useful use a very basic init.ora file
            when starting the instance. If this works then
            parameters can be increased / introduced one at a
            time to see if there is a problem with a particular
            setting.

    A1.7)     Check for server side trace files which may give more
        indication what the underlying problem is.
        See section C for details on how to check
        for server trace files.
       

    A1.8)     Ensure there is free disk space in:

          a. Your USER_DUMP_DEST and BACKGROUND_DUMP_DEST locations

          b. Your AUDIT destination (Unix)
            The default is $ORACLE_HOME/rdbms/audit

          c. Your Oracle Trace directory if Oracle Trace is enabled
            See Note:45482.1

        A1.9) Windows 2000 - Only
              If the Server's sqlnet.ora file contains Authnetication services
              which are NOT reachable by Oracle, then an ORA-3113 error will
              result.
              For example, if the sqlnet.ora file contains the parameter:
              SQLNET.AUTHENTICATION_SERVICES = (NTS) and the Oracle database
              is moved from a Windows NT Domain to an Active Directory one,
              or if a Domain Controller is introduced, then an error will
              result trying to start the database.
              Remove the sqlnet.authentication_services line so that Oracle
              does not look for a non-existent KDC (Kerberos Domain Controller).


           
A2) Errors Mounting the database

    Check all the items in A1 first.

    If an error occurs when mounting the database there may be problems
    with the control-files or data files, or with resources required to
    open these files.

    A2.1)     The location of the control files are specified in the
        init.ora file.  Try mounting using each controlfile in
        turn.
        eg: "Shutdown abort",
            edit the init.ora to refer to ONE of the controlfiles only,
            "startup nomount",
            "alter database mount"
        Repeat for each controlfile to see if any controlfile works.
       
    A2.2)    It is possible to re-create the controlfiles if you know the
        location of all datafiles and online logs, or to restore an old
        backup controlfile. Always back up the current controlfiles before
        restoring any backup copies or issuing a CREATE CONTROLFILE
        command.
        The steps for this are not documented here.

        A2.3)   Unix:
                Some unix platforms have a 'truss' command (or 'tusc').
        If available this can be used to help trace how far Oracle
        gets before the error occurs.
                Eg: 
                        % truss -o /tmp/truss.out -f svrmgrl

                Keep the file /tmp/truss.out safe - Oracle Support MAY need to see it.


A3) Errors on RECOVER DATABASE

      ORA-3113 during recover database is often related to a corruption on the
    database or redo stream which causes the shadow process to die. There should
    be a server side trace file produced for this sort of problem.
    See Section C for details on how to locate any trace files
    from both USER_DUMP_DEST and BACKGROUND_DUMP_DEST.

    A3.1)    If the "recover database" fails fairly quickly then it
        may help to collect the redo up to the point of failure as this
        may help identify where the problem is.

        Use the following commands just prior to the RECOVER DATABASE
        command:

          SQL> alter session set max_dump_file_size=unlimited;
          SQL> alter session set events
          2> '10228 trace name context forever, level 10';
          SQL> RECOVER DATABASE
       
        This causes redo information to be written to the user trace
        file. The last items of redo may help determine which file
        has problems.

    A3.2)    If you do not have many datafiles in the database it may be
        as quick to recover each file in turn to see if this narrows
        down the problem.
        Eg:
          SQL> select name from v$datafile;
       
        and then for each file:

          SQL> RECOVER DATAFILE 'full_file_name'

        If this gets to a problem file then back up the file and
        use standard recovery options as if the file was lost.
       

A4) Errors on ALTER DATABASE OPEN

    Database open performs very many operations and so it is necessary
    to collect any trace information before determining the next steps.
    However, the following may help isolate the problem more quickly:

    A4.1)     Move files out of your USER_DUMP_DEST and BACKGROUND_DUMP_DEST
        directory as these steps will generate a lot of trace.

    A4.2)     Edit the init.ora file and add the lines:

            event="10046 trace name context forever, level 12"
            event="10015 trace name context forever, level 1"
            event="10228 trace name context forever, level 1"

        If you already have "EVENT=" lines in the init.ora file
        this MUST directly below the other "Event=" lines.
        These lines will trace:

            SQL and BIND activity during startup
            REDO applied
            Information about transactional rollback required

    A4.3)     Startup the instance as described at the top of this section.
       
        As soon as the error occurs REMOVE the above events from the init.ora
        file and shutdown. Collect together the trace files and alert logs
        as described in Section C

       


(B) ORA-3113 attempting to connect to the database

  
   Which SQL*Net layer are you using ?

    a. SQL*Net2 or Net8  -> Goto B1
       This is the default for Oracle 7.0.15 onwards

    b. SQL*Net 1         -> Goto B2
       Note: This can only be used with Oracle releases before 7.3


B1) Connecting to Oracle using SQL*Net V2 / Net8

   Net8 or SQL*Net2 should report network related errors if a problem
   occurs whilst establishing a connection to a remote Oracle shadow
   process. An ORA-3113 implies that the connection itself has been established
   but then is lost. As such follow the steps in Section C

B2) Connecting to Oracle using SQL*Net V1

   This section is included for historical reasons.


   Is the tool you are trying to connect from

    a. A local connection ?    -> Goto BL
       ie: The client is on the same machine as the database and you are
        not trying to connect over a network connection.

    b. A remote connection ? -> Goto BR


   BL) Local SQL*Net V1 Connections

      For local connections check the following:

        BL1)    Have you installed the Parallel Server Option ?
                ORA-3113 will occur if you have installed the Parallel
                Server Option but do NOT have a Distributed Lock Manager
                installed or running.

                To deinstall the Parallel Server Option:

                        Shut down any Oracle instances
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
                        # 'oracle' should not exist so delete it if it present
                        % rm -f oracle
                        % make -f oracle.mk no_parropt ioracle
                        % exit

                If this reports any errors Oracle support will need to see
                the contents of the file /tmp/relink.out .


        BL2)    Try using the SQL*Net V1 driver for local connections:

                        setenv TWO_TASK P:

                Then try the client tool. If this now works you may have a
                problem with the default SQL*Net driver.


        BL3)    Your 'oracle' executable may be corrupt. Relink it thus:

                        Log in as the 'oracle' user.
                        % script /tmp/relink.out
                        % cd $ORACLE_HOME/rdbms/lib
                        % rm -f oracle       
                        % make -f oracle.mk ioracle
                        % exit

                If this reports any errors Oracle support will need to see
                the contents of the file /tmp/relink.out .


        BL4)    Some Unix platforms need LD_LIBRARY_PATH to be set
                correctly to resolve any dynamically linked libraries.
                As the user with the problem:

                        % script /tmp/ldd.out
                        % id
                        % cd $ORACLE_HOME/bin
                        % ldd oracle
                        % exit

                If the 'ldd' command does not exist go to the next step below.
                Check that all lines listed show a full library file. If there
                are any 'not found' lines reported contact Oracle support
                with the output of /tmp/ldd.out .


        BL5)    If you cannot connect as the Oracle user AND your system has
                the 'truss' command try the following when logged in as
                'oracle' (using the relevant client tool):
               
                        % truss -o /tmp/truss.out -f sqlplus user/password
                        Exit from sqlplus (or the problem tool)

                Keep the file /tmp/truss.out safe - Oracle MAY need to see it.


   BR) Remote SQL*Net v1 Connections

      For remote connections check the following:

        BR1)    Check if you can make LOCAL connections. If not then follow
                the steps above for LOCAL connection problems.


        BR2)    Which SQL*Net protocol are you using ?

                Table B.1
                ~~~~~~~~~
                SYMBOL     SQL*Net V1 Layer  Prefix  Debug       Listener
                ------     ----------------  ------  -----       --------
                osnptt      PIPE Two Task     P:     OSNTTD      none
                osnasy      ASYNC             A:     OSNADBUG    none
                osnttt      TCP/IP Two Task   T:     OSNTDBUG    orasrv
                osntlitcp   TCP/TLI Two Task  TT:    OSNTLIDBUG  tcptlisrv
                osntlispx   SPX Two Task      X:     OSNTLIDBUG  spxsrv
                osndnt      DECNET Two Task   D:     OSNDDBUG    none


                For the protocol you are using check that the 'oracle'
                executable has this linked in thus:

                        Log in as oracle on the server
                        % script /tmp/drivers.out
                        % cd $ORACLE_HOME/bin
                        % drivers oracle
                        % exit

                        Eg: If you are using TCP/IP it should list TCP/IP.

                If the 'drivers' command does not exist on your machine check
                the 'oracle' executable as below substituting the relevant
                symbol from Table C.1 for the word 'SYMBOL'. If you get *NO*
                output the driver is probably *NOT* installed.

                        % script /tmp/symbols.out
                        % cd $ORACLE_HOME/bin
                        % nm oracle | grep -i SYMBOL    # Use relevant SYMBOL
                        % exit

                        Eg: For SQL*Net TCP/IP you would use the command:

                                % nm oracle | grep -i osnttt

                If the required driver is not installed you should:

                        a) Relink Oracle (See step (B3) above).
                        b) Re-check the 'oracle' executable for the
                           relevant driver. If it is still missing then
                           the relevant SQL*Net driver has probably not
                           been installed. Reinstall the required SQL*Net
                           driver.


        BR3)    Check your /etc/oratab or /var/opt/oracle/oratab file
                is of the form:

                        # Comments begin with a HASH
                        SID:/path/to/oracle/home:N

                And confirm:
               
                        [ ] There are no blank lines.
                        [ ] The PATH to ORACLE_HOME is correct and contains
                             no environment variables.
                        [ ] There are no ':'s in the ORACLE_HOME path.
                        [ ] There is NOTHING at the end of the line.
                            The last character on a line should be Y or N.
                            There should NOT be a fourth field.


        BR4)    Unix only:
        If you have 'truss' available try to truss the Oracle
                connection. You will normally need 'root' privilege to do this
                and should use truss on the relevant listener process (see
                Table B1)
                Eg: For TCP/IP the listener is 'orasrv' so enter these commands
                    as 'root':

                        % truss -o /tmp/truss.out -f -eall -p <PID_of_orasrv>
               
                        Attempt the connection to reproduce the ORA-3113 then
                        interrupt this 'truss' session.

                This will produce a LOT of output so keep it but do not send it
                to Oracle initially - just make a note that it is available.


(C) Client sees ORA-3113 running SQL / PLSQL

  
   If the ORA-3113 error occurs AFTER you have connected to Oracle then
   it is most likely that the 'oracle' executable has terminated unexpectedly.

        C1)     Determine which database you were connected to and
                obtain the following init.ora parameter values:
       
                        Parameter               Default
                        ~~~~~~~~~               ~~~~~~~
                        USER_DUMP_DEST          $ORACLE_HOME/rdbms/log
                        BACKGROUND_DUMP_DEST    $ORACLE_HOME/rdbms/log
                        CORE_DUMP_DEST          $ORACLE_HOME/dbs

                Eg: To find these log into Server Manager and:

                        SQL> show parameter dump


        C2)     Check in your 'USER_DUMP_DEST' for any Oracle trace file.
                It is important to find the correct trace file.
        On Unix:
          Use the command 'ls -ltr' to list files in time order with the
                  latest trace files appearing LAST. The trace file will typically
          be of the form '<SID>_ora_<PID>.trc'.
        On NT:
          Click on the "Modified" column in Windows Explorer to sort the
          files by their modified date. Files will typically be of the form
          'ORA<PID>.TRC'.
       
        If you are not sure which trace file may be relevant MOVE all
        the current trace files to a different directory and reproduce
        the problem.

        C3)     Check in your 'BACKGROUND_DUMP_DEST' for your alert log and
        any other trace files produced close to the time of the error.
        On Unix this should be named 'alert_<SID>.log'.
        On NT this should be named '<SID>ALRT.LOG'

        C4)     Unix Only:
        If there is no trace file check for a 'core' dump in the
                CORE_DUMP_DEST. Check thus:

                        % cd $ORACLE_HOME/dbs   # Or your CORE_DUMP_DEST
                        % ls -l core*

                If there is a file called 'core' check its time matches the
                time of the problem. If there are directories called
                'core_<PID>' check for core files in each of these. It is
                IMPORTANT to get the correct core file. Now obtain a stack
                trace from this 'core' file. Check each of the sequences below
                for how to do this - one of these should work for your
                platform:


                    If you have dbx:
                        % script /tmp/core.stack
                        % dbx $ORACLE_HOME/bin/oracle core
                        (dbx) where
                        ...
                        (dbx) quit
                        % exit
                       
                    If you have sdb:
                        % script /tmp/core.stack
                        % sdb $ORACLE_HOME/bin/oracle core
                        * t
                        ...
                        * q
                        % exit
                       
                    If you have xdb:
                        % script /tmp/core.stack
                        % xdb $ORACLE_HOME/bin/oracle core
                        (xdb) t
                        ...
                        (xdb) q
                        % exit


                    If you have adb:
                        % script /tmp/core.stack
                        % adb $ORACLE_HOME/bin/oracle core
                        $c
                        ...
                        $q
                        % exit


        C5)     Try to isolate the SQL command that is executing when
                the error occurs. Eg: Is it a particular SQL statement
                or PL/SQL block that causes the error ?
        In many cases this will be listed in the trace file
        produced under the heading "Current SQL statement", or
         near the middle of the trace file under the cursor referred
        to by the "Current cursor NN" line.

        If the trace does not show the failing statement then
        then SQL_TRACE may be used to help determine this provided
          the problem reproduces. SQL_TRACE can be enabled in most
        client tools:

                Eg: Product     Action
                    ~~~~~~~     ~~~~~~
                    SQL*Plus    Issue 'ALTER SESSION SET SQL_TRACE TRUE;'
                    Pro*        EXEC SQL ALTER SESSION SET SQL_TRACE TRUE;

                This should force a server side SQL trace file as detailed
                in C2 above. The trace file should give a clue as to what
                SQL was being executed.


        C6)     If no trace file can be found and the problem is reproducible
        then SQL*Net trace may help to show what the latest operation sent
        to the 'oracle' process was.
                For SQL*Net V2 / V8 tracing see the article Note:16564.1
   
        Historical note:
                     For SQL*Net V1 check which SQL*Net protocol you are using
                    and note the 'Debug' environment variable from table B1 below.
                    Then catch SQL*Net trace from the client. Eg: For SQL*Net
                    TCP/IP and sqlplus:
   
                            % setenv OSNTDBUG -1    # Use correct OSN*DBUG variable
                            % sh
                            % sqlplus scott/tiger@t:hostname:sid 2>/tmp/net1.out
       

        C7)     Based on information collected above try to determine a small
                test case which will reproduce the problem. This is important
                for two reasons:

                        a) It gives Oracle support a small test case if the
                           problem does not look like a known problem.
               
                        b) It gives you a simple way to check if any patch
                           supplied will fix the problem.



    C8)    If a statement can be isolated which consistently raised an
        ORA-3113 error then it is worth spending some time collecting
        additional information such as:
           
            - An execution plan for the statement
            - Table definitions, column definitions
            - Information on constraints, triggers etc..

        ie: Any additional information about the statement which fails.
            eg: If a SELECT fails then it may succeed if run under a
            different optimizer mode.



    C9)     Check if your server administrator has any scripts which abort
        long running or CPU intensive processes. An ORA-3113 process
        can occur if someone kills your Oracle shadow process at O/S
        level (Eg: kill -9 on Unix).


(D) Server trace reports ORA-3113

    D1)     It is unusual for a server trace to report ORA-3113.
        However, this can occur if the server loses contact with
        the client OR a database link connection.
        Treat this the same as an ORA-3113 in a client process
        and follow the steps in Section C.

    D2)     The following line may be added to the init.ora file
        to help collect maximum information when the error occurs:

            event="3113 trace name errorstack level 3"

        If you already have "EVENT=" lines in the init.ora file
        this MUST directly below the other "Event=" lines.

(E) Additional Checks / Information

        E1)     Is it just this one tool that encounters the error or
                do you get ORA-3113 from any tool doing a similar operation ?
                If the problem reproduces in SQL*Plus use this in all tests
                below.

        E2)     Unix only:
        Check if the problem is just restricted to:

                        [ ] One particular UNIX user,
                        [ ] Any UNIX user
                    or  [ ] Any UNIX user EXCEPT as the Oracle user.


        E3)     Check if the problem is just restricted to:

                        [ ] One particular ORACLE logon
                    or  [ ] Any ORACLE logon that has access to the
                                relevant tables.

        E4)     If this is a client-server set up does this occur from:

                        [ ] Any client
                        [ ] Just one particular client
                    or  [ ] Just one group of clients ?
                            If so what do these clients have in common ?
                            Eg: Software release .
               
        E5)     Do you have a second server or database version where the
                same operation works correctly ?


    E6)     Ensure there is free disk space in:

          a. Your USER_DUMP_DEST and BACKGROUND_DUMP_DEST locations

          b. Your AUDIT destination (Unix)
            The default is $ORACLE_HOME/rdbms/audit

          c. Your Oracle Trace directory if Oracle Trace is enabled
            See Note:45482.1

반응형

'ORACLE > TroubleShooting' 카테고리의 다른 글

ora 4031 에러 처리 방안  (0) 2011.06.02
ORA-4031 에러 만들기  (0) 2011.06.02
DIAGNOSING ORA-3113 ERRORS [ID 1020463.6]  (0) 2011.05.02
Thread 1 cannot allocate new log, sequence  (0) 2011.03.25
CHECKPOINT NOT COMPLETE  (0) 2011.01.30
Posted by [PineTree]