转载

使用oradebug dump hanganalyze 分析oracle hang系列三

背景

   db hang时,分析下oradebug dump hanganalyze 产生的trace file第一部分内容与未HANG的区别,进一步理解其内容构成。


结论

1,DB 未HANG时,第一部分显示如下:


Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <='latch: cache buffers chains'
     Chain 1 Signature Hash: 0xccebf225
 [b] Chain 2 Signature: <='buffer busy waits'
     Chain 2 Signature Hash: 0x9a13abed
 [c] Chain 3 Signature: <='latch: cache buffers chains'
     Chain 3 Signature Hash: 0xccebf225




2,DB HAGN时,第一部分显示如下
Chains most likely to have caused the hang:


*** 2015-11-06 03:42:00.529
 [a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
     Chain 1 Signature Hash: 0xcfe404d3
 [b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
     Chain 2 Signature Hash: 0xcfe404d3
 [c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
     Chain 3 Signature Hash: 0xcfe404d3


3,可见二者区别在于未HANG有not in a wait的字样,而HANG却是具体的等待事件
4, 可以基于第一部分的   Chain 1 Signature Hash: 0xcfe404d3直接定位到发生HANG的chain
   然后再查找       is blocked by,即可找到持锁会话,进行针对性分析即可




测试



1,数据库未HANG时产生的oracle dump hanganalyze 3的trace file第一部分


 
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <='latch: cache buffers chains'
     Chain 1 Signature Hash: 0xccebf225
 [b] Chain 2 Signature: <='buffer busy waits'
     Chain 2 Signature Hash: 0x9a13abed
 [c] Chain 3 Signature: <='latch: cache buffers chains'
     Chain 3 Signature Hash: 0xccebf225




2,模拟DB HANG,分析TRACE FILE第一部分




--oradebug session1
SQL> select pname,pid,spid,addr from v$process where pname in ('LGWR','PMON');


PNAME             PID SPID                                             ADDR
---------- ---------- ------------------------------------------------ ----------------
PMON                2 3856                                             00000000DD59FC40
LGWR               11 3876                                             00000000DD5A8E80


SQL> select pid,spid,addr from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));


       PID SPID                                             ADDR
---------- ------------------------------------------------ ----------------
        19 3952                                             00000000DD5B1080




SQL> select sid from v$session where sid=(select sid from v$mystat where rownum=1);


       SID
----------
        17




--session 2
SQL> select sid from v$session where sid=(select sid from v$mystat where rownum=1);


       SID
----------
       183


SQL> select pid,spid,addr from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));


       PID SPID                                             ADDR
---------- ------------------------------------------------ ----------------
        35 6845                                             00000000DD5C1480




SQL> create table t_db_hang(a int);


Table created.




--session1
SQL> oradebug setospid 3876
Oracle pid: 11, Unix process pid: 3876, image: oracle@seconary (LGWR)
SQL> oradebug suspend
Statement processed.


--session2 


---insert hang
SQL> insert into t_db_hang values(1);


-- session 3


--可见session 3不能产生新的会话,数据库已经HANG住
[oracle@seconary ~]$ sqlplus scott/system


SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:37:28 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.


--session1 
--生成一个oradebug dump hanganalyze 3




SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump hanganalyze 3
Statement processed.
SQL> oradebug tracefile_name
/oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_3952.trc


--分析trace file


第一部分
Chains most likely to have caused the hang:


*** 2015-11-06 03:42:00.529
 [a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
     Chain 1 Signature Hash: 0xcfe404d3
 [b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
     Chain 2 Signature Hash: 0xcfe404d3
 [c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
     Chain 3 Signature Hash: 0xcfe404d3


基于等待会话183查找,发现如下信息,可见等待会话11,如果发现DB HANG了,也可以查找is blocked by,即可找到持锁会话,进行针对性分析即可
-------------------------------------------------------------------------------
Chain 80:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 6845
              process id: 35, oracle@seconary (TNS V1-V3)
              session id: 183
        session serial #: 183
    }
    is waiting for 'log file sync' with wait info:  --发现其在等待LGWR,其实这个和上面ORADEBUG 模拟LGWR被阻塞也有关
    {
                      p1: 'buffer#'=0x22f7
                      p2: 'sync scn'=0x4705d0e
            time in wait: 1 min 43 sec (last interval)
            time in wait: 5 min 14 sec (total)
           timeout after: never
                 wait id: 241
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.000000 sec
              1.       event: 'latch free'
                 time waited: 0.004440 sec
                     wait id: 242             p1: 'address'=0xdf0e9298
                                              p2: 'number'=0x2
                                              p3: 'tries'=0x0
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'log file sync'
                 time waited: 3 min 30 sec
                     wait id: 241             p1: 'buffer#'=0x22f7
                                              p2: 'sync scn'=0x4705d0e
              * time between wait #2 and #3: 0.120403 sec
              3.       event: 'db file sequential read'
                 time waited: 0.014195 sec
                     wait id: 240             p1: 'file#'=0x1
                                              p2: 'block#'=0xaad
                                              p3: 'blocks'=0x1
    }
    and is blocked by 'instance: 1, os id: 3876, session id: 11',
    which is a member of 'Chain 1'.




    看下等待会话11是什么,可见会话11就是后台进程LGWR


        and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (guowang.guowang)
                   os id: 3876
              process id: 11, oracle@seconary (LGWR)
              session id: 11
        session serial #: 1


--既然lgwr是阻塞会话,恢复它的功能即可
--session1
[oracle@seconary ~]$ sqlplus -prelim '/as sysdba'


SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:51:57 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.


SQL> 


SQL> oradebug setospid 3876
Oracle pid: 11, Unix process pid: 3876, image: oracle@seconary (LGWR)
SQL> oradebug resume
Statement processed.


---session 2
--hang dml执行成功
SQL> insert into t_db_hang values(1);


1 row created.


--session 3也可以新生成会话
[oracle@seconary ~]$ sqlplus scott/system


SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:58:36 2015


Copyright (c) 1982, 2009, Oracle.  All rights reserved.




Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


SQL> 




再学习下另一个知识


Chains most likely to have caused the hang:


*** 2015-11-06 03:42:00.529
 [a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
     Chain 1 Signature Hash: 0xcfe404d3
 [b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
     Chain 2 Signature Hash: 0xcfe404d3
 [c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
     Chain 3 Signature Hash: 0xcfe404d3




 经过分析,这就是发生HANG的相关chain信息,可以由具体的chain signature hash对应值,直接定们到具体的chain即可    
正文到此结束
Loading...