转载

select for update语句造成ORA-00060 deadlock死锁问题分析

测试环境出现问题

今天,收到项目组通知说发生了死锁,让我查一下死锁出现的原因

首先,登录数据库,查看trace日志所在路径

  1. SYS@cams>show parameter dump;
  2. NAME TYPE VALUE
  3. ------------------------------------ ----------- ------------------------------
  4. background_core_dump string partial
  5. background_dump_dest string /u01/app/oracle/diag/rdbms/cams/cams/trace
  6. core_dump_dest string /u01/app/oracle/diag/rdbms/cams/cams/cdump
  7. max_dump_file_size string unlimited
  8. shadow_core_dump string PARTIAL
  9. user_dump_dest string /u01/app/oracle/diag/rdbms/cams/cams/trace

查看alert日志,搜索00060

  1. [oracle@db trace]$ cd /u01/app/oracle/diag/rdbms/cams/cams/trace/
  2. [oracle@db trace]$ vi alert_cams.log

找到报错trace日志如下:

  1. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/cams/cams/trace/cams_ora_12850.trc.
  2. Tue Sep 12 13:28:31 2017
  3. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/cams/cams/trace/cams_ora_12854.trc.
  4. Tue Sep 12 13:28:36 2017
  5. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/cams/cams/trace/cams_ora_12822.trc.
  6. Tue Sep 12 13:31:12 2017

打开其中一个trace文件,部分内容如下

  1. Trace file /u01/app/oracle/diag/rdbms/cams/cams/trace/cams_ora_12850.trc
  2. Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
  3. With the Partitioning, OLAP, Data Mining and Real Application Testing options
  4. ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
  5. System name: Linux
  6. Node name: db
  7. Release: 2.6.32-431.el6.x86_64
  8. Version: #1 SMP Sun Nov 10 22:19:54 EST 2013
  9. Machine: x86_64
  10. VM name: VMWare Version: 6
  11. Instance name: cams
  12. Redo thread mounted by this instance: 1
  13. Oracle process number: 293
  14. Unix process pid: 12850, image: oracle@db
  15. *** 2017-09-12 13:28:28.401
  16. *** SESSION ID:(1139.47765) 2017-09-12 13:28:28.401
  17. *** CLIENT ID:() 2017-09-12 13:28:28.401
  18. *** SERVICE NAME:(SYS$USERS) 2017-09-12 13:28:28.401
  19. *** MODULE NAME:(JDBC Thin Client) 2017-09-12 13:28:28.401
  20. *** ACTION NAME:() 2017-09-12 13:28:28.401
  21.  
  22.  
  23. *** 2017-09-12 13:28:28.401
  24. DEADLOCK DETECTED ( ORA-00060 )
  25.  
  26. [Transaction Deadlock]
  27.  这里指出死锁不是Oracle自身的错误,是用户的设计造成的问题
  28. The following deadlock is not an ORACLE error. It is a
  29. deadlock due to user error in the design of an application
  30. or from issuing incorrect ad-hoc SQL. The following
  31. information may aid in determining the deadlock:
  32.  
  33. 这个死锁图表明两个session各持有一份资源,等待对方持有的资源
  34. Deadlock graph:
  35.                        ---------Blocker(s)-------- ---------Waiter(s)---------
  36. Resource Name process session holds waits process session holds waits
  37. TX-00160018-00000adc 293 1139 X 384 65 X
  38. TX-0004000f-000064d6 384 65 X 293 1139 X
  39.  
  40. session 1139: DID 0001-0125-000074A5 session 65: DID 0001-0180-000037EC
  41. session 65: DID 0001-0180-000037EC session 1139: DID 0001-0125-000074A5
  42.  
  43. Rows waited on:
  44.   Session 1139: obj - rowid = 00021C7B - AAAhx7AAHAAARMrAAB
  45.   (dictionary objn - 138363, file - 7, block - 70443, slot - 1)
  46.   Session 65: obj - rowid = 00021C7B - AAAhx7AAHAAARMtAAD
  47.   (dictionary objn - 138363, file - 7, block - 70445, slot - 3)
  48.  
  49. ----- Information for the OTHER waiting sessions -----
  50. Session 65:
  51.   sid: 65 ser: 27869 audsid: 3026187 user: 111/CAMS_CORE
  52.     flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
  53.     flags2: (0x40009) -/-/INC
  54.   pid: 384 O/S info: user: oracle, term: UNKNOWN, ospid: 12854
  55.     image: oracle@db
  56.   client details:
  57.     O/S info: user: cams, term: unknown, ospid: 1234
  58.     machine: yy program: JDBC Thin Client
  59. application name: JDBC Thin Client, hash value=2546894660
  60.   current SQL:
  61. 导致死锁的SQL语句1
  62.   select
  63.  
  64. SEQ_NO, ACO_AC,AC_DTE,CCY,BAL,FRZ_AMT,
  65. CRE_TLR, CRE_DTE,
  66. CRE_BR, CRE_BK, UPD_TLR, UPD_DTE, UPD_BR, UPD_BK,TSL_AMT,VPS_BAL
  67.  
  68. FROM CD_ACBALANCE
  69.  WHERE ACO_AC = :1
  70. FOR UPDATE
  71.  
  72. ----- End of information for the OTHER waiting sessions -----
  73.  
  74. Information for THIS session:
  75.  
  76. ----- Current SQL Statement for this session (sql_id=cfy88pmyts0fn) -----
  77. 导致死锁的SQL语句2
  78. select
  79.  
  80. SEQ_NO, ACO_AC,AC_DTE,CCY,BAL,FRZ_AMT,
  81. CRE_TLR, CRE_DTE,
  82. CRE_BR, CRE_BK, UPD_TLR, UPD_DTE, UPD_BR, UPD_BK,TSL_AMT,VPS_BAL
  83.  
  84. FROM CD_ACBALANCE
  85.  WHERE ACO_AC = :1
  86. FOR UPDATE
  87. ===================================================

这里可以明确的是select for update语句造成了死锁。但是考虑到环境为测试环境,没开启归档,不能通过logminer挖掘故障期间的SQL事物,而且开发人员已经通过kill session的方式处理了死锁,所以需要在别的环境中重现该问题,并进行分析。

重现死锁问题

首先连上数据库,启用HR用户

  1. [oracle@prod ~]$ sqlplus / as sysdba
  2. SQL*Plus: Release 11.2.0.4.0 Production on Wed Sep 13 03:14:17 2017
  3. Copyright (c) 1982, 2013, Oracle. All rights reserved.
  4. Connected to an idle instance.
  5. SYS@PROD>startup;
  6. ORACLE instance started.
  7. Total System Global Area 1202556928 bytes
  8. Fixed Size 2252704 bytes
  9. Variable Size 771752032 bytes
  10. Database Buffers 419430400 bytes
  11. Redo Buffers 9121792 bytes
  12. Database mounted.
  13. Database opened.
  14. SYS@PROD>conn hr/hr
  15. ERROR:
  16. ORA-28000: the account is locked
  17. Warning: You are no longer connected to ORACLE.
  18. @>conn / as sysdba
  19. Connected.
  20. SYS@PROD>alter user hr identified by hr account unlock;
  21. User altered.
  22. SYS@PROD>conn hr/hr
  23. Connected.

选中COUNTRIES表作为测试对象

  1. HR@PROD>desc countries;
  2.  Name Null? Type
  3.  ----------------------------------------- -------- ----------------------------
  4.  COUNTRY_ID NOT NULL CHAR(2)
  5.  COUNTRY_NAME VARCHAR2(40)
  6.  REGION_ID NUMBER
  7. HR@PROD>select count(*) from countries;
  8.   COUNT(*)
  9. ----------
  10. 25
  11. HR@PROD>select count(distinct(COUNTRY_ID)) from countries;
  12. COUNT(DISTINCT(COUNTRY_ID))
  13. ---------------------------
  14.  25


经过验证,COUNTRIES表中的COUNTRY_ID字段是唯一的。

COUNTRIES表新建一个test表,在test表中做测试

  1. HR@PROD>create table test as select * from countries;
  2. Table created.
  3. HR@PROD>desc test
  4.  Name Null? Type
  5.  ----------------------------------------- -------- ----------------------------
  6.  COUNTRY_ID NOT NULL CHAR(2)
  7.  COUNTRY_NAME VARCHAR2(40)
  8.  REGION_ID NUMBER
  9. HR@PROD>select country_id,country_name,region_id from countries;
  10. CO COUNTRY_NAME REGION_ID
  11. -- ---------------------------------------- ----------
  12. AR Argentina 2
  13. AU Australia 3
  14. BE Belgium 1
  15. BR Brazil 2
  16. CA Canada 2
  17. CH Switzerland 1
  18. CN China 3
  19. DE Germany 1
  20. DK Denmark 1
  21. EG Egypt 4
  22. FR France 1
  23. CO COUNTRY_NAME REGION_ID
  24. -- ---------------------------------------- ----------
  25. IL Israel 4
  26. IN India 3
  27. IT Italy 1
  28. JP Japan 3
  29. KW Kuwait 4
  30. ML Malaysia 3
  31. MX Mexico 2
  32. NG Nigeria 4
  33. NL Netherlands 1
  34. SG Singapore 3
  35. UK United Kingdom 1
  36. CO COUNTRY_NAME REGION_ID
  37. -- ---------------------------------------- ----------
  38. US United States of America 2
  39. ZM Zambia 4
  40. ZW Zimbabwe 4
  41. 25 rows selected.


情形1

两个select for update语句想要同时锁定一条语句。

Session1

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. AU Australia 3


Sessions2

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;


这时,session2会一直处于锁等待状态,而不会出现死锁。当把第一个事物commit或者rollback之后,第二个事物会继续执行。

Session1

  1. HR@PROD>rollback;
  2. Rollback complete.


Session2

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. AU Australia


显然,两个select for update语句想要同时锁定一条语句,并不会出现死锁,而会出现锁等待的现象。

情形2

第一个select for update事物锁定A row后,想要再锁定B row;第二个select for update事物锁定B row后,想要锁定A row

Session1

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. AU Australia 3


Session 2

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='UK' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. UK United Kingdom 1


如果Session1想要继续持有Session2正在持有的row

Session1

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='UK' for update;


这时,会出现锁等待的现场,和上一场景类似。如果Session2也想持有Session1正在持有的row

Session2

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;


这时,Session2还处于等待状态,但是Session1出现死锁:

Session1

  1. HR@PROD>select country_id,country_name,region_id from test where country_id='UK' for update;
  2. select country_id,country_name,region_id from test where country_id='UK' for update
  3. *
  4. ERROR at line 1:
  5. ORA-00060: deadlock detected while waiting for resource


这时,死锁的现象已经重现,而且,场景2从头到尾只使用了一个select for update语句,只是换了参数而已。

定位死锁语句

通过检查数据库表,能够检查出是哪一条语句被死锁,产生死锁的机器是哪一台。

1)用dba用户执行以下语句

  1. select c.owner, c.object_name, c.object_type, b.sid,
  2. b.serial#, b.lockwait, b.status, b.osuser, b.machine, b.process, b.program
  3. from v$locked_object a ,
  4. v$session b,
  5. dba_objects c
  6. where b.sid = a.session_id
  7. and a.object_id = c.object_id;


select for update语句造成ORA-00060 deadlock死锁问题分析

如果有输出的结果,则说明有死锁,且能看到死锁的机器是哪一台。字段说明:

OWNER:死锁语句所用的数据库用户,这里是HR用户。

OBJECT_NAME:产生死锁的对象,这里是TEST表。

OBJECT_TYPE:产生死锁的对象类型,这里是TABLE

SIDSESSION标识,常用于连接其它

SERIAL#SID有可能会重复,当两个sessionSID重复时,SERIAL#用来区别session

LOCKWAIT可以通过这个字段查询出当前正在等待的锁的相关信息,如果有内容表示被死锁或者有锁等待事件。

STATUS用来判断session状态。ACTIVE:正执行SQL语句INACTIVE:等待操作。KILLED:被标注为杀死。

OSUSER:客户端操作系统用户名。

MACHINE:客户端操作系统的机器名。

PROCESS:客户端进程的ID

PROGRAM:客户端执行的应用程序。

2)用dba用户执行以下语句,可以查看到被死锁的语句。

  1. select sql_text from v$sql where (address,hash_value) in
  2. (select sql_address,sql_hash_value from v$session where sid in
  3. (select session_id from v$locked_object));


或者


  1. select sql_text from v$sql where (address,hash_value) in
  2. (select sql_address,sql_hash_value from v$session where lockwait is not null);


select for update语句造成ORA-00060 deadlock死锁问题分析

这里查出来出现死锁的语句和之前的测试结果一致。

3)分析trace日志文件

死锁发生时,可以再trace日志文件中找到如下语句:

  1. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/prod/PROD/trace/PROD_ora_4852.trc.


打开trace文件,可以找到死锁的语句:

  1. ----- Information for the OTHER waiting sessions -----
  2. Session 31:
  3.   sid: 31 ser: 25 audsid: 110280 user: 84/HR
  4.     flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
  5.     flags2: (0x40009) -/-/INC
  6.   pid: 23 O/S info: user: oracle, term: UNKNOWN, ospid: 4853
  7.     image: oracle@prod (TNS V1-V3)
  8.   client details:
  9.     O/S info: user: oracle, term: pts/1, ospid: 3149
  10.     machine: prod program: sqlplus@prod (TNS V1-V3)
  11.     application name: SQL*Plus, hash value=3669949024
  12.   current SQL:
  13.   select country_id,country_name,region_id from test where country_id='AU' for update
  14.  
  15. ----- End of information for the OTHER waiting sessions -----
  16.  
  17. Information for THIS session:
  18.  
  19. ----- Current SQL Statement for this session (sql_id=6n5kjs2twrwwq) -----
  20. select country_id,country_name,region_id from test where country_id='UK' for update
  21. ===================================================


解决死锁问题

方法一(杀进程)

1)查找死锁的进程:


  1. select sid, serial#, username, command, lockwait, osuser from v$session where lockwait is not null;


select for update语句造成ORA-00060 deadlock死锁问题分析

2kill掉这个死锁的进程:


  1. alter system kill session 'sid, serial#';


这里执行语句为:alter system kill session '29,69';

select for update语句造成ORA-00060 deadlock死锁问题分析

原先造成死锁的进程被killed

select for update语句造成ORA-00060 deadlock死锁问题分析

3)如果还不能解决,使用杀系统进程的方式处理:

这里为了测试,再次重现了死锁,并使用杀进程方式进行处理。


  1. select p.spid from v$session s,v$process p where s.sid=XXX and s.paddr=p.addr;


其中,XXX使用第一步查出来的SID参数替代

这里查出来的进程号为:4257


  1. [oracle@prod ~]$ ps -ef | grep 4257
  2. oracle 4257 3149 0 07:08 ? 00:00:00 oraclePROD (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  3. oracle 4321 4000 0 07:21 pts/4 00:00:00 grep 4257
  4. [oracle@prod ~]$ kill -9 4257
  5. [oracle@prod ~]$ ps -ef | grep 4257
  6. oracle 4327 4000 0 07:21 pts/4 00:00:00 grep 4257


杀完进程之后,造成死锁的进程被杀死

select for update语句造成ORA-00060 deadlock死锁问题分析

查不到死锁进程

select for update语句造成ORA-00060 deadlock死锁问题分析

方法二(重启库)


  1. SYS@PROD>shutdown immediate;
  2. Database closed.
  3. Database dismounted.
  4. ORACLE instance shut down.
  5. SYS@PROD>startup;
  6. ORACLE instance started.
  7. Total System Global Area 1202556928 bytes
  8. Fixed Size 2252704 bytes
  9. Variable Size 771752032 bytes
  10. Database Buffers 419430400 bytes
  11. Redo Buffers 9121792 bytes
  12. Database mounted.
  13. Database opened.


查看死锁进程:

select for update语句造成ORA-00060 deadlock死锁问题分析

方法三(commit || rollback

Session1


  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. AU Australia 3


Session2


  1. HR@PROD>select country_id,country_name,region_id from test where country_id='UK' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. UK United Kingdom 1


Session1


  1. HR@PROD>select country_id,country_name,region_id from test where country_id='UK' for update;
  2. Waiting……


Session2


  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;
  2. Waiting……


Session1


  1. HR@PROD>select country_id,country_name,region_id from test where country_id='UK' for update;
  2. select country_id,country_name,region_id from test where country_id='UK' for update
  3.                                               *
  4. ERROR at line 1:
  5. ORA-00060: deadlock detected while waiting for resource


select for update语句造成ORA-00060 deadlock死锁问题分析

这里Session1出现死锁,只要执行commit或者rollback就可以解除死锁,只不过事务中第一个SQL执行成功,第二个SQL执行失败。

Session1


  1. HR@PROD>commit;
  2. Commit complete.


Session2


  1. HR@PROD>select country_id,country_name,region_id from test where country_id='AU' for update;
  2. CO COUNTRY_NAME REGION_ID
  3. -- ---------------------------------------- ----------
  4. AU Australia 3


此时,死锁状态解除:

select for update语句造成ORA-00060 deadlock死锁问题分析

问题总结

最后,通过与开发人员交流,得出的结论是出现该问题的原因不是程序设计的问题,而是在开发环境中,有人通过debug模式连到服务器上进行代码调试,有人通过客户端的形式访问服务器上的应用,当两者同时调试某一功能时(通过select for update的语句进行一张表中的数据访问),在特殊的情况下,出现了死锁的情况。

当然,在开发环境中比较容易出现这类情况,如果需要避免,就需要使用该开发环境的开发人员与测试人员做好沟通了。对于容易出现锁表的功能,可以要求测试人员在测试环境中测试,除特殊情况,尽量减少在开发环境中测试的次数,以免锁表影响开发测试进度。

正文到此结束
Loading...