Home >Java >javaTutorial >Java program deadlock problem principles and solutions
The Java language ensures atomicity through the synchronized keyword. This is because each Object has an implicit lock, which is also called a monitor object. This internal lock is automatically acquired before entering synchronized, and once leaving this mode, the lock will be automatically released whether it is completed or interrupted. Obviously this is an exclusive lock, and each lock request is mutually exclusive. Compared with many advanced locks (Lock/ReadWriteLock, etc.), the cost of synchronized is higher than the latter. But synchronzied has a simpler syntax and is easier to use and understand. Lock Once the lock() method is called to acquire the lock but not released correctly, it is likely to cause a deadlock. Therefore, the release operation of Lock is always followed in the finally code block. This is also an adjustment and redundancy in the code structure. The implementation of Lock has used hardware resources to the extreme, so there is not much room for optimization in the future, unless the hardware has higher performance, but synchronized is only a standard implementation, which is still very high on different platforms and different hardware. There is room for improvement, and the optimization of Java locks in the future will mainly focus on this. Since synchronzied cannot avoid deadlocks, deadlocks are common errors. The causes and solutions to deadlocks are described in detail below.
Deadlock is an error at the operating system level and is the abbreviation of process deadlock. It was first discovered in 1965 by Dijkstra when he was studying the banker's algorithm. It is one of the most difficult problems to deal with in computer operating systems and even in the entire field of concurrent programming.
In fact, there are many things in the computer world that require multi-threading to solve, because only in this way can resources be utilized to the greatest extent and the efficiency of computing can be reflected. However, in practice, there are many situations in computer systems where resources can only be used by one process at a time, such as printers, and only one process can control it at the same time. In a multi-channel programming environment, several processes often share such resources, and a process may require more than one resource. Therefore, there will be several processes competing for limited resources and advancing in an inappropriate order, resulting in an infinite loop of waiting. We call this state a deadlock. To describe it simply, deadlock refers to a situation where multiple processes wait in a loop for resources occupied by each other and remain deadlocked indefinitely. Obviously, if there is no external force, then each process involved in the deadlock will always be in a blocked state.
The deadlock phenomenon in the system not only wastes a lot of system resources, but even causes the entire system to collapse, bringing catastrophic consequences. Therefore, the deadlock problem must be paid great attention both theoretically and technically.
How does a banker safely lend a certain amount of funds to several customers so that these customers can borrow the money to complete what they want to do? thing, and at the same time the banker can recover all the funds without going bankrupt. The banker is like an operating system, the customer is like the running process, and the banker's funds are the resources of the system.
The banker's algorithm needs to ensure the following four points:
When a customer's maximum demand for funds does not exceed the banker's existing funds, the customer can be accepted;
Customers can borrow in installments, but the total amount of loans cannot exceed the maximum demand;
When the banker's existing funds cannot meet the loan amount that the customer still needs, the payment of the customer's loan can be postponed, but it can always be Allow customers to get loans within a limited time;
When customers get all the funds they need, they will be able to return all the funds within a limited time.
List 1. Banker’s algorithm implementation
/* 一共有5个进程需要请求资源,有3类资源 */ public class BankDemo { // 每个进程所需要的最大资源数 public static int MAX[][] = { { 7, 5, 3 }, { 3, 2, 2 }, { 9, 0, 2 }, { 2, 2, 2 }, { 4, 3, 3 } }; // 系统拥有的初始资源数 public static int AVAILABLE[] = { 10, 5, 7 }; // 系统已给每个进程分配的资源数 public static int ALLOCATION[][] = { { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 } }; // 每个进程还需要的资源数 public static int NEED[][] = { { 7, 5, 3 }, { 3, 2, 2 }, { 9, 0, 2 }, { 2, 2, 2 }, { 4, 3, 3 } }; // 每次申请的资源数 public static int Request[] = { 0, 0, 0 }; // 进程数与资源数 public static int M = 5, N = 3; int FALSE = 0; int TRUE = 1; public void showdata() { int i, j; System.out.print("系统可用的资源数为:/n"); for (j = 0; j < N; j++) { System.out.print("资源" + j + ":" + AVAILABLE[j] + " "); } System.out.println(); System.out.println("各进程还需要的资源量:"); for (i = 0; i < M; i++) { System.out.print("进程" + i + ":"); for (j = 0; j < N; j++) { System.out.print("资源" + j + ":" + NEED[i][j] + " "); } System.out.print("/n"); } System.out.print("各进程已经得到的资源量: /n"); for (i = 0; i < M; i++) { System.out.print("进程"); System.out.print(i); for (j = 0; j < N; j++) { System.out.print("资源" + j + ":" + ALLOCATION[i][j] + " "); } System.out.print("/n"); } } // 分配资源,并重新更新各种状态 public void changdata(int k) { int j; for (j = 0; j < N; j++) { AVAILABLE[j] = AVAILABLE[j] - Request[j]; ALLOCATION[k][j] = ALLOCATION[k][j] + Request[j]; NEED[k][j] = NEED[k][j] - Request[j]; } }; // 回收资源,并重新更新各种状态 public void rstordata(int k) { int j; for (j = 0; j < N; j++) { AVAILABLE[j] = AVAILABLE[j] + Request[j]; ALLOCATION[k][j] = ALLOCATION[k][j] - Request[j]; NEED[k][j] = NEED[k][j] + Request[j]; } }; // 释放资源 public void free(int k) { for (int j = 0; j < N; j++) { AVAILABLE[j] = AVAILABLE[j] + ALLOCATION[k][j]; System.out.print("释放" + k + "号进程的" + j + "资源!/n"); } } public int check0(int k) { int j, n = 0; for (j = 0; j < N; j++) { if (NEED[k][j] == 0) n++; } if (n == 3) return 1; else return 0; } // 检查安全性函数 // 所以银行家算法其核心是:保证银行家系统的资源数至少不小于一个客户的所需要的资源数。在安全性检查函数 chkerr() 上由这个方法来实现 // 这个循环来进行核心判断,从而完成了银行家算法的安全性检查工作。 public int chkerr(int s) { int WORK; int FINISH[] = new int[M], temp[] = new int[M];// 保存临时的安全进程序列 int i, j, k = 0; for (i = 0; i < M; i++) FINISH[i] = FALSE; for (j = 0; j < N; j++) { WORK = AVAILABLE[j]; // 第 j 个资源可用数 i = s; // 判断第 i 个进程是否满足条件 while (i < M) { if (FINISH[i] == FALSE && NEED[i][j] <= WORK) { WORK = WORK + ALLOCATION[i][j]; FINISH[i] = TRUE; temp[k] = i; k++; i = 0; } else { i++; } } for (i = 0; i < M; i++) if (FINISH[i] == FALSE) { System.out.print("/n 系统不安全!!! 本次资源申请不成功!/n"); return 1; } } System.out.print("/n 经安全性检查,系统安全,本次分配成功。/n"); System.out.print("本次安全序列:"); for (i = 0; i < M - 1; i++) { System.out.print("进程" + temp[i] + "->"); } System.out.print("进程" + temp[M - 1]); System.out.println("/n"); return 0; } }
Deadlock problem It is a problem unique to multi-threading, and it can be considered an extreme case where switching between threads consumes system performance. In a deadlock, threads wait for each other's resources without releasing their own resources, resulting in endless waiting. The result is that system tasks can never be completed. The deadlock problem is a problem that should be resolutely avoided and eliminated in multi-threaded development.
Generally speaking, the following conditions need to be met for a deadlock problem to occur:
1. Mutual exclusion condition: A resource can only be used by one thread at a time.
2. Request and hold conditions: When a process is blocked due to requesting resources, it will keep the obtained resources.
3. Non-deprivation conditions: The resources that have been obtained by the process cannot be forcibly deprived before they are used up.
4. Circular waiting conditions: Several processes form a head-to-tail circular waiting relationship for resources.
As long as any one of the four necessary conditions for deadlock is destroyed, the deadlock problem can be solved.
Let’s look at an example first. As mentioned before, deadlock is a running situation when two or more threads are permanently blocked. The generation of this situation is accompanied by at least two threads and two or multiple resources. In the example shown in Listing 2, we write a simple program that will cause a deadlock to occur, and then we will see how to analyze it.
List 2. Deadlock example
public class ThreadDeadlock { public static void main(String[] args) throws InterruptedException { Object obj1 = new Object(); Object obj2 = new Object(); Object obj3 = new Object(); Thread t1 = new Thread(new SyncThread(obj1, obj2), "t1"); Thread t2 = new Thread(new SyncThread(obj2, obj3), "t2"); Thread t3 = new Thread(new SyncThread(obj3, obj1), "t3"); t1.start(); Thread.sleep(5000); t2.start(); Thread.sleep(5000); t3.start(); } } class SyncThread implements Runnable { private Object obj1; private Object obj2; public SyncThread(Object o1, Object o2) { this.obj1 = o1; this.obj2 = o2; } @Override public void run() { String name = Thread.currentThread().getName(); System.out.println(name + " acquiring lock on " + obj1); synchronized (obj1) { System.out.println(name + " acquired lock on " + obj1); work(); System.out.println(name + " acquiring lock on " + obj2); synchronized (obj2) { System.out.println(name + " acquired lock on " + obj2); work(); } System.out.println(name + " released lock on " + obj2); } System.out.println(name + " released lock on " + obj1); System.out.println(name + " finished execution."); } private void work() { try { Thread.sleep(30000); } catch (InterruptedException e) { e.printStackTrace(); } } }
在上面的程序中同步线程正完成 Runnable 的接口,它工作的是两个对象,这两个对象向对方寻求死锁而且都在使用同步阻塞。在主函数中,我使用了三个为同步线程运行的线程,而且在其中每个线程中都有一个可共享的资源。这些线程以向第一个对象获取封锁这种方式运行。但是当它试着向第二个对象获取封锁时,它就会进入等待状态,因为它已经被另一个线程封锁住了。这样,在线程引起死锁的过程中,就形成了一个依赖于资源的循环。当我执行上面的程序时,就产生了输出,但是程序却因为死锁无法停止。输出如清单 3 所示。
清单 3. 清单 2 运行输出
t1 acquiring lock on java.lang.Object@1dd3812 t1 acquired lock on java.lang.Object@1dd3812 t2 acquiring lock on java.lang.Object@c791b9 t2 acquired lock on java.lang.Object@c791b9 t3 acquiring lock on java.lang.Object@1aa9f99 t3 acquired lock on java.lang.Object@1aa9f99 t1 acquiring lock on java.lang.Object@c791b9 t2 acquiring lock on java.lang.Object@1aa9f99
在此我们可以清楚地在输出结果中辨认出死锁局面,但是在我们实际所用的应用中,发现死锁并将它排除是非常难的。
JVM 提供了一些工具可以来帮助诊断死锁的发生,如下面程序清单 4 所示,我们实现了一个死锁,然后尝试通过 jstack 命令追踪、分析死锁发生。
清单 4. 死锁代码
import java.util.concurrent.locks.ReentrantLock; // 下面演示一个简单的死锁,两个线程分别占用 south 锁和 north 锁,并同时请求对方占用的锁,导致死锁 public class DeadLock extends Thread { protected Object myDirect; static ReentrantLock south = new ReentrantLock(); static ReentrantLock north = new ReentrantLock(); public DeadLock(Object obj) { this.myDirect = obj; if (myDirect == south) { this.setName("south"); } else { this.setName("north"); } } @Override public void run() { if (myDirect == south) { try { north.lockInterruptibly();// 占用 north try { Thread.sleep(500); } catch (Exception ex) { ex.printStackTrace(); } south.lockInterruptibly(); System.out.println("car to south has passed"); } catch (InterruptedException ex) { System.out.println("car to south is killed"); ex.printStackTrace(); } finally { if (north.isHeldByCurrentThread()) { north.unlock(); } if (south.isHeldByCurrentThread()) { south.unlock(); } } } if (myDirect == north) { try { south.lockInterruptibly();// 占用 south try { Thread.sleep(500); } catch (Exception ex) { ex.printStackTrace(); } north.lockInterruptibly(); System.out.println("car to north has passed"); } catch (InterruptedException ex) { System.out.println("car to north is killed"); ex.printStackTrace(); } finally { if (north.isHeldByCurrentThread()) { north.unlock(); } if (south.isHeldByCurrentThread()) { south.unlock(); } } } } public static void main(String[] args) throws InterruptedException { DeadLock car2south = new DeadLock(south); DeadLock car2north = new DeadLock(north); car2south.start(); car2north.start(); } }
jstack 可用于导出 Java 应用程序的线程堆栈,-l 选项用于打印锁的附加信息。我们运行 jstack 命令,输出入清单 5 和 6 所示,其中清单 5 里面可以看到线程处于运行状态,代码中调用了拥有锁投票、定时锁等候和可中断锁等候等特性的 ReentrantLock 锁机制。清单 6 直接打印出出现死锁情况,报告 north 和 sourth 两个线程互相等待资源,出现了死锁。
清单 5. jstack 运行输出 1
[root@facenode4 ~]# jstack -l 31274 2015-01-29 12:40:27 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.45-b01 mixed mode): "Attach Listener" daemon prio=10 tid=0x00007f6d3c001000 nid= 0x7a87 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "DestroyJavaVM" prio=10 tid=0x00007f6da4006800 nid= 0x7a2b waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "north" prio=10 tid=0x00007f6da4101800 nid= 0x7a47 waiting on condition [0x00007f6d8963b000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000075903c7c8> ( a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer. parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer. doAcquireInterruptibly(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer. acquireInterruptibly(AbstractQueuedSynchronizer.java:1201) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312) at DeadLock.run(DeadLock.java:50) Locked ownable synchronizers: - <0x000000075903c798> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) "south" prio=10 tid=0x00007f6da4100000 nid= 0x7a46 waiting on condition [0x00007f6d8973c000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000075903c798> ( a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer. parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer. doAcquireInterruptibly(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer. acquireInterruptibly(AbstractQueuedSynchronizer.java:1201) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312) at DeadLock.run(DeadLock.java:28) Locked ownable synchronizers: - <0x000000075903c7c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) "Low Memory Detector" daemon prio=10 tid=0x00007f6da40d2800 nid= 0x7a44 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread1" daemon prio=10 tid=0x00007f6da40d0000 nid= 0x7a43 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread0" daemon prio=10 tid=0x00007f6da40cd000 nid= 0x7a42 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Signal Dispatcher" daemon prio=10 tid=0x00007f6da40cb000 nid= 0x7a41 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Finalizer" daemon prio=10 tid=0x00007f6da40af000 nid= 0x7a40 in Object.wait() [0x00007f6d89d44000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0000000759001300> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x0000000759001300> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:171) Locked ownable synchronizers: - None "Reference Handler" daemon prio=10 tid=0x00007f6da40ad000 nid= 0x7a3f in Object.wait() [0x00007f6d89e45000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007590011d8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x00000007590011d8> (a java.lang.ref.Reference$Lock) Locked ownable synchronizers: - None "VM Thread" prio=10 tid=0x00007f6da40a6000 nid=0x7a3e runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f6da4019800 nid=0x7a2c runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f6da401b000 nid=0x7a2d runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f6da401d000 nid=0x7a2e runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f6da401f000 nid=0x7a2f runnable "GC task thread#4 (ParallelGC)" prio=10 tid=0x00007f6da4020800 nid=0x7a30 runnable "GC task thread#5 (ParallelGC)" prio=10 tid=0x00007f6da4022800 nid=0x7a31 runnable "GC task thread#6 (ParallelGC)" prio=10 tid=0x00007f6da4024800 nid=0x7a32 runnable "GC task thread#7 (ParallelGC)" prio=10 tid=0x00007f6da4026000 nid=0x7a33 runnable "GC task thread#8 (ParallelGC)" prio=10 tid=0x00007f6da4028000 nid=0x7a34 runnable "GC task thread#9 (ParallelGC)" prio=10 tid=0x00007f6da402a000 nid=0x7a35 runnable "GC task thread#10 (ParallelGC)" prio=10 tid=0x00007f6da402b800 nid=0x7a36 runnable "GC task thread#11 (ParallelGC)" prio=10 tid=0x00007f6da402d800 nid=0x7a37 runnable "GC task thread#12 (ParallelGC)" prio=10 tid=0x00007f6da402f800 nid=0x7a38 runnable "GC task thread#13 (ParallelGC)" prio=10 tid=0x00007f6da4031000 nid=0x7a39 runnable "GC task thread#14 (ParallelGC)" prio=10 tid=0x00007f6da4033000 nid=0x7a3a runnable "GC task thread#15 (ParallelGC)" prio=10 tid=0x00007f6da4035000 nid=0x7a3b runnable "GC task thread#16 (ParallelGC)" prio=10 tid=0x00007f6da4036800 nid=0x7a3c runnable "GC task thread#17 (ParallelGC)" prio=10 tid=0x00007f6da4038800 nid=0x7a3d runnable "VM Periodic Task Thread" prio=10 tid=0x00007f6da40dd000 nid=0x7a45 waiting on condition JNI global references: 886
清单 6. jstack 运行输出片段 2
Found one Java-level deadlock: ============================= "north": waiting for ownable synchronizer 0x000000075903c7c8, ( a java.util.concurrent.locks.ReentrantLock$NonfairSync), which is held by "south" "south": waiting for ownable synchronizer 0x000000075903c798, ( a java.util.concurrent.locks.ReentrantLock$NonfairSync), which is held by "north" Java stack information for the threads listed above: =================================================== "north": at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000075903c7c8> ( a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer. parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer. doAcquireInterruptibly(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer. acquireInterruptibly(AbstractQueuedSynchronizer.java:1201) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312) at DeadLock.run(DeadLock.java:50) "south": at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000075903c798> ( a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer. parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) at java.util.concurrent.locks.AbstractQueuedSynchronizer. doAcquireInterruptibly(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer. acquireInterruptibly(AbstractQueuedSynchronizer.java:1201) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312) at DeadLock.run(DeadLock.java:28) Found 1 deadlock.
死锁是由四个必要条件导致的,所以一般来说,只要破坏这四个必要条件中的一个条件,死锁情况就应该不会发生。
如果想要打破互斥条件,我们需要允许进程同时访问某些资源,这种方法受制于实际场景,不太容易实现条件;
打破不可抢占条件,这样需要允许进程强行从占有者那里夺取某些资源,或者简单一点理解,占有资源的进程不能再申请占有其他资源,必须释放手上的资源之后才能发起申请,这个其实也很难找到适用场景;
进程在运行前申请得到所有的资源,否则该进程不能进入准备执行状态。这个方法看似有点用处,但是它的缺点是可能导致资源利用率和进程并发性降低;
避免出现资源申请环路,即对资源事先分类编号,按号分配。这种方式可以有效提高资源的利用率和系统吞吐量,但是增加了系统开销,增大了进程对资源的占用时间。
如果我们在死锁检查时发现了死锁情况,那么就要努力消除死锁,使系统从死锁状态中恢复过来。消除死锁的几种方式:
1. 最简单、最常用的方法就是进行系统的重新启动,不过这种方法代价很大,它意味着在这之前所有的进程已经完成的计算工作都将付之东流,包括参与死锁的那些进程,以及未参与死锁的进程;
2. 撤消进程,剥夺资源。终止参与死锁的进程,收回它们占有的资源,从而解除死锁。这时又分两种情况:一次性撤消参与死锁的全部进程,剥夺全部资源;或者逐步撤消参与死锁的进程,逐步收回死锁进程占有的资源。一般来说,选择逐步撤消的进程时要按照一定的原则进行,目的是撤消那些代价最小的进程,比如按进程的优先级确定进程的代价;考虑进程运行时的代价和与此进程相关的外部作业的代价等因素;
3. 进程回退策略,即让参与死锁的进程回退到没有发生死锁前某一点处,并由此点处继续执行,以求再次执行时不再发生死锁。虽然这是个较理想的办法,但是操作起来系统开销极大,要有堆栈这样的机构记录进程的每一步变化,以便今后的回退,有时这是无法做到的。
其实即便是商业产品,依然会有很多死锁情况的发生,例如 MySQL 数据库,它也经常容易出现死锁案例。
假设我们用 Show innodb status 检查引擎状态时发现了死锁情况,如清单 7 所示。
清单 7. MySQL 死锁
WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 843102 n bits 600 index `KEY_TSKTASK_MONTIME2` of table `dcnet_db/TSK_TASK` trx id 0 677833454 lock_mode X locks rec but not gap waiting Record lock, heap no 395 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 8; hex 8000000000000425; asc %;; 1: len 8; hex 800012412c66d29c; asc A,f ;; 2: len 8; hex 800000000097629c; asc b ;; *** WE ROLL BACK TRANSACTION (1)
我们假设涉事的数据表上面有一个索引,这次的死锁就是由于两条记录同时访问到了相同的索引造成的。
我们首先来看看 InnoDB 类型的数据表,只要能够解决索引问题,就可以解决死锁问题。MySQL 的 InnoDB 引擎是行级锁,需要注意的是,这不是对记录进行锁定,而是对索引进行锁定。在 UPDATE、DELETE 操作时,MySQL 不仅锁定 WHERE 条件扫描过的所有索引记录,而且会锁定相邻的键值,即所谓的 next-key locking;
如语句 UPDATE TSK_TASK SET UPDATE_TIME = NOW() WHERE ID > 10000 会锁定所有主键大于等于 1000 的所有记录,在该语句完成之前,你就不能对主键等于 10000 的记录进行操作;当非簇索引 (non-cluster index) 记录被锁定时,相关的簇索引 (cluster index) 记录也需要被锁定才能完成相应的操作。
再分析一下发生问题的两条 SQL 语句:
当
update TSK_TASK set STATUS_ID=1064,UPDATE_TIME=now () where STATUS_ID=1061 and MON_TIME<date_sub(now(), INTERVAL 30 minute)
执行时,MySQL 会使用 KEY_TSKTASK_MONTIME2 索引,因此首先锁定相关的索引记录,因为 KEY_TSKTASK_MONTIME2 是非簇索引,为执行该语句,MySQL 还会锁定簇索引(主键索引)。
假设“update TSK_TASK set STATUS_ID=1067,UPDATE_TIME=now () where ID in (9921180)”几乎同时执行时,本语句首先锁定簇索引 (主键),由于需要更新 STATUS_ID 的值,所以还需要锁定 KEY_TSKTASK_MONTIME2 的某些索引记录。
这样第一条语句锁定了 KEY_TSKTASK_MONTIME2 的记录,等待主键索引,而第二条语句则锁定了主键索引记录,而等待 KEY_TSKTASK_MONTIME2 的记录,这样死锁就产生了。
我们通过拆分第一条语句解决了死锁问题:即先查出符合条件的 ID:select ID from TSK_TASK where STATUS_ID=1061 and MON_TIME < date_sub(now(), INTERVAL 30 minute);然后再更新状态:update TSK_TASK set STATUS_ID=1064 where ID in (….)。
我们发现,死锁虽然是较早就被发现的问题,但是很多情况下我们设计的程序里还是经常发生死锁情况。我们不能只是分析如何解决死锁这类问题,还需要具体找出预防死锁的方法,这样才能从根本上解决问题。总的来说,还是需要系统架构师、程序员不断积累经验,从业务逻辑设计层面彻底消除死锁发生的可能性。
以上就是Java 程序死锁问题原理及解决方案 的内容,更多相关内容请关注PHP中文网(www.php.cn)!