JPPF Issue Tracker
CLOSED  Bug report JPPF-147  -  Deadlock in the driver
Posted May 14, 2013 - updated May 21, 2013
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
    lolo4j
  • Progress
    100%
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
    lolo4j
  • Owned by
    lolo4j
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    High
  • Reproducibility
    Always
  • Severity
    Critical
  • Targeted for
    JPPF 3.3.2
Issue description
With the attached sample code, I reproduce the following deadlock in the driver:
Found one Java-level deadlock:
 
"ClientClassServer-4":
  waiting to lock monitor 0x0000000011d95cf0 (object 0x00000000e0682cc8, a org.jppf.server.nio.classloader.ClassContext),
  which is held by "NodeClassServer-2"
 
"NodeClassServer-2":
  waiting for ownable synchronizer 0x00000000e009f368, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "ClientClassServer-4"
 
 
Java stack information for the threads listed above:
 
"ClientClassServer-4":
  at org.jppf.server.nio.classloader.ClassContext.processRequests(ClassContext.java:197)
  - waiting to lock <0x00000000e0682cc8> (a org.jppf.server.nio.classloader.ClassContext)
  at org.jppf.server.nio.classloader.ClassContext.setState(ClassContext.java:92)
  at org.jppf.server.nio.classloader.ClassContext.setState(ClassContext.java:42)
  at org.jppf.server.nio.StateTransitionManager.transitionChannel(StateTransitionManager.java:138)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:85)
  - locked <0x00000000e0682c78> (a org.jppf.server.nio.SelectionKeyWrapper)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
 
 "NodeClassServer-2":
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x00000000e009f368> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.nio.StateTransitionManager.transitionChannel(StateTransitionManager.java:122)
  at org.jppf.server.nio.StateTransitionManager.transitionChannel(StateTransitionManager.java:108)
  at org.jppf.server.nio.classloader.ClassContext.processRequests(ClassContext.java:200)
  - locked <0x00000000e0682cc8> (a org.jppf.server.nio.classloader.ClassContext)
  at org.jppf.server.nio.classloader.ClassContext.addRequest(ClassContext.java:189)
  at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.processDynamic(WaitingNodeRequestState.java:186)
  at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.processResource(WaitingNodeRequestState.java:102)
  at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.performTransition(WaitingNodeRequestState.java:77)
  at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.performTransition(WaitingNodeRequestState.java:37)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
  - locked <0x00000000e0678da0> (a org.jppf.server.nio.SelectionKeyWrapper)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
Two users also reported a similar deadlock; the related thread dumps are attached.
Steps to reproduce this issue
  • 1 driver
  • 4 nodes (same machine) with processing.threads = 1
  • using the attached sample code (runner + sample task)
  • ==> deadlock in the driver after some time
Interestingly, I cannot reproduce it when setting processing.threads > 1 on the nodes.

#6
Comment posted by
 lolo4j
May 14, 05:08
A file was uploaded: reproducing code sample
#7
Comment posted by
 lolo4j
May 14, 05:09
A file was uploaded: deadlock 1 reported by user
#9
Comment posted by
 lolo4j
May 14, 06:23
I've narrowed down the cause to multiple concurrent calls to ClassContext.processRequests().

The method is like this:
private synchronized void processRequests() {
  if (ClassState.IDLE_PROVIDER.equals(getState()) && (currentRequest.get() == null) && (getNbPendingRequests() > 0)) {
    if (debugEnabled) log.debug("state changing from {} to {} for {}", new Object[] {ClassState.IDLE_PROVIDER, ClassState.SENDING_PROVIDER_REQUEST, this});
    driver.getClientClassServer().getTransitionManager().transitionChannel(getChannel(), ClassTransition.TO_SENDING_PROVIDER_REQUEST);
  }
}
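The dump above shows a classic lock-order inversion: NodeClassServer-2 holds the ClassContext monitor and waits for the transition manager's ReentrantLock, while ClientClassServer-4 holds that lock and waits for the monitor. The following is a minimal, self-contained illustration of that inversion (all names hypothetical, not JPPF classes); it uses tryLock with a timeout so the demo detects the cycle and backs off instead of hanging:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the inversion in the dump: one thread takes
// monitor -> lock, the other takes lock -> monitor. With plain lock() both
// would block forever; tryLock with a timeout lets the demo observe it.
class LockInversionDemo {
  static final Object monitor = new Object();           // plays the ClassContext monitor
  static final ReentrantLock transitionLock = new ReentrantLock();
  static volatile boolean inversionDetected = false;

  // Mirrors NodeClassServer-2: holds the monitor, then wants the lock.
  static void monitorThenLock() throws InterruptedException {
    synchronized (monitor) {
      Thread.sleep(100);                                // let the peer acquire the lock first
      if (!transitionLock.tryLock(200, TimeUnit.MILLISECONDS)) {
        inversionDetected = true;                       // plain lock() would deadlock here
      } else {
        transitionLock.unlock();
      }
    }
  }

  // Mirrors ClientClassServer-4: holds the lock, then wants the monitor.
  static void lockThenMonitor() throws InterruptedException {
    transitionLock.lock();
    try {
      Thread.sleep(400);                                // hold the lock while the peer waits
      synchronized (monitor) { }                        // acquired once the peer times out
    } finally {
      transitionLock.unlock();
    }
  }

  static boolean run() {
    Thread a = new Thread(() -> { try { monitorThenLock(); } catch (InterruptedException ignored) { } });
    Thread b = new Thread(() -> { try { lockThenMonitor(); } catch (InterruptedException ignored) { } });
    a.start(); b.start();
    try { a.join(); b.join(); } catch (InterruptedException ignored) { }
    return inversionDetected;
  }
}
```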
To fix the issue I changed it to this:
private final Lock lockRequest = new ReentrantLock();
private void processRequests() {
  // if requests are already being processed, no need to do anything
  if (lockRequest.tryLock()) {
    try {
      if (ClassState.IDLE_PROVIDER.equals(getState()) && (currentRequest.get() == null) && (getNbPendingRequests() > 0)) {
        if (debugEnabled) log.debug("state changing from {} to {} for {}", new Object[] {ClassState.IDLE_PROVIDER, ClassState.SENDING_PROVIDER_REQUEST, this});
        driver.getClientClassServer().getTransitionManager().transitionChannel(getChannel(), ClassTransition.TO_SENDING_PROVIDER_REQUEST);
      }
    } finally {
      lockRequest.unlock();
    }
  }
}
With this fix, I can no longer reproduce the deadlock.
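The key property of the fix is that tryLock() never blocks, so a thread can never wait for this lock while holding another monitor. A minimal sketch of the guard in isolation (ProcessGuard and the counter are hypothetical stand-ins; the counter replaces the real transitionChannel() call):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the non-blocking guard pattern used in the fix (hypothetical
// names). A thread either acquires the lock immediately and does the work,
// or sees that another thread is already processing and simply returns.
class ProcessGuard {
  private final Lock lockRequest = new ReentrantLock();
  private final AtomicInteger transitions = new AtomicInteger();

  public void processRequests() {
    // tryLock() acquires the lock or returns false at once; it never waits,
    // so no lock-ordering cycle can form through this method.
    if (lockRequest.tryLock()) {
      try {
        transitions.incrementAndGet(); // stand-in for the state transition
      } finally {
        lockRequest.unlock();
      }
    }
    // else: requests are already being processed; nothing to do
  }

  public int transitionCount() { return transitions.get(); }
}
```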
#10
Comment posted by
 lolo4j
May 16, 08:01
Another deadlock:
Deadlock detected
 
- thread id 71 "NodeJobServer-8" is waiting to lock java.util.concurrent.locks.ReentrantLock$NonfairSync@428ad4c6 which is held by thread id 56 "ClientJobServer-1"
- thread id 56 "ClientJobServer-1" is waiting to lock org.jppf.server.protocol.ServerJob@248f2c2 which is held by thread id 53 "NodeJobServer-4"
- thread id 53 "NodeJobServer-4" is waiting to lock java.util.concurrent.locks.ReentrantLock$NonfairSync@428ad4c6 which is held by thread id 56 "ClientJobServer-1"
 
Stack trace information for the threads listed above
 
"NodeJobServer-8" - 71 - state: WAITING - blocked count: 58 - blocked time: 0 - wait count: 225 - wait time: 0
  at sun.misc.Unsafe.park(Native Method)
  - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@428ad4c6
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.protocol.ServerJob.resultsReceived(ServerJob.java:211)
  at org.jppf.server.protocol.ServerTaskBundleNode.resultsReceived(ServerTaskBundleNode.java:172)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:86)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:38)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
  - locked org.jppf.server.nio.SelectionKeyWrapper@66145b9d
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@6ca9e1af
 
"ClientJobServer-1" - 56 - state: BLOCKED - blocked count: 19 - blocked time: 0 - wait count: 62505 - wait time: 0
  at org.jppf.server.protocol.ServerJob.taskCompleted(ServerJob.java:277)
  - waiting on org.jppf.server.protocol.ServerJob@248f2c2
  at org.jppf.server.protocol.ServerJob.cancel(ServerJob.java:363)
  at org.jppf.server.protocol.ServerJob$BundleCompletionListener.taskCompleted(ServerJob.java:492)
  at org.jppf.server.protocol.ServerTaskBundleClient.fireTasksCompleted(ServerTaskBundleClient.java:333)
  at org.jppf.server.protocol.ServerTaskBundleClient.cancel(ServerTaskBundleClient.java:258)
  at org.jppf.server.nio.client.ClientContext.cancelJobOnClose(ClientContext.java:270)
  - locked org.jppf.server.nio.client.ClientContext@108bbce9
  at org.jppf.server.nio.client.ClientContext.handleException(ClientContext.java:112)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:94)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
 
  Locked ownable synchronizers:
  - java.util.concurrent.locks.ReentrantLock$NonfairSync@428ad4c6
  - java.util.concurrent.ThreadPoolExecutor$Worker@226c0370
 
"NodeJobServer-4" - 53 - state: WAITING - blocked count: 50 - blocked time: 0 - wait count: 230 - wait time: 0
  at sun.misc.Unsafe.park(Native Method)
  - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@428ad4c6
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.protocol.ServerJob.taskCompleted(ServerJob.java:277)
  - locked org.jppf.server.protocol.ServerJob@248f2c2
  at org.jppf.server.protocol.ServerTaskBundleNode.taskCompleted(ServerTaskBundleNode.java:194)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:106)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:38)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
  - locked org.jppf.server.nio.SelectionKeyWrapper@606f394f
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@33a3668a
The cause of this one is that ServerJob.taskCompleted() is both synchronized and uses the reentrant lock, which is clearly bad. Removing the synchronized qualifier from the method should fix it.
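A sketch of the shape after the proposed fix, with hypothetical names: guarding the shared state with the ReentrantLock alone removes the second lock, so every thread acquires locks in a single order and no cycle can form.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the fix: the method that was both synchronized and
// lock-guarded now uses the ReentrantLock only. Before the fix, one thread
// could hold the object monitor while waiting for the lock, while another
// held the lock and waited for the monitor, as in the dump above.
class JobState {
  private final Lock lock = new ReentrantLock();
  private int completed;

  // before the fix this was effectively: public synchronized void taskCompleted()
  public void taskCompleted() {
    lock.lock(); // single lock, single acquisition order across all threads
    try {
      completed++;
    } finally {
      lock.unlock();
    }
  }

  public int completedCount() {
    lock.lock();
    try {
      return completed;
    } finally {
      lock.unlock();
    }
  }
}
```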
#11
Comment posted by
 lolo4j
May 21, 09:02
Fixed bug: Deadlock in the driver

Fixed. Changes committed to SVN:

The issue was updated with the following change(s):
  • This issue has been closed
  • The status has been updated, from Confirmed to Closed.
  • This issue's progression has been updated to 100 percent completed.
  • The resolution has been updated, from Not determined to RESOLVED.
  • Information about the user working on this issue has been changed, from lolo4j to Not being worked on.