JPPF Issue Tracker
CLOSED  Bug report JPPF-72  -  Server deadlock in TaskQueueChecker / NodeNioServer
Posted Sep 25, 2012 - updated Oct 07, 2012
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Reproducibility
    Often
  • Severity
    Normal
  • Targeted for
    JPPF 3.2
Issue description
When running the "many jobs" sample in the demo module, with 2 drivers each having a local node and P2P enabled, I get the following deadlock:

"Peer Initializer [Peer-1]":
  waiting for ownable synchronizer 0x00000000e0782220, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "TaskQueueChecker"
"TaskQueueChecker":
  waiting for ownable synchronizer 0x00000000e02104b8, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "NodeJobServer-3"
"NodeJobServer-3":
  waiting to lock monitor 0x000000000b3e0020 (object 0x00000000e0782848, a java.util.LinkedHashSet),
  which is held by "TaskQueueChecker"
 
 
Java stack information for the threads listed above 
 
"Peer Initializer [Peer-1]":
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x00000000e0782220> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.queue.JPPFPriorityQueue.addBundle(JPPFPriorityQueue.java:125)
  at org.jppf.server.peer.PeerNode.perform(PeerNode.java:177)
  at org.jppf.server.peer.PeerNode.run(PeerNode.java:119)
  at org.jppf.server.peer.JPPFPeerInitializer.run(JPPFPeerInitializer.java:77)
 
"TaskQueueChecker":
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x00000000e02104b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.nio.StateTransitionManager.transitionChannel(StateTransitionManager.java:108)
  at org.jppf.server.nio.nodeserver.AbstractNodeContext.submit(AbstractNodeContext.java:419)
  at org.jppf.server.nio.nodeserver.AbstractNodeContext.submit(AbstractNodeContext.java:1)
  at org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatchJobToChannel(TaskQueueChecker.java:331)
  - locked <0x00000000e0782958> (a org.jppf.server.nio.nodeserver.LocalNodeChannel)
  at org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatch(TaskQueueChecker.java:265)
  - locked <0x00000000e0782848> (a java.util.LinkedHashSet)
  at org.jppf.server.nio.nodeserver.TaskQueueChecker.run(TaskQueueChecker.java:231)
  at java.lang.Thread.run(Thread.java:722)
 
"NodeJobServer-3":
  at org.jppf.server.nio.nodeserver.TaskQueueChecker.addIdleChannel(TaskQueueChecker.java:173)
  - waiting to lock <0x00000000e0782848> (a java.util.LinkedHashSet)
  at org.jppf.server.nio.nodeserver.NodeNioServer.updateConnectionStatus(NodeNioServer.java:241)
  at org.jppf.server.nio.nodeserver.NodeNioServer.access$1(NodeNioServer.java:234)
  at org.jppf.server.nio.nodeserver.NodeNioServer$1.executionStatusChanged(NodeNioServer.java:114)
  at org.jppf.server.nio.nodeserver.AbstractNodeContext.fireExecutionStatusChanged(AbstractNodeContext.java:478)
  at org.jppf.server.nio.nodeserver.AbstractNodeContext.setState(AbstractNodeContext.java:341)
  at org.jppf.server.nio.nodeserver.AbstractNodeContext.setState(AbstractNodeContext.java:1)
  at org.jppf.server.nio.StateTransitionManager.transitionChannel(StateTransitionManager.java:145)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:85)
  - locked <0x00000000e0784c60> (a org.jppf.server.nio.SelectionKeyWrapper)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
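The dump shows a lock-order inversion. "TaskQueueChecker" holds the LinkedHashSet monitor (taken in TaskQueueChecker.dispatch()) and waits for the ReentrantLock it requests in StateTransitionManager.transitionChannel(). "NodeJobServer-3" holds that ReentrantLock while inside transitionChannel() and, through setState(), fireExecutionStatusChanged() and updateConnectionStatus(), reaches TaskQueueChecker.addIdleChannel(), which waits for the same LinkedHashSet monitor. "Peer Initializer [Peer-1]" is not part of the cycle; it is only blocked on the queue lock held by "TaskQueueChecker". The following is a minimal, self-contained sketch of that pattern. It assumes nothing about JPPF internals: all class, thread and field names are hypothetical stand-ins. A CountDownLatch makes both threads hold their first lock before requesting the second, so the deadlock is deterministic, and ThreadMXBean.findDeadlockedThreads(), which on HotSpot detects cycles involving both object monitors and ownable synchronizers such as ReentrantLock, reports it much as jstack does above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins, not JPPF code: "idleChannels" plays the LinkedHashSet monitor
// and "transitionLock" plays the ReentrantLock taken in transitionChannel().
class LockOrderDeadlockDemo {

  static long[] provokeDeadlock() {
    Set<String> idleChannels = new LinkedHashSet<>();
    ReentrantLock transitionLock = new ReentrantLock();
    // Each thread waits here until both hold their first lock, so the deadlock is deterministic.
    CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    Thread checker = new Thread(() -> {
      synchronized (idleChannels) {        // like TaskQueueChecker.dispatch()
        arriveAndAwait(bothHoldFirstLock);
        transitionLock.lock();             // like transitionChannel(): blocks forever
      }
    }, "TaskQueueChecker-sim");

    Thread server = new Thread(() -> {
      transitionLock.lock();               // like transitionChannel()
      arriveAndAwait(bothHoldFirstLock);
      synchronized (idleChannels) {        // like addIdleChannel(): blocks forever
        idleChannels.add("node-1");
      }
    }, "NodeJobServer-sim");

    // Daemon threads so the JVM can still exit while they stay blocked.
    checker.setDaemon(true);
    server.setDaemon(true);
    checker.start();
    server.start();

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    try {
      for (int i = 0; i < 100; i++) {      // poll for up to about 5 seconds
        long[] ids = mx.findDeadlockedThreads(); // null while no deadlock exists
        if (ids != null) {
          return ids;
        }
        Thread.sleep(50);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new long[0];
  }

  private static void arriveAndAwait(CountDownLatch latch) {
    latch.countDown();
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    long[] ids = provokeDeadlock();
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
      System.out.println(info.getThreadName() + " waits for " + info.getLockName()
          + ", held by " + info.getLockOwnerName());
    }
  }
}
```

Running main prints one line per deadlocked thread, naming the lock it waits for and that lock's owner. The usual remedies are to acquire the two locks in a single global order, or to avoid calling out (for example, firing listeners such as executionStatusChanged) while holding a lock.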
Steps to reproduce this issue
  • start 2 drivers, each with a local node and P2P enabled (jppf.peer.discovery.enabled = true)
  • run the "many jobs" sample (class sample.dist.manyjobs.ManyJobsRunner in 'demo' module)
  • ==> a hang occurs

#4
Comment posted by
 jandam
Sep 30, 23:41
A file was uploaded. Proposed patch - not tested due to JPPF-73 and JPPF-74
#5
Comment posted by
 jandam
Oct 01, 22:01
Issue is not always reproducible. Suggested patch applied to trunk revision 2443. This patch reduces synchronized/locked blocks.
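The patch itself is not reproduced here, so the following is only a hypothetical sketch of the general idea of shrinking the locked regions, with simplified stand-in names rather than the JPPF classes. The dispatcher removes an idle channel while holding the set's monitor, releases the monitor, and only then takes the transition lock. Because no thread requests the transition lock while holding the monitor, the only remaining lock order is "transition lock, then monitor", and the cycle seen in the dump cannot form.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of reduced lock scope; not the actual JPPF patch.
class ReducedLockScopeDemo {

  static final class IdleChannelRegistry {
    private final Set<String> idleChannels = new LinkedHashSet<>();
    // Stands in for the lock taken by StateTransitionManager.transitionChannel().
    final ReentrantLock transitionLock = new ReentrantLock();

    // Called while transitionLock is held (the NodeJobServer path):
    // the monitor is held only for the add itself.
    void addIdleChannel(String channel) {
      synchronized (idleChannels) {
        idleChannels.add(channel);
      }
    }

    // Dispatcher (the TaskQueueChecker path): pick and remove a channel under the
    // monitor, release the monitor, and only then take transitionLock.
    boolean dispatchOnce() {
      String channel;
      synchronized (idleChannels) {
        if (idleChannels.isEmpty()) {
          return false;
        }
        channel = idleChannels.iterator().next();
        idleChannels.remove(channel);
      }
      transitionLock.lock();
      try {
        // The job would be submitted to 'channel' here (omitted in this sketch).
        return true;
      } finally {
        transitionLock.unlock();
      }
    }
  }

  // Runs both paths concurrently; returns true if both threads finish within the
  // timeout, i.e. no deadlock occurred.
  static boolean runStress(int iterations) {
    IdleChannelRegistry registry = new IdleChannelRegistry();
    Thread server = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        registry.transitionLock.lock();
        try {
          registry.addIdleChannel("node-" + i);
        } finally {
          registry.transitionLock.unlock();
        }
      }
    }, "NodeJobServer-sim");
    Thread checker = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        registry.dispatchOnce();
      }
    }, "TaskQueueChecker-sim");
    server.setDaemon(true);
    checker.setDaemon(true);
    server.start();
    checker.start();
    try {
      server.join(10_000);
      checker.join(10_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !server.isAlive() && !checker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("completed without deadlock: " + runStress(100_000));
  }
}
```

The trade-off of this design is that a channel removed from the idle set is briefly in neither the set nor a job, so a real implementation would need to put it back if the subsequent state transition fails.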
#6
Comment posted by
 lolo4j
Oct 02, 08:29
The patch seems to have fixed this issue; I can no longer reproduce it. I propose to do more testing before closing this bug.