JPPF Issue Tracker
CLOSED  Bug report JPPF-327  -  Node deadlock when shutting it down
Posted Sep 14, 2014 - updated Sep 15, 2014
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Progress
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     Daniel Widdis
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Reproducibility
    Rarely
  • Severity
    Normal
  • Targeted for
    JPPF 4.2.3
Issue description
Minor synchronization issue. Unlikely to reproduce this, but recording relevant stack traces in case there's an easy fix.
----------------------------------------
Thread dump for node 10.176.197.10:11202
----------------------------------------
 
--------------------------------------------------------------------------------
Deadlock detected
 
- thread id 64 "NodeShutdown" is waiting to lock org.jppf.server.node.remote.JPPFRemoteNode@65919c1f which is held by thread id 44 "NodeShutdown"
- thread id 44 "NodeShutdown" is waiting to lock java.util.ArrayList@6c8b3d39 which is held by thread id 1 "main"
- thread id 1 "main" is waiting to lock org.jppf.server.node.remote.JPPFRemoteNode@65919c1f which is held by thread id 44 "NodeShutdown"
Stack trace information for the threads listed above
 
"NodeShutdown" - 64 - state: BLOCKED - blocked count: 1 - blocked time: 0 - wait count: 0 - wait time: 0
  at org.jppf.utils.ThreadSynchronization.setStopped(ThreadSynchronization.java:86)
  - waiting on org.jppf.server.node.remote.JPPFRemoteNode@65919c1f
  at org.jppf.server.node.JPPFNode.shutdown(JPPFNode.java:377)
  at org.jppf.management.JPPFNodeAdmin$2.run(JPPFNodeAdmin.java:148)
  at java.lang.Thread.run(Thread.java:745)
 
"NodeShutdown" - 44 - state: BLOCKED - blocked count: 1 - blocked time: 0 - wait count: 0 - wait time: 0
  at org.jppf.node.event.LifeCycleEventHandler.fireNodeEnding(LifeCycleEventHandler.java:141)
  - waiting on java.util.ArrayList@6c8b3d39
  at org.jppf.server.node.JPPFNode.reset(JPPFNode.java:389)
  at org.jppf.server.node.JPPFNode.stopNode(JPPFNode.java:368)
  - locked org.jppf.server.node.remote.JPPFRemoteNode@65919c1f
  at org.jppf.node.NodeRunner$ShutdownOrRestart$1.run(NodeRunner.java:358)
  at java.security.AccessController.doPrivileged(Native Method)
  at org.jppf.node.NodeRunner$ShutdownOrRestart.run(NodeRunner.java:355)
  at org.jppf.node.NodeRunner.shutdown(NodeRunner.java:297)
  at org.jppf.server.node.JPPFNode.shutdown(JPPFNode.java:380)
  at org.jppf.management.JPPFNodeAdmin$2.run(JPPFNodeAdmin.java:148)
  at java.lang.Thread.run(Thread.java:745)
 
"main" - 1 - state: BLOCKED - blocked count: 51 - blocked time: 0 - wait count: 56 - wait time: 0
  at org.jppf.node.AbstractNode.setTaskCount(AbstractNode.java:91)
  - waiting on org.jppf.server.node.remote.JPPFRemoteNode@65919c1f
  at org.jppf.management.JPPFNodeAdmin.setTaskCounter(JPPFNodeAdmin.java:176)
  - locked org.jppf.management.JPPFNodeAdmin@3e91d58b
  at org.jppf.management.NodeStatusNotifier.jobEnding(NodeStatusNotifier.java:96)
  - locked org.jppf.management.JPPFNodeAdmin@3e91d58b
  at org.jppf.node.event.LifeCycleEventHandler.fireJobEnding(LifeCycleEventHandler.java:221)
  - locked java.util.ArrayList@6c8b3d39
  at org.jppf.server.node.NodeExecutionManagerImpl.cleanup(NodeExecutionManagerImpl.java:288)
  at org.jppf.server.node.NodeExecutionManagerImpl.execute(NodeExecutionManagerImpl.java:211)
  at org.jppf.server.node.JPPFNode.processNextJob(JPPFNode.java:186)
  at org.jppf.server.node.JPPFNode.perform(JPPFNode.java:169)
  at org.jppf.server.node.JPPFNode.run(JPPFNode.java:134)
  at org.jppf.node.NodeRunner.main(NodeRunner.java:130)
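Hedged illustration: the three-thread report above reduces to a classic two-lock cycle between the node monitor (`JPPFRemoteNode`) and the listener list (`ArrayList` in `LifeCycleEventHandler`). The sketch below is not JPPF code; it substitutes `ReentrantLock.tryLock` with a timeout for the `synchronized` blocks so the cycle can be observed without actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockCycleDemo {
    // Stand-ins for the two monitors in the dump: the JPPFRemoteNode instance
    // and the LifeCycleEventHandler's ArrayList of listeners.
    static final ReentrantLock nodeLock = new ReentrantLock();
    static final ReentrantLock listenersLock = new ReentrantLock();
    static final CountDownLatch bothHeld = new CountDownLatch(2);
    static final CountDownLatch bothTried = new CountDownLatch(2);

    public static void main(String[] args) throws InterruptedException {
        boolean[] got = new boolean[2];
        // "NodeShutdown" thread: holds the node monitor, wants the listener list.
        Thread shutdown = new Thread(() -> got[0] = holdThenTry(nodeLock, listenersLock));
        // "main" thread: holds the listener list, wants the node monitor.
        Thread main = new Thread(() -> got[1] = holdThenTry(listenersLock, nodeLock));
        shutdown.start(); main.start();
        shutdown.join(); main.join();
        // With plain synchronized blocks both threads would block forever;
        // tryLock with a timeout lets the demo terminate and report the cycle.
        System.out.println("shutdown acquired listeners lock: " + got[0]);
        System.out.println("main acquired node lock: " + got[1]);
    }

    static boolean holdThenTry(ReentrantLock held, ReentrantLock wanted) {
        held.lock();
        try {
            bothHeld.countDown();
            await(bothHeld);                // wait until both first locks are held
            boolean ok = tryFor(wanted);    // times out: the other thread holds it
            bothTried.countDown();
            await(bothTried);               // keep 'held' until both attempts finish
            return ok;
        } finally {
            held.unlock();
        }
    }

    static boolean tryFor(ReentrantLock l) {
        try {
            if (l.tryLock(200, TimeUnit.MILLISECONDS)) { l.unlock(); return true; }
        } catch (InterruptedException ignored) {}
        return false;
    }

    static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException ignored) {}
    }
}
```

Both acquisition attempts fail, mirroring the dump: each thread keeps its first monitor while waiting on the one the other holds.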
Steps to reproduce this issue
Run about 93 nodes. Shut one down. Get impatient when response is not immediate. Click shutdown button again before it's done.

Node was a slave node, idle at the time and there were 32 threads competing for 8 cores.

#2
Comment posted by
 Daniel Widdis
Sep 14, 18:44
A file was uploaded: full thread dump of deadlock
#4
Comment posted by
 lolo4j
Sep 15, 06:24
Yes, the fix is easy enough and consists of 2 parts:
  • removed the need for synchronization on the listeners in LifeCycleEventHandler by using CopyOnWriteArrayList instead of ArrayList
  • once a shutdown or restart request has been made, mark the node as shutting down and reject any further request
I'm also changing the title of this bug to reflect that the deadlock is in the node and not in the driver.
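A minimal sketch of the two-part fix described above, assuming illustrative names (this is not the actual JPPF source):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownFixSketch {
    // Part 1: CopyOnWriteArrayList supports lock-free iteration, so firing
    // events no longer needs a synchronized block on the listener list,
    // removing that list from the lock cycle.
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    // Part 2: only the first shutdown request proceeds; repeated clicks
    // from an impatient user are rejected instead of deadlocking.
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    void fireNodeEnding() {
        // Safe concurrent iteration over a snapshot: no monitor held.
        for (Runnable l : listeners) l.run();
    }

    boolean requestShutdown() {
        // compareAndSet makes the "check then mark" step atomic.
        if (!shuttingDown.compareAndSet(false, true)) {
            return false; // already shutting down: reject the duplicate request
        }
        fireNodeEnding();
        return true;
    }
}
```

The `compareAndSet` guard is what turns the second shutdown click into a no-op rather than a second thread contending for the node monitor.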
#6
Comment posted by
 lolo4j
Sep 15, 06:30
Fixed in:
#8
Comment posted by
 lolo4j
Sep 15, 06:33
Dan, let me know if you need a patch, I can publish one. For info, the fix spans over 3 jars and requires updating the nodes and server.
#9
Comment posted by
 Daniel Widdis
Sep 15, 07:03, in reply to comment #8
I can wait until the next release! This was no problem other than a node hanging around on the GUI list longer than desired. Not sure I could replicate it even if I tried!

lolo4j wrote:
Dan, let me know if you need a patch, I can publish one. For info, the fix
spans over 3 jars and requires updating the nodes and server.