JPPF Issue Tracker
CLOSED  Bug report JPPF-340  -  Deadlock on task completion
Posted Oct 20, 2014 - updated Oct 22, 2014
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     nickguletskii200@gmail.com
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    High
  • Reproducibility
    Always
  • Severity
    Normal
  • Targeted for
    JPPF 4.2.4
Issue description
A deadlock happens on task completion. It seems to happen only on single-core systems running the client, driver and node at the same time (or maybe it just happens more often in this case).
Steps to reproduce this issue
I don't have an example that reproduces this issue without installing my application, but it should be reproducible by concurrently adding jobs from a CompletableFuture with applyAsync. I reproduced it with 4.1.3, 4.2.2 and 4.2.3. In some cases it takes a long time to lock up; in others it takes only a few tasks, depending on the threading model of my schedulers.
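
As an illustration only, the submission pattern described above might look like the following Java sketch. It does not use the JPPF API: `submitJob` is a hypothetical stand-in for whatever blocking call hands a job to the grid, the pool size is arbitrary, and `thenApplyAsync` is assumed to be the method meant by "applyAsync". It shows the concurrency shape, not a confirmed reproduction of the deadlock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class ConcurrentSubmitSketch {

    /**
     * Submits jobCount jobs from CompletableFuture continuations and collects the results.
     * submitJob is a hypothetical placeholder for a blocking job submission call.
     */
    public static List<String> submitAll(int jobCount, Function<String, String> submitJob) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary size
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                final String name = "job-" + i;
                // Each job is built in one async stage and submitted in the next,
                // so several submissions can be in flight at the same time.
                futures.add(CompletableFuture
                        .supplyAsync(() -> name, pool)
                        .thenApplyAsync(submitJob, pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join()); // wait for every submission to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The lambda stands in for a real submission; here it just tags the job name.
        List<String> results = submitAll(100, name -> name + ":done");
        System.out.println(results.size() + " jobs completed");
    }
}
```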

#3
Comment posted by
 nickguletskii200@gmail.com
Oct 20, 20:22
A file was uploaded: Thread dump. This comment was attached:

JPPF driver 4.2.3 thread dump
#6
Comment posted by
 lolo4j
Oct 20, 22:46
I'm inserting the deadlock information from the attached thread dump, for ease of use:
Thread dump for driver a.b.c.d:11198
 
Deadlock detected
 
- thread id 30 "NodeJobServer-0001" is waiting to lock org.jppf.nio.SelectionKeyWrapper@21c434ae which is held by thread id 25 "ClientJobServer-0001"
- thread id 25 "ClientJobServer-0001" is waiting to lock org.jppf.server.protocol.ServerTaskBundleClient@255fa1f6 which is held by thread id 30 "NodeJobServer-0001"
 
Stack trace information for the threads listed above
 
"NodeJobServer-0001" - 30 - state: BLOCKED - blocked count: 6 - blocked time: 88294 - wait count: 170 - wait time: 24383
  at org.jppf.server.nio.client.CompletionListener.taskCompleted(CompletionListener.java:86)
  - waiting on org.jppf.nio.SelectionKeyWrapper@21c434ae
  at org.jppf.server.protocol.ServerTaskBundleClient.fireTasksCompleted(ServerTaskBundleClient.java:340)
  at org.jppf.server.protocol.ServerTaskBundleClient.resultReceived(ServerTaskBundleClient.java:200)
  - locked org.jppf.server.protocol.ServerTaskBundleClient@255fa1f6
  at org.jppf.server.protocol.ServerJob.resultsReceived(ServerJob.java:130)
  at org.jppf.server.protocol.ServerTaskBundleNode.resultsReceived(ServerTaskBundleNode.java:174)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.processResults(WaitingResultsState.java:134)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.process(WaitingResultsState.java:82)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:63)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:39)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:82)
  - locked org.jppf.nio.SelectionKeyWrapper@72ff90ec
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@227fdb38
 
"ClientJobServer-0001" - 25 - state: BLOCKED - blocked count: 13 - blocked time: 88287 - wait count: 505 - wait time: 27976
  at org.jppf.server.protocol.ServerTaskBundleClient.isDone(ServerTaskBundleClient.java:283)
  - waiting on org.jppf.server.protocol.ServerTaskBundleClient@255fa1f6
  at org.jppf.server.nio.client.WaitingJobState.performTransition(WaitingJobState.java:87)
  at org.jppf.server.nio.client.WaitingJobState.performTransition(WaitingJobState.java:33)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:82)
  - locked org.jppf.nio.SelectionKeyWrapper@21c434ae
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@24812bf2
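
The dump above has the shape of a lock-order inversion: each thread holds the monitor the other one is waiting for. The Java sketch below mimics that shape with two plain `ReentrantLock` objects standing in for the `SelectionKeyWrapper` and `ServerTaskBundleClient` monitors. All names are hypothetical, and this is neither the JPPF code nor the fix attached to this ticket. A timed `tryLock` is used so the sketch terminates instead of hanging, and the second method shows one common remedy, acquiring both locks in the same order on every thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {

    /** Two threads take the locks in opposite orders; returns how many obtained both. */
    public static int invertedOrder() throws InterruptedException {
        ReentrantLock channel = new ReentrantLock(); // plays the SelectionKeyWrapper monitor
        ReentrantLock bundle = new ReentrantLock();  // plays the ServerTaskBundleClient monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger successes = new AtomicInteger();
        // "node" holds bundle and wants channel; "client" holds channel and wants bundle.
        Thread node = new Thread(() -> attempt(bundle, channel, bothHoldFirst, bothTried, successes));
        Thread client = new Thread(() -> attempt(channel, bundle, bothHoldFirst, bothTried, successes));
        node.start();
        client.start();
        node.join();
        client.join();
        return successes.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
            CountDownLatch bothHoldFirst, CountDownLatch bothTried, AtomicInteger successes) {
        first.lock();
        try {
            // Wait until both threads hold their first lock, as in the dump.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // Timed attempt: a plain lock() here would block forever.
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                try {
                    successes.incrementAndGet();
                } finally {
                    second.unlock();
                }
            }
            // Keep the first lock until both attempts are over.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    /** Both threads take channel first, then bundle; returns how many obtained both. */
    public static int consistentOrder() throws InterruptedException {
        ReentrantLock channel = new ReentrantLock();
        ReentrantLock bundle = new ReentrantLock();
        AtomicInteger successes = new AtomicInteger();
        Runnable work = () -> {
            channel.lock();
            try {
                bundle.lock();
                try {
                    successes.incrementAndGet();
                } finally {
                    bundle.unlock();
                }
            } finally {
                channel.unlock();
            }
        };
        Thread node = new Thread(work);
        Thread client = new Thread(work);
        node.start();
        client.start();
        node.join();
        client.join();
        return successes.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inverted order, threads that got both locks: " + invertedOrder());
        System.out.println("consistent order, threads that got both locks: " + consistentOrder());
    }
}
```

With a fixed acquisition order, a thread that holds the second lock must already hold the first, so the circular wait seen in the dump cannot form. Other remedies exist, such as not calling listeners while holding a lock.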
#7
Comment posted by
 lolo4j
Oct 20, 23:58
A file was uploaded: Tentative fix
#8
Comment posted by
 lolo4j
Oct 20, 23:59
I tried to reproduce with a VM setup with one processor, but without success. Since I can't reproduce the deadlock, I made a "blind" fix, which I'm attaching to this ticket.

Can you give it a try and add a comment on the outcome? You just need to replace the server's existing lib/jppf-server.jar with the fixed one. If this resolves the problem, I will publish an official patch.
#9
Comment posted by
 nickguletskii200@gmail.com
Oct 21, 17:08, in reply to comment #8


lolo4j wrote:
I tried to reproduce with a VM setup with one processor, but without
success. Since I can't reproduce the deadlock, I made a "blind" fix, which
I'm attaching to this ticket.

Can you give it a try and add a comment on the outcome? You just need to
replace the server's existing lib/jppf-server.jar with the fixed one. If
this resolves the problem, I will publish an official patch.


Amazing! I ran 20k jobs and the deadlock hasn't happened yet, so I'd say that it's fixed. Thank you very much!
#10
Comment posted by
 lolo4j
Oct 22, 10:41
fixed in