JPPF Issue Tracker
CLOSED  Bug report JPPF-108  -  Deadlock in the server upon client disconnection
Posted Dec 23, 2012 - updated Dec 26, 2012
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
     
    Closed
  • Assigned to
     lolo4j
  • Progress
       
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    Critical
  • Reproducibility
    Rarely
  • Severity
    Critical
  • Targeted for
    JPPF 3.2.1
Issue description
While running a test, I got the following deadlock when killing the client runner before a job was complete:

Deadlock information:
 
"NodeJobServer-8":
  waiting to lock monitor 0x0000000009d38a68 (object 0x00000000fca59928, a org.jppf.server.protocol.ServerTaskBundleClient),
  which is held by "ClientJobServer-7"
 
"ClientJobServer-7":
  waiting to lock monitor 0x000000000b7ab8f8 (object 0x00000000fca58510, a org.jppf.server.protocol.ServerJob),
  which is held by "NodeJobServer-8"
 
 
Java stack information for the threads listed above 
 
"NodeJobServer-8":
  at org.jppf.server.protocol.ServerTaskBundleClient.resultReceived(ServerTaskBundleClient.java:201)
  - waiting to lock <0x00000000fca59928> (a org.jppf.server.protocol.ServerTaskBundleClient)
  at org.jppf.server.protocol.ServerJob.resultsReceived(ServerJob.java:225)
  - locked <0x00000000fca58510> (a org.jppf.server.protocol.ServerJob)
  at org.jppf.server.protocol.ServerTaskBundleNode.resultsReceived(ServerTaskBundleNode.java:180)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:81)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:1)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
  - locked <0x00000000e2e09f70> (a org.jppf.server.nio.SelectionKeyWrapper)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
 
"ClientJobServer-7":
  at org.jppf.server.protocol.ServerJob.taskCompleted(ServerJob.java:275)
  - waiting to lock <0x00000000fca58510> (a org.jppf.server.protocol.ServerJob)
  at org.jppf.server.protocol.ServerJob.cancel(ServerJob.java:363)
  at org.jppf.server.protocol.ServerJob$BundleCompletionListener.taskCompleted(ServerJob.java:464)
  at org.jppf.server.protocol.ServerTaskBundleClient.fireTasksCompleted(ServerTaskBundleClient.java:381)
  at org.jppf.server.protocol.ServerTaskBundleClient.cancel(ServerTaskBundleClient.java:287)
  - locked <0x00000000fca59928> (a org.jppf.server.protocol.ServerTaskBundleClient)
  at org.jppf.server.nio.client.ClientContext.cancelJobOnClose(ClientContext.java:269)
  - locked <0x00000000e2e1fad8> (a org.jppf.server.nio.client.ClientContext)
  at org.jppf.server.nio.client.ClientContext.handleException(ClientContext.java:111)
  at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:94)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
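
Read together, the two traces show a lock-order inversion: "NodeJobServer-8" holds the ServerJob monitor and waits for the ServerTaskBundleClient monitor, while "ClientJobServer-7" holds the ServerTaskBundleClient monitor and waits for the ServerJob monitor. The sketch below reproduces that pattern in isolation. It is not JPPF code: the class name LockOrderDemo, the two plain lock objects, the thread names and the latch are hypothetical stand-ins, and the latch only serves to make the otherwise rare interleaving happen every time. It assumes Java 8 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

  /**
   * Starts two daemon threads that take two monitors in opposite order,
   * then polls the JVM for monitor deadlocks.
   * Returns the IDs of the deadlocked threads, or an empty array if none was seen.
   */
  public static long[] provokeDeadlock() throws InterruptedException {
    final Object jobLock = new Object();     // stand-in for the ServerJob monitor
    final Object bundleLock = new Object();  // stand-in for the ServerTaskBundleClient monitor
    final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    // Same order as "NodeJobServer-8": job first, then bundle.
    Thread nodeSide = new Thread(() -> {
      synchronized (jobLock) {
        arrive(bothHoldFirstLock);
        synchronized (bundleLock) {
          // never reached once the other thread holds bundleLock
        }
      }
    }, "NodeSide");

    // Same order as "ClientJobServer-7": bundle first, then job.
    Thread clientSide = new Thread(() -> {
      synchronized (bundleLock) {
        arrive(bothHoldFirstLock);
        synchronized (jobLock) {
          // never reached once the other thread holds jobLock
        }
      }
    }, "ClientSide");

    // Daemon threads, so the stuck threads do not keep the JVM alive.
    nodeSide.setDaemon(true);
    clientSide.setDaemon(true);
    nodeSide.start();
    clientSide.start();

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    for (int i = 0; i < 100; i++) {
      long[] ids = mx.findMonitorDeadlockedThreads(); // null while no deadlock exists
      if (ids != null) {
        return ids;
      }
      Thread.sleep(50);
    }
    return new long[0];
  }

  /** Signals that the caller holds its first lock, then waits for the other thread to do the same. */
  private static void arrive(CountDownLatch latch) {
    latch.countDown();
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long[] ids = provokeDeadlock();
    System.out.println("deadlocked threads: " + ids.length);
  }
}
```

The same ThreadMXBean call can be polled from a watchdog thread in a test harness, which may help catch a rarely reproduced deadlock like this one without taking a manual thread dump.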
Steps to reproduce this issue
- Using a driver with the following load-balancer config:

jppf.load.balancing.algorithm = rl
jppf.load.balancing.strategy = rl
 
# "rl" profile
strategy.rl.performanceCacheSize = 1000
strategy.rl.performanceVariationThreshold = 0.001
strategy.rl.maxActionRange = 10


- 2 remote nodes with 8 processing threads each

- running the matrix multiplication sample with square matrix size = 1000

==> kill the client while a job is being executed ==> deadlock

#4
Comment posted by
 lolo4j
Dec 23, 09:45
Fixed. Changes committed to SVN:

The issue was updated with the following change(s):
  • This issue has been closed
  • The status has been updated, from New to Closed.
  • This issue's progression has been updated to 100 percent completed.
  • The resolution has been updated, from Not determined to RESOLVED.
  • Information about the user working on this issue has been changed, from lolo4j to Not being worked on.
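
This report does not show the committed change. For reference, one common way to remove a lock cycle like the one in the traces above is to stop invoking listeners while holding a monitor: state is updated under the lock, and callbacks fire only after the lock has been released. The sketch below illustrates that general technique and is not claimed to be the fix applied in JPPF. The names NotifyOutsideLock, Bundle and CompletionListener are hypothetical, and Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class NotifyOutsideLock {

  /** Hypothetical callback, standing in for a bundle completion listener. */
  public interface CompletionListener {
    void taskCompleted(String bundleId);
  }

  /** Hypothetical bundle holder; not the JPPF ServerTaskBundleClient class. */
  public static class Bundle {
    private final String id;
    private final List<CompletionListener> listeners = new ArrayList<>();
    private boolean cancelled;

    public Bundle(String id) {
      this.id = id;
    }

    public synchronized void addListener(CompletionListener listener) {
      listeners.add(listener);
    }

    public synchronized boolean isCancelled() {
      return cancelled;
    }

    /** Marks the bundle as cancelled, then notifies listeners without holding this monitor. */
    public void cancel() {
      List<CompletionListener> snapshot;
      synchronized (this) {
        if (cancelled) {
          return; // already cancelled, listeners were notified by the first call
        }
        cancelled = true;
        snapshot = new ArrayList<>(listeners); // copy taken under the lock
      }
      // This object's monitor is released here, so a listener that needs
      // another object's lock cannot complete a cycle through this one.
      for (CompletionListener listener : snapshot) {
        listener.taskCompleted(id);
      }
    }
  }
}
```

The trade-off is that a listener may observe state that has changed since the snapshot was taken, so callbacks written this way need to tolerate slightly stale information.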