JPPF Issue Tracker
star_faded.png
Please log in to bookmark issues
bug_report_small.png
CLOSED  Bug report JPPF-479  -  Task cancelation/timeout problems
Posted Oct 14, 2016 - updated Jan 15, 2017
icon_info.png This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
     
    Closed
  • Assigned to
     lolo4j
  • Progress
       
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     zorba128
  • Owned by
    Not owned by anyone
  • Category
    Core
  • Resolution
    RESOLVED
  • Priority
    High
  • Reproducability
    Always
  • Severity
    Normal
  • Targetted for
    icon_milestones.png JPPF 5.2.4
Issue description
We are encountering several JPPF stability problems - it seems everything is somehow related to timeouts and task cancellation. Behavior ranges from broken job state (no result, no cancellation flag, no exception), locked job (client blocks forever waiting for job), and even locked cluster node.

Steps to reproduce this issue
In order to reproduce behavior I'm just executing simple job with client timeout set:

val job = new JPPFJob(s"Job $i")
val timeout = new JPPFSchedule(Random.nextInt(1000))
job.getClientSLA.setJobExpirationSchedule(timeout)
job.add(new Callable[String] with Serializable { def call(): String = {println(s"Task $i"); s"Task $i"} })
val result = client.submitJob(job).get(0)
println(s"Job [$i], cancelled: [${job.isCancelled}], result: [${result.getResult}], exception: [${result.getThrowable}]")


I'm randomizing timeout value to have some jobs canceled, and some actually executed; also I want timeout to trigger at various stages of job processing. Now I'm just running this code 1000 times in a loop against single jppf client. Cluster consists of JPPF 5.2.2 driver and single node (all with default configuration).

And now - depending on luck:

1. Job never finishes (awaits results forever). When this happened - job was actually executed on node (node console had "Task 428" printout).

  [org.jppf.server.job.JPPFJobManager.jobQueued(146)]: jobId 'Job 428' queued
  [org.jppf.server.job.JPPFJobManager.jobUpdated(178)]: jobId 'Job 428' updated
  [org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_QUEUED for job JobInformation[jobUuid=Job 428, jobName=Job 428, taskCount=0, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=null
  [org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_UPDATED for job JobInformation[jobUuid=Job 428, jobName=Job 428, taskCount=1, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=null
  [org.jppf.server.job.JPPFJobManager.jobUpdated(178)]: jobId 'Job 428' updated
  [org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_UPDATED for job JobInformation[jobUuid=Job 428, jobName=Job 428, taskCount=0, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=null
  [org.jppf.server.job.JPPFJobManager.jobDispatched(120)]: jobId 'Job 428' : added node RemoteNodeContext[channel=SelectionKeyWrapper[id=4], state=SENDING_BUNDLE, uuid=6D331B8F-D1BB-D0B2-6B4A-62D7F33E0810, connectionUuid=null, peer=false, ssl=false]
  [org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_DISPATCHED for job JobInformation[jobUuid=Job 428, jobName=Job 428, taskCount=1, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=JPPFManagementInfo[127.0.0.1:12001, type=node|MASTER, local=false, secure=false, active=true, uuid=6D331B8F-D1BB-D0B2-6B4A-62D7F33E0810]
  [org.jppf.server.job.management.DriverJobManagement.cancelJob(69)]: Request to cancel job 'Job 428'
  [org.jppf.server.job.JPPFJobManager.jobReturned(132)]: jobId 'Job 428' : removed node RemoteNodeContext[channel=SelectionKeyWrapper[id=4], state=WAITING_RESULTS, uuid=6D331B8F-D1BB-D0B2-6B4A-62D7F33E0810, connectionUuid=null, peer=false, ssl=false]
  [org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_RETURNED for job JobInformation[jobUuid=Job 428, jobName=Job 428, taskCount=1, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=JPPFManagementInfo[127.0.0.1:12001, type=node|MASTER, local=false, secure=false, active=true, uuid=6D331B8F-D1BB-D0B2-6B4A-62D7F33E0810]


Note - there is no JOB_ENDED notification after JOB_RETURNED.

2. Job doesn't have cancelled flag set, but result=null and exception=null Here also job was actually executed on node.

3. Job fails with java.lang.NoClassDefFoundError This usually happens after first job was canceled:
  Submitting job [1] with timeout [schedule[delay=20]]
  Task [1], cancelled: [true], time: [24 ms], result: [null], exception: [null]
  Submitting job [2] with timeout [schedule[delay=819]]
  Task [2], cancelled: [false], time: [17 ms], result: [null], exception: [java.lang.NoClassDefFoundError: Could not initialize class scala.Predef$]


4. Node gets locked - it stops executing tasks, I think it also disappears from management console.

This is actually original problem I'm trying to solve. It also seems to be related to task cancelation and classloading, as in node log I found:
  2016-10-13 16:54:51,044 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(165)]: java.lang.InterruptedException
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Unknown Source)
  at org.jppf.classloader.ResourceFuture.get(ResourceFuture.java:95)
  at org.jppf.classloader.ResourceFuture.get(ResourceFuture.java:32)
  at org.jppf.classloader.AbstractClassLoaderConnection.loadResource(AbstractClassLoaderConnection.java:121)
  ...



#9
Comment posted by
 lolo4j
Oct 14, 20:12
What I'm worried about is the NoClassDefoundError reported by the task. Is there any way you could post the full stack trace? In particular, there should be a "caused by: java.lang.ClassNotFoundException" that would be the root cause of the exception.

Additionally, could you attach the node and driver logs to this defect?

Also, I see that you're running the client and jobs from Scala. Are the Scala libraries deployed on the node side, or are they downloaded from the client side? This could cause class loader problems as we have seen in the past for other languages like Clojure.

Lastly, my apologies for the missing 5.2.2 release and 5.2.3 target milestone, this is now fixed.
#11
Comment posted by
 lolo4j
Oct 14, 20:56
I was able to reproduce myself, so here is the stack trace I asked for smileys/2.png
java.lang.NoClassDefFoundError: Could not initialize class scala.Predef$
  at test.TestApp$$anonfun$1$$anon$1.call(TestApp.scala:16)
  at test.TestApp$$anonfun$1$$anon$1.call(TestApp.scala:16)
  at org.jppf.client.taskwrapper.CallableTaskWrapper.execute(CallableTaskWrapper.java:57)
  at org.jppf.client.taskwrapper.JPPFAnnotatedTask.run(JPPFAnnotatedTask.java:84)
  at org.jppf.execute.NodeTaskWrapper.run(NodeTaskWrapper.java:158)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)]
#12
Comment posted by
 lolo4j
Oct 14, 23:42
I found the cause of the issue. When the first task is executed in the node, the simple statement println(s"Task $i") initiates a large number of class loading requests, I can see in the node log that hundreds of Scala classes are looked up and dosnloaded from the client. On my dev machine, I measured around 1.8 second during which only class loading occurs. During that time, the task execution is blocked, and the job expires because of the timeout that is set in the SLA.

I see 2 possible fixes:
  • increase the timeout, at least for the first job, to take the class loading overhead into account
  • add the Scala libraries to the node's classpath. I tried that by adding scala-library.jar and scala-reflect.jar, and the classloading overhead went down to around 125ms
#13
Comment posted by
 zorba128
icon_reply.pngOct 16, 09:25, in reply to comment #12
Hi

Regarding `issue 3`

lolo4j wrote:
When the first task is executed in the node, the simple statement println(s"Task $i")
initiates a large number of class loading requests.
increase the timeout, at least for the first job, to take the class
loading overhead into account add the Scala libraries to the node's classpath.
I tried that by adding scala-library.jar and scala-reflect.jar, and the classloading
overhead went down to around 125ms


But issue is not that first job has timed out due to lots of classes loaded over network. Problem is that second job, executed just after it - runs in context of somehow broken class loader resulting in class not found error. Rather than that, it should also timeout (if it also did not manage load necessary classes and perform work within specified timeout), or it should execute successfully. Now it seems once this happens, session enters some invalid state - and one has to use another jppf connection to be able to execute some other task.

I thought `job.getSLA.getClassPath.setForceClassLoaderReset(true)` might be used as kind of workaround, but doesn't seem to help much.

Deploying libraries to nodes will decrease probability of problems by reducing overall classloading time, but issue still remains and probably could happen even with single-class, no-dependencies, java-only task being executed on infrastructure with really low network throughput between hosts.

regards, marcin

#14
Comment posted by
 lolo4j
icon_reply.pngOct 17, 09:42, in reply to comment #13
Marcin,

Thank you very much for your input. I totally agree with you. From what I've seen the issue boils down to the initial NoClassDefoundError, which is not properly handled by the JPPF class loader. I believe it is triggered by th fact that the task is cancelled, which is done by interupting the thread that executes the task. At the time this happens, the class loader is waiting for the driver to send the response to a class loading request, while in the scope of a call to ClassLoader.define(...).

I'm investigating how this should be handled.
#15
Comment posted by
 lolo4j
Oct 18, 07:37
Debugging the node in Eclipse with a breakpoint on NoClassDefFoundError, I get the "real" stack trace:
Daemon Thread [Processing-0004] (Suspended (exception java.lang.NoClassDefFoundError))  
  org.jppf.classloader.JPPFClassLoader(java.lang.ClassLoader).loadClass(java.lang.String, boolean) line: 436  
  org.jppf.classloader.JPPFClassLoader(org.jppf.classloader.AbstractJPPFClassLoader).loadClass(java.lang.String, boolean) line: 332 
  org.jppf.classloader.JPPFClassLoader(java.lang.ClassLoader).loadClass(java.lang.String) line: 357 
  java.lang.ClassLoader.defineClass1(java.lang.String, byte[], int, int, java.security.ProtectionDomain, java.lang.String) line: not available [native method]  
  org.jppf.classloader.JPPFClassLoader(java.lang.ClassLoader).defineClass(java.lang.String, byte[], int, int, java.security.ProtectionDomain) line: 760 
  org.jppf.classloader.JPPFClassLoader(java.lang.ClassLoader).defineClass(java.lang.String, byte[], int, int) line: 642 
  org.jppf.classloader.JPPFClassLoader(org.jppf.classloader.AbstractJPPFClassLoader).findClass(java.lang.String, boolean) line: 160 
  org.jppf.classloader.JPPFClassLoader(org.jppf.classloader.AbstractJPPFClassLoader).findClass(java.lang.String) line: 110  
  org.jppf.classloader.JPPFClassLoader(java.lang.ClassLoader).loadClass(java.lang.String, boolean) line: 424  
  org.jppf.classloader.JPPFClassLoader(org.jppf.classloader.AbstractJPPFClassLoader).loadClass(java.lang.String, boolean) line: 332 
  org.jppf.classloader.JPPFClassLoader(java.lang.ClassLoader).loadClass(java.lang.String) line: 357 
  test.TestApp$$anonfun$1$$anon$1.call() line: 21 
  test.TestApp$$anonfun$1$$anon$1.call() line: 17 
  org.jppf.client.taskwrapper.CallableTaskWrapper.execute() line: 57  
  org.jppf.client.taskwrapper.JPPFAnnotatedTask.run() line: 84  
  org.jppf.execute.NodeTaskWrapper.run() line: 158  
  java.util.concurrent.Executors$RunnableAdapter<T>.call() line: 511  
  java.util.concurrent.FutureTask<V>.run() line: 266  
  java.util.concurrent.Executors$RunnableAdapter<T>.call() line: 511  
  java.util.concurrent.ExecutorCompletionService$QueueingFuture(java.util.concurrent.FutureTask<V>).run() line: 266 
  org.jppf.execute.ThreadManagerThreadPool$1(java.util.concurrent.ThreadPoolExecutor).runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) line: 1142  
  java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 617  
  java.lang.Thread.run() line: 745
#16
Comment posted by
 lolo4j
Oct 18, 08:30
Furthermore, looking in the node log (with DEBUG logging enabled on org.jppf.classloader), I get the root cause:
2016-10-18 07:46:00,736 [DEBUG][Processing-0008     ][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(165)]: 
java.lang.InterruptedException
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:502)
  at org.jppf.classloader.ResourceFuture.get(ResourceFuture.java:95)
  at org.jppf.classloader.ResourceFuture.get(ResourceFuture.java:1)
  at org.jppf.classloader.AbstractClassLoaderConnection.loadResource(AbstractClassLoaderConnection.java:121)
  at org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(AbstractJPPFClassLoaderLifeCycle.java:157)
  at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:150)
  at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:110)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at org.jppf.classloader.AbstractJPPFClassLoader.loadClass(AbstractJPPFClassLoader.java:332)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
  at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:160)
  at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:110)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at org.jppf.classloader.AbstractJPPFClassLoader.loadClass(AbstractJPPFClassLoader.java:332)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at scala.sys.BooleanProp$.keyExists(BooleanProp.scala:73)
  at scala.sys.SystemProperties$.bool(SystemProperties.scala:71)
  at scala.sys.SystemProperties$.noTraceSupression$lzycompute(SystemProperties.scala:82)
  at scala.sys.SystemProperties$.noTraceSupression(SystemProperties.scala:82)
  at scala.util.control.NoStackTrace$.<init>(NoStackTrace.scala:31)
  at scala.util.control.NoStackTrace$.<clinit>(NoStackTrace.scala)
  at scala.util.control.NoStackTrace$class.fillInStackTrace(NoStackTrace.scala:22)
  at scala.util.control.BreakControl.fillInStackTrace(Breaks.scala:94)
  at java.lang.Throwable.<init>(Throwable.java:250)
  at scala.util.control.BreakControl.<init>(Breaks.scala:94)
  at scala.util.control.Breaks.<init>(Breaks.scala:29)
  at scala.collection.Traversable$.<init>(Traversable.scala:95)
  at scala.collection.Traversable$.<clinit>(Traversable.scala)
  at scala.package$.<init>(package.scala:40)
  at scala.package$.<clinit>(package.scala)
  at scala.Predef$.<init>(Predef.scala:94)
  at scala.Predef$.<clinit>(Predef.scala)
  at test.TestApp$$anonfun$1$$anon$1.call(TestApp.scala:21)
  at test.TestApp$$anonfun$1$$anon$1.call(TestApp.scala:17)
  at org.jppf.client.taskwrapper.CallableTaskWrapper.execute(CallableTaskWrapper.java:57)
  at org.jppf.client.taskwrapper.JPPFAnnotatedTask.run(JPPFAnnotatedTask.java:84)
  at org.jppf.execute.NodeTaskWrapper.run(NodeTaskWrapper.java:158)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Here's the code of the AbstractJPPFClassLoaderLifeCycle.loadResource() method:
protected JPPFResourceWrapper loadResource(final Map<ResourceIdentifier, Object> map) throws ClassNotFoundException {
  JPPFResourceWrapper resource = null;
  if (!isRemoteClassLoadingDisabled()) {
    try {
      if (debugEnabled) log.debug(build(this, " loading remote definition for resource [", map.get("name"), "]"));
      map.put(ResourceIdentifier.FILE_LOOKUP_ALLOWED, FILE_LOOKUP);
      resource = connection.loadResource(map, dynamic, requestUuid, uuidPath);
      if (debugEnabled) log.debug(build(this, " remote definition for resource [", map.get("name") + "] ", resource.getDefinition()==null ? "not " : "", "found"));
    } catch(IOException e) {
      if (debugEnabled) log.debug(this.toString() + " connection with class server ended, re-initializing, exception is:", e);
      throw new JPPFNodeReconnectionNotification("connection with class server ended, re-initializing, exception is:", e, ConnectionReason.CLASSLOADER_PROCESSING_ERROR);
    } catch(ClassNotFoundException e) {
      throw e;
    } catch(Exception e) {
      if (debugEnabled) log.debug(e.getMessage(), e);
    }
  }
  return resource;
}
We can see the InterruptedException is caught here and the returned ResourceWrapper is null. Because of this null value, the caller method higher in the stack AbstractJPPFClassLoader.findClass() will raise a ClassNotFoundException, This method is called by the native method ClassLoader.defineClass1() which I suspect is throwing the NoClassDefFoundError upon catching the ClassNotFoundException. I'm guessing here, since I didn't look in the native code to check it, however I couldn't find another non-native method in the call stack that throws a NoClassDefoundError (and I did look at the java code for these).

In our code, there's only one place that calls ClassLoader.defineClass(). I'll try and catch the NoClassDefoundError there and wrap it in in a ClassNotFoundException, which should be the normal behavior. I will try and see if it's possible to propagate the information that the thread was interrupted, so it's possible to make a decision as whether to add the class to the "not found cache" (we shouldn't in this scenario, but there are other real reasons for throwing a NoClassDefoundError).
#17
Comment posted by
 zorba128
Nov 02, 11:08
Any updates on subject?
#18
Comment posted by
 kamilk
icon_reply.pngNov 25, 13:20, in reply to comment #17
I think I'm facing similar problems; I'm using plain java, but my code uses some embedded resources which are also loaded with remote classloader. Now I simply disabled possibility to cancel jobs, but it would be nice if it worked.

#19
Comment posted by
 lolo4j
Nov 27, 09:03
Fixed the issue by using the interruptible flag that is present in Task / AbstractTask, and modfying the JPPF code so that it can also be applied to Callable, Runnable and Pojo tasks.

For example, to avoid interuption of the tasks's thread, a callable task should be implemented like this:
public static class MyCallable implements Callable<String>, Serializable, Interruptibility {
  @Override
  public String call() throws Exception {
    // scala-library.jar and scala-reflect.jar must be in the client's classpath
    Class.forName("scala.Predef$");
    return "task result";
  }
 
  @Override
  public boolean isInterruptible() {
    return false;
  }
}
A task that extends AbstractTask would only need to override isInterruptible().

With this, when a task is cancelled, its thread will not be interrupted, so any class loading taking place will complete.
#20
Comment posted by
 lolo4j
Nov 27, 09:11
Fixed in:
#24
Comment posted by
 zorba128
icon_reply.pngNov 30, 10:36, in reply to comment #20
Is there a way to know if cancellation attempt was made from within task, so it can gracefully cancel its execution?

#25
Comment posted by
 lolo4j
icon_reply.pngDec 02, 09:01, in reply to comment #24
For tasks that extend AbstractTask, you simply override the onCancel() callback method:
public class MyTask extends AbstractTask<String> {
  @Override
  public void run() {
    // do the work
  }
 
  @Override
  public void onCancel() {
    // handle cancellation
  }
}
For other types of tasks, like those implementing Callable, you need to set a JPPFTaskCallback onto the JPPFAnnotatedTask wrapping the Callable:
public class MyCallable implements Callable<String> {
  @Override
  public String call() throws Exception {
    // do the work
    return "hello";
  }
}
 
public class CancelCallback extends JPPFTaskCallback<Object> {
  @Override
  public void run() {
    MyCallable task = (MyCallable) getTask().getTaskObject();
    // handle cancellation ...
  }
}
 
// on the client side:
JPPFJob job = new JPPFJob();
JPPFAnnotatedTask myTask = (JPPFAnnotatedTask) job.add(new MyCallable());
myTask.setCancelCallback(new CancelCallback());


zorba128 wrote:
Is there a way to know if cancellation attempt was made from within task,
so it can gracefully cancel its execution?



#26
Comment posted by
 zorba128
icon_reply.pngDec 07, 10:49, in reply to comment #25
From what I've tested - it seems onCancel in AbstractTask gets called only after run() method is finished. So I get cancel notification, but I cannot really use it to abort work.

#27
Comment posted by
 zorba128
Dec 08, 14:48
It seems first scenario I've described still happens. My job is just JPPFAnnotatedTask which wraps Callable, now with isInterruptible=false. Take a look at complete driver and node logs below. 'Job 43' was last one executed and 'Job 44' got stuck (no JOB_ENDED notification).

2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.addBundle(111)]: adding bundle with ServerTaskBundleClient[id=88, pendingTasks=1, cancelled=false, done=false, job=JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]]
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_QUEUED for job JobInformation[jobUuid=Job 44, jobName=Job 44, taskCount=0, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=null
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.job.JPPFJobManager.jobUpdated(178)]: jobId 'Job 44' updated
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_UPDATED for job JobInformation[jobUuid=Job 44, jobName=Job 44, taskCount=1, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=null
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.addBundle(134)]: Maps size information: priorityMap[shallow size=1, total elements=1]
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatch(86)]: 1 channels idle
2016-12-08 13:17:33,481 [DEBUG][org.jppf.nio.StateTransitionManager.transitionChannel(156)]: transition from WAITING_JOB to IDLE with ops=1 (readyOps=1) for channel id=6, submit=false
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.checkJobState(288)]: job 'Job 44', suspended=false, pending=false, expired=false
2016-12-08 13:17:33,481 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.checkJobState(290)]: current nodes = 0, maxNodes = 2147483647
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.findIdleChannelIndex(205)]: jobUuid=Job 44, readyNodes=null
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.findIdleChannelIndex(215)]: uuid path=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1], node uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.findIdleChannelIndex(260)]: found 1 acceptable channels
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.AbstractTaskQueueChecker.removeIdleChannel(168)]: Removing idle channel RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=IDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.prepareJobDispatch(160)]: dispatching jobUuid=Job 44 to node RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=IDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false], nodeUuid=null
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.nextBundle(164)]: requesting bundle with 2 tasks, next bundle has 1 tasks
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.job.JPPFJobManager.jobUpdated(178)]: jobId 'Job 44' updated
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.removeBundle(215)]: removing bundle from queue, jobName= Job 44, removeFromJobMap=false
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.nextBundle(179)]: Maps size information: priorityMap[shallow size=0, total elements=0]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.nextBundle(183)]: found 1 tasks in the job, result=ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=null]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_UPDATED for job JobInformation[jobUuid=Job 44, jobName=Job 44, taskCount=0, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=null
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatch(116)]: prepareJobDispatch() returned ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=null]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatchJobToChannel(180)]: dispatching 1 tasks of job 'Job 44' to node F05120D9-B45E-DFD3-BD31-1F6F733F0774
2016-12-08 13:17:33,482 [DEBUG][org.jppf.nio.StateTransitionManager.transitionChannel(156)]: transition from IDLE to SENDING_BUNDLE with ops=5 (readyOps=1) for channel id=8, submit=false
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.protocol.AbstractServerJobBase.jobDispatched(144)]: added to dispatch set: ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.job.JPPFJobManager.jobDispatched(120)]: jobId 'Job 44' : added node RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.nio.StateTransitionManager.submitTransition(103)]: submitting transition for channel id=8
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.AbstractTaskQueueChecker.removeIdleChannel(168)]: Removing idle channel RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.nio.nodeserver.SendingBundleState.performTransition(73)]: got bundle ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]] from the queue for SelectionKeyWrapper[id=8, readyOps=4, interestOps=0, context=RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]]
2016-12-08 13:17:33,482 [DEBUG][org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_DISPATCHED for job JobInformation[jobUuid=Job 44, jobName=Job 44, taskCount=1, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=JPPFManagementInfo[127.0.0.1:12001, type=node|MASTER, local=false, secure=false, active=true, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.PlainNioObject.write(127)]: read 4 bytes for PlainNioObject[channel id=8, size=4, count=4, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], location=MultipleBuffersLocation[size=4, count=4, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@674cea9d]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.PlainNioObject.write(127)]: read 1971 bytes for PlainNioObject[channel id=8, size=1971, count=1971, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], location=MultipleBuffersLocation[size=1971, count=1971, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@aec045a]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.PlainNioObject.write(127)]: read 4 bytes for PlainNioObject[channel id=8, size=4, count=4, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], location=MultipleBuffersLocation[size=4, count=4, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@41ec3ac9]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.PlainNioObject.write(127)]: read 5 bytes for PlainNioObject[channel id=8, size=5, count=5, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], location=MultipleBuffersLocation[size=5, count=5, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@72cf0fe8]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.PlainNioObject.write(127)]: read 4 bytes for PlainNioObject[channel id=8, size=4, count=4, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], location=MultipleBuffersLocation[size=4, count=4, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@32020449]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.PlainNioObject.write(127)]: read 930 bytes for PlainNioObject[channel id=8, size=930, count=930, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], location=MultipleBuffersLocation[size=930, count=930, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@782065d8]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.server.nio.nodeserver.SendingBundleState.performTransition(90)]: sent entire bundle ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]] to node SelectionKeyWrapper[id=8, readyOps=4, interestOps=0, context=RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=SENDING_BUNDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.nio.StateTransitionManager.transitionChannel(156)]: transition from SENDING_BUNDLE to WAITING_RESULTS with ops=1 (readyOps=4) for channel id=8, submit=false
2016-12-08 13:17:33,485 [DEBUG][org.jppf.nio.StateTransitionManager.submitTransition(103)]: submitting transition for channel id=8
2016-12-08 13:17:33,485 [DEBUG][org.jppf.nio.PlainNioObject.read(98)]: read 4 bytes for PlainNioObject[channel id=8, size=4, count=4, source=ChannelInputSource[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], dest=null, location=MultipleBuffersLocation[size=4, count=4, currentBuffer=org.jppf.utils.JPPFBuffer@196ee243, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@196ee243]]]
2016-12-08 13:17:33,485 [DEBUG][org.jppf.nio.PlainNioObject.read(98)]: read 1073 bytes for PlainNioObject[channel id=8, size=1073, count=1073, source=ChannelInputSource[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], dest=null, location=MultipleBuffersLocation[size=1073, count=1073, currentBuffer=org.jppf.utils.JPPFBuffer@2653cb3f, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@2653cb3f]]]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.nio.PlainNioObject.read(98)]: read 4 bytes for PlainNioObject[channel id=8, size=4, count=4, source=ChannelInputSource[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], dest=null, location=MultipleBuffersLocation[size=4, count=4, currentBuffer=org.jppf.utils.JPPFBuffer@66c94191, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@66c94191]]]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.nio.PlainNioObject.read(98)]: read 939 bytes for PlainNioObject[channel id=8, size=939, count=939, source=ChannelInputSource[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11112 remote=/127.0.0.1:63005]], dest=null, location=MultipleBuffersLocation[size=939, count=939, currentBuffer=org.jppf.utils.JPPFBuffer@127ed644, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@127ed644]]]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.cancelAction(131)]: DispatchExpiration : cancelling action for key=Job 44|17, future=null
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.nio.nodeserver.WaitingResultsState.process(84)]: read bundle JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]] from node RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=WAITING_RESULTS, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.nio.nodeserver.WaitingResultsState.processResults(121)]: received bundle with 1 tasks, taskCount=1 : JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.nio.nodeserver.WaitingResultsState.processResults(126)]: resubmitPositions = null for JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.protocol.AbstractServerJobBase.jobReturned(162)]: removed from dispatch set: ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=WAITING_RESULTS, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.job.management.DriverJobManagement.cancelJob(69)]: Request to cancel job 'Job 44'
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.job.JPPFJobManager.jobReturned(132)]: jobId 'Job 44' : removed node RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=WAITING_RESULTS, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.protocol.ServerJob.cancel(267)]: request to cancel ServerJob[id=45, job uuid=Job 44, name=Job 44, status=EXECUTING, taskCount=0, nbBundles=1], jobExpired=false, pending=false, suspended=false
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.job.management.DriverJobManagement.sendNotification(233)]: sending event JOB_RETURNED for job JobInformation[jobUuid=Job 44, jobName=Job 44, taskCount=1, initialTaskCount=1, priority=0, suspended=false, pending=false, maxNodes=2147483647], node=JPPFManagementInfo[127.0.0.1:12001, type=node|MASTER, local=false, secure=false, active=true, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.protocol.ServerJob.resultsReceived(106)]: received 1 results from ServerTaskBundleNode[id=17, name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, cancelled=false, requeued=false, channel=null]
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.protocol.ServerJob.handleCancelledStatus(229)]: cancelling 0 dispatches for ServerJob[id=45, job uuid=Job 44, name=Job 44, status=CANCELLED, taskCount=0, nbBundles=1], jobExpired=false, pending=false, suspended=false
2016-12-08 13:17:33,486 [DEBUG][org.jppf.server.protocol.ServerJob.handleCancelledTasks(248)]: cancelling tasks for ServerJob[id=45, job uuid=Job 44, name=Job 44, status=CANCELLED, taskCount=0, nbBundles=1], jobExpired=false, pending=false, suspended=false
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.nio.nodeserver.NodeReservationHandler.onJobCancelled(124)]: job Job 44 cancelled, removing all reservations
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.protocol.ServerJob.resultsReceived(136)]: client bundle map has 1 entries: [ServerTaskBundleClient[id=88, pendingTasks=1, cancelled=false, done=false, job=JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]]]
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.protocol.ServerTaskBundleClient.resultReceived(205)]: received 1 tasks for ServerTaskBundleClient[id=88, pendingTasks=1, cancelled=false, done=false, job=JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]]
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.protocol.ServerTaskBundleClient.fireTasksCompleted(360)]: created bundle id=89 for ServerTaskBundleClient[id=88, pendingTasks=1, cancelled=false, done=false, job=JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]]
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.nio.client.CompletionListener.taskCompleted(78)]: empty results list for bundle ServerTaskBundleClient[id=89, pendingTasks=0, cancelled=false, done=false, job=JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=0, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]]
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.queue.JPPFPriorityQueue.requeue(150)]: requeuing job ServerJob[id=45, job uuid=Job 44, name=Job 44, status=CANCELLED, taskCount=1, nbBundles=1], jobExpired=false, pending=false, suspended=false
2016-12-08 13:17:33,487 [DEBUG][org.jppf.nio.StateTransitionManager.transitionChannel(156)]: transition from WAITING_RESULTS to IDLE with ops=1 (readyOps=1) for channel id=8, submit=false
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.nio.nodeserver.AbstractTaskQueueChecker.addIdleChannel(141)]: Adding idle channel RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=IDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.nio.nodeserver.NodeReservationHandler.removeReservation(111)]: removing reservation for node RemoteNodeContext[channel=SelectionKeyWrapper[id=8], state=IDLE, uuid=F05120D9-B45E-DFD3-BD31-1F6F733F0774, connectionUuid=null, peer=false, ssl=false]
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatch(86)]: 1 channels idle
2016-12-08 13:17:33,487 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatch(130)]: no channel found for bundle 
2016-12-08 13:17:34,415 [DEBUG][org.jppf.comm.discovery.JPPFMulticastReceiver.addConnectionInfo(159)]: nb connections: 1
2016-12-08 13:17:34,416 [DEBUG][org.jppf.comm.discovery.JPPFMulticastReceiver.addConnectionInfo(159)]: nb connections: 1
2016-12-08 13:17:34,488 [DEBUG][org.jppf.server.nio.nodeserver.TaskQueueChecker.dispatch(86)]: 1 channels idle


And node logs:
2016-12-08 13:17:33,466 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(72)]: got bundle JPPFTaskBundle[name=Job 42, uuid=Job 42, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,466 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(89)]: bundle task count = 1, handshake = false
2016-12-08 13:17:33,466 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(99)]: got all data
2016-12-08 13:17:33,466 [DEBUG][org.jppf.server.node.JPPFNode.processNextJob(197)]: received a bundle with 1 tasks
2016-12-08 13:17:33,466 [DEBUG][org.jppf.execute.AbstractExecutionManager.execute(151)]: executing 1 tasks
2016-12-08 13:17:33,468 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.cancelAction(131)]: Task Timeout Timer : cancelling action for key=java.util.concurrent.FutureTask@2a23e8c1, future=null
2016-12-08 13:17:33,468 [DEBUG][org.jppf.management.JPPFNodeAdmin.setTaskCounter(196)]: node tasks counter reset to 13 requested
2016-12-08 13:17:33,468 [DEBUG][org.jppf.server.node.JPPFNode.processResults(256)]: processing 1 task results for job 'Job 42'
2016-12-08 13:17:33,468 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.sendResults(121)]: writing results for JPPFTaskBundle[name=Job 42, uuid=Job 42, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,468 [DEBUG][org.jppf.management.OffloadableNotification.setUserData(93)]: used memory=91 206 368, threshold=1 527 146 086
2016-12-08 13:17:33,468 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.sendResults(140)]: wrote full results
2016-12-08 13:17:33,468 [DEBUG][org.jppf.server.node.AbstractNodeIO.postSendResults(179)]: resetting remoteClassLoadingDisabled to false
2016-12-08 13:17:33,468 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(65)]: waiting for next request. Serializer = org.jppf.utils.ObjectSerializerImpl@7f13d6e (class loader = JPPFClassLoader[id=1, type=server, uuidPath=[], offline=false, classpath=])
2016-12-08 13:17:33,474 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(72)]: got bundle JPPFTaskBundle[name=Job 43, uuid=Job 43, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,474 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(89)]: bundle task count = 1, handshake = false
2016-12-08 13:17:33,475 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(99)]: got all data
2016-12-08 13:17:33,475 [DEBUG][org.jppf.server.node.JPPFNode.processNextJob(197)]: received a bundle with 1 tasks
2016-12-08 13:17:33,475 [DEBUG][org.jppf.execute.AbstractExecutionManager.execute(151)]: executing 1 tasks
2016-12-08 13:17:33,476 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.cancelAction(131)]: Task Timeout Timer : cancelling action for key=java.util.concurrent.FutureTask@8f544f5, future=null
2016-12-08 13:17:33,476 [DEBUG][org.jppf.management.JPPFNodeAdmin.setTaskCounter(196)]: node tasks counter reset to 14 requested
2016-12-08 13:17:33,476 [DEBUG][org.jppf.server.node.JPPFNode.processResults(256)]: processing 1 task results for job 'Job 43'
2016-12-08 13:17:33,476 [DEBUG][org.jppf.management.OffloadableNotification.setUserData(93)]: used memory=92 199 984, threshold=1 527 146 086
2016-12-08 13:17:33,476 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.sendResults(121)]: writing results for JPPFTaskBundle[name=Job 43, uuid=Job 43, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,476 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.sendResults(140)]: wrote full results
2016-12-08 13:17:33,476 [DEBUG][org.jppf.server.node.AbstractNodeIO.postSendResults(179)]: resetting remoteClassLoadingDisabled to false
2016-12-08 13:17:33,477 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(65)]: waiting for next request. Serializer = org.jppf.utils.ObjectSerializerImpl@7f13d6e (class loader = JPPFClassLoader[id=1, type=server, uuidPath=[], offline=false, classpath=])
2016-12-08 13:17:33,483 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(72)]: got bundle JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,483 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(89)]: bundle task count = 1, handshake = false
2016-12-08 13:17:33,484 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(99)]: got all data
2016-12-08 13:17:33,484 [DEBUG][org.jppf.server.node.JPPFNode.processNextJob(197)]: received a bundle with 1 tasks
2016-12-08 13:17:33,484 [DEBUG][org.jppf.execute.AbstractExecutionManager.execute(151)]: executing 1 tasks
2016-12-08 13:17:33,485 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.cancelAction(131)]: Task Timeout Timer : cancelling action for key=java.util.concurrent.FutureTask@79cef89a, future=null
2016-12-08 13:17:33,485 [DEBUG][org.jppf.management.JPPFNodeAdmin.setTaskCounter(196)]: node tasks counter reset to 15 requested
2016-12-08 13:17:33,485 [DEBUG][org.jppf.server.node.JPPFNode.processResults(256)]: processing 1 task results for job 'Job 44'
2016-12-08 13:17:33,485 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.sendResults(121)]: writing results for JPPFTaskBundle[name=Job 44, uuid=Job 44, initialTaskCount=1, taskCount=1, bundleUuid=null, uuidPath=TraversalList[position=0, list=[F4890E83-1C23-C32D-79F6-0A22E77F3758, DBB8847E-6E00-CC3A-4BC9-04F14A48FED1]]]
2016-12-08 13:17:33,485 [DEBUG][org.jppf.management.OffloadableNotification.setUserData(93)]: used memory=92 199 984, threshold=1 527 146 086
2016-12-08 13:17:33,485 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.sendResults(140)]: wrote full results
2016-12-08 13:17:33,485 [DEBUG][org.jppf.server.node.AbstractNodeIO.postSendResults(179)]: resetting remoteClassLoadingDisabled to false
2016-12-08 13:17:33,485 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(65)]: waiting for next request. Serializer = org.jppf.utils.ObjectSerializerImpl@7f13d6e (class loader = JPPFClassLoader[id=1, type=server, uuidPath=[], offline=false, classpath=])


The issue was updated with the following change(s):
  • The status has been updated, from Closed to New.
  • This issue has been reopened
  • The resolution has been updated, from RESOLVED to Not determined.
#28
Comment posted by
 lolo4j
Dec 30, 08:15
A file was uploaded. Fix for immediate task cancel callback and bug JPPF-483icon_open_new.png
#29
Comment posted by
 lolo4j
Dec 30, 08:38
I am experimenting with a fix for the task cancel callback not being called immediately. In fact, onCancel() being called after the task completion is the expected behavior, and I registered documentation Bug report JPPF-484 - Invocation of tasks' onCancel() method is not clearly documented for this.

There is a non-public API for a callback invoked immediately upon cancellation of a task, used only in the .Net port. I have adapted JPPFAnnotatedTask to use it, so it becomes available by implementing the interface org.jppf.node.protocol.CancellationHandler, for instance like this:
public static class MyCallable implements Callable<String>, Serializable, Interruptibility, CancellationHandler {
  @Override
  public String call() throws Exception {
    Thread.sleep(5000L); // or any operation that takes time
    return "result: success";
  }
 
  @Override
  public boolean isInterruptible() { return false; }
 
  @Override
  public void doCancelAction() throws Exception {
    System.out.println("MyCallable: in doCancelAction()"); // this is printed immediately upon cancellation of this task
  }
}


I have uploaded a jar file to this bug report that includes this fix. Can you give it a try? The jar also includes a fix for Bug report JPPF-483 - ConcurrentModificationException in AbstractExecutionManager, which may be causing the job hung problem. If this doesn't resolve the job hung issue, could please provide a sample code to reproduce it?
#31
Comment posted by
 zorba128
icon_reply.pngJan 05, 11:54, in reply to comment #29
I have uploaded a jar file to this bug report that includes this fix. Can
you give it a try? The jar also includes a fix for Bug report JPPF-483 -
ConcurrentModificationException in AbstractExecutionManager, which may be
causing the job hung problem. If this doesn't resolve the job hung issue,
could please provide a sample code to reproduce it?


I don't think anything changed after applying patch; test is exactly as described at the very beginning if this bug report. See complete java code to reproduce issue:
package research.jppf;
 
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
 
import org.jppf.JPPFException;
import org.jppf.client.*;
import org.jppf.client.taskwrapper.JPPFAnnotatedTask;
import org.jppf.node.protocol.Interruptibility;
import org.jppf.node.protocol.Task;
import org.jppf.scheduling.JPPFSchedule;
import java.util.Random;
 
public class JPPFStability2 {
 
    public static class MyPayload implements Callable<String>, Serializable, Interruptibility {
        private String name;
 
        MyPayload(String name) {
            this.name = name;
        }
 
        @Override
        public String call() throws Exception {
            Thread.sleep(5);
            System.out.println("[" + name + "] executed");
            return name;
        }
 
        @Override
        public boolean isInterruptible() {
            return false;
        }
    }
 
    public static void main(String args[]) throws JPPFException {
        Random random = new Random();
        JPPFClient client = new JPPFClient();
        if (client.awaitWorkingConnectionPools(10000).isEmpty()) throw new RuntimeException("No JPPF connection");
 
        AtomicInteger canceledCount = new AtomicInteger(0);
        for (int i = 0; i < 1000; i++) {
            Task task = new JPPFAnnotatedTask(new MyPayload("Task_" + i));
            JPPFJob job = new JPPFJob("Job_" + i);
            JPPFSchedule timeout = new JPPFSchedule(random.nextInt(10) + 2);
            job.getClientSLA().setJobExpirationSchedule(timeout);
            job.add(task);
 
            System.out.println("Submitting job [" + job.getName() + "] with timeout [" + job.getClientSLA().getJobExpirationSchedule() + "]");
            long start = System.currentTimeMillis();
            Task<?> result = client.submitJob(job).get(0);
            long duration = System.currentTimeMillis() - start;
 
            System.out.println("[" + job.getName() + "], cancelled: [" + job.isCancelled() + "], duration: [" + duration + " ms], " +
                    "result: [" + result.getResult() + "], exception: [" + result.getThrowable() + "]");
 
            if (job.isCancelled()) canceledCount.incrementAndGet();
 
            if (result.getThrowable() != null)
                throw new RuntimeException("Job [" + job.getName() + "] failed: " + result.getThrowable());
            if (!job.isCancelled() && result.getResult() == null)
                throw new RuntimeException("Job [" + job.getName() + "] - null result");
            if (!job.isCancelled() && !result.getResult().toString().equals("Task_" + i))
                throw new RuntimeException("Job [" + job.getName() + "] - invalid result");
        }
 
        System.out.println("Number of canceled jobs: " + canceledCount.get());
        client.close();
    }
}


You might want to fine-tune delays/randomization range, but on my setup (ms windows machine with local driver and local node launched with default config) I'm able to hang jppf with 100% accuracy.
#32
Comment posted by
 lolo4j
Jan 06, 07:30
Thanks, I can reproduce now. The missing part in my code was the very short job timeout schedule (2 to 11 ms). Now I just need to figure out the problem.
#33
Comment posted by
 lolo4j
Jan 06, 08:25
A file was uploaded. Fix for job hungicon_open_new.png
#34
Comment posted by
 lolo4j
Jan 06, 08:29
I think I've finally nailed it. I could see in the logs that the server was receiving the cancel request while it was still processing the results from the node. There was a synchronization problem at this point. To fix it, I added code to lock the job while its results are processed (the cancel function already locks it). Now I cannot reproduce anymore.

Can you give it a try? I attached a new patched jar for the server (file jppf-server-5.2.4-SNAPSHOT.jar).
#35
Comment posted by
 zorba128
icon_reply.pngJan 09, 09:23, in reply to comment #34
lolo4j wrote:
I think I've finally nailed it.


Glad to hear that; with both patches applied everything seems fine at first sight. We'll do some real-life testing after 5.2.4 is released and available as maven dependency.

I'll let you know if we find out something.

marcin

#36
Comment posted by
 lolo4j
Jan 09, 10:09
Fixed in:
#38
Comment posted by
 lolo4j
Jan 15, 00:11
I'm reopening this issue because it appears the fix introduced a deadlock occurring in the server, when the client is killed while jobs are still being processed by the server. See deadlocked threads stacks in the next comment

The issue was updated with the following change(s):
  • The status has been updated, from Closed to New.
  • This issue has been reopened
  • The resolution has been updated, from RESOLVED to Not determined.
#39
Comment posted by
 lolo4j
Jan 15, 00:15
Call stacks and deadlock info for the deadlocked threads, as shown in the admin console:
Deadlock detected
 
- thread id 93 "JobExecutor_0020" is waiting to lock java.util.concurrent.locks.ReentrantLock$NonfairSync@137933b8 which is held by thread id 30 "JPPF NIO-0004"
- thread id 30 "JPPF NIO-0004" is waiting to lock org.jppf.server.nio.client.ClientContext@40396e3d which is held by thread id 31 "JPPF NIO-0005"
- thread id 31 "JPPF NIO-0005" is waiting to lock java.util.concurrent.locks.ReentrantLock$NonfairSync@137933b8 which is held by thread id 30 "JPPF NIO-0004"
 
Stack trace information for the threads listed above
 
"JobExecutor_0020" - 93 - state: WAITING - blocked count: 1 - blocked time: 1 - wait count: 229 - wait time: 215159
  at sun.misc.Unsafe.park(Native Method)
  - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@137933b8
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.queue.JPPFPriorityQueue.getAllJobIds(JPPFPriorityQueue.java:316)
  at org.jppf.server.job.management.DriverJobManagement.selectJobUuids(DriverJobManagement.java:284)
  at org.jppf.server.job.management.DriverJobManagement.getJobInformation(DriverJobManagement.java:252)
  at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
  at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
  at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
  at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
  at javax.management.remote.generic.ServerIntermediary.handleRequest(ServerIntermediary.java:160)
  at javax.management.remote.generic.ServerIntermediary$PrivilegedRequestJob.run(ServerIntermediary.java:515)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleMBSReqMessage(ServerIntermediary.java:382)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:321)
  at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:336)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@38855445
 
"JPPF NIO-0004" - 30 - state: BLOCKED - blocked count: 211 - blocked time: 181906 - wait count: 9956 - wait time: 216885
  at org.jppf.server.nio.client.ClientContext.getNbTasksToSend(ClientContext.java:324)
  - waiting on org.jppf.server.nio.client.ClientContext@40396e3d
  at org.jppf.server.nio.client.CompletionListener.taskCompleted(CompletionListener.java:72)
  at org.jppf.server.protocol.ServerTaskBundleClient.fireTasksCompleted(ServerTaskBundleClient.java:361)
  at org.jppf.server.protocol.ServerTaskBundleClient.resultReceived(ServerTaskBundleClient.java:218)
  at org.jppf.server.protocol.ServerJob.resultsReceived(ServerJob.java:134)
  at org.jppf.server.protocol.ServerTaskBundleNode.resultsReceived(ServerTaskBundleNode.java:197)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.processResults(WaitingResultsState.java:147)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.process(WaitingResultsState.java:86)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:66)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:42)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:79)
  - locked org.jppf.nio.SelectionKeyWrapper@23758f9a
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@711a2c8f
  - java.util.concurrent.locks.ReentrantLock$NonfairSync@137933b8
 
"JPPF NIO-0005" - 31 - state: WAITING - blocked count: 138 - blocked time: 63 - wait count: 10124 - wait time: 398799
  at sun.misc.Unsafe.park(Native Method)
  - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@137933b8
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.queue.JPPFPriorityQueue.getJob(JPPFPriorityQueue.java:285)
  at org.jppf.server.nio.client.ClientContext.cancelJobOnClose(ClientContext.java:266)
  - locked org.jppf.server.nio.client.ClientContext@40396e3d
  at org.jppf.server.nio.client.ClientContext.handleException(ClientContext.java:121)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:95)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@6e825e02
#40
Comment posted by
 lolo4j
Jan 15, 01:41
Fixed in: