JPPF Issue Tracker
CLOSED  Bug report JPPF-613  -  NPE due to race condition in the server leads to losing node connections
Posted Dec 19, 2019 - updated Dec 19, 2019
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Core
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Reproducibility
    Always
  • Severity
    Normal
  • Targeted for
    JPPF 6.1.4
Issue description
When the driver is under stress, I sometimes see the following exception:
java.lang.NullPointerException
  at org.jppf.server.nio.nodeserver.async.AsyncNodeMessageHandler.bundleSent(AsyncNodeMessageHandler.java:111)
  at org.jppf.server.nio.nodeserver.async.AsyncNodeMessageWriter.postWrite(AsyncNodeMessageWriter.java:67)
  at org.jppf.server.nio.nodeserver.async.AsyncNodeMessageWriter.postWrite(AsyncNodeMessageWriter.java:1)
  at org.jppf.nio.NioMessageWriter.doWrite(NioMessageWriter.java:72)
  at org.jppf.nio.NioMessageWriter.write(NioMessageWriter.java:52)
  at org.jppf.nio.StatelessNioServer.handleWrite(StatelessNioServer.java:263)
  at org.jppf.nio.StatelessNioServer.lambda$2(StatelessNioServer.java:104)
  at org.jppf.nio.StatelessNioServer$KeysetHandler.handle(StatelessNioServer.java:225)
  at org.jppf.nio.StatelessNioServer.lambda$3(StatelessNioServer.java:195)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
This happens because AsyncNodeMessageHandler.bundleSent() is called just after a job dispatch is sent to a node. When the tasks are very short-lived, the node's results may already have been received and processed by the time the method runs, which leads to the NPE shown above.
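Below is a minimal, self-contained Java sketch of the failure mode (not JPPF code: DispatchRaceSketch, currentDispatch, bundleSentUnguarded, etc. are hypothetical stand-ins). A post-write callback dereferences dispatch state that the results-processing path may already have cleared when the tasks complete almost immediately; a defensive null check is shown as one possible mitigation.

  // Minimal illustration of the race described above. NOT JPPF code:
  // all class, field and method names are hypothetical stand-ins.
  import java.util.concurrent.atomic.AtomicReference;

  public class DispatchRaceSketch {

    /** Hypothetical per-dispatch state kept while a job dispatch is in flight. */
    static final class Dispatch {
      volatile long sendTimeNanos;
    }

    /** Shared reference to the dispatch currently in flight for a node channel. */
    static final AtomicReference<Dispatch> currentDispatch = new AtomicReference<>(new Dispatch());

    /**
     * Analogous to bundleSent(): invoked just after the dispatch bytes are written
     * to the node. If the node already returned its results and the results path
     * cleared the shared reference, the unguarded dereference throws an NPE.
     */
    static void bundleSentUnguarded() {
      currentDispatch.get().sendTimeNanos = System.nanoTime(); // NPE when the dispatch was cleared
    }

    /** One possible mitigation: read the reference once and skip the update if it is gone. */
    static void bundleSentGuarded() {
      final Dispatch dispatch = currentDispatch.get();
      if (dispatch != null) dispatch.sendTimeNanos = System.nanoTime();
    }

    /** Analogous to the results-processing path for very short-lived tasks. */
    static void resultsReceived() {
      currentDispatch.set(null);
    }

    public static void main(String[] args) throws InterruptedException {
      // Make the race deterministic for the demo: results are processed before the callback runs.
      final Thread resultsThread = new Thread(DispatchRaceSketch::resultsReceived);
      resultsThread.start();
      resultsThread.join();
      bundleSentGuarded();   // safe: the cleared dispatch is simply skipped
      bundleSentUnguarded(); // throws java.lang.NullPointerException
    }
  }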
Steps to reproduce this issue
  • start a driver with load-balancing configured with "algorithm = manual" and "size = 1" (a sample driver configuration is sketched after these steps)
  • start 10 nodes
  • submit a continuous job stream with a concurrency level of 40 (i.e. up to 40 jobs in the driver at any time)
  • each job has 1000 tasks that last 2 ms each
==> an NPE appears in the driver log, and a node connection is lost until the node restarts
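For reference, the load-balancing settings from the first step would look roughly as follows in the driver's configuration file (assuming the standard JPPF load-balancing properties; the profile name "manual_profile" is arbitrary):

  # dispatch bundles of exactly 1 task per node, as described in the steps above
  jppf.load.balancing.algorithm = manual
  jppf.load.balancing.profile = manual_profile
  jppf.load.balancing.profile.manual_profile.size = 1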