JPPF Issue Tracker
CLOSED  Bug report JPPF-418  -  Memory leak in client queue
Posted Oct 16, 2015 - updated Oct 25, 2015
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Progress
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Client
  • Resolution
    RESOLVED
  • Priority
    Critical
  • Reproducibility
    Always
  • Severity
    Critical
  • Targeted for
    JPPF 4.2.9
Issue description
From this forum thread:

When a job is cancelled after having been only partially sent to the server (due to load-balancer settings, for instance), the sizeMap field of the class AbstractJPPFQueue is not properly cleaned up, which eventually leads to an OutOfMemoryError.
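To illustrate the kind of leak described above, here is a hypothetical, self-contained sketch of the pattern (the class and field names mirror the report, but the internals shown are assumptions, not the actual JPPF code): a queue keeps a secondary index from bundle size to bundles, and only the normal completion path cleans it up, so cancelled bundles remain strongly reachable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the leak pattern: a secondary index (sizeMap) that is
// only cleaned on the normal completion path, never on the cancellation path.
class QueueLeakSketch {
  // maps a bundle size to the bundles of that size, playing the role of
  // AbstractJPPFQueue.sizeMap (the internals here are assumptions)
  final Map<Integer, List<Object>> sizeMap = new HashMap<>();

  void add(Object bundle, int size) {
    sizeMap.computeIfAbsent(size, k -> new ArrayList<>()).add(bundle);
  }

  // normal completion path: removes the bundle from the index
  void complete(Object bundle, int size) {
    List<Object> list = sizeMap.get(size);
    if (list != null) {
      list.remove(bundle);
      if (list.isEmpty()) sizeMap.remove(size);
    }
  }

  // buggy cancellation path: the bundle is dropped from the main queue
  // (not shown) but never removed from sizeMap, so it stays reachable
  void cancelBuggy(Object bundle, int size) {
    // missing: the same cleanup as complete(bundle, size)
  }

  // number of bundles still strongly referenced through the index
  int retained() {
    int n = 0;
    for (List<Object> l : sizeMap.values()) n += l.size();
    return n;
  }
}
```

With this pattern, every cancelled job permanently grows `sizeMap`, which is exactly the kind of growth that eventually exhausts the heap.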
Steps to reproduce this issue
Using the attached reproducing code:
  • configure a client with 256 MB of heap and these load balancer settings, to ensure the client only sends one task at a time:
jppf.load.balancing.algorithm = manual
jppf.load.balancing.profile = manual
jppf.load.balancing.profile.manual.size = 1
  • run the sample: it will attempt to submit 10,000 jobs with two tasks each, each task having a memory footprint of 5 MB.
==> the sample fails with an OOME at the 15th job
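As a rough plausibility check (my arithmetic, not from the report): with the leak, each job retains its two 5 MB task payloads, so the payloads alone would exceed a 256 MB heap after about 26 jobs. The observed failure at job 15 comes earlier, presumably because the heap also holds live jobs, serialized copies, and JVM overhead.

```java
// Back-of-the-envelope simulation of the heap exhaustion: each leaked job
// retains two 5 MB task payloads; count how many jobs fit in a 256 MB heap.
class OomSimulation {
  static final long HEAP = 256L * 1024 * 1024;      // client heap from the repro settings
  static final long TASK = 5L * 1024 * 1024;        // per-task footprint from the repro
  static final int TASKS_PER_JOB = 2;

  // index of the first job at which retained task payloads alone exceed the
  // heap (the real failure comes earlier because of other overhead)
  static int firstFailingJob() {
    long retained = 0L;
    int job = 0;
    while (retained <= HEAP) {
      job++;
      retained += TASKS_PER_JOB * TASK;
    }
    return job;
  }
}
```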

#2
Comment posted by
 lolo4j
Oct 16, 13:03
A file was uploaded: self-contained reproducing code
#11
Comment posted by
 lolo4j
Oct 18, 06:57
Now that I have fixed the memory leak, I uncovered another bug in the client. In 4.2.8, JPPFJob.awaitResults(timeout) (used internally in JPPFClient.submitJob() for blocking jobs) is sometimes not notified of the job completion, causing the application thread that awaits the job results to be stuck in a wait(). This can happen in extreme cases where the job completes between the call to SubmissionManager.submitJob() and the call to JPPFJob.awaitResults().

awaitResults ultimately calls AbstractJPPFJob.await() which has this code:
void await(final long timeout, final boolean raiseTimeoutException) throws TimeoutException {
  // a non-positive timeout means "wait forever"
  long millis = timeout > 0L ? timeout : Long.MAX_VALUE;
  long elapsed = 0L;
  long start = System.currentTimeMillis();
  // wait until all tasks have a result or the timeout expires
  while ((results.size() < tasks.size()) && ((elapsed = System.currentTimeMillis() - start) < millis))
    results.goToSleep(millis - elapsed);
  if ((elapsed >= millis) && raiseTimeoutException) throw new TimeoutException("timeout expired");
}
The big problem here is results.goToSleep(millis - elapsed), which performs a synchronized call to results.wait(...). If the job completes after the loop condition is checked but before the wait starts, the notification is missed and the thread sleeps for the full remaining timeout, i.e. forever when no timeout was set. Instead we should do a results.goToSleep(1L), so that a missed notification costs at most one millisecond.
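The bounded-wait fix can be sketched in a self-contained way (assuming goToSleep(n) is just a synchronized wait(n); the class and field names below are mine, not JPPF's):

```java
import java.util.concurrent.TimeoutException;

// Sketch of the fixed wait loop: sleeping at most 1 ms per iteration bounds
// the cost of a notification missed between the condition check and wait().
class AwaitSketch {
  private final Object results = new Object();    // lock / condition object
  private volatile int resultCount;               // results received so far
  private final int taskCount;                    // results expected

  AwaitSketch(int taskCount) { this.taskCount = taskCount; }

  // called when the job completes: publish the results and wake up waiters
  void setResults(int n) {
    resultCount = n;
    synchronized (results) { results.notifyAll(); }
  }

  void await(long timeout, boolean raiseTimeoutException) throws TimeoutException, InterruptedException {
    long millis = timeout > 0L ? timeout : Long.MAX_VALUE;
    long elapsed = 0L;
    long start = System.currentTimeMillis();
    while ((resultCount < taskCount) && ((elapsed = System.currentTimeMillis() - start) < millis)) {
      // wait at most 1 ms: a missed notification now costs 1 ms
      // instead of the whole remaining timeout
      synchronized (results) { results.wait(1L); }
    }
    if ((elapsed >= millis) && raiseTimeoutException) throw new TimeoutException("timeout expired");
  }
}
```

The trade-off is a 1 ms polling interval while the job is still running, in exchange for never hanging on a lost wake-up.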

As a side note, this fix is already implemented in 5.0, 5.1 and in the trunk.
#12
Comment posted by
 lolo4j
Oct 18, 09:02
Fixed in: