JPPF Issue Tracker
CLOSED  Bug report JPPF-612  -  JMX remote connector: poor handling of large number of notifications
Posted Dec 18, 2019 - updated Dec 18, 2019
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Time spent
    1 hour
  • Category
    JMX connector
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Reproducibility
    Always
  • Severity
    Normal
  • Targeted for
    JPPF 6.1.4
Issue description
In high-load scenarios where massive numbers of job life cycle notifications are generated, I observed the following issues:
  • on the server side, notifications tend to accumulate in the queues associated with the client connections, causing large increases in heap usage, and even OOMEs if the JPPF driver's heap is not large enough. This is not a memory leak, but rather a large spike in heap usage.
  • on the client side (e.g. the admin console configured with immediate notifications mode), notification handling uses a separate thread for each notification, leading to thread proliferation (I observed cases with a peak of over 30,000 live threads), which in turn causes the process to freeze completely
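To illustrate the client-side problem, here is a minimal sketch of one common alternative to spawning a thread per notification: a fixed-size pool with a bounded queue, where the submitting thread runs the task itself when the queue is full. This is not the fix applied in JPPF. The class name, the pool size and the queue capacity are assumptions chosen for illustration, and the "notification" is simulated by a counter increment. Only standard java.util.concurrent classes are used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not part of the JPPF code base.
public class BoundedNotificationDispatch {

  /**
   * Dispatches 'count' simulated notifications through a pool of 'poolSize'
   * threads backed by a queue of at most 'queueCapacity' pending tasks.
   * Returns the number of notifications actually handled.
   */
  static int dispatch(int count, int poolSize, int queueCapacity) {
    AtomicInteger handled = new AtomicInteger();
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
      poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
      new ArrayBlockingQueue<>(queueCapacity),
      // When the queue is full, the caller handles the notification itself,
      // which slows the producer down instead of growing threads or memory.
      new ThreadPoolExecutor.CallerRunsPolicy());
    try {
      for (int i = 0; i < count; i++) {
        // Stand-in for the real notification handling logic.
        executor.execute(handled::incrementAndGet);
      }
      executor.shutdown();
      if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("timed out waiting for handlers");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
    return handled.get();
  }

  public static void main(String[] args) {
    // 30,000 notifications handled by at most 4 worker threads.
    System.out.println(dispatch(30_000, 4, 1_000));
  }
}
```

With this arrangement the number of live handler threads never exceeds the pool size, and the number of pending notifications never exceeds the queue capacity, whatever the notification rate.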
Steps to reproduce this issue
Here is the test configuration I used:
  • start a grid with 1 driver whose load balancing is configured with "algorithm = manual" and "size = 1", so as to maximize the number of job dispatches and therefore the number of job notifications
  • start 100 nodes and connect them to the driver
  • connect a Swing-based admin console, configured with "jppf.gui.publish.mode = immediate_notifications"
  • continuously submit 1000 jobs of 1000 tasks each, with a concurrency level of 40 (up to 40 jobs are active in the driver at any time). Each task is very short-lived (e.g. 2 milliseconds)
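The submission pattern in the last step can be sketched as follows. With a dispatch size of 1, 1000 jobs of 1000 tasks each amount to roughly one million single-task dispatches, which is what drives the notification volume. The sketch below only shows how a concurrency level of 40 can be enforced with a semaphore. It does not use the JPPF client API: the class name is hypothetical and the job submission is replaced by a placeholder comment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical load driver, not part of the JPPF code base.
public class ThrottledSubmitter {

  /** Returns {number of completed jobs, peak number of jobs in flight}. */
  static int[] run(int totalJobs, int maxConcurrent) {
    Semaphore permits = new Semaphore(maxConcurrent);
    AtomicInteger active = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    AtomicInteger completed = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
    try {
      for (int i = 0; i < totalJobs; i++) {
        // Blocks while 'maxConcurrent' jobs are already in flight.
        permits.acquire();
        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
        pool.execute(() -> {
          try {
            // Placeholder: submit one job of 1000 short tasks to the grid
            // and wait for its results (assumed, not shown here).
          } finally {
            active.decrementAndGet();
            completed.incrementAndGet();
            permits.release();
          }
        });
      }
      pool.shutdown();
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("timed out waiting for jobs");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
    return new int[] { completed.get(), peak.get() };
  }

  public static void main(String[] args) {
    int[] result = run(1_000, 40);
    System.out.println("completed=" + result[0] + ", peak in flight=" + result[1]);
  }
}
```

Because a permit is acquired before a job is counted as active and released only after it is counted as finished, the peak number of jobs in flight can never exceed the configured concurrency level.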