JPPF Issue Tracker
star_faded.png
Please log in to bookmark issues
bug_report_small.png
CLOSED  Bug report JPPF-110  -  Proportional algorithm results in uneven load with small number of long-lived tasks
Posted Dec 27, 2012 - updated Jan 27, 2013
icon_info.png This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
     
    Closed
  • Assigned to
     lolo4j
  • Progress
       
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Reproducability
    Always
  • Severity
    Normal
  • Targetted for
    icon_milestones.png JPPF 3.3
Issue description
The full description can be found in this forum thread.
Steps to reproduce this issue
- Using 1 driver with this configuration:
jppf.load.balancing.algorithm = proportional
jppf.load.balancing.strategy = proportional
# "proportional" profile
strategy.proportional.performanceCacheSize = 300
strategy.proportional.proportionalityFactor = 1
strategy.proportional.initialSize = 1
strategy.proportional.initialMeanTime = 1e10
- 3 nodes with 2 processing threads each - submit jobs with 10 to 15 tasks each, with long-lived tasks (I tried with 2s up to 70s with the same results)


#4
Comment posted by
 lolo4j
Jan 26, 21:40
After a long time spent researching this and pulling my hair, I finally figured that some of the assumptions we make in the load-balancing framework are incorrect. To try and explain (I'm doing this here, so I can put it in the documentation of the next release), let's say we have a node with 2 processing threads which receives a btach of 5 tasks. With 2 threads, the tasks would be approximately executed in the following sequence:
(task1) (task2)
(task3) (task4)
(task5)
On the server side, the feedback(int nbTasks, double time) method will be invoked on the load-balancer, where nbTasks is the size of the tasks batch and time represents the round-trip time between the server and the node for the whole batch.

This is where the problem is. In effect, with this feedback information, all we can do is compute an approximative mean time for each task which is incorrect when the number of tasks is not a strict multiple of the number of threads in the node. In our example above, we have basically captured the elapsed time for 3 tasks executed in sequence (3 = 5 tasks / 2 threads, rounded to the next integer if the remainder is > 0). The false assumption we make here is that this represents the node performance capability, when in fact it represents the elapsed time for the tasks. A more accurate assumption would be to say that this represents the node performance for 2.5 tasks instead, or more formally nbTasks / nbThreads + (nbTasks % nbThreads) / nbThreads.

So, we can already see that the load-balancer is missing one piece of information: the number of threads in the node. This can be fixed easily enough, by making the load-balancer node-aware. Another problem here is that the time parameter includes the whole server-to-node-and-back round-trip, and there's no way to know which part of it represents the grid overhead (i.e. serialization/deserialization + network transport) and which part is the actual task execution time.

Thus, to provide as much accuracy as possible, I added code to the node and server such that the overhead time can be communicated separately to the load-balancer, and also such that a new piece of data is communicated: the accumulated elapsed execution time of the tasks in the batch received by the node. I added the following interface, to be implemented by any load-balancer that wishes to receive this information:
public interface BundlerEx extends Bundler
{
  void feedback(final int nbTasks, final double totalTime, final double accumulatedElapsed, final double overheadTime);
}
Then, I made all adaptive load-balancers (i.e. "proportional", "autotuned" and "rl" algorithms) implement the NodeAware and BundlerEx interface, with the following implemntation for the new feedback() method:
@Override
public void feedback(final int size, final double totalTime, final double accumulatedElapsed, final double overheadTime) {
  int n1 = size / nbThreads;
  int n2 = size % nbThreads;
  double mean = accumulatedElapsed / size;
  double t = 0d;
  if (n1 == 0) t = mean;
  else {
    t = n1 * mean;
    if (n2 > 0) t += mean * ((double) n2 / nbThreads);
  }
  t += overheadTime;
  feedback(size, t);
}
Note how we simply recompute the totalTime before feeding it to the "old" feedback method, so we don't have to rewrite the code of the algorithm.

With this technique, I observed a much more balanced load on the nodes, from the execution time perspective. In general, the number of tasks sent to he nodes is not exactly evenly balanced. For instance with 3 nodes and "proportional algorithm" you will have a 4/5/6 tasks distribution for job with 15 tasks. However this provides a near optimal throughput for the overall job execution, and I also observed that the number of tasks executed by each node over time is much closer to an optimal distribution than the former implementation of the algorithm.

So the plan is as follows:
  • for JPPF 3.2.x, I will include this in the next maintenance release, but without documenting it. This will impact the built-in "proportional", "autotuned" and "rl" algorithms.
  • for JPPF 3.3, I will update the documentation accordingly, and in particular add a section on "BundlerEx" in the chapter about custom load-balancers.


#5
Comment posted by
 lolo4j
Jan 27, 08:03
Fixed. Changes committed to SVN:

The issue was updated with the following change(s):
  • This issue has been closed
  • The status has been updated, from Confirmed to Closed.
  • This issue's progression has been updated to 100 percent completed.
  • The resolution has been updated, from Not determined to RESOLVED.
  • Information about the user working on this issue has been changed, from lolo4j to Not being worked on.