JPPF Issue Tracker
CLOSED  Feature request JPPF-34  -  Inefficient load balancing with peer servers
Posted Aug 08, 2012 - updated Mar 22, 2013
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Feature request
  • Status
    Closed
  • Assigned to
     lolo4j
  • Progress
       
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Targeted for
    JPPF 3.3
Issue description
This is to keep track of various issues with load-balancing when p2p communication between servers is involved. Some of the problems I have observed:

1) With the proportional algorithm, there is an issue with the bootstrap parameters and collision detection. A peer driver (considered as a node) may not be accepted for executing a job because doing so would cause a collision in the job routing, i.e. the peer has already received (a subset of) the job and sent a subset of it to this driver, or to another driver which sent it to this driver, etc. In this case the bundler associated with this server as node is never updated, and neither is its mean task execution time. If the algorithm is bootstrapped improperly, that mean time can remain lower than that of any of the "real" nodes, causing these nodes to always receive a minimum number of tasks. A proper value must therefore be given for "strategy.profile_name.initialMeanTime". This value is expressed in nanoseconds, and the JPPF default is 1000, which is way too low. It should be on the order of 1e9 (one second).

2) For the thread node algorithm, the problem is that a peer driver (seen as a node) does not provide a valid number of processing threads. We propose to set its number of threads to the sum of the processing threads of all "real" nodes attached to it.
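
To illustrate point 1), here is a minimal sketch of a proportional split, assuming each channel is offered a share of the bundle proportional to the inverse of its mean task execution time, with a floor of one task. This is a simplified model and not JPPF's actual bundler code: the class name, the bundleSizes() method and the sample figures are all made up for the illustration.

```java
import java.util.Arrays;

class ProportionalBootstrapSketch {
  /**
   * Hypothetical helper: splits maxSize tasks across channels in proportion
   * to 1 / meanTime, giving every channel at least one task.
   * A simplified model, not the JPPF implementation.
   */
  static int[] bundleSizes(double[] meanTimesNanos, int maxSize) {
    double sum = 0d;
    for (double mean : meanTimesNanos) sum += 1d / mean;
    int[] sizes = new int[meanTimesNanos.length];
    for (int i = 0; i < sizes.length; i++) {
      double share = (1d / meanTimesNanos[i]) / sum;
      sizes[i] = Math.max(1, (int) Math.round(share * maxSize));
    }
    return sizes;
  }

  public static void main(String[] args) {
    // Channel 0 is the peer driver, whose mean time is never updated
    // and therefore stays at its bootstrap value.
    double[] lowBootstrap = { 1_000d, 1e9, 1e9 }; // peer stuck at 1000 ns
    double[] sameScale = { 1e9, 1e9, 1e9 };       // peer bootstrapped at 1e9 ns
    System.out.println(Arrays.toString(bundleSizes(lowBootstrap, 300)));
    System.out.println(Arrays.toString(bundleSizes(sameScale, 300)));
  }
}
```

With the peer stuck at 1000 ns it is offered almost the entire bundle while each "real" node is left with the one-task minimum. With an initial mean time on the same scale as the real nodes, the split is even.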

The other algorithms still need to be checked (except the "manual" one, of course).
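
The proposal in point 2) could be sketched as follows. The NodeInfo type and the effectivePeerThreads() method are hypothetical and do not exist in JPPF. Skipping peer connections when summing and falling back to 1 when no real node is attached are assumptions of this sketch, not part of the proposal.

```java
import java.util.Arrays;
import java.util.List;

class PeerThreadCountSketch {
  /** Hypothetical view of a channel attached to a driver; not a JPPF type. */
  static final class NodeInfo {
    final boolean peer;          // true if the channel is another driver
    final int processingThreads; // threads reported by a real node
    NodeInfo(boolean peer, int processingThreads) {
      this.peer = peer;
      this.processingThreads = processingThreads;
    }
  }

  /**
   * Number of threads a peer driver would advertise: the sum of the
   * processing threads of the real nodes attached to it.
   */
  static int effectivePeerThreads(List<NodeInfo> attached) {
    int total = 0;
    for (NodeInfo node : attached) {
      if (!node.peer) total += node.processingThreads; // assumption: peers are skipped
    }
    return Math.max(1, total); // assumption: never advertise fewer than 1 thread
  }

  public static void main(String[] args) {
    List<NodeInfo> attached = Arrays.asList(
      new NodeInfo(false, 4), new NodeInfo(false, 8), new NodeInfo(true, 0));
    System.out.println(effectivePeerThreads(attached));
  }
}
```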


#7
Comment posted by
 lolo4j
Mar 14, 09:24
I'm reopening this issue as not all problems have been resolved. See this forum thread for additional information.
#10
Comment posted by
 lolo4j
Mar 16, 13:52
There is one problem in org.jppf.server.queue.JPPFPriorityQueue, in method getSize(ServerJob), which is used to compute the maximum task bundle size in the server queue. Currently we have:
@Override
protected int getSize(final ServerJob bundleWrapper) {
  return bundleWrapper.getInitialTaskCount();
}
getInitialTaskCount() is the number of tasks in the job submitted by the client. However, in a peer driver that receives a job from another driver, we need to use getTaskCount(), which is the number of tasks sent by the other driver (not the client). This is causing one or more of the nodes attached to the peer driver to never receive any tasks, because their associated load-balancer would use a task bundle size greater than the number of tasks in the job.

So the fix is as follows:
@Override
protected int getSize(final ServerJob bundleWrapper) {
  return bundleWrapper.getTaskCount();
}
With this, I can see that all nodes always receive something, provided there are enough tasks in the job.
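
The effect can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the driver's dispatch code: JobStub is a hypothetical stand-in for ServerJob, dispatch() is made up, and the nodes are assumed to be equally fast, each offered an equal share of the maximum bundle size and served in order.

```java
import java.util.Arrays;

class PeerBundleSizeSketch {
  /** Hypothetical stand-in for ServerJob, holding only the two counts discussed above. */
  static final class JobStub {
    final int initialTaskCount; // tasks in the job as submitted by the client
    final int taskCount;        // tasks actually received by this driver
    JobStub(int initialTaskCount, int taskCount) {
      this.initialTaskCount = initialTaskCount;
      this.taskCount = taskCount;
    }
  }

  /**
   * Hypothetical dispatch model: each of nodeCount equally fast nodes is
   * offered maxBundleSize / nodeCount tasks, in order, until none remain.
   */
  static int[] dispatch(int maxBundleSize, int tasksInJob, int nodeCount) {
    int[] received = new int[nodeCount];
    int perNode = Math.max(1, maxBundleSize / nodeCount);
    int remaining = tasksInJob;
    for (int i = 0; i < nodeCount && remaining > 0; i++) {
      received[i] = Math.min(perNode, remaining);
      remaining -= received[i];
    }
    return received;
  }

  public static void main(String[] args) {
    // The client submitted 100 tasks; this peer driver received 10 of them.
    JobStub job = new JobStub(100, 10);
    // Before the fix: the maximum bundle size comes from the client-side count.
    System.out.println(Arrays.toString(dispatch(job.initialTaskCount, job.taskCount, 2)));
    // After the fix: the maximum bundle size comes from the tasks actually received.
    System.out.println(Arrays.toString(dispatch(job.taskCount, job.taskCount, 2)));
  }
}
```

In this model the first call gives all 10 tasks to the first node and none to the second, while the second call gives 5 tasks to each.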
#11
Comment posted by
 lolo4j
Mar 22, 21:12
Fixed. Changes committed to SVN:

The issue was updated with the following change(s):
  • This issue has been closed
  • The status has been updated, from Confirmed to Closed.
  • This issue's progression has been updated to 100 percent completed.
  • The resolution has been updated, from Not determined to RESOLVED.