JPPF: The open source grid computing solution


Author Topic: uneven load balancing even after several practice runs  (Read 14531 times)

broiyan

  • JPPF Grand Master
  • ****
  • Posts: 54
uneven load balancing even after several practice runs
« on: December 19, 2012, 03:52:49 PM »

Loads are surprisingly unbalanced.  I run with the following "proportional" strategy settings, which I believe were copied from the example code:

strategy.proportional.size = 5
strategy.proportional.performanceCacheSize = 300
strategy.proportional.proportionalityFactor = 1


I run 1 job containing 10 identical tasks on 3 nodes.  I expect JPPF to need some experience to know the capabilities of the nodes so I give it several runs.  After about 3 "practice runs", the 4th run did not have a distribution of 4/3/3, the most evenly distributed possibility. 

My tasks are identical in every way because they are just for testing: count to a large integer, and it happens to take around 70 seconds.

When I run 10 identical tasks on 2 nodes, after about 5 practice runs, I did not get a distribution of 5/5.  In fact even after about 10 runs, never has 5/5 occurred.

These are off-premises cloud servers with one of the bigger service providers in the industry.  Can these uneven distributions be attributed to the nature of the cloud provider's loads, or is this the best balancing I can expect from JPPF's load balancing?
« Last Edit: December 19, 2012, 03:54:51 PM by broiyan »

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2262
    • JPPF Web site
Re: uneven load balancing even after several practice runs
« Reply #1 on: December 20, 2012, 06:32:23 AM »

Hello,

I think there are several issues with your load-balancing configuration:
- the parameter "size" does not exist for the "proportional" algorithm, I believe you meant to use "initialSize" instead
- the default value for "initialSize" is 10, that's what would be used in your case
- this means that the server will send 10 tasks to each node the first time. Since you have 10 tasks in your job, all of them would be sent to a single node on the first run, and this would set the load-balancer off.
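To make these three points concrete, here is a minimal sketch. It is not JPPF code: the class `InitialSizeDemo` and its methods are hypothetical, and the dispatch rule (the first idle node receives at most "initialSize" tasks) is a simplification of what is described above. It only shows that an unrecognized "size" key leaves "initialSize" at its default of 10, so a 10-task job goes to one node in a single bundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Simplified model, NOT JPPF source code. All names here are hypothetical.
final class InitialSizeDemo {
    // Default value of "initialSize" as stated in the reply above.
    static final int DEFAULT_INITIAL_SIZE = 10;

    // Looks up "strategy.proportional.initialSize"; any other key
    // (such as the misspelled "size") is simply never read.
    static int initialSize(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("strategy.proportional.initialSize");
        return value == null ? DEFAULT_INITIAL_SIZE : Integer.parseInt(value.trim());
    }

    // Assumed dispatch rule: the first idle node gets at most initialSize tasks.
    static int firstBundle(int tasksInJob, int initialSize) {
        return Math.min(tasksInJob, initialSize);
    }

    public static void main(String[] args) {
        String original = "strategy.proportional.size = 5\n"
            + "strategy.proportional.performanceCacheSize = 300\n"
            + "strategy.proportional.proportionalityFactor = 1\n";
        String adjusted = original.replace(
            "strategy.proportional.size = 5",
            "strategy.proportional.initialSize = 1");

        // "size" is ignored, the default of 10 applies: prints 10
        System.out.println(firstBundle(10, initialSize(original)));
        // explicit initialSize = 1: prints 1
        System.out.println(firstBundle(10, initialSize(adjusted)));
    }
}
```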

The algorithm is based on the mean round trip time of the tasks sent to the nodes. This mean time thus includes network transport, task execution time, task wait time (if more tasks are sent than there are processing threads in the node) and JPPF overhead. We have some old documentation on how this algorithm works, if you wish to see the details.
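As a rough illustration of what goes into that mean time, the sketch below models a bundle of identical tasks on a node with a fixed number of threads. It is an assumption-laden model, not the actual implementation: the class `RoundTripDemo` is hypothetical, transport and overhead are lumped into one constant, and the mean is assumed to be the bundle round trip divided by the bundle size.

```java
// Simplified model under stated assumptions; not JPPF's implementation.
final class RoundTripDemo {
    // Round trip in seconds for a bundle of identical tasks. Tasks run in
    // waves of at most `threads`, so any extra tasks wait for a free thread.
    static double roundTripSeconds(int bundleSize, int threads,
                                   double taskSeconds, double transportAndOverheadSeconds) {
        int waves = (bundleSize + threads - 1) / threads; // ceiling division
        return transportAndOverheadSeconds + waves * taskSeconds;
    }

    // Assumed definition of the mean: bundle round trip divided by bundle size.
    static double meanPerTask(int bundleSize, int threads,
                              double taskSeconds, double transportAndOverheadSeconds) {
        return roundTripSeconds(bundleSize, threads, taskSeconds, transportAndOverheadSeconds)
            / bundleSize;
    }

    public static void main(String[] args) {
        // 70-second tasks, 2 threads per node, 0.5 s of transport plus overhead.
        System.out.println(meanPerTask(10, 2, 70.0, 0.5)); // 10 tasks: 5 waves
        System.out.println(meanPerTask(2, 2, 70.0, 0.5));  // 2 tasks: 1 wave
        System.out.println(meanPerTask(1, 2, 70.0, 0.5));  // 1 task: 1 wave, one thread idle
    }
}
```

In this model the half second of transport and overhead barely registers next to a 70-second task, while the mean per task changes noticeably with the bundle size relative to the number of threads.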

The default settings for this algorithm were intended for jobs with many short-lived tasks, which is very different from what you have. So you will need to adjust them. In particular, I would set "strategy.proportional.initialSize" to a value in the range [1, nbThreads] where "nbThreads" is the number of processing threads in the nodes (by default this is the number of cores/CPUs in each node).
For the tasks that you described, i.e. relatively long-lived with a small footprint, the network transport time should be negligible compared to the tasks' execution time, so I believe you should get much better results from the start with "initialSize = 1".

I hope this clarifies,
-Laurent

« Last Edit: December 27, 2012, 07:59:22 AM by lolo »

broiyan

  • JPPF Grand Master
  • ****
  • Posts: 54
Re: uneven load balancing even after several practice runs
« Reply #2 on: December 22, 2012, 11:06:40 AM »

Thanks but the algorithm link does not work (because it does not look like an external URI).  I remember seeing it linked from elsewhere before but, regardless, I'm not sure if I have access to all the variable definitions to interpret the algorithm.  Some variables did not look familiar to me.

Anyway, setting the .initialSize to 1 does not seem to make any difference.  I've run it 7 times now and I continue to see mostly uneven distributions and no trend towards improvement. 



« Last Edit: December 22, 2012, 11:10:26 PM by broiyan »

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2262
    • JPPF Web site
Re: uneven load balancing even after several practice runs
« Reply #3 on: December 27, 2012, 08:52:28 AM »

Hello,

Sorry for this late response.
I apologize for the incorrect link. It is in fact http://www.jppf.org/doc/v2/index.php?title=JPPF_Performance#Deterministic_algorithm:_.22proportional.22_configuration (I also fixed it in my previous post).
I did some extensive testing and I found that it is indeed difficult to get an even load on all the nodes with the proportional algorithm, even when the nodes are identical.
Since the documentation linked above was written, 2 bootstrapping parameters were added to the algorithm:
- "initialSize" which you already know
- "initialMeanTime", which is the initial mean task execution time to use the first time an algorithm instance (the association of the algorithm with a node) computes the number of tasks to send. It is expressed in nanoseconds and its default value is 10^9 (i.e. 1 second)
I tested with generic jobs whose tasks perform CPU-intensive computations for a configurable time, in a use case similar to yours (3 nodes with 2 virtual cores each, jobs with 10 to 15 long-lived tasks). When submitting jobs with 10 tasks, I found that if the load doesn't change over time, meaning I always submit jobs with the same number of tasks and the same duration for each task, I get an even distribution of 4/3/3 after two runs, and it then stays that way. However, when submitting 15 tasks, I observed that the distribution oscillated between 7/4/4 and 6/5/4, instead of the expected 5/5/5.
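For readers who want a feel for how a proportional split reacts to the measured mean times, here is a minimal sketch. It is not the JPPF implementation: it assumes each node's share is proportional to the inverse of its mean time raised to "proportionalityFactor", and it rounds with a largest-remainder rule chosen only for this illustration. The class `ProportionalSplitDemo` and the mean times used are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Simplified model, NOT JPPF source code. Rounding rule is an assumption.
final class ProportionalSplitDemo {
    // Splits `tasks` among nodes with weight 1 / mean^factor per node.
    static int[] split(int tasks, long[] meanNanos, double proportionalityFactor) {
        int n = meanNanos.length;
        double[] weight = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            weight[i] = 1.0 / Math.pow(meanNanos[i], proportionalityFactor);
            total += weight[i];
        }
        int[] share = new int[n];
        double[] remainder = new double[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            double exact = tasks * weight[i] / total;
            share[i] = (int) Math.floor(exact);
            remainder[i] = exact - share[i];
            assigned += share[i];
        }
        // Hand the leftover tasks to the largest fractional remainders.
        for (int left = tasks - assigned; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < n; i++) {
                if (remainder[i] > remainder[best]) best = i;
            }
            share[best]++;
            remainder[best] = -1.0; // this node already got its extra task
        }
        return share;
    }

    public static void main(String[] args) {
        // Nodes with no history all start from 10^9 ns, i.e. 1 second.
        long initial = TimeUnit.SECONDS.toNanos(1);
        long[] equal = {initial, initial, initial};
        System.out.println(Arrays.toString(split(10, equal, 1.0))); // [4, 3, 3]
        System.out.println(Arrays.toString(split(15, equal, 1.0))); // [5, 5, 5]

        // Hypothetical measured means of 60 s, 70 s and 80 s.
        long[] skewed = {TimeUnit.SECONDS.toNanos(60),
                         TimeUnit.SECONDS.toNanos(70),
                         TimeUnit.SECONDS.toNanos(80)};
        System.out.println(Arrays.toString(split(15, skewed, 1.0))); // [6, 5, 4]
    }
}
```

In this model, equal mean times give the even split, and moderately different measured means are enough to move a 15-task job away from 5/5/5. Whether that is what happens inside the server is not something this sketch can confirm.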

I registered this as a bug: JPPF-110 Proportional algorithm results in uneven load with small number of long-lived tasks

In conclusion, it seems the proportional algorithm is not well adapted to your use case. Indeed, it was initially designed for medium to large numbers of small tasks, as is the case with most embarrassingly parallel problems. I believe that using the "nodethreads" algorithm will produce better results for you, as it will distribute as many tasks to each node as this node has available cores.
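The idea of sending each node as many tasks as it has cores can be sketched as follows. This is a toy simulation, not the "nodethreads" implementation: the class `NodeThreadsDemo` is hypothetical, and it assumes identical tasks, nodes that all become idle at the same moment, and bundles handed out in a fixed node order.

```java
import java.util.Arrays;

// Toy simulation under stated assumptions; NOT JPPF source code.
final class NodeThreadsDemo {
    // Each time a node is served it receives at most threads[i] tasks.
    static int[] distribute(int tasks, int[] threads) {
        for (int t : threads) {
            if (t <= 0) throw new IllegalArgumentException("threads must be positive");
        }
        int[] perNode = new int[threads.length];
        int remaining = tasks;
        while (remaining > 0) {
            for (int i = 0; i < threads.length && remaining > 0; i++) {
                int bundle = Math.min(threads[i], remaining);
                perNode[i] += bundle;
                remaining -= bundle;
            }
        }
        return perNode;
    }

    // Number of sequential "waves" of task execution on the busiest node.
    static int waves(int[] perNode, int[] threads) {
        int max = 0;
        for (int i = 0; i < perNode.length; i++) {
            max = Math.max(max, (perNode[i] + threads[i] - 1) / threads[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] threads = {2, 2, 2}; // 3 nodes with 2 cores each
        int[] perNode = distribute(10, threads);
        System.out.println(Arrays.toString(perNode));            // [4, 4, 2]
        System.out.println(waves(perNode, threads));             // 2
        System.out.println(waves(new int[]{4, 3, 3}, threads));  // 2
    }
}
```

Under these assumptions the task counts per node are not perfectly even, yet the busiest node runs the same number of waves as with a 4/3/3 split, so the total elapsed time for identical tasks would be the same.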

Sincerely,
-Laurent

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2262
    • JPPF Web site
Re: uneven load balancing even after several practice runs
« Reply #4 on: January 27, 2013, 08:37:29 AM »

Hello,

I wanted to let you know that we released JPPF 3.2.2, which includes a fix for JPPF-110. Feel free to read the latest comments in the bug report for a detailed explanation of what we did. I'd be very interested in your feedback on this.

Sincerely,
-Laurent