
Author Topic: Load Balancing - Semantics of Jobs  (Read 3224 times)

clgv

  • JPPF Master
  • Posts: 25
Load Balancing - Semantics of Jobs
« on: August 24, 2012, 03:53:17 PM »

So, my cluster is up and running.
In my first test cases, I see that only one job is ever displayed on my driver/server in the admin GUI.
I expected to see all the jobs I submitted there.
I have read about the load-balancing configuration, but I did not find information on how it influences jobs.

What are the semantics and treatment of jobs in JPPF?

Unless I really need to, should I refrain from grouping tasks into jobs, for performance reasons?

In general, I need a configuration that can run at least 200,000 tasks, each taking between 1 and 5 minutes, on at least 6 machines with 12 parallel threads each.
Ideally, none of the machines should have idle cores while there are still tasks left.
« Last Edit: August 24, 2012, 04:40:02 PM by clgv »

lolo

  • Administrator
  • JPPF Council Member
  • Posts: 2272
    • JPPF Web site
Re: Load Balancing - Semantics of Jobs
« Reply #1 on: August 25, 2012, 10:10:26 AM »

Hello,

Quote
In my first test cases, I see that only one job is ever displayed on my driver/server in the admin GUI.

I am assuming you're submitting your jobs via a single client instance and that your client has a single driver connection, could you confirm?
In this configuration, only one job at a time is sent to the driver; subsequent jobs wait in a queue on the client side until the first job is completed.
If you want to submit multiple jobs concurrently, you will need to configure a driver connection pool: "jppf.pool.size = n" with server discovery, or "mydriver.jppf.pool.size = n" when discovery is disabled, where n is the number of connections between the client and the driver and defines the maximum number of jobs that can be sent concurrently. You will also need to set your jobs as non-blocking jobs.
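As a rough illustration, here is a minimal sketch of that setup: a client configuration defining a pool of 4 connections to a single driver, and a submission loop for non-blocking jobs. The driver name "driver1", the host, the pool size and the task class are example values of mine, and the exact property and method names may differ slightly between JPPF versions:

Code:
# client configuration with discovery disabled
jppf.discovery.enabled = false
jppf.drivers = driver1
driver1.jppf.server.host = my.driver.host
driver1.jppf.server.port = 11111
# up to 4 jobs can be sent to the driver concurrently
driver1.jppf.pool.size = 4

Code:
import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.client.JPPFResultCollector;
import org.jppf.server.protocol.JPPFTask;

public class MultiJobSubmitter {
  // trivial example task
  public static class ExampleTask extends JPPFTask {
    @Override
    public void run() {
      setResult("done");
    }
  }

  public static void main(String[] args) throws Exception {
    JPPFClient client = new JPPFClient();
    try {
      for (int i = 0; i < 4; i++) {
        JPPFJob job = new JPPFJob();
        job.setName("job " + i);
        job.addTask(new ExampleTask());
        job.setBlocking(false); // non-blocking: submit() returns immediately
        // results of a non-blocking job are delivered to its TaskResultListener
        job.setResultListener(new JPPFResultCollector(job));
        client.submit(job);
      }
    } finally {
      client.close();
    }
  }
}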

The role of the load balancer is to split a job on the server into one or more subsets of its tasks, such that each subset is sent to a node. Depending on the load-balancer implementation and configuration, and on the number of tasks in the job, all subsets could be sent to the same node in sequence, or to many nodes in parallel, or any combination in between.
The main function of the load balancer is to compute the number of tasks in each subset.

Let's take a simple example. We use the following configuration:
- a job with 100 tasks
- 1 driver, 2 nodes
- we configure the "manual" algorithm with a fixed subset size of 25

Once the job is sent to the driver, the distribution of the tasks to the nodes will be as follows:
1) As per the load-balancer configuration, the driver sends a subset of 25 tasks to each node, so 50 tasks are dispatched and 50 tasks remain in the driver's queue. When a node finishes executing its 25 tasks, the results are sent back to the client, and the job's TaskResultListener receives a corresponding notification.
2) Again, the driver sends 25 tasks to each node, and 0 tasks remain in the driver's queue. In addition to what happens in 1), when the last node finishes executing its tasks, the entire job is complete.
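For reference, a driver configuration matching this example could look like the sketch below. The profile name "manual_profile" is arbitrary, and the exact property names depend on the JPPF version:

Code:
# driver configuration: "manual" algorithm with a fixed subset size of 25
jppf.load.balancing.algorithm = manual
jppf.load.balancing.strategy = manual_profile
# parameters of the "manual_profile" profile
strategy.manual_profile.size = 25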

The behavior can be very different if you choose a different load-balancing algorithm. For instance, the "proportional" algorithm attempts to distribute all the tasks at once, and the number of tasks sent to each node is computed from that node's past performance (round-trip time from driver to node, relative to the number of tasks). So each node may receive a different number of tasks over time.
We also have some older documentation on the load balancers, which may clarify this a little more.

As a general rule, the more tasks you can send to a node at once, the better the performance will be, provided the tasks are fairly distributed among all the nodes.
For instance, using the "manual" algorithm with a fixed size of (nbTasks / nbNodes) will only work well if your nodes have identical performance and their number does not vary dynamically.
This is why I would first recommend the "proportional" algorithm, because it adapts to changing conditions in the grid: nodes being added or removed dynamically, changes in the number and performance profile of the tasks, or other conditions such as varying network latency and bandwidth. Note that this algorithm needs to bootstrap its performance cache: the first time tasks are sent to a node, the driver sends a predetermined, generally small, fixed number of tasks (the "initialSize" configuration parameter), so that it can get a preliminary idea of that node's performance.
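A corresponding driver configuration might look roughly like this; "initialSize" comes from the explanation above, while the other parameter names ("proportionalityFactor", "performanceCacheSize") and their values are assumptions to be checked against the load-balancing documentation for your JPPF version:

Code:
# driver configuration: "proportional" algorithm
jppf.load.balancing.algorithm = proportional
jppf.load.balancing.strategy = proportional_profile
# number of tasks sent to a node the first time, to bootstrap the performance cache
strategy.proportional_profile.initialSize = 10
# higher values give proportionally more tasks to the faster nodes
strategy.proportional_profile.proportionalityFactor = 1
# number of past samples kept per node
strategy.proportional_profile.performanceCacheSize = 1000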

If you try this algorithm and find that it does not work as you expect, you may try other algorithms and tune their configuration accordingly. This is not always an easy task, but as far as I know there is no universal solution that works in all use cases, which is why JPPF provides as much flexibility as possible in this area.

Additionally, you might want to consider plugging in your own load balancer if none of the built-in ones satisfies your requirements.

Also, the driver's load-balancing configuration can be managed dynamically, which allows you to change it at any time, for instance before submitting a new job that is completely different from the previously submitted ones.
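As a sketch of this dynamic management through the JPPF management APIs, something along the following lines should work; the host, port and new settings are example values, and the method name (changeLoadBalancerSettings) is an assumption based on the driver admin MBean, so please verify it against your JPPF version:

Code:
import java.util.Properties;
import org.jppf.management.JMXDriverConnectionWrapper;

public class LoadBalancerSwitcher {
  public static void main(String[] args) throws Exception {
    // connect to the driver's JMX server (host and management port are example values)
    JMXDriverConnectionWrapper jmx = new JMXDriverConnectionWrapper("my.driver.host", 11198);
    jmx.connectAndWait(5000L);
    // parameters for the "manual" algorithm: a fixed bundle size of 50
    Properties params = new Properties();
    params.setProperty("size", "50");
    // change the driver's load-balancer settings on the fly
    jmx.changeLoadBalancerSettings("manual", params);
    jmx.close();
  }
}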

I hope this clarifies.

Sincerely,
-Laurent

clgv

  • JPPF Master
  • Posts: 25
Re: Load Balancing - Semantics of Jobs
« Reply #2 on: August 27, 2012, 10:40:40 AM »

Thanks a lot for this exhaustive answer!  :)

clgv

  • JPPF Master
  • Posts: 25
Re: Load Balancing - Semantics of Jobs
« Reply #3 on: September 18, 2012, 04:55:25 PM »

I still have problems with the way the load balancer works.

As a test setup I have 3 machines: client, server and node. The node has 12 cores, so I gave it 12 threads and use the "nodethreads" strategy on the server. Even with a multiplicator of 3, I only get an estimated concurrency of around 6.3.
One reason for this is that my tasks have different execution durations: I observed 9 threads idling for at least one minute while the other 3 were still busy.
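A driver/node configuration for this setup would look roughly like the following sketch (the profile name is arbitrary, and the exact property names may vary across JPPF versions):

Code:
# driver: "nodethreads" algorithm, bundle size = multiplicator * number of node threads
jppf.load.balancing.algorithm = nodethreads
jppf.load.balancing.strategy = nodethreads_profile
strategy.nodethreads_profile.multiplicator = 3

Code:
# node: 12 processing threads (newer versions use jppf.processing.threads)
processing.threads = 12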

From the "custom load balancer" description, I conclude that the partitioning into subjobs which are transmitted together cannot easily be changed into a behavior similar to a task stream, i.e. a node immediately sends each completed task back to the server, the server sends it a new one, and the node keeps a specified number of "buffer tasks" so that the computation threads can work continuously.

The scenario where all tasks are assigned at job submission would not work for me, since then I could not react (by adjusting the number of threads used) to changes on the nodes (other users starting calculations outside JPPF).

With the currently observed performance, especially the CPU idle times, I am not able to use JPPF.
Did I perhaps miss an option that might work better for me?
« Last Edit: September 18, 2012, 05:10:23 PM by clgv »

lolo

  • Administrator
  • JPPF Council Member
  • Posts: 2272
    • JPPF Web site
Re: Load Balancing - Semantics of Jobs
« Reply #4 on: September 20, 2012, 07:45:49 AM »

Hello,

Your observations are very accurate.
Another solution I can see would be to use multiple nodes on the same physical machine, instead of just one. While this does not completely avoid idle threads at all times, it will greatly reduce their number. The main drawback of this topology is the memory footprint of all the nodes; however, it is not that big (the nodes themselves are pretty lean), and if you have enough memory it is definitely worth trying.
Ultimately, the topology that would get you closest to a task stream would be 12 nodes with one thread each; however, you may find that node/thread combinations in between yield an acceptable throughput as well. You'll also need to account for the I/O between node and server, during which the CPU is barely used.
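As a sketch of that topology, each node process on the 12-core machine would get its own copy of the node configuration; the thread counts below are just the two combinations mentioned above, and the property name may differ by JPPF version:

Code:
# option A: closest to a task stream, 12 node processes with 1 thread each
processing.threads = 1

# option B: an in-between combination, e.g. 4 node processes with 3 threads each
# processing.threads = 3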

I hope this helps.

Sincerely,
-Laurent