Hello,
In first test cases I see that always only one job on my driver/server is displayed in the admin GUI.
I am assuming you're submitting your jobs via a single client instance and that your client has a single driver connection. Could you confirm?
In this configuration, only one job at a time is sent to the driver, and subsequent jobs wait in a queue on the client side until the first job completes.
If you want to submit multiple jobs concurrently, you will need to configure a driver connection pool: "jppf.pool.size=n" with server discovery enabled, or "mydriver.jppf.pool.size=n" when discovery is disabled, where n is the number of connections between the client and the driver and defines the maximum number of jobs that can be sent concurrently. You will also need to set your jobs as non-blocking jobs.
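To make the two property names concrete, here is a small plain-Java sketch that reads either form from a configuration text and returns the resulting limit on concurrent jobs. It does not use the JPPF API; the class and method names are made up for illustration, and the fallback value of 1 is an assumption that matches the single-connection behavior described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of JPPF: it only illustrates which property applies.
class PoolSizeSketch {
  /**
   * Returns the connection pool size found in the configuration text.
   * driverName == null means server discovery is used ("jppf.pool.size");
   * otherwise the driver-prefixed property is read ("<driverName>.jppf.pool.size").
   */
  static int maxConcurrentJobs(String config, String driverName) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(config));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String key = (driverName == null) ? "jppf.pool.size" : driverName + ".jppf.pool.size";
    // Assumption: with no pool configured there is a single connection, hence one job at a time.
    return Integer.parseInt(props.getProperty(key, "1").trim());
  }

  public static void main(String[] args) {
    System.out.println(maxConcurrentJobs("jppf.pool.size = 4\n", null));                // 4
    System.out.println(maxConcurrentJobs("mydriver.jppf.pool.size = 2\n", "mydriver")); // 2
    System.out.println(maxConcurrentJobs("", null));                                    // 1
  }
}
```

With a pool size of 4, up to 4 non-blocking jobs can be in flight at once; any further jobs still wait in the client-side queue.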
The role of the load balancer is to split a job on the server into one or more subsets of its tasks, such that each subset is sent to a node. Depending on the load-balancer implementation and configuration, and on the number of tasks in the job, all subsets could be sent to the same node in sequence, or to many nodes in parallel, or any combination in between.
The main function of the load-balancer is to compute the number of tasks in each task subset.
Let's take a simple example. We use the following configuration:
- a job with 100 tasks
- 1 driver, 2 nodes
- we configure the "manual" algorithm with a fixed subset size of 25
Once the job is sent to the driver, the distribution of the tasks to the nodes will be as follows:
1) as per the load-balancer configuration, the driver will send a subset of 25 tasks to each node, so 50 tasks will be sent, and 50 tasks will remain in the driver's queue
When a node finishes executing its 25 tasks, the task subset is sent back to the client, and the job's TaskResultListener will receive a corresponding notification.
2) again, the driver will send 25 tasks to each node, and 0 tasks will remain in the driver's queue.
In addition to what happens in 1), when the last node finishes executing its tasks, the entire job will be completed.
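The two rounds above can be reproduced with a short plain-Java simulation. This is only a sketch of the arithmetic, not JPPF code: the class and method names are hypothetical, and it assumes all nodes finish their subset at the same time, so that dispatch happens in lockstep rounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of a fixed-size ("manual") distribution; not the JPPF implementation.
class ManualBalancerSketch {
  /** Each element of the result is one round: the number of tasks sent to each node. */
  static List<int[]> distribute(int nbTasks, int nbNodes, int subsetSize) {
    if (nbNodes <= 0 || subsetSize <= 0) {
      throw new IllegalArgumentException("nbNodes and subsetSize must be > 0");
    }
    List<int[]> rounds = new ArrayList<>();
    int remaining = nbTasks; // tasks still waiting in the driver's queue
    while (remaining > 0) {
      int[] round = new int[nbNodes];
      for (int node = 0; node < nbNodes && remaining > 0; node++) {
        int sent = Math.min(subsetSize, remaining); // the last subset may be smaller
        round[node] = sent;
        remaining -= sent;
      }
      rounds.add(round);
    }
    return rounds;
  }

  public static void main(String[] args) {
    int remaining = 100;
    List<int[]> rounds = distribute(100, 2, 25);
    for (int i = 0; i < rounds.size(); i++) {
      remaining -= Arrays.stream(rounds.get(i)).sum();
      System.out.println("round " + (i + 1) + ": " + Arrays.toString(rounds.get(i))
          + ", remaining in driver queue: " + remaining);
    }
    // round 1: [25, 25], remaining in driver queue: 50
    // round 2: [25, 25], remaining in driver queue: 0
  }
}
```

In a real grid the nodes do not finish in lockstep: as soon as one node returns its subset, the driver can send it the next 25 tasks without waiting for the other node.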
The behavior can be very different if you choose a different load-balancing algorithm. For instance, the "proportional" algorithm attempts to distribute all tasks at once, and the number of tasks sent to each node is computed based on the past performance of the node (round trip time from driver to node, relative to the number of tasks). So each node may receive a different number of tasks over time.
We have some very old documentation on the load-balancers which may clarify a little more.
As a general rule, the more tasks you can send to a node, the better the performance will be, provided the tasks are fairly distributed among all the nodes.
For instance, using the "manual" algorithm with a fixed size of (nbTasks / nbNodes) will work well only if your nodes are identical in performance and if their number does not vary dynamically.
This is why I would first recommend using the "proportional" algorithm, because it adapts to changing conditions in the grid: when nodes are added or removed dynamically, when the number and performance profile of the tasks change, or when other conditions arise, such as varying network latency and bandwidth. Note that this algorithm requires bootstrapping its cache of past performance. This is why, the first time tasks are sent to a node, it will provide a predetermined, generally small, fixed number of tasks (the "initialSize" config parameter), so that it can get a preliminary idea of what that node's performance is.
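As a rough illustration of the idea, here is a plain-Java sketch that gives nodes without any performance history a fixed "initialSize" subset and shares the remaining tasks among the other nodes in proportion to their observed speed. This is a deliberately simplified model, not the formula JPPF actually uses; the class name, the method name, and the convention that a mean time of 0 means "no history yet" are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a performance-proportional distribution; not JPPF code.
class ProportionalSketch {
  /**
   * meanTimePerTask[i] is the observed mean round trip time per task for node i,
   * or a value <= 0 when the node has no history yet (bootstrap case).
   */
  static int[] subsetSizes(int nbTasks, double[] meanTimePerTask, int initialSize) {
    int[] sizes = new int[meanTimePerTask.length];
    int remaining = nbTasks;
    double totalSpeed = 0.0;
    for (int i = 0; i < sizes.length; i++) {
      if (meanTimePerTask[i] <= 0) {
        // Bootstrap: a small fixed subset, just to measure the node's performance.
        sizes[i] = Math.min(initialSize, remaining);
        remaining -= sizes[i];
      } else {
        totalSpeed += 1.0 / meanTimePerTask[i];
      }
    }
    if (totalSpeed == 0.0) return sizes; // no node has history: the rest stays queued
    int toShare = remaining;
    int fastest = -1;
    for (int i = 0; i < sizes.length; i++) {
      if (meanTimePerTask[i] <= 0) continue;
      double speed = 1.0 / meanTimePerTask[i];
      sizes[i] = (int) Math.floor(toShare * speed / totalSpeed);
      remaining -= sizes[i];
      if (fastest < 0 || meanTimePerTask[i] < meanTimePerTask[fastest]) fastest = i;
    }
    sizes[fastest] += remaining; // tasks lost to rounding go to the fastest node
    return sizes;
  }

  public static void main(String[] args) {
    // Two new nodes: each gets only initialSize tasks on the first dispatch.
    System.out.println(Arrays.toString(subsetSizes(100, new double[] {0, 0}, 5)));
    // Node 0 is three times faster than node 1, so it receives the larger share.
    System.out.println(Arrays.toString(subsetSizes(100, new double[] {10.0, 30.0}, 5)));
  }
}
```

The sketch shows why the first dispatch to a new node is small, and why the subset sizes drift apart once the nodes have shown different performance.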
If you try this algorithm and find that it does not work as you expect, you may try other algorithms and tune their configuration accordingly. This is not always an easy task, but as far as I know there is no universal solution that will work in all use cases, which is why JPPF provides as much flexibility as possible in that area.
Additionally, you might want to consider plugging in your own load-balancer if none of the built-in ones satisfies your requirements.
Also, the driver's load-balancing configuration is dynamically manageable, which allows you to change the load-balancing configuration at any time, for instance before submitting a new job that is completely different from the previously submitted jobs.
I hope this clarifies things.
Sincerely,
-Laurent