Hello Ankit,
First of all, let me state that there isn't an optimal configuration that fits all possible use cases, because the overall performance of the grid ultimately depends on the workload it has to process, that is, it depends on what the tasks in your jobs do.
Regarding 32 vs. 64-bit JVMs, I have never found a definitive answer as to which is faster. What I can say is that if your JVMs / JPPF nodes need more than 1.2 GB of heap, then a 64-bit JVM is your only option.
Now let's have a look at the possible setups with JPPF and how they impact performance. In what follows, I will assume that all your tasks are roughly equivalent in terms of I/O, memory usage and computational weight.
1) Memory usage
There are 3 main indicators to consider:
- the memory footprint of the JPPF node software itself is quite small: an idle node uses less than 3 MB immediately after a full GC
- the non-heap memory footprint is a different story: the PermGen space usage depends on the classes loaded by the node while it executes the tasks. The node maintains a cache of class loaders, which load classes from the client(s) that submit the jobs. The cache size is configurable and lets you define a tradeoff between PermGen space usage and performance (i.e. how frequently classes have to be reloaded).
- the JVM RAM usage is the sum of max heap + PermGen space + other non-heap memory. I do not know of any reliable rule to calculate the actual footprint of a JVM process. In my experience, for a 64-bit JVM running on Windows 7 with the default PermGen size, the process takes up to 120 MB of RAM with a 64 MB heap, and up to 200 MB with a 128 MB heap.
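To make this concrete, here is a tiny back-of-the-envelope calculation in plain Java, based on the figures I observed above. The 75 MB of non-heap overhead is only an illustrative value derived from those observations; measure it on your own platform and adjust:

public class NodeMemoryEstimate {
  public static void main(String[] args) {
    int nbNodes = 4;            // nodes planned on this machine
    int maxHeapMB = 128;        // -Xmx of each node
    int nonHeapOverheadMB = 75; // PermGen + other JVM internals, measured empirically
    int perNodeMB = maxHeapMB + nonHeapOverheadMB;
    System.out.printf("estimated RAM: %d MB per node, %d MB for %d nodes%n",
      perNodeMB, perNodeMB * nbNodes, nbNodes);
  }
}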
2) Processing threads in a node
Each JPPF node can have one or more processing threads, which determine how many tasks it can process in parallel. How many threads to configure for a node depends on the nature of your tasks: if they are mostly CPU-bound, the number of processing threads should be equal to, or very close to, the number of CPUs; if they are mostly I/O-bound, you can have many more processing threads than CPUs. These are the two extremes; you will probably have to do some tuning based on what your tasks actually do.
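As a starting point only, here is a rough heuristic to pick an initial thread count from the number of cores and an estimate of the fraction of time a task spends waiting on I/O. The node configuration property that controls this is jppf.processing.threads (do check the name against the configuration reference for your JPPF version), and the I/O fraction is something you will have to estimate or measure for your own tasks:

public class ThreadCountHeuristic {
  // ioFraction = estimated fraction of a task's time spent blocked on I/O (0.0 to ~0.9)
  public static int suggestedThreads(double ioFraction) {
    int cores = Runtime.getRuntime().availableProcessors();
    // CPU-bound (ioFraction ~ 0) => about one thread per core; I/O-bound => proportionally more
    return Math.max(1, (int) Math.round(cores / (1.0 - Math.min(ioFraction, 0.9))));
  }

  public static void main(String[] args) {
    System.out.println("mostly CPU-bound  : " + suggestedThreads(0.0));
    System.out.println("50% waiting on I/O: " + suggestedThreads(0.5));
    System.out.println("90% waiting on I/O: " + suggestedThreads(0.9));
  }
}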
One point to note: if a single task uses up to X amount of memory, then n tasks running concurrently will use up to n * X, and the node's heap should be sized accordingly to avoid out-of-memory conditions.
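For example (the figures are made up, plug in your own measurements):

public class HeapSizing {
  public static void main(String[] args) {
    int threads = 8;          // processing threads in the node
    int perTaskPeakMB = 20;   // measured peak memory used by a single task
    int nodeOverheadMB = 32;  // headroom for the node's own objects (rough assumption)
    System.out.println("suggested -Xmx >= " + (threads * perTaskPeakMB + nodeOverheadMB) + " MB");
  }
}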
Also, consider that the number of processing threads is manageable: it can be changed dynamically from your code to adapt to the workload.
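For instance, something along these lines, using the node management API (adjust the host and management port to your setup, and double-check the method names against the management documentation for your JPPF version):

import org.jppf.management.JMXNodeConnectionWrapper;

public class AdjustNodeThreads {
  public static void main(String[] args) throws Exception {
    // connect to the node's JMX server; the port is whatever jppf.management.port is set to in the node's configuration
    JMXNodeConnectionWrapper node = new JMXNodeConnectionWrapper("node-host", 12001);
    node.connectAndWait(5000L);
    node.updateThreadPoolSize(8); // switch the node to 8 processing threads
    node.close();
  }
}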
3) Multiple nodes on the same machine
This is especially easy to set up with the node provisioning feature. In a context with multiple jobs executing in parallel, the obvious advantage is that multiple jobs can be distributed concurrently to the nodes. In a single-job context, the benefit essentially lies in the I/O between server and nodes: when a job is dispatched to multiple nodes, the I/O between the server and each of the nodes is performed in parallel. If your tasks are very large, you should see a significant speedup as a consequence.
Keep in mind that in this setup, the total number of processing threads is now nbNodes * processingThreads, and the considerations about the number of CPUs described in 2) apply to this total.
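As an illustration, here is roughly what starting 3 slave nodes through a master node's provisioning MBean looks like. Take the class and method names with a grain of salt and verify them against the provisioning documentation for your JPPF version; host and port are placeholders:

import org.jppf.management.JMXNodeConnectionWrapper;
import org.jppf.node.provisioning.JPPFNodeProvisioningMBean;

public class ProvisionSlaves {
  public static void main(String[] args) throws Exception {
    // connect to the master node's JMX server
    JMXNodeConnectionWrapper master = new JMXNodeConnectionWrapper("node-host", 12001);
    master.connectAndWait(5000L);
    // obtain a proxy to the provisioning MBean and request 3 slave nodes
    JPPFNodeProvisioningMBean provisioning = master.getProxy(
      JPPFNodeProvisioningMBean.MBEAN_NAME, JPPFNodeProvisioningMBean.class);
    provisioning.provisionSlaveNodes(3); // each slave runs in its own JVM
    master.close();
  }
}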
4) Load balancing in the server
The load balancing in the server determines how many tasks are sent to each node: it computes how each job is split into subsets of its tasks and which node each subset is sent to. When a job is made of many small, short-lived tasks, grouping as many tasks as possible in each subset will increase the overall throughput, thanks to the I/O speedup. If the tasks take a long time to execute, the performance gain will be much less significant, or even negligible.
The number of tasks in each subset also impacts how a node performs as follows:
- the memory footprint is directly related to the number of tasks; as a consequence, the load-balancer settings can be used to coarsely control the node's memory usage.
- if there are fewer tasks than processing threads, then some threads will remain idle and you may have underused CPU resources.
- if there are more tasks than processing threads, then some tasks may be waiting in the node while the threads are busy. This isn't necessarily a problem, since these tasks might have been waiting in the server queue instead.
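As an example, here is how you could switch the driver to the "manual" algorithm with a fixed number of tasks per dispatch, at runtime through the driver management API. The same settings can of course go into the driver's configuration file instead; as usual, verify the method and parameter names against your JPPF version, and adjust the host and management port:

import java.util.HashMap;
import java.util.Map;
import org.jppf.management.JMXDriverConnectionWrapper;

public class TuneLoadBalancer {
  public static void main(String[] args) throws Exception {
    // connect to the driver's JMX server (port = the driver's jppf.management.port)
    JMXDriverConnectionWrapper driver = new JMXDriverConnectionWrapper("driver-host", 11198);
    driver.connectAndWait(5000L);
    Map<Object, Object> params = new HashMap<>();
    params.put("size", "50"); // send 50 tasks per dispatch to each node
    driver.changeLoadBalancerSettings("manual", params);
    driver.close();
  }
}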
5) Load balancing in the client
The JPPF client has load-balancing settings just as the server does. The difference is that, instead of balancing the load between nodes, it balances it between the connections to one or more servers and the local executor (a configuration sketch follows the list below). Consider that in the client you can define:
- a local executor which executes tasks in the client JVM
- multiple connections to a single server
- connections to multiple servers
- all of the above
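Here is a sketch of a client configuration that combines these options, set programmatically before the client is created (the same properties can simply go into the client's configuration file instead). The host names, ports and pool sizes are placeholders, and the property names should be checked against the client configuration reference for your version:

import org.jppf.client.JPPFClient;
import org.jppf.utils.JPPFConfiguration;
import org.jppf.utils.TypedProperties;

public class ClientSetup {
  public static void main(String[] args) throws Exception {
    TypedProperties config = JPPFConfiguration.getProperties();
    // local executor: execute some of the tasks directly in the client JVM
    config.setProperty("jppf.local.execution.enabled", "true");
    config.setProperty("jppf.local.execution.threads", "4");
    // two servers, with 2 connections (channels) to each
    config.setProperty("jppf.discovery.enabled", "false");
    config.setProperty("jppf.drivers", "driver1 driver2");
    config.setProperty("driver1.jppf.server.host", "server1.mydomain.com");
    config.setProperty("driver1.jppf.server.port", "11111");
    config.setProperty("driver1.jppf.pool.size", "2");
    config.setProperty("driver2.jppf.server.host", "server2.mydomain.com");
    config.setProperty("driver2.jppf.server.port", "11111");
    config.setProperty("driver2.jppf.pool.size", "2");
    JPPFClient client = new JPPFClient();
    try {
      // create and submit your jobs as usual; the client load balancer will
      // spread the work over the local executor and the 4 server connections
    } finally {
      client.close();
    }
  }
}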
You can also configure your jobs to be sent over multiple server connections in parallel, thus enabling parallel I/O between server and client for a single job.
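In code, that is a one-liner on the job's client-side SLA, along these lines (check the JobClientSLA javadoc for the exact method in your version):

import org.jppf.client.JPPFJob;

public class ParallelChannels {
  public static JPPFJob newJob() {
    JPPFJob job = new JPPFJob();
    job.setName("parallel I/O job");
    // allow this job to be sent over up to 2 server connections in parallel
    job.getClientSLA().setMaxChannels(2);
    return job;
  }
}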
That's basically all of it. I hope this provides you with some insight into how you can make the best use of your hardware for your application.
In fact, after reviewing what I wrote above, I believe this should be included in the documentation, as it will benefit the whole JPPF community. I created this feature request:
JPPF-351 Write a performance tuning/optimization documentation section
Sincerely,
-Laurent