
Author Topic: Improving Node performance  (Read 1476 times)

Ankit

  • JPPF Padawan
  • Posts: 12
Improving Node performance
« on: December 01, 2014, 08:18:20 AM »

Hi Laurent,

I have a really basic question that I need your help in. I hope this is the correct place to ask this question.

In my application, I have created only one JPPF job, which contains multiple JPPF tasks. So basically I can only send it to one remote machine, and on that machine I can create multiple nodes to run the tasks simultaneously.

I want to know what the optimal configuration should be for that remote machine, where my JPPF drivers will be running. Should it be a 32-bit or a 64-bit system?

Since I am only using one remote machine, and since the objective is to reduce the application's run time, I want that machine to have a good enough configuration that the nodes can run multiple tasks simultaneously.

Can you please explain how node performance can be improved by increasing the RAM or the number of CPUs, using a 32-bit or 64-bit version of Windows 7, and so on?

I would really appreciate your help in this.

Thanks a lot
Warm regards
Ankit

lolo

  • Administrator
  • JPPF Council Member
  • Posts: 2272
    • JPPF Web site
Re: Improving Node performance
« Reply #1 on: December 02, 2014, 07:47:02 AM »

Hello Ankit,

First of all, let me state that there isn't an optimal configuration that fits all possible use cases, because the overall performance of the grid ultimately depends on the workload it has to process, that is, it depends on what the tasks in your jobs do.

Regarding 32-bit vs. 64-bit JVMs, I have never found a definitive answer as to which is faster. What I can say is that, if your JVMs / JPPF nodes need more than 1.2 GB of heap, then a 64-bit JVM is your only option.

Now let's look at the possible setups with JPPF and how they impact performance. In what follows, I will assume that all your tasks are roughly equivalent in terms of I/O, memory usage and computational weight.

1) Memory usage

There are 3 main indicators to consider:
- the memory footprint of the JPPF node code itself is quite small: an idle node uses less than 3 MB immediately after a full GC
- the non-heap memory footprint is a different story. The PermGen space usage depends on the classes loaded by the node during execution of the tasks. The node maintains a cache of class loaders, which load classes from the client(s) that submit the jobs. The cache size is configurable and lets you define a tradeoff between PermGen space usage and performance (how frequently the classes have to be reloaded).
- the JVM RAM usage is the sum of the max heap + PermGen space + other non-heap memory. I do not know of any rule to calculate the actual footprint of a JVM. In my experience, for a 64-bit JVM running on Windows 7 with the default PermGen size, a 64 MB heap leads to a JVM process of up to 120 MB of RAM, and a 128 MB heap to up to 200 MB. An example of the relevant node settings follows this list.
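For example, assuming a standard node installation, the node's JVM sizing and class loader cache can be set in its jppf-node.properties file along these lines (please double-check the exact property names against the configuration guide for your JPPF version):

  # JVM options passed to the node's JVM: server mode, max heap and PermGen sizes
  jppf.jvm.options = -server -Xmx128m -XX:MaxPermSize=64m
  # maximum number of client class loaders kept in the node's cache
  jppf.classloader.cache.size = 50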

2) Processing threads in a node

Each JPPF node has one or more processing threads, which determine how many tasks it can process in parallel. How many threads to configure for a node depends on the nature of your tasks: if they are mostly CPU-bound, the number of processing threads should be equal or very close to the number of CPUs; if they are mostly I/O-bound, you can have many more processing threads than CPUs. These are the two extremes; you will probably have to do some tuning based on what the tasks actually do.
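For instance, for CPU-bound tasks on a quad-core machine, the node configuration could simply state (property name as I recall it, to be verified in the configuration reference):

  # number of tasks the node executes in parallel
  jppf.processing.threads = 4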

One point to note: if a single task uses up to X amount of memory, then n tasks running in parallel will use n * X, and the heap should be sized accordingly to avoid out-of-memory conditions.
Also, the number of processing threads is manageable: it can be changed dynamically from your code to adapt to the workload, as in the sketch below.
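As a rough sketch of that last point, assuming the node's JMX connector listens on the default management port, something along these lines should work (the class and method names are quoted from memory of the management API, so please verify them against the Javadoc):

  import org.jppf.management.JMXNodeConnectionWrapper;

  public class NodeThreadsUpdater {
    public static void main(String[] args) throws Exception {
      // connect to the node's remote JMX server (host and port are placeholders)
      JMXNodeConnectionWrapper node = new JMXNodeConnectionWrapper("node-host", 12001);
      node.connectAndWait(5000L);
      // change the number of processing threads at runtime
      node.updateThreadPoolSize(8);
      node.close();
    }
  }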

3) Multiple nodes on the same machine

This is especially easy to set up with the node provisioning feature. When multiple jobs execute in parallel, the obvious advantage is that several jobs can be distributed to the nodes concurrently. In a single-job context, the benefit is essentially in the I/O between server and nodes: when a job is dispatched to multiple nodes, the I/O between the server and each node is performed in parallel. If your tasks are very large, you should see a significant speedup as a consequence.

Keep in mind that in this setup the total number of processing threads becomes nbNodes * processingThreads, and the guidelines regarding the number of CPUs described in 2) apply to this total.
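As an illustration, a master node could be configured to start a few slave nodes automatically, with properties along these lines (names as I recall them, please check against the node provisioning documentation):

  # designate this node as a master node
  jppf.node.provisioning.master = true
  # number of slave nodes started together with the master
  jppf.node.provisioning.startup.slaves = 3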

4) Load balancing in the server

The load balancing in the server determines how many tasks are sent to each node. It basically computes how each job is split into subsets of its tasks and which node each subset is sent to. When a job is made of many small, short-lived tasks, grouping as many tasks as possible in each subset increases the overall throughput thanks to the I/O speedup. If the tasks take a long time to execute, then the performance gain will be much less significant, or even negligible. A sample driver configuration is sketched after the list below.

The number of tasks in each subset also affects how a node performs:
- the memory footprint is directly related to the number of tasks; as a consequence, the load-balancer settings can be used to coarsely control the node's memory usage
- if there are fewer tasks than processing threads, some threads will remain idle and you may have underused CPU resources
- if there are more tasks than processing threads, some tasks may be waiting in the node while the threads are busy. This isn't necessarily a problem, since these tasks would otherwise have been waiting in the server queue instead.
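As a simple illustration, the driver could use the "manual" algorithm, which dispatches a fixed number of tasks to each node (property names as I remember them, please verify against the load balancing documentation):

  # in the driver's configuration file
  jppf.load.balancing.algorithm = manual
  jppf.load.balancing.profile = manual_profile
  # fixed number of tasks sent to a node in each dispatch
  jppf.load.balancing.profile.manual_profile.size = 20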

5) Load balancing in the client

The JPPF client has load-balancing settings exactly like the server does. The difference is that, instead of balancing the load between nodes, it balances it between connections to one or more servers and the local executor. Consider that in the client you can define (a sample configuration follows the list):
- a local executor which executes tasks in the client JVM
- multiple connections to a single server
- connections to multiple servers
- all of the above
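For example, a client configuration combining a local executor with two drivers and a small connection pool could look like this (property names from memory, to be checked against the client configuration documentation):

  # execute some of the tasks directly in the client JVM, with 4 threads
  jppf.local.execution.enabled = true
  jppf.local.execution.threads = 4
  # two named driver connections
  jppf.drivers = driver1 driver2
  driver1.jppf.server.host = host1
  driver1.jppf.server.port = 11111
  # pool of 2 connections to the first driver
  driver1.jppf.pool.size = 2
  driver2.jppf.server.host = host2
  driver2.jppf.server.port = 11111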

You can also configure your jobs to be sent over multiple server connections in parallel, thus enabling parallel I/O between server and client for a single job.
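A minimal sketch of that, on the job side (setMaxChannels is quoted from memory of the client-side SLA API, so treat it as an assumption to verify):

  import org.jppf.client.JPPFJob;

  public class ParallelChannelsJob {
    // build a job that may be dispatched over up to 2 driver connections in parallel
    public static JPPFJob createJob() {
      JPPFJob job = new JPPFJob();
      job.setName("parallel-io-job");
      // allow parallel I/O over multiple channels for this single job
      job.getClientSLA().setMaxChannels(2);
      return job;
    }
  }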

That's basically all of it. I hope this gives you some insight into how you can optimize the hardware for your application.

In fact, after reviewing what I wrote above, I believe this should be included in the documentation, as it will benefit the whole JPPF community. I created this feature request: JPPF-351 Write a performance tuning/optimization documentation section

Sincerely,
-Laurent