JPPF
Author Topic: Help with understanding performance, settings & more  (Read 544 times)

rafaelsteil

  • JPPF Padawan
  • *
  • Posts: 4
Help with understanding performance, settings & more
« on: January 22, 2012, 10:55:41 PM »


Hello,

I am seeking some help and guidance on how to better configure JPPF's environment (driver, nodes and tasks) so that the tasks execute as efficiently as possible. I have some rough results (that is, no fancy network or I/O graphs, just wall-clock measurements) that I'd like to share, so that I can better provision the hardware needed for each job.

My environment is running on Amazon's EC2, with 1 instance for the server and the client (although the client is not configured to run locally), and up to 5 instances for the nodes. All instances have the same configuration, and there is one type of task that is executed many times, with different sets of data. The tasks are independent of one another: there is no data sharing of any kind and everything runs in memory, so there is no I/O happening.

For this test, I have 412 inputs, which are broken into 412 tasks. The server is configured to use jppf.load.balancing.algorithm = manual, jppf.load.balancing.strategy = manual and strategy.manual.size = 24. Each node is set up to use 24 threads, and the client's classes and all required libraries are in the node's classpath, in order to prevent remote class loading from taking place. All instances are in the same network.

The results I have so far are:
- Command line (without JPPF), with 24 threads using a ThreadPool: completed in ~513 seconds
- Running one node: ~1,017 seconds
- 3 nodes: ~320 seconds
- 5 nodes: ~210 seconds

1) My very first question is: what do you think of these results? Are they expected, or, just by looking at them, is there something I could do to improve them?

2) Why did it take so much more time to execute the tasks when running on a single node when compared to the command line version, which uses the exact same code? Again, is it an expected result? (please don't get me wrong here, I am asking just to make sure that I will make realistic projections for my needs).

One interesting thing to note is that, although the single node test took twice as long as the simple command line test, the average time to process each step of the task was virtually the same.

Cheers.

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 1472
    • JPPF Web site
Re: Help with understanding performance, settings & more
« Reply #1 on: January 23, 2012, 07:19:15 PM »

Hello,

I would say that these results look as expected.
In the first result you have (with one node), the setup is roughly equivalent to the command-line test you performed, and to that you have to add the JPPF overhead:
- serialization of the tasks on the client side
- transport to the server
- from there transport to the node
- deserialization on the node side
- same thing the other way around

So the JPPF overhead will depend essentially on the size of the data in each task, and the performance of the network between client and server, and server and nodes.
This is also consistent with your mention that "the average time to process each step of the task was virtually the same".
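To put a rough number on that overhead, here is a minimal sketch that subtracts the command-line time from the one-node time reported above. It assumes the one-node figure is about 1,017 seconds (consistent with "twice as much" in the original post) and that the whole difference is attributable to JPPF, which is a simplification. The class and method names are made up for illustration.

```java
public class OverheadEstimate {

    /** Extra wall-clock seconds of the grid run compared with the local run. */
    static double overheadSeconds(double gridSeconds, double localSeconds) {
        return gridSeconds - localSeconds;
    }

    /** The same overhead, averaged over the number of tasks in the job. */
    static double overheadPerTask(double gridSeconds, double localSeconds, int tasks) {
        return overheadSeconds(gridSeconds, localSeconds) / tasks;
    }

    public static void main(String[] args) {
        double local = 513.0;     // command line, 24 threads (from the post)
        double oneNode = 1017.0;  // one JPPF node (assumed reading of the post's figure)
        int tasks = 412;          // number of tasks in the job (from the post)

        System.out.println("total overhead (s): " + overheadSeconds(oneNode, local));
        System.out.println("overhead per task (s): " + overheadPerTask(oneNode, local, tasks));
    }
}
```

With these inputs the difference is about 504 seconds in total, or roughly 1.2 seconds per task. That figure is only an average; the real cost depends on the serialized size of each task and on the network, as described above.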

On the other hand, the results with 3 and 5 nodes appear to show that the performance grows linearly with the number of nodes, which is good news.
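As a quick check of the scaling, the sketch below computes the speedup and the per-node efficiency relative to the one-node run, using the timings from the original post. It assumes the one-node time is about 1,017 seconds; the class and method names are only illustrative.

```java
public class SpeedupSketch {

    /** How many times faster a run is than the baseline run. */
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    /** Speedup divided by the number of nodes; 1.0 means perfectly linear scaling. */
    static double efficiency(double baselineSeconds, double seconds, int nodes) {
        return speedup(baselineSeconds, seconds) / nodes;
    }

    public static void main(String[] args) {
        int[] nodes = {1, 3, 5};
        double[] seconds = {1017.0, 320.0, 210.0}; // approximate timings from the post
        double baseline = seconds[0];

        for (int i = 0; i < nodes.length; i++) {
            System.out.println(nodes[i] + " node(s): speedup="
                + speedup(baseline, seconds[i])
                + ", efficiency=" + efficiency(baseline, seconds[i], nodes[i]));
        }
    }
}
```

This gives a speedup of roughly 3.2 with 3 nodes and 4.8 with 5 nodes. An efficiency slightly above 1.0 for 3 nodes is within the noise of approximate timings and should not be read as super-linear scaling.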

So to answer your first question accurately, it would be great to know a few more things about the tasks and the environment:
- do you have a rough idea of the data size in each task?
- how good is the network communication between EC2 instances?

Given your setup, the main parameter you can tune to increase the throughput will be the number of tasks sent to each node, that is the "strategy.manual.size = X" parameter of the load balancer configuration. If you don't use many nodes, you might want to try and increase that number until you find the "optimal" value for your configuration.
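To see why the bundle size matters, the sketch below counts how many bundles (and therefore round trips between server and nodes) a 412-task job needs for a few candidate values of strategy.manual.size. It assumes the manual algorithm always sends fixed-size bundles; the candidate sizes and the class name are arbitrary choices for illustration, not recommendations.

```java
public class BundleCount {

    /** Number of fixed-size bundles needed to dispatch all tasks (ceiling division). */
    static int bundles(int tasks, int bundleSize) {
        if (tasks < 0 || bundleSize <= 0) {
            throw new IllegalArgumentException("tasks must be >= 0 and bundleSize > 0");
        }
        return (tasks + bundleSize - 1) / bundleSize;
    }

    public static void main(String[] args) {
        int tasks = 412;                    // job size from the original post
        int[] sizes = {24, 48, 96, 412};    // hypothetical values to try

        for (int size : sizes) {
            System.out.println("size=" + size + " -> " + bundles(tasks, size) + " bundle(s)");
        }
    }
}
```

With the current value of 24 the job is split into 18 bundles; doubling it to 48 halves that to 9. Fewer bundles means fewer round trips, but with several nodes a very large bundle can leave some nodes idle, so the best value has to be found by measurement.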
You might also want, if at all possible, to change the code of your tasks to perform some cleanup when each task completes, to limit the amount of data that is sent back to the server and to the client: for instance, clear collections or set to null the instance variables that are not needed as part of the results. You may also use the "transient" qualifier for instance variables that are only meaningful during the computation in the task.
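As an illustration of the cleanup advice, here is a small sketch in plain Java serialization. It does not use the JPPF API: HeavyTask and LeanTask are hypothetical stand-ins for a task class, and the buffer size is arbitrary. It compares the serialized size of a task that keeps its working buffer with one that marks the buffer transient and clears it after the computation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientSketch {

    /** Hypothetical task that keeps its working buffer, so it travels back with the result. */
    static class HeavyTask implements Serializable {
        private static final long serialVersionUID = 1L;
        double[] scratch;   // working data, serialized along with the result
        double result;

        void run() {
            scratch = new double[100_000];
            for (int i = 0; i < scratch.length; i++) scratch[i] = i;
            double sum = 0;
            for (double v : scratch) sum += v;
            result = sum;
        }
    }

    /** Same computation, but the buffer is transient and released once the result is known. */
    static class LeanTask implements Serializable {
        private static final long serialVersionUID = 1L;
        transient double[] scratch;  // skipped by Java serialization
        double result;

        void run() {
            scratch = new double[100_000];
            for (int i = 0; i < scratch.length; i++) scratch[i] = i;
            double sum = 0;
            for (double v : scratch) sum += v;
            result = sum;
            scratch = null;          // explicit cleanup after the computation
        }
    }

    /** Size in bytes of the object once written with standard Java serialization. */
    static int serializedSize(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        HeavyTask heavy = new HeavyTask();
        heavy.run();
        LeanTask lean = new LeanTask();
        lean.run();
        System.out.println("heavy task bytes: " + serializedSize(heavy));
        System.out.println("lean task bytes:  " + serializedSize(lean));
    }
}
```

The heavy task serializes to more than 800,000 bytes because the 100,000-element double array is included, while the lean task carries only its result and class metadata. Both produce the same result value.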

I hope this helps.

Sincerely,
-Laurent

rafaelsteil

  • JPPF Padawan
  • *
  • Posts: 4
Re: Help with understanding performance, settings & more
« Reply #2 on: January 24, 2012, 02:34:33 AM »

Thanks for the reply, it's good to know that the results are OK. I'll let you know of any other relevant information that I find, if any.

Cheers,