JPPF - The open source grid computing solution


Author Topic: Low grid utilization with high connection pool size  (Read 309 times)

webcentric

  • JPPF Master
  • ***
  • Posts: 27
Low grid utilization with high connection pool size
« on: February 10, 2017, 05:11:27 PM »

Hello,

We're running a grid with about 400 nodes. Due to the logic of our application, we have a high number of jobs, each of which has a relatively low number of tasks (usually between 1 and 80, with an average of around 5). Also, our jobs are network I/O bound and last between a couple of seconds and several hundred seconds (job expiration in the client SLA is set to 200s).
Because of that, we're using non-blocking jobs and have set the connection pool size for the driver to 300.

However, running under JPPF 5.2.4, our grid utilization is about 15%-20% (at most 60-80 nodes are executing concurrently, while the rest sit idle).

In our test setup, with around 30 nodes and the same connection pool size, we were getting utilization of over 90%.

In both cases we're using the nodethreads algorithm with a multiplier of 2. Each node has one processing thread, and we run multiple slave nodes on each machine. We also tried the manual profile, but it didn't do anything except lower our utilization a bit in the test setup.
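For reference, the load-balancing setup described above would look roughly like this in the driver configuration (the profile name "nt" is an arbitrary choice here; note that JPPF spells the nodethreads parameter "multiplicator"):

```properties
# driver configuration: nodethreads algorithm, 2 tasks per node processing thread
jppf.load.balancing.algorithm = nodethreads
jppf.load.balancing.profile = nt
jppf.load.balancing.profile.nt.multiplicator = 2
```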

On the other hand, with a similar setup, JPPF 4.2.2 performs better - around 30-40% utilization - but it has a different issue we're trying to solve by upgrading to 5.2.4.

Is JPPF supposed to work with such high connection pool sizes (since each connection in the pool spawns its own thread), and what's the upper limit of connection pool size that makes sense, performance-wise?
Any ideas on how we can tweak / troubleshoot our setup (other than rethinking our jobs so they have more tasks)?

Regards,
  Andrija
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2223
    • JPPF Web site
Re: Low grid utilization with high connection pool size
« Reply #1 on: February 11, 2017, 10:16:57 AM »

Hello Andrija,

15%-20% node usage is definitely not normal behavior. I have tried to reproduce it with a somewhat smaller setup than your production grid, configured as close to yours as I could:
- 1 driver with nodethreads algo and multiplicator=2, -Xmx2g (2 GB heap) in jppf.jvm.options
- 250 nodes with jppf.processing.threads=1, -Xmx128m
- jobs have 5 tasks each, each task sleeps for 3 seconds (almost no CPU usage)
- the client opens 250 connections, submits 10,000 jobs using a streaming pattern with a maximum of 250 concurrent jobs
- client configured with -Xmx512m
Everything runs with Java 8 u121 (64 bits).
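The "streaming pattern with a maximum of 250 concurrent jobs" can be sketched with a plain Semaphore. This is a standalone illustration, not the actual test code: the executor merely stands in for the grid, and in a real client the submission would be a non-blocking JPPFClient.submitJob() with a JobListener releasing the permit in its jobEnded() callback.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamingPattern {

  /** Submits jobCount "jobs", keeping at most maxConcurrent in flight at any time. */
  static int runJobs(int jobCount, int maxConcurrent) throws InterruptedException {
    Semaphore permits = new Semaphore(maxConcurrent);
    // stands in for the JPPF grid in this sketch
    ExecutorService grid = Executors.newFixedThreadPool(maxConcurrent);
    AtomicInteger completed = new AtomicInteger();
    for (int i = 0; i < jobCount; i++) {
      permits.acquire(); // block until one of the maxConcurrent slots frees up
      grid.submit(() -> {
        try {
          // the job's tasks would execute on remote nodes here
          completed.incrementAndGet();
        } finally {
          permits.release(); // in JPPF: done from JobListener.jobEnded()
        }
      });
    }
    permits.acquire(maxConcurrent); // wait until the last jobs complete
    grid.shutdown();
    return completed.get();
  }
}
```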

Under these conditions, I observed 100% node utilization. I monitored with the admin console; in particular, I defined a chart showing idle vs. busy nodes, and as soon as the client started submitting jobs, the number of busy nodes quickly grew to 250 and stayed there until all jobs completed.

Could you provide additional details on your setup? In particular, it would help to know what heap sizes you are using for your driver, node and client JVMs. The allocated heap size is especially important for the driver: the driver can handle workloads much larger than its heap, but in that case it will offload some of the work to disk, which may cause significant slowdowns due to disk I/O (though it's still better than an OutOfMemoryError). That is one scenario that could explain the behavior you observe. Is there any way you can check that, for instance by increasing the heap size for your driver?
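If the driver does turn out to be memory-constrained, the heap can be raised in the driver's configuration; a minimal fragment (assuming the default jppf-driver.properties file):

```properties
# give the driver enough heap to keep queued jobs in memory
# and avoid offloading them to disk
jppf.jvm.options = -server -Xmx2g
```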

Also, do you have a rough idea of how many jobs are executing at any given time? Are there notable peak values?

Quote
Is JPPF supposed to work with such high connection pool sizes (since each connection in the pool spawns its own thread), and what's the upper limit of connection pool size that makes sense, performance-wise?

Yes, JPPF is supposed to work with several hundred connections to the server. There is no theoretical limit to the number of connections, other than the resources of the system the client runs on: RAM, number of cores, file handles, TCP ports, etc.

Thank you for your time,
-Laurent
Logged

webcentric

  • JPPF Master
  • ***
  • Posts: 27
Re: Low grid utilization with high connection pool size
« Reply #2 on: February 14, 2017, 03:34:53 PM »

Hi Laurent,

Our heap settings were:
- 1g at driver
- default (no Xmx set) at client
- 64m at nodes

We tried increasing them, up to 6g at the driver, 2g at the client and 128m at the nodes; however, that didn't yield any improvement.

However, during additional troubleshooting we found the issue: due to the nature of our tasks, some of them had execution policies that required execution on specific nodes. Such tasks blocked enough client connections, while waiting for the required nodes to become available, to degrade performance. During that time, although there were plenty of idle nodes, there weren't enough free connections to use them, as the connections were busy waiting.
When we discarded all forced jobs during troubleshooting, utilization rose to 99%.

The strange thing is that 4.2.2 performs much better than 5.2.4 in this scenario. Not sure why that is...

Do you have any suggestions for a configuration that would minimize the impact of such a setup?
We will probably have to rewrite our client so it minimizes the number of submitted jobs for tasks that are forced onto nodes (i.e., put all tasks forced onto a specific node into one job).
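The grouping idea above can be sketched independently of the JPPF API. TaskSpec is a hypothetical descriptor, where nodeId names the node a task is forced onto (null when it can run anywhere); each resulting group would then become a single JPPF job:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JobGrouping {

  /** Hypothetical task descriptor; nodeId is null when the task may run on any node. */
  record TaskSpec(String name, String nodeId) {}

  /**
   * Groups all tasks forced onto the same node, so that each forced node costs
   * one job (and thus one client connection) instead of one per task.
   */
  static Map<String, List<TaskSpec>> groupByNode(List<TaskSpec> tasks) {
    return tasks.stream()
        .filter(t -> t.nodeId() != null) // unrestricted tasks are submitted separately
        .collect(Collectors.groupingBy(TaskSpec::nodeId));
  }
}
```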

Regards,
  Andrija
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2223
    • JPPF Web site
Re: Low grid utilization with high connection pool size
« Reply #3 on: February 20, 2017, 08:10:24 AM »

Hello Andrija,

Thank you for the explanation of what was going on. There is a way to avoid all connections being blocked by jobs that execute on specific nodes. The idea is to define 2 connection pools in your client configuration, identical except for their name and, optionally, their pool size. The connections in these pools connect to the same driver in exactly the same way, but you can choose either one based on the type of job. You can then partition your set of jobs between the two pools by setting a client-side execution policy requiring each job to be sent only through one of the connection pools.

Basically you would have a configuration like this:
Code: [Select]
jppf.drivers = driver1 driver2

driver1.jppf.server.host = 192.168.1.24
driver1.jppf.server.port = 11111
driver1.jppf.pool.size = 100

driver2.jppf.server.host = 192.168.1.24
driver2.jppf.server.port = 11111
driver2.jppf.pool.size = 50

Then the tricky part is to set something like "pool.name = driver1" on the configuration used by each connection pool: since these properties are actually obtained from the remote driver, they are only available once a connection is established and the handshake is done. We can do this with a combination of a ConnectionPoolListener and a ClientConnectionStatusListener:

Code: [Select]
import org.jppf.client.*;
import org.jppf.client.event.*;
import org.jppf.management.JPPFSystemInformation;
import org.jppf.utils.TypedProperties;

public class MyPoolListener extends ConnectionPoolListenerAdapter implements ClientConnectionStatusListener {
  @Override
  public void connectionAdded(ConnectionPoolEvent event) {
    // add a status listener to the connection upon creation
    event.getConnection().addClientConnectionStatusListener(this);
  }

  @Override
  public void statusChanged(ClientConnectionStatusEvent event) {
    JPPFClientConnection c = (JPPFClientConnection) event.getClientConnectionStatusHandler();
    // when the connection is established
    if (c.getStatus() == JPPFClientConnectionStatus.ACTIVE) {
      JPPFConnectionPool pool = c.getConnectionPool();
      JPPFSystemInformation info = pool.getSystemInfo();
      if (info != null) {
        TypedProperties props = info.getJppf();
        props.setString("pool.name", pool.getName());
      }
    }
  }
}

It is strongly recommended to add this listener upon construction of the JPPFClient, to avoid missing any event:
Code: [Select]
JPPFClient client = new JPPFClient(new MyPoolListener());
Now you can assign to each type of job (with or without node restriction) a client-side execution policy that will route it through the proper connection pool:
Code: [Select]
JPPFJob job = ...;
job.getClientSLA().setExecutionPolicy(new Equal("pool.name", true, "driver2"));

From there on, you will always have connections available for each type of job. The balancing of jobs between the 2 pools will really depend on how many jobs of each type you have and how frequently they are submitted. However, keep in mind that you can always adjust the distribution dynamically, since the size of a connection pool is dynamic and can be set by API:
Code: [Select]
JPPFClient client = ...;
JPPFConnectionPool pool = client.findConnectionPool("driver1");
if (pool != null) pool.setSize(100);

For your convenience, I have attached a working example which applies everything explained above, in the hope that it helps clarify.

Sincerely,
-Laurent
Logged

webcentric

  • JPPF Master
  • ***
  • Posts: 27
Re: Low grid utilization with high connection pool size
« Reply #4 on: March 02, 2017, 11:50:23 PM »

Hi Laurent,

Thanks for the example.

I tried running it, but something doesn't seem right. It does initialize the two connection pools (driver1 and driver2) and dispatches the jobs to the proper pool based on the execution policy.
However, it only actually executes TYPE_1 jobs (using driver1). TYPE_2 jobs (using driver2) do get dispatched, but are never executed on any node, as if no nodes were attached to driver2.

Does this kind of setup require any special configuration on the node end?

Regards,
  Andrija
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2223
    • JPPF Web site
Re: Low grid utilization with high connection pool size
« Reply #5 on: March 05, 2017, 06:17:33 AM »

Hi Andrija,

In the sample, the jobs of type 2 have an execution policy that expects the property "pool.name = driver2" to be set on the node. That's probably why they do not execute on your side. Just add the property to one or more of your nodes and you should see the type 2 jobs run to completion.
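For completeness, making a node eligible for the type 2 jobs only takes one line in that node's configuration file (assuming the default jppf-node.properties):

```properties
# custom property matched by the sample's execution policy for type 2 jobs
pool.name = driver2
```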

The goal was to simulate a scenario similar to yours, where only a subset of the nodes is eligible for some types of jobs.

I hope this clarifies,
-Laurent
Logged