JPPF: The open source grid computing solution


Author Topic: JPPF Server sending Tasks to Peer Server with no Nodes connected  (Read 1483 times)

JppfFan

  • JPPF Padawan
  • *
  • Posts: 12
JPPF Server sending Tasks to Peer Server with no Nodes connected
« on: September 23, 2015, 12:46:03 AM »

Greetings All.  I am seeing my job "hang" as my server is sending tasks to a peer server with no nodes connected.  I would expect that the driver would use its connected nodes to execute tasks instead of sending them to a peer server which cannot execute tasks.  Is there a configuration property or code that I can use to change this behavior?

Here are the details of my environment:
JPPF version: 3.3.7, Build number: 1180, Build date: 2013-11-28 05:39 CET
OS: Windows Server 2012 R2
Java: JDK 1.7.51

Topology:
2 JPPFClientWithFailover (client-1 and client-2): each is on a Glassfish application server instance.  Each is configured to use driver-1 as primary, and then to failover to driver-2.
2 JPPF Drivers (driver-1 and driver-2): each is configured to be a peer server of the other with recovery enabled.
40 JPPF Nodes: each is configured using the "DiscoveryHook.java" approach to connect to driver-1 primarily and to failover connect to driver-2.

Sequence of events:
  • System is in startup state; both clients connected to driver-1 and all 40 nodes connected to driver-1; driver-1 and driver-2 are peer servers of each other.
  • A Job with 200 Tasks is submitted via client-1 to driver-1.
  • 192 Tasks are executed successfully via the nodes.
  • 8 Tasks are sent to driver-2 (peer server) and get stuck there; the job never completes and times out.

I should say that we've had great luck with JPPF and that we are so close to having a topology that survives failovers, recoveries, restarts, etc.  This is hopefully the last piece of the puzzle. 

Laurent, if you are out there; I should also mention that we are running with jppf-3.3.7-patch-02 that you provided us in my previous topic.  I would love to hear from you.

Thanks,
JPPF Fan

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2262
    • JPPF Web site
Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #1 on: September 23, 2015, 08:18:43 AM »

Hi JPPF Fan,

Well, it's been a long-standing problem in JPPF. I have finally registered this as a bug: JPPF-413 Job stuck when dispatched to a peer driver with no node.

In the scenario you describe, an easy fix is for the driver to mark other drivers as peer(s) in the information available to execution policies. Then you can add an execution policy to the job to prevent it from being dispatched to peer drivers, like this:

Code:
import org.jppf.client.JPPFJob;
import org.jppf.node.policy.Equal;
JPPFJob job = new JPPFJob();
job.getSLA().setExecutionPolicy(new Equal("jppf.peer.driver", false));

For this to work, you will need the updated jppf-server.jar, which you can get from here: http://www.jppf.org/private/3.3.7/jppf-server.jar
Would you mind giving it a try? I'll make it an official patch once you confirm that it works.

Sincerely,
-Laurent

PS: nice nickname :)


« Last Edit: September 23, 2015, 12:23:18 PM by lolo »

JppfFan

Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #2 on: September 23, 2015, 03:48:13 PM »

Hi Laurent,
The peer server function is very powerful in JPPF. We want to be able to use it for the case where one of our clients sends a job to a driver that has no nodes connected, but does have a peer server with nodes connected.

We have seen JPPF do this correctly when we do a failover/restart test where we shut down the "primary" driver-1, all nodes disconnect and reconnect to driver-2, then driver-1 comes back up and clients continue to send to driver-1.  Driver-1 has driver-2 as a peer server and uses its nodes. 

So, if we used the execution policy to prevent execution on a peer server, we would lose that resilience and have the same problem.

The execution policy would need an additional check to see that there are nodes connected on the peer server. 

Perhaps there is another way we could get around this issue?  We've thought about attaching 20 nodes to driver-1 and 20 nodes to driver-2 instead of 40 nodes attached to a primary driver-1 with a failover to driver-2.  We're concerned that this option will be slower, as tasks will not get spread across the peer server's nodes quickly enough or with enough parallelism.  For instance, in the failover test above where all 40 nodes are on the peer server, the job took 4 times as long.  We watched the admin console and it appeared that fewer tasks were being executed in parallel (i.e. 2-4 tasks at a time instead of 20 tasks at a time).  Is there something we could change to improve that performance?  (We are using the nodethreads algorithm and want tasks to spread across all the available nodes and use all the resources to finish as fast as possible.)

Thank you for the quick response  :)

lolo

Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #3 on: September 24, 2015, 01:18:26 PM »

Ok, I understand your failover scenario and why my fix doesn't fit in.

The ideal solution would indeed be to keep track of the number of nodes attached to each peer driver. However, it is somewhat complex to add this, and it would have a significant impact on the server code and therefore a high risk of breaking something else. I can implement a full solution for the upcoming version v5.1, but I certainly don't have the availability to implement it in 3.3.7, which is almost 2 years old now and essentially in maintenance mode (bug fixes only).

What I can propose for 3.3.7 is the following:
- provide the number of nodes attached to the peer driver, at handshake time between the 2 drivers
- if it doesn't have too much impact (which I believe it doesn't), also count and provide the total number of processing threads for all the nodes attached to the peer, so the "nodethreads" load-balancer can work effectively. The lack of this information is currently why the "nodethreads" algorithm is not very efficient between peers.
- still depending on the impact on existing code, update the number of nodes/threads after each job execution by the peer driver, so you will have intermittent updates (it is relatively easy to transport additional information within the job header).

This will not cover the following situations:
- when all the nodes attached to the peer fail while a job is executing
- when all the nodes attached to the peer fail between 2 job executions
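To make the proposal concrete, here is a minimal sketch of the decision a driver could take once it knows each peer's node and thread counts. It is not JPPF code: PeerInfo, PeerDispatchSketch, the peer names, the thread counts and the "threads x multiplicator" sizing rule are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of what a driver knows about one peer (not a JPPF class). */
final class PeerInfo {
  final String name;
  final int nodes;    // number of nodes attached to the peer
  final int threads;  // total processing threads across those nodes

  PeerInfo(String name, int nodes, int threads) {
    this.name = name;
    this.nodes = nodes;
    this.threads = threads;
  }
}

final class PeerDispatchSketch {
  /** Keep only the peers that reported at least one attached node. */
  static List<PeerInfo> eligiblePeers(List<PeerInfo> peers) {
    List<PeerInfo> result = new ArrayList<>();
    for (PeerInfo p : peers) {
      if (p.nodes > 0) result.add(p);
    }
    return result;
  }

  /** Assumed sizing rule: tasks per dispatch grow with the reported thread count. */
  static int bundleSize(PeerInfo peer, int multiplicator) {
    return Math.max(1, peer.threads * multiplicator);
  }

  public static void main(String[] args) {
    List<PeerInfo> peers = new ArrayList<>();
    peers.add(new PeerInfo("peer-without-nodes", 0, 0));     // would be skipped
    peers.add(new PeerInfo("peer-with-40-nodes", 40, 160));  // hypothetical: 4 threads per node
    for (PeerInfo p : eligiblePeers(peers)) {
      System.out.println(p.name + " bundle size: " + bundleSize(p, 1));
    }
  }
}
```

Under these assumptions, a peer that reports zero nodes never receives tasks, and a peer that reports many threads receives proportionally larger dispatches. The two uncovered situations listed above still apply, because the counts are only as fresh as the last update.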

What do you think?

-Laurent

JppfFan

Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #4 on: September 24, 2015, 04:07:40 PM »

Hi Laurent,
I have a few questions about updating to v5.1.  When would 5.1 with your fix come out?  Would you include both the fix for the zero-nodes-connected-to-peer issue and also include the enhancement for the nodethreads algorithm to function better with peer drivers?  I read some of your docs on migrating from 3 to 4 and 4 to 5.  Do you think upgrading is just a matter of refactoring to the new class names and methods or are we going to need to rethink everything?

For your suggested 3.3.7 route, I take it that you are recommending we switch to a 50/50 split of the nodes between our two drivers and allow the improved nodethreads algorithm to improve performance?  Or do you mean that the zero-nodes-connected-to-peer issue would be fixed as well? 

Let me know what you think and we'll do some thinking over here.

Thanks,
JPPF Fan

lolo

Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #5 on: September 24, 2015, 08:29:17 PM »

Hi JPPF Fan,

Quote
When would 5.1 with your fix come out?
I'm hoping by end of October at the latest. Until the 5.1 beta release, my focus was mostly on the Android port, but now development will accelerate for the other features and enhancements.

Quote
Would you include both the fix for the zero-nodes-connected-to-peer issue and also include the enhancement for the nodethreads algorithm to function better with peer drivers?
I guess I wasn't clear enough in my previous post, sorry about this. What I'm proposing for 3.3.7 is not an enhancement for the "nodethreads" algorithm, but an enhancement to the driver that will provide additional information about the peers, so it can make the decision to not dispatch jobs to a peer that doesn't have any node. Incidentally, it will also enable the "nodethreads" algorithm to work with more accurate information with regard to the peers.

Quote
Do you think upgrading is just a matter of refactoring to the new class names and methods or are we going to need to rethink everything?
It should be essentially a matter of refactoring, except for ClientWithFailover, which was deprecated in 4.x and disappeared in 5.x. However, we both know it will have an impact on other aspects of your project's lifecycle such as testing and building (for instance the JPPF jars packaging changed in 5.0, as described here).

Quote
... I take it that you are recommending we switch to a 50/50 split of the nodes between our two drivers ...
I'm not recommending either way over the other. I'm simply saying that the enhancement will improve the efficiency of both ways of splitting the nodes. Except for the two situations I described previously, the scenario where a peer has no nodes will also be covered.

I hope this clarifies,
-Laurent

lolo

Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #6 on: September 27, 2015, 07:26:45 PM »

Hi JPPF Fan,

After spending some time on this, I realized that for various reasons, the solution I was considering for 3.3.7 was never going to work in practice.
Therefore, I instead implemented a solution where each driver will poll its peer drivers periodically for the necessary information (number of nodes + sum of their processing threads) via JMX.
I updated the patch 02 for JPPF 3.3.7 with this fix, since the same jars (jppf-common.jar and jppf-server.jar) are impacted. The fix also includes the enhancement we discussed regarding the "nodethreads" algorithm.

The polling period can be set in each driver's configuration file like this:
Code:
jppf.peer.handler.period = 1000
It is expressed in milliseconds and defaults to 1000 ms when unspecified.
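As a rough illustration of the polling approach described in this reply, the sketch below reads the jppf.peer.handler.period property and refreshes a cached node count on a fixed schedule. Only the property name and its 1000 ms default come from the post. PeerPollingSketch and PeerStatsSource are hypothetical names, and the stubbed nodeCount() stands in for the JMX query that the actual patch performs.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PeerPollingSketch {
  /** Hypothetical stand-in for the remote query; the real patch uses JMX. */
  interface PeerStatsSource {
    int nodeCount();
  }

  /** Reads the polling period in milliseconds, defaulting to 1000 when unspecified. */
  static long readPeriod(Properties config) {
    String value = config.getProperty("jppf.peer.handler.period");
    return (value == null) ? 1000L : Long.parseLong(value.trim());
  }

  private final PeerStatsSource source;
  private final AtomicInteger lastKnownNodes = new AtomicInteger(0);

  PeerPollingSketch(PeerStatsSource source) {
    this.source = source;
  }

  /** One polling round: refresh the cached node count. */
  void pollOnce() {
    lastKnownNodes.set(source.nodeCount());
  }

  /** Dispatching to the peer only makes sense if it last reported at least one node. */
  boolean peerCanExecute() {
    return lastKnownNodes.get() > 0;
  }

  /** Schedules pollOnce() repeatedly; the caller is responsible for shutting the executor down. */
  ScheduledExecutorService start(long periodMillis) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(new Runnable() {
      @Override public void run() { pollOnce(); }
    }, 0L, periodMillis, TimeUnit.MILLISECONDS);
    return executor;
  }

  public static void main(String[] args) throws InterruptedException {
    Properties config = new Properties();
    config.setProperty("jppf.peer.handler.period", "1000");
    PeerPollingSketch sketch = new PeerPollingSketch(new PeerStatsSource() {
      @Override public int nodeCount() { return 40; } // stubbed value for the demo
    });
    ScheduledExecutorService executor = sketch.start(readPeriod(config));
    Thread.sleep(200L); // give the first poll time to run
    System.out.println("peer can execute: " + sketch.peerCanExecute());
    executor.shutdownNow();
  }
}
```

With this kind of design, a shorter period narrows the window during which a driver may still dispatch to a peer that has just lost its nodes, at the cost of more frequent remote calls.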

Sincerely,
-Laurent

JppfFan

Re: JPPF Server sending Tasks to Peer Server with no Nodes connected
« Reply #7 on: October 12, 2015, 04:08:40 PM »

Hi Laurent,
I apologize for not getting to this earlier.  We are excited about this and will give it a try.  I have downloaded from the patches page; you guys are well-organized.

Thank you so much,
Jppf Fan