JPPF: The open source grid computing solution


Topic: Server / Nodes Don't Appear to Respect Idle Socket Timeout Setting

djroze

  • JPPF Knight
  • Posts: 20

Hi there,

   I have been trying to keep peer-to-peer server and server-node connections active across network boundaries, but the connections appear to go stale over time. After checking the documentation and other forum posts, I set jppf.socket.max-idle to a value greater than 10 on the client, the servers, and the nodes, but this setting doesn't seem to fix the stale peer-to-peer server and server-node connections. From the server configuration documentation page (http://www.jppf.org/doc/v3/index.php?title=Configuring_a_JPPF_server):

Quote
To remedy that situation, it is possible to configure an idle timeout on either side of the connection, so that the connection can be closed cleanly and grid operations can continue unhindered. This is done via the following property:
jppf.socket.max-idle = timeout_in_seconds
If the timeout value is less than 10 seconds, then it is considered as no timeout.
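Read literally, the quoted rule says that a missing value, or any value under 10 seconds, means "no idle timeout". The Java sketch below only illustrates that rule: `IdleTimeoutConfig` and `effectiveIdleTimeoutSeconds` are hypothetical names, not JPPF API, and using -1 for "no timeout" is an assumption borrowed from the client-side default visible in the grep output. In JPPF 3.3.6, only the client code actually reads this property.

```java
import java.util.Properties;

/** Hypothetical helper illustrating the documented jppf.socket.max-idle rule; NOT part of JPPF. */
public class IdleTimeoutConfig {
    static final String MAX_IDLE_PROPERTY = "jppf.socket.max-idle";
    /** Assumed marker for "no idle timeout" (mirrors the -1 default seen in the client code). */
    static final long NO_TIMEOUT = -1L;
    /** Documented threshold: values below 10 seconds disable the timeout. */
    static final long MIN_TIMEOUT_SECONDS = 10L;

    /** Returns the idle timeout in seconds, or NO_TIMEOUT if absent, malformed, or below 10 seconds. */
    static long effectiveIdleTimeoutSeconds(Properties props) {
        String raw = props.getProperty(MAX_IDLE_PROPERTY);
        if (raw == null) {
            return NO_TIMEOUT;
        }
        long seconds;
        try {
            seconds = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return NO_TIMEOUT; // assumption: treat a malformed value like a missing one
        }
        return seconds < MIN_TIMEOUT_SECONDS ? NO_TIMEOUT : seconds;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_IDLE_PROPERTY, "60");
        System.out.println(effectiveIdleTimeoutSeconds(props)); // prints 60
        props.setProperty(MAX_IDLE_PROPERTY, "5");
        System.out.println(effectiveIdleTimeoutSeconds(props)); // prints -1 (below the 10 s threshold)
    }
}
```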

   When I looked at the source code to confirm that I was specifying the right setting, I couldn't find any reference to this property outside the client code. Searching in the unzipped source directory:

Quote
JPPF-3.3.6-full-src$ grep -R "max-idle" *
Binary file client/classes/org/jppf/client/AbstractClientConnectionHandler.class matches
client/src/java/org/jppf/client/AbstractClientConnectionHandler.java:    long configSocketIdle = JPPFConfiguration.getProperties().getLong("jppf.socket.max-idle", -1L);

   Neither my server nor my node processes include the jppf-client JAR on the classpath, so it looks to me like they aren't loading any code that honors this setting. Can you confirm whether "jppf.socket.max-idle" is the right setting to use on servers and nodes to keep broken connections from going undetected?

Thanks in advance,
Daniel

lolo

  • Administrator
  • JPPF Council Member
  • Posts: 2272
Re: Server / Nodes Don't Appear to Respect Idle Socket Timeout Setting
« Reply #1 on: November 23, 2013, 03:47:51 PM »

Hi Daniel,

Well, this is embarrassing. As you found out, the documentation is completely wrong about this feature, which was indeed only implemented in the JPPF client. I registered the bug report JPPF-200 "Documentation incorrectly states that idle socket timeout is working for server and nodes" for this.

To detect broken connections between servers and nodes, there is a heartbeat-based recovery mechanism. However, it only works between servers and real nodes, not between servers. I have registered a feature request for this as well, for the 4.0 milestone: JPPF-201 "Recovery from hard failures between servers". As this is a non-trivial feature, I honestly cannot implement it as an enhancement in the next maintenance release, so it will have to wait for JPPF 4.0, which I hope to deliver by Christmas. I hope this is acceptable to you.

Sincerely,
-Laurent

djroze

  • JPPF Knight
  • Posts: 20
Re: Server / Nodes Don't Appear to Respect Idle Socket Timeout Setting
« Reply #2 on: November 25, 2013, 11:17:35 PM »

Hi Laurent,

   OK, thanks for confirming the issue and letting me know the situation. Is there any reason I wouldn't be able to connect a node across a network boundary where the node can reach the class loader and JMX ports on the server, but the server cannot initiate connections to the node? Is the ping recovery mechanism sufficient to keep the server-node connection alive in this scenario, or is it necessary to use peer-to-peer server connections across network boundaries, as the network diagram encourages? I'm not concerned about hardware failures; I just want to make sure that the server and node automatically reconnect to one another if the network connection is broken. If the nodes can automatically reconnect from behind a firewall, then I should be able to make do with just one server and some nodes connecting to it from inside another network until JPPF-201 is available.

Thanks in advance,
Daniel

lolo

  • Administrator
  • JPPF Council Member
  • Posts: 2272
Re: Server / Nodes Don't Appear to Respect Idle Socket Timeout Setting
« Reply #3 on: November 26, 2013, 06:47:35 AM »

Hi Daniel,

To clarify, the only connection the server initiates to a node is the one to the node's JMX port. If you don't need the node management features, this won't be a problem. Furthermore, there is no JMX connection between servers. So, as long as the nodes are able to reach the server, no matter which network or subnet they're on, you may not need multiple servers in a p2p topology.
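Since everything except the JMX connection is opened from the node side, a quick way to validate such a setup is to check, from the node's network, that an outbound TCP connection to the server's port succeeds. The sketch below uses only the JDK; `ReachabilityCheck` and `canConnect` are hypothetical names, and the host and port defaults in `main` are placeholders to replace with your server's actual address and configured port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical outbound-reachability check, run from the node's network; NOT part of JPPF. */
public class ReachabilityCheck {
    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true; // connection established; try-with-resources closes it again
        } catch (IOException e) {
            return false; // refused, timed out, unknown host, or blocked by a firewall
        }
    }

    public static void main(String[] args) {
        // Placeholders: pass the real server host and port as arguments.
        String host = args.length > 0 ? args[0] : "jppf-server.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 11111;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

This only proves that the TCP handshake gets through; it says nothing about whether the connection survives long idle periods, which is what the keepalive and ping discussion addresses.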

The ping mechanism is designed to supplement the built-in connection failure detection in conditions where the TCP protocol does not detect that a connection is broken. This generally happens when there is a hardware failure, for instance when a network cable is unplugged on one side or a router dies. The ping ensures that the broken connection is detected and that the node attempts to reestablish a connection to the server. However, if the failure condition persists, the node may not be able to reconnect.
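The general idea behind ping-based detection is that a peer which stays silent for longer than some threshold is presumed dead, even though TCP itself has not reported an error. The class below is a hypothetical, simplified illustration of that idea in plain Java, not JPPF's actual heartbeat implementation; the names and the 30-second timeout in `main` are assumptions, and the clock is injectable so the logic can be exercised without waiting.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical heartbeat monitor illustrating ping-based detection of silently
 * broken connections. This is NOT JPPF's implementation.
 */
public class HeartbeatMonitor {
    private final long timeoutMillis;   // how much silence is tolerated before suspecting the link
    private final LongSupplier clock;   // time source in milliseconds (injectable for testing)
    private long lastHeardMillis;       // when the peer last answered a ping

    HeartbeatMonitor(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastHeardMillis = clock.getAsLong();
    }

    /** Call whenever a ping response (or any other traffic) arrives from the peer. */
    void onHeartbeat() {
        lastHeardMillis = clock.getAsLong();
    }

    /** True once the peer has been silent longer than the timeout: close and try to reconnect. */
    boolean isConnectionSuspect() {
        return clock.getAsLong() - lastHeardMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock for the demo
        HeartbeatMonitor monitor = new HeartbeatMonitor(30_000L, () -> now[0]);
        now[0] = 20_000L;
        System.out.println(monitor.isConnectionSuspect()); // prints false: 20 s of silence
        monitor.onHeartbeat();
        now[0] = 55_000L;
        System.out.println(monitor.isConnectionSuspect()); // prints true: 35 s since last heartbeat
    }
}
```

As the reply above notes, detection is only half the story: if the underlying failure persists, a reconnection attempt triggered by this kind of check can still fail.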

Additionally, you might want to enable the TCP keepalive property on your connections. It is only documented in the configuration properties reference, but it is available for all connections between JPPF clients, servers (including server to server), and nodes.
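At the JDK level, TCP keepalive is the SO_KEEPALIVE socket option, enabled with `java.net.Socket.setKeepAlive(true)`. That JPPF's configuration property maps onto this call is an assumption; the sketch below only shows the plain JDK mechanism, and `KeepAliveDemo` is a hypothetical name. Note that the probe timing is controlled by the operating system, not by Java: on Linux, `net.ipv4.tcp_keepalive_time` defaults to 7200 seconds, so a dead peer may go unnoticed for over two hours unless that value is tuned.

```java
import java.io.IOException;
import java.net.Socket;

/** Minimal illustration of enabling TCP keepalive (SO_KEEPALIVE) on a plain JDK socket. */
public class KeepAliveDemo {
    /** Creates an unconnected socket with keepalive enabled; connect it afterwards. */
    static Socket newKeepAliveSocket() throws IOException {
        Socket socket = new Socket();
        socket.setKeepAlive(true); // the OS will send probes after the connection has been idle
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newKeepAliveSocket()) {
            System.out.println("SO_KEEPALIVE enabled: " + socket.getKeepAlive());
        }
    }
}
```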

Sincerely,
-Laurent

djroze

  • JPPF Knight
  • Posts: 20
Re: Server / Nodes Don't Appear to Respect Idle Socket Timeout Setting
« Reply #4 on: December 10, 2013, 08:07:36 PM »

Hi Laurent,

   Thanks for the clarification regarding the recovery scenarios and the ping mechanism. Extra thanks for mentioning the TCP keepalive configuration property; that may be what I've been looking for all along. With it enabled, the nodes now seem to stay in contact with the server rather than silently disconnecting, so I think that's progress. I am seeing other errors crop up related to class loading, but I will open a separate thread to troubleshoot those. Thanks!

- Daniel