JPPF

The open source
grid computing
solution

Topic: How to configure cluster to complete job in the event of any hardware failure

kilik

  • JPPF Knight
  • **
  • Posts: 16

I have a job whose execution results I don't need, but I want the job to execute even in the event of a hardware failure on the client, server or node side.

On the client side, setCancelUponClientDisconnect(false) lets the job keep executing even if the client disconnects because of a hardware failure on the client machine.
On the server side, enabling recovery from hardware failures on the nodes covers a node failure.
But what if the server machine itself encounters a hardware failure?
« Last Edit: July 05, 2012, 07:19:49 AM by kilik »

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2272
    • JPPF Web site

Hello,

As you have stated, the JPPF client currently cannot recover from a hard failure of the server. I have registered a feature request for this: 3540739 - Client side recovery from hardware failure. Given our current bandwidth, I cannot promise this will be implemented anytime soon.
Something you might consider to mitigate the risk of such an event is the possibility to persist jobs and their results. We also have a related sample that demonstrates how it works and provides an example implementation.

I hope this helps.

Sincerely,
-Laurent

kilik

  • JPPF Knight
  • **
  • Posts: 16

Hi Laurent,

Thanks for your suggestion. As far as I understand, the job recovery sample you mentioned demonstrates manual recovery. The client cannot recover the job automatically, since it cannot detect the broken connection without using heartbeats. Manual recovery is not suitable in some cases.
In effect, the server becomes the single point of failure of a JPPF cluster. If there were a hot-standby server synchronizing its state with the primary server, fail-over could take place in the event of a server failure.
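
To make the heartbeat idea concrete, here is a minimal sketch of heartbeat-based failure detection in plain Java. It is only an illustration of the general technique, not JPPF code: the class HeartbeatMonitor, its methods and its parameters are hypothetical names, and the clock is injected so that the logic can be checked without waiting in real time.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, NOT part of the JPPF API: it considers a peer
// (for example the server) dead once no heartbeat has been received
// for maxMissed consecutive heartbeat intervals.
final class HeartbeatMonitor {
    private final long intervalMillis;   // expected delay between two heartbeats
    private final int maxMissed;         // missed heartbeats tolerated before giving up
    private final LongSupplier clock;    // time source in milliseconds (injectable)
    private long lastBeatMillis;         // time of the most recent heartbeat

    HeartbeatMonitor(long intervalMillis, int maxMissed, LongSupplier clock) {
        if (intervalMillis <= 0 || maxMissed <= 0) {
            throw new IllegalArgumentException("interval and maxMissed must be positive");
        }
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
        this.clock = clock;
        // Treat the moment of creation as the first heartbeat.
        this.lastBeatMillis = clock.getAsLong();
    }

    // Call this each time a heartbeat message arrives from the peer.
    synchronized void onHeartbeat() {
        lastBeatMillis = clock.getAsLong();
    }

    // True while the silence is shorter than maxMissed full intervals.
    synchronized boolean isPeerAlive() {
        long silentFor = clock.getAsLong() - lastBeatMillis;
        return silentFor < intervalMillis * maxMissed;
    }
}
```

In a real client the clock would be something like System::currentTimeMillis, and a background thread would poll isPeerAlive(). Once it returns false, the client could resubmit the job to a standby server, assuming one is available.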
« Last Edit: July 06, 2012, 04:37:34 PM by kilik »