JPPF, java, parallel computing, distributed computing, grid computing, parallel, distributed, cluster, grid, cloud, open source, android, .net
JPPF

The open source
grid computing
solution

 Home   About   Features   Download   Documentation   On Github   Forums 

Grid Failover

What does the sample do?

This demo illustrates a JPPF grid failover mechanism which ensures that a JPPF driver is not a single point of failure.

How does it work?

The failover mechanism relies on two main components:
  • the nodes have a specific node connection strategy which handles an ordered list of drivers to connect to, as follows:
    • the first driver in the list is considered the primary driver, all others are backups in case the primary goes down
    • when initially connecting or reconnecting due to a mangement request, the node attemps to connect to the first (highest priority) driver in the list, then goes down the list if the driver is not online
    • when reconnecting due to a disconnection from the driver, the node goes down the list of drivers, rolling back to the top when the end of the list is reached
  • a JPPF client is used as a controller, to detect when a driver is back up after going down, and decide whether it is the new primary and force the nodes to reconnect to it. This client is used as follows:
    • it is configured to connect to the same drivers as provided to the nodes. Each configured driver has a priority in descending order of the list given to the nodes
    • for each driver connection, a connection status listener is registered
    • the listener detects situations when the primary driver goes down or gets back up and determines which dfriver is the current primary
    • when a new primary driver goes back up, the listener issues a management request to force the nodes to reconnect, once they complete their remaining tasks

How do I run it?

  • Before running this sample application, you must have at least two JPPF servers running.
    For information on how to set up a node and server, please refer to the JPPF documentation.
  • Update the drivers definition file to reflect the information on your JPPF servers
  • build the sample with this command: ant zip. This will create a file named jppf-grid-failover.zip
  • configure a node to use the driver discovery:
    • unzip the jppf-grid-failover.zip file into the root installation directory of a node. It will copy the required libraries into the node's /lib directory, and the drivers.yaml file in the node's root installation directory.
    • in a text editor, open the node's configuration file located at <node_install_root>/config/jppf-node.properties and add this line:
      jppf.server.connection.strategy = org.jppf.example.gridfailover.NodeSideDiscovery
    • start the node: ./startNode.sh or startNode.bat
  • configure an administration console to use the driver discovery:
    • unzip the jppf-grid-failover.zip file into the root installation directory of an adminsitration console. It will copy the required libraries into the console's /lib directory, and the drivers.yaml file in the console's root installation directory.
    • in a text editor, open the admin console's configuration file located at <console_install_root>/config/jppf-gui.properties and add this line: jppf.remote.execution.enabled = false
      this will disable the default built-in discovery mechanism, such that only our custom one is used
    • the client-side discovery auto-installs through the Service Provider Interface (SPI), thus no configuration update is needed for this
    • start the admin console: ./startConsole.sh or startConsole.bat
    • once started, the admin console should show the drivers defined in the drivers.yaml file, along with a node connected to the first driver in the list
  • from this sample's root directory, start the sample grid controller: ./run.sh or run.bat. The console output will show something similar to this:
    client process id: 37556, uuid: A9619789-6E4B-4DA9-B267-FC014A811275
    press [Enter] to exit
    discovered new driver: ClientConnectionPoolInfo[name=primary, secure=false, host=localhost,
     port=11111, priority=2, poolSize=1, jmxPoolSize=1, heartbeatEnabled=false, maxJobs=2147483647]
    discovered new driver: ClientConnectionPoolInfo[name=backup, secure=false, host=localhost,
     port=11112, priority=1, poolSize=1, jmxPoolSize=1, heartbeatEnabled=false, maxJobs=2147483647]
    [client: primary-1 - ClassServer] Attempting connection to the class server at localhost:11111
    [client: backup-1 - ClassServer] Attempting connection to the class server at localhost:11112
    [client: backup-1 - ClassServer] Reconnected to the class server
    [client: primary-1 - ClassServer] Reconnected to the class server
    [client: primary-1 - TasksServer] Attempting connection to the task server at localhost:11111
    [client: backup-1 - TasksServer] Attempting connection to the task server at localhost:11112
    [client: primary-1 - TasksServer] Reconnected to the JPPF task server
    [client: backup-1 - TasksServer] Reconnected to the JPPF task server
    
  • shutdown the primary server:
    • in the administration console's "Tree view", select the driver to which a node is connected
    • click on the "Server restart or shutdown" button ()
    • in the displayed dialog, enter "0" for the "Shutdown delay" and uncheck the "Restart" checkbox:
    • click the "Ok" button to terminate the server
  • you will notice that the node reconnects to the "backup" server
  • now start the terminated server again
  • observe how the node disconnects from the backup server and reconnects to the one that was just restarted

Related source files

I have additional questions and comments, where can I go?

There are 2 privileged places you can go to:


JPPF Copyright © 2005-2020 JPPF.org Get JPPF at SourceForge.net. Fast, secure and Free Open Source software downloads