JPPF

The open source grid computing solution


Offline nodes

From JPPF 6.2 Documentation

Revision as of 12:47, 25 August 2015 by Lolocohen (Talk | contribs)




JPPF 4.0 introduced an “offline” mode for nodes. In this mode, a node disconnects from the server before executing its tasks, then reconnects to send the results and get a new set of tasks. The tasks are thus executed offline. As a consequence, the distributed class loader connection is disabled, and so is the JMX-based remote management of the node.

In this mode, the scalability of a single-server JPPF grid is greatly increased, since it becomes possible to have many more nodes than there are available TCP ports on the server. This mode is also particularly well suited to volunteer computing grids, especially when combined with the “Idle Host” mode.

1 Class loading considerations

Since the distributed class loader is disabled for offline nodes, we need another way to ensure that the tasks and their supporting classes are known to the node. There are two solutions for this, which can be used together:

  • statically: the classes can be deployed along with each node and added to its classpath
  • dynamically: the supporting libraries can be sent along with the jobs, using the job's SLA classpath attribute. The nodes have a specific mechanism to add these libraries to the classpath before deserializing and executing the tasks.

The node also needs the JPPF libraries used by the JPPF server in its classpath: jppf-server.jar and jppf-common.jar.

2 Avoiding stuck jobs

Since offline nodes work disconnected most of the time, the server has no way to know the status of a node, and cannot detect whether it has crashed or is unable to reconnect. When this happens, the standard JPPF recovery mechanism, which resubmits the tasks sent to the node when a disconnection is detected, cannot be applied. In turn, this causes the entire job to remain stuck in the server queue, never completing.

To avoid this risk, the job SLA allows you to specify an expiration for all subsets of a job sent to any node. This expiration, specified as either a fixed date or a maximum duration for the job dispatch (i.e. subset of the job sent to a node), will cause the server to consider the job dispatch to have failed, and resubmit or simply cancel the tasks it contains, depending on the maximum number of allowed expirations specified in the SLA.
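
The resubmit-or-cancel decision described above can be sketched as follows. This is a minimal, self-contained simulation and not JPPF code: the class DispatchExpirationPolicy and its members are hypothetical names, and the sketch assumes that a dispatch may expire and be resubmitted up to the configured maximum number of times, with the next expiration cancelling its tasks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the resubmit-or-cancel decision; not part of the JPPF API. */
public class DispatchExpirationPolicy {
  /** Possible outcomes when a job dispatch expires. */
  public enum Outcome { RESUBMIT, CANCEL }

  /** Assumed meaning: how many expirations still lead to a resubmission. */
  private final int maxExpirations;
  /** Number of expirations recorded so far for this dispatch. */
  private int expirations = 0;

  public DispatchExpirationPolicy(int maxExpirations) {
    if (maxExpirations < 0) throw new IllegalArgumentException("maxExpirations must be >= 0");
    this.maxExpirations = maxExpirations;
  }

  /** Record one expiration and decide what happens to the tasks of the dispatch. */
  public Outcome onExpiration() {
    expirations++;
    return expirations <= maxExpirations ? Outcome.RESUBMIT : Outcome.CANCEL;
  }

  public static void main(String[] args) {
    DispatchExpirationPolicy policy = new DispatchExpirationPolicy(2);
    List<Outcome> outcomes = new ArrayList<>();
    // simulate three consecutive expirations of the same dispatch
    for (int i = 0; i < 3; i++) outcomes.add(policy.onExpiration());
    System.out.println(outcomes);
  }
}
```

Under this assumption, a maximum of 2 means that the first two expirations resubmit the tasks and the third one cancels them.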

3 Example: configuring an offline node and submitting a job

To configure an offline node, set the following properties in its configuration file:

# enable offline mode
jppf.node.offline = true
# add the JPPF server libraries to the classpath
jppf.jvm.options = -server -Xmx512m -cp lib/jppf-server.jar -cp lib/jppf-common.jar

Note that we specified the additional libraries as two distinct “-cp” statements, so that the classpath specification does not depend on a platform-specific syntax. On Linux, for instance, we could write this instead:

-cp lib/jppf-server.jar:lib/jppf-common.jar
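
If a single “-cp” option is preferred, for example when the configuration is generated by a tool, the separator can be taken from the JVM instead of being hard-coded. The following is a minimal sketch that relies only on java.io.File.pathSeparator from the standard library; the class ClasspathOption and the helper singleCpOption are hypothetical names used for illustration.

```java
import java.io.File;

/** Hypothetical helper that builds a single "-cp" option; not part of JPPF. */
public class ClasspathOption {
  /** Join the jar paths into one "-cp" option using the given separator. */
  public static String singleCpOption(String separator, String... jars) {
    return "-cp " + String.join(separator, jars);
  }

  public static void main(String[] args) {
    // File.pathSeparator is ":" on Linux and macOS, and ";" on Windows
    String option = singleCpOption(File.pathSeparator,
        "lib/jppf-server.jar", "lib/jppf-common.jar");
    System.out.println(option);
  }
}
```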

To submit a job along with a supporting library:

JPPFJob myJob = new JPPFJob();
ClassPath classpath = myJob.getSLA().getClassPath();
// wrap a jar file into a FileLocation object
Location jarLocation = new FileLocation("libs/MyLib.jar");
// copy the jar file into memory
Location location = new MemoryLocation(jarLocation.toByteArray());
// add it as classpath element
classpath.add("myLib", location);
// tell the node to reset the tasks classloader with this new class path
classpath.setForceClassLoaderReset(true);
// set the job dispatches to expire if they execute for more than 5 seconds
myJob.getSLA().setDispatchExpirationSchedule(new JPPFSchedule(5000L));
// dispatched tasks will be resubmitted at most 2 times before they are cancelled
myJob.getSLA().setMaxDispatchExpirations(2);
// submit the job and wait for the results
JPPFClient client = ...;
List<Task<?>> results = client.submit(myJob);



JPPF Copyright © 2005-2020 JPPF.org