Offline nodes
From JPPF 6.2 Documentation
JPPF 4.0 introduced a new “offline” mode for the nodes: in this mode, the nodes will disconnect from the server before executing the tasks, then reconnect to send the results and get a new set of tasks. The tasks are thus executed offline. As a consequence, the distributed class loader connection is disabled, and so is the JMX-based remote management of the node.
In this mode, the scalability of a single-server JPPF grid is greatly increased, since it becomes possible to have many more nodes than there are available TCP ports on the server. This mode is also particularly well suited to volunteer computing grids, especially when combined with the “Idle Host” mode.
1 Class loading considerations
Since the dynamic class loader is disabled for offline nodes, we need another way to ensure that the tasks and supporting classes are known to the node. There are two solutions for this, which can be used together:
- statically: the classes can be deployed along with each node and added to its classpath
- dynamically: the supporting libraries can be sent along with the jobs, using the job's SLA classpath attribute. The nodes have a specific mechanism to add these libraries to the classpath before deserializing and executing the tasks.
Additionally, the node will also need the JPPF libraries used by the JPPF server in its classpath: jppf-server.jar and jppf-common.jar.
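To illustrate the dynamic option, here is a minimal sketch of the general technique of adding libraries received as bytes to a class loader before any task class is resolved. It is not JPPF's actual mechanism: the class name JobClassPathSketch and the method loaderFor are hypothetical, and writing the bytes to temporary files is only one possible approach.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, NOT part of the JPPF API.
final class JobClassPathSketch {
  /**
   * Writes each in-memory library to a temporary .jar file and returns
   * a class loader that searches those files after the given parent.
   */
  static URLClassLoader loaderFor(List<byte[]> jars, ClassLoader parent) {
    URL[] urls = new URL[jars.size()];
    try {
      for (int i = 0; i < jars.size(); i++) {
        Path tmp = Files.createTempFile("job-lib-", ".jar");
        tmp.toFile().deleteOnExit();   // clean up when the JVM exits
        Files.write(tmp, jars.get(i)); // persist the library bytes
        urls[i] = tmp.toUri().toURL();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new URLClassLoader(urls, parent);
  }
}
```

A node-side component could then deserialize the tasks with the returned loader, so that classes found in the job's libraries resolve without any connection to the server.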
2 Avoiding stuck jobs
Since offline nodes work disconnected most of the time, the server has no way to know the status of a node, nor to detect whether it has crashed or is unable to reconnect. If that happens, the standard JPPF recovery mechanism, which resubmits the tasks sent to a node when a disconnection is detected, cannot be applied. In turn, the entire job remains stuck in the server queue and never completes.
To avoid this risk, the job SLA allows you to specify an expiration for all subsets of a job sent to any node. This expiration, specified as either a fixed date or a maximum duration for the job dispatch (i.e. subset of the job sent to a node), will cause the server to consider the job dispatch to have failed, and resubmit or simply cancel the tasks it contains, depending on the maximum number of allowed expirations specified in the SLA.
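The decision described above can be sketched as a small function. This is an illustration only, not the server's actual code: the names DispatchExpirationSketch, Action and onCheck are hypothetical, and the counting rule (resubmit while fewer than the maximum number of expirations have occurred, cancel afterwards) is an assumption based on the description of the SLA.

```java
// Hypothetical model of the dispatch expiration decision, NOT JPPF code.
final class DispatchExpirationSketch {
  enum Action { WAIT, RESUBMIT, CANCEL }

  /**
   * @param elapsedMillis    time since the dispatch was sent to the node
   * @param timeoutMillis    maximum duration allowed for a dispatch
   * @param expirationsSoFar how many times these tasks have already expired
   * @param maxExpirations   maximum number of allowed expirations
   */
  static Action onCheck(long elapsedMillis, long timeoutMillis,
                        int expirationsSoFar, int maxExpirations) {
    if (elapsedMillis < timeoutMillis) {
      return Action.WAIT; // the dispatch has not expired yet
    }
    // The dispatch expired: resubmit while the limit allows it, else cancel.
    return expirationsSoFar < maxExpirations ? Action.RESUBMIT : Action.CANCEL;
  }
}
```

Under this assumption, a limit of 2 means the tasks are resubmitted after their first and second expirations and cancelled after the third.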
3 Example: configuring an offline node and submitting a job
To configure an offline node, set the following properties in its configuration file:
# enable offline mode
jppf.node.offline = true
# add the JPPF server libraries to the classpath
jppf.jvm.options = -server -Xmx512m -cp lib/jppf-server.jar -cp lib/jppf-common.jar
Note that we specified the additional libraries as two distinct “-cp” statements, so that the classpath specification does not depend on a platform-specific syntax. For instance, on Linux we could instead write:
-cp lib/jppf-server.jar:lib/jppf-common.jar
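If the classpath option is generated programmatically, for example by a deployment tool, the platform-specific separator can be obtained from the JDK instead of being hard-coded. The sketch below only relies on java.io.File.pathSeparator; the class name ClassPathJoin and the method join are hypothetical.

```java
import java.io.File;

// Hypothetical helper for building a classpath string portably.
final class ClassPathJoin {
  /** Joins the entries with the separator of the current platform. */
  static String join(String... entries) {
    return String.join(File.pathSeparator, entries);
  }
}
```

Calling ClassPathJoin.join("lib/jppf-server.jar", "lib/jppf-common.jar") yields a single value usable after one “-cp” flag on the platform where the code runs.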
To submit a job along with a supporting library:
JPPFJob myJob = new JPPFJob();
ClassPath classpath = myJob.getSLA().getClassPath();
// wrap a jar file into a FileLocation object
Location jarLocation = new FileLocation("libs/MyLib.jar");
// copy the jar file into memory
Location location = new MemoryLocation(jarLocation.toByteArray());
// add it as classpath element
classpath.add("myLib", location);
// tell the node to reset the tasks classloader with this new class path
classpath.setForceClassLoaderReset(true);
// set the job dispatches to expire if they execute for more than 5 seconds
myJob.getSLA().setDispatchExpirationSchedule(new JPPFSchedule(5000L));
// dispatched tasks will be resubmitted at most 2 times before they are cancelled
myJob.getSLA().setMaxDispatchExpirations(2);
// submit the job
JPPFClient client = ...;
List<JPPFTask> results = client.submit(myJob);