Is it possible to configure the driver/server so that it makes sure the special 'controller' job is always running?
Is there a good way for a node to detect if a job/task is stuck and if so, terminate/cancel it?
Is it possible to send a message/command to a node where a job is running?
What happens if the driver/server goes down? Do all nodes continue to run and reconnect when the driver comes back? Or will nodes be forced to restart?
Can a node (i.e. controller) also be a client that submits jobs?
Is there a way to have jobs assigned so that the driver will send those jobs to nodes on different machines?
That's what a driver does by default: if a node dies, the driver resubmits the job it was executing to another node automatically. There's nothing else to do, as it's the built-in behavior.
Task timeout / NodeLifeCycleListener
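To illustrate the task-timeout idea on its own, here is a minimal sketch in plain java.util.concurrent. It is not JPPF's timeout mechanism and uses no JPPF classes; the StuckTaskGuard class and the runWithTimeout method are hypothetical names made up for this example. It only shows the general pattern: wait a bounded time for the work, and if it does not finish, interrupt it and report a fallback value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JPPF: bounds how long a piece of work may run.
public class StuckTaskGuard {

  // Runs 'work' on a separate thread and waits at most 'timeoutMillis'.
  // Returns the work's result, or 'onTimeout' if the deadline passes first.
  public static <T> T runWithTimeout(Callable<T> work, long timeoutMillis, T onTimeout)
      throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<T> future = executor.submit(work);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The work is considered stuck: interrupt its thread and give up on it.
      future.cancel(true);
      return onTimeout;
    } catch (ExecutionException e) {
      // The work itself failed: surface its own exception where possible.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) throw (Exception) cause;
      throw e;
    } finally {
      // Always release the worker thread.
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    String fast = runWithTimeout(() -> "done", 1000L, "timed out");
    String stuck = runWithTimeout(() -> {
      Thread.sleep(60_000L); // simulates a task that hangs
      return "done";
    }, 200L, "timed out");
    System.out.println(fast);  // prints "done"
    System.out.println(stuck); // prints "timed out"
  }
}
```

Note that interruption only stops work that responds to it (blocking calls such as sleep, or code that checks the interrupt flag); a task spinning in a tight loop would need its own cancellation check.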
Node Management API
Is there a way ... to have the driver start up ... and by itself pickup and submit that initial job (i.e. bootstrap)?
import java.util.List;

import org.jppf.client.*;
import org.jppf.server.protocol.JPPFTask;
import org.jppf.startup.JPPFDriverStartupSPI;

public class MyStartup implements JPPFDriverStartupSPI {
  @Override
  public void run() {
    System.out.println("running MyStartup");
    try {
      final JPPFClient client = new JPPFClient();
      final JPPFJob job = new JPPFJob();
      job.setName("bootstrap job");
      job.addTask(new MyTask());
      // Submit from a separate thread so the driver startup is not blocked
      // while waiting for the job results.
      Runnable jobSubmission = new Runnable() {
        @Override
        public void run() {
          try {
            List<JPPFTask> results = client.submit(job);
            JPPFTask t = results.get(0);
            if (t.getException() != null) throw t.getException();
            else System.out.println("result: " + t.getResult());
          } catch (Exception e) {
            e.printStackTrace();
          } finally {
            // Close the client whether or not the job succeeded.
            client.close();
          }
        }
      };
      new Thread(jobSubmission).start();
    } catch (Throwable e) {
      e.printStackTrace();
    }
  }

  private static class MyTask extends JPPFTask {
    @Override
    public void run() {
      System.out.println("hello bootstrap world");
      setResult("execution successful");
    }
  }
}
At this point, I was thinking that there would be 1 job with 1 task that would basically run forever if everything is going well or until it's cancelled by the controller.
Would a client or in our case the controller node be able to 'discover' all the other nodes in the network and find out which node is running which job via JPPFDriverAdmin.nodesInformation, JPPFDriver.getJobManager().getAllJobIds() and .getNodesForJob()?
I'm assuming that the driver itself doesn't do a lot of things other than take the jobs clients send and distribute them and send back statuses.