
Topic: Is there a way to change the SLA of a job that was submitted to the JPPF server?

boris.klug

  • JPPF Master
  • ***
  • Posts: 41

Hello,

My question is already in the subject: is there a way to change the SLA of a job that was submitted to the JPPF server?


Maybe I should tell you why I want to do this:

We have a quite big (from our perspective!) jppf node network with 80 cores and we will double the cores in the next days.

Today we have one JPPF server. If two or more people want to use the JPPF network, they can specify the nodes where the calculation will take place, so the first program can use the first 50% of the nodes and the second the other 50%. That's OK.

The problem is that dividing the JPPF network this way is static, not dynamic. If one program quits, the remaining program will still only use 50% of the nodes. It also requires a lot of coordination.

So my idea is to write a kind of control program for the JPPF server: all jobs should be submitted to the JPPF server without any knowledge of whether another program is running or planned.

The jobs should be in suspended mode. The control program monitors the JPPF server, so it knows how many jobs are running and how many are waiting. Then the control program configures the jobs and the network so that 100% of the nodes are always used and all programs use the same number of nodes.

An example:

We run calculations only on slave nodes, which are started before the calculation program begins and shut down after it completes.

When the first program starts and submits a job to the JPPF server, a slave node is started on every (master) node and the job is set to running mode (suspended = false). The tasks of the job will run on all nodes (100%).

When a second job is scheduled, the control program will detect it and shut down 50% of the slave nodes. The first program now runs on only 50% of the nodes. New slave nodes will be started on the freed 50% (see *1) and the second program will use this 50%.
When one of the two programs finishes, its 50% of the slaves are shut down and new slave nodes are started so that the remaining program can again use 100%.
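
The even split in this example can be sketched in a few lines of plain Java. This is only an illustration and does not use any JPPF API: the class name EvenNodeSplit and the method split are hypothetical, and the figure of 80 slave slots simply assumes one slot per core.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JPPF: computes how many slave slots each job gets.
final class EvenNodeSplit {

    /**
     * Splits totalSlots as evenly as possible across jobCount active jobs.
     * When the division is not exact, the first (totalSlots % jobCount) jobs get one extra slot.
     */
    static List<Integer> split(int totalSlots, int jobCount) {
        if (totalSlots < 0 || jobCount < 0) {
            throw new IllegalArgumentException("totalSlots and jobCount must be >= 0");
        }
        List<Integer> shares = new ArrayList<>();
        if (jobCount == 0) {
            return shares; // no active job: nothing to allocate
        }
        int base = totalSlots / jobCount;
        int remainder = totalSlots % jobCount;
        for (int i = 0; i < jobCount; i++) {
            shares.add(base + (i < remainder ? 1 : 0));
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(split(80, 1)); // [80]
        System.out.println(split(80, 2)); // [40, 40]
        System.out.println(split(80, 3)); // [27, 27, 26]
    }
}
```

With one job the whole network is used, with two jobs each gets half, and the shares always add up to the total, so no slot stays idle while at least one job is active.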

First: What do you think about this idea?

For this idea to work, it would be good if the SLA of a job that is already on the server could be changed by the control program.

It would be great if you can help me with this.

(*1) We start new nodes rather than reusing the already running ones, because new nodes get a new uuid, which is used in the job SLA to direct the tasks to the right slave nodes. Also, the configuration of the nodes (number of threads, memory, ...) can be different.


lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2272
    • JPPF Web site

Hello,

Sorry for the late answer. As I wasn't too sure how this could be implemented, I worked on an implementation to get a feel for how it could work.
First off, the ability to change the SLA of a job is a very good idea and I registered this feature request: JPPF-401 Ability to dynamically change the SLA and metadata of a job in the server. Thank you for that.

However, the solution I implemented does not need to change the SLA of the jobs. Instead, it uses the ability to add/override configuration properties in slave nodes and adds the property "job.uuid = <job_uuid>", so each slave is dedicated to a single job. As a consequence, the jobs must have an execution policy based on this property: new Equal("job.uuid", false, job.getUuid()).
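
To illustrate why tagging each slave with a "job.uuid" property is enough to dedicate it to one job, here is a minimal plain-Java simulation of the matching. It does not use the JPPF execution policy classes: JobUuidMatching, equalTo and eligibleNodes are hypothetical names, and nodes are modelled simply as maps of configuration properties.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical simulation, not JPPF code: each node is a map of configuration properties.
final class JobUuidMatching {

    /** Builds a policy that accepts a node only if its property 'key' equals 'expected'. */
    static Predicate<Map<String, String>> equalTo(String key, String expected) {
        return props -> expected.equals(props.get(key));
    }

    /** Returns the names (sorted) of the nodes whose properties satisfy the policy. */
    static List<String> eligibleNodes(Map<String, Map<String, String>> nodes,
                                      Predicate<Map<String, String>> policy) {
        return nodes.entrySet().stream()
                .filter(entry -> policy.test(entry.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Three slaves; two were started with job.uuid = job-A, one with job.uuid = job-B.
        Map<String, Map<String, String>> nodes = Map.of(
                "slave-1", Map.of("job.uuid", "job-A"),
                "slave-2", Map.of("job.uuid", "job-B"),
                "slave-3", Map.of("job.uuid", "job-A"));
        System.out.println(eligibleNodes(nodes, equalTo("job.uuid", "job-A"))); // [slave-1, slave-3]
    }
}
```

Because the policy only looks at a property of the node, redirecting capacity to another job means restarting slaves with a different property value, and the job itself never has to change.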

The controller is implemented as a job notifications listener and does the following:
- it is installed in the driver via a startup class
- upon the JOB_QUEUED and JOB_ENDED notifications, it performs a re-allocation of the slave nodes by computing how many and which slave nodes to allocate to each job; in other words it performs a segmentation of the nodes per job.
- based on the computed segmentation, it stops existing slaves and starts new ones as needed.
- the jobs are submitted in suspended mode, and the controller resumes the job upon the JOB_QUEUED notification, once it has checked that slave nodes have been stopped and/or started appropriately for the job.
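
The re-allocation logic in the steps above can be sketched as a small state machine in plain Java. This is not the code from the zip file and it does not use the JPPF listener API: AllocationController, onJobQueued, onJobEnded and jobForSlot are hypothetical names, slave slots are just array indices, and the even split is an assumption. Each event returns the slots whose slave would have to be stopped and restarted with a new "job.uuid" value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not JPPF code: tracks which job each slave slot is dedicated to.
final class AllocationController {
    private final int slotCount;
    private final List<String> activeJobs = new ArrayList<>(); // job uuids, in arrival order
    private String[] assignment;                               // slot index -> job uuid, null = idle

    AllocationController(int slotCount) {
        this.slotCount = slotCount;
        this.assignment = new String[slotCount];
    }

    /** Mirrors a "job queued" event; returns the slots whose slave must be restarted. */
    List<Integer> onJobQueued(String jobUuid) {
        activeJobs.add(jobUuid);
        return reallocate();
    }

    /** Mirrors a "job ended" event; returns the slots whose slave must be restarted. */
    List<Integer> onJobEnded(String jobUuid) {
        activeJobs.remove(jobUuid);
        return reallocate();
    }

    String jobForSlot(int slot) {
        return assignment[slot];
    }

    /** Recomputes an even segmentation and reports every slot whose assigned job changed. */
    private List<Integer> reallocate() {
        String[] next = new String[slotCount];
        int jobs = activeJobs.size();
        if (jobs > 0) {
            int base = slotCount / jobs;
            int extra = slotCount % jobs;
            int slot = 0;
            for (int j = 0; j < jobs; j++) {
                int share = base + (j < extra ? 1 : 0);
                for (int k = 0; k < share; k++) {
                    next[slot++] = activeJobs.get(j);
                }
            }
        }
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < slotCount; i++) {
            if (!Objects.equals(assignment[i], next[i])) {
                changed.add(i);
            }
        }
        assignment = next;
        return changed;
    }
}
```

For example, with 4 slots, queuing job "A" changes slots 0 to 3, queuing "B" changes only slots 2 and 3, and ending "A" changes slots 0 and 1. In a real controller the job would be resumed only after the slaves for the changed slots have actually been restarted.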

The source of this solution (for JPPF 5.0.3) can be downloaded from here: http://www.jppf.org/private/5.0.3/nodes_allocation.zip
It also includes a patched jppf-node.jar for the nodes and driver, since unfortunately I uncovered bug JPPF-402 in the provisioning code, which I had to fix. I hope the code and its comments will be enough for you to understand how it works; otherwise don't hesitate to get back to me in this discussion thread.

Additionally, make sure that the jobs are submitted with at least the following SLA:
Code:
JPPFJob job = ...;
ExecutionPolicy policy = new Equal("jppf.node.provisioning.slave", true).and(new Equal("job.uuid", false, job.getUuid()));
job.getSLA().setExecutionPolicy(policy);
job.getSLA().setSuspended(true);

Sincerely,
-Laurent

boris.klug


Thanks for the reply, it is very insightful. I have implemented it in a very similar way, but not as a job notification listener in a startup class.

I implemented it as a normal program that connects to the driver and monitors the jobs that are queued and ended. That's because we do not have access to the computer on which the driver is installed.

But I will have a look at your implementation; maybe we will use it in a future version. We are also still on JPPF 4, so when upgrading to 5 maybe we can sneak the startup class in.