JPPF - The open source grid computing solution

Recent Posts

1
Developers help / Maximum resource utilization with unlimited task pool
« Last post by calkuta2 on April 03, 2018, 11:00:14 PM »
Hello,

I am implementing a steady-state genetic algorithm, meaning that we have an essentially unlimited number of tasks. Assuming operation on a local node + slave nodes, what we would like to do is keep each slave node busy evaluating a candidate solution. We would also like each task's result to be processed immediately upon completion, followed by the assignment of another task to the now unoccupied node. This process would continue indefinitely.

Is this possible? The problem we are currently encountering is that we send out groups of tasks in a job, and as the job nears completion many available threads sit idle until the job returns and we send out another job.

We would also like to achieve the same scheme on a grid of nodes running slave nodes, which I assume would be straightforward once it is established for the case of a local node + slave nodes.
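
In rough terms, the loop we have in mind looks like the sketch below. This is only an illustration of the intent, assuming JPPF's asynchronous submission API (JPPFJob.setBlocking(false), JPPFClient.submitJob() and a JobListener), whose exact names may differ by version; EvaluateCandidateTask, nextCandidate() and incorporateResult() are placeholders for our own GA code.

Code: [Select]
import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.client.event.JobEvent;
import org.jppf.client.event.JobListenerAdapter;
import org.jppf.node.protocol.AbstractTask;

public class SteadyStateRunner {
  private final JPPFClient client = new JPPFClient();

  // placeholder task: evaluates a single candidate solution
  public static class EvaluateCandidateTask extends AbstractTask<Object> {
    private final Object candidate; // placeholder; a real candidate must be serializable
    public EvaluateCandidateTask(Object candidate) { this.candidate = candidate; }
    @Override public void run() {
      // evaluate the candidate and store its fitness as the task result
      setResult(candidate);
    }
  }

  public static void main(String[] args) throws Exception {
    new SteadyStateRunner().start(4); // e.g. keep 4 single-task jobs in flight
  }

  // keep 'concurrency' single-task jobs in flight at all times
  public void start(int concurrency) throws Exception {
    for (int i = 0; i < concurrency; i++) submitNextCandidate();
  }

  private void submitNextCandidate() throws Exception {
    JPPFJob job = new JPPFJob();
    job.setBlocking(false); // asynchronous submission
    job.add(new EvaluateCandidateTask(nextCandidate()));
    job.addJobListener(new JobListenerAdapter() {
      @Override public void jobEnded(JobEvent event) {
        try {
          incorporateResult(event.getJob()); // placeholder: update the GA population
          submitNextCandidate();             // immediately refill the freed slot
        } catch (Exception e) { e.printStackTrace(); }
      }
    });
    client.submitJob(job);
  }

  private Object nextCandidate() { return new Object(); }          // placeholder
  private void incorporateResult(JPPFJob job) { /* placeholder */ }
}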

Thanks kindly for any help.

Ricky
2
Developers help / Re: Reduce priority of NodeError job
« Last post by lolo on March 23, 2018, 08:06:22 AM »
Hi Shiva,

The node log extract you provided shows clearly that the job dispatch timeout was triggered and that the node actually attempts to cancel the executing task.
What I suspect here is that the task is not performing an interruptible operation at that time, causing the cancel request to be ignored.

A bit of explanation: cancelling a task in a JPPF node results in calling Future.cancel(true), which in turn results in calling Thread.interrupt() on the thread that is executing the task. If the task is not performing an interruptible operation, as specified in the Javadoc for Thread.interrupt(), then all that happens is that the interrupted flag of the thread is set to true. If the task is performing an interruptible operation, then the thread will also receive an InterruptedException, allowing it to effectively stop processing.

In your case, I believe the task is not doing an interruptible operation and therefore does not receive an InterruptedException. To resolve this, you will need to add regular checks in the code of the task, as in this example:

Code: [Select]
public class MyTask extends AbstractTask<Object> {
  @Override
  public void run() {
    try {
      ...
      if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedException("task cancelled");
      }
      ...
    } catch(InterruptedException e) {
      // process cancellation/interruption
    }
  }
}

Regarding your question:
Quote
I would also like to know if there is a way to run some additional step at the end of each timeout to make sure the node state is good enough for next job/task

There isn't a way to distinguish whether a job was cancelled because of a timeout, but you can determine whether a job was cancelled by using a NodeLifeCycleListener and implementing its jobEnding() method.

For instance, let's first add a taskCancelled attribute to the above task implementation:

Code: [Select]
public class MyTask extends AbstractTask<Object> {
  private boolean taskCancelled;

  @Override public void run() {
    try {
      ...
      if (Thread.currentThread().isInterrupted()) {
        this.taskCancelled = true;
        throw new InterruptedException("task cancelled");
      }
      ...
    } catch(InterruptedException e) { /* process cancellation/interruption */ }
  }

  public boolean isTaskCancelled() { return taskCancelled; }
}

Then we can write a NodeLifeCycleListener that uses this attribute, as follows:

Code: [Select]
public class MyNodeLifeCycleListener extends NodeLifeCycleListenerAdapter {
  @Override
  public void jobEnding(NodeLifeCycleEvent event) {
    boolean jobCancelled = false;
    for (Task<?> t: event.getTasks()) {
      MyTask task = (MyTask) t;
      if (task.isTaskCancelled()) {
        jobCancelled = true;
        break;
      }
    }
    if (jobCancelled) {
      // check node state, etc...
    }
  }
}

Sincerely,
-Laurent
3
Developers help / Re: Reduce priority of NodeError job
« Last post by shiva.verma on March 22, 2018, 10:03:39 AM »
Thanks Laurent for the response. I will try suppressing the client's console output as you suggested.

I am right now stuck with timeout.

Scenario:
 -Each job has single task
 -I want job to expire after running for X amount of time.
 - Code:
Code: [Select]
JPPFSchedule jobExpirationSchedule = new JPPFSchedule(4000L);
job.getSLA().setMaxDispatchExpirations(0);
job.getSLA().setDispatchExpirationSchedule(jobExpirationSchedule);

But this doesn't seem to cancel the job once the timeout is reached. The job/task still continues to run.

This is what I observed in server log:
Code: [Select]
2018-03-22 01:37:38,232 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.scheduleAction(102)]: DispatchExpiration : scheduling action[key=1E73FD32-AF6A-4FEF-4A13-B04BD34FAB34|4, schedule[delay=4000], action=org.jppf.server.nio.nodeserver.NodeDispatchTimeoutAction@2bb55e7f, start=2018-03-22 01:37:38.231
2018-03-22 01:37:38,247 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.scheduleAction(110)]: DispatchExpiration : date=2018-03-22 01:37:42.231, key=1E73FD32-AF6A-4FEF-4A13-B04BD34FAB34|4, future=java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@5bf8c575
2018-03-22 01:37:38,247 [DEBUG][org.jppf.nio.StateTransitionManager.transitionChannel(157)]: transition from SENDING_BUNDLE to WAITING_RESULTS with ops=1 (readyOps=4) for channel id=4, submit=false
2018-03-22 01:37:38,247 [DEBUG][org.jppf.nio.PlainNioObject.read(98)]: read 4 bytes for PlainNioObject[channel id=477, size=4, count=4, source=ChannelInputSource[channel=java.nio.channels.SocketChannel[connected local=/127.0.0.1:11111 remote=/127.0.0.1:62421]], dest=null, location=MultipleBuffersLocation[size=4, count=4, currentBuffer=org.jppf.utils.JPPFBuffer@1ce56350, currentBufferIndex=0, transferring=false, list=[org.jppf.utils.JPPFBuffer@1ce56350]]]

node log:
Code: [Select]
2018-03-22 01:37:42,248 [DEBUG][org.jppf.management.JPPFNodeAdmin.cancelJob(253)]: Request to cancel jobId = '1E73FD32-AF6A-4FEF-4A13-B04BD34FAB34', requeue = false
2018-03-22 01:37:42,249 [DEBUG][org.jppf.execute.AbstractExecutionManager.cancelAllTasks(193)]: cancelling all tasks with: callOnCancel=true, requeue=false
2018-03-22 01:37:42,249 [DEBUG][org.jppf.execute.AbstractExecutionManager.cancelTask(211)]: cancelling task = NodeTaskWrapper[task=test.utilities.RunSuiteTask@3ceef7d, cancelled=false, callOnCancel=false, timeout=false, started=true]
2018-03-22 01:37:42,249 [DEBUG][org.jppf.execute.AbstractExecutionManager.cancelTask(214)]: calling future.cancel(true) for task = NodeTaskWrapper[task=test.utilities.RunSuiteTask@3ceef7d, cancelled=false, callOnCancel=false, timeout=false, started=true]
2018-03-22 01:37:42,250 [DEBUG][org.jppf.scheduling.JPPFScheduleHandler.cancelAction(131)]: Task Timeout Timer : cancelling action for key=java.util.concurrent.FutureTask@26e39e2f, future=null
2018-03-22 01:37:42,251 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.getResourceAsStream(253)]: JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=] lookup for 'META-INF/services/org.apache.xerces.xni.parser.XMLParserConfiguration' = null for JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=]
2018-03-22 01:37:42,253 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.getResourceAsStream(253)]: JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=] lookup for 'META-INF/services/org.apache.xerces.xni.parser.XMLParserConfiguration' = null for JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=]
2018-03-22 01:37:42,275 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.getResourceAsStream(253)]: JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=] lookup for 'META-INF/services/org.apache.xerces.xni.parser.XMLParserConfiguration' = null for JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=]
2018-03-22 01:37:59,308 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.getResourceAsStream(253)]: JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=] lookup for 'META-INF/services/org.apache.xerces.xni.parser.XMLParserConfiguration' = null for JPPFClassLoader[id=3, type=client, uuidPath=[707FABBC-51DD-5EEE-F16F-EDEC867EE744, 40705A97-1FED-BDCD-B8D5-1485A2E2337C], offline=false, classpath=]


I would also like to know if there is a way to run some additional step at the end of each timeout to make sure the node state is good enough for next job/task

Please help
4
Developers help / Re: Reduce priority of NodeError job
« Last post by lolo on March 18, 2018, 07:58:11 AM »
Hello Shiva,

I'm glad the display issue is resolved :)

To answer your questions:

1. Unfortunately, JPPF doesn't have an option to suppress the output generated with System.out.println. What you can do is redirect System.out to a file, using System.setOut() (see the sketch after point 2 below). You may also use a more elegant solution found here, by simply copy/pasting the class SystemOutToSlf4j and calling SystemOutToSlf4j.enable("org.jppf.client") before creating the JPPFClient.

2. The number of connections available to a client represents the maximum number of jobs a JPPF client can handle concurrently. For example, if your client has 50 connections and you submit 60 jobs, then up to 50 jobs at a time will be sent to the server. The remaining 10 jobs will be kept in the client's queue, waiting for a connection to be available (which happens whenever a job completes). To make an analogy, this is exactly like the number of threads in a fixed thread pool executor.
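
To illustrate point 1, here is a minimal sketch of redirecting System.out before creating the client; the file name "client-console.log" is just an example:

Code: [Select]
import java.io.FileOutputStream;
import java.io.PrintStream;

import org.jppf.client.JPPFClient;

public class QuietClient {
  public static void main(String[] args) throws Exception {
    // send everything written to System.out (including the connection messages) to a file;
    // the file name is arbitrary, and autoflush is enabled
    System.setOut(new PrintStream(new FileOutputStream("client-console.log"), true));
    JPPFClient client = new JPPFClient();
    try {
      // create and submit jobs as usual
    } finally {
      client.close();
    }
  }
}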

Sincerely,
-Laurent
5
Developers help / Re: Reduce priority of NodeError job
« Last post by shiva.verma on March 17, 2018, 09:52:06 PM »
Thanks a ton Laurent, this worked like a charm. I have not tested it thoroughly yet; I will test it more and let you know.

Sorry for asking a dumb question, but I have a couple more related questions:

1. Can you also please suggest the best way to suppress console output from the client? The only output I am unable to suppress is the following, and I want it to go to the client log file only:
....
....
....
[client: driver1-48 - ClassServer] Attempting connection to the class server at localhost:11111
[client: driver1-45 - ClassServer] Reconnected to the class server
[client: driver1-47 - ClassServer] Reconnected to the class server
[client: driver1-49 - ClassServer] Attempting connection to the class server at localhost:11111
[client: driver1-46 - ClassServer] Reconnected to the class server
[client: driver1-50 - ClassServer] Attempting connection to the class server at localhost:11111
[client: driver1-48 - ClassServer] Reconnected to the class server
[client: driver1-43 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-42 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-49 - ClassServer] Reconnected to the class server
[client: driver1-50 - ClassServer] Reconnected to the class server
[client: driver1-48 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-46 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-47 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-45 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-44 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-50 - TasksServer] Attempting connection to the task server at localhost:11111
[client: driver1-49 - TasksServer] Attempting connection to the task server at localhost:11111
....
....
....

2. Is the number of these connections supposed to be double the number of connecting nodes, and does it have nothing to do with the number of jobs? I know in my case I need two connections (one for sending and another for collecting). My understanding of this is still not very clear; any pointer would be great.


Thanks again
Shiva
6
Developers help / Re: Detect node reconnect
« Last post by lolo on March 17, 2018, 08:18:26 AM »
Hello,

The easiest way to do this would be to use a NodeLifeCycleListener and implement its nodeStarting() method, using a connection counter or a first_connection flag to know whether the node is connecting for the first time. For instance:

Code: [Select]
public class MyNodeListener extends NodeLifeCycleListenerAdapter {
  private boolean firstConnection = true;

  @Override
  public void nodeStarting(NodeLifeCycleEvent event) {
    if (firstConnection) {
      firstConnection = false;
      processFirstConnection(event);
    } else {
      processSubsequentConnection(event);
    }
  }

  private void processFirstConnection(NodeLifeCycleEvent event) { ... }

  private void processSubsequentConnection(NodeLifeCycleEvent event) { ... }
}

Sincerely,
-Laurent
7
Developers help / Re: Reduce priority of NodeError job
« Last post by lolo on March 17, 2018, 07:51:14 AM »
Hi Shiva,

Thanks a lot for the screenshots, they provided very useful information.
In your admin-ui configuration file I see the following: "jppf.gui.publish.mode = immediate_notifications". Could you try to change it to "jppf.gui.publish.mode = polling"  instead, and let us know if this changes anything?

In "immediate_notifications" mode, the job data view is updated only via notifications (as the name indicates of course) received from the driver. However, I could see, in the server statistics view, that your tasks last around 2 minutes. That would be the frequency of the notifications the admin-ui receives. If the admin-ui is started after the jobs are submitted, then it would miss the notifications emitted by the driver for jobs dispatched to the nodes, and therefore the job data view would not reflect the current execution status of the jobs and nodes.

For such a scenario, the "polling" mode is indeed more appropriate, as it will update and/or rebuild the entire job execution tree after each polling interval (provided via the "jppf.gui.publish.period" property).
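
For reference, a minimal sketch of the corresponding admin-ui configuration entries; the period value is only an example, and I am assuming it is expressed in milliseconds:

Code: [Select]
# update the job data view by polling the driver instead of relying on notifications
jppf.gui.publish.mode = polling
# polling interval (example value, assumed to be in milliseconds)
jppf.gui.publish.period = 1000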

Sincerely,
-Laurent
8
Developers help / Re: Reduce priority of NodeError job
« Last post by shiva.verma on March 17, 2018, 01:58:18 AM »
Hi Laurent,

Please find attached the screenshot.

Please let me know if you need anything else to understand this issue better.

Thanks
Shiva
9
Developers help / Re: Reduce priority of NodeError job
« Last post by lolo on March 17, 2018, 12:54:24 AM »
Hello Shiva,

Sorry to learn that the patch didn't resolve the problem.
It is very difficult to understand what the issue is from the logs. Is there any way that you could provide a screenshot of the amdin-ui that illustrates the problem?

Or maybe could you tell whether it is something similar to the screenshot in bug JPPF-527, or to the one in bug JPPF-518?

Thanks,
-Laurent
10
Developers help / Detect node reconnect
« Last post by shiva.verma on March 16, 2018, 07:57:57 AM »
I want to keep track of all node connections, so that I can act differently when I know that a node which was connected earlier has reconnected.

Right now I am running some startup jobs every time a node connects, but I want to execute a different job/action when a node reconnects, before it can take part in the current workload execution.

Much thanks in advance