JPPF

The open source grid computing solution


Topic: JPPF and Database tasks (Read 4952 times)

DP

  • JPPF Padawan
  • *
  • Posts: 11
JPPF and Database tasks
« on: January 13, 2015, 02:02:11 PM »


In my implementation, some jobs fail to complete their tasks and others insert the same data multiple times. Overall, the data is inconsistent. Are we missing anything when using JPPF with MySQL? Our tasks download data over FTP and HTTP and insert that data into the database. Could you please suggest the best implementation for this?

The following are the steps,
1. Download data from HTTP or FTP.
2. Process the Data.
3. Insert the data into DB.

We also have the same steps in the reverse direction.

I want to implement the best solution to avoid data inconsistency.

Thanks
DP :D

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2272
    • JPPF Web site
Re: JPPF and Database tasks
« Reply #1 on: January 14, 2015, 07:24:26 AM »

Hello DP,

A likely scenario is that one or more of your tasks were resubmitted after they had already updated the database. This can happen, for instance, when a node crashes while executing tasks: these tasks are then requeued in the server for execution by another node. Because of this, some tasks may indeed write to the database twice.

To prevent this, I see three approaches which can be combined together:

1) Prevent resubmission of the tasks. This can be done either:
- at the job level, by specifying the appropriate settings in the job SLA. For instance, to prevent task resubmission even in the case of a node failure, you could do as follows:
Code:
JPPFJob job = ...;
// tasks cannot be resubmitted
job.getSLA().setMaxTaskResubmits(0);
// resubmits due to node errors are also counted
job.getSLA().setApplyMaxResubmitsUponNodeError(true);
// ... submit the job and get the results ...

- at the task level, using the maxTaskResubmits attribute of each task. Note that you would still need to set job.getSLA().setApplyMaxResubmitsUponNodeError(true) in the job SLA.

2) Have the tasks check whether the data has already been written to the database before doing the inserts/updates. This might give the tasks a better chance at completing cleanly when or if they are resubmitted.

3) Use transactions. In particular, if you are not using a transaction manager, you might want to set the autocommit attribute of your database connections to false, so that the data will only be committed if you reach a Connection.commit() statement.
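Approaches 2 and 3 can be combined in a single helper. Below is a minimal sketch in plain JDBC (java.sql only), assuming a hypothetical table `task_output` whose `task_key` column is a primary key or has a unique index, and a key that each task can derive deterministically (e.g. from the job name and task id); none of these names come from JPPF. The duplicate check and the insert run in one transaction, so a failed attempt leaves nothing behind, and the unique key remains the final guard if two attempts ever run concurrently.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: write a task's output at most once, inside a single transaction. */
final class IdempotentWriter {
  private IdempotentWriter() {}

  /**
   * Inserts (taskKey, payload) into the hypothetical task_output table unless a row
   * with the same key already exists. Returns true if a row was inserted.
   */
  static boolean writeOnce(Connection conn, String taskKey, String payload) throws SQLException {
    boolean previousAutoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false);  // nothing becomes visible until commit()
    try {
      try (PreparedStatement check = conn.prepareStatement(
          "SELECT 1 FROM task_output WHERE task_key = ?")) {
        check.setString(1, taskKey);
        try (ResultSet rs = check.executeQuery()) {
          if (rs.next()) {
            // already written by an earlier (e.g. resubmitted) attempt
            conn.commit();
            return false;
          }
        }
      }
      try (PreparedStatement insert = conn.prepareStatement(
          "INSERT INTO task_output (task_key, payload) VALUES (?, ?)")) {
        insert.setString(1, taskKey);
        insert.setString(2, payload);
        // if another attempt inserted the same key concurrently, the unique key
        // makes this fail with an SQLException, handled below
        insert.executeUpdate();
      }
      conn.commit();
      return true;
    } catch (SQLException e) {
      conn.rollback();  // a failed attempt leaves no partial data
      throw e;
    } finally {
      conn.setAutoCommit(previousAutoCommit);
    }
  }
}
```

A task would call `writeOnce(...)` at the end of its `run()` method: if it is resubmitted after a successful write, the second attempt finds the row and returns false instead of inserting a duplicate. With MySQL, this requires a transactional storage engine such as InnoDB.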

To detect tasks that did not complete normally, either because of an exception during their execution or because of a node failure, you could add an attribute that triggers special processing on the client side once you've received the results. Here is a simple example:

Let's define a JPPF task with a "completed" attribute:
Code:
public class MyTask extends AbstractTask<String> {
  // will be set to true only if the task completes
  // and is sent back to the client without problem
  private boolean completed = false;

  @Override
  public void run() {
    try {
      // ... update the DB ...
      this.completed = true;  // last statement
    } catch (Exception e) {
      setThrowable(e);
    }
  }

  public boolean isCompleted() {
    return this.completed;
  }
}

On the client side, it could be used like this:
Code:
JPPFClient client = new JPPFClient();
JPPFJob job = new JPPFJob();
// ... other job settings ...
job.add(new MyTask());
List<Task<?>> results = client.submitJob(job);
for (Task<?> task: results) {
  MyTask myTask = (MyTask) task;
  if (!myTask.isCompleted()) {
    // process task with potential issue
  }
}
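The `completed` flag can be combined with the task's throwable to tell apart the three cases a client may see for each returned task. The following is a minimal sketch, kept independent of JPPF so it runs on its own: `TaskOutcome` and `TaskLogEntry` are hypothetical names, and the inputs correspond to values the client can read on each returned task (the flag from `MyTask`, plus the throwable and result, which JPPF tasks expose through `getThrowable()` and `getResult()`).

```java
import java.util.Objects;

/** Hypothetical outcome categories for a task returned to the client. */
enum TaskOutcome { SUCCEEDED, FAILED_WITH_EXCEPTION, NOT_COMPLETED }

/** One row that could be inserted into a hypothetical task_log table. */
final class TaskLogEntry {
  final String taskId;
  final TaskOutcome outcome;
  final String detail;  // result summary or exception text; null when nothing is known

  TaskLogEntry(String taskId, TaskOutcome outcome, String detail) {
    this.taskId = Objects.requireNonNull(taskId);
    this.outcome = Objects.requireNonNull(outcome);
    this.detail = detail;
  }

  /** Classifies a returned task from its completion flag, throwable and result. */
  static TaskLogEntry classify(String taskId, boolean completed, Throwable error, Object result) {
    if (error != null) {
      // run() reported an exception through setThrowable()
      return new TaskLogEntry(taskId, TaskOutcome.FAILED_WITH_EXCEPTION, error.toString());
    }
    if (!completed) {
      // e.g. the node crashed mid-execution: only the client's own copy came back
      return new TaskLogEntry(taskId, TaskOutcome.NOT_COMPLETED, null);
    }
    return new TaskLogEntry(taskId, TaskOutcome.SUCCEEDED, String.valueOf(result));
  }
}
```

Each entry could then be inserted into the database from the client side. This also covers tasks lost to a node crash, since nothing from such a task comes back from the node.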

I hope this helps.

Sincerely,
-Laurent

DP

  • JPPF Padawan
  • *
  • Posts: 11
Re: JPPF and Database tasks
« Reply #2 on: January 14, 2015, 07:05:24 PM »

Hello lolo,

Thanks for the quick reply. The approaches are good and helpful for moving forward with our JPPF implementation.
A quick question about the first approach:
if the node fails in the middle of the job, and in the middle of a task, and we use job.getSLA().setMaxTaskResubmits(0), then the task may be lost. How can I trace that task in order to resubmit it? And how can we resubmit it without inserting duplicates in the database?



Thanks,
DP :)

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2272
    • JPPF Web site
Re: JPPF and Database tasks
« Reply #3 on: January 19, 2015, 08:28:55 AM »

Hi DP,

The only sure way I can see to avoid duplicates in the database is to query the database for duplicates before performing the updates. This means your tasks need some code to look up duplicates in the database. In addition, you would need some way to verify that the task actually completed (e.g. that the node didn't crash in the middle of it). You could use a pattern similar to this one, with a flag that indicates the completion status:

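Code:
public class MyTask extends AbstractTask<Object> {
  // stays false in the copy the client gets back if run() did not finish,
  // e.g. because the node crashed in the middle of it
  private boolean taskCompleted = false;

  @Override
  public void run() {
    try {
      // get data from file, url, etc...
      fetchData();
      // query the DB for possible duplicates
      if (!checkDatabaseUpdated()) {
        // if no duplicate, update the DB
        updateDatabase();
      } else {
        // duplicate found: the data was already written by an earlier attempt
        ...
      }
    } finally {
      // set even if run() throws, so also check the task's throwable on the client
      taskCompleted = true;
    }
  }

  private boolean checkDatabaseUpdated() {
    ...
  }

  public boolean isTaskCompleted() {
    return taskCompleted;
  }
}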

Then, on the client side, you could gather the tasks that did not complete and only resubmit those as a new job, for instance:

Code:
JPPFClient client = new JPPFClient();
JPPFJob job = new JPPFJob();
// add the initial tasks
...
boolean done = false;
List<Task<?>> tasksToResubmit = new ArrayList<>();
while (!done) {
  List<Task<?>> results = client.submitJob(job);
  for (Task<?> result: results) {
    MyTask task = (MyTask) result;
    if (!task.isTaskCompleted()) {
      // task did not complete
      tasksToResubmit.add(task);
    } else {
      // process the result
      ...
    }
  }
  if (tasksToResubmit.isEmpty()) {
    // job is done fully
    done = true;
  } else {
    // a new job: apply the same job settings (e.g. SLA) as the initial job
    job = new JPPFJob();
    for (Task<?> task: tasksToResubmit) {
      job.add(task);
    }
    // clear only after the loop: clearing the list while iterating over it breaks the iteration
    tasksToResubmit.clear();
  }
}
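As written, the loop never ends if a task can never complete (for instance, if its URL is permanently unreachable). Below is a minimal sketch of the same "resubmit only what did not complete" logic with a cap on the number of rounds. To keep it runnable without a JPPF driver, `submit` is a hypothetical stand-in for building a job and calling `client.submitJob(job)`, and `completed` stands in for `isTaskCompleted()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Sketch of a bounded "resubmit only what did not complete" loop. */
final class BoundedResubmitter {
  private BoundedResubmitter() {}

  /**
   * Submits the tasks, then resubmits the ones that did not complete as a new batch,
   * for at most maxRounds submissions in total.
   *
   * @param submit    stand-in for client.submitJob(job): runs a batch and returns the tasks
   * @param completed stand-in for MyTask.isTaskCompleted()
   * @return the tasks still incomplete after the last round (empty if all completed)
   */
  static <T> List<T> run(List<T> initial, Function<List<T>, List<T>> submit,
                         Predicate<T> completed, int maxRounds) {
    List<T> pending = new ArrayList<>(initial);
    for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
      List<T> results = submit.apply(pending);
      List<T> incomplete = new ArrayList<>();
      for (T task : results) {
        if (!completed.test(task)) {
          incomplete.add(task);  // retry in the next round
        }
        // completed tasks would be processed here
      }
      pending = incomplete;  // a fresh list each round, so nothing is cleared while iterating
    }
    return pending;
  }
}
```

Whatever `run(...)` returns after the last round can be logged or flagged for manual follow-up instead of being retried forever.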

Sincerely,
-Laurent

DP

  • JPPF Padawan
  • *
  • Posts: 11
Re: JPPF and Database tasks
« Reply #4 on: January 22, 2015, 01:35:57 PM »

Hello Laurent,

Thank you very much for the input. I have started working with the JPPF JobStreaming pattern, since we have a list of tasks that need to run concurrently on four nodes and they are long-running jobs. I hope it will suit our environment. My idea is to capture each task's result, whether failure or success, and insert it into the DB for future reference. When a task completes normally, I am able to capture its result. My question is: when a task fails because a node crashed in the middle of its execution, can we capture that data? Please suggest a way to move forward in this scenario.

Sincerely,
DP

DP

  • JPPF Padawan
  • *
  • Posts: 11
Re: JPPF and Database tasks
« Reply #5 on: January 29, 2015, 04:28:34 AM »

Hello Laurent,

This is another post about learning how to submit tasks. I want to collect the results in the TaskExecutionListener. Here is my scenario:
 
Code:
try {
  int nbTasks = 5;
  long taskSleepTime = 2;
  JobDetails jobDetails = null;

  jppfClient = new JPPFClient();
  Service service = Service.getInstance();
  // Create a job with the specified number of tasks
  JPPFJob job = new JPPFJob();
  job.setName("CompanyName");
  job.getSLA().setMaxTaskResubmits(0);
  job.getSLA().setApplyMaxResubmitsUponNodeError(true);

  for (int i = 1; i <= nbTasks; i++) {
    MyJppfTask task = new MyJppfTask(jobDetails);
    task.setId("" + i);
    job.add(task);
  }
  job.setBlocking(false);
  jppfClient.submitJob(job);
  output("MyTaskRunner ended");
} catch (Exception e) {
  e.printStackTrace();
} finally {
  if (jppfClient != null) jppfClient.close();
}

But I was unable to get the results in the TaskExecutionListener. Is this the expected behavior?

My intention is that the client should not wait for the results or the output: it should just submit the job and close. The tasks should run on the driver and nodes, and the client no longer exists once it has submitted the job to the driver.

Please advise.

Thanks
-DP
« Last Edit: January 29, 2015, 04:38:58 AM by DP »

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2272
    • JPPF Web site
Re: JPPF and Database tasks
« Reply #6 on: January 29, 2015, 09:25:40 PM »

Hi DP,

I'm not sure why the TaskExecutionListener is not giving you the task results. Could you please elaborate on what you mean by being unable to find the results in the TaskExecutionListener?
In particular, are you absolutely sure that your TaskExecutionListener implementation is properly installed and called upon completion of the tasks? You could temporarily add traces in the no-args constructor and in the listener methods to verify this.

Given your usage scenario, I see an issue in your client code which may cause the task execution notifications to be lost: if you do not intend to collect the results on the client side, you need to set the following SLA flag: job.getSLA().setCancelUponDisconnect(false). If this flag is not set to false, the driver will cancel the job as soon as the client is closed.

You also need to consider potential class loading issues: if the client is closed while the node is still executing tasks, and not all required classes have yet been loaded, the node will be unable to load additional classes from the client and this may result in ClassNotFoundException for the execution of your tasks. You need to make sure this doesn't happen.

Sincerely,
-Laurent

DP

  • JPPF Padawan
  • *
  • Posts: 11
Re: JPPF and Database tasks
« Reply #7 on: March 06, 2015, 06:46:35 PM »

Hello Laurent,

1. I created an application that receives the task results, copied its jar to the node, and it inserts the results into the DB. This works perfectly on Windows.
2. If I do the same on Linux, even the client application stops working. If I remove this listener jar, the client application works fine.

Please help me with this. Am I missing anything on Linux?

Thanks
DP :(

okccoder

  • JPPF Padawan
  • *
  • Posts: 3
    • Free Classified Ads
Re: JPPF and Database tasks
« Reply #8 on: April 06, 2015, 02:27:50 AM »

Great input here!

DP

  • JPPF Padawan
  • *
  • Posts: 11
Re: JPPF and Database tasks
« Reply #9 on: April 06, 2015, 07:20:54 AM »

Hello All,

Regarding my recent post: I updated the node's wrapper.conf file to load the dependencies of the node listener, and it is working now.

However, I am facing issues with the same setup on JPPF 5.0: even after adding the dependency jars to the node, it does not work, and this time not on Windows either.


Thanks,
DP