JPPF

The open source
grid computing
solution

Topic: Returned task does not match submitted job

micrei

  • JPPF Knight
  • **
  • Posts: 19
Returned task does not match submitted job
« on: September 04, 2012, 02:34:14 PM »

Dear supporter

We have been using JPPF 2.5.2 for more than a year in an application where more than 100,000 calculations are executed as a kind of batch job. Each calculation is performed by one JPPFTask in one blocking JPPFJob. We have now found that quite often the returned task does not match the submitted job. Here are the main statements from our code:

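 JPPFJob job = new JPPFJob();
 job.getJobSLA().setPriority(5);
 // the job expires according to this schedule
 job.getJobSLA().setJobExpirationSchedule(new JPPFSchedule(this.timeout));
 job.setId(<calculation-specific-id>);
 MyJPPFTask task = new MyJPPFTask(....);
 job.addTask(task);
 job.setBlocking(true);
 List<JPPFTask> results = jppfClient.submit(job);
 // the loop variable must not reuse the name "task" declared above
 for (JPPFTask result : results) {
   // getCalcId() is defined on MyJPPFTask, hence the cast
   check(job.getId(), ((MyJPPFTask) result).getCalcId());
   ...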

Is this a known issue of JPPF 2.5.2? What can we do to avoid this problem? Any suggestions?

Thanks a lot and kind regards
Michel

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Returned task does not match submitted job
« Reply #1 on: September 05, 2012, 04:34:05 AM »

Hello Michel,

I believe this would be a really gross issue in JPPF and we probably would have heard of it in the 13 months since 2.5.2 was released.
For now, let's consider all possible causes. Could you please provide details on how you check that task result and job match? In particular, how is task.getCalcId() computed?
Ideally, it would be very helpful if you could provide sample code, especially for MyJPPFTask, that would allow us to reproduce the problem.

Also, I see that you set an expiration timeout on your jobs. It is possible that, for some reason, some of the jobs expire and leave the task in an undefined state.
Unfortunately, it is not easy to check that a job has expired. Since you have only one task per job, you might consider setting a timeout at the task level and overriding its onTimeout() method:

Code:
public class MyJPPFTask extends JPPFTask {
  ...
  private boolean expired = false;

  public void run() {
    ...
  }

  public void onTimeout() {
    this.expired = true;
  }

  public boolean hasExpired() {
    return this.expired;
  }
}

With this pattern, you can easily check whether the task actually expired.
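
To show how the client side might use this flag, here is a minimal sketch that runs without JPPF on the classpath. SketchTask is a hypothetical stand-in for MyJPPFTask (a real task would extend JPPFTask), and findProblems() and the "calc-N" ids are invented for illustration. It only shows one possible way to tell an expired task from a task that carries the wrong id:

```java
import java.util.ArrayList;
import java.util.List;

public class ExpirationCheckSketch {

  /** Hypothetical stand-in for MyJPPFTask; a real task would extend JPPFTask instead. */
  static class SketchTask implements Runnable {
    private final String calcId;
    private boolean expired = false;

    SketchTask(String calcId) {
      this.calcId = calcId;
    }

    @Override
    public void run() {
      // the actual calculation would go here
    }

    /** In JPPF, the framework invokes this callback when the task times out. */
    public void onTimeout() {
      this.expired = true;
    }

    public boolean hasExpired() {
      return this.expired;
    }

    public String getCalcId() {
      return this.calcId;
    }
  }

  /** Collects a description of every returned task that expired or carries a foreign id. */
  static List<String> findProblems(String jobId, List<SketchTask> results) {
    List<String> problems = new ArrayList<String>();
    for (SketchTask task : results) {
      if (task.hasExpired()) {
        problems.add("expired: " + task.getCalcId());
      } else if (!jobId.equals(task.getCalcId())) {
        problems.add("mismatch: expected " + jobId + " but got " + task.getCalcId());
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<SketchTask> results = new ArrayList<SketchTask>();
    SketchTask timedOut = new SketchTask("calc-1");
    timedOut.onTimeout(); // simulate the framework reporting a timeout
    results.add(timedOut);
    results.add(new SketchTask("calc-2"));
    for (String problem : findProblems("calc-1", results)) {
      System.out.println(problem);
    }
  }
}
```

Checking the expired flag before comparing ids keeps the two failure modes apart, so a timeout is not reported as an id mismatch.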

Thanks for your time.

Sincerely,
-Laurent

micrei

Re: Returned task does not match submitted job
« Reply #2 on: September 06, 2012, 02:03:26 PM »

Hello Laurent

I found that JPPF mixes up JPPF tasks when job timeouts occur. If no job timeout occurs, everything is fine. If a job timeout occurs, the failed task is returned in another job.

I changed your TemplateJPPFTask and TemplateApplicationRunner, as delivered with JPPF 2.5.2, to demonstrate the case (see attachments). The JPPF task just sleeps for a given time before it copies the input to the output. executeBlockingJob validates the returned tasks and throws exceptions in case of validation failures. createJob and executeBlockingJob are called in the "calculate" service operation. calculate is run concurrently 100 times; the index is used as the input and also to produce a long-running task, which causes a timeout exception.
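
The shape of that test can be sketched in plain Java, without JPPF. This is not the attached code and it cannot reproduce the bug; it only illustrates the sleep-then-copy task, the timeout, and the validation of the output against the input. SleepTask, runAll() and the status strings are hypothetical names, and a thread pool stands in for the grid:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutValidationSketch {

  /** Hypothetical task: sleeps for a given time, then copies its input to its output. */
  static class SleepTask implements Callable<Integer> {
    private final int input;
    private final long sleepMillis;

    SleepTask(int input, long sleepMillis) {
      this.input = input;
      this.sleepMillis = sleepMillis;
    }

    @Override
    public Integer call() throws InterruptedException {
      Thread.sleep(sleepMillis);
      return input;
    }
  }

  /**
   * Runs count calculations concurrently; the one at slowIndex sleeps far longer
   * than the timeout. Returns one status per index: "ok", "timeout" or "mismatch".
   */
  static List<String> runAll(int count, int slowIndex, long timeoutMillis) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(count);
    try {
      List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
      for (int i = 0; i < count; i++) {
        long sleep = (i == slowIndex) ? timeoutMillis * 10 : 1L;
        futures.add(pool.submit(new SleepTask(i, sleep)));
      }
      List<String> statuses = new ArrayList<String>();
      for (int i = 0; i < count; i++) {
        try {
          // the timeout is measured from this call, not from the submission
          int output = futures.get(i).get(timeoutMillis, TimeUnit.MILLISECONDS);
          // validation: the returned output must match the submitted input
          statuses.add(output == i ? "ok" : "mismatch");
        } catch (TimeoutException e) {
          statuses.add("timeout");
        }
      }
      return statuses;
    } finally {
      pool.shutdownNow(); // interrupts the task that is still sleeping
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runAll(10, 3, 200L));
  }
}
```

In a correct system only the slow index reports a timeout and no index reports a mismatch, which is the behavior the validation in executeBlockingJob checks for.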

I ran the application against a grid with 12 nodes, where the driver and nodes run on a different server than the client. The corresponding jppf.log is also attached.

I hope this helps you. Thanks for your help.

Kind regards
Michel

micrei

Re: Returned task does not match submitted job
« Reply #3 on: September 06, 2012, 02:18:52 PM »

Sorry Laurent

The attachments in the last post were not correct. Here are the correct files.

The first log is for JPPF_TIMEOUT=2000, the second for JPPF_TIMEOUT=20000. So we only see validation exceptions in the first one.

Cheers
Michel

micrei

Re: Returned task does not match submitted job
« Reply #4 on: September 06, 2012, 03:14:51 PM »

This does not seem to be my day. The log for JPPF_TIMEOUT=2000 is attached.

lolo

Re: Returned task does not match submitted job
« Reply #5 on: September 07, 2012, 08:45:43 AM »

Hi Michel,

Thank you very much for the code sample. This allowed me to reproduce the problem, for which I registered the bug report JPPF-61 "Job expiration causes wrong tasks to be returned for the next jobs".
I have a (temporary) fix which I'd like you to try. You can download it here: http://www.jppf.org/private/JPPF-2.5.5-JobExpirationFix.zip
You will need to upgrade to the latest release for JPPF 2.5, which is v2.5.5 - we will not provide a patch for 2.5.2.
To apply the fix:
- stop the JPPF server
- replace the <driver_install_root>/lib/jppf-server.jar file with the one in the downloaded zip file
- restart the server

Upon your confirmation that it is working for you, I will publish an official patch.

Thanks a lot,
-Laurent

micrei

Re: Returned task does not match submitted job
« Reply #6 on: September 07, 2012, 02:57:39 PM »

Hi Laurent

You are incredibly fast. Thanks for the fix. Unfortunately I cannot test it before Monday, because I do not have access to the required environment today. I will let you know the result as soon as I can.

Kind regards,
Michel

micrei

Re: Returned task does not match submitted job
« Reply #7 on: September 10, 2012, 03:02:49 PM »

Hi Laurent

I did a basic test of the fix with the test application. I kept the test application as it was (no libs changed). I only replaced the libs in the driver of our JPPF grid: jppf-common.jar and jppf-common-node.jar from version 2.5.5, and jppf-server.jar from the fix package. I still see the timeout exceptions, as expected. The exceptions for the job/task id mismatches are gone. So you fixed the bug. Thanks a lot.

We have a workaround in our application, so we will not upgrade to 2.5.5 or to the fix right now. We will upgrade to the newest version at some point. Is this problem already solved in the newest version, or will you fix it now?

Best regards
Michel

lolo

Re: Returned task does not match submitted job
« Reply #8 on: September 13, 2012, 05:14:46 AM »

Hi Michel,

Thank you for the feedback.
For JPPF 2.5.5 I have updated patch 01 with the fix for this bug.
In the latest version, while testing for this problem, I found another issue, JPPF-62, which caused jobs to expire on the client side instead of having their countdown start in the server. This issue is also fixed, and the fix will be released in the next maintenance release, JPPF 3.1.3, due this month.

I hope this helps,
-Laurent