JPPF

The open source
grid computing
solution

June 25, 2018, 12:07:28 PM *
Author Topic: JPPFContainer fails to load class of tasks - jobs disappear and are quitted  (Read 388 times)

flo

  • JPPF Padawan
  • *
  • Posts: 13

Hello developers, hello Laurent,

While testing my running grid I observed jobs vanishing and found no way to avoid it. So I would like to ask for your help and hope you can give me some tips.

The procedures I ran through are the following:
(1) start the server, the GUI and one node, then submit the job (some tests with a running connection between runner and server and some without)
(2) start the server and the GUI, run the job (the client on the runner side is still connected to the server), then connect one node
(3) start the server and the GUI, submit the job, interrupt the connection between runner and server, and only afterwards start one node

After a lot of testing with (1) and (2) I was happy because everything worked fine. During the last testing step we saw that jobs vanish in procedure (3).
I started to search for the reason. First I checked version 5.2, but it shows the same problem.
Debugging and stepping through the code gave me some ideas about what the problem could be, but I see no way to solve it. It seems that the job ends in a regular way: JPPFJobManager.fireJobEvent(final JobNotification event) is called with the events JOB_ENDED and JOB_RETURNED. I therefore checked the log files and could see that the task class is not found.
I got the error message
Quote
[ERROR][org.jppf.server.node.JPPFContainer.call(218)]: task at index 1 could not be deserialized : java.lang.ClassNotFoundException: Could not load class 'JPPFTask'

If I run more than one job with the same task from one runner, there is no problem, even if I disconnect the client. The problem only occurs if I send one or more jobs to the grid and kill the connection between client and server before any job/task of this client is dispatched to a node.


Do you have any idea how to solve this problem?

I really hope that I do not annoy you with my questions! I am very thankful for any of your answers.
Thank you very much!!!


Kind regards
flo

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2223
    • JPPF Web site

Hello Flo,

From your detailed description, I believe the behavior you observe is the expected one and is due to the way the JPPF distributed class loader works.
We have documented this in detail in the class loading documentation, but let me try to summarize.

When a job arrives at a node, the node tries to deserialize the tasks. In the default mode, the tasks' classes are not part of the node's classpath, so the node delegates the class loading to the JPPF class loader, which in turn sends a class loading request to the server. In a similar way, the server sends a class loading request to the client that submitted the job. The client then finds the corresponding .class file and sends it back as a byte[] in its response to the server, which forwards it to the node. From there, the node is able to load the class and instantiate the task it is deserializing. The class loading mechanism knows which client to send the request to because each job embeds the uuid of the client that submitted it, along with the uuid of the server(s) it goes through. Thus, upon arriving at a node, the job has a "uuid path" of the form "/client_uuid/driver_uuid". In a multi-server topology, the uuid path looks like "/client_uuid/driver1_uuid/.../driverN_uuid".

Now, if you disconnect the client from the driver before the job is sent to the node, the driver will not be able to fetch the class from the client. It will therefore send a "class not found" response to the node, and the JPPF class loader in the node will throw a ClassNotFoundException.
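
The lookup chain and its failure mode can be illustrated with a toy model. This is only a sketch using plain JDK classes: ClassLookupChainSketch and its nested Client, Driver and Node types are hypothetical names, and none of it is JPPF code or reflects how JPPF is implemented internally. It only mirrors the described behavior: the driver uses the first element of the uuid path to find the client, and the node fails when that client is no longer connected.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the node -> driver -> client class lookup; none of this is JPPF code. */
class ClassLookupChainSketch {

  /** The client owns the class files; here they are just named byte arrays. */
  static class Client {
    final String uuid;
    final Map<String, byte[]> classFiles = new HashMap<>();
    Client(String uuid) { this.uuid = uuid; }
  }

  /** The driver can only reach clients that are currently connected. */
  static class Driver {
    final String uuid;
    final Map<String, Client> connectedClients = new HashMap<>();
    Driver(String uuid) { this.uuid = uuid; }

    /** Returns the class bytes, or null to signal "class not found". */
    byte[] requestClass(String uuidPath, String className) {
      // "/client_uuid/driver_uuid" splits into ["", client_uuid, driver_uuid]
      String clientUuid = uuidPath.split("/")[1];
      Client client = connectedClients.get(clientUuid);
      return (client == null) ? null : client.classFiles.get(className);
    }
  }

  /** The node has no task classes of its own and always asks the driver. */
  static class Node {
    final Driver driver;
    Node(Driver driver) { this.driver = driver; }

    byte[] loadClassBytes(String uuidPath, String className) throws ClassNotFoundException {
      byte[] bytes = driver.requestClass(uuidPath, className);
      if (bytes == null) {
        throw new ClassNotFoundException("Could not load class '" + className + "'");
      }
      return bytes;
    }
  }

  /** Builds the uuid path carried by a job that went through a single driver. */
  static String uuidPath(Client client, Driver driver) {
    return "/" + client.uuid + "/" + driver.uuid;
  }
}
```

In this model, removing the client from connectedClients before the node calls loadClassBytes() produces the same kind of ClassNotFoundException as the one reported in the log.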

The same problem may also occur after deserialization of the task, because the task execution may require classes that are not yet loaded. For example, if you disconnect the client/runner while the tasks are still executing in the node, it is possible that you will run into a ClassNotFoundException because it will no longer be possible to get the class files from the client.
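
The reason classes may still be requested during execution is that the JVM resolves them lazily. The sketch below, with the hypothetical names LazyDependencySketch, Helper and Task, makes this visible with a static initializer. Strictly speaking it observes class initialization rather than loading, and the exact moment of loading is up to the JVM, so treat it as an illustration of the principle only.

```java
/** Shows that a class used only inside run() is not initialized until run() executes. */
class LazyDependencySketch {

  /** Set by Helper's static initializer, so it records when Helper is first initialized. */
  static boolean helperInitialized = false;

  /** Stands in for a dependency of the task, for instance a class from another jar. */
  static class Helper {
    static { helperInitialized = true; }
    static int compute() { return 42; }
  }

  /** Stands in for a task: creating it does not initialize Helper, running it does. */
  static class Task {
    int run() { return Helper.compute(); }
  }
}
```

A task can thus be deserialized successfully and still need further classes from the client once it starts running.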

The easiest way to avoid this is to have the classes of your tasks and their dependencies in the node's classpath. The drawback is that it places a deployment burden on you, since you'd have to copy your jars and their dependencies to each node. You can then disable remote class loading on a per-job basis, by setting the remoteClassLoadingEnabled attribute of the job SLA:
Code: [Select]
JPPFJob job = new JPPFJob();
job.getSLA().setRemoteClassLoadingEnabled(false);
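
Before disabling remote class loading, it may be worth verifying in a node's JVM that the task classes really resolve from the local classpath. A minimal sketch using only the JDK follows; LocalClasspathCheck and isLoadable() are hypothetical helper names, not part of JPPF.

```java
/** Checks whether a class can be resolved locally, without any remote lookup. */
class LocalClasspathCheck {

  /** Returns true if the class is visible to the given loader; the class is not initialized. */
  static boolean isLoadable(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // not on the classpath, or present but with a broken dependency
      return false;
    }
  }
}
```

For example, isLoadable("my.package.MyTask", ClassLoader.getSystemClassLoader()) returning false on a node would indicate that the jars have not been deployed there (the class name is a placeholder).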

Another, more complicated way is to transport the required jar files along with the job, then use a NodeLifeCycleListener addon and override its jobHeaderLoaded() callback to add the jars to the JPPF class loader's classpath.
For example, on the client/runner side you could add the jars like this:
Code: [Select]
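import java.io.File;

import org.jppf.client.JPPFJob;
import org.jppf.location.MemoryLocation;
import org.jppf.node.protocol.ClassPath;
import org.jppf.utils.FileUtils;

public class ClientSide {
  public void createJob() {
    JPPFJob job = new JPPFJob();
    // attach the jars to the job before it is submitted
    updateJobClassPath(job, "./libs/myLib1.jar", "./libs/myLib2.jar");
  }

  public void updateJobClassPath(JPPFJob job, String... jarPaths) {
    try {
      ClassPath classpath = job.getSLA().getClassPath();
      for (String path: jarPaths) {
        File jar = new File(path);
        // read the whole jar into memory so that it travels with the job
        byte[] bytes = FileUtils.getFileAsByte(jar);
        classpath.add(jar.getName(), new MemoryLocation(bytes));
      }
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}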
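
Stripped of the JPPF types, the client-side step amounts to reading each jar into memory and keying it by its file name. The following JDK-only sketch shows that idea; InMemoryJars is a hypothetical stand-in for the ClassPath and MemoryLocation objects above, not a replacement for them. Unlike the example above, it fails fast when a jar cannot be read, which is a design choice of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** JDK-only stand-in for the client-side step: jar name -> jar content held in memory. */
class InMemoryJars {

  /** Reads every jar fully into memory, keyed by its file name without the directory part. */
  static Map<String, byte[]> read(String... jarPaths) {
    Map<String, byte[]> jars = new LinkedHashMap<>();
    for (String p : jarPaths) {
      Path path = Paths.get(p);
      try {
        jars.put(path.getFileName().toString(), Files.readAllBytes(path));
      } catch (IOException e) {
        // a missing jar surfaces here rather than later on the node
        throw new UncheckedIOException("cannot read " + path, e);
      }
    }
    return jars;
  }
}
```

Failing on the client side makes a wrong jar path visible immediately, instead of as a ClassNotFoundException on the node.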

then on the node side, you could define a node life cycle listener like this:
Code: [Select]
public class MyNodeListener extends NodeLifeCycleListenerAdapter {
  @Override
  public void jobHeaderLoaded(NodeLifeCycleEvent event) {
    JPPFDistributedJob job = event.getJob();
    List<URL> classpath = extractClasspathFromJob(job);
    AbstractJPPFClassLoader classLoader = event.getTaskClassLoader();
    for (URL url: classpath) classLoader.addURL(url);
  }

  private List<URL> extractClasspathFromJob(JPPFDistributedJob job) {
    List<URL> urls = new ArrayList<>();
    try {
      ClassPath cp = job.getSLA().getClassPath();
      for (ClassPathElement cpElt: cp) {
        Location<?> loc = cpElt.getRemoteLocation();
        Location<URL> urlLoc = loc.copyTo(new URLLocation("file:///tmp/" + cpElt.getName()));
        urls.add(urlLoc.getPath());
      }
    } catch(Exception e) {
      e.printStackTrace();
    }
    return urls;
  }
}
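
The node-side step boils down to writing the received bytes to a local file and handing a file: URL to a class loader. The JDK-only sketch below shows this with a plain URLClassLoader. JarStoreSketch and its methods are hypothetical names, and the JPPF class loader is deliberately left out. The URL is built with Path.toUri() rather than by string concatenation, on the assumption that this is more robust across platforms, for instance with Windows drive letters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarStoreSketch {

  /** Writes the jar bytes under dir and returns a well-formed file: URL for it. */
  static URL store(Path dir, String jarName, byte[] jarBytes) throws IOException {
    Files.createDirectories(dir);
    Path target = dir.resolve(jarName).toAbsolutePath();
    Files.write(target, jarBytes);
    // toUri() takes care of drive letters, separators and escaping
    return target.toUri().toURL();
  }

  /** Builds a tiny jar in memory with a single text entry, only for demonstration. */
  static byte[] demoJar(String entryName, String text) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (JarOutputStream jos = new JarOutputStream(bos)) {
      jos.putNextEntry(new JarEntry(entryName));
      jos.write(text.getBytes(StandardCharsets.UTF_8));
      jos.closeEntry();
    }
    return bos.toByteArray();
  }

  /** Reads one resource through a class loader that only sees the given jar URL. */
  static String readResource(URL jarUrl, String entryName) throws IOException {
    try (URLClassLoader loader = new URLClassLoader(new URL[] {jarUrl}, null);
         InputStream in = loader.getResourceAsStream(entryName)) {
      if (in == null) throw new IOException("resource not found: " + entryName);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      for (int n; (n = in.read(buffer)) != -1; ) out.write(buffer, 0, n);
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```

Storing demoJar("hello.txt", "hi") with store() and then calling readResource() on the returned URL is a quick way to check that the stored file is reachable through the URL before wiring it into a node listener.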

For your convenience, I attached the full source of these code examples to this post.

I hope this helps,
-Laurent
« Last Edit: August 24, 2016, 08:49:43 AM by lolo »

flo

  • JPPF Padawan
  • *
  • Posts: 13

Hello Laurent,

Thank you very much for your summary! It is very good and helps me a lot.
I have tried it for 3 hours, but it is still not working.

I will look into this over the next few days and hopefully get it working.
Otherwise I will contact you again next week, if that is fine with you.

Have a nice week!  ;) :)

Kind regards
Flo




flo

  • JPPF Padawan
  • *
  • Posts: 13

Hello Laurent,

I have now found a way to write a NodeListener that works correctly.
I based it completely on your suggestion! I am very grateful for the effort you made to summarize the code and to explain it.  :)
Thank you very much.

I had some problems that were caused by the jar files not being found. I solved this with the static tempDir attribute of the listener class.
It ensures that the same folder is accessed every time.
Here is my code:

Code: [Select]
/*
 * JPPF.
 * Copyright (C) 2005-2016 JPPF Team.
 * http://www.jppf.org
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package classPathLoader;

import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.jppf.classloader.AbstractJPPFClassLoader;
import org.jppf.location.Location;
import org.jppf.location.URLLocation;
import org.jppf.node.event.NodeLifeCycleEvent;
import org.jppf.node.event.NodeLifeCycleListenerAdapter;
import org.jppf.node.protocol.ClassPath;
import org.jppf.node.protocol.ClassPathElement;
import org.jppf.node.protocol.JPPFDistributedJob;

public class NodeListener extends NodeLifeCycleListenerAdapter {

  /**
   * Location where the downloaded libraries are stored on the node's file system.
   */
  public static final String REPOSITORY_DIR = "tmp";

  /**
   * The repository directory, resolved once when the node starts.
   */
  public static File tempDir;

  /**
   * When the node starts, recreate an empty repository directory
   * so that no library from a previous run is left behind.
   */
  @Override
  public void nodeStarting(final NodeLifeCycleEvent event) {
    try {
      tempDir = new File("/" + REPOSITORY_DIR + "/");
      if (tempDir.exists()) {
        removeTemp(tempDir);
      }
      tempDir.mkdir();
      tempDir.deleteOnExit();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  /**
   * Recursively delete the content of the specified directory.
   */
  private void removeTemp(File tempDir) {
    for (File file: tempDir.listFiles()) {
      if (file.isDirectory()) {
        removeTemp(file);
      }
      file.delete();
    }
  }

  /**
   * Nothing to do here: the repository is cleaned up in nodeStarting().
   */
  @Override
  public void nodeEnding(final NodeLifeCycleEvent event) {
  }

  /**
   * Add the jars transported with the job to the classpath of the task class loader.
   */
  @Override
  public void jobHeaderLoaded(NodeLifeCycleEvent event) {
    JPPFDistributedJob job = event.getJob();
    List<URL> classpath = extractClasspathFromJob(job);
    AbstractJPPFClassLoader classLoader = event.getTaskClassLoader();
    for (URL url: classpath) classLoader.addURL(url);
  }

  /**
   * Copy each jar of the job's classpath into the repository directory.
   * Returns the URLs of the copies, or a possibly partial list if an error occurs.
   */
  private List<URL> extractClasspathFromJob(JPPFDistributedJob job) {
    List<URL> urls = new ArrayList<>();
    try {
      ClassPath cp = job.getSLA().getClassPath();
      for (ClassPathElement cpElt: cp) {
        Location<?> loc = cpElt.getRemoteLocation();
        Location<URL> urlLoc = loc.copyTo(new URLLocation("file:///" + tempDir.getAbsolutePath() + "/" + cpElt.getName()));
        urls.add(urlLoc.getPath());
      }
    } catch(Exception e) {
      e.printStackTrace();
    }
    // never null, so that the loop in jobHeaderLoaded() cannot throw a NullPointerException
    return urls;
  }
}
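
As a side note, the directory reset done in nodeStarting() can also be written with java.nio.file, which reports failures as exceptions instead of the boolean returned by File.delete(). This is only a sketch under the assumption that Java 8 or later is available on the node; RepositoryDirSketch and reset() are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RepositoryDirSketch {

  /** Deletes dir with everything below it, then recreates it as an empty directory. */
  static Path reset(Path dir) throws IOException {
    if (Files.exists(dir)) {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(dir)) {
        // reverse order: children come before their parent directory
        paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p); // throws if an entry cannot be deleted, e.g. a jar still in use
      }
    }
    return Files.createDirectories(dir);
  }
}
```

Passing an absolute path to reset() keeps the location independent of the directory the node was started from.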
 

Now a new problem has shown up:
My grid is running in D://JPPF_Testproject/ and the libraries of my tasks are placed in D://temp.
Additionally, a file with the same name is generated in D://.
I checked that it is not created by my code, but I could not find any other place where the copy could be performed.

Do you have an idea how to find out where it is copied?

Thank you very much in advance!
Flo

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2223
    • JPPF Web site
Re: JPPFContainer fails to load class of tasks - jobs disappear and are quitted
« Reply #4 on: September 05, 2016, 07:07:20 AM »

Hi Flo,

Sorry for the late reply. To try and determine where the copying is occurring, I would add traces at various points in the code, checking for the existence of the files in D:\. For instance, something like this:

Code: [Select]
File fileToCheck = new File("D:/myJar.jar");
checkFile("trace1:", fileToCheck);
List<URL> classpath = extractClasspathFromJob(job);
checkFile("trace2:", fileToCheck);

public boolean checkFile(String prefix, File fileToCheck) {
  boolean exists = fileToCheck.exists();
  String status = exists ? "exists" : "does not exist";
  System.out.printf("%s the file %s %s%n", prefix, fileToCheck, status);
  return exists;
}
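
When the name of the unexpected file is not known in advance, a variant of the same tracing idea is to compare directory listings taken before and after each suspicious step. A minimal JDK-only sketch follows; DirSnapshotSketch, snapshot() and created() are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

class DirSnapshotSketch {

  /** Names of the entries located directly under dir. */
  static Set<String> snapshot(Path dir) throws IOException {
    Set<String> names = new TreeSet<>();
    try (Stream<Path> entries = Files.list(dir)) {
      entries.forEach(p -> names.add(p.getFileName().toString()));
    }
    return names;
  }

  /** Entries present in the 'after' snapshot but not in the 'before' snapshot. */
  static Set<String> created(Set<String> before, Set<String> after) {
    Set<String> diff = new TreeSet<>(after);
    diff.removeAll(before);
    return diff;
  }
}
```

Taking one snapshot before and one after a call such as extractClasspathFromJob(job), then printing created(before, after), narrows down which step produces the extra file.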

Sincerely,
-Laurent