Topic: Using files inside jobs

Ibereth
« on: June 18, 2013, 11:47:42 PM »

Hi, is there anything inside JPPF to copy some files/folders to the remote nodes before executing? I have dependencies on config files that are not found remotely.
Thanks...

lolo
« Reply #1 on: June 19, 2013, 07:45:48 AM »

Hello,

There are multiple ways to copy files / folders to a node, including:

- from a task, you can send a callback to the client, which will fetch files in the client file system and return them to the task.

- another possibility is to add the folder you need (or one of its parents) to your client's or server's classpath, and then use one of the AbstractJPPFClassLoader methods getResource(), getResources(), getResourceAsStream() or getMultipleResources() to fetch the files you need.

- you can also attach the files needed, either within an attribute of each task, or (recommended) as part of a job's data provider

- yet another possibility is to embed an FTP server in your client or server, and use an FTP client API from your tasks to get the files from there. We have an example of this in our FTP Server Sample.
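
To illustrate the classpath option, here is a minimal sketch that reads a resource as a stream and copies it to a file the task owns, rather than converting the resource URL to a File. It uses only JDK classes and the standard ClassLoader.getResourceAsStream() method; the class name ResourceCopier and its method are hypothetical and not part of JPPF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copies a classpath resource to a file owned by the caller. */
final class ResourceCopier {
  private ResourceCopier() {}

  /** Reads resourceName through the given class loader and writes it to target. */
  static Path copyResourceTo(ClassLoader loader, String resourceName, Path target) {
    try (InputStream in = loader.getResourceAsStream(resourceName)) {
      if (in == null) {
        throw new IllegalArgumentException("resource not found: " + resourceName);
      }
      // create the parent directories of the target file if needed
      Path parent = target.toAbsolutePath().getParent();
      if (parent != null) Files.createDirectories(parent);
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      return target;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In a task, the loader argument would typically be Thread.currentThread().getContextClassLoader() or getClass().getClassLoader().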

I hope this helps,
-Laurent

Ibereth
« Reply #2 on: December 27, 2013, 08:47:24 PM »

Hi again, I'm working on the same problem once more and decided to use the data provider to store the files and then copy them locally on each node when needed. The problem is that when I'm using something like this:

Paths.get(SomeClass.class.getResource("someFile").toURI()).toFile();

this generates a random path like this one:
/tmp/.jppf/9692e6cb989017640e689060bf136d44_1/org/package/someFile

but when I first copied all the files to the local system, the _1 wasn't there, so the file is not there when I try to access it.
Is there a way to prevent this from happening?

Thanks...

lolo
« Reply #3 on: December 27, 2013, 10:08:34 PM »

Hello,

This looks like a bug that was fixed in JPPF 3.3.7: JPPF-203 Class loader resource cache generates duplicate resources.
Could you tell us which version you're using, and if not 3.3.7, could you upgrade and let us know the outcome?

Also, in your code example, I do not see how you are using the data provider to copy the files. As far as I can tell, the call to SomeClass.class.getResource() is actually using the JPPF class loader to get the file from the client's classpath. The class loader then stores it in a temporary cache in /tmp/.jppf.

To use a data provider to copy a file, I would add, on the client side, the file byte[] to the data provider, then on the node side (in the task run() method) I would get this byte[] and save it to a file via a FileOutputStream. JPPF has a location API that makes this kind of operation easier:

Code: [Select]
// client side
DataProvider dp = new MemoryMapDataProvider();
Location file = new FileLocation("someFolder/someFile");
byte[] bytes = file.toByteArray();
dp.setValue("file1", bytes);
// attach the data provider to the job before submitting it
job.setDataProvider(dp);

// node side, in the task's run() method
byte[] bytes = (byte[]) getDataProvider().getValue("file1");
Location memLoc = new MemoryLocation(bytes);
// save to the node local file system
Location fileLoc = new FileLocation("someFolder/someFile");
memLoc.copyTo(fileLoc);

By doing this, you completely avoid using the class loader to download the file(s).
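
For comparison, the same byte[] round trip can be sketched with JDK classes only. In this sketch a plain Map stands in for the data provider, which is an assumption made so that the example runs without JPPF; ByteArrayFileTransfer and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical helper: moves a file's content through a key/value store as a byte[]. */
final class ByteArrayFileTransfer {
  private ByteArrayFileTransfer() {}

  /** "Client side": reads the whole file and stores its content under the given key. */
  static void load(Map<String, Object> store, String key, Path source) {
    try {
      store.put(key, Files.readAllBytes(source));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** "Node side": fetches the byte[] and writes it to a local file, creating parent folders. */
  static Path save(Map<String, Object> store, String key, Path target) {
    byte[] bytes = (byte[]) store.get(key);
    if (bytes == null) {
      throw new IllegalStateException("no value stored under '" + key + "'");
    }
    try {
      Path parent = target.toAbsolutePath().getParent();
      if (parent != null) Files.createDirectories(parent);
      return Files.write(target, bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```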

Sincerely,
-Laurent

Ibereth
« Reply #4 on: December 27, 2013, 11:32:37 PM »

Hi, I'm using JPPF version 3.3.6, so I will try to update to see if that fixes the problem.
Regarding how I'm using the data provider, I'm copying the byte[] back to the file system (to those temporary directories) because I need to use legacy code that accesses files directly through the File class, so no classLoader magic can happen over there :)
Thanks!!!

Ibereth
« Reply #5 on: December 27, 2013, 11:55:04 PM »

I updated to version 3.3.7 and I'm seeing the same error.
To copy the files (first time the random directory is generated) I'm using:

Paths.get(getClass().getClassLoader().getResource("jppf.properties").toURI()).getParent()

that will give me the random directory and I use that as 'root' to copy all the files (creating subDirectories, etc).
In the case I need to retrieve the File I'm using:

Paths.get(MyClass.class.getResource("myFile").toURI()).toFile()

that second time, it's generating the _1 temporary directory

if at that moment I do something like:

Thread.currentThread().getContextClassLoader().getResource("jppf.properties")

I get the original temporary directory (without the _1)

the same happens if I try to get the file with the absolute path instead of using the class object to get it, but I cannot change that either.

I think it's related to having MyClass.class, but I'm not entirely sure about it.

Thanks...

lolo
« Reply #6 on: December 28, 2013, 07:27:24 AM »

Hello,

I checked on my side, and I'm unable to reproduce this behavior with JPPF 3.3.7. I'm suspecting that somehow, your nodes are running an earlier version.
To get this question out of the way, could you by any chance add this line at the start of your task's run() method:
Code: [Select]
System.out.println("running JPPF version " + VersionUtils.getVersion().getVersionNumber());
If this displays "running JPPF version 3.3.7", then it would be great if you could provide a small sample (task + client runner classes) which would allow us to reproduce and investigate further.

Also, if you could add the following traces:
Code: [Select]
System.out.println("own classloader     = " + getClass().getClassLoader());
System.out.println("context classloader = " + Thread.currentThread().getContextClassLoader());
This would let us know if there is any class loader mismatch.

Lastly, I wanted to emphasize that it is generally not a good idea to use the JPPF temporary folders as a base for your own files. In general, the node will delete these folders as soon as possible. Normally this happens whenever a classloader is removed from the node's class loader cache. A cleanup of these folders also happens when the node's JVM terminates. It is generally safer to create and use your own folders.
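
As a sketch of what "your own folders" could look like, the JDK's Files.createTempDirectory() generates a unique directory name on each call, so several tasks can run in parallel without colliding. The class name TaskWorkDir, the base folder and the prefix are hypothetical choices, not JPPF conventions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: creates a private working directory outside of the JPPF cache. */
final class TaskWorkDir {
  private TaskWorkDir() {}

  /** Creates base if needed, then a new uniquely named directory inside it. */
  static Path create(Path base, String prefix) {
    try {
      Files.createDirectories(base);
      // the JDK appends a random suffix to the prefix, so each call yields a new directory
      return Files.createTempDirectory(base, prefix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A task could call, for instance, TaskWorkDir.create(Paths.get(System.getProperty("java.io.tmpdir"), "my-app"), "job-") and pass the resulting directory to the code that needs real files.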

Thanks for your time,
-Laurent
« Last Edit: December 28, 2013, 03:40:26 PM by lolo »

Ibereth
« Reply #7 on: January 03, 2014, 07:05:24 PM »

OK, after running your scripts and confirming I have the right version, I tried to build an isolated project that reproduces the problem, and found out that for accessing files JPPF works perfectly and I don't need to copy them beforehand. But for accessing directories it simply does not work. The result of trying to access a directory through JPPF is a single file containing the list of files inside the directory, instead of a real directory. That's why I need to copy the files before trying to access them, and it seems that JPPF changes strategy when the file already exists on the system, and that's when the _1 directory is created.
Is there a way to change the directory behavior so I don't need to copy the content? (The code that needs that directory is legacy code, so I cannot change its implementation.)
Thanks...

lolo
« Reply #8 on: January 04, 2014, 05:30:26 PM »

Hello,

I believe one of the problems here is a misunderstanding on the notion of files and directories. From the JVM perspective, anything you access via ClassLoader.getResource() is not considered as a file in a file system, but rather as a resource in the classloader's classpath. It could be a file, but it could also be an entry in a jar or zip file, or a remote resource on a web server, etc...

Therefore, it will not work if you try to use such resources as files, for instance by converting the URL returned by ClassLoader.getResource() into a file.
The _1 version of the file is generated because you access the corresponding resource via ClassLoader.getResource() (or something equivalent) after you have copied the file in the directory computed from the resource URL.
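
One defensive pattern, sketched here with JDK classes only, is to check the protocol of the URL before treating it as a file: only "file:" URLs can map to a path on the local file system. ResourceFiles.asLocalFile() is a hypothetical helper. Note that even when it returns a File, for a resource served by the JPPF class loader that file lives in the JPPF cache and should be treated as read-only.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: converts a resource URL to a File only when that is meaningful. */
final class ResourceFiles {
  private ResourceFiles() {}

  /** Returns the local file behind the URL, or empty if the URL is null or not a "file:" URL. */
  static Optional<File> asLocalFile(URL url) {
    if (url == null || !"file".equalsIgnoreCase(url.getProtocol())) {
      // e.g. an entry in a jar ("jar:") or a remote resource ("http:")
      return Optional.empty();
    }
    try {
      return Optional.of(Paths.get(url.toURI()).toFile());
    } catch (URISyntaxException | RuntimeException e) {
      // malformed URL, or a URI the default file system cannot interpret
      return Optional.empty();
    }
  }
}
```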

How does that happen? The JPPF class loader stores resources looked up via ClassLoader.getResource() (and also getResources() and getResourceAsStream()) into a cache. By default, this cache is on the local file system, but it can also be in memory, if you set "jppf.resource.cache.storage = memory" in the node configuration.

Additionally, the class loader keeps in memory a mapping of resource names (for instance "somePackage/someFile") to all the known locations for each resource name. This map is used to determine whether to look up the resources in the remote client application : if a resource already exists in the map with that name, then this is what is returned. If not, then the class loader looks up the resource in the remote client, then adds it to the cache. When it is added to the cache, JPPF first checks if a file with the same name already exists, e.g. "/tmp/.jppf/abcdef/somePackage/someFile". If it already exists, then it will create the _1 folder, to store the new file without overwriting the old one, thus in "/tmp/.jppf/abcdef_1/somePackage/someFile".
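
The folder selection described above can be modeled in a few lines. This is only an illustration of the described behavior, not JPPF's actual code: the class name CacheFolderModel is hypothetical, and continuing with _2, _3 and so on when _1 is also taken is an assumption.

```java
import java.io.File;

/** Illustrative model (not JPPF code) of how a non-conflicting cache folder could be chosen. */
final class CacheFolderModel {
  private CacheFolderModel() {}

  /**
   * Returns the first folder among base/id, base/id_1, base/id_2, ...
   * in which resourceName does not exist yet, so that nothing is overwritten.
   */
  static File chooseFolder(File base, String id, String resourceName) {
    File folder = new File(base, id);
    // exists() is true for both files and directories, so a directory
    // created by hand at the same path also triggers the suffix
    for (int n = 1; new File(folder, resourceName).exists(); n++) {
      folder = new File(base, id + "_" + n);
    }
    return folder;
  }
}
```

With an empty cache, chooseFolder(base, "abcdef", "somePackage/someFile") returns base/abcdef; once something exists at base/abcdef/somePackage/someFile, the same call returns base/abcdef_1.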

So, what I believe is that you do not need to copy any of those files directly into /tmp/.jppf. These folders are made for JPPF internal use only and should not be tampered with. Just accessing a resource via the class loader will cause JPPF to download it automatically from the client onto the node. Is this not sufficient for your needs? If not, could you please clarify what is not working according to your requirements?

Thanks,
-Laurent

Ibereth
« Reply #9 on: January 06, 2014, 06:53:25 PM »

Hi again, first of all, thanks for the explanation.
The problem in my case is that I need to run some legacy code that receives a File object and expects that File object to be a directory, which it scans for several files. So we compute the File object from a Path object, but then we no longer have control over how that directory is going to be accessed and, obviously, the legacy code is not using the class loader to get the content of the directory.
My idea was to copy the content of that directory so it's on the node side when the legacy code tries to access it, but since I need several versions of the same code (maybe with different values in the files) to run in parallel, I cannot copy them to an absolute path.
When I was trying to understand how JPPF works with files, I realized that JPPF was already generating a temporary directory, and since the path given by the Path object was pointing to that directory, I tried to copy the files there first so they could be found by the legacy code.

lolo
« Reply #10 on: January 06, 2014, 08:08:26 PM »

Ok, thank you very much for the clarification, I think I get what your problem is now.
So I believe an approach that would work would be to load the entire content of the directory into the task on the client side, for instance in the task constructor, and in the task's run() method copy those files to a (new) directory with a unique name, resulting in a directory that you can pass to your legacy code. This could look like this:

Code: [Select]
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.jppf.server.protocol.JPPFTask;
// Location, FileLocation and MemoryLocation must also be imported
// from the package where your JPPF version defines them

public class MyTask extends JPPFTask {
  // maps the names of files in the directory to their content
  private final Map<String, byte[]> fileMap = new HashMap<>();
  private final String directoryName;
  // used to generate a unique directory name on the node side
  private static final AtomicInteger sequence = new AtomicInteger(0);

  // runs on the client side: loads the content of each regular file into memory
  public MyTask(File directory) throws Exception {
    this.directoryName = directory.getName();
    File[] files = directory.listFiles();
    if (files != null) {
      for (File file: files) {
        if (file.isFile()) {
          byte[] content = new FileLocation(file).toByteArray();
          fileMap.put(file.getName(), content);
        }
      }
    }
  }

  // runs on the node side
  @Override
  public void run() {
    try {
      File directory = copyFiles();
      // ... task code: pass 'directory' to the legacy code ...
    } catch (Exception e) {
      // report the failure back to the client
      setException(e);
    }
  }

  private File copyFiles() throws Exception {
    // first create a unique directory, for instance in '/tmp'
    File dir = new File("/tmp/" + this.directoryName + "_" + sequence.incrementAndGet());
    // create all necessary parent directories
    if (!dir.mkdirs()) throw new IllegalStateException("could not create directory '" + dir + "'");

    // copy all the files in that directory
    for (Map.Entry<String, byte[]> entry: fileMap.entrySet()) {
      // wrap the byte[] into a MemoryLocation object
      Location memLocation = new MemoryLocation(entry.getValue());
      // generate the path of the file to create
      File file = new File(dir, entry.getKey());
      Location fileLocation = new FileLocation(file);
      // copy the content to the actual file
      memLocation.copyTo(fileLocation);
    }
    return dir;
  }
}
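
The task above copies only the files located directly in the directory. If the legacy code expects nested subdirectories as well, the same idea can be extended with a recursive capture and restore. The following is a sketch using JDK classes only; DirectorySnapshot is a hypothetical name, and empty directories are deliberately not captured.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: captures a directory tree into memory and restores it elsewhere. */
final class DirectorySnapshot {
  private DirectorySnapshot() {}

  /** Maps the '/'-separated relative path of each regular file under root to its content. */
  static Map<String, byte[]> capture(Path root) {
    Map<String, byte[]> files = new TreeMap<>();
    try (Stream<Path> paths = Files.walk(root)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        if (Files.isRegularFile(p)) {
          // normalize the separator so that keys are portable between client and node
          String key = root.relativize(p).toString().replace(File.separatorChar, '/');
          files.put(key, Files.readAllBytes(p));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return files;
  }

  /** Recreates every captured file under targetRoot, creating subdirectories as needed. */
  static void restore(Map<String, byte[]> files, Path targetRoot) {
    try {
      for (Map.Entry<String, byte[]> entry : files.entrySet()) {
        Path target = targetRoot.resolve(entry.getKey());
        Files.createDirectories(target.getParent());
        Files.write(target, entry.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The map returned by capture() is serializable, so it could be held in a task field on the client side and passed to restore() on the node side.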

Does this resolve the problem?

-Laurent

Ibereth
« Reply #11 on: January 06, 2014, 08:54:05 PM »

I was doing something similar, but I'm trying to have something that can run both locally and with JPPF, and that's why I was interested in getting the classloader feature working :)
Is there a way to trick JPPF's cache so it can use the files I bring from the client instead of trying to get them itself?
Another approach I was considering was to send the names of the files in the directory in the data provider and look them up before calling my code (as initialization of the task), so JPPF will cache the real files; then Paths.get will return the right temporary directory and the legacy code will have a real directory with all the files. The problem with this approach is that it involves copying all the files on each run, and the idea of using the data provider was to minimize the transfers between client and node. Is there a way to avoid this?
Thanks...

lolo
« Reply #12 on: January 07, 2014, 02:55:35 PM »

Hi,

Quote
Is there a way to trick JPPF's cache so it can use the files I bring from the client instead of trying to get them itself?
Out of the box, no, there isn't any way to do it. I do not think it's a very good idea to implement this either, as it is very likely to break the class loader's behavior; besides, it doesn't make sense when the node is configured to store classpath resources in memory.

Your other approach would work for sure. You could avoid reloading the files at each task execution, by using the same JPPFClient uuid each time: the node will reuse the same class loader, since it is associated with the client uuid, and therefore the same resource cache. Is this something that would work for you?

Sincerely,
-Laurent

Ibereth
« Reply #13 on: January 07, 2014, 08:38:15 PM »

It could work, I will give it a try to see how it works.
In a related topic, is there a way to run some code on each node after the job is done? (to clean up some files)
Thanks...

lolo
« Reply #14 on: January 07, 2014, 10:55:14 PM »

Hi,

The best way to do some cleanup after a job is done would be with a node life cycle listener, in the jobEnding() notification method.
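
Whichever hook triggers it, the cleanup itself can be done with JDK classes only. The sketch below deletes a working directory and everything under it; DirectoryCleaner is a hypothetical name, and how it gets called from the jobEnding() notification is left out because it depends on the JPPF listener API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: removes a working directory created by a task. */
final class DirectoryCleaner {
  private DirectoryCleaner() {}

  /** Deletes root and all the files and folders it contains; does nothing if root is absent. */
  static void deleteRecursively(Path root) {
    if (!Files.exists(root)) return;
    try (Stream<Path> paths = Files.walk(root)) {
      // reverse order puts children before their parents, so folders are empty when deleted
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```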

-Laurent

Ibereth
« Reply #15 on: January 08, 2014, 08:58:41 PM »

Ok, back to the main problem :)
I tried to load all the resources first and then call the method that searches for the directory, and the problem is that if I first search for a file inside a directory and then search for the directory, it generates the _1 folder.
Sample code:

Code: [Select]
import java.nio.file.Paths;
import java.util.List;

import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.server.protocol.JPPFTask;

public class JppfSample {
  public static void main(String[] args) throws Exception {
    JPPFClient client = new JPPFClient();
    JPPFJob job = new JPPFJob();
    job.addTask(new JPPFTask() {
      private String resource1, resource2;

      public void run() {
        try {
          String file = Paths.get(Thread.currentThread().getContextClassLoader().getResource("resource2/resources/etc").toURI()).toFile().getAbsolutePath();
          System.out.println(resource1 = file);
          file = Paths.get(Thread.currentThread().getContextClassLoader().getResource("resource2/resources").toURI()).toFile().getAbsolutePath();
          System.out.println(resource2 = file);
        } catch (Throwable e) {
          e.printStackTrace();
        }
      }

      @Override public String toString() {
        return resource1 + "\n" + resource2;
      }
    });
    List<JPPFTask> submit = client.submit(job);
    JPPFTask task = submit.get(0);
    System.out.println(task);
    System.exit(0);
  }
}

I have a file at: resource2/resources/etc for this sample to work.

The output is:

/tmp/.jppf/cb9d8432d5f87cd98c84eb80289c0aa9/resource2/resources/etc
/tmp/.jppf/cb9d8432d5f87cd98c84eb80289c0aa9_1/resource2/resources

lolo
« Reply #16 on: January 09, 2014, 08:02:01 PM »

Hi,

Yes, the behavior you observe is normal: JPPF does not create any internal mapping for resources that are directories, so when you call ClassLoader.getResource("resource2/resources"), it will try to get it from the remote client, then when checking if the file name already exists in the cache, it will find out that there is indeed such a path and it will create the _1 folder to hold the result of the resource lookup.

BTW, I figured why ClassLoader.getResource() on a directory returns a string listing the files in this directory: this results from the implementation of the getInputStream() method in the class sun.net.www.protocol.file.FileURLConnection (it is undocumented and I had to decompile it to have an idea of what it does).
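
This can be observed outside of JPPF with a few lines of JDK-only code. The sketch below opens a stream on the "file:" URL of a directory and returns what it reads; DirectoryUrlDemo is a hypothetical name, and since the listing is an undocumented implementation detail of the JDK's file URL handler, the exact output may differ between JDKs.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical demo: reads the content served by the URL of a file or directory. */
final class DirectoryUrlDemo {
  private DirectoryUrlDemo() {}

  /** Opens the "file:" URL of the given file or directory and returns everything read from it. */
  static String readUrlContent(File fileOrDir) {
    try (InputStream in = fileOrDir.toURI().toURL().openStream()) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      for (int n; (n = in.read(buffer)) != -1; ) {
        out.write(buffer, 0, n);
      }
      // decoded with the platform default charset
      return out.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For a directory, the returned string contains the names of its entries rather than any file content, which matches the "single file with the list of files" observed earlier in this thread.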

So in your example, what I would do is the following:
Code: [Select]
File file = Paths.get(Thread.currentThread().getContextClassLoader().getResource("resource2/resources/etc").toURI()).toFile();
String resource = file.getAbsolutePath();
System.out.println("resource1 = " + resource);
resource = file.getParentFile().getAbsolutePath();
System.out.println("resource2 = " + resource);
This works without a problem.

Sincerely,
-Laurent