Sharing data among tasks : the DataProvider API

From JPPF 3.3 Documentation


After a job is submitted, the server distributes the tasks in the job among the nodes of the JPPF grid. Generally, more than one task may be sent to each node. Given the communication and serialization protocols implemented in JPPF, objects referenced by multiple tasks at submission time will be deserialized as multiple distinct instances at execution time in the node. This means that if n tasks reference object A at submission time, the node will actually deserialize multiple copies of A, with Task_1 referencing A_1, …, Task_n referencing A_n. If the shared object is very large, memory consumption in the node therefore grows with the number of tasks, and memory issues quickly arise.
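The duplication can be reproduced with plain Java serialization. The sketch below assumes, for illustration only, that each task is serialized in its own stream; it does not show JPPF's actual wire protocol. The classes LargeObject, SimpleTask and CopyDemo are hypothetical.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical large object shared by several tasks at submission time.
class LargeObject implements Serializable {
  final byte[] payload = new byte[1 << 20]; // about 1 MB
}

// Hypothetical task holding a direct reference to the shared object.
class SimpleTask implements Serializable {
  final LargeObject shared;

  SimpleTask(LargeObject shared) {
    this.shared = shared;
  }
}

class CopyDemo {
  // Serialize and deserialize one object through its own, independent stream.
  static Object roundTrip(Object o) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  // Ship each task separately: every received task ends up with its own LargeObject copy.
  static List<SimpleTask> shipSeparately(List<SimpleTask> tasks) {
    List<SimpleTask> received = new ArrayList<>();
    for (SimpleTask task : tasks) {
      received.add((SimpleTask) roundTrip(task));
    }
    return received;
  }
}
```

Java serialization preserves shared references only within a single ObjectOutputStream, so once the tasks cross separate serialization boundaries, each one carries its own copy of the shared object.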

To resolve this problem, JPPF provides a mechanism called data provider that enables sharing common objects among tasks in the same job. A data provider is an instance of a class that implements the interface DataProvider. Here is the definition of this interface:

 package org.jppf.task.storage;
 
 import java.io.Serializable;
 
 public interface DataProvider extends Serializable {
   Object getValue(Object key) throws Exception;
 
   void setValue(Object key, Object value) throws Exception;
 }

This is essentially a basic map interface: you store objects, each associated with a key, then retrieve them using that key.

Here is an example of using a data provider:

In the application:

 MyLargeObject myLargeObject = ...;
 // create a data provider backed by a HashMap
 DataProvider dataProvider = new MemoryMapDataProvider();
 // store the shared object in the data provider
 dataProvider.setValue("myKey", myLargeObject);
 // associate the dataProvider with the job
 JPPFJob job = new JPPFJob(dataProvider);
 job.add(new MyTask());

In the task:

 public class MyTask extends JPPFTask {
   public void run() {
     try {
       // get a reference to the data provider
       DataProvider dataProvider = getDataProvider();
       // retrieve the shared data; getValue() declares a checked Exception
       MyLargeObject myLargeObject = (MyLargeObject) dataProvider.getValue("myKey");
       // ... use the data ...
     } catch (Exception e) {
       // record the exception on the task
       setException(e);
     }
   }
 }
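One way to picture what the data provider changes on the node side is the following simplified model: the provider travels once with the job, is deserialized once in the node, and that single instance is attached to every task. This is a sketch of the idea under that assumption, not JPPF's implementation; ProviderAwareTask and NodeSideModel are hypothetical names, and a HashMap stands in for a DataProvider.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical task reading shared data through a provider reference, in the spirit of MyTask.
class ProviderAwareTask {
  private HashMap<String, Object> dataProvider;

  void setDataProvider(HashMap<String, Object> provider) {
    this.dataProvider = provider;
  }

  Object sharedValue() {
    return dataProvider.get("myKey");
  }
}

class NodeSideModel {
  // Client side: serialize the provider once for the whole job.
  static byte[] serialize(Serializable provider) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(provider);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Node side (assumed model): deserialize the provider once, then attach that
  // single instance to every task of the job.
  @SuppressWarnings("unchecked")
  static List<ProviderAwareTask> receiveJob(byte[] serializedProvider, int taskCount) {
    HashMap<String, Object> provider;
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(serializedProvider))) {
      provider = (HashMap<String, Object>) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
    List<ProviderAwareTask> tasks = new ArrayList<>();
    for (int i = 0; i < taskCount; i++) {
      ProviderAwareTask task = new ProviderAwareTask();
      task.setDataProvider(provider);
      tasks.add(task);
    }
    return tasks;
  }
}
```

Because every task reads through the same provider instance, the large value exists once in the node regardless of the number of tasks.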

Note 1: the association of a data provider to each task is done automatically by JPPF and is totally transparent to the application.

Note 2: from each task's perspective, the data provider should be considered read-only. Modifications to the data provider, such as adding or modifying values, will NOT be propagated beyond the scope of the node. Hence, a data provider cannot be used as a common data store for the tasks. Its only goal is to avoid excessive memory consumption.
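Note 2 can be illustrated with a small simulation, assuming each node works on its own deserialized copy of the provider. NodeLocalProvider and PropagationDemo are hypothetical names used only for this illustration.

```java
import java.io.*;
import java.util.HashMap;

// Hypothetical map-based provider used only for this illustration.
class NodeLocalProvider implements Serializable {
  private final HashMap<String, Object> values = new HashMap<>();

  Object get(String key) {
    return values.get(key);
  }

  void set(String key, Object value) {
    values.put(key, value);
  }
}

class PropagationDemo {
  // Simulates shipping the provider to a node: the node works on its own deserialized copy.
  static NodeLocalProvider shipToNode(NodeLocalProvider provider) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(provider);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (NodeLocalProvider) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A value set in one node's copy is visible to that node only; neither the client's provider nor another node's copy changes.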

The following sub-sections describe the implementations of DataProvider available in the JPPF API.

1 MemoryMapDataProvider: map-based provider

MemoryMapDataProvider is a very simple implementation of the DataProvider interface. It is backed by a java.util.HashMap<Object, Object>: getValue() is equivalent to a call to HashMap.get(), and setValue() is equivalent to HashMap.put().
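A minimal sketch of a HashMap-backed provider, assuming the DataProvider contract shown earlier. To keep the example compilable without the JPPF jars, it declares a local interface SimpleDataProvider with the same two methods; SimpleDataProvider and MapBackedDataProvider are hypothetical names, not JPPF classes.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Local copy of the DataProvider contract (hypothetical name), so this compiles without JPPF.
interface SimpleDataProvider extends Serializable {
  Object getValue(Object key) throws Exception;

  void setValue(Object key, Object value) throws Exception;
}

// Hypothetical implementation mirroring the behavior described for MemoryMapDataProvider.
class MapBackedDataProvider implements SimpleDataProvider {
  private final Map<Object, Object> store = new HashMap<>();

  @Override
  public Object getValue(Object key) {
    return store.get(key); // like HashMap.get(): null when the key is absent
  }

  @Override
  public void setValue(Object key, Object value) {
    store.put(key, value); // like HashMap.put(): replaces any previous value
  }
}
```

The overriding methods drop the throws Exception clause, which Java allows because an implementation may declare fewer checked exceptions than the interface it implements.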

2 ClientDataProvider: computing data in the client


Note: as of JPPF 3.2, this feature has been deprecated and may be removed in a future version. It is replaced with the JPPFTask.compute(JPPFCallable) and JPPFTask.isInNode() APIs, as described in "Executing code in the client from a task". These APIs resolve a limitation of ClientDataProvider, which does not work when a task is executed locally in the JPPF client.



JPPF provides a way for a task to send a piece of code to be executed on the client and get the resulting data objects.

This functionality is available through the use of a new DataProvider: ClientDataProvider.

The class ClientDataProvider extends MemoryMapDataProvider and adds one method:

 public <V> Object computeValue(Object key, JPPFCallable<V> callable)

Here, callable is the equivalent of a callback that is sent to the client for execution, and whose result is stored in the DataProvider on the node side. The interface JPPFCallable<V> is defined as follows:

 public interface JPPFCallable<V> extends Callable<V>, Serializable {
 }

Example use:

 public class DataProviderTestTask extends JPPFTask {
   public void run() {
     System.out.println("this should be on the node side");
     ClientDataProvider dataProvider = (ClientDataProvider) getDataProvider();
     try {
       // compute a value on the client side and store it in the data provider
       Object o = dataProvider.computeValue("result", new MyCallable());
       System.out.println("Result of client-side execution:\n" + o);
       setResult(o);
       // retrieve the value without re-computing it
       Object o2 = dataProvider.getValue("result");
     } catch (Exception e) {
       // getValue() declares a checked Exception
       setException(e);
     }
   }
 
   /**
    * A callable that simply prints a message on the client side
    * and returns the message to the node.
    */
   public static class MyCallable implements JPPFCallable<String> {
     public String call() {
       String s = "this should be on the client side";
       System.out.println(s);
       return s;
     }
   }
 }

Here is the sequence of steps performed when calling the method ClientDataProvider.computeValue():

  • the JPPFCallable instance is sent to the client application
  • the resulting value is computed as the return value of JPPFCallable.call()
  • the resulting value is sent back to the node
  • the value is stored in the data provider by calling ClientDataProvider.setValue(key, value)
  • the value is returned as the result of ClientDataProvider.computeValue()

Once the value has been computed, it can be retrieved, without being computed again, by calling the method ClientDataProvider.getValue(). To compute a new value, ClientDataProvider.computeValue() should be called again.
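The five steps above can be emulated in a single JVM to make the contract concrete. This is a hypothetical sketch: LocalCallable stands in for JPPFCallable, LocalComputeProvider is not a JPPF class, and the transfers between node and client are replaced by a direct call.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.concurrent.Callable;

// Local stand-in for JPPFCallable<V> as defined above (hypothetical name).
interface LocalCallable<V> extends Callable<V>, Serializable {
}

// Hypothetical single-process emulation of the computeValue() sequence.
// In JPPF the callable is sent to the client and the result travels back to the node;
// here the "client" is simply the current thread, so those transfers are no-ops.
class LocalComputeProvider {
  private final HashMap<Object, Object> values = new HashMap<>();
  private int computations = 0; // number of times a callable has been executed

  Object getValue(Object key) {
    return values.get(key);
  }

  void setValue(Object key, Object value) {
    values.put(key, value);
  }

  <V> Object computeValue(Object key, LocalCallable<V> callable) throws Exception {
    V value = callable.call(); // steps 1-3: run the callable and obtain its return value
    computations++;
    setValue(key, value);      // step 4: store the value under the key
    return value;              // step 5: return it to the caller
  }

  int computations() {
    return computations;
  }
}
```

Calling getValue("result") afterwards returns the stored value without running the callable again; the counter makes that observable.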



3 Data provider for non-JPPF tasks

By default, tasks whose class does not extend JPPFTask do not have access to the DataProvider that is set on a job. This includes tasks that implement Runnable or Callable (including those submitted with a JPPFExecutorService), tasks annotated with @JPPFRunnable, and POJO tasks.

JPPF provides a mechanism that enables non-JPPF tasks to gain access to the DataProvider. To this end, the task must implement the interface DataProviderHolder, defined as follows:

 package org.jppf.client.taskwrapper;
 import org.jppf.task.storage.DataProvider;
 
 // This interface must be implemented by tasks that are not subclasses
 // of JPPFTask when they need access to the job's DataProvider
 public interface DataProviderHolder {
   // Set the data provider for the task
   void setDataProvider(DataProvider dataProvider);
 }

Here is an example implementation:

 import java.io.Serializable;
 import java.util.concurrent.Callable;
 import org.jppf.client.taskwrapper.DataProviderHolder;
 import org.jppf.task.storage.DataProvider;
 
 public class MyTask
   implements Callable<String>, Serializable, DataProviderHolder {
 
   // DataProvider set onto this task
   private transient DataProvider dataProvider;
 
   @Override
   public String call() throws Exception {
     String result = (String) dataProvider.getValue("myKey");
     System.out.println("got value " + result);
     return result;
   }
 
   // Called by the node when the task is received from the server
   @Override
   public void setDataProvider(final DataProvider dataProvider) {
     this.dataProvider = dataProvider;
   }
 }

Note that the dataProvider field is declared transient, to prevent the DataProvider from being serialized along with the task when the task is sent back to the server after execution. Another way to achieve this is to set the field to null at the end of the call() method, for instance in a try {} finally {} block.
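The sketch below puts the pieces of this section together, under the assumption that the node simply checks whether a task implements the holder interface before calling setDataProvider(). KeyValueSource, KeyValueSourceHolder, HolderTask and HolderDemo are hypothetical stand-ins for DataProvider, DataProviderHolder, MyTask and the node-side logic, so the code compiles without JPPF.

```java
import java.io.*;
import java.util.concurrent.Callable;

// Hypothetical local stand-ins for DataProvider and DataProviderHolder.
interface KeyValueSource {
  Object getValue(Object key) throws Exception;
}

interface KeyValueSourceHolder {
  void setDataProvider(KeyValueSource source);
}

// A non-JPPF task in the style of MyTask above.
class HolderTask implements Callable<String>, Serializable, KeyValueSourceHolder {
  // transient: skipped when the task itself is serialized
  private transient KeyValueSource dataProvider;

  @Override
  public void setDataProvider(KeyValueSource source) {
    this.dataProvider = source;
  }

  boolean hasDataProvider() {
    return dataProvider != null;
  }

  @Override
  public String call() throws Exception {
    return (String) dataProvider.getValue("myKey");
  }
}

class HolderDemo {
  // Assumed node-side step: only tasks implementing the holder interface get the provider.
  static void inject(Object task, KeyValueSource source) {
    if (task instanceof KeyValueSourceHolder) {
      ((KeyValueSourceHolder) task).setDataProvider(source);
    }
  }

  // Serialize and deserialize the task, as when it is sent back after execution.
  static HolderTask roundTrip(HolderTask task) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(task);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (HolderTask) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The provider passed in a test can be a plain lambda, which is not serializable; the round trip still succeeds because the field is transient, and the deserialized task comes back without a provider.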


JPPF Copyright © 2005-2020 JPPF.org