

Sharing data among tasks : the DataProvider API

From JPPF 6.3 Documentation


Latest revision as of 06:44, 30 June 2019


Main Page > Development guide > Sharing data among tasks

After a job is submitted, the server distributes its tasks among the nodes of the JPPF grid; generally, more than one task may be sent to each node. Given the communication and serialization protocols implemented in JPPF, objects referenced by multiple tasks at submission time are deserialized as multiple distinct instances at the time of execution in the node. This means that, if n tasks reference object A at submission time, the node will actually deserialize n copies of A, with task 1 referencing copy A1, …, task n referencing copy An. Consequently, if the shared object is very large, we will quickly face memory issues.
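This duplication can be observed with plain Java serialization, independently of JPPF: when two objects that share a reference are serialized separately (as tasks are when dispatched), each deserialization produces its own copy of the shared object. A minimal, self-contained sketch (the class names are illustrative, not part of the JPPF API):

```java
import java.io.*;

public class SharedRefDemo {
  // simulates a task holding a reference to a (potentially large) shared object
  static class Task implements Serializable {
    final byte[] shared;
    Task(byte[] shared) { this.shared = shared; }
  }

  // serializes an object to bytes and deserializes it back
  static Object roundTrip(Object o) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
      return ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] bigObject = new byte[1024];
    // both tasks reference the same instance at submission time
    Task t1 = new Task(bigObject), t2 = new Task(bigObject);
    // each task serialized and deserialized separately, as when dispatched to a node
    Task c1 = (Task) roundTrip(t1);
    Task c2 = (Task) roundTrip(t2);
    System.out.println(c1.shared == c2.shared); // false: two distinct copies in memory
  }
}
```

With n tasks, the node would hold n copies of the shared object, which is exactly the memory problem the data provider solves.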

To resolve this problem, JPPF provides a mechanism called data provider that enables sharing common objects among tasks in the same job. A data provider is an instance of a class that implements the interface DataProvider. Here is the definition of this interface:

public interface DataProvider extends Metadata {
  // @deprecated: use getParameter(Object) instead
  <T> T getValue(final Object key) throws Exception;
  // @deprecated: use setParameter(Object, Object) instead
  void setValue(Object key, Object value) throws Exception;
}

As we can see, the two methods in the interface are deprecated, but kept to preserve compatibility with applications written with a JPPF version prior to 4.0. The actual API is defined in the Metadata interface as follows:

public interface Metadata extends Serializable {
  // Retrieve a parameter in the metadata
  <T> T getParameter(Object key);
  // Return a parameter in the metadata, or a default value if not found
  <T> T getParameter(Object key, T def);
  // Set or replace a parameter in the metadata
  void setParameter(Object key, Object value);
  // Remove a parameter from the metadata
  <T> T removeParameter(Object key);
  // Get the metadata map
  Map<Object, Object> getAll();
  // Clear all the metadata
  void clear();
}

This is indeed a basic object map interface: you can store objects and associate them with a key, then retrieve these objects using the associated key.
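To make these map semantics concrete, here is a hedged, self-contained sketch of how the Metadata contract behaves, emulated with a plain java.util map (this stand-in class is illustrative only, not the actual JPPF implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the Metadata contract, backed by a concurrent map
public class SimpleMetadata {
  private final Map<Object, Object> map = new ConcurrentHashMap<>();

  // retrieve a parameter, or null if not present
  @SuppressWarnings("unchecked")
  public <T> T getParameter(Object key) { return (T) map.get(key); }

  // retrieve a parameter, falling back to a default value
  @SuppressWarnings("unchecked")
  public <T> T getParameter(Object key, T def) {
    Object v = map.get(key);
    return v == null ? def : (T) v;
  }

  // set or replace a parameter
  public void setParameter(Object key, Object value) { map.put(key, value); }

  // remove a parameter and return its previous value
  @SuppressWarnings("unchecked")
  public <T> T removeParameter(Object key) { return (T) map.remove(key); }

  public Map<Object, Object> getAll() { return map; }
  public void clear() { map.clear(); }
}
```

Usage mirrors the real interface: store a value with setParameter("myKey", value), then retrieve it on the node side with getParameter("myKey"), optionally supplying a default.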

Here is an example of using a data provider:

In the application:

MyLargeObject myLargeObject = ...;
// create a data provider backed by a Hashtable
DataProvider dataProvider = new MemoryMapDataProvider();
// store the shared object in the data provider
dataProvider.setParameter("myKey", myLargeObject);
// associate the data provider with the job
JPPFJob job = new JPPFJob();
job.setDataProvider(dataProvider);
job.add(new MyTask());

In the task:

public class MyTask extends JPPFTask {
  @Override
  public void run() {
    // get a reference to the data provider
    DataProvider dataProvider = getDataProvider();
    // retrieve the shared data
    MyLargeObject myLargeObject = dataProvider.getParameter("myKey");
    // ... use the data ...
  }
}

Note 1: the association of a data provider to each task is done automatically by JPPF and is totally transparent to the application.

Note 2: from each task's perspective, the data provider should be considered read-only. Modifications to the data provider, such as adding or changing values, will NOT be propagated beyond the scope of the node. Hence, a data provider cannot be used as a common data store for the tasks. Its only goal is to avoid excessive memory consumption and improve the performance of job serialization.

In the next sub-sections, we detail the implementations of DataProvider provided by the JPPF API.

MemoryMapDataProvider: map-based provider

MemoryMapDataProvider is a very simple implementation of the DataProvider interface. It is backed by a java.util.Hashtable<Object, Object> and can be used safely from multiple concurrent threads.

Data provider for non-JPPF tasks

By default, tasks whose class does not extend AbstractTask do not have access to the DataProvider that is set on a job. This includes tasks that implement Runnable or Callable (including those submitted with a JPPFExecutorService), tasks annotated with @JPPFRunnable, and POJO tasks.

JPPF provides a mechanism which enables non-JPPF tasks to gain access to the DataProvider. To this effect, the task must implement the interface DataProviderHolder, defined as follows:

package org.jppf.client.taskwrapper;

// This interface must be implemented by tasks that are not subclasses
// of JPPFTask when they need access to the job's DataProvider
public interface DataProviderHolder {
  // Set the data provider for the task
  void setDataProvider(DataProvider dataProvider);
}

Here is an example implementation:

public class MyTask
  implements Callable<String>, Serializable, DataProviderHolder {

  // DataProvider set onto this task
  private transient DataProvider dataProvider;

  @Override
  public String call() throws Exception {
    String result = dataProvider.getParameter("myKey");
    System.out.println("got value " + result);
    return result;
  }

  // Called by the node when the task is received from the server
  @Override
  public void setDataProvider(final DataProvider dataProvider) {
    this.dataProvider = dataProvider;
  }
}
Note that the "dataProvider" attribute is declared transient, to prevent the DataProvider from being serialized along with the task when the task is sent back to the server after execution. Another way to achieve this is to set it to null at the end of the call() method, for instance in a try {} finally {} block.
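The effect of the transient modifier can be verified with plain Java serialization: a transient field is simply omitted from the serialized form and comes back as null after deserialization. A self-contained sketch (the class names are illustrative, not JPPF classes):

```java
import java.io.*;

public class TransientDemo {
  static class TaskResult implements Serializable {
    String result = "computed";
    // transient: dropped from the serialized form, as the dataProvider
    // field is when the task is sent back to the server
    transient Object dataProvider = new Object();

    // serializes this object to bytes and deserializes it back
    Object roundTrip() throws Exception {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(this); }
      try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
        return ois.readObject();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    TaskResult copy = (TaskResult) new TaskResult().roundTrip();
    System.out.println(copy.result);               // "computed": regular field survives
    System.out.println(copy.dataProvider == null); // true: transient field was dropped
  }
}
```

This is why marking the field transient keeps the DataProvider out of the task's return payload without any extra cleanup code.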


JPPF Copyright © 2005-2020