JPPF Performance


What is this page about?

The goal of this page is to provide information, hints and advice to help understand and maximize the performance of JPPF and JPPF-based applications. It is a work in progress, as we are constantly testing and improving the performance features of the framework, and it will be updated regularly as new findings come out.

Configuring JPPF for performance

Driver configuration

The JPPF driver distributes the tasks to the nodes, and this work is the main factor in the observed performance of the framework. It consists essentially of determining how many tasks, out of a set of tasks sent by the client application, will go to each node for execution. Each set of tasks sent to a node is called a "bundle", and the role of the load balancing (or task scheduling) algorithm is to optimize performance by adjusting the number of tasks sent to each node.
JPPF has 3 different algorithms to compute the distribution of tasks to the nodes, each with its own configuration parameters.

  1. Static algorithm: "manual" configuration
    This algorithm simply distributes a fixed number of tasks to each node. It is suited for networks where all nodes have equivalent characteristics (hardware and software), and where the type and number of tasks remain the same. There is only one parameter to configure: task.bundle.size, whose value is the number of tasks to distribute to each node.
  2. Heuristic algorithm: "autotuning" configuration
    This algorithm is heuristic in the sense that it finds a good solution for the number of tasks to send to each node, although that solution may not be the optimal one. It is loosely based on the Monte Carlo algorithm: the heuristic part comes from the fact that it performs a random walk within the space of solutions. It is a purely adaptive algorithm, based on the known past performance of each node. It does not rely on, nor does it know about, the characteristics of the nodes (i.e. hardware and software on the physical host).
    This algorithm is local to each node connection, meaning that a separate task bundle size is determined for each node, independently of the other nodes. The other nodes still adjust indirectly, in response to the performance changes generated each time a bundle size changes.
    Here is a recommended configuration for the best performance in most cases:
      # specify the algorithm
      task.bundle.strategy = autotuned
      # name of the performance profile to use
      task.bundle.autotuned.strategy = optimized_strategy
      # define the algorithm parameters
      strategy.optimized_strategy.minSamplesToAnalyse = 100
      strategy.optimized_strategy.minSamplesToCheckConvergence = 50
      strategy.optimized_strategy.maxDeviation = 0.2
      strategy.optimized_strategy.maxGuessToStable = 50
      strategy.optimized_strategy.sizeRatioDeviation = 1.5
      strategy.optimized_strategy.decreaseRatio = 0.2
    
  3. Deterministic algorithm: "proportional" configuration
    Like the heuristic algorithm, this one is purely adaptive and based solely on the known past performance of the nodes.
    However, contrary to the heuristic approach, the computation has no random part. Each bundle size is determined from the node's mean task execution time raised to the power of N, so that a node with a lower mean time receives a proportionally larger bundle. Here, N is one of the algorithm's parameters, called "proportionality factor", hence the algorithm name. The mean time is computed as a moving average over the last M tasks executed by a given node. M is the other algorithm parameter, called "performance cache size".
    Also contrary to the heuristic approach, this algorithm is global to the server, which means that the bundle size for each node depends on that of the other nodes. This also means that it does not allow overriding the bundle size tuning strategy at the node level.
    This algorithm is particularly well suited for relatively small networks of 100 nodes or fewer, where we have regularly observed that it outperforms the heuristic algorithm. Above that number of nodes, the computation overhead is likely to outweigh the performance gain.
    We have found the following configuration to provide excellent performance:
      # specify the algorithm
      task.bundle.strategy = proportional
      # name of the performance profile to use
      task.bundle.autotuned.strategy = optimized_strategy
      # define the algorithm parameters
      strategy.optimized_strategy.performanceCacheSize = 2000
      strategy.optimized_strategy.proportionalityFactor = 2
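
The proportional computation described above can be illustrated with a minimal Java sketch. This is not the JPPF implementation: the class and method names (ProportionalSketch, NodeStats, bundleSizes) are hypothetical, and the sketch assumes that each node's weight is the inverse of its mean execution time raised to the power N, normalized over all nodes. The moving average plays the role of the "performance cache size" M.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only, not the JPPF driver code.
class ProportionalSketch {

    /** Moving average over the last maxSamples task execution times of one node. */
    static final class NodeStats {
        private final Deque<Double> samples = new ArrayDeque<>();
        private final int maxSamples; // plays the role of "performance cache size" (M)
        private double sum;

        NodeStats(int maxSamples) {
            this.maxSamples = maxSamples;
        }

        void addSample(double millis) {
            samples.addLast(millis);
            sum += millis;
            // Evict the oldest sample once the cache is full.
            if (samples.size() > maxSamples) {
                sum -= samples.removeFirst();
            }
        }

        double mean() {
            return samples.isEmpty() ? 0.0 : sum / samples.size();
        }
    }

    /**
     * Splits totalTasks among the nodes with weights (1 / mean)^factor.
     * Assumption: every node has at least one sample with a positive time.
     * Because of rounding, the sizes may not add up exactly to totalTasks.
     */
    static Map<String, Integer> bundleSizes(Map<String, NodeStats> nodes, int totalTasks, int factor) {
        Map<String, Double> weights = new LinkedHashMap<>();
        double weightSum = 0.0;
        for (Map.Entry<String, NodeStats> e : nodes.entrySet()) {
            double w = Math.pow(1.0 / e.getValue().mean(), factor);
            weights.put(e.getKey(), w);
            weightSum += w;
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            int size = (int) Math.round(totalTasks * e.getValue() / weightSum);
            sizes.put(e.getKey(), Math.max(1, size)); // never send an empty bundle
        }
        return sizes;
    }

    public static void main(String[] args) {
        NodeStats fast = new NodeStats(2000);
        fast.addSample(10.0); // mean of 10 ms per task
        NodeStats slow = new NodeStats(2000);
        slow.addSample(20.0); // mean of 20 ms per task

        Map<String, NodeStats> nodes = new LinkedHashMap<>();
        nodes.put("fast", fast);
        nodes.put("slow", slow);

        // With a proportionality factor of 2, the node that is twice as fast
        // gets four times as many tasks.
        System.out.println(bundleSizes(nodes, 100, 2)); // prints {fast=80, slow=20}
    }
}
```

In this sketch, a higher proportionality factor widens the gap between the bundles sent to fast and slow nodes, while a larger cache makes the mean react more slowly to recent changes in node performance.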
    

Node configuration

For the nodes, there are 2 configuration areas that help manage performance:

  1. Specifying the number of execution threads:
    Tasks are sent to the nodes grouped as bundles for execution. It is clear that allowing the concurrent execution of multiple tasks will generate a performance benefit, if the hardware and OS on which the node is running can cope.
    To this effect, nodes use a pool of threads to perform the execution of the tasks. A node's configuration file has a property, "processing.threads" to specify the size of the execution thread pool.
    As the performance depends on many factors, including the hardware, the nature of the tasks and the ratio of I/O versus computation, it is difficult to make a general recommendation. In many cases, however, setting a pool size larger than the number of available CPUs/cores will significantly improve the performance.
  2. Overriding the server's tuning configuration:
    Each node has the ability to override the server's settings for performance tuning. This is done by specifying the same parameters, in the node configuration file, as those specified on the server side.
    The override is enabled by setting the property "task.bundle.strategy" to either "manual" or "autotuned". It only works if the server is configured for the heuristic "autotuned" algorithm. If the server uses the "proportional" or "manual" algorithm, node overrides are discarded. This is due to the global nature of these algorithms.
    The types of overrides you can specify are as follows:
      •  override the server's "autotuning" settings with a fixed bundle size for a specific node
      •  override the "autotuning" parameters with a set of node-specific parameters
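
To illustrate the effect of the execution thread pool, here is a minimal Java sketch of a node executing a bundle of tasks concurrently. It is not the JPPF node code: the class and method names (NodeThreadPoolSketch, poolSize, executeBundle) are hypothetical, and falling back to the number of available cores when "processing.threads" is missing or invalid is an assumption of this sketch, not a documented JPPF default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only, not the JPPF node code.
class NodeThreadPoolSketch {

    /** Reads "processing.threads"; falls back to the core count (assumed default). */
    static int poolSize(Properties config) {
        int cores = Runtime.getRuntime().availableProcessors();
        String value = config.getProperty("processing.threads");
        if (value == null) {
            return cores;
        }
        try {
            int n = Integer.parseInt(value.trim());
            return n > 0 ? n : cores;
        } catch (NumberFormatException e) {
            return cores;
        }
    }

    /** Runs all the tasks of a bundle on a fixed-size pool, returning results in task order. */
    static <T> List<T> executeBundle(List<Callable<T>> bundle, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task in the bundle has completed.
            for (Future<T> future : pool.invokeAll(bundle)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while executing the bundle", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task in the bundle failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("processing.threads", "4");

        List<Callable<Integer>> bundle = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            final int n = i;
            bundle.add(() -> n * n);
        }
        System.out.println(executeBundle(bundle, poolSize(config))); // prints [1, 4, 9, 16, 25]
    }
}
```

For I/O-bound tasks, threads spend much of their time waiting, which is one reason a pool larger than the number of cores can help. For purely CPU-bound tasks the benefit is less certain and is best measured on the target hardware.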

Client configuration

To be completed

Tasks design and performance

To be completed

Topology

To be completed

Advanced JPPF load balancing

To be completed
