Node configuration

Server discovery

By default, JPPF nodes are configured to automatically discover active servers on the network. As we have seen in the server discovery configuration, this is made possible by the UDP broadcast mechanism of the server. On its side, the node needs to join the same UDP group to subscribe to the broadcasts from the server, which is done by configuring the following properties:

 # Enable or disable automatic discovery of JPPF drivers
 jppf.discovery.enabled = true
 
 # UDP multicast group to which drivers broadcast their connection parameters
 jppf.discovery.group = 230.0.0.1
 
 # UDP multicast port to which drivers broadcast their connection parameters
 jppf.discovery.port = 11111
 
 # How long in milliseconds the node will attempt to automatically discover a driver
 # before falling back to the manual configuration parameters
 jppf.discovery.timeout = 5000
 
 # IPv4 address inclusion patterns
 jppf.discovery.include.ipv4 = 
 
 # IPv4 address exclusion patterns
 jppf.discovery.exclude.ipv4 = 
 
 # IPv6 address inclusion patterns
 jppf.discovery.include.ipv6 = 
 
 # IPv6 address exclusion patterns
 jppf.discovery.exclude.ipv6 = 

For the node to actually find a server on the network, the values for the group and port must be the same for a node and at least one server. If multiple servers are found on the network, the node will arbitrarily pick one.

Note the property jppf.discovery.timeout: it defines a fallback strategy that will cause the node to connect to the server defined in the manual configuration parameters after the specified time has elapsed.

The last four properties define inclusion and exclusion patterns for IPv4 and IPv6 addresses. Each of them defines a list of comma- or semicolon-separated patterns. For the syntax of the IPv4 patterns, please refer to the Javadoc for the class IPv4AddressPattern, and to IPv6AddressPattern for the IPv6 pattern syntax. This enables filtering out unwanted IP addresses: the discovery mechanism will only allow addresses that are included and not excluded.

Let's take, for instance, the following pattern specifications:

 jppf.discovery.include.ipv4 = 192.168.1.
 jppf.discovery.exclude.ipv4 = 192.168.1.100-

The inclusion pattern only allows IP addresses in the range 192.168.1.0, ..., 192.168.1.255. The exclusion pattern filters out IP addresses in the range 192.168.1.100, ..., 192.168.1.255. Thus, we have actually defined a filter that only accepts addresses in the range 192.168.1.0, ..., 192.168.1.99.

Instead of these 2 patterns, we could have simply defined the following equivalent inclusion pattern:

 jppf.discovery.include.ipv4 = 192.168.1.0-99
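
To double-check such patterns programmatically, the pattern classes referenced above can be used directly. Here is a minimal sketch, assuming the class org.jppf.net.IPv4AddressPattern with a matches(InetAddress) method as described in its Javadoc:

 import java.net.InetAddress;
 import org.jppf.net.IPv4AddressPattern;
 
 public class PatternCheck {
   public static void main(String[] args) throws Exception {
     // same filter as the inclusion/exclusion pair above
     IPv4AddressPattern pattern = new IPv4AddressPattern("192.168.1.0-99");
     // expected: true for an address in the accepted range ...
     System.out.println(pattern.matches(InetAddress.getByName("192.168.1.50")));
     // ... and false for an excluded one
     System.out.println(pattern.matches(InetAddress.getByName("192.168.1.150")));
   }
 }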

Manual network configuration

If server discovery is disabled, network access to a server must be configured manually. To this effect, the node requires the address of the host on which the server is running, along with two TCP ports, as shown in this example:

 # IP address or host name of the server
 jppf.server.host = my_host
 # class loader port
 class.server.port = 11111
 # communication between node and server
 node.server.port = 11113

Leaving these properties undefined is equivalent to assigning them their default values (i.e. “localhost” for the host address).

Socket connections idle timeout

In some environments, a firewall may be configured to automatically close socket connections that have been idle for more than a specified time. This may lead to a situation where a server is unaware that a node or client was disconnected, causing one or more jobs to never return. To remedy that situation, it is possible to configure an idle timeout on either side of the connection, so that the connection can be closed cleanly and grid operations can continue unhindered. This is done via the following property:

 jppf.socket.max-idle = timeout_in_seconds

If the timeout value is less than 10 seconds, it is interpreted as no timeout at all. The default value is -1.
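
Conceptually, this timeout acts as an idle watchdog on the connection. The following is a minimal sketch of the idea, not JPPF's actual implementation; the IdleWatchdog class and its members are illustrative names only:

 import java.io.IOException;
 import java.net.Socket;
 import java.util.concurrent.*;
 
 // illustrative only: closes a socket that has been idle longer than maxIdleSeconds
 public class IdleWatchdog {
   private volatile long lastActivity = System.currentTimeMillis();
 
   public void watch(final Socket socket, final int maxIdleSeconds) {
     final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
     timer.scheduleAtFixedRate(new Runnable() {
       public void run() {
         if (System.currentTimeMillis() - lastActivity > maxIdleSeconds * 1000L) {
           try { socket.close(); } catch (IOException ignored) {}
           timer.shutdown();
         }
       }
     }, 1L, 1L, TimeUnit.SECONDS);
   }
 
   // to be called whenever data is read from or written to the socket
   public void touch() { lastActivity = System.currentTimeMillis(); }
 }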

JMX management configuration

JPPF uses JMX to provide remote management capabilities for the nodes, and uses the default RMI connector for communication. Each node has its own embedded RMI registry.

The management features are enabled by default; this behavior can be changed by setting the following property:

 # Enable or disable management of this node
 jppf.management.enabled = true

When management is enabled, the following properties must be defined:

 # JMX management host IP address. If not specified (recommended), the first non-local
 # IP address (i.e. neither 127.0.0.1 nor localhost) on this machine will be used.
 # If no non-local IP is found, localhost will be used.
 jppf.management.host = localhost
 
 # JMX management port, used by the remote JMX connector
 jppf.management.port = 11198
 
 # Internal RMI port used by JMX management
 jppf.management.rmi.port = 12198

These properties have the same meaning and usage as for a server.
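
For reference, a management client can connect to a node's JMX server through the JPPF management APIs. This is a minimal sketch, assuming the class org.jppf.management.JMXNodeConnectionWrapper documented in the management section:

 import org.jppf.management.JMXNodeConnectionWrapper;
 
 public class NodeMonitor {
   public static void main(String[] args) throws Exception {
     // host and port must match jppf.management.host and jppf.management.port
     JMXNodeConnectionWrapper jmx = new JMXNodeConnectionWrapper("my_host", 11198);
     jmx.connectAndWait(5000L);                  // wait up to 5 seconds for the connection
     System.out.println("node state: " + jmx.state());
     jmx.close();
   }
 }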

Recovery and failover

When the connection to a server is interrupted, the node will automatically attempt, for a given length of time and at regular intervals, to reconnect to the same server. This behavior is configured via the following properties, shown here with their default values:

 # number of seconds before the first reconnection attempt
 reconnect.initial.delay = 1
 
 # time after which the system stops trying to reconnect, in seconds
 # a value of zero or less means it never stops
 reconnect.max.time = 60
 
 # time between two connection attempts, in seconds
 reconnect.interval = 1

With these values, we have configured the recovery mechanism to attempt reconnecting to the server after a 1 second delay, for up to 60 seconds, with connection attempts at 1 second intervals.
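
The following sketch illustrates how these three properties interact; it is a simplified illustration, not the actual JPPF code, and tryConnect() is a hypothetical method performing a single connection attempt:

 // illustration of the reconnection semantics described above
 void reconnect(long initialDelay, long maxTime, long interval) throws InterruptedException {
   Thread.sleep(initialDelay * 1000L);                  // reconnect.initial.delay
   long start = System.currentTimeMillis();
   boolean connected = false;
   // a maxTime of zero or less means the node never stops trying
   while (!connected && ((maxTime <= 0L) || (System.currentTimeMillis() - start < maxTime * 1000L))) {
     connected = tryConnect();
     if (!connected) Thread.sleep(interval * 1000L);    // reconnect.interval
   }
 }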

Interaction of failover and server discovery

When discovery is enabled for the node (jppf.discovery.enabled = true) and the maximum reconnection time is not infinite (reconnect.max.time = <strictly_positive_value>), a sophisticated failover mechanism takes place, following the sequence of steps below:

  • the node attempts to reconnect to the driver to which it was previously connected (or attempted to connect), for a maximum time specified by the configuration property "reconnect.max.time"
  • during this time, it will make multiple attempts to connect to the same driver; this covers the case where the driver is restarted in the meantime
  • after this maximum time has elapsed, it will attempt to auto-discover another driver, for a maximum time specified via the configuration property "jppf.discovery.timeout" (in milliseconds)
  • if the node still fails to reconnect after this timeout has expired, it will fall back to the driver manually specified in the node's configuration file
  • the cycle then starts again, as sketched below
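
In pseudo-code form, the cycle can be summarized as follows; reconnectToCurrentDriver, discoverDriver and connectTo are illustrative names, not actual JPPF APIs:

 // illustrative summary of the failover cycle described above
 while (true) {
   // 1. retry the current driver for up to reconnect.max.time seconds
   if (reconnectToCurrentDriver(reconnectMaxTime)) break;
   // 2. then try UDP discovery for up to jppf.discovery.timeout milliseconds
   String driver = discoverDriver(discoveryTimeout);
   // 3. finally fall back to the manually configured driver
   if (driver == null) driver = manuallyConfiguredDriver;
   if (connectTo(driver)) break;
   // 4. otherwise the cycle starts again
 }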

Recovery from hardware failures

The mechanism to recover from hardware failures has its counterpart on each node, which works as follows:

  1. the node establishes a specific connection to the server, dedicated to failure detection
  2. at connection time, a handshake protocol takes place, where the node communicates a unique id (UUID) to the server
  3. the node will then attempt to get a message from the server (“check” message).
  4. if the message from the server is not received within a specified time frame, a specified number of times in a row, the node will consider the connection to the server broken, close it cleanly, and let the recovery and failover mechanism take over, as described in the previous section Interaction of failover and server discovery; a minimal sketch of this logic is shown below.
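
Here is that minimal sketch of the node-side check logic, using a plain socket read timeout; the structure and the readCheckMessage() method are illustrative, not JPPF's actual implementation:

 // illustrative only: give up after maxRetries consecutive read timeouts
 void watchConnection(Socket socket, int readTimeout, int maxRetries) throws IOException {
   socket.setSoTimeout(readTimeout);          // jppf.recovery.read.timeout, in milliseconds
   int failures = 0;
   while (failures < maxRetries) {            // jppf.recovery.max.retries
     try {
       readCheckMessage(socket);              // hypothetical blocking read of one "check" message
       failures = 0;                          // a successful read resets the counter
     } catch (SocketTimeoutException e) {
       failures++;
     }
   }
   socket.close();                            // connection considered broken: close it cleanly
 }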

The following configuration properties control the node side of this hardware failure recovery mechanism:

 # Enable recovery from hardware failures on the node.
 # Default value is false (disabled).
 jppf.recovery.enabled = false
 
 # Dedicated port number for the detection of node failure, must be the same as
 # the value specified in the server configuration. Default value is 22222.
 jppf.recovery.server.port = 22222
  
 # Maximum number of attempts to get a message from the server before the
 # connection is considered broken. Default value is 2.
 jppf.recovery.max.retries = 2
  
 # Maximum time in milliseconds allowed for each attempt to get a message
 # from the server. Default value is 60000 (1 minute).
 jppf.recovery.read.timeout = 60000

Note: if server discovery is active for a node, then the port number specified for the driver will override the one specified in the node's configuration.

Processing threads

A node can process multiple tasks concurrently, using a pool of threads. The size of this pool is configured as follows:

 # number of threads running tasks in this node
 processing.threads = 4

If this property is not defined, its value defaults to the number of processors or cores available to the JVM.
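
This default corresponds to what the standard Java API reports for the current JVM:

 // default number of processing threads when processing.threads is not defined
 int defaultThreads = Runtime.getRuntime().availableProcessors();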

Node process configuration

In the same way as for a server (see Server process configuration), the node is made of two processes. In addition to the properties and environment inherited from the controller process, it is possible to specify other JVM options via the following configuration property:

 jppf.jvm.options = -Xms64m -Xmx512m

As for the server, it is possible to specify additional class path elements through this property, by adding one or more “-cp” or “-classpath” options (unlike the Java command, which only accepts one). For example:

 jppf.jvm.options = -cp lib/myJar.jar -cp lib/OtherJar.jar -Xmx512m

Class loader cache

Each node creates a specific class loader for each new client whose tasks are executed in that node. The cache itself is managed as a bounded queue, and the oldest class loader will be evicted from the cache whenever the maximum size is reached. The evicted class loader then becomes unreachable and can be garbage collected. In most modern JDKs, this also results in the classes being unloaded.
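
This eviction policy is essentially that of a bounded LRU-style map. Here is a minimal sketch of the idea using a standard LinkedHashMap (illustrative only, not JPPF's actual implementation):

 import java.util.LinkedHashMap;
 import java.util.Map;
 
 // illustrative only: a bounded cache that evicts its oldest entry when full
 public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
   private final int maxSize;
 
   public BoundedCache(int maxSize) { this.maxSize = maxSize; }
 
   @Override
   protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
     // returning true drops the eldest entry, making it eligible for garbage collection
     return size() > maxSize;
   }
 }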

If the class loader cache size is too large, this can lead to an out of memory condition in the node, especially in these two scenarios:

  • if too many classes are loaded, the space reserved for class definitions (permanent generation in the Oracle JDK) will fill up and cause an “OutOfMemoryError: PermGen space”
  • if the classes hold a large amount of static data (via static fields and static initializers), an “OutOfMemoryError: Java heap space” will be thrown

To mitigate this, the size of the class loader cache can be configured in the node as follows:

 jppf.classloader.cache.size = 50

The default value for this property is 50, and the value must be at least equal to 1.

Security policy

It is possible to limit what the nodes can do on the machine that hosts them, by restricting the permissions granted to them on their host. These permissions are based on the Java security policy model. Discussing Java security is beyond the scope of this document, but there is ample material about it in the JDK documentation.

To implement security, nodes require a security policy file. The syntax of this file is similar to that of Java security policy files, except that it only accepts permission entries (no grant or security context entries).

Some examples of permission entries:

 // permission to read, write, delete node log file in current directory
 permission java.io.FilePermission "${user.dir}/jppf-node.log", "read,write,delete";
 // permission to read all log4j system properties
 permission java.util.PropertyPermission "log4j.*", "read";
 // permission to connect to, or listen on, the default MySQL port (3306) on localhost
 permission java.net.SocketPermission "localhost:3306", "connect,listen";

To enable the security policy, the node configuration file must contain the following property definition:

 # Path to the security file, relative to the current directory or classpath
 jppf.policy.file = jppf.policy

When this property is not defined, or the policy file cannot be found, security is disabled.

The policy file does not have to be local to the node. If it is not present locally, the node will download it from the server. In this case, the file must be accessible by the server, and its path will be interpreted as a path on the server's file system. This feature, combined with the ability to remotely restart the nodes, makes it easy to update the security policy and propagate the changes to all the nodes.

Full node configuration file (default values)

 # Host name, or ip address, of the host the JPPF driver is running on
 jppf.server.host = localhost
 
 # port number for the class server that performs remote class loading
 class.server.port = 11111
 
 # port number the nodes connect to
 node.server.port = 11113
 
 # Enabling JMX features
 jppf.management.enabled = true
 
 # JMX management host IP address
 #jppf.management.host = localhost
 
 # JMX management port
 jppf.management.port = 12001
 
 # Internal RMI port used by JMX management
 jppf.management.rmi.port = 13001
 
 # path to the JPPF security policy file
 #jppf.policy.file = config/jppf.policy
 
 # Enable/Disable automatic discovery of JPPF drivers
 jppf.discovery.enabled = true
 
 # UDP multicast group to which drivers broadcast their connection parameters
 jppf.discovery.group = 230.0.0.1
 
 # UDP multicast port to which drivers broadcast their connection parameters
 jppf.discovery.port = 11111
 
 # How long the node will attempt to automatically discover a driver before
 # falling back to the parameters specified in this configuration file
 jppf.discovery.timeout = 5000
 
 # Automatic recovery: number of seconds before the first reconnection attempt
 reconnect.initial.delay = 1
 
 # Time after which the system stops trying to reconnect, in seconds
 reconnect.max.time = 60
 
 # Automatic recovery: time between two connection attempts, in seconds
 reconnect.interval = 1
 
 # Processing Threads: number of threads running tasks in this node
 #processing.threads = 1
 
 # Other JVM options added to the java command line when the node is started as
 # a subprocess. Multiple options are separated by spaces
 jppf.jvm.options = -server -Xmx256m