Configuring a JPPF server
From JPPF 4.2 Documentation
|Main Page > Configuration guide > Configuring a JPPF server|
Basic network configuration
The server network communication mechanism uses TCP/IP. To do its basic work of receiving jobs and dispatching them for execution, one TCP port is required. In the configuration file, this property would be defined as follows, with its default value:
# JPPF server port jppf.server.port = 11111
- not defining this property is equivalent to assigning it its default value.
- dynamic port allocation: when the port number is set to 0, JPPF will dynamically allocate a valid port number. Note that this feature is mostly useful when server discovery is enabled.
Backward compatibility with JPPF v2.x: To avoid too much disruption in applications configured for JPPF v2.x, JPPF will use the server port defined with the "old" property "class.server.port" if "jppf.server.port" is not defined.
By default, JPPF nodes and clients are configured to automatically discover active servers on the network. This is made possible because, by default, a JPPF server will broadcast the required information (i.e. host address and port numbers) using the UDP multicast mechanism. This mechanism itself is configurable, by setting the following properties:
# Enable or disable automatic discovery of JPPF drivers jppf.discovery.enabled = true # UDP multicast group to which drivers broadcast their connection parameters jppf.discovery.group = 220.127.116.11 # UDP multicast port to which drivers broadcast their connection parameters jppf.discovery.port = 11111 # How long a driver should wait between 2 broadcasts, in milliseconds jppf.discovery.broadcast.interval = 1000 # IPv4 address inclusion patterns jppf.discovery.broadcast.include.ipv4 = # IPv4 address exclusion patterns jppf.discovery.broadcast.exclude.ipv4 = # IPv6 address inclusion patterns jppf.discovery.broadcast.include.ipv6 = # IPv6 address exclusion patterns jppf.discovery.broadcast.exclude.ipv6 =
The values indicated above are the default values. Note that, given the nature of the UDP protocol, the broadcast data is transient, and has to be re-sent at regular intervals to allow new nodes or clients to find the information. The broadcast interval property allows some control over the level of network traffic ("chattyness") thus generated.
The last four properties define inclusion and exclusion patterns for IPv4 and IPv6 addresses. Each of them defines a list of comma- or semicolumn- separated patterns. The IPv4 patterns can be exppressed in either CIDR notation, or in a syntax defined in the Javadoc for the class IPv4AddressPattern. Similarly, IPv6 patterns can be expressd in CIDR notation or in a syntax defined in IPv6AddressPattern. This allows restricting the network interfaces on which to broadcast the server information: the server will only broadcast from the host's addresses that are included and not excluded.
Connecting to other servers
We have seen in the JPPF topology section that servers can connect to each other, up to a full-fledged peer-to-peer topology. When a server A connects to another server B, A will act as a node attached to B (from B's perspective). Based on this, there are 4 possible kinds of connectivity between 2 servers:
- A and B are not connected at all
- A is connected to B (i.e. A acts as a node attached to B)
- B is connected to A (i.e. B acts as a node attached to A)
- A and B are connected to each other
There are 2 ways to define a connection from a server to other servers on the network:
Using automatic discovery
In this scenario, we must enable the discovery of peer servers:
# Enable or disable auto-discovery of other peer servers (defaults to false) jppf.peer.discovery.enabled = true
For this to work, the server broadcast must be enabled on the peer server(s), and the properties defined in the previous section will be used, hence they must be set to the same values on the other server(s). A server can discover other servers without having to broadcast its own connection information (i.e. without being "discoverable").
Please note that the default value for the above property is "false". Setting the default to "true" would imply that each server would connect to all other servers accessible on the network, with a high risk of unwanted side effects.
Manual connection to peer servers
This will be best illustrated with an example configuration:
# define a space-separated list of peers to connect to jppf.peers = server_1 server_2 # connection to server_1 jppf.peer.server_1.server.host = host_1 jppf.peer.server_1.server.port = 11111 # connection to server_2 jppf.peer.server_2.server.host = host_2 jppf.peer.server_2.server.port = 11111
To connect to each peer, we must define its IP address or host name as well as a port number. Please note that the value we have defined for " jppf.peer.server_1.server.port" must be the same as the one defined for "jppf.server.port" in server_1's configuration, and the value for "jppf.peer.server_1.server.port" must be equal to that of "jppf.server.port" in server_1's configuration.
Backward compatibility with JPPF v2.x: To avoid too much disruption in applications configured for JPPF v2.x, JPPF will use the server port defined with the "old" property "jppf.peer.server_1.class.port" if "jppf.peer.server_1.server.port" is not defined.
Using manual configuration and server discovery together
It is also possible to use the manual configuration simultaneously with the discovery, by adding a specific peer driver name, “jppf_discovery” to the list of manually configured peers:
# enable auto-discovery of other peer servers jppf.peer.discovery.enabled = true # specifiy both discovery and manually configured peers jppf.peers = jppf_discovery server_1 # host for this driver server_1.jppf.server.host = host_1 # port for this driver server_1.jppf.server.port = 11111
JMX management configuration
JPPF uses JMX to provide remote management capabilities for the servers, and uses the JMXMP connector for communication.
The management features are enabled by default; this behavior can be changed by setting the following property:
# Enable or disable management of this server jppf.management.enabled = true
When management is enabled, the following properties must be defined:
# JMX management host IP address. If not specified (recommended), the first non-local # IP address (i.e. neither 127.0.0.1 nor localhost) on this machine will be used. # If no non-local IP is found, localhost will be used. jppf.management.host = localhost # JMX management port, used by the remote JMX connector jppf.management.port = 11198
Let's see in more details the usage of each of these properties:
- jppf.management.host: defines the host name or IP address for the remote management and monitoring of the servers and nodes. It represents the host where an RMI registry is running. When this property is not defined explicitely, JPPF will automatically fetch the first non-local IP address (meaning not the loopback address) it can find on the current host. If none is found, localhost will be used. This provides a way to use an identical configuration for all the servers on a network.
- jppf.management.port: defines the port number for connecting to the remote Mbean server. The default value for this property is 11198. If 2 nodes, 2 drivers or a driver and a node run on the same host, they must have a different value for this property.
Note: if a management port is already in use by another JPPF component or application, JPPF will automatically increment it, until it finds an available port number. This means that you can in fact leave the port numbers to their default values (or not specify them at all), as JPPF will automatically ensure that valid unique port numbers are used.
The distribution of the tasks to the nodes is performed by the JPPF driver. This work is actually the main factor of the observed performance of the framework. It consists essentially in determining how many tasks will go to each node for execution, out of a set of tasks sent by the client application. Each set of tasks sent to a node is called a "bundle", and the role of the load balancing (or task scheduling) algorithm is to optimize the performance by adjusting the number of task sent to each node.
The algorithm to use is configured with the following property:
jppf.load.balancing.algorithm = <algorithm_name>
The algorithm name can be one of those prefefined in JPPF, or a user-defined one. We will see how to define a custom alogrithm in Creating a custom load-balancer. JPPF now has 4 predefined load-balancing algorithms to compute the distribution of tasks to the nodes, each with its own configuration parameters. These algorithms are:
- “manual” : each bundle has a fixed number of tasks, meaning that each will receive at most this number of tasks
- “autotuned” : adaptive heuristic algorithm based on the Monte Carlo algorithm
- “proportional” : an adaptive deterministic algorithm based on the contribution of each node to the overall mean task execution time
- “rl” : adaptive algorithm based on an artificial intelligence technique called “reinforcement learning”
- “NodeThreads” : each bundle will have at most n * m tasks, where n is the number of threads in the node to which the bundle is sent is sent, and m is a user-defined parameter
The predefined possible values for the property jppf.load.balancing.algorithm are thus: manual, autotuned, proportional, rl and NodeThreads. If not defined the algorithm defaults to manual. For example:
jppf.load.balancing.algorithm = proportional
In addition to the pre-defined algorithms, it is possible to define your own, as described in the section Creating a custom load-balancer.
Each algorithm uses its own set of parameters, which define together a strategy for the algorithm, It is also called a performance profile or simply profile, and we will use these terms interchangeably. A strategy has a name that serves to identify a group of parameters and their values, using the following pattern:
jppf.load.balancing.profile = <profile_name> jppf.load.balancing.profile.<profile_name>.<parameter_1> = <value_1> ... jppf.load.balancing.profile.<profile_name>.<parameter_n> = <value_n>
Using this, you can define multiple profiles and easily switch from one to the other, by simple changing the value of jppf.load.balancing.profile. It is also possible to mix, in a single profile, the parameters for multiple algorithms, however it is not recommended, as there may be name collisions.
To illustrate this, we will give a sample profile for each of the predefined algorithms:
# algorithm name jppf.load.balancing.algorithm = manual # name of the set of parameter values or profile for the algorithm jppf.load.balancing.profile = manual_profile # "manual" profile jppf.load.balancing.profile.manual_profile.size = 1
# algorithm name jppf.load.balancing.algorithm = autotuned # name of the set of parameter values or profile for the algorithm jppf.load.balancing.profile = autotuned_profile # "autotuned" profile jppf.load.balancing.profile.autotuned_profile.size = 5 jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100 jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50 jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2 jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50 jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5 jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2
# algorithm name jppf.load.balancing.algorithm = proportional # name of the set of parameter values or profile for the algorithm jppf.load.balancing.profile = proportional_profile # "proportional" profile jppf.load.balancing.profile.proportional_profile.initialSize = 1 jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 1000 jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1
# algorithm name jppf.load.balancing.algorithm = rl # name of the set of parameter values or profile for the algorithm jppf.load.balancing.profile = rl_profile # "rl" profile jppf.load.balancing.profile.rl_profile.performanceCacheSize = 3000 jppf.load.balancing.profile.rl_profile.performanceVariationThreshold = 0.001
# algorithm name jppf.load.balancing.algorithm = nodethreads # name of the set of parameter values or profile for the algorithm jppf.load.balancing.profile = nodethreads_profile # means that multiplicator * nbThreads tasks will be sent to each node jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1
Server process configuration
A JPPF server is in fact made of two processes: a “controller” process and a “server” process. The controller launches the server as a separate process and watches its exit code. If the exit code has a pre-defined value of 2, then it will restart the server process, otherwise it will simply terminate. This mechanism allows the remote (eventually delayed) restart of a server using the management APIs or the management console. It is also made such that, if any of the two processes dies unexpectedly, then the other process will die as well, leaving no lingering Java process in the OS.
The server process inherits the following parameters from the controller process:
- location of jppf configuration (-Djppf.config or -Djppf.config.plugin)
- current directory
- environment variables
- Java class path
It is possible to specify additional JVM parameters for the server process, using the configuration property jppf.jvm.options, as in this example:
jppf.jvm.options = -Xms64m -Xmx512m
Here is another example with remote debugging options:
jppf.jvm.options = -Xmx512m -server \ -Xrunjdwp:transport=dt_socket,address=localhost:8000,server=y,suspend=n
It is possible to specify additional class path elements through this property, by adding one or more “-cp” or “-classpath” options (unlike the Java command which only accepts one). For example:
jppf.jvm.options = -cp lib/myJar1.jar:lib/myJar2.jar -Xmx512m \ -classpath lib/external/externalJar.jar
Configuring a local node
Each JPPF driver is able to run a single node in its own JVM, called ”local node”. The main advantage is that the communication between server and node is much faster, since the network overhead is removed. This is particularly useful if you intend to create a pure P2P topology, where all servers communicate with each other and only one node is attached to each server.
To enable a local node in the driver, use the following configuration propoerty, which defaults to “false”:
jppf.local.node.enabled = true
- the local node can be configured using the same properties as described in the Node Configuration section, except for the network-related properties, since no network is involved between driver and local node
- for the same reason, the SSL configuration does not apply to a local node
Recovery from hardware failures of remote nodes
Network disconnections due to hardware failures are notoriously difficult to detect, let alone recover from. JPPF implements a configurable mechanism that enables detecting such failures, and recover from them, in a reasonable time frame. This mechanism works as follows:
- the node establishes a specific connection to the server, dedicated to failure detection
- at connection time, a handshake protocol takes place, where the node communicates a unique id (UUID) to the server, that can be correlated to other connections for this node (i.e. job server and distributed class loader)
- at regular intervals (heartbeats), the server will send a very short message to the node, which it expects the node to acknowledge by sending a short response of its own
- if the node's response is not received in a specified time frame, and this, a specified number of times in a row, the server will consider the connection to the node broken, will close it cleanly, close the associated connections, and handle the recovery, such as requeuing tasks that were being executed by the node for execution on another node
In practice, the polling of the nodes is performed by a “reaper” object that will handle the querying of the nodes, using a pool of dedicated threads rather than one thread per node. This enables a higher scalability with a large number of nodes.
The ability to specify multiple attempts at getting a response from the node is useful to handle situations where the network is slow, or when the node or server is busy with a high CPU utilization level. On the server side, the parameters of this mechanism are configurable via the following properties:
# Enable recovery from hardware failures on the nodes. # Default value is false (disabled). jppf.recovery.enabled = false # Maximum number of attempts to get a response form the node before the # connection is considered broken. Default value is 3. jppf.recovery.max.retries = 3 # Maximum time in milliseconds allowed for each attempt to get a response # from the node. Default value is 6000 (6 seconds). jppf.recovery.read.timeout = 6000 # Dedicated port number for the detection of node failure. # Default value is 22222. jppf.recovery.server.port = 22222 # Interval in milliseconds between two runs of the connection reaper. # Default value is 60000 (1 minute). jppf.recovery.reaper.run.interval = 60000 # Number of threads allocated to the reaper. # Default value is the number of available CPUs. jppf.recovery.reaper.pool.size = 8
Important note: given the implementation of this mechanism, you must be careful to keep some consistency between the server and node settings. As a general rule of thumb, the settings should always respect the following constraint:
serverReaperInterval < nodeMaxRetries * nodeTimeout
This rule applies between any server and node, as well as between any two servers if you enabled communication between servers in your configuration.
Note: if server discovery is active for a node, then the port number specified for the driver will override the one specified in the node's configuration.
The JPPF driver uses several pools of threads to perform network I/O with the nodes, clients and other drivers in parallel. There is a single configuration property that specifies the size of each of these pools:
jppf.transition.thread.pool.size = <number_of_io_threads>
When left unspecified, this property will take a default value equal to the number of processors available to the JVM (equivalent to Runtime.getRuntime().availableProcessors()).
Redirecting the console output
In some situations, it might be desirable to redirect the standard and error output of the driver, that is, the output of System.out and System.err, to files. This can be accomplished with the following properties:
# file on the file system where System.out is redirected jppf.redirect.out = /some/path/someFile.out.log # whether to append to an existing file or to create a new one jppf.redirect.out.append = false # file on the file system where System.err is redirected jppf.redirect.err = /some/path/someFile.err.log # whether to append to an existing file or to create a new one jppf.redirect.err.append = false
By default, a new file is created each time the driver is started, unless “jppf.redirect.out.append = true” or “jppf.redirect.err.append = true” are specified. If a file path is not specified, then the corresponding output is not redirected.
|Main Page > Configuration guide > Configuring a JPPF server|