Topic: Orphan node fail over

Cobar7960

Orphan node fail over
« on: July 23, 2018, 10:02:58 PM »

Current topology: two drivers, with one driver/node pair on each of 2 machines.

How do you handle fail over when the drivers are up but a node is down?

Client properties:

Code:
#------------------------------------------------------------------------------#
# JPPF.                                                                        #
# Copyright (C) 2005-2016 JPPF Team.                                           #
# http://www.jppf.org                                                          #
#                                                                              #
# Licensed under the Apache License, Version 2.0 (the "License");              #
# you may not use this file except in compliance with the License.             #
# You may obtain a copy of the License at                                      #
#                                                                              #
#    http://www.apache.org/licenses/LICENSE-2.0                                #
#                                                                              #
# Unless required by applicable law or agreed to in writing, software          #
# distributed under the License is distributed on an "AS IS" BASIS,            #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.     #
# See the License for the specific language governing permissions and          #
# limitations under the License.                                               #
#------------------------------------------------------------------------------#


#------------------------------------------------------------------------------#
# Space-separated list of named drivers this client may connect to.            #
# If auto discovery of the server is enabled, this need not be specified.      #
#------------------------------------------------------------------------------#

#jppf.drivers = driver1
jppf.drivers = primary-pool secondary-pool

primary-pool.jppf.server.host=IP-1
primary-pool.jppf.server.port=11111
primary-pool.jppf.priority=20

secondary-pool.jppf.server.host=IP-2
secondary-pool.jppf.server.port=11111
secondary-pool.jppf.priority=10

#------------------------------------------------------------------------------#
# Manual configuration of the connection to a JPPF driver                      #
# These settings only apply if jppf.discovery.enabled = false                  #
#------------------------------------------------------------------------------#

# Host name, or ip address, of the host the JPPF driver is running on
# If auto discovery of the server is enabled, this need not be specified.
#driver1.jppf.server.host = localhost
driver1.jppf.server.host = IP-1


# port number the server is listening to for connections.
# Defaults to 11111 if SSL is disabled, 11443 if SSL is enabled
driver1.jppf.server.port = 11111

# Whether SSL connectivity is requested for this driver. Defaults to false
#driver1.jppf.ssl.enabled = false

# Priority given to the driver connection.
# When this is used, the client is always connected to the available driver connection(s)
# with the highest priority. If multiple drivers have the same priority, they will be
# used as a pool and jobs will be evenly distributed among them. The default value is 0
#driver1.jppf.priority = 10

# Size of the connection pool associated with a driver definition. The JPPF client
# will create the specified number of connections to the driver, allowing to send
# multiple jobs concurrently to the same driver, or each job over  multiple connections
# in parallel, or any combination of the two, depending on the load balancing settings.
# The default value is 1
#driver1.jppf.pool.size = 10

# Size of the associated pool of JMX connections. Defaults to 1.
# Each JMX connection uses resources (threads and socket connection) on both the
# server and client. It is thus recommended to have this value as low as possible.
#driver1.jppf.jmx.pool.size = 10

#------------------------------------------------------------------------------#
# Manual configuration of the connection to a second JPPF driver               #
# These settings only apply if jppf.discovery.enabled = false                  #
#------------------------------------------------------------------------------#

#driver2.jppf.server.host = localhost
#driver2.jppf.server.port = 11443
#driver2.jppf.ssl.enabled = true
#driver2.jppf.priority = 5
#driver2.jppf.pool.size = 2
#driver2.jppf.jmx.pool.size = 2

#------------------------------------------------------------------------------#
# Management configuration                                                     #
#------------------------------------------------------------------------------#

# Enable or disable management features in the client. Defaults to true (enabled)
#jppf.management.enabled = false

#------------------------------------------------------------------------------#
# SSL Settings                                                                 #
#------------------------------------------------------------------------------#

# enable SSL for auto-discovered drivers. Default is false (disabled).
# if enabled, only SSL connections are established
#jppf.ssl.enabled = true

# location of the SSL configuration on the file system or in the classpath
jppf.ssl.configuration.file = config/ssl/ssl.properties

# SSL configuration as an arbitrary source. Value is the fully qualified name
# of an implementation of Callable<InputStream> with optional arguments
#jppf.ssl.configuration.source = org.jppf.ssl.FileStoreSource config/ssl/ssl.properties

#------------------------------------------------------------------------------#
# Automatic recovery of driver connections. These parameters determine how the #
# JPPF client reacts when a connection to a driver is lost                     #
#------------------------------------------------------------------------------#

# number of seconds before the first reconnection attempt. Defaults to 0
#jppf.reconnect.initial.delay = 0

# time in seconds after which the system stops trying to reconnect
# A value of zero or less means the system never stops trying. Defaults to 60
jppf.reconnect.max.time = -1

# time between two connection attempts, in seconds. Defaults to 1
#jppf.reconnect.interval = 1

# whether to resolve the drivers' ip addresses into host names. Defaults to true (resolve the addresses)
jppf.resolve.addresses = true

#------------------------------------------------------------------------------#
# Local executor settings. The local executor, when enabled, processes jobs in #
# the same JVM as the JPPF client, using the exact same APIs as for remote     #
# connections.                                                                 #
#------------------------------------------------------------------------------#

# Enable local execution of jobs? Default value is false (disabled)
#jppf.local.execution.enabled = true

# Number of threads to use for local execution. Defaults to the number of CPUs available to the JVM
#jppf.local.execution.threads = 4

# priority assigned to the local executor; defaults to 0
# this is equivalent to "<driver_name>.jppf.priority" in manual network configuration
#jppf.local.execution.priority = 10

# Enable remote execution of jobs? Default value is true (enabled)
# when disabled, jobs will not be submitted to any remote server
#jppf.remote.execution.enabled = true

#------------------------------------------------------------------------------#
# Configuration of automatic discovery of JPPF drivers.                        #
#------------------------------------------------------------------------------#

# Enable or disable discovery of JPPF drivers. Defaults to true (enabled)
jppf.discovery.enabled = false

# UDP multicast group to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Defaults to 230.0.0.1
#jppf.discovery.group = 230.0.0.1

# UDP multicast port to which drivers broadcast their connection parameters. Defaults to 11111
#jppf.discovery.port = 11111

# Size of the connection pool for each discovered driver. Default value is 1
#jppf.pool.size = 1

# priority assigned to all auto-discovered connections; defaults to 0
# this is equivalent to "<driver_name>.jppf.priority" in manual network configuration
#jppf.discovery.priority = 10

# IPv4 address patterns included in the server discovery mechanism
# Drivers whose IPv4 address matches the pattern will be included
# in the list of discovered drivers.
#jppf.discovery.include.ipv4 = 192.168.1.; 192.168.1.0/24

# IPv4 address patterns excluded from the server discovery mechanism
# Drivers whose IPv4 address matches the pattern will be excluded
# from the list of discovered drivers.
#jppf.discovery.exclude.ipv4 = 192.168.1.128-; 192.168.1.0/25

# IPv6 address patterns included in the server discovery mechanism
#jppf.discovery.include.ipv6 = 1080:0:0:0:8:800:200C-20FF:-; ::1/80

# IPv6 address patterns excluded from the server discovery mechanism
#jppf.discovery.exclude.ipv6 = 1080:0:0:0:8:800:200C-20FF:0C00-0EFF; ::1/96

#------------------------------------------------------------------------------#
# Specify alternate serialization schemes.                                     #
# Defaults to org.jppf.serialization.DefaultJavaSerialization.                 #
#------------------------------------------------------------------------------#

# The default: standard Java serialization
#jppf.object.serialization.class = org.jppf.serialization.DefaultJavaSerialization

# built-in JPPF serialization, enables serialization of objects whose class does not implement java.io.Serializable
#jppf.object.serialization.class = org.jppf.serialization.DefaultJPPFSerialization

# XStream serialization
#jppf.object.serialization.class = org.jppf.serialization.XstreamSerialization

# Kryo serialization, defined in the "Kryo Serialization" sample
#jppf.object.serialization.class = org.jppf.serialization.kryo.KryoSerialization

#------------------------------------------------------------------------------#
# Specify a data transformation class.                                         #
# If left unspecified, no transformation is used.                              #
#------------------------------------------------------------------------------#

# Defined in the "Network Data Encryption" sample
#jppf.data.transform.class = org.jppf.example.dataencryption.SecureKeyCipherTransform

#------------------------------------------------------------------------------#
# Load-balancing configuration. The load-balancing determines how the tasks in #
# the jobs are distributed over the available driver connections, including    #
# the connection to the local executor (if  enabled).                          #
# If no load-balancing is configured, the JPPF client will default to the      #
# "manual" algorithm with a fixed size of 1,000,000                            #
#------------------------------------------------------------------------------#

# Name of the load-balancing algorithm to use. Pre-defined possible values are:
# manual | autotuned | proportional | rl | nodethreads
# It can also be the name of a user-defined algorithm. Defaults to "manual"
jppf.load.balancing.algorithm = manual

# name of the set of parameter values (aka profile) to use with the algorithm
jppf.load.balancing.profile = manual_profile

# "manual" profile
jppf.load.balancing.profile.manual_profile.size = 1000000

# "autotuned" profile
jppf.load.balancing.profile.autotuned_profile.size = 5
jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100
jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50
jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2
jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50
jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5
jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2

# "proportional" profile
jppf.load.balancing.profile.proportional_profile.size = 5
jppf.load.balancing.profile.proportional_profile.initialMeanTime = 1e10
jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 300
jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1

# "rl" profile
jppf.load.balancing.profile.rl_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl_profile.performanceVariationThreshold = 0.0001
jppf.load.balancing.profile.rl_profile.maxActionRange = 10

# "nodethreads" profile
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1

# "rl2" profile
jppf.load.balancing.profile.rl2_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl2_profile.performanceVariationThreshold = 0.75
jppf.load.balancing.profile.rl2_profile.minSamples = 20
jppf.load.balancing.profile.rl2_profile.maxSamples = 100
jppf.load.balancing.profile.rl2_profile.maxRelativeSize = 0.5

#------------------------------------------------------------------------------#
# JPPF grid topology monitoring: configuration of the refresh intervals.       #
# Change the values of these properties if the monitoring API has trouble      #
# keeping up with all the information received from the nodes and servers.     #
# This may happen when the number of nodes and servers becomes large and the   #
# TopologyManager cannot cope. Increasing the refresh intervals (or decreasing #
# the frequency of the updates) resolves such situations.                      #
#------------------------------------------------------------------------------#

# refresh interval in ms for the topology panels: tree view and graph views; defaults to 1000
# this is the interval between 2 successive runs of the task that refreshes the topology via JMX requests
jppf.admin.refresh.interval.topology = 1000

# refresh interval for the JVM health panel in ms; defaults to 3000
# this is the interval between 2 successive runs of the task that refreshes the JVM health via JMX requests
jppf.admin.refresh.interval.health = 3000

#------------------------------------------------------------------------------#
# Global performance tuning parameters. These affect the performance and       #
# throughput of I/O operations in JPPF. The values provided in the vanilla     #
# JPPF distribution are known to offer a good performance in most situations   #
# and environments.                                                            #
#------------------------------------------------------------------------------#

# Size of send and receive buffer for socket connections.
# Defaults to 32768 and must be in range [1024, 1024*1024]
# 128 * 1024 = 131072
jppf.socket.buffer.size = 131072
# Size of temporary buffers (including direct buffers) used in I/O transfers.
# Defaults to 32768 and must be in range [1024, 1024*1024]
jppf.temp.buffer.size = 12288
# Maximum size of temporary buffers pool (excluding direct buffers). When this size
# is reached, new buffers are still created, but not released into the pool, so they
# can be quickly garbage-collected. The size of each buffer is defined with ${jppf.temp.buffer.size}
# Defaults to 10 and must be in range [1, 2048]
jppf.temp.buffer.pool.size = 200
# Size of temporary buffer pool for reading lengths as ints (size of each buffer is 4).
# Defaults to 100 and must be in range [1, 2048]
jppf.length.buffer.pool.size = 100

#------------------------------------------------------------------------------#
# Enabling or disabling the lookup of classpath resources in the file system   #
# Defaults to true (enabled)                                                   #
#------------------------------------------------------------------------------#

#jppf.classloader.file.lookup = true

#------------------------------------------------------------------------------#
# Timeout in millis for JMX requests. Defaults to Long.MAX_VALUE (2^63 - 1)    #
#------------------------------------------------------------------------------#

#jppf.jmx.request.timeout = $script{ java.lang.Long.MAX_VALUE }$

#------------------------------------------------------------------------------#
# path to the Java executable. When defined, it is used by the launch script   #
# (run.bat or run.sh) instead of the default Java path.                        #
# It is undefined by default, meaning that the script will use the "java"      #
# command, relying on Java being in the system PATH.                           #
#------------------------------------------------------------------------------#

# linux/unix example
#jppf.java.path = /opt/java/jdk1.8.0_x64/bin/java
# windows example
#jppf.java.path = C:/java/jdk1.8.0_x64/bin/java.exe
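
A note on the priority comments in the client configuration above: as they read, the client only uses the available driver connection(s) with the highest priority, and drivers of equal priority are pooled. The following is a minimal plain-Java sketch of that rule, not JPPF code. DriverDef, activeDrivers and the "reachable" flag are hypothetical names made up for illustration, and the sketch assumes that "available" refers to the driver connection itself, regardless of how many nodes the driver has.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the priority rule from the config comments; not JPPF API.
// DriverDef and activeDrivers are hypothetical names used for illustration only.
final class PrioritySketch {

  static final class DriverDef {
    final String name;
    final int priority;
    final boolean reachable; // assumption: "available" means the driver connection is up

    DriverDef(String name, int priority, boolean reachable) {
      this.name = name;
      this.priority = priority;
      this.reachable = reachable;
    }
  }

  // Returns the names of the reachable drivers that share the highest priority.
  static List<String> activeDrivers(List<DriverDef> defs) {
    List<String> result = new ArrayList<>();
    int best = Integer.MIN_VALUE;
    for (DriverDef d : defs) {
      if (!d.reachable) continue;
      if (d.priority > best) {
        // a higher priority was found: drop the lower-priority names collected so far
        best = d.priority;
        result.clear();
      }
      if (d.priority == best) result.add(d.name);
    }
    return result;
  }

  public static void main(String[] args) {
    List<DriverDef> bothUp = List.of(
        new DriverDef("primary-pool", 20, true),
        new DriverDef("secondary-pool", 10, true));
    List<DriverDef> primaryDown = List.of(
        new DriverDef("primary-pool", 20, false),
        new DriverDef("secondary-pool", 10, true));
    System.out.println(activeDrivers(bothUp));      // [primary-pool]
    System.out.println(activeDrivers(primaryDown)); // [secondary-pool]
  }
}
```

Under that assumption, with priorities 20 and 10 the secondary driver is only used once the primary driver connection is lost. A node going down behind a driver that is still up would not change the outcome.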



Driver:

Code:
#------------------------------------------------------------------------------#
# JPPF                                                                         #
# Copyright (C) 2005-2015 JPPF Team.                                           #
# http://www.jppf.org                                                          #
#                                                                              #
# Licensed under the Apache License, Version 2.0 (the "License");              #
# you may not use this file except in compliance with the License.             #
# You may obtain a copy of the License at                                      #
#                                                                              #
#    http://www.apache.org/licenses/LICENSE-2.0                                #
#                                                                              #
# Unless required by applicable law or agreed to in writing, software          #
# distributed under the License is distributed on an "AS IS" BASIS,            #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.     #
# See the License for the specific language governing permissions and          #
# limitations under the License.                                               #
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
# port number to which the server listens for plain connections                #
# default value is 11111; uncomment to specify a different value               #
# to disable, specify a negative port number                                   #
#------------------------------------------------------------------------------#

jppf.server.port = 11111

#------------------------------------------------------------------------------#
# port number to which the server listens for secure connections               #
# default value is 11443; uncomment to specify a different value               #
# to disable, specify a negative port number                                   #
#------------------------------------------------------------------------------#

#jppf.ssl.server.port = 11443
jppf.ssl.server.port = -1

#------------------------------------------------------------------------------#
#                          SSL Settings                                        #
#------------------------------------------------------------------------------#

# location of the SSL configuration on the file system
#jppf.ssl.configuration.file = config/ssl/ssl-server.properties

# SSL configuration as an arbitrary source. Value is the fully qualified name
# of an implementation of java.util.concurrent.Callable<InputStream>
# with optional space-separated arguments
#jppf.ssl.configuration.source = org.jppf.ssl.FileStoreSource config/ssl/ssl-server.properties

# enable secure communications with other servers; defaults to false (disabled)
jppf.peer.ssl.enabled = false

#------------------------------------------------------------------------------#
# Enabling and configuring JMX features                                        #
#------------------------------------------------------------------------------#

# non-secure JMX connections; default is true (enabled)
#jppf.management.enabled = true

# secure JMX connections via SSL/TLS; default is false (disabled)
#jppf.management.ssl.enabled = true

# JMX management host IP address. If not specified (recommended), the first non-local
# IP address (i.e. neither 127.0.0.1 nor localhost) on this machine will be used.
# If no non-local IP is found, localhost will be used
#jppf.management.host = localhost

# JMX management port. Defaults to 11198. If the port is already bound, the driver
# will scan for the first available port instead.
#jppf.management.port = 11187

#------------------------------------------------------------------------------#
# Configuration of the driver discovery broadcast service                      #
#------------------------------------------------------------------------------#

# Enable/disable automatic discovery of this JPPF driver; defaults to true
#jppf.discovery.enabled = true

# UDP multicast group to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Default value is 230.0.0.1
#jppf.discovery.group = 230.0.0.1

# UDP multicast port to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Default value is 11111
#jppf.discovery.port = 11111

# Time between 2 broadcasts, in milliseconds. Default value is 1000
#jppf.discovery.broadcast.interval = 1000

# IPv4 inclusion patterns: broadcast these ipv4 addresses
#jppf.discovery.broadcast.include.ipv4 = 192.168.1.; 192.168.1.0/24

# IPv4 exclusion patterns: do not broadcast these ipv4 addresses
#jppf.discovery.exclude.ipv4 = 192.168.1.128-; 192.168.1.0/25

# IPv6 inclusion patterns: broadcast these ipv6 addresses
#jppf.discovery.include.ipv6 = 1080:0:0:0:8:800:200C-20FF:-; ::1/80

# IPv6 exclusion patterns: do not broadcast these ipv6 addresses
#jppf.discovery.exclude.ipv6 = 1080:0:0:0:8:800:200C-20FF:0C00-0EFF; ::1/64

#------------------------------------------------------------------------------#
# Connection with other servers, enabling P2P communication                    #
#------------------------------------------------------------------------------#

# Enable/disable auto-discovery of remote peer drivers. Default value is false
jppf.peer.discovery.enabled = true

# manual configuration of peer servers, as a space-separated list of peers names to connect to
jppf.peers = server_1 server_2

# enable both automatic and manual discovery
#jppf.peers = jppf_discovery server_1 server_2
 
# connection to server_1
jppf.peer.server_1.server.host = jppf-server-1
jppf.peer.server_1.server.port = 11111
# connection to server_2
jppf.peer.server_2.server.host = jppf-server-2
jppf.peer.server_2.server.port = 11111

# Default is false
jppf.peer.allow.orphans = true

#------------------------------------------------------------------------------#
# Load-balancing configuration                                                 #
#------------------------------------------------------------------------------#

# name of the load-balancing algorithm to use; pre-defined possible values are:
# manual | autotuned | proportional | rl | nodethreads
# it can also be the name of a user-defined algorithm. Default value is "manual"
jppf.load.balancing.algorithm = nodethreads

# name of the set of parameter values (aka profile) to use for the algorithm
jppf.load.balancing.profile = nodethreads_profile

# "manual" profile
jppf.load.balancing.profile.manual_profile.size = 1

# "autotuned" profile
jppf.load.balancing.profile.autotuned_profile.size = 5
jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100
jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50
jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2
jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50
jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5
jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2

# "proportional" profile
jppf.load.balancing.profile.proportional_profile.size = 5
jppf.load.balancing.profile.proportional_profile.initialMeanTime = 1e10
jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 300
jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1

# "rl" profile
jppf.load.balancing.profile.rl_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl_profile.performanceVariationThreshold = 0.0001
jppf.load.balancing.profile.rl_profile.maxActionRange = 10

# "nodethreads" profile
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1.5

#------------------------------------------------------------------------------#
# Other JVM options added to the java command line when the driver is started  #
# as a subprocess. Multiple options are separated by spaces.                   #
#------------------------------------------------------------------------------#

jppf.jvm.options = -Xmx4096m -Djava.util.logging.config.file=config/logging-driver.properties

# example with remote debugging options
#jppf.jvm.options = -server -Xmx256m -Xrunjdwp:transport=dt_socket,address=localhost:8000,server=y,suspend=n

#------------------------------------------------------------------------------#
# Specify alternate serialization schemes.                                     #
# Defaults to org.jppf.serialization.DefaultJavaSerialization.                 #
#------------------------------------------------------------------------------#

# default
#jppf.object.serialization.class = org.jppf.serialization.DefaultJavaSerialization

# built-in object serialization schemes
#jppf.object.serialization.class = org.jppf.serialization.DefaultJPPFSerialization
#jppf.object.serialization.class = org.jppf.serialization.XstreamSerialization

# defined in the "Kryo Serialization" sample
#jppf.object.serialization.class = org.jppf.serialization.kryo.KryoSerialization

#------------------------------------------------------------------------------#
# Specify a data transformation class. If unspecified, no transformation occurs#
#------------------------------------------------------------------------------#

# Defined in the "Network Data Encryption" sample
#jppf.data.transform.class = org.jppf.example.dataencryption.SecureKeyCipherTransform

#------------------------------------------------------------------------------#
# whether to resolve the nodes' ip addresses into host names                   #
# defaults to true (resolve the addresses)                                     #
#------------------------------------------------------------------------------#

org.jppf.resolve.addresses = true

#------------------------------------------------------------------------------#
# Local (in-JVM) node. When enabled, any node-specific properties will apply   #
#------------------------------------------------------------------------------#

# Enable/disable the local node. Default is false (disabled)
#jppf.local.node.enabled = false
# example node-specific setting
jppf.processing.threads = 4

#------------------------------------------------------------------------------#
# Idle mode configuration. In this mode the server or node starts when no      #
# mouse or keyboard activity has occurred since the specified timeout, and is  #
# stopped when any new activity occurs.                                        #
#------------------------------------------------------------------------------#

# Idle mode enabled/disabled. Default is false (disabled)
#jppf.idle.mode.enabled = false

# Fully qualified class name of the factory object that instantiates a platform-specific idle state detector
#jppf.idle.detector.factory = org.jppf.example.idlesystem.IdleTimeDetectorFactoryImpl

# Time of keyboard and mouse inactivity to consider the system idle, in milliseconds
# Default value is 300000 (5 minutes)
#jppf.idle.timeout = 6000

# Interval between 2 successive calls to the native APIs to determine idle state changes
# Default value is 1000
#jppf.idle.poll.interval = 1000

#------------------------------------------------------------------------------#
# Automatic recovery from hard failure of the nodes connections. These         #
# parameters configure how the driver reacts when a node fails to respond to   #
# its heartbeat messages.                                                      #
#------------------------------------------------------------------------------#

# Enable recovery from failures on the nodes. Defaults to false (disabled)
#jppf.recovery.enabled = false

# Max number of attempts to get a response from the node before the connection
# is considered broken. Default value is 3
#jppf.recovery.max.retries = 3

# Max time in milliseconds allowed for each attempt to get a response from the node.
# Default value is 6000 (6 seconds)
#jppf.recovery.read.timeout = 6000

# Dedicated port number for the detection of node failure. Defaults to 22222.
# If server discovery is enabled on the nodes, this value will override the port number specified in the nodes
#jppf.recovery.server.port = 22222

# Interval in milliseconds between two runs of the connection reaper
# Default value is 60000 (1 minute)
#jppf.recovery.reaper.run.interval = 60000

# Number of threads allocated to the reaper. Defaults to the number of available CPUs
#jppf.recovery.reaper.pool.size = 8

#------------------------------------------------------------------------------#
# Redirecting System.out and System.err to files.                              #
#------------------------------------------------------------------------------#

# file path on the file system where System.out is redirected.
# if unspecified or invalid, then no redirection occurs
#jppf.redirect.out = System.out.log
# whether to append to an existing file or to create a new one
jppf.redirect.out.append = false

# file path on the file system where System.err is redirected
# if unspecified or invalid, then no redirection occurs
#jppf.redirect.err = System.err.log
# whether to append to an existing file or to create a new one
jppf.redirect.err.append = false

#------------------------------------------------------------------------------#
# Global performance tuning parameters. These affect the performance and       #
# throughput of I/O operations in JPPF. The values provided in the vanilla     #
# JPPF distribution are known to offer a good performance in most situations   #
# and environments.                                                            #
#------------------------------------------------------------------------------#

# Size of send and receive buffer for socket connections.
# Defaults to 32768 and must be in range [1024, 1024*1024]
# 128 * 1024 = 131072
jppf.socket.buffer.size = 131072
# Size of temporary buffers (including direct buffers) used in I/O transfers.
# Defaults to 32768 and must be in range [1024, 1024*1024]
jppf.temp.buffer.size = 12288
# Maximum size of temporary buffers pool (excluding direct buffers). When this size
# is reached, new buffers are still created, but not released into the pool, so they
# can be quickly garbage-collected. The size of each buffer is defined with ${jppf.temp.buffer.size}
# Defaults to 10 and must be in range [1, 2048]
jppf.temp.buffer.pool.size = 200
# Size of temporary buffer pool for reading lengths as ints (size of each buffer is 4).
# Defaults to 100 and must be in range [1, 2048]
jppf.length.buffer.pool.size = 100

#------------------------------------------------------------------------------#
# Enabling or disabling the lookup of classpath resources in the file system   #
# Defaults to true (enabled)                                                   #
#------------------------------------------------------------------------------#

#jppf.classloader.file.lookup = true

#------------------------------------------------------------------------------#
# Timeout in millis for JMX requests. Defaults to Long.MAX_VALUE (2^63 - 1)    #
#------------------------------------------------------------------------------#

#jppf.jmx.request.timeout = $script{ java.lang.Long.MAX_VALUE }$
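
The comments in the tuning section above state a valid range for each buffer property. As a rough sanity check, the sketch below loads a properties file and reports values outside those documented ranges. It is plain Java and does not reflect how JPPF itself validates or clamps these values; the class name `BufferConfigCheck` and the method `outOfRange` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: checks buffer settings against the ranges quoted in the config comments.
class BufferConfigCheck {
  static final String[] NAMES = {
    "jppf.socket.buffer.size", "jppf.temp.buffer.size",
    "jppf.temp.buffer.pool.size", "jppf.length.buffer.pool.size"
  };
  static final int[] MIN = {1024, 1024, 1, 1};
  static final int[] MAX = {1024 * 1024, 1024 * 1024, 2048, 2048};

  // Returns the names of properties that are set but not an integer within the documented range.
  static List<String> outOfRange(Properties props) {
    List<String> problems = new ArrayList<>();
    for (int i = 0; i < NAMES.length; i++) {
      String raw = props.getProperty(NAMES[i]);
      if (raw == null) continue; // unset: the documented default applies
      try {
        int value = Integer.parseInt(raw.trim());
        if (value < MIN[i] || value > MAX[i]) problems.add(NAMES[i]);
      } catch (NumberFormatException e) {
        problems.add(NAMES[i]);
      }
    }
    return problems;
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    // Values taken from the configuration posted above.
    props.load(new StringReader(
        "jppf.socket.buffer.size = 131072\n"
      + "jppf.temp.buffer.size = 12288\n"
      + "jppf.temp.buffer.pool.size = 200\n"
      + "jppf.length.buffer.pool.size = 100\n"));
    System.out.println("out of range: " + outOfRange(props));
  }
}
```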


lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2272
    • JPPF Web site
Re: Orphan node fail over
« Reply #1 on: July 24, 2018, 02:20:50 AM »

Hello,

My understanding of the problem you describe is that you have 2 drivers D1 and D2, and two nodes N1 and N2, with D1 / N1 on one machine, and D2 / N2 on another machine, and one of the nodes fails. Is this correct?

If yes, then you have a situation where a job will be stuck when N2 is down and D1 dispatches (part of) the job to D2. To avoid this issue, you can do any of the following:

- set jppf.peer.allow.orphans to false instead of true

- you can also prevent each job from being dispatched to a peer driver that doesn't have any node, by setting an appropriate execution policy on its SLA:
Code: [Select]
JPPFJob job = new JPPFJob();
// dispatch only to channels that are not peer drivers, or to peer drivers that have at least one node
ExecutionPolicy policy = new Equal("jppf.peer.driver", false).or(new AtLeast("jppf.peer.total.nodes", 1));
job.getSLA().setExecutionPolicy(policy);

The property jppf.peer.total.nodes is set internally by the driver when it is notified by a peer driver of a change in its number of nodes.

- you can also set the job to expire after a specified elapsed time or after a specified date:
Code: [Select]
JPPFJob job = new JPPFJob();
// set the job to expire after 15 seconds if it hasn't completed
job.getSLA().setJobExpirationSchedule(new JPPFSchedule(15000L));
While this doesn't prevent the job from being sent to an orphaned driver, it ensures the job won't remain stuck.
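
To make the logic of the execution policy in the second option explicit, here is a plain-Java sketch of the same boolean test. It does not use the JPPF API: the class `OrphanPolicySketch` and the method `eligible` are hypothetical, and the handling of missing properties (treated as "false" and "0") is an assumption that may differ from how JPPF evaluates policies.

```java
import java.util.Map;

// Plain-Java model of the execution policy; it does not use the JPPF API.
class OrphanPolicySketch {
  /**
   * Mirrors Equal("jppf.peer.driver", false).or(AtLeast("jppf.peer.total.nodes", 1)).
   * Assumption: a missing property is treated as "false" or "0" here; real JPPF
   * policies may evaluate missing properties differently.
   */
  static boolean eligible(Map<String, String> props) {
    boolean peerDriver = Boolean.parseBoolean(props.getOrDefault("jppf.peer.driver", "false"));
    int totalNodes = Integer.parseInt(props.getOrDefault("jppf.peer.total.nodes", "0"));
    return !peerDriver || totalNodes >= 1;
  }

  public static void main(String[] args) {
    System.out.println("plain node:     "
        + eligible(Map.of("jppf.peer.driver", "false")));
    System.out.println("orphan peer:    "
        + eligible(Map.of("jppf.peer.driver", "true", "jppf.peer.total.nodes", "0")));
    System.out.println("peer with node: "
        + eligible(Map.of("jppf.peer.driver", "true", "jppf.peer.total.nodes", "1")));
  }
}
```

Only the orphan peer (a peer driver reporting zero nodes) is rejected; regular nodes and peers with at least one node remain eligible.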

I hope this helps,
-Laurent

Cobar7960

Re: Orphan node fail over
« Reply #2 on: July 24, 2018, 03:42:03 PM »

Thanks Laurent!

You are correct. I killed N1. The client connects to D1 with priority 20 (D2 has priority of 10). D1 queues the job and hangs because it has no nodes attached. When I kill D1 the client queues the job to D2.

My requirement was to have fail over to D2/N2 should D1/N1 fail and vice versa. They don't want to load balance between D1 and D2.

If I change the client to use the Execution Policy, do I need the P2P configuration? Is the peer configuration primarily used for load balancing between drivers?

The Execution Policy didn't work with or without:
Code: [Select]
#jppf.peer.discovery.enabled = true
#jppf.peers = server_2
jppf.recovery.enabled = true

TemplateApplicationRunner
Code: [Select]
  public void executeBlockingJob(final JPPFClient jppfClient) throws Exception {
    // Create a job
    JPPFJob job = createJob("Template blocking job");

    // set the job in blocking mode.
    job.setBlocking(true);

    System.out.println("Setting Execution Policy");
    ExecutionPolicy policy = new Equal("jppf.peer.driver", false).or(new AtLeast("jppf.peer.total.nodes", 1));
    job.getSLA().setExecutionPolicy(policy);

    // Submit the job and wait until the results are returned.
    // The results are returned as a list of Task<?> instances,
    // in the same order as the one in which the tasks were initially added to the job.
    List<Task<?>> results = jppfClient.submitJob(job);

    // process the results
    processExecutionResults(job.getName(), results);
  }

Of interest, I removed the driver log in an attempt to attach it to this topic and restarted the driver. When the client connected it threw
java.lang.NullPointerException
   at org.jppf.server.protocol.ServerTaskBundleClient.<init>(ServerTaskBundleClient.java:121)
The client then connected to D2.

I have also attempted to connect to D1 and N1 via JMX.
TemplateApplicationRunner
Code: [Select]

      String host = "localhost";
      int port = 11198;
      System.out.println("Checking Driver at " + host + ":" + port);
      JMXDriverConnectionWrapper driverWrapper = new JMXDriverConnectionWrapper(host, port);
//      new JMXDriverConnectionWrapper();
      driverWrapper.connectAndWait(20000);
      System.out.println("DRIVER NODES:: " + driverWrapper.nbNodes());


      System.out.println("Checking NODE at " + host + ":" + port);
      JMXNodeConnectionWrapper nodeWrapper = new JMXNodeConnectionWrapper(host, port);
      nodeWrapper.connectAndWait(10000);
      System.out.println("NODE STATE:: " + nodeWrapper.state() + "\nNode Connected? " + nodeWrapper.isConnected());

      host = "D2-IP";
      System.out.println("Checking Driver at " + host + ":" + port);
      driverWrapper = new JMXDriverConnectionWrapper(host, port);
      //new JMXDriverConnectionWrapper("localhost", 11198);
      driverWrapper.connectAndWait(10000);
      System.out.println("DRIVER NODES:: " + driverWrapper.nbNodes());

Template Console:
Code: [Select]
     [java] client process id: 24557, uuid: 679F8AC5-DC1D-C748-E373-7770CC57544B
     [java] Checking Driver at localhost:11198
     [java] [client: primary-pool-1 - ClassServer] Attempting connection to the class server at diamd-eadget-t5810-23.labs.isgs.lmco.com:11111
     [java] [client: secondary-pool-1 - ClassServer] Attempting connection to the class server at ds-jppf-02.labs.isgs.lmco.com:11111
     [java] [client: primary-pool-1 - ClassServer] Reconnected to the class server
     [java] [client: secondary-pool-1 - ClassServer] Reconnected to the class server
     [java] [client: primary-pool-1 - TasksServer] Attempting connection to the task server at diamd-eadget-t5810-23.labs.isgs.lmco.com:11111
     [java] [client: secondary-pool-1 - TasksServer] Attempting connection to the task server at ds-jppf-02.labs.isgs.lmco.com:11111
     [java] [client: secondary-pool-1 - TasksServer] Reconnected to the JPPF task server
     [java] [client: primary-pool-1 - TasksServer] Reconnected to the JPPF task server
     [java] DRIVER NODES:: null
     [java] Checking NODE at localhost:11198
     [java] NODE STATE:: null
     [java] Node Connected? false
     [java] Checking Driver at D2-IP:11198
     [java] DRIVER NODES:: null


Driver log:
Code: [Select]
2018-07-24 14:02:01,073 [DEBUG][org.jppf.utils.NetworkUtils.getManagementHost(174)]: JMX host from NetworkUtils: 192.168.122.1
2018-07-24 14:02:01,073 [DEBUG][org.jppf.utils.NetworkUtils.getManagementHost(177)]: computed JMX host: localhost
2018-07-24 14:02:01,077 [DEBUG][org.jppf.management.JMXServerFactory.createServer(51)]: created JMX server: org.jppf.management.JMXMPServer@ba4d54
2018-07-24 14:02:01,077 [DEBUG][org.jppf.management.JMXMPServer.start(74)]: starting remote connector server
2018-07-24 14:02:01,077 [DEBUG][org.jppf.utils.NetworkUtils.getIPAddresses(132)]: found network interface: name:virbr0 (virbr0)
2018-07-24 14:02:01,078 [DEBUG][org.jppf.utils.NetworkUtils.getIPAddresses(132)]: found network interface: name:enp0s25 (enp0s25)
2018-07-24 14:02:01,078 [DEBUG][org.jppf.utils.NetworkUtils.getIPAddresses(132)]: found network interface: name:lo (lo)
2018-07-24 14:02:01,078 [DEBUG][org.jppf.utils.NetworkUtils.getIPAddresses(132)]: found network interface: name:virbr0 (virbr0)
2018-07-24 14:02:01,078 [DEBUG][org.jppf.utils.NetworkUtils.getIPAddresses(132)]: found network interface: name:enp0s25 (enp0s25)
2018-07-24 14:02:01,079 [DEBUG][org.jppf.utils.NetworkUtils.getIPAddresses(132)]: found network interface: name:lo (lo)
2018-07-24 14:02:01,079 [DEBUG][org.jppf.utils.NetworkUtils.getManagementHost(174)]: JMX host from NetworkUtils: 192.168.122.1
2018-07-24 14:02:01,079 [DEBUG][org.jppf.utils.NetworkUtils.getManagementHost(177)]: computed JMX host: localhost
2018-07-24 14:02:01,079 [DEBUG][org.jppf.management.JMXMPServer.start(83)]: managementPort=11198, portProperties=[jppf.management.port]
2018-07-24 14:02:01,082 [DEBUG][org.jppf.serialization.JPPFSerialization$Factory.init(80)]: found jppf.object.serialization.class = null
2018-07-24 14:02:01,082 [DEBUG][org.jppf.serialization.JPPFSerialization$Factory.init(93)]: using DefaultJavaSerialization
2018-07-24 14:02:01,096 [DEBUG][org.jppf.management.JMXMPServer.start(113)]: JMXConnectorServer started at URL service:jmx:jmxmp://127.0.0.1:11198

I can't connect with JConsole either, and the Admin console shows no topology.


« Last Edit: July 24, 2018, 10:51:22 PM by Cobar7960 »

lolo

Re: Orphan node fail over
« Reply #3 on: July 25, 2018, 06:59:16 AM »

Hello,

Let's look at the possible failure scenarios. We assume that orphans are allowed and that P2P connectivity between the 2 drivers is enabled, that is, each driver configures the following:
Code: [Select]
jppf.peer.allow.orphans = true
jppf.peers = other_driver
jppf.peer.other_driver.server.host = <other_driver_host>
jppf.peer.other_driver.server.port = <other_driver_port>

- no failure, all drivers and nodes are alive: since D1 has the highest priority for the client, the client will always send jobs to D1. Then D1 will load-balance between N1 and D2. If a job is routed to D2 from D1, it will then be executed on N2. It cannot be routed back to D1, because each JPPF driver detects potential routing cycles before they occur.

- D1 fails: the client will now route the jobs to D2 and they will be executed on N2.

- N1 fails: D1 is still alive, and since it has the highest priority for the client, the client will keep sending jobs to D1. Since D1 is connected to D2, it will route the jobs to D2, and the jobs will be executed on N2.

- D2 fails: from the client's perspective, nothing has changed. Jobs will still be routed to D1 and then executed on N1.

- N2 fails: jobs are sent to D1 since it has the highest priority. D1 will still load-balance between N1 and D2, unless you set the execution policy on the job. If a job is routed to D2, then it will be stuck there, unless N2 comes back or you set an expiration schedule on the job.
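
The scenarios above can be condensed into a small simulation. This is a plain-Java sketch, not JPPF code: the enums `Part` and `Outcome` and the method `outcomes` are hypothetical names. It assumes that the client always prefers D1 over D2, that the two drivers are peers with orphans allowed, and that `usePolicy` stands for the execution policy that rejects peer drivers without nodes.

```java
import java.util.EnumSet;
import java.util.Set;

// Plain-Java model of the failure scenarios; it does not use the JPPF API.
class FailoverSketch {
  enum Part { D1, N1, D2, N2 }
  enum Outcome { RUNS_ON_N1, RUNS_ON_N2, STUCK, NO_DRIVER }

  /**
   * Returns every place a job may end up, given the components that are alive.
   * Assumptions: the client prefers D1 over D2, the drivers are P2P peers with
   * orphans allowed, and usePolicy models the execution policy that rejects
   * peer drivers without nodes.
   */
  static Set<Outcome> outcomes(Set<Part> alive, boolean usePolicy) {
    Set<Outcome> result = EnumSet.noneOf(Outcome.class);
    boolean d1Up = alive.contains(Part.D1);
    if (!d1Up && !alive.contains(Part.D2)) {
      result.add(Outcome.NO_DRIVER);
      return result;
    }
    // The client sends the job to the highest-priority driver that is alive.
    Part localNode = d1Up ? Part.N1 : Part.N2;
    Part peer = d1Up ? Part.D2 : Part.D1;
    Part peerNode = d1Up ? Part.N2 : Part.N1;

    if (alive.contains(localNode)) result.add(runsOn(localNode));
    if (alive.contains(peer)) {
      if (alive.contains(peerNode)) result.add(runsOn(peerNode));
      else if (!usePolicy) result.add(Outcome.STUCK); // routed to an orphan peer
    }
    // No eligible channel at all: the job waits in the entry driver's queue.
    if (result.isEmpty()) result.add(Outcome.STUCK);
    return result;
  }

  private static Outcome runsOn(Part node) {
    return node == Part.N1 ? Outcome.RUNS_ON_N1 : Outcome.RUNS_ON_N2;
  }

  public static void main(String[] args) {
    Set<Part> n2Down = EnumSet.of(Part.D1, Part.N1, Part.D2);
    System.out.println("N2 down, no policy:   " + outcomes(n2Down, false));
    System.out.println("N2 down, with policy: " + outcomes(n2Down, true));
  }
}
```

In this model, the only scenario where the policy changes the result is the last one (N2 down while D2 is still up): without the policy a job may be stuck on D2, with it the job can only run on N1.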

Quote
My requirement was to have fail over to D2/N2 should D1/N1 fail and vice versa
In this case, you need the P2P connectivity between the 2 drivers, as illustrated above.

Quote
They don't want to load balance between D1 and D2.
Unfortunately, this is currently unavoidable. However, I registered a feature request for the upcoming JPPF 6.0: JPPF-543 Enable P2P connectivity between drivers to be used only for failover.

Quote
If I change the client to use the Execution Policy, do I need the P2P configuration?
In fact, it is the execution policy that is needed when P2P is configured, to mitigate the scenario where only N2 fails.

Quote
Is the peer configuration primarily used for load balancing between drivers?
It is used for both load balancing and failover.

Quote
Of interest, I removed the driver log in an attempt to attach it to this topic and restarted the driver. When the client connected it threw java.lang.NullPointerException
I'm not sure I understand. Could you please provide the exact steps you performed, so that I can reproduce and then fix the issue?

Quote
I have also attempted to connect to the D1 and N1 via JMX.
The console output you provided shows that the JMX connections were not successfully established. This means either the drivers/nodes you are connecting to are not running, or the host or port is incorrect.
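
One way to narrow this down before looking at JMX itself is to check whether anything accepts TCP connections on the management host and port. The sketch below uses only the standard `java.net` classes; the class `PortCheckSketch` and the method `canConnect` are hypothetical names, and the default of `localhost:11198` is simply the host and port used in the code posted above. A successful TCP connection does not prove that a JMX server is listening, only that the port is open.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks that a TCP connection to host:port can be opened.
class PortCheckSketch {
  // Returns true if a TCP connection is established within timeoutMillis, false otherwise.
  static boolean canConnect(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Connection refused, timed out, or the host could not be resolved.
      return false;
    }
  }

  public static void main(String[] args) {
    String host = args.length > 0 ? args[0] : "localhost";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 11198;
    System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
  }
}
```

Running it once per driver and node (with each one's own management port) shows which endpoints are unreachable from the client machine.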

For your convenience, I have attached the code of a client application I used to test the scenarios above. Could you give it a try and let us know if this works for you?

Sincerely,
-Laurent

Cobar7960

Re: Orphan node fail over
« Reply #4 on: July 25, 2018, 06:36:47 PM »

Laurent,
Thank you very much for your timely and concise answer!

Setting up P2P between D1 and D2 with jppf.peer.allow.orphans = true solved the case where either N1 or N2 is down. From the D1 client the job was routed to D2, and from the D2 client the job was routed to D1, when either N1 or N2 was shut down.




