Author Topic: Idle node

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Idle node
« on: October 10, 2017, 10:07:57 AM »

Hello,
I want all nodes in the grid to work continuously. I realized that if one of the nodes finishes its tasks before the others, it waits until the others finish their tasks. I set drivern.jppf.pool.size = 4 in the client config and jppf.load.balancing.profile.manual_profile.size = 5 (equal to jppf.processing.threads). There are always jobs in the queue, so I think the nodes shouldn't wait with this configuration. Is this expected behaviour, and if so, is there a workaround?
JPPF version: 5.2.8
driver config:
Code: [Select]
#------------------------------------------------------------------------------#
# JPPF                                                                         #
# Copyright (C) 2005-2016 JPPF Team.                                           #
# http://www.jppf.org                                                          #
#                                                                              #
# Licensed under the Apache License, Version 2.0 (the "License");              #
# you may not use this file except in compliance with the License.             #
# You may obtain a copy of the License at                                      #
#                                                                              #
# http://www.apache.org/licenses/LICENSE-2.0                                #
#                                                                              #
# Unless required by applicable law or agreed to in writing, software          #
# distributed under the License is distributed on an "AS IS" BASIS,            #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.     #
# See the License for the specific language governing permissions and          #
# limitations under the License.                                               #
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
# port number to which the server listens for plain connections                #
# default value is 11111; uncomment to specify a different value               #
# to disable, specify a negative port number                                   #
#------------------------------------------------------------------------------#

jppf.server.port = 11113

#------------------------------------------------------------------------------#
# port number to which the server listens for secure connections               #
# default value is 11443; uncomment to specify a different value               #
# to disable, specify a negative port number                                   #
#------------------------------------------------------------------------------#

#jppf.ssl.server.port = 11443
jppf.ssl.server.port = -1

#------------------------------------------------------------------------------#
#                          SSL Settings                                        #
#------------------------------------------------------------------------------#

# location of the SSL configuration on the file system
#jppf.ssl.configuration.file = config/ssl/ssl-server.properties

# SSL configuration as an arbitrary source. Value is the fully qualified name
# of an implementation of java.util.concurrent.Callable<InputStream>
# with optional space-separated arguments
#jppf.ssl.configuration.source = org.jppf.ssl.FileStoreSource config/ssl/ssl-server.properties

# enable secure communications with other servers; defaults to false (disabled)#
#jppf.peer.ssl.enabled = true

#------------------------------------------------------------------------------#
# Enabling and configuring JMX features                                        #
#------------------------------------------------------------------------------#

# non-secure JMX connections; default is true (enabled)
#jppf.management.enabled = true

# secure JMX connections via SSL/TLS; default is false (disabled)
#jppf.management.ssl.enabled = true

# JMX management host IP address. If not specified (recommended), the first non-local
# IP address (i.e. neither 127.0.0.1 nor localhost) on this machine will be used.
# If no non-local IP is found, localhost will be used
#jppf.management.host = localhost

# JMX management port. Defaults to 11198. If the port is already bound, the driver
# will scan for the first available port instead.
#jppf.management.port = 11199

#------------------------------------------------------------------------------#
# Configuration of the driver discovery broadcast service                      #
#------------------------------------------------------------------------------#

# Enable/Disable automatic discovery of this JPPF drivers; default to true
jppf.discovery.enabled = true

# UDP multicast group to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Default value is 230.0.0.1
#jppf.discovery.group = 230.0.0.1

# UDP multicast port to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Default value is 11111
jppf.discovery.port = 11113

# Time between 2 broadcasts, in milliseconds. Default value is 1000
#jppf.discovery.broadcast.interval = 1000

# IPv4 inclusion patterns: broadcast these ipv4 addresses
#jppf.discovery.broadcast.include.ipv4 = 10.92.50.200

# IPv4 exclusion patterns: do not broadcast these ipv4 addresses
#jppf.discovery.exclude.ipv4 = 192.168.1.128-; 192.168.1.0/25

# IPv6 inclusion patterns: broadcast these ipv6 addresses
#jppf.discovery.include.ipv6 = 1080:0:0:0:8:800:200C-20FF:-; ::1/80

# IPv6 exclusion patterns: do not broadcast these ipv6 addresses
#jppf.discovery.exclude.ipv6 = 1080:0:0:0:8:800:200C-20FF:0C00-0EFF; ::1/64

#------------------------------------------------------------------------------#
# Connection with other servers, enabling P2P communication                    #
#------------------------------------------------------------------------------#

# Enable/disable auto-discovery of remote peer drivers. Default value is false
jppf.peer.discovery.enabled = true

# manual configuration of peer servers, as a space-separated list of peers names to connect to
#jppf.peers = server_1 server_2

# enable both automatic and manual discovery
#jppf.peers = jppf_discovery server_1 server_2

# connection to server_1
#jppf.peer.server_1.server.host = host_1
#jppf.peer.server_1.server.port = 11111
# connection to server_2
#jppf.peer.server_2.server.host = host_2
#jppf.peer.server_2.server.port = 11112

#------------------------------------------------------------------------------#
# Load-balancing configuration                                                 #
#------------------------------------------------------------------------------#

# name of the load-balancing algorithm to use; pre-defined possible values are:
# manual | autotuned | proportional | rl | nodethreads
# it can also be the name of a user-defined algorithm. Default value is "manual"
jppf.load.balancing.algorithm = manual

# name of the set of parameter values (aka profile) to use for the algorithm
jppf.load.balancing.profile = manual_profile

# "manual" profile
jppf.load.balancing.profile.manual_profile.size = 5

# "autotuned" profile
jppf.load.balancing.profile.autotuned_profile.size = 5
jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100
jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50
jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2
jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50
jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5
jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2

# "proportional" profile
jppf.load.balancing.profile.proportional_profile.size = 5
jppf.load.balancing.profile.proportional_profile.initialMeanTime = 1e10
jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 20
jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1

# "rl" profile
jppf.load.balancing.profile.rl_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl_profile.performanceVariationThreshold = 0.0001
jppf.load.balancing.profile.rl_profile.maxActionRange = 10

# "nodethreads" profile
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1

# "rl2" profile
jppf.load.balancing.profile.rl2_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl2_profile.performanceVariationThreshold = 0.75
jppf.load.balancing.profile.rl2_profile.minSamples = 20
jppf.load.balancing.profile.rl2_profile.maxSamples = 100
jppf.load.balancing.profile.rl2_profile.maxRelativeSize = 0.5

#------------------------------------------------------------------------------#
# Other JVM options added to the java command line when the driver is started  #
# as a subprocess. Multiple options are separated by spaces.                   #
#------------------------------------------------------------------------------#

#jppf.jvm.options = -Xmx256m -Djava.util.logging.config.file=config/logging-driver.properties

# example with remote debugging options
#jppf.jvm.options = -server -Xmx256m -Xrunjdwp:transport=dt_socket,address=localhost:8000,server=y,suspend=n

#------------------------------------------------------------------------------#
# path to the Java executable. When defined, it is used by the launch script   #
# (startDriver.bat or startDriver.sh) instead of the default Java path.        #
# It is undefined by default, meaning that the script will use the "java"      #
# command, relying on Java being in the system PATH.                           #
#------------------------------------------------------------------------------#

# linux/unix example
#jppf.java.path = /opt/java/jdk1.8.0_x64/bin/java
# windows example
#jppf.java.path = C:/java/jdk1.8.0_x64/bin/java.exe

#------------------------------------------------------------------------------#
# Specify alternate serialization schemes.                                     #
# Defaults to org.jppf.serialization.DefaultJavaSerialization.                 #
#------------------------------------------------------------------------------#

# default
#jppf.object.serialization.class = org.jppf.serialization.DefaultJavaSerialization

# built-in object serialization schemes
jppf.object.serialization.class = org.jppf.serialization.DefaultJPPFSerialization
#jppf.object.serialization.class = org.jppf.serialization.XstreamSerialization

# defined in the "Kryo Serialization" sample
#jppf.object.serialization.class = org.jppf.serialization.kryo.KryoSerialization

#------------------------------------------------------------------------------#
# Specify a data transformation class. If unspecified, no transformation occurs#
#------------------------------------------------------------------------------#

# Defined in the "Network Data Encryption" sample
#jppf.data.transform.class = org.jppf.example.dataencryption.SecureKeyCipherTransform

#------------------------------------------------------------------------------#
# whether to resolve the nodes' ip addresses into host names                   #
# defaults to true (resolve the addresses)                                     #
#------------------------------------------------------------------------------#

org.jppf.resolve.addresses = true

#------------------------------------------------------------------------------#
# Local (in-JVM) node. When enabled, any node-specific properties will apply   #
#------------------------------------------------------------------------------#

# Enable/disable the local node. Default is false (disabled)
jppf.local.node.enabled = true
jppf.local.node.bias = false
# example node-specific setting
#jppf.processing.threads = 2

#------------------------------------------------------------------------------#
# In idle mode configuration. In this mode the server or node starts when no   #
# mouse or keyboard activity has occurred since the specified timeout, and is  #
# stopped when any new activity occurs.                                        #
#------------------------------------------------------------------------------#

# Idle mode enabled/disabled. Default is false (disabled)
#jppf.idle.mode.enabled = false

# Fully qualified class name of the factory object that instantiates a platform-specific idle state detector
#jppf.idle.detector.factory = org.jppf.example.idlesystem.IdleTimeDetectorFactoryImpl

# Time of keyboard and mouse inactivity to consider the system idle, in milliseconds
# Default value is 300000 (5 minutes)
#jppf.idle.timeout = 6000

# Interval between 2 successive calls to the native APIs to determine idle state changes
# Default value is 1000
#jppf.idle.poll.interval = 1000

#------------------------------------------------------------------------------#
# Automatic recovery from hard failure of the nodes connections. These         #
# parameters configure how the driver reacts when a node fails to respond to   #
# its heartbeat messages.                                                      #
#------------------------------------------------------------------------------#

# Enable recovery from failures on the nodes. Default to false (disabled)
#jppf.recovery.enabled = false

# Max number of attempts to get a response from the node before the connection
# is considered broken. Default value is 3
#jppf.recovery.max.retries = 3

# Max time in milliseconds allowed for each attempt to get a response from the node.
# Default value is 6000 (6 seconds)
#jppf.recovery.read.timeout = 6000

# Dedicated port number for the detection of node failure. Defaults to 22222.
# If server discovery is enabled on the nodes, this value will override the port number specified in the nodes
#jppf.recovery.server.port = 22222

# Interval in milliseconds between two runs of the connection reaper
# Default value is 60000 (1 minute)
#jppf.recovery.reaper.run.interval = 60000

# Number of threads allocated to the reaper. Default to the number of available CPUs
#jppf.recovery.reaper.pool.size = 8

#------------------------------------------------------------------------------#
# Redirecting System.out and System.err to files.                              #
#------------------------------------------------------------------------------#

# file path on the file system where System.out is redirected.
# if unspecified or invalid, then no redirection occurs
#jppf.redirect.out = System.out.log
# whether to append to an existing file or to create a new one
jppf.redirect.out.append = false

# file path on the file system where System.err is redirected
# if unspecified or invalid, then no redirection occurs
#jppf.redirect.err = System.err.log
# whether to append to an existing file or to create a new one
jppf.redirect.err.append = false

#------------------------------------------------------------------------------#
# Global performance tuning parameters. These affect the performance and       #
# throughput of I/O operations in JPPF. The values provided in the vanilla     #
# JPPF distribution are known to offer a good performance in most situations   #
# and environments.                                                            #
#------------------------------------------------------------------------------#

# Size of send and receive buffer for socket connections.
# Defaults to 32768 and must be in range [1024, 1024*1024]
# 128 * 1024 = 131072
jppf.socket.buffer.size = 131072
# Size of temporary buffers (including direct buffers) used in I/O transfers.
# Defaults to 32768 and must be in range [1024, 1024*1024]
jppf.temp.buffer.size = 12288
# Maximum size of temporary buffers pool (excluding direct buffers). When this size
# is reached, new buffers are still created, but not released into the pool, so they
# can be quickly garbage-collected. The size of each buffer is defined with ${jppf.temp.buffer.size}
# Defaults to 10 and must be in range [1, 2048]
jppf.temp.buffer.pool.size = 200
# Size of temporary buffer pool for reading lengths as ints (size of each buffer is 4).
# Defaults to 100 and must be in range [1, 2048]
jppf.length.buffer.pool.size = 100

#------------------------------------------------------------------------------#
# Enabling or disabling the lookup of classpath resources in the file system   #
# Defaults to true (enabled)                                                   #
#------------------------------------------------------------------------------#

#jppf.classloader.file.lookup = true

#------------------------------------------------------------------------------#
# Timeout in millis for JMX requests. Defaults to Long.MAX_VALUE (2^63 - 1)    #
#------------------------------------------------------------------------------#

#jppf.jmx.request.timeout = $script{ java.lang.Long.MAX_VALUE }$



#--------------------------------- NODE CONFIGURATION -------------------------------------#

# JMX management port, defaults to 11198 (no SSL) or 11193 with SSL. If the port
# is already bound, the node will automatically scan for the next available port.
jppf.node.management.port = 12003


# time in seconds after which the system stops trying to reconnect
# A value of zero or less means the system never stops trying. Defaults to 60
jppf.reconnect.max.time = 5

#------------------------------------------------------------------------------#
# Processing Threads: number of threads running tasks in this node.            #
# default value is the number of available CPUs; uncomment to specify a        #
# different value. Blocking tasks might benefit from a number larger than CPUs #
#------------------------------------------------------------------------------#
jppf.processing.threads = 5

# JPPF class loader delegation model. values: parent | url, defaults to parent
jppf.classloader.delegation = parent

# size of the class loader cache in the node, defaults to 50
jppf.classloader.cache.size = 50

# class loader resource cache enabled? defaults to true.
jppf.resource.cache.enabled = true

# resource cache's type of storage: either "file" (the default) or "memory"
jppf.resource.cache.storage = file

# Define a node as master. Defaults to true
jppf.node.provisioning.master = true
# Define a node as a slave. Defaults to false
jppf.node.provisioning.slave = false
# Specify the path prefix used for the root directory of each slave node
# defaults to "slave_nodes/node_", relative to the master root directory
jppf.node.provisioning.slave.path.prefix = slave_nodes/node_
# Specify the directory where slave-specific configuration files are located
# Defaults to the "config" folder, relative to the master root directory
#jppf.node.provisioning.slave.config.path = config
# A set of space-separated JVM options always added to the slave startup command
#jppf.node.provisioning.slave.jvm.options = -Dlog4j.configuration=config/log4j-node.properties
# Specify the number of slaves to launch upon master node startup. Defaults to 0
jppf.node.provisioning.startup.slaves = 0

client config:
Code: [Select]
jppf.discovery.enabled =  false
jppf.drivers =  driver1 driver2 driver3 driver4

driver1.jppf.server.host =  10.254.104.41
driver1.jppf.server.port =  11113
driver1.jppf.pool.size = 4
driver1.jppf.ssl.enabled =  false
driver1.jppf.priority = 100

driver2.jppf.server.host =  10.254.104.157
driver2.jppf.server.port =  11113
driver2.jppf.pool.size = 4
driver2.jppf.ssl.enabled =  false
driver2.jppf.priority = 99

driver3.jppf.server.host =  10.254.104.84
driver3.jppf.server.port =  11113
driver3.jppf.pool.size = 4
driver3.jppf.ssl.enabled =  false
driver3.jppf.priority = 98

driver4.jppf.server.host =  10.254.104.83
driver4.jppf.server.port =  11113
driver4.jppf.pool.size = 4
driver4.jppf.ssl.enabled =  false
driver4.jppf.priority = 97

jppf.resolve.addresses =  true
jppf.load.balancing.algorithm =  manual
jppf.load.balancing.profile =  manual_profile
jppf.load.balancing.profile.manual_profile.size =  1000000
jppf.admin.refresh.interval.topology =  1000
jppf.admin.refresh.interval.health =  3000
jppf.socket.buffer.size =  131072
jppf.temp.buffer.size =  12288
jppf.temp.buffer.pool.size =  200
jppf.length.buffer.pool.size =  100
jppf.object.serialization.class =  org.jppf.serialization.DefaultJPPFSerialization
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2209
    • JPPF Web site
Re: Idle node
« Reply #1 on: October 10, 2017, 08:59:42 PM »

Hello,

When you say "There are jobs on queue always", can you please tell us whether it is the driver or the client queue, and how you observe/measure it?

If this is the driver queue, then could you also tell us how many tasks you typically have in each job? Since you configured the driver load-balancing to send up to 5 tasks at once to each node, whether you have more than 5 tasks in each job or not will greatly influence how busy the nodes will be.

If the jobs remain in the client queue, then the problem is likely to be a lack of concurrency in the job submission.
I see that you configured a pool of 4 connections to the driver (with driver1.jppf.pool.size = 4). This means that the JPPF client can submit up to 4 jobs concurrently to the driver. However, whether your jobs are submitted concurrently to the driver or not depends on how you configure them and on how you submit them. Are you using something similar to one of the patterns described in the "Submitting multiple jobs concurrently" documentation?
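For illustration, here is a minimal sketch of one such pattern: several non-blocking jobs submitted through the same client, then collected afterwards. MyTask is only a placeholder for your own task class, and checked exceptions are simply propagated:
Code: [Select]
import java.util.ArrayList;
import java.util.List;

import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.node.protocol.Task;

// sketch: submit several jobs concurrently through one client, assuming the
// connection pool size is at least the number of concurrent jobs
public static void submitConcurrentJobs() throws Exception {
  try (JPPFClient client = new JPPFClient()) {
    List<JPPFJob> jobs = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      JPPFJob job = new JPPFJob();
      job.setName("job-" + i);
      job.setBlocking(false);   // do not block on submission
      job.add(new MyTask());    // MyTask: placeholder for your own task class
      jobs.add(job);
      client.submitJob(job);    // returns immediately for a non-blocking job
    }
    // wait for and collect the results of every job
    for (JPPFJob job : jobs) {
      List<Task<?>> results = job.awaitResults();
      // ... process the results ...
    }
  }
}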

Sincerely,
-Laurent
Logged

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Re: Idle node
« Reply #2 on: October 11, 2017, 11:00:30 AM »

Hello,
I didn't use the patterns you mention because job creation is triggered by requests made by another application. My job submission and monitoring process is like this (I couldn't use the monitoring tool because of port permissions to the servers):
  • When a request arrives, I create and submit a job to the client and save it to the db with an id and the PENDING state.
  • I also save its tasks to the db with the PENDING state.
  • I change the state of the job with a JobListener.
  • On the job started event, I update the job record's state to IN_PROGRESS.
  • On the job ended event, I update the job record's state to COMPLETED and update its task records according to the results.
  • There are console logs (at start and finish, with identifying information) in the task code, so I can see whether the nodes are executing a task or remain idle.
I can see that there are always 10-15 job records with the PENDING state (client queue) and 4 jobs with the IN_PROGRESS state (driver queue).
I notice idle nodes like this: sometimes some of the nodes don't print task logs while others do (the duration varies, sometimes 10 minutes). When the others finish their tasks and new tasks are delivered to the nodes, the idle ones start printing task logs like the others.
The job states, logs and other records are consistent. The task count per job varies from 1 to 150, but since there are always 4 jobs in the driver queue, a node shouldn't sit idle.

Edit:
Waiting may occur when the total task count of the jobs in the driver queue is smaller than the nodes' total capacity. But I observed the issue while the jobs in the driver queue had more pending tasks than the nodes' total capacity (node_count * thread_per_node).

My job submission code looks like this:

Code: [Select]
...
JPPFJob jppfJob = new JPPFJob(uuid);
jppfJob.addJobListener(queryJobListener); // my listener to update states
jppfJob.setBlocking(false);
jppfJob.getSLA().setCancelUponClientDisconnect(true);
// the problem occurred when all jobs had the same priority too; the current priority calculation is:
// if there is only one task in the job, the priority gets the max value, otherwise priority = incrementalId * -1
// (jobs created later get a lower priority than previously created ones, to achieve FIFO)
jppfJob.getSLA().setPriority(priority);
... (adding tasks to the job)
JPPFClientProvider.getClient().submitJob(jppfJob);
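The queryJobListener above is not shown here; for reference, a simplified sketch of what such a state-updating listener could look like (updateJobState and persistTaskResults are just placeholders for the real db calls):
Code: [Select]
import java.util.List;

import org.jppf.client.event.JobEvent;
import org.jppf.client.event.JobListenerAdapter;
import org.jppf.node.protocol.Task;

// simplified sketch of a listener that mirrors job lifecycle events into the db
public class QueryJobListener extends JobListenerAdapter {
    @Override
    public void jobStarted(JobEvent event) {
        // the job starts being dispatched: mark the job record IN_PROGRESS
        updateJobState(event.getJob().getUuid(), "IN_PROGRESS");
    }

    @Override
    public void jobEnded(JobEvent event) {
        // all task results are back: mark the job COMPLETED and store the results
        updateJobState(event.getJob().getUuid(), "COMPLETED");
        persistTaskResults(event.getJob().getAllResults());
    }

    private void updateJobState(String jobUuid, String state) { /* db update */ }
    private void persistTaskResults(List<Task<?>> results) { /* db update */ }
}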

My client provider:
Code: [Select]
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.jppf.client.JPPFClient;
import org.jppf.client.event.ConnectionPoolListener;
import org.jppf.utils.TypedProperties;

public class JPPFClientProvider {
    private static JPPFClient jppfClient;

    public static TypedProperties getClientConfig() {
        TypedProperties clientConfig = new TypedProperties();
        // client config file which I sent in the first post
        String paramFileName = System.getProperty("jppfClientConfig");
        try (InputStream input = new FileInputStream(paramFileName)) {
            clientConfig.load(input);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return clientConfig;
    }

    public static JPPFClient generateClient() {
        TypedProperties clientConfig = getClientConfig();
        JPPFClient client = new JPPFClient(null, clientConfig, (ConnectionPoolListener[]) null);
        // wait until at least one connection pool to a driver is working
        client.awaitWorkingConnectionPool();
        Logger.info("Client created"); // the application's own logging utility
        try {
            // client-side load balancing: allow up to 1,000,000 tasks per dispatch to a driver
            client.setLoadBalancerSettings("manual", new TypedProperties().setInt("size", 1_000_000));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return client;
    }

    // synchronized so that concurrent requests don't create more than one client
    public static synchronized JPPFClient getClient() {
        if (jppfClient == null) {
            jppfClient = generateClient();
        }
        return jppfClient;
    }
}
« Last Edit: October 11, 2017, 12:02:54 PM by arefaydi »
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2209
    • JPPF Web site
Re: Idle node
« Reply #3 on: October 12, 2017, 07:24:10 AM »

Hello,

Thank you very much for this detailed information.

Given your configuration and observed behavior, the first thing I would try is to increase the size of the connection pool on the client side, to a much larger number. For instance you could try with driver1.jppf.pool.size = 20 (20 instead of 4) and tune later, depending on the resulting throughput. I'm suggesting 20 because that's the sum of (pending + in-progress) jobs that you observed. This will cause the driver to have up to 20 jobs in its queue, and should result in a much better utilization of the nodes.
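In the client configuration that would look like this, with the other driver connections adjusted the same way and tuned later based on the observed throughput:
Code: [Select]
# larger connection pool for each driver connection (tune as needed)
driver1.jppf.pool.size = 20
driver2.jppf.pool.size = 20
driver3.jppf.pool.size = 20
driver4.jppf.pool.size = 20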

Given the load-balancing configuration in the driver ("manual" algorithm with size = 5) and the fact that each node can only process one job at a time, you might want to consider the following:

- the maximum total number of tasks executed by the nodes at any time in the whole grid is (5 * node_count)
- I'm assuming that, in general, the number of tasks in a job is not a multiple of 5. This means that some nodes will get fewer than 5 tasks. For example, if a job has 7 tasks, then one node will get 5 tasks and another one will only get 2.
- if the number of processing threads in each node is important to you in terms of performance, you might want to use the "nodethreads" algorithm instead of the "manual" one, because the number of tasks it sends to each node is proportional to each node's thread count. This is especially useful when not all nodes have the same thread count. It should also increase the average usage rate of your nodes (see the configuration sketch right after this list).
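For reference, a driver-side sketch of that change, reusing the "nodethreads" profile properties already present in your driver configuration:
Code: [Select]
# switch the driver load-balancer to the "nodethreads" algorithm:
# each node receives up to multiplicator * <its processing thread count> tasks per dispatch
jppf.load.balancing.algorithm = nodethreads
jppf.load.balancing.profile = nodethreads_profile
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1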

Sincerely,
-Laurent
Logged

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Re: Idle node
« Reply #4 on: October 12, 2017, 09:58:02 AM »

Hello,
I was testing with one client; in the real scenario there will be 4 clients and, as a result, the max driver queue size will be 16, which should be enough to fill all the nodes given the average task count of the jobs. I don't want to increase it further because it may cause extra network connections and traffic.
I can change the algorithm to nodethreads and set the multiplicator to more than one (setting it to 1 would be the same as the current config, I think). But if the second situation below is true, nodethreads with a multiplicator can cause more waiting.

Quote
the fact that each node can only process one job at a time
  According to this, with 4 nodes having 5 threads each, the "manual" algorithm with size = 5, and two jobs in the driver queue (the first with 17 tasks, the second with 15 tasks), task dispatching to the nodes will be like this: N1: 5 tasks, N2: 5 tasks, N3: 5 tasks, N4: 2 tasks.
There are two questions about this:
  • N4 doesn't take tasks from multiple jobs at the same time, even though there is another job in the driver queue, is that true?
  • Does task delivery to the nodes occur at the same time for all nodes? I mean, if N4 finishes its tasks before the others, will it wait for the others to finish their tasks before taking new ones? If so, waiting may occur in other situations too, for example when some tasks require more time than others even though the task counts are equal on each node.
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2209
    • JPPF Web site
Re: Idle node
« Reply #5 on: October 12, 2017, 11:58:06 AM »

Hello,

Quote
I can change the algorithm to nodethreads and set the multiplicator to more than one (setting it to 1 would be the same as the current config, I think). But if the second situation below is true, nodethreads with a multiplicator can cause more waiting.
You are right. If all your nodes have 5 threads, then changing to "nodethreads" will have no positive effect.

Quote
N4 doesn't take tasks from multiple jobs at the same time, even though there is another job in the driver queue, is that true?
Yes, this is how the nodes are designed.

Quote
Does task delivery to the nodes occur at the same time for all nodes
No, a node will receive tasks as soon as it is available and there is a job in the driver queue. It doesn't wait for anything else.

I hope this clarifies,
-Laurent
Logged

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Re: Idle node
« Reply #6 on: November 07, 2017, 02:36:15 PM »

Hello,

I realized that I defined the problem incorrectly. I supposed it was an issue of getting tasks from the next jobs in the driver queue, but it occurs with only one job (which has enough tasks to fill all the nodes) too.
So I think it is not a queue issue. When I monitor the idle node from the admin-ui, it appears 'idle' in the topology view. On the charts -> performance -> node execution vs transport chart, it shows only blue (latest transport time) while the node is idle (the duration varies, 2-3 minutes; I will share exact durations and the frequency of the issue as soon as possible). It may seem unimportant, but performance and stability of the application are really important, so it would be good to know whether there is a bug or a configuration issue. Is it possible that the node is not executing a new task because it is sending the result of a task back, or waiting for a new task to be transferred to it at that time? The transfer duration normally cannot be that long, because other nodes start executing while it is idle, and the idle one changes randomly on every execution. How can I detect whether a transfer is in progress to/from a node while it shows as idle (using the admin-ui, code or configuration)?

Edit:
There is a parameter named jppf.transition.thread.pool.size, whose default value is the processor count. Maybe the I/O threads aren't enough. Is there a recommended value for it (a multiple of node/peer_count or thread_per_node)?

Edit2:
Each node (except the active driver's local node) waits 3-4 hours per day. It must be a system network or grid network configuration issue, because there is no wait on the active driver's local node, which takes tasks without a network connection.

« Last Edit: November 14, 2017, 09:32:37 AM by arefaydi »
Logged

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Re: Idle node
« Reply #7 on: November 23, 2017, 11:49:15 AM »

Hello,
I monitored job events with http://www.jppf.org/samples-pack/PluggableView/ and realized that when a node stays idle for a while and then changes state from idle to executing, a "job returned" event log is printed for it. So it was waiting to return the task results. How can I reduce this duration? I have already set the parameters below, but the problem still exists.

jppf.transition.thread.pool.size = 80 (the server has 16 cores)
jppf.socket.buffer.size = 1048576
jppf.temp.buffer.size = 1048576
jppf.temp.buffer.pool.size = 1000
jppf.length.buffer.pool.size = 1000

Note:
There have been some changes since the original post:
there are 5 clients and 5 peers now
jppf.processing.threads = 16
drivern.jppf.pool.size = 20
 
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2209
    • JPPF Web site
Re: Idle node
« Reply #8 on: November 28, 2017, 06:40:30 AM »

Hello,

As far as I can see, the behavior you describe is not a problem with the configuration of the nodes or driver. A node returns the tasks as soon as they have all completed, that is, as soon as the execution of the run() method of each task is finished, either normally or via an exception. The node doesn't wait for anything else before sending the executed tasks back to the driver.

What I'm suspecting is that some of your tasks are taking a very long time to execute, for some unknown reason that we need to find out. Some possibilities I can see:

1) the task spends a lot of time trying to connect to or access a remote process or service. Some time measurements around the code that does that may show whether that's where the time is spent

2) another possibility is that the tasks are very large after their execution has finished, and several situations may arise:
- it may take a very long time to return them back to the server, especially if the network is slow or overloaded
- an out of memory condition may occur during the serialization of the tasks. An easy way to detect that is to add the -XX:+HeapDumpOnOutOfMemoryError option to the node's JVM and then check if the node generated a heap dump (.hprof) file. Note that there is no guaranteed way to recover safely from an OutOfMemoryError, apart from restarting the JVM.
- it is also possible that the serialized form of a task reaches the size limit of 2 GB, or that the node fails to serialize the task for any reason, in which case the node will return an instance of JPPFExceptionResult instead of the task.

You might also want to add a timestamp attribute to your task, to assign the time at which it finished executing, then check it against the current time when you receive the result in the client, so you can have a rough idea of how long it took to return the task back from the node to the client (assuming the node and client clocks are not too much out of sync). This should help narrow down the area of investigation.
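For example, a minimal sketch of such a task; TimestampedTask is hypothetical and the measured delay is only approximate, because of possible clock differences between the node and the client:
Code: [Select]
import org.jppf.node.protocol.AbstractTask;

// hypothetical task that records when its run() method finished on the node
public class TimestampedTask extends AbstractTask<String> {
    private long completedAt;

    @Override
    public void run() {
        // ... the actual work of the task ...
        completedAt = System.currentTimeMillis(); // last statement of run()
        setResult("done");
    }

    public long getCompletedAt() { return completedAt; }
}

// on the client side, once the result is received:
// long returnDelayMillis = System.currentTimeMillis() - task.getCompletedAt();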

Sincerely,
-Laurent
Logged

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Re: Idle node
« Reply #9 on: November 28, 2017, 08:14:26 AM »

Hello,

There was another issue (http://www.jppf.org/forums/index.php/topic,8025.msg12729.html#msg12729) which is related to this one.
Quote
A node returns the tasks as soon as they have all completed
As you can see in the screenshot of jobData in that post, tasks were sometimes redirected to other peers, which caused extra steps while delivering tasks and returning results, and sometimes the peers couldn't perform these transfers in time; these steps shouldn't occur normally (I think). Also, sometimes peers on the path of the tasks couldn't take new tasks until they finished returning their results.
For example:
a job is submitted to p1, which sends tasks to p2, and p2 sends them to p3;
p3 completes the tasks and returns the results to p2;
p2 can't return these results to p1 for a while;
p1 can't take new tasks until p2 returns the results to p1;
sometimes p2 can't take new tasks either until it returns the results to p1 (I didn't examine this point, it may be irrelevant).

Solution (or workaround):
I changed jppf.local.node.bias from false to true in the driver configuration:
Code: [Select]
jppf.local.node.bias = true
Peers aren't redirecting tasks to other peers now, so the extra steps are gone and the problem is resolved :).

Logged