Author Topic: Driver Memory Issue  (Read 304 times)

arefaydi

  • JPPF Master
  • ***
  • Posts: 31
Driver Memory Issue
« on: October 25, 2017, 10:28:32 AM »

Hello,
After the grid starts executing jobs, the driver's memory usage increases continuously until almost no free memory is left (about 400-500 MB remains). When the operating system needs to start a new process, it kills my application to free memory (I verified this in the kernel logs). The server has 64 GB of RAM in total, and when I run the same task code outside the grid, it uses only 3-4 GB of RAM and does not grow like that. Since everything is fine without the grid, I think the problem is related to the grid. Is there any parameter that causes (or affects) this behavior, so that I can test different values and observe the effect?
I monitor the memory usage with the top command of the OS (Ubuntu) and with the JPPF admin UI.
Here is my configuration: the client, the driver and its local node are embedded in the application. When I test this issue, jobs are submitted by the client located on the same server as the first driver, which has the highest priority. Accordingly, the client sends the jobs to the first peer because of its priority, and the other peers act like nodes.

JPPF 5.2.8
Client conf :
jppf.discovery.enabled =  false
jppf.drivers =  driver1 driver2 driver3 driver4

driver1.jppf.server.host =  10.254.104.41
driver1.jppf.server.port =  11113
driver1.jppf.pool.size = 4
driver1.jppf.ssl.enabled =  false
driver1.jppf.priority = 100

driver2.jppf.server.host =  10.254.104.157
driver2.jppf.server.port =  11113
driver2.jppf.pool.size = 4
driver2.jppf.ssl.enabled =  false
driver2.jppf.priority = 99

driver3.jppf.server.host =  10.254.104.84
driver3.jppf.server.port =  11113
driver3.jppf.pool.size = 4
driver3.jppf.ssl.enabled =  false
driver3.jppf.priority = 98

driver4.jppf.server.host =  10.254.104.83
driver4.jppf.server.port =  11113
driver4.jppf.pool.size = 4
driver4.jppf.ssl.enabled =  false
driver4.jppf.priority = 97

jppf.resolve.addresses =  true
jppf.load.balancing.algorithm =  manual
jppf.load.balancing.profile =  manual_profile
jppf.load.balancing.profile.manual_profile.size =  1000000
jppf.admin.refresh.interval.topology =  1000
jppf.admin.refresh.interval.health =  3000
jppf.socket.buffer.size =  131072
jppf.temp.buffer.size =  12288
jppf.temp.buffer.pool.size =  200
jppf.length.buffer.pool.size =  100
jppf.object.serialization.class =  org.jppf.serialization.DefaultJPPFSerialization
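
The priority ordering described in the post can be checked directly from the client configuration above. The following is a minimal sketch in plain Java that reads the `jppf.drivers` list and the `<name>.jppf.priority` values and reports the driver with the highest value. It is only an illustration of the ordering, not JPPF's own connection-selection code; the class and method names are hypothetical, and the fallback priority of 0 for a missing entry is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustration only: finds the configured driver with the highest priority value.
// This is NOT how JPPF itself selects connections; names here are hypothetical.
class DriverPriorityDemo {

    // Returns the name of the driver whose "<name>.jppf.priority" value is highest,
    // or null if "jppf.drivers" is empty.
    static String highestPriorityDriver(Properties config) {
        String best = null;
        int bestPriority = Integer.MIN_VALUE;
        for (String name : config.getProperty("jppf.drivers", "").trim().split("\\s+")) {
            if (name.isEmpty()) continue;
            // Assumption: a driver without an explicit priority is treated as priority 0.
            int priority = Integer.parseInt(config.getProperty(name + ".jppf.priority", "0").trim());
            if (priority > bestPriority) {
                bestPriority = priority;
                best = name;
            }
        }
        return best;
    }

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        // Priority entries copied from the client configuration in the post.
        String text = String.join("\n",
            "jppf.drivers =  driver1 driver2 driver3 driver4",
            "driver1.jppf.priority = 100",
            "driver2.jppf.priority = 99",
            "driver3.jppf.priority = 98",
            "driver4.jppf.priority = 97");
        System.out.println("highest priority driver: " + highestPriorityDriver(parse(text)));
    }
}
```

With the values from the post, the highest value belongs to driver1 (10.254.104.41), which matches the description of the first driver receiving the jobs.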

driver conf :
#------------------------------------------------------------------------------#
# JPPF                                                                         #
# Copyright (C) 2005-2016 JPPF Team.                                           #
# http://www.jppf.org                                                          #
#                                                                              #
# Licensed under the Apache License, Version 2.0 (the "License");              #
# you may not use this file except in compliance with the License.             #
# You may obtain a copy of the License at                                      #
#                                                                              #
# http://www.apache.org/licenses/LICENSE-2.0                                #
#                                                                              #
# Unless required by applicable law or agreed to in writing, software          #
# distributed under the License is distributed on an "AS IS" BASIS,            #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.     #
# See the License for the specific language governing permissions and          #
# limitations under the License.                                               #
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
# port number to which the server listens for plain connections                #
# default value is 11111; uncomment to specify a different value               #
# to disable, specify a negative port number                                   #
#------------------------------------------------------------------------------#

jppf.server.port = 11113
#jppf.server.class.cache.enabled = false

#------------------------------------------------------------------------------#
# port number to which the server listens for secure connections               #
# default value is 11443; uncomment to specify a different value               #
# to disable, specify a negative port number                                   #
#------------------------------------------------------------------------------#

#jppf.ssl.server.port = 11443
jppf.ssl.server.port = -1

#------------------------------------------------------------------------------#
#                          SSL Settings                                        #
#------------------------------------------------------------------------------#

# location of the SSL configuration on the file system
#jppf.ssl.configuration.file = config/ssl/ssl-server.properties

# SSL configuration as an arbitrary source. Value is the fully qualified name
# of an implementation of java.util.concurrent.Callable<InputStream>
# with optional space-separated arguments
#jppf.ssl.configuration.source = org.jppf.ssl.FileStoreSource config/ssl/ssl-server.properties

# enable secure communications with other servers; defaults to false (disabled)#
#jppf.peer.ssl.enabled = true

#------------------------------------------------------------------------------#
# Enabling and configuring JMX features                                        #
#------------------------------------------------------------------------------#

# non-secure JMX connections; default is true (enabled)
#jppf.management.enabled = true

# secure JMX connections via SSL/TLS; default is false (disabled)
#jppf.management.ssl.enabled = true

# JMX management host IP address. If not specified (recommended), the first non-local
# IP address (i.e. neither 127.0.0.1 nor localhost) on this machine will be used.
# If no non-local IP is found, localhost will be used
#jppf.management.host = localhost

# JMX management port. Defaults to 11198. If the port is already bound, the driver
# will scan for the first available port instead.
#jppf.management.port = 11199

#------------------------------------------------------------------------------#
# Configuration of the driver discovery broadcast service                      #
#------------------------------------------------------------------------------#

# Enable/Disable automatic discovery of this JPPF drivers; default to true
jppf.discovery.enabled = true

# UDP multicast group to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Default value is 230.0.0.1
#jppf.discovery.group = 230.0.0.1

# UDP multicast port to which drivers broadcast their connection parameters
# and to which clients and nodes listen. Default value is 11111
jppf.discovery.port = 11113

# Time between 2 broadcasts, in milliseconds. Default value is 1000
#jppf.discovery.broadcast.interval = 1000

# IPv4 inclusion patterns: broadcast these ipv4 addresses
#jppf.discovery.broadcast.include.ipv4 = 10.92.50.200

# IPv4 exclusion patterns: do not broadcast these ipv4 addresses
#jppf.discovery.exclude.ipv4 = 192.168.1.128-; 192.168.1.0/25

# IPv6 inclusion patterns: broadcast these ipv6 addresses
#jppf.discovery.include.ipv6 = 1080:0:0:0:8:800:200C-20FF:-; ::1/80

# IPv6 exclusion patterns: do not broadcast these ipv6 addresses
#jppf.discovery.exclude.ipv6 = 1080:0:0:0:8:800:200C-20FF:0C00-0EFF; ::1/64

#------------------------------------------------------------------------------#
# Connection with other servers, enabling P2P communication                    #
#------------------------------------------------------------------------------#

# Enable/disable auto-discovery of remote peer drivers. Default value is false
jppf.peer.discovery.enabled = true

# manual configuration of peer servers, as a space-separated list of peers names to connect to
#jppf.peers = server_1 server_2

# enable both automatic and manual discovery
#jppf.peers = jppf_discovery server_1 server_2

# connection to server_1
#jppf.peer.server_1.server.host = host_1
#jppf.peer.server_1.server.port = 11111
# connection to server_2
#jppf.peer.server_2.server.host = host_2
#jppf.peer.server_2.server.port = 11112

#------------------------------------------------------------------------------#
# Load-balancing configuration                                                 #
#------------------------------------------------------------------------------#

# name of the load-balancing algorithm to use; pre-defined possible values are:
# manual | autotuned | proportional | rl | nodethreads
# it can also be the name of a user-defined algorithm. Default value is "manual"
jppf.load.balancing.algorithm = nodethreads

# name of the set of parameter values (aka profile) to use for the algorithm
jppf.load.balancing.profile = nodethreads_profile

# "manual" profile
jppf.load.balancing.profile.manual_profile.size = 5

# "autotuned" profile
jppf.load.balancing.profile.autotuned_profile.size = 5
jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100
jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50
jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2
jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50
jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5
jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2

# "proportional" profile
jppf.load.balancing.profile.proportional_profile.size = 5
jppf.load.balancing.profile.proportional_profile.initialMeanTime = 1e10
jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 20
jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1

# "rl" profile
jppf.load.balancing.profile.rl_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl_profile.performanceVariationThreshold = 0.0001
jppf.load.balancing.profile.rl_profile.maxActionRange = 10

# "nodethreads" profile
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1

# "rl2" profile
jppf.load.balancing.profile.rl2_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl2_profile.performanceVariationThreshold = 0.75
jppf.load.balancing.profile.rl2_profile.minSamples = 20
jppf.load.balancing.profile.rl2_profile.maxSamples = 100
jppf.load.balancing.profile.rl2_profile.maxRelativeSize = 0.5

#------------------------------------------------------------------------------#
# Other JVM options added to the java command line when the driver is started  #
# as a subprocess. Multiple options are separated by spaces.                   #
#------------------------------------------------------------------------------#

#jppf.jvm.options = -Xmx256m -Djava.util.logging.config.file=config/logging-driver.properties

# example with remote debugging options
#jppf.jvm.options = -server -Xmx256m -Xrunjdwp:transport=dt_socket,address=localhost:8000,server=y,suspend=n

#------------------------------------------------------------------------------#
# path to the Java executable. When defined, it is used by the launch script   #
# (startDriver.bat or startDriver.sh) instead of the default Java path.        #
# It is undefined by default, meaning that the script will use the "java"      #
# command, relying on Java being in the system PATH.                           #
#------------------------------------------------------------------------------#

# linux/unix example
#jppf.java.path = /opt/java/jdk1.8.0_x64/bin/java
# windows example
#jppf.java.path = C:/java/jdk1.8.0_x64/bin/java.exe

#------------------------------------------------------------------------------#
# Specify alternate serialization schemes.                                     #
# Defaults to org.jppf.serialization.DefaultJavaSerialization.                 #
#------------------------------------------------------------------------------#

# default
#jppf.object.serialization.class = org.jppf.serialization.DefaultJavaSerialization

# built-in object serialization schemes
jppf.object.serialization.class = org.jppf.serialization.DefaultJPPFSerialization
#jppf.object.serialization.class = org.jppf.serialization.XstreamSerialization

# defined in the "Kryo Serialization" sample
#jppf.object.serialization.class = org.jppf.serialization.kryo.KryoSerialization

#------------------------------------------------------------------------------#
# Specify a data transformation class. If unspecified, no transformation occurs#
#------------------------------------------------------------------------------#

# Defined in the "Network Data Encryption" sample
#jppf.data.transform.class = org.jppf.example.dataencryption.SecureKeyCipherTransform

#------------------------------------------------------------------------------#
# whether to resolve the nodes' ip addresses into host names                   #
# defaults to true (resolve the addresses)                                     #
#------------------------------------------------------------------------------#

org.jppf.resolve.addresses = true

#------------------------------------------------------------------------------#
# Local (in-JVM) node. When enabled, any node-specific properties will apply   #
#------------------------------------------------------------------------------#

# Enable/disable the local node. Default is false (disabled)
jppf.local.node.enabled = true
jppf.local.node.bias = false
# example node-specific setting
#jppf.processing.threads = 2

#------------------------------------------------------------------------------#
# In idle mode configuration. In this mode the server or node starts when no   #
# mouse or keyboard activity has occurred since the specified timeout, and is  #
# stopped when any new activity occurs.                                        #
#------------------------------------------------------------------------------#

# Idle mode enabled/disabled. Default is false (disabled)
#jppf.idle.mode.enabled = false

# Fully qualified class name of the factory object that instantiates a platform-specific idle state detector
#jppf.idle.detector.factory = org.jppf.example.idlesystem.IdleTimeDetectorFactoryImpl

# Time of keyboard and mouse inactivity to consider the system idle, in milliseconds
# Default value is 300000 (5 minutes)
#jppf.idle.timeout = 6000

# Interval between 2 successive calls to the native APIs to determine idle state changes
# Default value is 1000
#jppf.idle.poll.interval = 1000

#------------------------------------------------------------------------------#
# Automatic recovery from hard failure of the nodes connections. These         #
# parameters configure how the driver reacts when a node fails to respond to   #
# its heartbeat messages.                                                      #
#------------------------------------------------------------------------------#

# Enable recovery from failures on the nodes. Default to false (disabled)
#jppf.recovery.enabled = false

# Max number of attempts to get a response from the node before the connection
# is considered broken. Default value is 3
#jppf.recovery.max.retries = 3

# Max time in milliseconds allowed for each attempt to get a response from the node.
# Default value is 6000 (6 seconds)
#jppf.recovery.read.timeout = 6000

# Dedicated port number for the detection of node failure. Defaults to 22222.
# If server discovery is enabled on the nodes, this value will override the port number specified in the nodes
#jppf.recovery.server.port = 22222

# Interval in milliseconds between two runs of the connection reaper
# Default value is 60000 (1 minute)
#jppf.recovery.reaper.run.interval = 60000

# Number of threads allocated to the reaper. Default to the number of available CPUs
#jppf.recovery.reaper.pool.size = 8

#------------------------------------------------------------------------------#
# Redirecting System.out and System.err to files.                              #
#------------------------------------------------------------------------------#

# file path on the file system where System.out is redirected.
# if unspecified or invalid, then no redirection occurs
#jppf.redirect.out = System.out.log
# whether to append to an existing file or to create a new one
jppf.redirect.out.append = false

# file path on the file system where System.err is redirected
# if unspecified or invalid, then no redirection occurs
#jppf.redirect.err = System.err.log
# whether to append to an existing file or to create a new one
jppf.redirect.err.append = false

#------------------------------------------------------------------------------#
# Global performance tuning parameters. These affect the performance and       #
# throughput of I/O operations in JPPF. The values provided in the vanilla     #
# JPPF distribution are known to offer a good performance in most situations   #
# and environments.                                                            #
#------------------------------------------------------------------------------#

# Size of send and receive buffer for socket connections.
# Defaults to 32768 and must be in range [1024, 1024*1024]
# 128 * 1024 = 131072
jppf.socket.buffer.size = 131072
# Size of temporary buffers (including direct buffers) used in I/O transfers.
# Defaults to 32768 and must be in range [1024, 1024*1024]
jppf.temp.buffer.size = 12288
# Maximum size of temporary buffers pool (excluding direct buffers). When this size
# is reached, new buffers are still created, but not released into the pool, so they
# can be quickly garbage-collected. The size of each buffer is defined with ${jppf.temp.buffer.size}
# Defaults to 10 and must be in range [1, 2048]
jppf.temp.buffer.pool.size = 200
# Size of temporary buffer pool for reading lengths as ints (size of each buffer is 4).
# Defaults to 100 and must be in range [1, 2048]
jppf.length.buffer.pool.size = 100

#------------------------------------------------------------------------------#
# Enabling or disabling the lookup of classpath resources in the file system   #
# Defaults to true (enabled)                                                   #
#------------------------------------------------------------------------------#

#jppf.classloader.file.lookup = true

#------------------------------------------------------------------------------#
# Timeout in millis for JMX requests. Defaults to Long.MAX_VALUE (2^63 - 1)    #
#------------------------------------------------------------------------------#

#jppf.jmx.request.timeout = $script{ java.lang.Long.MAX_VALUE }$



#--------------------------------- NODE CONFIGURATION -------------------------------------#

# JMX management port, defaults to 11198 (no SSL) or 11193 with SSL. If the port
# is already bound, the node will automatically scan for the next available port.
jppf.node.management.port = 12003


# time in seconds after which the system stops trying to reconnect
# A value of zero or less means the system never stops trying. Defaults to 60
jppf.reconnect.max.time = -1

jppf.reconnect.interval=5

#------------------------------------------------------------------------------#
# Processing Threads: number of threads running tasks in this node.            #
# default value is the number of available CPUs; uncomment to specify a        #
# different value. Blocking tasks might benefit from a number larger than CPUs #
#------------------------------------------------------------------------------#
jppf.processing.threads = 4

# JPPF class loader delegation model. values: parent | url, defaults to parent
jppf.classloader.delegation = parent

# size of the class loader cache in the node, defaults to 50
jppf.classloader.cache.size = 50

# class loader resource cache enabled? defaults to true.
# jppf.resource.cache.enabled = false

# resource cache's type of storage: either "file" (the default) or "memory"
jppf.resource.cache.storage = file

# Define a node as master. Defaults to true
jppf.node.provisioning.master = true
# Define a node as a slave. Defaults to false
jppf.node.provisioning.slave = false
# Specify the path prefix used for the root directory of each slave node
# defaults to "slave_nodes/node_", relative to the master root directory
jppf.node.provisioning.slave.path.prefix = slave_nodes/node_
# Specify the directory where slave-specific configuration files are located
# Defaults to the "config" folder, relative to the master root directory
#jppf.node.provisioning.slave.config.path = config
# A set of space-separated JVM options always added to the slave startup command
#jppf.node.provisioning.slave.jvm.options = -Dlog4j.configuration=config/log4j-node.properties
# Specify the number of slaves to launch upon master node startup. Defaults to 0
jppf.node.provisioning.startup.slaves = 0
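
As a rough check on whether the buffer settings above could explain gigabytes of growth, the pooled sizes can be multiplied out. This is a back-of-the-envelope sketch that relies only on what the configuration comments state (the pool keeps at most `jppf.temp.buffer.pool.size` buffers of `jppf.temp.buffer.size` bytes, and length buffers are 4 bytes each). It says nothing about direct buffers, socket buffers, or buffers created beyond the pool limit; the class name is hypothetical.

```java
// Estimate of the memory retained by the pooled buffers, using the values
// from the driver configuration in the post. Illustration only.
class BufferPoolEstimate {

    // Upper bound on the bytes held by a pool of fixed-size buffers.
    static long pooledBytes(long bufferSize, long poolSize) {
        return bufferSize * poolSize;
    }

    public static void main(String[] args) {
        // jppf.temp.buffer.size = 12288, jppf.temp.buffer.pool.size = 200
        long tempPool = pooledBytes(12288, 200);
        // length buffers are 4 bytes each, jppf.length.buffer.pool.size = 100
        long lengthPool = pooledBytes(4, 100);
        System.out.println("temp buffer pool:   " + tempPool + " bytes");
        System.out.println("length buffer pool: " + lengthPool + " bytes");
    }
}
```

Under these assumptions the pools retain about 2.4 MB and 400 bytes respectively, far below the gigabytes reported in the post.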

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2240
    • JPPF Web site
Re: Driver Memory Issue
« Reply #1 on: October 25, 2017, 08:10:00 PM »

Hello,

Could you post the full Java options you use when launching the JVM that runs the driver + local node? In particular, what value do you use for -Xmx?

Thanks,
-Laurent

arefaydi

Re: Driver Memory Issue
« Reply #2 on: October 26, 2017, 08:41:45 AM »

Hello,
I didn't set java_opts for the application, so I checked the default values with the command java -XX:+PrintFlagsFinal -version | grep HeapSize, and the MaxHeapSize result is 16888365056 bytes. The strange thing is that, according to the top command, the application process's memory usage is stable (10-15 GB), but the system's used memory grows up to 63 GB. When I kill the application, only 10-15 GB is returned to the system, and free memory shows 48-53 GB at that point. Again, this only occurs on the first peer, which acts as the active driver.
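
To compare the JVM's own view of its memory with what top reports, a small report like the one below could be logged from inside the application. This is a minimal sketch using only the standard `java.lang.management` API; the class and method names are hypothetical. Note that 16888365056 bytes is 16106 MiB, which would be consistent with HotSpot's usual default of roughly a quarter of physical memory when -Xmx is not set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Prints the JVM's heap limit and current heap / non-heap usage.
// Illustration only; names are hypothetical.
class HeapReport {

    // Converts a byte count to whole mebibytes for display.
    static String toMiB(long bytes) {
        return (bytes / (1024 * 1024)) + " MiB";
    }

    // Builds a short multi-line report from the standard memory MXBean.
    static String describe() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        return "max heap:           " + toMiB(Runtime.getRuntime().maxMemory()) + "\n"
             + "heap used:          " + toMiB(heap.getUsed()) + "\n"
             + "heap committed:     " + toMiB(heap.getCommitted()) + "\n"
             + "non-heap used:      " + toMiB(nonHeap.getUsed()) + "\n"
             + "non-heap committed: " + toMiB(nonHeap.getCommitted());
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // The MaxHeapSize value quoted in the post, converted for readability.
        System.out.println("MaxHeapSize from the post: " + toMiB(16888365056L));
    }
}
```

If the committed heap plus non-heap figures stay near what top shows for the process while system-wide used memory keeps climbing, the extra memory is probably not being held by the JVM itself.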

arefaydi

Re: Driver Memory Issue
« Reply #3 on: November 02, 2017, 02:11:54 PM »

Hello, and sorry.

Quote
it only occurs on first peer which act like active driver
That was wrong: it occurs on the other servers too, only later than on that server, maybe because of the difference in memory size.

I think it is related to the Linux kernel, because the process using the missing memory is not shown by the top command. A similar issue is described at https://superuser.com/questions/793192/invisible-memory-leak-on-linux-ubuntu-server-not-disk-cache-buffers. I will upgrade the kernel version and share the result as soon as possible. Sorry again for taking your time.
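
Memory held by the kernel (page cache, buffers, slab allocations) is not attributed to any process in top, but it is listed in /proc/meminfo on Linux. The following is a minimal sketch that parses that file and prints a few standard fields, which can help tell process memory apart from kernel-side usage. The class and method names are hypothetical, and the choice of fields to print is only a suggestion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reads /proc/meminfo (Linux only) and prints selected fields.
// Illustration only; names are hypothetical.
class MemInfo {

    // Parses lines such as "Slab:   123456 kB" into a map of field name -> numeric value.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> fields = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts[0].isEmpty()) continue;
            try {
                fields.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // Skip lines whose value is not a plain number.
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("/proc/meminfo is not available on this system");
            return;
        }
        Map<String, Long> fields = parse(Files.readAllLines(meminfo));
        for (String name : new String[] {"MemTotal", "MemFree", "Buffers", "Cached", "Slab"}) {
            System.out.println(name + ": " + fields.get(name) + " kB");
        }
    }
}
```

Comparing these values before and after stopping the application shows whether the "missing" memory sits in the cache, in slab, or elsewhere.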

arefaydi

Re: Driver Memory Issue
« Reply #4 on: November 08, 2017, 08:11:23 AM »

Hello,

Finally resolved: it was caused by Hibernate. I removed the cascades from the entity, and memory usage decreased from 10-15 GB to 1-2 GB; system memory usage is now normal. We have entities cascaded three levels deep, and even though we didn't need the cascade operations, Hibernate was keeping references and checking the child records for changes on every operation on the parent record. On each level there are 1000-1500 child records, and each task performs a lot of DB operations. I think that because memory was growing so fast (gigabytes within minutes), it was also causing memory fragmentation, so the memory error occurred even though memory was available. I'm really sorry for discussing this issue here and taking your time, since it was not related to JPPF.
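
The effect described above can be pictured with a toy model in plain Java. It does not use Hibernate and does not reproduce its internals; it only counts how many objects are touched by one operation on a parent when the traversal follows child references, compared with stopping at the parent. The class name, the tree shape, and the sizes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: compares the work of an operation that follows child
// references with one that stops at the parent. Not Hibernate code.
class CascadeCostModel {

    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Builds a tree with `levels` levels of children below the root
    // and `fanOut` children per node.
    static Node build(int levels, int fanOut) {
        Node node = new Node();
        if (levels > 0) {
            for (int i = 0; i < fanOut; i++) {
                node.children.add(build(levels - 1, fanOut));
            }
        }
        return node;
    }

    // Number of objects touched by one operation on `node`.
    static long touched(Node node, boolean followChildren) {
        long count = 1;
        if (followChildren) {
            for (Node child : node.children) {
                count += touched(child, true);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 2 levels of 100 children each.
        Node parent = build(2, 100);
        System.out.println("following children: " + touched(parent, true));
        System.out.println("parent only:        " + touched(parent, false));
    }
}
```

In this model the per-operation cost grows with the size of the whole object graph as soon as child references are followed, which is the kind of growth the post attributes to the unneeded cascades.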

lolo

Re: Driver Memory Issue
« Reply #5 on: November 08, 2017, 08:31:27 AM »

Hello,

Thank you very much for providing all this feedback. But mostly I'm glad that you could figure out this issue.
Thanks to you I just learned something about Hibernate, so our time was not wasted, and hopefully your sharing it in the community forums will help other developers as well.

Sincerely,
-Laurent