JPPF - The open source grid computing solution

Author Topic: Node and Server Connectivity  (Read 15362 times)

jbd

  • JPPF Knight
  • **
  • Posts: 19
Node and Server Connectivity
« on: July 28, 2011, 08:01:01 PM »

Hello,

First of all, thank you for creating JPPF; I have been looking for this type of framework for a while and it looks promising. It will play a major role in our system if everything works out.

I ran the sample on my local Windows 7 machine with no problems. When I tried to do the same on our AIX server, I had trouble getting the node to communicate with the server. The server and the node are running on the same machine at this time.

FYI, the AIX server shows 5 IPs.

-----------------------
Scenario 1 - No changes to the config (server and node).

When the server starts I get several messages, one for each of the IPs listed above: "Unable to bind to interface xx.xxx.xx.xxx on port 11111"

When the node starts I get the following (node log)

2011-07-28 08:43:07,865 [ERROR][org.jppf.comm.discovery.JPPFMulticastReceiver.run(267)]: The socket name is not available on this system.
java.net.SocketException: The socket name is not available on this system.
        at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:398)
        at java.net.MulticastSocket.setInterface(MulticastSocket.java:421)
        at org.jppf.comm.discovery.JPPFMulticastReceiver$Receiver.run(JPPFMulticastReceiver.java:233)
2011-07-28 08:43:07,875 [ERROR][org.jppf.comm.discovery.JPPFMulticastReceiver.run(267)]: The socket name is not available on this system.
java.net.SocketException: The socket name is not available on this system.
        at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:398)
        at java.net.MulticastSocket.setInterface(MulticastSocket.java:421)
        at org.jppf.comm.discovery.JPPFMulticastReceiver$Receiver.run(JPPFMulticastReceiver.java:233)
2011-07-28 08:43:07,865 [ERROR][org.jppf.comm.discovery.JPPFMulticastReceiver.run(267)]: The socket name is not available on this system.
java.net.SocketException: The socket name is not available on this system.
        at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:398)
        at java.net.MulticastSocket.setInterface(MulticastSocket.java:421)
        at org.jppf.comm.discovery.JPPFMulticastReceiver$Receiver.run(JPPFMulticastReceiver.java:233)
2011-07-28 08:43:07,887 [ERROR][org.jppf.comm.discovery.JPPFMulticastReceiver.run(267)]: The socket name is not available on this system.
java.net.SocketException: The socket name is not available on this system.
        at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:398)
        at java.net.MulticastSocket.setInterface(MulticastSocket.java:421)
        at org.jppf.comm.discovery.JPPFMulticastReceiver$Receiver.run(JPPFMulticastReceiver.java:233)
2011-07-28 08:43:07,895 [ERROR][org.jppf.comm.discovery.JPPFMulticastReceiver.run(267)]: The socket name is not available on this system.
2011-07-28 08:43:07,895 [ERROR][org.jppf.comm.discovery.JPPFMulticastReceiver.run(267)]: The socket name is not available on this system.
java.net.SocketException: The socket name is not available on this system.
        at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:398)
        at java.net.MulticastSocket.setInterface(MulticastSocket.java:421)
        at org.jppf.comm.discovery.JPPFMulticastReceiver$Receiver.run(JPPFMulticastReceiver.java:233)
2011-07-28 08:43:12,860 [DEBUG][org.jppf.comm.discovery.JPPFMulticastReceiver.receive(147)]: Auto-discovery of the driver connection information
: null
2011-07-28 08:43:12,862 [DEBUG][org.jppf.node.NodeRunner.discoverDriver(200)]: Could not auto-discover the driver connection information
2011-07-28 08:43:12,882 [DEBUG][org.jppf.classloader.JPPFClassLoader.init(101)]: initializing connection
2011-07-28 08:43:12,883 [DEBUG][org.jppf.classloader.JPPFClassLoader.initSocketClient(80)]: initializing socket connection
2011-07-28 08:43:12,890 [DEBUG][org.jppf.comm.socket.SocketInitializerImpl.initializeSocket(87)]: about to close socket wrapper
2011-07-28 08:43:12,922 [DEBUG][org.jppf.comm.socket.AbstractSocketWrapper.open(274)]: getReceiveBufferSize() = 67376
2011-07-28 08:43:12,937 [DEBUG][org.jppf.classloader.JPPFClassLoader.init(114)]: sending node initiation message
2011-07-28 08:43:13,040 [DEBUG][org.jppf.classloader.JPPFClassLoader.init(124)]: node initiation message sent, getting response
2011-07-28 08:43:13,193 [DEBUG][org.jppf.node.NodeRunner.main(136)]: received reconnection notification

Server displays

2011-07-28 08:43:13,112 [DEBUG][org.jppf.server.nio.StateTransitionManager.submitTransition(79)]: submitting transition for SelectionKeyWrapper[loopback:44885, readyOps=1, keyOps=1]
2011-07-28 08:43:13,161 [DEBUG][org.jppf.server.nio.classloader.DefiningChannelTypeState.performTransition(74)]: channel: SelectionKeyWrapper[loopback:44885, readyOps=1, keyOps=0] read resource [null] done
2011-07-28 08:43:13,162 [DEBUG][org.jppf.server.nio.classloader.DefiningChannelTypeState.performTransition(95)]: initiating node: SelectionKeyWrapper[loopback:44885, readyOps=1, keyOps=0]
2011-07-28 08:43:13,164 [DEBUG][org.jppf.server.nio.classloader.ClassNioServer.addNodeConnection(297)]: adding node connection: uuid=8EE2B5C462E16C52C8FA927A4F4A69E7, channel=SelectionKeyWrapper[loopback:44885, readyOps=1, keyOps=0]
2011-07-28 08:43:13,172 [DEBUG][org.jppf.server.nio.StateTransitionManager.transitionChannel(125)]: transitionned SelectionKeyWrapper[loopback:44885, readyOps=1, keyOps=5] from DEFINING_TYPE to SENDING_INITIAL_NODE_RESPONSE
2011-07-28 08:43:13,174 [DEBUG][org.jppf.server.nio.StateTransitionManager.submitTransition(79)]: submitting transition for SelectionKeyWrapper[loopback:44885, readyOps=5, keyOps=5]
2011-07-28 08:43:13,180 [DEBUG][org.jppf.server.nio.StateTransitionTask.run(87)]: node SelectionKeyWrapper[loopback:44885, readyOps=5, keyOps=0] has been disconnected
java.net.ConnectException: node SelectionKeyWrapper[loopback:44885, readyOps=5, keyOps=0] has been disconnected
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:64)
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:33)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:453)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:150)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920)
        at java.lang.Thread.run(Thread.java:736)
2011-07-28 08:43:13,187 [DEBUG][org.jppf.server.nio.classloader.ClassNioServer.removeNodeConnection(311)]: removing node connection: uuid=8EE2B5C462E16C52C8FA927A4F4A69E7

-----------------------

Scenario 2 - Server and Node (discovery = false, jppf.server.host = 10.117.80.30)

The server no longer shows the message "Unable to bind to interface xx.xxx.xx.xxx on port 11111".

The node displays

10:44:35,923  INFO NodeRunner:124 - starting node, uuid=8A19EE1F697BDE6CCB8D20B75FCB0028
10:44:36,003 DEBUG JPPFClassLoader:101 - initializing connection
10:44:36,005 DEBUG JPPFClassLoader:80 - initializing socket connection
Attempting connection to the class server at 10.117.80.30  :11111
10:44:36,012 DEBUG SocketInitializerImpl:87 - about to close socket wrapper
10:44:41,058 DEBUG SocketInitializerImpl:123 - SocketInitializer.initializeSocket(): Could not reconnect to the remote server

The server log does not show anything at all.

-----------------------

Hopefully, based on the logs, you can guide me to what I am missing.

Thanks in advance

Joseph Dela Cruz

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #1 on: July 30, 2011, 04:29:39 AM »

Hello Joseph,

Thanks for the detailed description of the issue you are facing.

Regarding your first scenario, with server discovery enabled, it seems that your system does not allow or support UDP multicast, and this is preventing the discovery from working.
Basically, the server will attempt to bind to all available network interfaces and, on each of them, broadcast a datagram packet containing its connection information on the configured discovery port.
Similarly, the node will listen on all available network interfaces for these packets to obtain that information.
Furthermore, when the node fails to discover a driver, it falls back to the server host configured in its configuration file, which leads to Scenario 2.
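As an aside, the discover-then-fall-back behavior described above can be sketched roughly as follows. This is a minimal illustration, not JPPF's actual code; the multicast group 230.0.0.1, the port, and the timeout are assumptions made for the sketch:

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

public class DiscoverySketch {
    // Listen briefly for a driver broadcast on the given multicast group/port.
    // If multicast is unsupported (as it appears to be on this AIX box) or
    // nothing is heard before the timeout, fall back to the statically
    // configured host.
    public static String discoverOrFallback(String group, int port, int timeoutMillis, String fallbackHost) {
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.setSoTimeout(timeoutMillis);
            socket.joinGroup(InetAddress.getByName(group));
            byte[] buffer = new byte[512];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet); // blocks until a broadcast arrives or the timeout fires
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            // e.g. SocketException ("The socket name is not available") or a timeout
            return fallbackHost;
        }
    }
}
```

With no driver broadcasting, the call simply returns the configured fallback host, which is exactly the transition from Scenario 1 to Scenario 2.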
Here I am noticing that you have the following message in the console: "Attempting connection to the class server at 10.117.80.30  :11111"
Note the space before the ":11111" part. I think you have a trailing space in the value you set for the configuration property "jppf.server.host". I believe this causes the node to treat the value as a host name rather than an IP address, which it is then unable to resolve. Could you remove this trailing space and let us know whether this allows the node to connect to the server?
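For what it's worth, java.util.Properties keeps trailing whitespace in a value (only whitespace before the value is stripped), so a trim() when reading the property avoids the problem. A small sketch; the class and method names here are mine, not JPPF's:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class HostProperty {
    // Properties.load() strips whitespace before a value but keeps trailing
    // whitespace, so "jppf.server.host = 10.117.80.30 " yields "10.117.80.30 ".
    public static String rawHost(String config) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(config));
            return props.getProperty("jppf.server.host");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Trimming the value yields a clean address the node can actually resolve.
    public static String cleanHost(String config) {
        return rawHost(config).trim();
    }
}
```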

Sincerely,
-Laurent

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #2 on: August 01, 2011, 05:39:52 PM »

Hi Laurent,

You are right, there was a trailing space in the jppf host property.  I removed the trailing space and the node is now connecting to the server.  However, I am getting the following errors now.

FYI, I am running scenario 2 with host = ip.

Node log:

08:31:45,775 DEBUG JPPFClassLoader:101 - initializing connection
08:31:45,777 DEBUG JPPFClassLoader:80 - initializing socket connection
Attempting connection to the class server at 10.117.80.30:11111
08:31:45,787 DEBUG SocketInitializerImpl:87 - about to close socket wrapper
08:31:45,818 DEBUG AbstractSocketWrapper:274 - getReceiveBufferSize() = 65700
08:31:45,823 DEBUG JPPFClassLoader:114 - sending node initiation message
08:31:45,912 DEBUG JPPFClassLoader:124 - node initiation message sent, getting response
08:31:46,020 DEBUG NodeRunner:136 - received reconnection notification
08:31:46,025 DEBUG JPPFClassLoader:101 - initializing connection
Attempting connection to the class server at 10.117.80.30:11111
08:31:46,067 DEBUG SocketInitializerImpl:87 - about to close socket wrapper
08:31:46,086 DEBUG AbstractSocketWrapper:274 - getReceiveBufferSize() = 65700
08:31:46,088 DEBUG JPPFClassLoader:114 - sending node initiation message
08:31:46,095 DEBUG JPPFClassLoader:124 - node initiation message sent, getting response
08:31:46,125 DEBUG NodeRunner:136 - received reconnection notification
08:31:46,126 DEBUG JPPFClassLoader:101 - initializing connection

The Server log

08:31:45,913 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[devserver1:38862, readyOps=1, keyOps=1]
08:31:45,973 DEBUG DefiningChannelTypeState:74 - channel: SelectionKeyWrapper[devserver1:38862, readyOps=1, keyOps=0] read resource [null] done
08:31:45,974 DEBUG DefiningChannelTypeState:95 - initiating node: SelectionKeyWrapper[devserver1:38862, readyOps=1, keyOps=0]
08:31:45,976 DEBUG ClassNioServer:297 - adding node connection: uuid=BC3A4C39CD28C83D653C7407A935480E, channel=SelectionKeyWrapper[devserver1:38862, readyOps=1, keyOps=0]
08:31:45,983 DEBUG StateTransitionManager:125 - transitionned SelectionKeyWrapper[devserver1:38862, readyOps=1, keyOps=5] from DEFINING_TYPE to SENDING_INITIAL_NODE_RESPONSE
08:31:45,985 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[devserver1:38862, readyOps=5, keyOps=5]
08:31:45,990 DEBUG StateTransitionTask:87 - node SelectionKeyWrapper[devserver1:38862, readyOps=5, keyOps=0] has been disconnected
java.net.ConnectException: node SelectionKeyWrapper[devserver1:38862, readyOps=5, keyOps=0] has been disconnected
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:64)
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:33)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:453)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:150)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920)
        at java.lang.Thread.run(Thread.java:736)
08:31:46,004 DEBUG ClassNioServer:311 - removing node connection: uuid=BC3A4C39CD28C83D653C7407A935480E
08:31:46,093 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[devserver1:38863, readyOps=1, keyOps=1]
08:31:46,102 DEBUG DefiningChannelTypeState:74 - channel: SelectionKeyWrapper[devserver1:38863, readyOps=1, keyOps=0] read resource [null] done
08:31:46,103 DEBUG DefiningChannelTypeState:95 - initiating node: SelectionKeyWrapper[devserver1:38863, readyOps=1, keyOps=0]
08:31:46,104 DEBUG ClassNioServer:297 - adding node connection: uuid=BC3A4C39CD28C83D653C7407A935480E, channel=SelectionKeyWrapper[devserver1:38863, readyOps=1, keyOps=0]
08:31:46,111 DEBUG StateTransitionManager:125 - transitionned SelectionKeyWrapper[devserver1:38863, readyOps=1, keyOps=5] from DEFINING_TYPE to SENDING_INITIAL_NODE_RESPONSE
08:31:46,113 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[devserver1:38863, readyOps=5, keyOps=5]
08:31:46,115 DEBUG StateTransitionTask:87 - node SelectionKeyWrapper[devserver1:38863, readyOps=5, keyOps=0] has been disconnected
java.net.ConnectException: node SelectionKeyWrapper[devserver1:38863, readyOps=5, keyOps=0] has been disconnected
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:64)
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:33)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:453)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:150)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920)
        at java.lang.Thread.run(Thread.java:736)
08:31:46,119 DEBUG ClassNioServer:311 - removing node connection: uuid=BC3A4C39CD28C83D653C7407A935480E


Thanks again,

Joseph

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #3 on: August 02, 2011, 07:34:01 AM »

Hi Joseph,

I think there is something I do not know or understand (or both) about sockets on your system.

The first thing I noticed from your logs is the following message: "AbstractSocketWrapper:274 - getReceiveBufferSize() = 65700"
This is surprising to me, since the JPPF code makes an explicit call to Socket.setReceiveBufferSize(SOCKET_RECEIVE_BUFFER_SIZE), where SOCKET_RECEIVE_BUFFER_SIZE is a constant with a value of 65536 (i.e. 64 KB). I know the Javadoc states that setReceiveBufferSize(...) only provides a hint for the size of the receive buffer, but this is still the first time I have seen such a resulting value. I suspect there may be some overriding configuration at the OS level; could you confirm this?
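As a side note, the hint semantics are easy to observe with a plain, unconnected socket: the OS is free to adjust the requested SO_RCVBUF value (65700 happens to be 45 × 1460, a common Ethernet MSS, so rounding to an MSS multiple is one plausible, but unconfirmed, explanation for the values seen here). A minimal sketch:

```java
import java.io.IOException;
import java.net.Socket;

public class BufferSizeCheck {
    // setReceiveBufferSize() is only a hint (SO_RCVBUF): request a size and
    // read back what the OS actually granted. The socket is never connected,
    // so no network I/O is performed.
    public static int grantedReceiveBuffer(int requested) throws IOException {
        try (Socket socket = new Socket()) {
            socket.setReceiveBufferSize(requested);
            return socket.getReceiveBufferSize();
        }
    }
}
```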

The logs show that something goes wrong during the handshake between the node and the server. The following lines in the node's log:
  08:31:45,823 DEBUG JPPFClassLoader:114 - sending node initiation message
  08:31:45,912 DEBUG JPPFClassLoader:124 - node initiation message sent, getting response

show that the node is able to send the initial handshake message, however this message:
  08:31:46,020 DEBUG NodeRunner:136 - received reconnection notification
indicates that an error occurred while waiting for the acknowledgement from the server.
To confirm this, the exception on the server side indicates the server found a problem with the network channel, just before sending the acknowledgement to the node.

I am suspecting this has something to do with the network configuration on your AIX machine.
Could you please try a simple test, by setting "jppf.server.host = localhost" or "jppf.server.host = 127.0.0.1" in the node configuration, and let us know if the outcome is different?
You also mentioned that your system shows 5 different network interfaces, each with its own IP address. Could you try with the others and tell us whether the behavior is the same as when using 10.117.80.30?
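For reference, the loopback test would amount to a node configuration along these lines (property names as I understand them from this thread; make sure there is no trailing whitespace after the values):

```properties
# JPPF node configuration for the loopback test
jppf.discovery.enabled = false
jppf.server.host = 127.0.0.1
# the class server port seen in the logs above (11111) is the default
```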

Thanks,
-Laurent

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #4 on: August 02, 2011, 09:15:42 PM »

Hi Laurent,

Here are the results.

------------------
node jppf.server.host = 127.0.0.1

Node.log

Attempting connection to the class server at 127.0.0.1:11111
10:19:14,700 DEBUG SocketInitializerImpl:87 - about to close socket wrapper
10:19:14,712 DEBUG AbstractSocketWrapper:274 - getReceiveBufferSize() = 67376
10:19:14,713 DEBUG JPPFClassLoader:114 - sending node initiation message
10:19:14,714 DEBUG JPPFClassLoader:124 - node initiation message sent, getting response
10:19:14,727 DEBUG NodeRunner:136 - received reconnection notification

Server.log

10:19:15,700 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[loopback:51676, readyOps=1, keyOps=1]
10:19:15,702 DEBUG DefiningChannelTypeState:74 - channel: SelectionKeyWrapper[loopback:51676, readyOps=1, keyOps=0] read resource [null] done
10:19:15,702 DEBUG DefiningChannelTypeState:95 - initiating node: SelectionKeyWrapper[loopback:51676, readyOps=1, keyOps=0]
10:19:15,702 DEBUG ClassNioServer:297 - adding node connection: uuid=A286D0E75FC94FC3F07459DDDD246F67, channel=SelectionKeyWrapper[loopback:51676, readyOps=1, keyOps=0]
10:19:15,703 DEBUG StateTransitionManager:125 - transitionned SelectionKeyWrapper[loopback:51676, readyOps=1, keyOps=5] from DEFINING_TYPE to SENDING_INITIAL_NODE_RESPONSE
10:19:15,704 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[loopback:51676, readyOps=5, keyOps=5]
10:19:15,710 DEBUG StateTransitionTask:87 - node SelectionKeyWrapper[loopback:51676, readyOps=5, keyOps=0] has been disconnected
java.net.ConnectException: node SelectionKeyWrapper[loopback:51676, readyOps=5, keyOps=0] has been disconnected
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:64)
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:33)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:453)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:150)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920)
        at java.lang.Thread.run(Thread.java:736)
10:19:15,710 DEBUG ClassNioServer:311 - removing node connection: uuid=A286D0E75FC94FC3F07459DDDD246F67

----------------------------

node jppf.server.host = localhost

Node.log

11:07:22,117 DEBUG JPPFClassLoader:101 - initializing connection
Attempting connection to the class server at localhost:11111
11:07:22,117 DEBUG SocketInitializerImpl:87 - about to close socket wrapper
11:07:22,129 DEBUG AbstractSocketWrapper:274 - getReceiveBufferSize() = 67376
11:07:22,130 DEBUG JPPFClassLoader:114 - sending node initiation message
11:07:22,132 DEBUG JPPFClassLoader:124 - node initiation message sent, getting response
11:07:22,166 DEBUG NodeRunner:136 - received reconnection notification

Server.log

11:07:22,325 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[loopback:54124, readyOps=1, keyOps=1]
11:07:22,356 DEBUG DefiningChannelTypeState:74 - channel: SelectionKeyWrapper[loopback:54124, readyOps=1, keyOps=0] read resource [null] done
11:07:22,357 DEBUG DefiningChannelTypeState:95 - initiating node: SelectionKeyWrapper[loopback:54124, readyOps=1, keyOps=0]
11:07:22,357 DEBUG ClassNioServer:297 - adding node connection: uuid=FC3A3E1DDD0A391C0C0E20B7D6D3D7FD, channel=SelectionKeyWrapper[loopback:54124, readyOps=1, keyOps=0]
11:07:22,358 DEBUG StateTransitionManager:125 - transitionned SelectionKeyWrapper[loopback:54124, readyOps=1, keyOps=5] from DEFINING_TYPE to SENDING_INITIAL_NODE_RESPONSE
11:07:22,363 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[loopback:54124, readyOps=5, keyOps=5]
11:07:22,364 DEBUG StateTransitionTask:87 - node SelectionKeyWrapper[loopback:54124, readyOps=5, keyOps=0] has been disconnected
java.net.ConnectException: node SelectionKeyWrapper[loopback:54124, readyOps=5, keyOps=0] has been disconnected
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:64)
        at org.jppf.server.nio.classloader.SendingNodeInitialResponseState.performTransition(SendingNodeInitialResponseState.java:33)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:453)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:150)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920)
        at java.lang.Thread.run(Thread.java:736)
11:07:22,364 DEBUG ClassNioServer:311 - removing node connection: uuid=FC3A3E1DDD0A391C0C0E20B7D6D3D7FD

--------------------------
I also tried the different IPs and am still getting the same results.

The only difference is in getReceiveBufferSize(): some IPs return 65700, others 67376. I am not that familiar with the network configuration, but if there are any particular OS settings you need, let me know.

Thanks again for taking the time.

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #5 on: August 03, 2011, 11:17:27 PM »

I also ran the node on a Windows 7 machine connecting to the AIX server (discovery is false). getReceiveBufferSize() stayed at 65536, but I am still getting the same error on the server.

FYI, I am using 64-bit Java 1.6.0 (IBM J9) on the server:

java version "1.6.0"
Java(TM) SE Runtime Environment (build pap6460sr8fp1-20100624_01(SR8 FP1))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 AIX ppc64-64 jvmap6460sr8ifx-20100609_59383 (JIT enabled, AOT enabled)
J9VM - 20100609_059383
JIT  - r9_20100401_15339ifx2
GC   - 20100308_AA)
JCL  - 20100624_01
 

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #6 on: August 12, 2011, 01:14:29 AM »

Hi Laurent,

I ran the following scenario and it worked.
   
   Server on a Linux box
   Node on the same AIX (discovery is false).

It seems that the JPPF server is not working properly when running on AIX. Unfortunately, AIX is the platform we need to use for this. Is there any test I can run to give you a better understanding?

Thanks 

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #7 on: August 17, 2011, 07:51:46 AM »

Hi Joseph,

Sorry for this late reply.
Unfortunately, I'm not familiar with AIX systems, and I'm puzzled by the behavior you are observing.
I did some research on the web, but did not find anything conclusive.
One thing I might suggest, would be to take a look at how your network interfaces are configured. Could you do this, according to the instructions in this article, and post the results?

There may also be some useful tips in this other article: http://www.ibm.com/developerworks/aix/library/au-aixnetworkproblem1/index.html?ca=drs-

I know that some of our users have successfully used JPPF on AIX, so I'm quite sure there is a way to resolve this.

Sincerely,
-Laurent

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #8 on: August 18, 2011, 01:47:08 AM »

Hi Laurent,

The following is the info I was able to gather. Unfortunately, I do not have root access, and the server I use is shared by many applications; I can only control the Java code.

ifconfig -a
en0: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.117.80.30 netmask 0xff000000 broadcast 10.255.255.255
        inet 10.117.80.208 netmask 0xffffff00 broadcast 10.117.80.255
        inet 10.117.80.170 netmask 0xffffff00 broadcast 10.117.80.255
        inet 10.117.80.172 netmask 0xffffff00 broadcast 10.117.80.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en1: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.117.135.13 netmask 0xfffffc00 broadcast 10.117.135.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1


lsattr -El en0

alias4        10.117.80.172,255.255.255.0 IPv4 Alias including Subnet Mask           True
alias4        10.117.80.170,255.255.255.0 IPv4 Alias including Subnet Mask           True
alias4        10.117.80.208,255.255.255.0 IPv4 Alias including Subnet Mask           True
alias6                                    IPv6 Alias including Prefix Length         True
arp           on                          Address Resolution Protocol (ARP)          True
authority                                 Authorized Users                           True
broadcast                                 Broadcast Address                          True
mtu           1500                        Maximum IP Packet Size for This Device     True
netaddr       10.117.80.30                Internet Address                           True
netaddr6                                  IPv6 Internet Address                      True
netmask                                   Subnet Mask                                True
prefixlen                                 Prefix Length for IPv6 Internet Address    True
remmtu        576                         Maximum IP Packet Size for REMOTE Networks True
rfc1323                                   Enable/Disable TCP RFC 1323 Window Scaling True
security      none                        Security Level                             True
state         up                          Current Interface Status                   True
tcp_mssdflt                               Set TCP Maximum Segment Size               True
tcp_nodelay                               Enable/Disable TCP_NODELAY Option          True
tcp_recvspace                             Set Socket Buffer Space for Receiving      True
tcp_sendspace                             Set Socket Buffer Space for Sending        True

lsattr -El en1
alias4                      IPv4 Alias including Subnet Mask           True
alias6                      IPv6 Alias including Prefix Length         True
arp           on            Address Resolution Protocol (ARP)          True
authority                   Authorized Users                           True
broadcast                   Broadcast Address                          True
mtu           1500          Maximum IP Packet Size for This Device     True
netaddr       10.117.135.13  Internet Address                           True
netaddr6                    IPv6 Internet Address                      True
netmask       255.255.252.0 Subnet Mask                                True
prefixlen                   Prefix Length for IPv6 Internet Address    True
remmtu        576           Maximum IP Packet Size for REMOTE Networks True
rfc1323                     Enable/Disable TCP RFC 1323 Window Scaling True
security      none          Security Level                             True
state         up            Current Interface Status                   True
tcp_mssdflt                 Set TCP Maximum Segment Size               True
tcp_nodelay                 Enable/Disable TCP_NODELAY Option          True
tcp_recvspace               Set Socket Buffer Space for Receiving      True
tcp_sendspace               Set Socket Buffer Space for Sending        True

Simple NIO server and client

I wanted to test plain Java NIO and found this article: http://rox-xmlrpc.sourceforge.net/niotut/. I tested with the server running on AIX and the client running on both AIX and Windows 7, and it was successful. I know JPPF is far more complex than this example, but is there anything here that might give us a clue about why we are getting the errors?
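For anyone wanting to reproduce that sanity check without the tutorial, a stripped-down version of the experiment fits in one class. It is a blocking loopback echo rather than the tutorial's selector-based server, but it exercises the same connect/read/write path:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class LoopbackEcho {
    // Open a server on an ephemeral loopback port, connect a client to it,
    // send a message, echo it back, and return what the client receives.
    public static String echoOnce(String message) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0 = ephemeral
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            try (SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
                 SocketChannel peer = server.accept()) {
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                peer.read(buffer);   // read the client's message
                buffer.flip();
                peer.write(buffer);  // echo it back
                ByteBuffer reply = ByteBuffer.allocate(1024);
                client.read(reply);
                reply.flip();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        }
    }
}
```

If this round-trip succeeds on AIX (as Joseph's test did), plain TCP over loopback is fine, which narrows the problem to something more specific in the driver's channel handling.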

Thanks for your help again,

Joseph

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #9 on: August 21, 2011, 11:28:51 AM »

Hello Joseph,

I still have no clue why you are facing this error. However, I believe I have a workaround, which I'd like you to test on your side.
Since the ConnectException is thrown from within a check that the JPPF code performs, I added the possibility to disable this check.
You will need to get a modified jppf-server.jar, which you can get from this location.
To use it:
- replace the jppf-server.jar in your JPPF-x.y.z-driver/lib with the new one provided in the zip file
- in the driver's configuration file (JPPF-x.y.z-driver/config/jppf-driver.properties), add the following property: jppf.nio.check.connection = false
The zip also includes the modified source code, in case you need it for debugging.

Could you please try this out and let us know if it works for you? Upon your confirmation, I will include this workaround in the next maintenance release (JPPF 2.5.3).

Sincerely,
-Laurent

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #10 on: August 23, 2011, 12:10:13 AM »

Hi Laurent,

It finally worked, and I am now able to move on. Again, great framework; I'm looking forward to building applications with JPPF.

Thanks again for your help.  I hope this did not break your code a whole lot.

--Joseph


jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #11 on: November 08, 2011, 12:29:50 AM »

Hi Laurent,

I was finally able to get back to this. I downloaded the latest version (jppf-2.5.3) and configured jppf.nio.check.connection = false. I did some testing by submitting a job; the job executed successfully, with the following log:


[java] 15:17:22,191 DEBUG StateTransitionManager:125 - transitionned SelectionKeyWrapper[SANDLFTXR87RYF4.ES.AD.ADP.com:59373, readyOps=1, keyOps=1] from WAITING_RESULTS to IDLE
[java] 15:17:22,191 DEBUG ApplicationResultSender:68 - Sending bundle for job '6c0dd2b3ae2b1c9063c6689c1b19fbd8' with 1 tasks, exception parameter = null
[java] 15:17:22,195 DEBUG JPPFScheduleHandler:131 - Job Schedule Handler : cancelling action for key=6c0dd2b3ae2b1c9063c6689c1b19fbd8, task=null
[java] 15:17:22,195 DEBUG JPPFScheduleHandler:131 - Job Expiration Handler : cancelling action for key=6c0dd2b3ae2b1c9063c6689c1b19fbd8, task=null
[java] 15:17:22,195 DEBUG JPPFJobManager:167 - jobId '6c0dd2b3ae2b1c9063c6689c1b19fbd8' ended
[java] 15:17:22,198 DEBUG ApplicationConnection:118 - before reading header
[java] 15:17:22,198 DEBUG DriverJobManagement:305 - sending event JOB_ENDED for job 6c0dd2b3ae2b1c9063c6689c1b19fbd8

Then, when the client's close is called, the driver continuously logs the following. When I set jppf.nio.check.connection = true, the driver closes the connection gracefully. I think that, since the check is disabled, the driver fails to recognize that the connection has been closed.

[java] 15:17:25,923 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:25,925 DEBUG JPPFConnection:100 - Connection reset
[java] java.net.SocketException: Connection reset
[java]     at java.net.SocketInputStream.read(SocketInputStream.java:168)
[java]     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
[java]     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
[java]     at java.io.DataInputStream.readInt(DataInputStream.java:370)
[java]     at org.jppf.comm.socket.AbstractSocketWrapper.readInt(AbstractSocketWrapper.java:248)
[java]     at org.jppf.io.SocketWrapperInputSource.readInt(SocketWrapperInputSource.java:87)
[java]     at org.jppf.io.IOHelper.readData(IOHelper.java:94)
[java]     at org.jppf.server.app.ApplicationConnection.perform(ApplicationConnection.java:120)
[java]     at org.jppf.server.app.JPPFConnection.run(JPPFConnection.java:95)
[java] 15:17:25,926 DEBUG ApplicationConnection:186 - closing application connection : 127.0.0.1:59375
[java] 15:17:25,926 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:25,928 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:25,929 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:25,929 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:25,929 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:26,009 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:26,010 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:26,010 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
[java] 15:17:26,011 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[127.0.0.1:59374, readyOps=1, keyOps=1]
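For context on what that check involves: at the NIO level, a peer that has closed its end of a connection only becomes visible when a read returns end-of-stream (-1). Here is a minimal, self-contained illustration using plain JDK channels (my own sketch of the general mechanism, not JPPF's actual code):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class ConnectionCheckDemo {

    /**
     * Returns true if the channel still looks alive: a non-blocking read that
     * yields -1 means the peer has closed the connection. This mirrors, only
     * conceptually, the kind of check that "jppf.nio.check.connection" enables.
     */
    static boolean isAlive(SocketChannel channel) throws Exception {
        ByteBuffer probe = ByteBuffer.allocate(1);
        channel.configureBlocking(false);
        return channel.read(probe) != -1;  // -1 == end-of-stream == peer closed
    }

    public static void main(String[] args) throws Exception {
        // A loopback server/client pair standing in for driver/client sockets.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        SocketChannel accepted = server.accept();

        System.out.println("alive before close: " + isAlive(accepted));
        client.close();
        Thread.sleep(200);  // let the FIN propagate on loopback
        System.out.println("alive after close:  " + isAlive(accepted));

        accepted.close();
        server.close();
    }
}
```

With the check disabled, the driver presumably never performs such a probing read on an idle channel, which would explain why a closed connection goes unnoticed.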

Thanks again in advance,
Joseph
Logged

jandam

  • JPPF Master
  • ***
  • Posts: 38
Re: Node and Server Connectivity
« Reply #12 on: November 15, 2011, 03:26:24 PM »

Hello,

  I just filed bug 3438303 - Driver doesn't recognize that client connection was closed: https://sourceforge.net/tracker/?func=detail&aid=3438303&group_id=135654&atid=733518

  Martin
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #13 on: November 22, 2011, 06:11:18 AM »

Hello Joseph,

I have implemented a fix for this, which is now part of the JPPF 2.5.4 distribution files. If you had downloaded v2.5.4 before yesterday, you will need to download them again.
Unfortunately, I could not cover all situations in the fix. It will not work if you close the client before sending any job, because the fix makes the first job provide additional information used to correlate it with the corresponding NIO channel, so the server knows which one to close. It will also not work with the upcoming v3.0, which will use pure NIO-based network communication.

Please remember that the checks disabled with "jppf.nio.check.connection = false" are those that allow JPPF to detect broken connections.

Additionally, I could also verify that the same issue occurs with connections to the nodes. To work around this, you can configure the recovery from hardware failure, using values such that it doesn't take too long to detect a broken connection.
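For reference, the recovery mechanism is enabled through configuration properties on both the driver and the nodes; a sketch of what this looks like (the property names and values below are from my reading of the configuration guide and should be verified against your JPPF version):

```properties
# driver configuration (jppf-driver.properties)
jppf.recovery.enabled = true
# number of failed pings before a connection is considered broken,
# and the per-ping timeout in milliseconds
jppf.recovery.max.retries = 2
jppf.recovery.read.timeout = 6000

# node configuration (jppf-node.properties)
jppf.recovery.enabled = true
```

Lower values detect a broken connection faster, at the cost of more heartbeat traffic and a higher risk of false positives on a slow network.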

I hope this helps.

Sincerely,
-Laurent
Logged

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #14 on: November 28, 2011, 08:45:28 PM »

Hi Laurent,

Thanks for the reply.

In my scenario, do you think it would make a difference if I used v3.0? Do you have a beta, so I can check whether our system behaves better with jppf.nio.check.connection = true? If it works, I will just wait for v3.0.


Thanks,

Joseph
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #15 on: November 29, 2011, 09:08:02 AM »

Hello Joseph,

Unfortunately, I don't think it will work any better with 3.0  :( There is no fundamental change in the JPPF communication model, so I think the same behavior will occur.
Nonetheless, if you wish to give it a try, you can get the latest beta build from this location: http://sourceforge.net/projects/jppf-project/files/jppf-project/jppf%203.0%20beta/
Please note that the documentation is not up to date, only the configuration guide has been updated. You can access the doc there: http://www.jppf.org/doc/v3

In the meantime, I believe I came up with a much better fix in 2.5.4. Would you mind giving it a try?
To do so, just download the jppf-server.jar file from this location: http://www.jppf.org/private/2.5.4/jppf-server.jar
The corresponding source jar is: http://www.jppf.org/private/2.5.4/jppf-server-src.jar
Then replace the one you have in your server's /lib folder with the new one, and restart the server.
This fixes the infinite loop issue for both client and node connections when "jppf.nio.check.connection = false". The one drawback is that node disconnections won't be detected immediately, but only when the server attempts to send tasks for execution. These tasks will be resubmitted to another node, which may cause a very slight overhead.

Can you try this and let us know of the results?

Thanks,
-Laurent
Logged

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #16 on: December 05, 2011, 06:51:38 PM »

Hi Laurent,

I tried the 2.5.4 version. The server now seems to be closing the connections. However, when I stop the node and restart it, the server gets the following error:


09:30:13,479 DEBUG StateTransitionManager:79 - submitting transition for SelectionKeyWrapper[sandlftxr87ryf4.ca.adp.com:63579, readyOps=4, keyOps=4]
09:30:13,479 DEBUG SendingProviderRequestState:87 - provider SelectionKeyWrapper[sandlftxr87ryf4.ca.adp.com:63579, readyOps=4, keyOps=0] serving new resource request [org/jppf/utils/ObjectSerializerImpl.class] from node: SelectionKeyWrapper[loopback:38860, readyOps=1, keyOps=0]
09:30:13,480 DEBUG StateTransitionTask:87 - There is no process to read data written to a pipe.
java.io.IOException: There is no process to read data written to a pipe.
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.jppf.utils.SerializationUtils.writeInt(SerializationUtils.java:158)
        at org.jppf.server.nio.SimpleNioContext.writeMessage(SimpleNioContext.java:79)
        at org.jppf.server.nio.classloader.SendingProviderRequestState.performTransition(SendingProviderRequestState.java:97)
        at org.jppf.server.nio.classloader.SendingProviderRequestState.performTransition(SendingProviderRequestState.java:33)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:83)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:453)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:315)
        at java.util.concurrent.FutureTask.run(FutureTask.java:150)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920)
        at java.lang.Thread.run(Thread.java:736)
09:30:13,481 DEBUG ClassNioServer:331 - closing channel SelectionKeyWrapper[sandlftxr87ryf4.ca.adp.com:63579, readyOps=4, keyOps=0]

The node does not get the job/task (I waited for hours). The server and node seem to be communicating after the restart, because I see log messages exchanged between them. Another thing: the server shows a queued task count of 1, even though I've sent multiple requests; shouldn't the queue grow?

Thanks again,

Joseph
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #17 on: December 08, 2011, 08:04:05 AM »

Hi Joseph,

Thanks for the information. I am still unable to reproduce on my side.
Could you tell us what is happening at the time you kill the node? In particular, is the node executing a task at that time, or is it idle?
If I remember correctly, you are running on AIX with an IBM JVM. I am therefore downloading an IBM JVM to test with, and will run my tests in a Linux environment, which is the closest I can get.
I will let you know what I find.

Regarding the number of tasks in the queue shown by the admin console, one possibility is that only one job was sent to the server and the others are still waiting on the client side.
Did you configure a connection pool for your client?

Sincerely,
-Laurent
Logged

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #18 on: December 09, 2011, 06:59:26 PM »

Hi Laurent,

The node is in idle state when I kill it.  I did not configure anything on the client, except for the server the client will connect to.

Upon further testing, it seems this problem occurs when I use a JPPFClient with a fixed uuid; without the uuid, it works fine. BTW, the test I am doing is a standalone client sending classes to the node, via the server, to print hello world (one run per invocation).

I think I need to explain a little of what I am trying to do, and maybe you can guide me on the best implementation strategy. I want to build a generic master/servant framework, in which the master receives a request, identifies which cluster to use, and routes the request to a servant. The requests are dynamic: a node may receive requests to list a directory, encrypt/decrypt a file, transfer a file, etc. In addition, nodes may reside locally or remotely from the driver. My question is: since nodes receive different types of classes, some of them huge, what is the most efficient way to implement this? I was trying to use a fixed uuid so that nodes would not have to reload classes when they receive the same type of request. I am not sure, though, how JPPF would react in terms of the number of classes it can hold when using a fixed uuid.
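To make this concrete, the kind of command abstraction I have in mind looks something like this (the names are just illustrative, not JPPF API):

```java
import java.io.Serializable;
import java.util.Map;

// One serializable command type per request kind, so each node works against
// a small, stable interface while the concrete classes (and their
// dependencies) are loaded on demand through the distributed class loader.
interface ServantCommand extends Serializable {
    Object execute(Map<String, Object> params) throws Exception;
}

// One concrete command the master could route to a servant node.
class ListDirectoryCommand implements ServantCommand {
    public Object execute(Map<String, Object> params) {
        java.io.File dir = new java.io.File((String) params.get("path"));
        return dir.list();  // file names, or null if the path is not a directory
    }
}
```

Each concrete command class would be loaded by a node the first time it sees that request type, which is why I was hoping a fixed client uuid would let the nodes reuse already-loaded classes across invocations.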

Thanks again,

Joseph

 
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #19 on: December 14, 2011, 07:49:09 AM »

Hi Joseph,

First, I'd like to make sure we're both on the same page with regard to what you call huge classes. If these are classes with lots of attributes, methods and code, then they are big classes. If we're talking about objects that reference a lot of data from their attributes, then they are big objects, and their size has no impact on class loading. Could you clarify which it is?

In any case, the amount of classes that can be loaded is mostly determined by the JVM.
A JPPF node maintains a class loader cache. You can tune the parameter that specifies the maximum number of class loaders in the cache, to find an acceptable balance between the memory footprint of the classes and the frequency of reloading.
Please note also that the server itself maintains a soft cache of resources and classes loaded from the client(s), so that classes loaded by one node can be loaded faster by other nodes. This server cache is made of soft references only, which will be garbage collected when memory becomes scarce.
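For example, in the node configuration this would look like the following (I'm quoting the property name from memory, so please double-check it against the configuration guide for your version):

```properties
# node configuration: maximum number of client class loaders kept in the cache
jppf.classloader.cache.size = 50
```

A larger cache avoids reloading classes when the same client uuid comes back, at the cost of keeping more classes in the node's memory.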

Now, I still do not get why this is not working when using a fixed uuid for your client.
What I just realized, reading the latest stack trace you provided, is that the exception concerns the connection between the server and the client.
Indeed, you can see the following frame: at org.jppf.server.nio.classloader.SendingProviderRequestState.performTransition(), which indicates the server is sending a request to the client, asking for the bytecode of a class.

What I suspect is that, with NIO checks disabled, the client's class loader connection to the server is not released, even after the client terminates. The server may then attempt to reuse it, because it bears the same client uuid, since your client uuid is always the same. In this scenario, the attempt to reuse a broken connection leads to the exception you observed.
Also, this would only happen if the client creates more than one connection to the server. In that case, the exception would only occur for unused connections: when NIO checks are disabled, the server cannot detect that they are broken if they were never used to send at least one job.

To help confirm this, is there any way to check whether you have set a connection pool in your client configuration? It would be configured via either "jppf.pool.size = N" or "<your_driver>.jppf.pool.size = N" with N > 1. Could you post your client configuration?

Thanks,
-Laurent

Logged

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #20 on: December 14, 2011, 08:12:28 PM »

Hi Laurent,

What I meant by huge is my business logic plus its jar dependencies, not the data. Currently, one type of request to the server/node may require around 6 MB. I'm not sure if you consider that huge, but I think it will slow down processing due to class loading, especially when invoked remotely.

I am not using any connection pool. The client is standalone, and the only thing it does is start the client (a new JVM), send the request, wait for it to finish, then die. I used this test for both scenarios (with and without the uuid). Below is my client configuration:


#------------------------------------------------------------------------------#
# JPPF.                                                                        #
# Copyright (C) 2005-2011 JPPF Team.                                           #
# http://www.jppf.org                                                          #
#                                                                              #
# Licensed under the Apache License, Version 2.0 (the "License");              #
# you may not use this file except in compliance with the License.             #
# You may obtain a copy of the License at                                      #
#                                                                              #
#     http://www.apache.org/licenses/LICENSE-2.0                                #
#                                                                              #
# Unless required by applicable law or agreed to in writing, software          #
# distributed under the License is distributed on an "AS IS" BASIS,            #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.     #
# See the License for the specific language governing permissions and          #
# limitations under the License.                                               #
#------------------------------------------------------------------------------#


#------------------------------------------------------------------------------#
# List of drivers this client may connect to.                                  #
# If auto discovery of the server is enabled, this needs not be specified.     #
#------------------------------------------------------------------------------#

jppf.drivers = driver1

#------------------------------------------------------------------------------#
# Host name, or ip address, of the host the JPPF driver is running on          #
# If auto discovery of the server is enabled, this needs not be specified.     #
#------------------------------------------------------------------------------#

driver1.jppf.server.host = sanddap1.ga.adp.com
#driver1.jppf.server.host = localhost

#------------------------------------------------------------------------------#
# port number for the class server that performs remote class loading          #
# default value is 11111; uncomment to specify a different value               #
# If auto discovery of the server is enabled, this needs not be specified.     #
#------------------------------------------------------------------------------#

driver1.class.server.port = 11111

#------------------------------------------------------------------------------#
# port number the clients / applications connect to                            #
# default value is 11112; uncomment to specify a different value               #
# If auto discovery of the server is enabled, this needs not be specified.     #
#------------------------------------------------------------------------------#

driver1.app.server.port = 11112

#------------------------------------------------------------------------------#
# JMX management port of the driver                                            #
# default value is 11198; uncomment to specify a different value               #
# If auto discovery of the server is enabled, this needs not be specified.     #
#------------------------------------------------------------------------------#

#jppf.management.port = 11098
#jppf.management.enabled = false

#------------------------------------------------------------------------------#
# Priority given to the driver                                                 #
# The client is always connected to the available driver(s) with the highest   #
# priority. If multiple drivers have the same priority, they will be used as a #
# pool and tasks will be evenly distributed among them.                        #
# default value is 0; uncomment to specify a different value                   #
#------------------------------------------------------------------------------#

driver1.priority = 10
#driver1.jppf.pool.size = 10

driver2.jppf.server.host = localhost
driver2.class.server.port = 11111
driver2.app.server.port = 11112
#driver2.priority = 10

#------------------------------------------------------------------------------#
# Maximum time in milliseconds spent trying to initialize at least one         #
# connection, before releasing control to the main application thread.         #
# default value is 1000 (1 second); uncomment to specify a different value     #
#------------------------------------------------------------------------------#

#jppf.client.max.init.time = 1000

#------------------------------------------------------------------------------#
# Automatic recovery: number of seconds before the first reconnection attempt. #
# default value is 1; uncomment to specify a different value                   #
#------------------------------------------------------------------------------#

#reconnect.initial.delay = 1

#------------------------------------------------------------------------------#
# Automatic recovery: time after which the system stops trying to reconnect,   #
# in seconds. A value of zero or less means the system never stops trying.     #
# default value is 60; uncomment to specify a different value                  #
#------------------------------------------------------------------------------#

reconnect.max.time = -1

#------------------------------------------------------------------------------#
# Automatic recovery: time between two connection attempts, in seconds.        #
# default value is 1; uncomment to specify a different value                   #
#------------------------------------------------------------------------------#

#reconnect.interval = 1

#------------------------------------------------------------------------------#
#  Enable local execution of tasks? Default value is false                     #
#------------------------------------------------------------------------------#

#jppf.local.execution.enabled = true

#------------------------------------------------------------------------------#
# Number of threads to use for local execution                                 #
# The default value is the number of CPUs available to the JVM                 #
#------------------------------------------------------------------------------#

#jppf.local.execution.threads = 4

#------------------------------------------------------------------------------#
# Maximum time to wait before notifying of available local execution results   #
# The default value is Long.MAX_VALUE                                          #
#------------------------------------------------------------------------------#

#jppf.local.execution.accumulation.time = 5000000

#------------------------------------------------------------------------------#
# Unit in which the accumulation time is expressed. Possible values:           #
# n = nanos | m = millis | s = seconds | M = minutes | h = hours | d = days    #
# The default value is n (nanos)                                               #
#------------------------------------------------------------------------------#

#jppf.local.execution.accumulation.time.unit = n

#------------------------------------------------------------------------------#
# Maximum number of available local execution results before sending a         #
# notification. The default value is Integer.MAX_VALUE                         #
#------------------------------------------------------------------------------#

#jppf.local.execution.accumulation.size = 100

#------------------------------------------------------------------------------#
# Enable/Disable automatic discovery of JPPF drivers.                          #
# default value is true; uncomment to specify a different value                #
#------------------------------------------------------------------------------#

#jppf.discovery.enabled = true
jppf.discovery.enabled = false

#------------------------------------------------------------------------------#
# UDP multicast group to which drivers broadcast their connection parameters   #
# and to which clients and nodes listen. Default value is 230.0.0.1            #
#------------------------------------------------------------------------------#

#jppf.discovery.group = 230.0.0.1

#------------------------------------------------------------------------------#
# UDP multicast port to which drivers broadcast their connection parameters    #
# and to which clients and nodes listen. Default value is 11111                #
#------------------------------------------------------------------------------#

#jppf.discovery.port = 11111

#------------------------------------------------------------------------------#
# Size of the connection pool for each discovered driver; default is 1         #
#------------------------------------------------------------------------------#

#jppf.pool.size = 1

#------------------------------------------------------------------------------#
# IPV4 address patterns included in the server discovery mechanism             #
#------------------------------------------------------------------------------#

#jppf.discovery.ipv4.include = 192.168.1.

#------------------------------------------------------------------------------#
# IPV4 address patterns excluded from the server discovery mechanism           #
#------------------------------------------------------------------------------#

#jppf.discovery.ipv4.exclude = 192.168.1.-9; 192.168.1.100-

#------------------------------------------------------------------------------#
# IPV6 address patterns included in the server discovery mechanism             #
#------------------------------------------------------------------------------#

#jppf.discovery.ipv6.include = 1080:0:0:0:8:800:200C-20FF:-

#------------------------------------------------------------------------------#
# IPV6 address patterns excluded from the server discovery mechanism           #
#------------------------------------------------------------------------------#

#jppf.discovery.ipv6.exclude = 1080:0:0:0:8:800:200C-20FF:0C00-0EFF

#------------------------------------------------------------------------------#
# Fully qualified name of a data transform class; allows encryption and        #
# decryption of JPPF networked data; default is no transformation. See         #
# http://www.jppf.org/wiki/index.php?title=Transforming_and_encrypting_networked_data #
#------------------------------------------------------------------------------#

#jppf.data.transform.class = org.jppf.example.dataencryption.SecureKeyCipherTransform

#------------------------------------------------------------------------------#
# Objects streams factories for alternate serialization frameworks.            #
#  Default is org.jppf.serialization.JPPFObjectStreamBuilderImpl               #
#------------------------------------------------------------------------------#

# built-in serialization scheme that does not care if classes are Serializable
#jppf.object.stream.builder = org.jppf.serialization.GenericObjectStreamBuilder

# built-in serialization factory based on XStream - requires downloading the XStream libraries first
#jppf.object.stream.builder = org.jppf.serialization.XstreamObjectStreamBuilder

#------------------------------------------------------------------------------#
# Specify alternate object stream classes for serialization.                   #
# Defaults to java.io.ObjectInputStream and java.io.ObjectOutputStream.        #
#------------------------------------------------------------------------------#

# defaults
#jppf.object.input.stream.class = java.io.ObjectInputStream
#jppf.object.output.stream.class = java.io.ObjectOutputStream

# built-in object streams
#jppf.object.input.stream.class = org.jppf.serialization.JPPFObjectInputStream
#jppf.object.output.stream.class = org.jppf.serialization.JPPFObjectOutputStream

Another question: what would be the optimal load-balancing strategy, given that the types of tasks vary?

Thanks,

Joseph


Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #21 on: December 16, 2011, 07:56:06 AM »

Hi Joseph,

I was finally able to reproduce the same exception you posted earlier. This only happens when:
1. using always the same uuid for the JPPF client
2. using an IBM JVM
3. a client is created then closed without submitting a job

Creating and closing a JPPF client without submitting a job creates a dangling connection in the server. Since the client uuid it is associated with is always the same, the server later reuses the connection to request classes from the client, and that's when the exception is thrown. The fact that job processing freezes at this point is due to bad error handling in our code.
Interestingly, with the Oracle JVM a different exception is thrown, later in the processing. This is worse in a sense, because the exception should be thrown when sending a class loading request, rather than when receiving the response.

I will work on a fix for this and publish a patch, which may take some time. In the meantime, could you check whether point 3 above could occur in your code? In that case, I would suggest lazy initialization of the JPPF client when submitting a job, such as:

Code:
public class MyJobSubmitter {

  private static JPPFClient jppfClient = null;

  public static List<JPPFTask> submitJob(JPPFJob job) {
    // lazily create the client upon the first submission, so that a client
    // is never created without at least one job being submitted through it
    synchronized(MyJobSubmitter.class) {
      if (jppfClient == null) jppfClient = new JPPFClient("some uuid");
    }
    return jppfClient.submit(job);
  }
}

Regarding load balancing, I believe the built-in adaptive algorithms ("proportional", "autotuned" and "rl") may not work very well when the tasks vary a lot in computational weight.
Thus I see 2 possibilities:
- use a round-robin-like algorithm, such as the "manual" algorithm, to send a fixed number of tasks to each node
- write a more optimized, custom algorithm, which would be job-aware and could use annotations that you add to your jobs as metadata. We also have a sample which covers this kind of custom load balancer, which you can find here.
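As an illustration, the "manual" algorithm is set in the driver's configuration file; with the 2.x property names this looks roughly like the following (please check against your version's configuration template):

```properties
# driver configuration: use a fixed bundle size instead of an adaptive algorithm
task.bundle.strategy = manual
# number of tasks sent to a node in each bundle
task.bundle.size = 5
```

With heterogeneous tasks, a small fixed bundle size avoids one slow task delaying a large batch on a single node.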

Sincerely,
-Laurent
« Last Edit: December 16, 2011, 08:46:59 AM by lolo »
Logged

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #22 on: December 19, 2011, 08:09:25 PM »

Hi Laurent,

Thanks for taking time to work on this.

In point 3, the client always submits a job. Every client invocation before the node shutdown submits a job and then dies. When I recycle the node, the client invocation hangs.

I also tried the code you provided, and the issue still exists. In addition, it seems that once the node and server get into this state, subsequent client submissions, even with different uuids, also hang.

Thanks again




 
Logged

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: Node and Server Connectivity
« Reply #23 on: December 20, 2011, 01:09:23 AM »

Joseph,

I implemented a fix for this issue, which I included in the v2.5.5 release. Normally, the effect of the fix is that, when the issue occurs, the server will either:
- find an available client connection if there is one, and recover transparently
- or explicitly fail the class loading request, in which case the tasks in the nodes will fail at deserialization or execution time, and the error will be forwarded to the server, and then to the client
In both cases there should be no more hangs, but some of the tasks may not be executed at all.

Can you give it a try and let us know if it is working?

Thanks,
-Laurent
Logged

jbd

  • JPPF Knight
  • **
  • Posts: 19
Re: Node and Server Connectivity
« Reply #24 on: December 21, 2011, 09:02:27 PM »

Hi Laurent,

Thanks, the fix seems to be working.

Now I am testing node concurrency. I ran one instance of the node and set its processing threads to 2; I modified my hello world task to wait 5 seconds before returning, then ran 2 concurrent jobs. It seems that the node processes the jobs one at a time. The behavior is the same with or without a fixed uuid. The server is still set to nio check = false.
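The timing check I am applying, written here against a plain java.util.concurrent pool rather than JPPF, just to show what I expect from 2 processing threads:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class NodeConcurrencyCheck {

    // Runs two 1-second "tasks" on a 2-thread pool and returns the elapsed
    // wall time in milliseconds. If they really run concurrently, the total
    // is about 1 second, not 2. The same reasoning applies to a node with
    // 2 processing threads and two 5-second tasks.
    static long runTwoTasks() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<String> task = () -> { Thread.sleep(1000); return "hello"; };
        long start = System.nanoTime();
        Future<String> f1 = pool.submit(task);
        Future<String> f2 = pool.submit(task);
        f1.get();
        f2.get();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        long elapsed = runTwoTasks();
        System.out.println("elapsed ms: " + elapsed
            + (elapsed < 1800 ? " (concurrent)" : " (serialized)"));
    }
}
```

If the node really ran my two jobs concurrently, I would expect roughly the single-task duration overall, which is not what I observe.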

Thanks again,

Joseph 
Logged