Topic: Node Losing Ability to Load Classes from Server

djroze

  • JPPF Knight
  • **
  • Posts: 20
Node Losing Ability to Load Classes from Server
« on: December 10, 2013, 09:08:54 PM »

Hi Laurent,

   I have a node connecting using SSL across a network boundary. It runs tasks for a while and keeps trying to reconnect if there are connection errors. However, when errors occur during certain stages of the node-server communication, the process goes into a state where the node and server are connected but the node cannot execute tasks properly. The server continues to send tasks to the node, but they fail due to classloading errors. The following exceptions appear around the time the repeating errors begin (timestamps differ because of clock differences between the server and node):

Node:
Quote
2013-12-10 04:58:23,967 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(154)]: sending batch of 1 class loading requests: CompositeResourceWrapper[resources=[JPPFResourceWrapper[dynamic=false,name=<APPLICATION RESOURCE>, state=NODE_REQUEST]]]
2013-12-10 04:58:23,970 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(160)]: got response null
2013-12-10 04:58:23,970 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(140)]: connection with class server ended, re-initializing, exception is:
java.net.SocketException: Connection timed out
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:377)
        at sun.security.ssl.OutputRecord.write(OutputRecord.java:363)
        at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:830)
        at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:801)
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.jppf.comm.socket.AbstractSocketWrapper.writeInt(AbstractSocketWrapper.java:171)
        at org.jppf.io.IOHelper.sendData(IOHelper.java:292)
        at org.jppf.classloader.RemoteResourceRequest.run(RemoteResourceRequest.java:76)
        at org.jppf.classloader.ClassLoaderRequestHandler$PeriodicTask.run(ClassLoaderRequestHandler.java:157)
        at java.lang.Thread.run(Thread.java:722)

Server:
Quote
2013-12-10 12:58:51,151 [DEBUG][org.jppf.server.nio.StateTransitionTask.run(92)]: error on channel SelectionKeyWrapper[id=98, <NODE_HOSTNAME>:<NODE_PORT>, readyOps=1, keyOps=0, context=RemoteNodeContext[channel=SelectionKeyWrapper[id=98], state=WAITING_RESULTS, uuid=98554C6E-CEF6-FF6B-26F2-F6D6741D978B, connectionUuid=null, peer=false]] : java.io.EOFException: null
java.io.EOFException
        at org.jppf.server.nio.SSLNioObject.read(SSLNioObject.java:99)
        at org.jppf.server.nio.AbstractNioMessage.readNextObject(AbstractNioMessage.java:139)
        at org.jppf.server.nio.AbstractNioMessage.read(AbstractNioMessage.java:101)
        at org.jppf.server.nio.nodeserver.AbstractNodeContext.readMessage(AbstractNodeContext.java:232)
        at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:69)
        at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:1)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
2013-12-10 12:58:51,273 [DEBUG][org.jppf.server.protocol.ServerTaskBundleNode.taskCompleted(190)]: received exception for ServerTaskBundleNode[id=5064, name=<JOB NAME>, uuid=22363685-cc36-4b5c-a2be-7e1ed721784f, initialTaskCount=1, taskCount=1, cancelled=false, requeued=true] : java.lang.Exception: java.io.EOFException
        at org.jppf.server.nio.nodeserver.AbstractNodeContext.handleException(AbstractNodeContext.java:185)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:94)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.EOFException
        at org.jppf.server.nio.SSLNioObject.read(SSLNioObject.java:99)
        at org.jppf.server.nio.AbstractNioMessage.readNextObject(AbstractNioMessage.java:139)
        at org.jppf.server.nio.AbstractNioMessage.read(AbstractNioMessage.java:101)
        at org.jppf.server.nio.nodeserver.AbstractNodeContext.readMessage(AbstractNodeContext.java:232)
        at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:69)
        at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:1)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
        ... 6 more

The node outputs the following a short time later…

Quote
2013-12-10 04:58:37,205 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.findClass(154)]: found definition for resource [org.jppf.server.node.JPPFContainer$ObjectDeserializationTask, definitionLength=2809]
2013-12-10 04:58:37,211 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(86)]: looking up resource [org.jppf.utils.ObjectSerializerImpl]
2013-12-10 04:58:37,211 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(90)]: resource [org.jppf.utils.ObjectSerializerImpl] not already loaded
2013-12-10 04:58:37,212 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.findClass(138)]: looking up definition for resource [org.jppf.utils.ObjectSerializerImpl]
2013-12-10 04:58:37,212 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(136)]: loading remote definition for resource [org/jppf/utils/ObjectSerializerImpl.class]
2013-12-10 04:58:37,213 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(154)]: sending batch of 1 class loading requests: CompositeResourceWrapper[resources=[JPPFResourceWrapper[dynamic=true, name=org/jppf/utils/ObjectSerializerImpl.class, state=NODE_REQUEST]]]
2013-12-10 04:58:37,423 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(160)]: got response CompositeResourceWrapper[resources=[JPPFResourceWrapper[dynamic=true, name=org/jppf/utils/ObjectSerializerImpl.class, state=NODE_RESPONSE_ERROR]]]
2013-12-10 04:58:37,424 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(138)]: remote definition for resource [org/jppf/utils/ObjectSerializerImpl.class] not found
2013-12-10 04:58:37,424 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.findClass(149)]: definition for resource [org.jppf.utils.ObjectSerializerImpl] not found
2013-12-10 04:58:37,425 [ERROR][org.jppf.server.node.JPPFContainer.call(211)]: Could not load class 'org.jppf.utils.ObjectSerializerImpl' [object index: 0]
java.lang.ClassNotFoundException: Could not load class 'org.jppf.utils.ObjectSerializerImpl'
        at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:152)
        at org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(AbstractJPPFClassLoader.java:91)
        at org.jppf.utils.SerializationHelperImpl.getSerializer(SerializationHelperImpl.java:55)
        at org.jppf.server.node.JPPFContainer$ObjectDeserializationTask.call(JPPFContainer.java:202)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

… and then seems to repeat these errors whenever it is sent a task, even if restarted:

Quote
2013-12-10 11:28:06,899 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(71)]: got bundle JPPFTaskBundle[name=<JOB NAME>, uuid=667371f2-4a28-4bec-9a66-c7608a45b214, initialTaskCount=1, taskCount=1, bundleUuid=FCBBEC58-7CAB-CC38-9AE9-1C290FB255F1-14300, uuidPath=TraversalList[position=0, list=[E184478B-A230-E1E8-6471-01C640A6BD66, FCBBEC58-7CAB-CC38-9AE9-1C290FB255F1]]]
2013-12-10 11:28:06,899 [DEBUG][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(85)]: bundle task count = 1, state = EXECUTION_BUNDLE
2013-12-10 11:28:06,900 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(86)]: looking up resource [org.jppf.utils.ObjectSerializerImpl]
2013-12-10 11:28:06,900 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(90)]: resource [org.jppf.utils.ObjectSerializerImpl] not already loaded
2013-12-10 11:28:06,900 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.findClass(138)]: looking up definition for resource [org.jppf.utils.ObjectSerializerImpl]
2013-12-10 11:28:06,900 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(136)]: loading remote definition for resource [org/jppf/utils/ObjectSerializerImpl.class]
2013-12-10 11:28:06,901 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(154)]: sending batch of 1 class loading requests: CompositeResourceWrapper[resources=[JPPFResourceWrapper[dynamic=true, name=org/jppf/utils/ObjectSerializerImpl.class, state=NODE_REQUEST]]]
2013-12-10 11:28:06,902 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(160)]: got response null
2013-12-10 11:28:06,903 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(140)]: connection with class server ended, re-initializing, exception is:
java.net.SocketException: Connection closed by remote host
        at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1506)
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:70)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.jppf.comm.socket.AbstractSocketWrapper.writeInt(AbstractSocketWrapper.java:171)
        at org.jppf.io.IOHelper.sendData(IOHelper.java:292)
        at org.jppf.classloader.RemoteResourceRequest.run(RemoteResourceRequest.java:76)
        at org.jppf.classloader.ClassLoaderRequestHandler$PeriodicTask.run(ClassLoaderRequestHandler.java:157)
        at java.lang.Thread.run(Thread.java:722)
2013-12-10 11:28:06,904 [ERROR][org.jppf.server.node.JPPFContainer.call(211)]: connection with class server ended, re-initializing, exception is: [object index: 0]
org.jppf.JPPFNodeReconnectionNotification: connection with class server ended, re-initializing, exception is:
        at org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(AbstractJPPFClassLoaderLifeCycle.java:141)
        at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:145)
        at org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(AbstractJPPFClassLoader.java:91)
        at org.jppf.utils.SerializationHelperImpl.getSerializer(SerializationHelperImpl.java:55)
        at org.jppf.server.node.JPPFContainer$ObjectDeserializationTask.call(JPPFContainer.java:202)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.SocketException: Connection closed by remote host
        at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1506)
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:70)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.jppf.comm.socket.AbstractSocketWrapper.writeInt(AbstractSocketWrapper.java:171)
        at org.jppf.io.IOHelper.sendData(IOHelper.java:292)
        at org.jppf.classloader.RemoteResourceRequest.run(RemoteResourceRequest.java:76)
        at org.jppf.classloader.ClassLoaderRequestHandler$PeriodicTask.run(ClassLoaderRequestHandler.java:157)
        ... 1 more
2013-12-10 11:28:07,513 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(86)]: looking up resource [org.jppf.utils.ObjectSerializerImpl]
2013-12-10 11:28:07,514 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(90)]: resource [org.jppf.utils.ObjectSerializerImpl] not already loaded
2013-12-10 11:28:07,514 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoader.findClass(138)]: looking up definition for resource [org.jppf.utils.ObjectSerializerImpl]
2013-12-10 11:28:07,514 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(136)]: loading remote definition for resource [org/jppf/utils/ObjectSerializerImpl.class]
2013-12-10 11:28:07,515 [DEBUG][org.jppf.classloader.ClassLoaderRequestHandler.run(154)]: sending batch of 1 class loading requests: CompositeResourceWrapper[resources=[JPPFResourceWrapper[dynamic=true, name=org/jppf/utils/ObjectSerializerImpl.class, state=NODE_REQUEST]]]
2013-12-10 11:28:07,516 [DEBUG][org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(140)]: connection with class server ended, re-initializing, exception is:
java.net.SocketException: Connection closed by remote host
        at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1506)
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:70)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.jppf.comm.socket.AbstractSocketWrapper.writeInt(AbstractSocketWrapper.java:171)
        at org.jppf.io.IOHelper.sendData(IOHelper.java:292)
        at org.jppf.classloader.RemoteResourceRequest.run(RemoteResourceRequest.java:76)
        at org.jppf.classloader.ClassLoaderRequestHandler$PeriodicTask.run(ClassLoaderRequestHandler.java:157)
        at java.lang.Thread.run(Thread.java:722)
2013-12-10 11:28:07,518 [ERROR][org.jppf.server.node.JPPFContainer.call(211)]: connection with class server ended, re-initializing, exception is: [object index: 1]
org.jppf.JPPFNodeReconnectionNotification: connection with class server ended, re-initializing, exception is:
        at org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(AbstractJPPFClassLoaderLifeCycle.java:141)
        at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:145)
        at org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(AbstractJPPFClassLoader.java:91)
        at org.jppf.utils.SerializationHelperImpl.getSerializer(SerializationHelperImpl.java:55)
        at org.jppf.server.node.JPPFContainer$ObjectDeserializationTask.call(JPPFContainer.java:202)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.SocketException: Connection closed by remote host
        at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1506)
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:70)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.jppf.comm.socket.AbstractSocketWrapper.writeInt(AbstractSocketWrapper.java:171)
        at org.jppf.io.IOHelper.sendData(IOHelper.java:292)
        at org.jppf.classloader.RemoteResourceRequest.run(RemoteResourceRequest.java:76)
        at org.jppf.classloader.ClassLoaderRequestHandler$PeriodicTask.run(ClassLoaderRequestHandler.java:157)
        ... 1 more
2013-12-10 11:28:07,519 [ERROR][org.jppf.server.node.remote.RemoteNodeIO.deserializeObjects(102)]: Exception occurred while deserializing the tasks
org.jppf.JPPFNodeReconnectionNotification: connection with class server ended, re-initializing, exception is:
        at org.jppf.classloader.AbstractJPPFClassLoaderLifeCycle.loadResource(AbstractJPPFClassLoaderLifeCycle.java:141)
        at org.jppf.classloader.AbstractJPPFClassLoader.findClass(AbstractJPPFClassLoader.java:145)
        at org.jppf.classloader.AbstractJPPFClassLoader.loadJPPFClass(AbstractJPPFClassLoader.java:91)
        at org.jppf.utils.SerializationHelperImpl.getSerializer(SerializationHelperImpl.java:55)
        at org.jppf.server.node.JPPFContainer$ObjectDeserializationTask.call(JPPFContainer.java:202)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.SocketException: Connection closed by remote host
        at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1506)
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:70)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.jppf.comm.socket.AbstractSocketWrapper.writeInt(AbstractSocketWrapper.java:171)
        at org.jppf.io.IOHelper.sendData(IOHelper.java:292)
        at org.jppf.classloader.RemoteResourceRequest.run(RemoteResourceRequest.java:76)
        at org.jppf.classloader.ClassLoaderRequestHandler$PeriodicTask.run(ClassLoaderRequestHandler.java:157)
        ... 1 more
2013-12-10 11:28:07,520 [DEBUG][org.jppf.server.node.JPPFNode.perform(159)]: received an empty bundle

The server displays this a few times after the initial error. It does not repeat indefinitely, but it does appear a few times whenever the node is restarted:

Quote
2013-12-10 12:59:04,453 [DEBUG][org.jppf.server.nio.StateTransitionTask.run(92)]: error on channel SelectionKeyWrapper[id=69, <NODE_HOSTNAME>:<NODE_PORT>, readyOps=5, keyOps=0, context=ClassContext[channel=SelectionKeyWrapper[id=69], uuid=E184478B-A230-E1E8-6471-01C640A6BD66, state=SENDING_PROVIDER_REQUEST, resource=null, pendingRequests=1, currentRequest=null, connectionUuid=E184478B-A230-E1E8-6471-01C640A6BD66_1, type=client, peer=false]] : java.net.ConnectException: provider SelectionKeyWrapper[id=69, <NODE_HOSTNAME>:<NODE_PORT>, readyOps=5, keyOps=0, context=ClassContext[channel=SelectionKeyWrapper[id=69], uuid=E184478B-A230-E1E8-6471-01C640A6BD66, state=SENDING_PROVIDER_REQUEST, resource=null, pendingRequests=1, currentRequest=null, connectionUuid=E184478B-A230-E1E8-6471-01C640A6BD66_1, type=client, peer=false]] has been disconnected
java.net.ConnectException: provider SelectionKeyWrapper[id=69, <NODE_HOSTNAME>:<NODE_PORT>, readyOps=5, keyOps=0, context=ClassContext[channel=SelectionKeyWrapper[id=69], uuid=E184478B-A230-E1E8-6471-01C640A6BD66, state=SENDING_PROVIDER_REQUEST, resource=null, pendingRequests=1, currentRequest=null, connectionUuid=E184478B-A230-E1E8-6471-01C640A6BD66_1, type=client, peer=false]] has been disconnected
        at org.jppf.server.nio.classloader.client.SendingProviderRequestState.performTransition(SendingProviderRequestState.java:65)
        at org.jppf.server.nio.classloader.client.SendingProviderRequestState.performTransition(SendingProviderRequestState.java:1)
        at org.jppf.server.nio.StateTransitionTask.run(StateTransitionTask.java:82)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

   Please let me know if you have any ideas as to how to fix/avoid this or if there’s any other information from the logs that would be useful; there’s a lot there so I’m not sure which parts are important. Thanks in advance!

- Daniel

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2256
    • JPPF Web site
Re: Node Losing Ability to Load Classes from Server
« Reply #1 on: December 12, 2013, 06:17:35 AM »

Hello Daniel,

I do not know what is causing the "SocketException: Connection timed out", but I suspect it is related to the network conditions between the node and the server.
When this happens, it causes the node to reinitialize its connections with the server, like a restart but without actually stopping the process.

The EOFExceptions in the first server log you provided simply indicate that the node was disconnected, so we can ignore these.

What is more interesting is the second server log extract, which indicates that the connection between server and client was terminated, or at least that the server believes it was terminated. Could you check your client log for errors or exceptions? If there are none, could you try adding this property to the server's configuration, then restart the server and see if these problems still occur:
Code:
jppf.nio.check.connection = false
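
For readers unsure about the format of the line above: it is an ordinary Java properties entry. The sketch below only shows how such a line parses with `java.util.Properties`; it does not use any JPPF API. The class name `CheckConnectionConfig`, the helper method and the fallback value of `true` are assumptions made for illustration, not something stated in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of JPPF: parses properties-format text
// and reads the flag quoted in this thread.
final class CheckConnectionConfig {
    /** Property name exactly as quoted in the post above. */
    static final String KEY = "jppf.nio.check.connection";

    /** Returns the parsed flag, or defaultValue when the key is absent. */
    static boolean checkConnectionEnabled(String configText, boolean defaultValue) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(KEY);
        // Properties keeps trailing whitespace in values, so trim before parsing.
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // Illustrative configuration text; the default passed here is an assumption.
        String config = "# server configuration (illustrative)\n"
                      + "jppf.nio.check.connection = false\n";
        System.out.println(KEY + " -> " + checkConnectionEnabled(config, true));
    }
}
```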
Thanks,
-Laurent

djroze

Re: Node Losing Ability to Load Classes from Server
« Reply #2 on: December 12, 2013, 08:25:01 PM »

Hi Laurent,

   In the client log I found three instances of the following error around the same time:

Quote
2013-12-10 04:58:40,951 ERROR [BaseJPPFClientConnection.receiveBundleAndResults]:
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.jppf.comm.socket.AbstractSocketWrapper.readInt(AbstractSocketWrapper.java:264)
        at org.jppf.io.SocketWrapperInputSource.readInt(SocketWrapperInputSource.java:90)
        at org.jppf.io.IOHelper.readData(IOHelper.java:120)
        at org.jppf.io.IOHelper.unwrappedData(IOHelper.java:186)
        at org.jppf.client.BaseJPPFClientConnection.receiveBundleAndResults(BaseJPPFClientConnection.java:212)
        at org.jppf.client.BaseJPPFClientConnection.receiveResults(BaseJPPFClientConnection.java:260)
        at org.jppf.client.BaseJPPFClientConnection.receiveResults(BaseJPPFClientConnection.java:277)
        at org.jppf.client.balancer.ChannelWrapperRemote$RemoteRunnable.run(ChannelWrapperRemote.java:254)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
2013-12-10 04:58:40,952 DEBUG [ChannelWrapperRemote.run]:
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.jppf.comm.socket.AbstractSocketWrapper.readInt(AbstractSocketWrapper.java:264)
        at org.jppf.io.SocketWrapperInputSource.readInt(SocketWrapperInputSource.java:90)
        at org.jppf.io.IOHelper.readData(IOHelper.java:120)
        at org.jppf.io.IOHelper.unwrappedData(IOHelper.java:186)
        at org.jppf.client.BaseJPPFClientConnection.receiveBundleAndResults(BaseJPPFClientConnection.java:212)
        at org.jppf.client.BaseJPPFClientConnection.receiveResults(BaseJPPFClientConnection.java:260)
        at org.jppf.client.BaseJPPFClientConnection.receiveResults(BaseJPPFClientConnection.java:277)
        at org.jppf.client.balancer.ChannelWrapperRemote$RemoteRunnable.run(ChannelWrapperRemote.java:254)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

   I assume that this error is triggered by network hiccups between the client and server, but I would hope that it does not permanently "damage" the classloading functionality for future jobs. Do you have any ideas for how to keep this error from impacting the rest of the tasks?

Thanks in advance,
Daniel

lolo

Re: Node Losing Ability to Load Classes from Server
« Reply #3 on: December 13, 2013, 07:56:14 AM »

Hi Daniel,

By default, when a client connection is terminated, the server cancels the jobs submitted via this connection. This behavior can be overridden in the job SLA by calling JPPFJob.setCancelUponClientDisconnect(false). However, when the class loader connection is interrupted, no further remote class loading is possible. This is a bigger problem, since it prevents tasks from executing if the node(s) have not already loaded all the required classes. Also, even if the job is not cancelled, no results will be sent to the client. This means that once the connection between client and server is broken, no recovery will occur from the client's point of view, unless you use a persistence manager.

The only way to prevent this is to have the classes present in, or dynamically added to, the node's local classpath. It can be done in two general ways:
- statically, by transferring the jars and class folders to the node's file system and adding them to the JVM classpath in the launch script
- dynamically, by sending the jars along with the job, for instance in its metadata, then using an extended NodeLifeCycleListener to reset the task class loader if needed and add the jars to its classpath. There is also a sample which provides a more sophisticated version of this, permitting management of updated jars.
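
The dynamic option can be illustrated with plain JDK classes. The following is a minimal sketch, assuming a `java.net.URLClassLoader` subclass that exposes `addURL`; it is not JPPF's class loader and does not show the NodeLifeCycleListener wiring. All names in it (`DynamicClasspathSketch`, `ExtensibleLoader`, `writeDemoJar`) are hypothetical, and the jar it creates is a stand-in for a jar shipped with a job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical illustration using only the JDK; no JPPF classes involved.
final class DynamicClasspathSketch {
    /** URLClassLoader whose protected addURL() is made public so jars can be appended at run time. */
    static final class ExtensibleLoader extends URLClassLoader {
        ExtensibleLoader(ClassLoader parent) { super(new URL[0], parent); }
        @Override public void addURL(URL url) { super.addURL(url); }
    }

    /** Writes a one-entry jar to a temp file; stands in for a jar received with a job. */
    static Path writeDemoJar(String entryName, String content) {
        try {
            Path jar = Files.createTempFile("demo-lib", ".jar");
            jar.toFile().deleteOnExit();
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out)) {
                jos.putNextEntry(new JarEntry(entryName));
                jos.write(content.getBytes(StandardCharsets.UTF_8));
                jos.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the entry is invisible before the jar is added and visible afterwards. */
    static boolean visibleOnlyAfterAdd(String entryName) {
        Path jar = writeDemoJar(entryName, "hello");
        try (ExtensibleLoader loader =
                 new ExtensibleLoader(DynamicClasspathSketch.class.getClassLoader())) {
            boolean before = loader.getResource(entryName) != null;
            loader.addURL(jar.toUri().toURL());   // extend the classpath at run time
            boolean after = loader.getResource(entryName) != null;
            return !before && after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("visible only after addURL: "
            + visibleOnlyAfterAdd("sketch-demo/hello.txt"));
    }
}
```

Once the jar is on the loader's own classpath, lookups for its contents no longer need a round trip to the server, which is the point of this option.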

You may also combine these two approaches, by statically deploying 3rd party libraries and only sending your own classes within the job. To be clear, the only way to guarantee zero impact on class loading is to have everything deployed locally on each node, including the classes the node normally loads from the server (in jppf-common.jar and jppf-server.jar).

This being said, I'm still not sure that the broken connection originates from the client; the EOFException may be caused by the server actually closing the connection. This is an issue we've seen on specific OSes (AIX), for which a workaround is to remove some checks performed on the server side, by setting "jppf.nio.check.connection = false" in the server's configuration. Would you mind trying this and letting us know the outcome?

Thanks,
-Laurent

djroze

Re: Node Losing Ability to Load Classes from Server
« Reply #4 on: December 30, 2013, 11:41:43 PM »

Hi Laurent,

   I've tried the check.connection option as requested. It appears to reduce the number and variety of error messages printed on the server and client, but the classloader connection still gets interrupted and jobs stop being processed. (I think the connection is being broken by network hiccups between the client and server, not by the server closing the connection.)

   I attempted to use local classloading per your recommendation here but I can't find a combination of JARs on the node's classpath that allows the node to start up normally other than what I've been using (which is just jppf-common-node).

* If I use jppf-common instead of jppf-common-node, the node launcher class cannot be found.
* If I use jppf-common-node and jppf-server, I get the following error:
Quote
2013-12-30 14:33:26,250 [DEBUG][org.jppf.utils.JPPFDefaultUncaughtExceptionHandler.uncaughtException(44)]: Uncaught exception in thread Thread[main,5,main]
java.lang.NoClassDefFoundError: org/jppf/server/node/AbstractClassLoaderManager
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
        at org.jppf.classloader.AbstractJPPFClassLoader.loadClass(AbstractJPPFClassLoader.java:339)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2413)
        at java.lang.Class.getConstructor0(Class.java:2723)
        at java.lang.Class.newInstance0(Class.java:345)
        at java.lang.Class.newInstance(Class.java:327)
        at org.jppf.node.NodeRunner.createNode(NodeRunner.java:171)
        at org.jppf.node.NodeRunner.main(NodeRunner.java:123)
Caused by: java.lang.ClassNotFoundException: org.jppf.server.node.AbstractClassLoaderManager
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        ... 21 more
* If I use jppf-common, jppf-common-node and jppf-server I get the following error:
Quote
2013-12-30 14:34:16,127 [DEBUG][org.jppf.utils.JPPFDefaultUncaughtExceptionHandler.uncaughtException(44)]: Uncaught exception in thread Thread[main,5,main]
java.lang.IllegalAccessError: tried to access method org.jppf.server.node.remote.RemoteClassLoaderManager.<init>(Lorg/jppf/server/node/JPPFNode;)V from class org.jppf.server.node.remote.JPPFRemoteNode
        at org.jppf.server.node.remote.JPPFRemoteNode.<init>(JPPFRemoteNode.java:54)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at java.lang.Class.newInstance0(Class.java:374)
        at java.lang.Class.newInstance(Class.java:327)
        at org.jppf.node.NodeRunner.createNode(NodeRunner.java:171)
        at org.jppf.node.NodeRunner.main(NodeRunner.java:123)

In JPPF-189 it looks like this last combination is theoretically fixed in version 3.3.6, which is the version I'm using, but the workaround suggested to the original poster seemed to be to not use jppf-common and jppf-server in the node's classpath. What JARs specifically should I put in the node classpath to get everything loaded locally?

Thanks,
Daniel

lolo

Re: Node Losing Ability to Load Classes from Server
« Reply #5 on: December 31, 2013, 06:52:27 AM »

Hi Daniel,

I confirm that the fix for JPPF-189 was not included in JPPF v3.3.6; I apologize for this.
However, it is part of the JPPF 3.3.7 release, so I recommend that you upgrade to this version. I just tried both versions: 3.3.6 produces the IllegalAccessError, whereas 3.3.7 doesn't.

The jars you need to add, in addition to jppf-common-node.jar, are the following:
- jppf-common.jar
- jppf-server.jar
- jmxremote_optional-1.0_01-ea.jar (otherwise management will be disabled for the node), which you can find in the driver distribution
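
One way to confirm that a given classpath really resolves classes locally is a small pre-flight check run with the same classpath as the node. This is a hedged sketch using only `Class.forName`; the class `LocalClasspathCheck` and its method are hypothetical. The two class names in `main` are the ones reported missing in the logs earlier in this thread, and a real check would need a longer list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check, not part of JPPF.
final class LocalClasspathCheck {
    /** Returns the class names that the given loader cannot resolve. */
    static List<String> missingClasses(ClassLoader loader, List<String> classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: resolve the class without running static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Class names taken from the error logs in this thread.
        List<String> required = Arrays.asList(
            "org.jppf.utils.ObjectSerializerImpl",
            "org.jppf.server.node.AbstractClassLoaderManager");
        List<String> missing =
            missingClasses(LocalClasspathCheck.class.getClassLoader(), required);
        System.out.println(missing.isEmpty()
            ? "all required classes found locally"
            : "not on local classpath: " + missing);
    }
}
```

Run it with the node's `-cp` value: any name it prints would otherwise have to be fetched from the server.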

Sincerely,
-Laurent