Author Topic: ChannelWrapperRemote NPE  (Read 1978 times)

wko

  • Guest
ChannelWrapperRemote NPE
« on: August 16, 2014, 01:48:08 AM »

Hello!

I have one driver hooked up to 8 nodes and two client services that send jobs to the drivers. The JPPF configuration file is identical on both clients and contains these properties:

jppf.drivers = driver1 driver2

driver1.jppf.server.host = XXX
driver1.jppf.server.port = 11111
driver1.jppf.management.port = 11198
driver1.priority = 10
driver1.jppf.pool.size = 30

driver2.jppf.server.host = YYY
driver2.jppf.server.port = 11111
driver2.jppf.management.port = 11198
driver2.priority = 10
driver2.jppf.pool.size = 30

I execute the job at the same time on both clients.  One of the clients works correctly, the job executes and finishes successfully and I can see it execute on the admin console.  However, the other client is showing these exceptions/errors:

2014-08-15 15:57:09.131 PDT [RemoteChannelWrapper-driver1-5-0001] WARN  balancer.ChannelWrapperRemote   - java.lang.NullPointerException: null
2014-08-15 15:57:09.131 PDT [RemoteChannelWrapper-driver1-5-0001] DEBUG auto.JPPFHandler                - in ClientQueueListener.jobAdded jobName: ZZZ, jobUuid: 2E88076D-E058-D533-ED25-A1CDFFED799D
[client: driver1-5 - TasksServer] Attempting connection to the task server at XXX
2014-08-15 15:57:09.131 PDT [RemoteChannelWrapper-driver1-5-0001] INFO  auto.JPPFHandler                - in JobListener.jobStarted: job UUID: 2E88076D-E058-D533-ED25-A1CDFFED799D, name: ZZZ
2014-08-15 15:57:09.131 PDT [connecting driver1-5[sc2-qa-parser-j1:11111] : EXECUTING] ERROR lient.BaseJPPFClientConnection  - java.net.SocketException: Socket closed
2014-08-15 15:57:09.131 PDT [connecting driver1-5[sc2-qa-parser-j1:11111] : EXECUTING] WARN  nt.TaskServerConnectionHandler  - error initializing connection to job server: java.net.SocketException: Socket closed
[client: driver1-5 - TasksServer] Attempting connection to the task server at XXX
2014-08-15 15:57:09.131 PDT [driver1-5 - ClassServer] WARN  client.ClassServerDelegateImpl  - [driver1-5 - ClassServer] caught java.io.EOFException: null, will re-initialise ...
[client: driver1-5 - ClassServer] Attempting connection to the class server at XXX
...

This repeats over and over again.

Any hints on what the issue could be?

Please let me know if there is any additional information I can provide!

I don't think there are any network issues between the client and the driver, and I couldn't find any unusual error messages in the driver logs.

Thanks!
Andrew

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: ChannelWrapperRemote NPE
« Reply #1 on: August 16, 2014, 07:32:50 AM »

Hi Andrew,

First, could you tell us the version of JPPF you are using?

From the log extract you provided, it is very likely that the NPE you identified is the root cause, and the subsequent "java.net.SocketException: Socket closed" is caused by the client code closing the connection, following the NPE.

The first step would be to use the DEBUG logging level, to cause the client to log the full stack trace of the NPE instead of just the exception message. To do this, add the following line in your log4j-client.properties:
Code:
log4j.logger.org.jppf.client.balancer.ChannelWrapperRemote=DEBUG
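To illustrate why the WARN line alone is not enough: a NullPointerException created without a message renders as "java.lang.NullPointerException: null" when only the class name and message are logged, which says nothing about where it was thrown. The sketch below is not JPPF code; `NpeLogDemo`, `messageOnly` and `fullTrace` are hypothetical names used only to contrast the two renderings.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class NpeLogDemo {
    // Renders an exception as "class name: message", like a message-only log line.
    static String messageOnly(Throwable t) {
        return t.getClass().getName() + ": " + t.getMessage();
    }

    // Renders the complete stack trace as a string, which is what DEBUG logging exposes.
    static String fullTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        try {
            throw new NullPointerException(); // no message, as in the log extract
        } catch (NullPointerException e) {
            System.out.println(messageOnly(e)); // java.lang.NullPointerException: null
            System.out.println(fullTrace(e));   // includes the "at ..." frames
        }
    }
}
```

Only the second form shows the frames needed to locate the failing call.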
I also noticed that, in your client configuration, you are setting up 30 connections to each driver. Could you try, for the purpose of investigating this issue, with just 1 connection and let us know if the problem still occurs? Also, it seems that your client is connecting to 2 separate drivers. It might provide some insight to know if the NPE occurs with both drivers or only one of them in particular. Could you try this and let us know of the outcome?

Thanks for your time,
-Laurent

wko

  • Guest
Re: ChannelWrapperRemote NPE
« Reply #2 on: August 18, 2014, 07:35:39 PM »

Hi Laurent!

Thanks for your assistance :)

We are using JPPF 4.2.1 for client/driver/nodes.

I've dropped log levels to debug on both the client and driver, then bounced both.  The issue seemed to disappear :(

However, I'm seeing a few more exceptions in the logs.

This comes from the driver:

16:24:13,723 DEBUG StateTransitionTask:89 - error on channel SelectionKeyWrapper[id=188, readyOps=1, interestOps=0, context=channel=SelectionKeyWrapper[id=188], state=WAITING_NODE_REQUEST, resource=null, pendingResponses=0, type=node, peer=false, uuid=8104C36B-0CE3-FCA5-0586-DD462381BD70, secure=false, ssl=false] : java.lang.NullPointerException
   at java.util.ArrayList.<init>(ArrayList.java:168)
   at org.jppf.server.nio.classloader.client.ClientClassNioServer.getProviderConnections(ClientClassNioServer.java:165)
   at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.findProviderConnection(WaitingNodeRequestState.java:198)
   at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.processDynamic(WaitingNodeRequestState.java:173)
   at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.processResource(WaitingNodeRequestState.java:105)
   at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.performTransition(WaitingNodeRequestState.java:86)
   at org.jppf.server.nio.classloader.node.WaitingNodeRequestState.performTransition(WaitingNodeRequestState.java:39)
   at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:82)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)

Here's another common one:

16:25:31,474 DEBUG ServerTaskBundleNode:192 - received exception for ServerTaskBundleNode[id=26, name=XXX, uuid=A829172B-AEFA-A4F5-80A5-32F37289F653, initialTaskCount=20, taskCount=10, cancelled=false, requeued=false] : java.io.EOFException
        at org.jppf.io.ChannelInputSource.read(ChannelInputSource.java:91)
        at org.jppf.io.ChannelInputSource.read(ChannelInputSource.java:61)
        at org.jppf.io.MultipleBuffersLocation.nonBlockingTransferFrom(MultipleBuffersLocation.java:193)
        at org.jppf.io.MultipleBuffersLocation.transferFrom(MultipleBuffersLocation.java:149)
        at org.jppf.nio.PlainNioObject.read(PlainNioObject.java:91)
        at org.jppf.nio.AbstractNioMessage.readNextObject(AbstractNioMessage.java:175)
        at org.jppf.nio.AbstractNioMessage.read(AbstractNioMessage.java:137)
        at org.jppf.server.nio.nodeserver.AbstractNodeContext.readMessage(AbstractNodeContext.java:284)
        at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:61)
        at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:39)
        at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:82)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Are these ... possibly related?

Thanks again!
Andrew

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: ChannelWrapperRemote NPE
« Reply #3 on: August 18, 2014, 08:53:29 PM »

Hi Andrew,

Thanks for taking the time to debug this issue.
The first exception in the driver indicates that a node is making a class loading request, which the server is attempting to forward to the client that submitted the job. However, the server does not find any connection to the client and that's when the NPE occurs. This tells me two things:

1) our code doesn't properly handle the case when the list of connections is null. There should also be an exception in the related node's log file; is there any way you could check? For this I registered the bug JPPF-312 NPE in ClientClassNioServer.getProviderConnections(). I will release a patch tomorrow with the fix and keep you updated in this thread. When this bug is fixed, instead of having an NPE in the driver, you should have a ClassNotFoundException on the node side, which is the proper behavior.

2) no connections probably means that the client was closed while at least one of the nodes was still executing a job. Can you confirm that your client is closed and that you do not wait for the job results on the client side?

It is possible that the EOFException that follows is a consequence of the NPE, but it's difficult to tell. Hopefully, the fix to JPPF-312 will clarify that.
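The failure mode in point 1 can be reproduced in isolation: the driver stack trace ends in the `ArrayList` constructor, which throws a NullPointerException when it is handed a null collection. The following is a minimal sketch of that behavior and of one possible guard. It is not the JPPF source and not the JPPF-312 patch; `ProviderLookupSketch` and `copyOrEmpty` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class ProviderLookupSketch {
    // Hypothetical guard: copy the connections, or return an empty list when there are none.
    static <T> List<T> copyOrEmpty(Collection<T> connections) {
        return connections == null ? new ArrayList<>() : new ArrayList<>(connections);
    }

    public static void main(String[] args) {
        Collection<String> none = null; // stands in for "no connection found for this client"
        try {
            new ArrayList<>(none); // unguarded copy of a null collection
        } catch (NullPointerException e) {
            System.out.println("unguarded copy failed: " + e.getClass().getName());
        }
        System.out.println("guarded copy size: " + copyOrEmpty(none).size());
    }
}
```

With a guard of this kind the caller sees an empty list and can report that no provider is available, rather than failing inside the collection constructor.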

Sincerely,
-Laurent

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2258
    • JPPF Web site
Re: ChannelWrapperRemote NPE
« Reply #4 on: August 18, 2014, 09:44:31 PM »

Hi Andrew,

I fixed JPPF-312 and released the fix as patch 01 for JPPF 4.2.1. Please let us know if this resolves your issues.

Sincerely,
-Laurent

wko

  • Guest
Re: ChannelWrapperRemote NPE
« Reply #5 on: August 21, 2014, 04:46:31 AM »

Hi!

Sorry for the delay.  Thanks for the quick patch!

It is indeed possible that the client was closed while the job was executing for the 2nd issue (slightly hard to verify since we're testing with 10 clients in a bit of an ad-hoc fashion).

Cheers,
Andrew

wko

  • Guest
Re: ChannelWrapperRemote NPE
« Reply #6 on: October 20, 2014, 11:21:56 PM »

The root cause ended up being a bug in serialization (specifically we were using the Apache Thrift libraries and they don't handle null key values for maps with enum key types):

With this "log4j.logger.org.jppf.client.balancer.ChannelWrapperRemote=DEBUG" enabled, we got the following exception:

        ... (custom serialization logic)
   at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
   at org.jppf.serialization.DefaultJavaSerialization.serialize(DefaultJavaSerialization.java:31)
   at org.jppf.utils.ObjectSerializerImpl.serialize(ObjectSerializerImpl.java:88)
   at org.jppf.io.IOHelper.serializeDataToMemory(IOHelper.java:327)
   at org.jppf.io.IOHelper.serializeData(IOHelper.java:308)
   at org.jppf.io.IOHelper.sendData(IOHelper.java:290)
   at org.jppf.client.BaseJPPFClientConnection.sendTasks(BaseJPPFClientConnection.java:148)
   at org.jppf.client.balancer.ChannelWrapperRemote$RemoteRunnable.run(ChannelWrapperRemote.java:219)

It'd be lovely if this exception was exposed at a WARN level instead.
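For anyone hitting something similar, one possible workaround is to serialize the task objects to memory with plain Java serialization before submitting the job, so that a failure in custom serialization logic surfaces with its full stack trace on the client. The sketch below assumes nothing about Thrift or JPPF internals: `Payload`, its `writeObject` and `trySerialize` are hypothetical, and the null enum key is only a stand-in for the kind of data that triggered our bug.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class SerializationPreCheck {
    enum Color { RED, GREEN }

    // Hypothetical payload whose custom writeObject cannot cope with a null map key.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Map<Color, String> labels = new HashMap<>();

        void put(Color key, String value) { labels.put(key, value); }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(labels.size());
            for (Map.Entry<Color, String> e : labels.entrySet()) {
                out.writeUTF(e.getKey().name()); // throws NullPointerException for a null key
                out.writeUTF(e.getValue());
            }
        }
    }

    // Serializes the object to memory; returns the failure, or null if it succeeded.
    static Throwable trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Payload bad = new Payload();
        bad.put(null, "no key");
        Throwable failure = trySerialize(bad);
        if (failure != null) failure.printStackTrace(); // full trace, before any job is sent
    }
}
```

This costs one extra serialization pass per object, so it is probably best kept for test runs or debugging.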

Thanks for the help again!
Andrew