JPPF Issue Tracker
star_faded.png
Please log in to bookmark issues
bug_report_small.png
CLOSED  Bug report JPPF-383  -  JMX request stuck waiting for a response upon node termination
Posted Apr 08, 2015 - updated Apr 27, 2015
icon_info.png This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
     
    Closed
  • Assigned to
     lolo4j
  • Progress
       
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    JMX connector
  • Resolution
    RESOLVED
  • Priority
    Normal
  • Reproducability
    Often
  • Severity
    Normal
  • Targetted for
    icon_milestones.png JPPF 4.2.8
Issue description
In some conditions, a JMX request to a remote peer can be stuck waiting for a response from the peer, when this peer is killed before it could send a response, i.e. while still executing the JMX request. When this happens in the driver, it results in one thread locking a JPPFNodeConnectionWrapper and never releasing it, in turn causing other threads to be stuck waiting for this JPPFNodeConnectionWrapper, as shown in this thread dump extract:
Locking thread - locked <0x000000062d8bde10> (a org.jppf.management.JMXNodeConnectionWrapper)
 
"JobExecutor_2203" daemon prio=5 tid=0x00002abb7e7a8800 nid=0x5f59 in Object.wait() [0x00002abbbaefb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.sendWithReturn(ClientSynchroMessageConnectionImpl.java:314)
  - locked <0x00000006346cffb0> (a com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$ResponseMsgWrapper)
  at javax.management.remote.generic.AbstractClientIntermediary.mBeanServerRequest(AbstractClientIntermediary.java:249)
  at javax.management.remote.generic.AbstractClientIntermediary.mBeanServerRequest(AbstractClientIntermediary.java:234)
  at javax.management.remote.generic.ClientIntermediary.getAttribute(ClientIntermediary.java:330)
  at javax.management.remote.generic.RemoteMBeanServerConnection.getAttribute(RemoteMBeanServerConnection.java:164)
  at org.jppf.management.JMXConnectionWrapper.getAttribute(JMXConnectionWrapper.java:293)
  - locked <0x000000062d8bde10> (a org.jppf.management.JMXNodeConnectionWrapper)
  at org.jppf.management.forwarding.JPPFNodeForwarding.forwardGetAttribute(JPPFNodeForwarding.java:126)
  at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
  at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
  at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
  at javax.management.remote.generic.ServerIntermediary.handleRequest(ServerIntermediary.java:160)
  at javax.management.remote.generic.ServerIntermediary$PrivilegedRequestJob.run(ServerIntermediary.java:515)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleMBSReqMessage(ServerIntermediary.java:382)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:321)
  at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:336)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
 
Blocked threads:
 
"JobExecutor_2447" daemon prio=5 tid=0x00002abb9d0d5800 nid=0x395e waiting for monitor entry [0x00002abb98e09000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at org.jppf.management.JMXConnectionWrapper.invoke(JMXConnectionWrapper.java:254)
  - waiting to lock <0x000000062d8bde10> (a org.jppf.management.JMXNodeConnectionWrapper)
  at org.jppf.management.JMXNodeConnectionWrapper.cancelJob(JMXNodeConnectionWrapper.java:192)
  at org.jppf.server.nio.nodeserver.AbstractNodeContext.cancelJob(AbstractNodeContext.java:450)
  at org.jppf.server.nio.nodeserver.NodeContextFuture.cancel(NodeContextFuture.java:57)
  at org.jppf.server.protocol.ServerJob.handleCancelledStatus(ServerJob.java:228)
  at org.jppf.server.protocol.ServerJob.cancel(ServerJob.java:262)
  at org.jppf.server.job.management.DriverJobManagement.cancelJob(DriverJobManagement.java:77)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
  at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
  at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
  at javax.management.remote.generic.ServerIntermediary.handleRequest(ServerIntermediary.java:160)
  at javax.management.remote.generic.ServerIntermediary$PrivilegedRequestJob.run(ServerIntermediary.java:515)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleMBSReqMessage(ServerIntermediary.java:382)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:321)
  at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:336)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
 
 
"JobExecutor_2441" daemon prio=5 tid=0x00002abb9c9d8800 nid=0x3622 waiting for monitor entry [0x00002abbbc814000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at org.jppf.management.JMXConnectionWrapper.invoke(JMXConnectionWrapper.java:254)
  - waiting to lock <0x000000062d8bde10> (a org.jppf.management.JMXNodeConnectionWrapper)
  at org.jppf.management.forwarding.JPPFNodeForwarding.forwardInvoke(JPPFNodeForwarding.java:96)
  at org.jppf.management.forwarding.JPPFNodeForwarding.forwardInvoke(JPPFNodeForwarding.java:112)
  at org.jppf.management.forwarding.JPPFNodeForwarding.restart(JPPFNodeForwarding.java:193)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
  at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
  at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
  at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
  at javax.management.remote.generic.ServerIntermediary.handleRequest(ServerIntermediary.java:160)
  at javax.management.remote.generic.ServerIntermediary$PrivilegedRequestJob.run(ServerIntermediary.java:515)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleMBSReqMessage(ServerIntermediary.java:382)
  at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:321)
  at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:336)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
A preliminary analysis shows that the following code in com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.sendWithReturn() ends up waiting for a timeout equal to Long.MAX_VALUE, which is definitely too long!
long remainingTime = wtimeout;
final long startTime = System.currentTimeMillis();
synchronized (mwrapper) {
  while (!mwrapper.got && remainingTime > 0) {
    try {
      mwrapper.wait(remainingTime);
    } catch (InterruptedException ie) {
      // OK. This is a user thread, so it is possible that the user wants to stop waiting.
      break;
    }
    remainingTime = wtimeout - (System.currentTimeMillis() - startTime);
  }
}
Here, the value of 'wtimeout' is Long.MAX_VALUE. What nwe need to do is call mwrapper.wait(some_small_value) and call checkstate() at each iteration of the while loop, so an exception can be raised when we detect that the connection with the peer is broken.
Steps to reproduce this issue
In the org.jppf.management.JPPFNodeAdmin class, replace this:
public void shutdown() throws Exception {
  shutdownOrRestart(false);
}
with this:
public void shutdown() throws Exception {
  System.exit(1);
}
This results in no clean shutdown of the JMX connection(s) being performed on the node side, and the exit() is done from within the MBean method, while the server side is still waiting for a response.

Then, to reproduce:
  • start a driver
  • start an admin console
  • start 40 nodes (1 master + provisioning of 39 slaves)
  • in the admin console, select all nodes and click on the "Shutdown" buttonn
==> take a thread dump and observe the threads blocked wiaitng for a JPPFNodeConnectionWrapper

#11
Comment posted by
 lolo4j
Apr 10, 08:07
Fixed in:
#13
Comment posted by
 lolo4j
Apr 27, 10:45
The fix was flawed, causing the connection check to only happen once.

Fix updateded in: