

Topic: Job cancellation from client in JPPF 6.2 not cancelling job/tasks on node

wimvc


Hi all,

I'm having difficulty cancelling jobs with JPPF 6.2 (and previously with 6.1.3; I haven't tried versions earlier than 6.1.3). Whenever I cancel a job from the client using 'client.cancelJob(jobUUID)', the job is cancelled on the client side and my application continues. However, the Admin UI shows that the job's tasks are still running, even though the tasks check Thread.isInterrupted() every few milliseconds. When I stop the job from the Job Data panel in the Admin UI, the tasks stop immediately. Note that the Admin UI and the client run on the same host, so this shouldn't be related to a firewall or any similar network issue.
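The task-side polling described above can be sketched in plain Java. This is a minimal illustration using only JDK classes, not JPPF's task API: the hypothetical InterruptPollingDemo class runs a loop that checks the interrupt flag between short slices of work, and Future.cancel(true) stands in for whatever cancellation request ends up interrupting the thread that runs the task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Sketch (hypothetical names): a long-running loop that stops once its thread is interrupted. */
public class InterruptPollingDemo {

  /** Hypothetical task body: does small slices of work, checking the flag between them. */
  static long pollingWork() {
    long slices = 0;
    double acc = 0;
    // isInterrupted() reads the interrupt flag without clearing it.
    while (!Thread.currentThread().isInterrupted()) {
      for (int i = 0; i < 10_000; i++) {
        acc += Math.sqrt(i); // one short slice of work
      }
      slices++;
    }
    return slices;
  }

  /** Starts the work, cancels it with interruption after delayMs, reports whether the worker stopped. */
  static boolean runAndCancel(long delayMs) throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Callable<Long> task = InterruptPollingDemo::pollingWork;
    Future<Long> future = pool.submit(task);
    Thread.sleep(delayMs);
    future.cancel(true); // interrupts the worker thread if the task is running
    pool.shutdown();
    return pool.awaitTermination(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("worker stopped: " + runAndCancel(50));
  }
}
```

Because the loop only reads the flag, it exits on the first check after the interrupt arrives; if no interrupt ever reaches the thread, it keeps running, which matches the symptom reported here.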

I do see something peculiar in the console log when DEBUG logging is enabled: an InterruptedException seems to be thrown, after which a JMX connection is closed. I'm not sure whether this is relevant. Could anyone point me in the right direction to solve this?

Code:
35701 [pool-3-thread-1] DEBUG be.kuleuven.******  - Cancelling job D70C1CF5-2A35-4235-8B3C-241793E48162
35702 [pool-3-thread-1] DEBUG org.jppf.client.AbstractGenericClient  - request to cancel job with uuid=D70C1CF5-2A35-4235-8B3C-241793E48162
35702 [pool-3-thread-1] DEBUG org.jppf.client.balancer.JobManagerClient  - requesting cancel of jobId=D70C1CF5-2A35-4235-8B3C-241793E48162
35703 [pool-3-thread-1] DEBUG org.jppf.client.balancer.queue.JPPFPriorityQueue  - requesting cancel of jobId=D70C1CF5-2A35-4235-8B3C-241793E48162
35703 [pool-3-thread-1] DEBUG org.jppf.client.balancer.ClientJob  - requesting cancel of jobId=D70C1CF5-2A35-4235-8B3C-241793E48162
35703 [pool-3-thread-1] DEBUG org.jppf.client.balancer.AbstractChannelWrapperRemote  - requesting cancel of jobId=D70C1CF5-2A35-4235-8B3C-241793E48162
35703 [pool-3-thread-1] DEBUG org.jppf.client.balancer.ClientJob  - sending cancel request for jobId=D70C1CF5-2A35-4235-8B3C-241793E48162 to driver=76045AB5-F3EA-4B1C-945F-E4C235441008
35704 [pool-3-thread-1] DEBUG org.jppf.jmxremote.message.JMXMessageHandler  - sending request JMXRequest[messageID=1, messageType=INVOKE, params=[org.jppf:name=jobManagement,type=driver, cancelJob, [D70C1CF5-2A35-4235-8B3C-241793E48162], [java.lang.String]]], channels=ChannelsPair[readingChannelID=1, writingChannelID=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, closed=false, closing=false, serverSide=false, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35704 [pool-3-thread-1] DEBUG org.jppf.jmxremote.message.JMXMessageHandler  - sending message JMXRequest[messageID=1, messageType=INVOKE, params=[org.jppf:name=jobManagement,type=driver, cancelJob, [D70C1CF5-2A35-4235-8B3C-241793E48162], [java.lang.String]]]
35706 [JPPF-0007] DEBUG org.jppf.jmxremote.nio.JMXMessageWriter  - about to send message MessageWrapper[jmxMessage=JMXRequest[messageID=1, messageType=INVOKE, params=[org.jppf:name=jobManagement,type=driver, cancelJob, [D70C1CF5-2A35-4235-8B3C-241793E48162], [java.lang.String]]]] from context JMXContext[id=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, serverSide=false, ssl=false, pendingMessages=0, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35706 [JPPF-0007] DEBUG org.jppf.nio.PlainNioObject  - read 4 bytes for PlainNioObject[, size=4, count=4, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]], location=MultipleBuffersLocation[size=4, count=4, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[Lorg.jppf.utils.JPPFBuffer;@1aa27ac3]]
35706 [JPPF-0007] DEBUG org.jppf.nio.PlainNioObject  - read 442 bytes for PlainNioObject[, size=442, count=442, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]], location=MultipleBuffersLocation[size=442, count=442, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[Lorg.jppf.utils.JPPFBuffer;@2a82ee3]]
35706 [JPPF-0007] DEBUG org.jppf.jmxremote.nio.JMXContext  - wrote 446 bytes
35706 [JPPF-0007] DEBUG org.jppf.jmxremote.nio.JMXMessageWriter  - fully sent message MessageWrapper[jmxMessage=JMXRequest[messageID=1, messageType=INVOKE, params=[org.jppf:name=jobManagement,type=driver, cancelJob, [D70C1CF5-2A35-4235-8B3C-241793E48162], [java.lang.String]]]] from context JMXContext[id=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, serverSide=false, ssl=false, pendingMessages=0, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35706 [pool-3-thread-1] DEBUG org.jppf.management.JMXConnectionWrapper  - error invoking mbean 'org.jppf:name=jobManagement,type=driver' method 'cancelJob([java.lang.String])' while not connected
java.io.IOException: java.lang.InterruptedException
at org.jppf.jmxremote.JPPFMBeanServerConnection.invoke(JPPFMBeanServerConnection.java:242)
at org.jppf.management.JMXConnectionWrapper.invoke(JMXConnectionWrapper.java:168)
at org.jppf.management.JMXDriverConnectionWrapper.cancelJob(JMXDriverConnectionWrapper.java:140)
at org.jppf.client.balancer.ClientJob.cancel(ClientJob.java:473)
at org.jppf.client.balancer.queue.JPPFPriorityQueue.cancelJob(JPPFPriorityQueue.java:290)
at org.jppf.client.balancer.JobManagerClient.cancelJob(JobManagerClient.java:374)
at org.jppf.client.AbstractGenericClient.cancelJob(AbstractGenericClient.java:395)
at be.kuleuven.******
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at be.kuleuven.******
at be.kuleuven.******
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at org.jppf.jmxremote.message.JMXMessageHandler.waitForMessage(JMXMessageHandler.java:219)
at org.jppf.jmxremote.message.JMXMessageHandler.receiveResponse(JMXMessageHandler.java:113)
at org.jppf.jmxremote.message.JMXMessageHandler.sendRequestWithResponse(JMXMessageHandler.java:97)
at org.jppf.jmxremote.JPPFMBeanServerConnection.invoke(JPPFMBeanServerConnection.java:238)
... 17 more
35707 [pool-3-thread-1] DEBUG org.jppf.jmxremote.JPPFMBeanServerConnection  - closing ChannelsPair[readingChannelID=1, writingChannelID=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, closed=false, closing=false, serverSide=false, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35707 [pool-3-thread-1] DEBUG org.jppf.jmxremote.message.JMXMessageHandler  - sending request JMXRequest[messageID=2, messageType=CLOSE, params=[]], channels=ChannelsPair[readingChannelID=1, writingChannelID=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, closed=false, closing=true, serverSide=false, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35707 [pool-3-thread-1] DEBUG org.jppf.jmxremote.message.JMXMessageHandler  - sending message JMXRequest[messageID=2, messageType=CLOSE, params=[]]
35707 [JPPF-0008] DEBUG org.jppf.jmxremote.nio.JMXMessageWriter  - about to send message MessageWrapper[jmxMessage=JMXRequest[messageID=2, messageType=CLOSE, params=[]]] from context JMXContext[id=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, serverSide=false, ssl=false, pendingMessages=0, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35708 [JPPF-0008] DEBUG org.jppf.nio.PlainNioObject  - read 4 bytes for PlainNioObject[, size=4, count=4, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]], location=MultipleBuffersLocation[size=4, count=4, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[Lorg.jppf.utils.JPPFBuffer;@4a8cfa69]]
35708 [JPPF-0008] DEBUG org.jppf.nio.PlainNioObject  - read 191 bytes for PlainNioObject[, size=191, count=191, source=null, dest=ChannelOutputDestination[channel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]], location=MultipleBuffersLocation[size=191, count=191, currentBuffer=null, currentBufferIndex=0, transferring=false, list=[Lorg.jppf.utils.JPPFBuffer;@4eb367e0]]
35708 [JPPF-0008] DEBUG org.jppf.jmxremote.nio.JMXContext  - wrote 195 bytes
35709 [JPPF-0008] DEBUG org.jppf.jmxremote.nio.JMXMessageWriter  - fully sent message MessageWrapper[jmxMessage=JMXRequest[messageID=2, messageType=CLOSE, params=[]]] from context JMXContext[id=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, serverSide=false, ssl=false, pendingMessages=0, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35709 [JPPF-0008] DEBUG org.jppf.jmxremote.nio.JMXMessageWriter  - handling CLOSE for context JMXContext[id=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, serverSide=false, ssl=false, pendingMessages=0, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35709 [JPPF-0008] DEBUG org.jppf.jmxremote.message.JMXMessageHandler  - sent request JMXRequest[messageID=2, messageType=CLOSE, params=[]], channels=ChannelsPair[readingChannelID=1, writingChannelID=2, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3, closed=false, closing=true, serverSide=false, socketChannel=java.nio.channels.SocketChannel[connected local=/10.129.179.16:56331 remote=/10.129.176.3:11111]]
35710 [pool-3-thread-1] DEBUG org.jppf.jmxremote.JPPFJMXConnector  - isClose=true, exception=null
35710 [pool-3-thread-1] DEBUG org.jppf.jmxremote.JPPFJMXConnector  - firing notif with type=jmx.remote.connection.closed, exception=null, connectionID=jppf://[0:0:0:0:0:0:0:0]:11111 3
35710 [pool-3-thread-1] DEBUG org.jppf.client.balancer.ClientJob  - setting cancelled flag on job JPPFJob[name=BATCH: Optimize total weight, phase 1, submitted by Wim, uuid=D70C1CF5-2A35-4235-8B3C-241793E48162, blocking=true, nbTasks=40, nbResults=0, hasGraph=false]
35710 [pool-3-thread-1] DEBUG be.kuleuven.******  - Result cancelling job D70C1CF5-2A35-4235-8B3C-241793E48162: true

lolo


Hello,

Yes, the InterruptedException definitely has something to do with the problem you describe. For some reason, the thread that issues the management request to cancel the job on the remote server is interrupted, which causes the cancel action to fail.

Without more information, I'm not sure how this happens. It would be very helpful to know who, exactly, is interrupting the thread. From your log extract (the thread name pool-3-thread-1 follows the default naming of java.util.concurrent.Executors), it appears that you are using an ExecutorService which creates the threads. Is your code creating it? If so, could you use a ThreadFactory that creates instances of a Thread subclass whose interrupt() method is overridden to, for example, log the name and call stack of the interrupting thread?

You can easily do this with a JPPFThreadFactory, as it creates instances of DebuggableThread. When the DEBUG log level is enabled, it logs information about the interrupting thread, including the thread description and its full call stack. Example usage:
Code:
ExecutorService myThreadPool = Executors.newFixedThreadPool(4, new JPPFThreadFactory("thread_name_prefix"));
Could you please try this and let us know the result?

Thanks for your time,
-Laurent
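If JPPFThreadFactory is not an option, the approach suggested above (a Thread subclass with an overridden interrupt()) can be sketched with JDK classes alone. InterruptTracingThreadFactory and InterruptTracingThread are hypothetical names, and the sketch prints the interrupting thread's name and stack trace to standard error rather than to a logger.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ThreadFactory whose threads report who interrupts them. */
public class InterruptTracingThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger count = new AtomicInteger();

  public InterruptTracingThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Thread names follow the pattern prefix-1, prefix-2, ...
    return new InterruptTracingThread(r, prefix + "-" + count.incrementAndGet());
  }

  /** Thread subclass that records and prints the caller of interrupt(). */
  public static class InterruptTracingThread extends Thread {
    // Name of the most recent interrupter; volatile so other threads see the update.
    private volatile String lastInterrupter;

    InterruptTracingThread(Runnable r, String name) {
      super(r, name);
    }

    @Override
    public void interrupt() {
      // Thread.currentThread() is the thread calling interrupt(), not this thread.
      String caller = Thread.currentThread().getName();
      lastInterrupter = caller;
      // The exception is never thrown; it only captures the caller's stack trace.
      new Exception(getName() + " interrupted by " + caller).printStackTrace();
      super.interrupt();
    }

    public String getLastInterrupter() {
      return lastInterrupter;
    }
  }
}
```

It plugs into an executor the same way as the JPPFThreadFactory example: Executors.newFixedThreadPool(4, new InterruptTracingThreadFactory("submitter")). Interrupts issued by the JDK itself, for example through Future.cancel(true) or ExecutorService.shutdownNow(), also go through interrupt() and are reported as well.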

wimvc


Hi lolo,

I will try that ASAP (when I get back to work :s) and let you know the result.

Best regards,
Wim.

wimvc


Thanks lolo, I managed to resolve my issue. My user interface thread (the JavaFX application thread) would interrupt my job submission thread to signal that it should cancel the remaining jobs. However, I didn't clear the interrupt status (I used Thread.currentThread().isInterrupted() instead of Thread.interrupted()), and I'm guessing the Object.wait() call from my original post was picking up this lingering interrupt status.

Thanks! Another step towards improved usability: my UI now correctly cancels lingering jobs.
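The root cause described above can be reproduced with JDK classes alone. In the stack trace from the first post, JMXMessageHandler.waitForMessage blocks in Object.wait(); in this hypothetical LingeringInterruptDemo, a timed LOCK.wait(100) stands in for that call. A pending interrupt flag makes the next wait() throw InterruptedException immediately, while consuming the flag with Thread.interrupted() lets the wait proceed normally.

```java
/** Sketch (hypothetical names): a pending interrupt flag makes the next Object.wait() throw at once. */
public class LingeringInterruptDemo {

  private static final Object LOCK = new Object();

  /** Returns true if a timed wait() threw InterruptedException. */
  static boolean waitThrows() {
    synchronized (LOCK) {
      try {
        LOCK.wait(100); // stand-in for a blocking call that waits for a response
        return false;
      } catch (InterruptedException e) {
        // Throwing InterruptedException also clears the interrupt flag.
        return true;
      }
    }
  }

  public static void main(String[] args) {
    // Simulate the UI thread interrupting the submission thread.
    Thread.currentThread().interrupt();
    // isInterrupted() only reads the flag, so it is still set when wait() runs.
    boolean seen = Thread.currentThread().isInterrupted();
    System.out.println("seen: " + seen + ", wait threw: " + waitThrows());
    // prints "seen: true, wait threw: true"

    // Thread.interrupted() reads AND clears the flag.
    Thread.currentThread().interrupt();
    boolean consumed = Thread.interrupted();
    System.out.println("consumed: " + consumed + ", wait threw: " + waitThrows());
    // prints "consumed: true, wait threw: false"
  }
}
```

In the submission thread, this corresponds to the fix reported here: once the interrupt has been acted upon, consume it with Thread.interrupted() before making further blocking calls such as cancelJob.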

lolo


Hi Wim,

Thank you very much for the feedback. I'm glad you could figure it out :)

-Laurent