JPPF
Topic: Dropped WiFi connection might be crashing the driver

fommil

  • JPPF Grand Master
  • ****
  • Posts: 77
Dropped WiFi connection might be crashing the driver
« on: December 02, 2011, 04:53:33 PM »

Hello,

I *think* we have finally isolated a failure mode of the driver that is causing it to reject task submissions. Take this with a pinch of salt, because it's a really hard one to reproduce and it could just be coincidence.

When the machine running the driver loses its WiFi connection, the following exception trace is seen on the terminal that started the driver:

Dec 1, 2011 9:52:21 PM org.jppf.comm.discovery.JPPFBroadcaster run
SEVERE: Network is down
java.io.IOException: Network is down
   at java.net.PlainDatagramSocketImpl.send(Native Method)
   at java.net.DatagramSocket.send(DatagramSocket.java:625)
   at org.jppf.comm.discovery.JPPFBroadcaster.run(JPPFBroadcaster.java:134)
   at java.lang.Thread.run(Thread.java:680)

I don't believe we've seen this error before. That said, I only recently started making the terminal output visible. Previously I was relying only on the log, and I have never seen it there.

We don't see this message if we manually turn off the WiFi or the WiFi router. It only seems to happen when the WiFi signal is lost due to effects that we are not aware of (e.g. perhaps some kind of local interference).

This can happen every few days in an office environment, and the machine usually recovers the signal itself. However, JPPF doesn't seem to recover.

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 1434
    • JPPF Web site
Re: Dropped WiFi connection might be crashing the driver
« Reply #1 on: December 07, 2011, 08:19:24 AM »

Hello,

I have simulated this problem by generating an exception at the same place in the code of org.jppf.comm.discovery.JPPFBroadcaster.run(). So, even if the cause is different, the effect is similar.
The effect is that the discovery broadcasting service of the driver stops working. Basically, this will prevent any node or client from discovering the server (if they use discovery). Already connected clients and nodes should not be affected. I verified that my client was still able to submit jobs and the nodes were still able to execute them, even after the exception occurred.
However, new clients/nodes configured with discovery enabled were unable to find the server. Is this what is happening in your environment? Are you also observing other errors/exceptions in the client or nodes?

Also, I'm puzzled why the exception is not logged. The code where the exception occurs is as follows (you can see the actual code there):

Code:
try
{
  Pair<MulticastSocket, DatagramPacket> socketInfo = it.next();
  socketInfo.first().send(socketInfo.second());
}
catch(Exception e)
{
  log.error(e.getMessage(), e);
  it.remove();
}


As you can see, JPPF does log the exception. I suspect there is an issue in your logging configuration: it's probably configured to log only to the console rather than to a file.

I will continue to investigate this, and see how the broadcasting service can recover from that. I'll keep you updated in this thread.
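
Editor's note: one way a broadcast loop could recover is to keep a failing sender and retry it on the next cycle, instead of removing it as the snippet above does with it.remove(). The following is only a minimal sketch of that idea and not the JPPF implementation. The RetryingBroadcaster class and its Sender interface are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RetryingBroadcaster {
  /** Hypothetical stand-in for a socket/packet pair; not a JPPF type. */
  public interface Sender {
    void send() throws IOException;
  }

  private final List<Sender> senders = new ArrayList<>();
  private int failures = 0;

  public RetryingBroadcaster(List<Sender> senders) {
    this.senders.addAll(senders);
  }

  /**
   * Runs one broadcast cycle and returns how many sends succeeded.
   * A failing sender is reported and kept, so it is tried again on the next cycle.
   */
  public int broadcastOnce() {
    int sent = 0;
    for (Sender s : senders) {
      try {
        s.send();
        sent++;
      } catch (IOException e) {
        failures++;
        System.err.println("send failed, will retry next cycle: " + e.getMessage());
      }
    }
    return sent;
  }

  public int failureCount() { return failures; }

  public int senderCount() { return senders.size(); }
}
```

With this approach a temporary "Network is down" error costs one missed broadcast per cycle, and broadcasting resumes by itself once the network is back. A real service would probably also want to cap or back off retries for an interface that has gone away for good.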

Sincerely,
-Laurent



fommil

  • JPPF Grand Master
  • ****
  • Posts: 77
Re: Dropped WiFi connection might be crashing the driver
« Reply #2 on: December 07, 2011, 09:52:51 AM »

Excellent, well done!! I have no more debugging for you unfortunately.

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 1434
    • JPPF Web site
Re: Dropped WiFi connection might be crashing the driver
« Reply #3 on: December 08, 2011, 06:47:11 AM »

Hello,

As I mentioned in my previous post, I believe the problem with your log is that it is not configured to log to a file.
From the stack you posted, it looks very much like JDK logging using the default simple formatter, whereas the default logging in JPPF is log4j with a very different format.
To change the level of information that is logged, and where it is logged, you could use this logging configuration file as a starting point and adapt it to your needs. Would you mind doing that and posting the resulting log file?
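
Editor's note: to check that a JDK logging setup really writes exceptions to a file, a small standalone test can help. The sketch below attaches a FileHandler and a ConsoleHandler programmatically with the standard java.util.logging API and logs one SEVERE record with a stack trace. The class name FileLoggingDemo and the logger name "demo.broadcaster" are made up for illustration and are not part of JPPF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class FileLoggingDemo {
  /** Logs one SEVERE record to the console and to logFile, then returns the file content. */
  public static String logToFile(Path logFile) throws IOException {
    Logger logger = Logger.getLogger("demo.broadcaster"); // hypothetical logger name
    logger.setUseParentHandlers(false); // avoid a second console copy via the root logger

    FileHandler fileHandler = new FileHandler(logFile.toString(), false);
    fileHandler.setFormatter(new SimpleFormatter()); // FileHandler defaults to XML otherwise
    ConsoleHandler consoleHandler = new ConsoleHandler();
    logger.addHandler(fileHandler);
    logger.addHandler(consoleHandler);
    try {
      logger.log(Level.SEVERE, "Network is down", new IOException("Network is down"));
    } finally {
      fileHandler.close(); // flushes the record to disk and releases the lock file
      logger.removeHandler(fileHandler);
      logger.removeHandler(consoleHandler);
    }
    return new String(Files.readAllBytes(logFile));
  }
}
```

If the returned text contains the exception and its stack trace, the file handler works, and any remaining difference between console and file output is more likely to come from the logging configuration file that is loaded at startup.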

In the meantime, I have registered a bug for this issue: 3454260 - Broadcasting service does not recover from network breakdown. I will work on a fix and hopefully provide a patch soon.

Sincerely,
-Laurent

fommil

  • JPPF Grand Master
  • ****
  • Posts: 77
Re: Dropped WiFi connection might be crashing the driver
« Reply #4 on: December 08, 2011, 11:37:46 AM »

The whole network goes down when this happens. Is that because the "already connected" nodes time out from the driver (because it is no longer connected to the intranet) and are then unable to find it again when it reconnects?

I am using SLF4J, with a JDK backend. I'm using the following config. I'll use the JPPF formatter for a few days and see if the problem comes up again.


handlers = java.util.logging.FileHandler
.level = INFO

# Handler specific properties
java.util.logging.FileHandler.level = FINEST
#java.util.logging.FileHandler.formatter = org.jppf.logging.jdk.JPPFLogFormatter
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.append = false
java.util.logging.FileHandler.limit = 1000000
java.util.logging.FileHandler.pattern = %h/.thinkgridserver.log

# Facility specific properties.
com.level = SEVERE
sun.level = SEVERE
java.level = SEVERE
javax.level = SEVERE
org.jppf.level = INFO
org.jppf.utils.level = SEVERE
org.jppf.server.nio.level = FINEST
org.jppf.server.nio.NioObject.level = INFO
org.jppf.server.nio.nodeserver.TaskQueueChecker.level = INFO

fommil

  • JPPF Grand Master
  • ****
  • Posts: 77
Re: Dropped WiFi connection might be crashing the driver
« Reply #5 on: December 08, 2011, 11:42:10 AM »

Incidentally - if I want to turn broadcast mode off how would I do that? Our "driver" machine doesn't have a fixed IP address but it does have a fixed hostname on the local network.

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 1434
    • JPPF Web site
Re: Dropped WiFi connection might be crashing the driver
« Reply #6 on: December 08, 2011, 07:40:53 PM »

Hi,

If you want to turn the broadcasting mode off, it has to be done via the driver's static configuration, by specifying "jppf.discovery.enabled = false".
You will then have to manually configure the nodes and clients to point to the driver's IP address.

Sincerely,
-Laurent

fommil

  • JPPF Grand Master
  • ****
  • Posts: 77
Re: Dropped WiFi connection might be crashing the driver
« Reply #7 on: December 08, 2011, 07:43:45 PM »

I know how to do it for static IP, but it'd need to work for hostnames to be useful for me. OK, no problem - we'll just wait for broadcast mode to be fixed :-)

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 1434
    • JPPF Web site
Re: Dropped WiFi connection might be crashing the driver
« Reply #8 on: December 08, 2011, 08:24:05 PM »

Sorry for being unclear. In fact, you can use host names instead of IP addresses; this works the same way.
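
Editor's note: at the Java level, a host name and a literal IP address go through the same resolution step when a socket address is created, which is why either form can be used. The sketch below reads a host and port from configuration text and resolves them. The property names "jppf.server.host" and "jppf.server.port" and the default port 11111 are assumptions here, so check the documentation of your JPPF version. The ManualDriverAddress class is a hypothetical helper, not a JPPF API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.util.Properties;

public class ManualDriverAddress {
  /** Reads the driver host and port from configuration text and resolves them. */
  public static InetSocketAddress resolve(String configText) throws IOException {
    Properties config = new Properties();
    config.load(new StringReader(configText));
    // Assumed property names and default port; verify against your JPPF version.
    String host = config.getProperty("jppf.server.host", "localhost");
    int port = Integer.parseInt(config.getProperty("jppf.server.port", "11111").trim());
    // InetSocketAddress accepts a host name or a literal IP address and resolves it here.
    return new InetSocketAddress(host, port);
  }
}
```

For example, passing the text "jppf.discovery.enabled = false" followed by "jppf.server.host = localhost" on the next line gives a resolved address, and replacing localhost with 127.0.0.1 goes through exactly the same code path.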

-Laurent