
Author Topic: Taking JPPF 5.0 for a test drive  (Read 929 times)

dbwiddis

  • JPPF Council Member
  • *****
  • Posts: 106
Taking JPPF 5.0 for a test drive
« on: February 11, 2015, 04:09:37 AM »

I've been running into some server responsiveness issues with the latest 4.2 series releases, including slave nodes permanently disconnecting and the driver's thread count climbing to large values during the "freeze". Seeing that this issue was probably addressed in JPPF-316, I decided to give the 5.0 beta a try.

It's like switching from an old Volkswagen to a Lamborghini.  8)

A few notes about the transition, with some suggestions along the way.

Installation
Mostly a painless process of migrating the changed jar files into the old folders and updating a few deprecated classes.  I only missed two things on the first attempt.  First, I had a pluggable MBean installed, and due to the refactoring of package locations in the new distribution I also needed the driver's jar in my client's class path.  Even when I knew the name of the package I needed (org.jppf.management.spi), it wasn't obvious which jar file to find it in.  Second, I forgot to disable automatic discovery of drivers, which seems to prevent the GUI from finding my driver even when I specify the IP and port.

Suggestion 1: Somewhere in the API documentation, separate packages by which jar they are found in.

Suggestion 2: When 5.0 is released, have a condensed list of newly deprecated classes and their equivalent (e.g., I've found JPPFTask -> AbstractTask<String> but haven't dug into what to do with my NodeSelectors yet.)

Suggestion 3:  Why can't the Admin GUI find my driver when I specify an IP/port, if it's configured to automatically discover drivers (and not finding any)? Or if it's supposed to, is this a bug?

Speaking of bugs, you just fixed one.  Are the zip files on Sourceforge already updated with the new code, or is there a periodic "release" schedule of updated beta code?

Execution
The new driver is much more stable.  I tried hard to make a lot of nodes connect at once through mass slave provisioning, rebooting, and other events.  While I did occasionally get it to "lose" connection to some of the newly connecting nodes, they did eventually realize they were disconnected and reconnected on their own, instead of the 4.x behavior where nodes just disappeared without a trace when the server hiccuped.

Admin UI
I love the new information displayed and features available on the admin UI, particularly the whole-server memory and CPU, the sorting of servers on the JVM Health tab, the deferred restart and shutdown options, and the checkbox for node provisioning to not interrupt running nodes.  I've played around with these features and they work very well.  I threw several challenging configuration changes at my network which had brought my previous driver to a grinding halt (see above) and it handled them just fine.

The only thing I think may be missing in this whole process is some method of displaying when a node has a pending event scheduled.  For example, if I told a node to shut down (deferred until idle), there was no obvious indication that it was going to shut down, and no apparent way to cancel that instruction either.  I had expected some sort of highlighting similar to toggling a node's active state.  I do assume that the slave node count is one possible indication, as the slave node number on the GUI wouldn't match the displayed number of slaves during pending changes, but this wasn't a reliable indication when I was throwing multiple changes at it at the same time (changing slave nodes and deferred shutdowns both pending).

Suggestion 1: Have some sort of display indicating a pending action.  This could possibly be shown in the "Node Status" or "Exec Status" columns.

Suggestion 2: Permit an unexecuted pending action (shutdown, reboot, reprovisioning) to be cancelled, if it can be done easily.

New features
So far I've just been focused on getting things going like they were (but more smoothly).  The next step is simplifying a lot of my driver JMX workaround kludges made obsolete by the new "deferred" options.  I'll have to dig into the API to find some of these, but current ideas/thoughts/questions:
  • My current client code calls an external API to shut down the server immediately after a node is shut down.  If I transition to the deferred shutdown model, how can I detect when a node is shut down in order to call this API?  Is it possible for the node itself to call an external API as its last command when shutting down?
  • You mentioned in another thread regarding processor affinity that it might be possible with the .NET bridge.  Is this functionality available yet, and if so, how can I use it?

lolo

  • Administrator
  • JPPF Council Member
  • *****
  • Posts: 2256
    • JPPF Web site
Re: Taking JPPF 5.0 for a test drive
« Reply #1 on: February 11, 2015, 09:49:32 AM »

Hello,

Thank you very much for taking the time to test the 5.0 beta, and especially for taking the time to write down and share your experience.

Regarding the stability issues, you have certainly noticed that a number of related bugs, many of which you reported yourself, have been fixed in the various 4.2.x maintenance releases. All of these were ported to the 5.0 code as well. I'm also doing extra stress-testing before releasing 5.0, including, for example, tests where 90% of the nodes are dropped, then restarted, periodically, with jobs continually executing meanwhile. Reliability of the driver is vital, and it is always our priority.

Let me try to address your other questions and suggestions:

Installation

Quote
Suggestion 1: Somewhere in the API documentation, separate packages by which jar they are found in.
Not sure what you mean exactly here. The problem is that some packages are distributed in more than one jar, although each class is in a single jar. Would a paragraph in the package's Javadoc, stating which jars it can be found in, be sufficient?

Quote
Suggestion 2: When 5.0 is released, have a condensed list of newly deprecated classes and their equivalent
It seems to me that this is already available in the deprecated list of the API documentation. Does this address your need?

Quote
Suggestion 3:  Why can't the Admin GUI find my driver when I specify an IP/port, if it's configured to automatically discover drivers
To have both manually defined and auto-discovered drivers, you need to add the special driver name "jppf_discovery" to the list of manually defined drivers, as in this example:
Code:
jppf.drivers = jppf_discovery driver1
driver1.jppf.server.host = a.b.c.d
driver1.jppf.server.port = 11111
The GUI will attempt to connect to the driver at a.b.c.d:11111, and to any other driver broadcasting its connection information.
This has been a feature since JPPF 3.3, if I recall properly. Related documentation can be found here

Quote
Are the zip files on Sourceforge already updated with the new code, or is there a periodic "release" schedule of updated beta code?
Neither  :(  I will release an updated RC1 today or tomorrow at the latest. This will include some performance improvements, following the round of profiling I'm currently conducting.

Admin UI

Quote
Suggestion 1: Have some sort of display indicating a pending action.  This could possibly be shown in the "Node Status" or "Exec Status" columns.
Suggestion 2: Permit an unexecuted pending action (shutdown, reboot, reprovisioning) to be cancelled, if it can be done easily.

I registered JPPF-366 "Enable the nodes to expose and cancel any pending/deferred action" for this.

New features:

Quote
My current client code calls an external API to shut down the server immediately after a node is shut down.  If I transition to the deferred shutdown model, how can I detect when a node is shut down in order to call this API?  Is it possible for the node itself to call an external API as its last command when shutting down?
Having the node call an API upon shutdown should be easy enough to do with a JVM shutdown hook. Another possibility is to use the new grid topology monitoring API, and in particular to register a topology listener to handle nodeRemoved() events. As a side note, this monitoring API is the result of refactoring the admin console, which is now based on it.
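
The shutdown hook approach could look roughly like the sketch below. It uses only the standard Runtime.addShutdownHook() call and no JPPF classes; the class name NodeShutdownNotifier and the Runnable standing in for the external "shut down the server" call are hypothetical, and how the hook gets registered inside the node's JVM is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch: runs a caller-supplied action once when the JVM exits.
 * NodeShutdownNotifier is a hypothetical name, not part of JPPF.
 */
public class NodeShutdownNotifier {
  private final Runnable action;
  private final AtomicBoolean fired = new AtomicBoolean(false);

  public NodeShutdownNotifier(Runnable action) {
    this.action = action;
  }

  /** Runs the action at most once; failures are logged so JVM shutdown continues. */
  public void fire() {
    if (fired.compareAndSet(false, true)) {
      try {
        action.run();
      } catch (RuntimeException e) {
        System.err.println("shutdown action failed: " + e);
      }
    }
  }

  /** Reports whether the action has already been triggered. */
  public boolean hasFired() {
    return fired.get();
  }

  /** Registers the hook with the JVM and returns its thread so it can be removed later. */
  public Thread register() {
    Thread hook = new Thread(this::fire, "node-shutdown-notifier");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  public static void main(String[] args) {
    // Placeholder for the real external API call (assumption for illustration).
    new NodeShutdownNotifier(
        () -> System.out.println("calling external shutdown API")).register();
  }
}
```

Keep in mind that a JVM shutdown hook runs on any orderly JVM exit, which may include restarts, and does not run if the process is killed forcibly, so the external call may need to tell those cases apart.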

Sincerely,
-Laurent

dbwiddis

  • JPPF Council Member
  • *****
  • Posts: 106
Re: Taking JPPF 5.0 for a test drive
« Reply #2 on: February 11, 2015, 06:50:35 PM »

Quote
Would a paragraph in the package's Javadoc, stating which jars it can be found in, be sufficient?
Yes.  For example, if this page specified it was found in the driver.jar file, that'd be great.

Quote
It seems to me that this is already available in the deprecated list of the API documentation. Does this address your need?
Perfectly!  Hadn't found that yet but it's exactly what I needed. 

Quote
To have both manually defined and auto-discovered drivers...
I actually don't want or need both; I just want one manually defined driver.  I ran afoul of the config file defaults by assuming I could just change the IP and it would work.  Digging into the admin-gui config file again, I see this comment:
Code:
#------------------------------------------------------------------------------#
# Space-separated list of named drivers this client may connect to.            #
# If auto discovery of the server is enabled, this needs not be specified.     #
#------------------------------------------------------------------------------#

jppf.drivers = driver1
and
Code:
# Host name, or ip address, of the host the JPPF driver is running on
# If auto discovery of the server is enabled, this needs not be specified.
driver1.jppf.server.host = localhost
And then further down:
Code:
# Enable or disable discovery of JPPF drivers. Defaults to true (enabled)
#jppf.discovery.enabled = true

The default settings appear to conflict with each other: a jppf.drivers entry and its host are specified by default, despite a comment saying they should not be if discovery is enabled, which it is by default.  So out of the box, a user can't just change localhost to the IP of their driver; they also need to disable discovery.  I'd suggest that if auto discovery is the intended default, these initial driver settings be commented out, prompting the user to read the comment above the property declaration when uncommenting (rather than simply editing).

You might also consider adding a (commented out) example of both a manual and jppf_discovery driver together, per your linked documentation, or including that option in the comments that say it needs to not be specified.
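
For reference, a manual-only setup along these lines might look like the sketch below. It uses only the property names quoted in this thread; the address 192.168.1.10 and port 11111 are placeholders, and ManualDriverConfig is a hypothetical helper that parses the text with plain java.util.Properties rather than any JPPF API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: one manually defined driver, with discovery disabled (hypothetical helper). */
public class ManualDriverConfig {
  /** Configuration text; host and port are placeholder values. */
  public static final String CONFIG = String.join("\n",
      "# Discovery defaults to true, so it has to be disabled explicitly",
      "jppf.discovery.enabled = false",
      "jppf.drivers = driver1",
      "driver1.jppf.server.host = 192.168.1.10",
      "driver1.jppf.server.port = 11111");

  /** Parses properties-file text into a Properties object. */
  public static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Discovery counts as enabled unless the property is explicitly set to false. */
  public static boolean discoveryEnabled(Properties props) {
    return Boolean.parseBoolean(
        props.getProperty("jppf.discovery.enabled", "true").trim());
  }

  public static void main(String[] args) {
    Properties props = load(CONFIG);
    System.out.println("discovery enabled: " + discoveryEnabled(props));
    System.out.println("drivers: " + props.getProperty("jppf.drivers"));
  }
}
```

The five lines in CONFIG are what would go into the client or admin-gui config file; the surrounding Java only shows that an absent jppf.discovery.enabled is treated as true, which is the default described in the file's own comment.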

(This has been this way through the 4.x releases as well, so this isn't a new 5.0 issue, but I recall struggling with the initial connection defaults the first time I installed as well.)

Quote
Having the node call an API upon shutdown should be easy enough to do with a JVM shutdown hook. Another possibility is to use the new grid topology monitoring API, and in particular to register a topology listener to handle nodeRemoved() events. As a side note, this monitoring API is the result of refactoring the admin console, which is now based on it.
The grid topology monitoring sounds like the perfect solution, and I have some other code I can convert to use this API as well.  Thanks!