JPPF

The open source
grid computing
solution


Topic: Potential use of JPPF on a project

rviloria

  • Guest
Potential use of JPPF on a project
« on: December 28, 2007, 11:29:53 AM »

I'm doing some research on potential scheduling infrastructures and I wanted to ask the following about your product:

1) If the tasks were developed in a different language such as C++, C, or Perl, and they use files for input and drop files for output (results), what would be my strategy for handling this with JPPF? A Java wrapper? What are the disadvantages when the task has a non-Java implementation?

2) At our site the compute cluster resources already have their own scheduler/resource manager that cannot be avoided. I was potentially considering running JPPF as more of a scheduler running underneath the existing scheduler, where the existing scheduler's tasks would be starting up and running the JPPF infrastructure for a specified amount of compute time. This way once the JPPF infrastructure is all started, then JPPF will interschedule tasks while appearing to be a compute task itself to the existing scheduler. What would be challenges involved in potentially using JPPF in this scenario?

3) The network infrastructure is very restrictive because of security policies. The only port likely to be allowed is 443 TCP: no UDP, and definitely not a range of IPs. A non-default SSH port number might be a possibility. What are the disadvantages of this scenario for a JPPF infrastructure?

Thanks in Advance!

Ron

lolo

  • Administrator
  • JPPF Council Member
Potential use of JPPF on a project
« Reply #1 on: December 28, 2007, 01:46:21 PM »

Hello Ron,

Welcome to the JPPF forums.

I have a few answers and guidelines that I hope you will find useful:

1) to run non-java tasks, I see 2 possible approaches:

- using a JNI wrapper.
This would probably work with C++ or C programs, however I am not sure how easy it is to do with Perl programs. Also, JNI-based implementations are generally tightly coupled with the external programs and pose difficult challenges in terms of maintenance.

- using JPPF tasks to start external programs.
This approach has the advantages of providing a very loose coupling between the Java and non-Java parts, allowing you to keep your external code doing what it's already doing, while the Java wrapper focuses on providing the required inputs and getting the resulting outputs. I definitely recommend that approach, as it is, in my opinion, the one that requires the least amount of integration and maintenance.
We have actually started working on this approach, and some APIs are already available in the current distribution. They provide a way to start an external program or shell script from a JPPF task, and specify a set of input files or urls as well as output files or urls.
It's not completely tested yet, and I don't think it covers all use cases; however, it is something that you can use and build upon.
Currently, the only documentation available for this is in the javadoc, you can find it at this location.
I would be happy to get your feedback on it, and implement the features you believe are missing or incomplete.
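
As a rough illustration of the second approach, here is a minimal sketch in plain Java of what the body of such a wrapper task might do: start the external program, capture what it prints, and report its exit code. It uses only the standard `ProcessBuilder` API (Java 9 or later for `readAllBytes`), not the JPPF classes mentioned above, so `ExternalProgramRunner`, `Result` and `run` are hypothetical names chosen for this example. Presumably the call would sit inside the task's execution method.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the JPPF API. */
final class ExternalProgramRunner {
  private ExternalProgramRunner() {}

  /** Exit code and captured console output of one external run. */
  static final class Result {
    final int exitCode;
    final String output;

    Result(int exitCode, String output) {
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  /**
   * Starts the given command (program followed by its arguments), optionally
   * in a working directory where its input and output files live.
   */
  static Result run(List<String> command, File workingDir)
      throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    if (workingDir != null) {
      builder.directory(workingDir);
    }
    // Merge stderr into stdout so a single stream has to be drained.
    builder.redirectErrorStream(true);
    Process process = builder.start();
    // Drain the output before waiting, so the child cannot block on a full pipe.
    String output = new String(
        process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
    int exitCode = process.waitFor();
    return new Result(exitCode, output);
  }
}
```

A wrapper task could then call something like `ExternalProgramRunner.run(List.of("perl", "model.pl", "input.dat"), workDir)` (the script and file names are made up) and store the exit code and output as the task result. The external program itself stays untouched, which is the loose coupling described above.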

2) From a high-level view, I see 2 main challenges:

- Integration with the cluster architecture.
You will probably have to develop a JPPF client that is capable of communicating with the resource manager and transforming a job into a set of JPPF tasks. I recommend that the JPPF server and nodes be running all the time, as opposed to started on demand, so as not to infringe too much on the job's allocated time.

- Scheduling constraints
Here I'm thinking mostly about time constraints, i.e. how long a job is allowed to run. I believe JPPF can accommodate this by specifying a timeout on the tasks. There is a feature that enables tasks to time out after a specific length of time or at a specified date/time. This is documented here.

With these points in mind, I believe your idea will work.
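
To make the idea of a task-level time limit concrete, here is a small generic sketch using only `java.util.concurrent`. It is not the JPPF timeout feature itself, whose API is not shown in this thread; `TimeLimitedCall` and `call` are hypothetical names. It only illustrates the behaviour of "give up after a specific length of time".

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration only; not part of the JPPF API. */
final class TimeLimitedCall {
  private TimeLimitedCall() {}

  /**
   * Runs the work on a separate thread and waits at most the given time.
   * Throws TimeoutException if the work has not finished by then.
   */
  static <T> T call(Callable<T> work, long timeout, TimeUnit unit) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = executor.submit(work);
      try {
        return future.get(timeout, unit);
      } catch (TimeoutException e) {
        // Ask the worker thread to stop by interrupting it.
        future.cancel(true);
        throw e;
      }
    } finally {
      // Release the worker thread whether the call succeeded or not.
      executor.shutdownNow();
    }
  }
}
```

Note that cancellation here relies on interruption, so work that ignores interrupts keeps running in the background. For a wrapped external program, the process would also have to be destroyed explicitly.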

3) Network restrictions are the biggest pain point when using JPPF. JPPF communications use a custom protocol built directly on top of TCP/IP sockets (no SSL). A JPPF network requires 3 distinct TCP ports, there's no way around that in the current implementation. It may be possible to work around this limitation through SSH tunneling, or some other kind of tunneling, however I'm no expert on this topic and I would defer to a network specialist.

I hope this helps,
-Laurent

rviloria

  • Guest
Potential use of JPPF on a project
« Reply #2 on: December 28, 2007, 09:28:41 PM »

Quote from: "lolocohen"
....
1)...
- using JPPF tasks to start external programs.
This approach has the advantages of providing a very loose coupling between the Java and non-Java parts, allowing you to keep your external code doing what it's already doing, while the Java wrapper focuses on providing the required inputs and getting the resulting outputs. I definitely recommend that approach, as it is, in my opinion, the one that requires the least amount of integration and maintenance.
We have actually started working on this approach, and some APIs are already available in the current distribution. They provide a way to start an external program or shell script from a JPPF task, and specify a set of input files or urls as well as output files or urls.
It's not completely tested yet, and I don't think it covers all use cases; however, it is something that you can use and build upon.
Currently, the only documentation available for this is in the javadoc, you can find it at this location.
I would be happy to get your feedback on it, and implement the features you believe are missing or incomplete.


Yes, the Java wrapper is my current approach: very loosely coupled, with only minor modification needed when a new non-Java "compute model" has to be deployed. In fact, Java-based models are processed in the same way. I need to actually dive in and attempt an implementation to know all the devils in the details, but I would like to find out how, in this case for non-Java compute models, one deals with input and output file transfers. Can JPPF deal with situations where there is a shared filesystem and cases where there isn't with respect to input/output/data files?

Quote from: "lolocohen"
2) From a high-level view, I see 2 main challenges:

- Integration with the cluster architecture.
You will probably have to develop a JPPF client that is capable of communicating with the resource manager and transforming a job into a set of JPPF tasks. I recommend that the JPPF server and nodes be running all the time, as opposed to started on demand, so as not to infringe too much on the job's allocated time.
...


Unfortunately, in this case the nodes cannot be running all the time. Actually, I'm thinking very loosely coupled here as well: an on-demand JPPF cluster that runs as a compute job within an existing scheduler's environment. So for example, I could request some number of nodes for a specified period of wall clock time, and all they do is start up the JPPF infrastructure. After that I make a submission to the JPPF infrastructure and run jobs, and when I'm done I take down the JPPF infrastructure. If the nodes stay up all the time, the accounting mechanism will charge time even though the nodes are idle. The other main reason for remaining loosely coupled is that this technique could also be used on other compute cluster resources at the site, at varying security levels, and potentially at different sites. The customization portion is how do I start up and take down the JPPF on demand infrastructure for a specific compute cluster's scheduler.

Quote from: "lolocohen"
3) Network restrictions are the biggest pain point when using JPPF. JPPF communications use a custom protocol built directly on top of TCP/IP sockets (no SSL). A JPPF network requires 3 distinct TCP ports, there's no way around that in the current implementation. It may be possible to work around this limitation through SSH tunneling, or some other kind of tunneling, however I'm no expert on this topic and I would defer to a network specialist.


Yes, this may be an issue for the ideal situation (where I could potentially run a single JPPF infrastructure using compute resources from multiple locations simultaneously). I was hoping there was a way to multiplex things across 443. If JPPF used a single port then SSH tunneling could be the way, but I'm not sure if there is a way to funnel the use of 3 ports through one. Is the 3-port requirement on the server, the client, or both?

Thanks for the comments so far.

lolo

  • Administrator
  • JPPF Council Member
Potential use of JPPF on a project
« Reply #3 on: December 30, 2007, 01:03:44 PM »

Hello Ron,

Quote
Can JPPF deal with situations where there is a shared filesystem and cases where there isn't with respect to input/output/data files?

The short answer is yes, JPPF supports input/output files on both shared file systems and using protocols such as HTTP or FTP. For instance, you can use the standard IO APIs for file access (including NIO), or the standard URL connection APIs for remote protocols.  Let me know if you need more details on how to do this.
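
For illustration, here is a minimal sketch of how a task could fetch an input file either from a shared file system or from a remote URL, using only the standard IO and URL APIs mentioned above. `InputFetcher`, `open` and `fetchTo` are hypothetical names for this example, not JPPF classes, and the list of recognised URL schemes is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the JPPF API. */
final class InputFetcher {
  private InputFetcher() {}

  /** Opens a location that is either a URL (http, https, ftp, file) or a plain file path. */
  static InputStream open(String location) throws IOException {
    String lower = location.toLowerCase(Locale.ROOT);
    boolean isUrl = lower.startsWith("http:") || lower.startsWith("https:")
        || lower.startsWith("ftp:") || lower.startsWith("file:");
    if (isUrl) {
      // Remote (or file:) URL: use the standard URL connection machinery.
      return URI.create(location).toURL().openStream();
    }
    // Anything else is treated as a path on a local or shared file system.
    return Files.newInputStream(Paths.get(location));
  }

  /** Copies the content at the given location to a local target file and returns the target. */
  static Path fetchTo(String location, Path target) throws IOException {
    try (InputStream in = open(location)) {
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
    return target;
  }
}
```

With a shared file system the task can simply pass the shared path. Without one, the same call with an HTTP or FTP URL downloads the input to a local working file before the external program runs. Output files would go the opposite way and are not covered by this sketch.
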
Quote
how do I start up and take down the JPPF on demand infrastructure for a specific compute cluster's scheduler.

Here, the basic problem is how to start a Java process (JPPF node) on a node of your cluster. I do not know how your cluster infrastructure works, so it's difficult to make recommendations. However, if the capability exists in your scheduler, the simplest would be to instruct it to start a shell script that will start the JPPF node, and kill that process when the node is not needed anymore. JPPF is fault-tolerant and supports starting and taking down its components in any order and at any time, so that will help.
Quote
Yes, this may be an issue for the ideal situation (where I could potentially run a single JPPF infrastructure using compute resources from multiple locations simultaneously). I was hoping there was a way to multiplex things across 443. If JPPF used a single port then SSH tunneling could be the way, but I'm not sure if there is a way to funnel the use of 3 ports through one. Is the 3-port requirement on the server, the client, or both?

In terms of ports requirements, here is how it works:
  • The server binds to 3 ports; let's name them a, b and c. Port a is used for communication between client and server, to submit a job and get the results. Port b is used by the distributed class loader. Port c is used for communication between the server and the nodes, to dispatch tasks to the nodes and get the execution results.
  • Clients require a and b.
  • Nodes require b and c.
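
When planning firewall rules, the list above can be written down as a small lookup, together with a basic reachability check. This is only a sketch: `PortRequirements`, `Port`, `Role` and `canConnect` are hypothetical names, and the actual port numbers are left as parameters because they depend on your configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the JPPF API. */
final class PortRequirements {
  private PortRequirements() {}

  /** The three server ports, named as in the list above. */
  enum Port { A, B, C }

  /** Which of the server's ports each kind of component needs to reach (or bind, for the server). */
  enum Role {
    SERVER(EnumSet.allOf(Port.class)),
    CLIENT(EnumSet.of(Port.A, Port.B)),
    NODE(EnumSet.of(Port.B, Port.C));

    final Set<Port> ports;

    Role(Set<Port> ports) {
      this.ports = ports;
    }
  }

  /** Returns true if a TCP connection to host:port can be opened within the timeout. */
  static boolean canConnect(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

For example, on a machine that will host a node, calling `canConnect` for each port in `Role.NODE.ports` (mapped to whatever numbers the server is configured with) shows quickly whether the firewall lets the required traffic through.
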
I have registered the following feature request to add TCP port multiplexing features to JPPF: 1860861 - Provide TCP port multiplexing
We intend to deliver it for the next minor point release (JPPF 1.1).

I hope this helps,
-Laurent

rviloria

  • Guest
Potential use of JPPF on a project
« Reply #4 on: December 30, 2007, 01:44:16 PM »

Quote from: "lolocohen"
Hello Ron,
....
I hope this helps,
-Laurent


You have answered all my questions so far very well. Thank you again for your responses.

Ron

rviloria

  • Guest
Re: Potential use of JPPF on a project
« Reply #5 on: August 05, 2009, 07:39:31 AM »

Quote from: "lolocohen"
Hello Ron,
....
1) to run non-java tasks, I see 2 possible approaches:
...
2) From a high-level view, I see 2 main challenges:
...
3) Network restrictions are the biggest pain point when using JPPF.
...
I hope this helps,
-Laurent

How are these points being addressed now with the latest release? It's been a while since I reassessed JPPF, and I've recently crossed paths with it again.

-Ron

lolo

  • Administrator
  • JPPF Council Member
Re: Potential use of JPPF on a project
« Reply #6 on: August 06, 2009, 09:12:58 AM »

Hello Ron,

I believe most of the points we discussed have now been implemented, or will be in the upcoming JPPF 2.0 version.

For executing non-Java tasks, there is now a specific API. We have documented it here: http://www.jppf.org/wiki/index.php?title=Running_a_non-Java_program_or_script

In terms of accommodating constrained network environments, we developed a module specifically for this, which we call the TCP multiplexer. Its goal is to route all JPPF traffic through a single port.
We have documentation on this module at this location: http://www.jppf.org/wiki/index.php?title=JPPF_And_Networking#The_TCP_port_multiplexer

For the scheduling part, I believe we still have some way to go. It is possible to take down a server or node using the management APIs, but we currently have nothing specific to enable the on-demand startup of a JPPF cluster. This will be addressed in JPPF 2.0, but it's currently too early to say how. JPPF 2.0 will also provide full-fledged management capabilities at the job level (monitor, cancel, suspend, timeout, etc...).

Sincerely,
-Laurent

rviloria

  • Guest
Re: Potential use of JPPF on a project
« Reply #7 on: August 06, 2009, 09:24:36 AM »

Response was very much appreciated!

-Ron