Running JPPF on Amazon's EC2, Rackspace, or other Cloud Services
From JPPF 4.2 Documentation
1 Java Cloud Toolkit
Apache jclouds® provides an open source Java toolkit for accessing EC2, Rackspace, and several other cloud providers. Its ComputeService interface provides methods for creating servers from either the provider's or your own saved image, transferring files, executing commands in the new server's shell, deleting servers, and more. Servers can be managed individually or in groups, allowing your JPPF client to provision and configure servers on the fly.
When you create a cloud server with this toolkit, you have programmatic access to its NodeMetadata, including IP addresses and login credentials. By creating your driver and nodes in the right sequence, you can pass IP information between them, as well as create different server types and use job SLAs based on these IPs to direct different types of jobs to different types of servers.
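To make this more concrete, here is a minimal jclouds sketch that provisions a group of servers on EC2 and prints their metadata. The access keys, image id and group name are placeholders, and the template will need to be adapted to your own account, region and saved image:

import java.util.Set;
import org.jclouds.ContextBuilder;
import org.jclouds.compute.ComputeService;
import org.jclouds.compute.ComputeServiceContext;
import org.jclouds.compute.domain.NodeMetadata;
import org.jclouds.compute.domain.Template;

public class ProvisionJPPFServers {
  public static void main(String[] args) throws Exception {
    // "aws-ec2" is the jclouds provider id for Amazon EC2;
    // use "rackspace-cloudservers-us" for Rackspace.
    ComputeServiceContext context = ContextBuilder.newBuilder("aws-ec2")
      .credentials("MY_ACCESS_KEY_ID", "MY_SECRET_KEY")  // placeholder credentials
      .buildView(ComputeServiceContext.class);
    ComputeService compute = context.getComputeService();
    // build a template from a pre-configured JPPF node image (placeholder image id)
    Template template = compute.templateBuilder()
      .imageId("us-east-1/ami-xxxxxxxx")
      .minCores(8)
      .build();
    // start 2 servers in the "jppf-nodes" group and inspect their metadata
    Set<? extends NodeMetadata> nodes = compute.createNodesInGroup("jppf-nodes", 2, template);
    for (NodeMetadata node : nodes) {
      System.out.println(node.getId() + " private=" + node.getPrivateAddresses()
        + " public=" + node.getPublicAddresses());
    }
    context.close();
  }
}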
2 Server discovery
Cloud servers do not allow multicast network communication, so JPPF nodes cannot rely on the auto-discovery feature and must instead know ahead of time which server to connect to. The server property file must therefore set:
jppf.discovery.enabled = false
jppf.peer.discovery.enabled = false
And the node property file must set:
jppf.discovery.enabled = false
jppf.server.host = IP_or_DNS_hostname
Similarly the client must set:
jppf.discovery.enabled = false
jppf.drivers = driverA
driverA.jppf.server.host = IP_or_DNS_hostname
driverA.jppf.server.port = 11111
Amazon, Rackspace, and others charge for network traffic to a public IP, so you'll want the nodes to communicate with the driver's internal 10.x.x.x address rather than a public IP. More on this below.
3 Firewall configuration
EC2 places all instances into “security groups” that define allowed network access. Make sure to start JPPF servers with a security group that allows access to the standard server port 11111 and, if you use the management tools remotely, to the management port 11198. You may also want to restrict these rules to the internal IP range 10.0.0.0/8 if your clients, servers and nodes all run within EC2.
Rackspace cloud servers have no default restrictions on private IPs and ports at the same datacenter, so JPPF will work out-of-the-box on an all-cloud network. If added security is desired, you can create an Isolated Cloud Network with your own set of private IP addresses (192.168.x.x). In order to associate cloud servers with a dedicated (managed) server at Rackspace, you must request to configure RackConnect to merge your cloud and managed accounts and use all-private IPs.
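If you provision instances with jclouds, one way to open these ports is through the template options when creating the driver instance. This is a sketch continuing the example from section 1 (inside the same main method, with the same ComputeService object; the image id is a placeholder); on EC2, the inbound ports are applied to the security group that jclouds creates for the instance group:

import org.jclouds.compute.domain.Template;
import org.jclouds.compute.options.TemplateOptions;

// "compute" is the ComputeService from the sketch in section 1.
Template driverTemplate = compute.templateBuilder()
  .imageId("us-east-1/ami-yyyyyyyy")  // placeholder id of a JPPF driver image
  .options(TemplateOptions.Builder.inboundPorts(22, 11111, 11198))  // SSH + JPPF server + management
  .build();
compute.createNodesInGroup("jppf-driver", 1, driverTemplate);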
4 Instance type
EC2 and Rackspace instance types vary in the number of available cores and the amount of available memory, so you may want a different node property file and startup script for each instance type you start, with an appropriate number of processing threads. For instance, on an EC2 c1.xlarge instance with 8 cores, you might want one additional thread so the CPU stays busy whenever any one thread is waiting on I/O:
processing.threads = 9
If your tasks are more I/O-bound, you may need to experiment to find the thread count that gives the best completion rate. You may also want to configure multiple JPPF nodes on the same server.
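For example, applying the same “cores + 1” heuristic to a 2-core m1.large instance, its node property file might contain (an illustrative value only, to be tuned against your own measurements):

processing.threads = 3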
5 IP Addresses
All EC2 and Rackspace instances have both a public IP address (assigned automatically, or the elastic IP you selected) and a private internal 10.x.x.x address. You are charged for traffic between availability zones regardless of the address used, and even within the same zone if you use the external IP. So you'll want the systems to connect using the 10.x.x.x addresses.
Unfortunately, this complicates things a bit. Ideally you probably want to set up a pre-configured node image (AMI) and launch instances from that image as needed for your JPPF tasks. But you may not know the internal IP of the driver at the time. And you don't want to spend time creating a new AMI each time you launch a new task with a new driver. The following approaches will probably work:
One solution is to use a static elastic IP that you will always associate with the JPPF driver and eat the cost of EC2 traffic. It isn't that much really...
Or you can use DNS to publish the driver's 10.x.x.x IP address under a fixed hostname before launching nodes, and configure the node AMI to use that DNS hostname.
Or you can do a little programming with the EC2 or Rackspace API to pass the information around; this is the recommended approach. To this effect, JPPF provides a configuration hook which allows a node to read its configuration from a source other than a static, local configuration file. The node configuration plugin can read a properties file from S3 instead of a file already on the node, while a matching startup task on the driver instance publishes the appropriate properties file to S3.
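As an illustration, here is a minimal sketch of such a node configuration source. It assumes JPPF's alternate configuration source mechanism, where the jppf.config.plugin system property points the node at the implementing class; the class name is hypothetical and the S3 URL is a placeholder for a publicly readable or pre-signed object written by the driver's startup task:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.jppf.utils.JPPFConfiguration;

// Reads the node configuration from a properties file published to S3 instead of
// a static local file. Activated on the node's JVM with (hypothetical class name):
//   -Djppf.config.plugin=org.example.S3ConfigurationSource
public class S3ConfigurationSource implements JPPFConfiguration.ConfigurationSource {
  @Override
  public InputStream getPropertyStream() throws IOException {
    // placeholder URL: the object would contain "jppf.server.host = <driver private IP>"
    // along with the other node properties
    URL url = new URL("https://my-bucket.s3.amazonaws.com/jppf/node.properties");
    return url.openStream();
  }
}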
As of JPPF 4.1, you can use the getPrivateAddresses() method of the jclouds NodeMetadata class to obtain the private IP of a server, then use the runScriptOnNode() method of the ComputeService class to set an environment variable or publish the IP in a file, which can then be referenced through configuration substitutions or includes.
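A possible sketch of this approach, continuing the jclouds example from section 1 ("compute" is the ComputeService, "driver" and "node" are NodeMetadata objects for the driver and a node instance; the file path is only illustrative):

import org.jclouds.compute.domain.ExecResponse;
import org.jclouds.compute.options.RunScriptOptions;

// grab the driver's private 10.x.x.x address from its metadata
String driverIp = driver.getPrivateAddresses().iterator().next();
// publish it in a small properties file on the node; the node configuration
// can then pull it in through an include or substitution
ExecResponse response = compute.runScriptOnNode(
  node.getId(),
  "echo 'jppf.server.host = " + driverIp + "' > /opt/jppf/node/driver.properties",
  RunScriptOptions.Builder.blockOnComplete(true));
System.out.println("script exit status: " + response.getExitStatus());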
There are many other approaches that will give the same result: just have the driver publish its address to some known location (possibly including the node itself), and have the node read it and build its properties dynamically instead of relying on a fixed file.