I need your help validating that JPPF is the right solution here (I believe it is).
Here is the context (problem and potential solution):
- A web application takes input in the form of a list of items (e.g. from Excel or XML) and then passes the data on to the system under discussion.
- This system then validates the lines against the database, against external systems, and against some in-process rules.
- Some of the input data can run into tens of thousands of lines. Consequently, the sequential process takes a long time to complete, annoying users.
The first solution to address the above problem was to:
- Do all common processing for the whole input.
- Batch the input into multiple groups of lines and then process them in parallel.
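To make that first solution concrete, here is a rough sketch of the batching step in plain Java. It is only an illustration: the class name `BatchValidator`, the `isValid` placeholder rule, and the batch size and thread count parameters are all assumptions, not our real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: class, method, and rule names are illustrative only.
class BatchValidator {

    // Split the input lines into consecutive batches of at most batchSize lines.
    static List<List<String>> partition(List<String> lines, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += batchSize) {
            int end = Math.min(i + batchSize, lines.size());
            batches.add(new ArrayList<>(lines.subList(i, end)));
        }
        return batches;
    }

    // Placeholder for the real checks (database, external systems, in-process rules).
    static boolean isValid(String line) {
        return line != null && !line.trim().isEmpty();
    }

    // Validate all batches in parallel on a fixed-size pool; returns the number of invalid lines.
    static int countInvalid(List<String> lines, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : partition(lines, batchSize)) {
                results.add(pool.submit(() -> {
                    int invalid = 0;
                    for (String line : batch) {
                        if (!isValid(line)) invalid++;
                    }
                    return invalid;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("validation interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("validation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The fixed pool size is the limit in question: every batch of every request on the machine competes for the same threads.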
Now the resources available on a machine become an issue: while a large input is being processed, it ties up those resources, impacting other (mostly smaller) inputs.
That is worse than the original problem.
So, to work around this issue, we added more machines so that multiple inputs can be processed in parallel.
Each request is confined to a single machine, and so it is constrained by the number of parallel threads available on that machine.
This is where we are, as of now.
However, this is still not perfect: a large input is still confined to one machine and does not use the other machines that may be idle.
This is where JPPF looks like a good fit.
The architecture/topology that I have in mind is that every machine runs a server, a client, and at least one node (more on this later). This way, every machine can handle requests on its own, without depending on any other machine.
The servers in all the machines will be connected to each other.
One large input gets split into multiple tasks. The tasks are added to one job, and the job is sent to a server.
The server then distributes the job's tasks across multiple nodes (local, or remote through another server).
The nodes that are assigned this job's tasks process them in parallel threads.
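The flow I am picturing (one job, many tasks, several nodes each running its own threads) can be sketched in plain Java as below. This is a stand-in only: it does not use the JPPF API, the name `GridSketch` is made up, each "node" is simulated by a local thread pool, and the round-robin dispatch is my own simplification rather than how a JPPF server actually balances load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java stand-in for the topology; none of these names are JPPF API.
class GridSketch {

    // One job (a list of tasks) is spread over several "nodes",
    // each simulated here by its own fixed-size thread pool.
    static <T> List<T> runJob(List<Callable<T>> tasks, int nodes, int threadsPerNode) {
        List<ExecutorService> nodePools = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            nodePools.add(Executors.newFixedThreadPool(threadsPerNode));
        }
        try {
            // Round-robin dispatch is an assumption made here for simplicity.
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                futures.add(nodePools.get(i % nodes).submit(tasks.get(i)));
            }
            // Collect the results in the original task order.
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            for (ExecutorService pool : nodePools) {
                pool.shutdown();
            }
        }
    }
}
```

With this shape, adding a machine would amount to adding another pool to the list, which is the scaling behaviour I am hoping to get from JPPF.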
(I wanted to post a crude picture to help explain the approach, but I am not familiar with including an image in a forum comment. I will try that later.)
If I need more processing power, I just add a new machine (replica of every other machine; they are all homogeneous in terms of OS).
Now the questions are:
- Is this a valid use case for JPPF?
- Is this a valid topology, or is it complicating things too much?
- Can multiple nodes run on the same machine? Are there any resource contention issues that I need to worry about?
Thanks