
Topic: Comparing Hadoop to JPPF  (Read 10977 times)

EqAfrica (JPPF Padawan, Posts: 1)
Comparing Hadoop to JPPF
« on: December 19, 2009, 12:17:48 AM »

Dear community,

I am exploring the features of JPPF and I am wondering whether there is any way to use JPPF as an alternative to Hadoop MapReduce.
Can we compare these two frameworks at all?
MapReduce is a software framework for the distributed processing of large data sets on compute clusters.

Best regards,
Nikola Radakovic

mschaaf (Guest)

Re: Comparing Hadoop to JPPF
« Reply #1 on: February 18, 2010, 10:32:50 AM »

Dear Nikola,

From my point of view, it is an alternative. The real question should be: which framework best fits my requirements and my starting point (creating a new application, or extending an existing one)?

In a project I am involved in, we needed a framework to distribute the work over several desktop PCs. The task was to compare a large and constantly growing data set. Until then, it was only possible to scale with the number of processors and the amount of memory. The data is compared in sets of constant size to keep memory consumption under control. The computers we use are low-cost hardware: single/dual-core desktop PCs with a 100 Mbit network connection and up to 1 GB of RAM.
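To make the fixed-size comparison sets concrete, here is a minimal sketch of that kind of batching. The names (BatchedComparison, toBatches, BATCH_SIZE) are hypothetical illustrations, not the project's actual code:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class BatchedComparison {
        // Hypothetical batch size, tuned so one batch fits comfortably in ~1 GB of RAM.
        static final int BATCH_SIZE = 10000;

        // Splits a record stream into lists of at most BATCH_SIZE elements,
        // so each comparison step has a bounded memory footprint.
        static <T> List<List<T>> toBatches(Iterator<T> records) {
            List<List<T>> batches = new ArrayList<>();
            List<T> current = new ArrayList<>(BATCH_SIZE);
            while (records.hasNext()) {
                current.add(records.next());
                if (current.size() == BATCH_SIZE) {
                    batches.add(current);
                    current = new ArrayList<>(BATCH_SIZE);
                }
            }
            if (!current.isEmpty()) batches.add(current);
            return batches;
        }
    }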

The first idea was to use Hadoop to process this data; the team has about 3 years of experience in Hadoop programming. Early prototypes showed that a lot of rewriting was needed to make the application fit the MapReduce way of processing data (sketched after the list below). We had to export our data as files to feed them into the MapReduce process, which blew the data up to a multiple of its original size. The result was a very high network load. To store this data, HDFS was also needed. This prototyping took about a week.

So, to sum up the Hadoop attempt:
1. a large amount of integration work (also with Cascading)
2. a dependency on HDFS (the Hadoop Distributed File System)
3. high network traffic
4. data blown up to a multiple of its original size
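For readers who have not done this kind of port: the rewriting mentioned above means re-expressing the comparison as key/value processing over exported files. The skeleton below only illustrates that shape, roughly in the Hadoop 0.20 mapreduce API of the time; the class name, key field and tab-separated record format are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CompareMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Each input record is one line of the exported file.
            String record = line.toString();
            // Re-key the record so that records which must be compared with
            // each other meet in the same reducer (hypothetical: key = first field).
            String comparisonKey = record.split("\t", 2)[0];
            context.write(new Text(comparisonKey), new Text(record));
            // Emitting every record again under its comparison key is one source
            // of the data blow-up and network traffic described above.
        }
    }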

JPPF was the second try. In the original application, Java threads read their data from a blocking queue and compared it to the set of data each thread held. The use of this blocking queue had to be rewritten: it was not easily possible to share data between tasks, where one task writes to it and another reads from it. But the problem was solved with a minimal change: each task is handed its set of data from the queue in advance (see the sketch after the list below). A prototype was ready in under a day, with very good integration into our application, low network traffic and very little overhead. On small data sets the application performs as fast as the previous solution. After one more day the prototype was ready for production.

So, to sum up the JPPF attempt:
1. very little integration work
2. no distributed file system needed
3. low network traffic
4. no data blow-up
5. good integration with the existing source code
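Here is a minimal sketch of the change described above, assuming the JPPF 2.x API current at the time (tasks extend JPPFTask and are collected into a JPPFJob submitted through a JPPFClient); the task name, fields and comparison logic are hypothetical placeholders, not our actual code:

    import java.util.Arrays;
    import java.util.List;
    import org.jppf.client.JPPFClient;
    import org.jppf.client.JPPFJob;
    import org.jppf.server.protocol.JPPFTask;

    public class CompareTask extends JPPFTask {
        private final List<String> batch;      // this task's slice of the old queue, given in advance
        private final List<String> reference;  // the data set it is compared against

        public CompareTask(List<String> batch, List<String> reference) {
            this.batch = batch;
            this.reference = reference;
        }

        @Override
        public void run() {
            int matches = 0;
            for (String record : batch) {
                if (reference.contains(record)) matches++; // placeholder for the real comparison
            }
            setResult(matches); // the result travels back to the client with the task
        }

        public static void main(String[] args) throws Exception {
            List<String> reference = Arrays.asList("a", "b", "c");
            JPPFClient client = new JPPFClient();
            JPPFJob job = new JPPFJob();
            // One task per pre-assigned batch, instead of threads pulling from a shared queue.
            job.addTask(new CompareTask(Arrays.asList("a", "x"), reference));
            job.addTask(new CompareTask(Arrays.asList("b", "y"), reference));
            List<JPPFTask> results = client.submit(job); // blocking submit in JPPF 2.x
            for (JPPFTask task : results) System.out.println(task.getResult());
            client.close();
        }
    }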

In the end, JPPF is what we use. The decision for JPPF and against Hadoop MapReduce followed from our tests and our requirements; with other requirements and other scenarios, the outcome could be different.

Best regards,
Martin Schaaf