fullung (http://lunglet.net)
Locality scheduling
« on: July 10, 2007, 12:46:06 AM »

Hello all

I am investigating JPPF for a machine learning application. I am specifically interested in distributing the work required to do EM training of Gaussian Mixture Models.

http://en.wikipedia.org/wiki/Expectation-maximization_algorithm

For example, for the K-Means algorithm:

http://en.wikipedia.org/wiki/K-means_clustering

the E-step in the EM algorithm involves computing the distance between each data point and each cluster centroid. Since I have a lot of data, I would like to send the current centroids to all the nodes, have them calculate distances and report back, at which point I can do the M-step.
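To make the split concrete, here is a rough sketch of one K-means iteration in plain Java. It does not use any JPPF API; the class and method names (`KMeansStep`, `eStep`, `mStep`, `Partial`) are made up for illustration. Each node would run `eStep` on its own slice of the data and send back only per-cluster sums and counts, which is much smaller than the data itself. The master then merges those partial results in `mStep`.

```java
// Hypothetical helper, not part of JPPF: one K-means iteration split into
// a per-node E-step and a master-side M-step.
final class KMeansStep {

    /** Partial result a node reports back: per-cluster coordinate sums and point counts. */
    static final class Partial {
        final double[][] sums;
        final int[] counts;

        Partial(int k, int dim) {
            sums = new double[k][dim];
            counts = new int[k];
        }
    }

    /** E-step on one node: assign each local point to its nearest centroid. */
    static Partial eStep(double[][] points, double[][] centroids) {
        Partial p = new Partial(centroids.length, centroids[0].length);
        for (double[] x : points) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                // Squared Euclidean distance is enough to find the nearest centroid.
                double d = 0.0;
                for (int j = 0; j < x.length; j++) {
                    double diff = x[j] - centroids[c][j];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            p.counts[best]++;
            for (int j = 0; j < x.length; j++) {
                p.sums[best][j] += x[j];
            }
        }
        return p;
    }

    /** M-step on the master: merge the partial results and recompute the centroids. */
    static double[][] mStep(Partial[] partials, double[][] oldCentroids) {
        int k = oldCentroids.length;
        int dim = oldCentroids[0].length;
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            int n = 0;
            double[] sum = new double[dim];
            for (Partial p : partials) {
                n += p.counts[c];
                for (int j = 0; j < dim; j++) {
                    sum[j] += p.sums[c][j];
                }
            }
            // Keep the old centroid if no point was assigned to this cluster.
            for (int j = 0; j < dim; j++) {
                next[c][j] = (n == 0) ? oldCentroids[c][j] : sum[j] / n;
            }
        }
        return next;
    }
}
```

Only the centroids travel to the nodes and only the sums and counts travel back, so the per-iteration traffic is independent of the number of data points.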

For large datasets, it becomes useful to have the same node process the same data on each iteration. Since the data has to come from somewhere initially (like a central server), having the same node work on the same data all the time can save a lot of bandwidth.

BOINC calls this concept locality scheduling:

http://boinc.berkeley.edu/sched_locality.php

Before I discovered JPPF, I was looking at using JMS to distribute the tasks to my nodes. The ActiveMQ JMS message broker has a very cool feature called message groups:

http://activemq.apache.org/message-groups.html

This allows you to ensure that the same consumer receives all the messages from a logical group which is determined by setting the same ID on all the messages in the group. This provides a basic way of doing locality scheduling.
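In ActiveMQ the group ID is carried in the `JMSXGroupID` message property. The idea behind it can be sketched in a few lines of plain Java: the first time a group ID is seen it gets pinned to a node, and every later message with that ID goes to the same node. This is only an illustration of the concept, not ActiveMQ's or JPPF's implementation; `StickyRouter`, `route`, and the round-robin choice for new groups are my own assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of group-to-node pinning; not an ActiveMQ or JPPF class.
final class StickyRouter {
    private final List<String> nodes;
    private final Map<String, String> owner = new HashMap<>();
    private int next = 0;

    StickyRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** Returns the node that owns this group, assigning one round-robin on first use. */
    synchronized String route(String groupId) {
        String node = owner.get(groupId);
        if (node == null) {
            node = nodes.get(next % nodes.size());
            next++;
            owner.put(groupId, node);
        }
        return node;
    }
}
```

With the data chunk name used as the group ID, the node that fetched a chunk on the first iteration keeps receiving the tasks for that chunk on later iterations. A real implementation would also have to reassign groups when a node disappears, which this sketch leaves out.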

I have briefly looked at JPPF's DataProvider mechanism, but it doesn't quite look like it's going to be able to help with locality scheduling.

Any thoughts on how one could achieve locality scheduling with JPPF?

Cheers,

Albert

P. S. More details about the setup I envision.

I'm looking at storing the data on a Hadoop distributed file system:

http://lucene.apache.org/hadoop/
http://lucene.apache.org/hadoop/hdfs_design.html

Each node will have an Ehcache instance in which it will keep the data it fetches from the distributed file system:

http://ehcache.sourceforge.net/

This cache can be configured to have a maximum memory size and could possibly spool to disk, or discard the least recently used entries.
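As a stand-in for the real cache, the "discard the least recently used entries" behaviour can be sketched with the JDK's `LinkedHashMap` in access order. This is not Ehcache's API, and it bounds the number of entries rather than the memory size; the class name `LruChunkCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-node chunk cache; bounded by entry count, not by bytes.
final class LruChunkCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruChunkCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the least recently used entry when over capacity.
        return size() > maxEntries;
    }
}
```

A node would look up a chunk here first and only go to the distributed file system on a miss. Note that `LinkedHashMap` is not thread-safe, so concurrent tasks on one node would need external synchronization.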