JPPF, java, parallel computing, distributed computing, grid computing, parallel, distributed, cluster, grid, cloud, open source, android, .net

### The open sourcegrid computingsolution

March 28, 2023, 11:36:03 PM

Pages: [1]   Go Down

### AuthorTopic: JPPF for parallelization of loops.  (Read 4433 times)

#### jovanjovanovic

• Posts: 4
##### JPPF for parallelization of loops.
« on: December 02, 2012, 02:02:54 PM »

Hello everyone. I'v installed JPPF, tested it through Application template using my desktop and my laptop (both running Windows7) and it's working great. The idea I wanted to realize in first place before searching for frameworks that will allow me to do parallel processing was to use computer classroom on my University (100 computers) for speeding up brute-force testing. Our main course on University is Cryptology and there is SO MUCH we could do and measure if we could use processing power of 100 computers.

First thing, and it's really basic, I would like to do is to speed up brute-force searching (you know, for loop - going through alphabet in a for loop, in a for loop, in a for loop...), but the problem is that is ONE huge process, not multiple simple processes that JPPF could divide. Do you guys have any idea how could I divide such a construction of loops into independent tasks so JPPF could do it's job properly?

Cheers.
Logged

#### lolo

• JPPF Council Member
• Posts: 2272
##### Re: JPPF for parallelization of loops.
« Reply #1 on: December 03, 2012, 08:09:56 AM »

Hello,

Without a more precise description of what you want to achieve, we can only provide general recommendations. I would follow this general approach:

1) identify in your algorithm the atomic operation which is always executed within the within the deepest loop. By atomic, I mean a function such that its execution for a given set of indices (corresponding to all the loops it traverses) is independent from the execution of any other set of indices.

For example, if you have something like this:
Code: [Select]
`for (int i1 = 0; i1 < n1; i1++) {  for (int i2 = 0; i2 < n2; i2++) {    ...      for (int iN = 0; iN < nN; iN++) {        myFunction(i1, .., iN);      }  }}`
Here, the atomic operation is represented as "myAtomicFunction", and this is is soemething that can be implemented as a JPPF task:
Code: [Select]
`public class MyTask extends JPPFTask {  // initialization on the client side  public MyTask(int i1, .., int iN) {    ...  }  // execution on the ndoe side  public void run() {    Object o = myAtomicFunction(i1, .., iN);    setResult(o);  }  private Object myAtomicFunction(int i1, .., int iN) {    ...  }}`
2) You need to determine the granularity of your tasks, to avoid catastrophic performance issues. Typically, each call to myFunction() will have a very short execution time, which means that it may not be very efficient to have a single invocation per JPPF task: you will have a very large number of tasks, which may overload the grid, plus the grid overhead (serialization, network transport) will largely outweigh the benefit of distributing the execution.
So in this case, you should consider using JPPF tasks that are not atomic from the algorithm's point of view, but rather group a number of atomic operations. So for each task, you could implement the following:
Code: [Select]
`public class MyTask extends JPPFTask {  // initialization on the client side  public MyTask(int i1Start, int i1End, int n2, ..., int nN) {    ...  }  // execution on the ndoe side  public void run() {    Object[] myResults = new Object[i1End - i1Start + 1];    for (int i1 = i1Start; i1 <= i1End; i1++) {      myResults[index - i1Start] = myFunction(index, i2, .., iN);    }    setResult(myResults);  }  private Object myAtomicFunction(int i1) {    for (int i2 = 0; i2 < n2; i2++) {      ...        for (int iN = 0; iN < nN; iN++) {          Object o = myAtomicFunction(i1, i2, .., iN);          ...        }     }     ...  }  private Object myAtomicFunction(int i1, .., int iN) {    ...  }}`
Determining the granularity of the tasks will probably be the most difficult part, as you need to find a balance between the computational weight of each task and the total number of tasks, to ensure that distributing the computations will provide the best performance speedup.

We have a simple example of this approach in our matrix multiplication sample, where the atomic operation consists in multiplicating 2 matrix cell values and adding the result to a sum. In this sample, we chose for the grnaularity of the tasks to perform the multiplication of one row of a matrix with the entire second matrix. For a square matrix of size n, the sequential computation time is in O(n3). In our sample, the execution of each task is in O(n2) and we have n tasks.

I hope this helps.

Sincerely,
-Laurent
Logged

#### jovanjovanovic

• Posts: 4
##### Re: JPPF for parallelization of loops.
« Reply #2 on: December 05, 2012, 05:02:15 AM »

Ok, thanks. Great help. Until I read this I managed to make something using your application template. Now I'll talk on concrete example. I have a String password = "ZZZZZZ" - 6 characters. My "dictionary" looks like this:
Code: [Select]
`char dictionary[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',            'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};`
Single computer using dual core processor needs 3 seconds to find "ZZZZ" - 4 characters when running through netbeans. This is my testing code:
Code: [Select]
`        String password = "ZZZZ";        String temp;        char dictionary[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',            'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',            'N', 'O', 'P', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};        for (int a = 0; a < dictionary.length; a++) {            for (int b = 0; b < dictionary.length; b++) {                for (int c = 0; c < dictionary.length; c++) {                    for (int d = 0; d < dictionary.length; d++) {                        temp = String.valueOf(dictionary[a]) + dictionary[b] + dictionary[c] + dictionary[d];                        if (password.equals(temp)) {                            System.out.println(temp);                        }                    }                }            }        }NetBeans prints out:run:ZZZZBUILD SUCCESSFUL (total time: 3 seconds)`
Very very simple. But if I figure out how to finish job using simple methods, I could make anything after that. Ok. Using this theory, I figured out that for one task, 4 for loops would be nice measure (maybe I'm wrong?).

Using "application-template" from JPPF packet, I'v managed to make this idea distributable, but this time, I'm searching for password that has 6 characters. Like I wrote in the beginning. "task" code looks like this:
Code: [Select]
`In constructor, task gets "atmoment" string and "password" string. Password String is "ZZZZZZ". You'll see in Application runner code what is atmoment String.public void run() {        String temp;        char dictionary[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',            'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};        String zero = atmoment.substring(0, 1);        String one = atmoment.substring(1, 2);        for (int a = 0; a < dictionary.length; a++) {            for (int b = 0; b < dictionary.length; b++) {                for (int c = 0; c < dictionary.length; c++) {                    for (int d = 0; d < dictionary.length; d++) {                        temp = zero + one + dictionary[a] + dictionary[b] + dictionary[c] + dictionary[d];                        if (password.equals(temp)) {                            setResult(temp);                        }                    }                }            }        }    }`
Because I'm running 4 loops in task, to find 6 characters password, I need to call them through another 2 for loops. Application runner code looks like this.
Code: [Select]
`public JPPFJob createJob() throws Exception {        // create a JPPF job        String password = "ZZZZZZ";        char dictionary[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',            'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};        JPPFJob job = new JPPFJob();        // give this job a readable unique id that we can use to monitor and manage it.        job.setName("Brute-Force");        for (int i = 0; i < dictionary.length; i++){            for(int j = 0; j < dictionary.length ; j++){                atmoment = String.valueOf(dictionary[i]) + String.valueOf(dictionary[j]) + "0" + "0" + "0" + "0";                job.addTask(new TemplateJPPFTask(password, atmoment));            }        }        // start the job in suspended mode        //job.getSLA().setSuspended(true);        return job;    }`You can see that atmoment String is dynamic based on these two for loops. If these two for loops are at 'x', 'D', atmoment will look like "xD0000". Task will run through these four zeros (xD0000, xD0001, xD0002) and if it finds given password it will give me back the resoult.

This implementation is working, no problems. I'm using my desktop and my laptop (both running on Windows7 and both burning my desk while calculating this ) and they found "ZZZZZZ" in 23mins 7sec.
Now, if I move this to bigger scale, on my university where I have 100 computers. I'll install all this and this calculation will last much less.

Thing that is bothering and I can't connect is WHAT IF I want to search for password that has for example 10 characters, or even more? Should I add four more for loops in ApplicationRunner? Is that going to flood everything and make mess? How should I implement this method if I want more characters in password? Is JPPF waiting for task to be completed and then sends new one or it sends all 63*63*63*63*63*63 tasks and then waits for results? Can I make him wait until some "worker" is free and then to send new task?
And another quick question. How to stop everything when I get result that is non zero - meaning that some worker found matching password? It's not going to be "ZZZZZZ" every time, maybe it will be "htzUle" and there is no need to "spin" all the way to "ZZZZZZ"?

Thanks for all your help so far, I hope you will find some time to read this and give me proper answer. Cheers.
Logged

#### lolo

• JPPF Council Member
• Posts: 2272
##### Re: JPPF for parallelization of loops.
« Reply #3 on: December 09, 2012, 05:55:05 PM »

Hello,

The first thing to do is to optimize the initial algorithm. In your initial code, you have an obvious performance hotspot when you do:
temp = String.valueOf(dictionary[a]) + dictionary + dictionary[c] + dictionary[d];
This string concatenation is a huge performance sink. It is a lot more efficient to deal with arrays of chars, so I modified your initial code into this:

Code: [Select]
`long time = System.nanoTime();String password = "ZZZZ";char[] pwdChars = password.toCharArray();char[] temp = new char[pwdChars.length];for (int a = 0; a < dictionary.length; a++) {  temp[0] = dictionary[a];  for (int b = 0; b < dictionary.length; b++) {    temp[1] = dictionary[b];    for (int c = 0; c < dictionary.length; c++) {      temp[2] = dictionary[c];      for (int d = 0; d < dictionary.length; d++) {        temp[3] = dictionary[d];        count++;        if (Arrays.equals(temp, pwdChars)) {          time = System.nanoTime() - time;          output("found in " + StringUtils.toStringDuration(time/1000000L));          return;        }      }    }  }}`
With this, the processing time whent down from 675 ms to 59 ms, which is around a 10x improvement (on an Intel i7-2600 quad core + hyperthreading).
In this scenario (password length = 4) the sequential search is so short that the overhead of parallelizing will largely outweigh the performance improvements.
So we can say that for short passwords it is not worth parallelizing.

However, for longer passwords, it is definitely hugely beneficial. For instance, on my machine, looking for "ZZZZZ" (5 chars) takes around 3.5 seconds, and looking up for "ZZZZZZ" (6 chars) takes around 3mn40s (i.e. 220s).  So first let's write a search algorithm that will work for any password length. We can do that by writing a recursive algorithm:

Code: [Select]
`public static void loopOptimized(final char[] password) {  char[] temp = new char[password.length];  int[] indices = new int[password.length];  boolean b = loopOptimized(0, password.length-1, indices, temp, password);}public static boolean loopOptimized(final int currentDepth, final int maxDepth, final int[] indices, final char[] temp, final char[] pwdChars) {  if (currentDepth < maxDepth) {    for (indices[currentDepth] = 0; indices[currentDepth] < dictionary.length; indices[currentDepth]++) {      temp[currentDepth] = dictionary[indices[currentDepth]];      if (loopOptimized(currentDepth + 1, maxDepth, indices, temp, pwdChars)) return true;    }  } else {    for (indices[currentDepth] = 0; indices[currentDepth] < dictionary.length; indices[currentDepth]++) {      temp[currentDepth] = dictionary[indices[currentDepth]];      if (Arrays.equals(temp, pwdChars)) return true;    }  }  return false;}`
Now this can be easily split into independent tasks, by having for each task the first p characters pre-filled, as you did previously.
p = 2 seems a reasonable choice as it produces a number of tasks that can be efficiently distributed among the nodes in the grid, unless you have thousands of nodes, in which case you might consider chosing p = 3.
This gives us a task implementation like this:
Code: [Select]
`public class ParallelLoopTask extends JPPFTask {  private transient char[] dictionary; // retrieved from the DataProvider  final int idx1;  final int idx2;  boolean found = false;  public ParallelLoopTask(final int idx1, final int idx2) {    this.idx1 = idx1;    this.idx2 = idx2;  }  @Override  public void run() {    try {      char[] pwd = (char[]) getDataProvider().getValue("password");      dictionary = (char[]) getDataProvider().getValue("dictionary");      int[] indices = new int[pwd.length];      indices[0] = idx1;      indices[1] = idx2;      char[] temp = new char[pwd.length];      temp[0] = dictionary[idx1];      temp[1] = dictionary[idx2];      found = loop(2, pwd.length-1, indices, temp, pwd);      if (found) setResult(temp);    } catch (Exception e) {      setException(e);    }  }  public boolean loop(final int currentDepth, final int maxDepth, final int[] indices, final char[] temp, final char[] pwd) {    if (currentDepth < maxDepth) {      for (indices[currentDepth] = 0; indices[currentDepth] < dictionary.length; indices[currentDepth]++) {        temp[currentDepth] = dictionary[indices[currentDepth]];        if (loop(currentDepth + 1, maxDepth, indices, temp, pwd)) return true;      }    } else {      for (indices[currentDepth] = 0; indices[currentDepth] < dictionary.length; indices[currentDepth]++) {        temp[currentDepth] = dictionary[indices[currentDepth]];        if (Arrays.equals(temp, pwd)) return true;      }    }    return false;  }}`
Then the job is submitted as follows:
Code: [Select]
`private static void submitJob(final char[] pwd) throws Exception {  final JPPFJob job = new JPPFJob();  int length = dictionary.length;  for (int i=0; i<length; i++) {    for (int j=0; j<length; j++) {      job.addTask(new ParallelLoopTask(i, j));    }  }  DataProvider dp = new MemoryMapDataProvider();  dp.setValue("dictionary", dictionary);  dp.setValue("password", pwd);  job.setDataProvider(dp);  List<JPPFTask> results = jppfClient.submit(job);}`
With this, and using a single node with 8 threads, I get a maximum time of 48s for a password length of 6: that's a speedup of 6x when compared to the serial execution. So it's not exactly linear, but still a very significant improvement.

Now, to be able to stop all computations when a solution is found I have implemented a mechanism as follows:

- as you can see in the task's code, we set the task result to the password found

- we will add a result listener to the job, so we can cancel the job as soon as we know that a solution has been found:
Code: [Select]
`job.setResultListener(new JPPFResultCollector(job) {  @Override  public synchronized void resultsReceived(final TaskResultEvent event) {    super.resultsReceived(event);    if (event.getTaskList() != null) {      for (JPPFTask task: event.getTaskList()) {        if (task.getResult() != null) {          cancelJob(job.getUuid());          output("found password: " + new String((char[]) task.getResult()));          break;        }      }    }  }});`
where the method cancelJob(String jobUuid) is defined as:
Code: [Select]
`private static void cancelJob(final String jobUuid) {  Runnable r = new Runnable() {    @Override    public void run() {      try {        jppfClient.cancelJob(jobUuid);      } catch (Exception e) {        e.printStackTrace();      }    }  };  new Thread(r).start();}`
- for the tasks to know that they should stop processing, we add the task lifecycle's onCancel() method to set a flag that can be check in the loop(...) method:
Code: [Select]
`@Overridepublic void onCancel() {  cancelFlag = true;}`
- lastly, we add some code in the task to cancel the job in the current node when a solution is found, by using the node management APIs:
Code: [Select]
`private void cancelJob(final String jobUuid) throws Exception {  Runnable r = new Runnable() {    @Override    public void run() {      try {        JMXNodeConnectionWrapper jmx = new JMXNodeConnectionWrapper();        jmx.connect();        jmx.cancelJob(jobUuid, false);        jmx.close();      } catch(Exception e) {        e.printStackTrace();      }    }  };  new Thread(r).start();}`this method is called is the task's run() method:
Code: [Select]
`public void run() {  ...  if (found) {    setResult(temp);    String uuid = (String) getDataProvider().getValue("uuid");    cancelJob(uuid);  }  ...}`
I have attached a full working example to this post.
Using this mechanism allows you to stop the search much sooner.

I hope this helps.

Sincerely,
-Laurent
« Last Edit: December 09, 2012, 05:57:20 PM by lolo »
Logged

#### jovanjovanovic

• Posts: 4
##### Re: JPPF for parallelization of loops.
« Reply #4 on: December 16, 2012, 12:21:00 AM »

This is great, I can't believe that you actually wrote whole solution Thank You so much!
I'm reading this now and learning. I'll send you results as soon as I implement this on some real computing power.

Funny thing is that I can understand this code but I can't understand how to download that attachment

Thank you again!
Logged

#### lolo

• JPPF Council Member
• Posts: 2272
##### Re: JPPF for parallelization of loops.
« Reply #5 on: December 16, 2012, 08:08:52 PM »

Hi,

I have uploaded it here: http://www.jppf.org/private/parallelforloop.zip
Let me know if it doesn't work.

-Laurent
Logged