Bill Clementson's Blog

Bits and pieces (mostly Lisp-related) that I collect from the ether.

January 2005
Sun Mon Tue Wed Thu Fri Sat
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
Dec  Feb

Parallel Computing in Lisp - Part 4

Tuesday, January 25, 2005

I've had a bit of feedback from my previous posts (see Part 1, Part 2, Part 3) on parallel computing in Lisp. I'll cover that feedback as well as tie in a couple of related topics in this post.

Charles Stewart is bothered that I used the Wikipedia definitions for parallelism, concurrency and distribution that (in his opinion) "abolish useful distinctions that I think they ought to express". His feels that:

  1. Parallelism is a property of a computation or a machine, which means that control flow occurs on parallel pathways. It is opposed to sequentiality as a property of computation or machines;
  2. Concurrency is a property of programs and programming languages, which means the programming language has control constructs (or the program makes use of such) that are most naturally realised by parallel computations. It is, a little confusingly, also opposed to sequentiality as a property;
  3. Parallelism and concurrency are intimately related, but it is frequently useful to distinguish them: one expresses the program view, the other the execution view. Concurrent code may be realised on a sequential machine by use of, say, time-sharing; similarly sequential code may be realised by a parallel computation by means of a behind-the-scenes program transformation, which discovers opportunity for parallelism by means of flow analysis.
  4. Distribution is something else entirely: of a computation it means that it occurs in space spread out over many locations; of code it means the code expresses that data is stored or code is to be computed at several, named, locations.
  5. Distribution normally entails parallelism/concurrency, but code may be concurrent without describing locations, and quantum computation allows parallelism to happen in one place. Furthermore, this entailment is not necessary: a machine that is spread out over many locations, but which is designed so that computation may occur at one sublocation only when computation is suspended at other locations, is not parallel.
Luke Gorrie sent me a similar comment via email. He mentioned that there were a couple of good discussions of parallel vs. concurrent on the Lambda the Ultimate weblog (see here and here) and summarized what he felt were the differences (emphasis is mine):
"I also like these distinct definitions:
Parallelism is about solving a problem using multiple machines. For example, to render an image using a network of 100 computers.

Concurrency is about coordinating lots of things that are happening at the same time. For example, writing an IRC server.
Some problems combine both, like a database handling lots of concurrent transactions while using lots of disks and CPUs in parallel, but it's not always so.

I find this distinction useful in very many ways. For one, it says that Lisp-based research on parallel programming is some of the best work that's ever been done, and that no research on concurrent Lisp programming has been done at all!"
Although I agree with both Charles and Luke that it is sometimes useful to make these distinctions, I feel that the boundaries between "parallelism", "concurrency" and "distribution" will get very blurry in the future. That is why I like the Wikipedia definitions.

In his email, Luke also pointed out another good reference (link added):
"Did you know that Tony Hoare's classic book Communicating Sequential Processes (CSP) used Lisp (with M-expr syntax) as a kind of "formal semantics"? The book is online here: http://www.usingcsp.com/. I love it for some reason, I think it's that it gives a nice visual metaphor for concurrent systems (it almost makes them feel mechanical - I think that's a property of synchronous message passing systems but I've never actually tried programming that way)."
The mention of Hoare's CSP book is a good segue to my next email: Matthew Jadud (whose weblog I enjoy reading) emailed me about the concurrent programming language that he and Christian Jacobsen have been working on. It's called Transterpreter and is a new runtime for the occam programming language: "The Transterpreter is a virtual machine for the occam-pi programming language (a language designed explicitly for parallel programming, it is based on the Communicating Sequential Processes (CSP) algebra). The Transterpreter is based on the INMOS Transputer; currently, it allows classical occam and occam-pi programs to be run on virtually any platform, ranging from Windows, Mac OS X and Linux desktops (on Intel X86, MIPS and PowerPC processors). We're especially interested in running occam programs on small devices like the Lego Mindstorms, PDAs, and mobile phones. The Transterpreter is essentially a portable runtime for the KRoC compiler." Interestingly, occam-pi (on which Transterpreter is based) is meant to blend the best features of both CSP and pi-calculus: "In the interests of proveability, we have been careful to preserve the distinction between the original simple static point-to-point synchronised communication of occam and the dynamic asynchronous multiplexed communication of the pi-calculus; in this we have been prepared to sacrifice the elegant sparsity of the pi-calculus. We conjecture that the extra complexity (and discipline) introduced will make the task of proving concurrent and distributed programs easier." They'll be presenting this work at ACM's SIGCSE (Special Interest Group Computer Science Education) next month. There is an overview available as well as a paper and there will be some more information posted about the language on the Transterpreter weblog over the next few weeks. If you're not into occam but are into lisp, you might want to read Matt's earlier posting where he played around with MzScheme's ConcurrentML library and hacked together an s-expression-based syntax for occam-y constructs.

Since Matt's Transterpreter is designed to run on both desktops and small devices, it is worth pointing out the cell processor that is being designed for the PlayStation 3, Sony, Toshiba and IBM. This processor is targeted towards both desktop and small form factor devices and is a "wild card" that could potentially change the market for distributed, parallel applications. Read the hype:
"The first Cell based desktop computer will be the fastest desktop computer in the industry by a very large margin. Even high end multi-core x86s will not get close. Companies who produce microprocessors or DSPs are going to have a very hard time fighting the power a Cell will deliver. We have never seen a leap in performance like this before and I don't expect we'll ever see one again, It'll send shock-waves through the entire industry and we'll see big changes as a result.

The sheer power and low cost of the Cell means it will present a challenge to the venerable PC. The PC has always been able to beat competition by virtue of it's huge software base, but this base is not as strong as it once was. A lot of software now runs on Linux and this is not dependant on x86 processors or Microsoft. Most PCs now provide more power than is necessary and this fact combined with fast JIT emulators means that if necessary the Cell can provide PC compatibility without the PC.

It will not just attack the PC industry but expect it to be widely used in embedded applications where high performance is required. This means it will be made in numbers potentially many times that of x86 CPUs and this will reduce prices further. This will also hurt PC based vendors' desires to enter the home entertainment space as PC based solutions [Entertainment] will be more complex and cost more than Cell based systems.

This is going to prove difficult for the PC as CPU and GPU suppliers will have essentially nothing to fight back with. All they can hope to do is match a Cell's performance but even that is going to be incredibly difficult given the Cell's aggressive Cray-esqe design strategy.

Cell is going to turn the industry upside down, nobody has ever produced such a leap in performance in one go and certainly not at a low price. The CPU producers will be forced to fight back and irrespective of how well the Cell actually does in the market you can be sure that in a few short years all CPUs will be providing vastly more processing resources than they do today. Even if the Cell was to fail we shall all gain from it's legacy.

Not all companies will react correctly or in time, this will provide opportunities for newer, smaller and smarter companies. Big changes are coming, they may take years but the Cell means a decade from now the technology world is going to look very different."
If a fraction of this prediction is true, people who are ready to take advantage of this form of distributed computing power stand to profit from this "next-gen" processor architecture. If you're interested in reading more about the Cell architecture, you might want to follow some of the Technorati references. Incidentally, the following cells distributed processing diagram reminded me of an earlier Lisp-based approach:


Paul Dietz (well-known in Lisp circles for the Common Lisp compliance test suite he's been working on) mentioned the Cilk language:
"Cilk is a variant of C with features for spawning parallel computations (Cilk programs become equivalent serial C programs when the new Cilk keywords are removed). Its implementation has an effective and highly cool 'work stealing' scheduler for allocating large numbers of fine grained tasks among a smaller set of worker threads.

I don't know if Cilk would be suitable for something like a web server, but for highly parallel computational tasks programs it has done well. A team using Cilk won the ICFP'98 Programming Contest, and Cilkchess, a chess program written in Cilk, came in 4th (out of 30 programs) in the 1999 World Computer Chess Championship."
And, lastly, Ken Dyck sent me an email in which he mentioned that the Mozart Programming System has features similar to those I discussed when I previously mentioned Kenny Tilton's Cells. From the Mozart website:
"The Mozart system provides state-of-the-art support in two areas: open distributed computing and constraint-based inference. Mozart implements Oz, a concurrent object-oriented language with dataflow synchronization. Oz combines concurrent and distributed programming with logical constraint-based inference, making it a unique choice for developing multi-agent systems. Mozart is an ideal platform for both general-purpose distributed applications as well as for hard problems requiring sophisticated optimization and inferencing abilities. We have developed applications in scheduling and time-tabling, in placement and configuration, in natural language and knowledge representation, multi-agent systems and sophisticated collaborative tools."
There's definitely a lot happening in the parallel/concurrent/distributed space at the moment. I think this is an area where a lot of opportunities exist.

emacs Copyright © 2005 by Bill Clementson