Bill Clementson's Blog

Bits and pieces (mostly Lisp-related) that I collect from the ether.

December 2008
Sun Mon Tue Wed Thu Fri Sat
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Nov  Jan

Clojure could be to Concurrency-Oriented Programming what Java was to OOP

Monday, December 1, 2008

It's difficult to know what specific things will cause a programming language to succeed. It's always easier in hindsight to come up with the reasons why one language was successful and another one wasn't. I think that sometimes it's a case of being in the right place at the right time with the right set of features. In his famous article "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software", Herb Sutter said "Concurrency is the next major revolution in how we write software". That was in 2005 - today, this is even more evident with the proliferation of multi-core CPU's. Clojure is a new language and one of the things that Clojure really seems to "get right" is it's approach to concurrency. It is extremely simple to use (and understand) and (especially if you've ever written concurrent code in Java or C++) most people will find writing concurrent Clojure programs a real pleasure! I've written about concurrency quite a bit in the past (mostly in relation to Lisp or Erlang), and I was very impressed by how Rich Hickey has made concurrency support an integral part of Clojure. So, what makes Clojure's concurrency support different from other languages?

A comment by Dion Almaer, former editor of TheServerSide, is recounted in the Preface to the book Java Concurrency in Practice in which he stated "that most Java programs are so rife with concurrency bugs that they work only 'by accident'". And, if you read the book, it's easy to see why getting concurrency right in Java is so difficult (although the book is about Java concurrency, the same issues are common to many popular programming languages today). At the end of Part I of Java Concurrency in Practice, there is a nice "cheat sheet" summary of the main concepts and rules related to writing good concurrent code:

  1. t's the mutable state, stupid. All concurrency issues boil down to coordinating access to mutable state. The less mutable state, the easier it is to ensure thread safety.
  2. Make fields final unless they need to be mutable.
  3. Immutable objects are automatically thread-safe. Immutable objects simplify concurrent programming tremendously. They are simpler and safer, and can be shared freely without locking or defensive copying.
  4. Encapsulation makes it practical to manage the complexity. You could write a thread-safe program with all data stored in global variables, but why would you want to? Encapsulating data within objects makes it easier to preserve their invariants; encapsulating synchronization within objects makes it easier to comply with their synchronization policy.
  5. Guard each mutable variable with a lock.
  6. Guard all variables in an invariant with the same lock.
  7. Hold locks for the duration of compound actions.
  8. A program that accesses a mutable variable from multiple threads without synchronization is a broken program.
  9. Don't rely on clever reasoning about why you don't need to synchronize.
  10. Include thread safety in the design process-or explicitly document that your class is not thread-safe.
  11. Document your synchronization policy.
Significantly, almost all of these relate to the difficulties/complexities associated with the synchronization of access to mutable variables. In fact, it's worthwhile to reiterate that item #1 explicitly states "All concurrency issues boil down to coordinating access to mutable state". Clojure avoids most of these issues by:
  1. Making all variables and data structures immutable by default. If you look at the "cheat sheat" summary above, this alone eliminates a huge range of concurrency problems that one has to deal with in Java (and other languages that provide mutable variables and data structures as the default).
  2. Using Agents to manage the update of independent values in their own thread of execution. Agents run in a thread pool and have a consistent execution sequence. Therefore (as explained in the documentation), "The actions of all Agents get interleaved amongst threads in a thread pool. At any point in time, at most one action for each Agent is being executed. Actions dispatched to an agent from another single agent or thread will occur in the order they were sent, potentially interleaved with actions dispatched to the same agent from other sources."
  3. Making provision for thread-local bindings of "inherited" variables (called "Vars" in Clojure). By default, threads will inherit the value of any Vars that have been set by the calling process. However, threads can create their own "local binding" to these Vars without affecting the value of the Var in the calling process or in another thread. The "local binding" can be established temporarily in a thread with the "root" binding being restored afterwards. All such temporary bindings are local to the thread only.
  4. Providing Software transactional memory (STM) control of updates to thread-shared variables (called "Refs" in Clojure). Multiple agents may all need to update the same Ref. Clojure's STM implementation provides "a"tomicity (every change to Refs made within a transaction occurs or none do), "c"onsistency (Refs remain in a consistent state before the start of a transaction and after the transaction is over regardless of whether the trasaction was successful or not), and "i"solated (no transaction sees the effects of any other transaction while it is running) ACID properties for any update of a Ref. Clojure's STM uses multiversion concurrency control with adaptive history queues for snapshot isolation, and provides a commute operation. This means that any "reads" of Refs are guaranteed to provide a consistent set of values ("snapshot") regardless of whether there are currently updates occuring to the Refs in other threads and any "writes" of the Refs are guaranteed to be applied against the same snapshot that was read. Clojure's STM manages the necessary locking (behind the scenes) of all the objects. The commute operation allows Clojure programs to read-and-update a Ref within a transaction (for example, if multiple Agents are incrementing a counter, it's important that they all be able to correctly update the counter; however, the updates do not necessarily need to occur in any particular order).
The following simplistic diagram illustrates a very basic example of these Clojure concurrency features. In the diagram, a Clojure program has created 4 separate Agents, 2 Vars, and 1 Ref. All 4 Agents can return their own independent values when queried by the calling process. Agents 1 & 2 both reference the Var 1 that was defined at the "root" level; however, when Agent 2 changes the value of Var 1, that change is local to Agent 2 only. Agent 1 sees the "root" binding of Var 1 and Agent 2 sees the local binding of Var 1 and neither see the alternative bindings. Agents 3 & 4 both need to update the value of Ref 1. However, since Refs can only be updated inside an STM transaction, the "A", "C", and "I" ACID properties of the data are maintained.

Clojure Concurrency

When compared to Java/C++, Clojure's approach to concurrency is much simpler and easier to "get right" when you're coding concurrent programs. However, Clojure isn't the only language to provide "native" support for concurrency. It's interesting to compare the approach that Clojure has taken with the approach that Erlang (another language that has built-in support for concurrency) has taken. I've posted a bit about Erlang in the past on my blog; so, although I'm not an Erlang expert, I do know something about the language. I'll focus on three main areas as a basis for comparison:
  1. Autonomous/independent threads of execution: Actors in Erlang, Agents in Clojure. Erlang's actors are light-weight "user-land" processes (e.g. - "green threads") that are autonomous and can scale tremendously. Clojure's agents are native threads that are managed in thread pools for performance. As explained in the documentation, "Clojure's Agents are reactive, not autonomous - there is no imperative message loop and no blocking receive". Rich Hickey has summarized the Actor/Agent differences and his design rationale:
    "There are other ways to model identity and state, one of the more popular of which is the message-passing actor model, best exemplified by the quite impressive Erlang. In an actor model, state is encapsulated in an actor (identity) and can only be affected/seen via the passing of messages (values). In an asynchronous system like Erlang's, reading some aspect of an actor's state requires sending a request message, waiting for a response, and the actor sending a response. It is important to understand that the actor model was designed to address the problems of distributed programs. And the problems of distributed programs are much harder - there are multiple worlds (address spaces), direct observation is not possible, interaction occurs over possibly unreliable channels, etc. The actor model supports transparent distribution. If you write all of your code this way, you are not bound to the actual location of the other actors, allowing a system to be spread over multiple processes/machines without changing the code.

    I chose not to use the Erlang-style actor model for same-process state management in Clojure for several reasons:
    • It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock. Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.
    • It doesn't let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying. Reads and writes get serialized and block each other, etc.
    • It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can't observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.
    • It is often the case that taking something that works well locally and transparently distributing it doesn't work out - the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn't transparent and the code has to change anyway.
    Clojure may eventually support the actor model for distributed programming, paying the price only when distribution is required, but I think it is quite cumbersome for same-process programming. YMMV of course."
  2. Safe use/update of shared data: Mnesia in Erlang, STM in Clojure. In Erlang, control of updates to thread-shared data is handled by Mnesia, a distributed DBMS. This provides full ACID data integrity; however, Erlang is designed to be a distributed concurrency language and Clojure is not, so the cost/benefit of using a DBMS for non-distributed data integrity has to be taken into consideration. Clojure's STM-based approach is an effective, well-performing solution that doesn't require any out-of-process storage. Although STM has recently had some vocal detractors, most of the criticisms of STM would appear to apply more in an environment where mutable variables/structures are the norm. There is a thread on the Clojure mailing list that details a number of reasons why Clojure's use of STM avoids many of the things that STM has been criticized for. As concurrency-guru Cliff Click says in the thread:
    "I think Clojure is on to something good here - Immutability IS the Big Hammer For Concurrency here, and the STM is just one of several flavors of 'Big Hammer isn't working so I need another approach'. Given the Immutability-By-Default, STM has a much better chance of being performant than in other languages so it makes sense to give it a go."
  3. Distributed concurrency: This one really subsumes the other two (as well as some others that I won't be discussing here). Providing native support for distributed concurrency (also called parallel computing) affects quite a few design choices when you're building a language. Erlang was designed from the outset to support both distributed and non-distributed (same-address-space) concurrency. Therefore, Erlang uses the same actor-based messaging model in both distributed and non-distributed processing environments and Mnesia is designed to provide a shared data store for Erlang processes that are not necessarily running on the same machine. These choices make sense for Erlang because Erlang has a unified approach to both distributed and non-distributed concurrency. Clojure deliberately does not have any unified, "built-in" support for distributed concurrency. This decision means that distributed concurrency can be more difficult to program in Clojure than in Erlang; however, it also means that non-distributed concurrency (which, for many applications, will be the more common concurrency scenario) can be catered for in an easier manner. In a comparison of the two languages, Rich Hickey explained his reasons for not designing Clojure with support for a distributed concurrency model:
    "I have some reservations about unifying the distributed and non-distributed models (see e.g. http://research.sun.com/techrep/1994/smli_tr-94-29.pdf), and have decided not to do so in Clojure, but I think Erlang, in doing so, does the right thing in forcing programmers to work as if the processes are distributed even when they are not, in order to allow the possibility of transparent distribution later, e.g. in the failure modes, the messaging system etc. However, issues related to latency, bandwidth, timeouts, chattiness, and costs of certain data structures etc remain. My experiences with transparent distribution were with COM and DCOM, and, quite frankly, not happy ones. I think Erlang has a much better story there, but the fact is that distributed programming is more complex, and I personally wouldn't want to incur that complexity all the time. I think it is ok for a system designer to decide one part of a system will be distributed and another not, and incur some rewrite if they are wrong. If I wrote phone switches I might think otherwise :) "
    Rich recently outlined his thoughts on what is available (for distributed concurrency support) as well as what he might do in the future:
    "There's JMS:
    http://en.wikipedia.org/wiki/Java_Message_Service
    https://mq.dev.java.net/
    http://joram.objectweb.org/
    http://activemq.apache.org/

    XMPP:
    http://en.wikipedia.org/wiki/Xmpp
    http://www.igniterealtime.org/projects/index.jsp

    JXTA/Shoal:
    http://en.wikipedia.org/wiki/Jxta
    https://shoal.dev.java.net/

    JINI:
    http://en.wikipedia.org/wiki/Jini
    http://incubator.apache.org/projects/river.html

    DHTs like Pastry:
    http://freepastry.org/

    JGroups:
    http://www.jgroups.org/javagroupsnew/docs/index.html

    Terracotta:
    http://www.terracotta.org

    Jinterface:
    http://www.erlang.org/doc/apps/jinterface/

    NetKernel:
    http://www.1060.org/

    and more. All useful from Clojure. Given the diversity, sophistication, maturity, interoperability, robustness etc of these options, it's unlikely I'm going to fiddle around with some language-specific solution. That said, I have been thinking about putting a simple wrapper API around queues that would work both locally and over something like JMS."
    So, for the time being, it's possible to "roll your own" distributed Clojure programs; but, in the future, there may be some extra or built-in support for distributed concurrency in Clojure.
To summarize, Erlang supports a unified approach to both distributed and non-distributed concurrency while Clojure focuses on non-distributed concurrency. One could find plenty of valid reasons to prefer either Erlang's or Clojure's approach to concurrency. The fact is, although both languages have strong support for concurrency as a guiding principle, they were both influenced by different design criteria so their concurrency support emphasizes different things. But, even though Erlang has a lot of really nice features for concurrency, it has not really caught on as a "mainstream" programming language (which is a shame as Erlang is a really nice language and superb for concurrency-oriented programming). Clojure seems much more likely to attract a larger following of developers due to a combination of different (but related) reasons: Java was never originally designed as a language for concurrent programming, it was designed as a language for embedded device programming with good OOP support. If you compared Java to C++ when Java came on the scene in the 1990's, it was less powerful, wasn't as fast, didn't have as many libraries, and it's support for Object-oriented programming (OOP) was not as featureful (and much less "complete" than some other languages). However, Java grabbed both market and mind share by providing the right combination of language and OOP features in a familiar package (familiar, at least, for C/C++ programmers) and in a way that was easier to use and "get right" than in C++. In many ways, Clojure has the potential to do for concurrency-oriented programming what Java did for object-oriented programming a decade ago: make it simpler to do properly using a language (or, in Clojure's case, a "language environment") that is similar to what programmers are already used to.

emacs Copyright © 2008 by Bill Clementson