Clojure could be to Concurrency-Oriented Programming what Java was to OOP
Monday, December 1, 2008
It's difficult to know what specific things will cause a programming language
to succeed. It's always easier in hindsight to come up with the
reasons why one language was successful and another one wasn't. I think that sometimes it's a case of being in the right place at the
right time with the right set of features. In his famous article
"The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in
Software", Herb Sutter said "Concurrency is the next major revolution in how we
write software". That was in 2005 - today, this is even more
evident with the proliferation of multi-core CPU's.
Clojure is a new language and one of the things that Clojure really seems to "get right" is it's
approach to concurrency. It is extremely simple to use (and
understand) and (especially if you've ever
written concurrent code in Java or
C++) most people will find writing concurrent Clojure programs a real
pleasure! I've written
about concurrency quite a bit in the past (mostly in relation to
Lisp or Erlang), and I was very impressed by how Rich Hickey
has made concurrency support an integral part of Clojure. So, what
makes Clojure's concurrency support different from other languages?
A comment by Dion Almaer, former editor of TheServerSide, is recounted in the Preface to the book Java Concurrency in Practice in which he stated "that most Java programs are so rife with concurrency bugs that they work only 'by accident'". And, if you read the book, it's easy to see why getting concurrency right in Java is so difficult (although the book is about Java concurrency, the same issues are common to many popular programming languages today). At the end of Part I of Java Concurrency in Practice, there is a nice "cheat sheet" summary of the main concepts and rules related to writing good concurrent code:
- t's the mutable state, stupid. All concurrency issues boil down to coordinating access to mutable state. The less mutable state, the easier it is to ensure thread safety.
- Make fields final unless they need to be mutable.
- Immutable objects are automatically thread-safe. Immutable objects simplify concurrent programming tremendously. They are simpler and safer, and can be shared freely without locking or defensive copying.
- Encapsulation makes it practical to manage the complexity. You could write a thread-safe program with all data stored in global variables, but why would you want to? Encapsulating data within objects makes it easier to preserve their invariants; encapsulating synchronization within objects makes it easier to comply with their synchronization policy.
- Guard each mutable variable with a lock.
- Guard all variables in an invariant with the same lock.
- Hold locks for the duration of compound actions.
- A program that accesses a mutable variable from multiple threads without synchronization is a broken program.
- Don't rely on clever reasoning about why you don't need to synchronize.
- Include thread safety in the design process-or explicitly document that your class is not thread-safe.
- Document your synchronization policy.
- Making all variables and data structures immutable by default. If you look at the "cheat sheat" summary above, this alone eliminates a huge range of concurrency problems that one has to deal with in Java (and other languages that provide mutable variables and data structures as the default).
- Using Agents to manage the update of independent values in their own thread of execution. Agents run in a thread pool and have a consistent execution sequence. Therefore (as explained in the documentation), "The actions of all Agents get interleaved amongst threads in a thread pool. At any point in time, at most one action for each Agent is being executed. Actions dispatched to an agent from another single agent or thread will occur in the order they were sent, potentially interleaved with actions dispatched to the same agent from other sources."
- Making provision for thread-local bindings of "inherited" variables (called "Vars" in Clojure). By default, threads will inherit the value of any Vars that have been set by the calling process. However, threads can create their own "local binding" to these Vars without affecting the value of the Var in the calling process or in another thread. The "local binding" can be established temporarily in a thread with the "root" binding being restored afterwards. All such temporary bindings are local to the thread only.
- Providing Software transactional memory (STM) control of updates to thread-shared variables (called "Refs" in Clojure). Multiple agents may all need to update the same Ref. Clojure's STM implementation provides "a"tomicity (every change to Refs made within a transaction occurs or none do), "c"onsistency (Refs remain in a consistent state before the start of a transaction and after the transaction is over regardless of whether the trasaction was successful or not), and "i"solated (no transaction sees the effects of any other transaction while it is running) ACID properties for any update of a Ref. Clojure's STM uses multiversion concurrency control with adaptive history queues for snapshot isolation, and provides a commute operation. This means that any "reads" of Refs are guaranteed to provide a consistent set of values ("snapshot") regardless of whether there are currently updates occuring to the Refs in other threads and any "writes" of the Refs are guaranteed to be applied against the same snapshot that was read. Clojure's STM manages the necessary locking (behind the scenes) of all the objects. The commute operation allows Clojure programs to read-and-update a Ref within a transaction (for example, if multiple Agents are incrementing a counter, it's important that they all be able to correctly update the counter; however, the updates do not necessarily need to occur in any particular order).
When compared to Java/C++, Clojure's approach to concurrency is much simpler and easier to "get right" when you're coding concurrent programs. However, Clojure isn't the only language to provide "native" support for concurrency. It's interesting to compare the approach that Clojure has taken with the approach that Erlang (another language that has built-in support for concurrency) has taken. I've posted a bit about Erlang in the past on my blog; so, although I'm not an Erlang expert, I do know something about the language. I'll focus on three main areas as a basis for comparison:
- Autonomous/independent threads of execution:
Actors in Erlang,
Agents in Clojure. Erlang's actors are light-weight "user-land"
processes (e.g. -
"green threads") that are autonomous and can scale tremendously. Clojure's agents are native
threads that are managed in
thread pools for performance. As
explained in the documentation, "Clojure's Agents are reactive, not autonomous - there is no imperative message loop and no blocking receive". Rich Hickey
has summarized the Actor/Agent differences and his design rationale:
"There are other ways to model identity and state, one of the more popular of which is the message-passing actor model, best exemplified by the quite impressive Erlang. In an actor model, state is encapsulated in an actor (identity) and can only be affected/seen via the passing of messages (values). In an asynchronous system like Erlang's, reading some aspect of an actor's state requires sending a request message, waiting for a response, and the actor sending a response. It is important to understand that the actor model was designed to address the problems of distributed programs. And the problems of distributed programs are much harder - there are multiple worlds (address spaces), direct observation is not possible, interaction occurs over possibly unreliable channels, etc. The actor model supports transparent distribution. If you write all of your code this way, you are not bound to the actual location of the other actors, allowing a system to be spread over multiple processes/machines without changing the code.
I chose not to use the Erlang-style actor model for same-process state management in Clojure for several reasons:
- It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock. Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.
- It doesn't let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying. Reads and writes get serialized and block each other, etc.
- It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can't observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.
- It is often the case that taking something that works well locally and transparently distributing it doesn't work out - the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn't transparent and the code has to change anyway.
- Safe use/update of shared data:
Mnesia in Erlang,
STM in Clojure. In Erlang, control of updates to
thread-shared data is handled by
Mnesia, a distributed DBMS. This provides full
ACID data integrity; however, Erlang is designed to be a
distributed concurrency language and Clojure is
not, so the cost/benefit of using a DBMS for non-distributed data integrity has to be
taken into consideration. Clojure's STM-based approach is an effective, well-performing
solution that doesn't require any out-of-process
storage. Although STM
has recently had some vocal detractors, most of the criticisms of STM
would appear to apply more in an environment where mutable
variables/structures are the norm. There is a
thread on the Clojure mailing list that details a number of
reasons why Clojure's use of STM avoids many of the things that
STM has been criticized for. As concurrency-guru
Cliff Click says
in the thread:
"I think Clojure is on to something good here - Immutability IS the Big Hammer For Concurrency here, and the STM is just one of several flavors of 'Big Hammer isn't working so I need another approach'. Given the Immutability-By-Default, STM has a much better chance of being performant than in other languages so it makes sense to give it a go."
- Distributed concurrency: This one really subsumes the
other two (as well as some others that I won't be
discussing here). Providing native support for
distributed concurrency (also called
parallel computing) affects quite a few design choices when you're
building a language. Erlang was designed from the outset to
support both distributed and non-distributed (same-address-space)
concurrency. Therefore, Erlang uses the same actor-based messaging model in both distributed and non-distributed
processing environments and Mnesia is designed to provide a shared data store for
Erlang processes that are not necessarily running on the same
machine. These choices make sense for Erlang because Erlang has a
unified approach to
both distributed and non-distributed concurrency. Clojure deliberately does not have any
unified, "built-in" support for distributed concurrency. This
decision means that distributed concurrency can be more difficult to
program in Clojure than in Erlang; however, it also means that
non-distributed concurrency (which, for many applications, will be
the more common concurrency scenario) can be catered for in an easier manner. In
a comparison of the two languages,
Rich Hickey explained his reasons
for not designing Clojure with support for a distributed concurrency model:
"I have some reservations about unifying the distributed and non-distributed models (see e.g. http://research.sun.com/techrep/1994/smli_tr-94-29.pdf), and have decided not to do so in Clojure, but I think Erlang, in doing so, does the right thing in forcing programmers to work as if the processes are distributed even when they are not, in order to allow the possibility of transparent distribution later, e.g. in the failure modes, the messaging system etc. However, issues related to latency, bandwidth, timeouts, chattiness, and costs of certain data structures etc remain. My experiences with transparent distribution were with COM and DCOM, and, quite frankly, not happy ones. I think Erlang has a much better story there, but the fact is that distributed programming is more complex, and I personally wouldn't want to incur that complexity all the time. I think it is ok for a system designer to decide one part of a system will be distributed and another not, and incur some rewrite if they are wrong. If I wrote phone switches I might think otherwise :) "Rich recently outlined his thoughts on what is available (for distributed concurrency support) as well as what he might do in the future:
"There's JMS:So, for the time being, it's possible to "roll your own" distributed Clojure programs; but, in the future, there may be some extra or built-in support for distributed concurrency in Clojure.
DHTs like Pastry:
and more. All useful from Clojure. Given the diversity, sophistication, maturity, interoperability, robustness etc of these options, it's unlikely I'm going to fiddle around with some language-specific solution. That said, I have been thinking about putting a simple wrapper API around queues that would work both locally and over something like JMS."
- With the recent proliferation of multi-core CPUs, there are a growing number of developers who are developing applications that use concurrency.
- Clojure simplifies the process of developing a typical concurrent application.
- Many "modern" legacy systems are written in Java. Clojure runs on the JVM which makes it more likely to get a "foot into" existing corporate development shops that have a large investment in Java.
- Clojure interops seamlessly with Java and therefore provides easy access to a huge range of Java libraries that Clojure programmers can use.