Clojure Discussion - When Performance Matters
Tuesday, January 13, 2009
There has been an interesting thread on the Clojure mailing list about Clojure and performance. Mark P started off the thread with the following:
"I have recently found out about Clojure and am rather impressed. I am seriously considering whether Clojure is a viable language for use at work. The main stumbling block would be if performance (both speed and memory) turns out to be insufficient. I currently use C++, but I'd love to be able to leave C++ behind and use Clojure (or similar) instead.
The programs I write perform applied mathematical optimization (using mainly integer arithmetic) and often take hours (occasionally even days) to run. So even small percentage improvements in execution speed can make a significant practical difference. And large problems can use a large amount of memory, so memory efficiency is also a concern.
Given these performance considerations, at first glance Clojure does not seem like a good choice. But I don't want to give up on the idea just yet. The allure of modernized lisp-style programming is really tempting.
There are three key factors that still give me hope:
- Some of the algorithms I use have the potential to be parallelized. I am hoping that as the number of cores in PCs increases, at some point Clojure's performance will beat C++'s due to Clojure's superior utilization of multiple cores. (Any ideas on how many cores are needed for this to become true?)
- The JVM is continually being improved. Hopefully in a year or two, the performance of HotSpot will be closer to that of C++. (But maybe this is just wishful thinking.)
- Maybe I can implement certain performance-critical components in C++ via JNI. (But I get the impression that JNI itself isn't particularly efficient. Also, the more I pull over into the C++ side, the fewer advantages there are to using Clojure.)
If all else fails, maybe I could use Clojure as a prototyping language, and then, when I get it right, code up the actual programs in C++. But probably a lot would get lost in the translation from Clojure to C++, so would it be worth it?
I'd love to be convinced that Clojure is a viable choice, but I need to be a realist too. So what do people think? How realistic are my three 'hopes'? And are there any other performance-enhancing possibilities that I have not taken into account?"
Among the replies, there were several suggestions to use better/alternative C/C++ interoperability mechanisms, since a common consensus (frequently based on benchmarks like this Debian one: http://shootout.alioth.debian.org/) was that a JVM-based language could not equal the speed of C/C++. However, some comments by concurrency and JVM guru Cliff Click provided an interesting contrarian view. In the past, I've written (tongue-in-cheek) about how it's fairly easy to prove what you want to prove with benchmarks, and I think Cliff's comments are right "on the money" as far as JVM performance is concerned. Clojure provides a number of ways to optimize code for performance (e.g., type hints, support for Java primitives, and coercions), while JVM-specific technologies provide many other optimization alternatives that a Clojure programmer can leverage when needed. Here is what Cliff had to say (note: links inserted by me):
"Some comments:
- If you think that HotSpot/Java is slower than C++ by any interesting amount, I'd love to see the test case. As the architect of HotSpot '-server', I have a keen interest in any place where performance isn't on par with C. Except for a handful of specialized uses (e.g., high-level interpreters using gnu label vars), I've only ever seen equivalent performance between C/C++ & Java for equivalent code (not so with asm+C, where the asm calls out specialized ops or makes specialized optimizations).
- As already mentioned, there are no auto-parallelization tools Out There that are ready for prime time. (There ARE tools out there that can *help* parallelize an algorithm, but you need annotations, etc., to make them work.)
- Making your algorithm parallel is worth an N-times speedup, where N is limited by the algorithm & available CPUs. Since you can get huge CPU counts these days, if you can parallelize your algorithm you'll surely win over almost any other hacking. If you take a 50% slowdown in the hacking but get to run well on a 4-way box, then you're 2x ahead. I'd love to say that the JVM 'will just do it', but hand-hacking for parallelism is the current state of the art.
- Java/Clojure makes some of this much easier than C/C++. Having a memory model is a HUGE help in writing parallel code, as are the Java concurrency libraries or the above-mentioned Colt libraries.
- The Debian shootout results generally badly misrepresent Java. Most of them have runtimes that are too small (<10 sec) to show off the JIT, and they generally don't use any of the features which commonly appear in large Java programs (heavy use of virtuals, deep class hierarchies, etc.) and for which the JIT does a lot of optimization. I give a public talk on the dangers of microbenchmarks, and all the harnesses I've looked at in the shootout fail basic sanity checks. Example: the fannkuch benchmark runs 5 sec in Java, somewhat faster in C++. Why does anybody care about a program which runs 5 sec? (There are other, worse problems: e.g., the C++ code gets explicit constant array sizes hand-optimized via templates; the equivalent Java optimization isn't done but is trivial (declare 'n' as a *final* static var), and doing so greatly speeds up Java.)
- If you need a zillion integer (not FP) parallel Java cycles, look at an Azul box. Azul's got more integer cycles in a flat shared-memory config than anybody else by a long shot."
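To make the type-hint/primitive point above a little more concrete, here is a minimal sketch of my own (not from the thread; the function names are hypothetical) of how a hint plus primitive arithmetic keeps a hot loop unboxed. Note that primitive function arguments like ^long arrived in Clojure releases later than the one current as of this post:

```clojure
;; Idiomatic but boxed: every intermediate number is an Object,
;; so the JIT sees allocation and method calls in the hot loop.
(defn sum-squares-boxed [n]
  (reduce + (map #(* % %) (range n))))

;; The ^long hint plus long literals keep i and acc as primitive
;; longs, so the loop compiles down to unboxed long arithmetic.
(defn sum-squares-primitive [^long n]
  (loop [i 0, acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (* i i)))
      acc)))
```

Both versions compute the same result; the difference only shows up in how the JIT can compile the loop body.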
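Cliff's "make your algorithm parallel yourself" point maps directly onto Clojure's stock tools. A hedged sketch, where the prime-counting kernel is just a hypothetical stand-in for Mark's integer-heavy workloads:

```clojure
;; Hypothetical CPU-bound kernel: trial-division primality test.
(defn prime? [n]
  (and (> n 1)
       (not-any? #(zero? (rem n %))
                 (range 2 (inc (long (Math/sqrt n)))))))

;; Count primes in the half-open range [lo, hi).
(defn count-primes [[lo hi]]
  (count (filter prime? (range lo hi))))

;; Split [0, n) into roughly equal chunks and farm them out with pmap;
;; with large enough chunks this approaches an N-times speedup on N cores.
(defn count-primes-parallel [n nchunks]
  (let [step   (max 1 (quot n nchunks))
        edges  (concat (range 0 n step) [n])
        chunks (partition 2 1 edges)]
    (reduce + (pmap count-primes chunks))))
```

pmap is only semi-eager and has fixed chunking, so for serious work you would likely reach for java.util.concurrent executors instead; but it shows the shape of the win Cliff describes, with no locks in user code.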
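Cliff's complaint that sub-10-second runtimes "don't show off the JIT" can be illustrated with a toy warm-up harness (my own sketch, with hypothetical names; a real harness also has to worry about GC pauses, deoptimization, and dead-code elimination):

```clojure
;; Time one call of f in nanoseconds.
(defn time-ns [f]
  (let [t0 (System/nanoTime)]
    (f)
    (- (System/nanoTime) t0)))

;; Run f enough times for the JIT to compile its hot paths,
;; then report the mean time of the measured iterations.
(defn benchmark-mean-ns [f warmup-iters timed-iters]
  (dotimes [_ warmup-iters] (f))
  (/ (reduce + (repeatedly timed-iters #(time-ns f)))
     (double timed-iters)))
```

Without the warm-up loop, a short benchmark mostly measures the interpreter and the compiler itself, which is exactly the sanity check Cliff says the shootout harnesses fail.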