Clementson's Blog

Bits and pieces (mostly Lisp-related) that I collect from the ether.

How are Lisps implemented?

Tuesday, October 11, 2005

One might intuitively think that most multi-platform CL implementations are written in C these days. I mean, that's what most language compilers are written in, right? However, you might be surprised to find that this isn't really true for some CL implementations! On comp.lang.lisp (c.l.l.), Edi Weitz pointed out that, for CMUCL at least (and probably also SBCL, OpenMCL, LispWorks, and Allegro CL), C represents only a small portion of the SLOC in the implementation:

"Let's have a look at the CMUCL sources:
  edi@vmware:/tmp/src$ find . -name '*.c' -o -name '*.h' | xargs cat | wc -l
  35577
  edi@vmware:/tmp/src$ find . -name '*.lisp' | xargs cat | wc -l
  423835
Note that the C files include benchmarks, a Motif server, and code for several different architectures and operating systems. For a specific platform I think it is safe to say that less than 5% of the CMUCL code is C. AFAIK the C code is just a thin 'driver' to get the image started in a hostile (Unix) environment.

I'm pretty sure similar numbers apply to SBCL, OpenMCL, LispWorks, and AllegroCL. Maybe Duane, Christophe, or other implementors who read c.l.l want to chime in. (GCL, ECL, and CLISP are different, of course.)

Saying that CMUCL is partly implemented in C is a bit of a stretch IMHO. It's like saying that Apache is partly implemented as a shell script... :)"
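Taken at face value, Edi's totals put C at 35577 / (35577 + 423835) = 35577 / 459412, or about 7.7% of all lines in the tree. Because those C files also cover benchmarks, a Motif server, and several architectures and operating systems, the share for any one platform build should be smaller still, which is where his under-5% estimate comes from. Note that `wc -l` counts physical lines, so both figures include comments and blank lines rather than strict SLOC. For anyone who wants to repeat the comparison on another source tree, such as OpenMCL or SBCL, here is a minimal sketch of the same measurement as a POSIX shell function. The name `sloc_share` is hypothetical (it is not from Edi's post), and the sketch assumes `find`, `wc`, `tr`, and `awk` are available.

```shell
# Hypothetical helper: count physical lines (as `wc -l` does, so comments
# and blank lines are included) in C and Lisp files under a directory,
# then print the C share of the combined total.
sloc_share() {
    dir=${1:-.}
    # -exec ... + batches file names and, unlike a bare `xargs`, copes
    # with spaces in them. tr strips the padding some wc versions add.
    c_lines=$(find "$dir" -type f \( -name '*.c' -o -name '*.h' \) \
        -exec cat {} + | wc -l | tr -d ' ')
    lisp_lines=$(find "$dir" -type f -name '*.lisp' \
        -exec cat {} + | wc -l | tr -d ' ')
    total=$((c_lines + lisp_lines))
    if [ "$total" -eq 0 ]; then
        echo "sloc_share: no C or Lisp files under $dir" >&2
        return 1
    fi
    # sh arithmetic is integer-only, so let awk compute the percentage.
    awk -v c="$c_lines" -v l="$lisp_lines" 'BEGIN {
        printf "C: %d  Lisp: %d  C share: %.1f%%\n", c, l, 100 * c / (c + l)
    }'
}
```

Running `sloc_share /tmp/src` on a checkout prints the C and Lisp line counts followed by the C share. Like Edi's one-liners, it only looks at `.c`, `.h`, and `.lisp` files, so assembler sources and other extensions are not counted.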
Just out of curiosity, I checked the numbers for OpenMCL and SBCL as well. Here are the results I got:

Now, mind you, comparisons of this sort don't tell you much without further supporting analysis. Therefore, I probably wouldn't have blogged about this if Duane Rettig of Franz hadn't followed up Edi's post with a very interesting breakdown of Allegro CL (slightly edited):
"Allegro CL is written with three languages:
    Image:

  1. The Common Lisp itself: The lisp compiler compiles down to .fasl files, which are loaded (or deferred to autoload) at build time and dumped into an image.

    Library:

  2. C code: This code is mainly interface code, using .h files to parse the incredibly complex macrology involved on some architectures. Objects like stat structures and signal contexts are parameterized on the whims of the vendor's developers (subject to whatever new standards they are asked to follow). Rather than trying to track these structures by hand over 15 to 20 different operating systems, we let the vendor's C compiler do the work for these interfacing needs.

  3. A 'runtime system Lisp': This lisp source is not Common Lisp, but a lisp where integers are machine-integers and operations like + and logand are compiled into individual machine instructions. The Allegro CL compiler is used, in a runtime-system mode, to generate assembler source code which is then linked with the C object files and any libraries to form a shared library.

    [Note on the library: our garbage collector is also currently written in C. It need not be, and I've been enhancing the runtime-system lisp to handle all constructs that are needed by the gc. One thing remains that I'm aware of: our rs lisp knows how to reference and set global variables (not lisp globals, but 'globl' or 'public' variables; some would call these C variables, but they are really just exported named data words known by the linker), but it does not yet know how to define and allocate the storage for such globals, so currently that has to be done in C. I may end up borrowing CL's defvar for such a purpose, but haven't decided. This has been a low-priority project, but once it is done it will allow the gc to be rewritten (or new ones written) in rs lisp.]

    In addition to the library, there is some C and/or C++ code which is linked to form the main(), whose purpose is to load the shared library and to run the lisp_main. It is a very small amount of C code, and we give the user the option to write his own, in his own favorite language.

As for LOC, I won't give those out, but suffice it to say that the Lisp and rs lisp code far outstrips the C code, and if we were to move our gc over to rs lisp the C code would become incidental."
Cool! It's really neat to get this sort of "behind the scenes" look at how a major Common Lisp implementation is designed! As language users, we tend to focus on the "outward" characteristics of the CL implementation we use (e.g., performance, ANSI conformance, application library support); however, it's nice every once in a while to see the kind of innovative work and thinking that goes on "under the covers". It would be great to see summaries posted by some of the other commercial and open source CL implementers too.

Copyright © 2005 by Bill Clementson