Bill Clementson's Blog

Bits and pieces (mostly Lisp-related) that I collect from the ether.

September 2004
Sun Mon Tue Wed Thu Fri Sat
1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30
Aug  Oct

Allegro Prolog - Part 2

Monday, September 20, 2004

Yesterday, I posted about Allegro Prolog, Franz's new Prolog in CL. One thing I should point out is that, in comparing an Allegro Prolog version of Einstein's Riddle with a CL version, I wasn't trying to make any definitive performance statements (nor should my post be considered a critique of Edi's CL code - he didn't code his version of the problem with the intention that it would be used as a performance benchmark). I just took a CL program that had gone through a number of iterations to improve performance and compared it's performance with the Allegro Prolog equivalent. It would definitely be possible to improve both the CL program's performance and the Allegro Prolog program's performance; however, I was just trying to see whether Allegro Prolog performance was "good enough" out of the box. It appears that Allegro Prolog performance is indeed "good enough" - in fact, I think that it is very good!

Ng Pheng Siong posted a weblog entry that referred to mine. In addition to providing a native Prolog version of the Einstein's Riddle problem, he gave some good reasons for using an "embedded Prolog" such as Allegro Prolog instead of a native Prolog implementation:

"So why does one program in an embedded Prolog, as compared to doing so in 'pure' Prolog? For starters, there's the 'it looks alien' thing: if Common Lisp or Smalltalk are strange to the 'average hacker' steeped in Algol-family languages, Prolog will require even more rewiring of the mind. Programming Prolog embedded in a host language allows the programmer to use Prolog for what it is good at, without him attempting to also write a GUI or serve content over HTTP in Prolog.

A capable host language like Common Lisp can implement Prolog in itself and provide seamless bidirectional interfacing, make Prolog facts and rules persistent in 'native' format, operate on the tuples in an SQL database directly as if they are Prolog facts without a whole lot of extra infrastructure, compile to native code, etc."
I also got an email from Steve Haflich, the Franz developer who is the author of Allegro Prolog. In addition to showing how the Allegro Prolog Einstein's Riddle code's performance could be improved, he provided some interesting insights. His email is reproduced (with permission) here:
"Greetings, Bill --
Jans Aasman forwarded your blog mention of Allegro Prolog. I'm the author of this code, and I certainly appreciate the favorable review.
I want to clarify two points on the performance and benchmarking. The Prolog implementation is very new and these issues certainly are not obvious. This may change as the documentation and interface with the rest of the programming environment both mature.
First, Allegro Prolog compiles a Prolog functor/arity into Lisp code. it manages this pretty much automatically -- each time a functor is (re)defined, that functor/arity is replaced with a stub that recompiles itself the first time it is called. This means that the Lisp compiler (along with the Prolog->Lisp translation machinery) is invoked the first time each functor/arity is called. I've repeated your benchmark run below. Note that the first run, when compilation happens during the very first dotimes iteration, there is significantly more consing. (The cpu cost of compilation seems to be lost in the noise, but these benchmarks were not run under clean, standalone machine conditions.)
cl-user(17): (time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !))) ; cpu time (non-gc) 8,410 msec user, 20 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 8,410 msec user, 20 msec system ; real time 8,424 msec ; space allocation: ; 134,924 cons cells, 943,984 other bytes, 0 static bytes nil cl-user(18): (time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !))) ; cpu time (non-gc) 8,470 msec user, 20 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 8,470 msec user, 20 msec system ; real time 8,473 msec ; space allocation: ; 33,020 cons cells, 532,408 other bytes, 0 static bytes nil cl-user(19): (time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !))) ; cpu time (non-gc) 8,480 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 8,480 msec user, 0 msec system ; real time 8,481 msec ; space allocation: ; 33,020 cons cells, 532,408 other bytes, 0 static bytes nil
Anyway, the compilation can be forced by calling (prolog-compile-symbols) before benchmarking. It can be called at load time by putting this form at the end of a Prolog source. Some day I'll arrange macrology to do the Lisp->Prolog compilation at file compile time, but right now it isn't possible.
The next point is that this benchmark depends on a fair amount of interpreted code:
(time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !)))
prolog:prolog is a fairly hairy Lisp macro (M-Sh-M or M-Sh-W it to see the expansion) and all that macroexpansion and interpreted execution happens inside the benchmark. Further, interpreted dotimes is stupid, expanding the macro afresh for each of the 100 iterations. This doesn't account for a large portion of the total run time, but interpreted execution is sufficiently slow that I believe the time is large enough to show up in the benchmark.
I would run the benchmark this way (or more straightforwardly by moving the whole benchmark into a compiled defun).
cl-user(37): (funcall (compile nil '(lambda () (time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !)))))) ; cpu time (non-gc) 7,360 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 7,360 msec user, 0 msec system ; real time 7,359 msec ; space allocation: ; 304 cons cells, 206,400 other bytes, 0 static bytes nil
FInally, I was surprised by the remaining consing since the Prolog engine actually runs without consing. Everything Prolog does by unification and continuation is created with dynamic extent, so by creating convoluted code walking macros that slather dynamic-extent declarations in all the right places it is possible to stack cons everything. (Exceptions are functors like bagof and setof which may create structure by unification that survive the dynamic extent of the functor call.) The Prolog `trail' is a growable stack recording unifications that is treated as a resource. But what is still happening in the example above is that a fresh trail is alloaced for each run because the ! cut operator takes a nonlocal exit from the entire prolog macroexpansion -- bypassing the deallocation of the trail resource.
To demonstrate, here is another run with the cut operation removed. (Takes longer to run because the engine continues searching after finding the only solution, but this demonstrates the resource recovery.) It might cons a single new resource vector the first time it is run, but thereafter it runs essentially cons free. This is the second run:
I think I know how to fix this, and I'll try to get around to it before the next time the prolog patch is updated.
cl-user(40): (funcall (compile nil '(lambda () (time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !)))))) ; cpu time (non-gc) 15,320 msec user, 10 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 15,320 msec user, 10 msec system ; real time 15,333 msec ; space allocation: ; 204 cons cells, 0 other bytes, 0 static bytes nil
I think the remaining conses come from the closure storage creation -- they would mostly disappear at higher safety, where local dynamic-extent declarations would be trusted:
cl-user(41): (funcall (compile nil '(lambda () (declare (optimize speed)) (time (dotimes (n 100) (prolog (einstein ?houses ?fish-owner) !)))))) ; cpu time (non-gc) 15,350 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 15,350 msec user, 0 msec system ; real time 15,356 msec ; space allocation: ; 4 cons cells, 0 other bytes, 0 static bytes nil
Likely none of these benchmark / coding issues would be apparent even to the most knowledgeable programmer, which is why we intend to continue improving the programming api, the documentation, and the examples.
Don't feel under any obligation to upadte your review, which was pleasingly positive, but I wanted to comments the results and explain these obscure points.
Thanks again for the notice."

emacs Copyright © 2005 by Bill Clementson