Clementson's Blog

Bits and pieces (mostly Lisp-related) that I collect from the ether.

CL SAX XML Parser Out-Performs Java Xerces

Wednesday, May 26, 2004

I've been experimenting with the new SAX XML parser that is part of the Franz Allegro 7.0 Beta (note: Franz have kindly agreed to my blogging about features in the beta), and first impressions have been very positive. Subjectively, the parser seemed very fast, so to get an objective comparison I decided to test it against the Apache Xerces Java SAX parser (the most commonly used Java XML parser). To compare the two, I used the Counter program that is described on the Xerces SAX Samples page:

"A sample SAX2 counter. This sample program illustrates how to register a SAX2 ContentHandler and receive the callbacks in order to print information about the document. The output of this program shows the time and count of elements, attributes, ignorable whitespaces, and characters appearing in the document.

This class is useful as a "poor-man's" performance tester to compare the speed and accuracy of various SAX parsers. However, it is important to note that the first parse time of a parser will include both VM class load time and parser initialization that would not be present in subsequent parses with the same file."

I used a non-trivial XML file (817KB) with a large number of elements and attributes. For both the CL and the Java parsers, I timed only the parse itself (excluding any load or initialization time).
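
To make "excluding any load or initialization times" concrete on the Java side, here is a minimal sketch of one way to do it: parse the document a few times untimed so class loading and parser setup are out of the way, then time one more parse. This is not the Xerces Counter sample. The class name WarmParseTimer, the method timeWarmParse, and the warm-up count are all made up for illustration, and the sketch assumes a modern JDK with its bundled JAXP parser (System.nanoTime is not available on Java 1.4.2).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WarmParseTimer {

    /**
     * Parses the document `warmups` times without timing, then returns the
     * elapsed nanoseconds of one further parse. (Hypothetical helper.)
     */
    static long timeWarmParse(byte[] xml, int warmups) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler(); // no-op callbacks

            // Untimed parses: absorb class loading and parser initialization.
            for (int i = 0; i < warmups; i++) {
                parser.parse(new ByteArrayInputStream(xml), handler);
            }

            // Timed parse: only the parse call sits between the two clock reads.
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(xml), handler);
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<root><item id=\"1\">text</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        long nanos = timeWarmParse(xml, 3);
        System.out.println("warm parse: " + (nanos / 1_000_000.0) + " ms");
    }
}
```

A single timed parse is still noisy, so in practice you would probably repeat the timed step and look at the spread rather than trust one number.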

The CL SAX parser is in the beta of ACL 7.0. The Xerces Java SAX parser is version 2.6.2 running on Java 1.4.2_03. Both were run on a Win2000 PC with 1GB of memory and a 1200MHz CPU.

Here are the results for the Allegro SAX parser:
Final counts: 5495 elems, 35353 attrs, 51999 chars
; cpu time (non-gc) 471 msec user, 10 msec system
; cpu time (gc) 50 msec user, 0 msec system
; cpu time (total) 521 msec user, 10 msec system
; real time 531 msec

Here are the results for Xerces:
600 ms (5495 elems, 35353 attrs, 51999 spaces, 0 chars)

Now, before I get flamed ;-), let me say that you need to take these results with a grain of salt: performance comparisons are always a bit suspect unless you do them in very controlled environments. This test measures only one thing and is not trying to be definitive. Relative performance could be quite different when parsing XML files of different sizes or with different mixes of elements, attributes, and content. However, considering that this is the first release of the CL SAX parser and that the Java Xerces parser has been enhanced and optimized over a number of years (and is the most widely used XML SAX parser), I think these are pretty impressive results for Allegro!

If you want to replicate my test, you can download the Xerces Counter program with the Xerces Java Parser. If you just want to examine the Java source for the Counter program, you can see it here. My CL Counter program is a modified version of some sample code that comes with the ACL 7.0 beta. To run it, you'll need to either wait for the ACL 7.0 release or contact Franz and ask to be a beta tester. Here is the code I used:
;; To use, compile and load this file and evaluate:
;; (time (sax-parse-file "sample.xml" :class 'sax-count-parser))
(require :sax)
(use-package :net.xml.sax)

;; Running totals for one parse.
(defstruct counter
  (elements 0)
  (attributes 0)
  (characters 0))

(defclass sax-count-parser (sax-parser)
  ((counts :initform (make-counter) :reader counts)))

;; Count each element and the attributes it carries.
(defmethod start-element ((parser sax-count-parser) iri localname qname attrs)
  (declare (ignore iri localname qname))
  (let ((counter (counts parser)))
    (incf (counter-elements counter))
    (let ((attlen (length attrs)))
      (when (> attlen 0)
        (incf (counter-attributes counter) attlen)))))

;; Add the length of each content chunk. The ignorable flag is not
;; consulted, so ignorable whitespace is included in the total.
(defmethod content ((parser sax-count-parser) content start end ignorable)
  (declare (ignore content ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter) (- end start))))

;; Print the totals once the whole document has been parsed.
(defmethod end-document ((parser sax-count-parser))
  (let ((counter (counts parser)))
    (format t "Final counts: ~d elems, ~d attrs, ~d chars~%"
            (counter-elements counter)
            (counter-attributes counter)
            (counter-characters counter))))
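
For anyone without access to the ACL beta, here is a rough Java counterpart of the CL counter above. It is a sketch under stated assumptions, not the Xerces Counter sample: the class name SaxCount and the count helper are invented for illustration, and it uses the JAXP parser bundled with the JDK rather than a standalone Xerces 2.6.2. Like the CL content method, which ignores its ignorable argument, it folds ignorable whitespace into the same total as ordinary character data. That may be why the Allegro run reports 51999 chars where the Xerces Counter reports 51999 spaces and 0 chars, though I have not verified this.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount extends DefaultHandler {
    long elements;
    long attributes;
    long characters;

    // Count each element and the attributes it carries.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
        attributes += attrs.getLength();
    }

    // Ordinary character data.
    @Override
    public void characters(char[] ch, int start, int length) {
        characters += length;
    }

    // Ignorable whitespace goes into the same total, mirroring the CL
    // method, which does not look at its `ignorable` argument.
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        characters += length;
    }

    /** Parses an XML string and returns the populated handler. (Hypothetical helper.) */
    static SaxCount count(String xml) {
        try {
            SaxCount handler = new SaxCount();
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        SaxCount c = count("<doc a=\"1\" b=\"2\"><item c=\"3\">hi</item> <item/></doc>");
        System.out.printf("Final counts: %d elems, %d attrs, %d chars%n",
                c.elements, c.attributes, c.characters);
    }
}
```

The tiny inline document is only there to make the sketch runnable; to compare against the numbers in this post you would feed it the same file you give the CL version.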

Copyright © 2004 by Bill Clementson