Exploring Clojure (Lisp on the JVM) - Part 3 - Some Language Features
Monday, October 27, 2008
People who are coming
to
Clojure from Java will probably notice different things about
Clojure than people
who are coming to Clojure from CL or Scheme. In this post, I'll point out some of the language features that I've been noticing (primarily from a
CL programmer's point
of view). This won't be a general overview of Clojure, just some
comments on things that
I've come across in the past few days.
I've recently been playing around with
Namespaces in
Clojure (but that's going to be the topic of a different blog
post) and, in the
course of my experimentation, I found that I needed a way to do an
"exclusive or (XOR)" on sequences (e.g. - find all the values that are in
either one sequence or another but that are not in more than one of the sequences). In writing the function
to do this, I used some Clojure language features that I thought were
worth commenting on. The following code may not be the most idiomatic way
to write the solution in Clojure and it could definitely be improved
upon; however, I thought I would share some of the language features
that I mentally noted down as I went
through the development process.
- First of all, I wrote a function that would compare just two sequences:
(defn seq-xor-2-seqs "Returns the unique values that are in one sequence but not the other." [x y] (concat (loop [coll1 x coll2 y] (if (and coll1 coll2) (recur (rest coll1) (filter (complement #(.equals % (first coll1))) coll2)) coll2)) (loop [coll1 y coll2 x] (if (and coll1 coll2) (recur (rest coll1) (filter (complement #(.equals % (first coll1))) coll2)) coll2))))Testing the function produces the correct results:user> (doc seq-xor-2-seqs) ------------------------- user/seq-xor-2-seqs ([x y]) Returns the unique values that are in one sequence but not the other. nil user> user> (seq-xor-2-seqs '(a b c) '(a c)) (b) user> (seq-xor-2-seqs '((a b c) (a b) (a c)) '((a b) (a b c))) ((a c)) user> (seq-xor-2-seqs [1 2 3] [2 3]) (1) user> (seq-xor {:a 1 :b 2} {:a 1 :b 4}) ([:b 4] [:b 2])Some interesting language features to note about this code:- In the function definition, the comment added after the function name is automatically added as Metadata to the function and can be retrieved with the "doc" command. Metadata can be added to symbols and collections to arbitrarily annotate them. Since I haven't yet seen any interesting examples of how metadata can be useful, I plan to develop an example application that will make use of metadata in a future blog post .
- Since Java doesn't (yet) support tail call optimization, Clojure provides "loop"/"recur" constructs for recursive looping. Although not as elegant as built-in support for tail call optimization, in practice, the Clojure approach works quite nicely and the Clojure compiler complains if your "recur" statement isn't in a tail position.
- The first few tests above compare lists; however, the last two compare Vectors and Maps. This is because Clojure looks at many data types as being Collections which can use a consistent set of sequence functions to operate on them. This is VERY convenient as you write sequence functions once and then can use them on any collection.
- Note two things about the "#(.equals % (first coll1))"
statement:
- The ".equals" is a call to a Java method, not a Clojure function or macro. This is possible because Java Interop is built into Clojure and it is just about as seamless as you can get. This makes it trivial to call out to Java when you need to.
- The standard way to define an anonymous function in Clojure is with the fn special form. However, Clojure's Reader dispatch macro recognizes the "#()" form as being a short-hand form for defining an anonymous function (with "%", "%n", "%&" being used for argument substitution). So, "#(.equals % (first coll1))" could also have been written as "(fn [x] (.equals x (first coll1)))".
- Once the two sequences version of the code was working, I added
support for doing an XOR comparison on any number of sequences:
(in-ns 'bc-defs) (clojure/refer 'clojure)
Here are some test results at the REPL:
(defn- seq-xor-2-seqs "Returns the unique values that are in one sequence but not the other." [x y] (concat (loop [coll1 x coll2 y] (if (and coll1 coll2) (recur (rest coll1) (filter (complement #(.equals % (first coll1))) coll2)) coll2)) (loop [coll1 y coll2 x] (if (and coll1 coll2) (recur (rest coll1) (filter (complement #(.equals % (first coll1))) coll2)) coll2))))
(defn- seq-and-2-seqs "Returns the values that are in both sequences." [x y] (loop [coll1 x coll2 y result nil] (if (and coll1 coll2) (recur (rest coll1) coll2 (concat result (filter #(.equals % (first coll1)) coll2))) result)))
(defn seq-xor "Returns unique values that are in one sequence but not the others." ([] nil) ([x] (distinct x)) ([x & ys] (loop [x1 x ys1 ys old-dups nil] (let [coll1 x1 coll2 (first ys1) coll-rest (rest ys1) result (distinct (seq-xor-2-seqs coll1 (seq-xor-2-seqs old-dups coll2))) dups (distinct (concat old-dups (seq-and-2-seqs coll1 coll2)))] (if (and coll1 coll-rest) (recur result (concat (list (concat dups (first coll-rest))) (rest coll-rest)) dups) result)))))
(comment (seq-xor) (seq-xor '(a b)) (seq-xor '(a b b c)) (seq-xor '((a b) (c d))) (seq-xor '((a b) (c d)) '(a b)) (seq-xor '(a e) '(a b) '((a b) (c d)) '(a b)) (seq-xor '(a b c d e f g) '((a b) (b d)) '(a b) '(c d)) (seq-xor '(a b c d e f g) '(a b) '(b d) '(a b) '(c d)) (seq-xor [1 2 3] [2 3] [4 5]) (seq-xor [1 2 3 7] [2 3] [5 6] [7]) (seq-xor {:a 1 :b 2} {:a 1 :b 4} {:a 3 :b 2}) )user> (load-file "bc-defs.clj") nil user> (clojure/refer 'bc-defs) nil user> (seq-xor) nil user> (seq-xor '(a b)) (a b) user> (seq-xor '(a b b c)) (a b c) user> (seq-xor '((a b) (c d))) ((a b) (c d)) user> (seq-xor '((a b) (c d)) '(a b)) (a b (a b) (c d)) user> (seq-xor '(a e) '(a b) '((a b) (c d)) '(a b)) ((a b) (c d) e) user> (seq-xor '(a b c d e f g) '((a b) (b d)) '(a b) '(c d)) ((a b) (b d) e f g) user> (seq-xor '(a b c d e f g) '(a b) '(b d) '(a b) '(c d)) (e f g) user> (seq-xor [1 2 3] [2 3] [4 5]) (4 5 1) user> (seq-xor [1 2 3 7] [2 3] [5 6] [7]) (5 6 1) user> (seq-xor {:a 1 :b 2} {:a 1 :b 4} {:a 3 :b 2}) ([:a 3] [:b 4])
There are also some interesting language features to note in this
code: - I've created a "bc-defs" Namespace for the code. This, by itself, isn't noteworthy; however, it's interesting to note how I've specified public/private functions in the namespace. In addition to the existing "seq-xor-2-seqs" function, I've added 2 more functions: "seq-and-2-seqs" and "seq-xor". The first two functions are "local" (or "private") functions (e.g. - they will not be "visible" outside of the "bc-defs" namespace). The new "seq-xor" function is the only "exported" (or "public") function. Unlike in some languages where you have to explicitly define lists of exported/local functions, in Clojure you do it when you define the function ("defn-" for local functions and "defn" for exported functions).
- In Clojure, you can define arity overloading and variable-arity functions. I've done this in the "seq-xor" function to allow "seq-xor" to accept either no arguments (in which case, it returns nil), 1 argument (in which case, it returns the distrinct elements in that sequence), or multiple arguments (in which case it returns the distinct elements after an XOR of all of the sequences). In addition to arity overloading, Clojure has Multimethods (which I didn't use in this code) that provide for runtime polymorphism.
- In the "let" statement in "seq-xor", the "result" and "dups" bindings use "coll1" and "coll2" (which were bound in the same "let" statment). So, "let" in Clojure is the same as "let*" in Common Lisp and not the same as "let" in Common Lisp.
- At the bottom of the source file, I've added a "comment" form with some tests. This seems to be a common practice that a number of Clojure programmers are following in order to provide some common usage/test examples. As a side effect, the "comment" macro provides an easy way to disable blocks of code without affecting how Emacs colorizes the code and without altering the "visual structure" of the definitions (a nit, but something that sometimes irritates people when using block comments in other languages).
Update-2008-10-27: On the Clojure mailing list, I said "Since I'm new to Clojure and still learning how to do things, I would appreciate any comments on whether the code could be re-written in a more 'idiomatic' manner in Clojure" and received a couple of replies. Stuart Halloway suggested the following alternative (using Sets for the 2-sequence XOR function):
(defn seq-xor-2-seqs
"Returns the unique values that are in one sequence but not the other."
[x y]
(let [x (into #{} x)
y (into #{} y)]
(lazy-cat (clojure.set/difference x y)
(clojure.set/difference y x))))
and commented: "Not sure this is more idiomatic, though. And I guess it would perform
worse with huge collections..."
Then, Chouser posted the following:"If I'm reading thing's correctly, Bill's solution is O(n^2), while
Stuart's is O(n*log(n)) or better. It looks like difference iterates
over one seq and for each item does a lookup (nearly constant time) on
the the other, although copying into sets may use more memory.
Here's another approach to do all the seqs at once: "
(defn seq-xor
"Returns unique values that are in one sequence but not the others."
[& seqs]
(let [obj-cnt (reduce (fn [acc s]
(merge-with + acc (into {} (for [i s] {i 1}))))
{} seqs)]
(for [[obj cnt] obj-cnt :when (== cnt 1)]
obj)))
What a concise, elegant solution (using
Clojure's version of
List Comprehensions)! Then, Rich Hickey sent his version of the code (using Sets):
Rich then followed this with another variation (inspired by Chouser's, one pass, no counting):(alias 'set 'clojure.set)
(defn set-xor [& sets] (second (reduce (fn [[all ret] s] [(set/union all s) (set/union (set/difference ret s) (set/difference s all))]) [#{} #{}] sets)))
(defn seq-xor [& seqs] (seq (apply set-xor (map set seqs))))
(defn seq-xor [& seqs]
(seq (second
(reduce (fn [[all ret] x]
(if (contains? all x)
[all (disj ret x)]
[(conj all x) (conj ret x)]))
[#{} #{}] (mapcat distinct seqs)))))
Woo hoo! All good alternatives and nice to see how other people would
do the same thing - thanks everyone! Rich also provided an excellent
summary for this post and I'll close on that: "One take away from all of these is that use of sets, maps and vectors is very idiomatic Clojure, and the sooner you can add them to the lists in your toolbox the more productive you'll be with Clojure."

