Clementson's Blog

Bits and pieces (mostly Lisp-related) that I collect from the ether.

October 2008
Sun Mon Tue Wed Thu Fri Sat
1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31
Sep  Nov

Exploring Clojure (Lisp on the JVM) - Part 3 - Some Language Features

Monday, October 27, 2008

People who are coming to Clojure from Java will probably notice different things about Clojure than people who are coming to Clojure from CL or Scheme. In this post, I'll point out some of the language features that I've been noticing (primarily from a CL programmer's point of view). This won't be a general overview of Clojure, just some comments on things that I've come across in the past few days.

I've recently been playing around with Namespaces in Clojure (but that's going to be the topic of a different blog post) and, in the course of my experimentation, I found that I needed a way to do an "exclusive or (XOR)" on sequences (e.g. - find all the values that are in either one sequence or another but that are not in more than one of the sequences). In writing the function to do this, I used some Clojure language features that I thought were worth commenting on. The following code may not be the most idiomatic way to write the solution in Clojure and it could definitely be improved upon; however, I thought I would share some of the language features that I mentally noted down as I went through the development process.

  1. First of all, I wrote a function that would compare just two sequences:
    (defn seq-xor-2-seqs
      "Returns the unique values that are in one sequence but not the other."
      [x y]
      (concat
       (loop [coll1 x
    	  coll2 y]
         (if (and coll1 coll2)
           (recur (rest coll1)
    	      (filter (complement #(.equals % (first coll1))) coll2))
           coll2))
       (loop [coll1 y
    	  coll2 x]
         (if (and coll1 coll2)
           (recur (rest coll1)
    	      (filter (complement #(.equals % (first coll1))) coll2))
           coll2))))
        
    Testing the function produces the correct results:
    user> (doc seq-xor-2-seqs)
    -------------------------
    user/seq-xor-2-seqs
    ([x y])
      Returns the unique values that are in one sequence but not the other.
    nil
    user> 
    user> (seq-xor-2-seqs '(a b c) '(a c))
    (b)
    user> (seq-xor-2-seqs '((a b c) (a b) (a c)) '((a b) (a b c)))
    ((a c))
    user> (seq-xor-2-seqs [1 2 3] [2 3])
    (1)
    user> (seq-xor {:a 1 :b 2} {:a 1 :b 4})
    ([:b 4] [:b 2])
    
    Some interesting language features to note about this code:
    • In the function definition, the comment added after the function name is automatically added as Metadata to the function and can be retrieved with the "doc" command. Metadata can be added to symbols and collections to arbitrarily annotate them. Since I haven't yet seen any interesting examples of how metadata can be useful, I plan to develop an example application that will make use of metadata in a future blog post .
    • Since Java doesn't (yet) support tail call optimization, Clojure provides "loop"/"recur" constructs for recursive looping. Although not as elegant as built-in support for tail call optimization, in practice, the Clojure approach works quite nicely and the Clojure compiler complains if your "recur" statement isn't in a tail position.
    • The first few tests above compare lists; however, the last two compare Vectors and Maps. This is because Clojure looks at many data types as being Collections which can use a consistent set of sequence functions to operate on them. This is VERY convenient as you write sequence functions once and then can use them on any collection.
    • Note two things about the "#(.equals % (first coll1))" statement:
      1. The ".equals" is a call to a Java method, not a Clojure function or macro. This is possible because Java Interop is built into Clojure and it is just about as seamless as you can get. This makes it trivial to call out to Java when you need to.
      2. The standard way to define an anonymous function in Clojure is with the fn special form. However, Clojure's Reader dispatch macro recognizes the "#()" form as being a short-hand form for defining an anonymous function (with "%", "%n", "%&" being used for argument substitution). So, "#(.equals % (first coll1))" could also have been written as "(fn [x] (.equals x (first coll1)))".

  2. Once the two sequences version of the code was working, I added support for doing an XOR comparison on any number of sequences:
    (in-ns 'bc-defs)
    (clojure/refer 'clojure)
    (defn- seq-xor-2-seqs "Returns the unique values that are in one sequence but not the other." [x y] (concat (loop [coll1 x coll2 y] (if (and coll1 coll2) (recur (rest coll1) (filter (complement #(.equals % (first coll1))) coll2)) coll2)) (loop [coll1 y coll2 x] (if (and coll1 coll2) (recur (rest coll1) (filter (complement #(.equals % (first coll1))) coll2)) coll2))))
    (defn- seq-and-2-seqs "Returns the values that are in both sequences." [x y] (loop [coll1 x coll2 y result nil] (if (and coll1 coll2) (recur (rest coll1) coll2 (concat result (filter #(.equals % (first coll1)) coll2))) result)))
    (defn seq-xor "Returns unique values that are in one sequence but not the others." ([] nil) ([x] (distinct x)) ([x & ys] (loop [x1 x ys1 ys old-dups nil] (let [coll1 x1 coll2 (first ys1) coll-rest (rest ys1) result (distinct (seq-xor-2-seqs coll1 (seq-xor-2-seqs old-dups coll2))) dups (distinct (concat old-dups (seq-and-2-seqs coll1 coll2)))] (if (and coll1 coll-rest) (recur result (concat (list (concat dups (first coll-rest))) (rest coll-rest)) dups) result)))))
    (comment (seq-xor) (seq-xor '(a b)) (seq-xor '(a b b c)) (seq-xor '((a b) (c d))) (seq-xor '((a b) (c d)) '(a b)) (seq-xor '(a e) '(a b) '((a b) (c d)) '(a b)) (seq-xor '(a b c d e f g) '((a b) (b d)) '(a b) '(c d)) (seq-xor '(a b c d e f g) '(a b) '(b d) '(a b) '(c d)) (seq-xor [1 2 3] [2 3] [4 5]) (seq-xor [1 2 3 7] [2 3] [5 6] [7]) (seq-xor {:a 1 :b 2} {:a 1 :b 4} {:a 3 :b 2}) )
    Here are some test results at the REPL:
    user> (load-file "bc-defs.clj")
    nil
    user> (clojure/refer 'bc-defs)
    nil
    user> (seq-xor)
    nil
    user> (seq-xor '(a b))
    (a b)
    user> (seq-xor '(a b b c))
    (a b c)
    user> (seq-xor '((a b) (c d)))
    ((a b) (c d))
    user> (seq-xor '((a b) (c d)) '(a b))
    (a b (a b) (c d))
    user> (seq-xor '(a e) '(a b) '((a b) (c d)) '(a b))
    ((a b) (c d) e)
    user> (seq-xor '(a b c d e f g) '((a b) (b d)) '(a b) '(c d))
    ((a b) (b d) e f g)
    user> (seq-xor '(a b c d e f g) '(a b) '(b d) '(a b) '(c d))
    (e f g)
    user> (seq-xor [1 2 3] [2 3] [4 5])
    (4 5 1)
    user> (seq-xor [1 2 3 7] [2 3] [5 6] [7])
    (5 6 1)
    user> (seq-xor {:a 1 :b 2} {:a 1 :b 4} {:a 3 :b 2})
    ([:a 3] [:b 4])
  3. There are also some interesting language features to note in this code:
    • I've created a "bc-defs" Namespace for the code. This, by itself, isn't noteworthy; however, it's interesting to note how I've specified public/private functions in the namespace. In addition to the existing "seq-xor-2-seqs" function, I've added 2 more functions: "seq-and-2-seqs" and "seq-xor". The first two functions are "local" (or "private") functions (e.g. - they will not be "visible" outside of the "bc-defs" namespace). The new "seq-xor" function is the only "exported" (or "public") function. Unlike in some languages where you have to explicitly define lists of exported/local functions, in Clojure you do it when you define the function ("defn-" for local functions and "defn" for exported functions).
    • In Clojure, you can define arity overloading and variable-arity functions. I've done this in the "seq-xor" function to allow "seq-xor" to accept either no arguments (in which case, it returns nil), 1 argument (in which case, it returns the distrinct elements in that sequence), or multiple arguments (in which case it returns the distinct elements after an XOR of all of the sequences). In addition to arity overloading, Clojure has Multimethods (which I didn't use in this code) that provide for runtime polymorphism.
    • In the "let" statement in "seq-xor", the "result" and "dups" bindings use "coll1" and "coll2" (which were bound in the same "let" statment). So, "let" in Clojure is the same as "let*" in Common Lisp and not the same as "let" in Common Lisp.
    • At the bottom of the source file, I've added a "comment" form with some tests. This seems to be a common practice that a number of Clojure programmers are following in order to provide some common usage/test examples. As a side effect, the "comment" macro provides an easy way to disable blocks of code without affecting how Emacs colorizes the code and without altering the "visual structure" of the definitions (a nit, but something that sometimes irritates people when using block comments in other languages).
All in all, from what I've seen so far, programming in Clojure makes sense for a Lisper and it has some nice innovations. The biggest differences in "feel" for someone familiar with programming in CL (in my opinion) are the single paradigm approach in Clojure (and a functional-only approach makes a lot of sense when one of your goals is robust conccurency support) and the fact that Clojure is a Lisp-1 rather than a Lisp-2 (there are pros and cons to each, but I think that being a Lisp-1 is the right choice for Clojure). I'm looking forward to exploring further!


Update-2008-10-27: On the Clojure mailing list, I said "Since I'm new to Clojure and still learning how to do things, I would appreciate any comments on whether the code could be re-written in a more 'idiomatic' manner in Clojure" and received a couple of replies. Stuart Halloway suggested the following alternative (using Sets for the 2-sequence XOR function):
(defn seq-xor-2-seqs 
   "Returns the unique values that are in one sequence but not the other." 
   [x y] 
   (let [x (into #{} x) 
        y (into #{} y)] 
     (lazy-cat (clojure.set/difference x y) 
              (clojure.set/difference y x)))) 
and commented: "Not sure this is more idiomatic, though. And I guess it would perform worse with huge collections..."
Then, Chouser posted the following:
"If I'm reading thing's correctly, Bill's solution is O(n^2), while Stuart's is O(n*log(n)) or better. It looks like difference iterates over one seq and for each item does a lookup (nearly constant time) on the the other, although copying into sets may use more memory. Here's another approach to do all the seqs at once: "
(defn seq-xor 
  "Returns unique values that are in one sequence but not the others." 
  [& seqs] 
  (let [obj-cnt (reduce (fn [acc s] 
                          (merge-with + acc (into {} (for [i s] {i 1})))) 
                        {} seqs)] 
    (for [[obj cnt] obj-cnt :when (== cnt 1)] 
      obj))) 
What a concise, elegant solution (using Clojure's version of List Comprehensions)!

Then, Rich Hickey sent his version of the code (using Sets):
(alias 'set 'clojure.set)
(defn set-xor [& sets] (second (reduce (fn [[all ret] s] [(set/union all s) (set/union (set/difference ret s) (set/difference s all))]) [#{} #{}] sets)))
(defn seq-xor [& seqs] (seq (apply set-xor (map set seqs))))
Rich then followed this with another variation (inspired by Chouser's, one pass, no counting):
(defn seq-xor [& seqs]
  (seq (second
	(reduce (fn [[all ret] x]
		  (if (contains? all x)
		    [all (disj ret x)]
		    [(conj all x) (conj ret x)]))
		[#{} #{}] (mapcat distinct seqs)))))
Woo hoo! All good alternatives and nice to see how other people would do the same thing - thanks everyone! Rich also provided an excellent summary for this post and I'll close on that:
"One take away from all of these is that use of sets, maps and vectors is very idiomatic Clojure, and the sooner you can add them to the lists in your toolbox the more productive you'll be with Clojure."

emacs Copyright © 2008 by Bill Clementson