Tuesday, March 9, 2010

Creating a Lazy Sequence of Directory Descendants in Clojure

Update: it turns out Clojure has a built-in function file-seq that does (almost) exactly this. The difference is that you get a lazy sequence of File objects, not paths. Hooray for "batteries included"!


I was playing around with Clojure this morning, attempting to write a little photo viewer application. As part of this, I wound up writing a function that produces a lazy sequence of all the files that are descendants of a given directory. I was pretty happy with how it turned out, so I thought I’d share it. Here’s the code in its entirety: 



(import [java.io File])

(defn dir-descendants [dir]
(let [children (.listFiles (File. dir))]
(lazy-cat
(map (memfn getPath) (filter (memfn isFile) children))
(mapcat dir-descendants
(map (memfn getPath) (filter (memfn isDirectory) children))))))



The key bit here is the lazy-cat, which returns a lazy sequence – it doesn’t evaluate the second expression (in which I walk into the children of the current directory) until someone asks for the next element in the sequence. In other words, even though this function looks recursive, it’s not. Because lazy-cat returns right away, the “nested” call to  dir-descendants happens after the stack has been popped. This is totally awesome, because it means I can call dir-descendants and it’ll return right away. So I can start displaying pictures without having to wait for the entire directory tree to be enumerated. 


Two things struck me about writing this code. The first is that it’s really, really compact. It would be even smaller if it weren’t for the calls to memfn, which is just there so I can use a Java method as if it were a Clojure function. The second is that it was a bit of a mind-bender to write, although having written it I find it fairly easy to read. But I suspect both of those things would also be true if I attempted to write the equivalent C# (including the laziness, probably via a “yield return” implementation of IEnumerable somewhere), although it would almost certainly be at least somewhat more verbose. 


 

12 comments:

  1. A C# implementation for comparison:

    http://www.sellsbrothers.com/news/showTopic.aspx?ixTopic=2334

    ReplyDelete
  2. Craig,
    Java's listFiles returns an String[], so although your Closure wrapper may be lazy, you'll still block if the directory contains a lot of files.

    In C# 4, this is:
    IEnumerable Directory.EnumerateFiles()
    as opposed to the Directory.GetFiles, which like Java, returns a string[].

    ReplyDelete
  3. Yesterday, I posted some Clojure code that showed how to create a function that would lazily return all

    ReplyDelete
  4. The built-in Clojure function file-seq does exactly this. Its implementation uses tree-seq, making it much shorter.

    ReplyDelete
  5. perhaps a more idiomatic way to do this in clojure would be to use file-seq which returns a lazy tree of files. So the code above could be done as:

    (ns my-test
    (:import (java.io File))

    (defn dir-descendents [dir]
    (file-seq (File. dir))

    Maybe not as fun as putting all those cat, maps and filters together though.

    ReplyDelete
    Replies
    1. you need to close your namespace statement, but nice, tight code.

      Delete
  6. Another in C# 3 (though the indentation may get destroyed):

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    namespace ConsoleApplication1
    {
    internal class Program
    {
    private static IEnumerable GetDirectoryDescendants(string path)
    {
    return Directory.GetFiles(path)
    .Concat(Directory.GetDirectories(path)
    .SelectMany(directory => GetDirectoryDescendants(directory)));
    }
    }
    }

    ReplyDelete
  7. @Jeremy: I like it. :)

    ReplyDelete
  8. @candara - Thanks! I saw what Mr. Sells had done and immediately thought "Well, if he can hand-crank an iterator in five minutes I can surely bust out some LINQ in two minutes." :)

    ReplyDelete
  9. #(.someMethod %) is more idiomatic than (memfn someMethod)--memfn is a living fossil.

    ReplyDelete
  10. @Stu: thanks, good to know. I'm not able to track the current Clojure idioms anywhere nearly as closely as I'd like. I suppose I'm like a guy who shows up in London speaking Middle English. :)

    I may swing through that code again at some point to see if I can tighten it up any more.

    ReplyDelete