CraigBlog: Creating a Lazy Enumeration of Directory Descendants in C#

Update: it turns out that Clojure has a built-in file-seq function (thanks to Stuart Sierra for pointing this out) that already does exactly this, making Clojure the winner hands-down for brevity :) Seriously though, arguments about “batteries included” aside – although a valid point of consideration – I still think it’s interesting to consider the bits of code below.

Yesterday, I posted some Clojure code that showed how to create a function that would lazily return all the descendants of a directory. Here’s the code again:

(import [java.io File])

(defn dir-descendants [dir]
  (let [children (.listFiles (File. dir))]
    (lazy-cat 
     (map (memfn getPath) (filter (memfn isFile) children)) 
     (mapcat dir-descendants 
      (map (memfn getPath) (filter (memfn isDirectory) children))))))

Towards the end of the post, I tossed off a somewhat off-hand comment:

Two things struck me about writing this code. The first is that it’s really, really compact. It would be even smaller if it weren’t for the calls to memfn, which is just there so I can use a Java method as if it were a Clojure function. The second is that it was a bit of a mind-bender to write, although having written it I find it fairly easy to read. But I suspect both of those things would also be true if I attempted to write the equivalent C# (including the laziness, probably via a “yield return” implementation of IEnumerable somewhere), although it would almost certainly be at least somewhat more verbose.

Well, Chris Sells called me on that one. Here’s the equivalent C# code he came up with:

using System.IO;
using System.Collections.Generic;

namespace ConsoleApplication1 {
    class Program {
        static IEnumerable GetDirectoryDecendants(string path) {
            foreach (var file in Directory.GetFiles(path)) { yield return file; }
            foreach (var directory in Directory.GetDirectories(path)) {
                foreach (var file in GetDirectoryDecendants(directory)) { yield return file; }
            }
        }
    }
}

I think quantitative comparisons of code terseness are generally pretty silly, but - subjectively - when I look at that it seems just as compact as the Clojure version. So I stand corrected on that one. A few other observations:

Chris included namespace and class statements. Clojure has equivalents for these, but I omitted them. Omitting them from the C# version would make it even shorter.

The C# version can actually be made more lazy than the Clojure one because of the CLR’s Directory.EnumerateFiles method, as pointed out by John in the comments to yesterday’s post. That’s pretty cool. I could, however, take advantage of this if I used ClojureCLR instead of the JVM-based Clojure.

I said that writing the Clojure version was a bit of a mind-bender. To clarify, the bendiness came from the behavior of lazy-cat, and I think primarily because I’m relatively new to Clojure. The “yield return” in the C# code – which is the key to the laziness – gave me similar pause when I first encountered it years ago. And although I’m comfortable enough with it now, the two yield return statements in Chris’s code made me pause for a moment to convince myself it would work without consuming stack. So I think there’s probably some inherent complexity in the laziness that – like many things – is no big barrier once you internalize it in either language.

So I haven’t proven anything to anyone, not even myself. That said, having worked with Clojure a bit and C# a lot, I’m still left with the impression that Clojure is generally more compact. It will be a few years before I can say whether that’s a good thing or not (or even whether it’s true), since I tend to think working with a language on that time scale is the only way to really get to know it. I will do two things, however. The first is to show you another bit of code in both Clojure and C#. First, the C#:

private static IEnumerable Fibs()
{
    yield return 0; 
    yield return 1; 
	
    int a = 0; 
    int b = 1; 
    while (true)
    {
        int temp = a + b; 
        a = b; 
        b = temp; 
        yield return b; 
    }
}

Next, the Clojure:

(def fibs (lazy-cat [0 1] (map + fibs (rest fibs))))

Hopefully it’s obvious from at least one of these that they’re meant to compute the Fibonacci sequence.

Maybe someone can come up with ways to make the C# more compact: I would welcome such suggestions. And I’m not trying to have a Clojure-versus-C# contest here – those are stupid, like saying, “Which is better: a hammer or a drill press?” But language comparisons are still interesting in that you generally wind up learning something you didn’t know about at least one of the languages.

The other thing I’m going to do is to implement my slideshow app (the thing that had me writing the directory recursion function in the first place) in both languages and post them here. Until Chris posted, I was pretty sure I knew how they’d stack up. Now I’m just curious! Maybe you are, too.

Thanks, Chris, for keeping me honest.

3 comments:

AnonymousMarch 10, 2010 at 5:26 AM
Oliver has a F# version - http://www.sturmnet.org/blog/2010/03/10/creating-a-lazy-sequence-of-directory-de
AnonymousMarch 10, 2010 at 11:41 PM
Craig Andera in his blog post showed yet another Fibonacci algorithm, this one with “yeild” operator
AnonymousMarch 11, 2010 at 2:03 AM
Interesting Finds: March 11, 2010

Wednesday, March 10, 2010

Creating a Lazy Enumeration of Directory Descendants in C#

3 comments: