CraigBlog: September 2007

Tuesday, September 25, 2007

HashSet&lt;T&gt; - Finally a Set Implementation in the .NET Class Libraries

Update: Despite what the documentation says, this class is new to .NET 3.5, and is not present in the .NET 3.0 libraries.

Update: Fixed typo - HashSet does not allow duplicate members, just as you'd expect.

I was poking around the other day and I noticed System.Collections.Generics.HashSet<T>. Sure enough, it's a collection class with the semantics of a set: there are no duplicates no matter how many times you add an item to the set, it is unordered, and it supports intersection, union, and subset operations.

It's part of .NET ~~3.0~~ 3.5, so it's ~~been around for a little while~~ not quite out yet. I doubt I'm the last (although I'm even more sure I'm not the first :) to hear of it, though. Go check it out, because a set is one of those data structures that gets used all the time. If you've ever instantiated a Dictionary<T, bool> just to keep track of if you've seen something, you know what I mean.

Thanks, CLR team!

Wednesday, September 19, 2007

Parameterless Lambdas in C#

I'm liking Linq, but I think I like it more for the new language features it forced than for the query syntax itself. Lambdas and object initialization syntax - particularly collection initialization syntax - are extremely handy. Of course, I'm intentionally using them all I possibly can to find out where the cracks are. No doubt with experience I'll look at my current patterns as gross overuse. Nonetheless, it's fun. :)

I was writing some test code today in Visual Studio 2008 when I hit a good use case for lambdas. I wanted to execute a statement and compare the result to an expected value, but only if the expected value wasn't null. In that case I don't want to execute the statement at all, because it has unwanted side effects (specifically, it asserts something I know will be false).

I could have thrown a crateload of "if" statements in to check for null, but that wouldn't be very DRY. The solution I hit on was to use a parameterless lambda instead. It's passed to a method that evaluates the lambda if the thing it's being compared to isn't null. It works well. What I wanted to blog here was the syntax, because it wasn't immediately obvious to me, and it didn't really turn up anything in a few moments of Googling. So here it is for the record:

AssertEqualIfApplicable(

expected,

() => SomeMethodWithSideEffects(foo, bar))

private void AssertEqualIfApplicable(object expected, Func<object> actual)

{

if (expected != null)
{
Assert.AreEqual(actual());
}

}

So it's the "() =>" that defines a parameterless lambda. The syntax is a bit awkward, but not painfully so, IMO. The thing I especially like about this is that the lambda allows me to use lexical closures. Which is a fancy way of saying that foo and bar in my example above can be local variables, and their values will still be accessible when the lambda is invoked.

I must not have studied enough Lisp yet, because my reaction to this is "cool!" rather than "pfagh! Lisp has had that since 1827". Oh, and Keith, yes, I realize you are laughing at me, right now. :)

Thursday, September 13, 2007

I Can't Be the Only One (Re)Learning Lisp

It's funny how when you go down a new path, you suddenly see evidence that you're not alone. Here's some evidence (from the truly marvelous xkcd.com) that I can't be the only one that decided to spend more time with Lisp lately:

and

I found the latter especially amusing given that my official job title is "Jedi Master". Of course, given how bad the most recent movies were, I've been thinking of changing it to "Wizard". :)

By the way, if you're interested in Lisp, be sure to read the absolutely excellent book "Practical Common Lisp". Some of my readers pointed it out to me, so I've been reading it. Highly recommended, especially as it is available online in its entirety for free.

Wednesday, September 12, 2007

Linqing to Xhtml - Part 2

Update: Fixed minor bug in implementation of IsOfClass.

In a previous post, I talked about how I'm hoping to be able to use Linq for XML to allow me to process XHTML, my current favorite data serialization format. At the end of that post, I wrote code more or less like this:

var alternates = from ul
   in document.Element(xhtmlns + "html").Element(xhtmlns + "body").Elements(xhtmlns + "ul")
   where ul.Attribute("class").Value == "alternates"
   select 
     from li 
     in ul.Elements(xhtmlns + "li") 
     select li.Value;

The idea was to be able to pull the values out of a bunch of XHTML list items. The problem with this code is that it doesn't really give me what I want. If you were to look at the type of the object referred to by alternates, you'd discover that it's a

System.Linq.Enumerable.SelectIterator<System.Xml.Linq.XElement,System.Collections.Generic.IEnumerable<string>>

Which - if you can read that expression without going blind - indicates that what I've got is essentially "a sequence of a sequence of strings". No, that's not a typo: it's a sequence of sequences, and iterating over it with a nested loop is sort of annoying.

Fortunately, it appears that Don Box reads this blog, or at least read that post. :) He's the co-author of the rather excellent article found here, and had I read it I would have known the solution. But even though I hadn't (I have now - you should, too), he was kind enough to drop by with a comment that made everything work. Here's the code I'm using now:

var alternates =
    from ul
    in document.Element(Xhtml.Tag("html")).Element(Xhtml.Tag("body")).Elements(Xhtml.Tag("ul"))
    where ul.IsOfClass("alternates")
    from li
    in ul.Elements(Xhtml.Tag("li"))
    where li.IsOfClass("alternate")
    select li.Value;

I've made a few changes beyond just the Linq bits, but I'll explain those in a minute. The key to making the query work was removing the first "select". Having the second "from" clause follow without an intervening "select" results in a SelectMany method call (all the Linq keywords like select, from, where, etc. are just shorthand for method calls). And that's exactly what we want: SelectMany collapses the query to a single dimension. Read the article for a better explanation. With this change, the query now returns something we can iterate directly over with a "foreach (string alternate in alternates)". Nice.

As for the other changes, there were a couple. One was to create a class called Xhtml with a static method called Tag that creates my XNames. This just cleans up the code a little bit from all that "xhtmlns +" stuff I had before. I also created this extension method:

public static bool IsOfClass(this XElement element, string className)
{
    // TODO: this should really account for the fact that the class 
    // attribute is multivalued - i.e. it's legal to have
    // class="foo bar quux", and we should return true for any of 
    // foo, bar, or quux. 
    return element.Attribute("class").Value.Equals(className); 
}

to let me use IsOfClass on an XElement - I just think the syntax is cleaner, and as I do more and more XHTML processing, stuff like this should help contribute to my goal of a reasonable syntax.

Tuesday, September 11, 2007

Birthday Problem Variation - In Lisp

I get interesting email from my readers from time to time. The other day someone wrote me up asking me "What are the odds that exactly three people in a room of 19 have the same birthday?" Apparently his wife had challenged him with this one and it was bugging him. I suppose he must have found me via my post about the Birthday Problem. His problem was a bit different - the code I wrote will calculate the odds that at least two people in a room have the same birthday, not that exactly three do.

The way I approached the problem was to break it down. I realized that it's fairly easy to calculate the odds that the first three people in a room of 19 have the same birthday: it's just the odds that the first three have some birthday times the odds that everyone else has a different birthday. The odds that the first three people have the same birthday is (1/365)^2, since it doesn't matter what the first person's birthday is, and then the other two have to have the same birthday. The odds that the rest of them have a different birthday is (364/365 * 363/365 * … * (365-16)/365).

That tells us what the odds of the first three people having the same birthday and everyone else having the same birthday is. To compute the odds of any three people having the same birthday, we just need to figure out how many ways there are to arrange 3 people out of 19, without regard to order (i.e. we don’t care if it's Steve, Bob, and Alice or Alice, Bob and Steve). But that's just the combination operation, occasionally notated C(n, k), and it's just n!/(k! * (n-k)!). The odds for any three people are just the odds for the first three times C(19, 3).

As you know, I've been learning Lisp, and I figured this would be a great problem to get a little experience with. And it was. Here's the code:

(defun fact (n)
  "Compute the factorial of n"
  (if (= n 1) 
      1 
      (* n (fact (- n 1)))))

(defun combin (n k)
  "The combination of n things taken k ways"
  (/ (fact n) (* (fact k) (fact (- n k)))))

(defun odds-of-others (n)
  "The odds that the other n-3 people in the room have some other birthday"
  (* (/ (fact 364) (fact (- 364 (- n 3))) (expt 365 (- n 3)))))

(defun odds-of-first-three (n)
  "The odds that the first three people in the room have the same birthday, 
and that everyone else has a different birthday"
  (* 1/365 1/365 (odds-of-others n)))

(float (* (odds-of-first-three 19) (combin 19 3)))

To my eye, this is a lovely, compact way to express the solution. Others may feel differently. :) Also, I've made no attempt to optimize the code.

If you're not familiar with Lisp, there are a few interesting things to point out here. First, note that the second line of each function definition (defun defines a function) is an optional documentation string. I just love that that sort of thing is built in to the language.

Second, one of the great things about using Lisp for this application is that it supports both rational numbers and big integers natively. That is, when you compute something like (/ 1 365) - the integer one divided by the integer 365 - the result is the rational number 1/365. If you multiply rationals, it continues to maintain the result as a rational. Plus, really large integers are also supported natively, including as the components of a rational. Which means that the expression (* (odds-of-first-three 19) (combin 19 3)) returns this:

105393858057485882391608381135063049633792/21153953378595625024923538729098931884765625

Which is pleasingly precise, but you can see why I cast it to a float. :) It's about 0.5%, in case you were wondering.

I should note that strictly speaking this calculates the odds that exactly three people in a room have the same birthday and that everyone else has a different birthday from everyone else. If you only care that three people have the same birthday and that no one else has the same birthday as them, the odds will be slightly higher, as we have to include the case where (for example) the first three were born on March 1st, and everyone else was born on March 2nd. That just means switching the term that looks like (364/365 * 363/365 * … * (365-16)/365) to one that looks like (364/365)^(n-3). That pushes the odds up to about 0.7%. The only code change we have to make to do that calculation is this:

(defun odds-of-first-three (n)
"The odds that the first three people in the room have the same birthday, 
and that everyone else has a different birthday"
  (* 1/365 1/365 (expt 364/365 (- n 3))))

Anyway, it was a fun problem, particularly to stretch my (nascent) Lisp muscles.

Oh, and I should warn you that although I think I've gotten this right, there's every chance someone who's much better than I at probability and/or Lisp will find a mistake. Hopefully if so, they'll leave a comment.

Monday, September 10, 2007

Yes, But I'm an &quot;Uber Cool Nerd King&quot;

Chris Sells can kneel before me. :)

NerdTests.com says I'm an Uber Cool Nerd King. What are you? Click here!

Thursday, September 6, 2007

I Am &quot;using System.Xml.Linq;&quot;

As you know, I've come to believe that REST is the correct default way for me to program web services. (Whether it's the correct way for you to do so is an entirely separate question.) Part of my experimentation in this space has been to embrace the idea that "XHTML is the new XML". That is, that data should be encoded using standard XHTML tags rather than custom XML grammars wherever possible.

The idea has enormous potential due not least to the fact that it makes your browser a debugging environment for your web services. However, browsers aren't the real target: custom web service clients are. And in the .NET space, there hasn't been a great API for consuming XHTML as an object graph. Oh, you could use XPath, but that's a bit messy. XmlSerializer would be preferable, but unfortunately it's no good at deserializing based on things like the value of a class attribute of a tag.

Now, however, we have Linq. So given a document like this:

<html xmlns="xhtml namespace">
<body>
  <ul class="alternates">
    <li>3</li>
    <li>4</li>
    <li>5</li>
  </ul>
</body>
</html>

we can do things like this:

XNamespace xhtmlns = "http://www.w3.org/1999/xhtml";

var alternates = from ul in document.Element(xhtmlns + "html").Element(xhtmlns + "body").Element(xhtmlns + "ul")
where ul.Attribute("class").Value == "alternates"
select from li in ul.Elements("li") select li.Value;

which is extremely damn cool. Of course, I can do something even more useful and initialize objects with the results of my query doing something like this instead:

var alternates = from ul
   in document.Element(xhtmlns + "html").Element(xhtmlns + "body").Elements(xhtmlns + "ul")
   where ul.Attribute("class").Value == "alternates"
   select 
     from li 
     in ul.Elements("li") 
     select new Alternate { Value = li.Value } ;

where Alternate is a class I've defined with a public property called Value. I imagine you can get it to work with an anonymous type, too, but so far that hasn't worked for me. Anyway, it's wicked cool nonetheless.

Wednesday, September 5, 2007

And Then There Were…Three and a Third

Coming soon [1] to a crib [2] near us:

[1] Around the end of March.

[2] OK, probably right in the bed just like the last one.