CraigBlog

Thursday, October 11, 2007

How to Improve Collection Initializers

I think I'm starting to traverse the Linq/Orcas circle of (software) life, because I finally hit something that I don't like and that I couldn't work around. So far, anyway - maybe someone out there can tell me where I'm going wrong.

You may know that in the Orcas version of C#, they've added support for initializing collections via the funky new object initialization syntax. You can read about it here. The issue that I'm running into has to do with the fact that I'd like to initialize a collection in an object initialization expression, but I'm using XLinq to do it, and as a result I'm getting a compiler error.

See, I can get this to work:

Person craig = new Person

{

Name = "Craig",

Age = 35

Children =

{

new Person

{

Name = "Ellen",

Age = 2.9

new Person

{

Name = "TBD",

Age = -0.4

}

But because this is just a simple example and in my real code I don't know the collection items at compile time, what I really want to do something is like this:

Person craig = new Person

{

Name = "Craig",

Age = 35,

Children =

{

GetProgeny("Craig Andera")

}

Where the Person class is the obvious implementation, and GetProgeny returns IEnumerable<Person>. Without doing anything else, I get a compiler error saying that Collection<Person> does not contain an overload of Add that accepts an IEnumerable<Person>, which makes perfect sense given that the collection initialization syntax is just shorthand for a bunch of calls to Add.

The part that sucks is that I can't get this to compose with extension methods to get what I want. That is, I'd like it if I were able to define an extension method like this:

public static void Add(this ICollection<Person> collection, IEnumerable<Person> items)

{

foreach (Person item in items)

{

collection.Add(item);

}

But that doesn't work. I'm not sure why, but here's the list of things I hope it is, in decreasing order of desirability:

I'm doing something wrong.

It doesn't work and there's a good reason.

It doesn't work and there's no good reason.

Anyone have any idea which it is?

Update: I should have said right up front that Children is a read-only property of Person. I always make my collections read-only, and would prefer to do so in this case as well.

Tuesday, October 9, 2007

Add Using Statement in Visual Studio 2008

Update: Apparently this has been available since VS2005. Who knew? Other than the people that commented, of course. Certainly not me. :)

Update #2: So, the "organize usings" menu item that I happened to show in the screenshot alphabetizes the using statements. Also very nice, although I'd like it better if it didn't remove the line breaks I put between groupings. Still, nice. You can also remove unused using statements. Here's the menu:

Jon Flanders showed this to me the other day. In Visual Studio 2008, if you've just added a class for which you haven't yet added the corresponding using statement to bring the namespace into scope, you can hit control-dot, and up pops the following menu:

And sure it enough it pops a "using System.Collections;" in up at the top of the file. Very nice. Better still if it would put it in alphabetical order, but I seem to recall that the order of using statements is significant when Linq is in play, so maybe they couldn't do that.

You can also access it via the right-click menu:

Tuesday, September 25, 2007

HashSet&lt;T&gt; - Finally a Set Implementation in the .NET Class Libraries

Update: Despite what the documentation says, this class is new to .NET 3.5, and is not present in the .NET 3.0 libraries.

Update: Fixed typo - HashSet does not allow duplicate members, just as you'd expect.

I was poking around the other day and I noticed System.Collections.Generics.HashSet<T>. Sure enough, it's a collection class with the semantics of a set: there are no duplicates no matter how many times you add an item to the set, it is unordered, and it supports intersection, union, and subset operations.

It's part of .NET ~~3.0~~ 3.5, so it's ~~been around for a little while~~ not quite out yet. I doubt I'm the last (although I'm even more sure I'm not the first :) to hear of it, though. Go check it out, because a set is one of those data structures that gets used all the time. If you've ever instantiated a Dictionary<T, bool> just to keep track of if you've seen something, you know what I mean.

Thanks, CLR team!

Wednesday, September 19, 2007

Parameterless Lambdas in C#

I'm liking Linq, but I think I like it more for the new language features it forced than for the query syntax itself. Lambdas and object initialization syntax - particularly collection initialization syntax - are extremely handy. Of course, I'm intentionally using them all I possibly can to find out where the cracks are. No doubt with experience I'll look at my current patterns as gross overuse. Nonetheless, it's fun. :)

I was writing some test code today in Visual Studio 2008 when I hit a good use case for lambdas. I wanted to execute a statement and compare the result to an expected value, but only if the expected value wasn't null. In that case I don't want to execute the statement at all, because it has unwanted side effects (specifically, it asserts something I know will be false).

I could have thrown a crateload of "if" statements in to check for null, but that wouldn't be very DRY. The solution I hit on was to use a parameterless lambda instead. It's passed to a method that evaluates the lambda if the thing it's being compared to isn't null. It works well. What I wanted to blog here was the syntax, because it wasn't immediately obvious to me, and it didn't really turn up anything in a few moments of Googling. So here it is for the record:

AssertEqualIfApplicable(

expected,

() => SomeMethodWithSideEffects(foo, bar))

private void AssertEqualIfApplicable(object expected, Func<object> actual)

{

if (expected != null)
{
Assert.AreEqual(actual());
}

}

So it's the "() =>" that defines a parameterless lambda. The syntax is a bit awkward, but not painfully so, IMO. The thing I especially like about this is that the lambda allows me to use lexical closures. Which is a fancy way of saying that foo and bar in my example above can be local variables, and their values will still be accessible when the lambda is invoked.

I must not have studied enough Lisp yet, because my reaction to this is "cool!" rather than "pfagh! Lisp has had that since 1827". Oh, and Keith, yes, I realize you are laughing at me, right now. :)

Thursday, September 13, 2007

I Can't Be the Only One (Re)Learning Lisp

It's funny how when you go down a new path, you suddenly see evidence that you're not alone. Here's some evidence (from the truly marvelous xkcd.com) that I can't be the only one that decided to spend more time with Lisp lately:

and

I found the latter especially amusing given that my official job title is "Jedi Master". Of course, given how bad the most recent movies were, I've been thinking of changing it to "Wizard". :)

By the way, if you're interested in Lisp, be sure to read the absolutely excellent book "Practical Common Lisp". Some of my readers pointed it out to me, so I've been reading it. Highly recommended, especially as it is available online in its entirety for free.

Wednesday, September 12, 2007

Linqing to Xhtml - Part 2

Update: Fixed minor bug in implementation of IsOfClass.

In a previous post, I talked about how I'm hoping to be able to use Linq for XML to allow me to process XHTML, my current favorite data serialization format. At the end of that post, I wrote code more or less like this:

var alternates = from ul
   in document.Element(xhtmlns + "html").Element(xhtmlns + "body").Elements(xhtmlns + "ul")
   where ul.Attribute("class").Value == "alternates"
   select 
     from li 
     in ul.Elements(xhtmlns + "li") 
     select li.Value;

The idea was to be able to pull the values out of a bunch of XHTML list items. The problem with this code is that it doesn't really give me what I want. If you were to look at the type of the object referred to by alternates, you'd discover that it's a

System.Linq.Enumerable.SelectIterator<System.Xml.Linq.XElement,System.Collections.Generic.IEnumerable<string>>

Which - if you can read that expression without going blind - indicates that what I've got is essentially "a sequence of a sequence of strings". No, that's not a typo: it's a sequence of sequences, and iterating over it with a nested loop is sort of annoying.

Fortunately, it appears that Don Box reads this blog, or at least read that post. :) He's the co-author of the rather excellent article found here, and had I read it I would have known the solution. But even though I hadn't (I have now - you should, too), he was kind enough to drop by with a comment that made everything work. Here's the code I'm using now:

var alternates =
    from ul
    in document.Element(Xhtml.Tag("html")).Element(Xhtml.Tag("body")).Elements(Xhtml.Tag("ul"))
    where ul.IsOfClass("alternates")
    from li
    in ul.Elements(Xhtml.Tag("li"))
    where li.IsOfClass("alternate")
    select li.Value;

I've made a few changes beyond just the Linq bits, but I'll explain those in a minute. The key to making the query work was removing the first "select". Having the second "from" clause follow without an intervening "select" results in a SelectMany method call (all the Linq keywords like select, from, where, etc. are just shorthand for method calls). And that's exactly what we want: SelectMany collapses the query to a single dimension. Read the article for a better explanation. With this change, the query now returns something we can iterate directly over with a "foreach (string alternate in alternates)". Nice.

As for the other changes, there were a couple. One was to create a class called Xhtml with a static method called Tag that creates my XNames. This just cleans up the code a little bit from all that "xhtmlns +" stuff I had before. I also created this extension method:

public static bool IsOfClass(this XElement element, string className)
{
    // TODO: this should really account for the fact that the class 
    // attribute is multivalued - i.e. it's legal to have
    // class="foo bar quux", and we should return true for any of 
    // foo, bar, or quux. 
    return element.Attribute("class").Value.Equals(className); 
}

to let me use IsOfClass on an XElement - I just think the syntax is cleaner, and as I do more and more XHTML processing, stuff like this should help contribute to my goal of a reasonable syntax.

Tuesday, September 11, 2007

Birthday Problem Variation - In Lisp

I get interesting email from my readers from time to time. The other day someone wrote me up asking me "What are the odds that exactly three people in a room of 19 have the same birthday?" Apparently his wife had challenged him with this one and it was bugging him. I suppose he must have found me via my post about the Birthday Problem. His problem was a bit different - the code I wrote will calculate the odds that at least two people in a room have the same birthday, not that exactly three do.

The way I approached the problem was to break it down. I realized that it's fairly easy to calculate the odds that the first three people in a room of 19 have the same birthday: it's just the odds that the first three have some birthday times the odds that everyone else has a different birthday. The odds that the first three people have the same birthday is (1/365)^2, since it doesn't matter what the first person's birthday is, and then the other two have to have the same birthday. The odds that the rest of them have a different birthday is (364/365 * 363/365 * … * (365-16)/365).

That tells us what the odds of the first three people having the same birthday and everyone else having the same birthday is. To compute the odds of any three people having the same birthday, we just need to figure out how many ways there are to arrange 3 people out of 19, without regard to order (i.e. we don’t care if it's Steve, Bob, and Alice or Alice, Bob and Steve). But that's just the combination operation, occasionally notated C(n, k), and it's just n!/(k! * (n-k)!). The odds for any three people are just the odds for the first three times C(19, 3).

As you know, I've been learning Lisp, and I figured this would be a great problem to get a little experience with. And it was. Here's the code:

(defun fact (n)
  "Compute the factorial of n"
  (if (= n 1) 
      1 
      (* n (fact (- n 1)))))

(defun combin (n k)
  "The combination of n things taken k ways"
  (/ (fact n) (* (fact k) (fact (- n k)))))

(defun odds-of-others (n)
  "The odds that the other n-3 people in the room have some other birthday"
  (* (/ (fact 364) (fact (- 364 (- n 3))) (expt 365 (- n 3)))))

(defun odds-of-first-three (n)
  "The odds that the first three people in the room have the same birthday, 
and that everyone else has a different birthday"
  (* 1/365 1/365 (odds-of-others n)))

(float (* (odds-of-first-three 19) (combin 19 3)))

To my eye, this is a lovely, compact way to express the solution. Others may feel differently. :) Also, I've made no attempt to optimize the code.

If you're not familiar with Lisp, there are a few interesting things to point out here. First, note that the second line of each function definition (defun defines a function) is an optional documentation string. I just love that that sort of thing is built in to the language.

Second, one of the great things about using Lisp for this application is that it supports both rational numbers and big integers natively. That is, when you compute something like (/ 1 365) - the integer one divided by the integer 365 - the result is the rational number 1/365. If you multiply rationals, it continues to maintain the result as a rational. Plus, really large integers are also supported natively, including as the components of a rational. Which means that the expression (* (odds-of-first-three 19) (combin 19 3)) returns this:

105393858057485882391608381135063049633792/21153953378595625024923538729098931884765625

Which is pleasingly precise, but you can see why I cast it to a float. :) It's about 0.5%, in case you were wondering.

I should note that strictly speaking this calculates the odds that exactly three people in a room have the same birthday and that everyone else has a different birthday from everyone else. If you only care that three people have the same birthday and that no one else has the same birthday as them, the odds will be slightly higher, as we have to include the case where (for example) the first three were born on March 1st, and everyone else was born on March 2nd. That just means switching the term that looks like (364/365 * 363/365 * … * (365-16)/365) to one that looks like (364/365)^(n-3). That pushes the odds up to about 0.7%. The only code change we have to make to do that calculation is this:

(defun odds-of-first-three (n)
"The odds that the first three people in the room have the same birthday, 
and that everyone else has a different birthday"
  (* 1/365 1/365 (expt 364/365 (- n 3))))

Anyway, it was a fun problem, particularly to stretch my (nascent) Lisp muscles.

Oh, and I should warn you that although I think I've gotten this right, there's every chance someone who's much better than I at probability and/or Lisp will find a mistake. Hopefully if so, they'll leave a comment.