Tuesday, September 11, 2007

Birthday Problem Variation - In Lisp

I get interesting email from my readers from time to time. The other day someone wrote me up asking me "What are the odds that exactly three people in a room of 19 have the same birthday?" Apparently his wife had challenged him with this one and it was bugging him. I suppose he must have found me via my post about the Birthday Problem. His problem was a bit different - the code I wrote will calculate the odds that at least two people in a room have the same birthday, not that exactly three do.

 

The way I approached the problem was to break it down. I realized that it's fairly easy to calculate the odds that the first three people in a room of 19 have the same birthday: it's just the odds that the first three have some birthday times the odds that everyone else has a different birthday. The odds that the first three people have the same birthday is (1/365)^2, since it doesn't matter what the first person's birthday is, and then the other two have to have the same birthday. The odds that the rest of them have a different birthday is (364/365 * 363/365 * … * (365-16)/365).

 

That tells us what the odds of the first three people having the same birthday and everyone else having the same birthday is. To compute the odds of any three people having the same birthday, we just need to figure out how many ways there are to arrange 3 people out of 19, without regard to order (i.e. we don’t care if it's Steve, Bob, and Alice or Alice, Bob and Steve). But that's just the combination operation, occasionally notated C(n, k), and it's just n!/(k! * (n-k)!). The odds for any three people are just the odds for the first three times C(19, 3).

 

As you know, I've been learning Lisp, and I figured this would be a great problem to get a little experience with. And it was. Here's the code:

(defun fact (n)
"Compute the factorial of n"
(if (= n 1)
1
(* n (fact (- n 1)))))
(defun combin (n k)
"The combination of n things taken k ways"
(/ (fact n) (* (fact k) (fact (- n k)))))
(defun odds-of-others (n)
"The odds that the other n-3 people in the room have some other birthday"
(* (/ (fact 364) (fact (- 364 (- n 3))) (expt 365 (- n 3)))))
(defun odds-of-first-three (n)
"The odds that the first three people in the room have the same birthday,
and that everyone else has a different birthday"
(* 1/365 1/365 (odds-of-others n)))
(float (* (odds-of-first-three 19) (combin 19 3)))

To my eye, this is a lovely, compact way to express the solution. Others may feel differently. :) Also, I've made no attempt to optimize the code.

 

If you're not familiar with Lisp, there are a few interesting things to point out here. First, note that the second line of each function definition (defun defines a function) is an optional documentation string. I just love that that sort of thing is built in to the language.

 

Second, one of the great things about using Lisp for this application is that it supports both rational numbers and big integers natively. That is, when you compute something like (/ 1 365) - the integer one divided by the integer 365 - the result is the rational number 1/365. If you multiply rationals, it continues to maintain the result as a rational. Plus, really large integers are also supported natively, including as the components of a rational. Which means that the expression (* (odds-of-first-three 19) (combin 19 3)) returns this:

 

105393858057485882391608381135063049633792/21153953378595625024923538729098931884765625

 

Which is pleasingly precise, but you can see why I cast it to a float. :) It's about 0.5%, in case you were wondering.

 

I should note that strictly speaking this calculates the odds that exactly three people in a room have the same birthday and that everyone else has a different birthday from everyone else. If you only care that three people have the same birthday and that no one else has the same birthday as them, the odds will be slightly higher, as we have to include the case where (for example) the first three were born on March 1st, and everyone else was born on March 2nd. That just means switching the term that looks like (364/365 * 363/365 * … * (365-16)/365) to one that looks like (364/365)^(n-3). That pushes the odds up to about 0.7%. The only code change we have to make to do that calculation is this:

(defun odds-of-first-three (n)
"The odds that the first three people in the room have the same birthday,
and that everyone else has a different birthday"
  (* 1/365 1/365 (expt 364/365 (- n 3))))

Anyway, it was a fun problem, particularly to stretch my (nascent) Lisp muscles.

 

Oh, and I should warn you that although I think I've gotten this right, there's every chance someone who's much better than I at probability and/or Lisp will find a mistake. Hopefully if so, they'll leave a comment.

4 comments:

  1. Distribution of birthdays in the USA for 1978:



    http://www.dartmouth.edu/~chance/teaching_aids/data/birthday.txt



    ;)



    Fun Fact!



    Q. Why did the Bayesian reasoner cross the road?

    A. You need more information to answer this question.



    -- http://yudkowsky.net/bayes/bayes.html

    ReplyDelete
  2. Sorry, but you're ignoring something crucial here. What you've calculated is the probability of the following event: there are three people who share one birthday, and all others have distinct birthdays. But this ignores the HIGH probability that many other pairs of people share birthdays, so the number you're calculating is artificially low (by a huge factor for large n).

    ReplyDelete
  3. @Todd: did you read the entire post? There's a bit at the end where I address exactly that. Are you saying I got that part wrong?

    ReplyDelete