Thursday, February 8, 2007

Sick, plus ordered bell numbers

I've been sick the last few days. I did a hard run last Friday, had dinner with a friend, and woke up the next morning with the worst sore throat imaginable. Since then I've been battling with a nasty upper respiratory virus that has me hacking like a tuberculosis plagued sailor. I've mostly been in bed.

While in bed, I've been doing programming problems. Now that I am done with my job interviews at Yahoo and Google, I'm not sure I really need to practice problems, but I enjoy them immensely, and have been hanging out at the website TopCoder. It has become a replacement for the amount of time I usually spend online playing chess.

I've done hundreds of these problems, but one interesting one yesterday went something like this:


GEEK ALERT! I'm going to get technical on your ass, so don't feel obligated to continue reading.


Suppose you have N different objects and have been 
given some informationabout their ordering. The
information will be given as a vector<string>
facts whose elements will be of the form "A=B" where A
and B are integers between 0 and N-1 inclusive. "0=8"
means the 0th object is equal to the 8th object. Given
these facts you will return the total possible number
of distinct ways the objects could be ordered.

For example: N = 4 facts = {"0=2","1=3"} Possible
orderings: 0=2<1=3, 1=3<0=2, 0=1=2=3 In this case
there are 3 possible orderings, so you would return 3.
If no facts were given, there would be 75 possible
orderings. As seen above, the orderings described here
are total orderings. This means, for all pairs of
objects, the ordering is specified (one is less than
the other, or they are equal). Two orderings are the
same if all pairs of objects in each ordering have
exactly the same relationship.

I generally love probability problems, but it took me a while to figure this one out. It wasn't reasonable to loop through all possibilities since there could be billions. The problem reminded me of the problem trying calculating how many possible rolls there are with N number of K sided dice. In that problem you have K buckets and try to determine how many ways to fit the N dice inside of them. In our problem, however, the number of buckets aren't fixed.

A simplification of the problem is to ignore the facts. Actually, when you have a fact, you can just treat the resulting information as one number. So if you know that a==b, you can pretend you have one less number to work with. The simplified problem goes like this:
Given N numbers, how many ways are there to group 
them into sets. Each set will have equal signs
between it's members, and between the sets you
will place less than signs. The order of the sets
is important. The order of the numbers within a
set is unimportant.

I manually calculated the first few results, then came up with a mechanism for counting them which became a royal pain to implement programatically. The first few results for N items are: 1, 3, 13, 75, 541... After some frustration I hit the jackpot, a really cool website that allows you to look up integer sequences. I discovered that the sequence above lists the ordered Bell numbers also known as the Fubini numbers. There is also an interesting set of numbers called the unordered Bell numbers or just plain Bell numbers, which tells you the number of ways to group N items into sets when the order of the sets is unimportant. Information on calculating regular Bell numbers was abundant, but not ordered Bell numbers, which is what I needed.

After some more research I disovered something called Stirling numbers, or more specifically, Stirling numbers of the second kind. Stirling numbers form a table that looks like this:


n \ k 1 2 3 4 5 6
1 1
2 1 1
3 1 3 1
4 1 7 6 1
5 1 15 25 10 1
6 1 31 90 65 15 1


Note that we've labeled each row n, and each column k, so we can write S(n,k). S(5,2) is 15 for example. Each number can be calculated by taking the number above itself times k + the number diagonally northwest of itself. So S(n, k) = S(n-1, k) * k + S(n-1, k-1).

Why are Stirling numbers useful for calculating the ordered Bell numbers? It turns out that if you want to find an ordered bell number, you just have to pick the corresponding row of stirling numbers, and sum them, multipying each term by k factorial as you move across the row. For example to find the 4th ordered Bell number B(4), you would pick the 4th row of Stirling numbers and add as follows: 1 * 1! + 7 * 2! + 6 * 3! + 1 * 4! = 75. Magic!

This works because you are actually summing the combinations for different ways of grouping the numbers. It makes for some easy code. Since the internet seems void of code to calculate ordered bell numbers here's a rough snippet in case someone finds it useful:


...
//Get the ordered bell # using a stirling transform
long long fub = 0;
long long kFact = 1;
for(int k = 1; k <= n; ++k) {
kFact *= k;
fub += calcStirling(n, k) * kFact;
}
return fub;

...

static long long calcStirling(int n, int k) {
assert(n > 0);
assert(k > 0);
if (k == 1) {
return 1;
}
if (k == n) {
return 1;
}
long long result = calcStirling(n - 1, k - 1) +
k * calcStirling(n - 1, k);
return result;
}

Enough geeky math stuff. Next time I'll talk about my triathlon training.

1 comment:

Jon said...

Thanks for letting us in on the Ordered Bell Numbers. It makes me want to quit my job and go to math grad school. I especially like the way you show the thought process you used to solve the problem, including the 'aha' moments.