Friday, February 4, 2011

The repertoire method in "Concrete Mathematics"

Concrete Mathematics by Knuth et al is a great book but there are a couple of places where people learning by themselves can stumble, fall and get lost.

When I originally worked through the book I couldn't make head or tail of the "repertoire method" of solving recurrences. Eventually I did figure it out and sometime later I wrote up what I understood and posted it on my (then) blog. It seems that a lot of people search for "Concrete Mathematics Repertoire method" on Google and it is my second most popular blog post (The most popular post is this one, fwiw)

So I re-read the repertoire method post today and while it is correct, I can explain it better today so here goes.

(What follows may not make sense to you if you are not working through CM (be warned!). Also be warned that I have no formal training in mathematics, computer science or programming and am entirely self taught. So people with such training can probably explain things better. The following reflects only *my* understanding. That said, onwards!)


By the time you hit the repertoire method section in Chapter 1 of Concrete Mathematics, you have been taught a simple method to find closed forms of recurrences. The essential algorithm is

(a) make a table of the recurrence values R(n) for small values of n.
(b) Eyeball the table to see if you can spot a pattern,
(c) write down the pattern.
(d) Prove (or disprove) the candidate closed form's correctness by Mathematical Induction over (a subset of) the natural numbers.

You used this method, for example to solve the Josephus problem, so you know where to stand (in the circle of your idiot friends trying to commit mass suicide) so that you end up being the survivor), surrender to the Romans and become a historian for the ages).

The repertoire method makes its (first) appearance in the generalization of the Josephus recurrence. The constants 1 and -1 are replaced by alpha beta and gamma to give the more general recurrence

f(1) = alpha
f(2n) = 2*f(n) + beta
f(2n + 1) = 2*f(n) + gamma~~~~~~~~~~~~~~~~~~~~~~~[1]

Your job is to find f(n) such that this recurrence is true for any values of alpha, beta, and gamma.

So how do you do this? You use the only technique you know (at this point in the book) of making a table for small values of n and eyeballing it to spot a pattern (I am too lazy to reproduce the table here, go buy the book!). You don't spot a pattern for the f(n) but you do notice that all values of f(n) follow a pattern of

f(n) = A(n)*alpha + B(n)*beta + C(n)*gamma~~~~~~~~~~~~~~[2]

This is a kind of template of what the final closed form will look like, depending of course on the values of A(n), B(n) etc

Something very important happened here. You broke down a problem into smaller subproblems. You still don't know the value of f(n) but now you know that if you can find the values of the functions A(n), B(n), and C(n) you have solved f(n).

Now you can find these values with your trusted "spot a pattern and verify with induction" (the only tool you have at this point) OR you can use the (unexplained!) repertoire method.


The first bit of confusion arises because the authors do both. They guess values of all the three functions in n, A(n), B(n) and C(n) from the initial table, say (correctly) that proving that these values by induction is long and tedious and then they go ahead and prove that the guessed value of A(n) is correct by induction! (to be fair they are solving for only A(n) and not all the unknowns simultaneously but though smaller this induction is still tedious. Try it. I did.)

Or, in more detail,

a value is guessed, (that A(n) = 2^m , m coming from rewriting n as 2^m + k as in the original Josephus problem - the book uses the letter l instead of k, I use k to distinguish easily from the number 1), beta and gamma are set to zero (remember the recurrence has to hold for *all* values of alpha, beta and gamma, including the selected 1,0,0) and then the recurrence [1] is rewritten (using [2]) as a recurrence in terms of A(n).

I'll repeat that so it is clear what is happening


Step 1: Guess a value for A(n), by eyeballing.We guess A(n) = 2^m

Step 2: Select alpha, beta and gamma so that the other two functions of n, B(n) and C(n), get eliminated from [2]. You can do this by selecting alpha = 1 and beta = gamma = zero.

Step 3: Rewrite the equation [2] (i.e the observed generalized form) in terms of the selected values of alpha, beta and gamma. You get f(n) = A(n).

Step 4: Use this equation (ie f(n) = A(n)) and your chosen values of alpha, beta and gamma to rewrite the original recurrence (ie equation [1]) to get

A(1) = 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[3]
A(2n) = 2*A(n)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[4]
A(2n + 1) = 2*A(N)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[5]



Now the problem becomes to prove that your guess (i.e, A(n) = A(2^m + k) = 2^m) satisfies this new recurrence as expressed by [3] and [4] and [5].

The authors state "Sure enough it is true (by induction on m) that A(2^m + k) = 2^m "


The authors don't show the induction but you can work out this (tedious but not difficult) induction. Only basic algebraic manipulation is required)

Be aware that (a) the induction is on m, not n and (b) the predicate to be proven has the form P_m: (A(2^m +k) => [3] AND [4] AND [5]).

First, prove P_zero (as m starts from zero, though n starts from 1 - we need an induction on m, not n!). Then prove that P_m => P_m+1. (Weak Induction is sufficient).

So we prove (yay!!) that A(n) does = 2^m.

You could do the same for B(n) and C(n). Select values for alpha, beta and gamma to create recurrences in terms of B(n) only and then C(n) only, just as we did above for A(n), then use mathematical induction over m to prove your guesses correct.

In the book though, the authors switch to the repertoire method to find B(n) and C(n). This switch is the first confusing bit - A(n) is found using a guess + induction (the old method - eyeball, guess, use induction). But then they switch - and the repertoire method is used to find B(n) and C(n) and the initial guesses as to their values are unused!

Worsening the confusion is the fact that the repertoire method is not identified or explained explicitly at this point. As a student states in a margin note (great idea btw) "Beware: the authors are expecting you to figure out the idea of the repertoire method from seats of pants examples instead of giving a top down presentation"

The seat-of-pants example is actually enough if A(n),B(n) and C(n) are all worked out with the repertoire method.


So let us solve the whole thing with the repertoire method (no induction) and see how it works. Let us throw away the guesses about the values of A(n), B(n), and C(n). We'll assume we couldn't make any guesses for A(n), B(n) and C(n).

All we have is the original recurrence

f(1) = alpha
f(2n) = 2*f(n) + beta
f(2n + 1) = 2*f(n) + gamma~~~~~~~~~~~~~~~~~~~~~[1]

and our observation that f(n) always has the form

f(n) = A(n)*alpha + B(n)*beta + C(n)*gamma~~~~~~~~[2]

ok so we don't know (and we need to find out) the values of A(n), B(n) and C(n) .

The repertoire method (for this recurrence) works like this.
(1) Guess a value for f(n). (ie the guess is for the whole of [2] NOT a component A(n) as we did above!)
(2)See if you can find values for alpha, beta and gamma to validate this guess. Rewrite [1], the original recurrence in terms of your guess for f(n). See if you can find values for alpha, beta and gamma.
(3) Substitute these values (of alpha, beta and gamma), back into [2]. You'll get an equation in terms of the three unknowns A(n), B(n) and C(n).
(4)Repeat steps (1) - (3) till you have three independent equations.
(5)Solve for three linear equations in three unknowns.

Done.

Important: If you make a "wrong" guess you will end up with a useless equation like 0 = 0 or an equation that is not independent of the already derived equations and so on. If this happens, don't worry about it, try another guess till you do get three independent equations.

In detail.

I guess (the authors do too) that f(n) = 1.

Rationale for the guess: f(n) = constant is the simplest possible formulation of f(n) (just for fun you might want to try f(n) = 0)

Let us try substituting this in [1]

f(1) = alpha becomes 1 = alpha (since f(n) is 1 for any n).

similarly

f(2n) = 2*f(n) + beta becomes 1 = 2*1 + beta so beta = -1
f(2n + 1) = 2*f(n) + gamma becomes 1 = 2*1 + gamma so gamma = -1

so we have values for alpha,beta, gamma and f(n) and when we substitute back into [2] we get

A(n) - B(n) - C(n) = 1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[6]

ok!!! we have the first equation.

Now if we could get just two more independent equations like this we would have three equations in three unknowns (and so solve by Linear Algebra).

The authors use f(n) = n as their second guess and get A(n) + C(n) = n as their second equation.

(This is a good guess too. After f(n) = k, f(n) = n is the next rung up the complexity ladder. But in this specific example we can do better - see below)

and since they already proved that A(n) = 2^m by the "guess and use induction" method they don't need a third equation. They have two equations in two unknowns and they solve to get the solution.

But we don't have any guesses as to the values of A(n), so we can't plug that in.

We guessed at f(n) = 1 and got the equation A(n) - B(n) - C(n) = 1 and we need two more independent equations. We could re use the authors' f(n) = n guess and also cheat a bit and reuse the (non generalized) Josephus recurrence solution that we already proved that to "guess" that f(n) = 2k + 1. Then alpha = 1, beta = -1 and gamma = 1, giving us the third equation A(n) - B(n) + C(n) = 2k + 1. Three equations three variables. Solve. This gives the right answer too.

But that feels like a cheat. What if we hadn't solved the Josephus recurrence before? How would we guess f(n) = 2k + 1? We could go with f(n) = n but we can do better.

Since n = 2^m + k, we guess (our second guess)

f(n) = 2^m

and f(n) = k. (our third guess)

(Rationale for these guesses. Since n is dependent on 2^m and k why not guess with the simpler variables rather than f(n) = n, like the example in the book does? In this case this decision pays off spectacularly, giving the solutions for A(n) and C(n) directly and B(n) trivially )

These give us the equations, (just like we worked out [4] above, try it!)

A(n) = 2^m~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[7]
C(n) = k~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[8]

and these two along with [6] give us B(n) = 2^m - k - 1.

Look ma! no induction !

So now we have the values for A(n), B(n) and C(n) and now we can say that

the solution to the recurrence

f(1) = alpha
f(2n) = 2*f(n) + beta
f(2n + 1) = 2*f(n) + gamma is (given n = 2^m + k as explained earlier in the book)

f(n) = (2^m)*alpha + (2^m - k - 1)*beta + k*gamma.

Double checking,When alpha = 1, beta = -1 and gamma = 1 we get

the solution to

f(1) = 1
f(2n) = 2*f(1) - 1
f(2n + 1) = 2*f(n) + 1 (note: this is the original Josephus Recurrence)

is (2^m)*1 + (2^m - k - 1)*(-1) + k*(1), which resolves to

2k+1 which agrees with [1.9] (in the book).

Hopefully now what the authors say about the recurrence method makes more sense, though the sentence structure at this point in the book is confusing Let us take it apart - my comments in italics.

"First we find settings for parameters for which we know the solution" (the parameters here are alpha beta and gamma, the "solution"s are the various guessed values of f(n) NOT A(n),B(n), C(n). We make guesses for A(n),B(n) etc when we are using the prove by Induction method. When we use Repertoire method, we guess for f(n) and *find* A(n), B(n) etc);


"this gives us a repertoire of special cases that we can solve" (the special cases are the independent equations in the unknowns A(n),B(n),C(n) and solving them gives us the values of A(n) etc).


"Then we obtain the general case" (the solution of recurrence [1], the general value of f(n) )

"by combining the special cases" (In this case we combine the solutions of the equations which are the "special cases").

Hopefully that helped a bit.

The repertoire method pops up all over CM in various contexts, and once you grasp it is easy to identify and use. Enjoy the rest of Concrete Mathematics (which imho is a great, great book every programmer should have on his bookshelf)

16 comments:

himanshu said...

"...What follows will make no sense to you if you are not working through CM..."

I don't think prerequisite is that strict. People can make sense out of this as long as they are, in some way, familiar with recurrence relations :)

Ravi said...

@himanshu,

I was referring to the clarity of my writing (and its possible inadequacies in explaining mathematics) not the content, which (as you rightly point out) is fairly simple.

This confusion is a good indicator that I need to improve my writing further.

Moussa Cidibe said...

hah lucky me, I got confused by this just two days after you made this post.

Appreciate it.

rahul said...

Thanks a lot for the post. It was helpful !

Anonymous said...

Thanks a lot for this explanation. I reckon that it saved me quite a lot of time. ;-)

Vaios said...

That was crucial for me to understand. Really the writing of this books gets confusing some times. Perhaps you should write your own!! Are u facebook or some other social network. Its good to know people who seek knowlenge.

Raul said...

Can anyone write the calculations for f(n) = 2^m and f(n) = k???

I don't know how to do it... I can't get to A(n) = 2^m and C(n) = k

thx!

Anonymous said...

Thanks! This clears up the first chapter a lot.

Miguel said...

What I can't wrap my head around is why you're allowed to plug in guesses for f(n) in the first place. They're merely guesses, right? How do we know our results would be consistent with the real answer?

ie: We "guess" f(n) = 1. f(n) is obviously NOT 1, so why are we allowed to work with the results that we obtained from plugging that in?

Mohan Radhakrishnan said...

Hi Ravi,
Do you have any ideas about what to read to start mapping maths to our programming problems in areas other than AI ? I mean that many of our common mundane tasks will have more elegant solutions if one knows the proper maths. So one need not only focus on AI.

Mohan

Anonymous said...

@Miguel,

"We "guess" f(n) = 1. f(n) is obviously NOT 1"

umm in this case, f(n) *is* 1, for a specific set of values of alpha, beta gamma *and n*. We are just going backwards here. Think about it.

In the general case, if the graph of the final answer function doesn't pass through your 'guess point' you won't get valid values for alpha, beta, gamma, A(n) etc.

galib2145 said...

Awesome, really helpful, Thanks much!!

Kunal said...

Thanks for the article. It has really helped me. I have one doubt though.

"I guess (the authors do too) that f(n) = 1.

Rationale for the guess: f(n) = constant is the simplest possible formulation of f(n) (just for fun you might want to try f(n) = 0)

Let us try substituting this in [1]

f(1) = alpha becomes 1 = alpha (since f(n) is 1 for any n).

similarly

f(2n) = 2*f(n) + beta becomes 1 = 2*1 + beta so beta = -1
f(2n + 1) = 2*f(n) + gamma becomes 1 = 2*1 + gamma so gamma = -1"

Here, since f(n)=1, how are we replacing f(2n) and f(2n+1) for 1 on the LHS of the equation?

Ravi said...

f(x) = 1 for all x,including x = n, 2n+1, 2n-1, whatever else you can think of.

f(x) always returns 1 irrespective of the value of the parameter. On a graph, it would be a horizontal line at y = 1

AimlessInLA said...

Only a few days ago I learned the interesting fact that an early appearance in print by Donald Knuth, if not the earliest, was in 1957 in Mad Magazine! As a 19-year-old student he sent in "The Potrezbie System of Measurement", and it was published.

As for the explanation, I haven't gone through it yet but I certainly will. I've been struggling with this for a long time, and it's all the more frustrating because I was immediately able to follow the authors' eyeball-the-table/spot-the-pattern/prove-by-induction derivation of the original problem.

AimlessInLA said...

In the post you say:

The seat-of-pants example is actually enough if A(n),B(n) and C(n) are all worked out with the repertoire method.

Do you mean, if all three component functions are each worked out? It's an important distinction. I'm still struggling with it a bit, but isn't the underlying principle similar to how you solve set of three parallel equations in three variables?

I apologize if I'm way off base!