This is to detail a much easier proof (at least, I find it so) of Cayley-Hamilton than the ones which appear on the Wikipedia page. It only applies in the case of complex vector spaces; most of the post is taken up with a proof of a lemma about complex matrices that is very useful in many contexts.

The idea is as follows: given an arbitrary square matrix $A$, upper-triangularise it (that is, look at it in a basis $(e_1, \dots, e_n)$ with respect to which it is upper-triangular). Then consider how $\chi_A(A)$, the characteristic polynomial evaluated at $A$ itself, acts on the vectors of that basis; in particular, how it deals with the subspace spanned by $e_1, \dots, e_i$.

Lemma: upper-triangulation

Given a square complex matrix $A$, there is a basis with respect to which $A$ is upper-triangular.

Proof: by induction on the size of the matrix. It’s obviously true for $1 \times 1$ matrices, as they’re already upper-triangular. Now, let’s take an arbitrary $n \times n$ matrix $A$. We want to make it upper-triangular. In particular, thinking about the top-left element, we need the first basis vector to be an eigenvector (since if $A$ is upper-triangular with respect to basis $(e_1, \dots, e_n)$, then $A e_1 = a_{11} e_1$, where $a_{11}$ is the top-left element). OK, let’s grab an eigenvector $v_1$ with eigenvalue $\lambda$.
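To spell out where the complex numbers come in: the characteristic polynomial $\det(A - tI)$ is a degree-$n$ polynomial in $t$, so by the fundamental theorem of algebra it has a root $\lambda \in \mathbb{C}$. Then

$$\det(A - \lambda I) = 0,$$

so $A - \lambda I$ has a nontrivial kernel, and any nonzero vector of that kernel is an eigenvector with eigenvalue $\lambda$. (Over $\mathbb{R}$ this can fail: a rotation of the plane has no real eigenvector.)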

We’d love to be done by induction at this point - if we extend our eigenvector to a basis, the rest of that basis spans a smaller space, on which we hope to upper-triangularise. We have that every subspace has a complement, so let’s pick a complement of the span $\langle v_1 \rangle$ and call it $W$.

Now, we want $A$ to be upper-triangulable on $W$. It makes sense, then, to restrict it to $W$ - we’ll call the restriction $A|_W$, and that’s a linear map from $W$ to $V$. Our inductive hypothesis requires a square matrix, so we need to throw out one of the rows of this linear map - and in order that we’re working with an endomorphism of $W$ (rather than just a linear map into $V$) the row to throw out is the top one, the $v_1$-component. That is, we compose with the projection map $\pi : V \to W$ onto $W$ along $v_1$.

Then $\pi \circ A|_W$ is an endomorphism of the $(n-1)$-dimensional space $W$, and so we can induct to state that there is a basis $(w_2, \dots, w_n)$ of $W$ with respect to which $\pi \circ A|_W$ is upper-triangular. Let’s take that basis of $W$ as our extension of $v_1$, to make a basis $(v_1, w_2, \dots, w_n)$ of $V$. (These are length-$n$ vectors.)

Then we construct $A$’s matrix by writing down, as its columns, the coordinates of $A v_1, A w_2, \dots, A w_n$ with respect to the basis $(v_1, w_2, \dots, w_n)$. (That’s how we construct a matrix for a map in a basis: state where the basis vectors go under the map.)

Now, with respect to this basis $(v_1, w_2, \dots, w_n)$, what does $A$ look like? Certainly $A v_1 = \lambda v_1$ by definition. $\pi(A w_i) = (\pi \circ A|_W)(w_i)$ because $A$ acts just the same as $A|_W$ on $W$; by upper-triangularity of $\pi \circ A|_W$, we have that $\pi(A w_i) \in \langle w_2, \dots, w_i \rangle$ for each $i$. The first element (the $v_1$ coefficient) of $A w_i$, who knows? (We threw that information away by taking $\pi$.) But that doesn’t matter - we’re looking for upper-triangulability rather than diagonalisability, so we’re allowed to have spare elements sitting at the top of the matrix.
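In matrix form (writing $t_{ij}$ for the entries of the upper-triangular matrix of $\pi \circ A|_W$, and $\ast$ for the discarded $v_1$ coefficients), this says

$$[A]_{(v_1, w_2, \dots, w_n)} = \begin{pmatrix} \lambda & \ast & \cdots & \ast \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & t_{nn} \end{pmatrix},$$

which is upper-triangular.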

And so the induction goes through: any square complex matrix $A$ is upper-triangular with respect to some basis.
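For a concrete instance of the construction, take

$$A = \begin{pmatrix} 2 & 1 \\ -1 & 0 \end{pmatrix},$$

whose characteristic polynomial is $(x - 1)^2$. The eigenvalue $1$ has (up to scaling) the single eigenvector $v_1 = (1, -1)^T$, so $A$ is not diagonalisable; but taking $W = \langle (0, 1)^T \rangle$ as a complement, we get $A v_1 = v_1$ and $A (0, 1)^T = (1, 0)^T = v_1 + (0, 1)^T$, so in the basis $(v_1, (0, 1)^T)$ the matrix of $A$ is

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$

which is upper-triangular.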

Note

Remember that we threw out some information by projecting onto $W$. If it turned out that we didn’t throw out any information - if it turned out that we could always “fill in with zeros” - then we’d find that we’d constructed a basis of eigenvectors, and that the matrix was diagonalisable. (This is how the two ideas are related.)
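In the matrix picture above, “filling in with zeros” means the $\ast$ entries all vanish:

$$[A]_{(v_1, w_2, \dots, w_n)} = \begin{pmatrix} \lambda & 0 & \cdots & 0 \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & t_{nn} \end{pmatrix},$$

and if that happens at every stage of the induction, the final matrix is diagonal and the basis we built consists of eigenvectors.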

Theorem

Recall the statement of the theorem:

Every square matrix satisfies its characteristic polynomial.

Now, this would be absolutely trivial if our matrix $A$ were diagonalisable - just look at it in a basis with respect to which $A$ is diagonal (recalling that a change of basis doesn’t change the characteristic polynomial), and we end up with simultaneous equations which are conveniently decoupled from each other (by virtue of the fact that $A$ is diagonal).
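To see just how trivial: if $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, then $\chi_D(x) = \prod_{i=1}^n (x - \lambda_i)$, and

$$\chi_D(D) = \prod_{i=1}^n (D - \lambda_i I)$$

is a product of diagonal matrices, hence diagonal, with $j$th diagonal entry $\prod_{i=1}^n (\lambda_j - \lambda_i) = 0$ (the $i = j$ factor kills it). So $\chi_D(D) = 0$ with no effort at all.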

We can’t assume diagonalisability - but we’ve shown that there is something nearly as good, namely upper-triangulability. Let’s assume (by picking an appropriate basis $(e_1, \dots, e_n)$) that $A$ is upper-triangular, with diagonal entries $\lambda_1, \dots, \lambda_n$. The characteristic polynomial is then $\chi_A(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$, since the determinant of the upper-triangular matrix $xI - A$ is just the product of its diagonal entries. What does $\chi_A(A)$ do to the basis vectors?

Well, let’s consider the first basis vector, $e_1$. We have that $A e_1 = \lambda_1 e_1$ because $A$ is upper-triangular with top-left element $\lambda_1$, so we have $(A - \lambda_1 I) e_1 = 0$. If we look at the characteristic polynomial as $\chi_A(A) = (A - \lambda_n I) \cdots (A - \lambda_2 I)(A - \lambda_1 I)$ (the factors all commute, so we may apply them in whichever order is convenient), then we see that $\chi_A(A) e_1 = 0$.

What about the second basis vector? $A e_2 = a_{12} e_1 + \lambda_2 e_2$, so $(A - \lambda_2 I) e_2 = a_{12} e_1$, which lies in $\langle e_1 \rangle$. We’ve pulled the 2nd basis vector into an earlier-considered subspace, and happily we can kill it by applying $(A - \lambda_1 I)$. That is, $(A - \lambda_1 I)(A - \lambda_2 I) e_2 = 0$, and hence $\chi_A(A) e_2 = 0$.

Keep going: the final case is the $n$th basis vector, $e_n$. $A - \lambda_n I$ has a zero in the bottom-right entry, and is upper-triangular, so it must take $e_n$ into the subspace spanned by $e_1, \dots, e_{n-1}$. Hence $(A - \lambda_1 I) \cdots (A - \lambda_{n-1} I)(A - \lambda_n I) e_n = 0$, which is to say $\chi_A(A) e_n = 0$.
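The general pattern, made explicit: upper-triangularity with $\lambda_i$ in the $(i, i)$ slot gives

$$(A - \lambda_i I) e_i \in \langle e_1, \dots, e_{i-1} \rangle,$$

and each earlier factor $(A - \lambda_j I)$ maps $\langle e_1, \dots, e_j \rangle$ into $\langle e_1, \dots, e_{j-1} \rangle$, so applying $(A - \lambda_{i-1} I), \dots, (A - \lambda_1 I)$ in turn squashes $e_i$ all the way down to $0$. Hence $\chi_A(A) e_i = 0$ for every $i$.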

Since $\chi_A(A)$ is zero on a basis, it must be zero on the whole space, and that is what we wanted to prove.
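As a final sanity check, here is the whole argument run on the $2 \times 2$ upper-triangular matrix from the lemma above:

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad \chi_A(x) = (x - 1)^2, \qquad \chi_A(A) = (A - I)^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Indeed $(A - I) e_1 = 0$ and $(A - I) e_2 = e_1$, so the second application of $(A - I)$ kills what the first one left behind, exactly as in the proof.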