 ### Patrick Stevens

Former mathematics student at the University of Cambridge; now a software engineer.

# Cayley-Hamilton theorem

This is to detail a much easier proof (at least, I find it so) of Cayley-Hamilton than the ones which appear on the Wikipedia page. It only applies in the case of complex vector spaces; most of the post is taken up with a proof of a lemma about complex matrices that is very useful in many contexts.

The idea is as follows: given an arbitrary square matrix $$A$$, upper-triangularise it, say with respect to a basis $$B$$. Then consider how $$A-\lambda I$$ acts on the vectors of $$B$$; in particular, how it deals with the subspace spanned by $$b_1, \dots, b_i$$.

# Lemma: upper-triangularisation

Given a complex square matrix $$A$$, there is a basis with respect to which $$A$$ is upper-triangular.

Proof: by induction. It’s obviously true for $$1 \times 1$$ matrices, as they’re already triangular. Now, let’s take an arbitrary $$n \times n$$ matrix $$A$$. We want to make it upper-triangular. In particular, thinking about the top-left element, we need $$A$$ to have an eigenvector (since if $$A$$ is upper-triangular with respect to basis $$B$$, then $$A(b_1) = \lambda b_1$$, where $$\lambda$$ is the top-left element). Over $$\mathbb{C}$$ the characteristic polynomial has a root, so an eigenvector does exist - this is the one place where we use that we’re working over the complex numbers. OK, let’s grab an eigenvector $$v_1$$ with eigenvalue $$\lambda$$.

We’d love to be done by induction at this point - if we extend our eigenvector to a basis, the new vectors span a smaller space, on which we’d like $$A$$ to be upper-triangulable. We have that every subspace has a complement, so let’s pick a complement of $$\text{span}(v_1)$$ and call it $$W$$.

Now, we want $$A$$ to be upper-triangulable on $$W$$. It makes sense, then, to restrict it to $$W$$ - we’ll call the restriction $$\tilde{A}$$, and that’s a linear map from $$W$$ to $$V$$. Our inductive hypothesis requires a square matrix, so we need to throw out one of the rows of this linear map - and in order that we’re working with an endomorphism (rather than just a linear map) we need $$\tilde{A}$$’s codomain to be $$W$$, matching its domain. That means we have to throw out the top row - that is, we compose with $$\pi$$, the projection map onto $$W$$.

Then $$\pi \cdot \tilde{A}$$ is $$(n-1)\times(n-1)$$, and so we can induct to state that there is a basis of $$W \leq V$$ with respect to which $$\pi \cdot \tilde{A}$$ is upper-triangular. Let’s take that basis of $$W$$ as our extension to $$v_1$$, to make a basis of $$V$$. (These are $$n-1$$ length-$$n$$ vectors.)

Then we construct $$A$$’s matrix by taking $$A(v_1), A(v_2), \dots, A(v_n)$$ as its columns, each expressed in terms of the basis $$v_1, \dots, v_n$$. (That’s how we construct a matrix for a map in a basis: state where the basis vectors go under the map.)

Now, with respect to this basis $$v_1, \dots, v_n$$, what does $$A$$ look like? Certainly $$A(v_1) = \lambda v_1$$ by definition. $$\pi(A(v_2)) = \pi(\tilde{A}(v_2))$$ because $$\tilde{A}$$ acts just the same as $$A$$ on $$W$$; by upper-triangularity of $$\pi \cdot \tilde{A}$$ (and since $$v_2$$ is the first basis vector of $$W$$), we have that $$\pi \cdot \tilde{A}(v_2) = k v_2$$ for some $$k$$. The first element (the $$v_1$$ coefficient) of $$A(v_2)$$, who knows? (We threw that information away by taking $$\pi$$.) But that doesn’t matter - we’re looking for upper-triangulability rather than diagonalisability, so we’re allowed to have spare elements sitting at the top of the matrix.

And so forth: $$A$$ is upper-triangular with respect to some basis.
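To make the induction concrete, here is a small NumPy sketch of the lemma’s construction (my own illustration, not part of the proof, and not meant to be numerically robust): grab an eigenvector, extend it to a basis via a QR factorisation (whose remaining columns span a complement $$W$$), project away the top row, and recurse on the smaller block.

```python
import numpy as np

def triangularise(A):
    """Follow the lemma's induction: return P with P^{-1} A P upper-triangular.

    Works over the complex numbers, where an eigenvector always exists.
    """
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)  # base case: already triangular
    # Over C, A always has an eigenvector; grab one as v1.
    _, eigvecs = np.linalg.eig(A.astype(complex))
    v1 = eigvecs[:, [0]]
    # Extend v1 to a basis: Q's first column spans span(v1), and the
    # remaining columns span an (orthogonal) complement W.
    Q, _ = np.linalg.qr(v1, mode="complete")
    B = Q.conj().T @ A @ Q   # A in the new basis: first column is lambda * e1
    sub = B[1:, 1:]          # this block is pi . A-tilde, an (n-1)x(n-1) matrix
    P_sub = triangularise(sub)  # induct on the smaller matrix
    # Paste the smaller change of basis under the eigenvector direction.
    P = Q @ np.block([[np.eye(1, dtype=complex), np.zeros((1, n - 1))],
                      [np.zeros((n - 1, 1)), P_sub]])
    return P

# A rotation by 90 degrees: no real eigenvectors, but complex ones exist.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
P = triangularise(A)
T = np.linalg.inv(P) @ A @ P
assert np.allclose(np.tril(T, -1), 0)  # upper-triangular in the new basis
```

The example matrix is chosen to have no real eigenvectors at all, which shows why the lemma is stated over $$\mathbb{C}$$.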

## Note

Remember that we threw out some information by projecting onto $$W$$. If it turned out that we didn’t throw out any information - if it turned out that we could always “fill in with zeros” - then we’d find that we’d constructed a basis of eigenvectors, and that the matrix was diagonalisable. (This is how the two ideas are related.)

# Theorem

Recall the statement of the theorem:

Every square matrix satisfies its characteristic polynomial.

Now, this would be absolutely trivial if our matrix $$A$$ were diagonalisable - just look at it in a basis with respect to which $$A$$ is diagonal (recalling that change-of-basis doesn’t change characteristic polynomial), and we end up with $$n$$ simultaneous equations which are conveniently decoupled from each other (by virtue of the fact that $$A$$ is diagonal).
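To see how trivial the diagonal case is, here’s a quick NumPy check on a made-up diagonal matrix: each factor $$(D - \lambda_i I)$$ has a zero in slot $$i$$ of its diagonal, so the product of all the factors has every diagonal entry equal to zero.

```python
import numpy as np

# The diagonal case of Cayley-Hamilton: chi(D) is a product of diagonal
# factors, and factor i has a zero in slot i, so the product vanishes.
D = np.diag([2.0, 5.0, -1.0])
chi_D = np.eye(3)
for lam in np.diag(D):
    chi_D = chi_D @ (D - lam * np.eye(3))
assert np.allclose(chi_D, 0)
```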

We can’t assume diagonalisability - but we’ve shown that there is something nearly as good, namely upper-triangulability. Let’s assume (by picking an appropriate basis) that $$A$$ is upper-triangular, with diagonal entries $$\lambda_1, \dots, \lambda_n$$. Then the characteristic polynomial is $$\chi(x) = (x - \lambda_1)(x-\lambda_2) \dots (x-\lambda_n)$$, because the determinant of the triangular matrix $$x I - A$$ is just the product of its diagonal entries. What does $$\chi(A)$$ do to the basis vectors?

Well, let’s consider the first basis vector, $$e_1$$. We have that $$A(e_1) = \lambda_1 e_1$$ because $$A$$ is upper-triangular with top-left element $$\lambda_1$$, so we have $$(A-\lambda_1 I)(e_1) = 0$$. If we look at the characteristic polynomial as $$(x-\lambda_n)\dots (x-\lambda_1)$$ (the factors commute, so we may reorder them freely), then we see that $$\chi(A)(e_1) = 0$$.

What about the second basis vector? $$A(e_2) = k e_1 + \lambda_2 e_2$$; so $$(A - \lambda_2 I)(e_2) = k e_1$$. We’ve pulled the $$2$$nd basis vector into an earlier-considered subspace, and happily we can kill it by applying $$(A-\lambda_1 I)$$. That is, $$\chi(A)(e_2) = (A-\lambda_n I)\dots (A-\lambda_1 I)(A-\lambda_2 I)(e_2) = (A-\lambda_n I)\dots (A-\lambda_1 I) (k e_1) = 0$$.

Keep going: in general, $$(A - \lambda_i I)$$ sends $$e_i$$ into the subspace spanned by $$e_1, \dots, e_{i-1}$$, and each factor maps each of these subspaces into itself, so the factors kill the basis vectors off one by one. The final case is the $$n$$th basis vector, $$e_n$$: $$A-\lambda_n I$$ has a zero in the bottom-right entry, and is upper-triangular, so it must take $$e_n$$ to the subspace spanned by $$e_1, \dots, e_{n-1}$$. Hence $$(A-\lambda_1 I)\dots (A-\lambda_n I)(e_n) = 0$$.
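This kill-off can be watched factor by factor on a small upper-triangular matrix (a toy example of my own): each application of $$(A - \lambda_i I)$$ pushes the vector into the span of earlier basis vectors, until nothing is left.

```python
import numpy as np

# An upper-triangular matrix; its eigenvalues sit on the diagonal.
A = np.array([[1.0, 4.0, 6.0],
              [0.0, 2.0, 5.0],
              [0.0, 0.0, 3.0]])
lam = np.diag(A)
e3 = np.array([0.0, 0.0, 1.0])

v = (A - lam[2] * np.eye(3)) @ e3  # lands in span(e1, e2)
assert v[2] == 0
v = (A - lam[1] * np.eye(3)) @ v   # lands in span(e1)
assert v[1] == 0 and v[2] == 0
v = (A - lam[0] * np.eye(3)) @ v   # killed entirely
assert np.allclose(v, 0)
```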

Since $$\chi(A)$$ is zero on a basis, it must be zero on the whole space, and that is what we wanted to prove.
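Finally, a quick numerical sanity check of the theorem on an arbitrary matrix, using NumPy’s `np.poly` to read off the characteristic polynomial’s coefficients and Horner’s rule to evaluate it at $$A$$ (an illustration, not a substitute for the proof):

```python
import numpy as np

# Check chi(A) = 0 numerically for a random 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

coeffs = np.poly(A)       # characteristic polynomial, highest power first
chi_A = np.zeros_like(A)
for c in coeffs:          # Horner's rule, evaluated at the matrix A
    chi_A = chi_A @ A + c * np.eye(4)
assert np.allclose(chi_A, 0)
```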