 ### Patrick Stevens

Former mathematics student at the University of Cambridge; now a software engineer.

# Cayley-Hamilton theorem

This is to detail a much easier proof (at least, I find it so) of Cayley-Hamilton than the ones which appear on the Wikipedia page. It only applies in the case of complex vector spaces; most of the post is taken up with a proof of a lemma about complex matrices that is very useful in many contexts.

The idea is as follows: given an arbitrary square matrix, upper-triangularise it (looking at it in basis $B$). Then consider how $A-\lambda I$ acts on the vectors of $B$; in particular, how it deals with the subspace spanned by $b_1, \dots, b_i$.

# Lemma: upper-triangulation

Given a square complex matrix $A$, there is a basis with respect to which $A$ is upper-triangular.

Proof: by induction. It’s obviously true for $1 \times 1$ matrices, as they’re already triangular. Now, let’s take an arbitrary $n \times n$ matrix $A$. We want to make it upper-triangular. In particular, thinking about the top-left element, we need $A$ to have an eigenvector (since if $A$ is upper-triangular with respect to basis $B$, then $A(b_1) = \lambda b_1$, where $\lambda$ is the top-left element). Over the complex numbers this is guaranteed: the characteristic polynomial has a root by the fundamental theorem of algebra, and a root is an eigenvalue. OK, let’s grab an eigenvector $v_1$ with eigenvalue $\lambda$.

We’d love to be done by induction at this point - if we extend our eigenvector to a basis, that extension spans a smaller space, on which we’d like $A$ to be upper-triangulable. We have that every subspace has a complement, so let’s pick a complement of $\text{span}(v_1)$ and call it $W$.

Now, we want $A$ to be upper-triangulable on $W$. It makes sense, then, to restrict it to $W$ - we’ll call the restriction $\tilde{A}$, and that’s a linear map from $W$ to $V$. Our inductive hypothesis requires a square matrix - that is, an endomorphism of $W$ - but $\tilde{A}$’s codomain is still $V$, not $W$. That means we have to throw out the $\text{span}(v_1)$ component of the output - that is, we compose with $\pi$, the projection onto $W$ along $\text{span}(v_1)$.

Then $\pi \cdot \tilde{A}$ is an endomorphism of the $(n-1)$-dimensional space $W \leq V$, and so we can induct to state that there is a basis of $W$ with respect to which $\pi \cdot \tilde{A}$ is upper-triangular. Let’s take that basis of $W$ as our extension of $v_1$, to make a basis of $V$. (These are $n-1$ vectors of length $n$.)

Then we construct $A$’s matrix by taking $A(v_1), A(v_2), \dots, A(v_n)$ as its columns. (That’s how we construct a matrix for a map in a basis: state where the basis vectors go under the map.)

Now, with respect to this basis $v_1, \dots, v_n$, what does $A$ look like? Certainly $A(v_1) = \lambda v_1$ by definition. Next, $\pi(A(v_2)) = \pi(\tilde{A}(v_2))$ because $\tilde{A}$ acts just the same as $A$ on $W$; by upper-triangularity of $\pi \cdot \tilde{A}$, we have that $\pi \cdot \tilde{A}(v_2) = k v_2$ for some $k$. The first element (the $v_1$ coefficient) of $A(v_2)$, who knows? (We threw that information away by taking $\pi$.) But that doesn’t matter - we’re looking for upper-triangulability rather than diagonalisability, so we’re allowed to have spare elements sitting at the top of the matrix.

And so forth: $A$ is upper-triangular with respect to some basis.
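To see the induction in action, here’s a minimal numpy sketch. The helper name `triangularise` and the basis-extension trick are mine, not from the proof; `np.linalg.eig` supplies the eigenvector the proof grabs at each step.

```python
import numpy as np

def triangularise(A):
    """Return a unitary P with P^H A P upper-triangular.

    Sketch of the lemma's induction: grab an eigenvector (it exists
    because we work over C), rotate it to the first basis vector,
    then recurse on the remaining block, which plays the role of
    pi . A-tilde from the proof.
    """
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    _, vecs = np.linalg.eig(A)
    v1 = vecs[:, 0]                     # the eigenvector v_1
    # Extend v_1 to a basis: drop the standard basis vector where
    # |v_1| is largest (keeping the set independent), orthonormalise.
    j = int(np.argmax(np.abs(v1)))
    others = [np.eye(n, dtype=complex)[:, k] for k in range(n) if k != j]
    Q, _ = np.linalg.qr(np.column_stack([v1] + others))
    B = Q.conj().T @ A @ Q              # first column is (lambda, 0, ..., 0)
    U = triangularise(B[1:, 1:])        # induct on the (n-1)x(n-1) block
    P2 = np.eye(n, dtype=complex)
    P2[1:, 1:] = U
    return Q @ P2

A = np.array([[2.0, 1.0], [-1.0, 2.0]])  # real matrix, complex eigenvalues
P = triangularise(A)
T = P.conj().T @ A @ P
print(np.allclose(np.tril(T, -1), 0))    # True: below the diagonal is zero
```

Because the sketch orthonormalises at each step, $W$ here is the orthogonal complement of $\text{span}(v_1)$ - this specialisation of the proof is exactly the Schur decomposition.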

## Note

Remember that we threw out some information by projecting onto $W$. If it turned out that we didn’t throw out any information - if it turned out that we could always “fill in with zeros” - then we’d find that we’d constructed a basis of eigenvectors, and that the matrix was diagonalisable. (This is how the two ideas are related.)

# Theorem

Recall the statement of the theorem:

Every square matrix satisfies its characteristic polynomial.

Now, this would be absolutely trivial if our matrix $A$ were diagonalisable - just look at it in a basis with respect to which $A$ is diagonal (recalling that a change of basis doesn’t change the characteristic polynomial), and we end up with $n$ simultaneous equations which are conveniently decoupled from each other (by virtue of the fact that $A$ is diagonal).

We can’t assume diagonalisability - but we’ve shown that there is something nearly as good, namely upper-triangulability. Let’s assume (by picking an appropriate basis) that $A$ is upper-triangular. Now, let’s say the characteristic polynomial is $\chi(x) = (x - \lambda_1)(x-\lambda_2) \dots (x-\lambda_n)$. What does $\chi(A)$ do to the basis vectors?

Well, let’s consider the first basis vector, $e_1$. We have that $A(e_1) = \lambda_1 e_1$ because $A$ is upper-triangular with top-left element $\lambda_1$, so we have $(A-\lambda_1 I)(e_1) = 0$. The factors of the characteristic polynomial commute, so we may view it as $(x - \lambda_n)\dots (x-\lambda_1)$; then we see that $\chi(A)(e_1) = 0$.

What about the second basis vector? $A(e_2) = k e_1 + \lambda_2 e_2$; so $(A - \lambda_2 I)(e_2) = k e_1$. We’ve pulled the $2$nd basis vector into an earlier-considered subspace, and happily we can kill it by applying $(A-\lambda_1 I)$. That is, $\chi(A)(e_2) = (A-\lambda_n I)\dots (A-\lambda_1 I)(A-\lambda_2 I)(e_2) = (A-\lambda_n I)\dots (A-\lambda_1 I) (k e_1) = 0$.

Keep going: the final case is the $n$th basis vector, $e_n$. $A-\lambda_n I$ has a zero in the bottom-right entry, and is upper-triangular, so it must take $e_n$ into the subspace spanned by $e_1, \dots, e_{n-1}$. But the same argument shows that $(A-\lambda_1 I)\dots (A-\lambda_{n-1} I)$ kills everything in that subspace. Hence $(A-\lambda_1 I)\dots (A-\lambda_n I)(e_n) = 0$.
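The whole cascade can be traced on a small concrete example (the off-diagonal entries below are arbitrary; only the diagonal matters):

```python
import numpy as np

# Upper-triangular with eigenvalues 1, 2, 3 on the diagonal; the
# entries 5, 7, 9 are arbitrary filler above the diagonal.
A = np.array([[1.0, 5.0, 7.0],
              [0.0, 2.0, 9.0],
              [0.0, 0.0, 3.0]])
l1, l2, l3 = 1.0, 2.0, 3.0
I = np.eye(3)
e1, e2, e3 = I

print((A - l1 * I) @ e1)                                # [0. 0. 0.]
# (A - l2 I) pulls e2 into span(e1); (A - l1 I) then kills it.
print((A - l1 * I) @ (A - l2 * I) @ e2)                 # [0. 0. 0.]
print((A - l1 * I) @ (A - l2 * I) @ (A - l3 * I) @ e3)  # [0. 0. 0.]
```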

Since $\chi(A)$ is zero on a basis, it must be zero on the whole space, and that is what we wanted to prove.
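As a sanity check (not part of the proof), the theorem can be tested numerically on a random complex matrix, building $\chi(A)$ from the numerically computed eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# chi(A) = (A - lambda_1 I)(A - lambda_2 I)...(A - lambda_n I)
chi_A = np.eye(n, dtype=complex)
for lam in np.linalg.eigvals(A):
    chi_A = chi_A @ (A - lam * np.eye(n))

print(np.allclose(chi_A, 0))  # True, up to floating-point error
```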