12 The Spectral Theorem
12.1 The Diagonalization Problem
Not every operator is diagonalizable. Over \mathbb{R}, rotations by angles other than multiples of \pi have no real eigenvalues. Over \mathbb{C}, the Jordan block \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} is not diagonalizable.
We identify a class of operators guaranteed to admit orthonormal eigenbases: self-adjoint operators satisfying T^* = T, where the adjoint T^* is defined by \langle T(v), w \rangle = \langle v, T^*(w) \rangle.
The Spectral Theorem. Every self-adjoint operator on a finite-dimensional inner product space has an orthonormal eigenbasis and real eigenvalues.
Throughout, \mathcal{V} denotes a finite-dimensional inner product space over \mathbb{R} or \mathbb{C}.
In the Linear Maps chapter, we used T^* for the transpose (or dual map) T^* : \mathcal{W}^* \to \mathcal{V}^* defined by T^*(\varphi) = \varphi \circ T. Here, T^* denotes the adjoint defined by \langle T(v), w \rangle = \langle v, T^*(w) \rangle. In finite dimensions with an orthonormal basis, the adjoint is represented by the conjugate transpose A^*, while the algebraic transpose acts on dual spaces. When an inner product is present, the Riesz representation theorem identifies \mathcal{V} with \mathcal{V}^*, and under this identification the two notions coincide. We use T^* for the adjoint throughout this chapter and the remainder of the book.
12.2 The Adjoint Operator
The adjoint generalizes the conjugate transpose of matrices. Given a linear operator T : \mathcal{V} \to \mathcal{V}, we seek an operator T^* : \mathcal{V} \to \mathcal{V} such that moving T from the first argument of the inner product to the second (or vice versa) introduces T^*.
Theorem 12.1 Let \mathcal{V} be a finite-dimensional inner product space and T : \mathcal{V} \to \mathcal{V} a linear operator. There exists a unique linear operator T^* : \mathcal{V} \to \mathcal{V} such that \langle T(v), w \rangle = \langle v, T^*(w) \rangle for all v, w \in \mathcal{V}.
Proof. For fixed w \in \mathcal{V}, the map v \mapsto \langle T(v), w \rangle is a linear functional on \mathcal{V}. By Theorem 10.11, there exists a unique vector u \in \mathcal{V} such that \langle T(v), w \rangle = \langle v, u \rangle for all v. Define T^*(w) = u.
We verify T^* is linear. For w_1, w_2 \in \mathcal{V} and \alpha, \beta \in \mathbb{F}, \begin{align*} \langle v, T^*(\alpha w_1 + \beta w_2) \rangle &= \langle T(v), \alpha w_1 + \beta w_2 \rangle \\ &= \overline{\alpha} \langle T(v), w_1 \rangle + \overline{\beta} \langle T(v), w_2 \rangle \\ &= \overline{\alpha} \langle v, T^*(w_1) \rangle + \overline{\beta} \langle v, T^*(w_2) \rangle \\ &= \langle v, \alpha T^*(w_1) + \beta T^*(w_2) \rangle. \end{align*} Since this holds for all v, we conclude T^*(\alpha w_1 + \beta w_2) = \alpha T^*(w_1) + \beta T^*(w_2).
Uniqueness follows from the defining property: if S also satisfies \langle T(v), w \rangle = \langle v, S(w) \rangle, then \langle v, T^*(w) \rangle = \langle v, S(w) \rangle for all v, w, forcing T^* = S by nondegeneracy of the inner product. \square
Definition 12.1 (Adjoint operator) The linear operator T^* : \mathcal{V} \to \mathcal{V} satisfying \langle T(v), w \rangle = \langle v, T^*(w) \rangle for all v, w is the adjoint of T.
In \mathbb{C}^n with \langle x, y \rangle = y^* x (linear in the first argument and conjugate-linear in the second, consistent with our conventions), if T is represented by the matrix A, then T^* is represented by A^*: \begin{align*} \langle Ax, y \rangle = y^* (A x) = (A^* y)^* x = \langle x, A^* y \rangle. \end{align*} In \mathbb{R}^n, the adjoint of A is A^T.
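For readers following along computationally, the identity \langle Ax, y \rangle = \langle x, A^* y \rangle can be checked numerically. The following minimal NumPy sketch is illustrative only (random matrix and vectors, arbitrary seed); note that np.vdot conjugates its first argument.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random complex matrix and vectors in C^3.
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

A_star = A.conj().T  # the conjugate transpose represents the adjoint

# <Ax, y> == <x, A* y>.  With np.vdot(a, b) = conj(a) . b, the inner
# product <u, v> (conjugate-linear in the second slot) is np.vdot(v, u).
lhs = np.vdot(y, A @ x)       # <Ax, y>
rhs = np.vdot(A_star @ y, x)  # <x, A* y>
assert np.isclose(lhs, rhs)
```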
Examples.
Identity: I^* = I since \langle I(v), w \rangle = \langle v, w \rangle = \langle v, I(w) \rangle.
Zero operator: 0^* = 0 since \langle 0(v), w \rangle = 0 = \langle v, 0(w) \rangle.
Projection onto a subspace: If P is orthogonal projection onto \mathcal{W} (see Section 12.3 below), then P^* = P. We verify this in Section 12.3.
Rotation in \mathbb{R}^2: For rotation by angle \theta, the matrix is R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, so R^* = R^T = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = R^{-1}. Rotations are orthogonal but not self-adjoint (unless \theta = 0 or \pi).
Theorem 12.2 For operators S, T : \mathcal{V} \to \mathcal{V} and scalars \alpha \in \mathbb{F}:
(a) (S + T)^* = S^* + T^*
(b) (\alpha T)^* = \overline{\alpha} T^*
(c) (ST)^* = T^* S^*
(d) (T^*)^* = T
(e) I^* = I
Proof.
(a) For all v, w, \begin{align*} \langle (S+T)(v), w \rangle &= \langle S(v) + T(v), w \rangle \\ &= \langle S(v), w \rangle + \langle T(v), w \rangle \\ &= \langle v, S^*(w) \rangle + \langle v, T^*(w) \rangle \\ &= \langle v, (S^* + T^*)(w) \rangle. \end{align*}
(b) Using conjugate-linearity of the inner product in the second argument, \begin{align*} \langle (\alpha T)(v), w \rangle &= \alpha \langle T(v), w \rangle = \alpha \langle v, T^*(w) \rangle \\ &= \langle v, \overline{\alpha} T^*(w) \rangle. \end{align*}
(c) \begin{align*} \langle (ST)(v), w \rangle &= \langle T(v), S^*(w) \rangle \\ &= \langle v, T^*(S^*(w)) \rangle = \langle v, (T^* S^*)(w) \rangle. \end{align*}
(d) For all v, w, \begin{align*} \langle T^*(v), w \rangle &= \overline{\langle w, T^*(v) \rangle} = \overline{\langle T(w), v \rangle} = \langle v, T(w) \rangle. \end{align*} Thus T satisfies the defining property of the adjoint of T^*, so (T^*)^* = T by uniqueness of the adjoint.
(e) Shown in the Examples above. \square
Property (c) shows the adjoint reverses order in products: (ST)^* = T^* S^*, analogous to (AB)^T = B^T A^T for matrices. Property (b) involves conjugation: (\alpha T)^* = \overline{\alpha} T^*, reflecting that in complex spaces, the inner product is conjugate-linear in the second argument.
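These algebraic rules are easy to confirm numerically. The sketch below (random matrices, arbitrary seed, purely illustrative) checks properties (a) through (d) for the matrix adjoint, i.e. the conjugate transpose.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
alpha = 2.0 - 3.0j

adj = lambda M: M.conj().T  # adjoint = conjugate transpose

assert np.allclose(adj(S + T), adj(S) + adj(T))              # (S+T)* = S* + T*
assert np.allclose(adj(alpha * T), np.conj(alpha) * adj(T))  # (aT)* = conj(a) T*
assert np.allclose(adj(S @ T), adj(T) @ adj(S))              # (ST)* = T* S*
assert np.allclose(adj(adj(T)), T)                           # (T*)* = T
```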
Theorem 12.3 For any operator T : \mathcal{V} \to \mathcal{V}:
(a) \ker(T^*) = (\operatorname{im}(T))^\perp
(b) \operatorname{im}(T^*) = (\ker(T))^\perp
(c) \ker(T) = (\operatorname{im}(T^*))^\perp
(d) \operatorname{rank}(T) = \operatorname{rank}(T^*)
Proof.
(a) \begin{align*} w \in \ker(T^*) &\iff T^*(w) = 0 \\ &\iff \langle v, T^*(w) \rangle = 0 \text{ for all } v \\ &\iff \langle T(v), w \rangle = 0 \text{ for all } v \\ &\iff w \perp \operatorname{im}(T) \\ &\iff w \in (\operatorname{im}(T))^\perp. \end{align*}
(b) and (c): Applying (a) to T^* and using (T^*)^* = T gives \ker(T) = (\operatorname{im}(T^*))^\perp, which is (c). Taking orthogonal complements and using (\mathcal{W}^\perp)^\perp = \mathcal{W} then gives (\ker(T))^\perp = \operatorname{im}(T^*), which is (b).
(d) Let n = \dim \mathcal{V}. By rank-nullity (Theorem 4.5) applied to T^*, together with (a) and the orthogonal decomposition \mathcal{V} = \operatorname{im}(T) \oplus (\operatorname{im}(T))^\perp, \begin{align*} n &= \dim \operatorname{im}(T) + \dim (\operatorname{im}(T))^\perp \\ &= \operatorname{rank}(T) + \dim \ker(T^*) \\ &= \operatorname{rank}(T) + (n - \operatorname{rank}(T^*)). \end{align*} Solving gives \operatorname{rank}(T) = \operatorname{rank}(T^*). \square
These relations show the adjoint interchanges kernel and image orthogonally. Geometrically, T^* “reverses the direction” of T while preserving orthogonality structure.
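The kernel-image relations can be seen concretely on a rank-deficient matrix. The sketch below (an arbitrary rank-2 product, illustrative only) uses the SVD to extract a basis of \ker(A^*) and confirms it is orthogonal to \operatorname{im}(A), along with the rank equality (d).

```python
import numpy as np

rng = np.random.default_rng(2)
# Build a rank-2 complex 4x4 matrix A = B C with B: 4x2 and C: 2x4.
B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
C = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
A = B @ C
A_star = A.conj().T

# rank(T) = rank(T*)
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A_star) == 2

# ker(A*) = (im A)^perp: the trailing left singular vectors span ker(A*).
U, s, Vh = np.linalg.svd(A)
null_Astar = U[:, 2:]  # columns spanning ker(A*)
# Each such vector is orthogonal to every column of A, i.e. to im(A).
assert np.allclose(null_Astar.conj().T @ A, 0)
```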
12.3 Self-Adjoint Operators
An operator equal to its own adjoint enjoys special properties.
Definition 12.2 (Self-adjoint operator) An operator T : \mathcal{V} \to \mathcal{V} is self-adjoint (or Hermitian in the complex case, symmetric in the real case) if T^* = T. Equivalently, \langle T(v), w \rangle = \langle v, T(w) \rangle for all v, w \in \mathcal{V}.
In terms of matrices: T is self-adjoint if [T]_{\mathcal{B}} = [T]_{\mathcal{B}}^* in any orthonormal basis \mathcal{B}. For real matrices, this means A = A^T (symmetry); for complex matrices, A = A^* (conjugate symmetry).
Examples.
Diagonal matrices: Any diagonal matrix D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) with real entries is self-adjoint since D^* = D.
Symmetric real matrices: A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} is self-adjoint over \mathbb{R} since A^T = A.
Hermitian complex matrices: A = \begin{pmatrix} 1 & i \\ -i & 2 \end{pmatrix} is self-adjoint over \mathbb{C} since A^* = \begin{pmatrix} 1 & -i \\ i & 2 \end{pmatrix}^T = \begin{pmatrix} 1 & i \\ -i & 2 \end{pmatrix} = A.
Orthogonal projections: If P is orthogonal projection onto subspace \mathcal{W}, then P^* = P. For v \in \mathcal{V} with v = w + u where w \in \mathcal{W} and u \in \mathcal{W}^\perp, we have P(v) = w. Then \begin{align*} \langle P(v_1), v_2 \rangle &= \langle w_1, w_2 + u_2 \rangle = \langle w_1, w_2 \rangle \\ &= \langle w_1 + u_1, w_2 \rangle = \langle v_1, P(v_2) \rangle, \end{align*} using w_1 \perp u_2 and u_1 \perp w_2.
Non-example: Rotation by \pi/4 in \mathbb{R}^2 is not self-adjoint, since R^T = R^{-1} \neq R; more generally, a rotation is self-adjoint only for \theta = 0 or \pi.
Theorem 12.4 Every eigenvalue of a self-adjoint operator T : \mathcal{V} \to \mathcal{V} is real.
Proof. Let \lambda be an eigenvalue with eigenvector v \neq 0. Then T(v) = \lambda v, so \begin{align*} \langle T(v), v \rangle &= \langle \lambda v, v \rangle = \lambda \|v\|^2. \end{align*} Since T = T^*, \begin{align*} \langle T(v), v \rangle &= \langle v, T(v) \rangle = \overline{\langle T(v), v \rangle} \\ &= \overline{\lambda \|v\|^2} = \overline{\lambda} \|v\|^2. \end{align*} Thus \lambda \|v\|^2 = \overline{\lambda} \|v\|^2. Since v \neq 0, we have \|v\|^2 > 0, so \lambda = \overline{\lambda}, meaning \lambda \in \mathbb{R}. \square
This is the first major consequence of self-adjointness: eigenvalues are guaranteed to be real, even in complex vector spaces. This explains why observables in quantum mechanics (self-adjoint operators) yield real measurement outcomes.
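Theorem 12.4 is easy to observe numerically: even a general-purpose eigenvalue routine, which returns complex numbers, produces (numerically) real eigenvalues when fed a Hermitian matrix. The sketch below symmetrizes a random complex matrix to obtain a Hermitian one; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (M + M.conj().T) / 2          # H is Hermitian: H* = H
assert np.allclose(H, H.conj().T)

# The general eigenvalue routine returns complex numbers, but for a
# Hermitian matrix their imaginary parts vanish (up to roundoff).
eigvals = np.linalg.eigvals(H)
assert np.allclose(eigvals.imag, 0, atol=1e-10)
```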
Theorem 12.5 Eigenvectors of a self-adjoint operator corresponding to distinct eigenvalues are orthogonal.
Proof. Let T(v_1) = \lambda_1 v_1 and T(v_2) = \lambda_2 v_2 with \lambda_1 \neq \lambda_2. Then \begin{align*} \lambda_1 \langle v_1, v_2 \rangle &= \langle \lambda_1 v_1, v_2 \rangle = \langle T(v_1), v_2 \rangle \\ &= \langle v_1, T(v_2) \rangle = \langle v_1, \lambda_2 v_2 \rangle = \overline{\lambda_2} \langle v_1, v_2 \rangle. \end{align*} Since eigenvalues are real by Theorem 12.4, \overline{\lambda_2} = \lambda_2, so (\lambda_1 - \lambda_2) \langle v_1, v_2 \rangle = 0. Since \lambda_1 \neq \lambda_2, we have \langle v_1, v_2 \rangle = 0, so v_1 \perp v_2. \square
This guarantees that eigenspaces corresponding to different eigenvalues are mutually orthogonal—a property not shared by general operators. It is this orthogonality that enables us to construct orthonormal eigenbases.
Theorem 12.6 If \mathcal{W} \subseteq \mathcal{V} is T-invariant for a self-adjoint operator T, then \mathcal{W}^\perp is also T-invariant.
Proof. Let w \in \mathcal{W} and u \in \mathcal{W}^\perp. Since \mathcal{W} is T-invariant, T(w) \in \mathcal{W}. We verify T(u) \in \mathcal{W}^\perp by showing \langle T(u), w \rangle = 0: \langle T(u), w \rangle = \langle u, T(w) \rangle = 0 since u \in \mathcal{W}^\perp and T(w) \in \mathcal{W}. Thus T(u) \perp w for all w \in \mathcal{W}, so T(u) \in \mathcal{W}^\perp. \square
This property is crucial for the inductive proof of the spectral theorem: starting with a one-dimensional eigenspace, its orthogonal complement is invariant, allowing us to restrict to a lower-dimensional subspace and apply induction.
12.4 The Spectral Theorem
We now prove the central result.
Theorem 12.7 (The Spectral Theorem) Let T : \mathcal{V} \to \mathcal{V} be a self-adjoint operator on a finite-dimensional inner product space. Then \mathcal{V} has an orthonormal basis consisting of eigenvectors of T. Equivalently, there exists an orthonormal basis \mathcal{B} in which [T]_{\mathcal{B}} is diagonal with real entries.
Proof. We proceed by induction on n = \dim \mathcal{V}.
Base case: If n = 1, every operator is multiplication by a scalar, which is real by Theorem 12.4; any unit vector forms an orthonormal eigenbasis.
Inductive step: Assume the result holds for all self-adjoint operators on spaces of dimension < n. Let \dim \mathcal{V} = n.
We first show T has a real eigenvalue. If \mathbb{F} = \mathbb{C}, the fundamental theorem of algebra gives a root \lambda_0 of the characteristic polynomial \chi(\lambda) = \det(T - \lambda I), hence an eigenvalue, which is real by Theorem 12.4. Suppose instead \mathbb{F} = \mathbb{R}. In any orthonormal basis \mathcal{B}, [T]_{\mathcal{B}} is a real symmetric matrix; view T as acting on \mathbb{C}^n via this matrix, so that \chi has real coefficients. By the fundamental theorem of algebra, \chi has a root \lambda_0 \in \mathbb{C}, so there exists a nonzero v \in \mathbb{C}^n with Tv = \lambda_0 v. We claim \lambda_0 \in \mathbb{R}.
Compute \langle Tv, v \rangle in two ways. On one hand, \langle Tv, v \rangle = \langle \lambda_0 v, v \rangle = \lambda_0 \|v\|^2. On the other, since T = T^*, \langle Tv, v \rangle = \langle v, Tv \rangle = \overline{\langle Tv, v \rangle} = \overline{\lambda_0} \|v\|^2. Thus \lambda_0 \|v\|^2 = \overline{\lambda_0} \|v\|^2. Since v \neq 0, we have \|v\|^2 > 0, so \lambda_0 = \overline{\lambda_0}, meaning \lambda_0 \in \mathbb{R}.
It remains to produce a real eigenvector. Write v = u + iw with u, w \in \mathbb{R}^n. Taking real and imaginary parts of Tv = \lambda_0 v gives Tu = \lambda_0 u \quad \text{and} \quad Tw = \lambda_0 w. Since v \neq 0, at least one of u, w is nonzero, yielding a real eigenvector of T with eigenvalue \lambda_0 \in \mathbb{R}.
Let \lambda_1 = \lambda_0 be this eigenvalue and v_1 a corresponding real eigenvector. Normalize to \|v_1\| = 1. Let \mathcal{W}_1 = \operatorname{span}(v_1), a one-dimensional T-invariant subspace.
By Theorem 12.6, \mathcal{W}_1^\perp is also T-invariant. Moreover, \dim \mathcal{W}_1^\perp = n - 1 by Corollary 11.1.
Restrict T to \mathcal{W}_1^\perp: define T' = T|_{\mathcal{W}_1^\perp} : \mathcal{W}_1^\perp \to \mathcal{W}_1^\perp. We verify T' is self-adjoint on \mathcal{W}_1^\perp (with the restricted inner product). For u, w \in \mathcal{W}_1^\perp, \langle T'(u), w \rangle = \langle T(u), w \rangle = \langle u, T(w) \rangle = \langle u, T'(w) \rangle. Thus T' is self-adjoint on the (n-1)-dimensional space \mathcal{W}_1^\perp.
By the inductive hypothesis, \mathcal{W}_1^\perp has an orthonormal basis \{v_2, \ldots, v_n\} of eigenvectors of T'. Since T' = T|_{\mathcal{W}_1^\perp}, these are also eigenvectors of T.
The set \{v_1, v_2, \ldots, v_n\} is orthonormal: v_1 \in \mathcal{W}_1 and v_2, \ldots, v_n \in \mathcal{W}_1^\perp are mutually orthogonal, and \{v_2, \ldots, v_n\} is orthonormal by construction. This gives an orthonormal eigenbasis of \mathcal{V}. \square
Corollary 12.1 (Matrix form) A matrix A \in M_n(\mathbb{F}) is self-adjoint (i.e., A = A^*) if and only if there exists a unitary matrix U (orthogonal if \mathbb{F} = \mathbb{R}) such that U^* A U = D where D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) with \lambda_i \in \mathbb{R}.
Proof. If A = A^*, apply Theorem 12.7 to obtain an orthonormal eigenbasis. The change-of-basis matrix U from the standard basis to this eigenbasis is unitary (its columns are orthonormal), and U^* A U = D by the diagonalization formula. Conversely, if U^* A U = D with D real diagonal and U unitary, then A = U D U^*, so A^* = (U D U^*)^* = U D^* U^* = U D U^* = A (using D^* = D for real diagonal). \square
Remark. The spectral theorem is sometimes called the principal axis theorem in geometry, where it states that every symmetric bilinear form (quadratic form) can be diagonalized by rotating to principal axes. In physics, it’s the basis for normal modes in classical mechanics: coupled oscillators decouple into independent modes along eigendirections.
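Corollary 12.1 is exactly what numerical libraries implement. The sketch below (random symmetric matrix, arbitrary seed) uses NumPy's Hermitian eigensolver, which returns real eigenvalues and an orthonormal eigenbasis as the columns of U, and checks U^* A U = D and A = U D U^*.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                  # real symmetric, hence self-adjoint

# eigh is NumPy's routine for Hermitian/symmetric matrices.
lams, U = np.linalg.eigh(A)

assert np.allclose(U.T @ U, np.eye(4))          # U is orthogonal
assert np.allclose(U.T @ A @ U, np.diag(lams))  # U* A U = D
assert np.allclose(A, U @ np.diag(lams) @ U.T)  # A = U D U*
```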
12.5 Spectral Decomposition
The spectral theorem provides more than diagonalization—it yields a canonical decomposition of the operator as a sum of projections.
Let T : \mathcal{V} \to \mathcal{V} be self-adjoint with eigenvalues \lambda_1, \ldots, \lambda_k (distinct) and corresponding eigenspaces E_{\lambda_1}, \ldots, E_{\lambda_k}. By Theorem 12.5, these eigenspaces are pairwise orthogonal. By the spectral theorem, they span \mathcal{V}: \mathcal{V} = E_{\lambda_1} \oplus E_{\lambda_2} \oplus \cdots \oplus E_{\lambda_k}.
Let P_i : \mathcal{V} \to E_{\lambda_i} denote orthogonal projection onto E_{\lambda_i}. By the Orthogonality chapter, every v \in \mathcal{V} decomposes uniquely as v = \sum_{i=1}^k P_i(v) where P_i(v) \in E_{\lambda_i}.
Since T(P_i(v)) = \lambda_i P_i(v) (as P_i(v) is an eigenvector with eigenvalue \lambda_i), we have \begin{align*} T(v) &= T\left(\sum_{i=1}^k P_i(v)\right) = \sum_{i=1}^k T(P_i(v)) \\ &= \sum_{i=1}^k \lambda_i P_i(v) = \left(\sum_{i=1}^k \lambda_i P_i\right)(v). \end{align*}
Theorem 12.8 (Spectral decomposition) Let T : \mathcal{V} \to \mathcal{V} be self-adjoint with distinct eigenvalues \lambda_1, \ldots, \lambda_k and corresponding eigenspaces E_{\lambda_1}, \ldots, E_{\lambda_k}. Let P_i : \mathcal{V} \to E_{\lambda_i} be orthogonal projection onto E_{\lambda_i}. Then T = \sum_{i=1}^k \lambda_i P_i. Moreover:
(a) P_i P_j = \delta_{ij} P_i (mutually orthogonal projections)
(b) \sum_{i=1}^k P_i = I (resolution of identity)
(c) P_i^* = P_i (self-adjoint)
(d) P_i^2 = P_i (idempotent)
Proof. The formula T = \sum \lambda_i P_i was shown above.
(a) For i \neq j, \operatorname{im}(P_i) = E_{\lambda_i} and \operatorname{im}(P_j) = E_{\lambda_j} are orthogonal, so P_i P_j = 0 (projecting onto E_{\lambda_j} and then onto E_{\lambda_i} gives zero since E_{\lambda_j} \perp E_{\lambda_i}). For i = j, P_i^2 = P_i by idempotence of projections.
(b) Since \mathcal{V} = \bigoplus E_{\lambda_i}, decomposing v = \sum P_i(v) gives v = (\sum P_i)(v), so \sum P_i = I.
(c) and (d): These are properties of orthogonal projections established in the Orthogonality chapter. \square
This decomposition is called the spectral decomposition or spectral resolution of T. It expresses T as a weighted sum of orthogonal projections onto eigenspaces, with weights given by eigenvalues. The projections P_i are sometimes called spectral projections.
Matrix form. If \mathcal{B} is an orthonormal eigenbasis organized by eigenspaces (first d_1 vectors span E_{\lambda_1}, next d_2 span E_{\lambda_2}, etc.), then [T]_{\mathcal{B}} = \begin{pmatrix} \lambda_1 I_{d_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{d_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k I_{d_k} \end{pmatrix} where d_i = \dim E_{\lambda_i}.
Applications. The spectral decomposition allows functional calculus: for any function f : \mathbb{R} \to \mathbb{R}, define f(T) = \sum_{i=1}^k f(\lambda_i) P_i. For instance, T^2 = \sum \lambda_i^2 P_i, e^T = \sum e^{\lambda_i} P_i, and T^{-1} = \sum \lambda_i^{-1} P_i (if \lambda_i \neq 0). This extends the notion of applying functions to operators beyond polynomials.
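The spectral decomposition and functional calculus can be exercised directly. In the sketch below (illustrative; a random symmetric matrix shifted by 5I, an arbitrary choice that keeps the eigenvalues away from zero so T^{-1} exists), the P_i are the rank-one projections u_i u_i^T onto the eigenvector lines, and f(A) = \sum f(\lambda_i) P_i is checked for f(t) = t^2 and f(t) = t^{-1}.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
# Shift by 5I (illustrative) so all eigenvalues are nonzero and distinct.
A = (M + M.T) / 2 + 5 * np.eye(4)
lams, U = np.linalg.eigh(A)

# Rank-one spectral projections P_i = u_i u_i^T.
P = [np.outer(U[:, i], U[:, i]) for i in range(4)]

assert np.allclose(sum(P), np.eye(4))                         # resolution of identity
assert np.allclose(sum(l * Pi for l, Pi in zip(lams, P)), A)  # A = sum lam_i P_i
assert np.allclose(P[0] @ P[1], 0)                            # P_i P_j = 0 for i != j
assert np.allclose(P[2] @ P[2], P[2])                         # idempotent

# Functional calculus: f(A) = sum f(lam_i) P_i.
A_sq = sum(l**2 * Pi for l, Pi in zip(lams, P))
assert np.allclose(A_sq, A @ A)                               # f(t) = t^2

A_inv = sum(Pi / l for l, Pi in zip(lams, P))
assert np.allclose(A @ A_inv, np.eye(4))                      # f(t) = 1/t
```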
12.6 Quadratic Forms
A quadratic form on \mathcal{V} is a function Q : \mathcal{V} \to \mathbb{R} of the form Q(v) = \langle T(v), v \rangle where T : \mathcal{V} \to \mathcal{V} is a self-adjoint operator. In coordinates, if v = (x_1, \ldots, x_n) and A = [T]_{\mathcal{B}}, then Q(x) = x^T A x = \sum_{i,j} a_{ij} x_i x_j.
Quadratic forms arise in optimization (Hessian matrices at critical points), physics (kinetic and potential energy), geometry (curvature), and statistics (variance of random vectors).
Theorem 12.9 (Principal Axes Theorem) Let Q(v) = \langle T(v), v \rangle be a quadratic form with T self-adjoint. There exists an orthonormal basis \mathcal{B} = \{e_1, \ldots, e_n\} of eigenvectors of T such that Q(v) = \sum_{i=1}^n \lambda_i c_i^2 where v = \sum c_i e_i and \lambda_i are the eigenvalues of T.
Proof. By the spectral theorem, choose an orthonormal eigenbasis \{e_1, \ldots, e_n\} with T(e_i) = \lambda_i e_i. For v = \sum c_i e_i, \begin{align*} Q(v) &= \langle T(v), v \rangle = \left\langle T\left(\sum c_i e_i\right), \sum c_j e_j \right\rangle \\ &= \left\langle \sum \lambda_i c_i e_i, \sum c_j e_j \right\rangle \\ &= \sum_{i,j} \lambda_i c_i \overline{c_j} \langle e_i, e_j \rangle = \sum_i \lambda_i |c_i|^2. \quad \square \end{align*}
In the eigenbasis, the quadratic form has no cross terms—it is a weighted sum of squares. The eigenvectors e_i are the principal axes, and the eigenvalues \lambda_i are the principal coefficients.
Classification by definiteness. The eigenvalues determine the behavior of Q:
- Positive definite: Q(v) > 0 for all v \neq 0 \iff all \lambda_i > 0
- Positive semidefinite: Q(v) \geq 0 for all v \iff all \lambda_i \geq 0
- Negative definite: Q(v) < 0 for all v \neq 0 \iff all \lambda_i < 0
- Negative semidefinite: Q(v) \leq 0 for all v \iff all \lambda_i \leq 0
- Indefinite: Q takes both positive and negative values \iff some \lambda_i > 0 and some \lambda_j < 0
Example (Conic sections). Consider Q(x, y) = 2xy in \mathbb{R}^2. The matrix is A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} with eigenvalues \lambda_1 = 1, \lambda_2 = -1 and eigenvectors (1,1)/\sqrt{2}, (1,-1)/\sqrt{2}. The quadratic form is indefinite, and each level set Q(x,y) = c \neq 0 is a hyperbola. Rotating by 45° to the eigenvector axes yields Q = u^2 - v^2 in the new coordinates, the standard hyperbola form.
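The principal axes computation can be checked on the cross-term form Q(x, y) = 2xy, whose matrix is \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. The sketch below (test point arbitrary) confirms that Q evaluated directly equals the diagonal form \sum \lambda_i c_i^2 in eigenvector coordinates.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # Q(x, y) = 2xy
lams, U = np.linalg.eigh(A)      # ascending: -1 then +1
assert np.allclose(lams, [-1.0, 1.0])

# In eigenvector coordinates c = U^T v, the cross term vanishes.
v = np.array([0.7, -1.3])        # arbitrary test point
Q_direct = v @ A @ v             # 2 * x * y
c = U.T @ v                      # coordinates along the principal axes
Q_diag = lams[0] * c[0]**2 + lams[1] * c[1]**2
assert np.isclose(Q_direct, Q_diag)
```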
12.7 Normal Operators
The spectral theorem generalizes beyond self-adjoint operators.
Definition 12.3 (Normal operator) An operator T : \mathcal{V} \to \mathcal{V} is normal if T^* T = T T^*. Equivalently, T commutes with its adjoint.
Self-adjoint operators are normal (since T = T^* implies T^* T = T T^* = T^2). Orthogonal and unitary operators are normal (since T^* = T^{-1} gives T^* T = I = T T^*). But not all normal operators are self-adjoint or orthogonal.
Example. Rotation in \mathbb{R}^2 by angle \theta \neq 0, \pi is normal but not self-adjoint. Over \mathbb{C}, it diagonalizes to \operatorname{diag}(e^{i\theta}, e^{-i\theta})—complex eigenvalues on the unit circle.
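The rotation example can be verified numerically: the sketch below (angle 0.6 chosen arbitrarily) checks that R commutes with its adjoint, is not symmetric, and has eigenvalues e^{\pm i\theta} on the unit circle.

```python
import numpy as np

theta = 0.6                       # arbitrary angle, not 0 or pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# R is normal (R^T R = R R^T = I) but not self-adjoint.
assert np.allclose(R.T @ R, R @ R.T)
assert not np.allclose(R, R.T)

# Over C its eigenvalues are e^{+-i theta}, on the unit circle.
lams = np.linalg.eigvals(R)
assert np.allclose(np.abs(lams), 1.0)
assert np.allclose(sorted(lams, key=lambda z: z.imag),
                   [np.exp(-1j * theta), np.exp(1j * theta)])
```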
Theorem 12.10 (Spectral theorem for normal operators) Let T : \mathcal{V} \to \mathcal{V} be a normal operator on a finite-dimensional complex inner product space. Then \mathcal{V} has an orthonormal basis of eigenvectors of T. Equivalently, if A is the matrix of T in any orthonormal basis, there exists a unitary matrix U such that U^* A U is diagonal.
Proof. (Sketch) The proof mirrors that for self-adjoint operators. The key steps:
1. Show T has an eigenvalue \lambda \in \mathbb{C} (by the fundamental theorem of algebra).
2. Show that if \mathcal{W} is T-invariant, then \mathcal{W}^\perp is T^*-invariant; for normal T, this implies \mathcal{W}^\perp is also T-invariant.
3. Apply induction on dimension as before. \square
For real normal operators, complexification may be required to obtain all eigenvalues, leading to pairs of complex conjugate eigenvalues with two-dimensional invariant real subspaces (as in rotations).
12.8 Computing Eigenvalues and Diagonalization
Algorithmic procedure for diagonalizing a self-adjoint operator T:
1. Find eigenvalues: Solve \det(T - \lambda I) = 0 for \lambda. All roots are real.
2. Find eigenvectors: For each eigenvalue \lambda_i, solve (T - \lambda_i I)v = 0 to obtain a basis of E_{\lambda_i}.
3. Orthonormalize eigenbases: Apply Gram-Schmidt within each eigenspace if necessary (though often a basis is already orthogonal).
4. Assemble orthonormal eigenbasis: Concatenate the orthonormal bases from all eigenspaces.
5. Form diagonalizing matrix: U = [e_1 \mid \cdots \mid e_n] where e_i are the orthonormal eigenvectors. Then U^* A U = D where D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).
Example. Diagonalize A = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}.
Step 1: Characteristic polynomial: \begin{align*} \det(A - \lambda I) &= \det\begin{pmatrix} 4-\lambda & 2 \\ 2 & 1-\lambda \end{pmatrix} \\ &= (4-\lambda)(1-\lambda) - 4 = \lambda^2 - 5\lambda = \lambda(\lambda - 5). \end{align*} Eigenvalues: \lambda_1 = 0, \lambda_2 = 5.
Step 2: Eigenvectors.
For \lambda_1 = 0: (A - 0I)v = Av = 0 gives \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0, so 4x + 2y = 0, yielding v_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}.
For \lambda_2 = 5: (A - 5I)v = 0 gives \begin{pmatrix} -1 & 2 \\ 2 & -4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0, so -x + 2y = 0, yielding v_2 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
Step 3: Verify orthogonality: \langle v_1, v_2 \rangle = 1 \cdot 2 + (-2) \cdot 1 = 0.
Normalize: \|v_1\| = \sqrt{1 + 4} = \sqrt{5}, \|v_2\| = \sqrt{4 + 1} = \sqrt{5}. \begin{align*} e_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -2 \end{pmatrix}, \quad e_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix}. \end{align*}
Step 4: Form U = [e_1 \mid e_2] = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}.
Step 5: Verify U^T A U = D: D = \begin{pmatrix} 0 & 0 \\ 0 & 5 \end{pmatrix}.
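The worked example can be confirmed in a few lines; the sketch below checks the eigenvalues 0 and 5 and the orthogonal diagonalization (NumPy may return the eigenvectors with opposite signs, which does not affect the result).

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 1.0]])
lams, U = np.linalg.eigh(A)      # ascending order: 0 then 5
assert np.allclose(lams, [0.0, 5.0])

# The columns of U match e1 = (1,-2)/sqrt(5) and e2 = (2,1)/sqrt(5)
# up to sign; in any case U is orthogonal and diagonalizes A.
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(U.T @ A @ U, np.diag([0.0, 5.0]))
```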
Geometric interpretation. The eigenvectors define new orthogonal axes. In these coordinates, the quadratic form Q(x, y) = x^T A x = 4x^2 + 4xy + y^2 becomes Q = 0 \cdot u^2 + 5 v^2 = 5v^2; its graph z = 5v^2 is a parabolic cylinder whose rulings run along the u-axis, and each level set Q = c > 0 is a pair of parallel lines.
12.9 Applications and Further Directions
1. Positive definite matrices and matrix square roots. If A is positive definite (A = A^* with all \lambda_i > 0), the spectral decomposition A = U D U^* with D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) yields a self-adjoint positive definite square root \begin{align*} A^{1/2} = U D^{1/2} U^*, \qquad A = A^{1/2} A^{1/2} = (A^{1/2})(A^{1/2})^*. \end{align*} Positive definite matrices also admit the Cholesky decomposition A = L L^*, with L lower triangular, computed directly by Gaussian elimination and central to numerical algorithms for positive definite systems. Note that the spectral factor A^{1/2} is self-adjoint rather than triangular, so the two factorizations are distinct.
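The two factorizations can be compared side by side. The sketch below (a random positive definite matrix, illustrative only) builds the symmetric square root from the spectral decomposition and the triangular factor from Cholesky; both reproduce A.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3))
A = B @ B.T + 3 * np.eye(3)       # positive definite by construction

lams, U = np.linalg.eigh(A)
assert np.all(lams > 0)

# Spectral square root: symmetric S with S S^T = A (not triangular).
sqrtA = U @ np.diag(np.sqrt(lams)) @ U.T
assert np.allclose(sqrtA @ sqrtA, A)
assert np.allclose(sqrtA, sqrtA.T)

# Cholesky factor: lower triangular L with L L^T = A.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)
assert np.allclose(L, np.tril(L))
```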
2. Inertia and Sylvester's law. The inertia of a self-adjoint operator is the triple (n_+, n_-, n_0), where n_+ is the number of positive eigenvalues, n_- the number of negative eigenvalues, and n_0 the number of zero eigenvalues. Sylvester's law of inertia states that congruence transformations A \mapsto S^* A S (for invertible S) preserve inertia, making the inertia a complete invariant of quadratic forms under change of basis.
3. Rayleigh quotient and variational characterization. The eigenvalues of T can be characterized variationally: \lambda_{\max} = \max_{\|v\| = 1} \langle T(v), v \rangle, \quad \lambda_{\min} = \min_{\|v\| = 1} \langle T(v), v \rangle. The Rayleigh quotient R(v) = \langle T(v), v \rangle / \|v\|^2 attains its maximum and minimum at eigenvectors corresponding to \lambda_{\max} and \lambda_{\min}.
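The variational characterization is easy to probe empirically: random directions never exceed the extreme eigenvalues, and the extremes are attained at the corresponding eigenvectors. The sketch below is illustrative (random symmetric matrix, arbitrary seed and sample count).

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
lams, U = np.linalg.eigh(A)       # ascending
lam_min, lam_max = lams[0], lams[-1]

def rayleigh(v):
    """Rayleigh quotient R(v) = <Av, v> / ||v||^2."""
    return (v @ A @ v) / (v @ v)

# Random directions stay within [lam_min, lam_max]...
for _ in range(1000):
    v = rng.standard_normal(5)
    r = rayleigh(v)
    assert lam_min - 1e-12 <= r <= lam_max + 1e-12

# ...and the extremes are attained at the extreme eigenvectors.
assert np.isclose(rayleigh(U[:, 0]), lam_min)
assert np.isclose(rayleigh(U[:, -1]), lam_max)
```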
4. Simultaneous diagonalization. If S, T are self-adjoint and commute (ST = TS), they share a common orthonormal eigenbasis—they are simultaneously diagonalizable. This underlies the theory of commuting observables in quantum mechanics.
5. Singular value decomposition (SVD). For any matrix A \in M_{m \times n}(\mathbb{F}) (not necessarily square or self-adjoint), the matrices A^* A and A A^* are self-adjoint and positive semidefinite. Diagonalizing them yields the singular value decomposition A = U \Sigma V^*, where U, V are orthogonal/unitary and \Sigma is diagonal with nonnegative entries (singular values). This is developed further in Chapter 14.
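The link between the SVD and the spectral theorem can be seen numerically: the singular values of A are the square roots of the eigenvalues of the self-adjoint matrix A^* A. The sketch below (random rectangular matrix, arbitrary seed) checks this and the reconstruction A = U \Sigma V^*.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 3))   # rectangular, not self-adjoint

# Singular values of A = square roots of the eigenvalues of A^T A.
sing = np.linalg.svd(A, compute_uv=False)   # descending order
eig = np.linalg.eigvalsh(A.T @ A)           # ascending, all >= 0
assert np.allclose(np.sort(sing**2), eig)

# Full SVD reconstructs A with orthogonal factors and nonnegative Sigma.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vh, A)
```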
12.10 Closing Remarks
The spectral theorem is the culmination of the theory of self-adjoint operators: every such operator decomposes into independent scalar actions along orthogonal directions. This reduces complex linear transformations to their simplest form, revealing intrinsic geometric structure independent of coordinate choices.
Self-adjoint operators enjoy remarkable properties not shared by general operators: real eigenvalues, orthogonal eigenspaces, and guaranteed diagonalizability. These properties are not accidental—they reflect deep connections between linear algebra and geometry, mediated by the inner product.
The spectral decomposition T = \sum \lambda_i P_i expresses the operator as a weighted combination of orthogonal projections, each projecting onto an eigenspace. This decomposition enables functional calculus, allowing us to apply arbitrary functions to operators via f(T) = \sum f(\lambda_i) P_i.
Applications span mathematics, physics, engineering, and data science. In quantum mechanics, the spectral theorem justifies the probabilistic interpretation: measurements correspond to self-adjoint operators, outcomes to eigenvalues, and probabilities to projections onto eigenspaces. In statistics, principal component analysis diagonalizes covariance matrices, extracting independent modes of variation. In differential equations, the spectral theorem yields normal modes and separation of variables for symmetric operators.
The extension to normal operators on complex spaces broadens the theory to include rotations (unitary operators) and shear transformations. The spectral theorem for normal operators asserts that commutativity with the adjoint—T^* T = T T^*—suffices for orthogonal diagonalizability over \mathbb{C}.
Beyond finite dimensions, the theory generalizes to compact self-adjoint operators on infinite-dimensional Hilbert spaces, yielding spectral decompositions with convergent sums or integrals (the spectral theorem for unbounded operators). This infinite-dimensional extension underpins functional analysis, quantum mechanics, and partial differential equations.
The SVD chapter extends the spectral theorem to arbitrary matrices via the singular value decomposition, the most important factorization in applied linear algebra.