9 Invariant Subspaces and Eigenvalues
9.1 The Structure Problem for Operators
A linear map T : \mathcal{V} \to \mathcal{W} between distinct spaces transforms geometric objects from one ambient context to another. When domain and codomain coincide—when T : \mathcal{V} \to \mathcal{V} is an operator—the transformation acts on the space itself. The central question becomes: what is the simplest coordinate system in which T can be expressed?
Consider a projection P : \mathbb{R}^3 \to \mathbb{R}^3 onto a plane \mathcal{W}_1 through the origin. The plane is mapped into itself; its orthogonal complement \mathcal{W}_2 = \mathcal{W}_1^\perp is collapsed to zero. In coordinates adapted to the decomposition \mathbb{R}^3 = \mathcal{W}_1 \oplus \mathcal{W}_2, the matrix is block-diagonal: P acts as the identity on \mathcal{W}_1 and as zero on \mathcal{W}_2. The geometric decomposition induces an algebraic simplification.
More generally, if a subspace \mathcal{W} \subseteq \mathcal{V} is invariant under T—if T(\mathcal{W}) \subseteq \mathcal{W}—then T restricts to an operator on \mathcal{W}, and choosing bases adapted to invariant subspaces yields block-structured matrices. When \mathcal{V} decomposes as a direct sum of invariant subspaces, T decomposes into independent operators on each summand.
The ultimate simplification occurs when \mathcal{V} admits a decomposition into one-dimensional invariant subspaces. On a line \operatorname{span}(v), any operator acts as scalar multiplication. If \mathcal{V} decomposes into such lines, T becomes diagonal—the simplest possible form.
This chapter develops the theory systematically. We begin with direct sums, characterize invariant subspaces and their interaction with matrix representations, identify one-dimensional invariant subspaces with eigenvectors, and determine when operators admit diagonalization. The characteristic polynomial emerges as the computational tool detecting eigenvalues, and the spectral structure—the collection of eigenvalues and eigenspaces—determines the extent to which T can be simplified.
9.2 Direct Sum Decompositions
We established in Chapter 2 that vector spaces decompose into independent summands. The construction generalizes naturally beyond pairs.
Definition 9.1 (Direct sum) Subspaces \mathcal{W}_1, \ldots, \mathcal{W}_k \subseteq \mathcal{V} form a direct sum if every v \in \mathcal{V} admits a unique decomposition v = w_1 + \cdots + w_k with w_i \in \mathcal{W}_i. We write \mathcal{V} = \mathcal{W}_1 \oplus \cdots \oplus \mathcal{W}_k = \bigoplus_{i=1}^k \mathcal{W}_i.
Theorem 9.1 The following are equivalent:
(i) \mathcal{V} = \bigoplus_{i=1}^k \mathcal{W}_i
(ii) \mathcal{V} = \sum_{i=1}^k \mathcal{W}_i and the only representation of 0 as \sum w_i with w_i \in \mathcal{W}_i has all w_i = 0
(iii) \mathcal{V} = \sum_{i=1}^k \mathcal{W}_i and for each j, \mathcal{W}_j \cap \left(\sum_{i \neq j} \mathcal{W}_i\right) = \{0\}
Proof. (i) \implies (ii) is immediate, since the trivial representation of 0 is then the only one. For (ii) \implies (i): if v = \sum w_i = \sum w_i' are two decompositions, then \sum_i (w_i - w_i') = 0, so w_i = w_i' for all i by (ii). For (ii) \implies (iii): if w_j \in \mathcal{W}_j \cap (\sum_{i \neq j} \mathcal{W}_i), write w_j = \sum_{i \neq j} w_i, giving w_j + \sum_{i \neq j}(-w_i) = 0. By (ii), w_j = 0. For (iii) \implies (ii): a nontrivial representation \sum w_i = 0 would place some nonzero w_j in \mathcal{W}_j \cap \sum_{i \neq j} \mathcal{W}_i, contradicting (iii). \square
Corollary 9.1 If \mathcal{V} = \bigoplus_{i=1}^k \mathcal{W}_i and \dim \mathcal{V} < \infty, then \dim \mathcal{V} = \sum_{i=1}^k \dim \mathcal{W}_i.
Proof. Bases of the summands concatenate to a basis of \mathcal{V}, as independence of the union follows from Theorem 9.1 (ii) and spanning is immediate. \square
The direct sum provides coordinates adapted to the decomposition: every vector v is uniquely determined by its components (w_1, \ldots, w_k) with w_i \in \mathcal{W}_i. Understanding \mathcal{V} reduces to understanding each \mathcal{W}_i independently.
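As a concrete illustration, here is a minimal numpy sketch (with a hypothetical choice of summands \mathcal{W}_1, \mathcal{W}_2 \subseteq \mathbb{R}^3) that recovers the unique components of a vector by solving for its coordinates in a basis adapted to the decomposition.

```python
import numpy as np

# Hypothetical summands: R^3 = W1 ⊕ W2 with
#   W1 = span{(1,0,1), (0,1,0)}  (a plane)
#   W2 = span{(1,1,-1)}          (a line not contained in W1)
w1_basis = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]).T   # columns span W1
w2_basis = np.array([[1.0, 1.0, -1.0]]).T                   # column spans W2

# Concatenating bases of the summands gives a basis of R^3 (Corollary 9.1),
# so every v has unique coordinates in that basis.
B = np.hstack([w1_basis, w2_basis])
v = np.array([2.0, 3.0, 5.0])
c = np.linalg.solve(B, v)

w1_component = w1_basis @ c[:2]   # piece of v lying in W1
w2_component = w2_basis @ c[2:]   # piece of v lying in W2
assert np.allclose(w1_component + w2_component, v)
print(w1_component, w2_component)
```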
From Chapter 3, the kernel-image decomposition exemplifies this structure: for T : \mathcal{V} \to \mathcal{W} with \mathcal{V} finite-dimensional, there exists \mathcal{U} \subseteq \mathcal{V} with \mathcal{V} = \ker(T) \oplus \mathcal{U} and T|_{\mathcal{U}} : \mathcal{U} \to \operatorname{im}(T) an isomorphism. The decomposition separates what T annihilates from what it preserves.
9.3 Invariant Subspaces
Let T : \mathcal{V} \to \mathcal{V} be a linear operator.
Definition 9.2 (Invariant subspace) A subspace \mathcal{W} \subseteq \mathcal{V} is T-invariant if T(\mathcal{W}) \subseteq \mathcal{W}.
Equivalently, \mathcal{W} is T-invariant if T restricts to an operator T|_{\mathcal{W}} : \mathcal{W} \to \mathcal{W}. The subspace forms a closed subsystem under iteration of T.
The trivial examples \{0\} and \mathcal{V} are always T-invariant. Nontrivial invariant subspaces reveal finer structure. Both \ker(T) and \operatorname{im}(T) are T-invariant: if T(v) = 0 then T(T(v)) = 0, and if w = T(v) then T(w) = T^2(v) \in \operatorname{im}(T).
Theorem 9.2 Let \mathcal{W} \subseteq \mathcal{V} be T-invariant with \dim \mathcal{W} = k. Choose a basis \mathcal{B} of \mathcal{V} whose first k vectors span \mathcal{W}. Then [T]_{\mathcal{B}} = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} where A \in M_{k \times k}(\mathbb{F}) represents T|_{\mathcal{W}}.
Proof. Let \mathcal{B} = \{w_1, \ldots, w_k, v_{k+1}, \ldots, v_n\} where \{w_1, \ldots, w_k\} is a basis of \mathcal{W}. For j \leq k, T(w_j) \in \mathcal{W} = \operatorname{span}(w_1, \ldots, w_k), so the j-th column of [T]_{\mathcal{B}} has zeros in positions k+1 through n. This yields the block structure, with A recording the action of T on \mathcal{W}. \square
The block-triangular form reflects that T cannot map \mathcal{W} outside itself. By Theorem 8.9 from Chapter 7, \det(T) = \det(A) \det(D)—the determinant factors according to the invariant subspace decomposition.
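The block-triangular form and the determinant factorization can be checked numerically. The sketch below (hypothetical data, assuming numpy) builds an operator on \mathbb{R}^3 for which a chosen plane is invariant by construction, then conjugates by the adapted basis.

```python
import numpy as np

# Basis u1, u2, u3 of R^3; W = span{u1, u2} will be invariant by construction.
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0])
u3 = np.array([1.0, 0.0, 1.0])
P = np.column_stack([u1, u2, u3])      # change-of-basis matrix, columns = new basis

# Define T by its action on the basis: T(u1), T(u2) stay inside W,
# while T(u3) may have components everywhere.
T_u1 = 2*u1 + 1*u2
T_u2 = 0*u1 + 3*u2
T_u3 = 1*u1 - 1*u2 + 4*u3
A = np.column_stack([T_u1, T_u2, T_u3]) @ np.linalg.inv(P)   # matrix in standard basis

# In the adapted basis, [T] is block upper triangular (Theorem 9.2) ...
M = np.linalg.inv(P) @ A @ P
print(np.round(M, 10))                 # lower-left 1x2 block is zero

# ... and the determinant factors along the blocks: det(T) = det(A_block) det(D_block).
A_block, D_block = M[:2, :2], M[2:, 2:]
assert np.isclose(np.linalg.det(A), np.linalg.det(A_block) * np.linalg.det(D_block))
```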
When both \mathcal{W} and a complement are invariant, the structure simplifies further.
Definition 9.3 (Reducing pair) Subspaces \mathcal{W}_1, \mathcal{W}_2 \subseteq \mathcal{V} form a reducing pair for T if \mathcal{V} = \mathcal{W}_1 \oplus \mathcal{W}_2 and both are T-invariant.
Theorem 9.3 If \mathcal{V} = \mathcal{W}_1 \oplus \mathcal{W}_2 with both \mathcal{W}_i T-invariant, then in a basis adapted to the decomposition, [T] = \begin{pmatrix} A & 0 \\ 0 & D \end{pmatrix} where A and D represent T|_{\mathcal{W}_1} and T|_{\mathcal{W}_2} respectively.
Proof. Apply Theorem 9.2 to \mathcal{W}_1, obtaining the upper block-triangular structure. Invariance of \mathcal{W}_2 forces the upper-right block B to vanish: for basis vectors in \mathcal{W}_2, T maps into \mathcal{W}_2, yielding no component in \mathcal{W}_1. \square
The operator decomposes: T acts on \mathcal{W}_1 and \mathcal{W}_2 independently. Powers and polynomials respect this decomposition: if [T] = \operatorname{diag}(A, D), then [T^k] = \operatorname{diag}(A^k, D^k) and [p(T)] = \operatorname{diag}(p(A), p(D)) for any polynomial p.
More generally, if \mathcal{V} = \bigoplus_{i=1}^k \mathcal{W}_i with all \mathcal{W}_i invariant, then [T] is block-diagonal with blocks A_1, \ldots, A_k representing T|_{\mathcal{W}_i}. Understanding T reduces to understanding each restriction T|_{\mathcal{W}_i} independently.
While block-diagonal structure simplifies analysis, the ultimate reduction occurs when each block is 1 \times 1—that is, when every invariant summand is one-dimensional. On a line \operatorname{span}(v), any linear operator acts by stretching or shrinking: there is only one degree of freedom, so T(v) must be a scalar multiple of v. If we can decompose \mathcal{V} into one-dimensional invariant subspaces, choosing a basis vector from each yields a coordinate system where T is diagonal—each coordinate is scaled independently.
This motivates our focus on one-dimensional invariant subspaces. Their existence and multiplicity determine whether an operator can be diagonalized.
9.4 One-Dimensional Invariant Subspaces
The simplest nontrivial invariant subspaces are lines through the origin.
Theorem 9.4 A one-dimensional subspace \operatorname{span}(v) with v \neq 0 is T-invariant if and only if T(v) = \lambda v for some \lambda \in \mathbb{F}.
Proof. If \operatorname{span}(v) is invariant, then T(v) \in \operatorname{span}(v), so T(v) = \lambda v for some \lambda. Conversely, if T(v) = \lambda v, then for any cv \in \operatorname{span}(v), T(cv) = cT(v) = c\lambda v = \lambda(cv) \in \operatorname{span}(v). \square
On a one-dimensional invariant subspace, any operator acts as scalar multiplication. This is the simplest possible action.
Definition 9.4 (Eigenvector and eigenvalue) A nonzero vector v \in \mathcal{V} is an eigenvector of T with eigenvalue \lambda \in \mathbb{F} if T(v) = \lambda v.
The set of all eigenvectors with eigenvalue \lambda, together with the zero vector, forms the eigenspace E_\lambda = \ker(T - \lambda I).
By Theorem 4.4 from Chapter 3, E_\lambda is a subspace. The eigenspace is precisely the set of vectors that T maps into scalar multiples of themselves with scaling factor \lambda. Note that \lambda is an eigenvalue if and only if E_\lambda \neq \{0\}, equivalently if T - \lambda I is not injective.
Theorem 9.5 Each eigenspace E_\lambda is T-invariant.
Proof. If v \in E_\lambda, then T(v) = \lambda v \in E_\lambda since T(\lambda v) = \lambda T(v) = \lambda^2 v = \lambda(\lambda v). \square
In fact, T acts on E_\lambda as the scalar operator \lambda I: the restriction T|_{E_\lambda} is multiplication by \lambda.
Theorem 9.6 Let v_1, \ldots, v_k be eigenvectors with distinct eigenvalues \lambda_1, \ldots, \lambda_k. Then \{v_1, \ldots, v_k\} is linearly independent.
Proof. By induction on k. The case k=1 is immediate as eigenvectors are nonzero. Suppose \sum_{i=1}^k c_i v_i = 0. Applying T yields \sum_{i=1}^k c_i \lambda_i v_i = 0. Multiply the original equation by \lambda_1 and subtract: \sum_{i=2}^k c_i(\lambda_i - \lambda_1)v_i = 0. By induction, c_i(\lambda_i - \lambda_1) = 0 for i \geq 2. Since \lambda_i \neq \lambda_1, we have c_i = 0. Substituting into the original equation gives c_1 v_1 = 0, hence c_1 = 0. \square
Corollary 9.2 If \lambda_1, \ldots, \lambda_k are distinct eigenvalues, then E_{\lambda_1} + \cdots + E_{\lambda_k} = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}.
Proof. By Theorem 9.1, it suffices to show the only representation of 0 is trivial. Suppose \sum v_i = 0 with v_i \in E_{\lambda_i} and not all v_i zero. Discarding the zero terms leaves a nontrivial linear dependence among nonzero eigenvectors with distinct eigenvalues, contradicting Theorem 9.6. Hence v_i = 0 for all i. \square
Eigenspaces corresponding to different eigenvalues are maximally independent: they intersect only at the origin. If \mathcal{V} = \bigoplus_{i=1}^k E_{\lambda_i}, then in a basis of eigenvectors, T is diagonal with eigenvalues along the diagonal (repeated according to the dimension of each eigenspace).
9.4.1 Example: Finding Eigenvectors
Consider T : \mathbb{R}^3 \to \mathbb{R}^3 with matrix A = \begin{pmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{pmatrix} relative to the standard basis. We seek one-dimensional invariant subspaces—equivalently, vectors v such that Av = \lambda v for some scalar \lambda.
The condition Av = \lambda v is equivalent to (A - \lambda I)v = 0. This system has nonzero solutions precisely when A - \lambda I is not invertible. The values of \lambda for which this occurs are the eigenvalues, and the corresponding nonzero solutions are eigenvectors.
For instance, take \lambda = 3: A - 3I = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \\ 1 & 0 & 1 \end{pmatrix}. The system (A - 3I)v = 0 row-reduces to v_1 + v_3 = 0 with v_2 free. Setting v_2 = 1 and v_3 = t yields eigenvectors \begin{pmatrix} -t \\ 1 \\ t \end{pmatrix}. The eigenspace E_3 = \operatorname{span}\left\{\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right\} has dimension 2.
With \lambda = 5: A - 5I = \begin{pmatrix} -1 & 0 & 1 \\ 2 & -2 & 2 \\ 1 & 0 & -1 \end{pmatrix}. Row reduction gives v_1 = v_3 and v_2 = 2v_3. The eigenspace E_5 = \operatorname{span}\left\{\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}\right\} has dimension 1.
Since \dim E_3 + \dim E_5 = 2 + 1 = 3 = \dim \mathbb{R}^3, the operator is diagonalizable. In the basis \mathcal{B} = \left\{\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}\right\}, the matrix of T is \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}. The invariant subspace decomposition \mathbb{R}^3 = E_3 \oplus E_5 renders T transparent: it stretches by factor 3 along the plane E_3 and by factor 5 along the line E_5.
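A quick numerical confirmation of this example (a minimal sketch assuming numpy): stacking the eigenvectors found above as the columns of a change-of-basis matrix P, conjugation should produce \operatorname{diag}(3, 3, 5).

```python
import numpy as np

A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 2.0],
              [1.0, 0.0, 4.0]])
# Columns: the two eigenvectors spanning E_3 and the eigenvector spanning E_5.
P = np.array([[-1.0, 0.0, 1.0],
              [ 0.0, 1.0, 2.0],
              [ 1.0, 0.0, 1.0]])

print(np.round(np.linalg.inv(P) @ A @ P, 10))   # diag(3, 3, 5)
assert np.allclose(A @ P[:, 2], 5 * P[:, 2])    # (1, 2, 1) is a λ = 5 eigenvector
```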
But how did we know to test \lambda = 3 and \lambda = 5? The characteristic polynomial provides a systematic method.
9.5 The Characteristic Polynomial
To determine eigenvalues systematically, observe that \lambda is an eigenvalue if and only if E_\lambda = \ker(T - \lambda I) \neq \{0\}, equivalently if T - \lambda I is not invertible. For finite-dimensional spaces, non-invertibility is detected by the determinant vanishing.
Definition 9.5 (Characteristic polynomial) For T : \mathcal{V} \to \mathcal{V} with \dim \mathcal{V} = n < \infty, the characteristic polynomial is \chi_T(\lambda) = \det(T - \lambda I).
This is a polynomial of degree n in \lambda. To verify the definition is basis-independent, note that if A = [T]_{\mathcal{B}} and B = [T]_{\mathcal{C}}, then B = P^{-1}AP for some invertible P by Theorem 6.3 from Chapter 5. Thus \det(B - \lambda I) = \det(P^{-1}(A - \lambda I)P) = \det(P^{-1})\det(A - \lambda I)\det(P) = \det(A - \lambda I) by multiplicativity of the determinant. The polynomial depends only on T, not on coordinates.
For computational purposes, when T is given by a matrix A \in M_n(\mathbb{F}) relative to some basis, we compute \chi_T(\lambda) = \det(A - \lambda I) by expanding the determinant. This yields a polynomial of degree n in \lambda, with leading coefficient (-1)^n and constant term \det(A).
9.5.1 Example: Computing the Characteristic Polynomial
For the matrix from our earlier example, A = \begin{pmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{pmatrix}, we have A - \lambda I = \begin{pmatrix} 4-\lambda & 0 & 1 \\ 2 & 3-\lambda & 2 \\ 1 & 0 & 4-\lambda \end{pmatrix}. Expanding along the second column (which has two zeros): \chi_A(\lambda) = (3-\lambda) \det\begin{pmatrix} 4-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} = (3-\lambda)[(4-\lambda)^2 - 1]. Simplifying: (4-\lambda)^2 - 1 = 16 - 8\lambda + \lambda^2 - 1 = \lambda^2 - 8\lambda + 15 = (\lambda-3)(\lambda-5). Thus \chi_A(\lambda) = (3-\lambda)^2(5-\lambda) = -(\lambda-3)^2(\lambda-5). The roots are \lambda = 3 (with algebraic multiplicity 2) and \lambda = 5 (with algebraic multiplicity 1). These match our earlier findings: \dim E_3 = 2 equals the algebraic multiplicity of \lambda = 3, and \dim E_5 = 1 equals the algebraic multiplicity of \lambda = 5.
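The same factorization can be double-checked numerically; the sketch below relies on numpy's poly and roots helpers, which work from the eigenvalues rather than symbolic expansion.

```python
import numpy as np

A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 2.0],
              [1.0, 0.0, 4.0]])

# np.poly(A) returns the coefficients of det(λI - A), which has the same roots as χ_A.
coeffs = np.poly(A)
print(coeffs)            # ≈ [1, -11, 39, -45], i.e. λ³ - 11λ² + 39λ - 45 = (λ-3)²(λ-5)
print(np.roots(coeffs))  # ≈ [5, 3, 3]
```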
The characteristic polynomial encodes all eigenvalues. Finding its roots provides the first step in spectral analysis.
Theorem 9.7 \lambda \in \mathbb{F} is an eigenvalue of T if and only if \chi_T(\lambda) = 0.
Proof. \lambda is an eigenvalue \iff T - \lambda I not invertible \iff \det(T - \lambda I) = 0 by Theorem 7.11 from Chapter 6. \square
Over an algebraically closed field such as \mathbb{C}, every polynomial of degree n has exactly n roots counting multiplicity. Thus every operator on a finite-dimensional complex vector space has eigenvalues. Over \mathbb{R}, polynomials may have fewer than n real roots (e.g., rotations in \mathbb{R}^2 by angles not multiples of \pi), so real operators may lack real eigenvalues.
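A brief numerical illustration of the real case (a minimal sketch assuming numpy): a rotation of \mathbb{R}^2 by an angle that is not a multiple of \pi has no real eigenvalues; the eigenvalues are the complex conjugate pair e^{\pm i\theta}.

```python
import numpy as np

theta = np.pi / 3                        # rotation by 60°, not a multiple of π
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Over R the characteristic polynomial λ² - 2cos(θ)λ + 1 has no real roots;
# numpy reports the complex pair e^{±iθ}.
print(np.linalg.eigvals(R))              # ≈ [0.5+0.866j, 0.5-0.866j]
```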
Definition 9.6 (Algebraic and geometric multiplicity) The algebraic multiplicity of eigenvalue \lambda is its multiplicity as a root of \chi_T(\lambda).
The geometric multiplicity of \lambda is \dim E_\lambda = \dim \ker(T - \lambda I).
Theorem 9.8 For any eigenvalue \lambda, 1 \leq \text{geometric multiplicity}(\lambda) \leq \text{algebraic multiplicity}(\lambda).
The lower bound is immediate: \lambda being an eigenvalue implies E_\lambda \neq \{0\}, so \dim E_\lambda \geq 1. The upper bound requires additional machinery developed in subsequent chapters. When every eigenvalue achieves equality of geometric and algebraic multiplicity (and the characteristic polynomial splits over \mathbb{F}), the eigenspaces fill out the whole space—this is precisely the diagonalizability criterion developed in Section 9.6.
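Both multiplicities are directly computable: the algebraic multiplicity by counting roots of the characteristic polynomial, the geometric multiplicity via rank–nullity applied to T - \lambda I. A minimal sketch on a hypothetical 2 \times 2 matrix where the two differ (the same shear reappears in Sections 9.6 and 9.8):

```python
import numpy as np

# Hypothetical matrix whose eigenvalue λ = 2 has algebraic multiplicity 2
# but geometric multiplicity 1.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0

alg_mult = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))       # roots of χ_A equal to λ
geo_mult = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))  # dim ker(A - λI)
print(alg_mult, geo_mult)   # 2 1 — strict inequality, so A is not diagonalizable
```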
9.6 Diagonalizability
An operator T : \mathcal{V} \to \mathcal{V} is diagonalizable if there exists a basis \mathcal{B} of \mathcal{V} consisting entirely of eigenvectors, equivalently if [T]_{\mathcal{B}} is diagonal.
Theorem 9.9 The following are equivalent:
(i) T is diagonalizable
(ii) \mathcal{V} = \bigoplus_{\lambda} E_\lambda where the sum ranges over all eigenvalues of T
(iii) \sum_{\lambda} \dim E_\lambda = \dim \mathcal{V} where the sum ranges over all eigenvalues
Proof. (i) \implies (ii): If \mathcal{B} is an eigenbasis, partition \mathcal{B} by eigenvalue: \mathcal{B} = \bigcup_{\lambda} \mathcal{B}_\lambda where \mathcal{B}_\lambda consists of eigenvectors with eigenvalue \lambda. Then \operatorname{span}(\mathcal{B}_\lambda) \subseteq E_\lambda, and since \mathcal{B} spans \mathcal{V}, we have \mathcal{V} = \sum_{\lambda} E_\lambda. The sum is direct by Corollary 9.2.
(ii) \implies (iii): Immediate from Corollary 9.1.
(iii) \implies (i): Choose a basis for each E_\lambda. The union has \sum \dim E_\lambda = \dim \mathcal{V} vectors. By Theorem 9.6, eigenvectors from different eigenspaces are independent, so the union is linearly independent. Being a linearly independent set of size \dim \mathcal{V}, it is a basis. \square
When T is diagonalizable, a basis of eigenvectors reduces T to its simplest form: each basis vector is scaled independently. In coordinates, [T]_{\mathcal{B}} = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) where eigenvalues appear with multiplicity equal to their geometric multiplicity. Powers simplify: [T^k]_{\mathcal{B}} = \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k).
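In code, the diagonal form makes powers cheap: T^k = P D^k P^{-1} only requires powering the diagonal entries. A minimal check against direct matrix powers (assuming numpy), reusing the eigenbasis from Section 9.4.1:

```python
import numpy as np

A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 2.0],
              [1.0, 0.0, 4.0]])
P = np.array([[-1.0, 0.0, 1.0],
              [ 0.0, 1.0, 2.0],
              [ 1.0, 0.0, 1.0]])      # columns: eigenbasis from Section 9.4.1
eigs = np.array([3.0, 3.0, 5.0])

k = 6
A_power = P @ np.diag(eigs ** k) @ np.linalg.inv(P)   # T^k = P D^k P^{-1}
assert np.allclose(A_power, np.linalg.matrix_power(A, k))
```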
Not all operators are diagonalizable. The canonical example is T : \mathbb{R}^2 \to \mathbb{R}^2 with matrix \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} in the standard basis. The characteristic polynomial is (2-\lambda)^2, giving eigenvalue \lambda = 2 with algebraic multiplicity 2. The eigenspace is E_2 = \ker\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \operatorname{span}\{e_1\}, which has dimension 1. Since \dim E_2 < 2 = \dim \mathbb{R}^2, the operator is not diagonalizable.
Theorem 9.10 If T has \dim \mathcal{V} distinct eigenvalues, then T is diagonalizable.
Proof. Let \lambda_1, \ldots, \lambda_n be n = \dim \mathcal{V} distinct eigenvalues with eigenvectors v_1, \ldots, v_n. By Theorem 9.6, these form a linearly independent set. Having n linearly independent vectors in an n-dimensional space, they constitute a basis. \square
This sufficient condition is not necessary: an operator can be diagonalizable with fewer than n distinct eigenvalues if some eigenspaces have dimension greater than 1 (e.g., scalar operators T = cI have only one eigenvalue but are diagonal).
9.7 Spectral Properties
The spectrum of T is the set of its eigenvalues. The determinant and trace encode spectral information.
Definition 9.7 (Spectral radius) The spectral radius of an operator T is \rho(T) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } T\}.
The spectral radius measures the “size” of the largest eigenvalue in absolute value. It plays a fundamental role in the theory of matrix powers and dynamical systems.
Theorem 9.11 For a diagonalizable operator T on a finite-dimensional complex vector space, T^n \to 0 as n \to \infty if and only if \rho(T) < 1.
Proof. If T = PDP^{-1} with D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), then T^n = PD^nP^{-1} where D^n = \operatorname{diag}(\lambda_1^n, \ldots, \lambda_n^n).
If \rho(T) < 1, then |\lambda_i| < 1 for all i, so |\lambda_i|^n \to 0. Since matrix multiplication is continuous in the entries, T^n \to 0.
Conversely, if \rho(T) \ge 1, there exists an eigenvalue \lambda with |\lambda| \ge 1 and corresponding eigenvector v \neq 0. Then T^n v = \lambda^n v, which does not converge to 0 since \|\lambda^n v\| = |\lambda|^n \|v\| \not\to 0. \square
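A numerical illustration with two hypothetical 2 \times 2 matrices (a minimal sketch assuming numpy), one with spectral radius below 1 and one above: powers of the first decay toward zero while powers of the second grow.

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue magnitude, ρ(A)."""
    return np.max(np.abs(np.linalg.eigvals(A)))

# Hypothetical matrices: ρ ≈ 0.63 for the first, ρ ≈ 1.28 for the second.
contracting = np.array([[0.5, 0.3], [0.1, 0.4]])
expanding   = np.array([[1.2, 0.3], [0.1, 0.9]])

for A in (contracting, expanding):
    norms = [np.linalg.norm(np.linalg.matrix_power(A, n)) for n in (1, 10, 50)]
    print(f"rho = {spectral_radius(A):.3f}, ||A^n|| for n=1,10,50: {norms}")
# The first matrix's powers decay toward 0; the second's blow up.
```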
For non-diagonalizable operators, the result still holds (Theorem 9.16 below), but the proof requires the Jordan canonical form.
9.7.1 Operator Norms and Boundedness
To make convergence statements precise, we introduce the operator norm, which measures the “size” of a linear operator.
Definition 9.8 (Operator norm) For a linear operator T : \mathcal{V} \to \mathcal{W} between normed vector spaces, the operator norm is \|T\| = \sup_{\|v\| = 1} \|T(v)\| = \sup_{v \neq 0} \frac{\|T(v)\|}{\|v\|}.
An operator is bounded if \|T\| < \infty.
In finite-dimensional spaces, the supremum is actually a maximum (attained on the compact unit sphere).
Theorem 9.12 On finite-dimensional normed vector spaces, all linear operators are bounded.
Proof. Let \{e_1, \ldots, e_n\} be a basis of \mathcal{V}. Any v = \sum c_i e_i satisfies \|T(v)\| = \left\|\sum c_i T(e_i)\right\| \le \sum |c_i| \|T(e_i)\| \le \left(\max_i \|T(e_i)\|\right) \sum |c_i|. Since all norms on finite-dimensional spaces are equivalent, there exists C with \sum |c_i| \le C\|v\|, giving \|T(v)\| \le C \max_i \|T(e_i)\| \cdot \|v\|. \square
Key properties of operator norms (the first two are checked numerically in the sketch after this list):
- Submultiplicativity: \|ST\| \le \|S\| \|T\|
- Spectral radius bound: \rho(T) \le \|T\| for any operator norm
- For normal matrices: \|T\| = \rho(T) when using the spectral norm (2-norm)
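The following minimal sketch checks submultiplicativity and the spectral radius bound on hypothetical random matrices; np.linalg.norm with ord=2 computes the spectral norm, i.e., the largest singular value.

```python
import numpy as np

def spectral_norm(M):
    return np.linalg.norm(M, 2)                   # largest singular value

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

assert spectral_norm(A @ B) <= spectral_norm(A) * spectral_norm(B) + 1e-12  # ||ST|| ≤ ||S|| ||T||
assert spectral_radius(A) <= spectral_norm(A) + 1e-12                       # ρ(T) ≤ ||T||
print(f"||A|| = {spectral_norm(A):.4f}, rho(A) = {spectral_radius(A):.4f}")
```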
In infinite-dimensional spaces (function spaces, sequence spaces), boundedness is a nontrivial condition. The differentiation operator D : C^1[0,1] \to C[0,1] with D(f) = f' is unbounded when both spaces carry the sup norm: for f_n(x) = \sin(nx) with n \geq 2, \|f_n\|_\infty = 1 while \|Df_n\|_\infty = n, so the ratio \|Df_n\|_\infty / \|f_n\|_\infty \to \infty.
Unbounded operators require spectral theory for unbounded operators (domain considerations, self-adjoint extensions, essential spectrum)—central to quantum mechanics and PDEs. This is a major topic in functional analysis.
For our finite-dimensional setting, boundedness is automatic and we freely use operator norms in convergence arguments.
Theorem 9.13 If \lambda_1, \ldots, \lambda_n are the eigenvalues of T counted with algebraic multiplicity, then \det(T) = \prod_{i=1}^n \lambda_i.
Proof. The characteristic polynomial factors as \chi_T(\lambda) = \det(A - \lambda I) = (-1)^n(\lambda - \lambda_1) \cdots (\lambda - \lambda_n) over an algebraically closed field. Expanding the right side gives (-1)^n \lambda^n + \text{lower order terms}, with constant term (-1)^n(-\lambda_1) \cdots (-\lambda_n) = \prod \lambda_i. The constant term of \det(A - \lambda I) is \det(A), obtained by setting \lambda = 0. \square
Theorem 9.14 If \lambda_1, \ldots, \lambda_n are the eigenvalues of T counted with algebraic multiplicity, then \operatorname{tr}(T) = \sum_{i=1}^n \lambda_i.
Proof. The coefficient of \lambda^{n-1} in \chi_T(\lambda) = \det(A - \lambda I) equals (-1)^{n-1} \operatorname{tr}(A), which can be verified by expanding the determinant. The same coefficient in (-1)^n \prod(\lambda - \lambda_i) is (-1)^{n-1} \sum \lambda_i. Equating gives the result. \square
These identities reveal the geometric content of determinant and trace: the determinant measures the product of stretching factors along eigendirections, while the trace measures their sum. For diagonalizable operators this is immediate from the diagonal form; the theorems assert it holds universally, even for non-diagonalizable operators.
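Both identities are easy to confirm numerically on a hypothetical random matrix (a minimal sketch assuming numpy); complex eigenvalues of a real matrix occur in conjugate pairs, so their product and sum are real up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
eig = np.linalg.eigvals(A)

assert np.isclose(np.prod(eig), np.linalg.det(A))   # det(T) = product of eigenvalues
assert np.isclose(np.sum(eig), np.trace(A))         # tr(T)  = sum of eigenvalues
```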
9.8 The Jordan Canonical Form
When an operator fails to diagonalize, the Jordan form provides the next-best decomposition.
Definition 9.9 (Jordan block) A Jordan block with eigenvalue \lambda and size k is the k \times k matrix J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix}.
A Jordan block is nearly diagonal: the eigenvalue \lambda appears on the diagonal, with 1’s on the superdiagonal. For example, J_3(2) = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.
Theorem 9.15 (Jordan canonical form) Let T : \mathcal{V} \to \mathcal{V} be a linear operator on a finite-dimensional complex vector space. There exists a basis \mathcal{B} such that [T]_{\mathcal{B}} is block-diagonal with Jordan blocks: [T]_{\mathcal{B}} = \begin{pmatrix} J_{k_1}(\lambda_1) & & & \\ & J_{k_2}(\lambda_2) & & \\ & & \ddots & \\ & & & J_{k_m}(\lambda_m) \end{pmatrix}. This form is unique up to reordering of blocks.
The proof requires developing the theory of generalized eigenvectors and nilpotent operators, which lies beyond our scope. The key insight: when geometric multiplicity falls short of algebraic multiplicity, we obtain Jordan blocks of size larger than 1.
Example. The operator with matrix A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} has eigenvalue \lambda = 2 with algebraic multiplicity 2 but geometric multiplicity 1 (only one eigenvector). The matrix is already in Jordan form: J_2(2).
For higher powers: J_2(2)^n = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}^n = \begin{pmatrix} 2^n & n \cdot 2^{n-1} \\ 0 & 2^n \end{pmatrix}.
This shows the deviation from pure scaling: the off-diagonal entry grows as n \cdot 2^{n-1}, introducing polynomial factors.
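A quick check of this closed form against direct matrix powers (a minimal sketch assuming numpy):

```python
import numpy as np

J = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# Compare J_2(2)^n with the closed form [[2^n, n·2^(n-1)], [0, 2^n]].
for n in (1, 3, 8):
    closed_form = np.array([[2.0**n, n * 2.0**(n - 1)],
                            [0.0,    2.0**n]])
    assert np.allclose(np.linalg.matrix_power(J, n), closed_form)
```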
Theorem 9.16 For an operator T with Jordan canonical form, T^n \to 0 as n \to \infty if and only if \rho(T) < 1.
Proof (sketch). Each Jordan block J_k(\lambda) can be written as J_k(\lambda) = \lambda I + N where N is the strictly upper-triangular part (a nilpotent matrix with N^k = 0). Then J_k(\lambda)^n = (\lambda I + N)^n = \sum_{j=0}^{k-1} \binom{n}{j} \lambda^{n-j} N^j.
If |\lambda| < 1, the dominant term \lambda^n decays exponentially, and the polynomial factors \binom{n}{j} cannot overcome this decay. If |\lambda| \ge 1, the terms grow or stay bounded, preventing convergence to zero. \square
The Jordan form shows that spectral radius controls asymptotic behavior even for non-diagonalizable operators.
9.9 Matrix Exponential
The matrix exponential extends the exponential function to matrices, enabling solutions to systems of differential equations.
Definition 9.10 (Matrix exponential) For A \in M_n(\mathbb{F}), the matrix exponential is e^A = \sum_{k=0}^\infty \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.
The series converges absolutely for all matrices A, as shown by comparison with the scalar exponential series.
Theorem 9.17 The series defining e^A converges in any matrix norm, and the convergence is uniform on bounded sets.
Proof. In any submultiplicative norm \|\cdot\| (satisfying \|AB\| \le \|A\| \|B\|), we have \|A^k\| \le \|A\|^k. Thus \left\|\sum_{k=0}^\infty \frac{A^k}{k!}\right\| \le \sum_{k=0}^\infty \frac{\|A\|^k}{k!} = e^{\|A\|} < \infty. \quad \square
Properties of the matrix exponential:
- e^0 = I
- If AB = BA, then e^{A+B} = e^A e^B
- e^A is always invertible with (e^A)^{-1} = e^{-A}
- If A = PDP^{-1} is diagonalizable, then e^A = Pe^DP^{-1} where e^D = \operatorname{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n})
Example. For A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} (clockwise rotation by 90°), the eigenvalues are \pm i. The matrix exponential is e^{tA} = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}, a clockwise rotation by angle t—a continuous family of rotations parametrized by time t.
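A minimal sketch comparing a truncated power series for e^{tA} against the closed-form rotation matrix above; 30 terms is ample here since \|tA\| is small, and this is an illustration of Definition 9.10 rather than a production algorithm for matrix exponentials.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series for e^A (adequate for small ||A||)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # A^k / k!
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
t = 0.7
expected = np.array([[np.cos(t),  np.sin(t)],
                     [-np.sin(t), np.cos(t)]])
assert np.allclose(expm_series(t * A), expected)
```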
Chapter 10 introduces inner product spaces and examines operators preserving geometric structure. The spectral theorem shows that self-adjoint operators on inner product spaces always diagonalize with orthonormal eigenbases, providing geometric meaning to the abstract algebraic decomposition developed here. Normal operators admit similar spectral decompositions, and the interplay between algebraic and geometric structure governs the theory of operators on Hilbert spaces.
The conceptual framework—decomposition into invariant subspaces, eigenvalue characterization via characteristic polynomials, reduction to simplest form—extends far beyond finite dimensions. Compact operators on Banach spaces, differential operators on function spaces, and representations of groups all admit spectral decompositions generalizing the finite-dimensional theory. The machinery developed here provides the foundation for functional analysis, differential equations, and mathematical physics.