7 Systems of Linear Equations
7.1 The Central Problem
The theory developed thus far has been abstract. Vector spaces, linear maps, bases, and coordinate representations provide a conceptual framework. We now turn to computation.
The fundamental algorithmic problem of linear algebra is: given A \in M_{m \times n}(\mathbb{F}) and b \in \mathbb{F}^m, find all x \in \mathbb{F}^n satisfying Ax = b.
This is a system of linear equations. Writing A = \begin{pmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{pmatrix} where a_j denotes the j-th column, the equation Ax = b becomes x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b. The question is geometric: Does b lie in the span of the columns of A? If so, what are the coefficients (x_1, \ldots, x_n) expressing b as a linear combination?
In the language of Chapter 3, this asks: given a linear map T : \mathcal{V} \to \mathcal{W} represented by A and a target w \in \mathcal{W} represented by b, find all v \in \mathcal{V} with T(v) = w. The geometric question “b \in \operatorname{im}(T)?” becomes the computational question “Can we solve Ax = b?”
Three cases arise:
- No solution: b \notin \operatorname{span}(a_1, \ldots, a_n); the system is inconsistent
- Unique solution: b \in \operatorname{span}(a_1, \ldots, a_n) and \{a_1, \ldots, a_n\} is linearly independent
- Infinitely many solutions: b \in \operatorname{span}(a_1, \ldots, a_n) and \{a_1, \ldots, a_n\} is linearly dependent
When solutions exist, their structure is controlled by the kernel. This chapter develops the algorithmic machinery—row operations, Gaussian elimination, echelon forms—that determines which case holds and constructs all solutions systematically.
7.2 Elementary Row Operations
A row operation on A \in M_{m \times n}(\mathbb{F}) is one of three types:
- (Row interchange) Exchange rows i and j
- (Row scaling) Multiply row i by nonzero c \in \mathbb{F}
- (Row replacement) Replace row i by itself plus c times row j, where i \neq j
Geometrically, row operations correspond to changing coordinates in the codomain \mathbb{F}^m. They transform the system Ax = b into an equivalent system \tilde{A}x = \tilde{b} with the same solution set, but where the structure of \tilde{A} makes the solutions evident.
Each row operation can be realized as left multiplication by an elementary matrix: a matrix obtained by performing that operation on the identity I_m.
Definition 7.1 (Elementary matrix) An elementary matrix is obtained by performing a single row operation on I_m.
Let E_{ij} denote the matrix obtained by swapping rows i and j of I_m. Then E_{ij}A swaps rows i and j of A.
Let E_i(c) denote the matrix obtained by multiplying row i of I_m by c \neq 0. Then E_i(c)A multiplies row i of A by c.
Let E_{ij}(c) denote the matrix obtained by adding c times row j to row i in I_m (where i \neq j). Then E_{ij}(c)A adds c times row j of A to row i.
Theorem 7.1 Every elementary matrix is invertible. The inverses are: E_{ij}^{-1} = E_{ij}, \quad E_i(c)^{-1} = E_i(c^{-1}), \quad E_{ij}(c)^{-1} = E_{ij}(-c).
Proof. Each row operation has a natural inverse: swapping rows i and j twice returns to the original configuration; scaling by c then by c^{-1} is the identity; adding c times row j to row i, then adding -c times row j to row i, cancels. \square
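These matrices are easy to construct and check numerically. The sketch below, in Python with NumPy (the helper names E_swap, E_scale, E_replace are ours, not a standard API), builds each elementary matrix by performing the corresponding row operation on the identity and verifies both the inverse formulas of Theorem 7.1 and the fact that left multiplication performs the row operation:

```python
import numpy as np

def E_swap(m, i, j):
    """Elementary matrix: swap rows i and j of I_m."""
    E = np.eye(m)
    E[[i, j]] = E[[j, i]]
    return E

def E_scale(m, i, c):
    """Elementary matrix: scale row i of I_m by c != 0."""
    E = np.eye(m)
    E[i, i] = c
    return E

def E_replace(m, i, j, c):
    """Elementary matrix: add c times row j to row i (i != j)."""
    E = np.eye(m)
    E[i, j] = c
    return E

m = 4
I = np.eye(m)
# The inverse formulas of Theorem 7.1:
assert np.allclose(E_swap(m, 0, 2) @ E_swap(m, 0, 2), I)
assert np.allclose(E_scale(m, 1, 5.0) @ E_scale(m, 1, 1 / 5.0), I)
assert np.allclose(E_replace(m, 3, 0, 2.0) @ E_replace(m, 3, 0, -2.0), I)

# Left multiplication performs the row operation:
A = np.arange(20, dtype=float).reshape(4, 5)
B = E_replace(m, 2, 0, -3.0) @ A     # row 2 <- row 2 - 3 * row 0
assert np.allclose(B[2], A[2] - 3 * A[0])
assert np.allclose(B[[0, 1, 3]], A[[0, 1, 3]])
```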
A sequence of row operations corresponds to left multiplication by a product of elementary matrices. If B is obtained from A by row operations, then B = E_k \cdots E_1 A for some elementary matrices E_1, \ldots, E_k. The product P = E_k \cdots E_1 is invertible, belonging to \mathrm{GL}_m(\mathbb{F}).
Row operations preserve the solution set of Ax = b. If B = PA and c = Pb where P \in \mathrm{GL}_m(\mathbb{F}), then Ax = b if and only if Bx = c, because P is invertible: Ax = b \iff PAx = Pb \iff Bx = c.
Geometrically, multiplying both sides by P applies a change of coordinates to the codomain. The target b in the new coordinates is Pb, and the question “Does b lie in \operatorname{im}(A)?” becomes “Does Pb lie in \operatorname{im}(PA)?”—which has the same answer, since P is an isomorphism.
7.3 Row Echelon Form
The goal of row operations is to transform A into a form where its structure is transparent.
Definition 7.2 (Leading entry and pivot) The leading entry of a nonzero row is its leftmost nonzero entry. A pivot position is the location of a leading entry. A pivot column is a column containing a pivot position.
Definition 7.3 (Row echelon form) A matrix is in row echelon form if:
1. All nonzero rows are above any rows consisting entirely of zeros
2. Each leading entry lies in a column strictly to the right of the leading entry of the row above it
3. All entries in a column below a leading entry are zero
In row echelon form, the leading entries form a staircase descending from left to right: \begin{pmatrix} \boxed{*} & * & * & * & * & * \\ 0 & 0 & \boxed{*} & * & * & * \\ 0 & 0 & 0 & \boxed{*} & * & * \\ 0 & 0 & 0 & 0 & 0 & \boxed{*} \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} where \boxed{*} denotes a pivot and * denotes an arbitrary entry. Entries below pivots are zero; entries above and to the right of pivots may be nonzero.
Definition 7.4 (Reduced row echelon form) A matrix is in reduced row echelon form if it is in row echelon form and additionally:
- Each leading entry equals 1
- Each leading entry is the only nonzero entry in its entire column
In reduced row echelon form, each pivot is 1 and is the only nonzero entry in its column: \begin{pmatrix} \boxed{1} & * & 0 & 0 & * & 0 \\ 0 & 0 & \boxed{1} & 0 & * & 0 \\ 0 & 0 & 0 & \boxed{1} & * & 0 \\ 0 & 0 & 0 & 0 & 0 & \boxed{1} \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}
Row echelon form is not unique—different sequences of row operations may yield different row echelon forms—but the pivot positions coincide across all row echelon forms of a given matrix. Reduced row echelon form is unique, as the next theorem establishes.
7.4 Gaussian Elimination
Gaussian elimination is the systematic procedure for reducing a matrix to row echelon form via row operations.
Forward phase (reduction to row echelon form): Locate the leftmost column containing a nonzero entry. If necessary, interchange rows to move a nonzero entry to the top position—this entry becomes the first pivot. Use row replacement operations to create zeros below this pivot. Repeat the process on the submatrix consisting of the remaining rows, working from left to right until all rows are processed. The result is a matrix in row echelon form.
Backward phase (reduction to reduced row echelon form): Beginning with the rightmost pivot, scale its row so the pivot becomes 1. Use row replacement operations to create zeros above this pivot. Move leftward to the next pivot and repeat, continuing until all pivots are 1 and all entries above pivots are zero. The result is reduced row echelon form.
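Both phases fit in a short routine. The following Python/NumPy sketch is a minimal implementation of our own (a tolerance stands in for exact zero tests in floating point, and the column of each new pivot is cleared both below and above in one pass, folding the backward phase into the forward sweep); it returns the reduced row echelon form together with the pivot columns, and is applied here to the matrix of the example that follows:

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduce A to reduced row echelon form; return (R, pivot_columns)."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    pivots = []
    row = 0
    for col in range(n):
        if row >= m:
            break
        # Find a usable (nonzero) entry in this column at or below `row`.
        nonzero = np.nonzero(np.abs(R[row:, col]) > tol)[0]
        if nonzero.size == 0:
            continue                      # free column: move right
        p = row + nonzero[0]
        R[[row, p]] = R[[p, row]]         # row interchange
        R[row] /= R[row, col]             # scale so the pivot is 1
        for r in range(m):                # clear the rest of the column
            if r != row:
                R[r] -= R[r, col] * R[row]
        pivots.append(col)
        row += 1
    return R, pivots

A = np.array([[ 2., -4,  6, -2,  8],
              [-1.,  2, -3,  4,  2],
              [ 3., -6,  9,  0, 18],
              [ 1., -2,  3, -1,  4]])
R, pivots = rref(A)
print(pivots)   # [0, 3]: pivots in columns 1 and 4 (1-indexed)
```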
Consider the following example showing the structure: \begin{pmatrix} 2 & -4 & 6 & -2 & 8 \\ -1 & 2 & -3 & 4 & 2 \\ 3 & -6 & 9 & 0 & 18 \\ 1 & -2 & 3 & -1 & 4 \end{pmatrix} \sim \begin{pmatrix} \boxed{2} & -4 & 6 & -2 & 8 \\ 0 & 0 & 0 & \boxed{3} & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} Here we added \tfrac{1}{2} row 1 to row 2, subtracted \tfrac{3}{2} row 1 from row 3 and \tfrac{1}{2} row 1 from row 4, then subtracted the new row 2 from the new row 3. The result is in row echelon form with pivots in columns 1 and 4. Further row operations yield reduced row echelon form: \begin{pmatrix} \boxed{1} & -2 & 3 & 0 & 6 \\ 0 & 0 & 0 & \boxed{1} & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
Columns 2, 3, and 5 are free columns (non-pivot columns). The presence of free columns shows that the columns of A are linearly dependent, and correspondingly, the kernel of A is nontrivial.
Theorem 7.2 Every matrix can be reduced to row echelon form via row operations. Moreover, every matrix is row equivalent to a unique reduced row echelon form.
Proof. The forward phase terminates in finitely many steps: each iteration either produces a new pivot and passes to the next row, or passes to the next column, and there are finitely many rows and columns. By construction, the resulting matrix is in row echelon form.
For uniqueness of reduced row echelon form, suppose R and S are both in reduced row echelon form and row equivalent to A. Since row equivalence is an equivalence relation (proved in Section 7.8 below), R and S are row equivalent to each other, hence \ker(R) = \ker(S) (also proved there).
We claim R = S. First, R and S have the same pivot columns: column j of an RREF matrix is a non-pivot column exactly when it is a linear combination of the columns to its left, and this property is detected by the kernel (such a dependence exists exactly when the kernel contains a vector x with x_j \neq 0 and x_i = 0 for i > j). Since \ker(R) = \ker(S), the pivot columns agree; and in both matrices the k-th pivot column is the standard basis vector e_k, so R and S agree on pivot columns. Now consider a non-pivot column j. The RREF structure expresses column j as a linear combination of the pivot columns to its left, and the coefficients are recorded by a kernel vector: setting x_j = 1, the other free variables to 0, and each pivot variable to the negated coefficient gives the unique kernel element of this form. Since \ker(R) = \ker(S), these coefficients coincide, so R and S agree on non-pivot columns as well. Thus R = S. \square

7.5 The Solution Set of Ax = b
To solve Ax = b, form the augmented matrix [A \mid b] \in M_{m \times (n+1)}(\mathbb{F}) and reduce to row echelon form (or reduced row echelon form).
The augmented matrix represents the system explicitly: the first n columns encode A, and the final column encodes b. Row operations applied to [A \mid b] simultaneously transform both A and b, preserving the solution set.
Consistency. The system is inconsistent if the reduced row echelon form of [A \mid b] contains a row of the form \begin{pmatrix} 0 & 0 & \cdots & 0 & \mid & c \end{pmatrix} where c \neq 0. This row represents the equation 0 = c, which has no solution. Geometrically, it indicates that b does not lie in the column space of A. If no such row appears, the system is consistent: solutions exist.
Free and pivot variables. Variables corresponding to pivot columns are pivot variables (also called basic variables). Variables corresponding to non-pivot columns are free variables. Free variables can take any value; pivot variables are determined by the free variables via back substitution.
Consider a system with augmented matrix reducing to: \begin{pmatrix} \boxed{1} & 2 & 0 & -1 & \mid & 3 \\ 0 & 0 & \boxed{1} & 2 & \mid & -1 \\ 0 & 0 & 0 & 0 & \mid & 0 \end{pmatrix} Pivot variables are x_1 and x_3; free variables are x_2 and x_4. Setting x_2 = s and x_4 = t, we obtain from the first row x_1 = 3 - 2s + t and from the second row x_3 = -1 - 2t. The general solution is \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \\ -1 \\ 0 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 0 \\ -2 \\ 1 \end{pmatrix}.
The solution set is a two-dimensional affine subspace of \mathbb{F}^4: a translate of the kernel by the particular solution (3, 0, -1, 0)^T.
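Reading off the general solution from the reduced row echelon form can itself be mechanized. The sketch below reuses the rref helper from Section 7.4 (the function name general_solution and its conventions are ours): the particular solution takes all free variables to be zero, and each free column produces one kernel basis vector.

```python
import numpy as np

def general_solution(A, b, tol=1e-12):
    """Return (x_p, kernel_basis) for Ax = b; uses rref from Section 7.4."""
    R, pivots = rref(np.column_stack([A, b]), tol)
    n = A.shape[1]
    if n in pivots:
        raise ValueError("inconsistent: pivot in the augmented column")
    free = [j for j in range(n) if j not in pivots]
    # Particular solution: free variables 0, pivot variables from last column.
    x_p = np.zeros(n)
    for k, j in enumerate(pivots):
        x_p[j] = R[k, n]
    # One kernel basis vector per free variable.
    basis = []
    for f in free:
        v = np.zeros(n)
        v[f] = 1.0
        for k, j in enumerate(pivots):
            v[j] = -R[k, f]
        basis.append(v)
    return x_p, basis

A = np.array([[1., 2, 0, -1],
              [0., 0, 1,  2],
              [0., 0, 0,  0]])
b = np.array([3., -1, 0])
x_p, basis = general_solution(A, b)
# x_p = (3, 0, -1, 0); basis = (-2, 1, 0, 0) and (1, 0, -2, 1), as above
```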
7.6 Structure of Solution Sets
The form exhibited above is universal.
Theorem 7.3 (General solution formula) If Ax = b is consistent and x_p is any particular solution, then the set of all solutions is \{x_p + x_h : x_h \in \ker(A)\}.
Proof. If Ax_p = b and Ax_h = 0, then A(x_p + x_h) = Ax_p + Ax_h = b + 0 = b, so x_p + x_h is a solution.
Conversely, if Ax = b, then A(x - x_p) = Ax - Ax_p = b - b = 0, so x - x_p \in \ker(A), giving x = x_p + (x - x_p). \square
This decomposition has geometric content. The solution set to Ax = b is an affine subspace of \mathbb{F}^n: a translate of the linear subspace \ker(A) by the vector x_p. If \ker(A) = \{0\}, the solution is unique. If \dim \ker(A) = k > 0, the solution set is a k-dimensional affine subspace—a translated copy of \ker(A).
The homogeneous system Ax = 0 always has the solution x = 0, and its full solution set is \ker(A), a linear subspace of \mathbb{F}^n of dimension n - \operatorname{rank}(A) by Theorem 4.5.
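A quick numerical illustration of this dimension count (a sketch; scipy.linalg.null_space returns an orthonormal basis of the kernel, so its number of columns is \dim \ker(A)):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2, 0, -1],
              [0., 0, 1,  2],
              [0., 0, 0,  0]])
N = null_space(A)                     # orthonormal basis of ker(A)
r = np.linalg.matrix_rank(A)          # rank(A) = 2
assert N.shape[1] == A.shape[1] - r   # dim ker(A) = n - rank(A) = 2
assert np.allclose(A @ N, 0)          # every basis vector solves Ax = 0
```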
The general solution formula reflects a principle pervasive in mathematics: solutions to inhomogeneous equations are obtained by adding a particular solution to the general solution of the associated homogeneous equation. This structure appears in differential equations (general solution = particular + complementary), in affine geometry (affine subspaces are translates of linear subspaces), and throughout analysis.
7.7 Consistency and Rank
A system Ax = b is consistent if and only if b \in \operatorname{Col}(A)—equivalently, b is a linear combination of the columns of A.
Theorem 7.4 The following are equivalent:
1. Ax = b is consistent
2. b \in \operatorname{Col}(A)
3. \operatorname{rank}([A \mid b]) = \operatorname{rank}(A)
4. The reduced row echelon form of [A \mid b] has no pivot in the last column
Proof. (1) \iff (2) is the definition of consistency in geometric language: consistency means b is expressible as Ax for some x, i.e., as a linear combination of the columns of A.
(2) \iff (3): The column space of [A \mid b] is \operatorname{Col}(A) + \operatorname{span}(b). If b \in \operatorname{Col}(A), then \operatorname{Col}([A \mid b]) = \operatorname{Col}(A), so the ranks are equal. If b \notin \operatorname{Col}(A), then b adds a linearly independent direction to \operatorname{Col}(A), giving \operatorname{rank}([A \mid b]) = \operatorname{rank}(A) + 1.
(3) \iff (4): A pivot in the last column of [A \mid b] corresponds exactly to a row [0 \; \cdots \; 0 \mid c] with c \neq 0, which increases the rank by 1. If no such row appears, the ranks are equal. \square
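Criterion (3) is the easiest to check with standard tools. A sketch using NumPy's matrix_rank (which, in floating point, computes rank up to a tolerance):

```python
import numpy as np

def is_consistent(A, b):
    """Ax = b is consistent iff rank([A | b]) = rank(A) (Theorem 7.4)."""
    aug = np.column_stack([A, b])
    return np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A)

A = np.array([[1., 2], [2., 4]])              # rank 1
print(is_consistent(A, np.array([3., 6])))    # True:  b in Col(A)
print(is_consistent(A, np.array([3., 7])))    # False: b adds a new direction
```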
For square systems (m = n), consistency for all b is equivalent to invertibility of A:
Theorem 7.5 Let A \in M_{n \times n}(\mathbb{F}). The following are equivalent:
1. For every b \in \mathbb{F}^n, the system Ax = b has at least one solution
2. The linear map x \mapsto Ax is surjective
3. \operatorname{rank}(A) = n
4. A has a pivot in every row
5. A is invertible
Proof. (1) \iff (2) by definition of surjectivity.
(2) \iff (3) since surjectivity means \operatorname{im}(A) = \mathbb{F}^n, equivalently \dim \operatorname{im}(A) = n.
(3) \iff (4) since the rank equals the number of pivots, and for an n \times n matrix there are n rows, so n pivots means one per row.
(3) \iff (5) for square matrices: if \operatorname{rank}(A) = n, then \dim \ker(A) = 0 by Theorem 4.5, so A is injective; and since \operatorname{rank}(A) = n means \dim \operatorname{im}(A) = n = \dim \mathbb{F}^n, the image equals \mathbb{F}^n and A is also surjective. Hence A is bijective, hence invertible (Theorem 5.21). \square
7.8 Row Equivalence
Two matrices A, B \in M_{m \times n}(\mathbb{F}) are row equivalent, written A \sim B, if one can be obtained from the other by a finite sequence of row operations. Equivalently, there exists P \in \mathrm{GL}_m(\mathbb{F}) such that B = PA.
Theorem 7.6 Row equivalence is an equivalence relation on M_{m \times n}(\mathbb{F}). If A \sim B, then:
1. \operatorname{rank}(A) = \operatorname{rank}(B)
2. \ker(A) = \ker(B)
3. A and B have the same reduced row echelon form
Proof. For the equivalence relation: reflexivity holds since A = I_m A; symmetry holds since B = PA implies A = P^{-1}B; transitivity holds since B = P_1 A and C = P_2 B imply C = P_2 P_1 A.
For (2): Bx = 0 \iff PAx = 0 \iff Ax = 0 since P is invertible. So \ker(A) = \ker(B).
For (1): by Theorem 4.5, \operatorname{rank}(A) = n - \dim \ker(A) = n - \dim \ker(B) = \operatorname{rank}(B).
For (3): this follows from Theorem 7.2. Both A and B have a unique reduced row echelon form. Since A \sim B, a sequence of row operations taking B to A, followed by a sequence taking A to its reduced row echelon form, reduces B to that same matrix; by uniqueness, it is also the reduced row echelon form of B. \square
Row equivalent matrices represent the same linear system in different coordinates in the codomain. The geometric content—the kernel and the rank—is preserved, but the column space generally changes.
7.9 Computing Matrix Inverses
If A \in M_{n \times n}(\mathbb{F}) is invertible, we compute A^{-1} by row reduction.
Row operations correspond to left multiplication by elementary matrices. Reducing A to I_n means finding P = E_k \cdots E_1 \in \mathrm{GL}_n(\mathbb{F}) such that PA = I_n. But then P = A^{-1}.
To find P, perform the same row operations simultaneously on I_n. Form the augmented matrix [A \mid I_n] \in M_{n \times 2n}(\mathbb{F}) and row reduce. If A is invertible, the reduction yields [I_n \mid A^{-1}], since the same P with PA = I_n gives PI_n = P = A^{-1}.
If A is not invertible, the left half will not reduce to I_n—a row of zeros will appear, indicating rank deficiency.
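A sketch of this procedure, reusing the rref helper from Section 7.4 (in practice one calls a library routine such as numpy.linalg.inv rather than row reducing by hand):

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Invert A by reducing [A | I] to [I | A^{-1}]; uses rref from 7.4."""
    n = A.shape[0]
    R, pivots = rref(np.hstack([A, np.eye(n)]), tol)
    if pivots != list(range(n)):
        raise ValueError("A is singular: left block does not reduce to I")
    return R[:, n:]

A = np.array([[2., 1], [1., 1]])
Ainv = inverse_by_row_reduction(A)       # [[1, -1], [-1, 2]]
assert np.allclose(A @ Ainv, np.eye(2))
```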
Theorem 7.7 A \in M_{n \times n}(\mathbb{F}) is invertible if and only if A is row equivalent to I_n.
Proof. If A \sim I_n, then PA = I_n for some P \in \mathrm{GL}_n(\mathbb{F}), so A is invertible with A^{-1} = P.
Conversely, if A is invertible, then \operatorname{rank}(A) = n by Theorem 5.22. The reduced row echelon form of A therefore has n pivots. The unique n \times n matrix in reduced row echelon form with n pivots is I_n: each row has exactly one leading 1, which must occupy the diagonal position since there are as many pivots as rows and columns. \square
7.10 LU Decomposition
Gaussian elimination can be organized into a matrix factorization revealing the structure of the reduction process.
If A can be reduced to row echelon form using only row replacement operations (no row interchanges), we can write A = LU where L is lower triangular with ones on the diagonal and U is upper triangular.
Definition 7.5 (Triangular matrices) A matrix L \in M_{n \times n}(\mathbb{F}) is lower triangular if \ell_{ij} = 0 for i < j. It is upper triangular if u_{ij} = 0 for i > j.
Definition 7.6 (LU decomposition) A matrix A \in M_{n \times n}(\mathbb{F}) has an LU decomposition if A = LU where L is lower triangular with \ell_{ii} = 1 for all i, and U is upper triangular.
Construction: Perform forward elimination without row swaps, recording the multipliers. The matrix U is the resulting upper triangular matrix. The matrix L encodes the multipliers: if we subtracted m_{ji} times row i from row j to eliminate the (j,i) entry, then m_{ji} goes in position (j,i) of L. For the generic 3 \times 3 pattern: L = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{pmatrix}, \quad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}.
Solving Ax = b with this factorization becomes a two-step process: solve Ly = b by forward substitution, then solve Ux = y by back substitution. Each step is easy since the matrices are triangular. If we solve many systems with the same A but different b, we compute the LU decomposition once and reuse it for each new right-hand side—more efficient than repeatedly row-reducing [A \mid b].
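A minimal sketch of this workflow, assuming no row interchanges are needed (every pivot encountered must be nonzero); the factorization records the multipliers in L exactly as described above, and the two triangular solves follow:

```python
import numpy as np

def lu(A):
    """Factor A = LU, L unit lower triangular (no pivoting; a sketch)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for j in range(n - 1):
        if U[j, j] == 0.0:
            raise ValueError("zero pivot: a row interchange is needed")
        for i in range(j + 1, n):
            m_ij = U[i, j] / U[j, j]       # the multiplier
            L[i, j] = m_ij                 # ... recorded in L
            U[i, j:] -= m_ij * U[j, j:]    # row replacement on U
    return L, U

def solve_lu(L, U, b):
    """Solve LUx = b: forward substitution, then back substitution."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                     # Ly = b (unit diagonal)
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):           # Ux = y
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2., 1, 1], [4., 3, 3], [8., 7, 9]])
L, U = lu(A)
assert np.allclose(L @ U, A)
b = np.array([1., 2, 3])
assert np.allclose(A @ solve_lu(L, U, b), b)
```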
Theorem 7.8 If A \in M_{n \times n}(\mathbb{F}) can be reduced to row echelon form using only row replacement operations (no row interchanges), then A has an LU decomposition.
Proof. Each row replacement operation in forward elimination adds a multiple of row j to row i with i > j (below the pivot), which is left multiplication by an elementary matrix E_{ij}(c) with i > j. Such matrices are lower triangular with ones on the diagonal. If E_k \cdots E_1 A = U, then A = E_1^{-1} \cdots E_k^{-1} U. Each E_{ij}(c)^{-1} = E_{ij}(-c) is also lower triangular with ones on the diagonal. A product of lower triangular matrices is lower triangular: if a < b, every term L_{ak} M_{kb} in the (a,b) entry of the product LM vanishes, since L_{ak} \neq 0 requires k \leq a while M_{kb} \neq 0 requires k \geq b > a. The same computation shows that the diagonal entries of the product are (LM)_{aa} = L_{aa} M_{aa} = 1. Thus L = E_1^{-1} \cdots E_k^{-1} is lower triangular with ones on the diagonal, giving A = LU. \square
If row interchanges are required, we obtain a PLU decomposition: PA = LU where P is a permutation matrix. Every matrix has a PLU decomposition.
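Library implementations always pivot for numerical stability. SciPy's scipy.linalg.lu, for instance, returns the three factors; note its convention is A = PLU rather than PA = LU (the two differ only by replacing P with its inverse):

```python
import numpy as np
import scipy.linalg

A = np.array([[0., 1], [1., 0]])    # needs an interchange: no plain LU
P, L, U = scipy.linalg.lu(A)
assert np.allclose(P @ L @ U, A)    # SciPy convention: A = P L U
assert np.allclose(P.T @ A, L @ U)  # the text's convention, with P := P^T
```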
7.10.1 LDU Decomposition
The LU decomposition can be further refined by separating the scaling from the elimination structure.
Definition 7.7 (LDU decomposition) A matrix A \in M_{n \times n}(\mathbb{F}) has an LDU decomposition if A = LDU where L is lower triangular with \ell_{ii} = 1, D is diagonal, and U is upper triangular with u_{ii} = 1.
Given an LU decomposition A = L_1 U_1, factor out the diagonal of U_1: U_1 = \begin{pmatrix} d_1 & 0 & \cdots \\ 0 & d_2 & \cdots \\ \vdots & & \ddots \end{pmatrix} \begin{pmatrix} 1 & u_{12}/d_1 & \cdots \\ 0 & 1 & \cdots \\ \vdots & & \ddots \end{pmatrix} = DU giving A = L_1 DU = LDU.
Theorem 7.9 If A has an LU decomposition A = L_1 U_1 with all pivots d_i = (U_1)_{ii} nonzero, then A has an LDU decomposition.
Proof. Factor U_1 = DU as above, dividing each row i of U_1 by d_i to obtain the unit upper triangular matrix U. The diagonal matrix D = \operatorname{diag}(d_1, \ldots, d_n) satisfies U_1 = DU, giving A = L_1 DU. \square
Uniqueness. For invertible A, the LDU decomposition is unique. (Unit triangular matrices are always invertible, so D = L^{-1} A U^{-1} is invertible as well, forcing each d_i \neq 0.) If L_1 D_1 U_1 = L_2 D_2 U_2, then L_2^{-1} L_1 = D_2 U_2 U_1^{-1} D_1^{-1}. The left side is unit lower triangular and the right side is upper triangular; a matrix that is simultaneously unit lower triangular and upper triangular must be I (its diagonal entries are 1, and being both lower and upper triangular, it has no nonzero off-diagonal entries). So L_1 = L_2, which gives D_1 U_1 = D_2 U_2, hence D_2^{-1} D_1 = U_2 U_1^{-1}. The left side is diagonal and the right side is unit upper triangular; by the same argument, both equal I, so D_1 = D_2 and U_1 = U_2.
This factorization makes certain properties transparent: \det(A) = \det(L)\det(D)\det(U) = \prod_{i=1}^n d_i—the determinant equals the product of the pivots.
Symmetric matrices. If A = A^T has an LDU decomposition, then A^T = U^T D L^T = L D U = A forces U^T = L by uniqueness. Thus symmetric matrices factor as A = LDL^T, requiring only L and D. For positive definite symmetric matrices, all pivots d_i > 0, and writing D = \tilde{D}^2 with \tilde{D} = \operatorname{diag}(\sqrt{d_1}, \ldots, \sqrt{d_n}) yields the Cholesky decomposition A = \tilde{L}\tilde{L}^T where \tilde{L} = L\tilde{D}. This decomposition is central to numerical linear algebra.
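For a small symmetric positive definite example, the chain LU → LDL^T → Cholesky can be checked numerically. A sketch reusing the lu function from above (numpy.linalg.cholesky returns the lower triangular factor with positive diagonal):

```python
import numpy as np

A = np.array([[4., 2, 2], [2., 5, 3], [2., 3, 6]])  # symmetric, pos. definite
L, U = lu(A)                        # lu from the sketch in Section 7.10
d = np.diag(U)                      # the pivots: all positive here
Uunit = U / d[:, None]              # unit upper triangular factor, U = D Uunit
assert np.allclose(Uunit.T, L)      # symmetry forces U^T = L, so A = L D L^T
Lt = L * np.sqrt(d)[None, :]        # L~ = L diag(sqrt(d_i))
assert np.allclose(Lt @ Lt.T, A)    # Cholesky: A = L~ L~^T
assert np.allclose(np.linalg.cholesky(A), Lt)
```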
7.11 Pivot Columns and the Column Space
The pivot columns of A—as they appear in the original matrix, not in the reduced row echelon form—form a basis of \operatorname{Col}(A).
Theorem 7.10 Let A \in M_{m \times n}(\mathbb{F}) have pivots in columns j_1, \ldots, j_r. Then \{a_{j_1}, \ldots, a_{j_r}\} is a basis of \operatorname{Col}(A), where a_j denotes the j-th column of A.
Proof. Let R denote the reduced row echelon form of A.
For independence: suppose \sum_{k=1}^{r} c_k a_{j_k} = 0. Define x \in \mathbb{F}^n by x_{j_k} = c_k and x_i = 0 for all other i. Then Ax = \sum_{k=1}^r c_k a_{j_k} = 0, so x \in \ker(A) = \ker(R) (by Theorem 7.6). Since the pivot columns of R are the standard basis vectors e_1, \ldots, e_r, the equation Rx = 0 reads \sum_{k=1}^r c_k e_k = 0, which forces c_1 = \cdots = c_r = 0.
For spanning: every column of A is a linear combination of the pivot columns. This is explicit in R: each non-pivot column of R is expressed as a linear combination of the pivot columns e_1, \ldots, e_r. The same linear combination holds in A because \ker(A) = \ker(R) implies the columns of A satisfy the same dependence relations as the columns of R. Thus \operatorname{Col}(A) = \operatorname{span}(a_{j_1}, \ldots, a_{j_r}). \square
Consequently, \operatorname{rank}(A) equals the number of pivots in any row echelon form of A. The pivot positions identify the maximal linearly independent subset of columns, and the free variables correspond to degrees of freedom in the kernel.
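Theorem 7.10 gives a practical recipe for a basis of the column space: row reduce, then select the original columns at the pivot positions. A sketch reusing the rref helper from Section 7.4:

```python
import numpy as np

def column_space_basis(A):
    """Basis of Col(A): the ORIGINAL columns at the pivot positions."""
    _, pivots = rref(A)      # rref as defined in Section 7.4
    return A[:, pivots]      # columns of A itself, not of its RREF

A = np.array([[ 2., -4,  6, -2,  8],
              [-1.,  2, -3,  4,  2],
              [ 3., -6,  9,  0, 18],
              [ 1., -2,  3, -1,  4]])
B = column_space_basis(A)    # columns 1 and 4 of A (1-indexed)
assert np.linalg.matrix_rank(B) == B.shape[1]                  # independent
assert np.linalg.matrix_rank(np.hstack([B, A])) == B.shape[1]  # spans Col(A)
```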
7.12 The Invertible Matrix Theorem
We collect equivalent characterizations of invertibility, drawing on results from Chapters 3 through 5 together with the present chapter.
Theorem 7.11 (The Invertible Matrix Theorem) For A \in M_{n \times n}(\mathbb{F}), the following are equivalent:
1. A is invertible
2. A is row equivalent to I_n
3. A has n pivots
4. \operatorname{rank}(A) = n
5. \ker(A) = \{0\}
6. The columns of A are linearly independent
7. The columns of A span \mathbb{F}^n
8. The columns of A form a basis of \mathbb{F}^n
9. The map x \mapsto Ax is injective
10. The map x \mapsto Ax is surjective
11. The map x \mapsto Ax is bijective
12. For every b \in \mathbb{F}^n, the equation Ax = b has exactly one solution
13. A^T is invertible
14. The rows of A are linearly independent
15. The rows of A span \mathbb{F}^n
16. The rows of A form a basis of \mathbb{F}^n
Proof. We established most of these equivalences in previous chapters.
(1) \iff (4): By Theorem 5.22.
(4) \iff (5): By Theorem 4.5, \operatorname{rank}(A) = n \iff \dim \ker(A) = 0 \iff \ker(A) = \{0\}.
(5) \iff (6): The columns are linearly independent if and only if the only solution to Ax = 0 is x = 0.
(6) \iff (7): By Theorem 3.9, an independent set in \mathbb{F}^n has size at most n, so n independent columns form a maximal independent set, hence a basis by Theorem 3.6, hence a spanning set. Conversely, a spanning set of size n is minimal: removing any vector leaves n - 1 vectors, and if these spanned \mathbb{F}^n, then by Theorem 3.9 every independent set would have at most n - 1 elements, contradicting the independence of the standard basis. A minimal spanning set is a basis by Theorem 3.6, hence independent.
(6) \land (7) \iff (8): A set is a basis if and only if it is both independent and spanning.
(5) \iff (9): By Theorem 4.6, injectivity is equivalent to trivial kernel.
(7) \iff (10): Spanning \mathbb{F}^n means the image is all of \mathbb{F}^n, which is surjectivity.
(9) \land (10) \iff (11): Bijectivity is injectivity and surjectivity.
(11) \iff (12): Bijectivity means for every b there exists a unique x with Ax = b.
(1) \iff (2): By Theorem 7.7.
(2) \iff (3): By Theorem 7.2, the reduced row echelon form of A is I_n if and only if A has n pivots.
(1) \iff (13): Since (A^T)^{-1} = (A^{-1})^T, invertibility of A is equivalent to invertibility of A^T.
(13) \iff (14) \iff (15) \iff (16): Apply equivalences (1) \iff (6) \iff (7) \iff (8) to A^T, noting that rows of A are columns of A^T. \square
This theorem unifies the geometric perspective (kernel, image, basis), the algebraic perspective (linear independence, span), and the computational perspective (pivots, row equivalence, matrix invertibility). It is a centerpiece of introductory linear algebra, though each equivalence should feel natural once the preceding ideas are in place.
7.13 Closing Remarks
Gaussian elimination bridges abstract theory and concrete computation. Row operations preserve the solution set while transforming the matrix to a form where solutions are transparent. The machinery of row echelon forms, pivot positions, and free variables provides systematic algorithms for the fundamental questions: Is Ax = b consistent? What is \ker(A)? Is A invertible, and if so, what is A^{-1}?
The solution structure—general = particular + homogeneous—reflects the affine geometry of linear equations. Solution sets are translates of the kernel, and the kernel is the solution space of the associated homogeneous system.
Row equivalence is an equivalence relation preserving rank and kernel but not column space. The pivot columns of the original matrix form a basis of \operatorname{Col}(A), extracting maximal independent information from A. Non-pivot columns correspond to free variables and degrees of freedom in the kernel.
LU decomposition factors the elimination process, separating the lower triangular operations from the upper triangular result. This is the first of many matrix factorizations in linear algebra, each adapted to different computational or theoretical purposes. The next chapter develops determinants, providing a coordinate-free measure of how linear maps scale volumes and yielding an algebraic criterion for invertibility.