7  Systems of Linear Equations

7.1 The Central Problem

The theory developed thus far has been abstract. Vector spaces, linear maps, bases, and coordinate representations provide a conceptual framework. We now turn to computation.

The fundamental algorithmic problem of linear algebra is: given A \in M_{m \times n}(\mathbb{F}) and b \in \mathbb{F}^m, find all x \in \mathbb{F}^n satisfying Ax = b.

This is a system of linear equations. Writing A = \begin{pmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{pmatrix} where a_j denotes the j-th column, the equation Ax = b becomes x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b. The question is geometric: Does b lie in the span of the columns of A? If so, what are the coefficients (x_1, \ldots, x_n) expressing b as a linear combination?

In the language of Chapter 3, this asks: given a linear map T : \mathcal{V} \to \mathcal{W} represented by A and a target w \in \mathcal{W} represented by b, find all v \in \mathcal{V} with T(v) = w. The geometric question “b \in \operatorname{im}(T)?” becomes the computational question “Can we solve Ax = b?”

Three cases arise:

  1. No solution: b \notin \operatorname{span}(a_1, \ldots, a_n), the system is inconsistent

  2. Unique solution: b \in \operatorname{span}(a_1, \ldots, a_n) and \{a_1, \ldots, a_n\} is linearly independent

  3. Infinitely many solutions: b \in \operatorname{span}(a_1, \ldots, a_n) and \{a_1, \ldots, a_n\} is linearly dependent

When solutions exist, their structure is controlled by the kernel. This chapter develops the algorithmic machinery—row operations, Gaussian elimination, echelon forms—that determines which case holds and constructs all solutions systematically.
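As a computational illustration (a minimal NumPy sketch; NumPy is not part of the text's development, and the name classify_system is chosen here), the three cases can be distinguished by comparing the rank of A, the rank of the augmented matrix, and the number of unknowns—a criterion justified in Section 7.7:

```python
import numpy as np

def classify_system(A, b):
    """Distinguish the three cases for Ax = b by comparing rank(A),
    rank([A | b]), and the number of unknowns n."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_Ab > rank_A:
        return "no solution"        # b is not in the span of the columns
    if rank_A == A.shape[1]:
        return "unique solution"    # columns are linearly independent
    return "infinitely many"        # consistent, with a nontrivial kernel

assert classify_system([[1, 0], [0, 1]], [1, 2]) == "unique solution"
assert classify_system([[1, 1], [2, 2]], [1, 3]) == "no solution"
assert classify_system([[1, 1], [2, 2]], [1, 2]) == "infinitely many"
```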


7.2 Elementary Row Operations

A row operation on A \in M_{m \times n}(\mathbb{F}) is one of three types:

  1. (Row interchange) Exchange rows i and j
  2. (Row scaling) Multiply row i by nonzero c \in \mathbb{F}
  3. (Row replacement) Replace row i by itself plus c times row j, where i \neq j

Geometrically, row operations correspond to changing coordinates in the codomain \mathbb{F}^m. They transform the system Ax = b into an equivalent system \tilde{A}x = \tilde{b} with the same solution set, but one in which the structure of \tilde{A} makes the solutions evident.

Each row operation can be realized as left multiplication by an elementary matrix: a matrix obtained by performing that operation on the identity I_m.

Definition 7.1 (Elementary matrix) An elementary matrix is obtained by performing a single row operation on I_m.

Let E_{ij} denote the matrix obtained by swapping rows i and j of I_m. Then E_{ij}A swaps rows i and j of A.

Let E_i(c) denote the matrix obtained by multiplying row i of I_m by c \neq 0. Then E_i(c)A multiplies row i of A by c.

Let E_{ij}(c) denote the matrix obtained by adding c times row j to row i in I_m (where i \neq j). Then E_{ij}(c)A adds c times row j of A to row i.

Theorem 7.1 Every elementary matrix is invertible. The inverses are: E_{ij}^{-1} = E_{ij}, \quad E_i(c)^{-1} = E_i(c^{-1}), \quad E_{ij}(c)^{-1} = E_{ij}(-c).

Proof. Each row operation has a natural inverse: swapping rows i and j twice returns to the original configuration; scaling by c then by c^{-1} is the identity; adding c times row j to row i, then adding -c times row j to row i, cancels. \square

A sequence of row operations corresponds to left multiplication by a product of elementary matrices. If B is obtained from A by row operations, then B = E_k \cdots E_1 A for some elementary matrices E_1, \ldots, E_k. The product P = E_k \cdots E_1 is invertible, belonging to \mathrm{GL}_m(\mathbb{F}).

Row operations preserve the solution set of Ax = b. If B = PA and c = Pb where P \in \mathrm{GL}_m(\mathbb{F}), then Ax = b if and only if Bx = c. This is because P is invertible: Ax = b \iff PAx = Pb \iff Bx = c.

Geometrically, multiplying both sides by P applies a change of coordinates to the codomain. The image of A in the new coordinate system is the image of PA, and the target b in the new coordinates is Pb. The question “Does b lie in \operatorname{im}(A)?” becomes “Does Pb lie in \operatorname{im}(PA)?”—which has the same answer, since P is an isomorphism.
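Elementary matrices are easy to construct and test numerically. The following NumPy sketch (the function names E_swap, E_scale, E_replace are chosen here for illustration) verifies that left multiplication performs the corresponding row operation and that the inverses of Theorem 7.1 behave as stated:

```python
import numpy as np

def E_swap(m, i, j):        # interchange rows i and j of I_m
    E = np.eye(m); E[[i, j]] = E[[j, i]]; return E

def E_scale(m, i, c):       # multiply row i of I_m by c != 0
    E = np.eye(m); E[i, i] = c; return E

def E_replace(m, i, j, c):  # add c times row j to row i of I_m
    E = np.eye(m); E[i, j] = c; return E

A = np.array([[1., 2.], [3., 4.], [5., 6.]])

# Left multiplication performs the row operation on A.
assert np.allclose(E_swap(3, 0, 1) @ A, A[[1, 0, 2]])
assert np.allclose((E_replace(3, 2, 0, -5.0) @ A)[2], A[2] - 5.0 * A[0])

# Theorem 7.1: the stated inverses undo each operation.
assert np.allclose(E_swap(3, 0, 1) @ E_swap(3, 0, 1), np.eye(3))
assert np.allclose(E_scale(3, 1, 2.0) @ E_scale(3, 1, 0.5), np.eye(3))
assert np.allclose(E_replace(3, 2, 0, 4.0) @ E_replace(3, 2, 0, -4.0), np.eye(3))
```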


7.3 Row Echelon Form

The goal of row operations is to transform A into a form where its structure is transparent.

Definition 7.2 (Leading entry and pivot) The leading entry of a nonzero row is its leftmost nonzero entry. A pivot position is the location of a leading entry. A pivot column is a column containing a pivot position.

Definition 7.3 (Row echelon form) A matrix is in row echelon form if:

  1. All nonzero rows are above any rows of all zeros
  2. Each leading entry is in a column to the right of the leading entry in the row above
  3. All entries in a column below a leading entry are zero

In row echelon form, the leading entries form a “staircase” descending from left to right: \begin{pmatrix} \boxed{*} & * & * & * & * & * \\ 0 & 0 & \boxed{*} & * & * & * \\ 0 & 0 & 0 & \boxed{*} & * & * \\ 0 & 0 & 0 & 0 & 0 & \boxed{*} \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} where \boxed{*} denotes a pivot (leading entry) and * denotes an arbitrary entry. Entries below pivots are zero; entries above and to the right of pivots may be nonzero.

Definition 7.4 (Reduced row echelon form) A matrix is in reduced row echelon form if it is in row echelon form and additionally:

  1. Each leading entry equals 1
  2. Each leading entry is the only nonzero entry in its entire column

In reduced row echelon form, each pivot is 1 and is the only nonzero entry in its column: \begin{pmatrix} \boxed{1} & * & 0 & 0 & * & 0 \\ 0 & 0 & \boxed{1} & 0 & * & 0 \\ 0 & 0 & 0 & \boxed{1} & * & 0 \\ 0 & 0 & 0 & 0 & 0 & \boxed{1} \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

Reduced row echelon form is unique: every matrix is row equivalent to exactly one matrix in reduced row echelon form. Row echelon form is not unique—different sequences of row operations may yield different row echelon forms—but the pivot positions are the same in all row echelon forms of a given matrix.

Geometrically, row echelon form reveals the rank of A (the number of pivots) and identifies a maximal linearly independent subset of the rows. The pivot columns in the original matrix A form a basis of the column space.


7.4 Gaussian Elimination

Gaussian elimination is the systematic procedure for reducing a matrix to row echelon form via row operations.

Forward phase (reduction to row echelon form):

Beginning with the leftmost column, locate the first column containing a nonzero entry (the leftmost pivot column). If necessary, interchange rows to move a nonzero entry to the top position. This entry becomes the first pivot. Use row replacement operations to create zeros below this pivot. Repeat this process on the remaining submatrix below the first row, working from left to right until all rows are processed.

The result is a matrix in row echelon form.

Backward phase (reduction to reduced row echelon form):

Beginning with the rightmost pivot, scale its row so the pivot becomes 1. Use row replacement operations to create zeros above this pivot. Move leftward to the next pivot and repeat, continuing until all pivots are 1 and all entries above pivots are zero.

The result is the unique reduced row echelon form.
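Both phases can be combined into a single routine that produces the reduced row echelon form directly. The following is a minimal NumPy sketch (with partial pivoting—choosing the largest available entry as pivot—a numerical refinement not needed in exact arithmetic):

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduce A to reduced row echelon form; return (R, pivot_columns)."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    pivots = []
    row = 0
    for col in range(n):
        if row >= m:
            break
        # Pick the largest entry in this column (partial pivoting).
        p = row + int(np.argmax(np.abs(R[row:, col])))
        if abs(R[p, col]) < tol:
            continue                      # no pivot in this column
        R[[row, p]] = R[[p, row]]         # row interchange
        R[row] /= R[row, col]             # scale the pivot to 1
        for r in range(m):                # clear the rest of the column
            if r != row:
                R[r] -= R[r, col] * R[row]
        pivots.append(col)
        row += 1
    return R, pivots

R, piv = rref([[1., 2., 0., -1.], [0., 0., 1., 2.], [1., 2., 1., 1.]])
assert piv == [0, 2]
assert np.allclose(R, [[1, 2, 0, -1], [0, 0, 1, 2], [0, 0, 0, 0]])
```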

Consider the following example showing the structure: \begin{pmatrix} 2 & -4 & 6 & -2 & 8 \\ -1 & 2 & -3 & 4 & 2 \\ 3 & -6 & 9 & -1 & 16 \\ 1 & -2 & 3 & 1 & 8 \end{pmatrix} \sim \begin{pmatrix} \boxed{2} & -4 & 6 & -2 & 8 \\ 0 & 0 & 0 & \boxed{3} & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} This is in row echelon form with pivots in columns 1 and 4. Further row operations yield reduced row echelon form: \begin{pmatrix} \boxed{1} & -2 & 3 & 0 & 6 \\ 0 & 0 & 0 & \boxed{1} & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}

The pivots are in columns 1 and 4. Columns 2, 3, and 5 are free columns (non-pivot columns). The presence of free columns indicates that the columns are linearly dependent, and correspondingly, the kernel of A is nontrivial.

Theorem 7.2 Every matrix can be reduced to row echelon form via row operations. Moreover, every matrix is row equivalent to a unique reduced row echelon form.

Proof. The forward phase of Gaussian elimination terminates in finitely many steps since each iteration either creates a new pivot or moves to a row below. The process produces a matrix in row echelon form.

For uniqueness of reduced row echelon form, observe that the pivot positions are determined by the original matrix: column j is a pivot column precisely when it is not a linear combination of the preceding columns, an intrinsic property unaffected by row operations. Once pivot positions are fixed, the requirement that each pivot is 1 and the only nonzero entry in its column determines all entries uniquely. \square


7.5 The Solution Set of Ax = b

To solve Ax = b, form the augmented matrix [A \mid b] \in M_{m \times (n+1)}(\mathbb{F}) and reduce to row echelon form (or reduced row echelon form).

The augmented matrix represents the system explicitly: the first n columns encode A, and the final column encodes b. Row operations applied to [A \mid b] simultaneously transform both A and b, preserving the solution set.

Consistency. The system is inconsistent if the reduced row echelon form of [A \mid b] contains a row of the form \begin{pmatrix} 0 & 0 & \cdots & 0 & \mid & c \end{pmatrix} where c \neq 0. This row represents the equation 0 = c, which has no solution. Geometrically, it indicates that b does not lie in the column space of A—there is no linear combination of the columns of A that produces b.

If no such row appears, the system is consistent: solutions exist.

Free and pivot variables. Variables corresponding to pivot columns are pivot variables (also called basic variables). Variables corresponding to non-pivot columns are free variables. Free variables can take any value; pivot variables are determined by the free variables via back substitution.

Consider a system with augmented matrix reducing to: \begin{pmatrix} \boxed{1} & 2 & 0 & -1 & \mid & 3 \\ 0 & 0 & \boxed{1} & 2 & \mid & -1 \\ 0 & 0 & 0 & 0 & \mid & 0 \end{pmatrix} Pivot variables are x_1 and x_3; free variables are x_2 and x_4. Setting x_2 = s and x_4 = t, we obtain from the first row x_1 + 2s - t = 3 \implies x_1 = 3 - 2s + t, and from the second row x_3 + 2t = -1 \implies x_3 = -1 - 2t. The general solution is \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \\ -1 \\ 0 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 0 \\ -2 \\ 1 \end{pmatrix}.

The solution set is a two-dimensional affine subspace of \mathbb{F}^4: a translate of a plane (the kernel) by the vector (3, 0, -1, 0)^T (a particular solution).
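The computation can be checked numerically: for any choice of the parameters s and t, the resulting vector solves the system, and the two direction vectors lie in the kernel. A NumPy sketch:

```python
import numpy as np

# The coefficient matrix and right-hand side from the worked example above.
A = np.array([[1., 2., 0., -1.],
              [0., 0., 1., 2.],
              [0., 0., 0., 0.]])
b = np.array([3., -1., 0.])

x_p = np.array([3., 0., -1., 0.])    # particular solution (s = t = 0)
k1  = np.array([-2., 1., 0., 0.])    # kernel direction attached to x2 = s
k2  = np.array([1., 0., -2., 1.])    # kernel direction attached to x4 = t

# The kernel vectors satisfy A k = 0, and every choice of s, t solves the system.
assert np.allclose(A @ k1, 0) and np.allclose(A @ k2, 0)
rng = np.random.default_rng(0)
for _ in range(5):
    s, t = rng.standard_normal(2)
    assert np.allclose(A @ (x_p + s * k1 + t * k2), b)
```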


7.6 Structure of Solution Sets

The form exhibited above is universal.

Theorem 7.3 (General solution formula) If Ax = b is consistent and x_p is any particular solution, then the set of all solutions is \{x_p + x_h : x_h \in \ker(A)\}.

Proof. If Ax_p = b and Ax_h = 0, then A(x_p + x_h) = Ax_p + Ax_h = b + 0 = b, so x_p + x_h is a solution.

Conversely, if Ax = b, then A(x - x_p) = Ax - Ax_p = b - b = 0, so x - x_p \in \ker(A). Thus x = x_p + (x - x_p) where x - x_p \in \ker(A). \square

This decomposition has geometric content. The solution set to Ax = b is an affine subspace of \mathbb{F}^n: a translate of the linear subspace \ker(A) by the vector x_p. If \ker(A) = \{0\}, the solution is unique (the affine subspace is a single point). If \dim \ker(A) = k > 0, the solution set is a k-dimensional affine subspace—a translated version of \ker(A).

The homogeneous system Ax = 0 always has solutions (at minimum, x = 0). Its solution set is precisely \ker(A), a linear subspace of \mathbb{F}^n of dimension n - \operatorname{rank}(A) by Theorem 4.5 (the rank-nullity theorem from Chapter 3).
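Numerically, the kernel dimension and a kernel basis can be obtained from the singular value decomposition (a tool from numerical linear algebra, used here only for verification): the right singular vectors beyond the rank span \ker(A). A sketch:

```python
import numpy as np

# Rank-nullity check: dim ker(A) = n - rank(A).
A = np.array([[1., 2., 0., -1.],
              [0., 0., 1., 2.],
              [1., 2., 1., 1.]])      # row 3 = row 1 + row 2, so rank(A) = 2
n = A.shape[1]
rank = np.linalg.matrix_rank(A)
nullity = n - rank
assert nullity == 2

# Kernel basis via SVD: right singular vectors past the rank.
_, s, Vt = np.linalg.svd(A)
kernel_basis = Vt[rank:]
assert np.allclose(A @ kernel_basis.T, 0)
```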

The general solution formula reflects a principle pervasive in mathematics: solutions to inhomogeneous equations are obtained by adding a particular solution to the general solution of the associated homogeneous equation. This structure appears in differential equations (general solution = particular + complementary), in affine geometry (affine subspaces are translates of linear subspaces), and throughout analysis.


7.7 Consistency and Rank

A system Ax = b is consistent if and only if b \in \operatorname{Col}(A)—equivalently, b is a linear combination of the columns of A.

Theorem 7.4 The following are equivalent:

  1. Ax = b is consistent

  2. b \in \operatorname{Col}(A)

  3. \operatorname{rank}([A \mid b]) = \operatorname{rank}(A)

  4. The reduced row echelon form of [A \mid b] has no pivot in the last column

Proof. (1) \iff (2) is the definition of consistency in geometric language.

(2) \iff (3): The rank of [A \mid b] is the dimension of the span of its columns. If b is a linear combination of the columns of A, adding b as an additional column does not increase the rank. Conversely, if \operatorname{rank}([A \mid b]) > \operatorname{rank}(A), then b is not in the span of the columns of A.

(3) \iff (4): In reduced row echelon form, a pivot in the last column of [A \mid b] corresponds to a row [0 \; \cdots \; 0 \mid c] with c \neq 0, which increases the rank by 1. Thus \operatorname{rank}([A \mid b]) = \operatorname{rank}(A) + 1 in this case. If no such pivot appears, the ranks are equal. \square

For square systems (m = n), consistency for all b is equivalent to invertibility of A:

Theorem 7.5 Let A \in M_{n \times n}(\mathbb{F}). The following are equivalent:

  1. For every b \in \mathbb{F}^n, the system Ax = b has at least one solution

  2. The linear map x \mapsto Ax is surjective

  3. \operatorname{rank}(A) = n

  4. A has a pivot in every row

  5. A is invertible

Proof. (1) \iff (2) by definition of surjectivity.

(2) \iff (3) since surjectivity means \operatorname{im}(A) = \mathbb{F}^n, equivalently \dim \operatorname{im}(A) = n.

(3) \iff (4) since the rank equals the number of pivots, and there are n rows.

(3) \iff (5) for square matrices: if \operatorname{rank}(A) = n, then \dim \ker(A) = 0 by Theorem 4.5, so A is injective, hence bijective (by Theorem 4.8 from Chapter 3), hence invertible (by Theorem 5.21 from Chapter 4). \square


7.8 Row Equivalence

Two matrices A, B \in M_{m \times n}(\mathbb{F}) are row equivalent, written A \sim B, if one can be obtained from the other by a finite sequence of row operations. Equivalently, there exists P \in \mathrm{GL}_m(\mathbb{F}) such that B = PA.

Theorem 7.6 Row equivalence is an equivalence relation on M_{m \times n}(\mathbb{F}). If A \sim B, then:

  1. \operatorname{rank}(A) = \operatorname{rank}(B)

  2. \ker(A) = \ker(B)

  3. A and B have the same reduced row echelon form

Proof. For the equivalence relation: reflexivity holds since A = I_m A; symmetry holds since B = PA implies A = P^{-1}B; transitivity holds since B = P_1 A and C = P_2 B imply C = P_2 P_1 A.

For (1), rank is the dimension of the column space. Since B = PA with P invertible, the map x \mapsto Bx equals P composed with x \mapsto Ax. The image of B is the image of A under the isomorphism P, so \dim \operatorname{im}(B) = \dim \operatorname{im}(A).

For (2), Bx = 0 \iff PAx = 0 \iff Ax = 0 since P is invertible.

For (3), this follows from Theorem 7.2, which established uniqueness of reduced row echelon form. \square

Row equivalent matrices represent the same linear system in different coordinates in the codomain. The geometric content—the kernel and the rank—is preserved.
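A quick numerical check of Theorem 7.6: multiplying A on the left by an invertible P preserves rank and kernel. A NumPy sketch (the particular matrices are chosen here for illustration):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # rank 1: row 2 = 2 * row 1

rng = np.random.default_rng(1)
P = rng.standard_normal((2, 2))
while abs(np.linalg.det(P)) < 1e-6:     # ensure P is invertible
    P = rng.standard_normal((2, 2))
B = P @ A                               # B is row equivalent to A

# (1) Rank is preserved.
assert np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A) == 1

# (2) Kernel is preserved: x in ker(A) iff x in ker(B).
x = np.array([3., 0., -1.])             # in ker(A) since column 3 = 3 * column 1
assert np.allclose(A @ x, 0) and np.allclose(B @ x, 0)
```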


7.9 Computing Matrix Inverses

If A \in M_{n \times n}(\mathbb{F}) is invertible, we compute A^{-1} by row reduction.

Row operations correspond to left multiplication by elementary matrices. Reducing A to I_n means finding a product of elementary matrices P = E_k \cdots E_1 such that PA = I_n. But then P = A^{-1}.

To find P, perform the same row operations on I_n. Form the augmented matrix [A \mid I_n] \in M_{n \times 2n}(\mathbb{F}) and row reduce. If A is invertible, the reduction yields [I_n \mid A^{-1}].

Why this works: The row operations transforming A into I_n correspond to P with PA = I_n. Applying the same operations to I_n gives PI_n = P = A^{-1}.

If A is not invertible, the left half will not reduce to I_n—there will be a row of zeros, indicating rank deficiency.
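The procedure can be sketched directly in NumPy (the function name inverse_by_row_reduction is chosen here; partial pivoting is added for numerical stability):

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Invert A by reducing [A | I_n] to [I_n | A^{-1}]."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])           # the augmented matrix [A | I_n]
    for col in range(n):
        p = col + int(np.argmax(np.abs(M[col:, col])))
        if abs(M[p, col]) < tol:
            raise ValueError("matrix is not invertible")
        M[[col, p]] = M[[p, col]]           # row interchange
        M[col] /= M[col, col]               # scale the pivot to 1
        for r in range(n):                  # clear the rest of the column
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]                         # the right half is A^{-1}

A = np.array([[2., 1.], [5., 3.]])
Ainv = inverse_by_row_reduction(A)
assert np.allclose(A @ Ainv, np.eye(2))
assert np.allclose(Ainv, [[3., -1.], [-5., 2.]])
```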

Theorem 7.7 A \in M_{n \times n}(\mathbb{F}) is invertible if and only if A is row equivalent to I_n. Equivalently, the reduced row echelon form of A is I_n.

Proof. If A \sim I_n, then A = P^{-1}I_n = P^{-1} for some P \in \mathrm{GL}_n(\mathbb{F}), so A is invertible.

Conversely, if A is invertible, then \operatorname{rank}(A) = n by Theorem 5.22. The reduced row echelon form has n pivots. The only n \times n matrix in reduced row echelon form with n pivots is I_n (one pivot per row and column, each equal to 1, with zeros elsewhere). \square


7.10 LU Decomposition

Gaussian elimination can be organized into a matrix factorization revealing the structure of the reduction process.

If A can be reduced to row echelon form using only row replacement operations (no row interchanges), we can write A = LU where L is lower triangular with ones on the diagonal and U is upper triangular (in row echelon form).

Definition 7.5 (Triangular matrices) A matrix L \in M_{n \times n}(\mathbb{F}) is lower triangular if \ell_{ij} = 0 for i < j (all entries above the diagonal are zero). It is upper triangular if u_{ij} = 0 for i > j (all entries below the diagonal are zero).

Definition 7.6 (LU decomposition) A matrix A \in M_{n \times n}(\mathbb{F}) has an LU decomposition if A = LU where L is lower triangular with \ell_{ii} = 1 for all i, and U is upper triangular.

Construction: Perform forward elimination (reduction to row echelon form) without row swaps, recording the multipliers used in the row replacement operations. The matrix U is the resulting upper triangular matrix. The matrix L encodes the multipliers: if we subtracted m_{ji} times row i from row j during elimination, we place m_{ji} in position (j, i) of L.

For the generic pattern: A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} After elimination without row swaps: L = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{pmatrix}, \quad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix} where m_{ji} is the multiplier used to eliminate a_{ji} during the reduction.

Utility of LU decomposition: Solving Ax = b becomes:

  1. Solve Ly = b by forward substitution (easy since L is lower triangular with ones on the diagonal)

  2. Solve Ux = y by back substitution (easy since U is upper triangular)

If we solve many systems with the same A but different b, we compute the LU decomposition once and reuse it for each new b. This is more efficient than repeatedly row-reducing [A \mid b].
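A minimal NumPy sketch of this workflow, assuming no row interchanges are needed (the hypothesis of Theorem 7.8 below); the function names are chosen here for illustration:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle-style LU without row interchanges, assuming nonzero pivots."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            m = U[j, i] / U[i, i]     # multiplier m_{ji}
            U[j] -= m * U[i]          # row replacement
            L[j, i] = m               # record the multiplier in L
    return L, U

def lu_solve(L, U, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                # forward substitution: Ly = b
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):      # back substitution: Ux = y
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
b = np.array([1., 2., 5.])
L, U = lu_no_pivot(A)
assert np.allclose(L @ U, A)                  # the factorization is exact
assert np.allclose(A @ lu_solve(L, U, b), b)  # and it solves the system
```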

Theorem 7.8 If A \in M_{n \times n}(\mathbb{F}) can be reduced to row echelon form using only row replacement operations (no row interchanges), then A has an LU decomposition.

Proof. Each row replacement operation is left multiplication by an elementary matrix E_{ij}(c) with i > j (adding a multiple of row j to row i below it). Such matrices are lower triangular. If E_k \cdots E_1 A = U where U is upper triangular, then A = E_1^{-1} \cdots E_k^{-1} U. Since E_{ij}(c)^{-1} = E_{ij}(-c) is also lower triangular, the product L = E_1^{-1} \cdots E_k^{-1} is lower triangular. Moreover, each E_{ij}(c) has ones on the diagonal, so L has ones on the diagonal. \square

If row interchanges are required, we obtain a PLU decomposition: PA = LU where P is a permutation matrix (a product of row interchange elementary matrices). Permutation matrices are in \mathrm{GL}_n(\mathbb{F}) and represent reordering of rows.

7.10.1 LDU Decomposition

The LU decomposition can be further refined by extracting the diagonal entries of U, separating the scaling from the elimination structure.

Definition 7.7 (LDU decomposition) A matrix A \in M_{n \times n}(\mathbb{F}) has an LDU decomposition if A = LDU where:

  • L is lower triangular with \ell_{ii} = 1 for all i

  • D is diagonal

  • U is upper triangular with u_{ii} = 1 for all i

If A = L_1 U_1 is an LU decomposition with L_1 unit lower triangular and U_1 upper triangular, we factor out the diagonal of U_1: U_1 = \begin{pmatrix} d_1 & u_{12} & u_{13} & \cdots \\ 0 & d_2 & u_{23} & \cdots \\ 0 & 0 & d_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} = \begin{pmatrix} d_1 & 0 & 0 & \cdots \\ 0 & d_2 & 0 & \cdots \\ 0 & 0 & d_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} 1 & u_{12}/d_1 & u_{13}/d_1 & \cdots \\ 0 & 1 & u_{23}/d_2 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} = DU where D = \operatorname{diag}(d_1, \ldots, d_n) with d_i the i-th diagonal entry of U_1, and the new U has ones on the diagonal.

Thus A = L_1 U_1 = L_1 D U = LDU where we relabel L = L_1.
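The diagonal extraction is a one-line computation: divide each row of U_1 by its pivot. A NumPy sketch:

```python
import numpy as np

# Factor an upper triangular U1 (nonzero diagonal) as D times a unit-diagonal U.
U1 = np.array([[2., 4., 6.],
               [0., 3., 3.],
               [0., 0., 5.]])
d = np.diag(U1)                  # the pivots d_1, ..., d_n
D = np.diag(d)
U = U1 / d[:, None]              # divide each row by its pivot
assert np.allclose(D @ U, U1)
assert np.allclose(np.diag(U), 1.0)   # U now has unit diagonal
```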

Theorem 7.9 If A \in M_{n \times n}(\mathbb{F}) has an LU decomposition A = L_1 U_1 and all leading principal minors of A are nonzero (equivalently, the pivots d_1, \ldots, d_n are all nonzero), then A has an LDU decomposition A = LDU.

Proof. From A = L_1 U_1, the diagonal entries of U_1 are the pivots: d_i = (U_1)_{ii}. If all pivots are nonzero, we can factor U_1 = DU as shown above, where D = \operatorname{diag}(d_1, \ldots, d_n) and U is obtained from U_1 by dividing each row by its pivot (the diagonal entry). This yields A = L_1 D U = LDU. \square

Uniqueness. If A = LDU is an LDU decomposition, it is unique (assuming the pivots are nonzero). To see this, suppose A = L_1 D_1 U_1 = L_2 D_2 U_2 are two LDU decompositions. Then L_1 D_1 U_1 = L_2 D_2 U_2, giving L_2^{-1} L_1 = D_2 U_2 U_1^{-1} D_1^{-1}. The left side is lower triangular with ones on the diagonal; the right side is upper triangular. A matrix that is simultaneously unit lower triangular and upper triangular must be the identity. Thus L_1 = L_2, which forces D_1 U_1 = D_2 U_2. Comparing diagonal entries (each U_i has unit diagonal) gives D_1 = D_2, and hence U_1 = U_2.

Geometric interpretation. The LDU decomposition separates three operations:

  1. L performs elimination below the diagonal: Lower triangular operations to clear entries below each pivot

  2. D performs scaling at pivots: Diagonal scaling capturing the “size” of the pivots

  3. U performs back-substitution structure: Upper triangular operations capturing relationships among pivot columns

This factorization makes certain properties transparent. For instance, \det(A) = \det(L) \det(D) \det(U) = 1 \cdot \det(D) \cdot 1 = \prod_{i=1}^n d_i. The determinant equals the product of the pivots.

Symmetric matrices. If A is symmetric (A = A^T) and has an LDU decomposition, then A = LDU implies A^T = (LDU)^T = U^T D^T L^T = U^T D L^T. But A = A^T, so LDU = U^T D L^T. By uniqueness of LDU decomposition (when it exists), we have L = U^T (equivalently, U = L^T). Thus for symmetric matrices: A = LDL^T This is the LDL^T decomposition, requiring only L and D to represent A—more efficient than the general LDU decomposition.

For positive definite symmetric matrices (all pivots positive), this becomes the foundation of the Cholesky decomposition, where we write D = \tilde{D}^2 with \tilde{D} = \operatorname{diag}(\sqrt{d_1}, \ldots, \sqrt{d_n}) and obtain A = (L\tilde{D})(L\tilde{D})^T = LL^T with a different lower triangular L. This decomposition is central to numerical linear algebra.
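The chain A = LDL^T = (L\tilde{D})(L\tilde{D})^T can be verified numerically; the sketch below performs the elimination by hand on an example positive definite matrix (chosen here for illustration) and compares against NumPy's built-in Cholesky routine:

```python
import numpy as np

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])     # symmetric positive definite

# Eliminate without row swaps, recording multipliers (a minimal sketch).
U = A.copy()
n = A.shape[0]
L = np.eye(n)
for i in range(n):
    for j in range(i + 1, n):
        L[j, i] = U[j, i] / U[i, i]
        U[j] -= L[j, i] * U[i]
d = np.diag(U)                                     # the pivots, all positive

assert np.allclose(L @ np.diag(d) @ L.T, A)        # A = L D L^T

C = L @ np.diag(np.sqrt(d))                        # Cholesky factor L * sqrt(D)
assert np.allclose(C @ C.T, A)                     # A = C C^T
assert np.allclose(C, np.linalg.cholesky(A))       # matches NumPy's Cholesky
```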

Applications. The LDU decomposition is useful for:

  • Determinant computation (next chapter): \det(A) = \prod d_i
  • Rank determination: \operatorname{rank}(A) equals the number of nonzero entries in D
  • Inertia of symmetric matrices: For A = A^T, the number of positive, negative, and zero diagonal entries in D gives the inertia (Sylvester’s law)
  • Solving systems: Ax = b becomes LDUx = b, solved by forward substitution (Ly = b), diagonal scaling (Dz = y), and back substitution (Ux = z)

7.11 Pivot Columns and the Column Space

The pivot columns of A (as they appear in the original matrix, not in the reduced row echelon form) form a basis of \operatorname{Col}(A).

Theorem 7.10 Let A \in M_{m \times n}(\mathbb{F}) have pivots in columns j_1, \ldots, j_r. Then \{a_{j_1}, \ldots, a_{j_r}\} is a basis of \operatorname{Col}(A), where a_j denotes the j-th column of A.

Proof. For independence: suppose \sum_{k=1}^{r} c_k a_{j_k} = 0. This is a linear dependence relation among the columns of A. Row operations do not change dependence relations among columns (though they do change the columns themselves). In the reduced row echelon form, the columns in positions j_1, \ldots, j_r are standard basis vectors e_1, \ldots, e_r (up to reordering). These are linearly independent, forcing c_1 = \cdots = c_r = 0.

For spanning: every column of A is a linear combination of the pivot columns. This is evident from the reduced row echelon form, where non-pivot columns are explicitly expressed in terms of pivot columns. Thus \operatorname{Col}(A) \subseteq \operatorname{span}(a_{j_1}, \ldots, a_{j_r}). The reverse inclusion is immediate since the pivot columns lie in \operatorname{Col}(A). \square

Consequently, \operatorname{rank}(A) equals the number of pivots in any row echelon form of A. The pivot positions identify the maximal linearly independent subset of columns.

Reducing A to row echelon form reveals which columns are “essential” (pivot columns) and which are “redundant” (non-pivot columns, expressible in terms of pivot columns). The free variables correspond to degrees of freedom in the kernel; the pivot variables correspond to constraints.
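Theorem 7.10 suggests a simple numerical procedure: scan the columns from left to right and keep each column that increases the rank. A NumPy sketch (the function name pivot_columns is chosen here):

```python
import numpy as np

def pivot_columns(A):
    """Column j is a pivot column exactly when it is not a linear
    combination of the preceding columns, i.e. when the rank increases."""
    A = np.asarray(A, dtype=float)
    pivots, r = [], 0
    for j in range(A.shape[1]):
        if np.linalg.matrix_rank(A[:, :j + 1]) > r:
            pivots.append(j)
            r += 1
    return pivots

A = np.array([[1., 2., 3., 1.],
              [2., 4., 6., 3.],
              [1., 2., 3., 2.]])
piv = pivot_columns(A)
assert piv == [0, 3]              # columns 2 and 3 are multiples of column 1
B = A[:, piv]                     # basis of Col(A), taken from the original A
assert np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A) == len(piv)
```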


7.12 The Invertible Matrix Theorem

We collect equivalent characterizations of invertibility, drawing from Chapters 3, 4, and the present chapter.

Theorem 7.11 (The Invertible Matrix Theorem) For A \in M_{n \times n}(\mathbb{F}), the following are equivalent:

  1. A is invertible

  2. A is row equivalent to I_n

  3. A has n pivots

  4. \operatorname{rank}(A) = n

  5. \ker(A) = \{0\}

  6. The columns of A are linearly independent

  7. The columns of A span \mathbb{F}^n

  8. The columns of A form a basis of \mathbb{F}^n

  9. The map x \mapsto Ax is injective

  10. The map x \mapsto Ax is surjective

  11. The map x \mapsto Ax is bijective

  12. For every b \in \mathbb{F}^n, the equation Ax = b has exactly one solution

  13. A^T is invertible

  14. The rows of A are linearly independent

  15. The rows of A span \mathbb{F}^n

  16. The rows of A form a basis of \mathbb{F}^n

Proof. We established most of these equivalences in previous chapters.

(1) \iff (4): By Theorem 5.22.

(4) \iff (5): By Theorem 4.5, \dim \mathbb{F}^n = \operatorname{rank}(A) + \dim \ker(A), so \operatorname{rank}(A) = n \iff \dim \ker(A) = 0 \iff \ker(A) = \{0\}.

(5) \iff (6): The columns are linearly independent if and only if the only solution to Ax = 0 is x = 0.

(6) \iff (7): By Theorem 3.9, an independent set of n vectors in an n-dimensional space spans.

(6) \land (7) \iff (8): A set is a basis if and only if it is both independent and spanning.

(5) \iff (9): By Theorem 4.6, injectivity is equivalent to trivial kernel.

(7) \iff (10): Spanning \mathbb{F}^n means the image is all of \mathbb{F}^n, which is surjectivity.

(9) \land (10) \iff (11): Bijectivity is injectivity and surjectivity.

(11) \iff (12): Bijectivity means for every b there exists a unique x with Ax = b.

(1) \iff (2): By Theorem 7.7.

(2) \iff (3): By Theorem 7.2, the reduced row echelon form of A is I_n if and only if A has n pivots.

(1) \iff (13): Since (A^T)^{-1} = (A^{-1})^T, invertibility of A is equivalent to invertibility of A^T.

(13) \iff (14) \iff (15) \iff (16): Apply the equivalences (1) \iff (6) \iff (7) \iff (8) to A^T, noting that rows of A are columns of A^T. \square

This theorem unifies the geometric perspective (kernel, image, basis), the algebraic perspective (linear independence, span), and the computational perspective (pivots, row equivalence, matrix invertibility). It is central to the theory, albeit largely a consolidation of results established earlier.


7.13 Closing Remarks

Gaussian elimination bridges abstract theory and concrete computation. Row operations preserve the solution set while transforming the matrix to a form where solutions are transparent. The machinery of row echelon forms, pivot positions, and free variables provides systematic algorithms for the fundamental questions: Is Ax = b consistent? What is \ker(A)? Is A invertible, and if so, what is A^{-1}?

The solution structure—general = particular + homogeneous—reflects the affine geometry of linear equations. Solution sets are translates of the kernel, and the kernel itself is the solution space of the associated homogeneous system.

Row equivalence is an equivalence relation preserving rank and kernel but not column space. The pivot columns of the original matrix form a basis of \operatorname{Col}(A), extracting maximal independent information from A. Non-pivot columns correspond to free variables and degrees of freedom in the kernel.

LU decomposition factors the elimination process, separating the lower triangular operations from the upper triangular result. This is the first of many matrix factorizations in linear algebra, each adapted to different computational or theoretical purposes. The next chapter develops determinants, providing a coordinate-free measure of how linear maps scale volumes and yielding an algebraic criterion for invertibility.