15 Applications
The tools developed in this course — the change of variables theorem, Lagrange multipliers, the total derivative, the Laplacian, the Divergence theorem — show up throughout mathematics. This chapter shows them doing concrete work in probability, optimisation, and the theory of partial differential equations. The level is the same as the rest of the course: the arguments use what we have built, and nothing more.
15.1 Change of Variables for Probability Densities
A probability density function on \mathbb{R}^n is a non-negative smooth function f : \mathbb{R}^n \to \mathbb{R} with \int_{\mathbb{R}^n} f(x)\, dx = 1. The probability that a random vector X lies in a region \Omega is \mathbb{P}(X \in \Omega) = \int_\Omega f(x)\, dx.
The most important question about densities is: if X has density f_X and Y = g(X) for a smooth map g, what is the density of Y? This is exactly Theorem 10.3 (the change of variables theorem from Chapter 8).
Theorem 15.1 (Change of variables for densities) Let g : U \to V be a C^1 diffeomorphism between open sets in \mathbb{R}^n, and let X be a random vector with density f_X supported in U. Then Y = g(X) has density f_Y(y) = f_X(g^{-1}(y))\cdot |\det Dg^{-1}_y| = \frac{f_X(g^{-1}(y))}{|\det Dg_{g^{-1}(y)}|}.
Proof. For any region \Omega \subset V: \mathbb{P}(Y \in \Omega) = \mathbb{P}(X \in g^{-1}(\Omega)) = \int_{g^{-1}(\Omega)} f_X(x)\, dx = \int_\Omega f_X(g^{-1}(y))\,|\det Dg^{-1}_y|\, dy, where the last step is the change of variables theorem with x = g^{-1}(y). Since this holds for all \Omega, the integrand is f_Y. \square
The Jacobian factor |\det Dg^{-1}_y| = 1/|\det Dg_{g^{-1}(y)}| is what compensates for the stretching or compression of g: if g expands volumes by a factor of |\det Dg| near x, then the density at g(x) must be correspondingly smaller.
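The Jacobian compensation can be checked numerically. The following sketch (the choice g(x) = e^x and the quadrature grid are ours, purely for illustration) applies Theorem 15.1 in one dimension to a standard normal X and verifies that the transformed density still integrates to 1:

```python
# Numerical sanity check of Theorem 15.1: take X standard normal and
# g(x) = e^x, a C^1 diffeomorphism from R onto (0, infinity).  The
# theorem predicts f_Y(y) = f_X(log y) / y, since g^{-1}(y) = log y and
# |det Dg| at g^{-1}(y) is e^{log y} = y.
import numpy as np

def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def f_Y(y):
    # f_X(g^{-1}(y)) / |g'(g^{-1}(y))|  (this is the lognormal density)
    return f_X(np.log(y)) / y

# The transformed density should still integrate to 1 (Riemann sum;
# the truncation at y = 60 loses only ~2e-5 of mass).
y = np.linspace(1e-6, 60, 2_000_000)
total = np.sum(f_Y(y)) * (y[1] - y[0])
print(total)  # close to 1
```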
Example: the standard normal. The one-dimensional density f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2} integrates to 1 because \int_{-\infty}^\infty e^{-x^2/2}\,dx = \sqrt{2\pi} — the Gaussian integral from Chapter 8.
In \mathbb{R}^n, the density f_X(x) = \frac{1}{(2\pi)^{n/2}} e^{-\|x\|^2/2} also integrates to 1: it factors as \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-x_i^2/2}, and by Fubini the integral factors into n independent one-dimensional Gaussian integrals, each equal to 1.
Example: scaling. If X has density f_X and Y = aX for a scalar a \neq 0, then g(x) = ax has Dg = aI and |\det Dg| = |a|^n, so f_Y(y) = \frac{1}{|a|^n} f_X\!\left(\frac{y}{a}\right). Stretching the distribution by a divides the density by |a|^n — the total probability still integrates to 1.
Example: the multivariate Gaussian. A random vector X \sim \mathcal{N}(\mu, \Sigma) with mean \mu \in \mathbb{R}^n and covariance matrix \Sigma (symmetric, positive definite) has density f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}}\, e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}. To verify this integrates to 1: write \Sigma = LL^T (the Cholesky decomposition) and set y = L^{-1}(x-\mu), a linear map with dx = |\det L|\, dy = \sqrt{\det\Sigma}\, dy, which cancels the \sqrt{\det\Sigma} in the normalising constant. Under this substitution (x-\mu)^T\Sigma^{-1}(x-\mu) = \|y\|^2, so the integral becomes \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} e^{-\|y\|^2/2}\, dy = 1.
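The Cholesky substitution, read in reverse, is also how one samples from \mathcal{N}(\mu, \Sigma) in practice. A brief sketch (the particular \mu and \Sigma are arbitrary choices for illustration): drawing y \sim \mathcal{N}(0, I) and mapping x = \mu + Ly should produce samples with mean \mu and covariance LL^T = \Sigma.

```python
# Sampling N(mu, Sigma) via the Cholesky factor: if y ~ N(0, I), then
# x = mu + L y has mean mu and covariance L L^T = Sigma.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)          # Sigma = L L^T, L lower triangular

y = rng.standard_normal((200_000, 2))  # rows are samples of y ~ N(0, I)
x = mu + y @ L.T                       # x = mu + L y, row by row

print(x.mean(axis=0))                  # close to mu
print(np.cov(x, rowvar=False))         # close to Sigma
```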
If Y = AX + b for an invertible A, the density transformation theorem gives Y \sim \mathcal{N}(A\mu + b, A\Sigma A^T) — the Gaussian family is stable under invertible linear transformations, and the change of variables theorem is the entire proof.
15.2 Constrained Optimisation: the Least Squares Problem
Given a matrix A \in \mathbb{R}^{m \times n} with m > n and a vector b \in \mathbb{R}^m, the system Ax = b is overdetermined — more equations than unknowns — and generically has no solution. The least squares problem asks for the x that minimises \|Ax - b\|^2.
This is an unconstrained optimisation problem in \mathbb{R}^n. Setting f(x) = \|Ax - b\|^2 = (Ax-b)^T(Ax-b), we have \nabla f(x) = 2A^T(Ax - b). Setting \nabla f = 0 gives the normal equations: A^T A\, x = A^T b. The matrix A^T A is symmetric and positive semidefinite. When A has full column rank — its columns are linearly independent — A^T A is positive definite and invertible, giving the unique minimiser x^* = (A^T A)^{-1} A^T b. The second derivative test confirms this is a minimum: D^2f = 2A^T A, which is positive definite, so the unique critical point is a global minimum.
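The normal equations are easy to test numerically. A sketch (the random A and b are ours; any full-column-rank A works) that solves A^T A\,x = A^T b and compares against numpy's built-in least-squares routine:

```python
# Solve the normal equations for a random overdetermined system and
# check the answer against numpy's least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 4
A = rng.standard_normal((m, n))   # full column rank with probability 1
b = rng.standard_normal(m)

x_star = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference solution

print(np.allclose(x_star, x_ref))              # the two agree
# The residual is orthogonal to the column space: A^T (A x* - b) = 0.
print(np.allclose(A.T @ (A @ x_star - b), 0))
```

The orthogonality check in the last line is exactly the statement \nabla f(x^*) = 2A^T(Ax^* - b) = 0.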
The vector Ax^* = A(A^TA)^{-1}A^Tb is the orthogonal projection of b onto the column space of A — the point in the column space closest to b. This is a geometric statement — the residual b - Ax^* is perpendicular to every column of A — and it follows directly from \nabla f(x^*) = 2A^T(Ax^* - b) = 0.
Constrained least squares. If we additionally require Cx = d for a constraint matrix C \in \mathbb{R}^{k\times n}, the Lagrangian is \mathcal{L}(x,\lambda) = \|Ax-b\|^2 + \lambda^T(Cx-d), and the Lagrange conditions give the linear system \begin{pmatrix} 2A^TA & C^T \\ C & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} 2A^Tb \\ d \end{pmatrix}. This is solvable when A has full column rank and C has full row rank — exactly the rank conditions from Theorem 9.4 (Chapter 7). The Lagrange multiplier \lambda measures the sensitivity of the minimum of \|Ax-b\|^2 to changes in the constraint level d.
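The block system can be assembled and solved directly. A sketch (the matrices here are random, purely for illustration) that builds the system, solves it, and checks that both the constraint and the stationarity condition hold:

```python
# Assemble and solve the block system for constrained least squares:
# [ 2 A^T A   C^T ] [ x      ]   [ 2 A^T b ]
# [   C        0  ] [ lambda ] = [    d    ]
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 30, 5, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((k, n))   # full row rank with probability 1
d = rng.standard_normal(k)

KKT = np.block([[2 * A.T @ A, C.T],
                [C, np.zeros((k, k))]])
rhs = np.concatenate([2 * A.T @ b, d])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]

print(np.allclose(C @ x, d))                          # constraint holds
print(np.allclose(2 * A.T @ (A @ x - b) + C.T @ lam, 0))  # stationarity
```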
15.3 First-Order PDEs and the Method of Characteristics
A partial differential equation (PDE) is an equation involving a function u of several variables and its partial derivatives. The simplest class — first-order linear PDEs — can be understood entirely with the tools of this course. The key idea is that a first-order linear PDE defines a vector field, and solving the PDE reduces to integrating that vector field’s flow lines.
15.3.1 The simplest case: constant coefficients
Consider the PDE on \mathbb{R}^2: a\,\frac{\partial u}{\partial x} + b\,\frac{\partial u}{\partial y} = 0, \qquad a, b \in \mathbb{R}, \quad (a,b) \neq (0,0). This says that the directional derivative of u in the direction v = (a,b) is zero everywhere. In other words, Du_{(x,y)}(a,b) = 0 at every point — u is constant along every line parallel to v.
The lines parallel to v = (a,b) are the level sets of the linear function \ell(x,y) = bx - ay (which is constant on lines of direction (a,b) since D\ell\cdot(a,b) = ba - ab = 0). So the general solution is u(x,y) = F(bx - ay) for any smooth function F : \mathbb{R} \to \mathbb{R}. The PDE reduces to a single free function of one variable. The quantity bx - ay is conserved along the flow of the vector field (a,b), which is exactly what it means for the solution to be constant along those flow lines.
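One can confirm this numerically with finite differences. A sketch (the choices of F, a, b, and the sample point are arbitrary): any u(x,y) = F(bx - ay) should satisfy a\,u_x + b\,u_y = 0.

```python
# Check that u(x, y) = F(bx - ay) satisfies a u_x + b u_y = 0,
# using central differences at a sample point.
import numpy as np

a, b = 2.0, -3.0
F = np.tanh                          # any smooth F works

def u(x, y):
    return F(b * x - a * y)

h = 1e-5
x0, y0 = 0.7, -0.4
u_x = (u(x0 + h, y0) - u(x0 - h, y0)) / (2 * h)
u_y = (u(x0, y0 + h) - u(x0, y0 - h)) / (2 * h)
print(a * u_x + b * u_y)             # close to 0
```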
15.3.2 Variable coefficients and characteristics
Now consider a general first-order linear PDE: a(x,y)\,\frac{\partial u}{\partial x} + b(x,y)\,\frac{\partial u}{\partial y} = c(x,y)\,u, where a, b, c : U \to \mathbb{R} are smooth. The left side is Du_{(x,y)} \cdot (a,b) — the derivative of u applied to the vector field F = (a,b). Setting this to c\,u says u changes at rate c\,u along the flow lines of F.
The characteristics are the integral curves of the vector field F = (a(x,y), b(x,y)) — the solutions \gamma(t) = (x(t), y(t)) of \dot{x} = a(x,y), \qquad \dot{y} = b(x,y). Along a characteristic, the PDE becomes an ODE: \frac{d}{dt}u(\gamma(t)) = Du_{\gamma(t)}\cdot\gamma'(t) = a\,\partial_x u + b\,\partial_y u = c(\gamma(t))\,u(\gamma(t)). This is a linear ODE in t with solution u(\gamma(t)) = u(\gamma(0))\, e^{\int_0^t c(\gamma(s))\,ds}. So once we know u on an initial curve transverse to the characteristics, the value of u at any other point is determined by following the characteristic through that point back to the initial curve and solving the ODE along the way.
The derivative computation \frac{d}{dt}u(\gamma(t)) = Du_{\gamma(t)}\cdot \gamma'(t) is exactly the chain rule from Chapter 3.
Example. Solve x\,\partial_x u + y\,\partial_y u = u with u(x,1) = \sin x.
The vector field is F = (x, y), giving characteristics \dot{x} = x, \dot{y} = y. These solve to x(t) = x_0 e^t, y(t) = y_0 e^t — rays from the origin. Along each characteristic, the ODE is \dot{u} = u, giving u = u_0\,e^t.
The initial condition u(x,1) = \sin x lies on the line y = 1. A point (x,y) with y > 0 lies on the characteristic through (x/y, 1) at time t = \log y (since y_0 e^t = y and y_0 = 1 gives t = \log y). The initial value is u_0 = \sin(x/y), and the solution at time t = \log y is u = u_0\,e^t = \sin(x/y)\cdot y. Therefore u(x,y) = y\sin\!\left(\frac{x}{y}\right). One verifies: \partial_x u = \cos(x/y), \partial_y u = \sin(x/y) - \frac{x}{y}\cos(x/y), and x\partial_x u + y\partial_y u = x\cos(x/y) + y\sin(x/y) - x\cos(x/y) = y\sin(x/y) = u. Correct.
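The verification at the end of the example can also be done numerically. A sketch (the sample point is an arbitrary choice) checking both the PDE and the initial condition for u(x,y) = y\sin(x/y):

```python
# Confirm the worked example: u(x, y) = y sin(x/y) satisfies
# x u_x + y u_y = u with u(x, 1) = sin x.
import numpy as np

def u(x, y):
    return y * np.sin(x / y)

h = 1e-6
x0, y0 = 1.3, 2.0
u_x = (u(x0 + h, y0) - u(x0 - h, y0)) / (2 * h)   # central differences
u_y = (u(x0, y0 + h) - u(x0, y0 - h)) / (2 * h)

print(np.isclose(x0 * u_x + y0 * u_y, u(x0, y0)))  # the PDE holds
print(np.isclose(u(0.5, 1.0), np.sin(0.5)))        # the initial condition
```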
15.3.3 The geometric picture
The method of characteristics is the observation that a first-order linear PDE defines a vector field F = (a,b) on the domain, and the PDE is simply the statement that u changes in a prescribed way along the flow lines of F. Finding u means finding a function that is constant — or changes at a prescribed rate — along these curves.
This is the same structure as conservative fields from Chapter 9: there we asked for a function whose gradient is a given vector field; here we ask for a function whose directional derivative along a given vector field is a given multiple of itself. The method of characteristics makes the vector field and its flow lines the central objects, reducing a PDE in two variables to a family of ODEs in one variable along each flow line.
The technique generalises: for a first-order PDE in n variables \sum_i a_i(x)\partial_{x_i} u = c(x)u, the characteristics are the integral curves of the vector field (a_1, \ldots, a_n) in \mathbb{R}^n, and the argument is word-for-word the same.
15.4 Harmonic Functions
A C^2 function u : U \to \mathbb{R} is harmonic if \Delta u = 0. The Divergence theorem gives harmonic functions a powerful structural property.
Theorem 15.2 (Mean value property) If u is harmonic on U and \overline{B}(a,r) \subset U, then u(a) = \frac{1}{\operatorname{Area}(S^{n-1}(a,r))} \int_{S^{n-1}(a,r)} u\, dS. The value of u at any point equals its average over any sphere centred there.
Proof. Write \phi(r) = \frac{1}{\omega_n r^{n-1}} \int_{S^{n-1}(a,r)} u\,dS, where \omega_n is the area of the unit sphere. The substitution x = a + ry gives \phi(r) = \frac{1}{\omega_n} \int_{S^{n-1}} u(a+ry)\,dS(y), and differentiating under the integral sign, \phi'(r) = \frac{1}{\omega_n} \int_{S^{n-1}} \nabla u(a+ry)\cdot y\,dS(y) = \frac{1}{\omega_n r^{n-1}} \int_{S^{n-1}(a,r)} \frac{\partial u}{\partial n}\, dS = \frac{1}{\omega_n r^{n-1}} \int_{B(a,r)} \Delta u\, dV = 0, since y is the outward unit normal, using the Divergence theorem in the last step to convert the boundary integral of \partial u/\partial n to the volume integral of \Delta u, which vanishes by harmonicity. So \phi is constant, and \phi(r) \to u(a) as r \to 0^+. \square
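The mean value property is easy to watch in action. A sketch for the harmonic function u(x,y) = x^2 - y^2 (so \Delta u = 2 - 2 = 0; the centre, radius, and sample size are arbitrary choices): its average over a circle equals its value at the centre.

```python
# Average u(x, y) = x^2 - y^2 over a circle of radius r about a,
# and compare with the value at the centre.
import numpy as np

def u(x, y):
    return x**2 - y**2

a = np.array([0.5, -1.0])   # centre
r = 2.0                     # radius

theta = np.linspace(0, 2 * np.pi, 100_000, endpoint=False)
pts_x = a[0] + r * np.cos(theta)
pts_y = a[1] + r * np.sin(theta)
average = u(pts_x, pts_y).mean()     # equal-angle samples on the circle

print(average, u(a[0], a[1]))        # the two agree
```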
Theorem 15.3 (Maximum principle) If u is harmonic on a bounded connected open set U and continuous on \overline{U}, then u attains its maximum and minimum on \partial U.
Proof. Suppose u attains its maximum M at an interior point a. The mean value property forces u = M on every sphere around a contained in U, hence on a ball around a, so the set \{x \in U : u(x) = M\} is open; it is also closed in U by continuity. By connectedness, u = M on all of U, and by continuity on \overline{U}, in particular on \partial U. The argument for the minimum is identical. \square
Uniqueness for the Dirichlet problem. The problem \Delta u = 0 on U, u = g on \partial U has at most one solution: if u_1 and u_2 both solve it, then u_1 - u_2 is harmonic and zero on \partial U, so by the maximum principle u_1 - u_2 = 0 on all of U.
The maximum principle is the reason harmonic functions model equilibrium: the temperature at any point is between the temperatures on the boundary, and the hottest and coldest points are always on the boundary, not inside.
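The discrete analogue makes the equilibrium picture concrete. A sketch (the grid size, boundary data, and iteration count are arbitrary choices): the 5-point discrete Laplace equation replaces each interior value by the average of its four neighbours, and the resulting "temperature" attains its extremes on the boundary.

```python
# Discrete Dirichlet problem on a square grid by Jacobi iteration:
# interior values are repeatedly replaced by the average of their four
# neighbours; the boundary values are held fixed.
import numpy as np

n = 40
u = np.zeros((n, n))
x = np.linspace(0, 1, n)
u[0, :] = np.sin(np.pi * x)    # warm data on one edge
u[-1, :] = -1.0                # a cold opposite edge (other edges at 0)

for _ in range(5000):
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])

interior = u[1:-1, 1:-1]
print(interior.max() < u.max())  # the maximum is on the boundary
print(interior.min() > u.min())  # so is the minimum
```

Each interior value is an average of its neighbours, so it can never exceed the largest boundary value: the discrete maximum principle, for the same reason as the continuous one.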