9  The Inverse and Implicit Function Theorems

We have spent the last several chapters building a complete theory of differentiation: the total derivative as a linear approximation, partial derivatives as its coordinates, the chain rule, the Hessian and Taylor’s theorem. All of this describes how f behaves near a point a to some finite order. A natural and important question is: does the linear approximation Df_a tell us anything about the global structure of f near a, not just its first-order behaviour?

The most basic instance of this question is invertibility. If Df_a is an invertible linear map, is f itself invertible near a — does it have a smooth local inverse? The answer is yes, and this is the inverse function theorem. It is probably the most important theorem in multivariable calculus: it is what makes coordinate changes work, it underlies the implicit function theorem and the rank theorem, and it is what justifies the definition of embedded submanifolds given back in Chapter 2.

The answer is not obvious. Knowing Df_a is invertible tells you about f at the single point a. To build a local inverse you need to control f in a whole neighbourhood, and for that you need to know that Df doesn’t change too fast near a — which is a statement about continuity of the derivative, not just about its value at one point. The C^1 hypothesis in the theorem is doing real work.

The proof strategy is to write the problem of finding a preimage f(x) = y as a fixed-point problem, and then invoke the contraction mapping principle to show that the fixed point exists. This is a remarkably clean approach: the entire analytical difficulty of the theorem reduces to one lemma about contractions on complete metric spaces.


9.1 The Contraction Mapping Principle

A map T : X \to X on a metric space is a contraction if there exists c \in [0,1) such that d(T(x), T(y)) \leq c\, d(x,y) for all x, y. Each application of T multiplies distances by at most c, so repeated applications drive any two points together exponentially fast. In a complete metric space there is nowhere for the sequence T^k(x) to escape to, so it must converge — and its limit is a fixed point.

Theorem 9.1 (Contraction mapping principle) Let (X, d) be a non-empty complete metric space and T : X \to X a contraction with constant c < 1. Then T has a unique fixed point, and the iterates T^k(x_0) converge to it from any starting point x_0.

Proof. The sequence x_k = T^k(x_0) satisfies d(x_{k+1}, x_k) \leq c^k\, d(x_1, x_0), and a geometric series estimate gives d(x_m, x_k) \leq \frac{c^k}{1-c}\, d(x_1, x_0) for all m > k, so (x_k) is Cauchy. Let x^* be its limit. Continuity of T gives T(x^*) = \lim T(x_k) = \lim x_{k+1} = x^*, so x^* is a fixed point. If y^* is another, then d(x^*, y^*) = d(T(x^*), T(y^*)) \leq c\, d(x^*, y^*), which forces x^* = y^*. \square
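The principle is also an algorithm: the iteration in the proof is exactly how one computes the fixed point in practice. As a minimal Python sketch (illustrative only, not part of the text), take T = cos on [0, 1]:

```python
import math

def iterate(T, x0, tol=1e-12, max_iter=1000):
    """Iterate a contraction T from x0 until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# cos maps [0, 1] into itself, and |cos'(x)| = |sin x| <= sin(1) < 1 there,
# so cos is a contraction on [0, 1]; its unique fixed point solves cos(x) = x.
x_star = iterate(math.cos, 0.5)
```

The geometric-series estimate in the proof predicts the observed behaviour: the error shrinks by a factor of roughly sin(1) ≈ 0.84 per step, so a dozen correct digits take only a couple of hundred iterations.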

Completeness is where \mathbb{R}^n enters: a closed ball in \mathbb{R}^n is a closed subset of a complete space, hence itself complete. Keep this in mind — the proof of the IFT will apply the principle to a contraction on a closed ball.


9.2 The Inverse Function Theorem

Theorem 9.2 (Inverse function theorem) Let f : U \to \mathbb{R}^n be C^1 on the open set U \subset \mathbb{R}^n, and suppose Df_a is invertible at some a \in U. Then there exist open sets V \ni a and W \ni f(a) such that f|_V : V \to W is a diffeomorphism, with inverse derivative D(f^{-1})_{f(a)} = (Df_a)^{-1}.

Before the proof, a word on what the theorem does and does not give. It gives a local inverse — a smooth map g : W \to V with g(f(x)) = x for x near a. It says nothing about invertibility away from a: the map (r,\theta) \mapsto (r\cos\theta, r\sin\theta) is locally invertible wherever r \neq 0 but wraps the plane around the origin infinitely many times, so it has no global inverse. Local and global invertibility are genuinely different things.

Proof. By replacing f with (Df_a)^{-1} \circ f and translating, we may assume Df_a = I and f(a) = a = 0. This loses no generality since pre- and post-composing with diffeomorphisms preserves all the conclusions.

With Df_0 = I, define \phi(x) = x - f(x). Then D\phi_0 = I - I = 0. Since f is C^1 the derivative D\phi is continuous and zero at the origin, so there exists r > 0 such that \|D\phi_x\| \leq 1/2 on \bar B(0,r). By Theorem 5.4 (the mean value inequality), \|\phi(x) - \phi(y)\| \leq \tfrac{1}{2}\|x - y\| \qquad \text{for all } x, y \in \bar B(0,r). \tag{$*$} This single estimate does everything. It gives injectivity immediately: if f(x) = f(y) then x - y = \phi(x) - \phi(y), so \|x-y\| \leq \frac{1}{2}\|x-y\|, hence x = y. And it is exactly what is needed to set up a contraction.

To show f is surjective onto B(0, r/2), fix y with \|y\| < r/2 and consider the map T(x) = y + \phi(x). A fixed point of T is a solution to f(x) = y — precisely what we want. For x \in \bar B(0,r): \|T(x)\| \leq \|y\| + \|\phi(x) - \phi(0)\| \leq \tfrac{r}{2} + \tfrac{1}{2}\|x\| \leq r, so T maps \bar B(0,r) to itself. By (*), \|T(x) - T(x')\| \leq \frac{1}{2}\|x-x'\|, so T is a contraction. The contraction mapping principle gives a unique fixed point x^* \in \bar B(0,r) with f(x^*) = y. Moreover \|x^*\| = \|T(x^*)\| \leq \|y\| + \tfrac{1}{2}\|x^*\|, so \|x^*\| \leq 2\|y\| < r and x^* in fact lies in the open ball B(0,r).
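The iteration in this argument can be run numerically. Here is a minimal Python sketch (illustrative, with a hypothetical one-dimensional f chosen so that f(0) = 0 and Df_0 = 1, matching the normalisation in the proof; then \phi(x) = -x^2/4 and |\phi'| \leq 1/2 on [-1, 1]):

```python
def f(x):
    # f(0) = 0 and f'(0) = 1, as in the normalised proof; phi(x) = -x**2 / 4
    return x + x**2 / 4

def solve(y, tol=1e-12, max_iter=200):
    """Find x with f(x) = y by iterating the contraction T(x) = y + (x - f(x))."""
    x = 0.0
    for _ in range(max_iter):
        x_next = y + (x - f(x))   # T(x) = y + phi(x)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return x

# any target with |y| < r/2 = 1/2 is reachable; try y = 0.3
x = solve(0.3)
```

The fixed point here solves the quadratic x + x^2/4 = 0.3 exactly, so the iterate can be checked against -2 + \sqrt{5.2}.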

So f is a bijection from V = f^{-1}(B(0,r/2)) \cap B(0,r) onto W = B(0, r/2), and the inverse g = f^{-1} : W \to V exists. To show g is C^1: fix b = f(x) \in W and write b + k = f(x+h), so h = g(b+k) - g(b). From differentiability of f: k = Df_x(h) + o(\|h\|), so h = (Df_x)^{-1}k + o(\|h\|). From (*) with y = 0: \|h\| \leq \tfrac{1}{2}\|h\| + \|f(x+h) - f(x)\| = \tfrac{1}{2}\|h\| + \|k\|, giving \|h\| \leq 2\|k\|. So o(\|h\|) = o(\|k\|), and g(b+k) - g(b) = (Df_x)^{-1}k + o(\|k\|). The derivative of g at b is (Df_{g(b)})^{-1}, which is continuous since f is C^1 and matrix inversion is smooth. Inductively, g is C^k whenever f is. \square

The formula D(f^{-1})_{f(a)} = (Df_a)^{-1} has a one-line proof independent of everything above: differentiate f^{-1} \circ f = \mathrm{id} by the chain rule to get D(f^{-1})_{f(a)} \circ Df_a = I. This is the multivariable version of (f^{-1})'(y) = 1/f'(x).

Example: polar coordinates. The map \phi(r,\theta) = (r\cos\theta, r\sin\theta) on (0,\infty) \times (-\pi, \pi) has derivative with determinant equal to r, which is positive on the whole domain. So \phi is a local diffeomorphism everywhere, and the inverse function theorem guarantees that Cartesian and polar coordinates are interchangeable near any point with r > 0. The explicit inverse r = \sqrt{x^2+y^2}, \theta = \arctan(y/x) (valid for x > 0; other branches of \theta cover the rest of the domain) confirms the theorem in this case; what the theorem gives in general is the guarantee that such a smooth inverse exists even when you cannot write it down explicitly.
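The chain-rule formula D(f^{-1})_{f(a)} = (Df_a)^{-1} can be checked numerically in this example. A Python sketch (plain 2×2 arithmetic to stay self-contained; the Jacobians are the standard ones for the polar map and its explicit inverse):

```python
import math

def jac_polar(r, th):
    # Jacobian of (r, th) -> (r cos th, r sin th); determinant is r
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th),  r * math.cos(th)]]

def jac_inverse(x, y):
    # Jacobian of the inverse (x, y) -> (sqrt(x^2 + y^2), theta)
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

r, th = 2.0, 0.7
x, y = r * math.cos(th), r * math.sin(th)
A, B = jac_polar(r, th), jac_inverse(x, y)

# the product B A should be the 2x2 identity: D(f^{-1}) composed with Df
prod = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
```

Here the product comes out to the identity exactly up to rounding, which is the matrix form of (f^{-1})'(y) = 1/f'(x).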


9.3 The Implicit Function Theorem

The implicit function theorem is really just the inverse function theorem in disguise, and once you see the disguise it is impossible to un-see.

The question it answers is: given a smooth equation F(x', x'') = 0 in variables (x', x'') \in \mathbb{R}^{n-k} \times \mathbb{R}^k, when can we locally solve for x'' as a smooth function of x'? That is, when does the zero set of F look like a graph near a given point?

The answer is: when the derivative of F with respect to x'' alone is invertible. The intuition is that if F is not “stuck” in the x'' directions, we can use those directions to steer toward zero, and the IFT provides the smooth steering function.

Theorem 9.3 (Implicit function theorem) Let F : U \to \mathbb{R}^k be C^1 on the open set U \subset \mathbb{R}^n, and let a = (a', a'') \in U with F(a) = 0. Suppose the partial Jacobian \partial_{x''} F(a) — the k \times k matrix of partial derivatives of F with respect to the last k variables — is invertible.

Then there exist open sets V' \ni a' and V'' \ni a'', and a unique C^1 map \psi : V' \to V'' with \psi(a') = a'' such that F(x', \psi(x')) = 0 \quad \text{for all } x' \in V', and these are the only solutions to F = 0 in V' \times V''. The derivative of \psi is D\psi_{a'} = -(\partial_{x''} F(a))^{-1}\, \partial_{x'} F(a).

Proof. Define \Phi : U \to \mathbb{R}^n by \Phi(x', x'') = (x', F(x', x'')) — this leaves the first n-k coordinates alone and replaces the last k with F. We have \Phi(a) = (a', 0), and D\Phi_a = \begin{pmatrix} I_{n-k} & 0 \\ \partial_{x'} F(a) & \partial_{x''} F(a) \end{pmatrix}, which is invertible if and only if \partial_{x''} F(a) is — exactly our hypothesis. The inverse function theorem gives a C^1 local inverse \Phi^{-1} near (a', 0). Since \Phi does not change the first n-k coordinates, the inverse has the form \Phi^{-1}(x', y) = (x', G(x', y)) for some C^1 map G. Setting \psi(x') = G(x', 0): \Phi(x', \psi(x')) = (x', F(x', \psi(x'))) = (x', 0), so F(x', \psi(x')) = 0, and uniqueness follows from the local injectivity of \Phi. Differentiating F(x', \psi(x')) = 0 by the chain rule gives \partial_{x'} F + \partial_{x''} F \cdot D\psi = 0, hence D\psi = -(\partial_{x''} F)^{-1} \partial_{x'} F. \square

In the case of a single equation in two variables (n = 2, k = 1), the derivative formula reduces to \psi'(x) = -F_x/F_y, the implicit differentiation formula from single-variable calculus. The theorem says this formula is valid whenever F_y(a) \neq 0, and that the implicitly defined function is genuinely smooth.

This theorem retroactively justifies everything in Chapter 2. The definition of embedded submanifold required a smooth F near each point p \in M with M = F^{-1}(0) and DF_p of full rank. The implicit function theorem is precisely the theorem that converts this condition into local coordinates: when DF_p : \mathbb{R}^N \to \mathbb{R}^k has rank k, the theorem gives a smooth \psi expressing k of the ambient coordinates as smooth functions of the remaining n = N-k, providing a chart for M. The rank condition was always waiting for this.

Example: the sphere. Take F(x,y,z) = x^2 + y^2 + z^2 - 1 near the north pole a = (0,0,1). The partial derivative \partial_z F = 2z equals 2 \neq 0 at a, so the theorem gives a smooth \psi(x,y) near (0,0) with \psi(0,0) = 1 and x^2 + y^2 + \psi(x,y)^2 = 1: explicitly \psi(x,y) = \sqrt{1-x^2-y^2}. The derivative formula gives D\psi_{(0,0)} = -(2z)^{-1}(2x, 2y)|_{(0,0,1)} = (0,0), confirming the sphere has a horizontal tangent at the north pole.

Example: a curve. Let F(x,y) = y^3 + xy - 1 near (0,1), where F(0,1) = 0. Since \partial_y F = 3y^2 + x = 3 at (0,1), the theorem gives a smooth \psi near 0 with \psi(0) = 1 and \psi(x)^3 + x\psi(x) = 1. The derivative: \psi'(0) = -F_x/F_y|_{(0,1)} = -y/(3y^2+x)|_{(0,1)} = -1/3.
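Even without a closed form for \psi, one can compute it numerically and check the derivative formula. A Python sketch (the bisection solver psi is a hypothetical helper, valid here because y \mapsto y^3 + xy is increasing near y = 1 for small x):

```python
def psi(x, tol=1e-14):
    """Solve y^3 + x*y = 1 for the root y near 1 by bisection."""
    lo, hi = 0.5, 1.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid**3 + x * mid < 1.0:
            lo = mid      # function is increasing in y, so the root is above
        else:
            hi = mid
    return 0.5 * (lo + hi)

# central difference approximating psi'(0); the theorem predicts -1/3
h = 1e-6
slope = (psi(h) - psi(-h)) / (2 * h)
```

The computed slope agrees with -F_x/F_y|_{(0,1)} = -1/3 to many digits, even though psi itself is only known pointwise.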


9.4 The Rank Theorem

Both theorems above are special cases of a single organising principle: the local structure of a smooth map is completely determined by the rank of its derivative, and in the right coordinates it looks as simple as a linear map of that rank.

Theorem 9.4 (Rank theorem) Let f : U \to \mathbb{R}^m be smooth on an open set U \subset \mathbb{R}^n, and suppose Df_a has rank r at some a \in U. Then there exist smooth coordinate changes \phi near a and \psi near f(a) such that \psi \circ f \circ \phi^{-1}(x_1, \ldots, x_n) = (x_1, \ldots, x_r, 0, \ldots, 0).

In the right coordinates, f is just the projection onto the first r components — the simplest possible map of rank r. The inverse function theorem is the case r = n = m; the implicit function theorem is the case of a surjective derivative, r = m < n. The proof is an application of both, extracting an invertible r \times r minor of Df_a and straightening the image; we omit the details.

The rank theorem has a clean corollary. A value c \in \mathbb{R}^m is a regular value of f if Df_x is surjective at every x \in f^{-1}(c). At such points the rank of Df_x is m, and the rank theorem gives:

Theorem 9.5 (Regular value theorem) If c is a regular value of a smooth map f : U \to \mathbb{R}^m, then f^{-1}(c) is a smooth embedded submanifold of U of dimension n - m.

This closes the logical circle of the text. We started by defining embedded submanifolds via the rank condition and said the justification was “the implicit function theorem, proved in later chapters.” The regular value theorem is that justification. The sphere, level sets of smooth functions at regular values, constraint surfaces in optimisation — all of these are smooth submanifolds not by assumption but because the rank condition holds and this theorem applies.