11 Computing Derivatives of Elementary Functions
11.1 Applying the Rules to Elementary Functions
The preceding section developed the derivative rules in a more abstract framework than typical calculus texts. The abstraction served a purpose: the framework of differentials and linear maps explains why the rules hold, rather than presenting them as a collection of formulas to memorize.
We now turn from justification to application. As in Limits and Continuity, where we moved from \varepsilon-\delta definitions to practical computation, we apply the derivative rules to elementary functions: polynomials, exponentials, logarithms, and trigonometric functions. These derivatives, combined with the sum, product, quotient, and chain rules, form a complete toolkit for differentiating any expression built from elementary functions.
Notation. From this point forward, we write f'(x) or Df(x) rather than f'(a), emphasizing that the derivative varies with the point. The meaning remains the same: f'(x) is the coefficient of h in the linear approximation f(x+h) \approx f(x) + f'(x)h.
11.2 Polynomials
Polynomials are finite sums of powers with real coefficients. Each term can be handled using the linearity rules established in ?sec-diff-rules, making polynomials the natural starting point.
11.2.1 Constants
Consider a constant function f(x) = c. For any displacement h, f(x+h) - f(x) = c - c = 0.
The change is always zero, so the linear approximation is exact: f(x+h) = f(x) + 0 \cdot h.
Thus df_x(h) = 0 \cdot h, giving Df(x) = 0 for all x.
Recall from Section 8.3 that a linear map T : \mathbb{R} \to \mathbb{R} has the form T(h) = mh for some constant m. Here, the differential is the zero map—the linear functional that sends every displacement to zero. Geometrically, a constant function has a horizontal graph, and its tangent line at every point has slope zero.
11.2.2 The Identity Function
For f(x) = x, we have f(x+h) = x + h.
The linear approximation is exact: f(x+h) = f(x) + 1 \cdot h. Thus df_x(h) = h, giving Df(x) = 1 for all x.
The differential is the identity map on \mathbb{R}: it returns the displacement unchanged. The graph of y = x is itself a line with slope 1, so every tangent line coincides with the graph.
11.2.3 Powers of x
Consider f(x) = x^2. We have f(x+h) = (x+h)^2 = x^2 + 2xh + h^2.
The linear term is 2xh, and the error h^2 is o(h) as h \to 0. Thus df_x(h) = 2xh, giving Df(x) = 2x.
At each point x, the differential is the linear map h \mapsto 2xh. The coefficient 2x varies with position: at x = 0, the tangent line is horizontal (df_0(h) = 0), while at x = 3, the tangent line has slope 6 (df_3(h) = 6h).
For f(x) = x^3, f(x+h) = (x+h)^3 = x^3 + 3x^2 h + 3xh^2 + h^3.
The linear term is 3x^2 h, so Df(x) = 3x^2.
The pattern is clear, and it motivates the following theorem.
Theorem 11.1 (Power Rule) For n \in \mathbb{N} and f(x) = x^n, we have df_x(h) = nx^{n-1}h.
In coordinates: (x^n)' = nx^{n-1}.
The binomial theorem gives (x+h)^n = x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \binom{n}{3}x^{n-3}h^3 + \cdots + h^n.
The linear term is nx^{n-1}h. We must show all remaining terms are o(h).
Each remaining term has the form \binom{n}{k}x^{n-k}h^k for k \geq 2. Factoring out h^2 gives \binom{n}{2}x^{n-2}h^2 + \binom{n}{3}x^{n-3}h^3 + \cdots + h^n = h^2\left[\binom{n}{2}x^{n-2} + \binom{n}{3}x^{n-3}h + \cdots + h^{n-2}\right].
For |h| < 1, the triangle inequality bounds the absolute value of the bracketed expression by \binom{n}{2}|x|^{n-2} + \binom{n}{3}|x|^{n-3} + \cdots + 1 =: C, where C depends only on x and n. Therefore, \left|\binom{n}{2}x^{n-2}h^2 + \cdots + h^n\right| \leq C|h|^2, so \frac{\left|\binom{n}{2}x^{n-2}h^2 + \cdots + h^n\right|}{|h|} \leq C|h| \to 0 as h \to 0. Thus df_x(h) = nx^{n-1}h. \square
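The o(h) estimate can be illustrated numerically. The following is a minimal sketch, not part of the proof: the helper name `power_remainder_ratio` and the sample values x = 2, n = 5 are assumptions chosen only for illustration. It evaluates the quotient |(x+h)^n - x^n - nx^{n-1}h| / |h| for shrinking h.

```python
def power_remainder_ratio(x: float, n: int, h: float) -> float:
    """Return |(x+h)^n - x^n - n x^(n-1) h| / |h|, the error relative to |h|."""
    linear_part = n * x ** (n - 1) * h
    remainder = (x + h) ** n - x ** n - linear_part
    return abs(remainder) / abs(h)


# Hypothetical sample point and exponent, chosen only for illustration.
x, n = 2.0, 5
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    ratio = power_remainder_ratio(x, n, h)
    print(f"h = {h:.0e}   |remainder| / |h| = {ratio:.6f}")
```

For these sample values the constant in the proof is C = 80 + 40 + 10 + 1 = 131, so each printed ratio should be at most 131|h|.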
11.2.4 General Polynomials
A polynomial has the form p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n.
By linearity of the differential (see Theorem 10.1 and Theorem 10.2), dp_x(h) = d(a_0)_x(h) + a_1 d(x)_x(h) + a_2 d(x^2)_x(h) + \cdots + a_n d(x^n)_x(h).
Since d(a_0)_x(h) = 0 and d(x^k)_x(h) = kx^{k-1}h, we obtain dp_x(h) = a_1 h + 2a_2 x h + \cdots + n a_n x^{n-1} h.
Thus Dp(x) = a_1 + 2a_2 x + \cdots + n a_n x^{n-1}.
This gives the derivative of an arbitrary polynomial by termwise differentiation.
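Termwise differentiation is mechanical enough to write down directly. The sketch below assumes a polynomial is stored as a list of coefficients with the constant term first; the function names `poly_derivative` and `poly_eval` and the sample polynomial are hypothetical.

```python
def poly_derivative(coeffs):
    """Differentiate termwise: coeffs[k] is a_k, the coefficient of x**k."""
    # The term a_k x^k contributes k * a_k x^(k-1); the constant term drops out.
    return [k * a for k, a in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n by Horner's scheme."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


# Hypothetical sample: p(x) = 1 - 4x + 2x^3, so Dp(x) = -4 + 6x^2.
p = [1, -4, 0, 2]
dp = poly_derivative(p)
print(dp, poly_eval(dp, 2.0))
```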
11.2.4.1 Example
Let f(x) = 3x^2 + 2x + 5. By linearity, \begin{align*} df_x(h) &= 3 \, d(x^2)_x(h) + 2 \, d(x)_x(h) + d(5)_x(h) \\ &= 3 \cdot 2xh + 2h + 0 \\ &= (6x + 2)h. \end{align*}
Thus Df(x) = 6x + 2.
At x = 1, the differential is df_1(h) = 8h. A displacement of h = 0.1 produces an approximate change of df_1(0.1) = 0.8 in the function value. The actual change is f(1.1) - f(1) = 10.83 - 10 = 0.83. The error of 0.03 comes from the quadratic term 3h^2 = 3(0.1)^2 = 0.03 in the expansion.
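The comparison between the differential and the actual change can be repeated for several displacements. This is a small sketch using the function from the example; the names `f` and `df` are chosen for illustration. The gap between the two columns should track the quadratic term 3h^2.

```python
def f(x):
    """The example polynomial 3x^2 + 2x + 5."""
    return 3 * x**2 + 2 * x + 5


def df(x, h):
    """The differential df_x(h) = (6x + 2) h."""
    return (6 * x + 2) * h


for h in (0.1, 0.01, 0.001):
    actual = f(1 + h) - f(1)
    approx = df(1, h)
    print(f"h = {h}: actual = {actual:.6f}, df = {approx:.6f}, "
          f"error = {actual - approx:.6f}, 3h^2 = {3 * h**2:.6f}")
```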
11.2.4.2 Example
Let g(x) = x^3 - 4x^2 + x - 7. Then Dg(x) = 3x^2 - 8x + 1.
To find where the tangent line is horizontal, solve Dg(x) = 0: 3x^2 - 8x + 1 = 0 \implies x = \frac{8 \pm \sqrt{64 - 12}}{6} = \frac{8 \pm 2\sqrt{13}}{6} = \frac{4 \pm \sqrt{13}}{3}.
At these points, the differential vanishes: dg_x(h) = 0 \cdot h for all h. The tangent line is horizontal, and the function has zero instantaneous rate of change.
11.3 Exponential and Logarithmic Functions
11.3.1 The Natural Exponential
The exponential function f(x) = e^x has the property that it equals its own derivative.
Claim: D(e^x) = e^x.
To see why, recall the definition of the derivative:
D(e^x) = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} \frac{e^x(e^h - 1)}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h}.
So it suffices to evaluate \lim_{h \to 0} \frac{e^h - 1}{h}.
We established in ?exm-neutron-multiplication that e is defined through the convergent sequence e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n.
This characterization of e uniquely determines it as the base for which \lim_{h \to 0} \frac{e^h - 1}{h} = 1.
This is not a theorem we need to prove here—it is part of the defining property of e as the base of the natural exponential function. One can verify this using the binomial expansion of (1 + h/n)^n and passing to the limit, but we take it as established. Therefore, D(e^x) = e^x \cdot 1 = e^x.
The tangent line at (a, e^a) has slope e^a, so the linear approximation is e^{a+h} \approx e^a + e^a h.
This self-replicating property underlies the exponential’s role in modeling processes where the rate of change is proportional to the current value—population growth, radioactive decay, compound interest.
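The two limits used in this argument can be observed numerically. The sketch below is only a sanity check under floating-point arithmetic, not a proof; the helper name `exp_quotient` and the sample values of h and n are assumptions.

```python
import math


def exp_quotient(h: float) -> float:
    """Difference quotient (e^h - 1) / h of the exponential at 0."""
    return (math.exp(h) - 1.0) / h


# The quotient should approach 1 from either side.
for h in (1e-1, 1e-2, 1e-3, -1e-3):
    print(f"h = {h:+.0e}   (e^h - 1)/h = {exp_quotient(h):.6f}")

# The sequence (1 + 1/n)^n should approach e.
for n in (10, 1_000, 100_000):
    print(f"n = {n:>6}   (1 + 1/n)^n = {(1 + 1 / n) ** n:.6f}   e = {math.e:.6f}")
```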
11.3.2 General Exponentials
Once we understand the natural exponential, we can differentiate any exponential function with positive base. Let a > 0 and consider f(x) = a^x.
We can rewrite a^x in terms of the natural exponential: a^x = e^{x \ln a}.
This is a composition f \circ g where f(u) = e^u and g(x) = x \ln a. By Theorem 10.5, D(a^x) = Df(g(x)) \cdot Dg(x) = e^{x \ln a} \cdot \ln a = a^x \ln a.
The rate of change of a^x is proportional to its current value, with the proportionality constant \ln a. The linear approximation at x is a^{x+h} \approx a^x + a^x \ln a \cdot h.
This reduces to the natural exponential case when a = e, since \ln e = 1.
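As a numerical cross-check of D(a^x) = a^x \ln a, the sketch below compares the formula with a symmetric difference quotient. The helper `central_difference`, its step size, and the sample bases are assumptions made for illustration.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for Df(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def exp_base_derivative(a: float, x: float) -> float:
    """The formula a^x * ln(a), valid for a > 0."""
    return a**x * math.log(a)


x = 1.5
for a in (0.5, 2.0, math.e, 10.0):
    numeric = central_difference(lambda t: a**t, x)
    print(f"a = {a:.4f}   numeric = {numeric:.6f}   formula = {exp_base_derivative(a, x):.6f}")
```

A base a < 1 gives a negative derivative, since \ln a < 0 there.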
11.3.3 The Natural Logarithm
The natural logarithm f(x) = \ln x is the inverse of e^x. For x > 0, we claim D(\ln x) = \frac{1}{x}.
By definition of the differential, d(\ln x)_x(h) = \lim_{k \to 0} \frac{\ln(x+k) - \ln(x)}{k} \cdot h.
Factor out x from the logarithm difference: \ln(x+k) - \ln(x) = \ln\left(\frac{x+k}{x}\right) = \ln\left(1 + \frac{k}{x}\right).
Thus d(\ln x)_x(h) = \lim_{k \to 0} \frac{\ln\left(1 + \frac{k}{x}\right)}{k} \cdot h = \lim_{k \to 0} \frac{\ln\left(1 + \frac{k}{x}\right)}{k/x} \cdot \frac{h}{x}.
It remains to evaluate \lim_{u \to 0} \frac{\ln(1+u)}{u}.
We use the inequalities \frac{u}{1+u} \le \ln(1+u) \le u \quad \text{for } u > -1,
which can be established by applying Theorem 9.2 to f(t) = \ln t on appropriate intervals.
Let \varepsilon > 0 and suppose 0 < |u| < 1/2. Dividing the inequalities by u gives \frac{1}{1+u} \le \frac{\ln(1+u)}{u} \le 1 when u > 0, and the reversed chain 1 \le \frac{\ln(1+u)}{u} \le \frac{1}{1+u} when u < 0.
In either case \frac{\ln(1+u)}{u} lies between 1 and \frac{1}{1+u}. Hence \left| \frac{\ln(1+u)}{u} - 1 \right| \le \left| \frac{1}{1+u} - 1 \right| = \frac{|u|}{1+u} \le 2 |u| \quad \text{for } 0 < |u| < 1/2.
Choosing \delta = \min(1/2, \varepsilon/2) ensures that for 0 < |u| < \delta, \left| \frac{\ln(1+u)}{u} - 1 \right| < \varepsilon.
Thus the limit equals 1. \square
Returning to our differential calculation d(\ln x)_x(h) = \underbrace{\lim_{k \to 0} \frac{\ln\left(1 + \frac{k}{x}\right)}{k/x}}_{=1} \cdot \frac{h}{x} = \frac{h}{x}.
Equivalently, in standard notation D(\ln x) = \frac{1}{x}.
The differential d(\ln x)_x(h) = \frac{h}{x} is the relative change of the argument.
For example, if x = 10 and h = 0.1, then d(\ln x)_{10}(0.1) = 0.1/10 = 0.01: a 1\% increase in x raises \ln x by approximately 0.01.
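Both the bound on \frac{\ln(1+u)}{u} and the example at x = 10 can be checked numerically. The sketch below is a floating-point sanity check, not a substitute for the argument; the helper name `log_quotient` and the sample values of u are assumptions.

```python
import math


def log_quotient(u: float) -> float:
    """Return ln(1+u)/u for u != 0 with u > -1."""
    return math.log1p(u) / u


# The distance from 1 should stay below 2|u| on both sides of zero.
for u in (0.4, -0.4, 0.1, -0.1, 1e-3, -1e-3):
    gap = abs(log_quotient(u) - 1.0)
    print(f"u = {u:+.3f}   |ln(1+u)/u - 1| = {gap:.6f}   2|u| = {2 * abs(u):.6f}")

# Actual change of ln between 10 and 10.1 versus the differential h/x.
x, h = 10.0, 0.1
actual = math.log(x + h) - math.log(x)
print(f"actual change = {actual:.6f}   h/x = {h / x:.6f}")
```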
11.3.4 Logarithms with General Base
Let a > 0, a \neq 1, and consider the logarithm with base a: f(x) = \log_a x, \quad x > 0.
By the change of base formula, \log_a x = \frac{\ln x}{\ln a}.
Since \ln a is a constant, the differential is d(\log_a x)_x(h) = \frac{1}{\ln a}\, d(\ln x)_x(h) = \frac{1}{\ln a} \cdot \frac{h}{x} = \frac{h}{x \ln a}.
Equivalently, in standard derivative notation: D(\log_a x) = \frac{1}{x \ln a}.
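The formula D(\log_a x) = \frac{1}{x \ln a} can be compared against a difference quotient. In this sketch the helper `central_difference`, the sample point x = 3, and the sample bases are assumptions; `math.log(t, a)` is the standard-library logarithm of t to base a.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for Df(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def log_base_derivative(a: float, x: float) -> float:
    """The formula 1 / (x ln a), valid for a > 0, a != 1, x > 0."""
    return 1.0 / (x * math.log(a))


x = 3.0
for a in (2.0, 10.0, 0.5):
    numeric = central_difference(lambda t: math.log(t, a), x)
    print(f"a = {a:4.1f}   numeric = {numeric:.6f}   formula = {log_base_derivative(a, x):.6f}")
```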
11.4 Trigonometric Functions
11.4.1 The Sine Function
We established in Theorem 5.6 that \lim_{h \to 0} \frac{\sin h}{h} = 1.
This limit determines the derivative of sine. For f(x) = \sin x, \frac{f(x+h) - f(x)}{h} = \frac{\sin(x+h) - \sin x}{h}.
Using the angle addition formula \sin(x+h) = \sin x \cos h + \cos x \sin h, \frac{\sin(x+h) - \sin x}{h} = \sin x \frac{\cos h - 1}{h} + \cos x \frac{\sin h}{h}.
As h \to 0, we have \frac{\sin h}{h} \to 1 and \frac{\cos h - 1}{h} \to 0 (from Corollary 5.1). Thus D(\sin x) = \sin x \cdot 0 + \cos x \cdot 1 = \cos x.
11.4.2 The Cosine Function
For f(x) = \cos x, we use \cos(x+h) = \cos x \cos h - \sin x \sin h: \frac{\cos(x+h) - \cos x}{h} = \cos x \frac{\cos h - 1}{h} - \sin x \frac{\sin h}{h}.
As h \to 0, this becomes D(\cos x) = \cos x \cdot 0 - \sin x \cdot 1 = -\sin x.
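The angle-addition decomposition can be evaluated term by term to watch the two limits at work. This is an illustrative sketch; the function names and the sample point x = 1 are assumptions.

```python
import math


def sine_quotient(x: float, h: float) -> float:
    """Difference quotient of sin, split as sin x (cos h - 1)/h + cos x (sin h)/h."""
    vanishing = math.sin(x) * (math.cos(h) - 1.0) / h
    surviving = math.cos(x) * math.sin(h) / h
    return vanishing + surviving


def cosine_quotient(x: float, h: float) -> float:
    """Difference quotient of cos, split as cos x (cos h - 1)/h - sin x (sin h)/h."""
    vanishing = math.cos(x) * (math.cos(h) - 1.0) / h
    surviving = math.sin(x) * math.sin(h) / h
    return vanishing - surviving


x = 1.0
for h in (1e-1, 1e-3, 1e-5):
    print(f"h = {h:.0e}   sine quotient = {sine_quotient(x, h):.6f}   "
          f"cosine quotient = {cosine_quotient(x, h):.6f}")
print(f"cos(1) = {math.cos(x):.6f}   -sin(1) = {-math.sin(x):.6f}")
```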
11.4.3 Other Trigonometric Functions
The remaining trigonometric functions are quotients: \tan x = \frac{\sin x}{\cos x}, \quad \sec x = \frac{1}{\cos x}, \quad \csc x = \frac{1}{\sin x}, \quad \cot x = \frac{\cos x}{\sin x}.
Each derivative follows from Theorem 10.7. For tangent, \begin{align*} D(\tan x) &= \frac{\cos x \cdot \cos x - \sin x \cdot (-\sin x)}{\cos^2 x} \\ &= \frac{\cos^2 x + \sin^2 x}{\cos^2 x} \\ &= \frac{1}{\cos^2 x} = \sec^2 x. \end{align*}
The others are left as an exercise for the reader.
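A derivation obtained by hand can be sanity-checked numerically. The sketch below compares a difference quotient with candidate formulas (the ones listed in the table at the end of the chapter) at a sample point. The helper `central_difference`, the dictionary `candidates`, and the point x = 0.7 are assumptions for illustration, and agreement at one point is evidence, not proof.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for Df(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# name -> (function, candidate derivative), written via sin and cos only.
candidates = {
    "tan": (lambda x: math.sin(x) / math.cos(x), lambda x: 1 / math.cos(x) ** 2),
    "cot": (lambda x: math.cos(x) / math.sin(x), lambda x: -1 / math.sin(x) ** 2),
    "sec": (lambda x: 1 / math.cos(x), lambda x: math.sin(x) / math.cos(x) ** 2),
    "csc": (lambda x: 1 / math.sin(x), lambda x: -math.cos(x) / math.sin(x) ** 2),
}


def check_errors(x: float) -> dict:
    """Absolute gap between the difference quotient and each candidate at x."""
    return {name: abs(central_difference(f, x) - df(x))
            for name, (f, df) in candidates.items()}


for name, err in check_errors(0.7).items():
    print(f"{name}: |numeric - candidate| = {err:.2e}")
```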
11.5 Rational Functions
A rational function is a quotient r(x) = \frac{p(x)}{q(x)} where p and q are polynomials. By Theorem 10.7, Dr(x) = \frac{Dp(x) \cdot q(x) - p(x) \cdot Dq(x)}{[q(x)]^2}.
11.5.0.1 Example
Let f(x) = \frac{x^2 + 1}{x - 2} for x \neq 2. Here p(x) = x^2 + 1 with Dp(x) = 2x, and q(x) = x - 2 with Dq(x) = 1. Thus Df(x) = \frac{2x(x-2) - (x^2+1)(1)}{(x-2)^2} = \frac{x^2 - 4x - 1}{(x-2)^2}.
The differential df_x exists wherever q(x) \neq 0, so the domain of Df is \mathbb{R} \setminus \{2\}. As x \to 2, the denominator approaches zero and the differential becomes increasingly sensitive to small changes—this is the instability we identified in ?sec-diff-rules near zeros of the denominator.
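The quotient rule and the simplified expression can be compared at sample points, which also shows how the derivative grows near x = 2. The sketch below uses the functions from this example; the helper name `quotient_rule` and the sample points are assumptions.

```python
def quotient_rule(p, dp, q, dq, x):
    """Evaluate (Dp * q - p * Dq) / q^2 at x, assuming q(x) != 0."""
    return (dp(x) * q(x) - p(x) * dq(x)) / q(x) ** 2


def p(x):
    return x**2 + 1


def dp(x):
    return 2 * x


def q(x):
    return x - 2


def dq(x):
    return 1.0


def df_simplified(x):
    """The simplified form (x^2 - 4x - 1) / (x - 2)^2."""
    return (x**2 - 4 * x - 1) / (x - 2) ** 2


for x in (3.0, 2.5, 2.1, 2.01):
    print(f"x = {x}: rule = {quotient_rule(p, dp, q, dq, x):.4f}, "
          f"simplified = {df_simplified(x):.4f}")
```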
11.5.0.2 Example
Consider f(x) = \frac{1}{x} = x^{-1} for x \neq 0. By Theorem 10.7 with p(x) = 1 and q(x) = x, Df(x) = \frac{0 \cdot x - 1 \cdot 1}{x^2} = -\frac{1}{x^2} = -x^{-2}.
This suggests extending the power rule to negative integers: if f(x) = x^n for n < 0, then Df(x) = nx^{n-1}. We verify this holds for all integers using Theorem 10.7. For rational exponents n = p/q, the same pattern persists, and we have D(x^{p/q}) = \frac{p}{q} x^{\frac{p}{q} -1}.
1. Negative integers
Assume n<0. Write m=-n>0. Then x^n = x^{-m} = \frac{1}{x^m}.
Let u(x)=x^m, so Du(x)=m x^{m-1}.
Differentiate v(x)=1/u(x) using the reciprocal rule (a special case of the quotient rule): D\bigl(u^{-1}\bigr)(x) = -\frac{Du(x)}{u(x)^2} = -\frac{m x^{m-1}}{x^{2m}} = -m x^{-m-1}.
Since n=-m, this becomes D(x^n) = -m x^{-m-1} = n x^{n-1}.
The only restriction here is that x\neq 0, which is already required for x^n with n<0.
2. Rational exponents n = p/q
Let n=p/q with q>0. Work on the domain where the real q-th root is defined (for even q, take x>0; for odd q, any x\neq 0 works).
Define r(x) = x^{1/q}, \qquad f(x) = x^{p/q} = \bigl(r(x)\bigr)^p.
Set a = (x+h)^{1/q}, \qquad b = x^{1/q}.
Then a^q - b^q = (x+h) - x = h.
Using the factorization from Lemma 5.1, a^q - b^q = (a-b)\bigl(a^{q-1} + a^{q-2}b + \cdots + b^{q-1}\bigr),
we get \frac{(x+h)^{1/q}-x^{1/q}}{h}= \frac{1}{a^{q-1} + a^{q-2}b + \cdots + b^{q-1}}.
As h\to 0, we have a\to b. The sum in the denominator has q terms, all approaching b^{q-1}, so the denominator approaches q b^{q-1}. Thus Dr(x) = \lim_{h\to 0} \frac{(x+h)^{1/q} - x^{1/q}}{h} = \frac{1}{q b^{q-1}} = \frac{1}{q}x^{\frac{1}{q}-1}.
Since p is an integer, the outer function y\mapsto y^p has derivative p y^{p-1} by Theorem 11.1 and the negative-integer case above. By the chain rule, Df(x) = p r(x)^{p-1} \cdot Dr(x) = p x^{\frac{p-1}{q}} \cdot \frac{1}{q} x^{\frac{1}{q}-1}.
Combine the exponents: Df(x) = \frac{p}{q} x^{\frac{p}{q} -1} = n x^{n-1}. \quad \square
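The key identity for the q-th root and the resulting power rule can both be checked numerically for x > 0. This is a sketch under floating-point arithmetic; the helper names and the sample values (x = 2, q = 3, p = -2) are assumptions.

```python
def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for Df(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def root_quotient(x: float, q: int, h: float):
    """Return the root difference quotient computed directly and via the factorization."""
    a = (x + h) ** (1.0 / q)
    b = x ** (1.0 / q)
    direct = (a - b) / h
    # Denominator a^(q-1) + a^(q-2) b + ... + b^(q-1), which has q terms.
    factored = 1.0 / sum(a ** (q - 1 - i) * b**i for i in range(q))
    return direct, factored


def rational_power_derivative(x: float, p: int, q: int) -> float:
    """The formula (p/q) x^(p/q - 1), for x > 0."""
    n = p / q
    return n * x ** (n - 1)


x, p, q = 2.0, -2, 3
print(root_quotient(x, q, 1e-3), rational_power_derivative(x, 1, q))
print(central_difference(lambda t: t ** (p / q), x), rational_power_derivative(x, p, q))
```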
Note on Inverse Trigonometric Functions. The derivatives of \arcsin x, \arccos x, and \arctan x are developed in the Inverse Function Theorem section of the Differentiation Techniques chapter, where their derivation from the differential perspective reveals the structural meaning of these formulas.
11.6 Use of the Chain Rule
Compositions of elementary functions are differentiated using Theorem 10.5. Recall from ?sec-diff-rules that the chain rule expresses the differential of a composition as the composition of differentials: for a displacement k, d(f \circ g)_a(k) = df_{g(a)}(dg_a(k)).
In standard notation, if h = f \circ g, then Dh(x) = Df(g(x)) \cdot Dg(x).
The derivative of the composition is the derivative of the outer function, evaluated at the inner function, multiplied by the derivative of the inner function.
11.6.0.1 Example: Sine of a Polynomial
Let h(x) = \sin(x^2). Write h = f \circ g where f(u) = \sin u and g(x) = x^2. Then Df(u) = \cos u and Dg(x) = 2x, so Dh(x) = \cos(x^2) \cdot 2x = 2x \cos(x^2).
The differential is dh_x(k) = 2x \cos(x^2) \cdot k.
11.6.0.2 Example: Exponential of a Quotient
Let h(x) = e^{x/(x+1)} for x \neq -1. Write h = f \circ g where f(u) = e^u and g(x) = \frac{x}{x+1}. Then Df(u) = e^u and, by Theorem 10.7, Dg(x) = \frac{(x+1) - x}{(x+1)^2} = \frac{1}{(x+1)^2}.
Thus Dh(x) = e^{x/(x+1)} \cdot \frac{1}{(x+1)^2}.
11.6.0.3 Example: Nested Composition
Let h(x) = \sin(e^{x^2}). This is h = f \circ g \circ k where f(u) = \sin u, g(v) = e^v, and k(x) = x^2. Applying Theorem 10.5 twice, Dh(x) = Df(g(k(x))) \cdot Dg(k(x)) \cdot Dk(x) = \cos(e^{x^2}) \cdot e^{x^2} \cdot 2x = 2x e^{x^2} \cos(e^{x^2}).
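The chain rule can also be carried out mechanically by propagating a value together with its derivative through each operation. The sketch below is one such scheme (often called forward-mode differentiation with dual numbers), offered as an illustration rather than as the chapter's method. The class `Dual` and the helpers `sin`, `exp`, and `derivative` are hypothetical names, and only the operations needed for the three examples above are implemented.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value u(x) paired with its derivative Du(x) at one fixed point x."""
    value: float
    deriv: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):  # product rule
        other = _lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __truediv__(self, other):  # quotient rule
        other = _lift(other)
        return Dual(self.value / other.value,
                    (self.deriv * other.value - self.value * other.deriv) / other.value**2)


def _lift(c):
    """Treat a plain number as a constant, whose derivative is 0."""
    return c if isinstance(c, Dual) else Dual(float(c), 0.0)


def sin(u: Dual) -> Dual:  # chain rule: D(sin u) = cos(u) * Du
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)


def exp(u: Dual) -> Dual:  # chain rule: D(e^u) = e^u * Du
    return Dual(math.exp(u.value), math.exp(u.value) * u.deriv)


def derivative(func, x: float) -> float:
    """Seed x with derivative 1 and read off the derivative of func at x."""
    return func(Dual(x, 1.0)).deriv


print(derivative(lambda x: sin(x * x), 1.3))         # compare 2x cos(x^2)
print(derivative(lambda x: exp(x / (x + 1)), 0.5))   # compare e^{x/(x+1)} / (x+1)^2
print(derivative(lambda x: sin(exp(x * x)), 0.7))    # compare 2x e^{x^2} cos(e^{x^2})
```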
11.7 Table of Derivatives
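The table below collects the derivatives obtained in this chapter. The entry for x^n has been proved for rational n, and the entries for \cot x, \sec x, and \csc x follow from Theorem 10.7 in the same way as the tangent computation.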
| Function | Derivative |
|---|---|
| c (constant) | 0 |
| x^n | nx^{n-1} |
| e^x | e^x |
| a^x | a^x\cdot \ln a |
| \ln x | \frac{1}{x} |
| \log_a x | \frac{1}{x \ln a} |
| \sin x | \cos x |
| \cos x | -\sin x |
| \tan x | \sec^2 x |
| \cot x | -\csc^2 x |
| \sec x | \sec x \tan x |
| \csc x | -\csc x \cot x |
Combined with Theorem 10.1, Theorem 10.4, Theorem 10.7, and Theorem 10.5, these derivatives allow us to differentiate any expression built from elementary functions. The process is algorithmic: identify the outermost operation, apply the corresponding rule, and work inward recursively.
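The recursive procedure described above can be sketched as a small symbolic differentiator. Everything about the encoding is an assumption made for illustration: expressions are nested tuples such as `("sin", u)` or `("pow", u, n)` with a constant exponent n, a quotient u/v is written as the product of u and `("pow", v, -1)`, and the result is not simplified.

```python
import math


def diff(e):
    """Return an expression tree for the derivative of e with respect to x."""
    op = e[0]
    if op == "const":
        return ("const", 0.0)
    if op == "var":
        return ("const", 1.0)
    if op == "add":   # sum rule
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":   # product rule
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "pow":   # power rule with chain rule: n u^(n-1) Du
        u, n = e[1], e[2]
        return ("mul", ("mul", ("const", n), ("pow", u, n - 1)), diff(u))
    if op == "sin":
        return ("mul", ("cos", e[1]), diff(e[1]))
    if op == "cos":
        return ("mul", ("mul", ("const", -1.0), ("sin", e[1])), diff(e[1]))
    if op == "exp":
        return ("mul", e, diff(e[1]))
    if op == "ln":
        return ("mul", ("pow", e[1], -1), diff(e[1]))
    raise ValueError(f"unknown operation: {op}")


def evaluate(e, x):
    """Evaluate the expression tree e at the point x."""
    op = e[0]
    if op == "const":
        return e[1]
    if op == "var":
        return x
    if op == "add":
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == "mul":
        return evaluate(e[1], x) * evaluate(e[2], x)
    if op == "pow":
        return evaluate(e[1], x) ** e[2]
    unary = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "ln": math.log}
    return unary[op](evaluate(e[1], x))


# sin(e^{x^2}) from the nested-composition example.
h = ("sin", ("exp", ("pow", ("var",), 2)))
print(evaluate(diff(h), 0.7))
```

Each branch of `diff` applies one rule and recurses on the inner expressions, so the outermost operation is handled first and the recursion works inward.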