Fullerton, California
copyright 2005 Benjamin Crowell
rev. November 16, 2013
This book is licensed under the Creative Commons
for those photographs and drawings of which I am not
the author, as listed in the photo credits. If you agree
to the license, it grants you certain privileges that you
would not otherwise have, such as the right to copy the
book, or download the digital version free of charge from At your option, you may also copy
this book under the GNU Free Documentation License version
1.2, with no invariant
sections, no front-cover texts, and no back-cover texts.
1 Rates of Change
1.1 Change in discrete steps
    Two sides of the same coin, 7; Some guesses, 9.
1.2 Continuous change
    A derivative, 13; Properties of the derivative, 14; Higher-order polynomials, 14; The second derivative.
1.3 Applications
    Maxima and minima, 17; Propagation of errors, 19.
Problems

2 To infinity and beyond!
2.1 Infinitesimals
2.2 Safe use of infinitesimals
2.3 The product rule
2.4 The chain rule
2.5 Exponentials and logarithms
    The exponential, 39; The logarithm, 40.
2.6 Quotients . . . 42
2.7 Differentiation on a computer . . . 43
Problems . . . 47

3 Limits and continuity
3.1 Continuity
    The intermediate value theorem, 53; The extreme value theorem, 56.
3.2 Limits
3.3 L’Hôpital’s rule
3.4 Another perspective on indeterminate forms
3.5 Limits at infinity
3.6 Generalizations of l’Hôpital’s rule . . . 65
    Multiple applications of the rule, 65; The indeterminate form ∞/∞, 66; Limits at infinity, 66.
Problems

4 Integration
4.1 Definite and indefinite integrals
4.2 The fundamental theorem of calculus
4.3 Properties of the integral
4.4 Applications
    Averages, 74; Work, 75; Probability, 75.
Problems

5 Techniques
5.1 Newton’s method
5.2 Implicit differentiation
5.3 Methods of integration
    Change of variable; Integration by parts; Integrals that can’t be done.
Problems

6 Improper integrals
6.1 Integrating a function that blows up . . . 99
6.2 Limits of integration at infinity . . . 100
Problems . . . 102

7 Sequences and Series
7.1 Infinite sequences . . . 103
7.2 Infinite series
7.3 Tests for convergence
7.4 Taylor series
Problems

8 Complex number techniques
8.1 Review of complex numbers . . . 117
8.2 Euler’s formula . . . 120
8.3 Partial fractions revisited . . . 122
Problems . . . 124

9 Iterated integrals
9.1 Integrals inside integrals . . . 127
9.2 Applications . . . 129
9.3 Polar coordinates . . . 131
9.4 Spherical and cylindrical coordinates . . . 133
Problems . . . 135

A Detours . . . 137
    Formal definition of the tangent line, 137; Derivatives of polynomials, 138; Details of the proof of the derivative of the sine function, 139; Formal statement of the transfer principle, 141; Is the transfer principle true?, 142; The transfer principle applied to functions, 147; Proof of the chain rule, 149; Derivative of $e^x$, 149; Proofs of the generalizations of l’Hôpital’s rule, 150; Proof of the fundamental theorem of calculus, 152; The intermediate value theorem, 154; Proof of the extreme value theorem, 157; Proof of the mean value theorem, 159; Proof of the fundamental theorem of algebra, 160.
B Answers and solutions
C Photo Credits . . . 197
D References and Further Reading . . . 199
    Further Reading; References, 199.
E Reference . . . 201
E.1 Review . . . 201
    Geometry, area, and volume, 201; Trigonometry with a right triangle, 201; Trigonometry with any triangle, 201.
E.2 Hyperbolic functions . . . 201
E.3 Calculus . . . 202
    Rules for differentiation, 202; Table of integrals, 202.
1 Rates of Change
1.1 Change in discrete steps
Toward the end of the eighteenth
century, a German elementary
school teacher decided to keep his
pupils busy by assigning them a
long, boring arithmetic problem:
to add up all the numbers from
one to a hundred.¹
The children set to work on their slates,
and the teacher lit his pipe, confident of a long break. But almost immediately, a boy named
Carl Friedrich Gauss brought up
his answer: 5,050.
Figure a suggests one way of solving this type of problem. The filled-in columns of the graph represent the numbers from 1 to 7, and adding them up means finding the area of the shaded region. Roughly half the square is shaded in, so if we want only an approximate solution, we can simply calculate $7^2/2 = 24.5$.

a / Adding the numbers from 1 to 7.

But, as suggested in figure b, it’s not much more work to get an exact result. There are seven sawteeth sticking out above the diagonal, with a total area of 7/2, so the total shaded area is $(7^2 + 7)/2 = 28$. In general, the sum of the first n numbers will be $(n^2 + n)/2$, which explains Gauss’s result: $(100^2 + 100)/2 = 5{,}050$.

b / A trick for finding the sum.

¹ I’m giving my own retelling of a hoary legend. We don’t really know the exact problem, just that it was supposed to have been something of this flavor.

Two sides of the same coin
Problems like this come up frequently. Imagine that each household in a certain small town sends a total of one ton of garbage to the dump every year. Over time, the garbage accumulates in the dump, taking up more and more space.
Let’s label the years as n = 1, 2, 3, . . ., and let the function² x(n) represent the amount of garbage that has accumulated by the end of year n. If the population is constant, say 13 households, then garbage accumulates at a constant rate, and we have x(n) = 13n.

² Recall that when x is a function, the notation x(n) means the output of the function when the input is n. It doesn’t represent multiplication of a number x by a number n.

But maybe the town’s population is growing. If the population starts out as 1 household in year 1, and then grows to 2 in year 2, and so on, then we have the same kind of problem that the young Gauss solved. After 100 years, the accumulated amount of garbage will be 5,050 tons. The pile of refuse grows more quickly every year; the rate of change of x is not constant. Tabulating the examples we’ve done so far, we have this:

    x                  rate of change
    13n                13
    $(n^2 + n)/2$      n

c / Carl Friedrich Gauss (1777-1855), a long time after graduating from elementary school.

The rate of change of the function x can be notated as ẋ. Given the function ẋ, we can always determine the function x for any value of n by doing a running sum.

Likewise, if we know x, we can determine ẋ by subtraction. In the example where x = 13n, we can find ẋ = x(n) − x(n − 1) = 13n − 13(n − 1) = 13. Or if we knew that the accumulated amount of garbage was given by $(n^2 + n)/2$, we could calculate the town’s population like this:

$$\frac{n^2 + n}{2} - \frac{(n-1)^2 + (n-1)}{2} = \frac{n^2 + n - (n^2 - 2n + 1 + n - 1)}{2} = n$$

d / ẋ is the slope of x.

The graphical interpretation of this is shown in figure d: on a graph of $x = (n^2 + n)/2$, the slope of the line connecting two successive points is the value of the function ẋ.

In other words, the functions x and ẋ are like different sides of the same coin. If you know one, you can find the other, with two caveats.

First, we’ve been assuming implicitly that the function x starts out at x(0) = 0. That might not be true in general. For instance, if we’re adding water to a reservoir over a certain period of time, the reservoir probably didn’t start out completely empty. Thus, if we know ẋ, we can’t find out everything about x without some further information: the starting value of x. If someone tells you ẋ = 13, you can’t conclude x = 13n, but only x = 13n + c, where c is some constant. There’s no such ambiguity if you’re going the opposite way, from x to ẋ. Even if x(0) ≠ 0, we still have ẋ = 13n + c − [13(n − 1) + c] = 13.

Second, it may be difficult, or even impossible, to find a formula for the answer when we want to determine the running sum x given a formula for the rate of change ẋ. Gauss had a flash of insight that led him to the result $(n^2 + n)/2$, but in general we might only be able to use a computer spreadsheet to calculate a number for the running sum, rather than an equation that would be valid for all values of n.

Some guesses
Even though we lack Gauss’s genius, we can recognize certain patterns. One pattern is that if ẋ is a function that gets bigger and bigger, it seems like x will be a function that grows even faster than ẋ. In the example of ẋ = n and $x = (n^2 + n)/2$, consider what happens for a large value of n, like 100. At this value of n, ẋ = 100, which is pretty big, but even without pawing around for a calculator, we know that x is going to turn out really really big. Since n is large, $n^2$ is quite a bit bigger than n, so roughly speaking, we can approximate $x \approx n^2/2 = 5{,}000$. 100 may be a big number, but 5,000 is a lot bigger. Continuing in this way, for n = 1000 we have ẋ = 1000, but x ≈ 500,000; now x has far outstripped ẋ. This can be a fun game to play with a calculator: look at which functions grow the fastest. For instance, your calculator might have an $x^2$ button, an $e^x$ button, and a button for x! (the factorial function, defined as x! = 1·2·. . .·x, e.g., 4! = 1 · 2 · 3 · 4 = 24). You’ll find that $50^2$ is pretty big, but $e^{50}$ is incomparably greater, and 50! is so big that it causes an error.

All the x and ẋ functions we’ve seen so far have been polynomials. If x is a polynomial, then of course we can find a polynomial for ẋ as well, because if x is a polynomial, then x(n) − x(n − 1) will be one too.
It also looks like every polynomial
we could choose for ẋ might also
correspond to an x that’s a polynomial. And not only that, but it
looks as though there’s a pattern
in the power of n. Suppose x is a
polynomial, and the highest power
of n it contains is a certain number — the “order” of the polynomial. Then ẋ is a polynomial of
that order minus one. Again, it’s
fairly easy to prove this going one
way, passing from x to ẋ, but more
difficult to prove the opposite relationship: that if ẋ is a polynomial
of a certain order, then x must be
a polynomial with an order that’s
greater by one.
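The two directions discussed above, running sums and successive differences, can be tried out in a few lines of Python. This is only a sketch: the helper names `running_sum` and `differences` are made up for the illustration, and both assume the starting value x(0) = 0.

```python
def running_sum(rate, n_max):
    """Return [x(1), ..., x(n_max)], where x(n) = rate(1) + ... + rate(n) and x(0) = 0."""
    total = 0
    values = []
    for n in range(1, n_max + 1):
        total += rate(n)
        values.append(total)
    return values


def differences(values):
    """Return x(n) - x(n-1) for n = 1, 2, ..., taking x(0) = 0."""
    previous = [0] + values[:-1]
    return [b - a for a, b in zip(previous, values)]


# Growing town: n households send n tons of garbage in year n.
x = running_sum(lambda n: n, 100)
formula = [(n * n + n) // 2 for n in range(1, 101)]

print(x[-1])               # accumulated garbage after 100 years
print(x == formula)        # does the running sum agree with (n^2 + n)/2?
print(differences(x)[:5])  # subtraction recovers the rate of change
```

Going from x back to ẋ needs nothing more than subtraction, while going from ẋ to x needs the extra assumption about x(0), which is the first caveat in miniature.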
We’d imagine, then, that the running sum of $\dot{x} = n^2$ would be a polynomial of order 3. If we calculate $x(100) = 1^2 + 2^2 + \ldots + 100^2$ on a computer spreadsheet, we get 338,350, which looks suspiciously close to 1,000,000/3. It looks like $x(n) = n^3/3 + \ldots$, where the dots represent terms involving lower powers of n such as $n^2$. The fact that the coefficient of the $n^3$ term is 1/3 is proved in problem 21 on p. 23.
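The spreadsheet calculation is easy to reproduce in Python instead. The sketch below (the function name `sum_of_squares` is just a label chosen here) computes the running sum directly and compares it with $n^3$; it is a numerical check, not a proof.

```python
def sum_of_squares(n):
    """Running sum x(n) = 1^2 + 2^2 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))


for n in (10, 100, 1000):
    x = sum_of_squares(n)
    ratio = x / n**3   # drifts toward 1/3 as n grows
    print(n, x, ratio)
```

The ratio creeps down toward 1/3 as n increases, which is what we’d expect if the lower-order terms matter less and less.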
Example 1
Figure e shows a pyramid consisting of a single cubical block on top, supported by a 2 × 2 layer, supported in turn by a 3 × 3 layer. The total volume is $1^2 + 2^2 + 3^2$, in units of the volume of a single block.

e / A pyramid with a volume of $1^2 + 2^2 + 3^2$.

Generalizing to the sum $x(n) = 1^2 + 2^2 + \ldots + n^2$, and applying the result of the preceding paragraph, we find that the volume of such a pyramid is approximately (1/3)Ah, where $A = n^2$ is the area of the base and h = n is the height.

When n is very large, we can get as good an approximation as we like to a smooth-sided pyramid, and the error incurred in $x(n) \approx (1/3)n^3 + \ldots$ by omitting the lower-order terms . . . can be made as small as desired.

We therefore conclude that the volume is exactly (1/3)Ah for a smooth-sided pyramid with these proportions.

This is a special case of a theorem first proved by Euclid (propositions XII-6 and XII-7) two thousand years before calculus was invented.
1.2 Continuous change
Did you notice that I sneaked something past you in the example of water filling up a reservoir? The x and ẋ functions I’ve been using as examples have all been functions defined on the integers, so they represent change that happens in discrete steps, but the flow of water into a reservoir is smooth and continuous. Or is it? Water is made out of molecules, after all. It’s just that water molecules are so small that we don’t notice them as individuals. Figure g shows a graph that is discrete, but almost appears continuous because the scale has been chosen so that the points blend together visually.

g / On this scale, the graph of $(n^2 + n)/2$ appears almost continuous.

The physicist Isaac Newton started thinking along these lines in the 1660’s, and figured out ways of analyzing x and ẋ functions that were truly continuous. The notation ẋ is due to him (and he only used it for continuous functions). Because he was dealing with the continuous flow of change, he called his new set of mathematical techniques the method of fluxions, but nowadays it’s known as the calculus.

f / Isaac Newton (1643-1727)

h / The function $x(t) = t^2/2$, and its tangent line at the point (1, 1/2).

Newton was a physicist, and he needed to invent the calculus as part of his study of how objects move. If an object is moving in one dimension, we can specify its position with a variable x, and x will then be a function of time, t. The rate of change of its position, ẋ, is its speed, or velocity. Earlier experiments by Galileo had established that when a ball rolled down a slope, its position was proportional to $t^2$, so Newton inferred that a graph like figure h would be typical for any object moving under the influence of a constant force. (It could be $7t^2$, or $t^2/42$, or anything else proportional to $t^2$, depending on the force acting on the object and the object’s mass.)

i / This line isn’t a tangent line: it crosses the graph.
Because the functions are continuous, not discrete, we can no longer
define the relationship between x
and ẋ by saying x is a running sum
of ẋ’s, or that ẋ is the difference between two successive x’s. But we
already found a geometrical relationship between the two functions
in the discrete case, and that can
serve as our definition for the continuous case: x is the area under
the graph of ẋ, or, if you like, ẋ is
the slope of the graph of x. For
now we’ll concentrate on the slope.
This definition is still a little vague,
because we haven’t defined what
we mean by the “slope” of a curving graph. For a discrete graph
like figure d, we could define it as
the slope of the line drawn between
neighboring points. Visually, it’s
clear that the continuous version
of this is something like the line
drawn in figure h. This is referred
to as the tangent line.
We still need to convert this intuitive idea of a tangent line into
a formal definition. In a typical example like figure h, the tangent line can be defined as the line
that touches the graph at a certain
point, but, unlike the line in figure i, doesn’t cut across the graph
at that point.³ By measuring with a ruler on figure h, we find that the slope is very close to 1, so evidently ẋ(1) = 1. To prove this, we construct the function representing the line: $\ell(t) = t - 1/2$. We want to prove that this line doesn’t cross the graph of $x(t) = t^2/2$. The difference between the two functions, $x - \ell$, is the polynomial $t^2/2 - t + 1/2$, and this polynomial will be zero for any value of t where the line touches or crosses the curve. We can use the quadratic formula to find these points, and the result is that there is only one of them, which is t = 1. Since $x - \ell$ is positive for at least some points to the left and right of t = 1, and it only equals zero at t = 1, it must never be negative, which means that the line always lies below the curve, never crossing it.
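The argument above can be checked numerically. The following Python sketch applies the quadratic formula to $t^2/2 - t + 1/2$ and then samples the gap between curve and line at a handful of arbitrarily chosen points; the sample points are my own choice, and sampling only illustrates the claim rather than proving it.

```python
def x(t):
    return t**2 / 2


def line(t):
    return t - 0.5


# Coefficients of x - line = a t^2 + b t + c
a, b, c = 0.5, -1.0, 0.5
discriminant = b**2 - 4 * a * c   # zero means a single (repeated) root
root = -b / (2 * a)
print(discriminant, root)

# The gap between the curve and the line at a few sample points.
gaps = [x(t) - line(t) for t in (-2.0, 0.0, 0.5, 1.0, 1.5, 3.0)]
print(gaps)
```

A zero discriminant is the quadratic formula’s way of saying that the line and the curve meet at exactly one point, and the sampled gaps are never negative.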
³ In the case where the original graph is itself a line, the tangent line simply coincides with the graph, and this also satisfies the definition, because the tangent line doesn’t cut across the graph; it lies on top of it. There is one other exceptional case, called a point of inflection, which we won’t worry about right now. For a more complicated definition that correctly handles all the cases, see page 137.
A derivative
That proves that ẋ(1) = 1, but it was a lot of work, and we don’t want to do that much work to evaluate ẋ at every value of t. There’s a way to avoid all that, and find a formula for ẋ. Compare figures h and j. They’re both graphs of the same function, and they both look the same. What’s different? The only difference is the scales: in figure j, the t axis has been shrunk by a factor of 2, and the x axis by a factor of 4. The graph looks the same, because doubling t quadruples $t^2/2$. The tangent line here is the tangent line at t = 2, not t = 1, and although it looks like the same line as the one in figure h, it isn’t, because the scales are different. The line in figure h had a slope of rise/run = 1/1 = 1, but this one’s slope is 4/2 = 2. That means ẋ(2) = 2. In general, this scaling argument shows that ẋ(t) = t for any t.

j / The function $t^2/2$. How is this different from figure h?
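As a cross-check on the result ẋ(t) = t, here is a small Python sketch that measures slopes numerically. It is not the scaling argument itself, just a stand-in for the ruler: the helper `secant_slope` (a name invented here) finds the slope of the line through two points straddling t.

```python
def x(t):
    return t**2 / 2


def secant_slope(f, t, dt):
    """Slope of the line through (t - dt, f(t - dt)) and (t + dt, f(t + dt))."""
    return (f(t + dt) - f(t - dt)) / (2 * dt)


for t in (1.0, 2.0, 5.0):
    print(t, secant_slope(x, t, 1e-3))
```

For each t the measured slope agrees with t to within rounding error.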
This is called differentiating: finding a formula for the function ẋ,
given a formula for the function
x. The term comes from the idea
that for a discrete function, the
slope is the difference between two
successive values of the function.
The function ẋ is referred to as the
derivative of the function x, and
the art of differentiating is differential calculus. The opposite process, computing a formula for x
when given ẋ, is called integrating,
and makes up the field of integral
calculus; this terminology is based
on the idea that computing a running sum is like putting together
(integrating) many little pieces.
Note the similarity between this result for continuous functions,

$$x = t^2/2 \qquad \dot{x} = t$$

and our earlier result for discrete ones,

$$x = (n^2 + n)/2 \qquad \dot{x} = n$$

The similarity is no coincidence. A continuous function is just a smoothed-out version of a discrete one. For instance, the continuous version of the staircase function shown in figure b on page 7 would simply be a triangle without the sawteeth sticking out; the area of those ugly sawteeth is what’s represented by the n/2 term in the discrete result $x = (n^2 + n)/2$, which is the only thing that makes it different from the continuous result $x = t^2/2$.
Properties of the derivative
It follows immediately from the definition of the derivative that multiplying a function by a constant multiplies its derivative by the same constant, so for example since we know that the derivative of $t^2/2$ is t, we can immediately tell that the derivative of $t^2$ is 2t, and the derivative of $t^2/17$ is 2t/17.

Also, if we add two functions, their derivatives add. To give a good example of this, we need to have another function that we can differentiate, one that isn’t just some multiple of $t^2$. An easy one is t: the derivative of t is 1, since the graph of x = t is a line with a slope of 1, and the tangent line lies right on top of the original line.
Example 2
The derivative of $5t^2 + 2t$ is the derivative of $5t^2$ plus the derivative of 2t, since derivatives add. The derivative of $5t^2$ is 5 times the derivative of $t^2$, and the derivative of 2t is 2 times the derivative of t, so putting everything together, we find that the derivative of $5t^2 + 2t$ is (5)(2t) + (2)(1) = 10t + 2.

The derivative of a constant is zero, since a constant function’s graph is a horizontal line, with a slope of zero. We now know enough to differentiate any second-order polynomial.
Example 3
An insect pest from the United States is inadvertently released in a village in rural China. The pests spread outward at a rate of s kilometers per year, forming a widening circle of contagion. Find the number of square kilometers per year that become newly infested. Check that the units of the result make sense. Interpret the result.

Let t be the time, in years, since the pest was introduced. The radius of the circle is r = st, and its area is $a = \pi r^2 = \pi (st)^2$. To make this look like a polynomial, we have to rewrite this as $a = (\pi s^2)t^2$. The derivative is

$$\dot{a} = (\pi s^2)(2t)$$
$$\dot{a} = (2\pi s^2)t$$

The units of s are km/year, so squaring it gives km²/year². The 2 and the π are unitless, and multiplying by t gives units of km²/year, which is what we expect for ȧ, since it represents the number of square kilometers per year that become infested.

Interpreting the result, we notice a couple of things. First, the rate of infestation isn’t constant; it’s proportional to t, so people might not pay so much attention at first, but later on the effort required to combat the problem will grow more and more quickly. Second, we notice that the result is proportional to $s^2$. This suggests that anything that could be done to reduce s would be very helpful. For instance, a measure that cut s in half would reduce ȧ by a factor of four.
Higher-order polynomials
So far, we have the following results for polynomials up to order 2:

    function    derivative
    1           0
    t           1
    $t^2$       2t

Interpreting 1 as $t^0$, we detect what seems to be a general rule, which is that the derivative of $t^k$ is $kt^{k-1}$. The proof is straightforward but not very illuminating if carried out with the methods developed in this chapter, so I’ve relegated it to page 138. It can be proved much more easily using the methods of chapter 2.
Example 4
If $x = 2t^7 - 4t + 1$, find ẋ.

This is similar to example 2, the only difference being that we can now handle higher powers of t. The derivative of $t^7$ is $7t^6$, so we have

$$\dot{x} = (2)(7t^6) + (-4)(1) + 0 = 14t^6 - 4$$
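The rules used in this example (constants factor out, derivatives add, and the derivative of $t^k$ is $kt^{k-1}$) are mechanical enough to hand to a computer. Here is a minimal Python sketch, assuming we choose to store a polynomial as a list of coefficients with the coefficient of $t^k$ at position k; both that representation and the name `differentiate` are my own for this illustration.

```python
def differentiate(coeffs):
    """coeffs[k] is the coefficient of t^k; return the derivative's coefficients."""
    # Each term c t^k becomes (k c) t^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


# x = 2t^7 - 4t + 1, written lowest power first
x = [1, -4, 0, 0, 0, 0, 0, 2]
print(differentiate(x))          # coefficients of t^0 through t^6

# Example 2: 5t^2 + 2t
print(differentiate([0, 2, 5]))
```

Reading the first output lowest power first gives $-4 + 14t^6$, in agreement with the hand calculation.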
Example 5
Calculate $3^{-1}$ and $3.01^{-1}$. Does this seem consistent with a conjecture that the rule for differentiating $t^k$ holds for k < 0?

We have $3^{-1} \approx 0.33333$ and $3.01^{-1} \approx 0.33223$, the difference being $-1.1 \times 10^{-3}$. This suggests that the graph of x = 1/t has a tangent line at t = 3 with a slope of about

$$\frac{-1.1 \times 10^{-3}}{0.01} = -0.11$$

If the rule for differentiating $t^k$ were to hold, then we would have $\dot{x} = -t^{-2}$, and evaluating this at t = 3 would give −1/9, which is indeed about −0.11. Yes, the rule does appear to hold for negative k, although this numerical check does not constitute a proof. A proof is given in example 10 on p. 27.
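The same numerical check takes only a few lines of Python. This is just the arithmetic of the example redone by machine, so it carries the same caveat: agreement at one point is evidence, not a proof.

```python
def x(t):
    return 1 / t


t, dt = 3.0, 0.01
slope_estimate = (x(t + dt) - x(t)) / dt   # rise over run between t = 3 and t = 3.01
rule_prediction = -t**-2                   # k t^(k-1) with k = -1
print(slope_estimate, rule_prediction)
```

Making `dt` smaller brings the estimate closer to −1/9.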
The second derivative
I described how Galileo and Newton found that an object subject to an external force, starting from rest, would have a velocity ẋ that was proportional to t, and a position x that varied like $t^2$. The proportionality constant for the velocity is called the acceleration, a, so that ẋ = at and $x = at^2/2$. For example, a sports car accelerating from a stop sign would have a large acceleration, and its velocity at a given time would therefore be a large number. The acceleration can be thought of as the derivative of the derivative of x, written ẍ, with two dots. In our example, ẍ = a. In general, the acceleration doesn’t need to be constant. For example, the sports car will eventually have to stop accelerating, perhaps because the backward force of air friction becomes as great as the force pushing it forward. The total force acting on the car would then be zero, and the car would continue in motion at a constant speed.
Example 6
Suppose the pilot of a blimp has just turned on the motor that runs its propeller, and the propeller is spinning up. The resulting force on the blimp is therefore increasing steadily, and let’s say that this causes the blimp to have an acceleration ẍ = 3t, which increases steadily with time. We want to find the blimp’s velocity and position as functions of time.

For the velocity, we need a polynomial whose derivative is 3t. We know that the derivative of $t^2$ is 2t, so we need to use a function that’s bigger by a factor of 3/2: $\dot{x} = (3/2)t^2$. In fact, we could add any constant to this, and make it $\dot{x} = (3/2)t^2 + 14$, for example, where the 14 would represent the blimp’s initial velocity. But since the blimp has been sitting dead in the air until the motor started working, we can assume the initial velocity was zero. Remember, any time you’re working backwards like this to find a function whose derivative is some other function (integrating, in other words), there is the possibility of adding on a constant like this.

Finally, for the position, we need something whose derivative is $(3/2)t^2$. The derivative of $t^3$ would be $3t^2$, so we need something half as big as this: $x = t^3/2$.
The second derivative can be interpreted as a measure of the curvature of the graph, as shown in figure k. The graph of the function x = 2t is a line, with no curvature. Its first derivative is 2, and its second derivative is zero. The function $t^2$ has a second derivative of 2, and the more tightly curved function $7t^2$ has a bigger second derivative, 14.

k / The functions 2t, $t^2$ and $7t^2$.

Positive and negative signs of the second derivative indicate concavity. In figure l, the function $t^2$ is like a cup with its mouth pointing up. We say that it’s “concave up,” and this corresponds to its positive second derivative. The function $3 - t^2$, with a second derivative less than zero, is concave down. Another way of saying it is that if you’re driving along a road shaped like $t^2$, going in the direction of increasing t, then your steering wheel is turned to the left, whereas on a road shaped like $3 - t^2$ it’s turned to the right.

l / The functions $t^2$ and $3 - t^2$.

Figure m shows a third possibility. The function $t^3$ has a derivative $3t^2$, which equals zero at t = 0. This is called a point of inflection. The concavity of the graph is down on the left, up on the right. The inflection point is where it switches from one concavity to the other. In the alternative description in terms of the steering wheel, the inflection point is where your steering wheel is crossing from left to right.

m / The function $t^3$ has an inflection point at t = 0.

1.3 Applications
Maxima and minima
When a function goes up and then smoothly turns around and comes back down again, it has zero slope at the top. A place where ẋ = 0, then, could represent a place where x was at a maximum. On the other hand, it could be concave up, in which case we’d have a minimum. The term extremum refers to either a maximum or a minimum.

Example 7
Fred receives a mysterious e-mail tip telling him that his investment in a certain stock will have a value given by $x = -2t^4 + (6.4577 \times 10^{10})t$, where t ≥ 2005 is the year. Should he sell at some point? If so, when?

If the value reaches a maximum at some time, then the derivative should be zero then. Taking the derivative and setting it equal to zero, we have
$$0 = -8t^3 + 6.4577 \times 10^{10}$$
$$t = \left(\frac{6.4577 \times 10^{10}}{8}\right)^{1/3}$$
$$t = 2006.0$$

A cube root has only one real value, so this is the only solution, and it falls within the range t ≥ 2005 for which the tip claimed the function would be valid.

Should Fred sell at t = 2006.0?
But this could be a maximum, a minimum, or an inflection point. Fred definitely does not want to sell at t = 2006
if it’s a minimum! To check which of
the three possibilities hold, Fred takes
the second derivative:
$$\ddot{x} = -24t^2$$
Plugging in t = 2006.0, we find that
the second derivative is negative at
that time, so it is indeed a maximum.
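Fred’s calculation can be replayed in Python. The sketch below assumes nothing beyond the formula quoted in the tip; the names `x_dot`, `x_ddot`, and `t_star` are just labels chosen for this illustration.

```python
c = 6.4577e10   # coefficient of t in the tipster's formula


def x_dot(t):
    """First derivative of x = -2t^4 + c t."""
    return -8 * t**3 + c


def x_ddot(t):
    """Second derivative of x."""
    return -24 * t**2


t_star = (c / 8) ** (1 / 3)   # where the first derivative vanishes
print(round(t_star, 1))
print(x_ddot(t_star) < 0)     # a negative second derivative indicates a maximum
```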
Implicit in this whole discussion was the assumption that the maximum or minimum occurred where the function was smooth. There are some other possibilities.

In figure n, the function’s minimum occurs at an end-point of its domain.

n / The function $x = \sqrt{t}$ has a minimum at t = 0, which is not a place where ẋ = 0. This point is the edge of the function’s domain.

Another possibility is that the function can have a minimum or maximum at some point where its derivative isn’t well defined. Figure o shows such a situation. There is a kink in the function at t = 0, so a wide variety of lines could be placed through the graph there, all with different slopes and all staying on one side of the graph. There is no uniquely defined tangent line, so the derivative is undefined.

o / The function x = |t| has a minimum at t = 0, which is not a place where ẋ = 0. This is a point where the function isn’t differentiable.
Example 8
Rancher Rick has a length of cyclone fence L with which to enclose a rectangular pasture. Show that he can enclose the greatest possible area by forming a square with sides of length L/4.

If the width and length of the rectangle are t and u, and Rick is going to use up all his fencing material, then the perimeter of the rectangle, 2t + 2u, equals L, so for a given width, t, the length is u = L/2 − t. The area is a = tu = t(L/2 − t). The function only means anything realistic for 0 ≤ t ≤ L/2, since for values of t outside this region either the width or the height of the rectangle would be negative. The function a(t) could therefore have a maximum either at a place where ȧ = 0, or at the endpoints of the function’s domain. We can eliminate the latter possibility, because the area is zero at the endpoints.

To evaluate the derivative, we first need to reexpress a as a polynomial:

$$a = -t^2 + \frac{L}{2}t$$

The derivative is

$$\dot{a} = -2t + \frac{L}{2}$$

Setting this equal to zero, we find t = L/4, as claimed. This is a maximum, not a minimum or an inflection point, because the second derivative is the constant ä = −2, which is negative for all t, including t = L/4.
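A brute-force search gives an independent check on the result t = L/4. In this Python sketch the fence length of 100 is a made-up number (any positive value would do), and the grid of 1001 trial widths is likewise an arbitrary choice.

```python
L = 100.0   # hypothetical fence length, in whatever units Rick prefers


def area(t):
    """Area of a rectangle with width t and perimeter L."""
    return t * (L / 2 - t)


# Try evenly spaced widths across the realistic range 0 <= t <= L/2.
widths = [i * (L / 2) / 1000 for i in range(1001)]
best = max(widths, key=area)
print(best, area(best))
```

The winning width comes out equal to L/4, the square, and the areas at the two endpoints of the range are zero, as noted in the example.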
Propagation of errors
take the tangent line as an approximation to the actual graph.
The slope of the tangent line is
the derivative of V , which is 4πr2 .
(This is the ball’s surface area.)
Setting (slope) = (rise)/(run) and
solving for the rise, which represents the change in V , we find
that it could be off by as much as
(4πr2 )(0.1 cm) = 170 cm3 . The
volume of the ball can therefore be
expressed as 6500±170 cm3 , where
the original figure of 6538 has been
rounded off to the nearest hundred
in order to avoid creating the impression that the 3 and the 8 actually mean anything — they clearly
don’t, since the possible error is
out in the hundreds’ place.
The Women’s National Basketball
Association says that balls used in
its games should have a radius of
11.6 cm, with an allowable range of
error of plus or minus 0.1 cm (one
millimeter). How accurately can
we determine the ball’s volume?
This calculation is an example of a
very common situation that occurs
in the sciences, and even in everyday life, in which we base a calculation on a number that has some
range of uncertainty in it, causing a
corresponding range of uncertainty
in the final result. This is called
p / How accurately can we determine propagation of errors. The idea is
the ball’s volume?
that the derivative expresses how
sensitive the function’s output is to
its input.
The equation for the volume of
a sphere gives V = (4/3)πr^3 = 6538 cm^3 (about six and a half liters). We have a function V(r), and we want to know how much of an effect will be produced on the function’s output V if its input r is changed by a certain small amount. Since the amount by which r can be changed is small compared to r, it’s reasonable to
The example of the basketball could also have been handled without calculus, simply by recalculating the volume using a radius that was raised from 11.6 to 11.7 cm, and finding the difference between the two volumes. Understanding it in terms of calculus, however, gives us a different way of getting at the same ideas, and often allows us to understand more deeply what’s going on. For example, we noticed in passing that the derivative of the volume was simply the surface area of the ball, which provides a nice geometric visualization. We can imagine inflating the ball so that its radius is increased by a millimeter. The amount of added volume equals the surface area of the ball multiplied by one millimeter, just as the amount of volume added to the world’s oceans by global warming equals the oceans’ surface area multiplied by the added depth.
For an example of an insight that we would have missed if we hadn’t applied calculus, consider how much error is incurred in the measurement of the width of a book if the ruler is placed on the book at a slightly incorrect angle, so that it doesn’t form an angle of exactly 90 degrees with the spine. The measurement has its minimum (and correct) value if the ruler is placed at exactly 90 degrees. Since the function has a minimum at this angle, its derivative is zero. That means that we expect essentially no error in the measurement if the ruler’s angle is just a tiny bit off. This gives us the insight that it’s not worth fiddling excessively over the angle in this measurement. Other sources of error will be more important. For example, is the book a uniform rectangle? Are we using the worn end of the ruler as its zero, rather than letting the ruler hang over both sides of the book and subtracting the two measurements?
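The basketball comparison above can be tried numerically. The following is only a sketch: the function names are made up for illustration, and the radius change of 0.1 cm is the one quoted in the text. It recomputes the volume directly and compares the difference with the surface area multiplied by the change in radius.

```python
import math

def sphere_volume(r):
    """Volume of a sphere of radius r."""
    return (4.0 / 3.0) * math.pi * r**3

def sphere_area(r):
    """Surface area of a sphere of radius r; this is also dV/dr."""
    return 4.0 * math.pi * r**2

r, dr = 11.6, 0.1  # radius and small change in radius, both in cm

# Method 1: recalculate the volume and subtract (no calculus needed).
exact_change = sphere_volume(r + dr) - sphere_volume(r)

# Method 2: derivative (surface area) multiplied by the small change dr.
linear_estimate = sphere_area(r) * dr

print(f"V(r)          = {sphere_volume(r):.0f} cm^3")
print(f"exact change  = {exact_change:.1f} cm^3")
print(f"area times dr = {linear_estimate:.1f} cm^3")
```

The two numbers agree to within about one percent, and the agreement improves as dr is made smaller.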
Graph the function t^2 in the neighborhood of t = 3, draw a tangent line, and use its slope to verify that the derivative equals 2t at this point.
▷ Solution, p. 164
Graph the function sin e^t in the neighborhood of t = 0, draw a tangent line, and use its slope to estimate the derivative. Answer: 0.5403023058. (You will of course not get an answer this precise using this technique.)
▷ Solution, p. 164
Differentiate the following functions with respect to t: 1, 7, t, 7t, t^2, 7t^2, t^3, 7t^3.
▷ Solution, p. 165
Differentiate 3t^7 − 4t^2 + 6 with respect to t.
▷ Solution, p. 165
In other words, integrate the given
. Solution, p. 166
Let t be the time that has
elapsed since the Big Bang. In
that time, one would imagine that
light, traveling at speed c, has been
able to travel a maximum distance
ct. (In fact the distance is several
times more than this, because according to Einstein’s theory of general relativity, space itself has been
expanding while the ray of light
was in transit.) The portion of
the universe that we can observe
would then be a sphere of radius
ct, with volume v = (4/3)πr^3 = (4/3)π(ct)^3. Compute the rate v̇ at which the observable universe is increasing, and check that your answer has the right units, as in example 3 on page 14.
▷ Solution, p. 166
Differentiate at^2 + bt + c with respect to t.
▷ Solution, p. 165 [Thompson, 1919]
Find two different functions whose derivatives are the constant 3, and give a geometrical interpretation.
▷ Solution, p. 165
Find a function x whose derivative is ẋ = t^7. In other words, integrate the given function.
▷ Solution, p. 166
Find a function x whose derivative is ẋ = 3t. In other words, integrate the given function.
▷ Solution, p. 166
Find a function x whose derivative is ẋ = 3t^7 − 4t^2 + 6.
▷ Solution, p. 166
Kinetic energy is a measure of an object’s quantity of motion; when you buy gasoline, the energy you’re paying for will be converted into the car’s kinetic energy (actually only some of it, since the engine isn’t perfectly efficient). The kinetic energy of an object with mass m and velocity v is given by K = (1/2)mv^2. For a car accelerating at a steady rate, with v = at, find the rate K̇ at which the engine is required to put out kinetic energy. K̇, with units of energy over time, is known as the power. Check that your answer has the right units, as in example 3 on page 14.
A metal square expands and contracts with temperature, the lengths of its sides varying according to the equation ℓ = (1 + αT)ℓ_o. Find the rate of change of its surface area with respect to temperature. That is, find ℓ̇, where the variable with respect to which you’re differentiating is the temperature, T. Check that your answer has the right units, as in example 3 on page 14.
▷ Solution, p. 167
Find the second derivative of 2t^3 − t.
▷ Solution, p. 167
Locate any points of inflection of the function t^3 + t^2. Verify by graphing that the concavity of the function reverses itself at this point.
▷ Solution, p. 167
Let’s see if the rule that the derivative of t^k is kt^(k−1) also works for k < 0. Use a graph to test one particular case, choosing one particular negative value of k, and one particular value of t. If it works, what does that tell you about the rule? If it doesn’t work?
▷ Solution, p. 167
Two atoms will interact via electrical forces between their protons and electrons. To put them at a distance r from one another (measured from nucleus to nucleus), a certain amount of energy E is required, and the minimum energy occurs when the atoms are in equilibrium, forming a molecule. Often a fairly good approximation to the energy is the Lennard-Jones expression
\[ E(r) = k\left[\left(\frac{a}{r}\right)^{12} - 2\left(\frac{a}{r}\right)^{6}\right] \]
where k and a are constants. Note that, as proved in chapter 2, the rule that the derivative of t^k is kt^(k−1) also works for k < 0. Show that there is an equilibrium at r = a. Verify (either by graphing or by testing the second derivative) that this is a minimum, not a maximum or a point of inflection.
▷ Solution, p. 169
Prove that the total number of maxima and minima possessed by a third-order polynomial is at most two.
▷ Solution, p. 170
Functions f and g are defined on the whole real line, and are differentiable everywhere. Let s = f + g be their sum. In what ways, if any, are the extrema of f, g, and s related?
▷ Solution, p. 170
Euclid proved that the volume of a pyramid equals (1/3)bh, where b is the area of its base, and h its height. A pyramidal tent without tent-poles is erected by blowing air into it under pressure. The area of the base is easy to measure accurately, because the base is nailed down, but the height fluctuates somewhat and is hard to measure accurately. If the amount of uncertainty in the measured height is plus or minus e_h, find the amount of possible error e_V in the volume.
▷ Solution, p. 171
A hobbyist is going to measure the height to which her model rocket rises at the peak of its trajectory. She plans to take a digital photo from far away and then do trigonometry to determine the height, given the baseline from the launchpad to the camera and the angular height of the rocket as determined from analysis of the photo. Comment on the error incurred by the inability to snap the photo at exactly the right moment.
▷ Solution, p. 171
Prove, as claimed on p. 10, that if the sum 1^2 + 2^2 + . . . + n^2 is a polynomial, it must be of third order, and the coefficient of the n^3 term must be 1/3.
▷ Solution, p. 171
2 To infinity — and beyond!
a / Gottfried
Little kids readily pick up the idea
of infinity. “When I grow up,
I’m gonna have a million Barbies.”
“Oh yeah? Well, I’m gonna have
a billion.” “Well, I’m gonna have
infinity Barbies.” “So what? I’ll
have two infinity of them.” Adults
laugh, convinced that infinity, ∞,
is the biggest number, so 2∞ can’t
be any bigger. This is the idea behind a joke in the movie Toy Story.
Buzz Lightyear’s slogan is “To infinity — and beyond!” We assume
there isn’t any beyond. Infinity is
supposed to be the biggest there
is, so by definition there can’t be
anything bigger, right?
2.1 Infinitesimals
Actually mathematicians have invented many different logical systems for working with infinity, and in most of them infinity does come in different sizes and flavors. Newton, as well as the German mathematician Leibniz who invented calculus independently,1 had a strong intuitive idea that calculus was really about numbers that were infinitely small: infinitesimals, the opposite of infinities. For instance, consider the number 1.1^2 = 1.21. That 2 in the first decimal place is the same 2 that appears in the expression 2t for the derivative of t^2.
b / A close-up view of the function x = t^2, showing the line that connects the points (1, 1) and (1.1, 1.21).
1 There is some dispute over this point.
Newton and his supporters claimed that
Leibniz plagiarized Newton’s ideas, and
merely invented a new notation for them.
Figure b shows the idea visually.
The line connecting the points
(1, 1) and (1.1, 1.21) is almost indistinguishable from the tangent
line on this scale. Its slope is
(1.21 − 1)/(1.1 − 1) = 2.1, which
is very close to the tangent line’s
slope of 2. It was a good approximation because the points were
close together, separated by only
0.1 on the t axis.
If we needed a better approximation, we could try calculating 1.01^2 = 1.0201. The slope of the line connecting the points (1, 1) and (1.01, 1.0201) is 2.01, which is even closer to the slope of the tangent line.
Another method of visualizing the idea is that we can interpret x = t^2 as the area of a square with sides of length t, as suggested in figure c. We increase t by an infinitesimally small number dt. The d is Leibniz’s notation for a very small difference, and dt is to be read as a single symbol, “dee-tee,” not as a number d multiplied by a number t. The idea is that dt is smaller than any ordinary number you could imagine, but it’s not zero. The area of the square is increased by dx = 2t dt + dt^2, which is analogous to the finite numbers 0.21 and 0.0201 we calculated earlier. Where before we divided by a finite change in t such as 0.1 or 0.01, now we divide by dt, producing
\[ \frac{dx}{dt} = \frac{2t\,dt + dt^2}{dt} = 2t + dt \]
for the derivative. On a graph like figure b, dx/dt is the slope of the tangent line: the change in x divided by the change in t.
But adding an infinitesimal number dt onto 2t doesn’t really change it by any amount that’s even theoretically measurable in the real world, so the answer is really 2t. Evaluating it at t = 1 gives the exact result, 2, that the earlier approximate results, 2.1 and 2.01, were getting closer and closer to.
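The approach of the slopes 2.1 and 2.01 toward 2 can be reproduced with exact fractions. This is a minimal sketch; the helper names secant_slope and square are invented for the illustration, and the step sizes are the finite ones used in the text plus one smaller one.

```python
from fractions import Fraction

def secant_slope(f, t, dt):
    """Slope of the line through (t, f(t)) and (t + dt, f(t + dt))."""
    return (f(t + dt) - f(t)) / dt

def square(t):
    return t * t

# Finite steps standing in for the infinitesimal dt.
for dt in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    slope = secant_slope(square, Fraction(1), dt)
    # For x = t^2 the slope is exactly 2t + dt, so the leftover is dt itself.
    print(f"dt = {float(dt)}  slope = {float(slope)}  slope - 2 = {float(slope - 2)}")
```

Because the arithmetic is exact, the leftover term is seen to be exactly dt, which is the finite analog of the 2t + dt found with infinitesimals.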
Example 9
To show the power of infinitesimals and the Leibniz notation, let’s prove that the derivative of t^3 is 3t^2:
\[ \frac{dx}{dt} = \frac{(t+dt)^3 - t^3}{dt} = \frac{3t^2\,dt + 3t\,dt^2 + dt^3}{dt} = 3t^2 + \ldots \]
where the dots indicate infinitesimal terms that we can neglect.
c / A geometrical interpretation of the derivative of t^2.
This result required significant
sweat and ingenuity when proved
on page 138 by the methods of
chapter 1, and not only that
but the old method would have
required a completely different
method of proof for a function that
wasn’t a polynomial, whereas the
new one can be applied more generally, as we’ll see presently in examples 10-13.
It’s easy to get the mistaken impression that infinitesimals exist in some remote fairyland where we can never touch them. This may be true in the same artsy-fartsy sense that we can never truly understand √2, because its decimal expansion goes on forever, and we therefore can never compute it exactly. But in practical work, that doesn’t stop us from working with √2. We just approximate it as, e.g., 1.41. Infinitesimals are no more or less mysterious than irrational numbers, and in particular we can represent them concretely on a computer. If you go to
you’ll find a web-based calculator called Inf, which can handle infinite and infinitesimal numbers. It has a built-in symbol, d, which represents an infinitesimally small number such as the dx’s and dt’s we’ve been handling symbolically.
Let’s use Inf to verify that the derivative of t^3, evaluated at t = 1, is equal to 3, as found by plugging in to the result of example 9. The : symbol is the prompt that shows you Inf is ready to accept your typed input.
: ((1+d)^3-1)/d
As claimed, the result is 3, or close enough to 3 that the infinitesimal error doesn’t matter in real life. It might look like Inf did this example by using algebra to simplify the expression, but in fact Inf doesn’t know anything about algebra. One way to see this is to use Inf to compare d with various real numbers:
: d<1
: d<0.01
: d<0.0000001
: d<0
If d were just a variable being treated according to the axioms of algebra, there would be no way to tell how it compared with other numbers without having some special information. Inf doesn’t know algebra, but it does know that d is a positive number that is less than any positive real number that can be represented using decimals or scientific notation.
Example 10
In example 5 on p. 15, we made a rough numerical check to see if the differentiation rule t^k → kt^(k−1), which was proved on p. 138 for k = 1, 2, 3, . . . , was also valid for k = −1, i.e., for the function x = 1/t. Let’s look for an actual proof. To find a natural method of attack, let’s first redo the numerical check in a slightly more suggestive form. Again approximating the derivative at t = 3, we have
\[ \frac{dx}{dt} \approx \frac{\frac{1}{3.01} - \frac{1}{3}}{3.01 - 3} \]
Let’s apply the grade-school technique for subtracting fractions, in which we first get them over the same denominator:
\[ \frac{1}{3.01} - \frac{1}{3} = \frac{3 - 3.01}{3 \times 3.01} \]
The result is
\[ \frac{dx}{dt} \approx \frac{1}{0.01}\cdot\frac{-0.01}{3 \times 3.01} = -\frac{1}{3 \times 3.01} \]
Replacing 3 with t and 0.01 with dt, this becomes
\[ \frac{dx}{dt} = -\frac{1}{t(t+dt)} = -t^{-2} + \ldots \]
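To make it plausible that a program can work with d concretely, here is a toy sketch in Python. It is not the Inf calculator and makes no claim about how Inf is implemented: the class name Tiny and its methods are invented for this illustration, and it only handles polynomials in d (sums of real multiples of 1, d, d^2, and so on). Comparisons are decided by the lowest power of d on which two numbers differ, which is the sense in which d is positive yet smaller than every positive real number.

```python
from fractions import Fraction

class Tiny:
    """Toy number c0 + c1*d + c2*d^2 + ..., with d a positive infinitesimal."""

    def __init__(self, *coeffs):
        self.c = [Fraction(x) for x in coeffs] or [Fraction(0)]

    @staticmethod
    def lift(x):
        """Turn an ordinary number into a Tiny with no infinitesimal part."""
        return x if isinstance(x, Tiny) else Tiny(x)

    def __add__(self, other):
        other = Tiny.lift(other)
        n = max(len(self.c), len(other.c))
        a = self.c + [Fraction(0)] * (n - len(self.c))
        b = other.c + [Fraction(0)] * (n - len(other.c))
        return Tiny(*(x + y for x, y in zip(a, b)))

    def __mul__(self, other):
        other = Tiny.lift(other)
        out = [Fraction(0)] * (len(self.c) + len(other.c) - 1)
        for i, x in enumerate(self.c):
            for j, y in enumerate(other.c):
                out[i + j] += x * y  # d^i times d^j is d^(i+j)
        return Tiny(*out)

    def __sub__(self, other):
        return self + Tiny.lift(other) * -1

    def __pow__(self, n):
        result = Tiny(1)
        for _ in range(n):
            result = result * self
        return result

    def over_d(self):
        """Divide by d; refuse if the answer would be infinite."""
        if self.c[0] != 0:
            raise ValueError("quotient would be infinite")
        return Tiny(*self.c[1:])

    def __lt__(self, other):
        diff = Tiny.lift(other) - self
        for coeff in diff.c:  # the lowest power of d decides the sign
            if coeff != 0:
                return coeff > 0
        return False

d = Tiny(0, 1)

# The difference quotient for t^3 at t = 1, as in the Inf session.
quotient = ((Tiny(1) + d) ** 3 - 1).over_d()
print([str(c) for c in quotient.c])  # coefficients of 1, d, d^2

# d is below every positive real we try, but it is not below zero.
print(d < 1, d < Fraction(1, 10**7), d < 0, Tiny(0) < d)
```

The quotient comes out with real part 3 plus terms containing d, matching the claim that the derivative of t^3 at t = 1 is 3 up to an infinitesimal error.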
Example 11
The derivative of x = sin t, with t in units of radians, is
\[ \frac{dx}{dt} = \frac{\sin(t+dt) - \sin t}{dt} \]
and with the trig identity sin(α + β) = sin α cos β + cos α sin β, this becomes
\[ = \frac{\sin t\cos dt + \cos t\sin dt - \sin t}{dt}. \]
Applying the small-angle approximations sin u ≈ u and cos u ≈ 1, we have
\[ \frac{dx}{dt} = \frac{\cos t\,dt}{dt} + \ldots = \cos t + \ldots \]
where “. . . ” represents the error caused by the small-angle approximations.
This is essentially all there is to the computation of the derivative, except for the remaining technical point that we haven’t proved that the small-angle approximations are good enough. In example 9 on page 26, when we calculated the derivative of t^3, the resulting expression for the quotient dx/dt came out in a form in which we could inspect the “. . . ” terms and verify before discarding them that they were infinitesimal. The issue is less trivial in the present example. This point is addressed more rigorously on page 139.
Figure d shows the graphs of the function and its derivative. Note how the two graphs correspond. At t = 0, the slope of sin t is at its largest, and is positive; this is where the derivative, cos t, attains its maximum positive value of 1. At t = π/2, sin t has reached a maximum, and has a slope of zero; cos t is zero here. At t = π, in the middle of the graph, sin t has its maximum negative slope, and cos t is at its most negative extreme of −1.
Physically, sin t could represent the position of a pendulum as it moved back and forth from left to right, and cos t would then be the pendulum’s velocity.
d / Graphs of sin t, and its derivative cos t.
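The correspondence between sin t and cos t described in example 11 can be spot-checked numerically. The sketch below replaces the infinitesimal dt with a small finite step, so it gives an approximation rather than a proof; the helper name difference_quotient and the step size 1e-6 are arbitrary choices made for the illustration.

```python
import math

def difference_quotient(f, t, dt):
    """Finite stand-in for dx/dt: change in f divided by change in t."""
    return (f(t + dt) - f(t)) / dt

# The three points discussed in the text: t = 0, pi/2, and pi (radians).
for t in (0.0, math.pi / 2, math.pi):
    approx = difference_quotient(math.sin, t, 1e-6)
    print(f"t = {t:.4f}  quotient = {approx:+.6f}  cos t = {math.cos(t):+.6f}")
```

At each point the quotient agrees with cos t to about six decimal places.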
Example 12
What about the derivative of the cosine? The cosine and the sine are really the same function, shifted to the left or right by π/2. If the derivative of the sine is the same as itself, but shifted to the left by π/2, then the derivative of the cosine must be a cosine shifted to the left by π/2:
\[ \frac{d\cos t}{dt} = \cos(t + \pi/2) = -\sin t \]
e / The function x = 1/(1 − t).
The next example will require a little trickery. By the end of this chapter you’ll learn general techniques for cranking out any derivative cookbook-style, without having to come up with any tricks.
Example 13
▷ Find the derivative of 1/(1 − t), evaluated at t = 0.
▷ The graph shows what the function looks like. It blows up to infinity at t = 1, but it’s well behaved at t = 0, where it has a positive slope.
For insight, let’s calculate some points on the curve. The point at which we’re differentiating is (0, 1). If we put in a small, positive value of t, we can observe how much the result increases relative to 1, and this will give us an approximation to the derivative. For example, we find that at t = 0.001, the function has the value 1.001001001001, and so the derivative is approximately (1.001 − 1)/(.001 − 0), or about 1. We can therefore conjecture that the derivative is exactly 1, but that’s not the same as proving it.
But let’s take another look at that number 1.001001001001. It’s clearly a repeating decimal. In other words, it appears that
\[ \frac{1}{1 - 1/1000} = 1 + \frac{1}{1000} + \left(\frac{1}{1000}\right)^2 + \ldots \]
and we can easily verify this by multiplying both sides of the equation by 1 − 1/1000 and collecting like powers. This is a special case of the geometric series
\[ \frac{1}{1-t} = 1 + t + t^2 + \ldots \]
which can be derived2 by doing synthetic division (the equivalent of long division for polynomials), or simply verified, after forming the conjecture based on the numerical example above, by multiplying both sides by 1 − t.
As we’ll see in section 2.2, and have been implicitly assuming so far, infinitesimals obey all the same elementary laws of algebra as the real numbers, so the above derivation also holds for an infinitesimal value of t. We can verify the result using Inf:
: 1/(1-d)
Notice, however, that the series is truncated after the first five terms. This is similar to the truncation that happens when you ask your calculator to find √2 as a decimal.
The result for the derivative is
\[ \frac{dx}{dt} = \frac{1 + dt + dt^2 + \ldots - 1}{1 + dt - 1} = 1 + \ldots \]
2 As a technical aside, it’s not necessary for our present purposes to go into the issue of how to make the most general possible definition of what is meant by a sum like this one which has an infinite number of terms; the only fact we’ll need here is that the error in the finite sum obtained by leaving out the “. . . ” has only higher powers of t. This is taken up in more detail in ch. 7. Note that the series only gives the right answer for t < 1. E.g., for t = 1, it equals 1 + 1 + 1 + . . ., which, if it means anything, clearly means something infinite.
f / Bishop George Berkeley (1685-1753)
2.2 Safe use of infinitesimals
The idea of infinitesimally small numbers has always irked purists. One prominent critic of the calculus was Newton’s contemporary George Berkeley, the Bishop of Cloyne. Although some of his complaints are clearly wrong (he denied the possibility of the second derivative), there was clearly something to his criticism of the infinitesimals. He wrote sarcastically, “They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them ghosts of departed quantities?”
Infinitesimals seemed scary, because if you mishandled them, you could prove absurd things. For example, let du be an infinitesimal. Then 2du is also infinitesimal. Therefore both 1/du and 1/(2du) equal infinity, so 1/du = 1/(2du). Multiplying by du on both sides, we have a proof that 1 = 1/2.
In the eighteenth century, the use
of infinitesimals became like adultery: commonly practiced, but
shameful to admit to in polite circles. Those who used them learned
certain rules of thumb for handling
them correctly. For instance, they
would identify the flaw in my proof
of 1 = 1/2 as my assumption that
there was only one size of infinity,
when actually 1/du should be interpreted as an infinity twice as big
as 1/(2du). The use of the symbol ∞ played into this trap, because the use of a single symbol
for infinity implied that infinities
only came in one size. However,
the practitioners of infinitesimals
had trouble articulating a clear
set of principles for their proper
use, and couldn’t prove that a self-consistent system could be built
around them.
By the twentieth century, when I learned calculus, a clear consensus had formed that infinite and infinitesimal numbers weren’t numbers at all. A notation like dx/dt, my calculus teacher told me, wasn’t really one number divided by another, it was merely a symbol for something called a limit,
\[ \lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t} \]
where ∆x and ∆t represented finite changes. I’ll give a formal definition (actually two different formal definitions) of the term “limit” in section 3.2, but intuitively the concept is that we can get as good an approximation to the derivative as we like, provided that we make ∆t small enough.
That satisfied me until we got to a certain topic (implicit differentiation) in which we were encouraged to break the dx away from the dt, leaving them on opposite sides of the equation. I buttonholed my teacher after class and asked why he was now doing what he’d told me you couldn’t really do, and his response was that dx and dt weren’t really numbers, but most of the time you could get away with treating them as if they were, and you would get the right answer in the end. Most of the time!? That bothered me. How was I supposed to know when it wasn’t “most of the time?”
g / Abraham
But unknown to me and my teacher, mathematician Abraham Robinson had already shown in the 1960’s that it was possible to construct a self-consistent number system that included infinite and infinitesimal numbers. He called it the hyperreal number system, and it included the real numbers as a subset.3
Moreover, the rules for what you can and can’t do with the hyperreals turn out to be extremely simple. Take any true statement about the real numbers. Suppose it’s possible to translate it into a statement about the hyperreals in the most obvious way, simply by replacing the word “real” with the word “hyperreal.” Then the translated statement is also true. This is known as the transfer principle.
Let’s look back at my bogus proof of 1 = 1/2 in light of this simple principle. The final step of the proof, for example, is perfectly valid: multiplying both sides of the equation by the same thing. The following statement about the real numbers is true:
    For any real numbers a, b, and c, if a = b, then ac = bc.
This can be translated in an obvious way into a statement about the hyperreals:
    For any hyperreal numbers a, b, and c, if a = b, then ac = bc.
However, what about the statement that both 1/du and 1/(2du) equal infinity, so they’re equal to each other? This isn’t the translation of a statement that’s true about the reals, so there’s no reason to believe it’s true when applied to the hyperreals, and in fact it’s false.
What the transfer principle tells us is that the real numbers as we normally think of them are not unique in obeying the ordinary rules of algebra. There are completely different systems of numbers, such as the hyperreals, that also obey them.
How, then, are the hyperreals even different from the reals, if everything that’s true of one is true of the other? But recall that the transfer principle doesn’t guarantee that every statement about the reals is also true of the hyperreals. It only works if the statement about the reals can be translated into a statement about the hyperreals in the most simple, straightforward way imaginable, simply by replacing the word “real” with the word “hyperreal.” Here’s an example of a true statement about the reals that can’t be translated in this way:
    For any real number a, there is an integer n that is greater than a.
This one can’t be translated so simplemindedly, because it refers to a subset of the reals called the integers. It might be possible to translate it somehow, but it would require some insight into the correct way to translate that word “integer.” The transfer principle doesn’t apply to this statement, which indeed is false for the hyperreals, because the hyperreals contain infinite numbers that are greater than all the integers. In fact, the contradiction of this statement can be taken as a definition of what makes the hyperreals special, and different from the reals: we assume that there is at least one hyperreal number, H, which is greater than all the integers.
3 The main text of this book treats infinitesimals with the minimum fuss necessary in order to avoid the common goofs. More detailed discussions are often relegated to the back of the book, as in example 11 on page 28. The reader who wants to learn even more about the hyperreal system should consult the list of further reading on page 199.
As an analogy from everyday life,
consider the following statements
about the student body of the high
school I attended:
1. Every student at my high
school had two eyes and a face.
2. Every student at my high
school who was on the football
team was a jerk.
Let’s try to translate these into
statements about the population
of California in general. The student body of my high school is like
the set of real numbers, and the
present-day population of California is like the hyperreals. Statement 1 can be translated mindlessly into a statement that every Californian has two eyes and
a face; we simply substitute “every Californian” for “every student
at my high school.” But statement 2 isn’t so easy, because it
refers to the subset of students
who were on the football team,
and it’s not obvious what the corresponding subset of Californians
would be. Would it include everybody who played high school,
college, or pro football? Maybe
it shouldn’t include the pros, because they belong to an organization covering a region bigger than
California. Statement 2 is the kind
of statement that the transfer principle doesn’t apply to.4
Example 14
As a nontrivial example of how to apply the transfer principle, let’s consider how to handle expressions like the one that occurred when we wanted to differentiate t^2 using infinitesimals:
\[ \frac{d\left(t^2\right)}{dt} = 2t + dt \]
I argued earlier that 2t + dt is so close to 2t that for all practical purposes, the answer is really 2t. But is it really valid in general to say that 2t + dt is the same hyperreal number as 2t? No. We can apply the transfer principle to the following statement about the reals:
    For any real numbers a and b, with b ≠ 0, a + b ≠ a.
Since dt isn’t zero, 2t + dt ≠ 2t.
More generally, example 14 leads us to visualize every number as being surrounded by a “halo” of numbers that don’t equal it, but differ from it by only an infinitesimal amount. Just as a magnifying glass would allow you to see the fleas on a dog, you would need an infinitely strong microscope to see this halo. This is similar to the idea that every integer is surrounded by a bunch of fractions that would round off to that integer. We can define the standard part of a finite hyperreal number, which means the unique real number that differs from it infinitesimally. For instance, the standard part of 2t + dt, notated st(2t + dt), equals 2t. The derivative of a function should actually be defined as the standard part of dx/dt, but we often write dx/dt to mean the derivative, and don’t worry about the distinction.
4 For a slightly more precise and formal statement of the transfer principle, see page 141.
One of the things Bishop Berkeley disliked about infinitesimals was the idea that they existed in a kind of hierarchy, with dt^2 being not just infinitesimally small, but infinitesimally small compared to the infinitesimal dt. If dt is the flea on a dog, then dt^2 is a submicroscopic flea that lives on the flea, as in Swift’s doggerel: “Big fleas have little fleas/ On their backs to ride ’em,/ and little fleas have lesser fleas,/And so, ad infinitum.” Berkeley’s criticism was off the mark here: there is such a hierarchy. Our basic assumption about the hyperreals was that they contain at least one infinite number, H, which is bigger than all the integers. If this is true, then 1/H must be less than 1/2, less than 1/100, less than 1/1,000,000, and in general less than 1/n for any integer n. Therefore the hyperreals are guaranteed to include infinitesimals as well, and so we have at least three levels to the hierarchy: infinities comparable to H, finite numbers, and infinitesimals comparable to 1/H. If you can swallow that, then it’s not too much of a leap to add more rungs to the ladder, like extra-small infinitesimals that are comparable to 1/H^2. If this seems a little crazy, it may comfort you to think of statements about the hyperreals as descriptions of limiting processes involving real numbers. For instance, in the sequence of numbers 1.1^2 = 1.21, 1.01^2 = 1.0201, 1.001^2 = 1.002001, . . . , it’s clear that the number represented by the digit 1 in the final decimal place is getting smaller faster than the contribution due to the digit 2 in the middle.
One subtle issue here, which I avoided mentioning in the differentiation of the sine function on page 28, is whether the transfer principle is sufficient to let us define all the functions that appear as familiar keys on a calculator: x^2, √x, sin x, cos x, e^x, and so on. After all, these functions were originally defined as rules that would take a real number as an input and give a real number as an output. It’s not trivially obvious that their definitions can naturally be extended to take a hyperreal number as an input and give back a hyperreal as an output. Essentially the answer is that we can apply the transfer principle to them just as we would to statements about simple arithmetic, but I’ve discussed this a little more on page 147.
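The limiting-process picture of the hierarchy, using the sequence 1.1^2, 1.01^2, 1.001^2, . . . mentioned above, can be made concrete with exact fractions. This is a sketch with finite numbers only; the function name square_pieces is invented for the illustration.

```python
from fractions import Fraction

def square_pieces(h):
    """Split (1 + h)^2 - 1 into its first-order part 2h and second-order part h^2."""
    return 2 * h, h * h

for n in (1, 2, 3, 6):
    h = Fraction(1, 10**n)
    first, second = square_pieces(h)
    # The two pieces add up to the full change in the square.
    assert (1 + h) ** 2 - 1 == first + second
    # The ratio equals h/2, so the second-order piece shrinks faster.
    print(f"h = 10^-{n}: second/first = {float(second / first)}")
```

As h shrinks, the h^2 piece becomes negligible compared with the 2h piece, which is the finite shadow of dt^2 being infinitesimal compared to dt.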
2.3 The product rule
When I first learned calculus, it seemed to me that if the derivative of 3t was 3, and the derivative of 7t was 7, then the derivative of t multiplied by t ought to be just plain old t, not 2t. The reason there’s a factor of 2 in the correct answer is that t^2 has two reasons to grow as t gets bigger: it grows because the first factor of t is increasing, but also because the second one is. In general, it’s possible to find the derivative of the product of two functions any time we know the derivatives of the individual functions.
The product rule
If x and y are both functions of t, then the derivative of their product is
\[ \frac{d(xy)}{dt} = \frac{dx}{dt}\cdot y + x\cdot\frac{dy}{dt}. \]
The proof is easy. Changing t by an infinitesimal amount dt changes the product xy by an amount
\[ (x+dx)(y+dy) - xy = y\,dx + x\,dy + dx\,dy \]
and dividing by dt makes this into
\[ \frac{dx}{dt}\cdot y + x\cdot\frac{dy}{dt} + \frac{dx\,dy}{dt} \]
whose standard part is the result to be proved.
Example 15
▷ Find the derivative of the function t sin t.
▷
\[ \frac{d(t\sin t)}{dt} = t\cdot\frac{d(\sin t)}{dt} + \frac{dt}{dt}\cdot\sin t = t\cos t + \sin t \]
Figure h gives the geometrical interpretation of the product rule. Imagine that the king, in his castle at the southwest corner of his rectangular kingdom, sends out a line of infantry to expand his territory to the north, and a line of cavalry to take over more land to the east. In a time interval dt, the cavalry, which moves faster, covers a distance dx greater than that covered by the infantry, dy. However, the strip of territory conquered by the cavalry, ydx, isn’t as great as it could have been, because in our example y isn’t as big as x.
h / A geometrical interpretation of the product rule.
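The product rule can be spot-checked numerically on the function t sin t from example 15. The sketch below uses a small finite step in place of dt, so it is an approximate check and not a proof; the helper names and the test point t = 2 are arbitrary choices made for the illustration. It also evaluates the tempting but incorrect guess (dx/dt)(dy/dt) for comparison.

```python
import math

def derivative(f, t, dt=1e-6):
    """Central-difference estimate of df/dt (approximate, not exact)."""
    return (f(t + dt) - f(t - dt)) / (2 * dt)

def product(t):
    return t * math.sin(t)

def product_rule(t):
    # x = t and y = sin t, so dx/dt = 1 and dy/dt = cos t
    return 1 * math.sin(t) + t * math.cos(t)

def naive_guess(t):
    # the incorrect rule: just multiply the two derivatives
    return 1 * math.cos(t)

t = 2.0
print("numerical   :", derivative(product, t))
print("product rule:", product_rule(t))
print("naive guess :", naive_guess(t))
```

The numerical estimate matches t cos t + sin t closely and disagrees clearly with the naive guess.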
A helpful feature of the Leibniz
notation is that one can easily
use it to check whether the units
of an answer make sense. If we
measure distances in meters and
time in seconds, then xy has units
of square meters (area), and so
does the change in the area, d(xy).
Dividing by dt gives the number
of square meters per second being conquered. On the right-hand
side of the product rule, dx/dt
has units of meters per second
(velocity), and multiplying it by
y makes the units square meters
per second, which is consistent
with the left-hand side. The units
of the second term on the right
likewise check out. Some beginners might be tempted to guess
that the product rule would be
d(xy)/dt = (dx/dt)(dy/dt), but
the Leibniz notation instantly reveals that this can’t be the case,
because then the units on the left,
m2 /s, wouldn’t match the ones on
the right, m2 /s2 .
Because this unit-checking feature
is so helpful, there is a special way
of writing a second derivative in
the Leibniz notation. What Newton called ẍ, Leibniz wrote as
\[ \frac{d^2x}{dt^2}. \]
Although the different placement of the 2’s on top and bottom seems strange and inconsistent to many beginners, it actually works out nicely. If x is a distance, measured in meters, and t is a time, in units of seconds, then the second derivative is supposed to have units of acceleration, in units of meters per second per second, also written (m/s)/s, or m/s^2. (The acceleration of falling objects on Earth is 9.8 m/s^2 in these units.) The Leibniz notation is meant to suggest exactly this: the top of the fraction looks like it has units of meters, because we’re not squaring x, while the bottom of the fraction looks like it has units of seconds, because it looks like we’re squaring dt. Therefore the units come out right. It’s important to realize, however, that the symbol d isn’t a number (not a real one, and not a hyperreal one, either), so we can’t really square it; the notation is not to be taken as a literal statement about infinitesimals.
Example 16
A tricky use of the product rule is to find the derivative of √t. Since √t can be written as t^(1/2), we might suspect that the rule d(t^k)/dt = kt^(k−1) would work, giving a derivative (1/2)t^(−1/2) = 1/(2√t). However, the methods from ch. 1 that were used to prove that rule on p. 138 only work if k is an integer, so the best we could do would be to confirm our conjecture approximately by graphing or numerical estimation.
Using the product rule, we can write f(t) = d√t/dt for our unknown derivative, and back into the result using the product rule:
\[ \frac{dt}{dt} = \frac{d(\sqrt{t}\sqrt{t})}{dt} = f(t)\sqrt{t} + \sqrt{t}f(t) = 2f(t)\sqrt{t} \]
But dt/dt = 1, so f(t) = 1/(2√t) as claimed.
The trick used in example 16 can also be used to prove that the power rule d(x^n)/dx = nx^(n−1) applies to cases where n is an integer less than 0, but I’ll instead prove this on page 41 by a technique that doesn’t depend on a trick, and also applies to values of n that aren’t integers.
2.4 The chain rule
Figure i shows three clowns on seesaws. If the leftmost clown moves down by a distance dx, the middle one will come up by dy, but this will also cause the one on the right to move down by dz. If we want to predict how much the rightmost clown will move in response to a certain amount of motion by the leftmost one, we have
\[ \frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx}. \]
This is called the chain rule. It says that if a change in x causes y to change, and y then causes z to change, then this chain of changes has a cascading effect. Mathematically, there is no big mystery here. We simply cancel dy on the top and bottom. The only minor subtlety is that we would like to be able to be sloppy by using an expression like dy/dx to mean both the quotient of two infinitesimal numbers and a derivative, which is defined as the standard part of this quotient. This sloppiness turns out to be all right, as proved on page 149.
Example 17
▷ Jane hikes 3 kilometers in an hour, and hiking burns 70 calories per kilometer. At what rate does she burn calories?
▷ We let x be the number of hours she’s spent hiking so far, y the distance covered, and z the calories spent. Then
\[ \frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \left(\frac{70\ \text{cal}}{1\ \text{km}}\right)\left(\frac{3\ \text{km}}{1\ \text{hr}}\right) = 210\ \text{cal/hr} \]
Example 18
. Figure j shows a piece of farm equipment containing a train of gears with 13, 21, and 42 teeth. If the smallest gear is driven by a motor, relate the rate of rotation of the biggest gear to the rate of rotation of the motor.
. Let x, y, and z be the angular positions of the three gears. Then by the chain rule,
$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \frac{13}{21}\cdot\frac{21}{42} = \frac{13}{42}$$
i / Three clowns on seesaws demonstrate the chain rule.
j / Example 18.
The chain rule lets us find the derivative of a function that has been built out of one function stuck inside another.
Example 19
. Find the derivative of the function $z(x) = \sin(x^2)$.
. Let $y(x) = x^2$, so that $z(x) = \sin(y(x))$. Then
$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \cos(y)\cdot 2x = 2x\cos(x^2)$$
The way people usually say it is that the chain rule tells you to take the derivative of the outside function, the sine in this case, and then multiply by the derivative of “the inside stuff,” which here is the square. Once you get used to doing it, you don’t need to invent a third, intermediate variable, as we did here with y.
Example 20
Let’s express the chain rule without the use of the Leibniz notation. Let the function f be defined by $f(x) = g(h(x))$. Then the derivative of f is given by
$$f'(x) = g'(h(x)) \cdot h'(x).$$
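As a rough numerical cross-check of examples 19 and 20, the following Python sketch compares the chain-rule formula $g'(h(x)) \cdot h'(x)$ with a finite-difference slope for the case $f(x) = \sin(x^2)$. The function names, the symmetric difference formula, and the step size are assumptions chosen for the illustration; they do not come from the text.

```python
import math

def h(x):
    """Inside function: h(x) = x^2."""
    return x * x

def h_prime(x):
    return 2.0 * x

def g(y):
    """Outside function: g(y) = sin y."""
    return math.sin(y)

def g_prime(y):
    return math.cos(y)

def f(x):
    """Composite function f(x) = g(h(x)) = sin(x^2)."""
    return g(h(x))

def chain_rule_derivative(x):
    """Derivative of the outside, evaluated at the inside, times derivative of the inside."""
    return g_prime(h(x)) * h_prime(x)

def finite_difference(func, x, dx=1e-6):
    """Symmetric slope estimate with a small finite dx (an assumed step size)."""
    return (func(x + dx) - func(x - dx)) / (2.0 * dx)

for x in (0.0, 0.5, 1.3, 2.0):
    print(x, chain_rule_derivative(x), finite_difference(f, x))
```

The two columns should agree closely; dropping the factor `h_prime(x)` makes them disagree, which is a quick way to catch the most common mistake in applying the rule.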
Example 21
. We’ve already proved that the derivative of $t^k$ is $kt^{k-1}$ for k = −1 (example 10 on p. 27) and for k = 1, 2, 3, . . . (p. 138). Use these facts to extend the rule to all integer values of k.
. For k < 0, the function $x = t^k$ can be written as $x = (t^{-1})^{-k}$, where −k is positive. Applying the chain rule, we find $dx/dt = (-k)(t^{-1})^{-k-1}(-t^{-2}) = kt^{k-1}$.
2.5 Exponentials and logarithms
The exponential
The exponential function $e^x$, where e = 2.71828 . . . is the base of natural logarithms, comes up constantly in applications as diverse as credit-card interest, the growth of animal populations, and electric circuits. For its derivative we have
$$\frac{de^x}{dx} = \frac{e^{x+dx} - e^x}{dx} = \frac{e^x e^{dx} - e^x}{dx} = e^x\,\frac{e^{dx} - 1}{dx}$$
The second factor, $(e^{dx} - 1)/dx$, doesn’t have x in it, so it must just be a constant. Therefore we know that the derivative of $e^x$ is simply $e^x$, multiplied by some unknown constant,
$$\frac{de^x}{dx} = c\,e^x.$$
A rough check by graphing at, say x = 0, shows that the slope is close to 1, so c is close to 1. Numerical calculation also shows that, for example, $(e^{0.001} - 1)/0.001 = 1.00050016670838$ is very close to 1. But how do we know it’s exactly one when dx is really infinitesimal? We can use Inf:
: [exp(d)-1]/d
(The ... indicates where I’ve snipped some higher-order terms out of the output.) It seems clear that c is equal to 1 except for negligible terms involving higher powers of dx. A rigorous proof is given on page 149.
Example 22
. The concentration of a foreign substance in the bloodstream generally falls off exponentially with time as $c = c_o e^{-t/a}$, where $c_o$ is the initial concentration, and a is a constant. For caffeine in adults, a is typically about 7 hours. An example is shown in figure k. Differentiate the concentration with respect to time, and interpret the result. Check that the units of the result make sense.
. Using the chain rule,
$$\frac{dc}{dt} = c_o e^{-t/a}\cdot\left(-\frac{1}{a}\right) = -\frac{c_o}{a}e^{-t/a}$$
This can be interpreted as the rate at which caffeine is being removed from the blood and put into the person’s urine. It’s negative because the concentration is decreasing. According to the original expression for c, a substance with a large a will take a long time to reduce its concentration, since t/a won’t be very big unless we have large t on top to compensate for the large a on the bottom. In other words, larger values of a represent substances that the body has a harder time getting rid of efficiently. The derivative has a on the bottom, and the interpretation of this is that for a drug that is hard to eliminate, the rate at which it is removed from the blood is low.
It makes sense that a has units of time, because the exponential function has to have a unitless argument, so the units of t/a have to cancel out. The units of the result come from the factor of $c_o/a$, and it makes sense that the units are concentration divided by time, because the result represents the rate at which the concentration is changing.
k / Example 22. A typical graph of the concentration of caffeine in the blood, in units of milligrams per liter, as a function of time, in hours.
Example 23
. Find the derivative of the function $y = 10^x$.
. In general, one of the tricks to doing calculus is to rewrite functions in forms that you know how to handle. This one can be rewritten as a base-e exponential:
$$y = 10^x$$
$$\ln y = \ln 10^x$$
$$\ln y = x \ln 10$$
$$y = e^{x \ln 10}$$
Applying the chain rule, we have the derivative of the exponential, which is just the same exponential, multiplied by the derivative of the inside stuff:
$$\frac{dy}{dx} = e^{x \ln 10}\cdot\ln 10$$
In other words, the “c” referred to in the discussion of the derivative of $e^x$ becomes c = ln 10 in the case of the base-10 exponential.
The logarithm
The natural logarithm is the function that undoes the exponential. In a situation like this, we have
$$\frac{dy}{dx} = \frac{1}{dx/dy}$$
where on the left we’re thinking of y as a function of x, and on the right we consider x to be a function of y. Applying this to the natural logarithm,
$$y = \ln x$$
$$x = e^y$$
$$\frac{dx}{dy} = e^y$$
$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}$$
$$\frac{d \ln x}{dx} = \frac{1}{x}$$
This is noteworthy because it shows that there must be an exception to the rule that the derivative of $x^n$ is $nx^{n-1}$, and the integral of $x^{n-1}$ is $x^n/n$. (On page 37 I remarked that this rule could be proved using the product rule for negative integer values of k, but that I would give a simpler, less tricky, and more general proof later. The proof is example 24 below.) The integral of $x^{-1}$ is not $x^0/0$, which wouldn’t make sense anyway because it involves division by zero.5 Likewise the derivative of $x^0 = 1$ is $0x^{-1}$, which is zero. Figure l shows the idea. The functions $x^n$ form a kind of ladder, with differentiation taking us down one rung, and integration taking us up. However, there are two special cases where differentiation takes us off the ladder entirely.
l / Differentiation and integration of functions of the form $x^n$. Constants out in front of the functions are not shown, so keep in mind that, for example, the derivative of $x^2$ isn’t x, it’s 2x.
5 Speaking casually, one can say that division by zero gives infinity. This is often a good way to think when trying to connect mathematics to reality. However, it doesn’t really work that way according to our rigorous treatment of the hyperreals. Consider this statement: “For a nonzero real number a, there is no real number b such that a = 0b.” This means that we can’t divide a by 0 and get b. Applying the transfer principle to this statement, we see that the same is true for the hyperreals: division by zero is undefined. However, we can divide a finite number by an infinitesimal, and get an infinite result, which is almost the same thing.
Example 24
. Prove $d(x^n)/dx = nx^{n-1}$ for any real value of n, not just an integer.
.
$$y = x^n = e^{n \ln x}$$
By the chain rule,
$$\frac{dy}{dx} = e^{n \ln x}\cdot\frac{n}{x} = x^n\cdot\frac{n}{x} = nx^{n-1}$$
(For n = 0, the result is zero.)
When I started the discussion of the derivative of the logarithm, I wrote y = ln x right off the bat. That meant I was implicitly assuming x was positive. More generally, the derivative of ln |x| equals 1/x, regardless of the sign (see problem 29 on page 50).
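The two results just stated, the power rule for non-integer n and the derivative of ln |x|, can be spot-checked numerically. The Python sketch below is only an illustration under assumed choices: the exponent n = π, the sample points, the step size, and the helper name `central_difference` are not from the text.

```python
import math

def central_difference(f, x, dx=1e-6):
    """Symmetric finite-difference estimate of f'(x); dx is an assumed step size."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)

# Power rule with a non-integer exponent (pi is an arbitrary choice).
n = math.pi
x0 = 2.0

def power(x):
    return x ** n

power_estimate = central_difference(power, x0)
power_formula = n * x0 ** (n - 1.0)
print(power_estimate, power_formula)

# Derivative of ln|x| compared with 1/x on both sides of zero.
def log_abs(x):
    return math.log(abs(x))

for point in (-3.0, -0.5, 0.5, 3.0):
    print(point, central_difference(log_abs, point), 1.0 / point)
```

For negative sample points the estimate comes out negative, matching 1/x rather than 1/|x|.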
2.6 Quotients
So far we’ve been successful with
a divide-and-conquer approach to
differentiation: the product rule
and the chain rule offer methods of breaking a function down
into simpler parts, and finding the
derivative of the whole thing based
on knowledge of the derivatives of
the parts. We know how to find
the derivatives of sums, differences,
and products, so the obvious next
step is to look for a way of handling
division. This is straightforward,
since we know that the derivative of the function $1/u = u^{-1}$ is $-u^{-2}$. Let u and v be functions of x. Then by the product rule,
$$\frac{d(v/u)}{dx} = \frac{dv}{dx}\cdot\frac{1}{u} + v\cdot\frac{d(1/u)}{dx}$$
and by the chain rule,
$$\frac{d(v/u)}{dx} = \frac{dv}{dx}\cdot\frac{1}{u} - v\cdot\frac{1}{u^2}\frac{du}{dx}$$
This is so easy to rederive on demand that I suggest not memorizing it.
By the way, notice how the notation becomes a little awkward when we want to write a derivative like d(v/u)/dx. When we’re differentiating a complicated function, it can be uncomfortable trying to cram the expression into the top of the d . . . /d . . . fraction. Therefore it would be more common to write such an expression like this:
$$\frac{d}{dx}\left(\frac{v}{u}\right)$$
This could be considered an abuse of notation, making d look like a number being divided by another number dx, when actually d is meaningless on its own. On the other hand, we can consider the symbol d/dx to represent the operation of differentiation with respect to x; such an interpretation will seem more natural to those who have been inculcated with the taboo against considering infinitesimals as numbers in the first place. Using the new notation, the quotient rule becomes
$$\frac{d}{dx}\left(\frac{v}{u}\right) = \frac{1}{u}\cdot\frac{dv}{dx} - \frac{v}{u^2}\cdot\frac{du}{dx}$$
The interpretation of the minus sign is that if u increases, v/u decreases.
Example 25
. Differentiate y = x/(1 + 3x), and check that the result makes sense.
. We identify v with x and u with 1 + 3x. The result is
$$\frac{d}{dx}\left(\frac{v}{u}\right) = \frac{1}{u}\cdot\frac{dv}{dx} - \frac{v}{u^2}\cdot\frac{du}{dx} = \frac{1}{1 + 3x} - \frac{3x}{(1 + 3x)^2}$$
One way to check that the result
makes sense is to consider extreme
values of x. For very large values of x,
the 1 on the bottom of x/(1 + 3x) becomes negligible compared to the 3x,
and the function y approaches x/3x =
1/3 as a limit. Therefore we expect
that the derivative dy /dx should approach zero, since the derivative of
a constant is zero. It works: plugging in bigger and bigger numbers for
x in the expression for the derivative
does give smaller and smaller results.
(In the second term, the denominator
gets bigger faster than the numerator,
because it has a square in it.)
Another way to check the result is to
verify that the units work out. Suppose arbitrarily that x has units of gallons. (If the 3 on the bottom is unitless,
then the 1 would have to represent 1
gallon, since you can’t add things that
have different units.) The function y is
defined by an expression with units of
gallons divided by gallons, so y is unitless. Therefore the derivative dy /dx
should have units of inverse gallons.
Both terms in the expression for the
derivative do have those units, so the
units of the answer check out.
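The large-x check described in example 25 can also be carried out numerically. The sketch below is a hedged illustration in Python: the function names, the sample values of x, and the finite-difference step are assumptions, and the derivative formula is the one obtained from the quotient rule with v = x and u = 1 + 3x.

```python
def y(x):
    """The function from example 25: x/(1 + 3x)."""
    return x / (1.0 + 3.0 * x)

def dy_dx(x):
    """Quotient-rule result: (1/u) dv/dx - (v/u^2) du/dx with v = x, u = 1 + 3x."""
    u = 1.0 + 3.0 * x
    return 1.0 / u - 3.0 * x / u ** 2

def central_difference(f, x, dx=1e-6):
    """Symmetric finite-difference slope; dx is an assumed step size."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)

# For bigger and bigger x, y should approach 1/3 and dy/dx should approach 0.
for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, y(x), dy_dx(x), central_difference(y, x))
```

The printed derivative values shrink steadily while y levels off near 1/3, which is the behavior the example argues for.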
2.7 Differentiation on a computer
In this chapter you’ve learned a set of rules for evaluating derivatives: derivatives of products, quotients, functions inside other functions, etc. Because these rules exist, it’s always possible to find a formula for a function’s derivative, given the formula for the original function. Not only that, but there is no real creativity required, so a computer can be programmed to do all the drudgery. For example, you can download a free, open-source program called Yacas and install it on a Windows or Linux machine. There is even a version you can run in a web browser without installing any special software: yacasconsole.html .
A typical session with Yacas looks like this:
Example 26
D(x) x^2
D(x) Exp(x^2)
D(x) Sin(Cos(Sin(x)))
Upright type represents your input, and italicized type is the program’s output.
First I asked it to differentiate $x^2$ with respect to x, and it told me the result was 2x. Then I did the derivative of $e^{x^2}$, which I also could have done fairly easily by hand. (If you’re trying this out on a computer as you read along, make sure to capitalize functions like Exp, Sin, and Cos.) Finally I tried an example where I didn’t know the answer off the top of my head, and that would have been a little tedious to calculate by hand.
Unfortunately things are a little less rosy in the world of integrals. There are a few rules that can help you do integrals, e.g., that the integral of a sum equals the sum of the integrals, but the rules don’t cover all the possible cases. Using Yacas to evaluate the integrals of the same functions, here’s what happens.6
Example 27
Integrate(x) x^2
Integrate(x) Exp(x^2)
The first one works fine, and I can easily verify that the answer is correct, by taking the derivative of $x^3/3$, which is $x^2$. (The answer could have been $x^3/3 + 7$, or $x^3/3 + c$, where c was any constant, but Yacas doesn’t bother to tell us that.) The second and third ones don’t work, however; Yacas just spits back the input at us without making any progress on it. And it may not be because Yacas isn’t smart enough to figure out these integrals. The function $e^{x^2}$ can’t be integrated at all in terms of a formula containing ordinary operations and functions such as addition, multiplication, exponentiation, trig functions, exponentials, and so on.
6 If you’re trying these on your own
computer, note that the long input line
for the function sin cos sin x shouldn’t be
broken up into two lines as shown in the
That’s not to say that a program
like this is useless. For example,
here’s an integral that I wouldn’t
have known how to do, but that
Yacas handles easily:
Example 28
Integrate(x) Sin(Ln(x))
This one is easy to check by differentiating, but I could have been
marooned on a desert island for a
decade before I could have figured
it out in the first place. There are
various rules, then, for integration,
but they don’t cover all possible
cases as the rules for differentiation
do, and sometimes it isn’t obvious
which rule to apply. Yacas’s ability
to integrate sin ln x shows that it
had a rule in its bag of tricks that
I don’t know, or didn’t remember,
or didn’t realize applied to this integral.
Back in the 17th century, when
Newton and Leibniz invented calculus, there were no computers, so
it was a big deal to be able to find
a simple formula for your result.
Nowadays, however, it may not be
such a big deal. Suppose I want to
find the derivative of sin cos sin x,
evaluated at x = 1. I can do something like this on a calculator:
Example 29
sin cos sin 1 =
sin cos sin 1.0001 =
/.0001 =
I have the right answer, with plenty of precision for most realistic applications, although I might have never guessed that the mysterious number −0.3167 was actually $-(\cos 1)(\sin \sin 1)(\cos \cos \sin 1)$. This could get a little tedious if I wanted to graph the function, for instance, but then I could just use a computer spreadsheet, or write a little computer program. In this chapter, I’m going to show you how to do derivatives and integrals using simple computer programs, using Yacas. The following little Yacas program does the same thing as the set of calculator operations shown above:
Example 30
N( (f(x+dx)-f(x))/dx )
(I’ve omitted all of Yacas’s output except for the final result.) Line 1 defines the function we want to differentiate. Lines 2 and 3 give values to the variables x and dx. Line 4 computes the derivative; the N( ) surrounding the whole thing is our way of telling Yacas that we want an approximate numerical result, rather than an exact symbolic one.
An interesting thing to try now is to make dx smaller and smaller, and see if we get better and better accuracy in our approximation to the derivative.
Example 31
N( (f(x+dx)-f(x))/dx )
Line 5 defines the derivative function. It needs to know both x and dx. Line 6 computes the derivative using dx = 0.1, which we expect to be a lousy approximation, since dx is really supposed to be infinitesimal, and 0.1 isn’t even that small. Line 7 does it with the same value of dx we used earlier. The two results agree exactly in the first decimal place, and approximately in the second, so we can be pretty sure that the derivative is −0.32 to two figures of precision. Line 8 ups the ante, and produces a result that looks accurate to at least 3 decimal places. Line 9 attempts to produce fantastic precision by using an extremely small value of dx. Oops: the result isn’t better, it’s worse! What’s happened here is that Yacas computed f(x) and f(x + dx), but they were the same to within the precision it was using, so f(x + dx) − f(x) rounded off to zero.7
7 Yacas can do arithmetic to any precision you like, although you may run into practical limits due to the amount of memory your computer has and the speed of its CPU. For fun, try N(Pi,1000), which tells Yacas to compute π numerically to 1000 decimal places.
Example 31 demonstrates the concept of how a derivative can be defined in terms of a limit:
$$\frac{dy}{dx} = \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x}$$
The idea of the limit is that we can theoretically make ∆y/∆x approach as close as we like to dy/dx, provided we make ∆x sufficiently small. In reality, of course, we eventually run into the limits of our ability to do the computation, as in the bogus result generated on line 9 of the example.
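The same experiment can be sketched in Python rather than Yacas. This is an illustration under stated assumptions, not a transcription of the listing in example 31: the step sizes below are chosen for the sketch, and Python's ordinary floating-point numbers (roughly 16 significant digits) stand in for whatever precision Yacas was using.

```python
import math

def f(x):
    """The function from the text: sin(cos(sin x))."""
    return math.sin(math.cos(math.sin(x)))

def slope(x, dx):
    """Finite-difference estimate (f(x+dx) - f(x))/dx of the derivative at x."""
    return (f(x + dx) - f(x)) / dx

# Exact value from the chain rule, as quoted in the text:
# -(cos 1)(sin sin 1)(cos cos sin 1).
exact = -math.cos(1.0) * math.sin(math.sin(1.0)) * math.cos(math.cos(math.sin(1.0)))

# Illustrative step sizes (assumed, not taken from the Yacas listing).
for dx in (1e-1, 1e-4, 1e-7, 1e-17):
    print(dx, slope(1.0, dx), exact)
```

With the smallest step, 1 + dx is indistinguishable from 1 in floating point, so the numerator is zero and the estimate collapses, mirroring the bogus result described for line 9.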
Carry out a calculation like
the one in example 9 on page 26
to show that the derivative of t4
equals 4t3 .
. Solution, p. 171
Example 12 on page 29 gave
a tricky argument to show that the
derivative of cos t is − sin t. Prove
the same result using the method
of example 11 instead.
. Solution, p. 172
Suppose H is a big number. Experiment on a calculator to figure out whether $\sqrt{H+1} - \sqrt{H-1}$ comes out big, normal, or tiny. Try making H bigger and bigger, and see if you observe a trend. Based on these numerical examples, form a conjecture about what happens to this expression when H is infinite.
. Solution, p. 172
Suppose dx is a small but finite number. Experiment on a calculator to figure out how $\sqrt{dx}$ compares in size to dx. Try making dx smaller and smaller, and see if you observe a trend. Based on these numerical examples, form a conjecture about what happens to this expression when dx is infinitesimal.
. Solution, p. 172
To which of the following statements can the transfer principle be applied? If you think it can’t be applied to a certain statement, try to prove that the statement is false for the hyperreals, e.g., by giving a counterexample.
(a) For any real numbers x and y, x + y = y + x.
(b) The sine of any real number is between −1 and 1.
(c) For any real number x, there exists another real number y that is greater than x.
(d) For any real numbers x ≠ y, there exists another real number z such that x < z < y.
(e) For any real numbers x ≠ y, there exists a rational number z such that x < z < y. (A rational number is one that can be expressed as an integer divided by another integer.)
(f) For any real numbers x, y, and z, (x + y) + z = x + (y + z).
(g) For any real numbers x and y, either x < y or x = y or x > y.
(h) For any real number x, x + 1 ≠ x.
. Solution, p. 173
If we want to pump air or water through a pipe, common sense tells us that it will be easier to move a larger quantity more quickly through a fatter pipe. Quantitatively, we can define the resistance, R, which is the ratio of the pressure difference produced by the pump to the rate of flow. A fatter pipe will have a lower resistance. Two pipes can be used in parallel, for instance when you turn on the water both in the kitchen and in the bathroom, and in this situation, the two pipes let more water flow than either would have let flow by itself, which tells us that they act like a single pipe with some lower resistance. The equation for their combined resistance is $R = 1/(1/R_1 + 1/R_2)$. Analyze the case where one resistance is finite, and the other infinite, and give a physical interpretation. Likewise, discuss the case where one is finite, but the other is infinitesimal.
. Solution, p. 173
the top down, i.e., e(e ) , not (ee )x .)
. Solution, p. 174
Differentiate a sin(bx + c)
with respect to x.
. Solution, p. 174
Let x = tp/q , where p and
q are positive integers. By a technique similar to the one in example 21 on p. 38, prove that the dif7
Naively, we would imagine ferentiation rule for tk holds when
that if a spaceship traveling at u = k = p/q.qwe
. Solution, p. ??
3/4 of the speed of light was to
Find a function whose
shoot a missile in the forward di- 13
with respect to x equals
rection at v = 3/4 of the speed
That is, find an inteof light (relative to the ship), then
the missile would be traveling at
Solution, p. 174
u + v = 3/2 of the speed of light.
However, Einstein’s theory of rela- 14
Use the chain rule to differtivity tells us that this is too good entiate ((x2 )2 )2 , and show that you
to be true, because nothing can go get the same result you would have
faster than light. In fact, the rela- obtained by differentiating x8 .
tivistic equation for combining ve. Solution, p. 174 [M. Livshits]
locities in this way is not u+v, but
The range of a gun, when
rather (u + v)/(1 + uv). In ordi- 15
to an angle θ, is given by
nary, everyday life, we never travel
at speeds anywhere near the speed
2v 2
sin θ cos θ
of light. Show that the nonrelag
tivistic result is recovered in the
case where both u and v are in- Find the angle that will produce
the maximum range.
. Solution, p. 173
. Solution, p. 175
Differentiate (2x + 3)100 with
16 Differentiate
sin cos tan x
respect to x. . Solution, p. 173
with respect to x.
Differentiate (x + 1)100 (x +
17 The hyperbolic cosine func200
with respect to x.
tion is defined by
. Solution, p. 174
ex + e−x
Differentiate the following
cosh x =
with respect to x: e7x , ee . (In
the latter expression, as in all ex- Find any minima and maxima of
ponentials nested inside exponen- this function.
tials, the evaluation proceeds from
. Solution, p. 175
Show that the function
sin(sin(sin x)) has maxima and
minima at all the same places
where sin x does, and at no other
. Solution, p. 175
simplify the writing, start by defining some other
psymbol to stand for
the constant g/A.
(b) Show that your answer can be
reexpressed in terms of the function tanh defined by tanh x = (ex −
19 Let f (x) = |x|+x and g(x) = −x
e )/(ex + e−x ).
x|x| + x. Find the derivatives of
(c) Show that your result for the
these functions at x = 0 in terms
velocity approaches a constant for
of (a) slopes of tangent lines and
large values of t.
(b) infinitesimals.
(d) Check that your answers to
. Solution, p. 176
parts b and c have units of velocity.
. Solution, p. 177
In free fall, the acceleration
will not be exactly constant, due 21
Differentiate tan θ with reto air resistance. For example, a spect to θ.
. Solution, p. 177
skydiver does not speed up indefi√
Differentiate 3 x with renitely until opening her chute, but 22
. Solution, p. 177
rather approaches a certain maxi- spect to x.
mum velocity at which the upward
Differentiate the following
force of air resistance cancels out
with respect
√ to x:
the force of gravity. The expres(a) y = √x2 + 1
sion for the distance dropped by of
(b) y = x
√ +a
a free-falling object, with air resis(c) y = 1/ √a + x
tance, is8
(d) y = a/ a − x2
$$d = A \ln\left[\cosh\left(\sqrt{\frac{g}{A}}\,t\right)\right]$$
. Solution, p. 177 [Thompson, 1919]
Differentiate ln(2t + 1) with
where g is the acceleration the ob- respect to t.
. Solution, p. 178
ject would have without air resisIf you know the derivative of
tance, the function cosh has been 25
not necessary to use the
defined in problem 17, and A is a
in order to differenticonstant that depends on the size,
shape, and mass of the object, and ate 3 sin x, but show that using the
the density of the air. (For a sphere product rule gives the right result
. Solution, p. 178
of mass m and diameter d dropping anyway.
in air, A = 4.11m/d2 . Cf. problem 26
The Γ function (capital
10, p. 113.)
Greek letter gamma) is a contin(a) Differentiate this expression to uous mathematical function that
find the velocity. Hint: In order to has the property Γ(n) = 1 · 2 ·
8 Jan Benacka and Igor Stubna, The
Physics Teacher, 43 (2005) 432.
. . . · (n − 1) for n an integer. Γ(x)
is also well defined for values of x
that are not integers,
e.g., Γ(1/2)
happens to be π. Use computer
software that is capable of evaluating the Γ function to determine
numerically the derivative of Γ(x)
with respect to x, at x = 2. (In Yacas, the function is called Gamma.)
. Solution, p. 178
An even function is one with
the property f (−x) = f (x). For
example, cos x is an even function, and xn is an even function
if n is even. An odd function has
f (−x) = −f (x). Prove that the
derivative of an even function is
. Solution, p. 179
For a cylinder of fixed
surface area, what proportion of
length to radius will give the maximum volume?
. Solution, p. 178
31 Suppose we have a list of
numbers x1 , . . . xn , and we wish to
find some number q that is as close
as possible to as many of the xi as
possible. To make this a mathematically precise goal, we need to
define some numerical measure of
this closeness. Suppose we let $h = (x_1 - q)^2 + \ldots + (x_n - q)^2$, which can also be notated using Σ, uppercase Greek sigma, as $h = \sum_{i=1}^{n}(x_i - q)^2$. Then minimizing h can be used as a definition of optimal closeness. (Why would we not want to use $h = \sum_{i=1}^{n}(x_i - q)$?) Prove that the value of q that minimizes h is the average of the $x_i$.
This problem is a variation on problem 11 on page 21.
Einstein found that the equation
K = (1/2)mv 2 for kinetic energy
was only a good approximation for
speeds much less than the speed of
light, c. At speeds comparable to
the speed of light, the correct equation is
$$K = \frac{mv^2}{2\sqrt{1 - v^2/c^2}}$$
(a) As in the earlier, simpler problem, find the power dK/dt for
an object accelerating at a steady
rate, with v = at.
(b) Check that your answer has the
right units.
(c) Verify that the power required
becomes infinite in the limit as v
approaches c, the speed of light.
This means that no material object can go as fast as the speed of
. Solution, p. 179
Use a trick similar to the one
used in example 16 to prove that
the power rule d(xk )/dx = kxk−1
applies to cases where k is an integer less than 0.
. Solution, p. 180 ?
The plane of Euclidean geometry is today often described
as the set of all coordinate pairs
(x, y), where x and y are real. We
could instead imagine the plane F
Prove, as claimed on page that is defined in the same way, but
42, that the derivative of ln |x| with x and y taken from the set
equals 1/x, for both positive and of hyperreal numbers. As a third
negative x.
. Solution, p. 179
alternative, there is the plane G
in which the finite hyperreals are
used. In E, Euclid’s parallel postulate holds: given a line and a point
not on the line, there exists exactly one line passing through the
point that does not intersect the
line. Does the parallel postulate
hold in F? In G? Is it valid to associate only E with the plane described by Euclid’s axioms?
. Solution, p. 180 ?
Discuss the following statement:
The repeating decimal
0.999 . . . is infinitesimally less than
. Solution, p. 180
Example 20 on page 38 expressed the chain rule without the
Leibniz notation, writing a function f defined by f (x) = g(h(x)).
Suppose that you’re trying to remember the rule, and two of the
possibilities that come to mind are
$f'(x) = g'(h(x))$ and $f'(x) = g'(h(x))h(x)$. Show that neither
of these can possibly be right, by
considering the case where x has
units. You may find it helpful to
convert both expressions back into
the Leibniz notation.
. Solution, p. 181
When you tune in a radio
station using an old-fashioned rotating dial you don’t have to be
exactly tuned in to the right frequency in order to get the station.
If you did, the tuning would be infinitely sensitive, and you’d never
be able to receive any signal at all!
Instead, the tuning has a certain
amount of “slop” intentionally de-
signed into it. The strength of the
received signal s can be expressed
in terms of the dial’s setting f by
a function of the form
$$s = \frac{1}{\sqrt{a(f^2 - f_o^2)^2 + bf^2}}$$
where a, b, and fo are constants.
This functional form is in fact
very general, and is encountered in
many other physical contexts. The
graph below shows the resulting
bell-shaped curve. Find the frequency f at which the maximum
response occurs, and show that if b
is small, the maximum occurs close
to, but not exactly at, fo .
. Solution, p. 181
The function of problem
36, with a = 3, b = 1, and
fo = 1.
In a movie theater, the
image on the screen is formed by
a lens in the projector, and originates from one of the frames on
the strip of celluloid film (or, in the
newer digital projection systems,
from a liquid crystal chip). Let the
Problem 37. A set of light rays is emitted from the tip of the glamorous movie
star’s nose on the film, and reunited to form a spot on the screen which is the
image of the same point on his nose. The distances have been distorted for
clarity. The distance y represents the entire length of the theater from front to
distance from the film to the lens
be x, and let the distance from the
lens to the screen be y. The projectionist needs to adjust x so that
it is properly matched with y, or
else the image will be out of focus.
There is therefore a fixed relationship between x and y, and this relationship is of the form
$$\frac{1}{x} + \frac{1}{y} = \frac{1}{f}$$
where f is a property of the lens,
called its focal length. A stronger
lens has a shorter focal length.
Since the theater is large, and the
projector is relatively small, x is
much less than y. We can see
from the equation that if y is sufficiently large, the left-hand side of
the equation is dominated by the
1/x term, and we have x ≈ f .
Since the 1/y term doesn’t completely vanish, we must have x
slightly greater than f , so that the
1/x term is slightly less than 1/f .
Let x = f + dx, and approximate
dx as being infinitesimally small.
Find a simple expression for y in
terms of f and dx.
. Solution, p. 182
Why might the expression
$1^\infty$ be considered an indeterminate form?
. Solution, p. 183
3 Limits and continuity
3.1 Continuity
Intuitively, a continuous function is one whose graph has no sudden jumps in it; the graph is all a single connected piece. Formally, a function f(x) is defined to be continuous if for any real x and any infinitesimal dx, f(x + dx) − f(x) is infinitesimal.
Example 32
Let the function f be defined by f(x) = 0 for x ≤ 0, and f(x) = 1 for x > 0. Then f(x) is discontinuous, since for dx > 0, f(0 + dx) − f(0) = 1, which isn’t infinitesimal.
a / Example 32. The black dot indicates that the endpoint of the lower ray is part of the ray, while the white one shows the contrary for the ray on the top.
If a function is discontinuous at a given point, then it is not differentiable at that point. On the other hand, the example y = |x| shows that a function can be continuous without being differentiable.
In most cases, there is no need to invoke the definition explicitly in order to check whether a function is continuous. Most of the functions we work with are defined by putting together simpler functions as building blocks. For example, let’s say we’re already convinced that the functions defined by g(x) = 3x and h(x) = sin x are both continuous. Then if we encounter the function f(x) = sin(3x), we can tell that it’s continuous because its definition corresponds to f(x) = h(g(x)). The functions g and h have been set up like a bucket brigade, so that g takes the input, calculates the output, and then hands it off to h for the final step of the calculation. This method of combining functions is called composition. The composition of two continuous functions is also continuous. Just watch out for division. The function f(x) = 1/x is continuous everywhere except at x = 0, so for example 1/sin(x) is continuous everywhere except at multiples of π, where the sine has zeroes.
The intermediate value theorem
Another way of thinking about continuous functions is given by the intermediate value theorem.
Intuitively, it says that if you are
moving continuously along a road,
and you get from point A to point
B, then you must also visit every
other point along the road; only by
teleporting (by moving discontinuously) could you avoid doing so.
More formally, the theorem states
that if y is a continuous real-valued
function on the real interval from a
to b, and if y takes on values y1 and
y2 at certain points within this interval, then for any y3 between y1
and y2 , there is some real x in the
interval for which y(x) = y3 .
b / The intermediate value theorem states that if the function is continuous, it must pass through y3.
The intermediate value theorem seems so intuitively appealing that if we want to set out to prove it, we may feel as though we’re being asked to prove a proposition such as, “a number greater than 10 exists.” If a friend wanted to bet you a six-pack that you couldn’t prove this with complete mathematical rigor, you would have to get your friend to spell out very explicitly what she thought were the facts about integers that you were allowed to start with as initial assumptions. Are you allowed to assume that 1 exists? Will she grant you that if a number n exists, so does n + 1? The intermediate value theorem is similar. It’s stated as a theorem about certain types of functions, but its truth isn’t so much a matter of the properties of functions as the properties of the underlying number system. For the reader with an interest in pure mathematics, I’ve discussed this in more detail on page 154 and given an abbreviated proof. (Most introductory calculus texts do not prove it at all.)
Example 33
. Show that there is a solution to the equation 10^x + x = 1000.
. We expect there to be a solution near x = 3, where the function f(x) = 10^x + x = 1003 is just a little too big. On the other hand, f(2) = 102 is much too small. Since f has values above and below 1000 on the interval from 2 to 3, and f is continuous, the intermediate value theorem proves that a solution exists between 2 and 3. If we wanted to find a better numerical approximation to the solution, we could do it using Newton’s method, which is introduced in section 5.1.
Example 34
. Show that there is at least one solution to the equation cos x = x, and give bounds on its location.
. This is a transcendental equation,
and no amount of fiddling with algebra and trig identities will ever give a
closed-form solution, i.e., one that can
be written down with a finite number of
arithmetic operations to give an exact
result. However, we can easily prove
that at least one solution exists, by
applying the intermediate value theorem to the function x − cos x. The
cosine function is bounded between
−1 and 1, so this function must be
negative for x < −1 and positive for
x > 1. By the intermediate value theorem, there must be a solution in the
interval −1 ≤ x ≤ 1. The graph, c,
verifies this, and shows that there is
only one solution.
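The sign-change argument used in examples 33 and 34 can also be acted out numerically. The following is only a sketch in Python (not the Yacas used elsewhere in this book), and the helper name bisect is my own invention: it keeps halving an interval on which a continuous function changes sign, which is exactly the situation in which the intermediate value theorem guarantees a root.

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Shrink [lo, hi] around a sign change of a continuous function f.

    The intermediate value theorem guarantees a root in the interval
    as long as f(lo) and f(hi) have opposite signs.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if (f_lo < 0) == (f_hi < 0):
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0:
            return mid
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid   # the sign change is in the upper half
        else:
            hi = mid                # the sign change is in the lower half
    return (lo + hi) / 2

# Example 33: 10^x + x = 1000, bracketed between 2 and 3.
root_33 = bisect(lambda x: 10**x + x - 1000, 2.0, 3.0)
# Example 34: cos x = x, bracketed between -1 and 1.
root_34 = bisect(lambda x: x - math.cos(x), -1.0, 1.0)
print(root_33, root_34)
```

Bisection is slower than Newton’s method, but it needs nothing beyond continuity and a sign change, so it mirrors the logic of the theorem directly.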
c / The function x − cos x constructed in example 34.
Example 35
. Prove that every odd-order polynomial P with real coefficients has at least one real root x, i.e., a point at which P(x) = 0.
. Example 34 might have given the impression that there was nothing to be learned from the intermediate value theorem that couldn’t be determined by graphing, but this example clearly can’t be solved by graphing, because we’re trying to prove a general result for all polynomials.
To see that the restriction to odd orders is necessary, consider the polynomial x^2 + 1, which has no real roots because x^2 ≥ 0, and therefore x^2 + 1 > 0, for any real number x.
To fix our minds on a concrete example for the odd case, consider the polynomial P(x) = x^3 − x + 17. For large values of x, the linear and constant terms will be negligible compared to the x^3 term, and since x^3 is positive for large positive values of x and negative for large negative ones, it follows that P is sometimes positive and sometimes negative.
Making this argument more general and rigorous, suppose we had a polynomial of odd order n that always had the same sign for real x. Then by the transfer principle the same would hold for any hyperreal value of x. Now if x is infinite then the lower-order terms are infinitesimal compared to the x^n term, and the sign of the result is determined entirely by the x^n term, but x^n and (−x)^n have opposite signs, and therefore P(x) and P(−x) have opposite signs. This is a contradiction, so we have disproved the assumption that P always had the same sign for real x. Since P is sometimes negative and sometimes positive, we conclude by the intermediate value theorem that it is zero somewhere.
Example 36
. Show that the equation x = sin 1/x has infinitely many solutions.
. This is another example that can’t
be solved by graphing; there is clearly
no way to prove, just by looking at
a graph like d, that it crosses the x
axis infinitely many times. The graph
does, however, help us to gain intuition for what’s going on. As x gets
smaller and smaller, 1/x blows up,
and sin 1/x oscillates more and more
rapidly. The function f is undefined
at 0, but it’s continuous everywhere
else, so we can apply the intermediate value theorem to any interval that
doesn’t include 0.
We want to prove that for any positive
u, there exists an x with 0 < x < u
for which f (x) has either desired sign.
Suppose that this fails for some real
u. Then by the transfer principle the
nonexistence of any real x with the desired property also implies the nonexistence of any such hyperreal x. But
for an infinitesimal x the sign of f is
determined entirely by the sine term,
since the sine term is finite and the linear term infinitesimal. Clearly sin 1/x
can’t have a single sign for all values
of x less than u, so this is a contradiction, and the proposition succeeds for
any u. It follows from the intermediate
value theorem that there are infinitely
many solutions to the equation.
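A computer can’t prove that there are infinitely many solutions, but it can show the sign changes piling up near zero. In this Python sketch (the sample points and the names f and sample_point are my own choices, not part of the proof above), the function is evaluated at x = 1/((k + 1/2)π), where sin 1/x is alternately −1 and +1, so each pair of neighboring sample points brackets a solution by the intermediate value theorem.

```python
import math

def f(x):
    """The function x - sin(1/x) from example 36, defined for x != 0."""
    return x - math.sin(1.0 / x)

def sample_point(k):
    # At x = 1/((k + 1/2)*pi), sin(1/x) equals +1 for even k and -1 for odd k.
    return 1.0 / ((k + 0.5) * math.pi)

sign_changes = 0
for k in range(1, 1001):
    a, b = sample_point(k + 1), sample_point(k)   # a < b, neither includes 0
    if f(a) * f(b) < 0:
        sign_changes += 1   # a root lies strictly between a and b

print(sign_changes)
```

Raising the upper limit on k squeezes the intervals closer to zero and keeps producing more sign changes, which is the numerical shadow of the argument with infinitesimal x.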
d / The function x − sin 1/x.
The extreme value theorem
In chapter 1, we saw that locating maxima and minima of functions may in general be fairly difficult, because there are so many different ways in which a function can attain an extremum: e.g., at an endpoint, at a place where its derivative is zero, or at a nondifferentiable kink. The following theorem allows us to make a very general statement about all these possible cases, assuming only continuity.
The extreme value theorem states that if f is a continuous real-valued function on the real-number interval defined by a ≤ x ≤ b, then f has maximum and minimum values on that interval, which are attained at specific points in the interval.
Let’s first see why the assumptions are necessary. If we weren’t confined to a finite interval, then y = x would be a counterexample, because it’s continuous and doesn’t have any maximum or minimum value. If we didn’t assume continuity, then we could have a function defined as y = x for x < 1, and y = 0 for x ≥ 1; this function never gets bigger than 1, but it never attains a value of 1 for any specific value of x.
The extreme value theorem is proved, in a somewhat more general form, on page 157.
Example 37
. Find the maximum value of the polynomial P(x) = x^3 + x^2 + x + 1 for −5 ≤ x ≤ 5.
. Polynomials are continuous, so the extreme value theorem guarantees that such a maximum exists. Suppose we try to find it by looking for a place where the derivative is zero. The derivative is 3x^2 + 2x + 1, and setting it equal to zero gives a quadratic equation, but application of the quadratic formula shows that it has no real solutions. It appears that the function doesn’t have a maximum anywhere (even outside the interval of interest) that looks like a smooth peak. Since it doesn’t have kinks or discontinuities, there is only one other type of maximum it could have, which is a maximum at one of its endpoints. Plugging in the limits, we find P(−5) = −104 and P(5) = 156, so we conclude that the maximum value on this interval is 156.
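The reasoning in example 37 is easy to double-check by machine. This is a rough Python sketch, with a sampling grid of 1001 points that I picked arbitrarily: it computes the discriminant of the derivative 3x^2 + 2x + 1, evaluates P at the endpoints, and confirms by brute force that no sampled point beats the right-hand endpoint.

```python
def P(x):
    """The polynomial from example 37."""
    return x**3 + x**2 + x + 1

# Derivative is 3x^2 + 2x + 1; its discriminant b^2 - 4ac decides
# whether there is any real x where the derivative vanishes.
a, b, c = 3, 2, 1
discriminant = b**2 - 4 * a * c

# Brute-force check on an evenly spaced grid covering -5 <= x <= 5.
samples = [-5 + 10 * i / 1000 for i in range(1001)]
best_x = max(samples, key=P)

print(discriminant, P(-5), P(5), best_x)
```

A negative discriminant means the derivative never vanishes, so the grid search should land on an endpoint, in agreement with the argument above.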
3.2 Limits
Historically, the calculus of infinitesimals as created by Newton and Leibniz was reinterpreted in the nineteenth century by Cauchy, Bolzano, and Weierstrass in terms of limits. All mathematicians learned both languages, and switched back and forth between them effortlessly, like the lady I overheard in a Southern California supermarket telling her mother, “Let’s get that one, con los nuts.” Those who had been trained in infinitesimals might hear a statement using the language of limits, but translate it mentally into infinitesimals; to them, every statement about limits was really a statement about infinitesimals. To their younger colleagues, trained using limits, every statement about infinitesimals was really to be understood as shorthand for a limiting process. When Robinson laid the rigorous foundations for the hyperreal number system in the 1960’s, a common objection was that it was really nothing new, because every statement about infinitesimals was really just a different way of expressing a corresponding statement about limits; of course the same could have been said about Weierstrass’s work of the preceding century! In reality, all practitioners of calculus had realized all along that different approaches worked better for different problems; problem 13 on page 82 is an example of a result that is much easier to prove with infinitesimals than with limits.
The Weierstrass definition of a limit is this:
Definition of the limit
We say that ℓ is the limit of the function f(x) as x approaches a, written
$\lim_{x \to a} f(x) = \ell$,
if the following is true: for any positive real number ε, there exists another positive real number δ such that for all x in the interval a − δ ≤ x ≤ a + δ, other than x = a itself, the value of f lies within the range from ℓ − ε to ℓ + ε.
Intuitively, the idea is that if I want you to make f(x) close to ℓ, I just have to tell you how close, and you can tell me that it will be that close as long as x is within a certain distance of a.
In terms of infinitesimals, we have:
Definition of the limit
We say that ℓ is the limit of the function f(x) as x approaches a, written
$\lim_{x \to a} f(x) = \ell$,
if the following is true: for any nonzero infinitesimal number dx, the value of f(a + dx) is finite, and the standard part of f(a + dx) equals ℓ.
The two definitions are equivalent.
Sometimes a limit can be evaluated
simply by plugging in numbers:
Example 38
. Evaluate
. Plugging in x = 0, we find that the
limit is 1.
In some examples, plugging in fails if we try to do it directly, but can be made to work if we massage the expression into a different form:
Example 39
. Evaluate
$\lim_{x \to 0} \frac{2/x + 7}{1/x + 8686}$.
. Plugging in x = 0 fails because division by zero is undefined.
Intuitively, however, we expect that the limit will be well defined, and will equal 2, because for very small values of x, the numerator is dominated by the 2/x term, and the denominator by the 1/x term, so the 7 and 8686 terms will matter less and less as x gets smaller and smaller.
To demonstrate this more rigorously, a trick that works is to multiply both the top and the bottom by x, giving
$\frac{2 + 7x}{1 + 8686x}$,
which equals 2 when we plug in x = 0, so we find that the limit is 2.
This example is a little subtle, because when x equals zero, the function is not defined, and moreover it would not be valid to multiply both the top and the bottom by x. In general, it’s not valid algebra to multiply both the top and the bottom of a fraction by 0, because the result is 0/0, which is undefined. But we didn’t actually multiply both the top and the bottom by zero, because we never let x equal zero. Both the Weierstrass definition and the definition in terms of infinitesimals only refer to the properties of the function in a region very close to the limiting point, not at the limiting point itself.
This is an example in which the function was not well defined at a certain point, and yet the limit of the function was well defined as we approached that point. In a case like this, where there is only one point missing from the domain of the function, it is natural to extend the definition of the function by filling in the “gap tooth.” Example 41 below shows that this kind of filling-in procedure is not always possible.
Example 40
. Investigate the limiting behavior of 1/x^2 as x approaches 0, and as x approaches 1.
. At x = 1, plugging in works, and we find that the limit is 1.
At x = 0, plugging in doesn’t work, because division by zero is undefined. Applying the definition in terms of infinitesimals to the limit as x approaches 0, we need to find out whether 1/(0 + dx)^2 is finite for infinitesimal dx, and if so, whether it always has the same standard part. But clearly 1/(0 + dx)^2 = dx^−2 is always infinite, and we conclude that this limit is undefined.
e / Example 40, the function 1/x^2.
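The contrast between examples 39 and 40 shows up clearly if we plug in small but nonzero numbers. This Python sketch is only a numerical illustration, not a proof, and the names q and inv_square are mine. It evaluates the expression from example 39 in its original form, with the 2/x and 1/x left in, and the expression 1/x^2 from example 40.

```python
def q(x):
    """Example 39's expression, left in its original unsimplified form."""
    return (2 / x + 7) / (1 / x + 8686)

def inv_square(x):
    """Example 40's expression 1/x^2."""
    return 1 / x**2

small_values = [1e-3, 1e-5, 1e-7, 1e-9]
q_values = [q(x) for x in small_values]
inv_square_values = [inv_square(x) for x in small_values]

for x, qv, sv in zip(small_values, q_values, inv_square_values):
    print(f"x = {x:g}   q(x) = {qv:.8f}   1/x^2 = {sv:.3e}")
```

Because of the large constant 8686, the first expression only settles down near 2 once x is much smaller than 1/8686, while 1/x^2 just keeps growing and never settles on any value.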
f / Example 41, the function tan^−1(1/x).
Example 41
. Investigate the limiting behavior of f(x) = tan^−1(1/x) as x approaches 0.
. Plugging in doesn’t work, because division by zero is undefined.
In the definition of the limit in terms of infinitesimals, the first requirement is that f(0 + dx) be finite for infinitesimal values of dx. The graph makes this look plausible, and indeed we can prove that it is true by the transfer principle. For any nonzero real x we have −π/2 ≤ f(x) ≤ π/2, and by the transfer principle this holds for the hyperreals as well, and therefore f(0 + dx) is finite.
The second requirement is that the standard part of f(0 + dx) have a uniquely defined value. The graph shows that we really have two cases to consider, one on the right side of the graph, and one on the left. Intuitively, we expect that the standard part of f(0 + dx) will equal π/2 for positive dx, and −π/2 for negative, and thus the second part of the definition will not be satisfied. For a more formal proof, we can use the transfer principle. For real x with 0 < x < 1/2, for example, f is always greater than 1 (since 1/x > 2 and tan^−1 2 ≈ 1.107), so we conclude based on the transfer principle that f(0 + dx) > 1 for positive infinitesimal dx. But on similar grounds we can be sure that f(0 + dx) < −1 when dx is negative and infinitesimal. Thus the standard part of f(0 + dx) can have different values for different infinitesimal values of dx, and we conclude that the limit is undefined.
In examples like this, we can define a kind of one-sided limit, notated like this:
$\lim_{x \to 0^-} \tan^{-1}\frac{1}{x} = -\frac{\pi}{2}$
$\lim_{x \to 0^+} \tan^{-1}\frac{1}{x} = \frac{\pi}{2}$
where the notations x → 0− and x → 0+ are to be read “as x approaches zero from below,” and “as x approaches zero from above.”
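The two one-sided limits can be seen numerically by approaching zero from each side separately. This is a small Python sketch with step sizes of my own choosing. It doesn’t prove anything, but it shows the two sides heading for different values.

```python
import math

def f(x):
    """Example 41's function tan^-1(1/x), defined for x != 0."""
    return math.atan(1 / x)

# Approach 0 from above and from below through x = +-10^-k.
from_above = [f(10.0**-k) for k in range(1, 8)]
from_below = [f(-(10.0**-k)) for k in range(1, 8)]

print("from above:", from_above[-1], " pi/2 =", math.pi / 2)
print("from below:", from_below[-1], "-pi/2 =", -math.pi / 2)
```

Since the values on the two sides differ by about π no matter how small the step, no single number can serve as the two-sided limit.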
3.3 L’Hôpital’s rule
Consider the limit
$\lim_{x \to 0} \frac{\sin x}{x}$.
Plugging in doesn’t work, because we get 0/0. Division by zero is undefined, both in the real number system and in the hyperreals. A nonzero number divided by a small number gives a big number; a nonzero number divided by a very small number gives a very big number; and a nonzero number divided by an infinitesimal number gives an infinite number. On the other hand, dividing zero by zero means looking for a solution to the equation 0 = 0q, where q is the result of the division. But any q is a solution of this equation, so even speaking casually, it’s not correct to say that 0/0 is infinite; it’s not infinite, it’s anything we like.
Since plugging in zero didn’t work, let’s try estimating the limit by plugging in a number for x that’s small, but not zero. On a calculator,
$\frac{\sin 0.00001}{0.00001} = 0.999999999983333$.
It looks like the limit is 1. We can confirm our conjecture to higher precision using Yacas’s ability to do high-precision arithmetic. It’s looking pretty one-ish. This is the idea of the Weierstrass definition of a limit: it seems like we can get an answer as close to 1 as we like, if we’re willing to make x as close to 0 as necessary. The graph helps to make this plausible.
g / The graph of sin x/x.
The general idea here is that for small values of x, the small-angle approximation sin x ≈ x obtains, and as x gets smaller and smaller, the approximation gets better and better, so sin x/x gets closer and closer to 1.
But we still haven’t proved rigorously that the limit is exactly 1. Let’s try using the definition of the limit in terms of infinitesimals.
$\lim_{x \to 0} \frac{\sin x}{x} = \mathrm{st}\left[\frac{\sin(0 + dx)}{0 + dx}\right] = \mathrm{st}\left[\frac{dx + \ldots}{dx}\right]$,
where we’ve used the identity sin(p + q) = sin p cos q + sin q cos p, and ... stands for terms of order dx^2 or higher. So
$\lim_{x \to 0} \frac{\sin x}{x} = \mathrm{st}\left[1 + \frac{\ldots}{dx}\right] = 1$.
We can check our work using Inf:
: (sin d)/d
(The ... is where I’ve snipped trailing terms from the output.)
This is a special case of the following rule for calculating limits involving 0/0:
L’Hôpital’s rule (simplest form)
If u and v are functions with u(a) = 0 and v(a) = 0, the derivatives u̇(a) and v̇(a) are defined, and the derivative v̇(a) ≠ 0, then
$\lim_{x \to a} \frac{u}{v} = \frac{\dot{u}(a)}{\dot{v}(a)}$.
Proof: Since u(a) = 0, and the derivative du/dx is defined at a, u(a + dx) = du is infinitesimal, and likewise for v. By the definition of the limit, the limit is the standard part of
$\frac{u(a + dx)}{v(a + dx)} = \frac{du}{dv} = \frac{du/dx}{dv/dx}$,
where by assumption the numerator and denominator are both defined (and finite, because the derivative is defined in terms of the standard part). The standard part of a quotient like p/q equals the quotient of the standard parts, provided that both p and q are finite (which we’ve established), and q ≠ 0 (which is true by assumption). But the standard part of du/dx is the definition of the derivative u̇, and likewise for dv/dx, so this establishes the result.
We will generalize L’Hôpital’s rule on p. 65.
By the way, the housetop accent on the “ô” in l’Hôpital means that in Old French it used to be spelled and pronounced “l’Hospital,” but the “s” later became silent, so they stopped writing it. So yes, it is the same word as “hospital.”
Example 42
. Evaluate
$\lim_{x \to 0} \frac{e^x - 1}{x}$.
. Taking the derivatives of the top and bottom, we find e^x/1, which equals 1 when evaluated at x = 0.
Example 43
. Evaluate
$\lim_{x \to 1} \frac{x - 1}{x^2 - 2x + 1}$.
. Plugging in x = 1 fails, because both the top and the bottom are zero. Taking the derivatives of the top and bottom, we find 1/(2x − 2), which blows up to infinity when x = 1. To symbolize the fact that the limit is undefined, and undefined because it blows up to infinity, we write
$\lim_{x \to 1} \frac{x - 1}{x^2 - 2x + 1} = \infty$.
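As a sanity check on the results in this section, we can evaluate the three ratios at points close to, but not at, the limiting point. This is a hedged sketch in ordinary double-precision Python rather than the high-precision Yacas arithmetic mentioned above; the function names are mine, and math.expm1 is used only to avoid roundoff in e^x − 1.

```python
import math

def ratio_sin(x):
    """sin x / x, which l'Hopital's rule says approaches cos 0 / 1 = 1."""
    return math.sin(x) / x

def ratio_exp(x):
    """(e^x - 1)/x from example 42; expm1 computes e^x - 1 accurately."""
    return math.expm1(x) / x

def ratio_43(x):
    """(x - 1)/(x^2 - 2x + 1) from example 43, which blows up at x = 1."""
    return (x - 1) / (x**2 - 2 * x + 1)

print(ratio_sin(1e-5))       # compare with the calculator value in the text
print(ratio_exp(1e-5))       # close to 1
print(ratio_43(1 + 1e-4))    # large, and larger still for x closer to 1
```

The first two ratios hug 1, while the third grows without bound as x is moved toward 1 from above, consistent with examples 42 and 43.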
3.4 Another perspective on indeterminate forms
An expression like 0/0, called an indeterminate form, can be thought of in a different way in terms of infinitesimals. Suppose I tell you I have two infinitesimal numbers d and e in my pocket, and I ask you whether d/e is finite, infinite, or infinitesimal. You can’t tell, because d and e might not be infinitesimals of the same order of magnitude. For instance, if e = 37d, then d/e = 1/37 is finite; but if e = d^2, then d/e is infinite; and if d = e^2, then d/e is infinitesimal. Acting out the last two cases with numbers that are small but not infinitesimal,
.001/.000001 = 1000
.000001/.001 = .001
On the other hand, suppose I tell you I have an infinitesimal number d and a finite, nonzero number x, and I ask you to speculate about d/x. You know for sure that it’s going to be infinitesimal. Likewise, you can be sure that x/d is infinite. These aren’t indeterminate forms.
We can do something similar with infinite numbers. If H and K are both infinite, then H − K is indeterminate. It could be infinite, for example, if H was positive infinite and K = H/2. On the other hand, it could be finite if H = K + 1. Acting this out with big but finite numbers,
1000 − 500 = 500
1001 − 1000 = 1
Example 44
. If H is a positive infinite number, is $\sqrt{H+1} - \sqrt{H-1}$ finite, infinite, infinitesimal, or indeterminate?
. Trying it with a finite, big number, we have
$\sqrt{1000001} - \sqrt{999999} = 1.00000000020373 \times 10^{-3}$,
which is clearly a wannabe infinitesimal. We can verify the result using Inf:
: H=1/d
: sqrt(H+1)-sqrt(H-1)
For convenience, the first line of input defines an infinite number H in terms of the calculator’s built-in infinitesimal d. The result has only positive powers of d, so it’s clearly infinitesimal.
More rigorously, we can rewrite the expression as $\sqrt{H}\left(\sqrt{1 + 1/H} - \sqrt{1 - 1/H}\right)$. Since the derivative of the square root function $\sqrt{x}$ evaluated at x = 1 is 1/2, we can approximate this as
$\sqrt{H}\left[1 + \frac{1}{2H} + \ldots - \left(1 - \frac{1}{2H} + \ldots\right)\right] = \sqrt{H}\left[\frac{1}{H} + \ldots\right] = \frac{1}{\sqrt{H}} + \ldots$,
which is infinitesimal.
3.5 Limits at infinity
The definition of the limit in terms of infinitesimals extends immediately to limiting processes where x gets bigger and bigger, rather than closer and closer to some finite value. For example, the function 3 + 1/x clearly gets closer and closer to 3 as x gets bigger and bigger. If a is an infinite number, then the definition says that evaluating this expression at a + dx, where dx is infinitesimal, gives a result whose standard part is 3. It doesn’t matter that a happens to be infinite, the definition still works. We also note that in this example, it doesn’t matter what infinite number a is; the limit equals 3 for any infinite a. We can write this fact as
$\lim_{x \to \infty} \left(3 + \frac{1}{x}\right) = 3$,
where the symbol ∞ is to be interpreted as “nyeah nyeah, I don’t even care what infinite number you put in here, I claim it will work out to 3 no matter what.” The symbol ∞ is not to be interpreted as standing for any specific infinite number. That would be the type of fallacy that lay behind the bogus proof on page 30 that 1 = 1/2, which assumed that all infinities had to be the same size.
A somewhat different example is the arctangent function. The arctangent of 1000 equals approximately 1.5698, and inputting bigger and bigger numbers gives answers that appear to get closer and closer to π/2 ≈ 1.5708. But the arctangent of −1000 is approximately −1.5698, i.e., very close to −π/2. From these numerical observations, we conjecture that
$\lim_{x \to a} \tan^{-1} x$
equals π/2 for positive infinite a, but −π/2 for negative infinite a. It would not be correct to write
$\lim_{x \to \infty} \tan^{-1} x = \frac{\pi}{2}$,
because it does matter what infinite number we pick. Instead we write
$\lim_{x \to +\infty} \tan^{-1} x = \frac{\pi}{2}$
$\lim_{x \to -\infty} \tan^{-1} x = -\frac{\pi}{2}$.
Some expressions don’t have this kind of limit at all. For example, if you take the sines of big numbers like a thousand, a million, etc., on your calculator, the results are essentially random numbers lying between −1 and 1. They don’t settle down to any particular value, because the sine function oscillates back and forth forever. To prove formally that $\lim_{x \to +\infty} \sin x$ is undefined, consider that the sine function, defined on the real numbers, has the property that you can always change its result by at least 0.1 if you add either 1.5 or −1.5 to its input. For example, sin(.8) ≈ 0.717, and sin(.8 − 1.5) ≈ −0.644. Applying the transfer principle to this statement, we find that the same is true on the hyperreals. Therefore there cannot be any value ℓ that differs infinitesimally from sin a for all positive infinite values of a.
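The property of the sine function used in that argument is easy to test by brute force. In this Python sketch (the sampling density is an arbitrary choice of mine, and a finite sample is evidence rather than proof), we scan one full period and record the smallest change that adding 1.5 or −1.5 can be forced to produce, along with the arctangent values quoted above.

```python
import math

def forced_change(x):
    """Largest change in sin obtainable by adding either 1.5 or -1.5 to x."""
    up = abs(math.sin(x + 1.5) - math.sin(x))
    down = abs(math.sin(x - 1.5) - math.sin(x))
    return max(up, down)

# The sine is periodic, so scanning 0 <= x < 2*pi covers every real input.
n_samples = 100000
worst_case = min(forced_change(2 * math.pi * i / n_samples)
                 for i in range(n_samples))

print(worst_case)                             # stays well above 0.1
print(math.atan(1000), math.atan(-1000))      # near +pi/2 and -pi/2
print(math.sin(0.8), math.sin(0.8 - 1.5))     # the example from the text
```

The smallest forced change found is far larger than the 0.1 the proof needs, so the figure 0.1 is a comfortable underestimate.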
Often we’re interested in finding the limit as x approaches infinity of an expression that is written as an indeterminate form like H/K, where both H and K are infinite.
Example 45
. Evaluate the limit
$\lim_{x \to \infty} \frac{2x + 7}{x + 8686}$.
. Intuitively, if x gets large enough the constant terms will be negligible, and the top and bottom will be dominated by the 2x and x terms, respectively, giving an answer that approaches 2.
One way to verify this is to divide both the top and the bottom by x, giving
$\frac{2 + 7/x}{1 + 8686/x}$.
If x is infinite, then the standard part of the top is 2, the standard part of the bottom is 1, and the standard part of the whole thing is therefore 2.
Another approach is to use l’Hôpital’s rule. The derivative of the top is 2, and the derivative of the bottom is 1, so the limit is 2/1 = 2.
3.6 Generalizations of l’Hôpital’s rule
Mathematical theorems are sometimes like cars. I own a Honda Fit that is about as bare-bones as you can get these days, but persuading a dealer to sell me that car was like pulling teeth. The salesman was absolutely certain that any sane customer would want to pay an extra $1,800 for such crucial amenities as floor mats and a chrome tailpipe. L’Hôpital’s rule in its most general form is a much fancier piece of machinery than the stripped down model described on p. 60. The price you pay for the deluxe model is that the proof becomes much more complicated than the one-liner that sufficed for the simple version.
Multiple applications of the rule
In the following example, we have to use l’Hôpital’s rule twice before we get an answer.
Example 46
. Evaluate
$\lim_{x \to \pi} \frac{1 + \cos x}{(x - \pi)^2}$.
. Applying l’Hôpital’s rule gives
$\frac{-\sin x}{2(x - \pi)}$,
which still produces 0/0 when we plug in x = π. Going again, we get
$\frac{-\cos x}{2} = \frac{1}{2}$.
The reason that this always works is outlined on p. 150.
The indeterminate form ∞/∞
Consider an example like this:
$\lim_{x \to 0} \frac{1 + 1/x}{1 + 2/x}$.
This is an indeterminate form like ∞/∞ rather than the 0/0 form for which we’ve already proved l’Hôpital’s rule. As proved on p. 151, l’Hôpital’s rule applies to examples like this as well.
Example 47
. Evaluate
$\lim_{x \to 0} \frac{1 + 1/x}{1 + 2/x}$.
. Both the numerator and the denominator go to infinity. Differentiation of the top and bottom gives (−x^−2)/(−2x^−2) = 1/2. We can see that the reason the rule worked was that (1) the constant terms were irrelevant because they become negligible as the 1/x terms blow up; and (2) differentiating the blowing-up 1/x terms makes them into the same x^−2 on top and bottom, which cancel.
Note that we could also have gotten this result without l’Hôpital’s rule, simply by multiplying both the top and the bottom of the original expression by x in order to rewrite it as (x + 1)/(x + 2).
Limits at infinity
It is straightforward to prove a variant of l’Hôpital’s rule that allows us to do limits at infinity. The general proof is left as an exercise (problem 8, p. 67). The result is that l’Hôpital’s rule is equally valid when the limit is at ±∞ rather than at some real number a.
Example 48
. Evaluate
$\lim_{x \to \infty} \frac{2x + 7}{x + 8686}$.
. We could use a change of variable to make this into example 39 on p. 59, which was solved using an ad hoc and multiple-step procedure. But having established the more general form of l’Hôpital’s rule, we can do it in one step. Differentiation of the top and bottom produces
$\lim_{x \to \infty} \frac{2x + 7}{x + 8686} = \lim_{x \to \infty} \frac{2}{1} = 2$.
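The answers to examples 46 through 48 can be spot-checked by evaluating each expression near its limiting point. This is only a floating-point sketch in Python, with evaluation points I chose by hand; the function names ex46, ex47, and ex48 are labels of my own.

```python
import math

def ex46(x):
    """(1 + cos x)/(x - pi)^2, expected to approach 1/2 as x -> pi."""
    return (1 + math.cos(x)) / (x - math.pi) ** 2

def ex47(x):
    """(1 + 1/x)/(1 + 2/x), expected to approach 1/2 as x -> 0."""
    return (1 + 1 / x) / (1 + 2 / x)

def ex48(x):
    """(2x + 7)/(x + 8686), expected to approach 2 as x -> infinity."""
    return (2 * x + 7) / (x + 8686)

print(ex46(math.pi + 1e-3))   # near 0.5
print(ex47(1e-8))             # near 0.5
print(ex48(1e12))             # near 2
```

In example 46 the evaluation point shouldn’t be pushed too close to π, because 1 + cos x then suffers from roundoff; a step of 0.001 is a reasonable compromise.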
(a) Prove, using the Weierstrass definition of the limit, that if lim_{x→a} f(x) = F and lim_{x→a} g(x) = G both exist, then lim_{x→a} [f(x) + g(x)] = F + G, i.e., that the limit of a sum is the sum of the limits. (b) Prove the same thing using the definition of the limit in terms of infinitesimals.
. Solution, p. 183
Sketch the graph of the function e−1/x , and evaluate the following four limits:
exactly, and check your result by
numerical approximation.
. Solution, p. 184
x→0 x
lim e−1/x
lim e−1/x
. Solution, p. 183
Verify the following limits.
s3 − 1
s→1 s − 1
1 − cos θ
5x2 − 2x
n(n + 1)
n→∞ (n + 2)(n + 3)
ax2 + bx + c
x→∞ dx2 + ex + f
. Solution, p. 183
[Granville, 1911]
x cos x
1 − 2x
u→0 eu + e−u − 2
lim e−1/x
lim e−1/x
She applies l’Hôpital’s rule, differentiating top and bottom to find
1/ex , which equals 1 when she
plugs in x = 0. What is wrong
with her reasoning?
. Solution, p. 185
Amy is asked to evaluate
exactly, and check your result by
numerical approximation.
. Solution, p. 185
sin t
exactly, and check your result by
numerical approximation.
. Solution, p. 185
Prove a form of l’Hôpital’s rule stating that
$\lim_{x \to \infty} \frac{f(x)}{g(x)}$
is equal to the limit of f′/g′ at infinity. Hint: change to some new variable u such that x → ∞ corresponds to u → 0.
. Solution, p. 185
Prove that the linear function y = ax + b, where a and b are
real, is continuous, first using the
definition of continuity in terms of
infinitesimals, and then using the
definition in terms of the Weierstrass limit.
. Solution, p. 185
4 Integration
4.1 Definite and
Example 49
Because any formula can be differentiated symbolically to find another formula, the main motivation for doing derivatives numerically would be if the function to
be differentiated wasn’t known in
symbolic form. A typical example might be a two-person network
computer game, in which player
A’s computer needs to figure out
player B’s velocity based on knowledge of how her position changes
over time. But in most cases, it’s
numerical integration that’s interesting, not numerical differentiation.
As a warm-up, let’s see how to
do a running sum of a discrete
function using Yacas. The following program computers the sum
1+2+. . .+100 discussed to on page
7. Now that we’re writing real
computer programs with Yacas, it
would be a good idea to enter each
program into a file before trying to
run it. In fact, some of these examples won’t run properly if you just
start up Yacas and type them in
one line at a time. If you’re using
Adobe Reader to read this book,
you can do Tools>Basic>Select,
select the program, copy it into a
file, and then edit out the line num-
n := 1;
sum := 0;
While (n<=100) [
sum := sum+n;
n := n+1;
The semicolons are to separate one
instruction from the next, and they
become necessary now that we’re
doing real programming. Line 1
of this program defines the variable n, which will take on all the
values from 1 to 100. Line 2 says
that we haven’t added anything up
yet, so our running sum is zero do
far. Line 3 says to keep on repeating the instructions inside the
square brackets until n goes past
100. Line 4 updates the running
sum, and line 5 updates the value
of n. If you’ve never done any programming before, a statement like
n:=n+1 might seem like nonsense
— how can a number equal itself
plus one? But that’s why we use
the := symbol; it says that we’re
redefining n, not stating an equation. If n was previously 37, then
after this statement is executed, n
will be redefined as 38. To run the
program on a Linux computer, do
this (assuming you saved the program in a file named sum.yacas):
% yacas -pc sum.yacas
Here the % symbol is the computer’s prompt.
The result is
5,050, as expected. One way of
stating this result is
n = 5050
The capital Greek letter Σ, sigma,
is used because it makes the “s”
sound, and that’s the first sound in
the word “sum.” The n = 1 below
the sigma says the sum starts at 1,
and the 100 on top says it ends at
100. The n is what’s known as a
dummy variable: it has no meaning outside the context of the sum.
Figure a shows the graphical interpretation of the sum: we’re adding
up the areas of a series of rectangular strips. (For clarity, the figure
only shows the sum going up to 7,
rather than 100.)
pretation of what we’re trying to
do: find the area of the shaded
triangle. This is an example we
know how to do symbolically, so
we can do it numerically as well,
and check the answers against each
other. Symbolically, the area is
given by the integral. To integrate the function ẋ(t) = t, we
know we need some function with
a t2 in it, since we want something
whose derivative is t, and differentiation reduces the power by one.
The derivative of t2 would be 2t
rather than t, so what we want is
x(t) = t2 /2. Let’s compute the
area of the triangle that stretches
along the t axis from 0 to 100:
x(100) = 100/ 2 = 5000.
b / Graphical interpretation of the integral of the
function ẋ(t) = t.
Figure c shows how to accomplish
the same thing numerically. We
break up the area into a whole
bunch of very skinny rectangles.
Ideally, we’d like to make the width
Now how about an integral? Fig- of each rectangle be an infinitesiure b shows the graphical inter- mal number dx, so that we’d be
a / Graphical interpretation of the sum 1+2+. . .+
adding up an infinite number of infinitesimal areas. In reality, a computer can’t do that, so we divide up
the interval from t = 0 to t = 100
into H rectangles, each with finite width dt = 100/H. Instead
of making H infinite, we make it
the largest number we can without
making the computer take too long
to add up the areas of the rectangles.
bolic result to three digits of precision. Changing H to 10,000 gives
5, 000.5, which is one more digit.
Clearly as we make the number
of rectangles greater and greater,
we’re converging to the correct result of 5,000.
In the Leibniz notation, the thing
we’ve just calculated, by two different techniques, is written like this:
Z 100
t dt = 5, 000
It looks a lot like the Σ notation,
with the Σ replaces by a flattenedout letter “S.” The t is a dummy
variable. What I’ve been casually
referring to as an integral is really two different but closely related things, known as the definite
integral and the indefinite integral.
c / Approximating the integral numerically.
Example 50
tmax := 100;
H := 1000;
dt := tmax/H;
sum := 0;
t := 0;
While (t<=tmax) [
sum := N(sum+t*dt);
t := N(t+dt);
Definition of the indefinite integral
If ẋ is a function, then a function x is an indefinite integral of ẋ if, as implied by the notation, dx/dt = ẋ.
Interpretation: Doing an indefinite integral means doing the opposite of differentiation. All the
possible indefinite integrals are the
same function except for an additive constant.
In example 50, we split the interval from t = 0 to 100 into H = 1000 small intervals, each with width dt = 0.1. The result is 5,005, which agrees with the sym-

Example 51
. Find the indefinite integral of the function ẋ(t) = t.
. Any function of the form
$$x(t) = t^2/2 + c$$
where c is a constant, is an indefinite integral of this function, since its derivative is t.
Definition of the definite integral
If ẋ is a function, then the definite integral of ẋ from a to b is defined as
$$\int_a^b \dot{x}(t)\,dt = \lim_{H\to\infty} \sum_{i=0}^{H} \dot{x}(a+i\Delta t)\,\Delta t$$
where ∆t = (b − a)/H.
Interpretation: What we’re calculating is the area under the graph of ẋ, from a to b. (If the graph dips below the t axis, we interpret the area between it and the axis as a negative area.) The thing inside the limit is a calculation like the one done in example 50, but generalized to a ≠ 0. If H were infinite, then ∆t would be an infinitesimal number dt.

4.2 The fundamental theorem of calculus
The fundamental theorem of calculus
Let x be an indefinite integral of ẋ, and let ẋ be a continuous function (one whose graph is a single connected curve). Then
$$\int_a^b \dot{x}(t)\,dt = x(b) - x(a)$$
The fundamental theorem is proved on page 152. The idea it expresses is that integration and differentiation are inverse operations. That is, integration undoes differentiation, and differentiation undoes integration.

Example 52
. Interpret the definite integral
$$\int_1^2 \frac{1}{t}\,dt$$
graphically; then evaluate it both symbolically and numerically, and check that the two results are consistent.
d / The definite integral $\int_1^2 (1/t)\,dt$.
. Figure d shows the graphical interpretation. The numerical calculation requires a trivial variation on the program from example 50:
a := 1;
b := 2;
H := 1000;
dt := (b-a)/H;
sum := 0;
t := a;
While (t<=b) [
  sum := N(sum+(1/t)*dt);
  t := N(t+dt);
];
Echo(sum);
The result is 0.693897243, and
increasing H to 10,000 gives
0.6932221811, so we can be
fairly confident that the result equals
0.693, to 3 decimal places.
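As a cross-check on the numbers quoted in this example, here is a minimal Python sketch (a stand-in for the Yacas program, with the hypothetical helper name `rectangle_sum`) that adds up the same skinny rectangles under 1/t and compares the total with ln 2 − ln 1.

```python
import math

def rectangle_sum(f, a, b, H):
    """Add up f(t)*dt at t = a, a+dt, ..., b (H+1 skinny rectangles)."""
    dt = (b - a) / H
    return sum(f(a + i * dt) * dt for i in range(H + 1))

approx = rectangle_sum(lambda t: 1 / t, 1, 2, 1000)
exact = math.log(2) - math.log(1)  # from the indefinite integral ln t
print(approx, exact)
```

The two values agree to about three decimal places, which is the level of agreement the example claims for H = 1000.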
Symbolically, the indefinite integral is x = ln t. Using the fundamental theorem of calculus, the area is ln 2 − ln 1 ≈ 0.693147180559945.
Judging from the graph, it looks plausible that the shaded area is about 0.7.
This is an interesting example, because the natural log blows up to negative infinity as t approaches 0, so it’s not possible to add a constant onto the indefinite integral and force it to be equal to 0 at t = 0. Nevertheless, the fundamental theorem of calculus still works.

4.3 Properties of the integral
Let f and g be two functions of x, and let c be a constant. We already know that for derivatives,
$$\frac{d}{dx}(f+g) = \frac{df}{dx} + \frac{dg}{dx}$$
and
$$\frac{d}{dx}(cf) = c\,\frac{df}{dx}$$
But since the indefinite integral is just the operation of undoing a derivative, the same kind of rules must hold true for indefinite integrals as well:
$$\int (f+g)\,dx = \int f\,dx + \int g\,dx$$
and
$$\int (cf)\,dx = c\int f\,dx$$
And since a definite integral can be found by plugging in the upper and lower limits of integration into the indefinite integral, the same properties must be true of definite integrals as well.

Example 53
. Evaluate the indefinite integral
$$\int (x + 2\sin x)\,dx$$
. Using the additive property, the integral becomes
$$\int x\,dx + \int 2\sin x\,dx$$
Then the property of scaling by a constant lets us change this to
$$\int x\,dx + 2\int \sin x\,dx$$
We need a function whose derivative is x, which would be $x^2/2$, and one whose derivative is sin x, which must be − cos x, so the result is
$$\frac{1}{2}x^2 - 2\cos x + c$$
4.4 Applications
Averages
In the story of Gauss’s problem of adding up the numbers from 1 to 100, one interpretation of the result, 5,050, is that the average of all the numbers from 1 to 100 is 50.5. This is the ordinary definition of an average: add up all the things you have, and divide by the number of things. (The result in this example makes sense, because half the numbers are from 1 to 50, and half are from 51 to 100, so the average is half-way between 50 and 51.)
Similarly, a definite integral can also be thought of as a kind of average. In general, if y is a function of x, then the average, or mean, value of y on the interval from x = a to b can be defined as
$$\bar{y} = \frac{1}{b-a}\int_a^b y\,dx$$
In the continuous case, dividing by b − a accomplishes the same thing as dividing by the number of things in the discrete case.

Example 54
. Show that the definition of the average makes sense in the case where the function is a constant.
. If y is a constant, then we can take it outside of the integral, so
$$\bar{y} = \frac{1}{b-a}\,y\int_a^b 1\,dx = \frac{1}{b-a}\,y\,x\Big|_a^b = \frac{1}{b-a}\,y\,(b-a) = y$$

Example 55
. Find the average value of the function $y = x^2$ for values of x ranging from 0 to 1.
.
$$\bar{y} = \frac{1}{1-0}\int_0^1 x^2\,dx = \frac{1}{3}x^3\Big|_0^1 = \frac{1}{3}$$

The mean value theorem
If the continuous function y(x) has average value ȳ on the interval from x = a to b, then y attains its average value at least once in that interval, i.e., there exists ξ with a < ξ < b such that y(ξ) = ȳ.
The mean value theorem is proved on page 159. The special case in which ȳ = 0 is known as Rolle’s theorem.

Example 56
. Verify the mean value theorem for $y = x^2$ on the interval from 0 to 1.
. The mean value is 1/3, as shown in example 55. This value is achieved at $x = \sqrt{1/3} = 1/\sqrt{3}$, which lies between 0 and 1.

Work
In physics, work is a measure of the amount of energy transferred by a force; for example, if a horse sets a wagon in motion, the horse’s force on the wagon is putting some energy of motion into the wagon. When a force F acts on an object that moves in the direction of the force by an infinitesimal distance dx, the infinitesimal work done is dW = F dx. Integrating both sides, we have $W = \int_a^b F\,dx$, where the force may depend on x, and a and b represent the initial and final positions of the object.

Example 57
. A spring compressed by an amount x relative to its relaxed length provides a force F = kx. Find the amount of work that must be done in order to compress the spring from x = 0 to x = a. (This is the amount of energy stored in the spring, and that energy will later be released into the toy bullet.)
.
$$W = \int_0^a F\,dx = \int_0^a kx\,dx = \frac{1}{2}kx^2\Big|_0^a = \frac{1}{2}ka^2$$
The reason W grows like $a^2$, not just like a, is that as the spring is compressed more, more and more effort is required in order to compress it.
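The average-value and work integrals above can be checked numerically. The following Python sketch is only an illustration: the helper `midpoint_integral` and the spring values k = 3 and a = 0.2 are assumptions invented for the check, not quantities from the text.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Approximate the definite integral of f from a to b with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def average_value(f, a, b):
    """Average of f on [a, b]: the integral divided by the width b - a."""
    return midpoint_integral(f, a, b) / (b - a)

avg = average_value(lambda x: x**2, 0.0, 1.0)  # example 55: expect about 1/3
xi = avg ** 0.5                                # where x^2 equals its average (example 56)

k, a = 3.0, 0.2                                # hypothetical spring constant and compression
work = midpoint_integral(lambda x: k * x, 0.0, a)  # example 57: expect k*a^2/2
print(avg, xi, work)
```

The point ξ found this way lies inside the interval from 0 to 1, as the mean value theorem requires.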
Probability
Mathematically, the probability that something will happen can be specified with a number ranging from 0 to 1, with 0 representing impossibility and 1 representing certainty. If you flip a coin, heads and tails both have probabilities of 1/2. The sum of the probabilities of all the possible outcomes has to equal 1. This is called normalization.
e / Normalization:
probability of picking
land plus the probability
of picking water adds up
to 1.
So far we’ve discussed random processes having only two possible outcomes: yes or no, win or lose, on or off. More generally, a random process could have a result that is a number. Some processes yield integers, as when you roll a die and get a result from one to six, but some are not restricted to whole numbers, e.g., the height of a human being, or the amount of time that a uranium-238 atom will exist before undergoing radioactive decay. The key to handling these continuous random variables is the concept of the area under a curve, i.e., an integral.
f / Probability distribution for the result of rolling a single die.
Consider a throw of a die. If the die is “honest,” then we expect all six values to be equally likely. Since all six probabilities must add up to 1, the probability of any particular value coming up must be 1/6. We can summarize this in a graph, f. Areas under the curve can be interpreted as total probabilities. For instance, the area under the curve from 1 to 3 is 1/6+1/6+1/6 = 1/2, so the probability of getting a result from 1 to 3 is 1/2. The function shown on the graph is called the probability distribution.
g / Rolling two dice and adding them up.
Figure g shows the probabilities of various results obtained by rolling two dice and adding them together, as in the game of craps. The probabilities are not all the same. There is a small probability of getting a two, for example, because there is only one way to do it, by rolling a one and then another one. The probability of rolling a seven is high because there are six different ways to do it: 1+6, 2+5, etc.
If the number of possible outcomes is large but finite, for example the number of hairs on a dog, the graph would start to look like a smooth curve rather than a ziggurat.
What about probability distributions for random numbers that are
not integers? We can no longer
make a graph with probability on
the y axis, because the probability of getting a given exact number is typically zero. For instance, there is zero probability that a person will be exactly 200 cm tall, since there are infinitely many possible results that are close to 200 but not exactly 200, for example 199.99999999687687658766. It doesn’t usually make sense, therefore, to talk about the probability
of a single numerical result, but it
does make sense to talk about the
probability of a certain range of results. For instance, the probability
that a randomly chosen person will
be more than 170 cm and less than
200 cm in height is a perfectly reasonable thing to discuss. We can
still summarize the probability information on a graph, and we can
still interpret areas under the curve
as probabilities.
But the y axis can no longer be a unitless probability scale. In the example of human height, we want the x axis to have units of centimeters, and we want areas under the curve to be unitless probabilities. The area of a single square on the graph paper is then
(unitless area of a square) = (width of square with distance units) × (height of square)
If the units are to cancel out, then the height of the square must evidently be a quantity with units of inverse centimeters. In other words, the y axis of the graph is to be interpreted as probability per unit height, not probability.
Another way of looking at it is that the y axis on the graph gives a derivative, dP/dx, where dP is the infinitesimally small probability that x will lie in the infinitesimally small range covered by dx.
Example 58
. A computer language will typically have a built-in subroutine that produces a fairly random number that is equally likely to take on any value in the range from 0 to 1. If you take the absolute value of the difference between two such numbers, the probability distribution is of the form dP/dx = k(1 − x). Find the value of the constant k that is required by normalization.
.
$$1 = \int_0^1 k(1-x)\,dx = \left(kx - \frac{1}{2}kx^2\right)\Big|_0^1 = k - k/2$$
$$k = 2$$
h / A probability distribution for human height.
Compare the number of people with heights in the range of 130-135 cm to the number in the range 135-140. . Answer, p. 163
j / Example 59.
i / The average can be interpreted as the balance point of the probability distribution.
When one random variable is related to another in some mathematical way, the chain rule can be used to relate their probability distributions.

Example 59
. A laser is placed one meter away from a wall, and spun on the ground to give it a random direction, but if the angle u shown in figure j doesn’t come out in the range from 0 to π/2, the laser is spun again until an angle in the desired range is obtained. Find the probability distribution of the distance x shown in the figure. The derivative $d\tan^{-1}z/dz = 1/(1+z^2)$ will be required (see example 65).
. Since any angle between 0 and π/2 is equally likely, the probability distribution dP/du must be a constant, and normalization tells us that the constant must be dP/du = 2/π.
The laser is one meter from the wall, so the distance x, measured in meters, is given by x = tan u. For the probability distribution of x, we have
$$\frac{dP}{dx} = \frac{dP}{du}\cdot\frac{du}{dx} = \frac{2}{\pi}\cdot\frac{d\tan^{-1}x}{dx} = \frac{2}{\pi(1+x^2)}$$
Note that the range of possible values of x theoretically extends from 0 to infinity. Problem 7 on page 102 deals with this.
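A quick simulation can make the result of example 59 plausible. This Python sketch is an illustration under stated assumptions: the seed, the sample size, and the helper name `prob_less_than` are choices made here. It draws random angles, converts them to distances x = tan u, and compares the fraction landing within one meter with the integral of 2/(π(1 + x²)) from 0 to 1, which is (2/π) tan⁻¹(1) = 1/2.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
angles = [rng.uniform(0, math.pi / 2) for _ in range(100_000)]
distances = [math.tan(u) for u in angles]

def prob_less_than(x):
    """Integral of dP/dx = 2/(pi*(1+x^2)) from 0 to x."""
    return (2 / math.pi) * math.atan(x)

frac = sum(1 for x in distances if x < 1.0) / len(distances)
print(frac, prob_less_than(1.0))
```

The simulated fraction should land close to the predicted probability, with a scatter that shrinks as the sample size grows.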
If the next Martian you meet asks
you, “How tall is an adult human?,” you will probably reply
with a statement about the average
human height, such as “Oh, about
5 feet 6 inches.” If you wanted to
explain a little more, you could say,
“But that’s only an average. Most
people are somewhere between 5
feet and 6 feet tall.” Without
bothering to draw the relevant bell
curve for your new extraterrestrial
acquaintance, you’ve summarized
the relevant information by giving
an average and a typical range of
variation. The average of a probability distribution can be defined geometrically as the horizontal position at which it could be balanced if it were constructed out of cardboard, figure i. This is a different way of working with averages than the one we used earlier. Before, when we had a graph of y versus x, we implicitly assumed that all values of x
were equally likely, and we found
an average value of y. In this new
method using probability distributions, the variable we’re averaging
is on the x axis, and the y axis tells
us the relative probabilities of the
various x values.
For a discrete-valued variable with n possible values, the average would be
$$\bar{x} = \sum x\,P(x)$$
and in the case of a continuous variable, this becomes an integral,
$$\bar{x} = \int_a^b x\,\frac{dP}{dx}\,dx$$

Example 60
. For the situation described in example 58, find the average value of x.
.
$$\bar{x} = \int_0^1 x\cdot 2(1-x)\,dx = 2\int_0^1 (x - x^2)\,dx = 2\left(\frac{1}{2}x^2 - \frac{1}{3}x^3\right)\Big|_0^1 = \frac{1}{3}$$

Sometimes we don’t just want to know the average value of a certain variable, we also want to have some idea of the amount of variation above and below the average. The most common way of measuring this is the standard deviation, defined by
$$\sigma = \sqrt{\int_a^b (x-\bar{x})^2\,\frac{dP}{dx}\,dx}$$
The idea here is that if there was no variation at all above or below the average, then the quantity (x − x̄) would be zero whenever dP/dx was nonzero, and the standard deviation would be zero. The reason for taking the square root of the whole thing is so that the result will have the same units as x.

Example 61
. For the situation described in example 58, find the standard deviation of x.
. The square of the standard deviation is
$$\sigma^2 = \int_0^1 (x-\bar{x})^2\,\frac{dP}{dx}\,dx = \int_0^1 (x - 1/3)^2\cdot 2(1-x)\,dx = 2\int_0^1 \left(-x^3 + \frac{5}{3}x^2 - \frac{7}{9}x + \frac{1}{9}\right)dx = \frac{1}{18}$$
so the standard deviation is
$$\sigma = \frac{1}{\sqrt{18}} \approx 0.236$$
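Examples 58, 60, and 61 can be checked by simulating the random process they describe. The Python sketch below is a rough illustration; the seed and sample size are arbitrary choices made here. It takes the absolute difference of two uniform random numbers many times and compares the sample mean and standard deviation with 1/3 and 1/√18.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 200_000
samples = [abs(rng.random() - rng.random()) for _ in range(n)]

mean = sum(samples) / n
variance = sum((x - mean) ** 2 for x in samples) / n
sigma = math.sqrt(variance)
print(mean, sigma)  # compare with 1/3 and 1/sqrt(18)
```

Agreement to two or three decimal places is about what a sample of this size can deliver.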
Problems
1 Write a computer program similar to the one in example 52 on page 72 to evaluate the definite integral
$$\int^{1} \cdots$$
. Solution, p. 186
2 Evaluate the integral
$$\int_0^{2\pi} \sin x\,dx$$
and draw a sketch to explain why your result comes out the way it does.
. Solution, p. 186
3 Sketch the graph that represents the definite integral
$$\int_0^2 (-x^2 + 2x)\,dx$$
and estimate the result roughly from the graph. Then evaluate the integral exactly, and check against your estimate.
. Solution, p. 187
4 Make a rough guess as to the average value of sin x for 0 < x < π, and then find the exact result and check it against your guess.
. Solution, p. 188
5 Show that the mean value theorem’s assumption of continuity is necessary, by exhibiting a discontinuous function for which the theorem fails.
. Solution, p. 188
6 Show that the fundamental theorem of calculus’s assumption of continuity for ẋ is necessary, by exhibiting a discontinuous function for which the theorem fails.
. Solution, p. 188
7 Sketch the graphs of $y = x^2$ and $y = \sqrt{x}$ for 0 ≤ x ≤ 1. Graphically, what relationship should exist between the integrals $\int_0^1 x^2\,dx$ and $\int_0^1 \sqrt{x}\,dx$? Compute both integrals, and verify that the results are related in the expected way.
8 Evaluate $\int \sqrt{bx\sqrt{x}}\,dx$, where b is a constant.
. Solution, p. 188
9 In a gasoline-burning car engine, the exploding air-gas mixture makes a force on the piston, and the force tapers off as the piston expands, allowing the gas to expand. (a) In the approximation F = k/x, where x is the position of the piston, find the work done on the piston as it travels from x = a to x = b, and show that the result only depends on the ratio b/a. This ratio is known as the compression ratio of the engine. (b) A better approximation, which takes into account the cooling of the air-gas mixture as it expands, is $F = kx^{-1.4}$. Compute the work done in this case.
Problem 9.
10 A certain variable x varies randomly from -1 to 1, with probability distribution $dP/dx = k(1 - x^2)$.
(a) Determine k from the requirement of normalization.
(b) Find the average value of x.
(c) Find its standard deviation.
11 Suppose that we’ve already established that the derivative of an odd function is even, and vice versa. (See problem 30, p. 50.) Something similar can be proved for integration. However, the following is not quite right.
Let f be even, and let $g = \int f(x)\,dx$ be its indefinite integral. Then by the fundamental theorem of calculus, f is the derivative of g. Since we’ve already established that the derivative of an odd function is even, we conclude that g is odd.
Find all errors in the proof.
. Solution, p. 188
12 A perfectly elastic ball bounces up and down forever, always coming back up to the same height h. Find its average height.
Problem 13.
13 The figure shows a curve with a tangent line segment of length 1 that sweeps around it, forming a new curve that is usually outside the old one. Prove Holditch’s theorem, which states that the new curve’s area differs from the old one’s by π. (This is an example of a result that is much more difficult to prove without making use of infinitesimals.)
5 Techniques
5.1 Newton’s method
In the 1958 science fiction novel
Have Space Suit — Will
Travel, by Robert Heinlein, Kip
is a high school student who wants
to be an engineer, and his father is
trying to convince him to stretch
himself more if he wants to get anything out of his education:
“Why did Van Buren fail of reelection? How do you extract the
cube root of eighty-seven?”
Van Buren had been a president;
that was all I remembered. But I
could answer the other one. “If
you want a cube root, you look in
a table in the back of the book.”
Dad sighed. “Kip, do you think that table was brought down from on high by an archangel?”
We no longer use tables to compute roots, but how does a pocket calculator do it? A technique called Newton’s method allows us to calculate the inverse of any function efficiently, including cases that aren’t preprogrammed into a calculator. In the example from the novel, we know how to calculate the function $y = x^3$ fairly accurately and quickly for any given value of x, but we want to turn the equation around and find x when y = 87. We start with a rough mental guess: since $4^3 = 64$ is a little too small, and $5^3 = 125$ is much too big, we guess x ≈ 4.3. Testing our guess, we have $4.3^3 = 79.5$. We want y to get bigger by 7.5, and we can use calculus to find approximately how much bigger x needs to get in order to accomplish that:
$$\Delta x \approx \frac{\Delta y}{dy/dx} = \frac{\Delta y}{3x^2} = \frac{7.5}{3(4.3)^2} = 0.14$$
Increasing our value of x to 4.3 + 0.14 = 4.44, we find that $4.44^3 = 87.5$ is a pretty good approximation to 87. If we need higher precision, we can go through the process again with ∆y = −0.5, giving
$$\Delta x \approx \frac{\Delta y}{3x^2} = -0.01$$
$$x = 4.43$$
$$x^3 = 86.9$$
This second iteration gives an excellent approximation.

Example 62
a / Example 62.
. Figure a shows the astronomer Johannes Kepler’s analysis of the motion of the planets. The ellipse is the orbit of the planet around the sun. At t = 0, the planet is at its closest approach to the sun, A. At some later time, the planet is at point B. The angle x (measured in radians) is defined with reference to the imaginary circle encompassing the orbit. Kepler found the equation
$$2\pi\frac{t}{T} = x - e\sin x$$
where the period, T, is the time required for the planet to complete a full orbit, and the eccentricity of the ellipse, e, is a number that measures how much it differs from a circle. The relationship is complicated because the planet speeds up as it falls inward toward the sun, and slows down again as it swings back away from it.
The planet Mercury has e = 0.206. Find the angle x when Mercury has completed 1/4 of a period.
. We have
$$y = x - (0.206)\sin x$$
and we want to find x when y = 2π/4 = 1.57. As a first guess, we try x = π/2 (90 degrees), since the eccentricity of Mercury’s orbit is actually much smaller than the example shown in the figure, and therefore the planet’s speed doesn’t vary all that much as it goes around the sun. For this value of x we have y = 1.36, which is too small by 0.21.
$$\Delta x \approx \frac{\Delta y}{dy/dx} = \frac{0.21}{1 - (0.206)\cos x} = 0.21$$
(The derivative dy/dx happens to be 1 at x = π/2.) This gives a new value of x, 1.57+.21=1.78. Testing it, we have y = 1.58, which is correct to within rounding errors after only one iteration. (We were only supplied with a value of e accurate to three significant figures, so we can’t get a result with precision better than about that.)
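The hand iterations above follow a pattern that is easy to automate. Here is a minimal Python sketch of Newton’s method for inverting a function; the function name `newton_inverse`, its tolerance, and its iteration cap are assumptions made for this illustration. It repeats the step Δx ≈ Δy/(dy/dx) until y is close enough to the target.

```python
import math

def newton_inverse(f, dfdx, y_target, x, tol=1e-12, max_iter=50):
    """Find x such that f(x) = y_target, starting from the guess x."""
    for _ in range(max_iter):
        dy = y_target - f(x)   # how much y still needs to change
        if abs(dy) < tol:
            break
        x += dy / dfdx(x)      # the Newton step: dx = dy / (dy/dx)
    return x

# Cube root of 87, starting from the mental guess 4.3.
cube_root = newton_inverse(lambda x: x**3, lambda x: 3 * x**2, 87.0, 4.3)

# Kepler's equation for Mercury (e = 0.206) at a quarter period.
e = 0.206
angle = newton_inverse(lambda x: x - e * math.sin(x),
                       lambda x: 1 - e * math.cos(x),
                       2 * math.pi / 4, math.pi / 2)
print(cube_root, angle)
```

Because the error roughly squares with each step, a handful of iterations is usually enough when the starting guess is reasonable.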
5.2 Implicit differentiation
We can differentiate any function that is written as a formula, and find a result in terms of a formula. However, sometimes the original problem can’t be written in any nice way as a formula. For example, suppose we want to find dy/dx in a case where the relationship between x and y is given by the following equation:
$$y^7 + y = x^7 + x^2$$
There is no equivalent of the quadratic formula for seventh-order polynomials, so we have no way to solve for one variable in terms of the other in order to differentiate it. However, we can still find dy/dx in terms of x and y. Suppose we let x grow to x + dx. Then for example the $x^2$ term will grow to $(x + dx)^2 = x^2 + 2x\,dx + dx^2$. The squared infinitesimal is negligible, so the increase in $x^2$ was really just 2x dx, and we’ve really just computed the derivative of $x^2$ with respect to x and multiplied it by dx. In symbols,
$$d(x^2) = \frac{d(x^2)}{dx}\cdot dx = 2x\,dx$$
That is, the change in $x^2$ is 2x times the change in x. Doing this to both sides of the original equation, we have
$$d(y^7 + y) = d(x^7 + x^2)$$
$$7y^6\,dy + 1\,dy = 7x^6\,dx + 2x\,dx$$
$$(7y^6 + 1)\,dy = (7x^6 + 2x)\,dx$$
$$\frac{dy}{dx} = \frac{7x^6 + 2x}{7y^6 + 1}$$
This still doesn’t give us a formula for the derivative in terms of x alone, but it’s not entirely useless. For instance, if we’re given a numerical value of x, we can always use Newton’s method to find y, and then evaluate the derivative.

5.3 Methods of integration
Change of variable
Sometimes an unfamiliar-looking integral can be made into a familiar one by substituting a new variable for an old one. For example, we know how to integrate 1/x (the answer is ln x), but what about
$$\int \frac{dx}{2x+1}\ ?$$
Let u = 2x + 1. Differentiating both sides, we have du = 2dx, or dx = du/2, so
$$\int \frac{dx}{2x+1} = \int \frac{du/2}{u} = \frac{1}{2}\ln u + c = \frac{1}{2}\ln(2x+1) + c$$
This technique is known as a change of variable or a substitution. (Because the letter u is often employed, you may also see it called u-substitution.)
In the case of a definite integral, we have to remember to change the limits of integration to reflect the new variable.

Example 63
. Evaluate $\int_3^4 dx/(2x + 1)$.
. As before, let u = 2x + 1.
$$\int_{x=3}^{x=4} \frac{dx}{2x+1} = \int_{u=7}^{u=9} \frac{du/2}{u} = \frac{1}{2}\ln u\,\Big|_{u=7}^{u=9}$$
Here the notation $|_{u=7}^{u=9}$ means to evaluate the function at 7 and 9, and subtract the former from the latter. The result is
$$\int_3^4 \frac{dx}{2x+1} = \frac{1}{2}(\ln 9 - \ln 7) = \frac{1}{2}\ln\frac{9}{7}$$

Sometimes, as in the next example, a clever substitution is the secret to doing a seemingly impossible integral.

Example 64
. Evaluate
$$\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx$$
. The only hope for reducing this to a form we can do is to let $u = \sqrt{x}$. Then $dx = d(u^2) = 2u\,du$, so
$$\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx = \int \frac{e^u}{u}\cdot 2u\,du = 2\int e^u\,du = 2e^u = 2e^{\sqrt{x}}$$

Example 64 really isn’t so tricky, since there was only one logical choice for the substitution that had any hope of working. The following is a little more dastardly.

Example 65
. Evaluate
$$\int \frac{dx}{1+x^2}$$
. The substitution that works is x = tan u. First let’s see what this does to the expression $1 + x^2$. The familiar identity
$$\sin^2 u + \cos^2 u = 1$$
when divided by $\cos^2 u$, gives
$$\tan^2 u + 1 = \sec^2 u$$
so $1 + x^2$ becomes $\sec^2 u$. But differentiating both sides of x = tan u gives
$$dx = d\left[\sin u\,(\cos u)^{-1}\right] = (d\sin u)(\cos u)^{-1} + (\sin u)\,d\left[(\cos u)^{-1}\right] = (1 + \tan^2 u)\,du = \sec^2 u\,du$$
so the integral becomes
$$\int \frac{dx}{1+x^2} = \int \frac{\sec^2 u\,du}{\sec^2 u} = u + c = \tan^{-1}x + c$$

What mere mortal would ever have suspected that the substitution x = tan u was the one that was needed in example 65? One possible answer is to give up and do the integral on a computer:
Integrate(x) 1/(1+x^2)
Another possible answer is that
you can usually smell the possibility of this type of substitution, involving a trig function,
when the thing to be integrated
contains something reminiscent of
the Pythagorean theorem, as suggested by figure b. The 1 + x2
looks like what you’d get if you
had a right triangle with legs 1 and
x, and were using the Pythagorean
theorem to find its hypotenuse.
b / The substitution x =
tan u.
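The closed forms found by substitution in examples 63 and 65 can be sanity-checked against a brute-force numerical integral. This Python sketch uses a hypothetical helper, `midpoint_integral`, and an arbitrary choice of the interval 0 to 1 for the second check; neither comes from the text.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the definite integral of f from a to b with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Example 63: integral of dx/(2x+1) from 3 to 4 versus (1/2) ln(9/7).
numeric_63 = midpoint_integral(lambda x: 1 / (2 * x + 1), 3, 4)
symbolic_63 = 0.5 * math.log(9 / 7)

# Example 65: integral of dx/(1+x^2) from 0 to 1 versus arctan(1) - arctan(0).
numeric_65 = midpoint_integral(lambda x: 1 / (1 + x * x), 0, 1)
symbolic_65 = math.atan(1) - math.atan(0)
print(numeric_63, symbolic_63, numeric_65, symbolic_65)
```

A check like this does not prove the substitution is right, but a mismatch would flag an algebra slip immediately.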
Example 66
. Evaluate $\int dx/\sqrt{1 - x^2}$.
. The $\sqrt{1 - x^2}$ looks like what you’d get if you had a right triangle with hypotenuse 1 and a leg of length x, and were using the Pythagorean theorem to find the other leg, as in figure c. This motivates us to try the substitution x = cos u, which gives dx = − sin u du and $\sqrt{1 - x^2} = \sqrt{1 - \cos^2 u} = \sin u$. The result is
$$\int \frac{dx}{\sqrt{1-x^2}} = \int \frac{-\sin u\,du}{\sin u} = -u + c = -\cos^{-1}x + c$$
c / The substitution x = cos u.
Integration by parts
Figure d shows a technique called integration by parts. If the integral $\int v\,du$ is easier than the integral $\int u\,dv$, then we can calculate the easier one, and then by simple geometry determine the one we wanted. Identifying the large rectangle that surrounds both shaded areas, and the small white rectangle on the lower left, we have
$$\int u\,dv = (\text{area of large rectangle}) - (\text{area of small rectangle}) - \int v\,du$$
d / Integration by parts.
In the case of an indefinite integral, we have a similar relationship derived from the product rule:
$$d(uv) = u\,dv + v\,du$$
$$u\,dv = d(uv) - v\,du$$
Integrating both sides, we have the following relation.
Integration by parts
$$\int u\,dv = uv - \int v\,du$$
Since a definite integral can always be done by evaluating an indefinite integral at its upper and lower limits, one usually uses this form. Integrals don’t usually come prepackaged in a form that makes it obvious that you should use integration by parts. What the equation for integration by parts tells us is that if we can split up the integrand into two factors, one of which (the dv) we know how to integrate, we have the option of changing the integral into a new form in which that factor becomes its integral, and the other factor becomes its derivative. If we choose the right way of splitting up the integrand into parts, the result can be a simplification.

Example 67
. Evaluate
$$\int x\cos x\,dx$$
. There are two obvious possibilities for splitting up the integrand into factors,
$$u\,dv = (x)(\cos x\,dx)$$
or
$$u\,dv = (\cos x)(x\,dx)$$
The first one is the one that lets us make progress. If u = x, then du = dx, and if dv = cos x dx, then integration gives v = sin x.
$$\int x\cos x\,dx = \int u\,dv = uv - \int v\,du = x\sin x - \int \sin x\,dx = x\sin x + \cos x$$
Of the two possibilities we considered for u and dv, the reason this one helped was that differentiating x gave dx, which was simpler, and integrating cos x dx gave sin x, which was no more complicated than before. The second possibility would have made things worse rather than better, because integrating x dx would have given $x^2/2$, which would have been more complicated rather than less.

Example 68
. Evaluate
$$\int \ln x\,dx$$
. This one is a little tricky, because it isn’t explicitly written as a product, and yet we can attack it using integration by parts. Let u = ln x and dv = dx.
$$\int \ln x\,dx = \int u\,dv = uv - \int v\,du = x\ln x - \int x\,\frac{dx}{x} = x\ln x - x$$

Example 69
. Evaluate
$$\int x^2 e^x\,dx$$
. Integration by parts lets us split the integrand into two factors, integrate one, differentiate the other, and then do that integral. Integrating or differentiating $e^x$ does nothing. Integrating $x^2$ increases the exponent, which makes the problem look harder, whereas differentiating $x^2$ knocks the exponent down a step, which makes it look easier. Let $u = x^2$ and $dv = e^x dx$, so that du = 2x dx and $v = e^x$. We then have
$$\int x^2 e^x\,dx = x^2 e^x - 2\int x e^x\,dx$$
Although we don’t immediately know how to evaluate this new integral, we can subject it to the same type of integration by parts, now with u = x and $dv = e^x dx$. After the second integration by parts, we have:
$$\int x^2 e^x\,dx = x^2 e^x - 2\left(x e^x - \int e^x\,dx\right) = x^2 e^x - 2\left(x e^x - e^x\right) = (x^2 - 2x + 2)e^x$$

Partial fractions
Given a function like
$$\frac{-1}{x-1} + \frac{1}{x+1}$$
we can rewrite it over a common denominator like this:
$$\frac{-x - 1 + x - 1}{(x-1)(x+1)} = \frac{-2}{x^2-1}$$
But note that the original form is easily integrated to give
$$\int \left(\frac{-1}{x-1} + \frac{1}{x+1}\right)dx = -\ln(x-1) + \ln(x+1) + c$$
while faced with the form $-2/(x^2 - 1)$, we wouldn’t have known how to integrate it.
Note that the original function was of the form (−1)/ . . . + (+1)/ . . . It’s not a coincidence that the two constants on top, −1 and +1, are opposite in sign but equal in absolute value. To see why, consider the behavior of this function for large values of x. Looking at the form −1/(x − 1) + 1/(x + 1), we might naively guess that for a large value of x such as 1000, it would come out to be somewhere on the order of thousandths. But looking at the form $-2/(x^2 - 1)$, we would expect it to be way down in the millionths. This seeming paradox is resolved by noting that for large values of x, the two terms in the form −1/(x − 1) + 1/(x + 1) very nearly cancel. This cancellation could only have happened if the constants on top were opposites like plus and minus one.
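The near-cancellation described above is easy to see with actual numbers. In this small Python sketch (the function names are made up for the illustration), each term is of order thousandths at x = 1000, yet their sum is of order millionths and matches the common-denominator form.

```python
def separate_terms(x):
    """The partial-fraction form: -1/(x-1) + 1/(x+1)."""
    return -1 / (x - 1) + 1 / (x + 1)

def combined(x):
    """The same function over a common denominator: -2/(x^2 - 1)."""
    return -2 / (x * x - 1)

x = 1000.0
first_term = -1 / (x - 1)
print(first_term, separate_terms(x), combined(x))
```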
The idea of the method of partial fractions is that if we want to do an integral of the form
$$\int \frac{dx}{P(x)}$$
where P(x) is an nth order polynomial, we rewrite 1/P as
$$\frac{1}{P(x)} = \frac{A_1}{x - r_1} + \ldots + \frac{A_n}{x - r_n}$$
where $r_1 \ldots r_n$ are the roots of the polynomial, i.e., the solutions of the equation P(r) = 0. If the polynomial is second-order, you can find the roots $r_1$ and $r_2$ using the quadratic formula; I’ll assume for the time being that they’re real. For higher-order polynomials, there is no surefire, easy way of finding the roots by hand, and you’d be smart simply to use computer software to do it. In Yacas, you can find the real roots of a polynomial like this:
(I assume it uses Newton’s method to find them.) The constants $A_i$ can then be determined by algebra, or by the following trick.

Numerical method
Suppose we evaluate 1/P(x) for a value of x very close to one of the roots. In the example of the polynomial $x^4 - 5x^3 - 25x^2 + 65x + 84$, let $r_1 \ldots r_4$ be the roots in the order in which they were returned by Yacas. Then $A_1$ can be found by evaluating 1/P(x) at x = 3.000001:
We know that for x very close to 3, the expression
$$\frac{1}{P} = \frac{A_1}{x-3} + \frac{A_2}{x-7} + \frac{A_3}{x+4} + \frac{A_4}{x+1}$$
will be dominated by the $A_1$ term, so
$$-8930 \approx \frac{A_1}{3.000001 - 3}$$
$$A_1 \approx (-8930)(10^{-6})$$
By the same method we can find the other three constants:
(The N( ,30) construct is to tell Yacas to do a numerical calculation rather than an exact symbolic one, and to use 30 digits of precision, in order to avoid problems with rounding errors.) Thus,
$$\frac{1}{P} = \frac{-8.93\times10^{-3}}{x-3} + \frac{2.84\times10^{-3}}{x-7} - \frac{4.33\times10^{-3}}{x+4} + \frac{1.04\times10^{-2}}{x+1}$$
The desired integral is
$$\int \frac{dx}{P(x)} = -8.93\times10^{-3}\ln(x-3) + 2.84\times10^{-3}\ln(x-7) - 4.33\times10^{-3}\ln(x+4) + 1.04\times10^{-2}\ln(x+1) + c$$
As in the simpler example I started off with, where P was second order and we got $A_1 = -A_2$, in this n = 4 example we expect that $A_1 + A_2 + A_3 + A_4 = 0$, for otherwise the large-x behavior of the partial-fraction form would be 1/x rather than $1/x^4$. This is a useful way of checking the result: −8.93 + 2.84 − 4.33 + 10.4 = −.02 ≈ 0.
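The numerical trick for the coefficients can be reproduced in a few lines of Python (a sketch, not the book’s Yacas session). The roots 3, 7, −4, and −1 are read off from the denominators x − 3, x − 7, x + 4, and x + 1 used in the text; the step ε = 10⁻⁶ matches the value 3.000001, and ordinary double precision is assumed to be adequate at this step size.

```python
def P(x):
    """The example polynomial x^4 - 5x^3 - 25x^2 + 65x + 84."""
    return x**4 - 5 * x**3 - 25 * x**2 + 65 * x + 84

roots = [3.0, 7.0, -4.0, -1.0]
eps = 1e-6

# Near a root r, 1/P(x) is dominated by A/(x - r), so A is about eps / P(r + eps).
coeffs = [eps / P(r + eps) for r in roots]
total = sum(coeffs)
print(coeffs, total)
```

The printed coefficients should reproduce the four values quoted above, and their sum should be close to zero, which is the check on large-x behavior described in the text.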
There are two possible complications:
First, the same factor may occur more than once, as in $x^3 - 5x^2 + 7x - 3 = (x - 1)(x - 1)(x - 3)$. In this example, we have to look for an answer of the form $A/(x - 1) + B/(x-1)^2 + C/(x-3)$, the solution being $-.25/(x - 1) - .5/(x - 1)^2 + .25/(x - 3)$.
Second, the roots may be complex. This is no show-stopper if you’re using computer software that handles complex numbers gracefully. (You can choose a c that makes the result real.) In fact, as discussed in section 8.3, some beautiful things can happen with complex roots. But as an alternative, any polynomial with real coefficients can be factored into linear and quadratic factors with real coefficients. For each quadratic factor Q(x), we then have a partial fraction of the form (A + Bx)/Q(x), where A and B can be determined by algebra. In Yacas, this can be done using the Apart function.

Example 70
. Evaluate the integral
$$\int \frac{dx}{x^4 - 8x^3 + 8x^2 - 8x + 7}$$
using the method of partial fractions.
. First we use Yacas to look for real roots of the polynomial:
Unfortunately this polynomial seems to have only two real roots; the rest are complex. We can divide out the factor (x − 1)(x − 7), but that still leaves us with a second-order polynomial, which has no real roots. One approach would be to factor the polynomial into the form (x − 1)(x − 7)(x − p)(x − q), where p and q are complex, as in section 8.3. Instead, let’s use Yacas to expand the integrand in terms of partial fractions:
We can now rewrite the integral like
x dx
x2 + 1
x2 + 1
x −7
x −1
things work under the hood, and to
avoid being completely dependent on
one particular piece of software. As
an illustration of this gem of wisdom,
I found that when I tried to make Yacas evaluate the integral in one gulp,
it choked because the calculation became too complicated! Because I understood the ideas behind the procedure, I was still able to get a result
through a mixture of computer calculations and working it by hand. Someone who didn’t have the knowledge of
the technique might have tried the integral using the software, seen it fail,
and concluded, incorrectly, that the integral was one that simply couldn’t be
done. A computer is no substitute for
Residue method
On p. 90 I introduced the trick of
carrying out the method of partial fractions by evaluating 1/P (x)
numerically at x = ri + , near
which we can evaluate as follows:
where 1/P blows up. Sometimes
we would like to have an exact re1
ln(x 2 + 1)
sult rather than a numerical ap25
proximation. We can accomplish
tan x
this by using an infinitesimal num50
ber dx rather than a small but filn(x − 7)
nite . For simplicity, let’s assume
that all of the n roots ri are dis−
ln(x − 1)
tinct, and that P ’s highest-order
term is xn . We can then write P
as the product P (x) = (x−r1 )(x−
In fact, Yacas should be able to do r2 ) . . . (x − rn ). For products like
the whole integral for us from scratch, this, there is a notation Π (capital
but it’s best to understand how these Greek letter “pi”) that works like
Σ does for sums:
P (x) =
(x − ri )
It’s not necessary that the roots be
real, but for now we assume that
they are. We want to find the coefficients Ai such that
X Ai
P (x)
x − ri
We then have
P (ri + dx)
dx j6=i (ri − rj + dx)
+ ...
dx j6=i (ri − rj )
+ ...,
was found numerically to be A1 ≈
−8.930 × 10−3 . Determine it exactly
using the residue method.
. Differentiation gives P 0 (x) = 4x 3 −
15x 2 − 50x + 65. We then have A1 =
1/P 0 (3) = −1/112.
Integrals that can’t be done
Integral calculus was invented in
the age of powdered wigs and harpsichords, so the original emphasis
was on expressing integrals in a
form that would allow numbers to
be plugged in for easy numerical
evaluation by scribbling on scraps
of parchment with a quill pen.
This was an era when you might
have to travel to a large city to get
access to a table of logarithms.
In this computationally impoverished environment, one always
where . . . represents finite terms wanted to get answers in what’s
that are negligible compared to the known as closed form and in terms
infinite ones. Multiplying on both of elementary functions.
sides by dx, we have
A closed form expression means
one written using a finite num+ . . . = Ai + . . . ,
ber of operations, as opposed to
P 0 (ri )
something like the geometric series
where the . . . now stand for in- 1 + x + x2 + x3 + . . ., which goes on
finitesimals which must in fact can- forever.
cel out, since both Ai and 1/P 0 are
Elementary functions are usually
real numbers.
taken to be addition, subtraction,
Example 71
multiplication, division, logs, and
. The partial-fraction decomposition
exponentials, as well as other funcof the function
tions derivable from these. For ex1
a cube root is allowed, since
x 4 − 5x 3 − 25x 2 + 65x + 84
x = e(1/3) ln x , and so are trig
was found numerically on p. 90. The functions and their inverses, since,
coefficient of the 1/(x − 3) term as we will see in chapter 8, they
can be expressed in terms of logs
and exponentials.
In theory, “closed form” doesn’t mean anything unless we state the elementary functions that are allowed. In practice, when people refer to closed form, they usually have in mind the particular set of elementary functions described above.
A traditional freshman calculus course spends such a vast amount of time teaching you how to do integrals in closed form that it may be easy to miss the fact that this is impossible for the vast majority of integrands that you might randomly write down. Here are some examples of impossible integrals:
$\int e^{-x^2}\,dx$
$\int x^x\,dx$
$\int \frac{\sin x}{x}\,dx$
$\int e^x \tan x\,dx$
The first of these is a form that is extremely important in statistics (it describes the area under the standard “bell curve”), so you can see that impossible integrals aren’t just obscure things that don’t pop up in real life.
People who are proficient at doing integrals in closed form generally seem to work by a process of pattern matching. They recognize certain integrals as being of a form that can’t be done, so they know not to try.
Example 72
. Students! Stand at attention! You will now evaluate $\int e^{-x^2+7x}\,dx$ in closed form.
. No sir, I can’t do that. By a change of variables of the form u = x + c, where c is a constant, we could clearly put this into the form $\int e^{-x^2}\,dx$, which we know is impossible.
Sometimes an integral such as $\int e^{-x^2}\,dx$ is important enough that we want to give it a name, tabulate it, and write computer subroutines that can evaluate it numerically. For example, statisticians define the “error function” $\mathrm{erf}(x) = (2/\sqrt{\pi}) \int e^{-x^2}\,dx$. Sometimes if you’re not sure whether an integral can be done in closed form, you can put it into computer software, which will tell you that it reduces to one of these functions. You then know that it can’t be done in closed form. For example, if you ask the popular web site to do $\int e^{-x^2+7x}\,dx$, it spits back $(1/2)e^{49/4}\sqrt{\pi}\,\mathrm{erf}(x - 7/2)$. This tells you both that you shouldn’t be wasting your time trying to do the integral in closed form and that if you need to evaluate it numerically, you can do that using the erf function.
As shown in the following example, just because an indefinite integral can’t be done, that doesn’t mean that we can never do a related definite integral.
Example 73
. Evaluate $\int_0^{\pi/2} e^{-\tan^2 x}(\tan^2 x + 1)\,dx$.
. The obvious substitution to try is u = tan x, and this reduces the integrand to $e^{-u^2}$. This proves that the corresponding indefinite integral is impossible to express in closed form. However, the definite integral can be expressed in closed form; it turns out to be $\sqrt{\pi}/2$. The trick for proving this is given in example 98 on p. 132.
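The closed-form value of the definite integral can be checked by brute force. The following is a sketch only, in Python, using a composite midpoint rule with an arbitrarily chosen 100,000 subintervals; the function names integrand and midpoint_rule are illustrative, and nothing here is the method of example 98.

```python
import math

def integrand(x):
    # e^{-tan^2 x} (tan^2 x + 1), the integrand of the definite integral above
    t = math.tan(x)
    return math.exp(-t * t) * (t * t + 1.0)

def midpoint_rule(f, a, b, n):
    # Composite midpoint rule; f is never evaluated at the endpoints,
    # so tan(pi/2) is avoided.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

approx = midpoint_rule(integrand, 0.0, math.pi / 2, 100_000)
exact = math.sqrt(math.pi) / 2
print(approx, exact)
```

The two printed numbers should agree to many decimal places, even though no antiderivative in closed form exists.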
Sometimes computer software can’t say anything about a particular integral at all. That doesn’t mean that the integral can’t be done. Computers are stupid, and they may try brute-force techniques that fail because the computer runs out of memory or CPU time. For example, the integral $\int dx/(x^{10000} - 1)$ (problem 15, p. 125) can be done in closed form using the techniques of chapter 8, and it’s not too hard for a proficient human to figure out how to attack it, but every computer program I’ve tried it on has failed silently.
1 Graph the function y = ex − 6
7x and get an approximate idea of
where any of its zeroes are (i.e., for
what values of x we have y(x) = 0).
Use Newton’s method to find the
zeroes to three significant figures of 7
x a − x dx
Z p
x4 + bx2 dx
2 The relationship between x and
y is given by xy = sin y + x2 y 2 .
where b is a constant.
(a) Use Newton’s method to find
the nonzero solution for y when 8 Evaluate
x = 3. Answer: y = 0.2231
xe−x dx
(b) Find dy/dx in terms of x and
y, and evaluate the derivative at
the point on the curve you found in
part a. Answer: dy/dx = −0.0379 9 Evaluate
Based on an example by Craig B.
3 Suppose you want to evaluate
1 + sin 2x
and you’ve found
π x
= − tan
1 + sin x
in a table of integrals. Use a
change of variable to find the answer to the original problem.
xex dx
x2 sin x dx
Hint: Use integration by parts
more than once.
5 Evaluate
sin xdx
1 + cos2 x
10 Use integration by parts to
evaluate the following integrals.
sin−1 x dx
cos−1 x dx
tan−1 x dx
4 Evaluate
sin xdx
1 + cos x
x2 − x − 6
also can’t be done in closed form.
x + 3x2 − 4
Consider the integral
ex dx
where p is a constant. There is an
obvious substitution. If this is to
result in an integral that can be
x3 − x2 + 4x − 4
evaluated in closed form by a series of integrations by parts, what
are the possible values of p? Don’t
15 Apply integration by parts actually complete the integral; just
twice to
determine what values of p will
. Solution, p. 189
e−x cos x dx
Evaluate the hundredth
derivative of the function
examine what happens, and ma(x2 + 1)/(x3 − x) using paper and
nipulate the result in order to solve
pencil. [Vladimir Arnol’d]
the original integral. (An approach
. Solution, p. 189 ?
that doesn’t rely on tricks is given
in example 90 on p. 121.)
Plan, but do not actually
carry out the steps that would be
required in order to generalize the
result of example 69 on p. 89 in order to evaluate
xa b−x dx
where a and b are constants.
Which is easier, the generalization
from 2 to a, or the one from e to
b? Do we need to introduce any restrictions on a or b?
. Solution, p. 189
17 The integral e−x dx can’t
be done in closed form. Knowing
this, use a change of variable to
write down a different integral that
6 Improper integrals
6.1 Integrating a function that blows up
When we integrate a function that blows up to infinity at some point in the interval we’re integrating, the result may be either finite or infinite.
Example 74
. Integrate the function $y = 1/\sqrt{x}$ from x = 0 to x = 1.
. The function blows up to infinity at one end of the region of integration, but let’s just try evaluating it, and see what happens.
$\int_0^1 x^{-1/2}\,dx = 2x^{1/2}\Big|_0^1 = 2$
The result turns out to be finite. Intuitively, the reason for this is that the spike at x = 0 is very skinny, and gets skinny fast as we go higher and higher.
a / The integral $\int_0^1 dx/\sqrt{x}$ is finite.
Example 75
. Integrate the function $y = 1/x^2$ from x = 0 to x = 1.
. $\int_0^1 x^{-2}\,dx = -x^{-1}\Big|_0^1 = -1 + \frac{1}{0}$
Division by zero is undefined, so the result is undefined.
Another way of putting it, using the hyperreal number system, is that if we were to integrate from $\epsilon$ to 1, where $\epsilon$ was an infinitesimal number, then the result would be $-1 + 1/\epsilon$, which is infinite. The smaller we make $\epsilon$, the bigger the infinite result we get out.
Intuitively, the reason that this integral comes out infinite is that the spike at x = 0 is fat, and doesn’t get skinny fast enough.
b / The integral $\int_0^1 dx/x^2$ is infinite.
These two examples were examples of improper integrals.
6.2 Limits of integration at infinity
Another type of improper integral is one in which one of the limits of integration is infinite. The notation
$\int_a^\infty f(x)\,dx$
means the limit of $\int_a^H f(x)\,dx$, where H is made to grow bigger and bigger. Alternatively, we can think of it as an integral in which the top end of the interval of integration is an infinite hyperreal number. A similar interpretation applies when the lower limit is −∞, or when both limits are infinite.
Example 76
. Evaluate
$\int_1^\infty x^{-2}\,dx$
. $\int_1^H x^{-2}\,dx = -x^{-1}\Big|_1^H = -\frac{1}{H} + 1$
As H gets bigger and bigger, the result gets closer and closer to 1, so the result of the improper integral is 1.
Note that this is the same graph as in example 74, but with the x and y axes interchanged; this shows that the two different types of improper integrals really aren’t so different.
c / The integral $\int_1^\infty dx/x^2$ is finite.
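The three improper integrals in examples 74 through 76 can be watched converging or diverging by plugging finite cutoffs into their antiderivatives. This is a sketch in Python; the cutoff values and the function names are assumptions made for illustration, with a small real number standing in for the infinitesimal and a large one for the infinite limit.

```python
import math

def integral_inv_sqrt(eps):
    # Integral of x^(-1/2) from eps to 1, using the antiderivative 2*sqrt(x)
    return 2.0 - 2.0 * math.sqrt(eps)

def integral_inv_square(eps):
    # Integral of x^(-2) from eps to 1, using the antiderivative -1/x
    return -1.0 + 1.0 / eps

def integral_tail(H):
    # Integral of x^(-2) from 1 to H, using the antiderivative -1/x
    return 1.0 - 1.0 / H

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, integral_inv_sqrt(eps), integral_inv_square(eps))
for H in (1e2, 1e4, 1e6):
    print(H, integral_tail(H))
```

As the lower cutoff shrinks, the first column settles toward 2 while the second grows without bound; as H grows, the last integral settles toward 1.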
Example 77
. Newton’s law of gravity states that
the gravitational force between two
objects is given by F = Gm1 m2 /r 2 ,
where G is a constant, m1 and m2
are the objects’ masses, and r is
the center-to-center distance between
them. Compute the work that must be
done to take an object from the earth’s
surface, at r = a, and remove it to
r = ∞.
. $W = \int_a^\infty \frac{Gm_1m_2}{r^2}\,dr = Gm_1m_2 \int_a^\infty r^{-2}\,dr = -Gm_1m_2\,r^{-1}\Big|_a^\infty = \frac{Gm_1m_2}{a}$
The answer is inversely proportional
to a. In other words, if we were able to
start from higher up, less work would
have to be done.
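To get a feeling for the size of this result, one can plug in numbers for a 1 kg object leaving the earth. The sketch below is in Python, and the values of G, the earth’s mass, and the earth’s radius are standard reference figures that I am assuming; they do not come from the text. The function names are illustrative.

```python
G = 6.674e-11        # gravitational constant in N m^2/kg^2 (assumed standard value)
M_EARTH = 5.972e24   # mass of the earth in kg (assumed standard value)
R_EARTH = 6.371e6    # radius of the earth in m (assumed standard value)

def work_to_radius(m, a, H):
    # Work to move mass m from r = a out to r = H: G M m (1/a - 1/H)
    return G * M_EARTH * m * (1.0 / a - 1.0 / H)

def work_to_infinity(m, a):
    # Limit of the above as H grows without bound: G M m / a
    return G * M_EARTH * m / a

for H in (1e7, 1e9, 1e12):
    print(H, work_to_radius(1.0, R_EARTH, H))
print("limit:", work_to_infinity(1.0, R_EARTH))
print("from twice the radius:", work_to_infinity(1.0, 2 * R_EARTH))
```

The finite-H values creep up toward the limiting value, and starting from twice the radius halves the work, as expected from the inverse proportionality to a.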
1 Integrate
Z ∞
(b) Find the average value of x, or
show that it diverges.
(c) Find the standard deviation of
x, or show that it diverges.
e−x dx
or show that it diverges.
2 Integrate
Z ∞
or show that it diverges.
Z ∞
x2 2−x dx
or show that it diverges.
. Solution, p. 189
5 Integrate
Z ∞
or show that it diverges.
3 Integrate
Z ∞
e−x xn dx = n!
e−x cos x dx
or show that it diverges. (Problem
15 on p. 97 suggests a trick for doing the indefinite integral.)
6 Prove that
Z ∞
e−e dx
converges, but don’t evaluate it.
7 (a) Verify that the probability
distribution dP/dx given in example 59 on page 78 is properly normalized.
7 Sequences and Series
7.1 Infinite sequences
Consider an infinite sequence of
numbers like 1/2, 2/3, 3/4, 4/5,
. . . We want to define this as approaching 1, or “converging to 1.”
The way to do this is to make a
function f (n), which is only well
defined for integer values of n.
Then f (1) = 1/2, f (2) = 2/3, and
in general f (n) = n/(n + 1). With
just a little tinkering, our definitions of limits can be applied to
this type of function (see problem
1 on page 112).
7.2 Infinite series
A related question is how to rigorously define the sum of infinitely many numbers, which is referred to as an infinite series. An example is the geometric series $1 + x + x^2 + x^3 + \ldots = 1/(1 - x)$, which we used casually on page 29. The general concept of an infinite series goes back to ancient Greek mathematics. Various supposed paradoxes about infinite series, such as Zeno’s paradox, were exhibited, influencing Euclid to sidestep the issue in his Elements, where in Book IX, Proposition 35 he provides only an expression $(1 - x^n)/(1 - x)$ for the nth partial sum of the geometric series. The case where n gets so big that $x^n$ becomes negligible is left to the reader’s imagination, as in one of those scenes in a romance novel that ends with something like “...and she surrendered...” For those with modern training, the idea is that an infinite sum like 1 + 1 + 1 + . . . would clearly give an infinite result, but this is only because the terms are all staying the same size. If the terms get smaller and smaller, and get smaller fast enough, then the result can be finite. For example, consider the geometric series in the case where x = 1/2, for which we expect the result 1/(1 − 1/2) = 2. We have
$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \ldots$
which at the successive steps of addition equals $1$, $1\frac{1}{2}$, $1\frac{3}{4}$, $1\frac{7}{8}$, $1\frac{15}{16}$, . . . . We’re getting closer and closer to 2, cutting the distance in half at each step. Clearly we can get as close as we like to 2, if we’re willing to add enough terms.
Note that we ended up wanting to talk about the partial sums of the series. This is the right way to get a rigorous definition of the convergence of series in general. In the case of the geometric series, for example, we can define a sequence of the partial sums $1$, $1 + x$, $1 + x + x^2$, . . . We can then define convergence and limits of series in terms of convergence and limits of the partial sums.
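The partial sums for x = 1/2 can be generated exactly and compared with the expression $(1 - x^n)/(1 - x)$ for the nth partial sum. This is a minimal sketch in Python using exact fractions; the function name partial_sums and the choice of five terms are illustrative.

```python
from fractions import Fraction

def partial_sums(x, count):
    # Running totals 1, 1 + x, 1 + x + x^2, ... (count of them)
    total, term, sums = Fraction(0), Fraction(1), []
    for _ in range(count):
        total += term
        sums.append(total)
        term *= x
    return sums

x = Fraction(1, 2)
sums = partial_sums(x, 5)
closed = [(1 - x**n) / (1 - x) for n in range(1, 6)]  # nth partial sum formula
gaps = [2 - s for s in sums]                          # distance remaining to 2

print(sums)
print(closed)
print(gaps)
```

The list of gaps makes the halving of the remaining distance at each step explicit.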
It’s instructive to see what happens to the geometric series with x = 0.1. The geometric series becomes
$1 + 0.1 + 0.01 + 0.001 + \ldots$
The partial sums are 1, 1.1, 1.11, 1.111, . . . We can see vividly here that adding another term will only affect the result in a certain decimal place, without affecting any of the earlier ones. For instance, if we needed a result that was valid to three digits past the decimal place, we could stop at 1.111, being assured that we had attained a good enough approximation. If we wanted an exact result, we could also observe that multiplying the result by 9 would give 9.999 . . ., which is the same as 10, so the result must be 10/9, which is in agreement with 1/(1 − 1/10) = 10/9.
One thing to watch out for with infinite series is that the axioms of the real number system only talk about finite sums, so it’s easy to get wrong results by attempting to apply them to infinite ones (see problem 2 on page 112).
7.3 Tests for convergence
There are many different tests that can be used to determine whether a sequence or series converges. I’ll briefly state three of the most useful, with sketches of their proofs.
Bounded and increasing sequences: A sequence that always increases, but never surpasses a certain value, converges.
This amounts to a restatement of the compactness axiom for the real numbers stated on page 155, and is therefore to be interpreted not so much as a statement about sequences but as one about the real number system. In particular, it fails if interpreted as a statement about sequences confined entirely to the rational number system, as we can see from the sequence 1, 1.4, 1.41, 1.414, . . . consisting of the successive decimal approximations to $\sqrt{2}$, which does not exist in the rational number system.
Example 78
. Prove that the geometric series 1 + 1/2 + 1/4 + . . . converges.
. The sequence of partial sums is increasing, since each term is positive. Each term closes half of the remaining gap separating the previous partial sum from 2, so the sum never surpasses 2. Since the partial sums are increasing and bounded, they converge to a limit.
Once we know that a particular series converges, we can also easily infer the convergence of other series whose terms get smaller faster. For example, we can be certain that if the geometric series converges, so does the series
$\frac{1}{1} + \frac{1}{1 \times 2} + \frac{1}{1 \times 2 \times 3} + \ldots$
whose terms get smaller faster than any base raised to the power $n$.
Alternating series with terms approaching zero: If the terms of a series alternate in sign and approach zero, then the series converges.
Sketch of a proof: The even partial sums form an increasing sequence, the odd sums a decreasing one. Neither of these sequences of partial sums can be unbounded, since the difference between partial sums n and n + 1 would then have to be unbounded, but this difference is simply the nth term, and the terms approach zero. Since the even partial sums are increasing and bounded, they converge to a limit, and similarly for the odd ones. The two limits must be equal, since the terms approach zero.
Example 79
. Prove that the series 1 − 1/2 + 1/3 − 1/4 + . . . converges.
. Its convergence follows because it is an alternating series with decreasing terms. The sum turns out to be ln 2, although the convergence of the series is so slow that an extremely large number of terms is required in order to obtain a decent approximation.
The integral test: If the terms of a series $a_n$ are positive and decreasing, and f(x) is a positive and decreasing function on the real number line such that $f(n) = a_n$, then the sum of $a_n$ from n = 1 to ∞ converges if and only if $\int_1^\infty f(x)\,dx$ does.
Sketch of proof: Since the theorem is supposed to hold for both convergence and divergence, and is also an “if and only if,” there are actually four cases to prove, of which we pick the representative one where the integral is known to converge and we want to prove convergence of the corresponding sum. The sum and the integral can be interpreted as the areas under two graphs: one like a smooth ramp and one like a staircase. Sliding the staircase half a unit to the left, it lies entirely underneath the ramp, and therefore the area under it is also finite.
Example 80
. Prove that the series 1 + 1/2 + 1/3 + . . . diverges.
. The integral of 1/x is ln x, which diverges as x approaches infinity, so the series diverges as well.
The ratio test: If the limit $R = \lim_{n \to \infty} |a_{n+1}/a_n|$ exists, then the sum of $a_n$ converges if R < 1 and diverges if R > 1.
The proof can be obtained by comparing with a geometric series.
Example 81
. Prove that the series $1 + 1/2^2 + 1/3^3 + \ldots$ converges.
. R is easily proved to be 0, so the sum converges by the ratio test.
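The behavior claimed in examples 79 through 81 can be seen numerically. The following Python sketch is illustrative only (the function names and the particular values of n are my own choices): it tabulates the harmonic series against ln n, the alternating series against ln 2, and the ratio $|a_{n+1}/a_n|$ for $a_n = n^{-n}$.

```python
import math

def harmonic(n):
    # 1 + 1/2 + ... + 1/n
    return sum(1.0 / k for k in range(1, n + 1))

def alternating_harmonic(n):
    # 1 - 1/2 + 1/3 - ... with n terms
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    print(n, harmonic(n), math.log(n), alternating_harmonic(n) - math.log(2))

# Ratios |a_(n+1) / a_n| for the series with a_n = n^(-n)
ratios = [((n + 1) ** -(n + 1)) / (n ** -n) for n in (1, 5, 10, 20)]
print(ratios)
```

The harmonic sums keep pace with ln n and so grow without bound, the alternating sums approach ln 2 with an error that shrinks only about as fast as 1/n, and the ratios head toward zero.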
At this point it will seem like a mystery how anyone could have proved the exact results claimed for some of the “special” series, such as $1 - 1/2 + 1/3 - 1/4 + \ldots = \ln 2$. Problems like these are not the main focus of the chapter, and in fact there is no well-defined toolbox of techniques that will allow any such “nice” series to be evaluated exactly. Even a relatively innocent-looking example like $1^{-2} + 2^{-2} + 3^{-2} + \ldots$ defeated some of the best mathematicians of Europe for years (see problem 16, p. 114). It is currently unknown whether some apparently simple series such as $\sum_{n=1}^{\infty} 1/(n^3 \sin^2 n)$ converge.1
1 Alekseyev, “On convergence of the Flint Hills series,”
7.4 Taylor series
If you calculate $e^{0.1}$ on your calculator, you’ll find that it’s very close to 1.1. This is because the tangent line at x = 0 on the graph of $e^x$ has a slope of 1 ($de^x/dx = e^x = 1$ at x = 0), and the tangent line is a good approximation to the exponential curve as long as we don’t get too far away from the point of tangency.
a / The function $e^x$, and the tangent line at x = 0.
How big is the error? The actual value of $e^{0.1}$ is 1.10517091807565 . . ., which differs from 1.1 by about 0.005. If we go farther from the point of tangency, the approximation gets worse. At x = 0.2, the error is about 0.021, which is about four times bigger. In other words, doubling x seems to roughly quadruple the error, so the error is proportional to $x^2$; it seems to be about $x^2/2$. Well, if we want a handy-dandy, super-accurate estimate of $e^x$ for small values of x, why not just account for this error? Our new and improved estimate is
$e^x \approx 1 + x + \frac{1}{2}x^2$
for small values of x.
b / The function $e^x$, and the approximation $1 + x + x^2/2$.
Figure b shows that the approximation is now extremely good for sufficiently small values of x. The difference is that whereas $1 + x$ matched both the y-intercept and the slope of the curve, $1 + x + x^2/2$ matches the curvature as well. Recall that the second derivative is a measure of curvature. The second derivatives of the function and its approximation are
$\frac{d^2}{dx^2} e^x \Big|_{x=0} = 1$
$\frac{d^2}{dx^2}\left(1 + x + \frac{1}{2}x^2\right) = 1$
We can do even better. Suppose we want to match the third derivatives. All the derivatives of $e^x$, evaluated at x = 0, are 1, so we just need to add on a term proportional to $x^3$ whose third derivative is one. Taking the first derivative will bring down a factor of 3 in front, and taking the second derivative will give a 2, so to cancel these out we need the third-order term to be (1/2)(1/3):
$e^x \approx 1 + x + \frac{1}{2}x^2 + \frac{1}{2 \cdot 3}x^3$
Figure c shows the result. For a significant range of x values close to zero, the approximation is now so good that we can’t even see the difference between the two functions on the graph.
c / The function $e^x$, and the approximation $1 + x + x^2/2 + x^3/6$.
On the other hand, figure d shows that the cubic approximation for somewhat larger negative and positive values of x is poor: worse, in fact, than the linear approximation, or even the constant approximation $e^x = 1$. This is to be expected, because any polynomial will blow up to either positive or negative infinity as x approaches negative infinity, whereas the function $e^x$ is supposed to get very close to zero for large negative x. The idea here is that derivatives are local things: they only measure the properties of a function very close to the point at which they’re evaluated, and they don’t necessarily tell us anything about points far away.
d / The function $e^x$, and the approximation $1 + x + x^2/2 + x^3/6$, on a wider scale.
It’s a remarkable fact, then, that by taking enough terms in a polynomial approximation, we can always get as good an approximation to $e^x$ as necessary; it’s just that a large number of terms may be required for large values of x. In other words, the infinite series
$1 + x + \frac{1}{2}x^2 + \frac{1}{2 \cdot 3}x^3 + \ldots$
always gives exactly $e^x$. But what is the pattern here that would allow us to figure out, say, the fourth-order and fifth-order terms that were swept under the rug with the symbol “. . . ”? Let’s do the fifth-order term as an example. The point of adding in a fifth-order term is to make the fifth derivative of the approximation equal to the fifth derivative of $e^x$, which is 1. The first, second, . . . derivatives of $x^5$ are
$\frac{d}{dx} x^5 = 5x^4$
$\frac{d^2}{dx^2} x^5 = 5 \cdot 4x^3$
$\frac{d^3}{dx^3} x^5 = 5 \cdot 4 \cdot 3x^2$
$\frac{d^4}{dx^4} x^5 = 5 \cdot 4 \cdot 3 \cdot 2x$
$\frac{d^5}{dx^5} x^5 = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1$
The notation for a product like 1 · 2 · . . . · n is n!, read “n factorial.” So to get a term for our polynomial whose fifth derivative is 1, we need $x^5/5!$. The result for the infinite series is
$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$
where the special case of 0! = 1 is assumed.2 This infinite series is called the Taylor series for $e^x$, evaluated around x = 0, and it’s true, although I haven’t proved it, that this particular Taylor series always converges to $e^x$, no matter how far x is from zero.
In general, the Taylor series around x = 0 for a function y is
$T_0(x) = \sum_{n=0}^{\infty} a_n x^n$
where the condition for equality of the nth order derivative is
$a_n = \frac{1}{n!} \frac{d^n y}{dx^n}\Big|_{x=0}$
Here the notation $|_{x=0}$ means that the derivative is to be evaluated at x = 0.
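The error estimates quoted for $e^{0.1}$ and $e^{0.2}$, and the poor behavior of the cubic far from zero, can be reproduced in a few lines. This is a sketch in Python; the function name taylor_exp and the test point x = −5 are illustrative choices of mine.

```python
import math

def taylor_exp(x, order):
    # Partial sum of x^n / n! for n = 0 .. order
    return sum(x**n / math.factorial(n) for n in range(order + 1))

for x in (0.1, 0.2):
    exact = math.exp(x)
    err_linear = exact - taylor_exp(x, 1)      # error of 1 + x
    err_quadratic = exact - taylor_exp(x, 2)   # error of 1 + x + x^2/2
    print(x, err_linear, x**2 / 2, err_quadratic)

# Far from zero the cubic is poor, but enough terms still recover e^x
print(taylor_exp(-5.0, 3), math.exp(-5.0))
print(taylor_exp(-5.0, 40), math.exp(-5.0))
```

The linear error roughly quadruples when x doubles and tracks $x^2/2$, while the cubic evaluated at x = −5 is not even positive; with forty terms the series agrees with $e^{-5}$ again.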
2 This makes sense, because, for example, 4!=5!/5, 3!=4!/4, etc., so we should have 0!=1!/1.
A Taylor series can be used to approximate other functions besides $e^x$, and when you ask your calculator to evaluate a function such as a sine or a cosine, it may actually be using a Taylor series to do it. Taylor series are also the method Inf uses to calculate most expressions involving infinitesimals. In example 13 on page 29, we saw that when Inf was asked to calculate 1/(1 − d), where d was infinitesimal, the result was the geometric series:
: 1/(1-d)
These are also the first five
terms of the Taylor series for the
function y = 1/(1 − x), evaluated
around x = 0. That is, the geometric series 1 + x + x2 + x3 + . . . is
really just one special example of
a Taylor series, as demonstrated in
the following example.
Example 82
. Find the Taylor series of y = 1/(1 −
x) around x = 0.
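. Rewriting the function as $y = (1-x)^{-1}$ and applying the chain rule, we have
$y|_{x=0} = 1$
$\frac{dy}{dx}\Big|_{x=0} = (1-x)^{-2}\Big|_{x=0} = 1$
$\frac{d^2y}{dx^2}\Big|_{x=0} = 2(1-x)^{-3}\Big|_{x=0} = 2$
$\frac{d^3y}{dx^3}\Big|_{x=0} = 2 \cdot 3(1-x)^{-4}\Big|_{x=0} = 2 \cdot 3$
. . .
The pattern is that the nth derivative is n!. The Taylor series therefore has $a_n = n!/n! = 1$:
$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \ldots$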
If you flip back to page 104 and
compare the rate of convergence of
the geometric series for x = 0.1
and 0.5, you’ll see that the sum
converged much more quickly for
x = 0.1 than for x = 0.5. In
general, we expect that any Taylor
series will converge more quickly
when x is smaller. Now consider
what happens at x = 1. The series
is now 1 + 1 + 1 + . . ., which gives
an infinite result, and we shouldn’t
have expected any better behavior, since attempting to evaluate
1/(1 − x) at x = 1 gives division by zero. For x > 1, the results become nonsense. For example, 1/(1 − 2) = −1, which is finite, but the geometric series gives
1 + 2 + 4 + . . ., which is infinite.
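The contrast between x = 0.1, x = 0.5, and x = 2 shows up clearly if we tabulate a few partial sums. This is only a sketch in Python; the function name geometric_partial_sum and the numbers of terms (5, 10, 20) are arbitrary choices for illustration.

```python
def geometric_partial_sum(x, terms):
    # 1 + x + x^2 + ... with the given number of terms
    return sum(x**n for n in range(terms))

for x in (0.1, 0.5, 2.0):
    target = 1.0 / (1.0 - x)   # value of 1/(1 - x), finite for all three x
    row = [geometric_partial_sum(x, n) for n in (5, 10, 20)]
    print(x, target, row)
```

For x = 0.1 five terms already agree with 10/9 to about four decimal places, for x = 0.5 the same five terms are still off by 1/16, and for x = 2 the partial sums run away from the finite value −1.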
In general, every function’s Taylor
series around x = 0 converges for
all values of x in the range defined
by |x| < r, where r is some number, known as the radius of convergence. Also, if the function is
defined by putting together other
functions that are well behaved (in
the sense of converging to their
own Taylor series in the relevant
region), then the Taylor series will
not only converge but converge to
the correct value. For the function
e^x, the radius happens to be infinite, whereas for 1/(1−x) it equals
1. The following example shows a
worst-case scenario.
Example 83
The function $y = e^{-1/x^2}$, shown in figure e, never converges to its Taylor series, except at x = 0. This is because the Taylor series for this function, evaluated around x = 0, is exactly zero! At x = 0, we have y = 0, dy/dx = 0, $d^2y/dx^2 = 0$, and so on for every derivative. The zero function matches the function y(x) and all its derivatives to all orders, and yet is useless as an approximation to y(x). The radius of convergence of the Taylor series is infinite, but it doesn’t give correct results except at x = 0. The reason for this is that y was built by composing two functions, $w(x) = -1/x^2$ and $y(w) = e^w$. The function w is badly behaved at x = 0 because it blows up there. In particular, it doesn’t have a well-defined Taylor series at x = 0.
e / The function $e^{-1/x^2}$ never converges to its Taylor series.
Example 84
. Find the Taylor series of y = sin x, evaluated around x = 0.
. The first few derivatives are
$\frac{d}{dx} \sin x = \cos x$
$\frac{d^2}{dx^2} \sin x = -\sin x$
$\frac{d^3}{dx^3} \sin x = -\cos x$
$\frac{d^4}{dx^4} \sin x = \sin x$
$\frac{d^5}{dx^5} \sin x = \cos x$
We can see that there will be a cycle of sin, cos, − sin, and − cos, repeating indefinitely. Evaluating these derivatives at x = 0, we have 0, 1, 0, −1, . . . . All the even-order terms of the series are zero, and all the odd-order terms are ±1/n!. The result is
$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \ldots$
The linear term is the familiar small-angle approximation sin x ≈ x.
The radius of convergence of this series turns out to be infinite. Intuitively the reason for this is that the factorials grow extremely rapidly, so that the successive terms in the series eventually start diminishing quickly, even for large values of x.
Example 85
Suppose that we want to evaluate a limit of the form
$\lim_{x \to 0} \frac{u(x)}{v(x)}$
where u(0) = v(0) = 0. L’Hôpital’s rule tells us that we can do this by taking derivatives on the top and bottom to form $u'/v'$, and that, if necessary, we can do more than one derivative, e.g., $u''/v''$. This was proved on p. 150 using the mean value theorem. But if u and v are both functions that converge to their Taylor series, then it is much easier to see why this works. For example, suppose that their Taylor series both have vanishing constant and linear terms, so that $u = ax^2 + \ldots$ and $v = bx^2 + \ldots$. Then $u'' = 2a + \ldots$, and $v'' = 2b + \ldots$.
A function’s Taylor series doesn’t have to be evaluated around x = 0. The Taylor series around some other center x = c is given by
$T_c(x) = \sum_{n=0}^{\infty} a_n (x - c)^n$
where
$a_n = \frac{1}{n!} \frac{d^n y}{dx^n}\Big|_{x=c}$
To see that this is the right generalization, we can do a change of variable, defining a new function g(x) = f(x−c). The radius of convergence is to be measured from the center c rather than from 0.
Example 86
. Find the Taylor series of ln x, evaluated around x = 1.
. Evaluating a few derivatives, we get
$\frac{d}{dx} \ln x = x^{-1}$
$\frac{d^2}{dx^2} \ln x = -x^{-2}$
$\frac{d^3}{dx^3} \ln x = 2x^{-3}$
$\frac{d^4}{dx^4} \ln x = -6x^{-4}$
Note that evaluating these at x = 0 wouldn’t have worked, since division by zero is undefined; this is because ln x blows up to negative infinity at x = 0. Evaluating them at x = 1, we find that the nth derivative equals ±(n − 1)!, so the coefficients of the Taylor series are ±(n − 1)!/n! = ±1/n, except for the n = 0 term, which is zero because ln 1 = 0. The resulting series is
$\ln x = (x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{3}(x-1)^3 - \ldots$
We can predict that its radius of convergence can’t be any greater than 1, because ln x blows up at 0, which is at a distance of 1 from 1.
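The series of examples 84 and 86 can be summed numerically to see the difference between an infinite radius of convergence and a radius of 1. This is a sketch in Python; the function names, the number of terms, and the sample points x = 10, 1.5, and 2.5 are assumptions chosen for illustration.

```python
import math

def sin_series(x, terms):
    # x - x^3/3! + x^5/5! - ... using the given number of nonzero terms
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(terms))

def ln_series(x, terms):
    # (x-1) - (x-1)^2/2 + (x-1)^3/3 - ... around the center x = 1
    u = x - 1.0
    return sum((-1)**(n + 1) * u**n / n for n in range(1, terms + 1))

print(sin_series(10.0, 30), math.sin(10.0))   # far from 0, still converges
print(ln_series(1.5, 30), math.log(1.5))      # inside the radius of 1
print(ln_series(2.5, 30), math.log(2.5))      # outside the radius of 1
```

The sine series works even at x = 10, while the logarithm series works at x = 1.5 but produces a large, meaningless number at x = 2.5, which lies 1.5 units from the center.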
1 Modify the Weierstrass definition of the limit to apply to infinite sequences.
. Solution, p. 190
2 (a) Prove that the infinite series 1 − 1 + 1 − 1 + 1 − 1 + . . . does not converge to any limit, using the generalization of the Weierstrass limit found in problem 1. (b) Criticize the following argument. The series given in part a equals zero, because addition is associative, so we can rewrite it as (1 − 1) + (1 − 1) + (1 − 1) + . . .
. Solution, p. 190
3 Use the integral test to prove the convergence of the geometric series for 0 < x < 1.
. Solution, p. 190
4 Determine the convergence or divergence of the following series.
(a) $1 + 1/2^2 + 1/3^2 + \ldots$
(b) $1/\ln\ln 3 - 1/\ln\ln 6 + 1/\ln\ln 9 - 1/\ln\ln 12 + \ldots$
(c) $\frac{1}{\ln 2} + \frac{1}{(\ln 2)(\ln 3)} + \frac{1}{(\ln 2)(\ln 3)(\ln 4)} + \ldots$
(d) $\frac{2\sqrt{2}}{9801} \sum_{k=0}^{\infty} \frac{(4k)!\,(1103 + 26390k)}{(k!)^4\, 396^{4k}}$
. Solution, p. 190
5 Give an example of a series for which the ratio test is inconclusive.
. Solution, p. 191
6 Find the Taylor series expansion of cos x around x = 0. Check your work by combining the first two terms of this series with the first term of the sine function from example 84 on page 110 to verify that the trig identity $\sin^2 x + \cos^2 x = 1$ holds for terms up to order $x^2$.
7 In classical physics, the kinetic energy K of an object of mass m moving at velocity v is given by $K = \frac{1}{2}mv^2$. For example, if a car is to start from a stoplight and then accelerate up to v, this is the theoretical minimum amount of energy that would have to be used up by burning gasoline. (In reality, a car’s engine is not 100% efficient, so the amount of gas burned is greater.)
Einstein’s theory of relativity states that the correct equation is actually
$K = \left(\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right) mc^2$
where c is the speed of light. The fact that it diverges as v → c is interpreted to mean that no object can be accelerated to the speed of light.
Expand K in a Taylor series, and show that the first nonvanishing term is equal to the classical expression. This means that for velocities that are small compared to the speed of light, the classical expression is a good approximation, and Einstein’s theory does not contradict any of the prior empirical evidence from which the classical expression was inferred.
8 Expand $(1 + x)^{1/3}$ in a Taylor series around x = 0. The value x = 28 lies outside this series’ radius of convergence, but we can nevertheless use it to extract the cube root of 28 by recognizing that $28^{1/3} = 3(28/27)^{1/3}$. Calculate the root to four significant figures of precision, and check it in the obvious way.
9 Find the Taylor series expansion of $\log_2 x$ around x = 1, and use it to evaluate $\log_2 1.0595$ to four significant figures of precision. Check your result by using the fact that 1.0595 is approximately the twelfth root of 2. This number is the ratio of the frequencies of two successive notes of the chromatic scale in music, e.g., C and D-flat.
10 In free fall, the acceleration will not be exactly constant, due to air resistance. For example, a skydiver does not speed up indefinitely until opening her chute, but rather approaches a certain maximum velocity at which the upward force of air resistance cancels out the force of gravity. If an object is dropped from a height h, and the time it takes to reach the ground is used to measure the acceleration of gravity, g, then the relative error in the result due to air resistance is³
$E = \frac{g - g_{vacuum}}{g} = 1 - \frac{2b}{\ln^2\left(e^b + \sqrt{e^{2b} - 1}\right)}$
where b = h/A, and A is a constant that depends on the size, shape, and mass of the object, and the density of the air. (For a sphere of mass m and diameter d dropping in air, $A = 4.11m/d^2$. Cf. problem 20, p. 49.) Evaluate the constant and linear terms of the Taylor series for the function E(b).
11 (a) Prove that the convergence of an infinite series is unaffected by omitting some initial terms. (b) Similarly, prove that convergence is unaffected by multiplying all the terms by some constant factor.
12 The identity
$\int_0^1 x^{-x}\,dx = \sum_{n=1}^{\infty} n^{-n}$
is known as the “Sophomore’s dream,” because at first glance it looks like the kind of plausible but false statement that someone would naively dream up. Verify it numerically by machine computation.
13 Does sin x + sin sin x + sin sin sin x + . . . converge?
. Solution, p. 192 ?
3 Jan Benacka and Igor Stubna, The
Physics Teacher, 43 (2005) 432.
+ ...
1+2 1+2+3
. Solution, p. 191
n + 1 + 1/n!
to six decimal places.
Euler was the first to prove
+ 2 + 2 + ... =
proof, that this factorization procedure could be extended to the
infinite series, so that f could be
represented as the infinite product
x x 1 − 2 ...
f (x) = 1 − 2
By multiplying this out and equating its linear term to that of the
Taylor series, we find the claimed
Extend this procedure to the x2
term and prove the result claimed
for the sum of the inverse fourth
powers of the integers.
sums with odd exponents ≥ 3 are
much harder, and relatively little
is known about them. The sum
of the inverse cubes is known as
Apèry’s constant.)
This problem had defeated other
great mathematicians of his time,
and was famous enough to be given
a special name, the Basel problem. Here we present an argument
based closely on Euler’s and pose
the problem of how to exploit Eu- 17
ler’s technique further in order to
sin(x2 ) dx
converge, or not?
. Solution, p. 191
From the Taylor series for the sine
function, we find the related series 18 Evaluate
lim cos(π n2 − n)
sin x
f (x) = √
=1− +
where n is an integer.
The partial sums of this series are
Determine the convergence
polynomials that approximate f 19
for small values of x. If such a of the series
polynomial were exact rather than
approximate, then it would have
n2 2−n
zeroes at x = π 2 , 4π 2 , 9π 2 , . . . ,
and we could write it as the product of its linear factors. Euler as- and if it converges, evaluate it.
sumed, without any more rigorous
. Solution, p. 192 ?
Determine the convergence
of the series
n2 2−n
and if it converges, evaluate it.
. Solution, p. 192 ?
For what integer values of p
should we expect the series
| cos n|n
to converge? A rigorous proof is
very difficult and may even be an
open problem, but it is relatively
straightforward to give a convincing argument.
. Solution, p. 193 ?
8 Complex number
8.1 Review of complex numbers
For a more detailed treatment of
complex numbers, see ch. 3 of
James Nearing’s free book at
a / Visualizing complex numbers as points in a plane.

b / Addition of complex numbers is just like addition of vectors, although the real and imaginary axes don't actually represent directions in space.

We assume there is a number, i, such that $i^2 = -1$. The square roots of −1 are then i and −i. (In electrical engineering work, where i stands for current, j is sometimes used instead.) This gives rise to a number system, called the complex numbers, containing the real numbers as a subset. Any complex number z can be written in the form z = a + bi, where a and b are real, and a and b are then referred to as the real and imaginary parts of z. A number with a zero real part is called an imaginary number. The complex numbers can be visualized as a plane, figure a, with the real number line placed horizontally like the x axis of the familiar x−y plane, and the imaginary numbers running along the y axis. The complex numbers are complete in a way that the real numbers aren't: every nonzero complex number has two square roots. For example, 1 is a real number, so it is also a member of the complex numbers, and its square roots are −1 and 1. Likewise, −1 has square roots i and −i, and the number i has square roots $1/\sqrt{2} + i/\sqrt{2}$ and $-1/\sqrt{2} - i/\sqrt{2}$.

Complex numbers can be added and subtracted by adding or subtracting their real and imaginary parts, figure b. Geometrically, this is the same as vector addition.

c / A complex number and its conjugate.

The complex numbers a + bi and a − bi, lying at equal distances above and below the real axis, are called complex conjugates. The results of the quadratic formula are either both real, or complex conjugates of each other. The complex conjugate of a number z is notated as z̄ or z∗.

The complex numbers obey all the same rules of arithmetic as the reals, except that they can't be ordered along a single line. That is, it's not possible to say whether one complex number is greater than another. We can compare them in terms of their magnitudes (their distances from the origin), but two distinct complex numbers may have the same magnitude, so, for example, we can't say whether 1 is greater than i or i is greater than 1.

Example 87
. Prove that $1/\sqrt{2} + i/\sqrt{2}$ is a square root of i.
. Our proof can use any ordinary rules of arithmetic, except for ordering.
$$\left(\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\right)^2 = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\cdot\frac{i}{\sqrt{2}} + \frac{i}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\cdot\frac{i}{\sqrt{2}} = \frac{1}{2}(1 + i + i - 1) = i$$

Example 87 showed one method of multiplying complex numbers. However, there is another nice interpretation of complex multiplication. We define the argument of a complex number, figure d, as its angle in the complex plane, measured counterclockwise from the positive real axis. Multiplying two complex numbers then corresponds to multiplying their magnitudes, and adding their arguments, figure e.

d / A complex number can be described in terms of its magnitude and argument.

Self-Check
Using this interpretation of multiplication, how could you find the square roots of a complex number?
Answer, p. 163
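The two views of multiplication described above can be compared numerically. The following is a minimal sketch, assuming Python's standard `cmath` module; the helper name `multiply_polar` and the sample values `u` and `v` are made up for illustration and do not come from the text. It multiplies two numbers by multiplying magnitudes and adding arguments, compares that with ordinary multiplication, and then squares the number from example 87.

```python
import cmath
import math

def multiply_polar(u: complex, v: complex) -> complex:
    """Multiply u and v by multiplying magnitudes and adding arguments."""
    mag_u, arg_u = cmath.polar(u)   # (magnitude, argument in radians)
    mag_v, arg_v = cmath.polar(v)
    return cmath.rect(mag_u * mag_v, arg_u + arg_v)

# Arbitrary sample values, chosen only for illustration.
u = 1 + 1j
v = 2j
print(u * v, multiply_polar(u, v))   # the two results agree up to rounding

# The number from example 87: its square should be very close to i.
root = 1 / math.sqrt(2) + 1j / math.sqrt(2)
print(root * root)
```

Because floating-point arithmetic rounds, the results agree only to about fifteen digits rather than exactly.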
Example 88
The magnitude |z| of a complex number z obeys the identity $|z|^2 = z\bar{z}$.
To prove this, we first note that z̄ has the same magnitude as z, since flipping it to the other side of the real axis doesn't change its distance from the origin. Multiplying z by z̄ gives a result whose magnitude is found by multiplying their magnitudes, so the magnitude of z z̄ must therefore equal $|z|^2$.
Now we just have to prove that z z̄ is a
positive real number. But if, for example, z lies counterclockwise from the
real axis, then z̄ lies clockwise from
it. If z has a positive argument, then
z̄ has a negative one, or vice-versa.
The sum of their arguments is therefore zero, so the result has an argument of zero, and is on the positive
real axis.¹

¹ I cheated a little. If z's argument is 30 degrees, then we could say z̄'s was −30, but we could also call it 330. That's OK, because 330+30 gives 360, and an argument of 360 is the same as an argument of zero.

e / The argument of uv is the sum of the arguments of u and v.

This whole system was built up in order to make every number have square roots. What about cube roots, fourth roots, and so on? Does it get even more weird when you want to do those as well? No. The complex number system we've already discussed is sufficient to handle all of them. The nicest way of thinking about it is in terms of roots of polynomials. In the real number system, the polynomial $x^2 - 1$ has two roots, i.e., two values of x (plus and minus one) that we can plug in to the polynomial and get zero. Because it has these two real roots, we can rewrite the polynomial as (x − 1)(x + 1). However, the polynomial $x^2 + 1$ has no real roots. It's ugly that in the real number system, some second-order polynomials have two roots, and can be factored, while others can't. In the complex number system, they all can. For instance, $x^2 + 1$ has roots i and −i, and can be factored as (x − i)(x + i). In
general, the fundamental theorem
of algebra states that in the complex number system, any nth-order
polynomial can be factored completely into n linear factors, and
we can also say that it has n complex roots, with the understanding that some of the roots may be
the same. For instance, the fourth-order polynomial $x^4 + x^2$ can be factored as (x − i)(x + i)(x − 0)(x − 0), and we say that it has four
roots, i, −i, 0, and 0, two of which
happen to be the same. This is a
sensible way to think about it, because in real life, numbers are always approximations anyway, and
if we make tiny, random changes to
the coefficients of this polynomial,
it will have four distinct roots, of
which two just happen to be very
close to zero. I’ve given a proof of
the fundamental theorem of algebra on page 160.
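The factorization of the fourth-order polynomial quoted above can be checked by multiplying the linear factors back out. This is a minimal sketch in plain Python; the helper names `poly_mul` and `from_roots` and the lowest-power-first coefficient lists are choices made for this illustration, not anything defined in the text.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def from_roots(roots):
    """Expand the product of the linear factors (x - r) over the given roots."""
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r, 1])   # [-r, 1] represents x - r
    return poly

# Roots i, -i, 0, 0, as in the text; Python writes i as 1j.
coeffs = from_roots([1j, -1j, 0, 0])
print(coeffs)   # coefficients of x^0, x^1, x^2, x^3, x^4
```

The only nonzero coefficients that come out are those of the second and fourth powers of x, which matches the polynomial the four factors were supposed to reproduce.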
8.2 Euler's formula

Having expanded our horizons to include the complex numbers, it's natural to want to extend functions we knew and loved from the world of real numbers so that they can also operate on complex numbers. The only really natural way to do this in general is to use Taylor series. A particularly beautiful thing happens with the functions $e^x$, sin x, and cos x:
$$e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \ldots$$
$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \ldots$$
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \ldots$$
If x = iφ is an imaginary number, we have
$$e^{i\phi} = \cos\phi + i\sin\phi$$
a result known as Euler's formula. The geometrical interpretation in the complex plane is shown in figure f.

f / The complex number $e^{i\phi}$ lies on the unit circle.

Although the result may seem like something out of a freak show at first, applying the definition² of the exponential function makes it clear how natural it is:
$$e^x = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n$$
When x = iφ is imaginary, the quantity (1 + iφ/n) represents a number lying just above 1 in the complex plane. For large n, (1 + iφ/n) becomes very close to the unit circle, and its argument is the small angle φ/n. Raising this number to the nth power multiplies its argument by n, giving a number with an argument of φ.

² See page 149 for an explanation of where this definition comes from and why it makes sense.

g / Leonhard Euler

Euler's formula is used frequently in physics and engineering.

Example 89
. Write the sine and cosine functions in terms of exponentials.
. Euler's formula for x = −iφ gives $e^{-i\phi} = \cos\phi - i\sin\phi$, since cos(−θ) = cos θ, and sin(−θ) = − sin θ. Adding this to Euler's formula, or subtracting it, isolates the cosine or the sine:
$$\cos x = \frac{e^{ix} + e^{-ix}}{2}$$
$$\sin x = \frac{e^{ix} - e^{-ix}}{2i}$$

Example 90
. Evaluate
$$\int e^x\cos x\,dx$$
. Problem 15 on p. 97 suggested a special-purpose trick for doing this integral. An approach that doesn't rely on tricks is to rewrite the cosine in terms of exponentials:
$$\int e^x\cos x\,dx = \int e^x\left(\frac{e^{ix} + e^{-ix}}{2}\right)dx$$
$$= \frac{1}{2}\int\left(e^{(1+i)x} + e^{(1-i)x}\right)dx$$
$$= \frac{1}{2}\left(\frac{e^{(1+i)x}}{1+i} + \frac{e^{(1-i)x}}{1-i}\right) + c$$
Since this result is the integral of a real-valued function, we'd like it to be real, and in fact it is, since the first and second terms are complex conjugates of one another. If we wanted to, we could use Euler's theorem to convert it back to a manifestly real result.³

³ In general, the use of complex number techniques to do an integral could result in a complex number, but that complex number would be a constant, which could be subsumed within the usual constant of integration.
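The claim in example 90 that the complex-looking answer is really real can be tested numerically. The sketch below assumes Python's standard `cmath` and `math` modules; the function names are invented for this illustration, and the closed form $e^x(\cos x + \sin x)/2$ used in `real_form` is worked out here from Euler's formula rather than quoted from the text. The script checks that the imaginary part vanishes, that the two forms agree, and that a centered finite difference of the result reproduces the integrand.

```python
import cmath
import math

def antiderivative(x: float) -> complex:
    """(1/2)[e^((1+i)x)/(1+i) + e^((1-i)x)/(1-i)], with the constant c set to 0."""
    term1 = cmath.exp((1 + 1j) * x) / (1 + 1j)
    term2 = cmath.exp((1 - 1j) * x) / (1 - 1j)
    return 0.5 * (term1 + term2)

def real_form(x: float) -> float:
    """Manifestly real version (derived for this sketch): e^x (cos x + sin x) / 2."""
    return math.exp(x) * (math.cos(x) + math.sin(x)) / 2

def derivative(f, x: float, h: float = 1e-6) -> float:
    """Centered finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (0.0, 0.7, 2.0):
    value = antiderivative(x)
    integrand = math.exp(x) * math.cos(x)
    print(x, value.imag, value.real - real_form(x), derivative(real_form, x) - integrand)
```

All three printed differences should be zero up to rounding and finite-difference error.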
Example 91
Euler found the equation
$$\pi = 20\tan^{-1}\frac{1}{7} + 8\tan^{-1}\frac{3}{79}$$
which allowed the computation of π to
high precision in the era before electronic calculators, since the Taylor series for the inverse tangent converges
rapidly for small inputs. A cute way of
proving the validity of the equation is
to calculate
$$(7 + i)^{20}(79 + 3i)^8$$
as follows in Yacas:
The fact that it is purely real, and has
a negative real part, demonstrates
that the quantity on the right side of
the original equation equals π + 2πn,
where n is an integer. Numerical estimation shows that n = 0. Although the
proof was straightforward, it provides
zero insight into how Euler figured it
out in the first place!
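The calculation described in example 91 can also be carried out with exact integer arithmetic. The following is a sketch in Python rather than Yacas; representing a + bi as a pair of integers and the helper names `gauss_mul` and `gauss_pow` are choices made for this illustration. Using integers instead of floating-point complex numbers means the imaginary part can be tested for being exactly zero.

```python
def gauss_mul(p, q):
    """Multiply p = (a, b) and q = (c, d), which stand for a+bi and c+di."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)

def gauss_pow(p, n):
    """Raise p to the nonnegative integer power n by repeated multiplication."""
    result = (1, 0)          # the number 1
    for _ in range(n):
        result = gauss_mul(result, p)
    return result

# (7 + i)^20 (79 + 3i)^8, computed exactly
product = gauss_mul(gauss_pow((7, 1), 20), gauss_pow((79, 3), 8))
real_part, imag_part = product
print(real_part, imag_part)
```

The imaginary part comes out exactly zero and the real part is negative, which is what the argument in the example requires.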
8.3 Partial fractions

Suppose we want to evaluate the integral
$$\int\frac{dx}{x^2 + 1}$$
by the method of partial fractions. The quadratic formula tells us that the roots are i and −i. Setting 1/(x² + 1) = A/(x + i) + B/(x − i) gives A = i/2 and B = −i/2, so
$$\int\frac{dx}{x^2 + 1} = \frac{i}{2}\int\frac{dx}{x + i} - \frac{i}{2}\int\frac{dx}{x - i}$$
$$= \frac{i}{2}\ln(x + i) - \frac{i}{2}\ln(x - i)$$
$$= \frac{i}{2}\ln\frac{x + i}{x - i}$$
The attractive thing about this approach, compared with the method used on page 86, is that it doesn't require any tricks. If you came across this integral ten years from now, you could pull out your old calculus book, flip through it, and say, “Oh, here we go, there's a way to integrate one over a polynomial: partial fractions.” On the other hand, it's odd that we started out trying to evaluate an integral that had nothing but real numbers, and came out with an answer that isn't even obviously a real number.

But what about that expression (x + i)/(x − i)? Let's give it a name, w. The numerator and denominator are complex conjugates of one another. Since they have the same magnitude, we must have |w| = 1, i.e., w is a complex number that lies on the unit circle, the kind of complex number that Euler's formula refers to. The numerator has an argument of $\tan^{-1}(1/x) = \pi/2 - \tan^{-1}x$, and the denominator has the same argument but with the opposite sign. Division means subtracting arguments, so $\arg w = \pi - 2\tan^{-1}x$. That means that the result can be rewritten using Euler's formula as
$$\int\frac{dx}{x^2 + 1} = \frac{i}{2}\ln e^{i(\pi - 2\tan^{-1}x)}$$
$$= \frac{i}{2}\cdot i(\pi - 2\tan^{-1}x)$$
$$= \tan^{-1}x + c$$
In other words, it's the same result we found before, but found without the need for trickery.
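As a numerical sanity check on the partial-fractions result, the complex expression and the inverse tangent should differ only by a constant. This is a minimal sketch assuming Python's standard `cmath` module and its principal-branch logarithm; the function name `complex_form` and the sample points are made up for illustration. It is restricted to x > 0, where the argument π − 2 tan⁻¹ x lies between 0 and π and so agrees with the principal branch; for negative x the principal branch shifts the constant.

```python
import cmath
import math

def complex_form(x: float) -> complex:
    """(i/2) ln((x+i)/(x-i)), using the principal branch of the logarithm."""
    w = (x + 1j) / (x - 1j)
    return 0.5j * cmath.log(w)

for x in (0.5, 1.0, 3.0):
    value = complex_form(x)
    # The real part minus arctan(x) should be the same constant for every x;
    # the imaginary part should vanish up to rounding.
    print(x, value.real - math.atan(x), value.imag)
```

For these sample points the constant difference is −π/2, which plays the role of the constant of integration c.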
Example 92
. Evaluate
$$\int\frac{dx}{\sin x}$$
. This can be tackled by rewriting the sine function in terms of complex exponentials, changing variables to $u = e^{ix}$ (so that dx = du/(iu)), and then using partial fractions.
$$\int\frac{dx}{\sin x} = 2i\int\frac{dx}{e^{ix} - e^{-ix}}$$
$$= 2i\int\frac{du/(iu)}{u - 1/u}$$
$$= 2\int\frac{du}{u^2 - 1}$$
$$= \ln(u - 1) - \ln(u + 1) + c$$
$$= \ln\frac{e^{ix} - 1}{e^{ix} + 1} + c$$
$$= \ln(i\tan(x/2)) + c$$
$$= \ln\tan(x/2) + c'$$
. Solution, p. 193
1 Find arg i, arg(−i), and arg 37, where arg z denotes the argument of the complex number z.

2 Visualize the following multiplications in the complex plane using the interpretation of multiplication in terms of multiplying magnitudes and adding arguments: (i)(i) = −1, (i)(−i) = 1, (−i)(−i) = −1.

3 If we visualize z as a point in the complex plane, how should we visualize −z?

4 Find four different complex numbers z such that $z^4 = 1$.

5 Compute the following:
$$|1 + i|\ ,\quad \arg(1 + i)\ ,\quad \left|\frac{1}{1 + i}\right|\ ,\quad \arg\left(\frac{1}{1 + i}\right)$$

6 Write the function tan x in terms of complex exponentials.

7 Evaluate $\int\sin^3 x\,dx$.

8 Use Euler's theorem to derive the addition theorems that express sin(a + b) and cos(a + b) in terms of the sines and cosines of a and b.
. Solution, p. 194

9 . . .
$$\int_0^{\pi/2}\cos x\cos 2x\,dx$$
. Solution, p. 194

10 Find every complex number z such that $z^3 = 1$.
. Solution, p. 194

11 Factor the expression $x^3 - y^3$ into factors of the lowest possible order, using complex coefficients. (Hint: use the result of problem 10.) Then do the same using real coefficients.
. Solution, p. 194

. . . + 4x − 4 . . .

. . . $\int e^{-ax}\cos bx\,dx$

Consider the equation $f'(x) = f(f(x))$. This is known as a differential equation: an equation that relates a function to its own derivatives. What is unusual about this differential equation is that the right-hand side involves the function nested inside itself. Given, for example, the value of f(0), we expect the solution of this equation to exist and to be uniquely defined for all values of x. That doesn't mean, however, that we can write down such a solution as a closed-form expression. Show that two closed-form expressions do exist, of the form $f(x) = ax^b$, and find the two values of b.
. Solution, p. 194

(a) Discuss how the integral . . . could be evaluated, in principle, in closed form. (b) See what happens when you try to evaluate it using computer software. (c) Express it as a finite sum.
. Solution, p. 195 ?
9 Iterated integrals

9.1 Integrals inside integrals

In various applications, you need to do integrals stuck inside other integrals. These are known as iterated integrals, or double integrals, triple integrals, etc. Similar concepts crop up all the time even when you're not doing calculus, so let's start by imagining such an example. Suppose you want to count how many squares there are on a chess board, and you don't know how to multiply eight times eight. You could start from the upper left, count eight squares across, then continue with the second row, and so on, until you have counted every square, giving the result of 64. In slightly more formal mathematical language, we could write the following recipe: for each row, r, from 1 to 8, consider the columns, c, from 1 to 8, and add one to the count for each one of them. Using the sigma notation, this becomes
$$\sum_{r=1}^{8}\sum_{c=1}^{8} 1$$
If you're familiar with computer programming, then you can think of this as a sum that could be calculated using a loop nested inside another loop. To evaluate the result (again, assuming we don't know how to multiply, so we have to use brute force), we can first evaluate the inside sum, which equals 8, giving
$$\sum_{r=1}^{8} 8$$
Notice how the “dummy” variable c has disappeared. Finally we do the outside sum, over r, and find the result of 64.

Now imagine doing the same thing with the pixels on a TV screen. The electron beam sweeps across the screen, painting the pixels in each row, one at a time. This is really no different than the example of the chess board, but because the pixels are so small, you normally think of the image on a TV screen as continuous rather than discrete. This is the idea of an integral in calculus. Suppose we want to find the area of a rectangle of width a and height b, and we don't know that we can just multiply to get the area ab. The brute force way to do this is to break up the rectangle into a grid of infinitesimally small squares, each having width dx and height dy, and therefore the infinitesimal area dA = dx dy. For convenience, we'll imagine that the rectangle's lower left corner is at the origin. Then the area is given by this integral:
$$\text{area} = \int_{y=0}^{b}\int_{x=0}^{a} dx\,dy$$
Notice how the leftmost integral sign, over y, and the rightmost differential, dy, act like bookends, or the pieces of bread on a sandwich. Inside them, we have the integral sign that runs over x, and the differential dx that matches it on the right. Finally, on the innermost layer, we'd normally have the thing we're integrating, but here it's 1, so I've omitted it. Writing the lower limits of the integrals with x = and y = helps to keep it straight which integral goes with which differential. The result is
$$\text{area} = \int_{y=0}^{b}\int_{x=0}^{a} dx\,dy$$
$$= \int_{y=0}^{b}\left(\int_{x=0}^{a} dx\right)dy$$
$$= \int_{y=0}^{b} a\,dy$$
$$= a\int_{y=0}^{b} dy$$
$$= ab$$

Area of a triangle
Example 93
. Find the area of a 45-45-90 right triangle having legs a.
. Let the triangle's hypotenuse run from the origin to the point (a, a), and let its legs run from the origin to (0, a), and then to (a, a). In other words, the triangle sits on top of its hypotenuse. Then the integral can be set up the same way as the one before, but for a particular value of y, values of x only run from 0 (on the y axis) to y (on the hypotenuse). We then have
$$\text{area} = \int_{y=0}^{a}\int_{x=0}^{y} dx\,dy$$
$$= \int_{y=0}^{a}\left(\int_{x=0}^{y} dx\right)dy$$
$$= \int_{y=0}^{a} y\,dy$$
$$= \frac{1}{2}a^2$$
Note that in this example, because the upper end of the x values depends on the value of y, it makes a difference which order we do the integrals in. The x integral has to be on the inside, and we have to do it first.

Volume of a cube
Example 94
. Find the volume of a cube with sides of length a.
. This is a three-dimensional example, so we'll have integrals nested three deep, and the thing we're integrating is the volume dV = dx dy dz.
$$\text{volume} = \int_{z=0}^{a}\int_{y=0}^{a}\int_{x=0}^{a} dx\,dy\,dz$$
$$= \int_{z=0}^{a}\int_{y=0}^{a} a\,dy\,dz$$
$$= a\int_{z=0}^{a}\int_{y=0}^{a} dy\,dz$$
$$= a\int_{z=0}^{a} a\,dz$$
$$= a^2\int_{z=0}^{a} dz$$
$$= a^3$$

Area of a circle
Example 95
. Find the area of a circle.
. To make it easy, let's find the area of a semicircle and then double it. Let the circle's radius be R, and let it be centered on the origin and bounded below by the x axis. Then the curved edge is given by the equation $R^2 = x^2 + y^2$, or $y = \sqrt{R^2 - x^2}$. Since the y integral's limit depends on x, the x integral has to be on the outside. The area is
$$\text{area} = \int_{x=-R}^{R}\int_{y=0}^{\sqrt{R^2 - x^2}} dy\,dx$$
$$= \int_{x=-R}^{R}\sqrt{R^2 - x^2}\,dx$$
$$= R\int_{x=-R}^{R}\sqrt{1 - (x/R)^2}\,dx$$
Substituting u = x/R,
$$\text{area} = R^2\int_{u=-1}^{1}\sqrt{1 - u^2}\,du$$
The definite integral equals π/2, as you can find using a trig substitution or simply by looking it up in a table, and the result is, as expected, $\pi R^2/2$ for the area of the semicircle. Doubling it, we find the expected result of $\pi R^2$ for a full circle.

9.2 Applications

Up until now, the integrand of the innermost integral has always been 1, so we really could have done all the double integrals as single integrals. The following example is one in which you really need to do iterated integrals.

a / The famous tightrope walker Charles Blondin uses a long pole for its large moment of inertia.
Moments of inertia
Example 96
The moment of inertia is a measure of how difficult it is to start an object rotating (or stop it). For example, tightrope walkers carry long poles because they want something with a big moment of inertia. The moment of inertia is defined by $I = \int R^2\,dm$, where dm is the mass of an infinitesimally small portion of the object, and R is the distance from the axis of rotation.

To start with, let's do an example that doesn't require iterated integrals. Let's calculate the moment of inertia of a thin rod of mass M and length L about a line perpendicular to the rod and passing through its center.
$$I = \int R^2\,dm$$
$$= \int_{-L/2}^{L/2} x^2\,\frac{M}{L}\,dx \qquad [R = |x|,\ \text{so}\ R^2 = x^2]$$
$$= \frac{1}{12}ML^2$$

Now let's do one that requires iterated integrals: the moment of inertia of a cube of side b, for rotation about an axis that passes through its center and is parallel to four of its faces. Let the origin be at the center of the cube, and let x be the rotation axis.
$$I = \int R^2\,dm$$
$$= \rho\int R^2\,dV$$
$$= \rho\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}\left(y^2 + z^2\right)dx\,dy\,dz$$
$$= \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}\left(y^2 + z^2\right)dy\,dz$$
The fact that the last step is a trivial integral results from the symmetry of the problem. The integrand of the remaining double integral breaks down into two terms, each of which depends on only one of the variables, so we break it into two integrals,
$$I = \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} y^2\,dy\,dz + \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} z^2\,dy\,dz$$
which we know have identical results. We therefore only need to evaluate one of them and double the result:
$$I = 2\rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} z^2\,dy\,dz$$
$$= 2\rho b^2\int_{-b/2}^{b/2} z^2\,dz$$
$$= \frac{1}{6}\rho b^5$$
$$= \frac{1}{6}Mb^2$$
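The picture of an iterated integral as loops nested inside loops can be made concrete for the cube of example 96. The following is a rough sketch in plain Python; the function name, the midpoint-rule discretization, and the default grid size are all assumptions made for this illustration. It chops the cube into small cells and adds up ρ(y² + z²) dV over all of them, with x as the rotation axis.

```python
def cube_moment_of_inertia(mass: float, side: float, n: int = 40) -> float:
    """Midpoint-rule estimate of the sum of rho*(y^2 + z^2)*dV over a cube centered on the origin."""
    rho = mass / side ** 3        # uniform density
    step = side / n               # edge length of each small cell
    d_volume = step ** 3
    total = 0.0
    for _ in range(n):            # x loop: the integrand does not depend on x
        for iy in range(n):
            y = -side / 2 + (iy + 0.5) * step
            for iz in range(n):
                z = -side / 2 + (iz + 0.5) * step
                total += rho * (y * y + z * z) * d_volume
    return total

M, b = 2.0, 3.0                   # arbitrary sample values
print(cube_moment_of_inertia(M, b), M * b ** 2 / 6)
```

With a finite grid the sum falls slightly short of $Mb^2/6$, and the shortfall shrinks as n grows. The x loop contributes nothing but a factor of b, which is the "trivial integral" mentioned in the example.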
9.3 Polar coordinates

b / René Descartes

c / Polar coordinates.

Philosopher and mathematician René Descartes originated the idea of describing plane geometry using (x, y) coordinates measured from a pair of perpendicular coordinate axes. These rectangular coordinates are known as Cartesian coordinates, in his honor.

As a logical extension of Descartes' idea, one can find different ways of defining coordinates on the plane, such as the polar coordinates in figure c. In polar coordinates, the differential of area, figure d, can be written as da = R dR dφ. The idea is that since dR and dφ are infinitesimally small, the shaded area in the figure is very nearly a rectangle, measuring dR in one dimension and R dφ in the other. (The latter follows from the definition of radian measure.)

d / The differential of area in polar coordinates

Example 97
. A disk has mass M and radius b. Find its moment of inertia for rotation about the axis passing perpendicularly through its center.
$$I = \int R^2\,dM$$
$$= \int R^2\,\frac{M}{\pi b^2}\,da$$
$$= \frac{M}{\pi b^2}\int_{R=0}^{b}\int_{\phi=0}^{2\pi} R^2\cdot R\,d\phi\,dR$$
$$= \frac{M}{\pi b^2}\int_{R=0}^{b} R^3\int_{\phi=0}^{2\pi} d\phi\,dR$$
$$= \frac{2M}{b^2}\int_{R=0}^{b} R^3\,dR$$
$$= \frac{Mb^2}{2}$$

e / The function $e^{-x^2}$, example 98.

Example 98
In statistics, the standard “bell curve” (also known as the normal distribution or Gaussian) is shaped like $e^{-x^2}$. An area under this curve is proportional to the probability that x lies within a certain range. To fix the constant of proportionality, we need to evaluate
$$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$$
which corresponds to a probability of 1. As discussed on p. 93, the corresponding indefinite integral can't be done in closed form. The definite integral from −∞ to +∞, however, can be evaluated by the following devious trick due to Poisson. We first write $I^2$ as a product of two copies of the integral.
$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)$$
Since the variable of integration x is a “dummy” variable, we can choose it to be any letter of the alphabet. Let's change the second one to y:
$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right)$$
This is in principle a pointless and trivial change, but it suggests visualizing the right-hand side in the Cartesian plane, and considering it as the integral of a single function that depends on both x and y:
$$I^2 = \int_{y=-\infty}^{\infty}\int_{x=-\infty}^{\infty} e^{-y^2}e^{-x^2}\,dx\,dy$$
Switching to polar coordinates, we have
$$I^2 = \int_{\phi=0}^{2\pi}\int_{R=0}^{\infty} e^{-R^2}\,R\,dR\,d\phi$$
$$= 2\pi\int_{0}^{\infty} e^{-R^2}\,R\,dR$$
which can be done using the substitution $u = R^2$, du = 2R dR:
$$I^2 = 2\pi\int_{0}^{\infty} e^{-u}\,(du/2)$$
$$= \pi$$
$$I = \sqrt{\pi}$$
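The result of example 98 is easy to check by brute force. This is a minimal sketch using only Python's standard `math` module; the function name, the cutoff at ±8, and the number of steps are assumptions chosen for this illustration. The integrand is already far below rounding error at the cutoff, so truncating the infinite range there does no harm.

```python
import math

def gaussian_integral(half_width: float = 8.0, n: int = 20000) -> float:
    """Midpoint-rule estimate of the integral of exp(-x^2) from -half_width to +half_width."""
    step = 2 * half_width / n
    total = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * step
        total += math.exp(-x * x) * step
    return total

estimate = gaussian_integral()
print(estimate, math.sqrt(math.pi))   # the two should agree closely
print(estimate ** 2, math.pi)         # I squared compared with pi
```

The numerical value only confirms the answer; it gives no hint of why π appears, which is what the polar-coordinate trick explains.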
9.4 Spherical and cylindrical coordinates

In cylindrical coordinates (R, φ, z), z measures distance along the axis, R measures distance from the axis, and φ is an angle that wraps around the axis.

f / Cylindrical coordinates.

The differential of volume in cylindrical coordinates can be written as dv = R dR dz dφ. This follows from adding a third dimension, along the z axis, to the rectangle in figure d.

Example 99
. Show that the expression for dv has the right units.
. Angles are unitless, since the definition of radian measure involves a distance divided by a distance. Therefore the only factors in the expression that have units are R, dR, and dz. If these three factors are measured, say, in meters, then their product has units of cubic meters, which is correct for a volume.

Example 100
. Find the volume of a cone whose height is h and whose base has radius b.
. Let's plan on putting the z integral on the outside of the sandwich. That means we need to express the radius $r_{max}$ of the cone in terms of z. This comes out nice and simple if we imagine the cone upside down, with its tip at the origin. Then since we have $r_{max}(z = 0) = 0$, and $r_{max}(h) = b$, evidently $r_{max} = zb/h$.
$$v = \int dv$$
$$= \int_{z=0}^{h}\int_{R=0}^{zb/h}\int_{\phi=0}^{2\pi} R\,d\phi\,dR\,dz$$
$$= 2\pi\int_{z=0}^{h}\int_{R=0}^{zb/h} R\,dR\,dz$$
$$= 2\pi\int_{z=0}^{h}(zb/h)^2/2\,dz$$
$$= \pi(b/h)^2\int_{z=0}^{h} z^2\,dz$$
$$= \frac{\pi b^2 h}{3}$$
As a check, we note that the answer has units of volume. This is the classical result, known by the ancient Egyptians, that a cone has one third the volume of its enclosing cylinder.

In spherical coordinates (r, θ, φ), the coordinate r measures the distance from the origin, and θ and φ are analogous to latitude and longitude, except that θ is measured down from the pole rather than from the equator.

g / Spherical coordinates.

The differential of volume in spherical coordinates is $dv = r^2\sin\theta\,dr\,d\theta\,d\phi$.

Example 101
. Find the volume of a sphere.
$$v = \int dv$$
$$= \int_{\theta=0}^{\pi}\int_{r=0}^{b}\int_{\phi=0}^{2\pi} r^2\sin\theta\,d\phi\,dr\,d\theta$$
$$= 2\pi\int_{\theta=0}^{\pi}\int_{r=0}^{b} r^2\sin\theta\,dr\,d\theta$$
$$= 2\pi\cdot\frac{b^3}{3}\int_{\theta=0}^{\pi}\sin\theta\,d\theta$$
$$= \frac{4\pi b^3}{3}$$
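The spherical volume element can be tested the same brute-force way, by adding up r² sin θ dr dθ dφ over a grid. This is a rough sketch in plain Python; the function name, the midpoint rule, and the grid size are assumptions made for this illustration. Since nothing in the integrand depends on φ, that integral is replaced by its value 2π rather than looped over.

```python
import math

def sphere_volume(radius: float, n: int = 200) -> float:
    """Midpoint-rule sum of r^2 sin(theta) dr dtheta dphi over a ball of the given radius."""
    dr = radius / n
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):                    # theta runs from 0 to pi
        theta = (i + 0.5) * dtheta
        for j in range(n):                # r runs from 0 to radius
            r = (j + 0.5) * dr
            # the phi integral of a phi-independent integrand is a factor of 2*pi
            total += r * r * math.sin(theta) * dr * dtheta * 2 * math.pi
    return total

b = 1.5                                   # arbitrary sample radius
print(sphere_volume(b), 4 * math.pi * b ** 3 / 3)
```

Leaving out the factor of sin θ, or the r², gives a visibly wrong volume, which is a quick way to convince yourself that both belong in dv.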
1 Pascal's snail (named after Étienne Pascal, father of Blaise Pascal) is the shape shown in the figure, defined by R = b(1 + cos θ) in polar coordinates.
(a) Make a rough visual estimate of its area from the figure.
(b) Find its area exactly, and check against your result from part a.
(c) Show that your answer has the right units.
[Thompson, 1919]

Problem 1: Pascal's snail with b = 1.

2 A cone with a curved base is defined by r ≤ b and θ ≤ π/4 in spherical coordinates.
(a) Find its volume.
(b) Show that your answer has the right units.

3 Find the moment of inertia of a sphere for rotation about an axis passing through its center.

4 A jump-rope swinging in circles has the shape of a sine function. Find the volume enclosed by the swinging rope, in terms of the radius b of the circle at the rope's fattest point, and the straight-line distance ℓ between the ends.

5 A curvy-sided cone is defined in cylindrical coordinates by 0 ≤ z ≤ h and $R \le kz^2$. (a) What units are implied for the constant k? (b) Find the volume of the shape. (c) Check that your answer to b has the right units.

6 The discovery of nuclear fission was originally explained by modeling the atomic nucleus as a drop of liquid. Like a water balloon, the drop could spin or vibrate, and if the motion became sufficiently violent, the drop could split in half, i.e., undergo fission. It was later learned that even the nuclei in matter under ordinary conditions are often not spherical but deformed, typically with an elongated ellipsoidal shape like an American football. One simple way of describing such a shape is with the equation
$$r \le b[1 + c(\cos^2\theta - k)]$$
where c = 0 for a sphere, c > 0 for an elongated shape, and c < 0 for a flattened one. Usually for nuclei in ordinary matter, c ranges from about 0 to +0.2. The constant k is introduced because without it, a change in c would entail not just a change in the shape of the nucleus, but a change in its volume as well. Observations show, on the contrary, that the nuclear fluid is highly incompressible, just like ordinary water, so the volume of the nucleus is not expected to change significantly, even in violent processes like fission. Calculate the volume of the nucleus, throwing away terms of order $c^2$ or higher, and show that k = 1/3 is required in order to keep the volume constant.
7 This problem is a continuation of problem 6, and assumes the result of that problem is already known. The nucleus $^{168}$Er has the type of elongated ellipsoidal shape described in that problem, with c > 0. Its mass is $2.8\times10^{-25}$ kg, it is observed to have a moment of inertia of $2.62\times10^{-54}\ \text{kg}\cdot\text{m}^2$ for end-over-end rotation, and its shape is believed to be described by $b \approx 6\times10^{-15}$ m and c ≈ 0.2.
Assuming that it rotated rigidly,
the usual equation for the moment
of inertia could be applicable, but
it may rotate more like a water balloon, in which case its moment of
inertia would be significantly less
because not all the mass would actually flow. Test which type of rotation it is by calculating its moment of inertia for end-over-end rotation and comparing with the observed moment of inertia.
8 Von Kármán found empirically
that when a fluid flows turbulently
through a cylindrical pipe, the velocity of flow v varies according
to the “1/7 power law,” $v/v_o = (1 - r/R)^{1/7}$, where $v_o$ is the velocity at the center of the pipe, R is
the radius of the pipe, and r is the
distance from the axis. Find the
average velocity at which water is
transported through the pipe.
A Detours
Formal definition of the tangent line
Given a function x(t), consider any point P = (a, x(a)) on its graph.
Let the function ℓ(t) be a line passing through P. We say that ℓ cuts through x at P if there exists some real number d > 0 such that the graph of ℓ is on one side of the graph of x for all a − d < t < a, and is on the other side for all a < t < a + d.
Definition (Marsden1 ): A line ℓ through P is said to be the line tangent to x at P if all lines through P with slopes less than that of ℓ cut through x in one direction, while all lines with slopes greater than ℓ's cut through it in the opposite direction.
The reason for the complication in the definition is that there are cases
in which the function is smooth and well-behaved throughout a certain
region, but for a certain point P in that region, all lines through P cut
through the graph. For example, the function $x(t) = t^3$ is blessed everywhere
with lines that don’t cut through it — everywhere, that is, except at
t = 0, which is an inflection point (p. 17). Our definition fills in the
“gap tooth” in the derivative function in the obvious way.
Example 102
As an example, we demonstrate that the derivative of t^3 is zero where it passes
through the origin. Define the line ℓ(t) = bt with slope b, passing through the
origin. For b < 0, ℓ cuts the graph of t^3 once at the origin, going down and
to the right. For b > 0, ℓ cuts the graph of t^3 in three places, at t = 0 and ±√b.
Picking any positive value of d less than √b, we find that ℓ cuts the graph at
the origin, going up and to the right. Therefore b = 0 gives the tangent line at
the origin.
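The sign test used in this example can be tried numerically. The Python sketch below is only an illustration and not part of the original argument: the helper names `side` and `cut_direction` are hypothetical, and sampling finitely many points can suggest, but never prove, that a line cuts through the graph.

```python
def side(b, t):
    """Return +1 if the line l(t) = b*t lies above the curve x(t) = t**3 at t,
    -1 if it lies below, and 0 if they touch."""
    diff = b * t - t**3
    return (diff > 0) - (diff < 0)


def cut_direction(b, d, samples=1000):
    """Sample the intervals (-d, 0) and (0, d).

    Returns "up" if the line is below the curve on the left and above it on
    the right, "down" for the opposite pattern, and None if the line does not
    stay on a single side within one of the two intervals.
    """
    left = {side(b, -d * k / (samples + 1)) for k in range(1, samples + 1)}
    right = {side(b, d * k / (samples + 1)) for k in range(1, samples + 1)}
    if left == {-1} and right == {1}:
        return "up"
    if left == {1} and right == {-1}:
        return "down"
    return None


# Slope b = 0.25 has sqrt(b) = 0.5, so d = 0.4 is small enough but d = 0.6 is not.
print(cut_direction(0.25, 0.4))
print(cut_direction(-0.25, 0.4))
print(cut_direction(0.25, 0.6))
```

Lines of positive slope cut through in one direction and lines of negative slope in the other, which is the pattern the definition uses to single out b = 0 as the slope of the tangent line.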
1 Calculus
Derivatives of polynomials
Some ideas in this proof are due to Tom Goodwillie.
Theorem: For n = 0, 1, 2, . . . , the derivative of the function x defined
by x(t) = t^n is ẋ = nt^(n−1).
The results for n = 0 and 1 hold by direct application of the definition
of the derivative.
For n > 1, it suffices to prove ẋ(0) = 0 and ẋ(1) = n, since the result for
other nonzero values of t then follows by the kind of scaling argument
used on page 13 for the n = 2 case.
We use the following properties of the derivative, all of which follow
immediately from its definition as the slope of the tangent line:
Shift. Shifting a function x(t) horizontally to form a new function x(t+c)
gives a derivative at any newly shifted point that is the same as
the derivative at the corresponding point on the unshifted graph.
Flip. Flipping the function x(t) to form a new function x(−t) negates
its derivative at t = 0.
Add. The derivative of the sum or difference of two functions is the sum
or difference of their derivatives.
For even n, ẋ(0) = 0 follows from the flip property, since x(−t) is the
same function as x(t). For n = 3, 5, . . . , we apply the definition of the
derivative in the same manner as was done in the preceding section for
n = 3.
We now need to show that ẋ(1) = n. Define the function u as
u(t) = x(t + 1) − x(t)
= 1 + nt + . . .
where the second line follows from the binomial theorem, and
. . . represents terms involving t^2 and higher powers. Since we’ve already
established the results for n = 0 and 1, differentiation gives
u̇(t) = n + . . .
Now let’s evaluate this at t = 0, where, as shown earlier, the terms
represented by . . . all vanish. Applying the add and shift properties, we have
ẋ(1) − ẋ(0) = n
But since ẋ(0) = 0, this completes the proof.
Although this proof was for integer exponents n ≥ 1, the result is also
true for any real value of n; see example 24 on p. 41.
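As a sanity check on the two facts the proof relies on, ẋ(0) = 0 and ẋ(1) = n for n > 1, here is a small numerical sketch in Python. It is an illustration only: the helper `central_difference` and the step size are assumptions introduced here, and a finite-difference estimate is not a substitute for the proof.

```python
def central_difference(f, t, h=1e-6):
    """Estimate the slope of f at t from two nearby points."""
    return (f(t + h) - f(t - h)) / (2 * h)


def power(n):
    """Return the function x(t) = t**n."""
    return lambda t: t**n


for n in range(2, 7):
    x = power(n)
    slope_at_0 = central_difference(x, 0.0)
    slope_at_1 = central_difference(x, 1.0)
    # The theorem predicts 0 at t = 0 and n at t = 1.
    print(n, slope_at_0, slope_at_1)
```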
Details of the proof of the derivative of the sine function
Some ideas in this proof are due to Jerome Keisler (see references, p.
On page 28, I computed the derivative of sin t to be cos t as follows:
dx = sin(t + dt) − sin t
   = sin t cos dt + cos t sin dt − sin t
   = cos t dt + . . .
We want to prove that the error “. . . ” introduced by the small-angle approximations really is of order dt^2.
A quick and dirty way to check whether this is likely to be true is to
use Inf to calculate sin(t + dt) at some specific value of t. For example,
at t = 1 we have this result:
: sin(1+d)
The small-angle approximations give sin(1 + d) ≈ sin 1 + (cos 1)d. The
coefficients of the first two terms of the exact result are, as expected
sin(1) = 0.84147 and cos(1) = 0.5403 . . ., so although the small-angle
approximations have introduced some errors, they involve only higher
powers of dt, as claimed.
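The same kind of quick and dirty check can be made without Inf, using ordinary floating-point arithmetic. The Python sketch below is an assumption-laden stand-in (a small real number d plays the role of the infinitesimal, and the helper name `linear_error` is hypothetical). If the error is of order d^2, then dividing it by d^2 should give a ratio that settles down rather than blowing up as d shrinks.

```python
import math


def linear_error(t, d):
    """Exact sin(t + d) minus the small-angle estimate sin t + (cos t) d."""
    return math.sin(t + d) - (math.sin(t) + math.cos(t) * d)


for d in (1e-1, 1e-2, 1e-3):
    err = linear_error(1.0, d)
    # The ratio err / d**2 should approach a fixed number as d shrinks.
    print(d, err, err / d**2)
```

The ratio approaches −sin(1)/2, about −0.42, which is consistent with an error of order d^2 but, as with the Inf demonstration, checks only the single value t = 1.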
The demonstration with Inf has two shortcomings. One is that it only
works for t = 1, but we need to prove the result for all values
of t. That doesn’t mean that the check for t = 1 was useless. Even
though a general mathematical statement about all numbers can never
be proved by demonstrating specific examples for which it succeeds, a
single counterexample suffices to disprove it. The check for t = 1 was
worth doing, because if the first term had come out to be 0.88888, it
would have immediately disproved our claim, thereby saving us from
wasting hours attempting to prove something that wasn’t true.
The other problem is that I’ve never explained how Inf calculates this
kind of thing. The answer is that it uses something called a Taylor
series, discussed in section 7.4. Using Inf here without knowing yet
how Taylor series work is like using your calculator as a “black box”
to extract the square root of 2 without knowing how it does it. Not
knowing the inner workings of the black box makes the demonstration
less than satisfying.
In any case, this preliminary check makes it sound like it’s reasonable
to go on and try to produce a real proof. We have
sin(t + dt) = sin t + cos t dt − E
where the error E introduced by the approximations is
E = sin t (1 − cos dt) + cos t (dt − sin dt)
a / Geometrical interpretation of the error term.
Let the radius of the circle in figure a be one, so AD is cos dt and CD is
sin dt. The area of the shaded pie slice is dt/2, and the area of triangle
ABC is (sin dt)/2, so the error made in the approximation sin dt ≈ dt
equals twice the area of the dish shape formed by line BC and arc BC.
Therefore dt − sin dt is less than the area of rectangle CEBD. But CEBD
has both an infinitesimal width and an infinitesimal height, so this error
is of no more than order dt^2.
For the approximation cos dt ≈ 1, the error (represented by BD) is
1 − cos dt = 1 − √(1 − sin^2 dt), which is less than 1 − √(1 − dt^2), since
sin dt < dt. Therefore this error is of order dt^2.
Formal statement of the transfer principle
On page 33, I gave an informal description of the transfer principle. The
idea being expressed was that the phrases “for any” and “there exists”
can only be used in phrases like “for any real number x” and “there
exists a real number y such that. . . ” The transfer principle does not
apply to statements like “there exists an integer x such that. . . ” or
even “there exists a subset of the real numbers such that. . . ”
The way to state the transfer principle more rigorously is to get rid of
the ambiguities of the English language by restricting ourselves to a well-defined language of mathematical symbols. This language has symbols
∀ and ∃, meaning “for all” and “there exists,” and these are called
quantifiers. A quantifier is always immediately followed by a variable,
and then by a statement involving that variable. For example, suppose
we want to say that a number greater than 1 exists. We can write the
statement ∃x x > 1, read as “there exists a number x such that x is
greater than 1.” We don’t actually need to say “there exists a number
x in the set of real numbers such that . . . ,” because our intention here
is to make statements that can be translated back and forth between
the reals and the hyperreals. In fact, we forbid this type of explicit
reference to the domain to which the quantifiers apply. This restriction
is described technically by saying that we’re only allowing first-order
logic.
Quantifiers can be nested. For example, I can state the commutativity
of addition as ∀x∀y x + y = y + x, and the existence of additive inverses
as ∀x∃y x + y = 0.
After the quantifier and the variable, we have some mathematical assertion, in which we’re allowed to use the symbols =, >, × and + for
the basic operations of arithmetic, and also parentheses and the logical
operators ¬, ∧ and ∨ for “not,” “and,” and “or.” Although we will
often find it convenient to use other symbols, such as 0, 1, −, /, ≤,
≠, etc., these are not strictly necessary. We use them only as a way of
making the formulas more readable, with the understanding that they
could be translated into the more basic symbols. For instance, I can
restate ∃x x > 1 as ∃x∃y∀z yz = z ∧ x > y. The number y ends up
just being a name for 1, because it’s the only number that will always
satisfy yz = z.
Finally, these statements need to satisfy certain syntactic rules. For
example, we can’t have a string of symbols like x + ×y, because the
operators + and × are supposed to have numbers on both sides.
A finite string of symbols satisfying all the above rules is called a well-formed formula (wff) in first-order logic.
The transfer principle states that a wff is true on the real numbers if
and only if it is true on the hyperreal numbers.
If you look in an elementary algebra textbook at the statement of all the
elementary axioms of the real number system, such as commutativity
of multiplication, associativity of addition, and so on, you’ll see that
they can all be expressed in terms of first-order logic, and therefore
you can use them when manipulating hyperreal numbers. However, it’s
not possible to fully characterize the real number system without giving
at least some further axioms that cannot be expressed in first order.
There is more than one way to set up these additional axioms, but
for example one common axiom to use is the Archimedean principle,
which states that there is no number that is greater than 1, greater
than 1 + 1, greater than 1 + 1 + 1, and so on. If we try to express
this as a well-formed formula in first-order logic, one attempt would
be ¬∃x x > 1 ∧ x > 1 + 1 ∧ x > 1 + 1 + 1 . . ., where the . . .
indicates that the string of symbols would have to go on forever. This
doesn’t work because a well-formed formula has to be a finite string
of symbols. Another attempt would be ¬∃x∀n ∈ N x > n, where N
means the set of integers. This one also fails to be a wff in first-order
logic, because in first-order logic we’re not allowed to explicitly refer
to the domain of a quantifier. We conclude that the transfer principle
does not necessarily apply to the Archimedean principle, and in fact
the Archimedean principle is not true on the hyperreals, because they
include numbers that are infinite.
Now that we have a thorough and rigorous understanding of what the
transfer principle says, the next obvious question is why we should believe that it’s true. This is discussed in the following section.
Is the transfer principle true?
The preceding section stated the transfer principle in rigorous language.
But why should we believe that it’s true?
One approach would be to begin deducing things about the hyperreals,
and see if we can deduce a contradiction. As a starting point, we can
use the axioms of elementary algebra, because the transfer principle
tells us that those apply to the hyperreals as well. Since we also assume
that the Archimedean principle does not hold for the hyperreals, we
can also base our reasoning on that, and therefore many of the things
we can prove will be things that are true for the hyperreals, but false
for the reals. This is essentially what mathematicians started doing
immediately after Newton and Leibniz invented the calculus, and they
were immediately successful in producing contradictions. However, they
weren’t using formally defined logical systems, and they hadn’t stated
anything as specific and rigorous as the transfer principle. In particular,
they didn’t understand the need for anything like our restriction of the
transfer principle to first-order logic. If we could reach a contradiction
based on the more modern, rigorous statement of the transfer principle,
that would be a different matter. It would tell us that one of two things
was true: either (1) the hyperreal number system lacks logical self-consistency, or (2) both the hyperreals and the reals lack self-consistency.
Abraham Robinson proved, however, around 1960 that the reals and the
hyperreals have the same level of consistency: one is self-consistent if
and only if the other is. In other words, if the hyperreals harbor a ticking
logical time bomb, so do the reals. Since most mathematicians don’t
lose much sleep worrying about a lack of self-consistency in the real
number system, this is generally taken as meaning that infinitesimals
have been rehabilitated. In fact, it gives them an even higher level
of respectability than they had in the era of Gauss and Euler, when
they were widely used, but mathematicians knew a valid style of proof
involving infinitesimals only because they’d slowly developed the right
“Spidey sense.”
But how in the world could Robinson have proved such a thing? It seems
like a daunting task. There is an infinite number of possible logical trains
of argument in mathematics. How could he have demonstrated, with a
stroke of a pen, that none of them could ever lead to a contradiction
(unless it indicated a contradiction lurking in the real number system
as well)? Obviously it’s not possible to check them all explicitly.
The way modern logicians prove such things is usually by using models.
For an easy example of a model, consider Euclidean geometry. Euclid
believed that the following four postulates2 were all self-evident:
1. Let the following be postulated: to draw a straight line from any
point to any point.
2. To extend a finite straight line continuously in a straight line.
3. To describe a circle with any center and radius.
4. That all right angles are equal to one another.
2 modified slightly by me from a translation by T.L. Heath, 1925
These postulates, which today we would call “axioms,” played the same
role with respect to Euclidean geometry that the elementary axioms of
arithmetic play for the real number system.
Euclid also found that he needed a fifth postulate in order to prove many
of his most important theorems, such as the Pythagorean theorem. I’ll
state a different axiom that turns out to be equivalent to it:
5. Playfair’s version of the parallel postulate: Given any infinite
line L, and any point P not on that line, there exists a unique infinite
line through P that never crosses L.
The ancients believed this to be less obviously self-evident than the first
four, partly because if you were given the two lines, it could theoretically
take an infinite amount of time to inspect them and verify that they
never crossed, even at some very distant point. Euclid avoided even
mentioning infinite lines in postulates 1-4, and he considered postulate 5
to be so much less intuitively appealing in comparison that he organized
the Elements so that the first 28 propositions were those that could be
proved without resorting to it. Continuing the analogy with the reals
and hyperreals, the parallel postulate plays the role of the Archimedean
principle: a statement about infinity that we don’t feel quite so sure
about.
For centuries, geometers tried to prove the parallel postulate from the
first four. The trouble with this kind of thing was that it could be difficult
to tell what was a valid proof and what wasn’t. The postulates were
written in an ambiguous human language, not a formal logical system.
As an example of the kind of confusion that could result, suppose we
assume the following postulate, 5′, in place of 5:
5′: Given any infinite line L, and any point P not on that line, every
infinite line through P crosses L.
Postulate 5′ plays the role for noneuclidean geometry that the negation
of the Archimedean principle plays for the hyperreals. It tells us we’re
not in Kansas anymore. If a geometer can start from postulates 1-4
and 5′ and arrive at a contradiction, then he’s made significant progress
toward proving that postulate 5 has to be true based on postulates 1-4.
(He would also have to disprove another version of the postulate, in
which there is more than one parallel through P.) For centuries, there
have been reasonable-sounding arguments that seemed to give such a
contradiction. For instance, it was proved that a geometry with 5′ in it
was one in which distances were limited to some finite maximum. This
would appear to contradict postulate 3, since there would be a limit
on the radius of a circle. But there’s plenty of room for disagreement
here, because the ancient Greeks didn’t have any notion of a set of real
numbers. For them, the thing we would call a number was simply a
finite straight line (line segment) with a certain length. If postulate
3 says that we can make a circle given any radius, it’s reasonable to
interpret that as a statement that given any finite straight line as the
specification of the radius, we can make the circle. There is then no
contradiction, because the too-long radius can’t be specified in the first
place. This muddle is similar to the kind of confusion that reigned for
centuries after Newton: did infinitesimals lead to contradictions?
In the 19th century, Lobachevsky and Bolyai came up with a version of
Euclid’s axioms that was more rigorously defined, and that was carefully engineered to avoid the kinds of contradictions that had previously
been discovered in noneuclidean geometry. This is analogous to the invention of the transfer principle and the realization that the restriction
to first-order logic was necessary. Lobachevsky and Bolyai slaved away
for year after year proving new results in noneuclidean geometry, wondering whether they would ever reach a contradiction. Eventually they
started to doubt that there were ever going to be contradictions, and
finally they proved that the contradictions didn’t exist.
The technique for proving consistency was to make a model of the noneuclidean system. Consider geometry done on the surface of a sphere. The
word “line” in the axioms now has to be understood as referring to a
great circle, i.e., one with the same radius as the sphere. The parallel
postulate fails, because parallels don’t exist: every great circle intersects
every other great circle. One modification has to be made to the model
in order to make it consistent with the first postulate. The constructions
described in Euclid’s postulates are tacitly assumed to be unique (and
in more rigorous formulations are explicitly stated to be so). We want
there to be a unique line defined by any two distinct points. This works
fine on the sphere as long as the points aren’t too far apart, but it fails if
the points are antipodes, i.e., they lie at opposite sides of the sphere. For
example, every line of longitude on the Earth’s surface passes through
both poles. The solution to this problem is to modify what we mean by
“point.” Points at each other’s antipodes are considered to be the same
point. (Or, equivalently, we can do geometry on a hemisphere, but agree
that when we go off one edge, we “wrap around” to the opposite side.)
This spherical model obeys all the postulates of this particular system of
noneuclidean geometry. But consider now that we constructed it inside
a surrounding three-dimensional space in which the parallel postulate
does hold. Now suppose we keep on proving theorems in this system
of noneuclidean geometry, filling up page after page with proofs using
words like “line,” which we mentally associate with great circles on a
certain sphere — and eventually we reach a contradiction. But now we
can go back through our proofs, and in every place where the word “line”
occurs we can cross it out with a red pencil and put in “great circle on
this particular sphere.” It would now be a proof about Euclidean geometry, and the contradiction would prove that Euclidean geometry lacked
self-consistency. We therefore arrive at the result that if noneuclidean
geometry is inconsistent, so is Euclidean geometry. Since nobody believes that Euclidean geometry is inconsistent, this is considered the
moral equivalent of proving noneuclidean geometry to be consistent.
If you’ve been keeping the system of analogies in mind as you read this
story, it should be clear what’s coming next. If we want to prove that
the hyperreals have the same consistency as the reals, we just have to
construct a model of the hyperreals using the reals. This is done in detail
elsewhere (see Stroyan and in the references, p. 199).
I’ll just sketch the general idea. A hyperreal number is represented by
an infinite sequence of real numbers. For example, the sequence
7, 7, 7, 7, . . .
would be the hyperreal version of the number 7. A sequence like
1, 2, 3, . . .
represents an infinite number, while
1, 1/2, 1/3, . . .
is infinitesimal. All the arithmetic operations are defined by applying
them to the corresponding members of the sequences. For example, the
sum of the 7, 7, 7, . . . sequence and the 1, 2, 3, . . . sequence would be 8,
9, 10, . . . , which we interpret as a somewhat larger infinite number.
The big problem in this approach is how to compare hyperreals, because
a comparison like < is supposed to give an answer that is either true or
false. It’s not supposed to give a hyperreal number as the result.
It’s clear that 8, 9, 10, . . . is greater than 1, 1, 1, . . . , because every
member of the first sequence is greater than every member of the second one. But is 8, 9, 10, . . . greater than 9, 9, 9, . . . ? We want the
answer to be “yes,” because we’re thinking of the first one as an infinite
number and the second one as the ordinary finite number 9. The first
sequence is indeed greater than the second at almost every one of the
infinite number of places at which they could be compared. The only
place where it loses the contest is at the very first position, and the
only spot where we get a tie is the second one. Essentially the idea is
that we want to define a concept of what happens “almost everywhere”
on some infinite list. If one thing happens in an infinite number of
places and something else only happens at some finite number of spots,
then the definition of “almost everywhere” is clear. What’s harder is a
comparison of something like these two sequences:
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, . . .
1, 3, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 3, . . .
where the second sequence has longer and longer runs of ones interspersed between the threes. The two sequences are never equal at any
position, so clearly they can’t be considered to be equal as hyperreal
numbers. But there is an infinite number of spots in which the first
sequence is greater than the second, and likewise an infinite number in
which it’s less. It seems as though there are more in which it’s greater,
so we probably want to define the second sequence as being a hyperreal
number that’s less than 2. The problem is that it can be very difficult to
write down an acceptable definition of this “almost everywhere” notion.
The answer is very technical, and I won’t go into it here, but it can be
done. Because two sequences could be equal almost everywhere, we end
up having to define a hyperreal number not as a particular sequence but
as a set of sequences that are equal to each other almost everywhere.
With the construction of this model, it is possible to prove that the
hyperreals have the same level of consistency as the reals.
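To make the sequence picture a little more concrete, here is a rough Python sketch. It is not the actual construction: the names `constant`, `add`, `first_terms`, and `exceptions` are hypothetical, and the code can only inspect a finite stretch of each sequence, so it handles only the easy case in which a comparison fails at a few early positions. It makes no attempt at the technical “almost everywhere” definition needed for sequences like the 2, 2, 2, . . . versus 1, 3, 1, 1, 3, . . . pair.

```python
from fractions import Fraction


def constant(c):
    """The sequence c, c, c, ... standing for the ordinary number c."""
    return lambda n: Fraction(c)


def add(a, b):
    """Add two sequences member by member."""
    return lambda n: a(n) + b(n)


def first_terms(a, k):
    """List the first k members of a sequence (positions 1..k)."""
    return [a(n) for n in range(1, k + 1)]


def exceptions(a, b, k):
    """Positions n <= k at which a(n) > b(n) fails to hold."""
    return [n for n in range(1, k + 1) if not a(n) > b(n)]


seven = constant(7)
nine = constant(9)
counting = lambda n: Fraction(n)          # 1, 2, 3, ...   (infinite)
reciprocals = lambda n: Fraction(1, n)    # 1, 1/2, 1/3, ... (infinitesimal)

bigger = add(seven, counting)             # 8, 9, 10, ...
print(first_terms(bigger, 3))
# Where does "8, 9, 10, ... > 9, 9, 9, ..." fail among the first 1000 places?
print(exceptions(bigger, nine, 1000))
```

Only the first two positions show up as exceptions, matching the discussion above: the first position loses and the second is a tie.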
The transfer principle applied to functions
On page 34, I told you not to worry about whether it was legitimate
to apply familiar functions like x^2, √x, sin x, cos x, and e^x to hyperreal
numbers. But since you’re reading this, you’re obviously in need of more
For some of these functions, the transfer principle straightforwardly
guarantees that they work for hyperreals, have all the familiar properties, and can be computed in the same way. For example, the following
statement is in a suitable form to have the transfer principle applied to
it: For any real number x, x · x ≥ 0. Changing “real” to “hyperreal,”
we find out that the square of a hyperreal number is greater than or
equal to zero, just like the square of a real number. Writing it as x^2
or calling it a square is just a matter of notation and terminology. The
same applies to this statement: For any real number x ≥ 0, there exists
a real number y such that y^2 = x. Applying the transfer principle to it
tells us that square roots can be defined for the hyperreals as well.
There’s a problem, however, when we get to functions like sin x and
e^x. If you look up the definition of the sine function in a trigonometry
textbook, it will be defined geometrically, as the ratio of the lengths of
two sides of a certain triangle. The transfer principle doesn’t apply to
geometry, only to arithmetic. It’s not even obvious intuitively that it
makes sense to define a sine function on the hyperreals. In an application
like the differentiation of the sine function on page 28, we only had to
take sines of hyperreal numbers that were infinitesimally close to real
numbers, but if the sine is going to be a full-fledged function defined on
the hyperreals, then we should be allowed, for example, to take the sine
of an infinite number. What would that mean? If you take the sine of a
number like a million or a billion on your calculator, you just get some
apparently random result between −1 and 1. The sine function wiggles
back and forth indefinitely as x gets bigger and bigger, never settling
down to any specific limiting value. Apparently we could have sin H = 1
for a particular infinite H, and then sin(H + π/2) = 0, sin(H + π) = −1,
It turns out that the moral equivalent of the transfer principle can indeed
be applied to any function on the reals, yielding a function that is in
some sense its natural “big brother” on the hyperreals, but the
consequences can be either disturbing or exhilarating depending on your
tastes. For example, consider the function [x] that takes a real number
x and rounds it down to the greatest integer that is less than or equal
to x, e.g., [3] = 3, and [π] = 3. This function, like any other real
function, can be extended to the hyperreals, and that means that we
can define the hyperintegers, the set of hyperreals that satisfy [x] = x.
The hyperintegers include the integers as a subset, but they also include
infinite numbers. This is likely to seem magical, or even unreasonable,
if we come at the hyperreals from a purely axiomatic point of view. The
extension of functions to the hyperreals seems much more natural in
view of the construction of the hyperreals in terms of sequences given in
the preceding section. For example, the sequence 1.3, 2.3, 3.3, 4.3, 5.3, . . .
represents an infinite number. If we apply the [x] function to it, we get
1, 2, 3, 4, 5, . . ., which is an infinite integer.
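In the sequence picture, extending a real function just means applying it to every member of the sequence. The Python sketch below illustrates this for the rounding-down function. The helper name `extend` is hypothetical, exact fractions are used so that 1.3, 2.3, . . . are represented without rounding error, and only the first few members are shown.

```python
import math
from fractions import Fraction


def extend(f):
    """Turn a function on real numbers into one acting on sequences,
    by applying f to each member."""
    return lambda seq: (lambda n: f(seq(n)))


# The sequence 1.3, 2.3, 3.3, ... from the text, as exact fractions.
shifted = lambda n: Fraction(n) + Fraction(3, 10)

floor_on_sequences = extend(math.floor)
rounded = floor_on_sequences(shifted)

print([rounded(n) for n in range(1, 6)])
```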
Proof of the chain rule
In the statement of the chain rule on page 37, I followed my usual
custom of writing derivatives as dy/dx, when actually the derivative is
the standard part, st(dy/dx). In more rigorous notation, the chain rule
should be stated like this:
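$$\operatorname{st}\left(\frac{dz}{dx}\right) = \operatorname{st}\left(\frac{dz}{dy}\right)\operatorname{st}\left(\frac{dy}{dx}\right)$$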
The transfer principle allows us to rewrite the left-hand side as
st[(dz/dy)(dy/dx)], and then we can get the desired result using the
identity st(ab) = st(a)st(b).
Derivative of e^x
All of the reasoning on page 39 would have applied equally well to any
other exponential function with a different base, such as 2^x or 10^x.
Those functions would have different values of c, so if we want to determine the value of c for the base-e case, we need to bring in the definition
of e, or of the exponential function e^x, somehow.
We can take the definition of e^x to be
$$e^x = \lim_{n\to\infty}\left(1+\frac{x}{n}\right)^n$$
The idea behind this relation is similar to the idea of compound interest.
If the interest rate is 10%, compounded annually, then x = 0.1, and
the balance grows by a factor (1 + x) = 1.1 in one year. If, instead,
we want to compound the interest monthly, we can set the monthly
interest rate to 0.1/12, and then the growth of the balance over a year
is (1 + x/12)^12 = 1.1047, which is slightly larger because the interest from
the earlier months itself accrues interest in the later months. Continuing
this limiting process, we find e^0.1 = 1.1052.
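The compound-interest numbers are easy to reproduce. This Python sketch (the helper name `compound` is an assumption, and floating-point arithmetic stands in for the exact limit) evaluates (1 + x/n)^n for x = 0.1 with more and more compounding periods and compares the result with the library exponential.

```python
import math


def compound(x, n):
    """Growth factor after one year at rate x, compounded n times."""
    return (1 + x / n) ** n


for n in (1, 12, 365, 10**6):
    print(n, compound(0.1, n))

# Value of the limit, according to the math library's exponential.
print(math.exp(0.1))
```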
If n is large, then we have a good approximation to the base-e exponential, so let’s differentiate this finite-n approximation and try to
find an approximation to the derivative of e^x. The chain rule tells us
that the derivative of (1 + x/n)^n is the derivative of the raising-to-the-nth-power function, multiplied by the derivative of the inside stuff,
d(1 + x/n)/dx = 1/n. We then have
$$\frac{d}{dx}\left(1+\frac{x}{n}\right)^n = n\left(1+\frac{x}{n}\right)^{n-1}\cdot\frac{1}{n} = \left(1+\frac{x}{n}\right)^{n-1}$$
But evaluating this at x = 0 simply gives 1, so at x = 0, the approximation to the derivative is exactly 1 for all values of n; it’s not even
necessary to imagine going to larger and larger values of n. This establishes that c = 1, so we have
$$\frac{d e^x}{dx} = e^x$$
for all values of x.
Proofs of the generalizations of l’Hôpital’s rule
Multiple applications of the rule
Here we prove, as claimed on p. 65, that the form of l’Hôpital’s rule
given on p. 60 can be generalized to the case where more than
one application of the rule is required. The proof requires material
from ch. 4 (integration and the mean value theorem), and, as discussed
in example 85 on p. 110, the motivation for the result becomes much
more transparent once one has read ch. 7 and knows about Taylor series.
The reader who has arrived here while reading ch. 3 will need to defer
reading this section of the proof until after ch. 4, and may wish to wait
until after ch. 7.
The proof can be broken down into two steps.
Step 1: We first have to establish a stronger form of l’Hôpital’s rule that
states that lim u/v = lim u̇/v̇ rather than lim u/v = u̇/v̇. This form is
stronger, because in a case like example 46 on p. 65, u̇/v̇ isn’t defined,
but lim u̇/v̇ is.
We prove the stronger form using the mean value theorem (p. 74). For
simplicity of notation, let’s assume that the limit is being taken at x = 0.
By the fundamental theorem of calculus, we have u(x) = ∫_0^x u̇(x′) dx′,
and the mean value theorem then tells us that for some p between 0 and
x, u(x) = xu̇(p). Likewise for a q in this interval, v(x) = xv̇(q). So
$$\lim_{x\to 0}\frac{u}{v} = \lim_{p,q\to 0}\frac{\dot{u}(p)}{\dot{v}(q)}$$
but since both p and q are closer to zero than x is, the limit as they
simultaneously approach zero is the same as the limit as x approaches
zero.
Step 2: If we need to take n derivatives, the proof follows by applying
the extra-strength rule n times.3
Change of variable
We will build up the rest of the features of l’Hôpital’s rule using the
technique of a change of variable. To demonstrate how this works, let’s
imagine that we were starting from an even more stripped-down version
of l’Hôpital’s rule than the one on p. 60. Say we only knew how to do
limits of the form x → 0 rather than x → a for an arbitrary real number
a. We could then evaluate limx→a u/v simply by defining t = x − a and
reexpressing u and v in terms of t.
Example 103
Reduce
$$\lim_{x\to\pi}\frac{\sin x}{x-\pi}$$
to a form involving a limit at 0.
Define t = x − π. Solving for x gives x = t + π. We substitute into the above
expression to find
$$\lim_{x\to\pi}\frac{\sin x}{x-\pi} = \lim_{t\to 0}\frac{\sin(t+\pi)}{t}$$
If all we knew was the → 0 form of l’Hôpital’s rule, then this would suffice to
reduce the problem to one we knew how to solve. In fact, this kind of change of
variable works in all cases, not just for a limit at π, so rather than going through
a laborious change of variable every time, we could simply establish the more
general form on p. 60, with → a.
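A quick numerical sketch in Python shows that the change of variable leaves the values unchanged. It is an illustration only: the function names `original` and `shifted` are hypothetical, and evaluating at a few small values of t is not a proof of the limit.

```python
import math


def original(x):
    """The expression sin x / (x - pi), whose limit is taken as x -> pi."""
    return math.sin(x) / (x - math.pi)


def shifted(t):
    """The same expression after substituting x = t + pi (limit as t -> 0)."""
    return math.sin(t + math.pi) / t


for t in (1e-1, 1e-2, 1e-3):
    # Both columns should agree, and both should settle toward one value.
    print(t, original(math.pi + t), shifted(t))
```

Both forms head toward −1, which is what applying the → 0 form of the rule to sin(t + π)/t gives, since the derivative of the top at t = 0 is cos π = −1 and the derivative of the bottom is 1.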
The indeterminate form ∞/∞
To prove that l’Hôpital’s rule works in general for ∞/∞ forms, we do a
change of variable on the outputs of the functions u and v rather than
their inputs. Suppose that our original problem is of the form
$$\lim\frac{u}{v}$$
where both functions blow up.4 We then define U = 1/u and V = 1/v.
We now have
$$\lim\frac{u}{v} = \lim\frac{1/U}{1/V} = \lim\frac{V}{U}$$
and since U and V both approach zero, we have reduced the problem
to one that can be solved using the version of l’Hôpital’s rule already
proved for indeterminate forms like 0/0. Differentiating and applying
the chain rule, we have
$$\lim\frac{u}{v} = \lim\frac{\dot{V}}{\dot{U}} = \lim\frac{-v^{-2}\dot{v}}{-u^{-2}\dot{u}}$$
Since lim ab = lim a lim b provided that lim a and lim b are both defined,
we can rearrange factors to produce the desired result.
3 There is a logical subtlety here, which is that although we’ve given a clear-cut
recipe for cooking up a proof for any given n, that isn’t quite the same thing as
proving it for any positive integer n. This is an example where what we really need
is a technique called proof by induction. In general, proof by induction works like
this. Suppose we prove some statement about the integer 1, e.g., that l’Hôpital’s
rule is valid when you take 1 derivative. Now say that we can also prove that if that
statement holds for a given n, it also holds for n + 1. Proof by induction means that
we can then consider the statement as having been proved for all positive integers.
For suppose the contrary. Then there would be some least n for which it failed, but
this would be a contradiction, since it would hold for n − 1.
This change of variable is a specific example of a much more general
method of problem-solving in which we look for a way to reduce a hard
problem to an easier one. We will encounter changes of variable again on
p. 85 as a technique for integration, which means undoing the operation
of differentiation.
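The rearrangement of factors mentioned at the end of the ∞/∞ argument above can be written out as follows. This is one possible way to fill in the step, not necessarily the route the author had in mind, and it assumes that lim u/v and lim v̇/u̇ both exist and are finite and nonzero, so that dividing by them is allowed.

```latex
\begin{align*}
\lim\frac{u}{v}
  &= \lim\frac{-v^{-2}\dot{v}}{-u^{-2}\dot{u}}
   = \lim\left[\left(\frac{u}{v}\right)^{2}\frac{\dot{v}}{\dot{u}}\right]
   = \left(\lim\frac{u}{v}\right)^{2}\,\lim\frac{\dot{v}}{\dot{u}} \\
\Rightarrow\quad
\lim\frac{u}{v}
  &= \frac{1}{\lim\,\dot{v}/\dot{u}}
   = \lim\frac{\dot{u}}{\dot{v}}
\end{align*}
```

The second line comes from dividing both sides of the first by (lim u/v)^2 and then inverting.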
Proof of the fundamental theorem of calculus
There are three parts to the proof: (1) Take the equation that states
the fundamental theorem, differentiate both sides with respect to b, and
show that they’re equal. (2) Show that continuous functions with equal
derivatives must be essentially the same function, except for an additive
constant. (3) Show that the constant in question is zero.
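4 Think about what happens when only u blows up, or only v.
1. By the definition of the indefinite integral, the derivative of x(b)−x(a)
with respect to b equals ẋ(b). We have to establish that this equals the
following:
$$\frac{d}{db}\int_a^b \dot{x}(t)\,dt = \mathrm{st}\,\frac{1}{db}\left[\int_a^{b+db}\dot{x}(t)\,dt - \int_a^b \dot{x}(t)\,dt\right]$$
$$= \mathrm{st}\,\frac{1}{db}\int_b^{b+db}\dot{x}(t)\,dt$$
$$= \mathrm{st}\,\frac{1}{db}\lim_{H\to\infty}\sum_{i=0}^{H}\dot{x}(b + i\,db/H)\,\frac{db}{H}$$
$$= \mathrm{st}\lim_{H\to\infty}\frac{1}{H}\sum_{i=0}^{H}\dot{x}(b + i\,db/H)$$
Since ẋ is continuous, all the values of ẋ occurring inside the sum can
differ only infinitesimally from ẋ(b). Therefore the quantity inside the
limit differs only infinitesimally from ẋ(b), and the standard part of its
limit must be ẋ(b).5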
2. Suppose f and g are two continuous functions whose derivatives are
equal. Then d = f − g is a continuous function whose derivative is zero.
But the only continuous function with a derivative of zero is a constant,
so f and g differ by at most an additive constant.
3. I’ve established that the derivatives with respect to b of x(b) − x(a)
and $\int_a^b \dot{x}\,dt$ are the same, so they differ by at most an additive constant.
But at b = a, they’re both zero, so the constant must be zero.
5 If you don’t want to use infinitesimals, then you can express the derivative as a
limit, and in the final step of the argument use the mean value theorem, introduced
later in the chapter.
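The key step in part 1 is that averaging ẋ over a very narrow strip starting at b gives back ẋ(b). Here is a rough Python sketch of that idea. A small finite db stands in for the infinitesimal and a large finite H for the limit, so this is only an approximation to the hyperreal argument. The name `average_over_strip`, the choice ẋ = cos, and the summation range i = 0 to H are assumptions made for this illustration.

```python
import math

def average_over_strip(x_dot, b, db, H):
    """Return (1/H) * sum of x_dot(b + i*db/H) for i = 0..H.

    A small finite db stands in for the infinitesimal, and a large
    finite H stands in for the limit H -> infinity.
    """
    return sum(x_dot(b + i * db / H) for i in range(H + 1)) / H

b = 1.0
for db in (1e-1, 1e-3, 1e-5):
    estimate = average_over_strip(math.cos, b, db, H=10000)
    print(f"db = {db:g}: average = {estimate:.6f}, x_dot(b) = {math.cos(b):.6f}")
```

As db shrinks, the average settles toward cos(1), which is the value of ẋ at b in this example.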
The intermediate value theorem
On page 54 I asserted that the intermediate value theorem was really
more a statement about the (real or hyperreal) number system than
about functions. For insight, consider figure b, which is a geometrical
construction that constitutes the proof of the very first proposition in
Euclid’s celebrated Elements. The proposition to be proved is that given
a line segment AB, it is possible to construct an equilateral triangle with
AB as its base. The proof is by construction; that is, Euclid doesn’t
just give a logical argument that convinces us the triangle must exist,
he actually demonstrates how to construct it. First we draw a circle
with center A and radius AB, which his third postulate says we can do.
Then we draw another circle with the same radius, but centered at B.
Pick one of the intersections of the circles and call it C. Construct the
line segments AC and BC (postulate 1). Then AC equals AB by the
definition of the circle, and likewise BC equals AB. Euclid also has an
axiom that things equal to the same thing are equal to one another, so
it follows that AC equals BC, and therefore the triangle is equilateral.
b / A proof from Euclid’s Elements.
It seems like a model of mathematical rigor, but there’s a flaw in the
reasoning, which is that he assumes without justification that the circles do have a point in common. To see that this is not as secure an
assumption as it seems, consider the usual Cartesian representation of
plane geometry in terms of coordinates (x, y). Usually we assume that x
and y are real numbers. What if we instead do our Cartesian geometry
using rational numbers as coordinates? Euclid’s five postulates are all
consistent with this. For example, circles do exist. Let A = (0, 0) and
B = (1, 0). Then there are infinitely many pairs of rational numbers in
the set that satisfies the definition of the circle centered at A. Examples
include (3/5, 4/5) and (−7/25, 24/25). The circle is also continuous in
the sense that if I specify a point on it such as (−7/25, 24/25), and a
distance that I’m allowed to make as small as I please, say $10^{-6}$, then
other points exist on the circle within that distance of the given point.
However, the intersection assumed by Euclid’s proof doesn’t exist. It
would lie at $(1/2, \sqrt{3}/2)$, but $\sqrt{3}$ doesn’t exist in the rational number
system.
In exactly the same way, we can construct counterexamples to the intermediate value theorem if the underlying system of numbers doesn’t
have the same properties as the real numbers. For example, let $y = x^2$.
Then y is a continuous function, on the interval from 0 to 1, but if
we take the rational numbers as our foundation, then there is no x for
which y = 1/2. The solution would be $x = 1/\sqrt{2}$, which doesn’t exist in
the rational number system. Notice the similarity between this problem
and the one in Euclid’s proof. In both cases we have curves that cut
one another without having an intersection. In the present example, the
curves are the graphs of the functions $y = x^2$ and y = 1/2.
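To make the rational-number picture concrete, here is a small Python sketch using exact fractions. It checks that the two sample points really lie on the rational unit circle, and then bisects the interval from 0 to 1 looking for a rational x with x² = 1/2. The function names are hypothetical, and a finite search like this can only illustrate the gap, not prove that no rational solution exists.

```python
from fractions import Fraction

def on_unit_circle(point):
    """Exact test of x^2 + y^2 = 1 for a point with rational coordinates."""
    x, y = point
    return x * x + y * y == 1

def bisect_rational(target, steps):
    """Bisect [0, 1] with exact rationals, looking for x with x*x == target.

    Returns (lo, hi, hit), where hit is True only if an exact solution
    was found among the midpoints tried.
    """
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid == target:
            return mid, mid, True
        if mid * mid < target:
            lo = mid
        else:
            hi = mid
    return lo, hi, False

points = [(Fraction(3, 5), Fraction(4, 5)), (Fraction(-7, 25), Fraction(24, 25))]
print([on_unit_circle(p) for p in points])

lo, hi, hit = bisect_rational(Fraction(1, 2), 40)
print(hit, float(lo), float(hi))
```

The bracketing interval keeps shrinking, with the curve below 1/2 at one end and above it at the other, but none of the rational midpoints ever lands exactly on 1/2.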
The interpretation is that the real numbers are in some sense more
densely packed than the rationals, and with two thousand years worth of
hindsight, we can see that Euclid should have included a sixth postulate
that expressed this density property. One possible way of stating such
a postulate is the following. Let L be a ray, and O its endpoint. We
think of O as the origin of the positive number line. Let P and Q be
sets of points on L such that every point in P is closer to O than every
point in Q. Then there exists some point Z on L such that Z lies at
least as far from O as every point in P, but no farther than any point in
Q. Technically this property is known as completeness. As an example,
let $P = \{x \mid x^2 < 2\}$ and $Q = \{x \mid x^2 \ge 2\}$. Then the point Z would
have to be $\sqrt{2}$, which shows that the rationals are not complete. The
reals are complete, and the completeness axiom can serve as one of the
fundamental axioms of the real numbers.
Note that the axiom refers to sets P and Q, and says that a certain
fact is true for any choice of those sets; it therefore isn’t the type of
proposition that is covered by the transfer principle, and in fact it fails
for the hyperreals, as we can see if P is the set of all infinitesimals and
Q the positive real numbers.
Here is a skeletal proof of the intermediate value theorem, in which I’ll
make some simplifying assumptions and leave out some cases. We want
to prove that if y is a continuous real-valued function on the real interval
from a to b, and if y takes on values y1 and y2 at certain points within
this interval, then for any y3 between y1 and y2 , there is some real x in
the interval for which y(x) = y3 . I’ll assume the case in which x1 < x2
and y1 < y2 . Define sets of real numbers P = {x|y ≤ y3 }, and let
Q = {x|y ≥ y3 }. For simplicity, I’ll assume that every member of P is
less than or equal to every member of Q, which happens, for example,
if the function y(x) is always increasing on the interval [a, b]. If P and
Q intersect, then the theorem holds. Suppose instead that P and Q do
not intersect. Using the completeness axiom, there exists some real x
which is greater than or equal to every element of P and less than or
equal to every element of Q. Suppose x belongs to P. Then the following
statement is in the right form for the transfer principle to apply to it:
for any number x′ > x, y(x′) > y3. We can conclude that the statement
is also true for the hyperreals, so that if dx is a positive infinitesimal and
x′ = x + dx, we have y(x) < y3, but y(x + dx) > y3. Then by continuity,
y(x) − y(x + dx) is infinitesimal. But y(x) < y3 and y(x + dx) > y3 , so
the standard part of y(x) must equal y3 . By assumption y takes on real
values for real arguments, so y(x) = y3 . The same reasoning applies if
x belongs to Q, and since x must belong either to P or to Q, the result
is proved.
For an alternative proof of the intermediate value theorem by an entirely
different technique, see Keisler (references, p. 199).
As a side issue, we could ask whether there is anything like the intermediate value theorem that can be applied to functions on the hyperreals.
Our definition of continuity on page 53 explicitly states that it only
applies to real functions. Even if we could apply the definition to a
function on the hyperreals, the proof given above would fail, since the
hyperreals lack the completeness property. As a counterexample, let ε be some positive infinitesimal, and define a function y such that y = −ε
when st(x) ≤ 0 and y = ε everywhere else. If we insist on applying
the definition of continuity to this function, it appears to be continuous,
so it violates the intermediate value theorem. Note, however, that the
way this function is defined is different from the way we usually define
functions on the hyperreals. Usually we define a function on the reals,
say $y = x^2$, in language to which the transfer principle applies, and then
we use the transfer principle to reason about the function’s analog on
the hyperreals. For instance, the function $y = x^2$ has the property that
y ≥ 0 everywhere, and the transfer principle guarantees that that’s also
true if we take $y = x^2$ as the definition of a function on the hyperreals.
For functions defined in this way, the intermediate value theorem makes
a statement that the transfer principle applies to, and it is therefore
true for the hyperreal version of the function as well.
Proof of the extreme value theorem
The extreme value theorem was stated on page 56. Before we can prove
it, we need to establish some preliminaries, which turn out to be interesting for their own sake.
Definition: Let C be a subset of the real numbers whose definition can
be expressed in the type of language to which the transfer principle
applies. Then C is compact if for every hyperreal number x satisfying
the definition of C, the standard part of x exists and is a member of C.
To understand the content of this definition, we need to look at the two
ways in which a set could fail to satisfy it.
First, suppose U is defined by x ≥ 0. Then there are positive infinite
hyperreal numbers that satisfy the definition, and their standard part is
not defined, so U is not compact. The reason U is not compact is that
it is unbounded.
Second, let V be defined by 0 ≤ x < 1. Then if dx is a positive infinitesimal, 1−dx satisfies the definition of V , but its standard part is 1, which
is not in V , so V is not compact. The set V has boundary points at
0 and 1, and the reason it is not compact is that it doesn’t contain its
right-hand boundary point. A boundary point is a real number which
is infinitesimally close to some points inside the set, and also to some
other points that are on the outside.
We therefore arrive at the following alternative characterization of the
notion of a compact set, whose proof is straightforward.
Theorem: A set is compact if and only if it is bounded and contains all
of its boundary points.
Intuitively, the reason compact sets are interesting is that if you’re standing inside a compact set and start taking steps in a certain direction,
without ever turning around, you’re guaranteed to approach some point
in the set as a limit. (You might step over some gaps that aren’t included in the set.) If the set was unbounded, you could just walk forever
at a constant speed. If the set didn’t contain its boundary point, then
you could asymptotically approach the boundary, but the goal you were
approaching wouldn’t be a member of the set.
The following theorem turns out to be the most difficult part of the
discussion.
Theorem: A compact set contains its maximum and minimum.
Proof: Let C be a compact set. We know it’s bounded, so let M be the
set of all real numbers that are greater than any member of C. By the
completeness property of the real numbers, there is some real number x
between C and M. Let ∗C be the set of hyperreal numbers that satisfies
the same definition that C does.
Every real x′ greater than x fails to satisfy the condition that defines
C, and by the transfer principle the same must be true if x′ is any
hyperreal, so if dx is a positive infinitesimal, x + dx must be outside of
∗C.
But now consider x − dx. The following statement holds for the reals:
there is no number x′ < x that is greater than every member of C. By
the transfer principle, we find that there is some hyperreal number q
in ∗C that is greater than x − dx. But the standard part of q must
equal x, for otherwise st q would be a member of C that was greater
than x. Therefore x is a boundary point of C, and since C is compact,
x is a member of C. We conclude C contains its maximum. A similar
argument shows that C contains its minimum, so the theorem is proved.
There were two subtle things about this proof. The first was that we
ended up constructing the set of hyperreals ∗C, which was the hyperreal
“big brother” of the real set C. This is exactly the sort of thing that the
transfer principle does not guarantee we can do. However, if you look
back through the proof, you can see that ∗C is used only as a notational
convenience. Rather than talking about whether a certain number was a
member of ∗C, we could have referred, more cumbersomely, to whether
or not it satisfied the condition that had originally been used to define
C. The price we paid for this was a slight loss of generality. There
are so many different sets of real numbers that they can’t possibly all
have explicit definitions that can be written down on a piece of paper.
However, there is very little reason to be interested in studying the
properties of a set that we were never able to define in the first place.
The other subtlety was that we had to construct the auxiliary point
x − dx, but there was not much we could actually say about x − dx
itself. In particular, it might or might not have been a member of C.
For example, if C is defined by the condition x = 0, then ∗C likewise
contains only the single element 0, and x − dx is not a member of ∗C.
But if C is defined by 0 ≤ x ≤ 1, then x − dx is a member of ∗C.
The original goal was to prove the extreme value theorem, which is a
statement about continuous functions, but so far we haven’t said anything about functions.
Lemma: Let f be a real function defined on a set of points C. Let D be
the image of C, i.e., the set of all values f (x) that occur for some x in
C. Then if f is continuous and C is compact, D is compact as well. In
other words, continuous functions take compact sets to compact sets.
Proof: Let y = f (x) be any hyperreal output corresponding to a hyperreal input x in ∗C. We need to prove that the standard part of y
exists, and is a member of D. Since C is compact, the standard part
of x exists and is a member of C. But then by continuity y differs only
infinitesimally from f (st x), which is real, so st y = f (st x) is defined and
is a member of D.
We are now ready to prove the extreme value theorem, in a version
slightly more general than the one originally given on page 56.
The extreme value theorem: Any continuous function on a compact set
achieves a maximum and minimum value, and does so at specific points
in the set.
Proof: Let f be continuous, and let C be the compact set on which
we seek its maximum and minimum. Then the image D as defined in
the lemma above is compact. Therefore D contains its maximum and
minimum values.
Proof of the mean value theorem
Suppose that the mean value theorem is violated. Let L be the set of all
x in the interval from a to b such that y(x) < ȳ, and likewise let M be
the set with y(x) > ȳ. If the theorem is violated, then the union of these
two sets covers the entire interval from a to b. Neither one can be empty;
if, for example, M was empty, then we would have y < ȳ everywhere
and also $\int_a^b y = \int_a^b \bar{y}$, but it follows directly from the definition of the
definite integral that when one function is less than another, its integral
is also less than the other’s. Since y takes on values less than and greater
than ȳ, it follows from the intermediate value theorem that y takes on
the value ȳ somewhere (intuitively, at a boundary between L and M ).
Proof of the fundamental theorem of algebra
We start with the following lemma, which is intuitively obvious, because
polynomials don’t have asymptotes. Its proof is given after the proof of
the main theorem.
Lemma: For any polynomial P (z) in the complex plane, its magnitude
|P (z)| achieves its minimum value at some specific point zo .
The fundamental theorem of algebra: In the complex number system, a
nonzero nth-order polynomial has exactly n roots, i.e., it can be factored
into the form $P(z) = (z-a_1)(z-a_2)\ldots(z-a_n)$, where the $a_i$ are complex
numbers.
Proof: The proofs in the cases of n = 0 and 1 are trivial, so our strategy
is to reduce higher-n cases to lower ones. If an nth-degree polynomial P
has at least one root, a, then we can always reduce it to a polynomial of
degree n − 1 by dividing it by (z − a). Therefore the theorem is proved
by induction provided that we can show that every polynomial of degree
greater than zero has at least one root.
Suppose, on the contrary, that there is an nth order polynomial P (z),
with n > 0, that has no roots at all. Then by the lemma |P | achieves
its minimum value at some point $z_o$. To make things more simple and
concrete, we can construct another polynomial $Q(z) = P(z + z_o)/P(z_o)$,
so that |Q| has a minimum value of 1, achieved at Q(0) = 1. This means
that Q’s constant term is 1. What about its other terms? Let
$Q(z) = 1 + c_1 z + \ldots + c_n z^n$. Suppose $c_1$ was nonzero. Then for infinitesimally small
values of z, the terms of order $z^2$ and higher would be negligible, and
we could make Q(z) be a real number less than one by an appropriate
choice of z’s argument. Therefore $c_1$ must be zero. But that means that
if $c_2$ is nonzero, then for infinitesimally small z, the $z^2$ term dominates
the $z^3$ and higher terms, and again this would allow us to make Q(z) be
real and less than one for appropriately chosen values of z. Continuing
this process, we find that Q(z) has no terms at all beyond the constant
term, i.e., Q(z) = 1. This contradicts the assumption that n was greater
than zero, so we’ve proved by contradiction that there is no P with the
properties claimed.
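The step "choose z’s argument appropriately" can be tried out numerically. The following Python sketch takes a sample Q with made-up coefficients, finds its lowest-order nonzero term, and picks a small z whose argument makes that term real and negative, so that |Q(z)| dips below 1. The coefficients and the function names `Q` and `descending_z` are hypothetical, and a small finite radius stands in for an infinitesimal one.

```python
import cmath

def Q(z, coeffs):
    """Evaluate Q(z) = 1 + c1*z + c2*z**2 + ... for coeffs = [c1, c2, ...]."""
    return 1 + sum(c * z ** (k + 1) for k, c in enumerate(coeffs))

def descending_z(coeffs, r):
    """Return z with |z| = r whose argument makes the lowest-order
    nonzero term c_k * z**k a negative real number."""
    for k, c in enumerate(coeffs, start=1):
        if c != 0:
            # arg(c_k z^k) = arg(c_k) + k*arg(z); set this equal to pi.
            angle = (cmath.pi - cmath.phase(c)) / k
            return cmath.rect(r, angle)
    raise ValueError("Q is the constant 1, so there is no direction of descent")

coeffs = [0, 2 + 1j, -3, 0.5j]   # sample values: c1 = 0, c2 = 2 + i, ...
z = descending_z(coeffs, 1e-3)
print(abs(Q(z, coeffs)))         # slightly less than 1
```

For this sample, c₁ = 0 and the z² term takes over, exactly as in the second round of the argument above.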
Uninteresting proof of the lemma: Let M (r) be the minimum value of
|P (z)| on the disk defined by |z| ≤ r. We first prove that M (r) can’t
asymptotically approach a minimum as r approaches infinity. Suppose
to the contrary: for every r, there is some r′ > r with M (r′) < M (r).
Then by the transfer principle, the same would have to be true for
hyperreal values of r. But it’s clear that if r is infinite, the lower-order
terms of P will be infinitesimally small compared to the highest-order
term, and therefore M (r) is infinite for infinite values of r, which is
a contradiction, since by construction M is decreasing, and finite for
finite r. We can therefore conclude by the extreme value theorem that
M achieves its minimum for some specific value of r. The least such r
describes a circle |z| = r in the complex plane, and the minimum of |P |
on this circle must be the same as its global minimum. Applying the
extreme value theorem to |P (z)| as a function of arg z on the interval
0 ≤ arg z ≤ 2π, we establish the desired result.
B Answers and solutions
Answers to Self-Checks
Answers to self-checks for chapter 4
page 78, self-check 1:
The area under the curve from 130 to 135 cm is about 3/4 of a rectangle.
The area from 135 to 140 cm is about 1.5 rectangles. The number of people in the second range is about twice as much. We could have converted
these to actual probabilities (1 rectangle = 5 cm × 0.005 $\mathrm{cm}^{-1}$ = 0.025),
but that would have been pointless, because we were just going to compare the two areas.
Answers to self-checks for chapter 6
page 118, self-check 1: Say we’re looking for $u = \sqrt{z}$, i.e., we want a
number u that, multiplied by itself, equals z. Multiplication multiplies
the magnitudes, so the magnitude of u can be found by taking the square
root of the magnitude of z. Since multiplication also adds the arguments
of the numbers, squaring a number doubles its argument. Therefore we
can simply divide the argument of z by two to find the argument of
u. This results in one of the square roots of z. There is another one,
which is −u, since $(-u)^2$ is the same as $u^2$. This may seem a little odd:
if u was chosen so that doubling its argument gave the argument of z,
then how can the same be true for −u? Well for example, suppose the
argument of z is 4°. Then arg u = 2°, and arg(−u) = 182°. Doubling
182 gives 364, which is actually a synonym for 4 degrees.
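The recipe above (square root of the magnitude, half the argument) can be written out in a few lines of Python with the standard `cmath` module. This is only a sketch: the function name `square_roots` is hypothetical, and the sample z has magnitude 1 and argument 4°, matching the example in the answer.

```python
import cmath
import math

def square_roots(z):
    """Return both square roots of the complex number z using polar form."""
    magnitude, argument = cmath.polar(z)
    # Square root of the magnitude, half of the argument.
    u = cmath.rect(math.sqrt(magnitude), argument / 2)
    return u, -u

z = cmath.rect(1.0, math.radians(4.0))
u, minus_u = square_roots(z)
for root in (u, minus_u):
    degrees = math.degrees(cmath.phase(root)) % 360.0
    print(f"arg = {degrees:.1f} degrees, root squared = {root * root:.6f}")
```

Python reports the argument of −u as −178°, which the `% 360.0` converts to the equivalent 182°.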
Solutions to homework problems
Solutions for chapter 1
page 21, problem 1:
The tangent line has to pass through the point (3,9), and it also seems,
at least approximately, to pass through (1.5,0). This gives it a slope of
(9 − 0)/(3 − 1.5) = 9/1.5 = 6, and that’s exactly what 2t is at t = 3.
a / Problem 1.
page 21, problem 2:
The tangent line has to pass through the point $(0, \sin(e^0)) = (0, 0.84)$,
and it also seems, at least approximately, to pass through (-1.6,0). This
gives it a slope of (0.84 − 0)/(0 − (−1.6)) = 0.84/1.6 = 0.53. The more
accurate result given in the problem can be found using the methods of
chapter 2.
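The graphical estimates in problems 1 and 2 can be cross-checked numerically. This minimal Python sketch uses a symmetric difference quotient with a small finite step, which is a stand-in technique of mine and not the method the problems ask for. The name `slope` and the step size are assumptions of the sketch.

```python
import math

def slope(f, t, h=1e-6):
    """Estimate the slope of f at t from two nearby points on its graph."""
    return (f(t + h) - f(t - h)) / (2 * h)

# Problem 1: the function t^2 at t = 3.
print(slope(lambda t: t * t, 3.0))

# Problem 2: the function sin(e^t) at t = 0.
print(slope(lambda t: math.sin(math.exp(t)), 0.0))
```

The second estimate lands close to cos 1, which is roughly 0.54, so the value 0.53 read off the graph is in the right neighborhood.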
b / Problem 2.
page 21, problem 3:
The derivative is a rate of change, so the derivatives of the constants
1 and 7, which don’t change, are clearly zero. The derivative can be
interpreted geometrically as the slope of the tangent line, and since the
functions t and 7t are lines, their derivatives are simply their slopes, 1,
and 7. All of these could also have been found using the formula that
says the derivative of $t^k$ is $kt^{k-1}$, but it wasn’t really necessary to get
that fancy. To find the derivative of $t^2$, we can use the formula, which
gives 2t. One of the properties of the derivative is that multiplying a
function by a constant multiplies its derivative by the same constant, so
the derivative of $7t^2$ must be (7)(2t) = 14t. By similar reasoning, the
derivatives of $t^3$ and $7t^3$ are $3t^2$ and $21t^2$, respectively.
page 21, problem 4:
One of the properties of the derivative is that the derivative of a sum is
the sum of the derivatives, so we can get this by adding up the derivatives
of $3t^7$, $-4t^2$, and 6. The derivatives of the three terms are $21t^6$, $-8t$,
and 0, so the derivative of the whole thing is $21t^6 - 8t$.
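The two rules used in problems 3 and 4 (differentiate term by term, and turn $t^k$ into $kt^{k-1}$) are mechanical enough to fit in one line of Python. In this sketch a polynomial is stored as a list of coefficients. That representation and the name `differentiate` are choices made for the illustration.

```python
def differentiate(coeffs):
    """Apply the power rule term by term.

    coeffs[k] is the coefficient of t**k, so 3t^7 - 4t^2 + 6 is
    [6, 0, -4, 0, 0, 0, 0, 3]. The result uses the same convention.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]

problem_4 = [6, 0, -4, 0, 0, 0, 0, 3]
print(differentiate(problem_4))   # coefficients of -8t + 21t^6
```

Reading the output list from the constant term upward gives 0, then −8 for t, and 21 for $t^6$, in agreement with the answer above.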
page 21, problem 5:
This is exactly like problem 4, except that instead of explicit numerical
constants like 3 and −4, this problem involves symbolic constants a, b,
and c. The result is 2at + b.
page 21, problem 6:
The first thing that comes to mind is 3t. Its graph would be a line with
a slope of 3, passing through the origin. Any other line with a slope of
3 would work too, e.g., 3t + 1.
page 21, problem 7:
Differentiation lowers the power of a monomial by one, so to get something with an exponent of 7, we need to differentiate something with an
exponent of 8. The derivative of $t^8$ would be $8t^7$, which is eight times
too big, so we really need $(t^8/8)$. As in problem 6, any other function
that differed by an additive constant would also work, e.g., $(t^8/8) + 1$.
page 21, problem 8:
This is just like problem 7, but we need something whose derivative
is three times bigger. Since multiplying by a constant multiplies the
derivative by the same constant, the way to accomplish this is to take
the answer to problem 7, and multiply by three. A possible answer is
$(3/8)t^8$, or that function plus any constant.
page 21, problem 9:
This is just a slight generalization of problem 8. Since the derivative
of a sum is the sum of the derivatives, we just need to handle each
term individually, and then add up the results. The answer is $(3/8)t^8 -
(4/3)t^3 + 6t$, or that function plus any constant.
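A quick way to check an answer like this is to differentiate it and see whether the original function comes back. Writing the arbitrary additive constant as C (a symbol introduced here for the check), the calculation is:

```latex
\frac{d}{dt}\left[\frac{3}{8}t^8 - \frac{4}{3}t^3 + 6t + C\right]
  = \frac{3}{8}\cdot 8t^7 - \frac{4}{3}\cdot 3t^2 + 6 + 0
  = 3t^7 - 4t^2 + 6
```

The constant C drops out, which is why any value of it gives an equally valid answer.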
page 21, problem 10:
The function $v = (4/3)\pi(ct)^3$ looks scary and complicated, but it’s
nothing more than a constant multiplied by $t^3$, if we rewrite it as
$v = \left[(4/3)\pi c^3\right]t^3$. The whole thing in square brackets is simply one big
constant, which just comes along for the ride when we differentiate.
The result is $\dot{v} = \left[(4/3)\pi c^3\right](3t^2)$, or, simplifying, $\dot{v} = (4\pi c^3)t^2$. (For
further physical insight, we can factor this as $\left[4\pi(ct)^2\right]c$, where ct is the
radius of the expanding sphere, and the part in brackets is the sphere’s
surface area.)
For purposes of checking the units, we can ignore the unitless constant 4π, which just leaves $c^3t^2$.
This has units of
$(\text{meters per second})^3(\text{seconds})^2$, which works out to be cubic meters per
second. That makes sense, because it tells us how quickly a volume is
increasing over time.
page 21, problem 11:
This is similar to problem 10, in that it looks scary, but we can rewrite
it as a simple monomial, $K = (1/2)mv^2 = (1/2)m(at)^2 = (ma^2/2)t^2$.
The derivative is $(ma^2/2)(2t) = ma^2t$. The car needs more and more
power to accelerate as its speed increases.
To check the units, we just need to show that the expression $ma^2t$ has
units that are like those of the original expression for K, but divided
by seconds, since it’s a rate of change of K over time. This indeed
works out, since the only change in the factors that aren’t unitless is
the reduction of the power of t from 2 to 1.
page 22, problem 12:
The area is $a = \ell^2 = (1 + \alpha T)^2\ell_o^2$. To make this into something we know
how to differentiate, we need to square out the expression involving T ,
and make it into something that is expressed explicitly as a polynomial:
$$a = \ell_o^2 + 2\ell_o^2\alpha T + \ell_o^2\alpha^2T^2$$
Now this is just like problem 5, except that the constants superficially
look more complicated. The result is
$$\dot{a} = 2\ell_o^2\alpha + 2\ell_o^2\alpha^2T$$
$$= 2\ell_o^2\left(\alpha + \alpha^2T\right)$$
We expect the units of the result to be area per unit temperature, e.g.,
square meters per degree. This is a little tricky, because we have to
figure out what units are implied for the constant α. Since the question
talks about 1 + αT , apparently the quantity αT is unitless. (The 1 is
unitless, and you can’t add things that have different units.) Therefore
the units of α must be “per degree,” or inverse degrees. It wouldn’t
make sense to add α and $\alpha^2T$ unless they had the same units (and
you can check for yourself that they do), so the whole thing inside the
parentheses must have units of inverse degrees. Multiplying by the $\ell_o^2$
in front, we have units of area per degree, which is what we expected.
page 22, problem 13:
The first derivative is $6t^2 - 1$. Going again, the answer is 12t.
page 22, problem 14:
The first derivative is $3t^2 + 2t$, and the second is 6t + 2. Setting this equal
to zero and solving for t, we find t = −1/3. Looking at the graph, it
does look like the concavity is down for t < −1/3, and up for t > −1/3.
c / Problem 14.
page 22, problem 15:
I chose k = −1, and t = 1. In other words, I’m going to check the slope
of the function $x = t^{-1} = 1/t$ at t = 1, and see whether it really equals
$kt^{k-1} = -1$. Before even doing the graph, I note that the sign makes
sense: the function 1/t is decreasing for t > 0, so its slope should indeed
be negative.
d / Problem 15.
The tangent line seems to connect the points (0,2) and (2,0), so its slope
does indeed look like it’s −1.
The problem asked us to consider the logical meaning of the two possible outcomes. If the slope had been significantly different from −1
given the accuracy of our result, the conclusion would have been that it
was incorrect to extend the rule to negative values of k. Although our
example did come out consistent with the rule, that doesn’t prove the
rule in general. An example can disprove a conjecture, but can’t prove
it. Of course, if we tried lots and lots of examples, and they all worked,
our confidence in the conjecture would be increased.
page 22, problem 16:
A minimum would occur where the derivative was zero. First we rewrite
the function in a form that we know how to differentiate:
$$E(r) = ka^{12}r^{-12} - 2ka^6r^{-6}$$
We’re told to have faith that the derivative of $t^k$ is $kt^{k-1}$ even for k < 0,
so
$$0 = \dot{E} = -12ka^{12}r^{-13} + 12ka^6r^{-7}$$
To simplify, we divide both sides by 12k. The left side was already zero,
so it keeps being zero.
$$0 = -a^{12}r^{-13} + a^6r^{-7}$$
$$a^{12}r^{-13} = a^6r^{-7}$$
$$a^{12} = a^6r^6$$
$$a^6 = r^6$$
$$r = \pm a$$
To check that this is a minimum, not a maximum or a point of inflection,
one method is to construct a graph. The constants a and k are irrelevant
to this issue. Changing a just rescales the horizontal r axis, and changing
k does the same for the vertical E axis. That means we can arbitrarily
set a = 1 and k = 1, and construct the graph shown in the figure. The
points r = ±a are now simply r = ±1. From the graph, we can see
that they’re clearly minima. Physically, the minimum at r = −a can
be interpreted as the same physical configuration of the molecule, but
with the positions of the atoms reversed. It makes sense that r = −a
behaves the same as r = a, since physically the behavior of the system
has to be symmetric, regardless of whether we view it from in front or
from behind.
The other method of checking that r = a is a minimum is to take the
second derivative. As before, the values of a and k are irrelevant, and
can be set to 1. We then have
$$\dot{E} = -12r^{-13} + 12r^{-7}$$
$$\ddot{E} = 156r^{-14} - 84r^{-8}$$
e / Problem 16.
Plugging in r = ±1, we get a positive result, which confirms that the
concavity is upward.
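For readers who would rather let a computer do the plugging in, here is a short Python sketch of the same check. The three functions transcribe E and its first two derivatives from this solution. The function names and the default values a = 1, k = 1 are conveniences of the sketch.

```python
def E(r, a=1.0, k=1.0):
    """Energy as a function of r, in the form used in this solution."""
    return k * a**12 * r**-12 - 2 * k * a**6 * r**-6

def E_dot(r, a=1.0, k=1.0):
    """First derivative of E with respect to r."""
    return -12 * k * a**12 * r**-13 + 12 * k * a**6 * r**-7

def E_ddot(r, a=1.0, k=1.0):
    """Second derivative of E with respect to r."""
    return 156 * k * a**12 * r**-14 - 84 * k * a**6 * r**-8

for r in (1.0, -1.0):
    print(f"r = {r:+.0f}: E = {E(r)}, E_dot = {E_dot(r)}, E_ddot = {E_ddot(r)}")
```

At both r = 1 and r = −1 the first derivative vanishes and the second derivative comes out to 156 − 84 = 72, which is positive.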
page 22, problem 17:
Since polynomials don’t have kinks or endpoints in their graphs, the
maxima and minima must be points where the derivative is zero. Differentiation bumps down all the powers of a polynomial by one, so the
derivative of a third-order polynomial is a second-order polynomial. A
second-order polynomial can have at most two real roots (values of t for
which it equals zero), which are given by the quadratic formula. (If the
number inside the square root in the quadratic formula is zero or negative, there could be less than two real roots.) That means a third-order
polynomial can have at most two maxima or minima.
page 22, problem 18:
Since f , g, and s are smooth and defined everywhere, any extrema they
possess occur at places where their derivatives are zero. The converse is
not necessarily true, however; a place where the derivative is zero could
be a point of inflection. The derivative is additive, so if both f and g
have zero derivatives at a certain point, s does as well. Therefore in
most cases, if f and g both have an extremum at a point, so will s.
However, it could happen that this is only a point of inflection for s, so
in general, we can’t conclude anything about the extrema of s simply
from knowing where the extrema of f and g occur.
Going the other direction, we certainly can’t infer anything about extrema of f and g from knowledge of s alone. For example, if $s(x) = x^2$,
with a minimum at x = 0, that tells us very little about f and g. We
could have, for example, $f(x) = (x-1)^2/2 - 2$ and $g(x) = (x+1)^2/2 + 1$,
neither of which has an extremum at x = 0.
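The example in the last paragraph is easy to verify by machine. This Python sketch defines f and g as given there, confirms that their sum behaves like x², and estimates their slopes at x = 0 with a finite-difference quotient (an approximation introduced for this check, with a hypothetical helper named `slope`).

```python
def f(x):
    return (x - 1)**2 / 2 - 2

def g(x):
    return (x + 1)**2 / 2 + 1

def s(x):
    return f(x) + g(x)

def slope(func, x, h=1e-6):
    """Finite-difference estimate of the slope of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, s(x), x**2)

print(slope(f, 0.0), slope(g, 0.0), slope(s, 0.0))
```

The slopes of f and g at x = 0 come out near −1 and +1. They cancel in the sum, which is why s can have a minimum there even though neither f nor g does.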
page 22, problem 19:
Considering V as a function of h, with b treated as a constant, we have
for the slope of its graph
V̇ =
eV = V̇ · eh
= beh
page 23, problem 20:
Thinking of the rocket’s height as a function of time, we can see that
the goal is to measure the function at its maximum. The derivative is zero
at the maximum, so the error incurred due to timing is approximately
zero. She should not worry about the timing error too much. Other
factors are likely to be more important, e.g., the rocket may not rise
exactly vertically above the launchpad.
page 23, problem 21: If $\dot{x} = n^2$, and x is a polynomial in n, then
we must have $\dot{x}(n) = x(n) - x(n-1) = n^2$. If x is a polynomial of
order k, then x(n) and x(n − 1) both have $n^k$ terms with equal
coefficients, so ẋ has no $n^k$ term. We want ẋ to have a nonvanishing $n^2$
term, so we must have k ≥ 3. For k > 3, it’s easy to show that
x(n) − x(n − 1) has a nonzero term of order $n^3$ or higher, so we must have k = 3. Let
$x(n) = an^3 + bn^2 + \ldots$, where a is the coefficient that we want to prove
is 1/3, and . . . represents lower-order terms. By the binomial theorem,
we have $x(n-1) = an^3 - 3an^2 + bn^2 + \ldots$, and subtracting this from
x(n) gives $\dot{x}(n) = 3an^2 + \ldots$. Since 3a = 1, we have a = 1/3.
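The conclusion a = 1/3 can be checked against actual sums. In this Python sketch, `x` adds up the squares directly, and `cubic` evaluates the standard closed form n³/3 + n²/2 + n/6 for the sum of the first n squares. That closed form is a well-known identity brought in for the comparison; it is not derived in this solution. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def x(n):
    """Sum of k^2 for k = 1..n, by direct addition."""
    return sum(k * k for k in range(1, n + 1))

def cubic(n):
    """Closed form n^3/3 + n^2/2 + n/6, with leading coefficient 1/3."""
    return Fraction(n**3, 3) + Fraction(n**2, 2) + Fraction(n, 6)

for n in (1, 2, 3, 10):
    print(n, x(n), cubic(n), x(n) - x(n - 1))
```

The last column is the discrete derivative x(n) − x(n − 1), which reproduces n² as required.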
Solutions for chapter 2
page 47, problem 1:
$$\frac{dx}{dt} = \frac{(t+dt)^4 - t^4}{dt}$$
$$= \frac{4t^3\,dt + 6t^2\,dt^2 + 4t\,dt^3 + dt^4}{dt}$$
$$= 4t^3 + \ldots$$
where . . . indicates infinitesimal terms. The derivative is the standard
part of this, which is $4t^3$.
page 47, problem 2:
$$\frac{dx}{dt} = \frac{\cos(t+dt) - \cos t}{dt}$$
The identity cos(α + β) = cos α cos β − sin α sin β then gives
$$\frac{dx}{dt} = \frac{\cos t\cos dt - \sin t\sin dt - \cos t}{dt}$$
The small-angle approximations cos dt ≈ 1 and sin dt ≈ dt result in
$$\frac{dx}{dt} = \frac{-\sin t\,dt}{dt} = -\sin t$$
page 47, problem 3:
H +1− H −1
1000, 000
1000, 000, 000 0.00032
The result is getting smaller and smaller, so it seems reasonable to guess
that if H is infinite, the expression gives an infinitesimal result.
page 47, problem 4:
.00001 .0032
The square root is getting smaller, but is not getting smaller as fast as
the number itself. In proportion to the original number, the square root
is actually getting bigger. It looks like $\sqrt{dx}$ is infinitesimal, but it’s still
infinitely big compared to dx. This makes sense, because $\sqrt{dx}$ equals
$dx^{1/2}$. We already knew that $dx^0$, which equals 1, was infinitely big
compared to $dx^1$, which equals dx. In the hierarchy of infinitesimals,
$dx^{1/2}$ fits in between $dx^0$ and $dx^1$.
page 47, problem 5:
Statements (a)-(d), and (f)-(g) are all valid for the hyperreals, because
they meet the test of being directly translatable, without having to
interpret the meaning of things like particular subsets of the reals in the
context of the hyperreals.
Statement (e), however, refers to the rational numbers, a particular
subset of the reals, and that means that it can’t be mindlessly translated
into a statement about the hyperreals, unless we had figured out a way
to translate the set of rational numbers into some corresponding subset
of the hyperreal numbers like the hyperrationals! This is not the type of
statement that the transfer principle deals with. The statement is not
true if we try to change “real” to “hyperreal” while leaving “rational”
alone; for example, it’s not true that there’s a rational number that lies
between the hyperreal numbers 0 and 0 + dx, where dx is infinitesimal.
page 47, problem 6: If R1 is finite and R2 infinite, then 1/R2 is
infinitesimal, 1/R1 + 1/R2 differs infinitesimally from 1/R1 , and the
combined resistance R differs infinitesimally from R1 . Physically, the
second pipe is blocked or too thin to carry any significant flow, so it’s
as though it weren’t present.
If R1 is finite and R2 is infinitesimal, then 1/R2 is infinite, 1/R1 + 1/R2
is also infinite, and the combined resistance R is infinitesimal. It’s so
easy for water to flow through R2 that R1 might as well not be present.
In the context of electrical circuits rather than water pipes, this is known
as a short circuit.
page 48, problem 7: The velocity addition is only interesting if the
infinitesimal velocities u and v are comparable to one another, i.e., their
ratio is finite. Let’s write $\epsilon$ for the size of these infinitesimals, so that
both u and v can be written as $\epsilon$ multiplied by some finite number.
Then 1 + uv differs from 1 by an amount that is on the order of $\epsilon^2$,
which is infinitesimally small compared to $\epsilon$. The same then holds true
for 1/(1 + uv) as well. The result of velocity addition (u + v)/(1 + uv)
is then u + v + . . ., where . . . represents quantities of order $\epsilon^3$, which
amount to a correction that is infinitesimally small compared to the
nonrelativistic result u + v.
page 48, problem 8: This would be a horrible problem if we had to
expand this as a polynomial with 101 terms, as in chapter 1! But now
we know the chain rule, so it’s easy. The derivative is
\[ [100(2x+3)^{99}][2] \]
where the first factor in brackets is the derivative of the function on
the outside, and the second one is the derivative of the “inside stuff.”
Simplifying a little, the answer is $200(2x+3)^{99}$.
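The chain-rule result can be checked numerically. The sketch below is illustrative only: it assumes the function being differentiated is (2x + 3)^100, and the helper names and the evaluation point x = −1 are hypothetical choices, made so that 2x + 3 = 1 and the numbers stay of order unity.

```python
def f(x):
    # Function assumed from the solution: (2x + 3)^100
    return (2 * x + 3) ** 100

def derivative_formula(x):
    # Chain-rule result quoted in the solution: 200 (2x + 3)^99
    return 200 * (2 * x + 3) ** 99

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient, a numerical stand-in for the derivative
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = -1.0
numeric = central_difference(f, x0)
exact = derivative_formula(x0)
print(numeric, exact)
```

The two printed numbers should agree to several digits; the small discrepancy comes from the finite step h.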
page 48, problem 9:
Applying the product rule, we get
\[ 100(x+1)^{99}(x+2)^{200} + 200(x+1)^{100}(x+2)^{199} \]
(The chain rule was also required, but in a trivial way — for both of
the factors, the derivative of the “inside stuff” was one.)
page 48, problem 10:
The derivative of $e^{7x}$ is $e^{7x}\cdot 7$, where the first factor is the derivative of
the outside stuff (the derivative of a base-e exponential is just the same
thing), and the second factor is the derivative of the inside stuff. This
would normally be written as $7e^{7x}$.
The derivative of the second function is $e^{e^x}e^x$, with the second exponential factor coming from the chain rule.
page 48, problem 11:
We need to put together three different ideas here: (1) When a function
to be differentiated is multiplied by a constant, the constant just comes
along for the ride. (2) The derivative of the sine is the cosine. (3) We
need to use the chain rule. The result is −ab cos(bx + c).
page 48, problem 13:
If we just wanted to find the integral of sin x, the answer would be − cos x
(or − cos x plus an arbitrary constant), since the derivative would be
−(− sin x), which would take us back to the original function. The
obvious thing to guess for the integral of a sin(bx + c) would therefore
be −a cos(bx + c), which almost works, but not quite. The derivative of
this function would be ab sin(bx + c), with the pesky factor of b coming
from the chain rule. Therefore what we really wanted was the function
−(a/b) cos(bx + c).
page 48, problem 14:
The chain rule gives
\[ \frac{d}{dx}\left((x^2)^2\right)^2 = 2\left((x^2)^2\right)\left(2(x^2)\right)(2x) = 8x^7 \]
which is the same as the result we would have gotten by differentiating
$x^8$.
page 48, problem 15:
To find a maximum, we take the derivative and set it equal to zero. The
whole factor of $2v^2/g$ in front is just one big constant, so it comes along
for the ride. To differentiate the factor of sin θ cos θ, we need to use
the product rule, plus the fact that the derivative of sin is cos, and the
derivative of cos is − sin.
\[ 0 = \frac{2v^2}{g}\left(\cos\theta\cos\theta + \sin\theta(-\sin\theta)\right) \]
\[ 0 = \cos^2\theta - \sin^2\theta \]
\[ \cos\theta = \pm\sin\theta \]
We’re interested in angles between 0 and 90 degrees, for which both
the sine and the cosine are positive, so
\[ \cos\theta = \sin\theta \]
\[ \tan\theta = 1 \]
\[ \theta = 45^\circ \]
To check that this is really a maximum, not a minimum or an inflection
point, we could resort to the second derivative test, but we know the
graph of R(θ) is zero at θ = 0 and θ = 90°, and positive in between, so
this must be a maximum.
page 48, problem 17:
Taking the derivative and setting it equal to zero, we have
$(e^x - e^{-x})/2 = 0$, so $e^x = e^{-x}$, which occurs only at x = 0. The
second derivative is $(e^x + e^{-x})/2$ (the same as the original function),
which is positive for all x, so the function is everywhere concave up, and
this is a minimum.
page 49, problem 18:
There are no kinks, endpoints, etc., so extrema will occur only in places
where the derivative is zero. Applying the chain rule, we find the derivative to be cos(sin(sin x)) cos(sin x) cos x. This will be zero if any of the
three factors is zero. We have cos u = 0 only when |u| ≥ π/2, and π/2
is greater than 1, so it’s not possible for either of the first two factors
to equal zero. The derivative will therefore equal zero if and only if
cos x = 0, which happens in the same places where the derivative of
sin x is zero, at x = π/2 + πn, where n is an integer.
f / Problem 18.
This essentially completes the required demonstration, but there is one
more technical issue, which is that it’s conceivable that some of these
could be points of inflection. Constructing a graph of sin(sin(sin x))
gives us the necessary insight to see that this can’t be the case. The
function essentially looks like the sine function, but its extrema have
been “shaved down” a little, giving them slightly flatter tips that don’t
quite extend out to ±1. It’s therefore fairly clear that these aren’t points
of inflection. To prove this more rigorously, we could take the second
derivative and show that it was nonzero at the places where the first
derivative is zero. That would be messy. A less tedious argument is
as follows. We can tell from its formula that the function is periodic,
i.e., it has the property that $f(x + \ell) = f(x)$, for $\ell = 2\pi$. This follows
because the innermost sine function is periodic, and the outer layers
only depend on the result of the inner layer. Therefore all the points of
the form π/2 + 2πn have the same behavior. Either they’re all maxima
or they’re all points of inflection. But clearly a function can’t oscillate
back and forth without having any maxima at all, so they must all be
maxima. A similar argument applies to the minima.
page 49, problem 19:
The function f has a kink at x = 0, so it has no uniquely defined
tangent line there, and its derivative at that point is undefined. In terms
of infinitesimals, positive values of dx give df /dx = (dx + dx)/dx =
2, while negative ones give df /dx = (−dx + dx)/dx = 0. Since the
standard part of the quotient dy/dx depends on the specific value of
dx, the derivative is undefined.
The function g has no kink at x = 0. The graph of x|x| looks like two
half-parabolas glued together, and since both of them have slopes of 0
at x = 0, the slope of the tangent line is well defined, and is zero. In
terms of infinitesimals, dg/dx is the standard part of |dx| + 1, which is 1.
page 49, problem 20:
(a) As suggested, let $c = \sqrt{g/A}$, so that $d = A\ln\cosh ct = A\ln\left[(e^{ct} + e^{-ct})/2\right]$. Applying the chain rule, the velocity is
\[ A\,\frac{(ce^{ct} - ce^{-ct})/2}{\cosh ct} \]
(b) The expression can be rewritten as Ac tanh ct.
(c) For large t, the e−ct terms become negligible, so the velocity is
Acect /ect = Ac. (d) From the original expression, A must have units of
distance, since the logarithm is unitless. Also, since ct occurs inside a
function, ct must be unitless, which means that c has units of inverse
time. The answers to parts b and c get their units from the factors of
Ac, which have units of distance multiplied by inverse time, or velocity.
page 49, problem 21:
Since I’ve advocated not memorizing the quotient rule, I’ll do this one
from first principles, using the product rule.
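\[ \frac{d}{d\theta}\tan\theta = \frac{d}{d\theta}\,\frac{\sin\theta}{\cos\theta} \]
\[ = \frac{d}{d\theta}\left[\sin\theta\,(\cos\theta)^{-1}\right] \]
\[ = \cos\theta\,(\cos\theta)^{-1} + (\sin\theta)(-1)(\cos\theta)^{-2}(-\sin\theta) \]
\[ = 1 + \tan^2\theta \]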
(Using a trig identity, this can also be rewritten as sec2 θ.)
page 49, problem
Reexpressing $\sqrt[3]{x}$ as $x^{1/3}$, the derivative is $(1/3)x^{-2/3}$.
page 49, problem 23:
(a) Using the chain rule, the derivative of $(x^2+1)^{1/2}$ is $(1/2)(x^2+1)^{-1/2}(2x) = x(x^2+1)^{-1/2}$.
(b) This is the same as a, except that the 1 is replaced with an $a^2$, so
the answer is $x(x^2+a^2)^{-1/2}$. The idea would be that a has the same
units as x.
(c) This can be rewritten as $(a+x)^{-1/2}$, giving a derivative of $(-1/2)(a+x)^{-3/2}$.
(d) This is similar to c, but we pick up a factor of −2x from the chain
rule, making the result $ax(a-x^2)^{-3/2}$.
page 49, problem 24:
By the chain rule, the result is 2/(2t + 1).
page 49, problem 25:
Using the product rule, we have
\[ \frac{d(3)}{dx}\sin x + 3\,\frac{d(\sin x)}{dx} \]
but the derivative of a constant is zero, so the first term goes away, and
we get 3 cos x, which is what we would have had just from the usual
method of treating multiplicative constants.
page 49, problem 26:
N( (1.0000042278-1)/(.00001) )
Probably only the first few digits of this are reliable.
page 50, problem 27:
The area and volume are
\[ A = 2\pi r\ell + 2\pi r^2 \]
\[ V = \pi r^2\ell \]
The strategy is to use the equation for A, which is a constant, to eliminate the variable $\ell$, and then maximize V in terms of r.
\[ \ell = (A - 2\pi r^2)/2\pi r \]
Substituting this expression for $\ell$ back into the equation for V,
\[ V = \frac{1}{2}rA - \pi r^3 \]
To maximize this with respect to r, we take the derivative and set it
equal to zero.
\[ 0 = \frac{1}{2}A - 3\pi r^2 \]
\[ A = 6\pi r^2 \]
\[ \ell = (6\pi r^2 - 2\pi r^2)/2\pi r \]
\[ \ell = 2r \]
In other words, the length should be the same as the diameter.
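The optimization can also be checked by brute force. The following Python sketch is not from the original solution. It fixes the surface area at an arbitrary illustrative value (A = 10, an assumption), scans over radii, and reports the ratio of length to radius at the radius that gives the largest volume.

```python
import math

A = 10.0  # fixed surface area; arbitrary illustrative value (assumption)

def length(r):
    # ell = (A - 2 pi r^2) / (2 pi r), from the area constraint
    return (A - 2 * math.pi * r**2) / (2 * math.pi * r)

def volume(r):
    # V = pi r^2 ell with ell eliminated via the constraint
    return math.pi * r**2 * length(r)

# Scan radii between 0 and the value at which the length would drop to zero.
r_limit = math.sqrt(A / (2 * math.pi))
radii = [r_limit * i / 100000 for i in range(1, 100000)]
best_r = max(radii, key=volume)
ratio = length(best_r) / best_r
print(best_r, ratio)
```

The printed ratio should come out very close to 2, in agreement with the result that the length equals the diameter.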
page 50, problem 28:
(a) We can break the expression down into three factors: the constant
m/2 in front, the nonrelativistic velocity dependence $v^2$, and the relativistic correction factor $(1 - v^2/c^2)^{-1/2}$. Rather than substituting in
at for v, it’s a little less messy to calculate dK/dt = (dK/dv)(dv/dt) =
a dK/dv. Using the product rule, we have
\[ \frac{dK}{dt} = a\cdot\frac{m}{2}\left[2v\left(1 - \frac{v^2}{c^2}\right)^{-1/2} + v^2\cdot\left(-\frac{1}{2}\right)\left(1 - \frac{v^2}{c^2}\right)^{-3/2}\left(-\frac{2v}{c^2}\right)\right] \]
\[ = ma^2t\left[\left(1 - \frac{v^2}{c^2}\right)^{-1/2} + \frac{v^2}{2c^2}\left(1 - \frac{v^2}{c^2}\right)^{-3/2}\right] \]
(b) The expression $ma^2t$ is the nonrelativistic (classical) result, and has
the correct units of kinetic energy divided by time. The factor in square
brackets is the relativistic correction, which is unitless.
(c) As v gets closer and closer to c, the expression $1 - v^2/c^2$ approaches
zero, so both the terms in the relativistic correction blow up to positive
infinity.
page 50, problem 29:
We already know it works for positive x, so we only need to check it
for negative x. For negative values of x, the chain rule tells us that the
derivative is 1/|x|, multiplied by −1, since d|x|/dx = −1. This gives
−1/|x|, which is the same as 1/x, since x is assumed negative.
page 50, problem 30:
Since f (x) = f (−x),
\[ \frac{df(x)}{dx} = \frac{df(-x)}{dx} \]
But by the chain rule, the right-hand side equals −f′(−x), as claimed.
page 50, problem 32:
Let f = dxk /dx be the unknown function. Then
\[ 1 = \frac{dx}{dx} = \frac{d}{dx}\left(x^k x^{-k+1}\right) = f x^{-k+1} + x^k(-k+1)x^{-k} \]
where we can use the ordinary rule for derivatives of powers on $x^{-k+1}$,
since −k + 1 is positive. Solving for f, we have the desired result.
page 50, problem 33: Since the parallel postulate can be expressed
in terms of algebra through Cartesian geometry, the transfer principle
tells us that it holds for F as well. But G is defined in terms of the
finite hyperreals, so statements about E don’t carry over to statements
about G simply by replacing “real” with “hyperreal,” and the transfer
principle does not guarantee that the parallel postulate applies to G.
In fact, it is easy to find a counterexample in G. Let $\epsilon$ be an infinitesimal
number. Consider the lines with equations y = 1 and $y = 1 + \epsilon x$. Neither
of these intersects the x axis.
No, it is not valid to associate only E with the plane described by Euclid’s axioms. All of Euclid’s axioms hold equally well in F. F is referred
to as a nonstandard model of Euclid’s axioms. It has the same relation
to standard Euclidean geometry as the hyperreals have to the reals. If
we want to make up a set of axioms that describes E and can’t describe
F, then we need to add an additional axiom to Euclid’s set. An example of such an axiom would be an axiom stating that given any two line
segments with lengths $\ell_1$ and $\ell_2$, there exists some integer n such that
$n\ell_1 > \ell_2$. Note that although this axiom holds in E, the transfer principle cannot be used to show that it holds in F; in fact, it is false in F. The
transfer principle doesn’t apply because the transfer principle doesn’t
apply to statements that include phrases such as “for any integer.”
page 51, problem 34:
The normal definition of a repeating decimal such as 0.999 . . . is that it
is the limit of the sequence 0.9, 0.99, . . ., and the limit is a real number,
by definition. 0.999 . . . equals 1. However, there is an intuition that the
limiting process 0.9, 0.99, . . . “never quite gets there.” This intuition
can, in fact, be formalized in the construction described beginning on
page 142; we can define a hyperreal number based on the sequence
0.9, 0.99, . . ., and it is a number infinitesimally less than one. This is
not, however, the normal way of defining the symbol 0.999 . . ., and we
probably wouldn’t want to change the definition so that it was. If it
was, then 0.333 . . . would not equal 1/3.
page 51, problem 35:
Converting these into Leibniz notation, we find
To prove something is not true in general, it suffices to find one counterexample. Suppose that g and h are both unitless, and x has units
of seconds. The value of f is defined by the output of g, so f must
also be unitless. Since f is unitless, df /dx has units of inverse seconds
(“per second”). But this doesn’t match the units of either of the proposed expressions, because they’re both unitless. The correct chain rule,
however, works. In the equation
\[ \frac{df}{dx} = \frac{dg}{dh}\,\frac{dh}{dx} \]
the right-hand side consists of a unitless factor multiplied by a factor
with units of inverse seconds, so its units are inverse seconds, matching
the left-hand side.
page 51, problem 36:
We can make life a lot easier by observing that the function s(f ) will
be maximized when the expression inside the square root is minimized.
Also, since f is squared every time it occurs, we can change to a variable
$x = f^2$, and then once the optimal value of x is found we can take its
square root in order to find the optimal f. The function to be optimized
is then
\[ a(x - f_o^2)^2 + bx \]
Differentiating this and setting the derivative equal to zero, we find
\[ 2a(x - f_o^2) + b = 0 \]
which results in $x = f_o^2 - b/2a$, or
\[ f = \sqrt{f_o^2 - b/2a} \]
(choosing the positive root, since f represents a frequency, and frequencies are positive by definition). Note that the quantity inside the
square root involves the square of a frequency, but then we take its
square root, so the units of the result turn out to be frequency, which
makes sense. We can see that if b is small, the second term is small, and
the maximum occurs very nearly at fo .
There is one subtle issue that was glossed over above, which is that
the graph on page 51 shows two extrema: a minimum at f = 0 and a
maximum at f > 0. What happened to the f = 0 minimum? The issue
is that I was a little sloppy with the change of variables. Let I stand
for the quantity inside the square root in the original expression for s.
Then by the chain rule,
\[ \frac{ds}{df} = \frac{ds}{dI}\,\frac{dI}{dx}\,\frac{dx}{df} \]
We looked for the place where dI/dx was zero, but ds/df could also be
zero if one of the other factors was zero. This is what happens at f = 0,
where dx/df = 0.
page 51, problem 37:
f + dx
=f 1−
1 + dx/f
Applying the geometric series $1/(1 + r) = 1 - r + r^2 - \ldots$,
1− 1−
As checks on our result, we note that the units work out correctly (meters squared divided by meters give meters), and that the result is indeed
large, since we divide by the small quantity dx.
page 52, problem 38: One way to evaluate an expression like $a^b$ is by
using the identity $a^b = e^{b\ln a}$. If we try to substitute a = 1 and b = ∞,
we get $e^{\infty\cdot 0}$, which has an indeterminate form inside the exponential.
One way to express the idea is that if there is even the tiniest error in
the value of a, the value of $a^\infty$ can have any positive value.
Solutions for chapter 3
page 67, problem 1:
(a) The Weierstrass definition requires that if we’re given a particular $\epsilon$,
we be able to find a δ so small that f(x) + g(x) differs from F + G
by at most $\epsilon$ for |x − a| < δ. But the Weierstrass definition also tells us
that given $\epsilon/2$, we can find a δ such that f differs from F by at most
$\epsilon/2$, and likewise for g and G. The amount by which f + g differs from
F + G is then at most $\epsilon/2 + \epsilon/2$, which completes the proof.
(b) Let dx be infinitesimal. Then the definition of the limit in terms of
infinitesimals says that f(a + dx) differs at most
infinitesimally from F, and likewise for g and G. This means that f + g
differs from F + G by the sum of two infinitesimals, which is itself an
infinitesimal, and therefore the standard part of f + g evaluated at a + dx
equals F + G, satisfying the definition.
page 67, problem 2:
The shape of the graph can be found by considering four cases: large
negative x, small negative x, small positive x, and large positive x. In
these four cases, the function is respectively close to 1, large, small, and
close to 1.
The four limits correspond to the four cases described above.
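g / Problem 2.
page 67, problem 3: All five of these can be done using l’Hôpital’s rule:
\[ \lim_{s\to 1}\frac{s^3 - 1}{s - 1} = \lim\frac{3s^2}{1} = 3 \]
\[ \lim_{\theta\to 0}\frac{1 - \cos\theta}{\theta^2} = \lim\frac{\sin\theta}{2\theta} = \lim\frac{\cos\theta}{2} = \frac{1}{2} \]
\[ \lim_{x\to\infty}\frac{5x^2 - 2x}{x} = \lim\frac{10x - 2}{1} = \infty \]
\[ \lim_{n\to\infty}\frac{n(n+1)}{(n+2)(n+3)} = \lim\frac{n^2 + \ldots}{n^2 + \ldots} = \lim\frac{2n + \ldots}{2n + \ldots} = \lim\frac{2}{2} = 1 \]
\[ \lim_{x\to\infty}\frac{ax^2 + bx + c}{dx^2 + ex + f} = \lim\frac{2ax + \ldots}{2dx + \ldots} = \lim\frac{2a}{2d} = \frac{a}{d} \]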
In examples 2, 4, and 5, we differentiate more than once in order to
get an expression that can be evaluated by substitution. In 4 and 5,
. . . represents terms that we anticipate will go away after the second
differentiation. Most people probably would not bother with l’Hôpital’s
rule for 3, 4, or 5, being content merely to observe the behavior of the
highest-order term, which makes the limiting behavior obvious. Examples 3, 4, and 5 can also be done rigorously without l’Hôpital’s rule, by
algebraic manipulation; we divide on the top and bottom by the highest
power of the variable, giving an expression that is no longer an indeterminate form ∞/∞.
page 67, problem 4:
Both numerator and denominator go to zero, so we can apply l’Hôpital’s
rule. Differentiating top and bottom gives $(\cos x - x\sin x)/(-\ln 2\cdot 2^x)$,
which equals −1/ln 2 at x = 0. To check this numerically, we plug
$x = 10^{-3}$ into the original expression. The result is −1.44219, which is
very close to −1/ln 2 = −1.44269 . . ..
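The numerical check is easy to reproduce. In the Python sketch below, the original expression is assumed to be x cos x/(1 − 2^x); this is an inference from the differentiated numerator and denominator quoted in the solution, since the problem statement itself is not reproduced here.

```python
import math

def expr(x):
    # Assumed original expression, inferred from its differentiated form
    return x * math.cos(x) / (1 - 2**x)

value = expr(1e-3)          # the numerical check described in the solution
limit = -1 / math.log(2)    # value predicted by l'Hopital's rule
print(value, limit)
```

The two printed numbers differ only in the fourth decimal place, as expected for a sample point that is close to, but not at, x = 0.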
page 67, problem 5:
L’Hôpital’s rule only works when both the numerator and the denominator go to zero.
page 67, problem 6: Applying l’Hôpital’s rule once gives
\[ \lim_{u\to 0}\frac{2u}{e^u - e^{-u}} \]
which is still an indeterminate form. Applying the rule a second time,
we get
\[ \lim_{u\to 0}\frac{2}{e^u + e^{-u}} = 1 \]
As a numerical check, plugging u = 0.01 into the original expression
results in 0.9999917.
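This check can be reproduced in a few lines of Python. The sketch assumes the original expression is u²/(e^u + e^(−u) − 2), an inference that is consistent both with the two applications of l’Hôpital’s rule and with the quoted value 0.9999917.

```python
import math

def expr(u):
    # Assumed original expression (inferred, not quoted from the problem)
    return u**2 / (math.exp(u) + math.exp(-u) - 2)

value = expr(0.01)
print(value)
```

The denominator involves a cancellation between nearly equal numbers, so much smaller values of u would eventually lose accuracy in floating point; u = 0.01 is still comfortably safe.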
page 67, problem 7: L’Hôpital’s rule gives cos t/1 → −1. Plugging in
t = 3.1 gives -0.9997.
page 67, problem 8: Let u = 1/x. Then
\[ \frac{df/dx}{dg/dx} = \frac{df/du}{dg/du} \]
simply by algebraic manipulation of the infinitesimals. (If we want to
interpret these quantities as derivatives, then our notational convention
is that they stand for the standard parts of the quotients of the infinitesimals, in which case the equality is only for the standard parts.) This
equality holds not just in the limit but everywhere that the functions
are differentiable. The expression on the left is the thing whose limit
we’re trying to prove equals lim f /g. The right-hand side is equal to
lim f /g by the previously established form of l’Hôpital’s rule.
page 67, problem 9: By the definition of continuity in terms of infinitesimals, the function is continuous, because an infinitesimal change
dx leads to a change dy = a dx in the output of the function which is
likewise infinitesimal. (This depends on the fact that a is assumed to
be real, which implies that it is finite.)
Continuity in terms of the Weierstrass limit holds because we can take
$\delta = \epsilon/a$.
Solutions for chapter 4
page 81, problem 1:
a := 0;
b := 1;
H := 1000;
dt := (b-a)/H;
sum := 0;
t := a;
While (t<=b) [
  sum := N(sum+Exp(t^2)*dt);
  t := N(t+dt);
];
The result is 1.46.
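For readers without the computer algebra system used above, the same left-hand Riemann sum can be sketched in Python. This is a transcription under the assumption that the integrand is e^(t²) on the interval from 0 to 1, as in the listing; the variable names are illustrative.

```python
import math

a, b = 0.0, 1.0
H = 1000                 # number of rectangles
dt = (b - a) / H         # width of each rectangle

total = 0.0
for i in range(H):
    t = a + i * dt       # left edge of the i-th rectangle
    total += math.exp(t**2) * dt

print(round(total, 2))
```

Indexing the rectangles with an integer counter avoids the accumulated rounding error that comes from repeatedly adding dt to t.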
h / Problem 2.
page 81, problem 2:
The derivative of the cosine is minus the sine, so to get a function whose
derivative is the sine, we need minus the cosine.
\[ \int_0^{2\pi}\sin x\,dx = \left.(-\cos x)\right|_0^{2\pi} = (-\cos 2\pi) - (-\cos 0) = (-1) - (-1) = 0 \]
As shown in figure h, the graph has equal amounts of area above and
below the x axis. The area below the axis counts as negative area, so
the total is zero.
page 81, problem 3:
i / Problem 3.
The rectangular area of the graph is 2, and the area under the curve
fills a little more than half of that, so let’s guess 1.4.
\[ \int_0^2\left(-x^2 + 2x\right)dx = \left.\left(-\frac{1}{3}x^3 + x^2\right)\right|_0^2 = (-8/3 + 4) - (0) = 4/3 \]
This is roughly what we were expecting from our visual estimate.
page 81, problem 4:
Over this interval, the value of the sin function varies from 0 to 1, and
it spends more time above 1/2 than below it, so we expect the average
to be somewhat greater than 1/2. The exact result is
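\[ \overline{\sin} = \frac{1}{\pi - 0}\int_0^\pi \sin x\,dx = \frac{1}{\pi}\left.(-\cos x)\right|_0^\pi = \frac{1}{\pi}\left[-\cos\pi - (-\cos 0)\right] = \frac{2}{\pi} \]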
which is, as expected, somewhat more than 1/2.
page 81, problem 5:
Consider a function y(x) defined on the interval from x = 0 to 2 like
\[ y(x) = \begin{cases} -1 & \text{if } 0 \le x \le 1 \\ 1 & \text{if } 1 < x \le 2 \end{cases} \]
The mean value of y is zero, but y never equals zero.
page 81, problem 6:
Let ẋ be defined as
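\[ \dot{x}(t) = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t \ge 0 \end{cases} \]
Integrating this function up to t gives
\[ x(t) = \begin{cases} 0 & \text{if } t \le 0 \\ t & \text{if } t \ge 0 \end{cases} \]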
The derivative of x at t = 0 is undefined, and therefore integration
followed by differentiation doesn’t recover the original function ẋ.
page 81, problem 8: First we put the integrand into the more familiar
and convenient form $cx^p$, whose integral is $(c/(p+1))x^{p+1}$. $\sqrt{bx\sqrt{x}} = b^{1/2}x^{3/4}$. Applying the general rule, the result is $(4/7)b^{1/2}x^{7/4}$.
page 82, problem 11: The claim is false for indefinite integrals, since
indefinite integrals can have a constant of integration. So, for example,
a possible indefinite integral of $x^2$ is $x^3/3 + 7$, which is neither even nor
odd. The fundamental theorem doesn’t even refer to indefinite integrals,
which are simply defined through inverse differentiation.
Let’s fix the claim by changing g to a definite integral, $g(x) = \int_0^x f(u)\,du$.
The claim is now true. However, the proof still doesn’t quite work.
We’ve established that all odd functions have even derivatives, but we
haven’t ruled out possibilities such as functions that are neither even
nor odd, but that have even derivatives.
Solutions for chapter 5
page 97, problem 16:
It’s pretty trivial to generalize from e to b. If we write $b^x$ as $e^{x\ln b}$, then
we can substitute u = x ln b and reduce the b ≠ e case to b = e.
The generalization of the exponent of x from 2 to a is less straightforward. To do it with a = 2, we needed two integrations by parts, so
clearly if we wanted to do a case with a = 37, we could do it with 37
integrations by parts. However, we would have no easy way to write
down the complete answer without going through the whole tedious
calculation. Furthermore, this is only going to work if a is a positive
integer.
page 97, problem 18: The obvious substitution is $u = x^p$, which leads
to the form $\int e^u u^{1/p-1}\,du$. If the exponent 1/p − 1 equals a nonnegative
integer n, then through n integrations by parts, we can reduce this to
the form $\int e^x\,dx$. This requires p = 1, 1/2, 1/3, . . .
page 97, problem 19: This is a mess if attacked by brute force. The
trick is to reexpress the function using partial fractions:
\[ \frac{x^2+1}{x^3-x} = -\frac{x^2+1}{x} + \frac{x^2+1}{2(x+1)} + \frac{x^2+1}{2(x-1)} \]
Writing u = x + 1 and v = x − 1, this becomes
\[ u^{-1} + v^{-1} - x^{-1} + \ldots \]
where . . . represents terms that will not survive multiple differentiations.
Since du/dx = dv/dx = 1, the chain rule tells us that differentiation
with respect to u or v is the same as differentiation with respect to x.
The result is $100!\,(u^{-101} + v^{-101} - x^{-101})$, where the notation 100! means
1 × 2 × . . . × 100.
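The partial-fraction decomposition can be verified with exact rational arithmetic. The Python sketch below assumes the function in question is (x² + 1)/(x³ − x), as inferred from the solution, and compares it with 1/(x + 1) + 1/(x − 1) − 1/x at a set of arbitrarily chosen sample points; the function names are illustrative.

```python
from fractions import Fraction

def original(x):
    # Function assumed from the solution: (x^2 + 1) / (x^3 - x)
    return (x**2 + 1) / (x**3 - x)

def decomposed(x):
    u, v = x + 1, x - 1
    # The leftover polynomial pieces (u/2 - 1) + (v/2 + 1) - x add up to zero,
    # so only the three reciprocal terms remain.
    return 1 / u + 1 / v - 1 / x

# Rational sample points that avoid the poles at x = -1, 0, 1.
samples = [Fraction(n, 7) for n in range(2, 40) if n != 7]
identity_holds = all(original(x) == decomposed(x) for x in samples)
print(identity_holds)
```

Because the two sides are rational functions of low degree, agreement at this many points implies they are identical.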
Solutions for chapter 6
page 102, problem 4:
The method of finding the indefinite integral is discussed in example 69 on p. 89 and problem 16 on p. 97.
The result is
$-(\ln 2)^{-3}e^{u}\left(u^2 - 2u + 2\right)$, where u = −x ln 2. Plugging in the limits
of integration, we obtain $2(\ln 2)^{-3}$.
Solutions for chapter 7
page 112, problem 1:
We can define the sequence f(n) as converging to $\ell$ if the following is
true: for any positive real number $\epsilon$, there exists an integer N such that for all n
greater than N, the value of f lies within the range from $\ell - \epsilon$ to $\ell + \epsilon$.
page 112, problem 2:
(a) The convergence of the series is defined in terms of the convergence
of its partial sums, which are 1, 0, 1, 0, . . . In the notation used in the
definition given in the solution to problem 1 above, suppose we pick
$\epsilon = 1/4$. Then there is clearly no way to choose any numbers $\ell$ and N
that would satisfy the definition, for regardless of N, $\ell$ would have to
be both greater than 3/4 and less than 1/4 in order to agree with the
zeroes and ones that occur beyond the N th member of the sequence.
(b) As remarked on page 104, the axioms of the real number system,
such as associativity, only deal with finite sums, not infinite ones. To see
that absurd conclusions result from attempting to apply them to infinite
sums, consider that by the same type of argument we could group the
sum as 1 + (−1 + 1) + (−1 + 1) + . . ., which would equal 1.
page 112, problem 3:
The quantity $x^n$ can be reexpressed as $e^{n\ln x}$, where ln x is negative
by hypothesis. The integral of this exponential with respect to n is a
similar exponential with a constant factor in front, and this converges
as n approaches infinity.
page 112, problem 4:
(a) Applying the integral test, we find that the integral of $1/x^2$ is −1/x,
which converges as x approaches infinity, so the series converges as well.
(b) This is an alternating series whose terms approach zero, so it converges. However, the terms get small extremely slowly, so an extraordinarily large number of terms would be required in order to get any
kind of decent approximation to the sum. In fact, it is impossible to
carry out a straightforward numerical evaluation of this sum because
it would require such an enormous number of terms that the rounding
errors would overwhelm the result.
(c) This converges by the ratio test, because the ratio of successive terms
approaches 0.
(d) Split the sum into two sums, one for the 1103 term and one for
the 26390k. The ratio of the two factorials is always less than $4^{4k}$,
so discarding constant factors, the first sum is less than a geometric
series with $x = (4/396)^4 < 1$, and must therefore converge. The second
sum is less than a series of the form $kx^k$. This one also converges, by
the integral test. (It has to be integrated with respect to k, not x,
and the integration can be done by parts.) Since both separate sums
converge, the entire sum converges. This bizarre-looking expression was
formulated and shown to equal 1/π by the self-taught genius Srinivasa
Ramanujan (1887-1920).
page 112, problem 5: E.g., $\sum_{n=0}^{\infty}\sin n$ diverges, but the ratio test
won’t establish that, because the limit $\lim_{n\to\infty}|\sin(n+1)/\sin(n)|$ does
not exist.
page 114, problem 14: The nth term $a_n$ can be rewritten as 2/[n(n + 1)], and using partial fractions
this can be changed into 2/n − 2/(n + 1).
Let the partial sums be $s_n = \sum_1^n a_n$. For insight, let’s write out $s_3$:
\[ s_3 = \left(\frac{2}{1} - \frac{2}{2}\right) + \left(\frac{2}{2} - \frac{2}{3}\right) + \left(\frac{2}{3} - \frac{2}{4}\right) \]
This is called a telescoping series. The second part of one term cancels
out with the first part of the next. Therefore we have
\[ s_3 = \frac{2}{1} - \frac{2}{4} \]
and in general
\[ s_n = \frac{2}{1} - \frac{2}{n+1} \]
Letting n → ∞, we find that the series sums to 2.
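The telescoping can be confirmed exactly with a short Python sketch (not part of the original solution; the function names are illustrative). It sums the terms 2/[n(n + 1)] as exact fractions and compares each partial sum with the closed form 2 − 2/(n + 1).

```python
from fractions import Fraction

def a(n):
    # nth term of the series, 2 / [n (n + 1)]
    return Fraction(2, n * (n + 1))

def partial_sum(n):
    return sum(a(k) for k in range(1, n + 1))

# Compare with the closed form 2/1 - 2/(n + 1) for a range of n.
closed_form_ok = all(partial_sum(n) == 2 - Fraction(2, n + 1) for n in range(1, 60))
print(closed_form_ok, partial_sum(3), float(partial_sum(1000)))
```

The last printed value shows the partial sums creeping up toward 2 from below.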
page 114, problem 17: Yes, it converges. To see this, consider that its
graph consists of a series of peaks and valleys, each of which is narrower
than the last and therefore has less area. In fact, the width of these
humps approaches zero, so that the area approaches zero. This means
that the integral can be represented as a decreasing, alternating series
that approaches zero, which must converge.
page 113, problem 13: There are certainly some special values of x
for which it does converge, such as 0 and π. For a general value of x,
however, things become more complicated. Let the nth term be given
by the function t(n). |t| converges to a limit, since the first application
of the sine function brings us into the range 0 ≤ |t| ≤ 1, and from
then on, |t| is decreasing and bounded below by 0. It can’t approach a
nonzero limit, for given such a limit t∗ , there would always be values of
t slightly greater than t∗ such that sin t was less than t∗ . Therefore the
terms in the sum approach zero. This is necessary but not sufficient for
the series to converge.
Once t gets small enough, we can approximate the sine using a Taylor
series. Approximating the discrete function t by a continuous one, we
have $dt/dn \approx -(1/6)t^3$, which can be rewritten as $t^{-3}\,dt \approx -(1/6)\,dn$.
This is known as separation of variables. Integrating, we find that at
large values of n, where the constant of integration becomes negligible,
$t \approx \pm\sqrt{3/n}$. The sum diverges by the integral test. Therefore the sum
diverges for all values of x except for multiples of π, which cause t to hit
zero immediately without passing through the region where the Taylor
series is a good approximation.
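The asymptotic estimate for the terms can be tested by direct iteration. The Python sketch below assumes, as the solution describes, that the nth term is obtained by applying the sine function n times to x; the starting value x = 1 and the choice n = 10000 are arbitrary illustrative assumptions.

```python
import math

def iterate_sine(x, n):
    """Apply sin to x a total of n times."""
    t = x
    for _ in range(n):
        t = math.sin(t)
    return t

n = 10000
t_n = iterate_sine(1.0, n)
ratio = t_n / math.sqrt(3 / n)   # should approach 1 as n grows
print(t_n, ratio)
```

A ratio close to 1 supports the estimate that the terms fall off only like the inverse square root of n, which is too slow for the sum to converge.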
page 115, problem 20: Our first impression is that it must converge,
since the $2^{-n}$ factor shrinks much more rapidly than the $n^2$ factor.
To prove this rigorously, we can apply the integral test. The relevant
improper integral was carried out in problem 4 on p. 102.
Finding the sum is far more difficult, and there is no obvious technique
that is guaranteed to work. However, the integral test suggests an approach that does lead to a solution. The fact that the indefinite integral
can be evaluated suggests that perhaps the partial sum
$$S_n = \sum_{j=0}^{n} j^2 2^{-j}$$
can also be evaluated. Furthermore, the fact that the integral was of
the form $2^{-x}P(x)$, for some polynomial P, suggests that perhaps $S_n$ is
of the same form. Based on this conjecture, we try to determine the
unknown coefficients in $P(n) = an^2 + bn + c$.
$$n^2 2^{-n} = S_n - S_{n-1} = 2^{-n}P(n) - 2^{-(n-1)}P(n-1) = 2^{-n}\left[-an^2 + (4a-b)n - 2a + 2b - c\right]$$
Solving for a, b, and c results in $P(n) = -n^2 - 4n - 6$. This gives the
correct value for the difference $S_n - S_{n-1}$, but doesn't give $S_0 = 0$ as
it should. But this is easy to fix simply by changing the form of our
conjectured partial sum slightly to $S_n = 2^{-n}P(n) + k$, where k = 6.
Evaluating $\lim_{n\to\infty} S_n$, we get 6.
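As a sanity check on the conjectured closed form, here is a short sketch that compares it with the partial sums computed directly, using exact rational arithmetic. The function names are mine, chosen for illustration.

```python
from fractions import Fraction

def partial_sum(n):
    """Directly add up j^2 * 2^(-j) for j = 0..n, exactly."""
    return sum(Fraction(j * j, 2 ** j) for j in range(n + 1))

def closed_form(n):
    """Conjectured result 2^(-n) P(n) + 6 with P(n) = -n^2 - 4n - 6."""
    return 6 - Fraction(n * n + 4 * n + 6, 2 ** n)

# Compare the two expressions for the first thirty partial sums.
matches = all(partial_sum(n) == closed_form(n) for n in range(30))
print(matches, float(closed_form(60)))
```

The two expressions agree term by term, and the closed form visibly approaches 6 as n grows.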
page 115, problem 21: The function $\cos^2$ averages to 1/2, so we
might naively expect that $\cos^n$ would average to about $2^{-n/2}$, in which
case the sum would converge for any value of p whatsoever. But the
average is misleading, because there are some “lucky” values of n for
which $\cos^2 n \approx 1$, and these will have a disproportionate effect on the
sum. We know by the integral test that $\sum 1/n$ diverges, but $\sum 1/n^2$
converges, so clearly if p ≥ 2, then even these occasional “lucky” terms
will not cause divergence.
What about p = 1? Suppose we have some value of n for which $\cos^2 n = 1 - \epsilon$,
where $\epsilon$ is some small number. If this is to happen, then we
must have n = kπ + δ, where k is an integer and δ is small, so that
$\cos^2 n \approx 1 - \delta^2$, i.e., $\epsilon \approx \delta^2$. This occurs with a probability proportional
to δ, and the resulting contribution to the sum is about $(1 - \delta^2)^n/n$,
which by the binomial theorem is roughly of order 1/n if $n\delta^2 \sim 1$.
This happens with probability $\sim n^{-1/2}$, so the expected value of the
nth term is $\sim n^{-3/2}$. Since $\sum n^{-3/2}$ converges by the integral test, this
suggests, but does not prove rigorously, that we also get convergence for
p = 1.
A similar argument suggests that the sum diverges for p = 0.
Answers to self-checks for chapter 9
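page 124, problem 9: First we rewrite the integrand as
$$\frac{1}{4}\left(e^{ix} + e^{-ix}\right)\left(e^{2ix} + e^{-2ix}\right) = \frac{1}{4}\left(e^{3ix} + e^{-3ix} + e^{ix} + e^{-ix}\right).$$
The indefinite integral is
$$\frac{1}{12i}\left(e^{3ix} - e^{-3ix}\right) + \frac{1}{4i}\left(e^{ix} - e^{-ix}\right).$$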
Evaluating this at 0 gives 0, while at π/2 we find 1/3. The result is 1/3.
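The value 1/3 can be cross-checked numerically. The sketch below assumes the integrand is cos x cos 2x on the interval from 0 to π/2, which is what the exponential form above corresponds to. It evaluates the complex-exponential antiderivative at the endpoints and compares it with a simple midpoint-rule estimate; the helper names are mine.

```python
import cmath
import math

def antiderivative(x):
    """(e^{3ix} - e^{-3ix})/(12i) + (e^{ix} - e^{-ix})/(4i), which is real."""
    term3 = (cmath.exp(3j * x) - cmath.exp(-3j * x)) / 12j
    term1 = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / 4j
    return (term3 + term1).real

def midpoint_integral(f, a, b, steps=100_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

exact = antiderivative(math.pi / 2) - antiderivative(0.0)
numeric = midpoint_integral(lambda x: math.cos(x) * math.cos(2 * x),
                            0.0, math.pi / 2)
print(exact, numeric)
```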
Answers and solutions
page 124, problem 8:
\begin{align*}
\sin(a+b) &= \left(e^{i(a+b)} - e^{-i(a+b)}\right)/2i \\
&= \left(e^{ia}e^{ib} - e^{-ia}e^{-ib}\right)/2i \\
&= \left[(\cos a + i\sin a)(\cos b + i\sin b) - (\cos a - i\sin a)(\cos b - i\sin b)\right]/2i \\
&= \cos a \sin b + \sin a \cos b
\end{align*}
By a similar computation, we find cos(a + b) = cos a cos b − sin a sin b.
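For readers who like to see numbers, here is a minimal numerical spot check of the sine addition formula, computing the left-hand side from complex exponentials. The angles a and b are arbitrary values I picked.

```python
import cmath
import math

def sin_via_exponentials(theta):
    """sin(theta) written as (e^{i theta} - e^{-i theta}) / 2i."""
    return ((cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / 2j).real

a, b = 0.7, 1.9   # arbitrary test angles in radians
lhs = sin_via_exponentials(a + b)
rhs = math.cos(a) * math.sin(b) + math.sin(a) * math.cos(b)
print(lhs, rhs)
```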
page 124, problem 10: If $z^3 = 1$, then we know that |z| = 1, since
cubing z cubes its magnitude. Cubing z triples its argument, so the
argument of z must be a number that, when tripled, is equivalent to an
angle of zero. There are three possibilities: 0 × 3 = 0, (2π/3) × 3 = 2π,
and (4π/3) × 3 = 4π. (Other possibilities, such as (32π/3), are equivalent
to one of these.) The solutions are:
$$z = 1,\ e^{2\pi i/3},\ e^{4\pi i/3}$$
page 124, problem 11: We can think of this as a polynomial in x or a
polynomial in y — their roles are symmetric. Let’s call x the variable.
By the fundamental theorem of algebra, it must be possible to factor it
into a product of three linear factors, if the coefficients are allowed to
be complex. Each of these factors causes the product to be zero for a
certain value of x. But the condition for the expression to be zero is
$x^3 = y^3$, which basically means that the ratio of x to y must be a third
root of 1. The problem, then, boils down to finding the three third roots
of 1, as in problem 10. Using the result of that problem, we find that
there are zeroes when x/y equals 1, $e^{2\pi i/3}$, and $e^{4\pi i/3}$. This tells us that
the factorization is $(x - y)(x - e^{2\pi i/3}y)(x - e^{4\pi i/3}y)$.
The second part of the problem asks us to factorize as much as possible
using real coefficients. Our only hope of doing this is to multiply out
the two factors that involve complex coefficients, and see if they produce
something real. In fact, we can anticipate that it will work, because the
coefficients are complex conjugates of one another, and when a quadratic
has two complex roots, they are conjugates. The result is $(x - y)(x^2 + xy + y^2)$.
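Both factorizations can be spot-checked numerically. The sketch below assumes the polynomial being factored is $x^3 - y^3$ (the expression that vanishes when $x^3 = y^3$), and the sample points are arbitrary.

```python
import cmath
import math

omega = cmath.exp(2j * math.pi / 3)   # a primitive third root of 1

def complex_factors(x, y):
    """(x - y)(x - omega y)(x - omega^2 y), the fully complex factorization."""
    return (x - y) * (x - omega * y) * (x - omega ** 2 * y)

def real_factors(x, y):
    """(x - y)(x^2 + xy + y^2), the factorization with real coefficients."""
    return (x - y) * (x * x + x * y + y * y)

samples = [(1.5, -0.7), (2.0, 3.0), (-1.25, 0.5)]   # arbitrary test points
max_error = max(
    max(abs(complex_factors(x, y) - (x ** 3 - y ** 3)),
        abs(real_factors(x, y) - (x ** 3 - y ** 3)))
    for x, y in samples
)
print(max_error)
```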
page 124, problem 14: Applying the differential equation to the form
suggested gives $abx^{b-1} = a^{b+1}x^{b^2}$. The exponents must be equal on
both sides, so b must be a solution of $b^2 - b + 1 = 0$. The solutions are
$b = (1 \pm \sqrt{3}\,i)/2$. For a more detailed discussion of this cute problem,
page 125, problem 15: (a) Let m = 10, 000. We know that integrals
of this form can be done, at least in theory, using partial fractions.
The ten thousand roots of the polynomial will be ten thousand points
evenly spaced around the unit circle in the complex plane. They can
be expressed as $r_k = e^{2\pi i k/m}$ for k = 0 to m − 1. Since all the roots
are unequal, the partial-fraction form of the integrand contains only
terms of the form $A_k/(x - r_k)$. Integrating, we would get a sum of ten
thousand terms of the form $A_k \ln(x - r_k)$.
(b) I tried inputting the integral into three different pieces of symbolic
math software: the open-source packages Yacas and Maxima, and the
web-based interface to Wolfram’s proprietary Mathematica software. Maxima gave a partially integrated result after a couple
of minutes of computation. Yacas crashed. Mathematica’s web interface
timed out and suggested buying a stand-alone copy of Mathematica. All
three programs probably embarked on the computation of the Ak by
attempting to solve 10,000 equations in the 10,000 unknowns Ak , and
then ran out of resources (either memory or CPU time).
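(c) The expressions look nicer if we let $\omega = e^{2\pi i/m}$, so that $r_k = \omega^k$. The
residue method gives
$$\frac{1}{x^m - 1} = \sum_{k=0}^{m-1} \frac{1}{(x - \omega^k)\,m\,\omega^{k(m-1)}}.$$
Integration gives
$$\int \frac{dx}{x^m - 1} = \sum_{k=0}^{m-1} \frac{\ln(x - \omega^k)}{m\,\omega^{k(m-1)}}.$$
(Thanks to user zulon for suggesting the
residue method, and to Robert Israel for pointing out that for
|x| < 1 this can also be
expressed as a hypergeometric function:
$(-x)\,{}_2F_1\!\left(\frac{1}{m}, 1; 1 + \frac{1}{m}; x^m\right)$.)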
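The residue-method decomposition is easy to test for a small exponent, where the sum has only a handful of terms. This is only a sketch: m = 6 and the test point x are arbitrary choices of mine, standing in for the m = 10,000 of the problem.

```python
import cmath
import math

def partial_fraction_sum(x, m):
    """Sum over k of 1 / ((x - omega^k) * m * omega^(k(m-1)))."""
    omega = cmath.exp(2j * math.pi / m)
    total = 0
    for k in range(m):
        root = omega ** k
        total += 1 / ((x - root) * m * root ** (m - 1))
    return total

m = 6                  # small stand-in for m = 10,000
x = 0.3 + 0.2j         # arbitrary point that is not a root of x^m - 1
direct = 1 / (x ** m - 1)
error = abs(partial_fraction_sum(x, m) - direct)
print(error)
```

Each coefficient costs one multiplication here, so nothing resembling a 10,000-by-10,000 linear system is needed, which may be why the residue method succeeds where the general-purpose software stalled.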
C Photo Credits
Except as specifically noted below or in a parenthetical credit in the caption of a
figure, all the illustrations in this book are under my own copyright, and are copyleft
licensed under the same license as the rest of the book.
In some cases it’s clear from the date that the figure is public domain, but I don’t
know the name of the artist or photographer; I would be grateful to anyone who
could help me to give proper credit. I have assumed that images that come from
U.S. government web pages are copyright-free, since products of federal agencies fall
into the public domain. I’ve included some public-domain paintings; photographic
reproductions of them are not copyrightable in the U.S. (Bridgeman Art Library,
Ltd. v. Corel Corp., 36 F. Supp. 2d 191, S.D.N.Y. 1999).
cover: Daniel Schwen, 2004; GFDL licensed.
8 Gauss: C.A. Jensen (1792-1870).
11 Newton: Godfrey Kneller, 1702.
19 Basketball photo: Wikimedia Commons user Reisio, public domain.
25 Leibniz: Bernhard Christoph Francke,
30 Berkeley: public domain.
31 Robinson: public-domain 1951 passport photo.
38 Gears: Jared C. Benedict, CC-BY-SA licensed.
120 Euler: Emanuel Handmann, 1753.
129 tightrope walker: public domain, since Blondin died in 1897.
D References and Further Reading
Further Reading
The amount of high-quality material on elementary calculus available
for free online these days is an embarrassment of riches, so most of my
suggestions for reading are online. I’ll refer to books in this section only
by the surname of the first author; the references section below tells you
where to find the book online or in print.
The reader who wants to learn more about the hyperreal system might
want to start with Stroyan and the article. For more
depth, one could next read the relevant parts of Keisler. The standard
(difficult) treatise on the subject is Robinson.
Given sufficient ingenuity, it’s possible to develop a surprisingly large
amount of the machinery of calculus without using limits or infinitesimals. Two examples of such treatments that are freely available online
are Marsden and Livshits. Marsden gives a geometrical definition of the
derivative similar to the one used in ch. 1 of this book, but in my opinion his efforts to develop a sufficient body of techniques without limits
or infinitesimals end up bogging down in complicated formulations that
have the same flavor as the Weierstrass definition of the limit and are
just as complicated. Livshits treats differentiation of rational functions
as division of functions.
Tall gives an interesting construction of a number system that is smaller
than the hyperreals, but easier to construct explicitly, and sufficient to
handle calculus involving analytic functions.
References
Keisler, J., Elementary Calculus: An Approach Using Infinitesimals,
Livshits, Michael,
Nonstandard Analysis and the Hyperreals,
Robinson, A., Non-Standard Analysis, Princeton University Press
Stroyan, K., A Brief Introduction to Infinitesimal Calculus,
Tall, D., Looking at graphs through infinitesimal microscopes, windows and telescopes, Mathematical Gazette, 64, 22-49,
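E Reference
E.1 Review
Quadratic equation:
The solutions of $ax^2 + bx + c = 0$ are $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
Logarithms and exponentials:
$\ln(ab) = \ln a + \ln b$
$e^{a+b} = e^a e^b$
$\ln e^x = e^{\ln x} = x$
$\ln(a^b) = b \ln a$
Geometry, area, and volume
area of a triangle of base b and height h $= \frac{1}{2}bh$
circumference of a circle of radius r $= 2\pi r$
area of a circle of radius r $= \pi r^2$
surface area of a sphere of radius r $= 4\pi r^2$
volume of a sphere of radius r $= \frac{4}{3}\pi r^3$
Trigonometry with a right triangle
$\sin\theta = o/h$
$\cos\theta = a/h$
$\tan\theta = o/a$
Pythagorean theorem: $h^2 = a^2 + o^2$
Trigonometry with any triangle
Law of Sines:
$\frac{\sin\alpha}{A} = \frac{\sin\beta}{B} = \frac{\sin\gamma}{C}$
Law of Cosines:
$C^2 = A^2 + B^2 - 2AB\cos\gamma$
E.2 Hyperbolic functions
$\sinh x = \frac{e^x - e^{-x}}{2}$
$\cosh x = \frac{e^x + e^{-x}}{2}$
$\tanh x = \frac{\sinh x}{\cosh x}$
E.3 Calculus
Let f and g be functions of x, and let c be a constant.
Rules for differentiation
Linearity of the derivative:
$\frac{d}{dx}(cf) = c\,\frac{df}{dx}$
$\frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}$
The chain rule:
$\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x)$
Derivatives of products and quotients:
$\frac{d}{dx}(fg) = \frac{df}{dx}\,g + f\,\frac{dg}{dx}$
$\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{f'}{g} - \frac{f g'}{g^2}$
Integral calculus
The fundamental theorem of calculus:
$\int \frac{df}{dx}\,dx = f$
Linearity of the integral:
$\int c f(x)\,dx = c \int f(x)\,dx$
$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$
Integration by parts:
$\int f\,dg = fg - \int g\,df$
Table of integrals
$\int x^m\,dx = \frac{1}{m+1}x^{m+1} + c, \quad m \ne -1$
$\int \frac{dx}{x} = \ln|x| + c$
$\int \sin x\,dx = -\cos x + c$
$\int \cos x\,dx = \sin x + c$
$\int e^x\,dx = e^x + c$
$\int \ln x\,dx = x\ln x - x + c$
$\int \frac{dx}{1 + x^2} = \tan^{-1} x + c$
$\int \frac{dx}{\sqrt{1 - x^2}} = \sin^{-1} x + c$
$\int \cosh x\,dx = \sinh x + c$
$\int \sinh x\,dx = \cosh x + c$
$\int \tan x\,dx = -\ln|\cos x| + c$
$\int \cot x\,dx = \ln|\sin x| + c$
$\int \sec x\,dx = \ln|\sec x + \tan x| + c$
$\int \sec^2 x\,dx = \tan x + c$
$\int \csc^2 x\,dx = -\cot x + c$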