Proofs: "That's some catch, that Catch-22"
David has proof seminars on Friday 12:30 PM and 3:00 PM in room 221
and is holding office hours at 10:00 AM.
In this course, we will learn a few principles for mathematical argumentation and
reasoning. We summarize here a few:
Week 13:
- In the proof of the divergence theorem, one can see an important
principle: the divide and conquer strategy. In order to
prove any of the Stokes type theorem, it is enough to prove it
for fields of the form [P,0,0], [0,Q,0] or [0,0,R]. If it is true
for these building blocks, then by linearity, it is true
in general. This is the linearity on the field side.
An other linearity applies on the geometry side. We can split up
a region into smaller parts. The integrals add up. The magic happens
at the interfaces of two parts. The boundary contributions cancel
at intersections. This results that only the boundary part
survives. This is the essential part why all these integral
theorems work.
Week 12:
- For the application part, we did not see much proof methods
but motivation for doing the mathematics we do.
Electromagnetism, Fluid dynamics and Topology, these were all
historically pivotal for the development of the mathematics of
Stokes theorem.
Week 11:
- The proof of Green can be simplified if the region is both
``left to right" and ``right to left". Fortunately, we can cut up a
general region into small enough regions which are of both types.
Now, there is a simple trick: if F=[P,0] use the bottom to top
and if F=[0,Q], use the left to right to verify the claim. Then things
are easy and we do not have to fight with too many terms. The handout
has been updated and the proof got considerably simpler.
This is by the way how all calculus books prove the theorem.
How can one see that small enough cut regions are both left to right and
bottom to top? What happens is that by linearization, the region G intersected
with a small slightly rotated square is a close to a polygon. And if we have no
horizontal, nor vertical parts, then the region can be written as
G = { c(x) < d(x) } = { a(y) < b(y) }
So, to make this more precise, first prove that any convex polygon with no horizontal,
nor vertical parts is both ``left to right" and ``bottom to top". Then conclude that
if the sides are replaced by nonlinear curves close enough to the sides both in slope
as well as in position, then the property of being both ``left to right" and ``bottom to top"
persists.
|
|
Bottom to Top | Left to Right
|
Week 10:
- In the line integral handout there is
a proof that if F has the closed loop property, then
a potential function f can be obtained from F by building the line integral
from (0,0) to (x,y). Now, we have proven the statement by building two different
paths which go from (0,0) to (x,y). They only go horizontally or vertically.
The verification that fx=P and fy=Q can then be done separately.
The question is now, whether there is not a simpler proof. Yes, it appears there is:
define the function f as the line integral from (0,0) to (x,y). By the path independence
this is a well defined function. Now define G= grad(f). Now build the vector field H=F-G.
By the fundamental theorem of line integral this
vector field has now the property that any line integral from (0,0) to any point (x,y)
is zero. Now, we only have to show that if a vector field H has this property that any
line integral is zero, then it is the zero vector field. But that follows by contradiction.
Assume, that H(a,b) = v is not zero at some point (a,b). Then take a path r(t) = (a,b) + t v.
Now by linearization, the line integral for some short time t is t |v|2 + R(t),
where R(t) is a function behaving like t2. So there is a line integral which is
nonzero. Contradiction. This proof has the disadvantage that it uses the "proof by contradiction
technique" which one should avoid if possible (it is easier to make mistakes when using
contradiction), even so the proof is correct.
- While grading the exam, it became that some confuse still the proof by
contradiction and finding a counter example. We have asked to prove or
disprove that the sum of two odd numbers is odd. Giving a counter example 3+5=8 is
even is not a proof by contradiction. Yes, the statement contradicts the claim, but
we just call it a counter example. In the exam problem, where we asked to prove
that the product of two odd numbers is odd, many have used a proof by contradiction.
This is ok, but better is a constructive proof (which has been done by maybe two thirds
of the class:
Theorem: if a and b are odd, then ab is odd.
Proof. we can write a=2m+1 and b=2n+1. The product is ab=(2m+1)(2n+1) = 4nm+2m+2n+1 = 2k+1
with k=2nm+m+n. So, also ab is odd. QED.
Week 9:
- In homework 25.2b and 25.3, a combinatorial principle comes into play.
It is the ``exclusion-inclusion principle". Suppose we want to find the complement of
three sets A,B,C, in a larger set X and we want to measure the complement. Then
|X-(C ∪ B ∪ C)| = |X| - |A| - |B| - |C| + |A ∩ B| + |A ∩ C| + |B ∩ C| - |A ∩ B ∩ C|.
Week 8:
- In Unit 24, we look a bit at literature. There are a few sample pages from each of the
four books here [PDF]. You can also look at the samples
individually:
Polya (the principles) [PDF],
Tao (solving math problems) [PDF],
Perkins (The Klondike picture about breakthrough) [PDF],
Posamentier Krulik (the reseating problem) [PDF]
- We might wonder why in the Fubini counter example, we can not just away a small neighborhood
around 0 to eliminate the singularity and have a perfectly continuous function
f(x,y) = (x2-y2)/(x2+y2)2 away from (0,0)
and Fubini holds. This is really a bit mindboggling.
If one makes the hole smaller and smaller then the limit exists and gives a Fubini type identity
also in the limit. But this is a regularized limit. Like ∫-11 1/x dx
can be regularized (Cauchy) by taking the limit of ∫I(ε) 1/x dx with
I(ε) = [-1,ε] union [ε,1] where the limit is is 0, evenso 1/x can not be
integrated from 0 to 1.
- The Riemann integral is not only less powerful, it is also more complicated
to deal with if regions have complicated boundaries. But if we make a Riemann sum approximation
of the integral, we have to match the region with unions of rectangles. Of course, this could
be modified, but then we would have to say exactly which type of partitions we can use. With
this modification the set-up would become even more complicated without gaining anything in
generality. The Archimedes basic version of the Riemann integral is good enough to deal with
continuous functions defined on regions bound by various differentiable curves. The generalized
set-up about Riemann integrals one can see in the literature is a bit of a waste of time as the
integral must for more sophisticated mathematics (like probability theory) in anyway be replaced by
the Lebesgue integral (which is not only more elegant, but also simpler than the Riemann integral.
There is a bit of abstraction however coming in. You get a sense on page 3 in
Unit 8 [PDF] which gives the definition of the Lebesque integral.
- The question of signs of the distortion factor in integration comes always up when making changes of variables.
In single variable one can hide the difficulty by orienting things. For example, when computing the arc length
of a curve r(t), 0 < t < 1, we integrate ∫01 |r'(t)| dt. This is
independent of the parametrization. It is also clear that the answer can not become negative, even if we go backwards.
So, if we integrate the other way round, we could use -dt, we just change the orientation of time
too (the change of sign can not be packed into the |dr| integration factor.
because the integration factor is (dr(t)^T dr(t))1/2).
In the case of coordinate changes with the same dimension, we have determinants of n x n matrices. The reason why
we can avoid in trouble in dimension 1 is that every one-dimensional object can be oriented.
In higher dimensions this is no more possible and as mentioned in the footnote of unit 23, we will
have to be a bit more sophisticated. The difference becomes dramatic in more general versions of
mathematics, where objects can become more general: in geometric probability theory (a Bosonic theory),
the integrals are called valuations, one deals then with properties like length, area or volume, but
also defines seemingly strange things like the length of a surface.
An then there is geometric measure theory (definitely a Fermionic theory), where one generalizes objects
too but where also the integration is kept orientation dependent and where one has a Stokes theorem dealing
with generalized forms. The cracks are already visible in single variable; in multivariable it will become dramatic.
Actually, things are very simple, if we just completely ignore yet any orientation issues and
define the Riemann integrals both in single as well as in multivariable, as something which does
not depend on the orientation of the underlying space. That is what we have done. A consequence
is to keep the distortion factor in ``change of variable formulas" non-negative. So,
if I=[0,1] is the unit interval and f(x)=x2 is a function representing a population density for example,
then ∫I x2 dx = 1/3 is just adding up the density values.
If we make a change of variables φ(x)=1-x, then this leaves the interval invariant and
|dφ| = 1 and ∫I (1-x)2 dx is still 1/2.
There is no need to change both the orientation of the ``space" I as well as of ``time" dx.
Week 7:
- We had not much time for the Morse lemma in class.
Try to read the proof from unit 19. It is a rather amazing result as, it allows (without the use
of linear algebra tricks) to see that near a critical point with non-zero determinant D,
we can find coordinates in which the function is a nice simple function. In dimension 2,
it implies that we can find coordinates near a critical point so that the function is
f(x,y) = x2+y2 or f(x,y) = -x2-y2 or
then f(x,y) = -x2+y2. You will in the proof seminar look a bit
at some amazing properties of Morse functions. One can use them to understand geometric
spaces like surfaces. Morse theory takes away any pathologies or crazyness of a space and allows
to use calculus to compute topological invariants. It makes geometry more rounded and accessible.
- Maybe a bit more personal: one of my calculus teachers was Eugene Trubowitz. He did some
nice things in Calc III (multivariable calculus), like dealing with the
Zeta Function [a page from my
own handwritten notes from lecture]. You see a picture of Trubowitz teaching that course
here.
Trubowitz told us: "Morse theory is very important". Many other mathematicians agree.
An Article of Raoul Bott from 1988 [PDF]
is especially nice. Bott
was one of the top mathematicians in the world and had his office just opposite to 507,
our classroom. At the end of the article Cliff Taubes is mentioned, who has his office also just nearby
and who used the theory at various occasions. Despite Morse theory being something classical, it is
still very modern. By the way, Marston Morse got both his Masters as well as his PhD degree from
Harvard. In any way, it is good to know that Harvard is a historical place for Morse theory.
Week 6:
- A proof of the Taylor theorem with remainder is given in
Stewart.
I presented the theorem in the following form
f(x+t) = ∑k=1n f(k)(x) tk/k! + ∫0t (t-s)n f(n+1) (x+s) ds/n!
(it is provided in a footnote to the handout). There is no need to learn this by heart. But one should take away that it allows
to estimate how close we are to the actual function. It is of the order C t(n+1)/n! if C is the maximal value of the (n+1)th derivative of f in the interval [0,t]. Now, as shown in class, this formula is clear in the case n=0, where it is just the
fundamental theorem of calculus. As also explained shortly in the lecture, the
induction step from n to n+1 is just integration by parts. To get from n to n+1,
we only have to show that three terms add to zero. Here is what I wrote on the blackboard
from the preparation notes [JPG],
[PDF]. If you stare the boxes, you see the integration by parts.
- The telescopic sum idea in the proof of the chain rule is central
in all of calculus. We wrote the derivative d/dt f(r(t)) as a difference
quotient and the split that up into a sum of difference quotients which
give fx x'(t) + fy y'(t) + fz z'(t).
Week 5:
- Thanks to Elliot to provide some references for proofs of Clairaut. There is a proof using
the mean value theorem in Stewart's book.
There is a proof of
Clairaut theorem which uses the Weierstrass approximation theorem. I had actually considered
until the day before the lecture to present such a proof but then decided against it because
the Weierstrass theorem has not been proven and because the Fubini proof allowed to use the
technique of proof by contradiction. The Fubini proof is also the most intuitive and it
produces a beautiful discrete version of the mixed derivative which can be used in discrete
setups like computer graphics. We will prove Fubini in the integration part of this course.
- The method of deformation is very important in modern geometric proofs.
The only of the Millenium problem which has been solved (the Poincaré conjecture)
had been proven by deformation using a partial differential equation, the Ricci flow.
This partial differential equation deforms the metric of a manifold.
- A new proof method which appeared this week is ``proof by contradiction".
In order to prove A ⇒ B, one assumes that B is false and concludes from
this that A is false. This contradiction proves then that B must be true.
It is a basic logic step that A ⇒ B is equivalent to
Not(B) ⇒ Not(A). This logical step is often done wrong:
The statement A ⇒ B does NOT imply B ⇒ A.
The statement "it rains therefore the street is wet" is not equivalent to
"the street is wet therefore it rains". The street could be wet also from
other reasons, like having swiped by a cleaning machine or from a shower
which has already passed.
- The animation of the Hopf Umlaufsatz has been done for
this page from 2006.
I had been told about this beautiful proof of Heinz
Hopf by Shlomo Sternberg sometime between 2002 and 2006.
Week 4:
- To prove that the Mandelbrot set is contained
in the disc of radius 2, one can prove by induction that if |c| = 2+a
then Tn(0) ≥ 2+n a . This statement S(n) can be
proven by induction using that T(0)=c, T2(0)=T(T(0)) = c2 + c.
Useful is the triangle inequality |z2 + c| ≥ |z|2 - |c|.
For the Mandelbulb set, where we take the 8th power, the induction proof becomes a
bit more messy when proving the statement S(n) again.
What is funny is that proving the stronger statement
S(n): Tn(0) ≥ 2n, (which means exponential growth), then
the proof is easier. The inequality Tn+1(0) ≥ 2n+1 is
easier to verify because the induction assumption is stronger too.
- If you look through the proofs we have done so far, we have
seen that in many cases, the proofs were just computations, but
computations done with variables. And then there was induction.
The uniqueness of row reduction was such a proof. But it appears implicitly
also in arguments like that the Mandelbrot set is contained in |c| ≤ 2.
As you want to prove there that if |c|=2+a, then |c^2+c| ≥ 2+2a. So,
if you start outside the circle of radius 2, then your distance to the circle
doubles in each step.
The induction technique is the only heavier proof technique we ask you to
master in the first midterm. Of course, you should be able to apply a theorem.
An example is to apply the intermediate value theorem to establish the existence
of a root.
We will look at more proof techniques later on. For now, it is also
important that you appreciate that mathematical objects have to be
defined properly. And an important ability is also to be able to follow a
given proof. A good and already quite challenging exercise is the proof that
a continuous function is Riemann integrable or to understand the ``wobbly table
theorem".
- When building up calculus leading to the Euler formula
exp(i x) = cos(x) + i sin(x), there are several pieces which
need to be defined and then put together. Here is an elegant way:
first start defining the exponential, then use the Euler formula
as a definition for the trig functions. I used this once for
a presentation in quantum calculus, where one does not take any limits.
Here
is the talk (20 x 20 seconds long). In quantum calculus, one does
not take the limit but defines the derivative of a function as f(x+1)-f(x).
Now, if things are defined properly, then all the calculus we know goes
over to the discrete. But in the discrete, the functions do not even have
to be continuous. We have the fundamental theorem of calculus, the Taylor
formula, solutions to differential equations etc.
Week 3:
- We have seen in the proof that a continuous function on [a,b] is Riemann
integrable how useful it is to split the task up into smaller pieces. This
is not only useful in mathematics, it is useful everywhere, like when working
on a project. We have taken three knowledge blocks for granted: the extremal value theorem,
the intermediate value theorem and the Heine-Cantor theorem which tells that a
function is continuous on [a,b] if and only if there exists a sequence of numbers
Mn going to zero such that whenever we take two points of distance
|x-y| ≤ 1/n, then their function values satisfies |f(x)-f(y)| ≤ Mn.
This is called uniform continuity and could be taken as the definition of continuity
as it is equivalent on a closed and bounded interval. Now, using these three
statements, the proof of the convergence of the Riemann integral was not that bad.
During class, there was the excellent question, why we did not use the mean value
theorem instead. I answered that it would be ``catch 22" as the mean value theorem
either assumes the function to be differentiable or then assumes the fundamental
theorem of calculus already assumes the existence of the integral. We avoided the catch 22
by defining the integral as a limsup. In order to talk about some thing, we needed a
preliminary definition. Defining the integral as a limit would have been catch 22, since
the goal of that entire venture was to prove that the limit exists.
- If T(t) is the unit tangent vector, then T' is perpendicular to T.
Proof: T.T =1 implies T'.T + T.T' = 0 so that T'.T = 0.
We have used the product formula d/dt (T . S ) = T'.S + T.S'.
- In general, algebra can be an adventure: examples:
- "We all know that 1/4 dollar is 25 cents. Take the square root: so 1/2 dollar is 5 cents!"
- "1 dollar = 100 cents = (10 cents)2 = (0.1 dollars)2 = 0.01 dollars = 1 cent"
- "x=0 multiply with x-1 to get x(x-1)=0. Now divide by x to get x-1=0 or x=1. We have shown 0=1"
You find lots of more such fun on This page.
Mathematica was careful to simplify an equation. When we wanted to equate the curvatures |T'|/|r'| and |r''xr'|/|r'|3|,
the computer refused to simplify at first. We had to use a trick.
Week 2:
- A good example can be essential to understand a theorem.
(This was used when verifying AT being perpendicular to solution space Ax=b.)
Even better is if you have a picture associated to the result. The abstract
theorem that the kernel of A is perpendicular to the image of AT
can best be thought of by seeing the kernel of A as a plane and the image
of AT as a normal vector to that plane. A good picture helps to keep the
result in the memory.
- We can make use of AI. A computer algebra system can help us to verify tedious
computations. (This was used when verifying the Cauchy-Binet theorem).
- We have also seen that a computer verification does not necessarily give insight.
To see things, one has to work harder. A visual explanation can potentially even
confuse. One can hardly argue with a naked algebraic verification but argue wrongly
with a visual proofs. Still, some geometric insight can help to appreciate the result:
the Cauchy Binet identity appears in disguise in HW 6.1, the three dimensional
Pythagorean theorem: ``the sum of the squares of the areas of the sides of a corner
is the square of the area of the cutting face".
Week 1:
- It is an important Catch 22 principle: avoid circular arguments.
(This applied when deriving Pythagoras. We took great care to avoid Pythagoras before proving it).
- The reduction principle "Without loss of generality" WLOG
allows to reduce a statement to a simpler statement. (This was was used in Cauchy-Schwarz)
- The principle "You shall only use precise definitions!" is necessary to avoid confusion
and errors. [This was used when introducing vectors or angles. A vector is not an object with
magnitude and direction. This definition is gibberish even so it sounds good.
A movie has a director and a length but it is hardly a vector.
The zero vector does not have a direction!
We have defined here vectors as a special class of
matrices and defined length as |v|=(v.v)1/2 and can define the direction of a non-zero
vector as v/|v|. ]
- Breaking up a proof using a lemma helps to reduce the complexity of a proof (was used in rref proof)
- The "principle of "Induction": check the statement for n=1, then check that
the assumption for n implies the statement for n+1. (was used in rref)
- Finding or even reading a proof can be hard. It is good to know this and enjoy this. A proof is
sometimes like a puzzle. Finding a proof is like inventing a puzzle.
Literature can help. (This applies to rref proof)