# Proofs: "That's some catch, that Catch-22"

David holds proof seminars on Fridays at 12:30 PM and 3:00 PM in room 221 and office hours at 10:00 AM. In this course, we will learn a few principles of mathematical argumentation and reasoning. We summarize some of them here:

## Week 13:

• In the proof of the divergence theorem, one can see an important principle: the divide-and-conquer strategy. In order to prove any of the Stokes-type theorems, it is enough to prove it for fields of the form [P,0,0], [0,Q,0] or [0,0,R]. If the theorem is true for these building blocks, then by linearity it is true in general. This is linearity on the field side. Another linearity applies on the geometry side: we can split up a region into smaller parts, and the integrals add up. The magic happens at the interfaces of two parts: the boundary contributions cancel at the intersections, so that only the outer boundary survives. This is the essential reason why all these integral theorems work.
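The building-block idea can be sanity-checked numerically. In this sketch (my own illustration, not from the handout; the field P = xyz and the unit cube are arbitrary test choices), the flux of F = [P,0,0] through the boundary of the cube is compared with the volume integral of div(F) = P_x:

```python
# Check the divergence theorem for the building block F = [P,0,0]
# with P = x*y*z on the unit cube (illustrative choice of field).
n = 50
h = 1.0 / n

# Volume integral of div(F) = P_x = y*z (midpoint rule).
vol = 0.0
for i in range(n):
    for j in range(n):
        for k in range(n):
            y, z = (j + 0.5) * h, (k + 0.5) * h
            vol += y * z * h**3

# Flux through the boundary: only the face x = 1 contributes, since
# F = [P,0,0] is tangent to the y- and z-faces and P(0,y,z) = 0.
flux = 0.0
for j in range(n):
    for k in range(n):
        y, z = (j + 0.5) * h, (k + 0.5) * h
        flux += y * z * h**2

assert abs(vol - flux) < 1e-9   # both equal 1/4
```

Both sums come out to 1/4, as the exact computation predicts.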

## Week 12:

• In the application part, we did not see many proof methods but rather motivation for doing the mathematics we do. Electromagnetism, fluid dynamics and topology were all historically pivotal for the development of the mathematics of Stokes' theorem.

## Week 11:

• The proof of Green's theorem can be simplified if the region is both ``left to right" and ``bottom to top". Fortunately, we can cut up a general region into small enough regions which are of both types. Now there is a simple trick: if F=[P,0], use the bottom-to-top description, and if F=[0,Q], use the left-to-right description to verify the claim. Then things are easy and we do not have to fight with too many terms. The handout has been updated and the proof got considerably simpler. This is, by the way, how all calculus books prove the theorem. How can one see that small enough cut regions are both left to right and bottom to top? By linearization, the region G intersected with a small, slightly rotated square is close to a polygon. And if the polygon has no horizontal and no vertical sides, then the region can be written as G = { (x,y) : c(x) < y < d(x) } = { (x,y) : a(y) < x < b(y) }. To make this more precise, first prove that any convex polygon with no horizontal and no vertical sides is both ``left to right" and ``bottom to top". Then conclude that if the sides are replaced by nonlinear curves close enough to the sides, both in slope as well as in position, then the property of being both ``left to right" and ``bottom to top" persists.
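The F=[P,0] half of the trick can be illustrated numerically. In this sketch (my own test choices: P = x y^2 on the unit square), the double integral of -P_y is compared with the counterclockwise line integral of P dx around the boundary:

```python
def P(x, y):
    return x * y**2          # test choice; the curl part is -P_y = -2*x*y

n = 400
h = 1.0 / n

# Double integral of -P_y over the unit square (midpoint rule).
double = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h
        double += -2 * x * y * h * h

# Line integral of P dx counterclockwise: the vertical edges have dx = 0
# and P(x,0) = 0, so only the top edge (y = 1, right to left) contributes.
line = 0.0
for i in range(n):
    x = 1 - (i + 0.5) * h
    line += P(x, 1.0) * (-h)

assert abs(double - line) < 1e-9   # both equal -1/2
```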

## Week 10:

• In the line integral handout there is a proof that if F has the closed loop property, then a potential function f can be obtained from F by building the line integral from (0,0) to (x,y). We have proven the statement by building two different paths from (0,0) to (x,y) which only go horizontally or vertically. The verifications that fx=P and fy=Q can then be done separately. The question is now whether there is a simpler proof. Yes, it appears there is: define the function f as the line integral from (0,0) to (x,y). By path independence this is a well-defined function. Now define G = grad(f) and build the vector field H = F - G. By the fundamental theorem of line integrals, this vector field has the property that any line integral from (0,0) to any point (x,y) is zero. Now we only have to show that if a vector field H has the property that every line integral is zero, then it is the zero vector field. That follows by contradiction. Assume H(a,b) = v is not zero at some point (a,b). Take the path r(t) = (a,b) + t v. By linearization, the line integral up to some short time t is t |v|^2 + R(t), where R(t) is a function behaving like t^2. So there is a line integral which is nonzero. Contradiction. This proof has the disadvantage that it uses the proof-by-contradiction technique, which one should avoid if possible (it is easier to make mistakes when using contradiction), even though the proof is correct.
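Path independence for a gradient field is easy to watch numerically. In this sketch (my own example: f(x,y) = x^2 y, so F = grad(f) = [2xy, x^2]), line integrals along two different paths from (0,0) to (1,1) both agree with f(1,1) - f(0,0) = 1:

```python
# F = grad(f) for the test choice f(x,y) = x^2 * y.
def F(x, y):
    return (2 * x * y, x * x)

n = 10000
h = 1.0 / n

# Path A: straight line r(t) = (t, t), r'(t) = (1, 1).
IA = 0.0
for i in range(n):
    t = (i + 0.5) * h
    P, Q = F(t, t)
    IA += (P + Q) * h

# Path B: along the x-axis to (1,0), then straight up to (1,1).
IB = 0.0
for i in range(n):
    t = (i + 0.5) * h
    P, _ = F(t, 0.0)     # horizontal leg: only P dx contributes
    IB += P * h
    _, Q = F(1.0, t)     # vertical leg: only Q dy contributes
    IB += Q * h

assert abs(IA - IB) < 1e-6
assert abs(IA - 1.0) < 1e-6    # f(1,1) - f(0,0) = 1
```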
• While grading the exam, it became clear that some still confuse a proof by contradiction with finding a counterexample. We had asked to prove or disprove that the sum of two odd numbers is odd. Giving the counterexample 3+5=8, which is even, is not a proof by contradiction. Yes, the example contradicts the claim, but we just call it a counterexample. In the exam problem where we asked to prove that the product of two odd numbers is odd, many used a proof by contradiction. This is ok, but a constructive proof is better (and was given by maybe two thirds of the class):
Theorem: if a and b are odd, then ab is odd.
Proof. We can write a=2m+1 and b=2n+1. The product is ab=(2m+1)(2n+1) = 4nm+2m+2n+1 = 2k+1 with k=2nm+m+n. So ab is also odd. QED.

## Week 9:

• In homework 25.2b and 25.3, a combinatorial principle comes into play: the ``inclusion-exclusion principle". Suppose we want to measure the complement of three sets A, B, C in a larger set X. Then |X-(A ∪ B ∪ C)| = |X| - |A| - |B| - |C| + |A ∩ B| + |A ∩ C| + |B ∩ C| - |A ∩ B ∩ C|.
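The identity is easy to check on a concrete example; in this sketch (my own test choice), X = {1,…,100} and A, B, C are the multiples of 2, 3 and 5:

```python
X = set(range(1, 101))
A = {x for x in X if x % 2 == 0}   # multiples of 2
B = {x for x in X if x % 3 == 0}   # multiples of 3
C = {x for x in X if x % 5 == 0}   # multiples of 5

lhs = len(X - (A | B | C))
rhs = (len(X) - len(A) - len(B) - len(C)
       + len(A & B) + len(A & C) + len(B & C)
       - len(A & B & C))

assert lhs == rhs == 26   # 26 numbers up to 100 avoid the factors 2, 3, 5
```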

## Week 8:

• In Unit 24, we look a bit at the literature. There are a few sample pages from each of the four books here [PDF]. You can also look at the samples individually: Polya (the principles) [PDF], Tao (solving math problems) [PDF], Perkins (the Klondike picture about breakthrough) [PDF], Posamentier and Krulik (the reseating problem) [PDF]
• We might wonder why, in the Fubini counterexample, we can not just cut away a small neighborhood around 0 to eliminate the singularity: then f(x,y) = (x^2-y^2)/(x^2+y^2)^2 is perfectly continuous away from (0,0) and Fubini holds. This is really a bit mind-boggling. If one makes the hole smaller and smaller, then the limit exists and gives a Fubini-type identity also in the limit. But this is a regularized limit. Similarly, ∫_{-1}^{1} 1/x dx can be regularized (in the sense of a Cauchy principal value) by taking the limit of ∫_{I(ε)} 1/x dx with I(ε) = [-1,-ε] ∪ [ε,1], where the limit is 0, even though 1/x can not be integrated from 0 to 1.
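One of the iterated integrals can be checked numerically (a sketch of mine, not from the handout): for fixed x > 0, the inner integral of f over y in [0,1] has the antiderivative y/(x^2+y^2) and equals 1/(1+x^2); a Simpson rule confirms this away from the singularity:

```python
def f(x, y):
    return (x * x - y * y) / (x * x + y * y) ** 2

def simpson(g, a, b, m=2000):
    # composite Simpson rule with m subintervals (m even)
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

x = 0.5
inner = simpson(lambda y: f(x, y), 0.0, 1.0)
# the antiderivative in y is y/(x^2+y^2), so the exact value is 1/(1+x^2)
assert abs(inner - 1.0 / (1.0 + x * x)) < 1e-9
```

Integrating 1/(1+x^2) in x over [0,1] then gives arctan(1) = π/4, while doing the y-integration last gives -π/4 by the antisymmetry f(x,y) = -f(y,x): the two iterated integrals disagree.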
• The Riemann integral is not only less powerful, it is also more complicated to deal with if regions have complicated boundaries: when we make a Riemann sum approximation of the integral, we have to match the region with unions of rectangles. Of course, this could be modified, but then we would have to say exactly which type of partitions we can use. With this modification the set-up would become even more complicated without gaining anything in generality. The basic Archimedes version of the Riemann integral is good enough to deal with continuous functions defined on regions bounded by various differentiable curves. The generalized set-up for Riemann integrals one can see in the literature is a bit of a waste of time, as for more sophisticated mathematics (like probability theory) the integral must in any case be replaced by the Lebesgue integral (which is not only more elegant, but also simpler than the Riemann integral). There is a bit of abstraction coming in, however. You get a sense of it on page 3 of Unit 8 [PDF], which gives the definition of the Lebesgue integral.
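The rectangle-matching issue can be seen in a small experiment (my own sketch): approximating the area π/4 of a quarter disc by counting grid cells that lie entirely inside the region and cells that merely touch it gives lower and upper Riemann-type sums that squeeze the true area:

```python
import math

n = 500
h = 1.0 / n

# A cell lies entirely inside the quarter disc if its outer corner does;
# it touches the region if its inner corner does.
inside = sum(1 for i in range(n) for j in range(n)
             if ((i + 1) * h) ** 2 + ((j + 1) * h) ** 2 <= 1.0)
touching = sum(1 for i in range(n) for j in range(n)
               if (i * h) ** 2 + (j * h) ** 2 <= 1.0)

lower, upper = inside * h * h, touching * h * h
assert lower <= math.pi / 4 <= upper
assert upper - lower < 0.01   # only boundary cells differ, and they vanish as n grows
```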
• The question of the sign of the distortion factor in integration always comes up when making changes of variables. In single variable calculus one can hide the difficulty by orienting things. For example, when computing the arc length of a curve r(t), 0 < t < 1, we integrate ∫_0^1 |r'(t)| dt. This is independent of the parametrization. It is also clear that the answer can not become negative, even if we go backwards. So, if we integrate the other way round, we could use -dt; we just change the orientation of time too (the change of sign can not be packed into the |dr| integration factor, because the integration factor is (r'(t)^T r'(t))^{1/2}). In the case of coordinate changes within the same dimension, we have determinants of n x n matrices. The reason why we can avoid trouble in dimension 1 is that every one-dimensional object can be oriented. In higher dimensions this is no longer possible, and as mentioned in the footnote of Unit 23, we will have to be a bit more sophisticated. The difference becomes dramatic in more general parts of mathematics, where objects can become more general: in geometric probability theory (a Bosonic theory), the integrals are called valuations; one deals there with properties like length, area or volume, but also defines seemingly strange things like the length of a surface. And then there is geometric measure theory (definitely a Fermionic theory), where one generalizes objects too, but where the integration is kept orientation dependent and where one has a Stokes theorem dealing with generalized forms. The cracks are already visible in single variable calculus; in multivariable calculus it will become dramatic. Actually, things are very simple if we just completely ignore any orientation issues and define the Riemann integral, both in single as well as in multivariable calculus, as something which does not depend on the orientation of the underlying space. That is what we have done.
A consequence is to keep the distortion factor in ``change of variable formulas" non-negative. So, if I=[0,1] is the unit interval and f(x)=x^2 is a function representing a population density for example, then ∫_I x^2 dx = 1/3 is just adding up the density values. If we make the change of variables φ(x)=1-x, then this leaves the interval invariant, |dφ| = 1, and ∫_I (1-x)^2 dx is still 1/3. There is no need to change both the orientation of the ``space" I as well as of ``time" dx.
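A one-line numerical check of this example (my own sketch): both integrals over I = [0,1] give 1/3, with no sign change needed anywhere:

```python
n = 100000
h = 1.0 / n

# midpoint Riemann sums for the two integrals over I = [0,1]
I1 = sum(((i + 0.5) * h) ** 2 for i in range(n)) * h        # integral of x^2
I2 = sum((1 - (i + 0.5) * h) ** 2 for i in range(n)) * h    # integral of (1-x)^2

assert abs(I1 - 1 / 3) < 1e-6
assert abs(I2 - 1 / 3) < 1e-6
```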

## Week 7:

• We did not have much time for the Morse lemma in class. Try to read the proof from Unit 19. It is a rather amazing result, as it allows us (without the use of linear algebra tricks) to see that near a critical point with non-zero determinant D, we can find coordinates in which the function is a nice simple function. In dimension 2, it implies that we can find coordinates near a critical point so that the function is f(x,y) = x^2+y^2, f(x,y) = -x^2-y^2 or f(x,y) = -x^2+y^2. You will look at some amazing properties of Morse functions in the proof seminar. One can use them to understand geometric spaces like surfaces. Morse theory takes away any pathologies or craziness of a space and allows us to use calculus to compute topological invariants. It makes geometry more rounded and accessible.
• Maybe a bit more personal: one of my calculus teachers was Eugene Trubowitz. He did some nice things in Calc III (multivariable calculus), like dealing with the zeta function [a page from my own handwritten notes from the lecture]. You see a picture of Trubowitz teaching that course here. Trubowitz told us: "Morse theory is very important". Many other mathematicians agree. An article of Raoul Bott from 1988 [PDF] is especially nice. Bott was one of the top mathematicians in the world and had his office just opposite 507, our classroom. At the end of the article Cliff Taubes is mentioned, who also has his office nearby and who has used the theory on various occasions. Despite Morse theory being something classical, it is still very modern. By the way, Marston Morse got both his Masters as well as his PhD degree from Harvard. In any case, it is good to know that Harvard is a historical place for Morse theory.

## Week 6:

• A proof of the Taylor theorem with remainder is given in Stewart. I presented the theorem in the following form: f(x+t) = ∑_{k=0}^{n} f^{(k)}(x) t^k/k! + (1/n!) ∫_0^t (t-s)^n f^{(n+1)}(x+s) ds (it is provided in a footnote to the handout). There is no need to learn this by heart. But one should take away that it allows us to estimate how close we are to the actual function: the remainder is of the order C t^{n+1}/(n+1)! if C is the maximal value of the (n+1)th derivative of f in the interval [0,t]. As shown in class, this formula is clear in the case n=0, where it is just the fundamental theorem of calculus. As also explained briefly in the lecture, the induction step from n to n+1 is just integration by parts: to get from n to n+1, we only have to show that three terms add to zero. Here is what I wrote on the blackboard from the preparation notes [JPG], [PDF]. If you stare at the boxes, you see the integration by parts.
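The formula can be tested numerically. In this sketch (my own choice of test data: f = exp, x = 0.3, t = 0.5, n = 2), the Taylor polynomial plus the integral remainder reproduces f(x+t):

```python
import math

x, t, n = 0.3, 0.5, 2   # test point, step and order (arbitrary choices)

# Taylor polynomial: sum_{k=0}^{n} f^(k)(x) t^k / k!  (all derivatives of exp are exp)
poly = sum(math.exp(x) * t**k / math.factorial(k) for k in range(n + 1))

def simpson(g, a, b, m=1000):
    # composite Simpson rule with m subintervals (m even)
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

# Integral remainder: (1/n!) * integral_0^t (t-s)^n f^(n+1)(x+s) ds
rem = simpson(lambda s: (t - s) ** n * math.exp(x + s), 0.0, t) / math.factorial(n)

assert abs(poly + rem - math.exp(x + t)) < 1e-10
```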
• The telescopic sum idea in the proof of the chain rule is central in all of calculus. We wrote the derivative d/dt f(r(t)) as a difference quotient and then split that up into a sum of difference quotients which gives fx x'(t) + fy y'(t) + fz z'(t).
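The resulting identity is easy to confirm numerically; here is a sketch with my own test choices f(x,y,z) = xy + z^2 and r(t) = (cos t, sin t, t):

```python
import math

def f(x, y, z):
    return x * y + z * z

def r(t):
    return (math.cos(t), math.sin(t), t)

t0, h = 0.7, 1e-5
# central difference quotient for d/dt f(r(t))
num = (f(*r(t0 + h)) - f(*r(t0 - h))) / (2 * h)

x, y, z = r(t0)
fx, fy, fz = y, x, 2 * z                        # gradient of f at r(t0)
xp, yp, zp = -math.sin(t0), math.cos(t0), 1.0   # r'(t0)
chain = fx * xp + fy * yp + fz * zp

assert abs(num - chain) < 1e-8
```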

## Week 5:

• Thanks to Elliot for providing some references for proofs of Clairaut's theorem. There is a proof using the mean value theorem in Stewart's book. There is also a proof of Clairaut's theorem which uses the Weierstrass approximation theorem. I had actually considered, until the day before the lecture, presenting such a proof, but then decided against it because the Weierstrass theorem has not been proven in this course and because the Fubini proof allowed us to use the technique of proof by contradiction. The Fubini proof is also the most intuitive, and it produces a beautiful discrete version of the mixed derivative which can be used in discrete set-ups like computer graphics. We will prove Fubini in the integration part of this course.
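The discrete mixed derivative mentioned here is symmetric by construction, with no limit needed. A small sketch (my own test function):

```python
import math

def f(x, y):
    return math.sin(x * y) + x**3 * y   # any function works here

h = 1.0          # the identity is exact even for h = 1
x, y = 0.4, 1.3

# discrete derivative of the discrete derivative, taken in both orders
DxDy = (f(x + h, y + h) - f(x + h, y) - f(x, y + h) + f(x, y)) / h**2
DyDx = (f(x + h, y + h) - f(x, y + h) - f(x + h, y) + f(x, y)) / h**2

assert abs(DxDy - DyDx) < 1e-12   # same four terms, just reordered
```

For h → 0 this expression converges to f_xy = f_yx, which is the content of Clairaut's theorem.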
• The method of deformation is very important in modern geometric proofs. The only one of the Millennium problems which has been solved (the Poincaré conjecture) was proven by deformation using a partial differential equation, the Ricci flow. This partial differential equation deforms the metric of a manifold.
• A new proof method which appeared this week is ``proof by contradiction". In order to prove A ⇒ B, one assumes that A is true and B is false and derives a contradiction; this shows that B must be true whenever A holds. Closely related is the basic logical fact that A ⇒ B is equivalent to its contrapositive Not(B) ⇒ Not(A). This logical step is often done wrong: the statement A ⇒ B does NOT imply B ⇒ A. The statement "it rains, therefore the street is wet" is not equivalent to "the street is wet, therefore it rains". The street could be wet for other reasons, like having been swept by a cleaning machine or from a shower which has already passed.
• The animation of the Hopf Umlaufsatz was made for this page in 2006. I had been told about this beautiful proof of Heinz Hopf by Shlomo Sternberg sometime between 2002 and 2006.

## Week 4:

• To prove that the Mandelbrot set is contained in the disc of radius 2, one can prove by induction that if |c| = 2+a with a > 0, then |T^n(0)| ≥ 2+na. This statement S(n) can be proven by induction using T(0)=c and T^2(0)=T(T(0)) = c^2 + c. Useful is the triangle inequality |z^2 + c| ≥ |z|^2 - |c|. For the Mandelbulb set, where we take the 8th power, the induction proof becomes a bit messier when proving the statement S(n) again. What is funny is that when proving the stronger statement S(n): |T^n(0)| ≥ 2 + 2^{n-1} a (which means exponential growth), the proof becomes easier. The inequality |T^{n+1}(0)| ≥ 2 + 2^n a is easier to verify because the induction assumption is stronger too.
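The stronger bound can be watched numerically; in this sketch (the value c = 2.1, so a = 0.1, is my own test choice) the orbit of T(z) = z^2 + c stays above the exponentially growing bound:

```python
a = 0.1
c = 2 + a       # a point with |c| = 2 + a, outside the disc of radius 2
z = 0.0

for n in range(1, 9):
    z = z * z + c                            # z is now T^n(0)
    assert abs(z) >= 2 + 2 ** (n - 1) * a    # distance to 2 doubles each step
```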
• If you look through the proofs we have done so far, you see that in many cases the proofs were just computations, but computations done with variables. And then there was induction. The uniqueness of row reduction was such a proof. But induction appears implicitly also in arguments like the one showing that the Mandelbrot set is contained in |c| ≤ 2: one proves there that if |c|=2+a, then |c^2+c| ≥ 2+2a. So, if you start outside the circle of radius 2, then your distance to the circle doubles in each step. The induction technique is the only heavier proof technique we ask you to master for the first midterm. Of course, you should be able to apply a theorem; an example is to apply the intermediate value theorem to establish the existence of a root. We will look at more proof techniques later on. For now, it is also important that you appreciate that mathematical objects have to be defined properly. An important ability is also to be able to follow a given proof. A good and already quite challenging exercise is the proof that a continuous function is Riemann integrable, or to understand the ``wobbly table theorem".
• When building up calculus leading to the Euler formula exp(i x) = cos(x) + i sin(x), there are several pieces which need to be defined and then put together. Here is an elegant way: first define the exponential, then use the Euler formula as a definition for the trig functions. I used this once for a presentation on quantum calculus, where one does not take any limits. Here is the talk (20 x 20 seconds long). In quantum calculus, one does not take a limit but defines the derivative of a function as f(x+1)-f(x). Now, if things are defined properly, then all the calculus we know carries over to the discrete. But in the discrete, the functions do not even have to be continuous. We have the fundamental theorem of calculus, the Taylor formula, solutions to differential equations etc.
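A tiny sketch of this idea (my own illustration): with the discrete derivative Df(x) = f(x+1) - f(x), the function 2^x plays the role of the exponential, since D(2^x) = 2^{x+1} - 2^x = 2^x:

```python
def D(f):
    # discrete derivative: no limit is taken
    return lambda x: f(x + 1) - f(x)

exp2 = lambda x: 2.0 ** x   # the "exponential" of this discrete calculus

for x in [0.0, 1.0, 2.5, -3.0]:
    assert abs(D(exp2)(x) - exp2(x)) < 1e-12   # 2^x is its own discrete derivative
```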

## Week 3:

• We have seen in the proof that a continuous function on [a,b] is Riemann integrable how useful it is to split a task up into smaller pieces. This is not only useful in mathematics, it is useful everywhere, like when working on a project. We have taken three knowledge blocks for granted: the extremal value theorem, the intermediate value theorem and the Heine-Cantor theorem, which tells us that a continuous function on [a,b] is uniformly continuous: there exists a sequence of numbers M_n going to zero such that whenever we take two points with |x-y| ≤ 1/n, their function values satisfy |f(x)-f(y)| ≤ M_n. This uniform continuity could be taken as the definition of continuity, as it is equivalent to it on a closed and bounded interval. Using these three statements, the proof of the convergence of the Riemann integral was not that bad. During class, there was the excellent question why we did not use the mean value theorem instead. I answered that this would be a ``catch 22", as the mean value theorem either assumes the function to be differentiable or, in the form of the fundamental theorem of calculus, already assumes the existence of the integral. We avoided the catch 22 by defining the integral as a limsup. In order to talk about something, we needed a preliminary definition. Defining the integral as a limit would have been catch 22, since the goal of the entire venture was to prove that the limit exists.
• If T(t) is the unit tangent vector, then T' is perpendicular to T. Proof: T·T = 1 implies T'·T + T·T' = 0, so that T'·T = 0. We have used the product formula d/dt (T·S) = T'·S + T·S'.
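A quick numerical confirmation (my own test curve, the helix r(t) = (cos t, sin t, t)):

```python
import math

def T(t):
    # unit tangent of the helix r(t) = (cos t, sin t, t)
    v = (-math.sin(t), math.cos(t), 1.0)
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

t0, h = 0.9, 1e-5
# central difference quotient for T'(t0)
Tp = tuple((a - b) / (2 * h) for a, b in zip(T(t0 + h), T(t0 - h)))
dot = sum(a * b for a, b in zip(T(t0), Tp))

assert abs(dot) < 1e-8   # T'(t) is perpendicular to T(t)
```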
• In general, algebra can be an adventure. Some examples:
1. "We all know that 1/4 dollar is 25 cents. Take the square root: so 1/2 dollar is 5 cents!"
2. "1 dollar = 100 cents = (10 cents)^2 = (0.1 dollars)^2 = 0.01 dollars = 1 cent"
3. "x=0 multiply with x-1 to get x(x-1)=0. Now divide by x to get x-1=0 or x=1. We have shown 0=1"
You can find lots more such fun on this page. Mathematica was careful when simplifying an equation: when we wanted to equate the curvatures |T'|/|r'| and |r''×r'|/|r'|^3, the computer refused to simplify at first. We had to use a trick.
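The two curvature expressions can at least be compared numerically; here is a sketch with my own test curve r(t) = (t, t^2, t^3):

```python
import math

def rp(t):  return (1.0, 2 * t, 3 * t * t)   # r'(t) for r(t) = (t, t^2, t^3)
def rpp(t): return (0.0, 2.0, 6 * t)         # r''(t)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def T(t):
    v = rp(t)
    n = norm(v)
    return tuple(c / n for c in v)

t0, h = 0.5, 1e-5
# central difference quotient for T'(t0)
Tp = tuple((a - b) / (2 * h) for a, b in zip(T(t0 + h), T(t0 - h)))

k1 = norm(Tp) / norm(rp(t0))                          # |T'| / |r'|
k2 = norm(cross(rpp(t0), rp(t0))) / norm(rp(t0))**3   # |r'' x r'| / |r'|^3

assert abs(k1 - k2) < 1e-6   # the two curvature formulas agree
```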

## Week 2:

• A good example can be essential for understanding a theorem. (This was used when verifying that the image of A^T is perpendicular to the solution space of Ax=b.) Even better is if you have a picture associated to the result. The abstract theorem that the kernel of A is perpendicular to the image of A^T can best be thought of by seeing the kernel of A as a plane and the image of A^T as a normal vector to that plane. A good picture helps to keep the result in memory.
• We can make use of AI. A computer algebra system can help us to verify tedious computations. (This was used when verifying the Cauchy-Binet theorem).
• We have also seen that a computer verification does not necessarily give insight. To see things, one has to work harder. A visual explanation can potentially even confuse: one can hardly argue with a naked algebraic verification, but one can argue wrongly with a visual proof. Still, some geometric insight can help to appreciate the result: the Cauchy-Binet identity appears in disguise in HW 6.1, the three-dimensional Pythagorean theorem: ``the sum of the squares of the areas of the sides of a corner is the square of the area of the cutting face".

## Week 1:

• An important Catch-22 principle: avoid circular arguments. (This applied when deriving the theorem of Pythagoras; we took great care not to use Pythagoras before proving it.)
• The reduction principle ``without loss of generality" (WLOG) allows us to reduce a statement to a simpler statement. (This was used in the proof of Cauchy-Schwarz.)
• The principle "You shall only use precise definitions!" is necessary to avoid confusion and errors. [This was used when introducing vectors or angles. A vector is not ``an object with magnitude and direction". That definition is gibberish, even though it sounds good: a movie has a director and a length, but it is hardly a vector, and the zero vector does not have a direction! We have defined vectors here as a special class of matrices, defined the length as |v|=(v·v)^{1/2}, and can define the direction of a non-zero vector as v/|v|.]
• Breaking up a proof using a lemma helps to reduce the complexity of a proof. (This was used in the rref proof.)
• The principle of ``induction": check the statement for n=1, then check that the assumption for n implies the statement for n+1. (This was used in the rref proof.)
• Finding or even reading a proof can be hard. It is good to know this and to enjoy it. A proof is sometimes like a puzzle; finding a proof is like inventing a puzzle. The literature can help. (This applies to the rref proof.)