Lecture notes, etc., for Math 55b: Honors Real and Complex Analysis (Spring [2010-]2011)

If you find a mistake, omission, etc., please let me know by e-mail.



My office hours are the same as for Math 55a: Fridays 3:00–4:30 PM at my office (Sci Ctr 335), or by appointment.
Tony Feng's section and office hours:
Section: Thursdays 4–5 PM, Science Center 310
Office hours: Thursdays 8–9 PM, also in Science Center 310
Our first topic is the topology of metric spaces, a fundamental tool of modern mathematics that we shall use mainly as a key ingredient in our rigorous development of differential and integral calculus over R and C. To supplement the treatment in Rudin's textbook, I wrote up 20-odd pages of notes in six sections; copies will be distributed in class, and you also may view them and print out copies in advance from the PDF files linked below. [Some of the explanations, as of notations such as f (·) and the triangle inequality in C, will not be necessary; they were needed when this material was the initial topic of Math 55a, and it doesn't feel worth the effort to delete them now that it's been moved to 55b. Likewise for the comment about the Euclidean distance at the top of page 2 of the initial handout on “basic definitions and examples”.]

Metric Topology I
Basic definitions and examples: the metric spaces R^n and other product spaces; isometries; boundedness and function spaces

The “sup metric” on X^S is sometimes also called the “uniform metric” because d(f, g) ≤ r is equivalent to a bound d(f(s), g(s)) ≤ r for all s in S that is “uniform” in the sense that it's independent of the choice of s. Likewise for the sup metric on the space of bounded functions from S to an arbitrary metric space X (see the next paragraph).
If S is an infinite set and X is an unbounded metric space then we can't use our definition of X^S as a metric space because sup_{s∈S} d_X(f(s), g(s)) might be infinite. But the bounded functions from S to X do constitute a metric space under the same definition of d_{X^S}. A function is said to be “bounded” if its image is a bounded set. You should check that d_{X^S}(f, g) is in fact finite for bounded f and g.
Now that metric topology is in 55b, not 55a, the following observation can be made: if X is R or C, the bounded functions in X^S constitute a vector space, and the sup metric comes from a norm on that vector space: d(f, g) = ||f − g||, where the norm ||·|| is defined by ||f|| = sup_s |f(s)|. Likewise for the bounded functions from S to any normed vector space. Such spaces will figure in our development of real analysis (and in your further study of analysis beyond Math 55).
The “Proposition” on page 3 of the first topology handout can be extended as follows:
iv) For every point p of X there exists a real number M such that d(p, q) < M for all q in E.
In other words, for every p in X there exists an open ball about p that contains E. Do you see why this is equivalent to (i), (ii), and (iii)?
Metric Topology II
Open and closed sets, and related notions
tweaked 27.i.11 to correct typo “limits points” at the bottom of page 3, remove superfluous parentheses around “sup E” near the end of page 4 (twice), and make overlines in B, E, etc. look nicer

Metric Topology III
Introduction to functions and continuity
tweaked 27.i.11: the Appendix gives a proof that (C, d) is a metric space, but probably not the [only] proof…
tweaked 30.i.11 to correct a minor typo on page 2: at the end of the proof of the Theorem (Rudin 4.8), the ε-neighborhood is of course N_ε(f(p)), not N_r(f(p)).

Metric Topology IV
Sequences and convergence, etc.
tweaked 29.i.11 for overlines (see 27.i.11 tweak above)

Metric Topology V
Compactness and sequential compactness
tweaked 29.i.11 for overlines (see 27.i.11 tweak above)

Metric Topology VI
Cauchy sequences and related notions (completeness, completions, and a third formulation of compactness)
tweaked 9.ii.11 (reference in p.3 is to Problem 4 of PS1, not Problem 5)

Here is a more direct proof of the theorem that a continuous map f : X → Y between metric spaces is uniformly continuous if X is compact. Assume not. Then there exists ε > 0 such that for all δ > 0 there are some points p, q in X such that d(p, q) < δ but d(f(p), f(q)) ≥ ε. For each n = 1, 2, 3, …, choose p_n, q_n that satisfy those inequalities for δ = 1/n. Since X is assumed (sequentially) compact, we can extract a subsequence {p_{n_i}} of {p_n} that converges to some p in X. But then {q_{n_i}} converges to the same p, because d(p_{n_i}, q_{n_i}) < 1/n_i → 0. Hence both f(p_{n_i}) and f(q_{n_i}) converge to f(p), which contradicts the fact that d(f(p_{n_i}), f(q_{n_i})) ≥ ε for each i.

Our next topic is differential calculus of vector-valued functions of one real variable, building on Chapter 5 of Rudin.
You may have already seen “little oh” and “big Oh” notations. For functions f, g on the same space, “f = O(g)” means that g is a nonnegative real-valued function, f takes values in a normed vector space, and there exists a real constant M such that |f(x)| ≤ M g(x) for all x. The notation “f = o(g)” is used in connection with a limit; for instance, “f(x) = o(g(x)) as x approaches x_0” indicates that f, g are vector- and real-valued functions as above on some neighborhood of x_0, and that for each ε > 0 there is a neighborhood of x_0 such that |f(x)| ≤ ε g(x) for all x in the neighborhood. Thus “f ′(x_0) = a” means the same as “f(x) = f(x_0) + a(x − x_0) + o(|x − x_0|) as x approaches x_0”, with no need to exclude the case x = x_0. Rudin in effect uses this approach when proving the Chain Rule (5.5).
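As a concrete numerical illustration of this little-oh characterization of the derivative (the function sin and the point x_0 = 0.3 below are arbitrary choices):

```python
# Check numerically that f(x) = f(x0) + a*(x - x0) + o(|x - x0|) with
# a = f'(x0): the ratio |f(x0 + h) - f(x0) - a*h| / h should tend to 0.
import math

f, x0 = math.sin, 0.3
a = math.cos(x0)                      # the claimed derivative f'(x0)
for k in range(1, 8):
    h = 10.0 ** (-k)
    err = f(x0 + h) - (f(x0) + a * h)
    print(f"h = {h:.0e}   |err|/h = {abs(err) / h:.3e}")
```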

Apropos the Chain Rule: as far as I can see we don't need continuity of f  at any point except x (though that hypothesis will usually hold in any application). All that's needed is that x has some relative neighborhood N in [a,b] such that f (N) is contained in I. Also, it is necessary that f  map [a,b] to R, but g can take values in any normed vector space.

The derivative of f/g can be obtained from the product rule, together with the derivative of 1/g — which in turn can be obtained from the Chain Rule together with the derivative of the single function 1/x. Once we do multivariate differential calculus, we'll see that the derivatives of f+g, f−g, fg, f/g could also be obtained in much the same way that we showed the continuity of those functions, by combining the multivariate Chain Rule with the derivatives of the specific functions x+y, x−y, xy, x/y of two variables x, y.

As Rudin notes at the end of this chapter, differentiation can also be defined for vector-valued functions of one real variable. As Rudin does not note, the vector space can even be infinite-dimensional, provided that it is normed; and the basic algebraic properties of the derivative listed in Thm. 5.3 (p.104) can be adapted to this generality, e.g., the formula (fg)' = f 'g + fg' still holds if f, g take values in normed vector spaces U, V and multiplication is interpreted as a continuous bilinear map from U × V to some other normed vector space W.

Rolle's Theorem is the special case f (b) = f (a) of Rudin's Theorem 5.10; as you can see it is in effect the key step in his proof of Theorem 5.9, and thus of 5.10 as well.

We omit 5.12 (continuity of derivatives) and 5.13 (L'Hôpital's Rule). In 5.12, see p.94 for Rudin's notion of “simple discontinuity” (or “discontinuity of the first kind”) vs. “discontinuity of the second kind”, but please don't use those terms in your problem sets or other mathematical writing, since they're not widely known. In Rudin's proof of L'Hôpital's Rule (5.13), why can he assume that g(x) does not vanish for any x in (a,b), and that the denominator g(x)−g(y) in equation (18) is never zero?

NB The norm does not have to come from an inner product structure. Often this does not matter because we work in finite dimensional vector spaces, where all norms are equivalent, and changing to an equivalent norm does not affect the definition of the derivative. The one exception to this is Thm. 5.19 (p.113) where one needs the norm exactly rather than up to a constant factor. This theorem still holds for a general norm but requires an additional argument. The key ingredient of the proof is this: given a nonzero vector z in a vector space V, we want a continuous functional w on V such that ||w|| = 1 and w(z) = |z|. If V is an inner product space (finite-dimensional or not), the inner product with z / |z| provides such a functional w. But this approach does not work in general. The existence of such w is usually proved as a corollary of the Hahn-Banach theorem. When V is finite dimensional, w can be constructed by induction on the dimension of V. To deal with the general case one must also invoke the Axiom of Choice in its usual guise of Zorn's Lemma.

We next start on univariate integral calculus, largely following Rudin, chapter 6. The following gives some motivation for the definitions there. (And yes, it's the same Riemann who gave number theorists like me the Riemann zeta function and the Riemann Hypothesis.)
The Riemann-sum approach to integration goes back to the “method of exhaustion” of classical Greek geometry, in which the area of a plane figure (or the volume of a region in space) is bounded below and above by finding subsets and supersets that are finite unions of disjoint rectangles (or boxes). The lower and upper Riemann sums adapt this idea to the integrals of functions which may be negative as well as positive (recall that one of the weaknesses of geometric Greek mathematics is that the ancient Greeks had no concept of negative quantities — nor, for that matter, of zero). You may have encountered the quaint technical term “quadrature”, used in some contexts as a synonym for “integration”. This too is an echo of the geometrical origins of integration. “Quadrature” literally means “squaring”, meaning not “multiplying by itself” but “constructing a square of the same size as”; this in turn is equivalent to “finding the area of”, as in the phrase “squaring the circle”. For instance, Greek geometry contains a theorem equivalent to the integration of x^2 dx, a result called the “quadrature of the parabola”. The proof is tantamount to the evaluation of lower and upper Riemann sums for the integral of x^2 dx.
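Here is the method of exhaustion in miniature, as a numerical check (a minimal sketch; the uniform partition is just one convenient choice) that the lower and upper Riemann sums for the integral of x^2 dx over [0,1] squeeze the value 1/3:

```python
# Lower and upper Riemann sums for the integral of x^2 over [0,1] on the
# uniform partition into n subintervals; since x^2 is increasing there,
# the inf and sup on each subinterval are attained at its endpoints.
def riemann_sums(n):
    xs = [i / n for i in range(n + 1)]
    lower = sum(xs[i] ** 2 * (xs[i + 1] - xs[i]) for i in range(n))
    upper = sum(xs[i + 1] ** 2 * (xs[i + 1] - xs[i]) for i in range(n))
    return lower, upper

for n in (10, 100, 1000):
    lo, up = riemann_sums(n)
    print(f"n = {n:5d}   lower = {lo:.6f}   upper = {up:.6f}")  # both -> 1/3
```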

An alternative explanation of the upper and lower Riemann sums, and of “partitions” and “refinements” (Definitions 6.1 and 6.3 in Rudin), is that they arise by repeated application of the following two axioms describing the integral (see for instance L. Gillman's expository paper in the American Mathematical Monthly (Vol. 100 #1, 16–25)): first, the integral of f over [a,b] is the sum of its integrals over [a,c] and [c,b] whenever a ≤ c ≤ b; second, the integral of f over [a,b] lies between (b−a) inf f and (b−a) sup f.

The latter axiom is a consequence of the following two: the integral of a constant function from a to b is that constant times the length b−a of the interval [a,b]; and if fg on some interval then the integral of f over that interval does not exceed the integral of g. Note that again all these axioms arise naturally from an interpretation of the integral as a “signed area”.

The (Riemann-)Stieltjes integral, with dα in place of dx, is then obtained by replacing each Δx = b−a by Δα = α(b)−α(a).

In Theorem 6.12, property (a) says the integrable functions form a vector space, and the integral is a linear transformation; property (d) says it's a bounded transformation relative to the sup norm, with operator norm at most Δα = α(b)−α(a) (indeed it's not hard to show that the operator norm equals Δα); and (b) and (c) are the axioms noted above. Property (e) almost says the integral is linear as a function of α — do you see why “almost”?

Recall the “integration by parts” identity: fg is an integral of f dg + g df. The Stieltjes integral is a way of making sense of this identity even when f and/or g is not continuously differentiable. To be sure, some hypotheses on f and g must still be made for the Stieltjes integral of f dg to make sense. Rudin specifies one suitable system of such hypotheses in Theorem 6.22.

Here's a version of Riemann-Stieltjes integrals that works cleanly for integrating bounded functions from [a,b] to any complete normed vector space.
corrected 27.ii.11 to fix a minor typo (“reasonable hypotheses”, not “reasonable hypothesis”).

Riemann-Stieltjes integration by parts: Suppose both f and g are increasing functions on [a,b]. For any partition a = x_0 < … < x_n = b of the interval, write f(b)g(b) − f(a)g(a) as the telescoping sum of f(x_i)g(x_i) − f(x_{i−1})g(x_{i−1}) from i = 1 to n. Now rewrite the i-th summand as

f(x_i) (g(x_i) − g(x_{i−1})) + g(x_{i−1}) (f(x_i) − f(x_{i−1})).

[Naturally it is no accident that this identity resembles the one used in the familiar proof of the formula for the derivative of fg!] Summing this over i yields the upper Riemann-Stieltjes sum for the integral of f dg plus the lower R.-S. sum for the integral of g df. Therefore: if one of these integrals exists, so does the other, and their sum is f(b)g(b) − f(a)g(a). [Cf. Rudin, page 141, Exercise 17.]
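As a quick sanity check of the telescoping identity (the increasing functions e^x and x^3 and the partition below are arbitrary choices), the two Riemann-Stieltjes sums add up exactly to f(b)g(b) − f(a)g(a) for any partition:

```python
# For increasing f and g, (upper R.-S. sum for f dg) + (lower R.-S. sum
# for g df) over any partition equals f(b)g(b) - f(a)g(a) exactly.
import math

f = math.exp                 # increasing on [0,1]
g = lambda x: x ** 3         # increasing on [0,1]
a, b, n = 0.0, 1.0, 7
xs = [a + (b - a) * i / n for i in range(n + 1)]

total = sum(f(xs[i]) * (g(xs[i]) - g(xs[i - 1]))        # sup of f is f(x_i)
            + g(xs[i - 1]) * (f(xs[i]) - f(xs[i - 1]))  # inf of g is g(x_{i-1})
            for i in range(1, n + 1))
print(total, f(b) * g(b) - f(a) * g(a))                 # equal up to rounding
```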
Most of Chapter 7 of Rudin we've covered already in the topology lectures and problem sets. For more counterexamples along the lines of the first section of that chapter, see Counterexamples in Analysis by B. R. Gelbaum and J. M. H. Olmsted — there are two copies in the Science Center library (QA300.G4). Concerning Thm. 7.16, be warned that it can easily fail for “improper integrals” on infinite intervals (see Rudin, p.138, Exercise 8, assigned on PS2). It is often very useful to bring a limit or an infinite sum within an integral sign, but this procedure requires justification beyond Thm. 7.16.

We'll cover most of the new parts of Chapter 7.

We'll then outline the Stone-Weierstrass theorem, which is the one major result of Chapter 7 we haven't seen yet. We then proceed to power series and the exponential and logarithmic functions in Chapter 8. We omit most of the discussion of Fourier series (185–192), an important topic (which used to be what I concluded Math 55b with), but one that alas cannot be accommodated given the mandates of the curricular review. We'll encounter a significant special case in the guise of Laurent expansions of an analytic function on a disc. See these notes (part 1, part 2) from 2002-3 on Hilbert space for a fundamental context for Fourier series and much else (notably much of quantum mechanics), which is also what we'll use to give one proof of Müntz's theorem on uniform approximation by arbitrary powers.

We also postpone discussion of Euler's Beta and Gamma integrals (also in Chapter 8) so that we can use multivariate integration to give a more direct proof of the formula relating them.

The result concerning the convergence of alternating series is stated and proved on pages 70-71 of Rudin (Theorem 3.42).

The original Weierstrass approximation theorem (7.26 in Rudin) can be reduced to the uniform approximation of the single function |x| on [−1,1]. From this function we can construct an arbitrary piecewise linear continuous function, and such piecewise linear functions uniformly approximate any continuous function on a closed interval. To get at |x|, we'll rewrite it as [1−(1−x^2)]^{1/2}, and use the power series for (1−X)^{1/2}. This power series (and more generally the power series for (1−X)^A) is the first part of Exercise 22 for Chapter 8, on p.201; we've already outlined another approach in PS5, Problem 3 (under the assumption of the standard formula for differentiating x^r with respect to x, which as we note there is not too hard for r rational). We need (1−x)^{1/2} to be approximated by its power series uniformly on the closed interval [−1,1] (or at least [0,1]); but fortunately this too follows from the proof of Abel's theorem (8.2, pages 174-5). Actually this is a subtler result than we need, since the X^n coefficient of the power series for (1−X)^{1/2} is negative for every n > 0. If a power series f(X) has radius of convergence 1 and all but finitely many of its nonzero coefficients have the same sign, then it is easily shown that the sum of the coefficients converges if and only if f(X) has a finite limit as X approaches 1, in which case the sum equals that limit and the power series converges uniformly on [0,1]. That's all we need because clearly (1−X)^{1/2} extends to a continuous function on [0,1]. (For an alternative approach to uniformly approximating |x|, see exercise 23 on p.169.)
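Here is a small numerical illustration (the truncation points and grid are arbitrary choices) of how the partial sums of the binomial series for (1−X)^{1/2}, evaluated at X = 1−x^2, approximate |x| on [−1,1]; the convergence is slow near x = 0, as one expects, since it is governed by the behavior at X = 1:

```python
# Partial sums of the binomial series for (1-X)^(1/2), evaluated at
# X = 1 - x^2, approximate |x| on [-1,1].  The coefficients satisfy the
# recursion c_k = c_{k-1} * (k - 3/2) / k with c_0 = 1, coming from the
# binomial series; every c_k with k > 0 is negative, as noted above.
def coeffs(N):
    cs = [1.0]
    for k in range(1, N + 1):
        cs.append(cs[-1] * (k - 1.5) / k)
    return cs

def approx_abs(x, cs):
    X = 1 - x * x
    s = 0.0
    for c in reversed(cs):           # Horner evaluation of the polynomial in X
        s = s * X + c
    return s

for N in (10, 50, 250):
    cs = coeffs(N)
    grid = [i / 100.0 - 1 for i in range(201)]           # grid on [-1,1]
    err = max(abs(approx_abs(x, cs) - abs(x)) for x in grid)
    print(f"N = {N:4d}   max error on grid = {err:.4f}")  # shrinks with N
```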

Rudin's notion of an “algebra” of functions is almost a special case of what we called an “algebra over F” in 55a (with F = R or C as usual), except that Rudin does not require his algebras to have a unit (else he wouldn't have to impose the “vanish on no point” condition). The notion can be usefully abstracted to a “normed algebra over F”, which is an algebra together with a vector space norm ||·|| satisfying ||xy|| ≤ ||x|| ||y|| for all x and y in the algebra. Among other things this leads to the Stone-Čech theorem.

In the first theorem of Chapter 8, Rudin obtains the termwise differentiability of a power series at any |x| < R by applying Theorem 7.17. That's nice, but we'll want to use the same result in other contexts, notably over C, where the mean value theorem does not apply. So we'll instead give an argument that works in any complete field with an absolute value — this includes R, C, and other examples such as the field Q_p of p-adic numbers. If the sum of c_n x^n converges for some nonzero x with |x| = R, then any x satisfying |x| < R has a neighborhood that is still contained in {y : |y| < R}. So if f(x) is the sum of that series, then for y ≠ x in that neighborhood we may form the usual quotient (f(x) − f(y)) / (x − y) and expand it termwise, then let y → x and recover the expected power series for f ′(x) using the Weierstrass M test (Theorem 7.10).

An alternative derivation of formula (26) on p.179: differentiate the power series (25) termwise (now that we know it works also over C) to show E(z) = dE(z)/dz; then for any fixed w the difference E(w+z) − E(w) E(z) is an analytic function of z that vanishes at z = 0 and is thus zero everywhere.

Error in Rudin: the argument on p.180 that “Since E is strictly increasing and differentiable on [the real numbers], it has an inverse function L which is also strictly increasing and differentiable …” is not quite correct: consider the strictly increasing and differentiable function taking x to x^3. What's the correct statement? (Hint: the Chain Rule tells you what the derivative of the inverse function must be.)

An explicit upper bound of 4 on π can be obtained by calculating cos(2) < 1 − 2^2/2! + 2^4/4! = −1/3 < 0. For much better estimates, integrate (x−x^2)^4 dx/(x^2+1) from 0 to 1 and note that ½ ≤ 1/(x^2+1) ≤ 1. :-)   [That was noted in my recent Math Table talk, but attendance was depressed by (I'm told) midterms season.]
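A quick numerical check of the sharper estimate (the midpoint rule and step count below are arbitrary):

```python
# The integral of (x - x^2)^4 / (1 + x^2) over [0,1] equals 22/7 - pi;
# since 1/2 <= 1/(1 + x^2) <= 1 and the integral of (x - x^2)^4 is 1/630,
# we get 22/7 - 1/630 <= pi <= 22/7 - 1/1260.
import math

def h(x):
    return (x - x * x) ** 4 / (1 + x * x)

n = 100000
approx = sum(h((i + 0.5) / n) for i in range(n)) / n   # midpoint rule
print(approx, 22 / 7 - math.pi)                        # both ~ 1.2645e-3
print(22 / 7 - 1 / 630, math.pi, 22 / 7 - 1 / 1260)    # pi between the bounds
```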

Rudin uses a fact about convex functions that is only presented as an exercise earlier in the book (p.100, #23). Namely: let f be a convex function on some interval I, and consider the slope s(x, y) := (f(x) − f(y)) / (x − y) as a function on the set of (x, y) in I × I with x > y; then s is an increasing function of both variables. The proof is fortunately not hard. For instance, to prove that if x > y′ > y then s(x, y′) > s(x, y), write y′ as px + qy with p + q = 1 and p, q > 0, and calculate that s(x, y′) > s(x, y) is equivalent to the usual convexity condition. The case x > x′ > y works in exactly the same way.
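Spelling out that calculation (a sketch, in the notation above):

```latex
% With $y' = px + qy$, $p + q = 1$, $p, q > 0$, we have $x - y' = q(x - y) > 0$, so
\[
s(x,y') > s(x,y)
\iff \frac{f(x)-f(y')}{q\,(x-y)} > \frac{f(x)-f(y)}{x-y}
\iff f(x)-f(y') > q\bigl(f(x)-f(y)\bigr)
\iff f(y') < p\,f(x) + q\,f(y),
\]
% which is exactly the convexity inequality at the intermediate point $y' = px + qy$.
```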

While we alas suppress most of Fourier analysis, we can easily prove the following: if f : R/2πZ → C is a continuous function whose Fourier coefficients a_n := (2π)^{−1} ∫_{R/2πZ} exp(−inx) f(x) dx satisfy Σ_n |a_n| < ∞, then f equals its Fourier series. Proof: the difference is a continuous function all of whose Fourier coefficients vanish. By Stone-Weierstrass, it can be uniformly approximated by trigonometric polynomials, etc. (This special case of Stone-Weierstrass is also a theorem of Fejér, who obtained an explicit sequence of trigonometric polynomials converging to the function; see Körner.) A nice example is f(x) = B_k(x/2π) for each k ≥ 2, where B_k is the k-th Bernoulli polynomial; this yields the values of ζ(k) for k = 2, 4, 6, …, and much else.
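For the k = 2 case, the expansion in question is the classical identity Σ_{n≥1} cos(2πnt)/n² = π² B₂(t) for t in [0,1] (stated here in the normalization on [0,1]; at t = 0 it gives ζ(2) = π²/6), which can be checked numerically:

```python
# Check sum_{n>=1} cos(2*pi*n*t)/n^2 = pi^2 * B2(t) on [0,1], where
# B2(t) = t^2 - t + 1/6; at t = 0 this is zeta(2) = pi^2/6.
import math

def B2(t):
    return t * t - t + 1.0 / 6.0

N = 100000                            # truncation point (arbitrary)
for t in (0.0, 0.25, 0.7):
    s = sum(math.cos(2 * math.pi * n * t) / n ** 2 for n in range(1, N + 1))
    print(t, s, math.pi ** 2 * B2(t))  # the two columns agree closely
```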

The rather obscure integration by parts in Rudin, p.194, is not necessary. A straightforward choice of “parts” yields

x B(x, y+1) = y B(x+1, y) ;

This may seem to go in a useless direction, but the elementary observation that

B(x, y) = B(x, y+1) + B(x+1, y)

recovers the recursion (97).
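In detail, the two displayed identities combine as follows (a short worked calculation):

```latex
% From x B(x,y+1) = y B(x+1,y) we get B(x,y+1) = (y/x) B(x+1,y); hence
\[
B(x,y) = B(x,y+1) + B(x+1,y)
       = \Bigl(\frac{y}{x} + 1\Bigr) B(x+1,y)
       = \frac{x+y}{x}\, B(x+1,y),
\qquad\text{i.e.}\qquad
B(x+1,y) = \frac{x}{x+y}\, B(x,y),
\]
% which is the recursion (97).
```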

In addition to the trigonometric definite integrals noted by Rudin (formula 98), Beta functions also turn up in the evaluation of the definite integral of u^{a−1} du / (1+u^b)^c over (0,∞): let t = u^b / (1+u^b). What is the value of that integral? Can you obtain in particular the formula π / (b sin(aπ/b)) for the special case c=1?
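For the special case c = 1 the asserted formula can at least be checked numerically (the values of a and b below are arbitrary subject to 0 < a < b, and scipy's quad is used only for convenience):

```python
# Numerical check of: integral over (0, inf) of u^(a-1)/(1 + u^b) du
# equals pi / (b * sin(a*pi/b)), for 0 < a < b.
import math
from scipy.integrate import quad

a, b = 1.3, 2.7
val, _ = quad(lambda u: u ** (a - 1) / (1 + u ** b), 0, math.inf)
print(val, math.pi / (b * math.sin(a * math.pi / b)))   # agree
```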

The limit formula for Γ(x) readily yields the product formula:

Γ(x) = x^{−1} e^{−Cx} ∏_{k=1}^{∞} [e^{x/k} / (1 + x/k)],
where C = 0.57721566490… is Euler's constant, the limit as N→∞ of 1 + (1/2) + (1/3) + … + (1/N) − log(N). This lets us easily show that Γ is infinitely differentiable (in fact analytic) and obtain nice formulas for the derivatives of log(Γ(x)); for instance, Γ′(1) = −C, and more generally the logarithmic derivative of Γ(x) at x = N+1 is 1 + (1/2) + (1/3) + … + (1/N) − C.
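A quick numerical illustration of both statements (the truncation points below are arbitrary):

```python
# Euler's constant C as the limit of the harmonic sum minus log N, and a
# finite-difference check (via math.lgamma, i.e. log Gamma) that the
# logarithmic derivative of Gamma at N+1 equals 1 + 1/2 + ... + 1/N - C.
import math

N = 10 ** 6
C = sum(1.0 / k for k in range(1, N + 1)) - math.log(N)
print("C ~", C)                                  # ~0.5772156649

n, h = 5, 1e-6
Hn = sum(1.0 / k for k in range(1, n + 1))
dlg = (math.lgamma(n + 1 + h) - math.lgamma(n + 1 - h)) / (2 * h)
print(dlg, Hn - C)                               # agree to several places
```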

We next begin multivariate differential calculus, starting at the middle of Rudin Chapter 9 (since the first part of that chapter is for us a review of linear algebra — but you might want to read through the material on norms of linear maps and related topics in pages 208–9). Again, Rudin works with functions from open subsets of R^n to R^m, but most of the discussion works equally well with the target space R^m replaced by an arbitrary normed vector space V. If we want to allow arbitrary normed vector spaces for the domain of f, we'll usually have to require that the derivative f ′ be a continuous linear map, or equivalently that its norm ||f ′|| = sup_{|v|≤1} |f ′(v)| be finite.

As in the univariate case, proving the Mean Value Theorem in the multivariate context (Theorem 9.19) requires either that V have an inner-product norm, or the use of the Hahn-Banach theorem to construct suitable functionals on V. Once this is done, the key Theorem 9.21 can also be proved for functions to V, and without first doing the case m=1. To do this, first prove the result in the special case when each D_j f(x) vanishes; then reduce to this case by subtracting from f the linear map from R^n to V indicated by the partial derivatives D_j f(x).

The Inverse function theorem (9.24) is a special case of the Implicit function theorem (9.28), and its proof amounts to specializing the proof of the implicit function theorem. But Rudin proves the Implicit theorem as a special case of the Inverse theorem, so we have to do Inverse first. (NB for these two theorems we will assume that our target space is finite-dimensional; how far can you generalize to infinite-dimensional spaces?) Note that Rudin's statement of the contraction principle (Theorem 9.23 on p.220; cf. the final problem of problem set 5) is missing the crucial hypothesis that X be nonempty! The end of the proof of 9.24 could be simplified if Rudin allowed himself the full use of the hypothesis that f is continuously differentiable on E, not just at a: differentiability of the inverse function g at b = f(a) is easy given Rudin's construction of g; differentiability at any other point f(x) follows, since x might as well be a, and then the derivative is continuous because g and f ' are.
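The contraction at the heart of the proof can be watched in a toy one-dimensional example (a minimal sketch; the function f, the point a, and the target y below are arbitrary choices, and the iteration x ↦ x + f ′(a)^{−1}(y − f(x)) is the one from the standard proof):

```python
# To solve f(x) = y for x near a, iterate phi(x) = x + Ainv*(y - f(x))
# with Ainv = 1/f'(a); phi is a contraction on a small ball around a
# (since phi'(x) = 1 - Ainv*f'(x) is small there), and its fixed point
# is the value g(y) of the local inverse.
import math

f = lambda x: x + 0.3 * math.sin(x)   # f'(0) = 1.3, nonzero
a, Ainv = 0.0, 1 / 1.3                # Ainv = 1/f'(a)
y = 0.5                               # a target value near f(a) = 0

x = a
for _ in range(30):
    x += Ainv * (y - f(x))            # the contraction iteration
print(x, f(x))                        # f(x) ~ y
```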

The proof of the second part of the implicit function theorem, which asserts that the implicit function g not only exists but is also continuously differentiable with derivative at b given by formula (58) (p.225), can be done more easily using the chain rule, since g has been constructed as the composition of the following three functions: first, send y to (0, y); then, apply the inverse function F^{−1}; finally, project the resulting vector (x, y) to x. The first and last of these three functions are linear, so certainly C^1; and the continuous differentiability of F^{−1} comes from the inverse function theorem.

Here's an approach to D_{ij} = D_{ji} that works for a C^2 function to an arbitrary normed space. As in Rudin (see p.235) we reduce to the case of a function of two variables, and define u and Δ. Assume first that D_{21} f vanishes at (a,b). Then use the Fundamental Theorem of Calculus to write Δ(f,Q) as the integral of u′(t) dt on [a, a+h], and then write u′(t) as an integral of D_{21} f(t,s) ds on [b, b+k]. Conclude that u′(t) = o(k) and thus that Δ(f,Q) / hk approaches zero. Now apply this to the function f − xy D_{21} f(a,b) to see that in general Δ(f,Q) / hk approaches D_{21} f(a,b). Do the same in reverse order to conclude that D_{21} f(a,b) = D_{12} f(a,b). Can you prove D_{12}(f) = D_{21}(f) for a function f to an arbitrary inner product space under the hypotheses of Theorem 9.41?

We omit the “rank theorem” (whose lesser importance is noted by Rudin himself), as well as the section on determinants (which we treated at much greater length in Math 55a).

An important application of iterated partial derivatives is the Taylor expansion of an m-times differentiable function of several variables; see Exercise 30 (Rudin, 243-244). As promised at the start of Math 55a and/or Math 55b, this also applies to maxima and minima of real-valued functions f of several variables, as follows. If f is differentiable at a local maximum or minimum then its derivative there vanishes, as was the case for a function of one variable. Again we say that a zero of the derivative is a “critical point” of f. Suppose now that f is C^2 near a critical point. The second derivative can be regarded as a quadratic form. It must be positive semidefinite at a local minimum, and negative semidefinite at a local maximum. Conversely, if it is strictly positive (negative) definite at a critical point then that point is a strict local minimum (maximum) of f. Compare with Rudin's exercise 31 on page 244 (which however assumes that f is C^3 — I don't know why Rudin requires third partial derivatives).
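A minimal illustration of this second-derivative test (the function below is an arbitrary example):

```python
# f(x,y) = x^2 + x*y + 2*y^2 has a critical point at the origin, where the
# second derivative is the constant quadratic form with matrix of second
# partials [[2, 1], [1, 4]].  Its eigenvalues are positive, so the form is
# positive definite and the origin is a strict local minimum.
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 4.0]])            # Hessian D_ij f at (0, 0)
print(np.linalg.eigvalsh(H))          # [3 - sqrt(2), 3 + sqrt(2)], both > 0
```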

Next topic, and the last one from Rudin, is multivariate integral calculus (Chapter 10). Most of the chapter is concerned with setting up a higher-dimensional generalization of the Fundamental Theorem of Calculus that comprises the divergence, Stokes, and Green theorems and much else besides. With varying degrees of regret we'll omit this material, as well as the Lebesgue theory of Chapter 11. We will, however, get some sense of multivariate calculus by giving a definition of integrals over R^n and proving the formula for change of variables (Theorem 10.9). This will already hint at why in general an integral over an n-dimensional space is often best viewed as an integral not of a function but of a “differential n-form”. For instance, in two dimensions an integral of f(x, y) dx dy can be thought of as an integral of f(x, y) dx ∧ dy, and then we recover the formula involving the Jacobian from the rules of exterior algebra. You'll have to read the rest of this chapter of Rudin, and/or take a course on differential geometry or “calculus on manifolds”, to see these ideas developed more fully.
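The polar-coordinates case is a handy illustration of how the Jacobian enters (a minimal sketch using sympy):

```python
# The Jacobian determinant of (r, theta) |-> (r cos theta, r sin theta)
# is r, matching dx ^ dy = r dr ^ dtheta from the rules of exterior algebra.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = r * sp.cos(th), r * sp.sin(th)
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])
print(sp.simplify(J.det()))           # r
```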

Once we have the change of variables formula, we'll return to the section of Chapter 8 concerning Euler's Beta and Gamma integrals and give a more natural treatment of the formula relating them (Theorem 8.20).

Math 55b concludes with an introduction to complex analysis (a.k.a. “functions of one complex variable”). We'll start with contour integrals and the fundamental theorems of Cauchy, roughly following the exposition in Ahlfors, chapter III (p.82 ff.).

A key concept in the theory and application of complex analysis is the residue of a function — more properly, a differential — on a punctured neighborhood of some complex number.
Problem sets 1 and 2: Metric topology basics
Problem 1 postponed to Problem Set 2 because we probably won't define “dense” Wednesday. You can already look at Problems 6 and 7, which don't require anything beyond Math 55a plus the notion of a metric, but you need not hand either of them in with PS1.
About Problem 1 and “dense”: a subset E of a topological space X is said to be dense if its closure is X, i.e. if X is the only closed set containing E. For a subspace Y of X we say E is dense in Y if its closure contains Y.
NB About Problem 5: here, and in general, R is assumed to have its usual metric d(x,y) = |x−y|, unless (as in Problem 3) a different metric is specified.
Problem 12 corrected 27.i.11: the metric space that must consist only of closed bounded nonempty subsets of X is not X itself…
For Problem 6: the isometries are all of the form i(x) = Ax + b where b is any vector and A is a linear transformation that's orthogonal for 6i and a signed permutation matrix for 6ii. It's easy to see that any such map works (and indeed translation by b is an isometry of any normed vector space), so the hard part is showing that there are no others. Using translation, it's enough to prove this for isometries taking 0 to 0. The basic strategy in such a problem is to reconstruct the vector-space structure of V from the metric.
For an inner-product norm (Problem 6i) it's easy because collinear points can be recognized from having equality in the triangle inequality; geometrically they form degenerate triangles. For example, for any vector x and scalar r we can characterize rx by the conditions d(0, rx) = |r| d(0, x) and d(x, rx) = |r−1| d(0, x). Hence any isometry i that takes 0 to 0 must take rx to r i(x). Likewise the midpoint (x+y)/2 is characterized by its distances from x and y, so i must respect averages, and thus also respects vector sums because we've already seen it takes 2x to 2 i(x). In other words, i is a linear transformation. We've already seen that a linear isometry is an orthogonal transformation, which lets us finish this solution of 6i.
[Note however that if V is a complex inner product space then its isometry group contains maps i(x) = Ax + b where A is linear over R but not over C.]
For the sup norm (Problem 6ii) it's trickier because there are many more degenerate triangles. Geometrically, “spheres” of radius r centered at points at distance 2r can be “tangent” at many points. But we can turn this to our advantage by exploiting the structure of the sets, call them T(p,q), where the spheres of radius d(p,q)/2 intersect, i.e. the sets of points at distance d(p,q)/2 from both p and q. Here's one way of doing this: given p and r there's a set, call it C_r(p), of 2^n points q such that d(p,q) = r and T(p,q) consists of a single point (namely those for which each coordinate of q−p is ±r). Thus an isometry i that fixes p must permute those 2^n points. Now there are 2n points p′ at distance 2r from p such that C_r(p′) intersects C_r(p) in 2^{n−1} points (namely those for which p′−p is ±2r times a coordinate unit vector). So i must permute those 2n points p′ as well for each choice of p and r. It soon follows that if i(0)=0 then i permutes the coordinate axes, and thus (using the result of Problem 5) acts on them by some signed permutation matrix A; with a bit of mathematical induction we then show that i acts on all of R^n by the same A.
[There are other ways to use the structure of the sets T(p,q) to reach this conclusion.]

Problem sets 3 and 4: Metric topology cont'd
Yes, in problem 10i all vector spaces are over F=R or C. (We already know from last term it's not true over Q…)
Problem 8 is the Lebesgue Covering Lemma (handout VI already let this cat out of the bag); r is a “Lebesgue number” for the open cover. Problem 10(ii) is A-5 on the 1999 Putnam exam (which specified n = 1999 — hah). Problem 13 is Urysohn's Lemma.

Problem set 5: Topology grand finale
Problem 3 is the proof Rudin gives in Chapter 8, pages 184-185. The proof we outlined last term is Exercises 26 and 27 on page 202.
Problem 8 is the Arzelà-Ascoli theorem. (It turns out to also be largely given in Rudin Chapter 7, see 7.22 ff.) The Wikipedia page gives some typical applications to real and complex analysis.

Problem set 6: Univariate differential calculus
In problem 2 the relevant vector spaces are all finite-dimensional, so all norms are equivalent and differentiability etc. is well defined. Also, f∘g is the function taking x to the composition of the linear operators f(x): V→W and g(x): U→V, not the composition of functions as in the Chain Rule.
Belated correction 16.iii.11: in problem 2(ii), the map t ↦ f(t)^{−1} is indeed defined and differentiable in some neighborhood of x, but not necessarily on all of [a, b]. (And its derivative at x is given by the formula −f(x)^{−1} f ′(x) f(x)^{−1}, which nicely extends the familiar −f(x)^{−2} f ′(x) for a nonzero scalar-valued function in a way consistent with the behavior of the derivative and inverse of the matrix transpose.)

Problem set 7: Univariate integral calculus
Corrected 6.iii.11: yes, this problem set is #7, not #2…
The nice solution to Problem 6 is to use a Riemann sum associated to a partition by a geometric series, rather than the usual arithmetic series.
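Here is that idea carried out numerically for the integral of x^s over [a, b] (an arbitrary example in the same spirit; Problem 6's integrand may differ):

```python
# Fermat's geometric-partition trick: take x_i = a*q^i with q = (b/a)^(1/n),
# so the left Riemann sum for the integral of x^s is a geometric series with
# exact value (b^(s+1) - a^(s+1)) * (q - 1) / (q^(s+1) - 1), which tends to
# (b^(s+1) - a^(s+1)) / (s + 1) as n grows (q -> 1).
a, b, s = 1.0, 2.0, 3.0
for n in (10, 100, 1000):
    q = (b / a) ** (1.0 / n)
    xs = [a * q ** i for i in range(n + 1)]
    left_sum = sum(xs[i] ** s * (xs[i + 1] - xs[i]) for i in range(n))
    print(n, left_sum)                # -> (2**4 - 1**4) / 4 = 3.75
```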

Problem set 8: Univariate integral calculus, cont'd
Corrected 23.iii.11: In problem 1, the series for E(x) starts at n=0, not n=1. (To be sure if you did the n=1 version then the n=0 answer is immediate…)
Belated correction 25.iii.11: In problem 2, x0 is in the ground field k, not the “function field” K.

Problem set 9: Multivariate differential calculus
Tweaked 26.iii.11: added parentheses in Problem 7 around the “n+1” in “(n+1)-dimensional” (and yes it's a “subspace” in the topological sense but not the vector-space sense…);
Corrected 28.iii.11 to fix a typo (missing “a” in “[Cauchy-Riemann] equations”), to state explicitly in Problem 7 that the root t0 is real, and to change the eigenvector of M from v0 to v(M) in Problem 8.

Problem set 10: Multivariate integral calculus, etc.
[Not to be confused with this version of PS10; note the due dates. But if you find a complex number s of real part <1 such that Σ_n μ(n) n^{−s} converges, please let me know!]
Corrected 7.iv.11 to fix a typo in the generalization of problem 7: the sum of the x_i^{s_i} should be ≤1, not =1.

Problem set 11: Convexity, multivariate change of variables, etc.
Corrected 10.iv.11: the two Rudin problems from p.290 were already assigned last week :-( so omitted.
Corrected 14.iv.11: in the last problem, the shell has |x| in (r1, r2), not (a, b).
and belatedly Corrected 25.iv.11: the numerator on the right-hand side of the displayed equation (in Problem 5) is Γ(ν+1), not Γ(v+1) [with “nu”, not “vee”].
and very belatedly :-( Corrected 7.v.11: in Problem 5 again, it's ν that must exceed |λ|, not just |ν|. (If ν is sufficiently negative the integral doesn't even converge.) And yes, the hypothesis on ν is needlessly strong (it's a relic of an earlier incorrect formula for the integral) but if you did it already for ν > |λ| then you needn't re-do the proof for a wider range of ν.

Problem set 12: Complex analysis I
Belatedly Corrected 22.iv.11: meromorphic functions, plural, in Problem 7; and |1−x^k|, not (1−x^k), in each factor of the numerator displayed in Problem 9

13th and last problem set: Complex analysis II: residues and contour integration
Belatedly Corrected 29.iv.11, but only to fix a transparent grammar error in the preamble