Ceci n’est pas un Math 55a syllabus
[No, you don’t have to know French to take Math 55a.
Googling ceci+n'est suffices to turn up
the explanation, such as it is.]
The CAs for Math 55a are
Vikram Sundar (vikramsundar@college) and
Rohil Prasad (prasad01@college)
[if writing from outside the Harvard network, append
.harvard.edu to ...@college].
CA office hours are Monday 8-10 PM in the
Leverett Dining Hall, starting September 4 (same place and time that
Math Night will start the week following).
Thanks to Vikram for setting up this
Dropbox link for the CAs’ notes from class.
Section times:
Vikram Sundar: Monday 1-2 PM; Science Center room 112 on Sep.11,
and room 222 from Sep.18 on.
Rohil Prasad: Thursday 4-5 PM, Science Center room 411
If you are coming to class but not
officially registered for Math 55 (e.g. you are auditing,
or still undecided between 25a and 55a but officially signed up for 25a),
send me your e-mail address
so that the CAs and I can include you in class announcements.
My office hours for the week of 18-22 September
will be Wednesday (Sep.20), not the usual Tuesday.
(Still 7:30 to 9:00 PM in the Lowell House Dining Hall.)
Here
is some more information from last year on the number 5777 etc.
(converted to MathJax and with the added remark on $5779 = L_{27}/L_9$);
as noted in the Sep.20 lecture, the fact that the palindrome 5775 factors
so smoothly ($3 \cdot 5^2 \cdot 7 \cdot 11$)
is also due in part to the fact that $5776 = 76^2$. Shanah Tovah!
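The arithmetic in this remark is easy to confirm by machine. Here is a minimal sketch; the helper name `lucas` is hypothetical, and the indexing $L_0 = 2$, $L_1 = 1$ is the usual convention for Lucas numbers, assumed here.

```python
def lucas(n):
    """Return the n-th Lucas number, assuming L_0 = 2 and L_1 = 1."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# 5775 is one less than a square, so it splits as (76 - 1)(76 + 1).
assert 76 ** 2 == 5776
assert 5775 == 75 * 77 == 3 * 5 ** 2 * 7 * 11

# The added remark: 5779 is a quotient of two Lucas numbers.
assert lucas(9) == 76
assert lucas(27) == 5779 * lucas(9)
```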
The diagnostic quiz will be given
Wednesday, September 27 in class (11:07 AM to 12:00 noon).
It will cover only material from the first three problem sets.
August 30: “Math blackboard”
($\rm\TeX$’s \mathbb font), such as $\mathbb R$,
is a printed representation of a handwritten representation of
ordinary boldface such as $\bf R$. When using $\rm\TeX$
(or $\rm\LaTeX$ etc.), you might as well use normal boldface.
Either $\mathbf R$ or $\mathbb R$ means the set of real numbers,
whether considered as a field,
abelian group, metric space (more on this in Math 55b), or
whatever other structure is relevant. Likewise
$\mathbf C$ = $\mathbb C$ = the set of complex numbers;
$\mathbf Q$ = $\mathbb Q$ = the set of rational numbers
(quotients of integers — since the initial letter of
“rational(s)” is preempted by the use of $\bf R$ for the reals);
$\mathbf Z$ = $\mathbb Z$ = the set of integers
(from German Zahlen); and in Axler,
$\mathbf F$ = $\mathbb F$ = the field
$\bf R$ or $\bf C$.
At least in the beginning of the linear algebra
unit, we’ll be following the Axler textbook closely enough that
supplementary lecture notes should not be needed. Some important
extensions/modifications to the treatment in Axler:
[see Axler, page 5]
Pace the boxed note on that page,
virtually all mathematicians say and write
“$n$-tuple”
(more fully, “ordered $n$-tuple”),
while I cannot recall another instance of “list” used
for this as Axler does.
(One sometimes sees “tuple” for an
$n$-tuple of unspecified length $n$,
and “ordered pair” and perhaps “ordered triple”,
“ordered quadruple”,
etc. for $n = 2, 3, 4, \ldots$ .)
[cf. Axler, Notation 1.6 on page 4, and the
“Digression on Fields” on page 10]
Unless noted otherwise, $\bf F$ may be an arbitrary
field, not only $\bf R$ or $\bf C$. The
most important fields other than those of real and complex numbers
are the field $\bf Q$ of rational numbers, and the
finite fields ${\bf Z} / p {\bf Z}$ ($p$ prime).
Other examples are: the field ${\bf Q}(i)$ of complex numbers with rational
real and imaginary parts; more generally,
${\bf Q}(d^{1/2})$ for any non-square rational number $d$;
the “$p$-adic numbers” ${\bf Q}_p$ ($p$ prime),
of which we’ll say more when we study topology next term;
and more exotic finite fields such as the 9-element field
$({\bf Z}/3{\bf Z})(i)$.
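For the 9-element example one can check the field property by brute force. The sketch below stores $a + bi$ as the pair `(a, b)` with coordinates mod 3; the names `P`, `mul`, `elements`, `inverses` are illustrative choices, not notation from the course.

```python
from itertools import product

P = 3  # coordinates live in Z/3Z; i denotes a square root of -1

def mul(x, y):
    """Multiply a + b*i and c + d*i, reducing both coordinates mod P."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % P, (a * d + b * c) % P)

elements = list(product(range(P), repeat=2))   # the pair (a, b) stands for a + b*i
nonzero = [x for x in elements if x != (0, 0)]

# For each nonzero x, list every y with x*y = 1.
inverses = {x: [y for y in nonzero if mul(x, y) == (1, 0)] for x in nonzero}
```

Each of the 8 nonzero elements turns up with exactly one inverse. The same construction with 5 in place of 3 does not give a field, because $-1$ is already a square mod 5: there $(1+2i)(1+3i) = 0$.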
Here’s a review of the axioms for
fields, vector spaces, and related mathematical structures.
[cf. Axler, p.28 ff.] We define the span of an arbitrary subset
$S$ of (or tuple in) a vector space $V$ as follows:
it is the set of all (finite) linear combinations
$a_1 v_1 + \cdots + a_n v_n$ with each $v_i$ in $S$ and each $a_i$ in $F\!$.
This is still the smallest vector subspace of $V$ containing $S$.
In particular, if $S$ is empty, its span is by definition $\{0\}$.
We do not require that $S$ be finite.
Warning: in general the space $F[X]$ (a.k.a. ${\cal P}(F)$)
of polynomials in $X$, and its subspaces ${\cal P}_n(F)$ of polynomials
of degree at most $n$, might not be naturally identified with
a subspace of the space $F^F$ of functions from $F$ to itself.
The problem is that two different polynomials may yield the same function.
For example, if $F$ is the field of $2$ elements then the polynomial $X^2-X$
gives rise to the zero function. In general, different polynomials
can represent the same function from the field $F$ to itself if and only if
$F$ is finite — do you see why?
(See also Exercise 11 in Axler 1.C, assigned as part of the first
problem set)
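A quick sanity check of the warning, as a sketch: polynomials are coefficient lists (constant term first) and are tabulated on all of ${\bf Z}/p{\bf Z}$. The function names are hypothetical.

```python
def evaluate(coeffs, x, p):
    """Evaluate sum(coeffs[k] * X**k) at x in Z/pZ, using Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % p
    return value

def as_function(coeffs, p):
    """Table of values of the polynomial on 0, 1, ..., p-1."""
    return [evaluate(coeffs, x, p) for x in range(p)]

# X^2 - X over the 2-element field: a nonzero polynomial, yet the zero function.
x2_minus_x = [0, -1, 1]
table_p2 = as_function(x2_minus_x, 2)

# The same phenomenon for X^p - X over Z/pZ, here with p = 5.
xp_minus_x = [0, -1, 0, 0, 0, 1]
table_p5 = as_function(xp_minus_x, 5)
```

Both tables consist entirely of zeros, although neither coefficient list is zero.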
If $U_i$ are any subspaces of a vector space $V\!$, then so is their
intersection $\cap_i U_i$. Note that this is not limited to
finite intersections: $i$ could range over an “index set” $I$
of any cardinality (so we would write the intersection as
$\cap_{i \in I} U_i$).
We don’t usually want to intersect an empty family of sets
(do you see why not?), but for subsets of a given set $V$
we can declare that $\cap_{i\in\emptyset} U_i = V$.
For any field (or even any ring) $F$ there is a canonical
ring homomorphism, call it $h$, from $\bf Z$ to $F\!$.
“Ring homomorphism” means:
$h(0) = 0$, $h(1) = 1$, and for any integers $m,n$ we have
$h(m+n) = h(m) + h(n)$ and $h(mn) = h(m) \, h(n)$
(and $h(m-n) = h(m) - h(n)$, but this already follows from the
other properties, as indeed does $h(0)=0$).
But this doesn’t quite mean that we get an isomorphic copy of
$\bf Z$ in $F\!$, because $h$ might not be injective.
Equivalently, the kernel (that is, the preimage
$h^{-1}(\{0\}) = \{n : h(n) = 0\}$)
might be larger than just {0}. In general, the kernel, call it $I$, must be an
ideal, i.e. an additive subgroup of $\bf Z$ that is
closed under multiplication by arbitrary integers
(whether in $I$ or not — this mimics the definition of
a subspace, though as it happens for ideals in $\bf Z$
it’s automatic). Now every ideal in $\bf Z$ is either
the zero ideal {0} or $(n) := \{ cn \mid c \in {\bf Z}\}$ for some
integer $n > 0$ (namely the least positive element of the ideal),
called the (positive) generator of the ideal.
When $F$ is a ring, any $n$ may arise as the generator of $\ker(h)$,
most easily for the ring ${\bf Z} / n {\bf Z}$ of integers $\bmod n$.
But if $F$ is a field and $\ker h = (n)$ then $n$ must be either
zero or prime, lest $F$ have zero divisors (elements $a$ and $b$,
neither zero, for which $ab=0$). This $n$ is then called the
characteristic of the field $F\!$. The familiar fields
$\bf Q$, $\bf R$, $\bf C$ all have characteristic zero.
For any prime $p$, there are fields of characteristic $p$,
notably the “prime field” ${\bf Z} / p {\bf Z}$
(mentioned above; that it is a field is the key fact from elementary (but nontrivial)
number theory that any nonzero element of ${\bf Z} / p {\bf Z}$
has a multiplicative inverse!). This field ${\bf Z} / p {\bf Z}$
and other finite fields have important uses in number theory,
combinatorics, computer science, and elsewhere, often using the
linear algebra that we develop in Math 55a.
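The following sketch illustrates the three claims for the rings ${\bf Z}/n{\bf Z}$: the kernel of $h$ is generated by $n$, composite $n$ produces zero divisors, and prime $n$ gives inverses. The function names are hypothetical, and computing inverses as $a^{p-2}$ (Fermat's little theorem) is one standard route, not necessarily the proof used in class.

```python
def characteristic(n):
    """Least k > 0 with k*1 = 0 in Z/nZ, found by repeatedly adding 1."""
    total, k = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        k += 1
    return k

def zero_divisors(n):
    """Pairs (a, b), a <= b, of nonzero residues with a*b = 0 in Z/nZ."""
    return [(a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0]

def inverse_mod(a, p):
    """Inverse of a in Z/pZ, assuming p is prime and p does not divide a."""
    return pow(a, p - 2, p)
```

For instance `zero_divisors(6)` finds $2 \cdot 3$ and $3 \cdot 4$, while `zero_divisors(7)` is empty.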
[cf. the boxed note on page 42 of Axler] It is natural to wonder whether
every vector space, finite-dimensional or not, has a basis.
The polynomial ring $F[z]$, considered as a vector space over $F$
(and denoted by a fancy script $\mathcal P$ in Axler), does have a basis
(powers of $z$), as does a polynomial ring in
several variables, or even infinitely many (see the next item);
but does $F^\infty$? The answer is yes —
but only under the Axiom of Choice (equivalently,
Zorn’s Lemma)!
[I can write “But only under” because it is known that Choice/Zorn
is equivalent to the claim that every vector space has a basis.
Don’t spend too much time trying to find an explicit basis
for $F^\infty$, or for $\bf R$ as a vector space
over $\bf Q$ (a “Hamel basis”)…]
Using the same tool one can prove analogues of some other results in
Chapter 2, such as 2.33 (p.41: every linearly independent set
extends to a basis), and thus 2.34 (p.42: every subspace is a
direct summand; again, don’t spend too much time trying to do this
explicitly for $\bf Q$ as a subspace of the $\bf Q$-vector
space $\bf R$, or for $\oplus_{n\geq1} F$ as a subspace of $F^\infty$!).
NB some other results clearly fail in infinite dimensions, even when
we have an explicit basis; e.g. the even powers of $z$
form a linearly independent subset of $F[z]$ that has the same cardinality
as a basis but is not a basis.
However, 2.31 (p.40: every spanning list contains a basis)
still holds with no further axioms for spanning sets $S$
of arbitrary size, as long as $V$ is finite dimensional.
The reason is that $V$ has a finite spanning set, say $S_0$,
and every element of $S_0$ is a linear combination of
elements of $S$, and since linear combinations are
of necessity finite it takes only a finite subset of $S$
to span $S_0$ and thus $V$.
Now apply the proof of 2.31 to this finite subset.
We may call this generalization “2.31+”.
Here’s an extreme example of
how basic theorems about finite-dimensional vector spaces can become
utterly false for finitely-generated modules:
a module generated by just one element
can have a submodule that is not finitely generated.
Indeed, for any field $F$, let $A$ be the ring of polynomials
in infinitely many variables $X_j$.
[The letter $A$ is a common name for a ring, from French anneau,
cognate with English “annulus”.]
As usual we can regard $A$ as a module over itself, with a single
generator 1. Then a submodule is just an ideal of the ring.
Choose the ideal $I$ generated by all the $X_j$,
which consists of all polynomials with constant coefficient equal to 0.
Since there are infinitely many indices $j$, the ideal $I$ is not finitely generated;
indeed any generating set must be at least as large as the index set
of $j$’s,
so for every cardinal $\aleph$ we can make a ring $A$ with a
singly-generated module (namely $A$ itself) and with a submodule
that cannot be generated by fewer than $\aleph$ elements.
For a subtler example, consider the ring we might call
“$F[X^{1/2^\infty}]$”,
consisting of $F$-linear combinations of monomials
$X^{n/2^k}$ for arbitrary nonnegative integers $n$ and $k$.
Again let $I$ be the ideal generated by the nonconstant monomials,
which is not finitely generated, though there are generating sets
that are “only” countably infinite.
The new behavior involves the countable generating set
$\{ X^{1/2^k} \mid k \geq 0 \}$:
there is no minimal generating subset, because each $X^{1/2^k}$
is a multiple of $X^{1/2^{k'}}$ for any $k' \gt k$.
Likewise for the ring generated by all monomials
$X^r$ with $r$ any nonnegative rational number
(or even all $X^r$ with $r$ any nonnegative real number).
(When $A$ is Noetherian, submodules of finitely-generated modules
are finitely-generated, but might still require more generators;
for example, there are Noetherian rings $A$ with
“non-principal ideals” $I$, which give examples of
a 1-generator module with a submodule that requires
at least 2 generators.)
Please avoid Axler’s notation “product” and
“$V \times W$” (p.91, 3.71 ff.).
I understand the motivation for this notation: it is formally correct,
and avoids the need to distinguish between “external direct sum”
(the usual name for that vector space) and “internal direct sum”
(a vector space sum [within some larger vector space] that happens to be direct).
The problem with this is that in Math 55 (and ubiquitously in the literature)
we shall introduce before long a “tensor product”
$V \otimes W$ of vector spaces, whose dimension is the product of the
dimensions of $V$ and $W$ when those two dimensions are finite;
and it would be a much bigger source of confusion to have that
notation coexist with “$V \times W$”
where the dimensions add.
So please stick with “$V \oplus W$”
and the name “external direct sum” — or if you must,
“Cartesian product” to avoid confusion with tensor products.
For a possibly infinite Cartesian product, which is not
the same as a direct sum (because an element of the direct sum must have
only finitely many nonzero components), we still have the notation
$\prod_{i \in I} V_i$ to distinguish the Cartesian product from
the direct sum $\oplus_{i \in I} V_i$.
Apropos Axler 2.43 (page 47), a warning: the formula
$\dim(U+W) = \dim(U)+\dim(W) - \dim(U\cap W)$, and its analogy with the
inclusion-exclusion principle,
may lead you to expect a similar formula for $\dim(U_1+U_2+U_3)$
for any three subspaces $U_1,U_2,U_3$ of a vector space;
but that expected generalization is
(in)famously false in general
(and likewise for four or more subspaces)!
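A standard small counterexample is three distinct lines through the origin in a plane. The sketch below checks it over $\bf Q$ with exact arithmetic; `rank` is a hypothetical helper (plain Gaussian elimination), and each subspace is given by a list of spanning row vectors.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors over Q, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Three distinct lines in Q^2, each spanned by one vector.
U1, U2, U3 = [[1, 0]], [[0, 1]], [[1, 1]]

# Concatenating spanning lists gives a spanning list for the sum of subspaces.
dim_sum = rank(U1 + U2 + U3)
# Any two of these lines meet only in 0, so all intersection terms vanish
# and the naive inclusion-exclusion guess reduces to 1 + 1 + 1.
naive_guess = rank(U1) + rank(U2) + rank(U3)
```

Here `dim_sum` is 2 while `naive_guess` is 3, so the three-subspace analogue of the formula cannot hold in general.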
As with the notions of span and linear combination, the definition of
a linear transformation makes sense for modules over any ring $A$
(whether commutative or not), and in that generality is called an
$A$-module homomorphism (so you now know the
“morphisms” in the “category” of
$A$-modules); when $A$ is a skew field, we still
call this a linear transformation, and the “rank-nullity theorem”
(3.22, page 63) still holds for finite-dimensional vector spaces
in that context.
Suppose $T: V \to W$ is a linear transformation.
Axler’s notation for the image of $T$
was already becoming rather old-fashioned when he wrote the first edition of
his book; these days simply $T(V)$ is common
(and likewise for any function at all).
The terminology “null space” (whether one or two words)
for $T^{-1}(\{0\})$ is also somewhat quaint, and
we usually say “kernel” and write
“$\ker(T)$”
[and $\rm\LaTeX$ already provides the command \ker
to typeset this properly].
While I’m at it, best to avoid the use of
“one-to-one” to mean
“injective” (see boxed note on page 60),
because it is also sometimes used for
“bijective”.
Also, the $\rm\LaTeX$ for ${\cal L}(V,W)$ is {\cal L}(V,W);
note the brackets around \cal L, without which you would get
$\cal L(V,W)$.
Here’s
a page of notes on “Lemma 3.?” and some related observations
on how $\rm Hom$ connects with finite and infinite direct sums.
More notes on notation: I understand why Axler wants to distinguish
$V'$ and $T'$ (dual space and transformation) from $V^*$ and $T^*$, and
$U^0$ (annihilator) from $U^\perp$ (see the boxed note on page 104).
I’ll try to stick with $U^0$ in this class. But for the duals,
using “ $\!\phantom|'\!$ ” this way incurs the steep price of giving up the very useful
construction exemplified by “let $V$ and $V'$ be vector spaces”:
we already have few enough good letters to name mathematical structures
that even $\pi$ is pressed into double duty (not just $3.14159\ldots$ but
also the quotient πrojection from $V$ to $V/U$).
I’ll stick with the common $V^*$ and $T^*$ here.
An equivalent statement of the identity $(ST)^* = T^* S^*$
(third part of 3.101, page 104 of Axler), together with $(I_V)^* = I_{V^*}$
(which Axler might not even bother stating explicitly),
is that duality of vector spaces and linear transformations constitutes a
“contravariant
functor” from the category of
$F$-vector spaces and linear transformations to itself.
The results about quotient spaces and duality in sections E and F of
Chapter 3 are often described in terms of
exact sequences.
A sequence $\ \cdots \to L \to M \to N \to \cdots\ $ of linear transformations
(or $A$-module homomorphisms, “etc.”)
is said to be “exact at $M$”
if the kernel of the map $M \to N$ is the image of the map $L \to M$
(that is, if the elements of $M$ that go to zero in $N$
are precisely those that come from $L$). The sequence is
“exact” if it is exact at each step with both an incoming
and an outgoing map. In particular, a map $M \to N$ is injective iff
it extends to a sequence $0 \to M \to N$ that is exact at $M$,
and surjective iff it extends to a sequence $M \to N \to 0$
that is exact at $N$. [In this context “0” is
commonly used for the trivial vector space (or module, etc.) $\{0\}$.
Note that in each case there is no choice about the function from or to
that trivial vector space 0, and likewise at least for modules.
Another notation that signals injectivity is $M \hookrightarrow N$
(${\rm\LaTeX}$: \hookrightarrow, with the extra hook
suggesting $\subset$); likewise $M \to\!\!\!\!\to N$ for a surjective map.]
Thus the map is an isomorphism iff $0 \to M \to N \to 0$
is exact (at both $M$ and $N$). Even more easily,
$0 \to M \to 0$ is exact iff $M=0$.
A short exact sequence is the next case, with three modules
other than the initial and final 0. The standard example is
$0 \to L \to M \to N \to 0$ where the map $L \to M$
is an inclusion map (thus an injection) and the map $M \to N$
is the quotient map $M \to M/L$ (thus a surjection). In general if
$0 \to L \to M \to N \to 0$ is a short exact sequence then the injection
$L \to M$ identifies $L$ with a submodule of $M$, and then
the surjection $M \to N$ is identified with the quotient map.
More generally, any homomorphism $L \to M$ extends
(uniquely up to equivalence) to an exact sequence with four
modules between the outer zeros: $0 \to K \to L \to M \to N \to 0$,
where $K$ is the kernel of the map $L \to M$, and $N$ is its
“cokernel”, that is, the quotient of
$M$ by the image of $L$.
Now consider the case of vector spaces. Then to each linear transformation
$V \to W$ we associate the dual transformation $V^* \leftarrow W^*$,
with the dual of a composition $V \to W \to X$ being
the composition of the dual transformations
$V^* \leftarrow W^* \leftarrow X^*$ in reverse order;
this makes duality a “contravariant functor”
on the category of $F$-vector spaces. The key fact is
that for finite-dimensional vector spaces, duality preserves exactness of
sequences of linear transformations. Thus starting from any linear $V \to W$,
we can extend to an exact sequence $0 \to U \to V \to W \to X \to 0$ with
$U$ the kernel and $X$ the cokernel, and dualize to deduce the exactness of
$0 \leftarrow U^* \leftarrow V^* \leftarrow W^* \leftarrow X^* \leftarrow 0$
with $V^* \leftarrow W^*$ the dual map.
This immediately encodes Axler 3.108 (page 107): the map
$V \to W$ is surjective iff $X$ is zero iff $X^*$ is zero
iff the dual map is injective. Likewise for 3.110 (p.108)
via the vanishing of $U$ and $U^*$.
With a bit more work we can get the general relations
$\ker(T^*) = ({\rm im}(T))^0$ (3.107, p.106) and
${\rm im}(T^*) = (\ker(T))^0$ (3.109, p.107)
between the kernels and images of $T$ and its dual, again assuming that
$T$ is a linear map between finite-dimensional vector spaces.
Conversely, the fact that duality preserves exactness
(for sequences of linear maps between finite-dimensional vector spaces)
can be deduced as a special case of 3.107 and 3.109.
Being a special case of ${\rm Hom}$, duality makes sense
in the more general setting of modules over a ring $A$:
the dual of an $A$-module $M$ is $M^* := {\rm Hom}(M,A)$,
the $A$-module of
$A$-linear homomorphisms from $M$ to $A$.
This still gives a “contravariant functor” from the category of
$A$-modules to itself:
an $A$-module homomorphism $M \to N$ gives rise in the same way
to an $A$-module homomorphism $M^* \leftarrow N^*$ with the
direction reversed, consistent with identity and composition.
But, as you might suspect by now, our theorems about the kernel and image
of the dual of a linear transformation can fail in this more general setting,
even when applied to finitely-generated modules. We already see this for
injections and surjections: For a linear transformation $T : V \to W$,
we saw that if $T$ is injective then the dual transformation $T^*$
is surjective, and vice versa. Only one of these two results holds
for injections and surjections of $A$-modules;
can you see which one it is, and give a counterexample for the other
(already for $A = \bf Z$)?
Another way to think about the eigen-basics: “Lemma 5.0”:
If $T$ is an operator on any vector space $V$,
and $\lambda$ any scalar, then a subspace $U$ of $V$ is an invariant subspace
for $T$ iff it is an invariant subspace for
$T - \lambda I.$
So, for instance, since $\ker T$ is an invariant subspace, so is
$\ker(T-\lambda I),$
a.k.a. the $\lambda$-eigenspace.
Yet another note on notation: Axler’s name
“$T/U$” (for the operator on $V/U$
induced from the action of $T$ on a vector space $V$
with an invariant subspace $U,$ see 5.14 on p.137)
is a nice notation, but (unlike $T|_U$ for the restriction of $T$
to $U$) is seen rarely if at all in the research literature.
Normally it will be called plain $T$, or possibly $\overline T$
(since it is constructed by descending to $V/U$
the composition of $T$ with the quotient map $V \to V/U$).
Let $T$ be a linear operator on $V$. The algebraic
properties of polynomial evaluation at $T$ can be summarized
by saying that the map from $F[X]$ to End($V$) that takes any polynomial $P$ to $P(T)$
is not just linear but a ring homomorphism.
[Since $F[X]$ is a commutative ring, so is the image of this homomorphism,
even though End($V$) is not commutative once $\dim(V) > 1$.]
In particular the kernel is an ideal in $F[X]$; when $V$
is finite dimensional, this ideal must be nonzero, and its monic generator is
what we shall call the “minimal polynomial” of $T$.
Special case: if $V$ is $F$ itself, then we
naturally identify End($V$) with $F$,
and we get for any field element $x$ the evaluation homomorphism
from $F[X]$ to $F$ that takes any polynomial to
its value at $x$.
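To see the ring-homomorphism property concretely, here is a sketch with integer matrices as nested lists and polynomials as coefficient lists (constant term first). All names are hypothetical, and the matrix `T` is just a convenient example.

```python
def mat_mul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_at(p, t):
    """Evaluate the polynomial p at the square matrix t by Horner's rule."""
    n = len(t)
    result = [[0] * n for _ in range(n)]
    for c in reversed(p):
        result = mat_mul(result, t)
        for i in range(n):
            result[i][i] += c      # add c times the identity
    return result

T = [[2, 1], [0, 2]]
P, Q = [-2, 1], [3, 0, 1]          # X - 2 and X^2 + 3

lhs = poly_at(poly_mul(P, Q), T)               # (PQ)(T)
rhs = mat_mul(poly_at(P, T), poly_at(Q, T))    # P(T) Q(T)
```

The two sides agree, and $P(T)$ commutes with $Q(T)$. For this particular $T$, the polynomial $(X-2)^2$ evaluates to the zero matrix while $X-2$ does not, so the minimal polynomial is $(X-2)^2$.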
Axler proves the Fundamental Theorem of Algebra using
complex analysis, which cannot be assumed in Math 55a
(we’ll get to it at the end of 55b).
Here’s a proof using the topological tools
we’ll develop at the start of 55b.
(Axler gives one standard complex-analytic proof in 4.13 on page 124.)
Here are two other equivalent conditions for
algebraic closure, in terms of irreducible polynomials and
finite(-dimensional) field extensions.
Triangular matrices are intimately related with “flags”.
A (complete) flag in a finite dimensional vector space $V$
is a sequence of subspaces $\{0\} = V_0, V_1, V_2, \ldots, V_n = V$, with
each $V_i$ of dimension $i$ and (for $1\leq i\leq n$) containing $V_{i-1}$.
A basis $v_1,v_2,\ldots,v_n$ determines a flag: $V_i$ is the span of the
first $i$ basis vectors. Another basis $w_1,w_2,\ldots,w_n$
determines the same flag if and only if each $w_i$ is
a linear combination of $v_1,v_2,\ldots,v_i$
(necessarily with nonzero $v_i$ coefficient).
The standard flag in $F^n$ is the flag obtained in this way from
the standard basis of unit vectors $e_1,e_2,\ldots,e_n$.
The punchline is that, just as a diagonal matrix is one that respects
the standard basis (equivalently, the associated decomposition of
$V$ as a direct sum of 1-dimensional subspaces),
an upper-triangular matrix is one that respects the standard flag.
Note that the $i$-th diagonal entry of a triangular matrix
gives the action on the one-dimensional quotient space
$V_i / V_{i-1}$ (again for each $i=1,\ldots,n$).
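In coordinates, "respecting the standard flag" is a condition on columns, which the following sketch tests directly. The function names and the sample matrices are illustrative only.

```python
def respects_standard_flag(a):
    """True if a maps span(e_1, ..., e_j) into itself for every j.

    Column j of a holds the coordinates of a(e_j), so the condition says
    that every entry strictly below the diagonal is zero.
    """
    n = len(a)
    return all(a[i][j] == 0 for j in range(n) for i in range(j + 1, n))

def mat_mul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 5, 1], [0, 3, 4], [0, 0, 7]]
B = [[1, 1, 0], [0, 6, 2], [0, 0, 5]]
AB = mat_mul(A, B)
diagonal_of_AB = [AB[i][i] for i in range(3)]
```

The product `AB` again respects the flag, and its diagonal is the entrywise product of the two diagonals, as one expects from the action on the one-dimensional quotients $V_i / V_{i-1}$.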
While the third edition of Axler includes quotients and duality,
it still lacks tensor algebra.
This is no surprise, but it will not stop us in Math 55!
Here’s an introduction.
[As you might guess from \oplus,
the TeXism for the tensor-product symbol is \otimes.]
Corrected 14.x.2017 [Alec Sun]:
at the end of the first display on page 2,
it’s $w_{ij}$, not $u_i \otimes v_j$.
One of many applications is the trace of
an operator on a finite dimensional $F$-vector space $V$.
This is a linear map from ${\rm Hom}(V,V)$ to $F$.
We can define it simply as the composition of two maps:
our identification of ${\rm Hom}(V,V)$ with
the tensor product of $V^*$ and $V$,
and the natural map from this tensor product to $F$
coming from the bilinear map taking $(v^*,v)$ to $v^*(v)$.
We shall see that this is the same as the classical definition:
the trace of $T$ is the sum of the diagonal entries of
the matrix of $T$ with respect to any basis.
The coordinate-independent construction via tensor algebra
explains why the trace does not change under change of basis.
(The invariance can also be proved by checking explicitly that
$AB$ and $BA$ have the same trace for any square matrices $A,B$
of the same size.) Once we’ve constructed the trace, we have
a series of invariants ${\rm tr}(T^k)$ ($k=1,2,3,\ldots$;
the $k=0$ trace is ${\rm tr}(I_V) = \dim V$).
If $T$ has an upper-triangular matrix $(a_{ij})$
then the diagonal entries of $T^k$ are $a_{ii}^k$,
so ${\rm tr}(T^k) = \sum_i a_{ii}^k$. In characteristic zero
(or characteristic $>\dim V$), that’s enough to construct
the characteristic polynomial of $T$
[with apologies for using two mathematical senses of
“characteristic” in the same sentence...];
but to do it in general we’ll have to work harder.
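As a sketch of the characteristic-zero case: the traces of $T, T^2, \ldots, T^n$ determine the elementary symmetric functions of the eigenvalues through Newton's identities, and hence the characteristic polynomial. Newton's identities are the tool assumed here (the course may organize this differently), and all function names are hypothetical.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def char_poly_from_traces(t):
    """Coefficients [1, c_1, ..., c_n] of X^n + c_1 X^(n-1) + ... + c_n.

    Uses only p_k = tr(t^k) for k = 1..n and Newton's identities
    k*e_k = sum_{i=1..k} (-1)^(i-1) e_(k-i) p_i.  The division by k is
    why this needs characteristic zero (or characteristic > n).
    """
    n = len(t)
    power, p = t, []
    for _ in range(n):
        p.append(trace(power))
        power = mat_mul(power, t)
    e = [Fraction(1)]
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(Fraction(s) / k)
    return [(-1) ** k * e[k] for k in range(n + 1)]

A = [[2, 1], [1, 3]]
B = [[0, 4], [5, 6]]
same_trace = trace(mat_mul(A, B)) == trace(mat_mul(B, A))

U = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]     # upper triangular, diagonal 1, 4, 6
coeffs = char_poly_from_traces(U)
```

For the triangular matrix `U` the result is $X^3 - 11X^2 + 34X - 24 = (X-1)(X-4)(X-6)$, matching the diagonal entries.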
Here are some basic definitions and facts about general
norms
on real and complex vector spaces.
Just as we can study bilinear symmetric forms
on a vector space over any field, not just $\bf R$,
we can study sesquilinear conjugate-symmetric forms
on a vector space over any field with a conjugation,
not just $\bf C$.
Here a “conjugation” on a field $F$
is a field automorphism $\sigma: F \to F$
such that $\sigma$ is not the identity but
$\sigma^2$ is the identity (that is, $\sigma$ is an involution).
Given a basis $\{v_i\}$ for $V$,
a sesquilinear form $\langle \cdot, \cdot \rangle$ on $V$
is determined by the field elements
$a_{i,\,j} = \langle v_i, v_j \rangle,$
and is conjugate-symmetric if and only if
$a_{j,i} = \sigma(a_{i,\,j})$ for all $i,j$.
Note that the “diagonal entries”
$a_{i,i} = \langle v_i, v_i \rangle$ —
and more generally $\langle v,v \rangle$ for any $v \in V$ —
must be elements of the subfield of $F$
fixed by $\sigma$.
Over any field not of characteristic 2,
we know that for any non-degenerate
symmetric pairing on a finite-dimensional vector space
there is an orthogonal basis, or equivalently
a choice of basis such that the pairing is $(x,y) = \sum_i a_i x_i y_i$
for some nonzero scalars $a_i$.
But in general it can be quite hard to decide whether
two different collections of $a_i$ yield isomorphic pairings.
Even over $\bf Q$ the answer is already tricky in dimensions
2 and 3, and I don’t think it’s known in a vector space of
arbitrary dimension. Over a finite field of odd size there are always
exactly two possibilities, as we may see in a few weeks.
“Sylvester’s Law of Inertia” states that
for a nondegenerate pairing on a finite-dimensional vector space $V/F$,
where either $F = \bf R$ and the pairing is bilinear and symmetric, or
$F = \bf C$ and the pairing is sesquilinear and conjugate-symmetric,
the counts of positive and negative inner products
for an orthogonal basis constitute an invariant of the pairing
and do not depend on the choice of orthogonal basis.
(This invariant is known as the “signature” of the pairing.)
The key trick in proving this result is as follows.
Suppose $V$ is the orthogonal direct sum of subspaces $U_1, U_2$
for which the pairing is positive definite on $U_1$
and negative definite on $U_2$.
[A pairing $(\cdot,\cdot)$ is called “negative definite”
if $-(\cdot,\cdot)$ is positive definite.]
Then any subspace $W$ of $V$
on which the pairing is positive definite
has dimension no greater than $\dim(U_1)$.
Proof: On $W \cap U_2,$
the pairing is both positive and negative definite;
hence that subspace is $\{0\}$. The claim follows by a dimension count,
and we quickly deduce Sylvester’s Law.
If $U$ is a subspace of inner-product space $V$,
but not necessarily finite dimensional, there is not generally a complement:
one can still define $U^\perp$, but the direct orthogonal sum
$U \oplus U^\perp$ might be strictly smaller than $V$.
What then happens to 6.56 (p.198 in 6.C),
which describes the orthogonal projection $P_U(v)$
as the vector in $U$ closest to $v$
(i.e., minimizing the norm $\|v - u\|$)?
Well, if there exists such $u$ then indeed
$v-u$ is orthogonal to $U$, but in general
the minimum need not be attained: at best we can construct a sequence of
vectors $u_n \in U$ such that $\| v - u_n \|$ approaches
$\inf_{u \in U} \| v - u \|.$
It then follows from
Apollonius’ theorem
(see the front cover of Axler! and also Exercise 31 of 6.A, page 179)
that the $u_n$ constitute a
Cauchy sequence
in $U$ (else $(u_m + u_n)/2$ is too close to $v$).
So if $U$ is complete with respect to the norm distance
then there is a nearest vector and we can proceed as before.
But in general infinite-dimensional inner product spaces are not complete
(the complete ones are
Hilbert spaces, and that is a very special case).
We shall say a lot more about completeness and related notions
at the start of Math 55b.
A regular graph of degree $d$ is a
Moore graph of girth 5
if any two different vertices are linked by a unique path of length
at most 2. Such a graph necessarily has
$n = 1 + d + d (d-1) = d^2 + 1$ vertices.
Let $A$ be the adjacency matrix, i.e. the symmetric $n \times n$ matrix with
$A_{ij} = 1$ if vertices $i,j$ are (distinct and) adjacent on the graph,
and $A_{ij}=0$ otherwise; and let $\bf 1$ be the all-ones vector.
Then $\bf 1$ is an eigenvector of $A$ with eigenvalue $d$
(because each vertex has degree $d$).
We have $(1 + A + A^2) v = dv + \langle v, {\bf 1} \rangle \bf 1$
for all $v$ (proof: check on unit vectors and use linearity).
Thus $A$ takes the orthogonal complement
of ${\bf R} \cdot \bf1$ to itself, and satisfies
$1+A+A^2 = d$ on that orthogonal complement.
Since this quadratic equation has distinct roots, say
$m$ and $-1-m$ for some $m \ge 0$ (namely the positive root of
$1 + m + m^2 = d),$
it follows that the orthogonal complement
of ${\bf R} \cdot \bf1$
is the direct sum of the corresponding eigenspaces.
Let $d_1$ and $d_2$ be their dimensions. These sum to
$n - 1 = d^2$, and satisfy $md_1 + (-1-m)d_2 + d = 0$
because the matrix $A$ has trace zero. This lets us solve
for $d_1$ and $d_2$.
In particular we find that $d_2 - d_1 = (2d-d^2) \, / \, (2m+1).$
Since that’s an integer [it is a surprisingly powerful constraint
that the dimension of any vector space is in $\bf Z$!],
either $d = 2$ (giving the pentagon graph) or $m$ is an integer.
Substituting $m^2 + m+1$ for $d$, we find that
$16(d_2 - d_1)$ is an integer plus $15 /(2m+1)$, whence
$m \in \{0,1,2,7\}.$ The first of these is impossible,
and the others give $d=3$, 7, or 57 as claimed.
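The argument can be spot-checked on the Petersen graph, the case $d=3$. In the sketch below the Petersen graph is modelled as the 2-element subsets of a 5-element set, adjacent when disjoint (a standard description, assumed here). The formulas in `multiplicities` come from solving the two linear equations for $d_1, d_2$ stated above; all names are hypothetical.

```python
from itertools import combinations

# Petersen graph: vertices are the 2-element subsets of {0,...,4},
# adjacent exactly when disjoint; it is regular of degree d = 3.
vertices = [frozenset(c) for c in combinations(range(5), 2)]
n, d = len(vertices), 3
A = [[int(not (u & v)) for v in vertices] for u in vertices]
A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]

# Check I + A + A^2 = d*I + J entrywise, where J is the all-ones matrix.
moore_identity = all(
    int(i == j) + A[i][j] + A2[i][j] == d * int(i == j) + 1
    for i in range(n) for j in range(n)
)

# Integers m >= 0 with (2m+1) dividing 15, paired with d = m^2 + m + 1.
candidates = [(m, m * m + m + 1) for m in range(8) if 15 % (2 * m + 1) == 0]

def multiplicities(m):
    """(d_1, d_2) from d_1 + d_2 = d^2 and m*d_1 - (m+1)*d_2 + d = 0.

    Integer division is exact only for the admissible values of m.
    """
    deg = m * m + m + 1
    d1 = ((m + 1) * deg * deg - deg) // (2 * m + 1)
    d2 = (m * deg * deg + deg) // (2 * m + 1)
    return d1, d2
```

The list `candidates` recovers $m \in \{0,1,2,7\}$ with $d = 1, 3, 7, 57$, and for $m=1$ the multiplicities are $(5,4)$, which add up to $d^2 = 9$.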
Why the name “spectral theorem”? The set (or sometimes the
“multiset”) of eigenvalues
of a linear operator on a vector space $V$
is often called its
“spectrum”, especially when
$V$ is a real or complex vector space, either
finite or infinite dimensional. This is related with the
visual (and by extension the electromagnetic) spectrum, for reasons that
would take us much too far into wave and quantum mechanics,
so we shall say little more of that here (but you may encounter it again
in your physics class(es)).
We’ll define the determinant of an operator $T$
on a finite dimensional space $V$ as follows:
$T$ induces a linear operator $\wedge^n T$
on the top exterior power $\wedge^n V$ of $V$
(where $n = \dim V);$ this exterior power is one-dimensional,
so an operator on it is multiplication by some scalar;
$\det(T)$ is by definition the scalar corresponding to $\wedge^n T$.
The “top exterior power”
is a subspace of the “exterior algebra”
$\wedge^\bullet (V)$ of $V$,
which is the quotient of the tensor algebra by the two-sided
ideal generated by $\{ v \otimes v : v \in V\}.$
(Recall that this ideal also contains $v \otimes w + w \otimes v$
for all $v,w \in V.)$
We’ll still have to construct the sign homomorphism from the
symmetric group on $\dim V$ letters to $\{1, -1\}$
to make sure that this exterior algebra is as large
as we expect it to be, and in particular that
the $(\dim(V))$-th exterior power has dimension 1
rather than zero.
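Expanding $T e_1 \wedge \cdots \wedge T e_n$ in the basis vector $e_1 \wedge \cdots \wedge e_n$ leads to a signed sum over permutations, which the sketch below implements. The sign is computed here from the parity of the inversion count; that is one standard construction, assumed for illustration and not necessarily the one we shall use. Function names are hypothetical.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of range(n), from the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def compose(s, t):
    """The permutation i -> s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(t)))

def det(a):
    """Determinant as a signed sum over permutations."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

S4 = list(permutations(range(4)))
is_homomorphism = all(sign(compose(s, t)) == sign(s) * sign(t)
                      for s in S4 for t in S4)
```

The exhaustive check over all pairs in $S_4$ confirms multiplicativity in that one case; it is of course no substitute for the general proof.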
Interlude: normal subgroups;
short exact sequences in the context of groups
A subgroup $H$ of $G$ is normal
(satisfies $H = g^{-1} \! H g$ for all $g \in G)$ iff
$H$ is the kernel of some group homomorphism from $G$ iff
the injection $H \hookrightarrow G$ fits into
a short exact sequence $\{1\} \to H \to G \to Q \to \{1\},$
in which case $Q$ is the quotient group $G/H.$
[The notation {1} for the one-element (“trivial”) group
is usually abbreviated to plain 1, as in $1 \to H \to G \to Q \to 1.]$
This is not in Axler but can be found in any introductory text in
abstract algebra; see for instance Artin, Chapter 2, section 10.
Examples: $1 \to A_n \to S_n \to \{ \pm 1 \} \to 1;$
also, the determinant homomorphism ${\rm GL}_n(F) \to F^*$
gives the short exact sequence
$1 \to {\rm SL}_n(F) \to {\rm GL}_n(F) \to F^* \to 1,$
and this works even if $F$ is just a commutative ring with unit
as long as $F^*$ is understood as the group of invertible elements
of $F$ — for example, ${\bf Z}^* = \{\pm 1\}.$
Some more tidbits about exterior algebra:
If $w \in \wedge^m V$ and $w' \in \wedge^{m'} V$ then
$ww' = (-1)^{mm'} w'w;$ that is, $w$ and $w'$ commute unless
$m,m'$ are both odd in which case $w$ and $w'$ anticommute.
(The identity $ww' = (-1)^{mm'} w'w$ is also written in the equivalent form
$w \wedge w' = (-1)^{mm'} w' \wedge w.)$
If $m + m' = n = \dim V$ then the natural pairing
$\wedge^m V \times \wedge^{m'} V \to \wedge^n V$ is nondegenerate,
and so identifies the $m'$-th exterior power
canonically with the dual of the $m$-th,
tensored with the top ($n$-th) exterior power.
In particular, if $m=1$, and $T$ is any invertible operator
on $V$, then we find that the induced action of $T$
on the $(n-1)$st exterior power is the same as
its action on $V^*$ multiplied by $\det T$.
This yields the formula connecting the inverse and cofactor matrix
of an invertible matrix (a formula which you may also know
in the guise of “Cramer’s rule”).
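In matrix terms the formula reads $A \cdot {\rm adj}(A) = \det(A)\, I$, where ${\rm adj}(A)$ is the transpose of the cofactor matrix. A small Python sketch that checks this on an integer example (the helper names `minor`, `det`, `adjugate`, `matmul` are illustrative, and the determinant is computed by naive cofactor expansion, so this is suitable only for small matrices):

```python
def minor(a, i, j):
    """Matrix a with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjugate(a):
    """Transpose of the cofactor matrix: entry (i, j) is the (j, i) cofactor."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(det(A))                  # 18
print(matmul(A, adjugate(A)))  # [[18, 0, 0], [0, 18, 0], [0, 0, 18]]
```

When $\det A$ is invertible, dividing ${\rm adj}(A)$ by $\det A$ gives $A^{-1}$.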
For each $m$ there is a natural non-degenerate pairing
between $\wedge^m V$ and $\wedge^m V^*$,
which identifies these exterior powers with each other’s dual.
More will be said about exterior algebra when differential forms
appear in Math 55b.
We’ll also show that a symmetric (or Hermitian) matrix
is positive definite iff all its eigenvalues are positive
iff it has positive principal minors
(the “principal minors” are the determinants
of the square submatrices of all orders containing the (1,1) entry).
More generally we’ll show that the eigenvalue signs determine
the signature, as does the sequence of signs of principal minors
if they are all nonzero. More precisely: an invertible
symmetric/Hermitian matrix has signature
$(r,s)$ where
$r$ is the number of positive eigenvalues and
$s$ is the number of negative eigenvalues;
if its principal minors are all nonzero then
$r$ is the number of $j \in \{ 1, 2, \ldots, n \}$ such that the
$j$-th and ($j-1$)-st minors have the same sign,
and $s$ is the number of $j$ in that range such that the
$j$-th and ($j-1$)-st minors
have opposite sign [for $j=1$ we always count
the “zeroth minor” as being the positive number 1].
This follows inductively from the fact that the determinant has sign
$(-1)^s$ and the signature $(r',s')$ of the restriction of
a pairing to a subspace has $r' \leq r$ and $s' \leq s.$
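The counting rule for the minors is easy to mechanize. Below is a hedged Python sketch for small real symmetric matrices with integer or rational entries (the function names are illustrative, and the determinant is the naive cofactor expansion); it raises an error when some minor vanishes, since the rule is stated only for nonzero minors.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return 1
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def signature_from_minors(a):
    """Signature (r, s) read off from the signs of the upper-left minors."""
    r = s = 0
    previous = 1  # the "zeroth minor" counts as the positive number 1
    for j in range(1, len(a) + 1):
        current = det([row[:j] for row in a[:j]])
        if current == 0:
            raise ValueError("a principal minor vanishes; the rule does not apply")
        if (current > 0) == (previous > 0):
            r += 1
        else:
            s += 1
        previous = current
    return r, s


print(signature_from_minors([[1, 2], [2, 1]]))  # (1, 1)
```

In the example the minors are $1$ and $-3$, consistent with the eigenvalues $3$ and $-1$.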
For positive definiteness, we have the two further
equivalent conditions: the symmetric (or Hermitian) matrix
$A = (a_{jk})$ is positive definite
iff there is a basis $(v_j)$ of $F^n$ such that
$a_{j,k} = \langle v_j, v_k \rangle$ for all $j,k$,
and iff there is an invertible matrix $B$ such that $A = B^* \! B.$
For example, the matrix with entries $1 / (j+k-1)$
(“Hilbert matrix”) is positive-definite, because it is
the matrix of inner products (integrals on [0,1]) of the basis
$1, x, x^2, \ldots, x^{n-1}$ for the polynomials of degree $\lt n.$
See the 10th problem set for a calculus-free proof of the
positivity of the Hilbert matrix, and an evaluation of its determinant.
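As a numerical sanity check (not a proof), one can compute the upper-left minors of small Hilbert matrices exactly with rational arithmetic. This Python sketch uses elimination without row swaps, so the $k$-th minor is the product of the first $k$ pivots; the function names are illustrative, and the loop stops early if a zero pivot appears.

```python
from fractions import Fraction


def hilbert(n):
    """n-by-n Hilbert matrix; with 0-based indices the entry is 1/(j+k+1)."""
    return [[Fraction(1, j + k + 1) for k in range(n)] for j in range(n)]


def leading_minors(a):
    """Upper-left minors as running products of elimination pivots."""
    a = [row[:] for row in a]
    n = len(a)
    minors, product = [], Fraction(1)
    for k in range(n):
        pivot = a[k][k]
        product *= pivot
        minors.append(product)
        if pivot == 0:
            break  # elimination without row swaps cannot continue
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return minors


print(leading_minors(hilbert(3)))
# [Fraction(1, 1), Fraction(1, 12), Fraction(1, 2160)]
```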
All of Chapter 8 works over an arbitrary algebraically closed field,
not only over $\bf C$ (except for the minor point about
extracting square roots, which breaks down in characteristic 2); and
the first section (“Generalized Eigenvectors”) works over any field.
More about nilpotent operators: let $T$ be any operator on
a vector space $V$ over a field $F,$ not assumed algebraically closed.
If $V$ is finite-dimensional, then The Following Are Equivalent:
(1) There exists a nonnegative integer $k$ such that $T^k = 0$;
(2) For any vector $v \in V$, there exists a nonnegative integer
$k$ such that $T^k v = 0;$
(3) $T^n = 0$, where $n = \dim V$.
Note that (1) and (2) make no mention of the dimension,
but are still not equivalent for operators on infinite-dimensional spaces.
(For example, consider differentiation on the
$\bf R$-vector space ${\bf R}[x].)$
We readily deduce the further equivalent conditions:
(4) There exists a basis for $V$ for which $T$ has
an upper-triangular matrix with every diagonal entry equal to zero;
(5) Every upper-triangular matrix for $T$ has zeros on the diagonal,
and there exists at least one upper-triangular matrix for $T$.
Recall that the second part of (5) is automatic
if $F$ is algebraically closed.
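For a finite-dimensional illustration, restrict differentiation to the polynomials of degree $\lt n$. In the basis $1, x, \ldots, x^{n-1}$ its matrix is upper-triangular with zero diagonal, and $T^n = 0$ while $T^{n-1} \neq 0$. A short Python sketch (the helper names are illustrative):

```python
def derivative_matrix(n):
    """Matrix of d/dx on polynomials of degree < n, basis 1, x, ..., x^(n-1)."""
    # d/dx x^k = k x^(k-1), so column k has the entry k in row k-1.
    return [[k if k == j + 1 else 0 for k in range(n)] for j in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matpow(a, exponent):
    """a raised to a nonnegative integer power by repeated multiplication."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = matmul(result, a)
    return result


def is_zero(a):
    return all(entry == 0 for row in a for entry in row)


T = derivative_matrix(4)
print(is_zero(matpow(T, 4)), is_zero(matpow(T, 3)))  # True False
```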
The space of generalized 0-eigenvectors
(the maximal subspace on which $T$ is nilpotent)
is sometimes called the nilspace of $T\!$.
It is an invariant subspace. When $V$ is finite dimensional,
$V$ is the direct sum of the nilspace and another invariant
subspace $V',$ consisting of the intersection of the subspaces
$T^k(V)$ as $k$ ranges over all positive integers (8.5).
This can be used to prove Cayley-Hamilton (over an algebraically closed field)
using the standard definition of the characteristic polynomial as
$\det(xI-T).$
An example in infinite dimension when (8.5) fails:
$V$ is the real vector space of continuous functions from
$\bf R$ to $\bf R$, and $T$ is multiplication by $x$.
[That is a useful counterexample for many other aspects of
“eigenstuff” when we try to go beyond finite dimension;
for example, there are no eigenvectors, but
for every real number $\lambda$ the operator
$\lambda I - T$ is not invertible!]
The dimension of the space of generalized $\lambda$-eigenvectors
(i.e., of the nilspace of $T-\lambda I)$ is usually called the
algebraic multiplicity of $\lambda$
(since it’s the multiplicity of $\lambda$
as a root of the characteristic polynomial of $T$),
to distinguish it from the “geometric multiplicity”
which is the dimension of $\ker(T-\lambda I)$, a.k.a. the
eigenspace $V_\lambda$.
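The two multiplicities can differ. The following Python sketch computes both for a small rational matrix, taking the algebraic multiplicity to be $\dim \ker (T-\lambda I)^n$ with $n = \dim V$, and the geometric multiplicity to be $\dim \ker (T-\lambda I)$. The function names are illustrative, and ranks are found by exact Gaussian elimination.

```python
from fractions import Fraction


def rank(a):
    """Rank by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in a]
    rows, cols = len(a), len(a[0])
    r = 0
    for c in range(cols):
        pivot_row = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot_row is None:
            continue
        a[r], a[pivot_row] = a[pivot_row], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
    return r


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def multiplicities(t, lam):
    """Return (algebraic, geometric) multiplicity of lam for the matrix t."""
    n = len(t)
    shifted = [[t[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n):
        power = matmul(power, shifted)  # ends as (T - lam I)^n
    return n - rank(power), n - rank(shifted)


T = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]
print(multiplicities(T, 2), multiplicities(T, 3))  # (2, 1) (1, 1)
```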
In the proof of Cauchy’s theorem (the Wikipedia page’s
“Proof 2”),
if the group order — call it $n$ — is not
a multiple of $p$, we find that $n^{p-1} \equiv 1 \bmod p$
(since in this setting the identity $e$ is the only solution of $g^p=e);$
this gives a combinatorial proof of
“Fermat’s little theorem”
for $n > 0$ (since there is then always at least one group of
$n$ elements, namely the cyclic group ${\bf Z}/n{\bf Z}).$
Replacing $n$ by $-n$ then yields the result for negative integers
as well (since $(-1)^{p-1} \equiv 1 \bmod p$ even for $p=2).$
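The count behind this argument can be verified directly for small cyclic groups. In the Python sketch below (the function name is illustrative, and the group is ${\bf Z}/n{\bf Z}$ written additively) we enumerate the $p$-tuples with sum 0; there are $n^{p-1}$ of them, and those fixed by cyclic rotation are the constant tuples $(g,\ldots,g)$ with $pg = 0$. The difference of the two counts should be a multiple of $p$.

```python
from itertools import product


def count_tuples(n, p):
    """Count p-tuples in Z/nZ summing to 0, and those fixed by rotation."""
    total = fixed = 0
    for t in product(range(n), repeat=p):
        if sum(t) % n == 0:
            total += 1
            if len(set(t)) == 1:  # constant tuple: fixed by cyclic rotation
                fixed += 1
    return total, fixed


print(count_tuples(4, 3))  # (16, 1)
print(count_tuples(6, 3))  # (36, 3)
```

For $n=4$, $p=3$ only the identity is fixed, and indeed $4^2 \equiv 1 \bmod 3$; for $n=6$, $p=3$ there are three fixed tuples, as Cauchy's theorem requires.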
Often Artin’s 4.7 is called Sylow II, with 4.6 an
intermediate result; but Artin calls 4.6 “Sylow II”
and 4.7 a corollary.
The combinatorial argument for Sylow I also extends to prove the
“$1 \bmod p$” part of Sylow III
once we show that ${p^e m \choose p^e} \equiv m \bmod p.$
A nice way to see this is to start from the familiar congruence
$(1+X)^p \equiv 1+X^p \bmod p$ in ${\bf Z}[X]$ (which follows from
${p \choose k} \equiv 0 \bmod p$ for $0<k<p),$ and deduce
inductively that $(1+X)^{p^e} \equiv 1+X^{p^e} \bmod p$ for $e=1,2,3,\ldots.$
Raising to the $m$-th power yields
$(1+X)^{p^e m} \equiv (1+X^{p^e})^m \bmod p$, and then
comparing $X^{p^e}$ coefficients yields the desired congruence
${p^e m \choose p^e} \equiv m \bmod p.$
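A quick numerical check of this congruence in Python, for a few small primes (the function name `check` and the ranges tested are arbitrary choices; `math.comb` requires Python 3.8 or later):

```python
from math import comb


def check(p, e, m):
    """Test whether C(p^e * m, p^e) is congruent to m modulo p."""
    q = p ** e
    return comb(q * m, q) % p == m % p


print(all(check(p, e, m)
          for p in (2, 3, 5, 7)
          for e in (1, 2, 3)
          for m in range(1, 7)))  # True
```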
One might imagine that since all finite groups can be built up from
simple ones, and the
Classification Theorem
describes all simple finite groups, we can understand all finite groups.
Alas(?) this is far from the case. Even $p$-groups, that is
groups of prime-power order $p^n,$ are chaotic for large $n$.
Indeed for given $p$ the number of groups of order $p^n$ grows as
$p^{\frac{2n^3}{27} - O(n^2)}_{\phantom0},$
with most of these groups fitting into a short exact sequence of the form
$1 \to ({\bf Z}/p {\bf Z})^d \to G \to ({\bf Z}/p {\bf Z})^e \to 1$
with $d+e = n.$ To see how so many such groups can exist,
write the short exact sequence as $1 \to V \to G \to W \to 1$,
and construct a map $W \times W \to V$ as follows.
Given $w_1,w_2 \in W$, choose preimages $g_1,g_2 \in G$,
and map $(w_1,w_2)$ to the commutator $[g_1,g_2],$
which is in the kernel of the map $G \to W$ and can thus be regarded
as a vector in $V$. One can check that this commutator
is independent of the choice of preimages $g_1,g_2$, depends bilinearly
on $w_1,w_2$, and is alternating (the image vanishes if $w_1=w_2$).
Thus we have an element of ${\rm Hom}(\wedge^2 W, V)$,
a vector space over ${\bf Z} / p{\bf Z}$ of dimension $d \cdot {e \choose 2}$,
which is maximized when $d$ and $e$ are within a constant of $n/3$ and $2n/3$
respectively. Somewhat harder, one can show that any alternating map
$W \times W \to V$ is realized by some $G$
(and is realized uniquely unless $p=2.)$
For two such maps to give rise to isomorphic groups,
they must be related by elements of ${\rm GL}(V) \times {\rm GL}(W),$
and that group has fewer than $p^{d^2+e^2} < p^{n^2}$ elements.
Hence there are at least $p^{\frac{2n^3}{27} - O(n^2)}_{\phantom0}$
isomorphism classes as claimed.
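To see where the exponent $2n^3/27$ comes from, one can maximize $d \cdot {e \choose 2}$ over $d + e = n$ by brute force. A minimal Python sketch (the function name `best_split` is illustrative):

```python
from math import comb


def best_split(n):
    """Return (d, e, dimension) maximizing d * C(e, 2) subject to d + e = n."""
    return max(
        ((d, n - d, d * comb(n - d, 2)) for d in range(n + 1)),
        key=lambda triple: triple[2],
    )


print(best_split(30))        # (10, 20, 1900)
print(2 * 30 ** 3 / 27)      # 2000.0
```

For $n = 30$ the optimum is at $d = n/3$, $e = 2n/3$, and the maximal dimension 1900 falls short of $2n^3/27 = 2000$ by a term of order $n^2$.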
(Similar “chaos” affects the classification of
trilinear and higher-order maps on vector spaces, such as
alternating trilinear forms on a vector space of high dimension.)
Thanks to Vikram for this $\rm\LaTeX$ template for problem-set solutions
(here’s what the resulting PDF looks like).
They ask that e-mail submissions of
problem sets have “Math 55 homework” in the Subject line.
First problem set / Linear Algebra I:
vector space basics; an introduction to convolution rings
Clarifications:
• “Which if any of these basic results would fail if
$\bf F$ were replaced by $\bf Z$?” —
but don’t worry about this for problems 7 and 24,
which specify $\bf R$.
• Problem 12: If you see how to compute this efficiently but not
what this has to do with Problem 8, please keep looking for
the connection.
Here’s
the “Proof of Concept” mini-crossword with links concerning
the ∎ symbol.
Here’s
an excessively annotated solution.
Second problem set / Linear Algebra II:
dimension of vector spaces; torsion groups/modules and divisible groups
About Problem 5: You may wonder: if not determinants,
what can you use? See Axler, Chapter 4, namely 4.8 through 4.12
(pages 121–123), and note that the proof of 4.8
(using techniques we won’t cover till next week) can be replaced by
the ordinary algorithm for polynomial long division, which you probably
learned with real coefficients but works over any field.
While I’m at it, 4.7 (page 120) works over any infinite
field; Axler’s proof is special to the real and complex numbers,
but 4.12 yields the result in general. (We already remarked that
this result does not hold for finite fields.)
Third problem set / Linear Algebra III:
Countable vs. uncountable dimension of vector spaces;
linear transformations and duality
corrected 18 September (Mark Kong):
• Problem 2: Suppose that for some (finite) $n$ we can extend $B_0$ by
$n$ vectors (not “extend $B$ by $n$ vectors” etc.).
• Also, in Problem 1 Mark notes that one already needs a bit of the
Axiom of Choice even to prove the fact (which I blithely asserted in class)
that a countable union of countable, or even finite, sets is itself countable.
(If you can enumerate a countable disjoint union
$\bigcup_{i=1}^\infty S_i$ of countable or finite sets, then you can choose
an element of $\prod_{i=1}^\infty S_i$ by choosing from each $S_i$
the element that comes earliest in the enumeration.) Go ahead and
assume this for Problem 1.
• (And in Problem 10 it’s subsets of fewer than
$e$ elements of $F$, not $e$-element subsets —
but that’s still not polynomial in $q$.)
Fourth problem set / Linear Algebra IV:
Duality, and connections with projective spaces and with
vector spaces of polynomials
corrected 27.ix.2017:
In problem 2ii, we need nonzero $x \in F$
such that $x^n \neq 1$ (not “$x^n=1$” which always exists);
and the introductory sentence now makes explicit the intention that
$F$ is a finite field of $q$ elements also for problem 3.
(Noted by Forrest Flesher)
Fifth problem set / Linear Algebra V:
“Eigenstuff”
(preceded by prelude: exact sequences and more duality)
corrected 8.x.2017: CJ Dowd is the first to note that
in problem 9 (Axler 5A:31) we cannot quite let $\bf F$ be arbitrary:
if it is finite and of size less than $m$ then it cannot contain
enough pairwise distinct eigenvalues to accommodate $v_i$
for each $i=1,2,\ldots,m$ ! Fortunately this is the only obstruction,
so for this problem assume that $\bf F$ contains
at least $m$ distinct elements.
Eighth problem set / Linear Algebra VIII:
The spectral theorem; spectral graph theory; symplectic structures
Problems 7 and 8 postponed till Friday, 3 November at noon.
Tenth problem set:
Linear Algebra X (determinants and distances);
representations of finite abelian groups (Discrete Fourier transform)
(Yes, in Problem 9i the equation “$A^4 = N^2$” means
$A^4 = N^2 I$ [i.e. $P(A) = 0$ where $P$ is the polynomial
$X^4 - N^2 \in {\bf C}[X]$].)
correction to 1i (CJ Dowd): the equality condition is not quite
right when $\det A = 0$ (and $n \geq 3)$, when equality holds iff
some $v_i = 0$, and then the other $v_j$ are orthogonal to that $v_i$
but need not be orthogonal to each other.
Eleventh and final problem set:
Representations of finite abelian groups
corrected 28.xi.2017: Fan Zhou notes that in part (i) of Problem 5
$g_1,g_2$ are in $G_1,G_2$ respectively, not $V_1,V_2$.