Math 256x: The Theory of Error-Correcting Codes (Fall 2013)

This year’s Math 256x is a “topics class” (taught here before, but last in 2001) that develops the mathematical theory of error-correcting codes and its connections with other areas of mathematics ranging from combinatorics to group and representation theory to projective and algebraic geometry. We meet Tuesdays and Thursdays from 11:30 AM to 1:00 PM in ~~Room 309~~ Room 109 of the Science Center.

If you find a mistake, omission, etc., please let me know by e-mail.

September 3: Overview
September 5: Introduction: Hamming distance and Hamming space; (n, M, d) codes; transmission and error-detection rates; the Singleton bound and MDS codes; the Hamming (sphere-packing) bound and perfect codes; the q-entropy function and the asymptotic form of the sphere-packing bound; the Gilbert-(Shannon-)Varshamov bound and our asymptotic embarrassment
September 10: Linear codes (and isometries of Hamming space)
First problem set: continuity of the asymptotic rate functions; a bit on spherical codes; binary codes with 2d > n (a foretaste of the “linear programming” bounds); optimality and uniqueness of the [7, 3, 4]₂ (dual Hamming) code
September 12: No Class
September 17: Linear codes and point configurations in projective space (and introduction of several structures and constructions that will recur later in the semester)
Second problem set [Corrected Sep.18, see 3ii]: More about point configurations and the associated linear codes
September 19: Codes and point configurations, cont’d; Segre’s theorem on ovals in algebraic projective planes of odd order
September 24: More on conics, especially over finite fields, and projective spaces in general; rational normal curves and classical Goppa codes.
We’ll use these Lecture notes
September 26: Introduction to weight enumerators and the MacWilliams identity
See also Part II of my Notices article “Lattices, Linear Codes, and Invariants” (Part I is the analogous picture for lattices in Euclidean space, which is developed to an extent later in the course).
October 1: Some uses of the MacWilliams identity; Gleason on the Hamming weight enumerators of self-dual binary codes
October 3: Gleason for self-dual codes of Type II, III, and IV
October 8: Existence and uniqueness of the binary Golay codes
Third problem set: Synthemes, totals, and the projective plane of order 4
October 10: The binary Golay codes and related structures, cont’d
October 15: Extremal enumerators and codes; the Mallows-Odlyzko-Sloane theorem
October 17: The sphere-packing problem; lattices and their theta series
Table listing, in several code and lattice contexts, the parameters q, F, Δ, q₀ and 1/Δ(q₀), and the first value m₀ for which the extremal weight enumerator or theta series has a negative coefficient. (The m₀ information can also be found in Theorem 29 on page of Rains and Sloane’s chapter “Self-Dual Codes” from the Handbook of Coding Theory, citing a preprint by Zhang Shengyang that meanwhile was published in Discrete Applied Math. 91 (1999), 277–286.)
October 22: Overview of sphere-packing bounds; theta series of unimodular lattices; extremal theta functions and lattices
October 24: Type I and Type II lattices (a.k.a. odd and even unimodular lattices) and their theta series; Construction A
October 29: Asymptotics of kissing numbers of extremal codes and lattices via the stationary-phase method
October 31: Asymptotic impossibility of (nearly-)extremal codes. Start on Reed-Muller codes
November 5: Reed-Muller codes, cont’d
November 7: a bit more on Reed-Muller codes; introduction to cyclic codes
November 12: Cyclic codes cont’d: the BCH bound and BCH codes; QR codes, and Golay codes as QR codes
November 14: Krawtchouk polynomials and Lloyd’s theorem on perfect codes
November 19: Introduction to the linear programming (LP) bounds on error-correcting codes
November 21: orthogonal polynomial basics
November 26: LP bounds II: an asymptotic upper bound
November 28: No Class (Thanksgiving break)
December 3: The ternary Golay codes and related objects

Tuesday, Sep. 3: Overview

The general motivating setup from information theory for error-correcting block codes (as opposed to other kinds such as convolutional, let alone cryptographic: we’re aiming to protect against error, not eavesdropping). Natural languages such as English are very suboptimal error-correcting codes (fingerpuinting?); warning about real-world™ errors beyond the usual scope of the mathematical theory (eror-collecting colds). Coming attractions, drawing on combinatorics, symmetry and groups, linear algebra, finite geometry, invariant theory, orthogonal polynomials, etc. Example: the binary Golay code G₂₃, suggested by the coincidence

1 + 23 + (23·22)/2! + (23·22·21)/3! = 1 + 23 + 253 + 1771 = 2048 = 2¹¹,
has point stabilizer the sporadic Mathieu group M₂₃. Another example: encoding polynomials of degree <k on F_q by their values on all q field elements yields a code that can detect q−k errors; thus it can correct (q−k)/2 errors, and in this special case we’ll see that this is computationally feasible (whereas in general finding the nearest codeword in a linear code is NP-complete).

[formalities about texts, office hours, grading, etc.; see the initial handout]

Thursday, Sep. 5: Introduction

General setup for block codes:

Fix a finite set A (“alphabet”) of q>1 elements (“letters”).
[q=1 is vacuous, as already observed by Lewis Carroll — in general, a letter from an alphabet of size q carries log₂(q) bits’ worth of information.]

For any integer n>0 consider the set Aⁿ of n-letter “words” w = (w₁, …, w_n). Define the Hamming distance between two words w, w’ as the number of coordinates i where w_i ≠ w_i’; that is, d(w, w’) is the number of one-letter changes (errors) one must make in w to reach w’. Exercise: This is in fact a distance function. We can thus regard Aⁿ as a metric space, known as Hamming space.
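As a sanity check on the exercise, the definition can be sketched in Python. The function names here are illustrative choices of ours, and the brute-force check covers only one small Hamming space, so it is evidence rather than a proof.

```python
from itertools import product

def hamming_distance(w, v):
    """Number of coordinates in which the words w and v differ."""
    if len(w) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(w, v))

def is_metric_on(words):
    """Brute-force check of the metric axioms on a finite list of words."""
    for u in words:
        for v in words:
            d_uv = hamming_distance(u, v)
            # d(u, v) = 0 exactly when u = v, and d is symmetric
            if (d_uv == 0) != (u == v) or d_uv != hamming_distance(v, u):
                return False
            # triangle inequality d(u, w) <= d(u, v) + d(v, w)
            for w in words:
                if hamming_distance(u, w) > d_uv + hamming_distance(v, w):
                    return False
    return True

# All 27 words of A^3 for the ternary alphabet A = {0, 1, 2}
ternary_words = list(product(range(3), repeat=3))
print(is_metric_on(ternary_words))
```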

A code is a nonempty subset C of Hamming space. C is said to be binary, ternary, quaternary, “etc.” if the alphabet is of size 2, 3, 4, etc. The integer n is the length of the code. The key parameters of C are q, n, the number of elements (“codewords”) of C, and the minimum (nonzero) distance d_min between codewords. [If there is only one codeword then it is sometimes convenient to set d_min = n + 1, rather than d_min = ∞ which is the usual convention for the minimum of an empty set.] We use the notation “(n, M, d) code” for a code of length n, size M, and minimum distance d. A code of minimum distance at least d is (d−1)-error detecting, and thus also e-error correcting if d > 2e. Sometimes the alphabet size q is incorporated into the notation as a subscript, making C an “(n, M, d)_q code”.

A basic question of coding theory is how large M can get given q, n, and d (or more-or-less equivalently how large d can get given q, n, and M). As is often the case in this kind of combinatorics, except in trivial cases it is rare that we can get an exact answer for a given choice of q, n, and M or d, and even the asymptotic behavior is not known: we can only give lower and upper bounds, which hardly ever coincide. Here one standard asymptotic direction is to fix q, let n→∞, and try to make the transmission rate R = (log_qM) / n and the error-detection rate δ = d / n both large. One of our recurring themes in this course will be such bounds and asymptotic bounds, and the rare and often special cases where the bounds agree and we get a provably optimal code.

An elementary upper bound is due to Singleton (1964): an (n, M, d) code must have M ≤ q^(n−d+1). The proof is an easy pigeonhole argument. (Note that when M = 1 we must use the convention d = n + 1 for the Singleton bound to hold in this case.) Equivalently, R + δ ≤ (n + 1) / n. Codes that attain this bound are known as maximum distance separable, or MDS for short. We shall see that if q is a prime power then such a code exists for all d and n as long as 0 ≤ d−1 ≤ n ≤ q+1, but the behavior for n much larger than q is very different. Indeed, it follows from the Singleton bound that asymptotically R + δ ≤ 1, but we’ll see that for fixed q and large n this bound cannot be attained except for (R,δ)=(1,0) and (0,1), and moreover that if R > 0 for an infinite family of codes with fixed q then 1−δ is bounded away from zero.

Two trivial examples of MDS codes are Aⁿ itself (with d=1) and a one-word code (with d=n+1; so yes, a singleton code attains equality in the Singleton bound). Only slightly less trivial is the example with d=n of the repetition code consisting of the q words of the form (a, a, …, a). A final easy example is a single checksum code with d=2 (also called a parity check code in the case q = 2): give A a group structure (usually abelian, though this isn’t necessary), and let C consist of the qⁿ⁻¹ words whose entries sum to zero. A curious variant is the ISBN, which is in effect the (10, 10⁹, 2)₁₁ code obtained from an MDS code by discarding all codewords that use 10 for any but the last letter; this code (even in its unmutilated MDS form) also has the feature of detecting single-transposition errors, because it is defined by an F₁₁-linear relation with pairwise distinct coefficients.
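The ISBN example can be tried out concretely. The sketch below assumes the usual ISBN-10 weights 10, 9, …, 1 for the F₁₁-linear relation (any pairwise distinct nonzero coefficients would behave the same way); the function names and the sample number 0-306-40615-2 are ours, chosen for illustration.

```python
def isbn10_is_valid(digits):
    """digits: ten integers in 0..10 (10 plays the role of the letter 'X')."""
    weights = range(10, 0, -1)  # 10, 9, ..., 1: pairwise distinct mod 11
    return sum(w * x for w, x in zip(weights, digits)) % 11 == 0

def isbn10_check_digit(first_nine):
    """The unique last letter in F_11 that completes nine digits to a codeword."""
    partial = sum(w * x for w, x in zip(range(10, 1, -1), first_nine))
    return (-partial) % 11  # the last coordinate has weight 1

def transpose(word, i, j):
    """Copy of word with the letters in positions i and j exchanged."""
    swapped = list(word)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped

word = [0, 3, 0, 6, 4, 0, 6, 1, 5, 2]
print(isbn10_check_digit(word[:9]), isbn10_is_valid(word))
```

Changing one letter alters the weighted sum by a nonzero multiple of a nonzero weight, and exchanging two distinct letters alters it by the product of two nonzero differences, so neither kind of error can produce another codeword.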

An example of an optimal code that is not MDS is the Hamming code (1950), with parameters q = 2 and (n, M, d) = (7, 16, 3). [NB an MDS code with the same n and M would have had d=4.] The 7 coordinates are identified with the vertices of the Fano plane P²(F₂); the alphabet A is F₂ = {0,1}, and the codewords are the all-zero word, the characteristic functions of the lines, and the ones’ complements of these 1+7=8 words. Check that each codeword c is at distance 3 from seven codewords and at distance 4 from seven other codewords, with the remaining codeword other than c itself being the antipodal word, at distance 7. Since no two codewords are at distance less than 3, the “Hamming spheres” (actually Hamming-distance balls) of radius 1 about the codewords are disjoint; but each contains 1+7=8 words, and there are 16 of them, so they tile the 2⁷ words of A⁷ perfectly, proving that M=16 is optimal given q=2, n=7, and d=3.
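The “check that” above is easy to carry out by machine. In this sketch the seven points of the Fano plane are labelled by the nonzero vectors of F₂³, written as the integers 1..7, so that three points are collinear exactly when they XOR to zero; this labelling and all names are our own choices.

```python
from itertools import combinations
from collections import Counter

points = range(1, 8)  # nonzero vectors of F_2^3, encoded as integers 1..7
lines = {frozenset((a, b, a ^ b)) for a, b in combinations(points, 2)}

def indicator(subset):
    """Characteristic function of a set of points, as a binary word of length 7."""
    return tuple(1 if p in subset else 0 for p in points)

def complement(word):
    return tuple(1 - x for x in word)

# The zero word and the seven lines, together with their ones' complements
base = [indicator(())] + [indicator(line) for line in lines]
code = set(base) | {complement(w) for w in base}

def distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def distance_profile(c):
    """How many other codewords lie at each distance from the codeword c."""
    return Counter(distance(c, other) for other in code if other != c)

print(len(lines), len(code), distance_profile(indicator(())))
```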

In general, the sphere-packing bound (a.k.a. the Hamming bound) is the observation that if d > 2e then M can be no larger than qⁿ divided by the number of words in a Hamming ball of radius e. A code that attains this bound is said to be perfect. This happens very rarely: besides generalizations of the Hamming code that we’ll see before long, all of which are perfect single-error-correcting codes, the only known perfect codes are the trivial ones of a one-word code and a binary repetition code of odd length, and the very nontrivial examples of the binary and ternary Golay codes. It is known (and we may show towards the end of the class) that these are the only perfect codes for which q is a prime power. [Warning: the condition that qⁿ be an exact multiple of the sphere volume is necessary but not sufficient; a notable example: it is known (and we’ll prove) that there is no perfect binary (90, 2⁷⁸, 5) code.]
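The numerical coincidences behind these statements can be verified directly. The following is a small sketch with function names of our choosing; it checks only the arithmetic of the bound and says nothing about whether a code attaining it exists.

```python
from math import comb

def ball_size(q, n, e):
    """Number of words of A^n within Hamming distance e of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))

def hamming_bound(q, n, d):
    """Sphere-packing upper bound on M for an (n, M, d)_q code, with e = floor((d-1)/2)."""
    e = (d - 1) // 2
    return q ** n // ball_size(q, n, e)

def volume_divides(q, n, e):
    """Necessary (not sufficient) condition for a perfect e-error-correcting code."""
    return q ** n % ball_size(q, n, e) == 0

print(ball_size(2, 23, 3))      # binary Golay parameters
print(ball_size(3, 11, 2))      # ternary Golay parameters
print(ball_size(2, 90, 2), volume_divides(2, 90, 2))
```

The last line illustrates the warning: the divisibility condition holds for length 90 and e = 2 even though no perfect binary code with those parameters exists.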

What does the Hamming bound look like in the asymptotic regime where we fix q and let n and d grow while their ratio δ remains roughly constant? We need the following fundamental computation. The number of words at distance exactly e from a given word is (q−1)^e Binomial(n, e). If we fix q and let n, e → ∞ with e/n → ε for some fixed ε in (0,1), then we can estimate Binomial(n, e) using Stirling’s approximation. We find that log(Binomial(n, e)) is asymptotic to nH(ε) where H is the (binary) entropy function

H(ε) = −ε log ε − (1−ε) log (1−ε).
This gives us the logarithmic asymptotics of the count (q−1)^e Binomial(n, e) for q=2; in general we must add a term e log(q−1) to find that our count is asymptotically exp(nH_q(ε)+o(n)), where H_q is the q-entropy function

H_q(ε) = −ε log ε − (1−ε) log (1−ε) + ε log (q−1).
For any q, this function H_q is convex downwards, with vertical tangents at (ε, H_q(ε)) = (0, 0) and (1, log(q−1)). Thus H_q has a unique maximum in (0,1), and it’s an elementary calculus exercise to locate this maximum at ε = (q−1)/q, where H_q(ε) = log q. [The location and value of the maximum were predictable in advance — do you see why? Likewise the symmetry H(ε) = H(1−ε) of the binary entropy function.]
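A numerical sketch of these claims, using natural logarithms as in the formulas above. The names q_entropy and log_sphere_count_per_letter are ours, and the comparison at n = 2000 is only an illustration of the exp(nH_q(ε)+o(n)) asymptotics, not a proof.

```python
from math import comb, log

def q_entropy(eps, q=2):
    """H_q(eps) in natural-log units, using the convention 0 * log 0 = 0."""
    if not 0 <= eps <= 1:
        raise ValueError("eps must lie in [0, 1]")
    value = eps * log(q - 1)  # this term vanishes when q = 2
    for p in (eps, 1 - eps):
        if p > 0:
            value -= p * log(p)
    return value

def log_sphere_count_per_letter(q, n, e):
    """(1/n) log of the number (q-1)^e Binomial(n, e) of words at distance exactly e."""
    return log((q - 1) ** e * comb(n, e)) / n

# The maximum of H_q is log q, attained at eps = (q-1)/q
for q in (2, 3, 4):
    print(q, q_entropy((q - 1) / q, q), log(q))

# Exact count versus the entropy estimate for q = 3, eps = 0.3
print(log_sphere_count_per_letter(3, 2000, 600), q_entropy(0.3, 3))
```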

It soon follows that the number of words at distance at most e from a given word is given by the same asymptotic formula exp(nH_q(ε)+o(n)) for ε ≤ (q−1)/q, and is (1−o(1)) qⁿ once ε > (q−1)/q. Therefore the asymptotic form of the sphere-packing bound is

R ≤ 1 − (H_q(δ/2) / log(q)).
This bound is better (i.e. smaller) than the asymptotic Singleton bound for small δ (because of the vertical tangent at δ=0), and for all δ in (0,1) when q=2 (because the two bounds agree at the endpoints and the function 1 − (H(δ/2) / log(2)) of δ is convex upwards).

Replacing e by d−1 in the Hamming bound yields the Gilbert-(Shannon-)Varshamov bound, which is a lower bound: there exists an (n, M, ≥d)_q code provided that qⁿ is no larger than the product of M with the number of words in a Hamming ball of radius d−1, because until that happens we can keep adding words to C that do not bring its minimum distance below d. The asymptotic form of this lower bound is

R ≥ 1 − (H_q(δ) / log(q)) (δ < (q−1)/q).
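To see how the three asymptotic bounds compare, here is a short sketch that tabulates them. The function names are ours; the Gilbert-Varshamov rate is taken to be 0 once δ ≥ (q−1)/q, which is a convention adopted here for convenience.

```python
from math import log

def q_entropy(eps, q=2):
    """H_q(eps) in natural-log units, with 0 * log 0 = 0."""
    value = eps * log(q - 1)
    for p in (eps, 1 - eps):
        if p > 0:
            value -= p * log(p)
    return value

def singleton_rate(delta):
    """Asymptotic Singleton upper bound R <= 1 - delta."""
    return 1 - delta

def hamming_rate(delta, q=2):
    """Asymptotic sphere-packing upper bound R <= 1 - H_q(delta/2) / log q."""
    return 1 - q_entropy(delta / 2, q) / log(q)

def gilbert_varshamov_rate(delta, q=2):
    """Asymptotic lower bound R >= 1 - H_q(delta) / log q, for delta < (q-1)/q."""
    if delta >= (q - 1) / q:
        return 0.0
    return 1 - q_entropy(delta, q) / log(q)

for delta in (0.1, 0.2, 0.3, 0.4):
    print(delta, gilbert_varshamov_rate(delta), hamming_rate(delta), singleton_rate(delta))
```

For q = 2 each printed row should increase from left to right: the lower bound sits below the sphere-packing bound, which in turn sits below the Singleton bound.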

It is a perennial source of embarrassment that, while the Gilbert-Varshamov bound is quite weak for small n, the asymptotic form is very hard to beat; the only improvements known use surprisingly sophisticated techniques from number theory (curves with many points over finite fields), and for small q (all q up to about 40) the Gilbert-Varshamov bound is still the best we have for all δ. [There are similar embarrassments elsewhere in combinatorics, as in Ramsey theory and Euclidean sphere packing in high dimensions, where the probabilistic method is asymptotically better than any known explicit construction.]

Tuesday, Sep. 10: Linear codes

We usually take for q a prime power, and identify A with a finite field, which as usual we denote by F [the alternative k, for German Körper, is pre-empted as we shall see a couple of paragraphs below]. Hamming space is then the F-vector space Fⁿ, and the Hamming distance is translation-invariant, so

d(w, w’) = d(0, w’−w) = wt(w’−w)

where “wt” is the (Hamming) weight: the weight of any word w is its number #{i: w_i ≠ 0} of nonzero coordinates. This is a discrete norm on Fⁿ, satisfying the triangle inequality wt(w+w’) ≤ wt(w) + wt(w’) for all w, w’, and also the homogeneity wt(cw) = wt(w) for all words w and nonzero scalars c.

A code C ⊆ Fⁿ is said to be linear if it is closed under addition and scalar multiplication, that is, if it is a vector subspace of Fⁿ. This is a very special case, but an important one; we shall soon say more about the advantages and disadvantages of linear codes vs. unrestricted codes, but for now we note that almost all our examples of codes from the previous lecture become linear when A is identified with a finite field, including the Hamming (7, 16, 3) code (check this!). This is clearly true for Aⁿ and the repetition code; a one-word code is linear iff it is the zero word; the one-checksum code is linear provided we chose (F,+) for the group structure; and the ISBN code is just a one-checksum code over the 11-element field with some of its codewords removed without increasing the minimum distance.

Suppose C is a linear (n, M, d) code. Then M = q^k where k is the dimension of C as an F-vector space; and d is the minimum (nonzero) weight min_{c∈C, c≠0} wt(c) because C is closed under subtraction. A linear code’s basic parameters of length, dimension, and minimum distance are given in square brackets: a linear (n, q^k, d) code is an [n, k, d] code (again with optional subscript q). The one-word code, repetition code, Hamming code, single-checksum code, and the code Fⁿ have parameters [n, 0, n+1] (or [n, 0, ∞]), [n, 1, n], [7, 4, 3], [n, n−1, 2], [n, n, 1] respectively.
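For small examples the parameters can be read off by brute force from any basis of the code. The sketch below assumes that q = p is prime (so that arithmetic mod p models the field) and that k ≥ 1; the function names and the particular generator matrix chosen for the Hamming code are ours.

```python
from itertools import product

def codewords(G, p):
    """All F_p-linear combinations of the rows of G, for p prime."""
    n = len(G[0])
    return {tuple(sum(m * row[j] for m, row in zip(msg, G)) % p for j in range(n))
            for msg in product(range(p), repeat=len(G))}

def weight(word):
    """Hamming weight: the number of nonzero coordinates."""
    return sum(x != 0 for x in word)

def parameters(G, p):
    """[n, k, d] of the code spanned by the rows of G, assumed linearly independent."""
    words = codewords(G, p)
    k = len(G)
    if len(words) != p ** k:
        raise ValueError("rows of G are linearly dependent")
    d = min(weight(w) for w in words if any(w))
    return len(G[0]), k, d

hamming_G = [[1, 0, 0, 0, 1, 1, 0],
             [0, 1, 0, 0, 1, 0, 1],
             [0, 0, 1, 0, 0, 1, 1],
             [0, 0, 0, 1, 1, 1, 1]]
print(parameters(hamming_G, 2))
print(parameters([[1, 1, 1, 1, 1]], 3))                           # repetition code
print(parameters([[1, 0, 0, 2], [0, 1, 0, 2], [0, 0, 1, 2]], 3))  # single checksum
```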

The rate R = (log_qM) / n of an [n, k, d] code is simply k/n, and the Singleton bound is k + d ≤ n + 1. This makes sense, and remains true, even when F is not assumed finite, though then our proof (which used the pigeonhole principle) does not work. We shall rarely study linear codes over infinite fields, but can easily prove the Singleton bound for a linear code over an arbitrary field: choose any d−1 coordinates, and note that the subspace of C consisting of codewords supported on those coordinates must be zero; but this subspace has codimension at most n−d+1 in C, so n−d+1 is an upper bound on dim(C). A linear MDS code is a k-dimensional code whose words supported on any n−m coordinates constitute a subspace of dimension max(k−m, 0). This is the usual behavior when F is infinite, which is why we rarely do coding theory over infinite fields (but see “compressive sensing” for a recent nontrivial analogue of linear error-correcting codes that works in the context of linear algebra over R or C).

The sphere-packing bound of course holds for linear codes as a special case. The Gilbert-Varshamov bound holds for linear codes as well, though here we must prove it anew. One way is to mimic our proof for arbitrary codes, adding one basis vector at a time while the Hamming spheres of radius d−1 have not yet filled Fⁿ, and using the invariance of the Hamming metric under translation. We obtain the same bound, possibly improved by a small factor (no larger than q, and thus asymptotically negligible). Alternatively, we can choose a k-dimensional subspace of Fⁿ randomly (with each subspace equally likely), and estimate the probability that it has minimum weight at least d by computing the average number of nonzero codewords of weight <d. As long as this average is less than 1, the probability of success is positive; indeed it suffices for the average to be under q−1 (why?). The largest k for which this argument works, call it k_GV, again yields a code whose number q^(k_GV) of words is within a constant factor of the GV bound for arbitrary codes. Moreover, if we apply the same argument for linear codes of somewhat smaller dimension, say k_GV−c (which is asymptotically indistinguishable from k_GV), then the probability of success is 1−O(q^(−c)), so we are almost certain to succeed. But (possibly even more embarrassing than before) we still don’t know how to do it in practice for large n; even if we choose C at random from codes of dimension k_GV−c, then C almost surely has minimum weight at least d, but we cannot prove it because checking whether an arbitrary linear code has a word of weight <d is computationally intractable!

Often in mathematics when we introduce a new mathematical structure we follow it up not just with examples but also with operations that give new examples from old (direct sums and quotients of groups or vector spaces, field extensions and polynomial rings, etc.). Such operations exist for codes too, and we’ve seen a few examples without calling attention to the general constructions (e.g. a subset of an (n, M, d) code is an (n, M’, d’) code with M’ ≤ M and d’ ≥ d, and the subspace of an [n, k, d] code supported on n’ of the coordinates is an [n’, k’, d’] code with k’ ≥ k − (n−n’) and d’ ≥ d); but for now we need not pursue this systematically. We do give one fundamental construction that’s not so immediate: the dual, which is a bijection between linear codes of length n and dimension k and linear codes of length n and dimension n−k. Denote by ⟨·,·⟩ : Fⁿ × Fⁿ → F the usual perfect symmetric pairing

⟨v,w⟩ = ∑_i v_i w_i = v₁w₁ + v₂w₂ + · · · + v_nw_n.

[Warning: since we hardly ever use F⊆R, there are usually plenty of nonzero v with ⟨v,v⟩ = 0; but we shall have other uses for ⟨v,v⟩ before long.] If C is a linear code then the dual code of C is

C^⊥ := {c* ∈ Fⁿ : ⟨c*, c⟩ = 0 for all c ∈ C}.

As usual C^⊥⊥ = C. The zero code and Fⁿ are each other’s dual, as are the repetition code and single-checksum code. In general the minimum distance of C^⊥ cannot be predicted from the minimum distance of C; for instance, if C is [n, n−1, 1] then C^⊥ can have minimum distance anywhere from 1 to n−1. However, a linear code is MDS iff its dual is, so the dual of an [n, k, n−k+1] code is always an [n, n−k, k+1] code. This is illustrated by our above examples with k = 0, 1, n−1, n. To prove it in general, suppose C is a length n code of dimension k such that C^⊥ is not MDS. Then C^⊥ has a nonzero word of weight at most k. This gives a nontrivial linear relation on k coordinates of C (not just “at most k” because we can always include some idle coordinates to bring the total to k). But then the subspace of C supported on the remaining n−k coordinates has codimension strictly less than k, and thus contains some nonzero word of weight at most n−k. Therefore C is not MDS either, QED.
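The duality statements can be tested by brute force on small codes. This sketch assumes q = p prime; the function names and the sample MDS code (values at 0, 1, 2, 3 of polynomials of degree <2 over F₅) are our choices for illustration.

```python
from itertools import product

def span(G, p):
    """The code spanned over F_p (p prime) by the rows of G."""
    n = len(G[0])
    return {tuple(sum(m * row[j] for m, row in zip(msg, G)) % p for j in range(n))
            for msg in product(range(p), repeat=len(G))}

def dual(code, n, p):
    """All words of F_p^n whose pairing with every codeword is zero."""
    return {v for v in product(range(p), repeat=n)
            if all(sum(a * b for a, b in zip(v, c)) % p == 0 for c in code)}

def min_weight(code):
    """Minimum weight of a nonzero codeword."""
    return min(sum(x != 0 for x in w) for w in code if any(w))

# A [4, 2, 3] MDS code over F_5 and its dual, which should again be [4, 2, 3]
C = span([[1, 1, 1, 1], [0, 1, 2, 3]], 5)
D = dual(C, 4, 5)
print(len(C), min_weight(C), len(D), min_weight(D), dual(D, 4, 5) == C)
```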

The dual of the Hamming [7, 4, 3] code is the [7, 3, 4] code whose nonzero codewords are the line complements; this is a constant-weight code: all nonzero words have the same weight, here 4. That [7, 3, 4] code is thus contained in its dual; a code C ⊆ C^⊥ is said to be self-orthogonal, because an equivalent condition is ⟨c, c’⟩ = 0 for all c, c’ ∈ C. (Except in characteristic 2 it is enough to check this for c=c’, as usual for quadratic forms.) Further examples are the zero code (of course) and the repetition code when the length is a multiple of the characteristic. Such a code must have k ≤ n−k, that is, k ≤ n/2. An important special case is a self-dual code, which is a linear code satisfying C^⊥ = C; equivalently, a self-orthogonal code of length n and dimension n/2. An easy example is the repetition code of length 2 over a field of characteristic 2. A more interesting example is the extended Hamming code, the [8, 4, 4] binary code obtained by adding a checksum coordinate to the Hamming [7, 4, 3] code. The extended Hamming code contains the antipode (one’s complement) of each of its codewords, and each non-antipodal pair of codewords is at Hamming distance 4. (In general, adding a checksum coordinate to a binary linear code yields a code of the same dimension all of whose words have even weight; applying this transformation to an [n, k, d] code of odd minimum distance such as the Hamming code yields an [n+1, k, d+1] code.)
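These facts about the Hamming code, its dual, and its extension can be confirmed by enumeration. The generator matrix G below is one convenient choice of ours for a [7, 4, 3] binary code (the text describes the code via the Fano plane instead), and the helper names are likewise illustrative.

```python
from itertools import product

# One systematic generator matrix for a binary [7, 4, 3] Hamming code
G = [(1, 0, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]

def span2(rows):
    """All F_2-linear combinations of the given rows."""
    n = len(rows[0])
    return {tuple(sum(m * r[j] for m, r in zip(msg, rows)) % 2 for j in range(n))
            for msg in product((0, 1), repeat=len(rows))}

def dot2(u, v):
    """The pairing <u, v> over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

hamming = span2(G)
dual_hamming = {v for v in product((0, 1), repeat=7)
                if all(dot2(v, c) == 0 for c in hamming)}
# Append a checksum coordinate so that every word has even weight
extended = {c + (sum(c) % 2,) for c in hamming}

print(sorted({sum(w) for w in dual_hamming}))  # weights occurring in the dual
print(all(dot2(u, v) == 0 for u in extended for v in extended))
```

Since the extended code has 2⁴ words in length 8 and is self-orthogonal, it has dimension n/2 and is therefore self-dual.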

Interlude on isometries: Hamming space Aⁿ has two natural sources of isometries: we can apply a permutation of A to each coordinate, or an arbitrary permutation to the set of n coordinates. These generate a group, call it G, of n! · q!ⁿ isometries, which is the “wreath product” that’s the semidirect product of the symmetric group S_n acting on (S_q)ⁿ. We claim that G is in fact the full isometry group of Hamming space. One way to show this is to check that G acts simply transitively on (n+1)-tuples (w₀, W₁, W₂, …, W_n) where w₀ is a word and each W_i is a permutation of the q−1 words w that differ from w₀ only in coordinate σ(i), where σ is some permutation of {1, 2, …, n}. If the full isometry group were any larger than G, then its stabilizer acting on these (n+1)-tuples would be nontrivial; but that’s impossible because we can use the Hamming metric, together with the choice of w₀ and of the ordering of the n(q−1) words w at distance 1 from w₀, to reconstruct the rest of Hamming space (each word w’ is uniquely determined by d(w₀, w’) and the d(w₀, w’) words w at distance 1 from w₀ that are closer to w’ than w₀).

Two codes C, C’ of the same length over the same alphabet are said to be isomorphic if C’ = g(C) for some isometry g of Hamming space, and the automorphism group Aut(C) of C is the stabilizer of C in the isometry group G of Hamming space (or sometimes the image of that group in the group of permutations of C; this is usually the same but there are codes whose stabilizer does not act faithfully).

If C is linear then translation by any codeword is an automorphism. But we usually exclude nonzero translations of linear codes, because the choice of 0 is part of the vector space structure, and thus should be fixed by any automorphism. Of the isometries of Fⁿ that preserve 0, the only linear ones are the F*-signed permutation matrices, that is, the matrices each of whose rows and columns has a single nonzero entry; the group of F*-signed permutation matrices (a.k.a. F*-signed coordinate permutations) is a semidirect product — again a wreath product — of S_n acting on (F*)ⁿ. The subgroup that takes C to C is the group of automorphisms of C as a linear code, and always contains the group F* of scalar matrices as a normal subgroup. (Again, we might be interested not in this group but in its image in GL(C) when the action might not be faithful.) Two linear codes are said to be isomorphic if they are equivalent under an F*-signed coordinate permutation matrix.

Putting the signed permutation matrices together with translations by Fⁿ yields a group, call it G_lin, of Hamming isometries consistent with the linear structure. This group is a semidirect product in two ways: the F*-signed permutation matrices acting on Fⁿ, and S_n acting on the nth power of the ax+b group of affine-linear permutations of F (a.k.a. AGL₁(F)). When q is 2 or 3, the ax+b group is all of S_q, so concentrating on linear codes does not artificially break the symmetry of the problem. But for large F the ax+b group is tiny compared with S_q (index (q−2)!), which makes the restriction to linear codes feel less natural, and might suggest imposing alternative structures on the alphabet; e.g., later this term we might briefly consider codes over P¹(F_q), which has q+1 letters and an action of PGL₂(F_q). The borderline case is q=4, where the ax+b group is the alternating group A₄, and we recover the symmetric group S₄ by adjoining the nontrivial field automorphism of F. In general, when F has nontrivial automorphisms (i.e., when q = p^e with e > 1), we can augment G_lin slightly by taking a further semidirect product with the action of Aut(F). Note, though, that we must apply the same field automorphism to all coordinates at once (the “diagonal action” of Aut(F)), so even when q = 4 we do not quite get all of G once n > 1.

Tuesday, Sep. 17: Linear codes and point configurations in projective space

The interlude on Aut(C) also serves as a segue to a dictionary between linear codes and configurations of points in finite projective spaces. In applications, and in treatises aimed at applications, an [n, k, d] code will often be described by a generator matrix in block form (I | C), where I is the identity matrix and C is regarded as a matrix of checksums, so that the first k letters are the message (“information bits”, for a binary code) and the remaining n−k are the redundant information that makes error correction possible. But this breaks the S_n symmetry on the coordinates; and of course choosing any generator matrix for C breaks its GL(C) symmetry. As usual in linear algebra, when it comes to doing explicit computation we usually need to make a symmetry-breaking choice, but for structural study we want to work intrinsically to the extent that we can.

Here’s one way to think about it, leading to a nice geometric description of linear codes. Hamming space Fⁿ is quite a concrete vector space, coming to us with a choice of coordinates, ambiguous only up to permutation and scaling individual coordinates. But C has no intrinsic structure except for what it gains as a subspace of Fⁿ. So we regard C as an abstract vector space together with an embedding into Fⁿ, that is, a choice of n linear functionals on C. Permutation equivalence says that we have not an ordered n-tuple but an unordered set (or perhaps a multiset, as for the repetition code); and scaling equivalence on each coordinate says each functional is defined up to nonzero scaling. We assume that none of the coordinate functions is the zero functional. (*) Then the coordinates are nonzero vectors in the linear dual C* = Hom(C, F) of C [NB this is not the same as the dual code C^⊥!], and scalar equivalence means they’re really points in the (k−1)-dimensional projective space P(C*) = (C*−{0}) / F*. These points give a linear map from C to Fⁿ (up to F*-signed coordinate permutation); this map is injective iff the functionals span C*, that is, iff the n points are contained in no hyperplane of P(C*); we say that the points must “span” P(C*). [It takes at least k points to do that, but we already know that n ≥ k.] Therefore:

An [n, k, *] code without identically-zero coordinates is tantamount to a spanning (multi)set S_C of n points in P^(k−1)(F).

This respects the relevant symmetries: two codes are isomorphic iff their associated configurations are equivalent under Aut(P^(k−1)(F)) = PGL_k(F).

(*) In general, an [n, k, d] code C with z identically-zero coordinates is just an [n−z, k, d] code C₀ with no such coordinates that has been artificially inflated to length n with z idle coordinates that contribute neither information nor error correction; if we can understand C₀ then we’ve also understood C.

So how to recover the minimum weights d and d* of C and C^⊥ from the geometry of this configuration S_C in the dual projective space? Nonzero codewords c up to scaling ↔ hyperplanes H in the dual P^(k−1), and the weight of c is the number of points of S not in H. So the minimum weight is given by the formula

d = n − max_H #(S ∩ H).

Since any k−1 points span a hyperplane, the Singleton bound follows immediately (you should check that this is really the same proof that we gave already), and MDS codes correspond to configurations of points in “general linear position”, with no k of the points in a hyperplane. As for d*, a nonzero dual codeword is a nontrivial linear relation on the coordinates of C. We already excluded the most trivial of relations by requiring that C have no identically-zero coordinates; so d* > 1. Beyond that, d* = 2 means two coordinates are proportional, so two of the points of S coincide; d* = 3 means S has distinct points but three of them are collinear; and so forth. In general, we see that

d* is the smallest size of a linearly dependent subset of S.

(In general, δ points in a projective space are said to be “linearly dependent” when they lie on a linear projective subspace of dimension less than δ−1; here there must be d* points on a subspace of dimension exactly d*−2, else the minimum dual weight would be even smaller.) So again C is MDS iff C^⊥ is MDS.
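Both recipes can be tried on small configurations. In the sketch below a point of P^(k−1)(F_p), p prime, is represented by any nonzero vector of length k, and a hyperplane by a nonzero vector m (the hyperplane being the set of points s with m·s = 0); these representations and the function names are our assumptions, and S is assumed to span.

```python
from itertools import product

def dot(u, v, p):
    return sum(a * b for a, b in zip(u, v)) % p

def min_distance(S, p):
    """d = n minus the largest number of points of S on a common hyperplane."""
    k = len(S[0])
    on_hyperplane = [sum(dot(m, s, p) == 0 for s in S)
                     for m in product(range(p), repeat=k) if any(m)]
    return len(S) - max(on_hyperplane)

def dual_min_distance(S, p):
    """d* = fewest points of S involved in a nontrivial linear relation."""
    k = len(S[0])
    best = None
    for coeffs in product(range(p), repeat=len(S)):
        relation = all(sum(c * s[j] for c, s in zip(coeffs, S)) % p == 0
                       for j in range(k))
        if any(coeffs) and relation:
            size = sum(c != 0 for c in coeffs)
            best = size if best is None else min(best, size)
    return best  # None if the points of S are linearly independent

fano = [v for v in product(range(2), repeat=3) if any(v)]  # all of P^2(F_2)
arc = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]         # a 4-arc in P^2(F_3)
print(min_distance(fano, 2), dual_min_distance(fano, 2))
print(min_distance(arc, 3), dual_min_distance(arc, 3))
```

The second example is in general linear position, so both the code and its dual come out MDS, with parameters [4, 3, 2] and [4, 1, 4].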

Note that if C has neither an identically zero coordinate nor a weight-1 codeword, then the same is true of C^⊥. This happens iff S has no subset of n−1 points all on a hyperplane. Given such S, we can reconstruct C, then form the dual code C^⊥, and find its associated point configuration S*, which again is n points (permuted in the same way as the points in S) with no n−1 points on a hyperplane, but in a projective space of dimension n−k−1, not k−1. Since C^⊥⊥ = C, applying this construction to S* recovers S. So we have a duality between n-point configurations in P^(k−1) and in P^(n−k−1) over any field (not necessarily finite), with both configurations satisfying the condition that no n−1 of the points are on a hyperplane. This is equivalent to the (Coble-)Gale duality in algebraic geometry; see for instance the 1998 article by Eisenbud and Popescu.

Returning now to our usual setting of a finite field, we can use this picture to construct and study some basic examples of linear codes. Suppose for starters that C* is single-error-correcting, i.e. that d* ≥ 3. This is equivalent to the condition that the points of the configuration S associated to C are distinct. Thus n = #S is at most #P^(k−1)(F) = (q^k−1) / (q−1), and conversely if n satisfies that bound then we can find S and recover C and then an [n, n−k, ≥3] code C*. If n equals its maximum value (q^k−1) / (q−1), then C* is determined uniquely up to isomorphism, and is perfect because in this case q^k = 1 + (q−1)n. This shows that perfect single-error-correcting codes of all possible lengths exist whenever q is a prime power. [The linear ones are unique as noted, but there can be nonlinear codes with the same parameters.] These are called Hamming codes, because the (q,k) = (2,3) case recovers the Hamming [7, 4, 3] code. (NB Sometimes C* is called a “Hamming code” only when q=2, in which case the parameters of C* are [2^k−1, 2^k−k−1, 3]. In this case C itself is a constant-weight [2^k−1, k, 2^(k−1)] code, of which we shall have more to say soon.) Another remarkable property of the Hamming code C* is that Aut(C*) = GL(C*) ≅ GL_k(F); to see this, note that Aut(C*) is a subgroup of GL(C*) that contains F* and that maps surjectively to PGL(C*) (because any projective linear transformation permutes S_C when S_C consists of all F-points of P^(k−1)(F)).
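The construction of the Hamming codes can be carried out explicitly for small parameters. This sketch assumes q = p prime and normalizes each projective point so that its first nonzero coordinate is 1; the function names are ours, and the enumeration over all of F_pⁿ is feasible only for small n.

```python
from itertools import product

def projective_points(k, p):
    """One representative per point of P^(k-1)(F_p): first nonzero coordinate is 1."""
    return [v for v in product(range(p), repeat=k)
            if any(v) and next(x for x in v if x) == 1]

def hamming_code(k, p):
    """Length n and the set of words w with sum_i w_i * s_i = 0 over all points s_i."""
    S = projective_points(k, p)
    n = len(S)
    code = {w for w in product(range(p), repeat=n)
            if all(sum(c * s[j] for c, s in zip(w, S)) % p == 0 for j in range(k))}
    return n, code

def is_perfect_single_error_correcting(n, code, p):
    """Minimum weight 3, and the radius-1 balls about the codewords fill F_p^n."""
    d = min(sum(x != 0 for x in w) for w in code if any(w))
    return d == 3 and len(code) * (1 + (p - 1) * n) == p ** n

for k, p in [(3, 2), (2, 3), (2, 5)]:
    n, code = hamming_code(k, p)
    print((k, p), n, len(code), is_perfect_single_error_correcting(n, code, p))
```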

When k=2, the Hamming code is MDS, and is the longest codimension-k MDS code over F. The same is true trivially for k=1. This no longer holds for k ≥ 3. We next consider the case k=3, when an MDS code corresponds to a set of n (distinct) points in the projective plane P²(F) no three of which are collinear. Such a set is called an “n-arc” in P²(F), and n-arcs were already studied in finite projective geometry before the advent of coding theory. We next give some of the known results about small and large n-arcs in finite projective planes.

When n is small, it is easy to describe and enumerate n-arcs, even without assuming that our projective plane is algebraic, that is, using only the combinatorial definition of a projective plane of order q as an incidence relation on q²+q+1 points and q²+q+1 lines with each point (resp. line) incident to q+1 lines (points) and any two points (lines) incident on a unique line (point). We find that for n ≤ 6 the number of ways to extend an (n−1)-arc to an n-arc is a number A_n depending only on (q and) n, so the total number of n-arcs is (1/n!) ∏_{1≤i≤n} A_i, where A_i is q²+q+1, q²+q, q², (q−1)², (q−2)(q−3) for i=1, 2, 3, 4, 5, and q²−9q+21 = (q−4)(q−5) + 1 for i=6. In particular, the number of ordered 4-arcs is ∏_{1≤i≤4} A_i = q² (q³−1) (q³−q) = #PGL₃(F), and indeed in an algebraic projective plane PGL₃ acts simply transitively on the ordered 4-arcs; more generally, for each k the group PGL_k acts simply transitively on the ordered (k+1)-tuples of points in general linear position in P^{k−1}. In the coding picture, this tells us that the MDS codes with parameters [k+1, k, 2] (and dually [k+1, 1, k+1]) are unique up to isomorphism, which however is not hard to see directly for the 1-dimensional code. Returning to the case of a projective plane, n=4 is also the first case where C satisfies our hypothesis of no n−1 points on a hyperplane (= line). For n=5, the new factors of (q−2)(q−3) can be explained by noting that five points with no three on a line lie on a unique conic, and a conic has q+1 points, of which we’re choosing 5 which can be done in (q+1)q(q−1)(q−2)(q−3) / 5! ways. [More on conics in finite algebraic projective planes next time; the count is q⁵−q².] In particular, 5- and 6-arcs exist on any projective plane of order at least 4, and on a plane of order 4 or 5 every 5-arc extends uniquely to a 6-arc.
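As a sanity check on the extension counts A_1, …, A_6, here is a small experiment in the algebraic plane P²(F). It is a sketch only, assuming q is prime so that integers mod q model F; the helper names are illustrative. It grows an arc one point at a time (always taking the first available point) and records how many points were available at each step.

```python
from itertools import combinations, product

def points(q):
    """Points of P^2(F_q), q prime, normalized so the first nonzero coordinate is 1."""
    return [v for v in product(range(q), repeat=3)
            if any(v) and next(x for x in v if x) == 1]

def collinear(a, b, c, q):
    """Three points are collinear iff their 3x3 determinant vanishes mod q."""
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return det % q == 0

def extensions(arc, q):
    """Points that can be added to `arc` so that still no three are collinear."""
    return [p for p in points(q) if p not in arc
            and not any(collinear(a, b, p, q) for a, b in combinations(arc, 2))]

def extension_counts(q, max_n=6):
    """Grow one arc greedily; entry i-1 is the number of ways to extend an (i-1)-arc."""
    arc, counts = [], []
    for _ in range(max_n):
        ext = extensions(arc, q)
        counts.append(len(ext))
        if not ext:
            break
        arc.append(ext[0])
    return counts

def A(i, q):
    """The extension counts quoted in the text for i = 1, ..., 6."""
    return [q * q + q + 1, q * q + q, q * q, (q - 1) ** 2,
            (q - 2) * (q - 3), q * q - 9 * q + 21][i - 1]

print(extension_counts(5), [A(i, 5) for i in range(1, 7)])
```

Since the counts are claimed to be independent of the arc for n ≤ 6, a single greedy chain suffices for the comparison; for q = 5 the last entry illustrates that a 5-arc extends uniquely to a 6-arc.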

Once n>6, the number of extensions of an (n−1)-arc to an n-arc is no longer constant, and even the total number of n-arcs in a finite projective plane Π can no longer be given by a simple formula because it depends on details of the geometry of Π. After some inclusion-exclusion analysis one finds that for n=7 the count of n-arcs depends also on the number of copies of the Fano (order-2) projective plane in Π. For n=8 we need that count as well as the number of occurrences of a certain configuration of 8 points and 8 lines; for n=9 several new counts arise, including the number of copies in Π of the affine plane of order 3 (a configuration of 9 points and 12 lines of 3, one joining each pair of points); and it quickly gets more complicated after that. [See David Glynn: Rings of geometries II, J. Combinatorial Theory A 49 (1988), 26–66, especially Theorems 4.2 and 4.4; thanks to Nathan Kaplan for this reference.] When Π = P²(F_q), these first few contributions depend only on the residue of q mod 2 and 3 respectively (e.g. there are no Fano planes except when q is a power of 2, and no affine planes of order 3 if q ≡ −1 mod 3); but even in this case the formulas get arbitrarily complicated as n grows, in a sense made precise by a theorem of Mnëv (1988) and memorably described by R. Vakil as “Murphy’s Law”.

Thursday, Sep. 19: Codes and point configurations, cont’d; Segre’s theorem

Still we can say something about the largest n-arcs, or equivalently the longest MDS codes of dimension 3 (and dually of dimension n−3). We already saw in effect that a conic is a (q+1)-arc; and it is easy to see that there cannot be an n-arc for n>q+2: each of the q+1 lines through a point of the arc can contain at most one further point. Thus the maximal n is either q+1 or q+2. We shall see that:
• n = q+2 is possible iff q is even;
• n = q+1 is attained only by conics when q is odd.
The latter is a celebrated theorem of Segre (Canad. J. Math. 7 (1955), 414–416 [NB what Segre calls a “Galois field” γ is our finite field F]); in our setting, this theorem says that the [q+1, 3, q−1] MDS code over a field of odd order is unique up to isomorphism. Arcs of q+1 and q+2 points are called “ovals” and “hyperovals” respectively; thus we assert that P²(F) contains a hyperoval iff q is even, while for odd q the only ovals are conics.

The upper bound q+2 has a purely combinatorial proof, and thus holds for an arbitrary finite projective plane, not just for algebraic planes; the same is true for the result that if a plane of order q has a hyperoval then q is even. The former result is easy: given a point P of an n-arc, each of the q+1 lines through P can go through only one further point of the arc, for a total of q+2. For the latter, observe that since P was an arbitrary point of the arc, any line not disjoint from a hyperoval must meet it in exactly two points; considering all lines through a given point off the hyperoval gives a partition of the hyperoval’s points into pairs, whence q+2 is even, which proves the claim. [Note that this argument fails in the degenerate case of the “projective plane of order 1”, for lack of a point off the hyperoval (we need q²+q+1 > q+2); and indeed that “plane” does have a hyperoval. But we won’t worry about geometry over a “one-element field”, though it can be F_un to contemplate such things.]

Still working in the general combinatorial setting, we see that each point P of an oval O is on exactly one line t_P that meets O in no other point, and is thus called the “tangent” to O at P. Each point not in O is then on an even number of tangents if q is odd, and on an odd number of tangents if q is even. In the former case, a second-moment argument shows that the even number is either 0 or 2, just as in the familiar case of real conics (where a point in the plane lies on 0, 1, or 2 tangents according as it is inside, on, or outside the conic — in the projective setting the “inside” of a conic is the simply-connected component of its complement). For each of the q² points P not in O, let τ_P be the number of tangents through P. There are q+1 tangents, each lying on q points off O; thus ∑_P τ_P = (q+1) q. Each pair of tangents meets in a unique point, which is not on O, and each P is the intersection of τ_P(τ_P−1)/2 such pairs; thus ∑_P τ_P(τ_P−1) = (q+1) q. But then ∑_P τ_P(τ_P−2) = 0, and since each τ_P is even none of the summands is negative. Hence all summands vanish, whence each τ_P is 0 or 2 as claimed. A similar argument (see Problem 3 of the second problem set) shows that when q is even each τ_P is either 1 or q+1, the latter arising once; that is, all the “tangents” meet at a point! This point is called the “center” of the oval, and an oval together with its center is a hyperoval. For Π = P²(F_q) we shall show algebraically that every conic has a center, which will prove the existence of hyperovals in algebraic projective planes of even order.
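The tangent counts τ_P can be observed directly on a conic. The sketch below assumes q is prime and uses the conic y² = xz as a convenient test case (our choice, not forced by the argument above); the names are illustrative. It finds the lines meeting the conic in exactly one point and, for each point off the conic, counts the tangents through it.

```python
from itertools import product

def points(q):
    """Points of P^2(F_q), q prime; by duality the same triples also label the lines."""
    return [v for v in product(range(q), repeat=3)
            if any(v) and next(x for x in v if x) == 1]

def tangent_counts(q):
    """Return (#conic points, #tangents, sorted list of tau_P for P off the conic)."""
    pts = points(q)
    conic = [p for p in pts if (p[1] * p[1] - p[0] * p[2]) % q == 0]

    def on(line, p):
        # the point p lies on the line [a:b:c] iff a*x + b*y + c*z = 0
        return sum(a * b for a, b in zip(line, p)) % q == 0

    # a tangent is a line meeting the conic in exactly one point
    tangents = [l for l in pts if sum(on(l, p) for p in conic) == 1]
    off = [p for p in pts if p not in conic]
    taus = sorted(sum(on(l, p) for l in tangents) for p in off)
    return len(conic), len(tangents), taus

print(tangent_counts(3))   # odd q: every tau_P is 0 or 2
print(tangent_counts(2))   # q = 2: all three tangents pass through one point
```

For odd q the output shows only the values 0 and 2, with ∑ τ_P = (q+1)q; for q = 2 exactly one point has τ_P = q+1, which is the “center” of the oval.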

[Proof of Segre’s theorem, where the key step is in effect that every triangle inscribed in an oval (i.e., formed by three of its points) has a “symmedian point”, or equivalently that every triangle circumscribed about an oval (i.e., formed by three of its tangents) has a Gergonne point. (For more on these, see X(6) and X(7) in the “Encyclopedia of Triangle Centers”, which starts with X(1)=incenter and X(2)=centroid, and has reached X(5394) and beyond.) Along the way we use in effect the classical theorems of Menelaus (115±25 CE), Ceva (1678, but anticipated by about 700 years by al-Mutaman), and (in one direction only) Wilson(-Lagrange 1771, known and perhaps proved a century or 7+ earlier)! When q is even, the same argument gives an alternative proof (but valid only for finite algebraic Π) that any three tangents to an oval, and thus all its tangents, meet at a point. Here’s another application of Segre’s result, showing that quadratic polynomials on a prime field F=Z/pZ of odd order are the only maps f : F→F such that x ↦ f (x+a) − f (x) is a permutation for each nonzero a in F. (And here’s how I found out about it.)]

Tuesday, Sep. 24: More on conics and other configurations in projective spaces

See the notes.

Once we know that every conic with a point can be written as xy + xz + yz = 0 we can check directly that in characteristic 2 all the tangents meet at (1:1:1). The existence of a “center” of a conic P(x, y, z) = 0 in characteristic 2 can be understood as follows. A quadratic form P(·) over any field yields a symmetric bilinear form B(·,·) defined by B(v,w) = P(v+w) − P(v) − P(w). In odd or zero characteristic, this form is nondegenerate iff P is. But in characteristic 2, B is also alternating, so if the number of variables is odd (as it is for a ternary form) there must be at least a one-dimensional kernel, and in our setting the kernel can be no larger (if B is identically zero then the conic is a double line). So there is some nonzero c, unique up to scaling, for which B(c,v) = 0 for all v. But then P(v+ac) = P(v) + a²P(c) for all scalars a, so if P(v) = 0 (that is, if v is on the conic) then the line joining v to c meets the conic with multiplicity 2 at v, so is tangent to the conic, as claimed.

Once we know that every conic C with a rational point is a rational normal curve of degree 2, Bézout’s theorem becomes elementary for the intersection of C with any curve. Indeed once we identify C with P¹, the intersection of C with a curve of degree d becomes a polynomial of degree 2d on that P¹, and this polynomial is either identically zero (which happens precisely when our degree-d curve contains C) or has exactly 2d zeros counted with multiplicity over an algebraic closure. In particular, taking d=2 we see that two distinct smooth conics can meet in at most 4 points. This shows that when q is a power of 2 larger than 4 there are non-conic ovals obtained by starting from a conic plus its center and removing one point of the conic. (And it is known that once q>8 there are hyperovals that are not of the form conic+center at all.)

Thursday, Sep. 26: Introduction to weight enumerators and the MacWilliams identity

The (Hamming) weight enumerator of a linear code C of length n is a generating function W_C(X,Y) that encodes the counts of words of weight w for each w = 0, 1, …, n. It is defined as ∑_{c∈C} X^{n−wt(c)}Y^{wt(c)}, so for each w the X^{n−w}Y^w coefficient is the number of words of weight w. [W_C also encodes the distribution of Hamming distances between codewords, thanks to the formula d(w, w’) = d(0, w’−w) = wt(w’−w); to generalize W_C to nonlinear codes we would sum X^{n−d(c, c’)}Y^{d(c, c’)} over all pairs of codewords, and possibly divide by #(C).] The weight enumerator encodes more refined information than just the minimum weight, and (while [n, k, d] for a linear code do not in general determine the minimum weight of its dual) the weight enumerator of a linear code C determines the weight enumerator of its dual code C^⊥, and thus in particular also the minimum distance of C^⊥.

Before giving and proving MacWilliams’ formula relating W_C with W_C^⊥, we give some examples and simpler properties.

The simplest example is the singleton code {0}, whose weight enumerator is Xⁿ.

The repetition code has one word of weight 0 and the remaining words have weight n, so its weight enumerator is Xⁿ + (q−1)Yⁿ.

Every linear code has one word of weight 0, so the weight enumerator always has leading term 1·Xⁿ; in other words, W_C(1,0)=1.

Evaluating W_C at X = Y = 1 counts the summands in ∑_c∈C; in other words, W_C(1,1) = #C, which is q^k for an [n, k, d] code.

The code Fⁿ has weight enumerator (X+(q−1)Y)ⁿ. This can be seen from the binomial expansion, but can also be seen as a special case of the following result: let C₁ and C₂ be [n₁, k₁, d₁] and [n₂, k₂, d₂] codes over the same field, and C the [n₁+n₂, k₁+k₂, min(d₁, d₂)] code whose words are concatenations of arbitrary words in C₁ and C₂. Then W_C = W_C₁ · W_C₂ . The proof is immediate from the distributive law. By induction it follows that concatenating any number of linear codes yields a code whose weight enumerator is the product of the weight enumerators of the component codes. [Warning: this simple-minded construction is not what “concatenated code” usually means in the theory of error-correcting codes.] Now Fⁿ can be constructed this way from n components each of which is the [1, 1, 1] code F, so W_Fⁿ = (W_F)ⁿ, and it is clear that W_F(X,Y) = X+(q−1)Y, from which W_Fⁿ = (X+(q−1)Y)ⁿ follows by the product formula.
The weight enumerator of the one-checksum code C is a bit trickier to compute — except for the case q=2, where a word is in C iff its weight is even. So, we get W_C from the weight enumerator (X+Y)ⁿ of Fⁿ by extracting the terms whose exponent of Y is even. This is accomplished by the standard trick of substituting −Y for Y and averaging. Thus the weight enumerator of the parity-check code of length n is ((X+Y)ⁿ + (X−Y)ⁿ) / 2.
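These first examples can be checked mechanically. The sketch below assumes q is prime and stores a weight enumerator as the coefficient list of W_C(1,Y), i.e. the list [A_0, …, A_n] of weight counts; the helper names are ours. It verifies the product formula on a small direct sum and the even-weight formula for the binary parity-check code.

```python
from itertools import product
from math import comb

def span(gens, q):
    """All words of the code over F_q (q prime) spanned by the rows of gens."""
    n = len(gens[0])
    return {tuple(sum(a * g[i] for a, g in zip(coeffs, gens)) % q for i in range(n))
            for coeffs in product(range(q), repeat=len(gens))}

def weight_distribution(words):
    """[A_0, ..., A_n], the coefficients of W_C(1, Y)."""
    n = len(next(iter(words)))
    dist = [0] * (n + 1)
    for c in words:
        dist[sum(1 for x in c if x)] += 1
    return dist

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def parity_check_code(n):
    """Binary words of even weight, spanned by e_i + e_(i+1)."""
    gens = [tuple(1 if j in (i, i + 1) else 0 for j in range(n)) for i in range(n - 1)]
    return span(gens, 2)

rep = span([(1, 1, 1)], 2)                 # [3, 1, 3] repetition code
par = parity_check_code(3)                 # [3, 2, 2] parity-check code
both = {a + b for a in rep for b in par}   # all concatenations: a [6, 3, 2] code
print(weight_distribution(both),
      poly_mul(weight_distribution(rep), weight_distribution(par)))
print(weight_distribution(parity_check_code(6)),
      [comb(6, w) if w % 2 == 0 else 0 for w in range(7)])
```

Each printed pair should agree: the first illustrates W_C = W_C₁ · W_C₂, the second the coefficients of ((X+Y)ⁿ + (X−Y)ⁿ) / 2 for n = 6.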

By now we have enough examples to guess the rule, at least for binary codes:

Theorem (MacWilliams identity for binary codes). The weight enumerators of any linear binary code C and its dual code C^⊥ are related by

W_C^⊥ (X,Y) = |C|⁻¹W_C(X+Y, X−Y).

The proof is an application of Poisson summation for finite abelian groups. For any subgroup H of a finite abelian group G, and any function f :G→C, Poisson summation writes ∑_{h∈H} f (h) as a multiple of the sum of the discrete Fourier transform of f over the annihilator of H, which consists of all characters of G whose restriction to H is trivial. For G = Fⁿ, we shall identify the character group (a.k.a. the “Pontrjagin dual”) of G with G, and show that under this identification the annihilator of C is none other than the dual code C^⊥. When q=2, the Fourier transform of the function taking c to X^{n−wt(c)}Y^{wt(c)} is the function (X+Y)^{n−w}(X−Y)^w, which will imply the binary MacWilliams identity. For arbitrary F, the Fourier transform of the same function is (X+(q−1)Y)^{n−w}(X−Y)^w, giving rise to the MacWilliams identity for linear codes over any finite field:

Theorem (MacWilliams identity for Hamming weight enumerators). The weight enumerators of any linear code C over a q-element field and its dual code C^⊥ are related by

W_C^⊥ (X,Y) = |C|⁻¹W_C(X+(q−1)Y, X−Y).

So, for instance, our formula for the weight enumerator of a single-checksum code generalizes to ((X+(q−1)Y)ⁿ + (q−1)(X−Y)ⁿ) / q. Check that this is indeed consistent with the fact that this code has minimum weight 2!
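Before proving the identity it may be reassuring to test it numerically. The following sketch (q prime; brute-force enumeration; all helper names are illustrative) computes the weight counts of a code and of its dual independently, and compares the latter with the MacWilliams transform of the former, written at X = 1 as |C|⁻¹ ∑_w A_w (1+(q−1)Y)^{n−w}(1−Y)^w.

```python
from itertools import product

def span(gens, q):
    """All words of the code over F_q (q prime) spanned by the rows of gens."""
    n = len(gens[0])
    return {tuple(sum(a * g[i] for a, g in zip(coeffs, gens)) % q for i in range(n))
            for coeffs in product(range(q), repeat=len(gens))}

def dual(gens, q):
    """All vectors orthogonal to every generator (brute force over F_q^n)."""
    n = len(gens[0])
    return {v for v in product(range(q), repeat=n)
            if all(sum(a * b for a, b in zip(v, g)) % q == 0 for g in gens)}

def weight_distribution(words):
    n = len(next(iter(words)))
    dist = [0] * (n + 1)
    for c in words:
        dist[sum(1 for x in c if x)] += 1
    return dist

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_pow(p, e):
    out = [1]
    for _ in range(e):
        out = poly_mul(out, p)
    return out

def macwilliams(dist, q):
    """Weight counts of the dual code predicted by the MacWilliams identity."""
    n, size = len(dist) - 1, sum(dist)
    total = [0] * (n + 1)
    for w, a in enumerate(dist):
        term = poly_mul(poly_pow([1, q - 1], n - w), poly_pow([1, -1], w))
        for j, t in enumerate(term):
            total[j] += a * t
    return [t // size for t in total]   # the division is exact

simplex = [(1, 0, 0, 1, 1, 0, 1), (0, 1, 0, 1, 0, 1, 1), (0, 0, 1, 0, 1, 1, 1)]
print(macwilliams(weight_distribution(span(simplex, 2)), 2),
      weight_distribution(dual(simplex, 2)))
print(macwilliams([1, 0, 0, 0, 2], 3))   # single-checksum ternary code of length 4
```

The last line applies the transform to the ternary repetition code of length 4; the vanishing Y¹ coefficient is the consistency check with minimum weight 2.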

We next develop Pontrjagin duality and Fourier analysis for finite abelian groups, where all the relevant C-vector spaces are finite dimensional. See for example paragraphs 2 and 3 of page 5 in this chapter of my notes on analytic number theory from 2009 (where this theory is used to describe Dirichlet characters); see also the further remarks on page 8, and Exercise 5 on page 10. The Fourier transform on binary Hamming space (Z/2Z)ⁿ is an important special case also known by other names such as the “Hadamard transform”. As usual with Fourier analysis there are several choices of normalization in the literature; we’ll use the same normalization that I chose here (page 1385), where the Fourier transform of f :G→C is the function taking any character χ to ∑_g∈G χ(g) f (g), which saddles the formula for the inverse Fourier transform with the factor |G|⁻¹ and the complex conjugate χ(g). [For G = (Z/2Z)ⁿ we need not worry about the complex conjugate because χ(g) is always ±1.]
Poisson summation figures in analytic number theory too (see this chapter of the 2009 notes), but again we need only the finite case, which is easier than Poisson summation on Z ⊂ R and entirely elementary. (We may cover Poisson summation on Z ⊂ R, or more generally on lattices in Rⁿ, when/if we get to the connection between linear codes and Euclidean lattices.)

Once q>2 we can form a weight enumerator that keeps track of more information than W_C, counting not just the distribution of zero vs. nonzero coordinates in codewords but also which nonzero field elements occur. (Yes, this breaks some of our symmetry, though it still treats all n coordinates equally.) The resulting complete weight enumerator cwe_C is a homogeneous polynomial of degree n in q variables indexed by F. Each codeword c contributes a term ∏_i X_{c_i}. The MacWilliams identity extends to complete weight enumerators: to obtain the c.w.e. of the dual code, start with cwe_C, apply the discrete Fourier transform to the function a ↦ X_a, and divide by |C|.

Tuesday, Oct. 1: Some uses of the MacWilliams identity; Gleason’s theorem for self-dual binary codes

Some applications of the MacWilliams identity:

Compute W_C when W_C^⊥ is easier to get at directly. Example: if C is a one-error-correcting Hamming code of length n = (q^k−1) / (q−1), then C^⊥ is a constant-weight [n, k, q^{k−1}] code, so W_C^⊥(X, Y) is Xⁿ + (q^k−1) X^{n−q^{k−1}}Y^{q^{k−1}}. MacWilliams then gives us a formula for W_C(X, Y). Expanding W_C in powers of Y then gives Xⁿ + N₃ Xⁿ⁻³Y³ + N₄ Xⁿ⁻⁴Y⁴ + O(Y⁵), where N₃ is q−1 times the number of collinear triples of points in P^{k−1}(F) (check these claims!). Can you compute and account for N₄? This is the start of a neat story at the interface of coding theory and algebraic geometry, about which we alas won’t be able to say much more this term.
More generally, we can compute the weight enumerator of any linear code whose dual is small enough to list one vector at a time. It is still a computationally intractable problem to compute the weight enumerator of a code of large length n whose dimension is near n/2 (or more generally whose dimension and codimension aren’t very small compared with its length) — all known algorithms are exponential-time — though the MacWilliams identity can help reduce the base of the exponent.
Determine the weight enumerator of MDS codes. The Hamming weight enumerator of an MDS code depends only on its parameters q, n, and k: there’s one word of weight zero, then nothing until weight w = n−k+1 where each set of w coordinates supports q−1 codewords, and then for w = n−k+2 each set of w coordinates supports (q−1) (q+1−w) codewords, “etc.”; but the combinatorics can get somewhat intricate, and after our experience counting MDS codes of dimension 3 and length at least 7 you might be worried that the weight enumerator might not even be the same for all MDS codes of the same alphabet size, length, and dimension. However, the MacWilliams identity makes this clear. If C is MDS, then so is C^⊥; so we know W_C up to multiples of Y^{n−k+1}, and (by MacWilliams) up to multiples of (X−Y)^{k+1}. These two polynomials are relatively prime, and the sum of their degrees is n+2 > n; so W_C is determined uniquely, and even with a few dimensions to spare for a “sanity check” on our computations: the (inhomogeneous) linear system for the coefficients of W_C must be consistent, and must predict correctly the number of minimum-weight words of both C and C^⊥.
Weight enumerators and other properties of self-dual (and some nearly self-dual) codes. This will occupy us for the rest of this lecture and some time beyond.
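The first application in the list above can be carried out in exact integer arithmetic. This is a sketch only, with illustrative function names: it takes the dual of the Hamming code to have q^k−1 nonzero words, all of weight q^{k−1}, applies the MacWilliams identity at X = 1, and compares N₃ with q−1 times the number of collinear triples (the number of lines of P^{k−1}(F) times the number of triples on a line).

```python
from math import comb

def hamming_weight_distribution(q, k):
    """Weight counts of the Hamming code of length n = (q^k-1)/(q-1), obtained
    from its constant-weight dual via the MacWilliams identity."""
    n = (q ** k - 1) // (q - 1)
    w0 = q ** (k - 1)                       # the single nonzero weight of the dual

    def coeff(a, b, j):
        # coefficient of Y^j in (1 + (q-1) Y)^a (1 - Y)^b
        return sum(comb(a, i) * (q - 1) ** i * comb(b, j - i) * (-1) ** (j - i)
                   for i in range(j + 1))

    return [(coeff(n, 0, j) + (q ** k - 1) * coeff(n - w0, w0, j)) // q ** k
            for j in range(n + 1)]

def collinear_triples(q, k):
    """Number of unordered collinear triples of points in P^(k-1)(F_q)."""
    n = (q ** k - 1) // (q - 1)
    lines = n * (n - 1) // ((q + 1) * q)    # each pair of points lies on one line
    return lines * comb(q + 1, 3)

dist = hamming_weight_distribution(3, 3)
print(dist[:5], (3 - 1) * collinear_triples(3, 3))
```

For (q,k) = (3,3) the code has length 13, and the Y³ coefficient should match 2 · 52 = 104; the coefficient N₄ can be read off the same list.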

Suppose C is a self-dual binary code of length n. Then C is automatically “even”: every codeword c has even weight, because ⟨c, c⟩ = 0. But a binary code is even iff its dual code contains the all-ones word 1. Since C is self-dual, it thus contains 1. Therefore C is closed under the ones’-complement involution c ↔ c+1, which takes words of weight w to words of weight n−w.

Now consider what this means for the weight enumerator W_C(X,Y).

All weights are even implies W_C is invariant under Y ↔ −Y. Hence W_C(X,Y) is a polynomial in X and Y².
But the degree n of W_C is even: we’ve already noted that a self-dual code over any field must have 2 | n (because C must have dimension n/2). Thus W_C is invariant also under (X,Y) ↔ (−X,−Y), and is thus a polynomial in X² and Y².
The ones’-complement involution shows that W_C is invariant also under the involution (X,Y) ↔ (Y,X). Therefore it is a polynomial in X²+Y² and X²Y² (elementary symmetric functions in X² and Y²; alternatively, write W_C as a polynomial in X²+Y² and X²−Y²).
Finally, MacWilliams gives invariance under the involution (X,Y) ↔ (2^−½(X+Y), 2^−½(X−Y)). This involution fixes X²+Y², and takes X²Y² to (X²−Y²)²/4. [The fact that X²+Y² is fixed is clear, because that’s the weight enumerator of the [2,1,2] repetition code {(0,0), (1,1)} which is self-dual.] Now the sum X²Y² + (X²−Y²)²/4 is (X²+Y²)²/4, which gives us nothing new; but the product gives an independent invariant, which we might as well multiply by 4 (weight enumerators must have integer coefficients) to get δ₈ := X²Y²(X²−Y²)².

In effect we’ve seen that W_C is invariant under a group, call it G_I, of 16 linear transformations of (X, Y); this group is dihedral: before the MacWilliams step, we had the symmetries of the square with vertices (±1, 0) and (0, ±1), and the MacWilliams involution adds to this the reflection about a line making angle π/8 with the horizontal, producing the symmetries of the regular octagon with vertices (±1, 0), (0, ±1), and (±2^−½, ±2^−½). We then showed that invariance under G_I makes W_C a weighted-homogeneous polynomial in X²+Y² and δ₈. [To prove the last step, we might write W_C in terms of X²+Y² and X²Y² − (X²−Y²)²/4, and then apply MacWilliams; this gives another set of generators of the invariant ring, but an equivalent one because (X²Y² − (X²−Y²)²/4)² = (X²+Y²)⁴/16 − δ₈.] This is Gleason’s theorem for self-dual binary codes. As a sample application, we determine all self-dual codes of length n ≤ 8. For any even n, Gleason’s theorem (together with the fact that W_C has leading term Xⁿ) shows that it takes only floor(n/8) coefficients to specify the weight enumerator of a self-dual code of length n; in particular, for n < 8 the weight enumerator must be (X²+Y²)^n/2. It follows that there are n/2 words of weight 2. But in a self-dual (more generally, a self-orthogonal) code C with a word c of weight 2, every word is either disjoint from c or the sum of c with a word disjoint from c. That is, C decomposes as the direct sum of a self-dual (or self-orthogonal) code of length n−2 with the [2,1,2] repetition code supported on c. In our setting we conclude that our code C is the direct sum of n/2 copies of that [2,1,2] code. 
For n=8, either C contains a word of weight 2 — in which case we’ve reduced to n=6, and conclude that C is the direct sum of 4 copies of the [2,1,2] code — or W_C is the unique linear combination of (X²+Y²)⁴ and δ₈ of the form X⁸+O(Y⁴), which is (X²+Y²)⁴ − 4δ₈ = X⁸ + 14X⁴Y⁴ + Y⁸, a polynomial we recognize as the weight enumerator of the [8, 4, 4] extended Hamming code. It is not hard to show that the extended Hamming code is the unique [8, 4, 4] code (for instance, by reducing to the uniqueness of the perfect [7, 4, 3] Hamming code, which we have shown already). This completes the classification of self-dual codes of length at most 8 up to isomorphism. [Such codes have by now been classified through length 24 and a bit beyond, but this requires some further tools beyond the MacWilliams identity, plus some nontrivial computation to treat the largest few cases; an eventual combinatorial explosion is inevitable, because the number of self-dual codes grows as 2^n²/4−O(n), and there are only n! = 2^{O(n log n)} permutations to identify different codes, making the total at least 2^{n²/4−O(n log n)}.]
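The polynomial identities used in this argument are easy to confirm by machine. In the sketch below (plain dictionaries mapping exponent pairs (i, j) to the coefficient of X^iY^j; all names are ours) we avoid the irrational factor 2^−½ by substituting (X+Y, X−Y) without normalization, so a homogeneous invariant of degree d should come back multiplied by 2^{d/2}.

```python
def mul(p, r):
    """Product of two polynomials in X, Y stored as {(i, j): coefficient}."""
    out = {}
    for (a, b), c in p.items():
        for (d, e), f in r.items():
            out[(a + d, b + e)] = out.get((a + d, b + e), 0) + c * f
    return {m: c for m, c in out.items() if c}

def add(p, r, s=1):
    """p + s*r."""
    out = dict(p)
    for m, c in r.items():
        out[m] = out.get(m, 0) + s * c
    return {m: c for m, c in out.items() if c}

def power(p, n):
    out = {(0, 0): 1}
    for _ in range(n):
        out = mul(out, p)
    return out

def substitute(p, x_new, y_new):
    """Replace X by x_new and Y by y_new in p."""
    out = {}
    for (a, b), c in p.items():
        out = add(out, mul(power(x_new, a), power(y_new, b)), c)
    return out

def scale(p, k):
    return {m: k * c for m, c in p.items()}

def macwilliams_unnormalized(p):
    return substitute(p, {(1, 0): 1, (0, 1): 1}, {(1, 0): 1, (0, 1): -1})

X2, Y2 = {(2, 0): 1}, {(0, 2): 1}
s2 = add(X2, Y2)                                        # X^2 + Y^2
delta8 = mul(mul(X2, Y2), power(add(X2, Y2, -1), 2))    # X^2 Y^2 (X^2 - Y^2)^2

print(macwilliams_unnormalized(s2) == scale(s2, 2))
print(macwilliams_unnormalized(delta8) == scale(delta8, 16))
print(add(power(s2, 4), delta8, -4))                    # the [8, 4, 4] enumerator
```

The last line expands (X²+Y²)⁴ − 4δ₈, which should have only the three monomials X⁸, 14X⁴Y⁴ and Y⁸.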

A special property of binary codes is that once a linear code C is self-orthogonal the map taking a codeword c to its weight modulo 4 is a homomorphism C → 2Z/4Z. This follows from the inclusion-exclusion formula for symmetric differences (which correspond to addition in Z/2Z under the usual dictionary between (0,1)-valued functions and subsets): |AΔB| = |A| + |B| − 2|A∩B|. [This formula generalizes to symmetric differences of more than two sets; triple, quadruple, quintuple, etc. intersections are counted with multiplicity +4, −8, +16, and so forth. We’ll prove and use this generalization some weeks hence.] For a self-orthogonal code, |A|, |B| and |A∩B| are all even, so |AΔB| ≡ |A| + |B| mod 4 as claimed. (It also follows from the |AΔB| formula that conversely a binary linear code in which all weights are multiples of 4 must be self-orthogonal.) As a corollary, either C has as many words of weight 0 mod 4 as there are of weight 2 mod 4, or all the weights in C are divisible by 4. A linear binary code C is called singly even in the former case, and doubly even in the latter. If C is self-dual, it is also said to be of Type I if singly even, and of Type II if doubly even. For example, the [2,1,2] repetition code is Type I, while the extended Hamming code is Type II.

We can detect whether a self-orthogonal binary code C is singly or doubly even from its weight enumerator: W_C(1, i) is zero if C is singly even, and equals |C| when C is doubly even. Now if C is self-dual then by Gleason it is a polynomial in X²+Y² and δ₈, and we see that substituting (X,Y) = (1, i) yields zero in X²+Y² but −4 for δ₈. Therefore W_C(1, i) = 0 unless n is a multiple of 8, and we conclude that the length of every Type II code is a multiple of 8, in which case the (δ₈)^n/8 coefficient of W_C is (−4)^n/8. Conversely, if 8 | n we obtain a Type II code of length n as the direct sum of n/8 copies of the extended Hamming code (and again for large n there will be many other choices). Also, if 8 | n then a Type I code of length n must have weight enumerator divisible by X²+Y², so the (δ₈)^n/8 coefficient of W_C is zero, and we have determined one of the n/8 unknown coefficients of the expansion of W_C in monomials of degree n in X²+Y² and δ₈.
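Here is a small illustration of the W_C(1, i) test and of the weight-mod-4 homomorphism. It is a sketch with names of our own choosing, and the generator matrix used for the [8, 4, 4] extended Hamming code is one convenient choice among many.

```python
from itertools import product

def span2(gens):
    """All words of the binary code spanned by the rows of gens."""
    n = len(gens[0])
    return {tuple(sum(a * g[i] for a, g in zip(coeffs, gens)) % 2 for i in range(n))
            for coeffs in product(range(2), repeat=len(gens))}

def w_at_1_i(words):
    """W_C(1, i) = sum over codewords of i^wt(c), using exact powers of i."""
    powers = [1, 1j, -1, -1j]
    return sum(powers[sum(c) % 4] for c in words)

def weight_mod4_is_additive(words):
    """Check wt(a+b) = wt(a) + wt(b) mod 4 for all pairs of codewords."""
    return all((sum(a) + sum(b) - sum((x + y) % 2 for x, y in zip(a, b))) % 4 == 0
               for a in words for b in words)

# one choice of generators for the doubly even [8, 4, 4] extended Hamming code
ext_hamming = span2([(1, 1, 1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 1, 1, 0, 0),
                     (0, 0, 0, 0, 1, 1, 1, 1), (1, 0, 1, 0, 1, 0, 1, 0)])
# direct sum of four copies of the [2, 1, 2] repetition code (singly even)
four_reps = span2([(1, 1, 0, 0, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0, 0, 0),
                   (0, 0, 0, 0, 1, 1, 0, 0), (0, 0, 0, 0, 0, 0, 1, 1)])

print(w_at_1_i(ext_hamming), w_at_1_i(four_reps))
print(weight_mod4_is_additive(ext_hamming), weight_mod4_is_additive(four_reps))
```

The first value should be |C| = 16 for the Type II code and the second 0 for the Type I code, while the homomorphism property holds for both since both are self-orthogonal.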

The weight enumerator of a Type II code is also invariant under the substitution of (X, iY) for (X,Y); this linear transformation together with G_I generates a larger group G_II that leaves invariant any such weight enumerator. We next describe G_II and its invariants, deduce the Gleason theorem for Type II codes, and obtain some consequences.

Thursday, Oct. 3: Gleason for self-dual codes of Type II, III, and IV

[Overview of Gleason I–IV in the context of complex reflection groups. Here’s the paper “Finite unitary reflection groups” by Shephard and Todd (Canad. J. Math. 6 (1954), 274–304) that determined all such groups and gave the first proof of the theorem that these are precisely the finite subgroups of GL_n(C) with a polynomial invariant ring.]

[For a picture of the vertices of the regular octahedron permuted by G_II/μ₈, see Figure 2 here (page 6 [1387 in the journal]; for what it’s worth, Figure 1 on the previous page shows a somewhat pixellated graph of δ₈ and its double zeros on the unit disc X²+Y²≤1). For the approach I’m taking to the invariant rings of G_I and G_II, see Appendix A (pages 33–34) of my paper with S. Kominers.]

[...] Thus the weight enumerator of a Type II code of length n has only floor(n/24) undetermined coefficients. In particular, for n < 24 there are no undetermined coefficients, and W_C = ϕ₈^n/8 is forced. This does not, however, mean that C must be isomorphic with the direct sum of n/8 copies of the [8, 4, 4] code, as one might guess by analogy with the Type I case. In fact for n=16 there is another Type II code, obtained as follows. Start with the direct sum of 8 copies of the [2, 1, 2] repetition code; then form the doubly even [16, 7, 4] subcode, which consists of words (a₁, a₁, a₂, a₂, a₃, a₃, …, a₈, a₈) such that a₁ + a₂ + a₃ + … + a₈ = 0, and adjoin the vector (1, 0, 1, 0, 1, 0, …, 1, 0). You should check that this is a Type II code and verify directly that its weight enumerator is ϕ₈². But it is not isomorphic with the direct sum of the [8, 4, 4] code with itself, because unlike that direct sum our new code is not linearly generated by its words of weight 4.
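The checks requested for the second Type II code of length 16 can be delegated to a short program. This is only a sketch with illustrative names; we take ϕ₈ to be the enumerator X⁸ + 14X⁴Y⁴ + Y⁸ of the [8, 4, 4] code, so that ϕ₈² has weight counts 1, 28, 198, 28, 1 in weights 0, 4, 8, 12, 16.

```python
from itertools import product

def second_type_ii_code():
    """Doubled even-weight words (a1,a1,...,a8,a8), together with their
    translates by the vector (1,0,1,0,...,1,0)."""
    v = (1, 0) * 8
    words = set()
    for a in product(range(2), repeat=8):
        if sum(a) % 2 == 0:
            doubled = tuple(x for ai in a for x in (ai, ai))
            words.add(doubled)
            words.add(tuple((x + y) % 2 for x, y in zip(doubled, v)))
    return words

def rank_gf2(vectors):
    """Rank over F_2 via an xor basis; each vector is packed into an integer."""
    basis = []
    for vec in vectors:
        x = int("".join(map(str, vec)), 2)
        for b in basis:
            x = min(x, x ^ b)      # clears the leading bit of b if present in x
        if x:
            basis.append(x)
    return len(basis)

code = second_type_ii_code()
dist = [0] * 17
for c in code:
    dist[sum(c)] += 1
weight4 = [c for c in code if sum(c) == 4]
self_orthogonal = all(sum(x * y for x, y in zip(a, b)) % 2 == 0
                      for a in code for b in code)

print(len(code), dist, self_orthogonal)
print(rank_gf2(code), rank_gf2(weight4))
```

The weight-4 words should span only a 7-dimensional subspace of the 8-dimensional code, which is what distinguishes it from the direct sum of two [8, 4, 4] codes.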

It is known that these are the only two Type II codes of length 16, and there are 9 such codes of length 24 (after which combinatorial explosion soon sets in). This is the first case where Gleason does not determine W_C uniquely, and again (as with Type I) we can make W_C unique by doubling the minimum weight, in this case getting the enumerator of a [24, 12, 8] code. We find that for any such code W_C must be ϕ₈³ − 42δ₂₄, which comes to

X²⁴ + 759 X¹⁶Y⁸ + 2576 X¹²Y¹² + 759 X⁸Y¹⁶ + Y²⁴.

Remarkably it is again true that there is a unique such code, and that removing any coordinate yields a perfect code, in this case the binary Golay code G₂₃, with parameters [23, 12, 7]. The extremal [24, 12, 8] code itself is called the “extended binary Golay code” G₂₄, and we shall describe it and its automorphism group (which is the largest of the five sporadic simple groups discovered by Mathieu) after giving the Gleason theorems and their initial consequences for self-dual codes of Type III and IV.
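The displayed enumerator and the perfect-code count for G₂₃ can both be verified in a few lines. In this sketch we work at X = 1 with coefficient lists in Y, and we assume the standard normalizations ϕ₈ = X⁸ + 14X⁴Y⁴ + Y⁸ and δ₂₄ = X⁴Y⁴(X⁴−Y⁴)⁴ (the definitions are elided in the text above, so treat these as our assumption).

```python
from math import comb

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_pow(p, e):
    out = [1]
    for _ in range(e):
        out = poly_mul(out, p)
    return out

phi8 = [1, 0, 0, 0, 14, 0, 0, 0, 1]                  # 1 + 14 Y^4 + Y^8
# assumed normalization: delta24 = Y^4 (1 - Y^4)^4 at X = 1
delta24 = poly_mul([0, 0, 0, 0, 1], poly_pow([1, 0, 0, 0, -1], 4))
delta24 += [0] * (25 - len(delta24))                 # pad to degree 24

golay = [a - 42 * b for a, b in zip(poly_pow(phi8, 3), delta24)]
nonzero = {w: a for w, a in enumerate(golay) if a}
ball = sum(comb(23, i) for i in range(4))            # Hamming ball of radius 3

print(nonzero)
print(ball == 2 ** 11, sum(golay) == 2 ** 12)
```

The coefficient −42 is exactly what cancels the 42 X²⁰Y⁴ term of ϕ₈³, and the ball of radius 3 in (Z/2Z)²³ having 2¹¹ points is the sphere-packing equality for a [23, 12, 7] code.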

A Type III code is a self-dual ternary code. (Recall that “ternary code” means “code over the 3-element field”.) A ternary linear code C is self-orthogonal iff ⟨c, c⟩ = 0 for all c∈C, which in turn holds iff every codeword has weight divisible by 3. This means that W_C is invariant under the 3-cycle (X, Y) ↦ (X, ρY) where ρ is a cube root of unity. If C is self-dual then W_C is also invariant under (X, Y) ↔ 3^−½(X+2Y, X−Y) by MacWilliams. Thus W_C is invariant under the group, call it G_III, generated by these two linear transformations, which preserves the unitary form |X|² + 2 |Y|². An example of a Type III code is the [4, 2, 3] “tetracode”, which is the unique MDS code of these parameters (four points in P¹(Z/3Z); explicit generators can be 0 + + + and + 0 + −), a constant-weight code with weight enumerator ϕ₄ = X⁴ + 8 XY³. So G_III permutes the zeros of ϕ₄ in the Riemann sphere, which are at (X : Y) = 0, −2, and 1 ± sqrt(−3), forming a regular tetrahedron with respect to our unitary form |X|² + 2 |Y|². Since (X, Y) ↦ (X, ρY) acts by a 3-cycle and MacWilliams by a double transposition, G_III acts on the vertices of our regular tetrahedron by its full group A₄ of orientation-preserving isometries. The kernel of the map G_III → A₄ consists of roots of unity of order at most 4 (because we have a degree-4 invariant ϕ₄, or alternatively because G_III is generated by linear transformations of determinant ±1). In fact all 4th roots of unity occur: the product of our two generators has determinant −1 and maps to a 3-cycle in A₄, so its cube is a scalar of determinant −1, which must be ±i. [We’ve thus also shown en passant that n must be a multiple of 4; again this has a purely algebraic proof using the structure of quadratic forms over finite fields, which shows more generally that a self-dual code over a field of 4k−1 elements must have length divisible by 4.] Therefore |G_III| = 4 |A₄| = 48.
We could find a second generator of the invariant ring as we did for G_II, by filtering the group down to the identity with each step using a normal subgroup with a small quotient and keeping track of the invariant subring of C[X, Y] at each step; alternatively, we can construct a second invariant directly from a quartic vanishing on the dual tetrahedron, whose vertices are the points where (X : Y) is either ∞ or a cube root of 1. That polynomial Y(X³−Y³) is not quite invariant under G_III: it’s fixed by the MacWilliams involution, but multiplying Y by ρ does the same to the quartic. Therefore the cube δ₁₂ = Y³ (X³−Y³)³ of the quartic is invariant under all of G_III, and it soon follows that C[ϕ₄, δ₁₂] is the full ring of invariants (for example we can check that the two generators are algebraically independent and that the product 4 · 12 = 48 of their degrees equals the size of G_III). We now have Gleason’s theorem for self-dual ternary codes: the Hamming weight enumerator of any such code is a polynomial in ϕ₄ and δ₁₂.

We draw the same kind of consequences as before: it takes only floor(n/12) coefficients to specify the weight enumerator of a Type III code of length n, and for n < 12 there are no coefficients to determine and the weight enumerator must be ϕ₄^(n/4). This time we can show that a code with that weight enumerator must be isomorphic with the direct sum of n/4 copies of the tetracode: There are 2n words of weight 3, forming n pairs {c, −c}, so some two of these pairs must overlap; but then they overlap in exactly 2 coordinates, and then they generate a copy of the tetracode, which must be a direct summand because the tetracode is self-dual, etc. At n=12 we first have a choice, and if we use it to eliminate the coefficient of W_C that counts words of weight 3 we find that any Type III [12, 6, ≥6] code must have weight enumerator

W_C = ϕ₄³ − 24 δ₁₂ = X¹² + 264 X⁶Y⁶ + 440 X³Y⁹ + 24 Y¹².
Yet again it turns out (and we may show) that there is a unique such code, and that removing any coordinate yields a perfect code, this time the 2-error-correcting ternary Golay code G₁₁, with parameters [11, 6, 5] (check: the Hamming ball of radius 2 in (Z/3Z)¹¹ has 1 + 2·11 + 2²(11·10)/2 = 1 + 22 + 220 = 243 = 3⁵ points). The extremal [12, 6, 6] code itself is called the “extended ternary Golay code” G₁₂, and its automorphism group is the double cover 2.M₁₂ of the sporadic Mathieu group M₁₂. [Here there’s a nontrivial center (Z/3Z)* = {±1}, and the quotient Aut(G₁₂) / {±1} is isomorphic with M₁₂, but Aut(G₁₂) is not just the product of {±1} with M₁₂, so it is a nontrivial double cover, which turns out to be unique for M₁₂ though we won’t prove this uniqueness statement.]
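
The arithmetic in the last two displays is easy to check by machine. The following is a minimal Python sketch (not part of the original notes; the helper names `poly_mul` and `poly_pow` are made up for illustration) that dehomogenizes by setting X = 1, expands ϕ₄³ − 24 δ₁₂, and recomputes the size of the Hamming ball of radius 2 in (Z/3Z)¹¹.

```python
from math import comb

def poly_mul(p, q):
    """Multiply polynomials in Y stored as {exponent: coefficient} dicts."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] = out.get(a + b, 0) + ca * cb
    return out

def poly_pow(p, k):
    """Raise a polynomial to the k-th power by repeated multiplication."""
    out = {0: 1}
    for _ in range(k):
        out = poly_mul(out, p)
    return out

# Generators of the invariant ring, dehomogenized with X = 1.
phi4 = {0: 1, 3: 8}                                       # X^4 + 8 X Y^3
delta12 = poly_mul({3: 1}, poly_pow({0: 1, 3: -1}, 3))    # Y^3 (X^3 - Y^3)^3

# phi4^3 - 24 * delta12, dropping coefficients that cancel.
golay = poly_pow(phi4, 3)
for e, c in delta12.items():
    golay[e] = golay.get(e, 0) - 24 * c
golay = {e: c for e, c in sorted(golay.items()) if c != 0}

# Number of points in a Hamming ball of radius 2 in (Z/3Z)^11.
ball = sum(comb(11, r) * 2 ** r for r in range(3))

print(golay, ball)
```

The coefficients should sum to 3⁶ = 729, the number of codewords of a [12, 6] ternary code, which is a useful consistency check on the enumerator.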

A Type IV code is a quaternary linear code that is its own Hermitian dual, i.e. dual under the sesquilinear pairing

⟨v,w⟩ = ∑_i v_i σ(w_i) = v₁σ(w₁) + v₂σ(w₂) + · · · + v_nσ(w_n)

where σ: a ↔ a² is the Galois involution of the 4-element field. In other words, the Hermitian dual of C is the image of the ordinary linear dual under componentwise application of σ. A quaternary linear code C is contained in its Hermitian dual iff ⟨c, c⟩ = 0 for all c∈C, which in turn holds iff every codeword has even weight. This makes W_C invariant under the involution (X, Y) ↔ (X, −Y). If C is self-dual then W_C is also invariant under (X, Y) ↔ (X+3Y, X−Y) / 2 by MacWilliams. Thus W_C is invariant under the group, call it G_IV, generated by these two linear transformations, which preserves the quadratic form X² + 3Y². We soon see that G_IV is the 12-element dihedral group of isometries of the regular hexagon with vertices (±1, 0) and (±½, ±½) [yes, this is a regular hexagon in the Euclidean geometry associated to our quadratic form]. An example of a Type IV code is the [2, 1, 2] repetition code, whose weight enumerator is the invariant quadratic form X² + 3Y² itself. The ring of invariants consists of polynomials in this quadratic form and the sextic δ₆ := Y² (X²−Y²)² with double zeros on the long diagonals of our hexagon. Thus it takes floor(n/6) coefficients to specify W_C, and in particular for n<6 the weight enumerator must be (X² + 3Y²)^(n/2), indicating 3n/2 words of weight 2. For n=2, we clearly must have the [2, 1, 2] repetition code; and for n=4 and n=6, we can argue as we did for Type I that the only Type IV code is the direct sum of n/2 copies of the [2, 1, 2] code, unless n=6 and there are no words of weight 2, in which case C has parameters [6, 3, 4] and

W_C = (X² + 3Y²)³ − 9 δ₆ = X⁶ + 45 X²Y⁴ + 18 Y⁶.

But we already know what quaternary [6, 3, 4] codes look like, even without any self-duality hypothesis: they are MDS codes associated to hyperovals in P²(F₄); and indeed such a code must be even (and thus self-dual in our Hermitian sense) because every line meets a hyperoval in 0 or 2 points and thus corresponds to a line of codewords of weight 6 or 4. We already know enough to show that the hyperoval is unique up to the action of PGL₃(F₄) (choose any four points in general linear position, all of which are equivalent under PGL₃(F₄), and then add the two points not collinear with any two of the four); thus C is unique up to isomorphism, and is called the “hexacode”. Curiously hyperovals in P²(F₄) will also play a crucial part in our development of the self-dual [24, 12, 8] code G₂₄.
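
For readers who like to see the 64 codewords, here is a hedged Python sketch that enumerates one copy of the hexacode and tabulates its weights. The generator matrix used is an assumption of this example (a commonly quoted one, not taken from these notes), as is the encoding of F₄ = {0, 1, ω, ω²} by the integers 0, 1, 2, 3 with addition given by XOR.

```python
from itertools import product

# F_4 = {0, 1, w, w^2} encoded as 0, 1, 2, 3; addition is bitwise XOR.
LOG = {1: 0, 2: 1, 3: 2}
EXP = [1, 2, 3]

def mul(a, b):
    """Multiply two elements of F_4 via discrete logs base w."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]

W, W2 = 2, 3
# Assumed generator matrix for the hexacode (illustrative choice).
GENERATORS = [
    (1, 0, 0, 1, W2, W),
    (0, 1, 0, 1, W, W2),
    (0, 0, 1, 1, 1, 1),
]

def codewords():
    """Yield all F_4-linear combinations of the generators."""
    for coeffs in product(range(4), repeat=3):
        word = [0] * 6
        for a, g in zip(coeffs, GENERATORS):
            word = [x ^ mul(a, y) for x, y in zip(word, g)]
        yield tuple(word)

def hermitian(v, w):
    """Sesquilinear pairing sum of v_i * sigma(w_i), with sigma(a) = a^2."""
    total = 0
    for x, y in zip(v, w):
        total ^= mul(x, mul(y, y))
    return total

words = list(codewords())
weights = {}
for c in words:
    wt = sum(1 for x in c if x != 0)
    weights[wt] = weights.get(wt, 0) + 1

print(dict(sorted(weights.items())))
```

The weight counts can be compared with the coefficients of W_C above, and `hermitian(v, w)` can be evaluated on all pairs of codewords to confirm Hermitian self-duality for this particular generator matrix.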

Type IV is the final variation on our theme, at least for Hamming weight enumerators: there are no G_V, G_VI, etc., because (using the classification of finite subgroups of PGL₂(C)) there are no further cases where the MacWilliams involution together with a map (X, Y) ↦ (X, ζY) (for some root of unity ζ≠1) generate a finite group. The closest one can come (which is misleadingly called “Type V” in some sources, though it doesn’t really belong in the same family — I’d rather think of it as “Type Zero”) is to take ζ=−1, getting a finite-index subgroup of the orthogonal group for X² + (q−1) Y²; but then any even self-dual code over a field of more than 4 elements must have weight enumerator (X² + (q−1) Y²)^n/2 and is the direct sum of self-dual [2,1,2] codes (which exist iff −1 is a square).

Beyond Hamming weight enumerators, there are some further examples (complete or joint weight enumerators, etc.) where elementary observations plus MacWilliams yield invariance under a complex reflection group, or a group close enough to being a complex reflection group that one can still give a satisfactory explanation of its invariant ring. See for instance Rains and Sloane’s chapter “Self-Dual Codes” from the Handbook of Coding Theory, especially Section 6 (“Invariant Theory”, starting on p.29) on general results and techniques, and Section 7 (“Gleason’s theorem and generalizations”, starting on p.47) for many further cases. For example (p.59), a self-dual quinary code C ⊂ (Z/5Z)ⁿ has symmetrized weight enumerator swe_C(x, y, z) = cwe_C(x, y, z, z, y) invariant under a group isomorphic over C with the symmetries of an icosahedron, with a polynomial invariant ring with generator degrees 2, 6, 10; setting z = y recovers the Hamming weight enumerator as an element of a somewhat smaller ring than what one gets by applying the MacWilliams identity directly to W_C.

Tuesday, Oct. 8: Existence and uniqueness of the binary Golay codes

There are (at least) two natural directions to go with the Gleason theorems: the extended Golay codes G₂₄ and G₁₂, which are the first codes of Type II and III without words of weight 4 and 3 respectively, and have some remarkable combinatorial properties; and extremal enumerators and codes for large n, which lead to a different flavor of mathematics and some long-open problems (e.g. it is still unknown whether there exists a Type II [72, 36, 16] code). We take up the binary Golay code(s) first.

The Golay codes have symmetry groups that are large (e.g. large enough to be multiply transitive on the coordinates) and sporadic. The size indicates that there are various ways to prove uniqueness: in each case there are various natural objects on which the group acts transitively (small subsets of the coordinates, codewords of given weight, etc.), and we can fix one of them, try to show there’s still a unique way to fit a code of the desired properties around it, and then get as a bonus the transitivity on objects of that type. The fact that the groups are sporadic suggests that no proof of uniqueness can tell us everything of interest about the Golay codes: different choices will reveal only different parts of the structure of the code and its automorphism group, and suggest generalizations in a different direction. One could give an entire graduate course on the Golay codes and Mathieu groups (even if I might not be “one” who could give such a course), but since that’s not an option I must choose one approach and more-or-less stick with it. Given what we’ve seen so far, I’ll take the route via the subgroup M₂₁ = PSL₃(F₄) of M₂₄ = Aut(G₂₄).

Suppose, then, that C is a Type II [24, 12, 8] code. We noted already that dropping any one coordinate yields a [23, 12, 7] code which is a perfect 3-error-correcting code. In particular, if we choose any word w of weight 5 supported on the remaining 23 coordinates then it is at distance at most 3 from one of the codewords, and thus at most 4 from one of the words c of C; but C is even with minimum distance 8, so c has weight 8 and contains the support of w, and is determined uniquely by that property (this can be seen directly from the triangle inequality together with d_min(C)=8). In other words, the supports of the 759 minimal words constitute a (5, 8, 24) Steiner system. As a check on this computation, verify that 759 = Binomial(24, 5) / Binomial(8, 5). [A similar argument works for the extended Hamming code, showing that its 14 words of weight 4 give a (3, 4, 8) Steiner system; and indeed 14 is Binomial(8, 3) / Binomial(4, 3). More generally, for each d ≥ 3 the quadruples in (Z/2Z)^d that sum to zero constitute a (3, 4, 2^d) Steiner system with automorphisms by AGL_d(Z/2Z), and that group acts 3-transitively on the 2^d points.] Moreover, any two of the blocks of the (5, 8, 24) Steiner system intersect in 0, 2, or 4 points. [This can also be checked combinatorially, without assuming that the blocks come from a [24, 12, 8] code of Type II, by calculating the “intersection triangle” of the system; for more on this, and Steiner systems in general, see for instance Cameron and van Lint’s text Designs, Graphs, Codes and their Links (London Math. Society, 1991, reprinted 1996).]
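
The counting checks in this paragraph, and the bracketed claim about zero-sum quadruples, can be verified directly for small d. This is an illustrative Python sketch only; the function names are hypothetical, and elements of (Z/2Z)^d are encoded as integers with XOR as addition.

```python
from itertools import combinations
from math import comb

def steiner_block_count(t, k, v):
    """Number of blocks a (t, k, v) Steiner system must have."""
    blocks, remainder = divmod(comb(v, t), comb(k, t))
    assert remainder == 0, "parameters fail the basic divisibility test"
    return blocks

def zero_sum_quadruples(d):
    """All 4-subsets of (Z/2Z)^d summing to zero (vectors as ints, sum as XOR)."""
    points = range(2 ** d)
    return {frozenset((a, b, c, a ^ b ^ c))
            for a, b, c in combinations(points, 3)}

def is_steiner_3_4(d):
    """Check that every 3-subset lies in exactly one zero-sum quadruple."""
    blocks = zero_sum_quadruples(d)
    cover = {}
    for block in blocks:
        for triple in combinations(sorted(block), 3):
            cover[triple] = cover.get(triple, 0) + 1
    return (len(blocks) == steiner_block_count(3, 4, 2 ** d)
            and len(cover) == comb(2 ** d, 3)
            and all(count == 1 for count in cover.values()))

print(steiner_block_count(5, 8, 24), steiner_block_count(3, 4, 8))
print(is_steiner_3_4(3), is_steiner_3_4(4))
```

Note that the block count is only a necessary condition: the sketch confirms the Steiner property itself only for the quadruple systems, not for the octads, which would require constructing the code.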

If we fix one, two, or three of the 24 coordinates, and consider only those octads (8-element blocks) containing the chosen point(s), then the residual heptads, hexads, or pentads constitute a (4, 7, 23), (3, 6, 22), or (2, 5, 21) Steiner system with 253, 77, or 21 blocks respectively. In the last case, any two blocks intersect in one point (because the original octads meet at the three chosen points, and thus in a unique fourth one; in fact this holds automatically for a (2, 5, 21) system because of general properties of “square designs” with as many blocks as points, via the minimal equations of their incidence matrices — see again Cameron and van Lint). So we get a combinatorial projective plane of order 4. We thus begin by showing that any such plane Π is isomorphic with the algebraic projective plane P²(F₄). We use the hyperovals that we already encountered earlier, and along the way encounter some exceptional behavior of the alternating and symmetric groups A₆ and S₆, namely the triple cover 3. A₆ and the outer automorphism of S₆.

We already know Π has hyperovals; choose one, and call it O. Every point outside O is on three lines meeting O in two points each, and thus gives a “syntheme”, which in this context is the (somewhat quaint 19th-century) terminology for a partition of the six points of O into three pairs. But there are 21−6 = 15 such points, and only 6! / (2³3!) = 15 synthemes, so each occurs for some point. We now have a labeling of the 21 points of Π by the 6 points and 15 synthemes. As for the lines, 15 of them meet O in two points (here 15 arises as Binomial(6,2)), so we have a labeling of all but 6 of the lines by pairs of O-points. We claim that the remaining six lines form a hyperoval O* in the dual projective plane Π*. Indeed we noted already that every point outside O lies on three lines that meet O, and thus only two lines disjoint from O, i.e. two lines of O*. So no three lines of O* meet at a point, which makes O* a hyperoval in Π* because |O*| = 6. This O* is called the dual hyperoval of O (and indeed it is clear that O** = O). NB this construction is special to projective planes of order 4, unlike the notion of a “dual oval” for projective planes of odd order, where the dual oval consists of all lines tangent to (not “disjoint from”) the oval, and the construction works for all odd q (and for any choice of projective plane if there’s more than one plane that admits an oval).

Each of the O* lines, say l, lies on five syntheme points. This gives 3 · 5 = 15 pairs, and indeed each pair yields a line that meets l at a unique point and thus appears in exactly one of the five synthemes in l. So l determines a partition of the 15 pairs into five synthemes. But there are exactly six such partitions, so O* must contain them all! They’re called the “synthematic totals” or simply “totals” of the 6-element set O. We have thus accounted for all the points and lines of Π (points are 6 oval points and 15 syntheme points, and lines are 15 pairs and 6 totals), and shown in effect that Aut(Π) acts simply transitively on the 21 · 20 · 16 · 9 · 2 · 1 ordered hyperovals. Since we already know a projective plane of order 4 with that many automorphisms, we conclude that Π ≅ P²(F₄) and Aut(Π) ≅ Aut(P²(F₄)) = PΓL₃(F₄) (where Γ in place of G indicates semidirect product with Aut(F₄); likewise ΣL_n(F) and PΣL_n(F) mean the semidirect products of SL_n(F) and PSL_n(F) with Aut(F)). We shall also need the number of hyperovals, which is 21 · 20 · 16 · 9 · 2 · 1 / 6! = 168.
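
The combinatorics of synthemes and totals is small enough to enumerate by brute force. The sketch below is an illustration with made-up helper names, not part of the original argument: it lists the synthemes of a 6-element set, searches all 5-subsets of synthemes for totals, and recomputes the hyperoval count 168.

```python
from itertools import combinations
from math import factorial

POINTS = range(6)
PAIRS = [frozenset(p) for p in combinations(POINTS, 2)]

def synthemes():
    """Partitions of the six points into three disjoint pairs."""
    return [frozenset((a, b, c))
            for a, b, c in combinations(PAIRS, 3)
            if len(a | b | c) == 6]

def totals(synths):
    """Sets of five synthemes that together use each of the 15 pairs once."""
    found = []
    for five in combinations(synths, 5):
        covered = set().union(*five)
        if len(covered) == 15:
            found.append(frozenset(five))
    return found

synths = synthemes()
tots = totals(synths)

# Each syntheme should lie in the same number of totals.
membership = {s: sum(1 for t in tots if s in t) for s in synths}

# Ordered hyperovals, as counted in the text, and unordered ones.
ordered_hyperovals = 21 * 20 * 16 * 9 * 2 * 1
hyperovals = ordered_hyperovals // factorial(6)

print(len(synths), len(tots), set(membership.values()), hyperovals)
```

The search over Binomial(15, 5) = 3003 candidate sets is instantaneous, so there is no need for anything cleverer here.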

Before resuming the construction of G₂₄ we note some further properties of the stabilizer of O in Aut(Π). Because Aut(Π) acts simply transitively on ordered hyperovals, the stabilizer of an unordered hyperoval O is isomorphic with the group S₆ of permutations of the points of O. This group is generated by simple transpositions, which are not in PGL₃(F₄) because that group acts simply transitively on ordered 4-arcs. So they must be in the complement of PGL₃(F₄) in PΓL₃(F₄). This can also be seen explicitly by choosing coordinates so that four points of O are at (1, 0, 0), (0, 1, 0), (0, 0, 1), and (1, 1, 1): the remaining two points must be at (1, ρ, ρ²) and (1, ρ², ρ), and are thus switched by the field automorphism acting coordinatewise. It follows that a permutation of O lifts to a PGL₃(F₄) automorphism of Π iff it is an even permutation. I claim that indeed all the even permutations are in PSL₃(F₄), and the entire S₆ is in PΣL₃(F₄). Note that this is a nontrivial statement: because F₄ consists of zero and the three cube roots of unity, every scalar matrix in GL₃(F₄) has determinant 1, so the determinant map GL₃(F₄) → (F₄)* descends to a well-defined homomorphism on PGL₃(F₄). Now A₆ is a simple group, so the restriction of our map to A₆ must be trivial, i.e. A₆ is in the kernel PSL₃(F₄) of the map, as claimed. This could also be seen directly using our coordinates for O: the group is generated by 3-cycles, all of which are conjugate and thus have the same determinant; and one of these cycles is diag(1, ρ, ρ²) (fixing the unit-vector points and cyclically permuting the other three), which indeed has determinant 1. We could also have used the permutation matrix corresponding to a 3-cycle, under which the last three points are fixed and the first three permuted cyclically. We conclude that for two hyperovals O, O’ all g ∈ PGL₃(F₄) such that g(O) = O’ have the same determinant; this gives a partition of the hyperovals into three PSL₃(F₄) orbits each of size 168 / 3 = 56.
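
The explicit coordinates for O invite a direct check. In the following Python sketch (an illustration only; the encoding of F₄ as 0, 1, 2, 3 with ρ = 2 and XOR as addition is an assumption of the example) we verify that the six listed points have no three collinear, that the field automorphism swaps the last two points and fixes the first four, and that diag(1, ρ, ρ²) acts as described.

```python
from itertools import combinations, permutations

# F_4 = {0, 1, rho, rho^2} encoded as 0, 1, 2, 3; addition is XOR.
LOG = {1: 0, 2: 1, 3: 2}
EXP = [1, 2, 3]

def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]

def det3(p, q, r):
    """3x3 determinant over F_4 (signs are irrelevant in characteristic 2)."""
    total = 0
    for i, j, k in permutations(range(3)):
        total ^= mul(p[i], mul(q[j], r[k]))
    return total

def normalize(v):
    """Scale a nonzero vector so that its first nonzero coordinate is 1."""
    lead = next(x for x in v if x)
    inverse = EXP[(-LOG[lead]) % 3]
    return tuple(mul(inverse, x) for x in v)

R, R2 = 2, 3
O = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, R, R2), (1, R2, R)]

# Hyperoval condition: every three of the six points are linearly independent.
is_hyperoval = all(det3(p, q, r) != 0 for p, q, r in combinations(O, 3))

def frobenius(v):
    """Field automorphism a -> a^2 applied coordinatewise."""
    return tuple(mul(x, x) for x in v)

def diag(v):
    """Action of diag(1, rho, rho^2) on a projective point."""
    return normalize((v[0], mul(R, v[1]), mul(R2, v[2])))

print(is_hyperoval, [frobenius(p) for p in O], [diag(p) for p in O])
```

Since diag(1, ρ, ρ²) has determinant ρ³ = 1, this 3-cycle on O visibly comes from SL₃(F₄).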

We pause to note some exceptional behavior of the permutation groups A₆ and S₆ that is visible in this picture, namely the outer automorphism of S₆ and the triple cover of A₆.

Our copy of S₆ in Aut(Π) also acts on O*, and this gives an outer automorphism of S₆. It is known that S₆, alone among all symmetric groups, has an outer automorphism, and we can see it in the simultaneous action of the hyperoval stabilizer on O and O*. This outer automorphism switches the conjugacy classes of 15 simple and 15 triple transpositions, and also the 40 single and 40 double 3-cycles, and the 120 six-cycles with the 120 permutations of cycle structure 1¹2¹3¹. The point stabilizer S₅ maps to a transitive subgroup (indeed a sharply 3-transitive subgroup) of S₆, which can be understood as PGL₂(Z/5Z) acting on the projective line over Z/5Z. Note that two triple transpositions commute iff they share a 2-cycle, whereas two simple transpositions commute unless they overlap; thus a total, considered as a collection of 5 triple transpositions, is mapped under an outer automorphism to a collection of 5 overlapping simple transpositions, i.e. a clique of size 5 in the line graph of the complete graph K₆. For each of the six vertices p of this K₆, the 5 edges through p are such a clique, and it is easy to see that there are no others (the other maximal cliques are triangles, which correspond to the one wrong turn that one might make when constructing a total).
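
The class sizes quoted above are quick to confirm by listing all 720 permutations. A minimal Python sketch (the function name `cycle_type` is ours, chosen for illustration):

```python
from collections import Counter
from itertools import permutations

def cycle_type(perm):
    """Cycle lengths of a permutation of range(n), in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

counts = Counter(cycle_type(p) for p in permutations(range(6)))

# Pairs of classes that the outer automorphism is said to exchange.
swapped_pairs = [
    ((2, 1, 1, 1, 1), (2, 2, 2)),   # simple vs. triple transpositions
    ((3, 1, 1, 1), (3, 3)),         # single vs. double 3-cycles
    ((6,), (3, 2, 1)),              # six-cycles vs. cycle structure 1, 2, 3
]
for a, b in swapped_pairs:
    print(a, counts[a], b, counts[b])
```

Equal class sizes are of course only a necessary condition for two classes to be exchanged; the sketch checks the numerology, not the automorphism itself.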

The other exceptional object is the triple cover 3. A₆ of the alternating group A₆. The simple alternating groups A_n all have double covers, which are their preimages in the universal covers of the orthogonal groups O_n−1(R), but for n=6 and n=7 these do not give the full Schur multiplier of A_n because there is also a triple cover. For n=6 we readily obtain this triple cover as the preimage of A₆ under the quotient map SL₃(F₄) → PSL₃(F₄). We show that this triple cover is nontrivial by checking that its restriction to a 3-Sylow subgroup is already nontrivial. This subgroup is generated by disjoint 3-cycles, such as the two that we have already represented by diag(1, ρ, ρ²) and a permutation matrix. These transformations commute in PSL₃(F₄), but their lifts to SL₃(F₄) have a nontrivial commutator, ρ^±1 times the identity, and thus generate a Heisenberg group over Z/3Z, whereas a trivial triple cover would restrict to an elementary abelian 3-group. Because O is also the point configuration associated to the hexacode, we see that 3. A₆ is the hexacode’s automorphism group in the group of (F₄)*-signed permutation matrices; if we also allow field automorphisms we get automorphisms by a triple cover 3. S₆ of the symmetric group of permutations of the coordinates. The semidirect product “2⁶:(3. S₆)” of this automorphism group with the hexacode then arises as a maximal subgroup of M₂₄, namely the stabilizer of a “sextet” (a partition of the 24 coordinates into 6 tetrads any two of which form an octad).

Further remarks: One can likewise use (hyper)ovals to prove the uniqueness of the projective planes of orders 2, 3, and 5, and count their automorphisms. For q=2 (respectively q=3), a hyperoval (resp. oval) O has four points, six secants, and three points off O where two secants meet (so here we get not an outer automorphism S₆ → S₆ but the nontrivial map S₄ → S₃ that makes S₄ solvable); in each case we soon label the remaining lines and points, and find the incidences between them. S₄ then arises as the affine linear group AGL₂(Z/2Z) in the q=2 setting (stabilizer of a line in the projective plane), and the projective linear group PGL₂(Z/3Z) for q=3 (automorphism group of the projective line, which is isomorphic with O). Once you’ve done this exercise, try the q=5 case, where the ovals have six points, and synthemes and totals again come into play but are combined in a different way from what we saw for q=4.

Thursday, Oct. 10: The binary Golay codes and related structures, cont’d

Let C₁ be the code generated by (characteristic functions of) lines in Π and C₀ be the even subcode of C₁, which is generated by symmetric differences of lines. Then C₀ is self-orthogonal, and thus doubly even because the generators have weight 8. The [24, 12, 8] code C that we started with contains (c | 000) for every c ∈ C₀, and (c | 111) for every c ∈ C₁ not in C₀. We shall show that C₀ has dimension 9 (and minimum weight 8), and thus that (C₀)^⊥ has dimension 21 − 9 = 12, from which we’ll soon deduce the existence and uniqueness of C.

We need to study (C₀)^⊥, because C is self-dual and contains (c | 000) for every c ∈ C₀, whence every codeword (c | ???) of C must have its Π part c in (C₀)^⊥. Now (C₀)^⊥ consists of all w such that ⟨w, l⟩ has constant parity as l varies over the 21 lines. It readily follows that either w is 0 or it has weight at least 5, with equality iff it is one of the lines. Now, as an extension of our earlier observations on doubly even codes, since C₀ is doubly even every coset of C₀ in (C₀)^⊥ has constant weight mod 4. For example, every word in C₁ has weight congruent to either 0 mod 4 (if in C₀) or 1 mod 4 (if not). Since C₀ is contained in its dual, which has minimum weight 5, it follows that C₀ has minimum weight 8, which is one property we’ll need if C is to have minimum weight 8. Further examples of low-weight words in (C₀)^⊥ are the 168 weight-6 hyperovals we counted Tuesday, and the 360 subplanes (copies of Π₂ in Π₄), which have weight 7. Now C₁ contains the all-1 word (sum of all the lines, or of the 5 lines through any given point); so (C₁)^⊥ is the even subcode of (C₀)^⊥, and consists of all words w such that ⟨w, l⟩ is even for every line l. If such w has weight at least 8 then it has 4 collinear points (otherwise every line would meet it in 0 or 2 points, making it an arc of more than 6 points), and is thus congruent mod C₀ to a word of lower weight. It follows that (C₀)^⊥ has no words of weight 0 mod 4 other than those already in C₀, and the only words of weight 8 in C₀ are the Binomial(21, 2) = 210 line pairs. We claim that there are exactly three cosets of C₀ in (C₀)^⊥, other than C₀ itself, that consist of words of even weight. To find three, start from any 3-arc and check that it is contained in three hyperovals, no two of which are in the same coset. But for any three binary words w_i (i=1,2,3), if each w_i and w_i + w_j has weight 2 mod 4 then w₁ + w₂ + w₃ has weight 0 mod 4 (by inclusion-exclusion for symmetric differences). So once we’ve found three nontrivial cosets we’ve found them all.
It follows that dim((C₀)^⊥) = dim(C₀) + 3, and thus that C₀ has dimension 9 as claimed.
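
The dimension count can also be confirmed numerically on the algebraic model P²(F₄). The following Python sketch is an independent check under stated assumptions, not the argument of the lecture: F₄ is encoded as 0, 1, 2, 3 with XOR as addition, binary words of length 21 are stored as integer bitmasks, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# F_4 encoded as 0, 1, 2, 3 with XOR as addition.
LOG = {1: 0, 2: 1, 3: 2}
EXP = [1, 2, 3]

def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]

def dot(u, v):
    total = 0
    for a, b in zip(u, v):
        total ^= mul(a, b)
    return total

def projective_points():
    """Nonzero vectors of F_4^3 whose first nonzero coordinate is 1."""
    return [v for v in product(range(4), repeat=3)
            if any(v) and next(x for x in v if x) == 1]

def gf2_basis(rows):
    """Basis of the GF(2)-span of integer bitmasks, by Gaussian elimination."""
    basis = {}
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in basis:
                basis[top] = r
                break
            r ^= basis[top]
    return list(basis.values())

points = projective_points()
# Each line is the set of points orthogonal to a dual vector, as a bitmask.
lines = [sum(1 << i for i, p in enumerate(points) if dot(l, p) == 0)
         for l in points]

c1_basis = gf2_basis(lines)                                # spans C_1
c0_basis = gf2_basis([lines[0] ^ l for l in lines[1:]])    # spans C_0

# Enumerate C_0 and tabulate weights.
words = [0]
for b in c0_basis:
    words += [w ^ b for w in words]
weights_c0 = Counter(bin(w).count("1") for w in words)

print(len(c1_basis), len(c0_basis), sorted(weights_c0.items()))
```

With dim(C₀) in hand, the dimension of (C₀)^⊥ is 21 minus that number, and the weight table shows at a glance whether C₀ is doubly even with minimum weight 8.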

[…]

Here are some more details about the (re)construction of the (3, 6, 22), (4, 7, 23), and (5, 8, 24) Steiner systems from Π, taken from a combinatorics course I taught here in 2009: lecture notes from March 24 and March 26. (At that point in the class I had not yet introduced linear codes, so some results had to be obtained differently.)

Tuesday, Oct. 15: Extremal enumerators and codes; the Mallows-Odlyzko-Sloane theorem

Each of the special codes suggested by Gleason’s Theorem (extended Hamming, G₂₄, G₁₂, and hexacode for Types I, II, III, IV respectively) was suggested by considering the smallest n for which Gleason’s theorem does not determine the weight enumerator uniquely and then making the enumerator unique by incrementing the minimum weight, doubling it from (say) e to 2e (this e is 2, 4, 3, 2 for Types I, II, III, IV respectively). In general, in each Type if the weight enumerator of a code of length n has m undetermined coefficients then we can try incrementing the minimum weight m times, finding that the condition wt_min(C) ≥ (m+1)e determines W_C uniquely. We call the resulting polynomial the extremal weight enumerator for its Type and length, and call any C with that enumerator an extremal code. Examples include the four special codes listed above, as well as the codes so short that Gleason already determines W_C uniquely without any condition on the minimum weight. But in this generality it is no longer necessarily true that there is a unique length-n code of the specified Type with minimum weight (m+1)e. We already saw one example where the code is not unique (the two Type II codes of length 16), and one where no code exists (the Nordstrom-Robinson weight enumerator for Type I codes of length 16). We must also consider the possibility that an extremal code has minimum weight larger than (m+1)e: conceivably the coefficient of W_C that counts words of that weight might happen to vanish. We already saw one such case of extra zeros, though it did not affect the count of words of weight (m+1)e: the extended binary Golay code G₂₄ is also extremal of Type I, but has no words of weight 10. We first show that in fact the coefficient of the extremal weight enumerator that counts words of weight (m+1)e is always positive.
This means that (m+1)e is an upper bound on the minimum weight of any self-dual code of each Type; for Types II, III, and IV this bound is asymptotically tighter than the sphere-packing bound for codes of length n → ∞ and rate 1/2. [For Type I the sphere-packing bound shows that for large enough n there are no extremal codes, and indeed for each k there are only finitely many n for which there can be a self-dual binary code of minimum weight ≥(n/4)−k; we shall soon see that this is true for each Type in a different way: while the coefficient that counts words of weight (m+1)e is positive, the next coefficient eventually goes negative, and indeed for large n it is not possible for each count through (m+2)e to be nonnegative if the minimal distance is within k of extremality.]

We can describe each of our four cases as follows. Dehomogenize W_C by setting X=1, and let q = Y^e, because W_C(1,Y) is a polynomial in this q. […]

Proposition (Mallows-Odlyzko-Sloane 1975). Suppose all the coefficients of the power series for F₀ are positive, and the q^k coefficient in the power series expansion of Δ⁻¹ is positive for all k ≥ −1. Then for each m the extremal weight enumerator of degree m in F and Δ has a positive q^(m+1) coefficient.

The hypothesis holds for each of our four Types of self-dual codes, so we obtain:

Corollary.
i) A Type I code of length n has minimum distance at most 2 floor(n/8) + 2.
ii) A Type II code of length n has minimum distance at most 4 floor(n/24) + 4.
iii) A Type III code of length n has minimum distance at most 3 floor(n/12) + 3.
iv) A Type IV code of length n has minimum distance at most 2 floor(n/6) + 2.
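
For quick reference, the four bounds of the Corollary can be tabulated by a few lines of Python. This is only a convenience sketch: the table `GLEASON` and the function `extremal_bound` are names invented here, and the function does not check that a code of the given Type and length can actually exist (for instance, that n is a multiple of 8 for Type II).

```python
# (e, period) for each Type: minimum-weight step e and the degree of the
# second generator of the invariant ring, as used in the Corollary.
GLEASON = {"I": (2, 8), "II": (4, 24), "III": (3, 12), "IV": (2, 6)}

def extremal_bound(code_type, n):
    """Upper bound e * floor(n / period) + e on the minimum distance."""
    e, period = GLEASON[code_type]
    return e * (n // period) + e

for code_type, n in [("I", 8), ("II", 24), ("III", 12), ("IV", 6), ("II", 72)]:
    print(code_type, n, extremal_bound(code_type, n))
```

The first four sample inputs correspond to the extended Hamming code, G₂₄, G₁₂, and the hexacode, and the last to the open [72, 36, 16] question mentioned earlier.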

Proof of Proposition, using the invariance of the formal residue of a power series: see this TeX page. [This approach is simpler than the original proof from 1975, though that proof gives further information that we’ll return to next time. The approach we use here does let us give formulas for the q^(m+1) coefficient (number of minimal words of a putative extremal code) in each case, such as the formulas exhibited with OEIS Sequence A034414 for Type II codes.]

Thursday, Oct. 17: The sphere-packing problem; lattices and their theta series

[General introduction to Euclidean lattices, the Hermite constant (best lattice packing of equal spheres), etc., with material mostly contained in the first chapter of “SPLAG” = Sphere Packings, Lattices, and Groups by Conway and Sloane. For the analogue in the setting of unimodular lattices and their theta series, see also Part I of my Notices article “Lattices, Linear Codes, and Invariants”. In this setting, in the Type II (= even unimodular) case, Δ is actually the modular form usually denoted by Δ, so its inverse is q⁻¹∏_{n≥1} (1−qⁿ)⁻²⁴, which is manifestly positive because it is a product of positive geometric series; while in the Type I (odd unimodular) case, the factor (1−qⁿ)⁻²⁴ is replaced by the 8th power of (1+qⁿ) / (1−q⁴ⁿ), which again has a positive power-series expansion. This (together with the positivity of the theta functions F) gives us two further parts of our Corollary: a unimodular lattice of rank n has minimum norm at most floor(n/8) + 1, and an even unimodular lattice of rank n has minimum norm at most 2 floor(n/24) + 2. The second part was already proved in 1969 by Siegel, who was led to the question from a rather different direction (as one can already gather from the title of his paper “Berechnung von Zetafunktionen an ganzzahligen Stellen” = calculation of zeta functions [of number fields] at integers, which can be found in volume IV of his collected works).]

Tuesday, Oct. 22: Overview of sphere-packing bounds; theta series of unimodular lattices; extremal theta functions and lattices

The analogue for sphere packings of the Gilbert-Varshamov bound is the Minkowski-Hlawka bound, which holds for packings not just of spheres but of translates of an arbitrary centrally-symmetric convex body B in Rⁿ, as long as B is bounded and has positive volume. It looks simpler than the Gilbert-Varshamov bound because real vector spaces have nontrivial homothecies: while all balls in a given Hamming space “look” different, 2B = B+B is just a scaled copy of B, and thus has volume 2ⁿ Vol(B). This gives us a lower bound of 2⁻ⁿ on the packing density. Again an averaging argument lets us obtain a lattice packing satisfying the same bound; and again it is a long-standing embarrassment that we cannot attain this bound “explicitly”, nor do substantially better, even for Euclidean spheres (the best bounds there are of order n 2⁻ⁿ, and still by averaging arguments). For l^p balls with p > 2, one does get exponential improvements (but still not by explicit lattices, except for rather large p; NB l^∞ balls pack with density 1); but the Poisson summation formula, together with positivity of the Fourier transforms of Gaussian functions, prevents such improvements for p = 2 (and likewise the fact that exp(−|x|^p) has positive Fourier transform for p < 2 prevents this technique from improving on Minkowski-Hlawka for p < 2). See the paper

Elkies, N.D., Odlyzko, A.M., and Rush, J.A.: On the packing densities of superballs and other bodies, Invent. Math. 105 (1991), 613–639.

For upper bounds on Euclidean sphere-packings, the (literal!) sphere-packing bound on the density is 1, and it’s not too hard to improve this to O(n 2^(−n/2)) starting from the result of the second problem on our first problem set. This is already enough to show that we cannot have extremal Type I lattices for large n, because their density would grow as (πe/16+o(1))^(n/2), and πe/16 = 0.53+ > 1/2. The best asymptotic upper bound known (using much the same technique that we’ll develop for codes) is 2^(−(c+o(1))n) with c ≈ 0.599, a constant that has not been improved on in decades now.

Thursday, Oct. 24: Overview of sphere-packing bounds; theta series of unimodular lattices; extremal theta functions and lattices

If L is unimodular, then (as for a self-dual binary code) the map taking a lattice vector v to its norm ⟨v, v⟩ mod 2 is a homomorphism, and the lattice is even (= “Type II”) when this homomorphism is trivial. Again this can happen only for n divisible by 8, which we prove using theta functions though there is also an algebraic proof. (See the first part of Serre’s A Course in Arithmetic, which also treats modular forms and the theta functions of unimodular lattices in the final chapter.) Whether or not this homomorphism is trivial, it’s given by v ↦ ⟨v, c⟩ mod 2 for some lattice vector c, because L is unimodular, and c is determined uniquely mod 2L. Such c is said to be a characteristic vector of the lattice; such vectors form a coset of 2L in L called the characteristic coset. (The same information is carried by the translate of L formed by all c/2 for c in this coset; this translate is known as the “shadow” of L.) The lattice is even iff the zero vector is characteristic iff the characteristic coset is trivial iff L is its own shadow. Now the norm of any coset of 2L in L is constant mod 4, but the characteristic coset has constant norm mod 8. For example, the characteristic vectors of the Zⁿ lattice are those all of whose coordinates are odd, and these have norm n mod 8. The fact that even unimodular lattices exist only in dimensions divisible by 8 has a refinement: the characteristic norm is always n mod 8. Again this has an algebraic proof but can also be obtained from the transformation formulas for the theta function of L.

Construction A associates to any binary linear code C ⊆ (Z/2Z)ⁿ a lattice L_C ⊂ Rⁿ by starting from the subgroup of Zⁿ consisting of vectors whose reduction mod 2 is in C, and then scaling by 2^−½ (i.e. dividing all inner products by 2). This last step makes Construction A commute with duality: the dual of the lattice L_C is the lattice L_{C^⊥} associated to the dual code. (This is proved starting from the observation that L_C contains 2^½Zⁿ and is contained in 2^−½Zⁿ, and these lattices are each other’s dual, so the dual of L_C is sandwiched between them as well.) If C has parameters [n, k, d] then L_C has density 2^(k−n/2) (so discriminant 2^(n−2k)) and minimum norm min(2, d/2).
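
As a concrete instance of Construction A, one can count short vectors of L_C for the extended Hamming code, where k − n/2 = 0 and min(2, d/2) = 2. The Python sketch below is an illustration under assumptions: the generator matrix is one convenient choice for the [8, 4, 4] code (not taken from these notes), and the function names are hypothetical. It works with unscaled integer vectors x, so that a lattice vector of norm N corresponds to sum(x_i²) = 2N.

```python
from itertools import product
from math import isqrt

# Assumed generators of an [8, 4, 4] doubly even self-dual code.
GENERATORS = [
    (1, 1, 1, 1, 0, 0, 0, 0),
    (0, 0, 1, 1, 1, 1, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (1, 0, 1, 0, 1, 0, 1, 0),
]

def codewords(gens):
    """Yield all GF(2)-linear combinations of the generators."""
    n = len(gens[0])
    for coeffs in product((0, 1), repeat=len(gens)):
        yield tuple(sum(a * g[i] for a, g in zip(coeffs, gens)) % 2
                    for i in range(n))

def count_vectors(gens, doubled_norm):
    """Count x in Z^n with (x mod 2) in C and sum(x_i^2) == doubled_norm.

    After scaling by 2^(-1/2) these are the vectors of L_C of norm
    doubled_norm / 2.
    """
    bound = isqrt(doubled_norm)
    total = 0
    for c in codewords(gens):
        # Coordinates must have the parity prescribed by the codeword.
        choices = [[x for x in range(-bound, bound + 1) if x % 2 == bit]
                   for bit in c]
        total += sum(1 for x in product(*choices)
                     if sum(t * t for t in x) == doubled_norm)
    return total

norm1 = count_vectors(GENERATORS, 2)   # vectors of norm 1 in L_C
norm2 = count_vectors(GENERATORS, 4)   # vectors of norm 2 in L_C
print(norm1, norm2)
```

The norm-2 count splits as 16 vectors coming from the zero codeword plus 16 sign choices for each codeword of weight 4, which is the familiar description of the roots of E₈.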

[Remarks on some nonlattice periodic sphere packings: Construction A makes sense for any binary code C, whether linear or not; in general L_C is a periodic subset of Rⁿ, and if C has parameters (n, M, d) then L_C has density 2^(−n/2)M and the formula min(2, d/2) still gives the minimum norm of a nonzero difference between vectors of L_C. For example, suppose C is the Best [sic] code, which has parameters (10, 40, 4) and is known to maximize M given (n, d) = (10, 4). Then L_C has minimum norm 2 and density 2⁻⁵40 = 5/4. This is denser than any known lattice packing in R¹⁰, and is conjectured to be the densest possible in this dimension. This is the smallest dimension in which the Euclidean sphere-packing problem is expected to be solved only by a non-lattice packing (NB already for n=3, and several other small dimensions (some below 10), there are periodic nonlattice packings that have exactly the same density as the densest lattice packing). For more about Construction A, see SPLAG chapter 5, which also gives further Constructions, e.g. “Construction B” generalizes Leech’s construction of Λ₂₄ from G₂₄. Look up “Construction” in the index for further variations later in the book; for example there’s a “Construction A₃” that associates a lattice in Rⁿ to a ternary code of length n. We say a bit more about this a few paragraphs below.]

Back to the case of linear codes C and lattices L_C: the theta series θ_L(τ) = Θ_L(e^πiτ) can be expressed in terms of the Hamming weight enumerator W_C(X,Y): substitute

θ_Z⟨2⟩(τ) = ∑_{n∈Z} e^(2πin²τ) = 1 + 2e^(2πiτ) + 2e^(8πiτ) + 2e^(18πiτ) + ··· for X, and what might be called θ_{(Z+½)⟨2⟩}(τ) = ∑_{n∈Z} e^(2πi(n+½)²τ) = 2e^(πiτ/2) (1 + e^(4πiτ) + e^(12πiτ) + e^(24πiτ) + ···)

for Y. [The notation “L⟨2⟩” means the lattice L with all inner products multiplied by 2; for L=Z the resulting lattice is also known as the root lattice A₁.] The transformation formulas of these weight-½ modular forms under the generators τ ↦ τ+1 and τ ↦ −1/τ of PSL₂(Z) should look familiar: the former takes (X, Y) to (X, iY), and the latter takes (X, Y) to (τ/i)^½(X+Y, X−Y) / 2^½. This explains why the same magic numbers 8 and 24 arise in the description of theta functions of Type II lattices as did for weight enumerators of Type II codes; indeed our substitution of θ_Z⟨2⟩(τ) and θ_{(Z+½)⟨2⟩}(τ) for X and Y takes the weight enumerator of the extended Hamming code to θ_E₈, and takes δ₂₄ = X⁴ Y⁴ (X⁴−Y⁴)⁴ to 16Δ. […]
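The substitution can be tested on power series. The sketch below (plain Python, not from the original notes) works in the variable t = e^(πiτ/2), so that X = ∑ t^(4n²) and Y = ∑ t^((2n+1)²), and a power q^k of q = e^(2πiτ) is t^(4k). It assumes the extended Hamming weight enumerator X⁸ + 14X⁴Y⁴ + Y⁸ and δ₂₄ = X⁴Y⁴(X⁴−Y⁴)⁴; the truncation order N and the helper names are arbitrary.

```python
N = 20  # keep powers t^0 .. t^N, where t = exp(pi*i*tau/2)

def mul(a, b):
    """Product of two power series (coefficient lists), truncated after t^N."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j > N:
                    break
                c[i + j] += ai * bj
    return c

def power(a, e):
    """e-th power of a truncated power series."""
    result = [1] + [0] * N
    for _ in range(e):
        result = mul(result, a)
    return result

def add(a, b, sign=1):
    """Termwise a + sign * b."""
    return [x + sign * y for x, y in zip(a, b)]

# X = sum of t^(4 n^2) and Y = sum of t^((2n+1)^2) over all integers n.
X = [0] * (N + 1)
Y = [0] * (N + 1)
for n in range(-N, N + 1):
    if 4 * n * n <= N:
        X[4 * n * n] += 1
    if (2 * n + 1) ** 2 <= N:
        Y[(2 * n + 1) ** 2] += 1

X4, Y4 = power(X, 4), power(Y, 4)

# Weight enumerator of the extended Hamming code: X^8 + 14 X^4 Y^4 + Y^8.
theta_E8 = add(add(mul(X4, X4), [14 * c for c in mul(X4, Y4)]), mul(Y4, Y4))

# delta_24 = X^4 Y^4 (X^4 - Y^4)^4.
delta24 = mul(mul(X4, Y4), power(add(X4, Y4, sign=-1), 4))

# Coefficients of q^k = t^(4k).
theta_coeffs = [theta_E8[4 * k] for k in range(N // 4 + 1)]
delta_coeffs = [delta24[4 * k] for k in range(N // 4 + 1)]
print(theta_coeffs)
print(delta_coeffs)
```

The first list begins 1, 240, 2160, as expected for θ_E₈, and the second begins 0, 16, −384, that is 16(q − 24q² + ···).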

This story generalizes in various directions (the space of lattices in Rⁿ is bigger than the space of codes in Hamming space). In general if L is any lattice with rational inner products then θ_L is a modular form of weight n/2 for some congruence subgroup Γ of PSL₂(R) (i.e., Γ contains (necessarily with finite index) some Γ(N), that being the subgroup of PSL₂(Z) consisting of matrices congruent to ±1 mod N; warning: Γ may also contain elements not in PSL₂(Z)). This ultimately comes down to Poisson summation again, but the proof is subtler because, with few notable exceptions, Γ is not generated by just a translation together with some involution τ ↔ −c/τ. These “notable” exceptions include Γ(1) = PSL₂(Z) itself, and also the index-3 subgroup of Γ(1) consisting of transformations of the theta function of a Type I lattice. Two further examples arise for even lattices L whose duals are isomorphic with L⟨1/N⟩ for N=2 or 3, such as D₄ and A₂ respectively. (This case includes several further record lattices, in dimensions 16 and 32 for N=2, and dimension 12 for N=3.) Here the rings of modular forms behave much as they do for Γ(1), but with the weights 4, 12 replaced by 2, 8 for N=2 and 1, 6 for N=3. This again gives rise to notions of extremal theta series and extremal lattices (such as the above records), with results analogous to what we saw for Type II lattices.

Construction A generalizes in several ways, with Z⟨2⟩ and ½Z⟨2⟩ replaced by any lattice L₀ contained with finite index in some other lattice L₁: additive codes C of length n with alphabet L₁/L₀ correspond to lattices L_C between (L₀)ⁿ and (L₁)ⁿ. As long as all nontrivial cosets of L₀ in L₁ are isomorphic, the theta series of L_C can be obtained from the Hamming weight enumerator W_C. (Without the isomorphic-coset condition we need cwe_C.) Examples of such pairs (L₀, L₁) are (3Z, Z) [“Construction A₃”], ((2+i)Z[i], Z[i]) [for codes over Z/5Z], and ((2−ρ)Z[ρ], Z[ρ]) [for codes over Z/7Z, where ρ is a cube root of unity, so Z[ρ] ≅ A₂]. Two examples of particular interest are L₀ = A₂ or D₄ and L₁ = (L₀)*: if C is of Type III or IV respectively then L_C is an even unimodular lattice. Taking C=G₁₂ in the former case, and the hexacode in the latter, we obtain lattices L_C that are within one step from the Leech lattice, as with Leech’s original construction from G₂₄.

[For more on “shadows” and their weight enumerators or theta series, see Chapter 5 (starting on page 26) and pages 48–49 and 73–75 of Rains-Sloane, and also my two papers “A characterization of the Zⁿ lattice” and “Lattices and codes with long shadows”.]

Tuesday, Oct. 29: Asymptotics of kissing numbers of extremal codes and lattices via the stationary-phase method

For a statement and proof sketch of the stationary-phase asymptotics that we use, see this write-up (and note the two Exercises at the end). There are naturally plenty of sources for this and related techniques in general; for instance it is a recurring [no pun intended] theme in Stanley’s Enumerative Combinatorics.

Thursday, Oct. 31: Asymptotic impossibility of (nearly-)extremal codes. Start on Reed-Muller codes

See this write-up outlining a proof of the Mallows-Odlyzko-Sloane theorem.

Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ Ξ

CORRECTED Nov.2 (and replaced k by F for the field, because we need k for the dimension):

Reed-Muller codes RM(r, m) are linear binary codes of length n = 2^m that generalize some of the codes we’ve seen already: zero, repetition, extended Hamming, parity-check, and the full Hamming space. Recall that the extended Hamming code can be constructed as the space of affine-linear functions on F³ where F = Z/2Z. This is RM(1, 3). In general, RM(r, m) consists of the space of polynomial functions of degree at most r on F^m (and thus automatically has automorphisms at least by the affine linear group AGL_m(F)). We next find the dimension and minimum weight of each RM(r, m), prove that the dual of a Reed-Muller code is again Reed-Muller (specifically RM(m−1−r, m)), show that RM(r, m) is doubly even when it’s self-orthogonal (and even more highly even when r is small enough compared with m), and generalize to Reed-Muller codes over arbitrary finite fields.

The minimum weight is actually easier: I claim that RM(r, m) has minimum weight 2^(m−r), and that this minimum is attained precisely by affine subspaces of codimension r in F^m. This is proved by induction on m, being clear for m=0. To get from m to m+1, split F^(m+1) into two parallel copies of F^m, and use the fact that if a polynomial of degree r vanishes on a hyperplane then it factors as the corresponding linear polynomial times a polynomial of degree r−1. (NB this last step is not as trivial as it may seem because we’re allowed to evaluate only at points in F^m, not over the algebraic closure; at any rate the analysis in the next paragraph will suffice to establish this.) This will also let us prove by induction that a word c of weight 2^(m−r) in RM(r, m) is a codimension-r subspace, once we show that it must be contained in some hyperplane (except for r=0 when the claim is immediate). But if this weren’t true then every hyperplane would intersect the support of c in exactly half of wt(c), and then the function (−1)^c: F^m → R would have a discrete Fourier transform supported on the origin, making (−1)^c constant. But then c is constant and we’re back in the “immediate” case of r=0.

[The same inductive technique shows that if c∈RM(r, m) has weight less than 2^(m+1−r) then 2^(m+1−r)−wt(c) is a power of 2. In fact the words of weight less than 2.5·2^(m−r) have been completely described by Azumi, Kasami, and Tokura (Information and Control 30 (1976), 380–395).]

To obtain the dimension of RM(r, m), fix some choice of coordinates on F^m, and write any polynomial function F^m → F as a sum of monomials linear in each variable separately; this is possible because x² = x for all (i.e. for both) x in F. This shows that every polynomial function is a linear combination of the 2^m monomials. These monomials span the space of all functions F^m → F because the characteristic function of any point p is the polynomial ∏_i (1 + x_i − p_i). Therefore our 2^m monomials are linearly independent, and RM(r, m) has dimension ∑_{j≤r} Binomial(m, j). At this point we can prove our claim about polynomials vanishing on a hyperplane in F^(m+1): Choose coordinates so that the hyperplane is x₁ = 0, and consider the 2^m monomials divisible by x₁. They vanish on the hyperplane x₁ = 0, and on the complementary hyperplane x₁ = 1 they agree with our 2^m monomial generators of the functions on F^m. Since those monomials are independent, they generate the 2^m-dimensional space of functions on F^(m+1) that vanish on x₁ = 0, and we conclude that every polynomial of degree at most r that vanishes on x₁ = 0 can be written as x₁ times a polynomial of degree at most r−1.

It follows that RM(m−1−r, m) has dimension 2^m − dim(RM(r, m)). So we can prove that these two Reed-Muller codes are each other’s dual by checking that they are orthogonal to each other. By linearity it is enough to check that any two monomials of degrees at most r and m−1−r are orthogonal to each other, which is to say that the sum over F^m of any monomial of degree less than m vanishes; but this is clear: indeed a monomial of any degree d has weight exactly 2^(m−d). This also proves that RM(r, m) is the parity-check code for r = m − 1, and thus that RM(r, m) is even for all r < m. Moreover, RM(r, m) is self-orthogonal iff 2r ≤ m−1, and self-dual iff 2r = m−1; we conclude that RM(r, m) is doubly even if 2r ≤ m−1, with the one exception of (r, m) = (0, 1). In particular, if r ≥ 1 then RM(r, 2r+1) is a Type II code, and is extremal for r=1 (the extended Hamming code) and r=2 (a [32, 16, 8] code), but not for any r>2 (and very far from extremal as r gets large, because the minimum weight grows only as the square root of the length).
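The dimension, minimum-weight and duality claims are easy to confirm by brute force for small m. The following Python sketch (an illustration, not the course's own code) builds RM(r, m) from the square-free monomials of degree at most r and checks all three claims for m = 4; the helper names are hypothetical.

```python
from itertools import combinations, product

def rm_generators(r, m):
    """Square-free monomials of degree <= r in m variables, evaluated on F^m."""
    points = list(product((0, 1), repeat=m))
    rows = []
    for d in range(r + 1):
        for S in combinations(range(m), d):
            rows.append([int(all(p[i] for i in S)) for p in points])
    return rows

def pack(row):
    """Pack a 0/1 row into an integer so that vector addition becomes XOR."""
    return int("".join(map(str, row)), 2)

def rank_mod2(rows):
    """Rank over Z/2Z via an XOR basis indexed by leading-bit position."""
    basis = {}
    for row in rows:
        v = pack(row)
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)

def all_codewords(rows):
    """All words in the span of the rows, as packed integers."""
    words = {0}
    for row in rows:
        v = pack(row)
        words |= {w ^ v for w in words}
    return words

m = 4
results = []
for r in range(m):
    G = rm_generators(r, m)
    H = rm_generators(m - 1 - r, m)
    dim = rank_mod2(G)
    dmin = min(bin(w).count("1") for w in all_codewords(G) if w)
    orthogonal = all(sum(a * b for a, b in zip(g, h)) % 2 == 0 for g in G for h in H)
    results.append((r, dim, dmin, orthogonal))
print(results)

# Weights occurring in RM(1, 3), the extended Hamming code.
hamming_weights = sorted({bin(w).count("1") for w in all_codewords(rm_generators(1, 3))})
print(hamming_weights)
```

For each r the dimension is ∑_{j≤r} Binomial(4, j), the minimum weight is 2^(4−r), and RM(r, 4) is orthogonal to RM(3−r, 4); the weights of RM(1, 3) are all multiples of 4.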

Tuesday, Nov. 5: Reed-Muller codes, cont’d

We saw that RM(r, m) is even for r ≤ m−1, and doubly even for 2r ≤ m−1. Also, all weights in RM(1, m) other than 0 and 2^m are 2^(m−1), and it can be shown that RM(2, m) has all weights either 2^(m−1) or 2^(m−1)±2^ρ for some ρ with floor((m−1)/2) ≤ ρ ≤ m−1 (this is routine at least if you know about quadratic forms over F; we already noted that we can show inductively that the weight is either 2^(m−1) or 2^(m−1)±2^ρ for some integer ρ, but not the lower bound on ρ). This suggests that in general all words in RM(r, m) have weight divisible by 2^v where v = floor((m−1)/r); that is, that the number of rational points on any hypersurface of degree at most r in F^m is a multiple of 2^v. We prove this by using the inclusion-exclusion formula for symmetric differences that we already mentioned several times earlier in the class, but this time need in full generality. Choose coordinates as before, and write any word c in RM(r, m) as a sum ∑_{i∈I} c_i of monomials of degree at most r. Then inclusion-exclusion for Δ gives

wt(c) = ∑_{∅≠J⊆I} (−2)^(|J|−1) wt(∏_{j∈J} c_j)

and the product of each subset of e monomials has weight a power of 2 that’s at least 2^(m−re) (and also at least 1). Multiplying by (−2)^(|J|−1) always yields a multiple of 2^v (check this!), so the same is true of wt(c), QED.
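Both the inclusion-exclusion identity and the resulting divisibility can be checked numerically. The Python sketch below (illustrative only; the choice of six monomials and all helper names are arbitrary) represents each word as a bitmask over the 2^m points of F^m, so that sums of words are XORs and products are ANDs.

```python
from itertools import combinations, product
from functools import reduce

def monomial_masks(r, m):
    """Bitmask over the 2^m points of F^m for each square-free monomial of degree <= r."""
    points = list(product((0, 1), repeat=m))
    masks = []
    for d in range(r + 1):
        for S in combinations(range(m), d):
            masks.append(sum(1 << k for k, p in enumerate(points)
                             if all(p[i] for i in S)))
    return masks

def wt(mask):
    """Hamming weight of a word stored as a bitmask."""
    return bin(mask).count("1")

def weight_by_inclusion_exclusion(terms):
    """Sum over nonempty subsets J of (-2)^(|J|-1) * wt(product of the terms in J)."""
    total = 0
    for size in range(1, len(terms) + 1):
        for J in combinations(terms, size):
            total += (-2) ** (size - 1) * wt(reduce(lambda a, b: a & b, J))
    return total

# One word of RM(2, 5), written as a sum of six monomials picked for illustration.
c_terms = monomial_masks(2, 5)[1:7]
c = reduce(lambda a, b: a ^ b, c_terms)
formula_ok = weight_by_inclusion_exclusion(c_terms) == wt(c)

def divisibility_holds(r, m):
    """True if every word of RM(r, m) has weight divisible by 2^floor((m-1)/r)."""
    v = (m - 1) // r
    words = {0}
    for g in monomial_masks(r, m):
        words |= {w ^ g for w in words}
    return all(wt(w) % 2 ** v == 0 for w in words)

checks = {(r, m): divisibility_holds(r, m)
          for m in range(2, 6) for r in range(1, 3) if r < m}
print(formula_ok, checks)
```

The exhaustive check is restricted to r ≤ 2 and m ≤ 5 so that each code has at most 2¹⁶ words.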

The weight enumerator of RM(r, m) is still unknown for most values of r and m. It is trivial for r=0, and easy for r=1 (where all weights are 2^(m−1) except for the all-zero and all-ones words); and for r=2 the enumerator can be obtained from the theory of quadratic forms. By MacWilliams we thus know the weight enumerator also for r ≥ m−3. But for intermediate r only a handful of cases have been computed; for large m (and r in [3, m−4]) there are still many coefficients undetermined even once we use the known results on divisibility and on words of weight up to 2.5wt_min for both the code and its dual.

We easily generalize the construction of Reed-Muller codes by making F an arbitrary finite field and regarding the polynomials of degree at most r on F^m as words of length q^m where q is the cardinality of F. Most of the results for binary codes generalize, though sometimes with nontrivial effort. For starters, the space of functions F^m → F now has a basis consisting of the q^m monomials of degree at most q−1 in each variable (proved as before by constructing the characteristic function of any point as a product of polynomials of degree q−1 in the coordinates). The polynomial functions of degree ≤r thus constitute a code whose dimension is the sum over d≤r of the z^d coefficients of the generating function (1 + z + z² + ··· + z^(q−1))^m. The codes of degrees r and (q−1)m−1−r have dimensions that sum to q^m, so again we can show that they’re each other’s dual by checking orthogonality, which here reduces to showing that the sum over F^m of any monomial of degree less than (q−1)m vanishes; and we already saw this in the context of Chevalley’s part of the Chevalley-Warning theorem.
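The dimension count via the generating function is simple to compute. Here is a small Python sketch (the function name `grm_dimension` is hypothetical) that expands (1 + z + ··· + z^(q−1))^m and checks that the dimensions for degrees r and (q−1)m−1−r sum to q^m.

```python
def grm_dimension(q, m, r):
    """Sum over d <= r of the z^d coefficients of (1 + z + ... + z^(q-1))^m."""
    coeffs = [1]
    for _ in range(m):
        new = [0] * (len(coeffs) + q - 1)
        for i, c in enumerate(coeffs):
            for j in range(q):
                new[i + j] += c
        coeffs = new
    return sum(coeffs[: r + 1])

# Complementary degrees r and (q-1)m - 1 - r give dimensions summing to q^m.
complementary_ok = all(
    grm_dimension(q, m, r) + grm_dimension(q, m, (q - 1) * m - 1 - r) == q ** m
    for q in (2, 3, 4, 5)
    for m in (1, 2, 3)
    for r in range((q - 1) * m)
)
print(grm_dimension(3, 2, 1), complementary_ok)
```

For q = 3 and m = 2 the generating function is 1 + 2z + 3z² + 2z³ + z⁴, so the code of degree at most 1 has dimension 3.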

Once q>2 we cannot expect that most of these codes will have any nontrivial divisibility condition on their Hamming weights; for instance, the single-checksum code of degree (q−1)m−1 has all possible weights except 1. But it is still true that the weights are highly divisible for small r: for r=1 all nonzero weights are q^m (for constant functions) and q^m − q^(m−1) (for actual hyperplanes), and for r=2 we can use the structure of quadratic forms over finite fields to show that there are only O(m) different weights, all divisible by the floor((m−1)/2) power of q. In fact it is true in general that, as we saw for q=2, the number of points on any hypersurface of degree at most r is always a multiple of q^v where v = floor((m−1)/r); but here this is a nontrivial theorem (J. Ax, Zeros of polynomials over finite fields, Amer. J. Math. 86 (1964), 255–261) and it seems that no easy proof is known. […]

Thursday, Nov. 7: a bit more on Reed-Muller codes; introduction to cyclic codes

Tuesday, Nov. 12: Cyclic codes cont’d: the BCH bound and BCH codes; QR codes, and Golay codes as QR codes

Away from characteristic 2 it is simpler to proceed as follows: define c(0) to have zeroth coordinate α and the other coordinates 1 or −1 according to the Legendre symbol (quadratic character) mod n. Then each of the translates c(a) of c(0) has inner product α²+n−1 with itself and −1 with all the other translates. So, when (q/n) = +1, we can (by quadratic reciprocity) choose α in F so that α² = −n, and then (c(a), c(b)) = −1 for all a, b mod n, whether equal or not. Extending each c(a) by a coordinate of 1 “at ∞” then gives generators of the extended QR code of length n+1, while the cyclic QR code of length n is generated by differences c(a) − c(b).
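For a concrete instance, take n = 11 and q = 3, where (3/11) = +1 and α² = −11 becomes α² = 1 in F. The Python sketch below (an illustration under these assumed parameters, with hypothetical helper names) builds the translates c(a), checks the inner products mod 3, and computes the dimension of the span of the extended vectors.

```python
n, q = 11, 3

squares = {(x * x) % n for x in range(1, n)}

def chi(x):
    """Legendre symbol of x mod n, with chi(0) = 0."""
    x %= n
    return 0 if x == 0 else (1 if x in squares else -1)

# alpha in Z/qZ with alpha^2 = -n (for n = 11 and q = 3 this means alpha^2 = 1).
alpha = next(a for a in range(q) if (a * a + n) % q == 0)

def c(a):
    """Translate of c(0): alpha in coordinate a, chi(x - a) in every other coordinate x."""
    return [alpha if x == a else chi(x - a) for x in range(n)]

def dot(u, v):
    """Inner product reduced mod q."""
    return sum(s * t for s, t in zip(u, v)) % q

def rank_mod_p(rows, p):
    """Rank over Z/pZ (p prime) by Gauss-Jordan elimination."""
    rows = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

translates = [c(a) for a in range(n)]
all_minus_one = all(dot(u, v) == q - 1 for u in translates for v in translates)

# Append a coordinate 1 "at infinity" to each translate.
extended = [u + [1] for u in translates]
self_orthogonal = all(dot(u, v) == 0 for u in extended for v in extended)
dimension = rank_mod_p(extended, q)
print(alpha, all_minus_one, self_orthogonal, dimension)
```

The extended vectors span a self-orthogonal code of length 12 and dimension 6, the parameters of the extended ternary Golay code.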

Thursday, Nov. 14: Krawtchouk polynomials and Lloyd’s theorem on perfect codes

At the end of the Krawtchouk/Lloyd notes we need the fact that the Krawtchouk polynomial K_e(x) actually has degree e, and no smaller. We can prove this by calculating that its x^e coefficient is (−q)^e/e!, a fact that is also key to the proof of the Tietäväinen–van Lint theorem; the factor q^e comes from the binomial expansion of ((q−1)+1)^e. We could also have deduced this from the definition of K_e in terms of the discrete Fourier transform of 1_{S_i} : if any K_e had degree strictly less than e then the Krawtchouk polynomials would be linearly dependent, and this linear dependence would be inherited by the characteristic functions 1_{S_i} — which is absurd because the Hamming spheres S_i are pairwise disjoint.
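The leading coefficient can be checked without symbolic algebra: for a polynomial of degree e with leading coefficient a, the e-th forward difference is the constant a·e!. The Python sketch below assumes the standard formula K_e(x) = ∑_j (−1)^j (q−1)^(e−j) Binomial(x, j) Binomial(n−x, e−j), which may differ in normalization from the lecture notes, and verifies that the e-th difference equals (−q)^e.

```python
from math import comb

def K(n, q, e, x):
    """Krawtchouk polynomial K_e(x) at an integer 0 <= x <= n (standard formula, assumed)."""
    return sum((-1) ** j * (q - 1) ** (e - j) * comb(x, j) * comb(n - x, e - j)
               for j in range(e + 1))

def forward_difference(values, order):
    """The order-th forward difference taken at the first entry of `values`."""
    return sum((-1) ** (order - k) * comb(order, k) * values[k]
               for k in range(order + 1))

n, q = 10, 3
leading_ok = all(
    forward_difference([K(n, q, e, x) for x in range(e + 1)], e) == (-q) ** e
    for e in range(6)
)
# One further difference vanishes, consistent with K_e having degree exactly e.
next_vanishes = all(
    forward_difference([K(n, q, e, x) for x in range(e + 2)], e + 1) == 0
    for e in range(6)
)
print(leading_ok, next_vanishes)
```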

Tuesday, Nov. 19: Introduction to the linear programming (LP) bounds on error-correcting codes

Thursday, Nov. 21: orthogonal polynomial basics

The Krawtchouk polynomials for given q and n are orthogonal. We shall apply some general results about orthogonal polynomials, which are classical but not well-known, at least not known as well as they would have been some decades ago. So most of this class will be devoted to developing the results we shall use. This material is still standard enough that special-purpose lecture notes should not be necessary; for references, I think the relevant chapters of Körner’s Fourier Analysis contain everything we shall use, while Szegö’s Orthogonal Polynomials has a more extensive treatment of the topic.

Definition/construction of orthogonal polynomials P_i
Roots of P_i are real, distinct, and in interior of support interval
Three-term recurrence and Darboux-Christoffel formula
Roots are interlaced (also for linear combinations of P_i and P_i+1)

[I switched the order of the last two because the three-term recurrence also gives an easy inductive proof of the interlacing property: given that P_i−1 changes sign between each pair of consecutive roots of P_i, we deduce the same for P_i+1, but with opposite sign (because P_i+1 is congruent mod P_i to a negative multiple of P_i−1); and then the signs at ±∞ give the two extra roots of P_i+1 beyond the smallest and largest roots of P_i.]

Tuesday, Nov. 26: LP bounds II: an asymptotic upper bound

We show that for fixed q and δ ∈ [0, (q−1)/q] the asymptotic rate R = log_q(M) / n of a code with error-detection rate δ = d/n is bounded by R log q ≤ H_q(ι), where H_q is the q-entropy function

H_q(ε) = ε log(q−1) − ε log ε − (1−ε) log (1−ε),
that we introduced back in mid-September, and ι = (1−1/q) − (1−2/q)δ − (2/q) [(q−1) (δ−δ²)]^½ is the smaller root of the quadratic q²(ι+δ)² − 4qιδ − 2q(q−1)(ι+δ) + (q−1)² = 0 (which decreases from ι = (q−1)/q at δ = 0 to ι = 0 at δ = (q−1)/q).
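The formula for ι and the resulting bound are easy to evaluate numerically. The following Python sketch (function names are illustrative; natural logarithms are assumed throughout) checks that ι satisfies the stated quadratic, reproduces the endpoint values, and computes the bound R ≤ H_q(ι)/log q.

```python
from math import log, sqrt

def iota(q, delta):
    """The smaller root iota of the quadratic relating iota and delta."""
    return (1 - 1 / q) - (1 - 2 / q) * delta - (2 / q) * sqrt((q - 1) * (delta - delta ** 2))

def quadratic(q, i, d):
    """q^2 (i+d)^2 - 4 q i d - 2 q (q-1) (i+d) + (q-1)^2."""
    return q * q * (i + d) ** 2 - 4 * q * i * d - 2 * q * (q - 1) * (i + d) + (q - 1) ** 2

def H(q, eps):
    """q-entropy with natural logarithms, extended by continuity to eps = 0 and eps = 1."""
    if eps <= 0:
        return 0.0
    if eps >= 1:
        return log(q - 1)
    return eps * log(q - 1) - eps * log(eps) - (1 - eps) * log(1 - eps)

def rate_bound(q, delta):
    """Upper bound H_q(iota) / log q on the asymptotic rate R."""
    eps = max(0.0, iota(q, delta))  # guard against tiny negative rounding errors
    return H(q, eps) / log(q)

for delta in (0.0, 0.1, 0.3, 0.5, 2 / 3):
    print(delta, round(iota(3, delta), 6), round(rate_bound(3, delta), 6))
```

At δ = 0 the bound is the trivial R ≤ 1, and at δ = (q−1)/q it drops to 0.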

[...]

When a system {P_i} of orthogonal polynomials does allow for a closed form for the triple products ⟨1, P_i P_j P_k⟩ = ⟨P_i P_j, P_k⟩ (which give the expansion of P_i P_j as a linear combination of the P_k), the resulting expression is sometimes called the “Adams formula”, because Adams gave such a formula for the Legendre polynomials (which are orthogonal with respect to dx on an interval). For the Krawtchouk polynomials, such a formula is available only in the symmetrical case q=2, when the generating function simplifies to (1 + YY' + YY'' + Y'Y'')ⁿ, in which each term is either zero or involves a trinomial coefficient.

The stationary-phase estimates on K_i(x) show further that the oscillatory region between the smallest and largest roots of K_i is also the region that contributes the most to ⟨K_i, K_j⟩: the normalized polynomial |S_x|^½K_i(x) oscillates with constant amplitude (to within factors n^O(1)) within that region, and decays exponentially outside it. [This is similar to the asymptotic behavior of the Hermite functions, obtained in the same way by normalizing the Hermite polynomials, which are orthogonal with respect to the weight function exp(−x²) on R.] Here is a Sage plot that illustrates this behavior for q=3, n=150, and i=10 and 11 (so also illustrating the interlacing property):

[Taking ι=10/150 and 11/150 in the relation between δ and ι at the transition points, and multiplying the resulting δ’s by 150, gives the estimates [61.4, 131.9] (i=10, red) and [59.5, 133.2] (i=11, blue) for the oscillatory region, which looks about right already for n=150; the transition gets sharper (as a fraction of n) as n grows. Here’s the code I wrote; Sage has documentation for an implementation of Krawtchouk polynomials, but I wasn’t able to use Krawtchouk(...) directly; fortunately my rudimentary skill in Sage programming was sufficient. The function K(n,q,i,x) computes K_i(x) itself; K1(n,q,i,x) computes a normalized polynomial (q⁻ⁿ |S_x|)^½K_i(x); and K2(n,q,i,x) normalizes further by dividing by ⟨K_i, K_i⟩^½ = |S_i|^½ to make ∑_{0≤x≤n} K2(x)² = 1.]
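The following is a plain-Python reconstruction of functions with the same names and normalizations as described above. It is a sketch, not the original Sage code, and it assumes the standard formula for K_i(x); the helpers `sphere` and `transition_region` are additions for illustration.

```python
from math import comb, sqrt

def K(n, q, i, x):
    """Krawtchouk polynomial K_i(x) at an integer 0 <= x <= n (standard formula, assumed)."""
    return sum((-1) ** j * (q - 1) ** (i - j) * comb(x, j) * comb(n - x, i - j)
               for j in range(i + 1))

def sphere(n, q, x):
    """|S_x|: the number of words at Hamming distance x from a fixed word."""
    return comb(n, x) * (q - 1) ** x

def K1(n, q, i, x):
    """Normalized value (q^-n |S_x|)^(1/2) K_i(x)."""
    return sqrt(sphere(n, q, x) / q ** n) * K(n, q, i, x)

def K2(n, q, i, x):
    """K1 divided by |S_i|^(1/2), so that the squares sum to 1 over 0 <= x <= n."""
    return K1(n, q, i, x) / sqrt(sphere(n, q, i))

def transition_region(n, q, i):
    """Estimated oscillatory region: n times the two roots delta for iota = i/n."""
    t = i / n
    mid = (1 - 1 / q) - (1 - 2 / q) * t
    half = (2 / q) * sqrt((q - 1) * (t - t * t))
    return n * (mid - half), n * (mid + half)

n, q = 150, 3
for i in (10, 11):
    total = sum(K2(n, q, i, x) ** 2 for x in range(n + 1))
    lo, hi = transition_region(n, q, i)
    print(i, round(total, 9), round(lo, 1), round(hi, 1))
```

The normalization uses the orthogonality relation ∑_x |S_x| K_i(x)² = qⁿ |S_i|, and the transition estimates reuse the quadratic relation between δ and ι, which is symmetric in the two variables.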

Tuesday, Dec. 3: The ternary Golay codes and related objects

Here are extended lecture notes for existence and uniqueness of the extended ternary Golay code G₁₂ and related objects: the order-12 Hadamard matrix, the sporadic Mathieu group M₁₂, and the perfect Golay code G₁₁ and the smallest sporadic Mathieu group M₁₁. (The lecture notes are “extended” because we won’t cover the development via Steiner systems in class; see my Math 155 notes for this approach; the relation between affine, inversive, and projective planes is developed in the notes for 17 February ’09.)

A couple of addenda:

1) We noted already that Leech’s construction of his lattice Λ₂₄ from G₂₄ has a G₁₂ analogue, with (A₂)¹² in place of Z²⁴; it is one of many miracles of Λ₂₄ and its symmetry group Co₀ that they accommodate both the (G₂₄, M₂₄) and the (G₁₂, M₁₂) structures.

2) [...]