Math 263x: Computational Techniques in Number Theory and Algebraic Geometry (Fall 2012)

Math 263x is a new “topics class” concentrating on some of the computational tools and techniques that can complement theoretical research in number theory, algebraic geometry, and related fields. We meet Mondays and Wednesdays from 1:00 PM to 2:00 PM in Room 222 of the Science Center.

If you find a mistake, omission, etc., please let me know by e-mail.

September 5: Introduction
September 10: Belyi maps and some of their uses
September 12: Start on computation of Belyi functions
September 17: More on Belyi polynomials etc.
September 19: Resultants
September 24: A cube minus a square
September 28 [sic]: Interlude on sorting and searching
October 1: Using multivariate (and usually p-adic) Newton's method
October 3: Multivariate p-adic Newton, cont'd
[October 8: No class: University holiday (Columbus Day)]
October 10: positive- [usually 1-]dimensional families
October 15: Curves of genus 0 through 5
October 17: Hyperelliptic curves; curves of genus 1; a Weil-Belyi function on an elliptic curve (and parametrizing 5-torsion etc.)
October 22: Overview of complex reflection groups and their invariant rings (which give rise to highly symmetric curves)
October 24: The Hilbert-Molien series of a group representation; A₄, S₄, A₅ acting on the Riemann sphere
Oct. 29: NO CLASS: Harvard cancelled all classes due to anticipated severe weather
October 31: Covariants of subgroups of GL₂ related to A₄ and S₄
November 5: Covariants of subgroups of GL₂ related to A₅; overview of the exceptional reflection groups of dimension 3 and higher
November 7: Hurwitz quaternions, the W(D₄) lattice, and the W(F₄) invariants; Belyi functions parametrizing trinomials with interesting Galois groups
Nov. 12 and 14: NO CLASS: I'll be in Lausanne, Switzerland
November 19: Introduction to (classical (elliptic)) modular curves
[November 21 is the start of Thanksgiving break]
November 26: Fricke and Atkin-Lehner involutions of X₀(N) and some of their uses
November 28: Non-Atkin-Lehner elements of the normalizer of Γ₀(N)

Wednesday, Sep. 5: Introduction

After outlining the general purpose and spirit of the class, we give an example that illustrates some of our concerns in a context that does not require most of the background that will be freely assumed later in the semester. The example is Fermat's celebrated two-squares theorem: a prime p can be written as a sum of two distinct squares if and only if p≡1 mod 4. The representation is unique up to switching the two summands. So take say p = 31415926535897932384626433832795028841. Fermat promises an essentially unique solution to the Diophantine equation p = x² + y².

HOW TO ACTUALLY FIND THIS SOLUTION?

Trying all x < p^1/2 works in finite time, but not “finite enough” even with the computer (and if/when the computers catch up I can double the number of digits in p…). One proof of the theorem almost yields an efficient algorithm, using an idea attributed to Cornacchia (1908): x/y is a square root of −1 mod p, and conversely given such a root we recover (x,y) in time ≪ log^c p by lattice reduction (which in two dimensions is basically the Euclidean algorithm). All the ingredients we used are already implemented in packages such as gp, so the resulting algorithm can be expressed by a one-liner such as

fermat(p) = qflll([lift(sqrt(Mod(-1,p))),p;1,0])[1,]

[Victor Miller 1992, transcribed some time later into the new gp syntax]. So for instance

fermat(p) = qflll([lift(sqrt(Mod(-1,p))),p;1,0])[1,]
#
fermat(31415926535897932384626433832795028841)

returns [4223562448517994405, -3684758713859920604] in about 0 ms (and this would even be feasible, if arduous, to do by hand).

Why did we write that this analysis “almost yields an efficient algorithm”? Well, how do we find the square root mod p? An embarrassment: it's easy to evaluate the Legendre symbol, but if it's +1 we generally don't know how to get a square root in deterministic polynomial time unless we assume the extended Riemann hypothesis for the Legendre character mod p, though we can do it in “random polynomial time”. However, modular square roots of small numbers can be evaluated in polynomial (albeit not practical) time by using the arithmetic of elliptic curves mod p! That was the application Schoof gave for his algorithm [ = René Schoof: Elliptic Curves over Finite Fields and the Computation of Square Roots mod p, Math. of Computation 44, pages 483–494 (1985)] for counting rational points on an elliptic curve mod p. In our case we count points on the curve Y² = X³ − X, which is relevant because it has complex multiplication by a square root of −1: the count is p+1±2x or p+1±2y, from which we recover the two-square representation in deterministic polynomial time.
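As a hedged illustration of the recipe above, here is a minimal Python sketch (Python rather than gp, and the plain Euclidean algorithm in place of qflll; the function names are ours, not from any package). It finds a square root of −1 mod p by trying small bases a until a^((p−1)/4) works, which is fast in practice but is not the deterministic polynomial-time method discussed above, and it assumes that p is a prime congruent to 1 mod 4.

```python
from math import isqrt

def sqrt_minus_one(p):
    """Return r with r*r % p == p-1, assuming p is prime and p % 4 == 1."""
    for a in range(2, p):
        r = pow(a, (p - 1) // 4, p)      # works as soon as a is a non-residue mod p
        if r * r % p == p - 1:
            return r
    raise ValueError("no square root of -1 found; is p a prime = 1 mod 4?")

def two_squares(p):
    """Return (x, y) with x*x + y*y == p, using the Euclidean algorithm."""
    a, b = p, sqrt_minus_one(p)
    while b * b > p:                     # run Euclid until the remainder drops below sqrt(p)
        a, b = b, a % b
    x, y = b, isqrt(p - b * b)
    if x * x + y * y != p:
        raise ValueError("Euclidean step failed; p may not be prime")
    return x, y

p = 31415926535897932384626433832795028841
x, y = two_squares(p)
print(x, y, x * x + y * y == p)
```

The search over a is the only step without a proven deterministic polynomial bound; everything after it is a gcd computation.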

Monday, Sep. 10: Belyi maps [unramified covers of P¹ − {0,1,∞}] and some of their uses

See Serre's Topics in Galois Theory (Boston: Jones & Bartlett 1992) for the application to the inverse Galois problem (perhaps the best-known arithmetic application) and other results concerning Belyi functions. In algebraic geometry, such functions might be most famous for the equality case in the Hurwitz bound of 84(g−1) on the number of automorphisms of a Riemann surface (a.k.a. algebraic curve over C) of genus g>1: if C attains this bound, or more generally has more than 12(g−1) automorphisms, then the quotient map C→C/Aut(C) is a Belyi function.

Such functions appear surprisingly often in other contexts; one of these years I might write an article on the ubiquity of Belyi functions. For now, I give references and/or links to some of the places where I've run across Belyi functions over the years:

• ABC implies Mordell, International Math. Research Notices 1991 #7, 99–109 [bound with Duke Math. J. 64 (1991)].
• The Klein Quartic in Number Theory (1998, in the MSRI volume The Eightfold Way on Klein's quartic curve x³y + y³z + z³x = 0)
• “slides” from a 1999 talk at MSRI on “Other Arithmetic Manifestations of Branched Covers”
• Shimura curve computations (1998) [especially the curves associated to groups commensurate with arithmetic triangle groups]
• Rational points near curves and small nonzero |x³−y²| via lattice reduction (2000) [see the start of Section 4, pages 22–25; some of the other material here will figure later in the course]
• Trinomials ax⁷+bx+c and ax⁸+bx+c with Galois Groups of Order 168 and 8·168 (with Nils Bruin), Lecture Notes in Computer Science 2369 (proceedings of ANTS-5, 2002; C.Fieker and D.R.Kohel, eds.), 172–188.
• My HCMR article on “The ABC’s of Number Theory” (2007)

Wednesday, Sep. 12: Start on computation of Belyi functions

A bit more detail on the topology: a Belyi map C → P¹(C) of degree n is determined by permutations g₀, g₁, g_∞ that satisfy g₀g₁g_∞=id and generate a transitive group G of permutations of the n sheets. This group is then the Galois group of the Galois closure of the function-field extension C(C) / C(t) associated to the cover (where t is a coordinate on P¹(C)). Warning: if the cover is defined over a field F that is not algebraically closed then one might have to first take an extension of this ground field before obtaining a function-field extension with Galois group G; this is already seen for the (2-point!) Belyi cover t = zⁿ if n > 2 and F is a field like Q that does not contain the nth roots of unity. Also, the g_i are defined only up to conjugation in the normalizer of G in S_n. Distinct solutions might still be algebraically conjugate because the generators of π₁(P¹(C) − {0,1,∞}) are not canonical. The number of solutions of g₀g₁g_∞=id given the G-conjugacy classes of the g_i can be computed from the character table (see again Serre), though checking whether a given solution actually generates G can be trickier. It has been done for enough examples to show (together with more theory about fields of definition, plus Hilbert's specialization theorem) that every sporadic group except possibly M₂₃ is the Galois group of infinitely many extensions of Q! Some of these extensions are so big that we don't expect to ever see them, but for smaller groups like M₁₁ (and interesting non-sporadic groups) we can actually compute the Belyi covers and specialize to find explicit extensions.

Start on computing explicit examples. We'll start with some of the simpler cases, where C is rational and g_∞ is an n-cycle, so the map is a polynomial. Some of these cases we have already seen: the two-point covers t = cxⁿ (where g₁ is trivial) and t = cx^a₁(1−x)^a₂ where a₁+a₂=n and c is chosen to put the image of the critical point a₁/n at t=1. Thus g₀ is the product of cycles of length a₁ and a₂, and g₁ is a simple transposition.

Next we might make g₀ the product of three cycles, of lengths a₁, a₂, and a₃, so we have t = c x^a₁ (x−1)^a₂ (x−w)^a₃ and must also choose the parameter w. For generic w, there are two critical points other than 0, 1, and w, namely the roots of the quadratic in the numerator of a₁/x + a₂/(x−1) + a₃/(x−w). There are two ways to make this a Belyi function: either the roots coincide, in which case g₁ is a 3-cycle, or they are distinct but mapped to the same t, making g₁ a double transposition. The former is simpler, so we'll start with that case. The roots coincide if and only if the discriminant of the quadratic in x vanishes. We calculate that this discriminant is (a₁+a₂)² w² − 2(a₁n−a₂a₃) w + (a₁+a₃)². So there are two solutions of w, and indeed we can see directly that there are two solutions of g₀g₁g_∞=id up to conjugacy in the specified conjugacy classes, provided the a_i are distinct. If not, there's a single solution, but we might not be able to put the zeros at x=0, 1, w because two (or all three) might be algebraic conjugates. Example: n=5, a₁=3, a₂=a₃=1 yields

x³ (x²+15x+60) = (x+6)³ (x²−3x+6) − 6⁴ so −x³(x²+15x+60)/6⁴ [or equivalently x³(x²−15x+60)/6⁴] is a quintic Belyi polynomial with group A₅. In any case the solutions of the quadratic in w cannot be rational, or even real, because the discriminant of that quadratic is −16a₁a₂a₃n < 0.

In general if g₀ is the product of m+1 cycles and g₁ is an (m+1)-cycle (with the other n−m−1 points fixed) we expect m! distinct Belyi maps assuming the cycle lengths are pairwise distinct, but a unique map if all but one of the cycle lengths are the same. Exercise: Show how to find the corresponding unique map algebraically; what happens if (m+1)|n and all m+1 cycles are of the same length?

Monday, Sep. 17: More on Belyi polynomials etc.

POSTSCRIPT to the three-cycle analysis last time: one could also see directly that such a polynomial cannot exist over R, because its logarithmic derivative a₁/x + a₂/(x−1) + a₃/(x−w) would be a decreasing function of x and thus couldn't have a real critical point. On the other hand, it is possible to have real, and even rational, solutions of “−16a₁a₂a₃n = square” if we allow some a_i to be negative, and this yields Belyi functions of a different kind, where g₁ is still a 3-cycle but each of g₀ and g_∞ is a product of two cycles, say of lengths a₁, a₂ and b₁, b₂ where a₁+a₂ = b₁+b₂ = degree of the rational function. The Diophantine equation

a₁+a₂ = b₁+b₂, a₁a₂b₁b₂ = d² gives a double cover of P² that is a rational Del Pezzo surface; for instance, we may choose any rational numbers for r:=a₁/a₂, s:=b₁/b₂ subject to rs=square, and then solve a₁+a₂=b₁+b₂. The first few nontrivial solutions give three Belyi functions of degree 10, with {a₁, a₂} and {b₁, b₂} any two of {1,9}, {2,8}, and {5,5}. For instance, if we choose {1,9} and {2,8} then our function has the form P(x) = x⁹(x+w)/(x+1)² for some w, and then the condition that P’/P have a double root yields w=2 or w=50/49. [NB in each case w and w−1 are S-units for S={2,3,5,7}, as expected by Beckmann.] Exercise: Verify these values of w, and check that in the first case we obtain an identity x⁹(x+2) + (3⁹/2⁸) (x+1)² = (2x+3)³ · septic(x) where the septic has S-unit discriminant (indeed a {2,3}-unit in this case). Obtain the analogous identity for w=50/49, and/or for one or more of the functions with b₁=b₂=5. END POSTSCRIPT
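The two values of w in the {1,9}, {2,8} example can be checked with exact rational arithmetic. The sketch below is only an illustration (the helper names are ours): it evaluates the discriminant (7w+10)² − 288w of the quadratic 8x² + (7w+10)x + 9w, which is the numerator of P’/P for P(x) = x⁹(x+w)/(x+1)² by our own expansion, and it confirms that for w=2 the polynomial 2⁸x⁹(x+2) + 3⁹(x+1)² vanishes to order exactly 3 at x = −3/2.

```python
from fractions import Fraction

def disc_numerator(w):
    """Discriminant in x of 8x^2 + (7w+10)x + 9w, the numerator of P'/P."""
    return (7 * w + 10) ** 2 - 4 * 8 * 9 * w

def derivative(coeffs):
    """Derivative of a polynomial given by coefficients, lowest degree first."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial, lowest degree first."""
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * x + c
    return value

def vanishing_order(coeffs, x):
    """Number of successive derivatives (starting with the 0th) that vanish at x."""
    order = 0
    while coeffs and evaluate(coeffs, x) == 0:
        coeffs = derivative(coeffs)
        order += 1
    return order

# N(x) = 2^8 x^9 (x+2) + 3^9 (x+1)^2, coefficients from x^0 up to x^10
N = [3**9, 2 * 3**9, 3**9] + [0] * 6 + [2 * 2**8, 2**8]

print(disc_numerator(Fraction(2)), disc_numerator(Fraction(50, 49)))
print(vanishing_order(N, Fraction(-3, 2)))
```

The same helpers can be reused for w=50/49 once the corresponding constant has been worked out, which is left to the exercise.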

Before proceeding to the case that g₁ is a double transposition, consider the generalization where g₁ is an (m+1)-cycle and g₀ is the product of m+1 cycles of lengths a₁, a₂, …, a_(m+1). That is, P has m+1 distinct roots of multiplicities a₁, a₂, …, a_(m+1), and P−1 has a root of multiplicity m+1. As before, this is equivalent (modulo scaling P) to the condition that the logarithmic derivative P’/P have a root of multiplicity m (note that the numerator of P’/P is a polynomial of degree m).

Suppose first that the a_i are distinct. Then the roots of P are in the field of coefficients of P. As before, once m>1 this field cannot be Q, or indeed any subfield of R, because P’/P is monotone decreasing; but we can still ask to compute those roots as algebraic numbers. There are m! possibilities: starting with the n-cycle, we must choose m+1 of its vertices to divide the circumference into segments of lengths a_i in any order, and there are m! choices up to rotation along the cycle. (If the points moved by g₁ were not in cyclic order on g_∞ then g₀ would have fewer than m+1 cycles and the covering curve would have positive genus. Cf. the extensive literature on Grothendieck's “dessins d’enfant”.) So we write

P(x) = x^(a₁) Π_(1≤i≤m) (x+w_i)^(a_(i+1)) for some distinct nonzero w_i. [Note that we do not insist on scaling these to put w₁ at 1, to retain the symmetry among the roots of P, which is parametrized by the point (w₁ : w₂ : … : w_m) in projective (m−1)-space; permutations of the roots act on this space by projective linear transformations, and the subgroup that fixes the first root acts by coordinate permutations.] Then the numerator of P’/P is a homogeneous polynomial of degree m in x and the w_i. It soon follows that the condition that this numerator be an m-th power amounts to m−1 homogeneous equations in the w_i, of degrees 2, 3, …, m. Since we already know to expect m! = 2 · 3 · · · m solutions, these solutions must constitute the complete intersection of the corresponding m−1 hypersurfaces in P^(m−1). [Interlude about resultants, elimination theory, Gröbner bases, etc. went here; more about resultants next time.]

Recall that we postponed till later the case that g₀ is the product of only three cycles, so t = c x^a₁ (x−1)^a₂ (x−w)^a₃ for some w (where a₁, a₂, and a₃ are the cycle lengths), but g₁ is a double transposition rather than a 3-cycle. In this case the count of solutions of g₀g₁g_∞=id grows with the degree n even without increasing the number of cycles in g₀, so we expect that typically w and thus P will have to be in ever larger number fields, but can still ask how to compute them.

In this setting the critical points x₁, x₂ at the roots of a₁/x + a₂/(x−1) + a₃/(x−w) are distinct but satisfy P(x₁) = P(x₂) [and we can normalize the common value c to 1 by multiplying P by c⁻¹ to put the third branch point at t=1]. Let Q(x) be the quadratic polynomial with roots x₁, x₂. We could solve Q(x)=0 to find x₁, x₂ as algebraic functions of w, and then work out what P(x₁) = P(x₂) means as an equation for w; equivalently, and less laboriously, we could ask that the remainder of P mod Q be a constant polynomial: the linear coefficient is some function of w, which we would set to zero. But this would still yield an unnecessarily complicated equation, because the values of w that make x₁=x₂ would arise as spurious solutions. Instead we exploit the fact that P must be congruent to a constant not just mod Q but even mod Q², and this condition would fail for x₁=x₂ (when P is congruent to c only mod Q^3/2). So we'll use for our equation the vanishing of the x³ coefficient of P mod Q². Exercise: is the vanishing of this coefficient already enough to assure that P mod Q² is a constant polynomial, or might we have to then impose the further conditions that the x² and x coefficients vanish as well?

Consider for example the case that n=6 and (a₁, a₂, a₃) = (4,1,1). As we already saw, the coincidence a₂=a₃ lets us simplify P to the form Cx⁴(x²+ax+b) for some nonzero C and parameters a, b determined up to scaling to λa, λ²b. Even so, we expect two inequivalent solutions [corresponding to the two isomers of cyclohexadiene]. This is one of the simplest cases with a double transposition. Here it turns out that for each solution the Galois group is properly contained in S₆ (though it cannot be the alternating group A₆ or a subgroup of A₆, because g₀ and g_∞ are odd permutations). When the two “double bonds” are adjacent, we get PGL₂(F₅) (a.k.a. the (3-)transitive copy of S₅ in S₆, a.k.a. the image of the point stabilizer in S₆ under an outer automorphism of S₆); when the two “double bonds” are opposite, we have the imprimitive 48-element subgroup of S₆, isomorphic to the symmetries of the octahedron acting on its six vertices as the stabilizer of the partition into three opposite pairs.

Indeed we find that Q(x) = 6x² + 5ax + 4b, and then that the x³ coefficient of P mod Q² is 144ab − 100a³. Thus either a = 0 or 36b = 25a². The solution a = 0 makes P a polynomial in x², which corresponds to the imprimitive solution. Thus the other case must be the PGL₂(F₅) cover. A convenient choice of scaling is (a, b) = (6,25), giving the identity 27x⁴(x²+6x+25) = (3x²−12x+20) (3x²+15x+50)² − 50000.
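The reduction of P modulo Q² is easy to reproduce with exact arithmetic. The following is a minimal sketch, not the computation done in class: the helpers poly_mul, poly_rem and remainder_mod_Q_squared are our own names, polynomials are coefficient lists with the constant term first, and the optional scale factor only rescales P.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Product of two polynomials (coefficient lists, lowest degree first)."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_rem(f, g):
    """Remainder of f modulo g, with exact rational coefficients."""
    f = [Fraction(c) for c in f]
    while len(f) >= len(g):
        q = f[-1] / g[-1]                 # cancel the leading term of f
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] -= q * c
        f.pop()                           # the leading coefficient is now zero
    return f

def remainder_mod_Q_squared(a, b, scale=1):
    """Remainder of scale*x^4(x^2+ax+b) modulo (6x^2+5ax+4b)^2, lowest degree first."""
    P = [0, 0, 0, 0, scale * b, scale * a, scale]
    Q = [4 * b, 5 * a, 6]
    return poly_rem(P, poly_mul(Q, Q))

print(remainder_mod_Q_squared(6, 25, scale=27))   # the scaling (a, b) = (6, 25)
print(remainder_mod_Q_squared(1, 1)[3])           # x^3 coefficient for a generic (a, b)
```

For (a, b) = (6, 25) the remainder is the constant in the displayed identity, while a pair with a ≠ 0 and 36b ≠ 25a² leaves a nonzero x³ coefficient.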

Exercises:
i) Since we just got a sextic cover with Galois group S₅, there must also be a Belyi map of degree 5 giving the same Galois closure. The cycle structures are 32 / 41 / 221 (corresponding to the S₆ cycle structures 6 / 411 / 2211). Find the Belyi map.
ii) What happens for the Belyi polynomials of degree 7 for which g₁ is a double transposition and g₀ has shape 331 or 421?

Wednesday, Sep. 19: Resultants

POSTSCRIPT on some Belyi polynomials of degree at most 7 for which g₁ is a double transposition: the examples of degree 6 aren't quite the smallest, but we've in effect seen all those of smaller degree, with g₀ and g₁ interchanged (which switches P with 1−P). Thus the only degree-4 possibility is 4 / 211 / 22, where g₀ is a simple transposition, and in degree 5 one of the two possibilities is 5 / 311 / 221, where g₀ is a 3-cycle. This leaves only 5 / 221 / 221, but when g₀ and g₁ are both involutions G must be dihedral and we get a cover equivalent to a Čebyšev polynomial. Here we can calculate directly the identity x (x²+5x+5)² + 4 = (x+4) (x²+3x+1)²; translating x by 2 yields the equivalent form (x−2) (x²+x−1)² + 4 = (x+2) (x²−x−1)²; and then replacing x by 2x and dividing P by 2 yields the fifth Čebyšev polynomial 16x⁵ − 20x³ + 5x which is ramified above ±1, each of which has one simple and two double preimages.
As for degrees 6 and 7…
• In degree 6, besides the cases with cycle structures 6 / 411 / 2211, there's also 6 / 222 / 2211 and 6 / 321 / 2211. The former again has G dihedral (and also imprimitive in this case, since 6 is composite), giving rise to a Čebyšev polynomial. For the latter, there are three solutions, all generating the symmetric group S₆ (the alternating group is excluded because g₀ and g_∞ are odd permutations): writing P(x) = c x³ (x+1)² (x+w), we compute that w must be one of the three roots of 25w³ − 12w² − 24w − 16, an irreducible polynomial that happens(?) to give the field Q(2^1/3); explicitly w = 2^4/3 / (3−2^1/3). [gp's nfdisc reported that disc(Q(w)/Q) = −108, which sufficed to identify the field with Q(2^1/3); there are various ways then to find w as an element of that cubic field, and then it's known that for any cubic extension K/F any two elements of K\F are related by a unique fractional F-linear transformation that can be found by linear algebra.]
• 7 / 331 / 22111: here G is necessarily the 168-element group; the polynomial is defined over Q((−7)^1/2). The Galois closure is the Klein quartic, and x is a rational coordinate on the quotient of this quartic by one of the two kinds of index-7 subgroups of G (both isomorphic with S₄ and switched by an outer automorphism of G). While the Klein quartic can be defined over Q, its automorphism group cannot. Again I refer to my article on the Klein quartic.
• 7 / 421 / 22111: here P(x) = c x⁴ (x+1)² (x+w) and there are four possibilities for w, but they are not all conjugate: two are the roots of 2w²+w+1, which again generate Q((−7)^1/2), and the others are roots of 27w²−18w−25 and generate Q(21^1/2). Indeed there are two possible Galois groups, G₁₆₈ and the full alternating group A₇; which one corresponds to which pair of w's? [...]
END POSTSCRIPT

Motivation, definition, and properties of resultants of univariate polynomials, which we'll use to eliminate one of two variables when we've brought one of these calculations down to solving two simultaneous nonlinear equations. The Sylvester matrix of P and Q has corank equal to the degree of gcd(P, Q), as can be seen by identifying the row kernel with {(A, B) : deg(A) < deg(Q), deg(B) < deg(P), AP+BQ = 0}. If ξ is a common zero of P and Q then the column vector (ξⁿ⁻¹, ξⁿ⁻², …, ξ², ξ, 1) (where n = deg(P) + deg(Q) is the matrix size) is in the kernel. This fully accounts for the kernel if gcd(P, Q) has distinct roots. What happens if there are some roots of multiplicity 2 or greater?
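To experiment with these statements, one can build the Sylvester matrix directly and row-reduce it over Q. The sketch below is a plain illustration with our own helper names (in gp one would more likely call a built-in resultant routine); polynomials are coefficient lists with the leading coefficient first, so that the rows are shifted copies of P and Q.

```python
from fractions import Fraction

def sylvester(P, Q):
    """Sylvester matrix of P and Q (coefficient lists, highest degree first)."""
    m, n = len(P) - 1, len(Q) - 1
    rows = [[0] * i + P + [0] * (n - 1 - i) for i in range(n)]    # n shifted copies of P
    rows += [[0] * i + Q + [0] * (m - 1 - i) for i in range(m)]   # m shifted copies of Q
    return rows

def rank_and_det(matrix):
    """Rank and determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in matrix]
    size, rank, det = len(A), 0, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(rank, size) if A[r][col] != 0), None)
        if pivot is None:                 # no pivot in this column: matrix is singular
            det = Fraction(0)
            continue
        if pivot != rank:
            A[rank], A[pivot] = A[pivot], A[rank]
            det = -det
        det *= A[rank][col]
        for r in range(rank + 1, size):
            factor = A[r][col] / A[rank][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank, det

# x^2+1 and x-1 are coprime; the next two pairs share (x-1) and (x-1)^2 respectively
print(rank_and_det(sylvester([1, 0, 1], [1, -1])))
print(rank_and_det(sylvester([1, -3, 2], [1, -4, 3])))
print(rank_and_det(sylvester([1, -1, -1, 1], [1, -4, 5, -2])))
```

The determinant returned in the coprime case is the resultant; in the other two cases the corank (matrix size minus rank) can be compared with the degree of the gcd.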

Exercise: Use this to find the Belyi polynomials of degree 11 with cycle structures 3³1² and 2⁴1³ (a.k.a. 33311 and 2222111) above 0 and 1. [Start by writing P(x) = C (x³+ax+b)³ (x²+cx+d) ≡ constant mod Q² where Q(x) = P’(x) / (x³+ax+b)².] What's going on?

Monday, Sep. 24: A cube minus a square

POSTSCRIPT re degree-11 Belyi polynomials: for 11 / 3³1² / 2⁴1³ there are 10 possibilities, in two Galois orbits of sizes 2 and 8. The larger gives G = A₁₁, but the pair gives functions defined over Q((−11)^1/2) for which G is the simple group of 660 elements, isomorphic with PSL₂(F₁₁). [This is the last of Galois' list of subgroups of index p in PSL₂(F_p), here isomorphic with A₅; we already saw the S₄ and A₄ cases, for p=7 and p=5 respectively. There's also the case of the index-6 subgroup of PSL₂(F₉) ≅ A₆.] Curiously a very similar thing happens if we change the cycle structure of g₀ to 11 / 4²1³ / 2⁴1³, as Andrew Sutherland tried: again 10=2+8 possibilities with the larger orbit giving G = A₁₁, and the smaller orbit defined over Q((−11)^1/2), but here the quadratic orbit gives G = M₁₁, the smallest Mathieu group! Both of these appear in the list of Galois groups of polynomials in Müller's paper: see the first item under (ii) and the second under (iii) in the statement of the main theorem (between pages 3 and 4). The appearance of Q((−11)^1/2) in both cases is no coincidence: it ultimately comes down to the fact that in both PSL₂(F₁₁) and M₁₁ the 11-cycle g_∞ is conjugate with its k-th power iff k is a quadratic residue of 11, even if we consider conjugacy in the normalizer of the group in S₁₁. The quadratic extension Q((−11)^1/2) arises as the subfield of the 11th cyclotomic field Q(μ₁₁) fixed by the squares in Gal(Q(μ₁₁) / Q) = (Z/11Z)^*.
END POSTSCRIPT

We begin our consideration of the important special case where g₀ and g₁ are products of 3- and 2-cycles with few or no fixed points, corresponding to A + B = C where B and C are nearly a square and nearly a cube (“nearly” = within low-degree factors). Such identities arise in connection with Hurwitz curves (and other highly symmetric Riemann surfaces), and also in connection with Hall's conjecture (which asserts that for integers x, y either x³ = y² or |x³ − y²| ≫_ε x^(½−ε)). Rather more structurally, because the discriminant of a cubic X³+pX+q is −(4p³+27q²), pairs (P, Q) of polynomials for which P³−Q² has many repeated factors often appear in connection with elliptic and modular curves and elliptic surfaces.(*) Sometimes these connections overlap: the identity (x²+10x+5)³ − 1728x = (x²+4x−1)² (x²+22x+125) yields (via a “Pell equation”) the only known infinite family of solutions of 0 < |x³ − y²| ≤ x^½ [Danilov 1982]; the associated Belyi map, with exponents 51 / 33 / 2211, also gives the cover X₀(5) → X(1) of modular curves (a connection noted in [NDE 1998]: if x = 125(η(5τ) / η(τ))⁶ = w₅(η(τ) / η(5τ))⁶ then j(τ) = (x²+10x+5)³/1728x); and the Galois closure X(5) is the icosahedral Galois cover P¹ → P¹/A₅ ≅ P¹.

(*) Recall that any cubic polynomial X³+aX²+bX+c can be brought into the form X³+pX+q using the change of variable X ← X − a/3, though sometimes another choice of translate is more useful. For example, Hall's identity with deg(P)=8, deg(Q)=12, and deg(P³−Q²)=5 has P(x) = 4(x⁸ − 2x⁷ + 7x⁶ − 6x⁵ + 11x⁴ + 4x³ + 12x + 1) [without the factor of 4 the polynomial Q would not quite be in Z[x]; Hall actually used P(x+1), which has larger coefficients]. Then Q is the truncation of the Laurent expansion of P^3/2 about x = ∞ (the x^−k coefficients in this expansion vanish for 1 ≤ k ≤ 6). But it's much simpler to write the corresponding cubic as X³ + (x⁴−x³+3x²+1) X² − 2(x³−x²+2x) X + (x²−x+1), and this actually reflects a natural approach to finding these P and Q.

Friday, Sep. 28: Interlude on sorting and searching

[There was no class Wednesday the 26th due to the Yom Kippur fast, so I offered to lecture Friday instead; lacking a quorum to proceed with Newton's method, I substituted a tangential lecture on some applications of sort-and-search in computational number theory]

A basic problem: given a list x_i (1≤i≤N), find all coincidences x_i = x_j with i ≠ j. This seems to require (N²−N)/2 pairwise tests, but in fact can be done in time O(N log N) by first sorting the list and then looking for consecutive matches. The key fact that sorting takes time only O(N log N) is nontrivial but has been known for some time (see e.g. the beginning of Volume III of Knuth's The Art of Computer Programming). So is the fact that O(N log N) is optimal if each step can only compare two list elements (because it takes at least log₂(N!) comparisons to distinguish all N! possible orderings; see “bucket sort” and, rather more facetiously, “spaghetti sort” for models of computation that allow for O(N) sorting). This reduction from order N² to N log N makes sorting a powerful computational tool in various contexts, despite the additional space cost (usually all N of the x_i must be stored at once, even when x_i is given by a function of i, when it takes practically no space to test x_i = x_j for each pair {i, j}). A typical application is finding all solutions of f (i) = f (j) with 1≤i<j≤N, or all solutions of f (x)=g(y) with x and y in different sets. Unix utilities such as sort, uniq, and comm can be very helpful here because they implement the relevant algorithms in an efficient and often convenient form; when the files aren't in quite the right format, grep, sed, and (for more complicated tasks) editors like vi and emacs, can provide the needed pre/inter/post-processing. For example, it is easy to find in a wordlist two common words that are the same except that one begins with C and the other with U (and are more familiar and longer than crease/urease). [Here's the solution.]
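A small sketch of the sort-then-scan idea, in Python rather than with the Unix utilities mentioned above: the toy problem (our choice, not from the lecture) is to find integers that are sums of two positive cubes in more than one way, with both cubes at most 16³. The function name coincidences and the bound B are ours.

```python
def coincidences(pairs):
    """Sort (value, label) pairs and return the values that occur at least twice."""
    found = []
    pairs = sorted(pairs)                       # O(N log N); equal values become adjacent
    start = 0
    for end in range(1, len(pairs) + 1):
        if end == len(pairs) or pairs[end][0] != pairs[start][0]:
            if end - start > 1:                 # a run of equal values
                labels = [label for _, label in pairs[start:end]]
                found.append((pairs[start][0], labels))
            start = end
    return found

B = 16
sums = [(x**3 + y**3, (x, y)) for x in range(1, B + 1) for y in range(x, B + 1)]
for value, labels in coincidences(sums):
    print(value, labels)
```

Only one linear pass is needed after the sort, at the cost of holding the whole list in memory, which is the space trade-off described above.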

Besides its application to this kind of wordplay, fast sorting finds use in some computational problems in number theory. A typical example is the search for elliptic curves E/Q with many small integral points (“small” meaning comparable with the coefficients of E; that is, x≪H² and y≪H³ if the coefficients a_i are O(Hⁱ)). In this ANTS-6 (2004) paper with Mark Watkins we did this as a heuristic proxy for large Mordell-Weil rank. The factorization trick to (in essence) find y, y’ and B given x, x’ and A in x³ + Ax + B − y² = x’³ + Ax’ + B − y’² = 0 will recur later this term when/if we reach parametrizations of elliptic surfaces with two sections. Another kind of application is to searching for solutions of Diophantine equations of the form f (x, y) = g(z, t). When f is itself of the form f (x, y) = f₁(x) + f₂(y), and the same is true of g, there are further nontrivial improvements that reduce the space requirement from the expected order of N² down to N; see for instance Dan Bernstein's sortedsums page. (It still takes N² time, though there's no factor of log N because the lists of f₁, f₂, g₁, g₂ values can be pre-sorted in time only N log N.)

Monday, Oct. 1: Using multivariate (and usually p-adic) Newton's method

For a paradigm we shall use Hall's identity, already exhibited at the end of last Monday's notes: deg(P³−Q²)=5, where P is the octic P(x) = 4(x⁸ − 2x⁷ + 7x⁶ − 6x⁵ + 11x⁴ + 4x³ + 12x + 1), and Q is the truncation of the Laurent expansion of P^3/2 about x = ∞, whose x^−k coefficients vanish for 1 ≤ k ≤ 6. This is the first primitive example of an ABC identity deg(P³−Q²) = m+1 with deg(P) = 2m and deg(Q) = 3m; this can also be seen combinatorially from the structure of permutations g₀, g₁, g_∞ of 6m objects that satisfy g₀g₁g_∞=id and have cycle structures 3^(2m), 2^(3m), 1^(m+1)(5m−1); they correspond to trees on 2m vertices (the 3-cycles), each of degree 1 or 3, embedded in the plane (so each cubic vertex has an orientation, and therefore not quite the same as the azanes N_(m−1)H_(m+1), which do not come with a planar embedding, and much different from boranes, which turn out to be much more complicated than ball-and-stick chemistry would suggest). Hall's example has m=4; see below for m=1,2,3.
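Hall's identity can be confirmed mechanically from the octic alone. The sketch below (our own helper names, exact arithmetic with Python's fractions module) computes the polynomial part Q of the expansion of the square root of P³ at infinity by matching the top coefficients of Q² against P³ one at a time, and then reports the degree of P³ − Q².

```python
from fractions import Fraction
from math import isqrt

def poly_mul(f, g):
    """Product of two polynomials (coefficient lists, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def sqrt_polynomial_part(A):
    """Polynomial part Q of the Laurent expansion of sqrt(A) at infinity.

    A has even degree 2d and integer coefficients with a perfect-square
    leading coefficient; Q is determined by deg(A - Q^2) < d.
    """
    d = (len(A) - 1) // 2
    q = [Fraction(0)] * (d + 1)
    q[d] = Fraction(isqrt(A[-1]))
    for k in range(1, d + 1):
        # contribution to the x^(2d-k) coefficient of Q^2 from coefficients already known
        known = sum(q[i] * q[2 * d - k - i] for i in range(d - k + 1, d))
        q[d - k] = (A[2 * d - k] - known) / (2 * q[d])
    return q

def degree(f):
    """Degree of a coefficient list, or -1 for the zero polynomial."""
    return max((i for i, c in enumerate(f) if c != 0), default=-1)

# P(x) = 4(x^8 - 2x^7 + 7x^6 - 6x^5 + 11x^4 + 4x^3 + 12x + 1), constant term first
P = [4 * c for c in [1, 12, 0, 4, 11, -6, 7, -2, 1]]
P3 = poly_mul(poly_mul(P, P), P)
Q = sqrt_polynomial_part(P3)
diff = [a - b for a, b in zip(P3, poly_mul(Q, Q))]
print(degree(Q), degree(diff))
```

By construction the difference has degree below 12 for any octic; the content of the identity is that for this particular P the degree drops all the way to 5.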

For any m, we can normalize P and Q to be monic, and require the Laurent expansion of P^3/2 to match Q to within O(x^(1−2m)); that is, the x^−k coefficient must vanish for 1 ≤ k ≤ 2m−2. Each coefficient is a polynomial in the 2m non-leading coefficients of P, call them c_i (i = 0, 1, 2, … 2m−1), so it might seem we have 2 equations too few; but in fact there are only 2m−2 parameters for P once we account for the two-dimensional group of affine linear transformations (the “ax+b group”). We can choose a canonical translation by setting the x^(2m−1) coefficient of P, and thus also the x^(3m−1) coefficient of Q, to zero (these are the next-to-leading coefficients; for m=1 this is the familiar “completing the square”). This leaves one dimension of equivalences, scaling x to ax for some nonzero a; this multiplies each c_i by a^(2m−i). We can thus interpret (c_(2m−2) : c_(2m−3) : c_(2m−4) : … : c₁ : c₀) as a point in (2m−2)-dimensional weighted projective space with weights (2, 3, 4, …, 2m−1, 2m), and are looking for a point in the intersection of (2m−2) hypersurfaces, with the vanishing of the x^−k coefficient corresponding to a hypersurface of weighted degree 3m+k.

We encountered such a problem before (see the end of the Sep.12 notes), where the number of solutions was given by Bézout's product formula. But here such a formula would grossly overestimate the number of solutions because there's an (m−2)-dimensional variety of spurious solutions (P, Q) = (R², R³). Indeed it is not immediately obvious that there are non-spurious solutions; their existence follows from the topological construction of Belyi functions, but that still leaves the question of computing the polynomials explicitly. This is easy for the first few m, especially since the first few equations must be linear in the first few coefficients (counting from c₀), which lets us solve for the first m/3+O(1) coefficients as rational functions of the remaining 5m/3 or so. But then we're left with a complicated system of nonlinear equations. m=4 is the first case where resultants fail us: the first equation has weight 13, so is linear in c₀ (weight 8), but the next equation (weight 14) is already quadratic in c₁ (weight 7), and there are still four more dimensions to go. What to do? [Since we're using this example as a paradigm, we pretend that we don't know how to exploit the special “cube minus a square” structure; but even those tricks give out before long, leaving us with even more complicated simultaneous equations in four or more variables.]

We'll exploit our foreknowledge that the solutions must be rational numbers. (We shall soon see how to adapt this method to solutions that are not necessarily rational but still algebraic of small degree.) We do not know in advance how complicated these rational numbers must be (in the present case we at least pretend not to know…), but we hope that they're simple enough that we can approximate them closely enough to guess the numerator and denominator, and then confirm the guess by checking that we have an exact solution of our system of equations. The complexity of a rational number c is measured by its height H(c): if c=r/s in lowest terms then H(c) = max(|r|,|s|). The number of rational numbers c with H(c) ≤ M is asymptotically proportional to M², so we need enough precision to tell at least that many numbers apart, i.e. more than 2 log H(c) digits; equivalently, c must be known to within O(H⁻²). Fortunately that necessary level of accuracy is also enough to recover r and s, and not just in theory but practically, using continued fractions, a.k.a. the Euclidean algorithm. This is a common enough application that it is often implemented as a pre-existing routine, see e.g. bestappr in gp.
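For readers working in Python rather than gp, the standard library offers a comparable continued-fraction routine, Fraction.limit_denominator. The sketch below (the wrapper name recover_rational and the sample numbers are ours) recovers 50/49 from an eight-decimal approximation; the bound 1000 on the denominator is an assumption standing in for a guess at the height.

```python
from fractions import Fraction

def recover_rational(approx, height_bound):
    """Closest rational to approx whose denominator is at most height_bound."""
    return Fraction(approx).limit_denominator(height_bound)

approx = Fraction(102040816, 10**8)     # 50/49 = 1.02040816..., truncated to 8 decimals
guess = recover_rational(approx, 1000)
print(guess)
```

As the text says, the guess still has to be confirmed by substituting it back into the exact equations.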

If we have an approximation that's close but not good enough for this purpose, we can repeatedly improve it using Newton's method. Suppose we want to find a zero x_* of a differentiable function F on d-dimensional space, and have an approximation x₀ that's within ε of x_*. Then, as in the familiar case d=1, we expect to get a better approximation by setting x₁ = x₀ − (F’(x₀))⁻¹F(x₀). Note that F’ is a matrix-valued function. If this function is continuous, and F’(x_*) is invertible, then x₁ is within O(ε²) of x_* provided that ε was small enough. Iterating then gives us errors of order ε⁴, ε⁸, ε¹⁶, etc., doubling the precision at each step; and soon we reach enough precision to recover the rational coordinates of x_*.

In practice we won't actually compute F’(x₀) exactly, because F will typically be quite complicated (in our example there are rational functions of moderately high degree and a square root), making the formulas for its partial derivatives horrendous. Fortunately it's good enough to approximate them by the “difference quotient”: for each unit vector e_j, replace the column F’(x₀) e_j of F’(x₀) by ε⁻¹(F(x₀+εe_j)−F(x₀)). This still requires d² function evaluations (the d coordinates of F at each of the d shifted points), but they're usually much simpler functions than the entries of the formal derivative of F, and we're now spared the chore of working out this formal derivative.
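The following sketch illustrates Newton's method with a difference-quotient Jacobian over the reals. The toy system (with rational solution (1/2, 2/3)), the starting point, the step size eps and the helper names are all assumptions made for illustration; the systems in these notes are far larger.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def newton_fd(F, x0, steps=10, eps=1e-7):
    """Newton iteration; column j of the Jacobian is (F(x + eps*e_j) - F(x)) / eps."""
    x = list(x0)
    d = len(x)
    for _ in range(steps):
        Fx = F(x)
        cols = []
        for j in range(d):
            shifted = x[:]
            shifted[j] += eps
            Fj = F(shifted)
            cols.append([(Fj[i] - Fx[i]) / eps for i in range(d)])
        J = [[cols[j][i] for j in range(d)] for i in range(d)]
        delta = solve_linear(J, Fx)
        x = [xi - di for xi, di in zip(x, delta)]
    return x

# Toy system: x + y = 7/6 and x*y = 1/3.
def F(v):
    return [v[0] + v[1] - 7 / 6, v[0] * v[1] - 1 / 3]

root = newton_fd(F, [0.4, 0.8])
guessed = [Fraction(c).limit_denominator(100) for c in root]
print(root, guessed)
```

Here `limit_denominator` plays the role of the continued-fraction step; the guessed rationals should then be checked exactly.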

But we still don't know how to find a close enough initial approximation x₀ — and indeed it's hard to estimate just how close we must be since we have no handle on how close F’(x_*) and its nearby values F’(x) are to being singular, and how fast F’ changes near x_*. We often circumvent this problem by remembering a ubiquitous slogan of modern number theory: treat all completions of a number field on an equal footing to the extent possible. Here instead of the Archimedean completion R we'll work with a p-adic completion Q_p, where numerical analysis is much simpler thanks to the non-Archimedean norm. Better yet, finding the initial approximation usually reduces to an exhaustive search mod p, which is often feasible past the range of Gröbner-basis methods (though this method too must fail eventually, being exponential in the number of variables). As long as F’(x_*) is not just invertible but invertible mod p, Newton will work starting from (an arbitrary lift to Z_p of) x_* mod p, so that once we've found the solution mod p by exhaustive search, we approximate x_* to p-adic precision 2^N in N Newton steps.
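Here is a small sketch of the p-adic version: exhaustive search mod p, Newton steps that double the p-adic precision, and recovery of the rational coordinates by the Euclidean algorithm. The toy system, the prime p = 5 and all function names are illustrative assumptions; the Jacobian is written out by hand because the system is so small.

```python
from fractions import Fraction
from itertools import product
from math import isqrt

def F(x, y):
    # Toy system with rational solutions (1/2, 2/3) and (2/3, 1/2).
    return (6 * x + 6 * y - 7, 3 * x * y - 1)

def jacobian(x, y):
    return ((6, 6), (3 * y, 3 * x))

def newton_step(x, y, mod):
    """One Newton step modulo mod; needs det(J) to be a unit mod p."""
    f1, f2 = F(x, y)
    (a, b), (c, d) = jacobian(x, y)
    det_inv = pow(a * d - b * c, -1, mod)
    dx = det_inv * (d * f1 - b * f2)       # first entry of J^{-1} F
    dy = det_inv * (-c * f1 + a * f2)      # second entry of J^{-1} F
    return (x - dx) % mod, (y - dy) % mod

def rational_from_residue(u, mod):
    """Find r/s with r = s*u (mod mod) and small r, s, via Euclid."""
    bound = isqrt(mod // 2)
    r0, t0, r1, t1 = mod, 0, u % mod, 1
    while r1 > bound:
        q = r0 // r1
        r0, t0, r1, t1 = r1, t1, r0 - q * r1, t0 - q * t1
    return Fraction(r1, t1)

p = 5
seeds = [(x, y) for x, y in product(range(p), repeat=2)
         if all(v % p == 0 for v in F(x, y))]

solutions = []
for x, y in seeds:
    k = 1
    for _ in range(4):                     # precision p^2, p^4, p^8, p^16
        k *= 2
        x, y = newton_step(x, y, p ** k)
    solutions.append(tuple(rational_from_residue(c, p ** k) for c in (x, y)))
print(seeds, solutions)
```

Each candidate should be confirmed by checking that F vanishes exactly at it.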

Warning: even though P had quite small coefficients (absolute value at most 12), the weighted projective coordinates (c_{2m−2} : c_{2m−3} : c_{2m−4} : … : c₁ : c₀) are more complicated, e.g. even scaling c_i by 4^{m−i} leaves us with (84 : 176 : 2366 : 13536 : 26884 : 218864 : 268777); and we cannot solve for individual c_i, only for weight-zero expressions like c₈ / (c₂c₆), which are more complicated yet. Fortunately it's easy to get enough p-adic precision. In hairier settings, we might first recognize a relatively simple weight-zero expression, which would give us a start on choosing a good scaling (in our illustrative example, c₂³ / (c₃²) = 9261/484 = 21³/22² suggests (c₂, c₃) = (21, 22) as a good starting normalization). At the end it can still be something of an art to massage P to a more appealing form (12 vs. 268777).

As promised, here are the polynomials for m=1,2,3. They all have imprimitive Galois groups. In general if the Galois group of a cover C→P¹ is imprimitive then the cover factors as C→C₀→P¹. If the original cover was ramified only above 0, 1, ∞ then the same is true of C₀→P¹, and this Belyi map may be of interest in its own right. In our setting the other map C→C₀ can always be taken to be the squaring or cubing map, which is possible if m+1 is a multiple of 2 or 3 respectively.
• For m=1, the permutations generate a subgroup of S₆ that fixes a partition of the six objects into three pairs. On the polynomial side, we can normalize the quadratic P to be monic with no linear coefficient, say P(x) = x² + 2a for some nonzero a. Then Q(x) = x³ + 3ax and P³−Q² = 3a²x² + 8a³. All these solutions are equivalent over C, but the equivalence class over Q depends on a mod (Q^*)². The three pairs permuted by the Galois group are pairs {±x} of roots of P³/Q² = t. Writing P³/Q² in terms of X:=x² yields the degree-3 Belyi function (X+2a)³ / (X (X+3a)²) with cycle structures 3/21/21, equivalent to the cover of modular curves X₀(2) → X(1).
• For m=2, the permutations fix a partition of 12 objects into four triples. Once we normalize the quartic polynomial P by translating x to kill the x³ coefficient, the quadratic and constant coefficients must vanish as well (check this directly!), leaving a polynomial we can write as a multiple of P(x) = x⁴ + 4ax. Then Q(x) = x⁶ + 6ax³ + 6a² and P³−Q² = −(8a³x³ + 36a⁴). All these solutions are cubic twists of each other. Writing P³/Q² as a function of x³ yields a degree-4 Belyi function with cycle structures 31/31/22, equivalent to the cover of modular curves X₀(3) → X(1).
• Finally, for m=3, the group fixes a partition of the 6m=18 objects into nine pairs. The polynomials were obtained by Birch in 1961: up to twist and scaling, P(x) = 36x⁶ + 24x⁴ + 10x² + 1, Q(x) = 216x⁹ + 216x⁷ + 126x⁵ + 35x³ + 21x/4, and P³−Q² = 9x⁴/2 + 39x²/16 + 1. Writing P³/Q² as a function of x² yields a degree-9 Belyi function with cycle structures 22221/333/711; since the corresponding permutations have orders 2, 3, 7, the Galois closure is a Hurwitz curve. It turns out to be the Fricke-Macbeath curve of genus 7, second-smallest after the Klein quartic, with 504 automorphisms that constitute the simple group SL₂(F₈). [While the Fricke-Macbeath curve can be defined over Q, the automorphisms can only be defined over the cyclic cubic extension Q(2 cos(2π/7)); the group Gal(Q(2 cos(2π/7)) / Q) acts on SL₂(F₈) via Aut(F₈), giving a Galois extension of Q(t) with group ΣL₂(F₈) = SL₂(F₈) : Aut(F₈).]
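The three identities in this list are easy to confirm by machine. The sketch below does so with exact rational arithmetic on coefficient lists (constant term first); the helper names and the sample value a = 5 for the free parameter are our own choices.

```python
from fractions import Fraction

def mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def cube_minus_square(P, Q):
    """Coefficients of P^3 - Q^2 with trailing zeros removed."""
    cube, square = mul(P, mul(P, P)), mul(Q, Q)
    n = max(len(cube), len(square))
    cube += [Fraction(0)] * (n - len(cube))
    square += [Fraction(0)] * (n - len(square))
    diff = [c - s for c, s in zip(cube, square)]
    while diff and diff[-1] == 0:
        diff.pop()
    return diff

a = Fraction(5)   # arbitrary nonzero sample value of the parameter a

# m = 1: P = x^2 + 2a, Q = x^3 + 3a x
m1 = cube_minus_square([2 * a, 0, 1], [0, 3 * a, 0, 1])
# m = 2: P = x^4 + 4a x, Q = x^6 + 6a x^3 + 6a^2
m2 = cube_minus_square([0, 4 * a, 0, 0, 1], [6 * a * a, 0, 0, 6 * a, 0, 0, 1])
# m = 3 (Birch): P = 36x^6 + 24x^4 + 10x^2 + 1, Q = 216x^9 + ... + 21x/4
m3 = cube_minus_square([1, 0, 10, 0, 24, 0, 36],
                       [0, Fraction(21, 4), 0, 35, 0, 126, 0, 216, 0, 216])
print(m1, m2, m3)
```

In each case the difference has degree m+1 although P³ and Q² have degree 6m.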

Wednesday, Oct. 3: Multivariate p-adic Newton, cont'd

Q: what if the coefficients we solve for weren't rational? Remember we've seen a few examples over fields like Q((−7)^{1/2}), Q((−11)^{1/2}), and Q(2^{1/3}).

A₀: If we know the quadratic imaginary field in advance, just work over C rather than R, and use continued fractions to approximate the real and imaginary parts separately. If the field is unknown, but still quadratic imaginary, we can recover the real part and the square of the imaginary part from a close enough approximation, and then we're done.

BUT that doesn't help us for real quadratic or more complicated irrationalities…

(Added Oct.10) A₁: OK, so find all the algebraic conjugates, which in effect is what we already did in the imaginary quadratic case (when an approximation to c automatically gives us an approximation to its complex conjugate). Then their elementary symmetric functions are rational numbers, and we've reduced to a previous problem. This is useful in other contexts too, as for computing “CM points on modular curves”, such as j-invariants of elliptic curves with complex multiplication (CM). For example, consider the j-invariant of an elliptic curve with CM by Z[(−58)^{1/2}]. This is known to be an algebraic integer, here of degree 2 (the class number of Z[(−58)^{1/2}]). Using the classical series j = q⁻¹ + 744 + 196884q + … (or an existing implementation, e.g. ellj(sqrt(-58)) in gp), we find that one conjugate is j₀ = 604729957825300085503.99999217152685675585…, but this is still not enough accuracy to recover the quadratic equation satisfied by j₀. We could double the precision and try again, but it's easier to use the algebraic conjugate ellj(sqrt(-29/2)), which we compute as j₁ = 24591258496.000007828473…: we now recognize the first elementary symmetric function j₀+j₁ as the integer 604729957849891344000 (NB we knew in advance that it's in Z, so there's no question of whether our precision is adequate), and that lets us bootstrap the accuracy of j₁ to compute the product j₀ j₁ as 14871070713157137145512000000000 [which happily factors smoothly as it should by Gross-Zagier: 2¹²·3⁶·5⁹·53³·149³·173³], and thus solve for both CM values as 216000 (1399837865393267 ± 259943365786104 · 29^{1/2}), so that the sum j₀+j₁ is 432000 · 1399837865393267. [Alternatively, knowing a bit more of the CM theory we could have predicted that j₀−j₁ is a multiple of 29^{1/2}, or at any rate of the square root of some factor of 58, at which point even less precision is needed. 
Yes, there's good reason why j₀, and thus also q⁻¹=exp(58^{1/2}π) and j₁, are individually much closer to integers than one would expect by chance: the pair (j₀, j₁) constitutes an integral CM point on the modular curve X₀(2)/w₂.]
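Once the sum and product are known as integers, everything else is exact arithmetic. The sketch below (variable names are ours) takes the two integers quoted above, checks the smooth factorization of the product, and solves the quadratic x² − Sx + P = 0 by testing that its discriminant is 29 times a perfect square, so that the two CM values are (S ± root·√29)/2.

```python
from math import isqrt

S = 604729957849891344000                  # j0 + j1, as quoted in the text
P = 14871070713157137145512000000000       # j0 * j1, as quoted in the text

# Factorization of the product claimed in the text.
smooth = 2**12 * 3**6 * 5**9 * 53**3 * 149**3 * 173**3
product_is_smooth = (P == smooth)

# Discriminant of x^2 - S x + P; we expect 29 times a square.
disc = S * S - 4 * P
quotient, remainder = divmod(disc, 29)
root = isqrt(quotient)
disc_is_29_times_square = (remainder == 0 and root * root == quotient)

# Pull out the common factor 432000 from S and from root.
A, B = S // 432000, root // 432000
print(product_is_smooth, disc_is_29_times_square, A, B)
```

Here A and B are the two large integers that multiply 1 and √29 in the closed form for the CM values.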

BUT we're actually going to work p-adically, and then it's usually feasible to approximate one conjugate (see below) but not all of them unless the algebraic degree is very small (or we get very lucky).

We need a more powerful technique.

Overview of lattice basis reduction: this is accomplished in polynomial time by the Lenstra-Lenstra-Lovász (LLL) algorithm, which for us is a tool for finding integer relations and for similar Diophantine-approximation tasks. Here we need it to recognize an algebraic number c of degree at most d from a good approximation, which is tantamount to finding an integer relation among 1, c, c², …, c^d, or among c and 1, a, a², …, a^{d−1} if we already know that c is in the field generated by the algebraic number a of degree d. These algorithms are usually presented over R, but (at least in their incarnation as lattice-reduction techniques) work equally well (i.e. via Euclidean lattices of comparable parameters) over Q_p, and indeed gp's lindep and algdep work for p-adic as well as real (or complex) numbers. If we know approximations to more than one algebraic conjugate of c, say δ conjugates, then we can use them all together, looking for integer relations on vectors of length δ. (The degree d can't be too large, but if it's so large that we cannot find c with a few thousand digits of precision then we usually didn't care much about this c as an algebraic number in the first place…)
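To make the integer-relation idea concrete, here is a deliberately naive textbook LLL in exact rational arithmetic, applied to recognize √2 from a 20-digit real approximation. This is only a sketch under our own choices (the scaling N, the parameter δ = 3/4, the function names); it is not how gp's algdep is implemented, and it recomputes Gram-Schmidt from scratch at every step, which is fine only in tiny dimension.

```python
from fractions import Fraction
from math import isqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Orthogonalized vectors b_i* and the coefficients mu[i][j]."""
    ortho, mu = [], []
    for b in basis:
        v = [Fraction(x) for x in b]
        row = []
        for o in ortho:
            m = dot(b, o) / dot(o, o)
            row.append(m)
            v = [vi - m * oi for vi, oi in zip(v, o)]
        ortho.append(v)
        mu.append(row)
    return ortho, mu

def lll(basis, delta=Fraction(3, 4)):
    basis = [list(row) for row in basis]
    k = 1
    while k < len(basis):
        for j in range(k - 1, -1, -1):            # size-reduce b_k
            _, mu = gram_schmidt(basis)
            q = round(mu[k][j])
            if q:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
        ortho, mu = gram_schmidt(basis)
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:                            # Lovasz condition holds
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return basis

# Approximation to c = sqrt(2), correct to 20 decimal places.
scale = 10 ** 20
c = Fraction(isqrt(2 * scale ** 2), scale)

# Row i is (e_i, round(N * c^i)); a short vector encodes a relation
# a0 + a1*c + a2*c^2 that is very nearly 0.
N = 10 ** 12
basis = [[int(i == j) for j in range(3)] + [round(N * c ** i)] for i in range(3)]
reduced = lll(basis)
relation = reduced[0][:3]
print(relation)
```

Up to sign the relation found is −2 + c² = 0, i.e. the minimal polynomial x² − 2; as always the guess is then verified exactly.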

Lattice reduction in dimension >2 can give rise to useful practical improvements even for problems that we already know how to solve using 2-dimensional lattice reduction (a.k.a. Euclid). For example, if we're trying to recover a point (c₀ : c₁ : c₂ : … : c_k) of height at most H in (unweighted) projective space P^k(Q), we need to know the point to within about 1 / H² to apply continued fractions to (say) each ratio c_i / c₀, but in fact an H^{−(k+1)/k} approximation suffices if we use all the c_i together and apply LLL in dimension k+1. [NB halving the necessary precision can gain a factor of about 4 in the computing time. See my ANTS-4 paper for further improvements when we know a proper subvariety of P^k(Q) that contains our point.] Likewise for an element of (say) Q(i), where the real and imaginary parts usually have the same denominator to within a small factor; indeed in this case it's even better to search for a representation (a+bi) / (c+di) with a, b, c, d small.

Finally (and you may have been wondering about this for a while now), how do we know that our solution will have even one conjugate defined over Q_p? Well in general we don't, and can claim only a technique that often works well in practice, not a provably efficient algorithm. (For “provably efficient” we'd also need an upper bound on the height of the solution.) Still, we expect — based first on a naïve probabilistic argument, and then on its corroboration by Čebotarev's density theorem — that for random p the expected number of conjugates defined over Q_p should equal 1, so (barring bad luck) we should find one before p gets so large that we run out of patience or computing time. In any case, once we've found an algebraic solution by hook or by crook we can prove it is correct by simply checking that the desired identity holds — at least if an identity is all we're looking for, with no additional condition like the identification of G when it is not determined uniquely by the cycle structures and other available data.

Wednesday, Oct. 10: positive- [usually 1-]dimensional families

So far we've developed some techniques for finding (maps described by) isolated identities. But there's also considerable interest in identities that vary in positive-dimensional families. Our techniques adapt at least to low- (preferably 1-)dimensional families; some of the techniques we'll develop later this term also lend themselves to families of higher dimension.

Two examples of problems that naturally lead to a one-dimensional family of identities: covers of P¹ branched at four points with a varying cross-ratio, or pairs E, E’ of elliptic curves related by a cyclic isogeny of degree N. The first example naturally generalizes to d-dimensional families of covers branched at d+3 points (and likewise covers of higher-genus curves, e.g. another kind of one-dimensional family involves covers of an elliptic curve branched above only one point). The d-dimensional moduli space of such a family need not be rational, and may be of interest itself. Indeed the moduli curves X₀(N) of cyclic N-isogenies are a special case: we may assume (by composing by translation in the group law) that the isogeny E’→E takes the origin of E’ to the origin of E; then it is known that the isogeny commutes with the multiplication-by-(−1) maps on the two curves, and thus descends to a map E’/{±1}→E/{±1}. But that's a rational map P¹→P¹ branched above the four ramified points of E→E/{±1} (coming from the 2-torsion of E), whose Galois group is dihedral of order 2N (semidirect product of Z/NZ with {±1}), and conversely such a four-point cover lifts to a cyclic N-isogeny between elliptic curves. If N is odd, say N=2k+1, then each of the four monodromy generators has cycle structure 1·2^k. (This modular curve also comes naturally equipped with a Belyi map j/1728, of degree roughly N; the exact formula is N Π_{p|N} (1+p⁻¹), the product running over prime factors of N without multiplicity.)
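The degree formula at the end of this paragraph is simple to evaluate; a short sketch (function names are ours) using trial division:

```python
from fractions import Fraction

def prime_factors(n):
    """Set of distinct prime factors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def degree_of_j(N):
    """Degree N * prod_{p | N} (1 + 1/p) of the map j on X_0(N)."""
    deg = Fraction(N)
    for p in prime_factors(N):
        deg *= Fraction(p + 1, p)
    return int(deg)

for N in (11, 12, 41):
    print(N, degree_of_j(N))
```

For prime N the degree is N+1, e.g. 12 for X₀(11) and 42 for X₀(41).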

Now when we have an isolated identity we can try to find it by using Newton to bootstrap from an approximate solution. That doesn't work in a positive-dimensional family: it might actually be easier to find an initial approximation, but with fewer equations than unknowns it doesn't make sense to speak of the exact solution that it approximates. Even if we do know a special exact solution (e.g. the 13-isogeny 3+2i from the elliptic curve y² = x³−x to itself), that's not enough to describe the full family.

[...]

Monday, Oct. 15: Curves of genus 0 through 5

So far we've made sure that all our Belyi covers are rational curves; but that's not always the case, nor the only interesting case. Before giving some examples of Belyi maps on non-rational curves, we need to give (or recall) enough of a description of such curves to understand the form of explicit equations defining the curves and rational functions on them. We'll stop at genus 5, which is the last case that a generic curve is a complete intersection in projective space, namely the zero-locus of a three-dimensional space of quadrics (homogeneous polynomials of degree 2) in P⁴. For genus 4, it's the intersection of a quadric and a cubic in P³; for genus 3, a quartic curve in P². In each of these cases the only non-generic exceptions are hyperelliptic curves; in genus 2 all curves are hyperelliptic. It's actually the more familiar cases of genus 0 and 1 that can get tricky if we want to work over fields like Q or (for families of curves) C(t) that are not algebraically closed. [This and most of what follows is standard (neo)classical algebraic geometry; standard references include several books co-authored by Joe Harris, and at Harvard it may be even more convenient to ask a student of Joe Harris. ]

genus 0: over C it's just the projective line (a.k.a. Riemann sphere), but over a field that's not algebraically closed (nor finite) even genus-zero curves needn't be trivial. Such a curve is always a smooth conic in P² (embedded by the space Γ(−K) of anticanonical sections, which has dimension 3 by Riemann-Roch); thus the curve is rational iff it has a rational point ("if" by the usual slope parametrization, and the converse is trivial). More generally a genus-zero curve is rational iff it has a divisor D of odd degree: "if" because some D+cK has degree 1, and is effective by Riemann-Roch; and again "only if" is trivial. All our genus-zero covers so far came with such a D, indeed a distinguished point, which could be described as the unique multiplicity-m preimage of t for some m≥1 and t in {0,1,∞} (being careful that we're not in a case where we've had to make two or all three branch points algebraic conjugates). But that need not be the case in general. For example, there's a unique action defined over Q of the symmetric group S₄ on a curve of genus zero; but the curve cannot be rational over Q, or even over R, because then (by the usual averaging argument) S₄ would be contained in O₂(R)/{±1}, and thus would contain a cyclic group with index at most 2. [This could also be checked by calculating the Schur indicator of the 2-dimensional irreducible representation of the relevant central extension 2S₄, or indeed of its subgroup isomorphic with the 8-element quaternion group.] So the genus-zero curve must be "pointless"; explicitly it is the conic x²+y²+z²=0, with S₄ acting by signed coordinate permutations. [... quaternion algebras like Hurwitz {2,∞}, Br₂, ...]

genus 1: again, over an algebraically closed field such as C we have a familiar picture, this time an elliptic curve C, since there must be a rational point P₀. More generally, any divisor D of positive degree is still effective by Riemann-Roch, and if deg(D)=1 then D∼(P₀) for some rational point P₀. We can then use the sections of 3P₀ to embed C in P² as a cubic in Weierstrass form y² + a₁xy + a₃y = x³ + a₂x² + a₄x + a₆, calculating the coefficients a_i by comparing Laurent expansions about P₀ as usual. Sometimes (especially when C arises as a modular curve such as X₀(11)) it is more convenient to start from a degree-2 function x on C and a holomorphic differential ω, and then set z = dx/ω, which is anti-invariant under the involution ι of C satisfying x∘ι = x and is regular away from the poles of x, and thus satisfies an equation z² = P(x) for some polynomial P of degree 3 or 4 according as x has one double pole or two simple poles. On modular curves, the q-expansions of modular forms often give a convenient handle on rational functions and holomorphic differentials. Here we may tell Sage:

ModularForms(11,prec=14).echelon_basis()

to get the q-expansions of a basis of the modular forms on X₀(11) to within O(q¹⁴), and get the result

[
1 + 12*q^2 + 12*q^3 + 12*q^4 + 12*q^5 + 24*q^6 + 24*q^7 + 36*q^8 + 36*q^9 + 48*q^10 + 72*q^12 + 24*q^13 + O(q^14),
q - 2*q^2 - q^3 + 2*q^4 + q^5 + 2*q^6 - 2*q^7 - 2*q^9 - 2*q^10 + q^11 - 2*q^12 + 4*q^13 + O(q^14)
]

in which the second generator, call it φ₁, is a cusp form and thus yields a holomorphic differential ω = φ₁ dq/q. The ratio φ₀/φ₁ then gives us a rational function x = q⁻¹ + 2 + 17q + 46q² + 116q³ + 252q⁴ + 533q⁵ + 1034q⁶ + 1961q⁷ + 3540q⁸ + 6253q⁹ + 10654q¹⁰ + 17897q¹¹ + O(q¹²), and we compute z = q (dx/dq) / φ₁ = −q⁻² −2q⁻¹ + 12 + 116q + 597q² + 2298q³ + ···. Comparing coefficients (or doing something like

Z=subst(z^2,q,serreverse(1/x)); subst(truncate(Z),q,1/X)

in gp) then gives us the equation z² = x⁴ − 4x³ − 88x² − 300x − 304 = (x+4) (x³−8x²−56x−76) for X₀(11). We can then project the rational zero x = −4 to infinity and normalize the leading coefficient of the resulting cubic (i.e., replace x by (−11/x)−4 and absorb the factor (22/x²)² into z²) to get a Weierstrass model y² = x³ + 14x² + 55x + 121/4; the standard form y² + y = x³ − x² − 10x − 20 is then recovered by translating x by −5 and y by ½. (We'll hopefully come back to questions such as where the q-expansions of φ₀ and φ₁ come from, and how the curve X₀(11) actually parametrizes 11-isogenies.)
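For readers without gp or Sage at hand, the coefficient comparison for X₀(11) can be sketched with truncated Laurent series in plain Python. Everything here (the series representation as a pair of valuation and coefficient list, the names, the precision PREC = 12) is our own scaffolding; only the two q-expansions are taken from the Sage output above.

```python
from fractions import Fraction

PREC = 12   # number of coefficients kept in every series

# A series (v, c) stands for q^v * (c[0] + c[1] q + c[2] q^2 + ...).
phi0 = (0, [1, 0, 12, 12, 12, 12, 24, 24, 36, 36, 48, 0])
phi1 = (1, [1, -2, -1, 2, 1, 2, -2, 0, -2, -2, 1, -2])

def mul(f, g):
    (vf, cf), (vg, cg) = f, g
    return (vf + vg,
            [sum(cf[i] * cg[n - i] for i in range(n + 1)) for n in range(PREC)])

def inv(f):
    vf, cf = f
    out = [Fraction(1, cf[0])]
    for n in range(1, PREC):
        out.append(-sum(cf[i] * out[n - i] for i in range(1, n + 1)) / cf[0])
    return (-vf, out)

def q_ddq(f):
    """Apply the derivation q d/dq term by term."""
    vf, cf = f
    return (vf, [(vf + i) * c for i, c in enumerate(cf)])

x = mul(phi0, inv(phi1))            # x = phi0/phi1 = q^-1 + 2 + 17q + ...
z = mul(q_ddq(x), inv(phi1))        # z = q (dx/dq) / phi1

def fit_polynomial(target, x, degree):
    """Peel off c_k x^k for k = degree, ..., 0 by matching leading terms."""
    v, rest = target[0], list(target[1])
    powers = [(0, [1] + [0] * (PREC - 1))]
    for _ in range(degree):
        powers.append(mul(powers[-1], x))
    coeffs = [0] * (degree + 1)
    for k in range(degree, -1, -1):
        vk, ck = powers[k]
        shift = vk - v
        coeffs[k] = rest[shift] / ck[0]
        for i in range(shift, PREC):
            rest[i] -= coeffs[k] * ck[i - shift]
    return coeffs, rest

coeffs, rest = fit_polynomial(mul(z, z), x, 4)
print(coeffs, rest)
```

The list `coeffs` holds the coefficients of 1, x, …, x⁴ in z², and `rest` holds whatever is left of z² after subtracting the quartic, to the working precision.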

But what if there is no divisor of degree 1? Any genus-1 curve C comes with a divisor D of some positive degree d, and Γ(D) has dimension d, giving a map to P^{d−1} that is an embedding as an “elliptic normal curve” for d≥3 (for d=2 it's a 2:1 map, giving a model of C of the form y² = quartic(x)). As with curves of increasing genus, elliptic normal curves of increasing degree d get increasingly complicated, and are complete intersections only for the smallest few cases; here these are d=3 (a plane cubic) and d=4 (the intersection of two quadrics). Fortunately the d≤4 cases are most if not all the ones we'll encounter. Still even those cases are much more complicated than the plane conics that are as tricky as genus-zero curves get. Two famous examples already for d=3 are x³+py³+p²z³=0 for p prime (no p-adic solution) and 3x³+4y³+5z³=0 (Selmer's example of a cubic with no rational points even though there is no local obstruction). [To be continued Wednesday…]

genus 2: Once g≥2, the curve C is of general type (the canonical divisor is positive). The space of holomorphic differentials gives a map, the canonical map, to P^{g−1}, which is either an embedding or a 2:1 map. In the latter case C is hyperelliptic, which is the case for all curves of genus 2 but only for special curves once g≥3. At least if g is even (or if the ground field is algebraically closed, or finite, or more generally is a field with trivial Br[2]), a hyperelliptic curve of genus g has the form y² = P(x) where P is a polynomial of degree 2g+2 without repeated roots. For g=2, the degree-2 function x on C is simply the ratio, say ω₁/ω₂, of generators of the 2-dimensional space of holomorphic forms. The hyperelliptic involution ι then takes any (x, y) to (x, −y), and takes each ω_i to −ω_i (as can be seen either from the explicit formulas (ω₁, ω₂) = (x dx/y, dx/y) or by observing that an ι-invariant holomorphic differential descends to a holomorphic differential on P¹, and is thus zero). Thus we can construct y as dx / ω₂, and then find the hyperelliptic defining equation for C that equates y² with a sextic polynomial in x (or a quintic if there's a rational Weierstrass point and ω₂ was chosen to vanish at that point).

Again we give an example of a modular curve, this time X₁(13), which is the first X₁(N) of genus >1. [For N≤12 the curve is rational, except for N=11 when it is a curve of genus 1 that you should by now know how to compute; we shall see later this term how to describe the elliptic curves with N-torsion points that are parametrized by these curves X₁(N).] This time we need the two-dimensional space of cuspforms for Γ₁(13), which in Sage we can get from

CuspForms(Gamma1(13),prec=14).echelon_basis()

to get

[
q - 4*q^3 - q^4 + 3*q^5 + 6*q^6 - 3*q^8 + q^9 - 6*q^10 - 2*q^12 + 2*q^13 + O(q^14),
q^2 - 2*q^3 - q^4 + 2*q^5 + 2*q^6 - 2*q^8 + q^9 - 3*q^10 + 3*q^13 + O(q^14)
]

We call these φ₁ and φ₂, and set ω₁ = φ₁ dq/q and ω₂ = φ₂ dq/q so that x = ω₁/ω₂ has a pole at the cusp q = 0. It's more convenient to subtract 1, taking x = q⁻¹ + 1 + q + q² + q⁴ − q⁶ − 2q⁸ + O(q¹¹) rather than q⁻¹ + 2 + q + q² + q⁴ − q⁶ − 2q⁸ + O(q¹¹). This doesn't change y, but simplifies the equation from y² = x⁶ − 8x⁵ + 26x⁴ − 46x³ + 53x² − 42x + 17 to y² = x⁶ − 2x⁵ + x⁴ − 2x³ + 6x² − 4x + 1, which is the well(?)-known formula up to changing x to −x (which makes all the signs positive).
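The effect of the translation on the sextic can be checked directly. In the sketch below (the helper name `shift` is ours) polynomials are coefficient lists with the constant term first, and f(x + c) is computed by Horner's rule.

```python
def shift(coeffs, c):
    """Coefficients (constant term first) of f(x + c), by Horner's rule."""
    out = []
    for a in reversed(coeffs):
        # out <- out * (x + c) + a
        times_x = [0] + out
        times_c = [c * t for t in out] + [0]
        out = [u + v for u, v in zip(times_x, times_c)]
        out[0] += a
    return out

# Simplified sextic x^6 - 2x^5 + x^4 - 2x^3 + 6x^2 - 4x + 1.
simple = [1, -4, 6, -2, 1, -2, 1]

# Undoing the subtraction of 1 (replace x by x - 1) gives the longer form.
original = shift(simple, -1)

# Changing x to -x flips the odd-degree coefficients.
flipped = [a * (-1) ** i for i, a in enumerate(simple)]
print(original, flipped)
```

The first list reproduces x⁶ − 8x⁵ + 26x⁴ − 46x³ + 53x² − 42x + 17, and the second has all coefficients positive.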

genus 3 and higher: Here if C is not hyperelliptic then the holomorphic differentials (= sections of the canonical divisor) embed C as a curve of degree 2g−2 in P^{g−1}. For g=3,4,5 this curve is a complete intersection, as described at the beginning of today's notes; given generators for the holomorphic differentials, and thus homogeneous coordinates on P^{g−1}, the defining equations are linear relations in the monomials of appropriate degree, and can be found by linear algebra. [Next time we'll see what to do if C turns out to be hyperelliptic.]

Wednesday, Oct. 17: Hyperelliptic curves; curves of genus 1; a Weil-Belyi function on an elliptic curve (and parametrizing 5-torsion etc.)

A curve C of genus g>1 is hyperelliptic if and only if the canonical map C→P^{g−1} is not an embedding; in this case the map is 2:1 to its image, which is a curve of genus 0 and degree g−1, call it C₀. Thus if g is even the quotient curve C₀ is rational (the hyperplane section is a divisor of odd degree), and then C has the familiar form y² = P(x) for some polynomial P of degree 2g+2 without repeated roots. The holomorphic differentials are then A(x) dx/y where A is an arbitrary polynomial of degree at most g−1. If g is odd, C₀ could be a conic Q(x₀, x₁, x₂) = 0, and then C can be written as the double cover y² = P(x₀, x₁, x₂) for some homogeneous polynomial P of degree g+1 such that the curve P = 0 meets the conic in 2g+2 distinct points.

Given just C and the holomorphic differentials, we can recognize the hyperelliptic curves as those for which the differentials satisfy too many quadratic relations, (g−1)(g−2) / 2 as opposed to the generic (g−2)(g−3) / 2. If we also have a rational point p on C then we can easily generalize our approach to genus-2 curves to find a hyperelliptic equation for C. Choose a basis for the holomorphic differentials whose i-th element ω_i (1 ≤ i ≤ g) vanishes to order exactly i−1 at p. We'll use only the last two basis elements, and write x = ω_{g−1}/ω_g, a degree-1 function on C₀. Then the function field of C is generated by x and y = dx/ω_g. We again give a modular example, this time the curve X₀(41) of genus 3. (Modular curves often come with involutions, and are thus hyperelliptic much more commonly than one might expect “at random”.) Here the Sage output of CuspForms(41,prec=20).echelon_basis() is


[
q + q^4 - q^5 - 2*q^6 + 2*q^7 - 2*q^8 - 3*q^10 - 2*q^12 + 2*q^14 + 2*q^15 + 3*q^16 - 2*q^17 + 3*q^18 + 2*q^19 + O(q^20),
q^2 - 2*q^4 - q^5 + 3*q^8 + q^9 + q^10 - 2*q^11 - 2*q^12 + 2*q^13 + 2*q^14 - 4*q^16 - 2*q^18 + 2*q^19 + O(q^20),
q^3 - 2*q^4 + q^6 - q^7 + 2*q^8 + 2*q^10 - 3*q^11 - q^12 + 2*q^13 - q^14 - 2*q^15 - 2*q^18 + 3*q^19 + O(q^20)
]

(you may have surmised by now that “CuspForms(41,prec=20)” is actually an abbreviation for “CuspForms(Gamma0(41),prec=20)”, which works as well). Call these φ₁, φ₂, φ₃ respectively. There's a rational point at the cusp q = 0, and we take that for our base point p, so each ω_i is the corresponding φ_i dq/q. Here we needn't explicitly set up a linear system to check for a quadratic relation, because such a relation must write φ₁φ₃ − φ₂² as a linear combination of φ₂φ₃ and φ₃² and we can peel off the coefficients one at a time; here we find that φ₁φ₃ − φ₂² = −2φ₂φ₃ + O(q²¹), which is more than enough q-adic precision to prove that the identity holds exactly: a nonzero section of 2K has only 8 zeros with multiplicity, so at most O(q¹⁰) dq². [What would happen if we didn't check for quadratic relations and just routinely set out to find a quartic equation satisfied by the φ_i?] So we take x = ω₂/ω₃ − 1 = φ₂/φ₃ − 1 = q⁻¹ + 1 + 2q + 2q² + 3q³ + 4q⁴ + 7q⁵ + 8q⁶ + 11q⁷ + O(q⁸) and y = (q dx/dq) / ω₃ = −q⁻⁴ − 2q⁻³ − 2q⁻² + q⁻¹ + 12 + 42 q + 120 q² + … and find the hyperelliptic equation y² = x⁸ − 4x⁷ − 8x⁶ + 10x⁵ + 20x⁴ + 8x³ − 15x² − 20x − 8 for X₀(41).
As before, this octic polynomial has distinct roots modulo all primes other than 2 and factors of the level (here 41: the discriminant is −2¹⁶41⁶), reflecting the curve's good reduction at all primes not dividing the level — the bad reduction at 2 is an artifact that can be removed by “uncompleting the square”.
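The quadratic relation among the three cusp forms of level 41 can be tested mechanically. In this sketch each q-expansion is transcribed from the Sage output above as a dictionary {exponent: coefficient}; the names and the representation are our own. The combination φ₁φ₃ − φ₂² + 2φ₂φ₃ is determined modulo q²¹ by the given data, and if the transcription is faithful every one of its coefficients should vanish.

```python
ORDER = 21   # the products below are determined modulo q^21

phi1 = {1: 1, 4: 1, 5: -1, 6: -2, 7: 2, 8: -2, 10: -3, 12: -2, 14: 2,
        15: 2, 16: 3, 17: -2, 18: 3, 19: 2}
phi2 = {2: 1, 4: -2, 5: -1, 8: 3, 9: 1, 10: 1, 11: -2, 12: -2, 13: 2,
        14: 2, 16: -4, 18: -2, 19: 2}
phi3 = {3: 1, 4: -2, 6: 1, 7: -1, 8: 2, 10: 2, 11: -3, 12: -1, 13: 2,
        14: -1, 15: -2, 18: -2, 19: 3}

def mul(f, g):
    """Product of two q-expansions, as a list of coefficients below q^ORDER."""
    out = [0] * ORDER
    for i, a in f.items():
        for j, b in g.items():
            if i + j < ORDER:
                out[i + j] += a * b
    return out

p13, p22, p23 = mul(phi1, phi3), mul(phi2, phi2), mul(phi2, phi3)

# Coefficients of phi1*phi3 - phi2^2 + 2*phi2*phi3 for q^0, ..., q^20.
residual = [a - b + 2 * c for a, b, c in zip(p13, p22, p23)]
print(residual)
```

The same peeling-off idea, one coefficient at a time, is how the multiplier −2 is found in the first place.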

For a general hyperelliptic curve C of genus 3, the genus-zero quotient curve C₀ might not be rational but is always given by the unique quadratic equation satisfied by the holomorphic differentials ω_i. We can then choose the ratio of any two, say x = ω₂/ω₃, and construct an ι-anti-invariant function y = dx / ω₃ (NB same denominator), which has double poles above each pole of x (= each zero of ω₃) and nowhere else. Thus we can write (ω₃² y)² as a homogeneous polynomial of degree 4 in the three ω_i, giving a hyperelliptic equation for C.

[elliptic normal curves and their Jacobians]

Here's an example of some of the new considerations that arise when we deal with Belyi functions on curves of positive genus. We'll find the unique such function f : E→P¹ with cycle structures 5, 5, 221. By Riemann-Hurwitz E has genus 1, and since it has at least one obvious divisor of degree 1 (the simple preimage of the 221 point) E is an elliptic curve. It might not be clear a priori that the two quintuple points are distinguishable, but for now we put them at f = 0 and f = ∞, and choose the quintuple pole as the origin O for the group law on E. Call the quintuple zero T, so f has divisor ( f ) = ( f )₀ − ( f )_∞ = 5(T) − 5(O), and write the divisor ( f )₁ as P+2D where P has degree 1 and D has degree 2. Now in genus zero any two divisors of the same degree are linearly equivalent, but here 5(T) − 5(O) ∼ 0 is a nontrivial condition, telling us that T is a 5-torsion point on E, and f is the associated Weil function. [T cannot be a trivial torsion point, because f (T) ≠ f (O) implies T ≠ O. In general if n(T) ∼ n(O) then T is m-torsion for some factor m of n, and if 1<m<n then the function with divisor n(T) − n(O) is an imprimitive cover of P¹, being an (n/m)-th power of a Weil function of degree m. Here n=5 is prime so there is no imprimitive case to consider.]

Curiously we can also predict the simple preimage P of the third branch point 1: it must be −2T. This exploits a trick that must have been rediscovered many times, though to my surprise I've found no explicit mention of it earlier than my ABC⇒Mordell paper of 1991. The idea (which applies to branched covers of any positive genus) is that once we know all the ramification of f, we know the divisor of its differential df, and the fact that this divisor is canonical gives us additional information (an extra equation in the Jacobian) on the preimages of the branch points. It is more convenient to work with the logarithmic differential df / f, which has a simple pole at each zero or pole of f, and a zero of multiplicity m−1 wherever f = t has a zero of multiplicity m for some t other than 0 and ∞. Here this means the logarithmic differential has divisor D−(O)−(T), so D ∼ (O)+(T); since also (P)+2D ∼ 5(O), we can eliminate D to find (P)+2(T) ∼ 3(O), whence P = −2T in the group law, as claimed. Thus we can start from any Weil function w with divisor 5(T)−5(O) (i.e. any multiple of f ) and recover f as w / w(−2T).

The next step is to parametrize pairs (E, T) where E is an elliptic curve and T is a 5-torsion point on E (NB: this is much better than starting from a random E and then choosing one of its 24 nontrivial 5-torsion points). The following procedure for parametrizing elliptic curves with a torsion point of low order goes at least back to Tate (see the formula for a general curve with a 7-torsion point at the end of §7 of his paper The Arithmetic of Elliptic Curves (Inventiones Math. 1974)). Suppose E has extended Weierstrass form with coefficients (a₁, a₂, a₃, a₄, a₆), that is

y² + a₁xy + a₃y = x³ + a₂x² + a₄x + a₆.
Let T be any point other than the group-law origin O, and translate x and y to put T at (0,0); this makes a₆=0. The tangent to E at T has slope −a₄/a₃, so T is 2-torsion iff a₃=0. Otherwise, we may translate y by (a₄/a₃) x, keeping T at (0,0) but making a₄ = 0 (equivalently: making the tangent to E at T horizontal). At this point we've used up all the available changes of variable except multiplying (x, y) by (λ², λ³) for some nonzero λ, which multiplies each a_i by λⁱ; thus we have parametrized the space of elliptic curves E together with a nonzero, non-2-torsion rational point T by an open set in (1, 2, 3)-weighted projective space (not quite the entire projective space, because we must exclude (a₁ : a₂ : a₃) that make E singular). In particular, a₃ must not vanish lest E be singular at T. Moreover, T is 3-torsion iff the (horizontal) tangent at T meets E with multiplicity 3 at T, which is the case iff a₂ = 0. Hence if T is not 3-torsion then neither a₂ nor a₃ is zero, so we may choose the unique λ that makes a₂ = a₃ = a for some nonzero a.

Now it's easy to describe, for small N >3, the pairs (a₁, a) that make T an N-torsion point. We illustrate with the case N=5 that motivated this excursion. We write the condition 5T = 0 as 3T = −2T, which (since T ≠ O) is equivalent to the condition that 2T and 3T have the same x coordinate. [These coordinates can be computed in gp with ellpow([a1,a,a,0,0], [0,0], 2) and ellpow([a1,a,a,0,0], [0,0], 3) (look, Ma, no ellinit!), though here the group-law computations are simple enough to be done unaided.] We find that 2T = (−a, a₁a−a) and 3T = (1−a₁, a₁−a−1), so 5T = 0 iff a₁ = a+1. Therefore the general 5-torsion point on an elliptic curve is equivalent to the point (0, 0) on the curve with coefficients (a+1, a, a, 0, 0). Exercise: Find the corresponding formulas for 4T = 0, 6T = 0, and (recovering the formula in Tate's paper) 7T = 0.
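The formulas for 2T and 3T can be checked outside gp as well. The following is a minimal Python sketch using exact rational arithmetic and the standard chord-and-tangent formulas for an extended Weierstrass equation; the function names ec_add and multiples_of_T are hypothetical helpers introduced here, not gp functions.

```python
from fractions import Fraction

def ec_add(P, Q, a1, a2, a3, a4, a6):
    """Add two points on y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6.
    The origin O is represented by None."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 + y2 + a1 * x2 + a3 == 0:
        return None  # Q = -P
    if x1 == x2:
        # P = Q: slope of the tangent line
        lam = (3 * x1**2 + 2 * a2 * x1 + a4 - a1 * y1) / (2 * y1 + a1 * x1 + a3)
    else:
        # slope of the chord through P and Q
        lam = (y2 - y1) / (x2 - x1)
    nu = y1 - lam * x1
    x3 = lam**2 + a1 * lam - a2 - x1 - x2
    y3 = -(lam + a1) * x3 - nu - a3
    return (x3, y3)

def multiples_of_T(a1, a, count):
    """Return [T, 2T, ..., count*T] for T = (0,0) on the curve (a1, a, a, 0, 0)."""
    coeffs = [Fraction(c) for c in (a1, a, a, 0, 0)]
    T = (Fraction(0), Fraction(0))
    out, P = [], None
    for _ in range(count):
        P = ec_add(P, T, *coeffs)
        out.append(P)
    return out

# Generic (a1, a): 2T = (-a, a1*a - a) and 3T = (1 - a1, a1 - a - 1).
print(multiples_of_T(2, 5, 3))
# With a1 = a + 1 the fifth multiple is the origin.
print(multiples_of_T(4, 3, 5))
```

Taking a₁ = 4, a = 3 (so a₁ = a+1), the multiples are T, 2T = (−3, 9), 3T = (−3, 0), 4T = (0, −3) = −T, and 5T = O.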

Next step is to find a Weil function w. Since w is a section of 5(O), it is a linear combination of xy, x², y, x, and 1. There's a one-dimensional space of combinations that vanish to order at least 4 at T, and then the fifth zero is automatically at T as well because T is 5-torsion. One way to find these combinations is to expand y in a Taylor series about x=0 near T; we find y = x² − x³ + x⁴ + (a⁻¹−1) x⁵ + O(x⁶) , so w = x² − y − xy works. An alternative approach, which can be used even for Weil functions of really high degree, is to write w as a product of powers of linear forms. Here the functions x and y on E have divisors (T) + (−T) − 2(O) and 2(T) + (−2T) − 3(O) respectively, so xy² has divisor 5(T) + (−T) + 2(−2T) − 8(O), and we need only divide by the equation of the line through −T and −2T, which is tangent to E at −2T. This gives w = xy² / (x+y+a). Rationalizing the denominator and removing a common factor y simplifies this to (x+1)y − x², same up to sign as our previous answer.
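The Taylor expansion of y at T, and the order of vanishing of w = x² − y − xy, can be confirmed by solving the curve equation coefficient by coefficient. This is a small Python sketch under the assumption that a is a nonzero rational number; local_expansion and w_expansion are names made up for this illustration.

```python
from fractions import Fraction

def local_expansion(a, order):
    """Coefficients c[0..order] of y = sum c[k] x^k near T = (0,0) on
    y^2 + (a+1)*x*y + a*y = x^3 + a*x^2, found by comparing x^k coefficients."""
    a = Fraction(a)
    c = [Fraction(0)] * (order + 1)
    for k in range(2, order + 1):
        rhs = (a if k == 2 else 0) + (1 if k == 3 else 0)      # x^3 + a x^2
        y_squared = sum(c[i] * c[k - i] for i in range(2, k - 1))
        c[k] = (rhs - y_squared - (a + 1) * c[k - 1]) / a
    return c

def w_expansion(a, order):
    """Coefficients of w = x^2 - y - x*y as a power series in x at T."""
    c = local_expansion(a, order)
    return [(1 if k == 2 else 0) - c[k] - (c[k - 1] if k >= 1 else 0)
            for k in range(order + 1)]

print(local_expansion(2, 5))  # y = x^2 - x^3 + x^4 + (1/a - 1) x^5 + ...
print(w_expansion(2, 5))      # only the x^5 coefficient, -1/a, is nonzero
```

For a = 2 the x⁵ coefficient of y is a⁻¹ − 1 = −1/2, and w begins with −x⁵/a, so w vanishes to order exactly 5 at T.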

We are finally ready to find the value of a, and thus the curve E, for which f = w / w(−2T) = (x² − y − xy) / a² is a (5, 5, 221) Belyi function. There are several ways to go about this. A simple one is to locate the x-coordinates of the zeros of f −1 by computing the resultant with respect to y of f −1 with the defining equation of the curve. This yields a quintic in x, one of whose roots is x(−2T) = −a, and the other four must come in two equal pairs; that is, the quintic must be c (x+a) times the square of a quadratic polynomial, for some constant c. We find that the resultant is −(x+a) (x⁴ − ax³ + a²x² + 3a²x + a²−a³), so the last factor must be a square. As usual we solve for a by comparing with the Laurent expansion of the square root about x = ∞ (which here is x² − ax/2 + 3a²/8 + O(1/x)). We find that a = −8, and check that this indeed makes f a Belyi function with the desired cycle structures.
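The condition that the quartic factor be a square is easy to test directly. Here is a minimal Python sketch (the helper names are assumptions of this illustration): it reads off the only possible square root x² + bx + c from the top three coefficients and then checks the remaining two.

```python
from fractions import Fraction

def quartic(a):
    """Coefficients, highest degree first, of
    x^4 - a x^3 + a^2 x^2 + 3 a^2 x + (a^2 - a^3)."""
    return [1, -a, a * a, 3 * a * a, a * a - a**3]

def square_root_candidate(q):
    """The quadratic x^2 + b x + c forced by the x^3 and x^2 coefficients."""
    b = Fraction(q[1], 2)
    c = (q[2] - b * b) / 2
    return b, c

def is_square(q):
    b, c = square_root_candidate(q)
    return 2 * b * c == q[3] and c * c == q[4]

# Search nonzero integers a in a small window (a = 0 gives a singular curve).
solutions = [a for a in range(-50, 51) if a != 0 and is_square(quartic(a))]
print(solutions)                            # the value of a found in the text
print(square_root_candidate(quartic(-8)))   # b = -a/2, c = 3a^2/8 at a = -8
```

At a = −8 the quartic is x⁴ + 8x³ + 64x² + 192x + 576 = (x² + 4x + 24)², in agreement with the expansion x² − ax/2 + 3a²/8.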

The standard model of E has coordinates (a₁, a₂, a₃, a₄, a₆) = (1, 1, 1, 22, −9) and can be obtained for instance by telling gp E = [-7,-8,-8,0,0]; R = ellglobalred(ellinit(E)); ellchangecurve(E, R[2]) which also shows that the curve has conductor 50, small enough that it already appears in Tingley's 40-year-old Antwerp Tables (which include all curves of conductor at most 200). I usually advocate against forcing equations into such forms, which can make the equations more unwieldy and hide features such as the point (0, 0); but the ellglobalred form does have the advantage of being a canonical reduced form, which one can use to tell whether two curves are isomorphic or to compile tables for future reference. [Once the genus exceeds 1 it can be much harder to detect and find isomorphisms between two given curves.] Here we learn from the table that E has not just a rational 5-torsion point, but also a rational 3-isogeny; indeed it was already known in 1972 that this curve and the 3-isogenous one with coefficients (1, 1, 1, −3, 1) are the only elliptic curves over Q with both a rational 5-torsion point and a rational 3-isogeny. These curves' appearance here is related with the fact that the Galois closure of our Belyi function is the Bring curve of genus 4 with automorphism group S₅, which has maximal size for a genus-4 curve in characteristic zero; I hope I'll have the time to say more about this in a few weeks.

Monday, Oct. 22: Overview of complex reflection groups and their invariant rings (which give rise to highly symmetric curves)

[motivation: Klein, Fermat etc., and Bring curves]

Recall that a complex reflection is a linear transformation g of a finite-dimensional complex vector space V such that g is of finite order and fixes a subspace of codimension 1 (i.e. 1−g has rank 1); equivalently, det(g) is a root of unity ζ≠1, and g has matrix diag(ζ, 1, 1, …, 1) with respect to some basis of V (a g-eigenbasis as usual). A complex reflection group is then a finite subgroup G of GL_n(C) that is generated by the complex reflections it contains. Examples include: the Euclidean reflection groups A_n, BC_n, D_n, E_n that arise in the theory of Lie groups; any finite subgroup of GL₁(C)=C^*; and the group G₁×G₂ of block-diagonal square matrices of order n₁+n₂, if each G_i is a complex reflection group in GL_{n_i}(C).

A subgroup G of GL_n(C) acts on the polynomial ring C[x₁, x₂, …, x_n] by linear substitutions. G is said to have a polynomial invariant ring if the subring (C[x₁, x₂, …, x_n])^G is C[Φ₁, Φ₂, …, Φ_m] for some algebraically independent homogeneous polynomials Φ_i; if G is finite then necessarily m = n (and then |G| = Π_i deg(Φ_i); if G is infinite then m<n, e.g. if G is GL_n(C) itself then m is zero!) Groups with polynomial invariant rings include the finite subgroups μ_m of GL₁(C), with Φ₁=x^m; a block-diagonal group G₁×G₂ as above, provided each G_i itself has polynomial invariant ring; and the symmetric group S_n of n×n permutation matrices, for which Φ_i can be the i-th elementary symmetric function or power sum of the x's (indeed symmetric polynomials are the paradigm for a reflection group; BTW this example illustrates that the generators Φ_i are generally far from unique — though a few may be determined up to scalar multiple, and all the degrees remain the same — and the most convenient choice may depend on the application, in which case we also want to know the formulas for going between different generating sets).

The overlap between the two lists of examples is no coincidence:

Theorem (Shephard-Todd 1954): A finite subgroup G of GL_n(C) has polynomial invariant ring if and only if G is a complex reflection group.

Part of the original proof (G.C. Shephard and J.A. Todd, “Finite unitary reflection groups”, Canad. J. Math. 6 (1954), 274–304) required obtaining the full list of irreducible reflection groups (Table VII on p.301 of the Shephard-Todd paper). A more conceptual proof followed (Chevalley, later extended by Serre), but the classification of complex reflection groups remains a very useful tool and source for highly symmetric algebraic curves and other geometric structures.

There are three infinite families of irreducible complex reflection groups, plus 34 exceptional cases, in dimensions ranging from 2 to 8. Most (19) of the 34 exceptional complex reflection groups are in dimension 2 (each is a variation of one of the three special groups A₄, S₄, and A₅ of rotations of the Riemann sphere). The counts in dimensions 3, 4, 5, 6, 7, 8 are respectively 5, 5, 1, 2, 1, 1. We begin with the infinite families, then describe the exceptional reflection groups and their invariant rings.

We have encountered already most of the infinite families. The simplest one (though it happens to be listed third in the Shephard-Todd list) consists of the cyclic groups μ_m in GL₁(C), with the invariant ring generated by Φ₁ = x^m.

The next series (and the first in the Shephard-Todd table) is the symmetric group S_{n+1} acting on an n-dimensional space, the zero-sum hyperplane of the permutation representation. [...]

The final infinite series generalizes the group that arises in the symmetries of Fermat curves. For any n>1 and m>1, the μ_m-signed n×n permutation matrices constitute a complex reflection group G(m, 1, n) of order n!mⁿ with invariant degrees m, 2m, …, n m, because the invariant polynomials are symmetric functions in the m-th powers of the coordinate variables (we exclude n=1, which we have seen already, and m=1, which yields the not-quite-irreducible group S_n of permutation matrices). We obtain a homomorphism G→μ_m by mapping each matrix in G to the product of its nonzero elements. For each factor q of m, the preimage of μ_q is again a complex reflection group, which Shephard and Todd call G(m, p, n) where pq=m. This group has order n!mⁿ/p, and (again assuming m>1) acts irreducibly except for G(2,2,2) which is a Klein 4-group. The invariant degrees are m, 2m, …, (n−1)m, and nq. Recall that the invariant polynomials for G(m, 1, n) are the symmetric functions in the m-th powers of the coordinate variables; if we generate these by the elementary symmetric functions, then Φ_n is the m-th power of the product of the variables, and we get generators of the G(m, p, n) invariants by replacing this m-th power by the q-th power of the same product.
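The bookkeeping for G(m, p, n) is simple enough to tabulate. The following Python sketch (with hypothetical helper names gmpn_order and gmpn_degrees) lists the invariant degrees m, 2m, …, (n−1)m, nq with q = m/p, and confirms that their product equals the group order n!mⁿ/p.

```python
from math import factorial, prod

def gmpn_order(m, p, n):
    """Order n! * m^n / p of G(m, p, n); p is assumed to divide m."""
    return factorial(n) * m**n // p

def gmpn_degrees(m, p, n):
    """Invariant degrees m, 2m, ..., (n-1)m, and n*q where q = m/p."""
    q = m // p
    return [k * m for k in range(1, n)] + [n * q]

for (m, p, n) in [(2, 1, 3), (2, 2, 3), (3, 3, 2), (4, 4, 2), (6, 2, 4)]:
    degrees = gmpn_degrees(m, p, n)
    print((m, p, n), degrees, gmpn_order(m, p, n), prod(degrees))
```

For instance G(2, 2, 3) has degrees 2, 4, 3 and order 24, the same data as S₄ acting on its 3-dimensional reflection representation, and G(m, m, 2) has degrees m, 2 and order 2m.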

The real (a.k.a. Euclidean) reflection groups in these families are: μ₂ = {±1} = A₁ (and trivially μ₁ = {1}) in GL₁(R); the symmetric group S_{n+1} = A_n in GL_n(R); the groups G(2, 1, n) = BC_n and G(2, 2, n) = D_n, again in GL_n(R); and less obviously (because this requires a complex change of basis) G(m, m, 2) which is the 2m-element dihedral group in GL₂(R).

It is not quite correct to say the three infinite families plus the 34 exceptional cases fully account for the irreducible complex reflection groups. There's also the missing group G(2,2,2), and several coincidences where the same small group appears more than once on the list. This is already familiar from the classification of simple Lie groups, where in addition to the exceptional groups G₂, F₄, E₆, E₇, E₈ we have the missing D₂ (which is not simple, decomposing as two A₁'s) and the coincidence A₃=D₃. Since Lie groups correspond to Euclidean reflection groups, these irregularities appear in our list: the 34 exceptions include F₄, E₆, E₇, E₈; the reducible D₂ is the reducible G(2,2,2); and A₃=D₃ is the coincidence of S₄ with G(2,2,3). [This last one is related with the exact sequence 1→(Z/2Z)²→S₄→S₃→1; note how the invariant degrees 2,3,4 manage to match up!] There are two further coincidences involving the dihedral groups: G(3,3,2) is also S₃, and G(4,4,2) is also G(2,1,2). These correspond to the Lie groups A₂ and B₂; note that G₂ is not exceptional as a reflection group, because like A₂ and BC₂ it's just one of the dihedral groups (though distinguished by being “crystallographic”).

[In positive characteristic, it is known that a representation of a finite group that has polynomial invariant ring must still be generated by g such that 1−g has rank 1, but there are several new complications. For one, such g could also be a “transvection”, with all eigenvalues 1 but one 2×2 Jordan block. Dickson already gave the example of SL_n(F_q), which has no reflections (why?) but is generated by transvections, and has an invariant polynomial ring with generators of degrees qⁿ − qⁱ for 0 < i < n and (qⁿ − 1) / (q − 1). Dickson also showed that GL_n(F_q) has invariant polynomial ring with degrees qⁿ−qⁱ (0 ≤ i < n). The full list of such groups is known, but this is a more recent result, and here it turns out that not all reflection/transvection groups actually have polynomial invariant rings.]

Wednesday, Oct. 24: The Hilbert-Molien series of a group representation; A₄, S₄, A₅ acting on the Riemann sphere

Recall that if U is a graded vector space with degrees 0, 1, 2, …, and finite-dimensional graded pieces U₀, U₁, U₂, …, then the Hilbert series for U is the generating function Σ_{n≥0} dim(U_n) · tⁿ. If U = (C[V])^G, the algebra of invariants for a finite group G acting on a finite-dimensional space V, then this generating function is called the “(Hilbert-)Molien series” of the representation. If the invariant ring is polynomial with generators of degrees d_i then the series is 1 / Π_i (1 − t^{d_i}). In characteristic zero, the fixed subspace of any finite-dimensional representation of G has dimension |G|⁻¹ Σ_{g∈G} tr(g). Applying this to each graded piece of C[V] yields the formula |G|⁻¹ Σ_{g∈G} (1 / det(1− t g)) for the Molien series. (This formula figures in one approach to the classification of complex reflection groups, which is one way that the problem becomes much harder in positive characteristic.) For example, if V = C and G = μ_m we get m⁻¹ Σ_{ζ^m=1} 1 / (1− ζ t), which simplifies to 1 / (1− t^m) as expected; if V = C² and G = {±1}, we get (1/2) [(1−t)⁻² + (1+t)⁻²] = (1+t²) / (1−t²)², so (as we already knew) the invariant ring is not polynomial, though here it's still a “complete intersection ring”, with three generators x², xy, y² in degree 2 and one relation x²y²=(xy)² in degree 4, corresponding to the factorization (1−t⁴) / (1−t²)³ of the Molien series. [...]
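The {±1} example can be reproduced numerically. This is a minimal Python sketch of Molien's average, assuming each group element is supplied through the coefficient list of the polynomial det(1 − tg) (constant term first); the function names are inventions of this sketch.

```python
from fractions import Fraction

def series_inverse(poly, order):
    """Power series of 1/poly up to t^order, assuming poly[0] != 0."""
    inv = [Fraction(1) / poly[0]]
    for k in range(1, order + 1):
        s = sum(poly[i] * inv[k - i] for i in range(1, min(k, len(poly) - 1) + 1))
        inv.append(-s / poly[0])
    return inv

def series_mul(p, q, order):
    """Product of two series, truncated after t^order."""
    out = [Fraction(0)] * (order + 1)
    for i, pi in enumerate(p[: order + 1]):
        for j, qj in enumerate(q[: order + 1 - i]):
            out[i + j] += pi * qj
    return out

def molien(det_polys, order):
    """Average of 1/det(1 - t g) over the group, as a truncated series."""
    total = [Fraction(0)] * (order + 1)
    for poly in det_polys:
        for k, coeff in enumerate(series_inverse(poly, order)):
            total[k] += coeff
    return [c / len(det_polys) for c in total]

# G = {+1, -1} acting on C^2: det(1 - t g) is (1-t)^2 or (1+t)^2.
dims = molien([[1, -2, 1], [1, 2, 1]], 8)
# Compare with (1 - t^4) / (1 - t^2)^3, where (1 - t^2)^3 = 1 - 3t^2 + 3t^4 - t^6.
closed_form = series_mul([1, 0, 0, 0, -1],
                         series_inverse([1, 0, -3, 0, 3, 0, -1], 8), 8)
print(dims)
print(closed_form)
```

Both lists give the dimensions 1, 0, 3, 0, 5, 0, 7, 0, 9 of the invariants in degrees 0 through 8, i.e. d+1 even-degree monomials in each even degree d.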

We next work out in some detail the identities related with exceptional finite subgroups of GL₂(C), which give rise to some beautiful mathematics (mostly classical but with various modern links) that should be [i.e. that I wish were] better known. For starters, suppose G is a finite subgroup (not necessarily a complex reflection group) of GL₂(C), and let D be its normal subgroup of scalar matrices. Recall that any complex representation of a finite group G fixes a positive-definite Hermitian pairing (obtained by averaging, a simple case of the “unitarian trick”), and thus maps G to the unitary group of that pairing. [This is why Shephard and Todd can title their paper “finite unitary reflection groups” and still get a description of all finite complex reflection groups.] So here G is a subgroup of U₂(C). It follows that the induced action of G₀ := G / D on P¹(C) is an injection into PU₂(C), which is the group SO₃(R) of Euclidean rotations of the Riemann sphere P¹(C). Now it is known that a finite subgroup of SO₃(R) is cyclic, dihedral, or one of the three exceptional groups A₄, S₄, A₅. The cyclic subgroups yield reducible representations, and dihedral cases yield reflection groups G(m, p, 2). We next consider the exceptional cases, in which G₀ is the group of orientation-preserving symmetries of the regular tetrahedron, octahedron (dually: cube), or icosahedron (dually: dodecahedron) inscribed in the Riemann sphere.

For any finite subgroup G₀ of PU₂(C) (or even PGL₂(C)) its preimage G₁ in SL₂(C) is in the middle of a short exact sequence 1→{±1}→G₁→G₀→1. When G₀ is one of the three exceptional groups, or more generally any group containing an involution, the short exact sequence cannot split, because SL₂(C) contains no involution other than the central element −1. We next describe in each case polynomials that are invariant or at least “covariant” under the action of these groups 2A₄, 2S₄, 2A₅. (We say P is “covariant” under an action of G when there's a homomorphism χ: G→C^* such that gP = χ(g)P for all group elements g.) Note that G₁ is never a reflection group, because it contains no reflections (a complex reflection cannot have determinant 1); but for most of our purposes we need only the projective action, and also once we know the covariant polynomials we can easily describe the invariant rings of each of the reflection groups with the same image in PGL₂(C).

A nonzero polynomial P is covariant for G₁ iff its zero divisor (which is just a finite multiset in the Riemann sphere) is invariant under G₀. Now each of our G₀ is the group of rotations of a regular polyhedron with N triangles meeting at each vertex, and acts freely on the Riemann sphere except for the vertices, face centers, and edge centers of the polyhedron, whose stabilizers are cyclic of order N, 3, 2 respectively. Here are the familiar counts:

N   G₀   polyhedron    |G₀|   V    F    E
3   A₄   tetrahedron   12     4    4    6
4   S₄   octahedron    24     6    8    12
5   A₅   icosahedron   60     12   20   30

Now the Euler relation E = V + F − 2 = (V−1) + (F−1) means that once we know the polynomials of degrees V and F we can obtain the third polynomial as the Jacobian determinant of the first two. (The Jacobian cannot vanish because the polynomials are algebraically independent.) Also, since our polyhedron has triangular faces we have F = 2E/3, which together with Euler's formula implies F = 2(V−2); thus we can get the polynomial of degree F as the Hessian (determinant of the matrix of second partial derivatives) of the degree-V polynomial. So it remains to find the G₀-orbit of smallest size V in the Riemann sphere. In each case there is a linear relation in degree |G₀| between the N-th power, cube, and square of the covariants of degree V, F, E respectively. The ratio of these powers gives the quotient map P¹(C) → P¹(C) / G₀ = P¹(C), with the target P¹(C) arising naturally as a line in P²(C), and intersecting the three coordinate lines of that P²(C) at the three branch points of the target P¹(C). In each case this quotient map can also be identified as the covering of modular curves X(N) → X(1), with the branch points of order 2, 3, and N at j = 1728 = 12³, j = 0, and j = ∞ respectively.
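The numerical coincidences used in this paragraph can be checked against the table of counts in a few lines. This Python sketch is only a sanity check; the dictionary layout and the name check_counts are choices made here.

```python
# N: (polyhedron, |G0|, V, F, E), copied from the table of counts
POLYHEDRA = {
    3: ("tetrahedron", 12, 4, 4, 6),
    4: ("octahedron", 24, 6, 8, 12),
    5: ("icosahedron", 60, 12, 20, 30),
}

def check_counts(N, order, V, F, E):
    euler = (E == V + F - 2)               # degree of a Jacobian: (V-1) + (F-1)
    triangles = (3 * F == 2 * E)           # each face has 3 edges, each edge 2 faces
    hessian_degree = (F == 2 * (V - 2))    # degree of the Hessian of a degree-V form
    common_degree = (N * V == 3 * F == 2 * E == order)  # degree of the linear relation
    return euler and triangles and hessian_degree and common_degree

for N, (name, order, V, F, E) in POLYHEDRA.items():
    print(N, name, check_counts(N, order, V, F, E))
```

The last condition says that the N-th power, cube, and square of the three covariants all have degree |G₀|, which is what makes a linear relation among them possible.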

Wednesday, Oct. 31: Covariants of subgroups of GL₂ related to A₄ and S₄

We next give explicit formulas in each case, using x and y as homogeneous coordinates and z = x/y.

N=3: We put the four vertices of the tetrahedron at z = ∞ and the cube roots of unity (note that this choice is not consistent with the usual picture of the Riemann sphere with the equator on the unit circle; we shall see that the equator ends up being |z| = 2^½). Thus we may take for the first polynomial A = x³y − y⁴. The Hessian of A, divided by −9, is B = x⁴ + 8xy³, with roots at z = 0, −2, and 1±(−3)^½. Dividing the Jacobian derivative ∂(A,B) / ∂(x,y) by −4 yields C = x⁶ − 20x³y³ − 8y⁶, with relation 64A³ − B³ + C² = 0.
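The Hessian, the Jacobian, and the relation among A, B, C can all be verified with a few lines of exact polynomial arithmetic. Below is a minimal Python sketch that stores a polynomial in x, y as a dictionary {(i, j): coefficient of xⁱyʲ}; this representation and all helper names are assumptions of the sketch.

```python
from collections import defaultdict

def clean(p):
    return {m: c for m, c in p.items() if c != 0}

def add(p, q, s=1):
    """Return p + s*q."""
    out = defaultdict(int, p)
    for m, c in q.items():
        out[m] += s * c
    return clean(out)

def mul(p, q):
    out = defaultdict(int)
    for (i, j), c in p.items():
        for (k, l), d in q.items():
            out[(i + k, j + l)] += c * d
    return clean(out)

def scale(p, s):
    return clean({m: s * c for m, c in p.items()})

def dx(p):
    return clean({(i - 1, j): i * c for (i, j), c in p.items() if i > 0})

def dy(p):
    return clean({(i, j - 1): j * c for (i, j), c in p.items() if j > 0})

def hessian(p):
    return add(mul(dx(dx(p)), dy(dy(p))), mul(dx(dy(p)), dx(dy(p))), -1)

def jacobian(p, q):
    return add(mul(dx(p), dy(q)), mul(dy(p), dx(q)), -1)

A = {(3, 1): 1, (0, 4): -1}                 # x^3 y - y^4
B = {(4, 0): 1, (1, 3): 8}                  # x^4 + 8 x y^3
C = {(6, 0): 1, (3, 3): -20, (0, 6): -8}    # x^6 - 20 x^3 y^3 - 8 y^6

A3, B3, C2 = mul(A, mul(A, A)), mul(B, mul(B, B)), mul(C, C)
relation = add(add(scale(A3, 64), B3, -1), C2)   # 64 A^3 - B^3 + C^2

print(hessian(A) == scale(B, -9))       # Hessian of A is -9 B
print(jacobian(A, B) == scale(C, -4))   # Jacobian of (A, B) is -4 C
print(relation)                         # empty dictionary: the zero polynomial
```

The same helpers apply unchanged to any pair of binary forms, so other Hessian and Jacobian normalizations can be checked the same way.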

G₀ clearly contains the 3-cycle z→ζz where ζ is a cube root of unity. [Sorry, no \mapsto in HTML.] This 3-cycle lifts to the pair of linear substitutions (x, y) → ±(ζ⁻¹x, ζy) in G₁, which multiply A by ζ and B by ζ⁻¹, leaving C fixed. G₀ also contains a Klein 4-group, because any four-point subset of P¹ determines a Klein 4-group that permutes the set freely and transitively (i.e. sharply 1-transitively, a consequence of the fact that PGL₂ acts sharply 3-transitively on P¹). For example, G₀ contains the involution that takes 1↔∞ and ζ↔ζ⁻¹, which is z ↔ (z+2) / (z−1). [In general, a fractional linear transformation z → (az+b) / (cz+d) is an involution iff a + d = 0, i.e. iff the trace of the corresponding 2×2 matrix vanishes.] This involution, together with z→ζz, generates G₀. The involution lifts to the 4-cycles (x, y) → ±(−3)^−½(x+2y, x−y) in G₁, which leave A, B, C all invariant. Thus A, B, C are covariants of G₁ with characters that take our 3-cycle (x, y) → (ζ⁻¹x, ζy) to ζ, ζ⁻¹, 1 respectively. Call the first of these characters χ. We get the smallest exceptional reflection group (#4 in the Shephard-Todd table) by replacing each element g of G₁ by χ(g)g. The resulting subgroup of GL₂(C) is a double cover of the same G₀, and is abstractly isomorphic with G₁, but contains complex reflections such as (x, y) → (x, ζ⁻¹y). Its ring of invariants is generated by B and C, with degrees 4 and 6. The other three reflection groups mapping to G₀ are obtained from this one by extending the center from {±1} to μ₄, μ₆, and μ₁₂; their invariant degrees are respectively (4, 12), (6, 12), and (12, 12): change C to C², B to B³, or both. These are Shephard and Todd's groups 6, 5[sic], and 7.
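One quick way to see that the 3-cycle and the involution generate a group of order 12 is to track how they permute the four vertices ∞, 1, ζ, ζ². The Python sketch below does just that; the labelling of the vertices by 0, 1, 2, 3 and the function names are choices made for this illustration, and it checks only the permutation action, not the matrices themselves.

```python
def compose(p, q):
    """Permutation 'q first, then p', with permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Closure of a set of permutations under composition."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def is_even(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return inversions % 2 == 0

# Vertices labelled 0: infinity, 1: z = 1, 2: z = zeta, 3: z = zeta^2.
three_cycle = (0, 2, 3, 1)   # z -> zeta*z fixes infinity, sends 1 -> zeta -> zeta^2 -> 1
involution = (1, 0, 3, 2)    # z <-> (z+2)/(z-1) swaps 1 with infinity, zeta with zeta^2

G0 = generate([three_cycle, involution])
print(len(G0), all(is_even(g) for g in G0))
```

The closure has 12 elements, all even permutations, so the two maps generate a copy of A₄ acting on the vertices.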

Besides the A₄ cover P¹→P¹ (a.k.a. X(3)→X(1)), these polynomials with tetrahedral symmetry, especially quartics like A and B, arise in the construction of symmetric higher-genus curves and other objects. (See below for sextics like C which have octahedral symmetry.) Consider first the elliptic curve w² = A(x, y). For any homogeneous quartic f(x, y) with distinct roots, the Klein 4-group of symmetries of the roots lifts to the elliptic curve w² = f (x, y), giving translation by the 2-torsion points of the curve. For f = A the curve also has a 3-cycle that is not translation by a torsion point (because it has fixed points), so we get a curve with j-invariant 0. (This is also clear from our formula for A; e.g. dehomogenizing by setting y=1 yields w² = x³ − 1.) Likewise a quartic such as x⁴−y⁴ with dihedral symmetry yields an elliptic curve w² = f (x, y) with j-invariant 1728. Going beyond genus 1, the quartic plane curve w⁴ = f (x, y) has 48 symmetries, forming a reducible complex reflection group in PGL₃(C) (and a dihedral f yields the Fermat quartic, with 96 symmetries not all of which preserve the map to the (x : y) line). Finally (for now), consider the smooth quartic surface A(x, y) = A(v, w) in P³(C). Schur observed in 1882 that the 12 symmetries of the tetrahedron yield 64 lines on this quartic, four for each symmetry plus 4²=16 joining a root of A(x, y) to a root of A(v, w); this is more than the 48 of the Fermat quartic surface, though the Fermat quartic surface has more symmetries. Segre showed in 1943 that 64 lines is maximal for a smooth quartic surface over C (in small positive characteristic there can be extra lines).

The μ₄ case (#6) also arises in coding theory; this is the reason I chose χ(g)g rather than χ⁻¹(g)g, which is equivalent and has quartic invariant A rather than the “larger” B. Let K be a linear code of length n over a finite field of q elements. [Normally one uses not K but C for Code, but this could get confusing here…] Recall that the (Hamming) weight enumerator W_K(x, y) is the homogeneous polynomial of degree n whose x^{n−w}y^w coefficient is the number of codewords of weight w (each w in [0,n]). The weight enumerator of any linear code K is related with the weight enumerator of its dual code K^⊥ via the MacWilliams identity W_{K^⊥}(x, y) = |K|⁻¹ W_K (x + (q−1) y, x − y). Suppose now that K is a “Type III code”, i.e. that q = 3 and K is self-dual. The first example with n > 0 is the “tetracode”, generated by (1, 1, 1, 0) and (1, −1, 0, 1), with weight enumerator x⁴ + 8xy³ (all eight nonzero words have weight 3). This looks familiar for good reason! The condition K=K^⊥ implies that |K| = 3^{n/2} (in general a self-dual code of length n has dimension n/2) and that every word has weight divisible by 3 (compute the pairing of any word with itself). The former property, together with MacWilliams, implies that the weight enumerator W_K is invariant under (x, y) ↔ 3^{−½}(x+2y, x−y); the latter, that W_K is invariant under (x, y) → (x, ζy). Therefore W_K is invariant under the subgroup of GL₂(C) generated by these two linear transformations, which is reflection group #6 with center μ₄ = {±1, ±i} (note that the scaling coefficient in the MacWilliams identity is 3^{−½}, not (−3)^{−½}). This yields Gleason's theorem for Type III codes: the weight enumerator is a polynomial in B = x⁴ + 8xy³ and C², or equivalently in B and A³ = y³(x³−y³)³. In particular, 4 | n, which was not obvious (though it can be proved by more direct means).
Also, any Type III code of length 8 has weight enumerator B², and a Type III code of length 12 with no words of weight 3 must have weight enumerator B³ − 24A³ = x¹² + 264x⁶y⁶ + 440x³y⁹ + 24y¹². It is known that such a code exists, and is unique up to isomorphism, namely the extended ternary Golay code, which is also a natural route to the sporadic Mathieu group M₁₂, and especially its double cover 2.M₁₂; e.g. the 132 pairs of words of weight 6 are supported on the blocks of the (5,6,12) Steiner system, and the 12 pairs of words of maximal weight form the unique Hadamard matrix of order 12.
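The tetracode is small enough to enumerate by brute force. This Python sketch lists its nine words, tabulates their weights, and applies the MacWilliams transform with x set to 1; the function names are hypothetical and the code works with residues mod 3 rather than with any coding-theory library.

```python
from itertools import product

def codewords(generators, q=3):
    """All F_q-linear combinations of the generator rows (q prime)."""
    n = len(generators[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % q
                  for i in range(n))
            for coeffs in product(range(q), repeat=len(generators))}

def weight_distribution(words):
    """dist[w] = number of codewords of Hamming weight w."""
    n = len(next(iter(words)))
    dist = [0] * (n + 1)
    for word in words:
        dist[sum(1 for s in word if s != 0)] += 1
    return dist

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, k):
    out = [1]
    for _ in range(k):
        out = poly_mul(out, p)
    return out

def macwilliams(dist, q=3):
    """Weight distribution of the dual code: |K|^-1 W_K(x + (q-1)y, x - y) at x = 1."""
    n, size = len(dist) - 1, sum(dist)
    total = [0] * (n + 1)
    for w, count in enumerate(dist):
        term = poly_mul(poly_pow([1, q - 1], n - w), poly_pow([1, -1], w))
        for k, c in enumerate(term):
            total[k] += count * c
    return [c // size for c in total]   # exact division for a linear code

tetracode = codewords([(1, 1, 1, 0), (1, -1, 0, 1)])
dist = weight_distribution(tetracode)
print(len(tetracode), dist, macwilliams(dist))
```

The distribution [1, 0, 0, 8, 0] encodes x⁴ + 8xy³, and it is unchanged by the MacWilliams transform, as it must be for a self-dual code.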

N=4: We have two natural choices here. One is to note that the edge-centers of a regular tetrahedron are the vertices of a regular octahedron, while the vertices of the tetrahedron and its dual constitute the eight vertices of the octahedron's dual cube. We may thus use the above C and AB = x⁷y + 7x⁴y⁴ − 8xy⁷ as our sextic and octic polynomials with octahedral symmetry. (As we know, we could also construct AB as a multiple of the Hessian of C; as it happens it's the Hessian divided by −3600.) Then D = (1/6) ∂(C, AB) / ∂(x,y) = x¹² + 88x⁹y³ + 704x³y⁹ − 64y¹² is the covariant dodecic(?), with C⁴ + 256(AB)³ − D² = 0. [What would happen if we instead took the Hessian of AB to get a covariant polynomial of degree 2(8−2)=12?] In this picture the symmetry group S₄ consists of the A₄ and its composition with the involution z ↔ −2/z that switches the roots of A and B. The polynomials AB, C, and D are invariant under 2A₄, but the action of 2S₄ multiplies C by the nontrivial character 2S₄→{±1} coming from the sign character of S₄, while fixing AB and D. This can be seen directly from the action of the determinant-1 lifts of z ↔ −2/z, which are (x, y) → ±(2^½y, −2^−½x). It follows that if we lift even permutations from S₄ to SL₂(C), but odd permutations to matrices of determinant −1 in GL₂(C), we get another double cover of S₄ that does have a polynomial invariant ring, generated by AB and C; this is Shephard and Todd's complex reflection group #12 (where #8 through #11 are larger groups that contain the lift of S₄ to SL₂(C)).

It is often more convenient to start from P = xy (x⁴−y⁴), whose roots z = 0, ∞, ±1, ±i are vertices of an octahedron inscribed in the sphere with obvious fourfold symmetry z → iz (and with the equator restored to |z| = 1). Dividing the Hessian of P by −25 yields Q = x⁸ + 14x⁴y⁴ + y⁸, with roots at the vertices of the cube dual to that octahedron. Dividing the Jacobian derivative ∂(P,Q) / ∂(x,y) by −8 then yields R = x¹² − 33x⁸y⁴ − 33x⁴y⁸ + y¹², which has its twelve roots at the eight points μ₄(1±2^½) and the four primitive 8th roots of unity 2^−½(±1±i ). The identity relating these covariants is 108P⁴ − Q³ + R² = 0.
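Both octahedral relations can be confirmed without symbolic algebra: each is a form of degree 24, so it vanishes identically as soon as it vanishes on a 25×25 grid of integer points. The Python sketch below uses that observation; the function names are ad hoc, and the polynomials are copied from the formulas for A, B, C, D and P, Q, R in these notes.

```python
def A(x, y): return x**3 * y - y**4
def B(x, y): return x**4 + 8 * x * y**3
def C(x, y): return x**6 - 20 * x**3 * y**3 - 8 * y**6
def D(x, y): return x**12 + 88 * x**9 * y**3 + 704 * x**3 * y**9 - 64 * y**12

def P(x, y): return x * y * (x**4 - y**4)
def Q(x, y): return x**8 + 14 * x**4 * y**4 + y**8
def R(x, y): return x**12 - 33 * x**8 * y**4 - 33 * x**4 * y**8 + y**12

def tetrahedral_model(x, y):
    # C^4 + 256 (AB)^3 - D^2, octahedral covariants built from the tetrahedral ones
    return C(x, y)**4 + 256 * (A(x, y) * B(x, y))**3 - D(x, y)**2

def standard_model(x, y):
    # 108 P^4 - Q^3 + R^2, octahedron with vertices 0, infinity, +-1, +-i
    return 108 * P(x, y)**4 - Q(x, y)**3 + R(x, y)**2

def vanishes_on_grid(form, bound=12):
    """True if form(x, y) = 0 at all integer points with |x|, |y| <= bound.
    A polynomial of degree <= 24 in each variable that vanishes on this
    25 x 25 grid is the zero polynomial."""
    return all(form(x, y) == 0
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1))

print(vanishes_on_grid(tetrahedral_model), vanishes_on_grid(standard_model))
```

Python integers are arbitrary precision, so the evaluations are exact even though the intermediate values have dozens of digits.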

Here the symmetry group is generated by z → iz together with the involution z ↔ (z+1) / (z−1), which switches 1 ↔ ∞, 0 ↔ −1, and i ↔ −i. The determinant of the associated linear map (x, y) → (x+y, x−y) is −2, so its lifts to SL₂ are obtained by dividing by the square roots of −2. [...]

Monday, Nov. 5: Covariants of subgroups of GL₂ related to A₅; overview of the exceptional reflection groups of dimension 3 and higher

N=5:

Wednesday, Nov. 7: Hurwitz quaternions, the W(D₄) lattice, and the W(F₄) invariants; Belyi functions parametrizing trinomials with interesting Galois groups

Monday, Nov. 19: Introduction to (classical (elliptic)) modular curves

Review of the complex-analytic picture of the modular curves Y(1) and X(1) (affine and projective j-line), which parametrize C-isomorphism classes of elliptic curves C/Λ (or, for j=∞, degenerate elliptic curves C / Zω = C^*), and their covers [Y(N), Y₁(N), Y₀(N) and] X(N), X₁(N), X₀(N), which respectively parametrize curves with full level-N structure, an N-torsion point, or a cyclic N-isogeny. These curves are all Belyi covers of the projective j-line, branched only above the cusp j = ∞ and the elliptic points j = 0, 1728 of order 3 and 2 respectively (the latter two parametrizing elliptic curves y² = x³ + a₆, y² = x³ + a₄x with triangular and square period lattices and extra automorphisms).

The covers X(N) → X(1) are Galois. For small N these curves are well known: rational for N = 2, 3, 4, 5, with SL₂(Z/NZ) / {±1} acting as S₃, A₄, S₄, A₅; elliptic for N = 6, with equation y² = x³ + 1; the Klein quartic for N = 7; and an unramified double cover of the Fermat quartic for N = 8. For large N, the genus of X(N) grows roughly as N³/24, so we probably don't want to work directly with these curves past N=11 or so, when the genus is 26, though there are still nice models in projective spaces of dimension about N/2 (far from complete intersections once N>8, but retaining all the modular automorphisms of X(N)). But X₁(N) and especially X₀(N) are feasible to work with even for considerably larger N, since their genus grows only quadratically and linearly or so with N, and the modular structure and the availability of q-expansions of modular functions and forms make such calculations feasible well beyond the range of “random” Belyi functions. For example, we can compute the genus-16 curve X₀(191), and write j as an explicit degree-192 Belyi function on this curve. We've used these formulas to exhibit the j-invariants of a pair of Galois-conjugate elliptic curves defined over a quadratic number field that are related by an isogeny of degree 191.

We're not up to computing with X₀(191) yet, but we'll show how to get rational coordinates and j on X₀(N) for some smaller but still not entirely trivial N, including 13 and 25, which are respectively the last prime and the last integer for which X₀(N) has genus 0. In general, if N is prime then the Belyi map X₀(N) → X(1) has degree N+1. [In the modular-curves literature one often denotes the level by N, not p, even when it is assumed prime.] The cusp j = ∞ has two preimages (the cusps of X₀(N)), a simple pole at τ = ∞ and a pole of order N at τ = 0. The zeros of j are all triple except for one simple zero for N=3 and a pair of simple zeros if N is 1 mod 3. Likewise, the zeros of j − 1728 are all double except for one simple zero for N=2 and a pair of simple zeros if N is 1 mod 4. (This can be seen by identifying the sheets of the cover X₀(N) → X(1) with the N+1 points of a projective line mod N, namely the line associated to the 2-dimensional (Z/NZ)-vector space of N-torsion points on an elliptic curve; the monodromy generators of the three branch points are the reductions mod N of the stabilizers of τ = ∞, i, ρ in Γ(1) = SL₂(Z) / {±1}, namely [1,1; 0,1], [0,1; −1,0], [0,−1; 1,1], the first of which is indeed the product of the other two.) It then follows by Riemann-Hurwitz that X₀(N) has genus g if N = 12g + {−1, 5, 7, 13} (and genus zero for N=2 and N=3). In particular, X₀(N) is rational iff N is one of 2, 3, 5, 7, 13. It is no coincidence that these are precisely the primes N for which N−1 is a factor of 24; we shall give a uniform construction of a function h of degree 1 on X₀(N) for each such N (which works also for the composite cases N=4, 9, 25, albeit not quite as nicely because for these N the curve has more than two cusps). We'll then write j as a rational function of this h, and verify that it is a Belyi function with the desired cycle structures.
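The Riemann-Hurwitz count for prime N can be spelled out explicitly. This is a short Python sketch using only the ramification data stated above (two cusps of widths 1 and N, and the listed simple zeros of j and j − 1728); the function names are hypothetical.

```python
def genus_X0_prime(N):
    """Genus of X0(N) for prime N, from the ramification of j over infinity, 0, 1728."""
    degree = N + 1
    simple_over_0 = 1 if N == 3 else (2 if N % 3 == 1 else 0)
    simple_over_1728 = 1 if N == 2 else (2 if N % 4 == 1 else 0)
    ram_infinity = (1 - 1) + (N - 1)                     # poles of order 1 and N
    ram_0 = 2 * ((degree - simple_over_0) // 3)          # each triple zero contributes 2
    ram_1728 = (degree - simple_over_1728) // 2          # each double zero contributes 1
    # Riemann-Hurwitz over a genus-0 base: 2g - 2 = -2*degree + total ramification
    return (2 - 2 * degree + ram_infinity + ram_0 + ram_1728) // 2

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

rational_levels = [N for N in range(2, 200) if is_prime(N) and genus_X0_prime(N) == 0]
print(rational_levels)
print(genus_X0_prime(11), genus_X0_prime(37), genus_X0_prime(191))
```

The primes of genus 0 come out as 2, 3, 5, 7, 13, and the genera for N = 11, 37, 191 are 1, 2, 16, consistent with N = 12g + {−1, 5, 7, 13}.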

As usual there's a PGL₂ choice for the rational coordinate h on a rational curve, which we exploit (and reduce) by putting a convenient point at infinity. Here we require that the pole of h be at the cusp τ = ∞, i.e. that the Laurent expansion of h in terms of q = e^{2πiτ} begin with a 1/q term; and we scale h so that h = q⁻¹ + O(1). This determines h up to an additive constant. [Such a normalized rational coordinate on a modular curve is classically called a “Hauptmodul” (plural “Hauptmoduln”) for the curve.] We could pin down h completely by requiring that the constant term of the q-expansion vanish (that is, h = q⁻¹ + O(q)); but this is usually not natural, and we prefer to use this freedom to put the zero of h at some convenient place (e.g. we usually work with j, not j−744). Here we put the zero at the other cusp of X₀(N), so that h takes the values 0 and ∞ at τ = 0 and ∞ respectively.

For N=2 we easily construct such h as Δ(τ) / Δ(2τ), where Δ(τ) = 12⁻³ (E₄(τ)³ − E₆(τ)²) = q Π_{n≥1} (1−qⁿ)²⁴ is the cusp form of weight 12 for Γ(1), which is nonzero everywhere except for the simple zero at the cusp of X(1). Then j is a rational function of degree 3 in this h, with a simple pole at h = ∞ and a double pole at h = 0; thus it is a linear combination of h, 1, h⁻¹, h⁻², and by comparing coefficients (or using serreverse) we find the coefficients 1, 768, 196608, 16777216 = 1, 3 · 2⁸, 3 · 4⁸, 8⁸, whence j = (h+256)³ / h² = 1728 + (h+64) (h−512)² / h². [Check: The general elliptic curve with a 2-isogeny is the same as the general curve y² = x³ + ax² + bx with a 2-torsion point. This curve has j = 256 (3b−a²)³ / (b² (4b−a²)). Equating this to (h+256)³ / h² yields a cubic in h with one rational root, h = 256 b / (a²−4b); equivalently, (b : a²) = (h : 4h+256). This gives an explicit parametrization by X₀(2) of elliptic curves with a 2-isogeny (up to quadratic twist, which the modular curve cannot detect).]
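The identity j = (h+256)³ / h² can be tested on q-expansions with nothing more than truncated integer power series. The Python sketch below clears the poles by working with q·j and q·h and compares both sides of (q·j)(q·h)² = (q·h + 256q)³ modulo q^M; the truncation order M = 12 and every helper name are choices made here.

```python
M = 12  # work modulo q^M

def mul(a, b):
    out = [0] * M
    for i, ai in enumerate(a):
        if ai:
            for j in range(M - i):
                out[i + j] += ai * b[j]
    return out

def power(a, k):
    out = [1] + [0] * (M - 1)
    for _ in range(k):
        out = mul(out, a)
    return out

def inverse(a):
    """Inverse of a power series with constant term 1."""
    inv = [1] + [0] * (M - 1)
    for k in range(1, M):
        inv[k] = -sum(a[i] * inv[k - i] for i in range(1, k + 1))
    return inv

def sigma3(n):
    return sum(d**3 for d in range(1, n + 1) if n % d == 0)

def euler_product(step):
    """prod_{n >= 1} (1 - q^(step*n)) modulo q^M."""
    out = [1] + [0] * (M - 1)
    for n in range(step, M, step):
        factor = [0] * M
        factor[0], factor[n] = 1, -1
        out = mul(out, factor)
    return out

E4 = [1] + [240 * sigma3(n) for n in range(1, M)]
delta_over_q = power(euler_product(1), 24)          # Delta(tau) / q
delta2_over_q2 = power(euler_product(2), 24)        # Delta(2 tau) / q^2

q_j = mul(power(E4, 3), inverse(delta_over_q))      # q * j
q_h = mul(delta_over_q, inverse(delta2_over_q2))    # q * h, with h = Delta(tau)/Delta(2 tau)

lhs = mul(q_j, mul(q_h, q_h))                       # q^3 * j * h^2
shifted = q_h[:]
shifted[1] += 256                                   # q * (h + 256)
rhs = power(shifted, 3)                             # q^3 * (h + 256)^3

print(q_j[:3], q_h[:3])
print(lhs == rhs)
```

Agreement of the first M coefficients is of course only a consistency check, not a proof; since both sides are modular functions with bounded poles, finitely many coefficients do suffice in principle, but the sketch does not attempt to justify the bound.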

For prime N>2, we can likewise construct the rational function Δ(τ) / Δ(Nτ) on X₀(N) with a pole only at the cusp ∞ and a zero only at the cusp 0, but the pole and zero have order N−1 > 1, so we would have to take an (N−1)-st root to get a Hauptmodul, and this is possible only when N−1 is a factor of 24. [In general, we can only take a root of order gcd(N−1, 12), and find that the divisor (0) − (∞) on X₀(N) yields a torsion point of order (N−1) / gcd(N−1, 12) on the Jacobian J₀(N) of the modular curve.] This is suggested by the 24th powers in the product formula for Δ. Indeed it is known that the 24th root

η(τ) = q^1/24 Π_n≥1 (1−qⁿ) = q^1/24 − q^25/24 − q^49/24 + q^121/24 + q^169/24 − − + + …

is a modular cusp form of weight 1/2 (with μ₂₄-valued multipliers) for Γ(1). [I prefer this writing of the q-expansion to the usual one that factors out q^1/24 to get “pentagonal-number” exponents, because the squares 1, 25, 49, 121, 169, … show we're dealing with a twisted theta function.] Thus h := (η(τ) / η(Nτ))^24/(N−1) is a modular function for some congruence subgroup of Γ₀(N), and since it is invariant under the stabilizer of each cusp, it is a modular function for all of Γ₀(N) by Ligozat's characterization of modular η products (“Courbes Modulaires de Genre 1”, Mémoires de la Soc. Math. de France 43 (1975)), and we have obtained our Hauptmodul. The factorizations of j and j −1728 are then as follows:
• N=3: j = (h+27) (h+243)³ / h³ = 1728 + (h² − 486h − 3⁹)² / h³;
• N=5: j = (h² + 250h + 5⁵)³ / h⁵ = 1728 + (h² + 22h + 125) (h² − 500h − 5⁶)² / h⁵;
• N=7: j = (h² + 13h + 49) (h² + 245h + 7⁴)³ / h⁷ = 1728 + (h⁴ − 10 · 7² h³ − 9 · 7⁴ h² − 2 · 7⁶ h − 7⁷)² / h⁷;
• N=13: j = (h² + 5h + 13) (h⁴ + 19 · 13 h³ + 20 · 13² h² + 7 · 13³ h + 13⁴)³ / h¹³
= 1728 + (h² + 6h + 13) (h⁶ − 38 · 13 h⁵ − 122 · 13² h⁴ − 108 · 13³ h³ − 46 · 13⁴ h² − 10 · 13⁵ h − 13⁶)² / h¹³.
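Each line of this list asserts a polynomial identity: the numerator of j minus 1728·h^N equals the numerator of j − 1728. A short sketch that checks this with exact integer arithmetic (polynomials are coefficient lists, constant term first; the table layout and names are ours):

```python
def pmul(a, b):
    """Multiply polynomials given as coefficient lists, constant term first."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for k, y in enumerate(b):
            c[i + k] += x * y
    return c

def product(factors):
    """Expand a list of (polynomial, exponent) pairs."""
    result = [1]
    for poly, exponent in factors:
        for _ in range(exponent):
            result = pmul(result, poly)
    return result

# N: (factors of the numerator of j, factors of the numerator of j - 1728)
FACTORIZATIONS = {
    3: ([([27, 1], 1), ([243, 1], 3)],
        [([-3**9, -486, 1], 2)]),
    5: ([([5**5, 250, 1], 3)],
        [([125, 22, 1], 1), ([-5**6, -500, 1], 2)]),
    7: ([([49, 13, 1], 1), ([7**4, 245, 1], 3)],
        [([-7**7, -2 * 7**6, -9 * 7**4, -10 * 7**2, 1], 2)]),
    13: ([([13, 5, 1], 1),
          ([13**4, 7 * 13**3, 20 * 13**2, 19 * 13, 1], 3)],
         [([13, 6, 1], 1),
          ([-13**6, -10 * 13**5, -46 * 13**4, -108 * 13**3,
            -122 * 13**2, -38 * 13, 1], 2)]),
}

def identity_holds(N):
    num_j, num_j1728 = (product(f) for f in FACTORIZATIONS[N])
    shifted = list(num_j)
    shifted[N] -= 1728      # subtract 1728 * h^N (the denominator is h^N)
    return shifted == num_j1728

for N in sorted(FACTORIZATIONS):
    print(N, identity_holds(N))
```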

We can already see some patterns and arithmetical artifacts here, some of which we'll soon be able to explain. For starters, note that the pair of simple roots of j (when N is 1 mod 3), or of j −1728 (when N is 1 mod 4), is always in K=Q((−3)^1/2) or Q(i) respectively. This is because they correspond to endomorphisms of degree N of an elliptic curve E of j-invariant 0 or 1728, that is, N-isogenies from E to itself; in each case the ring of endomorphisms is isomorphic with the ring of integers in K, and is defined over the same field K: for y² = x³ + a₆, the endomorphism ring is generated by (x, y) → (ρx, y) where ρ is a cube root of unity; and for y² = x³ + a₄x, a generator is (x, y) → (−x, iy). We get a cyclic N-isogeny by choosing one of the two primes of K above the rational prime N.
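Both observations can be read off from small computations: the quadratic factors giving the simple roots have discriminant −3 or −1 up to squares, and N is a norm from the corresponding ring of integers. A sketch, with the quadratic factors copied from the list above and helper names of our own:

```python
def squarefree_part(n):
    """Remove all square factors from a nonzero integer, keeping the sign."""
    sign = -1 if n < 0 else 1
    n = abs(n)
    d = 2
    while d * d <= n:
        while n % (d * d) == 0:
            n //= d * d
        d += 1
    return sign * n

def field_of_roots(b, c):
    """Squarefree d such that the roots of h^2 + b h + c lie in Q(sqrt(d))."""
    return squarefree_part(b * b - 4 * c)

# (N, b, c) for the simple factors h^2 + b h + c of j and of j - 1728.
simple_factors_of_j = [(7, 13, 49), (13, 5, 13)]
simple_factors_of_j_minus_1728 = [(5, 22, 125), (13, 6, 13)]

def eisenstein_norms(N):
    """Pairs (a, b) with a^2 - a b + b^2 = N, the norm of a + b*rho."""
    r = range(-N, N + 1)
    return [(a, b) for a in r for b in r if a * a - a * b + b * b == N]

def gaussian_norms(N):
    """Pairs (a, b) with a^2 + b^2 = N, the norm of a + b*i."""
    r = range(-N, N + 1)
    return [(a, b) for a in r for b in r if a * a + b * b == N]

for N, b, c in simple_factors_of_j:
    print(N, field_of_roots(b, c), eisenstein_norms(N)[:1])
for N, b, c in simple_factors_of_j_minus_1728:
    print(N, field_of_roots(b, c), gaussian_norms(N)[:1])
```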

When N−1 is a factor of 24 but N is not prime, we can still use the formula h := (η(τ) / η(Nτ))^24/(N−1) to obtain a Hauptmodul on X₀(N). There are three such N, namely N=4, 9, 25, and each is the square of a prime; thus X₀(N) has cusps other than ∞ and 0, but each of the new cusps has a factor of exactly N^1/2 in the denominator of τ, and at such a cusp η(τ) and η(Nτ) vanish to the same order, so h has neither zero nor pole. (NB we must check that the local order of vanishing is integral even to verify Ligozat's criterion.) These other cusps still figure as poles in the formula for j as a rational function of h. This makes it a bit harder to recover that function from the q-expansions, but still routine with the computer: if j = A(h) / B(h) for some polynomials A, B of bounded degree, then B(h) j = A(h) yields a system of linear equations in the coefficients of A and B. [Naturally this is closely related with our earlier technique of solving for the coefficients of an algebraic relation between two rational functions on the curve once we have enough rational points on it; indeed it's the special case where the relation is of degree 1 in one variable and the sample points are “infinitely close”. This is tantamount to an algorithm requiring ~N³ operations, assuming the coefficients are small enough that it's reasonable to assign unit time to each arithmetic operation. There are subtler techniques that reduce the run time significantly, but ~N³ is small enough that it's usually not worth implementing such techniques unless they're already built into the package we're using, as with using serreverse to expand j in powers of 1/h.] Here we find:
• N=4: j = (h² + 2⁸h + 2¹²)³ / ((h + 16) h⁴) = 1728 + (h + 32)² (h² − 2⁹h − 2¹³)² / ((h + 16) h⁴);
• N=9: j = (h + 9)³ (h³ + 3⁵h² + 3⁷h + 3⁸)³ / ((h² + 9h + 27) h⁹)
    = 1728 + (h⁶ − 2 · 3⁵ h⁵ − 11 · 3⁷ h⁴ − 56 · 3⁸ h³ − 5 · 3¹² h² − 2 · 3¹⁴ h − 3¹⁵)² / ((h² + 9h + 27) h⁹);
• N=25: j = (h¹⁰ + 2 · 5³ h⁹ + 7 · 5⁴ h⁸ + 56 · 5⁴ h⁷ + 57 · 5⁵ h⁶ + 202 · 5⁵ h⁵ + 21 · 5⁷ h⁴ + 8 · 5⁸ h³ + 11 · 5⁸ h² + 2 · 5⁹ h + 5⁹)³
          / ((h⁴ + 5h³ + 15h² + 25h + 25) h²⁵)
    = 1728 + (h² + 2h + 5) (h⁴ + 10h³ + 45h² + 100h + 125)²
          · (h¹⁰ − 4 · 5³ h⁹ − 29 · 5⁴ h⁸ − 262 · 5⁴ h⁷ − 279 · 5⁵ h⁶ − 1004 · 5⁵ h⁵ − 21 · 5⁸ h⁴ − 8 · 5⁹ h³ − 11 · 5⁹ h² − 2 · 5¹⁰ h − 5¹⁰)²
          / ((h⁴ + 5h³ + 15h² + 25h + 25) h²⁵).
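The linear-algebra recovery of j = A(h) / B(h) described before this list can be sketched as follows for N=4, where A has degree 6 and B degree at most 5. This is only an illustration under our own conventions: series are truncated coefficient lists of q·h and q·j, the precision 3d+5 is a generous guess rather than a sharp bound, and plain Gaussian elimination over the rationals stands in for whatever a computer algebra package would do.

```python
from fractions import Fraction

def mul(a, b, P):
    """Product of two power series, truncated to q^0 .. q^(P-1)."""
    c = [0] * P
    for i, x in enumerate(a[:P]):
        if x:
            for k, y in enumerate(b[:P - i]):
                c[i + k] += x * y
    return c

def power(a, e, P):
    r = [1] + [0] * (P - 1)
    for _ in range(e):
        r = mul(r, a, P)
    return r

def inverse(a, P):
    """Inverse of a power series with constant term 1."""
    b = [1] + [0] * (P - 1)
    for n in range(1, P):
        b[n] = -sum(a[k] * b[n - k] for k in range(1, n + 1))
    return b

def euler(P, step=1):
    """prod_{n>=1} (1 - q^(step*n)), truncated to P terms."""
    s = [1] + [0] * (P - 1)
    for n in range(step, P, step):
        for k in range(P - 1, n - 1, -1):
            s[k] -= s[k - n]
    return s

def q_times_j(P):
    """q*j = E4^3 / prod (1 - q^n)^24."""
    sigma3 = lambda n: sum(d ** 3 for d in range(1, n + 1) if n % d == 0)
    e4 = [1] + [240 * sigma3(n) for n in range(1, P)]
    return mul(power(e4, 3, P), inverse(power(euler(P), 24, P), P), P)

def q_times_h(N, P):
    """q*h for h = (eta(tau) / eta(N tau))^(24/(N-1))."""
    quotient = mul(euler(P), inverse(euler(P, N), P), P)
    return power(quotient, 24 // (N - 1), P)

def solve(columns, rhs):
    """Solve sum_k x_k * columns[k] = rhs exactly (assumes full column rank)."""
    n = len(columns)
    rows = [[Fraction(col[i]) for col in columns] + [Fraction(rhs[i])]
            for i in range(len(rhs))]
    for c in range(n):
        pivot = next(i for i in range(c, len(rows)) if rows[i][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        rows[c] = [x / rows[c][c] for x in rows[c]]
        for i in range(len(rows)):
            if i != c and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[c])]
    return [rows[i][n] for i in range(n)]

def relation(N, d):
    """Coefficients (constant term first) of monic A, and of B, with B(h) j = A(h)."""
    P = 3 * d + 5
    H, J = q_times_h(N, P), q_times_j(P)
    shift = lambda s, k: ([0] * k + s)[:P]
    Hpow = [power(H, k, P) for k in range(d + 1)]
    # After multiplying B(h) j = A(h) by q^d, with h = H/q and j = J/q:
    cols = [shift(Hpow[k], d - k) for k in range(d)]
    cols += [[-x for x in shift(mul(Hpow[k], J, P), d - 1 - k)] for k in range(d)]
    sol = solve(cols, [-x for x in Hpow[d]])
    return sol[:d] + [Fraction(1)], sol[d:]

A, B = relation(4, 6)
print([int(x) for x in B])   # coefficients of B, constant term first
```

The same call with other levels and degrees (for instance `relation(9, 12)`) should work in principle, at the cost of larger exact rational numbers.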

If you carried out this calculation you may have noticed that many of the coefficients of the q-expansions of h vanish: for N=4, we have h = q⁻¹ − 8 + 20q − 62q³ + 216q⁵ − 641q⁷ + − …, so h+8 is an odd function of q; likewise for N=9, we have h = q⁻¹ − 3 + 5q² − 7q⁵ + 3q⁸ + 15q¹¹ − 32q¹⁴ + 9q¹⁷ + − + …, so h+3 is q⁻¹ times a power series in q³. We shall explain this next week. (Also for N=25 the qⁿ coefficient of h+1 vanishes unless n is ±1 mod 5, which takes a bit more work to account for.)
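To see the vanishing coefficients directly, one can expand the η-quotients as power series. The sketch below (integer arithmetic only, helper names ours) returns the coefficients of q·h, so index m corresponds to the q^(m−1) coefficient of h.

```python
def euler(P, step=1):
    """prod_{n>=1} (1 - q^(step*n)), truncated to P terms."""
    s = [1] + [0] * (P - 1)
    for n in range(step, P, step):
        for k in range(P - 1, n - 1, -1):
            s[k] -= s[k - n]
    return s

def mul(a, b, P):
    c = [0] * P
    for i, x in enumerate(a):
        if x:
            for k, y in enumerate(b[:P - i]):
                c[i + k] += x * y
    return c

def inverse(a, P):
    """Inverse of a power series with constant term 1."""
    b = [1] + [0] * (P - 1)
    for n in range(1, P):
        b[n] = -sum(a[k] * b[n - k] for k in range(1, n + 1))
    return b

def q_times_h(N, P):
    """Coefficients of q*h, where h = (eta(tau)/eta(N tau))^(24/(N-1))."""
    quotient = mul(euler(P), inverse(euler(P, N), P), P)
    result = [1] + [0] * (P - 1)
    for _ in range(24 // (N - 1)):
        result = mul(result, quotient, P)
    return result

def support(series):
    """Indices m with nonzero coefficient of q^m."""
    return [m for m, c in enumerate(series) if c != 0]

for N in (4, 9, 25):
    print(N, support(q_times_h(N, 30)))
```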

Monday, Nov. 26: Fricke and Atkin-Lehner involutions of X₀(N) and some of their uses

A key feature of the curves X₀(N), for both theory and computation, is maps between them that reflect transformations of the isogenies they parametrize. For example, if M is a proper factor of N then there are several natural maps X₀(N) → X₀(M), the easiest one coming from the inclusion of Γ₀(N) into Γ₀(M); we'll see later that all these maps come from GL₂(Q)-conjugates of Γ₀(N) in Γ₀(M). In the special case M = N > 1, there are nontrivial conjugations of Γ₀(N) to itself. Today we consider the most important of these: the Fricke involution and more generally the group of Atkin-Lehner involutions.

Recall that to any isogeny φ: E → E’ is associated a dual isogeny φ*: E’ → E of the same degree, with φ** = φ and φ*φ = deg(φ) [the multiplication-by-deg(φ) map on E]. If φ is cyclic, then so is φ*. Since the modular curve X₀(N) parametrizes cyclic N-isogenies, we obtain a map w_N : Y₀(N) → Y₀(N) as follows: if P is a point on Y₀(N) parametrizing an isogeny φ, then w_N(P) is the point parametrizing φ*. Since φ** = φ, this map w_N is an involution. Extending to X₀(N) yields the Fricke involution of X₀(N). Over C, this involution is represented by the fractional linear transformation of the (extended) upper half-plane taking τ to −1 / (Nτ), which (for N>1) is not in SL₂(Z), but does belong to the normalizer of Γ₀(N) in PGL₂(Q)⁺ [the “+” means positive determinant; else we'd have a fractional linear transformation that switches the upper and lower half-planes]; indeed it had better normalize Γ₀(N) to give a well-defined automorphism of X₀(N). The group generated by Γ₀(N) and w_N is sometimes called Γ₀(N)⁺, and the corresponding modular curve X₀(N)/w_N can then be denoted by X₀(N)⁺; for a field k, the k-rational points of X₀(N)⁺ parametrize (j-invariants of) cyclically N-isogenous pairs E, E’ of “k-curves”, with E, E’ (and the isogeny between them) defined over some field K containing k with degree at most 2, with E’ isomorphic with E^σ over the algebraic closure of k. Here σ is the Galois involution of K/k, or the identity if K=k. [Warning: In general there might not be any twist over K that makes E’ ≅ E^σ over the same field K.]
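That τ → −1 / (Nτ) normalizes Γ₀(N) can be checked on examples. The sketch below represents the involution by the matrix W = [0,−1; N,0] (our choice of representative) and conjugates sample elements of Γ₀(N), built as short words in [1,1; 0,1] and [1,0; N,1] and their inverses; these need not exhaust the group, so this is a spot check and not a proof.

```python
from fractions import Fraction
from itertools import product

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def fricke_conjugate(g, N):
    """W g W^(-1) for W = [0,-1; N,0], using W^(-1) = adj(W) / N."""
    W, W_adj = ((0, -1), (N, 0)), ((0, 1), (-N, 0))
    M = matmul(matmul(W, g), W_adj)
    return tuple(tuple(Fraction(x, N) for x in row) for row in M)

def in_gamma0(g, N):
    (a, b), (c, d) = g
    integral = all(Fraction(x).denominator == 1 for x in (a, b, c, d))
    return integral and a * d - b * c == 1 and c % N == 0

def sample_gamma0(N, length=4):
    """All words of the given length in T, T^-1, U, U^-1 (elements of Gamma0(N))."""
    T, Ti = ((1, 1), (0, 1)), ((1, -1), (0, 1))
    U, Ui = ((1, 0), (N, 1)), ((1, 0), (-N, 1))
    for word in product((T, Ti, U, Ui), repeat=length):
        g = ((1, 0), (0, 1))
        for letter in word:
            g = matmul(g, letter)
        yield g

N = 6
print(all(in_gamma0(fricke_conjugate(g, N), N) for g in sample_gamma0(N)))
```

Note also that W² = −N times the identity, which is scalar, so W does act as an involution.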

By the Riemann-Hurwitz formula, the genus of X₀(N)⁺ can be computed from the genus of X₀(N) together with the number of fixed points of w_N. This number can be expressed in terms of the class numbers of binary quadratic forms of discriminants −N (if that's congruent to 0 or 1 mod 4) and −4N; it is even by Riemann-Hurwitz, and nonzero because τ = i / N^½ is always a fixed point. The total count grows as N^{1/2 ± ε}. Thus as N→∞ the genus of X₀(N)⁺ is asymptotically half the genus of X₀(N). It's never more than half that genus, and can be substantially less for small (or even some not-so-small) N. Indeed X₀(N)⁺ has genus 0 for some N as large as 71, famously including all the prime factors of the size of the Monster group (see “Monstrous Moonshine”); each of these curves is thus rational, because X₀(N), and thus any quotient of X₀(N), has at least one rational point, the cusp τ = i∞. Also, the genus of X₀(N)⁺ is 1 for some N as large as 131 (and at least when N is prime this elliptic curve has rank 1 over Q, so for each of these N there are infinitely many “Q-curves” over quadratic extensions of Q that are N-isogenous with their Galois conjugates). The largest N for which X₀(N)⁺ has genus 2 (genus 3) is 191 (resp. 239). This and much more information about the genera of these curves and their spaces of holomorphic differentials (a.k.a. weight-2 cuspforms) can be read off tables such as Antwerp Table 5. We have seen already how to use the q-expansions of modular forms on such curves to compute explicit models.
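For prime N > 3 this genus computation is short enough to sketch. The paragraph above does not spell out the fixed-point count; the code assumes the usual explicit form, namely h(−4N) fixed points, plus h(−N) more when N is 3 mod 4, where h(D) counts reduced primitive forms of discriminant D. The genus of X₀(N) is taken from the rule N = 12g + {−1, 5, 7, 13} of last week. Function names are ours.

```python
from math import gcd

def class_number(D):
    """Number of reduced primitive forms a x^2 + b x y + c y^2 with b^2 - 4ac = D < 0."""
    count = 0
    a = 1
    while 3 * a * a <= -D:
        for b in range(-a + 1, a + 1):           # -a < b <= a
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a or (c == a and b < 0):      # need a <= c, and b >= 0 if a == c
                continue
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
        a += 1
    return count

def genus_x0_prime(N):
    """Genus of X0(N) for prime N > 3, from N = 12g + {-1, 5, 7, 13}."""
    for r in (-1, 5, 7, 13):
        if (N - r) % 12 == 0:
            return (N - r) // 12
    raise ValueError("N must be a prime greater than 3")

def fricke_fixed_points(N):
    """Assumed count of fixed points of w_N for prime N > 3."""
    count = class_number(-4 * N)
    if N % 4 == 3:
        count += class_number(-N)
    return count

def genus_x0_plus_prime(N):
    """Riemann-Hurwitz for the double cover X0(N) -> X0(N)+."""
    return (2 * genus_x0_prime(N) + 2 - fricke_fixed_points(N)) // 4

for N in (71, 131, 191, 239):
    print(N, genus_x0_prime(N), fricke_fixed_points(N), genus_x0_plus_prime(N))
```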

For the eight values of N>1 with (N−1) | 24, for which we constructed an X₀(N) Hauptmodul h as the 24 / (N−1) power of η(τ) / η(Nτ), the involution w_N takes h ↔ N^{12 / (N−1)} / h. Note that the constant N^{12 / (N−1)} is integral even for the two values N=9 and N=25 for which the exponent 12 / (N−1) is half-integral. The two fixed points h = ±N^{6 / (N−1)} of w_N give elliptic curves E/C with a cyclic N-isogeny to E itself. Thus E has complex multiplication (CM). This is clear for the fixed point at τ = i / N^½, for which j(E) = j(i N^½) and thus End(E) is the imaginary quadratic order Z[(−N)^½] of discriminant −4N. At this τ, our h is clearly positive, so h = +N^{6 / (N−1)}. For N = 2, 3, 4, 7, we get the integers h = 64, 27, 16, 7 respectively, and can then use last week's formulas for j as a rational function of h to compute j = 20³, 2·30³, 66³, 255³ respectively. In these cases the negative root h = −N^{6 / (N−1)} yields the third cusp of X₀(N) for N=4, and simpler CM values j = 12³, 0, −15³ of discriminant −4, −3, −7 for N = 2, 3, 7 respectively. [If we had known in advance that these “singular moduli” are in Z then we could have just used a few terms of the q-expansions for j, or better yet for E₄ and Δ (easier to bound the coefficients), and then rounded to the nearest integer. But this approach requires developing (or taking on faith) a substantial chunk of the theory of complex multiplication.] If N is one of the remaining four values 5, 9, 13, 25, then the two square roots of N^{12 / (N−1)} yield the two Galois-conjugate j-invariants of curves with CM by Z[(−N)^½].
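These CM values follow by direct substitution into last week's formulas. A sketch with exact rational arithmetic (the dictionary `J` and its layout are ours):

```python
from fractions import Fraction

# j as a rational function of the Hauptmodul h, for N = 2, 3, 4, 7.
J = {
    2: lambda h: (h + 256) ** 3 / h ** 2,
    3: lambda h: (h + 27) * (h + 243) ** 3 / h ** 3,
    4: lambda h: (h * h + 256 * h + 4096) ** 3 / ((h + 16) * h ** 4),
    7: lambda h: (h * h + 13 * h + 49) * (h * h + 245 * h + 7 ** 4) ** 3 / h ** 7,
}

# Positive fixed point h = N^(6/(N-1)) of w_N, integral for these N.
FIXED = {2: 64, 3: 27, 4: 16, 7: 7}

for N, h0 in FIXED.items():
    plus = J[N](Fraction(h0))
    # For N = 4 the negative root h = -16 is a cusp (a pole of j), so skip it.
    minus = J[N](Fraction(-h0)) if N != 4 else None
    print(N, plus, minus)
```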

[Added after class: each of the pairs (X₀(N), w_N) yields several further CM values of j, coming from cyclic N-isogenies φ: E → E with φ* ≠ −φ, that is, from pairs of solutions h of j(h) = j(w_N(h)) other than the fixed points of w_N. For N=2 this gives h² + 47h + 2¹² = 0 and j = −15³, which we have already seen (here the endomorphism ring has discriminant −7 and contains the elements (±1±(−7)^½) / 2 of norm 2). For N=3 the new CM values have h² + 46h + 3⁶ = 0 and h² − 10h + 3⁶ = 0; the former repeats the CM value j = 20³ seen already, but the latter gives the new j = −32³ with discriminant −11. Exercise: Do the analogous computation for N=5, 7, 13; can you identify the CM discriminant of each of the new “singular moduli” found?]
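The values in this remark can be verified without solving the quadratics, by computing in the ring Q[h] / (h² + ph + q): it suffices that the numerator of j equals j₀·h^N there. A sketch (elements are pairs (u, v) standing for u + v·h; names are ours):

```python
def qmul(x, y, p, q):
    """Multiply x0 + x1*h and y0 + y1*h modulo h^2 + p*h + q."""
    x0, x1 = x
    y0, y1 = y
    hh = x1 * y1                       # coefficient of h^2 = -p*h - q
    return (x0 * y0 - q * hh, x0 * y1 + x1 * y0 - p * hh)

def qpow(x, e, p, q):
    r = (1, 0)
    for _ in range(e):
        r = qmul(r, x, p, q)
    return r

def cm_value_holds(linear_factors, N, j0, p, q):
    """Check prod (h + c)^e == j0 * h^N modulo h^2 + p*h + q."""
    lhs = (1, 0)
    for c, e in linear_factors:
        lhs = qmul(lhs, qpow((c, 1), e, p, q), p, q)
    hN = qpow((0, 1), N, p, q)
    return lhs == (j0 * hN[0], j0 * hN[1])

# N = 2: j = (h+256)^3 / h^2 on h^2 + 47h + 2^12 = 0.
print(cm_value_holds([(256, 3)], 2, -15 ** 3, 47, 2 ** 12))
# N = 3: j = (h+27)(h+243)^3 / h^3 on the two quadratics.
print(cm_value_holds([(27, 1), (243, 3)], 3, 20 ** 3, 46, 3 ** 6))
print(cm_value_holds([(27, 1), (243, 3)], 3, -32 ** 3, -10, 3 ** 6))
```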

The Fricke involution w_N of X₀(N) generalizes to an Atkin-Lehner involution w_N₁ of the same curve X₀(N) for any factorization N = N₁N₂ with gcd(N₁, N₂) = 1; that is, with N₁, N₂ being complementary “unitary divisors” of N. These w_N₁ form an abelian group of exponent 2 (assuming N≠1) and rank equal to the number of distinct prime factors of N, the identity element being the trivial “involution” w₁. To construct w_N₁, factor the typical isogeny parametrized by X₀(N) in two ways as the product of cyclic isogenies of degrees N₁ and N₂, namely E → E’ → E’’’ and E → E’’ → E’’’ where the isogenies E → E’ and E’’ → E’’’ have degree N₁, and the isogenies E → E’’ and E’ → E’’’ have degree N₂. That is, points P on Y₀(N) parametrize commutative diagrams of cyclic isogenies of degrees N₁ and N₂. Replacing each of the isogenies of degree N₁ by its dual yields another such diagram, which corresponds to another point P’ on Y₀(N); we define w_N₁ to be the map that takes P to P’, completed to a map X₀(N) → X₀(N), which again is an involution because φ** = φ. For N₁=N this definition recovers w_N, as it should to justify the notation; and as is true for w_N, the involution w_N₁ comes from a matrix with integer entries and determinant N₁ in the normalizer of Γ₀(N) in PGL₂(Q)⁺. The product of two Atkin-Lehner involutions w_N₁ and w_N₂ is w_N₃ where N₃ is the unitary divisor of N determined uniquely by the condition that N₁N₂N₃ be a square. Unlike the special case N₁=N, for general unitary divisors N₁ of N we cannot expect to be able to lift w_N₁ to an involution in PGL₂(Q)⁺. 
It sometimes happens that an involution w_N₁ has no fixed points at all; this first happens for (N, N₁) = (14, 2) and (15, 3). When w_N₁ does have fixed points, they can again be counted via class numbers of quadratic forms; the general formulas can get annoying to figure out, but fortunately this is not necessary for any N small enough that we might actually compute formulas for X₀(N) or its quotient by some subgroup of the Atkin-Lehner group: by Riemann-Hurwitz, this is equivalent to finding, for each choice of ± signs, the dimension of the (±1, ±1, …, ±1) eigenspace of the action of the w_N₁'s on the holomorphic differentials on X₀(N), or equivalently on the weight-2 cuspforms for Γ₀(N); and these dimensions have been extensively tabulated (already the fifth Antwerp table contains this information for all N≤300), and computer packages will compute the eigenspaces, and thus a fortiori their dimensions, on the fly.
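The composition rule for Atkin-Lehner involutions is purely combinatorial and can be sketched on the level of unitary divisors: the product of w_N₁ and w_N₂ is indexed by N₁N₂ / gcd(N₁, N₂)², which is the unitary divisor N₃ making N₁N₂N₃ a square. Function names below are ours.

```python
from math import gcd, isqrt

def unitary_divisors(N):
    """Divisors d of N with gcd(d, N/d) = 1."""
    return [d for d in range(1, N + 1) if N % d == 0 and gcd(d, N // d) == 1]

def al_product(N1, N2):
    """Index N3 of the product of the involutions indexed by N1 and N2."""
    g = gcd(N1, N2)
    return (N1 // g) * (N2 // g)

def distinct_prime_factors(N):
    count, p = 0, 2
    while p * p <= N:
        if N % p == 0:
            count += 1
            while N % p == 0:
                N //= p
        p += 1
    return count + (1 if N > 1 else 0)

def is_square(n):
    return isqrt(n) ** 2 == n

N = 60
divs = unitary_divisors(N)
print(divs)
# Multiplication table of the Atkin-Lehner group for this N.
for a in divs:
    print([al_product(a, b) for b in divs])
```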

Wednesday, Nov. 28: Non-Atkin-Lehner elements of the normalizer of Γ₀(N)

N	G₀	polyhedron	|G₀|	V	F	E
3	A₄	tetrahedron	12	4	4	6
4	S₄	octahedron	24	6	8	12
5	A₅	icosahedron	60	12	20	30