Part I Introduction
I.4 The General Goals of Mathematical Research
6.4 Extremal Problems
There are many problems in mathematics where one wishes to maximize or minimize some quantity in the presence of various constraints. These are called extremal problems. As with counting questions, there are some extremal problems for which one can realistically hope to work out the answer exactly, and many more for which, even though an exact answer is out of the question, one can still aim to find interesting estimates. Here are some examples of both kinds.
(i) Let $n$ be a positive integer and let $X$ be a set with $n$ elements. How many subsets of $X$ can be chosen if none of these subsets is contained in any other?
A simple observation one can make is that if two different sets have the same size, then neither is contained in the other. Therefore, one way of satisfying the constraints of the problem is to choose all the sets of some particular size $k$. Now the number of subsets of $X$ of size $k$ is $n!/(k!(n-k)!)$, which is usually written $\binom{n}{k}$ (or ${}^nC_k$), and it is not hard to show that $\binom{n}{k}$ is largest when $k = n/2$ if $n$ is even and when $k = (n \pm 1)/2$ if $n$ is odd.
For simplicity let us concentrate on the case when $n$ is even. What we have just proved is that it is possible to pick $\binom{n}{n/2}$ subsets of an $n$-element set in such a way that none of them contains any other. That is, $\binom{n}{n/2}$ is a lower bound for the problem. A result known as Sperner’s theorem states that it is an upper bound as well. That is, if you choose more than $\binom{n}{n/2}$ subsets of $X$, then, however you do it, one of these subsets will be contained in another. Therefore, the question is answered exactly, and the answer is $\binom{n}{n/2}$. (When $n$ is odd, then the answer is $\binom{n}{(n+1)/2}$, as one might now expect.)
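For very small $n$ the statement of Sperner's theorem can be checked by exhaustive search. The following is a minimal sketch, not a proof: the function name `largest_antichain` and the bitmask encoding of subsets are illustrative choices, and the search is only practical for $n$ up to about 5.

```python
from math import comb


def largest_antichain(n):
    """Size of the largest family of subsets of {0, ..., n-1} in which
    no member contains another. Subsets are encoded as bitmasks."""
    subsets = list(range(1 << n))
    best = 0

    def extend(start, chosen):
        nonlocal best
        best = max(best, len(chosen))
        for i in range(start, len(subsets)):
            s = subsets[i]
            # (s & t) == s means s is contained in t, and vice versa;
            # s may join only if it is incomparable with every chosen set.
            if all((s & t) != s and (s & t) != t for t in chosen):
                chosen.append(s)
                extend(i + 1, chosen)
                chosen.pop()

    extend(0, [])
    return best
```

Comparing `largest_antichain(n)` with `comb(n, n // 2)` for $n = 1, \dots, 5$ gives agreement in each case, which is consistent with the theorem.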
(ii) Suppose that the two ends of a heavy chain are attached to two hooks on the ceiling and that the chain is not supported anywhere else. What shape will the hanging chain take?
At first, this question does not look like a maximization or minimization problem, but it can be quickly turned into one. That is because a general principle from physics tells us that the chain will settle in the shape that minimizes its potential energy. We therefore find ourselves asking a new question: let A and B be two points at distance $d$ apart, and let $\mathcal{C}$ be the set of all curves of length $l$ that have A and B as their two endpoints. Which curve $C \in \mathcal{C}$ has the smallest potential energy? Here one takes the mass of any portion of the curve to be proportional to its length. The potential energy of the curve is equal to $mgh$, where $m$ is the mass of the curve, $g$ is the acceleration due to gravity, and $h$ is the height of the center of gravity of the curve. Since $m$ and $g$ do not change, another formulation of the question is: which curve $C \in \mathcal{C}$ has the smallest average height?
This problem can be solved by means of a technique known as the calculus of variations. Very roughly, the idea is this. We have a set, $\mathcal{C}$, and a function $h$ defined on $\mathcal{C}$ that takes each curve $C \in \mathcal{C}$ to its average height.
We are trying to minimize $h$, and a natural way to approach that task is to define some sort of derivative and look for a curve $C$ at which this derivative is 0.
Notice that the word “derivative” here does not refer to the rate of change of height as you move along the curve. Rather, it means the (linear) way that the average height of the entire curve changes in response to small perturbations of the curve. Using this kind of derivative to find a minimum is more complicated than looking for the stationary points of a function defined on $\mathbb{R}$, since $\mathcal{C}$ is an infinite-dimensional set and is therefore much more complicated than $\mathbb{R}$. However, the approach can be made to work, and the curve that minimizes the average height is known. (It is called a catenary, after the Latin word for chain.) Thus, this is another minimization problem that has been answered exactly.
For a typical problem in the calculus of variations, one is trying to find a curve, or surface, or more general kind of function, for which a certain quantity is minimized or maximized. If a minimum or maximum exists (which is by no means automatic when one is working with an infinite-dimensional set, so this can be an interesting and important question), the object that achieves it satisfies a system of partial differential equations [I.3 §5.4] known as the Euler–Lagrange equations. For more about this style of minimization or maximization, see variational methods [III.94] (and also optimization and Lagrange multipliers [III.64]).
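For the hanging chain, the calculation can be sketched as follows. This is an outline under stated assumptions rather than a full derivation: the curve is assumed to be the graph of a function $y(x)$ between the two hooks, and the symbols $\lambda$ (a Lagrange multiplier for the length constraint), $a$, and $x_0$ (constants of integration) are introduced here for illustration.

```latex
% Minimize the (unnormalized) average height subject to fixed length l:
\[
  \text{minimize } \int y\,\sqrt{1+y'^2}\,dx
  \quad\text{subject to}\quad \int \sqrt{1+y'^2}\,dx = l .
\]
% With a multiplier \lambda, the integrand is
\[
  F(y,y') = (y+\lambda)\sqrt{1+y'^2}.
\]
% F does not depend explicitly on x, so the Euler--Lagrange equation
% has the first integral F - y'\,\partial F/\partial y' = a:
\[
  (y+\lambda)\sqrt{1+y'^2} - \frac{(y+\lambda)\,y'^2}{\sqrt{1+y'^2}}
  = \frac{y+\lambda}{\sqrt{1+y'^2}} = a .
\]
% This is solved by a hyperbolic cosine, the catenary:
\[
  y(x) = a\cosh\!\left(\frac{x-x_0}{a}\right) - \lambda .
\]
```

The constants $a$, $x_0$, and $\lambda$ are then fixed by the positions of A and B and by the length $l$.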
(iii) How many numbers can you choose between 1 and $n$ if no three of them are allowed to lie in an arithmetic progression? If $n = 9$ then the answer is 5. To see this, note first that no three of the five numbers 1, 2, 4, 8, 9 lie in an arithmetic progression. Now let us see if we can find six numbers that work.
If we make one of our numbers 5, then we must leave out either 4 or 6, or else we would have the progression 4,5,6. Similarly, we must leave out one of 3 and 7, one of 2 and 8, and one of 1 and 9. But then we have left out four numbers. It follows that we cannot choose 5 as one of the numbers.
We must leave out one of 1, 2, and 3, and one of 7, 8, and 9, so if we leave out 5 then we must include 4 and 6. But then we cannot include 2 or 8. But we must also leave out at least one of 1, 4, and 7, so we are forced to leave out at least four numbers.
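The case analysis for $n = 9$ can also be confirmed by brute force. The sketch below simply tries every subset, largest first; the function names are illustrative, and the approach is hopeless for large $n$ because the number of subsets grows exponentially.

```python
from itertools import combinations


def has_three_term_ap(numbers):
    """True if some three members of `numbers` form an arithmetic progression."""
    members = set(numbers)
    # For a < b, the progression a, b, ... continues with 2b - a.
    return any(2 * b - a in members
               for a, b in combinations(sorted(members), 2))


def largest_ap_free_size(n):
    """Largest size of a subset of {1, ..., n} with no three-term progression."""
    for size in range(n, 0, -1):
        if any(not has_three_term_ap(c)
               for c in combinations(range(1, n + 1), size)):
            return size
    return 0
```

Calling `largest_ap_free_size(9)` examines all subsets of sizes 9 down to 5 and stops at the first size for which a progression-free subset exists.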
An ugly case-by-case argument of this kind is feasible when $n = 9$, but as soon as $n$ is at all large there are far too many cases for it to be possible to consider them all. For this problem, there does not seem to be a tidy answer that tells us exactly which is the largest set of integers between 1 and $n$ that contains no arithmetic progression of length 3. So instead one looks for upper and lower bounds on its size. To prove a lower bound, one must find a good way of constructing a large set that does not contain any arithmetic progressions, and to prove an upper bound one must show that any set of a certain size must necessarily contain an arithmetic progression. The best bounds to date are very far apart. In 1947, Behrend found a set of size $n/e^{c\sqrt{\log n}}$ that contains no arithmetic progression, and in 1999 Jean Bourgain proved that every set of size $Cn\sqrt{\log\log n/\log n}$ contains an arithmetic progression. (If it is not obvious to you that these numbers are far apart, then consider what happens when $n = 10^{100}$, say. Then $e^{\sqrt{\log n}}$ is about 4 000 000, while $\sqrt{\log n/\log\log n}$ is about 6.5.)
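The two figures quoted for $n = 10^{100}$ can be reproduced directly. This is a small numerical sketch that assumes natural logarithms and sets the unspecified constants $c$ and $C$ to 1; the variable names are illustrative.

```python
import math

# Natural logarithm of n = 10**100, computed without forming n as a float.
log_n = 100 * math.log(10)

# Factor by which Behrend's set is smaller than n (taking c = 1).
behrend_factor = math.exp(math.sqrt(log_n))

# Factor by which Bourgain's bound is smaller than n (taking C = 1).
bourgain_factor = math.sqrt(log_n / math.log(log_n))

print(f"{behrend_factor:.3e}", f"{bourgain_factor:.2f}")
```

The first factor comes out at a few million and the second at a little over 6, so the lower and upper bounds differ by roughly six orders of magnitude at this value of $n$.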
(iv) Theoretical computer science is a source of many minimization problems: if one is programming a computer to perform a certain task, then one wants it to do so in as short a time as possible. Here is an elementary-sounding example: how many steps are needed to multiply two $n$-digit numbers together?
Even if one is not too precise about what is meant by a “step,” one can see that the traditional method, long multiplication, takes at least $n^2$ steps since, during the course of the calculation, each digit of the first number is multiplied by each digit of the second. One might imagine that this was necessary, but in fact there are clever ways of transforming the problem and dramatically reducing the time that a computer needs to perform a multiplication of this kind. The fastest known method uses the fast Fourier transform [III.26] to reduce the number of steps from $n^2$ to $Cn\log n\log\log n$. Since the logarithm of a number is much smaller than the number itself, one thinks of $Cn\log n\log\log n$ as being only just worse than a bound of the form $Cn$. Bounds of this form are called linear, and for a problem like this are clearly the best one can hope for, since it takes $2n$ steps even to read the digits of the two numbers.
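To make the $n^2$ count concrete, here is a sketch of long multiplication that counts single-digit multiplications. It illustrates only the traditional method, not the Fourier-transform method; the digit-list representation (least significant digit first) and the helper names are assumptions made for the example.

```python
def long_multiply(x_digits, y_digits):
    """Multiply two numbers given as lists of decimal digits, least
    significant first. Returns (product_digits, digit_multiplications)."""
    result = [0] * (len(x_digits) + len(y_digits))
    steps = 0
    for i, a in enumerate(x_digits):
        carry = 0
        for j, b in enumerate(y_digits):
            steps += 1  # one single-digit multiplication
            total = result[i + j] + a * b + carry
            result[i + j] = total % 10
            carry = total // 10
        # This position has not been written to by earlier rows.
        result[i + len(y_digits)] += carry
    return result, steps


def to_digits(value):
    """Decimal digits of a nonnegative integer, least significant first."""
    return [int(ch) for ch in reversed(str(value))]


def from_digits(digits):
    """Inverse of to_digits (leading zeros are harmless)."""
    return int("".join(str(d) for d in reversed(digits)))
```

For two 4-digit inputs the counter ends at 16, one multiplication for each pair of digits.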
Another question that is similar in spirit is whether there are fast algorithms for matrix multiplication. To multiply two $n \times n$ matrices using the obvious method one needs to do $n^3$ individual multiplications of the numbers in the matrices, but once again there are less obvious methods that do better. The main breakthrough on this problem was due to Strassen, who had the idea of splitting each matrix into four $n/2 \times n/2$ matrices and multiplying those together. At first it seems as though one has to calculate the products of eight pairs of $n/2 \times n/2$ matrices, but these products are related, and Strassen came up with seven such calculations from which the eight products could quickly be derived. One can then apply recursion: that is, use the same idea to speed up the calculation of the seven $n/2 \times n/2$ matrix products, and so on.
Strassen’s algorithm reduces the number of numerical multiplications from about $n^3$ to about $n^{\log_2 7}$. Since $\log_2 7$ is less than 2.81, this is a significant improvement, but only when $n$ is large. His basic divide-and-conquer strategy has been developed further, and the current record is better than $n^{2.4}$. In the other direction, the situation is less satisfactory: nobody has found a proof that one needs to use significantly more than $n^2$ multiplications.
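The recursion can be sketched in a few lines. The seven block products below are the standard formulas usually attributed to Strassen; the code assumes square matrices whose size is a power of 2, stores them as lists of lists, and uses a one-element list `counter` (an illustrative device) to count scalar multiplications.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Return the four half-size blocks M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, counter):
    """Multiply square matrices of power-of-2 size with seven recursive
    block products; counter[0] accumulates scalar multiplications."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(add(a11, a22), add(b11, b22), counter)
    m2 = strassen(add(a21, a22), b11, counter)
    m3 = strassen(a11, sub(b12, b22), counter)
    m4 = strassen(a22, sub(b21, b11), counter)
    m5 = strassen(add(a11, a12), b22, counter)
    m6 = strassen(sub(a21, a11), add(b11, b12), counter)
    m7 = strassen(sub(a12, a22), add(b21, b22), counter)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


def naive(A, B):
    """The obvious method, using n**3 scalar multiplications."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For $4 \times 4$ matrices the counter reaches $7^2 = 49$, compared with $4^3 = 64$ for the obvious method. The saving is paid for with extra additions, which is one reason the improvement only matters when $n$ is large.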
For more problems of a similar kind, see computational complexity [IV.20] and the mathematics of algorithm design [VII.5].
(v) Some minimization and maximization problems are of a more subtle kind. For example, suppose that one is trying to understand the nature of the differences between successive primes. The smallest such difference is 1 (the difference between 2 and 3), and it is not hard to prove that there is no largest difference (given any integer $n$ greater than 1, none of the numbers between $n! + 2$ and $n! + n$ is a prime). Therefore, there do not seem to be interesting maximization or minimization problems concerning these differences.
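The reason none of the numbers $n! + 2, \dots, n! + n$ is prime is that $n! + k$ is divisible by $k$ whenever $2 \le k \le n$. A short check for small $n$, with illustrative function names and a deliberately simple primality test:

```python
from math import factorial


def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def composite_run(n):
    """The numbers n! + 2, ..., n! + n."""
    base = factorial(n)
    return [base + k for k in range(2, n + 1)]
```

For example, `composite_run(5)` is the run 122, 123, 124, 125, whose members are divisible by 2, 3, 4, and 5 respectively.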
However, one can in fact formulate some fascinating problems if one first normalizes in an appropriate way. As was mentioned earlier in this section, the prime number theorem states that the density of primes near $n$ is about $1/\log n$, so an average gap between two primes near $n$ will be about $\log n$. If $p$ and $q$ are successive primes, we can therefore define a “normalized gap” to be $(q - p)/\log p$. The average value of this normalized gap will be 1, but is it sometimes much smaller and sometimes much bigger?
It was shown by Westzynthius in 1931 that even normalized gaps can be arbitrarily large, and it was widely believed that they could also be arbitrarily close to zero. (The famous twin prime conjecture, that there are infinitely many primes $p$ for which $p + 2$ is also a prime, implies this immediately.) However, it took until 2005 for this to be proved, by Goldston, Pintz, and Yıldırım. (See analytic number theory [IV.2 §§6–8] for a discussion of this problem.)
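Normalized gaps are easy to explore numerically, although no finite computation says anything about the limiting behavior described above. The sketch below uses a simple sieve; the function names and the choice of bound are illustrative.

```python
from math import log


def primes_up_to(limit):
    """Sieve of Eratosthenes; assumes limit >= 2."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def normalized_gaps(limit):
    """Values (q - p) / log p for successive primes p < q up to `limit`."""
    ps = primes_up_to(limit)
    return [(q - p) / log(p) for p, q in zip(ps, ps[1:])]
```

For primes up to 100 000, the mean of `normalized_gaps(100_000)` is close to 1, while the smallest values (from twin primes) are well below 1 and the largest are several times bigger than 1.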
7 Determining Whether Different Mathematical Properties Are Compatible
In order to understand a mathematical concept, such as that of a group or a manifold, there are various stages one typically goes through. Obviously it is a good idea to begin by becoming familiar with a few representative examples of the structure, and also with techniques for building new examples out of old ones. It is also extremely important to understand the homomorphisms, or “structure-preserving functions,” from one example of the structure to another, as was discussed in some fundamental mathematical definitions [I.3 §§4.1, 4.2].
Once one knows these basics, what is there left to understand? Well, for a general theory to be useful, it should tell us something about specific examples. For instance, as we saw in section 3.2, Lagrange’s theorem can be used to prove Fermat’s little theorem. Lagrange’s theorem is a general fact about groups: that if $G$ is a group of size $n$, then the size of any subgroup of $G$ must be a factor of $n$. To obtain Fermat’s little theorem, one applies Lagrange’s theorem to the particular case when $G$ is the multiplicative group of nonzero integers mod $p$. The conclusion one obtains, that $a^p$ is always congruent to $a$ mod $p$, is far from obvious.
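The conclusion of Fermat's little theorem is easy to test for particular primes. A minimal sketch, with an illustrative function name, using Python's built-in three-argument `pow` for modular exponentiation:

```python
def fermat_holds(p):
    """Check that a**p is congruent to a mod p for every residue a mod p."""
    return all(pow(a, p, p) == a % p for a in range(p))
```

The check succeeds for primes such as 7 or 101 and fails for a composite modulus such as 4, where $2^4 = 16$ is congruent to 0 rather than 2. Passing this check does not by itself show that a number is prime.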
However, what if we want to know something about a group $G$ that might not be true for all groups? That is, suppose that we wish to determine whether $G$ has some property $P$ that some groups have and others do not. Since we cannot prove that the property $P$ follows from the group axioms, it might seem that we are forced to abandon the general theory of groups and look at the specific group $G$. However, in many situations there is an intermediate possibility: to identify some fairly general property $Q$ that the group $G$ has, and show that $Q$ implies the more particular property $P$ that interests us.
Here is an illustration of this sort of technique in a different context. Suppose we wish to determine whether the polynomial $p(x) = x^4 - 2x^3 - x^2 - 2x + 1$ has a real root. One method would be to study this particular polynomial and try to find a root. After quite a lot of effort we might discover that $p(x)$ can be factorized as $(x^2 + x + 1)(x^2 - 3x + 1)$. The first factor is always positive, but if we apply the quadratic formula to the second, we find that $p(x) = 0$ when $x = (3 \pm \sqrt{5})/2$.
An alternative method, which uses a bit of general theory, is to notice that $p(1)$ is negative (in fact, it equals $-3$) and that $p(x)$ is large when $x$ is large (because then the $x^4$ term is far bigger than anything else), and then to use the intermediate value theorem, the result that any continuous function that is negative somewhere and positive somewhere else must be zero somewhere in between.
Notice that, with the second approach, there was still some computation to do (finding a value of $x$ for which $p(x)$ is negative) but that it was much easier than the computation in the first approach (finding a value of $x$ for which $p(x)$ is zero). In the second approach, we established that $p$ had the rather general property of being negative somewhere, and used the intermediate value theorem to finish off the argument.
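The intermediate value theorem also suggests a way of locating the root numerically: repeatedly halve an interval on which the function changes sign. The sketch below applies this bisection idea to $p$ on the interval $[1, 3]$; the interval, tolerance, and function names are illustrative choices.

```python
def p(x):
    return x**4 - 2 * x**3 - x**2 - 2 * x + 1


def bisect_root(f, lo, hi, tol=1e-12):
    """Approximate a zero of a continuous f on [lo, hi],
    given that f(lo) < 0 < f(hi)."""
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid   # the sign change lies in the right half
        else:
            hi = mid   # the sign change lies in the left half
    return (lo + hi) / 2


root = bisect_root(p, 1.0, 3.0)
```

Here $p(1) = -3$ and $p(3) = 13$, so a sign change is guaranteed, and the value found agrees with $(3 + \sqrt{5})/2$ from the factorization.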
There are many situations like this throughout mathematics, and as they arise certain general properties become established as particularly useful. For example, if you know that a positive integer $n$ is prime, or that a group $G$ is Abelian (that is, $gh = hg$ for any two elements $g$ and $h$ of $G$), or that a function taking complex numbers to complex numbers is holomorphic [I.3 §5.6], then as a consequence of these general properties you know a lot more about the objects in question.
Once properties have established themselves as important, they give rise to a large class of mathematical questions of the following form: given a mathematical structure and a selection of interesting properties that it might have, which combinations of these properties imply which other ones? Not all such questions are interesting, of course (many of them turn out to be quite easy and others are too artificial), but some of them are very natural and surprisingly resistant to one’s initial attempts to solve them. This is usually a sign that one has stumbled on what mathematicians would call a “deep” question. In the rest of this section let us look at a problem of this kind.
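A group $G$ is called finitely generated if there is some finite set $\{x_1, x_2, \dots, x_k\}$ of elements of $G$ such that all the rest can be written as products of elements in that set. For example, the group $\mathrm{SL}_2(\mathbb{Z})$ consists of all $2 \times 2$ matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that $a$, $b$, $c$, and $d$ are integers and $ad - bc = 1$. This group is finitely generated: it is a nice exercise to show that every such matrix can be built from the four matrices $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, and $\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$ using matrix multiplication. (See [I.3 §3.2] for a discussion of matrices. A first step toward proving this result is to show that $\begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & m+n \\ 0 & 1 \end{pmatrix}$.)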
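The suggested first step can be checked mechanically for a range of integers. In the sketch below a matrix is a pair of rows, and the four generators are taken to be the matrices with rows (1, 1), (0, 1); (1, −1), (0, 1); (1, 0), (1, 1); and (1, 0), (−1, 1). The names `T`, `T_INV`, `U`, `U_INV`, and `upper` are illustrative, not notation from the text.

```python
def matmul(A, B):
    """Product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def upper(m):
    """The matrix with rows (1, m) and (0, 1)."""
    return ((1, m), (0, 1))


IDENTITY = ((1, 0), (0, 1))
T, T_INV = upper(1), upper(-1)
U, U_INV = ((1, 0), (1, 1)), ((1, 0), (-1, 1))

# One product of generators that is not itself upper or lower triangular.
S = matmul(matmul(T_INV, U), T_INV)
```

Multiplying `upper(m)` by `upper(n)` gives `upper(m + n)`, so every matrix of that shape is a power of `T` or `T_INV`. The product `S` has rows (0, −1) and (1, 0), which shows how combining the two kinds of generator reaches matrices of a different shape.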
Now let us consider a second property. If $x$ is an element of a group $G$, then $x$ is said to have finite order if there is some power of $x$ that equals the identity. The smallest such power is called the order of $x$. For example, in the multiplicative group of nonzero integers mod 7, the identity is 1, and the order of the element 4 is 3, because $4^1 = 4$, $4^2 = 16 \equiv 2$, and $4^3 = 64 \equiv 1$ mod 7. As for 3, its first six powers are 3, 2, 6, 4, 5, 1, so it has order 6. Now some groups have the very special property that there is some integer $n$ such that $x^n$ equals the identity for every $x$ or, equivalently, the order of every $x$ is a factor of $n$. What can we say about such groups?
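The orders quoted for the integers mod 7 can be recomputed directly. A minimal sketch, with an illustrative function name; the loop terminates only when `x` is invertible modulo `modulus`, which holds for every nonzero residue when the modulus is prime.

```python
def order_mod(x, modulus):
    """Smallest k >= 1 with x**k congruent to 1 mod `modulus`.
    Assumes x is invertible mod `modulus`."""
    power, k = x % modulus, 1
    while power != 1:
        power = power * x % modulus
        k += 1
    return k
```

Every order found for the nonzero residues mod 7 divides 6, so in that group $x^6$ is the identity for every $x$.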
Let us look first at the case where all elements other than the identity have order 2. Writing $e$ for the identity element, we are assuming that $a^2 = e$ for every element $a$. If we multiply both sides of this equation by the inverse $a^{-1}$, then we deduce that $a = a^{-1}$. The opposite implication is equally easy, so such groups are ones where every element is its own inverse.
Now let $a$ and $b$ be two elements of $G$. For any two elements $a$ and $b$ of any group we have the identity $(ab)^{-1} = b^{-1}a^{-1}$ (simply because $abb^{-1}a^{-1} = aa^{-1} = e$), and in our special group where all elements equal their inverses we can deduce from this that $ab = (ab)^{-1} = b^{-1}a^{-1} = ba$. That is, $G$ is automatically Abelian.
Already we have shown that one general property, that every element of $G$ squares to the identity, implies another, that $G$ is Abelian. Now let us add the condition that $G$ is finitely generated, and let $x_1, x_2, \dots, x_k$ be a minimal set of generators. That is, suppose that every element of $G$ can be built up out of the $x_i$ and that we need all of the $x_i$ to be able to do this. Because $G$ is Abelian and because every element is equal to its own inverse, we can rearrange products of the $x_i$ into a standard form, where each $x_i$ occurs at most once and the indices increase. For example, take the product $x_4x_3x_1x_4x_4x_1x_3x_1x_5$. Because $G$ is Abelian, this equals $x_1x_1x_1x_3x_3x_4x_4x_4x_5$, and because each element is its own inverse this equals $x_1x_4x_5$, the standard form of the original expression.
This shows that $G$ can have at most $2^k$ elements, since for each $x_i$ we have the choice of whether or not to include it in the product (after it has been put in the form above). In particular, the properties “$G$ is finitely generated” and “every nonidentity element of $G$ has order 2” imply the third property “$G$ is finite.” It turns out to be fairly easy to prove that two elements whose standard forms are different are themselves different, so in fact $G$ has exactly $2^k$ elements (where $k$ is the size of a minimal set of generators).
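The reduction to standard form amounts to keeping each generator whose number of occurrences is odd. A minimal sketch, in which a product of generators is represented by its list of indices (an illustrative encoding):

```python
from collections import Counter


def standard_form(word):
    """Standard form of a product of generators x_i, given by indices,
    in a group where every element is its own inverse."""
    counts = Counter(word)
    # Since x_i squared is the identity and the group is Abelian,
    # only the parity of each generator's multiplicity matters.
    return sorted(i for i, c in counts.items() if c % 2 == 1)
```

Applied to the indices 4, 3, 1, 4, 4, 1, 3, 1, 5 of the example product, this leaves the indices 1, 4, 5. Since a standard form is just a subset of the $k$ indices, there are at most $2^k$ of them.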
Now let us ask what happens if $n$ is some integer greater than 2 and $x^n = e$ for every element $x$. That is, if $G$ is finitely generated and $x^n = e$ for every $x$, must $G$ be finite? This turns out to be a much harder question, originally asked by Burnside [VI.60]. Burnside himself showed that $G$ must be finite if $n = 3$, but it was not until 1968 that his problem was solved, when Adian and Novikov proved the remarkable result that if $n \geq 4381$ then $G$ does not have to be finite. There is of course a big gap between 3 and 4381, and progress in bridging it has been slow. It was only in 1992 that this was improved to $n \geq 13$, by Ivanov. And to give an idea of how hard the Burnside problem is, it is still not known whether a group with two generators such that the fifth power of every element is the identity must be finite.
8 Working with Arguments That Are Not Fully Rigorous
A mathematical statement is considered to be established when it has a proof that meets the high standards of rigor that are characteristic of the subject.
However, nonrigorous arguments have an important place in mathematics as well. For example, if one wishes to apply a mathematical statement to another field, such as physics or engineering, then the truth of the statement is often more important than whether one has proved it.
However, this raises an obvious question: if one has not proved a statement, then what grounds could there be for believing it? There are in fact several different kinds of nonrigorous justification, so let us look at some of them.