Part II The Origins of
II.4 Algorithms Jean-Luc Chabert
1 What Is an Algorithm?
It is not easy to give a precise definition of the word
“algorithm.” One can provide approximate synonyms:
some other words that (sometimes) mean roughly the same thing are “rule,” “technique,” “procedure,” and
“method.” One can also give good examples, such as long multiplication, the method one learns in high school for multiplying two positive integers together.
However, although informal explanations and well-chosen examples do give a good idea of what an algorithm is, the concept has undergone a long evolution: it was not until the twentieth century that a satisfactory formal definition was achieved, and ideas about algorithms have evolved further even since then. In this article, we shall try to explain some of these developments and clarify the contemporary meaning of the term.
1.1 Abacists and Algorists
Returning to the example of multiplication, an obvious point is that how you try to multiply two numbers together is strongly influenced by how you represent those numbers. To see this, try multiplying the Roman numerals CXLVII and XXIX together without first converting them into their decimal counterparts, 147 and 29. It is difficult and time-consuming, which explains why arithmetic in the Roman empire was extremely rudimentary. A numeration system can be additive, as it was for the Romans, or positional, like ours today. If it is positional, then it can use one or several bases: for instance, the Sumerians used both base 10 and base 60.
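To make the contrast concrete, here is a minimal Python sketch that converts the two Roman numerals above into positional form, after which the multiplication is routine. The names `ROMAN_VALUES` and `roman_to_int` are our own illustrative choices, and the sketch assumes the usual modern reading in which a smaller symbol written before a larger one is subtracted.

```python
# Value of each Roman symbol (illustrative table, standard modern convention).
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'CXLVII' to an ordinary integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # A smaller symbol placed before a larger one is subtracted (the X in XL).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


a = roman_to_int("CXLVII")
b = roman_to_int("XXIX")
print(a, b, a * b)  # 147 29 4263
```

Once the numbers are in positional form, the product is a single built-in operation; nothing comparable is available while they remain in additive form.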
For a long time, many processes of calculation used abacuses. Originally, these were lines traced on sand, onto which one placed stones (the Latin for small stone is calculus) to represent numbers. Later there were counting tables equipped with rows or columns onto which one placed tokens. These could be used to represent numbers to a given base. For example, if the base was 10, then a token would represent one unit, ten units, one hundred units, etc., according to which row or column it was in. The four arithmetic operations could then be carried out by moving the tokens according to precise rules. The Chinese counting frame can be regarded as a version of the abacus.
In the twelfth century, when the Arabic mathematical works were translated into Latin, the denary positional numeration system spread through Europe. This system was particularly suitable for carrying out the arithmetic operations, and led to new methods of calculation. The term algoritmus was introduced to refer to these, and to distinguish them from the traditional methods that used tokens on an abacus.
Although the signs for the numerals had been adapted from Indian practice, the numerals became known as Arabic. And the origin of the word “algorithm” is Arabic: it arose from a distortion of the name al-Khwārizmī [VI.5], who was the author of the oldest known work on algebra, in the first half of the ninth century. His treatise, entitled al-Kitāb al-mukhtaṣar fī ḥisāb al-jabr wa’l-muqābala (“The compendious book on calculation by completion and balancing”), gave rise to the word “algebra.”
1.2 Finiteness
As we have just seen, in the Middle Ages the term “algorithm” referred to the processes of calculation based on the decimal notation for the integers. However, in the seventeenth century, according to d’Alembert’s [VI.20] Encyclopédie, the word was used in a more general sense, referring not just to arithmetic but also to methods in algebra and to other calculational procedures such as “the algorithm of the integral calculus” or “the algorithm of sines.”
Gradually, the term came to mean any process of systematic calculation that could be carried out by means of very precise rules. Finally, with the growing role of computers, the importance of finiteness was fully understood: it is essential that the process stops and provides a result after a finite time. Thus one arrives at the following naive definition:
An algorithm is a set of finitely many rules for manipulating a finite amount of data in order to produce a result in a finite number of steps.
Note the insistence on finiteness: finiteness in the writing of the algorithm and finiteness in the implementation of the algorithm.
The formulation above is not, of course, a mathematical definition in the classical sense of the term. As we shall see later, it was important to formalize it further.
But for now, let us be content with this “definition”
and look at some classical examples of algorithms in mathematics.
2 Three Historical Examples
A feature of algorithms that we have not yet mentioned is iteration, or the repetition of simple procedures. To see why iteration is important, consider once again the example of long multiplication. This is a method that works for positive integers of any size. As the numbers get larger, the procedure takes longer, but (and this is of vital importance) the method is “the same”: if you understand how to multiply two three-digit numbers together, then you do not need to learn any new principles in order to multiply two 137-digit numbers together (even if you might be rather reluctant to do the calculation). The reason for this is that the method for long multiplication involves a great deal of carefully structured repetition of much smaller tasks, such as multiplying two one-digit numbers together. We shall see that iteration plays a very important part in the algorithms to be discussed in this section.
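The role of iteration in long multiplication can be seen in a short Python sketch. It is only an illustration under our own conventions: the function name `long_multiply` is hypothetical, and the numbers are passed as strings of decimal digits so that the only multiplications performed are of two one-digit numbers.

```python
def long_multiply(x: str, y: str) -> str:
    """Multiply two non-negative integers given as decimal strings."""
    # Store digits least-significant first, as in the columns of the written method.
    xs = [int(d) for d in reversed(x)]
    ys = [int(d) for d in reversed(y)]
    result = [0] * (len(xs) + len(ys))
    for i, dx in enumerate(xs):
        carry = 0
        for j, dy in enumerate(ys):
            # The basic task: a one-digit product plus what is already in the column.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(ys)] += carry
    # Remove leading zeros, which sit at the end of the reversed list.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return "".join(str(d) for d in reversed(result))


print(long_multiply("147", "29"))  # 4263
```

The same two nested loops handle three-digit and 137-digit inputs alike; only the number of repetitions changes.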
2.1 Euclid’s Algorithm: Iteration
One of the best, and most often used, examples to illustrate the nature of algorithms is Euclid’s algorithm [III.22], which goes back to the third century B.C.E. It is a procedure described by Euclid [VI.2] to determine the greatest common divisor (gcd) of two positive integers $a$ and $b$. (Sometimes the greatest common divisor is known as the highest common factor (hcf).)
When one first meets the concept of the greatest common divisor of $a$ and $b$, it is usually defined to be the largest positive integer that is a divisor (or factor) of both $a$ and $b$. However, for many purposes it is more convenient to think of it as the unique positive integer $d$ with the following two properties. First, $d$ is a divisor of $a$ and $b$, and second, if $c$ is any other divisor of $a$ and $b$, then $d$ is divisible by $c$. The method for determining $d$ is provided by the first two propositions of Book VII of Euclid’s Elements. Here is the first one:
“Two unequal numbers being set out, and the less being continually subtracted in turn from the greater, if the number which is left never measures the one before it until a unit is left, the original numbers will be prime to one another.” In other words, if by carrying out successive alternate subtractions one obtains the number 1, then the gcd of the two numbers is equal to 1. In this case one says that the numbers are relatively prime or coprime.
2.1.1 Alternate Subtractions
Let us describe Euclid’s procedure in general. It is based on two simple observations:
(i) if $a = b$ then the gcd of $a$ and $b$ is $b$ (or $a$);
(ii) $d$ is a common divisor of $a$ and $b$ if and only if it is a common divisor of $a - b$ and $b$, which implies that the gcd of $a$ and $b$ is the same as the gcd of $a - b$ and $b$.
Now suppose that we wish to determine the gcd of $a$ and $b$ and suppose that $a \ge b$. If $a = b$ then observation (i) tells us that the gcd is $b$. Otherwise, observation (ii) tells us that the answer will be the same as it is for the two numbers $a - b$ and $b$. If we now let $a_1$ be the larger of these two numbers and $b_1$ the smaller (of course, if they are equal then we just set $a_1 = b_1 = b$), then we are faced with the same task that we started with, namely to determine the gcd of two numbers, but the larger of these two numbers, $a_1$, is smaller than $a$, the larger of the original two numbers. We can therefore repeat the process: if $a_1 = b_1$ then the gcd of $a_1$ and $b_1$, and hence that of $a$ and $b$, is $b_1$, and otherwise we replace $a_1$ by $a_1 - b_1$ and reorganize the numbers $a_1 - b_1$ and $b_1$ so that if one of them is larger then it comes first.
Figure 1 A flow chart for the procedure in Euclid’s algorithm.
One further observation is needed if we want to show that this procedure works. It is the following fundamental fact about the positive integers, sometimes known as the well-ordering principle.
(iii) A strictly decreasing sequence of positive integers $a_0 > a_1 > a_2 > \cdots$ must be finite.
Since the iterative procedure just described produces exactly such a strictly decreasing sequence, the iterations must eventually stop, which means that at some point $a_k$ and $b_k$ will be equal, and that value is thus the gcd of $a$ and $b$ (see figure 1).
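The procedure by alternate subtractions can be written out as a short Python sketch. The function name `gcd_by_subtraction` is our own, and the loop is one possible reading of observations (i) and (ii), not a transcription of Euclid’s text.

```python
def gcd_by_subtraction(a: int, b: int) -> int:
    """Greatest common divisor of two positive integers by alternate subtractions."""
    if a <= 0 or b <= 0:
        raise ValueError("both arguments must be positive integers")
    while a != b:  # observation (i): when the numbers are equal, that value is the gcd
        if a < b:
            a, b = b, a  # keep the larger number first
        a = a - b  # observation (ii): gcd(a, b) = gcd(a - b, b)
    return a


print(gcd_by_subtraction(147, 29))  # 1, so 147 and 29 are coprime
print(gcd_by_subtraction(12, 18))   # 6
```

The loop must terminate because the larger of the two numbers strictly decreases at every pass while remaining a positive integer, which is exactly the situation covered by the well-ordering principle.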
2.1.2 Euclidean Divisions
Euclid’s algorithm is usually described in a slightly different way. One makes use of a more complex procedure called Euclidean division (that is, division with remainder), which greatly reduces the number of steps that the algorithm takes. The basic fact underlying this procedure is that if $a$ and $b$ are two positive integers then there are (unique) integers $q$ and $r$ such that
$$a = bq + r \quad\text{and}\quad 0 \le r < b.$$
The number $q$ is called the quotient and $r$ is the remainder. Remarks (i) and (ii) above are then replaced by the following ones:
(i) if $r = 0$ then the gcd of $a$ and $b$ is equal to $b$;
(ii) the gcd of $a$ and $b$ is the same as the gcd of $b$ and $r$.
This time, at the first step, one replaces $(a, b)$ by $(b, r)$. If $r \ne 0$, then at the second step one replaces $(b, r)$ by $(r, r_1)$, where $r_1$ is the remainder in the division of $b$ by $r$, and so on. The sequence of remainders is strictly decreasing ($b > r > r_1 > r_2 > \cdots \ge 0$), so the process stops and the gcd is the last nonzero remainder.
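A minimal Python sketch of the version with Euclidean divisions follows. The function names `euclid_remainders` and `gcd_by_division` are hypothetical; Python’s `%` operator supplies the remainder $r$ with $0 \le r < b$ for positive integers.

```python
def euclid_remainders(a: int, b: int) -> list:
    """Successive remainders r, r1, r2, ... produced by Euclid's algorithm, ending with 0."""
    remainders = []
    while b != 0:
        a, b = b, a % b  # replace (a, b) by (b, r)
        remainders.append(b)
    return remainders


def gcd_by_division(a: int, b: int) -> int:
    """Greatest common divisor of two positive integers by repeated Euclidean division."""
    if a <= 0 or b <= 0:
        raise ValueError("both arguments must be positive integers")
    while b != 0:
        a, b = b, a % b
    return a  # the last divisor, that is, the last nonzero remainder (or b itself)


print(euclid_remainders(1071, 462))  # [147, 21, 0]
print(gcd_by_division(1071, 462))    # 21
```

In the example, $1071 = 2 \cdot 462 + 147$, $462 = 3 \cdot 147 + 21$, and $147 = 7 \cdot 21 + 0$, so the last nonzero remainder, 21, is the gcd.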
It is not hard to see that the two approaches are equivalent. Suppose, for example, that $a = 103\,438$ and $b = 37$. If you use the first approach, then you will repeatedly subtract 37 from 103 438 until you reach a number that is smaller than 37. This number will be the remainder when 103 438 is divided by 37, which is the first number you would calculate if you used the second approach. Thus, the reason for the second approach is that repeated subtraction can be a very inefficient way of calculating remainders. This efficiency gain is very important in practice: the second approach gives rise to a polynomial-time algorithm [IV.20 §2], while the time taken by the first can be exponential in the number of digits of the input.
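The difference in cost can be checked on the example just given. The sketch below counts the basic operations performed by each approach; the helper names `subtraction_steps` and `division_steps` are our own, and "step" here simply means one subtraction or one division with remainder.

```python
def subtraction_steps(a: int, b: int):
    """Return (gcd, number of subtractions) for the alternate-subtraction procedure."""
    steps = 0
    while a != b:
        if a < b:
            a, b = b, a
        a -= b
        steps += 1
    return a, steps


def division_steps(a: int, b: int):
    """Return (gcd, number of divisions) for the Euclidean-division procedure."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return a, steps


print(103438 % 37)                    # 23, the first remainder
print(subtraction_steps(103438, 37))  # (1, 2803)
print(division_steps(103438, 37))     # (1, 7)
```

Of the 2803 subtractions, 2795 are spent merely reaching the first remainder, 23, which a single division produces at once.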
2.1.3 Generalizations
Euclid’s algorithm can be generalized to many other contexts where we have notions of addition, subtraction, and multiplication. For example, there is a variant of it that applies to the ring [III.81 §1] $\mathbb{Z}[i]$ of Gaussian integers, that is, numbers of the form $a + bi$, where $a$ and $b$ are ordinary integers. It can also be applied to the ring of all polynomials with real coefficients (or coefficients in any field, for that matter). The one requirement is that we should be able to find some analogue of the notion of division with remainder, after which the algorithm is virtually identical to the algorithm for positive integers. For example, we have the following statement for polynomials: given any two polynomials $A$ and $B$ with $B$ not the zero polynomial, there are polynomials $Q$ and $R$ such that $A = BQ + R$ and either $R = 0$ or the degree of $R$ is less than the degree of $B$.
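As an illustration of the polynomial case, here is a sketch in Python with rational coefficients. Everything about the representation is an assumption made for the example: a polynomial is a list of coefficients with the highest degree first, the zero polynomial is the empty list, and the names `trim`, `poly_divmod`, and `poly_gcd` are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[0] == 0:
        p.pop(0)
    return p


def poly_divmod(a, b):
    """Return (Q, R) with A = B*Q + R and either R = 0 or deg R < deg B."""
    remainder = [Fraction(c) for c in trim(a)]
    b = [Fraction(c) for c in trim(b)]
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = []
    while len(remainder) >= len(b):
        factor = remainder[0] / b[0]
        quotient.append(factor)
        # Subtract factor * x^k * B so that the leading term cancels.
        for i, coeff in enumerate(b):
            remainder[i] -= factor * coeff
        remainder.pop(0)  # the leading coefficient is now zero
    return quotient, trim(remainder)


def poly_gcd(a, b):
    """Monic greatest common divisor, by the same loop as for integers."""
    a, b = trim(a), trim(b)
    while b:
        _, r = poly_divmod(a, b)
        a, b = b, r
    # A gcd of polynomials is defined only up to a constant factor; make it monic.
    return [Fraction(c) / Fraction(a[0]) for c in a] if a else []


# A = x^3 - 1 and B = x^2 - 1 share the factor x - 1.
print(poly_divmod([1, 0, 0, -1], [1, 0, -1]))
print(poly_gcd([1, 0, 0, -1], [1, 0, -1]))
```

For these inputs the division gives $Q = x$ and $R = x - 1$, and the gcd is $x - 1$. The loop in `poly_gcd` has exactly the shape of the integer algorithm; only the division with remainder has been replaced.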
As Euclid noticed (Elements, Book X, proposition 2), one may also carry out the procedure on pairs of numbers $a$ and $b$ that are not necessarily integers. It is easy to check that the process will stop if and only if the ratio $a/b$ is a rational number. This observation leads to the concept of continued fractions [III.22], which are discussed in part III. They were not studied explicitly before the seventeenth century, but the roots of the idea can be traced back to Archimedes [VI.3].
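For a pair with rational ratio, the successive quotients produced by the procedure can be recorded as follows. This is a sketch under our own conventions: the name `continued_fraction` is hypothetical, and exact `Fraction` values are used so that non-integer inputs are handled without rounding. (Floating-point numbers are themselves rational, so they cannot be used to exhibit the non-terminating case.)

```python
from fractions import Fraction


def continued_fraction(a, b):
    """Quotients obtained by running Euclid's procedure on positive numbers a and b."""
    quotients = []
    while b != 0:
        q = a // b      # how many times b fits into a
        r = a - q * b   # what is left over, with 0 <= r < b
        quotients.append(q)
        a, b = b, r
    return quotients


print(continued_fraction(22, 7))                            # [3, 7]
print(continued_fraction(Fraction(3, 2), Fraction(5, 7)))   # [2, 10]
```

In the second example the ratio is $\frac{21}{10} = 2 + \frac{1}{10}$, and the procedure stops after two steps even though neither input is an integer.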
2.2 The Method of Archimedes to Calculate π: Approximation and Finiteness
The ratio of the circumference of a circle to the diameter is a constant that has been denoted by $\pi$ since the eighteenth century (see [III.70]). Let us see how Archimedes, in the third century B.C.E., obtained the classical approximation $\frac{22}{7}$ for this ratio. If one draws inscribed polygons (whose vertices lie on the circle) and circumscribed polygons (whose sides are tangent to the circle) and if one computes the length of these polygons, then one obtains lower and upper bounds for the value of $\pi$, since the circumference of the circle is greater than the length of any inscribed polygon and less than the length of any circumscribed polygon (figure 2). Archimedes started with regular hexagons, and then repeatedly doubled the number of sides, obtaining more and more precise bounds. He finished with ninety-six-sided polygons, obtaining the estimates
$$3 + \tfrac{10}{71} \le \pi \le 3 + \tfrac{1}{7}.$$
This process clearly involves iteration, but is it right to call it an algorithm? Strictly speaking it is not: however many sides you take for your polygon, all you will get is an approximation to $\pi$, so the process is not finite. However, what we do have is an algorithm that will calculate $\pi$ to any desired accuracy: for example, if you demand an approximation that is correct to ten decimal places, then after a finite number of steps the algorithm will give you one. What matters now is that the process converges. That is, it is important that the values that come out of the iteration get arbitrarily close to $\pi$. The geometric origin of the method can be used to prove that this is indeed the case, and in 1609 in Germany Ludolph van Ceulen obtained an approximation accurate to thirty-five decimal places using polygons with $2^{62}$ sides.
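The doubling process can be sketched in Python. The recurrences used below are a modern reformulation and are not claimed to be Archimedes’ own computation, which relied on hand-made rational approximations of square roots: for a circle of diameter 1, if $p_n$ and $P_n$ denote the perimeters of the inscribed and circumscribed regular $n$-gons, then $P_{2n}$ is the harmonic mean of $p_n$ and $P_n$, and $p_{2n}$ is the geometric mean of $p_n$ and $P_{2n}$. The function name `archimedes_bounds` is hypothetical.

```python
import math


def archimedes_bounds(doublings: int = 4):
    """Lower and upper bounds for pi from regular polygons, starting with hexagons.

    The circle has diameter 1, so the perimeters bound pi directly.
    """
    sides = 6
    inner = 3.0                   # perimeter of the inscribed hexagon
    outer = 2.0 * math.sqrt(3.0)  # perimeter of the circumscribed hexagon
    for _ in range(doublings):
        outer = 2.0 * inner * outer / (inner + outer)  # harmonic mean gives P_2n
        inner = math.sqrt(inner * outer)               # geometric mean gives p_2n
        sides *= 2
    return sides, inner, outer


sides, lower, upper = archimedes_bounds(4)
print(sides, lower, upper)
print(3 + 10 / 71, 3 + 1 / 7)  # the bounds stated by Archimedes, for comparison
```

Four doublings take the hexagon to the ninety-six-sided polygon. Each further doubling tightens the bounds, but no finite number of doublings yields $\pi$ exactly, which is the point made in the text.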
Nevertheless, there is a clear difference between this algorithm for approximating $\pi$ and Euclid’s algorithm for calculating the gcd of two positive integers. Algorithms like Euclid’s are often called discrete algorithms, and are contrasted with numerical algorithms, which are algorithms that are used to compute numbers that are not integers (see numerical analysis [IV.21]).
2.3 The Newton–Raphson Method: Recurrence Formulas
In around 1670, Newton [VI.14] devised a method for finding roots of equations, which he explained with reference to the example $x^3 - 2x - 5 = 0$. His explanation starts with the observation that the root $x$ is approximately equal to 2. He therefore writes $x = 2 + p$ and obtains an equation for $p$ by substituting $2 + p$ for $x$ in the original equation. This new equation works out to be $p^3 + 6p^2 + 10p - 1 = 0$. Because $x$ is close to 2, $p$ is small, so he then estimates $p$ by forgetting the terms $p^3$ and $6p^2$ (since these should be considerably smaller than $10p - 1$). This gives him the equation $10p - 1 = 0$, or $p = \frac{1}{10}$. Of course, this is not an exact solution, but it provides him with a new and better approximation, 2.1, for $x$. He then repeats the process, writing $x = 2.1 + q$, substituting to obtain an equation for $q$, solving this equation approximately, and refining his estimate still further. The estimate he obtains for $q$ is $-0.0054$, so the next approximation for $x$ is 2.0946.
Figure 2 Approximation of π.
How, though, can we be sure that this process really does converge to $x$? Let us examine the method more closely.
2.3.1 Tangents and Convergence
Newton’s method can be interpreted geometrically in terms of the graph of a function $f$, though Newton himself did not do so. A root $x$ of the equation $f(x) = 0$ corresponds to a point where the curve with equation $y = f(x)$ intersects the $x$-axis. If you start with an approximate value $a$ for $x$ and set $p = x - a$, as we did above, then when you substitute $a + p$ for $x$ to obtain a new function $g(p)$, you are effectively moving the origin from $(0, 0)$ to the point $(a, 0)$. Then when you forget all powers of $p$ other than the constant and linear terms, you are finding the best linear approximation to the function $g$, which, geometrically speaking, is the tangent line to $g$ at the point $(0, g(0))$. Thus, the approximate value you obtain for $p$ is the $x$-coordinate of the point where the tangent at $(0, g(0))$ crosses the $x$-axis. Adding $a$ to this value returns the origin to $(0, 0)$ and gives the new approximation to the root of $f$. This is why Newton’s method is often called the tangent method (figure 3). And one can now see that the new approximation will definitely be better than the old one if the tangent to $f$ at $(a, f(a))$ intersects the $x$-axis at a point that lies between $a$ and the point where the curve $y = f(x)$ intersects the $x$-axis.
Figure 3 Newton’s method.
As it happens, this is not the case for Newton’s choice of the value $a = 2$ above, but it is true for the approximate value 2.1 and for all subsequent ones. Geometrically, the favorable situation occurs if the point $(a, f(a))$ lies above the $x$-axis in a convex part of the curve that crosses the $x$-axis or below the $x$-axis in a concave part of the curve that crosses the $x$-axis. Under these circumstances, and provided the root is not a multiple one, the convergence is quadratic, meaning that the error at each stage is roughly the square of the error at the previous stage or, equivalently, that the approximation is valid to a number of decimal places that roughly doubles at each stage. This is enormously fast.
The choice of the initial approximation value is obviously important, and raises unexpectedly subtle questions. These are clearer if we look at complex polynomials and their complex roots. Newton’s method can be easily adapted to this more general context. Suppose that $z$ is a root of some complex polynomial and that $z_0$ is an initial approximation for $z$. Newton’s method then gives us a sequence $z_0, z_1, z_2, \ldots$, which may or may not converge to $z$. We define the domain of attraction, denoted $A(z)$, to be the set of all complex numbers $z_0$ such that the resulting sequence does indeed converge to $z$. How do we determine $A(z)$?
The first person to pose this problem was Cayley [VI.46], in 1879. He noticed that the solution is easy for quadratic polynomials but difficult as soon as the degree is 3 or more. For example, the domains of attraction of the roots $\pm 1$ of the polynomial $z^2 - 1$ are the open half-planes bounded by the vertical axis, but the domains corresponding to the roots $1$, $\omega$, and $\omega^2$ of $z^3 - 1$ are extremely complicated sets. They were described by Julia in 1918; such subsets are now called fractal sets. Newton’s method and fractal sets are discussed further in dynamics [IV.14].
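One can explore these domains numerically. The following Python sketch applies Newton’s method to $z^3 - 1$ from a given starting point and reports which root, if any, the sequence approaches. The name `newton_basin`, the iteration cap `max_iter`, and the tolerance `tol` are all our own assumptions; a finite computation of this kind can suggest, but never prove, that a point belongs to a given domain of attraction.

```python
import cmath

# The three cube roots of unity: 1, omega, omega^2.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def newton_basin(z0, max_iter=50, tol=1e-9):
    """Index (0, 1, or 2) of the root of z^3 - 1 reached from z0, or None if undecided."""
    z = complex(z0)
    for _ in range(max_iter):
        if z == 0:
            return None  # the derivative 3z^2 vanishes, so the step is undefined
        z = z - (z**3 - 1) / (3 * z**2)  # one Newton step
        for k, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return k
    return None


print(newton_basin(1.2 + 0.1j))       # 0: the sequence approaches the root 1
print(newton_basin(ROOTS[1] + 0.05))  # 1: the sequence approaches omega
print(newton_basin(0))                # None
```

Evaluating `newton_basin` over a fine grid of starting points and coloring each point by the returned index gives a picture of the three domains and of their intricate common boundary.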
2.3.2 Recurrence Formulas
At each stage of his method, Newton had to produce a new equation, but in 1690 Raphson noticed that this was not really necessary. For particular examples, he gave single formulas that could be used at each step, but his basic observation applies in general and leads to a general formula for every case, which one can easily obtain using the interpretation in terms of tangents. Indeed, the tangent to the curve $y = f(x)$ at the point of $x$-coordinate $a$ has the equation $y - f(a) = f'(a)(x - a)$, and it cuts the $x$-axis at the point with $x$-coordinate $a - f(a)/f'(a)$. What we now call the Newton–Raphson method springs from this simple formula. One starts with an initial approximation $a_0 = a$ and then defines successive approximations using the recurrence formula
$$a_{n+1} = a_n - \frac{f(a_n)}{f'(a_n)}.$$
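The recurrence is easily tried on Newton’s own example, $x^3 - 2x - 5 = 0$, starting from $a_0 = 2$. The sketch below is a minimal illustration in floating-point arithmetic; the function name `newton_raphson` and the fixed number of steps are our own choices, and no safeguard is included for the case where the derivative vanishes.

```python
def newton_raphson(f, df, a0, steps):
    """Return the list a_0, a_1, ..., a_steps defined by a_{n+1} = a_n - f(a_n)/f'(a_n)."""
    approximations = [a0]
    a = a0
    for _ in range(steps):
        a = a - f(a) / df(a)
        approximations.append(a)
    return approximations


def f(x):
    return x**3 - 2 * x - 5


def df(x):
    return 3 * x**2 - 2  # derivative of f


for n, a in enumerate(newton_raphson(f, df, 2.0, 5)):
    print(n, a)
```

The first step gives 2.1 and the second gives a value that rounds to 2.0946, in agreement with Newton’s hand computation; after that the number of correct decimal places roughly doubles at each step, as quadratic convergence predicts.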
As an example, let us consider the function $f(x) = x^2 - c$. Here, Newton’s method provides a sequence of approximations of the square root $\sqrt{c}$ of $c$, given by the recurrence formula $a_{n+1} = \frac{1}{2}(a_n + c/a_n)$ (which we obtain by substituting $x^2 - c$ for $f$ in the general formula above). This method for approximating square roots was known to Heron of Alexandria in the first century. Note that if $a_0$ is close to $\sqrt{c}$, then $c/a_0$ is also close, $\sqrt{c}$ lies between them, and $a_1 = \frac{1}{2}(a_0 + c/a_0)$ is their arithmetic mean.
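The substitution behind this formula is short enough to write out: with $f(x) = x^2 - c$ we have $f'(x) = 2x$, so
$$a_{n+1} = a_n - \frac{a_n^2 - c}{2a_n} = \frac{1}{2}\left(a_n + \frac{c}{a_n}\right).$$
A minimal Python sketch of the resulting iteration follows; the name `heron_sqrt` is hypothetical, and the starting value must be nonzero.

```python
from fractions import Fraction


def heron_sqrt(c, a0, steps):
    """Apply a_{n+1} = (a_n + c / a_n) / 2 the given number of times, starting from a0."""
    a = a0
    for _ in range(steps):
        a = (a + c / a) / 2  # arithmetic mean of a and c / a
    return a


print(heron_sqrt(2.0, 1.0, 6))                  # a floating-point approximation of sqrt(2)
print(heron_sqrt(Fraction(2), Fraction(1), 2))  # 17/12
```

With exact fractions the first two steps from $a_0 = 1$ give $\frac{3}{2}$ and then $\frac{17}{12}$, each the arithmetic mean of the previous approximation $a_n$ and $2/a_n$.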
3 Does an Algorithm Always Exist?
3.1 Hilbert’s Tenth Problem: The Need for Formalization
In 1900, at the Second International Congress of Mathematicians, Hilbert [VI.63] proposed a list of twenty-three problems. These problems, and Hilbert’s works in general, had a huge influence on mathematics during the twentieth century (Gray 2000). We are interested here in Hilbert’s tenth problem: given a Diophantine