
Part II The Origins of

II.4 Algorithms Jean-Luc Chabert

1 What Is an Algorithm?

It is not easy to give a precise definition of the word “algorithm.” One can provide approximate synonyms: some other words that (sometimes) mean roughly the same thing are “rule,” “technique,” “procedure,” and “method.” One can also give good examples, such as long multiplication, the method one learns in high school for multiplying two positive integers together.

However, although informal explanations and well-chosen examples do give a good idea of what an algorithm is, the concept has undergone a long evolution: it was not until the twentieth century that a satisfactory formal definition was achieved, and ideas about algorithms have evolved further even since then. In this article, we shall try to explain some of these developments and clarify the contemporary meaning of the term.

1.1 Abacists and Algorists

Returning to the example of multiplication, an obvious point is that how you try to multiply two numbers together is strongly influenced by how you represent those numbers. To see this, try multiplying the Roman numerals CXLVII and XXIX together without first converting them into their decimal counterparts, 147 and 29. It is difficult and time-consuming, which explains why arithmetic in the Roman empire was extremely rudimentary. A numeration system can be additive, as it was for the Romans, or positional, like ours today. If it is positional, then it can use one or several bases; for instance, the Sumerians used both base 10 and base 60.

For a long time, many processes of calculation used abacuses. Originally, these were lines traced on sand, onto which one placed stones (the Latin for small stone is calculus) to represent numbers. Later there were counting tables equipped with rows or columns onto which one placed tokens. These could be used to represent numbers to a given base. For example, if the base was 10, then a token would represent one unit, ten units, one hundred units, etc., according to which row or column it was in. The four arithmetic operations could then be carried out by moving the tokens according to precise rules. The Chinese counting frame can be regarded as a version of the abacus.
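As a rough modern illustration (not a historical reconstruction), a base-10 counting table can be modeled as a list of token counts, one entry per column. The names `to_board`, `add_boards`, and `from_board` are hypothetical. Addition then needs only one precise rule: pool the tokens, and exchange ten tokens in any column for one token in the next column.

```python
def to_board(n: int) -> list[int]:
    """Lay out tokens for a nonnegative integer n: board[k] counts the tokens
    in the column worth 10**k units."""
    board = []
    while n > 0:
        board.append(n % 10)
        n //= 10
    return board or [0]


def add_boards(x: list[int], y: list[int]) -> list[int]:
    """Add by pooling the tokens column by column, then applying one rule:
    ten tokens in a column are exchanged for a single token in the next column."""
    width = max(len(x), len(y))
    pooled = [(x[k] if k < len(x) else 0) + (y[k] if k < len(y) else 0)
              for k in range(width)]
    k = 0
    while k < len(pooled):
        while pooled[k] >= 10:
            pooled[k] -= 10
            if k + 1 == len(pooled):
                pooled.append(0)      # open a new, higher column
            pooled[k + 1] += 1
        k += 1
    return pooled


def from_board(board: list[int]) -> int:
    """Read the value back off the board."""
    return sum(count * 10**k for k, count in enumerate(board))


print(from_board(add_boards(to_board(147), to_board(29))))  # 176
```

Only token counts and one exchange rule are involved; no multiplication tables are needed for addition, which is in the spirit of the "precise rules" mentioned above.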

In the twelfth century, when the Arabic mathematical works were translated into Latin, the denary positional numeration system spread through Europe. This system was particularly suitable for carrying out the arithmetic operations, and led to new methods of calculation. The term algoritmus was introduced to refer to these, and to distinguish them from the traditional methods that used tokens on an abacus.

Although the signs for the numerals had been adapted from Indian practice, the numerals became known as Arabic. And the origin of the word “algorithm” is Arabic: it arose from a distortion of the name of al-Khwārizmī [VI.5], the author of the oldest known work on algebra, written in the first half of the ninth century. His treatise, entitled al-Kitāb al-mukhtaṣar fī ḥisāb al-jabr wa’l-muqābala (“The compendious book on calculation by completion and balancing”), gave rise to the word “algebra.”

1.2 Finiteness

As we have just seen, in the Middle Ages the term “algorithm” referred to the processes of calculation based on the decimal notation for the integers. However, in the seventeenth century, according to d’Alembert’s [VI.20] Encyclopédie, the word was used in a more general sense, referring not just to arithmetic but also to methods in algebra and to other calculational procedures such as “the algorithm of the integral calculus” or “the algorithm of sines.”

Gradually, the term came to mean any process of systematic calculation that could be carried out by means of very precise rules. Finally, with the growing role of computers, the importance of finiteness was fully understood: it is essential that the process stops and provides a result after a finite time. Thus one arrives at the following naive definition:

An algorithm is a set of finitely many rules for manipulating a finite amount of data in order to produce a result in a finite number of steps.

Note the insistence on finiteness: finiteness in the writing of the algorithm and finiteness in the implementation of the algorithm.

The formulation above is not, of course, a mathematical definition in the classical sense of the term. As we shall see later, it was important to formalize it further.

But for now, let us be content with this “definition” and look at some classical examples of algorithms in mathematics.

2 Three Historical Examples

A feature of algorithms that we have not yet mentioned is iteration, or the repetition of simple procedures. To see why iteration is important, consider once again the example of long multiplication. This is a method that works for positive integers of any size. As the numbers get larger, the procedure takes longer, but (and this is of vital importance) the method is “the same”: if you understand how to multiply two three-digit numbers together, then you do not need to learn any new principles in order to multiply two 137-digit numbers together (even if you might be rather reluctant to do the calculation). The reason for this is that the method for long multiplication involves a great deal of carefully structured repetition of much smaller tasks, such as multiplying two one-digit numbers together. We shall see that iteration plays a very important part in the algorithms to be discussed in this section.
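The structured repetition can be made concrete with a short Python sketch. The function name `long_multiply` and the digit-string interface are our own choices. The inner loop only ever multiplies two one-digit numbers and propagates a carry, so exactly the same code handles three-digit and 137-digit inputs.

```python
def long_multiply(x: str, y: str) -> str:
    """Schoolbook long multiplication of two nonnegative integers given as
    decimal strings, using only one-digit products, additions, and carries."""
    xs = [int(d) for d in reversed(x)]  # least significant digit first
    ys = [int(d) for d in reversed(y)]
    result = [0] * (len(xs) + len(ys))
    for i, dy in enumerate(ys):          # one row per digit of y
        carry = 0
        for j, dx in enumerate(xs):
            total = result[i + j] + dx * dy + carry  # one-digit product plus carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(xs)] += carry     # this position is still 0 at this point
    while len(result) > 1 and result[-1] == 0:
        result.pop()                     # strip leading zeros
    return "".join(str(d) for d in reversed(result))


print(long_multiply("147", "29"))  # 4263
```

Python's built-in integers already multiply arbitrarily large numbers; the point of the sketch is only to show that the digit-level work is the same small task repeated.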

2.1 Euclid’s Algorithm: Iteration

One of the best, and most often used, examples to illustrate the nature of algorithms is Euclid’s algorithm [III.22], which goes back to the third century b.c.e. It is a procedure described by Euclid [VI.2] to determine the greatest common divisor (gcd) of two positive integers $a$ and $b$. (Sometimes the greatest common divisor is known as the highest common factor (hcf).)

When one first meets the concept of the greatest common divisor of $a$ and $b$, it is usually defined to be the largest positive integer that is a divisor (or factor) of both $a$ and $b$. However, for many purposes it is more convenient to think of it as the unique positive integer $d$ with the following two properties. First, $d$ is a divisor of $a$ and $b$, and second, if $c$ is any other divisor of $a$ and $b$, then $d$ is divisible by $c$. The method for determining $d$ is provided by the first two propositions of Book VII of Euclid’s Elements. Here is the first one:

“Two unequal numbers being set out, and the less being continually subtracted in turn from the greater, if the number which is left never measures the one before it until a unit is left, the original numbers will be prime to one another.” In other words, if by carrying out successive alternate subtractions one obtains the number 1, then the gcd of the two numbers is equal to 1. In this case one says that the numbers are relatively prime or coprime.

2.1.1 Alternate Subtractions

Let us describe Euclid’s procedure in general. It is based on two simple observations:

(i) if $a = b$ then the gcd of $a$ and $b$ is $b$ (or $a$);

(ii) $d$ is a common divisor of $a$ and $b$ if and only if it is a common divisor of $a - b$ and $b$, which implies that the gcd of $a$ and $b$ is the same as the gcd of $a - b$ and $b$.

Now suppose that we wish to determine the gcd of $a$ and $b$, and suppose that $a \geq b$. If $a = b$ then observation (i) tells us that the gcd is $b$. Otherwise, observation (ii) tells us that the answer will be the same as it is for the two numbers $a - b$ and $b$. If we now let $a_1$ be the larger of these two numbers and $b_1$ the smaller (of course, if they are equal then we just set $a_1 = b_1 = b$), then we are faced with the same task that we started with (to determine the gcd of two numbers), but the larger of these two numbers, $a_1$, is smaller than $a$, the larger of the original two numbers. We can therefore repeat the process: if $a_1 = b_1$ then the gcd of $a_1$ and $b_1$, and hence that of $a$ and $b$, is $b_1$, and otherwise we replace $a_1$ by $a_1 - b_1$ and reorganize the numbers $a_1 - b_1$ and $b_1$ so that if one of them is larger then it comes first.

Figure 1 A flow chart for the procedure in Euclid’s algorithm (start: integers $a$ and $b$ with $0 \leq b \leq a$; end: the gcd of the given numbers is the current value of $a$).

One further observation is needed if we want to show that this procedure works. It is the following fundamental fact about the positive integers, sometimes known as the well-ordering principle.

(iii) A strictly decreasing sequence of positive integers $a_0 > a_1 > a_2 > \cdots$ must be finite.

Since the iterative procedure just described produces exactly such a strictly decreasing sequence, the iterations must eventually stop, which means that at some point $a_k$ and $b_k$ will be equal, and that value is thus the gcd of $a$ and $b$ (see figure 1).
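A minimal Python sketch of the procedure just described, assuming positive integer inputs (the function name `gcd_by_subtraction` is ours). The comments point back to observations (i) and (ii); termination follows from (iii), since the larger of the two numbers strictly decreases at each pass.

```python
def gcd_by_subtraction(a: int, b: int) -> int:
    """Euclid's procedure by alternate subtractions, for positive integers a, b."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    if a < b:
        a, b = b, a          # keep the larger number first
    while a != b:            # observation (i): stop when the two numbers agree
        a = a - b            # observation (ii): gcd(a, b) = gcd(a - b, b)
        if a < b:
            a, b = b, a      # reorganize so that the larger number comes first
    return a


print(gcd_by_subtraction(12, 18))  # 6
```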

2.1.2 Euclidean Divisions

Euclid’s algorithm is usually described in a slightly different way. One makes use of a more complex procedure called Euclidean division (that is, division with remainder), which greatly reduces the number of steps that the algorithm takes. The basic fact underlying this procedure is that if $a$ and $b$ are two positive integers then there are (unique) integers $q$ and $r$ such that

$a = bq + r$ and $0 \leq r < b$.

The number $q$ is called the quotient and $r$ is the remainder. Remarks (i) and (ii) above are then replaced by the following ones:

(i) if $r = 0$ then the gcd of $a$ and $b$ is equal to $b$;

(ii) the gcd of $a$ and $b$ is the same as the gcd of $b$ and $r$.

This time, at the first step, one replaces $(a, b)$ by $(b, r)$. If $r \neq 0$, then at the second step one replaces $(b, r)$ by $(r, r_1)$, where $r_1$ is the remainder in the division of $b$ by $r$, and so on. The sequence of remainders is strictly decreasing ($b > r > r_1 > r_2 > \cdots \geq 0$), so the process stops and the gcd is the last nonzero remainder.
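A minimal sketch of the division-based version, assuming positive integers. The helper name `euclid_remainders` is hypothetical; it returns the whole sequence $a, b, r, r_1, \ldots, 0$ so that the gcd can be read off as the last nonzero entry.

```python
def euclid_remainders(a: int, b: int) -> list[int]:
    """Return [a, b, r, r1, r2, ..., 0] for positive integers a, b."""
    seq = [a, b]
    while seq[-1] != 0:
        seq.append(seq[-2] % seq[-1])  # remainder of the Euclidean division
    return seq


def gcd_by_division(a: int, b: int) -> int:
    """The gcd is the last nonzero remainder (just before the final 0)."""
    return euclid_remainders(a, b)[-2]


print(euclid_remainders(252, 105))  # [252, 105, 42, 21, 0]
print(gcd_by_division(252, 105))    # 21
```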

It is not hard to see that the two approaches are equivalent. Suppose, for example, that $a = 103\,438$ and $b = 37$. If you use the first approach, then you will repeatedly subtract 37 from 103 438 until you reach a number that is smaller than 37. This number will be the remainder when 103 438 is divided by 37, which is the first number you would calculate if you used the second approach. Thus, the point of the second approach is that repeated subtraction can be a very inefficient way of calculating remainders. This efficiency gain is very important in practice: the second approach gives rise to a polynomial-time algorithm [IV.20 §2], while the first can take time exponential in the number of digits of $a$ and $b$ (for instance, when $b = 1$ it needs $a - 1$ subtractions).
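For the example above, the second approach needs only seven divisions. The chain below can be checked by hand: the last nonzero remainder is 1, so 103 438 and 37 are coprime, and the first line alone stands in for 2795 subtractions of 37.

```latex
\begin{aligned}
103\,438 &= 37 \cdot 2795 + 23\\
37 &= 23 \cdot 1 + 14\\
23 &= 14 \cdot 1 + 9\\
14 &= 9 \cdot 1 + 5\\
9 &= 5 \cdot 1 + 4\\
5 &= 4 \cdot 1 + 1\\
4 &= 1 \cdot 4 + 0
\end{aligned}
```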

2.1.3 Generalizations

Euclid’s algorithm can be generalized to many other contexts where we have notions of addition, subtraction, and multiplication. For example, there is a variant of it that applies to the ring [III.81 §1] $\mathbb{Z}[i]$ of Gaussian integers, that is, numbers of the form $a + bi$, where $a$ and $b$ are ordinary integers. It can also be applied to the ring of all polynomials with real coefficients (or coefficients in any field, for that matter). The one requirement is that we should be able to find some analogue of the notion of division with remainder, after which the algorithm is virtually identical to the algorithm for positive integers. For example, we have the following statement for polynomials: given any two polynomials $A$ and $B$ with $B$ not the zero polynomial, there are polynomials $Q$ and $R$ such that $A = BQ + R$ and either $R = 0$ or the degree of $R$ is less than the degree of $B$.
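To illustrate the polynomial case, here is a sketch of division with remainder and Euclid's algorithm for polynomials with rational coefficients, using Python's standard `fractions.Fraction` for exact arithmetic. The coefficient-list representation (highest degree first, with `[]` for the zero polynomial) and the names `trim`, `poly_divmod`, and `poly_gcd` are our own assumptions; the gcd is normalized to be monic, a common convention.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_divmod(A, B):
    """Return (Q, R) with A = B*Q + R and R == [] or deg R < deg B."""
    A = trim([Fraction(c) for c in A])
    B = trim([Fraction(c) for c in B])
    if not B:
        raise ZeroDivisionError("B must not be the zero polynomial")
    Q = [Fraction(0)] * max(len(A) - len(B) + 1, 0)
    R = A[:]
    while len(R) >= len(B):
        coef = R[0] / B[0]              # cancel the leading term of R
        shift = len(R) - len(B)         # ... using coef * x**shift * B
        Q[len(Q) - 1 - shift] = coef
        for k in range(len(B)):
            R[k] -= coef * B[k]
        R = trim(R)
    return trim(Q), R


def poly_gcd(A, B):
    """Euclid's algorithm: replace (A, B) by (B, R) until the remainder is zero."""
    A = trim([Fraction(c) for c in A])
    B = trim([Fraction(c) for c in B])
    while B:
        _, R = poly_divmod(A, B)
        A, B = B, R
    return [c / A[0] for c in A] if A else []


# x^3 - 2x - 5 = (x - 2)(x^2 + 2x + 2) - 1
Q, R = poly_divmod([1, 0, -2, -5], [1, -2])
# gcd(x^2 - 1, x^2 - 3x + 2) = x - 1
G = poly_gcd([1, 0, -1], [1, -3, 2])
```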

As Euclid noticed (Elements, Book X, proposition 2), one may also carry out the procedure on pairs of numbers $a$ and $b$ that are not necessarily integers. It is easy to check that the process will stop if and only if the ratio $a/b$ is a rational number. This observation leads to the concept of continued fractions [III.22], which are discussed in part III. They were not studied explicitly before the seventeenth century, but the roots of the idea can be traced back to Archimedes [VI.3].
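The link with continued fractions can be seen in a short sketch (function names hypothetical). For positive integers, the successive quotients produced by Euclid's algorithm are exactly the terms of the continued fraction of $a/b$, and the loop stops because $a/b$ is rational; with an irrational ratio and exact arithmetic, the same loop would never terminate.

```python
from fractions import Fraction


def continued_fraction(a: int, b: int) -> list[int]:
    """Quotients q0, q1, ... of Euclid's algorithm on (a, b), i.e. the terms of
    the continued fraction a/b = q0 + 1/(q1 + 1/(q2 + ...)), for positive a, b."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)   # Euclidean division a = b*q + r
        quotients.append(q)
        a, b = b, r
    return quotients


def evaluate(quotients: list[int]) -> Fraction:
    """Fold the continued fraction back into an exact rational number."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value


terms = continued_fraction(103438, 37)
print(terms)            # [2795, 1, 1, 1, 1, 1, 4]
print(evaluate(terms))  # 103438/37
```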

2.2 The Method of Archimedes to Calculate π: Approximation and Finiteness

The ratio of the circumference of a circle to its diameter is a constant that has been denoted by π since the eighteenth century (see [III.70]). Let us see how Archimedes, in the third century b.c.e., obtained the classical approximation $22/7$ for this ratio. If one draws inscribed polygons (whose vertices lie on the circle) and circumscribed polygons (whose sides are tangent to the circle) and if one computes the lengths of these polygons, then one obtains lower and upper bounds for the value of π, since the circumference of the circle is greater than the length of any inscribed polygon and less than the length of any circumscribed polygon (figure 2). Archimedes started with regular hexagons, and then repeatedly doubled the number of sides, obtaining more and more precise bounds. He finished with ninety-six-sided polygons, obtaining the estimates

$3 + \frac{10}{71} < \pi < 3 + \frac{1}{7}$.

This process clearly involves iteration, but is it right to call it an algorithm? Strictly speaking it is not: however many sides you take for your polygon, all you will get is an approximation to π, so the process is not finite. However, what we do have is an algorithm that will calculate π to any desired accuracy: for example, if you demand an approximation that is correct to ten decimal places, then after a finite number of steps the algorithm will give you one. What matters now is that the process converges. That is, it is important that the values that come out of the iteration get arbitrarily close to π. The geometric origin of the method can be used to prove that this is indeed the case, and in 1609 in Germany Ludolph van Ceulen obtained an approximation accurate to thirty-five decimal places using polygons with $2^{62}$ sides.
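The doubling process can be sketched in Python. This uses the modern doubling recurrences for the perimeters (a harmonic mean, then a geometric mean) for a circle of diameter 1, which is a reformulation and not Archimedes' own computation with rational estimates; it also borrows `math.sqrt(3)` for the starting hexagon, whereas Archimedes worked with rational bounds for $\sqrt{3}$. The names `archimedes_bounds` and `pi_bounds_within` are hypothetical. The second function shows the "any desired accuracy" version: a finite stopping rule based on the gap between the bounds.

```python
import math


def archimedes_bounds(doublings: int):
    """Perimeters of regular polygons inscribed in and circumscribed about a
    circle of diameter 1 (so they bound pi), starting from hexagons and doubling
    the number of sides `doublings` times. Returns (sides, lower, upper)."""
    sides = 6
    outer = 2 * math.sqrt(3)  # circumscribed hexagon: 6 * tan(pi/6)
    inner = 3.0               # inscribed hexagon: 6 * sin(pi/6)
    for _ in range(doublings):
        outer = 2 * outer * inner / (outer + inner)  # harmonic mean: circumscribed 2n-gon
        inner = math.sqrt(outer * inner)             # geometric mean: inscribed 2n-gon
        sides *= 2
    return sides, inner, outer


def pi_bounds_within(tol: float, max_doublings: int = 40):
    """Keep doubling until upper - lower < tol, a finite stopping rule."""
    for d in range(max_doublings + 1):
        sides, lower, upper = archimedes_bounds(d)
        if upper - lower < tol:
            return sides, lower, upper
    raise ArithmeticError("tolerance not reached within max_doublings")


print(archimedes_bounds(4))   # 96-gons; compare 3 + 10/71 < pi < 3 + 1/7
print(pi_bounds_within(1e-8))
```

In floating point the gap cannot shrink indefinitely, which is why the sketch caps the number of doublings instead of looping forever.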

Nevertheless, there is a clear difference between this algorithm for approximating π and Euclid’s algorithm for calculating the gcd of two positive integers. Algorithms like Euclid’s are often called discrete algorithms, and are contrasted with numerical algorithms, which are algorithms that are used to compute numbers that are not integers (see numerical analysis [IV.21]).

2.3 The Newton–Raphson Method: Recurrence Formulas

Figure 2 Approximation of π.

Around 1670, Newton [VI.14] devised a method for finding roots of equations, which he explained with reference to the example $x^3 - 2x - 5 = 0$. His explanation starts with the observation that the root $x$ is approximately equal to 2. He therefore writes $x = 2 + p$ and obtains an equation for $p$ by substituting $2 + p$ for $x$ in the original equation. This new equation works out to be $p^3 + 6p^2 + 10p - 1 = 0$. Because $x$ is close to 2, $p$ is small, so he then estimates $p$ by forgetting the terms $p^3$ and $6p^2$ (since, for small $p$, these should be considerably smaller than the terms $10p$ and $-1$). This gives him the equation $10p - 1 = 0$, or $p = 1/10$. Of course, this is not an exact solution, but it provides him with a new and better approximation, 2.1, for $x$. He then repeats the process, writing $x = 2.1 + q$, substituting to obtain an equation for $q$, solving this equation approximately, and refining his estimate still further. The estimate he obtains for $q$ is $-0.0054$, so the next approximation for $x$ is 2.0946.
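Newton's procedure of substituting and then discarding nonlinear terms can be sketched with exact fractions. The helper `taylor_shift` (our name) computes the coefficients of $f(a + p)$ by repeated synthetic division, and `newton_original_step` (also ours) keeps only the constant and linear terms. This sketch drops all nonlinear terms at every step; it gives $x_2 = 11761/5615 \approx 2.09457$, which agrees with Newton's 2.0946 to the four decimals he quotes.

```python
from fractions import Fraction


def taylor_shift(coeffs, a):
    """Given the coefficients of f(x) (highest degree first), return those of
    f(a + p) as a polynomial in p, using repeated synthetic division (Horner)."""
    c = [Fraction(x) for x in coeffs]
    n = len(c) - 1
    for i in range(n):
        for j in range(1, n - i + 1):
            c[j] += a * c[j - 1]
    return c


def newton_original_step(coeffs, a):
    """One step in Newton's original style: substitute x = a + p, discard every
    power of p above the first, solve the linear equation for p, return a + p."""
    shifted = taylor_shift(coeffs, a)
    constant, linear = shifted[-1], shifted[-2]
    return a - constant / linear


f = [1, 0, -2, -5]                           # x^3 - 2x - 5
print(taylor_shift(f, 2) == [1, 6, 10, -1])  # True: p^3 + 6p^2 + 10p - 1
x1 = newton_original_step(f, Fraction(2))    # 21/10, i.e. 2.1
x2 = newton_original_step(f, x1)             # 11761/5615, about 2.09457
print(x1, float(x2))
```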

How, though, can we be sure that this process really does converge to $x$? Let us examine the method more closely.

2.3.1 Tangents and Convergence

Newton’s method can be interpreted geometrically in terms of the graph of a function $f$, though Newton himself did not do so. A root $x$ of the equation $f(x) = 0$ corresponds to a point where the curve with equation $y = f(x)$ intersects the $x$-axis. If you start with an approximate value $a$ for $x$ and set $p = x - a$, as we did above, then when you substitute $a + p$ for $x$ to obtain a new function $g(p)$, you are effectively moving the origin from $(0, 0)$ to the point $(a, 0)$. Then when you forget all powers of $p$ other than the constant and linear terms, you are finding the best linear approximation to the function $g$, which, geometrically speaking, is the tangent line to $g$ at the point $(0, g(0))$. Thus, the approximate value you obtain for $p$ is the $x$-coordinate of the point where the tangent at $(0, g(0))$ crosses the $x$-axis. Adding $a$ to this value returns the origin to $(0, 0)$ and gives the new approximation to the root of $f$. This is why Newton’s method is often called the tangent method (figure 3). And one can now see that the new approximation will definitely be better than the old one if the tangent to $f$ at $(a, f(a))$ intersects the $x$-axis at a point that lies between $a$ and the point where the curve $y = f(x)$ intersects the $x$-axis.

Figure 3 Newton’s method.

As it happens, this is not the case for Newton’s choice of the value $a = 2$ above, but it is true for the approximate value 2.1 and for all subsequent ones. Geometrically, the favorable situation occurs if the point $(a, f(a))$ lies above the $x$-axis in a convex part of the curve that crosses the $x$-axis, or below the $x$-axis in a concave part of the curve that crosses the $x$-axis. Under these circumstances, and provided the root is not a multiple one, the convergence is quadratic, meaning that the error at each stage is roughly proportional to the square of the error at the previous stage; equivalently, once the approximation is close, the number of correct decimal places roughly doubles at each stage. This is enormously fast.
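The quadratic rate can be read off from Taylor's theorem. As a sketch, assume $f$ is twice continuously differentiable near a simple root $x$ (so $f'(x) \neq 0$), and write $a_{n+1} = a_n - f(a_n)/f'(a_n)$ for the point where the tangent at $(a_n, f(a_n))$ meets the $x$-axis:

```latex
\begin{aligned}
0 = f(x) &= f(a_n) + f'(a_n)\,(x - a_n) + \tfrac{1}{2} f''(\xi_n)\,(x - a_n)^2
  \quad \text{for some } \xi_n \text{ between } a_n \text{ and } x,\\
0 &= \frac{f(a_n)}{f'(a_n)} + (x - a_n) + \frac{f''(\xi_n)}{2 f'(a_n)}\,(x - a_n)^2
  \quad \text{(dividing by } f'(a_n) \neq 0\text{)},\\
a_{n+1} - x &= \frac{f''(\xi_n)}{2 f'(a_n)}\,(a_n - x)^2
  \quad \text{(using } a_{n+1} = a_n - f(a_n)/f'(a_n)\text{)}.
\end{aligned}
```

So once $a_n$ is close to $x$, the new error is roughly $|f''(x)/(2f'(x))|$ times the square of the old one. For Newton's example $x^3 - 2x - 5$, with root $x \approx 2.0946$, this factor is about 0.56.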

The choice of the initial approximation value is obviously important, and raises unexpectedly subtle questions. These are clearer if we look at complex polynomials and their complex roots. Newton’s method can be easily adapted to this more general context. Suppose that $z$ is a root of some complex polynomial and that $z_0$ is an initial approximation for $z$. Newton’s method then gives us a sequence $z_0, z_1, z_2, \ldots$, which may or may not converge to $z$. We define the domain of attraction, denoted $A(z)$, to be the set of all complex numbers $z_0$ such that the resulting sequence does indeed converge to $z$. How do we determine $A(z)$?

The first person to ask this question was Cayley [VI.46], in 1879. He noticed that the solution is easy for quadratic polynomials but difficult as soon as the degree is 3 or more. For example, the domains of attraction of the roots $\pm 1$ of the polynomial $z^2 - 1$ are the open half-planes bounded by the vertical axis, but the domains corresponding to the roots $1$, $\omega$, and $\omega^2$ of $z^3 - 1$ (where $\omega = e^{2\pi i/3}$) are extremely complicated sets. They were described by Julia in 1918; such subsets are now called fractal sets. Newton’s method and fractal sets are discussed further in dynamics [IV.14].
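A small experiment, using only the standard `cmath` module, shows how one can probe domains of attraction numerically: run Newton's method from a starting point and report which root, if any, the iterates reach. The function `newton_limit`, the tolerance, and the iteration cap are our own illustrative choices. Scanning a fine grid of starting points with such a function produces the familiar pictures of the fractal basins for $z^3 - 1$; for $z^2 - 1$, points on the imaginary axis never converge, since the iterates stay on that axis.

```python
import cmath


def newton_limit(z: complex, f, fprime, roots, max_iter: int = 100, tol: float = 1e-12):
    """Run Newton's method from z0 = z and return the index in `roots` of the root
    the iterates reach (within tol), or None if they do not settle within
    max_iter steps or the derivative vanishes."""
    for _ in range(max_iter):
        d = fprime(z)
        if d == 0:
            return None
        z = z - f(z) / d
        for k, r in enumerate(roots):
            if abs(z - r) < tol:
                return k
    return None


def quad(z):
    return z**2 - 1


def dquad(z):
    return 2 * z


def cubic(z):
    return z**3 - 1


def dcubic(z):
    return 3 * z**2


omega = cmath.exp(2j * cmath.pi / 3)      # primitive cube root of unity
quad_roots = [1, -1]                      # basins: Re z > 0 and Re z < 0
cubic_roots = [1, omega, omega**2]        # basins: fractal sets

print(newton_limit(0.5 + 2j, quad, dquad, quad_roots))     # 0, i.e. root 1
print(newton_limit(-0.3 + 1j, quad, dquad, quad_roots))    # 1, i.e. root -1
print(newton_limit(-0.5 + 0.9j, cubic, dcubic, cubic_roots))  # 1, i.e. omega
```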

2.3.2 Recurrence Formulas

At each stage of his method, Newton had to produce a new equation, but in 1690 Raphson noticed that this was not really necessary. For particular examples, he gave single formulas that could be used at each step, but his basic observation applies in general and leads to a general formula for every case, which one can easily obtain using the interpretation in terms of tangents. Indeed, the tangent to the curve $y = f(x)$ at the point with $x$-coordinate $a$ has the equation $y - f(a) = f'(a)(x - a)$, and it cuts the $x$-axis at the point with $x$-coordinate $a - f(a)/f'(a)$. What we now call the Newton–Raphson method springs from this simple formula. One starts with an initial approximation $a_0 = a$ and then defines successive approximations using the recurrence formula

$a_{n+1} = a_n - \dfrac{f(a_n)}{f'(a_n)}$.
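The recurrence formula translates directly into a short loop. This is a minimal sketch: the stopping rule (successive iterates closer than `tol`), the iteration cap, and the name `newton_raphson` are our own choices, and the derivative must be supplied by the caller.

```python
def newton_raphson(f, fprime, a0: float, tol: float = 1e-12, max_iter: int = 50) -> list[float]:
    """Iterate a_{n+1} = a_n - f(a_n) / f'(a_n), starting from a0.

    Stops when two successive iterates differ by less than tol and returns all
    iterates; raises if f'(a_n) = 0 or if max_iter steps are not enough."""
    iterates = [a0]
    a = a0
    for _ in range(max_iter):
        slope = fprime(a)
        if slope == 0:
            raise ZeroDivisionError("f'(a_n) = 0: the tangent is horizontal")
        a_next = a - f(a) / slope
        iterates.append(a_next)
        if abs(a_next - a) < tol:
            return iterates
        a = a_next
    raise RuntimeError("no convergence within max_iter steps")


# Newton's example x^3 - 2x - 5 = 0, started at a_0 = 2.
steps = newton_raphson(lambda x: x**3 - 2 * x - 5, lambda x: 3 * x**2 - 2, 2.0)
for n, a in enumerate(steps):
    print(n, a)
```

The first step reproduces Newton's 2.1, and the printed iterates settle on about 2.0945514815 within a handful of steps, in line with the quadratic convergence described above.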

As an example, let us consider the function $f(x) = x^2 - c$. Here, Newton’s method provides a sequence of approximations of the square root $\sqrt{c}$ of $c$, given by the recurrence formula $a_{n+1} = \frac{1}{2}(a_n + c/a_n)$ (which we obtain by substituting $x^2 - c$ for $f$ in the general formula above). This method for approximating square roots was known to Heron of Alexandria in the first century. Note that if $a_0$ is close to $\sqrt{c}$, then $c/a_0$ is also close, $\sqrt{c}$ lies between them, and $a_1 = \frac{1}{2}(a_0 + c/a_0)$ is their arithmetic mean.
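Heron's iteration is easy to run in exact arithmetic with Python's standard `fractions.Fraction`; the function name `heron_sqrt` is ours, and the sketch assumes $c > 0$ and $a_0 > 0$. Starting from $a_0 = 1$ for $c = 2$ gives the classical approximations $3/2$, $17/12$, and $577/408$, and at every step $\sqrt{2}$ lies between $a_n$ and $2/a_n$, because their product is exactly 2.

```python
from fractions import Fraction


def heron_sqrt(c, a0, steps: int) -> list[Fraction]:
    """Return a_0, ..., a_steps for a_{n+1} = (a_n + c/a_n)/2, in exact arithmetic.
    Assumes c > 0 and a0 > 0."""
    a = Fraction(a0)
    c = Fraction(c)
    approximations = [a]
    for _ in range(steps):
        a = (a + c / a) / 2   # arithmetic mean of a_n and c/a_n
        approximations.append(a)
    return approximations


for a in heron_sqrt(2, 1, 3):  # 1, 3/2, 17/12, 577/408
    # sqrt(2) always lies between a and 2/a, since their product is 2.
    print(a, float(a))
```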

3 Does an Algorithm Always Exist?

3.1 Hilbert’s Tenth Problem: The Need for Formalization

In 1900, at the Second International Congress of Mathematicians, Hilbert [VI.63] proposed a list of twenty-three problems. These problems, and Hilbert’s works in general, had a huge influence on mathematics during the twentieth century (Gray 2000). We are interested here in Hilbert’s tenth problem: given a Diophantine

In the document: The Princeton Companion to Mathematics (pp. 129-140)