The maximum-subarray problem

Third Edition

4.1 The maximum-subarray problem

Suppose that you have been offered the opportunity to invest in the Volatile Chemical Corporation. Like the chemicals the company produces, the stock price of the Volatile Chemical Corporation is rather volatile. You are allowed to buy one unit of stock only one time and then sell it at a later date, buying and selling after the close of trading for the day. To compensate for this restriction, you are allowed to learn what the price of the stock will be in the future. Your goal is to maximize your profit. Figure 4.1 shows the price of the stock over a 17-day period. You may buy the stock at any one time, starting after day 0, when the price is $100 per share. Of course, you would want to “buy low, sell high” (buy at the lowest possible price and later on sell at the highest possible price) to maximize your profit. Unfortunately, you might not be able to buy at the lowest price and then sell at the highest price within a given period. In Figure 4.1, the lowest price occurs after day 7, which comes after the highest price, after day 1.

You might think that you can always maximize profit by either buying at the lowest price or selling at the highest price. For example, in Figure 4.1, we would maximize profit by buying at the lowest price, after day 7. If this strategy always worked, then it would be easy to determine how to maximize profit: find the highest and lowest prices, and then work left from the highest price to find the lowest prior price, work right from the lowest price to find the highest later price, and take the pair with the greater difference. Figure 4.2 shows a simple counterexample,

[Figure 4.1 chart: price in dollars (vertical axis, 60 to 120) against day (horizontal axis, 0 to 16)]

Day      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Price  100 113 110  85 105 102  86  63  81 101  94 106 101  79  94  90  97
Change      13  -3 -25  20  -3 -16 -23  18  20  -7  12  -5 -22  15  -4   7

Figure 4.1 Information about the price of stock in the Volatile Chemical Corporation after the close of trading over a period of 17 days. The horizontal axis of the chart indicates the day, and the vertical axis shows the price. The bottom row of the table gives the change in price from the previous day.

[Figure 4.2 chart: price (vertical axis) against day (horizontal axis, 0 to 4)]

Figure 4.2 An example showing that the maximum profit does not always start at the lowest price or end at the highest price. Again, the horizontal axis indicates the day, and the vertical axis shows the price. Here, the maximum profit of $3 per share would be earned by buying after day 2 and selling after day 3. The price of $7 after day 2 is not the lowest price overall, and the price of $10 after day 3 is not the highest price overall.

demonstrating that the maximum profit sometimes comes neither by buying at the lowest price nor by selling at the highest price.
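To see the counterexample concretely, the sketch below codes up the lowest/highest strategy just described and compares it with an exhaustive check of every buy/sell pair. It is written in Python with 0-based day indices; the function names are hypothetical, and the five prices are an assumption chosen only to agree with the caption of Figure 4.2 ($7 after day 2, $10 after day 3, neither being the overall low or the overall high).

```python
def extremes_heuristic(prices):
    """The flawed strategy: pair the overall low with the highest later price,
    pair the overall high with the lowest earlier price, keep the better pair.
    Returns the profit, or None if neither pairing has a valid partner day."""
    lo = prices.index(min(prices))  # day of the lowest price
    hi = prices.index(max(prices))  # day of the highest price
    candidates = []
    if lo + 1 < len(prices):  # there is a later day on which to sell
        candidates.append(max(prices[lo + 1:]) - prices[lo])
    if hi > 0:  # there is an earlier day on which to buy
        candidates.append(prices[hi] - min(prices[:hi]))
    return max(candidates) if candidates else None


def best_profit(prices):
    """Exact answer: check every buy day that precedes a sell day."""
    return max(prices[s] - prices[b]
               for b in range(len(prices))
               for s in range(b + 1, len(prices)))


# Hypothetical prices after days 0..4, consistent with the caption of Figure 4.2.
prices = [10, 11, 7, 10, 6]
print(extremes_heuristic(prices))  # 1
print(best_profit(prices))         # 3
```

Here the lowest price comes last, so it has no later selling day, and pairing the highest price with the lowest earlier one yields only $1. On the Figure 4.1 prices, by contrast, both functions give 43, which is why that example alone does not expose the flaw.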

A brute-force solution

We can easily devise a brute-force solution to this problem: just try every possible pair of buy and sell dates in which the buy date precedes the sell date. A period of n days has \(\binom{n}{2}\) such pairs of dates. Since \(\binom{n}{2}\) is \(\Theta(n^2)\), and the best we can hope for is to evaluate each pair of dates in constant time, this approach would take \(\Omega(n^2)\) time. Can we do better?
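A minimal Python sketch of this brute-force approach, assuming the prices are given as a list indexed by day (0-based) and that there are at least two days; the name brute_force_trade is illustrative only. The two nested loops visit each of the \(\binom{n}{2}\) pairs exactly once and do constant work per pair.

```python
def brute_force_trade(prices):
    """Try every pair buy_day < sell_day; return (buy_day, sell_day, profit).

    Examines n(n-1)/2 pairs, so it runs in Theta(n^2) time.
    Returns None if there are fewer than two days.
    """
    best = None
    n = len(prices)
    for buy in range(n):
        for sell in range(buy + 1, n):
            profit = prices[sell] - prices[buy]
            if best is None or profit > best[2]:
                best = (buy, sell, profit)
    return best


# Prices after days 0..16, from Figure 4.1.
prices = [100, 113, 110, 85, 105, 102, 86, 63, 81, 101, 94, 106, 101, 79, 94, 90, 97]
print(brute_force_trade(prices))  # (7, 11, 43)
```

With the prices of Figure 4.1, it reports buying after day 7 and selling after day 11, for a profit of $43 per share.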

A transformation

In order to design an algorithm with an \(o(n^2)\) running time, we will look at the input in a slightly different way. We want to find a sequence of days over which the net change from the first day to the last is maximum. Instead of looking at the daily prices, let us instead consider the daily change in price, where the change on day \(i\) is the difference between the prices after day \(i-1\) and after day \(i\). The table in Figure 4.1 shows these daily changes in the bottom row. If we treat this row as an array A, shown in Figure 4.3, we now want to find the nonempty, contiguous subarray of A whose values have the largest sum. We call this contiguous subarray the maximum subarray. For example, in the array of Figure 4.3, the maximum subarray of A[1..16] is A[8..11], with the sum 43. Thus, you would want to buy the stock just before day 8 (that is, after day 7) and sell it after day 11, earning a profit of $43 per share.
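The transformation itself is a single pass over the prices. In this Python sketch (the name daily_changes is hypothetical), lists are 0-based, so the text's 1-based A[8..11] corresponds to the slice A[7:11]. The last line illustrates why the transformation is valid: the sum of the changes over a run of days telescopes to the selling price minus the buying price.

```python
def daily_changes(prices):
    """Change on day i (for i >= 1) is prices[i] - prices[i - 1]."""
    return [prices[i] - prices[i - 1] for i in range(1, len(prices))]


# Prices after days 0..16, from Figure 4.1.
prices = [100, 113, 110, 85, 105, 102, 86, 63, 81, 101, 94, 106, 101, 79, 94, 90, 97]
A = daily_changes(prices)
print(A)
# [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]

# 1-based A[8..11] is the 0-based slice A[7:11].
print(sum(A[7:11]))              # 43
# Buying after day 7 and selling after day 11 gives the same amount.
print(prices[11] - prices[7])    # 43
```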

At first glance, this transformation does not help. We still need to check \(\binom{n-1}{2} = \Theta(n^2)\) subarrays for a period of n days. Exercise 4.1-2 asks you to show

Figure 4.3 The change in stock prices as a maximum-subarray problem. Here, the subarray A[8..11], with sum 43, has the greatest sum of any contiguous subarray of array A.

that although computing the cost of one subarray might take time proportional to the length of the subarray, when computing all \(\Theta(n^2)\) subarray sums, we can organize the computation so that each subarray sum takes \(O(1)\) time, given the values of previously computed subarray sums, so that the brute-force solution takes \(\Theta(n^2)\) time.
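One way to organize the \(\Theta(n^2)\) computation is to fix a starting index and grow the subarray one element at a time, so that each new sum comes from the previous one with a single addition. The following Python version is a sketch under that assumption, with 0-based, inclusive indices and a hypothetical name; it is one possible approach, not the book's solution to Exercise 4.1-2.

```python
def max_subarray_quadratic(A):
    """Return (i, j, total) for a maximum subarray A[i..j] (0-based, inclusive).

    For each start i, the sum of A[i..j] is obtained from the sum of
    A[i..j-1] in O(1) time, so all Theta(n^2) sums cost Theta(n^2) overall.
    Assumes A is nonempty.
    """
    best = (0, 0, A[0])
    for i in range(len(A)):
        running = 0
        for j in range(i, len(A)):
            running += A[j]  # sum of A[i..j], reusing the sum of A[i..j-1]
            if running > best[2]:
                best = (i, j, running)
    return best


# The array A of Figure 4.3, stored 0-based.
A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(max_subarray_quadratic(A))  # (7, 10, 43), i.e. A[8..11] in 1-based terms
```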

So let us seek a more efficient solution to the maximum-subarray problem. When doing so, we will usually speak of “a” maximum subarray rather than “the” maximum subarray, since there could be more than one subarray that achieves the maximum sum.

The maximum-subarray problem is interesting only when the array contains some negative numbers. If all the array entries were nonnegative, then the maximum-subarray problem would present no challenge, since the entire array would give the greatest sum.

A solution using divide-and-conquer

Let’s think about how we might solve the maximum-subarray problem using the divide-and-conquer technique. Suppose we want to find a maximum subarray of the subarray A[low..high]. Divide-and-conquer suggests that we divide the subarray into two subarrays of as equal size as possible. That is, we find the midpoint, say mid, of the subarray, and consider the subarrays A[low..mid] and A[mid+1..high]. As Figure 4.4(a) shows, any contiguous subarray A[i..j] of A[low..high] must lie in exactly one of the following places:

entirely in the subarray A[low..mid], so that \(low \le i \le j \le mid\),

entirely in the subarray A[mid+1..high], so that \(mid < i \le j \le high\), or

crossing the midpoint, so that \(low \le i \le mid < j \le high\).
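The claim that every subarray lies in exactly one of these three places can be spot-checked mechanically. This Python sketch (hypothetical function names) evaluates the three conditions independently for every pair \(i \le j\) in a given range and asserts that exactly one holds; it illustrates the claim for particular ranges rather than proving it.

```python
def placements(i, j, low, mid, high):
    """Evaluate the three conditions from the text independently."""
    in_left = low <= i <= j <= mid
    in_right = mid < i <= j <= high
    crossing = low <= i <= mid < j <= high
    return in_left, in_right, crossing


def check_partition(low, high):
    """Count subarrays A[i..j] of A[low..high] in each place.

    Fails if some subarray satisfies zero conditions or more than one.
    Returns [left_count, right_count, crossing_count].
    """
    mid = (low + high) // 2
    counts = [0, 0, 0]
    for i in range(low, high + 1):
        for j in range(i, high + 1):
            flags = placements(i, j, low, mid, high)
            assert sum(flags) == 1, (i, j)
            counts[flags.index(True)] += 1
    return counts


print(check_partition(1, 8))  # [10, 10, 16]
```

For \(low = 1\) and \(high = 8\) (so \(mid = 4\)), the counts add up to 36, the total number of pairs \(i \le j\) among 8 indices.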

Therefore, a maximum subarray of A[low..high] must lie in exactly one of these places. In fact, a maximum subarray of A[low..high] must have the greatest sum over all subarrays entirely in A[low..mid], entirely in A[mid+1..high], or crossing the midpoint. We can find maximum subarrays of A[low..mid] and A[mid+1..high] recursively, because these two subproblems are smaller instances of the problem of finding a maximum subarray. Thus, all that is left to do is find a


Figure 4.4 (a) Possible locations of subarrays of A[low..high]: entirely in A[low..mid], entirely in A[mid+1..high], or crossing the midpoint mid. (b) Any subarray of A[low..high] crossing the midpoint comprises two subarrays A[i..mid] and A[mid+1..j], where \(low \le i \le mid\) and \(mid < j \le high\).

maximum subarray that crosses the midpoint, and take a subarray with the largest sum of the three.

We can easily find a maximum subarray crossing the midpoint in time linear in the size of the subarray A[low..high]. This problem is not a smaller instance of our original problem, because it has the added restriction that the subarray it chooses must cross the midpoint. As Figure 4.4(b) shows, any subarray crossing the midpoint is itself made of two subarrays A[i..mid] and A[mid+1..j], where \(low \le i \le mid\) and \(mid < j \le high\). Therefore, we just need to find maximum subarrays of the form A[i..mid] and A[mid+1..j] and then combine them. The procedure FIND-MAX-CROSSING-SUBARRAY takes as input the array A and the indices low, mid, and high, and it returns a tuple containing the indices demarcating a maximum subarray that crosses the midpoint, along with the sum of the values in a maximum subarray.

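FIND-MAX-CROSSING-SUBARRAY(A, low, mid, high)
1  left-sum = −∞
2  sum = 0
3  for i = mid downto low
4      sum = sum + A[i]
5      if sum > left-sum
6          left-sum = sum
7          max-left = i
8  right-sum = −∞
9  sum = 0
10 for j = mid + 1 to high
11     sum = sum + A[j]
12     if sum > right-sum
13         right-sum = sum
14         max-right = j
15 return (max-left, max-right, left-sum + right-sum)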

This procedure works as follows. Lines 1–7 find a maximum subarray of the left half, A[low..mid]. Since this subarray must contain A[mid], the for loop of lines 3–7 starts the index i at mid and works down to low, so that every subarray it considers is of the form A[i..mid]. Lines 1–2 initialize the variables left-sum, which holds the greatest sum found so far, and sum, holding the sum of the entries in A[i..mid]. Whenever we find, in line 5, a subarray A[i..mid] with a sum of values greater than left-sum, we update left-sum to this subarray’s sum in line 6, and in line 7 we update the variable max-left to record this index i. Lines 8–14 work analogously for the right half, A[mid+1..high]. Here, the for loop of lines 10–14 starts the index j at mid+1 and works up to high, so that every subarray it considers is of the form A[mid+1..j]. Finally, line 15 returns the indices max-left and max-right that demarcate a maximum subarray crossing the midpoint, along with the sum left-sum + right-sum of the values in the subarray A[max-left..max-right].
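A direct Python rendering of FIND-MAX-CROSSING-SUBARRAY may help connect the line numbers above to running code. It assumes 0-based, inclusive indices with \(low \le mid < high\), uses math.inf for \(-\infty\), renames the pseudocode's sum to running (to avoid shadowing Python's built-in), and initializes max_left and max_right up front as a defensive touch that the pseudocode does not need, since each loop runs at least once.

```python
import math


def find_max_crossing_subarray(A, low, mid, high):
    """Python rendering of FIND-MAX-CROSSING-SUBARRAY (0-based, inclusive indices).

    Returns (max_left, max_right, sum) for a maximum subarray A[max_left..max_right]
    with low <= max_left <= mid < max_right <= high. Assumes low <= mid < high.
    """
    # Lines 1-7: best sum among subarrays A[i..mid], with i moving down from mid.
    left_sum = -math.inf
    running = 0
    max_left = mid
    for i in range(mid, low - 1, -1):
        running += A[i]
        if running > left_sum:
            left_sum = running
            max_left = i
    # Lines 8-14: best sum among subarrays A[mid+1..j], with j moving up from mid+1.
    right_sum = -math.inf
    running = 0
    max_right = mid + 1
    for j in range(mid + 1, high + 1):
        running += A[j]
        if running > right_sum:
            right_sum = running
            max_right = j
    # Line 15: the two halves together form the crossing subarray.
    return max_left, max_right, left_sum + right_sum


# The array A of Figure 4.3 (daily changes from Figure 4.1), stored 0-based.
A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(find_max_crossing_subarray(A, 0, 7, 15))  # (7, 10, 43)
```

On this array with the top-level midpoint, the maximum crossing subarray happens to be the overall maximum subarray; in general it need not be.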

If the subarray A[low..high] contains n entries (so that \(n = high - low + 1\)), we claim that the call FIND-MAX-CROSSING-SUBARRAY(A, low, mid, high) takes \(\Theta(n)\) time. Since each iteration of each of the two for loops takes \(\Theta(1)\) time, we just need to count up how many iterations there are altogether. The for loop of lines 3–7 makes \(mid - low + 1\) iterations, and the for loop of lines 10–14 makes \(high - mid\) iterations, and so the total number of iterations is

\[ (mid - low + 1) + (high - mid) = high - low + 1 = n . \]
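For a concrete check of this count, take the 16-element array of Figure 4.3 with \(low = 1\) and \(high = 16\), and suppose, as an illustration, that the midpoint is \(mid = \lfloor (1 + 16)/2 \rfloor = 8\). The left loop then runs 8 times and the right loop 8 times:

```latex
\[
(\mathit{mid} - \mathit{low} + 1) + (\mathit{high} - \mathit{mid})
  = (8 - 1 + 1) + (16 - 8) = 8 + 8 = 16 = n .
\]
```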

With a linear-time FIND-MAX-CROSSING-SUBARRAY procedure in hand, we can write pseudocode for a divide-and-conquer algorithm to solve the maximum-subarray problem:

FIND-MAXIMUM-SUBARRAY(A, low, high)
1  if high == low
2      return (low, high, A[low])    // base case: only one element
3  else mid = ⌊(low + high)/2⌋
4      (left-low, left-high, left-sum) =
           FIND-MAXIMUM-SUBARRAY(A, low, mid)
5      (right-low, right-high, right-sum) =
           FIND-MAXIMUM-SUBARRAY(A, mid + 1, high)
6      (cross-low, cross-high, cross-sum) =
           FIND-MAX-CROSSING-SUBARRAY(A, low, mid, high)
7      if left-sum ≥ right-sum and left-sum ≥ cross-sum
8          return (left-low, left-high, left-sum)
9      elseif right-sum ≥ left-sum and right-sum ≥ cross-sum
10         return (right-low, right-high, right-sum)
11     else return (cross-low, cross-high, cross-sum)

The initial call FIND-MAXIMUM-SUBARRAY(A, 1, A.length) will find a maximum subarray of A[1..n].

Similar to FIND-MAX-CROSSING-SUBARRAY, the recursive procedure FIND-MAXIMUM-SUBARRAY returns a tuple containing the indices that demarcate a maximum subarray, along with the sum of the values in a maximum subarray.

Line 1 tests for the base case, where the subarray has just one element. A subarray with just one element has only one subarray (itself), and so line 2 returns a tuple with the starting and ending indices of just the one element, along with its value. Lines 3–11 handle the recursive case. Line 3 does the divide part, computing the index mid of the midpoint. Let’s refer to the subarray A[low..mid] as the left subarray and to A[mid+1..high] as the right subarray. Because we know that the subarray A[low..high] contains at least two elements, each of the left and right subarrays must have at least one element. Lines 4 and 5 conquer by recursively finding maximum subarrays within the left and right subarrays, respectively.

Lines 6–11 form the combine part. Line 6 finds a maximum subarray that crosses the midpoint. (Recall that because line 6 solves a subproblem that is not a smaller instance of the original problem, we consider it to be in the combine part.) Line 7 tests whether the left subarray contains a subarray with the maximum sum, and line 8 returns that maximum subarray. Otherwise, line 9 tests whether the right subarray contains a subarray with the maximum sum, and line 10 returns that maximum subarray. If neither the left nor the right subarray contains a subarray achieving the maximum sum, then a maximum subarray must cross the midpoint, and line 11 returns it.
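Putting the pieces together, here is a sketch of FIND-MAXIMUM-SUBARRAY in Python. It assumes 0-based, inclusive indices, so the initial call is find_maximum_subarray(A, 0, len(A) - 1) rather than FIND-MAXIMUM-SUBARRAY(A, 1, A.length); it repeats a compact version of the crossing helper so that the block runs on its own, and the comments map each step back to the pseudocode's line numbers.

```python
import math


def find_max_crossing_subarray(A, low, mid, high):
    """Maximum subarray A[i..j] with low <= i <= mid < j <= high (0-based)."""
    left_sum, running, max_left = -math.inf, 0, mid
    for i in range(mid, low - 1, -1):  # i = mid downto low
        running += A[i]
        if running > left_sum:
            left_sum, max_left = running, i
    right_sum, running, max_right = -math.inf, 0, mid + 1
    for j in range(mid + 1, high + 1):  # j = mid + 1 to high
        running += A[j]
        if running > right_sum:
            right_sum, max_right = running, j
    return max_left, max_right, left_sum + right_sum


def find_maximum_subarray(A, low, high):
    """Return (start, end, sum) for a maximum subarray of A[low..high] (0-based, inclusive)."""
    if high == low:                                        # line 1
        return low, high, A[low]                           # line 2: base case
    mid = (low + high) // 2                                # line 3: divide
    left = find_maximum_subarray(A, low, mid)              # line 4: conquer
    right = find_maximum_subarray(A, mid + 1, high)        # line 5: conquer
    cross = find_max_crossing_subarray(A, low, mid, high)  # line 6: combine
    if left[2] >= right[2] and left[2] >= cross[2]:        # lines 7-8
        return left
    if right[2] >= left[2] and right[2] >= cross[2]:       # lines 9-10
        return right
    return cross                                           # line 11


# The array A of Figure 4.3, stored 0-based.
A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(find_maximum_subarray(A, 0, len(A) - 1))  # (7, 10, 43), i.e. A[8..11] in 1-based terms
```

Because the tests in lines 7 and 9 use ≥, ties are broken in favor of the left subarray, then the right one, and only then the crossing one.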

Analyzing the divide-and-conquer algorithm

Next we set up a recurrence that describes the running time of the recursive FIND-MAXIMUM-SUBARRAY procedure. As we did when we analyzed merge sort in Section 2.3.2, we make the simplifying assumption that the original problem size is a power of 2, so that all subproblem sizes are integers. We denote by \(T(n)\) the running time of FIND-MAXIMUM-SUBARRAY on a subarray of n elements. For starters, line 1 takes constant time. The base case, when \(n = 1\), is easy: line 2 takes constant time, and so

\[ T(1) = \Theta(1) . \tag{4.5} \]

The recursive case occurs when \(n > 1\). Lines 1 and 3 take constant time. Each of the subproblems solved in lines 4 and 5 is on a subarray of \(n/2\) elements (our assumption that the original problem size is a power of 2 ensures that \(n/2\) is an integer), and so we spend \(T(n/2)\) time solving each of them. Because we have to solve two subproblems (for the left subarray and for the right subarray), the contribution to the running time from lines 4 and 5 comes to \(2T(n/2)\). As we have already seen, the call to FIND-MAX-CROSSING-SUBARRAY in line 6 takes \(\Theta(n)\) time. Lines 7–11 take only \(\Theta(1)\) time. For the recursive case, therefore, we have

\[
\begin{aligned}
T(n) &= \Theta(1) + 2T(n/2) + \Theta(n) + \Theta(1) \\
     &= 2T(n/2) + \Theta(n) .
\end{aligned}
\tag{4.6}
\]

Combining equations (4.5) and (4.6) gives us a recurrence for the running time \(T(n)\) of FIND-MAXIMUM-SUBARRAY:

\[
T(n) =
\begin{cases}
\Theta(1) & \text{if } n = 1 , \\
2T(n/2) + \Theta(n) & \text{if } n > 1 .
\end{cases}
\tag{4.7}
\]

This recurrence is the same as recurrence (4.1) for merge sort. As we shall see from the master method in Section 4.5, this recurrence has the solution \(T(n) = \Theta(n \lg n)\). You might also revisit the recursion tree in Figure 2.5 to understand why the solution should be \(T(n) = \Theta(n \lg n)\).
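For intuition before Section 4.5, the recurrence can also be unrolled directly. This derivation is a sketch that assumes n is a power of 2 and, for simplicity, replaces \(\Theta(1)\) with a constant c and \(\Theta(n)\) with cn for the same hypothetical constant \(c > 0\). Each line substitutes the recurrence into the previous one, and after \(k = \lg n\) levels every subproblem has size 1:

```latex
\[
\begin{aligned}
T(n) &= 2T(n/2) + cn \\
     &= 4T(n/4) + 2cn \\
     &\;\;\vdots \\
     &= 2^k\,T(n/2^k) + kcn \\
     &= n\,T(1) + cn\lg n \qquad (k = \lg n) \\
     &= cn\lg n + cn = \Theta(n \lg n) .
\end{aligned}
\]
```

Each of the \(\lg n\) levels contributes cn, matching the per-level cost in the recursion tree of Figure 2.5, and the \(n\) leaves contribute the final cn.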

Thus, we see that the divide-and-conquer method yields an algorithm that is asymptotically faster than the brute-force method. With merge sort and now the maximum-subarray problem, we begin to get an idea of how powerful the divide-and-conquer method can be. Sometimes it will yield the asymptotically fastest algorithm for a problem, and other times we can do even better. As Exercise 4.1-5 shows, there is in fact a linear-time algorithm for the maximum-subarray problem, and it does not use divide-and-conquer.

Exercises

4.1-1

What does FIND-MAXIMUM-SUBARRAY return when all elements of A are negative?

4.1-2

Write pseudocode for the brute-force method of solving the maximum-subarray problem. Your procedure should run in \(\Theta(n^2)\) time.

4.1-3

Implement both the brute-force and recursive algorithms for the maximum-subarray problem on your own computer. What problem size \(n_0\) gives the crossover point at which the recursive algorithm beats the brute-force algorithm? Then, change the base case of the recursive algorithm to use the brute-force algorithm whenever the problem size is less than \(n_0\). Does that change the crossover point?

4.1-4

Suppose we change the definition of the maximum-subarray problem to allow the result to be an empty subarray, where the sum of the values of an empty subarray is 0. How would you change any of the algorithms that do not allow empty subarrays to permit an empty subarray to be the result?

4.1-5

Use the following ideas to develop a nonrecursive, linear-time algorithm for the maximum-subarray problem. Start at the left end of the array, and progress toward the right, keeping track of the maximum subarray seen so far. Knowing a maximum subarray of A[1..j], extend the answer to find a maximum subarray ending at index j+1 by using the following observation: a maximum subarray of A[1..j+1] is either a maximum subarray of A[1..j] or a subarray A[i..j+1], for some \(1 \le i \le j+1\). Determine a maximum subarray of the form A[i..j+1] in constant time based on knowing a maximum subarray ending at index j.

In the document Introduction to Algorithms (pages 89–96)