
4.2 Strassen’s algorithm for matrix multiplication

If you have seen matrices before, then you probably know how to multiply them.

(Otherwise, you should read Section D.1 in Appendix D.) If A = (a_ij) and B = (b_ij) are square n × n matrices, then in the product C = A·B, we define the entry c_ij, for i, j = 1, 2, ..., n, by

c_ij = Σ_{k=1}^{n} a_ik·b_kj .                                        (4.8)

We must compute n^2 matrix entries, and each is the sum of n values. The following procedure takes n × n matrices A and B and multiplies them, returning their n × n product C. We assume that each matrix has an attribute rows, giving the number of rows in the matrix.

SQUARE-MATRIX-MULTIPLY(A, B)
1  n = A.rows
2  let C be a new n × n matrix
3  for i = 1 to n
4      for j = 1 to n
5          c_ij = 0
6          for k = 1 to n
7              c_ij = c_ij + a_ik·b_kj
8  return C

The SQUARE-MATRIX-MULTIPLY procedure works as follows. The for loop of lines 3–7 computes the entries of each row i, and within a given row i, the for loop of lines 4–7 computes each of the entries c_ij, for each column j. Line 5 initializes c_ij to 0 as we start computing the sum given in equation (4.8), and each iteration of the for loop of lines 6–7 adds in one more term of equation (4.8).

Because each of the triply nested for loops runs exactly n iterations, and each execution of line 7 takes constant time, the SQUARE-MATRIX-MULTIPLY procedure takes Θ(n^3) time.
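For readers who like to run the code, here is a minimal Python sketch of the same procedure (the function name and the small test matrices are our own, not part of the text). It mirrors the three nested loops of lines 3–7, so it likewise takes Θ(n^3) time.

def square_matrix_multiply(A, B):
    """Multiply two n x n matrices given as lists of lists; Theta(n^3) time."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):                      # for loop of lines 3-7
        for j in range(n):                  # for loop of lines 4-7
            s = 0                           # line 5: initialize the sum
            for k in range(n):              # for loop of lines 6-7
                s += A[i][k] * B[k][j]      # line 7: add one term of (4.8)
            C[i][j] = s
    return C

print(square_matrix_multiply([[1, 3], [7, 5]], [[6, 8], [4, 2]]))
# prints [[18, 14], [62, 66]]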

You might at first think that any matrix multiplication algorithm must take Ω(n^3) time, since the natural definition of matrix multiplication requires that many multiplications. You would be incorrect, however: we have a way to multiply matrices in o(n^3) time. In this section, we shall see Strassen's remarkable recursive algorithm for multiplying n × n matrices. It runs in Θ(n^{lg 7}) time, which we shall show in Section 4.5. Since lg 7 lies between 2.80 and 2.81, Strassen's algorithm runs in O(n^{2.81}) time, which is asymptotically better than the simple SQUARE-MATRIX-MULTIPLY procedure.

A simple divide-and-conquer algorithm

To keep things simple, when we use a divide-and-conquer algorithm to compute the matrix product C = A·B, we assume that n is an exact power of 2 in each of the n × n matrices. We make this assumption because in each divide step, we will divide n × n matrices into four n/2 × n/2 matrices, and by assuming that n is an exact power of 2, we are guaranteed that as long as n ≥ 2, the dimension n/2 is an integer.

Suppose that we partition each of A, B, and C into four n/2 × n/2 matrices

A = [ A11  A12 ]    B = [ B11  B12 ]    C = [ C11  C12 ]
    [ A21  A22 ] ,      [ B21  B22 ] ,      [ C21  C22 ] ,            (4.9)

so that we rewrite the equation C = A·B as

[ C11  C12 ]   [ A11  A12 ]   [ B11  B12 ]
[ C21  C22 ] = [ A21  A22 ] · [ B21  B22 ] .                          (4.10)

Equation (4.10) corresponds to the four equations

C11 = A11·B11 + A12·B21 ,                                             (4.11)
C12 = A11·B12 + A12·B22 ,                                             (4.12)
C21 = A21·B11 + A22·B21 ,                                             (4.13)
C22 = A21·B12 + A22·B22 .                                             (4.14)

Each of these four equations specifies two multiplications of n/2 × n/2 matrices and the addition of their n/2 × n/2 products. We can use these equations to create a straightforward, recursive, divide-and-conquer algorithm:

SQUARE-MATRIX-MULTIPLY-RECURSIVE(A, B)
 1  n = A.rows
 2  let C be a new n × n matrix
 3  if n == 1
 4      c_11 = a_11·b_11
 5  else partition A, B, and C as in equations (4.9)
 6      C11 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A11, B11)
                + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A12, B21)
 7      C12 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A11, B12)
                + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A12, B22)
 8      C21 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A21, B11)
                + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A22, B21)
 9      C22 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A21, B12)
                + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A22, B22)
10  return C

This pseudocode glosses over one subtle but important implementation detail.

How do we partition the matrices in line 5? If we were to create 12 new n/2 × n/2 matrices, we would spend Θ(n^2) time copying entries. In fact, we can partition the matrices without copying entries. The trick is to use index calculations. We identify a submatrix by a range of row indices and a range of column indices of the original matrix. We end up representing a submatrix a little differently from how we represent the original matrix, which is the subtlety we are glossing over.

The advantage is that, since we can specify submatrices by index calculations, executing line 5 takes only Θ(1) time (although we shall see that it makes no difference asymptotically to the overall running time whether we copy or partition in place).
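To make the index-calculation idea concrete, here is a hedged Python/NumPy sketch of the recursive procedure (the function name is ours; we assume n is an exact power of 2). NumPy slices such as A[:h, :h] are views into the original array rather than copies, so the partitioning of line 5 really does take Θ(1) time; copying the blocks instead would add Θ(n^2) work per call which, as noted above, would not change the asymptotic running time.

import numpy as np

def square_matrix_multiply_recursive(A, B):
    """Divide-and-conquer product of two n x n NumPy arrays, n a power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B                        # base case: one scalar multiplication
    h = n // 2
    # Partition by index calculation: these slices are views, nothing is copied.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    C = np.empty_like(A)
    C[:h, :h] = (square_matrix_multiply_recursive(A11, B11)
                 + square_matrix_multiply_recursive(A12, B21))
    C[:h, h:] = (square_matrix_multiply_recursive(A11, B12)
                 + square_matrix_multiply_recursive(A12, B22))
    C[h:, :h] = (square_matrix_multiply_recursive(A21, B11)
                 + square_matrix_multiply_recursive(A22, B21))
    C[h:, h:] = (square_matrix_multiply_recursive(A21, B12)
                 + square_matrix_multiply_recursive(A22, B22))
    return C

# Quick check on random 4 x 4 integer matrices.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, (4, 4))
Y = rng.integers(0, 10, (4, 4))
assert np.array_equal(square_matrix_multiply_recursive(X, Y), X @ Y)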

Now, we derive a recurrence to characterize the running time of SQUARE-MATRIX-MULTIPLY-RECURSIVE. Let T(n) be the time to multiply two n × n matrices using this procedure. In the base case, when n = 1, we perform just the one scalar multiplication in line 4, and so

T(1) = Θ(1) .                                                         (4.15)

The recursive case occurs when n > 1. As discussed, partitioning the matrices in line 5 takes Θ(1) time, using index calculations. In lines 6–9, we recursively call SQUARE-MATRIX-MULTIPLY-RECURSIVE a total of eight times. Because each recursive call multiplies two n/2 × n/2 matrices, thereby contributing T(n/2) to the overall running time, the time taken by all eight recursive calls is 8T(n/2). We also must account for the four matrix additions in lines 6–9. Each of these matrices contains n^2/4 entries, and so each of the four matrix additions takes Θ(n^2) time.

Since the number of matrix additions is a constant, the total time spent adding matrices in lines 6–9 is Θ(n^2). (Again, we use index calculations to place the results of the matrix additions into the correct positions of matrix C, with an overhead of Θ(1) time per entry.) The total time for the recursive case, therefore, is the sum of the partitioning time, the time for all the recursive calls, and the time to add the matrices resulting from the recursive calls:

T(n) = Θ(1) + 8T(n/2) + Θ(n^2)
     = 8T(n/2) + Θ(n^2) .                                             (4.16)

Notice that if we implemented partitioning by copying matrices, which would cost Θ(n^2) time, the recurrence would not change, and hence the overall running time would increase by only a constant factor.

Combining equations (4.15) and (4.16) gives us the recurrence for the running time of SQUARE-MATRIX-MULTIPLY-RECURSIVE:

T(n) = { Θ(1)                 if n = 1 ,
       { 8T(n/2) + Θ(n^2)     if n > 1 .                              (4.17)

As we shall see from the master method in Section 4.5, recurrence (4.17) has the solution T(n) = Θ(n^3). Thus, this simple divide-and-conquer approach is no faster than the straightforward SQUARE-MATRIX-MULTIPLY procedure.
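(As a preview of that argument: recurrence (4.17) has a = 8 subproblems, each of size n/2, and combining cost f(n) = Θ(n^2). Since n^{log_2 8} = n^3 grows polynomially faster than f(n), case 1 of the master theorem applies and gives T(n) = Θ(n^3), matching the claim above.)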

Before we continue on to examining Strassen's algorithm, let us review where the components of equation (4.16) came from. Partitioning each n × n matrix by index calculation takes Θ(1) time, but we have two matrices to partition. Although you could say that partitioning the two matrices takes Θ(2) time, the constant of 2 is subsumed by the Θ-notation. Adding two matrices, each with, say, k entries, takes Θ(k) time. Since the matrices we add each have n^2/4 entries, you could say that adding each pair takes Θ(n^2/4) time. Again, however, the Θ-notation subsumes the constant factor of 1/4, and we say that adding two matrices with n^2/4 entries each takes Θ(n^2) time. We have four such matrix additions, and once again, instead of saying that they take Θ(4n^2) time, we say that they take Θ(n^2) time. (Of course, you might observe that we could say that the four matrix additions take Θ(4n^2/4) time, and that 4n^2/4 = n^2, but the point here is that Θ-notation subsumes constant factors, whatever they are.) Thus, we end up with two terms of Θ(n^2), which we can combine into one.

When we account for the eight recursive calls, however, we cannot just subsume the constant factor of 8. In other words, we must say that together they take 8T(n/2) time, rather than just T(n/2) time. You can get a feel for why by looking back at the recursion tree in Figure 2.5, for recurrence (2.1) (which is identical to recurrence (4.7)), with the recursive case T(n) = 2T(n/2) + Θ(n). The factor of 2 determined how many children each tree node had, which in turn determined how many terms contributed to the sum at each level of the tree. If we were to ignore the factor of 8 in equation (4.16) or the factor of 2 in recurrence (4.1), the recursion tree would just be linear, rather than "bushy," and each level would contribute only one term to the sum.

Bear in mind, therefore, that although asymptotic notation subsumes constant multiplicative factors, recursive notation such as T(n/2) does not.

Strassen’s method

The key to Strassen's method is to make the recursion tree slightly less bushy. That is, instead of performing eight recursive multiplications of n/2 × n/2 matrices, it performs only seven. The cost of eliminating one matrix multiplication will be several new additions of n/2 × n/2 matrices, but still only a constant number of additions. As before, the constant number of matrix additions will be subsumed by Θ-notation when we set up the recurrence equation to characterize the running time.

Strassen's method is not at all obvious. (This might be the biggest understatement in this book.) It has four steps:

1. Divide the input matrices A and B and output matrix C into n/2 × n/2 submatrices, as in equation (4.9). This step takes Θ(1) time by index calculation, just as in SQUARE-MATRIX-MULTIPLY-RECURSIVE.

2. Create 10 matrices S1, S2, ..., S10, each of which is n/2 × n/2 and is the sum or difference of two matrices created in step 1. We can create all 10 matrices in Θ(n^2) time.

3. Using the submatrices created in step 1 and the 10 matrices created in step 2, recursively compute seven matrix products P1, P2, ..., P7. Each matrix P_i is n/2 × n/2.

4. Compute the desired submatrices C11, C12, C21, C22 of the result matrix C by adding and subtracting various combinations of the P_i matrices. We can compute all four submatrices in Θ(n^2) time.

We shall see the details of steps 2–4 in a moment, but we already have enough information to set up a recurrence for the running time of Strassen's method. Let us assume that once the matrix size n gets down to 1, we perform a simple scalar multiplication, just as in line 4 of SQUARE-MATRIX-MULTIPLY-RECURSIVE. When n > 1, steps 1, 2, and 4 take a total of Θ(n^2) time, and step 3 requires us to perform seven multiplications of n/2 × n/2 matrices. Hence, we obtain the following recurrence for the running time T(n) of Strassen's algorithm:

T(n) = { Θ(1)                 if n = 1 ,
       { 7T(n/2) + Θ(n^2)     if n > 1 .                              (4.18)

We have traded off one matrix multiplication for a constant number of matrix additions. Once we understand recurrences and their solutions, we shall see that this tradeoff actually leads to a lower asymptotic running time. By the master method in Section 4.5, recurrence (4.18) has the solution T(n) = Θ(n^{lg 7}).
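(Again previewing Section 4.5: here a = 7 and b = 2, so n^{log_2 7} = n^{lg 7} ≈ n^{2.807} grows polynomially faster than the combining cost Θ(n^2), and case 1 of the master theorem gives T(n) = Θ(n^{lg 7}).)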

We now proceed to describe the details. In step 2, we create the following 10 matrices:

S1  = B12 - B22 ,
S2  = A11 + A12 ,
S3  = A21 + A22 ,
S4  = B21 - B11 ,
S5  = A11 + A22 ,
S6  = B11 + B22 ,
S7  = A12 - A22 ,
S8  = B21 + B22 ,
S9  = A11 - A21 ,
S10 = B11 + B12 .

Since we must add or subtract n/2 × n/2 matrices 10 times, this step does indeed take Θ(n^2) time.

In step 3, we recursively multiply n/2 × n/2 matrices seven times to compute the following n/2 × n/2 matrices, each of which is the sum or difference of products of A and B submatrices:

P1 = A11·S1  = A11·B12 - A11·B22 ,
P2 = S2·B22  = A11·B22 + A12·B22 ,
P3 = S3·B11  = A21·B11 + A22·B11 ,
P4 = A22·S4  = A22·B21 - A22·B11 ,
P5 = S5·S6   = A11·B11 + A11·B22 + A22·B11 + A22·B22 ,
P6 = S7·S8   = A12·B21 + A12·B22 - A22·B21 - A22·B22 ,
P7 = S9·S10  = A11·B11 + A11·B12 - A21·B11 - A21·B12 .

Note that the only multiplications we need to perform are those in the middle column of the above equations. The right-hand column just shows what these products equal in terms of the original submatrices created in step 1.

Step 4 adds and subtracts the P_i matrices created in step 3 to construct the four n/2 × n/2 submatrices of the product C. We start with

C11 = P5 + P4 - P2 + P6 .

Expanding out the right-hand side, with the expansion of each P_i on its own line and vertically aligning terms that cancel out, we see that C11 equals

   A11·B11 + A11·B22 + A22·B11 + A22·B22
                     - A22·B11           + A22·B21
           - A11·B22                               - A12·B22
                               - A22·B22 - A22·B21 + A12·B22 + A12·B21
   ────────────────────────────────────────────────────────────────────
   A11·B11                                                   + A12·B21 ,

which corresponds to equation (4.11).

Similarly, we set

C12 = P1 + P2 ,

and so C12 equals

   A11·B12 - A11·B22
           + A11·B22 + A12·B22
   ───────────────────────────
   A11·B12           + A12·B22 ,

corresponding to equation (4.12).

Setting

C21 = P3 + P4

makes C21 equal

   A21·B11 + A22·B11
           - A22·B11 + A22·B21
   ───────────────────────────
   A21·B11           + A22·B21 ,

corresponding to equation (4.13).

Finally, we set

C22 = P5 + P1 - P3 - P7 ,

so that C22 equals

   A11·B11 + A11·B22 + A22·B11 + A22·B22
           - A11·B22                     + A11·B12
                     - A22·B11                     - A21·B11
 - A11·B11                               - A11·B12 + A21·B11 + A21·B12
   ────────────────────────────────────────────────────────────────────
                                 A22·B22                     + A21·B12 ,

which corresponds to equation (4.14). Altogether, we add or subtract n/2 × n/2 matrices eight times in step 4, and so this step indeed takes Θ(n^2) time.

Thus, we see that Strassen's algorithm, comprising steps 1–4, produces the correct matrix product and that recurrence (4.18) characterizes its running time. Since we shall see in Section 4.5 that this recurrence has the solution T(n) = Θ(n^{lg 7}), Strassen's method is asymptotically faster than the straightforward SQUARE-MATRIX-MULTIPLY procedure. The notes at the end of this chapter discuss some of the practical aspects of Strassen's algorithm.
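As an illustration only, here is a hedged Python/NumPy sketch of steps 1–4 (the function name is ours; we again assume n is an exact power of 2, and a practical implementation would typically switch to the straightforward method below some cutoff size rather than recursing down to 1 × 1 matrices):

import numpy as np

def strassen(A, B):
    """Strassen's method for two n x n NumPy arrays, n an exact power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B                                   # scalar base case
    h = n // 2
    # Step 1: partition by index calculation (slices are views), Theta(1) time.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Step 2: the 10 sums and differences S1..S10, Theta(n^2) time.
    S1, S2, S3, S4, S5 = B12 - B22, A11 + A12, A21 + A22, B21 - B11, A11 + A22
    S6, S7, S8, S9, S10 = B11 + B22, A12 - A22, B21 + B22, A11 - A21, B11 + B12
    # Step 3: only seven recursive multiplications.
    P1 = strassen(A11, S1)
    P2 = strassen(S2, B22)
    P3 = strassen(S3, B11)
    P4 = strassen(A22, S4)
    P5 = strassen(S5, S6)
    P6 = strassen(S7, S8)
    P7 = strassen(S9, S10)
    # Step 4: assemble the four submatrices of C, Theta(n^2) time.
    C = np.empty_like(A)
    C[:h, :h] = P5 + P4 - P2 + P6        # C11
    C[:h, h:] = P1 + P2                  # C12
    C[h:, :h] = P3 + P4                  # C21
    C[h:, h:] = P5 + P1 - P3 - P7        # C22
    return C

# Quick check against NumPy's built-in product on random 8 x 8 matrices.
rng = np.random.default_rng(1)
X = rng.integers(-5, 5, (8, 8))
Y = rng.integers(-5, 5, (8, 8))
assert np.array_equal(strassen(X, Y), X @ Y)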

Exercises

Note: Although Exercises 4.2-3, 4.2-4, and 4.2-5 are about variants on Strassen’s algorithm, you should read Section 4.5 before trying to solve them.

4.2-1

Use Strassen’s algorithm to compute the matrix product

[ 1  3 ] [ 6  8 ]
[ 7  5 ] [ 4  2 ] .

Show your work.

4.2-2

Write pseudocode for Strassen’s algorithm.

4.2-3

How would you modify Strassen's algorithm to multiply n × n matrices in which n is not an exact power of 2? Show that the resulting algorithm runs in time Θ(n^{lg 7}).

4.2-4

What is the largest k such that if you can multiply 3 × 3 matrices using k multiplications (not assuming commutativity of multiplication), then you can multiply n × n matrices in time o(n^{lg 7})? What would the running time of this algorithm be?

4.2-5

V. Pan has discovered a way of multiplying 68 × 68 matrices using 132,464 multiplications, a way of multiplying 70 × 70 matrices using 143,640 multiplications, and a way of multiplying 72 × 72 matrices using 155,424 multiplications. Which method yields the best asymptotic running time when used in a divide-and-conquer matrix-multiplication algorithm? How does it compare to Strassen's algorithm?

4.2-6

How quickly can you multiply a kn × n matrix by an n × kn matrix, using Strassen's algorithm as a subroutine? Answer the same question with the order of the input matrices reversed.

4.2-7

Show how to multiply the complex numbers a + bi and c + di using only three multiplications of real numbers. The algorithm should take a, b, c, and d as input and produce the real component ac - bd and the imaginary component ad + bc separately.
