• 沒有找到結果。

Matrix Multiplication

N/A
N/A
Protected

Academic year: 2022

Share "Matrix Multiplication"

Copied!
43
0
0

加載中.... (立即查看全文)

全文

(1)

Matrix Multiplication

Let A be l × m matrix, and B be m × n matrix.

Then C = A × B is given by cij =

m

X

k=1

aik× bkj, (1)

where aik, bkj, cij are defined as before.

Zheng-Liang Lu 312 / 354

(2)

For example,

Note that if n = 1, then cij =Pm

k=1aik × bkj becomes the matrix A times the vector B.

Zheng-Liang Lu 313 / 354

(3)

Example

Let A ∈ Rl ×m and B ∈ Rm×n.

Please implement a program that A multiplies B and compare your result to A × B provided in MATLAB.

For simplicity, the dimensions l , m, n are the inputs to generate random matrices by rand.

Input: l , m, n for [A]l ×m and [B]m×n Output: A × B

Zheng-Liang Lu 314 / 354

(4)

1 clear; clc;

2 % main

3 l = randi(5, 1); % A {l x m}

4 m = randi(5, 1); % B {m x n}

5 n = randi(5, 1);

6 A = randi(10, l, m);

7 B = randi(10, m, n);

8 C = zeros(size(A, 1), size(B, 2));

9 for i = 1 : 1 : size(A, 1)

10 for j = 1 : 1 : size(B, 2)

11 for k = 1 : 1 : size(A, 2)

12 C(i, j) = C(i, j) + A(i, k) * B(k, j);

13 end

14 end

15 end

Time complexity: O(n3) Strassen (1969): O(n2.807355)

Coppersmith and Winograd (2010): O(n2.375477)

Zheng-Liang Lu 315 / 354

(5)

Matrix Exponentiation

Raising a matrix to a power is equivalent to repeatedly multiplying the matrix by itself.

For example, A2= A × A.

Note that A should be a square matrix. (Why?)

In mathematics, the matrix exponential1 is a matrix function on square matrices analogous to the ordinary exponential function.

That is, f (A) = eA, similar to f (x ) = ex.

However, raising a matrix to a matrix power, that is, AB, is undefined.

1Seematrix exponentialandPauli matrices.

Zheng-Liang Lu 316 / 354

(6)

Example

Be aware that exp(X ) and exp(1)X are different for any matrix X .

1 clear; clc;

2 % main

3 X = [0 1; 1 0];

4 mat exp1 = exp(1) ˆ X

5 mat exp2 = 0;

6 for i = 1 : 10 % check the identity of eˆa

7 mat exp2 = mat exp2 + X ˆ (i - 1) / ...

factorial(i - 1);

8 end

9 mat exp2

10 mat exp3 = exp(X) % different?

Zheng-Liang Lu 317 / 354

(7)

1 mat exp1 =

2

3 1.5431 1.1752

4 1.1752 1.5431

5 6

7 mat exp2 =

8

9 1.5431 1.1752

10 1.1752 1.5431

11 12

13 mat exp3 =

14

15 1.0000 2.7183

16 2.7183 1.0000

Zheng-Liang Lu 318 / 354

(8)

Determinant

det(A) is the determinant of the square matrix A.2 Recall that if A =

 a b c d



, then det(A) = ad − bc.

det(A) can calculate the determinant of A.

For example,

1 clear; clc;

2 % main

3 A = magic(3);

4 det(A)

2In math, |A| also denotes det(A).

Zheng-Liang Lu 319 / 354

(9)

Actually, a general formula3 for determinant is somehow complicated.

You can try a recursive algorithm to calculate the determinant for any n-by-n matrices.

3Seedeterminant.

Zheng-Liang Lu 320 / 354

(10)

Recursive Algorithm for Determinant

1 function y = myDet(X)

2 [r, c] = size(X);

3 if isequal(r,c) % check r==c?

4 if c == 1

5 y = X;

6 elseif c == 2

7 y = X(1, 1) * X(2, 2) - X(1, 2) * X(2, 1);

8 else

9 for i = 1 : c

10 if i == 1

11 % recall that c == r

12 y = X(1, 1) * myDet(X(2 : c, 2 : c));

13 elseif (i > 1 && i < c)

14 y = y + X(1, i) * (-1)ˆ(i + 1) *...

15 myDet(X(2 : c, [1 : i-1, i + 1 : ...

c]));

16 else

Zheng-Liang Lu 321 / 354

(11)

17 y = y + X(1, c) * (-1)ˆ(c + 1) * ...

myDet(X(2 : c, 1 : c - 1));

18 end

19 end

20 end

21 else

22 error('Should be a square matrix.')

23 end

Obviously, there are n! terms in the calculation.

So, this algorithm runs in O(en log n).

Try any 10-by-10 matrix.

Compare the running time with det.4

4By LU decomposition with running time O(n3).

Zheng-Liang Lu 322 / 354

(12)

Inverse Matrices

In linear algebra, an n-by-n square matrix A is invertibleif there exists an n-by-n square matrix B such that

AB = BA = In, where In denotes the n-by-n identity matrix.

Note that eye is similar to zeros and returns an identity matrix.

The following statements are equivalent:

A is invertible

det(A) 6= 0 (nonsingular) A is full rank5

5Seerank.

Zheng-Liang Lu 323 / 354

(13)

inv(A) returns an inverse matrix of A.

Note that inv(A) may return inf if A is badly scaled or nearly singular.6.

So, if A is not invertible, then det(A) = 0.

It is well-known that

A−1= adj (A) det(A),

where adj (A) refers to the adjugate matrix of A.

You can try to calculate an adjugate matrix7 for any n-by-n matrix. (Try.)

6That is, det(A) is near 0.

7Seeadjugate matrix.

Zheng-Liang Lu 324 / 354

(14)

Example

1 clear; clc

2 % main

3 A = pascal(4); % Pascal matrix

4 B = inv(A)

1 >> B =

2

3 4.0000 -6.0000 4.0000 -1.0000

4 -6.0000 14.0000 -11.0000 3.0000

5 4.0000 -11.0000 10.0000 -3.0000

6 -1.0000 3.0000 -3.0000 1.0000

Try inv(magic(3)).

Zheng-Liang Lu 325 / 354

(15)

System of Linear Equations

A system of linear equations is a set of linear equations involving the same set of variables.

For example,

3x +2y −z = 1

x −y +2z = −1

−2x +y −2z = 0

After doing math, we have x = 1, y = −2, and z = −2.

Zheng-Liang Lu 326 / 354

(16)

A general system of m linear equations with n unknowns can be written as









a11x1 +a12x2 · · · +a1nxn = b1 a21x1 +a22x2 · · · +a2nxn = b2

... ... . .. ... = ... am1x1 +am2x2 · · · +amnxn = bm where x1, . . . , xn are unknowns, a11, . . . , amn are the coefficientsof the system, and b1, . . . , bm are the constant terms.

Zheng-Liang Lu 327 / 354

(17)

So, we can rewrite the system of linear equations as a matrix equation, given by

Ax = b.

where

A =

a11 a12 · · · a1n a21 a22 · · · a2n ... ... . .. ... am1 am2 · · · amn

 ,

x =

 x1

... xn

, and

b =

 b1

... bm

.

Zheng-Liang Lu 328 / 354

(18)

General System of Linear Equations

Let x be the column vector with n independent variables and m constraints.8

Ifm = n9, then there exists theunique solution x0.

Ifm > n, then there is no exact solution but there exists one least-squares error solution such that kAx0− bk2 is minimal.

Let x0 be the least-square error solution.

Then the error is  = Ax0− b, usually not zero.

So, kAx0− bk2= (Ax0− b)(Ax0− b) = 2is minimal.

Ifm < n, then there areinfinitely many solutions.

8Assume that these equations arelinearly independent.

9Equivalently, rank(A) = rank([A b]). Or, see Cramer’s rule.

Zheng-Liang Lu 329 / 354

(19)

Quick Glance at Method of Least Squares

The method of least squares is a standard approach to the approximate solution of overdetermined systems.

Zheng-Liang Lu 330 / 354

(20)

Matrix Division in MATLAB

Consider Ax = b for a system of linear equations.

A\b is the matrix division of A into B, which is roughly the same as inv(A)b, except thatit is computed in a different way.

Left matrix divide (\) or mldivide is to do x =A−1b

=inv(A)b

=A\b.

Zheng-Liang Lu 331 / 354

(21)

Unique Solution (m = n)

For example,

3x +2y −z = 1

x −y +2z = −1

−2x +y −2z = 0

1 >> A = [3 2 -1; 1 -1 2; -2 1 -2];

2 >> b = [1; -1; 0];

3 >> x = A \ b

4

5 1

6 -2

7 -2

Zheng-Liang Lu 332 / 354

(22)

Overdetermined System (m > n)

For example,

2x −y = 2

x −2y = −2

x +y = 1

1 >> A=[2 -1; 1 -2; 1 1];

2 >> b=[2; -2; 1];

3 >> x = A \ b

4

5 1

6 1

Zheng-Liang Lu 333 / 354

(23)

Underdetermined System (m < n)

For example,

 x +2y +3z = 7

4x +5y +6z = 8

1 >> A = [1 2 3; 4 5 6];

2 >> b = [7; 8];

3 >> x = A \ b

4

5 -3

6 0

7 3.333

8

9 % (Why?)

Zheng-Liang Lu 334 / 354

(24)

Gauss Elimination

Recall the Gauss Elimination in high school.

For example,

3x +2y −z = 1

x −y +2z = −1

−2x +y −2z = 0

It is known that x = 1, y = −2, and z = −2.

Zheng-Liang Lu 335 / 354

(25)

Check if det(A) = 0, then the program terminates; otherwise, the program continues.

Loop to form upper triangular matrix which looks like

A =¯

1 ¯a12 · · · ¯a1n

0 1 · · · ¯a2n ... ... 1 ... 0 0 · · · 1

 ,

and

¯b =

 b¯12 ... b¯n

 .

where ¯aijs and ¯bis are the resulting values.

Zheng-Liang Lu 336 / 354

(26)

Use backward substitution to determine the solution vector x by

xn = b¯n, xi = b¯i

n

X

j =i +1

¯ aijxj,

where i ∈ {1, · · · , n − 1}.

Zheng-Liang Lu 337 / 354

(27)

Solution

1 clear; clc;

2 % main

3 A = [3 2 -1; 1 -1 2; -2 1 -2];

4 b = [1; -1; 0];

5 x = zeros(3, 1);

6 if rank(A) == rank([A b])

7 for i = 1 : 3

8 for j = i : 3

9 b(j) = b(j) / A(j, i);

10 A(j, :) = A(j, :) / A(j, i);

11 end

12 for j = i + 1 : 3

13 A(j, :) = A(j, :) - A(i, :);

14 b(j) = b(j) - b(i);

15 end

16 end

17 for i= 3 : -1 : 1

18 x(i) = b(i);

Zheng-Liang Lu 338 / 354

(28)

19 for j = i + 1 : 1 : 3

20 x(i) = x(i) - A(i, j) * x(j);

21 end

22 end

23 else

24 disp('No unique solution.');

25 end

Zheng-Liang Lu 339 / 354

(29)

Extension

How to extend this algorithm for any simultaneous equations of n variables?

1 clear; clc

2

3 n = randi(10, 1);

4 A = randi(100, n, n);

5 b = randi(100, n, 1);

6 if rank(A) == rank([A b])

7 for i = 1 : n

8 for j = i : n

9 b(j) = b(j) / A(j, i);

10 A(j, :) = A(j, :) / A(j, i);

11 end

12 for j = i + 1 : n

13 A(j,:) = A(j, :) - A(i, :);

14 b(j) = b(j) - b(i);

Zheng-Liang Lu 340 / 354

(30)

15 end

16 end

17 for i= n : -1 : 1

18 x(i) = b(i);

19 for j = i + 1 : 1 : n

20 x(i) = x(i) - A(i, j) * x(j);

21 end

22 end

23 else

24 disp('No unique solution.');

25 end

26 x'

Zheng-Liang Lu 341 / 354

(31)

Exercise

rank is used to check if rank(A) =rank([A, b]).

rref is used to reduce the argumented matrix [A, b].

Use built-in functions to implement a program that solves a general system of linear equations.

Zheng-Liang Lu 342 / 354

(32)

Zheng-Liang Lu 343 / 354

(33)

Solution

1 function y = linearSolver(A,b)

2

3 if rank(A) == rank([A b]) % argumented matrix

4 if rank(A) == size(A, 2);

5 disp('Exact one solution.')

6 x = A \ b

7 else

8 disp('Infinite numbers of solutions.')

9 rref([A b])

10 end

11 else

12 disp('There is no solution. (Only least ...

square solutions.)')

13 end

Can you replace reff with your Gaussian elimination algorithm?

Zheng-Liang Lu 344 / 354

(34)

Method of Least Squares

The first clear and concise exposition of the method of least squares was published by Legendre in 1805.

In 1809,Gauss published his method of calculating the orbits of celestial bodies.

The method of least squares is a standard approach to the approximate solution ofoverdetermined systems, i.e., sets of equations in which there are more equations than unknowns.

To obtain the coefficient estimates,the least-squares method minimizes the summed square of residuals.

Zheng-Liang Lu 345 / 354

(35)

More specific...

Let {yi}ni =1 be the observed response values and {ˆyi}ni =1 be the fitted response values.

Define the error or residual i = yi− ˆyi for i = 1, . . . , n.

Then the sum of squares error estimates associated with the data is given by

S =

n

X

i =1

2i. (2)

Zheng-Liang Lu 346 / 354

(36)

Linear Least Squares

A linearmodel is defined as an equation that islinear in the coefficients.

Suppose that you have n data points that can be modeled by a 1st-order polynomial, given by

y = ax + b.

By (2), i = yi − (axi + b).

Now S =Pn

i =1(yi − (axi+ b))2.

The least-squares fitting process minimizes the summed square of the residuals.

The coefficient a and b are determined by differentiating S with respect to each parameter, and setting the result equal to zero. (Why?)

Zheng-Liang Lu 347 / 354

(37)

Hence,

∂S

∂a = − 2

n

X

i =1

xi(yi− (axi+ b)) = 0,

∂S

∂b = − 2

n

X

i =1

(yi− (axi + b)) = 0.

The normal equations are defined as a

n

X

i =1

xi2+ b

n

X

i =1

xi =

n

X

i =1

xiyi,

a

n

X

i =1

xi + nb =

n

X

i =1

yi.

Zheng-Liang Lu 348 / 354

(38)

In fact,

 Pn

i =1xi2 Pn

i =1xi Pn

i =1xi n

  a b



=

 Pn

i =1xiyi Pn

i =1yi

 . Solving for a, b, we have

a =nPn

i =1xiyi −Pn

i =1xiPn i =1yi

nPn

i =1xi2− (Pn i =1xi)2

=Cov(X , Y ) Var(X ) ,

where Cov(X , Y ) refers to the covariance between X and Y . Also, we have

b = 1 n(

n

X

i =1

yi − a

n

X

i =1

xi).

Zheng-Liang Lu 349 / 354

(39)

Example: Drag Coefficients

Let v be the velocity of a moving object and k be a positive constant.

The drag force due to air resistance is proportional to the square of the velocity, that is, d = kv2.

In a wind tunnel experiment, the velocity v can be varied by setting the speed of the fan and the drag can be measured directly.

Zheng-Liang Lu 350 / 354

(40)

The following sequence of commands replicates the data one might receive from a wind tunnel:

1 clear; clc;

2 % main

3 v = [0 : 1 : 60]';

4 d = [0.1234 * v .ˆ 2]';

5 dn = d + 0.4 * v .* randn(size(v));

6 figure(1), plot(v, dn, '*', v, d, 'r-'); grid on;

7 legend('Data', 'Analytic');

Zheng-Liang Lu 351 / 354

(41)

Zheng-Liang Lu 352 / 354

(42)

The unknown coefficient k is to be determined by the method of least squares.

The formulation could be









v12k = dn1

v22k = dn2

... v612 k = dn61

.

Recall that for any matrix A and vector b with Ax = b, x = A\b returns the least square solution.

1 >> k = v .ˆ2 \ dn % note that the v and dn ...

vectors are row vectors, need to be transposed.

2

3 k =

4

5 0.1239

Zheng-Liang Lu 353 / 354

(43)

Generalized Linear Least Squares

In matrix form, linear models are given by the formula y = Xb + ,

where y is an n-by-1 vector of responses, X is the n-by-m matrix for the model, b is m-by-1 vector of coefficients, and  is an n-by-1 vector of errors.

Then the normal equations are given by (XTX )b = XTy , where XT is the transpose of the X . So, b = (XTX )−1XTy .

Zheng-Liang Lu 354 / 354

參考文獻

相關文件

“Find sufficiently accurate starting approximate solution by using Steepest Descent method” + ”Compute convergent solution by using Newton-based methods”. The method of

denote the successive intervals produced by the bisection algorithm... denote the successive intervals produced by the

Reading Task 6: Genre Structure and Language Features. • Now let’s look at how language features (e.g. sentence patterns) are connected to the structure

Students are asked to collect information (including materials from books, pamphlet from Environmental Protection Department...etc.) of the possible effects of pollution on our

化成 reduced echelon form 後由於每一個 row 除了該 row 的 pivot 外, 只剩 free variables (其他的 pivot variable 所在的 entry 皆為 0), 所以可以很快地看出解的形式..

(a) Giving your answers in standard form, estimate (i)the total mass of the population of Europe.. (ii)how many more people live in Asia than

Then, it is easy to see that there are 9 problems for which the iterative numbers of the algorithm using ψ α,θ,p in the case of θ = 1 and p = 3 are less than the one of the

Especially, the additional gauge symmetry are needed in order to have first order differential equations as equations