### Matrix Multiplication

Let A be an l × m matrix and B an m × n matrix.

Then C = A × B is given by

$$c_{ij} = \sum_{k=1}^{m} a_{ik}\, b_{kj}, \qquad (1)$$

where $a_{ik}$, $b_{kj}$, $c_{ij}$ are defined as before.

Zheng-Liang Lu 312 / 354

Note that if n = 1, then $c_{i1} = \sum_{k=1}^{m} a_{ik} b_{k1}$ becomes the
matrix A times the column vector B.


### Example

Let A ∈ ℝ^{l×m} and B ∈ ℝ^{m×n}.

Implement a program that multiplies A by B, and compare your result with A × B as computed by MATLAB.

For simplicity, the dimensions l, m, n are the inputs used to generate random matrices by randi.

Input: l, m, n for [A]_{l×m} and [B]_{m×n}
Output: A × B


```matlab
clear; clc;
% main
l = randi(5, 1);
m = randi(5, 1);
n = randi(5, 1);
A = randi(10, l, m);   % A is l-by-m
B = randi(10, m, n);   % B is m-by-n
C = zeros(size(A, 1), size(B, 2));
for i = 1 : size(A, 1)
    for j = 1 : size(B, 2)
        for k = 1 : size(A, 2)
            C(i, j) = C(i, j) + A(i, k) * B(k, j);
        end
    end
end
```

Time complexity: O(n^3).
Strassen (1969): O(n^{2.807355}).

Coppersmith and Winograd (1990): O(n^{2.375477}).
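For comparison, the same triple loop can be sketched outside MATLAB. Below is a minimal Python version; the function name `matmul` is ours for illustration, not from the slides.

```python
def matmul(A, B):
    # naive O(l*m*n) triple loop, mirroring c_ij = sum_k a_ik * b_kj
    l, m, n = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must agree"
    C = [[0.0] * n for _ in range(l)]
    for i in range(l):
        for j in range(n):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

For instance, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` reproduces the usual 2-by-2 product.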


### Matrix Exponentiation

Raising a matrix to a power is equivalent to repeatedly multiplying the matrix by itself.

For example, A^{2}= A × A.

Note that A should be a square matrix. (Why?)

In mathematics, the matrix exponential^{1} is a matrix function
on square matrices analogous to the ordinary exponential
function.

That is, f(A) = e^A, analogous to f(x) = e^x.

However, raising a matrix to a matrix power, that is, A^{B}, is
undefined.

^1 See matrix exponential and Pauli matrices.


### Example

Be aware that exp(X) and exp(1)^X are different for a matrix X: exp(X) exponentiates each entry elementwise, while exp(1)^X is the matrix exponential e^X.

```matlab
clear; clc;
% main
X = [0 1; 1 0];
mat_exp1 = exp(1) ^ X          % matrix power: e^X
mat_exp2 = 0;
for i = 1 : 10                 % partial sum of e^X = sum of X^k / k!
    mat_exp2 = mat_exp2 + X ^ (i - 1) / factorial(i - 1);
end
mat_exp2
mat_exp3 = exp(X)              % elementwise exponential: different?
```


```matlab
mat_exp1 =

    1.5431    1.1752
    1.1752    1.5431

mat_exp2 =

    1.5431    1.1752
    1.1752    1.5431

mat_exp3 =

    1.0000    2.7183
    2.7183    1.0000
```
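The entries 1.5431 and 1.1752 are cosh(1) and sinh(1), as the power series of e^X for this X predicts. A quick Python sketch of the same truncated series (helper names `mat_mult` and `mat_exp` are ours):

```python
def mat_mult(A, B):
    # small dense matrix product
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(X, terms=20):
    # partial sum of e^X = I + X + X^2/2! + ...
    n = len(X)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # X^0 = I
    S = [row[:] for row in term]
    for k in range(1, terms):
        term = mat_mult(term, X)                      # now X^k / (k-1)!
        term = [[v / k for v in row] for row in term]  # now X^k / k!
        S = [[S[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return S

E = mat_exp([[0.0, 1.0], [1.0, 0.0]])
```

E should match the mat_exp1/mat_exp2 output above to displayed precision.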


### Determinant

det(A) is the determinant of the square matrix A.^2

Recall that if

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

then det(A) = ad − bc.

In MATLAB, det(A) computes the determinant of A.

For example,

1 clear; clc;

2 % main

3 A = magic(3);

4 det(A)

^2 In math, |A| also denotes det(A).


Actually, a general formula^3 for the determinant is somewhat complicated.

You can try a recursive algorithm to calculate the determinant of any n-by-n matrix.

^3 See determinant.


### Recursive Algorithm for Determinant

```matlab
function y = myDet(X)
[r, c] = size(X);
if isequal(r, c)   % check r == c?
    if c == 1
        y = X;
    elseif c == 2
        y = X(1, 1) * X(2, 2) - X(1, 2) * X(2, 1);
    else
        % Laplace expansion along the first row (recall that c == r)
        for i = 1 : c
            if i == 1
                y = X(1, 1) * myDet(X(2 : c, 2 : c));
            elseif i > 1 && i < c
                y = y + X(1, i) * (-1) ^ (i + 1) * ...
                    myDet(X(2 : c, [1 : i - 1, i + 1 : c]));
            else
                y = y + X(1, c) * (-1) ^ (c + 1) * ...
                    myDet(X(2 : c, 1 : c - 1));
            end
        end
    end
else
    error('Should be a square matrix.')
end
```
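The same cofactor expansion is compact in Python, where slicing makes the minors easy to build; `my_det` is our name for this sketch.

```python
def my_det(X):
    # recursive determinant by Laplace expansion along the first row
    n = len(X)
    if n == 1:
        return X[0][0]
    total = 0.0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in X[1:]]
        total += (-1) ** j * X[0][j] * my_det(minor)
    return total
```

For example, `my_det` on MATLAB's `magic(3)` matrix `[[8,1,6],[3,5,7],[4,9,2]]` gives −360.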

Obviously, there are n! terms in the calculation.

So this algorithm runs in O(n!) = O(e^{n log n}) time.

Try any 10-by-10 matrix.

Compare the running time with det.^4

^4 By LU decomposition, with running time O(n^3).


### Inverse Matrices

In linear algebra, an n-by-n square matrix A is invertible if there exists an n-by-n square matrix B such that

$$AB = BA = I_n,$$

where $I_n$ denotes the n-by-n identity matrix.

Note that eye, analogous to zeros, returns an identity matrix.

The following statements are equivalent:

- A is invertible;
- det(A) ≠ 0 (A is nonsingular);
- A is full rank.^5

^5 See rank.


inv(A) returns the inverse matrix of A.

Note that inv(A) may return Inf entries if A is badly scaled or nearly singular.^6

Conversely, if A is not invertible, then det(A) = 0.

It is well known that

$$A^{-1} = \frac{\operatorname{adj}(A)}{\det(A)},$$

where adj(A) refers to the adjugate matrix of A.

You can try to calculate the adjugate matrix^7 of any n-by-n matrix. (Try.)

^6 That is, det(A) is near 0.

^7 See adjugate matrix.
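For the 2-by-2 case the adjugate formula above is explicit: adj(A) swaps the diagonal and negates the off-diagonal. A small Python sketch (`inv2` is our name):

```python
def inv2(A):
    # inverse of a 2x2 matrix via A^{-1} = adj(A) / det(A)
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    # adj([[a, b], [c, d]]) = [[d, -b], [-c, a]]
    return [[d / det, -b / det], [-c / det, a / det]]
```

For example, `inv2([[4, 7], [2, 6]])` yields `[[0.6, -0.7], [-0.2, 0.4]]`.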


### Example

```matlab
clear; clc
% main
A = pascal(4);   % Pascal matrix
B = inv(A)
```

```matlab
B =

    4.0000   -6.0000    4.0000   -1.0000
   -6.0000   14.0000  -11.0000    3.0000
    4.0000  -11.0000   10.0000   -3.0000
   -1.0000    3.0000   -3.0000    1.0000
```

Try inv(magic(3)).


### System of Linear Equations

A system of linear equations is a set of linear equations involving the same set of variables.

For example,

$$\begin{aligned} 3x + 2y - z &= 1 \\ x - y + 2z &= -1 \\ -2x + y - 2z &= 0 \end{aligned}$$

After doing the math, we have x = 1, y = −2, and z = −2.


A general system of m linear equations in n unknowns can be written as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$

where $x_1, \ldots, x_n$ are the unknowns, $a_{11}, \ldots, a_{mn}$ are the coefficients of the system, and $b_1, \ldots, b_m$ are the constant terms.


So, we can rewrite the system of linear equations as a matrix equation, given by

$$Ax = b,$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad \text{and} \quad
b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.$$


### General System of Linear Equations

Let x be the column vector of n independent variables, subject to m constraints.^8

- If m = n,^9 then there exists the unique solution x⁰.
- If m > n, then there is no exact solution in general, but there exists a least-squares solution x⁰ such that ‖Ax⁰ − b‖² is minimal.
  - Let x⁰ be the least-squares solution.
  - Then the error ε = Ax⁰ − b is usually not zero.
  - So ‖Ax⁰ − b‖² = (Ax⁰ − b)ᵀ(Ax⁰ − b) = ‖ε‖² is minimal.
- If m < n, then there are infinitely many solutions.

^8 Assume that these equations are linearly independent.

^9 Equivalently, rank(A) = rank([A b]). Or, see Cramer's rule.


### Quick Glance at Method of Least Squares

The method of least squares is a standard approach to the approximate solution of overdetermined systems.


### Matrix Division in MATLAB

Consider Ax = b for a system of linear equations.

A\b is the matrix left division of A into b, which is roughly the same as inv(A)*b, except that it is computed in a different way.

Left matrix divide (\), or mldivide, computes

$$x = A^{-1}b,$$

i.e., x = inv(A)*b = A\b in MATLAB.
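For a square system, footnote 9's Cramer's rule gives each unknown as a ratio of determinants: $x_j = \det(A_j)/\det(A)$, where $A_j$ replaces column j of A by b. A Python sketch (helper names `det` and `cramer_solve` are ours; this is illustrative, not how mldivide works internally):

```python
def det(X):
    # recursive determinant by cofactor expansion along row 0
    n = len(X)
    if n == 1:
        return X[0][0]
    return sum((-1) ** j * X[0][j] *
               det([row[:j] + row[j + 1:] for row in X[1:]])
               for j in range(n))

def cramer_solve(A, b):
    # Cramer's rule: x_j = det(A with column j replaced by b) / det(A)
    d = det(A)
    if d == 0:
        raise ValueError("no unique solution")
    n = len(A)
    x = []
    for j in range(n):
        Aj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(det(Aj) / d)
    return x
```

On the running example it recovers x = 1, y = −2, z = −2.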


### Unique Solution (m = n)

For example,

$$\begin{aligned} 3x + 2y - z &= 1 \\ x - y + 2z &= -1 \\ -2x + y - 2z &= 0 \end{aligned}$$

```matlab
>> A = [3 2 -1; 1 -1 2; -2 1 -2];
>> b = [1; -1; 0];
>> x = A \ b

x =

     1
    -2
    -2
```


### Overdetermined System (m > n)

For example,

$$\begin{aligned} 2x - y &= 2 \\ x - 2y &= -2 \\ x + y &= 1 \end{aligned}$$

```matlab
>> A = [2 -1; 1 -2; 1 1];
>> b = [2; -2; 1];
>> x = A \ b

x =

     1
     1
```

### Underdetermined System (m < n)

For example,

$$\begin{aligned} x + 2y + 3z &= 7 \\ 4x + 5y + 6z &= 8 \end{aligned}$$

```matlab
>> A = [1 2 3; 4 5 6];
>> b = [7; 8];
>> x = A \ b

x =

   -3.0000
         0
    3.3333
```

% (Why?)
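As to the "(Why?)": for an underdetermined system, mldivide is documented to return a basic solution with at most rank(A) nonzero entries, rather than the minimum-norm one; here y = 0. A quick Python check that this particular solution does satisfy both equations:

```python
# verify the basic solution x = (-3, 0, 10/3) reported by A \ b
A = [[1, 2, 3], [4, 5, 6]]
b = [7, 8]
x = [-3.0, 0.0, 10 / 3]
residual = [sum(a * xi for a, xi in zip(row, x)) - bi
            for row, bi in zip(A, b)]
```

Both residual entries should be (numerically) zero.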


### Gauss Elimination

Recall Gaussian elimination from high school.

For example,

$$\begin{aligned} 3x + 2y - z &= 1 \\ x - y + 2z &= -1 \\ -2x + y - 2z &= 0 \end{aligned}$$

It is known that x = 1, y = −2, and z = −2.


Check det(A): if det(A) = 0, then the program terminates; otherwise, the program continues.

Loop to form an upper triangular matrix, which looks like

$$\bar{A} = \begin{pmatrix} 1 & \bar{a}_{12} & \cdots & \bar{a}_{1n} \\ 0 & 1 & \cdots & \bar{a}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \quad \text{and} \quad
\bar{b} = \begin{pmatrix} \bar{b}_1 \\ \bar{b}_2 \\ \vdots \\ \bar{b}_n \end{pmatrix},$$

where the $\bar{a}_{ij}$'s and $\bar{b}_i$'s are the resulting values.


Use backward substitution to determine the solution vector x by

$$x_n = \bar{b}_n, \qquad x_i = \bar{b}_i - \sum_{j=i+1}^{n} \bar{a}_{ij}\, x_j,$$

where i ∈ {1, …, n − 1}.


### Solution

```matlab
clear; clc;
% main
A = [3 2 -1; 1 -1 2; -2 1 -2];
b = [1; -1; 0];
x = zeros(3, 1);
if rank(A) == rank([A b])
    % forward elimination: normalize pivots, subtract pivot row
    for i = 1 : 3
        for j = i : 3
            b(j) = b(j) / A(j, i);      % scale b before A(j, i) changes
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : 3
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end
    % backward substitution
    for i = 3 : -1 : 1
        x(i) = b(i);
        for j = i + 1 : 3
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
else
    disp('No unique solution.');
end
```
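The two phases can be sketched in Python as well. Note that this sketch adds partial pivoting (swapping in the row with the largest pivot), which the MATLAB listing omits, so it avoids dividing by a zero pivot; the name `gauss_solve` is ours.

```python
def gauss_solve(A, b):
    # Gaussian elimination with partial pivoting, then back substitution
    n = len(A)
    A = [row[:] for row in A]   # work on copies
    b = b[:]
    for i in range(n):
        # choose the pivot row with the largest |A[r][i]|
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        if abs(A[p][i]) < 1e-12:
            raise ValueError("no unique solution")
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        # eliminate column i below the pivot
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            A[j] = [ajk - f * aik for ajk, aik in zip(A[j], A[i])]
            b[j] -= f * b[i]
    # back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x
```

On the example system it returns [1, −2, −2].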


### Extension

How to extend this algorithm for any simultaneous equations of n variables?

```matlab
clear; clc

n = randi(10, 1);
A = randi(100, n, n);
b = randi(100, n, 1);
if rank(A) == rank([A b])
    for i = 1 : n
        for j = i : n
            b(j) = b(j) / A(j, i);
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : n
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end
    for i = n : -1 : 1
        x(i) = b(i);
        for j = i + 1 : n
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
else
    disp('No unique solution.');
end
x'


### Exercise

rank is used to check whether rank(A) = rank([A, b]).

rref is used to reduce the augmented matrix [A, b].

Use built-in functions to implement a program that solves a general system of linear equations.



### Solution

```matlab
function linearSolver(A, b)

if rank(A) == rank([A b])   % augmented matrix
    if rank(A) == size(A, 2)
        disp('Exactly one solution.')
        x = A \ b
    else
        disp('Infinitely many solutions.')
        rref([A b])
    end
else
    disp('There is no exact solution; only least-squares solutions.')
end
```

Can you replace rref with your Gaussian elimination algorithm?


### Method of Least Squares

The first clear and concise exposition of the method of least squares was published by Legendre in 1805.

In 1809, Gauss published his method of calculating the orbits of celestial bodies.

The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations with more equations than unknowns.

To obtain the coefficient estimates, the least-squares method minimizes the summed square of residuals.


### More specific...

Let $\{y_i\}_{i=1}^{n}$ be the observed response values and $\{\hat{y}_i\}_{i=1}^{n}$ the fitted response values.

Define the error, or residual, $\varepsilon_i = y_i - \hat{y}_i$ for i = 1, …, n.

Then the sum of squared errors associated with the data is given by

$$S = \sum_{i=1}^{n} \varepsilon_i^2. \qquad (2)$$


### Linear Least Squares

A linear model is defined as an equation that is linear in the coefficients.

Suppose that you have n data points that can be modeled by a 1st-order polynomial, given by

$$y = ax + b.$$

By (2), $\varepsilon_i = y_i - (ax_i + b)$, so now

$$S = \sum_{i=1}^{n} \left(y_i - (ax_i + b)\right)^2.$$

The least-squares fitting process minimizes the summed square of the residuals.

The coefficients a and b are determined by differentiating S with respect to each parameter and setting the result equal to zero. (Why?)


Hence,

$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} x_i \left(y_i - (ax_i + b)\right) = 0,$$

$$\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} \left(y_i - (ax_i + b)\right) = 0.$$

The normal equations are defined as

$$a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i,$$

$$a\sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i.$$


In fact,

$$\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix} =
\begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}.$$

Solving for a and b, we have

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},$$

where Cov(X, Y) refers to the covariance between X and Y. Also, we have

$$b = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i\right).$$
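These closed-form expressions for a and b translate directly into a few running sums; a Python sketch (`fit_line` is our name for this helper):

```python
def fit_line(xs, ys):
    # slope a = Cov(X, Y) / Var(X), intercept b = (sum(y) - a * sum(x)) / n
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```

For data lying exactly on y = 2x + 1, it recovers a = 2 and b = 1.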


### Example: Drag Coefficients

Let v be the velocity of a moving object and k be a positive constant.

The drag force due to air resistance is proportional to the
square of the velocity, that is, d = kv^{2}.

In a wind tunnel experiment, the velocity v can be varied by setting the speed of the fan and the drag can be measured directly.


The following sequence of commands replicates the data one might receive from a wind tunnel:

```matlab
clear; clc;
% main
v = (0 : 1 : 60)';                     % column vector of velocities
d = 0.1234 * v .^ 2;                   % analytic drag
dn = d + 0.4 * v .* randn(size(v));    % noisy measurements
figure(1), plot(v, dn, '*', v, d, 'r-'); grid on;
legend('Data', 'Analytic');
```


The unknown coefficient k is to be determined by the method of least squares.

The formulation could be

$$\begin{aligned} v_1^2\, k &= dn_1 \\ v_2^2\, k &= dn_2 \\ &\;\;\vdots \\ v_{61}^2\, k &= dn_{61}. \end{aligned}$$

Recall that for any matrix A and vector b with Ax = b, x = A\b returns the least-squares solution.

```matlab
>> k = v .^ 2 \ dn   % v and dn are column vectors here

k =

    0.1239
```
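For this one-parameter system the least-squares solution has the closed form $k = \sum_i v_i^2\, dn_i \big/ \sum_i v_i^4$, which is what the backslash computes here. A Python check on the noiseless data (so the exact k = 0.1234 is recovered):

```python
# one-parameter least squares: k = sum(v_i^2 * d_i) / sum(v_i^4)
v = list(range(61))
d = [0.1234 * vi ** 2 for vi in v]   # noiseless drag data for the check
k = sum(vi ** 2 * di for vi, di in zip(v, d)) / sum(vi ** 4 for vi in v)
```

With noisy dn in place of d, k comes out close to 0.1234, as in the MATLAB run.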


### Generalized Linear Least Squares

In matrix form, linear models are given by the formula

$$y = Xb + \varepsilon,$$

where y is an n-by-1 vector of responses, X is the n-by-m matrix for the model, b is an m-by-1 vector of coefficients, and ε is an n-by-1 vector of errors.

Then the normal equations are given by

$$(X^{\mathsf T} X)\, b = X^{\mathsf T} y,$$

where $X^{\mathsf T}$ is the transpose of X. So

$$b = (X^{\mathsf T} X)^{-1} X^{\mathsf T} y.$$
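For the straight-line model, X has columns [x 1] and the normal equations reduce to a 2-by-2 solve. A Python sketch of this matrix-form fit (`normal_equations_fit` is our name):

```python
def normal_equations_fit(xs, ys):
    # model y = a*x + b; design matrix X = [x 1], solve (X^T X) beta = X^T y
    X = [[x, 1.0] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(2)]
    det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
    # invert the 2x2 system directly
    a = (XtX[1][1] * Xty[0] - XtX[0][1] * Xty[1]) / det
    b = (XtX[0][0] * Xty[1] - XtX[1][0] * Xty[0]) / det
    return a, b
```

It agrees with the Cov/Var formulas derived earlier: for data on y = 2x + 1, it returns (2, 1).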
