Matrix Multiplication
Let A be an l × m matrix and B an m × n matrix.
Then C = A × B is given by
$$c_{ij} = \sum_{k=1}^{m} a_{ik} \times b_{kj}, \qquad (1)$$
where $a_{ik}$, $b_{kj}$, $c_{ij}$ are defined as before.
Zheng-Liang Lu 312 / 354
Note that if n = 1, then $c_{ij} = \sum_{k=1}^{m} a_{ik} \times b_{kj}$ becomes the matrix A times the vector B.
Example
Let $A \in \mathbb{R}^{l \times m}$ and $B \in \mathbb{R}^{m \times n}$.
Write a program that multiplies A by B, and compare your result with A * B as computed by MATLAB.
For simplicity, the dimensions l, m, n are the inputs, used to generate random matrices with randi.
Input: l, m, n for $[A]_{l \times m}$ and $[B]_{m \times n}$
Output: A × B
clear; clc;
% main
l = randi(5, 1); % number of rows of A
m = randi(5, 1); % number of columns of A = number of rows of B
n = randi(5, 1); % number of columns of B
A = randi(10, l, m);
B = randi(10, m, n);
C = zeros(size(A, 1), size(B, 2));
for i = 1 : size(A, 1)
    for j = 1 : size(B, 2)
        for k = 1 : size(A, 2)
            C(i, j) = C(i, j) + A(i, k) * B(k, j);
        end
    end
end
Time complexity: $O(n^3)$.
Strassen (1969): $O(n^{2.807355})$.
Coppersmith and Winograd (1990): $O(n^{2.375477})$.
Matrix Exponentiation
Raising a matrix to a power is equivalent to repeatedly multiplying the matrix by itself.
For example, $A^2 = A \times A$.
Note that A should be a square matrix. (Why?)
In mathematics, the matrix exponential1 is a matrix function on square matrices analogous to the ordinary exponential function.
That is, $f(A) = e^A$, similar to $f(x) = e^x$.
However, raising a matrix to a matrix power, that is, $A^B$, is undefined.
1 See matrix exponential and Pauli matrices.
Example
Be aware that exp(X) and exp(1)^X are different for any matrix X. (The built-in expm(X) computes the matrix exponential directly.)
clear; clc;
% main
X = [0 1; 1 0];
mat_exp1 = exp(1) ^ X
mat_exp2 = 0;
for i = 1 : 10 % partial sum of the series e^A = sum of A^k / k!
    mat_exp2 = mat_exp2 + X ^ (i - 1) / factorial(i - 1);
end
mat_exp2
mat_exp3 = exp(X) % element-wise exponential: different?
mat_exp1 =

    1.5431    1.1752
    1.1752    1.5431

mat_exp2 =

    1.5431    1.1752
    1.1752    1.5431

mat_exp3 =

    1.0000    2.7183
    2.7183    1.0000
Determinant
det(A) returns the determinant of the square matrix A.2
Recall that if
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
then det(A) = ad − bc.
For example,
1 clear; clc;
2 % main
3 A = magic(3);
4 det(A)
2 In math, |A| also denotes det(A).
Actually, the general formula3 for the determinant is somewhat complicated.
You can try a recursive algorithm to calculate the determinant for any n-by-n matrices.
3 See determinant.
Recursive Algorithm for Determinant
function y = myDet(X)
[r, c] = size(X);
if isequal(r, c) % check r == c
    if c == 1
        y = X;
    elseif c == 2
        y = X(1, 1) * X(2, 2) - X(1, 2) * X(2, 1);
    else
        % Laplace expansion along the first row
        for i = 1 : c
            if i == 1
                % recall that c == r
                y = X(1, 1) * myDet(X(2 : c, 2 : c));
            elseif i < c
                y = y + X(1, i) * (-1) ^ (i + 1) * ...
                    myDet(X(2 : c, [1 : i - 1, i + 1 : c]));
            else
                y = y + X(1, c) * (-1) ^ (c + 1) * ...
                    myDet(X(2 : c, 1 : c - 1));
            end
        end
    end
else
    error('Should be a square matrix.')
end
Obviously, there are n! terms in the expansion, so this algorithm runs in $O(n!) \approx O(e^{n \log n})$ time.
Try any 10-by-10 matrix.
Compare the running time with det.4
4 By LU decomposition, with running time $O(n^3)$.
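A quick timing comparison can be sketched as follows (myDet is the recursive function above; the random test matrix is our own choice):

```matlab
% Compare the recursive determinant with the built-in det.
% Warning: myDet on a 10-by-10 matrix already expands about 10! terms,
% so this may take a while.
X = rand(10);
tic; y1 = myDet(X); t1 = toc;
tic; y2 = det(X); t2 = toc;
fprintf('myDet: %g s, det: %g s, difference: %g\n', t1, t2, abs(y1 - y2));
```

The two results should agree up to rounding error, while the running times differ dramatically.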
Inverse Matrices
In linear algebra, an n-by-n square matrix A is invertible if there exists an n-by-n square matrix B such that
$$AB = BA = I_n,$$
where $I_n$ denotes the n-by-n identity matrix.
Note that eye, similar in usage to zeros, returns an identity matrix.
The following statements are equivalent:
A is invertible.
det(A) ≠ 0, i.e., A is nonsingular.
A is full rank.5
5 See rank.
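These equivalences can be checked quickly in MATLAB; a minimal sketch (the test matrices magic(4) and pascal(4) are our own choices):

```matlab
A = magic(4); % magic(4) is singular
fprintf('det = %g, rank = %d (n = %d)\n', det(A), rank(A), size(A, 1));
% det(A) is (numerically) near 0 and rank(A) < 4, so A is not invertible.
B = pascal(4); % pascal(4) is invertible
fprintf('det = %g, rank = %d\n', det(B), rank(B));
```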
inv(A) returns the inverse matrix of A.
Note that inv(A) may return Inf entries if A is badly scaled or nearly singular.6
Recall that if A is not invertible, then det(A) = 0.
It is well known that
$$A^{-1} = \frac{\operatorname{adj}(A)}{\det(A)},$$
where adj(A) refers to the adjugate matrix of A.
You can try to calculate the adjugate matrix7 of any n-by-n matrix. (Try.)
6 That is, det(A) is near 0.
7 See adjugate matrix.
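A minimal sketch of such a computation (the function name myAdj is ours; it uses the cofactor definition, with the built-in det for brevity):

```matlab
function Y = myAdj(X)
% Adjugate: transpose of the cofactor matrix.
n = size(X, 1);
Y = zeros(n);
for i = 1 : n
    for j = 1 : n
        % minor of entry (i, j): delete row i and column j
        minorX = X([1 : i - 1, i + 1 : n], [1 : j - 1, j + 1 : n]);
        Y(j, i) = (-1) ^ (i + j) * det(minorX); % note the transpose
    end
end
```

Then myAdj(A) / det(A) should agree with inv(A) for any invertible A.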
Example
clear; clc
% main
A = pascal(4); % Pascal matrix
B = inv(A)
B =

    4.0000   -6.0000    4.0000   -1.0000
   -6.0000   14.0000  -11.0000    3.0000
    4.0000  -11.0000   10.0000   -3.0000
   -1.0000    3.0000   -3.0000    1.0000
Try inv(magic(3)).
System of Linear Equations
A system of linear equations is a set of linear equations involving the same set of variables.
For example,
3x + 2y − z = 1
x − y + 2z = −1
−2x + y − 2z = 0
Solving, we have x = 1, y = −2, and z = −2.
A general system of m linear equations with n unknowns can be written as
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\ \ \vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$
where $x_1, \ldots, x_n$ are the unknowns, $a_{11}, \ldots, a_{mn}$ are the coefficients of the system, and $b_1, \ldots, b_m$ are the constant terms.
So, we can rewrite the system of linear equations as the matrix equation
$$Ax = b,$$
where
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad
b = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.$$
General System of Linear Equations
Let x be the column vector of n independent variables, subject to m constraints.8
If m = n,9 then there exists the unique solution x0.
If m > n, then there is no exact solution, but there exists one least-squares-error solution x0 such that $\|Ax_0 - b\|^2$ is minimal.
Let x0 be the least-squares-error solution.
Then the error $\epsilon = Ax_0 - b$ is usually not zero.
So, $\|Ax_0 - b\|^2 = (Ax_0 - b)^T (Ax_0 - b) = \epsilon^T \epsilon$ is minimal.
If m < n, then there are infinitely many solutions.
8 Assume that these equations are linearly independent.
9 Equivalently, rank(A) = rank([A b]). Or see Cramer's rule.
Quick Glance at Method of Least Squares
The method of least squares is a standard approach to the approximate solution of overdetermined systems.
Matrix Division in MATLAB
Consider Ax = b for a system of linear equations.
A\b is the matrix division of A into b, which is roughly the same as inv(A)*b, except that it is computed in a different way.
Left matrix divide (\), or mldivide, computes
$$x = A^{-1}b,$$
that is, x = inv(A) * b = A \ b.
Unique Solution (m = n)
For example,
3x + 2y − z = 1
x − y + 2z = −1
−2x + y − 2z = 0
>> A = [3 2 -1; 1 -1 2; -2 1 -2];
>> b = [1; -1; 0];
>> x = A \ b

x =

     1
    -2
    -2
Overdetermined System (m > n)
For example,
2x − y = 2
x − 2y = −2
x + y = 1
>> A = [2 -1; 1 -2; 1 1];
>> b = [2; -2; 1];
>> x = A \ b

x =

     1
     1
Underdetermined System (m < n)
For example,
x + 2y + 3z = 7
4x + 5y + 6z = 8
>> A = [1 2 3; 4 5 6];
>> b = [7; 8];
>> x = A \ b

x =

   -3.0000
         0
    3.3333

% mldivide returns a basic solution with at most rank(A) nonzero entries. (Why?)
Gauss Elimination
Recall Gaussian elimination from high school.
For example,
3x + 2y − z = 1
x − y + 2z = −1
−2x + y − 2z = 0
It is known that x = 1, y = −2, and z = −2.
First check det(A): if det(A) = 0, then the program terminates; otherwise, the program continues.
Loop to form an upper triangular matrix, which looks like
$$\bar{A} = \begin{bmatrix} 1 & \bar{a}_{12} & \cdots & \bar{a}_{1n} \\ 0 & 1 & \cdots & \bar{a}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}, \quad \text{and} \quad
\bar{b} = \begin{bmatrix} \bar{b}_1 \\ \bar{b}_2 \\ \vdots \\ \bar{b}_n \end{bmatrix},$$
where the $\bar{a}_{ij}$'s and $\bar{b}_i$'s are the resulting values.
Use backward substitution to determine the solution vector x by
$$x_n = \bar{b}_n, \qquad x_i = \bar{b}_i - \sum_{j=i+1}^{n} \bar{a}_{ij} x_j,$$
where $i \in \{1, \ldots, n-1\}$.
Solution
clear; clc;
% main
A = [3 2 -1; 1 -1 2; -2 1 -2];
b = [1; -1; 0];
x = zeros(3, 1);
if rank(A) == rank([A b])
    % forward elimination
    for i = 1 : 3
        for j = i : 3
            b(j) = b(j) / A(j, i);
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : 3
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end
    % backward substitution
    for i = 3 : -1 : 1
        x(i) = b(i);
        for j = i + 1 : 3
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
else
    disp('No unique solution.');
end
Extension
How to extend this algorithm for any simultaneous equations of n variables?
clear; clc
% main
n = randi(10, 1);
A = randi(100, n, n);
b = randi(100, n, 1);
x = zeros(n, 1);
if rank(A) == rank([A b])
    for i = 1 : n
        for j = i : n
            b(j) = b(j) / A(j, i);
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : n
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end
    for i = n : -1 : 1
        x(i) = b(i);
        for j = i + 1 : n
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
else
    disp('No unique solution.');
end
x'
Exercise
rank is used to check whether rank(A) = rank([A b]).
rref is used to reduce the augmented matrix [A b].
Use these built-in functions to implement a program that solves a general system of linear equations.
Solution
function linearSolver(A, b)
if rank(A) == rank([A b]) % augmented matrix
    if rank(A) == size(A, 2)
        disp('Exactly one solution.')
        x = A \ b
    else
        disp('Infinitely many solutions.')
        rref([A b])
    end
else
    disp('There is no solution. (Only least-squares solutions.)')
end
Can you replace rref with your Gaussian elimination algorithm?
Method of Least Squares
The first clear and concise exposition of the method of least squares was published by Legendre in 1805.
In 1809, Gauss published his method of calculating the orbits of celestial bodies.
The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns.
To obtain the coefficient estimates, the least-squares method minimizes the summed square of residuals.
More specifically...
Let $\{y_i\}_{i=1}^{n}$ be the observed response values and $\{\hat{y}_i\}_{i=1}^{n}$ be the fitted response values.
Define the error or residual $\epsilon_i = y_i - \hat{y}_i$ for $i = 1, \ldots, n$.
Then the sum-of-squares error associated with the data is given by
$$S = \sum_{i=1}^{n} \epsilon_i^2. \qquad (2)$$
Linear Least Squares
A linear model is defined as an equation that is linear in the coefficients.
Suppose that you have n data points that can be modeled by a first-order polynomial, given by
$$y = ax + b.$$
By (2), $\epsilon_i = y_i - (ax_i + b)$, so $S = \sum_{i=1}^{n} \big(y_i - (ax_i + b)\big)^2$.
The least-squares fitting process minimizes the summed square of the residuals.
The coefficients a and b are determined by differentiating S with respect to each parameter and setting the result equal to zero. (Why?)
Hence,
$$\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} x_i \big(y_i - (ax_i + b)\big) = 0,$$
$$\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \big(y_i - (ax_i + b)\big) = 0.$$
The normal equations are defined as
$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i,$$
$$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i.$$
In fact,
$$\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{bmatrix}.$$
Solving for a and b, we have
$$a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},$$
where Cov(X, Y) refers to the covariance between X and Y. Also, we have
$$b = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i\right).$$
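These closed-form formulas can be checked directly in MATLAB; a minimal sketch (the data here are synthetic, and polyfit is used only for comparison):

```matlab
x = (1 : 10)';
y = 3 * x + 2 + 0.5 * randn(10, 1); % noisy line; true a = 3, b = 2
n = length(x);
a = (n * sum(x .* y) - sum(x) * sum(y)) / (n * sum(x .^ 2) - sum(x) ^ 2);
b = (sum(y) - a * sum(x)) / n;
p = polyfit(x, y, 1); % p(1) and p(2) should agree with a and b
```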
Example: Drag Coefficients
Let v be the velocity of a moving object and k be a positive constant.
The drag force due to air resistance is proportional to the square of the velocity, that is, $d = kv^2$.
In a wind tunnel experiment, the velocity v can be varied by setting the speed of the fan and the drag can be measured directly.
The following sequence of commands replicates the data one might receive from a wind tunnel:
clear; clc;
% main
v = [0 : 1 : 60]';                   % column vector of velocities
d = 0.1234 * v .^ 2;                 % analytic drag
dn = d + 0.4 * v .* randn(size(v));  % noisy measurements
figure(1), plot(v, dn, '*', v, d, 'r-'); grid on;
legend('Data', 'Analytic');
The unknown coefficient k is to be determined by the method of least squares.
The formulation could be
$$\begin{aligned} v_1^2 k &= dn_1 \\ v_2^2 k &= dn_2 \\ &\ \ \vdots \\ v_{61}^2 k &= dn_{61}. \end{aligned}$$
Recall that for any matrix A and vector b with Ax = b, x = A\b returns the least-squares solution.
>> k = v .^ 2 \ dn % least-squares estimate of k

k =

    0.1239
Generalized Linear Least Squares
In matrix form, linear models are given by the formula
$$y = Xb + \epsilon,$$
where y is an n-by-1 vector of responses, X is the n-by-m model matrix, b is an m-by-1 vector of coefficients, and $\epsilon$ is an n-by-1 vector of errors.
Then the normal equations are given by
$$(X^T X)b = X^T y,$$
where $X^T$ is the transpose of X. So, $b = (X^T X)^{-1} X^T y$.
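In MATLAB, this can be checked with a small sketch (the model matrix here, for a straight-line fit, is our own example):

```matlab
x = (0 : 9)';
y = 2 * x + 1 + 0.1 * randn(10, 1); % synthetic data for y = 2x + 1
X = [x, ones(10, 1)];       % model matrix for y = b(1)*x + b(2)
b1 = (X' * X) \ (X' * y);   % solve the normal equations
b2 = X \ y;                 % backslash; should agree with b1
```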