FINITE DIFFERENCE METHODS
FOR
SOLVING DIFFERENTIAL EQUATIONS
I-Liang Chern
Department of Mathematics National Taiwan University
Contents
1 Introduction 3
1.1 Finite Difference Approximation . . . 3
1.2 Basic Numerical Methods for Ordinary Differential Equations . . . 5
1.3 Runge-Kutta methods . . . 8
1.4 Multistep methods . . . 12
1.5 Linear difference equation . . . 16
1.6 Stability analysis . . . 20
1.6.1 Zero Stability . . . 20
1.6.2 Absolute Stability . . . 24
2 Finite Difference Methods for Linear Parabolic Equations 25 2.1 Finite Difference Methods for the Heat Equation . . . 25
2.1.1 Some discretization methods . . . 25
2.1.2 Stability and Convergence for the Forward Euler method . . . 27
2.2 L2Stability – von Neumann Analysis . . . 28
2.3 Energy method . . . 30
2.4 Stability Analysis via Entropy Estimates . . . 31
2.5 Entropy estimate for backward Euler method . . . 32
2.6 Existence Theory . . . 35
2.6.1 Existence via forward Euler method . . . 35
2.6.2 A Sharper Energy Estimate for backward Euler method . . . 36
2.7 Relaxation of errors . . . 37
2.8 Boundary Conditions . . . 40
2.8.1 Dirichlet boundary condition . . . 40
2.8.2 Neumann boundary condition . . . 42
2.9 The discrete Laplacian and its inversion . . . 43
3 Finite Difference Methods for Linear Elliptic Equations 47 3.1 Discrete Laplacian in two dimensions . . . 47
3.1.1 Discretization methods . . . 47
3.1.2 The 9-point discrete Laplacian . . . 48
3.2.1 Fourier method . . . 49
3.2.2 Energy method . . . 50
4 Finite Difference Theory For Linear Hyperbolic Equations 53 4.1 A review of smooth theory of linear hyperbolic equations . . . 53
4.1.1 Linear advection equation . . . 53
4.1.2 Linear systems of hyperbolic equations . . . 55
4.2 Finite difference methods for linear advection equation . . . 58
4.2.1 Design techniques . . . 58
4.2.2 Courant-Friedrichs-Levy condition . . . 61
4.2.3 Consistency and Truncation Errors . . . 61
4.2.4 Lax’s equivalence theorem . . . 62
4.2.5 Stability analysis . . . 63
4.2.6 Modified equation . . . 65
4.3 Finite difference schemes for linear hyperbolic system with constant coefficients . . 68
4.3.1 Some design techniques . . . 68
4.3.2 Stability analysis . . . 69
4.4 Finite difference methods for linear systems with variable coefficients . . . 71
5 Scalar Conservation Laws 75 5.1 Physical models . . . 75
5.1.1 Traffic flow model . . . 75
5.1.2 Burgers’ equation . . . 76
5.1.3 Two phase flow . . . 78
5.2 Basic theory . . . 78
5.2.1 Riemann problem . . . 79
5.2.2 Entropy conditions . . . 80
5.2.3 Rieman problem for nonconvex fluxes . . . 83
5.3 Uniqueness and Existence . . . 83
6 Finite Difference Schemes For Scalar Conservation Laws 87 6.1 Major problems . . . 87
6.2 Conservative schemes . . . 88
6.3 Entropy and Monotone schemes . . . 90
7 Finite Difference Methods for Hyperbolic Conservation Laws 95 7.1 Flux splitting methods . . . 96
7.1.1 Total Variation Diminishing (TVD) . . . 96
7.1.2 Other Examples for φ(θ) . . . 98
7.1.3 Extensions . . . 99
7.2 High Order Godunov Methods . . . 100
7.2.1 Piecewise-constant reconstruction . . . 101
7.3 Multidimension . . . 107
7.3.1 Splitting Method . . . 108
7.3.2 Unsplitting Methods . . . 109
8 Systems of Hyperbolic Conservation Laws 111 8.1 General Theory . . . 111
8.1.1 Rarefaction Waves . . . 112
8.1.2 Shock Waves . . . 112
8.1.3 Contact Discontinuity (Linear Wave) . . . 115
8.2 Physical Examples . . . 116
8.2.1 Gas dynamics . . . 116
8.2.2 Riemann Problem of Gas Dynamics . . . 119
9 Kinetic Theory and Kinetic Schemes 127 9.1 Kinetic Theory of Gases . . . 127
Chapter 1
Introduction
The goal of this course is to introduce theoretical analysis of finite difference methods for solving partial differential equations. The focuses are the stability and convergence theory. The partial differential equations to be discussed include
• parabolic equations, • elliptic equations,
• hyperbolic conservation laws.
1.1
Finite Difference Approximation
A finite difference approximation is to approximate differential operators by finite difference oper-ators, which is a linear combination of u on discrete points. For example,
• Forward difference: D+u(x) := u(x+h)−u(x)h ,
• Backward difference: D−u(x) := u(x)−u(x−h)h ,
• Centered difference: D0u(x) := u(x+h)−u(x−h)2h .
Here, h is called the mesh size. By Taylor expansion, we can get • u0(x) = D+u(x) + O(h),
• u0(x) = D−u(x) + O(h),
• u0(x) = D0u(x) + O(h2).
These formulae can be derived by performing Taylor expansion of u at x. For instance, we expand u(x + h) = u(x) + u0(x)h +h 2 2 u 00(x) + h3 3!u 000(x) + · · ·
u(x − h) = u(x) − u0(x)h +h 2 2 u 00 (x) − h 3 3!u 000 (x) + · · · . Subtracting these two equations yields
u(x + h) − u(x − h) = 2u0(x)h +2h 3 3! u 000 (x) + · · · . This gives u0(x) = D0u(x) − h2 3!u 000 (x) + · · · = D0u(x) + O(h2).
Thus, u0(x) can be approximated by several difference operators. Indeed, we can approximate u0(x) by finite difference operators which involve u on more discrete points with higher order errors. For example,
u0(x) = D3u(x) + O(h3),
where
D3u(x) =
1
6h(2u(x + h) + 3u(x) − 6u(x − h) + u(x − 2h)) .
This formula can be derived by taking Taylor expansion of u(x + h), u(x − h), u(x − 2h) about x, then making proper combination to cancel 0th, and 2nd derivatives term. That is
u(x + h) = u(x) + u0(x)h +h 2 2 u 00(x) + h3 3!u 000(x) + · · · u(x − h) = u(x) − u0(x)h +h 2 2 u 00(x) − h3 3!u 000(x) + · · · u(x − 2h) = u(x) − 2u0(x)h + 4h 2 2 u 00(x) − 8h3 3! u 000(x) + · · ·
Taking the combination 2u(x + h) + 3u(x) − 6u(x − h) + u(x − 2h), we can cancel the zeroth, second derivatives and obtain u0(x) = D3u(x) + O(h3).
In general, suppose we are given the values of u at discrete points {xj}. These discrete points
are called grid points. We want to approximation u(k) at a specific point ¯x by the values of u at some of these grid points, say xj, j = 0, ..., n with n ≥ k. That is,
u(k)(¯x) =
n
X
j=0
cju(xj) + O(hp−k+1)
Here, the mesh size h denotes max{|xi− xj|}. The parameter p ≥ k is some positive integer. We
want to design cj so that p is as larger as possible. As we shall see later that we can choose p = n.
To find the coefficients cj, j = 0, ..., n, we make Taylor expansion of u(xj) about the point x:
u(xj) = p X i=0 1 i!(xj− ¯x) iu(i)(¯x) + O(hp+1).
We plug this expansion formula of u(xj) into the finite difference approximation formula for u(k)(x): u(k)(¯x) = n X j=0 cj p X i=0 1 i!(xj− ¯x) iu(i)(¯x) + O(hp−k+1).
Comparing both sides, we obtain
n X j=0 (xj− ¯x)i i! cj = 1 if i = k 0 otherwise , for i = 0, ..., p.
There are p + 1 equations here, it is natural to choose p = n to match the n + 1 unknowns. This is a n×n Vandermonde system. It is nonsingular if xiare different. The matlab code fdcoeffV(k,xbar,x)
can be used to compute these coefficients. Reference: Randy LeVeque’s book and his Matlab code. In the case of uniform grid, using central finite differencing, we can get high order approxima-tion by using less grid points. For instance, let xj = jh, where j ∈ Z. Let uj = u(xj). Then
u0(0) = u1− u−1 h + O(h 2) u00(0) = u1− 2u0+ u−1 h2 + O(h 2) u(3) = 1 2h3(u2− 2u1+ 2u0− 2u−1+ u−2) + O(h 2). Homeworks.
1. Consider xi = ih, i = 0, ..., n. Let ¯x = xm. Find the coefficients ci for u(k)(¯x) and the
coefficient of the leading truncation error for the following cases: • k = 1, n = 2, 3, m = 0, 1, 2, 3.
• k = 2, n = 2, m = 0, 1, 2.
1.2
Basic Numerical Methods for Ordinary Differential Equations
The basic assumption to design numerical algorithm for ordinary differential equations is the smooth-ness of the solutions, which is in general valid provided the coefficients are also smooth. Basic designning techniques include numerical interpolation, numerical integration, and finite difference approximation.
Euler method
Euler method is the simplest numerical integrator for ODEs. The ODE
is discretized by
yn+1= yn+ kf (tn, yn). (1.2) Here, t0, ..., tnare the grid points of time t. k = tn+1− tnis time step size of the discretization.
This method is called the forward Euler method. It simply replace dy/dt(tn) by the forward finite difference (yn+1− yn)/k. Given a smooth solution y(·), by Taylor expansion, we define the local
truncation error to be
τn:= y0(tn) −y(t
n+1) − y(tn)
k = O(k).
We are interested in the true error, which is defined to be en:= yn− y(tn). We have the following
convergence theorem.
Theorem 1.1. Assume f ∈ C1 and suppose the solutiony0 = f (t, y) with y(0) = y0 exists on
[0, T ]. Then the Euler method converges at any t ∈ [0, T ]. In fact, the true error en has the following estimate:
|en| ≤ eλt
λ O(k) → 0, as n → ∞. (1.3) Here,λ = max |∂f /∂y| and nk = t.
Proof. From the regularity of the solution, we have y ∈ C2[0, T ] and
y(tn+1) = y(tn) + kf (tn, y(tn)) + kτn. (1.4) Taking difference of (1.2) and (1.4), we obtain
|en+1| ≤ |en| + k|f (tn, yn) − f (tn, y(tn))| + k|τn| ≤ (1 + kλ)|en| + k|τn|.
where
|f (t, x) − f (t, y)| ≤ λ|x − y|.
The finite difference inequality has a fundamental solution Gn = (1 + λk)n, which is positive provided k is small. Multiplying above equation by (1 + λk)−n−1, we obtain
|em+1|G−m−1≤ |em|G−m+ kG−m−1|τm|. Summing in m from m = 0 to n − 1, we get
|en| ≤ n−1 X m=0 Gn−m−1k|τm| ≤ n−1 X m=0 GmO(k2) = G n− 1 G − 1 O(k 2) ≤ Gn λ O(k) ≤ eλt λ O(k), where t = nk and we have used (1 + λk)n≤ eλt.
1. The theorem states that the numerical method converges in [0, T ] as long as the solutions of the ODE exists.
2. The error is O(k) if the solution is in C2[0, T ].
3. The proof above relies on the existence and smoothness of the solution. However, one can also use this approach to prove the local existence theorem by showning the approximate solutions generated by the Euler method form a Cauchy sequence.
Backward Euler method
In many applications, the system is relaxed to a stable solution in a very short period of time. For instance, consider
y0 = y − y¯ τ .
The corresponding solution y(t) → ¯y as t ∼ O(τ ). In the above forward Euler method, practically, we should require
1 + kλ ≤ 1
in order to have Gnremain bounded. Here, λ is the Lipschitz constant. In the present case, λ = 1/τ . If τ is very small, the the above forward Euler method will require very small k and lead to inefficient computation. In general, forward Euler method is inefficient (require small k) if
max ∂f (t, y) ∂y >> 1. There are two possibilities:
• ∂f /∂y >> 1: In this case, we need to choose a very small k in order to resolve details. • ∂f /∂y << −1.
For the second case, the backward Euler method is recommended: yn+1= yn+ kf (tn+1, yn+1). Comparing the Taylor expansion of the exact solution at tn+1:
y(tn+1) = y(tn) + kf (tn+1, y(tn+1)) + O(k), we get that the true error en:= yn− y(tn) satisfies
en+1= en+ k∆f + O(k2) = en+ k ∂f ∂y(t, ¯y) en+1+ O(k2). 1 − k ∂f ∂y(t, ¯y) en+1= en+ O(k2).
Since ∂f∂y(t, ¯y) << −1, we take λ = min ∂f ∂y(t, ¯y) , we get 1 − k ∂f ∂y(t, ¯y) ≥ (1 + λk). Thus, (1 + λk)|en+1| ≤ |en| + O(k2)
The corresponding fundamental solution is Gn:= (1 + λk)−n. Notice that the error satisfies |en| ≤ n−1 X m=0 (1 + λk)−mO(k2) ≤ (1 + λk) −n+1 λk O(k 2) ≤ e −λT λ O(k).
There is no restriction on the size of k. However, the price to pay is that we need to solve a nonlinear equation
yn+1= yn+ kf (tn+1, yn+1) for yn+1at each time step.
Leap frog method
We integrate y0 = f (t, y) from tn−1to tn+1: y(tn+1) − y(tn−1) =
ˆ tn+1
tn−1
f (τ, y(τ )) dτ. We apply the midpoint rule for numerical integration, we then get
y(tn+1) − y(tn−1) = 2kf (tn, y(tn)) + O(k3). The midpoint method (or called leapfrog method) is
yn+1− yn−1= 2kf (tn, yn). (1.5) Homeworks.
1. Prove the convergence theorem for the leap-frog method for ODE. Hint: consider the system y1n= yn−1and y2n= yn.
1.3
Runge-Kutta methods
The Runge-Kutta method (RK) is a strategy to integrate´ttnn+1f dτ by some quadrature method.
RK2: A second order RK, denoted by RK2, is based on the trapezoidal rule of numerical integra-tion. First, we integrate the ODE y0 = f (t, y) to get
y(tn+1) − yn= ˆ tn+1
tn
f (τ, y(τ )) dτ. Next, this integration is approximated by
ˆ tn+1
tn
f (τ, y(τ )) dτ = k 2 f (t
n, yn) + f (tn+1, yn+1) + O(k3).
The latter term involves yn+1. An explicit Runge-Kutta method approximate yn+1by yn+kf (tn, yn). Thus, RK2 reads ξ1 = f (tn, yn) yn+1 = yn+k 2(f (t n, yn) + f (tn+1, yn+ kξ 1)).
Another kind of RK2 is based on the midpoint rule of integration. It reads ξ1 = f (tn, yn)
yn+1 = yn+ kf (tn+1/2, yn+k 2ξ1) The truncation error of RK2 is
y(tn+1) − y(tn) = yn+1− y(tn) + O(k3).
RK4 The 4th order Runge-Kutta method uses Simpson’s rule to approximate integration: ˆ tn+1
tn
f (t, y(t)) dt = k 6
f (tn, y(tn)) + 4f (tn+1/2, y(tn+1/2)) + f (tn+1, y(tn+1))+ O(k5). The quantity y(tn+1/2) is approximated by forward Euler method. It has the form
yn+1 = yn+k 6(ξ1+ 2ξ2+ 2ξ3+ ξ4) ξ1 = f (tn, yn) ξ2 = f (tn+ 1 2k, y n+k 2ξ1) ξ3 = f (tn+ 1 2k, y n+k 2ξ2) ξ4 = f (tn+ k, yn+ kξ3)
The truncation error of RK4 is
General explicit Runge-Kutta methods The method takes the following general form yn+1= yn+ k s X i=1 biξi, where ξ1 = f (tn, yn), ξ2 = f (tn+ c2k, yn+ ka21ξ1), ξ3 = f (tn+ c3k, yn+ ka31ξ1+ ka32ξ2), .. . ξs = f (tn+ csk, yn+ k(as1ξ1+ · · · + as,s−1ξs−1)).
We need to specify s (the number of stages), the coefficients aij(1 ≤ j < i ≤ s), bi(i = 1, ..., s)
and ci(i = 2, ..., s). We list them in the following Butcher table.
There are s(s − 1)/2 + s + (s − 1) unknowns to be determined for a specific scheme. We require the 0 c2 a21 c3 a31 a32 .. . ... . .. cs as1 as2 · · · as,s−1 b1 b2 · · · bs−1 bs
truncation error to be O(kp+1). To find these coefficients, we need to expand the truncation error formula
y(tn+1) − yn= yn+1− yn+ O(kp+1)
about (tn, yn) in terms of derivatives of y(·) at tn. Then we can get linear equations for the
coeffi-cients.
Adaptive Runge-Kutta (Runge-Kutta-Fehlberg method, ODE45) The adaptive Runge-Kutta method is designed to be able to estimate local truncation error in each time step. From which, we can adjust time step size to have roughly uniform truncation error in each step. This is done by using two RK methods with the same sets of aij and ci but different bi, b∗i. The set bi produces
RK method of order p. The auxiliary set b∗i produces a RK method with order p − 1. It is used to estimate the local truncation by
yn+1− yn+1,∗= h
s
X
i=1
(bi− b∗i)ki= O(hp)
The step size h is then estimated so that the truncation error is roughly the same in each time step. Below is the Butcher table for RK5 and RK4 (b∗).
0 1/4 1/4 3/8 3/32 9/32 12/13 1932/2197 −7200/2197 7296/2197 1 439/216 −8 3860/513 −845/4104 1/2 −8/27 2 −3544/2565 1859/4104 −11/40 b 16/135 0 6656/12825 28561/56430 −9/50 2/55 b∗ 25/216 0 1408/2565 2197/4104 −1/5 0
Convergence proof, an example Let us see the proof of the convergence of the two stage Runge-Kutta method. The scheme can be expressed as
yn+1= yn+ kΨ(yn, tn, k) (1.6) where
Ψ(yn, tn, k) := f (y + 1
2kf (y)). (1.7)
Suppose y(·) is a true solution, the corresponding truncation error τn:= y(t
n+1) − y(tn)
k − Ψ(y(t
n), tn, k) = O(k2)
Thus, the true solution satisfies
y(tn+1) − y(tn) = kΨ(y(tn), tn, k) + kτn The true error en:= yn− y(tn) satisfies
en+1= en+ k(Ψ(yn, tn, k) − Ψ(y(tn), tn, k)) − kτn. This implies
|en+1| ≤ |en| + kλ0|en| + k|τn|,
where λ0 is the Lipschitz constant of Ψ(y, t, k) with respect to y. Hence, we get |en| ≤ (1 + kλ0)n|e0| + k n−1 X m=0 (1 + kλ0)n−1−m|τm| ≤ eλ0t|e0| +e λ0t λ0 maxm |τ m| Reference:
• Lloyd N. Trefethen, Finite Difference and Sprectral Methods for Ordinary and Partial Differ-ential Equations,
• Randy LeVeque,
1.4
Multistep methods
The idea of multi-step method is to derive a relation between, for instance, yn+1, yn, yn−1, y0nand y0n−1so that the corresponding truncation is small. The simplest multistep method is the midpoint method. Suppose ynand yn−1is given. The new state yn+1is defined by
yn+1− yn−1= 2ky0n= 2kf (tn, yn) The truncation error is
τn:= 1 k y(t
n+1) − y(tn−1) − 2ky0(tn) = O(k2).
Thus, the method is second order.
We can also design a method which involves yn+1, yn, yn−1and y0n, y0n−1. For instance, yn+1= yn+k
2(3f (y
n) − f (yn−1))
The truncation error τn:= 1 k yn+1− yn+ k 2(3f (y n) − f (yn−1)) = O(k2).
A general r-step multistep method involves (yn+1, yn, ..., yn+1−r) and (y0n+1, y0n, ..., y0n+1−r). It can be written as r X m=0 amyn+1−r+m= k r X m=0 bmy0n+1−r+m= k r X m=0 bmfn+1−r+m. (1.8)
We will always assume ar 6= 0. Because it is the coefficient corresponding to yn+1, which is what
we want to find. When br= 0 the method is explicit; otherwise it is implicit. For a smooth solution
of (1.1), we define the truncation error τnto be τn:= 1 k r X m=0 amy(tn+1−r+m) − k r X m=0 bmy0(tn+1−r+m) !
Definition 1.1. A multi-step method is called of order p if τn= O(kp) uniformly in n. It is called consistent if τn(k) → 0 uniformly in n as k → 0.
Remark. When f is smooth, the solution of ODE y0 = f (t, y) is also smooth. Then the truncation is a smooth function of k. In this case, τ (k) → 0 is equivalent to τ (k) = O(k) as k → 0.
Initial setup An r-step multi-step method needs (y0, y1, ..., yr−1)T to start. There is only y0 given initially. We need to construct y1, ..., yr−1 by other methods. For instance RK methods. In
order to maintain the order of accuracy, we should use a method of p − 1 order. This will give initial error yi− y(ti) = O(kp) for i = 0, ..., r − 1.
Derivation of multistep method of order p For notational convenience, let us extend a’s and b’s by setting am= 0, bm= 0 for m > r. Taking Taylor expansion about tn+1−r, we get
kτn = r X m=0 am ∞ X j=0 1 j!y (j)(mk)j− k r X m=0 bm ∞ X j=1 1 (j − 1)!y (j)(mk)j−1 = r X m=0 am ! y(0)+ ∞ X j=1 1 j! r X m=0 mjam− jmj−1bm kjy(j) = r X m=0 am ! y(0)+ ∞ X j=1 1 j! r X m=0 mj−1(mam− jbm) kjy(j) = ∞ X j=0 1 j! X m=0 mj−1(mam− jbm) kjy(j) = ∞ X j=0 Cjy(j).
Here, the derivatives of y(·) are evaluated at tn+1−r. We list few equations for the coefficients a and b: C0 = a0+ · · · + ar C1 = (a1+ 2a2+ · · · rar) − (b0+ · · · + br) C2 = 1 2 (a1+ 2 2a 2+ · · · r2ar) − 2(b1+ · · · + rbr) .. . Cp = r X m=0 mp p! am− r X m=1 mp−1 (p − 1)!bm To obtain a scheme of order p, we need to require
Cj = 0, for j = 0, ..., p.
There are 2(r + 1) unknowns for the coefficients {am}rm=0, {bm}rm=0. In principle, we can choose
p = 2r + 1 to have the same number of equations. Unfortunately, there is some limitation from stability criterion which we shall be explained in the next section. The order of accuracy p should satisfy p ≤ r + 2 if r is even, r + 1 if r is odd, r if it is an explicit scheme.
This is the first Dahlquist stability barrier. We shall not discuss here. See Trefethen’s book, or Stiff Equation in Wiki, Let us see some concrete examples below.
Explicit Adams-Bashforth schemes When br= 0, the method is explicit. Here are some
exam-ples of the explicit schemes called Adams-Bashforth schemes, where ar = 1:
• 1-step: yn+1= yn+ kf (yn)
• 2-step: yn+1= yn+k2(3f (yn) − f (yn−1)) • 3 step: yn+1 = yn+ k
12(23f (yn) − 16f (yn−1) + 5f (yn−2))
The step size is r and the order is p = r.
Implicit Adams-Moulton schemes Another examples are the Adams-Moulton schemes, where br6= 0 and and the step size r = p
• 1-step: yn+1= yn+k2(f (yn+1) + f (yn))
• 2-step: yn+1= yn+12k(5f (yn+1) + 8f (yn) − f (yn−1))
• 3 step: yn+1 = yn+24k(9f (yn+1) + 19f (yn) − 5f (yn−1) + f (yn−2))
Sometimes, we can use an explicit scheme to guess yn+1 as a predictor in an implicit scheme. Such a method is called a predictor-corrector method. A standard one is the following Adams-Bashforth-Moulton scheme: Its predictor part is the Adams-Bashforth scheme:
ˆ
yn+1= yn+ k
12(23f (y
n) − 16f (yn−1) + 5f (yn−2))
The corrector is the Adams-Moulton scheme: yn+1= yn+ k
24(9f (ˆy
n+1) + 19f (yn) − 5f (yn−1) + f (yn−2))
The predictor-corrector is still an explicit scheme. However, for stiff problem, we should use im-plicit scheme instead.
Matlab codes are available on Wikiversity with key words “Bashforth and Adams-Moulton methods.”
Formal algebra Let us introduce the shift operator Zyn= yn+1, or in continuous sense, Zy(t) = y(t + k). Let D be the differential operator. The Taylor expansion
y(t + k) = y(t) + ky0(t) + 1 2!k
2D2y(t) + · · ·
can be expressed formally as Zy = 1 + (kD) + 1 2!(kD) 2+ · · · y = ekDy.
The multistep method can be expressed as
Ly := (a(Z) − kb(Z)D) y =a(ekD) − kDb(ekD)y = (C0+ C1(kD) + · · ·) y.
Here, a(Z) = r X m=0 amZm, b(Z) = r X m=0 bmZm
are the generating functions of {am} and {bm}. A multistep method is of order p means that
a(ekD) − kDb(kD)
y = O((kD)p+1)y. We may abbreviate kD by a symbol κ. The above formula is equivalent to
a(eκ) − κb(eκ) = O(κp+1). Or equivalently,
a(eκ)
b(eκ) = κ + O(κ
p+1) as κ → 0. (1.9)
We have the following theorem
Theorem 1.2. A multistep method with b(1) 6= 0 is of order p if and only if a(z)
b(z) = log z + O((z − 1)
p+1) as z → 1.
It is consistent if and only if
a(1) = 0 and a0(1) = b(1).
Proof. The first formula can be obtain from (1.9) by writing eκ = z. For the second formula, we expand log z about 1. We can get
a(z) = b(z) (z − 1) −(z − 1) 2 2 + (z − 1)3 3 + · · · + O((z − 1)p+1). We also expand a(z) and b(z) about z = 1, we can get
a(1) + (z − 1)a0(1) = b(1)(z − 1) + O((z − 1)2). Thus, the scheme is consistent if and only if a(1) = 0 and a0(1) = b(1).
Homeworks.
1. Consider the linear ODE y0 = λy, derive the finite difference equation using multistep method involving yn+1, yn, yn−1and y0nand y0n−1for this linear ODE.
1.5
Linear difference equation
Second-order linear difference equation. In the linear case y0 = λy, the above difference scheme results in a linear difference equation. Let us consider general second order linear difference equation with constant coefficients:
ayn+1+ byn+ cyn−1= 0, (1.10) where a 6= 0. To find its general solutions, we try the ansatz yn= ρnfor some number ρ. Here, the n in ynis an index, whereas the n in ρnis a power. Plug this ansatz into the equation, we get
aρn+1+ bρn+ cρn−1= 0. This leads to
aρ2+ bρ + c = 0.
There are two solutions ρ1and ρ2. In case ρ1 6= ρ2, these two solutions are independent. Since the
equation is linear, any linear combination of these two solutions is again a solution. Moreover, the general solution can only depend on two free parameters, namely, once y0and y−1are known, then {yn}
n∈Zis uniquely determined. Thus, the general solution is
yn= C1ρn1 + C2ρn2,
where C1, C2are constants. In case of ρ1 = ρ2, then we can use the two solutions ρn2 and ρn1 with
ρ2 6= ρ1, but very closed, to produce another nontrivial solution:
lim
ρ2→ρ1
ρn2 − ρn 1
ρ2− ρ1
This yields the second solution is nρn−11 . Thus, the general solution is C1ρn1 + C2nρn−11 .
Linear finite difference equation of order r . We consider general linear finite difference equa-tion of order r:
aryn+r+ · · · + a0yn= 0, (1.11)
where ar 6= 0. Since yn+r can be solved in terms of yn+r−1, ..., ynfor all n, this equation together
with initial data y0, ..., y−r+1has a unique solution. The solution space is r dimensions.
To find fundamental solutions, we try the ansatz yn= ρn for some number ρ. Plug this ansatz into equation, we get
arρn+r+ · · · + a0ρn= 0,
for all n. This implies
a(ρ) := arρr+ · · · + a0 = 0. (1.12)
The polynomial a(ρ) is called the characteristic polynomial of (1.11) and its roots ρ1, ..., ρr are
• Simple roots (i.e. ρi 6= ρj, for all i 6= j): The fundamental solutions are ρni, i = 1, ..., r.
• Multiple roots: if ρi is a multiple root with multiplicity mi, then the corresponding
indepen-dent solutions
ρni, nρn−1i , C2nρn−2i ..., Cmni−1ρn−mi+1
i
Here, Ckn:= n!/(k!(n − k)!). The solution C2nρin−2can be derived from differentiationdρdC1nρn−1 at ρi.
In the case of simple roots, we can express general solution as yn= C1ρn1 + · · · + Crρnr,
where the constants C1, ..., Crare determined by
yk= C1ρk1+ · · · + Crρkr, k = 0, ..., r − 1.
System of linear difference equation. The above rth order linear difference equation is equiva-lent to a first order linear difference system:
A0yn+1= Ayn (1.13) where yn= y1n .. . yrn = yn−r+1 .. . yn A0 = I(r−1)×(r−1) 0 0 ar , A = 0 1 0 · · · 0 0 0 1 · · · 0 .. . ... ... . .. ... 0 0 0 · · · 1 −a0 −a1 −a2 · · · −ar−1
.
We may divide (1.13) by A0 and get
yn+1= Gyn.
We call G the fundamental matrix of (1.13). For this homogeneous equation, the solution is yn= Gny0
Next, we compute Gnin terms of eigenvalues of G.
In the case that all eigenvalues ρi, i = 1, ..., r of G are distinct, then G can be expressed as
G = TDT−1, D = diag (ρ1, · · · , ρr),
When the eigenvalues of G have multiple roots, we can normalize it into Jordan blocks: G = TJT−1, J = diag (J1, · · · , Js),
where the Jordan block Jicorresponds to eigenvalue ρiwith multiplicity mi:
Ji = ρi 1 0 · · · 0 0 ρi 1 · · · 0 .. . ... ... . .. ... 0 0 0 · · · 1 0 0 0 · · · ρi mi×mi . andPs
i=1mi = r. Indeed, this form also covers the case of distinct eigenvalues.
In the stability analysis below, we are concerned with whether Gnis bounnded. It is easy to see that Gn= TJnT−1, Jn= diag (Jn1, · · · , Jns) Jni = ρni nρn−1i C2nρn−2 · · · Cmni−1ρn−mi+1 i 0 ρni nρn−1i · · · Cmn i−2ρ n−mi+2 i .. . ... ... . .. ... 0 0 0 · · · nρn−1i 0 0 0 · · · ρni mi×mi . where Ckn:= k!(n−k)!n! .
Definition 1.2. The fundamental matrix G is called stable if Gn remains bounded under certain norm k · k for all n.
Theorem 1.3 (von Neumann). The fundamental matrix G is stable if and only if its eigenvalues satisfy the following condition:
either|ρ| = 1 and ρ is a simple root,
or|ρ| < 1 (1.14)
Proof. It is easy to see that the nth power of a Jordan form Jinis bounded if its eigenvalue |ρi| < 1
or if ρi| = 1 but simple. On the other hand, if |ρi| > 1 then Jinis unbounded; or if ρi| = 1 but not
simple, then the term nρn−1i in Jinwill be unbounded.
Corollary 1.1. There exists a norm in Rnsuch that the above root condition forG is equivalent to kGk ≤ 1 with this norm.
Proof. 1. First, in Rn(or Cn), we define kxk∞ = maxi|xi|. For a linear mapping G : Rn →
Rn, we define its operator norm under the k · k∞by
kGk∞:= sup x6=0
kGxk∞
kxk∞
It is an easy exercise that for G = (aij)n×n, the operator norm kGk∞= max i X j |aij|.
2. Second, a matrix G can be expressed as
G = TDT−1, D = diag (J1, · · · , Js)
where Ji are Jordan blocks. For any i 6= 0, we can further transform Jiinto
Ji = SiKiS−1i where Ki= ρi 0 · · · 0 0 ρi · · · 0 .. . ... ... . .. ... 0 0 0 · · · 0 0 0 · · · ρi mi×mi , Si= diag (1, i, ..., mi i−1).
Let S = diag(S1, ..., Ss), K = diag(K1, ..., Ks). We can express G as
G = TS K(TS)−1 We now define the new norm of G as
kGk := kKk∞
This means that we define the new norm k · k in Rnby kxk := k(TS)−1xk∞.
Since TS is invertible, this does define a norm in Rn. With this norm, the corresponding operator norm is kKk∞.
3. For those Ji with mi > 1, the stability condition requires that |ρi| < 1. We choose i such
that |ρi| + i ≤ 1. Then the corresponding kKik∞ ≤ 1. Thus, kGk ≤ 1 with the above
Nonhomogeneous linear finite difference system In general, we consider the nonhomogeneous linear difference system:
yn+1= Gyn+ fn (1.15)
with initial data y0. Its solution can be expressed as yn = Gyn−1+ fn−1 = G(Gyn−2+ fn−2) + fn−1 .. . = Gny0+ n−1 X m=0 Gn−1−mfm Homeworks.
1. Consider the linear ODE
y0 = λy
where λ considered here can be complex. Study the linear difference equation derived for this ODE by forward Euler method, backward Euler, midpoint. Find its general solutions. 2. Consider linear finite difference equation with source term
ayn+1+ byn+ cyn−1= fn Given initial data ¯y0and ¯y1, find its solution.
3. Find the characteristic roots for the Adams-Bashforth and Adams-Moulton schemes with steps 1-3 for the linear equation y0 = λy.
1.6
Stability analysis
There are two kinds of stability concepts.
• Zero stability: Fix t = nk, the computed solution yn remains bounded as n → ∞ (or equivalently, k → 0).
• Absolute stability: Fix k > 0, the computed solution ynremains bounded as n → ∞.
1.6.1 Zero Stability
Our goal is to develop general convergence theory for multistep finite difference method for ODE: y0 = f (t, y) with initial condition y(0) = y0. An r-step multistep finite difference scheme can be
expressed as Lyn= r X m=0 amyn+1−r+m− k r X m=0 bmf (tn+1−r+m, yn+1−r+m) = 0. (1.16)
Definition 1.3. The truncation error τn(k) for the above finite difference scheme is defined by τn(k) := 1 k r X m=0 amy(tn+1−r+m) − k r X m=0 bmf (tn+1−r+m, y(tn+1−r+m)) ! , where y(·) is a true solution of the ODE.
Definition 1.4. A difference scheme is called consistent if the corresponding truncation error τn(k) → 0 uniformly in n as the mesh size k → 0. The scheme is of order p if τn(k) = O(kp) uniform in n.
In the multistep method, the consistency is equivalent to τ (k) = O(k) because we assume y(·) is smooth and the truncation error is a smooth function in k. The consistency is τ (k) → 0 as k → 0. Thus the smoothness of τ implies τ (k) = O(k).
Definition 1.5. A difference scheme is called zero stable if its solutions at time step n remain bounded as the mesh size k → 0 (nk = T is fixed, according n → ∞).
The main theorem is the follows. We will postpone its proof at the end of this section.
Theorem 1.4 (Dahlquist equivalence theorem). For finite difference schemes for ODE y0 = f (t, y), consistency + zero-stability ⇔ convergence
Stability criterion Let us investigate the condition on the coefficients a and b of an explicit mul-tistep method for the stability
Lyn= 0
to be bounded. We may assume ar = 1 and br= 0. Let us write it in matrix form:
yn+1= Ayn+ kBfn where A = 0 1 0 1 . .. . .. 0 1
−a0 · · · −ar−2 −ar−1 , yn= yn−r · · · yn , B = 0 0 0 0 . .. ... 0 0 b0 · · · br−2 br−1 , fn= fn−r · · · fn ,
In order to have solution to be bounded for a multistep scheme Ly = 0 for arbitrary f , it has at least to be valid when f ≡ 0. In this case, we need to invetigate the boundedness for the homogeneous equation:
We have seen in the last section that
Theorem 1.5. The necessary and sufficient condition for kAnk to be bounded is that the charac-teristic rootsρiof the characteristic equationa(z) = 0 satisfies:
either|ρi| < 1 or|ρi| = 1 but simple.
Convergence ⇒ Stability
Proof. We only need to find an f such that the corresponding multistep is not stable implies that it does not converge. We choose f ≡ 0. * Since An is unbounded, which means there is an
eigenvalue ρi with eigenvector yi such that |ρi| > 1 or |ρi| = 1 but not simple. We discuss the
formal case. The latter case can also be prove easily. In the former case, let yi be the eigenvector
of A corresponding to the eigenvalue ρi which satisfies |ρi| > 1. Let us choose y0 and generate
y0 = (y0r−1, · · · , y0)T by some explicit scheme starting from y0. We can choose y0 such that its
component on yi is nonzero. Then the corresponding yn := Any0 will be unbounded. Hence it
cannot converge to a constant, as k → 0. On the other hand, y0 depends on the mesh size k and y0(k) → (y0, · · · , y0)T as k → 0. Thus, the method does not converge for f ≡ 0.
Convergence ⇒ Consistency
Proof. From Theorem 1.2, we need to show that a(1) = 0 and a0(1) = b(1). To show the first, we consider the ODE: y0 = 0 with y(0) = 1. For the second, we consider the ODE: y0 = 1 and y(0) = 0.
• Show a(1) = 0: We choose y0 = (1, · · · , 1)T. From y1= Ay0, we get yr= −a0y0− · · · − ar−1yr−1 = −a0− · · · − ar−1.
Since yris independent of k, and we should have yr → 1 as k → 0 (by convergence), we conclude that yr= 1. Thus, we get a(1) = a0+ · · · + ar−1+ 1 = 0.
• Show a0(1) = b(1). We choose f ≡ 1, y(0) = 0. The corresponding ODE solution is y(t) = t. The multistep method gives
a(Z)yn− kb(Z)1 = 0. (1.17) We write
a(Z) = a0(1)(Z − 1) + O((Z − 1)2), b(Z)1 = b(1).
*Suppose a multistep method is convergence for every smooth f , then in particular, for f ≡ 0. In this case, if this
Then the principal part of the above finite difference is (Z − 1)y − kb(1)
a0(1) = 0.
This is an arithmetic series. Its solution is yn= nkab(1)0(1). Indeed, this sequence also satisfies
(1.17) provided its initial data yn also has the form yn = nkab(1)0(1) for 0 ≤ n < r. Thus,
arithmetic series yn= nkab(1)0(1) is a solution of the difference equation (1.17). Since nk = t,
the convergence yn→ t as n → ∞ enforces ab(1)0(1) = 1.
Stability + Consistency ⇒ Convergence
Proof. We recall that we can express the scheme as
yn+1= Ayn+ kBfn.
Let Y be an exact solution, then plug it into the above scheme, we get Yn+1= AYn+ kBFn+ kτn,
where Yn:= (Y (tn−r), ...Y (tn))T. We subtract these two and call en:= Yn− yn. We get
en+1= Aen+ kB (Fn− fn) + kτn. The term Fn− fncan be repressed as
Fn− fn = (f (Yn−r) − f (yn−r), · · · , f (Yn) − f (yn))T = (L−ren−r, · · · , L0en)T := Lnen where L−m := ˆ 1 0 f0(yn−m+ ten−m) dt. Thus, we get en+1 = (A + kBLn) en+ kτn = Gn(k)en+ kτn
with C independent of n and k. The reason is the follows. First, we assume that f is Lipschitz. Thus, the functions L−m above are uniformly bounded (independent of n). Hence the term kBLk
Lemma 1.1. If kAnk is bounded and kBnk are uniformly bounded, then the product
k(A + 1
nB1) · · · (A + 1 nBn)k is also uniformly bounded.
We have en ≤ Gn−1en−1+ kτn−1 ≤ Gn−1Gn−2en−2+ k Gn−2τn−2+ τn−1 ≤ Gn−1Gn−2· · · G0e0 +k Gn−2· · · G0τ0+ · · · + Gn−2τn−2+ τn−1
From the lemma, we get
kenk ≤ Cke0k + nkC max
n kτ
nk ≤ Cke0k + O(kp).
Proof of Lemma 1.1
Proof. 1. We have seen that kAnk is uniformly bounded under some norm is equivalent to kAk ≤ 1 for some other operator norm. Thus, we may just assume kAk ≤ 1.
2. Since all norms in finite dimension are equivalent, we may assume kBik ≤ b for all i =
1, ..., n. 3. We have k(A + 1 nB1) · · · (A + 1 nBn)k ≤ (kAk + b n) n≤ (1 + b n) n≤ exp(b).
Theorem 1.6 (First Dahlquist barrier). A zero-stable and linear r-step multistep method with p order of convergence should satisfies
p ≤ r + 2 if r is even, r + 1 if r is odd, r if it is an explicit scheme.
For proof, see pp. Hairer, Norsett, Wanner, Solving Ordinary Differential Equations, 384-387.
1.6.2 Absolute Stability
See Randall LeVeque, Finite Difference Methods for Ordinary and Partial Differential Equations, Chapter 7.
Chapter 2
Finite Difference Methods for Linear
Parabolic Equations
2.1
Finite Difference Methods for the Heat Equation
2.1.1 Some discretization methods
Let us start from the simplest parabolic equation, the heat equation: ut= uxx
Let h = ∆x, k = ∆t be the spatial and temporal mesh sizes. Define xj = jh, j ∈ Z and tn= nk,
n ≥ 0. Let us abbreviate u(xj, tn) by unj. We shall approximate unj by Ujn, where Ujnsatisfies some
finite difference equations.
Spatial discretization : The simplest one is that we use centered finite difference approximation for uxx:
uxx=
uj+1− 2uj+ uj−1
h2 + O(h 2)
This results in the following systems of ODEs ˙ Uj(t) = Uj+1(t) − 2Uj(t) + Uj−1(t) h2 or in vector form ˙ U = 1 h2AU where U = (U0, U1, ...)t, A = diag (1, −2, 1).
Homeworks.
1. Derive the 4th order centered finite difference approximation for uxx:
uxx =
1
h2(−uj−2+ 16uj−1− 30uj+ 16uj+1− uj+2) + O(h 4).
2. Derive a 2nd order centered finite difference approximation for (κ(x)ux)x.
Temporal discretization We can apply numerical ODE solvers • Forward Euler method:
Un+1= Un+ k h2AU
n (2.1)
• Backward Euler method:
Un+1= Un+ k h2AU n+1 (2.2) • 2nd order Runge-Kutta (RK2): Un+1− Un= k h2AU n+1/2, Un+1/2= Un+ k 2h2AU n (2.3) • Crank-Nicolson: Un+1− Un= k 2h2(AU n+1+ AUn). (2.4)
These linear finite difference equations can be solved formally as Un+1= GUn where • Forward Euler: G = 1 +hk2A, • Backward Euler: G = (1 −hk2A)−1, • RK2: G = 1 +hk2A +12 hk2 2 A2 • Crank-Nicolson: G = 1+ k 2h2A 1− k 2h2A
For the Forward Euler, We may abbreviate it as
Ujn+1= G(Uj−1n , Ujn, Uj+1n ), (2.5) where
G(Uj−1, Uj, Uj+1) = Uj +
k
2.1.2 Stability and Convergence for the Forward Euler method
Our goal is to show under what condition can Ujnconverges to u(xj, tn) as the mesh sizes h, k → 0.
To see this, we first see the local error a true solution can produce. Plug a true solution u(x, t) into (2.1). We get un+1j − unj = k h2 u n j+1− 2unj + unj−1 + kτjn (2.6) where τjn= Dt,+unj − (ut)nj − (D+D−unj − (uxx)nj) = O(k) + O(h2).
Let enj denote for unj − Un
j. Then subtract (2.1) from (2.6), we get
en+1j − enj = k h2 e
n
j+1− 2enj + enj−1 + kτjn. (2.7)
This can be expressed in operator form:
en+1 = Gen+ kτn. (2.8) kenk ≤ kGen−1k + kkτn−1k
≤ kG2en−2k + k(kGτn−2k + kτn−1k)
≤ kGne0k + k(kGn−1τ0k + · · · + kGτn−2k + kτn−1k) Suppose G satisfies the stability condition
kGnU k ≤ CkU k
for some C independent of n. Then
kenk ≤ Cke0k + C max
m |τ m|.
If the local truncation error has the estimate max
m kτ
mk = O(h2) + O(k)
and the initial error e0 satisfies
ke0k = O(h2),
then so does the global true error satisfies
kenk = O(h2) + O(k) for all n.
The above analysis leads to the following definitions and the equivalence theorem.
Definition 2.1. A finite difference method is called consistent if its local truncation error τ satisfies kτh,kk → 0 as h, k → 0.
Definition 2.2. A finite difference scheme Un+1= Gh,k(Un) is called stable in a region (h, k) ∈ R
if there exists a norm k · k such that
kGnh,kU k ≤ CkU k for all n > 0. Here, C is a constant independent of n.
Definition 2.3. A finite difference method is called convergence if the true error keh,kk → 0 as h, k → 0.
In the above analysis, we have seen that for forward Euler method for the heat equation, Theorem 2.1. The forward Euler method for the heat equation has the property:
stability + consistency ⇔ convergence.
We have already proven stability + consistency ⇒ convergence. The proof of the other way is the same as the Dahlquist equivalent theorem 1.4.
2.2
L
2Stability – von Neumann Analysis
We have seen from the above discussion that the convergence issue is reduced to the stability issue. In the stability analysis, we need to choose a norm to measure stability of the amplification operator G, we will choose operator norm in L2. For constant coefficient case, the von Neumann analysis (via Fourier method) provides a necessary and sufficient condition for stability. For more general cases such as variable coefficient cases, Kreiss’ matrix theorem provides a good characterization of stability.
Below, we describe L2stability analysis. We introduce two equivalent methods: Fourier method and energy method, the Fourier method is also known as the von Neumann stability analysis. Given {Uj}j∈Z, let us define
kU k2=X
j∈Z
|Uj|2
and its Fourier transform ˆ U (ξ) = 1 2π X j∈Z Uje−ijξ, ξ ∈ [0, 2π).
There are two advantages to analyze stability of a finite difference scheme using Fourier method. • The shift operator is transformed to a multiplier:
d
T U (ξ) = eiξU (ξ),ˆ
• The Parseval equality kU k2 = k ˆU k2 ≡ ˆ π −π | ˆU (ξ)|2dξ
holds, thus one can control the L2-norm of U and GU in the Fourier space. Now, let us consider a finite difference scheme of the form:
Ujn+1= (GUn)j = m
X
k=−l
akUj+kn .
Taking Fourier transform, we obtain
[ Un+1= bG(ξ) cUn(ξ), where b G(ξ) := m X k=−l akeikξ.
From the Parseval equality,
kUn+1k2 = k [Un+1k2 = ˆ π −π | bG(ξ)|2| cUn(ξ)|2dξ ≤ max ξ | bG(ξ)| 2 ˆ π −π | cUn(ξ)|2dξ = | bG|2∞kU k2
Thus a sufficient condition for stability is
| bG|∞≤ 1. (2.9)
Theorem 2.2. A finite difference scheme Ujn+1=
m
X
k=−l
akUj+kn
with constant coefficients is stable if
b G(ξ) := m X k=−l akeikξ satisfies max −π≤ξ≤π| bG(ξ)| ≤ 1. (2.10)
For the forward Euler method for the heat equation,
Ujn+1 = G(Uj−1, Uj, Uj+1) = αUj−1+ (1 − 2α)Uj+ αUj+1, α =
k h2.
the corresponding b
G(ξ) = α(eiξ+ e−iξ) + (1 − 2α) = 1 − 4α sin2 ξ 2
. The condition (2.10) is equivalent to
α ≤ 1 2. That is, k h2 ≤ 1 2.
Or equivalently, Ujn+1is the convex combination of Uj−1, Uj and Uj+1.
Homeworks.
1. Compute the bG for the schemes: Backward Euler, RK2 and Crank-Nicolson.
2.3
Energy method
Let us write the finite difference scheme as
Ujn+1= αUj−1n + βUjn+ γUj+1n , (2.11) where
α, β, γ ≥ 0 and α + β + γ = 1.
We multiply (2.11) by Ujn+1on both sides, apply Cauchy-Schwarz inequality, we get
(Ujn+1)2 = αUj−1n Ujn+1+ βUjnUjn+1+ γUj+1n Ujn+1 ≤ α 2((U n j−1)2+ (Ujn+1) 2) +β 2((U n j)2+ (Ujn+1) 2) + γ 2((U n j+1)2+ (Ujn+1) 2)
Here, we have used α, β, γ ≥ 0. We multiply this inequality by h and sum it over j ∈ Z. Denote
kU k2:= X j |Uj|2h 1/2 . We get kUn+1k2 ≤ α 2(kU nk2+ kUn+1k2) + β 2(kU nk2+ kUn+1k2) +γ 2(kU nk2+ kUn+1k2)
= 1 2(kU
nk2+ kUn+1k2).
Here, α + β + γ = 1 is applied. Thus, we get the energy estimate
kUn+1k2≤ kUnk2. (2.12)
Homeworks.
1. Can the RK-2 method possess an energy estimate?
2.4
Stability Analysis via Entropy Estimates
Stability in the maximum norm
We notice that the action of G is a convex combination of Uj−1, Uj, Uj+1, provided
0 < k h2 ≤ 1 2. (2.13) Thus, we get min {Uj−1n , Ujn, Uj+1n } ≤ Ujn+1≤ max {Uj−1n , Ujn, Uj+1n }. This leads to minjUjn+1≥ minjUjn, maxjUjn+1≤ maxjUjn, and maxj|Ujn+1| ≤ maxj|Ujn|.
That is, G is stable in k · k∞.
Entropy estimates
The property that Un+1 is a convex combination (average) of Un is very important. Given any convex function η(u), by Jenson’s inequality, we have*
η(Ujn+1) ≤ αη(Uj−1n ) + βη(Ujn) + γη(Uj+1n ). (2.14)
*η is convex implies
η(αUj−1+ (1 − α)V ) ≤ αη(Uj−1) + (1 − α)η(V ).
Take V = (βUj+ γUj+1)/(1 − α). Apply the definition of convex function again, we get
η(V ) ≤ β
1 − αη(Uj) + γ
1 − αη(Uj+1). Combine these two inequalities, we get
Summing over all j and using α + β + γ = 1, we get X j η(Ujn+1) ≤X j η(Ujn). (2.15)
The convex function is called entropy in this setting. The above inequality means that the “entropy” decreases in time. In particular, we choose
• η(u) = |u|2, we recover the L2stability, • η(u) = |u|p, 1 ≤ p < ∞, we get
X j |Ujn+1|p ≤X j |Ujn|p This leads to X j |Ujn+1|ph 1/p ≤ X j |Ujn|ph 1/p , the general Lpstability. Taking p → ∞, we recover L∞stability. • η(u) = |u − c| for any constant c, we obtain
X j |Un+1 j − c| ≤ X j |Un j − c|
This is called Kruzkov’s entropy estimate. We will see this inequality in hyperbolic theory again.
Homeworks.
1. Show that the solution of the difference equation derived from the RK2 satisfies the entropy estimate. What is the condition required on h and k for such entropy estimate?
2.5
Entropy estimate for backward Euler method
The backward Euler method for the heat equation is
Un+1 = Un+ λAUn+1, A = diag(1, −2, 1), λ = k h2,
the amplification matrix is given by
Un+1= GUn, G = (I − λA)−1. (2.16) The matrix M := I − λA has the following property:
Definition 2.4. A matrix M = (mij) is called an M -matrix if it satisfies
mii> 0, mij ≤ 0,
X
j6=i
|mij| ≤ mii (2.17)
For M = I − λA arisen from the backward Euler method, the corresponding mii = 1 + 2λ,
mij = −λ for j 6= i. Thus, it is an M -matrix.
Theorem 2.3. The inverse of an M-matrix is a nonnegative matrix, i.e. all its entries are non-negative.
I shall not prove this general theorem. You can read Golub-von Loan’s book, or consult wiki. Instead, I will find the inverse of M for the above specific M-matrix. Let us express
M = I − λ diag(1, −2, 1) = 1 + 2λ
2 diag (−a, 2, −a). Here,
a = 2λ
1 + 2λ, and 0 < a < 1 if h, k > 0. The general solution of the difference equation
− auj−1+ 2uj− auj+1= 0 (2.18)
has the form:
uj = C1ρj1+ C2ρj2
where ρ1, ρ2are the characteristic roots, i.e. the roots of the polynomial equation
−aρ2+ 2ρ − a = 0. Thus,
ρi=
1 ±√1 − a2
a .
From the assumption of the M-matrix, 0 < a < 1, we have ρ1< 1 and ρ2 > 1.
Now, we define a fundamental solution: gj =
(
ρj1 for j ≥ 0 ρj2 for j < 0 .
We can check that gj → 0 as |j| → ∞. Moreover, gj satisfies the difference equation (2.18) for
|j| ≥ 1. For j = 0, we have
−ag−1+ 2g0− ag1 = −aρ2−1+ 2 − aρ1 = 2 − a(ρ1+ ρ−12 ) = d
We reset gj ← gj 2λ (1 + 2λ)d .
Then we have 1 + 2λ 2 (−ag−1+ 2g0− ag1) = 1, 1 + 2λ 2 (−agj−1+ 2gj − agj+1) = 0, ∀j 6= 0. This means X j gi−jmj,k= δi,k, or
G(I − λA) = Id.
Thus, M−1 = (gi−j) is a positive matrix (i.e. all its entries are positive). Furthermore, from
gi−j− λ (−gi−j−1+ 2gi−j− gi−j+1) = δij,
summing over j ∈ Z, we obtain
X
j
gi−j = 1 for all i.
This means that
Uin+1= (GUn)i =X
j∈Z
gi−jUjn
is indeed average of Unwith weights gi−j. With this property, we can apply Jensen’s inequality to
get the entropy estimates:
Theorem 2.4. Let η(u) be a convex function. Let Ujnbe a solution of the difference equation derived from the backward Euler method for the heat equation. Then we have
X
j
η(Ujn) ≤X
j
η(Uj0). (2.19)
Remark 1. • From entropy estimate, we get stability estimates for G in all Lp norms with 1 ≤ p ≤ ∞.
• It is important to note that there is no restriction on the mesh sizes h and k for stability for the Backward Euler method.
Homeworks.
1. Can the Crank-Nicolson method for the heat equation satisfy the entropy estimate? What is the condition on h and k?
2.6
Existence Theory
We can prove existence theorem of PDEs through finite difference approximation. In order to do so, let us define continuous and discrete Sobolev spaces and make a connection between them.
The continuous Sobolev space is defined as
Hm := {u : R → R|u, u0, ..., u(m)∈ L2(R)}.
The discrete Sobolev space for functions defined on grid Gh := {xj := jh|j ∈ Z}.
Hhm:= {U : Gh→ R|U, Dx,+U, ..., Dmx,+U ∈ `2}.
Here, (Dx,+U )nj := (Uj+1n − Ujn)/h
For any discrete function Uj ∈ Hhm we can construct a function uhin Hmdefined by
uh(x) :=
X
j
Ujφh(x − xj) (2.20)
where φh(x) = sinc(x/h). We have
uh(xj) = Uj, for all xj ∈ Gh
It can be shown that
kDxmuhk ≈ kDmx,+U k. (2.21)
Here, the norm is the L2 norm. Similarly, the space L∞k (Hhm) can be embeded into L∞(Hm) by defining uh,k(x, t) = X n≥0 X j Ujnφk(t)φh(x)
The discrete norm and the continuous norm are equivalent.
2.6.1 Existence via forward Euler method
The forward Euler method for the heat equation ut= uxxreads
Ujn+1= Ujn+ k h2 U
n
j−1− 2Ujn+ Uj+1n .
Here, We have seen that we can get the energy estimate: kUnk ≤ kU0k.
We perform finite difference operation on the above equation, say the forward Euler equation, for instance, let Vjn = (Dx,+U )nj := (Uj+1n − Ujn)/h. Then Vjn satisfies the same finite difference
equation
Vjn+1 = Vjn+ k h2 V
n
Thus, it also possesses the same energy estimate. Similar estimate for Dx,+2 U . In general, we have kDx,+m Unk ≤ kDmx,+U0k. (2.22) If we assume the initial data u0 ∈ H2, then we get Un∈ Hh2 for all n ≥ 0.
Theorem 2.5. If the initial data u0 ∈ Hm,m ≥ 2 and k/h2 ≤ 1/2, then the solution of forward
Euler equation has the estimate
kDx,+m Unk ≤ kDmx,+U0k, kDt,+Unk ≤ kDx,+2 U0k (2.23)
Further, the corresponding smoothing functionuh,k has the same estimate and has a subsequence
converges to a solutionu(x, t) of the original equation.
Proof. The functions uh,k are unformly bounded in W1,∞(H2). Hence they have a subsequence
converges to a function u ∈ W1,∞(H2) weakly in W1,∞(H2) and strongly in L∞(H1). The functions uh,k satisfy uh,k(xj, tn+1) − uh,k(xj, tn) = k h2(uh,k(xj−1, t n) − 2u h,k(xj, tn) + uh,k(xj+1, tn))
Multiply a test smooth function φ, sum over j and n, take summation by part, we can get the subsequence converges to a solution of ut= uxxweakly.
2.6.2 A Sharper Energy Estimate for backward Euler method
In this subsection, we will get a sharper energy estimate for solutions obtained from the backward Euler method. Recall the backward Euler method for solving the heat equation is
Ujn+1− Ujn= λ(Uj−1n+1− 2Ujn+1+ Uj+1n+1) (2.24) where λ = k/h2. An important technique is the summation by part:
X j (Uj− Uj−1)Vj = − X j Uj(Vj+1− Vj) (2.25)
There is no boundary term because we consider periodic condition in the present case. We multiply both sides by Ujn+1, then sum over j. We get
X j (Ujn+1)2− Ujn+1Ujn=X j λ(Uj−1n+1− 2Ujn+1+ Uj+1n+1)Ujn+1 = λ X j (Uj−1n+1− Ujn+1)Ujn+1+X j (Uj+1n+1− Ujn)Ujn+1 = λ X j (Ujn+1− Uj+1n+1)Uj+1n+1+X j (Uj+1n+1− Ujn)Ujn+1
= −λX j |Un+1 j+1 − Ujn+1|2. The term Ujn+1Ujn≤ 1 2((U n+1 j )2+ (Ujn)2)
by Cauchy-Schwartz. Hence, we get 1 2 X j (Ujn+1)2− (Ujn)2) ≤ −λX j |Uj+1n+1− Ujn+1|2 Or 1 2Dt,−kU n+1k2 ≤ −h2 k k h2kDx,+U n+1k = −kD x,+Un+1k2. (2.26) where, Dt,−Vjn+1:= Vjn+1− Vn j k , Dx,+U n+1 j := Uj+1n+1− Ujn+1 h ,
We sum in n from n = 1 to N , we get the following theorem. Theorem 2.6. For the backward Euler method, we have the estimate
kUNk2+ C
N
X
n=1
kDx,+Unk2 ≤ kU0k2 (2.27)
This gives controls not only on kUnk2but also on kD
x,+Unk.
Homeworks.
1. Show that the Crank-Nicolson method also has similar energy estimate. 2. Can forward Euler method have similar energy estimate?
2.7
Relaxation of errors
In this section, we want to study the evolution of an error on a periodic domain [0, 2π). We consider ut= uxx, x ∈ [0, 2π), (2.28)
with initial data u0. The grid points xj = 2πj/N and h = 2π/N . The error enj := u(xj, tn) − Ujn
satisfies
en+1j = enj + λ(enj−1− 2enj + enj+1) + kτjn. (2.29) We want to know how error is relaxed to zero from an initial error e0. We study the homogeneous finite difference quation first. That is
or en+1 = G(un). The matrix is a tridiagonal matrix. It can be diagonalized by Fourier method. The eigenfunctions and eigenvalues are
vk,j = e2πijk/N, ρk = 1 − 2λ + 2λ cos(2πk/N ) = 1 − 4λ sin2(πk/N ), k = 0, ..., N − 1.
When λ ≤ 1/2, all eigenvalues are negative except ρ0:
1 = ρ0 > |ρ1| > |ρ2| > · · · .
The eigenfunction
v0≡ 1.
Hence, the projection of any discrete function U onto this eigenfunction is the average:P
jUj.
Now, we decompose the error into
en= N −1 X k=0 enkvk, n ≥ 0 Then en+1k = ρkenk. Thus, enk = ρnke0k.
Since ρ0= 1, we see that en0 = e00, which is the average of en, does not decay, unless e00 = 0 initially.
To guarantee the average of e0 is zero, we may choose Ujnto be the cell average of u(x, tn) in the jth cell: Ujn= 1 h ˆ xj+1/2 xj−1/2 u(x, tn) dx.
instead of the grid data. This implies the initial error has zero local averages, and thus so does the global average.
For the decay behaviours of errors enk for k = 1, ..., N − 1, we notice that for 1 ≤ k ≤ N − 1, ρk= 1 − 4λ sin2 πk N ≈ 1 − 4λ πk N 2 , for N >> 1. The largest values of ρs are ρ1and ρN −1:
ρ1= ρN −1≈ 1 − 4λ π N 2 = 1 − 4∆t h2 π2 N2 = 1 − ∆t.
They correspond to low frequency eigenmodes: v1 = (e2πij/N)N −1j=0 and vN −1 = (e−2πij/N)N −1j=0 .
The decay rate
Here, t = n∆t. This is the decay rate of en1 and enN −1with n∆t = t. They are the slowest decay modes. For k = N/2, the corresponding eigenmode vN/2 = ((−1)j)N −1j=0 is the highest frequency mode. The corresponding eigenvalue
ρN/2 = 1 − 4λ = 1 − 4
∆t h2.
The decay rate is
ρnN/2= 1 − 4∆t h2 n ≈ e−h24t.
which decays very fast.
The contribution of the truncation error to the true error is: en+1= ρkenk+ ∆tτkn Its solution is enk = ρnke0k+ ∆t n−1 X m=0 ρn−1−mk τkm
We see that the term en0 does not tend to zero unless τ0m = 0. This can be achieved if we choose Uj
as well as fj to be the cell averages instead the grid data. We have seen that the truncation error is
second order. That is
τk,max := max 0≤m≤n−1|τ m k | = O(h2). Then for k ≥ 1, ∆t n−1 X m=0 |ρk|n−1−m≤ ∆t n−1 X m=0 |ρ1|n−1−m= ∆t1 − ρ n 1 1 − ρ1 ≈ ∆t 1 − e −t 1 − (1 − ∆t) = 1 − e −t. Thus, we obtain |en| ≤ e−te0+ (1 − e−t)O(h2) with n∆t = t. Homeworks. 1. Define Uj := h1 ´xj+1/2
xj−1/2 u(x) dx. Show that if u(x) is a smooth periodic function on [0, 1],
then
u00(xj) =
1
h2(Uj−1− 2Uj+ Uj+1) + τ
2.8
Boundary Conditions
2.8.1 Dirichlet boundary condition
Now, we consider the initial-boundary problem:
ut= uxx, x ∈ [0, 1]
The Dirichlet boundary condition is
u(0) = a, u(1) = b. (2.31) The initial condition is
u(x, 0) = u0(x).
We introduce uniform grids: xj = j/N , j = 0, ..., N . The forward Euler method can be realized
on x1, ..., xN −1as
Ujn+1− Ujn= ∆t h2 U
n
j−1− 2Ujn+ Uj+1n , j = 1, ..., N − 1.
Near the boundary point x1, the finite difference approximation of uxx at x1 involves u at x0 = 0.
We plug the boundary condition: uxx(x1) = U0− 2U1+ U2 h2 + O(h 2) = a − 2U1+ U2 h2 + O(h 2) (2.32) Similarly, uxx(xN −1) = UN −2− 2UN −1+ UN h2 + O(h 2) = UN −2− 2UN −1+ b h2 + O(h 2)
The unknowns are U1n, ..., UN −1n with N − 1 finite difference equations at x1, ..., xN −1. Including
boundary terms, we write the equation as
Un+1= (I + λA)Un+ λB, λ = ∆t h2, A = −2 1 0 · · · 0 0 1 −2 1 · · · 0 0 .. . ... ... . .. ... ... 0 0 0 · · · 1 −2 (N −1)×(N −1) , B = a 0 .. . b (N −1)×1 . (2.33)
The matrix A is the discrete Laplacian with zero Dirichlet boundary condition. The term B comes from the Dirichlet boundary conditions.
Next, we examine how error is relaxed for the Euler method with zero Dirichlet boundary con-dition. We have seen that the error enj := u(xj, tn) − Ujn satisfies the difference equation with
truncation error as the source term:
en+1j = ((I + λA)en)j+ λδB + ∆tτjn, j = 1, ..., N − 1 where τjnis the truncation error,
δB = δa 0 .. . δb (N −1)×1 .
If there is any error from u(0) = a, say U0= a + δa, it will creates a truncation error δa/h2at x1.
The solution for this difference equation is en= Gne0+
n−1
X
m=0
Gn−1−m(λδB + ∆tτm) , G = I + λA. From Fourier method, we can compute the eigenvector and eigenvalue of A:
vk= (sin(πjk/N )N −1j=1 , λk = −4 sin2(πk/(2N )), k = 1, ..., N − 1.
In fact, we extend vkto an N + 1-vector as vk,0 = vk,N = 0. Using this extended vector, we can
check
vk,j−1−2vk,j+vk,j+1= (2 cos(πk/N )−2) sin(πjk/N ) = −4 sin2
πk 2N
vk,j, j = 1, ..., N −1, k = 1, ..., N −1.
The eigenvalues of I + 4λA are
ρk= 1 − 4λ sin2
πk 2N
, k = 1, ..., N − 1. In the present case, all eigenvalues
ρk< 1, k = 1, ..., N − 1.
provided the stability condition
λ ≤ 1/2. In this case,
1 > ρ1 > |ρ2| > ... > |ρN −1|.
The lowest mode is ρ1, which is
ρ1 = 1 − 4λ sin2(π/2N ) ≈ 1 − λ π N 2 = 1 −∆t h2 π2 N2 = 1 − π 2∆t.
and
ρn1 ≈ 1 − π2∆tn≈ e−π2t Thus,
kGnk ≤ e−π2t, n∆t = t. The accumulation effect is
n−1 X m=0 kGn−1−mk ≤ 1 − e −π2t ∆t . Thus, the error from the initial data is
kGne0k ≤ e−π2tke0k The error coming from truncation is
n−1
X
m=0
Gn−1−m(∆tτm) = (1 − e−π2t)O(h2). The error due to boundary is
n−1
X
m=0
Gn−1−m(λδB) = (1 − e−π2t) 1 h2kδBk.
2.8.2 Neumann boundary condition
The Neumann boundary condition is
u0(0) = σ0, u0(1) = σ1. (2.34)
We may use the following discretization methods: • First order: U1− U0 h = σ0. • Second order-I: U1− U0 h = ux(x1/2) = ux(0) + h 2uxx(x0) = σ0+ h 2f (x0) (2.35) • Second order-II: we use extrapolation
−3U0+ 4U1− U2
2h2 = σ0.
The knowns are Ujn with j = 0, ..., N . In the mean time, we add two more equations at the boundaries.
Homeworks.
1. Find the eigenfunctions and eigenvalues for the discrete Laplacian with the Neumann bound-ary condition (consider both first order and second order approximation at boundbound-ary). Notice that there is a zero eigenvalue.
Hint: You may use Matlab to find the eigenvalues and eigenvectors.
Here, I will provide another method. Suppose A is the discrete Laplacian with Neumann boundary condition. A is an (N + 1) × (N + 1) matrix. Suppose Av = λv. Then for j = 1, ..., N − 1, v satisfies vj−1− 2vj+ vj+1= λvj, j = 1, ..., N − 1. For v0, we have −2v0+ 2v1= λv0. For vN, we have −2vN + 2vN −1= λvN.
Then this matrix has the following eigenvectors:
vjk= cos(πjk/N ), k = 0, ..., N with eigenvalue λk= −2 + 2 cos(πk/N ) = −4 sin2 πk 2N , k = 0, ..., N. Notice that λ0= 0. The error corresponding this eigenmode does not decay.
Homeworks.
1. Complete the calculation. 2. Consider
ut= uxx+ f (x)
on [0, 1] with Neumann boundary condition u0(0) = u0(1) = 0. If´ f (x) dx 6= 0. What wil happen to u as t → ∞?
2.9
The discrete Laplacian and its inversion
We consider the elliptic equation
uxx− αu = f (x), x ∈ (0, 1),
with the Dirichlet boundary condition
The finite difference approximation of uxx at x1 involves u at x0 = 0. We plug the boundary contion: uxx(x1) = U0− 2U1+ U2 h2 + O(h 2) = a − 2U1+ U2 h2 + O(h 2) Similarly, uxx(xN −1) = UN −2− 2UN −1+ UN h2 + O(h 2) = UN −2− 2UN −1+ b h2 + O(h 2)
The unknowns are U1n, ..., UN −1n with N −1 finite difference equations at x1, ..., xN −1. The discrete
Laplacian becomes A = −2 1 0 · · · 0 0 1 −2 1 · · · 0 0 .. . ... ... . .. ... ... 0 0 0 · · · 1 −2 (N −1)×(N −1) . (2.37)
This is a discrete Laplacian with Dirichlet boundary condition. We have seen in the last section that A can be diagonalized by the discrete Fourier sin functions vk = (sin(πjk/N )j=1N −1. In this
section, we will find explicit expression of A−1. Indeed, we will find (A−2β)−1where β = αh2/2. The difference equation
Uj−1− (2 + 2β)Uj+ Uj+1 = 0
has two independent solutions ρj1and ρj2, where ρiare roots of
ρ2− (2 + 2β)ρ + 1 = 0. That is
ρ = 1 + β ±p(1 + β)2− 1.
Our goal below is to construct fundamental solution Gij, which is G = (A − 2β)−1.
Case 1: β = 0 When β = 0, the two independent solutions are Uj = 1 and Uj = j. Let us
construct the fundamental solution centered at i, call it Gij. It has the form:
Gi,j =
jCi j ≤ i,
(N − j)Ci0 j ≥ i, 1 ≤ i, j ≤ (N − 1). (2.38) for some constants Ciand Ci0. From Gi,i−1− 2Gi,i+ Gi,i+1= 1 and iCi = (N − i)Ci0, we obtain
that
Ci = −(N − i)/N Ci0= −i/N.
Case 2: β > 0 When β > 0, the two roots are ρ1 < 1 and ρ2 > 1. The fundamental solution Gij
has the following form Gij =
(
C1ρj1+ C2ρj2 for j ≤ i
D1ρj1+ D2ρj2 for j ≥ i
1 ≤ i ≤ (N − 1), 0 ≤ j ≤ N.
Here, we extend Gi,j with 1 ≤ j ≤ (N − 1) to 0 ≤ j ≤ N . The constants C1, C2, D1, D2 are
determined by
Gi0= 0, Gi,N = 0,
Gi,i−1− (2 + 2β)Gi,i+ Gi,i+1 = 1
C1ρi1+ C2ρi2= D1ρi1+ D2ρi2.
Homeworks.
1. Find the coefficients C1, C2, D1, D2above.
Let us go back to the original equation:
uxx− αu = f (x)
The above study of the Green’s function of the discrete Laplacian helps us to quantify the the error produced from the source term. If Au = f and A−1= G, then an error in f , say τ , will produce an error
e = Gτ.
The error from the boundary also has the same behaviour. If the off-diagonal part of G decays exponentially (i.e. β > 0), then the error is “localized,” otherwise, it pollutes everywhere.
Project 2. Solve the following equation
uxx− αu + f (x) = 0, x ∈ [0, 1]
numerically with periodic, Dirichlet and Neumann boundary condition. The equilibrium 1. A layer structure f (x) = −1 1/4 < x < 3/4 1 otherwise 2. An impluse f (x) = γ 1/2 − δ < x < 1/2 + δ 0 otherwise 3. A dipole f (x) = γ 1/2 − δ < x < 1/2 −γ 1/2 < x < 1/2 + δ 0 otherwise
You may choose α = 0.1, 1, 10, observe how solutions change as you vary α. Project 3. Solve the following equation
−uxx+ f (u) = g(x), x ∈ [0, 1]
numerically with Neumann boundary condition. Here, f (u) = F0(u) and the potential is F (u) = u4− γu2.
Study the solution as a function of γ. Choose simple g, say piecewise constant, a delta function δ(x − x0), or a dipole δ(x − x0+ ) − δ(x − x0− ).
Chapter 3
Finite Difference Methods for Linear
Elliptic Equations
3.1
Discrete Laplacian in two dimensions
We will solve the Poisson equation
4u = f in a domain Ω ⊂ R2with Dirichlet boundary condition
u = g on ∂Ω
Such a problem is a core problem in many applications. We may assume g = 0 by substracting a suitable function from u. Thus, we limit our discussion to the case of zero boundary condition. Let h be the spatial mesh size. For simplicity, let us assume Ω = [0, 1] × [0, 1]. But many discussion below can be extended to general smooth bounded domain.
3.1.1 Discretization methods
Centered finite difference The Laplacian is approximated by A = 1
h2 (Ui−1,j+ Ui+1,j+ Ui,j−1+ Ui,j+1− 4Ui,j) .
For the square domain, the indeces run from 1 ≤ i, j ≤ N − 1 and U0,j = UN,j = Ui,0= Ui,N = 0