Applied Analysis


APPLIED ANALYSIS

I-Liang Chern

Department of Mathematics National Taiwan University


Chapter 1 Some Motivation from Calculus of Variations

1.1 Introduction

In the mathematical sciences, real-world problems are usually modeled by algebraic equations, differential equations, integral equations, or the extrema (minima or maxima) of some functions or functionals (functions defined on a function space). For instance, in economics, we minimize a cost function under certain constraints. In geometry, we look for geodesics, which minimize arc length, and for minimal surfaces, which minimize area. In mechanics, we find an extremum of the so-called action over all possible trajectories. In medical imaging problems, we minimize some prior functional under a constraint on the measurement error. These problems can be viewed as finding extrema of some functionals in some function spaces. Likewise, many time-dependent partial differential equations can be viewed as evolution processes in function spaces.

The analytic approach to these problems is to find an appropriate function space and to solve the equations in that space. Most of the techniques were developed from the 18th to the 20th century. Students do not need to learn all of their details. It is more important to learn the motivations, the key parts of the techniques, and their applications. Thus, these lecture notes provide a short path to learning these basic analytic techniques. They include

Analytic Techniques

• Some motivation from calculus of variations

• Basic notions of function spaces: metric spaces, Banach spaces and Hilbert spaces

• Methods of contraction mapping

• Hilbert spaces: approximation theory, Fourier series

• Compactness

• Bounded operators in Hilbert spaces, spectral theory.

Much of the material in these lecture notes comes from the book Applied Analysis by John K. Hunter and Bruno Nachtergaele. In particular, I will use the homework problems from their book. Nevertheless, I will put more emphasis on constructive approaches and examples.

1.2 A short story about Calculus of Variations

The development of the calculus of variations has a long history. It may go back to the brachistochrone problem proposed by Johann Bernoulli (1696). The problem is to find a path (or a curve) connecting two points A and B, with B lower than A, such that a ball takes minimal time to roll from A to B under gravity. Johann Bernoulli used Fermat's principle (light travels along the path of least time) to prove that the curve solving the brachistochrone problem is the cycloid.

Euler (1707-1783) and Lagrange (1736-1813) are two central figures in the development of the calculus of variations. I quote two paragraphs below from Wiki to give you some of the story of Euler and Lagrange.

“Lagrange was an Italian-French Mathematician and Astronomer. By the age of 18 he was teaching geometry at the Royal Artillery School of Turin, where he organized a discussion group that became the Turin Academy of Sciences. In 1755, Lagrange sent Euler a letter in which he discussed the Calculus of Variations. Euler was deeply impressed by Lagrange’s work, and he held back his own work on the subject to let Lagrange publish first.”

“Although Euler and Lagrange never met, when Euler left Berlin for St. Petersburg in 1766, he recommended that Lagrange succeed him as the director of the Berlin Academy. Over the course of a long and celebrated career (he would be lionized by Marie Antoinette, and made a count by Napoleon before his death), Lagrange published a systemization of mechanics using his calculus of variations, and did significant work on the three-body problem and astronomical perturbations.”

1.3 Problems from Geometry

Geodesic curves Find the shortest path connecting two points A and B in the plane. Let $y(x)$ be a curve with $(a, y(a)) = A$ and $(b, y(b)) = B$. The geodesic curve problem is to minimize

$$\int_a^b \sqrt{1 + y'(x)^2}\, dx$$

among all paths $y(\cdot)$ connecting A to B.

Isoperimetric problem This was an ancient Greek problem. It is to find a closed curve with a given length enclosing the greatest area. Suppose the curve is described by $(x(s), y(s))$, where $s$ is the arc length. We may assume the total length is $2\pi$, so that $0 \le s \le 2\pi$. The isoperimetric problem is

$$\max\ \frac{1}{2} \int_T \big( x(s)\dot{y}(s) - y(s)\dot{x}(s) \big)\, ds \quad \text{subject to} \quad \int_T \sqrt{\dot{x}(s)^2 + \dot{y}(s)^2}\, ds = 2\pi.$$

Here, $T = [0, 2\pi]$ with the end points identified. The solution is the unit circle, and the result is usually expressed in the form of the so-called isoperimetric inequality $4\pi A \le L^2$, where $A$ is the enclosed area and $L$ is the length of the curve. A geometric proof was given by Steiner (1838). Analytic proofs were given by Weierstrass and by Edler (see footnote 1). The proof by Hurwitz (1902), which uses Fourier methods, will be given in a later chapter.

Minimal surface spanned by a given contour Suppose a contour in 3D is given by $h(x, y)$ with $(x, y) \in \partial\Omega$, where $\Omega$ is a simple domain in $\mathbb{R}^2$. The problem is to find a surface $z(x, y)$ such that $z(x, y) = h(x, y)$ for $(x, y) \in \partial\Omega$ and $z$ minimizes the area

$$\int_\Omega \sqrt{1 + z_x^2 + z_y^2}\, dx\, dy.$$

The minimal surface problem was studied by Lagrange (1762). Classical examples include planes, catenoids and helicoids. Many interesting minimal surfaces have been found in the past two centuries.

1.4 Euler-Lagrange Equation

Let us consider the following variational problem: minimize

$$J[y] := \int_a^b F(x, y(x), y'(x))\, dx$$

subject to the boundary conditions

$$y(a) = y_a, \quad y(b) = y_b.$$

The set

$$\mathcal{A} = \{\, y : [a, b] \to \mathbb{R}^n \mid y \in C^1[a, b],\ y(a) = y_a,\ y(b) = y_b \,\}$$

Footnote 1: You can read a review article by Alan Siegel, A historical review of isoperimetric theorem in 2-D, and its place in elementary plane geometry, http://www.cs.nyu.edu/faculty/siegel/SCIAM.pdf. For applications, you may find a book chapter from Fan in http://www.math.ucsd.edu/ fan/research/cb/ch2.pdf


is called an admissible class. Here, $C^1[a, b]$ denotes the set of functions from $[a, b]$ to $\mathbb{R}^n$ which are continuously differentiable. Given a path $y \in \mathcal{A}$, we consider a variation of this path in the direction of $v$ by

$$y(x, \epsilon) := y(x) + \epsilon v(x).$$

Here, $v$ is a $C^1$ function with $v(a) = v(b) = 0$, so that $y(\cdot, \epsilon) \in \mathcal{A}$ for small $\epsilon$. Such a $v$ is called a variation. Sometimes it is denoted by $\delta y$. We can plug $y(\cdot, \epsilon)$ into $J$. Suppose $y$ is a local minimum; then for any such variation $v$, $J[y + \epsilon v]$ attains its minimum at $\epsilon = 0$. This leads to a necessary condition:

$$\frac{d}{d\epsilon}\Big|_{\epsilon=0} J[y + \epsilon v] = 0.$$

Let us compute this derivative:

$$\begin{aligned}
\frac{d}{d\epsilon}\Big|_{\epsilon=0} J[y + \epsilon v]
&= \int_a^b \frac{\partial}{\partial\epsilon}\Big|_{\epsilon=0} F(x, y(x) + \epsilon v(x), y'(x) + \epsilon v'(x))\, dx \\
&= \int_a^b \big( F_y(x, y(x), y'(x))\, v(x) + F_{y'}(x, y(x), y'(x))\, v'(x) \big)\, dx.
\end{aligned}$$

It is understood that $F_{y'} = \partial F/\partial y'$ here means the partial derivative with respect to the third variable $y'$. For instance, if $F = y'^2/2$, then $\partial F/\partial y' = y'$.

Theorem 1.1 (Necessary Condition). A necessary condition for $y \in \mathcal{A}$ to be a local minimum of $J$ is

$$\int_a^b \big( F_y(x, y(x), y'(x))\, v(x) + F_{y'}(x, y(x), y'(x))\, v'(x) \big)\, dx = 0 \tag{1.1}$$

for all $v \in C^1[a, b]$ with $v(a) = v(b) = 0$.

If the solution $y \in C^2$, then we can integrate the second term by parts to get

$$\int_a^b F_{y'}(x, y(x), y'(x))\, v'(x)\, dx = -\int_a^b \frac{d}{dx} F_{y'}(x, y(x), y'(x))\, v(x)\, dx.$$

Here, we have used $v(a) = v(b) = 0$. Thus, the necessary condition can be rewritten as

$$\int_a^b \Big( F_y(x, y(x), y'(x)) - \frac{d}{dx} F_{y'}(x, y(x), y'(x)) \Big)\, v(x)\, dx = 0$$

for all $v \in C^1[a, b]$ with $v(a) = v(b) = 0$. A fundamental theorem of the calculus of variations is the following.

Theorem 1.2. If $f \in C[a, b]$ satisfies

$$\int_a^b f(x) v(x)\, dx = 0$$

for all $v \in C^\infty[a, b]$ with $v(a) = v(b) = 0$, then $f \equiv 0$.

Proof. If $f(x_0) \ne 0$ for some $x_0 \in (a, b)$ (say $f(x_0) = C > 0$), then there is a small neighborhood $(x_0 - \epsilon, x_0 + \epsilon)$ on which $f(x) > C/2$. We can choose $v$ to be a smooth hump such that $v(x) = 1$ for $|x - x_0| \le \epsilon/2$, $v(x) \ge 0$, and $v(x) = 0$ for $|x - x_0| \ge \epsilon$. This test function satisfies the boundary constraint if $\epsilon$ is small enough. Using this $v$, we get

$$\int_a^b f(x) v(x)\, dx \ge \frac{C\epsilon}{2} > 0.$$

This contradicts our assumption. We conclude that $f(x_0) = 0$ for all $x_0 \in (a, b)$. Since $f$ is continuous on $[a, b]$, we also have $f(a) = f(b) = 0$.

Thus, we obtain the following stronger necessary condition.

Theorem 1.3. A necessary condition for a local minimum $y$ of $J$ in $\mathcal{A} \cap C^2$ is

$$\frac{\delta J}{\delta y} := F_y(x, y(x), y'(x)) - \frac{d}{dx} F_{y'}(x, y(x), y'(x)) = 0. \tag{1.2}$$

Equation (1.2) is called the Euler-Lagrange equation for the minimization problem $\min J[y]$.

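Example For the problem of minimizing arc length, the functional is

$$J[y] = \int_a^b \sqrt{1 + y'^2}\, dx,$$

where $y(a) = y_0$, $y(b) = y_1$. Here $F = \sqrt{1 + y'^2}$ does not depend on $y$, so $F_y = 0$ and the corresponding Euler-Lagrange equation is

$$-\frac{d}{dx} F_{y'} = -\frac{d}{dx} \left( \frac{y'}{\sqrt{1 + y'^2}} \right) = 0.$$

This yields

$$\frac{y'}{\sqrt{1 + y'^2}} = \text{Const}.$$

Solving for $y'$, we further get

$$y' = C \quad \text{(a constant)}.$$

Hence $y = Cx + D$. Applying the boundary conditions, we get

$$C = \frac{y_1 - y_0}{b - a}, \quad D = \frac{b y_0 - a y_1}{b - a}.$$

Thus, the curve of minimal arc length is a straight line.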
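The conclusion of this example can be checked numerically. The following is a minimal sketch, not part of the original notes: it approximates the arc length functional by the length of a polygon and compares the straight line with perturbed paths $y + \epsilon v$. The end points $A = (0, 1)$, $B = (2, 4)$, the variation $v(x) = \sin(\pi (x - a)/(b - a))$ and all function names are illustrative assumptions.

```python
import math

def arc_length(y, a, b, n=2000):
    """Approximate J[y] = integral of sqrt(1 + y'(x)^2) over [a, b]
    by the length of the polygon through n + 1 equally spaced samples."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        total += math.hypot(x1 - x0, y(x1) - y(x0))
    return total

# Illustrative data (assumed, not from the notes): A = (0, 1), B = (2, 4).
a, b, y0, y1 = 0.0, 2.0, 1.0, 4.0
C = (y1 - y0) / (b - a)            # slope from the boundary conditions
D = (b * y0 - a * y1) / (b - a)    # intercept from the boundary conditions

def line(x):
    return C * x + D

def perturbed(eps):
    # v(x) = sin(pi (x - a)/(b - a)) vanishes at both end points,
    # so the perturbed path stays in the admissible class.
    return lambda x: line(x) + eps * math.sin(math.pi * (x - a) / (b - a))

straight = arc_length(line, a, b)
lengths = {eps: arc_length(perturbed(eps), a, b) for eps in (-0.5, -0.1, 0.1, 0.5)}

print("straight line:", straight)
for eps, value in lengths.items():
    print("eps =", eps, "length =", value)
```

Every perturbed path comes out longer than the straight line, as the Euler-Lagrange equation predicts for this convex integrand.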


1.5 Problems from Mechanics

Least action principle In classical mechanics, the motion of particles in $\mathbb{R}^3$ is described by

$$m\ddot{x} = -\nabla V(x) = F.$$

Here, $V(x)$ is a potential and $F$ is the (conservative) force. This is called Newtonian mechanics. Typical examples of potentials are the harmonic potential $V(x) = \frac{k}{2}|x|^2$ for a mass-spring system, and the Newtonian potential $V(x) = -\frac{G}{|x|}$ for a sun-planet system. Here, $k$ is the spring constant and $G$ is the gravitational constant.

Newtonian mechanics was reformulated by Lagrange (1788) in variational form; the reformulation was originally motivated by the description of particle motions under constraints. Let us explain this variational formulation without constraints. First, let us introduce the concept of virtual velocity, or variation of position. Given a path $x(t)$, $t_0 \le t \le t_1$, consider a family of paths

$$x_\epsilon(t) := x(t, \epsilon) := x(t) + \epsilon v(t), \quad t_0 \le t \le t_1, \quad -\epsilon_0 < \epsilon < \epsilon_0.$$

Here, $v(t)$ is called a virtual velocity and $x_\epsilon(\cdot)$ is called a small variation of the path $x(\cdot)$. Sometimes we denote $v(\cdot)$, the variation of $x(\cdot)$, by $\delta x$. That is, $\delta x := \partial_\epsilon|_{\epsilon=0}\, x_\epsilon$.

Now, Newton's law of motion can be viewed as

$$\delta W = (F - m\ddot{x}) \cdot v = 0 \quad \text{for any virtual velocity } v.$$

The term $\delta W$ is called the total virtual work in the direction $v$. The term $F \cdot v$ is the virtual work done by the external force $F$, while $m\ddot{x} \cdot v$ is the work done by the inertial force. The d'Alembert principle of virtual work states that the virtual work is always zero along a physical particle path under a small perturbation $\delta x$.

If we integrate it in time from $t_0$ to $t_1$ with fixed $v(t_0) = v(t_1) = 0$, then we get

$$\begin{aligned}
0 &= \int_{t_0}^{t_1} \big( -m\ddot{x} \cdot v - \nabla V(x) \cdot v \big)\, d\tau \\
&= \int_{t_0}^{t_1} \big( m\dot{x} \cdot \dot{v} - \delta V(x) \big)\, d\tau \\
&= \int_{t_0}^{t_1} \Big( \delta \big( \tfrac{1}{2} m |\dot{x}|^2 \big) - \delta V(x) \Big)\, d\tau \\
&= \delta \int_{t_0}^{t_1} L(x, \dot{x})\, d\tau = \delta S.
\end{aligned}$$

Here,

$$L(x, \dot{x}) := \frac{1}{2} m |\dot{x}|^2 - V(x)$$

is called the Lagrangian, and the integral

$$S = \int_{t_0}^{t_1} L(x(\tau), \dot{x}(\tau))\, d\tau$$

is called the action. Thus, $\delta S = 0$ along a physical path. This is called the Hamilton principle or the least action principle. You can show that the corresponding Euler-Lagrange equation is exactly Newton's law of motion. Thus the following formulations are equivalent:

• Newton's equation of motion: $m\ddot{x} = -V'(x)$;

• d'Alembert's principle of virtual work: $\int_{t_0}^{t_1} \big( m\dot{x} \cdot \dot{v} - V'(x) v \big)\, dt = 0$ for all virtual velocities $v$;

• Hamilton's least action principle: $\delta \int_{t_0}^{t_1} \big( \frac{m}{2} |\dot{x}|^2 - V(x) \big)\, dt = 0$.

One advantage of variational formulation – first integral One advantage of this variational formulation is that it is easy to find invariants (so-called integrals) of the system. One example is the existence of a first integral.

Theorem 1.4. When the Lagrangian $L(x, \dot{x})$ is independent of $t$, the quantity (called the first integral)

$$I(x, \dot{x}) := \dot{x} \cdot \frac{\partial L}{\partial \dot{x}} - L$$

is independent of $t$ along physical trajectories.

Proof. We differentiate $I$ along a physical trajectory:

$$\frac{d}{dt} \big[ \dot{x} L_{\dot{x}} - L \big] = \ddot{x} L_{\dot{x}} + \dot{x} \frac{d}{dt} L_{\dot{x}} - L_x \dot{x} - L_{\dot{x}} \ddot{x} = \dot{x} \Big( \frac{d}{dt} L_{\dot{x}} - L_x \Big) = 0.$$

For Newtonian mechanics, where $L(x, \dot{x}) = \frac{1}{2} m |\dot{x}|^2 - V(x)$, this first integral is the total energy. Indeed, we obtain

$$I(x, \dot{x}) = \frac{1}{2} m |\dot{x}|^2 + V(x).$$
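As a small illustration of Theorem 1.4, the sketch below evaluates the first integral along the exact solution of a one-dimensional mass-spring system and checks that it stays equal to the initial energy. The values of $m$ and $k$, the initial condition and the function names are assumptions made for this example; the derivative $\partial L/\partial \dot{x}$ is approximated by a central difference.

```python
import math

m, k = 2.0, 8.0              # illustrative mass and spring constant (assumed)
omega = math.sqrt(k / m)     # angular frequency of m x'' = -k x

def lagrangian(x, v):
    """L(x, xdot) = (1/2) m xdot^2 - V(x) with V(x) = (1/2) k x^2."""
    return 0.5 * m * v * v - 0.5 * k * x * x

def first_integral(x, v, h=1e-4):
    """I = xdot * dL/dxdot - L, with dL/dxdot from a central difference."""
    dL_dv = (lagrangian(x, v + h) - lagrangian(x, v - h)) / (2.0 * h)
    return v * dL_dv - lagrangian(x, v)

def trajectory(t):
    """Exact solution of m x'' = -k x with x(0) = 1, x'(0) = 0."""
    return math.cos(omega * t), -omega * math.sin(omega * t)

# Evaluate I at 100 times along the physical trajectory.
values = [first_integral(*trajectory(0.1 * j)) for j in range(100)]
energy0 = 0.5 * k * 1.0 ** 2   # total energy at t = 0 (kinetic part is zero)

print("energy at t = 0:", energy0)
print("spread of I along the path:", max(values) - min(values))
```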

1.6 Method of Lagrange Multiplier

Variational problems usually come with constraints, as we have seen in the isoperimetric problem. Lagrange introduced an auxiliary variable, called the Lagrange multiplier, to solve problems of this kind. Below, we use the hanging rope problem to explain the method of Lagrange multipliers.


Hanging rope problem A rope given by $y(x)$, $a \le x \le b$, hangs from two end points $(a, y_a)$ and $(b, y_b)$. Suppose the rope has length $\ell$ and density $\rho(x)$. If the rope is in equilibrium, then it minimizes its potential energy, which is

$$J[y] = \int_0^\ell \rho g y\, ds = \int_a^b \rho g y \sqrt{1 + y'^2}\, dx.$$

The rope is subject to the length constraint

$$W[y] = \int_a^b \sqrt{1 + y'^2}\, dx = \ell.$$

Method of Lagrange multiplier Such problems are very much like constrained optimization problems in finite dimensions. Let us start with a two-dimensional example. Suppose we want to minimize $f(x, y)$ subject to the constraint $g(x, y) = 0$. The method of Lagrange multipliers states that a necessary condition for $(x_0, y_0)$ to be such a solution is that, if $\nabla g(x_0, y_0) \ne 0$, then $\nabla f(x_0, y_0) \parallel \nabla g(x_0, y_0)$. This means that there exists a constant $\lambda_0$ such that $\nabla f(x_0, y_0) + \lambda_0 \nabla g(x_0, y_0) = 0$. In other words, $(x_0, y_0, \lambda_0)$ is an extremum of the unconstrained function $F(x, y, \lambda) := f(x, y) + \lambda g(x, y)$. That is, $(x_0, y_0, \lambda_0)$ solves

$$\frac{\partial F}{\partial x} = 0, \quad \frac{\partial F}{\partial y} = 0, \quad \frac{\partial F}{\partial \lambda} = 0.$$

The first two equations are equivalent to $\nabla f(x_0, y_0) \parallel \nabla g(x_0, y_0)$. The last one is equivalent to the constraint $g(x_0, y_0) = 0$. The advantage is that the new formulation is an unconstrained extremum problem.

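For constrained minimization problems in $n$ dimensions, we have the same result. Let $y = (y^1, ..., y^n)$, $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$. Consider

$$\min f(y) \quad \text{subject to} \quad g(y) = 0.$$

A necessary condition for $y_0$ to be such a solution is that, if $\nabla g(y_0) \ne 0$, then there exists $\lambda_0$ such that $(y_0, \lambda_0)$ is an extremum of the unconstrained function $F(y, \lambda) := f(y) + \lambda g(y)$. That is, $(y_0, \lambda_0)$ solves

$$\frac{\partial F}{\partial y}(y_0, \lambda_0) = 0, \quad \frac{\partial F}{\partial \lambda}(y_0, \lambda_0) = 0.$$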
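To make the finite-dimensional statement concrete, here is a minimal sketch for one hypothetical example that is not taken from the notes: minimize $f(x, y) = x^2 + y^2$ subject to $g(x, y) = x + y - 1 = 0$. The conditions $\partial F/\partial x = \partial F/\partial y = \partial F/\partial \lambda = 0$ are then a linear system, solved below by a small Gaussian elimination routine (the helper `solve` is written for this illustration only).

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def f(x, y):
    return x * x + y * y

# F(x, y, lam) = x^2 + y^2 + lam (x + y - 1).
# dF/dx = 2x + lam = 0, dF/dy = 2y + lam = 0, dF/dlam = x + y - 1 = 0.
M = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0]]
rhs = [0.0, 0.0, 1.0]
x0, y0, lam0 = solve(M, rhs)

# Compare with other points on the constraint line x + y = 1.
on_constraint = [f(t, 1.0 - t) for t in (-1.0, 0.0, 0.25, 0.75, 1.0, 2.0)]

print("x0, y0, lambda0 =", x0, y0, lam0)
print("f(x0, y0) =", f(x0, y0), "min over samples =", min(on_constraint))
```

At the computed point, $\nabla f = (1, 1)$ is parallel to $\nabla g = (1, 1)$, which is the geometric content of the multiplier condition.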

For variational problems, we have much the same. Let us consider a variational problem in an abstract form:

$$\min J[y] \quad \text{subject to} \quad W[y] = 0$$

in some admissible class $\mathcal{A} = \{ y : [a, b] \to \mathbb{R} \mid y(a) = y_a,\ y(b) = y_b \}$ in some function space. We approximate this variational problem by a finite dimensional problem. For any large $n$, we partition $[a, b]$ into $n$ equal subintervals:

$$x_i = a + i\, \frac{b - a}{n}, \quad i = 0, ..., n.$$

We approximate $y(\cdot) \in \mathcal{A}$ by a piecewise linear continuous function $\tilde{y}$ with

$$\tilde{y}(x_i) = y(x_i), \quad i = 0, ..., n.$$

The function $\tilde{y} \in \mathcal{A}$ is in one-to-one correspondence with $\mathbf{y} := (y_1, ..., y_{n-1}) \in \mathbb{R}^{n-1}$, where $y_i := y(x_i)$. We approximate $J[y]$ by $J(\mathbf{y}) := J[\tilde{y}]$, and $W[y]$ by $W(\mathbf{y}) := W[\tilde{y}]$. Then the original constrained variational problem is approximated by a constrained optimization problem in finite dimensions. Suppose $\mathbf{y}_0$ is such a solution. According to the method of Lagrange multipliers, if $\nabla W(\mathbf{y}_0) \ne 0$, then there exists a $\lambda_0$ such that $(\mathbf{y}_0, \lambda_0)$ is an extremum of $J(\mathbf{y}) + \lambda W(\mathbf{y})$.

Notice that the infinite dimensional gradient $\delta W/\delta y$ can be approximated by the finite dimensional gradient $\nabla W(\mathbf{y})$. That is,

$$\frac{\delta W}{\delta y}[y] \approx \frac{\delta W}{\delta y}[\tilde{y}] = \frac{\partial W}{\partial \mathbf{y}} = \nabla W(\mathbf{y}).$$

We summarize the above intuitive argument as the following theorem.

Theorem 1.5. If $y_0$ is an extremum of $J[\cdot]$ subject to the constraint $W[y] = 0$, and if $\delta W/\delta y \ne 0$, then there exists a constant $\lambda_0$ such that $(y_0, \lambda_0)$ is an extremum of the functional $J[y] + \lambda W[y]$ with respect to $(y, \lambda)$.

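*Remark. A more careful proof is as follows.

1. We consider two-parameter variations

$$z(x) = y(x) + \epsilon_1 h_1(x) + \epsilon_2 h_2(x).$$

The variations $h_i$ should satisfy the boundary conditions $h_i(a) = h_i(b) = 0$, so that $z$ satisfies the boundary conditions $z(a) = y_a$ and $z(b) = y_b$. For arbitrarily chosen such variations $h_i$, we should also require that the $\epsilon_i$ satisfy

$$W(\epsilon_1, \epsilon_2) = W[y + \epsilon_1 h_1 + \epsilon_2 h_2] = 0.$$

On the variational subspace spanned by $h_i$, $i = 1, 2$, the functional $J$ becomes

$$J(\epsilon_1, \epsilon_2) := J[y + \epsilon_1 h_1 + \epsilon_2 h_2].$$

Thus the original problem is reduced to

$$\min J(\epsilon_1, \epsilon_2) \quad \text{subject to} \quad W(\epsilon_1, \epsilon_2) = 0$$

on this variational subspace. By the method of Lagrange multipliers, there exists a $\lambda$ such that an extremum of the original problem solves the unconstrained optimization problem $\min J + \lambda W$. This leads to three equations, evaluated at $\epsilon_1 = \epsilon_2 = 0$:

$$\begin{aligned}
0 &= \frac{\partial}{\partial \epsilon_1} (J + \lambda W) = \Big( \frac{\delta J}{\delta y} + \lambda \frac{\delta W}{\delta y} \Big) \cdot h_1, \\
0 &= \frac{\partial}{\partial \epsilon_2} (J + \lambda W) = \Big( \frac{\delta J}{\delta y} + \lambda \frac{\delta W}{\delta y} \Big) \cdot h_2, \\
0 &= \frac{\partial}{\partial \lambda} (J + \lambda W) = W[y].
\end{aligned}$$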

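2. Notice that the Lagrange multiplier $\lambda$ so chosen depends on $h_1$ and $h_2$. We want to show that it is indeed a constant. This is proved below.

3. Since $\delta W/\delta y \ne 0$, we choose $x_1$ where $\frac{\delta W}{\delta y}(x_1) \ne 0$. For any $x_2 \in (a, b)$, we consider $h_i = \delta(x - x_i)$, $i = 1, 2$. Here, $\delta$ is the Dirac delta function. It has the property: for any continuous function $f$,

$$\int f(x) \delta(x - x_0)\, dx = f(x_0).$$

By choosing such $h_i$, we obtain that there exists a $\lambda_{12}$ such that

$$\frac{\delta J}{\delta y}(x_1) + \lambda_{12} \frac{\delta W}{\delta y}(x_1) = 0, \quad \frac{\delta J}{\delta y}(x_2) + \lambda_{12} \frac{\delta W}{\delta y}(x_2) = 0.$$

In other words, the constant

$$\lambda_{12} = -\frac{\frac{\delta J}{\delta y}(x_1)}{\frac{\delta W}{\delta y}(x_1)}.$$

For any arbitrarily chosen $x_2$, we get the same constant. Thus, $\lambda_{12}$ is independent of $x_2$. In fact, the above formula shows

$$\frac{\frac{\delta J}{\delta y}(x_1)}{\frac{\delta W}{\delta y}(x_1)} = \frac{\frac{\delta J}{\delta y}(x_2)}{\frac{\delta W}{\delta y}(x_2)}$$

for any $x_2 \ne x_1$. This means that there exists a constant $\lambda$ such that

$$\frac{\delta J}{\delta y}(x) + \lambda \frac{\delta W}{\delta y}(x) = 0 \quad \text{for all } x \in (a, b).$$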

Apply the Lagrange method to the hanging rope problem Let us go back to the hanging rope problem. By the method of Lagrange multipliers, we consider the extremum problem for the new Lagrangian

$$L(y, y', \lambda) = \rho g y \sqrt{1 + y'^2} + \lambda \sqrt{1 + y'^2}.$$

Assume that the density $\rho$ is constant. Then the Lagrangian is independent of $x$, thus it admits the first integral $L - y' L_{y'} = C$, or

$$(\rho g y + \lambda) \left( \sqrt{1 + y'^2} - \frac{y'^2}{\sqrt{1 + y'^2}} \right) = C.$$

Solving for $y'$ gives

$$y' = \pm \frac{1}{C} \sqrt{(\rho g y + \lambda)^2 - C^2}.$$

Using separation of variables, we get

$$\frac{dy}{\sqrt{(\rho g y + \lambda)^2 - C^2}} = \pm \frac{dx}{C}.$$

With the change of variable $u = \rho g y + \lambda$, we get

$$\frac{1}{\rho g} \cosh^{-1} \frac{u}{C} = \pm \frac{x}{C} + C_1.$$

Hence

$$y = -\frac{\lambda}{\rho g} + \frac{C}{\rho g} \cosh \left( \frac{\rho g x}{C} + C_2 \right).$$

The constants $C$, $C_2$ and the Lagrange multiplier $\lambda$ are then determined by the two boundary conditions and the length constraint. The shape of this hanging rope is called a catenary.
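The catenary formula can be sanity-checked against the first integral it was derived from. The sketch below is only an illustration: the numerical values of $\rho g$, $C$, $C_2$ and $\lambda$ are arbitrary assumptions, and $y'$ is approximated by a central difference. Since the first integral simplifies to $(\rho g y + \lambda)/\sqrt{1 + y'^2} = C$, the computed ratio should reproduce $C$ at every sample point.

```python
import math

# Arbitrary illustrative constants (assumed, not from the notes).
rho_g, C, C2, lam = 1.0, 2.0, -0.3, -1.5

def y(x):
    """Catenary y = -lam/(rho g) + (C/(rho g)) cosh(rho g x / C + C2)."""
    return -lam / rho_g + (C / rho_g) * math.cosh(rho_g * x / C + C2)

def first_integral(x, h=1e-5):
    """(rho g y + lam) / sqrt(1 + y'^2), with y' from a central difference."""
    dy = (y(x + h) - y(x - h)) / (2.0 * h)
    return (rho_g * y(x) + lam) / math.sqrt(1.0 + dy * dy)

# Sample the first integral on [0, 1]; each value should be close to C.
samples = [first_integral(0.1 * j) for j in range(11)]

print("C =", C)
print("first integral at sample points:", samples)
```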

1.7 A problem from spring-mass system (see footnote 3)

Consider a spring-mass system which consists of $n$ masses placed vertically between two walls. The $n$ masses and the two end walls are connected by $n + 1$ springs. If all masses are zero, the springs are in their rest states. When the masses are greater than zero, the springs are elongated by the gravitational force. The mass $m_i$ moves down a distance $u_i$, called the displacement. The goal is to find the displacements $u_i$ of the masses $m_i$, $i = 1, ..., n$.

In this model, the nodes are the masses $m_i$. We may treat the end walls as fixed masses and call them $m_0$ and $m_{n+1}$, respectively. The edges (or the bonds) are the springs. Let us call the spring connecting $m_{i-1}$ and $m_i$ edge (or spring) $i$, $i = 1, ..., n + 1$. Suppose spring $i$ has spring constant $c_i$. Let us call the downward direction the positive direction.

Let me start with the simplest case: $n = 1$ and no bottom wall. The mass $m_1$ elongates spring 1 by a displacement $u_1$. The elongated spring exerts a restoring force $-c_1 u_1$ on $m_1$ (see footnote 4). This force must balance the gravitational force on $m_1$. Thus, we have

$$-c_1 u_1 + f_1 = 0,$$

where $f_1 = m_1 g$ is the gravitational force on $m_1$ and $g$ is the gravitational constant. From this, we get

$$u_1 = \frac{f_1}{c_1}.$$

Next, let us consider the case where there is a bottom wall. In this case, both springs 1 and 2 exert upward forces on $m_1$. The balance law becomes

$$-c_1 u_1 - c_2 u_1 + f_1 = 0.$$

This results in $u_1 = f_1/(c_1 + c_2)$.

Let us jump to a slightly more complicated case, say $n = 3$. The displacements $u_0 = 0$ and $u_4 = 0$, because the walls are fixed. The displacements $u_1, u_2, u_3$ cause elongations of the springs:

$$e_i = u_i - u_{i-1}, \quad i = 1, 2, 3, 4.$$

The restoring force of spring $i$ is

$$w_i = c_i e_i.$$

The force exerted on $m_i$ by spring $i$ is $-w_i = -c_i e_i$. Indeed, when $e_i < 0$, the spring is shortened and pushes the mass $m_i$ downward (the sign is positive), hence the force is $-c_i e_i > 0$. On the other hand, when $e_i > 0$, the spring is elongated and pulls $m_i$ upward; we still get the force $-w_i = -c_i e_i < 0$. Similarly, the force exerted on $m_i$ by spring $i + 1$ is $w_{i+1} = c_{i+1} e_{i+1}$. When $e_{i+1} > 0$, spring $i + 1$ is elongated and pulls $m_i$ downward, so the force is $w_{i+1} = c_{i+1} e_{i+1} > 0$. When $e_{i+1} < 0$, it pushes $m_i$ upward, and the force is $w_{i+1} = c_{i+1} e_{i+1} < 0$. In both cases, the force exerted on $m_i$ by spring $i + 1$ is $w_{i+1}$.

Thus, the force balance law on $m_i$ is

$$w_{i+1} - w_i + f_i = 0, \quad i = 1, 2, 3.$$

These are three algebraic equations for the three unknowns $u_1, u_2, u_3$. In principle, we can solve them.

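Let us express the above equations in matrix form. First, the elongation:

$$e = Au, \quad \text{or} \quad \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix} = \begin{pmatrix} 1 & & \\ -1 & 1 & \\ & -1 & 1 \\ & & -1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix};$$

the restoring force:

$$w = Ce, \quad \text{or} \quad \begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{pmatrix} = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & c_3 & \\ & & & c_4 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix};$$

the force balance laws:

$$A^t w = f, \quad \text{or} \quad \begin{pmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & 1 & -1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix},$$

where $A^t$ is the transpose of $A$.

We can write the above equations in block matrix form as

$$\begin{pmatrix} C^{-1} & A \\ A^t & 0 \end{pmatrix} \begin{pmatrix} -w \\ u \end{pmatrix} = \begin{pmatrix} 0 \\ -f \end{pmatrix}.$$

Footnote 3: This section is mainly from Gilbert Strang's book, Computational and Applied Mathematics.

Footnote 4: The minus sign is because the force points upward.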

This kind of block matrix appears commonly in many other physical systems, for instance, network flows, fluid flows. In fact, any optimization system with constraint can be written in this form. Here, the constraint part is the second equation. We shall come back to this point in the next section.

One way to solve the above block matrix system is to eliminate the variable $w$ and get

$$Ku := A^t C A u = f.$$

The matrix $K := A^t C A$ is a symmetric positive definite matrix. It is called the stiffness matrix. For $n = 3$, we get

$$K := A^t C A = \begin{pmatrix} c_1 + c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 + c_4 \end{pmatrix}.$$
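The assembly $K = A^t C A$ and the solution of $Ku = f$ can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the notes' own code: the spring constants $c = (1, 2, 3, 4)$ and loads $f = (1, 1, 1)$ are made-up values, and `stiffness` and `solve` are hypothetical helper names defined here.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def stiffness(c):
    """Return (A, K) for n = len(c) - 1 masses between two fixed walls.
    Row s of A (0-based) gives the elongation of spring s + 1."""
    n = len(c) - 1
    A = [[0.0] * n for _ in range(n + 1)]
    for s in range(n + 1):
        if s < n:
            A[s][s] = 1.0        # + displacement of the mass below the spring
        if s >= 1:
            A[s][s - 1] = -1.0   # - displacement of the mass above the spring
    K = [[sum(A[s][i] * c[s] * A[s][j] for s in range(n + 1))
          for j in range(n)] for i in range(n)]
    return A, K

c = [1.0, 2.0, 3.0, 4.0]   # assumed spring constants c_1..c_4
f = [1.0, 1.0, 1.0]        # assumed gravitational loads f_1..f_3

A, K = stiffness(c)
u = solve(K, f)                                        # displacements
e = [sum(A[s][j] * u[j] for j in range(3)) for s in range(4)]  # e = A u
w = [c[s] * e[s] for s in range(4)]                    # w = C e
balance = [w[i] - w[i + 1] - f[i] for i in range(3)]   # A^t w - f

print("K =", K)
print("u =", u)
print("force balance residual =", balance)
```

Eliminating $w$ first keeps the system small and symmetric positive definite; the block form is more convenient when $w$ itself is the quantity of interest.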

Minimum principle Consider the functional

$$P(u) := \frac{1}{2}(Ku, u) - (f, u),$$

where $K$ is a symmetric positive definite $n \times n$ matrix. The directional derivative of $P$ at $u$ in the direction $v$ is defined as

$$P'(u)v = \frac{d}{dt}\Big|_{t=0} P(u + tv).$$

$P'(u)$ is called the gradient (or the first variation) of $P$ at $u$. We can compute this gradient (see footnote 6):

$$\begin{aligned}
P'(u)v &= \frac{d}{dt}\Big|_{t=0} \Big[ \frac{1}{2}(K(u + tv), u + tv) - (f, u + tv) \Big] \\
&= \frac{1}{2} \big( (Kv, u) + (Ku, v) \big) - (f, v) \\
&= (Ku - f, v).
\end{aligned}$$

Here, we have used the symmetry of $K$. Thus,

$$P'(u) = Ku - f.$$

The second derivative is the Hessian. It is

$$P''(u) = K.$$

Footnote 6: Here, I use the following property: $(f, g)' = (f', g) + (f, g')$. This is because $(f, g) = \sum_i f_i g_i$ and $(f, g)' = \sum_i (f_i' g_i + f_i g_i') = (f', g) + (f, g')$.

If $u^*$ is a minimum of $P(v)$, then $P'(u^*) = 0$. This is called the Euler-Lagrange equation of $P$. Conversely, if $u^*$ satisfies the Euler-Lagrange equation $Ku^* = f$, then $u^*$ is the minimum of $P(v)$. In fact, for any $v$, we claim

$$P(v) - P(u^*) = \frac{1}{2}(K(v - u^*), v - u^*).$$

To see this, since $P(v)$ is a quadratic function of $v$, we can complete the square:

$$\begin{aligned}
P(v) - P(u^*) &= \frac{1}{2}(Kv, v) - (f, v) - \frac{1}{2}(Ku^*, u^*) + (f, u^*) \\
&= \frac{1}{2}(Kv, v) - \frac{1}{2}(Ku^*, u^*) - (f, v - u^*) \\
&= \frac{1}{2}(Kv, v) - \frac{1}{2}(Ku^*, u^*) - (Ku^*, v - u^*) \\
&= \frac{1}{2}(Kv, v) + \frac{1}{2}(Ku^*, u^*) - (Ku^*, v) \\
&= \frac{1}{2}(K(v - u^*), v - u^*) \ge 0.
\end{aligned}$$

Hence $u^*$ is a minimum. In fact, $u^*$ is the only minimum, because $P(v) = P(u^*)$ if and only if $(K(v - u^*), v - u^*) = 0$; since $K$ is positive definite, this gives $v - u^* = 0$.

We summarize the above discussion as follows.

Theorem 1.6. Let $P(u) := \frac{1}{2}(Ku, u) - (f, u)$, where $K$ is symmetric positive definite. A vector $u^*$ which minimizes $P(v)$ must satisfy the Euler-Lagrange equation $P'(u^*) = Ku^* - f = 0$. The converse is also true.
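The identity $P(v) - P(u^*) = \frac{1}{2}(K(v - u^*), v - u^*)$ behind Theorem 1.6 can be checked on a small instance. In the sketch below, the matrix $K$ (unit spring constants, three masses), the load $f$, the solution $u^* = (1.5, 2, 1.5)$ and the trial vectors are illustrative assumptions chosen for this check; the code first confirms $Ku^* = f$ and then compares both sides of the identity.

```python
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]        # stiffness matrix for c_1 = ... = c_4 = 1 (assumed)
f = [1.0, 1.0, 1.0]           # assumed load
u_star = [1.5, 2.0, 1.5]      # candidate minimizer, verified below

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def P(v):
    """P(v) = (1/2)(Kv, v) - (f, v)."""
    return 0.5 * dot(matvec(K, v), v) - dot(f, v)

def quadratic_gap(v):
    """(1/2)(K(v - u*), v - u*)."""
    d = [vi - ui for vi, ui in zip(v, u_star)]
    return 0.5 * dot(matvec(K, d), d)

# Euler-Lagrange equation: K u* - f should vanish.
residual = [ki - fi for ki, fi in zip(matvec(K, u_star), f)]

trials = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, -1.0, 3.0], [1.5, 2.0, 1.6]]
gaps = [(P(v) - P(u_star), quadratic_gap(v)) for v in trials]

print("K u* - f =", residual)
for lhs, rhs in gaps:
    print("P(v) - P(u*) =", lhs, " quadratic form =", rhs)
```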

The physical meaning of $P$ is the total potential energy of the spring-mass system. Indeed,

$$\frac{1}{2}(CAu, Au) = \sum_{i=1}^{n+1} \frac{1}{2} c_i (u_i - u_{i-1})^2$$

is the sum of the potential energies stored in the $n + 1$ springs, whereas the term

$$(f, u) = \sum_{i=1}^{n} f_i u_i$$

is the sum of the work done on the masses $m_i$ through the displacements $u_i$, $i = 1, ..., n$. The term $-(f, u)$ is the gravitational potential energy of the masses $m_i$ with displacements $u_i$.

1.8 A Problem from Elasticity

Consider a continuous elastic bar of length 1 which is hung vertically (it is displaced up and down due to gravity). Set up an $x$-axis along the bar so that its positive direction points downwards and its origin is located at the top of the elastic bar. Consider any point at $x$ along the bar (the position is at $x$ if no external force is present); it is displaced down to $x + u(x)$ because of the action of the external force of gravity. The function $u(x)$ is called the displacement. The stretching at any point is measured by the derivative $e = du/dx$, called the strain. If $u$ is constant, the elastic bar is unstretched. Otherwise, the stretching of the bar produces an internal force called stress (one can experience this force easily by pulling the two ends of an elastic bar). Experiments show that this internal force is proportional to the strain of the bar, i.e.

$$\text{(internal force)} \quad \sigma(x) = c(x) \frac{du}{dx},$$

where $c(x)$ is a constant determined by the elastic material, or a function if the material is inhomogeneous.

To set up the model, we take a small piece of the bar $[x, x + \Delta x]$; its equilibrium requires all forces acting on it to be balanced. We have

$$a\, c(x) \frac{du}{dx}\Big|_{x + \Delta x} - a\, c(x) \frac{du}{dx}\Big|_{x} + (\rho\, \Delta x\, a)\, g = 0, \tag{1.3}$$

where $g$ is the gravitational constant, $a$ the cross-sectional area, and $\rho(x)$ the density at position $x$. Dividing both sides of equation (1.3) by $a \Delta x$ and then letting $\Delta x \to 0$, we get

$$-\frac{d}{dx} \Big( c(x) \frac{du}{dx} \Big) = f(x), \tag{1.4}$$

where $f(x) = g \rho(x)$ is the external force per unit length.

The equation (1.4) must come with some appropriate physical boundary conditions to ensure it is well-posed.

Boundary conditions

(a) Both ends of the elastic bar are fixed, so there are no displacements: $u(0) = 0$, $u(1) = 0$. These are called Dirichlet boundary conditions.

(b) The top end of the elastic bar is fixed (no displacement) and the other end is free (no internal force, since it is in the air):

$$u(0) = 0, \quad w\big|_{x=1} = c(x) \frac{du}{dx}\Big|_{x=1} = 0.$$

The first is called a Dirichlet boundary condition, the second a Neumann boundary condition. The boundary conditions

$$u(0) = 0, \quad \text{or} \quad u(1) = 0, \quad \text{or} \quad c(x) \frac{du}{dx}\Big|_{x=1} = 0$$

are all called homogeneous boundary conditions, while the boundary conditions

$$u(0) = -1, \quad \text{or} \quad u(1) = 2, \quad \text{or} \quad c(x) \frac{du}{dx}\Big|_{x=1} = 3$$

are all called non-homogeneous boundary conditions. A physical example of the last boundary condition is an object of weight 3 hanging at the end $x = 1$.

Variational formulation We shall discuss how to derive the variational formulation for the differential equation (1.4) with Dirichlet boundary conditions. The same methodology can be applied to other second order differential equations, such as Sturm-Liouville systems, and to more general boundary conditions.

The derivation is standard and simple. We multiply both sides of equation (1.4) by an arbitrary test function $v$ (called a virtual strain) satisfying $v(0) = v(1) = 0$, and then integrate over $(0, 1)$. This gives

$$\int_0^1 -\frac{d}{dx} \Big( c(x) \frac{du}{dx} \Big) v(x)\, dx = \int_0^1 f(x) v(x)\, dx.$$

Now, integrating by parts and using the boundary conditions $v(0) = v(1) = 0$, we have

$$\int_0^1 c(x) \frac{du}{dx} \frac{dv}{dx}\, dx = \int_0^1 f(x) v\, dx.$$

This leads to the variational formulation of equation (1.4) with Dirichlet boundary conditions. Namely,

Find the solution $u$ such that $u(0) = 0$, $u(1) = 0$ and

$$a(u, v) = g(v) \quad \text{for any } v \text{ satisfying } v(0) = 0 \text{ and } v(1) = 0, \tag{1.5}$$

where $a(\cdot, \cdot)$ and $g(\cdot)$ are given by

$$a(u, v) = \int_0^1 c(x) \frac{du}{dx} \frac{dv}{dx}\, dx, \quad g(v) = \int_0^1 f(x) v\, dx.$$

Definition 1.1.

• A $C^2$-solution of (1.4) is called a classical solution.

• A solution of (1.5) is called a weak solution of (1.4).

The advantage of this variational formulation (also called the weak formulation) is that it involves only first order derivatives of $u$, not the second derivatives that appear in the original differential equation. Thus, it imposes weaker regularity requirements on the solution $u$. Physically, we do encounter discontinuous coefficients: the stiffness function $c(x)$ is discontinuous if the bar is made of two materials with different stiffness joined at a point. We have the following proposition.

Proposition 1.1. Suppose $c(x)$ is discontinuous at $\bar{x}$ and smooth elsewhere. Suppose $u$ is continuous and is a $C^2$ solution of (1.4) on both sides of $\bar{x}$. Then $u$ is a solution of (1.5) if and only if it satisfies the following jump condition across $\bar{x}$:

$$[c u_x] = 0 \quad \text{across } \bar{x}.$$

Here, $[f] := f(\bar{x}+) - f(\bar{x}-)$ denotes the jump across $\bar{x}$.

Minimal energy formulation Similar to the argument in classical mechanics, for Dirichlet boundary conditions one can define the energy by

$$E[u] := \frac{1}{2} a(u, u) - g(u),$$

and the admissible class

$$\mathcal{A} = \{ u \in C^1[0, 1] \mid u(0) = u(1) = 0 \}.$$

Then the Euler-Lagrange equation corresponding to

$$\min_{u \in \mathcal{A}} E[u]$$

is

$$-\frac{d}{dx} \Big( c(x) \frac{du}{dx} \Big) = f(x) \quad \text{with } u(0) = u(1) = 0.$$

In conclusion, the following three formulations are equivalent for $u$ with $u(0) = u(1) = 0$:

• Minimizing energy: $\min_u \int_0^1 \big( \frac{1}{2} c(x) u_x^2 - f(x) u(x) \big)\, dx$

• Variational formulation: $\int_0^1 \big( c(x) u_x v_x - f(x) v(x) \big)\, dx = 0$ for all $v$ with $v(0) = v(1) = 0$

• The Euler-Lagrange equation: $-\frac{d}{dx} \big( c(x) \frac{du}{dx} \big) = f(x)$

What is the energy functional corresponding to the boundary condition $u(0) = 0$ and the free-end boundary condition $c(1) u_x(1) = 3$?


Elastic bar model is a continuous limit of the spring-mass system. In the continuous model (1.4), we divide the domain $[0, 1]$ uniformly into $n + 1$ subintervals, each of length $\Delta x = 1/(n + 1)$. We label the grid points $i \Delta x$ by $x_i$. We imagine there are masses $m_i$ at $x_i$ with springs connecting them consecutively. Each spring has length $\Delta x$ while it is at rest. According to the spring-mass model, we have

$$c_i (u_i - u_{i-1}) - c_{i+1} (u_{i+1} - u_i) = m_i g,$$

where $c_i$ is the spring constant of the spring connecting $x_{i-1}$ to $x_i$. As $\Delta x \to 0$ with $x_i \approx x$, we have

$$m_i \approx \rho(x_i) \Delta x, \quad c_i \approx c(x_{i-1/2}) / \Delta x.$$

Here, $\rho$ is the density. Why is the spring constant proportional to $1/\Delta x$? Think about the following problem: if we connect $n$ springs with the same spring constant in series, what is the resulting spring constant?

With this approximation, we get that for small $\Delta x$, the spring-mass system becomes

$$\frac{1}{\Delta x} \Big[ c(x_{i-1/2}) (u_i - u_{i-1}) - c(x_{i+1/2}) (u_{i+1} - u_i) \Big] = \rho(x_i)\, g\, \Delta x.$$

Dividing by $\Delta x$ and taking $\Delta x \to 0$, we get the equation for the elastic bar:

$$-\frac{d}{dx} \Big( c(x) \frac{d}{dx} u(x) \Big) = f(x), \quad \text{where } f = g\rho.$$

Notice that the end displacements $u_0$ and $u_{n+1}$ satisfy the fixed-end boundary conditions

$$u_0 = 0, \quad u_{n+1} = 0,$$

which correspond to the boundary conditions of $u(\cdot)$ in the elastic bar model: $u(0) = 0$, $u(1) = 0$.
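The discrete system above is tridiagonal and can be solved directly. The sketch below is an illustration under stated assumptions: it takes $c \equiv 1$ and $f \equiv 1$ (values chosen here, not in the notes), for which the continuous problem $-u'' = 1$, $u(0) = u(1) = 0$ has the exact solution $u(x) = x(1 - x)/2$, and it solves the spring-mass equations with the standard Thomas algorithm. The function name `solve_bar` is hypothetical.

```python
def solve_bar(c, f, n):
    """Solve the spring-mass discretization of -(c u')' = f on [0, 1]
    with u(0) = u(1) = 0, using n interior grid points.
    Row i reads: -c(x_i - h/2) u_{i-1} + (c(x_i - h/2) + c(x_i + h/2)) u_i
                 - c(x_i + h/2) u_{i+1} = f(x_i) h^2."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    lower = [-c(xi - 0.5 * h) for xi in x]   # coefficient of u_{i-1}
    upper = [-c(xi + 0.5 * h) for xi in x]   # coefficient of u_{i+1}
    diag = [-(lo + up) for lo, up in zip(lower, upper)]
    rhs = [f(xi) * h * h for xi in x]
    # Thomas algorithm: forward elimination ...
    for i in range(1, n):
        factor = lower[i] / diag[i - 1]
        diag[i] -= factor * upper[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    # ... and back substitution (u_{n+1} = 0 closes the last row).
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return x, u

# Assumed test data: homogeneous bar c = 1 under uniform load f = 1.
x, u = solve_bar(lambda s: 1.0, lambda s: 1.0, n=9)
exact = [xi * (1.0 - xi) / 2.0 for xi in x]
max_error = max(abs(ui - ei) for ui, ei in zip(u, exact))

print("grid points:", x)
print("discrete u :", u)
print("max error vs x(1 - x)/2:", max_error)
```

For this quadratic exact solution the second difference is exact, so the discrete and continuous solutions agree at the grid points up to rounding; for general $c$ and $f$ one would expect agreement only as $\Delta x \to 0$.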

1.9 A Problem from Fluid Mechanics

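Let us consider two-dimensional incompressible and irrotational flows. Incompressibility reads

$$\nabla \cdot u = 0.$$

Irrotationality gives

$$\nabla \times u = 0.$$

From the second, there exists a function $\phi$ such that $u = \nabla \phi$. Together with $\nabla \cdot u = 0$, this gives

$$\nabla^2 \phi = 0.$$

Suppose the fluid is outside some domain $\Omega$. Then on the boundary $\partial\Omega$, $u \cdot n = 0$. Here, $n$ is the outer normal of $\partial\Omega$. This is equivalent to the Neumann boundary condition for $\phi$:

$$\nabla \phi \cdot n = 0 \quad \text{on } \partial\Omega.$$

At the far field, we assume that the flow has constant velocity $(-U, 0)$. This is equivalent to $\phi(x, y) \approx -Ux$ as $|(x, y)| \to \infty$. We can subtract $\phi_0 = -Ux$ from $\phi$. Let $\Phi := \phi - \phi_0$. Then $\Phi$ satisfies

$$\nabla^2 \Phi = 0 \ \text{in } \Omega^c, \quad \nabla \Phi \cdot n = U n_x \ \text{on } \partial\Omega, \quad \Phi \to 0 \ \text{as } |(x, y)| \to \infty.$$

Here, $n = (n_x, n_y)$ is the outer normal of $\partial\Omega$; the boundary condition follows from $\nabla \phi \cdot n = 0$ and $\nabla \phi_0 \cdot n = -U n_x$. To derive the variational formulation, we multiply both sides of $\nabla^2 \Phi = 0$ by a test potential $\psi$, then integrate over the outer domain $\Omega^c$. We require $\psi(\infty) = 0$. Since $n$ points into $\Omega^c$, the divergence theorem gives

$$\begin{aligned}
0 = \int_{\Omega^c} \nabla^2 \Phi\, \psi\, dx &= -\int_{\Omega^c} \nabla \Phi \cdot \nabla \psi\, dx + \int_{\Omega^c} \nabla \cdot \big( (\nabla \Phi) \psi \big)\, dx \\
&= -\int_{\Omega^c} \nabla \Phi \cdot \nabla \psi\, dx - \int_{\partial\Omega} \psi\, \nabla \Phi \cdot n\, dS \\
&= -\int_{\Omega^c} \nabla \Phi \cdot \nabla \psi\, dx - \int_{\partial\Omega} \psi\, U n_x\, dS.
\end{aligned}$$

Thus, the variational formulation is to find $\Phi$ such that

$$\int_{\Omega^c} \nabla \Phi \cdot \nabla \psi\, dx + \int_{\partial\Omega} \psi\, U n_x\, dS = 0$$

for all test functions $\psi \in C^1(\Omega^c)$ with $\psi(\infty) = 0$. The optimization formulation is to find

$$\min_{\Phi \in \mathcal{A}}\ \frac{1}{2} \int_{\Omega^c} |\nabla \Phi|^2\, dx + \int_{\partial\Omega} \Phi\, U n_x\, dS, \quad \mathcal{A} = \{ \Phi \in C^1(\Omega^c) \mid |\Phi(x)| \to 0 \text{ as } |x| \to \infty \}.$$

I leave it to you to prove that the above three formulations are equivalent when $\Phi \in C^2(\Omega^c)$ and $\Phi(\infty) = 0$.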

1.10 A problem from image science – Compressed Sensing

In image science, the data (image) is sometimes very sparse under some representation. For instance, a cartoon image is piecewise smooth, hence it is sparse when represented in wavelets. Suppose the image is expressed as a vector $x$ in $\mathbb{R}^n$; the dimension is $n = 512^2$ for a $512 \times 512$ image. When $x$ is represented in wavelets, $x = \Psi d = \sum_i d_i \psi_i$, most coefficients $\{d_i\}$ are zero or very close to zero. In this case, we say $x$ is sparse as represented in terms of $\Psi$.

The data is usually detected by a so-called sensing matrix $A = (a_{ij})$, which is an $m \times n$ matrix. Each individual sensing is

\[ b_i = \sum_j a_{ij} x_j + n_i, \]

where $b_i$ is the data collected and $n_i$ is noise.

The idea of compressed sensing is to detect sparse data $x$ (or $d$) by an $m \times n$ sensing matrix $A$ with $m \ll n$. If the noise is Gaussian white noise with mean 0 and variance $\epsilon$, then we have

\[ \|Ax - b\|^2 \le \epsilon. \]

There are infinitely many $x \in \mathbb{R}^n$ satisfying the above constraint. Among them, we want to find the one which is most sparse as represented in $\Psi$. That is,

\[ \min_d |d|_0 \quad \text{subject to } \|A\Psi d - b\|^2 \le \epsilon. \]

Here, the $L^0$ "norm" is defined to be

\[ |d|_0 = \#\{i \mid d_i \neq 0\}. \]

Indeed, $|\cdot|_0$ is not a norm. This optimization problem is non-convex, and solving it is an NP-hard problem. In the theory of compressed sensing, if $A\Psi$ satisfies a certain incoherence condition, then the problem is equivalent to the following $L^1$ minimization problem:

\[ \min_d |d|_1 \quad \text{subject to } \|A\Psi d - b\|^2 \le \epsilon, \qquad \text{where } |d|_1 := \sum_i |d_i|. \]

This is a convex optimization problem, which enjoys polynomial computational complexity, and many numerical algorithms are available.
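To see why the $L^1$ norm favors sparse solutions, here is a minimal toy sketch, not a compressed sensing solver. It assumes one noiseless measurement of two unknowns, takes $\Psi$ to be the identity, and uses a hypothetical sensing row `a` and datum `b`. It searches the solution line by brute force and compares the minimizers of $|d|_1$ and $\|d\|_2^2$.

```python
# Toy illustration: one measurement, two unknowns, Psi = identity, no noise.
# The sensing row a and the datum b are hypothetical values chosen for this sketch.
a = (1.0, 2.0)
b = 2.0

def norm0(d, tol=1e-9):
    """Number of entries that are nonzero up to the tolerance tol."""
    return sum(1 for v in d if abs(v) > tol)

def norm1(d):
    return sum(abs(v) for v in d)

def norm2_sq(d):
    return sum(v * v for v in d)

def solution(t):
    """The solution of a[0]*d0 + a[1]*d1 = b with first entry d0 = t."""
    return (t, (b - a[0] * t) / a[1])

# Brute-force search over t in [-2, 4] with step 0.001.
grid = [(k - 2000) / 1000 for k in range(6001)]
d_l1 = min((solution(t) for t in grid), key=norm1)
d_l2 = min((solution(t) for t in grid), key=norm2_sq)

print("min L1:", d_l1, "nonzero entries:", norm0(d_l1))
print("min L2:", d_l2, "nonzero entries:", norm0(d_l2))
```

In this toy problem the $L^1$ minimizer sits on a coordinate axis and therefore has a single nonzero entry, while the least-squares minimizer spreads the weight over both entries.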

Homeworks 1.1.

1. Prove Proposition 1.1. Also state and prove this proposition for the two-dimensional case.

2. In the one-dimensional elastic bar model, what is the energy functional corresponding to the boundary condition $u(0) = 0$ and the free-end boundary condition $c(1)u_x(1) = 3$?

3. Search for some pictures of minimal surfaces and make an album of minimal surfaces. Don't forget to cite where they are from.

Figure 1.2: The left one is a spring without any mass. The middle one is a spring hanging a mass $m_1$ freely. The right one is a mass $m_1$ with two springs fixed on the ceiling and floor.

Chapter 2 Metric Spaces, Banach Spaces

2.1 Metric spaces

2.1.1 History and examples

The French mathematician Maurice Fréchet (1878-1973) introduced metric spaces in 1906 in his dissertation, in which he opened the field of functionals on metric spaces and introduced the notion of compactness [Wiki]. These are important concepts of point set topology.

Definition 2.1. Let $X$ be a set. A metric $d$ is a mapping $d : X \times X \to \mathbb{R}$ satisfying

(a) $d(x, y) \ge 0$ for all $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;

(b) $d(x, y) = d(y, x)$ for all $x, y \in X$;

(c) (triangle inequality) $d(x, y) \le d(x, z) + d(y, z)$ for all $x, y, z \in X$.

A metric space $(X, d)$ is a set $X$ equipped with a metric $d$.

Examples

1. The sphere $S^2$ equipped with the Euclidean distance in $\mathbb{R}^3$ is a metric space. The sphere $S^2$ can also have another metric, the geodesic distance (or great-circle distance). The geodesic distance $d(x, y)$ is the length of the shortest path on the sphere connecting $x$ and $y$.

2. The continuous function space $C[a, b]$ is defined by

\[ C[a, b] = \{u : [a, b] \to \mathbb{R} \text{ is continuous}\} \]

with the metric

\[ d(u, v) := \sup_{x \in [a,b]} |u(x) - v(x)|. \]

You can check that $d$ is a metric.

3. An (undirected) graph $G = (V, E)$ consists of a vertex set $V = \{x, y, ...\}$ and an edge set $E = \{e = (x, y), ...\}$. Two vertices $x, y$ are called adjacent to each other if there is an edge $e \in E$ connecting them, and their distance is defined to be 1. A path consists of connecting edges. The distance between any two vertices $x$ and $y$ is defined to be the length of the shortest path connecting them, if any; otherwise it is defined to be infinity. Let $N = |V|$ be the number of vertices and $A$ be the $N \times N$ matrix whose $(i, j)$ entry is 1 if there is an edge connecting vertices $i$ and $j$, and is zero otherwise. Then the distance is $d(i, j) = \min\{n \ge 0 \mid (A^n)(i, j) \neq 0\}$. Here, $(A^n)(i, j)$ means the $(i, j)$ entry of the matrix $A^n$. The graph $G$ with this metric is a typical example of a discrete metric space. However, discrete metric spaces are not our concern in this lecture.
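The formula $d(i, j) = \min\{n \mid (A^n)(i, j) \neq 0\}$ can be tried out directly. The following is a minimal sketch in plain Python. The function names and the small example graph are hypothetical choices for illustration, and it uses the fact that a shortest path in a graph with $N$ vertices has at most $N - 1$ edges.

```python
def mat_mul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def graph_distance(A, i, j):
    """Smallest n >= 0 with (A^n)(i, j) != 0, or infinity if i and j are not connected."""
    N = len(A)
    power = [[1 if r == c else 0 for c in range(N)] for r in range(N)]  # A^0 = I
    for n in range(N):  # check A^0, A^1, ..., A^(N-1)
        if power[i][j] != 0:
            return n
        power = mat_mul(power, A)
    return float("inf")

# Hypothetical example: the path graph 0 - 1 - 2 - 3 plus an isolated vertex 4.
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(graph_distance(A, 0, 3))  # the only path 0-1-2-3 has three edges
print(graph_distance(A, 0, 4))  # vertex 4 cannot be reached from vertex 0
```

Repeated matrix multiplication is used here only because it mirrors the formula; a breadth-first search would compute the same distances more cheaply.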

2.1.2 Limits and Continuous Functions

Definition 2.2. A sequence $\{x_n\}$ in a metric space $X$ is said to converge to $x \in X$ if $d(x_n, x) \to 0$ as $n \to \infty$. That is, all but finitely many of them cluster at $x$. In other words, for any $\epsilon > 0$ there exists $N$ such that $d(x_n, x) < \epsilon$ for all $n \ge N$.

Some basic notions.

• A point $x$ is called a limit point of a set $A$ in a metric space $X$ if it is the limit of a sequence $\{x_n\} \subset A$ with $x_n \neq x$.

• The closure of a set $A$ in a metric space $X$ is the union of $A$ with all its limit points. We denote it by $\bar{A}$.

• A set $A$ is called closed if $A = \bar{A}$.

• The set $B(x, \epsilon) := \{y \in X \mid d(x, y) < \epsilon\}$ denotes the $\epsilon$-ball centered at $x$.

• A point $x$ is called an interior point of a set $A$ if there exists a neighborhood $B(x, \epsilon) \subset A$ for some $\epsilon > 0$. The set of all interior points of $A$ is called the interior of $A$ and is denoted by $A^o$.

• A set $A$ is called open if $A = A^o$.

• The complement of a set $A$ is $A^c := \{x \in X \mid x \notin A\}$.

One can show the following basic properties:

• $(A^o)^o = A^o$.

• $\bar{\bar{A}} = \bar{A}$.

• $(\bar{A})^c = (A^c)^o$.

• An arbitrary union of open sets is open.

• An arbitrary intersection of closed sets is closed.

• A finite union of closed sets is closed.

Examples

1. The sequence $\{(-1)^n + \frac{1}{n}\}$ has no limit.

2. The closure of $\mathbb{Q}$ in $\mathbb{R}$ is $\mathbb{R}$.

3. $\mathbb{R}$ is both open and closed.

Some limit properties in $\mathbb{R}$

1. Infimum and limit infimum for a set. Let $A$ be a set in $\mathbb{R}$. We have the following definitions.

(a) $b$ is a lower bound of $A$ if $b \le x$ for any $x \in A$.

(b) $m$ is the infimum of $A$, or the greatest lower bound (g.l.b.) of $A$, if (i) $m$ is a lower bound of $A$, and (ii) $b \le m$ for any lower bound $b$ of $A$. We denote it by $\inf A$.

(c) $m$ is the limit inferior (or limit infimum) of $A$ if $m$ is the infimum of the set of the limit points of $A$. We denote it by $\liminf A$. If the set of limit points of $A$ is empty, then we define $\liminf A = \infty$.

(d) We have the following property: if $A$ has a lower bound, then $m = \liminf A$ if and only if for any $\epsilon > 0$, (i) all but finitely many $x \in A$ satisfy $m - \epsilon < x$, and (ii) there exist infinitely many $x \in A$ such that $x < m + \epsilon$.

2. Supremum and limit superior for a set. Let $A$ be a set in $\mathbb{R}$. We have the following definitions.

(a) $u$ is an upper bound of $A$ if $u \ge x$ for any $x \in A$.

(b) $M$ is the supremum of $A$, or the least upper bound (l.u.b.) of $A$, if (i) $M$ is an upper bound of $A$, and (ii) $u \ge M$ for any upper bound $u$ of $A$. We denote it by $\sup A$.

(c) $M$ is the limit superior (or limit supremum) of $A$ if $M$ is the supremum of the set of the limit points of $A$. We denote it by $\limsup A$. If the set of limit points is empty, we define $\limsup A = -\infty$.

(d) We have the following property: if $A$ has an upper bound, then $M = \limsup A$ if and only if for any $\epsilon > 0$, (i) all but finitely many $x \in A$ satisfy $M + \epsilon > x$, and (ii) there exist infinitely many $x \in A$ such that $x > M - \epsilon$.

3. Infimum and limit infimum for a sequence. Let $(x_n)$ be a sequence. The infimum and liminf of $(x_n)$ are defined by treating the sequence as a set. The definition of liminf is equivalent to

\[ m = \liminf_{n \to \infty} x_n \iff m = \lim_{n \to \infty} \inf_{k \ge n} x_k \iff m = \sup_{n \ge 0} \inf_{k \ge n} x_k. \]

Examples

1. Let $x_n = (-1)^n - 1/n$, $n \ge 1$. Then $\inf\{x_n\} = -2$ and $\liminf_{n \to \infty} x_n = -1$.
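The tail infima $\inf_{k \ge n} x_k$ in this example can be tabulated numerically. The sketch below truncates the sequence at a finite length `N`, which is an arbitrary choice made here. The computed tail infima are therefore only trustworthy for $n$ well below `N`.

```python
def x(n):
    return (-1) ** n - 1.0 / n

N = 2000  # truncation length, an arbitrary choice for this sketch
xs = [x(n) for n in range(1, N + 1)]  # xs[n - 1] holds x_n

# tail_inf[n - 1] approximates inf_{k >= n} x_k using the computed terms only.
tail_inf = [0.0] * N
running = float("inf")
for idx in range(N - 1, -1, -1):
    running = min(running, xs[idx])
    tail_inf[idx] = running

print("infimum over computed terms:", min(xs))
for n in (1, 11, 101, 1001):
    print(n, tail_inf[n - 1])
```

The printed tail infima increase with $n$ and approach $-1$ from below, in agreement with $\liminf x_n = \sup_n \inf_{k \ge n} x_k = -1$.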


Continuous functions

Definition 2.3. Let $f$ be a function which maps $(X, d_X)$ into $(Y, d_Y)$. We say $f$ is continuous at a point $x_0 \in X$ if for any $\epsilon > 0$ there exists a $\delta > 0$ such that

\[ d_Y(f(x), f(x_0)) < \epsilon \quad \text{whenever } d_X(x, x_0) < \delta. \]

Roughly speaking, $f$ is continuous at $x_0$ means that whenever $x$ is close to $x_0$, the corresponding $f(x)$ has to be close to $f(x_0)$. This definition is indeed equivalent to the following two definitions. Their proofs are left to you, to get familiar with the $\epsilon$-$\delta$ language of limit theory.

Definition 2.4. We say that $f$ is sequentially continuous at a point $x_0 \in X$ if for any sequence $(x_n)_{n=1}^\infty$ with $x_n \to x_0$, we have $f(x_n) \to f(x_0)$ as $n \to \infty$.

Definition 2.5. We say $f$ is continuous at $x_0 \in X$ if, for every open neighborhood $V$ of $f(x_0)$ in $Y$, the preimage $f^{-1}(V)$ contains an open neighborhood of $x_0$.

The $\epsilon$-$\delta$ definition for continuity is the most general formulation of continuity in metric spaces. A more restricted but more quantitative definition is the following order of continuity. The relative closeness of $f(x)$ to $f(x_0)$ with respect to $d_X(x, x_0)$ can be measured by

\[ d_Y(f(x), f(x_0)) \le \omega(d_X(x, x_0)), \]

where $\omega(t)$ is a non-negative increasing function with $\omega(t) \to 0$ as $t \to 0$. For instance, the function $|x|^\alpha \sin(1/x)$ ($\alpha > 0$), extended by 0 at $x = 0$, is continuous at $x = 0$. Its order of continuity there can be measured by $\omega(t) = |t|^\alpha$. Thus, continuity can be measured by some majorant function $\omega(\cdot)$. But continuity is independent of oscillation. The oscillation can be measured from the derivative of the function, or the local variation of the function. Among the majorant functions, $\omega(t) = |t|^\alpha \to 0$ for $\alpha > 0$. It converges fast if $\alpha$ is large, and slowly if $\alpha$ is close to 0. The majorant function $\omega(t) = 1/|\ln |t||$ converges to 0 very slowly as $|t| \to 0$, as compared with $|t|^\alpha$.

Exercise. Use the $\epsilon$-$\delta$ argument to show that $x^2$, $1/x$, $\sin(1/x)$ are continuous on $(0, 1)$.

Infimum and limit infimum of a function

1. Let $f : (X, d) \to \mathbb{R}$. Then $\inf_{x \in X} f(x) := \inf\{f(x) \mid x \in X\}$, and

\[ \liminf_{x \to \bar{x}} f(x) := \lim_{\delta \to 0+} \ \inf_{0 < d(x, \bar{x}) < \delta} f(x). \]

2. The definition $m = \liminf_{x \to \bar{x}} f(x)$ is equivalent to: (a) for any $\epsilon > 0$, there exists a $\delta > 0$ such that $m - \epsilon < f(x)$ for all $x$ with $0 < d(x, \bar{x}) < \delta$; (b) for any $\epsilon > 0$ and any $\delta > 0$, there exists an $x$ with $0 < d(x, \bar{x}) < \delta$ such that $f(x) < m + \epsilon$.

3. Let

\[ f(x) = \begin{cases} |x| & x \neq 0 \\ -1 & x = 0. \end{cases} \]

Then $\liminf_{x \to 0} f(x) = 0$.

Definition 2.6. A function $f : (X, d) \to \mathbb{R}$ is called lower semi-continuous (l.s.c.) if for every $x \in X$,

\[ \liminf_{y \to x} f(y) \ge f(x). \]

There is an equivalent way to check lower semi-continuity, by the epigraph. It is defined to be

\[ \operatorname{epi} f := \{(x, t) \in X \times \mathbb{R} \mid f(x) \le t\}. \]

Then a function is l.s.c. if and only if its epigraph is closed in $X \times \mathbb{R}$.

2.1.3 Completions of metric spaces

Definition 2.7. A sequence $\{x_n\}$ in a metric space $X$ is called a Cauchy sequence if all but finitely many of them cluster. This means that for any $\epsilon > 0$, there exists an $N$ such that $d(x_n, x_m) < \epsilon$ for any $n, m \ge N$.

Definition 2.8. A metric space $X$ is called complete if all Cauchy sequences in $X$ converge.

Examples

1. $\mathbb{R}^n$, $\mathbb{C}^n$ are complete metric spaces.

2. $\mathbb{Q}^n$ equipped with the metric $d(x, y) := \|x - y\|_2$ is not complete. But the completion of $\mathbb{Q}^n$ in $\mathbb{R}^n$ is $\mathbb{R}^n$.

Given a metric space $(X, d)$, there is a natural way to extend it to a smallest complete metric space $(\tilde{X}, \tilde{d})$, which means that

1. There is an imbedding $\imath : X \to \tilde{X}$. This means that $\imath$ is one-to-one.

2. The restriction of $\tilde{d}$ to $\imath(X)$ is identical to $d$. That is, $\tilde{d}(\imath x, \imath y) = d(x, y)$.

3. $(\tilde{X}, \tilde{d})$ is complete.

4. $\imath(X)$ is dense in $\tilde{X}$, that is, $\overline{\imath(X)} = \tilde{X}$.

In applications, we would like to work on complete spaces, which allow us to take limits. If a metric space is not complete, we can take its completion. The completion of an incomplete space mimics the completion of $\mathbb{Q}$ in $\mathbb{R}$. Any real number can be approximated by rational sequences. Such an approximating sequence can be constructed in many ways. For instance, let $x \in \mathbb{R}$ be represented by

\[ x = \sum_{i=-m}^{\infty} a_i p^{-i}, \]

where $p > 1$ is a positive integer, $m$ an integer, and $0 \le a_i < p$ are integers. We choose

\[ x_n = \sum_{i=-m}^{n} a_i p^{-i}. \]

Then $(x_n)$ is a Cauchy sequence and approaches $x$. Certainly there are infinitely many Cauchy sequences approaching the same $x$. We say that they are equivalent. In other words, $(x_n) \sim (y_n)$ if $x_n - y_n \to 0$ as $n \to \infty$. The collection of all those Cauchy sequences which approach the same real number $x$ is called an equivalence class. Any particular Cauchy sequence in this equivalence class is called a representative of the real number. Thus, we may identify a real number $x$ with the equivalence class of Cauchy sequences associated with it. This correspondence is one-to-one and onto. Thus, $\mathbb{R}$ can be viewed as the set of all these equivalence classes.
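The truncated expansions above are easy to experiment with in exact rational arithmetic. The sketch below is an illustration under simplifying assumptions: the target `x` is itself rational (so that it can be stored exactly), it lies in $[0, 1)$, and the function name `truncations` is a hypothetical helper. It builds the base-2 and base-10 truncations of the same number, two different representatives of the same equivalence class.

```python
from fractions import Fraction

def truncations(x, p, n_max):
    """Truncated base-p expansions x_0, ..., x_{n_max} of a rational x in [0, 1)."""
    partial, rest, out = Fraction(0), Fraction(x), [Fraction(0)]
    for i in range(1, n_max + 1):
        rest *= p
        digit = int(rest)          # next digit a_i, an integer with 0 <= a_i < p
        rest -= digit
        partial += Fraction(digit, p ** i)
        out.append(partial)
    return out

x = Fraction(1, 3)                 # hypothetical target; exact arithmetic needs a rational x
xs = truncations(x, 2, 30)         # binary truncations
ys = truncations(x, 10, 30)        # decimal truncations

for n in (5, 10, 20, 30):
    print(n, float(abs(xs[n] - x)), float(abs(ys[n] - x)), float(abs(xs[n] - ys[n])))
```

Both sequences are Cauchy, both approach $x$, and their difference tends to 0, so they are equivalent in the sense just described.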

The completion of an abstract metric space $(X, d)$ mimics the above process. Its construction goes as follows.

1. Define

\[ \tilde{X} := \{(x_n)_{n \in \mathbb{N}} \text{ is a Cauchy sequence in } X\} / \sim, \]

where the equivalence relation (see footnote 1) is defined by

\[ (x_n) \sim (y_n) \text{ if and only if } d(x_n, y_n) \to 0 \text{ as } n \to \infty. \]

Thus, an element $\tilde{x} \in \tilde{X}$ is a set of Cauchy sequences in $X$, all of which are equivalent to each other.

2. Given $\tilde{x}$ and $\tilde{y}$, choose any two representatives $(x_n)$ and $(y_n)$ from $\tilde{x}$ and $\tilde{y}$ respectively, and define

\[ \tilde{d}(\tilde{x}, \tilde{y}) := \lim_{n \to \infty} d(x_n, y_n). \]

3. Given $x \in X$, define the constant Cauchy sequence $(x_n)$ with $x_n = x$ for all $n$. The equivalence class containing this Cauchy sequence is denoted by $\imath(x)$. This is a natural imbedding from $X$ to $\tilde{X}$.

One can show that in the above construction: (i) the relation $\sim$ is an equivalence relation, (ii) $\tilde{d}$ is well-defined, (iii) $\tilde{X}$ is complete, and (iv) $\imath(X)$ is dense in $\tilde{X}$.

Footnote 1. A relation $\sim$ is called an equivalence relation on a set $X$ if (i) $x \sim x$, (ii) if $x \sim y$ then $y \sim x$, (iii) if $x \sim y$ and $y \sim z$, then $x \sim z$. The equivalence class of $x$ is $\tilde{x} := \{y \in X \mid y \sim x\}$.

The function space $C[a, b]$

In applications, especially ODEs, we often encounter solutions that are at least continuous in time. This motivates us to study the function space

\[ C[a, b] := \{u : [a, b] \to \mathbb{R} \text{ is continuous}\}. \]

Given $u, v \in C[a, b]$, we define

\[ d(u, v) := \sup_{x \in [a,b]} |u(x) - v(x)|. \]

Theorem 2.1. $C[a, b]$ is complete.

Proof. Suppose $\{u_n\}$ is a Cauchy sequence in $C[a, b]$. For any $\epsilon > 0$, there exists an $N(\epsilon) > 0$ such that

\[ \sup_{x \in [a,b]} |u_n(x) - u_m(x)| < \epsilon \]

for every $n, m > N$. For each fixed $x \in [a, b]$, $\{u_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$. Thus, $u_n(x)$ converges to a limit, called $u(x)$. This convergence is indeed uniform in $x$. In fact, we can take $m \to \infty$ in the above formula to get

\[ \sup_{x \in [a,b]} |u_n(x) - u(x)| \le \epsilon \]

for every $n > N$. Next, we show that $u$ is continuous at every point $x_0 \in [a, b]$. For any $\epsilon > 0$, we have seen that there is an $N$ such that $\sup_{x \in [a,b]} |u_N(x) - u(x)| \le \epsilon$. On the other hand, $u_N$ is continuous at $x_0$. Thus, there exists a $\delta > 0$, which depends on $u_N$ and $x_0$, such that

\[ |u_N(x) - u_N(x_0)| < \epsilon \quad \text{for } |x - x_0| < \delta. \]

Thus, for $|x - x_0| < \delta$,

\[ |u(x) - u(x_0)| \le |u(x) - u_N(x)| + |u_N(x) - u_N(x_0)| + |u_N(x_0) - u(x_0)| < 3\epsilon. \]

This shows $u$ is continuous at an arbitrary point $x_0 \in [a, b]$.
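A quick numerical experiment shows why the Cauchy condition in the sup metric matters. The sequence $u_n(x) = x^n$ on $[0, 1]$ converges pointwise to a discontinuous function, so by Theorem 2.1 it cannot be a Cauchy sequence in $C[0, 1]$. The sketch below approximates $d(u_n, u_{2n})$ on a uniform grid; the grid size and the function names are choices made here for illustration, so the values are approximations from below.

```python
def sup_dist(u, v, a=0.0, b=1.0, samples=10001):
    """Approximate sup over [a, b] of |u(x) - v(x)| on a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(u(a + k * step) - v(a + k * step)) for k in range(samples))

def power(n):
    """The function x -> x^n."""
    return lambda x: x ** n

dists = {n: sup_dist(power(n), power(2 * n)) for n in (5, 50, 500)}
for n, d in dists.items():
    print(n, d)
```

Writing $t = x^n$, the difference is $t - t^2$, whose maximum over $t \in [0, 1]$ is $1/4$. So $d(u_n, u_{2n}) = 1/4$ for every $n$, and the printed values stay near $0.25$ instead of tending to 0.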

Exercise.

• Can you replace $C[a, b]$ by $C(a, b)$ in the above theorem? Here, $C(a, b)$ includes all continuous functions from $(a, b)$ to $\mathbb{R}$ with finite sup norm. With this definition, $1/x$ is not in $C(0, 1)$ but $\sin(1/x)$ is.

• Consider the subset $A$ of $C(a, b)$ consisting of those functions which have finite limits at the boundary points $a$ and $b$. What is the relation between $A$ and $C[a, b]$?

2.2 Banach spaces

2.2.1 Normed linear space – A space where we can do calculus

A metric space in general has no algebraic structure. A natural extension of the Euclidean space structure is the normed linear space, in which calculus can be introduced. A set $X$ with addition and scalar multiplication is called a linear space (or vector space).

Definition 2.9. A linear space $X$ over a field $\mathbb{R}$ (or $\mathbb{C}$) has addition and scalar multiplication operations which satisfy

(a) for all $x, y, z \in X$: $x + y = y + x$; $(x + y) + z = x + (y + z)$; there exists a zero vector 0 such that $x + 0 = x$; and for every $x \in X$, there exists a unique $(-x)$ such that $x + (-x) = 0$;

(b) for any $x, y \in X$ and any $\lambda, \mu \in \mathbb{R}$: $1x = x$, $(\lambda + \mu)x = \lambda x + \mu x$, $\lambda(x + y) = \lambda x + \lambda y$, $\lambda(\mu x) = (\lambda\mu)x$.

Definition 2.10. A norm $\|\cdot\|$ on a linear space $X$ is a mapping $X \to \mathbb{R}$ satisfying

(a) $\|x\| \ge 0$ for all $x \in X$, and $\|x\| = 0$ if and only if $x = 0$;

(b) $\|\lambda x\| = |\lambda| \|x\|$ for all $\lambda \in \mathbb{R}$ and $x \in X$;

(c) (triangle inequality) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in X$.

A normed linear space $(X, \|\cdot\|)$ is a linear space $X$ equipped with a norm $\|\cdot\|$.

Definition 2.11. A complete normed linear space is called a Banach space.

Properties

• A normed linear space is a metric space equipped with the metric $d(x, y) = \|x - y\|$.

• A metric in a linear space defines a norm if it satisfies the translation invariance property ($d(x - z, y - z) = d(x, y)$) and the homogeneity property ($d(\lambda x, 0) = |\lambda| d(x, 0)$).

• The unit ball in a normed linear space is convex (by the triangle inequality).

• In a finite dimensional normed space, all norms are equivalent. Here, two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ in a normed linear space $X$ are said to be equivalent if there exist two positive constants $C_1, C_2$ such that

\[ C_1 \|x\|_1 \le \|x\|_2 \le C_2 \|x\|_1 \quad \text{for all } x \in X. \]

Examples

1. The space $\mathbb{R}^n$ equipped with the Euclidean norm

\[ \|x\|_2 = (|x_1|^2 + \cdots + |x_n|^2)^{1/2} \]

is a Banach space.

2. The space $\mathbb{R}^n$ equipped with the $p$-norm

\[ \|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}, \quad 1 \le p < \infty, \]

is a Banach space. Furthermore, one can show that

\[ \|x\|_\infty := \max_i |x_i| \]

is a norm, and

\[ \|x\|_p \to \|x\|_\infty \quad \text{as } p \to \infty. \]

Notice that $\|x\|_p$ with $0 < p < 1$ is not a norm, but it can measure the sparsity of $x$. Indeed, define

\[ \|x\|_0 := \#\{i \mid x_i \neq 0\}, \]

which measures the sparsity of $x$. Then $\|x\|_p^p \to \|x\|_0$ as $p \to 0+$.

3. The set of matrices

\[ M_{m \times n} := \{A : \mathbb{R}^n \to \mathbb{R}^m \text{ is linear}\} \]

equipped with the Frobenius norm defined by

\[ \|A\|_F := \Big( \sum_{i,j} |A_{ij}|^2 \Big)^{1/2} \]

is a Banach space.

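4. The $\ell^p(\mathbb{N})$ ($1 \le p < \infty$) space is the set

\[ \ell^p(\mathbb{N}) := \Big\{x : \mathbb{N} \to \mathbb{R} \ \Big|\ \sum_{i=1}^{\infty} |x_i|^p < \infty \Big\} \]

equipped with the norm

\[ \|x\|_p := (|x_1|^p + |x_2|^p + \cdots)^{1/p}. \]

Similar to the finite dimensional case,

\[ \|x\|_\infty := \sup_i |x_i| \]

is a norm, and $\|x\|_p \to \|x\|_\infty$ (as $p \to \infty$) if they exist. Indeed, one can prove that $\ell^p$, $1 \le p \le \infty$, are Banach spaces.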

5. The set of continuous functions

\[ C[a, b] := \{u : [a, b] \to \mathbb{R} \text{ is continuous}\} \]

is a linear space. We define the sup norm by

\[ \|u\|_\infty := \max_{x \in [a,b]} |u(x)|. \]

Then $(C[a, b], \|\cdot\|_\infty)$ is a Banach space.

6. $(C[a, b], \|\cdot\|_p)$, $1 \le p < \infty$, is not complete. A simple example is that the sequence of continuous functions

\[ u_n(x) := \tanh(nx), \quad x \in [-1, 1], \]

tends (in every $\|\cdot\|_p$, $1 \le p < \infty$) to

\[ u(x) = \begin{cases} -1 & \text{for } x < 0 \\ 0 & \text{for } x = 0 \\ 1 & \text{for } x > 0, \end{cases} \]

which is not in $C[-1, 1]$.
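The behaviour of $u_n(x) = \tanh(nx)$ in Example 6 can be checked numerically. The following rough sketch compares $u_n$ with $u_{2n}$ in the $\|\cdot\|_1$ norm (midpoint rule) and in the sup norm (grid maximum). The quadrature sizes and helper names are choices made here, so the numbers are approximations only.

```python
import math

def l1_dist(u, v, a=-1.0, b=1.0, cells=20000):
    """Midpoint-rule approximation of the integral of |u - v| over [a, b]."""
    h = (b - a) / cells
    return h * sum(abs(u(a + (k + 0.5) * h) - v(a + (k + 0.5) * h)) for k in range(cells))

def sup_dist(u, v, a=-1.0, b=1.0, samples=20001):
    """Grid approximation of the sup over [a, b] of |u - v|."""
    h = (b - a) / (samples - 1)
    return max(abs(u(a + k * h) - v(a + k * h)) for k in range(samples))

def u(n):
    """The function x -> tanh(n x)."""
    return lambda x: math.tanh(n * x)

results = {n: (l1_dist(u(n), u(2 * n)), sup_dist(u(n), u(2 * n))) for n in (10, 100, 1000)}
for n, (d1, dsup) in results.items():
    print(n, d1, dsup)
```

The $\|\cdot\|_1$ distances shrink roughly like $1/n$, while the sup distances stay near $0.3$. The sequence is Cauchy in $\|\cdot\|_1$ but not in the sup norm, which is consistent with both Example 5 and Example 6.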

The Completion of normed linear spaces

1. The completion of $\mathbb{Q}$ under the absolute value norm $|\cdot|$ is $\mathbb{R}$.

2. Let

\[ C^1[a, b] := \{u : [a, b] \to \mathbb{R} \mid u, u' \text{ are continuous}\}. \]

Then $C^1[a, b]$ is complete under the norm

\[ |u|_{1,\infty} := \sup_x |u(x)| + \sup_x |u'(x)|. \]

But $C^1[a, b]$ is not complete under the sup norm $|u|_\infty := \sup |u(x)|$. Its completion under the sup norm is $C[a, b]$.

3. The completion of $C[a, b]$ under the norm

\[ \|u\|_1 := \int_a^b |u(x)| \, dx \]

is called the $L^1$-space, and is denoted by $L^1(a, b)$. It is the set of all Lebesgue integrable functions on $(a, b)$.

4. The completion of $(C[a, b], \|\cdot\|_p)$, $1 \le p < \infty$, is the $L^p$ space

\[ L^p(a, b) := \Big\{u : [a, b] \to \mathbb{R} \ \Big|\ \int_a^b |u(x)|^p \, dx < \infty \Big\}. \]

5. The function $1/|x|^\alpha$ is in $L^p(-1, 1)$ for $0 < \alpha p < 1$.

6. Is the function $\sin(1/x)$ in $L^p(-1, 1)$ for $1 \le p \le \infty$?

7. For which $\alpha$ is $|x|^{-\alpha} \sin(1/|x|) \in L^p(-1, 1)$?

2.2.2 Approximation and Basis

In function spaces, we want to approximate general functions by linear combinations of some simple known functions. Such a linear combination is usually an infinite, but countable, series. A set $I$ is called countable if it is either finite or there is a one-to-one correspondence between $I$ and $\mathbb{N}$. One can check that $\mathbb{Z} \times \mathbb{Z}$ is countable, and thus $\mathbb{Q}$ is also countable because a rational number $r$ can be represented by $p/q$ with $(p, q) \in \mathbb{Z} \times \mathbb{Z}$. In $\mathbb{R}$, we want to approximate a real number by a (countable) infinite series. For instance, we may represent $r \in [0, 1]$ by

\[ r = \sum_{n=1}^{\infty} a_n 2^{-n}, \]

where $a_n \in \{0, 1\}$. Each finite partial sum is an element of $\mathbb{Q}$. This motivates the following definition.

Definition 2.12. A metric space $X$ is said to be separable if there is a countable set $A \subset X$ such that $\bar{A} = X$.

C[0, 1] is separable

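1. The Bernstein polynomials are

\[ b_{\nu,n}(x) := \binom{n}{\nu} x^\nu (1 - x)^{n-\nu}, \quad \nu = 0, ..., n. \]

They are in the space

\[ P_n := \{p(x) \mid p \text{ is a polynomial and } \deg(p) \le n\}. \]

The space $P_n$ has dimension $n + 1$. Since $b_{\nu,n}$, $\nu = 0, ..., n$, are linearly independent, they form a basis of $P_n$.

2. The set of combinations of Bernstein polynomials with rational coefficients

\[ A = \Big\{ \sum_{\nu=0}^{n} a_\nu b_{\nu,n} \ \Big|\ a_\nu \in \mathbb{Q},\ n \ge 0 \Big\} \]

is countable and is dense in $C[0, 1]$. Indeed, let $f \in C[0, 1]$; then

\[ B_n(f)(x) := \sum_{\nu=0}^{n} f\Big(\frac{\nu}{n}\Big) b_{\nu,n}(x) \]

converges to $f$ uniformly on $[0, 1]$ as $n \to \infty$. The proof uses the following facts.

(a) $b_{\nu,n}(x) \ge 0$ on $[0, 1]$ and $\sum_{\nu=0}^{n} b_{\nu,n}(x) = 1$.

(b) By (a), the difference $f(x) - B_n(f)(x)$ has the following estimate:

\[ |f(x) - B_n(f)(x)| \le \sum_{\nu=0}^{n} \Big| f(x) - f\Big(\frac{\nu}{n}\Big) \Big| \, b_{\nu,n}(x). \]

(c) The Bernstein polynomial $b_{\nu,n}(x)$ concentrates at $\nu/n \sim x$ as $n \to \infty$. More precisely, for any fixed small $\delta$,

\[ \sum_{|\nu/n - x| \ge \delta} b_{\nu,n}(x) \to 0 \]

as $n \to \infty$.

(d) $f$ is uniformly continuous on $[0, 1]$. That is, for any $\epsilon > 0$, there exists a $\delta > 0$ such that $|f(x) - f(y)| < \epsilon$ whenever $|x - y| < \delta$.

I leave you to fill in the gaps (see footnote 2).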
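The uniform convergence of the Bernstein approximation can be observed numerically. Below is a minimal sketch that evaluates $B_n(f)(x) = \sum_\nu f(\nu/n)\, b_{\nu,n}(x)$ directly from the definition. The test function $f(x) = |x - 1/2|$ and the sampling grid are choices made here for illustration, and the sup error is only approximated on that grid.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate B_n(f)(x) = sum over nu of f(nu/n) * C(n, nu) * x^nu * (1 - x)^(n - nu)."""
    return sum(f(nu / n) * comb(n, nu) * x ** nu * (1 - x) ** (n - nu)
               for nu in range(n + 1))

def sup_error(f, n, samples=201):
    """Grid approximation of the sup over [0, 1] of |f - B_n(f)|."""
    pts = [k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in pts)

def f(x):
    # Continuous on [0, 1] but not differentiable at x = 1/2.
    return abs(x - 0.5)

errors = {n: sup_error(f, n) for n in (10, 40, 160)}
for n, err in errors.items():
    print(n, err)
```

For this non-smooth $f$ the error decays slowly, roughly like $1/\sqrt{n}$, which matches the probabilistic picture in which $B_n(f)(x)$ is an average of $f$ over $n$ Bernoulli trials.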

Footnote 2. The Bernstein polynomial has the following probability interpretation.

(a) Let $X$ be the random variable of one Bernoulli trial with probability $x$ of success. That is, $P(X = 1) = x$ and $P(X = 0) = 1 - x$. If we perform two independent Bernoulli trials, denote by $X_i$ the outcome of the $i$th trial. The sample space is $\Omega_2 = \{(1, 1), (1, 0), (0, 1), (0, 0)\}$. Here, $(a_1, a_2)$ denotes that $X_1 = a_1$ and $X_2 = a_2$. The probability distribution of $S_2 = X_1 + X_2$ is

\[ P(S_2 = 2) = x^2, \quad P(S_2 = 1) = 2x(1 - x), \quad P(S_2 = 0) = (1 - x)^2. \]

For $n$ independent Bernoulli trials, the number of outcomes with $\nu$ successes is $n!/(\nu!(n - \nu)!)$. Thus, the probability of $\nu$ successes is

\[ P(S_n = \nu) = \frac{n!}{\nu!(n - \nu)!} x^\nu (1 - x)^{n-\nu} = b_{\nu,n}(x). \]

(b) The expectation of a random variable $X$ is defined to be $E(X) := \sum_\nu \nu P(X = \nu)$.

(c) The weak form of the law of large numbers states: let $X$ be the random variable of the Bernoulli trial with $E(X) = x$. Let $S_n = X_1 + \cdots + X_n$, where the $X_i$ are all independent and identically distributed as $X$. Then

\[ \lim_{n \to \infty} E\Big(\frac{S_n}{n}\Big) = x. \]

(d) The key is the Chebyshev inequality:

\[ P(|S_n/n - x| > \delta) \le \frac{\sigma^2}{\delta^2 n} \to 0 \quad \text{as } n \to \infty. \]

Here, $\sigma^2 = E(|X - x|^2)$ is the variance of $X$. To see this, let $F$ be the probability distribution function of a random variable $X$ with mean zero and variance $\sigma^2$. Then

\[ P(|X| \ge \delta) = \int_{|x|/\delta \ge 1} dF(x) \le \int_{|x|/\delta \ge 1} \frac{|x|^2}{\delta^2} \, dF(x) \le \int \frac{|x|^2}{\delta^2} \, dF(x) = \frac{\sigma^2}{\delta^2}. \]

We apply this Chebyshev inequality to $S_n/n$. We may assume $E(X) = x = 0$. Using the independence of the $X_i$, we get that the variance of $S_n/n$ is $\sigma^2/n$.

Definition 2.13. Let $(X, \|\cdot\|)$ be a separable Banach space. A set $\{e_n\}_{n=1}^\infty$ is called a Schauder basis of $X$ if for every $x \in X$, there is a unique representation of $x$ in terms of $\{e_n\}$ by

\[ x = \sum_{n=1}^{\infty} a_n e_n. \]

Examples

1. In $\ell^p$, $1 \le p < \infty$, let $e_1 = (1, 0, 0, \cdots)$, $e_2 := (0, 1, 0, \cdots)$, $e_3 := (0, 0, 1, 0, \cdots)$, etc. Given a point $x \in \ell^p$, $x = (x_1, x_2, \cdots)$, let

\[ x^n := (x_1, ..., x_n, 0, 0, \cdots). \]

Then

\[ \|x^n - x\|_p \to 0. \]

In other words, $\{e_n\}_{n=1}^\infty$ is a Schauder basis in $\ell^p$.

2. In the next section, we will see that the Fourier functions $\{e_n := e^{2\pi i n x}\}_{n=-\infty}^{\infty}$ form a basis in $C(\mathbb{T}) := \{u : [0, 1] \to \mathbb{C} \text{ periodic}\}$.

3. In the finite element method, the solution is approximated by piecewise linear functions. We shall construct the corresponding basis.

Let us consider the domain $[0, 1]$. For any $n \in \mathbb{N}$, we partition $[0, 1]$ into $2^n$ subintervals evenly. The points $x_{n,k} = 2^{-n}k$, $k = 0, ..., 2^n$, are called the nodal points of the partition. Given $u \in C[0, 1]$, let $u_n$ be the continuous function with $u_n(x_{n,k}) = u(x_{n,k})$ and linear on each subinterval $(x_{n,k}, x_{n,k+1})$, $k = 0, ..., 2^n - 1$. The function $u_n$ can approximate $u$ in sup norm. Indeed, since $u$ is uniformly continuous on $[0, 1]$, for any $\epsilon > 0$ there exists a $\delta > 0$ such that $|u(x) - u(y)| < \epsilon$ whenever $|x - y| < \delta$. We choose $N$ such that $2^{-N} < \delta$. Now, for any $n \ge N$ and any $x \in [0, 1]$, there exists a $k$ such that $x_{n,k} \le x \le x_{n,k+1}$. Hence we have

\[ |u(x) - u_n(x)| \le |u(x) - u(x_{n,k})| + |u_n(x_{n,k}) - u_n(x_{n,k+1})| \le 2\epsilon. \]

Thus, $\|u_n - u\|_\infty \to 0$ as $n \to \infty$.

We shall write $u_n$ in terms of a basis. Consider the hat function

\[ \phi(x) = \begin{cases} x + 1 & \text{for } -1 \le x < 0 \\ 1 - x & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases} \]

We can perform scaling and translation and produce

\[ \phi_{n,k}(x) := \phi(2^n x - k). \]

This function is centered at $x_{n,k}$ with support $(x_{n,k-1}, x_{n,k+1})$. The functions $\phi_{n,0}$ and $\phi_{n,2^n}$ are the boundary nodal functions, whereas $\phi_{n,k}$, $k = 1, ..., 2^n - 1$, are the interior nodal functions. The piecewise linear function $u_n$ defined above can be represented in terms of $\phi_{n,k}$:

\[ u_n(x) := \sum_{k=0}^{2^n} u(x_{n,k}) \phi_{n,k}(x), \quad \text{where } x_{n,k} := 2^{-n}k. \]

Furthermore, the hat function $\phi$ has the following scaling property:

\[ \phi(x) = \frac{1}{2}\phi(2x + 1) + \phi(2x) + \frac{1}{2}\phi(2x - 1). \]

For interior nodal functions, we then have

\[ \phi_{n-1,k} = \frac{1}{2}\phi_{n,2k-1} + \phi_{n,2k} + \frac{1}{2}\phi_{n,2k+1} \quad \text{for } k = 1, ..., 2^{n-1} - 1. \]

For boundary nodal functions, we have

\[ \phi_{n-1,0} = \phi_{n,0} + \frac{1}{2}\phi_{n,1}, \quad \text{and} \quad \phi_{n-1,2^{n-1}} = \phi_{n,2^n} + \frac{1}{2}\phi_{n,2^n-1}. \]

Let us consider the space

\[ V_n := \operatorname{span}\{\phi_{n,k},\ k = 0, ..., 2^n\}. \]

The dimension of $V_n$ is $2^n + 1$. From the scaling property of $\phi$, the spaces $V_n$ are nested:

\[ V_0 \subset V_1 \subset \cdots \subset V_n \subset \cdots \]

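Let us define

\[ \psi_{n-1,k} = \phi_{n,2k-1}, \quad k = 1, ..., 2^{n-1}, \]

and the space

\[ W_{n-1} = \operatorname{span}\{\psi_{n-1,k} \mid k = 1, ..., 2^{n-1}\}. \]

Then $V_n = V_{n-1} + W_{n-1}$. Indeed, the inversion $\{\phi_{n-1,k}, \psi_{n-1,k}\} \to \{\phi_{n,k}\}$ is given by

\[
\begin{aligned}
\phi_{n,2k-1} &= \psi_{n-1,k}, & k &= 1, ..., 2^{n-1}, \\
\phi_{n,2k} &= \phi_{n-1,k} - \frac{1}{2}\left(\psi_{n-1,k} + \psi_{n-1,k+1}\right), & k &= 1, ..., 2^{n-1} - 1, \\
\phi_{n,0} &= \phi_{n-1,0} - \frac{1}{2}\psi_{n-1,1}, \\
\phi_{n,2^n} &= \phi_{n-1,2^{n-1}} - \frac{1}{2}\psi_{n-1,2^{n-1}}.
\end{aligned}
\]

The dimensions of $V_{n-1}$ and $W_{n-1}$ are $2^{n-1} + 1$ and $2^{n-1}$. Their sum is $2^n + 1$, which is the dimension of $V_n$. Since every function in $C[0, 1]$ is a uniform limit of functions in the spaces $V_n$, we then expect

\[ C[0, 1] = V_0 + W_0 + W_1 + \cdots. \]

We thus expect that

\[ \{\phi_{0,0}, \phi_{0,1}, \psi_{n,k},\ k = 1, ..., 2^n,\ n \ge 0\} \]

forms a Schauder basis in $C[0, 1]$.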
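The hat-function construction can be tested in a few lines. The sketch below checks the scaling property of $\phi$ on a grid and measures the sup error of the nodal interpolant $u_n = \sum_k u(x_{n,k}) \phi_{n,k}$. The function names, the test function $u(x) = x(1 - x)$, and the sampling grids are hypothetical choices made for this illustration.

```python
def hat(x):
    """The hat function phi: 1 - |x| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(x))

def phi(n, k, x):
    """Nodal function phi_{n,k}(x) = phi(2^n x - k)."""
    return hat(2 ** n * x - k)

def interpolate(u, n, x):
    """Piecewise linear interpolant u_n(x) = sum over k of u(x_{n,k}) phi_{n,k}(x)."""
    return sum(u(k / 2 ** n) * phi(n, k, x) for k in range(2 ** n + 1))

def refinement_gap(x):
    """|phi(x) - (phi(2x+1)/2 + phi(2x) + phi(2x-1)/2)|, zero if the scaling property holds."""
    return abs(hat(x) - (0.5 * hat(2 * x + 1) + hat(2 * x) + 0.5 * hat(2 * x - 1)))

def sup_error(u, n, samples=1001):
    """Grid approximation of the sup over [0, 1] of |u - u_n|."""
    pts = [j / (samples - 1) for j in range(samples)]
    return max(abs(u(x) - interpolate(u, n, x)) for x in pts)

def u(x):
    return x * (1.0 - x)

max_gap = max(refinement_gap(-1.5 + 0.003 * j) for j in range(1001))
err3, err5 = sup_error(u, 3), sup_error(u, 5)
print(max_gap, err3, err5)
```

The gap in the scaling relation is at rounding level, and the interpolation error drops as the partition is refined, as expected from $\|u_n - u\|_\infty \to 0$.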

2.3 Linear Operators in Banach Spaces, Basic

Definition 2.14. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be two normed linear spaces. A linear map $A : X \to Y$ is called bounded if there exists a constant $C$ such that $\|Ax\|_Y \le C\|x\|_X$ for all $x \in X$.

Lemma 2.1. A linear map $A$ is bounded if and only if it is continuous.

Proof. 1. If $A$ is bounded, then

\[ \|Ax - Ay\|_Y = \|A(x - y)\|_Y \le C\|x - y\|_X. \]

This shows $A$ is continuous.

2. If $A$ is continuous, then in particular, it is continuous at 0. This means that for any $\epsilon > 0$, there exists a $\delta > 0$ such that $\|Az\|_Y \le \epsilon$ whenever $\|z\|_X \le \delta$. Now for any nonzero $x \in X$, we rescale it by letting $z = (\delta/\|x\|_X)x$. Then $\|z\|_X = \delta$. Hence we have $\|Az\|_Y \le \epsilon$. Or equivalently,

\[ \Big\| A\Big(\frac{\delta}{\|x\|_X} x\Big) \Big\|_Y \le \epsilon, \quad \text{or} \quad \|Ax\|_Y \le \frac{\epsilon}{\delta}\|x\|_X. \]

The operator norm of a bounded linear operator is defined to be

\[ \|A\| := \sup_{x \neq 0} \frac{\|Ax\|_Y}{\|x\|_X} = \sup_{\|x\|_X = 1} \|Ax\|_Y. \]

Examples

1. In $\mathbb{R}^n$, let $l(x) := a \cdot x$, where $a$ is an $n$-vector. If we use the $\|\cdot\|_2$ norm, then the corresponding operator norm of $l$ is exactly $\|a\|_2$.

2. What is the corresponding operator norm of the operator $l(x) = a \cdot x$ in $\ell^p$ for $1 \le p \le \infty$?

3. Let $a \in [0, 1]$. The mapping $u \mapsto u(a)$ is a bounded mapping from $C[0, 1]$ to $\mathbb{R}$. Find its operator norm. Let us denote this operator by $\delta_a$. What is the operator norm of $\alpha\delta_a + \beta\delta_b$, where $a, b \in [0, 1]$ and $\alpha, \beta \in \mathbb{R}$?

4. Let $A$ be an $n \times n$ matrix mapping $\mathbb{R}^n$ to itself. Find the corresponding operator norm of $A$ when $\mathbb{R}^n$ is equipped with the $\ell^1$-norm. Do the same thing if $\mathbb{R}^n$ is equipped with $\ell^\infty$. Find the operator norm of the mapping $Ku(x) = \int_0^1 g(x, y)u(y) \, dy$ in $L^1$, $L^2$ and $C[0, 1]$.

5. The mapping

\[ Ku(x) := \int_0^x u(y) \, dy \]

is a bounded mapping from $C[0, 1]$ to itself. It is also a bounded mapping from $L^p(0, 1)$ to itself.

6. Let $g(x, y) : [0, 1] \times [0, 1] \to \mathbb{R}$ be continuous. The operator

\[ Ku(x) := \int_0^1 g(x, y)u(y) \, dy \]

is a bounded operator from $C[0, 1]$ to itself. We may also think of $K$ as a mapping from $L^1(0, 1)$ to itself. Find the corresponding operator norm. Do the same thing for $L^\infty(0, 1)$.

7. A concrete example is

\[ g(x, y) = \begin{cases} x(1 - y) & \text{for } 0 \le x \le y \le 1 \\ y(1 - x) & \text{for } 0 \le y \le x \le 1. \end{cases} \qquad (2.1) \]

8. The differentiation operator $D$ maps $e^{inx}$ to $i n e^{inx}$. Then $\{e^{inx}\}$ is a bounded sequence in $C(\mathbb{T})$, but $\{De^{inx}\}$ is not. Thus $D$ is not bounded in $C(\mathbb{T})$. The same argument works in every $L^p(\mathbb{T})$ when we treat $D$ as an operator in $L^p(\mathbb{T})$.
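The claim in Example 1, that $l(x) = a \cdot x$ has operator norm $\|a\|_2$, can be probed by random sampling. The sketch below is only a sanity check under assumptions made here: the vector `a`, the random seed, and the number of samples are arbitrary choices. By the Cauchy-Schwarz inequality the ratio $|a \cdot x| / \|x\|_2$ never exceeds $\|a\|_2$, and it equals $\|a\|_2$ at $x = a$.

```python
import math
import random

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def norm2(x):
    return math.sqrt(dot(x, x))

a = [3.0, -4.0, 12.0]   # hypothetical vector with ||a||_2 = 13
random.seed(0)          # fixed seed so that the run is reproducible

# Largest ratio |l(x)| / ||x||_2 over random Gaussian directions x.
best = 0.0
for _ in range(10000):
    x = [random.gauss(0.0, 1.0) for _ in a]
    best = max(best, abs(dot(a, x)) / norm2(x))

attained = abs(dot(a, a)) / norm2(a)   # the ratio at x = a
print(best, attained, norm2(a))
```

The sampled ratios stay below $\|a\|_2$ and the value at $x = a$ attains it, so the supremum in the definition of the operator norm is a maximum here.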

Kernel and Range

Let $X$ and $Y$ be normed linear spaces and let $A : X \to Y$ be a linear map. The kernel of $A$ is $N(A) := \{x \in X \mid Ax = 0\}$, and the range is $R(A) := \{Ax \mid x \in X\}$.

1. $A$ is 1-1 if and only if $N(A) = \{0\}$.

2. If $A$ is bounded, then $N(A)$ is closed.

3. Let the matrix $A = (a_1, \cdots, a_n)$, where $a_1, ..., a_n$ are column vectors in $\mathbb{R}^m$. Let the operator $Ax := \sum_{i=1}^{n} x_i a_i$. Then $R(A)$ is the span of $\{a_1, ..., a_n\}$.

4. Let

\[ g(x, y) = \sum_{i=1}^{n} \psi_i(y)\phi_i(x). \]

The operator

\[ Ku(x) = \int_0^1 g(x, y)u(y) \, dy = \sum_{i=1}^{n} \Big( \int_0^1 \psi_i(y)u(y) \, dy \Big) \phi_i(x) \]

is a projection of $u$ onto the space spanned by $\{\phi_1, ..., \phi_n\}$. It is a bounded operator. Furthermore, both the kernel and the range are closed in $C[0, 1]$.

5. Consider the shift operator from $\ell^\infty(\mathbb{Z})$ to itself defined by

\[ (Tx)_n = x_{n+1}. \]

The shift operator is a bounded operator. Further, $N(T) = \{0\}$ and $R(T) = \ell^\infty(\mathbb{Z})$. However, in $\ell^\infty(\mathbb{N})$, we define

\[ (Tx)_n = x_{n+1} \quad \text{for } n \ge 1. \]

In this case,

\[ N(T) = \{(x_1, 0, 0, ...) \mid x_1 \in \mathbb{R}\} \]

and $R(T) = \ell^\infty(\mathbb{N})$.

6. Consider $Ku = \int_0^x u(y) \, dy$ in $C[0, 1]$. Then $N(K) = \{0\}$, since $\int_0^x u(y) \, dy = 0$ for all $x$ implies $u \equiv 0$. But

\[ R(K) = \{u \in C^1[0, 1] \mid u(0) = 0\}, \]

which is not closed in $C[0, 1]$.

7. In the space $C[0, 1]$, consider $Ku = \int_0^x u(y) \, dy$ and $A = I + K$. Then $Au = 0$ implies $u(x) + \int_0^x u(y) \, dy = 0$. Differentiating it in $x$, we obtain $u' + u = 0$. This leads to $u(x) = Ce^{-x}$. Setting $x = 0$ in the integral equation gives $u(0) = 0$, so $C = 0$. Thus, $N(A) = \{0\}$.

Next, for any $f \in C[0, 1]$, we look for a solution $u \in C[0, 1]$ such that $Au = f$. Formally, we differentiate $Au = f$ and get

\[ u' + u = f'. \]

By using an integrating factor, we get

\[ (e^y u)' = e^y f'. \]

Integrating this equation, we get

\[ e^x u(x) - u(0) = \int_0^x e^y f'(y) \, dy = e^x f(x) - f(0) - \int_0^x e^y f(y) \, dy. \]
