
Vol. 8, No. 4, pp. 2161–2202

Analysis and Application of a Nonlocal Hessian

Jan Lellmann, Konstantinos Papafitsoros, Carola Schönlieb, and Daniel Spector

Abstract. In this work we introduce a formulation for a nonlocal Hessian that combines the ideas of higher-order and nonlocal regularization for image restoration, extending the idea of nonlocal gradients to higher-order derivatives. By intelligently choosing the weights, the model allows us to improve on the current state of the art higher-order method, total generalized variation, with respect to overall quality and preservation of jumps in the data. In the spirit of recent work by Brezis et al., our formulation also has analytic implications: for a suitable choice of weights it can be shown to converge to classical second-order regularizers, and in fact it allows a novel characterization of higher-order Sobolev and BV spaces.

Key words. nonlocal Hessian, nonlocal total variation regularization, variational methods, fast marching method, amoeba filters

AMS subject classifications. 65D18, 68U10, 94A08, 35A15, 49J40, 49Q20, 26B30, 26B35, 46E35

DOI. 10.1137/140993818

1. Introduction and context. The total variation model of image restoration due to Rudin, Osher, and Fatemi [ROF92] is now classical: the problem of being given a noisy image $g \in L^2(\Omega)$ on an open set $\Omega \subseteq \mathbb{R}^2$ and selecting a restored image via minimization of the energy

$$E(u) := \int_\Omega (u - g)^2 \, dx + \alpha \, \mathrm{TV}(u).$$

Here, $\alpha > 0$ is a regularization parameter at our disposal and $\mathrm{TV}(u) := |Du|(\Omega)$ is the total variation of the measure $Du$ (the distributional derivative of $u$, which has finite total mass when one assumes $u$ is of bounded variation [AFP00]). Among the known defects of the model is the staircasing effect, where affine portions of the image are replaced by flat regions and newly created artificial boundaries, stemming from the use of the TV term in the regularization. It is then natural to investigate the replacement of the total variation with

Received by the editors October 31, 2014; accepted for publication (in revised form) July 10, 2015; published electronically October 6, 2015. This project was supported by King Abdullah University of Science and Technology (KAUST) award KUK-I1-007-43.

http://www.siam.org/journals/siims/8-4/99381.html

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, United Kingdom (j.lellmann@damtp.cam.ac.uk, kp366@cam.ac.uk, cbs31@cam.ac.uk). The first author's research was supported by Leverhulme Early Career Fellowship ECF-2013-436. The second author's research was supported by the Cambridge Centre for Analysis and the EPSRC. The third author's research was partially supported by EPSRC grants EP/J009539/1 and EP/M00483X/1.

Technion, Israel Institute of Technology, Haifa, Israel, and Department of Applied Mathematics, National Chiao Tung University, Hsinchu 30010, Taiwan (dspector@math.nctu.edu.tw). This author’s research was supported in part by a Technion Fellowship and by Taiwan Ministry of Science and Technology research grant 103-2115-M-009-016-MY2.


another regularizer, for instance a higher-order term (see [Sch98, LLT03, LT06, HS06, CEP07, PS14] for the bounded Hessian framework, [CL97, BKP10, SST11] for infimal convolution and generalizations, and [LBL13] for anisotropic variants) or a nonlocal term (see, for example, the work of Buades, Coll, and Morel [BCM05], Kindermann, Osher, and Jones [KOJ05], and Gilboa and Osher [GO08]). In this work, we introduce and analyze a regularizer that is both higher-order and nonlocal, a nonlocal Hessian, and utilize it in a model for image restoration. Our numerical experiments demonstrate that using this regularization with a suitable choice of weights enables us to derive specialized models that compete with current state-of-the-art higher-order methods such as total generalized variation [BKP10]. Meanwhile, our analysis justifies the nomenclature nonlocal Hessian through its connection with recent work on nonlocal gradients [MS15]. In particular, we perform a rigorous localization analysis which parallels the first-order case.
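For concreteness, the ROF energy above can be evaluated directly on a discrete image. The following is a minimal sketch (not from the paper): the isotropic forward-difference discretization of TV is one common choice, and the step image is purely illustrative.

```python
import numpy as np

def tv(u):
    """Isotropic total variation with forward differences, Neumann boundary."""
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical differences
    return np.sum(np.sqrt(dx**2 + dy**2))

def rof_energy(u, g, alpha):
    """Discrete ROF energy E(u) = ||u - g||_2^2 + alpha * TV(u)."""
    return np.sum((u - g) ** 2) + alpha * tv(u)

rng = np.random.default_rng(0)
g = np.zeros((32, 32))
g[:, 16:] = 1.0                                   # clean step image
noisy = g + 0.1 * rng.standard_normal(g.shape)    # noisy observation

# the clean step has lower energy than the noisy image itself
e_clean = rof_energy(g, noisy, 0.1)
e_noisy = rof_energy(noisy, noisy, 0.1)
```

The comparison `e_clean < e_noisy` illustrates why minimizing $E$ removes noise: the quadratic fidelity term is small for the clean image while the TV term heavily penalizes the oscillations of the noise.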

Background on higher-order regularization. The use of nonsmooth regularization terms such as the total variation in image reconstruction results in a nonlinear smoothing of reconstructed images. As a consequence, one observes a greater degree of smoothing in homogeneous areas of the image domain while preserving characteristic structures such as edges. In particular, total variation regularization performs well if the reconstructed image is piecewise constant. The drawback of such a regularization procedure becomes apparent as soon as images or signals (in one dimension) are considered which not only consist of flat regions and jumps but also possess slanted regions, i.e., piecewise linear parts. The artifact introduced by total variation regularization in this case is called staircasing. One possible approach to improve total variation minimization is the introduction of higher-order derivatives in the regularizer, whose literature we now briefly review.

In [CL97] Chambolle and Lions propose a higher-order method by means of an infimal convolution of two convex regularizers. Here, a noisy image is decomposed into three parts $g = u_1 + u_2 + n$ by solving

$$\min_{(u_1,u_2)} \frac{1}{2}\int_\Omega (u_1 + u_2 - g)^2 \, dx + \alpha \, \mathrm{TV}(u_1) + \beta \, \mathrm{TV}^2(u_2), \tag{1.1}$$

where $\mathrm{TV}^2(u_2) := |D^2 u_2|(\Omega)$ is the total variation of the distributional Hessian of $u_2$. Then, $u_1$ and $u_2$ are the piecewise constant and the piecewise affine parts of $g$, respectively, and $n$ is the noise (or texture). For recent modifications of this approach in the discrete setting, see also [SS08, SST11]. Other approaches combining first- and second-order regularization originate, for instance, from Chan, Marquina, and Mulet [CMM01], who consider total variation minimization together with weighted versions of the Laplacian; the Euler elastica functional [MM98, CKS02], which combines total variation regularization with curvature penalization; and many more [LT06, LTC13, PS14, PSS13, Ber14]. Recently, Bredies, Kunisch, and Pock have proposed another interesting higher-order total variation model called total generalized variation (TGV) [BKP10]. The TGV regularizer of order $k$ is of the form

$$\mathrm{TGV}_\alpha^k(u) = \sup\left\{ \int_\Omega u \, \mathrm{div}^k \xi \, dx \;:\; \xi \in C_c^k(\Omega, \mathrm{Sym}^k(\mathbb{R}^N)), \ \|\mathrm{div}^l \xi\|_\infty \le \alpha_l, \ l = 0, \ldots, k-1 \right\}, \tag{1.2}$$

where $\mathrm{Sym}^k(\mathbb{R}^N)$ denotes the space of symmetric tensors of order $k$ with arguments in $\mathbb{R}^N$, and the $\alpha_l$ are fixed positive parameters. For the case $k = 2$, its formulation for the solution of general inverse problems was given in [BV11].

The idea of pure bounded Hessian regularization is considered by Lysaker, Lundervold, and Tai [LLT03], Scherzer [Sch98], Hinterberger and Scherzer [HS06], Lefkimmiatis, Bourquard, and Unser [LBU12], and Bergounioux and Piffet [BP10]. In these works the considered model has the general form

$$\min_u \frac{1}{2}\int_\Omega (u - g)^2 \, dx + \alpha |D^2 u|(\Omega).$$

In [CEP07], Chan, Esedoglu, and Park use the squared $L^2$ norm of the Laplacian as a regularizer, also in combination with the $H^{-1}$ norm in the data fitting term. Further, in [PS08] minimizers of functionals which are regularized by the total variation of the $(l-1)$st derivative, i.e.,

$$|D\nabla^{l-1} u|(\Omega),$$

are studied. Properties of such regularizers in terms of diffusion filters are further studied in [DWB09]. Therein, the authors consider the Euler–Lagrange equations corresponding to minimizers of functionals of the general type

$$J(u) = \int_\Omega (u - g)^2 \, dx + \alpha \int_\Omega f\Big( \sum_{|k|=p} |D^k u|^2 \Big) \, dx$$

for different nonquadratic functions $f$. There are also works on higher-order PDE methods for image regularization; see, e.g., [CS01, LLT03, BG04, BEG08, BHS09].

As confirmed by all of these works on higher-order total variation regularization, the introduction of higher-order derivatives can have a positive effect on artifacts like staircasing inherent to total variation [Rin00].

Higher-order nonlocal regularization. One possible approach to a higher-order extension of nonlocal regularization has been proposed recently in [RBP14], with optical flow being the main application. The authors start with the cascading formulation of (second-order) TGV,

$$\mathrm{TGV}(u) = \inf_{w:\Omega\to\mathbb{R}^N} \alpha_1 \int_\Omega |Du - w| + \alpha_0 \int_\Omega |Dw|,$$

which reduces the higher-order differential operators that appear in the definition of TGV to a special type of infimal convolution of two terms involving only first-order derivatives [BV11]. These can then be replaced by classical first-order nonlocal derivatives, and one obtains an energy of the form

$$\inf_{w:\Omega\to\mathbb{R}^N} \int_\Omega \int_\Omega \alpha_1(x,y)\, |u(x) - u(y) - w(x) \cdot (x-y)| \, dy \, dx + \sum_{i=1}^{2} \int_\Omega \int_\Omega \alpha_0(x,y)\, |w_i(x) - w_i(y)| \, dy \, dx.$$

This formulation takes into account the higher-order differential information via the second term in the minimization, and the weighting parameters $\alpha_0$ and $\alpha_1$ are now spatially dependent. Even though this approach can be adapted for other imaging tasks, e.g., denoising, it is not clear how to choose these weighting functions.


In this paper we define a different type of higher-order nonlocal regularizer, providing as well a rule for choosing the corresponding weighting functions for optimal results. Before we proceed we recall some basic facts about nonlocal gradients.

Background on nonlocal gradients. In the first-order setting, the analysis of nonlocal gradients and their associated energies finds its origins in the 2001 paper of Bourgain, Brezis, and Mironescu [BBM01]. In their paper, Bourgain, Brezis, and Mironescu introduce energies of the form

$$F_n u := \int_\Omega \left( \int_\Omega \frac{|u(x) - u(y)|^{pq}}{|x - y|^{pq}} \, \rho_n(x - y) \, dx \right)^{\frac{1}{q}} dy, \tag{1.3}$$

where $\Omega$ is a smooth bounded domain in $\mathbb{R}^N$ and $1 \le p < \infty$, and in the special case $q = 1$. Here, the functions $\rho_n$ are radial mollifiers that are assumed to satisfy the following three properties for all $n \in \mathbb{N}$:

$$\rho_n(x) \ge 0, \tag{1.4}$$
$$\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1, \tag{1.5}$$
$$\lim_{n\to\infty} \int_{|x|>\gamma} \rho_n(x) \, dx = 0 \quad \text{for all } \gamma > 0. \tag{1.6}$$

An example of such a family of mollifiers are the standard Gaussian kernels that converge to a Dirac δ as n tends to infinity. Let us here remark that a perhaps more appropriate terminology for these functionals in image processing is semilocal, since asymptotically there is no possibility of nonlocality, in contrast to the genuine nonlocality allowed in image processing. The work [BBM01] connects the finiteness of the limit as n → ∞ of the functional (1.3) with the inclusion of a function u ∈ Lp(Ω) in the Sobolev space W1,p(Ω) if p > 1 or BV(Ω) if p = 1. As in the beginning of the introduction, the space BV(Ω) refers to the space of functions of bounded variation, and it is no coincidence that the two papers [BBM01,ROF92] utilize this energy space. Indeed, Gilboa and Osher [GO08] in 2008 independently introduce an energy similar to (1.3), terming it a nonlocal total variation, while the connection of the two and the introduction of the parameter q is due to Leoni and Spector [LS11]. In particular, they show in [LS14] that for p = 1 the functionals (1.3) Γ-converge to a constant times the total variation. This result extends previous work by Ponce [Pon04b] in the case q = 1 (see also the work of Aubert and Kornprobst [AK09] for an application of these results to image processing).
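As a quick numerical sanity check (not part of the paper), one can verify that such narrowing Gaussian kernels satisfy (1.4)–(1.6) in dimension $N = 1$: each kernel is nonnegative, keeps unit mass, and its mass outside any fixed ball vanishes as $n$ grows. The Gaussian tail is evaluated in closed form via the complementary error function.

```python
import math
import numpy as np

def rho(n, x):
    """Scaled Gaussian mollifier rho_n(x) = n * phi(n * x), phi standard normal."""
    return n * np.exp(-(n * x) ** 2 / 2) / math.sqrt(2 * math.pi)

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

# (1.5): unit mass, checked by a Riemann sum for n = 1, 2, 4, 8
unit_masses = [float((rho(n, x) * dx).sum()) for n in (1, 2, 4, 8)]

# (1.6): mass outside |x| > gamma = 0.5, in closed form: erfc(n * gamma / sqrt(2))
tails = [math.erfc(n * 0.5 / math.sqrt(2)) for n in (1, 2, 4, 8)]
```

The masses stay at 1 while the tail masses decrease monotonically toward 0, exactly the concentration-to-a-Dirac-mass behavior described above.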

Gilboa and Osher [GO08] in fact introduced two forms of nonlocal total variations, and for our purposes here it will be useful to consider the second. This alternative involves introducing a nonlocal gradient operator, defined by

$$\mathcal{G}_n u(x) := N \int_\Omega \frac{u(x) - u(y)}{|x - y|} \, \frac{x - y}{|x - y|} \, \rho_n(x - y) \, dy, \quad x \in \Omega, \tag{1.7}$$

for $u \in C_c^1(\Omega)$. Then, one defines the nonlocal total variation as the $L^1$ norm of (1.7). The localization analysis of the nonlocal gradient (1.7) has been performed by Mengesha and Spector in [MS15], where a more general (and technical) distributional definition is utilized. Their first observation is that the definition of the nonlocal gradient via the Lebesgue integral (1.7) extends to spaces of weakly differentiable functions. In this regime they discuss the localization of (1.7). They prove that the nonlocal gradient converges to its local analogue $\nabla u$ in a topology that corresponds to the regularity of the underlying function $u$. As a result, they obtain yet another characterization of the spaces $W^{1,p}(\Omega)$ and $\mathrm{BV}(\Omega)$. Of notable interest for image processing purposes is their result on the Γ-convergence of the corresponding nonlocal total variation energies defined via nonlocal gradients of the form (1.7) to the local total variation.
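The localization just described can be observed numerically. For an affine $u$ with gradient $a$, the substitution $z = x - y$ turns (1.7) into $N \, \mathbb{E}[(a \cdot \nu)\nu] = a$ over radial unit-mass weights, so the nonlocal gradient reproduces $\nabla u$ exactly. The Monte Carlo sketch below (with a Gaussian $\rho_n$ and an illustrative test function, not from the paper) checks this in $N = 2$.

```python
import numpy as np

N = 2
a = np.array([1.5, -0.75])           # gradient of the affine test function

def u(P):
    """Affine test function u(p) = a . p + 2, evaluated on a batch of points."""
    return P @ a + 2.0

def nonlocal_gradient(x, eps=0.25, samples=1_000_000, seed=5):
    """Monte Carlo evaluation of (1.7): with z = x - y the integral becomes
    N * E[(u(x) - u(x - z)) z / |z|^2] over z ~ rho_n (here Gaussian, width eps)."""
    rng = np.random.default_rng(seed)
    Z = eps * rng.standard_normal((samples, N))
    r2 = np.sum(Z ** 2, axis=1)
    diff = u(x) - u(x - Z)           # u(x) - u(y) with y = x - z
    return N * np.mean((diff / r2)[:, None] * Z, axis=0)

G = nonlocal_gradient(np.array([[0.3, -1.0]]))   # should be close to a
```

Up to Monte Carlo error, `G` recovers the gradient `a` at every point, independently of the width `eps`, matching the exactness of the identity for affine functions.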

One way to extend the results of Mengesha and Spector to the higher-order case is to simply study the functional that results after substituting $u$ with $\nabla u$ in (1.7). Then a nonlocal Hessian could be defined via

$$\mathcal{G}_n(\nabla u)(x) = N \int_\Omega \frac{\nabla u(x) - \nabla u(y)}{|x - y|} \otimes \frac{x - y}{|x - y|} \, \rho_n(x - y) \, dy, \tag{1.8}$$

where $\otimes$ denotes the standard tensor product of vectors. While one can obtain some straightforward characterizations of $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$ in this way, we find it advantageous to utilize a nonlocal Hessian that is derivative-free and therefore pursue an alternative approach.

A nonlocal Hessian tuned for imaging tasks. We define an implicit nonlocal gradient $\mathcal{G}_u(x) \in \mathbb{R}^N$ and Hessian $\mathcal{H}_u(x) \in \mathrm{Sym}(\mathbb{R}^{N\times N})$ that best explain $u$ around $x$ in terms of a quadratic model:

$$(\mathcal{G}_u(x), \mathcal{H}_u(x)) = \operatorname*{argmin}_{G_u \in \mathbb{R}^N, \, H_u \in \mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2} \int_{\Omega - \{x\}} \left( u(x+z) - u(x) - G_u \cdot z - \frac{1}{2} z^\top H_u z \right)^2 \sigma_x(z) \, dz, \tag{1.9}$$

where $\Omega - \{x\} = \{y - x : y \in \Omega\}$ and $\sigma_x$ is an appropriate weight function for each $x \in \Omega$.

Such a definition has the advantage of the freedom to choose the weights σx as one sees fit.

Of primary interest to our work are two questions: How does the nonlocal Hessian perform in comparison to the known state of the art methods? And in what way is it connected to the classical Hessian?

To answer the first question, the model depends on the choice of weights, and of practical relevance is the question of how to choose them for a particular purpose. The first point to mention in this regard is that, as the objectives of the minimization problems (1.9) are quadratic, their solutions can be characterized by linear optimality conditions. Thus functionals based on the implicit nonlocal derivatives can be easily included in usual convex solvers by adding these conditions. Moreover, the weights $\sigma_x(z)$ between any pair of points $x$ and $y = x + z$ can be chosen arbitrarily, without any restrictions on symmetry. In particular, in this work we develop a method of choosing weights to construct a regularizer that favors piecewise affine functions while allowing for jumps in the data. Our motivation stems from the recent discussion of "amoeba" filters in [LDM07, WBV11, Wel12], which combine standard filters such as median filters with nonparametric structuring elements that are based on the data; that is, in long thin objects they would extend along the structure and thus prevent smoothing perpendicular to the structure. In amoeba filtering, the shape of a structuring element at a point is defined as a unit circle with respect to the geodesic distance on a manifold defined by the image itself. In a similar manner, we utilize the geodesic distance to set the weights $\sigma_x$. This allows us to get a very close approximation to true piecewise affine regularization, in many cases improving on the results obtained using TGV; see Figure 1 for a proof of concept. We present several experiments in section 4.3 that show the performance of this choice against the state of the art.

Figure 1. Illustration of the capability of the proposed nonlocal Hessian regularization to obtain true piecewise affine reconstructions in a denoising example (panels: input, TGV, nonlocal Hessian).
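In discrete form, the quadratic objective of (1.9) at a fixed point $x$ is a weighted linear least-squares problem in the entries of $G_u$ and $H_u$, which is how the linear optimality conditions arise. The sketch below illustrates this at a single point; the Gaussian weights and the test function are illustrative choices, not the geodesic weights developed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def u(p):
    # quadratic test function; gradient (2, -1) and Hessian [[3, 1.25], [1.25, -2]] at 0
    return (1.0 + 2.0 * p[..., 0] - p[..., 1]
            + 1.5 * p[..., 0] ** 2 + 1.25 * p[..., 0] * p[..., 1] - p[..., 1] ** 2)

x0 = np.zeros(2)
Z = rng.uniform(-0.5, 0.5, size=(4000, 2))           # sampled offsets z
w = np.exp(-np.sum(Z**2, axis=1) / (2 * 0.2**2))     # illustrative weights sigma_x(z)

# model u(x+z) - u(x) ~ G.z + 0.5 z^T H z with H symmetric:
# unknowns (G1, G2, H11, H12, H22), features (z1, z2, z1^2/2, z1*z2, z2^2/2)
A = np.column_stack([Z[:, 0], Z[:, 1],
                     0.5 * Z[:, 0] ** 2, Z[:, 0] * Z[:, 1], 0.5 * Z[:, 1] ** 2])
b = u(x0 + Z) - u(x0)
sqw = np.sqrt(w)                                      # weighted least squares
coef, *_ = np.linalg.lstsq(A * sqw[:, None], b * sqw, rcond=None)
G = coef[:2]                                          # implicit nonlocal gradient
H = np.array([[coef[2], coef[3]], [coef[3], coef[4]]])  # implicit nonlocal Hessian
```

Because the test function is itself quadratic, the fit recovers the gradient and Hessian exactly; for general images, the same linear system is what gets appended to a convex solver as the optimality conditions of (1.9).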

As to the second question, in the general form of (1.9) the problem is considerably harder to treat analytically, and so we will restrict ourselves to the special case of radial weights. In particular, assuming some mild regularity of $u$ and considering the problem (1.9) with weights $\rho_n(z)/|z|^4$,

$$(\mathcal{G}_u(x), \mathcal{H}_u(x)) = \operatorname*{argmin}_{G_u \in \mathbb{R}^N, \, H_u \in \mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2} \int_{\mathbb{R}^N} \left( u(x+z) - u(x) - G_u \cdot z - \frac{1}{2} z^\top H_u z \right)^2 \frac{\rho_n(z)}{|z|^4} \, dz, \tag{1.10}$$

we will show in Theorem 4.1 that $\mathcal{H}_u$ agrees with the following natural explicit definition of the nonlocal Hessian.

Definition 1.1. Suppose $u \in C_c^2(\mathbb{R}^N)$. Then we define the explicit nonlocal Hessian as the Lebesgue integral

$$H_n u(x) := \frac{N(N+2)}{2} \int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2} \, \frac{z \otimes z - \frac{|z|^2}{N+2} I_N}{|z|^2} \, \rho_n(z) \, dz, \quad x \in \mathbb{R}^N, \tag{1.11}$$

where $I_N$ is the $N \times N$ identity matrix and $\rho_n$ is a sequence satisfying (1.4)–(1.6).

We note here that the presence of the constant $N(N+2)/2$ as well as the term $z \otimes z - \frac{|z|^2}{N+2} I_N$ ensures that (1.11) has the right localization properties; see section 3 for more details. The assertion of Theorem 4.1 is that with the preceding choice of weights one has the equivalence

$$\mathcal{H}_u(x) \equiv \frac{N(N+2)}{2} \int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2} \, \frac{z \otimes z - \frac{|z|^2}{N+2} I_N}{|z|^2} \, \rho_n(z) \, dz.$$
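Definition 1.1 can be checked numerically: for a quadratic $u$, the second differences satisfy $u(x+z) - 2u(x) + u(x-z) = z^\top A z$, and the moment identities behind (1.11) make $H_n u$ reproduce the classical Hessian $A$ for any radial unit-mass $\rho_n$, in line with the localization results below. The Monte Carlo sketch here (Gaussian $\rho_n$, illustrative test function) is not the paper's code.

```python
import numpy as np

N = 2
A = np.array([[2.0, 0.5], [0.5, -1.0]])   # Hessian of the quadratic test function

def u(P):
    """Quadratic test function on a batch P of shape (m, N); D^2 u = A everywhere."""
    return 0.5 * np.einsum('mi,ij,mj->m', P, A, P) + 3.0 * P[:, 0]

def nonlocal_hessian(x, eps=0.3, samples=1_000_000, seed=3):
    """Monte Carlo evaluation of (1.11), with rho_n a Gaussian of width eps."""
    rng = np.random.default_rng(seed)
    Z = eps * rng.standard_normal((samples, N))          # z ~ rho_n
    r2 = np.sum(Z ** 2, axis=1)
    d2 = u(x + Z) - 2.0 * u(x) + u(x - Z)                # second differences
    # kernel (z (x) z - |z|^2/(N+2) I) / |z|^2, per sample
    K = Z[:, :, None] * Z[:, None, :] / r2[:, None, None] - np.eye(N) / (N + 2)
    return 0.5 * N * (N + 2) * np.mean((d2 / r2)[:, None, None] * K, axis=0)

Hn = nonlocal_hessian(np.array([[0.7, -0.2]]))           # should be close to A
```

The estimate is exactly symmetric by construction and matches $A$ up to Monte Carlo error, for any width `eps`; for nonquadratic $u$, agreement with $\nabla^2 u$ instead emerges in the limit of concentrating weights (Theorem 1.4).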


Results concerning the explicit nonlocal Hessian. From the standpoint of analysis, the explicit version of the nonlocal Hessian (1.11) is the more natural one, and for it we are able to prove a number of results analogous to the first-order case studied by Mengesha and Spector [MS15]. Let us first extend the definition to functions which are not necessarily smooth and compactly supported, as is typical for operators acting on spaces of weakly differentiable functions.

Definition 1.2. Suppose $u \in L^p(\mathbb{R}^N)$. Then we define the distributional nonlocal Hessian componentwise as

$$\langle \mathcal{H}^{ij}_n u, \varphi \rangle := \int_{\mathbb{R}^N} u \, H^{ij}_n \varphi \, dx \tag{1.12}$$

for $\varphi \in C_c^\infty(\mathbb{R}^N)$, where $H^{ij}_n \varphi$ denotes the $(i,j)$th element of the nonlocal Hessian matrix (1.11).

A natural question is then whether these two notions agree. The following theorem shows that this is the case, provided the Lebesgue integral exists.

Theorem 1.3 (nonlocal integration by parts). Suppose that $u \in L^p(\mathbb{R}^N)$ for some $1 \le p < +\infty$ and that $\frac{|u(x+z) - 2u(x) + u(x-z)|^q}{|z|^{2q}} \rho_n(z) \in L^1(\mathbb{R}^N \times \mathbb{R}^N)$ for some $1 \le q \le +\infty$. Then the distribution $\mathcal{H}_n u$ can be represented by the function $H_n u$; i.e., for any $\varphi \in C_c^2(\mathbb{R}^N)$ and $i, j = 1, \ldots, N$,

$$\langle \mathcal{H}^{ij}_n u, \varphi \rangle = \int_{\mathbb{R}^N} H^{ij}_n u(x) \, \varphi(x) \, dx, \tag{1.13}$$

and also $H_n u \in L^1(\mathbb{R}^N, \mathbb{R}^{N \times N})$.

We will see in section 3, in Lemmas 3.1 and 3.4, that the Lebesgue integral even makes sense for $u \in W^{2,p}(\mathbb{R}^N)$ or $\mathrm{BV}^2(\mathbb{R}^N)$, and therefore the distributional definition $\mathcal{H}_n u$ coincides with the Lebesgue integral for these functions.

The main analysis we undertake in this paper then comprises the following results, proving localization in various topologies together with characterizations of higher-order spaces of weakly differentiable functions. Our first result is the following theorem concerning localization in the smooth case.

Theorem 1.4. Suppose that $u \in C_c^2(\mathbb{R}^N)$. Then for any $1 \le p \le +\infty$,

$$H_n u \to \nabla^2 u \quad \text{in } L^p(\mathbb{R}^N, \mathbb{R}^{N \times N}) \text{ as } n \to \infty.$$

When less smoothness is assumed on u, we have analogous convergence theorems, where the topology of convergence depends on the smoothness of u. When u ∈ W2,p(RN), we have the following.

Theorem 1.5. Let $1 \le p < \infty$. Then for every $u \in W^{2,p}(\mathbb{R}^N)$ we have that

$$H_n u \to \nabla^2 u \quad \text{in } L^p(\mathbb{R}^N, \mathbb{R}^{N \times N}) \text{ as } n \to \infty.$$

In the setting of BV2(RN) (see section 2 for a definition), we have the following theorem on the localization of the nonlocal Hessian.

Theorem 1.6. Let $u \in \mathrm{BV}^2(\mathbb{R}^N)$, and let $\mu_n := H_n u \, \mathcal{L}^N$ be the corresponding sequence of $\mathbb{R}^{N \times N}$-valued measures. Then

$$\mu_n \to D^2 u \quad \text{weakly}^* \text{ in the space of Radon measures;}$$


i.e., for every $\varphi \in C_0(\mathbb{R}^N, \mathbb{R}^{N \times N})$,

$$\lim_{n\to\infty} \int_{\mathbb{R}^N} H_n u(x) \cdot \varphi(x) \, dx = \int_{\mathbb{R}^N} \varphi(x) \cdot dD^2 u. \tag{1.14}$$

We have seen that the nonlocal Hessian is well defined as a Lebesgue integral and localizes for spaces of weakly differentiable functions. In fact, it is sufficient to assume that $u \in L^p(\mathbb{R}^N)$ is a function such that the distributions $\mathcal{H}_n u$ are in $L^p(\mathbb{R}^N, \mathbb{R}^{N \times N})$ with a uniform bound on their $L^p$ norms in order to deduce that $u \in W^{2,p}(\mathbb{R}^N)$ if $1 < p < +\infty$ or $u \in \mathrm{BV}^2(\mathbb{R}^N)$ if $p = 1$. Precisely, we have the following theorems characterizing the second-order Sobolev and BV spaces.

Theorem 1.7. Let $u \in L^p(\mathbb{R}^N)$ for some $1 < p < \infty$. Then

$$u \in W^{2,p}(\mathbb{R}^N) \iff \liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|^p \, dx < \infty. \tag{1.15}$$

Now let $u \in L^1(\mathbb{R}^N)$. Then

$$u \in \mathrm{BV}^2(\mathbb{R}^N) \iff \liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)| \, dx < \infty. \tag{1.16}$$

Note that when we write $\int_{\mathbb{R}^N} |H_n u(x)|^p \, dx$ we mean that the distribution $\mathcal{H}_n u$ is representable by an $L^p$ function.

Finally, let us mention an important localization result from the perspective of variational image processing, the following theorem asserting the Γ-convergence [DM93, Bra02] of the nonlocal Hessian energies to the energy of the Hessian.

Theorem 1.8. Let $u \in L^1(\mathbb{R}^N)$. Then

$$\Gamma_{L^1(\mathbb{R}^N)}\text{-}\lim_{n\to\infty} \int_{\mathbb{R}^N} |H_n u| \, dx = |D^2 u|(\mathbb{R}^N),$$

where the Γ-limit is taken with respect to the strong convergence $u_n \to u$ in $L^1(\mathbb{R}^N)$.

The relevance of this theorem in the context of variational problems comes from the fact that Γ-convergence of the objective functions of a sequence of minimization problems, combined with an equicoercivity assumption, implies convergence of the minimizers in a suitable topology [Bra02, Chap. 1.5]. Assuming equicoercivity, Theorem 1.8 then guarantees that, under a suitable choice of weights, the solutions of a class of nonlocal Hessian-based problems converge to the solution of the local Hessian-regularized problem, and thus our notion of "nonlocal Hessian" is justified. Note that because Theorem 4.1 connects the implicit and explicit definitions of the nonlocal Hessian, these results equivalently say that for radial weights that concentrate to a Dirac mass our nonlocal Hessian energies concentrate to the bounded Hessian framework.

Organization of the paper. The paper is organized as follows: In section 2 we recall some preliminary notions and fix our notation. Section 3 deals with the analysis of the nonlocal Hessian functional (1.11). After a justification of the introduction of its distributional form, we proceed in section 3.1 with the localization of (1.11) to the classical Hessian for smooth functions $u$. The localization of (1.11) to its classical analogue for $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$ functions is shown in sections 3.2 and 3.3, respectively. In section 3.4 we provide the nonlocal characterizations of the spaces $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$ in the spirit of [BBM01]. The Γ-convergence result, Theorem 1.8, is proved in section 3.5. The introduction of the implicit formulation of the nonlocal Hessian (1.9), along with its connection to the explicit one, is presented in section 4.1. In section 4.2 we describe how we choose the weights $\sigma_x$ in (1.9) in order to achieve jump preservation in the restored images. Finally, in section 4.3 we present our numerical results, comparing our method with TGV.

2. Preliminaries and notation. For the reader's convenience we recall here some important notions that we are going to use in the following sections, and we also fix some notation. As far as our notation is concerned, whenever a function space has two arguments, the first always denotes the domain of the function, while the second denotes its range. Whenever the range is omitted, it is assumed that the functions are real valued. When a function space appears in the subscript of a norm, only the domain is specified for the sake of better readability.

We use $dx$, $dy$, $dz$ for the various integrations with respect to the Lebesgue measure on $\mathbb{R}^N$, while in section 3 we will have occasion to use the more succinct notation $(\mathcal{L}^N)^2$ to denote integration with respect to the Lebesgue measure on the product space $\mathbb{R}^N \times \mathbb{R}^N$.

The reader should not confuse the different forms of the letter "H": $\mathcal{H}$ denotes the one-dimensional Hausdorff measure ($\mathcal{H}^N$ the $N$-dimensional one), $H_n$ denotes the nonlocal Hessian when this is a function, and, as we have already seen, $\mathcal{H}_n$ denotes the distributional form of the nonlocal Hessian.

It is also very convenient to introduce the following notation:

$$d^2 u(x, y) := u(y) - 2u(x) + u(x + (x - y)),$$

which can be interpreted as a discrete second-order differential operator in $x$ in the direction $x - y$.

We denote by | · | the Euclidean norm (vectors) and Frobenius norm (matrices).

As usual, we denote by $\mathrm{BV}(\Omega)$ the space of functions of bounded variation defined on an open $\Omega \subseteq \mathbb{R}^N$. This space consists of all real valued functions $u \in L^1(\Omega)$ whose distributional derivative $Du$ can be represented by a finite Radon measure. The total variation $\mathrm{TV}(u)$ of a function $u \in \mathrm{BV}(\Omega)$ is defined to be the total variation of the measure $Du$, i.e., $\mathrm{TV}(u) := |Du|(\Omega)$. The definition is similar for vector valued functions. We refer the reader to [AFP00] for a full account of the theory of BV functions.

We denote by $\mathrm{BV}^2(\Omega)$ the space of functions of bounded Hessian. These are all the functions belonging to the Sobolev space $W^{1,1}(\Omega)$ such that $\nabla u$ is an $\mathbb{R}^N$-valued BV function, i.e., $\nabla u \in \mathrm{BV}(\Omega, \mathbb{R}^N)$, and we set $D^2 u := D(\nabla u)$. We refer the reader to [Dem85, BP10, PS14] for more information about this space. Let us, however, state a theorem that will be useful for our purposes. It is the analogue of the strict approximation by smooth functions in the classical BV case; see [AFP00].

Theorem 2.1 (BV² strict approximation by smooth functions [Dem85]). Let $\Omega \subseteq \mathbb{R}^N$ be open, and let $u \in \mathrm{BV}^2(\Omega)$. Then there exists a sequence $(u_n)_{n\in\mathbb{N}} \in W^{2,1}(\Omega) \cap C^\infty(\Omega)$ that converges to $u$ strictly in $\mathrm{BV}^2(\Omega)$; that is,

$$u_n \to u \ \text{in } L^1(\Omega) \quad \text{and} \quad |D^2 u_n|(\Omega) \to |D^2 u|(\Omega) \ \text{as } n \to \infty.$$


We recall also the two basic notions of convergence for finite Radon measures; $\mathcal{M}(\Omega, \mathbb{R})$ denotes the space of real valued finite Radon measures on $\Omega$. If $(\mu_n)_{n\in\mathbb{N}}$ and $\mu$ are real valued finite Radon measures defined on an open $\Omega \subseteq \mathbb{R}^N$, we say that the sequence $\mu_n$ converges weakly$^*$ to $\mu$ if for all $\varphi \in C_0(\Omega)$ we have $\int_\Omega \varphi \, d\mu_n \to \int_\Omega \varphi \, d\mu$ as $n$ goes to infinity. Here $\varphi \in C_0(\Omega)$ means that $\varphi$ is continuous on $\Omega$ and that for every $\epsilon > 0$ there exists a compact set $K \subset \Omega$ such that $\sup_{x \in \Omega \setminus K} |\varphi(x)| \le \epsilon$. Note that by the Riesz representation theorem the dual space $(C_0(\Omega, \mathbb{R}), \|\cdot\|_\infty)$ can be identified with $\mathcal{M}(\Omega, \mathbb{R})$. We say that the convergence is strict if in addition we also have that $|\mu_n|(\Omega) \to |\mu|(\Omega)$, i.e., the total variations of $\mu_n$ converge to the total variation of $\mu$. The definition is similar for vector and matrix valued measures, with all operations regarded componentwise.

We now remind the reader of some basic facts concerning Γ-convergence. Let $(X, d)$ be a metric space, and let $F, F_n : X \to \mathbb{R} \cup \{+\infty\}$ for all $n \in \mathbb{N}$. We say that the sequence of functionals $F_n$ Γ-converges to $F$ at $x \in X$ in the topology of $X$, and we write $\Gamma_X\text{-}\lim_{n\to\infty} F_n(x) = F(x)$, if the following two conditions hold:

1. For every sequence $(x_n)_{n\in\mathbb{N}}$ converging to $x$ in $(X, d)$ we have

$$F(x) \le \liminf_{n\to\infty} F_n(x_n).$$

2. There exists a sequence $(x_n)_{n\in\mathbb{N}}$ converging to $x$ in $(X, d)$ such that

$$F(x) \ge \limsup_{n\to\infty} F_n(x_n).$$

It can be proved that $\Gamma_X\text{-}\lim_{n\to\infty} F_n(x) = F(x)$ if the Γ-lower and Γ-upper limits of $F_n$ at $x$, denoted by $\Gamma_X\text{-}\liminf_{n\to\infty} F_n(x)$ and $\Gamma_X\text{-}\limsup_{n\to\infty} F_n(x)$, respectively, are both equal to $F(x)$, where

$$\Gamma_X\text{-}\liminf_{n\to\infty} F_n(x) = \min\left\{ \liminf_{n\to\infty} F_n(x_n) : x_n \to x \text{ in } (X, d) \right\},$$
$$\Gamma_X\text{-}\limsup_{n\to\infty} F_n(x) = \min\left\{ \limsup_{n\to\infty} F_n(x_n) : x_n \to x \text{ in } (X, d) \right\}.$$

Finally, if $F : X \to \mathbb{R} \cup \{+\infty\}$, we denote by $\mathrm{sc}^-_X F$ the lower semicontinuous envelope of $F$, i.e., the greatest lower semicontinuous function majorized by $F$. We refer the reader to [DM93, Bra02] for further details regarding Γ-convergence and lower semicontinuous envelopes.

3. Analysis of the nonlocal Hessian. The precise form we have chosen for the nonlocal Hessian can be derived from the model case of nonlocal gradients, the fractional gradient, which has been developed in [SS14]. Here we prove several results analogous to the first-order case, as in [MS15], for the generalizations involving generic radial weights that satisfy (1.4)–(1.6). Of primary importance is to first establish that the distributional nonlocal Hessian defined by (1.12) is, in fact, a distribution. Here we observe that if $u \in L^1(\mathbb{R}^N)$, then

$$|\langle \mathcal{H}_n u, \varphi \rangle| \le C \, \|u\|_{L^1(\mathbb{R}^N)} \|\nabla^2 \varphi\|_{L^\infty(\mathbb{R}^N)},$$

so that $\mathcal{H}_n u$ is a distribution. Also observe that if $u \in L^p(\mathbb{R}^N)$ for some $1 < p < \infty$, then from the estimate (3.15) below, together with the fact that $\varphi$ is of compact support, we have

$$|\langle \mathcal{H}_n u, \varphi \rangle| \le C \, \|u\|_{L^p(\mathbb{R}^N)} \|\nabla^2 \varphi\|_{L^q(\mathbb{R}^N)} \le C \, \|u\|_{L^p(\mathbb{R}^N)} \|\nabla^2 \varphi\|_{L^\infty(\mathbb{R}^N)},$$


where $1/p + 1/q = 1$, and thus $\mathcal{H}_n u$ is indeed again a distribution. One observes that the definition is in analogy with the theory of Sobolev spaces, where weak derivatives are defined in terms of the integration by parts formula. Because the Hessian is composed of two derivatives, there is no change of sign in the definition, preserving some symmetry that will be useful for us in what follows.

The second important item to address is the agreement of the distributional nonlocal Hessian with the nonlocal Hessian. The necessary and sufficient condition is the existence of the latter, which is the assertion of Theorem 1.3. We now substantiate this assertion.

Proof of Theorem 1.3. Let $1 \le p < +\infty$, and suppose that $u \in L^p(\mathbb{R}^N)$ and $\frac{|u(x+z) - 2u(x) + u(x-z)|^q}{|z|^{2q}} \rho_n(z) \in L^1(\mathbb{R}^N \times \mathbb{R}^N)$ for some $1 \le q \le +\infty$. Let $\varphi \in C_c^2(\mathbb{R}^N)$, and fix $i, j \in \{1, \ldots, N\}$. Then it is a consequence of Fubini's theorem and Lebesgue's dominated convergence theorem that

$$\int_{\mathbb{R}^N} H^{ij}_n u(x) \varphi(x) \, dx = \frac{N(N+2)}{2} \lim_{\epsilon \to 0} \int_{\mathbb{R}^N} \int_{\mathbb{R}^N \setminus B(x,\epsilon)} \frac{d^2 u(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \varphi(x) \, dy \, dx$$
$$= \frac{N(N+2)}{2} \lim_{\epsilon \to 0} \int_{d^N_\epsilon} \frac{d^2 u(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \varphi(x) \, d(\mathcal{L}^N)^2(x,y),$$

where $d^N_\epsilon := \mathbb{R}^N \times \mathbb{R}^N \setminus \{|x - y| < \epsilon\}$. Similarly we have

$$\int_{\mathbb{R}^N} u(x) H^{ij}_n \varphi(x) \, dx = \frac{N(N+2)}{2} \lim_{\epsilon \to 0} \int_{d^N_\epsilon} u(x) \frac{d^2 \varphi(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y),$$

where, for notational convenience, we used the standard convention

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Thus, it suffices to show that for every $i, j$ and $\epsilon > 0$ we have

$$\int_{d^N_\epsilon} \frac{d^2 u(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \varphi(x) \, d(\mathcal{L}^N)^2(x,y) \tag{3.1}$$
$$= \int_{d^N_\epsilon} u(x) \frac{d^2 \varphi(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y).$$


In order to show (3.1), it suffices to prove that

$$\int_{d^N_\epsilon} \frac{u(y) \varphi(x)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y) \tag{3.2}$$
$$= \int_{d^N_\epsilon} \frac{u(x) \varphi(y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y)$$

and

$$\int_{d^N_\epsilon} \frac{u(x + (x-y)) \varphi(x)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y) \tag{3.3}$$
$$= \int_{d^N_\epsilon} \frac{u(x) \varphi(x + (x-y))}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y).$$

Equation (3.2) can be easily shown by interchanging $x$ and $y$ and using the symmetry of the domain. Finally, (3.3) can be proved by employing the substitution $u = 2x - y$, $v = 3x - 2y$, noting that $x - y = v - u$ and that the determinant of the Jacobian of this substitution is $-1$.

Having established that the notions of distributional nonlocal Hessian and nonlocal Hessian agree whenever the latter exists, it is a natural question to ask when this is the case. It is a simple calculation to verify that the Lebesgue integral (1.11) exists whenever $u \in C_c^2(\mathbb{R}^N)$. However, this is also the case for functions in the spaces $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$; see Lemmas 3.1 and 3.4.

3.1. Localization: smooth case. We are now ready to prove the localization of $H_n u$ to $\nabla^2 u$ for smooth functions.

Proof of Theorem 1.4.

Case $1 \le p < +\infty$. Let us assume that we have shown the case $p = +\infty$. Then we must show that the uniform convergence $H_n v \to \nabla^2 v$ for $v \in C_c^2(\mathbb{R}^N)$ implies convergence in $L^p(\mathbb{R}^N, \mathbb{R}^{N \times N})$ for any $1 \le p < +\infty$. We claim that this will follow from the following uniform estimate on the tails of the nonlocal Hessian. Suppose $\operatorname{supp} v \subset B(0, R)$, where $\operatorname{supp} v$ denotes the support of $v$. Then for any $1 \le p < +\infty$ and $\epsilon > 0$ there exists an $L = L(\epsilon, p) \gg 1$ such that

$$\sup_n \int_{B(0,LR)^c} |H_n v(x)|^p \, dx \le \epsilon. \tag{3.4}$$

If this were the case, we would estimate the $L^p$ convergence as follows:

$$\int_{\mathbb{R}^N} |H_n v(x) - \nabla^2 v(x)|^p \, dx = \int_{B(0,LR)} |H_n v(x) - \nabla^2 v(x)|^p \, dx + \int_{B(0,LR)^c} |H_n v(x)|^p \, dx,$$

since $\nabla^2 v$ vanishes outside $B(0,R)$, from which (3.4) implies

$$\limsup_{n\to\infty} \int_{\mathbb{R}^N} |H_n v(x) - \nabla^2 v(x)|^p \, dx \le \limsup_{n\to\infty} \int_{B(0,LR)} |H_n v(x) - \nabla^2 v(x)|^p \, dx + \epsilon.$$


The conclusion then follows, since the first term vanishes by the assumed uniform convergence, after which $\epsilon > 0$ is arbitrary. We will therefore show the estimate (3.4). By Jensen's inequality with respect to the probability measure $\rho_n(x-y)\,dy$ (recall that $\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1$) and the boundedness of the kernel, we have, for a constant $C = C(N,p)$ that may change from line to line,

$$\int_{B(0,LR)^c} |H_n v(x)|^p \, dx \le C \int_{B(0,LR)^c} \int_{\mathbb{R}^N} \frac{|v(y) - 2v(x) + v(x + (x-y))|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx$$
$$\le C \int_{B(0,LR)^c} \int_{y \in B(0,R)} \frac{|v(y)|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx + C \int_{B(0,LR)^c} \int_{x + (x-y) \in B(0,R)} \frac{|v(x + (x-y))|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx,$$

where we used that $v \equiv 0$ on $B(0,LR)^c$. Letting $z = x + (x-y)$ (which means that $x - y = z - x$), we obtain

$$\int_{B(0,LR)^c} \int_{x + (x-y) \in B(0,R)} \frac{|v(x + (x-y))|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx = \int_{B(0,LR)^c} \int_{z \in B(0,R)} \frac{|v(z)|^p}{|z-x|^{2p}} \, \rho_n(z-x) \, dz \, dx,$$

and therefore by the symmetry of $\rho_n$ we have

$$\int_{B(0,LR)^c} |H_n v(x)|^p \, dx \le C \int_{B(0,LR)^c} \int_{y \in B(0,R)} \frac{|v(y)|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx$$
$$\le \frac{C}{(R(L-1))^{2p}} \int_{B(0,LR)^c} \int_{y \in B(0,R)} |v(y)|^p \rho_n(x-y) \, dy \, dx \le \frac{C}{(R(L-1))^{2p}} \, \|\rho_n\|_{L^1(\mathbb{R}^N)} \|v\|^p_{L^p(\mathbb{R}^N)},$$

where we used that $|x - y| \ge (L-1)R$ whenever $x \in B(0,LR)^c$ and $y \in B(0,R)$. Again using $\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1$, the claim, and therefore the case $1 \le p < +\infty$, follows by choosing $L$ sufficiently large.

Case $p = +\infty$. It therefore remains to show that the convergence in $L^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ holds. Precisely, we will show that
\[
H_n u - \nabla^2 u \to 0 \quad\text{uniformly},
\]
for which it suffices to prove the convergence componentwise, i.e., $(H_n u - \nabla^2 u)_{(i_0,j_0)} \to 0$, by considering the two cases $i_0\ne j_0$ and $i_0 = j_0$. Before we proceed, let us mention some useful facts. Observe first that Proposition 5.1 in the appendix and the assumption that $\int_{\mathbb{R}^N}\rho_n(x)\,dx = 1$ for all $n\in\mathbb{N}$ can be used to deduce that
\[
\int_{\mathbb{R}^N} \frac{z_{i_0}^2 z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz
= \int_0^\infty \rho_n(t)\,t^{N-1}\,dt \int_{S^{N-1}} \nu_{i_0}^2\nu_{j_0}^2\,d\mathcal{H}^{N-1}(\nu)
= \frac{1}{N(N+2)}\cdot
\begin{cases}
1, & i_0\ne j_0,\\
3, & i_0 = j_0.
\end{cases} \tag{3.5}
\]

Moreover, utilizing the radial symmetry of $\rho_n$, integrals whose integrand contains an odd power of some coordinate vanish; in particular,
\[
\int_{\mathbb{R}^N} \frac{z_i z_{j_0}^3}{|z|^4}\,\rho_n(z)\,dz = 0 \quad\text{for } i\ne j_0, \tag{3.6}
\]
\[
\int_{\mathbb{R}^N} \frac{z_i z_j z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz = 0 \quad\text{for } i\ne j_0,\ j\ne j_0,\ i\ne j, \tag{3.7}
\]
\[
\int_{\mathbb{R}^N} \frac{z_i z_j z_{i_0} z_{j_0}}{|z|^4}\,\rho_n(z)\,dz = 0 \quad\text{for } i_0\ne j_0 \text{ and } \{i,j\}\ne\{i_0,j_0\}. \tag{3.8}
\]
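Since the moment identities (3.5)–(3.8) rely only on the radial symmetry and unit mass of $\rho_n$, they hold for any radial probability density and can be checked numerically. The following sketch is purely illustrative (it is not part of the paper's method): a standard Gaussian simply plays the role of a radial $\rho_n$, and (3.5) is verified by Monte Carlo sampling.

```python
import numpy as np

# Monte Carlo check of the moment identity (3.5) for a radial density:
#   E[z_i^2 z_j^2 / |z|^4] = 1/(N(N+2)) if i != j, and 3/(N(N+2)) if i == j.
# For a standard Gaussian, z/|z| is uniform on the sphere, and the
# integrand depends only on the direction z/|z|.
rng = np.random.default_rng(1)
N = 3
z = rng.normal(size=(1_000_000, N))
r4 = np.sum(z**2, axis=1)**2
off = np.mean(z[:, 0]**2 * z[:, 1]**2 / r4)   # case i != j
diag = np.mean(z[:, 0]**4 / r4)               # case i == j
print(off, 1.0/(N*(N+2)))    # both ≈ 0.0667
print(diag, 3.0/(N*(N+2)))   # both ≈ 0.2
```

The odd-moment identities (3.6)–(3.8) can be checked the same way; their sample means fluctuate around zero.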

Subcase $i_0\ne j_0$. Using (3.5), for the case $i_0\ne j_0$ we have
\[
\big(H_n u - \nabla^2 u\big)_{(i_0,j_0)}(x)
= \frac{N(N+2)}{2}\left[\int_{\mathbb{R}^N} \frac{d^2u(x,y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy
- 2\,\frac{\partial^2 u}{\partial x_{i_0}\partial x_{j_0}}(x)\int_{\mathbb{R}^N} \frac{z_{i_0}^2 z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz\right].
\]
Moreover, (3.6)–(3.8) imply that
\[
\sum_{i,j=1}^N \frac{\partial^2 u}{\partial x_i\partial x_j}(x)\int_{\mathbb{R}^N} \frac{z_i z_{i_0} z_j z_{j_0}}{|z|^4}\,\rho_n(z)\,dz
= 2\,\frac{\partial^2 u}{\partial x_{i_0}\partial x_{j_0}}(x)\int_{\mathbb{R}^N} \frac{z_{i_0}^2 z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz.
\]
Thus, introducing these factors of zero and writing things more compactly, we have that
\[
\Big|\big(H_n u - \nabla^2 u\big)_{(i_0,j_0)}(x)\Big|
= \frac{N(N+2)}{2}\left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right|.
\]

We want to show that the right-hand side tends to zero as $n\to\infty$, and we therefore define, for $\delta>0$, the quantity
\[
Q_\delta u(x) = \left|\int_{B(x,\delta)} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right|. \tag{3.9}
\]
We then claim that we can make $Q_\delta u(x)$ as small as we want, independently of $x$ and $n$, by choosing $\delta>0$ sufficiently small. If this is the case, then the case $i_0\ne j_0$ would be completed, since we would then have that
\[
\begin{aligned}
\Big|\big(H_n u - \nabla^2 u\big)_{(i_0,j_0)}(x)\Big|
&\le \frac{N(N+2)}{2}\,Q_\delta u(x)
+ \frac{N(N+2)}{2}\int_{|z|\ge\delta} \frac{|u(x+z)-2u(x)+u(x-z)|}{|z|^2}\,\rho_n(z)\,dz\\
&\quad + \frac{N(N+2)}{2}\,\|\nabla^2 u\|_\infty \int_{|z|\ge\delta} \rho_n(z)\,dz\\
&\le \frac{N(N+2)}{2}\,\varepsilon
+ \frac{N(N+2)}{2}\left(\frac{4\|u\|_\infty}{\delta^2} + \|\nabla^2 u\|_{L^\infty}\right)\int_{|z|\ge\delta}\rho_n(z)\,dz
\ <\ N(N+2)\,\varepsilon
\end{aligned}
\]
for $n$ large enough, and the result follows upon sending $\varepsilon\to 0$.

We therefore proceed to make estimates for (3.9). Since we have assumed $u\in C_c^2(\mathbb{R}^N)$, given $\varepsilon>0$ there is a $\delta>0$ such that for every $i,j = 1,\dots,N$ we have
\[
\left|\frac{\partial^2 u}{\partial x_i\partial x_j}(x) - \frac{\partial^2 u}{\partial x_i\partial x_j}(y)\right| < \varepsilon \quad\text{whenever } |x-y|<\delta.
\]
Using (3.12) we can estimate
\[
Q_\delta u(x) = \left|\int_{B(x,\delta)} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \tag{3.10}
\]
\[
= \left|\int_{B(x,\delta)} \frac{(x-y)^\top\left[\int_0^1\!\!\int_0^1 \nabla^2u\big(x+(s+t-1)(y-x)\big) - \nabla^2u(x)\,ds\,dt\right](x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \tag{3.11}
\]
\[
\le N\varepsilon\int_{B(x,\delta)} \frac{|x-y|\,|x-y|}{|x-y|^2}\,\frac{|x_{i_0}-y_{i_0}|\,|x_{j_0}-y_{j_0}|}{|x-y|^2}\,\rho_n(x-y)\,dy \ \le\ N\varepsilon.
\]
Here, we have used the mean value theorem for scalar and vector valued functions to write
\[
d^2u(x,y) = (x-y)^\top \int_0^1\!\!\int_0^1 \nabla^2u\big(x+(t+s-1)(y-x)\big)\,ds\,dt\,(x-y), \tag{3.12}
\]

and the fact that $\int_{\mathbb{R}^N}\rho_n(x)\,dx = 1$ for all $n\in\mathbb{N}$. This completes the proof in the case $i_0\ne j_0$.

Subcase $i_0 = j_0$. Let us record several observations before we proceed with this case. In fact, the same argument shows that for a single $i\in\{1,\dots,N\}$,
\[
I_i^n(x) := \left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_i-y_i)^2}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \to 0 \tag{3.13}
\]
uniformly in $x$ as $n\to\infty$, and therefore by summing in $i$ we deduce that
\[
\left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \to 0. \tag{3.14}
\]

Moreover, we observe that the same formula from Proposition 5.1 and the cancellation of odd powers imply that
\[
\begin{aligned}
\int_{\mathbb{R}^N} \frac{(x-y)^\top\nabla^2u(x)(x-y)\,(x_{i_0}-y_{i_0})^2}{|x-y|^4}\,\rho_n(x-y)\,dy
&= \sum_{j=1}^N \frac{\partial^2 u}{\partial x_j^2}(x)\int_{\mathbb{R}^N} \frac{z_j^2 z_{i_0}^2}{|z|^4}\,\rho_n(z)\,dz\\
&= \frac{1}{N(N+2)}\,\Delta u(x) + \frac{2}{3}\,\frac{\partial^2 u}{\partial x_{i_0}^2}(x)\int_{\mathbb{R}^N} \frac{z_{i_0}^4}{|z|^4}\,\rho_n(z)\,dz\\
&= \frac{2}{N(N+2)}\left(\frac{1}{2}\Delta u(x) + \frac{\partial^2 u}{\partial x_{i_0}^2}(x)\right),
\end{aligned}
\]
while we also have that
\[
\int_{\mathbb{R}^N} \frac{(x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\rho_n(x-y)\,dy
= \sum_{j=1}^N \frac{\partial^2 u}{\partial x_j^2}(x)\int_{\mathbb{R}^N} \frac{z_j^2}{|z|^2}\,\rho_n(z)\,dz
= \frac{1}{N}\,\Delta u(x).
\]
Thus, we can estimate
\[
\begin{aligned}
\Big|\big(H_n u - \nabla^2 u\big)_{(i_0,i_0)}(x)\Big|
&\le \frac{N(N+2)}{2}\,I_{i_0}^n(x)
+ \left|\frac{N}{2}\int_{\mathbb{R}^N} \frac{d^2u(x,y)}{|x-y|^2}\,\rho_n(x-y)\,dy - \int_{\mathbb{R}^N} \frac{\Delta u(x)}{2}\,\rho_n(x-y)\,dy\right|\\
&= \frac{N(N+2)}{2}\,I_{i_0}^n(x)
+ \frac{N}{2}\left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\rho_n(x-y)\,dy\right|,
\end{aligned}
\]
and the proof is completed by invoking the convergences established in (3.13) and (3.14).
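The localization just proved can also be observed numerically. The sketch below is illustrative only (function names are ours): it evaluates the explicit formula (1.11) by Monte Carlo, with a standard Gaussian playing the role of the radial weight. For a quadratic $u$ the second difference $d^2u(x,y)$ equals $(y-x)^\top\nabla^2u\,(y-x)$ exactly, so the sample average should reproduce $\nabla^2 u$ at any scale.

```python
import numpy as np

# Monte Carlo evaluation of the nonlocal Hessian (1.11),
#   H_n u(x) = N(N+2)/2 * E_z[ d2u/|z|^2 * (z⊗z - |z|^2/(N+2) I)/|z|^2 ],
# with d2u = u(x+z) - 2u(x) + u(x-z) and z drawn from a radial density
# (here a standard Gaussian; purely illustrative).
def nonlocal_hessian_mc(u, x, Z):
    N = x.size
    d2u = np.array([u(x + z) - 2.0*u(x) + u(x - z) for z in Z])
    r2 = np.sum(Z**2, axis=1)
    K = (Z[:, :, None]*Z[:, None, :] - (r2/(N+2))[:, None, None]*np.eye(N)) \
        / r2[:, None, None]**2
    return N*(N+2)/2.0 * np.mean(d2u[:, None, None] * K, axis=0)

# For the quadratic u(y) = y^T Q y / 2 the Hessian is exactly Q.
Q = np.array([[2.0, 1.0], [1.0, -3.0]])
u = lambda y: 0.5 * y @ Q @ y
rng = np.random.default_rng(2)
H = nonlocal_hessian_mc(u, np.array([0.3, -1.2]), rng.normal(size=(200_000, 2)))
print(H)   # ≈ Q, up to Monte Carlo error
```

For non-quadratic smooth $u$, shrinking the sampling scale of $z$ reproduces the limit of Theorem 1.4.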

3.2. Localization–$W^{2,p}(\mathbb{R}^N)$ case. The objective of this section is to show that if $u\in W^{2,p}(\mathbb{R}^N)$, $1\le p<\infty$, then the nonlocal Hessian $H_n u$ converges to $\nabla^2 u$ in $L^p$. The first step is to show that in that case $H_n u$ is indeed an $L^p$ function. This follows from Lemma 3.1, which we prove next.

Lemma 3.1. Suppose that $u\in W^{2,p}(\mathbb{R}^N)$, where $1\le p<\infty$. Then $H_n u$ is well-defined as a Lebesgue integral, $H_n u\in L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$, and
\[
\int_{\mathbb{R}^N} |H_n u(x)|^p\,dx \le M\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^p, \tag{3.15}
\]
where the constant $M$ depends only on $N$ and $p$.

Proof. Let us begin by making estimates for a function $v\in C^\infty(\mathbb{R}^N)\cap W^{2,p}(\mathbb{R}^N)$. From the definition of the nonlocal Hessian and utilizing Jensen's inequality, (3.12), and Fubini's theorem, we have the following successive estimates (the constant is always denoted by $M(N,p)$):
\[
\begin{aligned}
\int_{\mathbb{R}^N} |H_n v(x)|^p\,dx
&= \left(\frac{N(N+2)}{2}\right)^p \int_{\mathbb{R}^N} \left|\int_{\mathbb{R}^N} \frac{d^2v(x,y)}{|x-y|^2}\,\frac{(x-y)\otimes(x-y) - \frac{|x-y|^2}{N+2}I_N}{|x-y|^2}\,\rho_n(x-y)\,dy\right|^p dx\\
&\le M(N,p) \int_{\mathbb{R}^N} \left(\int_{\mathbb{R}^N} \frac{|d^2v(x,y)|}{|x-y|^2}\,\rho_n(x-y)\,dy\right)^p dx
\end{aligned} \tag{3.16}
\]
\[
\begin{aligned}
&\le M(N,p) \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|d^2v(x,y)|^p}{|x-y|^{2p}}\,\rho_n(x-y)\,dy\,dx\\
&\le M(N,p) \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{\int_0^1 \big|\nabla v(x+t(y-x)) - \nabla v(x+(t-1)(y-x))\big|^p\,dt}{|x-y|^p}\,\rho_n(x-y)\,dy\,dx\\
&\le M(N,p) \int_0^1\!\!\int_0^1 \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \big|\nabla^2 v(x+(t+s-1)(y-x))\big|^p\,\rho_n(x-y)\,dy\,dx\,ds\,dt\\
&= M(N,p) \int_0^1\!\!\int_0^1 \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \big|\nabla^2 v(x+(t+s-1)\xi)\big|^p\,\rho_n(\xi)\,d\xi\,dx\,ds\,dt\\
&= M(N,p) \int_0^1\!\!\int_0^1 \int_{\mathbb{R}^N} \rho_n(\xi)\,\|\nabla^2 v\|_{L^p(\mathbb{R}^N)}^p\,d\xi\,ds\,dt
= M(N,p)\,\|\nabla^2 v\|_{L^p(\mathbb{R}^N)}^p.
\end{aligned} \tag{3.17}
\]

Consider now a sequence $(v_k)_{k\in\mathbb{N}}$ in $C^\infty(\mathbb{R}^N)\cap W^{2,p}(\mathbb{R}^N)$ approximating $u$ in $W^{2,p}(\mathbb{R}^N)$. We already have from the above that
\[
\int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|v_k(x+z) - 2v_k(x) + v_k(x-z)|^p}{|z|^{2p}}\,\rho_n(z)\,dz\,dx \le M\,\|\nabla^2 v_k\|_{L^p(\mathbb{R}^N)}^p \quad\forall k\in\mathbb{N}. \tag{3.18}
\]
Since $v_k$ converges to $u$ in $L^p(\mathbb{R}^N)$, there exists a subsequence $(v_{k_\ell})_{\ell\in\mathbb{N}}$ converging to $u$ almost everywhere.

If we can establish that $H_n u$ is well-defined as a Lebesgue integral, then Jensen's inequality and Fatou's lemma imply that
\[
\begin{aligned}
\int_{\mathbb{R}^N} \left|\int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2}\,\rho_n(z)\,dz\right|^p dx
&\le \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|u(x+z) - 2u(x) + u(x-z)|^p}{|z|^{2p}}\,\rho_n(z)\,dz\,dx\\
&\le \liminf_{\ell\to\infty} \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|v_{k_\ell}(x+z) - 2v_{k_\ell}(x) + v_{k_\ell}(x-z)|^p}{|z|^{2p}}\,\rho_n(z)\,dz\,dx\\
&\le M \liminf_{\ell\to\infty} \|\nabla^2 v_{k_\ell}\|_{L^p(\mathbb{R}^N)}^p
= M\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^p.
\end{aligned}
\]
This argument, along with Jensen's inequality, allows us to conclude that the conditions of Theorem 1.3 are satisfied, in particular that $H_n u$ is well-defined as a Lebesgue integral, so that the estimate (3.17) holds for $W^{2,p}$ functions as well, thus completing the proof. Finally, the Gagliardo–Nirenberg inequality
\[
\|\nabla^2 u\|_{L^1} \le C\,\|\nabla^2 u\|_{L^p}^\theta\,\|u\|_{L^p}^{1-\theta}
\]
implies that for $u\in W^{2,p}$, $\nabla^2 u\in L^1$, which by the preceding display yields that $H_n u$ is well-defined as a Lebesgue integral.


We now have the necessary tools to prove the localization for W2,p functions.

Proof of Theorem 1.5. The result holds for functions $v\in C_c^2(\mathbb{R}^N)$, since from Theorem 1.4 we have that $H_n v\to\nabla^2 v$ in $L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$. We now use the fact that $C_c^\infty(\mathbb{R}^N)$, and hence $C_c^2(\mathbb{R}^N)$, is dense in $W^{2,p}(\mathbb{R}^N)$; see, for example, [Bre83]. Let $\varepsilon>0$; then by density there exists a function $v\in C_c^2(\mathbb{R}^N)$ such that
\[
\|\nabla^2 u - \nabla^2 v\|_{L^p(\mathbb{R}^N)} \le \varepsilon.
\]
Thus, using also Lemma 3.1, we have
\[
\|H_n u - \nabla^2 u\|_{L^p(\mathbb{R}^N)}
\le \|H_n u - H_n v\|_{L^p(\mathbb{R}^N)} + \|H_n v - \nabla^2 v\|_{L^p(\mathbb{R}^N)} + \|\nabla^2 v - \nabla^2 u\|_{L^p(\mathbb{R}^N)}
\le C\varepsilon + \|H_n v - \nabla^2 v\|_{L^p(\mathbb{R}^N)} + \varepsilon.
\]
Taking limits as $n\to\infty$ we get
\[
\limsup_{n\to\infty} \|H_n u - \nabla^2 u\|_{L^p(\mathbb{R}^N)} \le (C+1)\,\varepsilon,
\]
and thus we conclude that
\[
\lim_{n\to\infty} \|H_n u - \nabla^2 u\|_{L^p(\mathbb{R}^N)} = 0.
\]

3.3. Localization–$BV^2(\mathbb{R}^N)$ case. Analogously to the first-order case in [MS15], we can define a second-order nonlocal divergence that corresponds to $H_n$, and we can also derive a second-order nonlocal integration by parts formula, which is an essential tool for the proofs of this section. The second-order nonlocal divergence is defined for a function $\varphi = (\varphi_{ij})_{i,j=1}^N$ as
\[
D_n^2\varphi(x) = \frac{N(N+2)}{2}\int_{\mathbb{R}^N} \frac{\varphi(y) - 2\varphi(x) + \varphi(x+(x-y))}{|x-y|^2}\cdot\frac{(x-y)\otimes(x-y) - \frac{|x-y|^2}{N+2}I_N}{|x-y|^2}\,\rho_n(x-y)\,dy, \tag{3.19}
\]
where $A\cdot B = \sum_{i,j=1}^N A_{ij}B_{ij}$ for two $N\times N$ matrices $A$ and $B$. Notice that (3.19) is well-defined for $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$.

Theorem 3.2 (second-order nonlocal integration by parts formula). Suppose that $u\in L^1(\mathbb{R}^N)$ and $\frac{|d^2u(x,y)|}{|x-y|^2}\,\rho_n(x-y)\in L^1(\mathbb{R}^N\times\mathbb{R}^N)$, and let $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$. Then
\[
\int_{\mathbb{R}^N} H_n u(x)\cdot\varphi(x)\,dx = \int_{\mathbb{R}^N} u(x)\,D_n^2\varphi(x)\,dx. \tag{3.20}
\]

In fact, this theorem can be deduced as a consequence of Theorem 1.3 through a component-by-component application and a collection of terms. The following lemma shows the convergence of the second-order nonlocal divergence to its continuous analogue $\operatorname{div}^2\varphi$, where $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$ and
\[
\operatorname{div}^2\varphi := \sum_{i,j=1}^N \frac{\partial^2\varphi_{ij}}{\partial x_i\partial x_j}.
\]


Lemma 3.3. Let $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$. Then for every $1\le p\le\infty$ we have
\[
\lim_{n\to\infty} \|D_n^2\varphi - \operatorname{div}^2\varphi\|_{L^p(\mathbb{R}^N)} = 0. \tag{3.21}
\]
Proof. The proof follows immediately from Theorem 1.4.

The following lemma shows that the nonlocal Hessian (1.11) is well-defined for $u\in BV^2(\mathbb{R}^N)$. It is the analogue of Lemma 3.1, this time for functions in $BV^2(\mathbb{R}^N)$.

Lemma 3.4. Suppose that $u\in BV^2(\mathbb{R}^N)$. Then $H_n u\in L^1(\mathbb{R}^N,\mathbb{R}^{N\times N})$ with
\[
\int_{\mathbb{R}^N} |H_n u(x)|\,dx \le M\,|D^2u|(\mathbb{R}^N), \tag{3.22}
\]
where the constant $M$ depends only on $N$.

Proof. Let $(u_k)_{k\in\mathbb{N}}$ be a sequence of functions in $C^\infty(\mathbb{R}^N)$ that converges to $u$ strictly in $BV^2(\mathbb{R}^N)$. By the same calculations as in the proof of Lemma 3.1 we have, for every $k\in\mathbb{N}$,
\[
\int_{\mathbb{R}^N} |H_n u_k(x)|\,dx \le M(N,1)\,\|\nabla^2 u_k\|_{L^1(\mathbb{R}^N)}.
\]
Using Fatou's lemma in a way similar to how it was used in Lemma 3.1, we can establish that $H_n u$ is well-defined as a Lebesgue integral, along with the estimate
\[
\int_{\mathbb{R}^N} |H_n u(x)|\,dx \le M(N,1)\,\liminf_{k\to\infty} |D^2u_k|(\mathbb{R}^N) = M(N,1)\,|D^2u|(\mathbb{R}^N),
\]
where above we employed the strict convergence of $D^2u_k$ to $D^2u$. Thus the result has been demonstrated.

We can now proceed to prove the localization result for $BV^2$ functions. Recall that we defined $\mu_n$ to be the $\mathbb{R}^{N\times N}$-valued finite Radon measures $\mu_n := H_n u\,\mathcal{L}^N$.

Proof of Theorem 1.6. We first proceed to prove (1.14) for $C_c^\infty$ functions, and then we conclude with a density argument. From the estimate (3.22) we have that $(|\mu_n|(\mathbb{R}^N))_{n\in\mathbb{N}}$ is bounded; thus there exist a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ and an $\mathbb{R}^{N\times N}$-valued Radon measure $\mu$ such that $\mu_{n_k}$ converges to $\mu$ weakly$^\ast$. This means that for every $\psi\in C_c^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ we have
\[
\lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}u(x)\cdot\psi(x)\,dx = \int_{\mathbb{R}^N} \psi(x)\cdot d\mu.
\]
On the other hand, from the integration by parts formula (3.20) and Lemma 3.3 we get that
\[
\lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}u(x)\cdot\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} u(x)\,D_{n_k}^2\psi(x)\,dx
= \int_{\mathbb{R}^N} u(x)\operatorname{div}^2\psi(x)\,dx
= \int_{\mathbb{R}^N} \psi(x)\cdot dD^2u.
\]
This means that $\mu = D^2u$. Observe now that, since we actually deduce that every subsequence of $(\mu_n)_{n\in\mathbb{N}}$ has a further subsequence that converges to $D^2u$ weakly$^\ast$, the initial sequence $(\mu_n)_{n\in\mathbb{N}}$ converges to $D^2u$ weakly$^\ast$.

Now we let $\varphi\in C_0(\mathbb{R}^N,\mathbb{R}^{N\times N})$ and $\varepsilon>0$. From the density of $C_c^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ in $C_0(\mathbb{R}^N,\mathbb{R}^{N\times N})$ we can find a $\psi\in C_c^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ such that $\|\varphi-\psi\|_\infty < \varepsilon$. Then, using also the estimate (3.22), we have
\[
\begin{aligned}
\left|\int_{\mathbb{R}^N} H_n u(x)\cdot\varphi(x)\,dx - \int_{\mathbb{R}^N} \varphi(x)\,dD^2u\right|
&\le \int_{\mathbb{R}^N} \big|H_n u(x)\cdot(\varphi(x)-\psi(x))\big|\,dx
+ \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\psi(x)\,dx - \int_{\mathbb{R}^N} \psi(x)\,dD^2u\right|\\
&\quad + \left|\int_{\mathbb{R}^N} (\varphi(x)-\psi(x))\,dD^2u\right|\\
&\le \varepsilon\int_{\mathbb{R}^N} |H_n u(x)|\,dx
+ \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\psi(x)\,dx - \int_{\mathbb{R}^N} \psi(x)\,dD^2u\right|
+ \varepsilon\,|D^2u|(\mathbb{R}^N)\\
&\le \varepsilon M\,|D^2u|(\mathbb{R}^N)
+ \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\psi(x)\,dx - \int_{\mathbb{R}^N} \psi(x)\,dD^2u\right|
+ \varepsilon\,|D^2u|(\mathbb{R}^N).
\end{aligned}
\]
Taking the limit $n\to\infty$ on both sides of the above inequality we get that
\[
\limsup_{n\to\infty} \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\varphi(x)\,dx - \int_{\mathbb{R}^N} \varphi(x)\,dD^2u\right| \le \tilde{M}\varepsilon,
\]
and since $\varepsilon$ is arbitrary, we have (1.14).

Let us note here that in the case $N = 1$ we can also prove strict convergence of the measures $\mu_n$ to $D^2u$; that is, in addition to (1.14) we also have $|\mu_n|(\mathbb{R})\to|D^2u|(\mathbb{R})$.

Theorem 3.5. Let $N = 1$. Then the sequence $(\mu_n)_{n\in\mathbb{N}}$ converges to $D^2u$ strictly as measures, i.e.,
\[
\mu_n \to D^2u \quad\text{weakly}^\ast, \tag{3.23}
\]
and
\[
|\mu_n|(\mathbb{R}) \to |D^2u|(\mathbb{R}). \tag{3.24}
\]

Proof. The weak$^\ast$ convergence was proven in Theorem 1.6. Since in the space of finite Radon measures the total variation norm is lower semicontinuous with respect to weak$^\ast$ convergence, we also have
\[
|D^2u|(\mathbb{R}) \le \liminf_{n\to\infty} |\mu_n|(\mathbb{R}). \tag{3.25}
\]
Thus it suffices to show that
\[
\limsup_{n\to\infty} |\mu_n|(\mathbb{R}) \le |D^2u|(\mathbb{R}). \tag{3.26}
\]
Note that in dimension one the nonlocal Hessian formula is
\[
H_n u(x) = \int_{\mathbb{R}} \frac{u(y) - 2u(x) + u(x+(x-y))}{|x-y|^2}\,\rho_n(x-y)\,dy. \tag{3.27}
\]
Following the proof of Lemma 3.1, we can easily verify that for $v\in C^\infty(\mathbb{R})\cap BV^2(\mathbb{R})$ we have
\[
\int_{\mathbb{R}} |H_n v(x)|\,dx \le \|\nabla^2 v\|_{L^1(\mathbb{R})},
\]
i.e., the constant $M$ that appears in the estimate (3.15) is equal to 1. Using Fatou's lemma and the $BV^2$ strict approximation of $u$ by smooth functions we get that
\[
|\mu_n|(\mathbb{R}) = \int_{\mathbb{R}} |H_n u(x)|\,dx \le |D^2u|(\mathbb{R}),
\]
from which (3.26) straightforwardly follows.
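In dimension one, formula (3.27) is simple enough to evaluate directly, and the localization $H_n u\to u''$ can be watched numerically. The following sketch is illustrative (naming and discretization are our own choices): it uses the normalized radial weight $\rho_n(z) = 1/(2\varepsilon_n)$ on $\{|z|\le\varepsilon_n\}$ and a Riemann sum in the variable $z = y-x$.

```python
import numpy as np

# 1D nonlocal Hessian (3.27) with rho(z) = 1/(2*eps) on [-eps, eps]:
#   H u(x) = ∫ [u(y) - 2u(x) + u(x + (x - y))] / |x - y|^2 * rho(x - y) dy.
# With z = y - x the integrand is even in z, so integrate over z > 0 and double.
def nonlocal_hessian_1d(u, x, eps, m=4000):
    z = np.linspace(eps/m, eps, m)
    d2u = u(x + z) - 2.0*u(x) + u(x - z)
    integrand = d2u / z**2 * (1.0/(2.0*eps))
    return 2.0 * np.sum(integrand) * (eps/m)

# As eps -> 0 the value approaches u''(x); here u = sin, u''(0.7) = -sin(0.7).
for eps in [0.5, 0.1, 0.02]:
    print(eps, nonlocal_hessian_1d(np.sin, 0.7, eps))   # tends to -sin(0.7) ≈ -0.6442
```

Shrinking `eps` mimics the concentration of the weights $\rho_n$ at the origin that drives the localization results above.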

3.4. Characterization of higher-order Sobolev and BV spaces. The characterization of Sobolev and BV spaces in terms of nonlocal, derivative-free energies has so far been carried out only in the first-order case; see [BBM01, Pon04b, Men12, MS15]. Here we characterize the spaces $W^{2,p}(\mathbb{R}^N)$ and $BV^2(\mathbb{R}^N)$ using our definition of the nonlocal Hessian.

Proof of Theorem 1.7. First, we prove (1.15). Suppose that $u\in W^{2,p}(\mathbb{R}^N)$. Then Lemma 3.1 gives
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|^p\,dx \le M\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^p < \infty.
\]
Suppose now conversely that
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|^p\,dx < \infty.
\]
This means that, up to a subsequence, $H_n u$ is representable by a sequence of functions bounded in $L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$; thus there exist a subsequence $(H_{n_k}u)_{k\in\mathbb{N}}$ and $v\in L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$ such that $H_{n_k}u \rightharpoonup v$ weakly in $L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$. Thus, using the definition of weak convergence in $L^p$ together with the definition of $H_n u$, we have for every $\psi\in C_c^\infty(\mathbb{R}^N)$,
\[
\int_{\mathbb{R}^N} v_{ij}(x)\,\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}^{ij}u(x)\,\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} u(x)\,H_{n_k}^{ij}\psi(x)\,dx
= \int_{\mathbb{R}^N} u(x)\,\frac{\partial^2\psi(x)}{\partial x_i\partial x_j}\,dx,
\]
which shows that $v = \nabla^2 u$ is the second-order weak derivative of $u$. Now since $u\in L^p(\mathbb{R}^N)$ and the second-order distributional derivative is a function, mollification of $u$ and the Gagliardo–Nirenberg inequality (see [Nir59, p. 128, eq. 2.5])
\[
\|\nabla u\|_{L^p(\mathbb{R}^N)} \le C\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^{1/2}\,\|u\|_{L^p(\mathbb{R}^N)}^{1/2} \tag{3.28}
\]
imply that the first distributional derivative belongs to $L^p(\mathbb{R}^N,\mathbb{R}^N)$, and thus $u\in W^{2,p}(\mathbb{R}^N)$.

We now proceed to prove (1.16). Again supposing that $u\in BV^2(\mathbb{R}^N)$, Lemma 3.4 gives us
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|\,dx \le C\,|D^2u|(\mathbb{R}^N).
\]
Suppose now that
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|\,dx < \infty.
\]
Considering again the measures $\mu_n = H_n u\,\mathcal{L}^N$, we have that there exist a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ and a finite Radon measure $\mu$ such that $\mu_{n_k}$ converges to $\mu$ weakly$^\ast$. Then for every $\psi\in C_c^\infty(\mathbb{R}^N)$ we have, similarly as before,
\[
\int_{\mathbb{R}^N} \psi\,d\mu_{ij}
= \lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}^{ij}u(x)\,\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} u(x)\,H_{n_k}^{ij}\psi(x)\,dx
= \int_{\mathbb{R}^N} u(x)\,\frac{\partial^2\psi(x)}{\partial x_i\partial x_j}\,dx,
\]
which shows that $\mu = D^2u$. Again, by first mollifying and then passing to the limit, the inequality (3.28) implies that $Du\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^N)$. However, $Du\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^N)$ and $D^2u\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^{N\times N})$ imply that $Du$ is an $L^1(\mathbb{R}^N,\mathbb{R}^N)$ function (a simple consequence of the Sobolev inequality; but see also [AFP00, Exerc. 3.2]), and we therefore conclude that $u\in BV^2(\mathbb{R}^N)$.

3.5. Γ-convergence. For notational convenience we define the functional
\[
F_n(u) :=
\begin{cases}
\int_{\mathbb{R}^N} |H_n u|\,dx & \text{if } H_n u \text{ is representable by an } L^1 \text{ function},\\
+\infty & \text{otherwise}.
\end{cases} \tag{3.29}
\]

Proof of Theorem 1.8. The computation of the Γ-limit consists of two inequalities. For the lower bound, we must show that
\[
|D^2u|(\mathbb{R}^N) \le \liminf_{n\to\infty} F_n(u_n)
\]
for every sequence $u_n\to u$ in $L^1(\mathbb{R}^N)$. Without loss of generality we may assume that
\[
C := \liminf_{n\to\infty} F_n(u_n) < +\infty,
\]
which implies that
\[
\sup_\varphi\ \liminf_{n\to\infty} \left|\int_{\mathbb{R}^N} H_n^{ij}u_n\,\varphi\,dx\right| \le C,
\]
where the supremum is taken over $\varphi\in C_c^\infty(\mathbb{R}^N)$ such that $\|\varphi\|_{L^\infty(\mathbb{R}^N)}\le 1$. Now, the definition of the distributional nonlocal Hessian and the convergence $u_n\to u$ in $L^1(\mathbb{R}^N)$ imply that
\[
\lim_{n\to\infty} \int_{\mathbb{R}^N} H_n^{ij}u_n\,\varphi\,dx
= \lim_{n\to\infty} \int_{\mathbb{R}^N} u_n\,H_n^{ij}\varphi\,dx
= \int_{\mathbb{R}^N} u\,\frac{\partial^2\varphi}{\partial x_i\partial x_j}\,dx.
\]
We thus conclude that
\[
\sup_\varphi \left|\int_{\mathbb{R}^N} u\,\frac{\partial^2\varphi}{\partial x_i\partial x_j}\,dx\right| \le C,
\]
which, arguing as in the previous section, says that $u\in BV^2(\mathbb{R}^N)$, in particular that $D^2u\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^{N\times N})$ and
\[
|D^2u|(\mathbb{R}^N) \le \Gamma_{L^1(\mathbb{R}^N)}\text{-}\liminf_{n\to\infty} F_n(u)
\]
for every $u\in L^1(\mathbb{R}^N)$.

For the upper bound we observe that if $u\in C_c^2(\mathbb{R}^N)$, we have, by the uniform convergence of Theorem 1.4 and the fact that $u$ is sufficiently smooth with compact support, that
\[
\lim_{n\to\infty} F_n(u) = |D^2u|(\mathbb{R}^N).
\]
Then choosing $u_n = u$ we conclude that
\[
\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup_{n\to\infty} F_n(u) \le \lim_{n\to\infty} F_n(u) = |D^2u|(\mathbb{R}^N).
\]
Now, taking the lower semicontinuous envelope with respect to strong $L^1(\mathbb{R}^N)$ convergence, and using that both the $\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup$ and the mapping $u\mapsto|D^2u|(\mathbb{R}^N)$ are lower semicontinuous on $L^1(\mathbb{R}^N)$ (for the $\Gamma\text{-}\limsup$ see [DM93, Prop. 6.8]), we deduce that
\[
\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup_{n\to\infty} F_n(u) \le \operatorname{sc}^-_{L^1(\mathbb{R}^N)}\,|D^2u|(\mathbb{R}^N) = |D^2u|(\mathbb{R}^N)
\quad\text{for all } u\in L^1(\mathbb{R}^N).
\]

4. Extensions and applications.

4.1. An asymmetric extension. In the previous sections we have shown that our nonlocal definition of $H_n$ as in (1.11) localizes to the classical distributional Hessian for a specific choice of the weights $\rho_n$ and thus can rightfully be called a nonlocal Hessian. In numerical applications, however, the strength of such nonlocal models lies in the fact that the weights can be chosen to have nonlocal interactions and to model specific patterns in the data. A classic example is the nonlocal total variation [GO08]:
\[
J_{NL\text{-}TV}(u) = \int_\Omega\int_\Omega |u(x)-u(y)|\,\sqrt{w(x,y)}\,dy\,dx. \tag{4.1}
\]
A possible choice is to take $w(x,y)$ large if the patches (neighborhoods) around $x$ and $y$ are similar with respect to a patch distance $d_a$, such as a weighted $\ell^2$ norm, and small if they are not. In [GO08] this is achieved by setting $w(x,y) = 1$ if the neighborhood around $y$ is one of the $K\in\mathbb{N}$ closest to the neighborhood around $x$ in a search window, and $w(x,y) = 0$ otherwise. In effect, if the image contains a repeating pattern with a defect that is small enough not to throw off the patch distances too much, it will be repaired as long as most similar patterns do not show the defect.
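The binary weight rule just described can be sketched in a few lines. The function below is purely illustrative; the names, patch size, and search-window size are our own choices and not taken from the implementation in [GO08].

```python
import numpy as np

# w(x, y) = 1 if the patch around y is among the K closest (in l2 patch
# distance) to the patch around x inside a search window, and 0 otherwise.
def knn_patch_weights(img, x, patch=3, search=7, K=5):
    h, s = patch // 2, search // 2
    px, py = x
    ref = img[px-h:px+h+1, py-h:py+h+1]
    dists = {}
    for qx in range(px - s, px + s + 1):
        for qy in range(py - s, py + s + 1):
            if (qx, qy) == (px, py):
                continue
            cand = img[qx-h:qx+h+1, qy-h:qy+h+1]
            if cand.shape == ref.shape:           # skip candidates falling off the image
                dists[(qx, qy)] = np.sum((ref - cand)**2)
    nearest = sorted(dists, key=dists.get)[:K]    # K smallest patch distances
    return {q: 1.0 for q in nearest}

# On an image of vertical stripes, the closest patches lie in the same column.
img = np.tile(np.arange(16.0), (16, 1))
w = knn_patch_weights(img, (8, 8))
```

On this toy image every patch in the same column matches the reference exactly, so all selected neighbors share the column index of $x$, which is the repairing behavior described above.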

Computing suitable weights is much less obvious in the case of $H_n$. We can formally extend (1.11) using arbitrary pairwise weights $\rho:\mathbb{R}^N\times\mathbb{R}^N\to\mathbb{R}$,
\[
H_\rho u(x) = C\int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2}\,\frac{z\otimes z - \frac{|z|^2}{N+2}I_N}{|z|^2}\,\rho_x(z)\,dz, \tag{4.2}
\]
and use it to create nonlocal generalizations of functionals such as $TV^2$, for example to minimize the nonlocal $L^2$-$TV^2$ model
\[
f(u) := \int_{\mathbb{R}^N} |u-g|^2\,dx + \alpha\int_{\mathbb{R}^N} |H_\rho u|\,dx. \tag{4.3}
\]
However, apart from being formulated on $\mathbb{R}^N$ instead of $\Omega$, formulation (4.2) has an important drawback compared to the first-order formulation (4.1): while the weights are defined between two points $x$ and $y$, the left part of the integrand uses the values of $u$ not only at $x$ and $y$ but also at the "mirrored" point $x+(x-y)$. In fact we can replace the weighting function by the symmetrized version $\frac{1}{2}\{\rho_x(y-x) + \rho_x(x-y)\}$, which in effect relates three points instead of two and limits the choice of possible weighting functions.

In this section we therefore introduce a more versatile extension of (1.11) that allows for fully nonsymmetric weights. We start with the realization that the finite-difference integrand in (4.2) effectively comes from canceling the first-order differences in the Taylor expansion of $u$ around $x$, which couples the values of $u$ at $x$, $y$, and $x+(x-y)$ into one term. Instead, we can avoid this coupling by directly looking for a nonlocal gradient $G_u(x)\in\mathbb{R}^N$ and Hessian $H_u(x)\in\operatorname{Sym}(\mathbb{R}^{N\times N})$ that best explain $u$ around $x$ in terms of a quadratic model, i.e., that take the place of the gradient and Hessian in the Taylor expansion:
\[
(G_u(x), H_u(x)) := \operatorname*{argmin}_{G_u\in\mathbb{R}^N,\ H_u\in\operatorname{Sym}(\mathbb{R}^{N\times N})}
\ \frac{1}{2}\int_{\Omega-\{x\}} \left(u(x+z) - u(x) - G_u^\top z - \frac{1}{2}\,z^\top H_u z\right)^2 \sigma_x(z)\,dz. \tag{4.4}
\]
Here the variable $x+z$ takes the place of $y$ in (1.11). We denote definition (4.4) the implicit nonlocal Hessian, as opposed to the explicit formulation (4.2).

The advantage is that any terms involving $\sigma_x(z)$ are now based only on the two values of $u$ at $x$ and $y = x+z$, and (in particular bounded) domains other than $\mathbb{R}^N$ are naturally dealt with, which is important for a numerical implementation. We also note that this approach allows us to incorporate nonlocal first-order terms as a side effect, and that it can be naturally extended to third- and higher-order derivatives, which we leave to further work.
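A direct way to compute the implicit quantities in (4.4) at a single point is to discretize the integral over sampled offsets $z$ and solve the resulting weighted linear least-squares problem; in 2D the unknowns are the two entries of $G_u$ and the three distinct entries of the symmetric $H_u$. The sketch below is illustrative only: the sampling, the weights playing the role of $\sigma_x$, and all names are our own choices, not the paper's implementation.

```python
import numpy as np

# Weighted least-squares fit of the quadratic model (4.4) at a point x (2D):
#   u(x + z) - u(x) ≈ G^T z + (1/2) z^T H z,  with H symmetric.
def implicit_gradient_hessian(u, x, offsets, weights):
    rows, rhs = [], []
    for z, w in zip(offsets, weights):
        sw = np.sqrt(w)
        z1, z2 = z
        # unknowns: [g1, g2, h11, h12, h22]; note the 1/2 in (1/2) z^T H z
        rows.append(sw * np.array([z1, z2, 0.5*z1*z1, z1*z2, 0.5*z2*z2]))
        rhs.append(sw * (u(x + z) - u(x)))
    p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    G = p[:2]
    H = np.array([[p[2], p[3]], [p[3], p[4]]])
    return G, H

# Sanity check on an exact quadratic u(y) = y1^2 + 3 y1 y2:
# grad u = (2 y1 + 3 y2, 3 y1), Hessian = [[2, 3], [3, 0]].
u = lambda y: y[0]**2 + 3.0*y[0]*y[1]
rng = np.random.default_rng(0)
offsets = rng.normal(size=(50, 2))
G, H = implicit_gradient_hessian(u, np.array([1.0, -0.5]), offsets, np.ones(50))
print(G)   # ≈ [0.5, 3.0]
print(H)   # ≈ [[2, 3], [3, 0]]
```

Because the quadratic model fits this test function exactly, the recovered $G_u$ and $H_u$ coincide with the true gradient and Hessian regardless of the choice of weights.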

With respect to implementation, the implicit model (4.4) does not add much to the overall difficulty: it is enough to add the nonlocal gradient $G_u(x)\in\mathbb{R}^N$ and Hessian $H_u(x)\in\mathbb{R}^{N\times N}$ as additional variables to the problem and couple them to $u$ by adding the

[Figure 1. Illustration of the capability of the proposed nonlocal Hessian regularization to obtain true piecewise affine reconstructions in a denoising example.]
[Figure 2. Adaptive choice of the neighborhood for the image of a disc with constant slope and added Gaussian noise.]
[Figure 3. Classical local regularization. The input consists of a disc-shaped slope with additive Gaussian noise, σ = 0.25.]
[Figure 4. Nonlocal regularization of the problem in Figure 3. The adaptive choice of the neighborhood and weights together with the nonlocal Hessian preserves the jumps, clips the top of the slope, and allows one to perfectly reconstruct the piecewise affine …]
