
Vol. 8, No. 4, pp. 2161–2202

Analysis and Application of a Nonlocal Hessian

Jan Lellmann, Konstantinos Papafitsoros, Carola Schönlieb, and Daniel Spector

Abstract. In this work we introduce a formulation for a nonlocal Hessian that combines the ideas of higher-order and nonlocal regularization for image restoration, extending the idea of nonlocal gradients to higher-order derivatives. By intelligently choosing the weights, the model allows us to improve on the current state of the art higher-order method, total generalized variation, with respect to overall quality and preservation of jumps in the data. In the spirit of recent work by Brezis et al., our formulation also has analytic implications: for a suitable choice of weights it can be shown to converge to classical second-order regularizers, and in fact it allows a novel characterization of higher-order Sobolev and BV spaces.

Key words. nonlocal Hessian, nonlocal total variation regularization, variational methods, fast marching method, amoeba filters

AMS subject classifications. 65D18, 68U10, 94A08, 35A15, 49J40, 49Q20, 26B30, 26B35, 46E35

DOI. 10.1137/140993818

1. Introduction and context. The total variation model of image restoration due to Rudin, Osher, and Fatemi [ROF92] is now classical: the problem of being given a noisy image $g \in L^2(\Omega)$ on an open set $\Omega \subseteq \mathbb{R}^2$ and selecting a restored image via minimization of the energy

$$E(u) := \int_\Omega (u - g)^2 \, dx + \alpha \, \mathrm{TV}(u).$$

Here, $\alpha > 0$ is a regularization parameter at our disposal and $\mathrm{TV}(u) := |Du|(\Omega)$ is the total variation of the measure $Du$ (the distributional derivative of $u$, which has finite total mass when one assumes $u$ is of bounded variation [AFP00]). Among the known defects of the model is the staircasing effect, where affine portions of the image are replaced by flat regions and newly created artificial boundaries, stemming from the use of the TV term in the regularization. It is then natural to investigate the replacement of the total variation with

Received by the editors October 31, 2014; accepted for publication (in revised form) July 10, 2015; published electronically October 6, 2015. This project was supported by King Abdullah University of Science and Technology (KAUST) award KUK-I1-007-43.

http://www.siam.org/journals/siims/8-4/99381.html

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, United Kingdom (j.lellmann@damtp.cam.ac.uk, kp366@cam.ac.uk, cbs31@cam.ac.uk). The first author's research was supported by Leverhulme Early Career Fellowship ECF-2013-436. The second author's research was supported by the Cambridge Centre for Analysis and the EPSRC. The third author's research was partially supported by EPSRC grants EP/J009539/1 and EP/M00483X/1.

Technion, Israel Institute of Technology, Haifa, Israel, and Department of Applied Mathematics, National Chiao Tung University, Hsinchu 30010, Taiwan (dspector@math.nctu.edu.tw). This author’s research was supported in part by a Technion Fellowship and by Taiwan Ministry of Science and Technology research grant 103-2115-M-009-016-MY2.


another regularizer, for instance a higher-order term (see [Sch98, LLT03, LT06, HS06, CEP07, PS14] for the bounded Hessian framework, [CL97, BKP10, SST11] for infimal convolution and generalizations, and [LBL13] for anisotropic variants) or a nonlocal term (see, for example, the work of Buades, Coll, and Morel [BCM05], Kindermann, Osher, and Jones [KOJ05], and Gilboa and Osher [GO08]). In this work, we introduce and analyze a regularizer that is both higher-order and nonlocal, a nonlocal Hessian, and utilize it in a model for image restoration. Our numerical experiments demonstrate that using this regularization with a suitable choice of weights enables us to derive specialized models that compete with current state-of-the-art higher-order methods such as total generalized variation [BKP10]. Meanwhile, our analysis justifies the nomenclature nonlocal Hessian through its connection with recent work on nonlocal gradients [MS15]. In particular, we perform a rigorous localization analysis which parallels the first-order case.
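For concreteness, the ROF energy above can be evaluated directly on a discrete image. The following is a minimal sketch (not from the paper): the isotropic forward-difference discretization of TV is one common choice, and the step image is purely illustrative.

```python
import numpy as np

def tv(u):
    """Isotropic total variation with forward differences, Neumann boundary."""
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical differences
    return np.sum(np.sqrt(dx**2 + dy**2))

def rof_energy(u, g, alpha):
    """Discrete ROF energy E(u) = ||u - g||_2^2 + alpha * TV(u)."""
    return np.sum((u - g) ** 2) + alpha * tv(u)

rng = np.random.default_rng(0)
g = np.zeros((32, 32))
g[:, 16:] = 1.0                                   # clean step image
noisy = g + 0.1 * rng.standard_normal(g.shape)    # noisy observation

# the clean step has lower energy than the noisy image itself
e_clean = rof_energy(g, noisy, 0.1)
e_noisy = rof_energy(noisy, noisy, 0.1)
```

The comparison `e_clean < e_noisy` illustrates why minimizing $E$ removes noise: the quadratic fidelity term is small for the clean image while the TV term heavily penalizes the oscillations of the noise.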

Background on higher-order regularization. The use of nonsmooth regularization terms such as the total variation in image reconstruction results in a nonlinear smoothing of reconstructed images. As a consequence, one observes a greater degree of smoothing in homogeneous areas of the image domain while preserving characteristic structures such as edges. In particular, total variation regularization performs well if the reconstructed image is piecewise constant. The drawback of such a regularization procedure becomes apparent as soon as images or signals (in one dimension) are considered which not only consist of flat regions and jumps but also possess slanted regions, i.e., piecewise linear parts. The artifact introduced by total variation regularization in this case is called staircasing. One possible approach to improve total variation minimization is the introduction of higher-order derivatives in the regularizer, whose literature we now briefly review.

In [CL97] Chambolle and Lions propose a higher-order method by means of an infimal convolution of two convex regularizers. Here, a noisy image is decomposed into three parts $g = u_1 + u_2 + n$ by solving

$$\min_{(u_1,u_2)} \frac{1}{2}\int_\Omega (u_1 + u_2 - g)^2 \, dx + \alpha \, \mathrm{TV}(u_1) + \beta \, \mathrm{TV}^2(u_2), \tag{1.1}$$

where $\mathrm{TV}^2(u_2) := |D^2 u_2|(\Omega)$ is the total variation of the distributional Hessian of $u_2$. Then, $u_1$ and $u_2$ are the piecewise constant and the piecewise affine parts of $g$, respectively, and $n$ is the noise (or texture). For recent modifications of this approach in the discrete setting, see also [SS08, SST11]. Other approaches combining first- and second-order regularization originate, for instance, from Chan, Marquina, and Mulet [CMM01], who consider total variation minimization together with weighted versions of the Laplacian; the Euler elastica functional [MM98, CKS02], which combines total variation regularization with curvature penalization; and many more [LT06, LTC13, PS14, PSS13, Ber14]. Recently, Bredies, Kunisch, and Pock have proposed another interesting higher-order total variation model called total generalized variation (TGV) [BKP10]. The TGV regularizer of order $k$ is of the form

$$\mathrm{TGV}_\alpha^k(u) = \sup\left\{ \int_\Omega u \, \mathrm{div}^k \xi \, dx \;:\; \xi \in C_c^k(\Omega, \mathrm{Sym}^k(\mathbb{R}^N)), \ \|\mathrm{div}^l \xi\|_\infty \le \alpha_l, \ l = 0, \ldots, k-1 \right\}, \tag{1.2}$$

where $\mathrm{Sym}^k(\mathbb{R}^N)$ denotes the space of symmetric tensors of order $k$ with arguments in $\mathbb{R}^N$, and the $\alpha_l$ are fixed positive parameters. For the case $k = 2$, its formulation for the solution of general inverse problems was given in [BV11].

The idea of pure bounded Hessian regularization is considered by Lysaker, Lundervold, and Tai [LLT03], Scherzer [Sch98], Hinterberger and Scherzer [HS06], Lefkimmiatis, Bourquard, and Unser [LBU12], and Bergounioux and Piffet [BP10]. In these works the considered model has the general form

$$\min_u \frac{1}{2}\int_\Omega (u - g)^2 \, dx + \alpha |D^2 u|(\Omega).$$

In [CEP07], Chan, Esedoglu, and Park use the squared $L^2$ norm of the Laplacian as a regularizer, also in combination with the $H^{-1}$ norm in the data fitting term. Further, in [PS08] minimizers of functionals which are regularized by the total variation of the $(l-1)$st derivative, i.e.,

$$|D\nabla^{l-1} u|(\Omega),$$

are studied. Properties of such regularizers in terms of diffusion filters are further studied in [DWB09]. Therein, the authors consider the Euler–Lagrange equations corresponding to minimizers of functionals of the general type

$$J(u) = \int_\Omega (u - g)^2 \, dx + \alpha \int_\Omega f\Big( \sum_{|k|=p} |D^k u|^2 \Big) \, dx$$

for different nonquadratic functions $f$. There are also works on higher-order PDE methods for image regularization; see, e.g., [CS01, LLT03, BG04, BEG08, BHS09].

As confirmed by all of these works on higher-order total variation regularization, the introduction of higher-order derivatives can have a positive effect on artifacts like staircasing inherent to total variation [Rin00].

Higher-order nonlocal regularization. One possible approach to a higher-order extension of nonlocal regularization has been proposed recently in [RBP14], with optical flow being the main application. The authors start with the cascading formulation of (second-order) TGV,

$$\mathrm{TGV}(u) = \inf_{w:\Omega\to\mathbb{R}^N} \alpha_1 \int_\Omega |Du - w| + \alpha_0 \int_\Omega |Dw|,$$

which reduces the higher-order differential operators that appear in the definition of TGV to a special type of infimal convolution of two terms involving only first-order derivatives [BV11]. These can then be replaced by classical first-order nonlocal derivatives, and one obtains an energy of the form

$$\inf_{w:\Omega\to\mathbb{R}^N} \int_\Omega \int_\Omega \alpha_1(x,y)\, |u(x) - u(y) - w(x) \cdot (x-y)| \, dy \, dx + \sum_{i=1}^{2} \int_\Omega \int_\Omega \alpha_0(x,y)\, |w_i(x) - w_i(y)| \, dy \, dx.$$

This formulation takes into account the higher-order differential information via the second term in the minimization, and the weighting parameters $\alpha_0$ and $\alpha_1$ are now spatially dependent. Even though this approach can be adapted for other imaging tasks, e.g., denoising, it is not clear how to choose these weighting functions.


In this paper we define a different type of higher-order nonlocal regularizer, providing as well a rule for choosing the corresponding weighting functions for optimal results. Before we proceed we recall some basic facts about nonlocal gradients.

Background on nonlocal gradients. In the first-order setting, the analysis of nonlocal gradients and their associated energies finds its origins in the 2001 paper of Bourgain, Brezis, and Mironescu [BBM01]. In their paper, Bourgain, Brezis, and Mironescu introduce energies of the form

$$F_n u := \int_\Omega \left( \int_\Omega \frac{|u(x) - u(y)|^{pq}}{|x - y|^{pq}} \, \rho_n(x - y) \, dx \right)^{\frac{1}{q}} dy, \tag{1.3}$$

where $\Omega$ is a smooth bounded domain in $\mathbb{R}^N$ and $1 \le p < \infty$, and in the special case $q = 1$. Here, the functions $\rho_n$ are radial mollifiers that are assumed to satisfy the following three properties for all $n \in \mathbb{N}$:

$$\rho_n(x) \ge 0, \tag{1.4}$$
$$\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1, \tag{1.5}$$
$$\lim_{n\to\infty} \int_{|x|>\gamma} \rho_n(x) \, dx = 0 \quad \text{for all } \gamma > 0. \tag{1.6}$$

An example of such a family of mollifiers are the standard Gaussian kernels that converge to a Dirac δ as n tends to infinity. Let us here remark that a perhaps more appropriate terminology for these functionals in image processing is semilocal, since asymptotically there is no possibility of nonlocality, in contrast to the genuine nonlocality allowed in image processing. The work [BBM01] connects the finiteness of the limit as n → ∞ of the functional (1.3) with the inclusion of a function u ∈ Lp(Ω) in the Sobolev space W1,p(Ω) if p > 1 or BV(Ω) if p = 1. As in the beginning of the introduction, the space BV(Ω) refers to the space of functions of bounded variation, and it is no coincidence that the two papers [BBM01,ROF92] utilize this energy space. Indeed, Gilboa and Osher [GO08] in 2008 independently introduce an energy similar to (1.3), terming it a nonlocal total variation, while the connection of the two and the introduction of the parameter q is due to Leoni and Spector [LS11]. In particular, they show in [LS14] that for p = 1 the functionals (1.3) Γ-converge to a constant times the total variation. This result extends previous work by Ponce [Pon04b] in the case q = 1 (see also the work of Aubert and Kornprobst [AK09] for an application of these results to image processing).
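As a quick numerical sanity check (not part of the paper), one can verify that such narrowing Gaussian kernels satisfy (1.4)–(1.6) in dimension $N = 1$: each kernel is nonnegative, keeps unit mass, and its mass outside any fixed ball vanishes as $n$ grows. The Gaussian tail is evaluated in closed form via the complementary error function.

```python
import math
import numpy as np

def rho(n, x):
    """Scaled Gaussian mollifier rho_n(x) = n * phi(n * x), phi standard normal."""
    return n * np.exp(-(n * x) ** 2 / 2) / math.sqrt(2 * math.pi)

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

# (1.5): unit mass, checked by a Riemann sum for n = 1, 2, 4, 8
unit_masses = [float((rho(n, x) * dx).sum()) for n in (1, 2, 4, 8)]

# (1.6): mass outside |x| > gamma = 0.5, in closed form: erfc(n * gamma / sqrt(2))
tails = [math.erfc(n * 0.5 / math.sqrt(2)) for n in (1, 2, 4, 8)]
```

The masses stay at 1 while the tail masses decrease monotonically toward 0, exactly the concentration-to-a-Dirac-mass behavior described above.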

Gilboa and Osher [GO08] in fact introduced two forms of nonlocal total variations, and for our purposes here it will be useful to consider the second. This alternative involves introducing a nonlocal gradient operator, defined by

$$\mathcal{G}_n u(x) := N \int_\Omega \frac{u(x) - u(y)}{|x - y|} \, \frac{x - y}{|x - y|} \, \rho_n(x - y) \, dy, \quad x \in \Omega, \tag{1.7}$$

for $u \in C_c^1(\Omega)$. Then, one defines the nonlocal total variation as the $L^1$ norm of (1.7). The localization analysis of the nonlocal gradient (1.7) has been performed by Mengesha and Spector in [MS15], where a more general (and technical) distributional definition is utilized. Their first observation is that the definition of the nonlocal gradient via the Lebesgue integral (1.7) extends to spaces of weakly differentiable functions. In this regime they discuss the localization of (1.7). They prove that the nonlocal gradient converges to its local analogue $\nabla u$ in a topology that corresponds to the regularity of the underlying function $u$. As a result, they obtain yet another characterization of the spaces $W^{1,p}(\Omega)$ and $\mathrm{BV}(\Omega)$. Of notable interest for image processing purposes is their result on the Γ-convergence of the corresponding nonlocal total variation energies defined via nonlocal gradients of the form (1.7) to the local total variation.
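The localization just described can be observed numerically. For an affine $u$ with gradient $a$, the substitution $z = x - y$ turns (1.7) into $N \, \mathbb{E}[(a \cdot \nu)\nu] = a$ over radial unit-mass weights, so the nonlocal gradient reproduces $\nabla u$ exactly. The Monte Carlo sketch below (with a Gaussian $\rho_n$ and an illustrative test function, not from the paper) checks this in $N = 2$.

```python
import numpy as np

N = 2
a = np.array([1.5, -0.75])           # gradient of the affine test function

def u(P):
    """Affine test function u(p) = a . p + 2, evaluated on a batch of points."""
    return P @ a + 2.0

def nonlocal_gradient(x, eps=0.25, samples=1_000_000, seed=5):
    """Monte Carlo evaluation of (1.7): with z = x - y the integral becomes
    N * E[(u(x) - u(x - z)) z / |z|^2] over z ~ rho_n (here Gaussian, width eps)."""
    rng = np.random.default_rng(seed)
    Z = eps * rng.standard_normal((samples, N))
    r2 = np.sum(Z ** 2, axis=1)
    diff = u(x) - u(x - Z)           # u(x) - u(y) with y = x - z
    return N * np.mean((diff / r2)[:, None] * Z, axis=0)

G = nonlocal_gradient(np.array([[0.3, -1.0]]))   # should be close to a
```

Up to Monte Carlo error, `G` recovers the gradient `a` at every point, independently of the width `eps`, matching the exactness of the identity for affine functions.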

One way to extend the results of Mengesha and Spector to the higher-order case is to simply study the functional that results after substituting $u$ with $\nabla u$ in (1.7). Then a nonlocal Hessian could be defined via

$$\mathcal{G}_n(\nabla u)(x) = N \int_\Omega \frac{\nabla u(x) - \nabla u(y)}{|x - y|} \otimes \frac{x - y}{|x - y|} \, \rho_n(x - y) \, dy, \tag{1.8}$$

where $\otimes$ denotes the standard tensor product of vectors. While one can obtain some straightforward characterizations of $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$ in this way, we find it advantageous to utilize a nonlocal Hessian that is derivative-free and therefore pursue an alternative approach.

A nonlocal Hessian tuned for imaging tasks. We define an implicit nonlocal gradient $\mathcal{G}_u(x) \in \mathbb{R}^N$ and Hessian $\mathcal{H}_u(x) \in \mathrm{Sym}(\mathbb{R}^{N\times N})$ that best explain $u$ around $x$ in terms of a quadratic model:

$$(\mathcal{G}_u(x), \mathcal{H}_u(x)) = \operatorname*{argmin}_{G_u \in \mathbb{R}^N, \, H_u \in \mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2} \int_{\Omega - \{x\}} \left( u(x+z) - u(x) - G_u \cdot z - \frac{1}{2} z^\top H_u z \right)^2 \sigma_x(z) \, dz, \tag{1.9}$$

where $\Omega - \{x\} = \{y - x : y \in \Omega\}$ and $\sigma_x$ is an appropriate weight function for each $x \in \Omega$.

Such a definition has the advantage of the freedom to choose the weights σx as one sees fit.

Of primary interest to our work are two questions: How does the nonlocal Hessian perform in comparison to the known state of the art methods? And in what way is it connected to the classical Hessian?

To answer the first question, the model depends on the choice of weights, and of practical relevance is the question of how to choose them for a particular purpose. The first point to mention in this regard is that, as the objectives of the minimization problems (1.9) are quadratic, their solutions can be characterized by linear optimality conditions. Thus functionals based on the implicit nonlocal derivatives can be easily included in usual convex solvers by adding these conditions. Moreover, the weights $\sigma_x(z)$ between any pair of points $x$ and $y = x + z$ can be chosen arbitrarily, without any restrictions on symmetry. In particular, in this work we develop a method of choosing weights to construct a regularizer that favors piecewise affine functions while allowing for jumps in the data. Our motivation stems from the recent discussion of "amoeba" filters in [LDM07, WBV11, Wel12], which combine standard filters such as median filters with nonparametric structuring elements that are based on the data; that is, in long thin objects they would extend along the structure and thus prevent smoothing perpendicular to the structure. In amoeba filtering, the shape of a structuring element at a point is defined as a unit circle with respect to the geodesic distance on a manifold defined by the image itself. In a similar manner, we utilize the geodesic distance to set the weights $\sigma_x$. This allows us to get a very close approximation to true piecewise affine regularization, in many cases improving on the results obtained using TGV; see Figure 1 for a proof of concept. We present several experiments in section 4.3 that show the performance of this choice against the state of the art.

Figure 1. Illustration of the capability of the proposed nonlocal Hessian regularization to obtain true piecewise affine reconstructions in a denoising example (panels: input, TGV, nonlocal Hessian).
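In discrete form, the quadratic objective of (1.9) at a fixed point $x$ is a weighted linear least-squares problem in the entries of $G_u$ and $H_u$, which is how the linear optimality conditions arise. The sketch below illustrates this at a single point; the Gaussian weights and the test function are illustrative choices, not the geodesic weights developed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def u(p):
    # quadratic test function; gradient (2, -1) and Hessian [[3, 1.25], [1.25, -2]] at 0
    return (1.0 + 2.0 * p[..., 0] - p[..., 1]
            + 1.5 * p[..., 0] ** 2 + 1.25 * p[..., 0] * p[..., 1] - p[..., 1] ** 2)

x0 = np.zeros(2)
Z = rng.uniform(-0.5, 0.5, size=(4000, 2))           # sampled offsets z
w = np.exp(-np.sum(Z**2, axis=1) / (2 * 0.2**2))     # illustrative weights sigma_x(z)

# model u(x+z) - u(x) ~ G.z + 0.5 z^T H z with H symmetric:
# unknowns (G1, G2, H11, H12, H22), features (z1, z2, z1^2/2, z1*z2, z2^2/2)
A = np.column_stack([Z[:, 0], Z[:, 1],
                     0.5 * Z[:, 0] ** 2, Z[:, 0] * Z[:, 1], 0.5 * Z[:, 1] ** 2])
b = u(x0 + Z) - u(x0)
sqw = np.sqrt(w)                                      # weighted least squares
coef, *_ = np.linalg.lstsq(A * sqw[:, None], b * sqw, rcond=None)
G = coef[:2]                                          # implicit nonlocal gradient
H = np.array([[coef[2], coef[3]], [coef[3], coef[4]]])  # implicit nonlocal Hessian
```

Because the test function is itself quadratic, the fit recovers the gradient and Hessian exactly; for general images, the same linear system is what gets appended to a convex solver as the optimality conditions of (1.9).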

As to the second question, in the general form of (1.9) the problem is considerably harder to treat analytically, and so we will restrict ourselves to the special case of radial weights. In particular, assuming some mild regularity of $u$ and considering the problem (1.9) with weights $\rho_n(z)/|z|^4$,

$$(\mathcal{G}_u(x), \mathcal{H}_u(x)) = \operatorname*{argmin}_{G_u \in \mathbb{R}^N, \, H_u \in \mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2} \int_{\mathbb{R}^N} \left( u(x+z) - u(x) - G_u \cdot z - \frac{1}{2} z^\top H_u z \right)^2 \frac{\rho_n(z)}{|z|^4} \, dz, \tag{1.10}$$

we will show in Theorem 4.1 that $\mathcal{H}_u$ agrees with the following natural explicit definition of the nonlocal Hessian.

Definition 1.1. Suppose $u \in C_c^2(\mathbb{R}^N)$. Then we define the explicit nonlocal Hessian as the Lebesgue integral

$$H_n u(x) := \frac{N(N+2)}{2} \int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2} \, \frac{z \otimes z - \frac{|z|^2}{N+2} I_N}{|z|^2} \, \rho_n(z) \, dz, \quad x \in \mathbb{R}^N, \tag{1.11}$$

where $I_N$ is the $N \times N$ identity matrix and $\rho_n$ is a sequence satisfying (1.4)–(1.6).

We note here that the presence of the constant $N(N+2)/2$ as well as the term $z \otimes z - \frac{|z|^2}{N+2} I_N$ ensures that (1.11) has the right localization properties; see section 3 for more details. The assertion of Theorem 4.1 is that with the preceding choice of weights one has the equivalence

$$\mathcal{H}_u(x) \equiv \frac{N(N+2)}{2} \int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2} \, \frac{z \otimes z - \frac{|z|^2}{N+2} I_N}{|z|^2} \, \rho_n(z) \, dz.$$
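Definition 1.1 can be checked numerically: for a quadratic $u$, the second differences satisfy $u(x+z) - 2u(x) + u(x-z) = z^\top A z$, and the moment identities behind (1.11) make $H_n u$ reproduce the classical Hessian $A$ for any radial unit-mass $\rho_n$, in line with the localization results below. The Monte Carlo sketch here (Gaussian $\rho_n$, illustrative test function) is not the paper's code.

```python
import numpy as np

N = 2
A = np.array([[2.0, 0.5], [0.5, -1.0]])   # Hessian of the quadratic test function

def u(P):
    """Quadratic test function on a batch P of shape (m, N); D^2 u = A everywhere."""
    return 0.5 * np.einsum('mi,ij,mj->m', P, A, P) + 3.0 * P[:, 0]

def nonlocal_hessian(x, eps=0.3, samples=1_000_000, seed=3):
    """Monte Carlo evaluation of (1.11), with rho_n a Gaussian of width eps."""
    rng = np.random.default_rng(seed)
    Z = eps * rng.standard_normal((samples, N))          # z ~ rho_n
    r2 = np.sum(Z ** 2, axis=1)
    d2 = u(x + Z) - 2.0 * u(x) + u(x - Z)                # second differences
    # kernel (z (x) z - |z|^2/(N+2) I) / |z|^2, per sample
    K = Z[:, :, None] * Z[:, None, :] / r2[:, None, None] - np.eye(N) / (N + 2)
    return 0.5 * N * (N + 2) * np.mean((d2 / r2)[:, None, None] * K, axis=0)

Hn = nonlocal_hessian(np.array([[0.7, -0.2]]))           # should be close to A
```

The estimate is exactly symmetric by construction and matches $A$ up to Monte Carlo error, for any width `eps`; for nonquadratic $u$, agreement with $\nabla^2 u$ instead emerges in the limit of concentrating weights (Theorem 1.4).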


Results concerning the explicit nonlocal Hessian. From the standpoint of analysis, the explicit version of the nonlocal Hessian (1.11) is the more natural one, and for it we are able to prove a number of results analogous to the first-order case studied by Mengesha and Spector [MS15]. Let us first extend the definition to functions which are not necessarily smooth and compactly supported, as is typical for operators acting on spaces of weakly differentiable functions.

Definition 1.2. Suppose $u \in L^p(\mathbb{R}^N)$. Then we define the distributional nonlocal Hessian componentwise as

$$\langle \mathcal{H}^{ij}_n u, \varphi \rangle := \int_{\mathbb{R}^N} u \, H^{ij}_n \varphi \, dx \tag{1.12}$$

for $\varphi \in C_c^\infty(\mathbb{R}^N)$, where $H^{ij}_n \varphi$ denotes the $(i,j)$th element of the nonlocal Hessian matrix (1.11).

A natural question is then whether these two notions agree. The following theorem shows that this is the case, provided the Lebesgue integral exists.

Theorem 1.3 (nonlocal integration by parts). Suppose that $u \in L^p(\mathbb{R}^N)$ for some $1 \le p < +\infty$ and that $\frac{|u(x+z) - 2u(x) + u(x-z)|^q}{|z|^{2q}} \rho_n(z) \in L^1(\mathbb{R}^N \times \mathbb{R}^N)$ for some $1 \le q \le +\infty$. Then the distribution $\mathcal{H}_n u$ can be represented by the function $H_n u$; i.e., for any $\varphi \in C_c^2(\mathbb{R}^N)$ and $i, j = 1, \ldots, N$,

$$\langle \mathcal{H}^{ij}_n u, \varphi \rangle = \int_{\mathbb{R}^N} H^{ij}_n u(x) \, \varphi(x) \, dx, \tag{1.13}$$

and also $H_n u \in L^1(\mathbb{R}^N, \mathbb{R}^{N \times N})$.

We will see in section 3, in Lemmas 3.1 and 3.4, that the Lebesgue integral even makes sense for $u \in W^{2,p}(\mathbb{R}^N)$ or $\mathrm{BV}^2(\mathbb{R}^N)$, and therefore the distributional definition $\mathcal{H}_n u$ coincides with the Lebesgue integral for these functions.

The main analysis we undertake in this paper then comprises the following results, proving localization in various topologies together with characterizations of higher-order spaces of weakly differentiable functions. Our first result is the following theorem concerning localization in the smooth case.

Theorem 1.4. Suppose that $u \in C_c^2(\mathbb{R}^N)$. Then for any $1 \le p \le +\infty$,

$$H_n u \to \nabla^2 u \quad \text{in } L^p(\mathbb{R}^N, \mathbb{R}^{N \times N}) \text{ as } n \to \infty.$$

When less smoothness is assumed on u, we have analogous convergence theorems, where the topology of convergence depends on the smoothness of u. When u ∈ W2,p(RN), we have the following.

Theorem 1.5. Let $1 \le p < \infty$. Then for every $u \in W^{2,p}(\mathbb{R}^N)$ we have that

$$H_n u \to \nabla^2 u \quad \text{in } L^p(\mathbb{R}^N, \mathbb{R}^{N \times N}) \text{ as } n \to \infty.$$

In the setting of BV2(RN) (see section 2 for a definition), we have the following theorem on the localization of the nonlocal Hessian.

Theorem 1.6. Let $u \in \mathrm{BV}^2(\mathbb{R}^N)$, and let $\mu_n := H_n u \, \mathcal{L}^N$ be the corresponding sequence of $\mathbb{R}^{N \times N}$-valued measures. Then

$$\mu_n \to D^2 u \quad \text{weakly}^* \text{ in the space of Radon measures;}$$


i.e., for every $\varphi \in C_0(\mathbb{R}^N, \mathbb{R}^{N \times N})$,

$$\lim_{n\to\infty} \int_{\mathbb{R}^N} H_n u(x) \cdot \varphi(x) \, dx = \int_{\mathbb{R}^N} \varphi(x) \cdot dD^2 u. \tag{1.14}$$

We have seen that the nonlocal Hessian is well defined as a Lebesgue integral and localizes for spaces of weakly differentiable functions. In fact, it is sufficient to assume that $u \in L^p(\mathbb{R}^N)$ is a function such that the distributions $\mathcal{H}_n u$ are in $L^p(\mathbb{R}^N, \mathbb{R}^{N \times N})$ with a uniform bound on their $L^p$ norms in order to deduce that $u \in W^{2,p}(\mathbb{R}^N)$ if $1 < p < +\infty$ or $u \in \mathrm{BV}^2(\mathbb{R}^N)$ if $p = 1$. Precisely, we have the following theorems characterizing the second-order Sobolev and BV spaces.

Theorem 1.7. Let $u \in L^p(\mathbb{R}^N)$ for some $1 < p < \infty$. Then

$$u \in W^{2,p}(\mathbb{R}^N) \iff \liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|^p \, dx < \infty. \tag{1.15}$$

Now let $u \in L^1(\mathbb{R}^N)$. Then

$$u \in \mathrm{BV}^2(\mathbb{R}^N) \iff \liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)| \, dx < \infty. \tag{1.16}$$

Note that when we write $\int_{\mathbb{R}^N} |H_n u(x)|^p \, dx$ we mean that the distribution $\mathcal{H}_n u$ is representable by an $L^p$ function.

Finally, let us mention an important localization result from the perspective of variational image processing, the following theorem asserting the Γ-convergence [DM93, Bra02] of the nonlocal Hessian energies to the energy of the Hessian.

Theorem 1.8. Let $u \in L^1(\mathbb{R}^N)$. Then

$$\Gamma_{L^1(\mathbb{R}^N)}\text{-}\lim_{n\to\infty} \int_{\mathbb{R}^N} |H_n u| \, dx = |D^2 u|(\mathbb{R}^N),$$

where the Γ-limit is taken with respect to the strong convergence $u_n \to u$ in $L^1(\mathbb{R}^N)$.

The relevance of this theorem in the context of variational problems comes from the fact that Γ-convergence of the objective functions of a sequence of minimization problems, combined with an equicoercivity assumption, implies convergence of the minimizers in a suitable topology [Bra02, Chap. 1.5]. Assuming equicoercivity, Theorem 1.8 then guarantees that, under a suitable choice of weights, the solutions of a class of nonlocal Hessian-based problems converge to the solution of the local Hessian-regularized problem, and thus our notion of "nonlocal Hessian" is justified. Note that because Theorem 4.1 connects the implicit and explicit definitions of the nonlocal Hessian, these results equivalently say that for radial weights that concentrate to a Dirac mass our nonlocal Hessian energies concentrate to the bounded Hessian framework.

Organization of the paper. The paper is organized as follows: In section 2 we recall some preliminary notions and fix our notation. Section 3 deals with the analysis of the nonlocal Hessian functional (1.11). After a justification of the introduction of its distributional form, we proceed in section 3.1 with the localization of (1.11) to the classical Hessian for smooth functions $u$. The localization of (1.11) to its classical analogue for $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$ functions is shown in sections 3.2 and 3.3, respectively. In section 3.4 we provide the nonlocal characterizations of the spaces $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$ in the spirit of [BBM01]. The Γ-convergence result, Theorem 1.8, is proved in section 3.5. The introduction of the implicit formulation of the nonlocal Hessian (1.9), along with its connection to the explicit one, is presented in section 4.1. In section 4.2 we describe how we choose the weights $\sigma_x$ in (1.9) in order to achieve jump preservation in the restored images. Finally, in section 4.3 we present our numerical results, comparing our method with TGV.

2. Preliminaries and notation. For the reader's convenience we recall here some important notions that we are going to use in the following sections, and we also fix some notation. As far as our notation is concerned, whenever a function space has two arguments, the first always denotes the domain of the function, while the second denotes its range. Whenever the range is omitted, it is assumed that the functions are real valued. When a function space appears in the subscript of a norm, only the domain is specified for the sake of better readability.

We use $dx$, $dy$, $dz$ for the various integrations with respect to the Lebesgue measure on $\mathbb{R}^N$, while in section 3 we will have occasion to use the more succinct notation $(\mathcal{L}^N)^2$ to denote integration with respect to the Lebesgue measure on the product space $\mathbb{R}^N \times \mathbb{R}^N$.

The reader should not confuse the different forms of the letter "H": $\mathcal{H}$ denotes the one-dimensional Hausdorff measure ($\mathcal{H}^N$ the $N$-dimensional one), $H_n$ denotes the nonlocal Hessian when this is a function, and, as we have already seen, $\mathcal{H}_n$ denotes the distributional form of the nonlocal Hessian.

It is also very convenient to introduce the following notation:

$$d^2 u(x, y) := u(y) - 2u(x) + u(x + (x - y)),$$

which can be interpreted as a discrete second-order differential operator in $x$ in the direction $x - y$.

We denote by | · | the Euclidean norm (vectors) and Frobenius norm (matrices).

As usual, we denote by $\mathrm{BV}(\Omega)$ the space of functions of bounded variation defined on an open $\Omega \subseteq \mathbb{R}^N$. This space consists of all real valued functions $u \in L^1(\Omega)$ whose distributional derivative $Du$ can be represented by a finite Radon measure. The total variation $\mathrm{TV}(u)$ of a function $u \in \mathrm{BV}(\Omega)$ is defined to be the total variation of the measure $Du$, i.e., $\mathrm{TV}(u) := |Du|(\Omega)$. The definition is similar for vector valued functions. We refer the reader to [AFP00] for a full account of the theory of BV functions.

We denote by $\mathrm{BV}^2(\Omega)$ the space of functions of bounded Hessian. These are all the functions belonging to the Sobolev space $W^{1,1}(\Omega)$ such that $\nabla u$ is an $\mathbb{R}^N$-valued BV function, i.e., $\nabla u \in \mathrm{BV}(\Omega, \mathbb{R}^N)$, and we set $D^2 u := D(\nabla u)$. We refer the reader to [Dem85, BP10, PS14] for more information about this space. Let us, however, state a theorem that will be useful for our purposes. It is the analogue of the strict approximation by smooth functions in the classical BV case; see [AFP00].

Theorem 2.1 (BV² strict approximation by smooth functions [Dem85]). Let $\Omega \subseteq \mathbb{R}^N$ be open, and let $u \in \mathrm{BV}^2(\Omega)$. Then there exists a sequence $(u_n)_{n\in\mathbb{N}} \in W^{2,1}(\Omega) \cap C^\infty(\Omega)$ that converges to $u$ strictly in $\mathrm{BV}^2(\Omega)$; that is,

$$u_n \to u \ \text{in } L^1(\Omega) \quad \text{and} \quad |D^2 u_n|(\Omega) \to |D^2 u|(\Omega) \ \text{as } n \to \infty.$$


We recall also the two basic notions of convergence for finite Radon measures; $\mathcal{M}(\Omega, \mathbb{R})$ denotes the space of real valued finite Radon measures on $\Omega$. If $(\mu_n)_{n\in\mathbb{N}}$ and $\mu$ are real valued finite Radon measures defined on an open $\Omega \subseteq \mathbb{R}^N$, we say that the sequence $\mu_n$ converges weakly$^*$ to $\mu$ if for all $\varphi \in C_0(\Omega)$ we have $\int_\Omega \varphi \, d\mu_n \to \int_\Omega \varphi \, d\mu$ as $n$ goes to infinity. Here $\varphi \in C_0(\Omega)$ means that $\varphi$ is continuous on $\Omega$ and that for every $\epsilon > 0$ there exists a compact set $K \subset \Omega$ such that $\sup_{x \in \Omega \setminus K} |\varphi(x)| \le \epsilon$. Note that by the Riesz representation theorem the dual space $(C_0(\Omega, \mathbb{R}), \|\cdot\|_\infty)$ can be identified with $\mathcal{M}(\Omega, \mathbb{R})$. We say that the convergence is strict if in addition we also have that $|\mu_n|(\Omega) \to |\mu|(\Omega)$, i.e., the total variations of $\mu_n$ converge to the total variation of $\mu$. The definition is similar for vector and matrix valued measures, with all operations regarded componentwise.

We now remind the reader of some basic facts concerning Γ-convergence. Let $(X, d)$ be a metric space, and let $F, F_n : X \to \mathbb{R} \cup \{+\infty\}$ for all $n \in \mathbb{N}$. We say that the sequence of functionals $F_n$ Γ-converges to $F$ at $x \in X$ in the topology of $X$, and we write $\Gamma_X\text{-}\lim_{n\to\infty} F_n(x) = F(x)$, if the following two conditions hold:

1. For every sequence $(x_n)_{n\in\mathbb{N}}$ converging to $x$ in $(X, d)$ we have

$$F(x) \le \liminf_{n\to\infty} F_n(x_n).$$

2. There exists a sequence $(x_n)_{n\in\mathbb{N}}$ converging to $x$ in $(X, d)$ such that

$$F(x) \ge \limsup_{n\to\infty} F_n(x_n).$$

It can be proved that $\Gamma_X\text{-}\lim_{n\to\infty} F_n(x) = F(x)$ if the Γ-lower and Γ-upper limits of $F_n$ at $x$, denoted by $\Gamma_X\text{-}\liminf_{n\to\infty} F_n(x)$ and $\Gamma_X\text{-}\limsup_{n\to\infty} F_n(x)$, respectively, are both equal to $F(x)$, where

$$\Gamma_X\text{-}\liminf_{n\to\infty} F_n(x) = \min\left\{ \liminf_{n\to\infty} F_n(x_n) : x_n \to x \text{ in } (X, d) \right\},$$
$$\Gamma_X\text{-}\limsup_{n\to\infty} F_n(x) = \min\left\{ \limsup_{n\to\infty} F_n(x_n) : x_n \to x \text{ in } (X, d) \right\}.$$

Finally, if $F : X \to \mathbb{R} \cup \{+\infty\}$, we denote by $\mathrm{sc}^-_X F$ the lower semicontinuous envelope of $F$, i.e., the greatest lower semicontinuous function majorized by $F$. We refer the reader to [DM93, Bra02] for further details regarding Γ-convergence and lower semicontinuous envelopes.

3. Analysis of the nonlocal Hessian. The precise form we have chosen for the nonlocal Hessian can be derived from the model case of nonlocal gradients, the fractional gradient, which has been developed in [SS14]. Here we prove several results analogous to the first-order case, as in [MS15], for the generalizations involving generic radial weights that satisfy (1.4)–(1.6). Of primary importance is to first establish that the distributional nonlocal Hessian defined by (1.12) is, in fact, a distribution. Here we observe that if $u \in L^1(\mathbb{R}^N)$, then

$$|\langle \mathcal{H}_n u, \varphi \rangle| \le C \, \|u\|_{L^1(\mathbb{R}^N)} \|\nabla^2 \varphi\|_{L^\infty(\mathbb{R}^N)},$$

so that $\mathcal{H}_n u$ is a distribution. Also observe that if $u \in L^p(\mathbb{R}^N)$ for some $1 < p < \infty$, then from the estimate (3.15) below, together with the fact that $\varphi$ is of compact support, we have

$$|\langle \mathcal{H}_n u, \varphi \rangle| \le C \, \|u\|_{L^p(\mathbb{R}^N)} \|\nabla^2 \varphi\|_{L^q(\mathbb{R}^N)} \le C \, \|u\|_{L^p(\mathbb{R}^N)} \|\nabla^2 \varphi\|_{L^\infty(\mathbb{R}^N)},$$


where $1/p + 1/q = 1$, and thus $\mathcal{H}_n u$ is indeed again a distribution. One observes that the definition is in analogy with the theory of Sobolev spaces, where weak derivatives are defined in terms of the integration by parts formula. Because the Hessian is composed of two derivatives, there is no change of sign in the definition, preserving some symmetry that will be useful for us in what follows.

The second important item to address is the agreement of the distributional nonlocal Hessian with the nonlocal Hessian. The necessary and sufficient condition is the existence of the latter, which is the assertion of Theorem 1.3. We now substantiate this assertion.

Proof of Theorem 1.3. Let $1 \le p < +\infty$, and suppose that $u \in L^p(\mathbb{R}^N)$ and $\frac{|u(x+z) - 2u(x) + u(x-z)|^q}{|z|^{2q}} \rho_n(z) \in L^1(\mathbb{R}^N \times \mathbb{R}^N)$ for some $1 \le q \le +\infty$. Let $\varphi \in C_c^2(\mathbb{R}^N)$, and fix $i, j \in \{1, \ldots, N\}$. Then it is a consequence of Fubini's theorem and Lebesgue's dominated convergence theorem that

$$\int_{\mathbb{R}^N} H^{ij}_n u(x) \varphi(x) \, dx = \frac{N(N+2)}{2} \lim_{\epsilon \to 0} \int_{\mathbb{R}^N} \int_{\mathbb{R}^N \setminus B(x,\epsilon)} \frac{d^2 u(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \varphi(x) \, dy \, dx$$
$$= \frac{N(N+2)}{2} \lim_{\epsilon \to 0} \int_{d^N_\epsilon} \frac{d^2 u(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \varphi(x) \, d(\mathcal{L}^N)^2(x,y),$$

where $d^N_\epsilon := \mathbb{R}^N \times \mathbb{R}^N \setminus \{|x - y| < \epsilon\}$. Similarly we have

$$\int_{\mathbb{R}^N} u(x) H^{ij}_n \varphi(x) \, dx = \frac{N(N+2)}{2} \lim_{\epsilon \to 0} \int_{d^N_\epsilon} u(x) \frac{d^2 \varphi(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y),$$

where, for notational convenience, we used the standard convention

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Thus, it suffices to show that for every $i, j$ and $\epsilon > 0$ we have

$$\int_{d^N_\epsilon} \frac{d^2 u(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \varphi(x) \, d(\mathcal{L}^N)^2(x,y) \tag{3.1}$$
$$= \int_{d^N_\epsilon} u(x) \frac{d^2 \varphi(x,y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y).$$


In order to show (3.1), it suffices to prove that

$$\int_{d^N_\epsilon} \frac{u(y) \varphi(x)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y) \tag{3.2}$$
$$= \int_{d^N_\epsilon} \frac{u(x) \varphi(y)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y)$$

and

$$\int_{d^N_\epsilon} \frac{u(x + (x-y)) \varphi(x)}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y) \tag{3.3}$$
$$= \int_{d^N_\epsilon} \frac{u(x) \varphi(x + (x-y))}{|x-y|^2} \, \frac{(x_i - y_i)(x_j - y_j) - \frac{|x-y|^2}{N+2} \delta_{ij}}{|x-y|^2} \, \rho_n(x-y) \, d(\mathcal{L}^N)^2(x,y).$$

Equation (3.2) can be easily shown by interchanging $x$ and $y$ and using the symmetry of the domain. Finally, (3.3) can be proved by employing the substitution $u = 2x - y$, $v = 3x - 2y$, noting that $x - y = v - u$ and that the determinant of the Jacobian of this substitution is $-1$.

Having established that the notions of distributional nonlocal Hessian and nonlocal Hessian agree whenever the latter exists, it is a natural question to ask when this is the case. It is a simple calculation to verify that the Lebesgue integral (1.11) exists whenever $u \in C_c^2(\mathbb{R}^N)$. However, this is also the case for functions in the spaces $W^{2,p}(\mathbb{R}^N)$ and $\mathrm{BV}^2(\mathbb{R}^N)$; see Lemmas 3.1 and 3.4.

3.1. Localization: smooth case. We are now ready to prove the localization of $H_n u$ to $\nabla^2 u$ for smooth functions.

Proof of Theorem 1.4.

Case $1 \le p < +\infty$. Let us assume that we have shown the case $p = +\infty$. Then we must show that the uniform convergence $H_n v \to \nabla^2 v$ for $v \in C_c^2(\mathbb{R}^N)$ implies convergence in $L^p(\mathbb{R}^N, \mathbb{R}^{N \times N})$ for any $1 \le p < +\infty$. We claim that this will follow from the following uniform estimate on the tails of the nonlocal Hessian. Suppose $\operatorname{supp} v \subset B(0, R)$, where $\operatorname{supp} v$ denotes the support of $v$. Then for any $1 \le p < +\infty$ and $\epsilon > 0$ there exists an $L = L(\epsilon, p) \gg 1$ such that

$$\sup_n \int_{B(0,LR)^c} |H_n v(x)|^p \, dx \le \epsilon. \tag{3.4}$$

If this were the case, we would estimate the $L^p$ convergence as follows:

$$\int_{\mathbb{R}^N} |H_n v(x) - \nabla^2 v(x)|^p \, dx = \int_{B(0,LR)} |H_n v(x) - \nabla^2 v(x)|^p \, dx + \int_{B(0,LR)^c} |H_n v(x)|^p \, dx,$$

since $\nabla^2 v$ vanishes outside $B(0,R)$, from which (3.4) implies

$$\limsup_{n\to\infty} \int_{\mathbb{R}^N} |H_n v(x) - \nabla^2 v(x)|^p \, dx \le \limsup_{n\to\infty} \int_{B(0,LR)} |H_n v(x) - \nabla^2 v(x)|^p \, dx + \epsilon.$$


The conclusion then follows, since the first term vanishes by the assumed uniform convergence, after which $\epsilon > 0$ is arbitrary. We will therefore show the estimate (3.4). By Jensen's inequality with respect to the probability measure $\rho_n(x-y)\,dy$ (recall that $\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1$) and the boundedness of the kernel, we have, for a constant $C = C(N,p)$ that may change from line to line,

$$\int_{B(0,LR)^c} |H_n v(x)|^p \, dx \le C \int_{B(0,LR)^c} \int_{\mathbb{R}^N} \frac{|v(y) - 2v(x) + v(x + (x-y))|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx$$
$$\le C \int_{B(0,LR)^c} \int_{y \in B(0,R)} \frac{|v(y)|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx + C \int_{B(0,LR)^c} \int_{x + (x-y) \in B(0,R)} \frac{|v(x + (x-y))|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx,$$

where we used that $v \equiv 0$ on $B(0,LR)^c$. Letting $z = x + (x-y)$ (which means that $x - y = z - x$), we obtain

$$\int_{B(0,LR)^c} \int_{x + (x-y) \in B(0,R)} \frac{|v(x + (x-y))|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx = \int_{B(0,LR)^c} \int_{z \in B(0,R)} \frac{|v(z)|^p}{|z-x|^{2p}} \, \rho_n(z-x) \, dz \, dx,$$

and therefore by the symmetry of $\rho_n$ we have

$$\int_{B(0,LR)^c} |H_n v(x)|^p \, dx \le C \int_{B(0,LR)^c} \int_{y \in B(0,R)} \frac{|v(y)|^p}{|x-y|^{2p}} \, \rho_n(x-y) \, dy \, dx$$
$$\le \frac{C}{(R(L-1))^{2p}} \int_{B(0,LR)^c} \int_{y \in B(0,R)} |v(y)|^p \rho_n(x-y) \, dy \, dx \le \frac{C}{(R(L-1))^{2p}} \, \|\rho_n\|_{L^1(\mathbb{R}^N)} \|v\|^p_{L^p(\mathbb{R}^N)},$$

where we used that $|x - y| \ge (L-1)R$ whenever $x \in B(0,LR)^c$ and $y \in B(0,R)$. Again using $\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1$, the claim, and therefore the case $1 \le p < +\infty$, follows by choosing $L$ sufficiently large.

Case $p = +\infty$. It therefore remains to show that the convergence in $L^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ holds. Precisely, we will show that
\[
H_n u - \nabla^2 u \to 0 \quad\text{uniformly},
\]
for which it suffices to prove the convergence componentwise, i.e., $(H_n u - \nabla^2 u)_{(i_0,j_0)} \to 0$, by considering the two cases $i_0\ne j_0$ and $i_0 = j_0$. Before we proceed, let us mention some useful facts. Observe first that Proposition 5.1 in the appendix and the assumption that $\int_{\mathbb{R}^N}\rho_n(x)\,dx = 1$ for all $n\in\mathbb{N}$ can be used to deduce that
\[
\int_{\mathbb{R}^N} \frac{z_{i_0}^2 z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz
= \int_0^\infty \rho_n(t)\,t^{N-1}\,dt \int_{S^{N-1}} \nu_{i_0}^2\nu_{j_0}^2\,d\mathcal{H}^{N-1}(\nu)
= \frac{1}{N(N+2)}\cdot
\begin{cases}
1, & i_0\ne j_0,\\
3, & i_0 = j_0.
\end{cases} \tag{3.5}
\]

Moreover, utilizing the radial symmetry of $\rho_n$, integrals whose integrand contains an odd power of some coordinate vanish; in particular,
\[
\int_{\mathbb{R}^N} \frac{z_i z_{j_0}^3}{|z|^4}\,\rho_n(z)\,dz = 0 \quad\text{for } i\ne j_0, \tag{3.6}
\]
\[
\int_{\mathbb{R}^N} \frac{z_i z_j z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz = 0 \quad\text{for } i\ne j_0,\ j\ne j_0,\ i\ne j, \tag{3.7}
\]
\[
\int_{\mathbb{R}^N} \frac{z_i z_j z_{i_0} z_{j_0}}{|z|^4}\,\rho_n(z)\,dz = 0 \quad\text{for } i_0\ne j_0 \text{ and } \{i,j\}\ne\{i_0,j_0\}. \tag{3.8}
\]
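Since the moment identities (3.5)–(3.8) rely only on the radial symmetry and unit mass of $\rho_n$, they hold for any radial probability density and can be checked numerically. The following sketch is purely illustrative (it is not part of the paper's method): a standard Gaussian simply plays the role of a radial $\rho_n$, and (3.5) is verified by Monte Carlo sampling.

```python
import numpy as np

# Monte Carlo check of the moment identity (3.5) for a radial density:
#   E[z_i^2 z_j^2 / |z|^4] = 1/(N(N+2)) if i != j, and 3/(N(N+2)) if i == j.
# For a standard Gaussian, z/|z| is uniform on the sphere, and the
# integrand depends only on the direction z/|z|.
rng = np.random.default_rng(1)
N = 3
z = rng.normal(size=(1_000_000, N))
r4 = np.sum(z**2, axis=1)**2
off = np.mean(z[:, 0]**2 * z[:, 1]**2 / r4)   # case i != j
diag = np.mean(z[:, 0]**4 / r4)               # case i == j
print(off, 1.0/(N*(N+2)))    # both ≈ 0.0667
print(diag, 3.0/(N*(N+2)))   # both ≈ 0.2
```

The odd-moment identities (3.6)–(3.8) can be checked the same way; their sample means fluctuate around zero.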

Subcase $i_0\ne j_0$. Using (3.5), for the case $i_0\ne j_0$ we have
\[
\big(H_n u - \nabla^2 u\big)_{(i_0,j_0)}(x)
= \frac{N(N+2)}{2}\left[\int_{\mathbb{R}^N} \frac{d^2u(x,y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy
- 2\,\frac{\partial^2 u}{\partial x_{i_0}\partial x_{j_0}}(x)\int_{\mathbb{R}^N} \frac{z_{i_0}^2 z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz\right].
\]
Moreover, (3.6)–(3.8) imply that
\[
\sum_{i,j=1}^N \frac{\partial^2 u}{\partial x_i\partial x_j}(x)\int_{\mathbb{R}^N} \frac{z_i z_{i_0} z_j z_{j_0}}{|z|^4}\,\rho_n(z)\,dz
= 2\,\frac{\partial^2 u}{\partial x_{i_0}\partial x_{j_0}}(x)\int_{\mathbb{R}^N} \frac{z_{i_0}^2 z_{j_0}^2}{|z|^4}\,\rho_n(z)\,dz.
\]
Thus, introducing these factors of zero and writing things more compactly, we have that
\[
\Big|\big(H_n u - \nabla^2 u\big)_{(i_0,j_0)}(x)\Big|
= \frac{N(N+2)}{2}\left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right|.
\]

We want to show that the right-hand side tends to zero as $n\to\infty$, and we therefore define, for $\delta>0$, the quantity
\[
Q_\delta u(x) = \left|\int_{B(x,\delta)} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right|. \tag{3.9}
\]
We then claim that we can make $Q_\delta u(x)$ as small as we want, independently of $x$ and $n$, by choosing $\delta>0$ sufficiently small. If this is the case, then the case $i_0\ne j_0$ would be completed, since we would then have that
\[
\begin{aligned}
\Big|\big(H_n u - \nabla^2 u\big)_{(i_0,j_0)}(x)\Big|
&\le \frac{N(N+2)}{2}\,Q_\delta u(x)
+ \frac{N(N+2)}{2}\int_{|z|\ge\delta} \frac{|u(x+z)-2u(x)+u(x-z)|}{|z|^2}\,\rho_n(z)\,dz\\
&\quad + \frac{N(N+2)}{2}\,\|\nabla^2 u\|_\infty \int_{|z|\ge\delta} \rho_n(z)\,dz\\
&\le \frac{N(N+2)}{2}\,\varepsilon
+ \frac{N(N+2)}{2}\left(\frac{4\|u\|_\infty}{\delta^2} + \|\nabla^2 u\|_{L^\infty}\right)\int_{|z|\ge\delta}\rho_n(z)\,dz
\ <\ N(N+2)\,\varepsilon
\end{aligned}
\]
for $n$ large enough, and the result follows upon sending $\varepsilon\to 0$.

We therefore proceed to make estimates for (3.9). Since we have assumed $u\in C_c^2(\mathbb{R}^N)$, given $\varepsilon>0$ there is a $\delta>0$ such that for every $i,j = 1,\dots,N$ we have
\[
\left|\frac{\partial^2 u}{\partial x_i\partial x_j}(x) - \frac{\partial^2 u}{\partial x_i\partial x_j}(y)\right| < \varepsilon \quad\text{whenever } |x-y|<\delta.
\]
Using (3.12) we can estimate
\[
Q_\delta u(x) = \left|\int_{B(x,\delta)} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \tag{3.10}
\]
\[
= \left|\int_{B(x,\delta)} \frac{(x-y)^\top\left[\int_0^1\!\!\int_0^1 \nabla^2u\big(x+(s+t-1)(y-x)\big) - \nabla^2u(x)\,ds\,dt\right](x-y)}{|x-y|^2}\,\frac{(x_{i_0}-y_{i_0})(x_{j_0}-y_{j_0})}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \tag{3.11}
\]
\[
\le N\varepsilon\int_{B(x,\delta)} \frac{|x-y|\,|x-y|}{|x-y|^2}\,\frac{|x_{i_0}-y_{i_0}|\,|x_{j_0}-y_{j_0}|}{|x-y|^2}\,\rho_n(x-y)\,dy \ \le\ N\varepsilon.
\]
Here, we have used the mean value theorem for scalar and vector valued functions to write
\[
d^2u(x,y) = (x-y)^\top \int_0^1\!\!\int_0^1 \nabla^2u\big(x+(t+s-1)(y-x)\big)\,ds\,dt\,(x-y), \tag{3.12}
\]

and the fact that $\int_{\mathbb{R}^N}\rho_n(x)\,dx = 1$ for all $n\in\mathbb{N}$. This completes the proof in the case $i_0\ne j_0$.

Subcase $i_0 = j_0$. Let us record several observations before we proceed with this case. In fact, the same argument shows that for a single $i\in\{1,\dots,N\}$,
\[
I_i^n(x) := \left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\frac{(x_i-y_i)^2}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \to 0 \tag{3.13}
\]
uniformly in $x$ as $n\to\infty$, and therefore by summing in $i$ we deduce that
\[
\left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\rho_n(x-y)\,dy\right| \to 0. \tag{3.14}
\]

Moreover, we observe that the same formula from Proposition 5.1 and the cancellation of odd powers imply that
\[
\begin{aligned}
\int_{\mathbb{R}^N} \frac{(x-y)^\top\nabla^2u(x)(x-y)\,(x_{i_0}-y_{i_0})^2}{|x-y|^4}\,\rho_n(x-y)\,dy
&= \sum_{j=1}^N \frac{\partial^2 u}{\partial x_j^2}(x)\int_{\mathbb{R}^N} \frac{z_j^2 z_{i_0}^2}{|z|^4}\,\rho_n(z)\,dz\\
&= \frac{1}{N(N+2)}\,\Delta u(x) + \frac{2}{3}\,\frac{\partial^2 u}{\partial x_{i_0}^2}(x)\int_{\mathbb{R}^N} \frac{z_{i_0}^4}{|z|^4}\,\rho_n(z)\,dz\\
&= \frac{2}{N(N+2)}\left(\frac{1}{2}\Delta u(x) + \frac{\partial^2 u}{\partial x_{i_0}^2}(x)\right),
\end{aligned}
\]
while we also have that
\[
\int_{\mathbb{R}^N} \frac{(x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\rho_n(x-y)\,dy
= \sum_{j=1}^N \frac{\partial^2 u}{\partial x_j^2}(x)\int_{\mathbb{R}^N} \frac{z_j^2}{|z|^2}\,\rho_n(z)\,dz
= \frac{1}{N}\,\Delta u(x).
\]
Thus, we can estimate
\[
\begin{aligned}
\Big|\big(H_n u - \nabla^2 u\big)_{(i_0,i_0)}(x)\Big|
&\le \frac{N(N+2)}{2}\,I_{i_0}^n(x)
+ \left|\frac{N}{2}\int_{\mathbb{R}^N} \frac{d^2u(x,y)}{|x-y|^2}\,\rho_n(x-y)\,dy - \int_{\mathbb{R}^N} \frac{\Delta u(x)}{2}\,\rho_n(x-y)\,dy\right|\\
&= \frac{N(N+2)}{2}\,I_{i_0}^n(x)
+ \frac{N}{2}\left|\int_{\mathbb{R}^N} \frac{d^2u(x,y) - (x-y)^\top\nabla^2u(x)(x-y)}{|x-y|^2}\,\rho_n(x-y)\,dy\right|,
\end{aligned}
\]
and the proof is completed by invoking the convergences established in (3.13) and (3.14).
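The localization just proved can also be observed numerically. The sketch below is illustrative only (function names are ours): it evaluates the explicit formula (1.11) by Monte Carlo, with a standard Gaussian playing the role of the radial weight. For a quadratic $u$ the second difference $d^2u(x,y)$ equals $(y-x)^\top\nabla^2u\,(y-x)$ exactly, so the sample average should reproduce $\nabla^2 u$ at any scale.

```python
import numpy as np

# Monte Carlo evaluation of the nonlocal Hessian (1.11),
#   H_n u(x) = N(N+2)/2 * E_z[ d2u/|z|^2 * (z⊗z - |z|^2/(N+2) I)/|z|^2 ],
# with d2u = u(x+z) - 2u(x) + u(x-z) and z drawn from a radial density
# (here a standard Gaussian; purely illustrative).
def nonlocal_hessian_mc(u, x, Z):
    N = x.size
    d2u = np.array([u(x + z) - 2.0*u(x) + u(x - z) for z in Z])
    r2 = np.sum(Z**2, axis=1)
    K = (Z[:, :, None]*Z[:, None, :] - (r2/(N+2))[:, None, None]*np.eye(N)) \
        / r2[:, None, None]**2
    return N*(N+2)/2.0 * np.mean(d2u[:, None, None] * K, axis=0)

# For the quadratic u(y) = y^T Q y / 2 the Hessian is exactly Q.
Q = np.array([[2.0, 1.0], [1.0, -3.0]])
u = lambda y: 0.5 * y @ Q @ y
rng = np.random.default_rng(2)
H = nonlocal_hessian_mc(u, np.array([0.3, -1.2]), rng.normal(size=(200_000, 2)))
print(H)   # ≈ Q, up to Monte Carlo error
```

For non-quadratic smooth $u$, shrinking the sampling scale of $z$ reproduces the limit of Theorem 1.4.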

3.2. Localization–$W^{2,p}(\mathbb{R}^N)$ case. The objective of this section is to show that if $u\in W^{2,p}(\mathbb{R}^N)$, $1\le p<\infty$, then the nonlocal Hessian $H_n u$ converges to $\nabla^2 u$ in $L^p$. The first step is to show that in that case $H_n u$ is indeed an $L^p$ function. This follows from Lemma 3.1, which we prove next.

Lemma 3.1. Suppose that $u\in W^{2,p}(\mathbb{R}^N)$, where $1\le p<\infty$. Then $H_n u$ is well-defined as a Lebesgue integral, $H_n u\in L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$, and
\[
\int_{\mathbb{R}^N} |H_n u(x)|^p\,dx \le M\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^p, \tag{3.15}
\]
where the constant $M$ depends only on $N$ and $p$.

Proof. Let us begin by making estimates for a function $v\in C^\infty(\mathbb{R}^N)\cap W^{2,p}(\mathbb{R}^N)$. From the definition of the nonlocal Hessian and utilizing Jensen's inequality, (3.12), and Fubini's theorem, we have the following successive estimates (the constant is always denoted by $M(N,p)$):
\[
\begin{aligned}
\int_{\mathbb{R}^N} |H_n v(x)|^p\,dx
&= \left(\frac{N(N+2)}{2}\right)^p \int_{\mathbb{R}^N} \left|\int_{\mathbb{R}^N} \frac{d^2v(x,y)}{|x-y|^2}\,\frac{(x-y)\otimes(x-y) - \frac{|x-y|^2}{N+2}I_N}{|x-y|^2}\,\rho_n(x-y)\,dy\right|^p dx\\
&\le M(N,p) \int_{\mathbb{R}^N} \left(\int_{\mathbb{R}^N} \frac{|d^2v(x,y)|}{|x-y|^2}\,\rho_n(x-y)\,dy\right)^p dx
\end{aligned} \tag{3.16}
\]
\[
\begin{aligned}
&\le M(N,p) \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|d^2v(x,y)|^p}{|x-y|^{2p}}\,\rho_n(x-y)\,dy\,dx\\
&\le M(N,p) \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{\int_0^1 \big|\nabla v(x+t(y-x)) - \nabla v(x+(t-1)(y-x))\big|^p\,dt}{|x-y|^p}\,\rho_n(x-y)\,dy\,dx\\
&\le M(N,p) \int_0^1\!\!\int_0^1 \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \big|\nabla^2 v(x+(t+s-1)(y-x))\big|^p\,\rho_n(x-y)\,dy\,dx\,ds\,dt\\
&= M(N,p) \int_0^1\!\!\int_0^1 \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \big|\nabla^2 v(x+(t+s-1)\xi)\big|^p\,\rho_n(\xi)\,d\xi\,dx\,ds\,dt\\
&= M(N,p) \int_0^1\!\!\int_0^1 \int_{\mathbb{R}^N} \rho_n(\xi)\,\|\nabla^2 v\|_{L^p(\mathbb{R}^N)}^p\,d\xi\,ds\,dt
= M(N,p)\,\|\nabla^2 v\|_{L^p(\mathbb{R}^N)}^p.
\end{aligned} \tag{3.17}
\]

Consider now a sequence $(v_k)_{k\in\mathbb{N}}$ in $C^\infty(\mathbb{R}^N)\cap W^{2,p}(\mathbb{R}^N)$ approximating $u$ in $W^{2,p}(\mathbb{R}^N)$. We already have from the above that
\[
\int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|v_k(x+z) - 2v_k(x) + v_k(x-z)|^p}{|z|^{2p}}\,\rho_n(z)\,dz\,dx \le M\,\|\nabla^2 v_k\|_{L^p(\mathbb{R}^N)}^p \quad\forall k\in\mathbb{N}. \tag{3.18}
\]
Since $v_k$ converges to $u$ in $L^p(\mathbb{R}^N)$, there exists a subsequence $(v_{k_\ell})_{\ell\in\mathbb{N}}$ converging to $u$ almost everywhere.

If we can establish that $H_n u$ is well-defined as a Lebesgue integral, then Jensen's inequality and Fatou's lemma imply that
\[
\begin{aligned}
\int_{\mathbb{R}^N} \left|\int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2}\,\rho_n(z)\,dz\right|^p dx
&\le \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|u(x+z) - 2u(x) + u(x-z)|^p}{|z|^{2p}}\,\rho_n(z)\,dz\,dx\\
&\le \liminf_{\ell\to\infty} \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \frac{|v_{k_\ell}(x+z) - 2v_{k_\ell}(x) + v_{k_\ell}(x-z)|^p}{|z|^{2p}}\,\rho_n(z)\,dz\,dx\\
&\le M \liminf_{\ell\to\infty} \|\nabla^2 v_{k_\ell}\|_{L^p(\mathbb{R}^N)}^p
= M\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^p.
\end{aligned}
\]
This argument, along with Jensen's inequality, allows us to conclude that the conditions of Theorem 1.3 are satisfied, in particular that $H_n u$ is well-defined as a Lebesgue integral, so that the estimate (3.17) holds for $W^{2,p}$ functions as well, thus completing the proof. Finally, the Gagliardo–Nirenberg inequality
\[
\|\nabla^2 u\|_{L^1} \le C\,\|\nabla^2 u\|_{L^p}^\theta\,\|u\|_{L^p}^{1-\theta}
\]
implies that for $u\in W^{2,p}$, $\nabla^2 u\in L^1$, which by the preceding display yields that $H_n u$ is well-defined as a Lebesgue integral.


We now have the necessary tools to prove the localization for W2,p functions.

Proof of Theorem 1.5. The result holds for functions $v\in C_c^2(\mathbb{R}^N)$, since from Theorem 1.4 we have that $H_n v\to\nabla^2 v$ in $L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$. We now use the fact that $C_c^\infty(\mathbb{R}^N)$, and hence $C_c^2(\mathbb{R}^N)$, is dense in $W^{2,p}(\mathbb{R}^N)$; see, for example, [Bre83]. Let $\varepsilon>0$; then by density there exists a function $v\in C_c^2(\mathbb{R}^N)$ such that
\[
\|\nabla^2 u - \nabla^2 v\|_{L^p(\mathbb{R}^N)} \le \varepsilon.
\]
Thus, using also Lemma 3.1, we have
\[
\|H_n u - \nabla^2 u\|_{L^p(\mathbb{R}^N)}
\le \|H_n u - H_n v\|_{L^p(\mathbb{R}^N)} + \|H_n v - \nabla^2 v\|_{L^p(\mathbb{R}^N)} + \|\nabla^2 v - \nabla^2 u\|_{L^p(\mathbb{R}^N)}
\le C\varepsilon + \|H_n v - \nabla^2 v\|_{L^p(\mathbb{R}^N)} + \varepsilon.
\]
Taking limits as $n\to\infty$ we get
\[
\limsup_{n\to\infty} \|H_n u - \nabla^2 u\|_{L^p(\mathbb{R}^N)} \le (C+1)\,\varepsilon,
\]
and thus we conclude that
\[
\lim_{n\to\infty} \|H_n u - \nabla^2 u\|_{L^p(\mathbb{R}^N)} = 0.
\]

3.3. Localization–$BV^2(\mathbb{R}^N)$ case. Analogously to the first-order case in [MS15], we can define a second-order nonlocal divergence that corresponds to $H_n$, and we can also derive a second-order nonlocal integration by parts formula, which is an essential tool for the proofs of this section. The second-order nonlocal divergence is defined for a function $\varphi = (\varphi_{ij})_{i,j=1}^N$ as
\[
D_n^2\varphi(x) = \frac{N(N+2)}{2}\int_{\mathbb{R}^N} \frac{\varphi(y) - 2\varphi(x) + \varphi(x+(x-y))}{|x-y|^2}\cdot\frac{(x-y)\otimes(x-y) - \frac{|x-y|^2}{N+2}I_N}{|x-y|^2}\,\rho_n(x-y)\,dy, \tag{3.19}
\]
where $A\cdot B = \sum_{i,j=1}^N A_{ij}B_{ij}$ for two $N\times N$ matrices $A$ and $B$. Notice that (3.19) is well-defined for $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$.

Theorem 3.2 (second-order nonlocal integration by parts formula). Suppose that $u\in L^1(\mathbb{R}^N)$ and $\frac{|d^2u(x,y)|}{|x-y|^2}\,\rho_n(x-y)\in L^1(\mathbb{R}^N\times\mathbb{R}^N)$, and let $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$. Then
\[
\int_{\mathbb{R}^N} H_n u(x)\cdot\varphi(x)\,dx = \int_{\mathbb{R}^N} u(x)\,D_n^2\varphi(x)\,dx. \tag{3.20}
\]

In fact, this theorem can be deduced as a consequence of Theorem 1.3 through a component-by-component application and a collection of terms. The following lemma shows the convergence of the second-order nonlocal divergence to its continuous analogue $\operatorname{div}^2\varphi$, where $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$ and
\[
\operatorname{div}^2\varphi := \sum_{i,j=1}^N \frac{\partial^2\varphi_{ij}}{\partial x_i\partial x_j}.
\]


Lemma 3.3. Let $\varphi\in C_c^2(\mathbb{R}^N,\mathbb{R}^{N\times N})$. Then for every $1\le p\le\infty$ we have
\[
\lim_{n\to\infty} \|D_n^2\varphi - \operatorname{div}^2\varphi\|_{L^p(\mathbb{R}^N)} = 0. \tag{3.21}
\]
Proof. The proof follows immediately from Theorem 1.4.

The following lemma shows that the nonlocal Hessian (1.11) is well-defined for $u\in BV^2(\mathbb{R}^N)$. It is the analogue of Lemma 3.1, this time for functions in $BV^2(\mathbb{R}^N)$.

Lemma 3.4. Suppose that $u\in BV^2(\mathbb{R}^N)$. Then $H_n u\in L^1(\mathbb{R}^N,\mathbb{R}^{N\times N})$ with
\[
\int_{\mathbb{R}^N} |H_n u(x)|\,dx \le M\,|D^2u|(\mathbb{R}^N), \tag{3.22}
\]
where the constant $M$ depends only on $N$.

Proof. Let $(u_k)_{k\in\mathbb{N}}$ be a sequence of functions in $C^\infty(\mathbb{R}^N)$ that converges to $u$ strictly in $BV^2(\mathbb{R}^N)$. By the same calculations as in the proof of Lemma 3.1 we have, for every $k\in\mathbb{N}$,
\[
\int_{\mathbb{R}^N} |H_n u_k(x)|\,dx \le M(N,1)\,\|\nabla^2 u_k\|_{L^1(\mathbb{R}^N)}.
\]
Using Fatou's lemma in a way similar to how it was used in Lemma 3.1, we can establish that $H_n u$ is well-defined as a Lebesgue integral, along with the estimate
\[
\int_{\mathbb{R}^N} |H_n u(x)|\,dx \le M(N,1)\,\liminf_{k\to\infty} |D^2u_k|(\mathbb{R}^N) = M(N,1)\,|D^2u|(\mathbb{R}^N),
\]
where above we employed the strict convergence of $D^2u_k$ to $D^2u$. Thus the result has been demonstrated.

We can now proceed to prove the localization result for $BV^2$ functions. Recall that we defined $\mu_n$ to be the $\mathbb{R}^{N\times N}$-valued finite Radon measures $\mu_n := H_n u\,\mathcal{L}^N$.

Proof of Theorem 1.6. We first proceed to prove (1.14) for $C_c^\infty$ functions, and then we conclude with a density argument. From the estimate (3.22) we have that $(|\mu_n|(\mathbb{R}^N))_{n\in\mathbb{N}}$ is bounded; thus there exist a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ and an $\mathbb{R}^{N\times N}$-valued Radon measure $\mu$ such that $\mu_{n_k}$ converges to $\mu$ weakly$^\ast$. This means that for every $\psi\in C_c^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ we have
\[
\lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}u(x)\cdot\psi(x)\,dx = \int_{\mathbb{R}^N} \psi(x)\cdot d\mu.
\]
On the other hand, from the integration by parts formula (3.20) and Lemma 3.3 we get that
\[
\lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}u(x)\cdot\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} u(x)\,D_{n_k}^2\psi(x)\,dx
= \int_{\mathbb{R}^N} u(x)\operatorname{div}^2\psi(x)\,dx
= \int_{\mathbb{R}^N} \psi(x)\cdot dD^2u.
\]
This means that $\mu = D^2u$. Observe now that, since we actually deduce that every subsequence of $(\mu_n)_{n\in\mathbb{N}}$ has a further subsequence that converges to $D^2u$ weakly$^\ast$, the initial sequence $(\mu_n)_{n\in\mathbb{N}}$ converges to $D^2u$ weakly$^\ast$.

Now we let $\varphi\in C_0(\mathbb{R}^N,\mathbb{R}^{N\times N})$ and $\varepsilon>0$. From the density of $C_c^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ in $C_0(\mathbb{R}^N,\mathbb{R}^{N\times N})$ we can find a $\psi\in C_c^\infty(\mathbb{R}^N,\mathbb{R}^{N\times N})$ such that $\|\varphi-\psi\|_\infty < \varepsilon$. Then, using also the estimate (3.22), we have
\[
\begin{aligned}
\left|\int_{\mathbb{R}^N} H_n u(x)\cdot\varphi(x)\,dx - \int_{\mathbb{R}^N} \varphi(x)\,dD^2u\right|
&\le \int_{\mathbb{R}^N} \big|H_n u(x)\cdot(\varphi(x)-\psi(x))\big|\,dx
+ \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\psi(x)\,dx - \int_{\mathbb{R}^N} \psi(x)\,dD^2u\right|\\
&\quad + \left|\int_{\mathbb{R}^N} (\varphi(x)-\psi(x))\,dD^2u\right|\\
&\le \varepsilon\int_{\mathbb{R}^N} |H_n u(x)|\,dx
+ \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\psi(x)\,dx - \int_{\mathbb{R}^N} \psi(x)\,dD^2u\right|
+ \varepsilon\,|D^2u|(\mathbb{R}^N)\\
&\le \varepsilon M\,|D^2u|(\mathbb{R}^N)
+ \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\psi(x)\,dx - \int_{\mathbb{R}^N} \psi(x)\,dD^2u\right|
+ \varepsilon\,|D^2u|(\mathbb{R}^N).
\end{aligned}
\]
Taking the limit $n\to\infty$ on both sides of the above inequality we get that
\[
\limsup_{n\to\infty} \left|\int_{\mathbb{R}^N} H_n u(x)\cdot\varphi(x)\,dx - \int_{\mathbb{R}^N} \varphi(x)\,dD^2u\right| \le \tilde{M}\varepsilon,
\]
and since $\varepsilon$ is arbitrary, we have (1.14).

Let us note here that in the case $N = 1$ we can also prove strict convergence of the measures $\mu_n$ to $D^2u$; that is, in addition to (1.14) we also have $|\mu_n|(\mathbb{R})\to|D^2u|(\mathbb{R})$.

Theorem 3.5. Let $N = 1$. Then the sequence $(\mu_n)_{n\in\mathbb{N}}$ converges to $D^2u$ strictly as measures, i.e.,
\[
\mu_n \to D^2u \quad\text{weakly}^\ast, \tag{3.23}
\]
and
\[
|\mu_n|(\mathbb{R}) \to |D^2u|(\mathbb{R}). \tag{3.24}
\]

Proof. The weak$^\ast$ convergence was proven in Theorem 1.6. Since in the space of finite Radon measures the total variation norm is lower semicontinuous with respect to weak$^\ast$ convergence, we also have
\[
|D^2u|(\mathbb{R}) \le \liminf_{n\to\infty} |\mu_n|(\mathbb{R}). \tag{3.25}
\]
Thus it suffices to show that
\[
\limsup_{n\to\infty} |\mu_n|(\mathbb{R}) \le |D^2u|(\mathbb{R}). \tag{3.26}
\]
Note that in dimension one the nonlocal Hessian formula is
\[
H_n u(x) = \int_{\mathbb{R}} \frac{u(y) - 2u(x) + u(x+(x-y))}{|x-y|^2}\,\rho_n(x-y)\,dy. \tag{3.27}
\]
Following the proof of Lemma 3.1, we can easily verify that for $v\in C^\infty(\mathbb{R})\cap BV^2(\mathbb{R})$ we have
\[
\int_{\mathbb{R}} |H_n v(x)|\,dx \le \|\nabla^2 v\|_{L^1(\mathbb{R})},
\]
i.e., the constant $M$ that appears in the estimate (3.15) is equal to 1. Using Fatou's lemma and the $BV^2$ strict approximation of $u$ by smooth functions we get that
\[
|\mu_n|(\mathbb{R}) = \int_{\mathbb{R}} |H_n u(x)|\,dx \le |D^2u|(\mathbb{R}),
\]
from which (3.26) straightforwardly follows.
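In dimension one, formula (3.27) is simple enough to evaluate directly, and the localization $H_n u\to u''$ can be watched numerically. The following sketch is illustrative (naming and discretization are our own choices): it uses the normalized radial weight $\rho_n(z) = 1/(2\varepsilon_n)$ on $\{|z|\le\varepsilon_n\}$ and a Riemann sum in the variable $z = y-x$.

```python
import numpy as np

# 1D nonlocal Hessian (3.27) with rho(z) = 1/(2*eps) on [-eps, eps]:
#   H u(x) = ∫ [u(y) - 2u(x) + u(x + (x - y))] / |x - y|^2 * rho(x - y) dy.
# With z = y - x the integrand is even in z, so integrate over z > 0 and double.
def nonlocal_hessian_1d(u, x, eps, m=4000):
    z = np.linspace(eps/m, eps, m)
    d2u = u(x + z) - 2.0*u(x) + u(x - z)
    integrand = d2u / z**2 * (1.0/(2.0*eps))
    return 2.0 * np.sum(integrand) * (eps/m)

# As eps -> 0 the value approaches u''(x); here u = sin, u''(0.7) = -sin(0.7).
for eps in [0.5, 0.1, 0.02]:
    print(eps, nonlocal_hessian_1d(np.sin, 0.7, eps))   # tends to -sin(0.7) ≈ -0.6442
```

Shrinking `eps` mimics the concentration of the weights $\rho_n$ at the origin that drives the localization results above.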

3.4. Characterization of higher-order Sobolev and BV spaces. The characterization of Sobolev and BV spaces in terms of nonlocal, derivative-free energies has so far been carried out only in the first-order case; see [BBM01, Pon04b, Men12, MS15]. Here we characterize the spaces $W^{2,p}(\mathbb{R}^N)$ and $BV^2(\mathbb{R}^N)$ using our definition of the nonlocal Hessian.

Proof of Theorem 1.7. First, we prove (1.15). Suppose that $u\in W^{2,p}(\mathbb{R}^N)$. Then Lemma 3.1 gives
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|^p\,dx \le M\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^p < \infty.
\]
Suppose now conversely that
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|^p\,dx < \infty.
\]
This means that, up to a subsequence, $H_n u$ is representable by a sequence of functions bounded in $L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$; thus there exist a subsequence $(H_{n_k}u)_{k\in\mathbb{N}}$ and $v\in L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$ such that $H_{n_k}u \rightharpoonup v$ weakly in $L^p(\mathbb{R}^N,\mathbb{R}^{N\times N})$. Thus, using the definition of weak convergence in $L^p$ together with the definition of $H_n u$, we have for every $\psi\in C_c^\infty(\mathbb{R}^N)$,
\[
\int_{\mathbb{R}^N} v_{ij}(x)\,\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}^{ij}u(x)\,\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} u(x)\,H_{n_k}^{ij}\psi(x)\,dx
= \int_{\mathbb{R}^N} u(x)\,\frac{\partial^2\psi(x)}{\partial x_i\partial x_j}\,dx,
\]
which shows that $v = \nabla^2 u$ is the second-order weak derivative of $u$. Now since $u\in L^p(\mathbb{R}^N)$ and the second-order distributional derivative is a function, mollification of $u$ and the Gagliardo–Nirenberg inequality (see [Nir59, p. 128, eq. 2.5])
\[
\|\nabla u\|_{L^p(\mathbb{R}^N)} \le C\,\|\nabla^2 u\|_{L^p(\mathbb{R}^N)}^{1/2}\,\|u\|_{L^p(\mathbb{R}^N)}^{1/2} \tag{3.28}
\]
imply that the first distributional derivative belongs to $L^p(\mathbb{R}^N,\mathbb{R}^N)$, and thus $u\in W^{2,p}(\mathbb{R}^N)$.

We now proceed to prove (1.16). Again supposing that $u\in BV^2(\mathbb{R}^N)$, Lemma 3.4 gives us
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|\,dx \le C\,|D^2u|(\mathbb{R}^N).
\]
Suppose now that
\[
\liminf_{n\to\infty} \int_{\mathbb{R}^N} |H_n u(x)|\,dx < \infty.
\]
Considering again the measures $\mu_n = H_n u\,\mathcal{L}^N$, we have that there exist a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ and a finite Radon measure $\mu$ such that $\mu_{n_k}$ converges to $\mu$ weakly$^\ast$. Then for every $\psi\in C_c^\infty(\mathbb{R}^N)$ we have, similarly as before,
\[
\int_{\mathbb{R}^N} \psi\,d\mu_{ij}
= \lim_{k\to\infty} \int_{\mathbb{R}^N} H_{n_k}^{ij}u(x)\,\psi(x)\,dx
= \lim_{k\to\infty} \int_{\mathbb{R}^N} u(x)\,H_{n_k}^{ij}\psi(x)\,dx
= \int_{\mathbb{R}^N} u(x)\,\frac{\partial^2\psi(x)}{\partial x_i\partial x_j}\,dx,
\]
which shows that $\mu = D^2u$. Again, by first mollifying and then passing to the limit, the inequality (3.28) implies that $Du\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^N)$. However, $Du\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^N)$ and $D^2u\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^{N\times N})$ imply that $Du$ is an $L^1(\mathbb{R}^N,\mathbb{R}^N)$ function (a simple consequence of the Sobolev inequality; but see also [AFP00, Exerc. 3.2]), and we therefore conclude that $u\in BV^2(\mathbb{R}^N)$.

3.5. Γ-convergence. For notational convenience we define the functional
\[
F_n(u) :=
\begin{cases}
\int_{\mathbb{R}^N} |H_n u|\,dx & \text{if } H_n u \text{ is representable by an } L^1 \text{ function},\\
+\infty & \text{otherwise}.
\end{cases} \tag{3.29}
\]

Proof of Theorem 1.8. The computation of the Γ-limit consists of two inequalities. For the lower bound, we must show that
\[
|D^2u|(\mathbb{R}^N) \le \liminf_{n\to\infty} F_n(u_n)
\]
for every sequence $u_n\to u$ in $L^1(\mathbb{R}^N)$. Without loss of generality we may assume that
\[
C := \liminf_{n\to\infty} F_n(u_n) < +\infty,
\]
which implies that
\[
\sup_\varphi\ \liminf_{n\to\infty} \left|\int_{\mathbb{R}^N} H_n^{ij}u_n\,\varphi\,dx\right| \le C,
\]
where the supremum is taken over $\varphi\in C_c^\infty(\mathbb{R}^N)$ such that $\|\varphi\|_{L^\infty(\mathbb{R}^N)}\le 1$. Now, the definition of the distributional nonlocal Hessian and the convergence $u_n\to u$ in $L^1(\mathbb{R}^N)$ imply that
\[
\lim_{n\to\infty} \int_{\mathbb{R}^N} H_n^{ij}u_n\,\varphi\,dx
= \lim_{n\to\infty} \int_{\mathbb{R}^N} u_n\,H_n^{ij}\varphi\,dx
= \int_{\mathbb{R}^N} u\,\frac{\partial^2\varphi}{\partial x_i\partial x_j}\,dx.
\]
We thus conclude that
\[
\sup_\varphi \left|\int_{\mathbb{R}^N} u\,\frac{\partial^2\varphi}{\partial x_i\partial x_j}\,dx\right| \le C,
\]
which, arguing as in the previous section, says that $u\in BV^2(\mathbb{R}^N)$, in particular that $D^2u\in\mathcal{M}(\mathbb{R}^N,\mathbb{R}^{N\times N})$ and
\[
|D^2u|(\mathbb{R}^N) \le \Gamma_{L^1(\mathbb{R}^N)}\text{-}\liminf_{n\to\infty} F_n(u)
\]
for every $u\in L^1(\mathbb{R}^N)$.

For the upper bound we observe that if $u\in C_c^2(\mathbb{R}^N)$, we have, by the uniform convergence of Theorem 1.4 and the fact that $u$ is sufficiently smooth with compact support, that
\[
\lim_{n\to\infty} F_n(u) = |D^2u|(\mathbb{R}^N).
\]
Then choosing $u_n = u$ we conclude that
\[
\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup_{n\to\infty} F_n(u) \le \lim_{n\to\infty} F_n(u) = |D^2u|(\mathbb{R}^N).
\]
Now, taking the lower semicontinuous envelope with respect to strong $L^1(\mathbb{R}^N)$ convergence, and using that both the $\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup$ and the mapping $u\mapsto|D^2u|(\mathbb{R}^N)$ are lower semicontinuous on $L^1(\mathbb{R}^N)$ (for the $\Gamma\text{-}\limsup$ see [DM93, Prop. 6.8]), we deduce that
\[
\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup_{n\to\infty} F_n(u) \le \operatorname{sc}^-_{L^1(\mathbb{R}^N)}\,|D^2u|(\mathbb{R}^N) = |D^2u|(\mathbb{R}^N)
\quad\text{for all } u\in L^1(\mathbb{R}^N).
\]

4. Extensions and applications.

4.1. An asymmetric extension. In the previous sections we have shown that our nonlocal definition of $H_n$ as in (1.11) localizes to the classical distributional Hessian for a specific choice of the weights $\rho_n$ and thus can rightfully be called a nonlocal Hessian. In numerical applications, however, the strength of such nonlocal models lies in the fact that the weights can be chosen to have nonlocal interactions and to model specific patterns in the data. A classic example is the nonlocal total variation [GO08]:
\[
J_{NL\text{-}TV}(u) = \int_\Omega\int_\Omega |u(x)-u(y)|\,\sqrt{w(x,y)}\,dy\,dx. \tag{4.1}
\]
A possible choice is to take $w(x,y)$ large if the patches (neighborhoods) around $x$ and $y$ are similar with respect to a patch distance $d_a$, such as a weighted $\ell^2$ norm, and small if they are not. In [GO08] this is achieved by setting $w(x,y) = 1$ if the neighborhood around $y$ is one of the $K\in\mathbb{N}$ closest to the neighborhood around $x$ in a search window, and $w(x,y) = 0$ otherwise. In effect, if the image contains a repeating pattern with a defect that is small enough not to throw off the patch distances too much, it will be repaired as long as most similar patterns do not show the defect.
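The binary weight rule just described can be sketched in a few lines. The function below is purely illustrative; the names, patch size, and search-window size are our own choices and not taken from the implementation in [GO08].

```python
import numpy as np

# w(x, y) = 1 if the patch around y is among the K closest (in l2 patch
# distance) to the patch around x inside a search window, and 0 otherwise.
def knn_patch_weights(img, x, patch=3, search=7, K=5):
    h, s = patch // 2, search // 2
    px, py = x
    ref = img[px-h:px+h+1, py-h:py+h+1]
    dists = {}
    for qx in range(px - s, px + s + 1):
        for qy in range(py - s, py + s + 1):
            if (qx, qy) == (px, py):
                continue
            cand = img[qx-h:qx+h+1, qy-h:qy+h+1]
            if cand.shape == ref.shape:           # skip candidates falling off the image
                dists[(qx, qy)] = np.sum((ref - cand)**2)
    nearest = sorted(dists, key=dists.get)[:K]    # K smallest patch distances
    return {q: 1.0 for q in nearest}

# On an image of vertical stripes, the closest patches lie in the same column.
img = np.tile(np.arange(16.0), (16, 1))
w = knn_patch_weights(img, (8, 8))
```

On this toy image every patch in the same column matches the reference exactly, so all selected neighbors share the column index of $x$, which is the repairing behavior described above.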

Computing suitable weights is much less obvious in the case of $H_n$. We can formally extend (1.11) using arbitrary pairwise weights $\rho:\mathbb{R}^N\times\mathbb{R}^N\to\mathbb{R}$,
\[
H_\rho u(x) = C\int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2}\,\frac{z\otimes z - \frac{|z|^2}{N+2}I_N}{|z|^2}\,\rho_x(z)\,dz, \tag{4.2}
\]
and use it to create nonlocal generalizations of functionals such as $TV^2$, for example to minimize the nonlocal $L^2$-$TV^2$ model
\[
f(u) := \int_{\mathbb{R}^N} |u-g|^2\,dx + \alpha\int_{\mathbb{R}^N} |H_\rho u|\,dx. \tag{4.3}
\]
However, apart from being formulated on $\mathbb{R}^N$ instead of $\Omega$, formulation (4.2) has an important drawback compared to the first-order formulation (4.1): while the weights are defined between two points $x$ and $y$, the left part of the integrand uses the values of $u$ not only at $x$ and $y$ but also at the "mirrored" point $x+(x-y)$. In fact we can replace the weighting function by the symmetrized version $\frac{1}{2}\{\rho_x(y-x) + \rho_x(x-y)\}$, which in effect relates three points instead of two and limits the choice of possible weighting functions.

In this section we therefore introduce a more versatile extension of (1.11) that allows for fully nonsymmetric weights. We start with the realization that the finite-difference integrand in (4.2) effectively comes from canceling the first-order differences in the Taylor expansion of $u$ around $x$, which couples the values of $u$ at $x$, $y$, and $x+(x-y)$ into one term. Instead, we can avoid this coupling by directly looking for a nonlocal gradient $G_u(x)\in\mathbb{R}^N$ and Hessian $H_u(x)\in\operatorname{Sym}(\mathbb{R}^{N\times N})$ that best explain $u$ around $x$ in terms of a quadratic model, i.e., that take the place of the gradient and Hessian in the Taylor expansion:
\[
(G_u(x), H_u(x)) := \operatorname*{argmin}_{G_u\in\mathbb{R}^N,\ H_u\in\operatorname{Sym}(\mathbb{R}^{N\times N})}
\ \frac{1}{2}\int_{\Omega-\{x\}} \left(u(x+z) - u(x) - G_u^\top z - \frac{1}{2}\,z^\top H_u z\right)^2 \sigma_x(z)\,dz. \tag{4.4}
\]
Here the variable $x+z$ takes the place of $y$ in (1.11). We denote definition (4.4) the implicit nonlocal Hessian, as opposed to the explicit formulation (4.2).

The advantage is that any terms involving $\sigma_x(z)$ are now based only on the two values of $u$ at $x$ and $y = x+z$, and (in particular bounded) domains other than $\mathbb{R}^N$ are naturally dealt with, which is important for a numerical implementation. We also note that this approach allows us to incorporate nonlocal first-order terms as a side effect, and that it can be naturally extended to third- and higher-order derivatives, which we leave to further work.
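A direct way to compute the implicit quantities in (4.4) at a single point is to discretize the integral over sampled offsets $z$ and solve the resulting weighted linear least-squares problem; in 2D the unknowns are the two entries of $G_u$ and the three distinct entries of the symmetric $H_u$. The sketch below is illustrative only: the sampling, the weights playing the role of $\sigma_x$, and all names are our own choices, not the paper's implementation.

```python
import numpy as np

# Weighted least-squares fit of the quadratic model (4.4) at a point x (2D):
#   u(x + z) - u(x) ≈ G^T z + (1/2) z^T H z,  with H symmetric.
def implicit_gradient_hessian(u, x, offsets, weights):
    rows, rhs = [], []
    for z, w in zip(offsets, weights):
        sw = np.sqrt(w)
        z1, z2 = z
        # unknowns: [g1, g2, h11, h12, h22]; note the 1/2 in (1/2) z^T H z
        rows.append(sw * np.array([z1, z2, 0.5*z1*z1, z1*z2, 0.5*z2*z2]))
        rhs.append(sw * (u(x + z) - u(x)))
    p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    G = p[:2]
    H = np.array([[p[2], p[3]], [p[3], p[4]]])
    return G, H

# Sanity check on an exact quadratic u(y) = y1^2 + 3 y1 y2:
# grad u = (2 y1 + 3 y2, 3 y1), Hessian = [[2, 3], [3, 0]].
u = lambda y: y[0]**2 + 3.0*y[0]*y[1]
rng = np.random.default_rng(0)
offsets = rng.normal(size=(50, 2))
G, H = implicit_gradient_hessian(u, np.array([1.0, -0.5]), offsets, np.ones(50))
print(G)   # ≈ [0.5, 3.0]
print(H)   # ≈ [[2, 3], [3, 0]]
```

Because the quadratic model fits this test function exactly, the recovered $G_u$ and $H_u$ coincide with the true gradient and Hessian regardless of the choice of weights.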

With respect to implementation, the implicit model (4.4) does not add much to the overall difficulty: it is enough to add the nonlocal gradient $G_u(x)\in\mathbb{R}^N$ and Hessian $H_u(x)\in\mathbb{R}^{N\times N}$ as additional variables to the problem and couple them to $u$ by adding the

[Figure 1. Illustration of the capability of the proposed nonlocal Hessian regularization to obtain true piecewise affine reconstructions in a denoising example.]
[Figure 2. Adaptive choice of the neighborhood for the image of a disc with constant slope and added Gaussian noise.]
[Figure 3. Classical local regularization. The input consists of a disc-shaped slope with additive Gaussian noise, σ = 0.25.]
[Figure 4. Nonlocal regularization of the problem in Figure 3. The adaptive choice of the neighborhood and weights together with the nonlocal Hessian preserves the jumps, clips the top of the slope, and allows one to perfectly reconstruct the piecewise affine …]
