

$$\int_{\mathbb{R}^N} u\,\frac{\partial^2 \varphi}{\partial x_i\,\partial x_j}\,dx.$$

We thus conclude that

$$\sup_{\varphi}\,\int_{\mathbb{R}^N} u\,\frac{\partial^2 \varphi}{\partial x_i\,\partial x_j}\,dx \;\le\; C,$$

which, arguing as in the previous section, says that $u \in BV^2(\mathbb{R}^N)$, in particular that $D^2u \in \mathcal{M}(\mathbb{R}^N, \mathbb{R}^{N\times N})$ and

$$|D^2u|(\mathbb{R}^N) \;\le\; \Gamma_{L^1(\mathbb{R}^N)}\text{-}\liminf_{n\to\infty} F_n(u) \quad \text{for every } u \in L^1(\mathbb{R}^N).$$

For the upper bound we observe that if $u \in C_c^2(\mathbb{R}^N)$, we have by the uniform convergence of Theorem 1.4 and the fact that $u$ is sufficiently smooth with compact support that

$$\lim_{n\to\infty} F_n(u) = |D^2u|(\mathbb{R}^N).$$

Then choosing $u_n = u$ we conclude that

$$\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup_{n\to\infty} F_n(u) \;\le\; \lim_{n\to\infty} F_n(u) = |D^2u|(\mathbb{R}^N).$$

Now, taking the lower semicontinuous envelope with respect to $L^1(\mathbb{R}^N)$ strong convergence, and using that both the $\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup$ and the mapping $u \mapsto |D^2u|(\mathbb{R}^N)$ are lower semicontinuous on $L^1(\mathbb{R}^N)$ (for the $\Gamma\text{-}\limsup$ see [DM93, Prop. 6.8]), we deduce that

$$\Gamma_{L^1(\mathbb{R}^N)}\text{-}\limsup_{n\to\infty} F_n(u) \;\le\; \mathrm{sc}^-_{L^1(\mathbb{R}^N)}\,|D^2u|(\mathbb{R}^N) = |D^2u|(\mathbb{R}^N) \quad \text{for all } u \in L^1(\mathbb{R}^N).$$

4. Extensions and applications.

4.1. An asymmetric extension. In the previous sections we have shown that our nonlocal definition of $\mathcal{H}_n$ as in (1.11) localizes to the classical distributional Hessian for a specific choice of the weights $\rho_n$ and thus can rightfully be called a nonlocal Hessian. In numerical applications, however, the strength of such nonlocal models lies in the fact that the weights can be chosen to have nonlocal interactions and to model specific patterns in the data. A classic example is the nonlocal total variation [GO08]:

$$J_{NL\text{-}TV}(u) = \int_\Omega\int_\Omega |u(x) - u(y)|\,w(x,y)\,dy\,dx. \tag{4.1}$$

A possible choice is to make $w(x, y)$ large if the patches (neighborhoods) around $x$ and $y$ are similar with respect to a patch distance $d_a$, such as a weighted $\ell^2$ norm, and small if they are not. In [GO08] this is achieved by setting $w(x, y) = 1$ if the neighborhood around $y$ is one of the $K \in \mathbb{N}$ closest to the neighborhood around $x$ in a search window, and $w(x, y) = 0$ otherwise. In effect, if the image contains a repeating pattern with a defect that is small enough not to throw off the patch distances too much, it will be repaired, as long as most similar patterns do not show the defect.
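To make this construction concrete, the following is a minimal sketch (Python/NumPy; the function name and all parameters are illustrative, not taken from [GO08]) of the K-nearest-patches rule just described, using a plain squared $\ell^2$ patch distance:

```python
import numpy as np

def knn_patch_weights(img, x, patch=5, search=21, K=10):
    """Sketch of [GO08]-style binary weights: w(x, y) = 1 for the K patches
    in a search window around x that are closest to the patch at x, and
    w(x, y) = 0 otherwise. Assumes x lies far enough from the image border
    that all candidate patches fit inside `img`."""
    p, s = patch // 2, search // 2
    r0, c0 = x
    ref = img[r0 - p:r0 + p + 1, c0 - p:c0 + p + 1]   # reference patch at x
    dists = {}
    for r in range(r0 - s, r0 + s + 1):
        for c in range(c0 - s, c0 + s + 1):
            if (r, c) == (r0, c0):
                continue
            cand = img[r - p:r + p + 1, c - p:c + p + 1]
            if cand.shape == ref.shape:               # patch fully inside image
                dists[(r, c)] = float(np.sum((cand - ref) ** 2))
    nearest = sorted(dists, key=dists.get)[:K]        # K most similar patches
    return {y: 1.0 for y in nearest}                  # w(x, y) = 1 on these y
```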

Computing suitable weights is much less obvious in the case of $\mathcal{H}$. We can formally extend (1.11) using arbitrary pairwise weights $\rho\colon \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$,

$$\mathcal{H}_\rho u(x) = C \int_{\mathbb{R}^N} \frac{u(x+z) - 2u(x) + u(x-z)}{|z|^2}\;\frac{(N+2)\,zz^{\top} - |z|^2 I_N}{|z|^2}\;\rho_x(z)\,dz, \tag{4.2}$$

and use it to create nonlocal generalizations of functionals such as TV$^2$, for example to minimize the nonlocal $L^2$-TV$^2$ model

$$f(u) := \int_{\mathbb{R}^N} |u - g|^2\,dx + \alpha \int_{\mathbb{R}^N} |\mathcal{H}_\rho u|\,dx. \tag{4.3}$$

However, apart from being formulated on $\mathbb{R}^N$ instead of $\Omega$, formulation (4.2) has an important drawback compared to the first-order formulation (4.1): while the weights are defined between two points $x$ and $y$, the left part of the integrand uses the values of $u$ not only at $x$ and $y$ but also at the “mirrored” point $x + (x - y)$. In fact we can replace the weighting function by the symmetrized version $\frac{1}{2}\{\rho_x(y-x) + \rho_x(x-y)\}$, which in effect relates three points instead of two and limits the choice of possible weighting functions.
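To fix ideas, the following is a naive numerical sketch (Python/NumPy) of the explicit formula (4.2) for a 2D image, summing second differences over a truncated window of offsets; the truncation radius R, the constant C, and the weight function rho are placeholders of this sketch, not choices made in the paper:

```python
import numpy as np

def explicit_nonlocal_hessian(u, x, rho, C=1.0, R=10):
    """Naive discretization of (4.2) at pixel x of a 2D image u (N = 2).
    `rho` maps an offset z (length-2 array) to a nonnegative weight.
    Assumes x is at least R pixels away from the image border."""
    N = 2
    H = np.zeros((N, N))
    r0, c0 = x
    for dr in range(-R, R + 1):
        for dc in range(-R, R + 1):
            if dr == 0 and dc == 0:
                continue
            z = np.array([dr, dc], dtype=float)
            n2 = z @ z                                 # |z|^2
            # second difference couples u(x+z), u(x), and u(x-z)
            d2u = u[r0 + dr, c0 + dc] - 2 * u[r0, c0] + u[r0 - dr, c0 - dc]
            M = ((N + 2) * np.outer(z, z) - n2 * np.eye(N)) / n2
            H += (d2u / n2) * M * rho(z)
    return C * H
```

Note how each term of the sum indeed couples the three values $u(x+z)$, $u(x)$, and $u(x-z)$, which is exactly the drawback discussed above.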

In this section we therefore introduce a more versatile extension of (1.11) that allows for fully nonsymmetric weights. We start with the realization that the finite-difference integrand in (4.2) effectively comes from canceling the first-order differences in the Taylor expansion of $u$ around $x$, which couples the values of $u$ at $x$, $y$, and $x + (x - y)$ into one term. Instead, we can avoid this coupling by directly looking for a nonlocal gradient $G_u(x) \in \mathbb{R}^N$ and a nonlocal Hessian $H_u(x) \in \mathrm{Sym}(\mathbb{R}^{N\times N})$ that best explain $u$ around $x$ in terms of a quadratic model, i.e., that take the place of the gradient and Hessian in the Taylor expansion:

$$(G_u(x), H_u(x)) := \operatorname*{argmin}_{G_u\in\mathbb{R}^N,\,H_u\in\mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2}\int_{\Omega - \{x\}} \Big(u(x+z) - u(x) - G_u^{\top}z - \tfrac{1}{2}z^{\top}H_u z\Big)^2\,\sigma_x(z)\,dz. \tag{4.4}$$

Here the variable $x + z$ takes the place of $y$ in (1.11). We refer to definition (4.4) as the implicit nonlocal Hessian, as opposed to the explicit formulation (4.2).

The advantage is that any terms involving $\sigma_x(z)$ are now based only on the two values of $u$ at $x$ and $y = x + z$, and domains other than $\mathbb{R}^N$ (in particular bounded ones) are naturally dealt with, which is important for a numerical implementation. We also note that this approach allows us to incorporate nonlocal first-order terms as a side effect, and that it can be naturally extended to third- and higher-order derivatives, which we leave to future work.

With respect to implementation, the implicit model (4.4) does not add much to the overall difficulty: it is enough to add the nonlocal gradient and Hessian $G_u(x) \in \mathbb{R}^N$ and $H_u(x) \in \mathbb{R}^{N\times N}$ as additional variables to the problem and couple them to $u$ by adding the optimality conditions for (4.4) to the problem. Since (4.4) is a finite-dimensional quadratic minimization problem, the optimality conditions are linear, i.e., of the form $A_{u,x} H_u(x) = b_{u,x}$. Such linear constraints can be readily added to most solvers capable of solving the local problem. Alternatively, the matrices $A_{u,x}$ can be inverted explicitly in a precomputation step;

however, we did not find this to increase overall performance.
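As an illustration of how little machinery the implicit model needs, here is a sketch (Python/NumPy; function and variable names are ours, not from the paper) of the pointwise quadratic fit (4.4) on a 2D grid, solved directly as a weighted least-squares problem rather than through the coupled optimality conditions:

```python
import numpy as np

def implicit_nonlocal_hessian(u, x, offsets, sigma):
    """Fit u(x+z) - u(x) ~ G.z + z^T H z / 2 over the given integer offsets z,
    weighted by sigma(z), at pixel x of the 2D image u. Unknowns are G (2
    entries) and the symmetric H (3 entries H11, H22, H12), so the normal
    equations are exactly the small linear system A_{u,x} (G, H) = b_{u,x}
    mentioned in the text. Needs at least 5 offsets to be determined."""
    r0, c0 = x
    rows, rhs, w = [], [], []
    for z1, z2 in offsets:
        # columns: G1, G2, H11, H22, H12 (symmetry enforced by construction)
        rows.append([z1, z2, 0.5 * z1 * z1, 0.5 * z2 * z2, z1 * z2])
        rhs.append(u[r0 + z1, c0 + z2] - u[r0, c0])
        w.append(np.sqrt(sigma((z1, z2))))
    A = np.asarray(rows) * np.asarray(w)[:, None]     # weighted design matrix
    b = np.asarray(rhs) * np.asarray(w)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)         # solve the quadratic fit
    G = p[:2]
    H = np.array([[p[2], p[4]], [p[4], p[3]]])
    return G, H
```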

While the implicit and explicit models look different at first glance, from the considerations about the Taylor expansion we expect them to be closely related, and in fact this is true, as shown in the following theorem.

Theorem 4.1. Let $\rho$ be a radially symmetric function satisfying the conditions (1.4)–(1.6), let $u \in BV^2(\mathbb{R}^N)$ with $\nabla u$ Lipschitz, let $x \in \mathbb{R}^N$, and let

$$(G_u(x), H_u(x)) := \operatorname*{argmin}_{G_u\in\mathbb{R}^N,\,H_u\in\mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2}\int_{\mathbb{R}^N} \Big(u(x+z) - u(x) - G_u^{\top}z - \tfrac{1}{2}z^{\top}H_u z\Big)^2\,\frac{\rho(z)}{|z|^4}\,dz. \tag{4.5}$$

Then the optimal $H_u(x)$ is given by the explicit nonlocal Hessian, i.e.,

$$H_u(x) = \mathcal{H}u(x) = \frac{N(N+2)}{2}\int_{\mathbb{R}^N} \big(u(x+z) - u(x)\big)\left(\frac{zz^{\top}}{|z|^2} - \frac{I_N}{N+2}\right)\frac{\rho(z)}{|z|^2}\,dz.$$

This means that by setting $\Omega = \mathbb{R}^N$, assuming some extra regularity for $u$, and substituting the weights $\sigma_x(z)$ with $\rho(z)/|z|^4$, the implicit nonlocal Hessian coincides with the explicit one.

In order to prove Theorem 4.1 we first need the following lemma, whose only difference is that we integrate over $\mathbb{R}^N \setminus B(0,\varepsilon)$ in order to deal with the singularity $1/|z|^4$ introduced at the origin.

Lemma 4.2. Let $\rho$ be a radially symmetric function satisfying the conditions (1.4)–(1.6), let $u \in BV^2(\mathbb{R}^N)$, let $x \in \mathbb{R}^N$, and let

$$(G, H) := \operatorname*{argmin}_{G\in\mathbb{R}^N,\,H\in\mathrm{Sym}(\mathbb{R}^{N\times N})} \frac{1}{2}\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \Big(u(x+z) - u(x) - G^{\top}z - \tfrac{1}{2}z^{\top}Hz\Big)^2\,\frac{\rho(z)}{|z|^4}\,dz. \tag{4.6}$$

Then the optimal $H$ is given by

$$H_{ij} = C \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \big(u(x+z) - u(x)\big)\left(\frac{z_i z_j}{|z|^2} - \frac{\delta_{ij}}{N+2}\right)\frac{\rho(z)}{|z|^2}\,dz,$$

with a constant $C = C(\varepsilon, N, \rho)$ identified in the proof below.


Proof. The optimality conditions for (4.6) in terms of $G$ and $H$ are

$$\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \Big(u(x+z) - u(x) - G^{\top}z - \tfrac{1}{2}z^{\top}Hz\Big)\,z\,\frac{\rho(z)}{|z|^4}\,dz = 0, \tag{4.7}$$

$$\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \Big(u(x+z) - u(x) - G^{\top}z - \tfrac{1}{2}z^{\top}Hz\Big)\,zz^{\top}\,\frac{\rho(z)}{|z|^4}\,dz = 0 \tag{4.8}$$

under the constraint $H \in \mathrm{Sym}(\mathbb{R}^{N\times N})$. Note that from a nonsymmetric solution $H_0$ we can always construct a symmetric solution $H \in \mathrm{Sym}(\mathbb{R}^{N\times N})$ by letting $H = \frac{1}{2}(H_0 + H_0^{\top})$, as the equations are invariant under transposition of $H$. The above conditions correspond to the following sets of $N$ and $N^2$ equations, respectively:

$$\sum_{k} G_k \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_k\,\frac{\rho(z)}{|z|^4}\,dz + \frac{1}{2}\sum_{k,l} H_{kl} \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_k z_l\,\frac{\rho(z)}{|z|^4}\,dz = \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \big(u(x+z) - u(x)\big)\,z_i\,\frac{\rho(z)}{|z|^4}\,dz,$$

$$\sum_{k} G_k \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_j z_k\,\frac{\rho(z)}{|z|^4}\,dz + \frac{1}{2}\sum_{k,l} H_{kl} \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_j z_k z_l\,\frac{\rho(z)}{|z|^4}\,dz = \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \big(u(x+z) - u(x)\big)\,z_i z_j\,\frac{\rho(z)}{|z|^4}\,dz.$$

Collecting the unknowns in a vector $p = (p_G, p_H) \in \mathbb{R}^m$, $m = N + N^2$, this can be rewritten as a linear system with an $m \times m$ block matrix,

$$\begin{pmatrix} A & V \\ V^{\top} & B \end{pmatrix} p = \begin{pmatrix} a \\ b \end{pmatrix}.$$

The entries in the submatrices $V$ are all of the form

$$\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_j z_k\,\frac{\rho(z)}{|z|^4}\,dz, \qquad i, j, k \in \{1,\dots,N\}.$$

No matter what the choice of i, j, k is, there is always one index with an odd power, so every one of these integrals is zero due to symmetry and the fact that ρ is even, i.e., ρ(z) = ρ(−z).
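This vanishing of the odd moments is easy to check numerically; the following quick Monte Carlo sanity check (Python/NumPy, with a standard Gaussian as one concrete radially symmetric $\rho$) is a sketch for illustration, not part of the proof:

```python
import numpy as np

# Estimate \int z_i z_j z_k rho(z)/|z|^4 dz for rho = standard Gaussian in R^3
# by sampling z ~ rho; every estimate should vanish up to Monte Carlo error.
rng = np.random.default_rng(0)
z = rng.standard_normal((2_000_000, 3))
w = 1.0 / np.sum(z ** 2, axis=1) ** 2                 # the 1/|z|^4 factor
for i, j, k in [(0, 0, 1), (0, 1, 2), (1, 1, 1)]:
    m = np.mean(z[:, i] * z[:, j] * z[:, k] * w)
    print((i, j, k), f"{m:+.1e}")                     # each ~ 0
```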

This means that the conditions on the gradient and the Hessian parts of p decouple, i.e., the problem is

$$A\,p_G = a, \qquad B\,p_H = b. \tag{4.9}$$

We can therefore look at the isolated problem of computing the Hessian part $p_H$, or equivalently $H$, without interference from the gradient part. The matrix $B$ has entries of the form

$$\frac{1}{2}\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_j z_{i_0} z_{j_0}\,\frac{\rho(z)}{|z|^4}\,dz.$$


Again due to symmetry, the only way that this integral is nonzero is if there are no odd powers, so either all indices are the same or there are two pairs:

$$\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i^2 z_j^2\,\frac{\rho(z)}{|z|^4}\,dz = \frac{1}{N(N+2)}\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \rho(z)\,dz \;\;(i \neq j), \qquad \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i^4\,\frac{\rho(z)}{|z|^4}\,dz = \frac{3}{N(N+2)}\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \rho(z)\,dz.$$

The right-hand side vector $b$ in (4.9) is

$$b_{ij} = \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \big(u(x+z) - u(x)\big)\,z_i z_j\,\frac{\rho(z)}{|z|^4}\,dz,$$

and the row of the system $B p_H = b$ corresponding to the pair $(i_0, j_0)$ reads

$$\frac{1}{2}\sum_{i,j} H_{ij} \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_i z_j z_{i_0} z_{j_0}\,\frac{\rho(z)}{|z|^4}\,dz = b_{i_0 j_0}. \tag{4.10}$$

We consider the different cases for $i_0$, $j_0$ separately.

Case $i_0 \neq j_0$. All the terms on the left-hand side of (4.10) vanish except for $(i, j) = (i_0, j_0)$ and $(i, j) = (j_0, i_0)$. Then we are done for this case by simply observing that

$$C^{-1} := 2 \cdot \frac{1}{2}\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} z_{i_0}^2 z_{j_0}^2\,\frac{\rho(z)}{|z|^4}\,dz = \frac{1}{N(N+2)}\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \rho(z)\,dz,$$

where the last equality follows from (3.5).

Case $i_0 = j_0$. All the terms on the left-hand side of (4.10) vanish except for those with $i = j$; i.e., we are left with the $N \times N$ system $M (H_{11}, \dots, H_{NN})^{\top} = (b_{11}, \dots, b_{NN})^{\top}$ for the diagonal entries, where, by the moment identities above, $M = \frac{1}{2N(N+2)}\big(\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \rho(z)\,dz\big)\,(2I + E)$ and $E$ is the all-ones matrix. Thus

$$D := M^{-1} \;\xrightarrow[\varepsilon\to 0]{}\; \frac{N(N+2)}{2}\left(I - \frac{1}{N+2}E\right),$$

and the lemma has been shown.

Remark 4.3. Note that we did not have to explicitly compute the optimal gradient $G$ in the lemma above. However, we can bound $G$ uniformly in $\varepsilon$. For that we assume that $\nabla u$ is Lipschitz, say with constant $L$. Indeed, the gradient part in (4.9) reads as

$$G_i \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \frac{z_i^2}{|z|^4}\,\rho(z)\,dz = \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \big(u(x+z) - u(x)\big)\,\frac{z_i}{|z|^4}\,\rho(z)\,dz, \qquad i = 1, \dots, N.$$

By symmetry we have

$$\int_{\mathbb{R}^N} \frac{z_i^2}{|z|^4}\,\rho(z)\,dz = \frac{1}{N}\int_{\mathbb{R}^N} \frac{1}{|z|^2}\,\rho(z)\,dz;$$

therefore from the first equation we conclude the bound

$$|G_i| \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \frac{\rho(z)}{|z|^2}\,dz = N\left|\int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \frac{u(x+z) - u(x)}{|z|}\,\frac{z_i}{|z|}\,\frac{\rho(z)}{|z|^2}\,dz\right| \le LN \int_{\mathbb{R}^N\setminus B(0,\varepsilon)} \frac{\rho(z)}{|z|^2}\,dz \;\;\Longrightarrow\;\; |G_i| \le LN.$$

We are now ready to prove Theorem 4.1, which is based on Lemma 4.2, Remark 4.3, and a Γ-convergence argument.

Proof of Theorem 4.1. Let $F$ and $F_\varepsilon$ be the functionals minimized in (4.5) and (4.6), respectively, for the given point $x$. From Fatou's lemma the functionals $F_\varepsilon$ are lower semicontinuous. Also

$$F_\varepsilon \to F \quad \text{increasingly and pointwise as } \varepsilon \to 0,$$

and thus from [DM93, Rem. 5.5] we have that the functionals $F_\varepsilon$ Γ-converge to $F$ as $\varepsilon \to 0$. Finally, from the fact that $u \in BV^2(\mathbb{R}^N)$ and Remark 4.3, the minimizers of $F_\varepsilon$ are bounded uniformly in $\varepsilon$. Then, by [DM93, Thm. 7.4], $F$ attains its minimum and

$$\operatorname{argmin} F = \lim_{\varepsilon\to 0}\, \operatorname{argmin} F_\varepsilon = \mathcal{H}u(x),$$

where the argmin here refers to the Hessian part of the minimizers.

Note that while Theorem 4.1 requires radial symmetry of $\rho$, this symmetry was generally assumed throughout section 3. Therefore section 3 can also be seen as providing localization results for implicit models of the form (4.5).


Figure 2. Adaptive choice of the neighborhood for the image of a disc with constant slope and added Gaussian noise. Left: A standard discretization of the second-order derivatives at the marked point uses all points in a 3 × 3 neighborhood. Center: With a suitable choice of the weights $\sigma_x$, the discretization of the (nonlocal) Hessian at the marked point only (or mostly) involves points that are likely to belong to the same affine region, here the inside of the disc. This allows one to use a straightforward second-order regularizer such as the norm of the Hessian $H_u$, while preserving jumps. Right: Geodesic distance $d_{\mathcal{M}}$ from the marked point $x$ in the lower right. Points that are separated from $x$ by a strong edge have a large geodesic distance to $x$ and are therefore not included in the neighborhood of $x$ that is used to define the nonlocal Hessian at $x$.

4.2. Choosing the weights for jump preservation. A characteristic of nonlocal models is that they are extremely flexible due to the many degrees of freedom in choosing the weights.

In this work we will focus on the question of how to reconstruct images that are piecewise affine but may have jumps. The issue here is that one wants to keep the Hessian sparse in order to favor piecewise affine functions, but doing so in a straightforward way, such as by adding $|D^2u|(\Omega)$ as a regularizer, enforces too much first-order regularity [Sch98, LLT03, LT06, HS06].

There have been several attempts to overcome this issue, most notably approaches based on combined first- and higher-order functionals (see [PS14] and the references therein), infimal convolution [CL97], and TGV [BKP10, SST11]. Here we propose another strategy, making use of the nonlocal formulation (4.4). Note that for general fixed weights $\rho$, even in the implicit model, finiteness of $\int_\Omega |\mathcal{H}_\rho u|\,dx$ does not require the existence of $|D^2u|$, as Theorem 1.7 only concerns (specific) sequences of $\rho$.

We draw our motivation for choosing the weights partly from a recent discussion of nonlocal “amoeba” filters [LDM07, WBV11, Wel12]. Amoeba filters use classical techniques such as iterated median filtering, but the structuring element is chosen in a highly adaptive, local way that can follow the image structures, instead of being restricted to a small parametrized set of shapes. In the following we propose extending this idea to the higher-order energy minimization framework (Figure 2).

Given a noisy image $g : \Omega \to \mathbb{R}$, we first compute its (equally noisy) gradient image $\nabla g$ (in all of this section we consider only the discretized problem, so we can assume that the gradient exists). We then define the Riemannian manifold $\mathcal{M}$ on the points of $\Omega$ with the usual Riemannian metric, weighted by $\varphi(\nabla g)$ for some function $\varphi : \mathbb{R}^N \to \mathbb{R}_+$. In our experiments we used

$$\varphi(\nabla g) := |\nabla g|^2 + \gamma \tag{4.11}$$

for small $\gamma > 0$, but other choices are equally possible. The choice of $\gamma$ controls how strongly the edge information is taken into account: for $\gamma = 0$ the weights are fully anisotropic, while for large $\gamma$ the magnitude of the gradient becomes irrelevant and the method reduces to an isotropic regularization. With these definitions, the geodesic distance $d_{\mathcal{M}}$ between two points $x, y \in \Omega$ now has a powerful interpretation: if $d_{\mathcal{M}}(x, y)$ is large, this indicates that $x$ and $y$ are separated by a strong edge and should therefore not appear in the same regularization term. On the other hand, a small $d_{\mathcal{M}}(x, y)$ indicates that $x$ and $y$ are part of a more homogeneous region, and it is reasonable to assume that they should belong to the same affine piece of the reconstructed image.

For a given point $x \in \Omega$, we sort the neighbors $y_1, y_2, \dots$ in order of ascending distance, i.e., $d_{\mathcal{M}}(x, y_1) \le d_{\mathcal{M}}(x, y_2) \le \cdots$. We choose a neighborhood size $M \in \mathbb{N}$ and set the weights to

$$\sigma_x(y_i) := \begin{cases} 1, & i \le M, \\ 0, & i > M. \end{cases} \tag{4.12}$$

In other words, the nonlocal Hessian at x is computed using its M closest neighbors with respect to the geodesic distance through the gradient image.

The geodesic distances $d_{\mathcal{M}}(x, \cdot)$ can be efficiently computed using the fast marching method [Set99, OF03] by solving the eikonal equation

$$|\nabla c(y)| = \varphi(\nabla g(y)), \tag{4.13}$$
$$c(x) = 0, \tag{4.14}$$

and setting $d_{\mathcal{M}}(x, y) = c(y)$. Although it is necessary to perform this step for every point $x \in \Omega$, in practice it is a relatively cheap operation: the fast marching method visits the neighbors of $x$ in order of ascending distance $d_{\mathcal{M}}$, which means that it can be stopped after $M$ points, with $M$ usually between 5 and 20. If $M$ is chosen too small, one risks that the linear equation system that defines the nonlocal Hessian in (4.4) becomes underdetermined. In our experiments we found $M = 12$ to be a good compromise, but the choice does not appear to be a very critical one.
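A sketch of this adaptive neighborhood construction is given below (Python; the names are illustrative, and, for brevity, it uses Dijkstra's algorithm on the 4-connected pixel grid in place of the more accurate fast marching solver for (4.13)–(4.14), with the same early stopping after M points):

```python
import heapq
import numpy as np

def geodesic_neighbors(grad_mag, x, M=12, gamma=1e-2):
    """Return the M pixels closest to x in the geodesic distance d_M induced
    by phi = |grad g|^2 + gamma (4.11). `grad_mag` holds |grad g| per pixel.
    Edge costs average phi over the two endpoints; the front is grown in
    order of ascending distance and stopped after M points."""
    phi = grad_mag ** 2 + gamma
    H, W = phi.shape
    dist, heap, visited = {x: 0.0}, [(0.0, x)], []
    while heap and len(visited) < M + 1:              # +1: first pop is x itself
        d, (r, c) = heapq.heappop(heap)
        if d > dist.get((r, c), np.inf):
            continue                                   # stale heap entry
        visited.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y = (r + dr, c + dc)
            if 0 <= y[0] < H and 0 <= y[1] < W:
                nd = d + 0.5 * (phi[r, c] + phi[y])
                if nd < dist.get(y, np.inf):
                    dist[y] = nd
                    heapq.heappush(heap, (nd, y))
    return visited[1:]                                 # the M nearest y_1..y_M
```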

In the experiments we used the $L^1$-nonlocal TV$^2$ model

$$\min_{\substack{u\colon\Omega\to\mathbb{R},\ G_u\colon\Omega\to\mathbb{R}^N,\\ H_u\colon\Omega\to\mathrm{Sym}(\mathbb{R}^{N\times N})}}\ \sum_{x\in\Omega} |u(x) - g(x)|^p \;+\; \alpha \sum_{x\in\Omega} \omega(x)\,|H_u(x)| \tag{4.15}$$

$$\text{s.t.}\quad A\begin{pmatrix} G_u \\ H_u \end{pmatrix} = Bu, \tag{4.16}$$

where $\alpha > 0$ is the regularization strength and $p \in \{1, 2\}$. The linear constraints implement the optimality conditions for $G_u$ and $H_u$ from (4.4), similar to (4.7)–(4.8):

$$\sum_{y\in\Omega,\ z=y-x}\Big(u(y) - u(x) - G_u(x)^{\top}z - \tfrac{1}{2}z^{\top}H_u(x)z\Big)\,z\,\sigma_x(z) = 0, \qquad x \in \Omega, \tag{4.17}$$

$$\sum_{y\in\Omega,\ z=y-x}\Big(u(y) - u(x) - G_u(x)^{\top}z - \tfrac{1}{2}z^{\top}H_u(x)z\Big)\,zz^{\top}\,\sigma_x(z) = 0, \qquad x \in \Omega, \tag{4.18}$$


[Figure 3 panels: input, TV, TV2, TGV.]

Figure 3. Classical local regularization. The input consists of a disc-shaped slope with additive Gaussian noise, $\sigma = 0.25$. Shown is the result of denoising the input with an $L^1$ data term. Total variation (TV, $\alpha = 1.1$) regularization generates the well-known staircasing effect. Regularization of the Hessian (TV$^2$, $\alpha = 0.8$) avoids this problem, at the cost of oversmoothing jumps. Total generalized variation (TGV, $\alpha_0 = 1.5$, $\alpha_1 = 1.1$) performs best but still clips the slope at the top.

with a suitable parametrization of $H_u$ to enforce symmetry. Note that unlike the radially symmetric case in Lemma 4.2, the conditions do not necessarily decouple; therefore the first-order component $G_u$ needs to be included in the optimization as well.

The local weight $\omega$ is set as $\omega(x) = M / |\{y \in \Omega : B_{y,x} \neq 0\}|$. While the approach does work with uniform weights $\omega = 1$, we found that in some cases it can erroneously leave single outlier points intact. We believe that this is caused by a subtle issue: by construction of $\sigma$, outlier points are usually close neighbors to fewer points. Therefore they appear in fewer of the regularization terms $|H_u(x)|$, which effectively decreases the regularization strength at outliers.

The local weight ω counteracts this imbalance by dividing by the total number of terms in which a particular value u(x) appears.
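For completeness, a sketch of this reweighting (Python/NumPy; the data layout is an assumption of the sketch, with the chosen neighborhoods represented as a dictionary from each pixel to its M geodesic neighbors, e.g. as produced by the neighborhood sketch above):

```python
import numpy as np

def local_weights(neighborhoods, shape, M=12):
    """omega(x) = M / #(regularization terms in which u(x) appears):
    u(y) enters the term |Hu(x)| whenever y is in the neighborhood of x
    (or y = x), so appearances are counted by reversing the neighbor map."""
    counts = np.zeros(shape, dtype=int)
    for x, nbrs in neighborhoods.items():
        counts[x] += 1                                 # u(x) is in its own term
        for y in nbrs:
            counts[y] += 1
    counts[counts == 0] = 1                            # guard: isolated pixels
    return M / counts
```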

4.3. Numerical results. All experiments were performed on an Intel Xeon E5-2630 at 2.3 GHz with 64 GB of RAM, using MATLAB R2014a running on Scientific Linux 6.3, GCC 4.4.6, and Mosek 7. Run times ranged from several seconds for the geometric examples to several minutes for the full-size images, the majority of which was spent in the solution stage. The computation of the geodesic distances $d_{\mathcal{M}}$ using the fast marching method took only a few milliseconds in all cases, and the total preprocessing time, including building the sparse matrix structures $A$ and $B$, was less than 5 seconds for the full-size images.

The solution of the eikonal equation and the computation of the weights as well as of the system matrix use a custom C++ implementation. For solving the minimization problems we used the commercial interior-point based Mosek solver through the CVX interface. This allows us to efficiently compute solutions with very high accuracy and therefore to evaluate the model without the risk of accidentally comparing only approximate solutions. The stopping criterion was set to guarantee that the normalized difference between the objective value $\hat f$ of the numerical solution and the objective value $f^*$ of the true minimizer satisfies $(\hat f - f^*)/f^* \le 1.5 \cdot 10^{-8}$. Alternatively, nonsmooth first-order methods could be used; however, in our experience they become prohibitively slow for a precision beyond $10^{-4}$ due to their sublinear convergence rate.

Figure 3 illustrates the effect of several classical local regularizers, including TV, TV$^2$, and TGV. As expected, TV generates the well-known staircasing effect, while TV$^2$ leads to oversmoothing of the jumps. TGV with hand-tuned parameters performs reasonably well; however, it exhibits a characteristic pattern of clipping sharp local extrema. This behavior has also been analytically confirmed in [PB13, PS13] for the one-dimensional case.


[Figure 4 panels: input, TGV, nonlocal Hessian (proposed).]

Figure 4. Nonlocal regularization of the problem in Figure 3. The adaptive choice of the neighborhood and weights together with the nonlocal Hessian preserves the jumps as well as the top of the slope, and allows one to perfectly reconstruct the piecewise affine signal.

Figure 4 shows the effect of our nonlocal Hessian-based method for the same problem.

The piecewise affine signal is almost perfectly recovered. This is mostly due to the fact that the jumps are relatively large, which means that after computing the neighborhoods the circular region and the background are not coupled in terms of regularization. Therefore the regularization weight α can be chosen very large, which results in virtually affine regions.

To see what happens with smaller jumps, we generated a pattern of opposing slopes (Figure 5). As expected, both TGV and the nonlocal Hessian approach struggle when the jump is small. This shows the limitations of our approach for choosing the weights: while it adds some structural information to the regularizer, this information is still restricted to a certain neighborhood of each point and does not take the full global structure into account.

Figures 6–8 show a quantitative comparison of our nonlocal method with the results of the TGV approach and several classical regularizers. The parameters for each method were chosen by a grid search to yield the best PSNR. For all images we also provide a more realistic “structural similarity index” (SSIM) [WBSS04]. The nonlocal approach improves on TGV with respect to PSNR (34.09 vs. 33.28). However, it is interesting to look at the spatial distribution of the error: the parts where the nonlocal approach improves are exactly the local maxima, which are smoothed over by TGV. Surprisingly, this is hardly visible when looking at the images only (Figure 6), which is in line with the good results obtained with TGV for natural images. However, this property might become a problem when the data is not a natural image, for example in the case of depth images or digital elevation maps. We refer the reader to [LBL13] for a discussion of a problem that is more demanding in terms of the correct choice of the regularization.

Finally, Figure 9 shows an application to the “cameraman” image with L2 data term.

Small details on the camera as well as long, thin structures such as the camera handle and the highlights on the tripod are well preserved. In larger, more homogeneous areas the result is what would be expected from second-order smoothness.

In its basic form, our approach is data-driven; i.e., the weights are computed directly


[Figure 5 panels: input, TGV low, TGV high, nonlocal Hessian.]

Figure 5. Denoising results for the “opposing slopes” image. The small jump at the crossing causes a slight amount of smoothing for both the TGV and the nonlocal approaches. TGV with low regularization strength (TGV low, $\alpha_0 = \alpha_1 = 0.8$) does reasonably well at preserving the strong jump on the right but does not
