https://doi.org/10.1007/s10092-019-0340-5

**Signal reconstruction by conjugate gradient algorithm based on smoothing $l_1$-norm**

**Caiying Wu**^{1} · **Jiaming Zhan**^{1} · **Yue Lu**^{2} · **Jein-Shan Chen**^{3}

Received: 12 January 2019 / Accepted: 9 October 2019

© Istituto di Informatica e Telematica (IIT) 2019

**Abstract**

The $l_1$-norm regularized minimization problem is a non-differentiable problem with a wide range of applications in the field of compressive sensing. Many approaches have been proposed in the literature. Among them, smoothing the $l_1$-norm is one of the effective approaches. This paper follows this path, in which we adopt six smoothing functions to approximate the $l_1$-norm. We then recast the signal recovery problem as a smoothing penalized least squares optimization problem and apply the nonlinear conjugate gradient method to solve the smoothing model. The algorithm is shown to be globally convergent. In addition, the simulation results not only suggest some nice smoothing functions, but also show that the proposed algorithm is competitive in view of relative error.

**Keywords** $l_1$-norm regularization · Compressive sensing · Conjugate gradient algorithm · Smoothing function

**Mathematics Subject Classification 90C33**

Caiying Wu: The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016)

Jiaming Zhan: The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016)

Yue Lu: The research is supported by the National Natural Science Foundation of China (Grant No. 11601389), the Doctoral Foundation of Tianjin Normal University (Grant No. 52XB1513), the 2017-Outstanding Young Innovation Team Cultivation Program of Tianjin Normal University (Grant No. 135202TD1703), and the Program for Innovative Research Team in Universities of Tianjin (Grant No. TD13-5078).

Jein-Shan Chen: The research is supported by Ministry of Science and Technology, Taiwan.

Corresponding author: Jein-Shan Chen, jschen@math.ntnu.edu.tw. Extended author information available on the last page of the article.

**1 Introduction**

In this paper, we focus on the topic of reconstructing signals, which is one of the most important applications in compressive sensing [8,9,16]. There exist numerous textbooks and articles related to this topic, so there is no need to repeat its importance and applications here. Therefore, we turn directly to its mathematical model and convey our new idea for tackling this problem. Mathematically, the noise-free signal reconstruction model is described as follows:

$$\min_x \ \|x\|_0 \quad \text{s.t.} \quad b = Ax, \tag{1}$$

where $x \in \mathbb{R}^n$ is the original sparse signal that needs to be recovered, $A \in \mathbb{R}^{m \times n}$ ($m \ll n$) is the measurement matrix, $b \in \mathbb{R}^m$ is the observation vector, and $\|x\|_0$ represents the $l_0$-norm of $x$, defined as the number of nonzero components of $x$. Without loss of generality, we assume that there exists a positive constant $K \le n$ such that $\|x\|_0 = K$. Besides this condition, if the measurement matrix $A$ satisfies the Restricted Isometry Property (RIP) of order $K$, then we can recover the signal $x$ more accurately through the model (1); see [6–10] for more details.

Unfortunately, the $l_0$-norm minimization problem (1) is an NP-hard combinatorial optimization problem. In order to avoid this difficulty, researchers replace the $l_0$-norm by the $l_1$-norm in model (1) and obtain the following $l_1$ minimization problem

$$\min_x \ \|x\|_1 \quad \text{s.t.} \quad b = Ax, \tag{2}$$

where $\|x\|_1$ denotes the $l_1$-norm of $x$ and $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. Under the RIP condition, the model (2) has the same solution as (1). In practice, the probed signal $b$ is usually impacted by noise, which motivates investigation of the noisy signal reconstruction problem:

$$\min_x \ \|x\|_1 \quad \text{s.t.} \quad b = Ax + e, \tag{3}$$

where $e \in \mathbb{R}^m$ denotes the noise. In order to deal with (3), researchers prefer to consider the associated penalized least squares optimization problem

$$\min_x \ \lambda \|x\|_1 + \frac{1}{2}\|b - Ax\|_2^2, \tag{4}$$

where $\lambda > 0$ is the penalty factor. In the sequel, we call (4) the $l_1$-norm regularized problem.

To date, there are plenty of numerical algorithms for solving the model (4), and some first-order algorithms have drawn much attention during the past decades. Due to the huge amount of literature, we only outline and recall some approaches below. The gradient projection algorithm [18] is one of the earliest gradient-based algorithms for solving (4); there, the problem was reformulated as a box-constrained quadratic program and solved by gradient projection. More recently, the most extensively investigated first-order algorithm for the solution of (4) has been the iterative shrinkage/thresholding (IST) algorithm [15,21]. These results triggered many contributions based on this method [17,24,25]. In light of the IST algorithm, many variants have been proposed under different optimization reformulations and techniques. For instance, Hale, Yin and Zhang [19,20] presented an IST fixed-point continuation (FPC) method based on an operator splitting technique, and Wright, Nowak and Figueiredo [29] studied SpaRSA for sparse reconstruction by solving a nonsmooth convex problem.

Experimental results show that the accelerated IST methods (two-step IST [4] and fast IST [1]) have better convergence properties. In addition, the well-known NESTA [2] first proposed a smoothing function of the $l_1$-norm and then used Nesterov's gradient method to obtain the solution of (3). Besides these methods, the gradient projection technique and alternating direction ideas have also been considered; see [3,5,30] for more details.

Because of their simplicity and low storage requirements, conjugate gradient algorithms are suitable for large-scale problems. In this paper, like [28,31,33], we are interested in applying the conjugate gradient method to signal recovery. We use six smoothing functions, which are studied in [23,27] for the absolute value function, to approximate the $l_1$-norm. Accordingly, we reformulate the signal recovery problem as a smoothing penalized least squares optimization problem and apply the nonlinear conjugate gradient method [32] to solve the smoothing model. The proposed algorithm is shown to be globally convergent. Moreover, we report some numerical experiments to demonstrate the effectiveness of our method. Numerical comparisons with "NESTA", "FPCBB", and "FISTA", which are well-known open-source software packages, are presented as well.

**2 Preliminary**

This section recalls some ideas about the smoothing technique, which can be found in [11,12,23] and the references therein.

It is well known that the absolute value function $|t|$ ($t \in \mathbb{R}$) is not smooth at zero. In order to overcome this difficulty, we introduce some smoothing functions of $|t|$ used in the sequel.

**Definition 2.1** The function $\psi : \mathbb{R}_{++} \times \mathbb{R} \to \mathbb{R}$ is a smoothing function of $|t|$ if the following two conditions hold:

(a) $\psi$ is continuously differentiable at $(\mu, t) \in \mathbb{R}_{++} \times \mathbb{R}$;

(b) $\lim_{\mu \downarrow 0} \psi(\mu, t) = |t|$ for any $t \in \mathbb{R}$.

How do we construct smoothing functions for $|t|$? First, we observe that $|t|$ can be divided into two parts:

$$|t| = (t)_+ - (t)_- = (t)_+ + (-t)_+,$$

where $(t)_+$ denotes the plus function, i.e., $(t)_+ = \max\{0, t\}$, and $(t)_- = \min\{0, t\}$. As mentioned in [11,12,23], one can follow the procedure below to construct a smoothing function for $(t)_+$. More specifically, by introducing a density (kernel) function $d(t)$ with a finite number of pieces satisfying

$$d(t) \ge 0 \quad \text{and} \quad \int_{-\infty}^{+\infty} d(t)\, dt = 1,$$

one can define

$$\hat{s}(t, \mu) := \frac{1}{\mu}\, d\!\left(\frac{t}{\mu}\right),$$

where $\mu$ is a positive parameter. If the following condition holds:

$$\int_{-\infty}^{+\infty} |t|\, d(t)\, dt < +\infty,$$

then the function $\hat{p}(t, \mu)$ defined as

$$\hat{p}(t, \mu) = \int_{-\infty}^{+\infty} (t - s)_+\, \hat{s}(s, \mu)\, ds = \int_{-\infty}^{t} (t - s)\, \hat{s}(s, \mu)\, ds \approx (t)_+$$

is a smoothing approximation for $(t)_+$. There exist well-known smoothing functions for the plus function [11,22,23], for example,

$$\hat\psi_1(\mu, t) = t + \mu \ln\left(1 + e^{-t/\mu}\right), \tag{5}$$

$$\hat\psi_2(\mu, t) = \begin{cases} t & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \frac{1}{2\mu}\left(t + \frac{\mu}{2}\right)^2 & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] 0 & \text{if } t \le -\frac{\mu}{2}, \end{cases} \tag{6}$$

$$\hat\psi_3(\mu, t) = \frac{\sqrt{4\mu^2 + t^2} + t}{2}, \tag{7}$$

$$\hat\psi_4(\mu, t) = \begin{cases} t - \frac{\mu}{2} & \text{if } t > \mu, \\[2pt] \frac{t^2}{2\mu} & \text{if } 0 \le t \le \mu, \\[2pt] 0 & \text{if } t < 0, \end{cases} \tag{8}$$

where the corresponding kernel functions are respectively given by

$$d_1(t) = \frac{e^{-t}}{(1 + e^{-t})^2}, \qquad d_2(t) = \begin{cases} 1 & \text{if } -\frac{1}{2} \le t \le \frac{1}{2}, \\ 0 & \text{otherwise}, \end{cases}$$

$$d_3(t) = \frac{2}{(t^2 + 4)^{3/2}}, \qquad d_4(t) = \begin{cases} 1 & \text{if } 0 \le t \le 1, \\ 0 & \text{otherwise}. \end{cases}$$

Likewise, we can obtain the smoothing function of $|t|$ via convolution as follows:

$$\hat{p}(|t|, \mu) = \hat{p}(t, \mu) + \hat{p}(-t, \mu) = \int_{-\infty}^{+\infty} |t - s|\, \hat{s}(s, \mu)\, ds.$$

Analogous to (5)–(8), we construct four smoothing functions for $|t|$ as in [23]:

$$\psi_1(\mu, t) = \mu\left[\ln\left(1 + e^{-t/\mu}\right) + \ln\left(1 + e^{t/\mu}\right)\right], \tag{9}$$

$$\psi_2(\mu, t) = \begin{cases} t & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \frac{t^2}{\mu} + \frac{\mu}{4} & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] -t & \text{if } t \le -\frac{\mu}{2}, \end{cases} \tag{10}$$

$$\psi_3(\mu, t) = \sqrt{4\mu^2 + t^2}, \tag{11}$$

$$\psi_4(\mu, t) = \begin{cases} \frac{t^2}{2\mu} & \text{if } |t| \le \mu, \\[2pt] |t| - \frac{\mu}{2} & \text{if } |t| > \mu. \end{cases} \tag{12}$$
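As a sanity check on the convolution construction, one can verify numerically that convolving $|t|$ with the uniform kernel $d_2$ reproduces the middle piece of $\psi_2$ in (10). This is our own illustrative check via midpoint-rule quadrature, not part of the paper:

```python
# For the uniform kernel d_2, s_hat(s, mu) = 1/mu on [-mu/2, mu/2], and the
# convolution integral of |t - s| * s_hat(s, mu) over s should equal
# t^2/mu + mu/4 whenever |t| < mu/2 (the middle piece of psi_2 in (10)).
mu, t = 0.8, 0.1
N = 200_000
h = mu / N
conv = sum(abs(t - (-mu / 2 + (i + 0.5) * h)) * (1.0 / mu) * h for i in range(N))
closed_form = t * t / mu + mu / 4  # psi_2(mu, t) for |t| < mu/2
assert abs(conv - closed_form) < 1e-6
```

The same kind of check works for the other kernel/smoothing pairs, since each $\psi_i$ arises from its $d_i$ by exactly this convolution.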

In particular, if we take the Epanechnikov kernel function

$$d(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \le 1, \\ 0 & \text{otherwise}, \end{cases}$$

then the associated smoothing function for $|t|$ is expressed by

$$\psi_5(\mu, t) = \begin{cases} t & \text{if } t > \mu, \\[2pt] -\frac{t^4}{8\mu^3} + \frac{3t^2}{4\mu} + \frac{3\mu}{8} & \text{if } -\mu \le t \le \mu, \\[2pt] -t & \text{if } t < -\mu. \end{cases} \tag{13}$$

Moreover, taking the Gaussian kernel function $d(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}$ for all $t \in \mathbb{R}$ yields

$$\hat{s}(t, \mu) := \frac{1}{\mu}\, d\!\left(\frac{t}{\mu}\right) = \frac{1}{\sqrt{2\pi}\,\mu}\, e^{-\frac{t^2}{2\mu^2}},$$

which leads to another type of smoothing function [27] for $|t|$:

$$\psi_6(\mu, t) = t\, \mathrm{erf}\!\left(\frac{t}{\sqrt{2}\,\mu}\right) + \sqrt{\frac{2}{\pi}}\,\mu\, e^{-\frac{t^2}{2\mu^2}}, \tag{14}$$

where the error function is defined as

$$\mathrm{erf}(t) = \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\, du \quad \forall\, t \in \mathbb{R}.$$

In summary, we have obtained six smoothing functions and will employ them to approximate the $l_1$-norm for the signal recovery problem.
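To make the six functions concrete, here is a sketch implementation (ours, not the authors' code). Note that $\psi_1$ is written in the algebraically equivalent form $|t| + 2\mu\ln(1 + e^{-|t|/\mu})$, since the naive formula (9) overflows for small $\mu$, which foreshadows the instability of $\psi_1$ reported in Sect. 4:

```python
import math

def psi1(mu, t):
    # Equivalent to mu*[ln(1+exp(-t/mu)) + ln(1+exp(t/mu))] from (9),
    # rewritten in a numerically stable way.
    return abs(t) + 2 * mu * math.log(1 + math.exp(-abs(t) / mu))

def psi2(mu, t):  # (10)
    return abs(t) if abs(t) >= mu / 2 else t * t / mu + mu / 4

def psi3(mu, t):  # (11)
    return math.sqrt(4 * mu * mu + t * t)

def psi4(mu, t):  # (12)
    return t * t / (2 * mu) if abs(t) <= mu else abs(t) - mu / 2

def psi5(mu, t):  # (13), Epanechnikov kernel
    if abs(t) > mu:
        return abs(t)
    return -t**4 / (8 * mu**3) + 3 * t**2 / (4 * mu) + 3 * mu / 8

def psi6(mu, t):  # (14), Gaussian kernel
    return t * math.erf(t / (math.sqrt(2) * mu)) \
        + math.sqrt(2 / math.pi) * mu * math.exp(-t**2 / (2 * mu**2))

# Definition 2.1(b): each psi_i(mu, t) approaches |t| as mu decreases to 0.
for psi in (psi1, psi2, psi3, psi4, psi5, psi6):
    assert abs(psi(1e-6, 0.3) - 0.3) < 1e-4
```

Each function is smooth in $t$ for fixed $\mu > 0$ and reduces to $|t|$ in the limit, which is exactly what the penalized model of Sect. 3 exploits.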

**3 Algorithm and convergence properties**

This section is devoted to the detailed description and implementation of our algorithmic idea and its global convergence results. Basically, it is a type of conjugate gradient algorithm. To proceed, we briefly review the conjugate gradient method. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function and the target problem is

$$\min_{x \in \mathbb{R}^n} f(x).$$

Let $d^k$ denote the search direction and $\alpha_k$ a step length. Then, the general framework of the conjugate gradient method is as follows:

$$x^{k+1} = x^k + \alpha_k d^k, \qquad d^k = \begin{cases} -\nabla f(x^k) & \text{if } k = 0, \\ -\nabla f(x^k) + \beta_k d^{k-1} & \text{if } k > 0, \end{cases} \tag{15}$$

where $\nabla f$ is the gradient of $f$ and $\beta_k$ is a parameter. In general, different choices of the parameter $\beta_k$ correspond to different conjugate gradient algorithms. Among them, the three-term PRP (Polak–Ribière–Polyak) conjugate gradient method is one of the most efficient, where

$$d^k = \begin{cases} -\nabla f(x^k) & \text{if } k = 0, \\ -\nabla f(x^k) + \beta_k d^{k-1} - \theta_k y^{k-1} & \text{if } k > 0, \end{cases}$$

$$y^{k-1} = \nabla f(x^k) - \nabla f(x^{k-1}), \qquad \beta_k = \frac{(\nabla f(x^k))^T y^{k-1}}{\|\nabla f(x^{k-1})\|^2}, \qquad \theta_k = \frac{(\nabla f(x^k))^T d^{k-1}}{\|\nabla f(x^{k-1})\|^2}. \tag{16}$$

A key feature of this conjugate gradient method is that it ensures the sufficient descent property of the search direction at each iteration, which plays a prominent role in the global convergence analysis. For more details, please refer to [32] and references therein.
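The sufficient descent property can be seen directly from (16): substituting the formulas for $\beta_k$ and $\theta_k$ gives $(\nabla f(x^k))^T d^k = -\|\nabla f(x^k)\|^2$ identically, since the $\beta_k$ and $\theta_k$ terms cancel. A small sketch (our own illustration; the function name is ours):

```python
import numpy as np

def three_term_prp_direction(g, g_prev, d_prev):
    """Three-term PRP direction from (16):
    d = -g + beta * d_prev - theta * y, with y = g - g_prev."""
    y = g - g_prev
    denom = g_prev @ g_prev
    beta = (g @ y) / denom
    theta = (g @ d_prev) / denom
    return -g + beta * d_prev - theta * y

# Sufficient descent holds exactly: g^T d = -||g||^2, regardless of the data,
# because beta*(g^T d_prev) and theta*(g^T y) cancel term by term.
rng = np.random.default_rng(0)
g, g_prev, d_prev = (rng.normal(size=5) for _ in range(3))
d = three_term_prp_direction(g, g_prev, d_prev)
assert abs(g @ d + g @ g) < 1e-10
```

This exact identity is what makes the line search in Step 2 below well defined at every iteration.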

For convenience, we denote

$$f_\mu(x) := \lambda \phi_i(\mu, x) + \frac{1}{2}\|b - Ax\|_2^2, \quad x \in \mathbb{R}^n, \tag{17}$$

where $\phi_i(\mu, x)$ is a smoothing function of the $l_1$-norm $\|x\|_1$. In other words, it takes the form

$$\phi_i(\mu, x) := \sum_{j=1}^{n} \psi_i(\mu, x_j), \quad i = 1, 2, 3, 4, 5, 6. \tag{18}$$

Thus, the problem (4) is approximated by the smoothing penalized least squares optimization problem

$$\min_{x \in \mathbb{R}^n} f_\mu(x), \tag{19}$$

which is our new target problem. To deal with the unconstrained minimization (19), the following iterative scheme is employed; see [32].

*The Algorithm*

*Step 0* Given a starting point $x^0 \in \mathbb{R}^n$ and $\mu_0 > 0$, choose parameters $\rho, \delta, \gamma, \bar\gamma \in (0, 1)$. Let $d^0 = -\nabla f_{\mu_0}(x^0)$. Set $k = 0$.

*Step 1* If the termination criterion is reached, then stop.

*Step 2* Determine $\alpha_k = \max\{\rho^j \mid j = 0, 1, 2, \ldots\}$ satisfying

$$f_{\mu_k}(x^k + \alpha_k d^k) \le f_{\mu_k}(x^k) - 2\delta(1 - \gamma)\alpha_k f_{\mu_k}(x^k).$$

*Step 3* Let $x^{k+1} = x^k + \alpha_k d^k$ and $\mu_{k+1} = \bar\gamma \mu_k$. Replace $f$ with $f_{\mu_{k+1}}$ in formula (16) and compute $d^{k+1}$ by (16).

*Step 4* Set $k = k + 1$ and return to Step 1.

We also say a few words about how to update $\mu_k$ in Step 3. For Experiment 1, $\mu_k$ is updated at each iteration by $\mu_{k+1} = \bar\gamma \mu_k$ with $\bar\gamma \in (0, 1)$, and we select $\bar\gamma = 0.4$. For Experiments 2 and 3, an inner loop is used to solve our relaxed problem for fixed $\mu_k$, after which $\mu_k$ is reduced; this is indeed a technique that appeared in NESTA. When $\mu_k$ is small enough, our algorithm no longer updates the parameter $\mu_k$.
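Putting the pieces together, the following is a compact sketch of the whole scheme (Steps 0-4 with the three-term PRP direction (16) and the $\psi_2$ smoothing). It is our own illustration under the parameter choices reported in Sect. 4, not the authors' MATLAB code; the floor on $\mu_k$ mimics the remark that $\mu_k$ is no longer updated once it is small enough:

```python
import numpy as np

def phi2(mu, x):
    """phi_2(mu, x) = sum_j psi_2(mu, x_j), the smoothed l1-norm, cf. (10), (18)."""
    return np.sum(np.where(np.abs(x) < mu / 2, x**2 / mu + mu / 4, np.abs(x)))

def phi2_grad(mu, x):
    return np.where(np.abs(x) < mu / 2, 2 * x / mu, np.sign(x))

def smoothing_cg(A, b, lam, mu0=0.1, gamma_bar=0.4, mu_min=1e-3,
                 rho=0.5, delta=0.002, gamma=0.2, max_iter=500):
    f = lambda x, mu: lam * phi2(mu, x) + 0.5 * np.sum((b - A @ x) ** 2)
    grad = lambda x, mu: lam * phi2_grad(mu, x) + A.T @ (A @ x - b)
    x, mu = np.zeros(A.shape[1]), mu0                       # Step 0
    g = grad(x, mu)
    d = -g
    for _ in range(max_iter):
        alpha = 1.0                                         # Step 2: backtracking
        while f(x + alpha * d, mu) > f(x, mu) - 2 * delta * (1 - gamma) * alpha * f(x, mu):
            alpha *= rho
            if alpha < 1e-12:
                break
        x_new = x + alpha * d                               # Step 3
        mu = max(gamma_bar * mu, mu_min)                    # shrink mu, with a floor
        g_new = grad(x_new, mu)
        y = g_new - g
        denom = g @ g
        d = -g_new + ((g_new @ y) / denom) * d - ((g_new @ d) / denom) * y
        x, g = x_new, g_new                                 # Step 4
    return x

# Hypothetical small noise-free test: a 2-sparse signal, m = n/2 Gaussian rows.
rng = np.random.default_rng(1)
n, m = 100, 50
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[3, 42]] = [1.0, -2.0]
b = A @ x_true
x_hat = smoothing_cg(A, b, lam=0.001 * np.max(np.abs(A.T @ b)))
rel = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(rel)  # small relative error on this easy instance
```

A production implementation would add the termination tests of Sect. 4 instead of a fixed iteration budget, but the sketch already shows the interplay between the line search, the direction update, and the shrinking smoothing parameter.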

**Lemma 3.1** *Suppose that the function $f_\mu(x)$ is defined by (17). Then there exists a constant $L > 0$ such that $\|\nabla f_\mu(x) - \nabla f_\mu(y)\| \le L\|x - y\|$ for all $x, y \in \mathbb{R}^n$.*

**Proof** For any fixed $\mu > 0$, in order to prove the Lipschitz property of $\nabla f_\mu(x)$, we need to verify the Lipschitz condition of $\psi_i'$ ($i = 1, 2, 3, 4, 5, 6$). To this end, we discuss two cases.

*Case (i):* $i = 1, 3, 5, 6$. For any $t_1, t_2 \in \mathbb{R}$, without loss of generality let $t_1 < t_2$. By the Lagrange mean value theorem, we have

$$\left|\psi_i'(\mu, t_1) - \psi_i'(\mu, t_2)\right| = \left|\psi_i''(\mu, \xi)\right| |t_1 - t_2|, \quad \xi \in (t_1, t_2).$$

For the subsequent analysis, we need to estimate $\left|\psi_i''(\mu, \xi)\right|$ for each $i = 1, 3, 5, 6$.

For $i = 1$, we know that

$$\left|\psi_1''(\mu, \xi)\right| = \frac{1}{\mu}\left[\frac{e^{\xi/\mu}}{(1 + e^{\xi/\mu})^2} + \frac{e^{-\xi/\mu}}{(1 + e^{-\xi/\mu})^2}\right] < \frac{2}{\mu}.$$

For $i = 3$, it is clear that $\left|\psi_3''(\mu, \xi)\right| = \dfrac{4\mu^2}{(4\mu^2 + \xi^2)^{3/2}} \le \dfrac{1}{2\mu}$.

For $i = 5$, we have

$$\psi_5'(\mu, t) = \begin{cases} 1 & \text{if } t > \mu, \\[2pt] -\frac{t^3}{2\mu^3} + \frac{3t}{2\mu} & \text{if } -\mu \le t \le \mu, \\[2pt] -1 & \text{if } t < -\mu, \end{cases} \qquad \psi_5''(\mu, t) = \begin{cases} 0 & \text{if } t > \mu, \\[2pt] -\frac{3t^2}{2\mu^3} + \frac{3}{2\mu} & \text{if } -\mu \le t \le \mu, \\[2pt] 0 & \text{if } t < -\mu, \end{cases}$$

which yields

$$\left|\psi_5''(\mu, \xi)\right| < \frac{3\mu^2}{2\mu^3} + \frac{3}{2\mu} = \frac{3}{\mu}.$$
For $i = 6$, we compute

$$\psi_6'(\mu, t) = \frac{2}{\sqrt{\pi}} \int_0^{\frac{t}{\sqrt{2}\,\mu}} e^{-u^2}\, du, \qquad \psi_6''(\mu, t) = \frac{\sqrt{2}}{\sqrt{\pi}\,\mu}\, e^{-\frac{t^2}{2\mu^2}},$$

which imply $\left|\psi_6''(\mu, \xi)\right| \le \dfrac{\sqrt{2}}{\sqrt{\pi}\,\mu}$.

All the aforementioned results indicate that

$$\left|\psi_i'(\mu, t_1) - \psi_i'(\mu, t_2)\right| \le \frac{3}{\mu}|t_1 - t_2|, \quad i = 1, 3, 5, 6, \tag{20}$$

for any $t_1, t_2 \in \mathbb{R}$.

*Case (ii):* $i = 2, 4$. Indeed, we will provide a version like (20) for $\psi_2'$ and $\psi_4'$.

For $i = 2$, we know that

$$\psi_2'(\mu, t) = \begin{cases} 1 & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \frac{2t}{\mu} & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] -1 & \text{if } t \le -\frac{\mu}{2}. \end{cases}$$

If $t_1 \ge \frac{\mu}{2},\, t_2 \ge \frac{\mu}{2}$; or $t_1 \le -\frac{\mu}{2},\, t_2 \le -\frac{\mu}{2}$; or $t_1, t_2 \in \left(-\frac{\mu}{2}, \frac{\mu}{2}\right)$, then

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| \le \frac{2}{\mu}|t_1 - t_2|.$$

If $t_1 \ge \frac{\mu}{2}$ and $t_2 \le -\frac{\mu}{2}$, then, since $|t_1 - t_2| \ge \mu$,

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| = 2 = \frac{2}{\mu}\,\mu \le \frac{2}{\mu}|t_1 - t_2|.$$

If $t_1 \ge \frac{\mu}{2}$ and $t_2 \in \left(-\frac{\mu}{2}, \frac{\mu}{2}\right)$, then

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| = 1 - \frac{2t_2}{\mu} \le \frac{2t_1}{\mu} - \frac{2t_2}{\mu} = \frac{2}{\mu}|t_1 - t_2|.$$

If $t_1 \le -\frac{\mu}{2}$ and $t_2 \in \left(-\frac{\mu}{2}, \frac{\mu}{2}\right)$, then

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| = 1 + \frac{2t_2}{\mu} \le -\frac{2t_1}{\mu} + \frac{2t_2}{\mu} = \frac{2}{\mu}|t_1 - t_2|.$$

Thus, it is clear to conclude that

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| \le \frac{2}{\mu}|t_1 - t_2|, \quad \forall\, t_1, t_2 \in \mathbb{R}.$$

For $i = 4$, using similar arguments as in the case $i = 2$, it can be verified that

$$\left|\psi_4'(\mu, t_1) - \psi_4'(\mu, t_2)\right| \le \frac{1}{\mu}|t_1 - t_2|, \quad \forall\, t_1, t_2 \in \mathbb{R}.$$

In summary, we also achieve

$$\left|\psi_i'(\mu, t_1) - \psi_i'(\mu, t_2)\right| \le \frac{2}{\mu}|t_1 - t_2|, \quad i = 2, 4, \tag{21}$$

for any $t_1, t_2 \in \mathbb{R}$.

Now, applying (20) and (21), for any $x, y \in \mathbb{R}^n$, we have

$$\begin{aligned}
\left\|\nabla f_\mu(x) - \nabla f_\mu(y)\right\| &= \left\|\lambda\left[\nabla\phi_i(\mu, x) - \nabla\phi_i(\mu, y)\right] + A^T(Ax - b) - A^T(Ay - b)\right\| \\
&\le \frac{3\lambda n}{\mu}\|x - y\| + \|A\|^2\|x - y\| \\
&= L\|x - y\|,
\end{aligned}$$

where $L = \frac{3\lambda n}{\mu} + \|A\|^2$. Thus, the proof is complete. $\square$

**Lemma 3.2** *For $\mu > 0$, the level set $\mathcal{L}(x^0) = \{x \in \mathbb{R}^n \mid f_\mu(x) \le f_\mu(x^0)\}$ is bounded.*

**Proof** We prove it by contradiction. Suppose that the set $\mathcal{L}(x^0)$ is unbounded. Then there exists an index set $K_1$ such that $\|x^k\| \to \infty$ as $k \to \infty$, $k \in K_1$. Recalling the definition of $f_\mu(x)$, we have $f_\mu(x^k) \to \infty$ as $k \to \infty$, $k \in K_1$. This contradicts $f_\mu(x^k) \le f_\mu(x^0)$. Thus, the level set $\mathcal{L}(x^0)$ is bounded. $\square$

In fact, by the continuity of $f_\mu(x)$, the level set $\mathcal{L}(x^0)$ is compact.

We point out that Lemmas 3.1 and 3.2 play key roles in our theoretical analysis. They were imposed as assumptions in [32] and other literature such as [14]. Here we establish them by showing that the function (17), based on the proposed smoothing functions, satisfies these assumptions. With these properties, we are ready to present the global convergence of our algorithm.
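Lemma 3.1 can also be spot-checked numerically. The sketch below (our own check, with random hypothetical data) verifies that the gradient variation of $f_\mu$ built from $\psi_2$ never exceeds $L\|x - y\|$ with $L = \frac{3\lambda n}{\mu} + \|A\|^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, mu, lam = 20, 40, 0.05, 0.01
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Gradient of f_mu in (17) with the psi_2 smoothing of the l1 term.
grad = lambda x: lam * np.where(np.abs(x) < mu / 2, 2 * x / mu, np.sign(x)) \
    + A.T @ (A @ x - b)

# Lipschitz constant from the proof of Lemma 3.1 (||A|| is the spectral norm).
L = 3 * lam * n / mu + np.linalg.norm(A, 2) ** 2

for _ in range(100):
    x, y = rng.normal(size=n), rng.normal(size=n)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y)
```

The check passes with a wide margin, reflecting that the constant $\frac{3\lambda n}{\mu}$ in the proof is a generous upper bound on the separable smoothing term.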

**Theorem 3.1** *For any $\mu > 0$, consider the aforementioned algorithm with any starting point $x^0$. Let $\{x^k\}$ be the sequence generated by the algorithm. Then*

$$\liminf_{k\to\infty} \|\nabla f_\mu(x^k)\| = 0. \tag{22}$$

**Proof** Suppose that the conclusion (22) is not true. Then there exists a constant $\varepsilon_0 > 0$ such that

$$\|\nabla f_\mu(x^k)\| \ge \varepsilon_0, \quad \forall\, k. \tag{23}$$

Since $(\nabla f_\mu(x^k))^T d^k = -\|\nabla f_\mu(x^k)\|^2$, there exists $\alpha_k > 0$ such that

$$f_\mu(x^k + \alpha_k d^k) \le f_\mu(x^k) - 2\delta(1 - \gamma)\alpha_k f_\mu(x^k). \tag{24}$$

This means $\{f_\mu(x^k)\}$ is decreasing and bounded, which implies that $x^k \in \mathcal{L}(x^0)$ and $\{f_\mu(x^k)\}$ is convergent. We denote $\lim_{k\to\infty} f_\mu(x^k) = f_*$. From (24), it can be verified that $\lim_{k\to\infty} \alpha_k f_\mu(x^k) = 0$. By the definition of $f_\mu(x)$, we know $f_* > 0$, and therefore

$$0 < \alpha_k = \frac{\alpha_k f_\mu(x^k)}{f_\mu(x^k)} \le \frac{\alpha_k f_\mu(x^k)}{f_*}.$$

To sum up, we obtain

$$\lim_{k\to\infty} \alpha_k = 0. \tag{25}$$

Now, applying Lemma 3.2, there is a constant $r > 0$ such that

$$\|\nabla f_\mu(x^k)\| \le r, \quad \forall\, k. \tag{26}$$

Next, we prove that $\{d^k\}$ is bounded by a contradiction argument. If $\{d^k\}$ is unbounded, then there exists an index set $K_2$ such that $\|d^k\| \to \infty$ as $k \to \infty$, $k \in K_2$. Let $\theta_k$ be the angle between $-\nabla f_\mu(x^k)$ and $d^k$. Then, we see that

$$\cos\theta_k = \frac{-(\nabla f_\mu(x^k))^T d^k}{\|\nabla f_\mu(x^k)\|\,\|d^k\|} = \frac{\|\nabla f_\mu(x^k)\|}{\|d^k\|}. \tag{27}$$

This relation enables us to apply $\varepsilon_0 \le \|\nabla f_\mu(x^k)\| \le r$ to conclude that $\cos\theta_k \to 0$ as $k \to \infty$, $k \in K_2$, which means $\theta_k \to \frac{\pi}{2}$ as $k \to \infty$, $k \in K_2$. Considering this geometric relationship, we have $(\nabla f_\mu(x^k))^T d^k \to 0$ as $k \to \infty$, $k \in K_2$, from which we find

$$-\|\nabla f_\mu(x^k)\|^2 = (\nabla f_\mu(x^k))^T d^k \to 0, \quad k \in K_2,\ k \to \infty,$$

which contradicts (23). Thus, $\{d^k\}$ is bounded; i.e., there exists a constant $M^* > \varepsilon_0$ such that

$$\|d^k\| \le M^*, \quad \forall\, k. \tag{28}$$

Then, combining (25) with (28) gives

$$\lim_{k\to\infty} \|\alpha_k d^k\| = 0. \tag{29}$$

In addition, from (27), we have $\cos\theta_k \ge \frac{\varepsilon_0}{M^*}$, which further yields

$$-(\nabla f_\mu(x^k))^T d^k = \|\nabla f_\mu(x^k)\|\,\|d^k\| \cos\theta_k \ge \frac{\varepsilon_0^2}{M^*}\|d^k\|. \tag{30}$$
Applying the mean value theorem, we obtain

$$\begin{aligned}
f_\mu(x^k + \alpha_k d^k) &= f_\mu(x^k) + \alpha_k (\nabla f_\mu(\xi^k))^T d^k \\
&= f_\mu(x^k) + \alpha_k (\nabla f_\mu(x^k))^T d^k + \alpha_k (\nabla f_\mu(\xi^k) - \nabla f_\mu(x^k))^T d^k \\
&\le f_\mu(x^k) + \alpha_k \|d^k\| \left[ \frac{(\nabla f_\mu(x^k))^T d^k}{\|d^k\|} + \left\|\nabla f_\mu(\xi^k) - \nabla f_\mu(x^k)\right\| \right],
\end{aligned} \tag{31}$$
where $\xi^k$ lies between $x^k$ and $x^k + \alpha_k d^k$. Using Lemma 3.1 and the compactness of the level set $\mathcal{L}(x^0)$, it follows that $\nabla f_\mu(x)$ is uniformly continuous on $\mathcal{L}(x^0)$. This, together with (29), implies that there exists a constant $\hat\alpha > 0$ such that, for sufficiently large $k$,

$$\alpha_k \|d^k\| < \hat\alpha \quad \text{and} \quad \left\|\nabla f_\mu(\xi^k) - \nabla f_\mu(x^k)\right\| < \frac{1}{2}\frac{\varepsilon_0^2}{M^*}. \tag{32}$$

Then, when $k$ is sufficiently large, from (31) we have

$$f_\mu(x^k + \alpha_k d^k) \le f_\mu(x^k) + \hat\alpha \left( -\frac{\varepsilon_0^2}{M^*} + \frac{1}{2}\frac{\varepsilon_0^2}{M^*} \right) = f_\mu(x^k) - \frac{\hat\alpha\, \varepsilon_0^2}{2M^*}, \tag{33}$$

which contradicts $f_\mu(x^{k+1}) - f_\mu(x^k) \to 0$ as $k \to \infty$. Therefore, there must hold

$$\liminf_{k\to\infty} \|\nabla f_\mu(x^k)\| = 0. \qquad \square$$

**4 Numerical experiments**

In this section, we conduct numerical experiments to show the performance of our algorithm for signal recovery. All tests are run in MATLAB R2016 on a 64-bit PC with an Intel(R) Core(TM) i7-6500U 2.50 GHz CPU and 8.00 GB of RAM, running the Windows 7 operating system.

*Experiment 1*

In what follows, we divide our tests into two groups: the noise-free case and the noisy case, and we report the numerical results of our algorithm for signal recovery. In all of our simulations, $A \in \mathbb{R}^{m \times n}$ is assumed to be a Gaussian matrix with $m = \frac{n}{2}$ or $\frac{n}{4}$, and $x^* \in \mathbb{R}^n$ is a $K$-sparse original signal with $K = \frac{n}{40}$, whose nonzero components follow the normal distribution $N(0, 1)$. Moreover, we set $b = Ax + e$ in the noisy case, where $e$ is Gaussian noise with zero mean and variance $\sigma^2$.

The parameters of our algorithm are summarized as follows:

$$x^0 = \mathrm{zeros}(n, 1), \quad \lambda = 0.001 \cdot \|A^T b\|_\infty, \quad \rho = 0.5, \quad \delta = 0.002, \quad \gamma = 0.2, \quad \sigma^2 = 0.0001.$$

In Tables 1 and 2, we set $\mu_0 = 0.1$ and $\mu_{k+1} = 0.4\mu_k$, and the algorithm terminates when the relative error satisfies

$$\frac{\|x - x^*\|}{\|x^*\|} < 4 \times 10^{-3},$$

where $x$ is the reconstructed signal. In Tables 3 and 4, F denotes the smoothing function, CPU denotes the CPU time, Re denotes the relative error, and It denotes the number of iterations; we set $\mu = 1.0\mathrm{e}{-2}, 1.0\mathrm{e}{-3}, 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}$, and the algorithm terminates when

$$\frac{|f_\mu(x^{k+1}) - f_\mu(x^k)|}{|f_\mu(x^{k+1})|} < 1.0\mathrm{e}{-12}.$$

Note that the result for each test is obtained by running our algorithm 10 times and taking the average.
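For reference, the two stopping rules just described amount to the following checks (the helper names are ours, not from the paper's code):

```python
import math

def relative_error(x, x_star):
    """||x - x*||_2 / ||x*||_2, the stopping criterion used for Tables 1 and 2."""
    num = math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_star)))
    return num / math.sqrt(sum(c ** 2 for c in x_star))

def objective_stalled(f_new, f_old, tol=1.0e-12):
    """|f_mu(x^{k+1}) - f_mu(x^k)| / |f_mu(x^{k+1})| < tol, used for Tables 3 and 4."""
    return abs(f_new - f_old) / abs(f_new) < tol

err = relative_error([3.0, 4.0], [3.0, 0.0])  # sqrt(16) / 3 = 4/3
```

The first rule measures recovery quality against the known ground truth $x^*$; the second is a practical surrogate that detects stagnation of the objective when $x^*$ is unavailable.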

In our experiments, we found that $\psi_1$ is very unstable; sometimes it is not applicable for solving the test problems. A similar concern was also addressed in [13, page 135].

**Table 1** Comparisons of five smoothing functions for the noise-free case

| $n$ | $m$ | CPU $\psi_2$ | CPU $\psi_3$ | CPU $\psi_4$ | CPU $\psi_5$ | CPU $\psi_6$ | It $\psi_2$ | It $\psi_3$ | It $\psi_4$ | It $\psi_5$ | It $\psi_6$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | $n/2$ | 1.21 | 0.94 | 1.32 | 1.27 | 0.92 | 72 | 74 | 75 | 73 | 69 |
| | $n/4$ | 1.37 | 0.90 | 1.45 | 1.59 | 1.00 | 129 | 135 | 142 | 148 | 147 |
| 4000 | $n/2$ | 4.16 | 3.83 | 5.01 | 4.50 | 3.80 | 70 | 70 | 78 | 72 | 68 |
| | $n/4$ | 4.68 | 4.12 | 5.19 | 5.58 | 4.50 | 131 | 143 | 143 | 148 | 146 |
| 6000 | $n/2$ | 9.54 | 8.49 | 9.80 | 9.22 | 8.61 | 70 | 70 | 70 | 69 | 68 |
| | $n/4$ | 9.69 | 10.35 | 10.71 | 12.91 | 10.54 | 134 | 163 | 145 | 159 | 164 |
| 8000 | $n/2$ | 16.93 | 16.25 | 18.76 | 18.91 | 15.55 | 67 | 71 | 74 | 78 | 69 |
| | $n/4$ | 15.76 | 15.65 | 18.65 | 19.49 | 15.75 | 126 | 137 | 145 | 144 | 138 |
| 10000 | $n/2$ | 25.88 | 24.06 | 30.07 | 25.83 | 24.21 | 67 | 67 | 76 | 68 | 68 |
| | $n/4$ | 27.94 | 28.28 | 29.96 | 34.48 | 27.60 | 139 | 138 | 151 | 160 | 149 |

**Table 2** Comparisons of five smoothing functions for the noisy case

| $n$ | $m$ | CPU $\psi_2$ | CPU $\psi_3$ | CPU $\psi_4$ | CPU $\psi_5$ | CPU $\psi_6$ | It $\psi_2$ | It $\psi_3$ | It $\psi_4$ | It $\psi_5$ | It $\psi_6$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | $n/2$ | 1.27 | 1.00 | 1.30 | 1.33 | 1.00 | 73 | 78 | 75 | 75 | 72 |
| | $n/4$ | 1.54 | 0.92 | 1.64 | 1.55 | 0.91 | 145 | 151 | 154 | 150 | 144 |
| 4000 | $n/2$ | 4.48 | 4.47 | 4.89 | 4.62 | 4.36 | 72 | 80 | 75 | 72 | 76 |
| | $n/4$ | 4.94 | 5.16 | 5.77 | 5.09 | 4.65 | 137 | 159 | 154 | 143 | 158 |
| 6000 | $n/2$ | 8.81 | 9.07 | 9.80 | 9.91 | 8.97 | 65 | 72 | 72 | 67 | 69 |
| | $n/4$ | 11.65 | 11.61 | 12.10 | 12.38 | 10.12 | 160 | 167 | 152 | 165 | 154 |
| 8000 | $n/2$ | 16.27 | 17.05 | 18.28 | 18.53 | 16.69 | 69 | 75 | 76 | 74 | 71 |
| | $n/4$ | 15.76 | 18.04 | 19.27 | 17.99 | 16.91 | 132 | 155 | 145 | 145 | 143 |
| 10000 | $n/2$ | 25.59 | 25.58 | 26.04 | 25.82 | 25.38 | 68 | 71 | 67 | 67 | 68 |
| | $n/4$ | 25.82 | 27.23 | 37.00 | 29.52 | 27.73 | 136 | 150 | 175 | 150 | 155 |

Accordingly, we tried adopting the equivalent expression suggested in [13, formula (3.1)] and rewriting $\psi_1(t)$ as

$$\psi_1(t) = \mu\left[\ln\left(e^{\frac{\lambda_1(A(t))}{\mu}} + e^{\frac{\lambda_2(A(t))}{\mu}}\right) + \ln\left(e^{\frac{\lambda_1(B(t))}{\mu}} + e^{\frac{\lambda_2(B(t))}{\mu}}\right)\right]$$

$$= \lambda_1(A(t)) + \mu \ln\left(1 + e^{\frac{\lambda_2(A(t)) - \lambda_1(A(t))}{\mu}}\right) + \lambda_1(B(t)) + \mu \ln\left(1 + e^{\frac{\lambda_2(B(t)) - \lambda_1(B(t))}{\mu}}\right),$$

where

$$A(t) = \begin{pmatrix} 0 & 0 \\ 0 & -t \end{pmatrix}, \qquad B(t) = \begin{pmatrix} 0 & 0 \\ 0 & t \end{pmatrix},$$

$$\lambda_1(A(t)) = 0, \quad \lambda_2(A(t)) = -t, \quad \lambda_1(B(t)) = 0, \quad \lambda_2(B(t)) = t.$$

Nonetheless, $\psi_1$ still does not work well within the algorithm, and we therefore omit it from our numerical reports.

From the experiments, we have the following observations:

(1) The sparse original signal can be recovered by our algorithm effectively.

Tables 1, 2, 3, 4 and Figs. 1, 2, 3, 4, 5 illustrate that the five smoothing functions (except $\psi_1$) in our algorithm work quite well whether the given problem is noise-free or noisy.

(2) The convergence behavior with respect to CPU time, iterations, signal size, and relative error is depicted in Figs. 1, 2, 3, 4, 5, respectively. From Tables 1, 2 and Figs. 1, 2, we see that our algorithm costs more CPU time as the dimension $n$ increases, while the changes in iteration counts are very marginal. On the whole, the performance order can be summarized as

$$\psi_2 \ge \psi_4 \approx \psi_5 \approx \psi_6 > \psi_3,$$

**Table 3** Comparisons of five functions for different $\mu$ when $n = 5000$, noise-free case (column triples give CPU, Re, It for $\mu = 1.0\mathrm{e}{-2}, 1.0\mathrm{e}{-3}, 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}$, left to right)

| F | $m$ | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\psi_2$ | $n/2$ | 8.80 | 0.0158 | 92 | 9.34 | 0.0040 | 89 | 11.03 | 0.0033 | 100 | 16.78 | 0.0025 | 141 |
| | $n/4$ | 10.98 | 0.0365 | 172 | 11.49 | 0.0064 | 178 | 11.83 | 0.0029 | 184 | 17.06 | 0.0028 | 237 |
| $\psi_3$ | $n/2$ | 9.44 | 0.0567 | 104 | 8.14 | 0.0090 | 86 | 7.98 | 0.0032 | 86 | 10.93 | 0.0025 | 111 |
| | $n/4$ | 7.44 | 0.1532 | 167 | 9.10 | 0.0195 | 179 | 8.36 | 0.0045 | 166 | 10.34 | 0.0030 | 188 |
| $\psi_4$ | $n/2$ | 8.22 | 0.0538 | 71 | 7.92 | 0.0059 | 73 | 10.58 | 0.0027 | 93 | 12.01 | 0.0025 | 104 |
| | $n/4$ | 9.13 | 0.0157 | 139 | 11.11 | 0.0154 | 155 | 10.69 | 0.0035 | 179 | 13.43 | 0.0032 | 194 |
| $\psi_5$ | $n/2$ | 7.88 | 0.0419 | 68 | 7.59 | 0.0055 | 72 | 10.14 | 0.0025 | 91 | 12.77 | 0.0025 | 109 |
| | $n/4$ | 10.26 | 0.1895 | 132 | 11.09 | 0.0117 | 156 | 10.51 | 0.0041 | 169 | 12.83 | 0.0034 | 193 |
| $\psi_6$ | $n/2$ | 8.95 | 0.0360 | 96 | 7.94 | 0.0063 | 84 | 9.69 | 0.0027 | 99 | 12.57 | 0.0023 | 121 |
| | $n/4$ | 9.17 | 0.0926 | 166 | 9.16 | 0.0128 | 173 | 8.98 | 0.0038 | 174 | 11.75 | 0.0030 | 203 |

**Table 4** Comparisons of five smoothing functions for different $\mu$ when $n = 5000$, noisy case (column triples give CPU, Re, It for $\mu = 1.0\mathrm{e}{-2}, 1.0\mathrm{e}{-3}, 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}$, left to right)

| F | $m$ | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\psi_2$ | $n/2$ | 8.62 | 0.0159 | 89 | 8.77 | 0.0072 | 88 | 11.82 | 0.0026 | 100 | 15.80 | 0.0025 | 143 |
| | $n/4$ | 11.67 | 0.0379 | 174 | 10.40 | 0.0063 | 180 | 10.26 | 0.0033 | 188 | 15.03 | 0.0033 | 242 |
| $\psi_3$ | $n/2$ | 8.54 | 0.0548 | 100 | 7.86 | 0.0085 | 88 | 8.75 | 0.0033 | 92 | 10.38 | 0.0025 | 109 |
| | $n/4$ | 7.43 | 0.1506 | 169 | 8.43 | 0.0185 | 169 | 7.78 | 0.0045 | 177 | 9.82 | 0.0030 | 203 |
| $\psi_4$ | $n/2$ | 6.76 | 0.0491 | 64 | 8.11 | 0.0058 | 76 | 11.40 | 0.0028 | 95 | 11.93 | 0.0026 | 105 |
| | $n/4$ | 11.10 | 0.1439 | 154 | 8.84 | 0.0137 | 152 | 9.67 | 0.0035 | 175 | 11.52 | 0.0035 | 184 |
| $\psi_5$ | $n/2$ | 7.36 | 0.0542 | 65 | 7.82 | 0.0059 | 75 | 9.57 | 0.0031 | 87 | 11.33 | 0.0025 | 104 |
| | $n/4$ | 10.19 | 0.1492 | 143 | 10.00 | 0.0123 | 154 | 10.48 | 0.0039 | 177 | 11.37 | 0.0034 | 191 |
| $\psi_6$ | $n/2$ | 8.54 | 0.0355 | 94 | 8.78 | 0.0061 | 92 | 9.05 | 0.0029 | 96 | 11.57 | 0.0026 | 119 |
| | $n/4$ | 7.77 | 0.0908 | 160 | 9.15 | 0.0122 | 180 | 8.11 | 0.0040 | 167 | 10.50 | 0.0030 | 207 |

**Fig. 1** The convergence behavior: CPU time versus signal size ($n$) when $\mu_{k+1} = 0.4\mu_k$

**Fig. 2** The convergence behavior: iterations versus signal size ($n$) when $\mu_{k+1} = 0.4\mu_k$

**Fig. 3** The convergence behavior: CPU time versus $-\log\mu$ with $n = 5000$

**Fig. 4** The convergence behavior: iterations versus $-\log\mu$ with $n = 5000$

**Fig. 5** The convergence behavior: relative error versus $-\log\mu$ with $n = 5000$

where "$\ge$" means "performs better than" and "$\approx$" means there is no difference in numerical performance.

(3) According to Tables 3, 4 and Figs. 3, 4, 5, the smaller the parameter $\mu$, the better the signal recovery, and the more CPU time and iterations have to be spent accordingly. For any fixed $\mu$, although the function $\psi_2$ sometimes takes a bit more CPU time and iterations, it has a lower relative error. In view of this, we conclude that the function $\psi_2$ works best with the proposed algorithm. From Fig. 5, the function $\psi_6$ also has a lower relative error, which means $\psi_6$ is a possible good choice as well. To sum up, in view of relative error, we have

$$\psi_2 \ge \psi_4 \approx \psi_5 \approx \psi_6 > \psi_3.$$

*Experiment 2*

Next, we compare our method with "NESTA", a well-known, fast, open-source package proposed in [2]. In the noisy case, $b = Ax + e$, where $e$ is Gaussian noise with zero mean and variance $\sigma^2$. In all of our simulations, $A \in \mathbb{R}^{m \times n}$ is assumed to be a Gaussian matrix, where $n = 2^{13}$ and $m = \frac{n}{4}$. We select a $K$-sparse original signal $x^* \in \mathbb{R}^n$ with $K = \frac{n}{40}$, whose nonzero components follow the normal distribution $N(0, 1)$.

In our test, we set

$$x^0 = A^T b, \quad \lambda = 0.0048 \cdot \|A^T b\|_\infty, \quad \rho = 0.5, \quad \delta = 0.002, \quad \gamma = 0.2, \quad \sigma^2 = 0.0001.$$

**Table 5** Comparisons of five smoothing functions and NESTA for different tol in the noise-free case (column triples give CPU, Re, It for tol $= 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}, 1.0\mathrm{e}{-6}, 1.0\mathrm{e}{-7}, 1.0\mathrm{e}{-8}$, left to right)

| F | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\psi_2$ | 23.77 | 9.62E-04 | 150 | 28.30 | 1.98E-05 | 177 | 35.65 | 2.11E-05 | 213 | 46.62 | 1.91E-05 | 256 | 59.97 | 1.58E-05 | 342 |
| $\psi_3$ | 21.82 | 6.37E-03 | 142 | 24.55 | 3.03E-05 | 171 | 30.88 | 2.29E-05 | 206 | 36.54 | 2.01E-05 | 241 | 40.39 | 1.76E-05 | 286 |
| $\psi_4$ | 26.64 | 7.00E-03 | 146 | 27.03 | 4.32E-05 | 169 | 36.79 | 2.21E-05 | 215 | 45.12 | 2.04E-05 | 258 | 59.34 | 1.76E-05 | 338 |
| $\psi_5$ | 26.32 | 2.21E-03 | 157 | 28.04 | 2.77E-05 | 174 | 34.73 | 2.12E-05 | 209 | 45.38 | 2.14E-05 | 256 | 49.89 | 1.88E-05 | 296 |
| $\psi_6$ | 19.53 | 4.82E-03 | 138 | 24.83 | 3.01E-05 | 172 | 32.41 | 2.34E-05 | 214 | 38.80 | 2.06E-05 | 245 | 50.09 | 1.68E-05 | 301 |
| NESTA | 7.96 | 2.28E-02 | 555 | 8.30 | 2.95E-02 | 568 | 17.12 | 3.26E-04 | 1134 | 15.98 | 1.82E-05 | 1060 | 44.50 | 1.74E-05 | 2999 |