Calcolo, vol. 56, no. 4, Article 42, 26 pages, December, 2019

### Signal reconstruction by conjugate gradient algorithm based on smoothing l_{1}-norm

Caiying Wu^{1}, College of Mathematics Science, Inner Mongolia University, Hohhot 010021, China

Jiaming Zhan^{2}, College of Mathematics Science, Inner Mongolia University, Hohhot 010021, China

Yue Lu^{3}, School of Mathematical Sciences, Tianjin Normal University, Tianjin 300387, China

Jein-Shan Chen^{4}, Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

January 12, 2019 (1st revision on May 21, 2019; 2nd revision on July 29, 2019; 3rd revision on August 18, 2019)

^{1}E-mail: wucaiyingyun@163.com. The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016).

^{2}E-mail: zhanjiaming1220@qq.com. The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016).

^{3}E-mail: jinjin403@sina.com. The research is supported by the National Natural Science Foundation of China (Grant Number: 11601389), the Doctoral Foundation of Tianjin Normal University (Grant Number: 52XB1513), the 2017-Outstanding Young Innovation Team Cultivation Program of Tianjin Normal University (Grant Number: 135202TD1703), and the Program for Innovative Research Team in Universities of Tianjin (Grant Number: TD13-5078).

^{4}Corresponding author. E-mail: jschen@math.ntnu.edu.tw. The research is supported by the Ministry of Science and Technology, Taiwan.

Abstract. The l_{1}-norm regularized minimization problem is non-differentiable and has a wide range of applications in compressive sensing. Many approaches have been proposed in the literature; among them, smoothing the l_{1}-norm is one of the effective ones. This paper follows that path: we adopt six smoothing functions to approximate the l_{1}-norm, recast the signal recovery problem as a smoothing penalized least squares optimization problem, and apply a nonlinear conjugate gradient method to solve the smoothing model. The algorithm is shown to be globally convergent. In addition, the simulation results not only suggest some nice smoothing functions, but also show that the proposed algorithm is competitive in terms of relative error.

Keywords. l1-norm regularization, compressive sensing, conjugate gradient algorithm, smoothing function.

Mathematics Subject Classification (2000) 90C33

### 1 Introduction

In this paper, we focus on the topic of reconstructing signals, which is one of the most important applications in compressive sensing [8, 9, 16]. There exist numerous textbooks and articles related to this topic, and there is no need to repeat its importance and applications here. Therefore, we go directly to its mathematical model and convey our new idea for tackling this problem. Mathematically, the noise-free signal reconstruction problem is modeled as follows:

min ‖x‖_{0}
s.t. b = Ax,   (1)

where x ∈ IR^{n} is the original sparse signal that needs to be recovered, A ∈ IR^{m×n} (m ≪ n) is the measurement matrix, b ∈ IR^{m} is the observation vector, and ‖x‖_{0} represents the l_{0}-norm of x, which is defined as the number of nonzero components of x. Without loss of generality, we assume that there exists a positive constant K ≤ n such that ‖x‖_{0} = K.

Besides this condition, if the measurement matrix A satisfies the Restricted Isometry Property (RIP) of order K, then we can recover the signal x more accurately through the model (1); see [6, 7, 8, 9, 10] for more details.

Unfortunately, the l_{0}-norm minimization problem (1) is an NP-hard combinatorial optimization problem. To avoid this difficulty, researchers replace the l_{0}-norm by the l_{1}-norm in model (1), which yields the following l_{1} minimization problem:

min ‖x‖_{1}
s.t. b = Ax,   (2)

where ‖x‖_{1} denotes the l_{1}-norm of x, i.e., ‖x‖_{1} = ∑_{i=1}^{n} |x_{i}|. Under the RIP condition, the model (2) has the same solution as (1). In practice, the probed signal b is usually contaminated by noise, which motivates the noisy signal reconstruction problem:

min ‖x‖_{1}
s.t. b = Ax + e,   (3)

where e ∈ IR^{m} denotes the noise. To deal with (3), researchers prefer to consider the associated penalized least squares optimization problem

min λ‖x‖_{1} + (1/2)‖b − Ax‖^{2}_{2},   (4)

where λ > 0 is the penalty factor. In the sequel, we call (4) the l_{1}-norm regularized problem.

Until now, there have been plenty of numerical algorithms for solving the model (4), and some first-order algorithms have drawn much attention during the past decades. Due to the huge amount of literature, we only outline some approaches below. The gradient projection algorithm [18] is one of the earliest gradient-based algorithms for solving (4); there, the problem was reformulated as a box-constrained quadratic program and solved by gradient projection. More recently, the most extensively investigated first-order algorithm for the solution of (4) has been the iterative shrinkage/thresholding (IST) algorithm [15, 21], whose results triggered many contributions based on this method [17, 24, 25]. In light of the IST algorithm, many variants have been proposed under different optimization reformulations and techniques. For instance, Hale, Yin and Zhang [19, 20]

presented an IST fixed point continuation (FPC) method by an operator splitting skill.

Wright, Nowak and Figueiredo [29] studied SpaRSA for sparse reconstruction via solving nonsmooth convex problems. Experimental results show that the accelerated IST methods (the two-step IST [4] and the fast IST [1]) have better convergence properties. In addition, the famous NESTA [2] first proposed a smoothing function of the l_{1}-norm and then used Nesterov's gradient method to obtain the solution of (3). Besides these methods, the
gradient projection technique and alternating direction ideas are also considered, see
[3, 5, 30] for more details.

Because of their simplicity and low storage requirements, conjugate gradient algorithms are suitable for large-scale problems. In this paper, like [28, 31, 33], we are interested
in applying the conjugate gradient method to signal recovery. We use six smoothing functions, studied in [23, 27] for the absolute value function, to approximate the l_{1}-norm. Accordingly, we reformulate the signal recovery problem as a smoothing penalized least squares optimization problem and apply the nonlinear conjugate gradient method [32] to solve the smoothing model. The proposed algorithm is shown to be globally convergent.

Moreover, we report some numerical experiments to demonstrate the effectiveness of our method. Numerical comparisons with "NESTA", "FPCBB", and "FISTA", which are well-known open-source solvers, are presented as well.

### 2 Preliminary

This section recalls some ideas about the smoothing technique, which can be found in [11, 12, 23] and the references therein.

It is well known that the absolute value function |t| (t ∈ IR) is not smooth at zero. To overcome this difficulty, we introduce some smoothing functions of |t| that are used in the sequel.

Definition 2.1. The function ψ : IR_{++} × IR → IR is a smoothing function of |t| if the following two conditions hold:

(a) ψ is continuously differentiable at every (µ, t) ∈ IR_{++} × IR;

(b) lim_{µ↓0} ψ(µ, t) = |t| for any t ∈ IR.

How does one construct smoothing functions for |t|? First, we observe that |t| can be split into two parts:

|t| = (t)_{+} − (t)_{−} = (t)_{+} + (−t)_{+},

where (t)_{+} denotes the plus function, i.e., (t)_{+} = max{0, t}, and (t)_{−} = min{0, t}. As mentioned in [11, 12, 23], one can follow the procedure below to construct a smoothing function for (t)_{+}. More specifically, introducing a piecewise density (kernel) function d(t) satisfying

d(t) ≥ 0 and ∫_{−∞}^{+∞} d(t) dt = 1,

one can define

ŝ(t, µ) := (1/µ) d(t/µ),

where µ is a positive parameter. If the condition

∫_{−∞}^{+∞} |t| d(t) dt < +∞

holds, then the function p̂(t, µ) defined as

p̂(t, µ) = ∫_{−∞}^{+∞} (t − s)_{+} ŝ(s, µ) ds = ∫_{−∞}^{t} (t − s) ŝ(s, µ) ds ≈ (t)_{+}

is a smoothing approximation of (t)_{+}. There are well-known smoothing functions for the plus function [11, 22, 23], for example,

ψ̂_{1}(µ, t) = t + µ ln(1 + e^{−t/µ}),   (5)

ψ̂_{2}(µ, t) =
  t,  if t ≥ µ/2,
  (1/(2µ))(t + µ/2)^{2},  if −µ/2 < t < µ/2,
  0,  if t ≤ −µ/2,   (6)

ψ̂_{3}(µ, t) = (√(4µ^{2} + t^{2}) + t)/2,   (7)

ψ̂_{4}(µ, t) =
  t − µ/2,  if t > µ,
  t^{2}/(2µ),  if 0 ≤ t ≤ µ,
  0,  if t < 0,   (8)

where the corresponding kernel functions are, respectively, given by

d_{1}(t) = e^{−t}/(1 + e^{−t})^{2},

d_{2}(t) = 1 if −1/2 ≤ t ≤ 1/2, and 0 otherwise,

d_{3}(t) = 2/(t^{2} + 4)^{3/2},

d_{4}(t) = 1 if 0 ≤ t ≤ 1, and 0 otherwise.

Likewise, we can obtain a smoothing function of |t| via convolution as follows:

p̂(|t|, µ) = p̂(t, µ) + p̂(−t, µ) = ∫_{−∞}^{+∞} |t − s| ŝ(s, µ) ds.
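As a quick sanity check, the convolution above can be evaluated numerically. The sketch below (our own code; grid, helper names, and tolerances are illustrative) convolves |t − s| with the scaled uniform kernel built from d_{2} and compares the result against the closed form ψ_{2} given in (10) below; the two agree to discretization accuracy.

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoidal rule, to stay independent of the numpy version
    return float(np.dot((y[1:] + y[:-1]) / 2, np.diff(x)))

mu = 0.5
s = np.linspace(-mu / 2, mu / 2, 20001)   # support of the scaled kernel
s_hat = np.full_like(s, 1.0 / mu)         # (1/mu) * d2(s/mu) on its support

def psi2(mu, t):
    # closed form produced by the uniform kernel, cf. (10)
    return abs(t) if abs(t) >= mu / 2 else t**2 / mu + mu / 4

for t in (-1.0, -0.1, 0.0, 0.2, 0.7):
    conv = trapezoid(np.abs(t - s) * s_hat, s)   # \int |t-s| s_hat(s,mu) ds
    print(t, conv, psi2(mu, t))                  # columns 2 and 3 agree
```

The uniform kernel is the easiest case to verify by hand: for |t| ≤ µ/2 the integral evaluates to t²/µ + µ/4, and outside the kernel's support it reduces to |t| exactly.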

Analogous to (5)-(8), we construct four smoothing functions for |t| as in [23]:

ψ_{1}(µ, t) = µ[ln(1 + e^{−t/µ}) + ln(1 + e^{t/µ})],   (9)

ψ_{2}(µ, t) =
  t,  if t ≥ µ/2,
  t^{2}/µ + µ/4,  if −µ/2 < t < µ/2,
  −t,  if t ≤ −µ/2,   (10)

ψ_{3}(µ, t) = √(4µ^{2} + t^{2}),   (11)

ψ_{4}(µ, t) =
  t^{2}/(2µ),  if |t| ≤ µ,
  |t| − µ/2,  if |t| > µ.   (12)

In particular, if we take the Epanechnikov kernel function

d(t) = (3/4)(1 − t^{2}) if |t| ≤ 1, and 0 otherwise,

then the associated smoothing function for |t| is expressed by

ψ_{5}(µ, t) =
  t,  if t > µ,
  −t^{4}/(8µ^{3}) + 3t^{2}/(4µ) + 3µ/8,  if −µ ≤ t ≤ µ,
  −t,  if t < −µ.   (13)

Moreover, taking the Gaussian kernel function d(t) = (1/√(2π)) e^{−t^{2}/2} for all t ∈ IR yields

ŝ(t, µ) := (1/µ) d(t/µ) = (1/√(2πµ^{2})) e^{−t^{2}/(2µ^{2})},

which leads to another type of smoothing function [27] for |t|:

ψ_{6}(µ, t) = t erf(t/(√2 µ)) + √(2/π) µ e^{−t^{2}/(2µ^{2})},   (14)

where the error function is defined as follows:

erf(t) = (2/√π) ∫_{0}^{t} e^{−u^{2}} du, ∀t ∈ IR.

In summary, we have obtained six smoothing functions and will employ them to approximate the l_{1}-norm for the signal recovery problem.
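For reference, the six smoothing functions can be written down directly. The sketch below is our own code (not from the paper); ψ_{1} is implemented via `logaddexp` for numerical stability, and the loop checks that each function converges uniformly to |t| as µ ↓ 0, in line with Definition 2.1(b).

```python
import numpy as np
from math import erf

# The six smoothing functions (9)-(14), vectorized over t; mu > 0.
def psi1(mu, t):
    # mu*[ln(1+e^{-t/mu}) + ln(1+e^{t/mu})], stabilized via logaddexp
    return mu * (np.logaddexp(0.0, -t / mu) + np.logaddexp(0.0, t / mu))

def psi2(mu, t):
    return np.where(np.abs(t) >= mu / 2, np.abs(t), t**2 / mu + mu / 4)

def psi3(mu, t):
    return np.sqrt(4 * mu**2 + t**2)

def psi4(mu, t):
    return np.where(np.abs(t) <= mu, t**2 / (2 * mu), np.abs(t) - mu / 2)

def psi5(mu, t):
    inner = -t**4 / (8 * mu**3) + 3 * t**2 / (4 * mu) + 3 * mu / 8
    return np.where(np.abs(t) <= mu, inner, np.abs(t))

def psi6(mu, t):
    return t * np.vectorize(erf)(t / (np.sqrt(2) * mu)) \
        + np.sqrt(2 / np.pi) * mu * np.exp(-t**2 / (2 * mu**2))

t = np.linspace(-2, 2, 401)
for psi in (psi1, psi2, psi3, psi4, psi5, psi6):
    # each psi(mu, .) -> |.| uniformly as mu -> 0
    print(psi.__name__, np.max(np.abs(psi(1e-4, t) - np.abs(t))))
```

The maximum gap occurs at t = 0 in every case (e.g., ψ_{3}(µ, 0) = 2µ), so the printed errors shrink linearly with µ.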

### 3 Algorithm and Convergence Properties

This section is devoted to the detailed description and implementation of our algorithmic idea and its global convergence results. Basically, it is a type of conjugate gradient algorithm. To proceed, we briefly review the conjugate gradient method. Suppose that f : IR^{n} → IR is a continuously differentiable function and the target problem is

min_{x∈IR^{n}} f(x).

Let d^{k} denote the search direction and α_{k} a step length. Then, the general framework of the conjugate gradient method is as follows:

x^{k+1} = x^{k} + α_{k}d^{k},
d^{k} = −∇f(x^{k}) if k = 0;  d^{k} = −∇f(x^{k}) + β_{k}d^{k−1} if k > 0,   (15)

where ∇f is the gradient of f and β_{k} is a parameter. In general, different choices of the parameter β_{k} correspond to different conjugate gradient algorithms. Among them, the three-term PRP (Polak-Ribiere-Polyak) conjugate gradient method is one of the most efficient, where

d^{k} = −∇f(x^{k}) if k = 0;  d^{k} = −∇f(x^{k}) + β_{k}d^{k−1} − θ_{k}y^{k−1} if k > 0,   (16)

y^{k−1} = ∇f(x^{k}) − ∇f(x^{k−1}),

β_{k} = (∇f(x^{k}))^{T}y^{k−1} / ‖∇f(x^{k−1})‖^{2},

θ_{k} = (∇f(x^{k}))^{T}d^{k−1} / ‖∇f(x^{k−1})‖^{2}.

The key feature of this conjugate gradient method is that it ensures the sufficient descent property of the direction at each iteration, which plays a prominent role in the global convergence analysis. For more details, please refer to [32] and the references therein.
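A minimal sketch of the three-term PRP update (16) makes the sufficient descent property concrete. The code is our own (variable names are ours); the identity (∇f(x^{k}))^{T}d^{k} = −‖∇f(x^{k})‖^{2} follows by construction, since the β and θ terms cancel exactly in the inner product.

```python
import numpy as np

def prp_direction(g, g_prev=None, d_prev=None):
    """Three-term PRP direction; g, g_prev are gradients at x^k, x^{k-1}."""
    if g_prev is None:                 # k = 0: steepest descent
        return -g
    y = g - g_prev                     # y^{k-1}
    denom = np.dot(g_prev, g_prev)     # ||grad f(x^{k-1})||^2
    beta = np.dot(g, y) / denom
    theta = np.dot(g, d_prev) / denom
    return -g + beta * d_prev - theta * y

# Sufficient descent: g^T d = -||g||^2, regardless of the previous data.
g_prev = np.array([1.0, -2.0, 0.5])
d_prev = -g_prev
g = np.array([0.8, -1.5, 0.9])
d = prp_direction(g, g_prev, d_prev)
print(np.dot(g, d), -np.dot(g, g))     # equal up to rounding
```

This exact cancellation is what removes the usual descent-condition assumption from the line search.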

For convenience, we denote

f_{µ}(x) := λφ_{i}(µ, x) + (1/2)‖b − Ax‖^{2}_{2}, x ∈ IR^{n},   (17)

where φ_{i}(µ, x) is a smoothing function of the l_{1}-norm ‖x‖_{1}. In other words, it takes the form

φ_{i}(µ, x) := ∑_{j=1}^{n} ψ_{i}(µ, x_{j}), i = 1, 2, 3, 4, 5, 6.   (18)

Thus, the problem (4) is recast as the smoothing penalized least squares optimization problem

min_{x∈IR^{n}} f_{µ}(x),   (19)

which is our new target problem. To deal with the unconstrained minimization (19), the following iterative scheme is employed, see [32].

The Algorithm

Step 0. Given a starting point x^{0} ∈ IR^{n} and µ_{0} > 0, choose parameters ρ, δ, γ, γ̄ ∈ (0, 1). Let d^{0} = −∇f_{µ_{0}}(x^{0}). Set k = 0.

Step 1. If the termination criterion is reached, stop.

Step 2. Determine α_{k} = max{ρ^{j}, j = 0, 1, 2, . . .} satisfying

f_{µ_{k}}(x^{k} + α_{k}d^{k}) ≤ f_{µ_{k}}(x^{k}) − 2δ(1 − γ)α_{k}f_{µ_{k}}(x^{k}).

Step 3. Let x^{k+1} = x^{k} + α_{k}d^{k} and µ_{k+1} = γ̄µ_{k}. Replace f with f_{µ_{k+1}} in formula (16) and compute d^{k+1} by (16).

Step 4. Set k = k + 1 and return to Step 1.
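The steps above can be sketched as follows, here with ψ_{2} as the smoothing function and the Experiment 1 parameter values. This is our own illustrative implementation, not the authors' reference code: the stopping rule, the floor on µ, and the problem sizes are our assumptions.

```python
import numpy as np

def psi2(mu, t):
    return np.where(np.abs(t) >= mu / 2, np.abs(t), t**2 / mu + mu / 4)

def dpsi2(mu, t):
    return np.where(t >= mu / 2, 1.0, np.where(t <= -mu / 2, -1.0, 2 * t / mu))

def f_grad(x, mu, A, b, lam):
    r = A @ x - b
    return lam * np.sum(psi2(mu, x)) + 0.5 * r @ r, lam * dpsi2(mu, x) + A.T @ r

def smoothing_cg(A, b, lam, mu0=0.1, gam_bar=0.4, rho=0.5,
                 delta=0.002, gamma=0.2, max_iter=200):
    x = np.zeros(A.shape[1]); mu = mu0
    f, g = f_grad(x, mu, A, b, lam)
    d = -g                                     # Step 0
    for _ in range(max_iter):
        a = 1.0                                # Step 2: backtracking a = rho^j
        while f_grad(x + a * d, mu, A, b, lam)[0] > f - 2 * delta * (1 - gamma) * a * f:
            a *= rho
            if a < 1e-12:
                break
        x = x + a * d                          # Step 3
        mu = max(gam_bar * mu, 1e-8)           # shrink mu (floored, our choice)
        f_new, g_new = f_grad(x, mu, A, b, lam)
        y = g_new - g
        denom = g @ g
        d = -g_new + (g_new @ y / denom) * d - (g_new @ d / denom) * y   # (16)
        f, g = f_new, g_new
        if np.linalg.norm(g) < 1e-6:           # Step 1 (our criterion)
            break
    return x

rng = np.random.default_rng(0)
n, m, K = 256, 128, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); idx = rng.choice(n, K, replace=False)
x_true[idx] = rng.standard_normal(K)
b = A @ x_true
x_rec = smoothing_cg(A, b, lam=0.001 * np.max(np.abs(A.T @ b)))
rel_err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
print(rel_err)   # relative reconstruction error
```

Note that since ψ_{2}(µ, t) is nondecreasing in µ, shrinking µ in Step 3 can only decrease f_{µ}, so the accepted steps keep the objective monotone across the continuation.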

We also say a few words about how µ_{k} is updated in Step 3. For Experiment 1, µ_{k} is updated at each iteration by µ_{k+1} = γ̄µ_{k} with γ̄ ∈ (0, 1), and we select γ̄ = 0.4. For Experiments 2 and 3, an inner loop is used to solve the relaxed problem for a fixed µ_{k}, after which µ_{k} is reduced. This is a technique that appeared in NESTA. When µ_{k} is small enough, our algorithm no longer updates the parameter µ_{k}.

Lemma 3.1. Suppose that the function f_{µ}(x) is defined by (17). Then, there exists a constant L > 0 such that ‖∇f_{µ}(x) − ∇f_{µ}(y)‖ ≤ L‖x − y‖ for all x, y ∈ IR^{n}.

Proof. For any fixed µ > 0, in order to prove the Lipschitz property of ∇f_{µ}(x), we need to verify the Lipschitz condition of ψ'_{i} (i = 1, 2, 3, 4, 5, 6). To this end, we discuss two cases.

Case (i): i = 1, 3, 5, 6. For any t_{1}, t_{2} ∈ IR, without loss of generality let t_{1} < t_{2}. By the Lagrange Mean Value Theorem, we have

|ψ'_{i}(µ, t_{1}) − ψ'_{i}(µ, t_{2})| = |ψ''_{i}(µ, ξ)| |t_{1} − t_{2}|, ξ ∈ (t_{1}, t_{2}).

For the subsequent analysis, we need to estimate |ψ''_{i}(µ, ξ)| for each i = 1, 3, 5, 6.

For i = 1, we know that

|ψ''_{1}(µ, ξ)| = (1/µ)[e^{ξ/µ}/(1 + e^{ξ/µ})^{2} + e^{−ξ/µ}/(1 + e^{−ξ/µ})^{2}] < 2/µ.

For i = 3, it is clear that

|ψ''_{3}(µ, ξ)| = 4µ^{2}/(4µ^{2} + ξ^{2})^{3/2} ≤ 1/(2µ).
For i = 5, we have

ψ'_{5}(µ, t) =
  1,  if t > µ,
  −t^{3}/(2µ^{3}) + 3t/(2µ),  if −µ ≤ t ≤ µ,
  −1,  if t < −µ,

ψ''_{5}(µ, t) =
  0,  if t > µ,
  −3t^{2}/(2µ^{3}) + 3/(2µ),  if −µ ≤ t ≤ µ,
  0,  if t < −µ,

which yields

|ψ''_{5}(µ, ξ)| ≤ 3µ^{2}/(2µ^{3}) + 3/(2µ) = 3/µ.

For i = 6, we compute

ψ'_{6}(µ, t) = (2/√π) ∫_{0}^{t/(√2µ)} e^{−u^{2}} du,  ψ''_{6}(µ, t) = (√2/(√π µ)) e^{−t^{2}/(2µ^{2})},

which imply

|ψ''_{6}(µ, ξ)| ≤ √2/(√π µ).

All the aforementioned results indicate that

|ψ'_{i}(µ, t_{1}) − ψ'_{i}(µ, t_{2})| ≤ (3/µ)|t_{1} − t_{2}|, i = 1, 3, 5, 6,   (20)

for any t_{1}, t_{2} ∈ IR.

Case (ii): i = 2, 4. We now provide a version of (20) for ψ'_{2} and ψ'_{4}. For i = 2, we know that

ψ'_{2}(µ, t) =
  1,  if t ≥ µ/2,
  2t/µ,  if −µ/2 < t < µ/2,
  −1,  if t ≤ −µ/2.

If t_{1} ≥ µ/2 and t_{2} ≥ µ/2; or t_{1} ≤ −µ/2 and t_{2} ≤ −µ/2; or t_{1}, t_{2} ∈ (−µ/2, µ/2), then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| ≤ (2/µ)|t_{1} − t_{2}|.

If t_{1} ≥ µ/2 and t_{2} ≤ −µ/2, then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| = 2 = (2/µ)·µ ≤ (2/µ)|t_{1} − t_{2}|.

If t_{1} ≥ µ/2 and t_{2} ∈ (−µ/2, µ/2), then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| = 1 − 2t_{2}/µ ≤ 2t_{1}/µ − 2t_{2}/µ = (2/µ)|t_{1} − t_{2}|.

If t_{1} ≤ −µ/2 and t_{2} ∈ (−µ/2, µ/2), then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| = 1 + 2t_{2}/µ ≤ −2t_{1}/µ + 2t_{2}/µ = (2/µ)|t_{1} − t_{2}|.

Thus, it is clear to conclude that

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| ≤ (2/µ)|t_{1} − t_{2}|, ∀ t_{1}, t_{2} ∈ IR.

For i = 4, by similar arguments to the case i = 2, it can be verified that

|ψ'_{4}(µ, t_{1}) − ψ'_{4}(µ, t_{2})| ≤ (1/µ)|t_{1} − t_{2}|, ∀ t_{1}, t_{2} ∈ IR.

In summary, we also achieve

|ψ'_{i}(µ, t_{1}) − ψ'_{i}(µ, t_{2})| ≤ (2/µ)|t_{1} − t_{2}|, i = 2, 4,   (21)

for any t_{1}, t_{2} ∈ IR.

Now, applying (20) and (21), for any x, y ∈ IR^{n}, we have

‖∇f_{µ}(x) − ∇f_{µ}(y)‖ = ‖λ[∇φ_{i}(µ, x) − ∇φ_{i}(µ, y)] + A^{T}(Ax − b) − A^{T}(Ay − b)‖
  ≤ (3/µ)λn‖x − y‖ + ‖A‖^{2}‖x − y‖
  = L‖x − y‖,

where L = (3/µ)λn + ‖A‖^{2}. Thus, the proof is complete. □
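The Lipschitz bound of Lemma 3.1 is easy to spot-check numerically. The sketch below is our own construction, using ψ_{3} (whose derivative has the closed form t/√(4µ² + t²)) and the constant L from the proof; it verifies the bound on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu, lam = 20, 50, 0.1, 0.05
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def grad_f(x):
    dphi = x / np.sqrt(4 * mu**2 + x**2)   # psi_3'(mu, x_j), componentwise
    return lam * dphi + A.T @ (A @ x - b)

# L = (3/mu)*lambda*n + ||A||^2, as in the proof of Lemma 3.1
L = (3 / mu) * lam * n + np.linalg.norm(A, 2) ** 2
for _ in range(100):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = np.linalg.norm(grad_f(x) - grad_f(y))
    assert lhs <= L * np.linalg.norm(x - y)
print("Lipschitz bound held on 100 random pairs")
```

In practice the bound is loose: for ψ_{3} the componentwise modulus 1/(2µ) already suffices, without the factor n.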

Lemma 3.2. For µ > 0, the level set L(x^{0}) = {x ∈ IR^{n} | f_{µ}(x) ≤ f_{µ}(x^{0})} is bounded.

Proof. We prove it by contradiction. Suppose that the set L(x^{0}) is unbounded. Then, there exists an index set K_{1} such that ‖x^{k}‖ → ∞ as k → ∞, k ∈ K_{1}. Recalling the definition of f_{µ}(x), we have f_{µ}(x^{k}) → ∞ as k → ∞, k ∈ K_{1}. This contradicts f_{µ}(x^{k}) ≤ f_{µ}(x^{0}). Thus, the level set L(x^{0}) is bounded. □

In fact, by the continuity of f_{µ}(x), the level set L(x^{0}) is compact. We point out that Lemma 3.1 and Lemma 3.2 play key roles in our theoretical part. They were imposed as assumptions in [32] and other literature such as [14]. Here we assert them by showing that the function (17), based on the proposed smoothing functions, satisfies these assumptions. With these properties, we are ready to present the global convergence of our algorithm.

Theorem 3.1. For any µ > 0, consider the aforementioned algorithm with any starting point x^{0}, and let {x^{k}} be the sequence generated by the algorithm. Then

lim inf_{k→∞} ‖∇f_{µ}(x^{k})‖ = 0.   (22)

Proof. Suppose that the conclusion (22) is not true. Then, there exists a constant ε_{0} > 0 such that

‖∇f_{µ}(x^{k})‖ ≥ ε_{0}, ∀k.   (23)

Since (∇f_{µ}(x^{k}))^{T}d^{k} = −‖∇f_{µ}(x^{k})‖^{2}, there exists α_{k} > 0 such that

f_{µ}(x^{k} + α_{k}d^{k}) ≤ f_{µ}(x^{k}) − 2δ(1 − γ)α_{k}f_{µ}(x^{k}).   (24)

This means {f_{µ}(x^{k})} is decreasing and bounded, which implies that x^{k} ∈ L(x^{0}) and {f_{µ}(x^{k})} is convergent. We denote lim_{k→∞} f_{µ}(x^{k}) = f^{∗}. From (24), it can be verified that lim_{k→∞} α_{k}f_{µ}(x^{k}) = 0. By the definition of f_{µ}(x), we know f^{∗} > 0, and therefore

0 < α_{k} = α_{k}f_{µ}(x^{k})/f_{µ}(x^{k}) ≤ α_{k}f_{µ}(x^{k})/f^{∗}.

To sum up, we obtain

lim_{k→∞} α_{k} = 0.   (25)

Now, applying Lemma 3.2, there is a constant r > 0 such that

‖∇f_{µ}(x^{k})‖ ≤ r, ∀k.   (26)

Next, we prove that {‖d^{k}‖} is bounded by a contradiction argument. If {‖d^{k}‖} is unbounded, then there exists an index set K_{2} such that ‖d^{k}‖ → ∞ as k → ∞, k ∈ K_{2}. Let θ_{k} be the angle between −∇f_{µ}(x^{k}) and d^{k}. Then, we see that

cos θ_{k} = −(∇f_{µ}(x^{k}))^{T}d^{k} / (‖∇f_{µ}(x^{k})‖‖d^{k}‖) = ‖∇f_{µ}(x^{k})‖/‖d^{k}‖.   (27)

This relation enables us to apply ε_{0} ≤ ‖∇f_{µ}(x^{k})‖ ≤ r to conclude cos θ_{k} → 0 as k → ∞, k ∈ K_{2}, which means θ_{k} → π/2 as k → ∞, k ∈ K_{2}. Considering this geometric relationship, we have (∇f_{µ}(x^{k}))^{T}d^{k} → 0, k ∈ K_{2}, k → ∞, from which we find

−‖∇f_{µ}(x^{k})‖^{2} = (∇f_{µ}(x^{k}))^{T}d^{k} → 0, k ∈ K_{2}, k → ∞,

which contradicts (23). Thus, {‖d^{k}‖} is bounded, i.e., there exists a constant M^{∗} > ε_{0} such that

‖d^{k}‖ ≤ M^{∗}, ∀k.   (28)

Then, combining (25) with (28) gives

lim_{k→∞} α_{k}‖d^{k}‖ = 0.   (29)

In addition, from (27), we have cos θ_{k} ≥ ε_{0}/M^{∗}, which further yields

−(∇f_{µ}(x^{k}))^{T}d^{k} = ‖∇f_{µ}(x^{k})‖‖d^{k}‖ cos θ_{k} ≥ (ε^{2}_{0}/M^{∗})‖d^{k}‖.   (30)

Applying the mean value theorem, we obtain

f_{µ}(x^{k} + α_{k}d^{k}) = f_{µ}(x^{k}) + α_{k}(∇f_{µ}(ξ^{k}))^{T}d^{k}
  = f_{µ}(x^{k}) + α_{k}(∇f_{µ}(x^{k}))^{T}d^{k} + α_{k}(∇f_{µ}(ξ^{k}) − ∇f_{µ}(x^{k}))^{T}d^{k}
  ≤ f_{µ}(x^{k}) + α_{k}‖d^{k}‖[(∇f_{µ}(x^{k}))^{T}d^{k}/‖d^{k}‖ + ‖∇f_{µ}(ξ^{k}) − ∇f_{µ}(x^{k})‖],   (31)

where ξ^{k} lies between x^{k} and x^{k} + α_{k}d^{k}. Lemma 3.1 and the compactness of the level set L(x^{0}) imply that ∇f_{µ}(x) is uniformly continuous on L(x^{0}). This, together with (29), implies that there exists a constant α̂ > 0 such that, for sufficiently large k, α_{k}‖d^{k}‖ < α̂ and

‖∇f_{µ}(ξ^{k}) − ∇f_{µ}(x^{k})‖ < (1/2)(ε^{2}_{0}/M^{∗}).   (32)

Then, when k is sufficiently large, from (30), (31) and (32), we have

f_{µ}(x^{k} + α_{k}d^{k}) ≤ f_{µ}(x^{k}) + α̂[−ε^{2}_{0}/M^{∗} + (1/2)(ε^{2}_{0}/M^{∗})] = f_{µ}(x^{k}) − α̂ε^{2}_{0}/(2M^{∗}),   (33)

which contradicts f_{µ}(x^{k+1}) − f_{µ}(x^{k}) → 0 as k → ∞. Therefore, there must hold

lim inf_{k→∞} ‖∇f_{µ}(x^{k})‖ = 0. □

### 4 Numerical Experiments

In this section, we conduct numerical experiments to show the performance of our algorithm for signal recovery. All tests are run in MATLAB R2016 on a 64-bit PC with an Intel(R) Core(TM) i7-6500U 2.50 GHz CPU and 8.00 GB of RAM, running the Windows 7 operating system.

Experiment 1

In what follows, we divide our tests into two groups, the noise-free case and the noisy case, and report the numerical results of our algorithm for signal recovery. In all of our simulations, A ∈ IR^{m×n} is a Gaussian matrix with m = n/2 or n/4, and x^{∗} ∈ IR^{n} is a K-sparse original signal with K = n/40, whose nonzero components follow the normal distribution N(0, 1). Moreover, we set b = Ax + e in the noisy case, where e is Gaussian noise with zero mean and variance σ^{2}.

The parameters of our algorithm are summarized as follows:

x^{0} = zeros(n, 1), λ = 0.001‖A^{T}b‖_{∞}, ρ = 0.5, δ = 0.002, γ = 0.2, σ^{2} = 0.0001.

In Tables 1 and 2, we set µ_{0} = 0.1 and µ_{k+1} = 0.4µ_{k}, and the algorithm terminates when the relative error satisfies

‖x − x^{∗}‖/‖x^{∗}‖ < 4 × 10^{−3},

where x is the reconstructed signal. In Tables 3 and 4, F denotes the smoothing function, CPU the cpu time, Re the relative error, and It the number of iterations; we set µ = 1.0e−2, 1.0e−3, 1.0e−4, 1.0e−5, and the algorithm terminates when

|f_{µ}(x^{k+1}) − f_{µ}(x^{k})| / |f_{µ}(x^{k+1})| < 1.0e−12.

Note that the result from each test is obtained by running our algorithm 10 times and taking its average result.
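The test-problem setup just described can be sketched as follows (our own code; the function names, the seed, and the default ratios are illustrative):

```python
import numpy as np

def make_problem(n, m_ratio=2, sparsity_ratio=40, sigma2=0.0001, noisy=False,
                 rng=np.random.default_rng(0)):
    """Gaussian A in IR^{m x n}, K-sparse x* with N(0,1) nonzeros, b = A x* (+e)."""
    m, K = n // m_ratio, n // sparsity_ratio
    A = rng.standard_normal((m, n))
    x_star = np.zeros(n)
    support = rng.choice(n, K, replace=False)
    x_star[support] = rng.standard_normal(K)
    b = A @ x_star
    if noisy:
        b = b + np.sqrt(sigma2) * rng.standard_normal(m)   # zero-mean Gaussian noise
    return A, b, x_star

def rel_error(x, x_star):
    return np.linalg.norm(x - x_star) / np.linalg.norm(x_star)

A, b, x_star = make_problem(2000)
print(A.shape, np.count_nonzero(x_star))   # (1000, 2000) 50
```

With n = 2000 and m = n/2, the generator reproduces the smallest configuration of Tables 1-2 (m = 1000 measurements, K = 50 nonzeros).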

In our experiments, we found that ψ_{1} is very unstable; sometimes it is not applicable to the test problems. A similar concern was also addressed in [13, page 135]. Accordingly, we tried to adopt the equivalent expression suggested in [13, formula (3.1)] and rewrite ψ_{1}(t) as below:

ψ_{1}(t) = µ[ln(e^{λ_{1}(A(t))/µ} + e^{λ_{2}(A(t))/µ}) + ln(e^{λ_{1}(B(t))/µ} + e^{λ_{2}(B(t))/µ})]
  = λ_{1}(A(t)) + µ ln(1 + e^{(λ_{2}(A(t))−λ_{1}(A(t)))/µ}) + λ_{1}(B(t)) + µ ln(1 + e^{(λ_{2}(B(t))−λ_{1}(B(t)))/µ}),

where

A(t) = diag(0, −t), B(t) = diag(0, t),

λ_{1}(A(t)) = 0, λ_{2}(A(t)) = −t, λ_{1}(B(t)) = 0, λ_{2}(B(t)) = t.

Nonetheless, ψ_{1} still does not work well with the algorithm. We therefore omit it in our numerical reports.

| n | m | CPU ψ_{2} | CPU ψ_{3} | CPU ψ_{4} | CPU ψ_{5} | CPU ψ_{6} | It ψ_{2} | It ψ_{3} | It ψ_{4} | It ψ_{5} | It ψ_{6} |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | n/2 | 1.21 | 0.94 | 1.32 | 1.27 | 0.92 | 72 | 74 | 75 | 73 | 69 |
| 2000 | n/4 | 1.37 | 0.90 | 1.45 | 1.59 | 1.00 | 129 | 135 | 142 | 148 | 147 |
| 4000 | n/2 | 4.16 | 3.83 | 5.01 | 4.50 | 3.80 | 70 | 70 | 78 | 72 | 68 |
| 4000 | n/4 | 4.68 | 4.12 | 5.19 | 5.58 | 4.50 | 131 | 143 | 143 | 148 | 146 |
| 6000 | n/2 | 9.54 | 8.49 | 9.80 | 9.22 | 8.61 | 70 | 70 | 70 | 69 | 68 |
| 6000 | n/4 | 9.69 | 10.35 | 10.71 | 12.91 | 10.54 | 134 | 163 | 145 | 159 | 164 |
| 8000 | n/2 | 16.93 | 16.25 | 18.76 | 18.91 | 15.55 | 67 | 71 | 74 | 78 | 69 |
| 8000 | n/4 | 15.76 | 15.65 | 18.65 | 19.49 | 15.75 | 126 | 137 | 145 | 144 | 138 |
| 10000 | n/2 | 25.88 | 24.06 | 30.07 | 25.83 | 24.21 | 67 | 67 | 76 | 68 | 68 |
| 10000 | n/4 | 27.94 | 28.28 | 29.96 | 34.48 | 27.60 | 139 | 138 | 151 | 160 | 149 |

Table 1: Comparisons of five smoothing functions for the noise-free case (left block: CPU time; right block: iterations)

| n | m | CPU ψ_{2} | CPU ψ_{3} | CPU ψ_{4} | CPU ψ_{5} | CPU ψ_{6} | It ψ_{2} | It ψ_{3} | It ψ_{4} | It ψ_{5} | It ψ_{6} |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | n/2 | 1.27 | 1.00 | 1.30 | 1.33 | 1.00 | 73 | 78 | 75 | 75 | 72 |
| 2000 | n/4 | 1.54 | 0.92 | 1.64 | 1.55 | 0.91 | 145 | 151 | 154 | 150 | 144 |
| 4000 | n/2 | 4.48 | 4.47 | 4.89 | 4.62 | 4.36 | 72 | 80 | 75 | 72 | 76 |
| 4000 | n/4 | 4.94 | 5.16 | 5.77 | 5.09 | 4.65 | 137 | 159 | 154 | 143 | 158 |
| 6000 | n/2 | 8.81 | 9.07 | 9.80 | 9.91 | 8.97 | 65 | 72 | 72 | 67 | 69 |
| 6000 | n/4 | 11.65 | 11.61 | 12.10 | 12.38 | 10.12 | 160 | 167 | 152 | 165 | 154 |
| 8000 | n/2 | 16.27 | 17.05 | 18.28 | 18.53 | 16.69 | 69 | 75 | 76 | 74 | 71 |
| 8000 | n/4 | 15.76 | 18.04 | 19.27 | 17.99 | 16.91 | 132 | 155 | 145 | 145 | 143 |
| 10000 | n/2 | 25.59 | 25.58 | 26.04 | 25.82 | 25.38 | 68 | 71 | 67 | 67 | 68 |
| 10000 | n/4 | 25.82 | 27.23 | 37.00 | 29.52 | 27.73 | 136 | 150 | 175 | 150 | 155 |

Table 2: Comparisons of five smoothing functions for the noisy case (left block: CPU time; right block: iterations)

Figure 1: The convergence behavior: cpu time vs. signal size (n) when µ_{k+1} = 0.4µ_{k}

Figure 2: The convergence behavior: iterations vs. signal size (n) when µ_{k+1} = 0.4µ_{k}

Columns are grouped by µ = 1.0e−2, 1.0e−3, 1.0e−4, 1.0e−5; each group reports CPU, Re, It.

| F | m | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | n/2 | 8.80 | 0.0158 | 92 | 9.34 | 0.0040 | 89 | 11.03 | 0.0033 | 100 | 16.78 | 0.0025 | 141 |
| ψ_{2} | n/4 | 10.98 | 0.0365 | 172 | 11.49 | 0.0064 | 178 | 11.83 | 0.0029 | 184 | 17.06 | 0.0028 | 237 |
| ψ_{3} | n/2 | 9.44 | 0.0567 | 104 | 8.14 | 0.0090 | 86 | 7.98 | 0.0032 | 86 | 10.93 | 0.0025 | 111 |
| ψ_{3} | n/4 | 7.44 | 0.1532 | 167 | 9.10 | 0.0195 | 179 | 8.36 | 0.0045 | 166 | 10.34 | 0.0030 | 188 |
| ψ_{4} | n/2 | 8.22 | 0.0538 | 71 | 7.92 | 0.0059 | 73 | 10.58 | 0.0027 | 93 | 12.01 | 0.0025 | 104 |
| ψ_{4} | n/4 | 9.13 | 0.0157 | 139 | 11.11 | 0.0154 | 155 | 10.69 | 0.0035 | 179 | 13.43 | 0.0032 | 194 |
| ψ_{5} | n/2 | 7.88 | 0.0419 | 68 | 7.59 | 0.0055 | 72 | 10.14 | 0.0025 | 91 | 12.77 | 0.0025 | 109 |
| ψ_{5} | n/4 | 10.26 | 0.1895 | 132 | 11.09 | 0.0117 | 156 | 10.51 | 0.0041 | 169 | 12.83 | 0.0034 | 193 |
| ψ_{6} | n/2 | 8.95 | 0.0360 | 96 | 7.94 | 0.0063 | 84 | 9.69 | 0.0027 | 99 | 12.57 | 0.0023 | 121 |
| ψ_{6} | n/4 | 9.17 | 0.0926 | 166 | 9.16 | 0.0128 | 173 | 8.98 | 0.0038 | 174 | 11.75 | 0.0030 | 203 |

Table 3: Comparisons of five smoothing functions for different µ when n = 5000, noise-free case

From the experiments, we have the following observations:

(1) The sparse original signal can be recovered effectively by our algorithm. Tables 1-4 and Figures 1-5 illustrate that the five smoothing functions (all except ψ_{1}) work quite well in our algorithm, whether the given problem is noise-free or noisy.

(2) The convergence behavior with respect to CPU time, iterations, signal size, and relative error is depicted in Figures 1-5, respectively. From Tables 1-2 and Figures 1-2, we see that our algorithm costs more cpu time as the dimension n increases, while the changes in iteration counts are very marginal. On the whole, the performance order can be summarized as

ψ_{2} ≥ ψ_{4} ≈ ψ_{5} ≈ ψ_{6} > ψ_{3},

where "≥" means "performs better than" and "≈" means "no difference in numerical performance".

Columns are grouped by µ = 1.0e−2, 1.0e−3, 1.0e−4, 1.0e−5; each group reports CPU, Re, It.

| F | m | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | n/2 | 8.62 | 0.0159 | 89 | 8.77 | 0.0072 | 88 | 11.82 | 0.0026 | 100 | 15.80 | 0.0025 | 143 |
| ψ_{2} | n/4 | 11.67 | 0.0379 | 174 | 10.40 | 0.0063 | 180 | 10.26 | 0.0033 | 188 | 15.03 | 0.0033 | 242 |
| ψ_{3} | n/2 | 8.54 | 0.0548 | 100 | 7.86 | 0.0085 | 88 | 8.75 | 0.0033 | 92 | 10.38 | 0.0025 | 109 |
| ψ_{3} | n/4 | 7.43 | 0.1506 | 169 | 8.43 | 0.0185 | 169 | 7.78 | 0.0045 | 177 | 9.82 | 0.0030 | 203 |
| ψ_{4} | n/2 | 6.76 | 0.0491 | 64 | 8.11 | 0.0058 | 76 | 11.40 | 0.0028 | 95 | 11.93 | 0.0026 | 105 |
| ψ_{4} | n/4 | 11.10 | 0.1439 | 154 | 8.84 | 0.0137 | 152 | 9.67 | 0.0035 | 175 | 11.52 | 0.0035 | 184 |
| ψ_{5} | n/2 | 7.36 | 0.0542 | 65 | 7.82 | 0.0059 | 75 | 9.57 | 0.0031 | 87 | 11.33 | 0.0025 | 104 |
| ψ_{5} | n/4 | 10.19 | 0.1492 | 143 | 10.00 | 0.0123 | 154 | 10.48 | 0.0039 | 177 | 11.37 | 0.0034 | 191 |
| ψ_{6} | n/2 | 8.54 | 0.0355 | 94 | 8.78 | 0.0061 | 92 | 9.05 | 0.0029 | 96 | 11.57 | 0.0026 | 119 |
| ψ_{6} | n/4 | 7.77 | 0.0908 | 160 | 9.15 | 0.0122 | 180 | 8.11 | 0.0040 | 167 | 10.50 | 0.0030 | 207 |

Table 4: Comparisons of five smoothing functions for different µ when n = 5000, noisy case

Figure 3: The convergence behavior: cpu time vs. −log µ with n = 5000

Figure 4: The convergence behavior: iterations vs. −log µ with n = 5000

Figure 5: The convergence behavior: relative error vs. −log µ with n = 5000

(3) According to Tables 3-4 and Figures 3-5, the smaller the parameter µ, the better the signal recovery, and the more cpu time and iterations are spent accordingly. For any fixed µ, although the function ψ_{2} sometimes takes a bit more cpu time and iterations, it has a lower relative error. In view of this, we conclude that the function ψ_{2} works best with the proposed algorithm. From Figure 5, the function ψ_{6} also has a lower relative error, which means ψ_{6} is a possible good choice as well. To sum up, in view of relative error, we have

ψ_{2} ≥ ψ_{4} ≈ ψ_{5} ≈ ψ_{6} > ψ_{3}.
Experiment 2

Next, we compare with "NESTA", a well-known fast open-source solver proposed in [2]. In the noisy case, b = Ax + e, where e is Gaussian noise with zero mean and variance σ^{2}. In all of our simulations, A ∈ IR^{m×n} is a Gaussian matrix with n = 2^{13} and m = n/4. We select a K-sparse original signal x^{∗} ∈ IR^{n} with K = n/40, whose nonzero components follow the normal distribution N(0, 1). In our test, we set

x^{0} = A^{T}b, λ = 0.0048‖A^{T}b‖_{∞}, ρ = 0.5, δ = 0.002, γ = 0.2, σ^{2} = 0.0001.

We also accelerate our algorithm by using

µ_{k+1} = ωµ_{k}, 1 ≤ k ≤ t, ω = (µ_{t}/µ_{0})^{1/t}, µ_{t} = 10^{−8}, µ_{0} = 10^{−2}, and t = 4.

The algorithm terminates when

|f_{µ}(x^{k}) − f̄_{µ}(x^{k})| / f̄_{µ}(x^{k}) < tol, where f̄_{µ}(x^{k}) = (1/min{5, k}) ∑_{l=1}^{min{5,k}} f_{µ}(x^{k−l}).

In our experiments, the accuracy tol ∈ {10^{−4}, 10^{−5}, 10^{−6}, 10^{−7}, 10^{−8}}. We test the continuation version of NESTA for comparison. In testing NESTA, all other parameters are taken as default except for the above parameters.
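The geometric continuation schedule for µ can be computed directly; a small sketch with the stated values:

```python
# omega = (mu_t / mu_0)^(1/t) shrinks mu from mu_0 to mu_t in exactly t steps.
mu0, mu_t, t = 1e-2, 1e-8, 4
omega = (mu_t / mu0) ** (1.0 / t)
mus = [mu0 * omega**k for k in range(t + 1)]
print(omega)      # 0.0316227... = 10**(-1.5)
print(mus[-1])    # 1e-08 (up to rounding)
```

Each step divides µ by roughly 31.6, which is far more aggressive than the fixed γ̄ = 0.4 of Experiment 1 but is applied only t = 4 times.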

From the comparison experiments, we have the following observations:

(1) In view of CPU time, NESTA is really fast. However, from Tables 5-6 and Figure 8, as the accuracy tol gets smaller, the CPU time of our algorithm is not affected much, whereas the CPU time of NESTA goes up dramatically.

(2) From Tables 5-6 and Figure 7, we see that our algorithm has a small relative error even when the accuracy tol is relatively large. In particular, when tol = 10^{−4} and tol = 10^{−5}, the relative error of NESTA becomes large, which indicates that the signal has not been recovered well. By contrast, the function ψ_{2} keeps a small error, which means it has better performance.

(3) From Tables 5-6 and Figure 6, NESTA needs more iterations. In view of this and the relative error, our proposed algorithm is competitive.

Figure 6: The convergence behavior: iterations vs. accuracy

Figure 7: The convergence behavior: relative error vs. accuracy

Figure 8: The convergence behavior: CPU time vs. accuracy

Experiment 3

In order to observe the efficiency of sparse recovery, we compare our algorithm with three other algorithms, NESTA, FPCBB and FISTA, where a Bernoulli matrix, a partial Hadamard matrix and a Gaussian matrix are chosen. In our implementations, we select n = 2^{14}, m = n/2 and sparsity K = n/40. The parameters µ_{k}, δ and γ are the same as in Experiment 2. See Figures 9-11.

The result from each test is obtained by running our algorithm 30 times and taking the average. Figures 9-11 show the frequency of successful reconstruction at different relative errors. We can see that our algorithm gives relatively better behavior when the relative error is small enough, which means that our algorithm has slightly higher efficiency on sparse recovery. To sum up, we obtain

ψ_{2} ≥ ψ_{4} ≈ ψ_{5} ≈ ψ_{6} > ψ_{3}.

Figure 9: Comparisons among algorithms with Bernoulli matrix

Figure 10: Comparisons among algorithms with Gaussian matrix

Figure 11: Comparisons among algorithms with Partial Hadamard matrix

### 5 Conclusions

In this paper, the sparse signal recovery problem is investigated, with focus on the l_{1}-norm model. In light of the convolution techniques in [23, 27] and the framework of the nonlinear conjugate gradient method [32], we propose a unified smoothing approach to solve the given sparse signal reconstruction problem. As a byproduct, the classical l_{1}-norm is approximated by six smoothing functions. Numerical results show that ψ_{2} and ψ_{6} are the better choices of smoothing function to work with the algorithm. Although the theoretical part is not very notable, the contribution of this paper lies in the numerical comparisons among new smoothing functions. In particular, we suggest two nice smoothing functions to work along with the nonlinear conjugate gradient method. In addition, we compare our algorithm with three other well-known open-source algorithms, NESTA, FPCBB and FISTA, where a Bernoulli matrix, a partial Hadamard matrix and a Gaussian matrix are chosen. Our algorithm gives relatively better behavior when the relative error is small enough. In this sense, we provide a simple and effective new approach to the recovery of the original sparse signal.

Our attention was drawn to [14] by one reviewer. In [14], the authors proposed a smoothing conjugate gradient method, and then used this algorithm to solve the image restoration problem. More specifically, it used a smoothing function as below:

$$
s_{\mu}(t) =
\begin{cases}
|t| & \text{if } |t| > \dfrac{\mu}{2}, \\[6pt]
\dfrac{t^{2}}{\mu} + \dfrac{\mu}{4} & \text{if } |t| \le \dfrac{\mu}{2}.
\end{cases}
$$

This smoothing function is exactly the ψ_{2}(µ, t) function in our paper. Our contribution lies in showing that this smoothing function is also a good choice for the signal reconstruction problem, compared with the other smoothing functions, which are not investigated in [14].
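As a quick sanity check (our own illustration, not code from [14]), this function coincides with |t| outside the interval [-µ/2, µ/2] and overestimates |t| by at most µ/4, so it converges uniformly to the absolute value as µ → 0:

```python
import numpy as np

def s_mu(t, mu):
    # the Huber-type smoothing function of |t| discussed above
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) > mu / 2, np.abs(t), t**2 / mu + mu / 4)

mu = 0.1  # illustrative smoothing parameter
t = np.linspace(-1.0, 1.0, 2001)
gap = s_mu(t, mu) - np.abs(t)  # equals (|t| - mu/2)^2 / mu on [-mu/2, mu/2], else 0
print(gap.min() >= 0, gap.max() <= mu / 4 + 1e-12)  # True True
```

The two pieces also match in value and slope at |t| = µ/2 (both equal µ/2 with derivative ±1), which is why s_µ is continuously differentiable.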

Acknowledgements. The authors would like to thank the anonymous referee and the editor for their valuable comments, viewpoints, and suggestions, which helped improve the manuscript considerably.

### References

[1] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol. 2, pp. 183-202, 2009.

[2] S. Becker, J. Bobin, and E. Candes, NESTA: a fast and accurate first-order method for sparse recovery, SIAM Journal on Imaging Sciences, vol. 4, pp. 1-39, 2011.

[3] E. Berg and M.P. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM Journal on Scientific Computing, vol. 31, pp. 890-912, 2008.

[4] J.M. Bioucas-Dias and M. Figueiredo, A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Transactions on Image Processing, vol. 16, pp. 2992-3004, 2007.

[5] R.L. Broughton, I.D. Coope, P.F. Renaud, and R.E.H. Tappenden, A box constrained gradient projection algorithm for compressed sensing, Signal Processing, vol. 91, pp. 1985-1992, 2011.

[6] E.J. Candes, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589-592, 2008.

[7] E.J. Candes and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, 2005.

[8] E.J. Candes and T. Tao, Near optimal signal recovery from random projections: universal encoding strategies?, IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, 2006.

[9] E.J. Candes, J.K. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, 2006.

[10] E.J. Candes, J.K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, 2006.

[11] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications, vol. 5, pp. 97-138, 1996.

[12] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Mathematical Programming, vol. 134, pp. 71-99, 2012.

[13] X. Chen, R.S. Womersley, and J. Ye, Minimizing the condition number of a Gram matrix, SIAM Journal on Optimization, vol. 21, no. 1, pp. 127-148, 2011.

[14] X. Chen and W. Zhou, Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 765-790, 2010.

[15] C. De Mol and M. Defrise, A note on wavelet-based inversion algorithms, Contemporary Mathematics, vol. 313, pp. 85-96, 2002.

[16] D.L. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.

[17] M. Elad, Why simple shrinkage is still relevant for redundant representations?, IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5559-5569, 2006.

[18] M. Figueiredo, R. Nowak, and S.J. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing, vol. 1, pp. 586-597, 2007.

[19] E.T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM Journal on Optimization, vol. 19, pp. 1107-1130, 2008.

[20] E.T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation applied to compressed sensing: implementation and numerical experiments, Journal of Computational Mathematics, vol. 28, pp. 170-194, 2010.

[21] R. Nowak and M. Figueiredo, Fast wavelet-based image deconvolution using the EM algorithm, in Proceedings of the 35th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 371-375, 2001.

[22] L. Qi and D. Sun, Smoothing functions and smoothing Newton method for complementarity and variational inequality problems, Journal of Optimization Theory and Applications, vol. 113, pp. 121-147, 2001.

[23] B. Saheya, C.H. Yu, and J.-S. Chen, Numerical comparisons based on four smoothing functions for absolute value equation, Journal of Applied Mathematics and Computing, vol. 56, no. 1-2, pp. 131-149, 2018.

[24] J.L. Starck, E. Candes, and D. Donoho, Astronomical image representation by the curvelet transform, Astronomy and Astrophysics, vol. 398, pp. 785-800, 2003.

[25] J.L. Starck, M. Nguyen, and F. Murtagh, Wavelets and curvelets for image deconvolution: a combined approach, Signal Processing, vol. 83, pp. 2279-2283, 2003.

[26] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, vol. 58, pp. 267-288, 1996.

[27] S. Voronin, G. Ozkaya, and D. Yoshida, Convolution based smooth approximations to the absolute value function with application to non-smooth regularization, arXiv:1408.6795v2 [math.NA], July 2015.

[28] X. Wang, F. Liu, L.C. Jiao, J. Wu, and J. Chen, Incomplete variables truncated conjugate gradient method for signal reconstruction in compressed sensing, Information Sciences, vol. 288, pp. 387-411, 2014.

[29] S. Wright, R. Nowak, and M. Figueiredo, Sparse reconstruction by separable approximation, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, October 2008.

[30] J. Yang and Y. Zhang, Alternating direction algorithms for l1-problems in com- pressive sensing, SIAM Journal on Scientific Computing, vol. 33, pp. 250-278, 2011.

[31] K. Yin, Y.H. Xiao, and M.L. Zhang, Nonlinear conjugate gradient method for l1-norm regularization problems in compressive sensing, Journal of Computational Information Systems, vol. 7, pp. 880-885, 2011.

[32] L. Zhang, W. Zhou, and D. Li, A descent modified Polak-Ribiere-Polyak conju- gate gradient method and its global convergence, IMA Journal of Numerical Analysis, vol. 26, pp. 629-640, 2006.

[33] H. Zhu, Y.H. Xiao, and S.Y. Wu, Large sparse signal recovery by conjugate gradient algorithm based on smoothing technique, Computers and Mathematics with Applications, vol. 66, pp. 24-32, 2013.

| F | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | 23.77 | 9.62E-04 | 150 | 28.30 | 1.98E-05 | 177 | 35.65 | 2.11E-05 | 213 | 46.62 | 1.91E-05 | 256 | 59.97 | 1.58E-05 | 342 |
| ψ_{3} | 21.82 | 6.37E-03 | 142 | 24.55 | 3.03E-05 | 171 | 30.88 | 2.29E-05 | 206 | 36.54 | 2.01E-05 | 241 | 40.39 | 1.76E-05 | 286 |
| ψ_{4} | 26.64 | 7.00E-03 | 146 | 27.03 | 4.32E-05 | 169 | 36.79 | 2.21E-05 | 215 | 45.12 | 2.04E-05 | 258 | 59.34 | 1.76E-05 | 338 |
| ψ_{5} | 26.32 | 2.21E-03 | 157 | 28.04 | 2.77E-05 | 174 | 34.73 | 2.12E-05 | 209 | 45.38 | 2.14E-05 | 256 | 49.89 | 1.88E-05 | 296 |
| ψ_{6} | 19.53 | 4.82E-03 | 138 | 24.83 | 3.01E-05 | 172 | 32.41 | 2.34E-05 | 214 | 38.80 | 2.06E-05 | 245 | 50.09 | 1.68E-05 | 301 |
| NESTA | 7.96 | 2.28E-02 | 555 | 8.30 | 2.95E-02 | 568 | 17.12 | 3.26E-04 | 1134 | 15.98 | 1.82E-05 | 1060 | 44.50 | 1.74E-05 | 2999 |

Table 5: Comparisons of five smoothing functions and NESTA for different tol in the noise-free case. Column groups, from left to right, correspond to tol = 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8; each group reports CPU time (CPU), relative error (Re), and iteration count (It).

| F | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | 23.25 | 2.49E-03 | 146 | 26.10 | 7.37E-04 | 165 | 36.67 | 5.91E-04 | 222 | 62.27 | 5.80E-04 | 332 | 104.44 | 5.95E-04 | 542 |
| ψ_{3} | 19.53 | 5.57E-03 | 138 | 19.02 | 4.24E-03 | 136 | 31.62 | 5.83E-04 | 215 | 53.04 | 5.99E-04 | 332 | 80.03 | 5.83E-04 | 459 |
| ψ_{4} | 25.11 | 3.52E-03 | 162 | 23.84 | 8.90E-03 | 153 | 37.37 | 5.39E-04 | 224 | 56.38 | 5.65E-04 | 314 | 104.55 | 5.68E-04 | 520 |
| ψ_{5} | 24.72 | 5.91E-03 | 161 | 26.40 | 7.22E-04 | 164 | 35.52 | 6.31E-04 | 214 | 57.01 | 5.75E-04 | 317 | 83.13 | 5.83E-04 | 429 |
| ψ_{6} | 20.45 | 1.23E-02 | 148 | 22.16 | 6.80E-04 | 156 | 33.50 | 5.63E-04 | 222 | 56.77 | 5.85E-04 | 308 | 94.43 | 5.75E-04 | 507 |
| NESTA | 7.67 | 2.13E-02 | 525 | 8.28 | 1.10E-02 | 549 | 15.81 | 7.85E-04 | 1086 | 16.45 | 7.47E-04 | 1073 | 85.89 | 6.16E-04 | 5886 |

Table 6: Comparisons of five smoothing functions and NESTA for different tol in the noisy case. Column groups, from left to right, correspond to tol = 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8; each group reports CPU time (CPU), relative error (Re), and iteration count (It).