Calcolo, vol. 56, no. 4, Article 42, 26 pages, December, 2019

### Signal reconstruction by conjugate gradient algorithm based on smoothing l_{1}-norm

Caiying Wu^{1}, College of Mathematics Science, Inner Mongolia University, Hohhot 010021, China

Jiaming Zhan^{2}, College of Mathematics Science, Inner Mongolia University, Hohhot 010021, China

Yue Lu^{3}, School of Mathematical Sciences, Tianjin Normal University, Tianjin 300387, China

Jein-Shan Chen^{4}, Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

January 12, 2019 (1st revision on May 21, 2019; 2nd revision on July 29, 2019; 3rd revision on August 18, 2019)

^{1}E-mail: wucaiyingyun@163.com. The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016).

^{2}E-mail: zhanjiaming1220@qq.com. The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016).

^{3}E-mail: jinjin403@sina.com. The research is supported by the National Natural Science Foundation of China (Grant Number: 11601389), the Doctoral Foundation of Tianjin Normal University (Grant Number: 52XB1513), the 2017-Outstanding Young Innovation Team Cultivation Program of Tianjin Normal University (Grant Number: 135202TD1703), and the Program for Innovative Research Team in Universities of Tianjin (Grant Number: TD13-5078).

^{4}Corresponding author. E-mail: jschen@math.ntnu.edu.tw. The research is supported by the Ministry of Science and Technology, Taiwan.

Abstract. The l_{1}-norm regularized minimization problem is non-differentiable and has a wide range of applications in compressive sensing. Many approaches have been proposed in the literature; among them, smoothing the l_{1}-norm is one of the effective ones. This paper follows that path: we adopt six smoothing functions to approximate the l_{1}-norm, recast the signal recovery problem as a smoothing penalized least squares optimization problem, and apply a nonlinear conjugate gradient method to solve the smoothing model. The algorithm is shown to be globally convergent. In addition, the simulation results not only suggest some nice smoothing functions, but also show that the proposed algorithm is competitive in terms of relative error.

Keywords. l1-norm regularization, compressive sensing, conjugate gradient algorithm, smoothing function.

Mathematics Subject Classification (2000) 90C33

### 1 Introduction

In this paper, we focus on the topic of reconstructing signals, which is one of the most important applications in compressive sensing [8, 9, 16]. There exist numerous textbooks and articles related to this topic, and there is no need to repeat its importance and applications here. Therefore, we go directly to its mathematical model and convey our new idea for tackling this problem. Mathematically, the noise-free signal reconstruction problem is modeled as follows:

min ‖x‖_{0}
s.t. b = Ax,   (1)

where x ∈ IR^{n} is the original sparse signal that needs to be recovered, A ∈ IR^{m×n} (m ≪ n) is the measurement matrix, b ∈ IR^{m} is the observation vector, and ‖x‖_{0} represents the l_{0}-norm of x, which is defined as the number of nonzero components of x. Without loss of generality, we assume that there exists a positive constant K ≤ n such that ‖x‖_{0} = K.

Besides this condition, if the measurement matrix A satisfies the Restricted Isometry Property (RIP) of order K, then we can recover the signal x more accurately through the model (1); see [6, 7, 8, 9, 10] for more details.

Unfortunately, the l_{0}-norm minimization problem (1) is an NP-hard combinatorial optimization problem. To avoid this difficulty, researchers replace the l_{0}-norm by the l_{1}-norm in model (1), which yields the following l_{1} minimization problem:

min ‖x‖_{1}
s.t. b = Ax,   (2)

where ‖x‖_{1} denotes the l_{1}-norm of x, i.e., ‖x‖_{1} = ∑_{i=1}^{n} |x_{i}|. Under the RIP condition, the model (2) has the same solution as (1). In practice, the probed signal b is usually contaminated by noise, which motivates the noisy signal reconstruction problem:

min ‖x‖_{1}
s.t. b = Ax + e,   (3)

where e ∈ IR^{m} denotes the noise. To deal with (3), researchers prefer to consider the associated penalized least squares optimization problem

min λ‖x‖_{1} + (1/2)‖b − Ax‖^{2}_{2},   (4)

where λ > 0 is the penalty factor. In the sequel, we call (4) the l_{1}-norm regularized problem.

Until now, there have been plenty of numerical algorithms for solving the model (4), and some first-order algorithms have drawn much attention during the past decades. Due to the huge amount of literature, we only outline some approaches below. The gradient projection algorithm [18] is one of the earliest gradient-based algorithms for solving (4); there, the problem was reformulated as a box-constrained quadratic program and solved by gradient projection. More recently, the most extensively investigated first-order algorithm for the solution of (4) has been the iterative shrinkage/thresholding (IST) algorithm [15, 21], whose results triggered many contributions based on this method [17, 24, 25]. In light of the IST algorithm, many variants have been proposed under different optimization reformulations and techniques. For instance, Hale, Yin and Zhang [19, 20]

presented an IST fixed point continuation (FPC) method by an operator splitting skill.

Wright, Nowak and Figueiredo [29] studied SpaRSA for sparse reconstruction via solving nonsmooth convex problems. Experimental results show that the accelerated IST methods (the two-step IST [4] and the fast IST [1]) have better convergence properties. In addition, the famous NESTA [2] first proposed a smoothing function of the l_{1}-norm and then used Nesterov's gradient method to obtain the solution of (3). Besides these methods, the
gradient projection technique and alternating direction ideas are also considered, see
[3, 5, 30] for more details.

Because of their simplicity and low storage requirements, conjugate gradient algorithms are suitable for large-scale problems. In this paper, like [28, 31, 33], we are interested
in applying the conjugate gradient method to signal recovery. We use six smoothing functions, studied in [23, 27] for the absolute value function, to approximate the l_{1}-norm. Accordingly, we reformulate the signal recovery problem as a smoothing penalized least squares optimization problem and apply the nonlinear conjugate gradient method [32] to solve the smoothing model. The proposed algorithm is shown to be globally convergent.

Moreover, we report some numerical experiments to demonstrate the effectiveness of our method. Numerical comparisons with "NESTA", "FPCBB", and "FISTA", which are well-known open-source solvers, are presented as well.

### 2 Preliminary

This section recalls some ideas about the smoothing technique, which can be found in [11, 12, 23] and the references therein.

It is well known that the absolute value function |t| (t ∈ IR) is not smooth at zero. To overcome this difficulty, we introduce some smoothing functions of |t| that are used in the sequel.

Definition 2.1. The function ψ : IR_{++} × IR → IR is a smoothing function of |t| if the following two conditions hold:

(a) ψ is continuously differentiable at every (µ, t) ∈ IR_{++} × IR;

(b) lim_{µ↓0} ψ(µ, t) = |t| for any t ∈ IR.

How does one construct smoothing functions for |t|? First, we observe that |t| can be split into two parts:

|t| = (t)_{+} − (t)_{−} = (t)_{+} + (−t)_{+},

where (t)_{+} denotes the plus function, i.e., (t)_{+} = max{0, t}, and (t)_{−} = min{0, t}. As mentioned in [11, 12, 23], one can follow the procedure below to construct a smoothing function for (t)_{+}. More specifically, introducing a piecewise density (kernel) function d(t) satisfying

d(t) ≥ 0 and ∫_{−∞}^{+∞} d(t) dt = 1,

one can define

ŝ(t, µ) := (1/µ) d(t/µ),

where µ is a positive parameter. If the condition

∫_{−∞}^{+∞} |t| d(t) dt < +∞

holds, then the function p̂(t, µ) defined as

p̂(t, µ) = ∫_{−∞}^{+∞} (t − s)_{+} ŝ(s, µ) ds = ∫_{−∞}^{t} (t − s) ŝ(s, µ) ds ≈ (t)_{+}

is a smoothing approximation of (t)_{+}. There are well-known smoothing functions for the plus function [11, 22, 23], for example,

ψ̂_{1}(µ, t) = t + µ ln(1 + e^{−t/µ}),   (5)

ψ̂_{2}(µ, t) =
  t,  if t ≥ µ/2,
  (1/(2µ))(t + µ/2)^{2},  if −µ/2 < t < µ/2,
  0,  if t ≤ −µ/2,   (6)

ψ̂_{3}(µ, t) = (√(4µ^{2} + t^{2}) + t)/2,   (7)

ψ̂_{4}(µ, t) =
  t − µ/2,  if t > µ,
  t^{2}/(2µ),  if 0 ≤ t ≤ µ,
  0,  if t < 0,   (8)

where the corresponding kernel functions are, respectively, given by

d_{1}(t) = e^{−t}/(1 + e^{−t})^{2},

d_{2}(t) = 1 if −1/2 ≤ t ≤ 1/2, and 0 otherwise,

d_{3}(t) = 2/(t^{2} + 4)^{3/2},

d_{4}(t) = 1 if 0 ≤ t ≤ 1, and 0 otherwise.

Likewise, we can obtain a smoothing function of |t| via convolution as follows:

p̂(|t|, µ) = p̂(t, µ) + p̂(−t, µ) = ∫_{−∞}^{+∞} |t − s| ŝ(s, µ) ds.
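As a quick sanity check, the convolution above can be evaluated numerically. The sketch below (our own code; grid, helper names, and tolerances are illustrative) convolves |t − s| with the scaled uniform kernel built from d_{2} and compares the result against the closed form ψ_{2} given in (10) below; the two agree to discretization accuracy.

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoidal rule, to stay independent of the numpy version
    return float(np.dot((y[1:] + y[:-1]) / 2, np.diff(x)))

mu = 0.5
s = np.linspace(-mu / 2, mu / 2, 20001)   # support of the scaled kernel
s_hat = np.full_like(s, 1.0 / mu)         # (1/mu) * d2(s/mu) on its support

def psi2(mu, t):
    # closed form produced by the uniform kernel, cf. (10)
    return abs(t) if abs(t) >= mu / 2 else t**2 / mu + mu / 4

for t in (-1.0, -0.1, 0.0, 0.2, 0.7):
    conv = trapezoid(np.abs(t - s) * s_hat, s)   # \int |t-s| s_hat(s,mu) ds
    print(t, conv, psi2(mu, t))                  # columns 2 and 3 agree
```

The uniform kernel is the easiest case to verify by hand: for |t| ≤ µ/2 the integral evaluates to t²/µ + µ/4, and outside the kernel's support it reduces to |t| exactly.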

Analogous to (5)-(8), we construct four smoothing functions for |t| as in [23]:

ψ_{1}(µ, t) = µ[ln(1 + e^{−t/µ}) + ln(1 + e^{t/µ})],   (9)

ψ_{2}(µ, t) =
  t,  if t ≥ µ/2,
  t^{2}/µ + µ/4,  if −µ/2 < t < µ/2,
  −t,  if t ≤ −µ/2,   (10)

ψ_{3}(µ, t) = √(4µ^{2} + t^{2}),   (11)

ψ_{4}(µ, t) =
  t^{2}/(2µ),  if |t| ≤ µ,
  |t| − µ/2,  if |t| > µ.   (12)

In particular, if we take the Epanechnikov kernel function

d(t) = (3/4)(1 − t^{2}) if |t| ≤ 1, and 0 otherwise,

then the associated smoothing function for |t| is expressed by

ψ_{5}(µ, t) =
  t,  if t > µ,
  −t^{4}/(8µ^{3}) + 3t^{2}/(4µ) + 3µ/8,  if −µ ≤ t ≤ µ,
  −t,  if t < −µ.   (13)

Moreover, taking the Gaussian kernel function d(t) = (1/√(2π)) e^{−t^{2}/2} for all t ∈ IR yields

ŝ(t, µ) := (1/µ) d(t/µ) = (1/√(2πµ^{2})) e^{−t^{2}/(2µ^{2})},

which leads to another type of smoothing function [27] for |t|:

ψ_{6}(µ, t) = t erf(t/(√2 µ)) + √(2/π) µ e^{−t^{2}/(2µ^{2})},   (14)

where the error function is defined as follows:

erf(t) = (2/√π) ∫_{0}^{t} e^{−u^{2}} du, ∀t ∈ IR.

In summary, we have obtained six smoothing functions and will employ them to approximate the l_{1}-norm for the signal recovery problem.
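For reference, the six smoothing functions can be written down directly. The sketch below is our own code (not from the paper); ψ_{1} is implemented via `logaddexp` for numerical stability, and the loop checks that each function converges uniformly to |t| as µ ↓ 0, in line with Definition 2.1(b).

```python
import numpy as np
from math import erf

# The six smoothing functions (9)-(14), vectorized over t; mu > 0.
def psi1(mu, t):
    # mu*[ln(1+e^{-t/mu}) + ln(1+e^{t/mu})], stabilized via logaddexp
    return mu * (np.logaddexp(0.0, -t / mu) + np.logaddexp(0.0, t / mu))

def psi2(mu, t):
    return np.where(np.abs(t) >= mu / 2, np.abs(t), t**2 / mu + mu / 4)

def psi3(mu, t):
    return np.sqrt(4 * mu**2 + t**2)

def psi4(mu, t):
    return np.where(np.abs(t) <= mu, t**2 / (2 * mu), np.abs(t) - mu / 2)

def psi5(mu, t):
    inner = -t**4 / (8 * mu**3) + 3 * t**2 / (4 * mu) + 3 * mu / 8
    return np.where(np.abs(t) <= mu, inner, np.abs(t))

def psi6(mu, t):
    return t * np.vectorize(erf)(t / (np.sqrt(2) * mu)) \
        + np.sqrt(2 / np.pi) * mu * np.exp(-t**2 / (2 * mu**2))

t = np.linspace(-2, 2, 401)
for psi in (psi1, psi2, psi3, psi4, psi5, psi6):
    # each psi(mu, .) -> |.| uniformly as mu -> 0
    print(psi.__name__, np.max(np.abs(psi(1e-4, t) - np.abs(t))))
```

The maximum gap occurs at t = 0 in every case (e.g., ψ_{3}(µ, 0) = 2µ), so the printed errors shrink linearly with µ.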

### 3 Algorithm and Convergence Properties

This section is devoted to the detailed description and implementation of our algorithmic idea and its global convergence results. Basically, it is a type of conjugate gradient algorithm. To proceed, we briefly review the conjugate gradient method. Suppose that f : IR^{n} → IR is a continuously differentiable function and the target problem is

min_{x∈IR^{n}} f(x).

Let d^{k} denote the search direction and α_{k} a step length. Then, the general framework of the conjugate gradient method is as follows:

x^{k+1} = x^{k} + α_{k}d^{k},
d^{k} = −∇f(x^{k}) if k = 0;  d^{k} = −∇f(x^{k}) + β_{k}d^{k−1} if k > 0,   (15)

where ∇f is the gradient of f and β_{k} is a parameter. In general, different choices of the parameter β_{k} correspond to different conjugate gradient algorithms. Among them, the three-term PRP (Polak-Ribiere-Polyak) conjugate gradient method is one of the most efficient, where

d^{k} = −∇f(x^{k}) if k = 0;  d^{k} = −∇f(x^{k}) + β_{k}d^{k−1} − θ_{k}y^{k−1} if k > 0,   (16)

y^{k−1} = ∇f(x^{k}) − ∇f(x^{k−1}),

β_{k} = (∇f(x^{k}))^{T}y^{k−1} / ‖∇f(x^{k−1})‖^{2},

θ_{k} = (∇f(x^{k}))^{T}d^{k−1} / ‖∇f(x^{k−1})‖^{2}.

The key feature of this conjugate gradient method is that it ensures the sufficient descent property of the direction at each iteration, which plays a prominent role in the global convergence analysis. For more details, please refer to [32] and the references therein.
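A minimal sketch of the three-term PRP update (16) makes the sufficient descent property concrete. The code is our own (variable names are ours); the identity (∇f(x^{k}))^{T}d^{k} = −‖∇f(x^{k})‖^{2} follows by construction, since the β and θ terms cancel exactly in the inner product.

```python
import numpy as np

def prp_direction(g, g_prev=None, d_prev=None):
    """Three-term PRP direction; g, g_prev are gradients at x^k, x^{k-1}."""
    if g_prev is None:                 # k = 0: steepest descent
        return -g
    y = g - g_prev                     # y^{k-1}
    denom = np.dot(g_prev, g_prev)     # ||grad f(x^{k-1})||^2
    beta = np.dot(g, y) / denom
    theta = np.dot(g, d_prev) / denom
    return -g + beta * d_prev - theta * y

# Sufficient descent: g^T d = -||g||^2, regardless of the previous data.
g_prev = np.array([1.0, -2.0, 0.5])
d_prev = -g_prev
g = np.array([0.8, -1.5, 0.9])
d = prp_direction(g, g_prev, d_prev)
print(np.dot(g, d), -np.dot(g, g))     # equal up to rounding
```

This exact cancellation is what removes the usual descent-condition assumption from the line search.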

For convenience, we denote

f_{µ}(x) := λφ_{i}(µ, x) + (1/2)‖b − Ax‖^{2}_{2}, x ∈ IR^{n},   (17)

where φ_{i}(µ, x) is a smoothing function of the l_{1}-norm ‖x‖_{1}. In other words, it takes the form

φ_{i}(µ, x) := ∑_{j=1}^{n} ψ_{i}(µ, x_{j}), i = 1, 2, 3, 4, 5, 6.   (18)

Thus, the problem (4) is recast as the smoothing penalized least squares optimization problem

min_{x∈IR^{n}} f_{µ}(x),   (19)

which is our new target problem. To deal with the unconstrained minimization (19), the following iterative scheme is employed, see [32].

The Algorithm

Step 0. Given a starting point x^{0} ∈ IR^{n} and µ_{0} > 0, choose parameters ρ, δ, γ, γ̄ ∈ (0, 1). Let d^{0} = −∇f_{µ_{0}}(x^{0}). Set k = 0.

Step 1. If the termination criterion is reached, stop.

Step 2. Determine α_{k} = max{ρ^{j}, j = 0, 1, 2, . . .} satisfying

f_{µ_{k}}(x^{k} + α_{k}d^{k}) ≤ f_{µ_{k}}(x^{k}) − 2δ(1 − γ)α_{k}f_{µ_{k}}(x^{k}).

Step 3. Let x^{k+1} = x^{k} + α_{k}d^{k} and µ_{k+1} = γ̄µ_{k}. Replace f with f_{µ_{k+1}} in formula (16) and compute d^{k+1} by (16).

Step 4. Set k = k + 1 and return to Step 1.
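The steps above can be sketched as follows, here with ψ_{2} as the smoothing function and the Experiment 1 parameter values. This is our own illustrative implementation, not the authors' reference code: the stopping rule, the floor on µ, and the problem sizes are our assumptions.

```python
import numpy as np

def psi2(mu, t):
    return np.where(np.abs(t) >= mu / 2, np.abs(t), t**2 / mu + mu / 4)

def dpsi2(mu, t):
    return np.where(t >= mu / 2, 1.0, np.where(t <= -mu / 2, -1.0, 2 * t / mu))

def f_grad(x, mu, A, b, lam):
    r = A @ x - b
    return lam * np.sum(psi2(mu, x)) + 0.5 * r @ r, lam * dpsi2(mu, x) + A.T @ r

def smoothing_cg(A, b, lam, mu0=0.1, gam_bar=0.4, rho=0.5,
                 delta=0.002, gamma=0.2, max_iter=200):
    x = np.zeros(A.shape[1]); mu = mu0
    f, g = f_grad(x, mu, A, b, lam)
    d = -g                                     # Step 0
    for _ in range(max_iter):
        a = 1.0                                # Step 2: backtracking a = rho^j
        while f_grad(x + a * d, mu, A, b, lam)[0] > f - 2 * delta * (1 - gamma) * a * f:
            a *= rho
            if a < 1e-12:
                break
        x = x + a * d                          # Step 3
        mu = max(gam_bar * mu, 1e-8)           # shrink mu (floored, our choice)
        f_new, g_new = f_grad(x, mu, A, b, lam)
        y = g_new - g
        denom = g @ g
        d = -g_new + (g_new @ y / denom) * d - (g_new @ d / denom) * y   # (16)
        f, g = f_new, g_new
        if np.linalg.norm(g) < 1e-6:           # Step 1 (our criterion)
            break
    return x

rng = np.random.default_rng(0)
n, m, K = 256, 128, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); idx = rng.choice(n, K, replace=False)
x_true[idx] = rng.standard_normal(K)
b = A @ x_true
x_rec = smoothing_cg(A, b, lam=0.001 * np.max(np.abs(A.T @ b)))
rel_err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
print(rel_err)   # relative reconstruction error
```

Note that since ψ_{2}(µ, t) is nondecreasing in µ, shrinking µ in Step 3 can only decrease f_{µ}, so the accepted steps keep the objective monotone across the continuation.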

We also say a few words about how µ_{k} is updated in Step 3. For Experiment 1, µ_{k} is updated at each iteration by µ_{k+1} = γ̄µ_{k} with γ̄ ∈ (0, 1), and we select γ̄ = 0.4. For Experiments 2 and 3, an inner loop is used to solve the relaxed problem for a fixed µ_{k}, after which µ_{k} is reduced. This is a technique that appeared in NESTA. When µ_{k} is small enough, our algorithm no longer updates the parameter µ_{k}.

Lemma 3.1. Suppose that the function f_{µ}(x) is defined by (17). Then, there exists a constant L > 0 such that ‖∇f_{µ}(x) − ∇f_{µ}(y)‖ ≤ L‖x − y‖ for all x, y ∈ IR^{n}.

Proof. For any fixed µ > 0, in order to prove the Lipschitz property of ∇f_{µ}(x), we need to verify the Lipschitz condition of ψ'_{i} (i = 1, 2, 3, 4, 5, 6). To this end, we discuss two cases.

Case (i): i = 1, 3, 5, 6. For any t_{1}, t_{2} ∈ IR, without loss of generality let t_{1} < t_{2}. By the Lagrange Mean Value Theorem, we have

|ψ'_{i}(µ, t_{1}) − ψ'_{i}(µ, t_{2})| = |ψ''_{i}(µ, ξ)| |t_{1} − t_{2}|, ξ ∈ (t_{1}, t_{2}).

For the subsequent analysis, we need to estimate |ψ''_{i}(µ, ξ)| for each i = 1, 3, 5, 6.

For i = 1, we know that

|ψ''_{1}(µ, ξ)| = (1/µ)[e^{ξ/µ}/(1 + e^{ξ/µ})^{2} + e^{−ξ/µ}/(1 + e^{−ξ/µ})^{2}] < 2/µ.

For i = 3, it is clear that

|ψ''_{3}(µ, ξ)| = 4µ^{2}/(4µ^{2} + ξ^{2})^{3/2} ≤ 1/(2µ).
For i = 5, we have

ψ'_{5}(µ, t) =
  1,  if t > µ,
  −t^{3}/(2µ^{3}) + 3t/(2µ),  if −µ ≤ t ≤ µ,
  −1,  if t < −µ,

ψ''_{5}(µ, t) =
  0,  if t > µ,
  −3t^{2}/(2µ^{3}) + 3/(2µ),  if −µ ≤ t ≤ µ,
  0,  if t < −µ,

which yields

|ψ''_{5}(µ, ξ)| ≤ 3µ^{2}/(2µ^{3}) + 3/(2µ) = 3/µ.

For i = 6, we compute

ψ'_{6}(µ, t) = (2/√π) ∫_{0}^{t/(√2µ)} e^{−u^{2}} du,  ψ''_{6}(µ, t) = (√2/(√π µ)) e^{−t^{2}/(2µ^{2})},

which imply

|ψ''_{6}(µ, ξ)| ≤ √2/(√π µ).

All the aforementioned results indicate that

|ψ'_{i}(µ, t_{1}) − ψ'_{i}(µ, t_{2})| ≤ (3/µ)|t_{1} − t_{2}|, i = 1, 3, 5, 6,   (20)

for any t_{1}, t_{2} ∈ IR.

Case (ii): i = 2, 4. We now provide a version of (20) for ψ'_{2} and ψ'_{4}. For i = 2, we know that

ψ'_{2}(µ, t) =
  1,  if t ≥ µ/2,
  2t/µ,  if −µ/2 < t < µ/2,
  −1,  if t ≤ −µ/2.

If t_{1} ≥ µ/2 and t_{2} ≥ µ/2; or t_{1} ≤ −µ/2 and t_{2} ≤ −µ/2; or t_{1}, t_{2} ∈ (−µ/2, µ/2), then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| ≤ (2/µ)|t_{1} − t_{2}|.

If t_{1} ≥ µ/2 and t_{2} ≤ −µ/2, then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| = 2 = (2/µ)·µ ≤ (2/µ)|t_{1} − t_{2}|.

If t_{1} ≥ µ/2 and t_{2} ∈ (−µ/2, µ/2), then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| = 1 − 2t_{2}/µ ≤ 2t_{1}/µ − 2t_{2}/µ = (2/µ)|t_{1} − t_{2}|.

If t_{1} ≤ −µ/2 and t_{2} ∈ (−µ/2, µ/2), then

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| = 1 + 2t_{2}/µ ≤ −2t_{1}/µ + 2t_{2}/µ = (2/µ)|t_{1} − t_{2}|.

Thus, it is clear to conclude that

|ψ'_{2}(µ, t_{1}) − ψ'_{2}(µ, t_{2})| ≤ (2/µ)|t_{1} − t_{2}|, ∀ t_{1}, t_{2} ∈ IR.

For i = 4, by similar arguments to the case i = 2, it can be verified that

|ψ'_{4}(µ, t_{1}) − ψ'_{4}(µ, t_{2})| ≤ (1/µ)|t_{1} − t_{2}|, ∀ t_{1}, t_{2} ∈ IR.

In summary, we also achieve

|ψ'_{i}(µ, t_{1}) − ψ'_{i}(µ, t_{2})| ≤ (2/µ)|t_{1} − t_{2}|, i = 2, 4,   (21)

for any t_{1}, t_{2} ∈ IR.

Now, applying (20) and (21), for any x, y ∈ IR^{n}, we have

‖∇f_{µ}(x) − ∇f_{µ}(y)‖ = ‖λ[∇φ_{i}(µ, x) − ∇φ_{i}(µ, y)] + A^{T}(Ax − b) − A^{T}(Ay − b)‖
  ≤ (3/µ)λn‖x − y‖ + ‖A‖^{2}‖x − y‖
  = L‖x − y‖,

where L = (3/µ)λn + ‖A‖^{2}. Thus, the proof is complete. □
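The Lipschitz bound of Lemma 3.1 is easy to spot-check numerically. The sketch below is our own construction, using ψ_{3} (whose derivative has the closed form t/√(4µ² + t²)) and the constant L from the proof; it verifies the bound on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu, lam = 20, 50, 0.1, 0.05
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def grad_f(x):
    dphi = x / np.sqrt(4 * mu**2 + x**2)   # psi_3'(mu, x_j), componentwise
    return lam * dphi + A.T @ (A @ x - b)

# L = (3/mu)*lambda*n + ||A||^2, as in the proof of Lemma 3.1
L = (3 / mu) * lam * n + np.linalg.norm(A, 2) ** 2
for _ in range(100):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = np.linalg.norm(grad_f(x) - grad_f(y))
    assert lhs <= L * np.linalg.norm(x - y)
print("Lipschitz bound held on 100 random pairs")
```

In practice the bound is loose: for ψ_{3} the componentwise modulus 1/(2µ) already suffices, without the factor n.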

Lemma 3.2. For µ > 0, the level set L(x^{0}) = {x ∈ IR^{n} | f_{µ}(x) ≤ f_{µ}(x^{0})} is bounded.

Proof. We prove it by contradiction. Suppose that the set L(x^{0}) is unbounded. Then, there exists an index set K_{1} such that ‖x^{k}‖ → ∞ as k → ∞, k ∈ K_{1}. Recalling the definition of f_{µ}(x), we have f_{µ}(x^{k}) → ∞ as k → ∞, k ∈ K_{1}. This contradicts f_{µ}(x^{k}) ≤ f_{µ}(x^{0}). Thus, the level set L(x^{0}) is bounded. □

In fact, by the continuity of f_{µ}(x), the level set L(x^{0}) is compact. We point out that Lemma 3.1 and Lemma 3.2 play key roles in our theoretical part. They were imposed as assumptions in [32] and other literature such as [14]. Here we assert them by showing that the function (17), based on the proposed smoothing functions, satisfies these assumptions. With these properties, we are ready to present the global convergence of our algorithm.

Theorem 3.1. For any µ > 0, consider the aforementioned algorithm with any starting point x^{0}, and let {x^{k}} be the sequence generated by the algorithm. Then

lim inf_{k→∞} ‖∇f_{µ}(x^{k})‖ = 0.   (22)

Proof. Suppose that the conclusion (22) is not true. Then, there exists a constant ε_{0} > 0 such that

‖∇f_{µ}(x^{k})‖ ≥ ε_{0}, ∀k.   (23)

Since (∇f_{µ}(x^{k}))^{T}d^{k} = −‖∇f_{µ}(x^{k})‖^{2}, there exists α_{k} > 0 such that

f_{µ}(x^{k} + α_{k}d^{k}) ≤ f_{µ}(x^{k}) − 2δ(1 − γ)α_{k}f_{µ}(x^{k}).   (24)

This means {f_{µ}(x^{k})} is decreasing and bounded, which implies that x^{k} ∈ L(x^{0}) and {f_{µ}(x^{k})} is convergent. We denote lim_{k→∞} f_{µ}(x^{k}) = f^{∗}. From (24), it can be verified that lim_{k→∞} α_{k}f_{µ}(x^{k}) = 0. By the definition of f_{µ}(x), we know f^{∗} > 0, and therefore

0 < α_{k} = α_{k}f_{µ}(x^{k})/f_{µ}(x^{k}) ≤ α_{k}f_{µ}(x^{k})/f^{∗}.

To sum up, we obtain

lim_{k→∞} α_{k} = 0.   (25)

Now, applying Lemma 3.2, there is a constant r > 0 such that

‖∇f_{µ}(x^{k})‖ ≤ r, ∀k.   (26)

Next, we prove that {‖d^{k}‖} is bounded by a contradiction argument. If {‖d^{k}‖} is unbounded, then there exists an index set K_{2} such that ‖d^{k}‖ → ∞ as k → ∞, k ∈ K_{2}. Let θ_{k} be the angle between −∇f_{µ}(x^{k}) and d^{k}. Then, we see that

cos θ_{k} = −(∇f_{µ}(x^{k}))^{T}d^{k} / (‖∇f_{µ}(x^{k})‖‖d^{k}‖) = ‖∇f_{µ}(x^{k})‖/‖d^{k}‖.   (27)

This relation enables us to apply ε_{0} ≤ ‖∇f_{µ}(x^{k})‖ ≤ r to conclude cos θ_{k} → 0 as k → ∞, k ∈ K_{2}, which means θ_{k} → π/2 as k → ∞, k ∈ K_{2}. Considering this geometric relationship, we have (∇f_{µ}(x^{k}))^{T}d^{k} → 0, k ∈ K_{2}, k → ∞, from which we find

−‖∇f_{µ}(x^{k})‖^{2} = (∇f_{µ}(x^{k}))^{T}d^{k} → 0, k ∈ K_{2}, k → ∞,

which contradicts (23). Thus, {‖d^{k}‖} is bounded, i.e., there exists a constant M^{∗} > ε_{0} such that

‖d^{k}‖ ≤ M^{∗}, ∀k.   (28)

Then, combining (25) with (28) gives

lim_{k→∞} α_{k}‖d^{k}‖ = 0.   (29)

In addition, from (27), we have cos θ_{k} ≥ ε_{0}/M^{∗}, which further yields

−(∇f_{µ}(x^{k}))^{T}d^{k} = ‖∇f_{µ}(x^{k})‖‖d^{k}‖ cos θ_{k} ≥ (ε^{2}_{0}/M^{∗})‖d^{k}‖.   (30)

Applying the mean value theorem, we obtain

f_{µ}(x^{k} + α_{k}d^{k}) = f_{µ}(x^{k}) + α_{k}(∇f_{µ}(ξ^{k}))^{T}d^{k}
  = f_{µ}(x^{k}) + α_{k}(∇f_{µ}(x^{k}))^{T}d^{k} + α_{k}(∇f_{µ}(ξ^{k}) − ∇f_{µ}(x^{k}))^{T}d^{k}
  ≤ f_{µ}(x^{k}) + α_{k}‖d^{k}‖[(∇f_{µ}(x^{k}))^{T}d^{k}/‖d^{k}‖ + ‖∇f_{µ}(ξ^{k}) − ∇f_{µ}(x^{k})‖],   (31)

where ξ^{k} lies between x^{k} and x^{k} + α_{k}d^{k}. Lemma 3.1 and the compactness of the level set L(x^{0}) imply that ∇f_{µ}(x) is uniformly continuous on L(x^{0}). This, together with (29), implies that there exists a constant α̂ > 0 such that, for sufficiently large k, α_{k}‖d^{k}‖ < α̂ and

‖∇f_{µ}(ξ^{k}) − ∇f_{µ}(x^{k})‖ < (1/2)(ε^{2}_{0}/M^{∗}).   (32)

Then, when k is sufficiently large, from (30), (31) and (32), we have

f_{µ}(x^{k} + α_{k}d^{k}) ≤ f_{µ}(x^{k}) + α̂[−ε^{2}_{0}/M^{∗} + (1/2)(ε^{2}_{0}/M^{∗})] = f_{µ}(x^{k}) − α̂ε^{2}_{0}/(2M^{∗}),   (33)

which contradicts f_{µ}(x^{k+1}) − f_{µ}(x^{k}) → 0 as k → ∞. Therefore, there must hold

lim inf_{k→∞} ‖∇f_{µ}(x^{k})‖ = 0. □

### 4 Numerical Experiments

In this section, we conduct numerical experiments to show the performance of our algorithm for signal recovery. All tests are run in MATLAB R2016 on a 64-bit PC with an Intel(R) Core(TM) i7-6500U 2.50 GHz CPU and 8.00 GB of RAM, running the Windows 7 operating system.

Experiment 1

In what follows, we divide our tests into two groups, the noise-free case and the noisy case, and report the numerical results of our algorithm for signal recovery. In all of our simulations, A ∈ IR^{m×n} is a Gaussian matrix with m = n/2 or n/4, and x^{∗} ∈ IR^{n} is a K-sparse original signal with K = n/40, whose nonzero components follow the normal distribution N(0, 1). Moreover, we set b = Ax + e in the noisy case, where e is Gaussian noise with zero mean and variance σ^{2}.

The parameters of our algorithm are summarized as follows:

x^{0} = zeros(n, 1), λ = 0.001‖A^{T}b‖_{∞}, ρ = 0.5, δ = 0.002, γ = 0.2, σ^{2} = 0.0001.

In Tables 1 and 2, we set µ_{0} = 0.1 and µ_{k+1} = 0.4µ_{k}, and the algorithm terminates when the relative error satisfies

‖x − x^{∗}‖/‖x^{∗}‖ < 4 × 10^{−3},

where x is the reconstructed signal. In Tables 3 and 4, F denotes the smoothing function, CPU the cpu time, Re the relative error, and It the number of iterations; we set µ = 1.0e−2, 1.0e−3, 1.0e−4, 1.0e−5, and the algorithm terminates when

|f_{µ}(x^{k+1}) − f_{µ}(x^{k})| / |f_{µ}(x^{k+1})| < 1.0e−12.

Note that the result from each test is obtained by running our algorithm 10 times and taking its average result.
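The test-problem setup just described can be sketched as follows (our own code; the function names, the seed, and the default ratios are illustrative):

```python
import numpy as np

def make_problem(n, m_ratio=2, sparsity_ratio=40, sigma2=0.0001, noisy=False,
                 rng=np.random.default_rng(0)):
    """Gaussian A in IR^{m x n}, K-sparse x* with N(0,1) nonzeros, b = A x* (+e)."""
    m, K = n // m_ratio, n // sparsity_ratio
    A = rng.standard_normal((m, n))
    x_star = np.zeros(n)
    support = rng.choice(n, K, replace=False)
    x_star[support] = rng.standard_normal(K)
    b = A @ x_star
    if noisy:
        b = b + np.sqrt(sigma2) * rng.standard_normal(m)   # zero-mean Gaussian noise
    return A, b, x_star

def rel_error(x, x_star):
    return np.linalg.norm(x - x_star) / np.linalg.norm(x_star)

A, b, x_star = make_problem(2000)
print(A.shape, np.count_nonzero(x_star))   # (1000, 2000) 50
```

With n = 2000 and m = n/2, the generator reproduces the smallest configuration of Tables 1-2 (m = 1000 measurements, K = 50 nonzeros).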

In our experiments, we found that ψ_{1} is very unstable; sometimes it is not applicable to the test problems. A similar concern was also addressed in [13, page 135]. Accordingly, we tried to adopt the equivalent expression suggested in [13, formula (3.1)] and rewrite ψ_{1}(t) as below:

ψ_{1}(t) = µ[ln(e^{λ_{1}(A(t))/µ} + e^{λ_{2}(A(t))/µ}) + ln(e^{λ_{1}(B(t))/µ} + e^{λ_{2}(B(t))/µ})]
  = λ_{1}(A(t)) + µ ln(1 + e^{(λ_{2}(A(t))−λ_{1}(A(t)))/µ}) + λ_{1}(B(t)) + µ ln(1 + e^{(λ_{2}(B(t))−λ_{1}(B(t)))/µ}),

where

A(t) = diag(0, −t), B(t) = diag(0, t),

λ_{1}(A(t)) = 0, λ_{2}(A(t)) = −t, λ_{1}(B(t)) = 0, λ_{2}(B(t)) = t.

Nonetheless, ψ_{1} still does not work well with the algorithm. We therefore omit it in our numerical reports.

| n | m | CPU ψ_{2} | CPU ψ_{3} | CPU ψ_{4} | CPU ψ_{5} | CPU ψ_{6} | It ψ_{2} | It ψ_{3} | It ψ_{4} | It ψ_{5} | It ψ_{6} |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | n/2 | 1.21 | 0.94 | 1.32 | 1.27 | 0.92 | 72 | 74 | 75 | 73 | 69 |
| 2000 | n/4 | 1.37 | 0.90 | 1.45 | 1.59 | 1.00 | 129 | 135 | 142 | 148 | 147 |
| 4000 | n/2 | 4.16 | 3.83 | 5.01 | 4.50 | 3.80 | 70 | 70 | 78 | 72 | 68 |
| 4000 | n/4 | 4.68 | 4.12 | 5.19 | 5.58 | 4.50 | 131 | 143 | 143 | 148 | 146 |
| 6000 | n/2 | 9.54 | 8.49 | 9.80 | 9.22 | 8.61 | 70 | 70 | 70 | 69 | 68 |
| 6000 | n/4 | 9.69 | 10.35 | 10.71 | 12.91 | 10.54 | 134 | 163 | 145 | 159 | 164 |
| 8000 | n/2 | 16.93 | 16.25 | 18.76 | 18.91 | 15.55 | 67 | 71 | 74 | 78 | 69 |
| 8000 | n/4 | 15.76 | 15.65 | 18.65 | 19.49 | 15.75 | 126 | 137 | 145 | 144 | 138 |
| 10000 | n/2 | 25.88 | 24.06 | 30.07 | 25.83 | 24.21 | 67 | 67 | 76 | 68 | 68 |
| 10000 | n/4 | 27.94 | 28.28 | 29.96 | 34.48 | 27.60 | 139 | 138 | 151 | 160 | 149 |

Table 1: Comparisons of five smoothing functions for the noise-free case (left block: CPU time; right block: iterations)

| n | m | CPU ψ_{2} | CPU ψ_{3} | CPU ψ_{4} | CPU ψ_{5} | CPU ψ_{6} | It ψ_{2} | It ψ_{3} | It ψ_{4} | It ψ_{5} | It ψ_{6} |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | n/2 | 1.27 | 1.00 | 1.30 | 1.33 | 1.00 | 73 | 78 | 75 | 75 | 72 |
| 2000 | n/4 | 1.54 | 0.92 | 1.64 | 1.55 | 0.91 | 145 | 151 | 154 | 150 | 144 |
| 4000 | n/2 | 4.48 | 4.47 | 4.89 | 4.62 | 4.36 | 72 | 80 | 75 | 72 | 76 |
| 4000 | n/4 | 4.94 | 5.16 | 5.77 | 5.09 | 4.65 | 137 | 159 | 154 | 143 | 158 |
| 6000 | n/2 | 8.81 | 9.07 | 9.80 | 9.91 | 8.97 | 65 | 72 | 72 | 67 | 69 |
| 6000 | n/4 | 11.65 | 11.61 | 12.10 | 12.38 | 10.12 | 160 | 167 | 152 | 165 | 154 |
| 8000 | n/2 | 16.27 | 17.05 | 18.28 | 18.53 | 16.69 | 69 | 75 | 76 | 74 | 71 |
| 8000 | n/4 | 15.76 | 18.04 | 19.27 | 17.99 | 16.91 | 132 | 155 | 145 | 145 | 143 |
| 10000 | n/2 | 25.59 | 25.58 | 26.04 | 25.82 | 25.38 | 68 | 71 | 67 | 67 | 68 |
| 10000 | n/4 | 25.82 | 27.23 | 37.00 | 29.52 | 27.73 | 136 | 150 | 175 | 150 | 155 |

Table 2: Comparisons of five smoothing functions for the noisy case (left block: CPU time; right block: iterations)

Figure 1: The convergence behavior: cpu time vs. signal size (n) when µ_{k+1} = 0.4µ_{k}

Figure 2: The convergence behavior: iterations vs. signal size (n) when µ_{k+1} = 0.4µ_{k}

Columns are grouped by µ = 1.0e−2, 1.0e−3, 1.0e−4, 1.0e−5; each group reports CPU, Re, It.

| F | m | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | n/2 | 8.80 | 0.0158 | 92 | 9.34 | 0.0040 | 89 | 11.03 | 0.0033 | 100 | 16.78 | 0.0025 | 141 |
| ψ_{2} | n/4 | 10.98 | 0.0365 | 172 | 11.49 | 0.0064 | 178 | 11.83 | 0.0029 | 184 | 17.06 | 0.0028 | 237 |
| ψ_{3} | n/2 | 9.44 | 0.0567 | 104 | 8.14 | 0.0090 | 86 | 7.98 | 0.0032 | 86 | 10.93 | 0.0025 | 111 |
| ψ_{3} | n/4 | 7.44 | 0.1532 | 167 | 9.10 | 0.0195 | 179 | 8.36 | 0.0045 | 166 | 10.34 | 0.0030 | 188 |
| ψ_{4} | n/2 | 8.22 | 0.0538 | 71 | 7.92 | 0.0059 | 73 | 10.58 | 0.0027 | 93 | 12.01 | 0.0025 | 104 |
| ψ_{4} | n/4 | 9.13 | 0.0157 | 139 | 11.11 | 0.0154 | 155 | 10.69 | 0.0035 | 179 | 13.43 | 0.0032 | 194 |
| ψ_{5} | n/2 | 7.88 | 0.0419 | 68 | 7.59 | 0.0055 | 72 | 10.14 | 0.0025 | 91 | 12.77 | 0.0025 | 109 |
| ψ_{5} | n/4 | 10.26 | 0.1895 | 132 | 11.09 | 0.0117 | 156 | 10.51 | 0.0041 | 169 | 12.83 | 0.0034 | 193 |
| ψ_{6} | n/2 | 8.95 | 0.0360 | 96 | 7.94 | 0.0063 | 84 | 9.69 | 0.0027 | 99 | 12.57 | 0.0023 | 121 |
| ψ_{6} | n/4 | 9.17 | 0.0926 | 166 | 9.16 | 0.0128 | 173 | 8.98 | 0.0038 | 174 | 11.75 | 0.0030 | 203 |

Table 3: Comparisons of five smoothing functions for different µ when n = 5000, noise-free case

From the experiments, we have the following observations:

(1) The sparse original signal can be recovered effectively by our algorithm. Tables 1-4 and Figures 1-5 illustrate that the five smoothing functions (all except ψ_{1}) work quite well in our algorithm, whether the given problem is noise-free or noisy.

(2) The convergence behavior with respect to CPU time, iterations, signal size, and relative error is depicted in Figures 1-5, respectively. From Tables 1-2 and Figures 1-2, we see that our algorithm costs more cpu time as the dimension n increases, while the changes in iteration counts are very marginal. On the whole, the performance order can be summarized as

ψ_{2} ≥ ψ_{4} ≈ ψ_{5} ≈ ψ_{6} > ψ_{3},

where "≥" means "performs better than" and "≈" means "no difference in numerical performance".

Columns are grouped by µ = 1.0e−2, 1.0e−3, 1.0e−4, 1.0e−5; each group reports CPU, Re, It.

| F | m | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | n/2 | 8.62 | 0.0159 | 89 | 8.77 | 0.0072 | 88 | 11.82 | 0.0026 | 100 | 15.80 | 0.0025 | 143 |
| ψ_{2} | n/4 | 11.67 | 0.0379 | 174 | 10.40 | 0.0063 | 180 | 10.26 | 0.0033 | 188 | 15.03 | 0.0033 | 242 |
| ψ_{3} | n/2 | 8.54 | 0.0548 | 100 | 7.86 | 0.0085 | 88 | 8.75 | 0.0033 | 92 | 10.38 | 0.0025 | 109 |
| ψ_{3} | n/4 | 7.43 | 0.1506 | 169 | 8.43 | 0.0185 | 169 | 7.78 | 0.0045 | 177 | 9.82 | 0.0030 | 203 |
| ψ_{4} | n/2 | 6.76 | 0.0491 | 64 | 8.11 | 0.0058 | 76 | 11.40 | 0.0028 | 95 | 11.93 | 0.0026 | 105 |
| ψ_{4} | n/4 | 11.10 | 0.1439 | 154 | 8.84 | 0.0137 | 152 | 9.67 | 0.0035 | 175 | 11.52 | 0.0035 | 184 |
| ψ_{5} | n/2 | 7.36 | 0.0542 | 65 | 7.82 | 0.0059 | 75 | 9.57 | 0.0031 | 87 | 11.33 | 0.0025 | 104 |
| ψ_{5} | n/4 | 10.19 | 0.1492 | 143 | 10.00 | 0.0123 | 154 | 10.48 | 0.0039 | 177 | 11.37 | 0.0034 | 191 |
| ψ_{6} | n/2 | 8.54 | 0.0355 | 94 | 8.78 | 0.0061 | 92 | 9.05 | 0.0029 | 96 | 11.57 | 0.0026 | 119 |
| ψ_{6} | n/4 | 7.77 | 0.0908 | 160 | 9.15 | 0.0122 | 180 | 8.11 | 0.0040 | 167 | 10.50 | 0.0030 | 207 |

Table 4: Comparisons of five smoothing functions for different µ when n = 5000, noisy case

Figure 3: The convergence behavior: cpu time vs. −log µ with n = 5000

Figure 4: The convergence behavior: iterations vs. −log µ with n = 5000

Figure 5: The convergence behavior: relative error vs. −log µ with n = 5000

(3) According to Tables 3-4 and Figures 3-5, the smaller the parameter µ, the better the signal recovery, and the more cpu time and iterations are spent accordingly. For any fixed µ, although the function ψ_{2} sometimes takes a bit more cpu time and iterations, it has a lower relative error. In view of this, we conclude that the function ψ_{2} works best with the proposed algorithm. From Figure 5, the function ψ_{6} also has a lower relative error, which means ψ_{6} is a possible good choice as well. To sum up, in view of relative error, we have

ψ_{2} ≥ ψ_{4} ≈ ψ_{5} ≈ ψ_{6} > ψ_{3}.
Experiment 2

Next, we compare with "NESTA", a well-known fast open-source solver proposed in [2]. In the noisy case, b = Ax + e, where e is Gaussian noise with zero mean and variance σ^{2}. In all of our simulations, A ∈ IR^{m×n} is a Gaussian matrix with n = 2^{13} and m = n/4. We select a K-sparse original signal x^{∗} ∈ IR^{n} with K = n/40, whose nonzero components follow the normal distribution N(0, 1). In our test, we set

x^{0} = A^{T}b, λ = 0.0048‖A^{T}b‖_{∞}, ρ = 0.5, δ = 0.002, γ = 0.2, σ^{2} = 0.0001.

We also accelerate our algorithm by using

µ_{k+1} = ωµ_{k}, 1 ≤ k ≤ t, ω = (µ_{t}/µ_{0})^{1/t}, µ_{t} = 10^{−8}, µ_{0} = 10^{−2}, and t = 4.

The algorithm terminates when

|f_{µ}(x^{k}) − f̄_{µ}(x^{k})| / f̄_{µ}(x^{k}) < tol, where f̄_{µ}(x^{k}) = (1/min{5, k}) ∑_{l=1}^{min{5,k}} f_{µ}(x^{k−l}).

In our experiments, the accuracy tol ∈ {10^{−4}, 10^{−5}, 10^{−6}, 10^{−7}, 10^{−8}}. We test the continuation version of NESTA for comparison. In testing NESTA, all other parameters are taken as default except for the above parameters.
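The geometric continuation schedule for µ can be computed directly; a small sketch with the stated values:

```python
# omega = (mu_t / mu_0)^(1/t) shrinks mu from mu_0 to mu_t in exactly t steps.
mu0, mu_t, t = 1e-2, 1e-8, 4
omega = (mu_t / mu0) ** (1.0 / t)
mus = [mu0 * omega**k for k in range(t + 1)]
print(omega)      # 0.0316227... = 10**(-1.5)
print(mus[-1])    # 1e-08 (up to rounding)
```

Each step divides µ by roughly 31.6, which is far more aggressive than the fixed γ̄ = 0.4 of Experiment 1 but is applied only t = 4 times.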

From the comparison experiments, we have the following observations:

(1) In view of CPU time, NESTA is really fast. However, from Tables 5-6 and Figure 8, as the accuracy tol gets smaller, the CPU time of our algorithm is not affected much, whereas the CPU time of NESTA goes up dramatically.

(2) From Tables 5-6 and Figure 7, we see that our algorithm has a small relative error even when the accuracy tol is relatively large. In particular, when tol = 10^{−4} and tol = 10^{−5}, the relative error of NESTA becomes large, which indicates that the signal has not been recovered well. By contrast, the function ψ_{2} keeps a small error, which means it has better performance.

(3) From Tables 5-6 and Figure 6, NESTA needs more iterations. In view of this and the relative error, our proposed algorithm is competitive.

Figure 6: The convergence behavior: iterations vs. accuracy

Figure 7: The convergence behavior: relative error vs. accuracy

Figure 8: The convergence behavior: CPU time vs. accuracy

Experiment 3

In order to observe the efficiency of sparse recovery, we compare our algorithm with three other algorithms, NESTA, FPCBB and FISTA, where a Bernoulli matrix, a partial Hadamard matrix and a Gaussian matrix are chosen. In our implementations, we select n = 2^{14}, m = n/2 and sparsity K = n/40. The parameters µ_{k}, δ and γ are the same as in Experiment 2. See Figures 9-11.

The result from each test is obtained by running our algorithm 30 times and taking the average. Figures 9-11 show the frequency of successful reconstruction at different relative errors. We can see that our algorithm gives relatively better behavior when the relative error is small enough, which means that our algorithm has slightly higher efficiency on sparse recovery. To sum up, we obtain

ψ_{2} ≥ ψ_{4} ≈ ψ_{5} ≈ ψ_{6} > ψ_{3}.

Figure 9: Comparisons among algorithms with Bernoulli matrix

Figure 10: Comparisons among algorithms with Gaussian matrix

Figure 11: Comparisons among algorithms with Partial Hadamard matrix

### 5 Conclusions

In this paper, the sparse signal recovery problem is investigated, with focus on the l_{1}-norm model. In light of the convolution techniques in [23, 27] and the framework of the nonlinear conjugate gradient method [32], we propose a unified smoothing approach to solve the given sparse signal reconstruction problem. As a byproduct, the classical l_{1}-norm is approximated by six smoothing functions. Numerical results show that ψ_{2} and ψ_{6} are the better choices of smoothing function to work with the algorithm. Although the theoretical part is not very notable, the contribution of this paper lies in the numerical comparisons among new smoothing functions. In particular, we suggest two nice smoothing functions to work along with the nonlinear conjugate gradient method. In addition, we compare our algorithm with three other well-known open-source algorithms, NESTA, FPCBB and FISTA, where a Bernoulli matrix, a partial Hadamard matrix and a Gaussian matrix are chosen. Our algorithm gives relatively better behavior when the relative error is small enough. In this sense, we provide a simple and effective new approach to the recovery of the original sparse signal.

Our attention was drawn to [14] by one reviewer. In [14], the authors proposed a smoothing conjugate gradient method, and then used this algorithm to solve the image restoration problem. More specifically, it used a smoothing function as below:

$$
s_{\mu}(t) =
\begin{cases}
|t| & \text{if } |t| > \dfrac{\mu}{2}, \\[6pt]
\dfrac{t^{2}}{\mu} + \dfrac{\mu}{4} & \text{if } |t| \le \dfrac{\mu}{2}.
\end{cases}
$$

This smoothing function is exactly the ψ_{2}(µ, t) function in our paper. Our contribution lies in showing that this smoothing function is also a good choice for the signal reconstruction problem, compared with the other smoothing functions, which are not investigated in [14].
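As a quick sanity check (our own illustration, not code from [14]), this function coincides with |t| outside the interval [-µ/2, µ/2] and overestimates |t| by at most µ/4, so it converges uniformly to the absolute value as µ → 0:

```python
import numpy as np

def s_mu(t, mu):
    # the Huber-type smoothing function of |t| discussed above
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) > mu / 2, np.abs(t), t**2 / mu + mu / 4)

mu = 0.1  # illustrative smoothing parameter
t = np.linspace(-1.0, 1.0, 2001)
gap = s_mu(t, mu) - np.abs(t)  # equals (|t| - mu/2)^2 / mu on [-mu/2, mu/2], else 0
print(gap.min() >= 0, gap.max() <= mu / 4 + 1e-12)  # True True
```

The two pieces also match in value and slope at |t| = µ/2 (both equal µ/2 with derivative ±1), which is why s_µ is continuously differentiable.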

Acknowledgements. The authors would like to thank the anonymous referee and the editor for their valuable comments, viewpoints, and suggestions, which helped improve the manuscript considerably.

### References

[1] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol. 2, pp. 183-202, 2009.

[2] S. Becker, J. Bobin, and E. Candes, NESTA: a fast and accurate first-order method for sparse recovery, SIAM Journal on Imaging Sciences, vol. 4, pp. 1-39, 2011.

[3] E. Berg and M.P. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM Journal on Scientific Computing, vol. 31, pp. 890-912, 2008.

[4] J.M. Bioucas-Dias and M. Figueiredo, A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Transactions on Image Processing, vol. 16, pp. 2992-3004, 2007.

[5] R.L. Broughton, I.D. Coope, P.F. Renaud, and R.E.H. Tappenden, A box constrained gradient projection algorithm for compressed sensing, Signal Processing, vol. 91, pp. 1985-1992, 2011.

[6] E.J. Candes, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589-592, 2008.

[7] E.J. Candes and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, 2005.

[8] E.J. Candes and T. Tao, Near optimal signal recovery from random projections: universal encoding strategies?, IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, 2006.

[9] E.J. Candes, J.K. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, 2006.

[10] E.J. Candes, J.K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, 2006.

[11] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications, vol. 5, pp. 97-138, 1996.

[12] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Mathematical Programming, vol. 134, pp. 71-99, 2012.

[13] X. Chen, R.S. Womersley, and J. Ye, Minimizing the condition number of a Gram matrix, SIAM Journal on Optimization, vol. 21, no. 1, pp. 127-148, 2011.

[14] X. Chen and W. Zhou, Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 765-790, 2010.

[15] C. De Mol and M. Defrise, A note on wavelet-based inversion algorithms, Contemporary Mathematics, vol. 313, pp. 85-96, 2002.

[16] D.L. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.

[17] M. Elad, Why simple shrinkage is still relevant for redundant representations?, IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5559-5569, 2006.

[18] M. Figueiredo, R. Nowak, and S.J. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing, vol. 1, pp. 586-597, 2007.

[19] E.T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM Journal on Optimization, vol. 19, pp. 1107-1130, 2008.

[20] E.T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation applied to compressed sensing: implementation and numerical experiments, Journal of Computational Mathematics, vol. 28, pp. 170-194, 2010.

[21] R. Nowak and M. Figueiredo, Fast wavelet-based image deconvolution using the EM algorithm, in Proceedings of the 35th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 371-375, 2001.

[22] L. Qi and D. Sun, Smoothing functions and smoothing Newton method for complementarity and variational inequality problems, Journal of Optimization Theory and Applications, vol. 113, pp. 121-147, 2001.

[23] B. Saheya, C.H. Yu, and J.-S. Chen, Numerical comparisons based on four smoothing functions for absolute value equation, Journal of Applied Mathematics and Computing, vol. 56, no. 1-2, pp. 131-149, 2018.

[24] J.L. Starck, E. Candes, and D. Donoho, Astronomical image representation by the curvelet transform, Astronomy and Astrophysics, vol. 398, pp. 785-800, 2003.

[25] J.L. Starck, M. Nguyen, and F. Murtagh, Wavelets and curvelets for image deconvolution: a combined approach, Signal Processing, vol. 83, pp. 2279-2283, 2003.

[26] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, vol. 58, pp. 267-288, 1996.

[27] S. Voronin, G. Ozkaya, and D. Yoshida, Convolution based smooth approximations to the absolute value function with application to non-smooth regularization, arXiv:1408.6795v2 [math.NA], July 2015.

[28] X. Wang, F. Liu, L.C. Jiao, J. Wu, and J. Chen, Incomplete variables truncated conjugate gradient method for signal reconstruction in compressed sensing, Information Sciences, vol. 288, pp. 387-411, 2014.

[29] S. Wright, R. Nowak, and M. Figueiredo, Sparse reconstruction by separable approximation, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, October 2008.

[30] J. Yang and Y. Zhang, Alternating direction algorithms for l1-problems in com- pressive sensing, SIAM Journal on Scientific Computing, vol. 33, pp. 250-278, 2011.

[31] K. Yin, Y.H. Xiao, and M.L. Zhang, Nonlinear conjugate gradient method for l1-norm regularization problems in compressive sensing, Journal of Computational Information Systems, vol. 7, pp. 880-885, 2011.

[32] L. Zhang, W. Zhou, and D. Li, A descent modified Polak-Ribiere-Polyak conju- gate gradient method and its global convergence, IMA Journal of Numerical Analysis, vol. 26, pp. 629-640, 2006.

[33] H. Zhu, Y.H. Xiao, and S.Y. Wu, Large sparse signal recovery by conjugate gradient algorithm based on smoothing technique, Computers and Mathematics with Applications, vol. 66, pp. 24-32, 2013.

| F | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | 23.77 | 9.62E-04 | 150 | 28.30 | 1.98E-05 | 177 | 35.65 | 2.11E-05 | 213 | 46.62 | 1.91E-05 | 256 | 59.97 | 1.58E-05 | 342 |
| ψ_{3} | 21.82 | 6.37E-03 | 142 | 24.55 | 3.03E-05 | 171 | 30.88 | 2.29E-05 | 206 | 36.54 | 2.01E-05 | 241 | 40.39 | 1.76E-05 | 286 |
| ψ_{4} | 26.64 | 7.00E-03 | 146 | 27.03 | 4.32E-05 | 169 | 36.79 | 2.21E-05 | 215 | 45.12 | 2.04E-05 | 258 | 59.34 | 1.76E-05 | 338 |
| ψ_{5} | 26.32 | 2.21E-03 | 157 | 28.04 | 2.77E-05 | 174 | 34.73 | 2.12E-05 | 209 | 45.38 | 2.14E-05 | 256 | 49.89 | 1.88E-05 | 296 |
| ψ_{6} | 19.53 | 4.82E-03 | 138 | 24.83 | 3.01E-05 | 172 | 32.41 | 2.34E-05 | 214 | 38.80 | 2.06E-05 | 245 | 50.09 | 1.68E-05 | 301 |
| NESTA | 7.96 | 2.28E-02 | 555 | 8.30 | 2.95E-02 | 568 | 17.12 | 3.26E-04 | 1134 | 15.98 | 1.82E-05 | 1060 | 44.50 | 1.74E-05 | 2999 |

Table 5: Comparisons of five smoothing functions and NESTA for different tol in the noise-free case. Column groups, from left to right, correspond to tol = 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8; each group reports CPU time (CPU), relative error (Re), and iteration count (It).

| F | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ψ_{2} | 23.25 | 2.49E-03 | 146 | 26.10 | 7.37E-04 | 165 | 36.67 | 5.91E-04 | 222 | 62.27 | 5.80E-04 | 332 | 104.44 | 5.95E-04 | 542 |
| ψ_{3} | 19.53 | 5.57E-03 | 138 | 19.02 | 4.24E-03 | 136 | 31.62 | 5.83E-04 | 215 | 53.04 | 5.99E-04 | 332 | 80.03 | 5.83E-04 | 459 |
| ψ_{4} | 25.11 | 3.52E-03 | 162 | 23.84 | 8.90E-03 | 153 | 37.37 | 5.39E-04 | 224 | 56.38 | 5.65E-04 | 314 | 104.55 | 5.68E-04 | 520 |
| ψ_{5} | 24.72 | 5.91E-03 | 161 | 26.40 | 7.22E-04 | 164 | 35.52 | 6.31E-04 | 214 | 57.01 | 5.75E-04 | 317 | 83.13 | 5.83E-04 | 429 |
| ψ_{6} | 20.45 | 1.23E-02 | 148 | 22.16 | 6.80E-04 | 156 | 33.50 | 5.63E-04 | 222 | 56.77 | 5.85E-04 | 308 | 94.43 | 5.75E-04 | 507 |
| NESTA | 7.67 | 2.13E-02 | 525 | 8.28 | 1.10E-02 | 549 | 15.81 | 7.85E-04 | 1086 | 16.45 | 7.47E-04 | 1073 | 85.89 | 6.16E-04 | 5886 |

Table 6: Comparisons of five smoothing functions and NESTA for different tol in the noisy case. Column groups, from left to right, correspond to tol = 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8; each group reports CPU time (CPU), relative error (Re), and iteration count (It).