https://doi.org/10.1007/s10092-019-0340-5

**Signal reconstruction by conjugate gradient algorithm based on smoothing $l_1$-norm**

**Caiying Wu**^{1} · **Jiaming Zhan**^{1} · **Yue Lu**^{2} · **Jein-Shan Chen**^{3}

Received: 12 January 2019 / Accepted: 9 October 2019

© Istituto di Informatica e Telematica (IIT) 2019

**Abstract**

The $l_1$-norm regularized minimization problem is a non-differentiable problem with a wide range of applications in the field of compressive sensing. Many approaches have been proposed in the literature. Among them, smoothing the $l_1$-norm is one of the effective approaches. This paper follows this path, in which we adopt six smoothing functions to approximate the $l_1$-norm. We then recast the signal recovery problem as a smoothing penalized least squares optimization problem and apply the nonlinear conjugate gradient method to solve the smoothing model. The algorithm is shown to be globally convergent. In addition, the simulation results not only suggest some nice smoothing functions, but also show that the proposed algorithm is competitive in view of relative error.

**Keywords** $l_1$-norm regularization · Compressive sensing · Conjugate gradient algorithm · Smoothing function

**Mathematics Subject Classification 90C33**

Caiying Wu: The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016)

Jiaming Zhan: The research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016)

Yue Lu: The research is supported by the National Natural Science Foundation of China (Grant No. 11601389), the Doctoral Foundation of Tianjin Normal University (Grant No. 52XB1513), the 2017-Outstanding Young Innovation Team Cultivation Program of Tianjin Normal University (Grant No. 135202TD1703), and the Program for Innovative Research Team in Universities of Tianjin (Grant No. TD13-5078).

Jein-Shan Chen: The research is supported by Ministry of Science and Technology, Taiwan.

Corresponding author: Jein-Shan Chen, jschen@math.ntnu.edu.tw. Extended author information available on the last page of the article.

**1 Introduction**

In this paper, we focus on the topic of reconstructing signals, which is one of the most important applications in compressive sensing [8,9,16]. There exist numerous textbooks and articles related to this topic, so there is no need to repeat its importance and applications here. Therefore, we turn directly to its mathematical model and convey our new idea for tackling this problem. Mathematically, the noise-free signal reconstruction model is described as follows:

$$\min_x \ \|x\|_0 \quad \text{s.t.} \quad b = Ax, \tag{1}$$

where $x \in \mathbb{R}^n$ is the original sparse signal that needs to be recovered, $A \in \mathbb{R}^{m \times n}$ ($m \ll n$) is the measurement matrix, $b \in \mathbb{R}^m$ is the observation vector, and $\|x\|_0$ represents the $l_0$-norm of $x$, defined as the number of nonzero components of $x$. Without loss of generality, we assume that there exists a positive constant $K \le n$ such that $\|x\|_0 = K$. Besides this condition, if the measurement matrix $A$ satisfies the Restricted Isometry Property (RIP) of order $K$, then we can recover the signal $x$ more accurately through the model (1); see [6–10] for more details.

Unfortunately, the $l_0$-norm minimization problem (1) is an NP-hard combinatorial optimization problem. In order to avoid this difficulty, researchers replace the $l_0$-norm by the $l_1$-norm in model (1) and obtain the following $l_1$ minimization problem

$$\min_x \ \|x\|_1 \quad \text{s.t.} \quad b = Ax, \tag{2}$$

where $\|x\|_1$ denotes the $l_1$-norm of $x$ and $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. Under the RIP condition, the model (2) has the same solution as (1). In practice, the probed signal $b$ is usually impacted by noise, which motivates investigation of the noisy signal reconstruction problem:

$$\min_x \ \|x\|_1 \quad \text{s.t.} \quad b = Ax + e, \tag{3}$$

where $e \in \mathbb{R}^m$ denotes the noise. In order to deal with (3), researchers prefer to consider the associated penalized least squares optimization problem

$$\min_x \ \lambda \|x\|_1 + \frac{1}{2}\|b - Ax\|_2^2, \tag{4}$$

where $\lambda > 0$ is the penalty factor. In the sequel, we call (4) the $l_1$-norm regularized problem.

To date, there are plenty of numerical algorithms for solving the model (4), and some first-order algorithms have drawn much attention during the past decades. Due to the huge amount of literature, we only outline and recall some approaches below. The gradient projection algorithm [18] is one of the earliest gradient-based algorithms for solving (4); there, the problem was reformulated as a box-constrained quadratic program and solved by gradient projection. More recently, the most extensively investigated first-order algorithm for the solution of (4) has been the iterative shrinkage/thresholding (IST) algorithm [15,21]. These results triggered many contributions based on this method [17,24,25]. In light of the IST algorithm, many variants have been proposed under different optimization reformulations and techniques. For instance, Hale, Yin and Zhang [19,20] presented an IST fixed-point continuation (FPC) method based on an operator splitting technique, and Wright, Nowak and Figueiredo [29] studied SpaRSA for sparse reconstruction by solving a nonsmooth convex problem.

Experimental results show that the accelerated IST methods (two-step IST [4] and fast IST [1]) have better convergence properties. In addition, the well-known NESTA [2] first proposed a smoothing function of the $l_1$-norm and then used Nesterov's gradient method to obtain the solution of (3). Besides these methods, the gradient projection technique and alternating direction ideas have also been considered; see [3,5,30] for more details.

Because of their simplicity and low storage requirements, conjugate gradient algorithms are suitable for large-scale problems. In this paper, like [28,31,33], we are interested in applying the conjugate gradient method to signal recovery. We use six smoothing functions, which are studied in [23,27] for the absolute value function, to approximate the $l_1$-norm. Accordingly, we reformulate the signal recovery problem as a smoothing penalized least squares optimization problem and apply the nonlinear conjugate gradient method [32] to solve the smoothing model. The proposed algorithm is shown to be globally convergent. Moreover, we report some numerical experiments to demonstrate the effectiveness of our method. Numerical comparisons with "NESTA", "FPCBB", and "FISTA", which are well-known open-source software packages, are presented as well.

**2 Preliminary**

This section recalls some ideas about the smoothing technique, which can be found in [11,12,23] and the references therein.

It is well known that the absolute value function $|t|$ ($t \in \mathbb{R}$) is not smooth at zero. In order to overcome this difficulty, we introduce some smoothing functions of $|t|$ used in the sequel.

**Definition 2.1** The function $\psi : \mathbb{R}_{++} \times \mathbb{R} \to \mathbb{R}$ is a smoothing function of $|t|$ if the following two conditions hold:

(a) $\psi$ is continuously differentiable at $(\mu, t) \in \mathbb{R}_{++} \times \mathbb{R}$;

(b) $\lim_{\mu \downarrow 0} \psi(\mu, t) = |t|$ for any $t \in \mathbb{R}$.

How do we construct smoothing functions for $|t|$? First, we observe that $|t|$ can be divided into two parts:

$$|t| = (t)_+ - (t)_- = (t)_+ + (-t)_+,$$

where $(t)_+$ denotes the plus function, i.e., $(t)_+ = \max\{0, t\}$, and $(t)_- = \min\{0, t\}$. As mentioned in [11,12,23], one can follow the procedure below to construct a smoothing function for $(t)_+$. More specifically, by introducing a density (kernel) function $d(t)$ with a finite number of pieces satisfying

$$d(t) \ge 0 \quad \text{and} \quad \int_{-\infty}^{+\infty} d(t)\, dt = 1,$$

one can define

$$\hat{s}(t, \mu) := \frac{1}{\mu}\, d\!\left(\frac{t}{\mu}\right),$$

where $\mu$ is a positive parameter. If the following condition holds:

$$\int_{-\infty}^{+\infty} |t|\, d(t)\, dt < +\infty,$$

then the function $\hat{p}(t, \mu)$ defined as

$$\hat{p}(t, \mu) = \int_{-\infty}^{+\infty} (t - s)_+\, \hat{s}(s, \mu)\, ds = \int_{-\infty}^{t} (t - s)\, \hat{s}(s, \mu)\, ds \approx (t)_+$$

is a smoothing approximation for $(t)_+$. There exist well-known smoothing functions for the plus function [11,22,23], for example,

$$\hat\psi_1(\mu, t) = t + \mu \ln\left(1 + e^{-t/\mu}\right), \tag{5}$$

$$\hat\psi_2(\mu, t) = \begin{cases} t & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \frac{1}{2\mu}\left(t + \frac{\mu}{2}\right)^2 & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] 0 & \text{if } t \le -\frac{\mu}{2}, \end{cases} \tag{6}$$

$$\hat\psi_3(\mu, t) = \frac{\sqrt{4\mu^2 + t^2} + t}{2}, \tag{7}$$

$$\hat\psi_4(\mu, t) = \begin{cases} t - \frac{\mu}{2} & \text{if } t > \mu, \\[2pt] \frac{t^2}{2\mu} & \text{if } 0 \le t \le \mu, \\[2pt] 0 & \text{if } t < 0, \end{cases} \tag{8}$$

where the corresponding kernel functions are respectively given by

$$d_1(t) = \frac{e^{-t}}{(1 + e^{-t})^2}, \qquad d_2(t) = \begin{cases} 1 & \text{if } -\frac{1}{2} \le t \le \frac{1}{2}, \\ 0 & \text{otherwise}, \end{cases}$$

$$d_3(t) = \frac{2}{(t^2 + 4)^{3/2}}, \qquad d_4(t) = \begin{cases} 1 & \text{if } 0 \le t \le 1, \\ 0 & \text{otherwise}. \end{cases}$$

Likewise, we can obtain the smoothing function of $|t|$ via convolution as follows:

$$\hat{p}(|t|, \mu) = \hat{p}(t, \mu) + \hat{p}(-t, \mu) = \int_{-\infty}^{+\infty} |t - s|\, \hat{s}(s, \mu)\, ds.$$

Analogous to (5)–(8), we construct four smoothing functions for $|t|$ as in [23]:

$$\psi_1(\mu, t) = \mu\left[\ln\left(1 + e^{-t/\mu}\right) + \ln\left(1 + e^{t/\mu}\right)\right], \tag{9}$$

$$\psi_2(\mu, t) = \begin{cases} t & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \frac{t^2}{\mu} + \frac{\mu}{4} & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] -t & \text{if } t \le -\frac{\mu}{2}, \end{cases} \tag{10}$$

$$\psi_3(\mu, t) = \sqrt{4\mu^2 + t^2}, \tag{11}$$

$$\psi_4(\mu, t) = \begin{cases} \frac{t^2}{2\mu} & \text{if } |t| \le \mu, \\[2pt] |t| - \frac{\mu}{2} & \text{if } |t| > \mu. \end{cases} \tag{12}$$
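As a sanity check on the convolution construction, one can verify numerically that convolving $|t|$ with the uniform kernel $d_2$ reproduces the middle piece of $\psi_2$ in (10). This is our own illustrative check via midpoint-rule quadrature, not part of the paper:

```python
# For the uniform kernel d_2, s_hat(s, mu) = 1/mu on [-mu/2, mu/2], and the
# convolution integral of |t - s| * s_hat(s, mu) over s should equal
# t^2/mu + mu/4 whenever |t| < mu/2 (the middle piece of psi_2 in (10)).
mu, t = 0.8, 0.1
N = 200_000
h = mu / N
conv = sum(abs(t - (-mu / 2 + (i + 0.5) * h)) * (1.0 / mu) * h for i in range(N))
closed_form = t * t / mu + mu / 4  # psi_2(mu, t) for |t| < mu/2
assert abs(conv - closed_form) < 1e-6
```

The same kind of check works for the other kernel/smoothing pairs, since each $\psi_i$ arises from its $d_i$ by exactly this convolution.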

In particular, if we take the Epanechnikov kernel function

$$d(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \le 1, \\ 0 & \text{otherwise}, \end{cases}$$

then the associated smoothing function for $|t|$ is expressed by

$$\psi_5(\mu, t) = \begin{cases} t & \text{if } t > \mu, \\[2pt] -\frac{t^4}{8\mu^3} + \frac{3t^2}{4\mu} + \frac{3\mu}{8} & \text{if } -\mu \le t \le \mu, \\[2pt] -t & \text{if } t < -\mu. \end{cases} \tag{13}$$

Moreover, taking the Gaussian kernel function $d(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}$ for all $t \in \mathbb{R}$ yields

$$\hat{s}(t, \mu) := \frac{1}{\mu}\, d\!\left(\frac{t}{\mu}\right) = \frac{1}{\sqrt{2\pi}\,\mu}\, e^{-\frac{t^2}{2\mu^2}},$$

which leads to another type of smoothing function [27] for $|t|$:

$$\psi_6(\mu, t) = t\, \mathrm{erf}\!\left(\frac{t}{\sqrt{2}\,\mu}\right) + \sqrt{\frac{2}{\pi}}\,\mu\, e^{-\frac{t^2}{2\mu^2}}, \tag{14}$$

where the error function is defined as

$$\mathrm{erf}(t) = \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\, du \quad \forall\, t \in \mathbb{R}.$$

In summary, we have obtained six smoothing functions and will employ them to approximate the $l_1$-norm for the signal recovery problem.
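To make the six functions concrete, here is a sketch implementation (ours, not the authors' code). Note that $\psi_1$ is written in the algebraically equivalent form $|t| + 2\mu\ln(1 + e^{-|t|/\mu})$, since the naive formula (9) overflows for small $\mu$, which foreshadows the instability of $\psi_1$ reported in Sect. 4:

```python
import math

def psi1(mu, t):
    # Equivalent to mu*[ln(1+exp(-t/mu)) + ln(1+exp(t/mu))] from (9),
    # rewritten in a numerically stable way.
    return abs(t) + 2 * mu * math.log(1 + math.exp(-abs(t) / mu))

def psi2(mu, t):  # (10)
    return abs(t) if abs(t) >= mu / 2 else t * t / mu + mu / 4

def psi3(mu, t):  # (11)
    return math.sqrt(4 * mu * mu + t * t)

def psi4(mu, t):  # (12)
    return t * t / (2 * mu) if abs(t) <= mu else abs(t) - mu / 2

def psi5(mu, t):  # (13), Epanechnikov kernel
    if abs(t) > mu:
        return abs(t)
    return -t**4 / (8 * mu**3) + 3 * t**2 / (4 * mu) + 3 * mu / 8

def psi6(mu, t):  # (14), Gaussian kernel
    return t * math.erf(t / (math.sqrt(2) * mu)) \
        + math.sqrt(2 / math.pi) * mu * math.exp(-t**2 / (2 * mu**2))

# Definition 2.1(b): each psi_i(mu, t) approaches |t| as mu decreases to 0.
for psi in (psi1, psi2, psi3, psi4, psi5, psi6):
    assert abs(psi(1e-6, 0.3) - 0.3) < 1e-4
```

Each function is smooth in $t$ for fixed $\mu > 0$ and reduces to $|t|$ in the limit, which is exactly what the penalized model of Sect. 3 exploits.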

**3 Algorithm and convergence properties**

This section is devoted to the detailed description and implementation of our algorithmic idea and its global convergence results. Basically, it is a type of conjugate gradient algorithm. To proceed, we briefly review the conjugate gradient method. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function and the target problem is

$$\min_{x \in \mathbb{R}^n} f(x).$$

Let $d^k$ denote the search direction and $\alpha_k$ a step length. Then, the general framework of the conjugate gradient method is as follows:

$$x^{k+1} = x^k + \alpha_k d^k, \qquad d^k = \begin{cases} -\nabla f(x^k) & \text{if } k = 0, \\ -\nabla f(x^k) + \beta_k d^{k-1} & \text{if } k > 0, \end{cases} \tag{15}$$

where $\nabla f$ is the gradient of $f$ and $\beta_k$ is a parameter. In general, different choices of the parameter $\beta_k$ correspond to different conjugate gradient algorithms. Among them, the three-term PRP (Polak–Ribière–Polyak) conjugate gradient method is one of the most efficient, where

$$d^k = \begin{cases} -\nabla f(x^k) & \text{if } k = 0, \\ -\nabla f(x^k) + \beta_k d^{k-1} - \theta_k y^{k-1} & \text{if } k > 0, \end{cases}$$

$$y^{k-1} = \nabla f(x^k) - \nabla f(x^{k-1}), \qquad \beta_k = \frac{(\nabla f(x^k))^T y^{k-1}}{\|\nabla f(x^{k-1})\|^2}, \qquad \theta_k = \frac{(\nabla f(x^k))^T d^{k-1}}{\|\nabla f(x^{k-1})\|^2}. \tag{16}$$

A key feature of this conjugate gradient method is that it ensures the sufficient descent property of the search direction at each iteration, which plays a prominent role in the global convergence analysis. For more details, please refer to [32] and references therein.
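The sufficient descent property can be seen directly from (16): substituting the formulas for $\beta_k$ and $\theta_k$ gives $(\nabla f(x^k))^T d^k = -\|\nabla f(x^k)\|^2$ identically, since the $\beta_k$ and $\theta_k$ terms cancel. A small sketch (our own illustration; the function name is ours):

```python
import numpy as np

def three_term_prp_direction(g, g_prev, d_prev):
    """Three-term PRP direction from (16):
    d = -g + beta * d_prev - theta * y, with y = g - g_prev."""
    y = g - g_prev
    denom = g_prev @ g_prev
    beta = (g @ y) / denom
    theta = (g @ d_prev) / denom
    return -g + beta * d_prev - theta * y

# Sufficient descent holds exactly: g^T d = -||g||^2, regardless of the data,
# because beta*(g^T d_prev) and theta*(g^T y) cancel term by term.
rng = np.random.default_rng(0)
g, g_prev, d_prev = (rng.normal(size=5) for _ in range(3))
d = three_term_prp_direction(g, g_prev, d_prev)
assert abs(g @ d + g @ g) < 1e-10
```

This exact identity is what makes the line search in Step 2 below well defined at every iteration.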

For convenience, we denote

$$f_\mu(x) := \lambda \phi_i(\mu, x) + \frac{1}{2}\|b - Ax\|_2^2, \quad x \in \mathbb{R}^n, \tag{17}$$

where $\phi_i(\mu, x)$ is a smoothing function of the $l_1$-norm $\|x\|_1$. In other words, it takes the form

$$\phi_i(\mu, x) := \sum_{j=1}^{n} \psi_i(\mu, x_j), \quad i = 1, 2, 3, 4, 5, 6. \tag{18}$$

Thus, the problem (4) is approximated by the smoothing penalized least squares optimization problem

$$\min_{x \in \mathbb{R}^n} f_\mu(x), \tag{19}$$

which is our new target problem. To deal with the unconstrained minimization (19), the following iterative scheme is employed; see [32].

*The Algorithm*

*Step 0* Given a starting point $x^0 \in \mathbb{R}^n$ and $\mu_0 > 0$, choose parameters $\rho, \delta, \gamma, \bar\gamma \in (0, 1)$. Let $d^0 = -\nabla f_{\mu_0}(x^0)$. Set $k = 0$.

*Step 1* If the termination criterion is reached, then stop.

*Step 2* Determine $\alpha_k = \max\{\rho^j \mid j = 0, 1, 2, \ldots\}$ satisfying

$$f_{\mu_k}(x^k + \alpha_k d^k) \le f_{\mu_k}(x^k) - 2\delta(1 - \gamma)\alpha_k f_{\mu_k}(x^k).$$

*Step 3* Let $x^{k+1} = x^k + \alpha_k d^k$ and $\mu_{k+1} = \bar\gamma \mu_k$. Replace $f$ with $f_{\mu_{k+1}}$ in formula (16) and compute $d^{k+1}$ by (16).

*Step 4* Set $k = k + 1$ and return to Step 1.

We also say a few words about how to update $\mu_k$ in Step 3. For Experiment 1, $\mu_k$ is updated at each iteration by $\mu_{k+1} = \bar\gamma \mu_k$ with $\bar\gamma \in (0, 1)$, and we select $\bar\gamma = 0.4$. For Experiments 2 and 3, an inner loop is used to solve our relaxed problem for fixed $\mu_k$, after which $\mu_k$ is reduced; this is indeed a technique that appeared in NESTA. When $\mu_k$ is small enough, our algorithm no longer updates the parameter $\mu_k$.
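Putting the pieces together, the following is a compact sketch of the whole scheme (Steps 0-4 with the three-term PRP direction (16) and the $\psi_2$ smoothing). It is our own illustration under the parameter choices reported in Sect. 4, not the authors' MATLAB code; the floor on $\mu_k$ mimics the remark that $\mu_k$ is no longer updated once it is small enough:

```python
import numpy as np

def phi2(mu, x):
    """phi_2(mu, x) = sum_j psi_2(mu, x_j), the smoothed l1-norm, cf. (10), (18)."""
    return np.sum(np.where(np.abs(x) < mu / 2, x**2 / mu + mu / 4, np.abs(x)))

def phi2_grad(mu, x):
    return np.where(np.abs(x) < mu / 2, 2 * x / mu, np.sign(x))

def smoothing_cg(A, b, lam, mu0=0.1, gamma_bar=0.4, mu_min=1e-3,
                 rho=0.5, delta=0.002, gamma=0.2, max_iter=500):
    f = lambda x, mu: lam * phi2(mu, x) + 0.5 * np.sum((b - A @ x) ** 2)
    grad = lambda x, mu: lam * phi2_grad(mu, x) + A.T @ (A @ x - b)
    x, mu = np.zeros(A.shape[1]), mu0                       # Step 0
    g = grad(x, mu)
    d = -g
    for _ in range(max_iter):
        alpha = 1.0                                         # Step 2: backtracking
        while f(x + alpha * d, mu) > f(x, mu) - 2 * delta * (1 - gamma) * alpha * f(x, mu):
            alpha *= rho
            if alpha < 1e-12:
                break
        x_new = x + alpha * d                               # Step 3
        mu = max(gamma_bar * mu, mu_min)                    # shrink mu, with a floor
        g_new = grad(x_new, mu)
        y = g_new - g
        denom = g @ g
        d = -g_new + ((g_new @ y) / denom) * d - ((g_new @ d) / denom) * y
        x, g = x_new, g_new                                 # Step 4
    return x

# Hypothetical small noise-free test: a 2-sparse signal, m = n/2 Gaussian rows.
rng = np.random.default_rng(1)
n, m = 100, 50
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[3, 42]] = [1.0, -2.0]
b = A @ x_true
x_hat = smoothing_cg(A, b, lam=0.001 * np.max(np.abs(A.T @ b)))
rel = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(rel)  # small relative error on this easy instance
```

A production implementation would add the termination tests of Sect. 4 instead of a fixed iteration budget, but the sketch already shows the interplay between the line search, the direction update, and the shrinking smoothing parameter.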

**Lemma 3.1** *Suppose that the function $f_\mu(x)$ is defined by (17). Then there exists a constant $L > 0$ such that $\|\nabla f_\mu(x) - \nabla f_\mu(y)\| \le L\|x - y\|$ for all $x, y \in \mathbb{R}^n$.*

**Proof** For any fixed $\mu > 0$, in order to prove the Lipschitz property of $\nabla f_\mu(x)$, we need to verify the Lipschitz condition of $\psi_i'$ ($i = 1, 2, 3, 4, 5, 6$). To this end, we discuss two cases.

*Case (i):* $i = 1, 3, 5, 6$. For any $t_1, t_2 \in \mathbb{R}$, without loss of generality let $t_1 < t_2$. By the Lagrange mean value theorem, we have

$$\left|\psi_i'(\mu, t_1) - \psi_i'(\mu, t_2)\right| = \left|\psi_i''(\mu, \xi)\right| |t_1 - t_2|, \quad \xi \in (t_1, t_2).$$

For the subsequent analysis, we need to estimate $\left|\psi_i''(\mu, \xi)\right|$ for each $i = 1, 3, 5, 6$.

For $i = 1$, we know that

$$\left|\psi_1''(\mu, \xi)\right| = \frac{1}{\mu}\left[\frac{e^{\xi/\mu}}{(1 + e^{\xi/\mu})^2} + \frac{e^{-\xi/\mu}}{(1 + e^{-\xi/\mu})^2}\right] < \frac{2}{\mu}.$$

For $i = 3$, it is clear that $\left|\psi_3''(\mu, \xi)\right| = \dfrac{4\mu^2}{(4\mu^2 + \xi^2)^{3/2}} \le \dfrac{1}{2\mu}$.

For $i = 5$, we have

$$\psi_5'(\mu, t) = \begin{cases} 1 & \text{if } t > \mu, \\[2pt] -\frac{t^3}{2\mu^3} + \frac{3t}{2\mu} & \text{if } -\mu \le t \le \mu, \\[2pt] -1 & \text{if } t < -\mu, \end{cases} \qquad \psi_5''(\mu, t) = \begin{cases} 0 & \text{if } t > \mu, \\[2pt] -\frac{3t^2}{2\mu^3} + \frac{3}{2\mu} & \text{if } -\mu \le t \le \mu, \\[2pt] 0 & \text{if } t < -\mu, \end{cases}$$

which yields

$$\left|\psi_5''(\mu, \xi)\right| < \frac{3\mu^2}{2\mu^3} + \frac{3}{2\mu} = \frac{3}{\mu}.$$
For $i = 6$, we compute

$$\psi_6'(\mu, t) = \frac{2}{\sqrt{\pi}} \int_0^{\frac{t}{\sqrt{2}\,\mu}} e^{-u^2}\, du, \qquad \psi_6''(\mu, t) = \frac{\sqrt{2}}{\sqrt{\pi}\,\mu}\, e^{-\frac{t^2}{2\mu^2}},$$

which imply $\left|\psi_6''(\mu, \xi)\right| \le \dfrac{\sqrt{2}}{\sqrt{\pi}\,\mu}$.

All the aforementioned results indicate that

$$\left|\psi_i'(\mu, t_1) - \psi_i'(\mu, t_2)\right| \le \frac{3}{\mu}|t_1 - t_2|, \quad i = 1, 3, 5, 6, \tag{20}$$

for any $t_1, t_2 \in \mathbb{R}$.

*Case (ii):* $i = 2, 4$. Indeed, we will provide a version like (20) for $\psi_2'$ and $\psi_4'$.

For $i = 2$, we know that

$$\psi_2'(\mu, t) = \begin{cases} 1 & \text{if } t \ge \frac{\mu}{2}, \\[2pt] \frac{2t}{\mu} & \text{if } -\frac{\mu}{2} < t < \frac{\mu}{2}, \\[2pt] -1 & \text{if } t \le -\frac{\mu}{2}. \end{cases}$$

If $t_1 \ge \frac{\mu}{2},\, t_2 \ge \frac{\mu}{2}$; or $t_1 \le -\frac{\mu}{2},\, t_2 \le -\frac{\mu}{2}$; or $t_1, t_2 \in \left(-\frac{\mu}{2}, \frac{\mu}{2}\right)$, then

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| \le \frac{2}{\mu}|t_1 - t_2|.$$

If $t_1 \ge \frac{\mu}{2}$ and $t_2 \le -\frac{\mu}{2}$, then, since $|t_1 - t_2| \ge \mu$,

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| = 2 = \frac{2}{\mu}\,\mu \le \frac{2}{\mu}|t_1 - t_2|.$$

If $t_1 \ge \frac{\mu}{2}$ and $t_2 \in \left(-\frac{\mu}{2}, \frac{\mu}{2}\right)$, then

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| = 1 - \frac{2t_2}{\mu} \le \frac{2t_1}{\mu} - \frac{2t_2}{\mu} = \frac{2}{\mu}|t_1 - t_2|.$$

If $t_1 \le -\frac{\mu}{2}$ and $t_2 \in \left(-\frac{\mu}{2}, \frac{\mu}{2}\right)$, then

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| = 1 + \frac{2t_2}{\mu} \le -\frac{2t_1}{\mu} + \frac{2t_2}{\mu} = \frac{2}{\mu}|t_1 - t_2|.$$

Thus, it is clear to conclude that

$$\left|\psi_2'(\mu, t_1) - \psi_2'(\mu, t_2)\right| \le \frac{2}{\mu}|t_1 - t_2|, \quad \forall\, t_1, t_2 \in \mathbb{R}.$$

For $i = 4$, using similar arguments as in the case $i = 2$, it can be verified that

$$\left|\psi_4'(\mu, t_1) - \psi_4'(\mu, t_2)\right| \le \frac{1}{\mu}|t_1 - t_2|, \quad \forall\, t_1, t_2 \in \mathbb{R}.$$

In summary, we also achieve

$$\left|\psi_i'(\mu, t_1) - \psi_i'(\mu, t_2)\right| \le \frac{2}{\mu}|t_1 - t_2|, \quad i = 2, 4, \tag{21}$$

for any $t_1, t_2 \in \mathbb{R}$.

Now, applying (20) and (21), for any $x, y \in \mathbb{R}^n$, we have

$$\begin{aligned}
\left\|\nabla f_\mu(x) - \nabla f_\mu(y)\right\| &= \left\|\lambda\left[\nabla\phi_i(\mu, x) - \nabla\phi_i(\mu, y)\right] + A^T(Ax - b) - A^T(Ay - b)\right\| \\
&\le \frac{3\lambda n}{\mu}\|x - y\| + \|A\|^2\|x - y\| \\
&= L\|x - y\|,
\end{aligned}$$

where $L = \frac{3\lambda n}{\mu} + \|A\|^2$. Thus, the proof is complete. $\square$

**Lemma 3.2** *For $\mu > 0$, the level set $\mathcal{L}(x^0) = \{x \in \mathbb{R}^n \mid f_\mu(x) \le f_\mu(x^0)\}$ is bounded.*

**Proof** We prove it by contradiction. Suppose that the set $\mathcal{L}(x^0)$ is unbounded. Then there exists an index set $K_1$ such that $\|x^k\| \to \infty$ as $k \to \infty$, $k \in K_1$. Recalling the definition of $f_\mu(x)$, we have $f_\mu(x^k) \to \infty$ as $k \to \infty$, $k \in K_1$. This contradicts $f_\mu(x^k) \le f_\mu(x^0)$. Thus, the level set $\mathcal{L}(x^0)$ is bounded. $\square$

In fact, by the continuity of $f_\mu(x)$, the level set $\mathcal{L}(x^0)$ is compact.

We point out that Lemmas 3.1 and 3.2 play key roles in our theoretical analysis. They were imposed as assumptions in [32] and other literature such as [14]. Here we establish them by showing that the function (17), based on the proposed smoothing functions, satisfies these assumptions. With these properties, we are ready to present the global convergence of our algorithm.
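Lemma 3.1 can also be spot-checked numerically. The sketch below (our own check, with random hypothetical data) verifies that the gradient variation of $f_\mu$ built from $\psi_2$ never exceeds $L\|x - y\|$ with $L = \frac{3\lambda n}{\mu} + \|A\|^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, mu, lam = 20, 40, 0.05, 0.01
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Gradient of f_mu in (17) with the psi_2 smoothing of the l1 term.
grad = lambda x: lam * np.where(np.abs(x) < mu / 2, 2 * x / mu, np.sign(x)) \
    + A.T @ (A @ x - b)

# Lipschitz constant from the proof of Lemma 3.1 (||A|| is the spectral norm).
L = 3 * lam * n / mu + np.linalg.norm(A, 2) ** 2

for _ in range(100):
    x, y = rng.normal(size=n), rng.normal(size=n)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y)
```

The check passes with a wide margin, reflecting that the constant $\frac{3\lambda n}{\mu}$ in the proof is a generous upper bound on the separable smoothing term.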

**Theorem 3.1** *For any $\mu > 0$, consider the aforementioned algorithm with any starting point $x^0$. Let $\{x^k\}$ be the sequence generated by the algorithm. Then*

$$\liminf_{k\to\infty} \|\nabla f_\mu(x^k)\| = 0. \tag{22}$$

**Proof** Suppose that the conclusion (22) is not true. Then there exists a constant $\varepsilon_0 > 0$ such that

$$\|\nabla f_\mu(x^k)\| \ge \varepsilon_0, \quad \forall\, k. \tag{23}$$

Since $(\nabla f_\mu(x^k))^T d^k = -\|\nabla f_\mu(x^k)\|^2$, there exists $\alpha_k > 0$ such that

$$f_\mu(x^k + \alpha_k d^k) \le f_\mu(x^k) - 2\delta(1 - \gamma)\alpha_k f_\mu(x^k). \tag{24}$$

This means $\{f_\mu(x^k)\}$ is decreasing and bounded, which implies that $x^k \in \mathcal{L}(x^0)$ and $\{f_\mu(x^k)\}$ is convergent. We denote $\lim_{k\to\infty} f_\mu(x^k) = f_*$. From (24), it can be verified that $\lim_{k\to\infty} \alpha_k f_\mu(x^k) = 0$. By the definition of $f_\mu(x)$, we know $f_* > 0$, and therefore

$$0 < \alpha_k = \frac{\alpha_k f_\mu(x^k)}{f_\mu(x^k)} \le \frac{\alpha_k f_\mu(x^k)}{f_*}.$$

To sum up, we obtain

$$\lim_{k\to\infty} \alpha_k = 0. \tag{25}$$

Now, applying Lemma 3.2, there is a constant $r > 0$ such that

$$\|\nabla f_\mu(x^k)\| \le r, \quad \forall\, k. \tag{26}$$

Next, we prove that $\{d^k\}$ is bounded by a contradiction argument. If $\{d^k\}$ is unbounded, then there exists an index set $K_2$ such that $\|d^k\| \to \infty$ as $k \to \infty$, $k \in K_2$. Let $\theta_k$ be the angle between $-\nabla f_\mu(x^k)$ and $d^k$. Then, we see that

$$\cos\theta_k = \frac{-(\nabla f_\mu(x^k))^T d^k}{\|\nabla f_\mu(x^k)\|\,\|d^k\|} = \frac{\|\nabla f_\mu(x^k)\|}{\|d^k\|}. \tag{27}$$

This relation enables us to apply $\varepsilon_0 \le \|\nabla f_\mu(x^k)\| \le r$ to conclude that $\cos\theta_k \to 0$ as $k \to \infty$, $k \in K_2$, which means $\theta_k \to \frac{\pi}{2}$ as $k \to \infty$, $k \in K_2$. Considering this geometric relationship, we have $(\nabla f_\mu(x^k))^T d^k \to 0$ as $k \to \infty$, $k \in K_2$, from which we find

$$-\|\nabla f_\mu(x^k)\|^2 = (\nabla f_\mu(x^k))^T d^k \to 0, \quad k \in K_2,\ k \to \infty,$$

which contradicts (23). Thus, $\{d^k\}$ is bounded; i.e., there exists a constant $M^* > \varepsilon_0$ such that

$$\|d^k\| \le M^*, \quad \forall\, k. \tag{28}$$

Then, combining (25) with (28) gives

$$\lim_{k\to\infty} \|\alpha_k d^k\| = 0. \tag{29}$$

In addition, from (27), we have $\cos\theta_k \ge \frac{\varepsilon_0}{M^*}$, which further yields

$$-(\nabla f_\mu(x^k))^T d^k = \|\nabla f_\mu(x^k)\|\,\|d^k\| \cos\theta_k \ge \frac{\varepsilon_0^2}{M^*}\|d^k\|. \tag{30}$$
Applying the mean value theorem, we obtain

$$\begin{aligned}
f_\mu(x^k + \alpha_k d^k) &= f_\mu(x^k) + \alpha_k (\nabla f_\mu(\xi^k))^T d^k \\
&= f_\mu(x^k) + \alpha_k (\nabla f_\mu(x^k))^T d^k + \alpha_k (\nabla f_\mu(\xi^k) - \nabla f_\mu(x^k))^T d^k \\
&\le f_\mu(x^k) + \alpha_k \|d^k\| \left[ \frac{(\nabla f_\mu(x^k))^T d^k}{\|d^k\|} + \left\|\nabla f_\mu(\xi^k) - \nabla f_\mu(x^k)\right\| \right],
\end{aligned} \tag{31}$$
where $\xi^k$ lies between $x^k$ and $x^k + \alpha_k d^k$. Using Lemma 3.1 and the compactness of the level set $\mathcal{L}(x^0)$, it follows that $\nabla f_\mu(x)$ is uniformly continuous on $\mathcal{L}(x^0)$. This, together with (29), implies that there exists a constant $\hat\alpha > 0$ such that, for sufficiently large $k$,

$$\alpha_k \|d^k\| < \hat\alpha \quad \text{and} \quad \left\|\nabla f_\mu(\xi^k) - \nabla f_\mu(x^k)\right\| < \frac{1}{2}\frac{\varepsilon_0^2}{M^*}. \tag{32}$$

Then, when $k$ is sufficiently large, from (31) we have

$$f_\mu(x^k + \alpha_k d^k) \le f_\mu(x^k) + \hat\alpha \left( -\frac{\varepsilon_0^2}{M^*} + \frac{1}{2}\frac{\varepsilon_0^2}{M^*} \right) = f_\mu(x^k) - \frac{\hat\alpha\, \varepsilon_0^2}{2M^*}, \tag{33}$$

which contradicts $f_\mu(x^{k+1}) - f_\mu(x^k) \to 0$ as $k \to \infty$. Therefore, there must hold

$$\liminf_{k\to\infty} \|\nabla f_\mu(x^k)\| = 0. \qquad \square$$

**4 Numerical experiments**

In this section, we conduct numerical experiments to show the performance of our algorithm for signal recovery. All tests are run in MATLAB R2016 on a 64-bit PC with an Intel(R) Core(TM) i7-6500U 2.50 GHz CPU and 8.00 GB of RAM, running the Windows 7 operating system.

*Experiment 1*

In what follows, we divide our tests into two groups: the noise-free case and the noisy case, and we report the numerical results of our algorithm for signal recovery. In all of our simulations, $A \in \mathbb{R}^{m \times n}$ is assumed to be a Gaussian matrix with $m = \frac{n}{2}$ or $\frac{n}{4}$, and $x^* \in \mathbb{R}^n$ is a $K$-sparse original signal with $K = \frac{n}{40}$, whose nonzero components follow the normal distribution $N(0, 1)$. Moreover, we set $b = Ax + e$ in the noisy case, where $e$ is Gaussian noise with zero mean and variance $\sigma^2$.

The parameters of our algorithm are summarized as follows:

$$x^0 = \mathrm{zeros}(n, 1), \quad \lambda = 0.001 \cdot \|A^T b\|_\infty, \quad \rho = 0.5, \quad \delta = 0.002, \quad \gamma = 0.2, \quad \sigma^2 = 0.0001.$$

In Tables 1 and 2, we set $\mu_0 = 0.1$ and $\mu_{k+1} = 0.4\mu_k$, and the algorithm terminates when the relative error satisfies

$$\frac{\|x - x^*\|}{\|x^*\|} < 4 \times 10^{-3},$$

where $x$ is the reconstructed signal. In Tables 3 and 4, F denotes the smoothing function, CPU denotes the CPU time, Re denotes the relative error, and It denotes the number of iterations; we set $\mu = 1.0\mathrm{e}{-2}, 1.0\mathrm{e}{-3}, 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}$, and the algorithm terminates when

$$\frac{|f_\mu(x^{k+1}) - f_\mu(x^k)|}{|f_\mu(x^{k+1})|} < 1.0\mathrm{e}{-12}.$$

Note that the result for each test is obtained by running our algorithm 10 times and taking the average.
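For reference, the two stopping rules just described amount to the following checks (the helper names are ours, not from the paper's code):

```python
import math

def relative_error(x, x_star):
    """||x - x*||_2 / ||x*||_2, the stopping criterion used for Tables 1 and 2."""
    num = math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_star)))
    return num / math.sqrt(sum(c ** 2 for c in x_star))

def objective_stalled(f_new, f_old, tol=1.0e-12):
    """|f_mu(x^{k+1}) - f_mu(x^k)| / |f_mu(x^{k+1})| < tol, used for Tables 3 and 4."""
    return abs(f_new - f_old) / abs(f_new) < tol

err = relative_error([3.0, 4.0], [3.0, 0.0])  # sqrt(16) / 3 = 4/3
```

The first rule measures recovery quality against the known ground truth $x^*$; the second is a practical surrogate that detects stagnation of the objective when $x^*$ is unavailable.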

In our experiments, we found that $\psi_1$ is very unstable; sometimes it is not applicable for solving the test problems. A similar concern was also addressed in [13, page 135].

**Table 1** Comparisons of five smoothing functions for the noise-free case

| $n$ | $m$ | CPU $\psi_2$ | CPU $\psi_3$ | CPU $\psi_4$ | CPU $\psi_5$ | CPU $\psi_6$ | It $\psi_2$ | It $\psi_3$ | It $\psi_4$ | It $\psi_5$ | It $\psi_6$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | $n/2$ | 1.21 | 0.94 | 1.32 | 1.27 | 0.92 | 72 | 74 | 75 | 73 | 69 |
| | $n/4$ | 1.37 | 0.90 | 1.45 | 1.59 | 1.00 | 129 | 135 | 142 | 148 | 147 |
| 4000 | $n/2$ | 4.16 | 3.83 | 5.01 | 4.50 | 3.80 | 70 | 70 | 78 | 72 | 68 |
| | $n/4$ | 4.68 | 4.12 | 5.19 | 5.58 | 4.50 | 131 | 143 | 143 | 148 | 146 |
| 6000 | $n/2$ | 9.54 | 8.49 | 9.80 | 9.22 | 8.61 | 70 | 70 | 70 | 69 | 68 |
| | $n/4$ | 9.69 | 10.35 | 10.71 | 12.91 | 10.54 | 134 | 163 | 145 | 159 | 164 |
| 8000 | $n/2$ | 16.93 | 16.25 | 18.76 | 18.91 | 15.55 | 67 | 71 | 74 | 78 | 69 |
| | $n/4$ | 15.76 | 15.65 | 18.65 | 19.49 | 15.75 | 126 | 137 | 145 | 144 | 138 |
| 10000 | $n/2$ | 25.88 | 24.06 | 30.07 | 25.83 | 24.21 | 67 | 67 | 76 | 68 | 68 |
| | $n/4$ | 27.94 | 28.28 | 29.96 | 34.48 | 27.60 | 139 | 138 | 151 | 160 | 149 |

**Table 2** Comparisons of five smoothing functions for the noisy case

| $n$ | $m$ | CPU $\psi_2$ | CPU $\psi_3$ | CPU $\psi_4$ | CPU $\psi_5$ | CPU $\psi_6$ | It $\psi_2$ | It $\psi_3$ | It $\psi_4$ | It $\psi_5$ | It $\psi_6$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | $n/2$ | 1.27 | 1.00 | 1.30 | 1.33 | 1.00 | 73 | 78 | 75 | 75 | 72 |
| | $n/4$ | 1.54 | 0.92 | 1.64 | 1.55 | 0.91 | 145 | 151 | 154 | 150 | 144 |
| 4000 | $n/2$ | 4.48 | 4.47 | 4.89 | 4.62 | 4.36 | 72 | 80 | 75 | 72 | 76 |
| | $n/4$ | 4.94 | 5.16 | 5.77 | 5.09 | 4.65 | 137 | 159 | 154 | 143 | 158 |
| 6000 | $n/2$ | 8.81 | 9.07 | 9.80 | 9.91 | 8.97 | 65 | 72 | 72 | 67 | 69 |
| | $n/4$ | 11.65 | 11.61 | 12.10 | 12.38 | 10.12 | 160 | 167 | 152 | 165 | 154 |
| 8000 | $n/2$ | 16.27 | 17.05 | 18.28 | 18.53 | 16.69 | 69 | 75 | 76 | 74 | 71 |
| | $n/4$ | 15.76 | 18.04 | 19.27 | 17.99 | 16.91 | 132 | 155 | 145 | 145 | 143 |
| 10000 | $n/2$ | 25.59 | 25.58 | 26.04 | 25.82 | 25.38 | 68 | 71 | 67 | 67 | 68 |
| | $n/4$ | 25.82 | 27.23 | 37.00 | 29.52 | 27.73 | 136 | 150 | 175 | 150 | 155 |

Accordingly, we tried adopting the equivalent expression suggested in [13, formula (3.1)] and rewriting $\psi_1(t)$ as

$$\psi_1(t) = \mu\left[\ln\left(e^{\frac{\lambda_1(A(t))}{\mu}} + e^{\frac{\lambda_2(A(t))}{\mu}}\right) + \ln\left(e^{\frac{\lambda_1(B(t))}{\mu}} + e^{\frac{\lambda_2(B(t))}{\mu}}\right)\right]$$

$$= \lambda_1(A(t)) + \mu \ln\left(1 + e^{\frac{\lambda_2(A(t)) - \lambda_1(A(t))}{\mu}}\right) + \lambda_1(B(t)) + \mu \ln\left(1 + e^{\frac{\lambda_2(B(t)) - \lambda_1(B(t))}{\mu}}\right),$$

where

$$A(t) = \begin{pmatrix} 0 & 0 \\ 0 & -t \end{pmatrix}, \qquad B(t) = \begin{pmatrix} 0 & 0 \\ 0 & t \end{pmatrix},$$

$$\lambda_1(A(t)) = 0, \quad \lambda_2(A(t)) = -t, \quad \lambda_1(B(t)) = 0, \quad \lambda_2(B(t)) = t.$$

Nonetheless, $\psi_1$ still does not work well within the algorithm, and we therefore omit it from our numerical reports.

From the experiments, we have the following observations:

(1) The sparse original signal can be recovered by our algorithm effectively.

Tables 1, 2, 3, 4 and Figs. 1, 2, 3, 4, 5 illustrate that the five smoothing functions (except $\psi_1$) in our algorithm work quite well whether the given problem is noise-free or noisy.

(2) The convergence behavior with respect to CPU time, iterations, signal size, and relative error is depicted in Figs. 1, 2, 3, 4, 5, respectively. From Tables 1, 2 and Figs. 1, 2, we see that our algorithm costs more CPU time as the dimension $n$ increases, while the changes in iteration counts are very marginal. On the whole, the performance order can be summarized as

$$\psi_2 \ge \psi_4 \approx \psi_5 \approx \psi_6 > \psi_3,$$

**Table 3** Comparisons of five functions for different $\mu$ when $n = 5000$, noise-free case (column triples give CPU, Re, It for $\mu = 1.0\mathrm{e}{-2}, 1.0\mathrm{e}{-3}, 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}$, left to right)

| F | $m$ | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\psi_2$ | $n/2$ | 8.80 | 0.0158 | 92 | 9.34 | 0.0040 | 89 | 11.03 | 0.0033 | 100 | 16.78 | 0.0025 | 141 |
| | $n/4$ | 10.98 | 0.0365 | 172 | 11.49 | 0.0064 | 178 | 11.83 | 0.0029 | 184 | 17.06 | 0.0028 | 237 |
| $\psi_3$ | $n/2$ | 9.44 | 0.0567 | 104 | 8.14 | 0.0090 | 86 | 7.98 | 0.0032 | 86 | 10.93 | 0.0025 | 111 |
| | $n/4$ | 7.44 | 0.1532 | 167 | 9.10 | 0.0195 | 179 | 8.36 | 0.0045 | 166 | 10.34 | 0.0030 | 188 |
| $\psi_4$ | $n/2$ | 8.22 | 0.0538 | 71 | 7.92 | 0.0059 | 73 | 10.58 | 0.0027 | 93 | 12.01 | 0.0025 | 104 |
| | $n/4$ | 9.13 | 0.0157 | 139 | 11.11 | 0.0154 | 155 | 10.69 | 0.0035 | 179 | 13.43 | 0.0032 | 194 |
| $\psi_5$ | $n/2$ | 7.88 | 0.0419 | 68 | 7.59 | 0.0055 | 72 | 10.14 | 0.0025 | 91 | 12.77 | 0.0025 | 109 |
| | $n/4$ | 10.26 | 0.1895 | 132 | 11.09 | 0.0117 | 156 | 10.51 | 0.0041 | 169 | 12.83 | 0.0034 | 193 |
| $\psi_6$ | $n/2$ | 8.95 | 0.0360 | 96 | 7.94 | 0.0063 | 84 | 9.69 | 0.0027 | 99 | 12.57 | 0.0023 | 121 |
| | $n/4$ | 9.17 | 0.0926 | 166 | 9.16 | 0.0128 | 173 | 8.98 | 0.0038 | 174 | 11.75 | 0.0030 | 203 |

**Table 4** Comparisons of five smoothing functions for different $\mu$ when $n = 5000$, noisy case (column triples give CPU, Re, It for $\mu = 1.0\mathrm{e}{-2}, 1.0\mathrm{e}{-3}, 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}$, left to right)

| F | $m$ | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\psi_2$ | $n/2$ | 8.62 | 0.0159 | 89 | 8.77 | 0.0072 | 88 | 11.82 | 0.0026 | 100 | 15.80 | 0.0025 | 143 |
| | $n/4$ | 11.67 | 0.0379 | 174 | 10.40 | 0.0063 | 180 | 10.26 | 0.0033 | 188 | 15.03 | 0.0033 | 242 |
| $\psi_3$ | $n/2$ | 8.54 | 0.0548 | 100 | 7.86 | 0.0085 | 88 | 8.75 | 0.0033 | 92 | 10.38 | 0.0025 | 109 |
| | $n/4$ | 7.43 | 0.1506 | 169 | 8.43 | 0.0185 | 169 | 7.78 | 0.0045 | 177 | 9.82 | 0.0030 | 203 |
| $\psi_4$ | $n/2$ | 6.76 | 0.0491 | 64 | 8.11 | 0.0058 | 76 | 11.40 | 0.0028 | 95 | 11.93 | 0.0026 | 105 |
| | $n/4$ | 11.10 | 0.1439 | 154 | 8.84 | 0.0137 | 152 | 9.67 | 0.0035 | 175 | 11.52 | 0.0035 | 184 |
| $\psi_5$ | $n/2$ | 7.36 | 0.0542 | 65 | 7.82 | 0.0059 | 75 | 9.57 | 0.0031 | 87 | 11.33 | 0.0025 | 104 |
| | $n/4$ | 10.19 | 0.1492 | 143 | 10.00 | 0.0123 | 154 | 10.48 | 0.0039 | 177 | 11.37 | 0.0034 | 191 |
| $\psi_6$ | $n/2$ | 8.54 | 0.0355 | 94 | 8.78 | 0.0061 | 92 | 9.05 | 0.0029 | 96 | 11.57 | 0.0026 | 119 |
| | $n/4$ | 7.77 | 0.0908 | 160 | 9.15 | 0.0122 | 180 | 8.11 | 0.0040 | 167 | 10.50 | 0.0030 | 207 |

**Fig. 1** The convergence behavior: CPU time versus signal size ($n$) when $\mu_{k+1} = 0.4\mu_k$

**Fig. 2** The convergence behavior: iterations versus signal size ($n$) when $\mu_{k+1} = 0.4\mu_k$

**Fig. 3** The convergence behavior: CPU time versus $-\log\mu$ with $n = 5000$

**Fig. 4** The convergence behavior: iterations versus $-\log\mu$ with $n = 5000$

**Fig. 5** The convergence behavior: relative error versus $-\log\mu$ with $n = 5000$

where "$\ge$" means "performs better than" and "$\approx$" means there is no difference in numerical performance.

(3) According to Tables 3, 4 and Figs. 3, 4, 5, the smaller the parameter $\mu$, the better the signal recovery, and the more CPU time and iterations have to be spent accordingly. For any fixed $\mu$, although the function $\psi_2$ sometimes takes a bit more CPU time and iterations, it has a lower relative error. In view of this, we conclude that the function $\psi_2$ works best with the proposed algorithm. From Fig. 5, the function $\psi_6$ also has a lower relative error, which means $\psi_6$ is a possible good choice as well. To sum up, in view of relative error, we have

$$\psi_2 \ge \psi_4 \approx \psi_5 \approx \psi_6 > \psi_3.$$

*Experiment 2*

Next, we compare our method with "NESTA", a well-known, fast, open-source package proposed in [2]. In the noisy case, $b = Ax + e$, where $e$ is Gaussian noise with zero mean and variance $\sigma^2$. In all of our simulations, $A \in \mathbb{R}^{m \times n}$ is assumed to be a Gaussian matrix, where $n = 2^{13}$ and $m = \frac{n}{4}$. We select a $K$-sparse original signal $x^* \in \mathbb{R}^n$ with $K = \frac{n}{40}$, whose nonzero components follow the normal distribution $N(0, 1)$.

In our test, we set

$$x^0 = A^T b, \quad \lambda = 0.0048 \cdot \|A^T b\|_\infty, \quad \rho = 0.5, \quad \delta = 0.002, \quad \gamma = 0.2, \quad \sigma^2 = 0.0001.$$

**Table 5** Comparisons of five smoothing functions and NESTA for different tol in the noise-free case (column triples give CPU, Re, It for tol $= 1.0\mathrm{e}{-4}, 1.0\mathrm{e}{-5}, 1.0\mathrm{e}{-6}, 1.0\mathrm{e}{-7}, 1.0\mathrm{e}{-8}$, left to right)

| F | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It | CPU | Re | It |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\psi_2$ | 23.77 | 9.62E-04 | 150 | 28.30 | 1.98E-05 | 177 | 35.65 | 2.11E-05 | 213 | 46.62 | 1.91E-05 | 256 | 59.97 | 1.58E-05 | 342 |
| $\psi_3$ | 21.82 | 6.37E-03 | 142 | 24.55 | 3.03E-05 | 171 | 30.88 | 2.29E-05 | 206 | 36.54 | 2.01E-05 | 241 | 40.39 | 1.76E-05 | 286 |
| $\psi_4$ | 26.64 | 7.00E-03 | 146 | 27.03 | 4.32E-05 | 169 | 36.79 | 2.21E-05 | 215 | 45.12 | 2.04E-05 | 258 | 59.34 | 1.76E-05 | 338 |
| $\psi_5$ | 26.32 | 2.21E-03 | 157 | 28.04 | 2.77E-05 | 174 | 34.73 | 2.12E-05 | 209 | 45.38 | 2.14E-05 | 256 | 49.89 | 1.88E-05 | 296 |
| $\psi_6$ | 19.53 | 4.82E-03 | 138 | 24.83 | 3.01E-05 | 172 | 32.41 | 2.34E-05 | 214 | 38.80 | 2.06E-05 | 245 | 50.09 | 1.68E-05 | 301 |
| NESTA | 7.96 | 2.28E-02 | 555 | 8.30 | 2.95E-02 | 568 | 17.12 | 3.26E-04 | 1134 | 15.98 | 1.82E-05 | 1060 | 44.50 | 1.74E-05 | 2999 |