Smoothing strategy along with conjugate gradient algorithm for signal reconstruction

(1)

Smoothing strategy along with conjugate gradient algorithm for signal reconstruction

Caiying Wu

^∗1

, Jing Wang

^∗1

, Jan Harold Alcantara

^†2,3

, and Jein-Shan Chen

^‡†2

1College of Mathematics Science, Inner Mongolia University, Hohhot 010021, China

2Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan

3Mathematics and Statistics Department, De La Salle University, Manila 1004, Philippines

February 1, 2021

Abstract. In this paper, we propose a new smoothing strategy along with conjugate gradient algorithm for the signal reconstruction problem. Theoretically, the proposed conjugate gradient algorithm along with the smoothing functions for the absolute value function is shown to possess some nice properties which guarantee global convergence. Numerical experiments and comparisons suggest that the proposed algorithm is an efficient approach for sparse recovery. Moreover, we demonstrate that the approach has some advantages over some existing solvers for the signal reconstruction problem.

Keywords. lp-norm regularization, signal recovery, conjugate gradient algorithm, sparse solution.

Mathematics Subject Classification (2000)90C33

∗The author’s research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016)

†The author’s research is supported by Ministry of Science and Technology, Taiwan

‡Corresponding author. E-mail: [email protected].

1

(2)

1 Introduction

The target problem of this paper is the signal reconstruction problem, which has wide applications in compressive sensing [6, 7, 9, 16]. Tremendous amount of articles related to this topic can be found in the literature, and hence we do not intend to repeat its importance and various applications here. We shall directly look into its mathematical model and show our idea for tackling it. Mathematically, the signal reconstruction problem model is described as follows:

min kxk0

s.t. b = Ax, (1)

where x∈ IRⁿis the original sparse signal that needs to be recovered, A∈ IR^m^×n(m n) is the measurement matrix, b ∈ IR^m is the observed vector, and kxk⁰ represents the l0- norm of x, which is defined as the number of nonzero components of x. Note that b could also be contaminated with noise, that is, b = Ax + e, where e∈ IR^m denotes the noise.

Unfortunately, the l₀-norm minimization problem (1) is an NP-hard combinatorial optimization problem [24]. In order to avoid this difficulty, one popular approach is to replace l0-norm by l1-norm in model (1) and obtain the following l1-minimization problem:

min kxk1

s.t. b = Ax, (2)

where kxk1 denotes the l₁-norm of x andkxk1 =Pn

i=1|xi|. More specifically, under the Restricted Isometry Property (RIP), the l1-minimization (2) was shown to possess the same solution as l0-minimization (1). Please refer to [1, 4, 5, 7, 8, 10, 11, 16] for more details and survey.

Related to (2) are two regularized unconstrained minimization problems. The first one is given by

Model (I) : min

x∈IRⁿλkxk¹+1

2kAx − bk²2.

For instance, the studies in [7, 16, 21, 18] follow model (I). The second one is Model (II) : min

x∈IRⁿλ1kxk¹+ λ2kxk²2+1

2kAx − bk²2, which is the subject of investigation done in [35].

An alternative approach to compute the sparse solutions of l0-minimization (1) is proposed by Gribnoval and Nielsen in [19], which is given by the constrained model

min kxk^pp

s.t. Ax = b, (3)

(3)

where 0 < p < 1 and kxk^pp =

n

X

j=1

|x^j|^p. The problem (3) is called lp-minimization, which is motivated by the special property of lp-quasi-norm:

plim→0⁺kxk^pp =kxk0.

Similarly, the lp-minimization problem (3) is NP-hard [12]. In order to deal with this problem, three more regularized unconstrained minimization models are studied in the literature. The first one is

Model (III) : min

x∈IRⁿλkxk^pp+1

2kAx − bk²2, (4)

where λ > 0 is a parametric factor. The studies in [14, 17, 22, 26, 33] are based on the model (III). The second one is

Model (IV) : min

x∈IRⁿλ1kxk^pp+ λ2kxk²2+ 1

2kAx − bk²2,

where λ1, λ2 > 0 are two parameters. The model (IV) is studied in [34]. The third model based on (3) is

Model (V) : min

x∈IRⁿλ1kxk^pp+ λ2kxk¹+1

2kAx − bk²2,

where λ1, λ2 > 0 are two parameters. The model (V) is recently investigated in [30]. To sum up, we present in Figure 1 all the five models (I)-(V) together.

There are many optimization algorithms that can be applied to solve the above models (I)-(V). For the sake of simplicity and lower storage requirements, conjugate gradient algorithms are suitable for large scale problems. In this paper, like [27, 29, 32, 36], we are interested in applying the conjugate gradient method for signal recovery. To employ the conjugate gradient algorithm, we need to check the differentiability of the objective function that we aim to minimize. However, we observe that there is a common feature among all the aforementioned models (I)-(V). Not only they are regularized unconstrained min- imizations, but also all the objective functions are nonsmooth. The key part causing this is the non-differentiability of the absolute value function | · | inside l¹-norm and lp-norm.

In view of this feature, we shall consider smoothing strategy to approximate the absolute value function | · |. Albeit the idea is not new, we propose new smoothing function and compare different kinds of smoothing approaches along with conjugate gradient algorithm, which is the main contribution of this paper.

Recently, there are six smoothing functions studied in [25] to approximate the absolute value function|·|. Inspired by this article, we adapt them to work along with our proposed

(4)

min kxk⁰ s.t. Ax = b

min kxk^pp

s.t. Ax = b

(V) min

x∈IRⁿλ1kxk^pp+ λ2kxk1+ 1

2kAx − bk²2

(IV) min

x∈IRⁿλ1kxk^pp+ λ2kxk²2+1

2kAx − bk²2

(III) min

x∈IRⁿλkxk^pp+ 1

2kAx − bk²2

min kxk1

s.t. Ax = b

(II) min

x∈IRⁿλ₁kxk1+ λ₂kxk²2+ 1

2kAx − bk²2

(I) min

x∈IRⁿλkxk1 +1

2kAx − bk²2

Figure 1: Five Models for Sparse Reconstruction

conjugate gradient which will be recalled in the next section. First, we present them out as below.

ϕ₁(µ, t) = µ[ln(1 + e⁻^µ^t) + ln(1 + e^µ^t)],

ϕ2(µ, t) =







t if t≥ ^µ₂,

t²

µ + ^µ₄ if −^µ₂ < t < ^µ₂,

−t if t≤ −^µ₂, ϕ3(µ, t) =p4µ²+ t²,

ϕ4(µ, t) = ( t²

2µ if |t| ≤ µ,

|t| − ^µ₂ if |t| > µ,

ϕ5(µ, t) =







t if t > µ,

−_8µ^t⁴³ +^3t_4µ² + ^3µ₈ if −µ ≤ t ≤ µ,

−t if t < −µ,

ϕ₆(µ, t) = t erf

t

√2µ

+r 2

πµe⁻^2µ2^t2 .

(5)

where the error function is defined as follows:

erf(t) = 2

√π Z t

0

e^−u²du, ∀t ∈ IR.

With these smoothing functions, there are several smooth models that can be considered, which are given as below.

[SF(I)] min

x∈IRⁿλ

n

X

i=1

ϕj(µ, xi) + 1

2kAx − bk²2, [SF(II)] min

x∈IRⁿλ

n

X

i=1

ϕj(µ, xi) + λ2kxk²2+1

2kAx − bk²2, [SF(III)] min

x∈IRⁿλ

n

X

i=1

ϕ^p_j(µ, xi) + 1

2kAx − bk²2, [SF(IV)] min

x∈IRⁿλ1 n

X

i=1

ϕ^p_j(µ, xi) + λ2kxk²2+ 1

2kAx − bk²2, [SF(V)] min

x∈IRⁿλ1 n

X

i=1

ϕ^p_j(µ, xi) + λ2 n

X

i=1

ϕj(µ, xi) + 1

2kAx − bk²2,

Here, ϕj(µ, xi)≈ |xⁱ| for j = 1, 2, · · · , 6. Moreover, “[SF(I)]” stands for smoothing function for model (I). Likewise, the others represent similar meanings.

Now, we are ready to present our model. Theoretically, we only focus on model (III) given as in (4). In particular, our main idea is inspired by the re-weighted techniques used in [22, 23, 34]. To proceed, notice that we can express the objective function in model (III) as

λkxk^pp+ 1

2kAx − bk²2 = λ

n

X

i=1

|xi|^p+1

2kAx − bk²2 = λ

n

X

i=1

|xi|^p−1· |xi| + 1

2kAx − bk²2. In light of this expression, for j = 1, 2,· · · , 6, we construct the below smoothing function

Hj(x) := λ

n

X

i=1

(xi2+ µ²)^p−1² · ϕ^j(µ, xi) + 1

2kAx − bk²2, (5) where µ > 0 and µ > 0 are two parameters. Indeed, the weight (xi2 + µ²)^p−1² for i = 1, 2,· · · , n in (5) is regarded as a continuous variable, which is slightly different from that used in the literature. Then, we have a new smoothing strategy for model (III):

[ReSF(III)] min

x∈IRⁿλ

n

X

i=1

(xi2+ µ²)^p−1² · ϕ^j(µ, xi) + 1

2kAx − bk²2. (6)

(6)

“[ReSF(III)]” stands for re-weighted smoothing function for model (III). We apply a version of the conjugate gradient algorithm presented in the next section to the smooth model (6).

The remaining parts of this paper are organized as follows. In Section 2, we shall introduce a version of nonlinear conjugate gradient method to solve the model (III) by using (5) and (6). We prove the boundedness of the level sets of the objective function, as well as the Lipschitz continuity of its gradient. On the basis of these properties, we provide a theorem that shows the global convergence of the algorithm. In Section 3, we report some experimental results to demonstrate the efficiency of our proposed method.

Numerical comparisons with other popular algorithms are shown as well, which reveal that our new model has a satisfactory numerical performance, which strongly suggests that it is a good choice for the target problem. We discuss our conclusions in Section 4.

2 Algorithm and convergence analysis

In this section, we first describe the nonlinear conjugate gradient (CG) algorithm, which was proposed by Chen and Zhou and for image restoration in [13]. Here, we employ it for signal recovery. Then, we discuss the boundedness property of the level set, the Lipschitz continuity property of the objective function’s gradient and the global convergence. There are many versions of CG algorithms, and this one can be viewed as a version of the CG combined with limited memory BFGS method.

Algorithm 1

Step 0. Choose initial point x⁰ ∈ IRⁿ, ε0 > 0, µ > 0, µ > 0, r ≥ 0, δ ∈ (0, 1) and ρ∈ (0, 1). Set k = 0.

Step 1. Compute the search direction

d^k = −∇H^j(x^k) if k = 0,

−∇Hj(x^k) + βkd^k−1+ θkz^k−1 if k > 0, where

βk = ∇^TH_j(x^k)z^k⁻¹

(d^k⁻¹)^Tz^k⁻¹ − 2kz^k⁻¹k²∇^TH_j(x^k)d^k⁻¹ ((d^k⁻¹)^Tz^k⁻¹)² , θk = ∇^THj(x^k)d^k⁻¹

(d^k⁻¹)^Tz^k⁻¹ , z^k−1 = y^k−1+ tks^k−1,

tk = ε0

∇Hj(x^k)

r+ max

0,−(s^k−1)^Ty^k−1 ks^k⁻¹k²

,

(7)

with

y^k⁻¹ =∇Hj(x^k)− ∇Hj(x^k⁻¹), s^k⁻¹ = x^k− x^k⁻¹ = αk−1d^k⁻¹. Step 2. Compute αk = max{ρ⁰, ρ¹,· · · } satisfying

Hj(x^k+ ρⁱd^k)≤ Hj(x^k) + δρⁱ ∇Hj(x^k)T

d^k. Step 3. Set x^k+1 = x^k+ αkd^k.

Step 4. Set k = k + 1 and go to Step 1.

Lemma 2.1. Let Hj be defined as in (5). For any µ > 0 and µ > 0, the level set L(x⁰) = {x ∈ IRⁿ| Hj(x)≤ Hj(x⁰)} is bounded.

Proof. From the structure of the function ϕj(µ, t), it is not hard to verify that

tlim→∞(t²+ ¯µ²)^p−1² ϕj(µ, t) = +∞.

Hence, for any µ > 0, we obtain

kxk→∞lim Hj(x) = +∞. (7)

Assume thatL(x⁰) is unbounded. Then, there exists an index set K1, such thatkx^kk →

∞, as k → ∞ and k ∈ K1. By applying (7), it yields

Hj(x^k)→ +∞, as k → ∞ and k ∈ K1,

which contradicts Hj(x^k)≤ Hj(x⁰). Thus, the level setL(x⁰) is bounded. 2

Noting that the function (t² + ¯µ²)^p−1² is continuously differentiable of arbitrary order while the smoothing functions ϕ_j(µ, t) have Lipschitz continuous gradient, then we conclude that the first term of Hj given by (5) has Lipschitz continuous gradient over bounded sets. In particular, since the level setL(x⁰) is bounded by Lemma 2.1, it follows that the first term of Hj has Lipschitz continuous gradient overL(x⁰). Since the gradient of ¹₂kAx − bk²2 is also Lipschitz continuous over IRⁿ, then we get the following result.

Lemma 2.2. The function Hj defined in (5) has Lipschitz continuous gradient, that is, there exists a constant L > 0, such that

k∇Hj(x)− ∇Hj(y)k ≤ Lkx − yk, ∀ x, y ∈ L(x⁰). (8)

(8)

Lemma 2.3. Let Hj be defined as in (5) and {x^k}, {d^k} be generated by Algorithm 1.

Then, there holds

∇H^j(x^k)T

d^k≤ −1

2k∇H^j(x^k)k². (9)

Proof. With Lemma 2.1 and Lemma 2.2, the proof is exactly the same as that in [13, Lemma 2.2]. We omit it. 2

Theorem 2.1. Let Hj be defined as in (5) and {x^k}, {d^k} be generated by Algorithm 1.

Then, we have

lim inf

k→∞

∇H^j(x^k)

= 0, (10)

for j = 1, 2,· · · , 6.

Proof. Suppose that the conclusion (10) is not true. Then, there exists a constant r₁ > 0 such that

∇H^j(x^k)

≥ r¹, ∀k, j = 1, 2, · · · , 6.

Applying Lemma 2.3, it implies that {Hj(x^k)} is decreasing, which implies x^k ∈ L(x⁰).

From Lemma 2.1, the level set L(x⁰) is bounded. Thus, {Hj(x^k)} is convergent. In addition, from the Armijo line search in Step 2, we know

klim→∞αk ∇H^j(x^k)T

d^k= 0. (11)

On the other hand, applying (9) in Lemma 2.3 gives α_k ∇Hj(x^k)T

d^k ≤ −1

2α_kk∇Hj(x^k)k² ≤ −r²₁

2α_k < 0. (12) Putting (11) and (12) together yields

klim→∞α_k = 0.

Now, by [13, Lemma 2.3], there exists a constant ε > 0 such that kd^kk ≤ εk∇H^j(x^k)k, ∀k ≥ 0.

Since L(x⁰) is bounded, there is a constant r > 0 such that k∇Hj(x^k)k ≤ r, ∀k ≥ 0.

Thus, there holds kd^kk ≤ εr, and hence

klim→∞αkd^k= 0.

(9)

Note that from the Armijo line search, we have Hj(x^k)− Hj(x^k+^α_ρ^kd^k)

αk/ρ <−δ ∇H^j(x^k)T

d^k. (13)

Since {x^k} and {d^k} are bounded, there exist subsequences {x^k}k∈K and {d^k}k∈K such that

x^k → x and d^k → d, as k ∈ K, k → ∞.

Then, taking the limit on both sides of (13) with k ∈ K leads to

− (∇Hj(x))^T d≤ −δ (∇Hj(x))^T d =⇒ (∇Hj(x))^T d≥ 0, which contradicts Lemma 2.3. Therefore, the proof is complete. 2

3 Numerical Experiments

In this section, we report the results of our experiments demonstrating the applicability, efficiency and merits of our algorithm. All tests are carried out on a PC (3.20GHz, 32GB of RAM) with the use of Matlab R2020a.

The parameters of our algorithm are summarized as follows:

x⁰ = A^Tb, λ = λ0 = 0.001kA^Tbk∞, ρ = 0.2, δ = 0.3, ε0 = 10⁻¹⁰, r = 0, µ = 10⁻³.

For the smoothing parameter µ, we apply a continuation approach which has been widely used in the literature (see [3] for instance) in order to improve the quality of solution obtained. First, we set an initial value of the smoothing parameter as µ0 = 10⁻³ and use Algorithm 1 to solve (6). When running Algorithm 1, we used the stopping criterion implemented in NESTA [3] which is based on the relative variation of the objective function. In particular, we stop Algorithm 1 when

|H^j(x^k+1)− H^j(x^k)|

|Hj(x^k+1)| < 10⁻⁵, j = 1, 2, 3, 4, 5, 6. (14) For higher accuracy of the solution, a lower threshold value for the relative variation can be used. Since we are applying a continuation scheme, we reduce the value of µ then im- plement again Algorithm 1 as above with stopping criterion given by (14). In particular, for k≥ 0, we set µ^k+1 = 0.1µk. In addition to this, we also reduce the regularization parameter λ in each continuation step according to the formula λk+1 = 0.1λkfor k ≥ 0. This procedure is inspired by the approach used in fixed-point continuation (FPC) method

(10)

[20]. We repeat the continuation procedure until we obtain a solution within a desired relative error, or until we have reached a pre-specified maximum number of continuation steps. In our simulations, we set the maximum number of continuation steps to 4 which we found to be enough to obtain a good approximate solution for the problems we have considered.

3.1 Comparison of smoothing functions and choosing p

Meanwhile, the success and performance of our algorithm are indeed dependent on the parameter p and the choice of smoothing function. Thus, we explore these issues first before we present our comparisons with other algorithms.

Experiment 1. In this experiment, we determine which values of p will result to an algorithm which has a good frequency of success and fast convergence time. We let A∈ IR^m^×n be a Gaussian matrix with n = 2¹⁵= 32768, m = ⁿ₂, x^∗ ∈ IRⁿ is a K−sparse original signal with K =b₄₀ⁿc, whose nonzero components are sampled from N (0, 1). We consider two cases where b = Ax^∗ (noiseless case) and b = Ax^∗+ e where e is a Gaussian noise with zero mean and σ = 0.01.

For each test problem, the result is reported by running the algorithm 10 times and taking the average CPU time. Note that just like the numerical experiment in [29], the smoothing function ϕ1 is very unstable, so we do not include it in the numerical results.

In the sequel, the frequency of successful reconstructions means the percentage of all test instances that reach the required relative error which we set to 10⁻³, i.e.

kx − x^∗k

kx^∗k ≤ 10⁻³

where x is the solution obtained by the algorithm. The summary of the results are presented in Tables 1 and 2.

The summary of results for the noiseless and noisy cases are presented in Table 1 and Table 2, respectively. From these, we see that the algorithm has a very low frequency of success when a small value of p is used, particularly when p = 0.3. The frequency of success improves as the value of p increases. In terms of CPU time of convergence, we observe the same pattern that a larger value of p provides faster convergence. In particular, we obtain the fastest convergence time when p = 0.9. We can also see that the smoothing function ϕ3 with p = 0.9 has the optimal performance among all the smoothing functions and values of p considered. We confirm that this is the case for different dimensions in the next experiment.

Experiment 2. We compare the performance of different smoothing functions with p = 0.9, which is the optimal value of p found in the previous experiment. We generate A, b

(11)

Table 1: Average CPU time and frequency of success for different values of p and different smoothing functions ϕ_i. The table shows the results for the noiseless case where n = 32768, m = 16384 and K =bn/40c and the entries of A are sampled from N (0, 1).

p

Smoothing Functions

ϕ₂ ϕ₃ ϕ₄ ϕ₅ ϕ₆

CPU Success CPU Success CPU Success CPU Success CPU Success

Time Rate Time Rate Time Rate Time Rate Time Rate

0.3 - 0 492.96 0.7 - 0 1440.03 0.1 - 0

0.5 550.40 0.2 213.78 1 547.32 0.4 526.04 0.3 271.77 1

0.7 285.47 1 192.48 1 237.29 0.9 252.07 1 209.16 1

0.9 187.35 1 143.85 1 167.48 1 179.99 1 156.40 1

Table 2: Average CPU time and frequency of success for different values of p and different smoothing functions ϕ_i. The table shows the results for the noisy case where n = 32768, m = 16384 and K =bn/40c and the entries of A are sampled from N (0, 1).

p

Smoothing Functions

ϕ₂ ϕ₃ ϕ₄ ϕ₅ ϕ₆

CPU Success CPU Success CPU Success CPU Success CPU Success

Time Rate Time Rate Time Rate Time Rate Time Rate

0.3 - 0 495.181 0.8 - 0 - 0 - 0

0.5 586.03 0.4 211.681 1 525.22 0.5 527.27 0.3 296.24 0.9

0.7 285.53 0.9 197.75 1 237.70 1 256.40 1 186.95 1

0.9 187.01 1 144.03 1 167.70 1 178.19 1 155.81 1

(12)

and x^∗as in Experiment 1 but we consider different dimensions n∈ {2¹¹, 2¹², 2¹³, 2¹⁴, 2¹⁵, 2¹⁶} and let m = n/2 and K = bn/40c. The results are shown in Table 3. With p = 0.9, all of the smoothing functions were effective in solving the problem as we obtained 100%

success rate from 10 independent simulations. Moreover, from Table 3, we see that the function ϕ3 results to a more efficient algorithm both in the noiseless and noisy case.

Table 3: Average CPU time of the algorithm using different smoothing functions with p = 0.9. The frequency of success in all cases is 1. For each n, we set m = n/2 and K =bn/40c. The entries of A are sampled from N (0, 1).

n Noiseless Case Noisy Case

ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6

2048 0.32 0.27 0.29 0.35 0.31 0.33 0.29 0.30 0.37 0.31

4096 2.46 2.02 2.19 2.36 2.28 2.49 2.03 2.22 2.36 2.29

8192 10.84 8.12 9.37 10.18 8.80 10.45 8.01 9.28 10.09 8.67 16384 47.94 34.78 41.66 43.54 36.81 49.25 34.24 43.66 47.31 37.93 32768 183.71 144.51 167.88 177.72 154.49 187.21 148.41 168.69 178.98 158.69 65536 747.38 529.53 611.02 623.19 636.89 740.50 524.09 592.11 624.18 631.07

3.2 Comparisons with other solvers for sparse recovery prob- lems

With the findings from the previous section, we now compare our algorithm with ϕ₃ and p = 0.9 to other famous algorithms in the literature. We consider popular solvers FISTA [2], NESTA [3], and FPC-BB [20] which are designed to solve the convex model (I). We also look at comparisons with those algorithms based on the nonconvex model (III) with p = 0.5 including HALF [31], IRUcLq-v [23] and alternating direction method of multipliers (ADMM) [28].

Experiment 3. We consider the same A, b and x^∗ as in Experiment 1 and compare our algorithm with FISTA, NESTA, FPC-BB, HALF, IRUcLq-v and ADMM. For this experiment, we also consider the two cases where b is noiseless or corrupted with noise.

In addition, we explore the capability of the solvers to recover signals of different sparsity levels, namely K ∈ {b0.05mc, b0.1mc, b0.2mc, b0.3mc}. The results are summarized in Tables 4 and 5.

We observe from both the noiseless and noisy cases that FPC-BB is the fastest solver for the case when K =b0.05mc, K = b0.1mc and K = b0.2mc with 100% success rate.

For these sparsity levels, the second fastest solver is our CG Algorithm with the function

(13)

ϕ3 and p = 0.9, followed by the HALF and FISTA algorithms. On the other hand, the solvers IRUcLq-v and ADMM were able to solve all these test problems but with very high computation time compared with other solvers.

While FPC-BB is the fastest for the aforementioned sparsity levels, it failed to solve all the problems for the highest signal density level considered which is K =b0.3mc. For this case, CG Algorithm has the best performance in terms of CPU time and frequency of success among all the solvers considered. On the other hand, FISTA, HALF and IRUcLq-v algorithms were able to solve all the problems as well but the computation times required are larger, especially for IRUcLq-v. We point out that the methods IRUcLq-v and ADMM are the slowest solvers in all instances which is expected since these algorithms require the inversion of an n× n matrix, which is extremely costly for high dimensional problems.

To summarize, this experiment reveals that our method is efficient in handling different sparsity levels. In particular, it can recover signals which are dense (i.e. not strictly sparse) which is not achieved by FPC-BB, NESTA and ADMM.

Table 4: Average CPU time and frequency of success of different solvers for different sparsity levels with n = 32768 and m = 16384. The table shows the results for the noiseless case from 10 independent trials. The entries of A are sampled from N (0, 1).

Solver

Sparsity level (K)

b0.05mc b0.1mc b0.2mc b0.3mc

CPU Success CPU Success CPU Success CPU Success

Time Rate Time Rate Time Rate Time Rate

CG Algorithm 157.57 1 184.18 1 226.86 1 299.70 1

FISTA 259.09 1 283.90 1 308.37 1 345.23 1

NESTA 420.54 1 - 0 - 0 - 0

FPC-BB 106.97 1 121.15 1 175.65 1 - 0

HALF 187.59 1 224.99 1 335.86 1 543.17 1

IRUcLq-v 457.92 1 554.29 1 716.53 1 929.47 1

ADMM 578.85 1 665.60 1 967.74 1 - 0

Experiment 4. In this experiment, we look at the measurement matrix A ∈ IR^m^×n considered in [15, 23] where the entries are sampled fromN (0,_m¹) and we let n = 32768, m = n/4 = 8192. The signal x^∗ and the vector b are generated as in the previous experiments.

The results are shown in Table 6. It is evident that our method is very efficient and effective in solving the problems for different sparsity levels. In fact, it is the fastest

(14)

Table 5: Average CPU time and frequency of success of different solvers for different sparsity levels with n = 32768 and m = 16384. The table shows the results for the noisy case from 10 independent trials. The entries of A are sampled from N (0, 1).

Solver

Sparsity level (K)

CG Algorithm 159.44 1 183.83 1 227.61 1 294.96 1

FISTA 272.22 1 279.38 1 304.02 1 347.24 1

NESTA 422.98 1 - 0 - 0 - 0

FPC-BB 117.50 1 116.31 1 174.97 1 - 0

HALF 198.74 1 220.18 1 333.72 1 541.36 1

IRUcLq-v 454.91 1 642.26 1 704.10 1 912.16 1

ADMM 572.38 1 688.97 1 970.93 1 825.04 0.1

solver among all the algorithms considered followed by the FPC-BB and HALF algorithm.

Notice, however, that the time required by the HALF method to solve the problem is almost three times longer than the time needed by our algorithm. With this, we see that the CG algorithm is very efficient in handling the type of problem considered. In terms of the capability of the algorithms to solve the problem, it can also be seen that our algorithm (as well as HALF) is dominant as it was able to solve all the problems for all sparsity levels considered. On the other hand, FPC-BB algorithm failed to solve all the problems for sparsity levels b0.2mc and b0.3mc.

To close this section, we summarize our numerical findings:

• The CG algorithm with the smoothing function ϕ3 and p = 0.9 provides the best convergence time and success rate among all smoothing functions and values of p considered.

• Even though FPC-BB is usually the fastest solver among all, the difference between FPC-BB and our algorithm’s CPU time of convergence is not large (see Experiment 3). Moreover, our algorithm obtains a good success rate for problems with relatively denser signals which is not achieved by FPC-BB. In this respect, our algorithm has a major advantage.

• Taking into account both the CPU time of convergence and the success rate of the algorithms in handling different problems, we believe that our algorithm provides a better performance among all algorithms considered as it can solve more problems at a reasonable computation time.

(15)

Table 6: Average CPU time and frequency of success of different solvers for different sparsity levels with n = 32768 and m = 8192. The table shows the results for the noiseless case from 10 independent trials. The entries of the sensing matrix A are generated from N (0, 1/m)

Solver

Sparsity level (K)

CG Algorithm 24.66 1 33.00 1 51.29 1 96.44 1

FISTA 145.42 1 155.23 1 180.12 1 - 0

NESTA - 0 - 0 - 0 - 0

FPC-BB 56.95 1 70.67 1 - 0 - 0

HALF 76.27 1 88.52 1 127.90 1 251.64 1

IRUcLq-v 469.05 1 616.98 1 978.63 1 1535.11 1

ADMM 585.93 1 760.49 1 806.68 0.1 - 0

• While FPC-BB is often the fastest solver among other algorithms considered in this paper, we note that there are some problems where our method provides way better convergence time. We have shown this in Experiment 4 where we see that not only FPC-BB requires more than twice the time to obtain an approximate solution, but it also cannot solve problems with high density signals. On the other hand, the convergence times of other algorithms are much slower compared to our algorithm.

4 Conclusion

In this paper, we formulated a new model (6) which employs lp-norm along with a special smoothing function that is motivated by re-weighted techniques. In general, the smoothing strategy along with a version of conjugate gradient is the main focus and contribution of this paper, both theoretically and and numerically. Not only we show global convergence, but also from various experiments and comparisons we have strong numerical evidence to verify our proposed algorithm is a good choice for tackling sparse signal reconstruction problem. In particular, our method can efficiently solve the sparse recovery problem even when the dimension of the problem is relatively high. Moreover, we have also demonstrated that dense signals can be recovered by our algorithm, which is not achievable by other algorithms. A future work that is worth considering is the theoretical and numerical study of the other smoothing models (SFI)-(SFV) along with the conjugate gradient algorithm, and numerical comparisons of these models for dealing

(16)

with the sparse recovery problem.

Acknowledgements

The authors would like to thank the anonymous referees for their valuable comments and suggestions which have significantly improved the original version of the paper.

Declarations

Funding

C. W. and J. W. are supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016). J. H. A. and J.-S. C. are supported by the Ministry of Science and Technology, Taiwan.

Conflict of Interest

The authors declare that they have no conflict of interest.

References

[1] R. Baraniuk, Compressive sensing, IEEE Signal Processing and Magazine, vol. 24, pp. 118-121, 2007.

[2] A. Beck and M. Teboulle, A fast iterative shrinkage thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.

[3] S. Becker, J. Bobin and E. Cand`es, NESTA: A fast and accurate first-order method for sparse recovery, SIAM J. Imaging Sciences, vol. 4, no. 1, pp. 1–39, 2011.

[4] A.M. Bruckstein, D.L. Donoho, and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review, vol. 51, pp. 34-81, 2009.

[5] E.J. Cand`es, The Restricted Isometry Property and its Implications for Compressed Sensing, Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589-592, 2008.

[6] E. J. Cand`es and P. A. Randall, Highly robust error correction by convex programming, IEEE Transactions on Information Theory, vol. 54, pp. 2829-2840, 2008.

(17)

[7] E. J. Cand`es, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol. 52, pp. 489-509, 2006.

[8] E.J. Cand`es, J.K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, 2006.

[9] E.J. Cand`es and T. Tao, Near optimal signal recovery from random projections:

universal encoding strategies, IEEE Transactions on Information Theory, vol. 52, pp.

5406-5425, 2004.

[10] E.J. Cand`es and T. Tao, Decoding by Linear Programming, IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, 2005.

[11] E.J. Cand`es and T. Tao, Near optimal signal recovery from random projections:

universal encoding strategies? IEEE Transactions on Information Theory, vol. 52, no.

12, pp. 5406-5425, 2006.

[12] X. Chen, Y. Ye, Z. Wang, and D. Ge, Complexity of unconstrained L2− Lp

minimization, Mathematical Programming, vol. 143, pp. 371-383, 2014.

[13] X. Chen and W. Zhou , Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 765-790, 2010.

[14] X. Chen and W. Zhou, Convergence of the reweighted l1 minimization algorithm for l₂− lp minimization, Computational Optimization and Applications, vol. 59, pp.

47-61, 2014.

[15] I. Daubechies, R. DeVore, M. Fornasier and S. G¨unt¨uk, Iteratively reweighted least squares minimization for sparse recovery, Communications on Pure and Applied Mathematics, vol. 63, pp. 1–38, 2010.

[16] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol.

52, no. 4, pp. 1289-1306, 2006.

[17] J. Friedman, T. Hastie and R. Tibshirani. A note on the group lasso and a sparse group lasso. Stat. Theory (math.ST). Submitted on 5 Jan 2010.

[18] D. Ge, X. Jiang, and Y. Ye, A note on the complexity of Lp minimization, Mathematical Programming, vol. 129, pp. 285-299, 2011.

[19] R. Gribnoval and M. Nielsen, Sparse decompositions in unions of bases, IEEE Transactions on Information Theory, vol. 49 (2003), pp. 3320-3325.

(18)

[20] E. T. Hale, W. Yin and Y. Zhang, Fixed-point continuation for `1- minimization: Methodology and convergence, SIAM Journal on Optimization, vol.

19, pp. 1107-1130, 2008.

[21] K. Koh, S Kim, and S. Boyd, An interior-point method for large scale l1 regularized logistic regression. Journal of Machine Learning Research, vol. 8, pp. 1519-1555, 2007.

[22] M.-J. Lai and J. Wang, An unconstrained lq minimization with 0 < q ≤ 1 for sparse solutions of underdetermined linear system, SIAM Journal on Optimization, vol. 21, no. 1, pp. 82-101, 2011.

[23] M.-J. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed lq minimization, SIAM Journal on Numerical Analysis, vol.

51, no. 2, pp. 927-957, 2013.

[24] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM Journal on Computing, vol. 24, pp. 227–234, 1995.

[25] C. T. Nguyen, B. Saheya, Y.-L. Chang and J.-S. Chen, Unified smoothing functions for absolute value equation associated with second-order cone, Applied Numerical Mathematics, vol. 135, pp. 206–227, 2019.

[26] N. Simonn, J. Friedman, T. Hastieand and R. Tibshirani, A sparse-group lasso, Journal of Computational and Graphical Statistics, vol. 22, pp. 231-245, 2013.

[27] X. Wang, F. Liu, L.C. Jiao, J. Wu, and J. Chen, Incomplete variables trun- cated conjugate gradient method for signal reconstruction in compressed sensing, In- formation Sciences, vol. 288, pp. 387-411, 2014.

[28] Y. Wang, W. Yin, and J. Zheng, Global convergence of ADMM in nonconvex nonsmooth optimization, Journal of Scientific Computing, vol. 78, pp. 29–63, 2019.

[29] C. Wu, J. Zhan, Y. Lu, and J.-S. Chen, Signal reconstruction by conjugate gradient algorithm based on smoothing l1−norm, Calcolo, vol. 56, no 4, 2019.

[30] C. Wu, J. Zhang, and J.-S. Chen. An elastic unconstrained lq− l1 minimization for finding sparse solution, submitted manuscript, 2019.

[31] Z. Xu, X. Chang, F. Xu, and H. Zhang, L1/2 regularization: A thresholding representation theory and a fast solver, IEEE Transactions on Neural Networks and Learning Systems, vol. 23, pp. 1013-1027, 2012.

(19)

[32] K. Yin, Y.H. Xiao, and M.L. Zhang, Nonlinear conjugate gradient method for l1-norm regularization problems in compressive sensing, Journal of Computational Information Systems, vol. 7, pp. 880-885, 2011.

[33] C. Zhang, Nearly unbiased variable selection under minimax concave penalty, An- nals of Statistics, vol. 38, pp. 894-942, 2010.

[34] Y. Zhang and W. Ye, Sparse recovery by the iteratively reweighted l1 algorithm for elastic l2− lq minimization, Optimization, vol. 66, no. 10, pp. 1677-1687, 2017.

[35] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B-Statistical Methodology. vol. 67, pp.

301-320, 2005.

[36] H. Zhu, Y.H. Xiao, and S.Y. Wu, Large sparse signal recovery by conjugate gradient algorithm based on smoothing technique, Computers and Mathematics with Applications, vol. 66, pp. 24-32, 2013.