Smoothing strategy along with conjugate gradient algorithm for signal reconstruction
Caiying Wu
∗1, Jing Wang
∗1, Jan Harold Alcantara
†2,3, and Jein-Shan Chen
‡†21College of Mathematics Science, Inner Mongolia University, Hohhot 010021, China
2Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
3Mathematics and Statistics Department, De La Salle University, Manila 1004, Philippines
February 1, 2021
Abstract. In this paper, we propose a new smoothing strategy along with conju- gate gradient algorithm for the signal reconstruction problem. Theoretically, the proposed conjugate gradient algorithm along with the smoothing functions for the absolute value function is shown to possess some nice properties which guarantee global convergence. Numerical experiments and comparisons suggest that the pro- posed algorithm is an efficient approach for sparse recovery. Moreover, we demon- strate that the approach has some advantages over some existing solvers for the signal reconstruction problem.
Keywords. lp-norm regularization, signal recovery, conjugate gradient algorithm, sparse solution.
Mathematics Subject Classification (2000)90C33
∗The author’s research is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016)
†The author’s research is supported by Ministry of Science and Technology, Taiwan
‡Corresponding author. E-mail: [email protected].
1
1 Introduction
The target problem of this paper is the signal reconstruction problem, which has wide applications in compressive sensing [6, 7, 9, 16]. Tremendous amount of articles related to this topic can be found in the literature, and hence we do not intend to repeat its importance and various applications here. We shall directly look into its mathemati- cal model and show our idea for tackling it. Mathematically, the signal reconstruction problem model is described as follows:
min kxk0
s.t. b = Ax, (1)
where x∈ IRnis the original sparse signal that needs to be recovered, A∈ IRm×n(m n) is the measurement matrix, b ∈ IRm is the observed vector, and kxk0 represents the l0- norm of x, which is defined as the number of nonzero components of x. Note that b could also be contaminated with noise, that is, b = Ax + e, where e∈ IRm denotes the noise.
Unfortunately, the l0-norm minimization problem (1) is an NP-hard combinatorial optimization problem [24]. In order to avoid this difficulty, one popular approach is to replace l0-norm by l1-norm in model (1) and obtain the following l1-minimization problem:
min kxk1
s.t. b = Ax, (2)
where kxk1 denotes the l1-norm of x andkxk1 =Pn
i=1|xi|. More specifically, under the Restricted Isometry Property (RIP), the l1-minimization (2) was shown to possess the same solution as l0-minimization (1). Please refer to [1, 4, 5, 7, 8, 10, 11, 16] for more details and survey.
Related to (2) are two regularized unconstrained minimization problems. The first one is given by
Model (I) : min
x∈IRnλkxk1+1
2kAx − bk22.
For instance, the studies in [7, 16, 21, 18] follow model (I). The second one is Model (II) : min
x∈IRnλ1kxk1+ λ2kxk22+1
2kAx − bk22, which is the subject of investigation done in [35].
An alternative approach to compute the sparse solutions of l0-minimization (1) is proposed by Gribnoval and Nielsen in [19], which is given by the constrained model
min kxkpp
s.t. Ax = b, (3)
where 0 < p < 1 and kxkpp =
n
X
j=1
|xj|p. The problem (3) is called lp-minimization, which is motivated by the special property of lp-quasi-norm:
plim→0+kxkpp =kxk0.
Similarly, the lp-minimization problem (3) is NP-hard [12]. In order to deal with this problem, three more regularized unconstrained minimization models are studied in the literature. The first one is
Model (III) : min
x∈IRnλkxkpp+1
2kAx − bk22, (4)
where λ > 0 is a parametric factor. The studies in [14, 17, 22, 26, 33] are based on the model (III). The second one is
Model (IV) : min
x∈IRnλ1kxkpp+ λ2kxk22+ 1
2kAx − bk22,
where λ1, λ2 > 0 are two parameters. The model (IV) is studied in [34]. The third model based on (3) is
Model (V) : min
x∈IRnλ1kxkpp+ λ2kxk1+1
2kAx − bk22,
where λ1, λ2 > 0 are two parameters. The model (V) is recently investigated in [30]. To sum up, we present in Figure 1 all the five models (I)-(V) together.
There are many optimization algorithms that can be applied to solve the above mod- els (I)-(V). For the sake of simplicity and lower storage requirements, conjugate gradient algorithms are suitable for large scale problems. In this paper, like [27, 29, 32, 36], we are interested in applying the conjugate gradient method for signal recovery. To employ the conjugate gradient algorithm, we need to check the differentiability of the objective func- tion that we aim to minimize. However, we observe that there is a common feature among all the aforementioned models (I)-(V). Not only they are regularized unconstrained min- imizations, but also all the objective functions are nonsmooth. The key part causing this is the non-differentiability of the absolute value function | · | inside l1-norm and lp-norm.
In view of this feature, we shall consider smoothing strategy to approximate the absolute value function | · |. Albeit the idea is not new, we propose new smoothing function and compare different kinds of smoothing approaches along with conjugate gradient algo- rithm, which is the main contribution of this paper.
Recently, there are six smoothing functions studied in [25] to approximate the absolute value function|·|. Inspired by this article, we adapt them to work along with our proposed
min kxk0 s.t. Ax = b
min kxkpp
s.t. Ax = b
(V) min
x∈IRnλ1kxkpp+ λ2kxk1+ 1
2kAx − bk22
(IV) min
x∈IRnλ1kxkpp+ λ2kxk22+1
2kAx − bk22
(III) min
x∈IRnλkxkpp+ 1
2kAx − bk22
min kxk1
s.t. Ax = b
(II) min
x∈IRnλ1kxk1+ λ2kxk22+ 1
2kAx − bk22
(I) min
x∈IRnλkxk1 +1
2kAx − bk22
Figure 1: Five Models for Sparse Reconstruction
conjugate gradient which will be recalled in the next section. First, we present them out as below.
ϕ1(µ, t) = µ[ln(1 + e−µt) + ln(1 + eµt)],
ϕ2(µ, t) =
t if t≥ µ2,
t2
µ + µ4 if −µ2 < t < µ2,
−t if t≤ −µ2, ϕ3(µ, t) =p4µ2+ t2,
ϕ4(µ, t) = ( t2
2µ if |t| ≤ µ,
|t| − µ2 if |t| > µ,
ϕ5(µ, t) =
t if t > µ,
−8µt43 +3t4µ2 + 3µ8 if −µ ≤ t ≤ µ,
−t if t < −µ,
ϕ6(µ, t) = t erf
t
√2µ
+r 2
πµe−2µ2t2 .
where the error function is defined as follows:
erf(t) = 2
√π Z t
0
e−u2du, ∀t ∈ IR.
With these smoothing functions, there are several smooth models that can be considered, which are given as below.
[SF(I)] min
x∈IRnλ
n
X
i=1
ϕj(µ, xi) + 1
2kAx − bk22, [SF(II)] min
x∈IRnλ
n
X
i=1
ϕj(µ, xi) + λ2kxk22+1
2kAx − bk22, [SF(III)] min
x∈IRnλ
n
X
i=1
ϕpj(µ, xi) + 1
2kAx − bk22, [SF(IV)] min
x∈IRnλ1 n
X
i=1
ϕpj(µ, xi) + λ2kxk22+ 1
2kAx − bk22, [SF(V)] min
x∈IRnλ1 n
X
i=1
ϕpj(µ, xi) + λ2 n
X
i=1
ϕj(µ, xi) + 1
2kAx − bk22,
Here, ϕj(µ, xi)≈ |xi| for j = 1, 2, · · · , 6. Moreover, “[SF(I)]” stands for smoothing func- tion for model (I). Likewise, the others represent similar meanings.
Now, we are ready to present our model. Theoretically, we only focus on model (III) given as in (4). In particular, our main idea is inspired by the re-weighted techniques used in [22, 23, 34]. To proceed, notice that we can express the objective function in model (III) as
λkxkpp+ 1
2kAx − bk22 = λ
n
X
i=1
|xi|p+1
2kAx − bk22 = λ
n
X
i=1
|xi|p−1· |xi| + 1
2kAx − bk22. In light of this expression, for j = 1, 2,· · · , 6, we construct the below smoothing function
Hj(x) := λ
n
X
i=1
(xi2+ µ2)p−12 · ϕj(µ, xi) + 1
2kAx − bk22, (5) where µ > 0 and µ > 0 are two parameters. Indeed, the weight (xi2 + µ2)p−12 for i = 1, 2,· · · , n in (5) is regarded as a continuous variable, which is slightly different from that used in the literature. Then, we have a new smoothing strategy for model (III):
[ReSF(III)] min
x∈IRnλ
n
X
i=1
(xi2+ µ2)p−12 · ϕj(µ, xi) + 1
2kAx − bk22. (6)
“[ReSF(III)]” stands for re-weighted smoothing function for model (III). We apply a version of the conjugate gradient algorithm presented in the next section to the smooth model (6).
The remaining parts of this paper are organized as follows. In Section 2, we shall introduce a version of nonlinear conjugate gradient method to solve the model (III) by using (5) and (6). We prove the boundedness of the level sets of the objective function, as well as the Lipschitz continuity of its gradient. On the basis of these properties, we provide a theorem that shows the global convergence of the algorithm. In Section 3, we report some experimental results to demonstrate the efficiency of our proposed method.
Numerical comparisons with other popular algorithms are shown as well, which reveal that our new model has a satisfactory numerical performance, which strongly suggests that it is a good choice for the target problem. We discuss our conclusions in Section 4.
2 Algorithm and convergence analysis
In this section, we first describe the nonlinear conjugate gradient (CG) algorithm, which was proposed by Chen and Zhou and for image restoration in [13]. Here, we employ it for signal recovery. Then, we discuss the boundedness property of the level set, the Lipschitz continuity property of the objective function’s gradient and the global convergence. There are many versions of CG algorithms, and this one can be viewed as a version of the CG combined with limited memory BFGS method.
Algorithm 1
Step 0. Choose initial point x0 ∈ IRn, ε0 > 0, µ > 0, µ > 0, r ≥ 0, δ ∈ (0, 1) and ρ∈ (0, 1). Set k = 0.
Step 1. Compute the search direction
dk = −∇Hj(xk) if k = 0,
−∇Hj(xk) + βkdk−1+ θkzk−1 if k > 0, where
βk = ∇THj(xk)zk−1
(dk−1)Tzk−1 − 2kzk−1k2∇THj(xk)dk−1 ((dk−1)Tzk−1)2 , θk = ∇THj(xk)dk−1
(dk−1)Tzk−1 , zk−1 = yk−1+ tksk−1,
tk = ε0
∇Hj(xk)
r+ max
0,−(sk−1)Tyk−1 ksk−1k2
,
with
yk−1 =∇Hj(xk)− ∇Hj(xk−1), sk−1 = xk− xk−1 = αk−1dk−1. Step 2. Compute αk = max{ρ0, ρ1,· · · } satisfying
Hj(xk+ ρidk)≤ Hj(xk) + δρi ∇Hj(xk)T
dk. Step 3. Set xk+1 = xk+ αkdk.
Step 4. Set k = k + 1 and go to Step 1.
Lemma 2.1. Let Hj be defined as in (5). For any µ > 0 and µ > 0, the level set L(x0) = {x ∈ IRn| Hj(x)≤ Hj(x0)} is bounded.
Proof. From the structure of the function ϕj(µ, t), it is not hard to verify that
tlim→∞(t2+ ¯µ2)p−12 ϕj(µ, t) = +∞.
Hence, for any µ > 0, we obtain
kxk→∞lim Hj(x) = +∞. (7)
Assume thatL(x0) is unbounded. Then, there exists an index set K1, such thatkxkk →
∞, as k → ∞ and k ∈ K1. By applying (7), it yields
Hj(xk)→ +∞, as k → ∞ and k ∈ K1,
which contradicts Hj(xk)≤ Hj(x0). Thus, the level setL(x0) is bounded. 2
Noting that the function (t2 + ¯µ2)p−12 is continuously differentiable of arbitrary or- der while the smoothing functions ϕj(µ, t) have Lipschitz continuous gradient, then we conclude that the first term of Hj given by (5) has Lipschitz continuous gradient over bounded sets. In particular, since the level setL(x0) is bounded by Lemma 2.1, it follows that the first term of Hj has Lipschitz continuous gradient overL(x0). Since the gradient of 12kAx − bk22 is also Lipschitz continuous over IRn, then we get the following result.
Lemma 2.2. The function Hj defined in (5) has Lipschitz continuous gradient, that is, there exists a constant L > 0, such that
k∇Hj(x)− ∇Hj(y)k ≤ Lkx − yk, ∀ x, y ∈ L(x0). (8)
Lemma 2.3. Let Hj be defined as in (5) and {xk}, {dk} be generated by Algorithm 1.
Then, there holds
∇Hj(xk)T
dk≤ −1
2k∇Hj(xk)k2. (9)
Proof. With Lemma 2.1 and Lemma 2.2, the proof is exactly the same as that in [13, Lemma 2.2]. We omit it. 2
Theorem 2.1. Let Hj be defined as in (5) and {xk}, {dk} be generated by Algorithm 1.
Then, we have
lim inf
k→∞
∇Hj(xk)
= 0, (10)
for j = 1, 2,· · · , 6.
Proof. Suppose that the conclusion (10) is not true. Then, there exists a constant r1 > 0 such that
∇Hj(xk)
≥ r1, ∀k, j = 1, 2, · · · , 6.
Applying Lemma 2.3, it implies that {Hj(xk)} is decreasing, which implies xk ∈ L(x0).
From Lemma 2.1, the level set L(x0) is bounded. Thus, {Hj(xk)} is convergent. In addition, from the Armijo line search in Step 2, we know
klim→∞αk ∇Hj(xk)T
dk= 0. (11)
On the other hand, applying (9) in Lemma 2.3 gives αk ∇Hj(xk)T
dk ≤ −1
2αkk∇Hj(xk)k2 ≤ −r21
2αk < 0. (12) Putting (11) and (12) together yields
klim→∞αk = 0.
Now, by [13, Lemma 2.3], there exists a constant ε > 0 such that kdkk ≤ εk∇Hj(xk)k, ∀k ≥ 0.
Since L(x0) is bounded, there is a constant r > 0 such that k∇Hj(xk)k ≤ r, ∀k ≥ 0.
Thus, there holds kdkk ≤ εr, and hence
klim→∞αkdk= 0.
Note that from the Armijo line search, we have Hj(xk)− Hj(xk+αρkdk)
αk/ρ <−δ ∇Hj(xk)T
dk. (13)
Since {xk} and {dk} are bounded, there exist subsequences {xk}k∈K and {dk}k∈K such that
xk → x and dk → d, as k ∈ K, k → ∞.
Then, taking the limit on both sides of (13) with k ∈ K leads to
− (∇Hj(x))T d≤ −δ (∇Hj(x))T d =⇒ (∇Hj(x))T d≥ 0, which contradicts Lemma 2.3. Therefore, the proof is complete. 2
3 Numerical Experiments
In this section, we report the results of our experiments demonstrating the applicability, efficiency and merits of our algorithm. All tests are carried out on a PC (3.20GHz, 32GB of RAM) with the use of Matlab R2020a.
The parameters of our algorithm are summarized as follows:
x0 = ATb, λ = λ0 = 0.001kATbk∞, ρ = 0.2, δ = 0.3, ε0 = 10−10, r = 0, µ = 10−3.
For the smoothing parameter µ, we apply a continuation approach which has been widely used in the literature (see [3] for instance) in order to improve the quality of solution obtained. First, we set an initial value of the smoothing parameter as µ0 = 10−3 and use Algorithm 1 to solve (6). When running Algorithm 1, we used the stopping criterion implemented in NESTA [3] which is based on the relative variation of the objective function. In particular, we stop Algorithm 1 when
|Hj(xk+1)− Hj(xk)|
|Hj(xk+1)| < 10−5, j = 1, 2, 3, 4, 5, 6. (14) For higher accuracy of the solution, a lower threshold value for the relative variation can be used. Since we are applying a continuation scheme, we reduce the value of µ then im- plement again Algorithm 1 as above with stopping criterion given by (14). In particular, for k≥ 0, we set µk+1 = 0.1µk. In addition to this, we also reduce the regularization pa- rameter λ in each continuation step according to the formula λk+1 = 0.1λkfor k ≥ 0. This procedure is inspired by the approach used in fixed-point continuation (FPC) method
[20]. We repeat the continuation procedure until we obtain a solution within a desired relative error, or until we have reached a pre-specified maximum number of continuation steps. In our simulations, we set the maximum number of continuation steps to 4 which we found to be enough to obtain a good approximate solution for the problems we have considered.
3.1 Comparison of smoothing functions and choosing p
Meanwhile, the success and performance of our algorithm are indeed dependent on the parameter p and the choice of smoothing function. Thus, we explore these issues first before we present our comparisons with other algorithms.
Experiment 1. In this experiment, we determine which values of p will result to an algorithm which has a good frequency of success and fast convergence time. We let A∈ IRm×n be a Gaussian matrix with n = 215= 32768, m = n2, x∗ ∈ IRn is a K−sparse original signal with K =b40nc, whose nonzero components are sampled from N (0, 1). We consider two cases where b = Ax∗ (noiseless case) and b = Ax∗+ e where e is a Gaussian noise with zero mean and σ = 0.01.
For each test problem, the result is reported by running the algorithm 10 times and taking the average CPU time. Note that just like the numerical experiment in [29], the smoothing function ϕ1 is very unstable, so we do not include it in the numerical results.
In the sequel, the frequency of successful reconstructions means the percentage of all test instances that reach the required relative error which we set to 10−3, i.e.
kx − x∗k
kx∗k ≤ 10−3
where x is the solution obtained by the algorithm. The summary of the results are presented in Tables 1 and 2.
The summary of results for the noiseless and noisy cases are presented in Table 1 and Table 2, respectively. From these, we see that the algorithm has a very low frequency of success when a small value of p is used, particularly when p = 0.3. The frequency of success improves as the value of p increases. In terms of CPU time of convergence, we observe the same pattern that a larger value of p provides faster convergence. In particular, we obtain the fastest convergence time when p = 0.9. We can also see that the smoothing function ϕ3 with p = 0.9 has the optimal performance among all the smoothing functions and values of p considered. We confirm that this is the case for different dimensions in the next experiment.
Experiment 2. We compare the performance of different smoothing functions with p = 0.9, which is the optimal value of p found in the previous experiment. We generate A, b
Table 1: Average CPU time and frequency of success for different values of p and different smoothing functions ϕi. The table shows the results for the noiseless case where n = 32768, m = 16384 and K =bn/40c and the entries of A are sampled from N (0, 1).
p
Smoothing Functions
ϕ2 ϕ3 ϕ4 ϕ5 ϕ6
CPU Success CPU Success CPU Success CPU Success CPU Success
Time Rate Time Rate Time Rate Time Rate Time Rate
0.3 - 0 492.96 0.7 - 0 1440.03 0.1 - 0
0.5 550.40 0.2 213.78 1 547.32 0.4 526.04 0.3 271.77 1
0.7 285.47 1 192.48 1 237.29 0.9 252.07 1 209.16 1
0.9 187.35 1 143.85 1 167.48 1 179.99 1 156.40 1
Table 2: Average CPU time and frequency of success for different values of p and different smoothing functions ϕi. The table shows the results for the noisy case where n = 32768, m = 16384 and K =bn/40c and the entries of A are sampled from N (0, 1).
p
Smoothing Functions
ϕ2 ϕ3 ϕ4 ϕ5 ϕ6
CPU Success CPU Success CPU Success CPU Success CPU Success
Time Rate Time Rate Time Rate Time Rate Time Rate
0.3 - 0 495.181 0.8 - 0 - 0 - 0
0.5 586.03 0.4 211.681 1 525.22 0.5 527.27 0.3 296.24 0.9
0.7 285.53 0.9 197.75 1 237.70 1 256.40 1 186.95 1
0.9 187.01 1 144.03 1 167.70 1 178.19 1 155.81 1
and x∗as in Experiment 1 but we consider different dimensions n∈ {211, 212, 213, 214, 215, 216} and let m = n/2 and K = bn/40c. The results are shown in Table 3. With p = 0.9, all of the smoothing functions were effective in solving the problem as we obtained 100%
success rate from 10 independent simulations. Moreover, from Table 3, we see that the function ϕ3 results to a more efficient algorithm both in the noiseless and noisy case.
Table 3: Average CPU time of the algorithm using different smoothing functions with p = 0.9. The frequency of success in all cases is 1. For each n, we set m = n/2 and K =bn/40c. The entries of A are sampled from N (0, 1).
n Noiseless Case Noisy Case
ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6
2048 0.32 0.27 0.29 0.35 0.31 0.33 0.29 0.30 0.37 0.31
4096 2.46 2.02 2.19 2.36 2.28 2.49 2.03 2.22 2.36 2.29
8192 10.84 8.12 9.37 10.18 8.80 10.45 8.01 9.28 10.09 8.67 16384 47.94 34.78 41.66 43.54 36.81 49.25 34.24 43.66 47.31 37.93 32768 183.71 144.51 167.88 177.72 154.49 187.21 148.41 168.69 178.98 158.69 65536 747.38 529.53 611.02 623.19 636.89 740.50 524.09 592.11 624.18 631.07
3.2 Comparisons with other solvers for sparse recovery prob- lems
With the findings from the previous section, we now compare our algorithm with ϕ3 and p = 0.9 to other famous algorithms in the literature. We consider popular solvers FISTA [2], NESTA [3], and FPC-BB [20] which are designed to solve the convex model (I). We also look at comparisons with those algorithms based on the nonconvex model (III) with p = 0.5 including HALF [31], IRUcLq-v [23] and alternating direction method of multipliers (ADMM) [28].
Experiment 3. We consider the same A, b and x∗ as in Experiment 1 and compare our algorithm with FISTA, NESTA, FPC-BB, HALF, IRUcLq-v and ADMM. For this experiment, we also consider the two cases where b is noiseless or corrupted with noise.
In addition, we explore the capability of the solvers to recover signals of different sparsity levels, namely K ∈ {b0.05mc, b0.1mc, b0.2mc, b0.3mc}. The results are summarized in Tables 4 and 5.
We observe from both the noiseless and noisy cases that FPC-BB is the fastest solver for the case when K =b0.05mc, K = b0.1mc and K = b0.2mc with 100% success rate.
For these sparsity levels, the second fastest solver is our CG Algorithm with the function
ϕ3 and p = 0.9, followed by the HALF and FISTA algorithms. On the other hand, the solvers IRUcLq-v and ADMM were able to solve all these test problems but with very high computation time compared with other solvers.
While FPC-BB is the fastest for the aforementioned sparsity levels, it failed to solve all the problems for the highest signal density level considered which is K =b0.3mc. For this case, CG Algorithm has the best performance in terms of CPU time and frequency of success among all the solvers considered. On the other hand, FISTA, HALF and IRUcLq-v algorithms were able to solve all the problems as well but the computation times required are larger, especially for IRUcLq-v. We point out that the methods IRUcLq-v and ADMM are the slowest solvers in all instances which is expected since these algorithms require the inversion of an n× n matrix, which is extremely costly for high dimensional problems.
To summarize, this experiment reveals that our method is efficient in handling differ- ent sparsity levels. In particular, it can recover signals which are dense (i.e. not strictly sparse) which is not achieved by FPC-BB, NESTA and ADMM.
Table 4: Average CPU time and frequency of success of different solvers for different sparsity levels with n = 32768 and m = 16384. The table shows the results for the noiseless case from 10 independent trials. The entries of A are sampled from N (0, 1).
Solver
Sparsity level (K)
b0.05mc b0.1mc b0.2mc b0.3mc
CPU Success CPU Success CPU Success CPU Success
Time Rate Time Rate Time Rate Time Rate
CG Algorithm 157.57 1 184.18 1 226.86 1 299.70 1
FISTA 259.09 1 283.90 1 308.37 1 345.23 1
NESTA 420.54 1 - 0 - 0 - 0
FPC-BB 106.97 1 121.15 1 175.65 1 - 0
HALF 187.59 1 224.99 1 335.86 1 543.17 1
IRUcLq-v 457.92 1 554.29 1 716.53 1 929.47 1
ADMM 578.85 1 665.60 1 967.74 1 - 0
Experiment 4. In this experiment, we look at the measurement matrix A ∈ IRm×n considered in [15, 23] where the entries are sampled fromN (0,m1) and we let n = 32768, m = n/4 = 8192. The signal x∗ and the vector b are generated as in the previous experiments.
The results are shown in Table 6. It is evident that our method is very efficient and effective in solving the problems for different sparsity levels. In fact, it is the fastest
Table 5: Average CPU time and frequency of success of different solvers for different sparsity levels with n = 32768 and m = 16384. The table shows the results for the noisy case from 10 independent trials. The entries of A are sampled from N (0, 1).
Solver
Sparsity level (K)
b0.05mc b0.1mc b0.2mc b0.3mc
CPU Success CPU Success CPU Success CPU Success
Time Rate Time Rate Time Rate Time Rate
CG Algorithm 159.44 1 183.83 1 227.61 1 294.96 1
FISTA 272.22 1 279.38 1 304.02 1 347.24 1
NESTA 422.98 1 - 0 - 0 - 0
FPC-BB 117.50 1 116.31 1 174.97 1 - 0
HALF 198.74 1 220.18 1 333.72 1 541.36 1
IRUcLq-v 454.91 1 642.26 1 704.10 1 912.16 1
ADMM 572.38 1 688.97 1 970.93 1 825.04 0.1
solver among all the algorithms considered followed by the FPC-BB and HALF algorithm.
Notice, however, that the time required by the HALF method to solve the problem is almost three times longer than the time needed by our algorithm. With this, we see that the CG algorithm is very efficient in handling the type of problem considered. In terms of the capability of the algorithms to solve the problem, it can also be seen that our algorithm (as well as HALF) is dominant as it was able to solve all the problems for all sparsity levels considered. On the other hand, FPC-BB algorithm failed to solve all the problems for sparsity levels b0.2mc and b0.3mc.
To close this section, we summarize our numerical findings:
• The CG algorithm with the smoothing function ϕ3 and p = 0.9 provides the best convergence time and success rate among all smoothing functions and values of p considered.
• Even though FPC-BB is usually the fastest solver among all, the difference between FPC-BB and our algorithm’s CPU time of convergence is not large (see Experiment 3). Moreover, our algorithm obtains a good success rate for problems with relatively denser signals which is not achieved by FPC-BB. In this respect, our algorithm has a major advantage.
• Taking into account both the CPU time of convergence and the success rate of the algorithms in handling different problems, we believe that our algorithm provides a better performance among all algorithms considered as it can solve more problems at a reasonable computation time.
Table 6: Average CPU time and frequency of success of different solvers for different sparsity levels with n = 32768 and m = 8192. The table shows the results for the noiseless case from 10 independent trials. The entries of the sensing matrix A are generated from N (0, 1/m)
Solver
Sparsity level (K)
b0.05mc b0.1mc b0.2mc b0.3mc
CPU Success CPU Success CPU Success CPU Success
Time Rate Time Rate Time Rate Time Rate
CG Algorithm 24.66 1 33.00 1 51.29 1 96.44 1
FISTA 145.42 1 155.23 1 180.12 1 - 0
NESTA - 0 - 0 - 0 - 0
FPC-BB 56.95 1 70.67 1 - 0 - 0
HALF 76.27 1 88.52 1 127.90 1 251.64 1
IRUcLq-v 469.05 1 616.98 1 978.63 1 1535.11 1
ADMM 585.93 1 760.49 1 806.68 0.1 - 0
• While FPC-BB is often the fastest solver among other algorithms considered in this paper, we note that there are some problems where our method provides way better convergence time. We have shown this in Experiment 4 where we see that not only FPC-BB requires more than twice the time to obtain an approximate solution, but it also cannot solve problems with high density signals. On the other hand, the convergence times of other algorithms are much slower compared to our algorithm.
4 Conclusion
In this paper, we formulated a new model (6) which employs lp-norm along with a special smoothing function that is motivated by re-weighted techniques. In general, the smoothing strategy along with a version of conjugate gradient is the main focus and contribution of this paper, both theoretically and and numerically. Not only we show global convergence, but also from various experiments and comparisons we have strong numerical evidence to verify our proposed algorithm is a good choice for tackling sparse signal reconstruction problem. In particular, our method can efficiently solve the sparse recovery problem even when the dimension of the problem is relatively high. Moreover, we have also demonstrated that dense signals can be recovered by our algorithm, which is not achievable by other algorithms. A future work that is worth considering is the theoretical and numerical study of the other smoothing models (SFI)-(SFV) along with the conjugate gradient algorithm, and numerical comparisons of these models for dealing
with the sparse recovery problem.
Acknowledgements
The authors would like to thank the anonymous referees for their valuable comments and suggestions which have significantly improved the original version of the paper.
Declarations
Funding
C. W. and J. W. are supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS01016). J. H. A. and J.-S. C. are supported by the Ministry of Science and Technology, Taiwan.
Conflict of Interest
The authors declare that they have no conflict of interest.
References
[1] R. Baraniuk, Compressive sensing, IEEE Signal Processing and Magazine, vol. 24, pp. 118-121, 2007.
[2] A. Beck and M. Teboulle, A fast iterative shrinkage thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.
[3] S. Becker, J. Bobin and E. Cand`es, NESTA: A fast and accurate first-order method for sparse recovery, SIAM J. Imaging Sciences, vol. 4, no. 1, pp. 1–39, 2011.
[4] A.M. Bruckstein, D.L. Donoho, and M. Elad, From sparse solutions of sys- tems of equations to sparse modeling of signals and images, SIAM Review, vol. 51, pp. 34-81, 2009.
[5] E.J. Cand`es, The Restricted Isometry Property and its Implications for Compressed Sensing, Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589-592, 2008.
[6] E. J. Cand`es and P. A. Randall, Highly robust error correction by convex pro- gramming, IEEE Transactions on Information Theory, vol. 54, pp. 2829-2840, 2008.
[7] E. J. Cand`es, J. Romberg, and T. Tao, Robust uncertainty principles: exact sig- nal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol. 52, pp. 489-509, 2006.
[8] E.J. Cand`es, J.K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, 2006.
[9] E.J. Cand`es and T. Tao, Near optimal signal recovery from random projections:
universal encoding strategies, IEEE Transactions on Information Theory, vol. 52, pp.
5406-5425, 2004.
[10] E.J. Cand`es and T. Tao, Decoding by Linear Programming, IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, 2005.
[11] E.J. Cand`es and T. Tao, Near optimal signal recovery from random projections:
universal encoding strategies? IEEE Transactions on Information Theory, vol. 52, no.
12, pp. 5406-5425, 2006.
[12] X. Chen, Y. Ye, Z. Wang, and D. Ge, Complexity of unconstrained L2− Lp
minimization, Mathematical Programming, vol. 143, pp. 371-383, 2014.
[13] X. Chen and W. Zhou , Smoothing nonlinear conjugate gradient method for im- age restoration using nonsmooth nonconvex minimization, SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 765-790, 2010.
[14] X. Chen and W. Zhou, Convergence of the reweighted l1 minimization algorithm for l2− lp minimization, Computational Optimization and Applications, vol. 59, pp.
47-61, 2014.
[15] I. Daubechies, R. DeVore, M. Fornasier and S. G¨unt¨uk, Iteratively reweighted least squares minimization for sparse recovery, Communications on Pure and Applied Mathematics, vol. 63, pp. 1–38, 2010.
[16] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol.
52, no. 4, pp. 1289-1306, 2006.
[17] J. Friedman, T. Hastie and R. Tibshirani. A note on the group lasso and a sparse group lasso. Stat. Theory (math.ST). Submitted on 5 Jan 2010.
[18] D. Ge, X. Jiang, and Y. Ye, A note on the complexity of Lp minimization, Mathematical Programming, vol. 129, pp. 285-299, 2011.
[19] R. Gribnoval and M. Nielsen, Sparse decompositions in unions of bases, IEEE Transactions on Information Theory, vol. 49 (2003), pp. 3320-3325.
[20] E. T. Hale, W. Yin and Y. Zhang, Fixed-point continuation for `1- minimization: Methodology and convergence, SIAM Journal on Optimization, vol.
19, pp. 1107-1130, 2008.
[21] K. Koh, S Kim, and S. Boyd, An interior-point method for large scale l1 regular- ized logistic regression. Journal of Machine Learning Research, vol. 8, pp. 1519-1555, 2007.
[22] M.-J. Lai and J. Wang, An unconstrained lq minimization with 0 < q ≤ 1 for sparse solutions of underdetermined linear system, SIAM Journal on Optimization, vol. 21, no. 1, pp. 82-101, 2011.
[23] M.-J. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed lq minimization, SIAM Journal on Numerical Analysis, vol.
51, no. 2, pp. 927-957, 2013.
[24] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM Journal on Computing, vol. 24, pp. 227–234, 1995.
[25] C. T. Nguyen, B. Saheya, Y.-L. Chang and J.-S. Chen, Unified smooth- ing functions for absolute value equation associated with second-order cone, Applied Numerical Mathematics, vol. 135, pp. 206–227, 2019.
[26] N. Simonn, J. Friedman, T. Hastieand and R. Tibshirani, A sparse-group lasso, Journal of Computational and Graphical Statistics, vol. 22, pp. 231-245, 2013.
[27] X. Wang, F. Liu, L.C. Jiao, J. Wu, and J. Chen, Incomplete variables trun- cated conjugate gradient method for signal reconstruction in compressed sensing, In- formation Sciences, vol. 288, pp. 387-411, 2014.
[28] Y. Wang, W. Yin, and J. Zheng, Global convergence of ADMM in nonconvex nonsmooth optimization, Journal of Scientific Computing, vol. 78, pp. 29–63, 2019.
[29] C. Wu, J. Zhan, Y. Lu, and J.-S. Chen, Signal reconstruction by conjugate gradient algorithm based on smoothing l1−norm, Calcolo, vol. 56, no 4, 2019.
[30] C. Wu, J. Zhang, and J.-S. Chen. An elastic unconstrained lq− l1 minimization for finding sparse solution, submitted manuscript, 2019.
[31] Z. Xu, X. Chang, F. Xu, and H. Zhang, L1/2 regularization: A thresholding representation theory and a fast solver, IEEE Transactions on Neural Networks and Learning Systems, vol. 23, pp. 1013-1027, 2012.
[32] K. Yin, Y.H. Xiao, and M.L. Zhang, Nonlinear conjugate gradient method for l1-norm regularization problems in compressive sensing, Journal of Computational Information Systems, vol. 7, pp. 880-885, 2011.
[33] C. Zhang, Nearly unbiased variable selection under minimax concave penalty, An- nals of Statistics, vol. 38, pp. 894-942, 2010.
[34] Y. Zhang and W. Ye, Sparse recovery by the iteratively reweighted l1 algorithm for elastic l2− lq minimization, Optimization, vol. 66, no. 10, pp. 1677-1687, 2017.
[35] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B-Statistical Methodology. vol. 67, pp.
301-320, 2005.
[36] H. Zhu, Y.H. Xiao, and S.Y. Wu, Large sparse signal recovery by conjugate gradient algorithm based on smoothing technique, Computers and Mathematics with Applications, vol. 66, pp. 24-32, 2013.