### Sparse Representation and Optimization Methods for L1-regularized Problems

Chih-Jen Lin

Department of Computer Science, National Taiwan University

### Outline

Sparse Representation

Existing Optimization Methods
Coordinate Descent Methods
Other Methods

Experiments

Sparse Representation


### Sparse Representation

A mathematical way to model a signal, an image, or a document is

y = Xw = w_{1} [x_{11}, . . . , x_{l1}]^{T} + · · · + w_{n} [x_{1n}, . . . , x_{ln}]^{T}

A signal is a linear combination of others; X and y are given

We would like to find w with as few non-zeros as possible (sparsity)
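As a toy illustration (synthetic data, not from the slides), a sparse w means y is built from just a few columns of X:

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 6, 10                       # l: signal length, n: dictionary size
X = rng.standard_normal((l, n))    # columns are the building blocks

w = np.zeros(n)
w[2], w[7] = 1.5, -0.5             # only two non-zero coefficients
y = X @ w                          # y = 1.5 * X[:, 2] - 0.5 * X[:, 7]

# y is exactly a combination of two columns of X
assert np.allclose(y, 1.5 * X[:, 2] - 0.5 * X[:, 7])
print(np.count_nonzero(w))         # sparsity: the number of non-zeros
```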


### Example: Image Deblurring

Consider

y = Hz

z: original image, H: blur operation y: observed image

Assume

z = Dw with a known dictionary D

Try to

min_{w} ‖y − HDw‖ and get ŵ


### Example: Image Deblurring (Cont’d)

We hope w has few non-zeros, as each image is generated from only a few columns of the dictionary

The restored image is Dŵ


### Example: Face Recognition

Assume a face image is a combination of the same person’s other images

[x_{11}, . . . , x_{l1}]^{T}: 1st image, [x_{12}, . . . , x_{l2}]^{T}: 2nd image, . . .

l : number of pixels in a face image

Given a face image y and collections of two persons’

faces X1 and X2


### Example: Face Recognition (Cont’d)

If

min_{w} ‖y − X_{1}w‖ < min_{w} ‖y − X_{2}w‖,

predict y as the first person

We hope w has few non-zeros, as noisy images shouldn’t be used


### Example: Feature Selection

Given

X = [x_{11} . . . x_{1n}; . . . ; x_{l1} . . . x_{ln}], where (x_{i1}, . . . , x_{in}) is the ith document (a row of X)

y_{i} = +1 or −1 (two classes)

We hope to find w such that

w^{T}x_{i} > 0 if y_{i} = 1, and w^{T}x_{i} < 0 if y_{i} = −1


### Example: Feature Selection (Cont’d)

Try to

min_{w} Σ_{i=1}^{l} e^{−y_{i}w^{T}x_{i}}

and hope that w is sparse

That is, we assume that each document is generated from important features

w_{i} ≠ 0 ⇒ important feature


### L1-norm Minimization I

Finding w with the smallest number of non-zeros is difficult

‖w‖_{0} : number of non-zeros

Instead, L1-norm minimization:

min_{w} C ‖y − Xw‖^{2} + ‖w‖_{1}

C : a parameter given by users
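One standard way to solve this least-squares form is proximal gradient descent (ISTA) — a sketch on synthetic data, not one of the methods surveyed in these slides. Its per-iteration soft-thresholding step is what produces exact zeros:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, C, n_iter=500):
    # Minimize C * ||y - X w||^2 + ||w||_1 by proximal gradient (ISTA).
    # Step size 1/Lip, where Lip = 2C * (largest singular value of X)^2.
    lip = 2.0 * C * np.linalg.norm(X, 2) ** 2
    eta = 1.0 / lip
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * C * X.T @ (X @ w - y)     # gradient of the smooth part
        w = soft_threshold(w - eta * grad, eta)
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 12))
w_true = np.zeros(12)
w_true[[1, 5]] = [2.0, -1.0]
y = X @ w_true
w = ista(X, y, C=10.0)
print(np.nonzero(np.abs(w) > 1e-3)[0])  # indices of the recovered support
```

A larger C weights the data-fit term more, so the recovered w shrinks less; a smaller C gives more zeros, matching the sparsity remark in the slides.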


### L1-norm Minimization II

1-norm versus 2-norm:

‖w‖_{1} = |w_{1}| + · · · + |w_{n}|
‖w‖^{2}_{2} = w_{1}^{2} + · · · + w_{n}^{2}

(Two figures: |w| and w^{2} as functions of w; |w| has a sharp kink at zero.)

### L1-norm Minimization III

If using the 2-norm, all w_{i} are non-zero
Using the 1-norm, many w_{i} may be zero
Smaller C , better sparsity


### L1-regularized Classifier

Training data {y_{i}, x_{i}}, x_{i} ∈ R^{n}, i = 1, . . . , l , y_{i} = ±1
l : # of data, n: # of features

min_{w} ‖w‖_{1} + C Σ_{i=1}^{l} ξ(w; x_{i}, y_{i})

ξ(w; x_{i}, y_{i}): loss function

Logistic loss:

log(1 + e^{−y w^{T}x})

L1 and L2 losses:

max(1 − y w^{T}x, 0) and max(1 − y w^{T}x, 0)^{2}
We do not consider kernels


### L1-regularized Classifier (Cont’d)

‖w‖_{1} not differentiable ⇒ causes difficulties in optimization

Loss functions: logistic loss is twice differentiable, L2 loss is differentiable, and L1 loss is not differentiable

We focus on logistic and L2 losses

Sometimes bias term is added

w^{T}x ⇒ w^{T}x + b


### L1-regularized Classifier (Cont’d)

Many methods are available; we review existing methods and show details of some

Notation:

f (w) ≡ ‖w‖_{1} + C Σ_{i=1}^{l} ξ(w; x_{i}, y_{i})

is the function to be minimized, and

L(w) ≡ C Σ_{i=1}^{l} ξ(w; x_{i}, y_{i}).

We do not discuss L1-regularized regression, which has recently been another hot topic

Existing Optimization Methods

### Outline

Sparse Representation

Existing Optimization Methods
Coordinate Descent Methods
Other Methods

Experiments


### Decomposition Methods

Working on some variables at a time

Cyclic coordinate descent methods

Working variables sequentially or randomly selected

One-variable case:

min_{z} f (w + ze_{j}) − f (w)

e_{j}: indicator vector for the j th element

Examples: Goodman (2004); Genkin et al. (2007); Balakrishnan and Madigan (2005); Tseng and Yun (2007); Shalev-Shwartz and Tewari (2009); Duchi and Singer (2009); Wright (2010)


### Decomposition Methods (Cont’d)

Gradient-based working set selection

Higher cost per iteration; larger working set

Examples: Shevade and Keerthi (2003); Tseng and Yun (2007); Yun and Toh (2009)

Active set method

Working set the same as the set of non-zero w elements

Examples: Perkins et al. (2003)


### Constrained Optimization

Replace w with w^{+} − w^{−}:

min_{w^{+},w^{−}} Σ_{j=1}^{n} w_{j}^{+} + Σ_{j=1}^{n} w_{j}^{−} + C Σ_{i=1}^{l} ξ(w^{+} − w^{−}; x_{i}, y_{i})

s. t. w_{j}^{+} ≥ 0, w_{j}^{−} ≥ 0, j = 1, . . . , n.

Any bound-constrained optimization methods can be used

Examples: Schmidt et al. (2009) used Gafni and Bertsekas (1984); Kazama and Tsujii (2003) used Benson and Moré (2001); we have considered Lin and Moré (1999); Koh et al. (2007): interior point method


### Constrained Optimization (Cont’d)

Equivalent problem with a non-smooth constraint:

min_{w} Σ_{i=1}^{l} ξ(w; x_{i}, y_{i})
subject to ‖w‖_{1} ≤ K .

C replaced by a corresponding K

Go back to LASSO (Tibshirani, 1996) if y ∈ R and least-square loss

Examples: Kivinen and Warmuth (1997); Lee et al. (2006); Donoho and Tsaig (2008); Duchi et al. (2008)


### Other Methods

Expectation maximization: Figueiredo (2003); Krishnapuram et al. (2004, 2005)

Stochastic gradient descent: Langford et al. (2009); Shalev-Shwartz and Tewari (2009)

Modified quasi Newton: Andrew and Gao (2007); Yu et al. (2010)

Hybrid: easy method first and then interior-point for faster local convergence (Shi et al., 2010)


### Other Methods (Cont’d)

Quadratic approximation followed by coordinate descent: Krishnapuram et al. (2005); Friedman et al. (2010); a kind of Newton approach

Cutting plane method: Teo et al. (2010)

Some methods find a solution path for different C values; e.g., Rosset (2005), Zhao and Yu (2007), Park and Hastie (2007), and Keerthi and Shevade (2007).

Here we focus on a single C


### Strengths and Weaknesses of Existing Methods

Convergence speed: higher-order methods (quasi Newton or Newton) have fast local convergence, but may fail to obtain a reasonable model quickly

Implementation efforts: higher-order methods are usually more complicated

Large data: if solving linear systems is needed, use iterative methods (e.g., CG) instead of direct methods

Feature correlation: methods working on some variables at a time (e.g., decomposition methods) may be efficient if features are almost independent

Coordinate Descent Methods

### Outline

Sparse Representation

Existing Optimization Methods
Coordinate Descent Methods
Other Methods

Experiments


### Coordinate Descent Methods I

Minimizing the one-variable function

g_{j}(z) ≡ |w_{j} + z| − |w_{j}| + L(w + ze_{j}) − L(w),
where

e_{j} ≡ [0, . . . , 0, 1, 0, . . . , 0]^{T}, with the 1 in the j th position

No closed-form solution

Genkin et al. (2007), Shalev-Shwartz and Tewari (2009), and Yuan et al. (2010)


### Coordinate Descent Methods II

They differ in how they minimize this one-variable problem

While g_{j}(z) is not differentiable, we can have a form similar to a Taylor expansion:

g_{j}(z) = g_{j}(0) + g_{j}′(0)z + (1/2) g_{j}″(ηz)z^{2}

Another representation (for our derivation):

min_{z} g_{j}(z) = |w_{j} + z| − |w_{j}| + L_{j}(z; w) − L_{j}(0; w),


### Coordinate Descent Methods III

where

L_{j}(z; w) ≡ L(w + ze_{j})

is a function of z


### BBR (Genkin et al., 2007) I

They rewrite g_{j}(z) as

g_{j}(z) = g_{j}(0) + g_{j}′(0)z + (1/2) g_{j}″(ηz)z^{2},

where 0 < η < 1 and

g_{j}′(0) ≡ L_{j}′(0) + 1 if w_{j} > 0, L_{j}′(0) − 1 if w_{j} < 0 (1)


### BBR (Genkin et al., 2007) II

g_{j}(z) is not differentiable if w_{j} = 0

BBR finds an upper bound U_{j} of g_{j}″(z) in a trust region:

U_{j} ≥ g_{j}″(z), ∀|z| ≤ ∆_{j}.

Then ĝ_{j}(z) is an upper-bound function of g_{j}(z):

ĝ_{j}(z) ≡ g_{j}(0) + g_{j}′(0)z + (1/2) U_{j}z^{2}.

Any step z satisfying ĝ_{j}(z) < ĝ_{j}(0) leads to

g_{j}(z) − g_{j}(0) = g_{j}(z) − ĝ_{j}(0) ≤ ĝ_{j}(z) − ĝ_{j}(0) < 0,


### BBR (Genkin et al., 2007) III

Convergence not proved (no sufficient decrease condition via line search)

Logistic loss:

U_{j} ≡ C Σ_{i=1}^{l} x_{ij}^{2} F(y_{i}w^{T}x_{i}, ∆_{j}|x_{ij}|),

where

F (r , δ) = 0.25 if |r | ≤ δ, and 1/(2 + e^{|r|−δ} + e^{δ−|r|}) otherwise


### BBR (Genkin et al., 2007) IV

The sub-problem solved in practice:

min_{z} ĝ_{j}(z)

s. t. |z| ≤ ∆_{j} and w_{j} + z ≥ 0 if w_{j} > 0, w_{j} + z ≤ 0 if w_{j} < 0.

Update rule:

d = min(max(P(−g_{j}′(0)/U_{j}, w_{j}), −∆_{j}), ∆_{j}),


### BBR (Genkin et al., 2007) V

where

P(z, w ) ≡ z if sgn(w + z) = sgn(w ), and −w otherwise.
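Putting the pieces together, one BBR-style coordinate update for logistic loss might look like the following sketch (my own illustration; `bbr_step` and its arguments are not from the slides, and the curvature bound uses F(r, δ) = 0.25 for |r| ≤ δ and 1/(2 + e^{|r|−δ} + e^{δ−|r|}) otherwise):

```python
import numpy as np

def bbr_step(X, y, w, j, C, delta):
    # One BBR-style coordinate update for logistic loss at w_j != 0,
    # with trust-region radius delta (a sketch, not the official BBR code).
    r = y * (X @ w)                              # y_i * w^T x_i
    tau = 1.0 / (1.0 + np.exp(-r))
    Lp = C * np.sum(y * X[:, j] * (tau - 1.0))   # L_j'(0)
    # F(r, delta_j |x_ij|) bounds the logistic curvature in the region.
    dj = delta * np.abs(X[:, j])
    F = np.where(np.abs(r) <= dj, 0.25,
                 1.0 / (2.0 + np.exp(np.abs(r) - dj) + np.exp(dj - np.abs(r))))
    U = C * np.sum(X[:, j] ** 2 * F)             # upper bound on g_j''
    gp = Lp + (1.0 if w[j] > 0 else -1.0)        # g_j'(0) for w_j != 0
    z = -gp / U                                  # minimizer of g_hat
    # P(z, w_j): do not let the step cross zero.
    if np.sign(w[j] + z) != np.sign(w[j]):
        z = -w[j]
    return np.clip(z, -delta, delta)             # stay in the trust region
```

Because ĝ_j upper-bounds g_j inside the trust region, this step cannot increase the objective.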


### SCD (Shalev-Shwartz and Tewari, 2009) I

SCD: stochastic coordinate descent
w = w^{+}− w^{−}

At each step, randomly select a variable from
{w_{1}^{+}, . . . , w_{n}^{+}, w_{1}^{−}, . . . , w_{n}^{−}}
One-variable sub-problem:

min_{z} g_{j}(z) ≡ z + L_{j}(z; w^{+} − w^{−}) − L_{j}(0; w^{+} − w^{−}),

subject to

w_{j}^{k,+} + z ≥ 0 or w_{j}^{k,−} + z ≥ 0,


### SCD (Shalev-Shwartz and Tewari, 2009) II

Second-order approximation similar to BBR:

ĝ_{j}(z) = g_{j}(0) + g_{j}′(0)z + (1/2) U_{j}z^{2},

where

g_{j}′(0) = 1 + L_{j}′(0) for w_{j}^{+}, 1 − L_{j}′(0) for w_{j}^{−}, and U_{j} ≥ g_{j}″(z), ∀z.

BBR: U_{j} an upper bound of g_{j}″(z) only in the trust region


### SCD (Shalev-Shwartz and Tewari, 2009) III

For logistic regression,

U_{j} = 0.25C Σ_{i=1}^{l} x_{ij}^{2} ≥ g_{j}″(z), ∀z.

Shalev-Shwartz and Tewari (2009) assume −1 ≤ x_{ij} ≤ 1, ∀i , j, so a simple upper bound is

U_{j} = 0.25Cl .
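As a quick numeric sanity check (synthetic data, my own illustration): when |x_ij| ≤ 1, the per-feature bound is never larger than the simple data-independent bound 0.25Cl:

```python
import numpy as np

rng = np.random.default_rng(2)
C, l, n = 1.0, 50, 8
X = rng.uniform(-1.0, 1.0, (l, n))           # SCD assumes -1 <= x_ij <= 1

U_feature = 0.25 * C * (X ** 2).sum(axis=0)  # per-feature bound U_j
U_simple = 0.25 * C * l                      # looser bound 0.25 * C * l
assert np.all(U_feature <= U_simple)         # since each x_ij^2 <= 1
print(U_feature.max(), U_simple)
```

The simpler bound avoids a pass over the data, at the cost of smaller (more conservative) steps.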


### CDN (Yuan et al., 2010) I

Newton step:

min_{z} g_{j}′(0)z + (1/2) g_{j}″(0)z^{2}.

That is,

min_{z} |w_{j} + z| − |w_{j}| + L_{j}′(0)z + (1/2) L_{j}″(0)z^{2}.

Second-order term not replaced by an upper bound

The function value may not decrease


### CDN (Yuan et al., 2010) II

Assume z is the optimal solution of the sub-problem; a line search is needed

Following Tseng and Yun (2007), accept a step λz with

g_{j}(λz) − g_{j}(0) ≤ σλ(L_{j}′(0)z + |w_{j} + z| − |w_{j}|)

This is slightly different from the traditional form of line search: now

|w_{j} + z| − |w_{j}|

must be taken into consideration

Convergence can be proved
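A sketch of one CDN coordinate update for logistic loss (my own illustration; the closed form below is the standard soft-threshold solution of the one-variable sub-problem, and the constants `sigma`, `beta` are assumed backtracking parameters):

```python
import numpy as np

def cdn_step(X, y, w, j, C, sigma=0.01, beta=0.5):
    # One CDN coordinate update for logistic loss (sketch).
    r = y * (X @ w)                                   # y_i * w^T x_i
    tau = 1.0 / (1.0 + np.exp(-r))
    Lp = C * np.sum(y * X[:, j] * (tau - 1.0))        # L_j'(0)
    Lpp = max(C * np.sum(X[:, j] ** 2 * tau * (1.0 - tau)), 1e-12)  # L_j''(0)
    # Closed-form minimizer of |w_j + z| - |w_j| + L' z + (1/2) L'' z^2:
    if Lp + 1.0 <= Lpp * w[j]:
        d = -(Lp + 1.0) / Lpp
    elif Lp - 1.0 >= Lpp * w[j]:
        d = -(Lp - 1.0) / Lpp
    else:
        d = -w[j]
    # Backtracking line search with the sufficient-decrease condition
    # g_j(lam*d) - g_j(0) <= sigma*lam*(L' d + |w_j + d| - |w_j|).
    def g(step):
        return (abs(w[j] + step) - abs(w[j])
                + C * np.sum(np.logaddexp(0.0, -(r + y * X[:, j] * step))
                             - np.logaddexp(0.0, -r)))
    bound = sigma * (Lp * d + abs(w[j] + d) - abs(w[j]))
    lam = 1.0
    while g(lam * d) > lam * bound and lam > 1e-8:
        lam *= beta
    return lam * d
```

Since the accepted step satisfies the sufficient-decrease condition with a non-positive right-hand side, the overall objective f cannot increase.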


### Calculating First and Second Order Information I

We have

L_{j}′(0) = dL(w + ze_{j})/dz |_{z=0} = ∇_{j}L(w)

L_{j}″(0) = d^{2}L(w + ze_{j})/dz^{2} |_{z=0} = ∇^{2}_{jj}L(w)


### Calculating First and Second Order Information II

For logistic loss:

L_{j}′(0) = C Σ_{i=1}^{l} y_{i}x_{ij} (τ (y_{i}w^{T}x_{i}) − 1),

L_{j}″(0) = C Σ_{i=1}^{l} x_{ij}^{2} τ (y_{i}w^{T}x_{i}) (1 − τ (y_{i}w^{T}x_{i})),

where

τ (s) ≡ 1/(1 + e^{−s})
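These derivative formulas are easy to verify numerically; the sketch below (synthetic data) checks them against central finite differences:

```python
import numpy as np

def L(w, X, y, C):
    # L(w) = C * sum_i log(1 + exp(-y_i w^T x_i))
    return C * np.sum(np.logaddexp(0.0, -y * (X @ w)))

def L_derivs(w, X, y, C, j):
    # Analytic L_j'(0) and L_j''(0) from the slides, tau(s) = 1/(1+e^{-s}).
    tau = 1.0 / (1.0 + np.exp(-y * (X @ w)))
    Lp = C * np.sum(y * X[:, j] * (tau - 1.0))
    Lpp = C * np.sum(X[:, j] ** 2 * tau * (1.0 - tau))
    return Lp, Lpp

# Compare against central finite differences along coordinate j.
rng = np.random.default_rng(3)
X = rng.standard_normal((40, 6))
y = np.where(rng.random(40) < 0.5, 1.0, -1.0)
w = rng.standard_normal(6)
C, j = 1.0, 2
e = np.zeros(6)
e[j] = 1.0
h = 1e-4
Lp, Lpp = L_derivs(w, X, y, C, j)
num_Lp = (L(w + h * e, X, y, C) - L(w - h * e, X, y, C)) / (2 * h)
num_Lpp = (L(w + h * e, X, y, C) - 2 * L(w, X, y, C)
           + L(w - h * e, X, y, C)) / h ** 2
print(Lp - num_Lp, Lpp - num_Lpp)   # both differences should be tiny
```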

Other Methods

### Outline

Sparse Representation

Existing Optimization Methods
Coordinate Descent Methods
Other Methods

Experiments


### GLMNET (Friedman et al., 2010) I

A quadratic approximation of L(w):

f (w + d) − f (w)
= (‖w + d‖_{1} + L(w + d)) − (‖w‖_{1} + L(w))
≈ ∇L(w)^{T}d + (1/2) d^{T}∇^{2}L(w)d + ‖w + d‖_{1} − ‖w‖_{1}.

Then

w ← w + d

Line search is needed for convergence


### GLMNET (Friedman et al., 2010) II

But how to handle quadratic minimization with some one-norm terms?

GLMNET uses coordinate descent

For logistic regression:

∇L(w) = C Σ_{i=1}^{l} (τ (y_{i}w^{T}x_{i}) − 1) y_{i}x_{i}

∇^{2}L(w) = CX^{T}DX ,

where D ∈ R^{l ×l} is a diagonal matrix with D_{ii} = τ (y_{i}w^{T}x_{i})(1 − τ (y_{i}w^{T}x_{i}))


### Bundle Methods (Teo et al., 2010) I

Also called the cutting plane method

L(w): a convex loss function

If w^{k} is the current solution,

L(w) ≥ ∇L(w^{k})^{T}(w − w^{k}) + L(w^{k}) = a^{T}_{k}w + b_{k}, ∀w,

where

a_{k} ≡ ∇L(w^{k}) and b_{k} ≡ L(w^{k}) − a^{T}_{k}w^{k}.


### Bundle Methods (Teo et al., 2010) II

Maintains all earlier cutting planes to form a lower-bound function for L(w):

L(w) ≥ L^{CP}_{k}(w) ≡ max_{1≤t≤k} a^{T}_{t}w + b_{t}, ∀w.

Obtain w^{k+1} by solving

min_{w} ‖w‖_{1} + L^{CP}_{k}(w).


### Bundle Methods (Teo et al., 2010) III

This is a linear program using w = w^{+} − w^{−}:

min_{w^{+},w^{−},ζ} Σ_{j=1}^{n} w_{j}^{+} + Σ_{j=1}^{n} w_{j}^{−} + ζ

subject to a^{T}_{t}(w^{+} − w^{−}) + b_{t} ≤ ζ, t = 1, . . . , k,
w_{j}^{+} ≥ 0, w_{j}^{−} ≥ 0, j = 1, . . . , n.
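The cutting-plane loop can be sketched with an off-the-shelf LP solver (my own illustration for logistic loss using SciPy's `linprog`; the box bound on w+/w− is an extra assumption, not in the slides, to keep the first LPs bounded):

```python
import numpy as np
from scipy.optimize import linprog

def L_log(w, X, y, C):
    # L(w) = C * sum_i log(1 + exp(-y_i w^T x_i))
    return C * np.sum(np.logaddexp(0.0, -y * (X @ w)))

def grad_log(w, X, y, C):
    tau = 1.0 / (1.0 + np.exp(-y * (X @ w)))
    return C * (X.T @ ((tau - 1.0) * y))

def bundle(X, y, C, iters=15, box=10.0):
    # Add a cutting plane a_k^T w + b_k at each iterate w^k, then solve
    # the LP over (w+, w-, zeta) from the slides.
    n = X.shape[1]
    w = np.zeros(n)
    c = np.concatenate([np.ones(2 * n), [1.0]])   # sum w+ + sum w- + zeta
    A, b = [], []
    for _ in range(iters):
        a_k = grad_log(w, X, y, C)
        b_k = L_log(w, X, y, C) - a_k @ w
        # a_k^T (w+ - w-) + b_k <= zeta, rewritten as A_ub x <= b_ub
        A.append(np.concatenate([a_k, -a_k, [-1.0]]))
        b.append(-b_k)
        res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                      bounds=[(0, box)] * (2 * n) + [(None, None)])
        w = res.x[:n] - res.x[n:2 * n]
    # res.fun = ||w||_1 + L^CP(w), a lower bound on ||w||_1 + L(w)
    return w, res.fun
```

Since every cut lower-bounds the convex L, the LP value never exceeds the true objective at the returned w, which gives a natural optimality gap.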

Experiments

### Outline

Sparse Representation

Existing Optimization Methods
Coordinate Descent Methods
Other Methods

Experiments


### Data

| Data set | l | n | # of non-zeros |
|---|---|---|---|
| real-sim | 72,309 | 20,958 | 3,709,083 |
| news20 | 19,996 | 1,355,191 | 9,097,916 |
| rcv1 | 677,399 | 47,236 | 49,556,258 |
| yahoo-korea | 460,554 | 3,052,939 | 156,436,656 |

l : number of data, n: number of features

They are all document sets

4/5 for training and 1/5 for testing

Select the best C by cross validation on training data


### Compared Methods

Software using w^{T}x without b:

BBR (Genkin et al., 2007)
SCD (Shalev-Shwartz and Tewari, 2009)
CDN: our coordinate descent implementation
TRON: our Newton implementation for the bound-constrained formulation
OWL-QN (Andrew and Gao, 2007)
BMRM (Teo et al., 2010)


### Compared Methods (Cont’d)

Software using w^{T}x + b:

CDN: our coordinate descent implementation
BBR (Genkin et al., 2007)
CGD-GS (Yun and Toh, 2009)
IPM (Koh et al., 2007)
GLMNET (Friedman et al., 2010)
Lassplore (Liu et al., 2009)


### Convergence of Objective Values (no b)

(Figures: convergence of objective values on real-sim and news20.)


### Test Accuracy

(Figures: test accuracy on real-sim, news20, rcv1, and yahoo-korea.)


### Convergence of Objective Values (with b)

(Figures: convergence of objective values on real-sim and news20.)


### Observations and Conclusions

Decomposition methods are better in the early stage

One-variable sub-problem in coordinate descent: use a tight approximation if possible

Newton (IPM, GLMNET) and quasi Newton (OWL-QN): fast local convergence in the end

We also checked gradients and sparsity

Complete results (on more data sets) and programs are in Yuan et al. (2010); JMLR 11, 3183–3234


### References I

G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In Proceedings of the Twenty Fourth International Conference on Machine Learning (ICML), 2007.

S. Balakrishnan and D. Madigan. Algorithms for sparse linear classifiers in the massive data setting. 2005. URL http://www.stat.rutgers.edu/~madigan/PAPERS/sm.pdf.

S. Benson and J. J. Moré. A limited memory variable metric method for bound constrained minimization. Preprint MCS-P909-0901, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, 2001.

D. L. Donoho and Y. Tsaig. Fast solution of l1 minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54:4789–4812, 2008.

J. Duchi and Y. Singer. Boosting with structural sparsity. In Proceedings of the Twenty Sixth International Conference on Machine Learning (ICML), 2009.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the L1-ball for learning in high dimensions. In Proceedings of the Twenty Fifth International Conference on Machine Learning (ICML), 2008.

M. A. T. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25:1150–1159, 2003.


### References II

E. M. Gafni and D. P. Bertsekas. Two-metric projection methods for constrained optimization. SIAM Journal on Control and Optimization, 22:936–964, 1984.

A. Genkin, D. D. Lewis, and D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3):291–304, 2007.

J. Goodman. Exponential priors for maximum entropy models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2004.

J. Kazama and J. Tsujii. Evaluation and extension of maximum entropy models with inequality constraints. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 137–144, 2003.

S. S. Keerthi and S. Shevade. A fast tracking algorithm for generalized LARS/LASSO. IEEE Transactions on Neural Networks, 18(6):1826–1830, 2007.

J. Kim, Y. Kim, and Y. Kim. A gradient-based optimization algorithm for LASSO. Journal of Computational and Graphical Statistics, 17(4):994–1009, 2008.

Y. Kim and J. Kim. Gradient LASSO for feature selection. In Proceedings of the 21st International Conference on Machine Learning (ICML), 2004.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132:1–63, 1997.


### References III

K. Koh, S.-J. Kim, and S. Boyd. An interior-point method for large-scale l1-regularized logistic regression. Journal of Machine Learning Research, 8:1519–1555, 2007. URL http://www.stanford.edu/~boyd/l1_logistic_reg.html.

B. Krishnapuram, A. J. Hartemink, L. Carin, and M. A. T. Figueiredo. A Bayesian approach to joint feature selection and classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):1105–1111, 2004.

B. Krishnapuram, L. Carin, M. A. T. Figueiredo, and A. J. Hartemink. Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6):957–968, 2005.

J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. Journal of Machine Learning Research, 10:771–801, 2009.

S.-I. Lee, H. Lee, P. Abbeel, and A. Y. Ng. Efficient l1 regularized logistic regression. In Proceedings of the Twenty-first National Conference on Artificial Intelligence (AAAI-06), pages 1–9, Boston, MA, USA, July 2006.

C.-J. Lin and J. J. Moré. Newton’s method for large-scale bound constrained problems. SIAM Journal on Optimization, 9:1100–1127, 1999.

J. Liu, J. Chen, and J. Ye. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.


### References IV

M. Y. Park and T. Hastie. L1 regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society Series B, 69:659–677, 2007.

S. Perkins, K. Lacker, and J. Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, 3:1333–1356, 2003.

S. Rosset. Following curved regularized optimization solution paths. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1153–1160, Cambridge, MA, 2005. MIT Press.

V. Roth. The generalized LASSO. IEEE Transactions on Neural Networks, 15(1):16–28, 2004.

M. Schmidt, G. Fung, and R. Rosales. Optimization methods for l1-regularization. Technical Report TR-2009-19, University of British Columbia, 2009.

S. Shalev-Shwartz and A. Tewari. Stochastic methods for l1 regularized loss minimization. In Proceedings of the Twenty Sixth International Conference on Machine Learning (ICML), 2009.

S. K. Shevade and S. S. Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003.

J. Shi, W. Yin, S. Osher, and P. Sajda. A fast hybrid algorithm for large scale l1-regularized logistic regression. Journal of Machine Learning Research, 11:713–741, 2010.

C. H. Teo, S. Vishwanathan, A. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. Journal of Machine Learning Research, 11:311–365, 2010.


### References V

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58:267–288, 1996.

P. Tseng and S. Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117:387–423, 2007.

S. J. Wright. Accelerated block-coordinate relaxation for regularized optimization. Technical report, University of Wisconsin, 2010.

J. Yu, S. Vishwanathan, S. Gunter, and N. N. Schraudolph. A quasi-Newton approach to nonsmooth convex optimization problems in machine learning. Journal of Machine Learning Research, 11:1–57, 2010.

G.-X. Yuan, K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. A comparison of optimization methods and software for large-scale l1-regularized linear classification. Journal of Machine Learning Research, 11:3183–3234, 2010. URL http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf.

S. Yun and K.-C. Toh. A coordinate gradient descent method for l1-regularized convex minimization. 2009. To appear in Computational Optimization and Applications.

P. Zhao and B. Yu. Stagewise lasso. Journal of Machine Learning Research, 8:2701–2726, 2007.