
It follows from (11) that, for all $t \ge 0$,

$$\|z(t+s)\|_p \le \|z(t)\|_p + \int_t^{t+s} \bigl[-\|z(\tau)\|_p + \ell_p\|z(\tau - h)\|_p\bigr]\,d\tau$$

for $s > 0$ and $p \in (1, \infty)$.

By virtue of 2) in Lemma 1, taking the limits $p \to 1^+$ and $p \to \infty$, we know that the above inequality still holds for all $p \in [1, \infty]$ and $s > 0$. Moreover, according to the definition of the Dini derivative, we obtain

$$\frac{d^+\|z(t)\|_p}{dt} \le -\|z(t)\|_p + \ell_p\|z(t-h)\|_p \le -\|z(t)\|_p + \ell_p \max_{-h \le \theta \le 0}\|z(t+\theta)\|_p$$

for $t \ge 0$ and $p \in [1, \infty]$. (12)

Therefore, applying the comparison theorem for functional differential inequalities (see, e.g., [8, pp. 37–38, Th. 6.9.4]), we have from (12) that $\|z(t)\|_p \le y_p(t)$ for $t \ge 0$ and $p \in [1, \infty]$, where $y_p(t)$ is the solution of the following one-dimensional differential-delay equation:

$$\frac{dy_p(t)}{dt} = -y_p(t) + \ell_p \max_{-h \le \theta \le 0} y_p(t+\theta), \qquad t \ge 0 \qquad (13)$$

with an initial condition satisfying $y_p(\theta) \ge \|z(\theta)\|_p$ for $-h \le \theta \le 0$. Noting that $\|\omega\|_p = \max_{-h \le \theta \le 0}\|z(\theta)\|_p$, we let $y_p(t) = \|\omega\|_p e^{-\varepsilon_p t}$ for $t \ge -h$. Then, by substituting $y_p(t)$ into (13), we see that the number $\varepsilon_p$ must satisfy (4), which has a unique positive solution. Therefore, we have $\|z(t)\|_p \le \|\omega\|_p e^{-\varepsilon_p t}$ for $t \ge 0$, where $\varepsilon_p > 0$ is the unique positive solution of the nonlinear equation (4). This implies that the equilibrium $z^* = 0 \in \mathbb{R}^n$ of system (9) is globally exponentially stable. If we take $\omega(\theta) = \phi(\theta) - x^*$ for $-h \le \theta \le 0$, then $z(t; \omega) = x(t; \phi) - x^*$ for $t \ge 0$. Hence, the equilibrium $x = x^* \in \mathbb{R}^n$ of system (1) is globally exponentially stable and (3) holds. The proof of Theorem 1 is completed.
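To make the step from (13) to (4) explicit, the substitution can be written out as follows. This is only a sketch based on the reconstructed form of (13) above; equation (4) itself does not appear on this page, and $\varepsilon_p$ simply denotes the exponential decay rate.

$$y_p(t) = \|\omega\|_p e^{-\varepsilon_p t} \;\Longrightarrow\; \frac{dy_p(t)}{dt} = -\varepsilon_p\|\omega\|_p e^{-\varepsilon_p t}, \qquad \max_{-h \le \theta \le 0} y_p(t+\theta) = \|\omega\|_p e^{-\varepsilon_p(t-h)}.$$

Substituting into (13) and dividing by $\|\omega\|_p e^{-\varepsilon_p t} > 0$ gives

$$-\varepsilon_p = -1 + \ell_p e^{\varepsilon_p h},$$

a scalar nonlinear equation that has a unique positive root whenever $\ell_p < 1$, which is the role played by (4) in the argument.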

Proof of Corollary 1: Let the map $G(x) = Wg(x) + J : \mathbb{R}^n \to \mathbb{R}^n$. It is clear that the map $G : \mathbb{R}^n \to \mathbb{R}^n$ is a $p$-contraction with constant $\ell_p = \|W\|_p < 1$. Therefore, by applying Theorem 1 to the network model (7) and noting the relation $u_i(t) = \lambda_i x_i(t)$ for $t \ge 0$ and $i = 1, 2, \ldots, n$, we obtain the proof.

REFERENCES

[1] V. S. Borkar and K. Soumyanath, “An analog scheme for fixed point computation—Part I: Theory,” IEEE Trans. Circuits Syst. I, vol. 44, pp. 351–355, Apr. 1997.

[2] A. Bouzerdoum and T. R. Pattison, “Neural network for quadratic optimization with bound constraints,” IEEE Trans. Neural Networks, vol. 4, pp. 293–303, Mar. 1993.

[3] Y. J. Cao and Q. H. Wu, “A note on stability of analog neural networks with time delays,” IEEE Trans. Neural Networks, vol. 7, pp. 1533–1535, Nov. 1996.

[4] K. Gopalsamy and X. Z. He, “Stability in asymmetric Hopfield nets with transmission delays,” Phys. D, vol. 76, no. 4, pp. 344–358, Sept. 1994.

[5] J. K. Hale, Theory of Functional Differential Equations. New York: Springer-Verlag, 1977.

[6] J. J. Hopfield, “Neurons with graded response have collective computational properties like those of two-state neurons,” in Proc. Nat. Academy Sci., vol. 81, 1984, pp. 3088–3092.

[7] D. G. Kelly, “Stability in contractive nonlinear neural networks,” IEEE Trans. Biomed. Eng., vol. 37, pp. 231–242, Mar. 1990.

[8] V. Lakshmikantham and S. Leela, Functional, Partial, Abstract, and Complex Differential Equations, vol. II of Differential and Integral Inequalities: Theory and Applications. New York: Academic, 1969.

[9] X. B. Liang and T. Yamaguchi, “On the global asymptotic stability independent of delay of neural networks,” IEICE Trans. Fundamentals, vol. E80-A, no. 1, pp. 247–250, Jan. 1997.

[10] X. B. Liang, Y. B. He, and L. D. Wu, “Some results on the stability of analog neural networks with time delays,” in Proc. IEEE Int. Joint Conf. Neural Networks, May 4–9, 1998, pp. 1388–1391.

[11] X. B. Liang and L. D. Wu, “Global exponential stability of a class of neural circuits,” IEEE Trans. Circuits Syst. I, vol. 46, pp. 748–751, June 1999.

[12] X. B. Liang and J. Si, “Global exponential stability of neural networks with globally Lipschitz continuous activations and its application to linear variational inequality problem,” IEEE Trans. Neural Networks, vol. 12, pp. 349–359, Mar. 2001.

[13] X. B. Liang, “A complete proof of global exponential convergence of a neural network for quadratic optimization with bound constraints,” IEEE Trans. Neural Networks, vol. 12, pp. 636–639, May 2001.

[14] J. Mallet-Paret and R. D. Nussbaum, “Global continuation and asymptotic behavior for periodic solutions of a differential-delay equation,” Ann. Mat. Pura Appl., vol. 145, pp. 33–128, 1986.

[15] C. M. Marcus and R. M. Westervelt, “Stability of analog neural networks with delay,” Phys. Rev. A, vol. 39, no. 1, pp. 347–359, Jan. 1989.

[16] I. W. Sandberg, “Some theorems on the dynamic response of nonlinear transistor networks,” Bell Syst. Tech. J., vol. 48, no. 1, pp. 35–54, Jan. 1969.

[17] M. Vidyasagar, Nonlinear Systems Analysis, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1993.

Asymptotic Convergence of an SMO Algorithm Without any Assumptions

Chih-Jen Lin

Abstract—The asymptotic convergence result of Lin can be applied to a modified SMO algorithm by Keerthi et al. under some assumptions. Here we show that, for this algorithm, those assumptions are not necessary.

Index Terms—Asymptotic convergence, decomposition, support vector machine (SVM).

I. INTRODUCTION

Given training vectors $x_i \in R^n$, $i = 1, \ldots, l$, in two classes, and a vector $y \in R^l$ such that $y_i \in \{1, -1\}$, support vector machines (SVMs) [9] require the solution of the following optimization problem:

$$\min_{\alpha}\ f(\alpha) = \frac{1}{2}\alpha^T Q\alpha - e^T\alpha \qquad \text{subject to}\quad 0 \le \alpha_i \le C,\ i = 1, \ldots, l, \qquad y^T\alpha = 0 \qquad (1)$$

Manuscript received September 13, 2001; revised November 19, 2001. This work was supported in part by the National Science Council of Taiwan under Grant NSC 90-2213-E-002-111.

The author is with the Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan (e-mail: cjlin@csie.ntu.edu.tw).

Publisher Item Identifier S 1045-9227(02)00943-8.


where $C > 0$ and $e$ is the vector of all ones. Training vectors $x_i$ are mapped into a higher dimensional space by the function $\phi$, and $Q_{ij} \equiv y_iy_jK(x_i, x_j)$, where $K(x_i, x_j) \equiv \phi(x_i)^T\phi(x_j)$ is the kernel. Due to the density of the matrix $Q$, the decomposition method is currently one of the major methods for solving (1) (e.g., [3], [7], and [8]). It is an iterative process: in each iteration the index set of the variables is separated into two sets $B$ and $N$, where $B$ is the working set. In that iteration the variables corresponding to $N$ are fixed while a subproblem in the variables corresponding to $B$ is minimized.
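As a concrete illustration of these quantities, the following minimal Python sketch builds $Q$, the objective $f(\alpha)$, and the gradient $\nabla f(\alpha) = Q\alpha - e$. The RBF kernel, the parameter gamma, and the NumPy representation are illustrative assumptions, not part of the paper.

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2); any other kernel could be used here.
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * dist2)

def build_Q(X, y, gamma=1.0):
    # Q[i, j] = y_i * y_j * K(x_i, x_j)
    return (y[:, None] * y[None, :]) * rbf_kernel(X, gamma)

def dual_objective(alpha, Q):
    # f(alpha) = 0.5 * alpha^T Q alpha - e^T alpha
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

def gradient(alpha, Q):
    # grad f(alpha) = Q alpha - e
    return Q @ alpha - np.ones_like(alpha)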

Among these methods, Platt’s sequential minimal optimization (SMO) [8] is a simple algorithm in which only two variables are selected for the working set in each iteration, so the subproblem can be solved analytically without optimization software. Keerthi et al. [5] pointed out a problem in the original SMO and proposed two modified versions. The one using the two indexes that maximally violate the Karush-Kuhn-Tucker (KKT) condition may now be the most popular implementation among SVM software (e.g., LIBSVM [1], SVMTorch [2]). It is also a special case of another popular software package, SVMlight [3]. Regarding convergence, Keerthi and Gilbert [4] have proved that, under a stopping criterion and for any stopping tolerance, the algorithm terminates in a finite number of iterations. However, this result does not imply asymptotic convergence. On the other hand, the asymptotic convergence result of Lin [6] for the software SVMlight can be applied to this algorithm when the size of the working set is restricted to two. However, [6, Assumption IV.1] requires that every two-by-two principal submatrix of the Hessian matrix $Q$ be positive definite. This assumption may not hold if, for example, some data points are identical. In this paper we show that the results in [6] still follow without this assumption. Hence, existing implementations are asymptotically convergent without any problem.

The method by Keerthi et al. is as follows. Using $y_i = \pm 1$, the KKT condition of (1) can be rewritten as

$$\max\Bigl(\max_{\alpha_i < C,\, y_i = 1} -\nabla f(\alpha)_i,\ \max_{\alpha_i > 0,\, y_i = -1} \nabla f(\alpha)_i\Bigr) \le \min\Bigl(\min_{\alpha_i < C,\, y_i = -1} \nabla f(\alpha)_i,\ \min_{\alpha_i > 0,\, y_i = 1} -\nabla f(\alpha)_i\Bigr) \qquad (2)$$

where $\nabla f(\alpha) = Q\alpha - e$ is the gradient of $f(\alpha)$ defined in (1). They then consider

$$i \equiv \arg\max\bigl(\{-\nabla f(\alpha)_t \mid y_t = 1,\ \alpha_t < C\} \cup \{\nabla f(\alpha)_t \mid y_t = -1,\ \alpha_t > 0\}\bigr) \qquad (3)$$

$$j \equiv \arg\min\bigl(\{\nabla f(\alpha)_t \mid y_t = -1,\ \alpha_t < C\} \cup \{-\nabla f(\alpha)_t \mid y_t = 1,\ \alpha_t > 0\}\bigr) \qquad (4)$$

and use $B \equiv \{i, j\}$ as the working set. That is, $i$ and $j$ are the two elements that violate the KKT condition the most.
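A minimal sketch of the selection rule (3) and (4) follows; the function name, the NumPy-based representation, and the tolerance are illustrative assumptions. It uses the fact that $-y_t\nabla f(\alpha)_t$ equals $-\nabla f(\alpha)_t$ when $y_t = 1$ and $\nabla f(\alpha)_t$ when $y_t = -1$, so a single signed score covers both sets.

import numpy as np

def select_working_set(alpha, grad, y, C, tol=1e-12):
    # Maximal violating pair of (3)-(4); grad = Q @ alpha - e.
    score = -y * grad
    # Candidates for i: {t : y_t = 1, alpha_t < C} union {t : y_t = -1, alpha_t > 0}.
    up = np.flatnonzero(((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0)))
    # Candidates for j: {t : y_t = -1, alpha_t < C} union {t : y_t = 1, alpha_t > 0}.
    low = np.flatnonzero(((y == -1) & (alpha < C)) | ((y == 1) & (alpha > 0)))
    i = up[np.argmax(score[up])]
    j = low[np.argmin(score[low])]
    # KKT condition (2): the largest "up" value does not exceed the smallest "low" value.
    if score[i] <= score[j] + tol:
        return None
    return i, j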

If $\{\alpha^k\}$ is the sequence generated by the decomposition method, asymptotic convergence means that any convergent subsequence goes to an optimum of (1). The finite-termination result of Keerthi and Gilbert cannot be extended here because the two sides of (2) are not continuous functions of $\alpha$. In [6], asymptotic convergence has been proved, but the author has to assume that the matrix $Q$ satisfies

$$\min_{I}\bigl(\min(\mathrm{eig}(Q_{II}))\bigr) > 0 \qquad (5)$$

where $I$ is any subset of $\{1, \ldots, l\}$ with $|I| \le 2$ and $\min(\mathrm{eig}(\cdot))$ is the smallest eigenvalue of a matrix [6, Assumption IV.1]. The main purpose of this paper is to show that (5) is not necessary.
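To see how (5) can fail, the following small check (illustrative only, using a linear kernel) shows that a two-by-two principal submatrix $Q_{II}$ is singular whenever two training points coincide:

import numpy as np

# Two identical training points and a linear kernel K(x, z) = x^T z.
X = np.array([[1.0, 2.0],
              [1.0, 2.0]])
y = np.array([1.0, -1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j K(x_i, x_j)

# The smallest eigenvalue is 0, so assumption (5) of [6] does not hold.
print(np.linalg.eigvalsh(Q))   # -> [ 0. 10.]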

II. MAIN RESULTS

The only reason why we need (5) is for [6, Lemma IV.2], which proves that there exists $\sigma > 0$ such that

$$f(\alpha^{k+1}) \le f(\alpha^k) - \frac{\sigma}{2}\|\alpha^{k+1} - \alpha^k\|^2, \quad \text{for all } k. \qquad (6)$$

In the following we will show that (6) is still valid without (5). First we note that if $\alpha^k$ is the current solution and $B = \{i, j\}$ is selected using (3) and (4), the required minimization of the subproblem takes place in the rectangle $S = [0, C] \times [0, C]$ along a path on which $y_i\alpha_i + y_j\alpha_j = -y_N^T\alpha_N^k$ is constant. Let the parametric change of $\alpha$ along this path be given by $\alpha(t)$:

$$\alpha_i(t) \equiv \alpha_i^k + ty_i, \qquad \alpha_j(t) \equiv \alpha_j^k - ty_j, \qquad \alpha_s(t) \equiv \alpha_s^k, \quad \forall s \ne i, j.$$

The subproblem is to minimize $\psi(t) \equiv f(\alpha(t))$ subject to $(\alpha_i(t), \alpha_j(t)) \in S$. Let $\bar{t}$ denote the solution of this problem and $\alpha^{k+1} = \alpha(\bar{t})$. Clearly,

$$|\bar{t}| = \frac{\|\alpha^{k+1} - \alpha^k\|}{\sqrt{2}}. \qquad (7)$$

As $\psi(t)$ is a quadratic function of $t$,

$$\psi(t) = \psi(0) + \psi'(0)t + \frac{\psi''(0)}{2}t^2. \qquad (8)$$

Since

$$\psi'(t) = \sum_{s=1}^{l}\nabla f(\alpha(t))_s\,\alpha_s'(t) = y_i\nabla f(\alpha(t))_i - y_j\nabla f(\alpha(t))_j = y_i\Bigl(\sum_{s=1}^{l}Q_{is}\alpha_s(t) - 1\Bigr) - y_j\Bigl(\sum_{s=1}^{l}Q_{js}\alpha_s(t) - 1\Bigr) \qquad (9)$$

and

$$\psi''(t) = Q_{ii} + Q_{jj} - 2y_iy_jQ_{ij} \qquad (10)$$

we have

$$\psi'(0) = y_i\nabla f(\alpha^k)_i - y_j\nabla f(\alpha^k)_j \qquad (11)$$

and

$$\psi''(0) = \phi(x_i)^T\phi(x_i) + \phi(x_j)^T\phi(x_j) - 2y_i^2y_j^2\phi(x_i)^T\phi(x_j) = \|\phi(x_i) - \phi(x_j)\|^2. \qquad (12)$$
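The analytic solution of this two-variable subproblem can be sketched as follows; the clipping logic and helper names are illustrative assumptions rather than the paper's code, with Q, y, and C following the conventions of the earlier sketch.

import numpy as np

def smo_step(alpha, Q, y, C, i, j):
    # One update along alpha_i(t) = alpha_i + t*y_i, alpha_j(t) = alpha_j - t*y_j.
    grad = Q @ alpha - np.ones_like(alpha)                    # grad f(alpha) = Q alpha - e
    dpsi = y[i] * grad[i] - y[j] * grad[j]                    # psi'(0), cf. (11)
    d2psi = Q[i, i] + Q[j, j] - 2.0 * y[i] * y[j] * Q[i, j]   # psi''(0), cf. (10)

    if d2psi > 0:
        t = -dpsi / d2psi                       # unconstrained minimizer of the quadratic (8)
    elif dpsi != 0:
        t = np.inf if dpsi < 0 else -np.inf     # psi is linear in t: move as far as allowed
    else:
        t = 0.0

    # Clip t so that (alpha_i(t), alpha_j(t)) stays in the rectangle S = [0, C]^2.
    lo, hi = -np.inf, np.inf
    for val, coef in ((alpha[i], y[i]), (alpha[j], -y[j])):   # val + coef*t must lie in [0, C]
        if coef > 0:
            lo, hi = max(lo, -val / coef), min(hi, (C - val) / coef)
        else:
            lo, hi = max(lo, (C - val) / coef), min(hi, -val / coef)
    t = min(max(t, lo), hi)

    new_alpha = alpha.copy()
    new_alpha[i] += t * y[i]
    new_alpha[j] -= t * y[j]
    return new_alpha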

Then our new lemma is as follows.

Lemma: If the working set is selected using (3) and (4), then there exists $\sigma > 0$ such that (6) holds for every $k$.

Proof: Since $Q$ is positive semidefinite, $\psi''(t) \ge 0$, so we can consider the following two cases.

Case 1) $\psi''(0) > 0$. Let $t^*$ denote the unconstrained minimizer of $\psi$, i.e., $t^* = -\psi'(0)/\psi''(0)$. Clearly, $\bar{t} = \lambda t^*$, where $0 < \lambda \le 1$. Then, by (8),

$$\psi(\bar{t}) - \psi(0) = -\lambda\frac{\psi'(0)^2}{\psi''(0)} + \frac{\lambda^2}{2}\frac{\psi'(0)^2}{\psi''(0)} \le -\frac{\lambda^2}{2}\frac{\psi'(0)^2}{\psi''(0)} = -\frac{\psi''(0)}{2}\bar{t}^2 = -\frac{\psi''(0)}{4}\|\alpha^{k+1} - \alpha^k\|^2 \qquad (13)$$

where the last equality follows from (7).


Case 2) $\psi''(t) = 0$. By (12), $\phi(x_i) = \phi(x_j)$. Using this, (9), and (11), we get

$$\psi'(0) = y_i\Bigl(\sum_{s=1}^{l}Q_{is}\alpha_s^k - 1\Bigr) - y_j\Bigl(\sum_{s=1}^{l}Q_{js}\alpha_s^k - 1\Bigr) = y_i\Bigl(\sum_{s=1}^{l}y_iy_s\phi(x_i)^T\phi(x_s)\alpha_s^k - 1\Bigr) - y_j\Bigl(\sum_{s=1}^{l}y_jy_s\phi(x_j)^T\phi(x_s)\alpha_s^k - 1\Bigr) = y_j - y_i.$$

By (11), since descent is assured, $\psi'(0) \ne 0$. Thus $y_i \ne y_j$ and hence $|\psi'(0)| = 2$. Since $\psi''(0) = \psi''(t) = 0$ implies that $\psi(t)$ is a linear function, with $\psi(\bar{t}) \le \psi(0)$ and $|\bar{t}| \le C$,

$$\psi(\bar{t}) - \psi(0) = -|\psi'(0)\bar{t}| \le -\frac{2}{C}\bar{t}^2 = -\frac{\|\alpha^{k+1} - \alpha^k\|^2}{C}. \qquad (14)$$

Note that $\psi(0) = f(\alpha^k)$ and $\psi(\bar{t}) = f(\alpha^{k+1})$. Thus, using (7), (10), and (14), if we set

$$\sigma \equiv \min\Bigl\{\frac{2}{C},\ \min_{i,j}\Bigl\{\frac{Q_{ii} + Q_{jj} - 2y_iy_jQ_{ij}}{2} : Q_{ii} + Q_{jj} - 2y_iy_jQ_{ij} > 0\Bigr\}\Bigr\}$$

then the proof is complete.
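The bound (6) can also be checked numerically on a toy problem. The script below is an illustrative sketch only; it reuses build_Q, dual_objective, select_working_set, and smo_step from the earlier sketches, and computes sigma from the formula above.

import numpy as np

# Reuses build_Q, dual_objective, select_working_set, and smo_step sketched earlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
C = 1.0
Q = build_Q(X, y, gamma=0.5)

# sigma = min{2/C, min over (i, j) of (Q_ii + Q_jj - 2 y_i y_j Q_ij)/2, over positive values}.
d = Q.diagonal()
pairwise = d[:, None] + d[None, :] - 2.0 * (y[:, None] * y[None, :]) * Q
sigma = min(2.0 / C, pairwise[pairwise > 1e-12].min() / 2.0)

alpha = np.zeros(len(y))
for k in range(50):
    pair = select_working_set(alpha, Q @ alpha - 1.0, y, C)
    if pair is None:
        break                                   # KKT condition (2) holds
    new_alpha = smo_step(alpha, Q, y, C, *pair)
    decrease = dual_objective(alpha, Q) - dual_objective(new_alpha, Q)
    # Inequality (6): f(alpha^{k+1}) <= f(alpha^k) - (sigma/2) * ||alpha^{k+1} - alpha^k||^2.
    assert decrease >= 0.5 * sigma * np.sum((new_alpha - alpha) ** 2) - 1e-9
    alpha = new_alpha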

III. CONCLUSION

Using [6, Th. IV.1], the results here can be extended to the decomposition method for support vector regression, which selects the two-element working set in a similar way. A future challenge will be to remove the same assumption when the size of the working set is larger than two.

ACKNOWLEDGMENT

The author thanks S. Keerthi for many helpful comments.

REFERENCES

[1] C.-C. Chang and C.-J. Lin. (2001) LIBSVM: A library for support vector machines. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm

[2] R. Collobert and S. Bengio, “SVMTorch: A support vector machine for large-scale regression and classification problems,” J. Machine Learning Research, vol. 1, pp. 143–160, 2001.

[3] T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods—Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1998.

[4] S. S. Keerthi and E. G. Gilbert, “Convergence of a generalized SMO algorithm for SVM classifier design,” Machine Learning, vol. 46, pp. 351–360, 2002.

[5] S. S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy, “Improvements to Platt’s SMO algorithm for SVM classifier design,” Neural Comput., vol. 13, pp. 637–649, 2001.

[6] C.-J. Lin, “On the convergence of the decomposition method for support vector machines,” IEEE Trans. Neural Networks, vol. 12, pp. 1288–1298, 2001.

[7] E. Osuna, R. Freund, and F. Girosi, “Training support vector machines: An application to face detection,” in Proc. CVPR’97, 1997.

[8] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods—Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1998.

[9] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

Comments on “Robust Stability for Interval Hopfield Neural Networks With Time Delay” by X. F. Liao

Linshan Wang

Abstract—This paper points out an unjustified inequality in the above paper; consequently, Liao's theorem, which is based on this inequality, is not established.

Index Terms—Hopfield-type neural network, robust stability, time delay.

In the above paper,¹ the interval Hopfield neural networks with time delay given by (1) and (2) (at the top of the next page) are considered. The following theorem was presented by Liao in 1998.

Theorem: Let $w_{ij}$, $w_{ij}^T$, $a_i$ be real constants $(i, j = 1, 2, \ldots, n)$ and assume that

$$w_{ii}^* - a_i < 0 \qquad (3)$$

and that

$$B := -\bigl[\mathrm{diag}(w_{11}^* - a_1,\ w_{22}^* - a_2,\ \ldots,\ w_{nn}^* - a_n) + \bigl((1 - \delta_{ij})w_{ij}^*\bigr)_{n \times n}\bigr] \qquad (4)$$

is an M-matrix, where $w_{ij}^* = \max\{|(w_{ij} + w_{ij}^T)\,df_i(\xi)/d\xi|,\ (w_{ij} + w_{ij}^T)\,df_i(\xi)/d\xi\}$. Then system (1) with (2) has a unique and robustly stable equilibrium $u^* = (u_1^*, u_2^*, \ldots, u_n^*)$ for each constant input $I = (I_1, I_2, \ldots, I_n) \in R^n$.

In the proof of the above theorem, the author derived an unjustified inequality. Since the proof of the theorem relies on this inequality, Liao's paper does not ensure that the theorem is true. To be specific, consider the following inequality obtained by Liao (see [1, p. 1044]):

$$\sum_{i=1}^{n}\Bigl[\xi_i\Bigl|a_iu_i - \sum_{j=1}^{n}(w_{ij} + w_{ij}^T)f_j(u_j)\Bigr|\Bigr] \ge \sum_{i=1}^{n}\Bigl[|\xi_ia_iu_i| - \Bigl|\sum_{j=1}^{n}\xi_i(w_{ij} + w_{ij}^T)f_j(u_j)\Bigr|\Bigr]$$

$$= \sum_{i=1}^{n}\Bigl[|\xi_ia_iu_i| - \Bigl|\sum_{j=1}^{n}\xi_j(w_{ji} + w_{ji}^T)f_i(u_i)\Bigr|\Bigr] = \sum_{i=1}^{n}\Bigl[\Bigl|\int_0^{u_i}\xi_ia_i\,d\theta\Bigr| - \Bigl|\sum_{j=1}^{n}\int_0^{u_j}\xi_j(w_{ji} + w_{ji}^T)\frac{df_i(\theta)}{d\theta}\,d\theta\Bigr|\Bigr]$$

$$\ge \sum_{i=1}^{n}\Bigl[\Bigl|\int_0^{u_i}\xi_ia_i\,d\theta\Bigr| - \Bigl|\sum_{j=1}^{n}\int_0^{u_j}\xi_j(w_{ji} + w_{ji}^T)\frac{df_i(\theta)}{d\theta}\,d\theta\Bigr|\Bigr]. \qquad (5)$$

Manuscript received April 15, 2001; revised August 3, 2001. This work was supported in part by the National Natural Science Foundation of China under Grant 19831030.

The author is with the Department of Mathematics, Qingdao University, Qingdao 266000, China, and also with the Mathematical College, Sichuan University, Chengdu, 610064, China.

Publisher Item Identifier S 1045-9227(02)00942-6.

¹X. Liao and J. Yu, IEEE Trans. Neural Networks, vol. 9, pp. 1042–1045, 1998.

