

4. Simulation studies and real data application

In this section, the performance of our algorithm on mixture regression problems is demonstrated, and its stability on data with outliers and on large-scale data is compared with the FWSVR of Franck et al. (2007) and with fuzzy c-regression (FCR). The performance measure is based on the following expression:

$\mathrm{MSE} = \frac{1}{n} \times \sum_{i=1}^{n} \left( y_i - \hat{y}_{ij} \right)^2$ (70)

where $\hat{y}_{ij} = \hat{\beta}_j^{\top} x_i + \hat{b}_j$ is the estimated value for the $i$th data point and the $j$th regression model.
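To make the measure concrete, the following is a minimal Python sketch of how (70) can be evaluated for a fitted mixture of linear models, under the assumption that each data point is scored against the regression model with the largest fuzzy membership; the function and argument names are illustrative, not taken from the thesis.

```python
import numpy as np

def mixture_mse(y, X, coefs, intercepts, memberships):
    """Performance measure in the spirit of Eq. (70).

    y            : (n,) responses
    X            : (n, p) covariates
    coefs        : (c, p) slope vectors, one row per regression model
    intercepts   : (c,) intercepts
    memberships  : (n, c) fuzzy memberships of each point in each model
    """
    preds = X @ coefs.T + intercepts           # (n, c) fitted values for every model
    j = memberships.argmax(axis=1)             # assumed: score each point against its best model
    y_hat = preds[np.arange(len(y)), j]        # pick that model's prediction
    return np.mean((y - y_hat) ** 2)
```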

In the following experiments, we can control the penalty on the data points assigned to the wrong regression line by tuning γ. In other words, a bigger γ means less tolerance of the penalty term, and a smaller γ means higher tolerance of the penalty term. The parameter ε gives the margin of tolerance; that is, we accept errors of the data within this margin. So the larger ε is, the larger the tolerance margin and the more data are ignored; with a small ε, more data errors are taken into account, but it also becomes easy to overfit. We fixed ε = 0.01 and γ = 100 for the FWSVR-DCD by cross validation, and the degree of fuzziness m of the fuzzy weights is fixed at 2, the setting adopted by most papers, e.g. Yang et al. (2008).
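As a small illustration of how γ and ε enter the objective, the sketch below evaluates the ε-insensitive part of the SVR loss: residuals inside the ±ε tube cost nothing, and γ scales the penalty on the remaining slack. The default values mirror the settings above; the function name is ours, not the thesis's notation.

```python
import numpy as np

def eps_insensitive_penalty(y, y_hat, eps=0.01, gamma=100.0):
    """gamma * sum(max(|y - y_hat| - eps, 0)): errors inside the +/- eps
    margin are ignored; gamma controls how harshly the rest are penalized."""
    slack = np.maximum(np.abs(np.asarray(y) - np.asarray(y_hat)) - eps, 0.0)
    return gamma * slack.sum()
```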

As the curve in Fig 3 shows, the MSE converges to a small, acceptable value when m is greater than or equal to 2. We assume that every line is corrupted by Gaussian noise with zero mean and variance σ², and that the jth line has n_j samples.
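The exact fuzzy-weight formula of the FWSVR-DCD is not reproduced here; as a stand-in, the sketch below computes standard fuzzy c-regression style memberships, which is enough to show how the degree of fuzziness m enters the weighting (the memberships become more uniform as m grows).

```python
import numpy as np

def fcr_memberships(y, X, coefs, intercepts, m=2.0, tiny=1e-12):
    """Fuzzy c-regression style memberships (illustrative stand-in; the
    thesis's fuzzy weights may be defined differently). Requires m > 1.
    Returns an (n, c) matrix whose rows sum to one."""
    resid = y[:, None] - (X @ coefs.T + intercepts)   # (n, c) residuals to each line
    d = resid ** 2 + tiny                             # squared residual distances
    inv = d ** (-1.0 / (m - 1.0))                     # fuzzifier m controls the sharpness
    return inv / inv.sum(axis=1, keepdims=True)       # normalize so each row sums to one
```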

In the first case, we compare the FWSVR-DCD with and without the alpha cut. The training data sets are defined as follows:

$y_{ij} = \beta_j x_{ij} + b_j + e_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, c, \quad c = 2.$

The MSEs of the two versions are close, but the execution time with the alpha cut is smaller than without it.

Therefore, introducing the alpha-cut method can indeed reduce the computing time and obtain the results more efficiently.
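A minimal sketch of the alpha-cut idea, under the assumption (in the spirit of the alpha-cut implemented algorithms of Yang et al. [7]) that points whose membership in a model falls below a threshold α are simply skipped in that model's update, so each iteration touches fewer points; the threshold value below is only a placeholder.

```python
import numpy as np

def alpha_cut_indices(memberships, alpha=0.5):
    """For each regression model j, keep only the points whose membership
    exceeds alpha; the excluded points are skipped in that model's update,
    which is where the reduction in computing time comes from.
    alpha is a tuning threshold; 0.5 is only a placeholder value."""
    n, c = memberships.shape
    return [np.flatnonzero(memberships[:, j] >= alpha) for j in range(c)]
```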

Table 1. The comparative results of FWSVR-DCD with and without alpha cut

Figure 4. The results of the FWSVR-DCD with (the left figure) and without (the right figure) alpha cut

In the second case, we compared the FWSVR-DCD, the FWSVR proposed by Franck et al. (2007), and fuzzy c-regression (FCR) in large-scale cases. The training data are similar to the three-crossing-lines setting in Franck (2007):

$y_{ij} = \beta_j x_{ij} + b_j + e_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, c, \quad c = 3.$

Fig 5 shows the results of the FWSVR-DCD and of the FWSVR proposed by Franck (2007); both fit the data well.

But as we can see in Table 2, the mean number of iterations and the mean execution time of the FWSVR-DCD are both smaller, which is the benefit of introducing the dual coordinate descent method and applying it to FWSVR.

Table 2. The comparison of FWSVR-DCD, FWSVR, FCR for large-scale data
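For reference, the sketch below runs dual coordinate descent for a plain (unweighted) linear ε-SVR in the style of Ho and Lin (2012) [6]: one dual variable is updated at a time by a soft-threshold plus box projection, while w is maintained incrementally so each update costs O(p). The fuzzy-weighted version used in this thesis additionally brings the fuzzy weights into the per-point penalties, which is omitted here; all names are ours.

```python
import numpy as np

def dcd_linear_svr(X, y, C=100.0, eps=0.01, n_epochs=50, seed=0):
    """Dual coordinate descent for plain linear eps-SVR (L1 loss).
    Each point's dual variable beta_i is updated one at a time while
    w = sum_i beta_i * x_i is kept up to date, so one update is O(p)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(n)              # dual variables, constrained to [-C, C]
    w = np.zeros(p)                 # primal weight vector
    q = (X ** 2).sum(axis=1)        # diagonal of the Gram matrix, Q_ii = x_i . x_i
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            if q[i] == 0.0:
                continue
            g = X[i] @ w - y[i]                                  # gradient of the smooth part
            z = beta[i] - g / q[i]                               # unconstrained quadratic minimizer
            new = np.sign(z) * max(abs(z) - eps / q[i], 0.0)     # soft-threshold for the eps*|beta_i| term
            new = min(max(new, -C), C)                           # project back onto the box
            if new != beta[i]:
                w += (new - beta[i]) * X[i]                      # keep w consistent with beta
                beta[i] = new
    return w

# usage sketch: append a constant 1 column to X to absorb the intercept,
# then predict with X_new @ w.
```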


Figure 5. Results of FWSVR-DCD (a), FWSVR (b), and FCR (c) in the large-scale case

To illustrate the robustness of the FWSVR-DCD, we also apply the method to data with different percentages of outliers. The training data continue to use the above settings, with some outliers added as follows:

$y_{ij} = \beta_j x_{ij} + b_j + e_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, c, \quad c = 3,$

with $n_1 = n_2 = n_3 = 100$, covariates $x_{ij} \sim N(0, 100)$, noise terms $e_{ij} \sim N(0, \sigma^2)$ with $\sigma = 4$, and the slopes and intercepts of the three crossing lines as above. The outliers are generated from a bivariate normal distribution with mean $(50, 50)^{\top}$ and covariance $\mathrm{diag}(20, 20)$, and we have tried several cases for the number of outliers. The results for 50 outliers are shown in Fig 6 and the estimated coefficients are shown in Table 3. Although FCR is the fastest method, it is easily affected by outliers. In comparison, both FWSVR-DCD and FWSVR are more stable in this situation, and the former requires less execution time.
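The following is a sketch of a data-generating routine in the spirit of this setting: three regression lines with Gaussian noise (covariates from N(0, 100), σ = 4) plus outliers from the bivariate normal with mean (50, 50) and covariance diag(20, 20) stated above. The slopes and intercepts are illustrative placeholders, not the thesis's values.

```python
import numpy as np

def make_lines_with_outliers(n_per_line=100, sigma=4.0, n_out=50, seed=0):
    """Three regression lines with Gaussian noise plus concentrated outliers.
    Slopes/intercepts below are placeholders, not the thesis's settings."""
    rng = np.random.default_rng(seed)
    slopes = [3.0, 0.0, -3.0]        # placeholder crossing-line slopes
    intercepts = [0.0, 10.0, -10.0]  # placeholder intercepts
    xs, ys = [], []
    for b1, b0 in zip(slopes, intercepts):
        x = rng.normal(0.0, np.sqrt(100.0), size=n_per_line)   # x ~ N(0, 100), variance 100
        e = rng.normal(0.0, sigma, size=n_per_line)            # e ~ N(0, sigma^2), sigma = 4
        xs.append(x)
        ys.append(b1 * x + b0 + e)
    # outliers from a bivariate normal with mean (50, 50) and covariance diag(20, 20)
    out = rng.multivariate_normal([50.0, 50.0], np.diag([20.0, 20.0]), size=n_out)
    x_all = np.concatenate(xs + [out[:, 0]])
    y_all = np.concatenate(ys + [out[:, 1]])
    return x_all, y_all
```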

Table 3. The comparison of FWSVR-DCD, FWSVR, FCR for data with outliers

In the previous cases, we considered different straight lines with background noise, which is the special type of outlier considered by Franck. Furthermore, we also consider another case where the outliers are concentrated in one place.

Example 1. Two curves with 10 outliers:

$y_{ij} = \beta_{j2} x_{ij}^2 + \beta_{j1} x_{ij} + b_j + e_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, c, \quad c = 2.$

Example 2. A curve and a straight line with 10 outliers:

$y_{ij} = \beta_{j2} x_{ij}^2 + \beta_{j1} x_{ij} + b_j + e_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, c, \quad c = 2.$

The results are shown in Fig 7 and Fig 8. If the outliers are concentrated in one place, as shown in Figs 7, 8, and 9, the estimated coefficients will be seriously affected when the number of outliers is too large. In the above cases, when the number of outliers exceeds 50 percent of the overall data, the fitted regression models perform badly; the results are shown in Fig 9.

Figure 7. The results of Example 1

Figure 8. The results of Example 2

Figure 9. The results of data with too many outliers

First, we note that the influence of the fuzzy weights on the data is more effective and the MSE is smaller when m increases. The introduction of DCD and the alpha cut clearly makes the convergence more efficient and stable. Across the different cases shown above, the results of FWSVR-DCD and FWSVR are stable. However, FCR sometimes did not perform well on data with outliers, because FCR is easily affected by initial values, whereas FWSVR-DCD reduces the sensitivity to outliers through the design of its model and the adjustment of its parameters. As before, these results confirm that FWSVR-DCD consumes much less time than FWSVR on large-scale data, while keeping its robustness on data with outliers.

Finally, we applied the FWSVR-DCD to real data; a typical data set is the tone perception data (Cohen 1984). In the tone perception experiment, a pure fundamental tone was played to a trained musician, and electronically generated overtones determined by a stretching ratio were added.

150 data points were obtained from the same musician, and the aim of this experiment was to find out how the tuning ratio affects the perception of the tone and to decide whether either of two musical perception theories was reasonable (see Cohen [13] for more details). We set γ = 100, ε = … for this data set. This data has also been analyzed by Bai et al. (2012) [12], and our result is consistent with theirs. Based on Fig. 10, two lines are evident, which correspond to the behavior indicated by the two musical perception theories.

We also applied the FWSVR-DCD to another real data set, the Aphids data (Boiteau et al. 1998 [14]). This data contains the results of 51 independent experiments in which varying numbers of aphids were released in a flight chamber containing 12 infected and 69 healthy tobacco plants. After 24 hours, the flight chamber was fumigated to kill the aphids, and the previously healthy plants were moved to a greenhouse and monitored to detect symptoms of infection. The number of plants displaying such symptoms was recorded. This data has also been analyzed by Grün and Leisch (2008) [15]. By cross validation, we set γ = 1, ε = 0.1, and m = 2, and the result is shown in Fig. 11.

Because we consider the FWSVR-DCD only in linear model cases, the fitted regression lines are a little different from Grün's results, but roughly it can be seen that the fitted regression models of the two groups are actually quite similar.

Figure 10. The scatter plot of the tone perception data and the two fitted lines by Bai (left) and FWSVR-DCD (right)

Figure 11. The scatter plot of the Aphids data with fitted regression lines by Grün (left) and FWSVR-DCD (right)

From the simulation studies and real data applications, the FWSVR-DCD executes faster than the standard FWSVR while keeping its robustness in outlier cases, and it also performs well on real data. The above results are all for linear models, but SVM can handle nonlinear cases with different kernel functions, for example the polynomial kernel and the Gaussian kernel, so kernel functions may be applied to extend the FWSVR to nonlinear cases in the future.

5. Conclusion

The contribution of this thesis is to improve the computing time of fuzzy weighted support vector regression for large-scale problems and for data with outliers. We applied the dual coordinate descent method to the FWSVR; owing to the properties of the DCD method, it reduces the computing time of the FWSVR and gives stable results on data with noise or outliers. In addition, since the memberships are developed from fuzzy theory, we apply the alpha-cut method to further improve the efficiency of the calculation. The simulation results show that the proposed method yields efficient and accurate regression models for large-scale problems and keeps its robustness on data with outliers. By analyzing a real example, the tone perception data, we demonstrate the practicality of the proposed method: it effectively obtains good estimates from the tone perception data.

6. References

[1] Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.

[2] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.

[3] Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal of Machine Learning Research, 2(Dec), 125-137.

[4] Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199-222.

[5] Dufrenois, F., & Hamad, D. (2007, August). Fuzzy weighted support vector regression for multiple linear model estimation: application to object tracking in image sequences. In 2007 International Joint Conference on Neural Networks (pp. 1289-1294). IEEE.

[6] Ho, C. H., & Lin, C. J. (2012). Large-scale linear support vector regression. The Journal of Machine Learning Research, 13(1), 3323-3348.

[7] Yang, M. S., Wu, K. L., Hsieh, J. N., & Yu, J. (2008). Alpha-cut implemented fuzzy clustering algorithms and switching regressions. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(3), 588-603.

[8] Drucker, H., Burges, C. J., Kaufman, L., Smola, A. J., & Vapnik, V. (1997). Support vector regression machines. In Advances in Neural Information Processing Systems (pp. 155-161).

[9] Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines.

[10] Flake, G. W., & Lawrence, S. (2002). Efficient SVM regression training with SMO. Machine Learning, 46(1-3), 271-290.

[11] Hsieh, C. J., Chang, K. W., Lin, C. J., Keerthi, S. S., & Sundararajan, S. (2008, July). A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning (pp. 408-415).

[12] Bai, X., Yao, W., & Boyer, J. E. (2012). Robust fitting of mixture regression models. Computational Statistics & Data Analysis, 56(7), 2347-2359.

[13] Cohen, E. A. (1984). Some effects of inharmonic partials on interval perception. Music Perception, 1(3), 323-349.

[14] Boiteau, G., Singh, M., Singh, R. P., Tai, G. C. C., & Turner, T. R. (1998). Rate of spread of PVYn by alate Myzus persicae (Sulzer) from infected to healthy plants under laboratory conditions. Potato Research, 41(4), 335-344.

[15] Grün, B., & Leisch, F. (2008). Finite mixtures of generalized linear regression models. In Recent Advances in Linear Models and Related Areas (pp. 205-230). Physica-Verlag HD.
