

3.5 Comparison with Real-valued ECC

Our ML-ECC framework considers only binary ECCs. In this section, we compare the ML-ECC framework with two real-valued ECC methods: output coding with canonical correlation analysis (CCA-OC) [Zhang and Schneider, 2011] and max-margin output coding (MaxMargin) [Zhang and Schneider, 2012]. The former uses canonical correlation analysis to find the most linearly-predictable transform of the original labels; the latter uses metric learning to locate a good encoding function. Both methods use approximate inference for decoding.

The main difference between the real-valued ECC methods and our ML-ECC framework is that our encoding functions transform the label vector into a binary codeword, whereas the real-valued ECC methods transform it into a real-valued codeword. Moreover, the base learners in our framework solve classification tasks, while the base learners in the real-valued ECC methods solve regression tasks.
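
To make this contrast concrete, the following toy sketch (our own illustration, not the thesis code; the parity pattern and the projection matrix are arbitrary placeholders rather than the actual HAMR/BCH or CCA-OC encoders) shows the two kinds of codewords and the base learning tasks they induce:

    import numpy as np

    K = 6
    y = np.array([1, 0, 1, 1, 0, 0])   # label vector in {0,1}^K

    # Binary codeword: original labels plus parity bits; each bit becomes a
    # binary classification task for the base learner.
    parity = np.array([y[0] ^ y[1], y[2] ^ y[3], y[4] ^ y[5]])
    binary_codeword = np.concatenate([y, parity])

    # Real-valued codeword: original labels plus projections onto learned
    # directions (a random matrix stands in for the learned transform); each
    # extra "bit" becomes a regression task for the base learner.
    rng = np.random.default_rng(0)
    V = rng.normal(size=(3, K))
    real_codeword = np.concatenate([y.astype(float), V @ y])

    print(binary_codeword)   # length K + 3, entries in {0, 1}
    print(real_codeword)     # length K + 3, last 3 entries real-valued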

The experimental setting is basically the same as in Section 2.4, but we only use the scene and emotions datasets, with Random Forests or logistic regression base learners. Both real-valued ECC methods limit their codeword length to at most twice the number of labels: the codeword contains K binary bits for the original labels and at most K real-valued bits. In the following experiments, we take all 2K binary and real-valued bits for the real-valued ECC methods. There is a parameter λ in decoding that balances the two parts, and we set it to 1 (equally weighted). For our ML-ECC framework, we consider the HAMR and BCH codes with the proposed soft-input geometric decoder, using 127-bit binary codewords.
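
As a rough sketch of where λ enters (our own notation; the exact objectives and approximate inference procedures are given in Zhang and Schneider [2011, 2012]), the decoding of a real-valued ECC can be pictured as

\[
\hat{\mathbf{y}} \;=\; \operatorname*{argmin}_{\mathbf{y}\in\{0,1\}^K}\;
d\bigl(\mathbf{y},\tilde{\mathbf{b}}\bigr)
\;+\;\lambda\,\bigl\lVert V\mathbf{y}-\tilde{\mathbf{r}}\bigr\rVert_2^2 ,
\]

where \(\tilde{\mathbf{b}}\) collects the predictions for the K binary bits, \(\tilde{\mathbf{r}}\) the predictions for the real-valued bits, \(V\) the learned real-valued encoding, and \(d\) a distance on the binary part; λ = 1 weighs the two parts equally, which is the setting we use. This is a schematic form only; both methods replace the exact argmin with approximate inference.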

The results with Random Forests base learners are shown in Table 3.6, and the results with logistic regression base learners are shown in Table 3.7. With a stronger base learner like Random Forests, the HAMR and BCH codes are better than both real-valued ECC methods on the two datasets and on most of the measures. With the logistic regression base learner, while the BCH code performs the best on the scene dataset, it only wins on 0/1 loss on the emotions dataset. The real-valued ECC methods give higher micro- and macro-F1 scores than HAMR and BCH on the emotions dataset. The reason may be

Table 3.6: Comparison between ML-ECC and real-valued ECC methods using Random Forests base learners

scene

ECC              0/1 loss         Hamming loss     Micro-F1         Macro-F1
HAMR-geo-soft    .3294 ± .0021    .0726 ± .0005    .7710 ± .0018    .7741 ± .0017
BCH-geo-soft     .2526 ± .0020    .0703 ± .0006    .7944 ± .0019    .8006 ± .0018
CCA-OC           .3165 ± .0010    .0934 ± .0003    .7319 ± .0010    .7403 ± .0010

emotions

ECC              0/1 loss         Hamming loss     Micro-F1         Macro-F1
HAMR-geo-soft    .6584 ± .0053    .1763 ± .0018    .7023 ± .0031    .6756 ± .0034
BCH-geo-soft     .6264 ± .0049    .1822 ± .0016    .7153 ± .0025    .6975 ± .0025
CCA-OC           .6728 ± .0021    .2022 ± .0009    .6920 ± .0014    .6824 ± .0014

Table 3.7: Comparison between ML-ECC and real-valued ECC methods using logistic regression base learners

scene

ECC              0/1 loss         Hamming loss     Micro-F1         Macro-F1
HAMR-geo-soft    .3875 ± .0024    .0920 ± .0006    .7156 ± .0019    .7204 ± .0017
BCH-geo-soft     .3142 ± .0016    .0913 ± .0006    .7337 ± .0016    .7397 ± .0014
CCA-OC           .3599 ± .0011    .1088 ± .0004    .6875 ± .0011    .6952 ± .0011
MaxMargin        .3654 ± .0023    .1107 ± .0009    .6820 ± .0024    .6889 ± .0025

emotions

ECC              0/1 loss         Hamming loss     Micro-F1         Macro-F1
HAMR-geo-soft    .7177 ± .0069    .2032 ± .0024    .6544 ± .0045    .6310 ± .0047
BCH-geo-soft     .6787 ± .0046    .2170 ± .0026    .6652 ± .0043    .6499 ± .0044
CCA-OC           .6814 ± .0024    .2068 ± .0006    .6791 ± .0008    .6715 ± .0010
MaxMargin        .6855 ± .0030    .2099 ± .0009    .6768 ± .0013    .6679 ± .0014

that the power of the logistic regression base learner is limited, and the parity bits of HAMR and BCH are too sophisticated for it to learn. On the other hand, the codes generated by the CCA-OC and MaxMargin methods are easier for such a linear model to learn. For sufficiently sophisticated base learners, the proposed discrete-ECC-based framework is the better choice for multi-label classification with error-correcting codes.
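
For reference, here is a minimal sketch (our own helper, following the standard definitions of the four reported measures; the convention of scoring labels with no positive instances as 0 in macro-F1 is an assumption) of how the measures can be computed from a ground-truth label matrix Y and a prediction matrix Yhat:

    import numpy as np

    def multilabel_measures(Y, Yhat):
        """Y, Yhat: (n_instances, K) binary arrays."""
        zero_one = np.mean(np.any(Y != Yhat, axis=1))   # 0/1 loss: any bit wrong
        hamming = np.mean(Y != Yhat)                    # Hamming loss: fraction of wrong bits
        # micro-F1: pooled true/false positives and false negatives over all labels
        tp = np.sum(Y * Yhat); fp = np.sum((1 - Y) * Yhat); fn = np.sum(Y * (1 - Yhat))
        micro_f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        # macro-F1: per-label F1, then averaged over labels
        tp_l = np.sum(Y * Yhat, axis=0)
        fp_l = np.sum((1 - Y) * Yhat, axis=0)
        fn_l = np.sum(Y * (1 - Yhat), axis=0)
        macro_f1 = np.mean(2 * tp_l / np.maximum(2 * tp_l + fp_l + fn_l, 1))
        return zero_one, hamming, micro_f1, macro_f1

    Y    = np.array([[1, 0, 1], [0, 1, 0]])
    Yhat = np.array([[1, 0, 0], [0, 1, 0]])
    print(multilabel_measures(Y, Yhat))   # (0.5, 1/6, 0.8, 2/3)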

Chapter 4 Conclusion

We presented a framework for applying ECCs to multi-label classification. We then studied the use of four classic ECC designs, namely RREP, HAMR, BCH, and LDPC.

We showed that RREP gives a new perspective on the RAKEL algorithm as a special instance of the framework with the k-powerset as the base learner.

We conducted experiments with the four ECC designs on various real-world datasets.

The experiments further clarified the trade-off between the strength of the ECC and the hardness of the base learning tasks. Experimental results demonstrated that several ECC designs can make better use of this trade-off. For instance, HAMR is superior to RREP for k-powerset base learners because it leads to a new algorithm that is better than the original RAKEL in terms of 0/1 loss while maintaining a comparable level of Hamming loss; BCH is another superior design, which can significantly improve over RAKEL in terms of 0/1 loss. When compared with the traditional BR algorithm, we showed that using a stronger ECC like HAMR or BCH can lead to better performance in terms of both 0/1 and Hamming loss.

The results justify the validity and usefulness of the framework when coupled with classic ECCs. An interesting future direction is to consider adaptive ECCs like the ones studied for multi-class classification [Schapire, 1997, Li, 2006].

Besides the framework, we also presented a novel geometric decoder for general linear codes, based on approximating the XOR operation by multiplication. This decoder can take not only hard inputs, as algebraic decoders do, but also soft inputs from the channel. The soft inputs may be gathered from the base learner channels as their confidence that an instance belongs to a class. The experimental results on this new decoder demonstrated that it outperforms the ordinary decoder in terms of 0/1 loss, and that using soft inputs from the Binary Relevance learner further improves its Hamming loss. We also proposed and studied several methods to gather soft inputs from the k-powerset learner. The results show that different ECC designs match different methods better. Appropriately computing the 1-bit confidence from the k-bit confidence remains an interesting research problem.
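
A minimal sketch (our own illustration, not the thesis code) of the multiplication-based view of XOR that underlies the geometric decoder: mapping a bit b ∈ {0,1} to x = 1 − 2b ∈ {+1,−1} turns XOR into a product, so soft bit confidences can be combined multiplicatively, as in standard soft-decision decoding.

    import numpy as np

    # Map a bit b in {0,1} to x = 1 - 2b in {+1,-1}; the XOR of several bits then
    # equals the product of their +/-1 images, so a parity check can be evaluated
    # (and relaxed to real values) by multiplication.
    def to_pm1(b):
        return 1.0 - 2.0 * b

    bits = np.array([1, 0, 1])
    assert to_pm1(bits[0] ^ bits[1] ^ bits[2]) == np.prod(to_pm1(bits))

    # Soft relaxation: replace hard bits with expectations. If p_i = P(b_i = 1)
    # comes from a base learner channel, then E[x_i] = 1 - 2 p_i, and for
    # independent bits the expected +/-1 value of their XOR is the product of
    # the E[x_i] (the classic soft-XOR rule).
    p = np.array([0.9, 0.2, 0.7])            # per-bit confidences (assumed values)
    soft_parity = np.prod(1.0 - 2.0 * p)     # in [-1, +1]; sign gives the likely parity
    p_parity_is_1 = (1.0 - soft_parity) / 2  # probability that the XOR equals 1
    print(soft_parity, p_parity_is_1)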

Bibliography

R. C. Bose and D. K. Ray-Chaudhuri. On a class of error correcting binary group codes. Information and Control, 3(1), 1960.

M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9), 2004.

C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier. On label dependence in multi-label classification. In Proc. of the 2nd International Workshop on Learning from Multi-Label Data, 2010.

T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 1995.

S. Diplaris, G. Tsoumakas, P. Mitkas, and I. Vlahavas. Protein classification with multiple algorithms. In Proc. of Panhellenic Conference on Informatics, 2005.

A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems 14, 2002.

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 2008.

R. G. Gallager. Low Density Parity Check Codes, Monograph. M.I.T. Press, 1963.

J. Hagenauer, E. Offer, and L. Papke. Iterative decoding of binary block and convolutional codes. IEEE Transactions on Information Theory, 42(2), 1996.

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1), 2009.

R. W. Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 26(2), 1950.

A. Hocquenghem. Codes correcteurs d’erreurs. Chiffres, 2, 1959.

D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems 22, 2009.

A. Z. Kouzani. Multilabel classification using error correction codes. In Advances in Computation and Intelligence - 5th International Symposium, 2010.

A. Z. Kouzani and G. Nasireding. Multilabel classification by BCH code and random forests. International Journal of Recent Trends in Engineering, 2(1), 2009.

L. Li. Multiclass boosting with repartitioning. In Proc. of the 23rd International Conference on Machine Learning, 2006.

H.-T. Lin, C.-J. Lin, and R. C. Weng. A note on Platt’s probabilistic outputs for support vector machines. Machine Learning, 68(3), 2007.

D. J. C. Mackay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 1st edition, 2003.

J. P. Pestian, C. Brew, P. Matykiewicz, D. J. Hovermale, N. Johnson, K. B. Cohen, and W. Duch. A shared task involving multi-label classification of clinical free text. In Proc. of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, 2007.

J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, 1999.

R. E. Schapire. Using output codes to boost multiclass learning problems. In Proc. of the 14th International Conference on Machine Learning, 1997.

C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27, 1948.

F. Tai and H.-T. Lin. Multi-label classification with principal label space transformation. Neural Computation, 2012.

K. Trohidis, G. Tsoumakas, G. Kalliris, and I. Vlahavas. Multilabel classification of music into emotions. In Proc. of the 9th International Conference on Music Information Retrieval, 2008.

G. Tsoumakas and I. Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. In Proc. of the 18th European Conference on Machine Learning, 2007.

G. Tsoumakas, J. Vilcek, and E. S. Xioufis. MULAN: A Java library for multi-label learning, 2010.

J. K. Wolf. Efficient maximum likelihood decoding of linear block codes using a trellis. IEEE Transactions on Information Theory, 24(1), 1978.

Y. Zhang and J. Schneider. Multi-label output codes using canonical correlation analysis. In Proc. of the 14th International Conference on Artificial Intelligence and Statistics, 2011.

Y. Zhang and J. Schneider. Maximum margin output coding. In Proc. of the 29th International Conference on Machine Learning, 2012.

Appendix A
