Supplementary Materials for “Naive Parallelization of
Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification”
Yong Zhuang
∗Carnegie Mellon University
Yuchin Juan
†Criteo Research
Guo-Xun Yuan
Facebook, Inc.
Chih-Jen Lin
National Taiwan University
ACM Reference Format:
Yong Zhuang, Yuchin Juan, Guo-Xun Yuan, and Chih-Jen Lin. 2018. Supplementary Materials for “Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification”. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 1 page. https://doi.org/10.1145/3269206.3271687
A LINE SEARCH IN BUNDLE CDN
In this section, we explain a line-search trick that can be applied in Naive CDN but not in Bundle CDN, and we demonstrate its impact.
In (13), the computational cost mainly comes from evaluating
$$\Delta_j(\beta^t d) \equiv L(\mathbf{w} + \beta^t d\,\mathbf{e}_j) - L(\mathbf{w}),$$
which costs $O(\mathrm{nnz}_j)$, where $\mathrm{nnz}_j$ is the number of non-zeros in feature $j$. In CDN, for both SVM and LR, an upper bound $\bar{\Delta}_j(\beta^t d) \ge \Delta_j(\beta^t d)$ is calculated during training, and computing it costs only $O(1)$. The details can be found in Eq. (40) for SVM and Eqs. (49)-(54) for LR in [2]. During the line search, before computing $\Delta_j(\beta^t d)$, CDN first checks whether $\bar{\Delta}_j(\beta^t d)$ satisfies (13). If it does, then $\Delta_j(\beta^t d)$, being no larger, satisfies (13) as well, so the rest of the line search can be skipped and its cost drops from $O(\mathrm{nnz}_j)$ to $O(1)$.
(If it is not satisfied, the standard line search is still needed.)
[Figure I plot: relative function value difference (log scale) versus training time (s) for NCDN, NCDN*, and BCDN]
Figure I: Comparison between Naive CDN (NCDN), NCDN without the line-search trick (denoted as NCDN* in the figure), and Bundle CDN (BCDN) with bundle size 29,500. 16 threads are used for all methods. The data set used is kdd2010-a.
∗Most of the work was done during an internship at Criteo Research.
†This author contributed equally with Yong Zhuang.
¹ https://github.com/bianan/ParallelCDN
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM ’18, October 22–26, 2018, Torino, Italy
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6014-2/18/10. . . $15.00 https://doi.org/10.1145/3269206.3271687
This trick, designed for one-variable problems, may not be applicable to multi-variable problems such as (16). Accordingly, in the code published by [1],¹ the implementation of this trick is commented out. The authors of [1] also removed the trick from the single-thread CDN used to measure speedup, meaning that their speedup is measured against a slower version of CDN. However, we believe that in practice people are more interested in the speedup relative to the original version of CDN. In Figure I, we compare the following settings on kdd2010-a:
• Naive CDN
• Naive CDN without the trick
• Bundle CDN with the best bundle size
The experiment shows that the trick plays an important role on this data set.
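To make the line-search trick concrete, the following is a minimal Python sketch for L1-regularized logistic regression with $L(\mathbf{w}) = C\sum_i \log(1+\exp(-y_i\mathbf{w}^T\mathbf{x}_i))$. It is not the implementation in [1] or [2], and several choices are our own assumptions. Condition (13) is assumed to be the usual CDN sufficient-decrease rule $\Delta_j(\lambda d) + |w_j+\lambda d| - |w_j| \le \sigma\lambda(L'_j d + |w_j+d| - |w_j|)$ with $\lambda=\beta^t$, where $L'_j$ is the partial derivative of $L$ with respect to $w_j$ at the current $\mathbf{w}$. Instead of the bounds in Eqs. (49)-(54) of [2], the sketch uses a simpler $O(1)$ bound $\bar{\Delta}_j(s) = L'_j s + \frac{C}{8}(\sum_i x_{ij}^2)s^2$, which is valid because the second derivative of $\log(1+e^{-z})$ never exceeds $1/4$. All function names (grad_hess_j, newton_direction, delta_exact, line_search) are hypothetical.

```python
import math


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    # Numerically stable 1 / (1 + exp(-t)); avoids overflow in math.exp.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def grad_hess_j(col_j, y, margins, C):
    """First and second partial derivatives of
    L(w) = C * sum_i log(1 + exp(-y_i * w^T x_i)) with respect to w_j.
    col_j lists the (i, x_ij) pairs with x_ij != 0; margins[i] caches w^T x_i."""
    g = h = 0.0
    for i, x_ij in col_j:
        s = sigmoid(-y[i] * margins[i])
        g -= y[i] * x_ij * s
        h += x_ij * x_ij * s * (1.0 - s)
    return C * g, C * h


def newton_direction(G, H, w_j):
    # Minimizer of G*d + 0.5*H*d^2 + |w_j + d| - |w_j| (one-variable Newton step).
    H = max(H, 1e-12)
    if G + 1.0 <= H * w_j:
        return -(G + 1.0) / H
    if G - 1.0 >= H * w_j:
        return -(G - 1.0) / H
    return -w_j


def delta_exact(col_j, y, margins, s, C):
    # Delta_j(s) = L(w + s*e_j) - L(w); only instances with x_ij != 0 change,
    # so this loop costs O(nnz_j).
    total = 0.0
    for i, x_ij in col_j:
        z_old = y[i] * margins[i]
        z_new = z_old + y[i] * s * x_ij
        total += softplus(-z_new) - softplus(-z_old)
    return C * total


def line_search(col_j, y, margins, w_j, d, G, xj_sq, C,
                sigma=0.01, beta=0.5, max_steps=30):
    """Backtracking line search on w_j along d (assumed form of condition (13)).
    Returns (s, n_exact): the accepted step beta^t * d (0.0 if none is found)
    and the number of O(nnz_j) evaluations of Delta_j that were needed."""
    pred = G * d + abs(w_j + d) - abs(w_j)  # negative for a descent direction
    lam, n_exact = 1.0, 0
    for _ in range(max_steps):
        s = lam * d
        reg = abs(w_j + s) - abs(w_j)
        rhs = sigma * lam * pred
        # O(1) check with the bound Delta_j(s) <= G*s + (C/8)*xj_sq*s^2:
        # if the bound passes, the exact Delta_j(s) passes as well.
        if G * s + 0.125 * C * xj_sq * s * s + reg <= rhs:
            return s, n_exact
        # Otherwise fall back to the exact O(nnz_j) check.
        n_exact += 1
        if delta_exact(col_j, y, margins, s, C) + reg <= rhs:
            return s, n_exact
        lam *= beta
    return 0.0, n_exact


# Toy usage: feature j is non-zero in two instances, and w = 0.
col_j = [(0, 1.0), (1, 0.5)]
y = [1, -1]
margins = [0.0, 0.0]
C, w_j = 10.0, 0.0
G, H = grad_hess_j(col_j, y, margins, C)
d = newton_direction(G, H, w_j)
xj_sq = sum(x * x for _, x in col_j)  # precomputed once per feature in practice
step, n_exact = line_search(col_j, y, margins, w_j, d, G, xj_sq, C)
print(step, n_exact)  # the bound accepts the full step d, so n_exact is 0
```

In this toy case the bound alone certifies the full Newton step, so $\Delta_j$ is never evaluated. When an instance is badly misclassified and the Newton step is long, the curvature bound becomes loose and the sketch falls back to exact $O(\mathrm{nnz}_j)$ evaluations, which is the situation where the trick saves nothing.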
REFERENCES
[1] Yatao Bian, Xiong Li, Mingqi Cao, and Yuncai Liu. 2013. Bundle CDN: a highly parallelized approach for large-scale l1-regularized logistic regression. In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD).
[2] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research 9 (2008), 1871–1874. http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf