Supplementary Materials for “Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification”



Yong Zhuang
Carnegie Mellon University

Yuchin Juan
Criteo Research

Guo-Xun Yuan
Facebook, Inc.

Chih-Jen Lin
National Taiwan University

ACM Reference Format:

Yong Zhuang, Yuchin Juan, Guo-Xun Yuan, and Chih-Jen Lin. 2018. Supplementary Materials for “Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification”. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 1 page. https://doi.org/10.1145/3269206.3271687

A LINE SEARCH IN BUNDLE CDN

In this section, we explain a line search trick that can be applied in Naive CDN but cannot be applied in Bundle CDN. In addition, we demonstrate the impact of the line search trick.

In (13), the computational cost mainly comes from evaluating

$$\Delta_j(\beta^t d) \equiv L(w + \beta^t d e_j) - L(w),$$

which costs $O(\mathrm{nnz}_j)$, where $\mathrm{nnz}_j$ is the number of non-zeros in feature $j$. In CDN, for both SVM and LR, an upper bound of $\Delta_j(\beta^t d)$ is calculated during the training process. The cost of computing this upper bound $\bar{\Delta}_j(\beta^t d)$ is just $O(1)$. The details can be found in Eq. (40) for SVM and Eqs. (49)-(54) for LR in [2]. During the line search process, before computing $\Delta_j(\beta^t d)$, CDN first checks if $\bar{\Delta}_j(\beta^t d)$ satisfies (13). If satisfied, then we can skip the entire line search process, making the cost of line search drop from $O(\mathrm{nnz}_j)$ to $O(1)$.
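The following is a short sketch of why checking only the bound is safe. It assumes that (13) is the usual CDN sufficient decrease condition for the one-variable subproblem, with a constant $\sigma \in (0, 1)$ and with $L'_j(w)$ denoting the partial derivative of the loss with respect to $w_j$. Since (13) is stated in the main paper and not in this supplement, this exact form and these two symbols are assumptions.

```latex
% Assumed form of (13):
|w_j + \beta^t d| - |w_j| + \Delta_j(\beta^t d)
  \le \sigma \beta^t \bigl( L'_j(w)\, d + |w_j + d| - |w_j| \bigr).

% Because \Delta_j(\beta^t d) \le \bar{\Delta}_j(\beta^t d), a bound that passes the test gives
|w_j + \beta^t d| - |w_j| + \Delta_j(\beta^t d)
  \le |w_j + \beta^t d| - |w_j| + \bar{\Delta}_j(\beta^t d)
  \le \sigma \beta^t \bigl( L'_j(w)\, d + |w_j + d| - |w_j| \bigr).
```

Under this assumption, whenever the bound passes the test the exact value passes as well, so the step can be accepted without the $O(\mathrm{nnz}_j)$ evaluation. The converse does not hold: a bound that fails the test says nothing about the exact value.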

(If not satisfied, then we still need to run the standard line search.) This trick, designed for one-variable problems, may not be applicable to multi-variable problems like (16). Accordingly, in the code published by [1],¹ the implementation of this trick was commented out. They also remove the trick from the single-thread CDN when measuring the speedup, meaning that they are comparing with a slower version of CDN.

Most of the work was done during the internship in Criteo Research.

This author contributes equally with Yong Zhuang.

¹ https://github.com/bianan/ParallelCDN

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

CIKM ’18, October 22–26, 2018, Torino, Italy

© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.

ACM ISBN 978-1-4503-6014-2/18/10...$15.00 https://doi.org/10.1145/3269206.3271687

[Plot: relative function value difference (log scale) versus time (s); curves: NCDN, NCDN*, BCDN]

Figure I: Comparison between Naive CDN (NCDN), NCDN without the line-search trick (denoted as NCDN* in the figure), and Bundle CDN (BCDN) with the bundle size 29,500. 16 threads are used for all methods. The data set used is kdd2010-a.

However, we believe that in practice people would be more interested in the speedup against the original version of CDN. In Figure I, we compare the following settings on kdd2010-a.

• Naive CDN

• Naive CDN without the trick

• Bundle CDN with the best bundle size

The experiment shows that the trick indeed plays an important role for this data set.
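The line search trick discussed in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code of LIBLINEAR or of [1]. The function names, the dense toy data, and the parameter values are hypothetical. The acceptance test assumes that (13) has the usual CDN sufficient decrease form. The O(1) bound used here is not the bound from [2]: it is a simple quadratic bound that relies on the second derivative of the logistic loss being at most 1/4.

```python
import math


def logistic_delta(z, j, X, y, wx, C):
    """Exact Delta_j(z) = L(w + z e_j) - L(w) for the logistic loss.

    wx[i] caches w^T x_i. Only instances with x_ij != 0 contribute; with a
    sparse column format this is the O(nnz_j) part (dense scan for brevity).
    """
    total = 0.0
    for x_i, y_i, wx_i in zip(X, y, wx):
        if x_i[j] != 0.0:
            old = math.log1p(math.exp(-y_i * wx_i))
            new = math.log1p(math.exp(-y_i * (wx_i + z * x_i[j])))
            total += new - old
    return C * total


def quadratic_bound(z, grad_j, curv_j):
    """O(1) upper bound of Delta_j(z); valid if curv_j bounds the curvature along e_j."""
    return grad_j * z + 0.5 * curv_j * z * z


def line_search(w_j, d, grad_j, bound_fn, exact_fn, sigma=0.01, beta=0.5, max_steps=30):
    """Backtrack over beta^t * d. Returns (accepted step, number of exact evaluations)."""
    exact_evals = 0
    decrease = grad_j * d + abs(w_j + d) - abs(w_j)  # unscaled right-hand side of the assumed (13)
    for t in range(max_steps):
        z = (beta ** t) * d
        l1_change = abs(w_j + z) - abs(w_j)
        rhs = sigma * (beta ** t) * decrease
        if l1_change + bound_fn(z) <= rhs:  # O(1) check on the upper bound
            return z, exact_evals
        exact_evals += 1
        if l1_change + exact_fn(z) <= rhs:  # standard O(nnz_j) check
            return z, exact_evals
    return 0.0, exact_evals


# Hypothetical toy problem: three instances, two features, w = 0, coordinate j = 0.
X = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
y = [1.0, -1.0, 1.0]
C, j = 10.0, 0
w = [0.0, 0.0]
wx = [sum(wk * xk for wk, xk in zip(w, x_i)) for x_i in X]

# Partial derivative of the loss and a curvature bound (logistic: second derivative <= 1/4).
grad_j = C * sum(-y_i * x_i[j] / (1.0 + math.exp(y_i * wx_i)) for x_i, y_i, wx_i in zip(X, y, wx))
curv_j = 0.25 * C * sum(x_i[j] ** 2 for x_i in X)
# Minimizer of the quadratic model plus |w_j + d| for the case grad_j - 1 >= curv_j * w_j.
d = -(grad_j - 1.0) / curv_j

exact_fn = lambda z: logistic_delta(z, j, X, y, wx, C)
fast_step, fast_evals = line_search(w[j], d, grad_j, lambda z: quadratic_bound(z, grad_j, curv_j), exact_fn)
slow_step, slow_evals = line_search(w[j], d, grad_j, lambda z: math.inf, exact_fn)
print(fast_step, fast_evals, slow_step, slow_evals)
```

On this toy problem both calls accept the full step d. With the bound enabled no exact evaluation is needed, while the second call, whose infinite bound never passes and so mimics the trick being removed, needs one exact evaluation.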

REFERENCES

[1] Yatao Bian, Xiong Li, Mingqi Cao, and Yuncai Liu. 2013. Bundle CDN: a highly parallelized approach for large-scale l1-regularized logistic regression. In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD).

[2] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research 9 (2008), 1871–1874. http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf
