Conclusion - Enhancing a Model with Unlabeled Data through a Bicameralism Voting Framework

In this paper, we propose a bicameralism voting framework to improve both the accuracy and the training efficiency of deep neural networks. We then use the framework to label unlabeled data and enhance a model.

We first train a master model on a server and then send it to multiple mobile devices. Each device uses its own data to update the master model into its own student model by transfer learning. The master model and the student models then use bicameralism voting to classify inputs.
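This section does not spell out the exact transfer-learning recipe, so the following is only a minimal sketch under stated assumptions: a student is derived by copying the master's final softmax layer and fine-tuning only that layer with SGD on features produced by a frozen backbone. The names `make_student`, `head_probs`, and `mean_loss`, and the tiny two-feature data set, are hypothetical stand-ins for real image features and models.

```python
import copy
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]         # weights[class][feature]
Example = Tuple[List[float], int]  # (frozen-backbone features, label)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def head_probs(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Class probabilities from a linear softmax head on frozen features x."""
    logits = [sum(w * f for w, f in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(logits)


def mean_loss(W: Matrix, b: Sequence[float], data: Sequence[Example]) -> float:
    """Average cross-entropy of the head on a labeled data set."""
    return -sum(math.log(head_probs(W, b, x)[y]) for x, y in data) / len(data)


def make_student(master_W: Matrix, master_b: Sequence[float],
                 local_data: Sequence[Example],
                 lr: float = 0.5, epochs: int = 50) -> Tuple[Matrix, List[float]]:
    """HYPOTHETICAL: derive a student by fine-tuning a copy of the master's head.

    The master's weights are copied and never modified, so one master can
    seed many students independently.
    """
    W, b = copy.deepcopy(master_W), list(master_b)  # start from the master, not from scratch
    for _ in range(epochs):
        for x, y in local_data:
            p = head_probs(W, b, x)
            for k in range(len(W)):
                g = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
                for j, xj in enumerate(x):
                    W[k][j] -= lr * g * xj
                b[k] -= lr * g
    return W, b


if __name__ == "__main__":
    # Toy case: this device's local labels disagree with the master's view.
    master_W, master_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    local = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    W, b = make_student(master_W, master_b, local)
    print(mean_loss(master_W, master_b, local), mean_loss(W, b, local))
```

Because each call only reads the master's weights, the per-device fine-tuning runs are independent of one another, which is what allows them to proceed in parallel.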

The bicameralism model has three advantages. First, it is more accurate than a single model trained on the same amount of data. Second, it is computationally efficient, for two reasons: transfer learning refines an existing, already good model instead of building one from scratch, so it requires less computation; and the computation on the mobile devices can be done in parallel, which further reduces the transfer learning time. Third, it is flexible in how the student models are trained. Because the bicameralism model uses only the predictions made by the student models, how these student models are trained is up to the users, who can use any data preprocessing, architecture, or model format.

As a result, it is much more flexible than federated learning [27], which cannot combine models that differ in model format, model architecture, or input preprocessing.
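To make the contrast concrete: federated averaging as in [27] combines clients by averaging their parameters element-wise, which only makes sense when every client shares the same parameter layout, whereas the bicameralism framework only needs each student's output probabilities. The sketch below is a simplified illustration, not either method's actual implementation: `fedavg` is unweighted (the averaging in [27] also weights clients by data size), averaging the students' probabilities is an assumed way of forming the crowd prediction, and `student_a`/`student_b` are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

# A student is any callable mapping an input to class probabilities; its
# architecture, preprocessing, and file format stay hidden behind the call.
Student = Callable[[object], List[float]]


def fedavg(client_weights: Sequence[Sequence[float]]) -> List[float]:
    """Simplified, unweighted parameter averaging over flattened client weights.

    Only defined when every client has the same parameter layout.
    """
    if len({len(w) for w in client_weights}) != 1:
        raise ValueError("clients must share one architecture to average weights")
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]


def crowd_predict(students: Sequence[Student], x: object) -> List[float]:
    """ASSUMPTION: prediction-level combination as the mean of student outputs.

    Works for any mix of models, provided they share the same label set.
    """
    outputs = [s(x) for s in students]
    return [sum(ps) / len(outputs) for ps in zip(*outputs)]


# Two hypothetical students with different internals over the same 3 classes.
def student_a(x: object) -> List[float]:
    return [0.7, 0.2, 0.1]  # stand-in for, e.g., a small CNN


def student_b(x: object) -> List[float]:
    return [0.5, 0.4, 0.1]  # stand-in for a larger model with other preprocessing


if __name__ == "__main__":
    print(crowd_predict([student_a, student_b], x=None))  # internals never inspected
    try:
        fedavg([[0.1, 0.2], [0.1, 0.2, 0.3]])  # mismatched parameter layouts
    except ValueError as err:
        print("fedavg:", err)
```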

The bicameralism model acts as a union of the master model and the crowd model: for each input, it selects the prediction of whichever model is more confident. Although each student model has low accuracy on its own, the diversity of the student models helps bicameralism voting achieve better accuracy. The master model is already a good model by itself, and with help from the complementary crowd model, the bicameralism model reaches a final accuracy of 77.838%, which is higher than that of the master model and, surprisingly, that of the erudite model. Because bicameralism voting makes more accurate predictions, we use it to generate labels for unlabeled data and improve the master model. With a filter applied during training, the resulting model achieves reasonably good accuracy, only slightly lower than that of the model trained on labeled data.
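A minimal sketch of the voting rule and the labeling step described above. Two parts are assumptions rather than details given in this section: the crowd model's output is taken to be the mean of the students' probability vectors, and the training filter is taken to be a confidence threshold on the winning prediction (0.7 is a placeholder). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Probs = Sequence[float]


def argmax(p: Probs) -> int:
    return max(range(len(p)), key=p.__getitem__)


def crowd(students: Sequence[Probs]) -> List[float]:
    # ASSUMPTION: the crowd model averages the students' probability vectors.
    return [sum(col) / len(students) for col in zip(*students)]


def bicameral_vote(master: Probs, students: Sequence[Probs]) -> Tuple[int, float]:
    """Return (label, confidence) from whichever of master/crowd is more confident."""
    c = crowd(students)
    if max(master) >= max(c):  # ASSUMPTION: ties go to the master model
        return argmax(master), max(master)
    return argmax(c), max(c)


def pseudo_label(batch: Sequence[Tuple[Probs, Sequence[Probs]]],
                 threshold: float = 0.7) -> List[Tuple[int, int]]:
    """Label unlabeled examples by bicameralism voting, keeping confident ones.

    Each batch item is (master probabilities, list of student probabilities).
    Returns (example index, pseudo-label) pairs for retraining the master.
    ASSUMPTION: the filter is a confidence threshold; 0.7 is a placeholder.
    """
    kept = []
    for i, (master, students) in enumerate(batch):
        label, conf = bicameral_vote(master, students)
        if conf >= threshold:
            kept.append((i, label))
    return kept


if __name__ == "__main__":
    batch = [
        ([0.40, 0.35, 0.25], [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]),  # crowd wins
        ([0.90, 0.05, 0.05], [[0.3, 0.4, 0.3]]),                   # master wins
        ([0.50, 0.30, 0.20], [[0.4, 0.4, 0.2], [0.6, 0.2, 0.2]]),  # filtered out
    ]
    print(pseudo_label(batch))
```

Ties go to the master here only because it is the stronger single model; this section does not say how ties are handled.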

6.1 Future Work

We simulate the training of the student models on a server rather than on multiple mobile devices, because the framework requires a large number of mobile devices. In the future, we hope to train the student models on actual mobile devices and to address the communication issues that will arise.

The Food101 data set contains a large amount of noise in its images. In the future, we would like to investigate whether the crowd model can help remove this noise. The idea is that the master model is trained on Food101 only, whereas the student models are trained on pictures taken by the mobile devices, which are likely to be correctly labeled. Thus, the training data of the master model and of the student models will follow different distributions.

As a result, the crowd model might be able to locate the noisy images in Food101 and relabel them.
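Since relabeling is future work, the following is only a sketch of one way it might be done, assuming the crowd model produces a probability vector for each Food101 training image: flag an image when the crowd model's top class differs from the given label with high confidence, then replace the label. `find_suspect_labels`, `relabel`, and the 0.9 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_suspect_labels(given_labels: Sequence[int],
                        crowd_probs: Sequence[Sequence[float]],
                        threshold: float = 0.9) -> List[Tuple[int, int, int]]:
    """Flag examples whose given label the crowd model confidently contradicts.

    Returns (index, given_label, suggested_label) triples.
    HYPOTHETICAL: the threshold value is a placeholder, not a tuned setting.
    """
    suspects = []
    for i, (given, probs) in enumerate(zip(given_labels, crowd_probs)):
        top = max(range(len(probs)), key=probs.__getitem__)
        if top != given and probs[top] >= threshold:
            suspects.append((i, given, top))
    return suspects


def relabel(given_labels: Sequence[int],
            suspects: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Return a copy of the labels with each flagged example relabeled."""
    fixed = list(given_labels)
    for i, _, suggested in suspects:
        fixed[i] = suggested
    return fixed


if __name__ == "__main__":
    labels = [0, 1, 2]
    probs = [[0.95, 0.03, 0.02], [0.92, 0.05, 0.03], [0.4, 0.3, 0.3]]
    suspects = find_suspect_labels(labels, probs)
    print(suspects, relabel(labels, suspects))
```

A high threshold keeps the crowd model from overriding labels it is unsure about; flagged images could also be sent for manual review instead of being relabeled automatically.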

References

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio, Speech and Lang. Proc., 22(10):1533–1545, Oct. 2014.

[3] E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2):105–139, 1999.

[4] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. B. Calo. Analyzing federated learning through an adversarial lens. CoRR, abs/1811.12470, 2018.

[5] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konecný, S. Mazzocchi, H. B. McMahan, T. V. Overveldt, D. Petrou, D. Ramage, and J. Roselander. Towards federated learning at scale: System design. CoRR, abs/1902.01046, 2019.

[6] L. Bossard, M. Guillaumin, and L. Van Gool. Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision, 2014.

[7] F. Chen, Z. Dong, Z. Li, and X. He. Federated meta-learning for recommendation. CoRR, abs/1802.07876, 2018.

[8] Z. Chen and D. Yi. The game imitation: Deep supervised convolutional networks for quick video game AI. CoRR, abs/1702.05663, 2017.

[9] F. Chollet. Xception: Deep learning with depthwise separable convolutions. CoRR, abs/1610.02357, 2016.

[10] E. David, N. S. Netanyahu, and L. Wolf. DeepChess: End-to-end deep neural network for automatic learning in chess. CoRR, abs/1711.09667, 2017.

[11] L. Deng and J. Platt. Ensemble deep learning for speech recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pages 1915–1919, Jan. 2014.

[12] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger. HDR image reconstruction from a single exposure using deep CNNs. CoRR, abs/1710.07480, 2017.

[13] Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1180–1189, Lille, France, 07–09 Jul 2015. PMLR.

[14] R. C. Geyer, T. Klein, and M. Nabi. Differentially private federated learning: A client level perspective. CoRR, abs/1712.07557, 2017.

[15] A. Hard, K. Rao, R. Mathews, F. Beaufays, S. Augenstein, H. Eichner, C. Kiddon, and D. Ramage. Federated learning for mobile keyboard prediction. CoRR, abs/1811.03604, 2018.

[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[17] S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney, R. Weiss, and K. Wilson. CNN architectures for large-scale audio classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

[18] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015.

[19] H. Kim, J. Park, M. Bennis, and S. Kim. On-device federated learning via blockchain and its latency analysis. CoRR, abs/1808.03949, 2018.

[20] J. Konecný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon. Federated learning: Strategies for improving communication efficiency. CoRR, abs/1610.05492, 2016.

[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017.

[22] A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems, pages 231–238, 1995.

[23] G. Lample and D. S. Chaplot. Playing FPS games with deep reinforcement learning. CoRR, abs/1609.05521, 2016.

[24] M. Long and J. Wang. Learning transferable features with deep adaptation networks. CoRR, abs/1502.02791, 2015.

[25] C. Ma, V. Smith, M. Jaggi, M. I. Jordan, P. Richtárik, and M. Takác. Adding vs. averaging in distributed primal-dual optimization. CoRR, abs/1502.03508, 2015.

[26] X. Mao, C. Shen, and Y. Yang. Image restoration using convolutional auto-encoders with symmetric skip connections. CoRR, abs/1606.08921, 2016.

[27] H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas. Federated learning of deep networks using model averaging. CoRR, abs/1602.05629, 2016.

[28] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang. Learning differentially private language models without losing accuracy. CoRR, abs/1710.06963, 2017.

[29] C. Merkwirth, H. Mauser, T. Schulz-Gasch, O. Roche, M. Stahl, and T. Lengauer. Ensemble methods for classification in cheminformatics. Journal of Chemical Information and Computer Sciences, 44(6):1971–1978, 2004.

[30] T. Nishio and R. Yonetani. Client selection for federated learning with heterogeneous resources in mobile edge. CoRR, abs/1804.08333, 2018.

[31] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI’09, pages 1187–1192, San Francisco, CA, USA, 2009. Morgan Kaufmann Publishers Inc.

[32] I. V. Serban, C. Sankar, M. Germain, S. Zhang, Z. Lin, S. Subramanian, T. Kim, M. Pieper, S. Chandar, N. R. Ke, S. Mudumba, A. de Brébisson, J. Sotelo, D. Suhubdy, V. Michalski, A. Nguyen, J. Pineau, and Y. Bengio. A deep reinforcement learning chatbot. CoRR, abs/1709.02349, 2017.

[33] F. Shan, L. Zhao, and F. Yang. A novel semantic matching method for chatbots based on convolutional neural network and attention mechanism. Revue d’intelligence artificielle, 32:103–114, Dec. 2018.

[34] R. Shokri and V. Shmatikov. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 1310–1321, New York, NY, USA, 2015. ACM.

[35] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, Jan. 2016.

[36] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

[37] C. Szegedy, S. Ioffe, and V. Vanhoucke. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR, abs/1602.07261, 2016.

[38] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.

[39] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.

[40] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko. Simultaneous deep transfer across domains and tasks. CoRR, abs/1510.02192, 2015.

[41] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. CoRR, abs/1412.3474, 2014.

[42] A. Y. Vadwala, K. A. Suthar, Y. A. Karmakar, and N. Pandya. Survey paper on different speech recognition algorithm: Challenges and techniques. Int. J. Comput. Appl., 175(1):31–36, 2017.

[43] N. Yakovenko, L. Cao, C. Raffel, and J. Fan. Poker-CNN: A pattern learning strategy for making draws and bets in poker games. CoRR, abs/1509.06731, 2015.

[44] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? CoRR, abs/1411.1792, 2014.

[45] H. Zhu and Y. Jin. Multi-objective evolutionary federated learning. CoRR, abs/1812.07478, 2018.
