Future Work - A MapReduce-Based Method for Computing Data Cubes for Adverse Drug Reaction Analysis (基於MapReduce之藥物不良反應分析資料方體的計算方法)

Although our framework can compute the contingency cube quickly, it must rebuild the entire cube whenever FAERS releases new data. A better approach is to update only the affected part of the contingency cube so that it reflects changes in the source data. However, a contingency cube is more complicated than an ordinary OLAP cube: each cell stores a contingency table, and the values of b, c, and d in that table depend on the contents of other cells. Our first challenge, therefore, is to develop a new approach for updating the ADR contingency cube incrementally and efficiently.
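Assume the standard 2x2 layout used by disproportionality measures such as the PRR [22]: a counts reports that mention both the drug and the reaction, b the drug but not the reaction, c the reaction but not the drug, and d neither. Under that assumption, b, c, and d can be derived from a, the per-drug and per-reaction report totals, and the overall report count N. The minimal Python sketch below uses this observation to show why a single new report affects many cells, and to illustrate one hypothetical direction for incremental maintenance: keep only additive counters and derive each table on demand. The `Report` and `ContingencyCounts` types and the toy reports are illustrative assumptions, not the thesis's implementation. The sketch covers a single drug x reaction cuboid and ignores any further cube dimensions.

```python
from __future__ import annotations

from collections import Counter
from itertools import product
from typing import NamedTuple


class Report(NamedTuple):
    """One spontaneous report (hypothetical shape): the drugs and reactions it mentions."""
    drugs: frozenset[str]
    reactions: frozenset[str]


class ContingencyCounts:
    """Hypothetical incremental store: keeps only additive counters and derives
    the 2x2 table (a, b, c, d) for a drug-reaction pair when it is requested."""

    def __init__(self) -> None:
        self.total = 0                    # N: number of reports seen so far
        self.drug_counts = Counter()      # reports mentioning each drug
        self.reaction_counts = Counter()  # reports mentioning each reaction
        self.pair_counts = Counter()      # reports mentioning both (the a values)

    def add(self, report: Report) -> None:
        # All counters are additive, so a new report only increments the counters
        # for its own drugs, reactions, and drug-reaction pairs.
        self.total += 1
        self.drug_counts.update(report.drugs)
        self.reaction_counts.update(report.reactions)
        self.pair_counts.update(product(report.drugs, report.reactions))

    def table(self, drug: str, reaction: str) -> tuple[int, int, int, int]:
        a = self.pair_counts[(drug, reaction)]
        b = self.drug_counts[drug] - a          # drug without the reaction
        c = self.reaction_counts[reaction] - a  # reaction without the drug
        d = self.total - a - b - c              # neither the drug nor the reaction
        return a, b, c, d


def report(drugs: set[str], reactions: set[str]) -> Report:
    """Convenience constructor for the toy examples below."""
    return Report(frozenset(drugs), frozenset(reactions))


counts = ContingencyCounts()
for r in [report({"D1"}, {"R1"}),
          report({"D1", "D2"}, {"R2"}),
          report({"D2"}, {"R1", "R2"})]:
    counts.add(r)

print(counts.table("D1", "R1"))     # (1, 1, 1, 0)
counts.add(report({"D3"}, {"R3"}))  # mentions neither D1 nor R1
print(counts.table("D1", "R1"))     # (1, 1, 1, 1)
```

The last call shows the dependency described above: a report that mentions neither D1 nor R1 still increases d in the (D1, R1) table, so a cube that stores finished tables per cell would have to touch that cell as well.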

Another interesting issue is finding a more efficient computing framework. In MapReduce, every intermediate (key, value) pair must be written to disk before it is sent to the reducers, which incurs substantial I/O overhead and can easily become a performance bottleneck. Newer computing frameworks, such as Spark [2] and Flink [1], have since been proposed. Because they can cache intermediate data in memory for the next computing stage, they are generally more efficient than MapReduce. Furthermore, both frameworks can run on Hadoop, so they can be deployed on existing Hadoop clusters, and both are well supported by the open-source community. A growing number of related applications have also been developed on Spark and Flink. Our next step is to redesign our contingency cube computation method for these new frameworks.
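The difference between passing intermediate data through disk and caching it in memory can be illustrated with a single-process toy pipeline. The sketch below is a simulation under stated assumptions, not Hadoop, Spark, or Flink code: a hypothetical normalization stage produces per-report records that three downstream counting stages (pair counts, drug totals, reaction totals) all need. In the chained-jobs variant, the stage output is written to a file and re-read by every later stage, much as chained MapReduce jobs exchange data through HDFS. In the cached variant, it is computed once and reused from memory, which is the kind of reuse Spark and Flink make possible. The record shape and all function names are illustrative.

```python
from __future__ import annotations

import json
import tempfile
from collections import Counter
from itertools import product
from pathlib import Path

# (drugs, reactions) of one report; this record shape is an assumption for illustration
Record = tuple[list[str], list[str]]


def normalize(raw: list[Record]) -> list[Record]:
    """Hypothetical stage 1: deduplicate and sort the names in each report."""
    return [(sorted(set(drugs)), sorted(set(reactions))) for drugs, reactions in raw]


def count_pairs(records: list[Record]) -> Counter:
    return Counter(p for drugs, reactions in records for p in product(drugs, reactions))


def count_drugs(records: list[Record]) -> Counter:
    return Counter(d for drugs, _ in records for d in drugs)


def count_reactions(records: list[Record]) -> Counter:
    return Counter(r for _, reactions in records for r in reactions)


STAGES = (count_pairs, count_drugs, count_reactions)


def run_chained_jobs(raw: list[Record], workdir: Path) -> tuple[list[Counter], int]:
    """MapReduce-style: stage 1 output goes to disk and every later job re-reads it."""
    stage1 = workdir / "stage1.jsonl"
    with stage1.open("w") as out:
        for record in normalize(raw):
            out.write(json.dumps(record) + "\n")
    results, disk_reads = [], 0
    for stage in STAGES:
        with stage1.open() as f:
            records = [tuple(json.loads(line)) for line in f]
        disk_reads += 1
        results.append(stage(records))
    return results, disk_reads


def run_cached(raw: list[Record]) -> tuple[list[Counter], int]:
    """Spark/Flink-style: stage 1 output is computed once and reused from memory."""
    records = normalize(raw)
    return [stage(records) for stage in STAGES], 0


raw = [(["D1"], ["R1"]), (["D1", "D2"], ["R2"]), (["D2", "D2"], ["R1", "R2"])]
with tempfile.TemporaryDirectory() as tmp:
    on_disk, reads = run_chained_jobs(raw, Path(tmp))
in_memory, no_reads = run_cached(raw)
print(on_disk == in_memory, reads, no_reads)  # True 3 0
```

Both variants return identical counts; only the number of times the stage output is read back from disk differs. In Spark, the analogous step would be marking the normalized dataset with `cache()` so that later actions reuse it instead of recomputing or re-reading it.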

References

[1] Apache Flink, Available: https://flink.apache.org/, [July 22, 2016].

[2] Apache Spark, Available: http://spark.apache.org/, [July 22, 2016].

[3] Canada Vigilance Adverse Reaction Online Database, Available: http://www.hc-sc.gc.ca/dhp-mps/medeff/databasdon/index-eng.php, [July 21, 2016].

[4] D. Cutting, Available: https://issues.apache.org/jira/browse/INFRA-700, [July 20, 2016].

[5] FDA's Adverse Event Reporting System (FAERS), Available: http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/default.htm, [July 21, 2016].

[6] HDFS Architecture, Available: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html, [July 19, 2016].

[7] iADRs, Available: http://iadr.csie.nuk.edu.tw, [July 21, 2016].

[8] P. Mell and T. Grance, The NIST Definition of Cloud Computing, Available: http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf, [July 20, 2016].

[9] National Library of Medicine RxNorm, Available: https://www.nlm.nih.gov/research/umls/rxnorm/, [July 7, 2016].

[10] Taiwan National Adverse Drug Reactions Reporting System, Available: https://adr.fda.gov.tw/Manager/WebLogin.aspx, [July 21, 2016].

[11] Uppsala Monitoring Centre (UMC), Available: http://www.who-umc.org, [July 21, 2016].

[12] M.S. Wang, MRiADRs, Available: https://github.com/Phate334/MRiADRs, [July 7, 2016].

[13] Yellow Card Scheme, Available: https://yellowcard.mhra.gov.uk, [July 21, 2016].

[14] S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi, “On the computation of multidimensional aggregates,” in Proceedings of 22nd International Conference on Very Large Data Bases, 1996, pp. 506-521.

[15] J.S. Almenoff, K.K. LaCroix, N.A. Yuen, D. Fram, and W. DuMouchel, “Comparative performance of two quantitative safety signaling methods: implications for use in a pharmacovigilance department,” Drug Safety, vol. 29, no. 10, pp. 875-887, 2006.

[16] A. Bate, “Bayesian confidence propagation neural network,” Drug Safety, vol. 30, no. 7, pp. 623-625, 2007.

[17] A. Bate et al., “A Bayesian neural network method for adverse drug reaction signal generation,” European Journal of Clinical Pharmacology, vol. 54, no. 4, pp. 315-321, 1998.

[18] B.K. Chen and Y.T. Yang, “Post-marketing surveillance of prescription drug safety: past, present, and future,” Journal of Legal Medicine, vol. 34, no. 2, pp. 193-213, 2013.

[19] E.F. Codd, S.B. Codd, and C.T. Salley, Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate, E. F. Codd & Associates, 1993.

[20] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.

[21] G. Deshpande, V. Gogolak, and S.W. Smith, “Data mining in drug safety,” Journal of Pharmaceutical Medicine, vol. 24, no. 1, pp. 37-43, 2010.

[22] S.J. Evans, P.C. Waller, and S. Davis, “Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports,” Pharmacoepidemiology and Drug Safety, vol. 10, no. 6, pp. 483-486, 2001.

[23] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in Proceedings of 19th ACM Symposium on Operating Systems Principles, 2003, pp. 29-43.

[24] S. Goil and A.N. Choudhary, “High performance OLAP and data mining on parallel computers,” Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 391-417, 1997.

[25] S. Goil and A.N. Choudhary, “A parallel scalable infrastructure for OLAP and data mining,” in Proceedings of International Symposium on Database Engineering and Applications, 1999, pp. 178-186.

[26] J. Gray, S. Chaudhuri, A. Bosworth, and H. Pirahesh, “Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals,” Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 29-53, 1997.

[27] F.H. Huang, “Effect of drug name inconsistence and duplicate report in SRS data to the detection of ADR signals,” Master thesis, Dept. of Computer Science and Information Engineering, National University of Kaohsiung, Taiwan, July 2015.

[28] S. Landset, T.M. Khoshgoftaar, A.N. Richter, and T. Hasanin, “A survey of open source tools for machine learning with big data in the Hadoop ecosystem,” Journal of Big Data, vol. 2, no. 1, pp. 1-36, 2015.

[29] S. Lee, J. Kim, Y.S. Moon, and W. Lee, “Efficient distributed parallel top-down computation of ROLAP data cube using MapReduce,” in Proceedings of 14th International Conference on Data Warehousing and Knowledge Discovery, 2012, pp. 168-179.

[30] W.Y. Lin, H.Y. Li, J.W. Du, W.Y. Feng, C.F. Lo, and V.W. Soo, “iADRs: towards online adverse drug reaction analysis,” SpringerPlus, vol. 1, article no. 72, 2012.

[31] H. Lu, X. Huang, and Z. Li, “Computing data cubes using massively parallel processors,” in Proceedings of 7th Parallel Computing Workshop, 1997.

[32] S. Muto and M. Kitsuregawa, “A dynamic load balancing strategy for parallel datacube computation,” in Proceedings of 2nd ACM International Workshop on Data Warehousing and OLAP, 1999, pp. 67-72.

[33] A. Nandi, C. Yu, P. Bohannon, and R. Ramakrishnan, “Distributed cube materialization on holistic measures,” in Proceedings of IEEE International Conference on Data Engineering, 2011, pp. 183-194.

[34] A. Nandi, C. Yu, P. Bohannon, and R. Ramakrishnan, “Data cube materialization and mining over MapReduce,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1747-1759, 2012.

[35] E. Poluzzi, E. Raschi, C. Piccinni, and F. De Ponti, “Data mining techniques in pharmacovigilance: analysis of the publicly accessible FDA adverse event reporting system (AERS),” in Data Mining Applications in Engineering and Medicine, Adem Karahoca, Ed. Turkey: InTech, 2012, pp. 266-302.

[36] E. Roux, F. Thiessard, A. Fourrier, B. Begaud, and P. Tubert-Bitter, “Evaluation of statistical association measures for the automatic signal generation in pharmacovigilance,” IEEE Transactions on Information Technology in Biomedicine, vol. 9, no. 4, pp. 518-527, 2005.

[37] K. Sergey and K. Yury, “Applying Map-Reduce paradigm for parallel closed cube computation,” in Proceedings of 1st International Conference on Advances in Databases, Knowledge, and Data Applications, 2009, pp. 62-67.

[38] K. Shvachko, H. Kuang, S. Radia and R. Chansler, “The Hadoop distributed file system,” in Proceedings of 26th IEEE Symposium on Mass Storage Systems and Technologies, 2010, pp. 1-10.

[39] D. Singh and C.K. Reddy, “A survey on platforms for big data analytics,” Journal of Big Data, vol. 2, no. 8, pp. 1-20, 2014.

[40] J. Song, C. Guo, Z. Wang, Y. Zhang, G. Yu, and J.M. Pierson, “HaoLap: A Hadoop based OLAP system for big data,” Journal of Systems and Software, vol. 102, pp. 167-181, 2015.

[41] Z. Wang, Y. Chu, K.L. Tan, D. Agrawal, A.E. Abbadi, and X. Xu, “Scalable data cube analysis over big data,” Computing Research Repository (CoRR), arXiv:1311.5663, 2013.

[42] J. You, J. Xi, P. Zhang, and H. Chen, “A parallel algorithm for closed cube computation,” in Proceedings of 7th IEEE/ACIS International Conference on Computer and Information Science, 2008, pp. 95-99.

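From the document: 基於MapReduce之藥物不良反應分析資料方體的計算方法 (A MapReduce-Based Method for Computing Data Cubes for Adverse Drug Reaction Analysis), pp. 58-63.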