
Internet of Environmental Things: A Human Centered Approach (以使用者有感為中心的環境物聯網研究) - NCCU Academic Hub


Academic year: 2021


Department of Computer Science
National Chengchi University

Doctoral Thesis

Internet of Environmental Things: A Human Centered Approach

Student: Sachit Mahajan
Advisors: Dr. Ling-Jyh Chen, Prof. Tzu-Chieh Tsai

May 2019
DOI:10.6814/DIS.NCCU.TIGP.001.2019.B02

A Thesis submitted to the Department of Computer Science, National Chengchi University, in partial fulfillment of the Requirements for the degree of Ph.D. in Social Networks and Human Centered Computing.


Acknowledgements

First of all, I would like to thank god for giving me the strength and courage to do this work.

I would like to express my sincere gratitude to Dr. Ling-Jyh Chen for allowing me to join his research lab and for giving me the opportunity to work on this thesis. I am thankful to him for his constant support, for his patience, motivation, and immense knowledge. He believed in me and provided outstanding emotional support that made me belong and fit into the amazing and diverse culture of Taiwan. I am also grateful for his generous funding during this research. I would also like to express my deepest gratitude to Prof. Tzu-Chieh Tsai for co-advising this research. He allowed me to benefit from his wealth of knowledge and experience in research. His continuous encouragement had a way of re-igniting my passion in this PhD journey.

I would also like to thank the Taiwan International Graduate Program on Social Network and Human Centered Computing, and the Institute of Information Science of Academia Sinica, for the PhD fellowship that funded my research. I am also grateful to the faculty of National Chengchi University's Department of Computer Science, and to the research fellows of the Institute of Information Science and the Research Center for Information Technology Innovation, both of Academia Sinica, for their support. Special thanks to Dr. Mark Liao for his continuous encouragement, guidance and concern during this study.

I am also thankful to my labmates from the Network Research Lab and to my SNHCC colleagues for all their support and the beautiful moments that we shared together in Taiwan. My sincere thanks to Ms. Chia-Chien and Ms. Rebecca, as they were always there to help and support.

Last but not least, I am forever thankful to my parents, Suresh and Pratibha Gupta, for their love, encouragement, prayers and care. I would also like to thank my brother Sumeet and his wife Swathi for their constant motivation and support. Also, thanks to my friends who were always there cheering me up and who stood by me through the good times and the bad.

Abstract

With continuous urbanization and industrial growth, the concern about deteriorating air quality is also increasing. This directly impacts human health and sustainable development. Sometimes even air that looks clean is not clean and is filled with dangerous particulate matter no larger than 2.5 microns (PM2.5). These particles are deadly and can cause life-threatening diseases. An effective way to collect, analyze and scientifically visualize air quality data can help us continuously monitor the environment and can facilitate people's decision making. This work proposes a multi-pronged approach that encompasses the design, implementation and evaluation of a framework that exploits crowd-sourcing and crowd-sharing, using an IoT (Internet of Things) platform and machine learning techniques, to develop novel solutions for air quality sensing and to provide services that will not just raise awareness of the air quality problem but also assist people in day-to-day living.

But there are many challenges that need to be addressed before such a system can be deployed in real time. Efficiently handling such a large volume of data is a tedious job, and making sure that any anomaly is detected is also a challenging task. Beyond that, building an accurate and scalable forecast system with low computation time is also difficult.

In this dissertation, a human-centered approach is followed. The idea is to use IoT devices and cognitive computing to generate big data which can be further used to enhance air quality management systems and forecasting. A typical case will include the collection and storage of data obtained from the sensors, data analytics, prediction, visualization and an alert message service in case of unusual behavior in the air quality. The main contributions of this dissertation are:

1. Initially, this dissertation addresses the issue of data collection and reliability. To tackle the issue, an Anomaly Detection Framework (ADF) is proposed that efficiently identifies outliers in the raw measurement data and infers anomalous emission events. ADF can evaluate the attributes and status of each device in the system; that is, whether a device is deployed indoors or close to an emission source. Also, an energy, data and cost efficient model for mobile opportunistic PM2.5 sensing via bicycles is proposed, which is then implemented and tested in real-world scenarios. The results helped us develop a system that gathers data on the move and at the same time saves device energy.

2. Next, this dissertation addresses the problem of designing a scalable forecast model that can be implemented in real time. One of the major challenges in designing such a forecast system is ensuring high accuracy and an acceptable computation time. To address this issue, we begin with a comparative analysis of existing forecast models. Later on, a Hybrid model based on neural networks is proposed to perform hourly PM2.5 forecasts. The Hybrid model is evaluated by comparing it with baseline models and other state-of-the-art works. The model has been implemented in real time and is used to provide a forecast service for more than 2000 monitoring nodes in Taiwan.

3. Next, the dissertation deals with the challenge of creating fine-grained air pollution maps and then using those maps to design an algorithm that assists urban dwellers in reducing their exposure to airborne pollutants. We propose the Clean Air Routing (CAR) algorithm, which recommends health-optimal paths from the origin to the destination. PM2.5 data are spatially and temporally interpolated on Taiwan's road network. The algorithm is evaluated for different travel modes, and a comparison is provided with Google Maps results and the shortest path (Dijkstra).

4. The final part of this dissertation describes the real-time applications that have been developed based on the results obtained during this research. The applications include a visualization service, an animation service to understand the trend in PM2.5, a short-term PM2.5 forecast service, an IoT-enabled personal air quality chatbot assistant and a route recommendation application based on the CAR algorithm.

Evaluation of the framework's components has been conducted by performing a comparative analysis with state-of-the-art systems. The proposed framework is not limited to air quality data; it can potentially be applied to other emerging data sensing systems as well.


Table of contents

List of figures
List of tables

1 Introduction
  1.1 State-of-the-art Air Quality Sensing
  1.2 Research Challenges
  1.3 The Airbox Project: Air Quality Monitoring Framework
  1.4 Thesis Contribution
  1.5 Dissertation Organization

2 PM2.5 Sensing and Data Analysis
  2.1 Overview
  2.2 Introduction
  2.3 Literature Review
  2.4 Preliminary Analysis
  2.5 Opportunistic PM2.5 Sensing
  2.6 System Parameter Tuning and Modeling
    2.6.1 System Parameter Tuning
    2.6.2 System Modeling
  2.7 Summary

3 ADF: an Anomaly Detection Framework for Large-scale PM2.5 Sensing Systems
  3.1 Overview
  3.2 Introduction
  3.3 Literature Review
  3.4 The Anomaly Detection Framework
    3.4.1 Time-Sliced Anomaly Detection (TSAD)
    3.4.2 Real-time Emission Detection (RED)
    3.4.3 Device Ranking (DR)
    3.4.4 Malfunction Detection (MD)
  3.5 Implementation
    3.5.1 AirBox Dataset
    3.5.2 TSAD Parameter Tuning
    3.5.3 Anomaly Detection Results
    3.5.4 Discussion
  3.6 Summary

4 Design and Development of a Machine Learning based PM2.5 Forecast Framework
  4.1 Overview
  4.2 Introduction
  4.3 Literature Review
  4.4 Forecasting Models
    4.4.1 Autoregressive Integrated Moving Average (ARIMA) model
    4.4.2 Holt-Winters (HW) Forecasting Model
    4.4.3 Neural Network Autoregression (NNAR) Model
  4.5 An Empirical Study of PM2.5 Forecasting Using Neural Networks
    4.5.1 Airbox Dataset
    4.5.2 Preliminary Results
  4.6 Hybrid Neural Network Model for Forecasting
    4.6.1 Cluster-based Hybrid Neural Network Model
    4.6.2 Grid-based Clustering Approach
    4.6.3 Wavelet-based Clustering Approach
    4.6.4 Time Series PM2.5 Data
    4.6.5 Wavelet Decomposition
    4.6.6 Results and Evaluation
  4.7 Summary

5 CAR: The Cleanest Air Routing Algorithm for Path Navigation with Minimal PM2.5 Exposure on the Move
  5.1 Overview
  5.2 Introduction
  5.3 Literature Review
  5.4 Methodology
    5.4.1 Data Integration
    5.4.2 PM2.5 Prediction
    5.4.3 Spatio-temporal Interpolation
    5.4.4 Path Selection
  5.5 Algorithms
  5.6 Implementation
    5.6.1 Results
    5.6.2 Evaluation
  5.7 Summary

6 Applications
  6.1 Overview
  6.2 Inverse Distance Weighting (IDW) Animation
  6.3 PM2.5 Forecast Application
  6.4 Health-Optimal Route Recommendation Web Application Based on CAR Algorithm
  6.5 IoT-enabled Personal Air Quality Assistant on Instant Messenger
    6.5.1 System Overview
    6.5.2 Architecture
    6.5.3 Chatbot System
    6.5.4 Implementation
    6.5.5 Application Overview
  6.6 Potential Applications

7 Conclusions and Future Works
  7.1 Conclusions
  7.2 Future Works

References


List of figures

1.1 Device visualization
1.2 Data visualization platforms
1.3 Research approach

2.1 Flowchart for PM2.5 Sensing Framework
2.2 Hourly and daily PM2.5 variations
2.3 Variation in PM2.5 based on wind speed
2.4 Measurement results of different movement using gsensor and gyroscope
2.5 CDF of stop duration by bicycle riding within Taipei city
2.6 CDF of stop interval duration by bicycle riding within Taipei city

3.1 The system architecture of the proposed anomaly detection framework
3.2 The number of distinct AirBox devices online each day in the dataset
3.3 The CDF of the time interval between every two contiguous samples for each PM2.5 sensing device in the dataset
3.4 The distribution of the amount of measurement data for each AirBox device in the dataset
3.5 The CDF of the number of neighboring PM2.5 sensing devices under different d settings in the dataset
3.6 The distribution of the upper and lower thresholds in the dataset under different PM2.5 measurement values for spatial anomaly detection when d = 3 km
3.7 The distribution of the absolute offsets between every two contiguous samples of the same AirBox device in the dataset
3.8 The time series decomposition of the number of emission events detected by the RED module in the AirBox system from 2016/12/16 to 2016/12/25
3.9 The CDF of the number of emission events detected by each device in the AirBox system from 2016/12/16 to 2016/12/25
3.10 The CDF of the ranking results for all devices on each day of the 10-day observation, and the CDF of the difference between two ranking results of the same device on two contiguous days
3.11 The histogram of the number of devices with different numbers of occurrences of undetectable results in the AirBox system from 2016/12/16 to 2016/12/25
3.12 The correlation between the number of indoor devices detected daily and the daily PM2.5 emissions in the five deployment cities from 2016/12/16 to 2016/12/25
3.13 Histogram of the number of devices with different numbers of occurrences of indoor, close to emission, and malfunctioning results in the AirBox system from 2016/12/16 to 2016/12/25

4.1 Representation of NNAR(p, P, k)_m model
4.2 PM2.5 data variation in four major cities
4.3 Plots of (a) Fitted ARIMA Model Residuals (b) ACF and (c) PACF
4.4 The CDF plot for fitting with ARIMA model
4.5 The CDF plot for fitting with HW model
4.6 The CDF plot for fitting with NNAR model
4.7 The CDF plot for error between observed and forecasted PM2.5 for next four hours
4.8 Flowchart for the Hybrid model
4.9 Overall architecture of the proposed cluster-based HNNM
4.10 Feature extraction and station clustering flow chart
4.11 (a) Without clustering method (b) 2x2 Clustering method (c) 3x3 Clustering method (d) 4x4 Clustering method
4.12 Clustering evaluation results of four major cities in Taiwan
4.13 Wavelet-based clustering results of four major cities in Taiwan: (a) Big Taipei City (b) Taichung City (c) Tainan City (d) Kaohsiung City
4.14 Cascading and filter banks scheme
4.15 PM2.5 data wavelet decomposition scheme
4.16 PM2.5 data wavelet decomposition result
4.17 Comparison of computation time and relative error with grid/wavelet-based prediction in four areas

5.1 Flowchart of the proposed system
5.2 Location of PM2.5 sensors in Taiwan
5.3 Road network of Taichung
5.4 Finding the shortest path
5.5 Finding the health-optimal shortest path
5.6 Map showing shortest path (SP) and health optimal path (HOP) for different transport modes (a) scooter, (b) bicycle, (c) driving and (d) walking
5.7 Plot for distribution of exposure reduction between CAR and Google, CAR and Shortest path algorithm
5.8 Application user interface

6.1 IDW animation for a pollution incident
6.2 PM2.5 forecast application
6.3 Web application with route visualization and feedback option
6.4 System overview of the IoT-Chatbot system
6.5 Device subscription statistics of the chatbot users
6.6 User percentage with alarm distribution statistics
6.7 Distribution of number of alarms set by the chatbot users
6.8 User percentage with distribution of threshold levels

List of tables

1.1 Summary of recent studies focusing on air quality monitoring
2.1 The Relationship Among Number of Sampling Point, Number of Deployed Sensors and Total Time Spent
4.1 Distribution of PM2.5 measuring stations
4.2 Results of computation time and relative error with grid/wavelet-based prediction
5.1 Hourly PM2.5 Prediction Results
5.2 Exposure Reduction Versus Distance Increase
5.3 Comparison with state of the art works


Chapter 1
Introduction

The ubiquitous nature of Artificial Intelligence has transformed the environment we live in and interact with. With the rise of urban intelligence, empowered by robust methods and efficient algorithms that can handle large sets of data, people can now make data-informed decisions. But with all the growth and development, we also face certain challenges that need immediate attention and action. One of them is deteriorating urban air quality, which is a concern in many countries worldwide. Pollutants such as particulate matter have been responsible for causing health issues among people of all age groups [1]. A study [2] found that from 1990 to 2015, PM2.5 ranked fifth among global mortality risk factors. Additionally, air pollution directly impacts sustainable development and poses a danger to the ecosystem. A better way of assessing real-time air quality, together with applications that help people understand its impact on human lives, can make a substantial difference [3].

A human-centric system aims at bridging the gaps between various areas involving the design and implementation of computing frameworks that assist people's activities. It aims at integrating areas like social science and computing systems [4]. This makes it important to have a framework that can monitor air quality as well as use technological advancements to provide services to the community.

Fixed air quality monitoring stations are the primary means of collecting air quality data in most parts of the world. Yet their deployment and maintenance costs often result in accurate but geographically sparse monitoring [5]. These instruments are set up strategically and are able to measure a wide range of parameters. Their measurements are generally reliable, with high temporal resolution [6]. Although the instruments used by government agencies and environmental monitoring organizations are highly accurate, their sparsity leads to limited knowledge of the spatial distribution of pollutants [3]. For example, the Environmental Protection Agency (EPA) of Taiwan has 76 monitoring sites all over Taiwan. Although these instruments are highly reliable, there is a lack of pollutant data with high spatial resolution. Such data is needed to assess changes in the concentration of pollutants with high spatial variability. There is a need for frameworks that can sense air quality data at fine spatio-temporal resolution, visualize the data and offer services that assist the people as well as the policy makers.

In this thesis, we advocate a holistic approach to air quality sensing that uses low-cost sensors to collect data with a high spatio-temporal resolution. We present Internet of Environmental Things (IoET): A Human Centered Approach, a highly customizable, self-sufficient framework that not only performs air quality sensing and monitoring but also includes real-time data visualization and personalized services that can assist people in their everyday life. The framework can be deployed for other environment monitoring schemes as well. We have developed algorithms to detect anomalies in the real-time data, perform short-term and long-term air quality forecasts, and then use the data to provide interactive visualization maps with high spatio-temporal resolution. The fine-grained data is subsequently used for designing mobile applications, such as a personalized air quality alert service, and web applications, such as a health-optimal route recommendation service.

1.1 State-of-the-art Air Quality Sensing

There have been several works in the last few years that have addressed the issue of air quality monitoring at a finer level. Table 1.1 presents a summary of some of the relevant literature on the application of low-cost sensors for air quality monitoring. A majority of these works focus on low-cost sensors for air quality monitoring [7][8][9][10][11]. A few others use an integrated citizen science approach for air quality sensing and understanding people's perception [12][13][9].

Table 1.1 Summary of recent studies focusing on air quality monitoring

Author (Year) | Study Focus
Hubbell et al. (2018) | Focused on integrating social science and air quality sensing to understand people's perception of air pollution [12].
Commodore et al. (2017) | Understanding how community-based participatory research was driven by the desire to raise awareness about air quality in communities [13].
Pritchard and Gabrys (2016) | An analysis of how citizen sensing can be used for technology advancement and developing new partnerships for air quality monitoring [7].
Castell et al. (2015) | Citi-Sense MOB approach that encouraged public engagement in environmental governance using mobile sensing [8].
Leonardi et al. (2014) | Mobile crowd-sourcing system to monitor air pollution with the objective of capturing participants' understanding of the issue [9].
Devarakonda et al. (2013) | Focused on real-time air quality monitoring using vehicle-based mobile sensing [14].
Zappi et al. (2012) | Proposed the CitiSense project, which included an online mapping tool for data visualization, used to understand how people respond after getting feedback on the air quality around them [10].
Dutta et al. (2009) | Proposed a participatory sensing system for personal exposure monitoring and community awareness [11].

Although these studies address several aspects of air quality sensing, such as using low-cost sensors, exposure monitoring and understanding human behaviour, several research challenges still need to be addressed in order to realize the idea of an efficient multi-component air quality monitoring framework with attributes like data collection and management, knowledge extraction and interactive applications.

1.2 Research Challenges

The major research challenges addressed in this work are:

• Data quality is a major challenge in the deployment of large-scale environmental sensing systems for several reasons. First, because monitoring devices are deployed in uncontrolled environments, unexpected measurement results are normal and have to be carefully managed. Second, it is difficult to use a single model for all devices that are deployed in non-homogeneous environments.

• It is difficult to analyze and predict sudden changes in air quality, mainly because of variation in the emission sources, seasonal trends and sudden environmental changes. Air quality forecasting can be considered a spatio-temporal, data-intensive challenge. To address this problem, there is a need for a scalable and accurate forecast system that can be implemented in real time.

• Determining personal exposure to pollutants is as challenging as monitoring them. As these pollutants can have a significant impact on human health, it becomes important to have a system that can inform people about their pollutant exposure level and provide recommendations that reduce personal exposure to air pollution.

• Generating large-scale data alone does not cater to the needs of the users. There is a need for visualization platforms, applications and other services that can raise citizens' awareness, empower policy makers and assist people in their everyday routine.

1.3 The Airbox Project: Air Quality Monitoring Framework

The data set used for this research work has been obtained from the AirBox project, which is a pilot deployment of IoT systems for large-scale PM2.5 monitoring in Taiwan. The project was inspired by the work of the Location Aware Sensing System (LASS) community, which engages citizens to participate in the PM2.5 sensing project and enables them to make low-cost PM2.5 sensing devices on their own. It also facilitates PM2.5 monitoring at a finer spatio-temporal granularity and enriches environmental data analysis by making all measurement data freely available to everyone [15]. The AirBox project is more mature than the LASS project because 1) the number of AirBox devices is much higher than the number of LASS devices; 2) the sensing devices are made by a professional company with industrial product level consistency and reliability; and 3) the devices are deployed in public buildings with a reliable Internet connection and power supply.

The AirBox project started with the deployment of 150 devices in Taipei city on March 22, 2016. At present, there are more than 4000 devices in over 30 countries. Several applications have been developed using data provided by the AirBox deployment. They can be divided into two categories:

1. Data Visualization: For AirBox measurement data, two types of data visualization systems have been developed by different parties. The first type is a data-oriented visualization system that focuses on displaying the detailed information of each AirBox device. It analyses and compares data from different devices on the same day and from the same device on different days. An example is illustrated in Figure 1.1.

Fig. 1.1 Device visualization

The second type is a geographic information (GIS)-based visualization system that provides comprehensive visual results on a map by mashing up the measurement data and other location-based data. For instance, the g0v community combines the measurement data and wind information on the same map to facilitate data analysis and interpretation (Figure 1.2a). The LASS community applies the Voronoi diagram algorithm to partition the map into regions based on the Euclidean distance between the AirBox devices. Each partitioned region is color-coded based on the PM2.5 measurement data and the danger level advised by the EPA of Taiwan [16] (Figure 1.2b). The community also utilizes the Inverse Distance Weighting algorithm to interpolate the colored gradient between every two AirBox devices nearby. The visual result is a real-time heatmap of PM2.5 distribution in Taiwan (Figure 1.2c).

2. Open Data Service: The measurement data of the AirBox deployment is freely available and can be obtained in two ways: 1) by applying for an access token and downloading the data from the AirBox manufacturer's backend database; or 2) by accessing the Open Data API provided by the LASS community to download the measurement data directly. The open data API (in the JSON data format) allows people to access the latest

measurement data of a specific AirBox device, the last 1,000 pieces of measurement data of a particular device, the measurement data of one device on a specific date, and the measurement data of the nearest AirBox device [17].

Fig. 1.2 Data visualization platforms: (a) the GIS visualization by g0v; (b) Voronoi diagram; (c) IDW diagram

Fig. 1.3 Research approach

1.4 Thesis Contribution

Humans are always surrounded by sensing devices, which creates a kind of fusion between the physical world and the virtual world. This fusion can be termed the Internet of Humans (IoH), a combination of the Internet of Things (IoT) and Human-Centered Computing (HCC).

• In this work, we combine networked sensing and crowd-sourcing techniques to collect streams of sensing data about the surroundings and provide insightful information services at the personal, societal, and urban levels. The approach followed in this thesis is illustrated in Figure 1.3.

• We introduce a novel system for reliable real-time data sensing and collection. An Anomaly Detection Framework (ADF) is proposed that ensures data quality by efficiently identifying outliers in the raw measurement data and inferring anomalous emission events.

• We propose a hybrid model for PM2.5 forecasting. An extensive evaluation is performed by testing the forecast model on more than 2000 AirBox devices in Taiwan. Such a scalable system can also help in developing an integrated early warning system.

• We integrate the data to design high-quality pollutant concentration maps. These maps provide real-time visualization services to users and can also be used to understand the pollutant trend at a particular location or during a certain time period. Such maps support a better visual understanding of the data.

• We propose the Cleanest Air Routing (CAR) algorithm, which recommends health-optimal routes to users. The algorithm compares the PM2.5 concentration for the routes suggested by Google Maps and for the shortest path, and finally recommends the path that is healthier. Based on the algorithm, a web application has been developed so that users can get first-hand experience of this service.

• We have also created a mobile chatbot application that acts as an IoT-enabled personal air quality assistant. The chatbot provides not only air quality information but also services such as subscription to air quality monitoring nodes in the user's area, alarm services, and recommendations.

1.5 Dissertation Organization

This dissertation is organized as follows. Chapter 1 provides a general introduction to air quality sensing and the AirBox project. It also defines the motivation, challenges, and contributions. In Chapter 2, we discuss the proposed mobile opportunistic sensing framework and describe the field experiments. In Chapter 3, we discuss the proposed Anomaly Detection Framework. The components of the framework are explained in detail, followed by a discussion of the results obtained. In Chapter 4, we propose the hybrid model for PM2.5 forecasting, with a detailed analysis and evaluation of the results. The wavelet-based forecast model is also explained in that chapter, along with the development of a scalable forecast framework to be implemented for real-time PM2.5 forecasting. In Chapter 5, the CAR algorithm is proposed. The performance of the CAR algorithm is evaluated by comparing its results with Google Maps. Details about the route recommendation application based on the CAR algorithm are also provided. In Chapter 6, we discuss the real-time applications that have been developed based on the results obtained during the entire research study. Chapter 7 summarizes this thesis by highlighting the contributions and directions for future work.


Chapter 2
PM2.5 Sensing and Data Analysis

2.1 Overview

In this chapter, we discuss the PM2.5 sensing system as well as the data. We try to understand the variations in the data and how knowledge can be extracted from the various trends in the data. We also present an energy-, data-, and cost-efficient model for mobile opportunistic PM2.5 sensing via bicycles. To facilitate the implementation of such systems, we first investigate the accuracy of the different Inertial Measurement Unit (IMU) sensors built into the Arduino 101 for stop detection. Then, by curve fitting and optimization of the system parameters, each sensor can start up at the optimum time in order to achieve minimum energy consumption and maximum data usability. Finally, we conduct a field experiment to evaluate the proposed model in a real-world setting. The results show that the total time spent on PM2.5 data collection is similar to the expected time derived from the system modeling. The content of this chapter has been published in [18].

2.2 Introduction

The technology revolution has completely changed the world around us. We often find ourselves surrounded by hi-tech devices that have made our lives easier. Major functional elements of these devices are smart sensors, which are used for a wide range of tasks such as environment sensing, data collection, and analysis. These tiny sensors have revolutionized the traditional approaches to gathering, understanding, and analyzing data.

Nowadays, more and more miniature sensors are applied in daily living, for example in optical mice, pregnancy test kits, and hidden airbags. Miniature sensors are also applied to measure environmental conditions, as in location sensing, air quality sensing, and water quality inspection. These sensing fields usually require long sensing times and large sensing areas, so the amount of received data is also extremely large. To address this issue, opportunistic sensing has been proposed, in which the device turns on during a particular time period and collects data. As more miniature sensors are applied in our daily life, a large amount of information is generated by these sensors every minute. By taking advantage of urban mobility, mobile sensing systems have been proposed by many studies with a variety of architectures. Among these architectures, opportunistic sensing is considered to be one of the representative ones. In opportunistic sensing, data collection and data transmission happen when the conditions of the equipment are satisfied. Opportunistic sensing mainly depends on the environment. This sensing architecture could simplify wireless heterogeneous sensor network design and lead to more effective data collection [19]. Because opportunistic sensing is mainly "human centered, unanimous", it is challenging to take care of energy consumption, security, and privacy. Another characteristic of opportunistic sensing is its intersection and fusion with sociology and psychology, which develops new concepts including the calculation

of social perception and that of emotion [20, 21]. There is active research on the application side, such as environmental monitoring [22–27] and health care.

PM2.5 sensing is one of the most crucial forms of environmental sensing because PM2.5 is a significant index of air quality. Therefore, numerous miniature PM2.5 sensors are produced to achieve high accuracy and universality for environmental monitoring. However, there are issues yet to be investigated in existing systems. Most existing commercial off-the-shelf (COTS) low-cost PM2.5 sensors do not support mobile sensing because they are easily influenced by wind speed. As a result, in order to enhance the granularity of the measurements, a large number of sensors needs to be deployed in the measuring areas, which increases the cost.

We applied a novel system model for opportunistic sensing of PM2.5.

2.3 Literature Review

By taking advantage of urban mobility, mobile sensing systems have been proposed by many studies with a variety of architectures. Among these architectures, opportunistic sensing is considered to be one of the representative ones. In participatory sensing, people autonomously decide the proper time and the proper approach to collect data. This sensing architecture mainly depends on the participant's analysis and judgement. Crowd sensing, one kind of participatory sensing, functions through numerous people and sensors, and its sensing efficiency can be higher than that of independent sensors [28–30]. The members of the crowd leverage mobile matching and the characteristics of mobile sensors to acquire environmental data from themselves and from ambient sensors. This is also seen in smartphone sensing, which collects data from the environment and exchanges data when sensors are close to each other [31, 32].

In opportunistic sensing, data collection and data transmission happen when the conditions of the equipment are satisfied. Opportunistic sensing mainly depends on the environment. This sensing architecture could simplify wireless heterogeneous sensor network design and lead to more effective data collection [19]. Nowadays, GPS, Bluetooth, Wi-Fi, smart labels, smartphones, and various kinds of sensors have all become important approaches to opportunistic sensing, from which a series of research hot spots [33–37] have been derived. Because opportunistic sensing is mainly "human centered, unanimous", it is challenging to take care of energy consumption, security, and privacy. Another characteristic of opportunistic sensing is its intersection and fusion with sociology and psychology, which develops new concepts including the calculation of social perception and that of emotion [20, 21]. There is active research on the application side, such as environmental monitoring [22–27] and health care. Another application of opportunistic sensing is found in an air quality sensing system in which Wi-Fi, GPS, and an air sensor were installed on a bus, and the air sensing data were transmitted only when the bus arrived at a bus station. However, few research works thoroughly consider the frame and structure of opportunistic sensing calculation.

2.4 Preliminary Analysis

The main inspiration behind this project is the LASS (Location Aware Sensing System) community. This community engages people to participate in PM2.5 sensing and also encourages them to develop sensing devices by themselves. The project facilitates PM2.5 monitoring at a finer spatio-temporal granularity and enriches data analysis by making all the measurement data freely available to everyone [15]. The devices are installed in buildings with a reliable Internet connection and power source. In addition, the data (https://pm25.lass-net.org/en/) are easily accessible, which makes data analysis easy. The

sensing devices in the AirBox project are designed and developed by professional manufacturers. The industrial product level devices are made in close cooperation with Edimax Inc. and Realtek Inc. in Taiwan. The devices are based on the Realtek Ameba development board. Each device contains a PMS5003 PM2.5 sensor and an HTS221 temperature/humidity sensor. Another version of the deployed device is called MAPS (Micro Air Pollution Sensing System), which is developed by the Network Research Lab at the Institute of Information Science, Academia Sinica, Taiwan. It is based on the MediaTek LinkIt Smart 7688 Duo development board. It has a PMS5003 sensor for PM2.5 and a BME280 for temperature/humidity. The data sensing part of the framework is shown in Figure 2.1. There are three major components of the sensing system.

1. Data Producers comprise the sensors that provide sensed data. The hardware and the source code are open source so that people can create such devices themselves.

2. Transit Centers act as data brokers for the data sent from the data producers to the data users. Multiple data brokers can be used to achieve scalability and fault tolerance.

3. Data Users are those who use the sensed data, analyze it, and create different types of applications.

Fig. 2.1 Flowchart for PM2.5 Sensing Framework

In this section, we discuss how the PM2.5 data change over time, the inflection points, and the variation patterns that are observed due to wind speed, wind direction, time of the

day, and many other factors. For this study, the data from the Shillin station of EPA Taiwan are used. Figure 2.2 shows the PM2.5 variations for the month of November.

Fig. 2.2 Hourly and daily PM2.5 variations (panels: PM2.5 in µg m−3 by hour for each weekday, by hour, by month, and by weekday; mean and 95% confidence interval in mean)

It can be observed from Figure 2.2 that there is sometimes a trend in the PM2.5 variations. For example, during the weekends it can be assumed that most people go out, which means more traffic and more pollutants, so higher PM2.5 would be observed during weekends than on weekdays. Similarly, PM2.5 would be higher during the morning and late evening as people travel to and from work. Such trends are easy to observe. Sometimes, however, there are inflection points (sudden spikes). Inflection points can be considered as sudden increases in the PM2.5 values, which might be caused by environmental factors or

human activities. As these incidents are rare, it is very difficult to model them using a conventional forecast model. One year of data from the monitoring station was studied to analyze how the air quality is affected by the wind speed. The plot based on the wind direction for different months of the year is shown in Figure 2.3. It can be observed that for most of the study period, PM2.5 is higher when the wind direction is North-West. This points to the fact that a lot of smog comes from the north of Taiwan, which constitutes a major proportion of the air pollutants.

Fig. 2.3 Variation in PM2.5 based on wind speed (PM2.5 in µg m−3 by month, contribution weighted by mean, for wind directions N, NE, E, SE, S, SW, W, NW)

2.5 Opportunistic PM2.5 Sensing

We propose a system model that optimizes energy consumption and data usability. The energy-efficient model is designed by exploring the optimal start-up time given the expected stop duration, the expected stop interval duration, and the device warm-up time. Several factors cause vehicles or bicycles to stop, such as traffic lights, pedestrians, and traffic jams. Therefore, the proposed system leverages these stop

opportunities to accurately sense PM2.5. The device and platform selection, together with stop detection, is described in this part.

Fig. 2.4 Measurement results of different movements using the g-sensor and gyroscope: (a) static g-sensor; (b) static gyroscope; (c) walking g-sensor; (d) walking gyroscope; (e) riding g-sensor; (f) riding gyroscope

In this work, we choose the Arduino platform and the Arduino 101 development board as our experimental development environment. Compared to the Arduino Uno, the Arduino 101 has an additional accelerometer and gyroscope. Moreover, we conducted an experiment on the stability and accuracy of COTS PM2.5 sensors. The result shows that among several miniature sensors, the G3 sensor performs the best. Its high accuracy and low cost make the G3 sensor a suitable choice for the experiment. The proposed model is realized by mounting sensors on bicycles rather than on vehicles, so that the mobility limitation is reduced and even narrow lanes can be easily accessed.

To determine whether the sensor is mobile or static, we test the two sensors of the Inertial Measurement Unit (IMU) embedded in the Arduino 101 development board. The first is a 3-axis accelerometer, which measures gravity on three axes; in a static condition the value should be 0g on the x-axis and y-axis but 1g on the z-axis. The other is a gyroscope, which measures the angular velocity on three axes; the value should be 0 degrees per second in a static

condition. Three testing modes, namely static, walking, and riding, are selected to evaluate the accuracy of motion detection with these two devices. In total, nine pairs of devices are tested in this experiment, which reflects not only their accuracy but also their consistency. The result is shown in Figure 2.4. The static-mode results of the accelerometer show some deviations, whereas the results of the gyroscope present a clear value of zero degrees per second on all three axes. Thus, the gyroscope is adopted for the later experiments.

The objective of this study is to 1) develop a system model to realize opportunistic PM2.5 sensing with an optimal sensor start-up time, which results in efficient energy consumption and high data usability; 2) evaluate the least required number of PM2.5 sensors and conduct experiments to verify the proposed system model under real-world urban mobility; and 3) ameliorate the possible result of non-uniform sampling data distribution.

2.6 System Parameter Tuning and Modeling

2.6.1 System Parameter Tuning

First of all, we define the system parameters. We define Ts as the stop duration, which represents the duration from the time the bicycle stops until it begins to move again. We define Ti as the stop interval duration, which represents the duration of movement between two stops. In this opportunistic sensing experiment, we let the sensor stay in sleep mode while the bicycle moves and start to function when the bicycle stops. In practice, stop durations vary with the cause of the stop, such as pedestrians, traffic lights, and traffic jams. In addition, each sensor has its warm-up time and response time. Some energy would be wasted if we turned on the sensor immediately when the bicycle stops, because the stop duration might be less than the sum of the warm-up time and the response time. Thus, in order

to reduce energy consumption, instead of turning on the sensor immediately when the bicycle stops, we investigate the proper delay before turning on the sensor so that the energy consumption is minimized. We also record the sampling data only after the warm-up time to ensure data accuracy.

We take the G3 sensor, a COTS PM2.5 sensor, with bicycle riding to build a practical model. First, we conduct an experiment in Taipei city. With a total of seventeen hours of riding at a 1 Hz gyroscope sampling rate, the Cumulative Distribution Function (CDF) of the stop duration (Ts) is presented in Figure 2.5.

Fig. 2.5 CDF of stop duration by bicycle riding within Taipei city (curve fitting over 0 to 150 seconds: F(x) = -1.111*x^(-0.44) + 1.136, R² = 0.9938)

Then, we apply curve fitting to the data with the stop time ranging from 0 to 150 seconds, and we obtain the fitting result F(x) as

\[ F(x) = -1.111\,x^{-0.44} + 1.136, \tag{2.1} \]

where the R-square value is 0.9938.

To establish the proposed model, we let u be the duration from the time the bicycle stops to the time the G3 sensor turns on, and v be the duration for the G3 sensor to warm up. Then, F(u + v) is the probability of not measuring data when the bicycle stops, and F(u + v) − F(u) is the probability of not measuring data while the bicycle is stopped and the G3 sensor is on. Optimization

is applied to Equation (2.2) to minimize the sum of F(u + v) and F(u + v) − F(u) by setting proper values of the arguments u and v:

\[ \min \; F(u+v) + \bigl(F(u+v) - F(u)\bigr) \quad \text{s.t.} \quad u \ge 0,\; v \ge 0. \tag{2.2} \]

After the optimization, we obtain the relationship between u and v as

\[ u = \frac{e^{c}}{1 - e^{c}}\, v, \tag{2.3} \]

where

\[ c = \frac{\ln 2}{n - 1}. \tag{2.4} \]

The n in Equation (2.4) denotes the power of x in the function F(x). Therefore, u is calculated to be 16 seconds under the condition that the G3 sensor warm-up time v is 10 seconds.

Fig. 2.6 CDF of stop interval duration by bicycle riding within Taipei city (curve fitting over 0 to 400 seconds: G(x) = -2.145*x^(-0.2316) + 1.582, R² = 0.8963)
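As a quick numerical check of Equations (2.3) and (2.4), the following minimal sketch computes the start-up delay u from the warm-up time v and the fitted exponent. The function names (`fitted_cdf`, `objective`, `optimal_delay`) are illustrative and are not part of the original system; the coefficients are those of the fitted curve in Equation (2.1).

```python
import math

# Coefficients of the fitted CDF F(x) = A * x**N + B from Equation (2.1).
A, N, B = -1.111, -0.44, 1.136


def fitted_cdf(x: float) -> float:
    """Fitted CDF of the stop duration (fit range: roughly 0 < x <= 150 s)."""
    return A * x ** N + B


def objective(u: float, v: float) -> float:
    """Objective of Equation (2.2): F(u + v) + (F(u + v) - F(u))."""
    return fitted_cdf(u + v) + (fitted_cdf(u + v) - fitted_cdf(u))


def optimal_delay(v: float, n: float = N) -> float:
    """Start-up delay u given warm-up time v, per Equations (2.3) and (2.4)."""
    c = math.log(2) / (n - 1)
    return math.exp(c) / (1 - math.exp(c)) * v


if __name__ == "__main__":
    u = optimal_delay(10.0)
    print(f"u = {u:.2f} s for v = 10 s")
```

With v = 10 seconds and the exponent -0.44, the sketch gives a delay of about 16.2 seconds, in line with the 16 seconds reported above. The `objective` helper can be used to confirm that nearby values of u give a larger value of the expression in Equation (2.2).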

2.6.2 System Modeling

In order to confirm the feasibility of the field experiment, we first investigate the relationship among three parameters: the number of sampling points (n), the number of deployed sensors (m), and the total time spent (Ttotal). First, we conduct an experiment in Taipei city to investigate the stop interval duration (Ti). The CDF of Ti is presented in Figure 2.6.

Then, we apply curve fitting to the data with the stop interval time ranging from 0 to 400 seconds, and we obtain the fitting result G(x) as

\[ G(x) = -2.145\,x^{-0.2316} + 1.582, \tag{2.5} \]

where the R-square value is 0.8963.

According to F(x) and G(x), we can estimate the minimum time spent for one successful sample point, which is defined as Ts′. We define a successful sample point as one acquired when the stop duration of the bicycle is at least u plus v seconds; otherwise, we call it a failed sample point. Therefore, the expectation of Ts′ is derived by

\[ E[T_s'] = T_1 + T_2, \tag{2.6} \]

where

\[ T_1 = E[T_i] + \sum_{T_s = u+v}^{\infty} T_s \cdot P(T_s), \qquad T_2 = \Biggl( E[T_i] + \sum_{T_s = 0}^{T_s < u+v} T_s \cdot P(T_s) \Biggr) \Biggl( \frac{1}{1 - F(u+v)} - 1 \Biggr), \tag{2.7} \]

and T1 denotes the expected duration to obtain a successful sampling point at the first stop, which means that the stop duration is larger than or equal to u plus v seconds. T2 denotes the

expected duration spent on failed sampling attempts, which is the expected time of a failed attempt multiplied by the expected number of failed attempts. Based on the above formula, the minimum time to obtain a successful sampling data point is around 427.95 seconds. Hence, inferring from Table 2.1, the relationship among the three arguments, the number of sampling points (n), the number of deployed sensors (m), and the total time spent (Ttotal), is given by

\[ T_{total} = E[T_s'] \cdot \frac{n}{m}, \tag{2.8} \]

where E[Ts′] is the expectation of the total time spent in an experiment with one device and one sample. Thus, on the one hand, if n sampling points are needed, n times the total time is required. On the other hand, if m devices are used in the field experiment, the total time spent is shortened by a factor of m.

Table 2.1 The Relationship Among Number of Sampling Point, Number of Deployed Sensors and Total Time Spent

Device    Sample    Total Time
1         1         E[Ts′]
m         n         E[Ts′] · n/m

By this inference, we can estimate the situation and feasibility of the field experiment conducted in Taipei City. With the statistics from the research on Youbike daily utility, which is 50000 per day, we could infer that 10515 data points may be acquired per hour.

2.7 Summary

We elaborated the PM2.5 sensing framework and discussed the variations in the data. We proposed a system model that can achieve opportunistic PM2.5 sensing under urban mobility. To realize the proposed opportunistic sensing system, we first identified a suitable platform, which is the Arduino 101 with the most reliable low-cost PM2.5 sensor, and a more

accurate embedded function for stop detection. Then, we performed a real-world experiment to investigate the expected values of the stop duration and the stop interval duration in order to tune the system parameters. The optimum start-up delay for the G3 sensor, measured from the time it becomes static, is 16 seconds. This setting makes the system an energy- and data-efficient model. In addition, for feasibility evaluation, an experiment was done to investigate the expected value of the stop interval duration. From this result, we proposed a formula that presents the relationship among the number of sampling points (n), the number of deployed sensors (m), and the total time spent (T). With this formula, we can easily design a cost-efficient system model by using the minimum number of sensors for an expected number of sampling points within an expected total time.
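The relationship in Equation (2.8) can be turned into a small planning calculation. The sketch below is illustrative only: the function names `total_time` and `min_devices` are hypothetical, and the default of 427.95 seconds is the expected time per successful sample reported in Section 2.6.2.

```python
import math

# Expected time (seconds) for one successful sample with one device,
# as reported in Section 2.6.2.
EXPECTED_TIME_PER_SAMPLE = 427.95


def total_time(n_samples: int, m_devices: int,
               expected: float = EXPECTED_TIME_PER_SAMPLE) -> float:
    """Total time spent according to Equation (2.8): E[Ts'] * n / m."""
    if n_samples < 0 or m_devices < 1:
        raise ValueError("need n_samples >= 0 and m_devices >= 1")
    return expected * n_samples / m_devices


def min_devices(n_samples: int, time_budget: float,
                expected: float = EXPECTED_TIME_PER_SAMPLE) -> int:
    """Smallest m for which total_time(n, m) fits within time_budget seconds."""
    if time_budget <= 0:
        raise ValueError("time_budget must be positive")
    return max(1, math.ceil(expected * n_samples / time_budget))
```

For example, under these assumptions, `min_devices(100, 3600)` gives the number of devices needed to collect 100 successful samples within one hour, and `total_time` then reports the expected duration for that deployment.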

Chapter 3
ADF: an Anomaly Detection Framework for Large-scale PM2.5 Sensing Systems

3.1 Overview

As the population density continues to grow in urban settings, air quality is degrading and becoming a serious issue. Air pollution, especially fine particulate matter (PM2.5), has raised a series of concerns for public health. As a result, a number of large-scale, low-cost PM2.5 monitoring systems have been deployed in several international smart city projects. One of the major challenges for such environmental sensing systems is ensuring the data quality. In this chapter, we explain the proposed Anomaly Detection Framework (ADF) for large-scale, real-world environmental sensing systems. The framework comprises four modules: 1) Time-Sliced Anomaly Detection (TSAD), which detects spatial, temporal, and spatio-temporal anomalies in the real-time sensor measurement data stream; 2) Real-time Emission Detection (RED), which detects potential regional emission sources; 3) Device Ranking (DR), which provides a ranking for each sensing device; and 4) Malfunction Detection (MD), which identifies malfunctioning devices. Using real-world measurement data from

the AirBox project, we demonstrate that the proposed framework can effectively identify outliers in the raw measurement data as well as infer anomalous events that are perceivable by the general public and government authorities. Because of its simple design, ADF is highly extensible to other advanced applications, and it can be exploited to support various large-scale environmental sensing systems. The contents of this chapter have been published in [38].

3.2 Introduction

Smart city development is a response to a range of issues, including urbanization and sustainability. Among all smart city systems, increasing attention is being paid to the large-scale deployment of applications for monitoring finer-grained air pollution [39]. The reason is that air quality is a global issue and it is being degraded by economic activity, rapid urbanization, and increased energy consumption. Air pollution raises a number of concerns ranging from public health to the social economy [40, 41]. Among all pollutants, fine particulate matter (PM2.5) comprises particles that are less than 2.5 micrometers in diameter. It has been shown that such particles are harmful to human health as they can penetrate the alveoli (the gas exchange regions of the lungs) and even pass through the lungs to affect other organs. Recent studies have shown that PM2.5 pollution is directly related to various serious health disorders, such as asthma, cardiovascular disease, respiratory diseases, lung cancer, and premature death [42, 43].

A number of outdoor PM2.5 monitoring projects have been launched worldwide [44–46]. Conventional approaches usually rely on professional air quality monitoring stations that are deployed at strategic locations and generally operated by national, state, or local environmental protection agencies (EPAs or their equivalent). However, the monitoring stations are extremely large and expensive to operate, and they cannot be deployed in a high

density. As a result, sophisticated air pollution dispersion models must be used to estimate PM2.5 concentrations in the areas between different stations [47]. However, the accuracy of the estimations depends on wind conditions, the terrain, and the distance to the nearest stations. Moreover, the measurement results of conventional stationary systems are only effective in representing well-mixed atmospheric pollution. They cannot indicate the air quality in our living surroundings [48, 49].

With advances in sensing and computing technology, several recent works have applied low-cost sensors in micro-scale PM2.5 sensing [50–56]. Meanwhile, some international smart cities have deployed large-scale, low-cost particle sensors for real-time air quality monitoring (e.g., Chicago [57], Taipei [58], and Zurich [59]). In addition, there are several commercial products and non-profit communities that encourage people to monitor PM2.5 concentrations in their surroundings and send the measurement results to the cloud (e.g., AirCasting [60], Clarity [61], Laser Egg [62], LASS [63], and uHoo [64]).

Data quality and reliability are a challenge when it comes to large-scale sensing systems. Although a number of approaches address the anomaly detection issue in large-scale sensing systems [65–71], the following two aspects have not been investigated by existing studies:

1. Effectiveness of Anomaly Detection: Existing anomaly detection techniques rely on long periods of anomaly-free observations, a priori data profiles, or a large number of labeled observations. However, these prerequisites are infeasible for real-world, large-scale systems because data collection, analysis, and modeling require a great deal of time. Moreover, existing detection techniques cannot deal with unexpected measurement results, which are inevitable in real-world systems and must be handled properly.

2. Inference of Anomalous Events: Most existing techniques focus on detecting outliers in the raw measurement results rather than identifying anomalous events, which is

(49) 28. ADF: an Anomaly Detection Framework for Large-scale PM2.5 Sensing Systems essential for real-world environmental sensing systems. Although some approaches [71] can be extended to infer anomalous events, however they are delayed indicators due to their sophisticated models. They also fail to identify events not shown in the training dataset. To address the above issues, we propose an Anomaly Detection Framework (ADF) for. large-scale, real-world environmental sensing systems. The ADF can effectively identify outliers in the raw measurement data, as well as infer anomalous events that are perceivable by the general public and government authorities. The framework is based on a core module. 政 治 大. called Time-Sliced Anomaly Detection (TSAD), which identifies spatial, temporal, and. 立. spatio-temporal anomalous instances in each segment of time-sliced measurement data. ‧ 國. 學. derived by a large-scale sensing system. Next, the sequences of TSAD results are fed into the Real-time Emission Detection (RED), Device Ranking (DR), and Malfunction Detection. ‧. (MD) modules to, respectively, infer potential regional emission sources, rank the data. sit. y. Nat. consistency of each device, and assess the properties of each device.. al. er. io. Using a real-world dataset from a large-scale PM2.5 monitoring system called AirBox [72], we conduct a comprehensive analysis of the dataset, and fine tune the parameters of the. n. v i n C h Meanwhile, theURED, DR, and MD modules are TSAD module to ensure its effectiveness. engchi implemented as online services. The real-time results are publicly available in an open data format. Our research outcomes have been used by third parties in their data visualization services. Furthermore, the analysis results are sent to our city and national EPA authorities for on-demand responses and consideration in formulating environmental policy. The contributions of this work are as follows. 1. 
We propose an Anomaly Detection Framework (ADF) for large-scale environmental sensing systems. The proposed framework is simple, effective, and capable of inferring anomalous events.. DOI:10.6814/DIS.NCCU.TIGP.001.2019.B02.

2. We discuss the decision criteria of the ADF parameters and conduct a comprehensive analysis based on a real-world dataset to identify the optimal parameter settings.

3. We implement the proposed framework and incorporate it into a large-scale PM2.5 monitoring system called AirBox. As a result, the system now runs as a regular service that provides open data for further applications.

4. We demonstrate that the proposed system is highly extensible and can be used in large-scale environmental sensing systems for real-time anomaly detection.

3.3 Literature Review

Data quality is the key to the success of a large-scale networked sensing system. However, the issue is extremely challenging, especially when the sensing devices are deployed in uncontrolled environments [73]. Anomaly detection is an essential factor in the implementation of such mechanisms. There have been several studies of anomaly detection in large systems. They can be classified into three categories based on their fundamental algorithms:

1. Statistics-based Techniques:

Wu et al. [71] proposed a spatial mining-based approach to detect outlying and boundary data in sensor networks. Their approach considers the spatial correlation of the readings between a sensor and its neighboring nodes. If the absolute value of the sensor's deviation degree is greater than a pre-selected threshold, the reading is regarded as an outlier. However, the selection of the threshold is very subjective, and it is made without evaluation using realistic datasets.

Subsequently, Paschalidis and Chen [70] developed a statistical framework that considers both spatial and temporal correlations between the sensor readings in anomaly detection. Based on Markov models, the framework relies on long periods of anomaly-free observations, but this may not be possible for real-world environmental monitoring applications [74]. Moreover, the detection process is quite long and cannot be performed in a timely manner.

2. Cluster-based Techniques:

Cluster-based techniques do not require prior knowledge of the data distribution in the sensor readings, and they can support incremental models, i.e., the system can adapt to new data instances over time [73]. Such techniques detect an anomalous event if 1) the centroid of its closest cluster is a known anomaly event; or 2) the distance to the closest cluster that is a normal event is greater than the threshold. The drawbacks of these techniques are 1) determining the distance between multivariate measurement data is challenging; 2) defining the threshold of a cluster width for normal events is complicated; and 3) the measurement data of non-homogeneous environments cannot be accommodated easily [68]. As a result, these types of techniques are generally used as quick filters for outlier detection (e.g., reducing the number of false alarms in healthcare systems [66] and contextual anomaly detection based on sensor profiles [67]). They are not suitable for real-world anomalous event detection.

3. Machine Learning-based Techniques:

Machine learning-based techniques are able to learn anomaly detection from labeled data, and they can be implemented using state-of-the-art machine learning tools directly.
The drawbacks of such approaches are 1) a large amount of labeled data is needed as training data in order to cover all possible types of sensor readings in the system; and 2) it is difficult to achieve accurate anomaly detection unless a sophisticated set of tuning and optimization processes is investigated and implemented.

For instance, Murphree proposed a sparse autoencoder that uses neural networks for anomaly detection in large systems [69]. When a large error is observed between the test data and the data reconstructed by the sparse autoencoder, the test data is far from the distribution described by the healthy system and is therefore considered an anomaly. Ayadi et al. compared three machine learning-based algorithms, namely, outlier detection by active learning, identifying the density-based local outliers, and feature bagging for outlier detection [65]. Based on the evaluation results, the authors concluded that 1) none of the algorithms is suitable for every case, and the performance of each one depends on the properties of the application datasets; and 2) anomaly detection in dynamic systems is still challenging.

3.4 The Anomaly Detection Framework

Fig. 3.1 The system architecture of the proposed anomaly detection framework

The system architecture of the proposed anomaly detection framework is shown in Figure 3.1. First, the real-time sensor measurement data is fed into the Time-Sliced Anomaly Detection (TSAD) module to detect any abnormal data. Depending on the application, the TSAD results are then processed by the Real-time Emission Detection (RED) module, the Device Ranking (DR) module, and the Malfunction Detection (MD) module. We discuss each module in the following subsections.
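To make the data flow concrete, the following minimal Python sketch mimics the fan-out structure described above: one time-sliced detection step whose flags are consumed by three downstream modules. It is an illustration only. The function names, the `Flag` record, the thresholds, and every decision rule (median-based spatial test, previous-reading temporal test, and the placeholder RED, DR, and MD rules) are assumptions introduced here, not the definitions used by the framework, which are given in the following subsections.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Flag:
    device: str
    slice_index: int
    spatial: bool   # reading differs from the slice-wide median
    temporal: bool  # reading differs from the device's previous reading


def tsad(slices, spatial_thr=20.0, temporal_thr=30.0):
    """Flag every reading in every time slice (hypothetical rules and thresholds)."""
    flags, previous = [], {}
    for i, readings in enumerate(slices):
        if not readings:
            continue
        centre = median(readings.values())
        for device, value in readings.items():
            spatial = abs(value - centre) > spatial_thr
            temporal = (device in previous
                        and abs(value - previous[device]) > temporal_thr)
            flags.append(Flag(device, i, spatial, temporal))
        previous.update(readings)
    return flags


def device_ranking(flags):
    """DR placeholder: order devices by their share of flagged readings."""
    total, flagged = defaultdict(int), defaultdict(int)
    for f in flags:
        total[f.device] += 1
        flagged[f.device] += int(f.spatial or f.temporal)
    return sorted(total, key=lambda d: (flagged[d] / total[d], d))


def emission_candidates(flags):
    """RED placeholder: readings that are both spatially and temporally anomalous."""
    return [(f.slice_index, f.device) for f in flags if f.spatial and f.temporal]


def malfunction_candidates(flags):
    """MD placeholder: devices that are spatially anomalous in every slice."""
    by_device = defaultdict(list)
    for f in flags:
        by_device[f.device].append(f.spatial)
    return sorted(d for d, s in by_device.items() if all(s))


# Made-up PM2.5 readings for five hypothetical devices over three time slices.
slices = [
    {"A": 10, "B": 12, "C": 11, "D": 80, "E": 11},
    {"A": 11, "B": 13, "C": 60, "D": 82, "E": 12},
    {"A": 12, "B": 12, "C": 12, "D": 81, "E": 11},
]
flags = tsad(slices)
print(device_ranking(flags))
print(emission_candidates(flags))
print(malfunction_candidates(flags))
```

In this toy input, device C shows a short-lived spike and device D reads persistently high, so the two cases end up in different downstream outputs. The point of the sketch is the separation of concerns: detection is performed once per time slice, and each application interprets the same sequence of flags in its own way.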
