國立政治大學資訊科學系
Department of Computer Science
National Chengchi University

碩士論文
Master's Thesis

以影像為基礎之智慧型睡眠監測系統
Intelligent Video-based Sleep Monitoring System

研究生: 郭仁和 (Student: Jen-ho Kuo)
指導教授: 廖文宏 (Advisor: Wen-Hung Liao)

中華民國九十九年七月
July 2010

以影像為基礎之智慧型睡眠監測系統
Intelligent Video-based Sleep Monitoring System

Student: Jen-ho Kuo
Advisor: Wen-Hung Liao

A Thesis submitted to the Department of Computer Science, National Chengchi University, in partial fulfillment of the requirements for the degree of Master in Computer Science

July 2010


以影像為基礎之智慧型睡眠監測系統

摘要 (Chinese Abstract)

We propose an intelligent sleep monitoring system that observes sleep quality through image analysis and uses the resulting measurements to infer the optimal wake-up time. The system, named iWakeUp, collects and processes video data in a non-contact fashion; the device is meant to be installed in an ordinary bedroom to assist the sleeper, as one component of a smart home that raises the quality of daily life. In this thesis we describe each module of iWakeUp, from measuring the amount of motion and inferring the sleep stage to constructing the wake-up mechanism. In particular, we consider the relationship between the wake-up time and the wake-up rule: a decision to wake the user earlier must carry higher confidence, since a mistake at that point costs more, and vice versa. In addition, to cope with the light and shadow changes in the bedroom in the morning, different background models have been integrated and tested so that the system maintains its accuracy over long observation periods. Finally, we conducted a clinical experiment with iWakeUp; the results indicate that sleepers awakened by iWakeUp report lower sleepiness and better vigor.

關鍵字 (Keywords): sleep monitoring, video surveillance, smart home living, adaptive background model

Intelligent Video-based Sleep Monitoring System

Abstract

We present a video-based monitoring system to determine the sleep status and optimal wakeup time in this thesis. We envision a smart living space in which a data collection and processing module named iWakeUp is installed in the bedroom to record and monitor sleep in a non-invasive manner. We describe the overall structure of the iWakeUp system, including the procedure to measure the amount of motion, the method for inferring wake/sleep status from the acquired video, and the logic for deciding the optimal wakeup time. In particular, a time-dependent decision rule has been incorporated to account for unequal penalties when classification errors occur. Furthermore, various background modeling techniques have been examined to address lighting changes at dawn in the bedroom for long-term monitoring. Validation experiments were carried out to compare the alertness level upon awakening with and without the use of iWakeUp; participants awakened by iWakeUp reported a lower level of sleepiness and a higher level of vigorousness in comparison to the control group.

Key Words: video-based sleep monitoring, sleep stages, smart living space, adaptive background modeling

誌謝 (Acknowledgements)

This thesis is dedicated to everyone who has helped me in my life, including my family, my mountaineering and rock-climbing partners, and all my good friends. Without your support and companionship along the way, I would not be who I am today.

To complete this thesis, I must first thank 陳瑩明 of the Department of Psychology: without his day-and-night collection and analysis of the clinical participants' data, the method in this thesis would never have been validated. I must also thank the partners at NTU INSIGHT, through whom I had the chance to carry out cross-disciplinary research and push the results toward application. And of course I will not forget my colleagues at VIPL; doing research with you was a joy, and may this thesis serve as a memento.

In my graduate career I especially thank my advisor, who not only guided the direction of my research but was also a true friend, always offering well-rounded and balanced advice whenever we were lost.

仁和, Taipei, mid-autumn 2010

TABLE OF CONTENTS

1. Introduction
2. Related Work
   2.1. Traditional Approaches on Monitoring Sleep Quality
      2.1.1. Acti-watch
      2.1.2. Polysomnography
      2.1.3. Questionnaire
3. Techniques in Video-based Sleep Monitoring
   3.1. Near Infrared Images
   3.2. Background Modeling
      3.2.1. Consecutive Frames Subtraction
      3.2.2. Gaussian Mixture Model
      3.2.3. Local Binary Pattern
      3.2.4. Local Ternary Pattern Model
   3.3. Image Noise and Noise Removal
      3.3.1. Gaussian Smooth Filter
      3.3.2. Median Filter
      3.3.3. Image Binarization
   3.4. Motion History Image
      3.4.1. Mechanism of Motion History Image
      3.4.2. Features from Motion History Image
4. The iWakeUp System
   4.1. System Architecture
      4.1.1. Video Acquisition
      4.1.2. Background Modeling
      4.1.3. Noise Removal
      4.1.4. Display Movement Areas
   4.2. Wake-up Rule
   4.3. A Time-Dependent Wake-up Rule
   4.4. The iWakeUp User Interface

   4.5. Experiments with Varying Lighting Conditions
      4.5.1. Dataset Generation
         4.5.1.1. Lightness Changes at Dawn
         4.5.1.2. Adding Artificial Light
      4.5.2. Experimental Results
         4.5.2.1. Uniform Brightness Environment
         4.5.2.2. Varying Brightness Environment
      4.5.3. Discussion
5. Validation Results
   5.1. Method
   5.2. Results and Discussion
6. Conclusions and Future Work

Nomenclature
References

LIST OF FIGURES

Fig. 2-1: Left: a sleeper wears an Acti-watch; Middle: Acti-watch with light sensor produced by BMedical; Right: Acti-watch produced by Philips
Fig. 2-2: The actigraph generated by Acti-watch
Fig. 2-3: Left: a child with many sensors attached to his body; Right: the channels of physiological signals of PSG
Fig. 2-4: An example of a questionnaire evaluating KSS and VAS
Fig. 3-1: Comparison of normal images (a)(c) and infrared images (b)(d)
Fig. 3-2: The first column shows the first frames of consecutive frame pairs, the second column the second frames, and the last column the results of consecutive frames subtraction
Fig. 3-3: Every pixel belonging to the background is modeled as part of a GMM; μ and σ denote the mean and the standard deviation of a Gaussian distribution, respectively
Fig. 3-4: (Top) original frames captured from video; (bottom) corresponding motion after performing Gaussian mixture subtraction
Fig. 3-5: The procedure to generate a local binary pattern
Fig. 3-6: The histogram of the LBP background model
Fig. 3-7: The procedure of making a local ternary code
Fig. 3-8: Left: image captured at ISO 100; center: image captured at ISO 1600; right top: detail view of the left image; right bottom: detail view of the center image

Fig. 3-9: Examples of salt and pepper noise
Fig. 3-10: Two-dimensional Gaussian smooth filter and its discrete version
Fig. 3-11: Comparison before and after applying a Gaussian smooth filter (kernel size = 3)
Fig. 3-12: The procedure of the median filter with a 3-by-3 kernel
Fig. 3-13: Left: image corrupted with salt and pepper noise; Right: image denoised with a median filter (kernel size = 3)
Fig. 3-14: Comparison before and after applying binarization to Fig. 3-2 with threshold 87
Fig. 3-15: Motion information cannot be retrieved from a single image
Fig. 3-16: (Top) original frames; (bottom) MHI after adding timestamps; brighter areas indicate 'newer' motions and darker areas indicate 'older' motions
Fig. 3-17: The white circle is the global orientation of the motion; the red circle is the local orientation of the motion in a specific area
Fig. 4-1: The concept of the iWakeUp system
Fig. 4-2: The architecture of the iWakeUp system
Fig. 4-3: The near infrared video devices used in the experiment
Fig. 4-4: (a) Incoming video frame; after applying (b) CFD, (c) GMM, (d) LBP, (e) LTP background subtraction
Fig. 4-5: (a)(c) original images after CFD and GMM background subtraction, respectively; (b)(d) corresponding results after image binarization and median filtering
Fig. 4-6: Example of overlaying movement areas on the original frame
Fig. 4-7: Global motion and direction of movement

Fig. 4-8: Two classes of motion pattern and the corresponding motion amount. (a)(c): slight, sustained motion; (b)(d): sudden, substantial motion
Fig. 4-9: Binary classification with penalty mechanism
Fig. 4-10: Classification accuracy vs. the penalty ratio of awake and asleep classes
Fig. 4-11: Classification accuracy vs. the penalty ratio of awake and asleep classes in the range [0.0001, 0.1]
Fig. 4-12: The iWakeUp user interface
Fig. 4-13: The architecture for building artificial varying-brightness videos
Fig. 4-14: The actual scenes of videos captured at dawn
Fig. 4-15: The actual scene and its corresponding mean and standard deviation for every zone
Fig. 4-16: Left: curve increasing linearly; Right: curve increasing exponentially
Fig. 4-17: The smooth surface and its changes in mean and standard deviation
Fig. 4-18: The first-order difference of the mean and standard deviation
Fig. 4-19: The accuracy of background subtraction in different environments
Fig. 4-20: (a)(c)(e)(g)(i) under a uniform brightness environment; (b)(d)(f)(h)(j) under a varying brightness environment. (a)(b) original frames; (c)(d) after CFD; (e)(f) after GMM; (g)(h) after LBP; (i)(j) after LTP
Fig. 5-1: Karolinska Sleepiness Scale under 2 experimental conditions
Fig. 5-2: Subjective ratings of wakefulness under 2 experimental conditions
Fig. 5-3: Subjective ratings of alertness under 2 experimental conditions
Fig. 5-4: Subjective ratings of vigorousness under 2 experimental conditions

LIST OF TABLES

Table 4-1: The characteristics of each background subtraction approach
Table 4-2: The noise removal approaches applied with the corresponding background subtraction approaches
Table 4-3: Sleep/Wake status using SVM
Table 4-4: The penalty ratio and awake accuracy in different time slots
Table 4-5: The profile of videos captured at dawn
Table 4-6: The training and testing datasets, randomly selected from the recorded datasets
Table 4-7: The results under a uniform brightness environment
Table 4-8: The results under a varying brightness environment

1. Introduction

In the past few years we have witnessed a vibrant growth of interest in the development and construction of smart buildings, with the intention of providing comfortable living spaces and enhancing overall quality of life [1][16]. Solutions such as theft-deterrent devices, remote lighting or window curtain control systems, intelligent TVs and smart air conditioning have been integrated into common household environments. These advanced technologies help to provide a safe and healthy environment for the inhabitants, making our world a better and more enjoyable place to live. Although many efforts have been devoted to improving different aspects of human life, sleep seems to be a relatively neglected area.

Sleep occupies one third of our life, and its quality also affects the other two thirds. Empirical studies have shown that sleep disruption or deprivation may have detrimental effects on mood and cognitive performance during the daytime, while the enhancement of sleep has been reported to improve daytime functioning. Therefore, good sleep quality is a crucial determinant of good life quality.

It is the objective of this thesis to address the monitoring and enhancement of the quality of sleep in a smart home environment. In [8], we developed a video-based system to perform activity and movement pattern analysis in overnight sleep studies, in both laboratory settings and regular homes. The results using our method have shown good correlation with polysomnography (PSG) measurements and Actiwatch data.

In this thesis we adopt a similar approach to design a sleep monitoring system for smart home environments. Our objective is two-fold. Firstly, we wish to enable continuous, long-term monitoring and assessment of sleep quality under different lighting conditions and in different bedroom settings. Secondly, we aim to devise an intelligent alarm which can determine the optimal wakeup time based on movement data inferred from the recorded video.

The large variations in bedroom environments present a challenge for estimating the amount of movement from acquired video. Background modeling becomes a crucial component when we take into account the gradual change of lighting in the morning. Frame differencing is conceptually simple and can tolerate very slow changes in lightness. The Gaussian mixture model (GMM) is a well-known background modeling approach which has proven effective and efficient in many applications [13]. The local binary pattern (LBP) is a texture-based feature descriptor that considers the relationship between a pixel and its neighbors to arrive at a binary encoding; the histogram of patterns in a certain area can be used to model the background. LBP is computationally efficient and can tolerate illumination variations [5]. The local ternary pattern (LTP) is an extension of LBP. It defines a range for those neighbors whose intensities are close to the center one, so it can better handle the encoding, and therefore the modeling, of flat regions, which is the major flaw of LBP. These techniques will be employed to model the changing background in the bedroom. For evaluation purposes, we will add artificial lighting to the video clips and investigate the robustness of these methods in different situations.

It is known that sleep consists of several stages that cycle throughout the night [11]. Stage 1 is the transition state between sleep and wakefulness. Stage 2 is considered a more stable sleep stage, and stages 3 and 4 are considered deep sleep. An additional stage, rapid-eye-movement (REM) sleep, is usually associated with vivid dreams. It has been documented that waking someone up in the deep sleep stage has a negative impact on the individual's cognitive performance and alertness, whereas waking up from light sleep results in a better mental status [4][7][14]. Therefore, we propose to monitor the sleep status prior to the preset alarm time and determine an appropriate time for waking up, based on the wake/sleep condition inferred from the video. The wakeup logic is deduced from observations on hundreds of video clips previously recorded with PSG measurement as ground-truth reference; the rules are then formalized using machine learning algorithms.

Since it is more detrimental to wrongfully wake someone during sound sleep than not to awaken him until alarm time, the system should assign a larger penalty to the former classification error. Moreover, the difference in penalties should decrease as we approach the preset alarm time. A support vector machine with a time-varying cost function is employed to address this issue.

To investigate the effectiveness of the iWakeUp alarm, we designed an experimental protocol to comparatively analyze the alertness level of participants with and without the iWakeUp system. Additionally, the iWakeUp unit has been integrated into a multi-purpose Sleep Coach system [18] and is currently under extensive field tests to further validate its efficacy in regular household environments.

The rest of this thesis is organized as follows. Chapter 2 summarizes related work in video-based sleep monitoring and presents traditional approaches to monitoring sleep quality. Chapter 3 introduces the image analysis techniques used in this thesis. Chapter 4 presents the overall architecture of the iWakeUp system and details the operation of the key components, along with the iWakeUp user interface; it also summarizes the experimental results of background modeling under various lighting conditions. Validation results using iWakeUp are elaborated in Chapter 5. Chapter 6 concludes this thesis and presents thoughts on future work.

2. Related Work

At present, most research on sleep-related disorders relies heavily on polysomnography (PSG) [2]. Standard overnight PSG comprises continuous recordings of several channels of physiological signals such as EEG, EOG, EMG and EKG, which can be utilized to classify sleep stages and evaluate the quality of sleep. Other physiological measures, such as nasal and oral airflow and oxygen saturation, may be applied for the evaluation of various sleep disorders. Although PSG provides reliable information for in-depth analysis of sleep behavior, its high cost and the need for well-trained professionals prevent wide-spread installation of such devices. For a smart home environment, it is more practical to consider a replacement that is low-cost and can be easily configured. A video-based approach, which requires a relatively inexpensive near infrared camera and a computer, is an alternative that deserves further investigation.

As we pointed out in [8], only a few articles are concerned with video-based approaches to sleep study. Furthermore, most of these studies [12] center on the investigation of sleep-related disorders, e.g., obstructive sleep apnea syndrome. The work in [17] studies normal subjects, but their objective is to classify sleep motion (body up, down, left, right) from image sequences captured with an IR-based night vision camera.

On the other hand, there are some products (e.g., SleepTracker [6], aXbo sleep phase alarm [3]) on the market that claim to monitor one's sleep cycle and wake the user up at the right moment.

They are wrist-worn devices that collect activity data in a way similar to an Actiwatch. Compared with a video-based approach, an actigraph method requires contact with the human body. It can only record gross motion at a low resolution and is subject to bias due to local movement. Consequently, we anticipate that our technique will produce more robust results than these wrist-worn devices.

An ideal alarm clock must work in real time; otherwise it cannot wake the user up on time. This means that all the necessary video analysis tasks must be completed within a certain time constraint. To meet this requirement, we have to consider the processing load of the incoming video and the related effort for background modeling. GMM [13], LBP [5] and LTP all claim high tolerance to noise and can adapt while building the background models. In contrast, frame differencing is the simplest algorithm, but it may fail when there are changes in lighting. The review in [10] provides a survey of some important background modeling approaches with comparative analysis. In this thesis, we apply various background modeling techniques to help obtain movement data from the acquired video.

2.1. Traditional Approaches on Monitoring Sleep Quality

There exist several approaches to monitoring the sleep status of a patient, such as the Acti-watch, PSG and questionnaires. Each of these approaches has its advantages for monitoring sleep status but also suffers from certain limitations. The Acti-watch is designed just like a normal watch worn on the wrist, but it can make the user feel bound and uncomfortable. PSG is a device with multiple channels that records various physiological signals from a human being; it is employed by many medical institutions and psychology research centers to evaluate the sleep quality of patients.

Although PSG can completely monitor and record the sleep status of a patient overnight, it still uses a contact method in which many sensor units have to be attached to the user. A questionnaire is an evaluation approach that estimates a patient's sleep quality through subjective rating, but in the sleep research field it is not often used on its own.

In this thesis, we will demonstrate that our video-based sleep monitoring system can produce results that correlate highly with those obtained using conventional approaches. To begin with, we introduce the traditional sleep monitoring approaches in the following.

2.1.1. Acti-watch

An Acti-watch is a wrist-worn activity logger that records one's movements in digital form. It is produced by several companies such as Philips, Cambridge Neurotechnology, and BMedical (Fig. 2-1). The Acti-watch is a relatively non-invasive way of monitoring human rest/activity cycles. The activity data, also known as an actigraph, acquired by the Acti-watch can be sent to a computer for later analysis (Fig. 2-2). Activity levels derived from the actigraph have been found to correlate with sleep/wake patterns, pain level, mood, fatigue/alertness and other quantifiable parameters. Some Acti-watches are equipped with light sensors that can sense illumination changes in the environment. This helps researchers determine whether activities occur during the day or at night.

Fig. 2-1: Left¹: a sleeper wears an Acti-watch; Middle²: Acti-watch with light sensor produced by BMedical; Right³: Acti-watch produced by Philips

The function of the Acti-watch is to record movement over fixed intervals and then apply a weighted linear decision rule to estimate the wake/sleep status (Eq. (2-1), Eq. (2-2)). If the weighted amount of movement exceeds a predefined threshold T, we label the interval as wake, and vice versa. Let A_i denote the activity level during interval i and let w_k be the (device-specific) weights over the surrounding intervals. The status of interval i is set to wake if

D_i = \sum_{k=-4}^{4} w_k A_{i+k} > T    (2-1)

¹ Retrieved from http://cbti.respironics.com/
² Retrieved from http://www.bmedical.com.au/shop/actiwatch-2-starter-kit-includes-actiwatch-2-dock-and-software-minimitter-philips.html
³ Retrieved from http://actiwatch.respironics.com/

Otherwise, it is set to sleep if

D_i = \sum_{k=-4}^{4} w_k A_{i+k} \le T    (2-2)

In sleep studies, each interval corresponds to 30 seconds, or one page of data in PSG. In other words, the formula above uses the activity log from the two minutes before and after a specific page (k = -4, ..., 4) to determine the status of the subject.

Fig. 2-2: The actigraph generated by Acti-watch
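To make the rule concrete, here is a minimal NumPy sketch of epoch scoring. The ±4-epoch window follows the two-minute description above, but the weights `W` and the threshold `T` are illustrative placeholders, not the calibrated values of any particular Acti-watch model.

```python
import numpy as np

# Illustrative symmetric weights over epochs i-4 .. i+4; real devices
# ship calibrated weights, which are not reproduced here.
W = np.array([0.04, 0.04, 0.2, 0.2, 1.0, 0.2, 0.2, 0.04, 0.04])
T = 40.0  # hypothetical wake threshold

def score_epochs(activity, weights=W, threshold=T):
    """Return 1 (wake) or 0 (sleep) per 30-second epoch, Eq. (2-1)/(2-2)."""
    padded = np.pad(activity.astype(float), len(weights) // 2)
    # D_i = sum_k w_k * A_{i+k}; reversing the kernel turns
    # np.convolve into the correlation written in Eq. (2-1).
    d = np.convolve(padded, weights[::-1], mode="valid")
    return (d > threshold).astype(int)

if __name__ == "__main__":
    counts = np.array([0, 2, 1, 0, 55, 80, 60, 3, 1, 0, 0, 0])
    print(score_epochs(counts))
```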

2.1.2. Polysomnography

In most laboratories and medical centers, PSG is utilized to study sleep-related disorders. Because of its high accuracy, it has become the gold standard in the field of sleep research. A PSG device requires placing several electrodes on the patient's body, and attaching every electrode to the skin so as to acquire the most accurate and reliable physiological signals is time-consuming. Once PSG has been configured successfully, however, it can record several channels of physiological signals at the same time, such as EEG, EOG, EMG and EKG. All this information is useful for analyzing sleep status and helps doctors figure out what problem a patient has. However, the unacceptably high cost prohibits the deployment of such devices in a home environment, and the discomfort of the attached sensors also poses a serious problem for monitoring sleep quality (Fig. 2-3).

Fig. 2-3: Left⁴: a child with many sensors attached to his body; Right⁵: the channels of physiological signals of PSG

Sivan et al. [12] reported that results from PSG measurements showed a high correlation with those obtained from home video recordings of children with obstructive sleep apnea syndrome. We use the gold standard from PSG measurement as ground truth and compare it with the results of our proposed approach in this thesis.

⁴ Retrieved from http://en.wikipedia.org/wiki/Portal:Medicine/Selected_picture/97
⁵ Retrieved from http://www.advancedsleepdisorderscenter.com/polysomnogram_sleepstudy.shtml

2.1.3. Questionnaire

In sleep-related research, it is often necessary to collect subjective ratings alongside PSG or Acti-watch data to better interpret the acquired data. The information gathered from these devices reflects objective measures; it cannot decipher the true feelings of a subject. For a more comprehensive evaluation of sleep quality, it is beneficial to get extra information from the patient directly using a questionnaire. An example (the Karolinska Sleepiness Scale (KSS) and Visual Analogue Scales (VAS)) is illustrated in Fig. 2-4. KSS is a scale from 1 to 9 that evaluates the subject's mental state: 1 indicates the most alert state, while 9 corresponds to the sleepiest state. VAS is another technique used to estimate the current mental state of a patient, by marking a line at a suitable position on the scale. This approach gives the participant a relatively concrete scale for recounting the exact mental state in comparison to the last measurement.

A questionnaire captures the subjective feelings of patients directly, mitigating the impersonality of device-only measurements. By integrating results from device measurements and questionnaires, it is possible to evaluate sleep quality from different perspectives, making the estimation of sleep quality more complete.

Fig. 2-4: An example of a questionnaire evaluating KSS and VAS

3. Techniques in Video-based Sleep Monitoring

Advances in computer vision research have made real-time manipulation and analysis of images and video streams possible. Many applications, such as special effects, image understanding, and video analytics, have been explored. The work by Sivan et al. [12] marks one of the first attempts to apply image processing techniques to the study of sleep disorders, and it inspired our approach to monitoring sleep: determining the sleep status and evaluating sleep quality. In the following, the image processing techniques related to the work in this thesis are discussed, including near infrared images, background modeling, noise removal and motion history images.

3.1. Near Infrared Images

We can see things in the world because light emitted from sources is reflected by the surfaces of objects into our eyes. Thanks to this reflection, we can distinguish the color of a specific object and what it looks like. But the spectrum of light is very wide, and not all of it is visible to us: the visible range spans roughly 380 to 780 nanometers. Radiation outside this range is invisible to human eyes, but that does not mean it is also invisible to other animals or to light sensors.

Since people cannot see anything in a totally dark environment, it is necessary to exploit devices that can sense light outside the visible spectrum to help us observe the environment.

Near infrared imaging is one such night vision approach that helps us sense the surroundings in a weakly illuminated environment. Near infrared light ranges from about 700 nm to 900 nm in the spectrum. By using near infrared, it is possible to take pictures or build a surveillance system in an environment without visible light (Fig. 3-1). It is also widely used in light-sensitive experiments, such as monitoring sleep status by projecting near infrared light onto patients without disturbing them.

Fig. 3-1: Comparison of normal images (a)(c) and infrared images (b)(d)⁶

⁶ Retrieved from http://en.wikipedia.org/wiki/Infrared_photography and http://en.wikipedia.org/wiki/Night_vision

3.2. Background Modeling

Background subtraction is a widely used approach for detecting moving objects in videos. The rationale of the approach is to detect moving objects from the difference between the current frame and a reference frame, called the background model. In image processing, moving objects or regions not belonging to the background are called the foreground, while objects that remain still throughout the video are called the background. To tell which part is foreground and which is background for further analysis, a set of image processing techniques collectively called background subtraction is used to separate the foreground from the scene.

According to Piccardi [10], three types of background modeling techniques can be identified: parametric background density estimation, non-parametric background density estimation, and spatial correlation approaches. Parametric methods, such as the Gaussian mixture model (GMM), need some variables pre-assigned to the model to get better results. Frame differencing and the local binary pattern (LBP) and local ternary pattern (LTP) background models fall into the category of non-parametric approaches. Spatial correlation approaches, such as the co-occurrence of image variations, attempt to model the background scene using spatial information.

Each of the above approaches has its own advantages and weak points. To get the best performance, the user has to find out the situations for which each approach is suitable. In the following, the background modeling methods used in this thesis are elaborated in turn.

3.2.1. Consecutive Frames Subtraction

The simplest approach to background modeling is consecutive frames subtraction. If two frames are totally identical, the result is a dark image, since the frame is in effect subtracted from itself. On the other hand, if there are bright areas in the subtraction image, we can conclude that some movement occurred near those regions. Through the subtraction of consecutive frames, it is possible to identify areas not belonging to the still background. In Fig. 3-2, the areas with bright responses belong to the foreground; the remaining dark places are considered part of the background.

Fig. 3-2: The first column shows the first frames of consecutive frame pairs, the second column the second frames, and the last column the results of consecutive frames subtraction.
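As an illustration, a minimal consecutive-frames-subtraction loop with OpenCV might look like the following; the video path and the difference threshold are placeholders.

```python
import cv2

cap = cv2.VideoCapture("sleep_clip.avi")  # placeholder path
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Bright pixels in the difference image mark regions where two
    # consecutive frames disagree, i.e. candidate foreground motion.
    diff = cv2.absdiff(gray, prev)
    moving = int((diff > 25).sum())  # 25: illustrative threshold
    print(moving, "candidate motion pixels")
    prev = gray

cap.release()
```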

3.2.2. Gaussian Mixture Model

GMM is an adaptive approach that models the color or intensity distribution of every pixel in an image as a mixture of several Gaussian distributions. As illustrated in Fig. 3-3, every pixel belonging to the background is modeled as part of a GMM. Once a motion pixel shows up, it is excluded from the model because its intensity differs greatly from those in the model. Using a GMM, it is possible to build an adaptive mechanism that simulates the background color distribution of every pixel; pixels are then treated as motion areas if they do not fit the current model. The advantage of GMM is that it can be tuned by setting a learning rate, and different values can be assigned to parameters such as the mean μ and standard deviation σ of each Gaussian distribution to adapt to various types of background. But since GMM has to model every pixel in an image, its computational complexity is high. Fig. 3-4 gives an example of applying GMM to a video clip. GMM can be used to locate motion areas and to rebuild the background if an object becomes still in the video.
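OpenCV ships an adaptive GMM background subtractor in this family (MOG2); a brief sketch of how such a module could be driven, with illustrative parameter choices:

```python
import cv2

# MOG2 models each pixel as a mixture of Gaussians and adapts over
# time; history and varThreshold here are illustrative, not tuned.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=False)

cap = cv2.VideoCapture("sleep_clip.avi")  # placeholder path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # learningRate controls how quickly a still object is absorbed
    # back into the background model.
    fg_mask = subtractor.apply(frame, learningRate=0.005)
    motion = int((fg_mask > 0).sum())
cap.release()
```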

Fig. 3-3: Every pixel belonging to the background is modeled as part of a GMM; μ and σ denote the mean and the standard deviation of a Gaussian distribution, respectively

Fig. 3-4: (Top) original frames captured from video; (bottom) corresponding motion after performing Gaussian mixture subtraction

3.2.3. Local Binary Pattern

LBP is a texture-based method for background subtraction proposed by Heikkilä and Pietikäinen [5]. Each pixel is modeled as a group of local binary pattern histograms calculated over a circular region around the pixel. The procedure for calculating the LBP is illustrated in Fig. 3-5 and Eq. (3-1), in which LBP_{P,R}(x_c, y_c) denotes the LBP code of a center pixel (x_c, y_c), g_c is the gray value of the center pixel, and g_p is the gray value of the p-th of P equally spaced neighbors on a circle of radius R. LBP has been shown to be tolerant to illumination variations, the multimodality of the background, and the introduction or removal of background objects. Furthermore, this method can achieve real-time processing.

LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c) \, 2^p,  where s(x) = 1 if x \ge 0 and s(x) = 0 otherwise    (3-1)

Fig. 3-5: The procedure to generate a local binary pattern⁷

After calculating the LBP for every pixel of an image, the LBP background model can be established by specifying a parameter R_region, the radius of the area over which LBP statistics are gathered. As illustrated in Fig. 3-6, the histogram that serves as the LBP background model is gathered from the local binary patterns in this predefined area. This is the model for a single pixel; every pixel in the image has its own histogram, which can be calculated separately. After all of the background models have been built, we can compare the new histogram from a new frame with the existing one, using simple intersection or the χ² distance, to see how similar they are. The higher the similarity, the more likely the patterns belong to the existing background. In other words, the pixel does not belong to the background if the similarity between the two histograms is lower than a specified threshold.

⁷ Retrieved and edited from http://www.scholarpedia.org/article/Local_Binary_Pattern
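A minimal sketch of the per-pixel code of Eq. (3-1) and the region histogram that serves as the background model, assuming P = 8 neighbors at radius R = 1 sampled on the pixel grid (the published operator interpolates off-grid samples) and coordinates away from the image border:

```python
import numpy as np

# Offsets of the 8 neighbours at radius 1, in a fixed order.
OFFS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
        (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """Eq. (3-1): threshold the 8 neighbours against the centre pixel."""
    gc = int(img[y, x])
    code = 0
    for p, (dy, dx) in enumerate(OFFS):
        if int(img[y + dy, x + dx]) - gc >= 0:
            code |= 1 << p
    return code

def region_histogram(img, y, x, r_region):
    """Normalized histogram of LBP codes in a (2r+1)^2 patch:
    the background model of the pixel at (y, x)."""
    hist = np.zeros(256)
    for yy in range(y - r_region, y + r_region + 1):
        for xx in range(x - r_region, x + r_region + 1):
            hist[lbp_code(img, yy, xx)] += 1
    return hist / hist.sum()

def similarity(h1, h2):
    """Histogram intersection; a value below a preset threshold
    marks the pixel as foreground."""
    return np.minimum(h1, h2).sum()
```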

Fig. 3-6: The histogram of the LBP background model

3.2.4. Local Ternary Pattern Model

LBP is a computationally efficient local texture descriptor that has been applied successfully to tasks such as texture classification, face recognition, and background modeling. However, LBP is quite sensitive to random noise in near-uniform image regions, which are typically seen in many home environments. The local ternary pattern (LTP) classifies the neighboring pixels in a region into three sets: greater than, approximately equal to, and less than the center pixel. The formula used to define LBP (Eq. (3-1)) needs only a slight modification for calculating the LTP (Eq. (3-2)). Referring to the diagram in Fig. 3-7, LTP encodes the relationship using a ternary string, so that a slight change in a pixel's intensity value does not alter the corresponding representation, achieving better noise immunity than the original LBP [8].

s_t(x) = +1 if x \ge t;  s_t(x) = 0 if |x| < t;  s_t(x) = -1 if x \le -t,  with x = g_p - g_c and a user-defined tolerance t    (3-2)

Fig. 3-7: The procedure of making a local ternary code

The rest of the procedure for establishing the background model is almost the same as that for LBP background modeling. The major difference is the number of bins in the histogram: because the base of the code changes from 2 to 3, the number of bins grows very quickly. If a large radius is adopted, the computational complexity becomes prohibitively high. There is always a trade-off between the need for speed and accuracy.
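Continuing the LBP sketch above, the ternary comparison of Eq. (3-2) can be encoded as follows. Splitting the ternary code into 'upper' and 'lower' binary halves keeps the histograms small; this is a common implementation choice, not necessarily the one used in the thesis.

```python
OFFS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
        (1, 1), (1, 0), (1, -1), (0, -1)]  # as in the LBP sketch

def ltp_codes(img, y, x, t=5):
    """Eq. (3-2): neighbours within +/- t of the centre encode 0;
    returns the (upper, lower) binary halves of the ternary code.
    t is an illustrative tolerance."""
    gc = int(img[y, x])
    upper = lower = 0
    for p, (dy, dx) in enumerate(OFFS):
        d = int(img[y + dy, x + dx]) - gc
        if d >= t:
            upper |= 1 << p
        elif d <= -t:
            lower |= 1 << p
    return upper, lower
```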

3.3. Image Noise and Noise Removal

By definition, image noise is the random variation of brightness or color information in images produced by the sensor and circuitry of a scanner or digital camera. Image noise is generally regarded as an undesirable by-product of image capture. Noise can significantly degrade image quality, so every camera manufacturer tries to reduce the noise in an image: the less noise a device generates, the better the image quality.

To address the noise interference issue, we have to perform post-processing on images. But there are several kinds of noise, caused by different mechanisms, and no single approach can eliminate all of them at the same time. However, since each type of noise has its own characteristics, in most cases it is possible to find an inverse procedure to remove it; the operators that remove noise are called 'filters'. To sum up, the most common way to remove noise from pictures is to find out which type of noise is present and then adopt the corresponding filter to eliminate it.

There are several well-known types of noise. The most common in image and video processing are high-ISO noise and salt and pepper noise.

High ISO noise:

High ISO noise often occurs when taking a picture in a high-ISO mode. Referring to Fig. 3-8, this noise manifests as tiny colored specks scattered all over the picture, much like the grain obtained with high-speed film. It is necessary to eliminate this kind of annoying noise to a certain extent when images are acquired in low-illumination environments.

Fig. 3-8: Left: image captured at ISO 100; center: image captured at ISO 1600; right top: detail view of the left image; right bottom: detail view of the center image⁸

Salt and pepper noise:

Salt and pepper noise appears in images as randomly scattered white and black pixels; it is named for its resemblance to grains of salt and pepper, as shown in Fig. 3-9. Fortunately, the intensities of this noise fall in two extreme intensity intervals, so it is easy to eliminate the noisy pixels using statistical approaches.

⁸ Retrieved from http://en.wikipedia.org/wiki/Image_noise

(38) 政 治 大. Fig. 3-9: Examples of salt and pepper noise9. 學. 3.3.1.. ‧ 國. 立. Gaussian Smooth Filter. ‧. Gaussian smooth filter is often used to blur images and remove detail and high ISO. y. Nat. sit. noise. Images applied by Gaussian smoothing filter have the more smooth results called. n. al. er. io. Gaussian blur. But this filter is different from mean filter which just gives all its neighbor. i Un. v. pixels the same weight and calculates the mean from them as its new value. Instead, it uses. Ch. engchi. distribution whose shape is like a bell to give different weights to its neighbors for generating more natural smooth surface. The kernel of Gaussian smoothing filter in image processing is often designed as a 2-D convolution operator in order to take both two directions of an image into account. Eq. (3-3 gives the equation for Gaussian smoothing filter, in which the positions in this probability model and. stands for the standard deviation of the. distribution.. 9. denote. Retrieved at http://en.wikipedia.org/wiki/Image_noise and http://en.wikipedia.org/wiki/Salt_and_pepper_noise. 25.

G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)    (3-3)

To make calculation on images easier, this equation has a discrete version. Fig. 3-10 illustrates the Gaussian smooth filter in two dimensions together with its discrete version. By applying it, we can eliminate the high-ISO noise in Fig. 3-8 and obtain a better result; see Fig. 3-11 for the processed image.

Fig. 3-10: Two-dimensional Gaussian smooth filter and its discrete version
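For illustration, the discrete kernel of Eq. (3-3) can be built and applied as below; cv2.GaussianBlur is the equivalent library call, and the file name is a placeholder.

```python
import numpy as np
import cv2

def gaussian_kernel(size=3, sigma=1.0):
    """Sample Eq. (3-3) on an integer grid and normalise to sum 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return g / g.sum()

img = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)  # placeholder
smoothed = cv2.filter2D(img, -1, gaussian_kernel(3, 1.0))
# Equivalent library routine:
smoothed2 = cv2.GaussianBlur(img, (3, 3), 1.0)
```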

Fig. 3-11: Comparison before and after performing a Gaussian smooth filter (kernel size = 3)¹⁰

¹⁰ Retrieved from http://en.wikipedia.org/wiki/Image_noise

3.3.2. Median Filter

Images captured with infrared cameras in dark environments often contain salt and pepper noise: black and white pixels whose intensities fall at the two extremes of the brightness range. They do not belong to the group of 'clean' pixels that composes the image. If the pixels in a neighborhood are sorted by intensity, the salt and pepper noise lies at the two ends of the sorted sequence; if there is a method to pick out the noise, it becomes possible to enhance the quality of images corrupted with salt and pepper noise.

Borrowing the idea of the median from statistics, there is always a middle number in a sorted sequence, called the median. Because the median always sits at the center of the sorted sequence, it is unaffected by the extreme values contributed by noise. As illustrated in Fig. 3-12, the filter selects an area of the image and sorts the pixel intensities to obtain a sorted sequence. It then takes the number at the center position as the median and, in the last step, assigns the median to the central pixel of the selected area, completing the median filtering procedure. The user can also change the radius of the filter to obtain various effects according to need. Fig. 3-13 compares images before and after applying a median filter.

Fig. 3-12: The procedure of the median filter with a 3-by-3 kernel
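The procedure of Fig. 3-12, written out directly; OpenCV's medianBlur is the fast library equivalent.

```python
import numpy as np

def median_filter(img, k=3):
    """Replace each pixel by the median of its k-by-k neighbourhood."""
    half = k // 2
    padded = np.pad(img, half, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            window = padded[y:y + k, x:x + k].ravel()
            out[y, x] = np.sort(window)[window.size // 2]
    return out

# Fast equivalent: denoised = cv2.medianBlur(img, 3)
```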

Fig. 3-13: Left: image corrupted by adding salt and pepper noise; Right: image denoised with a median filter (kernel size = 3)

3.3.3. Image Binarization

Image binarization is not designed to deal with any specific type of noise; it is used to eliminate pixels that have a weak response in an image.

By specifying a suitable threshold, the image can be re-colored into a binary image with two pixel intensities: one group contains the pixels whose intensities exceed the preset threshold, the other the pixels whose intensities do not. Through this procedure, a gray-scale image is converted into a binary image. However, it is difficult to determine what the threshold should be; if the binarization threshold does not separate noise from normal pixels well, useful information can easily be lost.

Eq. (3-4) and Fig. 3-14 illustrate how image binarization works, where I(x, y) denotes the original image, B(x, y) the image after binarization, (x, y) a pixel's coordinates, and T the threshold:

B(x, y) = 255 if I(x, y) > T;  B(x, y) = 0 otherwise    (3-4)

Fig. 3-14: Comparison before and after applying binarization to Fig. 3-2 with threshold 87
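Eq. (3-4) amounts to a single vectorized comparison; the threshold of 87 merely echoes Fig. 3-14.

```python
import numpy as np

def binarize(img, threshold=87):
    """Eq. (3-4): pixels above the threshold become 255, the rest 0."""
    return np.where(img > threshold, 255, 0).astype(np.uint8)
```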

3.4. Motion History Image

As Fig. 3-15 shows, it is difficult to estimate an object's movement from a single image: a single image contains only static information at a specific moment. We need an image sequence to serve this purpose. Because frame rates are usually over 15 fps, the interval from frame to frame is too short for a human to notice any delay, yet the positions of an object in two consecutive frames are still slightly different. By accumulating the differences between frames, it is possible to trace the movement of an object by marking time stamps on the slight motions that occurred in different frames. This technique is called the motion history image.

Fig. 3-15: Motion information cannot be retrieved from a single image

3.4.1. Mechanism of Motion History Image

In practical implementations, the motion history image (MHI) requires the specification of four variables: the history image H, a motion (silhouette) image D, the current timestamp τ, and a duration δ.

As illustrated in Eq. (3-5) and Fig. 3-16, the image D, which contains the motion information, is used to update the motion history in H. An element of H is assigned the timestamp τ, which is usually the frame count or the real time of the current frame, if the element at the same position in D is nonzero. Only if no motion occurred, i.e., the element in D is zero, and the timestamp recorded in H is less than τ minus δ, is the motion history in this element cleared, because the motion took place too long ago. Here (x, y) denotes the coordinates of a pixel in the image:

H(x, y) = τ, if D(x, y) ≠ 0;  H(x, y) = 0, if D(x, y) = 0 and H(x, y) < τ - δ;  H(x, y) unchanged, otherwise    (3-5)

Fig. 3-16: (Top) original frames; (bottom) MHI after adding timestamps; brighter areas indicate 'newer' motions and darker areas indicate 'older' motions
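The update rule of Eq. (3-5) translates directly into array operations; a sketch (OpenCV's optional motion-template module in opencv-contrib provides an equivalent routine):

```python
import numpy as np

def update_mhi(mhi, silhouette, timestamp, duration):
    """Eq. (3-5): stamp moving pixels with the current time and clear
    entries older than (timestamp - duration)."""
    moving = silhouette != 0
    mhi[moving] = timestamp
    mhi[~moving & (mhi < timestamp - duration)] = 0
    return mhi

# Usage: mhi = np.zeros(frame_shape, np.float32), then call
# update_mhi(mhi, diff_mask, frame_index, duration=15) per frame.
```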

3.4.2. Features from Motion History Image

From the motion history image, we can determine the overall trend of movement by visual inspection. In addition, two features can be extracted from the MHI: the global amount of movement at that time, and the global direction of movement at that moment. The global amount of movement M is easily calculated by summing all nonzero elements of H, as shown in Eq. (3-6); the larger the sum, the stronger the movement. The global direction of movement is obtained by accumulating the gradient orientations over the image. As illustrated in Fig. 3-17, the orientation of the motion at a pixel is calculated from the gradients of H by Eq. (3-7), where θ(x, y) denotes the direction at pixel (x, y); the global direction of movement is then acquired by combining all of the orientations:

M = \sum_{(x,y):\, H(x,y) > 0} H(x, y)    (3-6)

\theta(x, y) = \arctan\!\left(\frac{\partial H / \partial y}{\partial H / \partial x}\right)    (3-7)

By using these two features, we can gain a deeper understanding of the motion.

For example, the pattern may indicate waving hands if similar amounts of movement appear continuously but the directions of the two movements are opposite. In conclusion, the features admit different interpretations depending on which aspect we are concerned with. When combined with a body localization algorithm, global and local features computed from the MHI can be used to recognize the posture during sleep.

Fig. 3-17: The white circle is the global orientation of the motion; the red circle is the local orientation of the motion in a specific area
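The two global features of Eqs. (3-6) and (3-7) can be sketched with NumPy gradients. Note that the naive averaging of angles is only indicative; a circular mean would be needed to handle the wrap-around at ±π properly.

```python
import numpy as np

def global_motion_features(mhi):
    """Return (amount, direction) per Eqs. (3-6) and (3-7)."""
    amount = float(mhi[mhi > 0].sum())      # Eq. (3-6)
    gy, gx = np.gradient(mhi)               # gradients of the MHI
    theta = np.arctan2(gy, gx)              # Eq. (3-7), per pixel
    mask = (gx != 0) | (gy != 0)
    direction = float(theta[mask].mean()) if mask.any() else 0.0
    return amount, direction
```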

4. The iWakeUp System

The iWakeUp system is an intelligent alarm clock driven by a video-based monitoring unit. The operating principle is as follows. The user sets an alarm time as usual. At a certain time (say 30 minutes) prior to the alarm time, the video monitoring system is activated. It then analyzes the incoming video continuously to estimate the sleep status and wakes the user if certain wake-up criteria are met. The overall concept of the system is illustrated in the diagram below (Fig. 4-1); a sketch of this decision loop follows the figure. A practical alarm clock must achieve real-time performance if it is to be used in real life. Notice that we have to use near-infrared cameras in order to capture video in total darkness without disturbing the sleeper. Finally, the activity data collected by iWakeUp can further be used for deeper diagnosis. With regard to privacy concerns, the video recording component is designed as an optional function and can be turned off by the user at any time.

Fig. 4-1: The concept of the iWakeUp system
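The decision loop can be summarized as follows. The sketch only shows the control flow; `camera`, `estimate_status` and `wake_up_rule` are stand-ins for the acquisition and classification modules described in the rest of this chapter.

```python
import time
from datetime import datetime, timedelta

MONITOR_WINDOW = timedelta(minutes=30)  # start watching 30 min early

def iwakeup_loop(alarm_time, camera, estimate_status, wake_up_rule):
    """Schematic loop: monitor before the alarm, ring early if the
    sleeper is confidently judged to be in a light-sleep/wake state."""
    while datetime.now() < alarm_time - MONITOR_WINDOW:
        time.sleep(1)                    # idle until the window opens
    while datetime.now() < alarm_time:
        frame = camera.read()
        status = estimate_status(frame)  # wake/sleep inference
        # The rule may demand higher confidence the earlier it is
        # (Section 4.3), hence it also receives the clock times.
        if wake_up_rule(status, datetime.now(), alarm_time):
            return ring()                # wake early
    return ring()                        # fall back to the alarm time

def ring():
    print("Wake up!")
```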

4.1. System Architecture

The iWakeUp system consists of two main components: movement extraction and examination by wake-up rules. The movement in a video frame is extracted by the movement extraction component, which includes video acquisition, background modeling and noise removal. Each module has its own algorithm to process the input from the previous step and feed the results into the next. For example, the background modeling module derives foreground images from the original video and feeds them into the noise removal step, which eliminates noise. Once the filtered foreground image has been obtained, we can overlay it on the original frame in a different color for visual inspection; the video with the overlaid movement areas can then be examined for double checking. Overlaying the movement areas on the original video is optional and was actually removed from the final implementation due to privacy concerns. In other words, the iWakeUp system keeps the incoming video images in a temporary buffer and flushes them once the desired information has been computed.

The movement data are then examined by the wake-up rules to decide whether iWakeUp should ring the bell to wake the user. But how do we determine the wake-up rules in the iWakeUp system? This is the most critical part of the system, and the implementation details of each constituent module are presented subsequently.

Fig. 4-2: The architecture of the iWakeUp system

4.1.1. Video Acquisition

The iWakeUp system employs near infrared cameras due to their ability to capture images in total darkness. With the assistance of near infrared lights, it can acquire relatively clear video data of a real scene in a low-illumination environment. The experimental setup for collecting video of sleepers is shown in Fig. 4-3. In order not to disturb the participants, it is necessary to keep the sleeping environment as comfortable as possible.

Fig. 4-3: The near infrared video devices used in the experiment

In all video-based surveillance systems, the video resolution and frame rate play important roles in storage cost, processing speed and accuracy. It is difficult to get accurate results from an image processing and machine learning based system when the input images are poor, but adopting a higher video resolution or frame rate slows down processing. Since iWakeUp is an intelligent alarm clock, it is required to react in real time, so we have to find a balance between quality and speed. After several rounds of field testing, the resolution was set to 320x240 and the frame rate to 30 fps. Under these settings, the system works smoothly. Notice that the time stamps and channel numbers added by other video equipment are removed to prevent the inclusion of false movements.

The simplest way to model the background is by frame averaging. However, this method fails to work reliably for two reasons: (1) images acquired by near-infrared cameras are very noisy, and (2) ambient light can change gradually at dawn. To address these issues, we have modified the local binary pattern (LBP) descriptor defined in [5] into a local ternary pattern (LTP) for a more robust representation and better modeling of the background in noisy image sequences. In order to make the testing more complete for the sleeping environment, a well-known background subtraction method, the Gaussian mixture model (GMM), is also implemented. Table 4-1 gives a brief comparison of the different background subtraction methods. Because the walls and bed in a bedroom often have smooth textures, we test these approaches in the background module in order to find out which one yields the best result in a sleeping environment. Fig. 4-4 compares the results of applying the various background subtraction methods to an incoming video frame. CFD can detect the motion of the upper body but cannot sense the movement of the cover. GMM gives a similar result to CFD but also generates plenty of salt-and-pepper noise. LBP tends to produce a lot of false positives around the wall region. Only LTP successfully distinguishes where the motion is, with few false positives.

Table 4-1: The characteristics of each background subtraction approach

          Non-/Parametric   Texture-based   Adaptive   Speed
  CFD     N                 N               N          1
  GMM     P                 N               Y          2
  LBP     N                 Y               Y          3
  LTP     N                 Y               Y          4
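The following sketch illustrates only the LTP descriptor itself, under the common convention of splitting the ternary code into 'upper' and 'lower' binary patterns; the tolerance t is an assumed value. A full background model would additionally maintain adaptive pattern histograms per image region and compare incoming patterns against them, in the spirit of the LBP method of [5].

```python
import numpy as np

def ltp_codes(gray, t=5):
    # Encode each 8-neighbourhood ternarily: a neighbour counts as +1 if it
    # is brighter than the centre by more than t, -1 if darker by more than
    # t, and 0 otherwise. The +1s form the 'upper' binary pattern and the
    # -1s form the 'lower' one.
    c = gray[1:-1, 1:-1].astype(np.int16)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros(c.shape, dtype=np.uint8)
    lower = np.zeros(c.shape, dtype=np.uint8)
    h, w = gray.shape
    for bit, (dy, dx) in enumerate(offsets):
        n = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int16)
        upper |= (n > c + t).astype(np.uint8) << bit
        lower |= (n < c - t).astype(np.uint8) << bit
    return upper, lower
```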

Fig. 4-4: (a) Incoming video frame; results after applying (b) CFD, (c) GMM, (d) LBP, and (e) LTP background subtraction

4.1.3. Noise Removal

We have considered the effect of noise in the background modeling process. However, that only deals with noise at the individual pixel level; we cannot presume that all moving pixels are caused by the movement of the subject at this stage. Some motion areas are generated by noisy pixels caused by poor camera quality, and some are created by motion outside of the region of interest (ROI), i.e., outside the area we are concerned with. The former problem can be resolved using morphological filters to remove small and isolated blocks. The latter can be corrected by examining the layout of the room and the position of the bed with respect to the camera, accumulating only the data within the ROI.

We can always examine specific regions of a video to reduce the influence of noise. However, it is not necessary to apply morphological filters to the results of every background subtraction method. Since the characteristics of the background subtraction methods vary, the types of noise they produce are also diverse. For example, the foreground produced by GMM always contains salt-and-pepper noise due to its statistical nature; such noise can easily be removed by applying a median filter. In consecutive frame differencing, the noise often has very weak responses, so it can easily be removed by setting an intensity threshold. Other methods such as LBP and LTP do not need further processing of their foreground images, because the noise cannot be easily removed or the results are already good enough. Table 4-2 lists the noise removal algorithm we adopt for each background modeling method. Fig. 4-5 shows the results after applying these filters: image binarization enhances the motion areas from CFD, and the salt-and-pepper noise generated by GMM is effectively eliminated by the median filter.
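A minimal sketch of this per-method noise removal, mirroring Table 4-2 below; the kernel size and intensity threshold are assumed values for illustration.

```python
import cv2

def clean_foreground(fg, method):
    # Match the noise removal step to the background subtraction method.
    if method == "GMM":
        # Salt-and-pepper noise responds well to median filtering.
        return cv2.medianBlur(fg, 5)
    if method == "CFD":
        # Weak noisy responses are suppressed by intensity thresholding.
        _, binary = cv2.threshold(fg, 20, 255, cv2.THRESH_BINARY)
        return binary
    return fg  # LBP / LTP foregrounds are used as-is
```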

Table 4-2: The noise removal approaches applied with the corresponding background subtraction approaches

          Median Filter   Image Binarization
  CFD                     V
  GMM     V

Fig. 4-5: (a)(c) The original images after CFD and GMM background subtraction, respectively; (b)(d) the corresponding results after image binarization and median filtering

4.1.4. Display Movement Areas

This module is optional in the iWakeUp system, but it is very important for an operator who cares about which part of the body moves. Displaying the movement areas is very simple: we just overlay the foreground image, after applying the corresponding noise removal filter, back onto the original frame captured from the video. With the aid of a distinct color, one can immediately see where the movements occurred. This module serves not only designers and operators who care about the correct functioning of the system, but also doctors and researchers who want to gather more information about a patient's movements for diagnosing sleep disorders. Fig. 4-6 shows an example of overlaying the movement areas on the original frame. Once again, this module is optional and can be removed at any time if the user expresses concerns.

Fig. 4-6: An example of overlaying movement areas on the original frame
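One plausible way to render such an overlay, assuming a BGR frame and a binary foreground mask (the color and blending weight are arbitrary choices, not values from the system):

```python
import cv2

def overlay_motion(frame, fg_mask, color=(0, 0, 255), alpha=0.5):
    # Blend motion pixels into the original frame in a distinct color.
    colored = frame.copy()
    colored[fg_mask > 0] = color
    return cv2.addWeighted(frame, 1 - alpha, colored, alpha, 0)
```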

4.2. Wake-up Rule

This is the most critical component in the iWakeUp system. An accurate wake-up rule ensures the user is awakened at a proper sleep stage. To derive the rules, we first need to determine what information can be obtained from the acquired video.

The outcome of the noise removal stage is a binary image containing the positions of motion pixels. It is straightforward to compute the number of motion pixels (or the percentage of moving pixels, to eliminate the dependency on video resolution); we call this the global motion amount. When a number of motion frames have been processed, we can further compute the motion history image [8] and the associated motion gradient to obtain the direction of global movement, as shown in Fig. 4-7. Applying the same procedure to local patches, we can also estimate the local motion amount and the direction of local movement.

Fig. 4-7: Global motion (motion pixels) and direction of movement (degrees) over time (seconds)
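A simplified sketch of the motion history image update, together with a crude global direction estimate from its gradient; the actual motion-gradient computation of [8] uses local orientation estimates and is more involved, so this is only illustrative.

```python
import numpy as np

def update_mhi(mhi, fg_mask, timestamp, duration=1.0):
    # Stamp motion pixels with the current time; entries older than
    # `duration` seconds fade out to zero.
    mhi[fg_mask > 0] = timestamp
    mhi[mhi < timestamp - duration] = 0
    return mhi

def global_direction(mhi):
    # Aggregate the MHI gradient into a single angle (in degrees).
    gy, gx = np.gradient(mhi)
    return float(np.degrees(np.arctan2(gy.sum(), gx.sum())))
```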

We then derive the wake-up logic from empirical studies. This is done by human observation of several hundred video clips taken around 30 minutes before waking, cross-referenced with PSG for the ground-truth sleep stage. Two types of motion pattern that signal wakefulness have been identified:

1) Slight, yet sustained movements (Fig. 4-8(a)(c))

2) Short-term, substantial movements (Fig. 4-8(b)(d))

More precisely, the rules can be summarized as follows:

1) The global motion amount exceeds ML for at least TL seconds, or

2) The accumulated global motion amount exceeds MS within TS seconds.

Fig. 4-8: Two classes of motion pattern and the corresponding motion amount. (a)(c): slight, sustained motion; (b)(d): sudden, substantial motion
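Read literally, the two rules could be checked as follows; ML, TL, MS and TS are shown with purely hypothetical values here since, as explained next, suitable values are in effect learned rather than hand-tuned.

```python
from collections import deque

M_L, T_L = 200, 10      # hypothetical: pixels, seconds (sustained rule)
M_S, T_S = 10000, 3     # hypothetical: pixels, seconds (substantial rule)

history = deque(maxlen=max(T_L, T_S))  # one motion amount per second

def should_wake(motion_amount):
    history.append(motion_amount)
    recent = list(history)
    sustained = (len(recent) >= T_L and
                 all(m > M_L for m in recent[-T_L:]))   # rule 1
    substantial = sum(recent[-T_S:]) > M_S              # rule 2
    return sustained or substantial
```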

The above logic agrees well with our intuition. But how do we determine the parameters ML, TL, MS and TS? This is achieved implicitly by utilizing a machine learning strategy: the support vector machine (SVM).

Here we set the time interval for motion computation to 1 second. At each time unit, we gather the motion amounts of the preceding and following 7 seconds to form a 15-dimensional feature vector. The vector is fed to an SVM with an RBF kernel to classify the awake/asleep status. It should be noted that the asleep cases significantly outnumber the awake cases, as the video is taken during "sleep". Therefore we need to keep a balance between the numbers of positive and negative instances used in the training phase. Specifically, we selected 172 samples (43 awake instances, 129 asleep instances) for training and 56 samples (14 awake instances, 42 asleep instances) to evaluate the performance. The results are summarized in Table 4-3. Notice that the overall accuracy is 88%, yet the accuracy for the 'awake' status is only 65%. One may argue whether this is good enough for an operative alarm. According to our analysis of the collected video clips, there are on average three actual 'awake' instances in the 30 minutes before awakening, so the proposed scheme should achieve the goal as expected.

Table 4-3: Sleep/wake classification results using SVM

  SVM Test Result   Awake   Asleep   Overall
  Accuracy          65%     95%      88%
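A sketch of this classification step with scikit-learn; the arrays are random placeholders standing in for the labelled motion features, with the same sample counts as in the text.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder data: each row is a 15-dimensional window of per-second
# motion amounts (the 7 seconds before and after the current second);
# label 1 = awake, 0 = asleep.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((172, 15)), rng.integers(0, 2, 172)
X_test, y_test = rng.random((56, 15)), rng.integers(0, 2, 56)

clf = SVC(kernel="rbf")  # RBF-kernel SVM, as described in the text
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```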

4.3. A Time-Dependent Wake-up Rule

The previous results were obtained using the same penalty for both types of misclassification error (recognizing awake as asleep, and asleep as awake). Considering that it is less desirable to wrongfully wake someone up too early, we can impose a time-dependent cost factor to dynamically adjust the penalty, so that we are more conservative about making a 'wake-up' decision when the alarm time is still far off.

Referring to Fig. 4-9, let the white circles denote the 'awake' instances and the black circles denote the 'asleep' instances. The original SVM formulation aims to find a hyperplane separating these two categories by maximizing the distance between the two classes while tolerating some instances on the soft margin. The objective function can be expressed as follows:

\min_{w,b,\xi}\; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i
\text{subject to}\;\; y_i(w \cdot x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; \forall (x_i, y_i) \in D \qquad (4-1)

where D represents the training data set, x_i denotes the instances and y_i the category labels; \xi_i represents the magnitude of the soft-margin violation, and C denotes the penalty. Here the penalties arising from the two types of erroneous classification are treated the same. The newly proposed time-dependent rule calls for a slight modification of the objective function:

\min_{w,b,\xi}\; \frac{1}{2}\|w\|^2 + C^{+}(t)\sum_{i:\, y_i = +1} \xi_i + C^{-}(t)\sum_{i:\, y_i = -1} \xi_i \qquad (4-2)

where C^{+}(t) and C^{-}(t) denote the time-varying penalties of the two types of classification error, respectively.

Fig. 4-9: Binary classification with penalty mechanism
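In practice, an asymmetric objective of this form can be approximated with scikit-learn's class_weight parameter, which rescales the shared penalty C per class; this is one plausible way to realize Eq. (4-2) without a custom solver.

```python
from sklearn.svm import SVC

def make_classifier(c_wake, c_sleep=1.0):
    # Per-class penalties: class_weight multiplies C for each class,
    # matching the C+(t) / C-(t) terms of Eq. (4-2).
    return SVC(kernel="rbf", class_weight={1: c_wake, 0: c_sleep})
```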

To observe the effect of asymmetric penalties, we perform an exhaustive evaluation of the classification performance by varying the awake-class penalty C^{+} in the range 0.0001 to 12 (the asleep-class penalty C^{-} is always set to 1). The results are depicted in Fig. 4-10. The accuracy of the awake category increases when the penalty of the awake class is set to a large value, and vice versa. In order to keep both the awake and the overall accuracy acceptable, the specific range 0.0001 ~ 0.1 for C^{+} is employed (Fig. 4-11).

Fig. 4-10: Classification accuracy vs. the penalty ratio of the awake and asleep classes
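The sweep just described might look as follows, reusing make_classifier and the placeholder data from the earlier sketches (the step size simply mirrors the tick spacing visible in Fig. 4-10):

```python
import numpy as np

for ratio in np.arange(0.0001, 12, 0.3641):
    clf = make_classifier(c_wake=ratio).fit(X_train, y_train)
    pred = clf.predict(X_test)
    wake_acc = (pred[y_test == 1] == 1).mean()   # awake-class accuracy
    sleep_acc = (pred[y_test == 0] == 0).mean()  # asleep-class accuracy
    print(f"{ratio:.4f}: wake {wake_acc:.2%}, sleep {sleep_acc:.2%}")
```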

Fig. 4-11: Classification accuracy vs. the penalty ratio of the awake and asleep classes in the range [0.0001, 0.1]

The time-dependent decision rule works as follows. If the preset alarm time is still far off, misclassifying the asleep class as awake should be punished more heavily to prevent wrongfully awakening somebody; this conservative strategy ensures that the alarm only rings in high-confidence situations. As time passes, the misclassification penalty of the asleep class is reduced to lower the standard of the wake-up rules and increase the opportunity for awakening. An example of this dynamic assignment is shown in Table 4-4. It should be pointed out that when the misclassification penalty is set differently for each type of event, the overall penalty, rather than the classification error, is the quantity we attempt to minimize. Therefore, in the case shown in Table 4-4, 65% (from Table 4-3) is the upper bound on the wake accuracy.

Table 4-4: The penalty ratio and awake accuracy in different time slots

  Slot#   Time remaining (min)   Penalty ratio (wake/sleep)   Wake accuracy
  1       30                     0.044                        13.33%
  2       25                     0.048                        20.00%
  3       20                     0.052                        26.67%
  4       15                     0.056                        40.00%
  5       10                     0.060                        46.47%
  6       5                      0.064                        60.00%
  7       0                      Preset alarm time
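The schedule of Table 4-4 can be expressed as a simple lookup. Slot boundaries are interpreted here as "the ratio holds until the next slot begins", which is our reading of the table rather than something the text states explicitly.

```python
# (minutes remaining before the preset alarm, wake/sleep penalty ratio)
PENALTY_SCHEDULE = [(30, 0.044), (25, 0.048), (20, 0.052),
                    (15, 0.056), (10, 0.060), (5, 0.064)]

def penalty_ratio(minutes_remaining):
    if minutes_remaining <= 0:
        return None  # slot 7: preset alarm time reached, ring unconditionally
    for threshold, ratio in reversed(PENALTY_SCHEDULE):
        if minutes_remaining <= threshold:
            return ratio
    return PENALTY_SCHEDULE[0][1]  # more than 30 min: most conservative
```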

4.4. The iWakeUp User Interface

The iWakeUp user interface is illustrated in Fig. 4-12. The upper left part shows the incoming video (320x240, 30 fps by default), and the image with the green circle depicts the calculated motion amount along with the direction of global motion. The lower left image shows the real-time amount of movement as red bins and the direction of movement as blue bins, so the observer can easily see what has happened recently and notice important movements. System parameters and some numerical results are displayed in the right panel, called the system panel; the "Now" label shows the current time. The system can also analyze both streaming and pre-recorded video in real time through the control panel, which is placed below the information panel. By clicking the "Open from Video Device" button, the user can select the capture device iWakeUp should use; motion information can also be extracted from existing videos by clicking the "Open from File" button.

Whatever source the user selects, iWakeUp stops processing and releases all allocated resources automatically when the "Stop" button is clicked. When observing a sleeper through a live camera, it is also necessary to specify the alarm time range, located at the bottom right of the window. At the same time, we have tried to keep the main function module of the iWakeUp system as separate from the user interface as possible, so that the iWakeUp module can be easily integrated into Sleep Coach, a multi-purpose sleep-aid device developed in another project in collaboration with the NTU INSIGHT Center. Within Sleep Coach, the user no longer sees the iWakeUp screen under normal operation.

Fig. 4-12: The iWakeUp user interface
