
National Science Council (Executive Yuan) Research Project Interim Progress Report

Center for Excellence in e-Learning Science (2/3)

Interim Progress Report (Condensed Version)

Project type: Integrated project
Project number: NSC 98-2631-S-003-002-
Project period: 1 August 2009 to 31 July 2010
Host institution: Science Education Center, National Taiwan Normal University
Principal investigator: 張俊彥
Co-principal investigators: 李忠謀, 陳伶志, 曾元顯, 襲充文, 李蔡彥, 柯佳伶, 方瓊瑤, 陳柏琳, 侯文娟, 楊芳瑩, 江政杰, 葉富豪, 林育慈
Report attachments: reports and published papers from attendance at international conferences
Availability: this project report is open to public inquiry

2 June 2010

(2)

Research Project Funded by the National Science Council, Executive Yuan

Interim Progress Report (Condensed Version)

Center for Excellence in e-Learning Science (2/3)

Project type: □ Individual project  ■ Integrated project

Project number: NSC 97-2631-S-003-003-

Project period: 1 December 2008 to 31 July 2009

Principal investigator:
張俊彥, Professor, Department of Earth Sciences, National Taiwan Normal University

Co-principal investigators:
李忠謀, Professor, Department of Computer Science and Information Engineering, National Taiwan Normal University
柯佳伶, Associate Professor, Department of Computer Science and Information Engineering, National Taiwan Normal University
方瓊瑤, Associate Professor, Department of Computer Science and Information Engineering, National Taiwan Normal University
陳柏琳, Associate Professor, Department of Computer Science and Information Engineering, National Taiwan Normal University
侯文娟, Assistant Professor, Department of Computer Science and Information Engineering, National Taiwan Normal University
曾元顯, Research Fellow, Information Technology Center, National Taiwan Normal University
楊芳瑩, Professor, Graduate Institute of Science Education, National Taiwan Normal University
陳伶志, Assistant Research Fellow, Institute of Information Science, Academia Sinica
襲充文, Professor, Department of Psychology, National Chung Cheng University
李蔡彥, Professor, Department of Computer Science, National Chengchi University
江政杰, Assistant Professor, Department of Information Technology, Takming University of Science and Technology
葉富豪, Assistant Professor, Department of Information Management, Fooyin University
林育慈, Professor, National Chi Nan University

Postdoctoral researchers:
陳佳利, Postdoctoral Researcher, Science Education Center, National Taiwan Normal University
張月霞, Postdoctoral Researcher, Science Education Center, National Taiwan Normal University

This interim report includes the following required attachment: one report on attendance at an international academic conference.

Host institution: National Taiwan Normal University

1 June 2010

(3)

I. Project Abstract

In collaboration with leading international research institutions, this Center for Excellence in e-Learning Science develops an innovative science learning environment that integrates emerging technologies (image processing, speech processing, video processing, speech recognition, mobile technologies, machine translation, natural language processing, data mining, machine learning, and so on). Its main purpose is to build an intelligent classroom based on learning materials and assessment tools that support both individual learning and group interaction. To achieve this goal, the project brings together experts from science education, cognitive science, computer science, and information engineering, and proposes two subprojects to construct the intelligent future classroom (Smart Classroom 2.0), aiming to sketch a blueprint for the future classroom environment. Within the innovative learning environment developed by this project, the focus of evaluation and exploration is on teachers' teaching approaches, students' learning strategies, teacher-student interaction, and cognitive and affective changes in science learning outcomes. These changes include students' domain knowledge in earth science and computer science, higher-order thinking abilities, motivation, and attitudes. The overall project coordinates the operation of the subprojects, integrates the experimental designs across subprojects, and carries out the evaluation of the project as a whole. In addition, the overall project is responsible for the organization and operation of the Center for Excellence in e-Learning Science, holding regular research team meetings, building the experimental classroom environment, visiting leading international research centers and scholars, and organizing international conferences and workshops.

Keywords: learning environment, digital science learning, classroom learning, mobile learning, assessment

The R&D of i4 future learning environment, in collaboration with leading foreign institutes especially in the areas of computer science or science education, proposes to develop an innovative science learning environment which integrates modern technologies (image processing, speech processing, automatic video processing, speech recognition, mobile technologies, machine translation, natural language processing, data mining, machine learning, etc) with the aims to create an intelligent classroom that supports individualized and interactive learning materials and assessment tools. To realize the aforementioned goals, we bring together a group of experts in the area of science education, cognitive science, computer science, and computer engineering to work on two major research topics: (1) Classroom 2.0, to establish the envisioned future classroom; and (2) Testing 2.0, to pioneer new technologies on assessment. Changes and effects along four directions will be investigated and evaluated under the innovative learning environment: teachers' teaching approaches (TTA), students' learning strategies (SLS), student-teacher interactions (STI), and student science learning outcomes (SLO) in both cognitive and affective domains including students’ domain knowledge, higher-order thinking ability and attitudes and motivation in the subject matters. In particular we will look into the effects such an innovative science learning environment and students’ preferred-actual learning environment spaces have on the TTA, SLS, STI, and SLO in the school. Expected outcome includes working model of a truly smart classroom that allows for sweep upgrade of current (technology-enabled) classrooms to smart classrooms, new pedagogical models for teaching and learning in the i4 learning environment, and technologically enhanced way for data collection and interpretation on science educational researches.

(4)

II. Key Implementation Results and Their Value

Research on Multimedia Technology in Educational Settings

Under the Center for Excellence in e-Learning Science (CeeLS) project, multimedia technologies such as the 3D Compound Virtual Field Trip (3D-CVFT) and interactive animation were applied in studies of earth science field trips and cognitive load theory in college and high school settings. These empirical studies have demonstrated the effectiveness of multimedia technology in motivating learning and enhancing understanding. The studies have been accepted by the 15th Annual CyberTherapy & CyberPsychology 2010 Conference, and the study abstracts will be published in the Journal of CyberTherapy & Rehabilitation (JCR), the official journal of iACToR.

Development of Animation-Based Questionnaire

The development of a full-fledged Online Contextualized Animation-Based Questionnaire (ABQ) is in progress. The results of a comparative pilot study indicate that the ABQ is more suitable than a traditional paper-based questionnaire for describing abstract or unfamiliar question contexts. By visually presenting the question context, multimedia technology will be integrated into educational surveys to explore students' expectations of future learning environments.

Establishment of Social Tagging System and Validation of Tags for Learning

The OSR (Open Science Resources) project is a three-year project whose partner countries are mainly located in Europe. The project aims at the development of a shared digital repository for formal and informal science education. As one of the project partners, CeeLS participates in the design of the OSR portal, a set of customizable, learning-oriented discovery services reliably offered by the websites of science centers and museums, school portals, visualization environments, and other online education publishing services. The goal is to employ social tagging to increase the hit rate of keyword retrieval and enhance the functionality of metadata on the OSR portal. A tag is a kind of metadata provided by users. At present, research is being conducted on how to measure the similarity between tags; identifying similar tags helps achieve the project goals. Meanwhile, in Taiwan, a social tagging system has been built on eNTSEC (Taiwan Internet Science Education Center, http://www.ntsec.edu.tw). In addition to incorporating the OSR portal into eNTSEC, the validation of tags for educational purposes will be investigated.
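As an illustration of the kind of tag-similarity measure under study, the sketch below scores two tags by the cosine-style overlap of their co-occurrence on tagged resources. The resources, tags, and the particular measure are assumptions for illustration only, not the method adopted for the OSR portal.

from collections import Counter
from itertools import combinations
import math

# Hypothetical tagging data: resource id -> set of user-assigned tags.
tagged_resources = {
    "exhibit-01": {"volcano", "plate tectonics", "earthquake"},
    "exhibit-02": {"volcano", "lava", "earth science"},
    "lesson-17": {"earthquake", "plate tectonics", "earth science"},
}

# Count how often each tag, and each pair of tags, is assigned to the same resource.
cooccur = Counter()
tag_freq = Counter()
for tags in tagged_resources.values():
    tag_freq.update(tags)
    for a, b in combinations(sorted(tags), 2):
        cooccur[(a, b)] += 1

def tag_similarity(a, b):
    # Cosine-style similarity between two tags based on co-occurrence counts.
    pair = tuple(sorted((a, b)))
    return cooccur[pair] / math.sqrt(tag_freq[a] * tag_freq[b])

print(tag_similarity("volcano", "earthquake"))    # tags that co-occur -> positive score
print(tag_similarity("lava", "plate tectonics"))  # tags that never co-occur -> 0.0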

Research and Development of Multimedia Technology

• The Video Processing Research team has developed feasible face detection, tracking, and recognition modules suitable for the lecture-hall classroom environment. The technologies have been tested in a controlled environment, and a working prototype has been developed and tested in a semi-controlled classroom. Related research and experimental results have also been published in two SCI-indexed journal papers (Optical Engineering and Journal of Visual Communication and Image Representation).

• The Video Processing Research team has also developed motion detection and recognition modules, which can be easily modified to detect students' gestures in the classroom environment. The technologies have been successfully used to detect the critical motion of nearby moving vehicles on the expressway. Related research and experimental results have also been published in an SCI-indexed journal paper (IEEE Transactions on Intelligent Transportation Systems).

• The Speech Processing Research team investigated extractive speech summarization and proposed a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. An elegant feature of the proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. A prototype system for speech summarization and retrieval of NTNU courseware has also been established. Related research and experimental results have also been published in an SCI-indexed journal paper (IEEE Transactions on Audio, Speech and Language Processing, 2009).
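To make the ranking idea concrete, the sketch below scores each sentence of a document by a unigram sentence generative probability combined with a simple sentence prior. The smoothing scheme and the length-based prior are simplifying assumptions, not the team's actual model.

import math
from collections import Counter

def rank_sentences(sentences, lambda_=0.5):
    # Score each sentence S of a document D by log P(D|S) + log P(S), where
    # P(D|S) is a unigram generative probability with linear smoothing against
    # the whole-document distribution, and P(S) is a crude length-based prior
    # (an assumption, not the framework's actual prior).
    doc_words = [w for s in sentences for w in s.split()]
    doc_counts = Counter(doc_words)
    doc_len = len(doc_words)
    scores = []
    for s in sentences:
        s_words = s.split()
        s_counts = Counter(s_words)
        log_p = 0.0
        for w in doc_words:
            p_s = s_counts[w] / len(s_words)
            p_d = doc_counts[w] / doc_len
            log_p += math.log(lambda_ * p_s + (1 - lambda_) * p_d)
        prior = math.log(len(s_words) / doc_len)
        scores.append((log_p + prior, s))
    return sorted(scores, reverse=True)

doc = ["the lecture introduces speech summarization",
       "summarization selects salient sentences from the lecture",
       "students asked questions after class"]
for score, sent in rank_sentences(doc):
    print(round(score, 2), sent)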

• The core video processing technology has also been put to use to enhance the learning of computer science topics at the K-12 level. Our experiment shows that with our proposed model, high school students are able to learn CS topics effectively without requiring class time beyond what is already allotted. This finding and the teaching model have been published in the SSCI-indexed journal Computers & Education.

(6)

III. Benefits of the Results (including major breakthroughs and impact)

Academic Achievements

Studies of Virtual Reality (VR) Technology on Earth Science Field Trip Learning at the Higher Education Level

The Development of VR Tool for Assisting Geological Field Trip

This study describes how to effectively integrate graphic-based VR and 3D stereo-vision technology into the development of the 3D Compound Virtual Field Trip (3D-CVFT) system for earth science education. Because of restrictions and issues related to weather, distance, and safety, actual field trips are not always feasible. The 3D-CVFT is therefore designed to serve as preparatory work that lets students familiarize themselves with actual field sites.

Figure. Inspecting the Site Swinging the Wiimote

Comparison of 3D VR and Actual Geological Field Trip

This study compares an actual field trip with an online 3D Compound Virtual Field Trip (3D-CVFT) system in terms of their major functions before, during, and after the geological field trip. Positive outcomes and responses gathered from the pilot studies of the 3D-CVFT system in earth science education have encouraged the research team to further examine how the online 3D system can function most effectively in facilitating students' learning.

(7)

Figure. The comparison between actual field trip and online 3D-CVFT system

Utilizing VR to Improve Geological Field Trip Learning

In response to calls for engaging students with more field work experience in earth science learning, a virtual reality field trip program, the 3D Virtual Hsiaoyukeng Field Trip System, was developed as an undergraduate earth science field trip module for the Hsiaoyukeng area in northern Taiwan. The study results indicate that the vast majority of the undergraduate earth science students consider that the 3D VR system effectively enhances their field trip learning. The 3D system is perceived as a useful tool for facilitating the field trip learning activity, for learning assessment, and for prior knowledge acquisition.

Figure. Screenshots of Hsiaoyukeng. Figure. Picking up the rock.

The above three studies have been accepted by the 15th Annual CyberTherapy & CyberPsychology 2010 Conference and the study abstracts will be included in the Journal of CyberTherapy & Rehabilitation (JCR), the official journal of iACToR.

(8)

Integrating Interactive Functions into Instructional Animation Study

It is widely recognized that animation can function as a scaffold to assist learners in constructing mental representations of cause-and-effect actions. In this study, interactive functions are integrated into the animation (Interactive Animation Group, IAG) for the purpose of minimizing extraneous cognitive load in learning.

Figure. Screenshots of IAG

This investigation compares the IAG with a static illustrations group (SIG) and a continuous animation group (CAG). The results of the statistical analysis showed that the IAG effectively reduces individual mental effort in learning complex cause-and-effect actions. This study has been accepted by the 15th Annual CyberTherapy & CyberPsychology 2010 Conference, and the study abstract will be published in the Journal of CyberTherapy & Rehabilitation (JCR), the official journal of iACToR.

Eye Movement Study

The eye-tracking lab has been established, and some pilot tests have been conducted. By the time of writing this report, we have analyzed the eye-movement data obtained from 6 students (out of 12 available data sets). More data are being collected and analyzed. These limited data so far show that, when reading a science exploratory text that contains claims, arguments, and data, university science students attended mostly to the meanings of the different claims and backing theories, and paid relatively less attention to data and warrants. Meanwhile, readers with a relevant knowledge background spent more time reading the scientific theories and went back to the data more often than those without a relevant knowledge background. Some of the findings of the eye movement experiment have been submitted to and accepted by the 2010 National Association for Research in Science Teaching (NARST) conference.

The main tasks in the second year were to (1) establish the eye-tracking lab and (2) conduct pilot tests of students' eye movement analyses. After the FaceLAB system arrived in early 2009, it took some time to set up the eye-tracking lab and incorporate the analysis software, such as Gaze Tracker and Overlay. A closed workshop was then held to train the research assistants (3 members). The above-mentioned preparation tasks lasted about two months. Afterward, pilot tests were planned and conducted. Data analysis is still in progress.

(9)

In the pilot tests, we examined college students' reading behaviors. This task was planned because the basic patterns of information processing on science texts should be specified before further examining learners' attention to course materials in classroom environments. Science texts are unique in that they contain structural elements such as claims (arguments), theories, evidence, and data. Thus, the purpose of the pilot test is to explore how university students process such information. So far we have invited over 20 students to participate in the experiments; however, due to some technical and learner problems, only about 12 data sets are available. By the time of writing this report, we have analyzed the eye-movement patterns of 6 students. More data are being collected and analyzed.

Technical Benefits

1. Technical Research Results

(1) Video-Based Face Recognition Using a Probabilistic Graphical Model

We have proposed a probabilistic graphical model to formulate and deal with video-based face recognition. Our formulation divides the problem into two parts: one for the likelihood measure and the other for the transition measure. The likelihood measure can be regarded as the traditional task of face recognition within a still image, i.e., recognizing the face currently observed in an image; two-dimensional linear discriminant analysis (2DLDA) (Yang et al., 2005) is employed to compute it. The transition measure estimates the probability of the change from the recognized state at the previous stage to each of the possible states at the current stage. This transition measure not only considers the visual differences among persons according to the training face images but also incorporates prior information about pose changes across video frames. Experiments are underway to evaluate the performance of the proposed method.
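The sketch below shows one generic way such a formulation can be evaluated over a clip: a forward recursion that fuses per-frame likelihoods (assumed here to come from a 2DLDA-based matcher) with a transition measure over identities. It illustrates the combination only, not the team's exact model.

import numpy as np

def video_face_recognition(likelihoods, transition, prior):
    # likelihoods: (T, K) array, likelihoods[t, k] = P(frame_t | identity k),
    #              assumed to be produced by a 2DLDA-based matcher.
    # transition:  (K, K) array, transition[i, j] = P(identity j at t | identity i at t-1).
    # prior:       (K,) initial distribution over the K enrolled identities.
    # Returns per-frame posteriors over identities via a forward recursion.
    T, K = likelihoods.shape
    belief = prior * likelihoods[0]
    belief /= belief.sum()
    posteriors = [belief]
    for t in range(1, T):
        predicted = belief @ transition          # propagate through the transition measure
        belief = predicted * likelihoods[t]      # fuse with the frame likelihood
        belief /= belief.sum()
        posteriors.append(belief)
    return np.vstack(posteriors)

# Toy example with 3 enrolled identities and 4 frames (numbers are illustrative).
L = np.array([[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2], [0.7, 0.2, 0.1]])
A = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
print(video_face_recognition(L, A, np.ones(3) / 3).argmax(axis=1))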

(2) Multi-Pose Face Detection and Tracking

For Classroom 2.0, we need to detect and track target faces, either single or multiple faces, in a video. Target faces may be moving, possibly with different head poses. Thus, three problems need to be addressed. First, the faces not only need to be detected; the system must also keep apart the different persons being tracked in a video, that is, a "light" version of face recognition is necessary. Second, faces may appear in different poses, which makes the problem more difficult. Finally, faces may be lost in a cluttered background. Currently, we employ a particle filter to track faces in a video, and we propose a checklist scheme that covers the different poses of the tracked face. We have also added a correction phase to revise the tracking target if the system has lost track of the target face.

(10)

In the classroom, pure face detection and tracking may not be enough to capture face positions in such a cluttered environment. Hence, we designed a human detection algorithm specifically for locating students who were not found in the face detection phase, and then estimating their possible face positions. Face detection and human detection are used to locate student faces initially, and an optical flow approach is then used to track these faces continuously. Our approach not only extracts face images of students for the roll-call system but also identifies students in the classroom as input for student gesture recognition.
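A minimal sketch of the detect-then-track pattern is given below, using OpenCV's stock Haar-cascade frontal-face detector and pyramidal Lucas-Kanade optical flow. The detectors, the particle-filter tracker, and the correction phase actually used in the project are not reproduced here; this only illustrates how detection seeds continuous tracking.

import cv2
import numpy as np

# Stock OpenCV frontal-face detector; the project's own multi-pose detector is not public.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_then_track(video_path):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Seed optical-flow tracking with corner features inside each detected face box.
    points = []
    for (x, y, w, h) in faces:
        corners = cv2.goodFeaturesToTrack(gray[y:y+h, x:x+w], 20, 0.01, 5)
        if corners is not None:
            points.append(corners + np.array([[x, y]], dtype=np.float32))
    prev_gray = gray
    prev_pts = np.vstack(points) if points else None
    while prev_pts is not None and len(prev_pts) > 0:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
        prev_pts = next_pts[status.flatten() == 1].reshape(-1, 1, 2)
        prev_gray = gray
        # ... a re-detection / correction step would recover lost faces here ...
    cap.release()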

(3) Report on the Speech Processing Subproject

This year, this research subproject focused on developing and deploying speech recognition, retrieval, and summarization technologies suited to the future classroom (Classroom 2.0) environment. First, we developed a voice-controlled slide presentation system that couples keyword spotting with the Microsoft Office PowerPoint player, allowing a lecturer to control PowerPoint playback directly by voice and providing a multimodal human-computer interaction environment. In addition, this subproject developed speech retrieval and summarization techniques based on Word Topic Models (WTM) [1-4]. Unlike PLSA (Probabilistic Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), which treat documents as topic models, WTM treats each word in the language as a probabilistic topic model used to predict the occurrence probabilities of other words. For example, in speech or document retrieval, each document can be viewed as a composite word topic model that generates the input query, and the resulting probability serves as the basis for ranking relevant documents. WTM can produce the topic model of a new document quickly and accurately, whereas PLSA and LDA must be retrained to obtain the topic model of a new document. We have implemented WTM in a broadcast news retrieval and summarization system and are planning to apply it to speech retrieval and summarization of recorded course content.
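The sketch below illustrates the composite-model retrieval idea only: each document is scored by how likely its words' models are to generate the query. The per-word models here are crude co-occurrence estimates standing in for a trained WTM, and the smoothing is an assumption, so this is a shape-of-the-computation sketch rather than the actual WTM formulation.

import math
from collections import Counter

def build_word_models(docs):
    # Stand-in per-word models: P(w2 | w1) estimated from within-document
    # co-occurrence counts (a trained WTM would replace this table).
    cooc = {}
    for doc in docs:
        for w1 in set(doc):
            cooc.setdefault(w1, Counter()).update(doc)
    return {w1: {w2: c / sum(cnt.values()) for w2, c in cnt.items()}
            for w1, cnt in cooc.items()}

def score(query, doc, word_models, alpha=0.9, eps=1e-6):
    # log P(query | doc), treating the document as a composite of its words' models.
    s = 0.0
    for q in query:
        p = sum(word_models.get(w, {}).get(q, 0.0) for w in doc) / len(doc)
        s += math.log(alpha * p + (1 - alpha) * eps)   # crude smoothing
    return s

docs = [["speech", "recognition", "lecture"],
        ["image", "processing", "classroom"],
        ["speech", "summarization", "classroom"]]
models = build_word_models(docs)
query = ["speech", "classroom"]
ranked = sorted(range(len(docs)), key=lambda i: score(query, docs[i], models), reverse=True)
print(ranked)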

(4) Vision-based student gesture recognition

Vision-based student gesture recognition is one research topic of the intelligent Classroom Exception Recognition system (iCERec). In this study, students in a classroom are assumed to sit through a lesson, so student gestures constitute a space of motion expressed by the upper body, face, and hands. We have surveyed gesture recognition papers and tried to define the types of student gestures that may occur in the classroom. Furthermore, we set up a camera system to obtain experimental data. Two successive frames of an input sequence are shown in Figure 1. From these frames we can observe that the partial occlusion problem needs to be solved in this study: students are occluded not only by those sitting in front of them but sometimes also by those sitting beside them. Motion segmentation is the other problem that must be solved, because more than one student may change gestures simultaneously within an image frame. Combining the temporal and spatial information supplied by the successive frames to separate the different motions can increase the accuracy of motion segmentation.

(11)

Figure 1 Two successive frames of an input sequence.

This year, we developed a test program to detect student motion. First, the system automatically separates each row of students in the images of the input sequence; using horizontal edge detection, morphological enhancement, and horizontal projection techniques, the system can separate each row successfully, as shown in Figure 2(a). In addition, the system detects the students' motions by integrating the differential results of every two successive frames; the motion detection result for the two successive frames in Figure 1 is shown in Figure 2(b). Based on the location of each row, the system can now estimate the students' motions.
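A compact sketch of these two steps is given below: row separation by a horizontal-edge projection profile, and a motion mask accumulated from differences of successive frames. The operators are standard OpenCV/NumPy calls, and the thresholds are illustrative assumptions rather than the values used in the project.

import cv2
import numpy as np

def split_rows(gray, min_gap=20):
    # Separate seating rows: horizontal edges -> morphological closing ->
    # horizontal projection, then take rising edges of the projection profile
    # as row boundaries. (Thresholds are illustrative.)
    edges = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)          # horizontal edges
    edges = cv2.morphologyEx(np.abs(edges), cv2.MORPH_CLOSE,
                             np.ones((3, 15), np.uint8))
    profile = edges.sum(axis=1)                                  # horizontal projection
    threshold = 0.5 * profile.mean()
    boundaries = [y for y in range(1, len(profile))
                  if profile[y] >= threshold > profile[y - 1]]
    rows, last = [], -min_gap                                    # drop boundaries too close together
    for y in boundaries:
        if y - last >= min_gap:
            rows.append(y)
            last = y
    return rows

def motion_mask(frames, diff_thresh=25):
    # Accumulate thresholded differences of successive grayscale frames into one mask.
    acc = np.zeros(frames[0].shape[:2], np.uint8)
    for prev, curr in zip(frames, frames[1:]):
        diff = cv2.absdiff(curr, prev)
        acc |= (diff > diff_thresh).astype(np.uint8) * 255
    return acc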

Figure 2: (a) The result of row detection; (b) students' motion detection.

We place seeds in the motion areas, grow regions from these seeds, and then combine nearby regions into objects. Figures 3 and 4 show the results of region growing and region combination. In Figures 3(b) and 4(b), each object formed by combining regions can be regarded as a student; different students in the images are shown in different colors. A rule-based gesture recognition technique is then applied to recognize the gestures of these objects. The recognition output for the image shown in Figure 3(a) is "a student sitting in row 1 is raising his/her right hand," and the output for Figure 4(a) is "a student sitting in row 1 is lying prone." At present, our system can recognize six student gestures: raising the right hand, raising the left hand, raising both hands, standing up, lying prone, and sitting down.

(12)

Figure 3: An example of a student gesture image: (a) the original image; (b) the result of region growing and region combination.

Figure 4: Another example of a student gesture image: (a) the original image; (b) the result of region growing and region combination.

(5) Automatic Assessment of Students' Answers

To improve the interaction between students and teachers, it is fundamental for teachers to understand students' learning levels. An intelligent computer system should be able to automatically evaluate students' answers when the teacher asks questions. Assessing students' answers is a very time-consuming activity that cuts into the time teachers can devote to other duties, so in this project we use computers to help teachers with the assessment task. To achieve this, the first step this year was to extract information from the sentences that students give. First, we built an assessment corpus: we prepared nine questions and the corresponding reference answers from an automata and formal languages course, and collected thirty-eight answers for each question. With this corpus, we applied the following procedures to extract the relevant information: (1) apply part-of-speech tagging to extract syntactic information; (2) remove punctuation and decimal numbers, which act as noise; (3) apply stemming and normalization to the sentences in order to group word variations; and (4) extract other features. In this project, we treated the assessment problem as a classification problem, i.e., classifying students' scores into two classes such as above/below 6 out of 10. The method is therefore divided into three parts: (1) data preprocessing, (2) feature extraction, and (3) SVM classification. In the preprocessing phase, we group word variations by stemming and filter noise by excluding punctuation and decimal numbers. In the feature generation phase, we extract the part of speech of each word, term frequency (TF), inverse document frequency (IDF), and entropy as features to build the feature vectors. In the SVM classification step, we classify the answers. In summary, the current task of the Intelligent COntent REtrieval system (iCORE) is to continue understanding the sentences and assessing students' free-text answers with natural language processing techniques based on the approaches mentioned above.
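The pipeline can be pictured with the small scikit-learn sketch below, which classifies free-text answers as above or below the score threshold using TF-IDF features and a linear SVM. The miniature corpus is invented for illustration, and TF-IDF plus a linear SVM stands in for the full TF/IDF/entropy/part-of-speech feature set described above.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Hypothetical miniature corpus: free-text answers and whether the human-assigned
# score was >= 6 out of 10 (1) or below (0). The real data came from an automata course.
answers = [
    "a dfa has exactly one transition per symbol from each state",
    "an nfa may have several transitions or none for a symbol",
    "the machine is a kind of computer",
    "regular languages are closed under union and intersection",
    "i do not remember the definition",
    "a regular expression describes a regular language",
]
labels = [1, 1, 0, 1, 0, 1]

# TF-IDF + linear SVM as a stand-in for the features described in the report.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
model.fit(answers, labels)

print(model.predict(["an nfa allows multiple transitions on the same symbol"]))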

Other Academic Benefits

In addition to publishing research results in international journals and presenting work at international conferences, the team has organized a workshop, Applied Intelligent Systems for Future Classroom, within the 23rd International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA-AIE 2010). The conference will be held from June 1 through June 4 in Cordoba, Spain, and is one of the top conferences (ranked 46 out of 701) according to the Computer Science Conference Ranking. The workshop focuses on exchanging methodologies and creative ideas for constructing intelligent systems for future smart classrooms. An intelligent system for the smart classroom should aim, among other things, to improve classroom interaction and learning efficiency. This special session serves as a forum that brings together technical researchers and educational practitioners of smart classroom technologies. Technologies including computer vision, speech recognition, data mining, and natural language processing are all within its scope. Furthermore, education technology specialists and practitioners are welcome to share their instructional experiences. The special session provides a great opportunity for researchers across many communities to discuss and exchange ideas on smart classroom technologies.

In all, six papers, including one by this research team, were accepted into the workshop and will be presented as full papers at the conference. The list is given below.

1. Marcos Alexandre Rose Silva, Ana Luiza Dias, and Junia Coutinho Anacleto, "Processing Common Sense Knowledge to Develop Contextualized Computer Applications"

2. Elena Verdú, Luisa M. Regueras, María Jesús Verdú and Juan Pablo de Castro, "Estimating the Difficulty Level of the Challenges Proposed in a Competitive e-Learning Environment"

3. Wen-Juan Hou, Jia-Hao Tsao, Sheng-Yang Li and Li Chen, "Automatic Assessment of Students' Free-text Answers with Support Vector Machines"

4. Soheil Sadi-Nezhad, Leila Etaati, and Ahmad Makui, "Represent a Fuzzy ANP Model for Evaluating E-Learning Platform"

5. Isaías García, Carmen Benavides, Hector Alaiz Moreton, Francisco Rodríguez and Ángel Alonso, "An Ontology-based Expert System and Interactive Tool for Computer-Aided Control Engineering Education"

6. Antonio Garrido and Eva Onaindía, "On the Application of Planning and Scheduling Techniques …"

(15)

Journal and Conference Papers

Journal Papers (SSCI: 2, SCI: 9, EI: 2)

1. Chang, C. Y., & Lee, G. (2010). A major e-learning project to renovate science leaning environment in Taiwan. The Turkish Online Journal of Educational Technology, 9(1), 7-12. [SSCI]

2. Chiu, C.-F., & Lee, G. C. (2009). A video lecture and lab-based approach for learning of image processing concepts. International Journal on Computers and Education, 52(2), 313-323. [SSCI] 3. Yeh, F.-H., & Lee, G. C. (2009). Pyramid-structure-based reversible fragile watermarking. Optical

Engineering, 48(4). [SCI, EI]

4. Chiang, C. C., Hung, Y. P., Yang, H., & Lee, G. C. (2009). Region-based image retrieval using color-size features of watershed regions. Journal of Visual Communication and Image Representation, 20(3), 167-177. [SCI, EI]

5. Shih, A., Liu, T., Chu, D., Lee, D. T., & Lee, G. C. (2009). GR-Aligner: an algorithm for aligning pairwise genomic sequences containing rearrangement events. Bioinformatics, 25(17), 2188-2193. [SCI]

6. Chen, B., Liu, S.-H., & Chu., F.-H. (2009). Training data selection for improving discriminative training of acoustic models. Pattern Recognition Letters, 30(13), 1228-1235. [SCI-E, EI]

7. Chen, B. (2009). Word topic models for spoken document retrieval and transcription. ACM Transactions

on Asian Language Information Processing, 8(1), 2:1-2:27. [EI]

8. Lin, S.-H., Chen, B., & Yeh, Y.-M. (2009). Exploring the use of speech features and their corresponding distribution characteristics for robust speech recognition. IEEE Transactions on Audio, Speech and

Language Processing, 17(1), 84-94. [SCI-E, EI]

9. Lin, S.-H., Chen, B., & Wang, H.-M. (2009). A comparative study of probabilistic ranking models for Chinese spoken document summarization. ACM Transactions on Asian Language Information

Processing, 8(1), 3:1-3:23. [EI]

10. Chen, Y.-T., Chen, B., & Wang, H.-M. (2009). A probabilistic generative framework for extractive broadcast news speech summarization. IEEE Transactions on Audio, Speech and Language Processing,

17(1), 95-106. [SCI-E, EI]

11. Cherng, S., Fang, C. Y., Chen, C. P., & Chen, S. W. (2009). Critical Motion Detection of Nearby Moving Vehicles in a Vision-Based Driver Assistance System. IEEE Trans. on Intelligent Transportation Systems.

(16)

[SCI]

12. Chen, L.-J., Wang, B.-C., & Chen, K.-T. (2010). The Design of Puzzle Selection Strategies for GWAP Systems. Accepted and to appear in Journal of Concurrency and Computation: Practice and Experience,

John Wiley & Sons Ltd. [SCI-E, EI]

13. Chen, L.-J., & Hung, H.-H. (2010). A Two-State Markov-based Wireless Error Model for Bluetooth Networks. Accepted and to appear in Wireless Personal Communications Journal, Springer. [SCI, EI]

Conference Papers

1. Hou, W.-J., Tsao, J.-H., Li, S.-Y., & Chen, L. (2010, June 1-4). Automatic assessment of students' free-text answers with support vector machines. Paper presented at the 23rd International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA-AIE 2010), Cordoba, Spain. [EI]

2. Chen, G.-Y., Chiu, H.-S., & Chen, B. (2010, March 14-19). Latent topic modeling of word vicinity information for speech recognition. Paper presented at the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Dallas, Texas, USA.

3. Lin, S.-H., Chang, Y.-M., Liu, J.-W., & Chen, B. (2010, March 14-19). Leveraging evaluation metric-related training criteria for speech summarization. Paper presented at the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Dallas, Texas, USA.

4. Chiang, C.-C., Wu, J.-W., & Lee, G. C. (2009, May 20-22). Probabilistic based semantic image feature using visual words. Paper presented at the 11th IAPR Conference on Machine Vision Applications (MVA 2009), Yokohama, Japan. [EI]

5. Lin, Y. T., Yen, B. J., & Lee, G. C. (2009, April 19-24). Structuring and analyzing low-quality lecture videos. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan. [EI]

6. Lin, Y.-T., Yen, B.-J., Chang, C.-H., Yang, H.-F., & Lee, G. C. (2009, Nov. 14-16). Indexing and teaching focus mining of lecture videos. Paper presented at the IEEE International Symposium on Multimedia (MTEL 2009), San Diego, USA. [EI]

7. Chan, Y.-C., Chiang, C.-C., Wang, K.-M., & Lee, G. C. (2009, May 20-22). Video-based face recognition. Paper presented at the 11th IAPR Conference on Machine Vision Applications (MVA 2009), Yokohama, Japan. [EI]

8. Chen, B. (2009, April 19-24). Latent topic modeling of word co-occurrence information for spoken document retrieval. Paper presented at the 34th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan. [EI]

9. Lin, S.-H., Lo, Y.-T., Yeh, Y.-M., & Chen, B. (2009, September 6-10). Hybrids of supervised and unsupervised models for extractive speech summarization. Paper presented at the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009), Brighton, U.K.

10. Lin, S.-H., & Chen, B. (2009, September 6-10). Improved speech summarization with multiple-hypothesis representations and Kullback-Leibler divergence measures. Paper presented at the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009), Brighton, U.K.

11. Lin, S.-H., & Chen, B. (2009, October 23). Topic modeling for spoken document retrieval using word- and syllable-level information. Paper presented at the 3rd Workshop on Searching Spontaneous Conversational Speech (SSCS 2009, in conjunction with ACM Multimedia 2009), Beijing, China.

12. Lee, H.-S., & Chen, B. (2009, April 19-24). Empirical error rate minimization based linear discriminant analysis. Paper presented at the 34th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan. [EI]

13. Lee, H.-S., & Chen, B. (2009, December 13-17). Generalized likelihood ratio discriminant analysis. Paper presented at the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2009), Merano, Italy. [EI]

14. Wang, H.-M., & Chen, B. (2009, October 4-7). Mandarin Chinese broadcast news retrieval and summarization using probabilistic generative models. Paper presented at the 2009 APSIPA Annual Summit and Conference (APSIPA ASC 2009), Sapporo, Japan.

15. Koh, J. L., & Lin, C. Y. (2009). Concept shift detection for frequent itemsets from sliding windows over data streams. Paper presented at the DASFAA 2009 International Workshops, LNCS 5667, 334-348, Springer-Verlag.

16. Chen, L.-J., Syu, Y.-S., Wang, B.-C., & Lee, W.-C. (2009). An analytical study of GWAP-based geospatial tagging systems. Paper presented at the IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09), Washington, D.C., USA. [EI]

17. Chen, L.-J., Chiou, C.-L., & Chen, Y.-C. (2009, May 26-29). An evaluation of routing reliability in … Paper presented at the Conference on Advanced Information Networking and Applications (AINA'09), Bradford, UK. [EI]

18. Huang, J.-H., Chen, Y.-Y., Chen, Y.-C., Mishra, S., & Chen, L.-J. (2009, May 26-29). Improving opportunistic data dissemination via selective forwarding. Paper presented at the 2nd IEEE International Workshop on Opportunistic Networking (WON'09, in conjunction with IEEE AINA'09), Bradford, UK. [EI]

Report on Attendance at an International Academic Conference

Project number: NSC 98-2631-S-003-002
Project name: Center for Excellence in e-Learning Science
Participant, affiliation and position: 張俊彥, Professor and Director, Science Education Center, National Taiwan Normal University
Dates and location: 4-13 December 2009, Paris, France
Meeting: Second consortium meeting of the EU project, and site visits
Paper presented: (none)

1. Conference participation: The purpose of this trip was to attend the progress-report meeting (held twice a year) of the EU Open Science Resources (OSR) project and to visit the Paris Observatory. The project integrates the expertise of science museums and science education centers from various countries, educational technology and information engineering experts, and user communities. Our research team was invited to join OSR to help develop its social tagging services and to apply social tagging technology to the digital resource platforms of science museums and science education centers in Taiwan. At this meeting we mainly listened to the progress reports of the other countries; our own team will report the progress of our part of the project in June 2010. The agenda was as follows:

(21)

OSR Second Consortium Meeting

10-11 December 2009, Paris

The meeting is kindly hosted by the Cité des sciences et de l'industrie.

Venue: Library of Sciences and Industry (BSI), Level -1 (Carrefour numérique: Classe numérique); Cité des Sciences et de l'Industrie, 30, avenue Corentin-Cariou, 75019 Paris; Metro Line 7, station "Porte de la Villette".

Thursday 10 December

9.30–10.00 Arrival and welcome
10.00–10.30 Introduction (Jennifer Palumbo, Ecsite): welcome; general state of advancement of the project
10.30–11.15 Consortium agreement (Chiara Piccolo, Menon): presentation of main points; questions and discussion
11.15–11.30 Coffee break
11.15–13.00 Developments and next steps in OSR design (EA)
  11.30–12.00 The OSR survey of science museum/centre practices
  12.00–12.30 The methodology of the OSR requirement elicitation workshops: schedule for the workshops; the workshop in Heureka
  12.30–13.00 The OSR application profile: characterizing the OSR content with educational metadata
13.00–14.00 Lunch break
14.00–14.30 The OSR educational pathways: structuring OSR learning experiences
14.45–15.45 Contents for the OSR repository: examples from the science museums/centres
15.45–16.00 Coffee break
16.00–17.30 Visit to the Cité des Sciences

Friday 11 December

9.30–10.30 OSR portal and technical requirements (Costas Ballas, Intrasoft): presentation of the first concept of the portal; presentation of the foreseen structure; requirements from partners (uploading, metadata, text, translations); planning of workflow
10.30–11.15 Trials and validation (Franz Bogner, UBT): presentation of the objectives of the work package; strategy and work plan for the next few months; summer school in Crete
11.15–11.30 Coffee break
11.30–12.30 Dissemination (Ecsite + all partners): report from partners about the dissemination activities implemented so far and planned for the next months (conferences attended, printed materials, websites, etc.)
12.30–13.30 Lunch break
13.30–14.00 Roadmap to a standardized approach (Jan Pawlowski, JYU): presentation of the objectives of the work package; strategy and work plan for the next few months; discussion
  - Partners share previous experiences on projects related to OSR with the consortium
14.30–15.00 Action plan and wrap-up (Ecsite): final discussion and questions; summary of decisions and actions for the next months; planning of next meetings

2. Reflections: Through this meeting I gained a better understanding of how to collaborate with all of the project's participants, and I brought back a great deal of information related to the project, which will help increase the Taiwanese research team's contribution to it.

(24)

Reporter: 方瓊瑤
Affiliation and position: Department of Computer Science and Information Engineering, National Taiwan Normal University
Dates and location: 17 May 2010 to 21 May 2010, Angers, France
Conference: The 3rd Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2010) -- International Conference on Computer Vision Theory and Applications (VISAPP 2010)
Paper presented: An Infant Facial Expression Recognition System Based on Moment Feature Extraction

1. Conference participation

Late on Friday night (May 14) I took the 11:55 p.m. EVA Air direct flight to Paris Charles de Gaulle Airport, and then an SNCF TGV train for about an hour and a half to Angers, the host city of the conference. VISIGRAPP 2010 is a joint conference comprising the following four sub-conferences, one workshop, and one special session:

- VISAPP - International Conference on Computer Vision Theory and Applications
- IMAGAPP - International Conference on Imaging Theory and Applications
- GRAPP - International Conference on Computer Graphics Theory and Applications
- IVAPP - International Conference on Information Visualization Theory and Applications
- IMTA (workshop) - Image Mining Theory and Applications
- ECSMIO (special session) - Engineering and Computational Sciences for Medical Imaging in Oncology

We attended the largest of these, VISAPP 2010, which received more than 250 submissions and accepted about 150 papers, comprising roughly 100 oral presentations and 51 posters. Oral presentations were divided into long talks (30 minutes) and short talks (20 minutes). About 200 experts and scholars attended from around the world, including Germany, France, the United Kingdom, the United States, China, Canada, and Colombia; four registrants came from Taiwan.

VISAPP 2010 lasted five days. The first day was for registration. The opening ceremony was held on the morning of the second day; in the afternoon, besides one paper session, there was a keynote lecture by Dr. Pascal Fua from Switzerland entitled "Modeling deformable surfaces from single videos," and in the evening a welcome cocktail reception let participants from different countries interact at close range.

On the third day, the morning keynote lecture was given by Dr. Ali Mohammad-Djafari from France, entitled "Regularization and Bayesian estimation approach for inverse problems in imaging systems and computer vision." It was followed by a poster session and then three oral sessions, each held in five to six rooms in parallel.

The fourth day followed roughly the same schedule as the third. The morning keynote lecture was given by Dr. Gabriela Csurka from France, entitled "Fisher kernel representation of images and some of its successful applications," followed by a poster session and three more oral sessions; a special session ran in parallel with the oral sessions, and a banquet was held in the evening.

The fifth day began with three consecutive oral sessions and closed with a keynote lecture by Dr. Brian A. Barsky from the United States, entitled "Two new approaches to depth of field postprocessing." Our paper was presented on the fifth day in Session 10 (Image Understanding); it presents an infant facial expression recognition system based on moment feature extraction.

(26)

The paper addresses a surveillance system based on infant facial expressions. Because infants cannot protect themselves, a caregiver's negligence may leave an infant in danger, so we developed an infant-expression-based surveillance system to help caregivers monitor infants and prevent accidents even when the caregiver is away from the infant's side. In this work a video camera is mounted above the crib to capture images of the infant. The system first removes noise from the images and reduces the influence of lighting, then uses skin colour information to extract the infant's face region. Hu moments, R moments, and Zernike moments are then computed over the face region. Since each type comprises several individual moments (for example, there are seven Hu moments), fifteen images are used to compute the features of facial expressions within the same class and to capture the relationships among the moments. The study divides infant expressions into fifteen classes (crying, laughing, dazing, and so on) and classifies them with decision trees: three decision trees, constructed from the correlation coefficients computed from the moments, are used for classification. The experimental results show that the proposed method is feasible, and the different types of moments are also analyzed and discussed. The presentation was well received and prompted discussion among the attendees; it is a new research topic of broad interest, and during the discussion other researchers expressed interest in starting related work and in sharing the infant expression database we have collected. After the conference we took the TGV back to Paris on Friday afternoon and flew back to Taiwan on an EVA Air direct flight from Charles de Gaulle Airport early the next morning.

2. Reflections

VISIGRAPP 2010 was held in Angers, France. Although Angers is a small city, it is only about an hour and a half from Paris by train, with direct TGV service, so transportation is reasonably convenient. Angers has many bus routes, and the number 1 bus goes directly to the ISTIA conference venue at the University of Angers; the organizers also thoughtfully provided each participant with eight bus tickets. Because dining options inside the University of Angers were limited, lunches were taken in the university cafeteria, which gave participants another opportunity to interact.

(27)

Because the conference was held at the engineering school of the University of Angers rather than in an ordinary hotel, participants could concentrate on the discussions of each track; nearly every paper presentation drew a lively response and active questions and answers. Scheduling a keynote lecture in the final hour of the last day further held the participants together, and even when the conference ended many people lingered at the venue to exchange ideas. Listening to scholars from around the world present research across graphics, imaging, and computer vision made me aware of several intriguing topics, and we may pursue related directions in the future if the opportunity arises.

3. Suggestions

We are very grateful to the National Science Council for the subsidy supporting attendance at international conferences, which lets us broaden our horizons through international exchange while easing the financial burden. We hope such support programs will continue, so that domestic scholars have opportunities to go abroad for exchange, strengthen their international perspective, and raise the visibility of Taiwanese research in the international academic community.

4. Materials brought back

One VISIGRAPP 2010 program booklet, one VISIGRAPP 2010 proceedings CD, one VISAPP 2010 proceedings volume, and one VISIGRAPP 2011 promotional poster.

(28)

AN INFANT FACIAL EXPRESSION RECOGNITION SYSTEM BASED ON MOMENT FEATURE EXTRACTION

C. Y. Fang, H. W. Lin,

Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei Taiwan violet@csie.ntnu.edu.tw, hanwman@yahoo.com.tw

S. W. Chen

Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei Taiwan schen@csie.ntnu.edu.tw

Keywords: Facial expression recognition, Decision tree, Moment, Correlation coefficient.

Abstract: This paper presents a vision-based infant surveillance system utilizing infant facial expression recognition software. In this study, the video camera is set above the crib to capture the infant expression sequences, which are then sent to the surveillance system. The infant face region is segmented based on the skin colour information. Three types of moments, namely Hu, R, and Zernike are then calculated based on the information available from the infant face regions. Since each type of moment in turn contains several different moments, given a single fifteen-frame sequence, the correlation coefficients between two moments of the same type can form the attribute vector of facial expressions. Fifteen infant facial expression classes have been defined in this study. Three decision trees corresponding to each type of moment have been constructed in order to classify these facial expressions. The experimental results show that the proposed method is robust and efficient. The properties of the different types of moments have also been analyzed and discussed.

1 INTRODUCTION

Infants are too weak to protect themselves and lack the capacity to cope with hazards, and are therefore more likely to sustain unintentional injuries than children of other age groups. These incidents are very dangerous and can lead to disabilities and, in some cases, even death. In Taipei, Taiwan, the top three causes of infant death are (1) newborns affected by maternal complications during pregnancy, (2) congenital anomalies, and (3) unintentional injuries, which together account for 83% of all infant mortalities (Doi, 2006). Unintentional injuries are a major cause of infant deaths each year, and a majority of them can be easily avoided. Some of the most common causes include dangerous objects surrounding the infant and unhealthy sleeping environments. Therefore, the promotion of safer homes and better sleeping environments is critical to reducing infant mortality caused by unintentional injuries.

Vision-based surveillance systems, which take advantage of camera technology to improve safety, have been used for infant care (Doi, 2006). The main goal behind the development of vision-based infant care systems is to monitor the infant when they are alone in the crib and to subsequently send warning messages to the baby-sitters when required, in order to prevent the occurrence of unintentional injuries.

The Department of Health in Taipei City has reported that the two most common causes of unintentional injuries are suffocation and choking (Department of Health, Taipei City Government, 2007). Moreover, in Alaska and the United States, the biggest cause of death among infants due to unintentional injuries is suffocation, which accounts for nearly 65% of all mortalities due to unintentional injuries (The State of Alaska, 2005). The recognition of infant facial expressions such as those when the infant is crying or vomiting may play an important role in the timely detection of infant suffocation. Thus, this paper seeks to address the above problems by presenting a vision-based infant facial expression recognition system for infant safety surveillance.

Figure 1: A video camera set above the crib ((a) and (b)).

Many facial expression recognition methods have been proposed recently. However, most of them focus on recognizing facial expressions of adults. Compared to an adult, the exact pose and position of the infant head is difficult to accurately locate or estimate and therefore, very few infant facial expression recognition methods have been proposed to date. Pal et al. (Pal, 2006) used the position of the eyebrows, eyes, and mouth to estimate the individual motions in order to classify infant facial expressions. The various classes of facial expressions include anger, pain, sadness, hunger, and fear. The features they used are the local ones. However, we believe that global moments (Zhi, 2008) are more suitable for use in infant facial expression recognition systems.

2 SYSTEM FLOWCHART

The data input to the system consists of video sequences, which have been acquired by a video camera set above the crib as shown in Figure 1(a). An example image taken by the video camera is shown in Figure 1(b).

Figure 2 shows the flowchart of the infant facial expression recognition system. The system first pre-processes the input image to remove any noise and to reduce the effects of lights and shadows. The infant face region is then segmented based on the skin colour information, and the moment features are extracted from the face region. This study extracts three types of moments as features, including seven Hu moments, ten R moments, and eight Zernike moments.

For each fifteen-frame sequence, the correlation coefficients between two moments (features) of the same type are calculated as the attributes of infant facial expressions. These coefficients aid in the proper classification of the facial expressions. Three decision trees, corresponding to the three different types of moments, are used to classify the infant facial expressions.

Five infant facial expressions, including crying, gazing, laughing, yawning and vomiting have been classified in this study. Different positions of the infant head namely front, turn left and turn right have also been considered. Thus, a total of fifteen classes have been identified.

3 INFANT FACE DETECTION

Three color components from different color models have been used to detect infant skin colour: the S component from the HSI model, the Cb component from the YCbCr model, and a modified U component from the LUX model. Given a pixel whose colour is represented by (r, g, b) in the RGB color model, the corresponding transfer functions in terms of the above components are:

    S = 1 - [3 / (r + g + b)] · min(r, g, b)                                   (1)
    Cb = -0.1687 r - 0.3313 g + 0.5 b                                          (2)
    U = 256 · g / r  if r > 1.5 g and g > 0, and U = 255 otherwise             (3)

The ranges of infant skin colour are defined as S = [5, 35], Cb = [110, 133] and U = [0, 252]. These ranges have been obtained from experimental results. Figure 3(b) shows the skin color detection result for the input image in Figure 3(a). Figure 3(c) shows the result after noise reduction and image binarization; here, a 10x10 median filter has been used to reduce the noise, and the largest connected component has been selected as the face region (Figure 3(d)).
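A minimal sketch of the skin-colour segmentation implied by Eqs. (1)-(3) and the quoted ranges is given below. Scaling S to [0, 255] and adding the usual +128 offset to Cb are assumptions introduced so that the quoted thresholds apply, and the modified-U branch follows the reconstruction of Eq. (3) above.

import numpy as np

def skin_mask(rgb):
    # rgb: float array of shape (H, W, 3) with values in [0, 255].
    # Thresholds S in [5, 35], Cb in [110, 133], U in [0, 252] are taken from the text.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-6
    s = (1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)) * 255.0  # assumed scaling
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0                                # assumed +128 offset
    u = np.where((r > 1.5 * g) & (g > 0), 256.0 * g / (r + eps), 255.0)
    return ((s >= 5) & (s <= 35) &
            (cb >= 110) & (cb <= 133) &
            (u >= 0) & (u <= 252))

# Toy usage on a random image; a real system would pass a video frame here.
frame = np.random.randint(0, 256, (4, 4, 3)).astype(float)
print(skin_mask(frame))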

4 FEATURE EXTRACTION

In this section, we briefly explain the different types of moments. Given an image I, let f represent the image function. For each pair of non-negative integers (p, q), the digital (p, q)-th moment of I is given by

    m_pq(I) = Σ_{(x,y)∈I} x^p y^q f(x, y)                                       (4)

Let x0 = m10 / m00 and y0 = m01 / m00. Then the central (p, q)-th moment of I is defined as

    μ_pq(I) = Σ_{(x,y)∈I} (x - x0)^p (y - y0)^q f(x, y)                         (5)

Hu (Hu, 1962) defined the normalized central moments of I as

    η_pq = μ_pq / μ_00^γ,  where γ = (p + q)/2 + 1                              (6)

Figure 2: Flowchart of the proposed system (image sequences → infant face detection → feature extraction → feature correlation calculation → classification).

Figure 3: Infant face detection (panels (a)-(d)).

(30)

From these normalized moments, Hu defined seven moments that are invariant to translation, scale, and rotation:

    H1 = η20 + η02
    H2 = (η20 - η02)^2 + 4 η11^2
    H3 = (η30 - 3 η12)^2 + (3 η21 - η03)^2
    H4 = (η30 + η12)^2 + (η21 + η03)^2
    H5 = (η30 - 3 η12)(η30 + η12)[(η30 + η12)^2 - 3 (η21 + η03)^2] + (3 η21 - η03)(η21 + η03)[3 (η30 + η12)^2 - (η21 + η03)^2]
    H6 = (η20 - η02)[(η30 + η12)^2 - (η21 + η03)^2] + 4 η11 (η30 + η12)(η21 + η03)
    H7 = (3 η21 - η03)(η30 + η12)[(η30 + η12)^2 - 3 (η21 + η03)^2] - (η30 - 3 η12)(η21 + η03)[3 (η30 + η12)^2 - (η21 + η03)^2]      (7)

Liu et al. (Liu, 2008) argued that the Hu moments do not have scale invariance in the discrete case and therefore proposed ten R moments, an improvement over the Hu moment invariants, which can be obtained from the Hu moments as follows:

    R1 = sqrt(H2) / H1
    R2 = (H1 + sqrt(H2)) / (H1 - sqrt(H2))
    R3 = sqrt(H3) / sqrt(H4)
    R4 = sqrt(H3) / sqrt(|H5|)
    R5 = sqrt(H4) / sqrt(|H5|)
    R6 = |H6| / (H1 · H3)
    R7 = |H6| / (H1 · sqrt(|H5|))
    R8 = |H6| / (H3 · sqrt(H2))
    R9 = |H6| / sqrt(H2 · |H5|)
    R10 = |H5| / (H3 · H4)                                                      (8)
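A compact sketch of the feature computation is given below: it uses OpenCV's built-in Hu moments and then forms the ten R moments of Eq. (8) from them. The elliptical test mask and the epsilon guards against degenerate values are illustrative assumptions.

import cv2
import numpy as np

def hu_and_r_moments(face_region):
    # Compute the seven Hu moments of a (binary or grayscale) face region with
    # OpenCV, then derive the ten R moments of Eq. (8) from them.
    H = cv2.HuMoments(cv2.moments(face_region)).flatten()
    eps = 1e-12
    sH2 = np.sqrt(abs(H[1])) + eps
    sH5 = np.sqrt(abs(H[4])) + eps
    R = np.array([
        sH2 / (H[0] + eps),
        (H[0] + sH2) / (H[0] - sH2 + eps),
        np.sqrt(abs(H[2])) / (np.sqrt(abs(H[3])) + eps),
        np.sqrt(abs(H[2])) / sH5,
        np.sqrt(abs(H[3])) / sH5,
        abs(H[5]) / (H[0] * H[2] + eps),
        abs(H[5]) / (H[0] * sH5 + eps),
        abs(H[5]) / (H[2] * sH2 + eps),
        abs(H[5]) / (sH2 * sH5),
        abs(H[4]) / (H[2] * H[3] + eps),
    ])
    return H, R

# Toy face mask (an ellipse); a real system would pass the segmented face region.
mask = np.zeros((120, 100), np.uint8)
cv2.ellipse(mask, (50, 60), (30, 45), 0, 0, 360, 255, -1)
hu, r = hu_and_r_moments(mask)
print(hu.round(6), r.round(3), sep="\n")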

Zernike moments (Alpaydin, 2004) are defined using polar coordinates and have simple native rotation properties. The kernel of the Zernike moments consists of a set of orthogonal Zernike polynomials defined inside a unit circle. The Zernike moment of order p with repetition q for an image function f is given by

    Z_pq = (C_pq^2 + G_pq^2)^{1/2}                                              (9)

where C_pq indicates the real part and G_pq the imaginary part, given by

    C_pq = (2(p + 1) / N^2) Σ_{u=1}^{N/2} Σ_{v=1}^{8u} F_pq(2u / N) cos(π q v / (4u)) f(u, v)   (10)
    G_pq = (2(p + 1) / N^2) Σ_{u=1}^{N/2} Σ_{v=1}^{8u} F_pq(2u / N) sin(π q v / (4u)) f(u, v)   (11)

where the radial polynomials are

    F_pq(r) = Σ_{u=0}^{(p-|q|)/2} [(-1)^u (p - u)! / (u! ((p + |q|)/2 - u)! ((p - |q|)/2 - u)!)] r^{p-2u}

and the image size is N x N. For each pixel (x, y) in the image, u = max(|x|, |y|), and v indexes the position of (x, y) along the square ring of radius u (its form depends on whether u = |x| or u = |y|).

As Zernike moments with a larger value of p contain higher-frequency information, we select those moments whose value of p is either eight or nine in our experiments. To simplify the indexing, we use Z1, Z2, ..., Z10 to represent Z80, Z82, ..., Z99, respectively.

5 CORRELATION COEFFICIENTS

Given a video sequence I = (I1, I2, ..., In) which describes an infant facial expression, the system can calculate one type of moment for each frame. Suppose there are m moments; then the system can obtain m ordered sequences A_i = {A_i^{I1}, A_i^{I2}, ..., A_i^{In}}, i = 1, 2, ..., m, where A_i^{Ik} indicates the i-th moment A_i of frame Ik for k = 1, 2, ..., n. The variance of the elements in each sequence A_i can be calculated as

    S_{A_i}^2 = (1 / (n - 1)) Σ_{k=1}^{n} (A_i^{Ik} - Ā_i)^2                     (12)

where Ā_i is the mean of the elements in A_i, and the covariance between A_i and A_j is given by

    S_{A_i A_j} = (1 / (n - 1)) Σ_{k=1}^{n} (A_i^{Ik} - Ā_i)(A_j^{Ik} - Ā_j)      (13)

Therefore, the correlation coefficient between A_i and A_j can be defined as

    r_{A_i A_j} = S_{A_i A_j} / (S_{A_i} S_{A_j})                                 (14)

Moreover, r_{A_i A_j} = r_{A_j A_i} and r_{A_i A_i} = 1 for i, j = 1, 2, ..., m. For example, since seven Hu moments have been defined, we can obtain a total of 21 informative correlation coefficients. Figure 4 shows a fifteen-frame video sequence of an infant crying; the twenty-one correlation coefficients between the seven ordered sequences are shown in Table 1.
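The sketch below turns a fifteen-frame clip of moment values into the 21-dimensional attribute vector of pairwise correlation coefficients defined by Eq. (14); the input values are random stand-ins for real Hu-moment sequences.

import numpy as np

def expression_attributes(moment_sequences):
    # moment_sequences: (n_frames, m) array whose column i is the sequence A_i of
    # the i-th moment over a fifteen-frame clip. Returns the upper-triangular
    # correlation coefficients r_{A_i A_j} of Eq. (14) as a flat attribute vector.
    corr = np.corrcoef(moment_sequences, rowvar=False)   # (m, m) matrix of r values
    iu = np.triu_indices_from(corr, k=1)                 # 21 pairs for m = 7 Hu moments
    return corr[iu]

# Toy example: 15 frames x 7 Hu moments (random stand-in values).
rng = np.random.default_rng(0)
clip = rng.normal(size=(15, 7))
print(expression_attributes(clip).shape)   # (21,)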

Table 1: The correlation coefficients between the seven Hu moment sequences.

          H2        H3        H4        H5        H6        H7
  H1    0.1222    0.2588    0.8795   -0.4564   -0.4431   -0.9140
  H2      --     -0.8272   -0.1537    0.6927   -0.1960    0.0573
  H3                --      0.4458   -0.9237    0.2070   -0.3432
  H4                          --     -0.6798   -0.2366   -0.9800
  H5                                    --     -0.1960    0.5663
  H6                                              --      0.3218

Similarly, we can calculate the correlation coefficients between every two R moments and between every two Zernike moments. We believe that the correlation coefficients describe the relationships between these moments, which vary across facial expressions. Therefore, these coefficients provide important information that can in turn be used to classify the different infant facial expressions.

6 CLASSIFICATION TREES

In this study, a decision tree [8], which implements the divide-and-conquer strategy, has been used to classify the infant facial expressions. A decision tree is a hierarchical model for supervised learning and is composed of internal decision nodes and terminal leaves. Each decision node implements a split function with discrete outcomes labeling the branches. The advantages of the decision tree are that (1) it can quickly determine the class of the input features and (2) it can be easily understood and interpreted by observation. In this study, we have constructed three binary classification trees corresponding to the three different types of moments.

Suppose K infant facial expressions are to be classified, namely C_i, where i = 1, ..., K. Given a decision node S, let N_S indicate the number of training instances reaching node S and N_S^i the number of those belonging to class C_i. It is apparent that N_S = Σ_{i=1}^{K} N_S^i. The impurity measure applied in this study is an entropy function given by

    E(S) = - Σ_{h=1}^{K} (N_S^h / N_S) log2(N_S^h / N_S)                         (15)

where 0 log 0 ≡ 0. The range of this entropy function is [0, 1]. If the entropy is zero, then node S is pure, meaning that all the training instances reaching node S belong to the same class. Otherwise, if the entropy is high, many training instances reaching node S belong to different classes and the node should be split further.

The correlation coefficient r_{A_i A_j} (Eq. (14)) between two attributes A_i and A_j of a training instance can be used to split the training instances: if r_{A_i A_j} > 0, the training instance is assigned to one branch; otherwise, it is assigned to the second branch. Let the training instances in S be split into two subsets S1 and S2 (where S1 ∪ S2 = S and S1 ∩ S2 = ∅) by the correlation coefficient r_{A_i A_j}. Then the quality of the split can be measured by

    E_{r_{A_i A_j}}(S) = - Σ_{h=1}^{K} (N_{S1}^h / N_{S1}) log2(N_{S1}^h / N_{S1}) - Σ_{h=1}^{K} (N_{S2}^h / N_{S2}) log2(N_{S2}^h / N_{S2})   (16)

Finally, the best correlation coefficient selected by the system is

    r_{A_i* A_j*}(S) = argmin_{r_{A_i A_j}} E_{r_{A_i A_j}}(S)                    (17)

It is to be noted that once a correlation coefficient has been selected, it cannot be selected again by its descendants.

The algorithm to construct a binary classification tree is shown here:

Algorithm: Decision tree construction

Step 1: Initially, put all the training instances into the root S_R. Regard S_R as an internal decision node and put S_R into a decision-node queue.

Step 2: Select an internal decision node S from the decision-node queue. Calculate the entropy of node S using Eq. (15). If the entropy of node S is larger than a threshold T_s, proceed to Step 3; otherwise label node S as a leaf node and proceed to Step 4.

Step 3: Find the best correlation coefficient r_{A_i* A_j*} for splitting the training instances in node S using Eqs. (16) and (17). Split the training instances in S into two nodes S1 and S2 using the correlation coefficient r_{A_i* A_j*}, and then add S1 and S2 to the decision-node queue.

Step 4: If the queue is not empty, return to Step 2; otherwise stop the algorithm.
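A compact Python rendering of this construction is given below. It follows the same split rule (the sign of one correlation coefficient, chosen by the entropy criterion of Eqs. (15)-(17) and never reused along a path) but uses recursion instead of an explicit node queue, and the entropy threshold and toy data are illustrative assumptions.

import math
from collections import Counter

def entropy(labels):
    # Eq. (15): empirical entropy of the class labels reaching a node.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(instances, labels, used=frozenset(), threshold=0.3):
    # Each instance is the vector of correlation coefficients from Eq. (14).
    if entropy(labels) <= threshold or len(used) == len(instances[0]):
        return Counter(labels).most_common(1)[0][0]          # leaf: majority class
    best, best_cost = None, float("inf")
    for f in range(len(instances[0])):
        if f in used:
            continue
        left = [y for x, y in zip(instances, labels) if x[f] > 0]
        right = [y for x, y in zip(instances, labels) if x[f] <= 0]
        if not left or not right:
            continue
        cost = entropy(left) + entropy(right)                # Eq. (16)
        if cost < best_cost:
            best, best_cost = f, cost
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    li = [(x, y) for x, y in zip(instances, labels) if x[best] > 0]
    ri = [(x, y) for x, y in zip(instances, labels) if x[best] <= 0]
    return (best,
            build_tree([x for x, _ in li], [y for _, y in li], used | {best}, threshold),
            build_tree([x for x, _ in ri], [y for _, y in ri], used | {best}, threshold))

def classify(tree, x):
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = left if x[f] > 0 else right
    return tree

# Toy data: real attribute vectors would be 21-dimensional; 3-dimensional stand-ins here.
X = [(0.9, -0.2, 0.1), (0.8, -0.4, 0.3), (-0.7, 0.5, -0.1), (-0.6, 0.4, -0.2)]
y = ["crying", "crying", "laughing", "laughing"]
tree = build_tree(X, y)
print(classify(tree, (0.85, -0.3, 0.2)))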

7 EXPERIMENTAL RESULTS

The input data for our system was acquired using a SONY TRV-900 video camera mounted above the crib and processed on a PC with an Intel(R) Core(TM) 2 1.86 GHz CPU. The input video sequences, recorded at a rate of 30 frames/second, were down-sampled to six frames/second, which is the processing speed of our current system. To increase the processing rate, we further reduced the size of each image from 640 x 480 pixels to 320 x 240 pixels.

Figure 6: The decision tree of the Hu moments (root S1_H splits on r_{H4 H5} > 0, with children S2_H and S9_H).

Five infant facial expressions, including crying, dazing, laughing, yawning and vomiting have been classified in this study. Three different poses of the infant head, including front, left, and right (an example of an infant yawning as shown from the three positions is shown in Figure 4) have been considered and a total of fifteen classes have been identified.

In the first experiment, the Hu moments and their correlation coefficients were calculated using Eqs. (7) and (14), and a corresponding decision tree was constructed using the decision tree construction algorithm. Figure 6 shows the decision tree constructed using the correlation coefficients between the Hu moments as the split functions. Node S1_H is the root, and its split function is r_{H4 H5} > 0; nodes S2_H and S9_H are the left and right branches of S1_H, respectively. The left subtree of the decision tree shown in Figure 6 is illustrated in Figure 7 and the right subtree is depicted in Figure 8. The split functions of the roots of the left subtree and the right subtree are r_{H3 H5} > 0 and r_{H6 H7} > 0, respectively.

When Figures 7 and 8 are compared with each other, it can be seen that most sequences with the infant head position 'turn right' are classified into the left subtree shown in Figure 7, while many sequences with the head position 'turn left' are classified into the right subtree shown in Figure 8.

Similarly, the same fifty-nine fifteen-frame sequences were used to train and create the decision trees for the R and Zernike moments. The R moments and their correlation coefficients are calculated using Eqs. (8) and (14); the resulting decision tree consists of fifteen internal nodes and seventeen leaves, with a height of ten. The experimental results are shown in Table 2.

Moreover, the Zernike moments and their correlation coefficients are calculated using Eqs. 9 and 14. The decision tree created based on the correlation coefficients of the Zernike moments includes nineteen internal nodes and twenty leaves, with a height of seven.

Table 2 also shows the classification results for the same thirty testing sequences. We observe that the correlation coefficients of the moments are useful attributes for classifying infant facial expressions. Moreover, the classification tree created from the Hu moments has a smaller height.

Figure 7: The left subtree of the decision tree depicted in Figure 6 (splits on r_{H3 H5}, r_{H6 H7}, r_{H1 H6}, r_{H3 H6}, r_{H1 H5}, and r_{H1 H3}; leaves include vomiting, laughing, yawning, and crying).

Figure 8: The right subtree of the decision tree shown in Figure 6 (splits on r_{H6 H7}, r_{H1 H4}, r_{H2 H6}, r_{H1 H6}, r_{H5 H7}, r_{H1 H2}, r_{H3 H4}, and r_{H1 H3}; leaves include crying, yawning, laughing, and dazing).
