

Chapter 3 Human Emotion Recognition

3.2 Speech-signal-based Emotion Recognition

3.2.4 Implementation of the Emotion Recognition Embedded System

The developed algorithms were implemented on a DSP-based embedded system [77] to facilitate the experimental study of an entertainment robot. The embedded system consists of a microphone and a DSK6416 DSP board from Texas Instruments. The DSK6416 was selected as the main processing unit because of its high fixed-point performance at a 1 GHz clock rate. Figure 3-21 shows the TMS320C6416 DSK codec interface [78-79]. The DSK uses a Texas Instruments AIC23 stereo codec for audio input and output. The codec samples the analog signal from the microphone and converts it into digital data that can be processed by the DSP. The DSP chip and the codec communicate via two serial channels: one controls the codec's internal configuration registers and the other carries the digital audio samples. As shown in Fig. 3-21, McBSP1 is used as the unidirectional control channel and McBSP2 as the bidirectional audio-data channel. The codec has a 12 MHz system clock, which is internally divided to generate common sample rates such as 48 kHz, 44.1 kHz and 8 kHz; in this study, a rate of 8 kHz is selected to sample the user's speech signal.

Fig. 3-21: The TMS320C6416 DSK codec interface.

As a user speaks into the microphone, the embedded system acquires the speech signal and begins to recognize the user's emotional state. The recognition results are transmitted via an RS-232 serial link to a host computer (PC), where intelligent responses are generated to react to the received speech signal.
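As a minimal sketch of the host side of this link, the PC can service the RS-232 connection with a few lines of serial I/O. The example below assumes the pySerial package, a placeholder port name and baud rate, and a one-byte result per recognition event; the actual message framing used by the DSP firmware is not specified in this section, so the label mapping is purely illustrative.

```python
# Host-side sketch: read emotion recognition results sent by the DSP over RS-232.
# Assumptions: pySerial is installed, the port name/baud rate are placeholders,
# and the DSP sends one byte per recognition result (illustrative framing only).
import serial

EMOTION_LABELS = {0: "neutral", 1: "happiness", 2: "sadness", 3: "anger", 4: "surprise"}

with serial.Serial("COM1", baudrate=115200, timeout=1.0) as link:
    while True:
        packet = link.read(1)              # wait up to 1 s for one result byte
        if not packet:
            continue                       # timeout: no new recognition result
        label = EMOTION_LABELS.get(packet[0], "unknown")
        print(f"Recognized emotion: {label}")
        # ...the host would generate an intelligent response for the robot here...
```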

In order to test the emotion recognition system in practical human-robot interaction scenarios, the embedded speech processing system was integrated into the self-built entertainment robot. Figure 3-22 shows an interaction scenario between a user and the entertainment robot, and the control architecture of the robot is depicted in Fig. 3-23. The DSP-based system is installed at the back of the entertainment robot. Seven radio-controlled (RC) servos drive the ears, head, hands and legs of the robot, and a servo controller from Pololu Robotics and Electronics Inc. [80] commands these RC servos. The DSP-based emotion recognition system estimates the emotion category and determines, in real time, a suitable response for the entertainment robot. Some interesting studies [81-82] have used microphone arrays to avoid the need for a headset; their methods improve speech recognition robustness against noise and direction sensitivity. In this study, we focus on integrating the emotional speech recognition algorithm with the entertainment robot, so a headset is used in the experiments to reduce the influence of the robot's motion noise and surrounding interference, as shown in Fig. 3-22.
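For reference, the sketch below shows one way a host could command a single RC servo through a Pololu serial servo controller, assuming the controller is operated in Mini SSC II compatibility mode (a three-byte frame: sync byte 0xFF, channel number, position 0-254). The port name, baud rate and channel assignment are placeholders, not the values used in the actual robot.

```python
# Illustrative sketch: drive one RC servo of the entertainment robot through a
# Pololu serial servo controller in Mini SSC II mode (0xFF, channel, position).
# The port name, baud rate and ear-servo channel number are assumptions.
import serial

def set_servo(link, channel, position):
    """Send a Mini SSC II command; the position is clamped to the 0-254 range."""
    position = max(0, min(254, position))
    link.write(bytes([0xFF, channel, position]))

with serial.Serial("COM2", baudrate=9600, timeout=1.0) as link:
    EAR_SERVO = 0                       # hypothetical channel for one ear servo
    set_servo(link, EAR_SERVO, 200)     # raise the ear
    set_servo(link, EAR_SERVO, 54)      # lower the ear
```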

Fig. 3-22: Interaction scenario for a user and the entertainment robot.

Fig. 3-23: Control architecture of the entertainment robot.

3.3 Summary

In this chapter, two human emotion recognition methods were proposed and presented: a bimodal information fusion algorithm and a speech-signal-based emotion recognition method. Both methods enhance the interaction between human and robot in a natural manner.

Chapter 4

Experimental Results

In this chapter, the experimental results of robotic emotion generation and human emotion recognition are presented and discussed. For robotic emotion generation, both the anthropomorphic robotic head and the artificial face simulator were employed to evaluate human-robot interaction. For human emotion recognition, the experimental results of the three kinds of emotion recognition methods described in Chapter 3 are presented.

4.1 Experimental Results of Robotic Emotion Generation

The developed robotic emotion generation system has been tested and evaluated for autonomous emotional interaction. We first implemented the proposed AEIS on a self-constructed anthropomorphic robotic head for experimental validation. The robotic head, however, has hardware limitations that prevent it from completing the evaluation experiments of the mood transition system, so a face simulator was adopted to test the effectiveness of the proposed human-robot interaction design.

4.1.1 Experiments on an Anthropomorphic Robotic Head

In order to verify the developed algorithms for emotional human-robot interaction, an embedded robotic vision system [77] was integrated with an anthropomorphic robotic head with 16 degrees of freedom. The DSP-based vision system was installed at the back of the robotic head, and the CMOS image sensor was mounted in the right eye to capture facial images.

The system architecture of the robotic head is depicted in Fig. 4-1. A Qwerk platform [83] works as an embedded controller: it receives the estimated emotional intensity of the user from the vision system and outputs corresponding pulse-width-modulation (PWM) signals to 16 RC servos to generate the corresponding robotic facial expression. Figure 4-2 shows several basic facial expressions of the robotic head.
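The exact mapping from estimated emotional intensity to the 16 servo commands is not detailed here; the sketch below illustrates one plausible scheme, linearly interpolating each servo between a neutral pose and a full-expression pose in proportion to the estimated intensity. The pose tables and the 0-254 command range are assumptions chosen purely for illustration, not the robot's actual calibration.

```python
# Illustrative sketch: blend a neutral 16-servo pose toward an expression pose in
# proportion to the estimated emotional intensity (0.0-1.0). Pose values and the
# 0-254 command range are assumptions for illustration only.
NEUTRAL_POSE = [127] * 16                          # mid positions for all 16 servos

EXPRESSION_POSES = {
    "surprise": [180, 180, 60, 60] + [127] * 12,   # hypothetical eyebrow/eyelid servos
    "happiness": [150, 150, 127, 127, 200, 200] + [127] * 10,
}

def blend_pose(expression, intensity):
    """Interpolate each servo between the neutral pose and the expression pose."""
    intensity = max(0.0, min(1.0, intensity))
    target = EXPRESSION_POSES[expression]
    return [round(n + intensity * (t - n)) for n, t in zip(NEUTRAL_POSE, target)]

print(blend_pose("surprise", 0.5))   # halfway between neutral and full surprise
```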

Fig. 4-1: Architecture of the self-built anthropomorphic robotic head.

(a) Happiness (b) Disgust (c) Sadness

(d) Surprise (e) Fear (f) Anger

Fig. 4-2: Examples of facial expressions of the robotic head.

In the experiment, a user presented facial expressions in front of the robotic head, as shown in Fig. 4-3. The robot responded with different degrees of wonder as the user presented various intensities of surprise. A video clip of this experiment can be found in [84].

4.1.2 Experimental Setup for the Artificial Face Simulator

A virtual-conversation scenario was set up to test the effectiveness of the proposed human-robot interaction design. As shown in Fig. 4-4(a), in the virtual-conversation test a subject spoke to the artificial face (on the screen) while her facial expression was captured by a web camera. The subject in this experiment is a student at the authors' institute.

Table 4-1 lists the conversation dialogue and the subject's corresponding facial expressions during the test. In the dialogue, the subject first complained about her job with sad and angry facial expressions, and then talked about the coming Christmas vacation; her mood varied from an angry to a happy state. After acquiring the facial images, the user emotional state recognizer converted the user's facial expressions into sets of emotional intensities every 0.5 seconds. The conversation lasted around 36 seconds, so 73 sets of emotional intensity values were detected from the user in this conversation scenario.

Fig. 4-3: Interaction scenario of a user and robotic head.

(a) (b)

Fig. 4-4: Experiment setup: interaction scenario with an artificial face.

Table 4-1: Conversation dialogue and the subject's corresponding facial expressions.

1. "Hi, Robot. How are you feeling today?" - Neutral
2. "I feel so bad today. I screwed up my job." - Sad
3. "Do you know I feel very sad now? I really hope it had not happened." - Sad
4. "I am really angry at myself for my mindless mistake." - Angry
5. "However, in a few days it will be Christmas. I think that I can get relaxed during the vacation." - Neutral
6. "I am planning to go to Tokyo with my boyfriend. We hear that a carnival will take place this year." - Happy
7. "Ha! I can't wait to go on the trip." - Happy

In order to observe the robotic emotional behavior purely due to individual personality and mood transition, and to avoid undesirable effects caused by errors in user emotional state recognition, the detected user emotional intensities were manually regulated to more reasonable values. Table 4-2 shows part of the regulated user emotional intensities for sentences 1 and 2. These sets of emotional intensities are used again as input to test the response of the artificial face under different robot personalities and moods.

Table 4-2: Regulated user emotional intensities for conversation sentences 1 and 2.

Sentence #   UEk (k = 1, 2, ..., 15)
1            (0.5,0.2,0,0.3), (0.5,0.3,0,0.2), (0.5,0.4,0,0.1), (0.6,0.4,0,0), (0.8,0.2,0,0), (1,0,0,0), (1,0,0,0)
2            (0.9,0,0,0.1), (0.8,0,0,0.2), (0.7,0,0,0.3), (0.6,0,0,0.4), (0.5,0,0,0.5), (0.4,0,0,0.6), (0.4,0,0,0.6), (0.3,0,0,0.7)

4.1.3 Evaluation of Robotic Mood Transition Due to Individual Personality

It is desirable that a robot behave differently in different interaction scenarios. For example, to hold students' attention in educational applications, the robot should behave in a more friendly and amusing manner, so its openness and agreeableness scales are designed to be higher. One can design the desired personality by adjusting the corresponding Big Five factors. In this experiment, two opposite robotic personalities were designed, for RobotA (a more active trait) and RobotB (a more passive trait), and the Big Five factors were used to model these two personalities. Table 4-3 lists the scales assigned to the two opposite personalities. People with an active trait are usually open-minded and interact with others more frequently; hence the openness and agreeableness scales of RobotA are set higher than those of RobotB.

Table 4-3: Definition of personality scales using the Big Five factors.

Factor              RobotA (Active trait)   RobotB (Passive pessimist)
Openness            1                       0.3
Conscientiousness   0.5                     0.5
Extraversion        0.1                     0.1
Agreeableness       0.5                     0.2
Neuroticism         0.1                     0.3
(Pα, Pβ)            (0.34, 0.24)            (0.20, -0.07)

These two higher scales lead the personality parameters (Pα, Pβ) toward a more positive tendency. Furthermore, a passive pessimist tends to experience negative thinking in general, so the neuroticism factor of RobotB is set higher than that of RobotA; the higher neuroticism of RobotB shifts its personality toward a more negative tendency on the arousal (β) axis. After the trait values have been assigned, the robot personality parameters (Pα, Pβ) are determined using (2.4) and (2.5), and the proposed robotic mood transition model is built accordingly.
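The authoritative definitions of (2.4) and (2.5) are given in Chapter 2 and are not repeated here. As a cross-check, the (Pα, Pβ) values in Table 4-3 are consistent with Mehrabian's regression of trait pleasure and arousal on the Big Five factors, so the sketch below assumes that form of the mapping.

```python
# Sketch of the Big Five -> personality parameter mapping. The coefficients below
# are Mehrabian's trait pleasure/arousal regressions, which reproduce the
# (P_alpha, P_beta) values of Table 4-3; the exact form of (2.4)-(2.5) is
# defined in Chapter 2 and is assumed, not quoted, here.
def personality_parameters(O, C, E, A, N):
    p_alpha = 0.21 * E + 0.59 * A + 0.19 * N    # valence (pleasure) tendency
    p_beta = 0.15 * O + 0.30 * A - 0.57 * N     # arousal tendency
    return p_alpha, p_beta

print(personality_parameters(1.0, 0.5, 0.1, 0.5, 0.1))  # RobotA: ~(0.34, 0.24)
print(personality_parameters(0.3, 0.5, 0.1, 0.2, 0.3))  # RobotB: ~(0.20, -0.07)
```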

To evaluate the effectiveness of the proposed emotional expression generation scheme based on individual personality, we conducted two sessions of experiments using the artificial face shown in Fig. 4-4(b). In both sessions, the same regulated user emotional intensities from the above conversation were presented, to RobotA and RobotB respectively, and the robotic mood states were observed as the same user spoke to each robot. Accordingly, the artificial face reacted with different facial expressions resulting from the mood state transition. Table 4-4 and Table 4-5 list the calculated robotic mood states (RMk) and the simulated facial expressions for RobotA and RobotB, respectively. Video clips of this experiment can be found in [85].

Table 4-4: Facial expressions for RobotA.

Table 4-5: Facial expressions for RobotB.

Figure 4-5 depicts the mood transition of RobotA during the above conversation. The initial mood state of RobotA was set at the neutral state (0.61, -0.47), referring to Fig. 2-2. The mood transition trajectory moved from the fourth quadrant to the third, second and finally first quadrant, and the corresponding facial expressions varied from neutral (#1) to boredom (#2), sadness (#3), anger (#4), surprise (#5), happiness (#6) and excitement (#7). The sharp turning point (#5) in Fig. 4-5 indicates that RobotA recognized the subject's emotional state changing rapidly from anger to happiness. Figure 4-6 shows the mood transition of RobotB during the same emotional conversation. The initial mood state of RobotB was also set to the neutral state. The corresponding facial expressions varied from neutral (#1) to sleepiness (#2, #3), boredom (#4), sadness (#5), boredom (#6) and then back to near neutral. Compared with Fig. 4-5, the mood transition of the passive trait stays mostly in the regions of boredom, sadness and neutral emotion; it remained in these negative regions no matter what emotional state the subject displayed. On the contrary, the mood transition of the active trait is scattered over the whole emotional space. These features manifest the difference in character between the active and passive traits, and the experiment shows that the proposed mood transition scheme can realize robotic emotional behavior with different personality traits. Video clips of the mood transition for RobotA and RobotB can be found in [86].

Fig. 4-5: Robotic mood transition of RobotA.


Fig. 4-6: Robotic mood transition of RobotB.

Figure 4-7 shows the variation of the seven fusion weights while the subject spoke to RobotA. In this emotional conversation the subject uttered the seven dialogue sentences listed in Table 4-1, and the corresponding fusion-weight variations are shown in seven sectors in Fig. 4-7. In dialogue #1, the neutral facial expression dominates the output behavior; this is reasonable since the subject's emotional state is neutral. In dialogues #2 and #3, the weight of sadness gradually increases as the subject's emotional state shifts from neutral to sad. Next, the sad weight decreases and the surprise weight increases as the subject becomes progressively angry (dialogue #4); in the meantime, the fear weight also increases in response to the subject's angry expression. After the subject turns happy, the surprise and fear weights decrease (dialogue #5) and the happy weight increases to dominate the output behavior.

Fig. 4-7: Weight variation for RobotA (active trait). (Horizontal axis: time in seconds; vertical axis: weight, 0-1; traces: neutral, happiness, surprise, fear, sadness, disgust, anger; sectors #1-#7 mark the seven dialogue sentences.)

Figure 4-8 shows the variation of the seven fusion weights as the subject spoke to RobotB in the same emotional conversation. In dialogues #3 and #4, the weight of sadness gradually increases as the subject's emotional state shifts from neutral to sad and then angry. After the subject's emotional state becomes happy, the sad weight decreases (dialogue #5) and the neutral weight increases to dominate the output behavior. Compared with RobotA in Fig. 4-7, the passive-trait personality produces fewer behavior variations and falls into the sadness emotion easily even when the subject's emotional state becomes happy. These features match the emotional tendencies of the active and passive traits.

Fig. 4-8: Weight variation for RobotB (passive trait).

4.1.4 Evaluation of Emotional Interaction Scheme

In this experiment, a questionnaire evaluation of the robot mood transition design was conducted for the emotional conversation performed by the same subject with RobotA, RobotB and RobotC, respectively. The emotional response of RobotC was designed to be independent of the proposed emotional interaction method: RobotC simply mimics the facial expressions recognized from the subject. The emotional conversations with RobotA, RobotB and RobotC were recorded in three video clips [85] for questionnaire evaluation, and the Big Five factors were used to evaluate the effectiveness of the proposed robotic emotional expression generation system.

Twenty subjects aged 20 to 40 were invited to watch the videos of the virtual conversations with RobotA, RobotB and RobotC, and were asked to answer a questionnaire (see Appendix A) after watching them. In the questionnaire, each subject rated statements about the emotional interactions in the videos on a scale from agree to disagree. The scores were then averaged and normalized to the 0-1 range for RobotA, RobotB and RobotC, respectively.
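As a minimal sketch of this scoring step, assuming a five-point agree-disagree scale (the actual questionnaire items are listed in Appendix A and may differ), each answer can be mapped to the 0-1 range and averaged per robot and per factor:

```python
# Sketch: normalize 5-point agree/disagree answers to 0-1 and average per robot.
# The five-point scale and the example answers are assumptions for illustration.
def normalize(answer_1_to_5):
    return (answer_1_to_5 - 1) / 4.0       # 1 (disagree) -> 0.0, 5 (agree) -> 1.0

def average_score(answers):
    return sum(normalize(a) for a in answers) / len(answers)

openness_answers_robot_a = [4, 5, 4, 5, 4]   # hypothetical responses from 5 subjects
print(round(average_score(openness_answers_robot_a), 2))   # 0.85
```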

The summary of the experimental results is shown in Fig. 4-9. In the current design, the facial expressions of the animation simulator are generated by direct control of the pure mood transition. Unlike verbal expression, the readability of facial expressions rests on very different underlying semantics [87-89]. Although the difference between the designed facial animation and real human facial expressions is obvious, the current design allows an observer to answer the questionnaire more straightforwardly. The major distinguishing factors of the designed robotic traits (active and passive) are openness, agreeableness and neuroticism. Observing the openness and agreeableness factors in Fig. 4-9, both are rated higher for RobotA than for RobotB, which indicates that RobotA is perceived as having a stronger tendency to react to and interact with humans than RobotB. Moreover, the neuroticism factor of RobotB is rated higher than that of RobotA, which indicates that the passive pessimist is indeed more inclined to experience negative thoughts than the active trait. These results conform to the designed personalities in Table 4-3.

Fig. 4-9: Questionnaire results on psychological impact (Big Five factor ratings for Robot A, Robot B and Robot C).

As mentioned, RobotC only copies the subject's facial expressions, without any of the mood transitions discussed in this work; in other words, the Big Five factors detected for RobotC reflect only the subject's own personality. In order to verify the difference between the robots with the proposed mood transition scheme (RobotA and RobotB) and the one without it (RobotC), the same 20 subjects answered a questionnaire after watching the videos in [85]. In this questionnaire, each subject rated, on a scale from agree to disagree, how natural or artificial the interactions in the videos appeared. The summary of the experimental results is shown in Fig. 4-10. Based on the natural-vs-artificial item in Fig. 4-10, RobotA and RobotB both behave more naturally than RobotC, which shows that the proposed mood transition method enables the robot to behave in a human-like manner.

Table 4-6 shows the average values over the 20 questionnaires and Table 4-7 shows the corresponding standard deviations. In Table 4-6, the personality parameters of RobotA and RobotB are estimated as (0.68, 0.19) and (0.43, -0.22), respectively, whereas the designed personality parameters in Table 4-3 are (0.34, 0.24) and (0.20, -0.07).

Fig. 4-10: Questionnaire result of natural vs. artificial interaction (ratings 0-1 for Robot A, Robot B and Robot C).

Table 4-6: Estimation of personality parameters by questionnaire survey.

Factor              RobotA          RobotB
Openness            0.74            0.31
Conscientiousness   0.73            0.48
Extraversion        0.78            0.23
Agreeableness       0.79            0.42
Neuroticism         0.27            0.69
(Pα, Pβ)            (0.68, 0.19)    (0.43, -0.22)

Table 4-7: Standard deviation of questionnaire results.

Factor              RobotA   RobotB   RobotC
Openness            0.16     0.20     0.24
Conscientiousness   0.14     0.21     0.23
Extraversion        0.11     0.13     0.20
Agreeableness       0.15     0.21     0.26
Neuroticism         0.16     0.30     0.29
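Applying the same Big Five to (Pα, Pβ) mapping assumed in the sketch of Section 4.1.3 to the questionnaire averages of Table 4-6 reproduces the estimated parameters reported above; the short check below illustrates this, again under the assumption that (2.4)-(2.5) take the Mehrabian form.

```python
# Consistency check: apply the Mehrabian-style mapping assumed in Section 4.1.3
# to the questionnaire averages of Table 4-6.
def personality_parameters(O, C, E, A, N):
    return 0.21 * E + 0.59 * A + 0.19 * N, 0.15 * O + 0.30 * A - 0.57 * N

print(personality_parameters(0.74, 0.73, 0.78, 0.79, 0.27))  # RobotA: ~(0.68, 0.19)
print(personality_parameters(0.31, 0.48, 0.23, 0.42, 0.69))  # RobotB: ~(0.43, -0.22)
```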

It is seen that the Pα values of the designed RobotA and RobotB (0.34 and 0.20) are proportional to the estimated Pα values in Table 4-6 (0.68 and 0.43), respectively. These results are represented in Fig. 4-11. Both the designed and the estimated mood transition velocities of RobotA are roughly 1.6-1.7 times those of RobotB along the Pα axis (0.34/0.20 = 1.7 and 0.68/0.43 ≈ 1.6); in other words, both the designed and the estimated RobotA become happy more easily than RobotB, and by a similar ratio. Furthermore, both the designed and the estimated Pβ values of RobotB are negative, indicating that, for the same user emotional input, RobotA tends toward arousal while RobotB tends toward sleepiness. Hence the estimated robot personality parameters are consistent with the designed personality scales in Table 4-3. Based on these experimental results, it can be concluded that a robot can be designed with a desired personality, that differently designed robotic personalities yield distinct interactive behaviors, and that the emotional robots interact in a more human-like manner.

Fig. 4-11: Representation of the robot personality parameters: (a) original design; (b) estimated result.

4.2 Experiments on Bimodal Information Fusion Algorithm

In contrast to the many existing visual-only or audio-only databases for benchmark testing [90], there is hardly any database that combines both visual and audio information. Martin et al. [91] built an audio-visual emotion database using a digital video camera; however, the resolution of their camera is too high for practical pet-robot scenarios, where low-cost vision sensors are usually adopted. Therefore, we built our own database from lab members using an off-the-shelf CMOS image sensor and a PC microphone.

A DSP-based system was designed and constructed for the experiments, both for building the database and for experimental evaluation. As shown in Fig. 3-2, a user presents facial expressions in front of the CMOS image sensor and speaks into the microphone. After acquiring both facial and speech signals, the DSP system processes the visual and audio information. The built database contains the five emotional expressions described earlier.

Figure 4-12 shows part of the database. Currently, the database includes fourteen persons, and each of them expressed their emotions ten times in each emotion category, giving 140 data samples. In the off-line experiments, we randomly selected 70 data samples as training samples and used the remaining 70 as test samples.
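A minimal sketch of this random 70/70 split is shown below; the sample identifiers are placeholders, since the actual data loading is specific to the recorded database.

```python
# Sketch: randomly split the 140 recorded samples into 70 training and 70 test
# samples, as in the off-line experiments. Sample IDs here are placeholders.
import random

all_samples = list(range(140))      # 140 audio-visual recordings
random.seed(0)                      # fixed seed for a reproducible split
random.shuffle(all_samples)

train_samples = all_samples[:70]
test_samples = all_samples[70:]
assert len(train_samples) == len(test_samples) == 70
```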

Fig. 4-12: Examples from the database.

4.2.1 Off-line Experimental Results

Table 4-8 shows the experimental results for the five emotional categories using only the speech features; the average recognition rate is 73.7%. Table 4-9 shows the results using only the image features; the average recognition rate is 81.7%. The recognition rates obtained by using the proposed bimodal information fusion algorithm to combine both visual and speech features are shown in Table 4-10. The recognition rate of the
