
Chapter 4 Experimental Results

4.3 Experiments on Speech-signal-based Emotion Recognition

The performance of the proposed emotional voice recognition system was evaluated using a self-built database. In addition, the proposed system was studied experimentally by integrating the DSP-based implementation into an entertainment robot.

4.3.1 Experiments Using the Self-built Database

The proposed emotion recognition system was tested using a speech database built in the ISCI lab of National Chiao Tung University. The database contains five categories of emotional speech: happiness, sadness, surprise, anger, and neutral. Each category comprises three different sentences. To express the emotions in a natural way, each subject was asked to narrate the expressive sentences in Chinese, imitating an actual interactive scenario. Table 4-13 lists the English meaning of each sentence. The database currently includes emotional utterances from five persons; each person recorded each sentence six times, giving 90 utterances per emotion category and 450 utterances in total. In the following experiments, 45 samples per emotional category were randomly selected as training data and the remaining 45 samples were used as test data.

Table 4-13: Meaning of sentence content for five emotional categories.

Emotional category   Content of sentence
Anger                1. How can you do that without my agreement?
                     2. It’s none of your business.
                     3. What you are doing is wrong!
Happiness            1. It’s almost new year!
                     2. I will go abroad on vacation tomorrow.
                     3. I won the lottery!
Neutral              1. It’s a sunny day.
                     2. I have something to do later.
                     3. Are you hungry?
Sadness              1. My cat is lost.
                     2. I got a cold.
                     3. Everything went without a hitch today.
Surprise             1. Are you serious?
                     2. I can’t believe that it really happened.
                     3. Ah! My notebook is lost.

Part of the voice clips of the database can be found in [92].
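For concreteness, the random per-category split can be expressed as a short sketch. Only the 90-utterance pool per category and the 45/45 split come from the text above; the data structure and category names below are illustrative assumptions.

```python
import random

# Minimal sketch of the 45/45 per-category split described above.
# utterances_by_category maps each category name to its 90 recorded clips;
# this layout is an assumption, not taken from the thesis.
CATEGORIES = ["anger", "happiness", "neutral", "sadness", "surprise"]

def split_database(utterances_by_category, n_train=45, seed=0):
    """Randomly split each category's utterances into train/test halves."""
    rng = random.Random(seed)
    train, test = {}, {}
    for category in CATEGORIES:
        samples = list(utterances_by_category[category])
        rng.shuffle(samples)
        train[category] = samples[:n_train]
        test[category] = samples[n_train:]
    return train, test
```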

To assess the effectiveness of the extracted speech features, i.e., the statistics of the fundamental frequency and the short-time energy, classification was first evaluated between every pair of emotional categories. Figure 4-13 shows the experimental results of the SVM classification for each pair; with five categories there are ten pairwise combinations. Nine of the ten combinations achieve recognition rates above 85%, lying between 85% and 96%. The recognition rate of neutral vs. sadness (A vs. E) is the lowest, mainly due to the small prosodic variation between neutral and sad speech utterances. The average recognition rate is 89.2%. This indicates that the proposed statistical features represent emotional characteristics properly.
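The pairwise evaluation amounts to training one binary SVM per pair of categories. The sketch below, in Python with scikit-learn, is a minimal illustration: the exact prosodic statistics and SVM kernel used in the thesis are defined in Chapter 3, so the feature list and kernel choice here are assumptions.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def prosodic_stats(f0_contour, energy_contour):
    """Summary statistics of F0 and short-time energy (illustrative set)."""
    feats = []
    for contour in (np.asarray(f0_contour), np.asarray(energy_contour)):
        feats += [contour.mean(), contour.std(), contour.max(),
                  contour.min(), contour.max() - contour.min()]
    return np.array(feats)

def pairwise_accuracies(train_X, train_y, test_X, test_y, labels):
    """Train and score one binary SVM for every pair of emotion labels."""
    results = {}
    for a, b in combinations(labels, 2):
        tr = [(x, y) for x, y in zip(train_X, train_y) if y in (a, b)]
        te = [(x, y) for x, y in zip(test_X, test_y) if y in (a, b)]
        clf = SVC(kernel="rbf")  # kernel choice is an assumption
        clf.fit([x for x, _ in tr], [y for _, y in tr])
        results[(a, b)] = clf.score([x for x, _ in te], [y for _, y in te])
    return results
```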

The hierarchical SVM classifier (shown in Fig. 3-21) was then employed to recognize the five emotional categories. In the experiments, the SVM classifier was trained using 45 data samples per emotional category, collected from five persons with each person contributing three samples of each emotional sentence. The remaining 45 samples per category were used to test recognition of the emotion category. The test results are presented in Table 4-14; the average recognition rate over the five emotional expressions is 73.78%.

Fig. 4-13: Experimental results of recognition rate for any two emotional categories.

It is noted that anger is often misclassified as surprise, owing to the similar speech rates and tones of these two kinds of sentences in the self-built database. Moreover, the speaker's accent and noise in the voice signal strongly influence the classification results. These factors will be taken into consideration in future work.
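The bookkeeping behind Table 4-14 amounts to a confusion matrix with row-wise recognition rates. A minimal sketch, independent of the particular hierarchical classifier:

```python
import numpy as np

def confusion_report(true_labels, predicted_labels, categories):
    """Build a confusion matrix and per-category recognition rates."""
    idx = {c: i for i, c in enumerate(categories)}
    matrix = np.zeros((len(categories), len(categories)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[idx[t], idx[p]] += 1
    # Row-wise accuracy: correct predictions divided by samples per category.
    rates = matrix.diagonal() / matrix.sum(axis=1)
    # For Table 4-14 the rates are 0.6667, 0.8222, 0.7778, 0.6667, 0.7556,
    # averaging 0.7378.
    return matrix, rates, rates.mean()
```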

4.3.2 Experiments with the Entertainment Robot

In this study, we aim to develop an entertainment robot suitable as a children's toy. Such a robotic application requires fast response to natural speech signals. Therefore, a simple entertainment robot was built to verify the proposed emotion recognition algorithm for natural speech. The complete emotion recognition system was integrated into the self-constructed entertainment robot; Fig. 4-14 shows a block diagram of the interaction control system implemented on the robot.

In the experiment, a user speaks in front of the robot, as shown in Fig. 3-22. After acquiring the speech signals, the emotion recognition system begins to process the audio information. When no human speech is detected, the robot exhibits bored behavior by turning its head to look around. When a user says “hello” to the robot with neutral emotion, the robot raises its hands to respond to the user. If a happy emotion is detected, the robot rotates its ears and raises its hands in a happy gesture. When the user expresses anger, the robot puts its hands down to portray fear, and it shakes its head if surprise is detected.
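This emotion-to-behavior mapping can be captured in a small dispatch table. The behavior names below are hypothetical placeholders; the actual motor commands depend on the robot's controller interface.

```python
# Sketch of the emotion-to-behavior dispatch described above.
# Behavior identifiers are hypothetical, not from the thesis.
BEHAVIORS = {
    None:        "look_around",                   # no speech: bored behavior
    "neutral":   "raise_hands",                   # greeting response
    "happiness": "rotate_ears_and_raise_hands",   # happy gesture
    "anger":     "lower_hands",                   # portray fear
    "surprise":  "shake_head",
}

def react(detected_emotion):
    """Map the recognized emotional state to a robot behavior command."""
    return BEHAVIORS.get(detected_emotion, "idle")  # default is an assumption
```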

Table 4-14: Experimental results of recognizing five emotional categories.

Input \ Output   Anger   Happiness   Neutral   Sadness   Surprise   Recognition rate
Anger              30        0           3         4         8          66.67%
Happiness           1       37           3         4         0          82.22%
Neutral             1        6          35         3         0          77.78%
Sadness             0        4           6        30         5          66.67%
Surprise            5        2           1         3        34          75.56%
Average                                                                 73.78%

Fig. 4-14: Block diagram of the emotional interaction system.

Figure 4-15 shows the interactive responses of the robot when the user said, “I am angry!”, in an angry tone. The user then spoke to the robot in a surprised tone; as shown in Fig. 4-16, the robot shook its head in response to the recognized emotional state. The experimental results verify that the proposed emotion recognition system allows the robot to interact with a user in a natural and friendly manner. A video clip of the experimental results can be found in [93].

Fig. 4-15: Interactive response of the robot as the user says, “I am angry!” (a) The robot puts down its hands to portray fear. (b) The robot continues to put down its hands to the lowest position. (c) The robot raises its hands back to the original position.

Fig. 4-16: Interactive response of the robot, when the user speaks in a surprised tone. (a) The robot shakes its head to the right. (b) The robot shakes its head to the left. (c) The robot puts its head back to the original position.

In the future, a faster system will be studied to recognize human emotional speech and interact in a more humanlike manner. Suitable psychological findings will also be considered for application to the emotional robotic system.