

6.3 Construction of a Talking Cartoon Face Generator Using SVG

6.3.2 Temporal Domain Process

To synchronize the speech with the animation, we utilize several SVG elements for audio playback and frame simulation. For audio playback, since we use the Adobe SVG Viewer as a plug-in of the IE browser to view SVG files, the additional element "a:audio" defined in the Adobe SVG Viewer extension namespace is applied. An example of using the "a:audio" element to play the speech file is shown in the following.

<a:audio xlink:href="LifeScience.wav" begin="0s" />

As we can see, the time at which the audio file begins to play can be specified by the "begin" attribute. For frame simulation, a frame sequence can be simulated as an animation in the following way.

<g visibility="hidden">
  <set attributeName="visibility" to="visible"
       begin="0.00s" dur="0.03s"/>
  <!-- Components of a frame -->
</g>

As seen above, the "g" element can be used to group the set of components which belong to the same frame, and the visibility and timing properties can be set up by using the "set" element to display each frame according to the demanded frame rate (for example, a duration of 0.03 second per frame corresponds to a frame rate of about 33 frames per second).

By applying the above-mentioned elements and specifying the visibility and timing properties for each group of components, cartoon faces can be animated in synchronization with the speech.
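To make the synchronization concrete, the following is a minimal sketch of a complete SVG document combining the two mechanisms described above. The speech file name "speech.wav" and the circles used as face components are hypothetical placeholders, and the namespace declaration for "a:audio" is the one used by the Adobe SVG Viewer extensions.

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:a="http://ns.adobe.com/AdobeSVGViewerExtensions/3.0/">
  <!-- Play the speech from the beginning of the animation -->
  <a:audio xlink:href="speech.wav" begin="0s"/>
  <!-- Frame 1: visible during [0.00s, 0.03s) -->
  <g visibility="hidden">
    <set attributeName="visibility" to="visible" begin="0.00s" dur="0.03s"/>
    <circle cx="100" cy="100" r="40" fill="peachpuff"/>
  </g>
  <!-- Frame 2: visible during [0.03s, 0.06s) -->
  <g visibility="hidden">
    <set attributeName="visibility" to="visible" begin="0.03s" dur="0.03s"/>
    <circle cx="102" cy="100" r="40" fill="peachpuff"/>
  </g>
</svg>

Each further frame repeats the same pattern with its "begin" value advanced by one frame period (0.03 second here), so the visible frame changes at the demanded frame rate while the audio plays continuously.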

6.4 Experimental Results

Combining the techniques proposed in the previous chapters, some experimental results of talking cartoon faces with facial expressions and head movements rendered by SVG are shown in this section. An experimental result of generating a talking cartoon face synchronized with a speech file saying the two-syllable Mandarin word "蜿蜒" (winding) is shown in Figure 6.9. Another experimental result of saying the Mandarin word "光明" (brightness) is shown in Figure 6.10.

Figure 6.9 An experimental result of the talking cartoon face speaking "蜿蜒" (frames (a) through (p)).

Figure 6.10 An experimental result of the talking cartoon face speaking "光明" (frames (a) through (p)).

Chapter 7

Applications of Talking Cartoon Faces

7.1 Introduction to Implemented Applications

Talking cartoon faces can be integrated into many applications. For example, they can be used as virtual guides for visitors in libraries and museums. While wandering around a museum, visitors may listen to the narration of exhibitions provided by the virtual guide, or watch an introductory film with a virtual guide speaking beside it. Talking cartoon faces can also be used as software agents that help users operate software. With more vivid presentations, they can give users tips on how to use specific functions provided in the software.

In this study, some applications of talking cartoon faces are implemented. In Section 7.2, an application to virtual announcers is described, in which talking cartoon faces are used to report news, just like news announcers on TV. In Section 7.3, an application to audio books is presented. Unlike traditional audio books, animations of talking cartoon faces are shown together with the text content and synchronized with the speech. This application can be used in the e-learning area, where students can not only be impressed by the talking cartoon face, and hence keep the content of the study materials in mind, but also have fun.

7.2 Application to Virtual Announcers

7.2.1 Introduction to Virtual Announcers

Virtual announcers are virtual talking faces that can report news. Since real news announcers may sometimes be absent for various reasons, virtual announcers can be used in their place. Even if the absent announcer did not record the news releases in advance, we can still use another person's voice to generate a talking cartoon face with his/her appearance by inputting a neutral facial image of him/her into the proposed system. Therefore, virtual announcers can easily be created to temporarily take the place of real announcers when the need arises. Moreover, unlike real announcers, virtual announcers can be animated 24 hours a day without weariness. Although the content of the speech may be the same, the pose and the facial expression of the virtual announcer can be varied from moment to moment. Therefore, using a virtual announcer is considered a more interesting way to report the same news than replaying the same video clips hourly, as some TV news channels tend to do in their news programs.

7.2.2 Process of Talking Face Creation

The detailed process of the proposed system is shown in Figure 7.1. First, a 3D cartoon face model is constructed by the cartoon face creator, as described in Chapter 2. Second, the timing information of a speech file, which here is an audio recording of news, is extracted by the speech analyzer, as introduced in Chapter 3. Third, basic mouth shapes and facial expressions are assigned in the timeline based on the techniques proposed in Chapters 4 and 5. Then, a frame interpolation technique is performed by the animation editor. After the above-mentioned steps are done, the frames and the audio are synchronized in an SVG file by the animation and webpage generator. Additionally, by adding two layers for the cartoon face, namely, the background layer and the clothes layer, an animation of a virtual announcer can be created, as sketched below.
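The following is a minimal sketch of how the generated SVG file might order these layers; the element IDs and the rectangle placeholders are hypothetical. SVG paints elements in document order, so the background group comes first and the animated face group last, on top.

<svg xmlns="http://www.w3.org/2000/svg" width="320" height="240">
  <!-- Background layer: painted first, so it lies behind everything -->
  <g id="background">
    <rect width="320" height="240" fill="lightsteelblue"/>
  </g>
  <!-- Clothes layer: painted over the background -->
  <g id="clothes">
    <rect x="100" y="160" width="120" height="80" fill="navy"/>
  </g>
  <!-- Cartoon face: the animated frame groups of Section 6.3.2 go here -->
  <g id="face">
    <!-- frame groups with "set" elements -->
  </g>
</svg>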

Figure 7.1 An illustration of the process of the proposed system.

7.2.3 Experimental Results

Two examples of experimental results are shown in Figure 7.2 and Figure 7.3.

Since the rotation origin can be moved to other positions by applying the transformation between the global and the local coordinate systems, as mentioned in Chapter 2, the location of the cartoon face in the frame can be easily changed.
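As a brief illustration of this idea (using generic symbols rather than the exact notation of Chapter 2), a point in the local coordinate system of the face is mapped into the global coordinate system by a rotation and a translation:

$\mathbf{p}_{\text{global}} = \mathbf{R}\,\mathbf{p}_{\text{local}} + \mathbf{t}$

where $\mathbf{R}$ is the rotation matrix and $\mathbf{t}$ is the position of the rotation origin in global coordinates; relocating the cartoon face then amounts simply to changing the translation vector $\mathbf{t}$.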

Figure 7.2 An example of a virtual announcer.

Figure 7.3 Another example of a virtual announcer.

7.3 Applications to Audio Books for E-Learning

7.3.1 Introduction to Audio Books

An audio book is a recording of the contents of a book read aloud. Audio books are usually circulated on CDs, cassette tapes, or in other digital formats. They are used to tell stories, or for other educational purposes. Unlike traditional books, one may listen to an audio book while doing other work. With voice reading, one may learn the pronunciation of words which he/she might not learn correctly by just reading the book. The voice can also make a deeper impression on the reader than the text.

However, text and audio versions of a book alone are not enough. Doing other work may sometimes reduce one's attention, so that he/she cannot concentrate on the content of the audio book. In this study, we implement an application to audio books which combines an animation of a talking cartoon face with the text content and the audio. By watching the animation, listening to the speech, and reading the text when necessary, students may be attracted by the talking cartoon face and hence achieve a better learning effect.

7.3.2 Process of Audio Book Generation

The process of audio book generation is similar to that of virtual announcers. The only difference is that, after the generation of the SVG file, the animation must be embedded into an HTML document to be shown together with the text content and some study materials.

There are three ways to embed an SVG file into an HTML page. One is to use the <embed> tag. The documentation of the Adobe SVG Viewer, which is adopted in this study as a plug-in of the IE browser, recommends using the <embed> tag when embedding an SVG file in HTML pages. The syntax is listed as follows.

<embed src="animation.svg" width="300" height="100"
       type="image/svg+xml"
       pluginspage="http://www.adobe.com/svg/viewer/install/" />

The second way is to use the <object> tag. The syntax is shown as follows.

<object data="animation.svg" width="300" height="100"
        type="image/svg+xml"
        codebase="http://www.adobe.com/svg/viewer/install/">
</object>

The third way is to use the <iframe> tag. The syntax is shown as follows.

<iframe src="animation.svg" width="300" height="100">
</iframe>
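For concreteness, the following is a minimal sketch of an HTML document that the audio book generator might produce, combining the SVG animation with the text content via the recommended <embed> tag; the title, file names, and text shown here are hypothetical placeholders.

<html>
<head>
  <title>Audio Book: Life Science</title>
</head>
<body>
  <h1>Life Science</h1>
  <!-- The talking cartoon face animation, synchronized with the speech -->
  <embed src="animation.svg" width="300" height="100"
         type="image/svg+xml"
         pluginspage="http://www.adobe.com/svg/viewer/install/" />
  <!-- The text content that is read aloud in the animation -->
  <p>...text content of the audio book...</p>
  <!-- Related links to study materials -->
  <a href="materials.html">Related study materials</a>
</body>
</html>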

7.3.3 Experimental Results

An example of the experimental results is shown in Figure 7.4. By inputting an SVG file, the title and content of an audio book, and some related links into the proposed system, an HTML document was automatically created.

Figure 7.4 An example of a virtual teacher.

Chapter 8

Conclusions and Suggestions for Future Works

8.1 Conclusions

In this study, a system for automatic 2D virtual face generation by 3D model transformation techniques has been implemented. We have presented a way to automatically transform a 2D cartoon face model into a 3D one, and animate it by statistical approximation and lip movement synthesis. The system consists of four major components, including a cartoon face creator, a speech analyzer, an animation editor, and an animation and webpage generator.

The cartoon face creator is designed to include the functions of (1) assigning feature points to a 2D face model according to an input neutral facial image or an input 2D face data set; (2) constructing a 3D local coordinate system and creating a transformation between the global and the local coordinate systems by the use of a knowledge-based coordinate system transformation method proposed in this study; and (3) defining basic facial expression parameters for use in facial animation. A face model of 72 facial feature points is used in this study. For the purpose of applying a 3D rotation technique and two curve drawing methods, some additional points are also computed.

Next, the speech analyzer is designed to perform the functions of (1) segmenting a speech file into sentence utterances; and (2) processing each segmented shorter sentence utterance piece sequentially to extract the duration of each syllable in the speech. A method for segmentation of sentence utterances has also been proposed.

The animation editor is designed to conduct the functions of (1) generating the timing information of facial expressions automatically according to a statistical method proposed in this study; (2) translating syllables into combinations of 12 pre-defined basic mouth shapes; (3) assigning basic mouth shapes and facial expressions in the timeline as key frames; and (4) applying a frame interpolation technique to generate the remaining frames among key frames.

Finally, the animation and webpage generator is designed to perform the functions of (1) rendering and synchronizing the cartoon face with the speech by the use of an editable and open vector-based XML language, namely, Scalable Vector Graphics (SVG); (2) implementing an application to virtual announcers; and (3) embedding the SVG animation into an HTML document to generate an audio book for e-learning. The appearance of the cartoon face can be customized in this component by adding backgrounds and clothes to the animation.

Experimental results shown in the previous chapters have proven the feasibility and applicability of the proposed methods.

8.2 Suggestions for Future Works

Several suggestions for future research are listed as follows.

(1) Improvement on facial feature detection --- In order to fit more application environments for the creation of face models from neutral facial images, the performance of the facial feature detection must be improved. In addition, for precise construction of 3D face models, assigning the positions of the feature points in the Cartesian z-direction may be combined with facial feature detection techniques for side-view photographs.

(2) Rendering cartoon faces with more types --- More face types should be supported to render talking cartoon faces with higher quality and more appealing appearances. Not only human faces, but also face types for animals and nonhuman objects need to be supported.

(3) Improvement on speech recognition --- In order to generate talking cartoon faces with less input, speech recognition techniques such as speech-to-text (STT) conversion can be integrated into the proposed system. Then a cartoon face can be animated without knowing the transcript of the speech in advance.

(4) Integration of more facial expressions --- With more facial expressions integrated, generated talking cartoon faces will become more interesting and lifelike.

(5) Integration of gestures and body actions --- As with the integration of more facial expressions, talking cartoon faces with gestures and body actions are more vivid and amusing.

(6) Integration of virtual face-painting or hair-designing --- With the integration of virtual face-painting or hair-designing, the proposed system could be used at beauty salons for customers to preview the make-up and hair styles that they would like to try.

(7) Simulating facial expressions and head movements with different statistical models --- Since different TV news announcers have different habits when reporting news, more types of statistical models can be used to present different reporting styles for different TV news announcers.

(8) Simulating hair movements dynamically according to gravity --- When the cartoon face is rotated, the positions of the hair contour control points can be dynamically computed according to gravity to make the hair look more realistic.
