Chapter 7 Applications of Talking Cartoon Faces
7.3 Applications to Audio Books for E-Learning
7.3.2 Process of Audio Book Generation
The process of audio book generation is similar to that of virtual announcer generation.
The only difference is that after the generation of the SVG file, the animation must be embedded into an HTML document to be shown together with the text content and some study materials.
There are three ways to embed an SVG file into an HTML page. The first is to use the <embed> tag. The Adobe SVG Viewer, which is adopted in this study as a plug-in for the IE browser, recommends using the <embed> tag when embedding an SVG file in HTML pages. The syntax is as follows.
<embed src="animation.svg" width="300" height="100"
type="image/svg+xml"
pluginspage="http://www.adobe.com/svg/viewer/install/" />
The second way is to use the <object> tag. The syntax is shown as follows.
<object data="animation.svg" width="300" height="100"
type="image/svg+xml"
codebase="http://www.adobe.com/svg/viewer/install/">
</object>
The third way is to use the <iframe> tag. The syntax is shown as follows.
<iframe src="animation.svg" width="300" height="100">
</iframe>
7.3.3 Experimental Results
An example of the experimental results is shown in Figure 7.4. Given an SVG file, the title and content of an audio book, and some related links as input, the proposed system automatically created an HTML document.
Figure 7.4 An example of a virtual teacher.
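The page assembly described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this study: the function name build_audio_book_page, its parameters, and the page layout are assumptions, and the <embed> tag is used simply because it is the first of the three embedding methods listed in Section 7.3.2.

```python
from html import escape


def build_audio_book_page(svg_path, title, paragraphs, links):
    """Return an HTML document showing an SVG animation with the book text.

    svg_path   -- URL or relative path of the generated SVG animation
    title      -- title of the audio book
    paragraphs -- list of text paragraphs (the content of the audio book)
    links      -- list of (label, url) pairs for related study materials
    All names and the layout are hypothetical, chosen for illustration.
    """
    # Escape all user-supplied text so it cannot break the markup.
    body_text = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    link_items = "\n".join(
        f'      <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in links
    )
    return f"""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>{escape(title)}</title>
  </head>
  <body>
    <h1>{escape(title)}</h1>
    <embed src="{escape(svg_path, quote=True)}" width="300" height="100"
           type="image/svg+xml">
{body_text}
    <ul>
{link_items}
    </ul>
  </body>
</html>
"""
```

Calling build_audio_book_page("animation.svg", "Lesson 1", ["First paragraph."], [("Notes", "notes.html")]) returns a string that can be written to an .html file and opened in a browser with SVG support.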
Chapter 8
Conclusions and Suggestions for Future Works
8.1 Conclusions
In this study, a system for automatic 2D virtual face generation by 3D model transformation techniques has been implemented. We have presented a way to automatically transform a 2D cartoon face model into a 3D one and to animate it by statistical approximation and lip movement synthesis. The system consists of four major components: a cartoon face creator, a speech analyzer, an animation editor, and an animation and webpage generator.
The cartoon face creator is designed to perform the functions of (1) assigning feature points to a 2D face model according to an input neutral facial image or an input 2D face data set; (2) constructing a 3D local coordinate system and creating a transformation between the global and the local coordinate systems by the use of a knowledge-based coordinate system transformation method proposed in this study; and (3) defining basic facial expression parameters for use in facial animation. A face model of 72 facial feature points is used in this study. Some additional points are also computed so that a 3D rotation technique and two curve drawing methods can be applied.
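As a rough illustration of how a local coordinate system supports 3D rotation of feature points, the following Python sketch translates points into a local frame, rotates them about the vertical axis, and projects them back to 2D. It is not the knowledge-based transformation method proposed in this study: the function name rotate_and_project, the single yaw angle, and the orthographic projection are all simplifying assumptions.

```python
import math


def rotate_and_project(points, origin, yaw_deg):
    """Rotate 3D points about a vertical axis through `origin`, then drop z.

    points  -- list of (x, y, z) feature points in global coordinates
    origin  -- (x, y, z) origin of the assumed local coordinate system
    yaw_deg -- rotation angle about the local y axis, in degrees
    Returns a list of (x, y) positions in global 2D coordinates.
    """
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    ox, oy, oz = origin
    projected = []
    for x, y, z in points:
        # Global -> local: express the point relative to the local origin.
        lx, ly, lz = x - ox, y - oy, z - oz
        # Rotate about the local y axis (y itself is unchanged).
        rx = c * lx + s * lz
        # Local -> global, keeping only x and y (orthographic projection).
        projected.append((rx + ox, ly + oy))
    return projected
```

With a yaw of 0 degrees the points are returned unchanged apart from losing their z value, which gives a simple sanity check for the sketch.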
Next, the speech analyzer is designed to perform the functions of (1) segmenting a speech file into sentence utterances; and (2) processing each segmented sentence utterance sequentially to extract the duration of each syllable in the speech. A method for segmentation of sentence utterances has also been proposed.
The animation editor is designed to perform the functions of (1) generating the timing information of facial expressions automatically according to a statistical method proposed in this study; (2) translating syllables into combinations of 12 pre-defined basic mouth shapes; (3) assigning basic mouth shapes and facial expressions to the timeline as key frames; and (4) applying a frame interpolation technique to generate the remaining frames between key frames.
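Step (4) can be illustrated with a minimal Python sketch. It assumes plain linear interpolation of control point positions between consecutive key frames; the text above does not specify which interpolation technique is used. The function name interpolate_frames, the key frame data layout, and the default frame rate are hypothetical.

```python
def interpolate_frames(key_frames, fps=25):
    """Fill in frames between key frames by linear interpolation.

    key_frames -- list of (time_in_seconds, [(x, y), ...]) sorted by time;
                  every key frame lists the same control points in the
                  same order
    fps        -- output frame rate (an assumed value)
    Returns a list of frames, each a list of (x, y) control points.
    """
    if not key_frames:
        return []
    frames = []
    for (t0, pts0), (t1, pts1) in zip(key_frames, key_frames[1:]):
        # Number of output frames covering the interval [t0, t1).
        n = max(1, round((t1 - t0) * fps))
        for i in range(n):
            a = i / n  # interpolation weight, 0 at t0 and approaching 1 at t1
            frames.append([
                (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
                for (x0, y0), (x1, y1) in zip(pts0, pts1)
            ])
    # Close the sequence with the last key frame itself.
    frames.append(list(key_frames[-1][1]))
    return frames
```

Each key frame here would hold the control points of one basic mouth shape or facial expression. A smoother easing function could replace the linear weight without changing the structure.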
Finally, the animation and webpage generator is designed to perform the functions of (1) rendering the cartoon face and synchronizing it with speech by the use of an editable, open, vector-based XML language, namely, Scalable Vector Graphics (SVG); (2) implementing an application to virtual announcers; and (3) embedding the SVG animation into an HTML document to generate an audio book for e-learning. The appearance of the cartoon face can be customized in this component by adding backgrounds and clothes to the animation.
The experimental results shown in the previous chapters demonstrate the feasibility and applicability of the proposed methods.
8.2 Suggestions for Future Works
Several suggestions for future research are listed as follows.
(1) Improvement on facial feature detection --- In order to create face models from neutral facial images in more application environments, the performance of the facial feature detection must be improved. In addition, for more precise construction of 3D face models, the assignment of feature point positions in the Cartesian z-direction may be combined with facial feature detection techniques for side-view photographs.
(2) Rendering cartoon faces of more types --- More face types should be supported to render talking cartoon faces with higher quality and more appealing appearances. Not only human faces, but also face types for animals and nonhuman objects need to be supported.
(3) Improvement on speech recognition --- In order to generate the talking cartoon face with less input, some speech recognition techniques such as Speech-to-Text (STT) can be integrated into the proposed system. Then a cartoon face can be animated without knowing the transcript of the speech in advance.
(4) Integration of more facial expressions --- With more facial expressions integrated, generated talking cartoon faces will become more interesting and lifelike.
(5) Integration of gestures and body actions --- As with the integration of more facial expressions, adding gestures and body actions would make talking cartoon faces more vivid and amusing.
(6) Integration of virtual face-painting or hair-designing --- With the integration of virtual face-painting or hair-designing, the proposed system can be used at beauty salons to let customers preview and choose the make-up and hair styles they prefer.
(7) Simulating facial expressions and head movements with different statistical models --- Since different TV news announcers have different habits when reporting news, more types of statistical models can be used to present different reporting styles for different TV news announcers.
(8) Simulating hair movements dynamically according to gravity --- When the cartoon face is rotated, the positions of the hair contour control points can be dynamically computed according to gravity to make the hair more realistic.