

4.4  Creation of Virtual-Face Image Sequences

4.4.5  Creation of Single Virtual-Face Images

Up to now, we have performed the warping, the scaling, the extraction, the filling, and the smoothing operations as described in Sections 4.4.1 through 4.4.4, as shown in Figure 4.14.

Figure 4.14 Illustration of the virtual face creation.

The last task is to integrate the mouth region with the input image. We paste the mouth region onto the input image according to the positions of the feature points, and the pasting range is from point bUpLf.y to point bDnRt.y horizontally and from point bUpLf.x to point bDnRt.x vertically. Finally, the virtual faces can be created, as illustrated by the center image shown in Figure 4.14.
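As a minimal illustration of this pasting step, the sketch below assumes the face image and the processed mouth region are NumPy arrays and that the boundary feature points bUpLf and bDnRt are given as (row, column) pairs; the helper name and the exact coordinate convention are assumptions, not the system's actual code.

```python
import numpy as np

def paste_mouth_region(face_img: np.ndarray, mouth_region: np.ndarray,
                       b_up_lf: tuple, b_dn_rt: tuple) -> np.ndarray:
    """Paste the processed mouth region back onto the input face image.

    b_up_lf and b_dn_rt are the (row, column) positions of the upper-left and
    lower-right boundary feature points; a hypothetical helper for illustration.
    """
    top, left = b_up_lf
    bottom, right = b_dn_rt
    result = face_img.copy()
    # The mouth region is assumed to have already been scaled to fit this range.
    result[top:bottom, left:right] = mouth_region[:bottom - top, :right - left]
    return result
```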

4.5 Experimental Results

Some experimental results of applying the proposed method for virtual face generation are shown in Figure 4.16. These virtual faces were created using a photo of Angelina Jolie, shown in Figure 4.7(a), as the input image and a video of the author of this thesis, shown in Figure 4.15, as the video model.

Figure 4.15 A real-face video model of a person speaking “teacher” in Chinese.


Figure 4.16 A resulting sequence of virtual faces created using the video model in Figure 4.15.


Chapter 5

Experimental Results and Discussion

5.1 Experimental Results

In this section, we present the experimental results generated by the proposed techniques and some screenshots of our system.

First, a video of the author saying some words was recorded with a camera, and then some frames and the audio data were extracted from it. In Figure 5.1, we show the 150 frames extracted from the video.

Figure 5.1 Illustration of the 150 frames extracted from the video.

Next, we located the 26 feature points manually, as shown in Figure 5.2(a), and the system automatically adjusted the x-coordinates of these points to make them horizontally symmetric, as shown in Figure 5.2(b). Then, we adjusted the y-coordinates of these points to the correct positions, as shown in Figure 5.2(c).

Figure 5.2 Illustration of the feature point positions. (a) The feature points were located by enlarging the image. (b) The horizontally symmetric points. (c) The adjusted feature points.
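The horizontal symmetrization just described can be pictured with the following sketch, which mirrors paired feature points about an estimated facial midline; the midline estimate, the pairing, and the coordinate convention are assumptions for illustration only.

```python
import numpy as np

def symmetrize_points(points: np.ndarray, mirror_pairs: list) -> np.ndarray:
    """Force mirrored feature-point pairs to be horizontally symmetric.

    points is an (N, 2) array whose first column is the horizontal coordinate;
    mirror_pairs lists index pairs (i, j) of points that should mirror each
    other across the facial midline.  Both conventions are assumptions.
    """
    pts = points.astype(float).copy()
    midline = pts[:, 0].mean()  # rough estimate of the face's vertical axis
    for i, j in mirror_pairs:
        # Average the two points' distances from the midline ...
        offset = (abs(pts[i, 0] - midline) + abs(pts[j, 0] - midline)) / 2.0
        # ... and place them at equal distances on opposite sides.
        left, right = (i, j) if pts[i, 0] <= pts[j, 0] else (j, i)
        pts[left, 0] = midline - offset
        pts[right, 0] = midline + offset
    return pts
```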

After locating the feature points, we choose a single neutral facial image as the input image and a text file (*.face) which contains the coordinates of the feature points of the input image for virtual face creation, as shown in Figure 5.3.
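A possible reader for such a coordinate file is sketched below; the actual *.face layout is not described in this section, so the one-pair-per-line format assumed here is purely illustrative.

```python
def load_face_file(path: str) -> list:
    """Load feature point coordinates from a *.face text file.

    Assumes one whitespace-separated coordinate pair per line; the real file
    layout may differ from this sketch.
    """
    points = []
    with open(path, "r") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                points.append((float(parts[0]), float(parts[1])))
    return points
```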

Then, we select the folder which contains the previously-mentioned frame sequence.

The system will extract the feature point positions in each frame automatically by tracking them from the previous frame using the techniques proposed in Chapter 3. In Figure 5.4, the top frame is the previous frame, in which the feature points are marked as blue dots, and the bottom frame is the current frame, in which the feature points are marked as red dots.
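The frame-to-frame tracking can be pictured as a block-matching search around each point's previous position, roughly as in the sketch below; the patch and search-window sizes are placeholders, and the thesis's actual matching criterion (Chapter 3) may differ.

```python
import numpy as np

def track_point(prev_frame: np.ndarray, cur_frame: np.ndarray,
                prev_pt: tuple, patch: int = 7, search: int = 15) -> tuple:
    """Track one feature point from the previous grayscale frame to the current one.

    The patch around the point in the previous frame is compared, by sum of
    squared differences, with patches inside a search window in the current
    frame; the best-matching position is returned.  The sketch assumes the
    point lies away from the image border.
    """
    r, c = prev_pt
    template = prev_frame[r - patch:r + patch + 1, c - patch:c + patch + 1].astype(float)
    best_cost, best_pos = None, prev_pt
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            cand = cur_frame[rr - patch:rr + patch + 1, cc - patch:cc + patch + 1].astype(float)
            if cand.shape != template.shape:
                continue  # candidate window falls outside the image
            cost = float(np.sum((cand - template) ** 2))
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (rr, cc)
    return best_pos
```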

After extracting the feature point positions of all frames, virtual faces can be created by using the proposed techniques described in Chapter 4. The mouth size information and the mouth state of each frame are shown in the system interface.

Figure 5.5 shows an intermediate result of the creation process. The right image is the virtual face created from the top-left image, which is the input image, and the bottom-left image is the current frame. The final results, shown in Figure 5.6, were created from a video model speaking the sentence “Good day, every teacher” (in Chinese) with a thresholding value t = 100 and an edge height Hedge = 2. Two other results, using a photo of Liv Tyler and a photo of Neng-Jing Yi as the input images with a thresholding value t = 190 and an edge height Hedge = 5, are shown in Figures 5.7 and 5.8.

Figure 5.3 Choosing an input image and its feature point coordinates.


Figure 5.4 The feature point tracking process.

Figure 5.5 An intermediate result of the virtual face creation process.


Figure 5.6 The result of the virtual face creation process using a photo of Angelina Jolie as the input image.


Figure 5.7 The result of the virtual face creation process using a photo of Liv Tyler as the input image.


Figure 5.8 The result of the virtual face creation process using a photo of Neng-Jing Yi as the input image.


5.2 Discussions

After presenting the experimental results, we would like to discuss some issues of concern as follows.

The first issue is the feature point locating process. It is the most important step of the virtual face creation, and users must concentrate on the images on which the feature points are to be located during this process. They must locate the 26 feature points manually and then slightly adjust the y-coordinates of those points. The system in this study automatically adjusts the x-coordinates of those points to symmetric positions. To make the created virtual faces realistic, the y-coordinates must be located accurately to obtain optimal results.

The second issue is the feature point tracking process. The proposed tracking technique dynamically fits the mouth shapes in the real-face video models, and the window sizes are changed dynamically according to the two mouth states we proposed. Because the image information of a closed mouth, including its shape, brightness, and texture, is different from that of other mouth shapes, a correction of the feature point locations is proposed. For each frame, when we detect any one of the three closed-mouth shapes we proposed, we correct the feature point locations immediately.

By applying the correction, we can ensure that every feature point in each frame of the video models is correct.
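The interplay of the mouth state, the matching window size, and the closed-mouth correction can be summarized with the following control-flow sketch; the two window sizes and the callable names are placeholders, not values taken from the thesis.

```python
def track_frame(prev_frame, cur_frame, prev_points, mouth_state,
                track_point, correct_closed_mouth):
    """Per-frame tracking loop (illustrative only).

    The matching window size is chosen from the detected mouth state, every
    feature point is tracked, and the closed-mouth correction is applied
    whenever a closed mouth is detected.  The two callables encapsulate the
    matching and correction details sketched elsewhere.
    """
    search = 7 if mouth_state == "closed" else 15  # placeholder window sizes
    points = [track_point(prev_frame, cur_frame, p, search=search)
              for p in prev_points]
    if mouth_state == "closed":
        points = correct_closed_mouth(cur_frame, points)
    return points
```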

Finally, we discuss the virtual-face creation process. In this process, realistic virtual faces are created. We dynamically scale the mouth sizes and adjust the positions of the chins and the mouths according to the mouth shapes of the real-face video models.

The experimental results show that the created virtual faces are natural.


Chapter 6

Conclusions and Suggestions for Future Work

6.1 Conclusions

In this study, a system for the automatic creation of virtual faces with dynamic mouth movements has been implemented. We have presented a way to automatically create virtual faces from a given neutral facial image, and the mouths of these virtual faces can move dynamically by feature point tracking and mouth size scaling. The system consists of three components: a feature point locator, a feature point tracker, and a virtual face creator.

With the feature point locator, the 26 feature points of the input image and those of the first frame of the video model are located manually at accurate and symmetric positions. Mouth feature regions are defined by groups of these points.

Next, using the feature point tracker, the feature points of each frame can be extracted by an image matching technique proposed in this study. Furthermore, the mouth-movement information of the video model is analyzed in this study to obtain the mouth states. The mouth state of each frame is detected so that the window sizes used in image matching can be changed dynamically.

However, a correction of the feature point locations needs to be performed when a closed mouth is detected. This is achieved by the use of a hierarchical bi-level thresholding technique and an edge detection technique.
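One way to picture this detection is the sketch below, which binarizes the mouth region with the threshold t and measures the thickness of the dark band between the lips against the edge height Hedge; this is only an illustrative reading of those two parameters, not the thesis's exact hierarchical procedure.

```python
import numpy as np

def looks_like_closed_mouth(mouth_gray: np.ndarray, t: int = 100, h_edge: int = 2) -> bool:
    """Rough closed-mouth test on a grayscale mouth region (illustrative).

    The region is binarized with threshold t, the number of dark pixels in
    each column is taken as the local thickness of the lip line, and the
    mouth is treated as closed when that thickness never exceeds h_edge.
    """
    dark = mouth_gray < t            # bi-level thresholding
    band_heights = dark.sum(axis=0)  # dark pixels per column
    return bool(np.all(band_heights <= h_edge))
```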


Finally, the virtual face creator creates virtual face sequences with dynamic mouth movements by a proposed mouth shape morphing technique. The mouths and the chins of the virtual faces are made to move naturally, and the skin near them is made to look smooth by the morphing technique. In addition, a mouth size scaling technique is proposed for the synchronization of the mouth movements.
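A minimal sketch of such a scaling step is given below: the input image's mouth region is resized by the ratio between the mouth size measured in the current video-model frame and in its neutral frame. The nearest-neighbour resize and the width/height measures are assumptions; the thesis's morphing-based warping is more elaborate.

```python
import numpy as np

def scale_mouth(mouth_img: np.ndarray, model_w: float, model_h: float,
                neutral_w: float, neutral_h: float) -> np.ndarray:
    """Resize the mouth region by the model-to-neutral mouth-size ratio.

    A nearest-neighbour sketch; model_* and neutral_* are the mouth width and
    height measured from the feature points (assumed measures).
    """
    sx = model_w / neutral_w
    sy = model_h / neutral_h
    h, w = mouth_img.shape[:2]
    new_h = max(1, int(round(h * sy)))
    new_w = max(1, int(round(w * sx)))
    rows = np.clip((np.arange(new_h) / sy).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / sx).astype(int), 0, w - 1)
    return mouth_img[rows][:, cols]
```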

The experimental results shown in the previous chapters have revealed the feasibility of the proposed system.

6.2 Suggestions for Future Work

Several suggestions for future research are listed as follows.

(1) Automatic detection of feature points of a mouth --- For the convenience of using the proposed system, automatic feature point detection would allow users to skip the feature point locating process and make the operation of the proposed system easier.

(2) Integration of eye and eyebrow movements --- With eye and eyebrow movements, the created virtual faces will become more vivid and amusing.

(3) Integration of facial wrinkles --- Just like the integration of eye and eyebrow movements, virtual faces with wrinkles, such as those on the forehead, along smile lines, and around the eyes, look more natural and lifelike.

(4) Improvement on feature point tracking --- In order to deal with larger videos, the speed of feature point tracking must be increased so that the system can handle videos with a large number of frames.

(5) Improvement on mouth shape detection --- In the proposed system, the person in the video model talks at a medium speed, but people often talk faster. For wider applications, the mouth shape detection must be improved.

(6) Real-time virtual face creation --- If the facial features can be extracted in real time, the virtual face can be synchronized with a speaking person in real time in the proposed system. The proposed system could then be used in distance teaching, allowing students to choose a favorite movie star or singer to be the teacher.


