Chapter 1 Introduction
1.5 Thesis Organization
The remainder of the thesis is organized as follows. In Chapter 2, the proposed method of construction of 3D cartoon face models based on 2D cartoon face models and a method of creation of virtual cartoon faces are described. In Chapter 3, the proposed method of speech segmentation for lip synchronization is described. In
Chapter 4, the proposed method of simulating facial expressions and head movements is described. And then, some animation issues such as lip movements and smoothing of talking cartoon facial animation are discussed and solved in Chapter 5. Up to Chapter 5, talking cartoon faces are generated.
A final integration using the open standard language SVG (Scalable Vector Graphics) to generate web-based animations is described in Chapter 6. Some examples of applications using the proposed system are presented in Chapter 7. Finally, conclusions and some suggestions for future work are included in Chapter 8.
Chapter 2
Cartoon Face Generation and Modeling from Single Images
2.1 Introduction
To make an animated cartoon face livelier, many issues are of great concern to us, such as lip movements, eye blinks, eyebrow movements, and head movements.
Especially for simulation of head movements, including head tilting and head turning, a 2D face model is not enough to synthesize proper head poses of cartoon faces. For this reason, a 3D face model must be constructed. In the proposed system, one of the four major parts shown in Figure 1.1, which is named cartoon face creator, is designed to create personal cartoon faces, integrating the technique of 3D face model construction. The creation process includes three main steps. The first step is to assign facial feature points to a 2D face model. It can be done in two ways. One is to detect facial features of an input neutral facial image, generate corresponding feature points, and map them to the feature points in the predefined face model. The other is to directly assign the feature points according to the input 2D face data. In this study, we adopt both ways in constructing our face model.
The second step is to construct the local coordinate system of the face model for applying 3D rotation techniques. This step is done by creating a transformation between the global and the local coordinate systems and assigning the position of each feature point in the third dimension, namely, the Cartesian z-coordinate; essential head movements can then be simulated. The last step is to define basic facial expression parameters for use in face animation.
In this chapter, some techniques are proposed to achieve the purpose mentioned above. First, a review of Chen and Tsai [1] about constructing a 2D face model from single images is presented in Section 2.2. Second, a technique to construct a 3D face model based on the 2D face model is proposed in Section 2.3. In Section 2.4, a technique is proposed to create the cartoon face with different expressions in different poses.
2.2 Review of Adopted Cartoon Face Model
Chen and Tsai [1] proposed an automatic method for generation of personal cartoon faces from a neutral facial image. In their method, three main steps are carried out: extraction of facial feature regions, extraction of facial feature points, and creation of a face model. In the first step, a hierarchical bi-level thresholding method is used to extract the background, hair, and face regions in a given face image. A flowchart of the hierarchical bi-level thresholding method is shown in Figure 2.1.
Then, by finding all probable pairs of eye regions according to a set of rules related to the region’s heights, widths, etc., and filtering these regions according to the symmetry of the two regions in each pair, an optimal eye-pair can be detected. Taking the positions of the detected optimal eye-pair as a reference, the facial feature regions can be extracted by a knowledge-based edge detection technique.
Figure 2.1 Flowchart of hierarchical bi-level thresholding method in Chen and Tsai [1].
Before extracting facial feature points, a face model with facial feature points must be defined first. Because the 84 feature points and the facial animation parameter units (FAPUs) of the face model specified in the MPEG-4 standard are not suitable for cartoon face drawing, Chen and Tsai [1] defined a face model with 72 feature points by adding or eliminating some feature points of the face model in MPEG-4. Some FAPUs were also specified according to the MPEG-4 standard, and then a new adoptable face model was set up. An illustration of the proposed face model is shown in Figure 2.2. In order to control the facial expression of the cartoon face, some feature points were assigned to be control points which are listed as follows:
1. Eyebrow Control Points: there are 8 control points in both eyebrows, namely, 4.2, 4.4, 4.4a, 4.6, 4.1, 4.3, 4.3a, and 4.5.
2. Eye Control Points: there are 4 control points in eyes, namely, 3.1, 3.3, 3.2, and 3.4.
3. Mouth Control Points: there are 4 control points in the mouth, namely, 8.9, 8.4, 8.3, and 8.2, by which other mouth feature points are computed.
4. Jaw Control Point: there is 1 control point in the jaw, namely, 2.1, which is automatically computed by the position of the control point 8.2 and the value of the facial animation parameter JawH.
These control points in this study are the so-called face model control points.
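As a small illustration, the control points listed above can be held in a lookup table. This is only a sketch: the point labels are copied from the list, while the group names, the dictionary layout, and the two helper functions are hypothetical and are not part of the face model of Chen and Tsai [1].

```python
# Control-point labels copied from the list above.
# The group names and this layout are illustrative assumptions.
CONTROL_POINTS = {
    "eyebrow": ["4.2", "4.4", "4.4a", "4.6", "4.1", "4.3", "4.3a", "4.5"],
    "eye": ["3.1", "3.3", "3.2", "3.4"],
    "mouth": ["8.9", "8.4", "8.3", "8.2"],
    "jaw": ["2.1"],
}


def is_control_point(label):
    """Return True if the feature-point label is one of the control points."""
    return any(label in labels for labels in CONTROL_POINTS.values())


def control_point_count():
    """Total number of control points over all groups."""
    return sum(len(labels) for labels in CONTROL_POINTS.values())
```

Such a table makes it easy to check whether a feature point is moved directly by an expression parameter or is computed from other points.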
After setting up the face model with 72 feature points, the corresponding feature points in a given facial image can be extracted from the previously mentioned facial feature regions.
Figure 2.2 A face model. (a) Proposed 72 feature points. (b) Proposed facial animation parameter units in Chen and Tsai [1].
Finally, two curve drawing methods are applied to create cartoon faces. One is the corner-cutting subdivision method, in which a subdivision curve is generated by repeatedly cutting off corners of a polygon until a certain condition is reached, as shown in Figure 2.3. The other is the cubic Bezier curve approximation method, which is used to produce smooth curves with a simple polynomial equation, as shown in Figure 2.4.
Figure 2.3 An illustration of corner-cutting algorithm in Chen and Tsai [1].
Figure 2.4 Cubic Bezier curve in Chen and Tsai [1].
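The two curve drawing methods can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Chen and Tsai [1]: the corner cutting uses the common 1/4 and 3/4 cutting ratios of Chaikin's scheme and a fixed iteration count in place of the unspecified stopping condition, and the function names are hypothetical.

```python
def corner_cut(points, iterations=3):
    """Corner-cutting subdivision of an open polyline of (x, y) tuples.

    Assumption: Chaikin-style 1/4 and 3/4 cutting ratios, with the two
    end points kept fixed. The source does not state the ratios or the
    stopping condition, so a fixed iteration count is used here.
    """
    pts = list(points)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # nothing to cut on a single segment
        new_pts = [pts[0]]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            # Replace each segment by the two points at 1/4 and 3/4 of it.
            new_pts.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            new_pts.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        new_pts.append(pts[-1])
        pts = new_pts
    return pts


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve with control points p0..p3 at t in [0, 1]."""
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)
```

Sampling cubic_bezier at evenly spaced values of t gives a polyline that approximates the smooth curve. How the feature points of a facial contour are mapped to the four control points is not specified in this review and is left open here.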
2.3 Construction of 3D Face Model Based on 2D Cartoon Face Model
In this section, the basic idea of constructing a 3D face model based on the above-mentioned 2D cartoon face model is introduced in Section 2.3.1, and the details of the construction process are described in Section 2.3.2.
2.3.1 Basic Idea
Based on the face model mentioned in Section 2.2, a method is proposed to construct a 3D face model. The method can be divided into two steps: the first is to construct a local coordinate system from the global one, and the second is to assign the position of the feature points in the Cartesian z-direction. The basic idea for constructing a local coordinate system is to define a rotation origin and transform the points of the global coordinate system into those of the local one based on the rotation origin. The basic idea for assigning the position of the feature points in the Cartesian z-direction is to do the assignment based on a proposed generic model. Although a
generic model cannot represent all cases of human faces, it is practical enough in the application of generating virtual talking faces, because in real cases, one usually does not roll his/her head violently when giving a speech. With the assumption that heads are rotated slightly when speaking, a little inaccuracy of the depth information in a face model would not affect the result much, so we can then easily generate different head poses of the face model by a 3D rotation technique after the 3D face model is constructed.
2.3.2 Construction Process
The first step to construct a 3D face model is to construct a local coordinate system. As mentioned above, to achieve this goal, the first issue is to define a rotation origin. The ideal position of the rotation origin is the center of the neck, so we propose a knowledge-based method to define its position according to the position of the eyes. Some definitions of the terms used in this section are listed first as follows:
EyeballLeft/Right.x is the x position of the center of the left/right eyeball circle;
EyeballLeft/Right.y is the y position of the center of the left/right eyeball circle;
EyeMid is the position (x, y) of the center between EyeballLeft and EyeballRight.
The green dot shown in Figure 2.5 represents the point EyeMid. After computing the position of EyeMid and making use of the FAPU d in the face model, which denotes the Euclidean distance between the two eyeballs, we can set the rotation origin and create the transformation between the global coordinate system and the local one, as shown in Figure 2.5.
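The two quantities used here, EyeMid and d, follow directly from the definitions above and can be computed as in the sketch below. The function names and the tuple representation of points are assumptions made for illustration; the rule that places the rotation origin from these quantities is not part of this sketch.

```python
import math


def eye_mid(eyeball_left, eyeball_right):
    """Center (x, y) between the two eyeball centers."""
    return (
        (eyeball_left[0] + eyeball_right[0]) / 2.0,
        (eyeball_left[1] + eyeball_right[1]) / 2.0,
    )


def eye_distance(eyeball_left, eyeball_right):
    """FAPU d: Euclidean distance between the two eyeball centers."""
    return math.hypot(
        eyeball_right[0] - eyeball_left[0],
        eyeball_right[1] - eyeball_left[1],
    )
```

For example, with hypothetical eyeball centers at (100, 120) and (160, 120), EyeMid lies halfway between them on the same horizontal line and d equals their horizontal separation.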
However, before we start the transformation, some additional points (as shown in Figure 2.6) must be defined to help draw the cartoon face in different poses, as we will describe later in Section 2.4. These points will also be transformed into the local coordinate system, so we must set up their positions before the transformation is started. The detailed method for setting up the additional points and conducting the transformation is expressed as an algorithm in the following.
Figure 2.5 Two coordinate systems. The lines drawn in black color represent the global coordinate system, and those drawn in red color represent the local one.
Figure 2.6 Points to help drawing.
Algorithm 2.1. Knowledge-based coordinate system transformation.
Input: One point EyeMid, some FAPUs, including d, EyebrowH, and EyebrowH2, and 72 model points in the global coordinate system.
Output: A rotation origin O, 17 additional points, and 72 model points in the local coordinate system.
Steps:
1. Let Wear denote the distance between the EyeMid and an ear in the face model.
2. Let xp and yp denote the x-position and the y-position of a point P in the face model.
3. Speculate a rotation origin O(xo, yo) to represent the center of the neck with
xi = x4.3a, yi = y4.3a + EyebrowH2/1.3;
the assignment. To generate the generic model, two orthogonal photographs are used, as shown in Figure 2.7. By calculating the Euclidean distance d between two eyeballs and the distance d’ between the y-position of EyeMid and the y-position of the feature point 2.2 in the front-view image, d’ can be expressed as a constant multiple of d.
Here it is measured as 1.03d in the experiment shown in Figure 2.7(a). Similarly, in the side-view image, the distance between EyeMid and the point 2.2 in the y-direction is set to the same constant multiple of d, as shown in Figure 2.7(b). By marking all of the viewable points, including the rotation origin and some of the additional points mentioned above, and computing the distance in the z-direction between the origin and each of the points in the image, the positions of the points in the z-direction can also be computed as constant multiples of d. For those points which are not viewable, their positions can also be assigned based on the symmetry of the human face. After some adjustments and experiments, the values of the points in the z-direction adopted in this study are listed in Table 2.1.
Figure 2.7 Two orthogonal photographs. (a) Front view. (b) Side view.
Table 2.1 The values of the points in the z-direction.
Category Point Value Category Point Value
Hair All hair points -0.58d 11.1 1.37d
After the two steps are done, a 3D face model is constructed. We consider d as a length unit in the face model, and we can easily change the scale and the position of a 3D face model by changing its origin and the reference d value. The scheme is useful for normalization between different faces in different scales and positions. For example, if there is a face model whose d value is a certain constant c, and we want to scale it to a larger size with the value d being another constant c’, we can just apply the geometric ratio principle to multiply the position of each point and each FAPU by a factor of c’/c.
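The rescaling described above can be sketched as follows. The sketch assumes that the model points are stored in the local coordinate system (so that scaling is performed about the rotation origin) and that points and FAPUs are kept in dictionaries; the function name and this data layout are hypothetical.

```python
def rescale_model(points, fapus, d_old, d_new):
    """Scale every point and every FAPU by the factor d_new / d_old.

    points: dict mapping a point label to its local (x, y, z) coordinates.
    fapus:  dict mapping a FAPU name to its length value.
    Assumption: coordinates are relative to the rotation origin, so a
    uniform scaling keeps the origin fixed.
    """
    if d_old <= 0:
        raise ValueError("d_old must be positive")
    factor = d_new / d_old
    scaled_points = {
        label: tuple(factor * c for c in coords)
        for label, coords in points.items()
    }
    scaled_fapus = {name: factor * value for name, value in fapus.items()}
    return scaled_points, scaled_fapus
```

With c = 60 and c’ = 90, for instance, every coordinate and every FAPU is multiplied by 1.5, so two face models of different sizes can be brought to a common d before they are compared or animated together.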
2.4 Creation of Cartoon Face
As mentioned in Section 2.2, the cartoon face is created by the corner-cutting subdivision and the cubic Bezier curve approximation methods. In this section, two types of cartoon faces are introduced, one being the frontal cartoon face and the other the oblique cartoon face. It is hoped that by the two types of cartoon faces, a head-turning talking cartoon face can be represented smoothly.
2.4.1 Creation of Frontal Cartoon Face
A frontal cartoon face is drawn by the 72 feature points and some of the additional points mentioned previously. Let O(xo, yo) denote the position of the rotation origin in the face model. Before the creation process, for each of the additional points and the 72 model points P(xp, yp), the position of P must be transformed into the global coordinate system in the following way:
$x_p = x_p + x_o$;  $y_p = y_o - y_p$.
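The transformation above can be written as a short function. This is a sketch only: the function names are hypothetical, and the inverse mapping is inferred from the forward one rather than taken from the text. The sign change in y suggests that the y axes of the two coordinate systems point in opposite directions, which is an interpretation and not a statement of the source.

```python
def local_to_global(point, origin):
    """Map a local point (xp, yp) to global coordinates.

    Implements x = xp + xo and y = yo - yp, where origin = (xo, yo)
    is the rotation origin expressed in global coordinates.
    """
    xp, yp = point
    xo, yo = origin
    return (xp + xo, yo - yp)


def global_to_local(point, origin):
    """Inverse mapping, inferred from local_to_global (an assumption)."""
    x, y = point
    xo, yo = origin
    return (x - xo, yo - y)
```

Applying global_to_local after local_to_global with the same origin returns the original point, which is a convenient consistency check when the two systems are used together.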
After the transformation, the cartoon face can be drawn in the global coordinate system. The detail of the proposed frontal face creation method is described in the following as an algorithm.
Algorithm 2.2. Creation of frontal cartoon face.
Input: 72 feature points, 17 additional points, and some FAPUs, including the radii of the eyeballs r1 and r2 in the face model.
Output: an image of the frontal cartoon face.
Steps:
1. Let arc(P1, …, Pn) denote a curve composed by the points P1, …, Pn.
Figure 2.8 An illustration of arc(P1, …, Pn).
2. Draw the contour of the hair by a polygon composed by 23 hair feature points.
3. Draw the contour of the face, including the forehead, cheek, and jaw, by the cubic Bezier curves arc(11.2, 11.2a, 11.1, 11.3a, 11.3), arc(11.3, 10.9, 2.13), arc(2.13, 2.1, 2.14), and arc(2.14, 10.10, 11.2).
4. Draw the contour of the left ear by the cubic Bezier curves arc(10.8, 10.2, A).
5. Draw the contour of the right ear in a similar way.
6. Draw the contour of the nose by the cubic Bezier curves arc(9.6, C, 9.14), arc(9.14, 9.2, 9.4), arc(9.13, 9.1, 9.5), and arc(D, 9.15, E).
7. Draw the contour of the left eyebrow by the corner-cutting subdivision curves arc(4.6, 4.4a, 4.4, 4.2) and arc(4.2, G, F, 4.6).
8. Draw the contour of the right eyebrow in a similar way.
9. Draw the contour of the left eye by the cubic Bezier curves arc(3.12, 3.2, 3.8), arc(3.8, 3.4, 3.12).
10. Draw the contour of the right eye in a similar way.
11. Draw a circle with the radius r1 and the center at EyeballLeft representative of the left eyeball.
12. Draw a circle of the right eyeball in a similar way.
13. Draw the contour of the mouth by the cubic Bezier curves arc(8.1, 8.9, 8.4), arc(8.4, 8.2, 8.3), arc(8.3, 8.10, 8.1), arc(R, 2.2, S), and arc(S, 2.3, R).
14. Fill the predefined colors into their corresponding parts.
An illustration of the steps in the creation of the frontal cartoon face is shown in Figure 2.9. An experimental result of the creation of a frontal cartoon face is shown in Figure 2.10.
Figure 2.9 An illustration of the steps in the creation of the frontal cartoon face. (a) The creation of the contour of a face. (b) The creation of the ears. (c) The creation of the nose. (d) The creation of the eyebrows. (e) The creation of the eyes. (f) The creation of the mouth.
Figure 2.10 An experimental result of the creation of a frontal cartoon face. (a) A male face model. (b) A female face model.
2.4.2 Generation of Basic Facial Expressions
After a frontal cartoon face is created, we are concerned about how to generate some basic facial expressions to make the face livelier. The Facial Action Coding System (FACS) [14] defines some basic Facial Action Units (FAUs), which represent primary movements of facial muscles in actions such as raising eyebrows, blinking, talking, etc. The FACS has been useful for describing most important facial actions, so some of the FAUs defined in it are considered suitable in this study for synthesis of facial expressions. For example, FAU 12, whose description is lip corner puller, can be viewed as a smile. FAUs 1 and 2, which respectively represent the inner and outer eyebrow raisings, are basic facial expressions that frequently happen when one is making a speech. Taking the FAUs as references, we define three basic facial expressions: eye blinking, smiling, and eyebrow raising.
For eye blinking, by changing the values of the FAPUs LeftEyeH and RightEyeH, and setting up the positions of the four model eye points 3.2, 3.4, 3.1, and 3.3 according to these two FAPUs, we can easily generate an eye blinking effect. An experimental result is shown in Figure 2.11.
Figure 2.11 An experimental result of generation of an eye blinking effect.
Similarly, by changing the positions of the two model mouth points 8.4 and 8.3 according to the FAPU UpperLipH, a smiling effect can be created. Meanwhile, by modifying the positions of the two model eye points 3.4 and 3.3 based on the FAPUs LeftEyeH and RightEyeH, a squinting effect can be combined into the cartoon face to make the smiling more vivid. An experimental result is shown in Figure 2.12.
Figure 2.12 An experimental result of generation of a smiling effect.
For eyebrow raising, all of the 8 model eyebrow points and 4 additional eyebrow points are involved. By regulating the positions of these points according to the FAPU EyebrowH, an eyebrow raising effect can be generated. An experimental result is shown in Figure 2.13.
Figure 2.13 An experimental result of generation of an eyebrow raising effect.
2.4.3 Creation of Oblique Cartoon Face
The basic idea of creation of an oblique cartoon face is to rotate the 3D face model on the three Cartesian axes in the local coordinate system. After rotation, the 3D points are projected to the X-Y plane and transformed into the global coordinate system.
Then the cartoon face can be illustrated by the previously mentioned corner-cutting subdivision and cubic Bezier curve approximation methods. In this section, a review of a 3D rotation technique is presented in Section 2.4.3.1. A simulation of eyeballs gazing at a fixed target while the head is turning is described in Section 2.4.3.2. At last, the creation process, including some methods to solve the additional problems while drawing, is described in Section 2.4.3.3.
2.4.3.1 Review of 3D Rotation Technique
Suppose that a point in a 3D space, which is denoted by (x, y, z), is rotated on the three Cartesian axes respectively, as shown in Figure 2.14.
Figure 2.14 An illustration of a point rotated on the three Cartesian axes.
We define positive angles to be representative of counter-clockwise rotations, and negative ones representative of clockwise rotations. Two basic trigonometric equations as follows are used as the 3D rotation formula:
$\sin(\theta + \beta) = \sin\theta\cos\beta + \cos\theta\sin\beta$;  $\cos(\theta + \beta) = \cos\theta\cos\beta - \sin\theta\sin\beta$.
Suppose the point is first rotated on the Y axis, so the y coordinate will not be changed. It is assumed that after projecting the point onto the X-Z plane, the distance
between the point and the origin is L, as shown in Figure 2.15.
Figure 2.15 An illustration of a point rotated on the Y axis.
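Then, with θ denoting the angle of the projected point in the X-Z plane and β denoting the rotation angle, the coordinates of the point before and after the rotation can be expressed in terms of L by the two equations above. After canceling L, the formula for the point rotated on the Y axis is obtained, which gives the position of the point after the rotation is performed.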
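The derivation for the rotation on the Y axis can be checked numerically with the sketch below. It rests on an assumed convention that is not confirmed by the text: the projected point is written as x = L cos θ, z = L sin θ, and a positive β increases θ. Under this assumption, the angle-sum identities give x' = x cos β - z sin β and z' = x sin β + z cos β, with y unchanged. The function names are hypothetical, and the signs may differ from those of the original formula if the axes in Figure 2.15 are oriented differently.

```python
import math


def rotate_about_y(point, beta):
    """Rotate (x, y, z) on the Y axis by beta radians.

    Assumed convention: x = L*cos(theta), z = L*sin(theta), and a
    positive beta increases theta. The y coordinate is unchanged.
    """
    x, y, z = point
    c, s = math.cos(beta), math.sin(beta)
    return (x * c - z * s, y, x * s + z * c)


def rotate_via_angle_sum(point, beta):
    """Same rotation computed from L and theta, before L is canceled."""
    x, y, z = point
    length = math.hypot(x, z)   # distance L in the X-Z plane
    theta = math.atan2(z, x)    # angle of the projected point
    return (length * math.cos(theta + beta), y, length * math.sin(theta + beta))


def project_to_xy(point):
    """Orthographic projection onto the X-Y plane (drop z)."""
    return (point[0], point[1])
```

The two functions agree up to rounding error for any point and angle, which illustrates why L can be canceled. Rotations on the X and Z axes follow the same pattern with the roles of the coordinates exchanged.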
2.4.3.2 Simulation of Eyeballs Gazing at a Fixed Target
The basic idea of simulating the eyeballs gazing at a fixed target is to set up a point representing the focus of the eyes in the local coordinate system of the face model.
By speculating the radius of the eyeball, the position of the eyeball center can be computed from the positions of the pupil and the focus. Then, for every rotation performed in the creation process, the new position of the eyeball center is also calculated, and the new position of the pupil can be computed from the positions of the eyeball center and the focus. In this study, the speculated radius of the eyeball is set to be 0.3d, and the position of the focus is (EyeMid.x, EyeMid.y, 15d). An illustration of the focus and eyeballs is shown in Figure 2.16.
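One way to read the computation above is sketched below. It assumes that the pupil lies on the eyeball sphere along the line from the eyeball center to the focus, so the center is found by stepping one radius back from the pupil, and the pupil is found by stepping one radius from the center toward the focus. This geometric interpretation and the function names are assumptions made for illustration.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def eyeball_center(pupil, focus, radius):
    """Eyeball center: one radius behind the pupil, away from the focus."""
    direction = _unit(tuple(f - p for f, p in zip(focus, pupil)))
    return tuple(p - radius * u for p, u in zip(pupil, direction))


def pupil_position(center, focus, radius):
    """Pupil: one radius from the eyeball center toward the focus."""
    direction = _unit(tuple(f - c for f, c in zip(focus, center)))
    return tuple(c + radius * u for c, u in zip(center, direction))
```

In use, the eyeball center would be computed once from the frontal pose with radius 0.3d and the focus at (EyeMid.x, EyeMid.y, 15d), rotated together with the other model points, and then passed to pupil_position with the unrotated focus, so that the pupil keeps pointing at the fixed target while the head turns.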