Dissertation Organization - Synthesis of 2D Character Animation Based on Statistical Approaches

Chapter 1 Introduction

1.6 Dissertation Organization

The remainder of this dissertation is organized as follows. Chapter 2 reviews the related literature on animating characters in still pictures. Chapter 3 summarizes the statistical approaches that we employ to animate 2D characters. Chapter 4 uses a two-scale image abstraction to eliminate redundant information in arbitrary pictures. Chapter 5 describes how to apply nonparametric regression to generate a novel view of a character. Chapter 6 deals with expressive talking face simulation. Chapter 7 infers limb movements by integrating Bayesian inference and time series with the regression model. Chapter 8 demonstrates how to apply the proposed statistical approaches to animate passive elements for simulating natural phenomena. Finally, Chapter 9 concludes this dissertation by summarizing our contributions and suggesting future research directions.

Chapter 2 Literature Review

In this dissertation, 2D character animation involves novel view generation, expressive talking face simulation, and limb movement synthesis. Because many research areas are relevant to these tasks, the following sections briefly review the techniques for 2D character animation.

2.1 Image Morphing

In the image morphing community, several studies [28, 48] on shape blending have been conducted to animate characters from still pictures. For example, Sederberg and Greenwood [55] employed a scheme that interpolates the edge lengths and angles between two keyframes. Furthermore, several methods [70] have extracted properties of the given key-poses and used them to generate characters’ motions. Xu et al. [74] synthesized an animal’s motion by inferring its motion cycle, represented as ordered motion snapshots. By morphing among the ordered poses and refining the appearance of the in-betweens, an animal could be animated. Chuang et al. [18] adopted a wavelet curve descriptor combined with Lagrangian dynamics to implement animation by image morphing. The wavelet coefficients represent the shapes in the images at different resolutions, and the Lagrangian dynamic equation is applied to simulate periodic motions. They utilized non-self-intersecting contour morphing to generate in-betweens that produce motion of a similar nature. Shutler and Nixon [61] derived Zernike velocity moments from video of a character’s locomotion and then used them to reconstruct the silhouette of an occluded character’s locomotion with smooth transitions. In contrast, our method employs only the correspondence of a character across several still images to synthesize the character’s motion.

Besides, several studies [4, 52] on character motion synthesis have used RBFs for image morphing. An RBF approximation is a weighted sum of translations of a radially symmetric basis function augmented by a polynomial term. It is suitable for fitting smooth functions and can be used to warp facial expressions and animate images or drawings [2, 36]. However, a circular Gaussian is not an appropriate choice for fitting noncircular structures. In this dissertation, we therefore adopt ERBFs instead of RBFs to fit the contours of characters. An ERBF retains RBF-like smoothness and is applicable to more general shapes than an RBF. Nonlinear approximation of functions in certain general spaces with ERBF networks (referred to as elliptic basis function networks in [47]) has been proposed. Furthermore, a volumetric approximation and visualization system was developed with ellipsoidal Gaussian functions for 3D volumes (referred to as ellipsoidal basis functions in [33]).
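As a reference point for the RBF description above, the standard form of an RBF approximation with a Gaussian basis can be written as follows. The symbols $s$, $w_j$, $\mathbf{c}_j$, $\phi$, $p$, and $\sigma$ are notation assumed here for illustration; they are not taken from the dissertation.

```latex
s(\mathbf{x}) = \sum_{j=1}^{k} w_j \, \phi\!\left(\lVert \mathbf{x} - \mathbf{c}_j \rVert\right) + p(\mathbf{x}),
\qquad
\phi(d) = \exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right)
```

Because $\phi$ depends only on the distance to the center $\mathbf{c}_j$, its level sets are circles, which is why a single circular Gaussian fits elongated, noncircular structures poorly.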

DeJuan and Bodenheimer [19] synthesized the in-between contours and textures of a character from two given keyframes of an animation by using RBF interpolation and elastic registration. The contour points and the corresponding normals of a character in a keyframe were used in the RBF method to interpolate an implicit surface. A 3D mesh describing this surface was then generated and sliced in the middle to obtain the in-between contours. The in-between textures were synthesized by elastic registration. In contrast, our approach fits contours with an ERBF kernel directly in image space. As mentioned above, an ERBF retains RBF-like smoothness and is suitable for more general shapes than an RBF. Besides, the in-between textures they created would be distorted for complex patterns made up of a few solid colors, whereas the LOESS we use preserves the details without undesired distortion.

2.2 Shape Deformation

Research on purely image-based animation has recently been carried out based on the shape deformation of a single image. Skeleton-based techniques [24, 76] deform shapes by manipulating the space in which they are embedded. These techniques are computationally efficient and easy to implement. However, they do not provide convenient or meaningful interaction tools for the user; in particular, tuning the weights for rigging is a painful process.

Besides, shape matching techniques have been applied to shape deformation. Wang et al. [33] utilized uniform grids for 2D shapes and maintained the rigidity of each square in the grid by using shape matching during deformation. They implemented a pure rotational transformation for each square. Note that the global area cannot be preserved.

Botsch and Sorkine [7] deformed a 2D shape by discretizing the shape into finite elements. However, the computation time was dominated by the complexity of the discretization, and not by the intrinsic complexity of the shape itself.

Furthermore, Alexa et al. [1] argued that the shape deformation of an image should be as rigid as possible, so that the deformation minimizes the amount of local scaling and shearing. Igarashi et al. [31] triangulated the input image and minimized the distortion of these triangles during deformation by solving a linear system of equations. Schaefer et al. [53] proposed a rigid transformation method based on moving least squares. Their study concentrated on specifying deformations with user-specified handles. To generate an animation, users needed to set the next pose by manipulating control vertices, and the method then deformed the entire image plane. Because the method ignored the geometry of the shape and moving least squares has a limited local influence, unnatural distortions or serious artifacts would be generated when the range of the controlling handles was exceeded. Weber et al. [68] generalized the concept of barycentric coordinates and provided a few examples of known coordinates that could be used for planar shape deformation. Note that the inputs of these works are images and the outputs are edited and deformed images. In comparison, our input is just an image and the output is the whole sequence of interpolated frames.

2.3 Image Interpolation

Image-based animation has recently been studied in the computer vision community [27, 34, 35, 49, 62, 65, 73], where optical flow techniques are widely adopted for image interpolation. Baker et al. [3] created a collection of optical flow datasets with ground truth and measured both the flow accuracy and the interpolation quality of optical flow algorithms adopted for image interpolation. However, the primary focus of these optical flow algorithms was on the flow itself, and ghosting and blurring artifacts were visible in the interpolated images even when the errors in the flows were minor.

Mahajan et al. [39] proposed an inverse optical flow method. They traced out the path of each pixel between two given images, and then obtained each pixel in the interpolated frame by moving gradients along the corresponding path and applying Poisson reconstruction. Note that they need to determine the flow of each pixel to construct the path framework. Since these optical flow techniques are based on the disparity between two given images, most of them can only handle two similar images, that is, images between which the disparity or motion is limited.

2.4 View Interpolation

Several approaches [15, 25, 66] for view interpolation could be applied to generate 2D character animation. Seitz and Dyer [56] proposed a method known as view morphing. The input images were prewarped by using the fundamental matrix, which was either computed from image points with computer vision techniques or predefined, so that the images were transformed onto the same plane with their scan lines aligned. The two views were then morphed, and the interpolated images were postwarped with user-specified parameters to achieve better morphing quality. However, the quality depended on the number of line correspondences specified by users.

2.5 Expression and Viseme Synthesis

Synthesizing a natural expression or viseme of a character from still pictures is a critical issue for 2D character animation. Chuang and Bregler [17] proposed an audio-driven synthesis technique for creating expressive facial animation by extracting information from the expression axis of a speech performance. A statistical model based on principal component analysis was learned from video to factor expression and visual speech. With this analysis of the facial expression, the facial motion could be more effectively retargeted to another 3D face model.

Moreover, there is a strong correlation between lip movement and speech [40], and a great number of studies have been conducted on facial animation involving lip-synching (short for lip synchronization). There have been multiple attempts at generating an animated face that realistically matches some given speech [6, 8, 20, 21]. Incorporating speech therefore seems crucial to the generation of true-to-life animated faces. Our synthetic faces of the character are also driven by input speech. We reproduce small variations in facial expressions that convey the affective states, moods, and personality of the character. Furthermore, a strong interrelation between facial gestures and prosodic features has been reported in the speech processing literature [10, 11]. However, the interrelation between facial gestures and individual phonemes is not obvious. Our main focus is to synthesize facial animation that can be driven by phonemes analyzed from the input speech.

2.6 Motion Capture

Motion capture technology has enabled users to accumulate large databases of human motion, which makes the construction of empirical motion models feasible. In this technique, the joint angles of a performing actor are recorded via sensors, and these values are then used to create a character’s motion [41]. A great deal of research has aimed at adapting the motion to different constraints while preserving the style of the original motion. Witkin and Popovic [69] developed a method in which the motion capture data was warped between keyframe-like constraints set by the animator. Warping was done by overlapping and blending motion clips. Rose et al. [50] developed a method that used RBFs and low-order polynomials to interpolate new motions between example motions obtained from motion capture while maintaining inverse kinematic constraints.

As mentioned previously, Hornung et al. [30] animated photographed persons by projecting them onto 3D motion data. However, their method required extra 3D information, including a 3D motion database and the determination of the corresponding model pose, which added overhead unrelated to image reanimation itself. Although the method could be applied to animate arbitrary characters from 2D images, their system did not work for motions in which the character changed its moving direction or turned its head. In this dissertation, the proposed time series scheme based on a nonparametric Bayesian approach does not have this limitation.

2.7 Time Series

In this dissertation, time series analysis is proposed to synthesize a character’s motion. Time series have been widely applied in statistics to forecast trends in finance and marketing [22, 60]. They have also been used in control systems, pattern recognition, and artificial intelligence [5, 46]. In computer graphics, they are adopted for aging trajectory prediction and character motion synthesis. For example, Scherbaum et al. [54] applied aging prediction to images of faces with 3D model reconstruction and support vector regression based on an RBF kernel. Chai and Hodgins [12] generated animations from various user-defined constraints. Their system learned a state space model from motion capture data. This state space model was based on a deformed linear time series model and was constructed from the concept of the autoregressive model. They cast constraint-based motion synthesis as a maximum a posteriori (MAP) problem and developed an optimization framework that generated natural motion.
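To make the autoregressive idea mentioned above concrete, the following minimal sketch forecasts a one-dimensional trajectory (for example, a joint angle) with an AR(p) model. The function name `ar_forecast` and the coefficients are illustrative assumptions and are not taken from the cited systems; noise is omitted, so only the mean prediction is produced.

```python
def ar_forecast(history, coeffs, steps):
    """Extend `history` by `steps` values using x_t = sum_i coeffs[i] * x_{t-1-i}.

    Hypothetical helper for illustration only: it gives the deterministic
    mean prediction of an AR(p) model with p = len(coeffs).
    """
    if len(history) < len(coeffs):
        raise ValueError("need at least p past values")
    series = list(history)
    for _ in range(steps):
        # The most recent value pairs with coeffs[0], the one before with coeffs[1], ...
        recent = series[::-1][:len(coeffs)]
        series.append(sum(c * x for c, x in zip(coeffs, recent)))
    return series[len(history):]


# Assumed coefficients (2, -1) continue the last linear trend: x_t = 2*x_{t-1} - x_{t-2}.
predicted = ar_forecast([0.0, 1.0], coeffs=(2.0, -1.0), steps=3)
print(predicted)
```

In a motion synthesis setting, the coefficients would be estimated from example trajectories rather than fixed by hand as they are here.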

Furthermore, variants of hidden Markov models (HMMs) [4, 11] have been widely used to create time series data of motion trajectories representing a character’s motion. HMMs learned from human motion data have been employed to interpolate keyframes and synthesize new styles of motion. However, these statistical schemes required full information about a character’s motion to train the initial statistical model; for example, a large motion capture database of the human body or a large amount of user intervention for specifying constraints was necessary. Our proposed approach learns a statistical dynamic model based on time series, and the dynamic behavior of the model is predicted by Bayesian inference. More significantly, in contrast to previous methods, the proposed model allows the user to animate a character smoothly without additionally specified motion information.

Chapter 3 Statistical Approaches

In this chapter, we briefly introduce the statistical approaches that we use for 2D character animation. We focus on a nonparametric regression model trained from key-poses in still images. Kernel regression with ERBFs is employed to train the model, which represents the displacements of contours and fits the contours of the deformed shape of a character. LOESS is adopted to fill in the color and texture information obtained from the original character in the given image. Accordingly, Section 3.1 introduces ERBFs and develops kernel regression with ERBFs for regression prediction and analysis, and Section 3.2 describes LOESS. For animating multiple limbs of a character simultaneously, Bayesian inference with RJMCMC is proposed to find the parameters that suit the regression model; the sampling procedure of RJMCMC is outlined in Section 3.3. Furthermore, as mentioned previously, time series analysis is applied to predict the motion trajectories of the limbs, and Section 3.4 briefly describes the time series model.

3.1 Kernel Regression with Elliptic Radial Basis Functions

As mentioned before, researchers have presented image morphing techniques that use an RBF as the kernel. The RBF kernel is popular for interpolating scattered data. It is suitable for fitting smooth functions of the data and has been used to warp facial expressions and animate images or drawings.

However, an RBF is based on a spatially limited circular Gaussian distribution function. It is therefore limited in fitting data on long or high-gradient shapes, such as cylindrical shapes or the body and head of a character. Because the radius can extend only to the nearest boundary of the region, numerous small RBFs may be required to fit one long shape such as the body or the head of the character.

Therefore, we use ERBFs instead of RBFs. An ERBF retains RBF-like smoothness and is applicable to more general shapes than an RBF. The kernel regression model with ERBFs is trained to predict the deformed shape of the character in image space.

There are two kinds of ERBFs: axis-aligned and arbitrary directional. Figure 3.1 compares these two basis functions on a long diagonal data distribution (pixels along contours), with the influence of each basis function overlaid on the data. The data are approximated by an axis-aligned ERBF in Figure 3.1 (a) and by an arbitrary directional ERBF in Figure 3.1 (b). The major axis of the ellipse of the arbitrary directional ERBF is aligned along the contour of the character, which forms the long diagonal data distribution (gray region). To achieve higher accuracy with a smaller number of basis functions, arbitrary directional ERBFs are applied to fit the contours of a character in a still picture.

In general, an arbitrary directional ERBF is defined for a vector of pre-sampled data and the center vector of an elliptic Gaussian, and it can be represented in matrix form as follows:

$$A = \begin{bmatrix} a\cos\theta_i & a\sin\theta_i \\ -a\sin\theta_i & a\cos\theta_i \end{bmatrix}, \quad \text{where } i \in \{x, y\} \tag{3.4}$$

Figure 3.1: Comparison of ERBFs. (a) Axis-aligned ERBF. (b) Arbitrary directional ERBF. The influence range of each basis function is shown as blue arrows and a black curve.

Figure 3.2: Schematic diagram of an arbitrary directional elliptic radial basis function (ERBF).

Here, a covariance of the Gaussian is defined along each i-axis. The orientation $\theta_i$ (the angle between the major axis of the ellipse and the i-axis) and the aspect ratio $a$ are used to transform the basis function into an arbitrary directional ERBF, as shown in Figure 3.2. Moreover, the transformation matrix $A$, which contains a rotation and a scaling component, is applied for alignment along the data distribution. In our work, the major axis of the ellipse is aligned along the contour of the character, as shown in Figure 3.1 (b). The mathematical details of Equation (3.1) can be derived from a hyper radial basis function (HRBF).
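The difference between an axis-aligned and an arbitrary directional ERBF can be illustrated with a small sketch. It is not the dissertation's formulation: the function name `erbf`, the per-axis spreads `sigma_major` and `sigma_minor`, and the single angle `theta` are assumptions made for illustration, using a rotation followed by per-axis scaling inside a Gaussian.

```python
import math


def erbf(point, center, sigma_major, sigma_minor, theta=0.0):
    """Elliptic Gaussian evaluated at a 2D point (illustrative names only).

    theta is the angle in radians from the x-axis to the ellipse's major axis;
    theta = 0 gives an axis-aligned ERBF.
    """
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Rotate the offset into the ellipse's own frame (major axis, minor axis).
    along = math.cos(theta) * dx + math.sin(theta) * dy
    across = -math.sin(theta) * dx + math.cos(theta) * dy
    # Scale each axis by its own spread, then apply the Gaussian.
    return math.exp(-0.5 * ((along / sigma_major) ** 2 + (across / sigma_minor) ** 2))


# A sample lying on a diagonal contour, two units away from the center.
sample = (math.sqrt(2.0), math.sqrt(2.0))
axis_aligned = erbf(sample, (0.0, 0.0), sigma_major=2.0, sigma_minor=0.5)
directional = erbf(sample, (0.0, 0.0), sigma_major=2.0, sigma_minor=0.5, theta=math.pi / 4)
print(f"axis-aligned: {axis_aligned:.4f}, directional: {directional:.4f}")
```

With the major axis rotated onto the diagonal, the same basis function responds far more strongly to the sample, which mirrors the argument that fewer directional ERBFs are needed to cover a long diagonal contour.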

An HRBF is computed by using the Mahalanobis distance [29] in matrix form, in which the scale term is the covariance of the multidimensional Gaussian rather than a single variance. An HRBF differs from a standard RBF insofar as each axis of the input space $\chi \subseteq \ell_2^N$ (the space of square summable sequences of length N) has a separate smoothing parameter, i.e., a separate scale onto which the differences on this axis are viewed. It is worth mentioning that RBF kernels map the input space onto the surface of an infinite-dimensional hypersphere. Note that N = 2 in the arbitrary directional ERBF kernel, which represents the analysis of the data distribution along the major axis and the minor axis of an ellipse. Equation (3.1) is constructed along the orientation of the arbitrary directional ERBF (its major and minor axes).
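For reference, the standard Mahalanobis distance and a Gaussian kernel built on it can be written as below. The symbols $\mathbf{u}$, $\boldsymbol{\mu}$, $\Sigma$, and $\phi$ are assumed notation for the data vector, the center vector, the covariance matrix, and the kernel; the dissertation's own symbols may differ.

```latex
d_M(\mathbf{u}, \boldsymbol{\mu}) = \sqrt{(\mathbf{u} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{u} - \boldsymbol{\mu})},
\qquad
\phi(\mathbf{u}) = \exp\!\left(-\tfrac{1}{2}\, d_M(\mathbf{u}, \boldsymbol{\mu})^{2}\right)
```

With a diagonal $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$, each axis receives its own smoothing parameter. For $N = 2$ and a rotated $\Sigma$, the kernel becomes an elliptic Gaussian with an arbitrary orientation.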

In this dissertation, we formulate the problem of 2D character animation as regression analysis. Given two key-poses of a character in still images, we analyze the contours of the character and represent the displacements of these contours as ERBFs. These ERBFs construct an implicit regression surface that can be used to predict the new positions after the shape of the character is deformed. In other words, we form a regression model trained from the given key-poses and adopt it to predict the motion of the character. We now derive the equation of kernel regression with ERBFs.

In general, the relationship between the response $r$ and the predictor $u$ can be described as

$$r = f(u) + \varepsilon, \qquad \varepsilon \sim N(0, \tau^2) \tag{3.6}$$

In the above equation, $f(\cdot)$ denotes an unknown smooth surface indicating the relationship between $r$ and $u$, commonly termed the regression surface, which here represents the shape deformation. Additionally, the error $\varepsilon$ is assumed to follow the normal distribution $N(0, \tau^2)$ in Equation (3.6), where $\tau^2$ denotes the noise variance. The regression surface is estimated by kernel regression with ERBFs, because an ERBF is an appropriate choice for fitting the smooth function $f(\cdot)$.

As mentioned above, each elliptic Gaussian (ERBF) has a center vector. The proposed regression model consists of a radial component and an affine component. The radial component is developed as a linear combination of a set of basis functions and their corresponding coefficients: the j-th elliptic Gaussian has a suitably chosen coefficient, a related center vector, and a relative covariance along each i-axis, and k is the number of basis functions in the model. The radial component is chosen as an arbitrary directional ERBF, and T(.) represents the affine component. In our work, we further train the model to predict the motion of the character by synthesizing the contours of the character’s deformed shape.
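Under the description above, the regression surface plausibly takes the following form. This is a sketch under assumed notation: $\mathbf{u}$ stands for the predictor, and $w_j$, $\phi$, $\boldsymbol{\mu}_j$, and $\sigma_{j,i}$ stand for the coefficients, the arbitrary directional ERBF, its center vectors, and its per-axis covariances. Only $f(\cdot)$, $T(\cdot)$, $j$, and $k$ appear explicitly in the text.

```latex
f(\mathbf{u}) = \sum_{j=1}^{k} w_j \, \phi\!\left(\mathbf{u};\, \boldsymbol{\mu}_j, \sigma_{j,x}, \sigma_{j,y}\right) + T(\mathbf{u})
```

The sum is the radial component, a linear combination of $k$ elliptic Gaussians, and $T(\cdot)$ is the affine component that captures the global, non-local part of the deformation.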

After synthesizing the contours of the character’s deformed shape, we need to fill the contours while preserving details. This step is also motivated by the traditional 2D animation production process, in which a similar issue arises when the line art is scanned and passed on to the ink-and-paint step. Hence, a local-fitting
