Chapter 2 Literature Review
2.7 Time Series
In this dissertation, time series analysis is proposed to synthesize a character’s motion.
Time series analysis has been widely applied in statistics to forecast trends in finance and marketing [22, 60]. It has also been used in control systems, pattern recognition, and artificial intelligence [5, 46]. In computer graphics, it has been adopted for aging trajectory prediction and character motion synthesis. For example, Scherbaum et al. [54] applied aging prediction to images of faces using 3D model reconstruction and support vector regression with an RBF kernel. Cai and Hodgins [12] generated animations from various user-defined constraints. Their system learned a state space model from motion capture data. This state space model was based on a deformed linear time series model and was constructed from the concept of an autoregressive model. They cast constraint-based motion synthesis as a maximum a posteriori (MAP) problem and developed an optimization framework that generated natural motion.
Furthermore, variants of hidden Markov models (HMMs) [4, 11] have been widely used to create time series data of motion trajectories representing a character’s motion. HMMs learned from human motion data have been employed to interpolate key frames and to synthesize new styles of motion. However, these statistical schemes required full information about a character’s motion to train the initial statistical model.
For example, a large motion capture database of the human body or a large amount of user intervention for constraints was necessary. Our proposed approach learns a statistical dynamic model based on time series. Moreover, the dynamic behavior of the proposed model is predicted by Bayesian inference. More significantly, in contrast to previous methods, the proposed model allows the user to animate a character smoothly without additionally specified motion information.
Chapter 3 Statistical Approaches
In this chapter, we briefly introduce the statistical approaches that we use for 2D character animation. We focus on the nonparametric regression model trained from key-poses of still images. Kernel regression with ERBFs is employed to train the model to represent the displacements of contours and to fit the contours of the deformed shape of a character. LOESS is adopted to fill in the color and texture information obtained from the original character in the given image. Thus, we first introduce ERBFs in Section 3.1, where kernel regression with ERBFs is developed for regression prediction and analysis. LOESS is then described in Section 3.2. In addition, for animating multiple limbs of a character simultaneously, Bayesian inference with RJMCMC is proposed to find parameters that suit the regression model. The sampling procedure of RJMCMC is outlined in Section 3.3. Furthermore, as mentioned previously, time series analysis is applied to predict the motion trajectory of the limbs. Hence, we briefly describe the time series model in Section 3.4.
3.1 Kernel Regression with Elliptic Radial Basis Functions
As mentioned before, researchers have presented image morphing techniques that use RBFs as the kernel. The RBF kernel is popular for interpolating scattered data. It is suitable for fitting smooth functions to the data and has been used to warp facial expressions and to animate images or drawings.
However, an RBF is based on a spatially limited, circular Gaussian distribution function. It is therefore limited in fitting data on long or high-gradient shapes, such as cylindrical shapes or the body and head of a character. The radius may reach only as far as the shortest boundary of the area, so numerous small RBFs may be required to fit one long shape such as the body or the head of the character.
Therefore, we use ERBFs instead of RBFs. Note that ERBF has the advantage of RBF-like smoothness and is applicable to more general shapes than RBF. The kernel regression model with ERBFs is trained for the prediction of the deformed character’s shape in image space.
Note that there are two kinds of ERBFs: axis-aligned and arbitrary directional ERBFs. A comparison of these two basis functions is shown in Figure 3.1. The figure shows a long diagonal data distribution (pixels along contours), with the influence of each basis function overlaid on the data. The data are approximated by two basis functions: an axis-aligned ERBF, shown in Figure 3.1 (a), and an arbitrary directional ERBF, shown in Figure 3.1 (b). The major axis of the ellipse of an arbitrary directional ERBF is aligned along the contour of a character, which is a long diagonal data distribution (gray region). To achieve higher accuracy with a smaller number of basis functions, arbitrary directional ERBFs are applied to fit the contours of a character in a still picture.
In general, let $u$ be a vector of the pre-sampled data and $v$ be a center vector of an elliptic Gaussian. An arbitrary directional ERBF can be represented in a matrix form as follows:
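[Equation (3.4): arbitrary directional ERBF in matrix form, defined through the transformation matrix $A$ with the rotation $\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$ and the axis index $i \in \{x, y\}$.]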
Figure 3.1: Comparison of ERBFs. (a) Axis aligned ERBF. (b) Arbitrary directional ERBF. The influence range of each basis function is shown as blue arrows and black curve.
Figure 3.2: Schematic diagram of an arbitrary directional elliptic radial basis function (ERBF).
where $\sigma_i^2$ is the covariance of the Gaussian along the $i$-axis. The orientation $\theta$ (the angle between the major axis of the ellipse and the $i$-axis) and the aspect ratio are used to obtain an arbitrary directional ERBF, as shown in Figure 3.2. Moreover, the transformation matrix $A$, which contains a rotation and a scaling component, is applied for alignment along the data distribution. In our work, the major axis of the ellipse is aligned along the contour of the character, as shown in Figure 3.1 (b). The mathematical details of Equation (3.1) can be derived from a hyper radial basis function (HRBF).
HRBF is computed by using the Mahalanobis distance [29], which is defined in the matrix form as follows:
where $\sigma^2$ should be the covariance of the multidimensional Gaussian rather than a single variance. An HRBF differs from a standard RBF insofar as each axis of the input space $\chi \subseteq \ell_2^N$ (the space of square-summable sequences of length N) has a separate smoothing parameter, i.e., a separate scale on which the differences along that axis are viewed. It is worth mentioning that RBF kernels map the input space onto the surface of an infinite-dimensional hypersphere. Note that N = 2 in the arbitrary directional ERBF kernel represents the analysis of the data distribution along the major axis and the minor axis of an ellipse.
Equation (3.1) is constructed along the orientation of the arbitrary directional ERBF (the major axis and the minor axis).
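As a rough illustration of the idea above, the following Python sketch evaluates an elliptic Gaussian basis function through a Mahalanobis-style distance. The parameterisation (standard deviations `sigma_major` and `sigma_minor` along the two ellipse axes and a rotation angle `theta`) and all function names are assumptions made for this example; they are not taken from the dissertation's equations.

```python
import math


def erbf(point, center, sigma_major, sigma_minor, theta):
    """Elliptic Gaussian basis value at `point` (hypothetical parameterisation).

    The offset from `center` is rotated by `theta` into the ellipse frame and
    each axis is scaled by its own standard deviation. This amounts to a
    squared Mahalanobis distance with a rotated diagonal covariance.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Rotate the offset so that the major axis lies along the first coordinate.
    along = math.cos(theta) * dx + math.sin(theta) * dy
    across = -math.sin(theta) * dx + math.cos(theta) * dy
    dist_sq = (along / sigma_major) ** 2 + (across / sigma_minor) ** 2
    return math.exp(-0.5 * dist_sq)


def rbf(point, center, sigma):
    """Circular Gaussian RBF: the special case with equal axis scales."""
    return erbf(point, center, sigma, sigma, 0.0)


if __name__ == "__main__":
    center = (0.0, 0.0)
    theta = math.pi / 4  # major axis along the diagonal
    # A point on the diagonal keeps a high response; a point at the same
    # Euclidean distance across the diagonal is almost ignored.
    print(erbf((2.0, 2.0), center, 4.0, 0.5, theta))
    print(erbf((2.0, -2.0), center, 4.0, 0.5, theta))
```

With a single elongated basis function, points along a diagonal contour stay within its influence, whereas a circular RBF of comparable width across the contour would need many copies to cover the same length.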
In this dissertation, we formulate the problem of 2D character animation as a regression analysis. Given two key-poses of a character in still images, we analyze the contours of the character and represent the displacements of these contours as ERBFs.
These ERBFs, which construct an implicit regression surface, can then be used to predict the new positions after the shape of the character is deformed. In other words, we form a regression model trained from the given key-poses, and the model is used to predict the motion of the character. We now derive the equation of kernel regression with ERBFs.
In general, the relationship between the response $r$ and the predictor $u$ can be described as
$$r = f(u) + \varepsilon \qquad (3.6)$$
In Equation (3.6), $f(\cdot)$ denotes an unknown, smooth surface describing the relationship between $r$ and $u$, commonly termed the regression surface, which here represents the shape deformation. Additionally, the error $\varepsilon$ is assumed to come from a Normal distribution $N(0, \tau^2)$, where $\tau^2$ denotes the noise variance.
Note that the regression surface is estimated by using kernel regression with ERBFs. The ERBF is an appropriate choice for the form of $f(\cdot)$ because it fits smooth functions.
As mentioned above, $v$ denotes the center vector of an elliptic Gaussian (ERBF). The proposed regression model consists of a radial component and an affine component.
The radial component is developed as a linear combination of a set of basis functions and their corresponding coefficients (Equation (3.7)). In this model, the j-th elliptic Gaussian has a suitably chosen coefficient and a related center vector $v_j$, and $k$ is the number of basis functions in the model. Each elliptic Gaussian also has its own covariance along an arbitrary $i$-axis.
The radial component is chosen as an arbitrary directional ERBF, and $T(\cdot)$ represents the affine component. In our work, we further train the model to predict the motion of the character by synthesizing the contours of the character’s deformed shape.
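The structure of this model (a weighted sum of elliptic Gaussians plus an affine term) can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple layout of each basis function, the scalar response, and the names `elliptic_gaussian` and `predict` are hypothetical, and the coefficients are supplied by hand rather than estimated.

```python
import math


def elliptic_gaussian(u, center, sigma_major, sigma_minor, theta):
    """Arbitrary directional elliptic Gaussian (assumed parameterisation)."""
    dx, dy = u[0] - center[0], u[1] - center[1]
    along = math.cos(theta) * dx + math.sin(theta) * dy
    across = -math.sin(theta) * dx + math.cos(theta) * dy
    return math.exp(-0.5 * ((along / sigma_major) ** 2 + (across / sigma_minor) ** 2))


def predict(u, bases, affine):
    """Evaluate radial component + affine component at the 2D point `u`.

    bases  : list of (coefficient, center, sigma_major, sigma_minor, theta)
    affine : (a0, ax, ay), giving the affine term a0 + ax * x + ay * y
    """
    radial = sum(
        coef * elliptic_gaussian(u, center, s_major, s_minor, theta)
        for coef, center, s_major, s_minor, theta in bases
    )
    a0, ax, ay = affine
    return radial + a0 + ax * u[0] + ay * u[1]


if __name__ == "__main__":
    # k = 2 basis functions with hand-picked (not fitted) parameters.
    bases = [
        (2.0, (1.0, 1.0), 3.0, 1.0, 0.5),
        (-1.0, (4.0, 2.0), 2.0, 0.5, 1.2),
    ]
    print(predict((1.0, 1.0), bases, (0.5, 1.0, -1.0)))
```

In practice one such scalar model would be needed per displacement coordinate, and the coefficients would come from the estimation procedure rather than being fixed by hand.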
After synthesizing the contours of the character’s deformed shape, we need to fill the contours and preserve details simultaneously. This is also motivated by the process of traditional 2D animation production, where a similar issue occurs when the line art is scanned and passed to the next step of ink and paint. Hence, a local-fitting methodology called LOESS is applied to preserve the details or features of characters (that is, to fill in the color and texture information obtained from the original character in the given image). Like kernel regression, LOESS is a procedure for fitting a regression surface to data through multivariate smoothing. LOESS uses the data from the neighborhood around a specific location. In other words, LOESS performs a linear regression on points in the data set, which are weighted by a kernel centered at that pre-defined location. The fit is much more strongly influenced by the data points that lie close to the pre-defined location according to some scaled Euclidean distance metric.
This is achieved by weighting each data point according to its distance to the pre-defined location: a point very close to it is given a weight of one and a point far away is given a weight of zero. Note that the shape of the kernel is a design parameter for which many possible choices exist. The original LOESS uses the tri-cube weighting function. Nonetheless, we have used the Gaussian kernel to estimate the weights in the range of unit circle, as shown in Figure 3.3 (b).
During a LOESS prediction, the specific location (red dot) $x_0$, which is to be filled with color or texture information, is supplied. LOESS performs a linear regression on the sampled contour points weighted by a kernel centered at $x_0$. Given m pairs of points sampled along the contour (purple dots) of the character in the input image and the corresponding new locations of these points, the weight of the i-th sampled contour point $x_i$ with the Gaussian kernel is
$$w_i(x_0) = \exp\left(-s \, \lVert x_i - x_0 \rVert^2\right), \quad 1 \le i \le m, \quad s = \frac{1}{2 W_{kernel}^2},$$
where $s$ is the parameter that determines how quickly the weights decline in value as one moves away from $x_0$, and $W_{kernel}$ is the kernel width or bandwidth, which controls the amount of localness in the regression.
Let $x_i$ be the predictor of the regression and $y_i$ be the response. The regression function is specified by using an estimated local multivariate polynomial as follows:
$$\hat{y}_i = \zeta_1 t_1(x_i) + \zeta_2 t_2(x_i) + \cdots + \zeta_M t_M(x_i), \qquad (3.9)$$
where $t_j(\cdot)$ is a function that produces the j-th term in the polynomial, and $\zeta_j$ is the j-th coefficient to be estimated. Equation (3.9) can be rewritten for matrix manipulation, which can be easily extended to datasets with many inputs:
Figure 3.3: LOESS analysis. (a) Original image with a uniform grid. (b) The zoom-in view of the image. LOESS with Gaussian kernel is applied to estimate the weights.
$$\hat{y}_i = \zeta^{T} t_i, \qquad (3.10)$$
where $\zeta$ is the matrix form of the coefficient vector $(\zeta_1, \ldots, \zeta_M)$ and $t_i = t(x_i)$ is the matrix form of the polynomial terms $(t_1(x_i), \ldots, t_M(x_i))$. Given m pairs of $(x_i, y_i)$, the general way to estimate $\zeta$ is by minimizing the sum of squared residuals.
Furthermore, note that the features of the original character interiors are considered as the specific locations to be preserved during a LOESS prediction.
According to the distance to these specific locations, the warping degree is adapted to these features and is constrained by them. Unlike global deformation, LOESS can keep local features invariant during deformations while minimizing unnatural distortion. Thus, the estimate $\hat{\zeta}$ is chosen by minimizing the locally weighted sum of squared residuals and is obtained by the least-squares normal equations. In our work, we further fill in the color and texture information of the deformed character by using Equation (3.10) with the estimated regression coefficient vector $\hat{\zeta}$.
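The locally weighted fit described above can be sketched in Python as follows. This is a minimal illustration, assuming a one-dimensional predictor and a local linear polynomial (terms 1 and x), so that the weighted normal equations have a closed form. The function names and the scalar setting are hypothetical simplifications of the 2D case used in this work.

```python
import math


def gaussian_weight(x_i, x_0, kernel_width):
    """Gaussian kernel weight: 1 at x_0, decaying towards 0 far from it."""
    s = 1.0 / (2.0 * kernel_width ** 2)
    return math.exp(-s * (x_i - x_0) ** 2)


def loess_predict(xs, ys, x_0, kernel_width):
    """Locally weighted linear fit around x_0, evaluated at x_0.

    Solves the weighted least-squares problem for intercept and slope in
    closed form. Assumes at least one sample has a non-negligible weight.
    """
    weights = [gaussian_weight(x, x_0, kernel_width) for x in xs]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = s_xy / s_xx if s_xx > 0.0 else 0.0
    return y_bar + slope * (x_0 - x_bar)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.1, 0.9, 4.2, 8.8, 16.1]  # roughly quadratic samples
    print(loess_predict(xs, ys, 2.5, kernel_width=1.0))
```

A smaller `kernel_width` makes the fit follow nearby samples more closely, which corresponds to the bandwidth controlling the amount of localness in the regression.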
3.3 Reversible Jump Markov Chain Monte Carlo (RJMCMC) Sampler
Instead of using the least-squares method to estimate unknown parameters during regression analysis, an RJMCMC sampler is applied to estimate the optimized regression parameters. For instance, the procedure for estimating the parameters of kernel regression with ERBFs consists of three steps:
1. Set Up Proper Priors: Recalling Equation (3.7), let $v_j$ be the mean of the j-th elliptic Gaussian, together with its corresponding coefficient and its covariance along the i-axis. We begin with a fairly flat Gaussian prior on the basis coefficients, where precision is the precision of the coefficient prior. $\tau^2$ is the noise variance, which is given a Gamma prior.
A vague but proper Gamma prior distribution represents ignorance of the noise process and avoids inverting large matrices within each iteration of RJMCMC. The prior parameters are set initially and are updated during the RJMCMC process.
2. Determine Initial Parameter Value: Set the initial dimension k of the model equal to 3, that is, the intercept term plus the number of predictors. Then we use k-means clustering to set the starting center vector $v_j$ for each k-means group of anchor points.
In addition, the covariance is computed for each group, and the coefficients are calculated by least-squares fitting.
3. Iterate RJMCMC Sampler Until Sufficient Samples: In the RJMCMC algorithm, we propose the next state of the chain, representing a new basis function, according to the following criteria. First, draw a uniform random variable; its value determines whether the Birth, Death, or Move step is performed. In the Birth step, we add a basis function (ERBF) to the model. The corresponding parameters are then updated by k-means clustering. Recalling Figure 3.2, for each k-means group, the transformation matrix $A$ is computed for the added basis function. In the Death step, we remove a basis function, which is selected at random. In the Move step, we choose a basis function from the model at random and reset its mean vector to another random position. Next, the corresponding parameters are updated.
Then we compute the marginal log-likelihood of the model and draw k new coefficients. Given n pairs of predictors $u$ and corresponding responses $r$, where $r$ is the general representation of the regression response defined in Equation (3.6), we compute the marginal log-likelihood for the credible change of state.
Let X be the responses of the basis functions in matrix form, and let Y denote the corresponding responses of the regression model in matrix form. P represents the matrix form of the prior precision precision, and the k coefficients defined in Equation (3.7) are likewise collected in matrix form. Furthermore, the coefficients are obtained from the marginal posterior distribution with the posterior mean and a modified standard deviation.
Note that the initial standard deviation is drawn from the noise variance $\tau^2$ and modified to be the upper triangle of the posterior variance matrix obtained by using Cholesky decomposition.
Next, consider whether to accept the proposed change of state. We first draw a uniform random variable $u$. If $u$ is less than the ratio of the marginal likelihood of the proposed next state to the marginal likelihood of the original one, then accept the proposed change to the model and update the state of the Markov chain. Otherwise, set the next state to be the current state. Then, every 10 iterations, update the prior precision precision by drawing a random variable from a Gamma distribution, modified by the sum of squares of the coefficients. Recalculate the coefficients from the marginal posterior distribution with the updated prior precision. Furthermore, draw a random variable from a Gamma distribution for a new noise variance $\tau^2$. Given the response $r$ defined in Equation (3.6), $\tau^2$ is modified by the posterior sum of squared errors for the next iteration.
Repeat the RJMCMC process and record the number of states. An initial portion of the chain is discarded to ensure stability. If the number of states is greater than the discarded portion, then compute $f(\cdot)$ defined in Equation (3.6) from the recorded parameters of the current model for synthesizing limb movements. All the simulations are run with a burn-in period of 5000 iterations of RJMCMC followed by 10000 samples.
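The control flow of the sampler (choose a Birth, Death, or Move step, accept or reject by the marginal-likelihood ratio, discard a burn-in period) can be sketched in Python as follows. This is a deliberately reduced toy, not the sampler used in this work: the state is only the number of basis functions k, the step probabilities of one third each and the bounds `k_min` and `k_max` are assumptions, and `log_marginal` is a hypothetical user-supplied function standing in for the marginal log-likelihood. The updates of centers, covariances, coefficients, prior precision, and noise variance are omitted.

```python
import math
import random


def choose_step(u, k, k_min, k_max):
    """Map a uniform draw u in [0, 1) to a step (assumed equal thirds)."""
    if u < 1.0 / 3.0:
        return "birth" if k < k_max else "move"
    if u < 2.0 / 3.0:
        return "death" if k > k_min else "move"
    return "move"


def accept(log_ml_proposed, log_ml_current, u):
    """Accept when u is below the ratio of marginal likelihoods."""
    log_ratio = log_ml_proposed - log_ml_current
    return u < math.exp(min(0.0, log_ratio))


def run_chain(log_marginal, k_init=3, burn_in=5000, n_samples=10000,
              k_min=1, k_max=50, seed=0):
    """Toy chain over the model dimension k; returns post-burn-in values."""
    rng = random.Random(seed)
    k = k_init
    current = log_marginal(k)
    kept = []
    for iteration in range(burn_in + n_samples):
        step = choose_step(rng.random(), k, k_min, k_max)
        if step == "birth":
            proposed_k = k + 1
        elif step == "death":
            proposed_k = k - 1
        else:
            proposed_k = k  # a Move leaves the dimension unchanged
        proposed = log_marginal(proposed_k)
        if accept(proposed, current, rng.random()):
            k, current = proposed_k, proposed
        if iteration >= burn_in:
            kept.append(k)
    return kept


if __name__ == "__main__":
    # Hypothetical log marginal likelihood that favours five basis functions.
    samples = run_chain(lambda k: -float((k - 5) ** 2))
    print(sum(samples) / len(samples))
```

In the full procedure each accepted state would also carry the parameters of every basis function, and the recorded post-burn-in states would be used to evaluate the regression surface.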
3.4 Time Series Analysis
As mentioned before, time series analysis is applied to generate smooth and continuous limb movements. In our work, ARMA is used to analyze the limb movements of a character in several previous time slices in order to estimate the motion trajectories. We can then synthesize the current movements following these estimated trajectories. The general form of a time series model is considered as
$$D_t = f_{TS}(D_{t-1}, \ldots, D_{t-p};\ \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}) + \varepsilon_t,$$
where $D_t$ denotes a univariate time series and $f_{TS}(\cdot)$ indicates an unknown function of the time series. $p$ and $q$ represent non-negative integers. $\varepsilon_t$ is a sequence of random variables assumed to come from a Normal distribution with mean zero and variance one.
$C$ is assumed to be a constant. Based on this general form, ARMA is formulated as follows:
$$D_t = C + \varepsilon_t + \sum_{i=1}^{p} \varphi_i D_{t-i} + \sum_{i=1}^{q} \kappa_i \varepsilon_{t-i},$$
where $\varphi_i$ and $\kappa_i$ are the coefficients of the model. This is similar to the time series model proposed by Chen et al. [13], except that they assumed the functional form of $f_{TS}(\cdot)$ to be a known linear function, whereas we assume that $f_{TS}(\cdot)$ is estimated nonparametrically, along with the Bayesian estimation of ERBFs already described in Section 3.1 and Section 3.3, in order to add smooth variety to the time series data. That is, we introduce the ERBF kernel into the original time series model, with parameters inferred by using RJMCMC. We further use this nonparametric time series model to forecast the current limb movements of the character from several of its previous poses.
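As an illustration of forecasting from previous time slices, the following Python sketch computes forecasts with the standard linear ARMA form (a constant, autoregressive terms on past values, and moving-average terms on past noise). It is only a sketch under stated assumptions: the coefficients are given by hand rather than inferred, the names `arma_forecast` and `forecast_path` are hypothetical, and the ERBF-based nonparametric variant of this work is not implemented here.

```python
def arma_forecast(history, residuals, c, phi, kappa):
    """One-step ARMA forecast with the future noise term set to its mean, 0.

    history   : past values, oldest first (needs at least len(phi) entries)
    residuals : past noise terms, oldest first (at least len(kappa) entries)
    """
    if len(history) < len(phi) or len(residuals) < len(kappa):
        raise ValueError("not enough past values for the given orders")
    ar_part = sum(p * history[-i] for i, p in enumerate(phi, start=1))
    ma_part = sum(k * residuals[-i] for i, k in enumerate(kappa, start=1))
    return c + ar_part + ma_part


def forecast_path(history, residuals, c, phi, kappa, steps):
    """Iterate the one-step forecast, feeding each forecast back in."""
    values = list(history)
    noise = list(residuals)
    path = []
    for _ in range(steps):
        nxt = arma_forecast(values, noise, c, phi, kappa)
        values.append(nxt)
        noise.append(0.0)  # expected value of the unknown future noise
        path.append(nxt)
    return path


if __name__ == "__main__":
    # Hypothetical joint angles of a limb in three previous time slices.
    angles = [1.0, 2.0, 3.0]
    past_noise = [0.0, 0.5]
    print(forecast_path(angles, past_noise, c=0.1, phi=[0.5, 0.2],
                        kappa=[0.4], steps=2))
```

Applying the same forecast to each limb parameter yields a trajectory that continues smoothly from the previous poses.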
Chapter 4 Two-scale Image Abstraction
Generating a natural-looking 2D character animation from still photographs or paintings can be regarded as analyzing and simulating the character’s motion in the image. Note that a photograph contains redundant information. The raw format of a photograph may have 16 bits or even 24 bits per color channel. Using all contours of the character extracted from raw photographs for statistical analysis is neither practical nor useful. Hence, it is necessary to obtain the contours of interest of the character. We advocate a two-scale abstraction similar to the progressive image abstraction proposed by Farbman et al. [23]. The proposed abstraction method is based on a two-scale decomposition of the image consisting of a base layer, which encodes large-scale variations of pixels, and a detail layer. The base and detail layers are obtained by using an edge-preserving filter called the bilateral filter [64]. Given photographs, the bilateral filter is applied to obtain regions of interest. The selected contours of a character from the detail layer, which represent important features, and the contours of that character in the base layer are used to estimate the character’s motion. The redundant information of a photograph is filtered out by the bilateral filter so that a 2D character from arbitrary still pictures can be animated by the proposed statistical approaches.
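The two-scale decomposition can be sketched in Python as follows. This is a naive, unoptimised illustration on a small grayscale image stored as a list of rows. The parameter names and default values are assumptions, and taking the detail layer as the difference between the image and the base layer is one common choice that is assumed here rather than taken from the text.

```python
import math


def bilateral_filter(image, radius=2, sigma_space=1.5, sigma_range=0.1):
    """Brute-force bilateral filter on a 2D list of floats in [0, 1]."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        # Spatial weight: nearby pixels count more.
                        spatial = math.exp(-(dx * dx + dy * dy)
                                           / (2.0 * sigma_space ** 2))
                        # Range weight: similar intensities count more,
                        # which keeps strong edges from being blurred.
                        diff = image[ny][nx] - image[y][x]
                        tonal = math.exp(-(diff * diff)
                                         / (2.0 * sigma_range ** 2))
                        weight = spatial * tonal
                        total += weight * image[ny][nx]
                        norm += weight
            result[y][x] = total / norm
    return result


def two_scale(image, **filter_args):
    """Split an image into a base layer and a detail layer (image - base)."""
    base = bilateral_filter(image, **filter_args)
    detail = [[p - b for p, b in zip(row, base_row)]
              for row, base_row in zip(image, base)]
    return base, detail


if __name__ == "__main__":
    # A vertical step edge with a small bump on the dark side.
    image = [[0.0, 0.05, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
    base, detail = two_scale(image)
    print([round(v, 3) for v in base[0]])
    print([round(v, 3) for v in detail[0]])
```

In this toy image the step edge survives in the base layer while the small bump is smoothed and moves largely into the detail layer, which is the behaviour the abstraction relies on.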
4.1 Color Space Transformation
In order to keep the regions of interest, we propose the two-scale image abstraction based on the bilateral filter. It classifies the image into a base layer and a detail layer.
Important features can be preserved by adopting the contours selected from the detail
layer and the contours of the base layer to train the statistical model. In contrast, unimportant features can be filtered by applying the contours of base layer to the model only.
Tomasi and Manduchi [64] suggested computing the bilateral filter on a