Related Work - A Study on Applying Radial Basis Functions with Space-Time Continuity to Character Motion Compression

There has been extensive research on the compression of time-varying data, especially video and audio. Several animation compression techniques draw on this work. Goskov et al. [4] developed their compression algorithm within a wavelet framework. They used a multi-resolution approach to encode animation sequences progressively, and applied interframe differences of the wavelet details to improve compression ratios.

Goskov [4] and other early researchers focused mainly on compressing animated meshes, usually compressing each frame individually. For instance, Rossignac [11] and Karni [8] both proposed algorithms to encode the triangle indices. However, common animations exhibit both spatial and temporal coherence. Ibarria and Rossignac [5] therefore used a space-time predictor and corrector to compress animated meshes with fixed connectivity.

In recent years, motion data have been used in a growing range of applications. For instance, the movie and game industries need large amounts of motion data to drive 3D characters. Skeletal motion compression has therefore become an important research topic.

Figure 2: Goskov’s algorithm [4]

Principal component analysis (PCA) is a notable approach to animation compression. It represents high-dimensional data in a lower-dimensional space without losing much information. For instance, Alexa and Muller [1] projected entire animation sequences into a lower-dimensional space through PCA so that the motion data could be represented in a more compact form. PCA exploits spatial coherence implicitly. To utilize this coherence effectively, the motion data set is usually clustered into groups before PCA is applied; this approach is called clustered PCA (CPCA). Perceptually or geometrically similar motion clips can then share the same principal components but have different coefficients. Arikan [9] provided a hierarchical technique to compress a body animation database. The original database was divided into several clips, and to utilize temporal coherence the author used cubic Bezier curves to approximate the trajectories of the joints.

Each clip cannot be too long (usually 16-32 frames). The author then clustered these parameterized clips into several groups and performed principal component analysis to reduce their dimensionality. To preserve meaningful high-frequency features (e.g. feet touching the ground), he used a motion-JPEG technique to encode such important contacts. The major limitation of this approach is that the contacts must be known.

In the same year, Liu et al. [3] proposed a segment-based approach to compress human motion data. They segment the original motion data into several clips by probabilistic PCA (PPCA), perform PCA on these segments to reduce the dimensionality, and finally fit the principal component coefficients with Bezier curves. They utilize both temporal and spatial coherence, but in two separate phases.

Besides compressing the captured data directly, another idea is to use fewer markers to drive the skeleton. Using representative markers lowers the data size and reduces ambiguities during post-processing of mocap data. In other words, the goal is to decide which markers can be removed and which must be kept. To find the redundancy, Liu et al. [2] adopted a data-driven modeling approach to learn piecewise local linear models and used a modified principal feature analysis to choose the subset of markers. The original motion can then be approximated from the reduced marker set and these local linear models.

Figure 3: Liu [3] segments motion data and compresses each segment by PCA.

Approximation or interpolation of animation is another interesting approach. Users choose several key frames, and the other frames are estimated by interpolation or even extrapolation. There are many notable mathematical interpolation methods, for instance polynomial functions, trigonometric functions, exponential functions, or splines. Mukai and Kuriyama [14] proposed that motion interpolation can be approximated by a weighted combination, treating motion interpolation as statistical prediction. Arikan and Forsyth [10] synthesized human motions with a cut-and-paste concept. This approach generated smooth motions and satisfied spatial constraints. To approximate animations, the original animations are usually represented in parametric form so that the motion trajectories can be fitted with curves. Igarashi et al. [13] proposed an interpolation technique for performance-driven character animation. Given several predefined key frames, they used radial basis functions in the spatial domain to interpolate the in-between frames.

Figure 4: Liu [2] selects fewer markers to drive the original skeleton.

Generally speaking, the radial basis function is a very powerful interpolation tool. With suitable modifications to the distance metric, radial basis functions can be extended to higher-dimensional spaces. In other words, information can be shared between different dimensions.

Ravi et al. [11] realized this idea. They used modified radial basis functions to interpolate BRDF samples between different positions and viewing angles, so that information can be shared across the angular and spatial domains. Since animations have both spatial and temporal coherence, information is likewise spread across these domains. We can therefore adopt this concept to exploit information sharing.

Figure 5: Igarashi et al [13] choose several key postures from the original motion sequence and use radial basis functions to interpolate in-between data.

Figure 6: Ravi [11] used modified radial basis functions to utilize data coherence across different domains.

3. Compression through Surface Approximation

For convenience of implementation, we assign a unique joint index to each joint after converting the joints from angular offsets to global positions. Figure 7 illustrates the joint indices. For example, joint index 0 is the hip, and joint index 9 is the left hand. With these indices, we can store the joint positions in an array and access them by index.

As we pointed out in the introduction, the trajectory of each joint is essentially a curve. Therefore, if we lay several joint trajectories side by side, we obtain a surface. Figure 8 is an example. In this figure, the joints highlighted in yellow have similar trajectories. We collect these trajectories to form a smooth surface. The red arrow indicates the temporal direction (temporal domain) and the green arrow indicates the different joint indices (i.e. the spatial domain).

Since a smooth surface means that the grouped data are more consistent and can be encoded with fewer parameters, our goal becomes rearranging the data. The following chapters, “Segmentation” and “Clustering”, describe how we choose these samples.

Figure 7: Illustration of joint index.

Figure 8: An example of a constructed surface. The left image shows that several joints with similar trajectories form a smooth surface. The right image shows the corresponding joints (colored in yellow).

4. Segmentation

Generally speaking, any motion data can be thought of as a concatenation of different logical behaviors. For example, a motion may have two logical states, such as walking followed by running. If we carefully segment the motion data into distinct behaviors, we have more spatial coherence to utilize. In addition, a behavior transition usually produces intense variations. If we place the cuts appropriately, we can alleviate these variations and preserve the smoothness of each local surface.

In this thesis, we adopt the method of Barbic et al. [6] to segment our motion data. They proposed three methods to segment motion automatically; we choose the incremental PCA-based method. The idea is based on the observation that, for the same number of dimensions, a simple motion is represented better in a lower-dimensional space than a complex one.

Projecting motion data into a lower-dimensional space unavoidably introduces error. The error term is defined as:

$$ e = \sum_{j=r+1}^{N} \sigma_j^2 $$

where r is the number of principal components, N is the original dimension, and $\sigma_j$ is the j-th singular value from the SVD. The ratio

$$ E_r = \frac{\sum_{j=1}^{r} \sigma_j^2}{\sum_{j=1}^{N} \sigma_j^2} $$

indicates how much information is retained after projection. Once we decide how much error we can accept, the number of principal components is determined. Given a short starting clip, we calculate its principal components. We then append a new frame and perform PCA on the extended segment again with the same number of principal components. For a simple motion, the error rises steadily. If the motion is transitioning into a new behavior, the error rises much more quickly and a cut should be placed.
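The incremental procedure above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in this thesis: the function names, the parameters `start_len`, `tau`, `k_sigma`, `warmup`, and `min_jump`, and the specific cut rule (an error increment that exceeds the mean plus a few standard deviations of the earlier increments) are all assumptions made for the example.

```python
import numpy as np

def squared_singular_values(frames):
    """Squared singular values of the mean-centred matrix (frames x dimensions)."""
    centred = frames - frames.mean(axis=0)
    return np.linalg.svd(centred, compute_uv=False) ** 2

def retained_ratio(frames, r):
    """E_r: share of the squared singular values kept by the first r components."""
    s2 = squared_singular_values(frames)
    total = s2.sum()
    return 1.0 if total == 0.0 else float(s2[:r].sum() / total)

def projection_error(frames, r):
    """e: sum of the squared singular values discarded after keeping r components."""
    return float(squared_singular_values(frames)[r:].sum())

def find_cut(frames, start_len=20, tau=0.95, k_sigma=3.0, warmup=10, min_jump=1e-6):
    """Return the index of the first frame of a new behaviour, or None.

    All thresholds here are hypothetical defaults chosen for illustration.
    """
    # Fix r from the starting clip: the smallest r whose retained ratio reaches tau.
    r = 1
    while retained_ratio(frames[:start_len], r) < tau:
        r += 1
    # Append one frame at a time and track the projection error with r fixed.
    errors = np.array([projection_error(frames[:n], r)
                       for n in range(start_len, len(frames) + 1)])
    increments = np.diff(errors)
    for i in range(warmup, len(increments)):
        history = increments[:i]
        limit = history.mean() + k_sigma * history.std()
        # Assumed rule: cut when the error suddenly rises much faster than before.
        if increments[i] > max(limit, min_jump):
            return start_len + i
    return None
```

Recomputing the SVD for every appended frame keeps the sketch short; a practical implementation would likely update the decomposition incrementally.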

Reference Posture

Each motion segment represents a different motion behavior. Although segmentation avoids high variations in the temporal domain, high variations still exist between joints in the spatial domain within a segment.

A simple way to alleviate this is to compute the mean posture of each segment as a reference and subtract this reference from each joint sample in the segment. Our experiments show that this equalization process is worthwhile. The reason is that

Figure 9: J. Barbic [6] used IPCA to segment motion. Once the error for the same number of principal components rises quickly, a cut is assigned there.

than the left one obviously.

Figure 10: The comparison of shifting joint trajectories to reference posture.
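The reference-posture step can be sketched as follows, assuming a segment is stored as an array of shape (frames, joints, 3) holding global joint positions. The array layout and the function name are assumptions made for this example.

```python
import numpy as np

def subtract_reference_posture(segment):
    """Shift every frame of a segment by the segment's mean posture.

    segment: array of shape (frames, joints, 3) with global joint positions.
    Returns the shifted samples and the reference posture, which must be
    stored so that the original positions can be restored on decompression.
    """
    reference = segment.mean(axis=0)      # mean posture, shape (joints, 3)
    return segment - reference, reference
```

Adding the stored reference back to the shifted samples restores the original positions exactly, so this step by itself introduces no loss.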

5. Clustering

To exploit more coherence, we conducted many experiments. We observed that joints have more spatial coherence with nearby joints. For example, the hip is strongly related to the chest but not to the hands or legs.

We might therefore separate the body into several parts: head, extremities, and torso. If we can approximate the trajectory of a joint by curve fitting, we can group many joints to form a surface and approximate it, performing surface approximation for each body part independently. However, this raises some problems. When a person runs, the arms and legs vary dramatically. If we collect the joints within such a fixed part to form a single surface, the surface is usually jagged and offers little useful spatial coherence. Moreover, each body part is independent: even if some joints have spatial coherence, we cannot utilize it when they are assigned to different body parts. A more practical approach is to analyze the behavior of the joints in the current segment and group joints with similar trajectories into a single cluster. A straightforward method is to use the correlation between joints. However, our experiments show that this is unsuitable, because a joint’s position may be fixed during a specific time interval, in which case we cannot find a linear relationship with other joints.

To solve this problem, we calculate the difference between each joint pair frame by frame, and define the similarity by the standard deviation of these differences. When joints do not move quickly in the current segment, we should group all of them into a single cluster. By contrast, if joints move quickly, the usable coherence is relatively lower and each cluster should contain far fewer elements.

To adjust the number of clusters dynamically, we check the average standard deviation of each cluster when the k-means algorithm converges. If this value exceeds a predefined threshold, the cluster is subdivided and re-clustered until the average standard deviation of every cluster is smaller than the threshold.
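The similarity measure and the subdivision test can be sketched as follows. This is a minimal illustration under stated assumptions: the "difference" of a joint pair is taken to be the Euclidean distance between the two joints in each frame, the segment is stored as a (frames, joints, 3) array, and the function names are hypothetical. The k-means step itself is not shown.

```python
import numpy as np

def joint_dissimilarity(segment):
    """Standard deviation over time of the per-frame distance of every joint pair.

    segment: array of shape (frames, joints, 3).
    Returns a (joints, joints) matrix; small values mean the two joints keep
    a nearly constant offset, i.e. they follow similar trajectories.
    """
    diff = segment[:, :, None, :] - segment[:, None, :, :]   # (F, J, J, 3)
    dist = np.linalg.norm(diff, axis=-1)                      # (F, J, J)
    return dist.std(axis=0)

def cluster_needs_split(dissimilarity, members, threshold):
    """True if the average pairwise standard deviation in a cluster exceeds threshold."""
    n = len(members)
    if n < 2:
        return False
    sub = dissimilarity[np.ix_(members, members)]
    return bool(sub[np.triu_indices(n, k=1)].mean() > threshold)
```

Unlike correlation, this measure stays well defined when a joint is stationary: a fixed joint simply yields a large standard deviation against any joint that moves.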

Here is an example of our clustering result. In this segment, the subject raises his arms almost symmetrically from frame 497 to frame 557. With our method, joints with similar trajectories are grouped together. Figure 12 shows that we have 4 clusters in this segment: {joint 7, joint 12}, {joint 9, joint 14}, {joint 13, joint 8}, and the remaining joints.

Figure 11: A motion segment in which the arms rise symmetrically.

Figure 12: Results after clustering. The highlighted joints (in yellow) are grouped together. Panels: Cluster 0, Cluster 1, Cluster 2, Cluster 3.

6. Approximation

6.1 Radial Basis Function

A general trajectory of human motion is continuous and smooth, so Arikan [10] used cubic Bezier curves to approximate the trajectory of each joint. However, such functions only approximate the animation in the temporal domain; each joint is independent in the spatial domain. To utilize both spatial and temporal coherence, we modify the radial basis functions to approximate body movement in both domains concurrently.

Radial basis functions are commonly used in function approximation, scattered-data interpolation, and time series prediction. The basic concept is that any smooth function can be approximated by a weighted combination of basis functions. Since radial basis functions are constructed from scattered data sets, we call these scattered data samples or centers, because the bases are positioned there. We adjust the amplitude and width of each basis to form a smooth function. The general equation can be written as:

$$ y(x) = \sum_{i=1}^{N} w_i \, \varphi(\lVert x - c_i \rVert) + P(x) $$

where y(x) is the function value at position x, N is the number of centers, the coefficient $w_i$ is the amplitude of the i-th basis $\varphi$, $c_i$ is the i-th center, and $\lVert \cdot \rVert$ denotes the distance from x to a center. P(x) is a low-degree polynomial term, also called the affine term. Several basis functions are in common use.

In this thesis, we choose the Gaussian basis $\varphi(r) = e^{-kr^2}$. The constant k controls the width of each basis. Once k is chosen, we pick several samples as centers to train the radial basis functions (i.e. solve for the coefficients w; see Appendix: “How to solve RBFs coefficients”), and obtain a smooth curve or surface that is close to the real data.
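As an illustration of training with a Gaussian basis, the sketch below solves for the coefficients w by requiring the function to pass through the chosen centers. It is only a sketch under assumptions: the input is one-dimensional, the polynomial term P(x) is omitted, and the function names are hypothetical. The procedure in the Appendix may differ.

```python
import numpy as np

def gaussian_rbf_fit(centres, values, k):
    """Solve Phi w = values, with Phi[i, j] = exp(-k * (c_i - c_j)^2)."""
    r2 = (centres[:, None] - centres[None, :]) ** 2
    phi = np.exp(-k * r2)
    return np.linalg.solve(phi, values)

def gaussian_rbf_eval(x, centres, weights, k):
    """Evaluate y(x) = sum_i w_i * exp(-k * (x - c_i)^2) at each entry of x."""
    r2 = (x[:, None] - centres[None, :]) ** 2
    return np.exp(-k * r2) @ weights
```

A smaller k widens each basis, which tends to give smoother results but makes the matrix closer to singular when centers lie close together.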

6.2 Space-time Radial Basis Functions

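Consider any two joint samples Joint_i and Joint_j whose coordinates in hyper-space are $(x_i, y_i, z_i, t_i)$ and $(x_j, y_j, z_j, t_j)$. Typically, the distance term would be defined as:

$$ r^2 = (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 + (t_i - t_j)^2 $$

However, we reorganized the joint sequence in the processing described previously. In this thesis, we therefore redefine the basis as a Gaussian of a spatial distance $r_s$ and a temporal distance $r_t$, each scaled by its own constant, with

$$ r_s^2 = (\text{joint\_index}_i - \text{joint\_index}_j)^2 \tag{10} $$

$$ r_t^2 = (\text{frame\_index}_i - \text{frame\_index}_j)^2 \tag{11} $$

where $c_t$ and $c_s$ are constants that control the shape of the basis.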
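A space-time basis built from the two index distances might look like the sketch below. The exact way the constants enter the basis is an assumption here: the squared spatial and temporal distances are divided by c_s and c_t respectively, so that larger constants give a wider basis. The function name is also hypothetical.

```python
import numpy as np

def space_time_basis(joint_i, frame_i, joint_j, frame_j, c_s, c_t):
    """Gaussian basis over joint-index and frame-index distances.

    Assumed form: exp(-(r_s^2 / c_s + r_t^2 / c_t)), with r_s and r_t the
    differences of joint indices and frame indices (equations 10 and 11).
    """
    r_s2 = (joint_i - joint_j) ** 2
    r_t2 = (frame_i - frame_j) ** 2
    return np.exp(-(r_s2 / c_s + r_t2 / c_t))
```

Under this form, setting c_s much smaller than c_t makes a center influence neighboring frames far more than neighboring joints, which is one way to control how much information is shared across each domain.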

6.3 Approximation

The composition of the centers can greatly influence the compression and approximation results. Since our goal is to approximate the motion variations by radial basis functions, we utilize an iterative greedy algorithm proposed by Carr [8]. For each cluster, we first choose initial centers to train the radial basis function network. If the current cluster forms a surface, we choose the corners of the surface as the initial centers. If the current cluster is a curve, we choose the first and last frames. We then use the trained function to reconstruct the animation segment. Samples with larger residuals are chosen as new centers and we re-train the radial basis functions. The iteration continues until the stopping criteria are satisfied (i.e. the approximation error is smaller than the predefined threshold).

Figure 14 is a flowchart illustrating how we use radial basis functions to approximate motion data gradually. Figure 15 is an example of approximating a curve: the green curve is the real data, the red curve is the approximation, and the blue dots are the centers.

Figure 14: An illustration of the approximation steps. (Flowchart: set up initial centers; train RBFs; evaluate error and pick new centers; criterion satisfied? Yes: finish. No: train RBFs again.)

Figure 15: An example of approximating a curve. Green curve and red curve are real data and approximated function respectively. Blue dots are centers of radial basis functions.

Figure 16: An example of real data surface (green one) and approximated surface (red one).
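The iterative procedure for a single curve can be sketched as follows. This is a simplified illustration, not the implementation used in this thesis: it handles one coordinate of one joint over time, omits the polynomial term, adds only the single worst sample per iteration, and uses hypothetical function names.

```python
import numpy as np

def fit_weights(t_centres, y_centres, k):
    """Solve for Gaussian RBF weights that interpolate the centre samples."""
    phi = np.exp(-k * (t_centres[:, None] - t_centres[None, :]) ** 2)
    return np.linalg.solve(phi, y_centres)

def evaluate(t, t_centres, weights, k):
    """Reconstruct the curve at every frame in t."""
    return np.exp(-k * (t[:, None] - t_centres[None, :]) ** 2) @ weights

def greedy_fit(t, y, k, tol):
    """Add centres greedily until the worst residual is at most tol (tol >= 0)."""
    centres = [0, len(t) - 1]                  # start from the first and last frame
    while True:
        idx = np.array(sorted(centres))
        w = fit_weights(t[idx], y[idx], k)     # train the RBFs
        residual = np.abs(y - evaluate(t, t[idx], w, k))
        residual[idx] = 0.0                    # centres are interpolated; skip them
        worst = int(residual.argmax())
        if residual[worst] <= tol:             # stopping criterion satisfied
            return idx, w, float(residual[worst])
        centres.append(worst)                  # sample with the largest residual
```

Only the center indices, the weights, and k need to be stored to reconstruct the curve, so the number of centers returned by `greedy_fit` governs the compressed size in this sketch.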

7. Experiments and Results

To show that our method utilizes spatial and temporal coherence, we prepared 16 test motions and designed several experiments. The original file format is the commonly used BVH. However, as mentioned previously, we converted the motion data from the hierarchical angular domain to Euclidean coordinates before compression. We stored the converted data in the “BIN” file format, with the file extension “bin”. A BIN file contains the global positions of the joints, frame by frame, in binary. The compressed motion data were encoded in the “R” file format. Our compression ratio is therefore defined as:

$$ \text{Compression\_ratio} = \frac{\text{size of BIN file}}{\text{size of R file}} \tag{12} $$

Since our method is a lossy compression method, we set the error threshold to 5 cm for a subject 1.8 m tall. Such an error is hardly perceptible to the human eye.

The first experiment compressed the test data directly. It is worth noting that different motion data offer different amounts of coherence to utilize, so we have to adjust the basis shape (i.e. Cs and Ct) to produce better results. Generally speaking, when the target surfaces or curves are smoother, Ct and Cs should be larger so that far fewer centers are needed.

| Motion | Cs | Ct | .bin size | .r size | Ratio |
|---|---|---|---|---|---|
| Ballet05 | 64 | 256 | 200,687 | 5,187 | 1:38.7 |
| Ballet23 | 4 | 256 | 117,059 | 3,869 | 1:30.3 |
| Cowboy3 | 16 | 32 | 191,027 | 10,127 | 1:18.9 |
| Cowboy4 | 16 | 32 | 165,083 | 12,246 | 1:13.4 |
| Drunk5 | 16 | 64 | 269,135 | 26,962 | 1:9.9 |
| Faint5 | 4 | 32 | 95,255 | 9,571 | 1:9.9 |
| ShotShoulder03 | 4 | 32 | 52,475 | 6,146 | 1:8.5 |
| Sit21 | 4 | 256 | 72,899 | 5,540 | 1:13.2 |
| Sneak01 | 32 | 128 | 148,523 | 8,359 | 1:17.8 |
| Stand03 | 32 | 128 | 62,963 | 5,390 | 1:11.7 |
| Tired05 | 32 | 128 | 203,999 | 16,622 | 1:12.3 |
| Walk25* | 32 | 64 | 248,711 | 12,774 | 1:19.5 |
| Walk34* | 32 | 64 | 241,811 | 10,987 | 1:22 |

Table 1: Compression Results

Second, we compared two compression approaches: a space-time method and a traditional approximation method in the temporal domain only. In this experiment, the temporal constant must have the same value in both methods.

Table 2: Comparison of the time-domain-only and space-time approaches. (Columns: Motion, .bin, s-t ratio.)

The third experiment shows how the segment length affects our compression results. Since IPCA produces segments of uncontrollable length, we tested several fixed segment lengths instead.

Table 3: A discussion of segment length. (Columns: Motion, Frame, Seg. 100, Seg. 200, Seg. 300.)

The compression ratios are better with longer segments, because a longer segment may expose more coherence in the temporal domain. However, if segments are too long, we may find fewer joints with similar trajectories; in other words, we miss much spatial coherence. The segment length is therefore an important tradeoff.

The final experiment examined the performance of our approach. The platform is P4

Table 4: Algorithm performance

The compression time is much longer than the decompression time because of the iterative process: solving for the coefficients of the radial basis functions essentially amounts to inverting a large matrix. Fortunately, decompression speed matters more than compression speed in practice, so the method remains practical.

8. Conclusion

In this thesis, we propose a compression method for human motion data. Unlike previous studies, our method utilizes radial basis functions to approximate motion in the spatial and temporal domains simultaneously. To find more coherence, we analyze the motion data in both domains and reorder the joint sequence so that we obtain a smooth curve or surface.

However, if we cannot find sufficient spatial coherence in the motion data, the compression ratio will be close to that of temporal-domain compression alone, because most clusters then form curves. In addition, solving for the coefficients of the radial basis functions essentially amounts to inverting a matrix, so the matrix cannot be too large. Even when we find a cluster with much useful coherence, we still need to limit the number of samples used.

9. Future Work

Important future research directions are as follows. First, our method is a flexible compression component that can be combined with other compression components to achieve better results. For example, some motion data contain repetitive motion behaviors. We can retrieve such motions and represent them more efficiently.

Second, in typical motion data the bone lengths are fixed and usually known in advance, so we can use them as an additional constraint. Specifically, we can approximate the motion data roughly and, during decompression, use this constraint to make the reconstructed joint positions more accurate.

Although we proposed an efficient framework to exploit coherence, more sophisticated methods, e.g. PPCA segmentation, may improve our analysis.

Hierarchical Radial Basis Functions

Automatically deciding the width of each basis is very difficult. Different human motions behave quite differently, so approximating all surfaces with the same ct and cs cannot give good compression results. Future research may adopt hierarchical radial basis functions to overcome this problem. For example, we could use wider bases to capture the rough shapes of surfaces and narrower bases to capture the high-variation parts. In other
