4 Object Posture Temporal Super-Resolution Using Tensor
4.2 Object Posture Temporal Super-Resolution
4.2.3 Temporal Super-Resolution Using Tensor
After constructing the graphical model, we next transform each human motion sequence into a motion trajectory in the manifold domain. However, because the LR input sequence has a low frame rate and therefore captures little motion content, its projected motion trajectory in the manifold space is non-smooth and unreliable. We therefore apply tensor decomposition to separate the motion trajectories into two orthogonal factors: a motion factor and a person factor. We then transfer the motion factor extracted from the HR learning sequences to the input sequence and combine it with the input sequence's person factor to synthesize a high-frame-rate motion trajectory for the input sequence. A limitation of tensor decomposition, however, is that the motion data must be arranged along the orthogonal factors beforehand. This requirement is hard to meet for human motion, because motion sequences typically have different lengths or different sampling rates. In this work, we propose a motion data alignment scheme that automatically arranges the motion data in a tensor, after which tensor decomposition can be applied to decompose the motion data into orthogonal factors.
A tensor is a generalization of a matrix to higher orders. It defines multilinear operators over a set of vector spaces and provides a unified mathematical framework for linear analysis. Tensor decomposition can be seen as a generalization of the Singular Value Decomposition (SVD) of matrices.
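A tensor can be expressed as the product of a core tensor and N orthogonal subspaces:
$$\mathcal{T} = \mathcal{C} \times_1 S_1 \times_2 S_2 \times_3 \cdots \times_N S_N \qquad (4.1)$$
where, as illustrated in Figure 4.2(a), $\mathcal{T}$ denotes the tensor data, $\mathcal{C}$ denotes the core tensor, $S_n$ stands for the n-th orthogonal subspace, and $\times_n$ denotes the mode-n product.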
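To make the mode-n product in (4.1) concrete, here is a minimal NumPy sketch. The helper names `mode_n_product` and `compose` are hypothetical. Modes are indexed from 0 in code, whereas (4.1) indexes them from 1.

```python
import numpy as np


def mode_n_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Compute tensor x_mode matrix by multiplying every mode-`mode` fiber by `matrix`.

    `matrix` has shape (J, I_mode); in the result, dimension I_mode becomes J.
    """
    # Bring the chosen mode to the front, flatten the remaining modes, and multiply.
    moved = np.moveaxis(tensor, mode, 0)
    result = matrix @ moved.reshape(moved.shape[0], -1)
    # Restore the original layout with the new size on the chosen mode.
    return np.moveaxis(result.reshape((matrix.shape[0],) + moved.shape[1:]), 0, mode)


def compose(core: np.ndarray, factors: list) -> np.ndarray:
    """Evaluate T = C x_1 S_1 x_2 S_2 ... x_N S_N (0-based modes in code)."""
    tensor = core
    for mode, factor in enumerate(factors):
        tensor = mode_n_product(tensor, factor, mode)
    return tensor


rng = np.random.default_rng(0)
core = rng.normal(size=(2, 3, 4))
factors = [rng.normal(size=(5, 2)), rng.normal(size=(6, 3)), rng.normal(size=(7, 4))]
print(compose(core, factors).shape)  # (5, 6, 7)
```

Each factor only changes the size of its own mode, so the core's shape (2, 3, 4) becomes the data shape (5, 6, 7).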
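The tensor decomposition process includes two steps:
1. For n = 1, ..., N, compute the matrix $S_n$ by performing SVD on the mode-n flattened matrix $T_{(n)}$ (as shown in Figure 4.2(b)), and set $S_n$ to the left singular matrix of that SVD.
2. Find the core tensor $\mathcal{C}$ in (4.1). Since each $S_n$ has orthonormal columns, it is given by $\mathcal{C} = \mathcal{T} \times_1 S_1^{\top} \times_2 S_2^{\top} \cdots \times_N S_N^{\top}$.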
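The two steps above correspond to the higher-order SVD. The sketch below is a minimal NumPy version under one assumption: we use our own column ordering for the mode-n flattening. Figure 4.2(b) may order the columns differently, but permuting the columns of a matrix does not change its left singular vectors. The function names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n flattening: rows indexed by `mode`, columns by all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_n_product(tensor, matrix, mode):
    """tensor x_mode matrix, where matrix has shape (J, I_mode)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = matrix @ moved.reshape(moved.shape[0], -1)
    return np.moveaxis(result.reshape((matrix.shape[0],) + moved.shape[1:]), 0, mode)


def hosvd(tensor):
    """Step 1: S_n = left singular vectors of the mode-n flattening.
    Step 2: C = T x_1 S_1^T x_2 S_2^T ..., valid because each S_n is orthonormal."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, s in enumerate(factors):
        core = mode_n_product(core, s.T, mode)
    return core, factors


rng = np.random.default_rng(0)
T = rng.normal(size=(3, 4, 5))
core, factors = hosvd(T)
recon = core
for mode, s in enumerate(factors):
    recon = mode_n_product(recon, s, mode)
print(np.allclose(recon, T))  # True
```

Without truncating any factor, composing the core with the factors as in (4.1) recovers the data exactly. Dropping trailing singular vectors would instead give a low-rank approximation.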
Figure 4.2 Illustration of tensor decomposition and arrangement: (a) the tensor data is decomposed into the product of a core tensor and orthogonal factors, and (b) a tensor is flattened in two different ways to obtain flattened matrices.
Before applying tensor decomposition to obtain the orthogonal factors, we need to arrange the motion data in the subspaces of the tensor according to certain attributes. Since human motion sequences carry no explicit labels, special care is needed to organize the motion data correctly in the tensor. Below we present the proposed motion data alignment method.
We first represent the motion trajectory of each HR learning sequence by a continuous motion curve. Each motion trajectory is normalized to the same temporal duration and then fitted with a motion curve by polynomial regression. Next, we locate points with significant motion content along each motion trajectory and use them as key points for data alignment.
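The sketch below illustrates the normalization and polynomial-regression step. It rescales each trajectory's frame index to [0, 1] and fits one polynomial per manifold dimension with NumPy's `polyfit`. The helper name `fit_motion_curve`, the polynomial degree, and the number of dense samples are assumptions; the text does not state them.

```python
import numpy as np


def fit_motion_curve(trajectory, degree=6, samples=200):
    """Fit a continuous motion curve to a (frames, dims) manifold trajectory.

    `degree` and `samples` are illustrative values, not taken from the text.
    The trajectory needs at least degree + 1 frames for a well-posed fit.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    # Normalize the duration: frame indices are mapped onto [0, 1] regardless of length.
    t = np.linspace(0.0, 1.0, len(trajectory))
    # One least-squares polynomial per manifold dimension (highest power first).
    coeffs = [np.polyfit(t, trajectory[:, j], degree) for j in range(trajectory.shape[1])]
    # Evaluate every curve on the same dense grid so sequences become comparable.
    t_dense = np.linspace(0.0, 1.0, samples)
    curve = np.stack([np.polyval(c, t_dense) for c in coeffs], axis=1)
    return t_dense, curve


short = np.random.default_rng(0).normal(size=(30, 3))
long = np.random.default_rng(1).normal(size=(45, 3))
print(fit_motion_curve(short)[1].shape, fit_motion_curve(long)[1].shape)  # (200, 3) (200, 3)
```

Sequences of 30 and 45 frames both map to curves sampled on the same 200-point grid. This common grid is what makes the later key-point search and sampling comparable across sequences.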
Two examples are shown in Figure 4.3, where each motion trajectory has several wave crests and troughs along the first dimension of the manifold domain. These crests and troughs occur exactly when the person finishes one motion and starts the next. The postures between the crests and troughs usually involve slow motion because of the constraints of the human body. As Figure 4.3 shows, these properties are invariant across different persons. Therefore, we sample the points at the wave crests and troughs as the key points of each motion curve.
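A simple way to locate crests and troughs on a densely sampled curve is to look for sign changes in its discrete slope. The sketch below does this for one dimension, such as the first manifold dimension. The function name is hypothetical. We also assume both endpoints are included as key points, so that every sample lies between two neighboring key points.

```python
import numpy as np


def find_key_points(curve_dim):
    """Indices of crests (local maxima) and troughs (local minima) of a 1-D curve.

    Both endpoints are added as key points (an assumption for illustration).
    Flat plateaus, where the slope is exactly zero, are not detected.
    """
    slope_sign = np.sign(np.diff(curve_dim))
    # A crest is a + to - change of slope; a trough is a - to + change.
    change = np.where(slope_sign[:-1] * slope_sign[1:] < 0)[0] + 1
    last = len(curve_dim) - 1
    return np.unique(np.concatenate(([0], change, [last])))


t = np.linspace(0.0, 1.0, 401)
print(find_key_points(np.sin(4 * np.pi * t)))  # [  0  50 150 250 350 400]
```

For two periods of a sine wave, the detected indices are the two crests (50, 250), the two troughs (150, 350), and the endpoints.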
Figure 4.3 Illustration of the low-dimensional manifolds of two different posture sequences and the corresponding postures at the crests and troughs of the manifold.
In addition, to ensure that the sampled points contain sufficient information to represent the original motion trajectory, we also sample n points on the motion curve between every two neighboring key points. These additional points are sampled uniformly, under the assumption of constant motion between two neighboring key points. The number n is determined by minimizing the distortion between the original motion trajectory and the trajectory reconstructed from the sampled points, that is, n is increased until this distortion falls below a threshold. The threshold is set to the shape context distance between two consecutive postures of a static human motion. Finally, a fixed number m of sampled points represents the motion trajectory of each training sequence, and we arrange these m points in the tensor.
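The following sketch shows one way to sample n points uniformly in time between neighboring key points and to choose n with a distortion threshold. It rests on three assumptions:

* reconstruction is piecewise-linear between samples, which matches the constant-motion assumption;
* distortion is the mean Euclidean distance between the dense curve and that reconstruction;
* `tau` only stands in for the shape-context threshold, which is not computed here.

All function names are hypothetical.

```python
import numpy as np


def sample_between_keys(t, curve, keys, n):
    """Keep every key point plus n points spaced uniformly in time between neighbors.

    t: (S,) increasing normalized times; curve: (S, d) dense motion curve;
    keys: sorted key-point indices including both endpoints.
    Returns m = (len(keys) - 1) * (n + 1) + 1 sampled times and points.
    """
    times = []
    for a, b in zip(keys[:-1], keys[1:]):
        # n + 2 equally spaced times from t[a] to t[b]; drop t[b] so it is not duplicated.
        times.extend(np.linspace(t[a], t[b], n + 2)[:-1])
    times.append(t[keys[-1]])
    times = np.asarray(times)
    # Read the curve at the sampled times (linear interpolation of the dense samples).
    points = np.stack([np.interp(times, t, curve[:, j]) for j in range(curve.shape[1])], axis=1)
    return times, points


def distortion(t, curve, times, points):
    """Mean Euclidean distance between the dense curve and its piecewise-linear
    reconstruction from the sampled points (constant motion between samples)."""
    recon = np.stack([np.interp(t, times, points[:, j]) for j in range(points.shape[1])], axis=1)
    return float(np.mean(np.linalg.norm(curve - recon, axis=1)))


def choose_n(t, curve, keys, tau, n_max=50):
    """Smallest n whose distortion is at most tau (a placeholder for the
    shape-context threshold). Falls back to n_max if no n qualifies."""
    for n in range(n_max + 1):
        times, points = sample_between_keys(t, curve, keys, n)
        if distortion(t, curve, times, points) <= tau:
            return n
    return n_max
```

Because the same n is applied between every pair of key points, sequences with the same number of key points all yield the same m. This is what allows the m points to be stacked along one mode of the tensor.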
For the alignment of the input sequence, the LR input usually does not yield a reliable low-dimensional motion trajectory. We therefore align its motion data using the raw postures instead of points along the motion trajectory of the test sequence. The input sequence contains k postures. To determine where these k postures lie among the m sampled points, we collect the coordinate values of all input postures into a histogram with k bins, as shown in Figure 4.4. We then search for k of the m sampled points along the mean motion trajectory of the HR training sequences whose histogram is most similar to that of the input sequence. The similarity between two histograms is measured by the Bhattacharyya coefficient:
$$\rho(p, q) = \sum_{b=1}^{k} \sqrt{p(b)\, q(b)}$$
where p and q are the normalized histograms of the LR input sequence and of the selected training points, respectively.
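To illustrate the histogram matching, the sketch below computes the Bhattacharyya coefficient. It then searches for the k-subset of the m reference coordinates whose k-bin histogram best matches the input's. The text does not say how the search is performed or how the bin edges are chosen. Here we assume:

* shared, equal-width bins over the combined value range;
* 1-D coordinates for both arrays, in a common space;
* an exhaustive search over all C(m, k) subsets, which is only practical for small m.

The function names are hypothetical.

```python
import itertools

import numpy as np


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two histograms after normalizing each to sum to 1."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return float(np.sum(np.sqrt(p * q)))


def align_input(input_coords, ref_coords):
    """Pick k of the m reference points whose k-bin histogram best matches the input's.

    Brute force over all C(m, k) subsets (assumption); ties keep the first subset found.
    """
    input_coords = np.asarray(input_coords, dtype=float)
    ref_coords = np.asarray(ref_coords, dtype=float)
    k = len(input_coords)
    lo = min(input_coords.min(), ref_coords.min())
    hi = max(input_coords.max(), ref_coords.max())
    edges = np.linspace(lo, hi, k + 1)  # shared equal-width bins (assumption)
    h_in, _ = np.histogram(input_coords, bins=edges)
    best, best_score = None, -1.0
    for subset in itertools.combinations(range(len(ref_coords)), k):
        h_ref, _ = np.histogram(ref_coords[list(subset)], bins=edges)
        score = bhattacharyya(h_in, h_ref)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score


print(align_input([0.1, 2.1, 4.1], np.arange(6.0))[0])  # (0, 2, 4)
```

The returned indices play the role described for Figure 4.4(b): they indicate which positions in the tensor the input postures occupy.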
Figure 4.4 (a) The coordinates of the k postures of the LR input sequence. (b) We find k reference points among the m reference points along the mean motion curve of all the HR learning sequences; the indices of these k reference points give the positions of the input postures in the tensor.
After the above data alignment, we arrange all the sequences, including the HR learning sequences and the LR input sequence, in the tensor. As shown in Figure 4.5, the tensor is incomplete, so we cannot apply tensor decomposition directly. Instead, we extract the motion factor only from the learning sequences, as indicated by the red rectangle in Figure 4.5. We extract the person factor only from the columns with complete postures, as indicated by the blue rectangles. Next, we compute the core tensor from the extracted orthogonal factors and the available tensor data. Finally, with the obtained orthogonal factors and core tensor, we recover the complete tensor data through (4.1) and use it to reconstruct the motion trajectory of the input LR sequence.
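The completion step can be sketched under several simplifying assumptions:

* a three-mode tensor (person × sampled position × feature) whose last person is the LR input;
* truncated SVDs for the two factors, with an identity factor on the feature mode;
* a least-squares estimate of the core from the observed entries only.

The text does not specify how the core is computed, so the least-squares fit is our choice. `complete_tensor` is a hypothetical name.

```python
import numpy as np


def leading_left_vectors(matrix, rank):
    """First `rank` left singular vectors of a matrix."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank]


def complete_tensor(data, known_idx, r_person, r_motion):
    """Fill in the missing postures of the last person (the LR input).

    data: (P + 1, m, d); rows 0..P-1 are complete HR training sequences, and row P
    is the input, whose entries outside `known_idx` are ignored.
    """
    train = data[:-1]
    n_person, m, d = data.shape
    # Motion factor from the training sequences only (mode-2 flattening, m x (P*d)).
    u_motion = leading_left_vectors(np.moveaxis(train, 1, 0).reshape(m, -1), r_motion)
    # Person factor from the complete columns only (all persons, aligned positions).
    u_person = leading_left_vectors(data[:, known_idx, :].reshape(n_person, -1), r_person)
    # Every observed (person, position) pair.
    observed = [(p, i) for p in range(n_person - 1) for i in range(m)]
    observed += [(n_person - 1, i) for i in known_idx]
    # Each observed posture is linear in the core: T[p, i] = kron(u_person[p], u_motion[i]) @ C.
    design = np.stack([np.kron(u_person[p], u_motion[i]) for p, i in observed])
    targets = np.stack([data[p, i] for p, i in observed])
    core_flat, *_ = np.linalg.lstsq(design, targets, rcond=None)
    core = core_flat.reshape(r_person, r_motion, d)
    # Eq. (4.1) with an identity factor on the feature mode.
    return np.einsum("abf,pa,ib->pif", core, u_person, u_motion)
```

On synthetic data that follows this low-rank model exactly, the missing input postures are recovered to numerical precision. On real motion data, the ranks `r_person` and `r_motion` have to be chosen, and the recovery is only approximate.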
Figure 4.5 Our scheme of arranging training postures into tensor data, where the green rectangles represent unknown object postures in the tensor. In tensor decomposition, we extract the motion factor only from the training sequences as indicated by the red rectangles and the person factor from the columns with complete postures as indicated by the blue rectangles.