3. Driving Character using Motion Sensors
3.6 Action Graph
Without a graph structure, two similar motion clips that belong to different motion categories can be mistaken for one another when human motion is reconstructed. The result is a visually discontinuous motion, and such an artificial-looking result is hardly acceptable in any interactive application. Although the discontinuity can be smoothed with convolution-based filtering algorithms, such as low-pass filtering, this is not an ideal solution because it only alleviates the discontinuity. A more direct remedy is to increase the number of frames in a clip, which reduces how often discontinuous motion has to be synthesized and can still run in a real-time application.
However, the critical side effect of this method is latency. As the number of frames in a training motion clip increases, the time needed to accumulate the sensor data received online also grows. In addition, the time needed to search for a training motion clip increases. Finally, the synthesis problem still exists; only the frequency at which two motion clips are combined decreases. Therefore, to prevent the loss of smoothness that arises when two discontinuous motion clips are synthesized because of ambiguous sensor data, we propose constructing an action graph. The idea of the action graph is similar to that of the motion graph in [KGP02]. A graph node represents a training motion clip from the training motion, and a graph edge is an automatically generated transition.
3.6.1 Detecting candidate transitions
Given the training motion clips from the training motion database, we calculate the total Euclidean distance for each pair of clips, that is, the difference between the positions in the end frame of one clip and the positions in the start frame of the other. It is not necessary to consider the middle frames of a clip because a clip is short. After we have the distance between each pair of training motion clips, we normalize the differences in order to find the candidate transitions. We then construct a distance grid plot, each element of which contains the normalized difference from one training motion clip to another. This plot is useful both for finding candidate transitions and for the probabilistic computation in the hidden Markov model.
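The distance computation described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the clip layout (a list of frames, each a list of 3D joint positions), the function names `distance_grid` and `normalize_grid`, and the choice of min-max scaling for the normalization step are all assumptions, since the text does not specify how the differences are normalized.

```python
import math

def distance_grid(clips):
    """Total Euclidean distance from the end frame of clip i to the start frame of clip j.

    Assumed layout: clips[i][f][k] is the (x, y, z) position of joint k in frame f.
    """
    n = len(clips)
    grid = [[0.0] * n for _ in range(n)]
    for i, clip_i in enumerate(clips):
        for j, clip_j in enumerate(clips):
            end_frame, start_frame = clip_i[-1], clip_j[0]
            # Sum the per-joint distances; middle frames are ignored.
            grid[i][j] = sum(math.dist(p, q) for p, q in zip(end_frame, start_frame))
    return grid

def normalize_grid(grid):
    """Scale all entries to [0, 1] with min-max scaling (an assumed normalization)."""
    values = [d for row in grid for d in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in grid]
    return [[(d - lo) / (hi - lo) for d in row] for row in grid]

# Two toy clips with two frames and a single joint each.
clip_a = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
clip_b = [[(1.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]
grid = normalize_grid(distance_grid([clip_a, clip_b]))
print(grid)  # [[0.25, 0.0], [1.0, 0.75]]
```

In this toy case the entry from `clip_a` to `clip_b` is zero because the end pose of the first clip coincides with the start pose of the second, so that pair is the strongest transition candidate.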
To search for candidate transitions automatically, the user sets Gthreshold, a threshold value between zero and one. Every difference below Gthreshold forms a candidate transition, which becomes an edge of the graph. No single threshold value suits all motions; the appropriate value depends on the motion categories the user wants to perform. If the motion is common in everyday life, such as walking, jumping, or running, we suggest a lower threshold. A lower threshold provides smoother transitions between clips, which matters because people are sensitive to these common motions and easily notice an unusual transition. If, however, the motion is of a specific type such as ballet or yoga, we recommend a higher threshold. A higher threshold yields a more highly connected graph, so stretching or spinning moves with large differences still have flexible candidate transitions within those motion types.
Figure 3.11: An example with three motions. The distance grid plot shows the distance between each pair of clips. (White points represent shorter distances, red points represent distances under the threshold, and black points represent longer distances.)
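The thresholding step can be illustrated with a short sketch. The function name `candidate_transitions` and the nested-list grid are hypothetical, and keeping the diagonal entries (a clip followed by itself) is an assumption, since the text does not say whether such pairs are excluded.

```python
def candidate_transitions(normalized_grid, g_threshold):
    """Return (i, j) pairs whose normalized difference lies below g_threshold.

    normalized_grid[i][j] is the normalized difference from clip i to clip j.
    Diagonal entries are kept here; whether a clip may follow itself is an assumption.
    """
    if not 0.0 <= g_threshold <= 1.0:
        raise ValueError("g_threshold must lie between zero and one")
    return [
        (i, j)
        for i, row in enumerate(normalized_grid)
        for j, difference in enumerate(row)
        if difference < g_threshold
    ]

toy_grid = [[0.25, 0.0], [1.0, 0.75]]
strict = candidate_transitions(toy_grid, 0.3)  # fewer, smoother transitions
loose = candidate_transitions(toy_grid, 0.8)   # more transitions, better connectivity
```

With the toy grid, the lower threshold keeps two transitions and the higher one keeps three, which mirrors the trade-off between smoothness and connectivity described above.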
3.6.2 Constructing action graph
The action graph can be built once we have the training motion clips and the candidate transitions in the database. A training motion clip contains a set of frames with all the information about the character, such as the position of the root joint, the orientation of each joint, the reduced-dimensionality sensor training data, and the corresponding cluster number. Each training motion clip is a node of the graph. The length of a node, that is, the number of frames in a clip, should not be set too long because of latency: the longer a clip is, the longer the accumulation time (latency) needed to collect the sensor data received online. We use ten to twenty frames per clip in our experiments, and the system runs at about sixty frames per second (FPS). A candidate transition contains the incoming node, the outgoing node, and a small set of automatically generated frames that hold the position of the root joint and the orientation of each joint. Each candidate transition is an edge of the graph.
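One possible in-memory layout for the nodes and edges described above is sketched below. All class and field names are hypothetical, and storing joint orientations as quaternions is an assumption; the text only lists which pieces of information a clip and a transition carry. The `latency_seconds` method makes the latency argument concrete: at about sixty frames per second, a clip of ten frames takes roughly 0.17 s to accumulate and a clip of twenty frames roughly 0.33 s.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed orientation representation

@dataclass
class Frame:
    root_position: Vec3
    joint_orientations: List[Quat]

@dataclass
class ClipNode:
    """Graph node: one training motion clip."""
    frames: List[Frame]
    sensor_features: List[List[float]]  # reduced-dimensionality sensor data per frame
    cluster_id: int

    def latency_seconds(self, fps: float = 60.0) -> float:
        # Time needed to accumulate as many online sensor frames as the clip holds.
        return len(self.frames) / fps

@dataclass
class Transition:
    """Graph edge: a candidate transition between two clips."""
    source: int  # index of the clip the transition leaves
    target: int  # index of the clip the transition enters
    frames: List[Frame] = field(default_factory=list)  # generated in-between frames

def make_clip(num_frames: int) -> ClipNode:
    # Helper for illustration only: a clip of identical rest-pose frames.
    frame = Frame((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0, 0.0)])
    return ClipNode([frame] * num_frames, [[0.0]] * num_frames, cluster_id=0)
```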
To construct the action graph, we place all training motion clips as nodes and connect two nodes with an edge if they form a candidate transition. However, sink nodes and dead end nodes may exist in the graph, and they make the subsequent motion infeasible. A dead end node is a node with no outgoing edge, and a sink node is a node that is not connected to at least two other nodes. Consequently, the process of driving human motion by sensors halts if the system enters one of these nodes.
To fix this problem, we eliminate sink and dead end nodes by traversing all nodes and edges.
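A minimal sketch of such a pruning pass is given below. It follows the two definitions stated above, but the details are assumptions: the function name `prune_graph` is hypothetical, self-loops are not counted as outgoing edges, and the pass is repeated until nothing changes because removing one node can turn its neighbors into new dead ends.

```python
def prune_graph(num_nodes, edges, min_neighbors=2):
    """Remove dead end nodes and sink nodes from a directed graph.

    edges is a list of (source, target) index pairs. A node is dropped if it has
    no outgoing edge to another surviving node (dead end), or if it is connected
    to fewer than min_neighbors other surviving nodes (sink, as defined above).
    """
    alive = set(range(num_nodes))
    changed = True
    while changed:
        changed = False
        for node in sorted(alive):
            outgoing = {t for s, t in edges if s == node and t in alive and t != node}
            incoming = {s for s, t in edges if t == node and s in alive and s != node}
            if not outgoing or len(outgoing | incoming) < min_neighbors:
                alive.discard(node)
                changed = True
    kept_edges = [(s, t) for s, t in edges if s in alive and t in alive]
    return alive, kept_edges

# A three-clip cycle with one extra clip (3) that has no way out.
nodes, kept = prune_graph(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
print(sorted(nodes), kept)  # [0, 1, 2] [(0, 1), (1, 2), (2, 0)]
```

The repeated scan is quadratic in the worst case, which is acceptable for an offline preprocessing step on a graph of short clips; a worklist would be the natural refinement for larger databases.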
When the system traverses the graph at run time to reconstruct the motion of the avatar, it can synthesize smooth motion from the nodes and edges without spending extra time on calculating the transition frames.
Figure 3.12: An example of action graph.