5.1 Experiment
Our motion data follow the CMU Mocap data format. The dance motion data come from MikuMikuDance (MMD), a free 3D dance design program that makes it easy to compose dance motions and create 3D animation movies. We use the accompanying conversion tools to convert the MMD data to CMU Mocap data, and then remove extra joints such as the fingers and toes. Our training features use only about 20 joints, and we convert the hierarchical joint angles to position space. The sampling rate is 30 Hz for the mocap data and 44 kHz for the music data. The hop size and window size are 0.4 seconds and 1.2 seconds, respectively.
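The windowing parameters above can be made concrete with a short sketch. It assumes that both streams are cut into full windows starting at sample 0 and that 44 kHz means 44,000 samples per second. The function name window_starts is hypothetical, and this is not necessarily how our feature extraction is implemented.

```python
MOCAP_RATE_HZ = 30       # mocap sampling rate from the text
AUDIO_RATE_HZ = 44_000   # "44 kHz" taken literally (assumption)


def window_starts(num_samples, rate_hz, window_seconds=1.2, hop_seconds=0.4):
    """Return the start index of every full analysis window in a stream."""
    window = round(window_seconds * rate_hz)  # window length in samples
    hop = round(hop_seconds * rate_hz)        # hop length in samples
    # range() is empty when the stream is shorter than one window.
    return list(range(0, num_samples - window + 1, hop))


# Ten seconds of motion and ten seconds of audio.
motion_starts = window_starts(10 * MOCAP_RATE_HZ, MOCAP_RATE_HZ)
audio_starts = window_starts(10 * AUDIO_RATE_HZ, AUDIO_RATE_HZ)

print(len(motion_starts), len(audio_starts))
```

Under these assumptions a window covers 36 mocap frames or 52,800 audio samples. Both streams yield the same number of windows, so window k of the motion and window k of the music cover the same time interval.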
Results of dance motion and music segmentation and clustering. We show the results of the music feature segmentation and clustering by the two statistical models.
[Figure: music segmentation and clustering results for the three test songs, panels 1 to 3]
The first is “Bad Romance” by Lady Gaga. The second is “Just Dance”, also by Lady Gaga. The third is “ルカルカ★ナイトフィーバー” (Luka Luka★Night Fever), sung with a voice from the third Vocaloid 2 series.
All three test songs show segmentation and clustering results for the music. We use colors in place of alphabetic labels, and segments of the same color belong to the same cluster. The corresponding repeated clusters are easy to identify by eye.
[Figure: dance motion segmentation and clustering results for the three test motions, panels 1 to 3]
All dance motions were downloaded from uploaders on Nicovideo.
The first is from http://www.nicovideo.jp/watch/sm7821565
The second is from http://www.nicovideo.jp/watch/sm9194284
The third is from http://www.nicovideo.jp/watch/sm10469234
The dance motions for the three test songs likewise show segmentation and clustering results, and the corresponding repeated clusters are again easy to identify by eye.
Results of Ranking Algorithm. Several consistent results are shown as follows:
Note: Motion (Smo) and Music (Smu) may share a color such as yellow, but the same color in the two sequences does not denote the same cluster.
Example 1: Motion (Smo) and Music (Smu) come from the website http://donburiroom.blog8.fc2.com (Author: DONKEY). The motion and music correspond well because the pair was designed by hand.
[Figure: cluster label sequences Smu (music) and Smo (motion) for Data1]
Data1 is an existing pair. The repeated parts are marked by the orange window frames, and the pair's ranking score from our ranking algorithm is 0.17682.
The orange window frames make the repeated parts easy to find, and their consistency points are partly matched. This ranking score is the highest among the music in our database because the pair was composed by hand.
Example 2: Motion (Smo) is the same as in Example 1, and Music (Smu) was selected by our ranking algorithm. This song is one of the best-scoring results of our ranking algorithm.
Data2 is one of the best-fit results from our ranking algorithm. The repeated parts are marked by the orange window frames. Its ranking score is 0.06487.
The difference between Example 1 and Example 2 is the size of the orange window frames. In both examples the repeated parts are present and their consistency points are partly matched.
In these two examples, the repeated parts reveal the rhythmic structure, and these parts resemble a verse-chorus form. The ranking score of Example 1 is clearly higher than that of Example 2, and the orange window frames in Example 1 are also longer than those in Example 2. In our test data, Example 1 is an elaborate composition by its author, so the rhythmic structures of its motion and music mostly match. Our result in Example 2, by contrast, is not hand-made, yet its rhythmic structures also match.
5.2 Discussion
We discuss motion clustering, the weakness of our objective function, and some thresholds.
Motion clustering. Our similar-motion retrieval algorithm is based on the algorithm proposed by M. Levy and M. Sandler [LS08]. However, their algorithm is not aimed at similar-motion or heterogeneous retrieval, so we need to find strong training features that suit it. We first tried the raw motion capture data, which contain the global translation and the Euler angles of the joints. The raw data form the rows of the training matrix. After constructing the training matrix, we trained extensively with the first statistical model (HMM) and the second statistical model (EM algorithm). Unfortunately, the raw data are not a strong feature for HMM training, because the Euler angles of the joints do not clearly distinguish two poses.
Therefore, we convert the hierarchical joint angles to position space, where two different poses are easy to distinguish. After training the two statistical models on these features, the clustering result clearly separates the different motions. We therefore choose the position space for our features.
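The conversion from hierarchical joint angles to joint positions is a forward kinematics pass. The sketch below is illustrative only: the three-joint chain, the helper names rot_z and joint_positions, and the restriction to rotations about one axis are assumptions made for brevity, not our actual skeleton or converter. It also shows one possible reason why raw angles can be a weak feature: two numerically distant angle values (90 and 450 degrees) describe the same pose, and only the position features coincide.

```python
import numpy as np


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def joint_positions(parents, offsets, local_rotations, root_position):
    """Forward kinematics: accumulate rotations down the joint hierarchy.

    parents[i] is the index of joint i's parent (-1 for the root); every
    parent must appear before its children.
    """
    n = len(parents)
    world_rot = [None] * n
    world_pos = np.zeros((n, 3))
    for i in range(n):
        p = parents[i]
        if p < 0:
            world_rot[i] = local_rotations[i]
            world_pos[i] = root_position
        else:
            # A joint sits at its parent's position plus its own offset,
            # expressed in the parent's world orientation.
            world_pos[i] = world_pos[p] + world_rot[p] @ offsets[i]
            world_rot[i] = world_rot[p] @ local_rotations[i]
    return world_pos


# Hypothetical chain: root -> elbow -> wrist, one unit per bone.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
origin = np.zeros(3)

pose_a = joint_positions(parents, offsets, [rot_z(90), rot_z(0), rot_z(0)], origin)
pose_b = joint_positions(parents, offsets, [rot_z(450), rot_z(0), rot_z(0)], origin)

# One feature row per frame: all joint positions flattened into a vector.
feature_a = pose_a.reshape(-1)
feature_b = pose_b.reshape(-1)
print(np.allclose(feature_a, feature_b))
```

In this toy case the angle vectors differ by 360 degrees in one component, while the flattened position vectors are equal up to floating-point error.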
Objective function. Our approach focuses mainly on repeated structure. In our corresponding repeated cluster algorithm, we consider only whether the two rhythmic structures match, so users need no additional domain knowledge. We do not consider whether a motion verse part is paired with a music chorus part. We may therefore still find unexpected pairs with a high ranking score.
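This weakness can be illustrated with a toy comparison that looks only at repetition and ignores what each label means. The sketch is not our objective function. The function names, the pairwise-agreement score, and the label strings are hypothetical, and it assumes both sequences have the same number of segments.

```python
def repetition_matrix(labels):
    """matrix[i][j] is True when segments i and j carry the same label."""
    return [[a == b for b in labels] for a in labels]


def structure_agreement(motion_labels, music_labels):
    """Fraction of segment pairs on which the two repetition structures agree."""
    if len(motion_labels) != len(music_labels):
        raise ValueError("sequences must have the same number of segments")
    n = len(motion_labels)
    mo = repetition_matrix(motion_labels)
    mu = repetition_matrix(music_labels)
    agree = sum(mo[i][j] == mu[i][j] for i in range(n) for j in range(n))
    return agree / (n * n)


motion = list("ABABCC")         # hypothetical motion labels (Smo)
music_swapped = list("BABACC")  # same repetition pattern, roles of A and B swapped
music_other = list("AABBCC")    # a different repetition pattern

print(structure_agreement(motion, music_swapped))
print(structure_agreement(motion, music_other))
```

The swapped sequence agrees with the motion on every segment pair, so a purely structural score cannot tell a verse aligned with a verse from a verse aligned with a chorus. The third sequence scores lower because its repeats fall in different places.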
Thresholds. We have noted that it is important to choose strong features for the two statistical models. Even when strong features are chosen, the training data must also have a verse-chorus form. If a song or dance motion is monotonous, the two statistical models may fail to produce a correct training result. The main reason is that the statistical models rely on the repeated parts as training seeds.