Using Back Propagation Model to Design a MIDI Music Classification System


Int. Computer Symposium, Dec. 15-17, 2004, Taipei, Taiwan.

Yo-Ping Huang and Guan-Long Guo
Dept. of Computer Science and Engineering, Tatung University, Taipei, Taiwan 10451
Email: yphuang@ttu.edu.tw

Chang-Tien Lu
Dept. of Computer Science, Virginia Tech, Falls Church, VA 22043
Email: ctlu@vt.edu

Abstract-The main purpose of this paper is to investigate how to develop an effective classification system that first categorizes the characteristics of MIDI music files and then searches for similar music on the Internet. In this system, a back propagation network is trained to categorize the characteristics of MIDI music. Many search engines now provide efficient ways to search for music. However, those search engines only search files by the names of the music and cannot categorize and compare music according to its characteristics. In this paper, we select representative songs of eight specific music categories to construct a module that identifies the type of music by means of a back propagation network. We introduce the theoretical basis of music classification and present experimental results to validate the effectiveness of the proposed model.

Keywords: MIDI music, back propagation network, musical and percussion instruments.

1. Introduction

MIDI is an abbreviation of Musical Instrument Digital Interface [13-14], a communication specification proposed in January 1983. Because of this specification, music files produced by different electronic instruments can be exchanged with each other, which has made the development of electronic instruments rapid and convenient. The unit that an electronic instrument uses to make sounds is a channel. Before a channel can make a sound, it has to be told the kind of instrument, when to make the sound, the volume of the sound, the musical scale, and how long the sound will last. The only function of MIDI is to send these commands to an electronic instrument, and the instrument creates the sound according to them. The better the electronic instrument, the more accurately it can reproduce the original music and convey its style.

Although MIDI defines the functions of electronic instruments, different electronic instruments may use different codes for different instruments. For example, code 1 may mean a piano on electronic instrument A but a trumpet on instrument B. In this case, even though music created with A can be replayed on B, the style of the music will be totally different and cannot faithfully represent the original flavor of the melody. To solve this kind of problem, the general MIDI (GM) instrument map was constructed. All electronic instruments follow this table to create sounds and replay the music as it was written. The only difference is the tone and character of each electronic sound. GM defines 128 kinds of musical instruments and 47 kinds of percussion instruments, and classifies the musical instruments into 16 categories [13-14]. This seemed sufficient in the beginning, but recently many companies have added as many tones as they can to their electronic instruments, in the hope that their products will be more competitive in the market and distinguishable from others.

Because MIDI records only music data, a MIDI file is much smaller than a file that records wave data. This kind of music is popular on the network because bandwidth is limited. A melody has many characteristics. For users to compare and search for similar music, we have to categorize these characteristics in a more systematic way. In general, the characteristics of a melody include:
(1) Tempo: We can easily recognize whether the music has a slow or a fast tempo. This is an obvious characteristic.
(2) The kind of musical instrument used: Whether a song uses a bright piano or an expressive violin is an important feature of its style.
(3) The number and proportion of each musical instrument used in a melody: Most people respond differently to a solo than to a large-scale symphony orchestra. Therefore, the number and proportion of each musical instrument used in a melody should also be considered in music classification.

We investigate the above characteristics to determine the style of a song and construct a model that can compare and search for similar music files more conveniently than most search engines. The next question is how to construct an effective model to fulfill this goal. An artificial neural network is a data processing system that imitates the neural network of human beings. Such a system can continually evolve its way of thinking, that is, think like human beings and learn from its experience. The back-propagation model is one of the most

well-known artificial neural networks [1-3,7-12]. Its basic principle is to use the gradient steepest descent method to minimize the error function. By introducing hidden layers into the network, the back propagation model gains a good capability for mapping the relationship between inputs and desired outputs. We thus train a back propagation model to establish a classification system that can categorize MIDI music files.

2. The Classification of MIDI Music Files

MIDI files have two kinds of formats, format 0 and format 1. Format 0 is an early format that has only one area for music data. Most of the music files on the network use this format. In contrast, format 1 has many areas for music data and can be used for complex electronic instruments. We use format 0 in this paper to analyze the music. Because most mobile phones use MIDI as their ringtone format, many MIDI files are created specifically for mobile phone users. Since these files have only one track and contain incomplete music data, we did not use them, so that the real characteristics of the music files could be retrieved.

There are 128 kinds of MIDI musical instrument sounds, which can be divided into 16 categories according to their characteristics. Table 1 lists these 16 MIDI classes. There are 47 kinds of MIDI percussion instrument sounds, which can be divided into 5 categories based on their types. Table 2 lists the general MIDI percussion map.

Table 1. General MIDI instrument map (Channels 1-16, except 10).
Prg#  Instrument
1st kind: Keyboards
001   Acoustic Grand Piano
002   Bright Acoustic Piano
003   Electric Grand Piano
004   Honky-Tonk Piano
005   Electric Piano 1
006   Electric Piano 2
007   Harpsichord
008   Clavinet
2nd kind: Chromatic Percussion
3rd kind: Organ
4th kind: Guitar
5th kind: Bass
6th kind: Strings
7th kind: Ensemble
8th kind: Brass
9th kind: Reed
10th kind: Pipe
11th kind: Synth Lead
12th kind: Synth Pad
13th kind: Synth Effects
14th kind: Ethnic
15th kind: Percussive
16th kind: Sound Effects

Table 2. General MIDI percussion map (Channel 10).
Key#  Percussive Instrument
1st kind: Bass Drum
35    Acoustic Bass Drum
36    Bass Drum 1
64    Low Conga
66    Low Timbale
68    Low Agogo
2nd kind: Tom
41    Low Floor Tom
43    High Floor Tom
45    Low Tom
47    Low-Mid Tom
48    Hi-Mid Tom
50    High Tom
54    Tambourine
62    Mute Hi Conga
63    Open Hi Conga
65    Hi Timbale
67    Hi Agogo
3rd kind: Snare
37    Side Stick
38    Acoustic Snare
40    Electric Snare
60    Hi Bongo
61    Low Bongo
4th kind: Hat
42    Closed Hi-Hat
44    Pedal Hi-Hat
46    Open Hi-Hat
49    Crash Cymbal 1
51    Ride Cymbal 1
52    Chinese Cymbal
55    Splash Cymbal
57    Crash Cymbal 2
59    Ride Cymbal 2
5th kind: Others
39    Hand Clap

53    Ride Bell
56    Cowbell
58    Vibraslap
69    Cabasa
70    Maracas
71    Short Whistle
72    Long Whistle
73    Short Guiro
74    Long Guiro
75    Claves
76    Hi Wood Block
77    Low Wood Block
78    Mute Cuica
79    Open Cuica
80    Mute Triangle
81    Open Triangle

3. The Construction of System Module

MIDI files record a lot of music information, and each file contains different kinds of formats. We have to understand this information before we can analyze it. By simple statistical analysis, the system calculates the values of these characteristics. We can then determine the tempo of the music, the number of tracks and tones in it, and the kinds of musical instruments used. Through this analysis, we obtain practical data that can be used to analyze the music.

To establish a classification model for the music files, we first need to find representative songs. These songs are regarded as the training samples for the model. The basic steps to classify and retrieve the music files are as follows:
(1) Analyze the MIDI structure: We analyze the original MIDI file structure to extract the characteristic values.
(2) Select the representative training samples: The representative MIDI songs are regarded as the training samples for the back propagation model.
(3) Find a better set of system parameters: We repeat the training process to find a better set of system parameters, such as the number of hidden nodes and the learning rate, for the back propagation model.
(4) Retrieve other MIDI files: The well-trained model is then used as the basis to search and analyze other MIDI files on the Internet.

MIDI music has many characteristics, for example, the beat, the number of tracks, the classified timbre, etc. These characteristics cannot be directly fed to the training network. Instead, the characteristic values need to be normalized into [-1, 1] before being used by the back propagation model. Our back propagation model has 23 input nodes, 15 hidden nodes, and 8 output nodes. The 23 characteristic values of the music are summarized in Table 3. Based on different trials in our simulations, we use 15 nodes in the hidden layer. We use 8 nodes in the output layer to represent the 8 music categories listed in Table 4. For example, the music 1_LULLAB.MID has the corresponding output "Blue" (output 1). The overall structure of the proposed back propagation model is shown in Fig. 1. Table 5 lists a training sample of an input-output pattern for the network.

Table 3. The meaning of the 23 characteristic values for the input nodes in the back propagation model.
Input no.  Meaning
1     Tempo
2     Number of tracks
Channels 1-16, except 10:
3     Proportion of Keyboards classification
4     Proportion of Chromatic Percussion classification
5     Proportion of Organ classification
6     Proportion of Guitar classification
7     Proportion of Bass classification
8     Proportion of Strings classification
9     Proportion of Ensemble classification
10    Proportion of Brass classification
11    Proportion of Reed classification
12    Proportion of Pipe classification
13    Proportion of Synth Lead classification
14    Proportion of Synth Pad classification
15    Proportion of Synth Effects classification
16    Proportion of Ethnic classification
17    Proportion of Percussive classification
18    Proportion of Sound Effects classification
Channel 10:
19    Proportion of Bass Drum classification
20    Proportion of Tom classification
21    Proportion of Snare classification
22    Proportion of Hat classification
23    Proportion of Others classification

Table 4. The 8 music categories.
Output no.  Meaning
1     Blue

2     Classical
3     Dance
4     Country
5     Funk
6     Jazz
7     Pop
8     Rock

Fig. 1. The overall structure of the back propagation model. (Diagram: 23 input nodes I1-I23, 15 hidden nodes H1-H15, and 8 output nodes O1-O8.)

Table 5. A training sample of the Blue category (file name: 1_LULLAB.MID).
Serial number  Value
Input 1     -0.883721
Input 2     -0.571429
Input 3     -0.133333
Input 4     -1.000000
Input 5     -1.000000
Input 6     -0.571429
Input 7     -0.666667
Input 8     -1.000000
Input 9     -1.000000
Input 10    -1.000000
Input 11    -0.333333
Input 12    -1.000000
Input 13    -1.000000
Input 14    -1.000000
Input 15    -1.000000
Input 16    -1.000000
Input 17    -1.000000
Input 18    -1.000000
Input 19    -1.000000
Input 20    1
Input 21    0
Input 22    -1.000000
Input 23    -1.000000
Output 1    1
Output 2    0
Output 3    0
Output 4    0
Output 5    0
Output 6    0
Output 7    0
Output 8    0

4. Experimental Results and Discussions

Since there is no definite method to decide an appropriate learning rate for the back propagation model, we performed several simulations to analyze how the learning rate affects the convergence speed. Table 6 compares the final training errors for different learning rates under the same initial condition and 10,000 training iterations. Based on the final training errors from the experiments, we found that the larger the learning rate, the smaller the training error. It therefore seems better to select a larger learning rate for the back propagation model. However, a smaller training error on the training samples may not correspond to a better test result. In our experience, an acceptable set of network parameters should be good for both the training and the test patterns. In our model, the best classification result is obtained when the learning rate equals 2.5. Fig. 2 plots the correct training result for this case. When the learning rate is set to 3.0, we may obtain an incorrect result, as shown in Fig. 3.

Table 6. The training errors from different learning rates in the back propagation model.
Learning rate  Training error
0.1   0.009247748342085
0.5   0.001570179186564
1     0.000717116696915
1.5   0.000497303467918

The well-trained back propagation model is then used to search for similar music files. For example, in Fig. 4, when users click the 01.MID filename on the left-hand side, the system searches for similar music files and lists them in descending order of similarity. Careful analysis shows that, although the tempo and the middle musical instruments differ a little, most characteristics of the 01.MID and 7_WALKLI.MID files are the same. Fig. 5 shows this comparison. For comparison, if we use 01.MID to search for navigation and tempo, then the closest file is BROWNJUG.MID, as given in Fig. 6. In Fig. 7, we can see that both music files match well except for slight differences in some musical instruments. Although BROWNJUG.MID has been classified as classical music, it also partially belongs to the rock category, as shown in Fig. 8.
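The training setup described in Sections 3 and 4 can be illustrated with a minimal sketch. It is not the authors' implementation. Only the 23-15-8 layer sizes, the [-1, 1] input range, and the idea of comparing learning rates from the same initial condition come from the paper. The sigmoid activation, the bias handling, the use of mean squared error as the "training error", the min-max `normalize` helper, and the synthetic samples are all assumptions made for illustration.

```python
import math
import random

N_IN, N_HID, N_OUT = 23, 15, 8  # layer sizes given in the paper


def normalize(value, lo, hi):
    """Min-max scaling of value from [lo, hi] to [-1, 1] (assumed scheme)."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hid, w_out):
    """Return (hidden activations, output activations) for one input vector."""
    xb = list(x) + [1.0]  # the constant 1.0 feeds the bias weight
    hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    hb = hid + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return hid, out


def mean_squared_error(samples, w_hid, w_out):
    total = 0.0
    for x, target in samples:
        _, out = forward(x, w_hid, w_out)
        total += sum((o - t) ** 2 for o, t in zip(out, target))
    return total / (len(samples) * N_OUT)


def train(samples, learning_rate, epochs, seed=0):
    """Per-sample gradient descent on the squared error (plain back propagation)."""
    rng = random.Random(seed)  # same seed gives the same initial weights for every run
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(N_HID + 1)] for _ in range(N_OUT)]
    for _ in range(epochs):
        for x, target in samples:
            hid, out = forward(x, w_hid, w_out)
            # Error terms (deltas) of the output layer.
            d_out = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
            # Propagate the deltas back to the hidden layer before updating w_out.
            d_hid = [h * (1.0 - h) * sum(d_out[k] * w_out[k][j] for k in range(N_OUT))
                     for j, h in enumerate(hid)]
            xb, hb = list(x) + [1.0], hid + [1.0]
            for k in range(N_OUT):
                for j in range(N_HID + 1):
                    w_out[k][j] -= learning_rate * d_out[k] * hb[j]
            for j in range(N_HID):
                for i in range(N_IN + 1):
                    w_hid[j][i] -= learning_rate * d_hid[j] * xb[i]
    return w_hid, w_out


def make_synthetic_samples(seed=1):
    """One random 23-value vector per category with a one-hot target (not real MIDI data)."""
    rng = random.Random(seed)
    samples = []
    for category in range(N_OUT):
        x = [rng.uniform(-1.0, 1.0) for _ in range(N_IN)]
        target = [1.0 if k == category else 0.0 for k in range(N_OUT)]
        samples.append((x, target))
    return samples


if __name__ == "__main__":
    data = make_synthetic_samples()
    for lr in (0.1, 0.5, 2.5):
        w_hid, w_out = train(data, lr, epochs=200)
        print(f"learning rate {lr}: training MSE {mean_squared_error(data, w_hid, w_out):.6f}")
```

Because every run starts from the same seeded weights, differences in the printed errors come only from the learning rate. The numbers produced on synthetic data are not comparable with those in Table 6.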

Table 6 (continued).
Learning rate  Training error
2     0.000370886425268
2.5   0.000307177201952
3     0.000103514806934
3.5   0.000028861870828

Fig. 2. A correct training result when the learning rate is 2.5.

Fig. 3. An incorrect training result when the learning rate is 3.0.

Fig. 4. Using the 01.MID file to find the similar 7_WALKLI.MID file.

Fig. 5. Although the tempo and the middle musical instruments differ a little, most characteristics of 01.MID and 7_WALKLI.MID are the same. (Plot: values between -1 and 1 over the positions I1 to O8; series 01.MID and 7_WALKLI.MID.)

Fig. 6. If we use 01.MID to search for navigation and tempo, then the closest file is BROWNJUG.MID.

Fig. 7. The good match between 01.MID and 7_WALKLI.MID except for slight differences in some musical instruments. (Plot: values between -1 and 1 over the positions I1 to O8; series 01.MID and BROWNJUG.MID.)
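The paper lists similar files in descending order of similarity but does not state which similarity measure is used. The sketch below is therefore only one plausible reading. It assumes each file is described by a vector of values in [-1, 1] (such as the positions I1 to O8 plotted in the comparison figures) and scores two files by 1 / (1 + Euclidean distance). The function names, the `indices` parameter for restricting a query to selected positions, and the sample vectors are all hypothetical.

```python
import math


def similarity(a, b, indices=None):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance) over the chosen positions."""
    if indices is None:
        indices = range(len(a))
    dist = math.sqrt(sum((a[i] - b[i]) ** 2 for i in indices))
    return 1.0 / (1.0 + dist)


def rank_similar(query_name, library, indices=None):
    """Return (name, similarity) pairs for all other files, most similar first."""
    query = library[query_name]
    scored = [(name, similarity(query, vec, indices))
              for name, vec in library.items() if name != query_name]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up 4-value vectors standing in for the full per-file vectors.
    library = {
        "query.mid": [0.2, -1.0, 0.5, 1.0],
        "close.mid": [0.1, -1.0, 0.4, 1.0],
        "far.mid": [-0.9, 0.8, -1.0, 0.0],
    }
    for name, score in rank_similar("query.mid", library):
        print(f"{name}: {score:.3f}")
    # Restrict the comparison to position 0 only (for example, a tempo-only query).
    print(rank_similar("query.mid", library, indices=[0])[0][0])
```

Restricting `indices` changes which files rank highest, which is one way a query on selected characteristics could return a different closest file than a query on all of them.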

Fig. 8. Although BROWNJUG.MID has been classified as classical music, it also partially belongs to the rock category.

5. Conclusion

With the advent of advanced network technology, multimedia files circulate easily on the Internet. Our goal is to design a system that can effectively classify music files and to apply this technology to searching multimedia files. It can improve on the classification accuracy of most search engines, which rely on input keywords to query files. After analyzing the characteristics of music, we identify the key factors for music classification, which include 16 musical instrument categories and 5 percussion instrument categories, and propose a classification model that categorizes MIDI files based on a well-trained back propagation model. Users can select a favorite piece of music to search for similar music files without using the music filename. Experimental results verified that the proposed system can fulfill the goal of providing a satisfactory MIDI classification model.

Future work can focus on training the proposed model with more music classes so that the classification model can cope with music classification in a changing world. In addition, fuzzy classification techniques can be used in the model to improve the handling of partial queries, such as music that belongs to POP with degree 0.4 and to ROCK with degree 0.6. We can also provide additional functionality in the user interface for users to input their preferences for instrument types or tempos to simplify the query.

Acknowledgment

This work is supported by National Science Council, Taiwan, R.O.C. under Grants NSC92-2213-E-036-017, NSC92-2516-S-036-001, and by Tatung University under Grant B9208-I02-025.

References

[1] J. Hertz, A. Krogh, and R. Palmer, Introduction to the Theory of Neural Computing, Addison Wesley Publishing Company, Redwood City, CA, USA, 1991.
[2] An Introduction to Back-Propagation Neural Networks, Http://www.seattlerobotics.org/encoder/nov98/neural.html.
[3] H. White, "Economic prediction using neural networks: the case of IBM daily stock returns," IEEE Int. Conf. on Neural Networks, San Diego, CA, vol. 2, pp. 451-458, July 1988.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, 2001.
[5] B.Y. Ricardo and R.N. Berthier, Modern Information Retrieval, Addison-Wesley/ACM Press, New York, USA, 2002.
[6] Y.H. Ke, An Efficient Inference Model for Personalized Data Mining System, Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, 2002.
[7] W.J. Hsieh, The Analysis and Application of Grey Model and Back-propagation Network to the Premium Rate Service, Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, 2003.
[8] P. Werbos, Beyond Regression: New Tool for Prediction and Analysis in the Behavioral Sciences, Ph.D. Thesis, Harvard University, 1974.
[9] G.A. Carpenter and S. Grossberg, "ART2: self-organization of stable category recognition codes for analog input patterns," Applied Optics, vol. 26, pp. 4919-4930, 1987.
[10] T.P. Hong, C.S. Kuo, and S.C. Chi, "A fuzzy data mining algorithm for quantitative values," 3rd Int. Conf. on Knowledge-Based Intelligent Information Engineering Systems, pp. 480-483, 1999.
[11] A. Kaufmann and M.M. Gupta, Fuzzy Mathematical Models in Engineering and Management Science, Amsterdam: North-Holland, 1988.
[12] R.R. Yager and D.P. Filev, Essentials of Fuzzy Modeling and Control, John Wiley & Sons Inc, USA, 1994.
[13] Musical Instrument Digital Interface (MIDI), Http://www.indiana.edu/~emusic/MIDI.html#References.
[14] Standard MIDI Files 1.0, Http://ourworld.compuserve.com/homepages/mark_clay/MIDIfile.htm.
