
3. Acoustic Signal Analysis

3.2. Sound Analysis

3.2.1. Pitch Discrimination

As mentioned before, the recorded samples used in this thesis were taken from a replicated set of the Marquis Yi’s Chime-bells. Although the set was well reproduced, slight deviations might still exist. According to [6], the fundamental frequencies of all bells were measured by three different groups after the discovery of the tomb of Marquis Yi of Zeng.

These data can be used to investigate the differences between the original set and the replicated set. The fundamental frequencies of the recorded samples in this thesis were measured using the Fast Fourier Transform (FFT) with a Hamming window and a window length of 8096 samples. The pitch difference in fundamental frequency between the two sets can thus be calculated.
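The measurement step described above can be sketched in Python as follows. This is only an illustration, not the MATLAB code used in the thesis: a direct DFT evaluated over a limited search band stands in for the FFT (both give the same bin magnitudes), and the function names, the search band and any sampling rate passed in are assumptions made for the example.

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def estimate_fundamental(samples, fs, f_min, f_max):
    """Return the centre frequency (Hz) of the strongest DFT bin in [f_min, f_max]."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hamming(n))]
    k_lo = max(1, math.ceil(f_min * n / fs))
    k_hi = min(n // 2, math.floor(f_max * n / fs))
    best_k, best_mag = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        # Direct evaluation of DFT bin k using a rotating phasor.
        step = cmath.exp(-2j * math.pi * k / n)
        phasor = 1 + 0j
        acc = 0j
        for x in windowed:
            acc += x * phasor
            phasor *= step
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n
```

The estimate is quantised to the bin spacing fs/N, so a longer window gives a finer frequency grid at the cost of time resolution.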

Pitch discrimination is the ability to hear a small difference in pitch. According to the study in [41], the average threshold for adults is about 3 Hz at a pitch of 435 Hz, which is about 1/17 of a whole tone. However, for people who are very sensitive to sound, the threshold can be less than 1/100 of a tone. In musicology, the interval of a semitone is divided into 100 cents, so a whole tone spans 200 cents. Therefore, the thresholds described above can be converted to about 12 cents and 2 cents respectively. In equal temperament, the frequency ratio between two adjacent notes is $\sqrt[12]{2} \approx 1.0595$, and the frequency ratio of a whole tone is $1.0595^2 \approx 1.1225$. The average threshold of pitch discrimination can then be converted to a percentage error of about 0.72% (0.1225/17).
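As a worked check of the conversions above, the following Python sketch computes the interval in cents between two frequencies and the percentage threshold. The helper names are illustrative. The 0.72% figure corresponds to dividing the whole-tone frequency excess linearly by 17; an exact logarithmic conversion of 12 cents gives a slightly smaller value of about 0.70%.

```python
import math


def cents(f_ref, f):
    """Interval from f_ref to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)


def cents_to_percent(c):
    """Relative frequency change (in percent) for an interval of c cents."""
    return (2.0 ** (c / 1200.0) - 1.0) * 100.0


semitone_ratio = 2.0 ** (1.0 / 12.0)        # about 1.0595
whole_tone_ratio = semitone_ratio ** 2      # about 1.1225

# 3 Hz at 435 Hz, expressed in cents (close to 200 / 17).
threshold_cents = cents(435.0, 438.0)

# Linear split of the whole-tone excess into 17 parts, as in the text.
threshold_percent = (whole_tone_ratio - 1.0) / 17.0 * 100.0
```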

To find out whether there is a significant difference in pitch for each bell, the absolute percentage error (APE) between the original set and the replica is calculated with (3.1), where $V_a$ is the actual value and $V_m$ is the measured value:

$$\text{APE} = 100\% \times \left| \frac{V_a - V_m}{V_a} \right| \qquad (3.1)$$

For each bell, the mean value of the three measurements from [6] is used as the actual value. The APE of all bells has been estimated. The results are shown in Fig. 3.4, 3.5 and 3.6.
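The APE calculation can be sketched as below. The function names and the frequency values are hypothetical and serve only to illustrate the procedure of averaging the three reference measurements and comparing the result with the 0.72% threshold.

```python
from statistics import mean

THRESHOLD_PERCENT = 0.72  # pitch-discrimination threshold derived in the text


def absolute_percentage_error(v_actual, v_measured):
    """APE in percent between an actual and a measured value."""
    return 100.0 * abs(v_actual - v_measured) / abs(v_actual)


def ape_against_reference(reference_measurements, replica_frequency):
    """Use the mean of the reference measurements as the actual value."""
    v_actual = mean(reference_measurements)
    return absolute_percentage_error(v_actual, replica_frequency)


# Hypothetical numbers, not taken from the thesis or from [6].
example_reference = [256.0, 256.4, 255.6]   # three measurements of one tone (Hz)
example_replica = 257.0                     # measured on the replica (Hz)
example_ape = ape_against_reference(example_reference, example_replica)
example_under_threshold = example_ape < THRESHOLD_PERCENT
```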

Figure 3.4 APE of the bells on the first rack

Figure 3.5 APE of bells on the second rack

Figure 3.6 APE of the bells on the third rack

Table 3.2 Counts of tones under the threshold

Rack     Under threshold   Exceeding threshold   Total   Percentage under
1st      31                9                     40      77.5%
2nd      33                13                    46      71.7%
3rd      31                11                    42      73.8%
Total    95                33                    128     74.2%

The counts of tones under the threshold are summarized in Table 3.2. Since there are 64 two-tone bells, a total of 128 tones have been evaluated. About 74.2% of the tones are under the threshold of pitch discrimination. That is to say, most of the fundamental frequencies of the replicated set are perceptually the same as those of the original set to most people. In addition, the first rack has the highest percentage of tones under the threshold, which suggests that the quality of the first rack could be higher. The recorded samples are therefore acceptable as the reference for the synthetic sounds.

3.2.2. Spectral Analysis

To gather more information from the recorded samples, both the time-domain and frequency-domain features of the signals have been examined. Among the 65 bells on the racks, 13 bells have fundamental frequencies higher than 1000 Hz, and 52 bells have fundamental frequencies lower than 1000 Hz. These two groups of bells have different acoustical features. For bells with higher fundamental frequencies, such as the first three bells on the top and middle levels of the rack, the decay times are short and the sounds are clearer and brighter. Taking 1_T_2 as an example, the waveforms of both the Sui and Gu tones are shown in Fig. 3.7. Both sounds decay quickly and smoothly after being struck. The spectra of 1_T_2 are shown in Fig. 3.8. The fundamental frequencies of the Sui tone and the Gu tone are 1391 Hz and 1780 Hz respectively. As discussed previously, the geometrical design of the Chime-bells causes the vibrational modes to exist in pairs, allowing two distinct tones to be produced on the same bell.

Although the modes (m,n)a and (m,n)b are supposed to suppress each other due to the positions of their nodes and antinodes, they still tend to coexist in both the Sui tone and the Gu tone.

Fig. 3.8 shows that the frequency components of both sounds are very similar except for their magnitudes. This result suggests that the sounds of the Chime-bells could be a weighted combination of two different groups of modes.

Figure 3.7 Waveforms of 1_T_2 (a) Sui tone (b) Gu tone

Figure 3.8 Spectra of 1_T_2 (a) Sui tone (b) Gu tone

In addition, the pairs can be identified by inspecting the spectra of the sounds. For instance, in Fig. 3.8, the peaks at 1391 Hz and 1780 Hz can be considered the first pair, the peaks at 2521 Hz and 2902 Hz the second pair, and the peaks at 3828 Hz and 4049 Hz the third pair. The frequencies of the second and third pairs are less than 2 and 3 times the fundamental frequencies of the Sui and Gu tones respectively. Similar features can be found in other bells with higher fundamental frequencies, such as 1_T_1 and 1_M_1. The spectral content of these bells is relatively simple compared to that of bells with lower fundamental frequencies.
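The ratios mentioned above can be checked directly from the peak frequencies quoted for 1_T_2. The sketch below assumes that the lower peak of each pair is compared with the Sui fundamental and the upper peak with the Gu fundamental; this assignment and the function name are illustrative choices.

```python
# Peak pairs (Hz) read from the text for bell 1_T_2: (lower peak, upper peak).
PAIRS_1_T_2 = [(1391.0, 1780.0), (2521.0, 2902.0), (3828.0, 4049.0)]


def pair_ratios(pairs):
    """Ratio of each pair to the first pair, taken peak by peak."""
    f_sui, f_gu = pairs[0]
    return [(low / f_sui, high / f_gu) for low, high in pairs]


ratios = pair_ratios(PAIRS_1_T_2)
for index, (r_sui, r_gu) in enumerate(ratios, start=1):
    print(f"pair {index}: {r_sui:.3f} x Sui fundamental, {r_gu:.3f} x Gu fundamental")
```

With these values the second pair lies below twice the fundamentals and the third pair below three times the fundamentals, which is the inharmonic behaviour the text describes.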

For bells with lower fundamental frequencies, such as bells on the middle and bottom levels of the rack, the decay times are slightly longer and the sounds are drier and darker because more partials lie at lower frequencies. In addition, the partials above 5000 Hz are strongly suppressed in these bells. This could be a low-pass effect caused by the installation of Mei [27]. Taking 1_M_10 as an example, the waveforms of both the Sui and Gu tones are shown in Fig. 3.9. Both sounds still attenuate quickly, except for some low-frequency partials which tend to last a little longer. The spectra of 1_M_10 are shown in Fig. 3.10. The fundamental frequencies of the Sui tone and the Gu tone are 323 Hz and 407 Hz respectively.

The weighted combination can still be observed in Fig. 3.10; for instance, the peak at 323 Hz appears weaker in the Gu tone. However, the distribution of pairs in the spectra differs from the previous case. The second and third pairs are more distant, and the ratios between these pairs and the first pair are higher than 2 or 3. Similar features can be found in other bells with lower fundamental frequencies, such as 1_M_11 and 1_B_1, and their spectral content is more complex than that of bells with higher fundamental frequencies. Additionally, the decay time of bells on the bottom level can be significantly longer, and the beating effect is stronger as well.

Figure 3.9 Waveforms of 1_M_10 (a) Sui tone (b) Gu tone

Figure 3.10 Spectra of 1_M_10 (a) Sui tone (b) Gu tone

4. Sound Synthesis of the Chime-Bells

4.1. Description of the Synthesis Model

Fig. 4.1 shows the block diagram of the sound synthesis model of the Chime-bells. The essential parts of the model are an input dynamic control and a bell model. The dynamic control consists of the excitation table and a low-pass filter, and it simulates the non-linear behavior of the strike. This is achieved by changing the low-pass filter according to the collision velocity Vc. The bell model consists mainly of two rails of inharmonic digital waveguides. The weighting of the waveguides and the parameters of the filters can be adjusted by the user to create different bell sounds.

Figure 4.1 Block diagram of the Chime-bell synthesizer

4.2. Dynamic filter and Extraction of Excitation Signals

For most musical instruments, playing with a different initial velocity changes not only the amplitude but also the timbre. To simulate this effect, a dynamic filter is needed. In real performance, different tools are used to play the Chime-bells according to their position on the racks. For bells on the top and middle levels of the rack, a small mallet is used to strike the bell. For bells on the bottom level of the rack, a wooden pillar wrapped in rubber is used. This mechanism is quite similar to the interaction between hammer and string in the piano [18], despite the differences in size and physical shape. In the hammer-string case, the force pulse from the hammer behaves non-linearly due to the felt on the hammer. Therefore, as the strike gets harder, the force pulse becomes narrower in the time domain and wider in the frequency domain. This idea is illustrated in Fig. 4.2:

Figure 4.2. Force pulses of different initial velocities

In this thesis, the interaction between the mallet and the bell is reduced to a single impulse. That is, no further contact occurs after the first collision. In addition, the bell is assumed to be static before the strike. As a result, the net velocity is equal to the input collision velocity Vc. A one-pole low-pass filter is used to simulate the spectral changes of the input excitation.

This filter can be implemented following the steps in [30]. The transfer function of the dynamic filter is shown in equation (4.1):

$$H_d(z) = \frac{b_0}{1 - a_1 z^{-1}} \qquad (4.1)$$

In (4.1), $a_1 = e^{-\pi L T}$, T is the sampling period (the reciprocal of the sampling frequency fs), and L is the desired bandwidth in Hz. Here the range of L from 0 to fs/2 is mapped to the range of the input velocity from 0 to 127. The coefficient $b_0$ is equal to $1 - a_1$, so that the peak gain, which occurs at z = 1 (DC), equals unity.

For every different collision velocity, a low-pass filter with different bandwidth will be set to modify the excitation signals. For instance, when the collision velocity is higher, the bandwidth of the filter will become wider.
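The velocity-dependent filter can be sketched as follows. Several points are assumptions made for illustration: the mapping from velocity to bandwidth is taken to be linear, the pole is computed as a1 = exp(-pi * L * T) with T = 1/fs (one common bandwidth-to-pole relation), and the function names are hypothetical.

```python
import math


def dynamic_filter_coefficients(velocity, fs, max_velocity=127):
    """Map a collision velocity (0..max_velocity) to one-pole coefficients."""
    # Assumed linear mapping: velocity 0..127 -> bandwidth L of 0..fs/2 Hz.
    bandwidth = (velocity / max_velocity) * (fs / 2.0)
    a1 = math.exp(-math.pi * bandwidth / fs)   # T = 1/fs
    b0 = 1.0 - a1                              # unity gain at DC (z = 1)
    return b0, a1


def one_pole_lowpass(x, b0, a1):
    """Apply y[n] = b0 * x[n] + a1 * y[n-1] to a list of samples."""
    y = []
    previous = 0.0
    for sample in x:
        previous = b0 * sample + a1 * previous
        y.append(previous)
    return y
```

A harder strike gives a smaller a1, hence a wider passband and a brighter excitation. At velocity 0 the bandwidth is zero and the filter output is silent.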

The excitation table is a collection of different excitation signals extracted through inverse filtering [17]. The recorded samples are passed through the inverse filter $A(z) = 1/H_w(z)$, where $H_w(z)$ denotes the transfer function of the inharmonic digital waveguides in the Chime-bell model. This inverse filter suppresses the least-damped partials in the original samples, leaving a short burst of the most-damped partials. As shown in Fig. 4.3, the residual excitation signal dies out quickly compared to the original signal. The excitation signal contains information about the strike sound and the prediction error of the bell model. Normally, excitation signals with a length of 50 to 100 ms are sufficient to be fed into the bell model.
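The idea of inverse filtering can be illustrated with a deliberately simplified resonator. In the sketch below the loop filters of the waveguide are reduced to a single scalar gain g, so the resonator is a plain feedback comb filter 1 / (1 - g z^-L) and its inverse is the FIR filter 1 - g z^-L. This simplification and the function names are assumptions; the thesis model uses a fractional delay, a loss filter and an all-pass filter in the loop.

```python
def comb_resonator(x, loop_len, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - loop_len]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        feedback = y[n - loop_len] if n >= loop_len else 0.0
        y[n] = x[n] + g * feedback
    return y


def inverse_filter(s, loop_len, g):
    """Inverse of the comb resonator: e[n] = s[n] - g * s[n - loop_len]."""
    return [s[n] - (g * s[n - loop_len] if n >= loop_len else 0.0)
            for n in range(len(s))]
```

Feeding a short burst through `comb_resonator` produces a long ringing signal, and `inverse_filter` with the same parameters recovers the burst. With a real recording the match between model and bell is imperfect, so the residual also carries the prediction error mentioned in the text.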

Figure 4.3 (a) Original signal and (b) excitation signal of 1_T_1 Sui tone

4.3. Bell Models

As described in section 3.2.2, the distribution of frequency components differs between bells with different fundamental frequencies. To cover the entire range of the bells, two bell models have been developed.

The block diagram of the first Chime-bell model is shown in Fig. 4.4. As discussed previously, the sound of the Chime-bells with higher fundamental frequencies (typically higher than 1000 Hz) can be considered a weighted combination of two groups of modes.

Therefore, two inharmonic digital waveguides are used to produce the partials of (m,n)a and (m,n)b respectively. The partials are determined by inspecting the spectra of the recorded sounds. The inharmonic digital waveguide is essentially a filtered feedback comb filter. The user can set the weighting coefficients of both rails arbitrarily. To imitate the real sounds, the coefficients can be set based on the dB ratio of their fundamental frequencies from spectral analysis. As mentioned in section 2.3, the loop filters of an inharmonic digital waveguide include a fractional delay filter $H_f(z)$, a loss filter $H_l(z)$ and an inharmonic all-pass filter $H_{ap}(z)$. The transfer function of a single inharmonic digital waveguide can be expressed as equation (4.2), where L is the length of the delay loop:

$$H_w(z) = \frac{1}{1 - H_f(z)\,H_l(z)\,H_{ap}(z)\,z^{-L}} \qquad (4.2)$$

Figure 4.4 Block diagram of the Chime-bell model 1
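The structure of model 1 can be sketched as two filtered feedback comb filters whose outputs are mixed with user-chosen weights. This is a simplified illustration, not the thesis implementation: the loop contains only an integer delay and a one-pole loss filter, while the fractional delay and inharmonic all-pass filters are omitted. All names and parameter values are hypothetical.

```python
def db_to_weight(level_db):
    """Convert a level in dB to a linear amplitude weight."""
    return 10.0 ** (level_db / 20.0)


def waveguide_rail(excitation, loop_len, loop_gain, damping, n_out):
    """Filtered feedback comb: delay line plus a one-pole low-pass loss filter."""
    y = [0.0] * n_out
    lp_state = 0.0
    for n in range(n_out):
        x = excitation[n] if n < len(excitation) else 0.0
        delayed = y[n - loop_len] if n >= loop_len else 0.0
        # Loss filter with unity DC gain; loop_gain < 1 sets the overall decay.
        lp_state = (1.0 - damping) * delayed + damping * lp_state
        y[n] = x + loop_gain * lp_state
    return y


def bell_model_1(excitation, rail_a, rail_b, weight_a, weight_b, n_out):
    """Weighted sum of two rails, one per group of modes."""
    out_a = waveguide_rail(excitation, n_out=n_out, **rail_a)
    out_b = waveguide_rail(excitation, n_out=n_out, **rail_b)
    return [weight_a * a + weight_b * b for a, b in zip(out_a, out_b)]
```

For example, `bell_model_1([1.0], {"loop_len": 20, "loop_gain": 0.99, "damping": 0.2}, {"loop_len": 16, "loop_gain": 0.99, "damping": 0.2}, 0.6, 0.4, 4000)` returns a decaying two-rail response to a unit impulse. Swapping the two weights changes which group of modes dominates.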

For bells with lower fundamental frequencies (typically lower than 1000 Hz), the model above should be adjusted according to the discussion in section 3.2.2. Since the fundamental frequencies are widely separated from the other partials, a concept similar to that in [33] is borrowed, in which frequencies are banded to account for this high inharmonicity. As shown in Fig. 4.5, two filtered digital waveguides are added to simulate the lowest two modes of the bell. A band-pass filter is connected to each digital waveguide to exclude unnecessary partials from the excitation. The original inharmonic digital waveguides, with low-order band-pass filters, generate the remaining partials of the other modes. This hybrid method gives the model the flexibility to match the partials of the real sounds while retaining the interpretation of the waveguides. Additionally, an optional resonator is included when the beating effect is strong.

Figure 4.5 Block diagram of the Chime-bell model 2

5. Results and Discussions

5.1. Synthetic Results

The synthetic results from the models described in the previous section are presented here. All of the synthesized signals were computed in MATLAB.

5.1.1. Results from model 1

The first model described in section 4.3 can be employed to simulate the sounds of the Chime-bells with higher fundamental frequencies. These bells typically have fundamental frequencies higher than 1000 Hz and are smaller in size. 1_T_1 and 1_M_1 are bells of this type. For 1_T_1, the fundamental frequencies of the Sui and Gu tones are 1986 Hz and 2332 Hz. For 1_M_1, the fundamental frequencies of the Sui and Gu tones are 1787 Hz and 2143 Hz. The parameters of the loop filters can be determined by analyzing the recorded samples. For example, the inharmonic all-pass filter can be adjusted to match the essential partials identified, and the loss filter can be set based on the T60 of the real sounds. The waveforms of the recorded and synthetic tones of 1_T_1 and 1_M_1 are shown in Fig. 5.1. Both the Sui and Gu tones have been simulated. The general shapes of the recorded and synthetic tones are similar. The spectra of these tones are shown in Fig. 5.2. There are fairly good matches in the lower-frequency partials. However, mismatches can be found in the higher-frequency partials.
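Setting the loss from a measured T60 can be illustrated with a frequency-independent loop gain. A signal that decays by 60 dB in T60 seconds must be attenuated by a factor of 10^(-3 L / (T60 fs)) on each pass through a loop of L samples. The sketch below uses this relation; the sampling rate and T60 value in the usage note are hypothetical, and a real loss filter would also be frequency dependent.

```python
def loop_length(fs, f0):
    """Nearest integer loop length (samples) for a fundamental of f0 Hz."""
    return round(fs / f0)


def loop_gain_from_t60(t60, loop_len, fs):
    """Per-loop gain giving a 60 dB decay after t60 seconds."""
    return 10.0 ** (-3.0 * loop_len / (t60 * fs))
```

For instance, with an assumed sampling rate of 44100 Hz, `loop_length(44100, 1986)` gives the integer part of the loop for the 1_T_1 Sui tone, and `loop_gain_from_t60(1.0, 22, 44100)` gives the loop gain for a hypothetical T60 of one second. The remaining fraction of a sample would be handled by the fractional delay filter.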

Figure 5.1 Recorded Sui and Gu tones of 1_T_1 (a)(b), 1_M_1 (e)(f); Synthetic Sui and Gu tones of 1_T_1 (c)(d), 1_M_1 (g)(h)

Figure 5.2 Spectra of recorded and synthetic tones of (a) 1_T_1 Sui tone (b) 1_T_1 Gu tone (c) 1_M_1 Sui tone (d) 1_M_1 Gu tone

5.1.2. Results from model 2

The second model described in section 4.3 can be employed to simulate the sounds of the Chime-bells with lower fundamental frequencies. These bells typically have fundamental frequencies lower than 1000 Hz and are larger in size. 1_T_5, 1_M_11, and 1_B_2 are bells of this type. For 1_T_5, the fundamental frequencies of the Sui and Gu tones are 653 Hz and 807 Hz. For 1_M_11, the fundamental frequencies of the Sui and Gu tones are 283 Hz and 344 Hz. For 1_B_2, the fundamental frequencies of the Sui and Gu tones are 72 Hz and 85 Hz. The process for determining the parameters of the loop filters is essentially the same as for the first model. However, with the two additional digital waveguides, more pairs of partials need to be identified. The T60 of these bells is longer than that of the bells with higher fundamental frequencies. For bells on the bottom level, the sound can last for 10 seconds or even longer. The waveforms of the recorded and synthetic tones of 1_T_5, 1_M_11, and 1_B_2 are shown in Fig. 5.3. Both the Sui and Gu tones have been simulated.

As with model 1, the general shapes of the recorded and synthetic tones are close. The spectra of these tones are shown in Fig. 5.4. There are fairly good matches in the lower-frequency partials. However, mismatches still exist in the higher-frequency partials. Note that for bells on the bottom level, the high-frequency components above 5000 Hz are clearly absent. This might be the effect of Mei on the bell body discussed in previous studies [27]. This effect is successfully simulated with the additional band-pass filter connected to the original inharmonic digital waveguides.

Figure 5.3 Recorded Sui and Gu tones of 1_T_5 (a)(b), 1_M_11 (e)(f), 1_B_2 (i)(j); Synthetic Sui and Gu tones of 1_T_5 (c)(d), 1_M_11 (g)(h), 1_B_2 (k)(l)

Figure 5.4 Spectra of recorded and synthetic tones of (a) 1_T_5 Sui tone (b) 1_T_5 Gu tone (c) 1_M_11 Sui tone (d) 1_M_11 Gu tone (e) 1_B_2 Sui tone (f) 1_B_2 Gu tone

5.2. Listening Test

To investigate the quality of the synthetic sounds, listening tests were conducted. Both the similarity and the acceptability of the synthetic sounds were examined. Twenty subjects were randomly chosen from the population of students majoring in music technology, which comprises approximately 59 people in total. The commercial software SPSS was used to compute the statistical results. Of the subjects, 70% have played a musical instrument for more than five years, which indicates that most subjects in this population have received musical training and could be more sensitive to sound. The Cronbach's α of the test results is 0.881, indicating good reliability of the tests. Every tone was graded on a 5-point Likert scale, and the statistical results are shown in Table 5.1. The mean values of the tones from model 1 (including 1_T_1 and 1_M_1), model 2 (including 1_T_5, 1_M_11, and 1_B_2), and all tones are listed at the bottom of the table.
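For readers unfamiliar with the reliability measure, Cronbach's α for k rated items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the subjects' total scores. The sketch below implements this standard formula. The ratings are hypothetical and do not reproduce the thesis data or the SPSS computation; how the tones were arranged into items is not stated in the text.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of subjects, each a list of item ratings."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert ratings: 5 subjects x 3 items.
example_scores = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
]
example_alpha = cronbach_alpha(example_scores)
```

Values closer to 1 indicate that the subjects rate the items consistently.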

Table 5.1 Statistical results (N = 20)

                      Similarity            Acceptability
Bell ID    Tone       Mean    Std. Dev.     Mean    Std. Dev.
1_T_1      Sui        3.50    0.88          4.10    0.71
           Gu         4.35    0.81          4.35    0.67
1_M_1      Sui        3.30    0.57          3.85    0.67
           Gu         2.85    0.87          3.40    0.99
1_T_5      Sui        3.50    1.00          3.65    0.98
           Gu         3.00    0.72          3.45    0.75
1_M_11     Sui        3.95    0.82          4.25    0.78
           Gu         3.55    0.99          3.85    0.87
1_B_2      Sui        3.90    0.85          3.80    1.00
           Gu         4.45    0.60          4.20    0.95
Average    Model 1    3.50    0.78          3.92    0.76
           Model 2    3.72    0.83          3.86    0.88
           All        3.63    0.80          3.89    0.83

5.3. Discussions

From the statistical results in Table 5.1, it can be seen that the 1_B_2 Gu tone and the 1_T_1 Gu tone have the highest similarity scores, 4.45 and 4.35 respectively. The reason could be that these tones have relatively simple spectral content, so the lower partials can be matched easily. Some subjects commented that some tones are pleasant to hear because they sound cleaner than the recorded samples. This could be why the 1_T_1 Gu tone has the highest acceptability, 4.35. The 1_M_1 Gu tone has the lowest similarity (2.85) and acceptability (3.40) because the strong noise from the strike is mostly missing. In addition, tones from model 2 have a higher mean similarity (3.72) than tones from model 1 (3.50), possibly because model 2 uses more waveguides to generate the sounds, leading to better matches of the tones. For most tones, the acceptability is higher than the similarity. This could imply that the synthetic tones are acceptable to most subjects even though disparities between the tones might exist.

From the synthetic results, it can be seen that the proposed models are capable of producing bell sounds with matched lower partials. However, there are still noticeable mismatches in the higher partials. Some of the unwanted peaks at higher frequencies are relatively low in magnitude, yet still audible to sensitive ears. Moreover, although the high inharmonicity of the bells with lower fundamental frequencies can be modeled through additional digital waveguides, many subtle partials in between are still omitted in the proposed model. These defects in the synthetic sounds are due to several factors.

First of all, the sound-producing mechanism of the Chinese Chime-bells might be overly simplified in the presented models. Since the Chime-bell is a three-dimensional object with complex inscriptions on its surface, the vibrational modes could be very complicated. Even though the combination of two inharmonic digital waveguides is able to generate a few pairs of matched vibrational modes, it might still be insufficient to account for the real mechanism.

Also, the partials generated by inharmonic digital waveguides are determined through the
