
Chapter 6 Experimental Results

6.1 Adaptive Beamformers Using NLMS Adaptation Criterion

6.1.1 Simulation Results

In this simulation, a speech source and two noise sources, a white noise and a music signal, are considered, and the linear array contains six microphones. The speech source arrives from 0° relative to the linear array, and the white noise and the music signal arrive from -30° and 60°, respectively. Figure 6-2 illustrates the arrangement of the microphone array and the sources. The value of γ is 106, and a step size λ of 0.4 is selected. Two simulations are shown: the first compares the performance under different parameter settings, and the second observes the adaptation performance of FDABB under a sudden change of the noise channel, in which the white noise moves from -30° to -60°. Both simulations are executed in three environments specified by different channel response durations: 1024, 2048, and 3072 taps.

Figure 6-2 Arrangement of the microphone array, noises, and speech source in the simulation experiments

In the first simulation, the locations of the speech source and the noises are fixed over the entire training data length. The soft penalty parameter μ takes one of three values: 0, 2, and 4. The frame number L in a block varies from 10 to 20 and from 20 to 30, corresponding to different soft penalty parameters and channel response durations. Two frequency-domain performance indexes, NSR and SDR, at the most significant frequency, 410 Hz, are shown in Tables 6-1, 6-2, and 6-3. The values in these tables are computed by averaging over the last 120 frames. The notation ADL indicates that the value of L is adjusted by the CBVI with a lower threshold of 0.02, an upper threshold of 1.2, and an initial frame number of 10. In other words, if the CBVI is smaller than 0.02, the value of L is increased; conversely, if the CBVI is larger than 1.2, the value of L is reset to the initial frame number. Additionally, Table 6-4 summarizes the related parameters of FDABB. Figure 6-3 depicts the NSR and the SDR for conditions C6 to C9 with a channel response duration of 1024 taps, as listed in Table 6-2.
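The ADL rule described above can be sketched as a small update function. This is only an illustrative sketch, not the thesis implementation: the function names, the cap `max_l=30`, and the assumption that the CBVI is evaluated once per update are hypothetical; the thresholds, initial value, and increment are taken from the text and Table 6-4.

```python
def update_block_length(cbvi, current_l, initial_l=10, increment=10,
                        lower=0.02, upper=1.2, max_l=30):
    """Return the next frame number L for one CBVI observation.

    lower, upper, initial_l and increment follow the text and Table 6-4.
    max_l is an assumed cap; the source does not state an upper limit.
    """
    if cbvi > upper:
        # Large CBVI: treat as a channel change and restart from the initial L.
        return initial_l
    if cbvi < lower:
        # Small CBVI: adaptation has settled, so lengthen the block.
        return min(current_l + increment, max_l)
    # Otherwise keep the current block length.
    return current_l


def trace_block_length(cbvi_values, initial_l=10):
    """Apply the rule to a sequence of CBVI values and record L after each."""
    l = initial_l
    history = []
    for cbvi in cbvi_values:
        l = update_block_length(cbvi, l, initial_l=initial_l)
        history.append(l)
    return history


if __name__ == "__main__":
    # Hypothetical CBVI sequence: two drops below 0.02, then a jump above 1.2.
    print(trace_block_length([0.5, 0.01, 0.3, 0.015, 0.01, 1.5, 0.4]))
```

With the hypothetical sequence in the example, L grows in steps of 10 while the CBVI stays below the lower threshold and returns to 10 once the CBVI exceeds the upper threshold.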

Figure 6-3 shows that the measurement indexes under the condition L=1 vary much more heavily than those under the conditions L=10, L=20, and L=30; that is, the performance of the NSR and the SDR cannot be guaranteed even when the algorithm is

channel response duration grows, but the proposed beamformer with a larger value of L would have smaller performance decay and have better convergence performance.

The SDRs of SPFDBB with L=10, L=20, and L=30 under μ = 2 have decreased by about 1.84 dB to 4.52 dB compared with those under μ = 0. Although the NSR increases as the SDR falls, the reduction in SDR is more important for ASR applications when the NSR is already very low, especially when a larger value of L is chosen.
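The SDR comparison can be checked directly from the tabulated values. The following sketch copies the SDR entries for L=10, L=20, and L=30 from Tables 6-1, 6-2, and 6-3 and computes how far each SDR falls relative to μ = 0; the dictionary layout and function names are illustrative assumptions.

```python
# SDR (dB) from Tables 6-1, 6-2 and 6-3, keyed by mu and then by L.
# Each tuple lists the channel response durations 1024, 2048 and 3072 taps.
SDR_DB = {
    0: {10: (-47.98, -46.33, -45.37),
        20: (-48.85, -47.28, -46.08),
        30: (-50.39, -49.60, -48.69)},
    2: {10: (-49.82, -49.80, -47.59),
        20: (-51.16, -50.62, -48.96),
        30: (-52.80, -52.24, -52.13)},
    4: {10: (-50.06, -49.85, -48.08),
        20: (-52.54, -52.01, -50.37),
        30: (-54.02, -53.87, -53.21)},
}


def sdr_drop(mu):
    """SDR decrease in dB (positive = lower SDR) relative to mu = 0."""
    return {
        l: tuple(round(base - penalized, 2)
                 for base, penalized in zip(SDR_DB[0][l], SDR_DB[mu][l]))
        for l in SDR_DB[0]
    }


def drop_range(mu):
    """Smallest and largest SDR decrease over all L and durations."""
    values = [v for drops in sdr_drop(mu).values() for v in drops]
    return min(values), max(values)


if __name__ == "__main__":
    for mu in (2, 4):
        print(mu, sdr_drop(mu), drop_range(mu))
```

With the values as tabulated, the 1.84 dB figure corresponds to μ = 2 with L=10 at 1024 taps, while a 4.52 dB difference appears for μ = 4 with L=30 at 3072 taps.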

Table 6-1 The First Simulation Experiment: Soft Penalty Parameter is 0

                      Channel response      Channel response      Channel response
                      duration 1024         duration 2048         duration 3072
Condition  L          NSR(dB)   SDR(dB)     NSR(dB)   SDR(dB)     NSR(dB)   SDR(dB)
C1         L = 1      -73.88    -46.67      -60.92    -42.97      -57.55    -16.97
C2         L = 10     -102.82   -47.98      -91.53    -46.33      -90.60    -45.37
C3         L = 20     -111.00   -48.85      -98.45    -47.28      -97.12    -46.08
C4         L = 30     -122.92   -50.39      -112.57   -49.60      -105.32   -48.69
C5         ADL        -122.40   -50.10      -113.42   -49.87      -106.32   -48.50

Table 6-2 The First Simulation Experiment: Soft Penalty Parameter is 2

                      Channel response      Channel response      Channel response
                      duration 1024         duration 2048         duration 3072
Condition  L          NSR(dB)   SDR(dB)     NSR(dB)   SDR(dB)     NSR(dB)   SDR(dB)
C6         L = 1      -65.03    -44.20      -45.77    -42.35      -46.96    -14.50
C7         L = 10     -97.97    -49.82      -89.67    -49.80      -92.87    -47.59
C8         L = 20     -110.32   -51.16      -100.32   -50.62      -96.25    -48.96
C9         L = 30     -120.92   -52.80      -109.27   -52.24      -105.24   -52.13
C10        ADL        -125.34   -52.31      -110.27   -52.21      -105.07   -52.11

Table 6-3 The First Simulation Experiment: Soft Penalty Parameter is 4

                      Channel response      Channel response      Channel response
                      duration 1024         duration 2048         duration 3072
Condition  L          NSR(dB)   SDR(dB)     NSR(dB)   SDR(dB)     NSR(dB)   SDR(dB)
C11        L = 1      -46.08    -29.40      -42.91    -27.39      -37.27    3.06
C12        L = 10     -93.07    -50.06      -85.35    -49.85      -91.03    -48.08
C13        L = 20     -109.65   -52.54      -95.39    -52.01      -96.00    -50.37
C14        L = 30     -120.56   -54.02      -102.04   -53.87      -105.18   -53.21
C15        ADL        -121.71   -53.97      -102.62   -53.92      -105.59   -53.59

Table 6-4 Parameters of the FDABB

Length of STFT                     512 samples
Length of input data in a frame    256 samples
Shift of STFT                      80 samples
Window function                    Hamming
Initial block value                10
Block value increment              10
Thresholds of CBVI                 0.02 and 1.2
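To make the framing parameters of Table 6-4 concrete, the sketch below cuts a signal into 256-sample frames with an 80-sample shift, applies a Hamming window, and zero-pads each frame to the 512-point STFT length. Zero-padding is an assumption about how the 256-sample input and the 512-point transform fit together; the source does not state it. The helper names are hypothetical, and the single-bin DFT is evaluated directly for illustration only, where a practical implementation would use an FFT routine.

```python
import cmath
import math

FRAME_LEN = 256  # input samples per frame (Table 6-4)
FFT_LEN = 512    # STFT length (Table 6-4)
HOP = 80         # STFT shift (Table 6-4)


def hamming(n_points):
    """Symmetric Hamming window with n_points samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def windowed_frames(signal):
    """Split a signal into Hamming-windowed frames zero-padded to FFT_LEN.

    Zero-padding from 256 to 512 samples is an assumption of this sketch.
    """
    if len(signal) < FRAME_LEN:
        raise ValueError("signal is shorter than one frame")
    window = hamming(FRAME_LEN)
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    frames = []
    for k in range(n_frames):
        start = k * HOP
        segment = [s * w for s, w in
                   zip(signal[start:start + FRAME_LEN], window)]
        frames.append(segment + [0.0] * (FFT_LEN - FRAME_LEN))
    return frames


def dft_bin(frame, k):
    """Value of DFT bin k for one zero-padded frame (direct evaluation)."""
    n_fft = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, x in enumerate(frame))


if __name__ == "__main__":
    # Hypothetical test tone centred on bin 26 of the 512-point transform.
    tone = [math.cos(2.0 * math.pi * 26 * n / FFT_LEN) for n in range(1000)]
    frames = windowed_frames(tone)
    print(len(frames), abs(dft_bin(frames[0], 26)), abs(dft_bin(frames[0], 100)))
```

Which bin corresponds to 410 Hz depends on the sampling rate, which is not given in this section, so the example uses an arbitrary bin index.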

(a) NSR

(b) SDR

Figure 6-3 NSR and SDR from C6 to C9 with channel response duration 1024. The dash-dot line represents C6 (L=1), the dotted line represents C7 (L=10), the solid line represents C8 (L=20), and the dashed line represents C9 (L=30)

In the first simulation, FDABB adjusts the frame number twice from 10 to 30; first

Figure 6-5 shows the NSR and the SDR for conditions C7 to C10 with a channel response duration of 1024 taps, as listed in Table 6-2. Since the initial frame number of FDABB is 10, the SDR and the NSR of FDABB coincide with those of C7 (L=10, the dotted line) in the first 261 samples. FDABB not only performs well during a shorter adaptation process but also reaches a good convergence result. Since SPFDBB adopts the soft penalty, it emphasizes SDR improvement over NSR improvement. Consequently, the SDR with L=30 is better than the SDR with L=10 after frame 300, and the convergence period of the SDR is shorter than that of the NSR.

Figure 6-4 CBVI in the first simulation experiment

(a) NSR

(b) SDR

Figure 6-5 NSR and SDR from C7 to C10 with channel response duration 1024. The dash-dot line represents C10 (ADL), the dotted line represents C7 (L=10), the solid line represents C8 (L=20), and the dashed line represents C9 (L=30)

In the second simulation, the location of the white noise changes from -30° to -60° during the training data sequence. As shown in Figs. 6-6 and 6-7, the CBVI and the NSR both exhibit a large jump at frame 601 in response to the noise channel variation. Since the impulse response of the speech source is fixed, the SDR shows little variation. After this sudden change is detected, FDABB resets the value of L to the initial frame number so that it can adapt quickly to the new noise channel, and then changes the frame number at frame 771 and frame 851 to maintain convergence.

(a) NSR

(b) SDR

Figure 6-7 NSR and SDR in the second simulation experiment. The dash-dot line represents C10 (ADL), the dotted line represents C7 (L=10), the solid line represents C8 (L=20), and the dashed line represents C9 (L=30)

Table 6-5 shows the ratios of the number of real multiplications required by FDABB and SPFDBB to the number required by the reference-signal-based time-domain adaptive beamformer. As these data indicate, a significant saving of computing power can be achieved.

Table 6-5 Real Multiplication Requirement Ratio

Multiplication Requirement Ratio                 Adaptation Phase    Lower Beamformer Phase
FDABB with μ=2 in the first simulation case      1 : 8.57            1 : 20.72
FDABB with μ=2 in the second simulation case     1 : 8.48            1 : 20.72
SPFDBB with L=10                                 1 : 10.69           1 : 20.72
SPFDBB with L=20                                 1 : 11.04           1 : 20.72
SPFDBB with L=30                                 1 : 11.17           1 : 20.72
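The ratios in Table 6-5 can be turned into percentage savings with a short calculation. The sketch assumes that each entry "1 : r" means the frequency-domain method needs one real multiplication for every r needed by the time-domain reference beamformer, which is how the preceding paragraph reads; the names used are illustrative.

```python
# Second term r of each "1 : r" entry in Table 6-5:
# (adaptation phase, lower beamformer phase).
RATIOS = {
    "FDABB, mu=2, first simulation": (8.57, 20.72),
    "FDABB, mu=2, second simulation": (8.48, 20.72),
    "SPFDBB, L=10": (10.69, 20.72),
    "SPFDBB, L=20": (11.04, 20.72),
    "SPFDBB, L=30": (11.17, 20.72),
}


def saving_percent(r):
    """Percentage of multiplications saved when the cost ratio is 1 : r."""
    return 100.0 * (1.0 - 1.0 / r)


if __name__ == "__main__":
    for name, (adaptation, lower_beamformer) in RATIOS.items():
        print(f"{name}: adaptation {saving_percent(adaptation):.1f}%, "
              f"lower beamformer {saving_percent(lower_beamformer):.1f}%")
```

Under this reading, every configuration in the table saves roughly 88% to 91% of the multiplications in the adaptation phase and about 95% in the lower beamformer phase.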