Optimization of Late Reverberation by Using Genetic Algorithm

3 Optimal Design of Reverberators

3.3 Optimization of Late Reverberation by Using Genetic Algorithm

In the system of parallel comb filters and nested allpass filters that models late reverberation, the input parameters are the comb filter delays, the comb filter attenuation gain, the allpass filter delays, the allpass filter attenuation gains, and the low-pass filter coefficient α. Searching for all eighteen coefficients with a GA at once is inefficient and time-consuming. Because these filters behave differently, we divide the optimization into four steps. The coefficients optimized in the first step are the three delays d_i and the three gains g_i (i = 1, 2, 3) of the three-layer nested allpass filters. The upper and lower limits are 1000 and 50 for the delays, and 1 and 0.1 for the gains. The goal of this step is to obtain a high echo density E_d and a high impulse energy E_n. Therefore, the fitness function F₁(θ₁) is defined as

$$F_1(\theta_1) = E_d(\theta_1) + w\,E_n(\theta_1), \quad (25)$$

where θ₁ = [d₁ d₂ d₃ g₁ g₂ g₃] is the chromosome and w is the weighting between the two cost functions E_d(θ₁) and E_n(θ₁).

In the second step, we optimize the ten delays c_i (i = 1, 2, …, 10) of the parallel comb filters. The upper limit is set to 3528 samples (80 ms) and the lower limit to 441 samples (10 ms) for a medium-sized room. The objective is to find the chromosome for which the echo density E_d and the modal density M_d are simultaneously as high as possible. We estimate the modal density by counting the poles on the pole-zero map. The fitness function F₂(θ₂) is defined as

$$F_2(\theta_2) = E_d(\theta_2) \times M_d(\theta_2), \quad (26)$$

where θ₂ = [c₁ c₂ … c₁₀]. In a comb filter, the modal density decreases as the echo density increases. Because of this trade-off, the two quantities are combined by multiplication.
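As a rough illustration of the pole-counting estimate of modal density, the sketch below assumes that each comb filter has the feedback form H(z) = 1/(1 − g z^(−c)), whose c poles lie on a circle of radius g^(1/c). The function names, the gain value, and the delay list are illustrative assumptions and are not taken from the text.

```python
import cmath
import math

def comb_poles(delay, gain):
    """Poles of an assumed feedback comb filter H(z) = 1 / (1 - gain * z**-delay)."""
    radius = gain ** (1.0 / delay)          # every pole shares this radius
    return [radius * cmath.exp(2j * math.pi * k / delay) for k in range(delay)]

def pole_count(delays, gain):
    """Total number of poles of a parallel bank of comb filters."""
    return sum(len(comb_poles(d, gain)) for d in delays)

# Hypothetical delays inside the 441..3528 sample range of the second step.
delays = [441, 997, 1511, 2003, 3528]
print(pole_count(delays, gain=0.8))
```

Under this assumed filter form, the pole count grows with the delays, which is consistent with longer delays giving a higher modal density.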

In the third step, we determine the gain of the comb filters under a given constraint, namely the reverberation time (T60) of the room. Once T60 has been measured for the room, we search for the best comb filter gain with the GA. If a candidate gain makes the total T60 exceed the constraint, the search continues until a suitable gain is found.
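For reference, the classical Schroeder relation ties the gain of a feedback comb filter to its delay and the target T60: the recirculating signal must have decayed by 60 dB after T60 seconds. The sketch below uses this textbook closed form, not the GA search described above, so it may serve only as a starting point or a sanity check. The 44.1 kHz sampling rate, the 40 ms delay, and the 2 s target are assumptions.

```python
import math

def comb_gain_for_t60(delay_samples, t60_seconds, fs=44100):
    """Gain g such that g**(t60 * fs / delay) == 10**-3 (a 60 dB decay)."""
    return 10.0 ** (-3.0 * delay_samples / (fs * t60_seconds))

def t60_from_gain(delay_samples, gain, fs=44100):
    """Inverse relation: reverberation time produced by a given comb gain."""
    return -3.0 * delay_samples / (fs * math.log10(gain))

g = comb_gain_for_t60(1764, 2.0)   # 40 ms delay, hypothetical 2 s target
print(round(g, 4), round(t60_from_gain(1764, g), 4))
```

A longer delay needs a smaller gain to reach the same T60, so a single gain shared by all comb filters can only approximate the target.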

Finally, in the fourth step, we obtain impulse responses generated by Cool Edit Pro 2.0 for different room modes, such as a church, a living room, or a large auditorium. For each room mode, we transform the impulse response into the frequency domain and search for the best chromosome for the parameter α, so that the synthesized frequency response curve approximates the desired one. The fitness function is defined as

$$F_4 = \min\left( P(t) - \hat{P}(t) \right), \quad (27)$$

where P(t) is the desired frequency response and P̂(t) is the synthesized frequency response. The flow chart of the optimization procedure for our scheme is shown in Fig. 15.

Forty populations are formed randomly per parameter, and each population contains eight chromosomes. The crossover rate and mutation rate are 0.85 and 0.008, respectively. Each step is run for 100 generations, after which the GA yields the best parameters for our late reverberation model. The optimum parameters are listed in Table 1 and Table 2.
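The following minimal sketch shows how a GA with the quoted settings (eight chromosomes, crossover rate 0.85, mutation rate 0.008, 100 generations) might be organized. It is not the authors' implementation: the binary encoding, tournament selection, elitism, single population, and toy fitness function are all assumptions made for illustration.

```python
import random

def decode(bits, lo, hi):
    """Map a list of 0/1 bits to a real value in [lo, hi]."""
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)

def run_ga(fitness, lo, hi, n_bits=16, pop_size=8, generations=100,
           p_cross=0.85, p_mut=0.008, seed=0):
    """Maximise `fitness` over one bounded parameter with a simple binary GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    score = lambda c: fitness(decode(c, lo, hi))
    best = max(pop, key=score)
    for _ in range(generations):
        def pick():
            # Binary tournament selection (an assumed selection scheme).
            a, b = rng.sample(pop, 2)
            return a if score(a) >= score(b) else b
        children = [best[:]]                      # elitism: keep the best so far
        while len(children) < pop_size:
            p1, p2 = pick()[:], pick()[:]
            if rng.random() < p_cross:            # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            for child in (p1, p2):
                for i in range(n_bits):           # bit-flip mutation
                    if rng.random() < p_mut:
                        child[i] ^= 1
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=score)
    return decode(best, lo, hi)

# Toy fitness with a known peak at 0.7 (a placeholder for the real cost functions).
best_gain = run_ga(lambda g: -(g - 0.7) ** 2, lo=0.1, hi=1.0)
print(best_gain)
```

In the actual procedure, the fitness argument would be replaced by the cost functions of each step, and the bounds by the limits stated for that step.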

4 Artificial Reverberators by Using Fast Convolvers

4.1 Block Convolution

FIR-based reverberators are implemented by convolution. The input signal is effectively unbounded in length, so it must be segmented into blocks. There are two approaches to FFT block convolution: the overlap-and-save method and the overlap-and-add method. The convolution between the input signal x[n] and an impulse response h[n] of length L is expressed as

$$y[n] = \sum_{k=0}^{L-1} h[k]\, x[n-k]. \quad (28)$$

The overlap-and-add method uses non-overlapping input segments to compute overlapping output segments. The overlap occurs because the linear convolution of each segment with the impulse response is longer than the segment itself.
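The overlap-and-add idea can be sketched in a few lines. The example below is a time-domain illustration under simple assumptions (short lists, direct convolution per block instead of an FFT); the function names and test signals are hypothetical.

```python
def convolve(x, h):
    """Direct linear convolution y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def overlap_add(x, h, block):
    """Convolve x with h by processing non-overlapping input blocks."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        segment = x[start:start + block]
        part = convolve(segment, h)       # longer than the segment by len(h) - 1
        for i, v in enumerate(part):      # the tail overlaps the next block and is added
            y[start + i] += v
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [1.0, -1.0, 0.5]
print(overlap_add(x, h, block=3))
```

Each partial result extends len(h) − 1 samples past its input block, which is why the output buffer has to accumulate rather than simply store.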

To extend the overlap-and-add approach to block convolution, let the input signal x[n] and the impulse response h[n] each be segmented into a sum of finite-length segments of length N, where M is the number of blocks of the impulse response, excluding the last block, whose length is smaller than N, i.e. L

Substituting Eqs. (29) and (30) into Eq. (28) yields

Because convolution is linear and time-invariant, it follows that

The overlap-and-save method implements the circular convolution or linear convolution of the impulse response with overlapping input sections, and the resulting output segments are patched together to form the output. If an L-point sequence is circularly convolved with a P-point sequence (P < L), the first (P − 1) points of the result are incorrect. Therefore, we divide x[n] into sections of length L so that each input section overlaps the preceding section by (P − 1) points. That is, we define the sections as

$$x_r[n] = x[n + r(L - P + 1) - P + 1], \quad 0 \le n \le L - 1. \quad (36)$$

The circular convolution of each section with h_s[n] is denoted as y_{r,sp}[n], in which time aliasing has occurred. This procedure is called the overlap-and-save method because the input sections overlap, so that each succeeding input section consists of (L − P + 1) new points and (P − 1) points saved from the previous section.
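A compact sketch of the overlap-and-save procedure follows. It is written in plain Python with a small recursive radix-2 FFT so that it is self-contained; the function names and the short test signals are assumptions, and the section length L must be a power of two larger than P − 1 in this sketch.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (inverse is unnormalised)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def overlap_save(x, h, L):
    """Linear convolution of x with h (length P) using L-point circular convolutions."""
    P = len(h)
    step = L - P + 1                                  # new samples per section
    H = fft(list(h) + [0.0] * (L - P))
    padded = [0.0] * (P - 1) + list(x) + [0.0] * L    # P-1 leading zeros, slack at the end
    y = []
    for start in range(0, len(x) + P - 1, step):
        section = padded[start:start + L]             # overlaps the previous section by P-1
        X = fft(section)
        circ = fft([a * b for a, b in zip(X, H)], inverse=True)
        y.extend(v.real / L for v in circ[P - 1:])    # discard the first P-1 aliased points
    return y[:len(x) + P - 1]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [1.0, -1.0, 0.5]
print([round(v, 6) for v in overlap_save(x, h, L=8)])
```

Only L − P + 1 of the L points computed per section are kept, which reflects the trade-off between the two methods: no output accumulation is needed, but more sections must be processed.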

The two methods compare as follows. With the overlap-and-add method, the overlapping output samples must be added together, and a larger buffer is needed to store the output than with the overlap-and-save method. With the overlap-and-save method, however, each block yields fewer correct output samples than with the overlap-and-add method, so the input must be segmented into more blocks. For convenience in processing the output signals, we adopt the overlap-and-save method for the FFT block convolution.

4.2 FFT Block Convolution

In this section, we use the overlap-and-save method to implement the FFT block convolution by transforming each pair of small blocks to the DFT domain and performing the multiplications there. Since FFT algorithms reduce the complexity of a DFT of suitable size from O(N²) to O(N log N), they perform the convolution with a significant speed improvement.

To estimate the complexity of FFT block convolution in our FIR-based reverberator, we set each impulse response block size to 4,096 samples and the input signal block size to 8,192 samples. The impulse response that we measured in a church has 49,152 samples and is segmented into 12 blocks. First, we transform each impulse response block to the FFT domain with 8,192 points, which requires 4096 log₂ 4096 = 49,152 multiplications and additions. The total cost of computing and storing the FFT data of the segmented impulse response is 589,824 multiplications and additions. Transforming each block of the input signal to the FFT domain with 8,192 points requires 116,496 multiplications and additions. In the frequency domain, we multiply the impulse response by the input signal and sum over the 12 impulse response blocks, which takes 12 × 8,192 multiplications and (12 − 1) × 8,192 additions. Executing the inverse FFT with 8,192 samples requires another 116,496 multiplications and additions. Thus, the total computational complexity is 56.35 million multiply-accumulates per second (MMACS) for a stereo signal at a 44.1 kHz sampling rate.
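The counts for the impulse response side and for the frequency-domain products can be checked with a few lines of arithmetic. The sketch below reproduces only the figures that follow directly from the N log₂ N rule and the block count given above; the variable names are illustrative.

```python
import math

def fft_ops(n):
    """Operation count N * log2(N) used in the text for an N-point FFT."""
    return n * int(math.log2(n))

ir_block = 4096
n_blocks = 49152 // ir_block          # 12 blocks of the church response
per_block = fft_ops(ir_block)         # 49,152 per impulse response block
stored_ir = n_blocks * per_block      # 589,824 for the whole impulse response
freq_mults = n_blocks * 8192          # 98,304 bin-wise multiplications
freq_adds = (n_blocks - 1) * 8192     # 90,112 additions across the blocks
print(n_blocks, per_block, stored_ir, freq_mults, freq_adds)
```

The transform of the stored impulse response is a one-time cost, whereas the frequency-domain products recur for every input block.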

4.3 Fast Perceptual Convolution

Fast perceptual convolution is a way to reduce the computational complexity required by FIR-based reverberators. The threshold in quiet characterizes the minimum amount of energy that the human hearing system can detect in a noiseless environment. The main principle of fast perceptual convolution is to skip the frequency-domain multiplications whose results fall below the threshold in quiet, and the method integrates well with FFT-based convolution methods. The segmented impulse response of the FIR filter thus becomes sparse, which reduces the complexity.

One well-known threshold is the one given by Painter and Spanias [17]. The output signal of each output block Y_{r'}[k] will not be perceptible if its energy is lower than the threshold. The index r' here is distinguished from the index r used for the overlap-and-add method. The threshold in quiet is presented in Fig. 16. Each blocked output is the sum of the products of the input blocked signal and the segmented impulse responses in the frequency domain. Substituting (38) into (40) leads to (41). Because the maximum signal magnitude is set to 1, (41) can be simplified, and the complexity is reduced according to (44). The block diagram of fast perceptual convolution is shown in Fig. 17, and the spectra of all the segmented impulse responses are shown in Fig. 18. The higher-frequency part decays faster than the lower-frequency part.
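The pruning idea can be illustrated with the widely used closed-form approximation of the threshold in quiet (Terhardt's formula, as quoted by Painter and Spanias). The sketch below is not the derivation of the paper: it assumes that the input spectrum magnitude is bounded by 1, so that |H[k]| bounds the output in bin k, and it assumes a hypothetical playback calibration in which a magnitude of 1.0 corresponds to `full_scale_db` dB SPL. DFT scaling factors are ignored.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = f_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_bins(H, fs, full_scale_db=96.0):
    """Indices of bins of an impulse response block spectrum H worth multiplying.

    Assumes |X[k]| <= 1, and (hypothetically) that a magnitude of 1.0 maps to
    `full_scale_db` dB SPL at playback.
    """
    n = len(H)
    keep = []
    for k in range(1, n // 2 + 1):        # skip DC, where the formula diverges
        level_db = full_scale_db + 20.0 * math.log10(max(abs(H[k]), 1e-12))
        if level_db >= threshold_in_quiet_db(k * fs / n):
            keep.append(k)
    return keep

# Toy 8-point spectrum: only bins 1 and 3 can exceed the threshold.
H = [0.0, 1.0, 1e-6, 0.5, 1.0, 0.0, 0.0, 0.0]
print(audible_bins(H, fs=44100))
```

Bins that are not returned can be skipped in the frequency-domain product, which is what makes the segmented impulse response effectively sparse, most notably for late blocks and high frequencies.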

5 Fuzzy User Interface For Reverberators

In this chapter, we use the concept of fuzzy logic [18] to determine the input variables of the artificial reverberator we proposed, and build a fuzzy control system as a user interface.

5.1 Fuzzy Logic and Fuzzy Inference System

Fuzzy logic is a technique for building inference systems that mimic human decision making, and for reducing and explaining system complexity. A fuzzy inference system takes an input value, performs some calculations, and generates an output value. It includes a fuzzifier, a fuzzy rule base, and a defuzzifier. The typical architecture of a fuzzy inference system is shown in Fig. 19. In the fuzzifier, several types of membership functions transform the input data into suitable linguistic values. The fuzzy rule base stores the empirical knowledge of domain experts about the operation of the process. The defuzzifier yields a nonfuzzy decision or output from an inferred fuzzy set.

5.2 Fuzzy User Interface for Artificial Reverberators

To operate our artificial reverberator without requiring the user to set many input parameters, we propose a friendly user interface [19] based on fuzzy inference. As shown in Figure 20, the fuzzy user interface for the artificial reverberator is composed of two stages. In the first stage, we use five subjective indices to describe the reverberation impression: Room_size, Diffusion, Warmth, Clarity, and Reverb. The relationship between these subjective indices and the chosen room mode is presented in Table 3. Our system offers the five most familiar room modes: Living Room, Small Club, Church, Large Auditorium, and Gymnasium. In the second stage, we use the fuzzy control system to transform the five subjective indices into eight system parameters corresponding to the inputs of our artificial reverberator.

The eight system parameters are Dim (room dimension), Comb_d (comb filter delay), Comb_g (comb filter gain), Apd (allpass filter delay), Alpha (high-frequency ratio), Fc (cutoff frequency of early reflection), Ge (early reflection gain), and Gr (late reverberation gain). We use common-sense fuzzy rules to determine the quantization level. The fuzzy rules treat the room effect as a fuzzy association (A_i, B_i) representing the linguistic rule “IF X is A_i, THEN Y is B_i”.

5.2.1 Fuzzification

In the first stage of our fuzzy inference system, R (room mode) is the only fuzzy variable. When LR (Living room), SC (Small club), CH (Church), LA (Large auditorium), or GY (Gymnasium) is chosen, the value of Room_mode is set to 0.2, 0.4, 0.6, 0.8, or 1.0, respectively. For each subjective index, four fuzzy membership functions, namely S (small), M (medium), L (large), and V (very large), are assigned. Gaussian functions are adopted for the membership functions of these subjective indices, as shown in Fig. 21. The mathematical formula of the Gaussian membership function is as follows:
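A common form of the Gaussian membership function, and the one implemented by `gaussmf` in the Matlab fuzzy logic toolbox, is μ(x) = exp(−(x − c)² / (2σ²)), where c is the center and σ the width. The sketch below assumes this form; the centers, the width, and the 0 to 1 universe of discourse are hypothetical values chosen only for illustration.

```python
import math

def gauss_mf(x, center, sigma):
    """Gaussian membership grade exp(-(x - center)**2 / (2 * sigma**2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

# Hypothetical centres for the four sets on a 0..1 universe of discourse.
centers = {"S": 0.0, "M": 1.0 / 3.0, "L": 2.0 / 3.0, "V": 1.0}
grades = {name: gauss_mf(0.6, c, sigma=0.15) for name, c in centers.items()}
print(max(grades, key=grades.get))  # the set with the highest grade for x = 0.6
```

Fuzzification of a crisp input thus produces one membership grade per linguistic value, and the rule base operates on these grades.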

In the second stage of our fuzzy inference system, the five subjective indices are the fuzzy inputs and the eight system parameters are the fuzzy outputs. Room_size indicates how large the chosen room is. The larger the Room_size, the larger the room dimension, the longer the comb filter delay, and the larger the comb filter gain. For Dim, we define a universe of discourse ranging from 0 to 50 with four fuzzy sets: SL (short), ML (moderate), LL (long), and VL (very long). Each fuzzy set has its own membership function, and the characteristic values are shown in Fig. 22(a). The value of Dim obtained from the fuzzy inference system is used as the length of the room in our early reflection model. The width and height of the room are set to 0.6 and 0.15 times the value of Dim. For Comb_d, we define a universe of discourse ranging from 0 to 3600; its four fuzzy sets and membership functions are shown in Fig. 22(b). The value of Comb_d is used as the average of the ten comb filter delays. For Comb_g, we define a universe of discourse ranging from 0 to 10; its four fuzzy sets and membership functions are shown in Fig. 22(c). The value of Comb_g is used as the comb filter gain.

The Diffusion index describes the rate of echo buildup and how diffuse the echoes are. The larger the Diffusion, the shorter the allpass filter delay. For Apd, we define a universe of discourse ranging from -150 to 300; its four fuzzy sets and membership functions are shown in Fig. 22(d). The value of Apd is an adjustment applied to the optimum allpass filter delays determined by the GA.

Warmth describes the liveness of the bass, that is, the reverberation at low frequencies. The warmer the room, the lower the high-frequency ratio and the cutoff frequency. For Alpha, we define a universe of discourse ranging from 0 to 0.9; its four fuzzy sets and membership functions are shown in Fig. 22(f). For Fc, we define a universe of discourse ranging from 6 kHz to 16 kHz; its four fuzzy sets and membership functions are shown in Fig. 22(e).

Clarity and Reverb are complementary: they describe the relative amounts of early reflection and late reverberation. The clearer the sound, the less reverberation there is, so the early reflection gain becomes larger and the late reverberation gain smaller, and vice versa. For Ge and Gr, we define universes of discourse ranging from 0 to 0.9; their four fuzzy sets and membership functions are shown in Fig. 22(f).

5.2.2 Rule Evaluation

According to the descriptions in the preceding subsection, there are two groups of fuzzy decision rules in our system. In Rules 13-16 of Group 2, we apply a weighting between the two fuzzy inputs because these two indices are complementary.

Group 1:

Rule 1: IF R is LR THEN Room_Size is S AND Diffusion is M AND Warmth is S AND Clarity is L AND Reverb is L

Rule 2: IF R is SC THEN Room_Size is M AND Diffusion is L AND Warmth is M AND Clarity is S AND Reverb is M

Rule 3: IF R is CH THEN Room_Size is L AND Diffusion is VL AND Warmth is L AND Clarity is VL AND Reverb is VL

Rule 4: IF R is LA THEN Room_Size is VL AND Diffusion is L AND Warmth is VL AND Clarity is M AND Reverb is S

Rule 5: IF R is GY THEN Room_Size is L AND Diffusion is S AND Warmth is M AND Clarity is VL AND Reverb is M

Group 2:

Rule 1: IF Room_Size is SR THEN Dim is SL AND Comb_d is SD AND Comb_g is SG

Rule 2: IF Room_Size is MR THEN Dim is ML AND Comb_d is MD AND Comb_g is MG

Rule 3: IF Room_Size is LR THEN Dim is LL AND Comb_d is LD AND Comb_g is LG

Rule 4: IF Room_Size is VR THEN Dim is VL AND Comb_d is VD AND Comb_g is VG

Rule 13: IF Clarity is SC AND 0.3(Reverb is Vrev) THEN Ge is Sge

Rule 14: IF Clarity is MC AND 0.3(Reverb is Lrev) THEN Ge is Mge

Rule 15: IF Clarity is LC AND 0.3(Reverb is Mrev) THEN Ge is Lge

Rule 16: IF Clarity is VC AND 0.3(Reverb is Srev) THEN Ge is Vge

Rule 17: IF Reverb is Srev THEN Gr is Sgr

Rule 18: IF Reverb is Mrev THEN Gr is Mgr

Rule 19: IF Reverb is Lrev THEN Gr is Lgr

Rule 20: IF Reverb is Vrev THEN Gr is Vgr

5.2.3 Defuzzification

Defuzzification is a mapping from a space of fuzzy control actions defined over an output universe of discourse into a space of nonfuzzy (crisp) control actions. Fuzzy reasoning of the first type, Mamdani’s minimum fuzzy implication rule R_c, is used. This fuzzy reasoning process is illustrated in Fig. 23. The defuzzification method chosen here is the center of area (COA) method, which yields

$$Z_{COA} = \frac{\sum_{j=1}^{n} \mu_C(Z_j)\, Z_j}{\sum_{j=1}^{n} \mu_C(Z_j)},$$

where n is the number of quantization levels of the output, Z_j is the amount of control output at the quantization level j, and µ_C(Z_j) represents its membership value in the output fuzzy set C.
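Mamdani minimum implication followed by COA defuzzification can be sketched as follows. The output sets for Dim, their Gaussian shapes, the firing strengths, and the number of quantization levels are all hypothetical; only the min/max inference and the COA ratio of sums follow the description above.

```python
import math

def gauss_mf(x, center, sigma):
    """Gaussian membership grade (assumed shape of the output sets)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def coa_defuzzify(levels, memberships):
    """Centre of area: sum(mu_j * z_j) / sum(mu_j) over the quantization levels."""
    total = sum(memberships)
    if total == 0.0:
        raise ValueError("aggregated output set is empty")
    return sum(m * z for m, z in zip(memberships, levels)) / total

def mamdani_output(firing, out_sets, levels):
    """Clip each output set at its rule's firing strength (min), aggregate with max."""
    agg = [max(min(w, gauss_mf(z, c, s)) for w, (c, s) in zip(firing, out_sets))
           for z in levels]
    return coa_defuzzify(levels, agg)

# Hypothetical output sets for Dim on the 0..50 universe: (centre, sigma) pairs.
dim_sets = [(5.0, 5.0), (20.0, 5.0), (35.0, 5.0), (50.0, 5.0)]
levels = [0.5 * j for j in range(101)]      # n = 101 quantization levels
print(mamdani_output([0.0, 0.2, 0.8, 0.0], dim_sets, levels))
```

With two rules firing at different strengths, the crisp output falls between the centers of the two output sets and is pulled toward the one that fires more strongly.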

5.3 Graphic User Interface

To make the system friendly and easy to understand for users who want to listen to music or sound in specific environments, we use Matlab and its fuzzy logic toolbox to develop a graphical user interface. It contains the proposed artificial reverberator and the fuzzy inference module. In the user interface, we first choose one of the five built-in environments. Then, we push the “Run” button to obtain the five subjective indices that characterize the hearing conditions in the environment, together with the eight parameters for the artificial reverberator inputs. In addition, a small window plots the impulse response, the frequency response, or the energy decay curve (EDC) of the environment. The reverberation time of the environment is shown at the bottom left. After typing the file name of the input signal, we can play the output, which mixes the direct sound with its reverberation signal. Figures 24-26 show the fuzzy user interface with the window plotting the impulse response, the frequency response, and the EDC, respectively.

6 CONCLUSIONS

FIR-based reverberators offer better quality than the IIR-based approach. However, their high computational complexity and lack of flexibility in choosing the listening environment limit their applicability in practical systems. The fast perceptual convolution method effectively reduces the computational complexity of convolution, but it requires an impulse response recorded in a real room. In this paper, we proposed a reverberator model that uses nested allpass/comb filters for late reverberation and the image source method for early reflection. The late reverberation part is an IIR-based reverberator with very low computational complexity, and the early reflection part is an FIR-based reverberator of very short length. With this method, the complexity is reduced drastically, and only slight ringing, metallic coloration, and other artifacts remain. In other words, the nested allpass/comb filter and image source method appears to be a good trade-off between the IIR-based and FIR-based approaches in terms of complexity and reverberation quality.

On the other hand, the parameters of most allpass/comb filter networks have not been optimized. The purpose of using a set of optimized parameters is to achieve a high-quality reverberation effect. The optimum parameters of our reverberator were determined by the genetic algorithm for a medium-sized church. The parameters for other environments of interest can be obtained from the values generated by the fuzzy inference system. With the friendly user interface, people can choose one of the five familiar rooms and enjoy the reverberation effect without any effort in tuning the parameters. Finally, our reverberator can run in real time via DirectShow, a platform for multimedia rendering in Microsoft Windows.
