Introduction - Synthesis and Optimal Design of Reverberators for 3D Audio

Reverberation is a natural acoustical effect that surrounds us in everyday life. Every sound we hear is immersed in some degree of reverberation. Studio recordings of songs and music, however, are usually dry (that is, without reverberation), so they are perceived as neither realistic nor pleasant. To add life to dry recordings, a device called a reverberator is used to recreate the reverberation of the room in which we want to listen. A reverberator is also a means of creating spaciousness and externalization, especially when headsets are used for audio rendering. Reverberators can be realized with many different artificial reverberation algorithms, which are discussed in detail in this thesis.

Since the pioneering work of Schroeder and Moorer [1] on artificial reverberators, a number of reverberation algorithms based on networks of allpass and comb filters have been developed for synthesizing room responses. Some of the most relevant (Dattorro, Gardner, and Jot) offer different approaches to obtaining a reverberation algorithm with specific characteristics: natural sound, absence of tonal coloration, high echo density, control of the reverberation time, and so on. These IIR (Infinite Impulse Response)-based reverberators have the merit of low complexity, but their unnatural resonances are often difficult to eliminate. First, a feedback delay network (FDN) model [2] is presented in section 2.3.1. It can be seen as a generalized form of Schroeder’s parallel comb filters using a diagonal feedback matrix, and it is capable of generating much higher echo densities than the parallel comb filters. In our implementation we construct a 2-input, 2-output model with a diagonal feedback matrix. Another artificial reverberator presented in this thesis is Bai’s reverberator (section 2.3.2), which was derived from the idea of the multi-tap delay effect. Instead of comb filters, we recommend a time-varying recursive filter with exponentially decaying delay lengths. The advantage of this recursive filter is that it reduces the obvious peaks in the frequency domain and makes the “metallic” sound disappear. However, the echoes it produces are too discrete, which leads to a grainy sound quality, particularly for impulsive input sounds. All of these methods, whether simple or sophisticated, aim at simplifying the algorithm and decreasing the amount of computation. To determine the exact early reflections of a room with given dimensions, we use the image-source method [3], which constructs a 3D rectangular room and all the virtual rooms around it. The details of this method are given in section 2.3.3. The third synthesis method we propose is the nested allpass and comb filter reverberator. We recommend nested allpass filters [4], which increase the echo density with time and make the reverberator sound more realistic. The details of this method are described in section 2.3.4. The reverberator we recommend in this thesis combines the image-source method for simulating the early reflections with nested allpass and comb filters for the late reverberation.

The optimum positions of the sound source and receiver, which produce the greatest spaciousness, are presented in section 3.1. Allpass/comb filter networks may be the most popular artificial reverberators. However, it generally takes considerable effort to adjust the parameters of the allpass/comb filters before an appropriate effect can be rendered. To address this problem, we propose a way to optimize those parameters with a genetic algorithm (GA), presented in section 3.2.

FIR (Finite Impulse Response)-based reverberators, on the other hand, convolve the input sequence with an impulse response that models the room in which we want to listen. However, this method requires a large amount of processing power, and the field measurements of room responses such as those of a concert hall or a church cost a great deal of money and time. In chapter 4, a fast convolution technique, FFT (Fast Fourier Transform) block convolution, is proposed for dealing with infinitely long input signals and long FIR room responses. In addition, the fast perceptual convolution [5] proposed by Chi-Min Liu neglects signal components below the threshold in quiet and thereby saves much computation.

Chapter 5 proposes a way to determine the parameters of an artificial reverberator that suit the chosen room via a fuzzy user interface. It gives users a quicker way to obtain the desired sense of reverberation when they listen to music or songs.

In addition, we build a graphical user interface with five particular environments. It runs in Matlab and plays sounds with the reverberation effect, but only off-line.

In this thesis, we summarize the most commonly used methods for artificial reverberation and offer some improvements and recommendations for designing a reverberator. Our ultimate goal is to design a low-complexity, well-performing reverberator that can run in real time.

2 Theory and Methods of Artificial Reverberators

2.1 Reproduction of Room Responses

Reverberation plays a vital role in 3D audio reproduction in that it creates a realistic sensation of the diffuseness of the sound field and the spaciousness of the acoustic environment. To reproduce the reverberation of a room, we first need to determine the room response. Generally speaking, the room response can be divided into two distinct parts, as shown in Fig. 1(a). The first part is composed of many discrete echoes of the original sound, called early reflections, which reflect the geometric configuration of the room and the positions of the sound source and receiver. In general, this part of the reverberation lies in the region from 0 to about 80 msec after the direct sound. If the reflection delay is greater than about 80 msec, the reflection will be perceived as a distinct echo of the direct sound. The second part is the late reverberation, which is composed of denser echoes that decay exponentially with time and is related to the diffusion of sound and the background ambience. The impulse response of a real church and an ideal room response are presented in Fig. 1(b).

There are many methods for producing the reverberation of a room, and we can classify them into two approaches. The first, the physical approach, attempts to simulate exactly the propagation of sound from the source to the listener for a given room. The simplest and most direct method is to measure the room response in a real room and then render the reverberation by convolution. When the room to be simulated does not exist, we can attempt to predict its response from purely physical considerations: the geometry of the room, the properties of all surfaces in the room, and the positions and directivities of the sources and receivers. However, this approach is computationally expensive and rather inflexible for real-time implementation.

The second approach, the perceptual approach, attempts to reproduce only the perceptually salient characteristics of reverberation. Without knowing the details of the room, we can construct a digital filter with N parameters that reproduces exactly N independent attributes of the reverberation, and then plug estimates of those parameters into our reverberator. This method is generally much simpler to construct and more flexible than the physical approach.

This approach has many potential advantages:

The reverberation algorithm can be based on efficient infinite impulse response (IIR) filters which can be implemented without spending much computation.

The reverberation algorithm can provide real-time control of all the perceptually relevant parameters.

Only one algorithm is required to simulate all reverberation.

Several important properties of the room response need to be considered in the design of efficient reverberators. We discuss them as follows:

2.1.1 Echo Density

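In the time domain, the echo density of a room response is defined as the number of echoes reaching the listener per second. In a room of volume $V$, the number of echoes $N_t$ that reach the listener within time $t$ is approximately

$$N_t = \frac{4\pi (ct)^3}{3V} \qquad (1)$$

where $c$ is the speed of sound. Differentiating with respect to $t$, we obtain that the density of echoes is proportional to the square of time:

$$\frac{dN_t}{dt} = \frac{4\pi c^3}{V}\,t^2 \qquad (2)$$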

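2.1.2 Modal Density

The normal modes of a room are the frequencies that are naturally amplified by the room. The number of normal modes $N_f$ below frequency $f$ is nearly independent of the room shape and is given as follows:

$$N_f = \frac{4\pi V}{3c^3}\,f^3 \qquad (3)$$

where $V$ is the room volume and $c$ is the speed of sound. Differentiating with respect to $f$, we obtain the modal density, the number of modes per Hertz:

$$\frac{dN_f}{df} = \frac{4\pi V}{c^3}\,f^2 \qquad (4)$$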

Thus, the modal density of a room response grows proportionally to the square of the frequency.
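As a rough numerical illustration of these two growth laws, the following Python sketch evaluates the standard statistical expressions for the echo density, $4\pi c^3 t^2 / V$, and the modal density, $4\pi V f^2 / c^3$. The function names, the room volume, and the speed of sound (343 m/s) are assumptions chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def echo_density(t, volume, c=SPEED_OF_SOUND):
    """Echoes per second arriving at time t (s) in a room of the given volume (m^3)."""
    return 4.0 * math.pi * c ** 3 * t ** 2 / volume


def modal_density(f, volume, c=SPEED_OF_SOUND):
    """Normal modes per Hertz around frequency f (Hz) in a room of the given volume (m^3)."""
    return 4.0 * math.pi * volume * f ** 2 / c ** 3


if __name__ == "__main__":
    volume = 200.0  # hypothetical room volume in m^3
    for t in (0.01, 0.05, 0.1):
        print(f"t = {t:.2f} s: about {echo_density(t, volume):.0f} echoes/s")
    for f in (100.0, 1000.0):
        print(f"f = {f:.0f} Hz: about {modal_density(f, volume):.2f} modes/Hz")
```

Doubling $t$ or $f$ quadruples the corresponding density.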

2.1.3 Reverberation Time

The room effect is often characterized by its reverberation time (RT), a concept first established by Sabine in 1900. The reverberation time is proportional to the volume of the room and inversely proportional to the amount of sound absorption of the walls, floor, and ceiling of the room. Sabine’s empirical formula for estimating the reverberation time is as follows:

$$T_{60} = 0.161\,\frac{V}{A}, \qquad A = \sum_i S_i \alpha_i \qquad (5)$$

where $T_{60}$ is the reverberation time in seconds, $V$ is the room volume in m$^3$, $S_i$ is the area of the $i$-th surface in m$^2$, $\alpha_i$ is the associated absorption coefficient, and $A$ is the total absorption of the materials. Since most surface materials in a room are more absorptive at high frequencies, the reverberation time of a room generally decreases as the frequency increases. The reverberation time is used for estimating the degree of sound absorption in a room.
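To make the dependence on volume and absorption concrete, the sketch below applies Sabine’s formula to a small room. The function name, the coefficient 0.161 (the value commonly quoted for metric units), and the surface areas and absorption coefficients are all assumptions made for illustration, not measured data.

```python
def sabine_rt60(volume, surfaces, k=0.161):
    """Estimate the reverberation time (s) with Sabine's formula.

    volume   -- room volume in m^3
    surfaces -- iterable of (area in m^2, absorption coefficient) pairs
    k        -- Sabine constant for metric units (assumed value)
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return k * volume / total_absorption


if __name__ == "__main__":
    # Hypothetical 5 m x 4 m x 3 m room: floor, ceiling, and the four walls lumped together.
    surfaces = [(20.0, 0.3), (20.0, 0.6), (54.0, 0.1)]
    print(f"RT60 is roughly {sabine_rt60(60.0, surfaces):.2f} s")
```

Doubling every absorption coefficient halves the estimate, while doubling the volume at fixed absorption doubles it.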

2.1.4 Energy Decay Curve (EDC) and Energy Decay Relief (EDR)

The reverberation time of a measured room is determined by finding the time at which the sound pressure level has decayed by 60 dB in a plot of the energy decay curve (EDC), which Schroeder proposed in 1965. He suggested integrating the squared impulse response of the room to obtain the room’s energy decay curve:

$$\mathrm{EDC}(t) = \int_t^{\infty} h^2(\tau)\,d\tau \qquad (6)$$

where $h(t)$ is the impulse response of the room. A variation of the EDC that helps visualize the frequency-dependent nature of reverberation is called the energy decay relief, $\mathrm{EDR}(t, \omega)$. The EDR represents the reverberation decay as a function of time and frequency in a 3D plot. To compute it, we divide the impulse response into multiple frequency bands and compute Schroeder’s integral for each band. As an example, the EDR of a typical hall is shown in Fig. 2.
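Schroeder’s backward integration can be sketched in a few lines of Python. The version below works on a sampled impulse response and reads the reverberation time off the EDC as the first time the curve has dropped by 60 dB. The function names and the synthetic exponentially decaying test response are assumptions for illustration; a measured response would normally also need noise-floor handling, which is omitted here.

```python
import math


def energy_decay_curve_db(h):
    """Schroeder backward integration of h^2, in dB relative to the total energy."""
    edc = [0.0] * len(h)
    acc = 0.0
    for i in range(len(h) - 1, -1, -1):  # integrate from the tail toward t = 0
        acc += h[i] * h[i]
        edc[i] = acc
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in edc]


def reverberation_time(h, fs, drop_db=60.0):
    """Time (s) at which the EDC first falls drop_db below its initial value."""
    for i, level in enumerate(energy_decay_curve_db(h)):
        if level <= -drop_db:
            return i / fs
    raise ValueError("impulse response is too short to reach the requested decay")


if __name__ == "__main__":
    fs = 8000       # assumed sampling rate in Hz
    target = 0.5    # decay time built into the synthetic response, in seconds
    h = [10.0 ** (-3.0 * n / (fs * target)) for n in range(2 * fs)]
    print(f"estimated reverberation time: {reverberation_time(h, fs):.3f} s")
```

Applying the same function to band-pass filtered copies of the response would give one decay curve per band, which is the idea behind the EDR.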

2.1.5 Modeling Early Reflection

The way to model the early reflections of the reverberation, mentioned above, is the physical approach. A room response from a source to a listener can be obtained by solving the wave equation (the Helmholtz equation in the frequency domain). However, it can seldom be solved in analytic form. Therefore, the solution must be approximated, and there are three different approaches to the computational modeling of room acoustics [3]:

Wave-based methods

Ray-based methods

Statistical models

The ray-based methods, which include the ray-tracing and image-source methods, are the most often used modeling techniques. Under the assumption that the wavelength of sound is small compared to the dimensions of the surfaces in the room and large compared to the roughness of the surfaces, all phenomena due to the wave nature of sound, such as diffraction and interference, are ignored. The image-source method models the effect of an acoustic source in a room with reflecting boundaries by corresponding sources located in image rooms. Each of the infinitely many image sources produces an attenuated, filtered, and delayed version of the original acoustic input. The total effect can be summed to produce a transfer function or an FIR filter.
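A minimal sketch of the image-source idea for a rectangular room is given below. It is deliberately restricted to the direct path and the six first-order images, and it assumes a single frequency-independent wall reflection coefficient `beta`, 1/r spherical spreading, and delays rounded to the nearest sample. All names and numerical values are hypothetical.

```python
import math


def first_order_image_sources(room, source):
    """Mirror the source across each of the six walls of the box [0,Lx] x [0,Ly] x [0,Lz]."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflection about the wall plane
            images.append(tuple(image))
    return images


def early_reflection_fir(room, source, receiver, fs, beta=0.8, c=343.0):
    """FIR taps for the direct sound plus first-order reflections (assumed simple model)."""
    paths = [(source, 1.0)]
    paths += [(image, beta) for image in first_order_image_sources(room, source)]
    taps = []
    for position, gain in paths:
        distance = math.dist(position, receiver)
        taps.append((round(distance / c * fs), gain / distance))
    h = [0.0] * (max(delay for delay, _ in taps) + 1)
    for delay, amplitude in taps:
        h[delay] += amplitude  # paths with the same rounded delay add up
    return h


if __name__ == "__main__":
    h = early_reflection_fir((5.0, 4.0, 3.0), (1.0, 1.0, 1.0), (3.0, 2.0, 1.5), fs=44100)
    print(len(h), "samples,", sum(1 for v in h if v != 0.0), "non-zero taps")
```

Higher-order reflections follow by mirroring the image sources again, which is what the lattice of virtual rooms represents.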

2.1.6 Modeling Late Reverberation

There are two approaches to modeling late reverberation: FIR-based and IIR-based methods. Implementing convolution with a direct-form FIR filter is extremely inefficient when the filter is long. Typical room responses are several seconds long, which at a 44.1 kHz sampling rate translates into a filter with a huge number of taps (a 2-second response, for example, requires 88,200 taps). One way to deal with such a long FIR filter is to use an algorithm based on fast Fourier transform (FFT) block convolution [6]. The second method models the late reverberation of a room with IIR filters, namely the comb and allpass filters that Schroeder first proposed in the early 1960s, or a mixture of them. The details of the comb and allpass filters are discussed in the next section.
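The following sketch illustrates FFT block convolution in its overlap-add form: the input is cut into blocks, each block is multiplied with the transformed room response in the frequency domain, and the overlapping output tails are summed. It is one common variant, not necessarily the one used in [6]. The small recursive radix-2 FFT is included only to keep the example self-contained; the function names and the block size are assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def overlap_add_convolve(x, h, block=64):
    """Convolve x with the FIR response h block by block in the frequency domain."""
    n_fft = 1
    while n_fft < block + len(h) - 1:  # room for one block plus the filter tail
        n_fft *= 2
    h_spec = fft([complex(v) for v in h] + [0j] * (n_fft - len(h)))
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        segment = x[start:start + block]
        seg_spec = fft([complex(v) for v in segment] + [0j] * (n_fft - len(segment)))
        product = fft([a * b for a, b in zip(seg_spec, h_spec)], inverse=True)
        for i in range(min(n_fft, len(y) - start)):
            y[start + i] += product[i].real / n_fft  # add the overlapping tail
    return y
```

Because each block is processed independently, the input can be arbitrarily long, and the cost per output sample grows only logarithmically with the transform length rather than linearly with the length of the room response.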

2.2 Filter Building Blocks

2.2.1 Comb Filter

The block diagram of the comb filter, shown in Fig. 3, consists of a single delay line of $m$ samples with a feedback loop containing an attenuation gain $g$. The z-transform of the comb filter is given by:

$$H(z) = \frac{z^{-m}}{1 - g z^{-m}} \qquad (7)$$

Note that to achieve stability, the magnitude of $g$ must be less than unity. The time response of this filter is an exponentially decaying sequence of impulses spaced $m$ samples apart.

This is good for modeling reverberation because real rooms have a reverberation tail that decays roughly exponentially. However, the echo density is very low, causing a “fluttering” sound on transient input. The pole-zero map of the comb filter shows that a delay line of $m$ samples creates a total of $m$ poles equally spaced inside the unit circle when the filter is stable. Half of the poles are located between 0 Hz and the Nyquist frequency $f = f_s/2$ Hz, where $f_s$ is the sampling frequency. That is why the frequency response has $m$ distinct peaks, giving a “metallic” sound to the reverberation tail. We perceive this sound as metallic because we hear the few decaying tones that correspond to the peaks in the frequency response.
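The behavior described above can be checked with a direct implementation. The sketch below assumes the common feedback form $H(z) = z^{-m}/(1 - g z^{-m})$, that is, the difference equation $y[n] = x[n-m] + g\,y[n-m]$; the function name and the parameter values are illustrative.

```python
def comb_filter(x, m, g):
    """Feedback comb filter: y[n] = x[n-m] + g * y[n-m], stable for |g| < 1."""
    y = [0.0] * len(x)
    for n in range(m, len(x)):
        y[n] = x[n - m] + g * y[n - m]
    return y


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 19
    # Echoes appear every m = 5 samples and shrink by the factor g each time.
    print(comb_filter(impulse, m=5, g=0.5))
```

The impulse response contains only one echo every $m$ samples, which is the low echo density responsible for the fluttering sound on transients.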

2.2.2 Allpass Filter

Because of the poor frequency response of the comb filter, Schroeder modified it to provide a flat frequency response by mixing the input signal with the comb filter output, as shown in Fig. 4. The resulting filter is called an allpass filter because its frequency response has unit magnitude at all frequencies. The z-transform of the allpass filter is given by:

$$H(z) = \frac{z^{-m} - g}{1 - g z^{-m}} \qquad (8)$$

The poles of the allpass filter are thus the same as those of the comb filter, but the allpass filter now has zeros at the conjugate reciprocal locations.

However, the response of an allpass filter sounds quite similar to that of the comb filter and tends to create a timbral coloration.
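The unit-magnitude property can be verified numerically. The sketch below assumes the allpass form $H(z) = (z^{-m} - g)/(1 - g z^{-m})$, implemented as $y[n] = -g\,x[n] + x[n-m] + g\,y[n-m]$, and evaluates the magnitude response at a few frequencies. The function names and parameter values are illustrative.

```python
import cmath
import math


def allpass_filter(x, m, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-m] + g*y[n-m]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = -g * x[n]
        if n >= m:
            y[n] += x[n - m] + g * y[n - m]
    return y


def allpass_magnitude(omega, m, g):
    """|H(e^{j*omega})| for H(z) = (z^-m - g) / (1 - g*z^-m)."""
    zm = cmath.exp(-1j * omega * m)
    return abs((zm - g) / (1.0 - g * zm))


if __name__ == "__main__":
    for omega in (0.1, 1.0, 2.0, math.pi):
        print(f"omega = {omega:.2f}: |H| = {allpass_magnitude(omega, m=7, g=0.7):.6f}")
```

Although the magnitude is flat in the steady state, the impulse response is still a train of echoes spaced $m$ samples apart, which is consistent with the coloration heard on transient inputs.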

2.2.3 Nested Allpass Filter

To achieve a more natural-sounding reverberation network, it is desirable to combine the unit filters so as to produce a buildup of echoes, as occurs in real rooms. One way to produce more echoes is to cascade multiple allpass filters; Schroeder experimented with reverberators consisting of 5 allpass filters in series. Schroeder noted that these reverberators were indistinguishable from real rooms in terms of coloration, which may be true for stationary input signals, but other authors have found that series allpass filters are extremely susceptible to tonal coloration, especially with impulsive inputs. Gardner proposed reverberators based on a “nested” allpass filter, in which the delay of an allpass filter is replaced by a series connection of a delay and another allpass filter. The block diagram is shown in Fig. 5(a), where the allpass delay is replaced with a system function $N(z)$, which is itself allpass. The transfer function of this form is written:

$$H(z) = \frac{N(z) - g}{1 - g N(z)} \qquad (9)$$

The advantage of using a nested allpass filter can be seen in the impulse response in Fig. 5(b). Echoes created by the inner allpass filter are recirculated to the filter itself via the outer feedback path. Thus the echo density of a nested allpass filter increases with time, as in real rooms.
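A sample-by-sample sketch of a nested allpass filter is shown below. It assumes that the inner system is $N(z) = z^{-d}A(z)$, where $A(z)$ is an ordinary allpass section, so that the overall transfer function is $(N(z) - g)/(1 - gN(z))$. The class names, delay lengths, and gains are hypothetical.

```python
from collections import deque


class Allpass:
    """Allpass section with H(z) = (z^-m - g) / (1 - g*z^-m)."""

    def __init__(self, m, g):
        self.g = g
        self.buf = deque([0.0] * m, maxlen=m)  # holds v[n-m] .. v[n-1]

    def tick(self, x):
        delayed = self.buf[0]         # v[n-m]
        v = x + self.g * delayed      # feedback path
        self.buf.append(v)            # oldest sample drops out
        return delayed - self.g * v   # feedforward path


class NestedAllpass:
    """Outer allpass whose delay is replaced by N(z) = z^-d followed by an inner allpass."""

    def __init__(self, d, g, inner):
        self.g = g
        self.inner = inner
        self.buf = deque([0.0] * d, maxlen=d)

    def tick(self, x):
        w = self.inner.tick(self.buf[0])  # N(z) applied to the internal signal v
        v = x + self.g * w
        self.buf.append(v)
        return w - self.g * v


if __name__ == "__main__":
    nested = NestedAllpass(d=7, g=0.5, inner=Allpass(m=3, g=0.5))
    h = [nested.tick(1.0 if n == 0 else 0.0) for n in range(60)]
    print([round(v, 4) for v in h])
```

Since $N(z)$ is allpass, the nested structure is allpass as well, so the total energy of its impulse response equals that of the input impulse while the individual echoes become progressively denser.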

2.3 Synthesis of Artificial Reverberators

2.3.1 Feedback Delay Networks (FDN) Reverberator

Gerzon [7] generalized the notion of unitary multichannel networks, which are N-dimensional analogues of the allpass filter. An N-input, N-output LTI system is defined to be unitary if it preserves the total energy of all possible input signals.

Stautner and Puckette [8] then proposed a general four-channel network based on four delay lines and a feedback matrix A, as shown in Fig. 6(a). In this matrix, each coefficient $a_{mn}$ corresponds to the amount of signal coming out of delay line n that is sent to the input of delay line m. Gerzon found that the stability of this system is ensured if the matrix A is the product of a unitary matrix and an attenuation gain g with |g| < 1.

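To analyze the FDN system shown in Fig. 6(b), we derive the following difference equations:

$$y(t) = \sum_{m=1}^{N} c_m s_m(t) + d\,x(t) \qquad (10)$$

$$s_m(t + \tau_m) = \sum_{n=1}^{N} a_{mn} s_n(t) + b_m x(t) \qquad (11)$$

where $s_m(t)$ is the output of the $m$-th delay line, $\tau_m$ is its length in samples, $b_m$ and $c_m$ are the input and output gains, and $d$ is the gain of the direct path. Using the z-transform and assuming zero initial conditions, we can rewrite Eqs. (10) and (11) in the frequency domain as

$$y(z) = \mathbf{c}^T \mathbf{s}(z) + d\,x(z) \qquad (12)$$

$$\mathbf{s}(z) = \mathbf{D}(z)\left[\mathbf{A}\,\mathbf{s}(z) + \mathbf{b}\,x(z)\right] \qquad (13)$$

where $\mathbf{D}(z) = \mathrm{diag}(z^{-\tau_1}, \ldots, z^{-\tau_N})$ is the delay matrix. Eliminating $\mathbf{s}(z)$ in Eqs. (12) and (13) gives the following system transfer function:

$$H(z) = \frac{y(z)}{x(z)} = \mathbf{c}^T\left[\mathbf{D}(z)^{-1} - \mathbf{A}\right]^{-1}\mathbf{b} + d \qquad (14)$$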

In simulation, we construct a 2-input, 2-output FDN reverberator whose feedback matrix A is a unitary matrix. To produce the effect of a frequency-dependent reverberation time, we add the absorbent filter proposed by Jot [9] in 1991. The absorbent filter method adds a low-pass filter to each delay line to attenuate the high-frequency components. The low-pass filter we use is a first-order filter with a cutoff frequency of 10 kHz. The resulting impulse and frequency responses of the 2-channel FDN reverberator are shown in Fig. 7.
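A two-delay-line FDN with a low-pass filter in each line can be sketched as follows. The feedback matrix used here, a rotation matrix scaled by a gain g < 1, is only an assumed example of a unitary matrix times an attenuation and is not necessarily the matrix used in our simulation. The delay lengths, the gains, and the one-pole low-pass coefficient `lp` are likewise hypothetical and are not tuned to a 10 kHz cutoff.

```python
import math
from collections import deque


def fdn_impulse_response(n_samples, delays=(149, 211), g=0.9, lp=0.3,
                         b=(1.0, 1.0), c=(1.0, 1.0), d=0.0):
    """Impulse response of a 2-line FDN with a one-pole low-pass in each delay line."""
    k = g / math.sqrt(2.0)
    A = ((k, k), (-k, k))  # assumed feedback matrix: rotation (unitary) scaled by g
    lines = [deque([0.0] * m, maxlen=m) for m in delays]
    lp_state = [0.0, 0.0]
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        s = [line[0] for line in lines]  # delay-line outputs at the current time
        for i in range(2):               # absorbent (low-pass) filter on each output
            lp_state[i] = (1.0 - lp) * s[i] + lp * lp_state[i]
            s[i] = lp_state[i]
        out.append(c[0] * s[0] + c[1] * s[1] + d * x)
        for i in range(2):               # feed the mixed signals back into the lines
            lines[i].append(A[i][0] * s[0] + A[i][1] * s[1] + b[i] * x)
    return out


if __name__ == "__main__":
    h = fdn_impulse_response(20000)
    print("first echo at sample", next(i for i, v in enumerate(h) if v != 0.0))
```

Because the off-diagonal entries of the matrix route each line’s output into the other line, echoes at combinations of the two delay lengths appear, which a pair of independent comb filters would not produce.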

2.3.2 Multi-tap Delays and Bai’s Reverberator

Multiple delayed values of an input signal [11] can be combined easily to produce multiple reflections of the input. This can be done by having multiple taps pointing to different previous inputs stored in the delay line, or by having separate memory buffers of different sizes in which input samples are stored. The typical impulse response of the multiple-delay effect is presented in Fig. 8(a). The difference equation is a simple modification of the single-delay case. For instance, the difference equation of a multi-tap delay algorithm with 5 delays is the following:

$$y[n] = x[n] + a_1 x[n - D_1] + a_2 x[n - D_2] + a_3 x[n - D_3] + a_4 x[n - D_4] + a_5 x[n - D_5] \qquad (15)$$

The multi-tap delay can be implemented with the direct-form FIR structure shown in Fig. 8(b). If an infinite number of delays is added, the difference equation becomes an IIR comb filter. Because of the disadvantages of the comb filter, we develop Bai’s reverberator using the concept of delay modulation, in which the delay-line center varies with time. The details of the delay modulation effects are given in the appendix.
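The multi-tap difference equation translates directly into code. In the sketch below the taps are passed as (delay, gain) pairs; the function name and the tap values are made up for illustration.

```python
def multitap_delay(x, taps):
    """y[n] = x[n] + sum over k of a_k * x[n - D_k]; taps is a list of (D_k, a_k) pairs."""
    y = [float(v) for v in x]  # direct path
    for delay, gain in taps:
        for n in range(delay, len(x)):
            y[n] += gain * x[n - delay]
    return y


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 19
    taps = [(3, 0.6), (5, 0.5), (8, 0.4), (12, 0.3), (17, 0.2)]  # hypothetical D_k, a_k
    print(multitap_delay(impulse, taps))
```

Each tap contributes exactly one echo, so the impulse response is simply the list of tap gains placed at their delays.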

The poor performance of the comb filter, a noticeable and unpleasant sound like a beat bouncing back and forth between parallel walls, is due to the equal spacing of its poles in the pole-zero plane and of its peaks in the frequency domain. To solve this problem, we replace the comb filter with a time-varying recursive IIR filter with exponentially decaying delay lengths in our reverberation system. For the same reason that Schroeder’s reverberator increases the echo density and modal density, we connect multiple time-varying recursive IIR filters in parallel to increase the echo density. In practice, the Bai’s reverberator we constructed uses one filter to produce the early reflections, in series with another to produce the late reverberation.

The advantage of using this model is that it reduces the obvious peaks in the frequency domain and makes the “metallic” sound disappear. However, the echoes it produces are too discrete, which leads to a grainy sound quality, particularly for impulsive input sounds. The result obtained with this scheme is shown in Fig. 9.

2.3.3 The Image-Source Method

The main mathematical approaches to modeling spatial sound fields are wave-based methods and ray-based methods. The wave-based methods, such as the finite element method (FEM) and the boundary element method (BEM), are the more computationally demanding techniques and are suitable for the simulation of low frequencies only. The ray-based methods, namely the ray-tracing and image-source methods [12] [13], are based on geometrical room acoustics, in which sound is assumed to behave like rays. The basic distinction between the ray-tracing and image-source methods is the way the reflection paths are calculated.
