2.1 Delays and Echoes
2.1.1 Single Reflection Delay
To create a single reflection of an input signal (see Figure 1), the implementation shown there is represented by the following difference equation [2]:

$$y(n) = x(n) + a\,x(n-D), \qquad (1)$$

and its transfer function is:

$$H(z) = 1 + a\,z^{-D}. \qquad (2)$$

Notice that the input $x(n)$ is added to a delayed copy of the input. The signal can be attenuated by a factor that is less than 1, because reflecting surfaces, as well as air, contain a loss constant $a$ due to absorption of the energy of the source wave. The delay $D$ represents the total time it takes for the signal to return from a reflecting wall. $D$ is created by using a delay-line buffer of a specified length in DSP memory. The frequency response results in a FIR comb filter where peaks in the frequency response occur at multiples of the fundamental frequency [2]. Comb filters result whenever a direct input signal is combined with delayed copies of the direct input.
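As an illustration of equations (1) and (2), the following Python sketch applies the single-reflection difference equation to a list of samples and evaluates the magnitude of the comb-filter response. The function names `single_echo` and `comb_magnitude` are hypothetical, and the input is assumed to be zero before the first sample.

```python
import cmath
import math

def single_echo(x, a, D):
    """Apply y(n) = x(n) + a*x(n-D), assuming x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        delayed = x[n - D] if n >= D else 0.0
        y.append(x[n] + a * delayed)
    return y

def comb_magnitude(a, D, omega):
    """Magnitude of H(z) = 1 + a*z^(-D) at z = exp(j*omega), omega in rad/sample."""
    return abs(1 + a * cmath.exp(-1j * omega * D))

# An impulse produces the direct sound followed by one attenuated reflection.
impulse = [1.0] + [0.0] * 7
echo = single_echo(impulse, a=0.5, D=3)

# The response peaks at multiples of 2*pi/D and dips halfway between them.
peak = comb_magnitude(0.5, 4, 0.0)
notch = comb_magnitude(0.5, 4, math.pi / 4)
```

With a positive gain the peaks have magnitude 1 + a and the notches 1 - a, which is the comb shape described above.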
The DSP can subtract the delayed copy instead of adding it:

$$y(n) = x(n) - a\,x(n-D). \qquad (3)$$

An example implementation for adding an input to a delayed replica is:

$$y(n) = x(n) + x(n-D). \qquad (4)$$
The same approach can be extended to create multiple reflections of the input. This can be done by having multiple taps pointing to different previous inputs stored in the delay line, or by having separate memory buffers of different sizes where input samples are stored. The difference equation is a simple modification of the single-delay case. To add 5 delays of the input, the DSP processing algorithm would perform the following difference equation operation:
$$y(n) = x(n) + a_1\,x(n-D_1) + a_2\,x(n-D_2) + a_3\,x(n-D_3) + a_4\,x(n-D_4) + a_5\,x(n-D_5). \qquad (5)$$
The structure (see Figure 2) uses 5 delay-line tap points for fetching samples. In addition, feedback can be used to take the output of the system delay and feed it back to the input.
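A multi-tap structure of this kind can be sketched with a single circular buffer and several read taps. This is only a minimal Python illustration: `multitap_delay` and its `taps` argument are hypothetical names, the delays are whole numbers of samples, and the feedback path mentioned above is left out.

```python
def multitap_delay(x, taps):
    """Sum the input with several delayed copies read from one delay line.

    taps is a list of (gain, delay_in_samples) pairs, one per reflection.
    """
    max_delay = max(delay for _, delay in taps)
    buf = [0.0] * (max_delay + 1)      # circular delay-line buffer
    write = 0                          # index of the newest sample
    y = []
    for sample in x:
        buf[write] = sample
        out = sample
        for gain, delay in taps:
            # Each tap points 'delay' samples behind the write position.
            out += gain * buf[(write - delay) % len(buf)]
        y.append(out)
        write = (write + 1) % len(buf)
    return y

# Two reflections of an impulse, 2 and 4 samples late.
response = multitap_delay([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [(0.5, 2), (0.25, 4)])
```

The buffer is one sample longer than the largest delay, so the oldest tap never reads a slot that has already been overwritten.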
2.2. Delay Modulation Effects
Delay-based effects are some of the more interesting types of audio effects, yet they are not computationally complex. The technique used is often called Delay-Line Interpolation [3], where the delay-line center tap is modified, usually by some low-frequency waveform. Figure 3 summarizes some common types of modulators used for moving the center tap of a delay line [4].
Consider the FIR comb filter. If the delay is in the range 10 to 25 ms, we will hear a quick repetition named slapback or doubling. If the delay is greater than 50 ms we will hear an echo. If the time delay is short (less than 15 ms) and if this delay time is continuously varied with a low frequency such as 5 Hz, we will hear the flanging effect. If several copies of the input signal are delayed in the range 10 to 25 ms with small and random variations in the delay times, we will hear the chorus effect, which is a combination of the vibrato effect with the direct signal [5]. These effects can also be implemented as IIR comb filters.
The general structure (Figure 4) described by J. Dattorro [3] allows the creation of many different types of delay modulation effects. Each input sample is stored in the delay line, while the moving output tap is retrieved from a different location in the buffer, rotating around the tap center. When the small delay variations are mixed with the direct sound, a time-varying comb filter results [2, 3].
The general delay-line equation for the structure is:

$$y(n) = a_1\,x(n) + a_2\,x(n - d(n)) + a_f\,x(n - D_{fixed}), \qquad (6)$$

where $N$ = variable delay $d(n)$, and $d(n)$ rotates around the tap center of the delay line $D$. If the delay of an input signal is very small (around 10 ms), the echo mixed with the direct sound will cause certain frequencies to be enhanced or canceled (due to the comb filtering). This will cause the output frequency response to change. By varying the amount of delay time when mixing the direct and delayed signals together, the variable delay lines create some amazing sound effects such as chorusing and flanging.
2.2.1. Flanging Effect
The term flanging comes from the way the effect was accidentally discovered. As legend has it, a recording engineer was recording a signal onto 2 reel-to-reel tape decks and monitored from both playback heads of the 2 tape decks at the same time. While trying to simulate the ADT or doubling effect, it was discovered that small changes in the tape speed between the 2 decks created a “swooshing” jet sound. This effect was further enhanced by repeatedly leaning on the flanges of one of the tape reels to slow the tape down slightly. Thus the flanging effect was born.
It is very easy to recreate this effect using a DSP. Flanging can be implemented in a DSP by delaying the input signal by a small time delay that is varied at a very low frequency, and adding the delayed replica to the original input signal (Figure 5).
When the time delay offset is varied by rotating the delay-line center tap, the in-phase and out-of-phase frequencies as a result of the comb filtering sweep up and down the frequency spectrum. The “swooshing” jet engine effect created as a result is referred to as flanging.
By modifying the single reflection echo equation, flanging can be implemented as follows:

$$y(n) = a_1\,x(n) + a_2\,x(n - d(n)) + a_f\,x(n - D), \qquad (7)$$

where $d(n)$ rotates around the tap center of the delay line $D$. Notice that each signal must be scaled by a constant to prevent overflow.
Flanging is created by periodically varying the delay $d(n)$. The variations of the delay time (or delay buffer size) can easily be controlled in the DSP using a low-frequency oscillator sine wave lookup table that calculates the variation of the delay time, and the update of the delay is determined on a sample basis or by the DSP's on-chip timer.
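The following Python sketch illustrates the direct and modulated-delay terms of equation (7). It is a simplified illustration rather than the DSP implementation: the function name `flanger` and all parameter names and default values are assumptions, the third (fixed-tap) term of (7) is omitted, the sine LFO is computed directly instead of being read from a lookup table, and fractional delays are read with linear interpolation.

```python
import math

def flanger(x, fs, base_delay_ms=2.0, depth_ms=1.5, rate_hz=0.5, a1=0.5, a2=0.5):
    """y(n) = a1*x(n) + a2*x(n - d(n)) with a sinusoidally varying delay d(n).

    depth_ms must not exceed base_delay_ms, so that d(n) never goes negative.
    """
    base = base_delay_ms * 1e-3 * fs       # center tap, in samples
    depth = depth_ms * 1e-3 * fs           # tap excursion, in samples
    y = []
    for n in range(len(x)):
        d = base + depth * math.sin(2 * math.pi * rate_hz * n / fs)
        pos = n - d                        # fractional read position
        i = math.floor(pos)
        frac = pos - i
        s0 = x[i] if 0 <= i < len(x) else 0.0
        s1 = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        delayed = (1 - frac) * s0 + frac * s1
        y.append(a1 * x[n] + a2 * delayed)
    return y

# With the modulation depth set to zero the flanger reduces to a fixed comb filter.
static = flanger([1.0, 0.0, 0.0, 0.0, 0.0], fs=1000, base_delay_ms=2.0, depth_ms=0.0)
```

The gains a1 and a2 sum to one here, which is one simple way to respect the overflow constraint noted above in fixed-point arithmetic.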
2.2.2. Chorus Effect
Chorusing is used to “thicken” sounds. This time delay algorithm (between 10 and 35 milliseconds) is designed to duplicate the effect that occurs when many musicians play the same instrument and same music part simultaneously. Musicians are usually synchronized with one another, but there are always slight differences in timing, volume, and pitch between each instrument playing the same musical notes.
This chorus effect can be re-created digitally with a variable delay line rotating around the tap center, adding the time-varying delayed result together with the input signal.
Using this digitally recreated effect, a 6-string guitar can also be chorused to sound more like a 12-string guitar. Vocals can be thickened to sound like more than one musician is singing.
The chorus algorithm is similar to flanging and uses the same difference equation, except that the delay time is longer. With a longer delay line, the comb filtering is brought down to the fundamental frequency and lower-order harmonics. Figure 6 shows the structure of a chorus effect simulating 3 instruments [2, 3]. To implement a chorus of 3 instruments, 2 variable delay lines can be used. A constant scaling factor is applied to each signal to prevent overflow in fixed-point math while mixing all three signals with equal gain.
$$y(n) = a_1\,x(n) + a_2\,x(n - d_1(n)) + a_3\,x(n - d_2(n)) + (a_{f1} + a_{f2})\,x(n - D). \qquad (8)$$

2.2.3. Vibrato Effect
The vibrato effect duplicates vibrato in a singer's voice while sustaining a note, a musician bending a stringed instrument, or a guitarist using the guitar's whammy bar.
This effect is achieved by evenly modulating the pitch of the signal. The sound that is produced can vary from a slight enhancement to a more extreme variation. It is similar to a guitarist moving the whammy bar, or a violinist creating vibrato with cyclical movement of the playing hand. Some effects units offered vibrato as well as a tremolo. However, the effect is more often seen on chorus effects units [6].
The slight change in pitch can be achieved (with a modified version of the chorus effect) by varying the depth with enough modulation to produce a pitch oscillation.
This is accomplished by changing the modify value of the delay-line pointer on the fly, with the value determined by a lookup table. This results in the interpolation/decimation of the stored samples via rotating the center tap of the delay line. The stored history of samples is thus played back at a slower or faster rate, causing a slight change in pitch.
To obtain an even variation in the pitch modulation, the delay line is modified using a sine wavetable. Note that this is a stripped-down version of the chorus effect, in that the direct signal is not mixed with the delay-line output. This effect is often confused with tremolo, where the amplitude is varied by an LFO waveform. Tremolo and vibrato can be combined with a time-varying LPF to produce the effect of a rotating speaker (commonly referred to as a 'Leslie' rotating speaker emulation). Figure 7 shows the implementation of the vibrato effect.
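A vibrato of this kind can be sketched in Python as a delay line whose read tap is moved by a sine wavetable, with no direct signal in the output. The names `vibrato`, `SINE_TABLE` and `TABLE_SIZE`, the table length of 256 and the default parameter values are all assumptions made for this illustration, and the delay is expressed in samples.

```python
import math

TABLE_SIZE = 256
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def vibrato(x, fs, rate_hz=5.0, base_delay=20.0, depth=10.0):
    """Output only the modulated delay-line tap (no direct signal).

    base_delay and depth are in samples; depth must not exceed base_delay.
    """
    y = []
    phase = 0.0                              # position in the wavetable
    step = TABLE_SIZE * rate_hz / fs         # table increment per sample
    for n in range(len(x)):
        lfo = SINE_TABLE[int(phase) % TABLE_SIZE]
        phase = (phase + step) % TABLE_SIZE
        pos = n - (base_delay + depth * lfo)  # moving read tap
        i = math.floor(pos)
        frac = pos - i
        s0 = x[i] if 0 <= i < len(x) else 0.0
        s1 = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        y.append((1 - frac) * s0 + frac * s1)  # linear interpolation
    return y

# Without modulation the structure is a plain 3-sample delay.
delayed = vibrato([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], fs=8000, base_delay=3.0, depth=0.0)
```

While the tap moves toward the newest sample the stored history is read faster than it was written, which raises the pitch; while it moves away the pitch drops.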
2.3. Amplitude-Based Audio Effects
2.3.1. Tremolo Effect
Tremolo consists of panning the output result between the left and right output stereo channels at a slow periodic rate. This is achieved by allowing the output panning to vary in time periodically with a low frequency sinusoid. This example pans the output to the left speaker for positive sine values and pans the output to the right speaker for negative sine values (Figure 8). The analog version of this effect was used frequently on guitar and keyboard amplifiers manufactured in the '70s. A mono version of this effect can be done easily by modifying the code to place the tremolo result to both speakers instead of periodically panning the result. The I/O difference equation is as follows [10]:
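$$y(n) = x(n) \cdot \sin(2\pi f_{cycle}\,t), \qquad (9)$$

where, for sample index $n$, $2\pi f_{cycle}\,t = 2\pi f n$ with the normalized frequency $f = f_{cycle}$ / (sampling rate).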
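Equation (9) and the panning rule described above can be sketched in Python as follows. The function names `tremolo_pan` and `tremolo_mono` are hypothetical, and the treatment of a sine value of exactly zero (sent to the left channel) is an assumption.

```python
import math

def tremolo_pan(x, fs, f_cycle):
    """Apply y(n) = x(n)*sin(2*pi*f_cycle*n/fs) and pan by the sign of the sine.

    Positive sine values go to the left channel, negative values to the right.
    """
    left, right = [], []
    for n, sample in enumerate(x):
        m = math.sin(2 * math.pi * f_cycle * n / fs)
        out = sample * m
        if m >= 0:
            left.append(out)
            right.append(0.0)
        else:
            left.append(0.0)
            right.append(out)
    return left, right

def tremolo_mono(x, fs, f_cycle):
    """Mono variant: the same modulated result is used for both speakers."""
    return [s * math.sin(2 * math.pi * f_cycle * n / fs) for n, s in enumerate(x)]

# One modulation cycle over 8 samples of a constant input.
left, right = tremolo_pan([1.0] * 8, fs=8, f_cycle=1.0)
```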
2.3.2. Rotary Speaker Effect
The rotary speaker effect was first used for the electronic reproduction of organ instruments. A combination of modulation and delay lines can be used for a rotary speaker effect simulation, as shown in figure 9. The simulation makes use of a modulated (or fixed) delay line and amplitude modulation for intensity modifications [11]. A directional sound characteristic similar to that of rotating speakers can be achieved by amplitude modulating the output signals of the delay lines. A stereo rotary speaker effect is perceived due to unequal mixing of the two delay lines to the left and right channel outputs. The directional characteristic of the opposite horn arrangement produces an intensity variation at the listener's ear.
2.3.3. Wah-wah Effect
The wah-wah effect was first used for electric guitar. It is produced mostly by foot-controlled signal processors containing a bandpass filter with a variable center (resonant) frequency and a small bandwidth. Moving the pedal back and forth changes the bandpass center frequency. The wah-wah effect is then mixed with the direct signal as shown in Figure 10. This effect leads to a spectrum shaping similar to speech and produces a speech-like “wah-wah” sound. If the variation of the center frequency is instead controlled by the input signal or by a low-frequency oscillator, the effect is called an auto-wah filter.
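As a rough illustration of an LFO-driven auto-wah, the Python sketch below sweeps the center frequency of a bandpass filter and mixes its output with the direct signal. The text does not specify a filter design, so a digital state-variable filter (the Chamberlin form) is swapped in here as an assumption; the name `auto_wah`, the parameter names and the default values are likewise hypothetical.

```python
import math

def auto_wah(x, fs, f_min=500.0, f_max=3000.0, lfo_hz=2.0,
             damping=0.2, dry=0.5, wet=0.5):
    """Mix the input with a bandpass output whose center frequency is swept by an LFO.

    damping is roughly 1/Q; smaller values give a narrower band.
    """
    low = 0.0                                 # lowpass state
    band = 0.0                                # bandpass state
    center = 0.5 * (f_min + f_max)
    swing = 0.5 * (f_max - f_min)
    y = []
    for n, sample in enumerate(x):
        fc = center + swing * math.sin(2 * math.pi * lfo_hz * n / fs)
        f = 2 * math.sin(math.pi * fc / fs)   # state-variable tuning coefficient
        low += f * band
        high = sample - low - damping * band
        band += f * high
        y.append(dry * sample + wet * band)
    return y

# A constant (DC) input is rejected by the bandpass, leaving only the dry part.
settled = auto_wah([1.0] * 3000, fs=44100, f_min=1000.0, f_max=1000.0, damping=0.5)
```

This filter form is only stable for center frequencies well below the sampling rate (roughly below one sixth of it), so the sweep range should be chosen accordingly.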
2.4. Phase Vocoder Basics
A very interesting (and intuitive) way of modifying a sound is to make a two-dimensional representation of it, modify this representation in some way, and reconstruct a new signal from it. Consequently, a digital audio effect based on time-frequency representations requires three steps: an analysis (sound to representation), a transformation (of the representation) and a re-synthesis (getting back to a sound). The analysis/synthesis scheme is termed the phase vocoder. The input signal $x(n)$ is multiplied by a sliding window of finite length $N$, which yields successive windowed signal segments. These are transformed to the spectral domain by FFTs. In this way, a time-varying spectrum $X(n,k) = |X(n,k)|\,e^{j\varphi(n,k)}$ with $k = 0, 1, \ldots, N-1$ is computed for each windowed segment. The short-time spectra can be modified or transformed for a digital audio effect. Then each modified spectrum is applied to an IFFT and windowed in the time domain. The windowed output segments are then overlapped and added, yielding the output signal [12].
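The short-time Fourier transform (STFT) of the signal $x(n)$ is given by:

$$X(n,k) = \sum_{m=-\infty}^{\infty} x(m)\,h(n-m)\,W_N^{mk}, \qquad W_N = e^{-j2\pi/N}, \qquad (10)$$

$$X(n,k) = X_R(n,k) + jX_I(n,k) = |X(n,k)|\,e^{j\varphi(n,k)}. \qquad (11)$$

$X(n,k)$ is a complex number that represents the magnitude $|X(n,k)|$ and phase $\varphi(n,k)$ of a time-varying spectrum with frequency index $k$ and time index $n$. The summation index is $m$. At each time index $n$ the signal $x(m)$ is weighted by a finite-length window $h(n-m)$. Thus the computation of (10) can be performed by a finite sum over $m$ with an FFT of length $N$.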
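The finite sum behind the STFT can be illustrated with a direct DFT of one windowed segment. The Python sketch below is deliberately naive (an FFT would be used in practice), and the names `hann` and `stft_frame` are hypothetical. It also takes the time origin at the start of each frame, so its phases differ from those of (10) by a linear phase term that depends on the frame position.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N (one common window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

def stft_frame(x, start, window):
    """Spectrum of the windowed segment x[start : start + N], N = len(window).

    Returns N complex values; abs() gives the magnitude and cmath.phase()
    the phase of each bin k. Samples beyond the end of x are taken as zero.
    """
    N = len(window)
    seg = [x[start + m] * window[m] if start + m < len(x) else 0.0
           for m in range(N)]
    return [sum(seg[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N))
            for k in range(N)]

# A cosine that fits exactly two periods into an 8-sample rectangular frame
# concentrates its energy in bins 2 and 6.
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
spectrum = stft_frame(signal, 0, [1.0] * 8)
magnitudes = [abs(v) for v in spectrum]
```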
2.4.1. Filter Bank Summation Model
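The computation of the time-varying spectrum of an input signal can also be interpreted as a parallel bank of $N$ bandpass filters, as shown in Figure 11, with impulse responses and Fourier transforms given by

$$h_k(n) = h(n)\,W_N^{-nk}, \qquad k = 0, 1, \ldots, N-1, \qquad (12)$$

$$H_k(e^{j\Omega}) = H(e^{j(\Omega - \Omega_k)}), \qquad \Omega_k = \frac{2\pi k}{N}. \qquad (13)$$

Each bandpass signal $y_k(n)$ is obtained by filtering the input signal $x(n)$ with the corresponding bandpass filter $h_k(n)$. Since the bandpass filters are complex-valued, we get complex-valued output signals $y_k(n)$, which will be denoted by

$$y_k(n) = \tilde{X}(n,k) = |X(n,k)|\,e^{j\tilde{\varphi}(n,k)}. \qquad (14)$$

These filter operations are performed by the convolutions

$$y_k(n) = \sum_{m} x(m)\,h_k(n-m) = \sum_{m} x(m)\,h(n-m)\,W_N^{-(n-m)k} \qquad (15)$$

$$= W_N^{-nk} \sum_{m} x(m)\,W_N^{mk}\,h(n-m) = W_N^{-nk}\,X(n,k). \qquad (16)$$

From (15) and (16) it is important to notice that

$$\tilde{X}(n,k) = W_N^{-nk}\,X(n,k), \qquad \tilde{\varphi}(n,k) = \frac{2\pi k}{N}\,n + \varphi(n,k).$$

Based on equations (15) and (16), two different implementations are possible, as shown in figure 11. The first implementation is the so-called complex baseband implementation. The baseband signals $X(n,k)$ (short-time Fourier transform) are computed by modulation of $x(n)$ with $W_N^{nk}$ and lowpass filtering for each channel $k$. The modulation of $X(n,k)$ by $W_N^{-nk}$ yields the bandpass signal $\tilde{X}(n,k)$. The second implementation is the bandpass implementation, which filters the input signal with $h_k(n)$ given by (12), as shown in the lower part of figure 11. This implementation leads directly to the complex-valued bandpass signals $\tilde{X}(n,k)$; the modulation stages are shown to point out the equivalence of both implementations.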
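The output sequence $y(n)$ is the sum of the bandpass signals according to

$$y(n) = \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} \tilde{X}(n,k) = \sum_{k=0}^{N-1} X(n,k)\,W_N^{-nk}.$$

For a real-valued input signal $x(n)$ and a real-valued window, the bandpass signals satisfy $\tilde{X}(n,k) = \tilde{X}^*(n,N-k)$, and with the channel center frequencies $\Omega_k = 2\pi k/N$ we get the frequency bands shown in the upper part of figure 11. The property $\tilde{X}(n,k) = \tilde{X}^*(n,N-k)$ allows the formulation of real-valued bandpass signals (real-valued $k$th channel)

$$\hat{y}_k(n) = \tilde{X}(n,k) + \tilde{X}(n,N-k) = 2\,|X(n,k)| \cos(\tilde{\varphi}(n,k)), \qquad k = 1, \ldots, N/2 - 1,$$

that is, cosine signals with fixed frequencies $\Omega_k$ and time-varying amplitude and phase, together with $\hat{y}_0(n) = y_0(n)$ and $\hat{y}_{N/2}(n) = y_{N/2}(n)$ for the lowest and highest channels. This means that we can add real-valued output signals $\hat{y}_k(n)$ to yield the output signal

$$y(n) = \sum_{k=0}^{N/2} \hat{y}_k(n).$$

This interpretation offers analysis of a signal by a filter bank, modification of the short-time spectrum $\tilde{X}(n,k)$, and synthesis by summation of the bandpass signals. In a block-by-block approach, the short-time spectrum is instead computed at analysis time instants separated by a hop size of $R_a$ samples. So the analysis algorithm is given by

$$X(sR_a, k) = \sum_{m} x(m)\,h(sR_a - m)\,W_N^{mk}, \qquad k = 0, 1, \ldots, N-1,$$

where $s$ is the frame index. After spectral processing, the modified spectra $Y(sR_s, k)$ are placed on a synthesis grid, where $R_s$ is the synthesis hop size. The synthesis algorithm is given by

$$y(n) = \sum_{s} f(n - sR_s)\,y_s(n - sR_s),$$

where $y_s(n)$ is the inverse FFT of $Y(sR_s, k)$ and $f(n)$ is the synthesis window.

In this model, the sound signal is considered as a sum of sinusoids. Each of these sinusoids is modulated in amplitude and frequency. These sinusoids represent filtered versions of the original signal. The manipulation of the amplitudes and frequencies of these individual signals will produce a digital effect including pitch shifting or time stretching. One can use a filter bank to split the audio signal into several filtered versions.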
The sum of these filtered versions reproduces the original signal. For a perfect reconstruction the sum of the filter frequency responses should be unity. In order to produce a digital audio effect, one needs to alter the intermediate signals that are analytical signals consisting of real and imaginary parts. The implementation of each filter can be performed by a heterodyne filter, as shown in figure 12.
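The implementation of a stage of a heterodyne filter consists of a complex-valued oscillator with a fixed frequency $\Omega_k$, a multiplier and an FIR filter. The multiplication shifts the spectrum of the sound, and the FIR filter limits the width of the frequency-shifted spectrum. This heterodyne filtering can be used to obtain intermediate analytic signals, which can be put in the form

$$X(n,k) = \left[x(n)\,e^{-j\Omega_k n}\right] * h(n) = X_R(n,k) + jX_I(n,k) = |X(n,k)|\,e^{j\varphi(n,k)},$$

$$X_R(n,k) = |X(n,k)| \cos(\varphi(n,k)), \qquad X_I(n,k) = |X(n,k)| \sin(\varphi(n,k)), \qquad (24)$$

where $|X(n,k)|$ is the magnitude, and the derivative of the phase is a measure of the frequency deviation from the center frequency $\Omega_k$. A sinusoid $x(n) = \cos(\Omega_k n + \varphi_0)$ with frequency $\Omega_k$ can be written as $x(n) = \cos(\tilde{\varphi}(n))$ with $\tilde{\varphi}(n) = \Omega_k n + \varphi_0$, so the derivative of $\tilde{\varphi}(n)$ gives the frequency $\Omega_k$. The derivative of the phase $\tilde{\varphi}(n,k)$ is termed the instantaneous frequency, given by

$$\omega_i(n,k) = 2\pi f_i(n,k)\,T = \frac{d\tilde{\varphi}(n,k)}{dn} = \Omega_k + \frac{d\varphi(n,k)}{dn},$$

where $T$ is the sampling period. The recalculation of the phase from a modified instantaneous frequency is done by computing the phase according to

$$\tilde{\varphi}(n,k) = \tilde{\varphi}(0,k) + \int_0^{nT} 2\pi f_i(\tau,k)\,d\tau. \qquad (27)$$

The output signal is then obtained by summing the individual back-shifted signals.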
Time-frequency scaling is one of the most interesting and difficult tasks that can be assigned to time-frequency representations: changing the time scale independently of the “frequency content”. For example, one can change the rhythm of a song without changing its pitch, or conversely transpose a song without any time change.
Time stretching is not a problem that can be stated independently of perception: we know, for example, that a sum of two sinusoids is equivalent to a product of a carrier and a modulator.
There are two implementations for time-frequency scaling by phase vocoder. The first one uses a bank of oscillators, whose amplitudes and frequencies vary over time.
If we can manage to model a sound by the sum of sinusoids, time stretching and pitch shifting can be performed by expanding the amplitude and frequency functions. The second implementation uses the sliding Fourier transform as the model for re-synthesis: if we can manage to spread the image of a sliding FFT over time and calculate new phases, then we can reconstruct a new sound with the help of inverse FFTs. Both of these techniques rely on phase interpolation, which needs an unwrapping algorithm at the analysis stage, or equivalently an instantaneous frequency calculation. The time stretching algorithm mainly consists of providing a synthesis grid which is different from the analysis grid, and finding a way to reconstruct a signal from the values on this grid. The classical way of using a phase vocoder for time stretching is to keep the magnitude unchanged and to modify the phase in such a way that the instantaneous frequencies are preserved.
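The phase handling for one frequency bin can be sketched in Python as follows. This is a minimal illustration of the usual unwrapping step, under the assumption that the analysis phases of two successive frames are available; the names `princarg` and `synthesis_phase_step` and the argument names are hypothetical.

```python
import math

def princarg(phi):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def synthesis_phase_step(phi_prev, phi_curr, psi_prev, k, N, Ra, Rs):
    """Advance the synthesis phase of bin k by one frame.

    phi_prev, phi_curr: analysis phases of two frames that are Ra samples apart.
    psi_prev: synthesis phase of the previous output frame.
    Returns (instantaneous frequency in rad/sample, new synthesis phase).
    """
    omega_k = 2 * math.pi * k / N              # center frequency of bin k
    expected = omega_k * Ra                    # phase advance of a bin-centered sinusoid
    deviation = princarg(phi_curr - phi_prev - expected)   # unwrapped remainder
    inst_freq = omega_k + deviation / Ra
    psi = princarg(psi_prev + inst_freq * Rs)  # same frequency, new hop size
    return inst_freq, psi

# A sinusoid slightly above the center of bin 10, analyzed with hop 8 and
# re-synthesized with hop 16 (a time stretch by a factor of 2).
omega = 2 * math.pi * 10.3 / 64
phi0 = 0.3
phi1 = princarg(phi0 + omega * 8)
freq, psi = synthesis_phase_step(phi0, phi1, 0.0, k=10, N=64, Ra=8, Rs=16)
```

The unwrapping is only unambiguous while the frequency deviation from the bin center, multiplied by the analysis hop size, stays below π in magnitude.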
In the FFT analysis (sum of sinusoids synthesis) approach, we calculate the instantaneous frequency for each bin and integrate the corresponding phase increment in order to reconstruct a signal as the weighted sum of cosines of the phases. However, here the hop size for the re-synthesis is different from that of the analysis. Therefore the following steps are necessary:
1. Calculate the phase increment per sample by $d\psi(k) = \Delta\varphi(k)/R_a$.
2. For the output samples of the re-synthesis, integrate this value according to $\tilde{\psi}(n+1,k) = \tilde{\psi}(n,k) + d\psi(k)$.
3. Sum the intermediate signals, which yields $y(n) = \sum_{k=0}^{N/2} A(n,k) \cos(\tilde{\psi}(n,k))$ (see figure 13).
2.5.2. Pitch Shifting
Pitch shifting is different from frequency shifting: a frequency shift is an addition to every frequency, while pitch shifting is the multiplication of every frequency by a transposition factor. Pitch shifting can be directly linked to time stretching.
Resampling a time-stretched signal with the inverse of the time stretching ratio performs pitch shifting and restores the initial duration of the signal (see figure 14). There are, however, alternative solutions which allow the direct calculation of a pitch shifted version of a sound.
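The resampling step can be illustrated with a very simple linear-interpolation resampler in Python. The name `resample_linear` is hypothetical, and linear interpolation is an assumption chosen for brevity; a practical implementation would use a band-limited interpolator to avoid aliasing.

```python
def resample_linear(x, ratio):
    """Read x at positions 0, ratio, 2*ratio, ... using linear interpolation.

    ratio > 1 shortens the signal and raises its pitch by that factor;
    ratio < 1 lengthens it and lowers the pitch.
    """
    out = []
    pos = 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1 - frac) * x[i] + frac * nxt)
        pos += ratio
    return out

# Reading every second sample halves the duration (one octave up);
# reading at half steps doubles it again.
shorter = resample_linear([0.0, 1.0, 2.0, 3.0, 4.0], 2.0)
longer = resample_linear(shorter, 0.5)
```

Applied to a signal that was first time-stretched by a factor of 2, a ratio of 2 returns the signal to its initial length while the pitch ends up one octave higher.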
In the time stretching algorithm using the sum of sinusoids we have an evaluation of instantaneous frequencies. As a matter of fact transposing all the instantaneous frequencies can lead to an efficient pitch shifting algorithm. Therefore the following steps have to be performed:
1. Calculate the phase increment per sample by $d\psi(k) = \Delta\varphi(k)/R_a$.
2. Multiply the phase increment by the transposition factor transpo and integrate the modified phase increment according to $\tilde{\psi}(n+1,k) = \tilde{\psi}(n,k) + \mathrm{transpo} \cdot d\psi(k)$.
3. Calculate the sum of sinusoids: when the transposition factor is greater than one, keep only frequencies under the Nyquist frequency bin N/2. This can be done by taking only the first $N/(2 \cdot \mathrm{transpo})$ frequency bins.
2.5.3. Robotization
The effect applies a fixed pitch onto a sound. Moreover, as it forces the sound to be periodic, many erratic and random variations are converted into robotic sounds.
The sliding FFT of pulses where the analysis is taken at the time of these pulses will give a zero phase value for the phase of the FFT. This is a clear indication that putting a zero phase before an IFFT re-synthesis will give a fixed pitch sound. So zeroing the phase can be viewed from two points of view:
1. The result of an IFFT is a pulse-like sound and summing such grains at regular intervals gives a fixed pitch.
2. Due to the fact that the time-frequency representation now shows a succession of vertical lines with zero values in between, this will lead to a comb filter effect during re-synthesis.
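The phase-zeroing step for a single frame can be sketched in Python with a direct DFT and inverse DFT. The function names `dft`, `idft` and `robotize_frame` are hypothetical, the transforms are written out naively for clarity (an FFT would be used in practice), and windowing and overlap-add of successive frames are left out.

```python
import cmath
import math

def dft(seg):
    """Direct DFT of a list of samples."""
    N = len(seg)
    return [sum(seg[m] * cmath.exp(-2j * math.pi * m * k / N) for m in range(N))
            for k in range(N)]

def idft(spec):
    """Direct inverse DFT."""
    N = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * m * k / N) for k in range(N)) / N
            for m in range(N)]

def robotize_frame(seg):
    """Keep the magnitude of each bin and set its phase to zero."""
    zero_phase = [abs(v) for v in dft(seg)]
    return [v.real for v in idft(zero_phase)]

# A sine frame becomes a cosine frame: the energy is pulled toward the
# start of the grain, whatever the phase of the input was.
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
grain = robotize_frame(frame)
```

Placing such grains at a regular hop size imposes a pitch equal to the sampling rate divided by that hop size, which corresponds to the first point of view above.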
3 Virtual Bass Synthesis
3.1 Bandwidth Extension (BWE)
3C products (computer, communication, and consumer electronics) are increasingly popular in everyday life. The marketplace demands more audio and video functions from 3C products than ever. For instance, a