Bandwidth extension (BWE) refers to methods that increase the frequency bandwidth of a signal. It is desirable when the frequency content of a signal should be enhanced to improve the audio effect, or when the bandwidth of the signal has been reduced because of economic constraints. An obvious way to categorize BWE methods is by the frequency range of interest (high frequency or low frequency) and by where the signal bandwidth is actually extended (physical or psychoacoustic extension) [14]. Psychoacoustic extension, unlike physical BWE, does not physically reproduce the frequency range of interest; instead, it exploits properties of human hearing to achieve the bandwidth extension. In this study, we describe BWE applications to virtual bass (VB) and voice clarity (VC).
The first application is virtual bass technology, which focuses on enhancing the perceived bass of a loudspeaker that has little low-frequency capability, such as that of a cell phone. A common solution is to use equalizers based on shelving filters or other electronic means, but this does not usually give good results. If the power amplifier and the loudspeaker are not redesigned for low-frequency reproduction, boosting the bass directly will cause distortion or even permanent damage. To overcome these problems, virtual bass technology exploits a psychoacoustic property of human hearing: listeners are capable of “extrapolating” a missing fundamental in the low-frequency range from its higher harmonics. The pitch-shifting algorithm of the phase vocoder can realize this concept by modifying the phase properly, after which the equal-loudness contour is used to adjust the loudness [15]. Compared with generating harmonics by pitch shifting, which requires a complex phase calculation [16], nonlinear processing can create new bandwidth more efficiently and conveniently.
Even if the fundamental frequency is missing, the pitch is still perceived as a residue pitch, which in this case is sometimes called ‘virtual pitch’ or ‘missing fundamental.’
Finally, we implement a multistage up/down-sampling structure to reduce the computational load [17]. The second application is voice clarity technology, which is needed when speech is not clear enough for listening. This problem is typically caused by the low loudness of the voice of interest or by loud background music. It often occurs when watching movies or talking on the telephone. In this study, we aim to overcome the problem with some simple algorithms, including nonlinear processing.
Because it is generally difficult to obtain a good low-frequency response from small loudspeakers, it is pertinent to ask whether other options are available. One option is to use BWE with the ‘extension’ taking place in the auditory system, instead of extending the actual physical bandwidth of the signal. This approach makes use of the ‘missing fundamental’ effect, a special case of residue pitch, also known as virtual pitch. We can substitute a frequency $f < f_1$ by a series of harmonics $kf$, $k \geq 2$, to evoke the residue pitch of $f$, while the loudspeaker does not radiate energy at frequency $f$. For voice clarity, we try to make a muffled voice more brilliant and clear with three simple algorithms. First, we adjust the magnitude of certain frequency components with a graphic equalizer [18] to enhance human speech. Second, we use nonlinear processing to generate high-frequency harmonics. Finally, we combine the two aforementioned methods.
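The missing-fundamental substitution described above can be illustrated numerically. The following is a minimal sketch, not the procedure used in this study: it assumes an illustrative sampling rate of 8 kHz, a fundamental of 100 Hz, and the harmonics k = 2, 3, 4, and it uses the autocorrelation peak as a crude stand-in for perceived pitch. The function names are hypothetical.

```python
import math

def harmonic_series(f0, harmonics, fs, n_samples):
    """Sum of unit-amplitude sines at k*f0 for each k in `harmonics`."""
    return [
        sum(math.sin(2.0 * math.pi * k * f0 * n / fs) for k in harmonics)
        for n in range(n_samples)
    ]

def dominant_period(x, window, min_lag, max_lag):
    """Lag (in samples) that maximizes the autocorrelation over a fixed window."""
    def autocorr(lag):
        return sum(x[n] * x[n + lag] for n in range(window))
    return max(range(min_lag, max_lag + 1), key=autocorr)

fs = 8000        # sampling rate in Hz (assumed for illustration)
f0 = 100         # fundamental in Hz; no energy is generated at this frequency
window = 800     # ten periods of f0
max_lag = 120

# Only the harmonics 2*f0, 3*f0 and 4*f0 are present in the signal.
x = harmonic_series(f0, harmonics=(2, 3, 4), fs=fs, n_samples=window + max_lag)
period = dominant_period(x, window, min_lag=10, max_lag=max_lag)
print(period, fs / period)
```

Although the signal contains no component at 100 Hz, its waveform still repeats every 80 samples, that is, with the period of the absent fundamental.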
2.1 Nonlinear processing
In this section, we describe efficient nonlinear operations for extending the frequency bandwidth. These operations are convenient ways of generating harmonic signals with odd or even harmonics. Each has its own spectral characteristics and can create a different kind of audio effect. Before explaining the applications of BWE, we describe how the nonlinear processing works and what its characteristics are [19].
2.1.1 Clipper
A convenient way to generate a harmonic signal with only odd harmonics is by means of a clipper. The clipper output signal $g_c$ in response to an input $f$ is determined by the threshold $l_c$. The clipper in Fig. 1 gives very good subjective results in the low-frequency psychoacoustic BWE application. The clipped signal sounds low-pitched and sufficiently saturated, so the method is suitable for realizing virtual bass. Fig. 2 shows that the clipper creates only odd harmonics and that the fundamental frequency is preserved.
The clipper differs from the rectifier not only in the positions of the generated harmonics but also in the preservation of the fundamental frequency. A disadvantage of the clipper, shared with the rectifier, is that the magnitudes of the generated harmonics cannot be controlled.
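The odd-harmonic behaviour of the clipper can be checked with a short experiment. The sketch below assumes a symmetric hard clipper that limits the input to the range from $-l_c$ to $+l_c$; this specific form, the threshold 0.5, the 8 kHz sampling rate, and the helper names are assumptions made for illustration.

```python
import math

def hard_clip(x, lc):
    """Symmetric hard clipper: limit every sample to the range [-lc, +lc]."""
    return [max(-lc, min(lc, v)) for v in x]

def tone_magnitude(x, freq, fs):
    """Amplitude of the component of x at `freq` Hz (single-bin DFT)."""
    re = sum(v * math.cos(2.0 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq * n / fs) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs = 8000      # sampling rate in Hz (assumed)
f0 = 100       # input tone in Hz
n_samples = 800  # an integer number of periods, so the DFT bins are exact

x = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(n_samples)]
y = hard_clip(x, lc=0.5)

# Amplitudes of the first five harmonics of the clipped tone.
mags = {k: tone_magnitude(y, k * f0, fs) for k in range(1, 6)}
for k, m in mags.items():
    print(f"harmonic {k}: {m:.4f}")
```

In this sketch the even harmonics vanish to within rounding error, while the fundamental and the odd harmonics remain, in line with the behaviour described for Fig. 2.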
2.1.2 Hyperbolic tangent
Unlike the clipper above, which is a “hard” clipper, the hyperbolic tangent function is a “soft” clipper. Fig. 3 shows the transformation of a 100 Hz sine wave by the hyperbolic tangent in the time and frequency domains. The waveform modified by the hyperbolic tangent in Fig. 3(a) appears compressed. This approach is especially suitable for dialogue but not for music, which will be validated in Section 2.3. It uses a function that provides gain at low and moderate signal levels but attenuation at high signal levels. It differs from ordinary compressors because it is memoryless; that is, it is an instantaneous compressor. In our experiments, we used the function

$$y(t) = c_1 \tanh\bigl(c_2\, x(t)\bigr) \qquad (2)$$

where $x(t)$ is the input time signal and $y(t)$ is the output signal modified by the hyperbolic tangent. The constant $c_1$ determines the maximum output level, and $c_2$ determines the gain at low signal levels.
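The roles of the two constants in Eq. (2) can be seen in a few lines of code. This is a minimal sketch; the values of $c_1$ and $c_2$ are chosen only for illustration and are not the ones used in the experiments.

```python
import math

def soft_clip(x, c1, c2):
    """Memoryless soft clipper of Eq. (2): y = c1 * tanh(c2 * x), sample by sample."""
    return [c1 * math.tanh(c2 * v) for v in x]

c1, c2 = 0.8, 3.0   # illustrative values (assumed)

# Low level: tanh(u) is close to u, so the gain is close to c1 * c2.
small = 1e-4
low_level_gain = soft_clip([small], c1, c2)[0] / small

# High level: tanh saturates at 1, so the output approaches c1.
high_level_output = soft_clip([10.0], c1, c2)[0]

print(low_level_gain, high_level_output)
```

For a fixed $c_1$, the small-signal gain $c_1 c_2$ is set by $c_2$, while the output never exceeds $c_1$ in magnitude.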
2.2 Virtual bass
Because of its good spectral characteristics, temporal characteristics, and inter-modulation distortion behavior [20], and because it gives the best low-frequency psychoacoustic performance, we choose the clipper as the method of creating harmonics. Figure 4 shows the whole process of the VB realization. There are two paths: the first is the main structure that performs virtual bass, and the other contains only a delay. First, for computational efficiency, we perform the up/down-sampling in multiple stages [21]. We choose the up/down-sampling ratio $M = 16$, so the length of the original signal decreases by a factor of 16 after down-sampling. Figure 5 shows that the up/down-sampling process is divided into two stages, following the interpolated FIR (IFIR) technique [22]. We choose 8 as the first up/down-sampling ratio and 2 as the second.
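The two-stage down-sampling (ratio 8 followed by ratio 2) can be sketched as follows. This is a simplified illustration, not the filter design used in this study: the anti-aliasing filters are Hamming-windowed sinc filters with assumed tap counts, the cutoff is placed at the new Nyquist frequency for simplicity, and the corresponding up-sampling stages are omitted.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; `cutoff` is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]   # normalize to unity gain at DC

def decimate(x, factor, num_taps):
    """Low-pass filter x and keep every `factor`-th output sample."""
    taps = lowpass_taps(num_taps, cutoff=0.5 / factor)
    out = []
    for n in range(0, len(x), factor):   # compute only the samples that are kept
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out

def multistage_decimate(x, stages):
    """Apply several (ratio, taps) decimation stages in sequence."""
    for factor, num_taps in stages:
        x = decimate(x, factor, num_taps)
    return x

STAGES = ((8, 65), (2, 17))   # ratios from the text; tap counts are assumed

x = [1.0] * 1600              # constant test signal
y = multistage_decimate(x, STAGES)
print(len(x), len(y))
```

The overall ratio is 8 × 2 = 16, so 1600 input samples yield 100 output samples, and each stage needs only a short filter because its own ratio is small.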
Given the specifications of the pass band and stop band, we can design the filters ourselves with the MATLAB toolbox. One important parameter to note is the filter order, which is determined by $D(\delta_1, \delta_2)$, a function of the peak pass-band ripple $\delta_1$ and the peak stop-band ripple $\delta_2$. The number of MPU can likewise be approximated in terms of $D(\delta_1, \delta_2)$.
Next, between the down-sampling and up-sampling stages, a band-pass filter is designed. The filtered signal is used to create the harmonics for virtual bass. Because the bandwidth becomes 3 kHz after down-sampling, the required band-pass filter order is much lower than it would be without down-sampling.
Figure 7 shows that band-pass filters (a) and (b) have almost the same performance; however, (a) needs a filter order of only 60, whereas (b) needs 960 filter taps. This is because (a) operates on the down-sampled signal.
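As a rough check of these numbers, assume (this form is an assumption, not taken from the text) that the FIR order is inversely proportional to the transition width normalized by the sampling rate, $N \approx D(\delta_1, \delta_2)\, F_s / \Delta F$. With the same ripples and the same transition width $\Delta F$ in hertz, down-sampling by $M = 16$ reduces the sampling rate from $F_s$ to $F_s / M$, so

$$\frac{N_{\text{direct}}}{N_{\text{down-sampled}}} \approx \frac{F_s}{F_s / M} = M = 16, \qquad \frac{960}{16} = 60,$$

which is consistent with the orders quoted for filters (b) and (a).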
Table 1 compares a direct design without up/down-sampling with the multistage design. The multistage design is more efficient to run.
2.2.1 Timbre Loudness Control
We use the clipper described in Section 2.1 to create harmonics after the up/down-sampling stage. An adjustable gain control is then applied; without it, the signal modified by the clipper would not be amplified or attenuated to the desired output level. Instead of adjusting the loudness with the equal-loudness contour, we use timbre loudness control to obtain a suitable spectrum and timbre. It not only controls the loudness but also avoids distortion of the timbre. Like an equalizer, it amounts to adjusting a gain control over the whole bandwidth. Our task therefore has the same purpose as an equalizer: to design frequency curves that fit different requirements. The steps are as follows.
First, we use white noise as the input signal and pass it through the clipper, which creates harmonics over a wide bandwidth. Next, we design a frequency curve in the frequency domain and filter the signal with it. After trial and error, if the output sounds like the white noise in loudness and timbre, the frequency curve is taken as the optimal design. Finally, after the first and second paths are combined, a high-pass filter should be designed to avoid reducing the high-frequency components.
Virtual bass can also be performed on a cell phone, whose loudspeaker is much smaller than an ordinary one, but the range of the band-pass filter must be redesigned. The fundamental resonant frequency of a cell phone loudspeaker is almost 1000 Hz, whereas that of ordinary loudspeakers is approximately 200 Hz to 300 Hz. We therefore shift the range of the band-pass filter to 500 Hz to 1000 Hz instead of 50 Hz to 200 Hz. This process is also implemented for automotive audio.
2.3 Voice clarity
Figure 8 shows our structure for implementing a 9-band graphic equalizer using second-order IIR filters. The feed-forward path has a fixed gain of 0.25, while each filter band can be multiplied by a variable gain for boost or attenuation. The a and b coefficients of the second-order transfer function and of the equivalent input-output difference equation can be generated by specifying the filter type, design method, filter order, and frequency specification.
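A parallel equalizer of this kind can be sketched with standard second-order sections. The code below is an illustration under stated assumptions rather than the design of Fig. 8: the band-pass coefficients follow the widely used "Audio EQ Cookbook" form (unity gain at the center frequency), the Q value and center frequencies are hypothetical, and only the fixed feed-forward gain of 0.25 is taken from the text.

```python
import math

def bandpass_coeffs(f0, q, fs):
    """Second-order band-pass with 0 dB peak gain, normalized so that a[0] = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(x, b, a):
    """y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for v in x:
        out = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, v
        y2, y1 = y1, out
        y.append(out)
    return y

def graphic_eq(x, centers, gains, fs, q=1.4, feed_forward=0.25):
    """Fixed feed-forward path plus a sum of gain-weighted band-pass outputs."""
    y = [feed_forward * v for v in x]
    for f0, g in zip(centers, gains):
        band = biquad(x, *bandpass_coeffs(f0, q, fs))
        y = [acc + g * v for acc, v in zip(y, band)]
    return y

fs = 8000                                   # sampling rate in Hz (assumed)
tone = [math.sin(2.0 * math.pi * 1000 * n / fs) for n in range(800)]

# One band centered on the tone with gain 2: the tone is scaled by 0.25 + 2.
boosted = graphic_eq(tone, centers=(1000,), gains=(2.0,), fs=fs)
peak = max(abs(v) for v in boosted[-8:])    # last full period, after the transient
print(round(peak, 6))
```

A 9-band version would pass nine center frequencies and nine gains to `graphic_eq`; the variable gains then shape the summed response.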
Having described the theory of the graphic equalizer, we use it to implement voice clarity. A simple approach is to boost the amplitude within appropriate frequency ranges to clarify the human voice. Figure 9 shows the frequency ranges and magnitudes used for boosting. Each solid line indicates the frequency response of one filter band, and the dashed line represents the sum of all band responses.
The hyperbolic tangent was described in Section 2.1.2; it is suitable for high-frequency extension and dialogue processing. As with the graphic equalizer, a simple way of realizing voice clarity is proposed. Figure 10(a) shows the structure. Because bass and other low-frequency components are usually much louder than high-frequency components, the first step is to pass the original signal through a high-pass filter. Next, the hyperbolic tangent is applied to enhance the voice or high-frequency components. After gain adjustment, the original signal from the other path is added to form the output.
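The signal flow just described (high-pass filter, hyperbolic tangent, gain, and summation with the original path) can be sketched as follows. The first-order high-pass filter, its 1 kHz cutoff, and the values of `c1`, `c2`, and `gain` are assumptions chosen for illustration; the text does not specify them.

```python
import math

def one_pole_highpass(x, fc, fs):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * fc)
    a = rc / (rc + 1.0 / fs)
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y

def voice_clarity(x, fs, fc=1000.0, c1=1.0, c2=3.0, gain=0.5):
    """Add a high-passed, tanh-processed, gain-adjusted copy to the original."""
    high = one_pole_highpass(x, fc, fs)
    enhanced = [gain * c1 * math.tanh(c2 * v) for v in high]
    return [orig + e for orig, e in zip(x, enhanced)]

fs = 8000                                    # sampling rate in Hz (assumed)
dc = [1.0] * 2000                            # pure low-frequency (DC) input
out = voice_clarity(dc, fs)
print(len(out), round(out[-1], 6))
```

A constant input passes through unchanged once the filter has settled, because the high-pass filter keeps low-frequency content out of the nonlinear path; only the higher-frequency content is enriched with harmonics.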
No new method is proposed in this section; we combine the aforementioned methods to achieve voice clarity. Figure 10(b) shows the structure.