INTRODUCTION - Analysis and Implementation of Multichannel Audio Reproduction

The central idea of spatial audio reproduction is to synthesize a virtual sound image: the signals reproduced at the listener’s ears are made to match those that a real source at the intended position would have produced, so that the listener perceives the sound as coming from that position [1], [2].

This attractive feature makes spatial audio an emerging technology with promising applications in mobile phones, personal computer multimedia, video games, home theater, car audio, etc.

Spatial audio is rendered through either headphones or loudspeakers.

Headphone reproduction is straightforward but suffers from several shortcomings, such as in-head localization, front-back reversal, and discomfort in wearing. Loudspeakers do not share these problems, but another issue adversely affects the performance of spatial audio rendering with loudspeakers.

The issue frequently encountered in loudspeaker reproduction is the crosstalk in the contralateral paths from the loudspeakers to the listener’s ears, which may obscure source localization. To overcome this problem, crosstalk cancellation systems (CCS) that seek to minimize, if not totally eliminate, the crosstalk have been studied extensively [3]–[8]. Various inverse filtering approaches have been suggested for designing the multi-channel pre-filters of a CCS.
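As a minimal illustration of the inverse filtering idea, the following Python sketch builds a 2 × 2 plant matrix at a single frequency and inverts it with Tikhonov regularization. The free-field point-source plant, the geometry (loudspeakers 1 m away, ears 0.175 m apart, 60° span), the regularization parameter `beta`, and all function names are assumptions made for illustration only; the cited works rely on measured HRTFs and a variety of design methods.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def plant_matrix(freq_hz, span_deg, distance_m=1.0, ear_half_spacing_m=0.0875):
    """Free-field 2x2 plant H: rows are ears (L, R), columns are loudspeakers (L, R)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    half = math.radians(span_deg) / 2.0
    speakers = [(-distance_m * math.sin(half), distance_m * math.cos(half)),
                (distance_m * math.sin(half), distance_m * math.cos(half))]
    ears = [(-ear_half_spacing_m, 0.0), (ear_half_spacing_m, 0.0)]
    h = []
    for ex, ey in ears:
        row = []
        for sx, sy in speakers:
            r = math.hypot(sx - ex, sy - ey)          # path length in metres
            row.append(cmath.exp(-1j * k * r) / r)    # point-source transfer function
        h.append(row)
    return h


def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][n] * b[n][j] for n in range(2)) for j in range(2)] for i in range(2)]


def hermitian(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def inverse(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def ccs_filters(h, beta=0.0):
    """Regularized inverse C = (H^H H + beta I)^-1 H^H; beta = 0 gives the exact inverse."""
    hh = hermitian(h)
    gram = matmul(hh, h)
    gram[0][0] += beta
    gram[1][1] += beta
    return matmul(inverse(gram), hh)


H = plant_matrix(1000.0, 60.0)
C = ccs_filters(H)            # exact inverse
P = matmul(H, C)              # overall response from binaural inputs to ears
print(abs(P[0][0]), abs(P[0][1]))
```

With `beta = 0` the product `P` is the identity up to rounding, so the crosstalk terms vanish at the design position. A positive `beta` trades some cancellation for smaller filter gains.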

Notwithstanding the preliminary success of CCS in the academic community, one problem seriously hampers its use in practical applications: the limited size of the so-called “sweet spot” in which CCS remains effective. The sweet spot is generally so small, especially in the lateral direction, that a head movement of a few centimeters can completely destroy the cancellation performance. Two kinds of approach can be used to address this problem: the adaptive design and the robust design. An example of adaptive CCS with a head tracker was presented in the work of Kyriakakis et al. [9], [10]. This approach dynamically adjusts the CCS filters by tracking the head position of the listener using optical or acoustical sensors. However, it has not been widely used because of the increased hardware and software complexity of the head tracker. Alternatively, instead of dynamically tracking the listener’s head, a CCS design using fixed filters can be adopted to create a “widened” sweet spot that accommodates larger head movement. Ward and Elko at Bell Labs conducted a series of insightful analyses of the robustness of CCS. In their 1998 paper, the robustness of a two-channel stereo loudspeaker (2 × 2) CCS was investigated using a weighted cancellation performance measure at the pass zone and stop zone, respectively [11]. In a 1999 paper by the same authors, the robustness of a 2 × 2 CCS was revisited using a different measure, the condition number, which focuses more on numerical stability during matrix inversion in the presence of noise in the data and/or perturbations to system properties [12]. In yet another paper by Ward, a joint least-squares optimization method was employed to obtain a CCS that is robust to head misalignment [13]. This body of research arrives at a simple but important conclusion: the optimal loudspeaker spacing should be inversely proportional to the operating frequency.
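The relation between conditioning, loudspeaker span, and frequency can be illustrated with a small numerical sketch. The Python code below assumes a free-field point-source plant, a centered listener, ears 0.175 m apart, and loudspeakers 1 m away; these values and all names are illustrative assumptions and are not taken from the cited papers. Under this model the plant is the symmetric matrix [[a, b], [b, a]], whose singular values are |a + b| and |a - b|, so its condition number equals 1 when the ipsilateral and contralateral paths differ by a quarter wavelength.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0     # m/s (assumed)
EAR_HALF_SPACING = 0.0875  # m (assumed)
DISTANCE = 1.0             # m from head center to each loudspeaker (assumed)


def path_lengths(span_deg):
    """Return (ipsilateral, contralateral) path lengths for a centered, symmetric setup."""
    half = math.radians(span_deg) / 2.0
    sx, sy = DISTANCE * math.sin(half), DISTANCE * math.cos(half)
    near = math.hypot(sx - EAR_HALF_SPACING, sy)
    far = math.hypot(sx + EAR_HALF_SPACING, sy)
    return near, far


def condition_number(freq_hz, span_deg):
    """2-norm condition number of the symmetric plant [[a, b], [b, a]]."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    near, far = path_lengths(span_deg)
    a = cmath.exp(-1j * k * near) / near
    b = cmath.exp(-1j * k * far) / far
    # The matrix is circulant (hence normal), so its singular values
    # are the magnitudes of its eigenvalues a + b and a - b.
    s1, s2 = abs(a + b), abs(a - b)
    return max(s1, s2) / min(s1, s2)


def first_optimal_frequency(span_deg):
    """Lowest frequency at which the path difference is a quarter wavelength."""
    near, far = path_lengths(span_deg)
    return SPEED_OF_SOUND / (4.0 * (far - near))


for span in (10.0, 60.0):
    f_opt = first_optimal_frequency(span)
    print(span, round(f_opt), condition_number(f_opt, span), condition_number(100.0, span))
```

Under these assumptions the first best-conditioned frequency is roughly 1 kHz for a 60° span and between 5 and 6 kHz for a 10° span, which is consistent with the inverse relation between loudspeaker spacing and operating frequency. At 100 Hz the narrow span is markedly worse conditioned than the wide one.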

Along the line of robust CCS design, the celebrated “stereo dipole” configuration was suggested by Kirkeby, Nelson, and Hamada [14], [15]. In their arrangement, two loudspeakers are closely spaced with a span of only 10°. Their analysis of the robustness of CCS also focused primarily on numerical stability in relation to errors in matrix inversion. The consistent finding of these studies was that the optimal loudspeaker spacing is inversely proportional to the operating frequency. Since the optimal spacing is frequency dependent, a multidrive configuration known as the optimal source distribution (OSD) system, comprising pairs of loudspeakers with various spacings, was suggested to deal with crosstalk in different frequency bands [16]. Another multidrive CCS design was developed by Bai et al. based on the genetic algorithm and array signal processing [17]. Unlike the OSD system, their approach requires no crossover circuits.

According to Gardner, loudspeakers spaced apart tend to yield a smaller equalization zone than loudspeakers spaced closely [18]. However, the improvement is predominantly along the front-back axis, and the equalization zone widens only slightly when the speakers are positioned closely together. One disadvantage of close spacing is the lack of natural high-frequency separation due to head shadowing. Another problem is that a small head rotation will cause both speakers to fall on the same side of the head, so that the panning mechanism fails.

Thus far, the closely spaced CCS has been shown to have both pros and cons. The question of which loudspeaker arrangement is best has remained open for quite some time, and it is worth exploring the underlying physical insights from all possible angles. This motivates the current research to undertake a comprehensive study in the hope of resolving the optimal CCS problem more conclusively. In Gardner’s work, the head-related transfer functions (HRTFs) were measured at the MIT Media Lab [19], [20] and subjective listening tests were conducted. However, only the crosstalk below 6 kHz was considered, resulting in a bandlimited CCS design.

Furthermore, the robustness of CCS to head misalignment was discussed in depth by Takeuchi and Nelson [15]. In both works, only two listening spans, 10° and 60°, were investigated. By contrast, the emphasis of this work is placed on analyzing the effects of listening angle on CCS in terms of not only robustness but also performance. This work has several special features.

First, not only the robustness but also the performance of CCS is examined with the aid of a more comprehensive set of indices. Second, two definitions of the sweet spot are employed to assess robustness. Third, the present work considers the entire audible band up to 20 kHz, in which the listener’s head may provide natural separation for certain loudspeaker arrangements. Fourth, apart from the objective physical tests, subjective listening tests are conducted to assess in practice the CCS arrangements with different listening angles. The results of the subjective tests are validated using analysis of variance (ANOVA). Although the last three points have been investigated in [15] and [18], this study examines the design issues in further detail and in some cases reaches conclusions different from those of the previous research. The intention is to establish a sustainable configuration of CCS that best reconciles separation performance with robustness against lateral head movement, not only in theory but also in practice.

Besides the sweet spot issue, another concern is the computational load. Long filters with many taps are usually needed to achieve excellent performance, especially in a reverberant room. An efficient method of bandlimited implementation based on the subband approach is presented in Sec. 4. In consideration of the robustness against uncertainties in HRTFs, head movement, and the head shadowing effect at high frequencies, the proposed CCS is bandlimited to frequencies below 6 kHz [18]. That is, the CCS functions only at low frequencies, and the binaural signals are passed through directly at high frequencies. The bandlimited implementation approach suggested in [18] is more computationally demanding because of its fixed operating rate. In this work, we adopt a subband filtering technique based on a cosine-modulated quadrature mirror filter (QMF) bank [21]. In this design, the perfect reconstruction condition is approximately fulfilled and the CCS is operated at a low rate. Therefore, more of the filtering effort can be devoted to low frequencies, in line with the characteristics of human auditory perception. To verify the proposed CCS, subjective listening experiments were conducted to compare it with the traditional CCS, and the results of the subjective tests were validated using ANOVA. The intention is to develop a CCS with a light computational load that performs comparably to the fullband CCS.
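A rough arithmetic sketch shows why operating the CCS at a low rate reduces the load. For direct-form FIR filters covering a fixed time span, both the number of taps and the number of output samples per second scale with the sample rate, so the multiply-accumulate (MAC) count scales with its square. The 48 kHz rate, the 20 ms filter span, and the function name below are hypothetical, and the cost of the analysis and synthesis filter banks is deliberately ignored.

```python
def macs_per_second(filter_duration_s, sample_rate_hz, num_filters=4):
    """Multiply-accumulates per second for direct-form FIR filters.

    num_filters=4 corresponds to the four filters of a 2x2 CCS matrix.
    """
    taps = round(filter_duration_s * sample_rate_hz)   # taps needed to span the duration
    return num_filters * taps * sample_rate_hz         # taps evaluated once per output sample


fullband = macs_per_second(0.02, 48000)   # filters running at the full rate
lowrate = macs_per_second(0.02, 12000)    # same time span, rate reduced by 4 (6 kHz bandwidth)
print(fullband, lowrate, fullband / lowrate)
```

In this hypothetical setting, reducing the rate by a factor of 4 reduces the filtering cost by a factor of 16, before the filter bank overhead is added back.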

In addition, since the subwoofer channel plays an important role when watching DVDs and listening to music, a bass enhancement system based on a sensorless cone velocity observer is proposed in Sec. 5. The system has two features. One is that the cone velocity observer requires no sensor. The other is that a hybrid control architecture comprising a feedback controller and a feed-forward controller is employed. Experimental results are discussed.

Finally, this technique is extended to multi-channel inverse filtering for automotive virtual surround audio systems. In recent years, car electronics has received considerable attention and is regarded as the fourth ‘C’ industry, in addition to the 3C industries (Computer, Communication, and Consumer electronics).

Research efforts are currently directed toward new applications in car electronics, including audio/video entertainment, global positioning systems (GPS), mobile communication, active safety control, intelligent engine control, and so forth. In contrast to the traditional audio entertainment system comprising a radio set and a cassette recorder, watching TV or Digital Versatile Discs (DVDs), playing video games, and even holding video conferences in a car have become reality, owing to the rapid advances in flat panel displays and digital telecommunication technologies.

Despite the increasing proliferation of automotive audiovisual systems, the interior of a car is known as a notoriously poor listening environment owing to reflections in a confined space, non-ideal user and loudspeaker positions, ambient noise, etc. This motivates the current research to develop automotive audio spatializers that create a proper listening environment in vehicles. In addition to conventional multi-channel panning techniques [22], there are two advanced methods for spatial audio rendering: binaural audio [1]–[18] and wave field synthesis (WFS) [23]–[26]. Binaural audio is usually intended for one user using a pair of stereo loudspeakers. This approach, however, suffers from the limited size of the so-called “sweet spot” in which the system remains effective [11]–[18]. At the other extreme, the WFS technique is ideally immune to the sweet spot problem, and the listeners are free to move within the reproduction area. However, considerable coverage of WFS in academia has not led to widespread commercial adoption of this technique. The key issue is that a large number of loudspeakers, and hence complex processing, is required, which limits its implementation in practical systems. Pragmatic approaches are presented in this study as a compromise between binaural audio and WFS.

Although spatial audio reproduction has been studied extensively by researchers, little can be found on automotive applications of this technology. By contrast, there are already some luxury cars on the market that are equipped with multi-channel surround systems. These systems usually comprise many high-quality loudspeakers alongside digital audio processors, e.g., Lexicon’s LOGIC 7™ [27], Dolby® Pro Logic II [28], and SRS® Labs’ SRS Automotive™ [29]. Logic 7 and Pro Logic II are upmixers that extend 2-channel content to 5.1-channel systems. Bose® AudioPilot® [30] and the Bang & Olufsen advanced sound system [31] can automatically adjust the volume according to the background noise. Crockett et al. pointed out new trends in automotive audio technology and suggested methods to improve stereo imaging for off-center listeners [32]. Although many commercial systems have emerged, they are mostly based on panning and equalization methods. Few, if any, have addressed the spatial audio rendering problem for vehicles using more sophisticated and accurate approaches. This paper aims at rendering sound fields in a car environment using various inverse filtering and up/down mixing techniques.

These approaches are targeted at less expensive cars in which only a limited number of loudspeakers are available. The proposed system can handle two kinds of audio input: 2-channel content in CD and MP3 format, and 5.1-channel content in DVD and Digital Video Broadcasting (DVB) format.

This paper presents several approaches to automotive spatial audio for various passenger seating modes. Multi-channel inverse filtering in conjunction with up/down mixing is employed to design audio spatializers for 2-channel and 5.1-channel inputs. Sweet spot analysis is conducted using the free-field point source model. Although the simulated conditions are simplified relative to realistic scenarios, the analysis shows the effects of head movement on rendering performance. The proposed approaches have been implemented in a real car using a fixed-point digital signal processor (DSP) and the loudspeakers installed in the car. Listening tests were conducted to compare the presented virtual surround systems. The results of the subjective tests were processed using multivariate analysis of variance (MANOVA) [33], and design strategies are discussed.
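A sweet spot analysis with the free-field point source model can be sketched as follows for the simplest 2 × 2 case. The filters are designed for a centered head and then evaluated as the head shifts laterally. The geometry (loudspeakers 1 m away, ears 0.175 m apart), the 12 dB separation threshold, the single analysis frequency, and all function names are assumptions chosen for illustration; they are not the indices or the multi-channel car configurations used in this work.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def plant_matrix(freq_hz, span_deg, head_shift_m=0.0,
                 distance_m=1.0, ear_half_spacing_m=0.0875):
    """Free-field 2x2 plant: rows are ears (L, R), columns are loudspeakers (L, R)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    half = math.radians(span_deg) / 2.0
    speakers = [(-distance_m * math.sin(half), distance_m * math.cos(half)),
                (distance_m * math.sin(half), distance_m * math.cos(half))]
    ears = [(head_shift_m - ear_half_spacing_m, 0.0),
            (head_shift_m + ear_half_spacing_m, 0.0)]
    h = []
    for ex, ey in ears:
        row = []
        for sx, sy in speakers:
            r = math.hypot(sx - ex, sy - ey)
            row.append(cmath.exp(-1j * k * r) / r)
        h.append(row)
    return h


def inverse(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][n] * b[n][j] for n in range(2)) for j in range(2)] for i in range(2)]


def channel_separation_db(freq_hz, span_deg, head_shift_m):
    """Left-ear separation when filters designed for a centered head meet a shifted head."""
    filters = inverse(plant_matrix(freq_hz, span_deg))               # nominal design
    p = matmul(plant_matrix(freq_hz, span_deg, head_shift_m), filters)
    crosstalk = max(abs(p[0][1]), 1e-12)                             # avoid log of zero
    return 20.0 * math.log10(abs(p[0][0]) / crosstalk)


def sweet_spot_half_width_m(freq_hz, span_deg, threshold_db=12.0,
                            step_m=0.001, max_shift_m=0.3):
    """Largest rightward shift, scanned in step_m increments, that keeps the separation above the threshold."""
    shift = 0.0
    while shift + step_m <= max_shift_m:
        if channel_separation_db(freq_hz, span_deg, shift + step_m) < threshold_db:
            break
        shift += step_m
    return shift


for shift in (0.0, 0.01, 0.05):
    print(shift, channel_separation_db(1000.0, 60.0, shift))
print(sweet_spot_half_width_m(1000.0, 60.0))
```

Under these assumptions the separation at 1 kHz falls off quickly with lateral displacement, and the half-width of the sweet spot is on the order of a few centimeters.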
