CONCLUSIONS - Analysis and Implementation of Multichannel Audio Reproduction

A comprehensive study of the effects of listening angle on crosstalk cancellation in spatial sound reproduction using two-channel stereo systems was conducted in Sec. 3. The intention was to establish a viable configuration of the CCS that best reconciles separation performance with robustness against lateral head movement, not only in theory but also in practice. Like previous research, which focused mainly on numerical stability, the present work concludes that inversion of ill-conditioned systems results in high-gain filters, loss of dynamic range, and hence degraded separation performance. Regularization is required to strike a compromise between numerical stability and separation performance. However, because this work employed a more comprehensive approach, it also reached findings that differ from the previous study. First, the HRTF results show that the problem of high-frequency ringing is not as critical as in the point source model, owing to head shadowing. In addition, poor conditioning, high gain, and low performance may arise at low frequencies for extremely small span arrangements, whereas a wide span arrangement offers a broader useful frequency range with good performance and numerical stability. The effects of listening angle were also examined in the context of the sweet spot. Two sweet spot definitions were employed in the simulation. The relative sweet spot suggests that robustness is excellent with a small span arrangement notwithstanding the poor performance at the nominal position, which agrees with previous research. However, a relatively robust arrangement is of little practical use if its average channel separation within the sweet spot is very poor.

Therefore, in addition to the conventional relative definition, we suggest another definition, the absolute sweet spot, to make the evaluation more complete. In an absolute sweet spot, performance is guaranteed in addition to relative robustness, which is desirable in practical use of the CCS. The absolute sweet spot results reveal that arrangements with listening angles ranging from 120 to 150 degrees are optimal choices.
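The link between loudspeaker span, conditioning, and regularization summarized above can be illustrated with a minimal sketch. It assumes a symmetric free-field point source model rather than the measured HRTFs used in this work, and the function names, the 1 m loudspeaker distance, the 0.18 m ear spacing, and the regularization constant are all hypothetical choices made for illustration only.

```python
import cmath
import math

C_SOUND = 343.0  # speed of sound in m/s (assumed value)


def plant(freq_hz, span_deg, distance=1.0, ear_spacing=0.18):
    """Return (a, b), the ipsilateral and contralateral free-field transfer
    functions of a symmetric two-loudspeaker, two-ear arrangement.
    Geometry values are illustrative assumptions."""
    half = math.radians(span_deg) / 2.0
    # Right loudspeaker position; the ears sit at (+/- ear_spacing / 2, 0).
    sx, sy = distance * math.sin(half), distance * math.cos(half)
    r_ipsi = math.hypot(sx - ear_spacing / 2.0, sy)
    r_contra = math.hypot(sx + ear_spacing / 2.0, sy)
    k = 2.0 * math.pi * freq_hz / C_SOUND  # wavenumber
    a = cmath.exp(-1j * k * r_ipsi) / r_ipsi
    b = cmath.exp(-1j * k * r_contra) / r_contra
    return a, b


def condition_number(a, b):
    """Condition number of H = [[a, b], [b, a]].
    For this symmetric matrix the singular values are |a + b| and |a - b|."""
    s1, s2 = abs(a + b), abs(a - b)
    return max(s1, s2) / min(s1, s2)


def regularized_inverse_gains(a, b, beta):
    """Singular values of the Tikhonov-regularized inverse
    (H^H H + beta I)^-1 H^H, i.e. s / (s^2 + beta) for each singular value s."""
    return [s / (s * s + beta) for s in (abs(a + b), abs(a - b))]


if __name__ == "__main__":
    for span in (10.0, 60.0, 120.0):
        a, b = plant(200.0, span)
        print(span,
              condition_number(a, b),
              max(regularized_inverse_gains(a, b, 0.0)),
              max(regularized_inverse_gains(a, b, 0.01)))
```

In this simplified model the small span yields the largest condition number at low frequency, and a nonzero regularization constant bounds the inverse filter gain at the cost of exact inversion, which is the trade-off between numerical stability and separation described above.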

To verify the conjectures above, objective and subjective experiments were undertaken in an anechoic room for three loudspeaker arrangements: the stereo dipole (10-deg), the standard span (60-deg), and the proposed span (120-deg). The results, analyzed using ANOVA, indicate that the 120-deg configuration performs comparably to the standard 60-deg configuration and better than the 10-deg configuration. A small span arrangement produces a large relative sweet spot because, with closely spaced loudspeakers, head displacement causes only a minimal change in the time-of-arrival difference between the two loudspeakers. This configuration is well suited to applications that must be spatially compact, e.g., mobile phones and other portable devices. Nevertheless, the benefit of the small span arrangement comes at the price of poor conditioning, high gain, and limited performance at low frequencies. Apart from this, because it lacks the natural high-frequency separation provided by head shadowing, the small span arrangement cannot position “out-of-range” sources when the CCS breaks down at high frequencies, where the phantom source is incorrectly panned within a narrow span. The large span arrangement appears to be more effective than the small span arrangement because head shadowing and the panning effect help to provide localization to a certain degree even if the CCS breaks down. While it may seem from this report that the large-span configuration is predominantly favored, problems inherent to large spans prevent the span from growing indefinitely, e.g., sound image stability becomes an issue for widely separated loudspeakers. A practical recommendation is perhaps the conventional 60-deg configuration, which is a reasonable compromise between the two extremes (10 and 120 degrees) that achieves both robustness and performance. It was also found that the 120-deg arrangement did not perform as well as the 60-deg arrangement in positioning frontal images. If an additional center loudspeaker is available, the 3/0 format with a 120-deg span would be an ideal choice.

A bandlimited CCS based on subband filtering was developed in Sec. 4. The intention was to establish a computationally efficient CCS without penalty on cancellation performance. The CCS is a bandlimited design that is effective up to 6 kHz. To achieve the bandlimited implementation, a pseudo cosine modulated QMF bank is employed, allowing the CCS to operate at a low rate within an approximately perfect-reconstruction (PR) structure. As a result, spatial audio processing can concentrate on the low-frequency range, which better suits human hearing.
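As a rough illustration of how the analysis and synthesis banks of a cosine modulated pseudo-QMF structure can be generated from a single prototype, consider the following sketch. It uses the standard textbook modulation formula; the four-band split and the Hamming-windowed sinc prototype are assumptions made here for brevity (only the 120-tap prototype length is taken from this work), and a practical near-PR design would use an optimized prototype instead.

```python
import math


def prototype_lowpass(num_taps, num_bands):
    """Hamming-windowed sinc lowpass with cutoff pi / (2 * num_bands).
    Illustrative only: not an optimized near-PR prototype."""
    center = (num_taps - 1) / 2.0
    wc = math.pi / (2.0 * num_bands)
    taps = []
    for n in range(num_taps):
        m = n - center
        ideal = wc / math.pi if m == 0 else math.sin(wc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def cosine_modulated_bank(prototype, num_bands):
    """Return (analysis, synthesis) filter lists obtained by cosine
    modulation of a linear-phase prototype:
        h_k[n] = 2 p[n] cos(pi/M (k + 0.5)(n - (N-1)/2) + theta_k)
        f_k[n] = 2 p[n] cos(pi/M (k + 0.5)(n - (N-1)/2) - theta_k)
    with theta_k = (-1)^k * pi / 4."""
    center = (len(prototype) - 1) / 2.0
    analysis, synthesis = [], []
    for k in range(num_bands):
        theta = (-1) ** k * math.pi / 4.0
        w = math.pi / num_bands * (k + 0.5)
        analysis.append([2.0 * p * math.cos(w * (n - center) + theta)
                         for n, p in enumerate(prototype)])
        synthesis.append([2.0 * p * math.cos(w * (n - center) - theta)
                          for n, p in enumerate(prototype)])
    return analysis, synthesis


if __name__ == "__main__":
    proto = prototype_lowpass(120, 4)  # 4 bands is an assumed value
    h, f = cosine_modulated_bank(proto, 4)
    print(len(h), len(h[0]))
```

With a symmetric prototype, each synthesis filter produced this way is the time-reversed version of the corresponding analysis filter, so only the prototype needs to be designed and stored.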

To compare the proposed CCS with traditional systems, subjective listening experiments were conducted in an anechoic room. The experiments comprised two parts: a source localization test and a sound quality test. Using the techniques presented in Sec. 4.1, the fullband CCS operating at a sampling rate of 48 kHz requires four 3000-tap FIR filters, whereas the bandlimited CCS operating at a sampling rate of 12 kHz requires only four 1500-tap FIR filters.

The prototype FIR filter has 120 taps. The analysis and synthesis banks are generated from the prototype and implemented in polyphase form. The results of the subjective tests, analyzed using ANOVA, indicate that the bandlimited CCS performs comparably to the fullband CCS in both localization and sound quality. As shown in Table V, the proposed subband filtering approach reduces the computational load by approximately eighty percent compared with the conventional approach. When a fast convolution algorithm is employed, the difference between the two methods is reduced. Although block convolution is very efficient, it requires more memory to store temporary data. Which method is preferable therefore depends on whether speed or memory is the primary concern. The bandlimited CCS with direct convolution and the shuffler method is an acceptable choice.
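The order of magnitude of the saving can be checked with a back-of-the-envelope count of multiply-accumulate (MAC) operations for direct convolution, using the filter lengths and sampling rates quoted above. This is only a sketch: it counts the CCS filters alone and ignores the analysis and synthesis banks, so it is expected to overstate the saving relative to the figure in Table V.

```python
def macs_per_second(num_filters, taps, sample_rate_hz):
    """MAC operations per second for direct-form FIR filtering:
    one MAC per tap, per output sample, per filter."""
    return num_filters * taps * sample_rate_hz


# Filter counts, lengths, and rates as quoted in this section.
fullband_macs = macs_per_second(4, 3000, 48_000)
bandlimited_macs = macs_per_second(4, 1500, 12_000)

# Fractional reduction of the CCS filtering cost alone
# (filter bank overhead is deliberately not included).
reduction = 1.0 - bandlimited_macs / fullband_macs

if __name__ == "__main__":
    print(fullband_macs, bandlimited_macs, reduction)
```

Under these assumptions the CCS filters alone cost 576 million MACs per second in the fullband case and 72 million in the bandlimited case, a reduction of 87.5 percent; the cost of the filter banks, which is not counted here, would lower that figure.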

A sensorless cone velocity observer was developed and implemented as an analog circuit in Sec. 5. Excellent estimation of cone velocity was achieved using the suggested system. A hybrid control scheme employing a feedforward filter and a feedback compensator is proposed. The feedforward filter is synthesized on the basis of the velocity observer, and the feedback compensator is designed using QFT. With the aid of this system, the bass response of a loudspeaker exhibits better low-frequency extension with significant level enhancement.

A comprehensive study was conducted to explore promising but practical approaches to automotive virtual surround audio systems via simulations and experiments. The simulation using the free-field point source model reveals that setting three control points at each seat position creates the largest sweet spot, but the performance at each control point is compromised. Four processing methods have been presented: the first two are intended for two-channel inputs and the other two for 5.1-channel inputs. Reverberation-based upmix processing is used to convert two-channel inputs to four-channel signals. In addition, the inverse filters in Method I and Method III are exploited to correct the car responses and then render a spatial listening environment. Methods II and IV are practical approaches in terms of computational complexity and audio performance.

The following conclusions can be drawn from the listening tests. First, for two-channel inputs, Method I outperformed Method II, especially for the rear seat, while both performed the hidden reference. Second, for 5.1-channel inputs, Method III received the highest grades in most attributes, especially the spatial attributes. In addition, Method III performed better at the rear seat than at the front seat in frontal image and proximity. Third, for the single passenger mode, the two-speaker approach is preferred over the four-speaker approach in terms of rendering performance and computational complexity. Fourth, inverse filtering did not perform as well for the two passenger mode as it did for the single passenger mode. Further, the number of inverse filters increases drastically with the number of passengers, rendering this scheme impractical. Fifth, overall preference is dominated by brightness and envelopment, as indicated by the multiple-regression analysis. It is concluded from these findings that a simple design strategy can be formulated according to the number of passengers, using a hybrid approach: Methods I and III are employed for one passenger, while Methods II and IV are employed for more than one passenger.
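The hybrid strategy stated above amounts to a two-way lookup on input format and passenger count, which can be sketched as follows. The function name and the format labels are hypothetical; only the mapping itself is taken from the conclusions of this work.

```python
def select_method(num_passengers, input_format):
    """Pick a processing method following the hybrid strategy:
    inverse-filter-based Methods I and III for a single passenger,
    Methods II and IV for more than one passenger.
    input_format is "stereo" (two-channel) or "5.1" (labels assumed)."""
    if num_passengers < 1:
        raise ValueError("at least one passenger is required")
    if input_format not in ("stereo", "5.1"):
        raise ValueError("input_format must be 'stereo' or '5.1'")
    single = num_passengers == 1
    if input_format == "stereo":
        return "Method I" if single else "Method II"
    return "Method III" if single else "Method IV"


if __name__ == "__main__":
    for fmt in ("stereo", "5.1"):
        for passengers in (1, 2):
            print(fmt, passengers, select_method(passengers, fmt))
```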
