Chapter 4 Experiment Results
4.4 Demonstrations of 3D Acoustic Signal Synthesis Results
Fig. 4.30 shows the 3D acoustic signal synthesis flow. By dividing the separated signals into parts, we can build the 3D acoustic signal according to the designed HRTF scenario.
This is done by filtering each divided part with its corresponding ATF and HRTF. Because both filters are linear and time-invariant, the order of ATF filtering and HRTF filtering does not affect the output signal. It does affect the computational complexity, since HRTF filtering produces a two-channel signal for each input signal, so applying the ATF afterwards means filtering two channels instead of one.
Fig. 4.30 Flow Diagram of 3D Acoustic Signal Synthesis
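The remark on the filtering order can be checked numerically. The following is a minimal sketch in Python with NumPy, not the MATLAB implementation used in this work. It assumes that the ATF and the HRTF are available as FIR impulse responses and uses random placeholder data; the function names and filter lengths are hypothetical. Both orders give the same stereo output, but applying the ATF first needs three convolutions per segment instead of four.

```python
import numpy as np

def synthesize_atf_first(x, atf, hrir_left, hrir_right):
    """Apply the ATF once, then each ear's HRIR (3 convolutions)."""
    reverberant = np.convolve(x, atf)
    return np.convolve(reverberant, hrir_left), np.convolve(reverberant, hrir_right)

def synthesize_hrtf_first(x, atf, hrir_left, hrir_right):
    """Apply each ear's HRIR first, then the ATF on both channels (4 convolutions)."""
    left = np.convolve(np.convolve(x, hrir_left), atf)
    right = np.convolve(np.convolve(x, hrir_right), atf)
    return left, right

# Placeholder data (assumptions): random noise stands in for real signals and filters.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)        # one segment of a separated signal
atf = rng.standard_normal(64)        # ATF impulse response
hrir_left = rng.standard_normal(32)  # left-ear head-related impulse response
hrir_right = rng.standard_normal(32) # right-ear head-related impulse response

left_a, right_a = synthesize_atf_first(x, atf, hrir_left, hrir_right)
left_b, right_b = synthesize_hrtf_first(x, atf, hrir_left, hrir_right)

# Convolution is commutative and associative, so both orders agree up to rounding.
same_output = np.allclose(left_a, left_b) and np.allclose(right_a, right_b)
```

For long impulse responses, an FFT-based convolution would usually be preferred over direct convolution; the comparison of the two orders stays the same.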
For each data sequence, we provide three kinds of waveforms: the SLAB synthesis waveform, the HRTF+ATF waveform from the original source signals, and the HRTF+ATF waveform from the separated signals.
The demonstrations show two HRTF scenarios. The first scenario, shown in Fig. 4.31, has 25 frames with a frame interval of about 0.5 second. The second scenario, shown in Fig. 4.32, has 27 frames, also with a frame interval of about 0.5 second. The red point represents source 1, the green point represents source 2, and the blue and red parts of the headphone represent the left and right ears of the HRTF, respectively.
Fig. 4.31 HRTF Scenario 1, 25 Frames, Frame Interval 0.5 sec, Red: Source 1, Green: Source 2
(a) Frame 1 (b) Frame 5 (c) Frame 10 (d) Frame 15 (e) Frame 20 (f) Frame 25
Fig. 4.32 HRTF Scenario 2, 27 Frames, Frame Interval 0.5 sec, Red: Source 1, Green: Source 2
(a) Frame 1 (b) Frame 5 (c) Frame 8 (d) Frame 13 (e) Frame 18 (f) Frame 21 (g) Frame 27
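One possible way to render such a frame-based scenario is sketched below in Python with NumPy. Each 0.5-second part of a signal is filtered with the HRIR pair of its frame, and the filter tails are joined by overlap-add. The sampling rate of 8 kHz, the random placeholder HRIRs, the function name, and the overlap-add joining are all assumptions made for illustration; the text does not state how the filtered parts are joined.

```python
import numpy as np

def render_moving_source(x, hrirs_left, hrirs_right, frame_len):
    """Filter consecutive frames of x with per-frame HRIRs and overlap-add the tails.

    hrirs_left / hrirs_right hold one HRIR per frame, all with the same length.
    """
    n_frames = len(hrirs_left)
    taps = len(hrirs_left[0])
    out = np.zeros((2, n_frames * frame_len + taps - 1))
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        if len(frame) == 0:  # signal shorter than the scenario: stop early
            break
        start = i * frame_len
        stop = start + len(frame) + taps - 1
        out[0, start:stop] += np.convolve(frame, hrirs_left[i])
        out[1, start:stop] += np.convolve(frame, hrirs_right[i])
    return out

fs = 8000                   # assumed sampling rate in Hz
frame_len = int(0.5 * fs)   # 0.5-second frame interval
n_frames = 25               # as in HRTF Scenario 1

rng = np.random.default_rng(1)
x = rng.standard_normal(n_frames * frame_len)   # placeholder separated signal
hrirs_left = rng.standard_normal((n_frames, 32))   # placeholder HRIRs, one per frame
hrirs_right = rng.standard_normal((n_frames, 32))
stereo = render_moving_source(x, hrirs_left, hrirs_right, frame_len)

# Sanity check: with the same HRIR in every frame, overlap-add reproduces
# the convolution of the whole signal.
h = rng.standard_normal(32)
static = render_moving_source(x, [h] * n_frames, [h] * n_frames, frame_len)
matches_full_convolution = np.allclose(static[0], np.convolve(x, h))
```

Switching HRIRs abruptly at frame boundaries can produce audible discontinuities; cross-fading between adjacent frames is a common remedy but is outside this sketch.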
To make the effect of the ATF more noticeable, we demonstrate the 3D acoustic signals for three room sizes: a large room of 20 x 20 x 20 m, a medium room of 10 x 10 x 10 m, and a small room of 4 x 4 x 4 m, as shown in Fig. 4.33.
Fig. 4.33 Different Room Sizes
(a) Large Room (b) Medium Room (c) Small Room
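To give a feeling for how the room size changes the arrival times that make up an ATF, the following sketch computes the direct-path delay and the six first-order wall reflection delays in a cubic room using mirror-image sources. It is a simplified geometric illustration in Python, not the SLAB simulation used in this work. The speed of sound (343 m/s) and the source and listener positions, which are scaled with the room, are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def first_order_delays(side, source, listener):
    """Direct-path delay plus six first-order reflection delays (seconds) in a cube."""
    source = np.asarray(source, dtype=float)
    listener = np.asarray(listener, dtype=float)
    delays = [np.linalg.norm(source - listener) / SPEED_OF_SOUND]
    for axis in range(3):
        for wall in (0.0, side):
            image = source.copy()
            image[axis] = 2.0 * wall - source[axis]  # mirror the source across the wall
            delays.append(np.linalg.norm(image - listener) / SPEED_OF_SOUND)
    return np.array(delays)

spreads = {}
for side in (4.0, 10.0, 20.0):  # small, medium, and large room side lengths in metres
    # Assumed geometry: source and listener on the room's mid-height axis.
    source = (0.25 * side, 0.5 * side, 0.5 * side)
    listener = (0.75 * side, 0.5 * side, 0.5 * side)
    delays = first_order_delays(side, source, listener)
    spreads[side] = delays.max() - delays.min()  # largest arrival-time difference
    print(f"{side:5.1f} m room: delay spread {1000.0 * spreads[side]:.2f} ms")
```

With this scaled geometry the delay spread grows in proportion to the side length, which is the kind of larger arrival-time difference expected in a larger room.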
Fig. 4.34 to Fig. 4.41 show the effects of the ATF on the waveforms and the spectrograms. Comparing panels (a) and (c) shows that the ATFs change the waveforms of the separated signals; the difference is barely visible without reflection (NR), but it is clearly visible when the walls are perfect reflectors (PR) in the three room sizes (Small, Medium, Large). The effect of the room size on the ATF can be observed in (f). The longer the reverberation time is, the faster the magnitude changes between adjacent frequencies. The explanation is that summing copies of a signal with different time shifts causes the magnitude to vary across frequency:
$$\left|\sum_k a_k e^{-j\omega t_k}\right|^2 = \sum_k \sum_m a_k a_m \cos\big(\omega (t_k - t_m)\big)$$
where $a_k$ and $t_k$ denote the amplitude and the arrival time of the $k$-th path.
Therefore, for a larger room, there exist larger values of $t_k - t_m$, which cause a faster oscillation of the spectrum. Comparing the spectrograms in (e) with those in (b), blue slices appear at the frequencies where the spectrum magnitude in (f) is low. After the HRTF filtering, the interchannel level difference (ILD) is noticeable in (d), and it is related to the HRTF azimuth angle. For the signals at 45°, the left channel amplitude is much larger than the right one; on the other hand, for those at 45°, the right channel amplitude is larger than the left one.
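The link between arrival-time differences and the speed of the spectral oscillation can be illustrated with the simplest case of two paths. The sketch below, in Python with NumPy, evaluates the squared magnitude of a direct path plus one delayed copy and counts the spectral notches between 0 and 1000 Hz. The two delay differences (5 ms and 25 ms), the unit amplitudes, and the function names are assumptions chosen for illustration and are not taken from the measured ATFs.

```python
import numpy as np

def two_path_magnitude_sq(freqs_hz, delay_diff_s, a0=1.0, a1=1.0):
    """|a0 + a1 * exp(-j*2*pi*f*dt)|^2 for a direct path plus one delayed copy."""
    h = a0 + a1 * np.exp(-2j * np.pi * freqs_hz * delay_diff_s)
    return np.abs(h) ** 2

def count_notches(mag_sq):
    """Count strict local minima of a sampled magnitude curve."""
    inner = mag_sq[1:-1]
    return int(np.sum((inner < mag_sq[:-2]) & (inner < mag_sq[2:])))

freqs = np.linspace(0.0, 1000.0, 100001)             # 0.01 Hz grid
short_delay = two_path_magnitude_sq(freqs, 0.005)    # 5 ms difference (assumed)
long_delay = two_path_magnitude_sq(freqs, 0.025)     # 25 ms difference (assumed)

notches_short = count_notches(short_delay)
notches_long = count_notches(long_delay)
print(notches_short, notches_long)
```

For two equal-amplitude paths the squared magnitude is 2 + 2 cos(2π f Δt), so neighbouring notches are 1/Δt apart: 200 Hz for the 5 ms difference and 40 Hz for the 25 ms difference. A larger time difference therefore gives a faster oscillation along the frequency axis.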
Fig. 4.34 “f01m01”, Separated Signal 1, NR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.35 “f01m01”, Separated Signal 1, Small Room, PR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.36 “f01m01”, Separated Signal 1, Medium Room, PR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.37 “f01m01”, Separated Signal 1, Large Room, PR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.38 “winter”, Separated Signal 2, NR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.39 “winter”, Separated Signal 2, Small Room, PR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.40 “winter”, Separated Signal 2, Medium Room, PR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.41 “winter”, Separated Signal 2, Large Room, PR, HRTF at 45°
(a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
discard the reverberant components and the noise energy. The permutation and scaling problems of FD-ICA are solved by the hybrid DOA and correlation method and the MDP, respectively. A least squares optimization technique based on the cross-power-spectrum approach with the gradient descent algorithm is used for the blind separation of the convolutive mixture signals. The separated signal quality is evaluated by SIR. The simulation and discussion on the SIR values, waveforms, and spectrograms of each input sequence are presented in Section 4.1.
To construct 3D audio for headphone playback, the separated signals are filtered by the HRTF and the ATF at the virtual listening point. The interpolation methods of the HRTF and the ATF at the virtual listening point are derived in Chapter 3. Chapter 4 discusses the ATFs of different room sizes and different wall materials. The spatial impression, which is given by the combination of the HRTF and the ATF, is demonstrated with the resulting 3D acoustic signals.
The SLAB software is used to generate the audio signals in a room, to capture the microphone array signals, and to measure the ATFs for different room sizes and wall materials. The subsequent signal processing is implemented in MATLAB. The spectrograms of the signals at each stage are shown to visualize how the signal envelope changes through the process. Different HRTF scenarios are employed to demonstrate the 3D acoustic impression of the synthesized signals.
5.2 Future Work
This thesis concentrates on the overall combination of BSS, HRTF and ATF to produce the 3D acoustic signal at a virtual listening point. Many extensions can still be made to improve the quality of the 3D acoustic signal. For example, detecting the source signal locations would complete the sound field reconstruction and would also help in obtaining the corresponding ATF. Another possible follow-up is the synthesis of moving 3D acoustic signals that accounts for the Doppler effect, that is, the frequency variation caused by the relative velocity of each source signal with respect to the virtual listening point. Reducing the computational complexity of the overall process is also desirable, with the aim of real-time synthesis of the 3D acoustic signals. Additionally, a graphical user interface (GUI) could improve the interaction of selecting the virtual listening point in a specific acoustic room.
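Autobiography
張欽淵 was born in Hsinchu City on July 21, 1985. He graduated in 2007 from the undergraduate program of the College of Electrical and Computer Engineering, National Chiao Tung University, and then entered the Institute of Electronics, National Chiao Tung University, to pursue a master's degree. His research area is multimedia signal processing, and his thesis is titled "Synthesis of 3D Audio at a Virtual Listening Point from Microphone Array Signals".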