Chapter 3 3D Acoustic Signal Synthesis
3.3 Combining HRTF and ATF
Fig. 3.5 Combining ATF and HRTF: (a) ATF for Each Separated Signal (b) HRTF for Each Separated Signal (c) 3D Acoustic Signal Synthesis
There are many kinds of blind source separation methods, but in general it is quite difficult to separate the source signals completely, because the information about the source signals and the mixing system is not fully available. Separation performance may degrade owing to channel noise, room reflections, and violations of the stochastic model assumed for the source signals, which usually differs between speech signals and instrument signals. However, the interference introduced by the other source signals becomes less significant here, since our main purpose in separating the source signals is to synthesize them back together.
With the HRTF database and the ATF-pool, the listener can choose the arrangement of the source signals and the listening position arbitrarily. In other words, the listener can place one source signal on the left side and another on the right side, regardless of the original positions of these sources in the room. The spatial impression is presented over headphones by using the HRTF database and the ATF-pool to simulate the user-customized listening scenario. Therefore, listeners can hear the synthesized 3D audio signals at their own sweet spots.
For a point that has no ATF measurement, the ATF is estimated by weighted linear interpolation of the nearby measured ATFs. The same weighted linear interpolation is used for the HRTF when the desired spatial position is not found in the HRTF database.
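As a minimal sketch of the weighted linear interpolation described above, the following Python snippet estimates the ATF at an unmeasured point from nearby measured ATFs. The inverse-distance weights, the function name `interpolate_atf`, and the toy data are assumptions made for illustration; the weighting actually used in this thesis may differ.

```python
import numpy as np

def interpolate_atf(target, positions, atfs, eps=1e-9):
    """Estimate an ATF at `target` as a weighted sum of measured ATFs.

    positions: (P, 3) measurement points in metres.
    atfs:      (P, F) complex array, one frequency response per point.
    Weights are inverse-distance and normalised to sum to one
    (an assumed weighting, chosen only for illustration).
    """
    positions = np.asarray(positions, dtype=float)
    atfs = np.asarray(atfs)
    dist = np.linalg.norm(positions - np.asarray(target, dtype=float), axis=1)
    nearest = np.argmin(dist)
    if dist[nearest] < eps:
        return atfs[nearest].copy()      # target coincides with a measured point
    weights = 1.0 / dist
    weights /= weights.sum()
    return weights @ atfs

# Toy example: two measured points, target exactly halfway between them.
positions = [[0.0, 0.0, 1.5], [1.0, 0.0, 1.5]]
atfs = np.array([[1.0 + 0.0j, 0.5 + 0.5j],
                 [0.0 + 1.0j, 0.5 - 0.5j]])
estimate = interpolate_atf([0.5, 0.0, 1.5], positions, atfs)
```

Because the target is equidistant from both points, both weights equal 0.5 and the estimate is the plain average of the two responses. Interpolating complex responses directly can smear the phase when the path delays differ noticeably, so interpolating magnitude and delay separately is a common alternative.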
Let $y_i(t)$ be the separated signal corresponding to the source signal $s_i(t)$,
Fig. 3.6 Zones of Possible Psychoacoustic Spatial Variation for the Separated Signals
Owing to the interference remaining in the separated signals, the psychoacoustic spatial impression may be degraded through the interaural time difference (ITD) and the interaural level difference (ILD). The zone of possible psychoacoustic spatial variation for each source changes with the SIR of each separated signal. The remaining interference of the $i$-th separated signal affects the $j$-th separated signal for all $j \neq i$. The subjective performance degradation caused by such interference depends on the human psychoacoustic resolution in azimuth, elevation, and distance. For a far-field virtual listening point, the distance resolution is less significant owing to human psychoacoustic characteristics, and the azimuth and elevation angles dominate the perceived 3D acoustic impression.
Chapter 4 Experiment Results
4.1 Descriptions of the Adopted BSS System
We adopt frequency-domain independent component analysis (FD-ICA) in this thesis, with principal component analysis (PCA) as a preprocessing step for dimension reduction. We choose the Infomax method combined with the natural gradient method because of the popularity and simplicity of these two methods. The signals are separated in the time-frequency domain, and each frequency band is separated individually, so the permutation and scaling problems must be resolved after the ICA process. We solve the permutation problem by combining the DOA approach, the neighboring correlation approach, and the harmonic frequency approach. The scaling problem is solved by the minimum distortion principle (MDP). For the convolutive BSS method, we adopt a least squares optimization technique based on the cross-power-spectrum approach with the gradient descent algorithm. The flow diagram of the overall BSS system is shown in Fig. 4.1.
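The core of the per-frequency separation step can be sketched as follows: PCA whitening reduces the microphone channels to the number of sources, and a batch natural-gradient Infomax update W <- W + lr (I - E[g(y) y^H]) W is then iterated in one frequency bin. This is an illustrative sketch under stated assumptions, not the implementation used in this thesis: the toy data, the random mixing matrix, the function names, and the default learning rate and gain are all chosen here for the example and are not the settings of the experiments.

```python
import numpy as np

def score(u, gain=1.0):
    """Split-complex nonlinearity g(u) = tanh(G*Re u) + j*tanh(G*Im u)."""
    return np.tanh(gain * u.real) + 1j * np.tanh(gain * u.imag)

def whiten(x, n_components):
    """PCA whitening: keep the n_components largest principal directions."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.conj().T / x.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)               # ascending eigenvalues
    eigval, eigvec = eigval[-n_components:], eigvec[:, -n_components:]
    v = np.diag(eigval ** -0.5) @ eigvec.conj().T
    return v @ x, v

def infomax_natural_gradient(z, n_iter=1000, lr=0.1, gain=1.0):
    """Batch natural-gradient Infomax for the frames of one frequency bin."""
    n = z.shape[0]
    w = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        y = w @ z
        corr = score(y, gain) @ y.conj().T / z.shape[1]   # E[g(y) y^H]
        w = w + lr * (np.eye(n) - corr) @ w
    return w

# Toy data standing in for one STFT bin: 2 super-Gaussian sources, 7 microphones.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 4000)) + 1j * rng.laplace(size=(2, 4000))
a = rng.standard_normal((7, 2)) + 1j * rng.standard_normal((7, 2))  # assumed mixing
z, v = whiten(a @ s, n_components=2)
w = infomax_natural_gradient(z)
global_system = w @ v @ a     # near a scaled permutation when separation succeeds
```

Each row of `global_system` should be dominated by a single entry. Which source lands in which row, and with what complex scale, is arbitrary; these are the permutation and scaling ambiguities that must be resolved across frequency bins afterwards.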
Fig. 4.1 Flow Diagram of the Adopted BSS System
Fig. 4.2 shows the arrangement of the source signals and the microphone array on the X-Y plane. The two sources are located 3.00 m apart, and the spacing between adjacent microphones in the array is 0.50 m. The midpoint between the two sources is 3.00 m away from the center of the seven-microphone array.
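The geometry above can be turned into coordinates with a few lines of Python, which is convenient for checking the expected directions of arrival (DOA). The sketch assumes that 0.50 m is the spacing between adjacent microphones, that the array lies on the x-axis centered at the origin, and that the two sources sit on a line parallel to the array; the text alone does not confirm this layout.

```python
import numpy as np

num_mics, spacing = 7, 0.50           # from the text
source_gap, standoff = 3.00, 3.00     # from the text

# Assumed layout: array on the x-axis centred at the origin,
# sources on a line parallel to the array at y = standoff.
mic_x = (np.arange(num_mics) - (num_mics - 1) / 2) * spacing
mics = np.column_stack([mic_x, np.zeros(num_mics)])
sources = np.array([[-source_gap / 2, standoff],
                    [+source_gap / 2, standoff]])

# Direction of arrival seen from the array centre, measured from broadside.
doa_deg = np.degrees(np.arctan2(sources[:, 0], sources[:, 1]))

# Source-to-microphone distances, shape (2 sources, 7 microphones).
dist = np.linalg.norm(sources[:, None, :] - mics[None, :, :], axis=2)
```

Under these assumptions the two sources are seen at roughly -26.6 and +26.6 degrees from broadside, and the outermost microphones sit directly in front of each source at a distance of 3.00 m.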
Fig. 4.2 Arrangement of the Source Signals and the Microphone Array
The detailed parameter settings of the BSS system are listed in Table 4.1. The thresholds th and thU are set to ensure that the DOA estimate is reliable, and the threshold thHa is adjusted according to the number of sources and the size of the harmonic set. The range K affects the convergence speed of the convolutive BSS method: for a larger K, it takes more computation time to search for a valid demixing matrix W.
Table 4.1 Settings of the BSS System Parameters
Parameter of the BSS System                    Value
Sampling Frequency                             44.1 kHz
Number of Microphones, M                       7
Number of Sources, N                           2
Length of STFT, T                              8196 pt
Frame Shift of STFT                            128 pt
Window Function                                Hamming
Thresholds of Confident DOA                    th = 1.5, thU = 10 dB
Distance for Interfrequency Correlations       3Δf
Set of Harmonic Frequencies                    2f, 2f ± Δf, 3f, 3f ± Δf
Threshold of Harmonic Correlations, thHa       1.2
Learning Rate                                  1.0
Number of Iterations                           1000
Nonlinear Function, g(u)                       tanh(G·Re(u)) + j·tanh(G·Im(u))
Gain of Score Function, G                      100
Range of LS Optimization, K                    5
Fig. 4.3 SIR of the Demixing Matrix from No Reflection (NR) Microphone Recordings
Fig. 4.4 SIR of the Demixing Matrix from Perfect Reflector (PR) Microphone Recordings
Fig. 4.5 Averaged SIR of NR and PR
Table 4.2 Source Types in Sequence Numbers
Sequence Number   Sequence Abbreviation   Source 1                   Source 2
1                 f01m01                  Chinese speech, female     Chinese speech, male
2                 instru                  instrument, string 1       instrument, string 2
3                 speech                  Japanese speech, female    Japanese speech, male
4                 winter                  instrument, drums          instrument, piano
5                 wistru                  instrument, string 1       instrument, piano
Five data sets are processed from start to finish: “f01m01”, “instru”, “speech”, “winter”, and “wistru”. The “f01m01” sequences are two Chinese speech signals, one female and one male; the “instru” sequences are two string instrument signals; the “speech” sequences are two Japanese speech signals, one female and one male; the “winter” sequences are instrument signals of drums and a piano; the “wistru” sequences are one string instrument from “instru” and the piano from “winter”. The length of each wave file is about 6.8 seconds.
The effectiveness of the demixing matrix W can be measured by the SIR values of the microphone array signals. In Fig. 4.3, the SIR of the demixing matrix obtained from the no reflection (NR) recordings shows good performance on average. The sequence numbers correspond to the test sequences listed in Table 4.2. When the wall material is changed to perfect reflectors (PR) in Fig. 4.4, the SIR values drop to around 7 dB. In Fig. 4.5, the averaged SIRs of NR are higher than those of PR for all input sequences. This is because the reflections turn the purely time-delayed BSS problem into a convolutive one, which disturbs the independence of the source signals.
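For readers who want to reproduce this kind of measurement, a signal-to-interference ratio can be computed as the power ratio, in dB, between the target component and the interfering component of one separated output. The helper below is a minimal sketch under that assumed definition; the exact SIR definition used in this thesis is not restated in this section, and the toy signals are placeholders.

```python
import numpy as np

def sir_db(target, interference):
    """SIR in dB from the target and interference components of one output."""
    p_target = np.sum(np.abs(target) ** 2)
    p_interf = np.sum(np.abs(interference) ** 2)
    return 10.0 * np.log10(p_target / p_interf)

# Toy example: the output holds source 1 plus a 10 %-amplitude leak of source 2.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(1000), rng.standard_normal(1000)
value = sir_db(s1, 0.1 * s2)      # close to 20 dB for equal-power sources
```

In a simulation the two components are available separately, because each source can be passed through the mixing and demixing systems on its own.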
For the fifth sequence, “wistru”, the SIR difference between source 1 and source 2 is the largest among the five sequences in both the NR and PR conditions. The explanation comes from the waveforms in Fig. 4.22 (c), (d) and Fig. 4.23 (c), (d). Note that the waveform and spectrogram plots are normalized to the interval [-1, 1] for display, so the true amplitudes cannot be read from the source waveforms; nevertheless, it is easy to see that the mixture signals are dominated by source 2 in the “wistru” sequence.
Owing to the larger true magnitude of source 2 (piano), the interference from source 2 in separated signal 1 remains significant. On the other hand, the interference from source 1 in separated signal 2 is insignificant in terms of the relative power ratio. However, in the two-source case, the relative power ratio cancels in the averaged SIR. Recall that the separated signals can be modeled as:
1 2
Therefore, the averaged SIR of “wistru” falls back within the normal range of the other sequences.
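The cancellation argument can be made explicit under a simple assumed model, which is not necessarily the one recalled above. Suppose each separated signal is its target source plus a scaled leak of the other source, $y_1 = s_1 + \alpha s_2$ and $y_2 = s_2 + \beta s_1$, where $P_1$ and $P_2$ are the source powers and $\alpha$, $\beta$ are hypothetical leakage gains introduced only for this illustration. Then:

```latex
\[
\mathrm{SIR}_1 = 10\log_{10}\frac{P_1}{|\alpha|^2 P_2},
\qquad
\mathrm{SIR}_2 = 10\log_{10}\frac{P_2}{|\beta|^2 P_1},
\]
\[
\frac{\mathrm{SIR}_1 + \mathrm{SIR}_2}{2}
  = 5\log_{10}\frac{P_1 P_2}{|\alpha|^2 |\beta|^2 P_2 P_1}
  = -10\log_{10}|\alpha\beta| .
\]
```

The power ratio $P_1/P_2$ shifts $\mathrm{SIR}_1$ and $\mathrm{SIR}_2$ by the same amount in opposite directions, so it drops out of the average in dB and only the leakage gains remain.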
Fig. 4.6 Sequence “f01m01” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.7 Sequence “f01m01” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.8 Sequence “f01m01” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.9 Sequence “f01m01” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.10 Sequence “instru” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.11 Sequence “instru” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.12 Sequence “instru” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.13 Sequence “instru” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.14 Sequence “speech” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.15 Sequence “speech” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.16 Sequence “speech” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.17 Sequence “speech” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.18 Sequence “winter” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.19 Sequence “winter” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.20 Sequence “winter” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.21 Sequence “winter” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.22 Sequence “wistru” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.23 Sequence “wistru” Waveforms in Time Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
Fig. 4.24 Sequence “wistru” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with NR (d) Microphone 7 with NR (e) Separated Signal 1 with NR (f) Separated Signal 2 with NR
Fig. 4.25 Sequence “wistru” Spectrograms in Time-Frequency Domain: (a) Source 1 (b) Source 2 (c) Microphone 1 with PR (d) Microphone 7 with PR (e) Separated Signal 1 with PR (f) Separated Signal 2 with PR
4.2 Virtual Acoustic Environment
4.2.1 Introduction to NASA Sound Lab (SLAB) Software
Fig. 4.26 Snapshot of the 3D Virtual Acoustic Room in SLAB
SLAB is a software-based real-time virtual acoustic environment rendering system developed at the NASA Ames Research Center. The software provides an offline acoustic environment for spatial hearing and psychoacoustic studies. The acoustic scenario parameters in SLAB fall into three main categories: the source, the environment, and the listener. The source parameters include the source locations, the source waveforms, and the radiation pattern and radius of each source. The environment parameters include the sound speed, the air absorption, the surface locations, the room dimensions, and the surface reflections. The listener parameters include the listener location, the HRTF model, and the interaural time difference (ITD). Other specifications of the SLAB software are presented below.
Material Filter          First-order IIR Filter
Table 4.3 Scenario Specifications [25]

System Dynamics
Sampling Rate            44.1 kHz
Update Rate              120 Hz
Internal Latency         24 msec
FIR Update               Every 64 Samples (1.45 msec)
Delay Line Update        Every Sample (22.7 μsec)
Table 4.4 System Dynamics Specifications [25]

Numerical Precision
Sound Input / Output     16-bit Integer
Scenario                 Double-precision Floating-point
Signal Processing        Single-precision Floating-point
Table 4.5 Numerical Precision Specifications [25]
4.3 Wall Material ATF Characteristics
The SLAB software provides seven kinds of wall materials. The ATF spectrum, which is estimated with a time-stretched pulse (TSP) signal, changes with the wall material. Zeros are appended to the tail of the time-domain TSP signal (N = 2048, M = 64) so that the reflections from the six walls can be observed. As shown in Fig. 4.27(b), the zero padding introduces some tolerable amplitude distortion.
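A TSP measurement of this kind can be sketched in Python as follows. The snippet uses one common TSP definition, H(k) = exp(-j 4π m k² / N²) with conjugate symmetry, and treats M = 64 as the stretch parameter m; both are assumptions, since the exact TSP definition is not restated here. The toy "room" with a single reflection is also purely illustrative and is not SLAB output.

```python
import numpy as np

def tsp_signal(n=2048, m=64):
    """Time-stretched pulse with a flat magnitude spectrum (assumed definition)."""
    k = np.arange(n // 2 + 1)
    half = np.exp(-1j * 4.0 * np.pi * m * k ** 2 / n ** 2)
    return np.fft.irfft(half, n)          # real signal via conjugate symmetry

def estimate_atf(recorded, excitation, n_fft):
    """ATF estimate as the ratio of recorded to excitation spectra."""
    return np.fft.rfft(recorded, n_fft) / np.fft.rfft(excitation, n_fft)

tsp = tsp_signal()
padded = np.concatenate([tsp, np.zeros(2048)])   # zeros leave room for reflections

# Toy room: direct path plus one reflection 300 samples later (assumed values).
impulse = np.zeros(1024)
impulse[0], impulse[300] = 1.0, 0.5
recorded = np.convolve(padded, impulse)[:padded.size]
atf = estimate_atf(recorded, padded, padded.size)
```

Without padding, a reflection arriving late would wrap around the analysis frame. The appended zeros keep the whole response inside the frame, at the cost of a spectrum that is no longer exactly flat at every FFT bin.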
Fig. 4.27 TSP Signal with Padding Zeros: (a) Time Domain (b) Frequency Domain Amplitude
The frequency spectrum characteristics of the seven materials and the no-reflection case are shown in Fig. 4.29 (a) through (h). All the data in Fig. 4.29 are ATFs measured from source 1 (red point) to the virtual listening point at (1.25, 0, 1.5) in the medium room of dimensions 10 × 10 × 10 m. The left column of Fig. 4.29 shows the log10 magnitude in the frequency domain, and the right column shows the unwrapped phase. The eight wall conditions are no reflection (NR), perfect reflector (PR), heavy carpet (HC), concrete (Co), heavy glass (HG), gypsum board (GB), wood with airspace (WA), and plaster on metal (PM); the seven materials are shown in Fig. 4.28.
Fig. 4.28 Wall Materials: (a) Perfect Reflector (b) Heavy Carpet (c) Concrete (d) Heavy Glass (e) Gypsum Board (f) Wood with Airspace (g) Plaster on Metal
Fig. 4.29 ATF Characteristic with Different Wall Materials, Left: Freq. log10 Magnitude, Right: Unwrapped Phase: (a) No Reflection (b) Perfect Reflector (c) Heavy Carpet (d) Concrete (e) Heavy Glass (f) Gypsum Board (g) Wood with Airspace (h) Plaster on Metal
4.4 Demonstrations of 3D Acoustic Signal Synthesis Results
Fig. 4.30 shows the 3D acoustic signal synthesis flow. By dividing the separated signals into parts, we can build the 3D acoustic signal according to the designed HRTF scenario: each part is filtered with its corresponding ATF and HRTF. The order of the ATF filtering and the HRTF filtering does not affect the output signal, but it does affect the computational complexity, because the HRTF filtering produces a two-channel signal from each input signal.
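The claim about filter order can be checked numerically. In the sketch below, random sequences stand in for one separated signal part, an ATF impulse response, and a left/right pair of HRTF impulse responses; all of them, along with the function names, are placeholders assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(512)        # stand-in for one separated signal part
atf = rng.standard_normal(64)            # assumed ATF impulse response
hrir_left = rng.standard_normal(32)      # assumed HRTF impulse responses
hrir_right = rng.standard_normal(32)

def atf_then_hrtf(x):
    room = np.convolve(x, atf)                           # ATF applied once
    return np.convolve(room, hrir_left), np.convolve(room, hrir_right)

def hrtf_then_atf(x):
    left, right = np.convolve(x, hrir_left), np.convolve(x, hrir_right)
    return np.convolve(left, atf), np.convolve(right, atf)   # ATF applied twice

out_a = atf_then_hrtf(signal)
out_b = hrtf_then_atf(signal)
same = all(np.allclose(a, b) for a, b in zip(out_a, out_b))
```

Both orders give the same stereo output because convolution is commutative and associative. Filtering with the ATF first needs one ATF convolution per part instead of two, which matters when the ATF is much longer than the HRTF impulse responses.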
Fig. 4.30 Flow Diagram of 3D Acoustic Signal Synthesis
For each sequence, we provide three kinds of waveforms: the SLAB synthesis waveform, the HRTF+ATF waveform generated from the original source signals, and the HRTF+ATF waveform generated from the separated signals.
The demonstrations show two HRTF scenarios. The first scenario, shown in Fig. 4.31, has 25 frames with a frame interval of about 0.5 second. The second scenario, shown in Fig. 4.32, has 27 frames, also with a frame interval of about 0.5 second. The red point represents source 1, the green point represents source 2, and the blue and red parts of the headphone represent the left and right ears of the HRTF, respectively.
Fig. 4.31 HRTF Scenario 1, 25 Frames, Frame Interval ≈ 0.5 sec, Red: Source 1, Green: Source 2: (a) Frame 1 (b) Frame 5 (c) Frame 10 (d) Frame 15 (e) Frame 20 (f) Frame 25
Fig. 4.32 HRTF Scenario 2, 27 Frames, Frame Interval ≈ 0.5 sec, Red: Source 1, Green: Source 2: (a) Frame 1 (b) Frame 5 (c) Frame 8 (d) Frame 13 (e) Frame 18 (f) Frame 21 (g) Frame 27
To make the effect of the ATF more noticeable, we demonstrate the 3D acoustic signals for three room sizes: a large room of 20 × 20 × 20 m, a medium room of 10 × 10 × 10 m, and a small room of 4 × 4 × 4 m, as shown in Fig. 4.33.
Fig. 4.33 Different Room Sizes: (a) Large Room (b) Medium Room (c) Small Room
Fig. 4.34 through Fig. 4.41 show the effects of the ATF on the waveforms and the spectrograms. Comparing panels (a) and (c) shows that the ATFs change the waveforms of the separated signals; the difference is subtle without reflection (NR), but it is clearly visible with perfect reflectors (PR) as the wall material in all three room sizes (small, medium, large). The effect of the room size on the ATF can be observed in (f): the longer the reverberation time, the faster the magnitude changes across adjacent frequencies. The reason is that summing differently delayed copies of a signal in the time domain causes the magnitude to vary in the frequency domain:
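$$\left|\sum_{k} a_k e^{-j\omega t_k}\right|^2 = \sum_{k} a_k^2 + 2\sum_{k<m} a_k a_m \cos\big(\omega\,(t_k - t_m)\big)$$
where $a_k$ and $t_k$ denote the amplitude and arrival time of the $k$-th propagation path.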
Therefore, a larger room contains some larger delay differences $t_k - t_m$, which cause a faster oscillation of the spectrum. Comparing the spectrograms in (e) with those in (b), we can see blue slices at the frequencies where the spectrum magnitude in (f) is lower. After the HRTF filtering, the interaural level difference (ILD) is noticeable in (d) and is related to the HRTF azimuth angle. For signals placed at 45° on the left side, the left-channel amplitude is much larger than the right; on the other hand, for those at 45° on the right side, the right-channel amplitude is larger than the left.
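The relation between delay differences and spectral oscillation discussed above can be illustrated with a direct path plus a single reflection. In this sketch the reflection gain, the extra path lengths, and the sound speed of 343 m/s are assumed values chosen for illustration; they are not measurements from the simulated rooms.

```python
import numpy as np

def ripple_period_hz(extra_path_m, sound_speed=343.0):
    """Spacing between magnitude peaks of a direct-plus-one-reflection response."""
    delay = extra_path_m / sound_speed       # delay difference in seconds
    return 1.0 / delay

def two_path_magnitude(freq_hz, extra_path_m, gain=0.5, sound_speed=343.0):
    """|1 + gain * exp(-j*2*pi*f*delay)| for one assumed reflection."""
    delay = extra_path_m / sound_speed
    return np.abs(1.0 + gain * np.exp(-2j * np.pi * freq_hz * delay))

freq = np.linspace(0.0, 1000.0, 2001)
small = two_path_magnitude(freq, extra_path_m=4.0)     # short extra path
large = two_path_magnitude(freq, extra_path_m=20.0)    # long extra path
periods = ripple_period_hz(4.0), ripple_period_hz(20.0)
```

With these values the magnitude ripple repeats every 85.75 Hz for the short extra path and every 17.15 Hz for the long one, so the longer delay difference produces the faster oscillation along the frequency axis.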
Fig. 4.34 “f01m01”, Separated Signal 1, NR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.35 “f01m01”, Separated Signal 1, Small Room, PR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.36 “f01m01”, Separated Signal 1, Medium Room, PR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.37 “f01m01”, Separated Signal 1, Large Room, PR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.38 “winter”, Separated Signal 2, NR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.39 “winter”, Separated Signal 2, Small Room, PR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.40 “winter”, Separated Signal 2, Medium Room, PR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
Fig. 4.41 “winter”, Separated Signal 2, Large Room, PR, HRTF at 45°: (a) Separated Signal in Time Domain (b) Separated Signal in Time-Frequency Domain (c) After ATF in Time Domain (d) After HRTF in Time Domain (e) After ATF in Time-Frequency Domain (f) Log10 Magnitude of ATF
discard the reverberant components and the noise energy. The permutation and scaling problems of FD-ICA are solved by the hybrid DOA and correlation method and by the MDP, respectively. A least squares optimization technique based on the cross-power-spectrum approach with the gradient descent algorithm is used for the blind separation of the convolutive mixture signals. The quality of the separated signals is evaluated by the SIR. The simulation and discussion of the SIR values, waveforms, and spectrograms of each input sequence are presented in Section 4.1.
To construct 3D audio for headphone playback, the separated signals are filtered by the HRTF and the ATF at the virtual listening point. The interpolation methods for the HRTF and the ATF at the virtual listening point are derived in Chapter 3. Chapter 4 discusses the ATFs of different room sizes and wall materials. The spatial impression given by the combination of the HRTF and the ATF is demonstrated with the resulting 3D acoustic signals.
The SLAB software is used to generate the audio signals in a room, to capture the microphone array signals, and to measure the ATFs for different room sizes and wall materials. The subsequent signal processing is implemented in MATLAB. The spectrograms of the signals at each stage are shown to visualize how the signal envelope changes through the process. Different HRTF scenarios are employed to demonstrate the 3D acoustic impression of the synthesized signals.
5.2 Future Work
This thesis concentrates on the overall combination of BSS, HRTF, and ATF to produce the 3D acoustic signal at a virtual listening point. Many extensions can still be made to improve the quality of the 3D acoustic signal. For example, detecting the source locations would complete the sound field reconstruction and would also help to obtain the corresponding ATFs. Another possible extension is the synthesis of moving 3D acoustic signals, taking into account the Doppler effect, that is, the frequency variation caused by the relative velocity between each source and the virtual listening point. The computational complexity of the overall process should also be reduced, with the aim of synthesizing the 3D acoustic signals in real time. Additionally, a graphical user interface (GUI) would improve the interaction of selecting the virtual listening point in a specific acoustic room.
Speech and Audio Processing, vol. 11, no. 3, May 2003.
[3] E. Bingham and A. Hyvarinen, "A Fast Fixed-point Algorithm for Independent Component Analysis of Complex Valued Signals," International Journal of Neural Systems, vol. 10, no. 1, pp. 1-8, Feb. 2000.
[4] A. Bell and T. Sejnowski, "An Information-maximization Approach to Blind Separation and Blind Deconvolution," Neural Computation, vol. 7, pp. 1129-1159, 1995.
[5] S. Haykin, Ed., Unsupervised Adaptive Filtering (Volume I: Blind Source Separation), John Wiley & Sons, 2000.
[6] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, Springer, 2001.
[7] A. Hyvarinen and E. Oja, "A Fast Fixed-point Algorithm for Independent Component Analysis," Neural Computation, vol. 9, no. 7, pp. 1483-1492, 1997.
[8] S. Ikeda and N. Murata, "A Method of ICA in Time-Frequency Domain," in Proc. ICA'99, pp. 365-371, Jan. 1999.
[9] S. Amari, et al., "Stability Analysis of Learning Algorithms for Blind Source Separation," Neural Networks, vol. 10, no. 8, pp. 1345-1351, 1997.
[10] P. Smaragdis, "Blind Separation of Convolved Mixtures in the Frequency Domain," in Proc. Int. Workshop on Independence and Artificial Neural Networks, 1998.
[11] S. Amari, "Natural Gradient Works Efficiently in Learning," Neural Computation, vol. 10, no. 2, pp. 251-276, 1998.
[12] S. Winter, et al., "Geometrical Understanding of the PCA Subspace Method for Overdetermined Blind Source Separation," in Proc. ICASSP, pp. 769-772, Apr. 2003.
[13] K. Niwa, et al., “Encoding Large Array Signals into a 3D Sound Field Representation