Indoor sound field feature matching for robot's location and orientation detection


Jwu-Sheng Hu 1, Wei-Han Liu *, Chieh-Cheng Cheng 2

Department of Electrical and Control Engineering, National Chiao-Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan, ROC

Received 7 December 2006; received in revised form 16 July 2007

Available online 25 September 2007

Communicated by O. Siohan

Abstract

In this work, an indoor sound field feature matching method is proposed and applied to detect a mobile robot’s location and orientation. The sound field feature, captured from a sound source by a pair of microphones, contains the dynamics of the propagation path. Because indoor environments are complex, the features from different paths can be distinguished using appropriate models. Gaussian mixture models are used in this paper to characterize the distributions of the phase difference and magnitude ratio between the microphone pair over consecutive data frames. The approach offers an alternative to traditional methods such as direction-of-arrival (DOA) estimation from propagation delay, which usually suffer from reverberation, non-line-of-sight conditions and microphone mismatch. The experimental results show that the method not only achieves a high recognition rate for the robot’s location and orientation, but is also robust against environmental noise.

Keywords: GMM; Robot localization; Robot’s orientation detection

1. Introduction

Indoor robot localization is an important issue in robotics. Various types of equipment, such as cameras, radio frequency identification (RFID), infrared (IR) sensors, ultrasonic sensors, lasers, wireless LAN-based methods and inertial navigation sensors, have been adopted to provide different solutions (Borenstein et al., 1996; Georgiev and Allen, 2004; Gutierrez-Osuna et al., 1998; Ladd et al., 2004; Larsson et al., 1996; Lee et al., 2003; McGillem and Rappaport, 1988; Ohya et al., 1998). Pattern matching or pattern recognition-based algorithms have also been proposed in this research domain. Vlassis et al. (2001) utilized edge-based feature vectors of omni-directional images for robot localization. A place recognition method based on image signature matching was presented for mobile robots (Argamon-Engelson, 1998). For range-finder-based sensors, Weiss et al. (1994) proposed a method that matches two scan results to derive the position and orientation of a moving indoor system.

For indoor robots, audio devices such as loudspeakers and microphones are becoming basic equipment. These sound-related devices generally provide a more natural way for robots to communicate with humans. Additionally, some researchers believe that these devices can be used for robot localization (Tamai et al., 2004a; Wang et al., 2004). This work investigates the feasibility of using sound field feature matching to detect a robot’s location and orientation, and proposes a robust sound-based indoor robot pose detection system that uses two microphones.

* Corresponding author. Tel.: +886 3 5712121 54424; fax: +886 3 5715998.

E-mail addresses: [email protected] (J.-S. Hu), lukeliu.ece89g@nctu.edu.tw (W.-H. Liu), [email protected] (C.-C. Cheng).

1 Tel.: +886 3 5712121 54318; fax: +886 3 5715998.

2 Tel.: +886 3 5712121 54424; fax: +886 3 5715998.

www.elsevier.com/locate/patrec Pattern Recognition Letters 29 (2008) 149–160


1.1. Traditional sound-based robot localization methods and known problems

The idea of using multiple microphones to localize sound sources has a long history. Among the various sound source localization methods, generalized cross-correlation (GCC)-based methods (Brandstein and Silverman, 1997; Carter et al., 1973; Knapp and Carter, 1976; Nikas and Shao, 1995) have been discussed for robot localization (Wang et al., 2004). In general, a sound-based robot localization system uses a speaker mounted on the robot to produce sound and estimates the location of the sound source, which is the robot’s location, with a microphone array installed in the room (Tamai et al., 2004a; Wang et al., 2004). The main difficulty for indoor robot localization using sound waves is complex propagation behavior such as reflection and diffraction. Theoretically, the phase difference and magnitude ratio among microphones are directly related to the arrival direction of the sound wave and to the distance between the sound source and the microphones. However, these straightforward relations hold only in free space or in environments with simple geometry. In real environments, these values behave stochastically because of the distributed nature of the propagation path dynamics and the limitation of finite-length data. Furthermore, complex boundary conditions, near-field effects and local sound scattering make these values hard to correlate with the source location. These variations generally result in uncertain estimation errors and make sound-based localization methods unreliable. Moreover, in indoor applications, the robot may move to a location that is non-line-of-sight to the sensors, i.e., one without a direct path between the robot and the microphones. Under this circumstance, traditional methods cannot locate the robot accurately.
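As a point of comparison for the traditional approach, the following is a minimal sketch of a GCC-based delay estimate between two microphones, using the common PHAT weighting. It is not taken from the cited works; the function name `gcc_phat_delay` and its parameters are hypothetical, and NumPy is assumed.

```python
import numpy as np

def gcc_phat_delay(x1, x2, fs, max_delay_s=None):
    """Estimate the delay of x2 relative to x1 (in seconds) with GCC-PHAT."""
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep the phase only
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(max_shift, int(round(max_delay_s * fs))))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift  # positive lag: x2 arrives later
    return lag / fs
```

In a reverberant or non-line-of-sight room the correlation peak no longer corresponds to the direct path, which is the failure mode described above.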

Another well-known problem of sound-based robot localization methods is microphone mismatch. If the microphones are not mutually matched, the phase difference information among them may be distorted. However, pre-matched microphones are relatively expensive, and mismatched microphones are difficult to calibrate accurately because the characteristics of a microphone change with the sound direction. Consequently, the estimation accuracy varies between microphone pairs and is difficult to evaluate.

1.2. The proposed method based on sound ﬁeld feature matching

Traditional sound source localization algorithms attempt to suppress the effects of complex propagation behavior and to estimate the direction of the direct sound. Unlike existing sound-based robot localization systems, which focus on eliminating the influence of reflection and diffraction, this work treats the propagation behavior as a local feature and attempts to recognize it by pattern matching. In practice, the complex propagation behavior of a sound source results in location- or orientation-dependent phase difference and magnitude ratio distributions. For example, Figs. 1 and 2 show the histograms of the phase difference and the magnitude ratio in consecutive data frames measured between a microphone pair for the location “A” in a line-of-sight case and the location “B” in a non-line-of-sight case (the experimental environment is shown in Fig. 9). Even in the line-of-sight case, the values of the phase difference and magnitude ratio are not fixed, owing to the complex propagation behavior.

The sound field features illustrated in Figs. 1 and 2 are used to mark the location or orientation of a sound source mounted on the robot. Notably, both the magnitude ratio and the phase difference between two microphones are content independent. In other words, the content of the sound produced by the robot does not have to be defined. For example, the sound can be conversation, or even the noise emitted by an autonomous vacuum-cleaning robot. This work adopts Gaussian mixture models (GMMs) (Reynolds and Rose, 1995) to model the phase difference and magnitude ratio distributions and proposes two models, the robot localization model (RLM) and the robot orientation model (ROM). The RLM is used to detect the robot’s location and the ROM is used to detect the robot’s orientation. The unique advantage of the proposed method is that it detects location and orientation in non-line-of-sight cases, i.e., when no direct path is available between the robot and the microphones. To adapt to environmental noise and enhance the robustness of the feature identification, an on-line calibration procedure is also proposed.
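The two features can be sketched as follows: a minimal example that splits a two-channel recording into frames and computes the per-frequency phase difference and magnitude ratio for each frame. The frame length (80 samples), FFT size, Hann window, and the convention of dividing microphone 1 by microphone 2 are assumptions made for illustration; the paper does not specify them. The name `pair_features` is hypothetical.

```python
import numpy as np

def pair_features(mic1, mic2, frame_len=80, n_fft=128, bins=None):
    """Per-frame phase difference and magnitude ratio between two microphones.

    Returns two arrays of shape (n_frames, n_bins).
    """
    n_frames = min(len(mic1), len(mic2)) // frame_len
    window = np.hanning(frame_len)
    phase_diff, mag_ratio = [], []
    for t in range(n_frames):
        seg = slice(t * frame_len, (t + 1) * frame_len)
        X1 = np.fft.rfft(window * mic1[seg], n=n_fft)
        X2 = np.fft.rfft(window * mic2[seg], n=n_fft)
        if bins is not None:               # optionally keep selected frequency bins
            X1, X2 = X1[bins], X2[bins]
        # Phase of mic1 minus phase of mic2, wrapped to (-pi, pi].
        phase_diff.append(np.angle(X1 * np.conj(X2)))
        # Assumed convention: |X1| / |X2|; the small constant avoids division by zero.
        mag_ratio.append(np.abs(X1) / (np.abs(X2) + 1e-12))
    return np.array(phase_diff), np.array(mag_ratio)
```

Because both quantities are ratios between the two channels, the source spectrum cancels, which is why the features do not depend on the content of the emitted sound.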

The remainder of this paper is organized as follows. The next section introduces the overall system architecture. Section 3 describes the design of the directional sound pattern for orientation detection. Section 4 presents the formulations of the proposed RLM and ROM. The experimental results are discussed in Section 5 and, finally, conclusions are drawn in Section 6.

2. System architecture

As shown in Fig. 3, the proposed system contains two speakers on the robot and a robot’s location and orientation detection agent (RLODA) with two microphones. The RLODA can be placed in any part of the room as long as the reception of sound from the robot is clear enough (footnote 3). The sound pattern generated by Speaker 1 (SP1) is received by the RLODA, and the RLMs for different sound source locations are obtained by modeling the location-dependent sound field features (phase difference and magnitude ratio distributions) measured between the two microphones. When the system builds the ROMs, both SP1 and SP2 are used to generate a directional sound pattern. The details of generating a directional sound pattern are described in Section 3. Because the sound pattern generated by SP1 and SP2 is directional, the sound field features change with the robot’s orientation and can be used for orientation detection.

Fig. 4 depicts the overall system architecture. Stage I in Fig. 4 is the pre-recording stage, in which the robot moves and changes its orientation while the environment is quiet, and produces sound through the speakers to build a pre-recorded database. Since the sound is recorded by the two microphones, the sound field features can be obtained from this database.

Once the pre-recording stage is finished, the system enters Stage II, the silent stage. In this stage, the robot remains silent and the RLODA records the environmental noise. Assuming that noise signals are additive, the sound recorded in a real application can be considered a linear combination of the robot’s sound and the environmental noise. Therefore, this stage adds the environmental noise to the pre-recorded database to construct the training features (the phase difference and magnitude ratio distributions), and then uses these features to train the parameters of the RLMs and ROMs. Through this process, the models are adapted to the environmental noise.
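The additive-noise step of the silent stage can be sketched as below. This is a minimal illustration under the additive assumption stated above; the function names, the random choice of noise segment, and the array layout (samples by two channels) are assumptions, not details given in the paper.

```python
import numpy as np

def snr_db(clean, noise):
    """Signal-to-noise ratio in dB from two equally long recordings."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

def add_recorded_noise(prerecorded, noise, rng=None):
    """Add a segment of silent-stage noise to a pre-recorded two-channel clip.

    prerecorded: (n_samples, 2) array from the pre-recording stage.
    noise:       (m_samples, 2) array recorded while the robot is silent, m >= n.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = prerecorded.shape[0]
    if noise.shape[0] < n:
        raise ValueError("noise recording is shorter than the clip")
    # Hypothetical choice: pick the noise segment at a random offset.
    start = int(rng.integers(0, noise.shape[0] - n + 1))
    return prerecorded + noise[start:start + n]
```

Both channels receive noise from the same time segment, so the inter-microphone relationship of the noise field is preserved in the training data.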

When the robot needs to know its location or orientation, the system switches to the sounding stage, in which the robot emits a sound into the room so that the RLODA can detect the robot’s location or orientation. If the robot’s location is required, SP1 alone is used to generate sound; if the robot’s orientation is required, both SP1 and SP2 are excited. Because the same microphones are used in all three stages, their mismatch characteristics are captured in the pre-recorded database and do not influence the detection results of the proposed system. The system can switch between the sounding and silent stages repeatedly for location or orientation detection and for adaptation to environmental noise. Fig. 5 illustrates the flowchart of the proposed system.

Additionally, wireless communication technologies such as Wireless Ethernet can be adopted to accomplish the stage synchronization and communication between the robot and the RLODA.

Fig. 3. Speaker and microphone conﬁguration of the proposed system.

3 In this paper, we do not discuss the placement of the RLODA.


3. Directional sound pattern design for robot orientation detection

To detect the robot’s orientation from the sound field features, the sound pattern generated by the robot should be correlated with the robot’s orientation. However, a general omni-directional sound pattern may lead to the same sound fields when the robot changes its orientation, because the emitted sound has the same characteristics in all directions. Therefore, a directional sound emission approach must be designed. To realize a directional sound pattern, the idea of speaker array beamforming (Tamai et al., 2004b; Yamada et al., 2004) is adopted in this work to guarantee the directivity of the generated sound pattern. Besides directivity, another constraint on the generated sound pattern is the number of symmetric axes (b) in the horizontal plane. Fig. 6 shows an example of how b affects the orientation detection, where the solid line denotes the generated sound pattern, the dotted lines denote the symmetric axes, and the arrow denotes the robot’s orientation.

As shown in Fig. 6, the sound patterns generated when the robot’s orientation is 0°, 90°, 180° and 270° are exactly the same when b = 4. In general, a sound pattern generated when the robot points in a certain direction (0° in the example) has b − 1 identical sound patterns at other orientations. Therefore, the generated sound may be symmetric along only one axis (b = 1) to avoid confusion in orientation detection. Consequently, this work uses two speakers to generate a sound pattern that conforms to this constraint:

$$ J_{SP1}(n) = J(n), \qquad J_{SP2}(n) = 0.5\, J(n) \tag{1} $$

where J(n) is the original sound source, and J_SP1(n) and J_SP2(n) are the sounds emitted by SP1 and SP2, respectively. The distance between the two speakers is set to 0.2 m.

Fig. 4. Overall system architecture.


Fig. 7 depicts the simulated sound pattern of the proposed system, based on the sound propagation theory introduced by Parker (1988), when the robot’s orientation is 0°. The sound power is measured 1 m away from SP1 at the same height. The solid lines in the circle depict the relative sound power in dB. As shown in Fig. 7, the generated sound pattern is symmetric along only one axis and is therefore suitable for detecting the robot’s orientation.
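A rough free-field sketch of such a two-speaker pattern is given below. It models SP1 and SP2 as ideal point sources driven as in Eq. (1) and evaluates the power on a circle of radius 1 m around SP1. The single frequency of 1 kHz, the speed of sound of 343 m/s, and the placement of SP2 directly behind SP1 on the orientation axis are assumptions for illustration; the paper’s simulation may differ. The name `pattern_db` is hypothetical.

```python
import numpy as np

def pattern_db(theta, freq_hz=1000.0, spacing=0.2, gain2=0.5, radius=1.0, c=343.0):
    """Sound power in dB (relative to a unit source at 1 m) at angle theta.

    SP1 sits at the origin with amplitude 1; SP2 is assumed to sit at
    (-spacing, 0) with amplitude gain2. theta = 0 is the robot's heading.
    """
    theta = np.asarray(theta, dtype=float)
    k = 2.0 * np.pi * freq_hz / c                 # wavenumber
    px, py = radius * np.cos(theta), radius * np.sin(theta)
    r1 = np.hypot(px, py)                         # distance to SP1
    r2 = np.hypot(px + spacing, py)               # distance to SP2
    pressure = np.exp(-1j * k * r1) / r1 + gain2 * np.exp(-1j * k * r2) / r2
    return 10.0 * np.log10(np.abs(pressure) ** 2)

angles = np.linspace(0.0, 2.0 * np.pi, 361)       # 1-degree steps, both ends included
levels = pattern_db(angles)
```

Under these assumptions the pattern mirrors across the axis through the two speakers but not across the perpendicular axis, which corresponds to the single-symmetry-axis requirement (b = 1).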

4. Robot localization model (RLM) and robot orientation model (ROM)

4.1. A description of the proposed RLM and ROM

To establish both RLMs and ROMs, the RLODA needs to construct models for the sound fields at different locations and orientations. $P_{Sx}(\omega_b)$ and $M_{Sx}(\omega_b)$ denote the phase difference and magnitude ratio, respectively, for constructing the RLM (S = L) or the ROM (S = O) at frequency $\omega_b$, $b \in \{1, \ldots, B\}$. The GMMs are defined as the weighted sums of $N_1$ and $N_2$ Gaussian component densities:

$$ G(P_{Sx} \mid \lambda_{SP}) = \sum_{i=1}^{N_1} \rho_{SP,i}\, g_i(P_{Sx}) \tag{2} $$

$$ G(M_{Sx} \mid \lambda_{SM}) = \sum_{i=1}^{N_2} \rho_{SM,i}\, g_i(M_{Sx}) \tag{3} $$

where $S \in \{L, O\}$, $P_{Sx} = [\,P_{Sx}(\omega_1) \ \cdots \ P_{Sx}(\omega_B)\,]^T$ and $M_{Sx} = [\,M_{Sx}(\omega_1) \ \cdots \ M_{Sx}(\omega_B)\,]^T$. $\rho_{SP,i}$ and $\rho_{SM,i}$ are the ith mixture weights, and $g_i(P_{Sx})$ and $g_i(M_{Sx})$ are the Gaussian density functions. Notably, the mixture weights must satisfy the constraints

$$ \sum_{i=1}^{N_1} \rho_{SP,i} = 1 \quad \text{and} \quad \sum_{i=1}^{N_2} \rho_{SM,i} = 1 \tag{4} $$
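A minimal sketch of evaluating the mixture density of Eqs. (2) and (3) is shown below, computed in the log domain for numerical stability. It assumes diagonal covariances, which is consistent with the per-frequency variance updates given later in Eqs. (12) and (13). The array layout (one row per component) and the name `gmm_log_density` are choices made for this example.

```python
import numpy as np

def gmm_log_density(X, weights, means, variances):
    """log G(x | lambda) for each row of X under a diagonal-covariance GMM.

    X:         (T, B) feature vectors (phase differences or magnitude ratios).
    weights:   (N,)   mixture weights that sum to 1.
    means:     (N, B) component means.
    variances: (N, B) component variances (diagonal covariances).
    """
    X = np.atleast_2d(X)
    diff = X[:, None, :] - means[None, :, :]                              # (T, N, B)
    log_gauss = -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                        + np.sum(diff ** 2 / variances, axis=2))          # (T, N)
    # log of the weighted sum of component densities, via log-sum-exp.
    return np.logaddexp.reduce(np.log(weights) + log_gauss, axis=1)
```

For a single component with zero mean and unit variance in one dimension, the value at x = 0 reduces to the familiar −0.5·log(2π).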

Fig. 6. Relations between b and the sound pattern.

Fig. 7. Simulation of generated sound pattern.


The terms $\lambda_{SP}$ and $\lambda_{SM}$ represent the parameters of the $N_1$ and $N_2$ component densities:

$$ \lambda_{SP} = \{\rho_{SP}, \mu_{SP}, \Sigma_{SP}\} \quad \text{and} \quad \lambda_{SM} = \{\rho_{SM}, \mu_{SM}, \Sigma_{SM}\} \tag{5} $$

where

$\rho_{SP} = [\,\rho_{SP,1} \ \cdots \ \rho_{SP,N_1}\,]$ denotes the phase difference mixture weight vector with dimensions $1 \times N_1$.

$\rho_{SM} = [\,\rho_{SM,1} \ \cdots \ \rho_{SM,N_2}\,]$ denotes the magnitude ratio mixture weight vector with dimensions $1 \times N_2$.

$\mu_{SP} = [\,\mu_{SP,1} \ \cdots \ \mu_{SP,N_1}\,]$ denotes the phase difference mean matrix with dimensions $B \times N_1$.

$\mu_{SM} = [\,\mu_{SM,1} \ \cdots \ \mu_{SM,N_2}\,]$ denotes the magnitude ratio mean matrix with dimensions $B \times N_2$.

$\Sigma_{SP} = [\,\Sigma_{SP,1} \ \cdots \ \Sigma_{SP,N_1}\,]$ denotes the phase difference covariance matrix with dimensions $B \times BN_1$.

$\Sigma_{SM} = [\,\Sigma_{SM,1} \ \cdots \ \Sigma_{SM,N_2}\,]$ denotes the magnitude ratio covariance matrix with dimensions $B \times BN_2$.

The parameters $\lambda_{SP}$ and $\lambda_{SM}$ in (5) can be estimated by the iterative EM algorithm (Xuan et al., 2001), which guarantees a monotonic increase in the model’s log-likelihood value. The iterative procedure can be divided into the following two steps:

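Expectation step:

$$ G(i \mid P^{(t)}_{Sx}, \lambda_{SP}) = \frac{\rho_{SP,i}\, g_i(P^{(t)}_{Sx})}{\sum_{j=1}^{N_1} \rho_{SP,j}\, g_j(P^{(t)}_{Sx})} \tag{6} $$

$$ G(i \mid M^{(t)}_{Sx}, \lambda_{SM}) = \frac{\rho_{SM,i}\, g_i(M^{(t)}_{Sx})}{\sum_{j=1}^{N_2} \rho_{SM,j}\, g_j(M^{(t)}_{Sx})} \tag{7} $$

where $G(i \mid P^{(t)}_{Sx}, \lambda_{SP})$ and $G(i \mid M^{(t)}_{Sx}, \lambda_{SM})$ are a posteriori probabilities.

Maximization step:

(i) Estimate the mixture weights:

$$ \rho_{SP,i} = \frac{1}{T} \sum_{t=1}^{T} G(i \mid P^{(t)}_{Sx}, \lambda_{SP}) \tag{8} $$

$$ \rho_{SM,i} = \frac{1}{T} \sum_{t=1}^{T} G(i \mid M^{(t)}_{Sx}, \lambda_{SM}) \tag{9} $$

(ii) Estimate the mean vector:

$$ \mu_{SP,i} = \frac{\sum_{t=1}^{T} G(i \mid P^{(t)}_{Sx}, \lambda_{SP})\, P^{(t)}_{Sx}}{\sum_{t=1}^{T} G(i \mid P^{(t)}_{Sx}, \lambda_{SP})} \tag{10} $$

$$ \mu_{SM,i} = \frac{\sum_{t=1}^{T} G(i \mid M^{(t)}_{Sx}, \lambda_{SM})\, M^{(t)}_{Sx}}{\sum_{t=1}^{T} G(i \mid M^{(t)}_{Sx}, \lambda_{SM})} \tag{11} $$

(iii) Estimate the variances:

$$ \sigma^2_{SP,i}(\omega_b) = \frac{\sum_{t=1}^{T} G(i \mid P^{(t)}_{Sx}, \lambda_{SP})\, P^{(t)2}_{Sx}(\omega_b)}{\sum_{t=1}^{T} G(i \mid P^{(t)}_{Sx}, \lambda_{SP})} - \mu^2_{SP,i}(\omega_b) \tag{12} $$

$$ \sigma^2_{SM,i}(\omega_b) = \frac{\sum_{t=1}^{T} G(i \mid M^{(t)}_{Sx}, \lambda_{SM})\, M^{(t)2}_{Sx}(\omega_b)}{\sum_{t=1}^{T} G(i \mid M^{(t)}_{Sx}, \lambda_{SM})} - \mu^2_{SM,i}(\omega_b) \tag{13} $$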
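One EM iteration following Eqs. (6) to (13) can be sketched as below for a diagonal-covariance GMM. The same routine applies to either feature (phase difference or magnitude ratio). The variance floor is an added safeguard that does not appear in the paper, and the function name `em_step` and the row-per-component layout are choices made for this example.

```python
import numpy as np

def em_step(X, weights, means, variances, var_floor=1e-6):
    """One EM iteration. X: (T, B); weights: (N,); means, variances: (N, B).

    Returns the updated parameters and the log-likelihood of X under the
    parameters that were passed in.
    """
    # E-step, Eqs. (6)-(7): posterior probability of component i for frame t.
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                        + np.sum(diff ** 2 / variances, axis=2))
    log_joint = np.log(weights) + log_gauss                        # (T, N)
    log_like = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    post = np.exp(log_joint - log_like)                            # rows sum to 1

    # M-step.
    occupancy = post.sum(axis=0)                                   # (N,)
    new_weights = occupancy / X.shape[0]                           # Eqs. (8)-(9)
    new_means = (post.T @ X) / occupancy[:, None]                  # Eqs. (10)-(11)
    new_vars = (post.T @ X ** 2) / occupancy[:, None] - new_means ** 2  # Eqs. (12)-(13)
    new_vars = np.maximum(new_vars, var_floor)  # safeguard, not part of the source
    return new_weights, new_means, new_vars, float(log_like.sum())
```

Calling `em_step` repeatedly until the returned log-likelihood stops increasing gives the trained parameters; a component whose occupancy collapses to zero would need separate handling, which is omitted here.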

However, the EM algorithm is sensitive to the choice of the initial model. A good initial model reduces the number of EM iterations required. K-means related approaches are known to be effective in finding a suitable initial model (MacQueen, 1967). This work uses the accelerated K-means algorithm proposed by Elkan (2003), which can significantly reduce the computational cost.
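The initialization idea can be sketched with plain Lloyd-style K-means, shown below. Note that this is the basic algorithm, not Elkan’s accelerated variant used in the paper; the accelerated version produces the same kind of clustering with fewer distance computations. The function name `kmeans_init`, the iteration count and the random seeding are assumptions.

```python
import numpy as np

def kmeans_init(X, n_components, n_iter=20, seed=0):
    """Initial GMM weights, means and diagonal variances from K-means clusters."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(X.shape[0], size=n_components, replace=False)].copy()

    def assign(points, c):
        # Index of the nearest centre for every point (squared Euclidean distance).
        return ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(X, centers)
        for i in range(n_components):
            members = X[labels == i]
            if len(members) > 0:          # keep the old centre if a cluster empties
                centers[i] = members.mean(axis=0)

    labels = assign(X, centers)
    counts = np.bincount(labels, minlength=n_components)
    weights = counts / counts.sum()
    variances = np.ones_like(centers)     # fallback for clusters with fewer than 2 points
    for i in range(n_components):
        if counts[i] > 1:
            variances[i] = X[labels == i].var(axis=0) + 1e-6
    return weights, centers, variances
```

The returned triple can be passed directly to an EM routine as the initial model; an empty cluster would yield a zero weight and should be re-seeded before EM, which this sketch does not do.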

The proposed RLM and ROM at each location and orientation are deﬁned as the linear combination of the phase diﬀerence GMM and the magnitude ratio GMM:

$$ F_{RLM} = \alpha_{LP}\, G(P_{Lx} \mid \lambda_{LP}) + \alpha_{LM}\, G(M_{Lx} \mid \lambda_{LM}) \tag{14} $$

$$ F_{ROM} = \alpha_{OP}\, G(P_{Ox} \mid \lambda_{OP}) + \alpha_{OM}\, G(M_{Ox} \mid \lambda_{OM}) \tag{15} $$

where $\alpha_{LP}$, $\alpha_{OP}$, $\alpha_{LM}$ and $\alpha_{OM}$ represent the weighting factors. The values of $\alpha_{SP}$ and $\alpha_{SM}$ can be chosen based on the sum of the correlation values among the trained locations for the phase difference GMM and for the magnitude ratio GMM. The GMM with the higher correlation sum is assigned a lower weight, since its ability to discriminate is considered lower, and vice versa. Under this principle, $\alpha_{SP}$ and $\alpha_{SM}$ are determined by

$$ \min \left\{ \sum_{q_{SP}} \alpha_{SP} \left\{ C_{SP}(q_{SP})\, \Phi\, C_{SP}(q_{SP})^T \right\} + \sum_{q_{SM}} \alpha_{SM} \left\{ C_{SM}(q_{SM})\, \Phi\, C_{SM}(q_{SM})^T \right\} \right\} \quad \text{s.t.} \ \ \alpha_{SP}\alpha_{SM} = 1, \ \alpha_{SP} > 0, \ \alpha_{SM} > 0 \tag{16} $$

where $q_{SP} \in Q_{SP}$ and $q_{SM} \in Q_{SM}$ are B-dimensional random vectors in the operation ranges $Q_{SP}$ and $Q_{SM}$,

$$ C_{SP}(q_{SP}) = [\, C(q_{SP} \mid \lambda_{SP}(1)) \ \ C(q_{SP} \mid \lambda_{SP}(2)) \ \cdots \ C(q_{SP} \mid \lambda_{SP}(L)) \,], $$

$$ C_{SM}(q_{SM}) = [\, C(q_{SM} \mid \lambda_{SM}(1)) \ \ C(q_{SM} \mid \lambda_{SM}(2)) \ \cdots \ C(q_{SM} \mid \lambda_{SM}(L)) \,], $$

and

$$ \Phi = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 0 & 0 & 1 & \cdots & 1 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix} $$

with dimension $L \times L$. In addition,

$$ C(q_{SP} \mid \lambda_{SP}(l)) = \frac{H(q_{SP} \mid \lambda_{SP}(l))}{\sqrt{\sum_{q_{SP}} H^2(q_{SP} \mid \lambda_{SP}(l))}} \tag{17} $$

$$ C(q_{SM} \mid \lambda_{SM}(l)) = \frac{H(q_{SM} \mid \lambda_{SM}(l))}{\sqrt{\sum_{q_{SM}} H^2(q_{SM} \mid \lambda_{SM}(l))}} \tag{18} $$

$$ H(q_{SP} \mid \lambda_{SP}(l)) = G(q_{SP} \mid \lambda_{SP}(l)) - \left( \sum_{q_{SP}} G(q_{SP} \mid \lambda_{SP}(l)) \Big/ N(q_{SP}) \right) \quad \text{and} \quad H(q_{SM} \mid \lambda_{SM}(l)) = G(q_{SM} \mid \lambda_{SM}(l)) - \left( \sum_{q_{SM}} G(q_{SM} \mid \lambda_{SM}(l)) \Big/ N(q_{SM}) \right) \tag{19} $$

where $N(q_{SP})$ and $N(q_{SM})$ denote the total numbers of selected $q_{SP}$ and $q_{SM}$.
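Problem (16) has only two unknowns tied by a product constraint, so it can be solved in closed form. Writing S_P and S_M for the two correlation sums in (16) and substituting one weight as the reciprocal of the other, the cost S_P·a + S_M/a is minimized at a = sqrt(S_M/S_P). The sketch below computes the sums with the strictly upper triangular matrix of ones and returns the two weights. The function names and the input layout (one row per sampled vector q, one column per trained location) are assumptions, and the closed form is only valid when both sums are positive.

```python
import numpy as np

def correlation_sum(C):
    """Sum over sampled vectors q of C(q) Phi C(q)^T.

    C: (n_points, L) array; row q holds the normalized values
       C(q | lambda(l)) for the L trained locations.
    """
    L = C.shape[1]
    phi = np.triu(np.ones((L, L)), k=1)   # ones strictly above the diagonal
    # Equivalent to summing C[q, l] * C[q, m] over all pairs l < m and all q.
    return float(np.einsum("ql,lm,qm->", C, phi, C))

def gmm_weights(C_phase, C_mag):
    """Weights for the phase-difference GMM and the magnitude-ratio GMM."""
    s_p = correlation_sum(C_phase)
    s_m = correlation_sum(C_mag)
    if s_p <= 0 or s_m <= 0:
        raise ValueError("closed form needs positive correlation sums")
    alpha_p = np.sqrt(s_m / s_p)          # larger phase correlation -> smaller weight
    return alpha_p, 1.0 / alpha_p         # the product constraint holds by construction
```

For example, with correlation sums of 1 (phase) and 4 (magnitude), the weights are 2 and 0.5: the feature whose location models are more alike receives the smaller weight.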

The values of $\alpha_{SP}$ and $\alpha_{SM}$ can be obtained by solving (16) as:

$$ \alpha_{SP} = \sqrt{ \frac{\sum_{q_{SM}} C_{SM}(q_{SM})\, \Phi\, C_{SM}(q_{SM})^T}{\sum_{q_{SP}} C_{SP}(q_{SP})\, \Phi\, C_{SP}(q_{SP})^T} } \tag{20} $$

$$ \alpha_{SM} = \sqrt{ \frac{\sum_{q_{SP}} C_{SP}(q_{SP})\, \Phi\, C_{SP}(q_{SP})^T}{\sum_{q_{SM}} C_{SM}(q_{SM})\, \Phi\, C_{SM}(q_{SM})^T} } \tag{21} $$

4.2. Sound field feature matching for location and orientation detection

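The location and orientation are determined by finding the maximum a posteriori location probability and a posteriori orientation probability for a given observation sequence:

$$
\begin{aligned}
\hat{l} &= \arg\max_{1 \le l \le L} F_{RLM}(l) \\
&= \arg\max_{1 \le l \le L} \ \alpha_{LP}\, G(\lambda_{LP}(l) \mid P_{LY}) + \alpha_{LM}\, G(\lambda_{LM}(l) \mid M_{LY}) \\
&= \arg\max_{1 \le l \le L} \ \alpha_{LP}\, \frac{G(P_{LY} \mid \lambda_{LP}(l))\, p(\lambda_{LP}(l))}{p(P_{LY})} + \alpha_{LM}\, \frac{G(M_{LY} \mid \lambda_{LM}(l))\, p(\lambda_{LM}(l))}{p(M_{LY})}
\end{aligned} \tag{22}
$$

$$
\begin{aligned}
\hat{o} &= \arg\max_{1 \le o \le O} F_{ROM}(o) \\
&= \arg\max_{1 \le o \le O} \ \alpha_{OP}\, \frac{G(P_{OY} \mid \lambda_{OP}(o))\, p(\lambda_{OP}(o))}{p(P_{OY})} + \alpha_{OM}\, \frac{G(M_{OY} \mid \lambda_{OM}(o))\, p(\lambda_{OM}(o))}{p(M_{OY})}
\end{aligned} \tag{23}
$$

where $P_{SY} = \{P^{(1)}_{SY}, \ldots, P^{(V)}_{SY}\}$ and $M_{SY} = \{M^{(1)}_{SY}, \ldots, M^{(V)}_{SY}\}$ are the phase difference and magnitude ratio computed from the testing sequences, denoted $Y_{S1}(\omega)$ and $Y_{S2}(\omega)$, and V denotes the testing sequence length. The probabilities $p(\lambda_{LP}(l))$ and $p(\lambda_{LM}(l))$ can be set to 1/L, and $p(\lambda_{OP}(o))$ and $p(\lambda_{OM}(o))$ can be set to 1/O, since every location and orientation is equally likely in a blind search. Moreover, because the probability densities $p(P_{SY})$ and $p(M_{SY})$ are the same for all location models, the detection rule can be recast as:

$$ \hat{l} = \arg\max_{1 \le l \le L} \ \alpha_{LP} \prod_{v=1}^{V} G(P^{(v)}_{LY} \mid \lambda_{LP}(l)) + \alpha_{LM} \prod_{v=1}^{V} G(M^{(v)}_{LY} \mid \lambda_{LM}(l)) \tag{24} $$

$$ \hat{o} = \arg\max_{1 \le o \le O} \ \alpha_{OP} \prod_{v=1}^{V} G(P^{(v)}_{OY} \mid \lambda_{OP}(o)) + \alpha_{OM} \prod_{v=1}^{V} G(M^{(v)}_{OY} \mid \lambda_{OM}(o)) \tag{25} $$

5. Experimental results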

Fig. 8 shows the experimental platform and the proposed RLODA. In Fig. 8a, the distance between the two speakers is 0.2 m. Considering the spatial aliasing problem (Brandstein and Ward, 2001) and the highest frequency of the sound generated by the robot, which is 2 kHz in this experiment, the distance between the two microphones of the RLODA is chosen as 0.07 m, as shown in Fig. 8b. The experiment was performed in an office room filled with furniture, which is 11.4 m long, 4.73 m wide and 2.8 m high. Two off-the-shelf, non-calibrated microphones are used on the RLODA in this experiment, and the RLODA is implemented on a PC with a stereo recording sound card. The sampling rate is 8 kHz and the A/D resolution is 16 bit. The pre-recording is performed every 0.1 m within the region in which the robot is allowed to travel. For orientation detection, the robot is rotated in 30° steps to obtain 12 orientations over 360°.
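The microphone spacing can be checked against the usual half-wavelength rule for avoiding spatial aliasing. The short calculation below is a sketch that assumes a speed of sound of 343 m/s, a value not stated in the paper.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value (not from the source)
f_max = 2000.0           # Hz, highest frequency emitted by the robot
mic_spacing = 0.07       # m, spacing of the RLODA microphones

# Spacing must stay below half of the shortest wavelength.
max_spacing = SPEED_OF_SOUND / (2.0 * f_max)

# Largest free-field phase difference at f_max (source on the microphone axis).
max_phase = 2.0 * math.pi * f_max * mic_spacing / SPEED_OF_SOUND

print(f"half-wavelength limit: {max_spacing:.5f} m, chosen spacing: {mic_spacing} m")
print(f"largest phase difference at {f_max:.0f} Hz: {max_phase:.3f} rad")
```

Under this assumption the limit is about 0.086 m, so the chosen 0.07 m keeps the free-field phase difference below π over the whole band.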

Fig. 9 depicts the experimental environment and the location of the RLODA. Note that there is a partition room in the office. Therefore, the robot is completely in a non-line-of-sight condition when it is inside the partition room. The robot’s moving trajectories are also shown in Fig. 9 as dotted lines, numbered from 1 to 8 in sequence.

The lengths of the training sequence and the testing sequence were set to 300 and 30 frames, respectively. In other words, a 3-second input was used for training and a 0.3-second input was used for testing. To simulate a general indoor environment, the major noise in this experiment is speech, and the minor noises are equipment noises such as air conditioner noise and computer fan noise.


Table 1 lists the average SNRs of all trajectories and the average SNRs of each trajectory pair. Fig. 10 shows the location detection results along the robot’s moving trajectory with a mixture number of 15 and an average SNR of 7.91 dB. As shown in Fig. 10, the detected locations are very close to the actual locations most of the time.

The proposed method models the phase difference and magnitude ratio distributions measured from the sounds generated by the robot to detect the robot’s location and orientation. However, as the noise power increases, the sound field features of the noise start to dominate the phase difference and magnitude ratio distributions. In this circumstance, the RLMs and ROMs may become less distinguishable, which may degrade the performance of the proposed method. In Fig. 10, detection errors occur most frequently on trajectories 1 and 8, because part of these trajectories lies completely inside the partition room and their average SNR is lower than that of the other trajectories, as shown in Table 1. Although trajectories 1 and 8 contain locations that are non-line-of-sight, the location-dependent sound field features can still be captured by the proposed RLMs.

Several experiments are conducted to assess the accuracy of the proposed method in terms of location and orientation detection error. Table 2 lists the average correct rates of the location detection results, where D denotes the distance between the actual location and the nearest location in the recorded database. Notably, the pre-recorded locations are discrete and 0.1 m apart. In this experiment, a detected result is regarded as correct if it is the nearest pre-recorded location in the database. Additionally, the numbers of trials for location detection and orientation detection are 1210 and 332, respectively, for each condition. As shown in Table 2, if only a single Gaussian component is used (M = 1), the average correct rates are too low to be acceptable in both SNR cases. However, when the mixture number is increased (M = 11 and M = 15) and 0 ≤ D < 1 cm, the average correct rates rise above 95% at the higher SNR and above 90% at the lower SNR.

Fig. 8. The experimental platform and the proposed RLODA. (a) The experimental platform. (b) The proposed RLODA.

Fig. 9. Experimental environment.

Table 1

Average SNRs of all trajectories and the average SNRs of each trajectory pair (dB)

| Average SNR | Trajectories 1 and 8 | Trajectories 2 and 3 | Trajectories 4 and 5 | Trajectories 6 and 7 |
|---|---|---|---|---|
| 19.87 | 13.94 | 23.34 | 16.44 | 17.69 |
| 7.91 | 2.76 | 10.93 | 4.93 | 6.01 |


Table 3 shows the average correct rates of the orientation detection results, where A denotes the angular distance between the actual and the pre-recorded orientations. If the orientation detection result is the pre-recorded orientation nearest to the actual orientation, the result is considered correct. Note that the experiment is performed after a correct location has been detected. As shown in Table 3, when M = 1, the average correct rates are lower than 60%. These results show that a single Gaussian component is not appropriate for modeling the ROMs. When M = 11, the average correct rates are much higher than those for M = 1 in both SNR cases. For 0° ≤ A < 4°, the average correct rates exceed 99% in both SNR cases.

Fig. 11 shows the average of a posteriori probabilities measured at the locations ‘‘A’’ and ‘‘B’’, where the location ‘‘A’’ is in a line-of-sight case and the location ‘‘B’’ is in a non-line-of-sight case, as illustrated in Fig. 9. Notably, the a posteriori location probability is deﬁned as:

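$$ \alpha_{LP} \prod_{v=1}^{V} G(P^{(v)}_{LY} \mid \lambda_{LP}(l)) + \alpha_{LM} \prod_{v=1}^{V} G(M^{(v)}_{LY} \mid \lambda_{LM}(l)) \tag{26} $$

and the a posteriori orientation probability is defined as:

$$ \alpha_{OP} \prod_{v=1}^{V} G(P^{(v)}_{OY} \mid \lambda_{OP}(o)) + \alpha_{OM} \prod_{v=1}^{V} G(M^{(v)}_{OY} \mid \lambda_{OM}(o)) \tag{27} $$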
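The score in (26) or (27) multiplies V frame likelihoods and can underflow in floating point, so a practical sketch evaluates it in the log domain. The example below assumes diagonal-covariance GMMs stored as (weights, means, variances) tuples and returns a zero-based index; the function names and data layout are hypothetical. Because the logarithm is monotonic, taking the argmax of the log score gives the same decision as (24) and (25).

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Sum over frames of log G(x_v | lambda) for a diagonal-covariance GMM."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                        + np.sum(diff ** 2 / variances, axis=2))
    return float(np.logaddexp.reduce(np.log(weights) + log_gauss, axis=1).sum())

def detect(P, M, phase_models, mag_models, alpha_p, alpha_m):
    """Return the best model index and the log of score (26)/(27) per model.

    P, M:         (V, B) phase-difference and magnitude-ratio test frames.
    phase_models: list of (weights, means, variances), one per location/orientation.
    mag_models:   list with the same layout for the magnitude-ratio GMMs.
    """
    scores = []
    for lam_p, lam_m in zip(phase_models, mag_models):
        log_p = np.log(alpha_p) + diag_gmm_loglik(P, *lam_p)
        log_m = np.log(alpha_m) + diag_gmm_loglik(M, *lam_m)
        # log(alpha_p * prod G_P + alpha_m * prod G_M) without leaving the log domain.
        scores.append(np.logaddexp(log_p, log_m))
    scores = np.array(scores)
    return int(np.argmax(scores)), scores
```

The same routine serves both the RLMs (indices are locations) and the ROMs (indices are orientations); only the model lists and weights change.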

Fig. 10. Location detection results along the X and Y axes. (a) Location detection results along the X axis. (b) Location detection results along the Y axis.

Table 2

Average correct rates of location detection results (%)

| M | D (cm) | Average SNR 19.87 dB | Average SNR 7.91 dB |
|---|---|---|---|
| 1 | 0 ≤ D < 1 | 24.00 | 22.98 |
| 1 | 1 ≤ D < 3 | 20.83 | 22.89 |
| 1 | 3 ≤ D < 5 | 20.41 | 17.52 |
| 11 | 0 ≤ D < 1 | 95.45 | 91.98 |
| 11 | 1 ≤ D < 3 | 95.00 | 89.50 |
| 11 | 3 ≤ D < 5 | 85.45 | 84.13 |
| 15 | 0 ≤ D < 1 | 97.19 | 94.38 |
| 15 | 1 ≤ D < 3 | 95.00 | 87.93 |
| 15 | 3 ≤ D < 5 | 88.35 | 81.57 |

Table 3

Average correct rates of orientation detection results (%)

| M | A (degrees) | Average SNR 19.16 dB |
|---|---|---|
| 1 | 0 ≤ A < 4 | 58.43 |
| 1 | 4 ≤ A < 8 | 48.49 |
| 1 | 8 ≤ A < 12 | 45.78 |
| 1 | 12 ≤ A < 15 | 44.28 |
| 11 | 0 ≤ A < 4 | 99.70 |
| 11 | 4 ≤ A < 8 | 88.55 |
| 11 | 8 ≤ A < 12 | 84.04 |
| 11 | 12 ≤ A < 15 | 81.33 |


The average SNRs used here are the lowest SNR conditions in Tables 2 and 3, respectively. The mixture number in Fig. 11a and b is 15, and the mixture number in Fig. 11c and d is 11. The location “A” denotes the 113th location and the location “B” represents the 220th location. In the case of 0 ≤ D < 1 cm, the averages of (26) (the average a posteriori location probabilities) measured at the correct location indices (l = 113 and 220) are much higher than those at other location indices, as shown in Fig. 11a and b. However, since the sound field feature varies with the robot’s location and orientation, the phase difference and magnitude ratio distributions become less similar as the robot moves away from the pre-recorded location or orientation. Therefore, in Fig. 11a and b, the difference between the averages of (26) measured at the correct location indices and at other location indices becomes less obvious as D increases, and the chance of a detection error rises. This tendency explains why the average correct rates of location detection in Table 2 degrade as the distance between the actual and the pre-recorded locations increases. Although the averages of (26) measured at the correct location indices decrease as D increases, they remain higher than those measured at other location indices; as a result, the correct rates listed in Table 2 remain above 80% when 3 ≤ D < 5 cm. The same phenomenon appears in the orientation detection experiment. Fig. 11c and d depict the averages of (27) (the average a posteriori orientation probabilities) with the correct orientations of 0° for Fig. 11c and 270° for Fig. 11d. The average of (27) measured at the correct orientation indices drops as A increases in both line-of-sight and non-line-of-sight cases, and so do the average correct rates of orientation detection in Table 3. These experimental results show that using GMMs to model the sound field features is a feasible method for detecting a robot’s location and orientation.

Fig. 11. The average of the measured a posteriori probabilities. (a) The average a posteriori location probabilities at the location “A”. (b) The average a posteriori location probabilities at the location “B”. (c) The average a posteriori orientation probabilities at the location “A”. (d) The average a posteriori orientation probabilities at the location “B”.

6. Conclusion

A novel method for detecting a robot’s location and orientation based on sound field feature matching is proposed. The proposed method treats the phase difference and magnitude ratio distributions between the microphones as distinct sound field features, and models them with GMMs to detect a robot’s location and orientation. Since the proposed method makes no assumptions about the spatial relationship between sound sources and microphones, it can be applied to both line-of-sight and non-line-of-sight cases. Moreover, the modeled sound field features are content independent, so the content of the sound can be designed arbitrarily. A system architecture is also proposed to provide robustness to environmental noise. The proposed method is suitable for integration with other robot location or orientation detection algorithms based on different sensors. It can provide initial conditions that reduce their search effort, or cover locations that other localization methods cannot detect, so that pose and global location detection becomes more robust, more accurate and faster.

Acknowledgements

This work was supported by the National Science Council of the ROC under Grant No. NSC94-2218-E009064 and by the MOE ATU Program under the account number 95W803E.

References

Argamon-Engelson, S., 1998. Using image signatures for place recognition. Pattern Recog. Lett. 19 (10), 941–951.

Borenstein, J., Everett, H.R., Feng, L., 1996. Navigating Mobile Robots: Sensors and Techniques. A.K. Peters, Wellesley, MA.

Brandstein, M.S., Silverman, H.F., 1997. A robust method for speech signal time-delay estimation in reverberant rooms. IEEE Int. Conf. Acoust. Speech Signal Process. 1, 375–378.

Brandstein, M., Ward, D., 2001. Microphone Arrays: Signal Processing Techniques and Applications. Springer-Verlag, New York, p. 26 (Chapter 2).

Carter, G.C., Nuttall, A.H., Cable, P.G., 1973. The smoothed coherence transform. IEEE Sig. Process. Lett. 61, 1497–1498.

Elkan, C., 2003. Using the triangle inequality to accelerate k-means. Proc. 20th Int. Conf. Machine Learning, 147–153.

Georgiev, A., Allen, P.K., 2004. Localization methods for a mobile robot in urban environments. IEEE Trans. Robot. 21, 851–864.

Gutierrez-Osuna, R., Janet, J.A., Luo, R.C., 1998. Modeling of ultrasonic range sensors for localization of autonomous mobile robots. IEEE Trans. Ind. Electron. 45 (4), 654–662.

Knapp, C.H., Carter, G.C., 1976. The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 24, 320–327.

Ladd, A.M., Bekris, K.E., Rudys, A.P., Wallach, D.S., Kavraki, L.E., 2004. On the feasibility of using wireless ethernet for indoor localization. IEEE Trans. Robot. Automat. 20 (3), 555–559.

Larsson, U., Frosberg, J., Wernersson, A., 1996. Mobile robot localization: Integrating measurements from a time-of-flight laser. IEEE Trans. Ind. Electron. 43 (3), 422–431.

Lee, J.M., Son, K., Lee, M.C., Choi, J.W., Han, S.H., Lee, M.H., 2003. Localization of a mobile robot using the image of a moving robot. IEEE Trans. Ind. Electron. 50 (3), 612–619.

MacQueen, J.B., 1967. Some methods for classiﬁcation and analysis of multivariate observations. Proc. Fifth Berkeley Symp. Mathematical Statistics and Probability, 281–297.

McGillem, C.D., Rappaport, T.S., 1988. Infra-red location system for navigation of autonomous vehicles. IEEE Int. Conf. Robot. Automat., 1236–1238.

Nikas, C.L., Shao, M., 1995. Signal Processing with Alpha-Stable Distributions and Applications. Wiley, New York.

Ohya, I., Kosaka, A., Kak, A., 1998. Vision-based navigation by a mobile robot with obstacle avoidance using single-camera vision and ultrasonic sensing. IEEE Trans. Robot. Automat. 14 (6), 969–978.

Parker, S.P., 1988. Acoustic Source Book. McGraw-Hill, New York.

Reynolds, D.A., Rose, R.C., 1995. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3 (1), 72–83.

Tamai, Y., Kagami, S., Mizoguchi, H., Amemiya, Y., Nagashima, K., Takano, T., 2004a. Real-time two-dimensional sound source localization by 128-channel huge microphone array. IEEE Int. Workshop Robot Human Interact. Commun., 65–70.

Tamai, Y., Kagami, S., Mizoguchi, H., Amemiya, Y., Nagashima, K., Takano, T., 2004b. Sound spot generation by 128-channel surround speaker array. IEEE Int. Workshop Sensor Array Multichannel Sig. Process., 542–546.

Vlassis, N., Motomurat, Y., Hara, I., Asoh, H., 2001. Edge-based features from omnidirectional images for robot. Proc. IEEE Int. Conf. Robot. Automat., 1579–1584.

Wang, Q.H., Ivanov, T., Aarabi, P., 2004. Acoustic robot navigation using distributed microphone arrays. Inform. Fusion 5, 131–140.

Weiss, G., Wetzler, C., von Puttkamer, E., 1994. Keeping track of position and orientation of moving indoor systems by correlation of range-finder scans. Proc. IEEE/RSJ/GI Int. Conf. IROS, 595–601.

Xuan, G., Zhang, W., Chai, P., 2001. EM algorithms of Gaussian mixture model and hidden Markov model. IEEE Int. Conf. Image Process., 145–148.

Yamada, M., Itsuki, N., Kinouchi, Y., 2004. Adaptive directivity control of speaker array. Control Automat. Robot. Vis. Conf., 1143–1148.