
3.5 Simulation Results of Synthetic Images

In order to test the proposed system under different environments, a synthetic scene, as shown in Fig. 3-8(a), was established using Matlab to examine the classification rate against a known ground truth. The synthetic scene contains three walls with the same texture. In each trial of the experiment, the path of the robot is randomly generated to observe 10 consecutive views, and the texture of the walls is also changed randomly. Fig. 3-8(b) shows an example classification result. The maximum false detection rate of features is 0.96% and the average false detection rate of features is 0.84%. The experiment also shows that the maximum false detection rate of regions is 2.91% and the average false detection rate of regions is 2.08%.

Fig. 3-8: The synthetic room experiment. (a) The structure of the environment. (b) A classification result. The red zone indicates the region correctly labeled as ground, while the yellow region indicates false detection.

3.6 Summary

A monocular vision-based robot navigation system has been developed by combining IPT and ground region segmentation. Useful steps in image processing have been developed and integrated into the ground plane detection algorithm to enhance the robustness of the system.

To fulfill the requirement of real-time performance, we propose to combine a corner detection algorithm with the SURF feature descriptor, such that a scaled-down image can still give acceptable accuracy in distance measurement. Experimental results show that the system achieves an accuracy of 1% for distance measurement within a range of 6 meters. A local grid map can thus be built with a single camera for robot navigation. The proposed MonoVNS can be an alternative to popular laser range finders and RGB-D sensing devices such as Kinect for robot navigation applications. The merit of this approach is to provide both obstacle avoidance and path finding functions with a low-cost monocular camera.

Chapter 4.

Location Aware System Design

4.1 Introduction

In this chapter, the detailed implementation of the intelligent environment and the mobile robot is described, including the hardware and software modules. The implemented system aims to provide intruder detection and user monitoring services, and will be applied in the experiments reported in Chapter 5.

4.2 Intelligent Environment

The developed intelligent environment consists of three main sub-systems: a ZigBee-based intrusion detection system, a multimedia human-machine interface, and an OSA platform. The intrusion detection system can sense abnormal states around each sensor and report the location. The video server controls all the cameras in the house to acquire images.

The selected camera is able to take pictures of the elderly according to the location information provided by the location server. This information is sent to users through an Open Service Access (OSA) Gateway [48]. In addition to passive monitoring, the elderly can also ask for help via the speech recognition system; his/her location can then be found and an image can be sent to the user in real time.

4.2.1 Intrusion Detection Sensors

We designed several sensor modules for intrusion detection to verify the effectiveness of the proposed algorithm [49-50]. Fig. 4-1 shows such an intruder detection sensor module. It consists of an 8-bit microcontroller Atmega128L (Atmel Corp.), a ZigBee chip CC2420 (Texas Instruments), and an onboard MMA7260QT triaxial accelerometer (Freescale Semiconductor). Two other sensor modules, a pyro sensor and a microphone sensor, were also designed and integrated into the experiments. The details of these sensor modules are described below:

a) Triaxial accelerometer

Fig. 4-1: Intruder detection module. (a) Hardware of the module. (b) Signal of the pyro sensor. (c) Signal of the microphone.

The triaxial accelerometer is applied to detect vibrations. Individual magnitudes of the dynamic acceleration from X, Y and Z axes are first isolated. The summation of these magnitudes can thus be used to detect abnormal vibrations during an intrusion or collision.
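As a rough illustration of this rule, the sketch below (in Python, not the firmware actually running on the Atmega128L; the filter constant and threshold are hypothetical) isolates the dynamic part of each axis with a simple high-pass step, sums the magnitudes, and flags an abnormal vibration when the sum exceeds a threshold.

```python
# Sketch of the vibration-detection rule: isolate the dynamic (AC) part of each
# axis with a simple high-pass step, sum the magnitudes, and compare with a
# threshold.  ALPHA and THRESHOLD_G are illustrative values, not the ones used
# on the real sensor module.
ALPHA = 0.9          # low-pass coefficient used to track gravity per axis
THRESHOLD_G = 1.5    # alarm threshold on the summed dynamic magnitude (in g)

gravity = [0.0, 0.0, 0.0]   # running estimate of the static (gravity) component

def vibration_alarm(sample):
    """sample = (ax, ay, az) in g; returns True when an abnormal vibration is seen."""
    total = 0.0
    for i, a in enumerate(sample):
        gravity[i] = ALPHA * gravity[i] + (1.0 - ALPHA) * a   # static component
        total += abs(a - gravity[i])                          # dynamic magnitude
    return total > THRESHOLD_G
```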

b) Pyro sensor

We adopted off-the-shelf pyro-electric sensor components. The pyro sensor can detect the presence of humans in a certain area by a change in the voltage level of the sensor output. As Fig. 4-1(b) shows, if the voltage level is close to zero, the sensor will be triggered.

c) Microphone

A simple microphone was integrated into the ZigBee module for abnormal sound detection. The rising edge of the filtered microphone signal triggers an interrupt of the chip and can be used to detect loud sounds (Fig. 4-1(c)). The trigger level is set at +2.01 V. A loud sound can be caused by a falling object resulting from an intrusion or the breaking of a window.

If any sensor is triggered, a message including the current time, the ID of the triggered sensor, and the location of the node will be sent to the robot. Since the robot may be far from the location, an adaptive route selection algorithm is applied to send the message by forwarding data packets to the robot via selected sensor nodes [51].
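For illustration only, such an alert message could be packed as in the following sketch; the field layout, sizes, and function name are assumptions and not the actual packet format used by the ZigBee firmware.

```python
import struct
import time

# Hypothetical packing of an alert message: 32-bit UNIX time, 8-bit sensor ID,
# 16-bit node ID, and the node position as two 16-bit fixed-point coordinates.
# The real packet layout used by the WSN firmware is not specified here.
def pack_alert(sensor_id, node_id, x_m, y_m):
    return struct.pack("<IBHhh",
                       int(time.time()),      # current time
                       sensor_id,             # which sensor was triggered
                       node_id,               # which node reports the alert
                       int(x_m * 100),        # node position x, in centimeters
                       int(y_m * 100))        # node position y, in centimeters
```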

4.2.2 Visualization System

The coordinates estimated and provided by the ZigBee sensors are numerical values, which are difficult for human users to interpret. Therefore, in practice, a usable system needs to present the results in a more comprehensible way. The most common method nowadays is to use figures or images, such as a floor plan or map. However, in order to monitor the current state of the elders, the located elderly person is shown directly on the physical image obtained from a pan-tilt camera. To achieve this feature, one needs to apply camera calibration techniques. Using camera calibration, the projection between world and image coordinates can be obtained.

In order to increase the coverage area of the camera, the pan-tilt ability of the camera is also necessary. The system first calibrates the camera and builds the camera model, which represents the mapping from the world coordinates to the image plane and the pan-tilt movement of the camera.

The calibration process starts from the geometric camera calibration procedure, which aims to estimate the intrinsic and extrinsic parameters of a static pin-hole camera model. The implemented algorithm is the linear estimation method given in [38]. Assume that the target environment is observed by a camera and that the image positions p_i = (u_i, v_i)^T of n feature points with known world coordinate vectors P_i = (x_i, y_i, z_i)^T are matched in the image, i = 1, ..., n. The projection between world and image coordinates can then be modeled by

$$\lambda_i\,(u_i, v_i, 1)^T = \mathbf{M}\,(x_i, y_i, z_i, 1)^T, \qquad \mathbf{M} = \mathbf{K}\,[\mathbf{R}\;|\;\mathbf{t}],$$

where the extrinsic parameters consist of the rotation matrix R and the translation vector t, and K represents the intrinsic parameters, including skew and aspect ratio. Collecting the two linear constraints that each of the n point pairs imposes on M yields a homogeneous system Pm = 0, where m is the 12-vector formed by the entries of M. As long as n ≥ 6 (since P is a 2n-by-12 matrix), homogeneous linear least-squares can be exploited to compute the unit vector m that minimizes |Pm|^2 and thus estimate M. In this system, the n corresponding points are marked manually on the captured image scene with a GUI interface. To increase accuracy and avoid outliers, the Random Sample Consensus (RANSAC) algorithm is also implemented. This algorithm picks n points at random from a large set of sample points and estimates the matrix iteratively until the fitting error is below the expected criterion. In the experiment, the average error is around 2 pixels.

The second step is to estimate the influence brought by the pan-tilt movement. In the proposed system a pan-tilt camera model, as shown in Fig. 4, is used. A rotation around the x-axis, Rx, and around the y-axis, Ry, correspond to tilt and pan respectively. With this model, the projection between world and image coordinates can then be modified as

$$\lambda_i\,(u_i, v_i, 1)^T = \mathbf{K}\,\mathbf{R}_x(\theta_t)\,\mathbf{R}_y(\theta_p)\,[\mathbf{R}\;|\;\mathbf{t}]\,(x_i, y_i, z_i, 1)^T,$$

where θ_t and θ_p denote the tilt and pan angles. The model assumes that the center of rotation of the camera is fixed and coincides with the lens center of projection during operation, an assumption that many cameras may violate. However, in this application, the deviation of the center is negligible.
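Under the pan-tilt model above, the modified projection matrix can be assembled as in the following short sketch (angles in radians; function and variable names are illustrative).

```python
import numpy as np

# Illustrative; assumes the rotation is about the center of projection, as stated.
def pan_tilt_projection(K, R, t, pan, tilt):
    """Modified projection matrix K * Rx(tilt) * Ry(pan) * [R | t]."""
    cx, sx = np.cos(tilt), np.sin(tilt)
    cy, sy = np.cos(pan), np.sin(pan)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])   # tilt: rotation about x-axis
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # pan: rotation about y-axis
    Rt = np.hstack([R, t.reshape(3, 1)])                    # 3x4 extrinsic matrix
    return K @ Rx @ Ry @ Rt
```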

To validate the camera calibration result, eight positions in the experimental room were assigned to emulate the output of the positioning engine. As shown in Fig. 4-2, given the exact coordinate, the camera successfully turns toward and marks the person (in the bathroom) on the image. The localized person can thus be visualized and the picture can be transferred through the OSA platform. To further examine the result, a user tracking experiment combined with the positioning engine was also performed. The user carried the beacon, moved around the room, and stopped at one of the eight positions assigned in Fig. 4-2. The positioning engine estimates the location of the user and the camera moves toward the user using the given coordinate. The result shows that the user can always be captured by the camera, as shown in Fig. 4-3.

4.2.3 Speech Recognition System

Considering that there are generally few elderly members in a family, say around three persons, the speech recognition system operates in speaker-dependent mode. The command and user indices are pre-defined integers that represent the recognition results of the input speech signals.

Fig. 4-2: An example of the "target marking" of the visualization system. (a) The map of the experiment environment. (b) The graphic user interface and the view of the pan-tilt camera at coordinate (8,-1).

Fig. 4-3: Snapshots from the experimental result of the "target marking" of the visualization system. (a) The person was at coordinate (0,0). (b) The person was at coordinate (0,6). (c) The person was at coordinate (8,-1).

The recognition algorithm is based on the dynamic time warping (DTW) method [52]. The speech recognition procedure is divided into three parts.

The first part is speech signal sampling, which sets the gain control and sampling frequency when the analog signal is converted to a digital signal. The second part is the extraction of speech features. The endpoint detection process determines the location of the real speech signal by short-time energy detection and zero-crossing rate detection; the first 128 samples are used to determine the threshold values for both detectors. Once the real speech signal is determined, it is divided into frames of 16 ms length with an overlap between adjacent frames. Pre-emphasis is applied to boost the high-frequency part of the spectrum. Considering the continuity of the signal at the two sides of a frame, a Hamming window is applied to every frame; it preserves the signal in the middle of the frame and suppresses the signal at the two sides. The Mel-Frequency Cepstral Coefficients (MFCCs) are then extracted for every frame, so that a feature vector represents each frame. The third part is feature matching by the dynamic time warping method: the DTW recognizer compares the input feature vectors with the reference speech templates and takes the minimum matching error as the recognition result. The speech recognition system is implemented on a TI TMS320VC5402 DSK board.
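The matching step can be illustrated with the minimal DTW sketch below, using plain dynamic programming with a Euclidean local distance between MFCC frames; the actual fixed-point implementation on the TMS320VC5402 DSK is more elaborate.

```python
import numpy as np

# Illustrative DTW recognizer over MFCC sequences; not the DSP implementation.
def dtw_distance(ref, test):
    """Accumulated DTW distance between two MFCC sequences (frames x coeffs)."""
    n, m = len(ref), len(test)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - test[j - 1])   # local distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(templates, test):
    """Return the (command, user) index of the template with minimum matching error."""
    return min(templates, key=lambda idx: dtw_distance(templates[idx], test))
```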

4.2.4 OSA Platform

The OSA API is an open, standardized interface for applications to use the capabilities of a network without owning it or knowing its technology. It consists of a framework, which is in charge of access control and service discovery; and some Service Capability Features, which map to network capabilities. It is specified and standardized in the Joint API Group, with participation of 3GPP, ETSI and the Parlay Group: a single API for the whole developer community.

Parlay/OSA [48] enables operator and third-party applications to make use of network functionality through a set of open, standardized interfaces. This leads to several advantages such as shorter time to market (TTM) for applications, network independence, etc. The Parlay/OSA Gateway consists of several Service Capability Servers (SCS): functional entities that provide Parlay/OSA interfaces towards applications. Each SCS is seen by applications as one or more Service Capability Features (SCFs): abstractions of the functionality offered by the network, accessible via the Parlay/OSA API. Sometimes they are also called services. The Parlay/OSA SCFs are specified in terms of interface classes and their methods.

In this system, the OSA gateway is connected with the location and visualization systems in order to extend the service to the mobile network. Fig. 4-4 shows the components applied in this system. The OSA application server receives requests from users on mobile terminals and requests data from the OSGi gateway at home. The data can then be sent to the user through two different paths: by e-mail via the Internet, or by MMS via the cellular network.

4.3 Mobile Robot System

Fig. 4-4: OSA network services applied in the proposed system.

The proposed location aware system is integrated and tested with an on-demand intruder detection system which features a ZigBee WSN and a mobile robot. If any intruder or abnormal condition is detected, the state and location of the alert will be transmitted to the security robot and the monitoring center via the WSN. The robot can navigate autonomously to where the alarm takes place using the probability-based localization system. After the robot arrives at the spot, the onboard camera transmits real-time images to the user via both WiFi and 3G networks. The security guards and end-users can therefore easily determine the exact situation in real time.

4.3.1 Agent-based Control System

An agent-oriented robot control (ARC) architecture [53] was adopted to handle the multi-modal information fusion and motion control of the robot. ARC is a distributed hybrid system, which couples reactive and local deliberative agents. In this architecture, a router manages the internal communication of the robot, while the other components handle the real-time performance of the robot behaviors. Each individual agent is responsible for sensing, actuating, and executing a set of functions. A task manager integrates all the information and guides each agent with proper parameters.

4.3.2 Mobile Robot Localization

In the intelligent environment, several reference nodes were deployed beforehand to localize the robot using the localization algorithm in Chapter 2. Their positions were stored in a database on the robot's onboard computer. Two kinds of messages are defined for the ZigBee WSN: the measurement-demand and the signal-report. The measurement-demand message is used by a mobile node to request the RSSI measured from reference nodes. This message is broadcast to all the nodes which are able to receive it. Furthermore, a counter is included in this message to keep track of the various measurement demands. The signal-report message is used by the reference nodes to report the measured RSSI values. In summary, location estimation is performed using these messages and the following process:

1) A measurement-demand message is broadcasted to sensor nodes from a mobile node.

2) Each sensor node measures RSSI at the time it receives the packet. Subsequently, it transmits the RSSI, counter number and the mobile node ID to the robot.

3) The robot collects all the data and separates them by ID and counter. If three or more RSSI values with the same counter are observed, the location of the robot can be estimated and updated using the proposed algorithm, as sketched below. In the current implementation, the entire procedure can be accomplished within 600 ms.
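The collection step on the robot side can be sketched as follows. The message fields mirror the description above, while the transport layer and the position-estimation call are placeholders for the actual ZigBee stack and the algorithm of Chapter 2.

```python
from collections import defaultdict

MIN_REPORTS = 3   # at least three RSSI values are needed for an estimate

class RssiCollector:
    """Groups signal-report messages by (mobile node ID, counter) and triggers
    a position update once enough reports with the same counter have arrived."""

    def __init__(self, estimate_position):
        self.estimate_position = estimate_position   # algorithm of Chapter 2 (placeholder)
        self.reports = defaultdict(dict)             # (node_id, counter) -> {ref_node: rssi}

    def on_signal_report(self, node_id, counter, ref_node, rssi):
        key = (node_id, counter)
        self.reports[key][ref_node] = rssi
        if len(self.reports[key]) >= MIN_REPORTS:
            return self.estimate_position(self.reports.pop(key))
        return None
```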

4.3.3 Autonomous Navigation System

Autonomous navigation in a dynamically changing environment is essential for a robot patrolling from one place to another. In the proposed system, this task is achieved by a behavior-fusion approach adopted from [53, 54]. Fig. 4-5 illustrates the architecture of the navigation system using behavior fusion. We designed three primitive behaviors for autonomous navigation in an indoor environment: goal seeking, wall following, and obstacle avoidance. Goal seeking attempts to move toward the direction of the target. Wall following maintains the same distance from the nearest wall. Obstacle avoidance moves the robot in a direction away from any obstacles. Each of these behaviors outputs a set of desired wheel velocities.

The concept of pattern recognition was adopted to fuse the outputs of the navigation behaviors. The pattern recognition technique maps the environmental configuration, obtained from the ultrasonic and infrared sensors onboard the robot, to the fusion weights of the three navigation behaviors. The concept can be expressed as: "when the environmental configuration of the surroundings is similar, the fusion weights of the navigation behaviors should be similar." In this work, the Fuzzy Kohonen Clustering Network (FKCN) is applied to map the environmental patterns to fusion weights [54]. In this design, ten prototype patterns, defined by the quantized distances to objects around the robot, were set manually to represent typical environmental configurations for indoor navigation. Suitable fusion weights for each navigation behavior corresponding to these patterns were assigned in the rule table of the FKCN.

When the robot navigates through the environment, the sensors obtain the current range data around the robot. The FKCN then generates proper fusion weights for each navigation behavior corresponding to the immediate environmental sensory data pattern. As a result, the mobile robot can navigate to the desired location without colliding with any objects in the environment. In this application, the positions of both the robot and the goal are determined by the location aware system.

Fig. 4-5: System architecture of autonomous navigation using behavior fusion.
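To make the fusion step concrete, the following sketch derives FKCN-style fusion weights from the similarity between the current range pattern and the prototype patterns and then combines the wheel-velocity outputs of the three behaviors; the fuzzifier value, the membership formula, and the prototype and rule tables are illustrative placeholders rather than the actual design.

```python
import numpy as np

FUZZIFIER = 2.0   # m > 1, fuzzy c-means style membership (illustrative value)

def fusion_weights(pattern, prototypes, rule_weights):
    """Map the current range pattern to fusion weights for the three behaviors.

    prototypes:   (k, d) quantized range patterns (placeholder rule table inputs)
    rule_weights: (k, 3) behavior weights assigned to each prototype
    """
    d2 = np.sum((prototypes - pattern) ** 2, axis=1)
    if np.any(d2 == 0):                          # exact match with a prototype
        membership = (d2 == 0).astype(float)
    else:
        inv = d2 ** (-1.0 / (FUZZIFIER - 1.0))
        membership = inv / inv.sum()             # fuzzy membership to each prototype
    return membership @ rule_weights             # weights for the three behaviors

def fuse(behavior_outputs, weights):
    """Weighted combination of per-behavior (v_left, v_right) wheel velocities."""
    return weights @ np.asarray(behavior_outputs)
```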

4.3.4 Human Detection and Tracking

The current design of the location aware system is based on the received signal strength indicator (RSSI) of the ZigBee wireless sensor network. However, due to the limited accuracy of the RSSI-based localization method, the robot cannot rely merely on RF information; a second method is required as the robot approaches the user. In this implementation, a vision-based human detection and tracking system is integrated with the RSSI-based localization system to allow the robot to approach the user more precisely.

In the current design, the human detection system includes two schemes: body detection and face detection. Both methods are exploited to search for and track the user as the robot approaches the user, as shown in Fig. 4-6. If a user is detected, the location of the user can be estimated by combining the detection result with the localization method proposed in Chapter 2, so that the robot is able to track the user. In particular, face detection can ensure that the user is facing the robot. For a robot with an RGB-D camera, the body detection method is based on the implementation of OpenNI [55]. The body detection system can not only identify the body but also track the skeleton of the body from the depth image of the Kinect. The posture of the human can therefore be recognized, and the robot can adapt its pose to fit the user while delivering an object. The face detection method, on the other hand, is adopted from the functions in OpenCV [56]. Skin color segmentation is applied before the detection step to further increase the robustness.
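The face detection path can be sketched with standard OpenCV calls, as below: a Haar cascade applied after a simple YCrCb skin-color mask. The cascade file, color thresholds, and parameter values are illustrative and not necessarily those used in the actual system.

```python
import cv2
import numpy as np

# Illustrative YCrCb skin-color bounds; real thresholds would be tuned per camera.
SKIN_LO = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HI = np.array([255, 173, 127], dtype=np.uint8)

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr):
    """Skin-color segmentation followed by Haar-cascade face detection."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, SKIN_LO, SKIN_HI)                # skin-color mask
    masked = cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)  # keep skin regions only
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```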

4.4 Summary

This chapter describes the hardware and software modules of the location aware system developed and implemented in this thesis. The developed intelligent environment consists of three main sub-systems: a ZigBee-based intrusion detection system, a multimedia human-machine interface, and an OSA platform, which allow the user to be alerted and to monitor the house via the cameras or the robot from their mobile devices. The implemented mobile robot system is able to autonomously localize itself, navigate the environment, and detect humans during navigation to provide services.

Fig. 4-6: Examples of the human detection system: (a) human body detection, (b) face detection.

Chapter 5.

Experimental Results

5.1 Introduction

In this chapter, several experimental results using the proposed method and