4.2 The Proposed Positioning System
4.2.3 Position Estimation
4.2.3.1 View-angle Invariant Distance Estimation
To estimate the user position, the first step is to calculate the distance between the recognized shop sign and the user. Intuitively, the scale ratio (SR) of the size of the shop sign in the input video frame to that in the database image can be used to infer the distance. Fig. 4.2 shows how the shapes of shop signs change under various view-angles from the user. From this figure, we can see that the pixel displacement in the y-component remains constant even when the view-angle of the user changes. Thus, we define the SR for each matching pair by the following equation.
\[ r_{ij} = \left| \frac{y(f_i) - y(f_j)}{y(F_i) - y(F_j)} \right|, \tag{4.1} \]
where r_{ij} is the SR between the i-th and j-th features, y(f_i) is the y-component of the i-th recognized feature f_i, and F_i is the corresponding feature of f_i in the image database.
Figure 4.2: Sign shapes under different view-angles (figure labels: matching pair, feature, constant)
Figure 4.3: Relationship between the object distance and the projected width in the image plane
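The pairwise scale ratio of Eq. 4.1 can be illustrated with a short Python sketch. It assumes the matched features are given as two equally long lists of y-coordinates, one for the video frame and one for the database image. The names `pairwise_scale_ratios`, `frame_ys`, `db_ys`, and the `min_gap` guard against near-zero denominators are hypothetical and do not come from the original system.

```python
from itertools import combinations


def pairwise_scale_ratios(frame_ys, db_ys, min_gap=1e-6):
    """Return {(i, j): r_ij} with r_ij = |(y(f_i) - y(f_j)) / (y(F_i) - y(F_j))|.

    frame_ys[i] is the y-component of feature f_i in the video frame and
    db_ys[i] is the y-component of its match F_i in the database image.
    """
    if len(frame_ys) != len(db_ys):
        raise ValueError("frame_ys and db_ys must have the same length")
    ratios = {}
    for i, j in combinations(range(len(frame_ys)), 2):
        db_gap = db_ys[i] - db_ys[j]
        # Assumption: pairs with almost no vertical separation in the
        # database image are skipped to avoid dividing by (nearly) zero.
        if abs(db_gap) < min_gap:
            continue
        ratios[(i, j)] = abs((frame_ys[i] - frame_ys[j]) / db_gap)
    return ratios
```

For example, three features at y = 10, 30, 50 in the frame and y = 5, 15, 25 in the database image give a ratio of 2.0 for every pair.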
However, the shapes of shop signs under various view-angles are usually not perfectly transformed, which introduces errors into the SR estimation. Therefore, a spatial consistency filter is introduced to eliminate outliers with extreme SR values. The filter first calculates the average SR over all matching pairs. Next, the matching pairs whose SR deviates from this average by more than a given tolerance are removed. This process is repeated until the percentage of inliers exceeds a given threshold. The screening condition can be expressed as follows.
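\[ m_{ij}^{k} = \begin{cases} 1, & \text{if } |r_{ij} - R^{k-1}| \le p \, R^{k-1} \\ 0, & \text{otherwise} \end{cases} \tag{4.2} \]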
where p is the tolerable error rate of R^{k-1}, which is the refined ratio in the (k-1)-th iteration, and m_{ij}^{k} is a binary value that indicates whether the matching pair r_{ij} is an inlier in the k-th iteration.
r_{ij} is regarded as an inlier if it is close to R^{k-1}. Otherwise, it is not adopted in the next iteration and the corresponding m_{ij}^{k} is set to zero. After the inliers are acquired, R^{k}, the value of R, is updated as follows.
\[ R^{k} = \frac{\sum_{i,j}^{N} m_{ij}^{k} \, r_{ij}}{\sum_{i,j}^{N} m_{ij}^{k}}, \tag{4.3} \]
where N is the total number of matching pairs. R^{k} is iteratively refined until the following condition is reached.
\[ \frac{1}{N} \sum_{i,j}^{N} m_{ij}^{k} \ge 1 - a, \tag{4.4} \]
where a is the tolerable error rate of outliers.
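The spatial consistency filter can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original implementation: the inlier test is taken to be a relative tolerance p around the previous ratio, the stopping rule is taken to be an inlier fraction of at least 1 - a over all N pairs, and the initial ratio is the plain average of all SR values. The function name, the default values of `p` and `a`, and the `max_iter` guard are hypothetical.

```python
from statistics import mean


def refine_scale_ratio(ratios, p=0.5, a=0.3, max_iter=20):
    """Iteratively refine the scale ratio R from the pairwise ratios r_ij.

    Returns (R, inlier_flags), where inlier_flags[n] is True if the n-th
    ratio was an inlier in the last iteration.
    """
    if not ratios:
        raise ValueError("at least one scale ratio is required")
    n = len(ratios)
    refined = mean(ratios)          # assumed starting value: plain average
    inliers = [True] * n
    for _ in range(max_iter):
        # Assumed inlier test: r_ij within a relative tolerance p of the
        # ratio from the previous iteration.
        new_inliers = [abs(r - refined) <= p * refined for r in ratios]
        kept = [r for r, flag in zip(ratios, new_inliers) if flag]
        if not kept:                # tolerance too tight: keep last estimate
            break
        refined = mean(kept)        # update R as the average over inliers
        unchanged = new_inliers == inliers
        inliers = new_inliers
        # Assumed stopping rule: enough inliers, or no further change.
        if sum(inliers) / n >= 1.0 - a or unchanged:
            break
    return refined, inliers
```

With the ratios 2.0, 2.1, 1.9, 2.0, and 6.0, the initial average is 2.8, the value 6.0 is rejected in the first pass, and the refined ratio becomes approximately 2.0. Because the first estimate is the plain average, a tolerance that is too small can reject every pair when a strong outlier is present, which is why the sketch falls back to the last estimate in that case.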
Eq. 4.2 eliminates extreme SR values. R^{k} is iteratively refined by performing the operations in Eq. 4.2 and Eq. 4.3 until the condition in Eq. 4.4 is reached. Next, the SR values of the recognized shop signs are calculated, and the corresponding distance from the user can be estimated by Eq. 4.5. The geometrical relationship is shown in Fig. 4.3,
\[ d_1 = \frac{w_2}{w_1} \times d_2 = R \times d_2, \tag{4.5} \]
where d1 is the distance between the user and shop signs and d2 is the constant value stored in the image database. w is the pixel width of shop signs appearing in the video frame.
Given d2 and R, d1 can be obtained and passed to the next stage to reconstruct the user location.
4.2.3.2 Location Reconstruction
In order to estimate the current user location on the GPS map, the estimated distance between the shop signs and the user should be combined with the user orientation. Fig. 4.4 illustrates the definitions of the user orientation, the pattern angle θpi, and the view angle θh. θh can be acquired from the digital compass. θpi, the pattern angle of the i-th recognized shop sign, can be calculated from the pixel position of the sign. Combining the user orientation, the estimated distance from the shop signs, and the GPS information, the user location in the global map can be estimated by Eq. 4.6.
x =
where Np is the number of recognized signs, and xpi and ypi are the coordinates of the i-th recognized sign, which are stored in the GPS database. If Np > 1, we calculate the mean of these reconstructed locations.
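The reconstruction step can be illustrated with the following Python sketch. The angle and coordinate conventions are assumptions made only for this example, since the text does not fix them: positions are local planar coordinates in meters (x pointing east, y pointing north), the view angle is a compass heading measured clockwise from north, and the pattern angle is added to the heading to obtain the bearing from the user to the sign. The function and parameter names are hypothetical.

```python
import math


def reconstruct_location(signs, heading_deg):
    """Average the user positions implied by each recognized sign.

    signs: list of (x_p, y_p, distance, pattern_angle_deg) tuples, where
           (x_p, y_p) is the stored sign position and distance is the
           estimated user-to-sign distance.
    heading_deg: view angle from the digital compass (assumed clockwise
                 from north).
    """
    if not signs:
        raise ValueError("at least one recognized sign is required")
    xs, ys = [], []
    for x_p, y_p, distance, pattern_deg in signs:
        # Assumed convention: bearing from user to sign = heading + pattern angle.
        bearing = math.radians(heading_deg + pattern_deg)
        # Step back from the sign along that bearing to reach the user.
        xs.append(x_p - distance * math.sin(bearing))
        ys.append(y_p - distance * math.cos(bearing))
    # Mean of the per-sign estimates, as described for Np > 1.
    return sum(xs) / len(xs), sum(ys) / len(ys)
```

For instance, a sign 10 m due north and another 10 m due east of the origin, observed with a heading of 0 degrees and pattern angles of 0 and 90 degrees, both place the user at approximately (0, 0).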
4.2.3.3 Path Refinement
Figure 4.4: Definition of the user orientation, pattern angle, and view angle (figure label: camera center)
Although the user location can be estimated in the previous stage, a small error in R still remains. Therefore, a Kalman filter is adopted to refine the path. It combines all estimations with the previous state to predict the most probable position. In our work, the previous motion trajectory, the GPS raw data, and the estimated distance from the shop signs are combined to predict an accurate location of the user. The following equations are the main mathematical operations in the Kalman filter.
\[ \hat{x}_t = A \hat{x}_{t-1} + B u_{t-1}, \tag{4.7} \]
\[ x_t = \hat{x}_t + K_t (z_t - H \hat{x}_t), \tag{4.8} \]
where \hat{x}_t is the predicted user location at time step t according to the previous motion, u_{t-1} is the motion trajectory of the user at time step t-1, and x_{t-1} is the previous decision of the user location. z_t is the measurement, consisting of the GPS raw data and the reconstructed location presented in Section 4.2.3.2. K_t is the Kalman gain, which serves as the confidence ratio for the predicted location.
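The prediction and update steps of Eq. 4.7 and Eq. 4.8 can be sketched for a single coordinate as follows. This is a simplified scalar sketch, assuming A = B = H = 1 and applying the filter to each axis independently. The gain is computed with the standard Kalman covariance recursion, which the text does not spell out, and the variance parameters `process_var` and `meas_var` are hypothetical tuning values.

```python
def kalman_step(x_prev, p_prev, u_prev, z, process_var, meas_var):
    """One scalar Kalman iteration for a single position coordinate.

    x_prev: previous location estimate
    p_prev: variance of the previous estimate
    u_prev: displacement from the previous motion trajectory
    z:      current measurement (for example GPS or the reconstructed location)
    Returns (x_new, p_new, gain).
    """
    # Prediction, Eq. 4.7 with A = B = 1 (assumption).
    x_pred = x_prev + u_prev
    p_pred = p_prev + process_var
    # Standard Kalman gain with H = 1 (assumption).
    gain = p_pred / (p_pred + meas_var)
    # Update, Eq. 4.8: correct the prediction with the weighted innovation.
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain
```

With a previous estimate of 0, unit variance, a motion of 1, a measurement of 2, no process noise, and unit measurement variance, the gain is 0.5 and the refined estimate is 1.5, halfway between the prediction and the measurement. A larger `meas_var` lowers the gain, so the filter trusts the motion-based prediction more than the measurement.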
4.3 Experimental Results
In the proposed work, real sequences captured by a front-mounted CMOS camera with 1920x1080 video resolution are used. First, a user walks along several streets, and the walking paths are recorded as the ground truth. While GPS outputs positioning information, the proposed system is tested simultaneously with that GPS information. Two experiments are conducted to compare the positioning accuracy of GPS and the proposed system.
Figure 4.5: Accuracy comparison between GPS and the proposed method.
The first experiment compares the location estimation error of GPS and the proposed system. The experimental result is shown in Fig. 4.5. From this figure, we can see that the estimation error of the proposed system is the same as that of GPS in the first 16 frames. This is reasonable because few recognized shop signs are available for location reconstruction at the beginning. As the number of shop signs appearing in subsequent frames increases, the location errors of the proposed system fall below 2 meters. Compared with the location information provided by GPS, which has an error of 10 meters on average, the proposed system significantly improves positioning accuracy.
Figure 4.6: Comparison of the error rate in SR estimation
The accuracy of SR is important for distance estimation and location reconstruction. In the second experiment, we measure the improvement in SR estimation achieved by the view-angle invariant distance estimation stage. Fig. 4.6 shows the comparison result. The y-axis represents the estimation error as a percentage of the real SR value. For example, if the estimated SR is 4.9 and the real value is 5, the error percentage is 0.1/5 = 2%. The x-axis represents the view-angle of the detected signs. A frontal view of the sign is regarded as zero degrees. From Fig. 4.6, we can see that without the proposed spatial consistency filter, the error percentage increases dramatically once the view-angle exceeds 40 degrees. In contrast, with the proposed spatial consistency filter, the error percentage is 0.8% on average even when the view-angle increases. This result shows that the view-angle invariant distance estimation method supports accurate SR and correct distance estimation. Fig. 4.7 plots the estimated user location on the GPS map and shows the corresponding street view captured by the camera. Based on the spatial relationship between the user and the recognized shop signs, the proposed system supports accurate positioning in practice.
Fig. 4.8 depicts the paths estimated by GPS and the proposed system. The path estimated by the proposed system is clearly closer to the ground truth.
Figure 4.7: The information on the GPS map and the corresponding recognized shop signs in the video frame (figure labels: recognized signs, user)
Figure 4.8: Path estimation by different methods
4.4 Conclusion
We propose an accurate and robust positioning system based on street view recognition. By combining dynamic street view recognition with a large GPS map and shop information, the proposed system provides accurate user locations. The view-angle invariant distance estimation and path refinement mechanisms are proposed to achieve high-precision positioning. Compared with the 20-meter error of the localization results provided by GPS, the error of the proposed system is 0.97 m on average. Experimental results demonstrate that our system is reliable and feasible. In addition, the proposed system can be applied to many innovative navigation systems.
Figure 4.9: System flow diagram
Chapter 5
Hardware Architecture
5.1 Introduction
In Chapters 2, 3, and 4, we introduced the algorithms of the visually-impaired aid system. These algorithms are robust and reliable; however, their computation time in software is too long for real-world use. Therefore, in this chapter, we introduce the hardware implementation of these algorithms. The critical parts of these algorithms, segmentation and morphology, are selected for discussion and design. Fig. 5.1 shows the profiling results on a 2.93 GHz dual-core CPU. In terms of their share of the computational complexity, segmentation and morphology are clearly the critical path in our system. Sec. 5.2 introduces the design challenges of these two algorithms. Sec. 5.3 illustrates how segmentation and morphology are accomplished in hardware. Sec. 5.4 concludes the chapter and shows the chip implementation.