
Chapter 5 Error analysis of 3D line reconstruction from intersection of two triangles

5.3 Error analysis

27 The corresponding transformations, HL (for ILX → RLX, X = S, E) and HR, are found in advance by using the positions of four reference points marked on the floor (not shown in Fig. 1) and their positions in the stereo images.

28 Influences from more complex situations, e.g., when the pointer’s color is close to the background, are not considered in this chapter, since highly dynamic segmentation errors of the pointer due to pointer-background interaction may be so large that the error analysis of the RPPs would make no sense. (Similarly, the extraction of reference points in the system calibration stage is also assumed to be free of such complex situations.) In general, more involved segmentation schemes will be needed to resolve such a problem, which is beyond the scope of this chapter. One way of resolving such a problem is to employ special hardware in the system setup, e.g., attaching blinking LEDs [24] to the pointer.


In the following, some error analysis methods will be developed to investigate the influence of dynamic errors on the RPPs of the proposed systems. The goal is to identify the range of error in the position of the RPP correctly and efficiently.

For the pointing system shown in Fig. 5.1, πP, CL and CR are fixed in position; therefore, the RPP is determined by the reconstructed planes πL and πR, which are in turn determined by the pointer endpoints ILS, ILE, IRS and IRE. The extraction of these points from the stereo images is often affected by the imaging noises mentioned above. As a result, the obtained pointer endpoints are not stable, and neither is the calculated RPP. Thus, the deviation of the RPPs due to the variations of ILS, ILE, IRS and IRE will be the main focus of this chapter.

For a preliminary examination of the above deviation, simulated noises of unit magnitude are added to these pointer endpoints. In particular, 24 simulated points placed evenly (with 15° spacing) along "noise" circles with a radius of 1 pixel are generated for ILS = (188,158), ILE = (247,189), IRS = (159,142), and IRE = (226,155) in Fig. 5.1, as shown in Fig. 5.2. In each run of the simulation, four points, one selected from each of the above four circles, serve as endpoints of the pointer in the stereo images to reconstruct an RPP using the aforementioned homographic transformations. Fig. 5.3(a) shows all 24⁴ RPPs (in red) computed from the 24 × 4 simulated points, with their convex hull (the range of reconstruction errors) shown in Fig. 5.3(b).
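The simulation setup above can be sketched as follows. The helper name `noise_circle` is illustrative; the endpoint coordinates are taken from the text, while the reconstruction step itself (one RPP per combination of perturbed endpoints) is only indicated by the size of the Cartesian product, since the homographic transformations are not reproduced here.

```python
import itertools
import math

def noise_circle(center, radius=1.0, n=24):
    """Sample n points evenly spaced on a noise circle around an endpoint."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]

# Endpoint coordinates from the text (ILS, ILE, IRS, IRE in Fig. 5.1).
endpoints = [(188, 158), (247, 189), (159, 142), (226, 155)]
circles = [noise_circle(p) for p in endpoints]

# One perturbed endpoint is drawn from each circle, so the simulation
# reconstructs one RPP per element of the Cartesian product: 24^4 runs.
combos = list(itertools.product(*circles))
print(len(combos))  # 331776 == 24**4
```

In an actual run, each element of `combos` would be fed through the homographic transformations to reconstruct one RPP, and the convex hull of all RPPs would give the error range of Fig. 5.3(b).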

In general, it is desirable to have such a range calculated more efficiently, e.g., with fewer simulated endpoints of the pointer. However, a direct reduction in the data size may underestimate the range of reconstruction errors. For example, the blue region in Fig. 5.4 is obtained by using only 4 points (with 90° spacing) from each noise circle shown in Fig. 5.2.

From a close examination of the relationship between the above reconstruction errors and the locations of the four simulated endpoints of the pointer obtained from Fig. 5.2, it is found that the error range is mainly due to (two) extreme values in the slopes of ILS′ILE′ (and IRS′IRE′).

Based on such an observation, we then try to use only the contacts of the internal common tangents (CICTs) of the two noise circles in each of the stereo images (see Fig. 5.2 for such tangents). The range of reconstruction errors thus obtained is also shown in Fig. 5.4 (as four points connected by black line segments). One can see that such results almost coincide with those obtained using all (24) points from each noise circle of simulated points shown in Fig. 5.2. A closer examination can be carried out by comparing the coordinates of the vertices shown in Fig. 5.4, as listed in Table 5.1. Thus, estimation of the error range from a larger number of simulated points (24 × 4) can be replaced by using only the 8 (2 × 4) CICTs with negligible change in the estimation, and with the number of reconstructed RPPs reduced greatly (from 24⁴ to 2⁴).
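For two noise circles of equal radius, the CICTs can be computed in closed form from the internal homothety center, which is simply the midpoint of the two circle centers. The sketch below, with the hypothetical helper `cict_points`, illustrates this; it assumes the two circles do not overlap (the endpoint separation is much larger than the 1-pixel radius, as is the case here).

```python
import math

def cict_points(c1, c2, r=1.0):
    """Contact points of the internal common tangents (CICTs) of two
    circles of equal radius r centred at c1 and c2. Returns two contact
    points on each circle (four in total)."""
    m = ((c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2)  # internal homothety centre
    pts = []
    for c in (c1, c2):
        base = math.atan2(m[1] - c[1], m[0] - c[0])  # direction: centre -> m
        off = math.acos(r / math.dist(c, m))         # tangency offset angle
        for s in (+1, -1):
            a = base + s * off
            pts.append((c[0] + r * math.cos(a), c[1] + r * math.sin(a)))
    return pts

# CICT contacts for the left-image noise circles around ILS and ILE.
pts = cict_points((188, 158), (247, 189))
```

Each returned point T satisfies |T − c| = r and (T − c) ⊥ (m − T), i.e., the line through T and the homothety center is tangent to the circle, which is the defining property of an internal common tangent.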



Fig. 5.3. (a) RPPs for simulated points shown in Fig. 5.2. (b) Range of reconstruction errors (with the error-free reconstruction shown by an "x").

Fig. 5.4. Error range shown in Fig. 5.3(b) (red), a similar range obtained by using only 4 points (with 90° spacing) from each noise circle in Fig. 5.2 (blue), and the error range based on internal common tangents (black, see text).

Table 5.1. Coordinates of the vertices shown in Fig. 5.4.

          xmax       xmin       ymax       ymin
(blue)    535.9857   455.9600   395.6483   330.5393
(red)     539.2251   453.4022   397.8248   328.1907
(black)   539.2422   453.2395   397.8823   328.1027

The above observations regarding the CICTs of two noise circles, i.e., that the RPP of a pointer reconstructed from stereo images is displaced much more when the pointer is rotated than when it is translated with a comparable amount of movement of its endpoints, can be explained with a simple example, as discussed in the following. Consider a pointing system with a geometric configuration similar to that shown in Fig. 5.1, and assume the pointer is initially perpendicular to the projection plane.

When the pointer is translated by k in a direction parallel to the projection plane, the RPP will be translated by k too. However, if we fix the endpoint of the pointer which is far away from the projection plane as the center of rotation and rotate the pointer by a small angle θ such that the other end of the pointer is displaced by k = θr, with r being the length of the pointer, the RPP will have a displacement of k′ > θd, with d being the distance from the pointer to the projection plane. One can see that if d ≫ r, which is often the case in various pointing situations, the amount of movement of the RPP with a rotated pointer is much larger than that due to a translated pointer, i.e., k′ ≫ k. Such an example reasonably explains why the estimated maximal error range (EMER), efficiently obtained using CICTs, can represent the real error range with high accuracy, as the CICTs give the limits of the rotation angle of the pointer with its endpoints confined to two noise circles in each of the stereo images.
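The k′ ≫ k argument can be checked numerically with a small-angle approximation. The values of r, d and k below are illustrative only and are not taken from the thesis setup:

```python
# Illustrative numbers: a 30 cm pointer held 3 m from the projection
# plane, with the endpoint perturbed by the same amount k in both cases.
r = 0.3   # pointer length (m)
d = 3.0   # distance from pointer to projection plane (m)
k = 0.01  # endpoint displacement (m)

# Translation parallel to the plane moves the RPP by exactly k.
k_translate = k

# Rotating about the far endpoint by theta = k / r moves the RPP by
# roughly theta * d (small-angle approximation).
theta = k / r
k_rotate = theta * d

print(k_rotate / k_translate)  # d / r: rotation is 10x more sensitive here
```

The ratio k′/k grows as d/r, which is why the extreme rotations captured by the CICTs dominate the error range whenever the user stands far from the projection plane.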

The use of unit circles for noise only provides a baseline for error estimation, which can in fact be adapted for specific applications. For pointing systems based on the estimation of the two ends of an elongated pointer, the idea of CICTs can easily be generalized and applied to the spatial supports, regardless of their shapes29, of the error distributions of the two points to estimate the EMER of the pointing position. Such supports can be obtained for a static pointer in each view by observing its two ends for some time.

5.4 Experiments

In order to clearly verify the validity of the EMERs with respect to actual error distributions, we focus on the static pointing situation in the experiments, i.e., we fix the pointer in space and measure the locus of RPPs. Thus, additional sources of interference, e.g., multi-camera synchronization and/or motion blur of a moving pointer, can be avoided. The error analysis results obtained here can be applied in the future to highly dynamic pointing situations if these interferences can be well controlled or even eliminated, e.g., via better imaging hardware. We will first examine the proposed error estimation method by placing the pointer at a fixed position and pointing to a position on the projection plane. Then, pointing results obtained by selecting a pair of cameras for each RPP according to the EMERs are compared with those obtained by using all cameras.

Figs. 5.5(a) and (b) show an orange stick which is fixed in the workspace and is used in the experiment as a pointer. In Fig. 5.5(c), the purple quadrilateral shows the EMER obtained for a simulated 1-pixel error in point feature extraction, while the red dots are the actual positions of RPPs found by the pointing system during a period of 30 seconds. One can see that the latter is well bounded by the former.

29 For example, error distributions can often be described by elliptical Gaussian blobs.



Fig. 5.5. (a) Left image. (b) Right image. (c) EMER and actual RPPs.


Fig. 5.6. (a) Layout of the synthesized room. (b) Pointing positions on the projection plane.

The actual errors and the estimated errors match well in their distributions, which are spatially highly directional. In particular, the locations of the RPPs are distributed in a fairly narrow region, with its elongated direction well predicted by the EMER.

To further investigate the relationship between EMERs and different pointing positions, and with respect to different camera pairs, a synthesized room of size 500cm by 500cm is built. Fig. 5.6(a) shows the top view of the layout of the room. Cameras C1, C2, C3 and C4, marked as crosses, are mounted on a ceiling of height 250cm while pointing toward the center (250, 0, 250) of the room. The red line represents the pointer and the green line corresponds to the projection plane, on which a user will point to nine fixed pointing positions P1, P2, …, P9, as shown in Fig. 5.6(b).


Table 5.2. Suggestion of camera pairs.

Pointing positions    Camera pairs for smallest error range
P1                    C3&C4

Due to the left-right symmetry of the camera configuration with respect to the pointer (which is located 100cm above ground level), highly symmetrical patterns of EMERs can be observed.

The above EMERs can serve as good references for a user to select camera pairs that will achieve the highest stability in the pointing process. Table 5.2 shows such suggestions of camera pairs for each of the nine pointing positions30, which correspond to the minimum EMER areas.

On the other hand, huge EMERs in Fig. 5.7 also indicate inappropriate camera pairs, e.g., C1&C3 in Fig. 5.7(c) and C2&C4 in Fig. 5.7(d), that may result in highly unstable pointing and should be avoided. One such EMER of C1&C3 occurs while the pointer is pointed toward P2. The problem is due to the very short pointer extracted in one of the pair of images (see Figs. 5.8(a) and (b)), which is highly sensitive to image noise and may cause huge reconstruction errors. A similar problem occurs when the pointer is pointed toward P8 (see Figs. 5.8(c) and (d)). Note that some of the EMERs shown in Fig. 5.7 are highly directional. Thus, suggestions other than those listed in Table 5.2 are possible if the pointing accuracy requirements of a particular application are not isotropic.

Fig. 5.9 shows similar experimental results obtained by moving the pointer left by 150cm. The EMERs shown in Fig. 5.9 are not symmetrical due to the lack of symmetry in the geometry of the system configuration. However, the huge EMER shown in Fig. 5.9(a) does not correspond to a very short pointer in the image, as shown in Fig. 5.10; instead, it is due to the fact that the reconstructed planes πR and πL are almost parallel to each other.

30 These positions are mainly used to show that if arbitrary camera pairs are adopted for different locations on the projection plane, the resultant RPPs may be too unstable to be useful. For other locations, the trend of RPP stability may be estimated via interpolation, which is omitted for brevity. On the other hand, since the proposed CICT-based error analysis is extremely efficient, the EMER, as well as the preferred camera pair, may be estimated on the fly, as the pointer ends are extracted, for arbitrary RPP and user (and pointer) locations. The above arguments also apply to the next set of experiments, which use selected (200) pointer locations to show that using unit circles is as good as using more precise (often smaller) circles to simulate noise in terms of helping the user avoid the camera pair(s) with the worst stability performance.


Table 5.3. Pointing errors of the two methods for the pointer placed at (250, 100, 350).

While the goal of the proposed error analysis is to identify the camera pair that will result in the best pointing performance in terms of pointing stability, the underlying assumption is that when highly unstable RPPs are reconstructed from data obtained using all cameras, the problem can be alleviated by not using inappropriate cameras (or camera pairs) if possible. For example, if more than two cameras are used for the pointing system shown in Fig. 5.1, the proposed approach will choose two cameras to find the RPP, while a least square solution of the RPP can be found by using all cameras, as in [31].31 To verify the above assumption, additional experiments are conducted for the simulation environment described in Fig. 5.6, with additive noise.
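As a sketch of the all-camera baseline, the least-squares RPP can be taken as the point on the projection plane minimizing the sum of squared distances to the projected pointing lines, one line per camera. The helper `least_squares_rpp` below is an illustrative implementation under that interpretation, not code from [31]:

```python
import numpy as np

def least_squares_rpp(points, directions):
    """Least-squares intersection of 2D lines on the projection plane.
    Each line i passes through points[i] with direction directions[i];
    the returned point minimizes the sum of squared perpendicular
    distances to all lines (normal-equations form)."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, v in zip(np.asarray(points, float), np.asarray(directions, float)):
        v = v / np.linalg.norm(v)
        n = np.array([-v[1], v[0]])   # unit normal to the line
        A += np.outer(n, n)           # accumulate normal equations
        b += n * np.dot(n, p)
    return np.linalg.solve(A, b)

# Two lines crossing at (2, 3): the least-squares solution is their
# exact intersection.
x = least_squares_rpp([(0, 3), (2, 0)], [(1, 0), (0, 1)])
```

With exactly two cameras this reduces to the plain intersection of the two projected lines, which is consistent with footnote 31: the two approaches coincide in the two-camera case.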

Table 5.3 shows pointing errors generated by (i) the proposed method, which selects a camera pair for each pointing position (from P1 to P9) according to Table 5.2,32 and (ii) the least square approach discussed in [31], which uses all cameras.33 One can see that similar pointing accuracy (within 0.76cm) can be achieved by both (i) and (ii) for all pointing positions, except for P3 and P9, which correspond to the largest pointing errors on average for both methods. Intuitively, one would expect the most unstable pointing results to be generated for these two points, as shown in Fig. 5.11(a), since they correspond to the smallest angle between the pointer and the projection plane. Note that up to 47% reduction (from 24.74cm to 12.90cm) in mean pointing error can be

31 For [31], the pointing direction is defined by the hand-head line, and the RPP is obtained as the least square solution of the intersection of projections of this line on the projection plane from all cameras. If there are only two cameras, as shown in Fig. 1, the two approaches will generate identical RPP.

32 For P5 (P6), C2&C3 (C2&C4) are selected.

33 To ensure a fair comparison, the two end points of the pointer adopted in our system for error analysis are used to define the pointing line in each camera view for both (i) and (ii).



Fig. 5.7. Estimated maximal error ranges for different camera pairs: (a) C1&C2. (b) C2&C3. (c) C1&C3. (d) C2&C4. (e) C1&C4. (f) C3&C4.

Table 5.4. Pointing errors of the two methods for the pointer placed at (100, 100, 350).

Pointing     Mean error (standard deviation), cm
position     Our method       [31]
P1           2.12 (1.11)      3.11 (1.14)
P2           4.28 (2.80)      4.84 (3.04)
P3           9.43 (6.82)     11.39 (7.28)
P4           2.67 (1.82)      4.11 (2.05)
P5           3.40 (1.71)      6.63 (3.38)
P6           5.99 (3.04)     14.40 (8.64)
P7           7.02 (5.10)     14.86 (6.97)
P8           7.23 (4.75)     11.51 (5.46)
P9          10.87 (5.07)     19.01 (13.72)



Fig. 5.8. (a) Image captured by C1 when the pointer is pointing toward P2. (b) Image captured by C3 when the pointer is pointing toward P2. (c) Image captured by C2 when the pointer is pointing toward P8. (d) Image captured by C4 when the pointer is pointing toward P8.

achieved with the proposed camera selection scheme for these worst case scenarios.

Additional observations can be made for more general system configurations wherein the pointer is moved left by 150cm from the position specified above, as shown in Table 5.4. Unlike the nearly symmetric pattern shown in Fig. 5.11(a), the corresponding distributions of the RPPs shown in Fig. 5.11(b) are not symmetric since the camera locations are no longer symmetric with respect to the pointer position. Again, more than 40% reduction (from 19.01cm to 10.87cm) in mean pointing error can be achieved for the worst case situation with the proposed approach compared with the least square one. The above results suggest that the camera selection scheme based on the efficient error analysis proposed in this chapter can indeed improve pointing accuracy and stability.

5.5 Summary

In this chapter, a simple and real-time pointing system is implemented so that the pointing error can be examined closely. A pointer with a bright color is used in the pointing process to reduce the complexity of extracting its direction in an image, and error ranges in the pointing position are estimated using synthetic image noise. To greatly increase the efficiency of the estimation, a fast analysis method is developed which utilizes only an extremely limited subset of the noise data. With the help of such analysis, suitable operation positions may be suggested to a user of similar



Fig. 5.9. Estimated maximal error ranges for the pointer moved left 150cm for different camera pairs: (a) C1&C2. (b) C2&C3. (c) C1&C3. (d) C2&C4. (e) C1&C4. (f) C3&C4.

pointing systems if the pointer can be used in different locations in a 3D workspace. Moreover, in a multi-camera environment, the overall pointing operation can achieve the smallest error ranges, and the most stable pointing results, by automatically selecting a pair of cameras based on the proposed error analysis scheme. While the experiments in this chapter are conducted for static pointing situations, the proposed approach is applicable to more dynamic situations, e.g., applications wherein instructions are given via various trajectories of pointing positions. However, the error analysis method cannot be applied directly to our people localization system because the line correspondences between different views are unknown. Further investigation is needed to address this issue.



Fig. 5.10. (a) Image captured by C1 when the pointer is pointing toward P7. (b) Image captured by C2 when the pointer is pointing toward P7.


Fig. 5.11. Distribution of RPPs of the nine pointing positions for the pointer placed at (a) (250, 100, 350) and (b) (100, 100, 350).


Chapter 6

Conclusions and future work

In this thesis, three people localization methods using the vanishing points of vertical lines and binary foreground regions are proposed. The vanishing points are used to efficiently generate 2D line samples of foreground regions in multiple views. These 2D line samples can provide sufficient information (evidence) and enable the generation of 3D line samples for potential people locations. Additionally, a grid-based footstep analysis, followed by 3D line sampling, is proposed to find potential people locations. Thus, costly 3D reconstruction can be avoided while the computation speed is improved. Furthermore, to improve the efficiency of the first method, a refinement procedure for 3D line samples based on geometric rules is proposed to filter out invalid 3D line samples very efficiently. Therefore, people localization can be achieved in real time without using special hardware. However, many invalid 3D line samples are still reconstructed and processed further since the correspondence of line sample pairs between different views is unknown. To alleviate this problem, a line correspondence measure of 2D line samples is proposed and applied to filter out non-corresponding 2D line sample pairs before the 3D reconstruction stage. Because more than 90% of 2D line sample pairs can be filtered out, the computational efficiency is improved significantly. Finally, we propose an error analysis method for 3D line reconstruction to improve the accuracy of line-based pointing systems, which is expected to help improve the accuracy of the proposed people localization methods in the future.


Appendix A

The derivation of multiple homographic matrices for planes of different heights

Fig. A.1. Illustration of the calculation of a reference point on πr.

Homographic matrices are required for projecting 2D line samples onto the reference plane, as in Subsection 2.1.2. Also, the homographic matrices of multiple reference planes at different heights can be used to back-project points on a reference plane to different views for the computation of AFCR, as in Subsection 2.2.1.

In [23], [24], the authors use four vertical calibration pillars placed in the scene, with marker points at three known heights on each of them, to establish the homographies between image planes and reference planes at desired heights. Since a new reference point at any height along a pillar can be identified in the images of interest using the cross-ratio along that pillar, the above homographic relationship can be established for planes at arbitrary heights. Thus, twelve (4 × 3) marker points are required for calculating all the homographic matrices.
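The cross-ratio step can be sketched in one dimension: parameterizing image points by their signed position along the pillar's image line, the cross-ratio of the four heights equals the cross-ratio of the four image parameters (a projective invariant), which can be solved for the one unknown parameter. The helper name `cross_ratio_param` is illustrative:

```python
def cross_ratio_param(h1, h2, h3, h4, s1, s2, s3):
    """Given three marker heights h1..h3 on a pillar and the 1D
    parameters s1..s3 of their images along the pillar's image line,
    return the parameter s4 of the image of a point at height h4,
    using invariance of the cross-ratio under projection."""
    lam = ((h1 - h3) * (h2 - h4)) / ((h2 - h3) * (h1 - h4))
    # Solve ((s1-s3)(s2-s4)) / ((s2-s3)(s1-s4)) = lam for s4 (linear).
    return (lam * (s2 - s3) * s1 - s2 * (s1 - s3)) / \
           (lam * (s2 - s3) - (s1 - s3))

# Check against the 1D projective map s = h / (h + 1): a point at
# height 1.5 must land at s = 0.6.
s4 = cross_ratio_param(0, 1, 2, 1.5, 0.0, 0.5, 2 / 3)
print(round(s4, 6))  # 0.6
```

Because the cross-ratio is preserved by any perspective projection, the recovered parameter locates the new reference point on the image line regardless of the (unknown) camera pose.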

Instead of using twelve marker points, an approach for the derivation of multiple homographic matrices for planes of different heights, which uses only eight (4 × 2) marker points on four pillars, is presented in the following. Assume each pillar has two marker points on planes π1 and π2 with heights h1 and h2, respectively. First, the four marker points with height h2 are used to calculate a homographic matrix Hm2 between the image plane and the reference plane π2, as shown in Fig. A.1. Then, four reference points on π2 are produced by projecting the four marker points with height h1. More specifically, the image point p corresponding to the marker point P can be projected to π2 by Hm2 to obtain the world coordinates of P′, as shown in Fig. A.1.

After that, we can calculate a new reference point Pr on an arbitrary imaginary plane πr with a specified height by computing the intersection of PP′ and πr. Similarly, the remaining three marker points with height h1 can be used to produce another three new reference points on πr. Finally, a homographic matrix Hmr can be found by using the four new reference points. By adopting such a method, we can produce a set of homographic matrices for reference planes of various heights.
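The intersection of the line PP′ with the horizontal plane πr reduces to linear interpolation in the height coordinate. A minimal sketch, assuming the height is the third world coordinate (the thesis does not fix an axis convention) and with the illustrative helper name `point_on_plane`:

```python
import numpy as np

def point_on_plane(P, P_prime, h_r):
    """Intersect the 3D line through marker point P (height h1) and its
    projection P' on plane pi_2 with the horizontal plane pi_r of
    height h_r. Heights are the third world coordinate here."""
    P, P_prime = np.asarray(P, float), np.asarray(P_prime, float)
    t = (h_r - P[2]) / (P_prime[2] - P[2])  # parameter along P -> P'
    return P + t * (P_prime - P)

# Toy pillar: P at height 2, P' at height 0; the plane at height 1
# cuts the segment at its midpoint.
Pr = point_on_plane((0, 0, 2), (4, 2, 0), 1.0)
```

Applying this to the four pillar points yields the four reference points on πr from which Hmr is estimated, so one line-plane intersection per pillar replaces the third marker point of [23], [24].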

