Data Recording and Analysis Process

Chapter 2 System Design and Processes

2.3 System Processes

2.3.2 Data Recording and Analysis Process

In the data recording and analysis process, two computers connected by a cable serve as the controllers of the 14 KINECT devices. They form a master-slave structure, as shown in Fig. 2.7. The software implementation is based on a client-server architecture in which a Windows socket is used to conduct the communication between the two computers.

Fig. 2.7 Master-slave structure of proposed data recording and analysis process where PC1 is the master computer and PC2 is the slave computer.

At the beginning of the data recording and analysis process, as shown in Fig. 2.8, the master on the client side receives an instruction from the user and then sends a recording request to the slave on the server side. After getting the request, the slave starts its recording process and returns a reply to the master. On receiving the reply, the master starts its own recording process as well. All communications are implemented with multiple threads, because each computer has to communicate with the other and record at the same time. The ending of the data recording and analysis process is shown in Fig. 2.9.

[Diagram: Client (master) and Server (slave). 1. Instruction (user to master); 2. Request (master to slave); 3. Recording (slave); 4. Reply (slave to master); 5. Recording (master).]

Fig. 2.8 Starting the data recording and analysis process.

[Diagram: Client (master) and Server (slave). 1. Wait for the stop pattern; 2. Stop request; 3. Stop recording; 4. Stop reply; 5. Stop recording.]

Fig. 2.9 Ending the data recording and analysis process.
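The starting handshake of Fig. 2.8 can be sketched with a plain TCP socket and one extra thread. This is only a minimal illustration on a single machine: the message strings `START` and `ACK`, the function names, and the use of Python instead of the Windows socket API are all assumptions made for the example, and the "recording" is reduced to a log entry.

```python
import socket
import threading


def run_slave(server_sock, log):
    """Slave (server): wait for the request, start recording, then reply."""
    conn, _ = server_sock.accept()
    with conn:
        if conn.recv(16) == b"START":          # step 2: request arrives (hypothetical message)
            log.append("slave: recording")     # step 3: slave starts recording
            conn.sendall(b"ACK")               # step 4: reply to the master


def run_master(port, log):
    """Master (client): send the request and start recording after the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(b"START")                 # step 2: request sent after the user's instruction
        if sock.recv(16) == b"ACK":            # step 4: reply received
            log.append("master: recording")    # step 5: master starts recording


def demo_handshake():
    """Run both roles on the local machine and return the order of events."""
    log = []
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))         # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    slave = threading.Thread(target=run_slave, args=(server_sock, log))
    slave.start()
    run_master(port, log)
    slave.join()
    server_sock.close()
    return log
```

Because the slave logs its start before sending the reply, the slave always begins recording first in this sketch, which matches the step order of Fig. 2.8.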

Furthermore, a series of tasks is conducted in the data recording and analysis process, as described in Section 1.3: (1) transforming the color and depth images acquired by each KINECT device at each time instant into a 3D image; (2) stitching all the color images into a panoramic color background image; (3) extracting nearby 3D objects from the 3D image corresponding to each KINECT device; (4) merging the extracted 3D objects into the panoramic color background image; and (5) allowing the user to browse the merging result from any viewpoint and displaying the corresponding partial view.

Chapter 3 Design of Proposed 3D Around-car Imaging System

3.1 Idea of Proposed 3D Around-car Imaging System

When constructing a 3D EDR, it is important to let the EDR “see” the view around the car with no blind spot. A single KINECT device, whose horizontal field of view is only 57 degrees, is obviously not enough. Instead, we have to affix multiple KINECT devices around the car. In addition, the design differs for each distinct part of the car. Before describing the proposed system design, we briefly review the design of a car model with multiple cameras produced by Luxgen Motor Co. Ltd.

Luxgen has released a new car equipped with six RGB cameras around the car body, which are called “eagle views.” One camera is affixed to the front of the car and another to the rear. The remaining four cameras are affixed below the side mirrors, two on each side: one facing the rear of the car and the other facing obliquely toward the rear, as shown in Fig. 3.1.

Fig. 3.1 The cameras affixed to the body of the car. (a), (b) Front part of the car. (c), (d) Side part of the car. (e), (f) Rear part of the car. (g), (h) The recorder on the mirror.

This design inspired us to affix cameras to the side mirrors. With these cameras, we can see the around-car view as if looking through the car windows. If someone approaches the car near a window, he or she will be detected by the nearby KINECT device.

Regarding the coverage of the front view, Luxgen uses only one camera, because it is a fisheye camera, which yields a wider view than a normal projective camera. However, the overlap between the front view and the side view is narrow, so four blind spots exist at the corners. To improve on this, we use four additional KINECT devices to cover the corner views in our design. Specifically, to cover the front-left corner, a KINECT device whose view covers portion (a) in Fig. 3.2 is deployed. A second KINECT device covers the symmetric portion on the front right. For the rear-left corner, a third KINECT device whose view covers portion (c) is deployed, and a fourth KINECT device covers the symmetric portion on the rear right.

As for the other KINECT devices, we affix three for the front view, four for each side view, and three for the rear view. In addition, we affix a KINECT device to each of the two side mirrors, looking backward to cover view portion (B) as shown in Fig. 3.2. More details are described in Section 3.3.

3.2 Details of System Design

3.2.1 Front Part

In this section we focus on the part of our design that is related to the driver’s views. Specifically, we affix KINECT devices to proper car body parts to cover the “blind spots” that the driver cannot see while driving, for example, the part of the front view that is lower than the height of the engine hood.

Fig. 3.2 Proposed design of the KINECT-device system affixed on the vehicle and the views of the KINECT devices.

When one drives a normal car in the street, the car might accidentally run over a dog or another animal. In this situation, the driver will not know what is happening, because it occurs within the blind spots of the car. With a car equipped with our multiple-KINECT-device system, the blind spots can be eliminated and such accidents can be avoided. Moreover, the blind spots of a truck driver are much larger than those of the driver of a usual vehicle, so a design that completely covers the surroundings of a car is really important. Such a design is proposed in this study.

Also, we affix three ferrous boxes to the bumper (as shown in Fig. 3.6(a)) to make maximal use of the KINECT devices. This allows us to see the region below the engine hood, as shown in Fig. 3.3. The yellow object marks the limit of the driver’s view: anything below or closer than this object cannot be seen by the driver.

Fig. 3.3 A test of the driver’s view. (a) Side view. (b) Front view.

3.2.2 Right- and Left-side Parts

Originally, a KINECT device was put on an iron stand attached to the car, as shown in Fig. 3.4. This design was intended to make maximal use of the depth information, which is available in the range from 0.5 m to 6 m according to our experiments. However, this design violates the regulations on car modification. Moreover, passing cars might be scratched by the iron stand while we drive on the road, so this was not an appropriate design.

Fig. 3.4 A car-side iron stand for holding a KINECT device. (a) With a KINECT device. (b) Without a KINECT device.

After this experience, a new design was developed with the KINECT devices affixed to the higher side rack on the car roof, as shown in Fig. 3.5. This design is considered to be safer and more convenient. Each ferrous box fixes the position of its KINECT device, so the position does not change whenever we put the KINECT devices back on the car.

Fig. 3.5 Ferrous boxes for holding KINECT devices. (a) With a KINECT device. (b) Without a KINECT device.

With this new design, the KINECT device is fixed in place and safer while we are driving. However, although the tilt angle of the KINECT device is adjustable, these two KINECT devices are mounted too high. To solve this problem, we affix two KINECT devices to the side mirrors to cover the lower view ranges, as shown in Figs. 3.6(c) and 3.6(d).

3.2.3 Rear Part

In the previous part we used a ferrous bar and affixed the boxes to it. In contrast, in the rear part either there is no space for a ferrous bar or a ferrous stand is already present, so we drilled a hole for fastening the screw and affixed the ferrous box as shown in Fig. 3.6(b). We use this method only as a last resort.

The rear part is similar to the front part, but we also have to consider the case in which the car is driven backward. From common driving experience, the back view is known to be a serious blind spot of the car, so we extend the view by using three KINECT devices (originally only one was used, which was not enough). Furthermore, we use the two KINECT devices on the side mirrors to cover more of the back view.

Fig. 3.6 Around-car KINECT devices. (a) A front view. (b) A back view. (c) A lateral view. (d) A rear-view mirror.

3.3 System Performance Analysis

3.3.1 Ranges of Camera Views

With the proposed 3D imaging system, driving the car is like carrying a box that collects 3D data. At each time instant, every view is dropped into this box, which has the shape of a parallelepiped. The box is the extent of the car extended 6 m outward. After recording the KINECT images and processing them, we can see a 3D image around the car at every instant. However, the depth data of KINECT devices are only partially available in outdoor environments, because sunlight, which covers the full spectrum, interferes with the infrared light emitted by the device. According to our experiments, the depth data acquired by the KINECT device are still usable around sunset. Our experimental results on this aspect, collected in August, are shown in Fig. 3.7. We know from these results that the size of the parallelepiped box mentioned above varies, and the 6 m range is just the ideal case.

Fig. 3.7 The relationship between the depth image quality and the sun intensity (the x-axis specifies time, the y-axis specifies the available depth range).

Although this is bad news for our application, we can still use the color images as the views in the daytime and use the depth images to construct 3D models at night, since the quality of depth images is quite good at night. We thus improve vision at night by using depth images, and night is exactly when traditional 2D EDRs do not work well.

3.3.2 Imaging Sequence and Speed

We use two computers to increase our imaging speed. Two problems then arise: the first is the communication time, and the second is the synchronization of the frame rates (in FPS, frames per second) of the two computers.

Firstly, we have to know the imaging speed of a single KINECT device. According to the specification of the KINECT device, the imaging speed is 30 FPS; in other words, one picture is taken every 33 ms. A single request from the master computer to the slave takes about 1 ms according to our experiments. Since the communication steps do not affect our imaging speed much, we can conduct sequential processes at this speed for applications.

Secondly, we want to use a clock to synchronize the FPS rates of the two computers. When a signal is received by both the master computer and the slave, they start counting time. In this way, each side can be controlled to take a picture at the same time (or, more precisely, at nearly the same time).
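One way to picture this clock-based scheme is a shared capture grid: both computers note the time at which the start signal arrived and then only capture at multiples of the frame period after it. The sketch below is an assumption about how such a grid could be computed, not the implementation used in this study; the function name and the integer-millisecond timestamps are chosen for the example.

```python
def next_capture_ms(start_ms, now_ms, period_ms=33):
    """Return the next capture instant on the grid start_ms + k * period_ms.

    start_ms is the local time at which the start signal was received;
    now_ms is the current local time. All values are integer milliseconds.
    """
    elapsed = max(0, now_ms - start_ms)
    periods = -(-elapsed // period_ms)   # ceiling division without floats
    return start_ms + periods * period_ms
```

If both computers receive the signal at nearly the same moment (the request takes about 1 ms), their grids coincide to within that delay, so frames with the same index k are taken at nearly the same time.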

Finally, the last problem is the synchronization of the frames when the car is moving. We let each computer control 7 KINECT devices. Because of the sequential processing nature of the CPU, acquiring one image from each device takes 33 ms × 7 = 231 ms, so the effective frame rate is 1/0.231 ≈ 4.33 FPS. In other words, the car box mentioned above acquires a full set of KINECT device images every 231 ms. Suppose that the car is driven slowly, at about the pace of a person walking (a normal walking speed is 4 km/hr = 111.11 cm/sec). The delay length of each KINECT device, i.e., the distance the car travels during the image acquisition delay (at most 231 ms) relative to a preceding KINECT device, is then at most 111.11 × 0.231 ≈ 25.67 cm, and the delay length between neighboring KINECT devices is 111.11 × 0.033 ≈ 3.67 cm. From these figures, we conclude that the car speed is not a problem for our processing work.
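The arithmetic above can be checked with a few lines of code. The sketch uses only the quantities stated in the text (7 devices per computer, 33 ms per image, 4 km/hr); the function and key names are illustrative.

```python
def timing_budget(num_devices=7, frame_ms=33.0, speed_km_per_hr=4.0):
    """Compute the acquisition cycle, effective FPS, and car displacement."""
    cycle_ms = num_devices * frame_ms                     # sequential acquisition of all devices
    fps = 1000.0 / cycle_ms                               # full sets of images per second
    speed_cm_per_s = speed_km_per_hr * 100000.0 / 3600.0  # km/hr -> cm/sec
    return {
        "cycle_ms": cycle_ms,
        "fps": fps,
        "speed_cm_per_s": speed_cm_per_s,
        "shift_per_cycle_cm": speed_cm_per_s * cycle_ms / 1000.0,
        "shift_per_frame_cm": speed_cm_per_s * frame_ms / 1000.0,
    }
```

Calling `timing_budget()` with other speeds shows how quickly the displacement grows; it scales linearly with the car speed, so the walking-pace figures should not be read as holding for normal road speeds.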

Chapter 4 Construction of 3D Images from KINECT Images

4.1 Review of Structures of Depth and Color Images Taken by KINECT Devices

The data acquired with KINECT devices are of two types. One is the traditional color image, which we compress into the JPEG format; this format gives a reasonable image quality and a good compression rate. The other is the depth image, which stores at each pixel the distance between the KINECT device and the object in front of it. Unlike the color image, the depth image cannot be viewed directly. It is composed of object distances, and it can be displayed properly by quantizing its values into the range of 0 to 255. After that, we obtain a gray-level image that shows the distance from the camera to every object point, as shown by the example in Fig. 4.1.

Fig. 4.1 Images acquired with the KINECT device. (a) Color image. (b) Depth image.
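The quantization of depth values into gray levels can be sketched as follows. The linear scaling by the largest depth value, the use of 0 for "no measurement", and the integer depth units are assumptions made for this example; the text does not specify the exact mapping.

```python
def quantize_depth(depth_rows, max_depth=None):
    """Linearly map raw depth values to gray levels in 0..255.

    depth_rows is a list of rows of non-negative integers; a value of 0 is
    treated as "no depth available" and stays 0 (black).
    """
    if max_depth is None:
        max_depth = max(max(row) for row in depth_rows)
    if max_depth <= 0:
        return [[0 for _ in row] for row in depth_rows]
    return [
        [min(255, (d * 255) // max_depth) if d > 0 else 0 for d in row]
        for row in depth_rows
    ]
```

For example, with depths given in millimeters, `quantize_depth([[0, 3000], [6000, 1500]])` maps the farthest point to 255 and leaves the missing measurement black.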

4.2 Construction of 3D Images from KINECT Images

4.2.1 Review of Pinhole Camera Model

The pinhole camera model describes the relationship between the coordinates of a 3D point and its projection onto the image plane of a pinhole camera, where the camera aperture is a point as illustrated in Fig. 4.2.

Fig. 4.2 A tree is projected onto the image plane through a pinhole model.

The geometry of the pinhole camera model may be illustrated by Fig. 4.3, which includes the following components.

1. A 3D orthogonal coordinate system with its origin at O. The three axes of the coordinate system are X₁, X₂, and X₃. A point P somewhere in the world is specified by coordinates (x₁, x₂, x₃) with respect to the X₁-, X₂-, and X₃-axes.

2. The image plane is parallel to the X₁- and X₂-axes and lies at a distance f (the focal length) from O. The image center is denoted as R.

3. The projection of a space point P onto the image plane is denoted as Q. This point Q is just the intersection of the projection line (green) “emitted” by P and the image plane.

4. There is also a 2D coordinate system in the image plane, with its origin at R and its Y₁- and Y₂-axes parallel to the X₁- and X₂-axes, respectively. The coordinates of point Q in this coordinate system are denoted as (y₁, y₂).

Fig. 4.3 The geometry of a pinhole camera.

Next, we want to derive the transformation between the coordinates (y₁, y₂) of point Q and the coordinates (x₁, x₂, x₃) of point P. In Fig. 4.4, we see two similar triangles from which the following two equations can be derived:

Summarizing Equations 4.1 and 4.2, we get a vector equality as follows:



With the above equation, we can construct 3D images. The proposed method for this purpose will be explained in the next section.

Fig. 4.4 The geometry of a pinhole camera as seen from the X₂-axis.
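As a hedged sketch of the relations referred to in this section, assume the image plane is treated as lying in front of O at distance f, so that no sign flip appears (with the plane behind O, each y picks up a minus sign). The similar triangles of Fig. 4.4 then give relations of the following form; the equation numbers follow the references in the text, and the exact form used in the original derivation may differ.

```latex
\begin{align}
\frac{y_1}{f} &= \frac{x_1}{x_3}, \tag{4.1}\\
\frac{y_2}{f} &= \frac{x_2}{x_3}, \tag{4.2}\\
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
  &= \frac{f}{x_3}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{4.3}
\end{align}
```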

4.2.2 Ideas of 3D Image Construction and Coordinate Conversion

With the pinhole camera model reviewed in the last section, we can now describe how we construct a 3D image by combining color and depth images.

In more detail, from Equation 4.3, we can get:

And from Fig. 4.3, based on the similar-triangle principle again, we have the following:

where √(y₁² + y₂² + f²) is the length of line segment OQ, and √(x₁² + x₂² + x₃²) is the length of line segment OP. In the same way, we can get:

For our applications here, from the above equations we can derive more detailed facts in the following which are useful for the purpose of 3D image construction:

1. √(x₁² + x₂² + x₃²) is the distance between the KINECT device and a point on the object, and we denote it as d;

2. the image center is R, whose location is (X_mid, Y_mid) in the depth image captured by the KINECT device;

3. denote the focal length of the depth cameras by f_d;

4. we can get Equations (4.10), (4.11), and (4.12) below from Equations (4.7), (4.8), and (4.9) for each point (x_p, y_p) in the depth image:

By (4.10), (4.11), and (4.12), we can convert a 2D point with coordinates (x_p, y_p) in a depth image into a 3D point with coordinates (x₁, x₂, x₃) and construct the 3D image by associating the corresponding color.
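The conversion from a depth pixel to a 3D point can be sketched from the definitions above: the offsets of the pixel from the image center play the role of (y₁, y₂), and the point P lies on the ray through Q at distance d from O, so each coordinate is scaled by d divided by the length of OQ. The sign convention (no axis flips) and the function name are assumptions of this sketch, and f_d must be supplied in pixel units.

```python
import math


def depth_pixel_to_point(x_p, y_p, d, x_mid, y_mid, f_d):
    """Back-project depth pixel (x_p, y_p) with measured distance d to (x1, x2, x3).

    Assumes d is the distance from the camera center O to the object point
    and that f_d is the depth camera's focal length in pixels.
    """
    y1 = x_p - x_mid                                  # offset from the image center R
    y2 = y_p - y_mid
    oq = math.sqrt(y1 * y1 + y2 * y2 + f_d * f_d)     # length of segment OQ
    scale = d / oq                                    # |OP| / |OQ|
    return (y1 * scale, y2 * scale, f_d * scale)
```

A pixel at the image center maps to a point straight ahead at depth d, and pixels farther from the center yield points with a smaller x₃ for the same d.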

   

For the purpose of mapping 3D points to 2D pixels in the color image, we can derive, in a similar but inverse way, the following equations according to Equations (4.7), (4.8), and (4.9) (note that the focal length of the color camera is f_C = 525):

where (X_mic, Y_mic) is the image center of the color image.
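The inverse mapping, from a 3D point to a pixel in the color image, can be sketched with the same pinhole relations and the color focal length f_C = 525 given above. The sketch assumes that the color and depth cameras share the same coordinate system (no offset between them is modeled) and rounds to the nearest pixel; the function name is illustrative.

```python
def point_to_color_pixel(x1, x2, x3, x_mic, y_mic, f_c=525.0):
    """Project a 3D point onto the color image; return (x_c, y_c) or None.

    Points with x3 <= 0 are not in front of the camera and are rejected.
    """
    if x3 <= 0:
        return None
    x_c = x_mic + f_c * x1 / x3      # perspective division, then shift to the image center
    y_c = y_mic + f_c * x2 / x3
    return (int(round(x_c)), int(round(y_c)))
```

The caller still has to check that the returned pixel lies inside the color image before reading a color from it.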

Finally, we can construct a colored 3D image from a depth image I_d and a color image I_c acquired by the KINECT device; the details are described in the next section.

4.2.3 Construction Algorithm and Experimental Results

We have described each component of the proposed 3D image construction algorithm, and the full version of the algorithm is presented below.

Algorithm 4.1: construction of a 3D image.

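Input: a color image I_c and a depth image I_d acquired with a KINECT device.

Output: a 3D color image I_3D.

Step 1. For each pixel P_d with coordinates (x_p, y_p) in the depth image I_d, convert it into a 3D point with coordinates (x₁, x₂, x₃) by Equations (4.10), (4.11), and (4.12).

Step 2. Transform (x₁, x₂, x₃) into coordinates (x_c, y_c) in the color image coordinate system by Equations (4.13) and (4.14).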

Step 3. Use (x_c, y_c) as indices to find the color values (R, G, B) of the pixel P_c at coordinates (x_c, y_c) in the color image I_c.

Step 4. Take (R, G, B) as the color values of point P_d with coordinates (x₁, x₂, x₃) in the 3D space, and use these data (color values and 3D coordinates) to render a 3D color image I_3D with OpenGL as the output.

OpenGL, mentioned in Step 4 above, can draw 3D points in 3D space. It can therefore be used to draw the 3D image from different views, so that a constructed model or scene in the 3D image can be seen from a specific view by projecting its points onto the chosen view plane.
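The loop structure of Algorithm 4.1 can be sketched as below. This is an illustration under several assumptions, not the implementation used in this study: images are plain nested lists of the same size, the image center is taken as half the width and height for both images, the two cameras are treated as sharing one coordinate system, and the OpenGL rendering of Step 4 is replaced by returning the colored points.

```python
import math


def construct_3d_image(depth, color, f_d, f_c=525.0):
    """Return a list of ((x1, x2, x3), (R, G, B)) colored 3D points.

    depth: rows of distances d, where 0 means no depth is available.
    color: rows of (R, G, B) tuples with the same dimensions as depth.
    """
    h, w = len(depth), len(depth[0])
    x_mid, y_mid = w / 2.0, h / 2.0            # assumed center of both images
    points = []
    for y_p in range(h):
        for x_p in range(w):
            d = depth[y_p][x_p]
            if d <= 0:
                continue                       # no depth value, so no color mapping
            # Step 1: depth pixel -> 3D point along the ray through the pixel
            y1, y2 = x_p - x_mid, y_p - y_mid
            s = d / math.sqrt(y1 * y1 + y2 * y2 + f_d * f_d)
            x1, x2, x3 = y1 * s, y2 * s, f_d * s
            # Step 2: 3D point -> color-image coordinates
            x_c = int(round(x_mid + f_c * x1 / x3))
            y_c = int(round(y_mid + f_c * x2 / x3))
            # Step 3: look up the color, skipping points that fall outside the image
            if 0 <= x_c < w and 0 <= y_c < h:
                points.append(((x1, x2, x3), color[y_c][x_c]))
    return points
```

The returned list is the data that Step 4 would hand to a renderer; drawing it from a chosen viewpoint is left out of the sketch.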

Fig. 4.5 A flowchart of 3D image construction algorithm.

A result of applying Algorithm 4.1 is shown in Fig. 4.7, and the raw data (the original color image and depth image) are shown in Fig. 4.6. By Algorithm 4.1, the data are converted into a 3D format and drawn with their corresponding colors. Note that the black regions are where no depth value is available, so no mapping to corresponding colors exists there.

Fig. 4.6 Images acquired by a KINECT device. (a) The depth image. (b) The color image.

Fig. 4.7 A constructed 3D image. (a) A perspective view of the 3D image. (b) A top view.

4.3 Review of a Method for Geometric Correction of 3D Images

4.3.1 Idea of Geometric Correction

Once, after taking a picture of a flat wall, which included both a depth image and a color one, and applying Algorithm 4.1, we found that the flat wall appeared as a curved surface instead of a plane when the resulting 3D image was displayed for inspection. The reason is that the infrared light rays emitted by the KINECT device are not all parallel to the X₃-axis shown in Fig. 4.3, so the acquired data are not accurate.

A method has been proposed to solve this problem and is reviewed here. The curved surface is assumed to have the shape of a paraboloid, and a paraboloid equation is derived for correcting the error. Specifically, after the paraboloid equation is found, the x and y coordinates of each point in the 3D image are substituted into this equation to get a corrected z value, as illustrated in Fig. 4.8.

Fig. 4.8 The paraboloid seen from the direction of the Y-axis (i.e., from the top view).

4.3.2 Correction Algorithm and Experimental Results

Based on the above idea, an approximating paraboloid equation may be derived according to the criterion of minimum sum of squared errors (MSSE) in the following way.

(1) Let the equation of the paraboloid be written as:

where A and B are the quadratic coefficients and C is the intercept, i.e., the distance from the KINECT device to the apex of the paraboloid, as shown in Fig. 4.8.

(2) The equation for computing the value of the sum of squared errors (SSE) is:

where (x_i, y_i, z_i) are the coordinates of a 3D image point computed by Algorithm 4.1, and h and w are the height and width of the depth image, respectively.

(3) To find the coefficients A, B, and C according to the minimum SSE criterion, the partial derivatives of Equation (4.16) with respect to A, B, and C are derived and set to zero, yielding the following equations:

(4) The above simultaneous equations may be solved to obtain analytic solutions for the values of the coefficients A, B, and C.

(5) The paraboloid is obtained by substituting A, B, and C back to (4.16).
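The fitting steps above can be sketched in code under the assumption that the paraboloid has the form z = A·x² + B·y² + C, which matches the description of A and B as quadratic coefficients and C as the distance to the apex. Setting the partial derivatives of the SSE to zero gives three linear equations in A, B, and C, solved here with Cramer's rule. The function names are illustrative, and the subsequent correction of the z values is not included.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_paraboloid(points):
    """Least-squares fit of z = A*x^2 + B*y^2 + C to a list of (x, y, z) points."""
    s_uu = s_uv = s_vv = s_u = s_v = 0.0
    b_u = b_v = b_1 = 0.0
    for x, y, z in points:
        u, v = x * x, y * y
        s_uu += u * u
        s_uv += u * v
        s_vv += v * v
        s_u += u
        s_v += v
        b_u += u * z
        b_v += v * z
        b_1 += z
    # Normal equations obtained from dSSE/dA = dSSE/dB = dSSE/dC = 0
    m = [[s_uu, s_uv, s_u], [s_uv, s_vv, s_v], [s_u, s_v, float(len(points))]]
    rhs = [b_u, b_v, b_1]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("points do not determine a unique paraboloid")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]              # replace one column by the right-hand side
        coeffs.append(det3(mc) / det)
    return tuple(coeffs)                     # (A, B, C)
```

For full-resolution depth images, a numerically more robust solver than Cramer's rule would likely be preferable; the closed form is used here only because it mirrors the analytic solution mentioned in step (4).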

Finally, we show two examples of the experimental results of applying the above scheme of geometric correction to 3D images in Figs. 4.9 and 4.10.

Fig. 4.9 The 3D image of a wall seen from above. (a) Before correction. (b) After correction.

Fig. 4.10 The 3D image of another wall seen from the top view. (a) Before correction. (b) After correction.
