CHAPTER 4. Heterogeneous Sensor Approach: Simultaneous Localization, Tracking,
4.8. Kinematics Model for Upper Extremities
4.8.2. Virtual Parameters and Virtual Joint Location Generation
The Denavit-Hartenberg parameters are computed at each timestamp once the required measurements are available, and a Kalman filter is applied to estimate them. The parameters are updated according to the occlusion state. For example, if the location of the elbow is unobserved, the parameters that require elbow measurements are not updated.
Instead, virtual measurements are generated from the proposed upper extremity model to provide estimates under occlusion. The proposed parameters are classified into angular parameters, [θ, α]^T, and a length parameter, L. Different strategies are applied to generate the estimates for these two types of parameters. The length parameter L of each arm segment comes from the latest estimation result. As we expect the length of the arm to remain essentially constant during rehabilitation, these parameters should be fixed, and they are one of the features that help care staff distinguish between subjects. For the angular parameters, [θ, α]^T, a constant angular velocity (CAV) motion model is applied to generate the virtual values from the latest results with the estimated angular velocities ω_θ and ω_α, respectively.
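The two strategies described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `SegmentState` container, its field names, and the time step `dt` are hypothetical, and the angular velocities are assumed to come from the latest filter estimate.

```python
from dataclasses import dataclass


@dataclass
class SegmentState:
    """Latest estimate for one arm segment (hypothetical container)."""
    theta: float        # angular parameter theta (rad)
    alpha: float        # angular parameter alpha (rad)
    omega_theta: float  # estimated angular velocity of theta (rad/s)
    omega_alpha: float  # estimated angular velocity of alpha (rad/s)
    length: float       # length parameter L (m)


def virtual_parameters(state: SegmentState, dt: float) -> SegmentState:
    """Generate virtual parameters dt seconds after the latest estimate."""
    return SegmentState(
        # Constant angular velocity (CAV): angles advance linearly in time.
        theta=state.theta + state.omega_theta * dt,
        alpha=state.alpha + state.omega_alpha * dt,
        # Angular velocities are held constant by the CAV assumption.
        omega_theta=state.omega_theta,
        omega_alpha=state.omega_alpha,
        # The length is carried over unchanged from the latest estimate.
        length=state.length,
    )


latest = SegmentState(theta=0.50, alpha=0.10,
                      omega_theta=0.20, omega_alpha=-0.10, length=0.30)
virtual = virtual_parameters(latest, dt=0.1)
```

Holding the length fixed while extrapolating only the angles mirrors the distinction drawn in the text between the two parameter types.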
After generating the parameters of the proposed upper extremity model, the virtual locations of each joint are computed with forward kinematics on the arm base frame. Later on, the locations are projected back to the global frame from the arm base frame to complete the generation of virtual measurements.
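The forward-kinematics step can be illustrated with a short sketch. It assumes the standard Denavit-Hartenberg link transform with zero link offset (d = 0) and link length a = L; the thesis model may define its frames differently. The function names and the `base_to_global` matrix are hypothetical. Chaining the link transforms after the base-to-global transform is equivalent to computing the joints in the arm base frame and then projecting them to the global frame.

```python
import numpy as np


def dh_transform(theta, alpha, length):
    """Standard DH link transform, assuming d = 0 and a = length."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, length * ct],
        [st,  ct * ca, -ct * sa, length * st],
        [0.0,      sa,       ca,         0.0],
        [0.0,     0.0,      0.0,         1.0],
    ])


def virtual_joint_locations(params, base_to_global):
    """Chain link transforms and return joint positions in the global frame."""
    transform = base_to_global.copy()
    joints = []
    for theta, alpha, length in params:
        transform = transform @ dh_transform(theta, alpha, length)
        joints.append(transform[:3, 3].copy())
    return np.array(joints)


# Hypothetical arm base placed at (1, 2, 0) in the global frame.
base_to_global = np.eye(4)
base_to_global[:3, 3] = [1.0, 2.0, 0.0]

# Two segments given as (theta, alpha, L) triples.
params = [(0.0, 0.0, 0.30), (np.pi / 2, 0.0, 0.25)]
joints = virtual_joint_locations(params, base_to_global)
```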
CHAPTER 5
Experimental Result
After describing the proposed approaches for extending the machine perception boundary with single and heterogeneous sensors, this chapter presents real applications. The first application involves a single sensor, a 2D LIDAR, in a crowded urban environment. The second application involves heterogeneous sensors, including an RGB-D camera and IMU-type sensors, in an indoor environment for stroke or chronic rehabilitation. These applications evaluate the proposed approaches from different aspects to demonstrate their effectiveness in pushing the limits of machine perception under occlusion.
5.1. Application One: 2D LIDAR in Urban Environment
As shown in Fig. 3.5, a stationary SICK LMS 291 2D laser scanner was placed at a multiple-lane intersection to collect the data used to evaluate the performance of the proposed virtual measurement model, IOT, and the approach of Galceran et al. (Galceran et al., 2015).
We used the data set from a one-hour period that included the various behavior patterns in Fig. 3.5, composed of over 130,000 scans, and evaluated the performance using the accompanying Velodyne HDL-32E and Ladybug3 360 spherical camera.
5.1.1. Benchmark with 3D LIDAR for Evaluation
It is difficult to obtain the ground truth for testing data in urban environments unless the data are fully observable and labeled manually. Therefore, we used a 3D laser scanner, a Velodyne HDL-32E, to collect benchmark data for verification. Since the 3D laser scanner suffers less occlusion than the 2D laser scanner, a 3D LIDAR benchmark system was built to generate the benchmark. A set of 3D occluded moving segments was extracted from the benchmark system.
To obtain these segments, we required the moving points from the 3D LIDAR and the 3D points within the 2D occluded grid cells. To extract the moving points from the 3D LIDAR, the ground points and the points above 2 meters were eliminated to focus on objects other than the ground and leaves. The remaining points from the 3D laser scanner were projected onto a 2D plane, the ground, to build a 2D grid map with 3D points. This grid map was accumulated over time to generate the 2D stationary grid map with 3D points. The stationary grid map was then used to extract the 3D moving points.
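A rough sketch of this extraction pipeline is given below. Only the 2-meter height cut comes from the text; the grid resolution, the ground-height tolerance, the occupancy ratio used to call a cell stationary, and all function names are assumptions made for illustration, and real ground removal is typically more involved than a height threshold.

```python
import numpy as np

CELL = 0.5          # grid resolution in meters (assumed)
MAX_HEIGHT = 2.0    # points above 2 m are eliminated, as stated in the text
GROUND_EPS = 0.2    # points at or below this height count as ground (assumed)
STATIC_RATIO = 0.5  # cell is stationary if occupied in this share of frames (assumed)


def filter_height(points):
    """Drop ground returns and points above MAX_HEIGHT."""
    z = points[:, 2]
    return points[(z > GROUND_EPS) & (z <= MAX_HEIGHT)]


def cell_index(points):
    """Project 3D points onto the ground plane and return their grid cells."""
    cells = np.floor(points[:, :2] / CELL).astype(int).tolist()
    return [tuple(c) for c in cells]


def stationary_cells(frames):
    """Accumulate occupancy over time and keep persistently occupied cells."""
    counts = {}
    for pts in frames:
        for c in set(cell_index(filter_height(pts))):
            counts[c] = counts.get(c, 0) + 1
    return {c for c, n in counts.items() if n / len(frames) >= STATIC_RATIO}


def moving_points(points, static):
    """Return the 3D points that fall outside the stationary cells."""
    kept = filter_height(points)
    mask = np.array([c not in static for c in cell_index(kept)], dtype=bool)
    return kept[mask]


frames = [
    np.array([[5.0, 5.0, 1.0],        # wall, present in every frame
              [float(k), 0.0, 1.0],   # object moving along x
              [2.0, 2.0, 0.0],        # ground return
              [2.0, 2.0, 3.0]])       # foliage above 2 m
    for k in range(4)
]
static = stationary_cells(frames)
moving = moving_points(frames[3], static)
```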
To produce the occlusion grid map, the benchmark generation system used not only 3D data but also 2D LIDAR data. The extracted 3D moving points were compared with the 2D occlusion grid map and the 3D occluded moving points were extracted with respect to the occluded area of 2D laser scanner. Finally, to generate the 3D occluded moving segment candidates, a growing-based segmentation process was applied which selects a point to assign to the nearest segment or to start a new segment. These candidates were refined with occlusion detection in a point-based decision manner: if the percentage of occluded points for a candidate was over 80 percent, the candidate was classified as a 3D occluded moving segment and was compared to the shape of tracked estimates when evaluating the performance.
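The segmentation and the point-based occlusion decision can be sketched as below. The 80 percent threshold is taken from the text; the growing distance `GROW_DIST`, the greedy point ordering, and the function names are assumptions for illustration only.

```python
import numpy as np

GROW_DIST = 1.0       # max distance to join an existing segment, meters (assumed)
OCCLUDED_RATIO = 0.8  # threshold stated in the text


def grow_segments(points):
    """Assign each point to the nearest segment, or start a new segment."""
    segments = []  # each segment is a list of point indices
    for i, p in enumerate(points):
        best, best_d = None, GROW_DIST
        for s, members in enumerate(segments):
            d = np.min(np.linalg.norm(points[members] - p, axis=1))
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            segments.append([i])
        else:
            segments[best].append(i)
    return segments


def is_occluded_segment(occluded_flags):
    """Accept a candidate if over 80 percent of its points are occluded."""
    return bool(np.mean(occluded_flags) > OCCLUDED_RATIO)


points = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0],
                   [10.4, 10.0], [1.0, 0.0]])
# True where the point lies in an occluded cell of the 2D occlusion grid map.
occluded = np.array([True, True, True, True, False])

segments = grow_segments(points)
decisions = [is_occluded_segment(occluded[seg]) for seg in segments]
```

In this toy input the first candidate has two of three points occluded and is rejected, while the second is fully occluded and is kept.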
Fig. 5.1 shows two examples of the results from the benchmark system. Moving segments that were observable by the 3D LIDAR but occluded in the 2D LIDAR were successfully extracted. These segments were later used to evaluate the performance of the three approaches. Fig. 5.2 shows the occluded area ratio from the 2D LIDAR and the number of occluded moving segments from the benchmark system in each frame of the evaluation data set. The overall average occluded ratio was 52.77%. This ratio was higher when there were moving objects near the LIDAR. The average number of occluded moving segments was 26.23. The large jump in occluded moving segments between frames 21200 and 21300 was due to a black vehicle that made a U-turn in front of the sensor: the data collected from the 2D LIDAR jumped dramatically during the occlusion and far observation because the black color absorbed the light from the LIDAR.
Figure 5.1. Snapshots of 3D occluded moving segment results (axes: X and Y in meters). (a) 3D occluded moving segments at frame 19416. (b) Photos of frame 19416. (c) 3D occluded moving segments at frame 20394. (d) Photos of frame 20394. Different colors for the points represent different segments. The gray area is the occlusion grid map from 2D LIDAR and the darker gray points are the data from 3D LIDAR. In frame 19416, occluded moving segments at upper right were extracted while occluded moving segments in the front were revealed in frame 20394. In both frames, the occluded pedestrians moving on the left were also extracted successfully.
Figure 5.2. The blue line is the occluded area ratio from 2D LIDAR and the red line is the number of occluded moving segments from the evaluation data with the benchmark system (axis labels: Area Occluded Ratio in 2D; Number of Occlusion).
Table 5.1. Average weights of different motion models, and number of estimates in different model sets in observed area.
                  CV      CA      SI      NI      #Estimates
Model set D[1]    0.318   0.311   0.371   N/A     14.02
Model set D[2]    0.219   0.211   0.439   0.130   61.49
5.1.2. IOT Analysis in the Observed Area
The verification results in the observed area are shown in Table 5.2. The results show that in this complicated urban environment, the proposed virtual measurement model and IOT still outperform the approach of Galceran et al. (Galceran et al., 2015). The proposed approach and IOT not only improve the accuracy under occlusion but also enhance the performance in the observed area.
To understand the changes between motion models, we present two analyses. First, we analyze the average weights of the different motion models and the number of estimates in the different motion model sets over the testing data. Fig. 5.3 shows the average weights of the different models in the different motion model sets. We observe that the SI model dominates even in the set where the NI model is used. This suggests that most moving objects in the urban environment follow traffic rules. Fig. 5.4 reveals that most estimates still interact with each other implicitly. Thus, the NI model is essential for tracking, even when it is assigned low model weights. Table 5.1 records the average weights of the different models and the average number of estimates in the different sets over the testing period. Clearly, most of the estimates are in D[2], in which the NI model and the SI model are dominant. Therefore, both the NI and SI models are necessary for multiple target tracking.
Figure 5.3. Average weights of motion models in different motion model sets in observed area. (a) Average weights of different models in set D[1]. (b) Average weights of different models in set D[2].
Figure 5.4. Number of tracking estimates in different motion model sets in observed area
Second, the change of the weights in the observed area is analyzed. The 1-Norm index was used to represent the mean displacement of the VSMM weights of each motion model for the same estimate between two consecutive frames. Fig. 5.5 shows the 1-Norm statistic along the observed frames, which represents the absolute change of the VSMM weights. All of the 1-Norm means are below 0.05, and the metric decreases with time. In addition, the number of estimates decreases from 1882 with time. This suggests that moving objects tend to maintain their behaviors and their interactions with the scene and nearby objects.
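One plausible reading of the 1-Norm index is sketched below: the absolute weight change is averaged over the motion models of one estimate, and the per-estimate values are then averaged over all estimates in a frame. The normalization by the number of models and the function names are assumptions; the weight vectors are made-up numbers, not data from the experiment.

```python
import numpy as np


def one_norm_index(prev_weights, curr_weights):
    """Mean absolute change of the model weights of one estimate."""
    prev = np.asarray(prev_weights, dtype=float)
    curr = np.asarray(curr_weights, dtype=float)
    return float(np.mean(np.abs(curr - prev)))


def frame_mean_index(weight_pairs):
    """Average the index over all estimates tracked in both frames."""
    return float(np.mean([one_norm_index(p, c) for p, c in weight_pairs]))


# Illustrative weights for two estimates in consecutive frames.
pairs = [
    ([0.30, 0.30, 0.40], [0.25, 0.30, 0.45]),
    ([0.20, 0.20, 0.60], [0.20, 0.20, 0.60]),
]
index = one_norm_index(*pairs[0])
frame_mean = frame_mean_index(pairs)
```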
Figure 5.5. 1-Norm index and number of estimates with respect to observed frames. The blue line is the mean of the 1-Norm index. The red line is the number of estimates with respect to the observed frames.
5.1.3. Virtual Measurement Model Evaluation in the Occluded Area
Having analyzed the observed area, we now examine the performance of the three approaches in the occluded area. The accuracy is defined as the ratio of tracked occluded segments to the total occluded segments from the benchmark system. An occluded segment is considered tracked if the mean Euclidean distance from a tracked estimate to the 3D occluded segment points is less than a given threshold. Table 5.2 lists the average accuracy and number of tracked occluded segments with a threshold of 1.5 m. In the occluded area, the proposed virtual measurement model approach yielded a 25.89% improvement in accuracy compared to Galceran et al. (Galceran et al., 2015); the IOT approach falls between the two but is close to the virtual measurement model approach. The accuracy results in Table 5.2 also show that IOT and the virtual measurement model improve tracking estimates under occlusion.
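The evaluation criterion can be sketched as follows. The sketch assumes that distances are measured in the ground plane between one estimate position and the segment points, which the text does not specify; the function names are hypothetical. The relative-improvement helper reproduces the occluded-area figures quoted above from the accuracies in Table 5.2.

```python
import numpy as np


def is_tracked(estimate_xy, segment_points_xy, threshold=1.5):
    """A segment is tracked if the mean distance to its points is below threshold."""
    diffs = np.asarray(segment_points_xy, dtype=float) - np.asarray(estimate_xy, dtype=float)
    return bool(np.linalg.norm(diffs, axis=1).mean() < threshold)


def accuracy(num_tracked, num_occluded):
    """Ratio of tracked occluded segments to all occluded segments."""
    # Returning 0.0 for a frame without occluded segments is an assumption.
    return num_tracked / num_occluded if num_occluded else 0.0


def relative_improvement(value, baseline):
    """Improvement over the baseline, in percent of the baseline."""
    return 100.0 * (value - baseline) / baseline


segment = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tracked_at_1_5 = is_tracked((0.0, 0.0), segment, threshold=1.5)
tracked_at_1_0 = is_tracked((0.0, 0.0), segment, threshold=1.0)

# Occluded-area accuracies from Table 5.2.
vmm_gain = round(relative_improvement(56.69, 45.03), 2)
iot_gain = round(relative_improvement(51.74, 45.03), 2)
```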
To investigate the effects of different evaluation thresholds, the three approaches were evaluated with thresholds from 0.8 m to 3 m. The results are shown in Fig. 5.6. We found that the accuracy exceeds 50% from 1.2 m onward with the virtual measurement model approach, while it exceeds 50% from 1.4 m onward with the IOT approach. For both the IOT and the virtual measurement model approaches, the accuracy gains become smaller as the threshold grows. To select 1.5 m as the representative result, we
Table 5.2. Average accuracy and the number of tracked occluded segments of the three approaches. Numbers inside parentheses are the improvement compared to Galceran et al. (Galceran et al., 2015).
                            Galceran et al. (2015)   IOT                Virtual measurement model
Accuracy (observed area)    58.44%                   60.49% (+0.04%)    63.20% (+0.08%)
Accuracy (occluded area)    45.03%                   51.74% (+14.90%)   56.69% (+25.89%)
#Tracked (occluded area)    11.97                    13.92 (+1.95)      15.30 (+3.33)
Figure 5.6. Accuracy along with different evaluation thresholds. The blue line reveals the accuracy for Galceran et al. (Galceran et al., 2015), the green line illustrates the accuracy with IOT, and the red line shows the accuracy of the virtual measurement model. Note that the increments of the accuracy are decreasing while the threshold is growing.
considered not only that it is half the width of a lane in urban areas, but also that lanes with widths from 3 to 3.1 m carry maximal capacity for both motor traffic and bicycles according to Karim et al. (Karim, 2015). Since more moving objects could lead to more varied occlusions, we think this criterion is a balanced threshold for fairly comparing the three approaches across vehicles, bikers, and pedestrians in urban environments.
In addition to the overall accuracy evaluation, we also examine three cases involving interacting objects and occlusions. Fig. 5.7 compares the results with and without virtual measurement. The bike moves from the bottom to the top, and the occlusion occurs at the beginning of the sequence. With virtual measurement, the uncertainty of the estimate in the occluded area is lower than without it.
Fig. 5.8 shows a bike moving downward and interacting with a moving object coming from the other side. The large ellipses in the middle are the estimates with interactions to avoid the collision. There are also some occlusions along the way, and the sequence ends with an occlusion. This case shows that the virtual measurement model helps to raise certainty about the estimate. Fig. 5.9 shows a challenging case in which a pedestrian is moving within a highly crowded group and is occluded in several frames. In these cases, the proposed virtual measurement approach tracks the object with less uncertainty, increasing confidence when the object interacts with nearby objects in the group.
From these cases, it is shown that the weight of the neighboring interaction model is higher when the estimate is initialized or when it reappears; it is also higher when interacting with nearby moving objects. According to the length of time, there are two kinds of interactions that affect moving objects. As shown in Fig. 5.8, one is a sudden movement for fast interaction. Fig. 5.9 shows the other, a long-term type exhibited by continuous interaction. We also find that, unless it is necessary, most moving objects avoid sudden movements. This explains the low weight of the neighboring interaction model in the previous analysis.
5.2. Application Two: Heterogeneous Sensor Fusion in Indoor Environment
A stationary Kinect V2 RGB-D camera was used as the RGB-D node, and smartphones, watches, and wristbands with accelerometers plus G sensors and geomagnetic field sensors were applied as wearable nodes to evaluate the proposed sensor fusion approach. To demonstrate the idea simply, an Asus Zenfone 2 Laser smartphone was used as the wearable node. The performance of the proposed heterogeneous sensor fusion approach was verified with synthetic ground-truth data sets; we evaluated the performance under occlusion with another ten-subject data set. Because the clocks at different sensing nodes differ, an NTP (Network Time Protocol) server was set up for timestamp synchronization. Thus, in the following experiments we evaluated the proposed approach at ten frames per second across all synchronized nodes.
Figure 5.7. Case 1 (axes: X and Y in meters). (a) Without virtual measurement. (b) With virtual measurement. (c) Photos of last frame. A bike goes from the bottom to the top with occlusion in the middle and reappears again. The number next to the moving object indicates track ID. Red and blue points are the stationary and moving points of 2D LIDAR, and orange rectangles are the segmentation results. Purple ellipses and lines are the previous estimates and trajectory in the observed area, and teal ellipses and lines are those in the occluded area. Finally, the magenta ellipse and line are the current estimate and the current orientation of the velocity, respectively. At the bottom, the tracked object is highlighted with a magenta rectangle.
5.2.1. Synthetic Data Generation
In the synthetic data set, subjects performed each of two activity sequences once. In the first activity, the subject placed his/her hand on top of his/her head, following Carroll et al. (Carroll, 1965), who suggested this as a performance evaluation activity for stroke rehabilitation. After this motion, the subject raised his/her hand from the front of the body to shoulder level. These motion sequences are illustrated in Fig. 5.10. A phone was held in the left hand and the Kinect was placed in front of the subject. With the first motion we evaluated the performance of the proposed approach in an observable situation for the rehabilitation of stroke patients. With the second motion, which introduces occlusions of the wrist joint, we verified the performance under occlusion. All motions were executed in a standing pose.
Figure 5.8. Case 2 (axes: X and Y in meters). (a) Without virtual measurement. (b) With virtual measurement. (c) Photos of last frame. A bike moves downward; midway, it interacts with a nearby moving object.
Sets of synthetic data were generated from the joints’ locations as detected by the RGB-D node. Four kinds of measurements are generated: timestamp, location, rotation, and acceleration. These are necessary to determine the odometry and the observations regarding the local acceleration, the virtual relative orientation, and the joint location as the timestamped inputs for the proposed approach. All four measurements were used by the wearable node, and the timestamped joint location was used by the RGB-D node.
After collecting the joints’ locations as detected by the RGB-D node, the ground truth is calculated by linear interpolation of the joint location at the collected timestamps. Note that the timestamps differed between the RGB-D node and the wearable node; thus, the synthetic data were generated according to the different timestamps. The wearable node is attached to the left hand as shown in Fig. 5.10. The location is generated from the ground truth plus normally distributed noise of 5 cm. The rotation is generated from a simulated gyroscope rotating from 0 to 180 degrees and then back to 0 degrees at 0.5 rad per second with 0.1 rad per second Gaussian noise. The rotation
Figure 5.9. Case 3 (axes: X and Y in meters). (a) Without virtual measurement. (b) With virtual measurement. (c) Photos of last frame. A pedestrian walks within a highly crowded group and interacts with others while occluded by other moving objects.
axis is along the line from the left wrist to the left hand. These locations and orientations constitute the odometry measurements of the phone. The orientation measurements are also used to compute the virtual relative orientation quaternion as observations. The global acceleration is computed from the ground-truth locations. Using the location and the orientation, the global acceleration is converted to the local phone frame, and 5 cm/s^2 Gaussian noise is added to form the observations from the accelerometer.
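The location and acceleration parts of this generation procedure can be sketched as follows. This is an illustrative sketch under several assumptions: positions are in meters, the 5 cm and 5 cm/s^2 figures are treated as standard deviations, the acceleration is obtained by finite differences, and each rotation matrix maps phone-frame vectors to the global frame. The function names and the toy trajectory are hypothetical, and no gravity term is modeled because the text does not mention one.

```python
import numpy as np

rng = np.random.default_rng(0)


def interpolate_ground_truth(t_query, t_rgbd, joints_rgbd):
    """Linearly interpolate each coordinate of the joint location at t_query."""
    return np.column_stack(
        [np.interp(t_query, t_rgbd, joints_rgbd[:, k]) for k in range(3)])


def noisy_locations(truth, sigma=0.05):
    """Ground truth plus zero-mean Gaussian noise (sigma in meters, assumed)."""
    return truth + rng.normal(0.0, sigma, truth.shape)


def global_acceleration(truth, t):
    """Second time derivative of position by repeated finite differences."""
    velocity = np.gradient(truth, t, axis=0)
    return np.gradient(velocity, t, axis=0)


def local_acceleration(acc_global, rotations, sigma=0.05):
    """Rotate global acceleration into the phone frame and add noise."""
    # rotations[n] maps phone-frame vectors to the global frame (assumed),
    # so the transpose is applied to go from global to local.
    acc_local = np.einsum('nji,nj->ni', rotations, acc_global)
    return acc_local + rng.normal(0.0, sigma, acc_local.shape)


# Toy trajectory: x = t^2, so the true acceleration along x is 2 m/s^2.
t_rgbd = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
joints_rgbd = np.column_stack(
    [t_rgbd ** 2, np.zeros_like(t_rgbd), np.ones_like(t_rgbd)])

midpoint = interpolate_ground_truth(np.array([0.05]), t_rgbd, joints_rgbd)
locations = noisy_locations(joints_rgbd)
acc = global_acceleration(joints_rgbd, t_rgbd)
identity = np.tile(np.eye(3), (len(t_rgbd), 1, 1))
acc_local = local_acceleration(acc, identity, sigma=0.0)
```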
5.2.2. Orientation Drift Analysis with Synthetic Data
We observed orientation drift after several executions without global orientation measurements. This issue is examined using the generated synthetic data with a geomagnetic field sensor at one frame per second. Since the drift problem usually occurs only after several time steps, the virtual relative orientation does not have to be updated as quickly as the location and acceleration, which are updated at ten frames per second.
Figure 5.10. Motion sequence to be evaluated. Left: hand placed on top of head; Right: hand raised forward to shoulder level.
Figure 5.11. Results with and without geomagnetic field sensor (x-axis: frame; y-axis: Euclidean distance of quaternion). The red line is with the geomagnetic field sensor; the blue line is that without the sensor. The geomagnetic field sensor provides the measurements to compute the proposed virtual relative orientation.
Fig. 5.11 shows the results with and without the geomagnetic field sensor, which provides the orientation in the Earth frame when computing the proposed virtual