Chapter 2 Object Segmentation and Features Extraction
2.3 Video Object Feature Extraction
The previous module yields the current objects, that is, the objects segmented in the current frame. However, nothing is yet known about the content of these objects, so we need to extract features from them. Previous research has used many features to describe objects, such as motion vectors, texture, shape, and color histograms. In general, the more meaningful features are extracted, the more accurate the object description becomes. Because of the real-time requirement, however, we can only use a few features. We therefore require that the selected features describe (a) the position of the object, (b) the color of the object, and (c) the shape of the object. For these reasons, we select the following four features: (i) the center of the object, (ii) the color histogram, (iii) the variance of the object, and (iv) the major and minor axes of the best-fit ellipse.
For the first feature, we use the center of the object to indicate its position. We calculate the center (Cx, Cy) of a moving object by using Eq. (18), where R is the region of the moving object and N is the number of pixels in R.
\[ C_x = \frac{1}{N} \sum_{(x,y) \in R} x, \qquad C_y = \frac{1}{N} \sum_{(x,y) \in R} y \tag{18} \]
For the second feature, the color histogram, we divide each color channel into 16 bins instead of using full color. This decision is based on two reasons. First, full color information takes too much memory: a full-color histogram needs 256*256*256*4 bytes = 64 MB per object. Second, a slight change in lighting changes the object color, and coarse bins are less sensitive to such changes. We use Eq. (19) to calculate the color bins of a moving object, where ri is the value of the ith red color bin, initialized to zero, and IR(i, j, t) is the red color value of the pixel (i, j) at time t. The green color bins (gi) and the blue color bins (bi) are calculated in the same way.
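The binning step can be illustrated with a short Python sketch. It assumes 8-bit color channels and bins of equal width (16 intensity levels each); the exact rule of Eq. (19) is not reproduced here, and the function name `color_histogram` is hypothetical.

```python
from typing import Iterable, List, Tuple

BINS_PER_CHANNEL = 16                 # stated in the text
BIN_WIDTH = 256 // BINS_PER_CHANNEL   # ASSUMPTION: 8-bit channels, equal-width bins

Pixel = Tuple[int, int, int]          # (red, green, blue), each in 0..255


def color_histogram(pixels: Iterable[Pixel]) -> Tuple[List[int], List[int], List[int]]:
    """Count, per channel, how many object pixels fall into each of the 16 bins."""
    r_bins = [0] * BINS_PER_CHANNEL   # every bin is initialized to zero
    g_bins = [0] * BINS_PER_CHANNEL
    b_bins = [0] * BINS_PER_CHANNEL
    for r, g, b in pixels:
        r_bins[r // BIN_WIDTH] += 1
        g_bins[g // BIN_WIDTH] += 1
        b_bins[b // BIN_WIDTH] += 1
    return r_bins, g_bins, b_bins


# Three pixels of a hypothetical object region.
r_bins, g_bins, b_bins = color_histogram([(0, 0, 0), (15, 16, 255), (16, 31, 240)])
```

With this layout an object is described by 3 x 16 counters, compared with the 256*256*256 counters of a full-color histogram.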
For the third feature, the object variance, we use the variance of an object in the x and y directions to describe its spatial distribution. Eq. (20) is used to calculate the variance.
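A minimal Python sketch of the first and third features follows. It assumes the region R is given as a list of pixel coordinates and that the variance is the population variance (division by N); the exact form of Eq. (20) is not available here, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]  # (x, y) coordinate of a pixel inside region R


def center_and_variance(region: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (Cx, Cy, Vx, Vy) for the pixels of one object region."""
    points = list(region)
    n = len(points)
    if n == 0:
        raise ValueError("region must contain at least one pixel")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # ASSUMPTION: population variance around the object center.
    vx = sum((x - cx) ** 2 for x, _ in points) / n
    vy = sum((y - cy) ** 2 for _, y in points) / n
    return cx, cy, vx, vy


# Four corner pixels of a hypothetical 2 x 4 region.
cx, cy, vx, vy = center_and_variance([(0, 0), (2, 0), (0, 4), (2, 4)])
```

For this region the center is (1, 2) and the variances are 1 in x and 4 in y, so the larger vertical spread is visible in the feature.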
Since the third feature does not describe the shape of an object, we select a fourth feature.
We use a best-fit ellipse [15] instead of a bounding box to approximate the shape of the object, because the bounding box can change rapidly even when the posture of the object changes only slightly. A comparison between the bounding box and the best-fit ellipse is shown in Fig. 6 and Fig. 7. The increase in width of the bounding box is about 200%, and much of the enclosed space does not belong to the object. In contrast, the increase in width of the best-fit ellipse is about 150%, and the space that does not belong to the object is smaller than for the bounding box. Fig. 7 also shows that the best-fit ellipse does not fully enclose the moving object. The reason is that the best-fit ellipse is the ellipse whose second geometrical moment is equal to that of the object.
For these reasons, we select the best-fit ellipse as the fourth feature. Its calculation is more complex than that of the other features. Following [15], we obtain Jmin and Jmax, the least and greatest moments of inertia of the ellipse. We then substitute Jmin and Jmax into Eq. (21), where a and b are the lengths of the ellipse axes. Finally, Eq. (22) is used to assign the lengths of the major and minor axes.
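The moment-matching idea can be sketched in Python as follows. Eq. (21) and Eq. (22) are not reproduced in this text, so the sketch uses the standard relation for a solid ellipse with semi-axes a >= b, namely Jmin = (pi/4) a b^3 and Jmax = (pi/4) a^3 b, solved for a and b. This is an assumption and may differ from Eq. (21) by a constant factor (for example, full axis lengths instead of semi-axes). Function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def principal_moments(region: Iterable[Point]) -> Tuple[float, float]:
    """Return (j_min, j_max), the least and greatest second moments about the center."""
    pts = list(region)
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mxx = sum((x - cx) ** 2 for x, _ in pts)
    myy = sum((y - cy) ** 2 for _, y in pts)
    mxy = sum((x - cx) * (y - cy) for x, y in pts)
    # Eigenvalues of the symmetric 2x2 matrix [[mxx, mxy], [mxy, myy]].
    mean = (mxx + myy) / 2.0
    spread = math.hypot((mxx - myy) / 2.0, mxy)
    return mean - spread, mean + spread


def ellipse_axes(j_min: float, j_max: float) -> Tuple[float, float]:
    """Return (a, b), the semi-major and semi-minor axes of the moment-matched ellipse."""
    if j_min <= 0 or j_max <= 0:
        raise ValueError("moments of inertia must be positive")
    scale = (4.0 / math.pi) ** 0.25
    a = scale * (j_max ** 3 / j_min) ** 0.125
    b = scale * (j_min ** 3 / j_max) ** 0.125
    return a, b


# Moments of an ideal ellipse with a = 4 and b = 2 recover the same axes.
a, b = ellipse_axes(math.pi / 4 * 4 * 2 ** 3, math.pi / 4 * 4 ** 3 * 2)
```

Because the ellipse only matches the second moments of the region, parts of the object may fall outside it, which agrees with the observation about Fig. 7.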
Fig. 6. The increase of width of bounding box is 200%
Fig. 7. The increase of width of best-fit-ellipse is about 150%
Chapter 3
Video Object Tracking Algorithm
Video object tracking is an important module in our proposed system. We use the tracking information to analyze human behavior. In this chapter, we review related work on video object tracking in Section 3.1 and present our tracking algorithm in detail in Section 3.2. In Section 3.3, we point out some drawbacks of the proposed tracking algorithm.
3.1 Related Work of Video Object Tracking
Video object tracking is an important and frequently discussed research topic. Its objective is to match the detected objects in the current frame to the corresponding objects detected previously. The tracked position and the shape of objects can be used to form the object trajectories for later human behavior analysis.
The object tracking algorithms first take the detected object masks from the segmentation algorithms as input data and try to match them with the objects detected earlier, using features such as position, shape, and color. For example, Oberti et al. [16] use the shape of the object corners to track video objects. Some tracking algorithms also take motion information into consideration. For example, Kim et al. use the direction of motion and the variation of speed to compute a smoothness feature as the matching criterion [17]. Chen uses motion as a constraint to find matching objects [18]. Other algorithms [19-22] adopt Kalman filtering, a linear estimation process that recursively estimates the current state and updates its prediction to track the position of the objects.
The precision of the prediction involves two errors: the processing error and the measurement error. Because sometimes there are errors in the segmentation process due to the cluttered scene, the object masks would not be very accurate and hence the measurement errors would be large. Besides, some abrupt movements of the objects such as waving of hands will make the processing error large. The prediction error may not converge quickly if both the processing error and measurement errors are large. Thus, it may be difficult to match correctly due to the uncertainty of the prediction. In this thesis, instead of using Kalman Filtering, our algorithm uses the color and shape as features. The occlusion and the split of objects can also be handled in our tracking algorithm.
3.2 Video Object Tracking
Tracking is a difficult problem in video surveillance systems because the tracked objects may be occluded. Many methods have been proposed to solve this problem [16-22]. Generally, tracking methods can be classified into two categories. One category [19-21] estimates the motion of the objects and minimizes an error function to track them. The other category measures the similarity between current objects and previous objects and maximizes the similarity measure to track them. We choose the similarity measure method in this system. The detailed reasons are explained in Section 3.2.1.
We observe that object occlusion can happen either inside or outside the camera view. In the first case, we can observe the occlusion of the objects. In the second case, we cannot observe the occlusion unless we can segment the occluded objects or the occluding objects split within the camera view.
Because of these two different cases, Jung [19] defines the first case as EXPLICIT OCCLUSION and the second case as IMPLICIT OCCLUSION. In the case of explicit occlusion, the occlusion of objects can be detected and their trajectories can be reconstructed. In the case of implicit occlusion, however, the splitting of objects can be detected but the trajectories of the objects are hard to reconstruct. We explain this problem in Section 3.3.
Although object-level information, such as color and shape, can be extracted through the segmentation of video objects, higher-level object semantics can be extracted from the object trajectories. The object tracking process therefore plays a key role in human behavior analysis. Our tracking module takes the extracted object features as input and tracks all the objects to obtain their trajectories. The flow chart of the tracking algorithm is shown in Fig. 8. We first use the single object matching algorithm to find all matches between single objects in the previous frame and single objects in the current frame. If we find a match, we update the trajectories of both the current and the previous object. The details of the single object matching algorithm are presented in Section 3.2.1.
After the single object matching algorithm, we classify the remaining unmatched objects into four classes. The first class consists of objects that are occluded in the current frame; we call these current merge objects. The second class consists of objects that split in the current frame; we call these current split objects. The third class consists of objects that disappear in the previous frame or in the current frame. The fourth class consists of new objects that appear in the current frame. To handle these unmatched objects, we develop a multiple objects matching algorithm to detect current merge and current split objects.
In the multiple objects matching algorithm, we generate all combinations of objects and call these temporary objects virtual objects. If we find a match for one of these virtual objects, we conclude that a current merge object or a current split object has been detected. For example, to detect the current merge objects, we generate all combinations of the possible candidates in the previous frame. For each combination, that is, each virtual object, if we find a match between the virtual object and a current object, we conclude that a current merge object has been detected. The details of the multiple objects matching algorithm are presented in Section 3.2.2. If no match is found by the multiple objects matching algorithm, we regard the remaining objects as disappeared objects or new objects. The complete flow chart of the tracking algorithm is presented in Section 3.2.3.
Fig. 8. The flow chart of tracking algorithm
3.2.1 Single Object Matching Algorithm
The object trajectories can be obtained by matching the current video objects with the previously tracked video objects. In the object tracking literature, some algorithms [19-21] adopt Kalman filtering to estimate and track the objects. It is appealing because it recursively estimates the object states and updates the predictions.
In the ideal situation, when the moving paths are very smooth and the object masks are very accurate, the prediction error converges quickly because both the measurement error and the processing error are small. However, the detected object boundaries may contain errors due to cluttered scenes in real environments, so the measurement errors become large. In addition, the path of a moving object may not be as smooth as expected. For example, if we connect the mass centers of a walking person, the connected path looks like a zigzag rather than a straight line, because actions such as waving hands and striding affect the mass center significantly. When both the measurement error and the processing error are high, the prediction error may not converge quickly. Thus, it is difficult to track objects and to handle complicated conditions such as object occlusion because of the uncertainty of the estimation.
In our single object matching algorithm, we match an object in the previous frame to an object in the current frame by using a score function, stated in Eq. (23). The score function uses the color histogram, the object center, and the major and minor axes extracted in the previous module to calculate the similarity between objects. Since color is a very important feature during matching, we assign 70% of the weight to color similarity. The remaining 30% is assigned to shape similarity.
The ColorSimilarity function, which calculates the color similarity between two objects O1 and O2, is shown in Eq. (24), where ri1 and ri2 are the values of the ith red color bin for O1 and O2, respectively (and similarly for gi1, gi2, bi1 and bi2), and N1 and N2 are the numbers of pixels of O1 and O2. The value of the ColorSimilarity function lies between 0 and 1.
The ShapeSimilarity function, which calculates the shape similarity between two objects O1 and O2, is shown in Eq. (25), where major1 and major2 are the lengths of the major axes of O1 and O2, respectively, and minor1 and minor2 are the lengths of their minor axes. The value of the ShapeSimilarity function lies between 0 and 1.
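The weighting of the two similarities can be sketched in Python. Eq. (23) to Eq. (25) are not reproduced in this text, so both similarity functions below are stand-ins chosen only because they lie between 0 and 1: a pixel-count-normalized histogram intersection for color and a min/max axis ratio for shape. They are assumptions, not the formulas of this thesis. The `TrackedObject` container is hypothetical; only the 70%/30% weighting comes from the text.

```python
from dataclasses import dataclass
from typing import List

COLOR_WEIGHT = 0.7  # stated in the text
SHAPE_WEIGHT = 0.3  # the remaining weight


@dataclass
class TrackedObject:      # hypothetical container for the extracted features
    r_bins: List[int]
    g_bins: List[int]
    b_bins: List[int]
    pixel_count: int      # N in the text, assumed > 0
    major: float          # major axis length, assumed > 0
    minor: float          # minor axis length, assumed > 0


def color_similarity(o1: TrackedObject, o2: TrackedObject) -> float:
    """ASSUMPTION: normalized histogram intersection, averaged over the channels."""
    total = 0.0
    for bins1, bins2 in ((o1.r_bins, o2.r_bins), (o1.g_bins, o2.g_bins), (o1.b_bins, o2.b_bins)):
        total += sum(min(c1 / o1.pixel_count, c2 / o2.pixel_count)
                     for c1, c2 in zip(bins1, bins2))
    return total / 3.0


def shape_similarity(o1: TrackedObject, o2: TrackedObject) -> float:
    """ASSUMPTION: mean of the min/max ratios of the major and minor axes."""
    major_ratio = min(o1.major, o2.major) / max(o1.major, o2.major)
    minor_ratio = min(o1.minor, o2.minor) / max(o1.minor, o2.minor)
    return (major_ratio + minor_ratio) / 2.0


def score(o1: TrackedObject, o2: TrackedObject) -> float:
    """Weighted sum of color and shape similarity."""
    return COLOR_WEIGHT * color_similarity(o1, o2) + SHAPE_WEIGHT * shape_similarity(o1, o2)


dark = TrackedObject([4] + [0] * 15, [4] + [0] * 15, [4] + [0] * 15, 4, 10.0, 5.0)
bright = TrackedObject([0] * 15 + [4], [0] * 15 + [4], [0] * 15 + [4], 4, 10.0, 5.0)
same_score = score(dark, dark)
different_color_score = score(dark, bright)
```

Two objects with the same shape but disjoint colors can reach at most the shape weight, which illustrates why color dominates the match.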
If we consider only color and shape, the matching might be incorrect. For example, in Fig. 9(a), matching O1 to O3 is not correct although their color and shape are similar. Since O1 is closer to O2 than to O3, as shown in Fig. 9(b), matching O1 with O2 is better than matching O1 with O3. Based on this observation, the distance between objects is also taken into account during matching.
(b). Merge current and previous frame into one frame
Fig. 9. A matching example for three objects O1, O2 and O3
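One way to take distance into account, in line with the distance threshold used by the single object matching algorithm of Fig. 12(b), is to zero the score of any pair that is too far apart. The Python sketch below assumes that the distance is measured between object centers; the function name and the callback `score_fn` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Center = Tuple[float, float]


def gated_score_table(prev_centers: Sequence[Center],
                      cur_centers: Sequence[Center],
                      score_fn: Callable[[int, int], float],
                      distance_threshold: float) -> List[List[float]]:
    """Build SV(Pi, Cj): the score for nearby pairs and 0 for distant pairs."""
    table = []
    for i, p in enumerate(prev_centers):
        row = []
        for j, c in enumerate(cur_centers):
            # ASSUMPTION: Euclidean distance between the object centers.
            close = math.dist(p, c) < distance_threshold
            row.append(score_fn(i, j) if close else 0.0)
        table.append(row)
    return table


# One previous object, one nearby and one distant current object.
table = gated_score_table([(0.0, 0.0)], [(3.0, 4.0), (30.0, 40.0)],
                          lambda i, j: 0.9, distance_threshold=10.0)
```

Here the distant candidate receives a score of 0 even though its color and shape score would be high, which is the situation shown in Fig. 9.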
Before introducing our single object matching algorithm, we must point out a critical problem in scoring. Consider the case in Fig. 10, where Pi and Cj are the ith previous object and the jth current object, respectively, and the numeric values are the matching scores between Pi and Cj. If we use a greedy algorithm to find the matching, we obtain the answer shown in Fig. 10(a). We define the optimal solution as the matching with the largest summed score. The answer in Fig. 10(a) is not optimal; the optimal solution is shown in Fig. 10(b). However, finding the optimal solution by exhaustive search requires checking all N! possible assignments, which is too expensive under the real-time requirement.
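The gap between a greedy answer and the best summed score can be reproduced with a small Python sketch. The 2x2 score table is hypothetical (the values of Fig. 10 are not available here), and "greedy" is assumed to mean repeatedly taking the highest remaining score. The exhaustive search enumerates every assignment, so it is shown only to make the gap visible on a tiny table.

```python
from itertools import permutations
from typing import List, Tuple

Matching = List[Tuple[int, int]]  # (previous index, current index) pairs


def greedy_matching(scores: List[List[float]]) -> Tuple[float, Matching]:
    """ASSUMPTION: greedy means taking the highest remaining score first."""
    n = len(scores)
    pairs = sorted(((scores[p][c], p, c) for p in range(n) for c in range(n)), reverse=True)
    used_prev, used_cur, matching, total = set(), set(), [], 0.0
    for s, p, c in pairs:
        if p not in used_prev and c not in used_cur:
            used_prev.add(p)
            used_cur.add(c)
            matching.append((p, c))
            total += s
    return total, sorted(matching)


def exhaustive_matching(scores: List[List[float]]) -> Tuple[float, Matching]:
    """Try all n! assignments and keep the one with the largest summed score."""
    n = len(scores)
    best_total, best_perm = max(
        (sum(scores[perm[c]][c] for c in range(n)), perm)
        for perm in permutations(range(n))
    )
    return best_total, [(best_perm[c], c) for c in range(n)]


# Hypothetical scores: rows are previous objects, columns are current objects.
scores = [[0.9, 0.8],
          [0.8, 0.1]]
greedy_total, greedy_pairs = greedy_matching(scores)
best_total, best_pairs = exhaustive_matching(scores)
```

In this table the greedy choice of the 0.9 pair forces the 0.1 pair, while the best assignment uses the two 0.8 pairs.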
Since the greedy algorithm may fail to find an optimal solution, we develop an N-best algorithm to find a better answer in an NxN scoring table. The main idea of the N-best algorithm is to keep the N best candidates at each matching step. We illustrate the idea in Fig. 11 by using the example in Fig. 10. In Fig. 11, we maintain a 3x1 array for recording the 3 best summed scores and a 3x3 array for recording the traced paths. The first iteration, shown in Fig. 11(a), finds the 3 best matchings from P1, P2 and P3 to C1. During the second iteration, shown in Fig. 11(b), we check the 3x3 matrix to avoid selecting a previous object that is already on the traced path when matching P1, P2 and P3 to C2, and we sum the scores for each possible path. Since we only maintain the 3 highest scores, we delete the other scores and their traced paths from Fig. 11(b); the result is shown in Fig. 11(c). Analogously, in the last iteration we find the non-repeated paths from P1, P2 and P3 to C3 and sum the scores along each path. The highest score in Fig. 11(d) is the final answer. Its path is P3, P2, P1, meaning that P3, P2 and P1 match C1, C2 and C3, respectively. The N-best algorithm and the complete single object matching algorithm are stated in Fig. 12(a) and Fig. 12(b), respectively.
(a). Not perfect match
(b). Perfect match
Fig. 10. The perfect matching
Fig.11. Illustration of N-best algorithm
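The N-best idea can be read as a beam search over the columns of the score table. The Python sketch below follows that reading and is not a transcription of Fig. 12(a): it keeps the N best partial paths after each current object and never reuses a previous object within a path. The score table is hypothetical, and the `beam_width` parameter is added for illustration.

```python
from typing import List, Optional, Tuple


def n_best_matching(scores: List[List[float]],
                    beam_width: Optional[int] = None) -> Tuple[float, List[int]]:
    """Return (summed score, path); path[c] is the previous object matched to current object c."""
    n = len(scores)
    width = n if beam_width is None else beam_width
    kept: List[Tuple[float, List[int]]] = [(0.0, [])]   # (summed score, traced path)
    for c in range(n):                                  # one iteration per current object
        candidates = []
        for total, path in kept:
            for p in range(n):
                if p not in path:                       # previous object not yet on this path
                    candidates.append((total + scores[p][c], path + [p]))
        candidates.sort(key=lambda item: item[0], reverse=True)
        kept = candidates[:width]                       # keep only the best `width` paths
    return kept[0]


# Hypothetical scores: rows are previous objects P1..P3, columns are current objects C1..C3.
scores = [[0.9, 0.8, 0.1],
          [0.8, 0.1, 0.2],
          [0.2, 0.3, 0.7]]
total, path = n_best_matching(scores)
```

For this table the sketch returns the path [1, 0, 2] (P2, P1 and P3 matched to C1, C2 and C3). With a beam width of 1 the search degenerates to a column-by-column greedy choice and ends with a lower summed score.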
(a). N-best algorithm.
Input: The N*N score matrix M.
Output: The matching for current frame objects C1, C2, …, CN
Procedure:
  S: An N*1 matrix, used to record the summing score
  T: An N*N matrix, used to record the traced path
  U: An N*N matrix, a temporary buffer for sorting
  V: An N*1 matrix, a temporary buffer for sorting
  Initialize S, T to 0
  For i = 1 to N
    Initialize U, V to 0
    For j = 1 to N
      For k = 1 to N
        // calculate the summing score
        x = S[j] + M[k][i]
        // if the path has not been traced
        If T[j][k] = 0
          // sort the summing score and the corresponding traced path
          For m = 1 to N
            If x > V[m]
              Insert x before V[m] and insert a row before U[m]
              Delete V[N+1] and delete the row U[N+1]
              For n = 1 to N
                U[m][n] = T[j][n]
              U[m][k] = i
              break
(b). The Single Object Matching Algorithm
Input: Current frame objects and previous frame objects
Output: The matching pairs
Procedure:
  For each object Pi in the previous frame
    For each object Cj in the current frame
      If the distance between Pi and Cj < Distance Threshold, then
        SV(Pi, Cj) = Score(Pi, Cj)
      Else
        SV(Pi, Cj) = 0
  Use the N-best algorithm to find the matching for current frame objects C1, C2, …, CN.
  Let the answer returned by the N-best algorithm be A1, A2, …, AN.
  The matching pairs are (A1, C1), (A2, C2), …, (AN, CN)
  For each matching pair, if SV(Ai, Ci) < Score Threshold
    Delete (Ai, Ci) from the matching pairs
  Return the matching pairs
Fig. 12. The N-best algorithm and the single object matching algorithm
3.2.2 Multiple Objects Matching Algorithm
The single object matching algorithm only matches single objects in the previous frame with single objects in the current frame. Since current merge objects and current split objects differ from single objects, the single object matching algorithm fails to find a match for them. For this reason, we develop a multiple objects matching algorithm to find the current merge objects and the current split objects. The main idea is to guess the possible candidates for current merge objects and current split objects.
Before introducing the multiple objects matching algorithm, we explain how the concept of transitive closure is used to eliminate impossible candidates. First, we define a relation Ri(O1,O2) to indicate the possibility of collision between objects O1 and O2 in the ith frame. If the distance between O1 and O2 is less than a collision_threshold, we set Ri(O1,O2) to 1. Otherwise, we set Ri(O1,O2) to 0. The relation Ri(O1,O2) can be expressed as in Eq. (26).
\[ R_i(O_1, O_2) = \begin{cases} 1, & \text{if } \mathrm{distance}(O_1, O_2) < \mathrm{collision\_threshold} \\ 0, & \text{otherwise} \end{cases} \tag{26} \]
Second, consider the three objects in Fig. 13(a) and the tabular form of the relation shown in Fig. 13(b). If objects O1 and O3 move toward O2, we obtain the result in Fig. 13(c). Now consider the relationship between the possible collision relation Ri and the possible occlusion relation Ti in the ith frame. In the example of Fig. 13, the transitive closure of the possible collision relation is the possible occlusion relation.
In general, the possible occlusion relation Ti is equal to the transitive closure of the relation Ri. Since Ri is reflexive and symmetric, Ti is an equivalence relation. The elements of an equivalence class of Ti form a possible occlusion set. Therefore, we can use the relation Ri to reduce the number of possible occlusion candidates.
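The equivalence classes of the transitive closure can be computed without building the closure explicitly, for example with a union-find structure. The Python sketch below assumes that the distance between two objects is the Euclidean distance between their centers; the function name `occlusion_groups` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Center = Tuple[float, float]


def occlusion_groups(centers: Sequence[Center], collision_threshold: float) -> List[List[int]]:
    """Group object indices into the equivalence classes of the possible occlusion relation."""
    parent = list(range(len(centers)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            # R(Oi, Oj) = 1 when the two objects are closer than the threshold.
            if math.dist(centers[i], centers[j]) < collision_threshold:
                parent[find(i)] = find(j)   # merge the two classes

    groups: Dict[int, List[int]] = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Objects 0 and 2 are not close to each other, but both are close to object 1.
groups = occlusion_groups([(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (100.0, 0.0)],
                          collision_threshold=6.0)
```

As in Fig. 13, the first three objects end up in one possible occlusion set although the two outer objects do not collide directly, and the distant fourth object stays alone.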
(a). Previous frame
(b). The relation in tabular form (assuming d12, d23 < d and d13 > d)
(c). Current frame
Fig. 13. The relationship between possible collision and possible occlusion
After finding the possible occlusion candidates, we use these candidates to generate all possible occlusion objects as virtual objects and accumulate the color bin values of their component objects. After generating the virtual objects and their colors, we use the score function in Eq. (27) to calculate the similarity.
\[ \mathrm{Score}(O_1, O_2) = \mathrm{ColorSimilarity}(O_1, O_2) \tag{27} \]
Since it is hard to predict the shape of occlusion objects, we only use the color information to calculate the score. Finally, we state the multiple objects matching algorithm in Fig. 14.
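Generating virtual objects from one possible occlusion set and scoring them by color can be sketched in Python. The sketch assumes that each object is described by one flattened list of color bin counts and that ColorSimilarity is a normalized histogram intersection; the latter stands in for Eq. (24), which is not reproduced here. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Histogram = List[int]  # flattened color bin counts of one object


def merge_histograms(histograms: Sequence[Histogram]) -> Histogram:
    """Accumulate the color bins of the objects that form one virtual object."""
    return [sum(bins) for bins in zip(*histograms)]


def color_similarity(h1: Histogram, h2: Histogram) -> float:
    """ASSUMPTION: normalized histogram intersection, a stand-in for Eq. (24)."""
    n1, n2 = sum(h1), sum(h2)
    if n1 == 0 or n2 == 0:
        return 0.0
    return sum(min(a / n1, b / n2) for a, b in zip(h1, h2))


def best_virtual_object(group: Dict[str, Histogram],
                        target: Histogram) -> Tuple[float, Tuple[str, ...]]:
    """Try every combination of two or more objects in the group against the target."""
    best: Tuple[float, Tuple[str, ...]] = (-1.0, ())
    names = sorted(group)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            virtual = merge_histograms([group[name] for name in combo])
            s = color_similarity(virtual, target)
            if s > best[0]:
                best = (s, combo)
    return best


# Three previous objects with distinct colors; the current object looks like A and B merged.
group = {"A": [4, 0, 0], "B": [0, 4, 0], "C": [0, 0, 4]}
best_score, best_combo = best_virtual_object(group, target=[4, 4, 0])
```

The number of combinations grows exponentially with the size of the group, which is why restricting candidates to one possible occlusion set matters.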
Inputs: Two sets of objects S1 and S2
Output: The matching objects
Procedure:
  We find a match from the set S1 to the set S2
  For each object in S1,
    Find the transitive closure of R and find the equivalence classes of T
  For each equivalence class C1, C2, …, Cn
    For each possible occlusion object generated from Ci, say Oj
      Accumulate the color bin
      For each object Ok in S2
        SV(Ci, Oj, Ok) = score(Oj, Ok)
    Find the maximum score in SV(Ci, Oj, Ok)
    Remove those elements in Oj from Ci
Fig. 14. The multiple objects matching algorithm
Similarly, finding the best matching here is computationally expensive. We use a greedy approach in the multiple objects matching algorithm instead of the N-best algorithm because the N-best algorithm is too complex to apply to this problem.
3.2.3 Objects Tracking Algorithm
In this section, we briefly explain the tracking module in our system. The flow chart of our tracking algorithm is shown in Fig. 15. First, we use the single object matching algorithm to find the matches between the previous segmented objects and the current segmented objects. If we find a match, we update the trajectories of both objects. Otherwise, we pass the remaining unmatched objects to the multiple objects matching module. The first step of the multiple objects matching algorithm finds the current merge objects. To do so, we generate all possible virtual objects from the remaining previous objects. If we find a match between those virtual objects and the current objects, we obtain the current merge objects and update the trajectories of the objects involved. Otherwise, we pass the remaining unmatched objects to the next step of the multiple objects matching algorithm. To find the current split objects, we generate all possible virtual objects from the remaining current objects. If we find a match between those virtual objects and previous objects, then we find the current split objects and