Virtual Contour Construction Using Spatio-Temporal

2 Virtual Contour Guided Video Object Inpainting Using Posture

2.2 Occluded Object Completion Using Posture Sequence

2.2.3 Virtual Contour Construction Using Spatio-Temporal

information left in a badly damaged object is usually insufficient to reconstruct the object properly by using spatio-temporal clues.

Furthermore, completing an object frame-by-frame often causes temporal discontinuity in the object’s appearance and motion, since a frame-wise completion process does not consider an object’s temporal dependency in consecutive frames. Such temporal discontinuity results in visually annoying artifacts like flickering and jerkiness. To ensure that a completed object is visually pleasing, it is important to extract a set of features from a damaged object in a number of consecutive frames. As a result, the features not only represent the object’s characteristics (e.g., motion, appearance, and posture), but also take its temporal continuity into account.

Manifold learning based methods [10][25] have been proposed to recover the damaged or missing poses of an occluded object. Although the consecutive poses of an object with regular, cyclic motion can be well represented by a low-dimensional manifold embedded in a high-dimensional visual space, this is usually not the case for poses with non-regular motion (e.g., transitions between two types of motion). As a result, reconstructing a high-dimensional video object with irregular or non-cyclic motion from the object’s low-dimensional manifold approximation usually leads to annoying artifacts (e.g., ghost images).

As mentioned earlier, we use spatio-temporal slices of a video to derive virtual object contours, which are then used as features to infer the occluded object poses. More specifically, after object extraction and removal, we sample a 3-D video volume comprised of several consecutive frames to obtain a set of directional 2-D spatio-temporal slices, as shown in Figure 2.4. For example, if a 3-D video volume (Figure 2.4 (a)) is sampled at different Y values (Figure 2.4 (b)), each resulting XT slice represents the horizontal trajectory of an object over time. The trajectory fully captures an object’s motion if the motion is purely horizontal. Other directional sampling schemes can be used to deal with objects that have different motion directions.

Note that motion that is not purely horizontal will cause an object’s size to vary over time due to the zoom-in/zoom-out effect, as shown in Figure 2.4(c). In this case, posture alignment and normalization can be used to remove the effect of differing posture scales. Without loss of generality, we use the largest posture of an object as a reference for aligning and normalizing the other postures. First, we establish the correspondence between the contour points of every two adjacent postures by shape matching [23][24]. The affine transformation parameters between the largest posture and the others can then be estimated from the corresponding points using least squares optimization. As a result, all postures are aligned and normalized to the largest posture via the affine transformations.

As shown in Figure 2.4(d), after foreground object removal and posture alignment, object occlusion leaves the object’s trajectories in the spatio-temporal slices incomplete. The missing regions of the object trajectories in the 2-D spatio-temporal slices must be completed using an image inpainting method before a virtual contour is composed. Because an object’s occlusion period is usually short, we assume that the occluded part of a motion trajectory in a 2-D slice can be approximated by a line. Based on this assumption, the occluded part in each directionally sampled slice can be inpainted well. Since the trajectory of an object on each 2-D slice records the locations of the same part of the object over time, the reconstructed trajectories will be continuous as long as the missing regions of the trajectories are completed properly, thereby preserving the temporal continuity of the object.
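The least-squares affine estimation used for posture alignment can be illustrated with a minimal sketch. It assumes the contour point correspondences have already been found by shape matching; the function names `fit_affine` and `apply_affine` are hypothetical, and the solver is a plain normal-equations approach rather than the specific optimizer used in this work.

```python
def solve3(m, v):
    """Solve the 3x3 system m * w = v by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map from src points to dst points.

    Returns (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    # Both output coordinates share the same normal matrix A^T A, A = [x y 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * u
            aty[i] += row[i] * v
    a, b, c = solve3(ata, atx)
    d, e, f = solve3(ata, aty)
    return a, b, c, d, e, f


def apply_affine(params, point):
    """Map one (x, y) point through the affine parameters."""
    a, b, c, d, e, f = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


# Hypothetical correspondences: a small posture mapped onto a reference
# posture that is twice as large and shifted by (3, -1).
small = [(0, 0), (1, 0), (0, 1), (1, 1)]
reference = [(3, -1), (5, -1), (3, 1), (5, 1)]
params = fit_affine(small, reference)
```

At least three non-collinear correspondences are needed; with more points the fit averages out matching noise, which is the reason for using least squares instead of an exact three-point solution.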

Figure 2.4 Sampling a 3-D video volume comprised of several consecutive frames: (a) the original frame; (b) the object’s trajectory on a sampled XT plane, indicated by the green lines in (a); (c) the original frame; (d) the object’s trajectory on a sampled YT plane, indicated by the red lines in (c); (e) 2-D spatio-temporal slices sampled from a video shot, where the object’s size varies due to non-pure horizontal motion; and (f) the object trajectories on a sampled XT plane after the occluded regions are removed.
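The directional sampling illustrated in Figure 2.4 can be sketched as follows. This is a minimal illustration that assumes the video volume is stored as nested lists indexed as `volume[t][y][x]`; the helper names `xt_slice` and `yt_slice` are hypothetical.

```python
def xt_slice(volume, y):
    """XT slice at image row y: one row per frame t, one column per x."""
    return [list(frame[y]) for frame in volume]


def yt_slice(volume, x):
    """YT slice at image column x: one row per frame t, one column per y."""
    return [[row[x] for row in frame] for frame in volume]


# Toy volume: 4 frames of 3x5 pixels with a one-pixel object on row 1
# that moves one pixel to the right in every frame.
T, H, W = 4, 3, 5
volume = [[[0] * W for _ in range(H)] for _ in range(T)]
for t in range(T):
    volume[t][1][t] = 1

horizontal = xt_slice(volume, 1)  # the object traces a diagonal line
vertical = yt_slice(volume, 2)    # the object crosses column 2 only at t = 2
```

In the XT slice the purely horizontal motion appears as a straight diagonal trajectory, which is the structure the later inpainting step relies on.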

To obtain continuous object trajectories, we use the patch-based image inpainting scheme proposed in [16] to complete missing regions in the spatio-temporal slices. The method first determines the filling order of the missing regions based on the confidence term and data term as follows:

$$P(p) = C(p) \cdot D(p), \tag{2.4}$$

where P(p) represents the priority of a missing region p, and C(p) and D(p) denote the confidence term and the data term expressed in (2.5) and (2.6), respectively.

$$D(p) = \frac{\left| \nabla I_p^{\perp} \cdot n_p \right|}{\alpha}, \tag{2.6}$$

where $|\Psi_p|$ represents the area of region $\Psi_p$, $\alpha$ is a normalization factor, $n_p$ denotes the unit vector orthogonal to the front $\delta\Omega$ at point $p$, and $\perp$ stands for the orthogonal operator, as illustrated in Figure 2.5.
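For reference, the confidence term is commonly written as shown below in exemplar-based inpainting. This is an assumption based on the notation used here ($\Psi_p$, $\delta\Omega$, $\alpha$), not a transcription of (2.5); $I$ is taken to be the whole image and $\Omega$ the missing region, so that $I - \Omega$ is the known region.

```latex
C(p) = \frac{\sum_{q \,\in\, \Psi_p \cap (I - \Omega)} C(q)}{|\Psi_p|}
```

Under this form, $C(q)$ is typically initialized to 1 for known pixels and 0 for missing ones, so $C(p)$ measures the fraction of reliable information in the patch around $p$.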

Figure 2.5 The notations used for the data and confidence terms in patch-based image inpainting [14].
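The priority in (2.4) can be sketched in code as follows. This is a minimal illustration that assumes the standard exemplar-based forms of the two terms (confidence as the summed confidence of the known pixels in a patch divided by the patch area, and data term as the absolute projection of the isophote onto the front normal divided by a normalization factor). The function names, the `(x, y)` vector convention, and the default `alpha=255.0` are assumptions for illustration.

```python
def confidence_term(conf, missing, p, half):
    """C(p): summed confidence of known pixels in the patch around p, over patch area.

    conf and missing are 2-D lists indexed [y][x]; p is (y, x); the patch
    has side 2*half + 1 and is clipped at the image border.
    """
    py, px = p
    total, area = 0.0, 0
    for y in range(py - half, py + half + 1):
        for x in range(px - half, px + half + 1):
            if 0 <= y < len(conf) and 0 <= x < len(conf[0]):
                area += 1
                if not missing[y][x]:
                    total += conf[y][x]
    return total / area


def data_term(gradient, normal, alpha=255.0):
    """D(p): |isophote . normal| / alpha, with vectors given as (x, y)."""
    gx, gy = gradient
    isophote = (-gy, gx)  # image gradient rotated by 90 degrees
    nx, ny = normal
    return abs(isophote[0] * nx + isophote[1] * ny) / alpha


def priority(conf, missing, p, half, gradient, normal, alpha=255.0):
    """P(p) = C(p) * D(p), as in (2.4)."""
    return confidence_term(conf, missing, p, half) * data_term(gradient, normal, alpha)


# Toy 3x3 example: two missing pixels, confidence 1 for known pixels.
missing = [[False, False, False],
           [False, True, True],
           [False, False, False]]
conf = [[0.0 if m else 1.0 for m in row] for row in missing]
c = confidence_term(conf, missing, (1, 1), 1)
d = data_term((0.0, 10.0), (1.0, 0.0))
```

A patch gets high priority when it is mostly surrounded by known pixels and when a strong edge flows into the missing region, so linear structures such as trajectories in a slice tend to be continued first.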

Based on the filling order, a missing region is filled with the most similar neighboring patches (measured by the sum of squared differences).
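The patch search by sum of squared differences (SSD) can be sketched as follows. This is a simplified illustration, not the exact procedure of the cited method: it assumes a grayscale slice stored as a 2-D list, a target patch that lies fully inside the image, and an exhaustive search over fully known source patches. The names `best_source_patch` and `fill_from` are hypothetical.

```python
def best_source_patch(image, missing, p, half):
    """Return (center, ssd) of the fully known patch most similar to the patch at p.

    The SSD is evaluated only over the pixels of the target patch that are known.
    """
    h, w = len(image), len(image[0])
    py, px = p
    offsets = [(dy, dx) for dy in range(-half, half + 1)
               for dx in range(-half, half + 1)]
    known = [(dy, dx) for dy, dx in offsets if not missing[py + dy][px + dx]]
    best, best_ssd = None, float("inf")
    for cy in range(half, h - half):
        for cx in range(half, w - half):
            # Skip candidates that contain any missing pixel.
            if any(missing[cy + dy][cx + dx] for dy, dx in offsets):
                continue
            ssd = sum((image[cy + dy][cx + dx] - image[py + dy][px + dx]) ** 2
                      for dy, dx in known)
            if ssd < best_ssd:
                best, best_ssd = (cy, cx), ssd
    return best, best_ssd


def fill_from(image, missing, p, source, half):
    """Copy the source patch into the missing pixels of the patch at p (in place)."""
    py, px = p
    sy, sx = source
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if missing[py + dy][px + dx]:
                image[py + dy][px + dx] = image[sy + dy][sx + dx]
                missing[py + dy][px + dx] = False


# Toy slice: every row is 10 20 30 40 20 ? 40, with column 5 missing.
image = [[10, 20, 30, 40, 20, 0, 40] for _ in range(3)]
missing = [[x == 5 for x in range(7)] for _ in range(3)]
source, ssd = best_source_patch(image, missing, (1, 5), 1)
fill_from(image, missing, (1, 5), source, 1)
```

In this toy slice the neighbours 20 and 40 of the missing column match the patch centred on column 2, so the gap is filled with the value 30 found there.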

After completing each spatio-temporal slice of a video frame, we use the Sobel edge detector to find the boundary of the object’s trajectory in the slice. Then, the completed spatio-temporal slices are combined to construct a virtual contour, which is used to guide the subsequent posture mapping and retrieval process.
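The boundary detection step can be illustrated with a small sketch. The Sobel operator below is the standard 3x3 one; the threshold value and the way edge points from several XT slices are gathered into per-frame contour points (`contour_points`) are assumptions made for illustration, since the exact combination rule is not spelled out in the text.

```python
def sobel_edges(img, threshold):
    """Boolean edge map from the Sobel gradient magnitude (border pixels stay False)."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 >= threshold
    return edges


def contour_points(edge_slices, t):
    """Collect (x, y) edge points for frame t from XT edge maps keyed by row y."""
    return sorted((x, y)
                  for y, edges in edge_slices.items()
                  for x, on in enumerate(edges[t]) if on)


# Toy XT slice (rows are frames): a static object occupying x = 3..4.
xt = [[0, 0, 0, 1, 1, 0, 0] for _ in range(5)]
edges = sobel_edges(xt, 4)
points = contour_points({0: edges, 1: edges}, 2)
```

Each XT slice contributes the left and right trajectory boundaries at its own row y, so stacking the slices yields a set of boundary points for every frame.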

Sometimes, image inpainting errors lead to imprecise virtual contours, making it difficult to retrieve correct postures for object inpainting. To resolve this problem, we use the object tracking scheme proposed in [27] to correct image inpainting errors. To inpaint an occluded object, our method tracks the object in the non-occlusion period to obtain its positions. Each spatio-temporal slice is then divided into two regions, the background region and the foreground trajectory, which allows us to apply image inpainting to the regions separately and thereby avoid inpainting errors. That is, available foreground information is only used to fill the missing part of the foreground region, and available background information is only used to fill the missing part of the background region. Figure 2.6 shows that the tracking-based correction technique significantly reduces the distortion of a virtual contour caused by inpainting errors.
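The idea of filling each region only from sources of the same region can be sketched as follows. For brevity, this sketch replaces patch-based inpainting with a nearest-known-pixel fill, and it assumes that a foreground/background label map for the whole slice (including the occluded part) is already available from tracking. The function name `fill_by_region` is hypothetical.

```python
def fill_by_region(slice_, missing, is_foreground):
    """Fill each missing pixel from the nearest known pixel with the same label."""
    h, w = len(slice_), len(slice_[0])
    known = [(t, x) for t in range(h) for x in range(w) if not missing[t][x]]
    out = [row[:] for row in slice_]
    for t in range(h):
        for x in range(w):
            if not missing[t][x]:
                continue
            same = [(kt, kx) for kt, kx in known
                    if is_foreground[kt][kx] == is_foreground[t][x]]
            if not same:
                continue  # no usable source in this region; leave unfilled
            st, sx = min(same, key=lambda q: (q[0] - t) ** 2 + (q[1] - x) ** 2)
            out[t][x] = slice_[st][sx]
    return out


# Toy XT slice: foreground value 9 moves right over background value 1,
# and an occluder hides x = 1..3 in the middle frame.
slice_ = [[1, 9, 1, 1, 1],
          [1, 0, 0, 0, 1],
          [1, 1, 1, 9, 1]]
missing = [[False] * 5,
           [False, True, True, True, False],
           [False] * 5]
is_foreground = [[False, True, False, False, False],
                 [False, False, True, False, False],
                 [False, False, False, True, False]]
filled = fill_by_region(slice_, missing, is_foreground)
```

Without the label restriction, the missing trajectory pixel at (t = 1, x = 2) would be filled from the closer background pixel directly above it, which breaks the trajectory; restricting sources by region keeps it connected.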

Figure 2.6 Virtual contours constructed by combining 2-D spatio-temporal slices derived via the patch-based inpainting method proposed in [14]. The left-hand side shows the virtual contours obtained by combining completed spatio-temporal slices without corrections, and the right-hand side shows the virtual contours with corrections.

The rationale behind the proposed virtual contour construction method is that if the continuity of object trajectories can be maintained in individually completed spatio-temporal slices, then the motion continuity of an object reconstructed by combining all the inpainted slices will also be maintained. Thus, so long as the linear motion assumption holds during the occlusion period, a virtual contour can provide fairly precise information about the posture and filling location of an occluded object, even if the object is badly damaged.
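The linear motion assumption can be illustrated with a minimal sketch that bridges an occlusion gap in a one-dimensional trajectory by a straight line between the last visible position before the gap and the first visible position after it. The representation of the trajectory as a list with `None` for occluded frames and the name `fill_linear` are assumptions for illustration.

```python
def fill_linear(trajectory):
    """Replace runs of None by linear interpolation between the visible neighbours.

    Gaps that touch the start or end of the sequence are left unfilled,
    because a line needs a visible position on both sides.
    """
    out = list(trajectory)
    n = len(out)
    t = 0
    while t < n:
        if out[t] is not None:
            t += 1
            continue
        start = t - 1
        end = t
        while end < n and out[end] is None:
            end += 1
        if start >= 0 and end < n:
            for k in range(t, end):
                w = (k - start) / (end - start)
                out[k] = out[start] + w * (out[end] - out[start])
        t = end
    return out


# x-position of one object part per frame; frames 2..4 are occluded.
observed = [10, 12, None, None, None, 20, 22]
completed = fill_linear(observed)
```

The shorter the occlusion, the closer the true trajectory stays to this line, which is why the approximation is reasonable for brief occlusion periods.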
