Chapter 1 Introduction
1.2 Literature Survey
1.2.1 CFA Interpolation for a Single Image Sensor
Digital color images from single-chip digital cameras are obtained by interpolating the output of a CFA. The simplest CFA interpolation methods apply well-known image interpolation techniques, such as nearest-neighbor replication, bilinear interpolation, and cubic spline interpolation, to each color channel separately. However, these single-channel algorithms usually introduce severe color artifacts and blurring around sharp edges [1]. These drawbacks motivate the need for more advanced algorithms that improve demosaicing performance. An excellent review of advanced CFA interpolation algorithms can be found in [2].
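To make the single-channel baseline concrete, the following minimal sketch applies bilinear interpolation to the green channel only. It assumes a Bayer-like layout in which green samples occupy the sites where the row index plus the column index is even; this layout, the function names, and the list-of-lists image representation are illustrative assumptions and are not taken from the cited works.

```python
def green_at_even_sites(y, x):
    # Assumption: green samples sit where (row + column) is even.
    return (y + x) % 2 == 0


def bilinear_green(cfa, green_at):
    """Fill missing green samples with the mean of the available 4-neighbours.

    cfa is a list of rows of raw sensor values (at least 2 x 2). With a
    checkerboard green layout, every 4-neighbour of a non-green site is green.
    """
    h, w = len(cfa), len(cfa[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if green_at(y, x):
                out[y][x] = float(cfa[y][x])  # measured sample, keep as is
                continue
            neighbours = [
                cfa[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w  # skip sites outside the image
            ]
            out[y][x] = sum(neighbours) / len(neighbours)
    return out
```

Because each missing sample is a plain average of its neighbours, taken without regard to edge direction or to the other color channels, a sharp edge passing through the neighbourhood is averaged across, which is the source of the blurring and color artifacts noted above.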
In recent years, there have been investigations into more sophisticated CFA interpolation algorithms. In [3], Lu and Tan presented an improved hybrid CFA interpolation method that consists of two successive steps: an interpolation step to render full-color images and a post-processing step to suppress visible demosaicing artifacts. Muresan and Parks proposed an improved edge-directed CFA interpolation algorithm based on optimal recovery interpolation of grayscale images [4]. They first utilized a grayscale image interpolation algorithm based on optimal recovery estimation theory to interpolate the green plane. The red/blue channels were then interpolated using adaptive filtering of the inter-channel color differences.
These two CFA interpolation algorithms generally produce high-quality visual results, especially in reconstructing sharp or well-defined edges of the image. However, in regions of fine detail or texture, where edges tend to be short and run in different directions, these algorithms introduce undesirable errors and their performance degrades.
Meanwhile, two iterative CFA interpolation techniques were proposed by Gunturk et al. [5] and Li [6], respectively. In [5], a projections-onto-convex-sets (POCS) technique was presented to estimate the missing color values in the red and blue channels using an alternating projection scheme based on high inter-channel correlation. In [6], Li formulated CFA interpolation as a problem of reconstructing correlated signals from their decimated versions and proposed a successive approximation strategy that applies color difference interpolation iteratively. Although these iterative CFA interpolation algorithms perform well in texture regions and have low computational complexity, they cannot produce satisfactorily high-quality visual results along well-defined edges of the image.
Another recent CFA interpolation approach divides the demosaicing procedure into an interpolation stage and a decision stage [7-10]. In the interpolation stage, horizontally and vertically interpolated images are produced. In the subsequent decision stage, a soft-decision method, in which the interpolation must be performed before the decision procedure, is employed to choose, for each pixel, the value interpolated in the direction with fewer artifacts. Because the decision stage is essential to these CFA interpolation approaches, we refer to them as decision-based CFA interpolation algorithms. For the decision stage, Hirakawa et al. proposed a homogeneity metric to measure the misguidance level of color artifacts present in the interpolated images [7]. Based on this measurement, the interpolation decision is made by choosing the region with larger homogeneity values. In [8], Wu et al. adopted Fisher’s linear discriminant technique to determine the optimal interpolation direction in a local window. In [9], Grossmann and Eldar utilized the YIQ color space as a tool to select the reconstructed regions with a smoother chrominance component. Recently, Omer and Werman proposed an enhanced decision-based CFA interpolation algorithm that combines the decision process with a standard CFA interpolation algorithm, such as the edge-directed scheme [11], to improve its performance where the standard algorithm tends to fail [10]. Decision-based CFA interpolation algorithms perform well not only in texture regions but also along well-defined edges of the image. However, their main drawback is that the interpolation stage is inefficient: each pixel needs to be interpolated at least twice, once in the horizontal direction and once in the vertical direction, before the soft-decision procedure can be applied. This drawback also greatly increases the computational effort of the subsequent decision stage. Therefore, it remains a challenge in CFA interpolation design to develop an efficient CFA interpolation method with high performance in both texture and edge regions.
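The two-stage structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any of [7-10]: it assumes a Bayer-like layout with green on the sites where row plus column is even, and it uses a simple hypothetical decision criterion (the smoothness of the green-minus-measured color difference with respect to same-color sites two pixels away) as a stand-in for the homogeneity, discriminant, or chrominance criteria of the cited works. All function names are illustrative.

```python
def is_green(y, x):
    # Assumption: green samples sit where (row + column) is even.
    return (y + x) % 2 == 0


def _mean_at(cfa, sites):
    # Mean of the raw samples at those sites that fall inside the image.
    h, w = len(cfa), len(cfa[0])
    vals = [cfa[j][i] for j, i in sites if 0 <= j < h and 0 <= i < w]
    return sum(vals) / len(vals)


def interpolation_stage(cfa):
    """Estimate green twice at every non-green site (image at least 2 x 2)."""
    h, w = len(cfa), len(cfa[0])
    g_h = [[None] * w for _ in range(h)]  # horizontal estimates
    g_v = [[None] * w for _ in range(h)]  # vertical estimates
    for y in range(h):
        for x in range(w):
            if not is_green(y, x):
                g_h[y][x] = _mean_at(cfa, ((y, x - 1), (y, x + 1)))
                g_v[y][x] = _mean_at(cfa, ((y - 1, x), (y + 1, x)))
    return g_h, g_v


def chroma_roughness(cfa, g_est, y, x):
    """Hypothetical criterion: variation of the color difference (estimated
    green minus measured sample) relative to same-color sites two pixels away."""
    h, w = len(cfa), len(cfa[0])
    centre = g_est[y][x] - cfa[y][x]
    total = 0.0
    for j, i in ((y - 2, x), (y + 2, x), (y, x - 2), (y, x + 2)):
        if 0 <= j < h and 0 <= i < w:
            total += abs(centre - (g_est[j][i] - cfa[j][i]))
    return total


def decision_stage(cfa, g_h, g_v):
    """Per pixel, keep the directional estimate with the smoother color difference."""
    h, w = len(cfa), len(cfa[0])
    green = [[float(v) for v in row] for row in cfa]
    for y in range(h):
        for x in range(w):
            if is_green(y, x):
                continue  # measured green sample, nothing to decide
            rough_h = chroma_roughness(cfa, g_h, y, x)
            rough_v = chroma_roughness(cfa, g_v, y, x)
            green[y][x] = g_h[y][x] if rough_h <= rough_v else g_v[y][x]
    return green
```

The sketch makes the cost structure visible: `interpolation_stage` computes two estimates for every missing sample before any decision is taken, and `decision_stage` must then evaluate the criterion on both candidate images, so the work of both stages is roughly doubled compared with a method that fixes the direction first.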
1.2.2 Visual Tracking Control for a Wheeled Mobile Robot
The visual tracking control problem addressed in this thesis is that of a unicycle-modeled (commonly termed wheeled) mobile robot equipped with an on-board monocular vision system. Because of the large number of mobile robot visual tracking control methods, we classify the reported methods into four groups based on the type of target to be tracked. Many efforts focus on the first group, which aims to track a static target, such as a ground line, landmark, or reference image, for the purpose of mobile robot navigation or regulation (so-called homing) [15-28]. To track a ground line, Ma et al. formulated the visual tracking control problem as controlling the shape of a ground curve in the image plane and proposed a closed-loop vision-guided control system for a nonholonomic mobile robot [16]. Coulaud et al. proposed a simple and stable feedback controller design, which avoids sophisticated image processing and control algorithms, for a mobile robot equipped with a fixed camera to track a line on the ground [17]. In the case of tracking a landmark, the reported controllers usually modify the visual servoing technique to satisfy the nonholonomic constraint on the motion of the mobile robot [18-21]. In [22], Zhang and Ostrowski utilized an optimal control method to solve the visual motion-planning problem by generating a virtual trajectory in the image plane and the corresponding optimal control signals for the robot to follow. Nierobisch et al. proposed a visual tracking control method for a mobile robot with a pan-tilt camera to track visual reference landmarks in the acquired views during autonomous navigation [23]. Recently, homography-based [24, 25] and epipolar-based [15, 26-28] visual tracking control approaches have been proposed for a mobile robot equipped with a pinhole or an omni-directional (so-called central catadioptric) camera to track a reference image toward a desired configuration. These two approaches treat the mobile robot visual tracking control problem as a visual servoing regulation or visual homing problem. In [24], Chen et al. developed a visual tracking controller based on the Euclidean homography to track a desired time-varying trajectory defined by a prerecorded image sequence of a stationary target viewed by the on-board camera as the mobile robot moves.
However, their stability result requires the reference velocity of the desired trajectory to be non-zero. To overcome this drawback, Fang et al. exploited Lyapunov-based techniques to construct a homography-based visual servoing regulation controller with proven asymptotic regulation of the mobile robot [25]. In [26], Mariottini et al. exploited the epipolar geometry defined by the current and desired camera views to develop a two-step visual servoing regulation controller. They also extended this design to the visual servoing regulation control of a mobile robot with a central catadioptric camera [27]. In [28], Goedemé et al. developed a vision-only navigation and homing system for mobile robots with an omni-directional camera. Their method divides the visual homing operation into two phases and computes the visual homing vector based on epipolar geometry estimation. Although the approaches of the first group provide appropriate solutions to the static-target visual tracking control problem, they are not guaranteed to solve the moving (non-static) target visual tracking control problem.
The second group aims to track other robot teammates in a robot group for the purpose of formation control [29, 30]. The approaches in this group are usually designed around the central catadioptric camera model in order to detect all robot teammates at the same instant. The third group tracks a predictable moving target, such as a projectile or a ball moving in a straight line, for the purpose of mobile robot interception [31, 32]. In [31], Borgstadt et al. utilized a human vision-based strategy to guide a mobile robot to intercept a projectile ball. Similarly, Capparella et al. extended the concept of a human-like strategy to develop a vision-based two-level interception approach, which contains a lower-level controller to control the on-board pan-tilt camera and a higher-level controller to operate the mobile robot platform, for intercepting a ball moving in a straight line [32]. A common point of the second and third groups is that the motion of the target of interest is known and predictable. However, in some robotic applications, a mobile robot is required to track a dynamic target with unpredictable motion, such as a human face, for the purpose of pursuit or interaction. Thus, the existing methods of the second and third groups are not suitable for solving the dynamic moving target visual tracking control problem.
The fourth group aims to solve the problem of tracking a dynamic moving target [33-37]. In [33], Wang et al. proposed an adaptive backstepping control law based on an image-based camera-target visual interaction model to track a dynamic moving target with an unknown height parameter. Although the approach in [33] guarantees the asymptotic stability of the closed-loop visual tracking control system when tracking a dynamic moving target, tracking of a static target cannot be guaranteed because of the non-zero restrictions on the reference velocity of the mobile robot. In [34], Song et al. combined a face detection algorithm with a PID controller to track a moving person in a home setting. The main disadvantage of their method is that the stability of the closed-loop visual tracking system is not guaranteed by any stability criterion. In [35], Malis et al. integrated template-based visual tracking algorithms and model-free vision-based control techniques to build a flexible and robust visual tracking control system for various robotic applications. Because their visual tracking result is based on homography estimation, which requires two images of the target pattern to estimate the optimal homography, the reported system overcomes only the partial occlusion problem and fails under full occlusion. In [36], Han et al. proposed an image-based visual tracking control scheme for a mobile robot to estimate the position of the target in the next image and track the target to the central area of the image. Since their method uses a differential approximation to estimate the velocity of the target in the image plane, the estimation result is very sensitive to image noise. Recently, a visual interaction controller was proposed for a unicycle-modeled mobile robot to track a dynamic moving target such as a human face [37]. The drawback of this method is that the controller requires the target’s 3D motion velocity, which is difficult to estimate when only a monocular camera is used.
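The noise sensitivity of a differential approximation can be illustrated with a short numerical sketch. The backward-difference estimator and the alternating-sign noise model below are illustrative assumptions chosen to expose the worst case; they are not the estimator or noise model of [36], and the numerical values in the usage note are hypothetical.

```python
def finite_difference_velocity(positions, dt):
    """Backward-difference velocity estimates from successive image positions."""
    return [(b - a) / dt for a, b in zip(positions, positions[1:])]


def noisy_track(true_speed, dt, noise_amplitude, n_frames):
    """Image positions of a target moving at constant speed.

    Assumption: the measurement noise alternates in sign from frame to frame,
    a deterministic worst case used only for illustration.
    """
    return [
        true_speed * k * dt + noise_amplitude * (-1) ** k
        for k in range(n_frames)
    ]
```

With a noise amplitude of e pixels and a sampling interval dt, consecutive noise samples differ by 2e, so each velocity estimate is off by 2e/dt. For example, for a target moving at 30 pixels/s observed at 30 frames/s, a position error of only half a pixel yields velocity estimates that alternate between roughly 0 and 60 pixels/s. Shortening the sampling interval makes the error larger, not smaller, because the position noise is divided by dt.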
Therefore, the literature survey shows that one of the most important challenges in mobile robot visual tracking control design is to develop a visual tracking control system that estimates the motion of a dynamic moving target and tracks it with stability guaranteed by a stability criterion. Further, when the control schemes are implemented, it has been noted that disturbances such as image noise, velocity quantization error, and temporary (partial or full) occlusion degrade the performance of the controller and might make the system unstable. These problems have not yet been addressed in many existing related works, which motivates us to investigate the robustness of the visual tracking control system against uncertainties in image noise, system model, velocity quantization, and temporary occlusion.