The thesis is organized as follows. We review the background and existing algorithms for marker detection in Chapter 2. Then we introduce common planar marker AR systems in Chapter 3. In Chapters 4 and 5, we explain our design and conduct experiments to evaluate system performance. Finally, we conclude the thesis in Chapter 6.
Chapter 2 Background
This chapter details the planar markers used in AR techniques and the other image processing methods we apply in the proposed system, STAR.
2.1 Planar Marker
Vision-based AR systems usually use planar markers to obtain the positions and angles needed to merge virtual images into the real image. These planar markers should have two characteristics: (1) a square shape, and (2) thick black peripheries. Square markers provide four right-angle corners as salient points for 3D detection, and their thick black peripheries make them easy to separate from background objects. There are two common types of planar markers in existing AR systems: the template marker and the ID marker. Figure 2.1 illustrates the two types. Template markers have both outer and inner boundaries and use geometric patterns or words as inner patterns, while ID markers have only outer boundaries and use digital images as inner patterns.
In this thesis, we adopt template markers in the proposed system, STAR. Template markers are also used in ARToolKit [7] [8]. This kind of marker is composed of three parts: (1) a white periphery, (2) a thick black periphery, and (3) an inner pattern. Some definitions do not count the outer white periphery as part of a template marker, but the clear boundary between the contrasting black and white peripheries increases the detection rate. The second part, the thick black periphery, includes square-shaped outer and inner boundaries; existing AR systems can detect these two boundaries and easily distinguish markers from background objects. The last part, the inner pattern, helps identify different markers through pattern matching algorithms.

Figure 2.1: Two types of AR planar markers (a) Template marker (b) ID marker
In 2009, M. Fiala indicated that template markers may have a higher inter-marker confusion rate [14]. Compared with ID markers, however, template markers with both outer and inner boundaries are much easier to reconstruct when occlusion occurs. In addition, when a marker's inner pattern is occluded, a template marker can still be detected by partial pattern matching algorithms, whereas an occluded ID marker may cause a failed or wrong detection.
2.2 Detection Techniques
In this section, we introduce the detection techniques used in many existing AR systems, including thresholding, boundary detection, and pattern matching techniques.
2.2.1 Threshold Techniques
Threshold techniques are the most common techniques in image processing. When finding objects in an image frame, we can set a threshold to convert a color image into a black-and-white image. The threshold is closely related to the light conditions in the environment, so it should be adjusted as the lighting changes. An inappropriate threshold may cause detection to fail, while a well-chosen threshold can reduce the difficulty of subsequent image processing.
Therefore, it is important to choose the right threshold to handle different light conditions. Common threshold techniques fall into two categories: (1) global thresholding methods and (2) adaptive thresholding methods.
The global thresholding method is the simplest and fastest way to find an object in an image frame. Once a user sets a threshold value, pixels anywhere in the image with grey values larger than the threshold are set to white, and all other pixels are set to black. The fixed global thresholding method, which is used in ARToolKit, fails easily under unbalanced light conditions.
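A minimal Python sketch of global thresholding, assuming the frame is already a grey-level image stored as a list of rows of 0-255 integers. The function name `global_threshold` and the 255/0 encoding of white/black are illustrative choices, not ARToolKit's API.

```python
def global_threshold(grey, threshold):
    """Binarize a grey-level image with one threshold for every pixel.

    Pixels brighter than `threshold` become white (255); all others
    become black (0).
    """
    return [[255 if value > threshold else 0 for value in row] for row in grey]


# A frame with a dark left half and a bright right half.
frame = [
    [30, 40, 200, 210],
    [35, 45, 190, 220],
]
print(global_threshold(frame, 100))  # [[0, 0, 255, 255], [0, 0, 255, 255]]
```

Because the same value is applied to every pixel, a marker in a shadowed part of the frame and another in a brightly lit part may each need a different threshold, which is the unbalanced-light failure described above.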
Other systems, such as ARToolKitPlus [9], use dynamic thresholding methods to solve this problem: the threshold is changed randomly to guess the right value whenever detection failed in the previous frame. This method performs well for single marker detection, but it is not suited to detecting multiple markers under unbalanced light conditions.
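The random re-guessing idea can be sketched as follows. This is a hypothetical illustration of the behaviour described above, not ARToolKitPlus code: the class name, the initial value of 100 and the uniform draw over 0-255 are all assumptions.

```python
import random


class RandomRetryThreshold:
    """Hypothetical dynamic global threshold: keep the current value while
    markers are found, draw a new random value after a failed frame."""

    def __init__(self, initial=100, seed=None):
        self.threshold = initial
        self._rng = random.Random(seed)

    def next_threshold(self, detected_in_previous_frame):
        if not detected_in_previous_frame:
            # Guess a new threshold anywhere in the 8-bit grey range.
            self.threshold = self._rng.randint(0, 255)
        return self.threshold


chooser = RandomRetryThreshold(seed=1)
chooser.next_threshold(True)    # keeps 100
chooser.next_threshold(False)   # a new random value in [0, 255]
```

A single value is still applied to the whole frame, so even a lucky guess cannot suit two markers under very different lighting, which is why this approach fits single-marker detection better.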
Another thresholding method is adaptive thresholding, also called local thresholding. This method sets different thresholds for different regions to solve the problem of detecting multiple markers. Pintaric [11] finds the marker region and uses the weighted average of the region as its threshold. This method is simple and widely used in many AR systems, such as Hum-AR [3], and it is also suitable for ARToolKit. However, the marker position from the previous frame is required to get a correct adaptive threshold value in the current frame.
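A minimal sketch of a region-based threshold in the spirit of the method above, assuming the previous frame supplies an inclusive bounding box `(x0, y0, x1, y1)` for the marker and that the weighted average uses uniform weights. Both assumptions, the function names, and the fallback value of 100 are hypothetical, not taken from Pintaric's implementation.

```python
def region_threshold(grey, previous_box, fallback=100):
    """Threshold for the region where a marker was found in the previous frame.

    previous_box is a hypothetical inclusive (x0, y0, x1, y1) bounding box.
    Uniform weights are assumed for the average.
    """
    if previous_box is None:
        # No previous detection: fall back to a global value.
        return fallback
    x0, y0, x1, y1 = previous_box
    values = [grey[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(values) / len(values)


def binarize_region(grey, box, threshold):
    """Binarize only the pixels inside box (255 = white, 0 = black)."""
    x0, y0, x1, y1 = box
    return [[255 if grey[y][x] > threshold else 0 for x in range(x0, x1 + 1)]
            for y in range(y0, y1 + 1)]
```

The dependence on `previous_box` is exactly the limitation noted in the text: without a detection in the previous frame, there is no region to average over.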
Bradley et al. then proposed an adaptive thresholding method that requires no previous information. A threshold value is calculated as the weighted average over a fixed-size region and used to determine the color of the region's center pixel. Nevertheless, this method has to calculate many adaptive thresholds for a single image frame, which consumes a lot of time. Hence, D. Bradley uses an integral image to reduce the processing time. We show an example of an integral image in Figure 2.2. The left image shows the grey level of every pixel in the original image, and the integral image is calculated from these pixels' grey levels.

Figure 2.2: An example of an integral image
Each element of an integral image stores the sum of the grey levels of all pixels above and to the left of it, inclusive. After calculating an integral image, the sum over any rectangular region can be computed from just four of its elements. By using the integral image, the processing time can be dramatically reduced to meet the real-time requirements of video processing.
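The integral image and the four-element region sum can be sketched in Python as below. The zero-padded first row and column is a common implementation choice assumed here so that regions touching the image border need no special cases; the function names are illustrative.

```python
def integral_image(grey):
    """Return an (h+1) x (w+1) integral image with a zero first row/column.

    ii[y][x] holds the sum of grey[0..y-1][0..x-1]; the zero padding removes
    boundary checks from region_sum.
    """
    h, w = len(grey), len(grey[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += grey[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def region_sum(ii, x0, y0, x1, y1):
    """Sum of the pixels in the inclusive rectangle (x0, y0)-(x1, y1),
    computed from four integral-image elements in constant time."""
    return ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1] - ii[y1 + 1][x0] + ii[y0][x0]


grey = [[1, 2],
        [3, 4]]
ii = integral_image(grey)
print(ii)                          # [[0, 0, 0], [0, 1, 3], [0, 4, 10]]
print(region_sum(ii, 1, 0, 1, 1))  # 6, the right column 2 + 4
```

Building the integral image costs one pass over the frame, after which every window average used by the adaptive threshold costs four lookups regardless of the window size.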
2.2.2 Boundary Detection Techniques
The classic boundary following method was designed by Moore in 1968 [17] [18]. It is useful for detecting the closed boundaries of an object. Figure 2.3 shows an example that details the steps of finding the boundary of an object. First, we locate the object's upper left corner, and mark it as the first boundary point and its left neighbor as the background point. The two points are marked in red and blue, respectively, in Figure 2.3 (b). We then check the 8 neighbors of the boundary point, starting at the background point and proceeding clockwise. The first object pixel found is the second boundary point, and the pixel checked just before it becomes the new background point, which is the upper neighbor of the second boundary point in Figure 2.3 (c). These steps repeat to find new boundary points until the trace returns to the first boundary point and finds the second boundary point again. This method finds the closed boundaries of objects simply and efficiently, but it fails to detect the inner boundaries of hollow objects, such as the black peripheries of template markers.
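The steps above can be sketched as a Moore neighbor tracer in Python. Assumptions: the image is a list of rows with 1 for object pixels and 0 for background, coordinates are `(row, col)`, and only the first object met in raster order is traced. The names are illustrative and this is not ARToolKit's implementation.

```python
# Clockwise 8-neighbour offsets (row, col), starting from the west neighbour.
CLOCKWISE = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_boundary(image):
    """Moore boundary following on a binary image (1 = object, 0 = background).

    Returns the outer boundary pixels of the first object found in raster
    order, in clockwise order starting from its upper-left pixel.
    """
    rows, cols = len(image), len(image[0])

    def is_object(r, c):
        return 0 <= r < rows and 0 <= c < cols and image[r][c] == 1

    # Upper-left object pixel: the first one met in a row-by-row scan.
    start = next(((r, c) for r in range(rows) for c in range(cols) if image[r][c] == 1), None)
    if start is None:
        return []
    background = (start[0], start[1] - 1)  # its left neighbour is background
    boundary, current, second = [start], start, None

    while True:
        # Index of the background point in current's clockwise neighbourhood.
        d = CLOCKWISE.index((background[0] - current[0], background[1] - current[1]))
        found = None
        for k in range(1, 8):
            dr, dc = CLOCKWISE[(d + k) % 8]
            if is_object(current[0] + dr, current[1] + dc):
                found = (current[0] + dr, current[1] + dc)
                # The pixel checked just before becomes the new background point.
                pr, pc = CLOCKWISE[(d + k - 1) % 8]
                background = (current[0] + pr, current[1] + pc)
                break
        if found is None:  # isolated single pixel
            return boundary
        if second is None:
            second = found
        elif current == start and found == second:
            # Back at the start and about to repeat the second point: closed.
            boundary.pop()  # drop the duplicate of the start point
            return boundary
        boundary.append(found)
        current = found
```

Applied to a hollow square, the tracer hugs the outside and never visits the pixels around the hole, which matches the limitation stated above for the inner boundary of a template marker's black periphery.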
2.2.3 Pattern Matching Techniques
Pattern matching algorithms are used to recognize markers by matching the inner pattern with a template. A well-known pattern matching algorithm, Normalized Cross Correlation (NCC) [18], is widely used to identify patterns, and ARToolKit also adopts NCC to match patterns. NCC correlates two patterns using Equation 2.1, in which G_t and G_s represent the template data and the marker inner pattern data, respectively, and r denotes the correlation of the two. The correlation approaches 1 when G_t is similar to G_s. A marker is detected when the correlation between the inner pattern and the template is larger than 0.5.
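$$ r = \frac{\sum_{i} \left(G_t(i) - \bar{G}_t\right)\left(G_s(i) - \bar{G}_s\right)}{\sqrt{\sum_{i} \left(G_t(i) - \bar{G}_t\right)^2}\,\sqrt{\sum_{i} \left(G_s(i) - \bar{G}_s\right)^2}} \qquad (2.1) $$

where i indexes the pixels of the two patterns, and \bar{G}_t and \bar{G}_s are the mean values of G_t and G_s.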
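A short Python sketch of standard mean-subtracted NCC and the 0.5 acceptance rule described above, assuming both patterns have already been sampled to the same size and flattened into lists of grey values. The function names are illustrative, not ARToolKit's API.

```python
import math


def ncc(template, pattern):
    """Normalized cross correlation of two equally sized, flattened patterns:
    the mean-subtracted dot product divided by the product of the two
    mean-subtracted vector norms."""
    if len(template) != len(pattern) or not template:
        raise ValueError("patterns must be non-empty and the same size")
    n = len(template)
    mean_t = sum(template) / n
    mean_s = sum(pattern) / n
    dt = [t - mean_t for t in template]
    ds = [s - mean_s for s in pattern]
    numerator = sum(a * b for a, b in zip(dt, ds))
    denominator = math.sqrt(sum(a * a for a in dt)) * math.sqrt(sum(b * b for b in ds))
    # A completely flat pattern has no structure to correlate.
    return numerator / denominator if denominator else 0.0


def is_marker(template, pattern, threshold=0.5):
    return ncc(template, pattern) > threshold


template = [0, 0, 255, 255, 0, 255, 0, 255, 0]
darker_copy = [v // 2 + 20 for v in template]  # same pattern, other lighting
print(round(ncc(template, darker_copy), 6))    # 1.0
```

Subtracting the means and dividing by the norms makes the score insensitive to uniform changes in brightness and contrast, which is why the darker copy still scores 1.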
In this chapter, we first discussed the two types of planar markers used in AR systems, and then introduced the image processing methods we use in STAR. We compared the pros and cons of the two types of markers and chose template markers, which can be reconstructed easily when the marker is occluded. In our system, we use Bradley's adaptive thresholding method to cope with multiple marker detection under unbalanced light. We also use boundary following to detect objects' outer boundaries, and Normalized Cross Correlation to match markers' inner patterns. By using these methods, we do not have to design new algorithms to detect boundaries.
Chapter 3 Related Work
This chapter discusses several existing tools and libraries used for augmented reality, including ARToolKit, ARTag, and ARToolKitPlus.
3.1 ARToolKit
ARToolKit (ARTK) [7] [8], mainly developed by Hirokazu Kato, is one of the best-known libraries designed for AR applications. This popular tool is widely used in object tracking and human-computer interaction. ARTK is also open source and can be applied on different platforms. With complete and detailed reference files, ARTK is a beginner-friendly tool for developing AR applications. The structure of ARTK, outlined in Figure 3.1, can be separated into three phases: (1) labeling, (2) marker detection, and (3) pattern matching.
In the first phase, ARTK uses a fixed global threshold to binarize a frame into a black-and-white image. After binarization, the image contains several connected black blocks, which we call objects. To determine the number of objects in the image, ARTK uses a sequential labeling algorithm [19] that assigns a different label to each object. The area and region of each object can also be calculated during the labeling phase. With this information, ARTK can process the objects one by one in the following phase. The second phase contains two detections: (1) boundary detection and (2) quadrangle detection. The boundary detection finds the closed contour of each object by using the boundary following method [17]. Only if an object has a closed boundary can the quadrangle detection determine whether the object is a quadrangle. In the third phase, ARTK identifies markers by applying a pattern matching method.

Figure 3.2: Example of ARToolKit detection flow (steps: binarize image with a fixed threshold, detect contour, detect square, pattern matching)
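The labeling step can be illustrated with a classic two-pass sequential labeling sketch using a union-find table. Assumptions: the binarized frame is a list of rows with 1 for black (object) pixels, and objects are 8-connected. The connectivity choice and all names are illustrative; this is not ARTK's source code.

```python
def label_components(binary):
    """Two-pass sequential labeling with 8-connectivity.

    binary: list of rows, 1 = black (object) pixel, 0 = background.
    Returns (labels, areas): a label image with labels 1..n and a dict
    mapping each label to its pixel count.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # union-find forest; index 0 is unused

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: provisional labels from already-visited neighbours
    # (west, north-west, north, north-east) and record equivalences.
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = [labels[y + dy][x + dx]
                          for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                          if 0 <= y + dy < h and 0 <= x + dx < w and labels[y + dy][x + dx]]
            if not neighbours:
                parent.append(len(parent))  # new label is its own root
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(neighbours)
                labels[y][x] = smallest
                for n in neighbours:
                    union(smallest, n)

    # Second pass: replace each label by its root, renumber 1..n, count areas.
    renumber, areas = {}, {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                final = renumber.setdefault(root, len(renumber) + 1)
                labels[y][x] = final
                areas[final] = areas.get(final, 0) + 1
    return labels, areas
```

The per-label areas returned here correspond to the object information the text says is gathered during labeling, which later phases can use to skip objects that are too small to be markers.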
Figure 3.2 illustrates an example of the ARTK detection flow. First, ARTK uses a fixed threshold (usually 100) to binarize the color image into a black-and-white image. Next, the tool detects the boundary of each object and finds the outer boundary of the marker's thick black periphery. Then, each object's boundary is checked to determine whether it is a quadrangle. Finally, ARTK uses the pattern matching method to compare the inner pattern with the template. If the result is close to 1, ARTK takes the object as a marker.
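A sketch of one way to perform the quadrangle check on a traced contour. Assumptions: the contour is a list of `(x, y)` points in tracing order whose first point is a corner, as the upper-left start point of boundary following usually is. The recursive farthest-point split is a common polygon-approximation technique used here for illustration; it is not claimed to be ARTK's exact code, and the 2-pixel tolerance is a hypothetical value.

```python
import math


def _dist_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    den = math.hypot(x2 - x1, y2 - y1)
    if den == 0:
        return math.hypot(x0 - x1, y0 - y1)
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / den


def _split(points, i, j, tol, out):
    """Recursively add the index of the point farthest from chord i-j
    whenever that distance exceeds tol."""
    best, best_d = None, tol
    for k in range(i + 1, j):
        d = _dist_to_line(points[k], points[i], points[j])
        if d > best_d:
            best, best_d = k, d
    if best is not None:
        _split(points, i, best, tol, out)
        out.append(best)
        _split(points, best, j, tol, out)


def find_vertices(contour, tol=2.0):
    """Approximate a closed contour by a polygon and return its vertices."""
    n = len(contour)
    if n < 4:
        return list(contour)
    # The point farthest from the start corner is taken as the opposite vertex.
    far = max(range(n), key=lambda k: math.dist(contour[0], contour[k]))
    closed = contour + [contour[0]]  # lets the second half end at the start
    vertices = [0]
    _split(closed, 0, far, tol, vertices)
    vertices.append(far)
    _split(closed, far, n, tol, vertices)
    return [contour[k % n] for k in vertices]


def is_quadrangle(contour, tol=2.0):
    return len(find_vertices(contour, tol)) == 4
```

An object whose contour yields exactly four vertices is kept as a marker candidate; anything else (a triangle, a blob, a partially occluded outline) is rejected before pattern matching. `math.dist` requires Python 3.8 or later.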
The detection technique of ARTK is very efficient; however, its marker detection results are not always reliable under different circumstances. The fixed global threshold used in the first phase produces non-robust results under unbalanced light conditions. In addition, the boundary detection can only detect the outer boundary, so ARTK fails to detect a marker when partial occlusion occurs.
3.2 ARTag
ARTag [14] [15] is another well-known AR tool, developed by Mark Fiala. ARTag uses ID markers instead of the template markers used in ARTK. Rather than using pattern matching, ARTag decodes the inner pattern into a marker ID, which provides more robust information when detecting multiple markers at a time. The ID marker system adopted by ARTag supports up to 2002 different markers. Instead of a thresholding method, ARTag uses a gradient method to find a marker's quadrangle boundary, which gives better detection results under unbalanced light and for partially occluded markers. Figure 3.3 shows the control structure of ARTag, which can also be separated into three phases: (1) boundary detection, (2) inner pattern detection, and (3) marker ID identification.
In the first phase, ARTag uses a boundary detection algorithm on the gradient image to find quadrangles. After finding the quadrangle objects, the tool detects the inner pattern of each marker and samples it into a 6×6 grid. The sampled inner pattern can be translated into a 36-bit digital code. In the last phase, ARTag checks the 36-bit code to find the pose and rotation of the marker, and decodes the digital code to get the marker ID. With the marker ID, ARTag can identify different markers correctly.
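The sampling and rotation steps can be sketched as follows, assuming the inner pattern has already been rectified into a square grey-level patch. ARTag's actual error-correcting encoding of the 36 bits is not reproduced here: `codebook` is a hypothetical dictionary from valid 36-bit codes to IDs, and the cell-centre sampling and fixed threshold of 128 are also assumptions.

```python
def sample_bits(patch, n=6, thresh=128):
    """Sample the centre of each cell of an n x n grid; dark cells become 1."""
    h, w = len(patch), len(patch[0])
    grid = []
    for gy in range(n):
        row = []
        for gx in range(n):
            y = int((gy + 0.5) * h / n)
            x = int((gx + 0.5) * w / n)
            row.append(1 if patch[y][x] < thresh else 0)
        grid.append(row)
    return grid


def pack(grid):
    """Pack the grid row by row into one integer (36 bits for a 6 x 6 grid)."""
    code = 0
    for row in grid:
        for bit in row:
            code = (code << 1) | bit
    return code


def rotate_cw(grid):
    """Rotate a square bit grid by 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - c][r] for c in range(n)] for r in range(n)]


def identify(grid, codebook):
    """Try all four rotations; return (marker_id, rotations) or None.

    codebook is a hypothetical {36-bit code: marker ID} lookup table.
    """
    g = grid
    for rotations in range(4):
        code = pack(g)
        if code in codebook:
            return codebook[code], rotations
        g = rotate_cw(g)
    return None
```

The number of rotations needed to reach a valid code gives the marker's orientation. It also shows why occlusion is fatal here: covering even a few cells changes the packed code, and the lookup fails.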
Compared to ARTK, ARTag has better marker detection performance. However, ARTag fails to detect a marker when its inner pattern is occluded, since the correct marker ID cannot be calculated from the digital code of an occluded inner pattern. We will demonstrate and discuss this occlusion failure in Chapter 5.
3.3 ARToolKitPlus
ARToolKitPlus (ARTK+) [9] [10], developed by Daniel Wagner, is a revision of ARTK. The main purpose of ARTK+ is to improve the performance of ARTK and bring the AR technique to mobile devices. Since accuracy and efficiency always trade off against each other, ARTK+ adopts a flexible detection design that lets a user choose the best detection method for different purposes. A user can choose a highly efficient but less robust detection method for mobile devices, or a robust method for PCs with high computing power. The flexible structure of ARTK+ is illustrated in Figure 3.4.
The structure of ARTK+ can be separated into three phases. The first phase determines a threshold value. One choice is to specify a fixed global threshold, as adopted by ARTK. Another choice is dynamic global thresholding, which uses random numbers to change the threshold when detection fails in the previous frame.
The second phase is in charge of marker detection. A user can choose template markers with the pattern matching method of the original ARTK, which is efficient but less robust. Another choice is the ID marker system used by ARTag, which supports up to 4096 different markers. The last phase estimates the pose. Pose estimation has two sub-phases: single-marker and multiple-marker estimation. However, ARTK+ can only detect multiple markers in a fixed arrangement, which must be set up before detection. It cannot stably detect separately placed markers, since in this type of detection the pose estimation can only estimate the pose of one marker.
ARTK+ truly improves the performance of the original ARTK and gives more reliable results under many circumstances. However, ARTK+ is not suited to detecting separately placed multiple markers, and it does not perform well in detecting multiple markers under unbalanced light conditions. In addition, since its detection algorithm is similar to that of ARTK, ARTK+ also cannot detect partially occluded markers.
3.4 Summary
We introduced three popular AR systems in this chapter and discussed their problems. ARTK cannot handle marker detection under unbalanced light or partial occlusion. ARTag uses gradient methods to solve these two problems, but still cannot detect markers whose inner patterns are occluded. ARTK+, a revision of ARTK, intends to address the drawback of detection under unbalanced light. However, it is still unable to detect multiple markers under unbalanced light or partially occluded markers. We detail our designs to address these problems in the next chapter.
Chapter 4
Selectable Thresholding Augmented Reality System
This chapter describes the structure of STAR, including the labeling phase, the marker detection phase, and the pattern matching phase.
4.1 Tool Design
Selectable Thresholding Augmented Reality System (STAR) is a tool designed to deal with marker detection under unbalanced light and partial occlusion. STAR is based on ARTK [6] [7], which is an open source AR system with complete and detailed reference files. These characteristics make ARTK a more suitable foundation than other existing AR systems. The purpose of STAR is to solve two problems of existing AR systems that use planar markers: (1) detecting single and multiple markers under unbalanced light, and (2) detecting partially occluded markers. Moreover, the detection methods we use must be efficient enough to satisfy the real-time requirement in the definition of AR.
Figure 4.1 shows the structure of STAR, which is very similar to that of ARTK; we add new steps and modify some original steps to improve detection performance. The structure can be divided into three phases: (1) labeling, (2) marker detection, and (3) pattern matching. We describe these three phases in the following sections.
4.2 Labeling Phase
The main purpose of this phase is to find the objects in an image and assign each a different label. We can then check these objects one by one to find marker candidates in the following phases. This phase is composed of three steps: (1) get threshold, (2) binarize image, and (3) component labeling. The first step offers a user two methods: (1) dynamic global thresholding and (2) adaptive thresholding. After getting the threshold value, we binarize the image and find the objects by applying the sequential labeling algorithm [19]. Because our goal is to detect markers under unbalanced light, we focus on the two thresholding methods and detail them in the following sections.
4.2.1 Dynamic Global Thresholding
Dynamic global thresholding is a fast and simple method that is suitable for single marker detection. It can find a correct threshold quickly when the light conditions change, and it can also cope with general unbalanced light conditions. Unlike ARTK+ [9], our dynamic global thresholding method does not change the threshold randomly, but switches among three values that handle dark, normal, and bright conditions. To find the three values, we tested the detection rate over the common illuminance range (from 100 Lux to 2000 Lux) with different thresholds and recorded the results in Table 4.1.
From the test results, we choose 70, 140, and 250 as the thresholds for dark, normal, and bright conditions, respectively.
Table 4.1: Detection rate over the common illuminance range with different thresholds

Illuminance (Lux) \ Threshold   20     40     70     100   140   170   210   250
100                             100%   100%   100%   0%    0%    0%    0%    0%
During marker detection, we modify the threshold when the light conditions change. Since we do not use optical sensors, we change the threshold whenever our system fails to find a marker in the previous frame. We keep changing the threshold until a marker is found, so that newly appearing markers can be detected under any light conditions. Figure 4.2 shows examples of using dynamic global thresholding to detect markers under dark, normal, and bright conditions.

Figure 4.2: Detection under different illuminance with dynamic global threshold (panels: dark condition, 250 Lux; normal condition, 870 Lux; bright condition, 1840 Lux)
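The switching rule above can be sketched as a small state machine over the three thresholds from Table 4.1. The starting value (the normal threshold, 140) and the cycling order 140, 250, 70 are assumptions made for illustration; the class name is hypothetical.

```python
class ThreeLevelThreshold:
    """Hypothetical sketch of STAR-style dynamic global thresholding:
    switch to the next of three fixed thresholds whenever the previous
    frame produced no marker, and keep the current one otherwise."""

    LEVELS = (70, 140, 250)  # dark, normal, bright

    def __init__(self, start_index=1):
        self.index = start_index  # assumed to start at the normal threshold

    @property
    def value(self):
        return self.LEVELS[self.index]

    def update(self, marker_found):
        if not marker_found:
            self.index = (self.index + 1) % len(self.LEVELS)
        return self.value


chooser = ThreeLevelThreshold()
chooser.update(False)  # 250: no marker with 140, try the bright threshold
chooser.update(False)  # 70: still nothing, try the dark threshold
chooser.update(True)   # 70: marker found, keep it
```

Compared with a random guess, cycling through three tested values bounds the search: after at most two failed frames every candidate threshold has been tried.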
4.2.2 Adaptive Thresholding
The adaptive thresholding method in STAR is based on Bradley's method [16], described in Chapter 2. In this method, we regard a center pixel whose intensity is 5% lower than the region average as black. This method provides correct results for multiple marker detection under unbalanced light conditions, as shown in Figure 4.3. Nevertheless, the method was originally designed to process grey-level images and may give wrong results on color images. For example, skin color, which should be recognized as white, may be recognized as black by the original Bradley's method. Thus, instead of calculating one grey-level integral image, our adaptive thresholding method calculates three integral images for the red, green, and blue channels separately. A pixel is set to black when all three of its color intensities are less than the weighted averages calculated from the corresponding integral images. This method may increase the processing time, but it is still suitable for real-time video processing.
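A minimal sketch of the per-channel variant described above. Assumptions: the image is a list of rows of `(r, g, b)` tuples, the "weighted average" is taken as a uniform mean over a square window clipped at the image border, and the window size of 15 pixels is a hypothetical default. The 5% margin comes from the text. This illustrates the idea and is not STAR's source code.

```python
def _integral(channel):
    """Zero-padded integral image of one colour channel."""
    h, w = len(channel), len(channel[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += channel[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_threshold_rgb(image, window=15, t=0.05):
    """Return a binary image (0 = black, 255 = white).

    A pixel becomes black only if each of its R, G and B values is at
    least t (5%) below that channel's mean over the surrounding window.
    """
    h, w = len(image), len(image[0])
    # One integral image per colour channel instead of a single grey one.
    integrals = [_integral([[px[c] for px in row] for row in image]) for c in range(3)]
    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h - 1, y + half)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w - 1, x + half)
            count = (x1 - x0 + 1) * (y1 - y0 + 1)
            dark = True
            for c, ii in enumerate(integrals):
                total = ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1] - ii[y1 + 1][x0] + ii[y0][x0]
                # value <= (1 - t) * mean, rearranged to avoid a division.
                if image[y][x][c] * count > total * (1 - t):
                    dark = False
                    break
            if dark:
                out[y][x] = 0
    return out
```

In a bright scene, a reddish skin-like pixel such as (235, 180, 160) has a lower grey level than its surroundings, but its red value is not 5% below the local red mean, so the per-channel test keeps it white. A genuinely dark marker pixel is below the mean in all three channels and becomes black.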
4.3 Marker Detection Phase
The marker detection phase determines whether each object is a possible marker. This phase would consume a lot of time if we checked every object one by one. Hence, we eliminate the
Figure 4.3: Using global threshold and adaptive threshold to detect multiple markers under unbalanced light (panels: original image, global threshold, adaptive threshold)