The proposed approach applied CNN classifiers to determine whether local patches of an image were parts of barcodes. The detection process first used a spatial pyramid that rescaled the input image and then partitioned each scaled image into local patches. The local patches were fed to the CNN classifiers for barcode detection. Once barcode parts were detected, the corresponding patches and their location information were used for the subsequent barcode extraction. The flow chart of the barcode localization system is shown in Figure 3.1.
Figure 3.1 Barcode localization system flow chart.
3.1 Collection of training image patches
Image patches were collected for developing the CNN classifier. First, 200 QR (version 7) and 200 Code 39 barcodes containing dummy information were created using an online generator. The Code 39 and QR barcodes were then printed using an electrophotographic printer (LaserJet M1132, HP; 600 dpi) with module widths of 13 and 15 mils, respectively. The printouts were scanned using a handheld barcode reader (9200 series, CipherLab; 752×480 pixels), mimicking the typical process of barcode scanning. For the QR barcodes, the printouts were placed approximately 10 and 20 cm away from the scanner, and 100 images were obtained at each distance. These distances were chosen to produce barcode images of approximately 7 and 4 pixels per module (ppm), respectively. For the Code 39 barcodes, 100 barcode images were obtained at each of the distances of 5 cm and 10 cm from the scanner, chosen to produce barcode images of approximately 7 and 3 ppm, respectively. The gathered Code 39 images were then artificially rotated to imitate the variations of real-world partial Code 39 barcodes. The images were rotated counterclockwise by 15°, 30°, 45°, 60°, 75°, 90°, 105°, 120°, 135°, 150°, and 165°. A set of background images was also collected from various themes (e.g., posters, soda cans, and other items). In summary, 500 images of Code 39 barcodes, 200 images of QR barcodes, and 30 background images were used as training data.
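The rotation step can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes grayscale images held in NumPy arrays, rotation about the image centre on a canvas of unchanged size (so the corners are cropped), nearest-neighbour sampling, and a white (255) fill for uncovered pixels. The names `rotate_ccw`, `augment`, and `ANGLES_DEG` are hypothetical.

```python
import numpy as np

# Rotation angles listed in the text: 15 to 165 degrees in 15-degree steps.
ANGLES_DEG = list(range(15, 180, 15))

def rotate_ccw(img, angle_deg, fill=255):
    """Rotate a grayscale image counterclockwise (as displayed, y axis pointing down)
    about its centre, keeping the original canvas size (assumption: corners are cropped)."""
    h, w = img.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, look up the input pixel it came from.
    src_x = np.rint(cx + dx * np.cos(t) - dy * np.sin(t)).astype(int)
    src_y = np.rint(cy + dx * np.sin(t) + dy * np.cos(t)).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)  # assumption: uncovered areas become white paper
    out[inside] = img[src_y[inside], src_x[inside]]
    return out

def augment(img):
    """Return the original image followed by its 11 rotated copies."""
    return [img] + [rotate_ccw(img, a) for a in ANGLES_DEG]
```

Because the row index grows downward in image arrays, the sign pattern above yields a visually counterclockwise rotation; swapping the signs of the two `sin` terms would rotate clockwise instead.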
Training samples for the CNN classifier were created from the collected images as patches of partial barcode (or background) images. To create the sample patches, the images were first downsampled by a factor of 0.5 in spatial resolution and then segmented into non-overlapping 32×32 patches. As a result, a total of 20,400 Code 39, 2,998 QR, and 8,745 background patches were gathered. Figure 3.2 illustrates some training sample patches.
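The patch-generation step can be sketched as below. It assumes grayscale NumPy arrays, implements the 0.5 downsampling as 2×2 block averaging (the text does not state the interpolation method), and drops border strips narrower than 32 pixels. The function names are hypothetical. For a 752×480 scanner frame, this yields a 376×240 image and 11 × 7 = 77 patches.

```python
import numpy as np

PATCH = 32  # patch side length in pixels, as stated in the text

def downsample_half(img):
    """Halve the spatial resolution by averaging 2x2 pixel blocks (assumed method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def extract_patches(img, size=PATCH):
    """Cut an image into non-overlapping size x size patches in row-major order,
    dropping incomplete strips at the right and bottom borders."""
    rows, cols = img.shape[0] // size, img.shape[1] // size
    trimmed = img[:rows * size, :cols * size]
    return trimmed.reshape(rows, size, cols, size).swapaxes(1, 2).reshape(-1, size, size)
```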
Figure 3.2 Training sample patches of (a) 2D barcode, (b) 1D barcode and (c) background. The samples were patches of various module sizes and angles.
3.2 CNN architecture
A CNN system was developed for identifying partial barcode patches. The network was adapted from the architecture proposed by LeCun et al. [19]. The input to the system was an image patch of 32×32 pixels, and the network determined whether the patch was part of a 1D barcode, a 2D barcode, or the background. The CNN system consisted of six layers: two convolutional layers, C1 and C2; two subsampling layers, S1 and S2; and two classification layers, N1 and N2 (Fig. 3.3). Layers C1 through S2 contained a series of planes, referred to as feature maps, that functioned as trainable feature extractors. Layers C1 and C2 contained 6 feature maps of 28×28 pixels and 12 feature maps of 10×10 pixels, respectively. Each feature map was computed by a convolution operation on the previous layer with a trainable 5×5 kernel; the convolution output was summed with a trainable bias and fed into a sigmoid function. Therefore, layers C1 and C2 contained 156 (25×6+6) and 312 (25×12+12) trainable parameters, respectively. Layers S1 and S2 contained 6 feature maps of 14×14 pixels and 12 feature maps of 5×5 pixels, respectively. These feature maps were obtained by subsampling the feature maps in layers C1 and C2 by a factor of 0.5.
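The feature-map sizes and the parameter counts of the convolutional layers follow from two rules: a 5×5 "valid" convolution shrinks an n×n map to (n−4)×(n−4), and subsampling by 0.5 halves each side. Each convolutional feature map has 25 kernel weights and one bias, and the 300 outputs of S2 match the 300 neurons of N1.

```latex
\begin{aligned}
\text{C1:}\quad & 32 - 5 + 1 = 28, & 6\,(5 \times 5) + 6 &= 156 \text{ parameters}\\
\text{S1:}\quad & 28 \times 0.5 = 14 & &\\
\text{C2:}\quad & 14 - 5 + 1 = 10, & 12\,(5 \times 5) + 12 &= 312 \text{ parameters}\\
\text{S2:}\quad & 10 \times 0.5 = 5, & 12 \times 5 \times 5 &= 300 \text{ outputs}
\end{aligned}
```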
Layers N1 and N2 formed a classical perceptron network to perform classification. Layer N1 contained 300 neurons, each connected to one pixel in layer S2 (12 × 5 × 5 = 300). Layer N2 comprised 3 neurons fully connected to all the neurons in N1. Each N2 neuron was the output of a sigmoid function applied to the weighted sum of all the N1 neurons plus a bias. Therefore, layers N1 and N2 contained 900 (300×3) trainable weights and 3 trainable biases.
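To make the layer bookkeeping concrete, here is a forward-pass sketch in NumPy. It is not the trained network: the weights are random, subsampling is assumed to be plain 2×2 averaging with no trainable parameters, and the output order (1D, 2D, background) is assumed. Because the text does not say how each C2 map connects to the six S1 maps, the sketch assumes each C2 map reads a single S1 map (C2 map m uses S1 map m // 2), which reproduces the stated 312 parameters. The names `forward` and `init_params` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_valid(img, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN layers usually call convolution."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap):
    """Subsample by 0.5: average each non-overlapping 2x2 neighbourhood (no trainable weights)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def init_params(rng):
    return {
        "c1_kernels": rng.normal(0.0, 0.1, (6, 5, 5)), "c1_bias": np.zeros(6),
        "c2_kernels": rng.normal(0.0, 0.1, (12, 5, 5)), "c2_bias": np.zeros(12),
        "n2_weights": rng.normal(0.0, 0.1, (3, 300)), "n2_bias": np.zeros(3),
    }

def forward(patch, p):
    """Map a 32x32 patch to three sigmoid scores (assumed order: 1D, 2D, background)."""
    c1 = [sigmoid(conv_valid(patch, p["c1_kernels"][m]) + p["c1_bias"][m]) for m in range(6)]  # 6 maps, 28x28
    s1 = [subsample(f) for f in c1]                                     # 6 maps, 14x14
    # Assumption: C2 map m sees only S1 map m // 2, matching 25*12+12 parameters.
    c2 = [sigmoid(conv_valid(s1[m // 2], p["c2_kernels"][m]) + p["c2_bias"][m]) for m in range(12)]  # 12 maps, 10x10
    s2 = [subsample(f) for f in c2]                                     # 12 maps, 5x5
    n1 = np.concatenate([f.ravel() for f in s2])                        # 300 N1 neurons
    return sigmoid(p["n2_weights"] @ n1 + p["n2_bias"])                 # 3 N2 outputs

rng = np.random.default_rng(0)
params = init_params(rng)
scores = forward(rng.random((32, 32)), params)
n_params = sum(v.size for v in params.values())
```

Counting the entries of `params` gives 150 + 6 + 300 + 12 + 900 + 3 = 1371, the number of parameters that the study trained with back-propagation.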
Figure 3.3 CNN architecture.
Stochastic back-propagation was applied to train the 1371 CNN parameters. The training samples were shuffled and arranged into 478 batches, and in each epoch the model parameters were updated by back-propagation after every batch. The shuffling was repeated at every epoch; this randomization of the training data was intended to help the training converge toward the global minimum rather than stall in a local one. The system was trained by cycling through all the batches for 2000 epochs.
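The batching scheme can be sketched as follows. `update_step` is a hypothetical placeholder for one back-propagation update of the CNN parameters; the sketch shows only how the samples are reshuffled every epoch and split into 478 nearly equal batches with `numpy.array_split`.

```python
import numpy as np

N_BATCHES = 478   # number of batches stated in the text
N_EPOCHS = 2000   # number of epochs stated in the text

def epoch_batches(n_samples, n_batches, rng):
    """Shuffle the sample indices and split them into n_batches nearly equal batches."""
    order = rng.permutation(n_samples)
    return np.array_split(order, n_batches)

def train(X, y, update_step, n_epochs=N_EPOCHS, n_batches=N_BATCHES, seed=0):
    """Cycle through freshly shuffled batches every epoch.

    update_step(X_batch, y_batch) is a hypothetical placeholder for one
    back-propagation step that updates the CNN parameters in place.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for idx in epoch_batches(len(X), n_batches, rng):  # new order every epoch
            update_step(X[idx], y[idx])
```

If all 32,143 patches collected in Section 3.1 were used for training (the text does not say whether any were held out), each batch would contain 67 or 68 patches.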
3.3 Detection of barcode with various module sizes
A spatial pyramid (SP) [21] was applied to enable the detection of barcodes at various scales. In the SP process, an input image was downsampled to various spatial resolutions, forming a pyramid of images (Fig. 3.4). The images were partitioned into 32×32 patches, which were then fed to the developed CNN classifier for barcode detection.
Once barcode parts were detected, the locations of the patches in the downsampled images were projected back to the input image, and the corresponding regions in the input image, referred to as blocks, were identified for subsequent processing. In this study, the downsampling factors for the SP operation were set to 0.7, 0.5, and 0.3. These factors were chosen to detect barcodes with module sizes ranging from 2 to 11 ppm for 1D barcodes and from 3 to 13 ppm for 2D barcodes (Table 3.1).
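The pyramid search and the back-projection of patch locations can be sketched as below. A patch at row r and column c of the level downsampled by factor f covers x from c·32/f to (c+1)·32/f (and likewise in y) in the input image, so each block is 32/f pixels wide; by the same scaling, a module of m pixels in the input occupies about f·m pixels at that level (for instance, a 5 ppm 1D barcode appears at about 3.5 ppm in the 0.7 level). Assumptions: nearest-neighbour resizing, patches that do not fit completely are skipped, and `classify` stands in for the trained CNN and returns the hypothetical labels "1D", "2D", or "background".

```python
import numpy as np

PATCH = 32
PYRAMID_FACTORS = (0.7, 0.5, 0.3)  # downsampling factors stated in the text

def resize_nearest(img, factor):
    """Nearest-neighbour rescaling of a 2-D array by `factor` (< 1 downsamples)."""
    h, w = img.shape
    nh, nw = max(1, int(round(h * factor))), max(1, int(round(w * factor)))
    ys = np.minimum((np.arange(nh) / factor).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / factor).astype(int), w - 1)
    return img[np.ix_(ys, xs)]

def detect_blocks(img, classify, factors=PYRAMID_FACTORS, size=PATCH):
    """Classify non-overlapping patches at every pyramid level and return the
    positive ones as (label, (x0, y0, x1, y1)) blocks in input-image coordinates."""
    blocks = []
    for f in factors:
        level = resize_nearest(img, f)
        for r in range(level.shape[0] // size):
            for c in range(level.shape[1] // size):
                patch = level[r * size:(r + 1) * size, c * size:(c + 1) * size]
                label = classify(patch)
                if label != "background":
                    x0, y0 = c * size / f, r * size / f  # undo the downsampling
                    blocks.append((label, (x0, y0, x0 + size / f, y0 + size / f)))
    return blocks
```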
Figure 3.4 Image pyramid for barcode localization. The images on the left are the results of downsampling the original image by factors of 0.7, 0.5, and 0.3. The solid frames represent the patches where barcode parts were detected. The dashed frames are the blocks of the detected barcode parts in the original image.
Table 3.1 Target module sizes for barcode detection.
Spatial pyramid factor | 1D barcode module size (ppm) | 2D barcode module size (ppm)
0.7 | 5 | 5.7
3.4 Scan line extraction for one-dimensional barcodes
The scan lines for 1D barcodes were extracted from the input image (Fig. 3.5). The extraction operation was adapted from the technique proposed by Chai and Hock [8]. The approach first gathered the positive detection blocks obtained from the SP. Otsu thresholding [26] and Canny edge detection [27] were applied to each block to enhance the patterns of parallel lines of 1D barcodes. A Hough transform [28] was next applied to identify the orientations of the parallel lines in each block, and the median orientation of the parallel lines over all the blocks was then determined. The scan lines were the lines passing through the centers of the blocks in the direction perpendicular to the median orientation. A demonstration of the 1D barcode extraction procedure is shown in Figure 3.6.
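The geometry of the final step can be sketched as follows, assuming the per-block line orientations have already been measured (e.g., with a Hough transform) and expressed in degrees in [0, 180) in image coordinates. The function name and the fixed scan-line `length` are hypothetical; the text does not state how long the scan lines were.

```python
import numpy as np

def scan_lines(block_centers, line_angles_deg, length):
    """Build one scan line per detected block.

    block_centers   -- (x, y) centres of the positively detected blocks
    line_angles_deg -- orientations of the parallel bar edges found in the blocks,
                       in degrees in [0, 180), image coordinates (y pointing down)
    length          -- hypothetical scan-line length in pixels
    Returns a list of (start, end) points, one pair per block.
    """
    median_angle = np.median(line_angles_deg)
    theta = np.deg2rad(median_angle + 90.0)  # direction perpendicular to the bars
    half = 0.5 * length * np.array([np.cos(theta), np.sin(theta)])
    centers = np.asarray(block_centers, dtype=float)
    return [(c - half, c + half) for c in centers]
```

The plain median ignores the wrap-around of orientations (bars at 1° and 179° are nearly parallel), so a more robust variant would use circular statistics on doubled angles. In the common (ρ, θ) Hough parameterization, θ is the direction of each line's normal, which is already perpendicular to the bars.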
Figure 3.5 Scan line extraction process.
Figure 3.6 A demonstration of 1D barcode extraction operations: (a) input image, (b) barcode candidates, (c) block images, (d) binarized block images, (e) edge-detection block images, and (f) extracted scan lines.
3.5 Region extraction for two-dimensional barcodes
Complete regions of 2D barcodes were extracted from the input image (Fig. 3.7).
The extraction first involved smearing the regions that could potentially contain barcodes. Adaptive thresholding [29] and Canny edge detection were applied to determine the boundaries of the objects in the regions. The regions were then smeared using morphological closing with a square structuring element. The size of the structuring element was set to the barcode module size, which was estimated as the median pixel size of the black and white blobs between the centers of two consecutive positively detected patches in the image. Because the smeared regions could still contain holes, morphological filling was then applied to fill them. Morphological opening was next applied to remove small noise speckles from the background of the image.
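A NumPy-only sketch of the smearing stage follows. Several details are assumptions rather than the study's settings: the binary edge map (after adaptive thresholding and Canny) is taken as given, the module size is read as the median run length along the segment joining two patch centres, the same square structuring element is reused for the opening, and hole filling treats any background pixel not 4-connected to the image border as a hole. All function names are hypothetical; libraries such as OpenCV or SciPy provide optimized versions of these morphological operations.

```python
from collections import deque
import numpy as np

def estimate_module_size(binary, p0, p1):
    """Median length (pixels) of dark/light runs sampled along the segment p0 -> p1.
    p0, p1 are (x, y) centres of two consecutive positive patches; the first and
    last runs are dropped because they are usually truncated."""
    n = int(np.hypot(p1[0] - p0[0], p1[1] - p0[1])) + 1
    xs = np.rint(np.linspace(p0[0], p1[0], n)).astype(int)
    ys = np.rint(np.linspace(p0[1], p1[1], n)).astype(int)
    profile = binary[ys, xs].astype(np.int8)
    breaks = np.flatnonzero(np.diff(profile)) + 1
    runs = np.diff(np.concatenate(([0], breaks, [n])))
    inner = runs[1:-1] if len(runs) > 2 else runs
    return int(np.median(inner))

def _windows(mask, k, pad_value):
    """Stack the k*k shifted copies of mask covered by a k x k square element."""
    r = k // 2
    padded = np.pad(mask, r, constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)])

def dilate(mask, k):
    return _windows(mask, k, False).any(axis=0)

def erode(mask, k):
    return _windows(mask, k, True).all(axis=0)

def fill_holes(mask):
    """Set to True every background pixel that is not 4-connected to the image border."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x]:
                outside[y, x] = True
                queue.append((y, x))
    while queue:  # breadth-first flood fill of the background from the border
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    return ~outside

def smear(edges, module):
    closed = erode(dilate(edges, module), module)  # morphological closing
    filled = fill_holes(closed)                    # morphological filling
    return dilate(erode(filled, module), module)   # morphological opening
```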
Because the proposed smearing approach took the module size of the 2D barcodes into account, it could selectively smear the regions likely to be barcodes while keeping background objects separate from the barcode regions.
The extraction subsequently involved segmentation and standardization. The smeared regions associated with positive detections from the CNN were identified, and Hough transforms were performed to detect the boundaries of these regions. A perspective transformation [30] was next applied to restore each potentially distorted quadrangular region to an undistorted square (Figure 3.8). A demonstration of the 2D barcode extraction procedure is shown in Figure 3.9.
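The rectification step can be sketched with the standard four-point (direct linear transform) solution for a homography, followed by inverse mapping with nearest-neighbour sampling. This is an illustration under assumptions, not the study's code: the four corners (for example, intersections of the Hough boundary lines) are given as (x, y) pairs in top-left, top-right, bottom-right, bottom-left order, and the output square size is a free, hypothetical parameter.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 perspective matrix H (with h33 = 1) that maps four
    src points (x, y) onto four dst points (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def rectify(img, corners, size):
    """Warp the quadrangle given by corners (TL, TR, BR, BL as (x, y)) onto a
    size x size square by inverse mapping with nearest-neighbour sampling."""
    square = [(0, 0), (size - 1, 0), (size - 1, size - 1), (0, size - 1)]
    H_back = homography_from_corners(square, corners)  # output pixel -> input pixel
    vs, us = np.mgrid[0:size, 0:size]
    src = apply_homography(H_back, np.column_stack([us.ravel(), vs.ravel()]))
    xs = np.clip(np.rint(src[:, 0]).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.rint(src[:, 1]).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs].reshape(size, size)
```

Mapping from the output square back into the input image guarantees that every output pixel receives a value, which is why the homography is solved in the square-to-quadrangle direction inside `rectify`.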
Figure 3.7 Area extraction for 2D barcodes.
Figure 3.8 An illustration of the inverse perspective transformation
Figure 3.9 A demonstration of 2D barcode extraction operations: (a) input image, (b) binarized image, (c) edge-detection image, (d) closed image, (e) filled image, (f) opened image, (g) barcode candidate image, and (h) output image.