5.3 Baseline, AWS, and HCS
5.3.2 Speed
Localization and recognition with the adaptive window search (AWS) algorithm is four times faster than with the baseline (RANSAC). The main reason the baseline method takes so long to localize and recognize each book is that, in each round, more book images have to be checked by geometric verification than in AWS. In addition, each recognition step involves more features, because all features in the input image must be considered except those in already-matched regions. In contrast, with AWS the number of features sent to the recognition server is smaller than in the baseline, because only the features in the proposed region are sent. The proposed region contains fewer noisy features, so the feature-matching step is less likely to be affected by the cluttered background. Therefore, we only need to examine the first image-matching result and check whether its number of inlier features is large enough. Empirically, the number of images that need to be checked by spatial verification can also be reduced to at most five. For these reasons, AWS reduces the time needed to recognize and localize objects.
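The two savings described above (fewer query features and a capped number of verified candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AWS implementation: the box format, `count_inliers`, and the toy keypoints and inlier counts are hypothetical; only the bound of five verifications comes from the text.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def features_in_region(points: Sequence[Point], box: Box) -> List[Point]:
    """Keep only the keypoints inside the proposed window; AWS sends only these."""
    x0, y0, x1, y1 = box
    return [(x, y) for (x, y) in points if x0 <= x <= x1 and y0 <= y <= y1]


def verify_top_candidates(
    ranked_ids: Sequence[str],
    count_inliers: Callable[[str], int],
    min_inliers: int,
    max_checks: int = 5,
) -> Tuple[Optional[str], int]:
    """Geometrically verify at most `max_checks` ranked candidates.

    `count_inliers` is a hypothetical stand-in for matching plus geometric
    verification against one database image. Returns the first candidate
    whose inlier count reaches `min_inliers` (or None) and the number of
    verifications that were run.
    """
    checks = 0
    for cand in ranked_ids[:max_checks]:
        checks += 1
        if count_inliers(cand) >= min_inliers:
            return cand, checks
    return None, checks


# Toy usage with made-up keypoints and inlier counts.
keypoints = [(10, 10), (50, 60), (200, 220), (55, 65)]
region_feats = features_in_region(keypoints, (0, 0, 100, 100))  # drops (200, 220)
inlier_table = {"bookA": 4, "bookB": 25, "bookC": 40}
best, n_checks = verify_top_candidates(
    ["bookA", "bookB", "bookC"], lambda c: inlier_table[c], min_inliers=20
)
print(len(region_feats), best, n_checks)  # 3 bookB 2
```

However long the ranked list is, the cap keeps the number of verifications per region at five, whereas the baseline's cost grows with both the shortlist and the full set of unmatched features.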
HCS does not improve much on the baseline algorithm in terms of speed.
HCS must spend time building the hierarchical clustering tree and sending every cluster at each level to the recognition server. Each cluster contains fewer features than the whole image sent to the server in the baseline, but more than the proposed region in AWS; this is why HCS is faster than the baseline but slower than AWS. Moreover, AWS terminates its search for possible regions early, whereas HCS checks every cluster at every level until the number of inliers in a level falls below the threshold. This causes HCS to spend more time than AWS.
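The level-by-level cost of HCS can be illustrated with a small sketch. The split rule (halving each cluster at the median of its wider axis) is an assumed stand-in for whatever hierarchical clustering HCS actually uses, the stopping test reads "the number of inliers in a level is smaller than the threshold" as "no cluster in the level reaches the threshold", and inlier counts are faked with cluster size.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Cluster = List[Point]


def split(cluster: Cluster) -> List[Cluster]:
    """Halve a cluster at the median of its wider axis (assumed split rule)."""
    if len(cluster) < 2:
        return [cluster]
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(cluster, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return [ordered[:mid], ordered[mid:]]


def build_levels(points: Cluster, depth: int) -> List[List[Cluster]]:
    """Level 0 is the whole image; each further level splits every cluster."""
    levels = [[points]]
    for _ in range(depth):
        levels.append([child for c in levels[-1] for child in split(c)])
    return levels


def hcs_search(
    levels: List[List[Cluster]],
    count_inliers: Callable[[Cluster], int],
    min_inliers: int,
) -> Tuple[List[Cluster], int]:
    """Query every cluster of every level; stop at the first level in which
    no cluster reaches `min_inliers`. Returns (accepted clusters, queries)."""
    accepted: List[Cluster] = []
    queries = 0
    for level in levels:
        hits = []
        for cluster in level:
            queries += 1  # one round trip to the recognition server
            if count_inliers(cluster) >= min_inliers:
                hits.append(cluster)
        if not hits:
            break
        accepted.extend(hits)
    return accepted, queries


# Toy usage: two groups of four keypoints; inliers faked as cluster size.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
levels = build_levels(pts, depth=2)
accepted, queries = hcs_search(levels, count_inliers=len, min_inliers=3)
print([len(c) for c in levels[1]], len(accepted), queries)  # [4, 4] 3 7
```

Even in this tiny example, HCS issues seven server queries, including four for the last level that yields nothing, which mirrors the extra time described above.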
Table 5.2: Recall, precision, and speed obtained by applying RANSAC and FastGV to the AWS and HCS algorithms. The Oracle result is obtained by recognizing the ground-truth books. X denotes an entry that is not applicable.
Algorithm                  Oracle   AWS (RANSAC)   AWS (FastGV)   HCS (RANSAC)   HCS (FastGV)
Testing images             27 (all algorithms)
Total books                155 (all algorithms)
Detected books             X        111            85             96             88
Recognized books           121      101            84             84             81
Recall                     78.06%   65.16%         54.19%         54.19%         52.26%
Precision                  X        90.99%         98.82%         87.5%          92.05%
Processing time per book   X        4.42 s         4.22 s         14.58 s        10.73 s
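The Recall and Precision rows appear to follow from the count rows: recall divides the recognized books by the 155 ground-truth books, and precision divides them by the detected books. These definitions are our reading of the table rather than something stated in this section; the short check below reproduces every percentage in Table 5.2 to two decimals.

```python
from typing import Optional

TOTAL_BOOKS = 155  # ground-truth books over the 27 test images

# (detected, recognized) per column of Table 5.2; None marks an "X" entry.
counts = {
    "Oracle": (None, 121),
    "AWS (RANSAC)": (111, 101),
    "AWS (FastGV)": (85, 84),
    "HCS (RANSAC)": (96, 84),
    "HCS (FastGV)": (88, 81),
}


def recall(recognized: int, total: int = TOTAL_BOOKS) -> float:
    """Fraction of all ground-truth books that were recognized."""
    return recognized / total


def precision(recognized: int, detected: Optional[int]) -> Optional[float]:
    """Fraction of detections that were correctly recognized (None if n/a)."""
    return None if detected is None else recognized / detected


for name, (detected, recognized) in counts.items():
    p = precision(recognized, detected)
    p_text = "X" if p is None else f"{100 * p:.2f}%"
    print(f"{name:<13} recall {100 * recall(recognized):.2f}%  precision {p_text}")
```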
Figure 5.4: Sample results of FastGV and RANSAC applied to the HCS algorithm.
Yellow bounding boxes represent the books recognized and localized by the system.
Sample results of FastGV and RANSAC applied to the HCS algorithm are shown in Figure 5.4. As shown in Table 5.2, RANSAC achieves higher recall than FastGV in both the AWS and HCS algorithms, but lower precision and lower speed. In AWS, RANSAC obtains 65.16% recall and 90.99% precision, while FastGV obtains 54.19% recall and 98.82% precision; RANSAC spends 4.42 seconds to recognize and localize a book, while FastGV spends 4.22 seconds. In HCS, RANSAC obtains 54.19% recall and 87.5% precision, while FastGV obtains 52.26% recall and 92.05% precision; RANSAC spends 14.58 seconds to recognize and localize a book, while FastGV spends only 10.73 seconds. These results show that the proposed FastGV algorithm achieves comparable overall performance (lower recall but higher precision) while reducing the processing time.
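The size of the time reduction can be made explicit from the per-book times in Table 5.2. The snippet below only re-derives relative savings from those reported numbers; no new measurements are involved.

```python
# Per-book processing times from Table 5.2, in seconds.
per_book_time = {
    "AWS": {"RANSAC": 4.42, "FastGV": 4.22},
    "HCS": {"RANSAC": 14.58, "FastGV": 10.73},
}


def relative_saving(t_ransac: float, t_fastgv: float) -> float:
    """Fraction of per-book time saved by replacing RANSAC with FastGV."""
    return (t_ransac - t_fastgv) / t_ransac


for alg, t in per_book_time.items():
    saving = relative_saving(t["RANSAC"], t["FastGV"])
    print(f"{alg}: {100 * saving:.1f}% less time per book")
# AWS: 4.5% less time per book
# HCS: 26.4% less time per book
```

The saving is much larger in HCS than in AWS. This is consistent with HCS running geometric verification on many more clusters per book, although the per-query breakdown is not reported here, so that explanation remains an inference.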
5.4.1 Recognition Results and Precision
The RANSAC algorithm considers all feature-matching pairs when estimating the affine model, whereas the FastGV algorithm first checks the candidate feature-matching pairs for geometric consistency and then uses only the matching pairs in the high-inlier-density region to calculate the affine model. As a result, some possible inliers (correctly matched feature pairs) are filtered out, and some objects cannot be localized by FastGV; this is why RANSAC achieves higher recall than FastGV. On the other hand, RANSAC is more likely to produce false positives by taking outlier feature-matching pairs as inliers. Therefore, FastGV achieves higher precision than RANSAC.
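To make the RANSAC side of this contrast concrete, here is a minimal textbook RANSAC [7] for an affine model: each round fits a hypothesis to three random matches, and then every match votes. This is a generic sketch, not this thesis's implementation; the iteration count, the 3-pixel tolerance, and the toy data are assumptions, and FastGV's density-based pre-filtering is not reproduced.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Tuple[float, float], Tuple[float, float]]  # ((x, y), (x2, y2))
Affine = Tuple[float, float, float, float, float, float]  # x2 = a*x+b*y+c, y2 = d*x+e*y+f


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine(sample: Sequence[Pair]) -> Optional[Affine]:
    """Exact affine model from three correspondences via Cramer's rule."""
    m = [[x, y, 1.0] for (x, y), _ in sample]
    d = det3(m)
    if abs(d) < 1e-9:  # collinear source points: degenerate sample
        return None

    def solve(rhs: List[float]) -> List[float]:
        sol = []
        for col in range(3):
            mc = [row[:] for row in m]
            for i in range(3):
                mc[i][col] = rhs[i]
            sol.append(det3(mc) / d)
        return sol

    a, b, c = solve([x2 for _, (x2, _) in sample])
    d2, e, f = solve([y2 for _, (_, y2) in sample])
    return a, b, c, d2, e, f


def residual(model: Affine, pair: Pair) -> float:
    """Distance between the mapped source point and its matched point."""
    a, b, c, d, e, f = model
    (x, y), (x2, y2) = pair
    return ((a * x + b * y + c - x2) ** 2 + (d * x + e * y + f - y2) ** 2) ** 0.5


def ransac_affine(pairs: List[Pair], iters: int = 200, tol: float = 3.0, seed: int = 0):
    """Each round fits a model to 3 random pairs; then every pair votes."""
    rng = random.Random(seed)
    best_model: Optional[Affine] = None
    best_inliers: List[Pair] = []
    for _ in range(iters):
        model = fit_affine(rng.sample(pairs, 3))
        if model is None:
            continue
        inliers = [p for p in pairs if residual(model, p) < tol]  # all pairs vote
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy data: 20 matches from a known affine map plus 5 gross outliers.
def true_map(x: float, y: float) -> Tuple[float, float]:
    return 2 * x + 0.5 * y + 1, -0.3 * x + 1.5 * y + 4


good = [((x, y), true_map(x, y)) for x in range(5) for y in range(4)]
bad = [((1, 1), (50, -50)), ((2, 3), (-40, 60)), ((3, 0), (80, 80)),
       ((4, 2), (-70, -10)), ((0, 3), (90, -90))]
model, inliers = ransac_affine(good + bad)
print(len(inliers))  # 20
```

Every round costs one vote per match. With a loose tolerance, a wrong hypothesis can still collect spurious votes from outliers, which is the false-positive risk described above.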
5.4.2 Speed
The RANSAC algorithm proposes a candidate affine model in each round, and all feature-matching pairs must vote in every round. In contrast, FastGV examines each feature-matching pair only once to calculate the three geometric similarity scores, which makes it faster than RANSAC. In AWS, the geometric verification step takes 0.40 s with RANSAC and 0.33 s with FastGV, so FastGV saves about 17.5% of the verification time ((0.40 - 0.33) / 0.40 = 0.175).
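The difference in work can be estimated with a back-of-the-envelope count. The sketch treats each pair-versus-hypothesis test as one unit of work and ignores the cost of computing FastGV's three scores. The 300 matches and 500 rounds are hypothetical sizes; only the 0.40 s and 0.33 s timings come from the text.

```python
def ransac_votes(num_pairs: int, num_rounds: int) -> int:
    """Pair-vs-hypothesis tests: every pair votes in every RANSAC round."""
    return num_pairs * num_rounds


def single_pass_checks(num_pairs: int) -> int:
    """Pairs examined when each pair is checked only once."""
    return num_pairs


# Measured geometric-verification times in AWS (seconds), from the text.
t_ransac, t_fastgv = 0.40, 0.33
saving = (t_ransac - t_fastgv) / t_ransac
print(f"time saved: {100 * saving:.1f}%")  # time saved: 17.5%

# Illustrative, assumed sizes: 300 tentative matches, 500 RANSAC rounds.
print(ransac_votes(300, 500), single_pass_checks(300))  # 150000 300
```

The measured saving (17.5%) is far smaller than this vote-count ratio suggests. That gap is expected if verification time also includes fixed costs such as computing the scores and fitting the final model.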
5.5 Comparisons with Similar Works
recognize and localize books. In the worst case, where there are no books in the input image, the branch-and-bound methods in [4], [5] may need a large number of iterations to converge to the optimal solution. With AWS, if the best-matching image retrieved from the recognition server for the input image does not pass the threshold after geometric verification, the process automatically stops proposing regions of interest. Therefore, AWS handles the worst case more efficiently than these previous works.
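The early exit can be sketched as a region-proposal loop that stops at the first failed verification. Both callables are hypothetical stand-ins for the AWS region proposal and for the recognition server with geometric verification. The 50-region cap is an assumption, and no branch-and-bound baseline is implemented here.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def aws_loop(
    propose_region: Callable[[List[Box]], Optional[Box]],
    best_match_inliers: Callable[[Box], int],
    min_inliers: int,
    max_regions: int = 50,
) -> Tuple[List[Box], int]:
    """Propose regions one at a time and stop as soon as verification fails.

    `propose_region` stands in for the AWS region proposal and
    `best_match_inliers` for the recognition server plus geometric
    verification of the best-matching image (both hypothetical).
    Returns (accepted regions, number of server queries).
    """
    accepted: List[Box] = []
    queries = 0
    while len(accepted) < max_regions:
        region = propose_region(accepted)
        if region is None:
            break
        queries += 1
        if best_match_inliers(region) < min_inliers:
            break  # best match fails the threshold: stop proposing regions
        accepted.append(region)
    return accepted, queries


# Worst case from the text: an image with no books costs a single query.
empty, q_empty = aws_loop(lambda done: (0, 0, 640, 480), lambda r: 0, min_inliers=20)

# Toy image with three books that always verify.
books = [(0, 0, 50, 100), (50, 0, 100, 100), (100, 0, 150, 100)]


def next_book(done: List[Box]) -> Optional[Box]:
    rest = [b for b in books if b not in done]
    return rest[0] if rest else None


found, q_found = aws_loop(next_book, lambda r: 30, min_inliers=20)
print(q_empty, len(found), q_found)  # 1 3 3
```

The number of queries is bounded by the number of books plus one, so an empty image terminates after a single query. An exhaustive subwindow search, by contrast, has no such data-dependent exit.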
Chapter 6 Conclusions
In this work, we have proposed two algorithms, adaptive window search (AWS) and hierarchical cluster search (HCS), that select regions likely to contain books and send them to the recognition server for image matching and geometric verification. We have also proposed a fast geometric verification algorithm (FastGV) to speed up the verification step. The results show no significant improvement for HCS, which we attribute to the testing dataset, whereas AWS improves speed by about four times and recall by about 20%. The fast geometric verification algorithm improves speed in both AWS and HCS.
In AWS, the geometric verification time is reduced by 17.5%. In future work, applications can be designed to use book information such as the ISBN as a bridge to retrieve related data through social networks after recognizing and localizing multiple books. AWS and HCS can also be applied to other recognition systems with large-scale databases (e.g., logos of everyday objects) to develop other applications.
References
[1] aNobii, “anobii,” http://www.anobii.com/.
[2] Sam S. Tsai, David Chen, Vijay Chandrasekhar, Gabriel Takacs, Ngai-Man Cheung, Ramakrishna Vedantham, Radek Grzeszczuk, and Bernd Girod, “Mobile product recognition,” in Proceedings of the international conference on Multimedia, New York, NY, USA, 2010, pp. 1587–1590, ACM.
[3] Stephan Gammeter, Alexander Gassmann, Lukas Bossard, Till Quack, and Luc Van Gool, “Server-side object recognition and client-side object tracking for mobile augmented reality,” in Proceedings of IEEE International Workshop on Mobile Vision (CVPR 2010), 2010.
[4] Christoph H. Lampert, Matthew B. Blaschko, and Thomas Hofmann, “Beyond sliding window: object localization by efficient subwindow search,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[5] Tom Yeh, John J. Lee, and Trevor Darrell, “Fast concurrent object localization and recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, June 2009, pp. 280–287.
[6] Olga Russakovsky and Andrew Y. Ng, “A steiner tree approach to efficient object detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, June 2010, pp. 1070–1077.
[7] Martin A. Fischler and Robert C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, 1981.
[8] Sam S. Tsai, David Chen, Gabriel Takacs, Vijay Chandrasekhar, Ramakrishna Vedantham, Radek Grzeszczuk, and Bernd Girod, “Fast geometric re-ranking for image-based retrieval,” in Proceedings of IEEE International Conference on Image Processing, 2010, September 2010, pp. 1029–1032.
[9] 博客來, “博客來,” http://www.books.com.tw/.
[10] Krystian Mikolajczyk and Cordelia Schmid, “Scale and affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, pp. 63–86, 2004.
[11] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” International Journal of Computer Vision, vol. 65, pp. 43–72, 2005.
[12] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.
Pinz, Eds., vol. 3951 of Lecture Notes in Computer Science, pp. 404–417. Springer Berlin / Heidelberg, 2006.
[15] David Chen, Sam S. Tsai, Bernd Girod, Cheng-Hsin Hsu, Kyu-Han Kim, and Jatinder Pal Singh, “Building book inventories using smartphones,” in Proceedings of the international conference on Multimedia, New York, NY, USA, 2010, pp. 651–654, ACM.
[16] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, June 2007, pp. 1–8.