
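Next-Generation Service Robots: Technologies for Rapid Task Definition and Fully Autonomous Execution, Subproject 3: Robot Vision Technology for Fast Recognition of 3D Objects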


National Science Council, Executive Yuan: Research Project Final Report


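Research Report (Concise Version)

Project type: Integrated

Project number: NSC 99-2221-E-011-098-

Execution period: August 1, 2010 to July 31, 2011

Executing institution: Department of Mechanical Engineering, National Taiwan University of Science and Technology

Principal investigator: 徐繼聖

Project participants: 郭昱慶 (master's student, part-time assistant); 卓佑霖 (master's student, part-time assistant); Pendry Alexandra (master's student, part-time assistant)

Report attachments: report on attending international conferences, and the published papers

Disposition: this project report is available for public inquiry

Republic of China, October 26, 2011 (ROC year 100)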


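NSC-Funded Research Project

Final Report

□ Interim progress report

Robot Vision Technology for Fast Recognition of 3D Objects, Phase 1

Project type: □ Individual project ■ Integrated project   Project number: NSC 99-2221-E-011-098-

Execution period: August 1, 2010 to July 31, 2011

Principal investigator: 徐繼聖

Project participants: 郭昱慶, 卓佑霖, Pendry Alexandra

Report type (as required by the approved funding list): ■ Concise report □ Full report

This report includes the following required attachments:

■ Table of R&D results available for promotion

□ Report on overseas travel or study (one copy)

□ Report on travel or study in mainland China (one copy)

■ Report on attending an international academic conference, and the published paper (one copy each)

Disposition: except for industry-academia cooperative research projects, projects for enhancing industrial technology and talent cultivation, controlled projects, and the following cases, the report may be made publicly available immediately:

□ Involves patents or other intellectual property rights; may be made public after □ one year □ two years

Executing institution: National Taiwan University of Science and Technology

Republic of China, October 10, 2011 (ROC year 100)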


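Abstract (Chinese Version)

The goal of this project is to develop a vision system that can rapidly recognize 3D objects under different viewpoints and illumination conditions.

The research first reviews the invariant feature detectors and descriptors that have developed rapidly in recent years, summarizes the object characteristics that different invariant features are suited to recognize, identifies the features that are more robust to changes in illumination and viewpoint, and applies them to the recognition of a real 3D object: dice.

Dice recognition will be a key technology for a type of entertainment robot. The main steps are separating the dice from the background, segmenting the dice faces, verifying coplanarity, and detecting and recognizing the dots. Experiments show that certain invariant features can effectively separate foreground from background, segment dice faces of different orientations, and verify coplanarity; they also help establish transformations between viewpoints, which makes them suitable for feature matching across views.

Moreover, these invariant features tolerate large changes in the light source, which helps object recognition under different illumination conditions. This research also applied invariant features to 3D spatial modeling and to character segmentation in license plates, both with positive results. The concrete outputs of this project include one book chapter, three international conference papers, two domestic conference papers, one pending patent application, and one journal paper under review.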

Abstract (English Version)

The objective of this research is to develop a monocular vision system for fast 3D object recognition that is robust to illumination and viewpoint variation. Invariant feature detectors and descriptors are studied for their potential in this development. An extensive review is completed that summarizes the findings and comments reported by many researchers who made substantial contributions to the study of invariant features and interest regions in the last decade. Guided by this review, several invariant features are exploited and tested in an automatic dice reading system, which involves the segmentation of 3D objects (the dice) and the recognition of the dots on the dice. This dice recognition system will be the core of an entertainment robot. Experiments confirm that some invariant features render more consistent matches across viewpoints and illumination conditions than others, leading to a fast and robust dice recognition solution. Invariant features are also applied to character segmentation and 3D scene modeling, both yielding satisfactory performance. This research has led to five conference papers, one book chapter, one patent application, and one journal paper under review.

Keywords: Invariant feature, covariant feature, interest region, local descriptor, stereo vision, object recognition.


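Introduction

This is the first-year project for developing "Robot Vision Technology for Fast Recognition of 3D Objects." As planned in the original proposal, the research focuses on a monocular vision system that uses feature points in 2D images taken from different viewpoints to define the features of a 3D object under those viewpoints, and then decides which valid feature points to add or delete based on the transformation matrices computed from the viewpoint changes.

The experiments focus on testing 3D object registration and recognition as environmental parameters change, and on optimizing the recognition core to meet the requirements of fast registration and real-time recognition. The work actually carried out comprises three tasks: (1) reviewing the invariant (or covariant) feature detectors and descriptors published in the last ten years, summarizing the related performance evaluations, and selecting the better-performing ones for experimental verification; (2) applying the better-performing invariant features to 3D object extraction and dice-dot recognition; dice-dot recognition has been studied by our team for years, and this project is the first to apply invariant features to segmenting the different planes of a 3D die and to extracting and matching coplanar features; (3) applying the invariant features selected in the first two tasks to the recognition of other 3D objects.

The main results of the first year are: (1) a book chapter that summarizes recently published invariant features and the related performance evaluations, providing reference directions for continued research on 3D object recognition; (2) invariant features were used to extract the coplanar regions of several dice and to detect the spatial distribution of the dots under different viewpoints and illumination, from which the dots can be read accurately and their 3D attributes recognized; the papers have been published at domestic and international conferences, and a patent application has been filed; (3) the invariant features shape context and MSER were applied to pedestrian detection, 3D scene reconstruction, and character segmentation, all with positive results; the related results have been published at international conferences and in graduate theses, and new submissions incorporating the latest experimental results are in progress.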

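Research Objectives

The goal of this project is to develop a machine vision system that can be integrated with robots for fast 3D object recognition. The original plan spans three years; the first-year goal is to develop a monocular vision system, define image features that allow 3D objects to be recognized under different viewpoints and illumination conditions, and then use these features for fast registration and recognition of 3D objects.

Based on these objectives, the project planned and carried out the three tasks listed in the Introduction, which were derived from the following aims:

1. Research on image invariant feature detectors and descriptors has advanced rapidly in recent years, producing many results and many performance comparison reports. This research should examine these published results in depth and infer the relationship between different types of invariant features and the object categories they are, or are not, suited to recognize.

2. To recognize objects using image invariant features under different viewpoints, illumination conditions, and distances.

3. To extend the impact of this research, invariant features will be applied to related topics our team is working on, such as 3D scene reconstruction, pedestrian detection, and face and license plate recognition.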

Research Methods:

The proposed methods are detailed in the following papers:

1. Stereo Correspondence with Local Descriptors for Object Recognition, Chapter 7, Advances in Theory and Applications of Stereo Vision, ed. Asim Bhatti, pp. 129–150, Jan. 2011, InTech, ISBN: 978-953-307-516-7.

2. Dice Recognition in Uncontrolled Illumination Conditions by Local Invariant Features, Proc. 14th Int'l Conf. Computer Analysis of Images and Patterns (CAIP), vol. 2, pp. 188–195, Seville, Spain, Aug. 2011.

3. (Best Paper Award) Invariant Features for Dice Recognition Across Illumination, Proc. 2nd IEEE Int'l Conf. Multimedia Technology (ICMT), pp. 3112–3115, Hangzhou, China, July 2011.

4. License Plate Recognition for Categorized Applications, Proc. IEEE Conf. Vehicular Electronics and Safety (ICVES), pp. 220–225, Beijing, China, July 2011.

Parts of the above were also published at the domestic conferences CVGIP 2011 and Automation 2011.

Abstract of the first paper (more extensive, since it is a book chapter)

Many methods based on local descriptors treat each image from stereo or multiple views as a single instance without exploring much of the relationship between these instances, ending up with models of multiple independent instances. Using such a model for object recognition is like matching between a training image and a test image.

This chapter, however, is especially concerned with models that integrate the information across multiple training images. The central concern is how to extract local features from stereo or multiple images so that the information from different views can be integrated in the modeling phase and applied in the recognition phase.

This chapter is organized as follows. A few promising affine invariant region detectors are first reviewed in Section 2. Many invariant feature detectors have been proposed in the last decade; because the detected features are invariant to variations in viewpoint, scale, illumination, and other variables, they serve well for establishing correspondences across images. Section 3 reviews a couple of local region descriptors that outperform many others in a performance comparison study. These descriptors transform affine invariant regions into vectors or distributions so that a distance measure can be applied to discern the similarity or difference between interest regions.

Descriptors with better invariance to viewpoint changes are of particular interest. Section 4 reviews a couple of methods that develop models by combining the information from local descriptors extracted across multiple views; these methods offer good examples of how to integrate local invariant features across different views. A case study on performance evaluation and benchmark databases is presented in Section 5, with an introduction to its database, followed in Section 6 by a snapshot of other databases also suitable for 3D object recognition studies.

Abstract of the second paper

A system is proposed for recognizing the number of dots on dice in general table game settings. Different from previous dice recognition systems, which use a single top-view camera and work only under controlled illumination, the proposed one uses multiple cameras and works under uncontrolled illumination. The proposed system exploits local invariant features that are robust to illumination variation and good for building homographies across multiple views. The homographies are used to enhance coplanar features and weaken non-coplanar features, giving a way to segment the top faces of the dice and make up the features ruined by possible specular reflection. To recognize the number of dots on the dice, MSER (Maximally Stable Extremal Region) detectors are applied for dice localization and dot identification, followed by certain constraints that make the result conform to the six known dice patterns. Experiments show that the proposed system can achieve a superb recognition rate in various uncontrolled illumination conditions.

Abstract of the third paper (although part of its content is similar to the previous paper, its main method is shape context, whereas the previous paper uses MSER)

A system is proposed for recognizing the number of dots on dice in general table game settings. Different from previous dice recognition systems, which use a single top-view camera and work only under controlled illumination, the proposed one uses multiple cameras and works under uncontrolled illumination. Under controlled illumination, edges are the prominent features considered in most approaches. But reflection, often observed under uncontrolled illumination, makes approaches based solely on edges ineffective. The proposed system exploits local invariant features that are robust to illumination variation and good for building the homographies between multiple viewpoints. The homographies can enhance coplanar features and weaken non-coplanar features, giving a way to segment the top surfaces of the dice and make up the features ruined by reflection. A coarse-to-fine search with a shape-dependent local descriptor is designed to identify the dots on the segmented top surfaces. The identified dots are clustered subject to certain constraints so that each cluster conforms to one of the six known dice patterns. Experiments show that the proposed system gives satisfactory performance under various uncontrolled illumination conditions.
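The clustering step at the end of this abstract can be pictured with a minimal sketch. It assumes dot centroids have already been detected and that dots of one die lie closer together than a hypothetical linking distance `link_dist`; the single-linkage grouping and the count-only check (1 to 6 dots per cluster) are our simplifications, not the paper's constraints, which also use the geometry of the six patterns. The function names are hypothetical.

```python
from math import hypot


def group_dots(centroids, link_dist):
    """Single-linkage grouping: dots closer than link_dist end up in the same die."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            (xi, yi), (xj, yj) = centroids[i], centroids[j]
            if hypot(xi - xj, yi - yj) <= link_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, pt in enumerate(centroids):
        clusters.setdefault(find(i), []).append(pt)
    return list(clusters.values())


def read_dice(centroids, link_dist):
    """Sorted dot count per die; clusters that fit no face (1 to 6 dots) are rejected."""
    counts = []
    for cluster in group_dots(centroids, link_dist):
        if not 1 <= len(cluster) <= 6:
            raise ValueError(f"cluster of {len(cluster)} dots matches no die face")
        counts.append(len(cluster))
    return sorted(counts)


# Two dice seen from above: a "3" (diagonal) and a "5" (corners plus centre).
die_a = [(90, 90), (100, 100), (110, 110)]
die_b = [(190, 90), (210, 90), (190, 110), (210, 110), (200, 100)]
counts = read_dice(die_a + die_b, link_dist=15)
```

Here `counts` is `[3, 5]`. A geometric pattern check could replace the count test without changing the grouping.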

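Abstract of the fourth paper (an extended result of this research, applying invariant features to license plate character segmentation)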

The variables, and the scope of variation in each variable, differ across applications of vehicle license plate recognition (VLPR). This research splits major VLPR applications into three categories: access control, traffic law enforcement, and road patrol. Each category is characterized by variables, including plate size, illumination condition, and camera viewpoint, with different scopes of variation.

Applications with more variables or larger variation scopes, as in road patrol, require more sophisticated methods and higher computational cost than those with fewer variables or smaller variation scopes, as in access control. It is uneconomical to apply the methods developed for road patrol to access control; conversely, a method developed for access control cannot solve most road patrol cases. Different from most previous works, which do not specify applications, this paper redefines the VLPR problem using the variables and their variation scopes for the three major applications. Because no benchmark database is available for evaluating VLPR methods on the three major applications, a database called AOLP (Application-Oriented License Plate), composed of three application-oriented subsets, is introduced and made available to the research community. Although VLPR has been an active research topic for more than a decade, no method has been commonly acknowledged as a baseline. A modular method, whose components can be adjusted for the three applications, is proposed and benchmarked on our database. In the character segmentation module, the MSER (Maximally Stable Extremal Region) is exploited and yields better performance than previous approaches. The proposed method is compared with a few competitive ones to highlight its value as a benchmark.

Literature Review (excerpted from the papers above to present the literature review conducted in full)

As object recognition is the central concern of this research, the literature survey mostly covers works on invariant features, interest region detection, and local descriptors, especially those robust to variation in illumination and viewpoint.

Harris and Hessian affine detector:

The Harris-Affine region detector combines the Harris corner detector, Gaussian scale space, and affine shape adaptation (Mikolajczyk, 2005). Given an image, the algorithm for detecting Harris-Affine regions consists of the following steps: (1) detection of scale-invariant interest regions using the Harris-Laplace detector and characteristic scale selection; (2) normalization of the scale-invariant interest regions using affine shape adaptation; (3) iterative estimation of the affine region; (4) update of the affine region in scale and location. In addition to the Harris-Affine region based on the Harris-Laplace detector, a similar alternative is the Hessian-Affine region detector based on the Hessian matrix. Both are effective for detecting blobs and ridges, but the latter performs better in detecting long blobs.
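As a rough illustration of the two measures behind these detectors, the sketch below computes a single-scale Harris cornerness and a scale-normalized Hessian determinant with NumPy. It rests on stated assumptions: the scales `sigma_d`, `sigma_i`, the constant `k = 0.04`, and all helper names are ours, and the Laplace scale selection and iterative affine adaptation of Harris-Affine/Hessian-Affine are omitted.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Sampled, normalised 1-D Gaussian truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def smooth(img, sigma):
    """Separable Gaussian blur; np.convolve pads with zeros at the borders."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.04):
    """det(M) - k*trace(M)^2 of the scale-normalised second-moment matrix M."""
    L = smooth(np.asarray(img, dtype=float), sigma_d)   # differentiation scale
    Ly, Lx = np.gradient(L)                             # axis 0 = y, axis 1 = x
    Sxx = sigma_d**2 * smooth(Lx * Lx, sigma_i)         # integration scale
    Syy = sigma_d**2 * smooth(Ly * Ly, sigma_i)
    Sxy = sigma_d**2 * smooth(Lx * Ly, sigma_i)
    return Sxx * Syy - Sxy**2 - k * (Sxx + Syy) ** 2


def hessian_det(img, sigma):
    """Scale-normalised Hessian determinant sigma^4 * (Lxx*Lyy - Lxy^2)."""
    L = smooth(np.asarray(img, dtype=float), sigma)
    Ly, Lx = np.gradient(L)
    Lyy, Lxy = np.gradient(Ly)          # d/dy and d/dx of Ly
    Lxx = np.gradient(Lx, axis=1)
    return sigma**4 * (Lxx * Lyy - Lxy**2)


# A bright square: the Harris response peaks near its four corners.
square = np.zeros((64, 64))
square[20:40, 20:40] = 1.0
harris = harris_response(square)
peak = np.unravel_index(np.argmax(harris), harris.shape)

# A Gaussian blob: the Hessian determinant peaks at its centre.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 4.0**2))
hess = hessian_det(blob, sigma=4.0)
centre = np.unravel_index(np.argmax(hess), hess.shape)
```

On a straight edge only one eigenvalue of M is large, so the Harris response there is negative; it is positive only where both are large, which is why the peak lands near a corner of the square.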

Maximally Stable Extremal Region (MSER):

MSER considers the set of all possible thresholds that can binarize an intensity image; an MSER is a connected region whose size changes little over a range of thresholds (Matas et al., 2002). Because it is defined exclusively by the intensity function in the region and on its outer border, and the local binarization is stable over a large range of thresholds, it has many desirable properties, such as robustness to changes in viewpoint, illumination, and scale, and even to occlusion.

SIFT and GLOH Descriptors:

Local region descriptors are mostly vectors that characterize the pattern of an interest point and its neighboring region. Ten different descriptors were reviewed by Mikolajczyk (2005), and the review revealed that GLOH (Mikolajczyk, 2005) performs best, closely followed by SIFT (Lowe, 2004) and shape context (Belongie et al., 2002), in generating correct matches under viewpoint and scale changes. These three descriptors also outperform the others in most tests with different variables and settings. The SIFT (Scale-Invariant Feature Transform) descriptor, proposed by Lowe (2004), is derived from a 3D histogram of gradient location and orientation. GLOH is a modified version of SIFT that computes a SIFT descriptor on a log-polar location grid with bins in both the radial and angular directions.
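A stripped-down, SIFT-like descriptor helps show why gradient-histogram descriptors tolerate illumination change: gradients ignore an additive brightness offset, and normalizing the final vector removes a multiplicative contrast gain. This is a sketch under stated assumptions, not Lowe's full SIFT: it omits Gaussian weighting, trilinear interpolation, and orientation assignment. The 4 × 4 cells, 8 orientation bins, and clip value 0.2 follow the commonly cited SIFT layout; the function name is ours.

```python
import numpy as np


def sift_like_descriptor(patch, grid=4, bins=8, clip=0.2):
    """4x4 cells x 8 orientation bins of gradient magnitude -> 128-D unit vector.

    Simplified: no Gaussian weighting, no trilinear interpolation,
    no rotation to a dominant orientation.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)            # an additive brightness offset cancels here
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    obin = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)

    cell = patch.shape[0] // grid
    hist = np.zeros((grid, grid, bins))
    for r in range(grid * cell):
        for c in range(grid * cell):
            hist[r // cell, c // cell, obin[r, c]] += mag[r, c]

    v = hist.ravel()
    v = v / (np.linalg.norm(v) + 1e-12)    # removes a multiplicative contrast gain
    v = np.minimum(v, clip)                # limits the influence of very strong gradients
    return v / (np.linalg.norm(v) + 1e-12)


rng = np.random.default_rng(0)
patch = rng.random((16, 16))
d = sift_like_descriptor(patch)
d_bright = sift_like_descriptor(1.7 * patch + 30.0)   # higher gain, brighter offset
```

`d` and `d_bright` agree to numerical precision, which is the property that makes such descriptors usable across the lighting changes studied here. Specular highlights are not affine intensity changes, so they are not removed this way.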

Shape Context Descriptor:

Shape context, proposed by Belongie et al. (2002), is a descriptor that characterizes the shape of an object. Given a shape, which can be obtained by an edge detector, one can pick a point on the shape and compute a histogram of the relative coordinates of the remaining points over log-polar bins. This log-polar design makes the descriptor more sensitive to the locations of nearby shape points than to those farther away. Belongie et al. use 5 radial bins and 12 angular bins, giving a descriptor of dimension 60, while Mikolajczyk splits the radial direction into 9 bins and the angular direction into 4 bins, resulting in a descriptor of dimension 36.
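The shape context histogram can be sketched in a few lines of NumPy. Assumptions: radii are normalized by the mean pairwise point distance, the radial bin edges are log-spaced between 1/8 and 2 of that distance (defaults we assume here for the 5 × 12 layout), points falling outside that range are ignored, and the function name `shape_context` is ours.

```python
import numpy as np


def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """n_r x n_theta log-polar histogram of the other points, seen from points[index]."""
    pts = np.asarray(points, dtype=float)
    # Normalise radii by the mean distance over all point pairs (scale invariance).
    pair_d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    mean_d = pair_d[np.triu_indices(len(pts), k=1)].mean()

    diff = np.delete(pts, index, axis=0) - pts[index]
    r = np.linalg.norm(diff, axis=1) / mean_d
    theta = np.mod(np.arctan2(diff[:, 1], diff[:, 0]), 2 * np.pi)

    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.digitize(r, r_edges) - 1           # -1 or n_r means out of range
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hist = np.zeros((n_r, n_theta))
    keep = (r_bin >= 0) & (r_bin < n_r)
    np.add.at(hist, (r_bin[keep], t_bin[keep]), 1)
    return hist.ravel()                           # 5 x 12 = 60 values by default


# 20 points on a circle: all 19 other points fall inside the radial range.
angles = 2 * np.pi * np.arange(20) / 20
circle = np.column_stack([np.cos(angles), np.sin(angles)])
sc = shape_context(circle, 0)
```

Because radii are divided by the mean pairwise distance and only relative coordinates are used, the histogram is unchanged when the whole point set is scaled and translated; it is not rotation invariant in this form.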


Integration of Local Descriptors from Multi-Views:

Depending on how the model of a given object is built, the approaches that use local invariant regions for object recognition can be split into two categories: one develops the model from a single view of the object, while the other uses multiple views. Both recognize the object in different views under occlusion and different geometric and photometric conditions. Because multiple views of the object are considered in the modeling phase, the multi-view methods can recognize the object under a much broader range of conditions. As far as stereo vision for 3D object recognition is concerned, only the multi-view methods are considered in this section. Two methods are selected: one, given by Lowe (2001), fuses the SIFT features from multiple views of an object into a single model with view-dependent clusters; the other, proposed by Rothganger et al. (2006), builds a patch-based 3D model using affine descriptors and multi-view spatial constraints.

Databases for 3D Object Recognition:

The database used by Rothganger et al. (2006) consists of 9 objects and 80 test images. The training images are stereo views of each of the 9 objects, roughly equally spaced around the equatorial ring of each object; the number of stereo views ranges from 7 to 12 for the different objects. The test images are monocular images of the objects under varying amounts of clutter and occlusion and different lighting conditions. Several other databases can also be considered for benchmarking stereo vision algorithms for object recognition; ideal databases must offer stereo images for training, and test images collected with variations in viewpoint, scale, illumination, and partial occlusion. A few samples taken from the dataset used by Rothganger et al. (2006) are shown in Fig. 1.

Fig. 1. Samples from the dataset by Rothganger et al. (2006): the top two rows are training samples and the bottom two rows are test samples.


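Self-Evaluation of Project Results:

The following grading scale is used for self-evaluation. The five grades, from highest (Outstanding) to lowest (Poor), are defined as follows:

1. Outstanding: most of the planned work completed, and the results recognized internationally (e.g., by well-known journals or patents) or attracting broad industry attention.
2. Most of the planned work completed, and the results widely recognized, e.g., by top international conferences in the field or related patents.
3. Most of the planned work completed, and the results somewhat recognized, e.g., by general conferences in the field.
4. Most of the planned work completed, but the results not yet recognized by other institutions.
5. Poor: most of the planned work not completed.

The results of this project include one book chapter, three international conference papers, two domestic conference papers, one pending patent application, and one journal paper under review. The online version of the book chapter Stereo Correspondence with Local Descriptors for Object Recognition has been downloaded 117 times since its publication in January (available at http://www.intechopen.com/articles/show/title/stereo-correspondence-with-local-descriptors-for-object-recognition). Among the international conferences, CAIP is one of the important conferences in image processing and pattern recognition, and ICVES is one of the important conferences in intelligent vehicle research. Although ICMT is a newer conference on multimedia technology, it has many participants, and our paper was one of 16 Best Paper Award winners among more than 120 accepted papers. Taking into account also the pending patent application, the journal paper under review, and the two domestic conference papers, the results are self-evaluated as Excellent.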


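Table of R&D Results Available for Promotion

▓ Patentable (patent application in progress)   ▓ Available for technology transfer   Date: August 10, 2011

NSC-Funded Project

Project title: Robot Vision Technology for Fast Recognition of 3D Objects   Principal investigator: 徐繼聖

Project number: NSC 99-2221-E-011-098-   Discipline: Automation System Integration Technology

Technology/creation name: Stereo Vision Based Dice Recognition Method and System for Uncontrolled Environments

Inventors/creators: 徐繼聖, 張訓嘉

Technical description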

1. Different from existing dice recognition systems, which only work under controlled illumination, the present invention can work under uncontrolled illumination conditions. To enable easy integration with a game table, the present invention considers tilted views of the dice captured by cameras held on peripheral supports.

2. The present invention is composed of two major modules: dice segmentation and dot identification. Given dice images from different viewpoints, local invariant features are extracted and compared in terms of their ability to render homographies with the least error. The homographies can be used to enhance the coplanar features and weaken the non-coplanar features. This leads to an extraction of the coplanar features and segmentation of the top surfaces of the dice even when some features are ruined by specular reflection.

3. The dots on the segmented top surfaces of the dice must be identified carefully, because some lighting can blur the dots and specular reflection spots can appear as valid dots. An MSER (Maximally Stable Extremal Region) detector is applied for its consistency in rendering local interest regions across large illumination variation.

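Applicable industries / Products that can be developed

1. Leisure and entertainment industry
2. Intelligent robot industry
3. Premium toy industry

Technical features

Unlike the enclosed dice recognition systems seen in entertainment venues in Las Vegas and Macau, this invention is currently the only automatic dice recognition system applicable to open environments, and it can therefore be used for product development in the industries above. For example, the dice shakers used in ordinary entertainment venues need not be replaced: installing the system designed in this invention is enough to enable automatic dice recognition. The system uses multiple cameras to capture dice images from different viewpoints, at different distances, and under different illumination conditions, and uses stereo vision to obtain geometric spatial information on the dot distribution for accurate dice recognition. Besides the cameras, the system includes a light shield and an auxiliary light source that filter out the ambient light affecting the recognition rate, effectively improving recognition.

Promotion and application value: This technology can be extended to the design and manufacture of innovative products in the leisure and entertainment, intelligent robot, and premium toy industries, or used to enhance the functions of existing products and strengthen their international competitiveness.

The R&D results above have been filed for patents in Taiwan, China, and the United States through the university's technology transfer center.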

Annex 1

Report on Attending International Academic Conferences

Gee-Sern Hsu

I attended CAIP 2011, held in Seville, Spain, on Aug. 29–31, and presented the paper Dice Recognition in Uncontrolled Illumination Conditions by Local Invariant Features. According to the conference ranking at http://www.cs.ucla.edu/~eklee/paper/CS_conf_rank.htm and a few other websites, CAIP is considered a good conference in computer vision. It is rated 0.84 on a scale normalized to 1, compared with ICCV (0.96) and ICIP (0.71), at http://perso.crans.org/~genest/conf.html.

Quite a few attendees wanted more details about how we determine matches across different views when features from different dice exhibit the same characteristics. Some were impressed by our use of invariant features to extract features across dice, on top of the features within each die. Many considered our work an interesting application of invariant features to the entertainment and game technology sector. I attended several talks on face recognition, object recognition, kernel methods, and stereo vision. Among the people I kept in contact with, Herrera, a Ph.D. student from the University of Oulu, works on a topic closely related to the next phase of this research: he proposes a simplified method to calibrate the Kinect, a depth and color camera pair. We talked about possible collaboration in the near future, and he offered me the package he developed and introduced at the conference. Since the second phase of this research, already in progress, will exploit depth and color cameras to establish 3D perception, Herrera's toolbox will be studied and compared with other tools available on the web. We expect this interaction to open collaborative research opportunities that benefit both parties.

I also attended ICMT, held in Hangzhou, China, on July 26–28, and presented the paper Invariant Features for Dice Recognition Across Illumination. ICMT is a new conference on multimedia-related areas; this was its second edition, after its first meeting in 2010. Although new, it appeared to attract as many attendees as other major conferences, with keynotes by senior researchers such as E. Hancock and D. Terzopoulos. I attended a few talks on topics of interest to me, as well as Hancock's keynote on "facial shape, texture and reflectance from a single view." We met again at CAIP and have kept in contact since, as he also works on face-related vision research. Our paper received one of the 16 Best Paper Awards among the 120 accepted papers, selected from more than 300 submissions.

Many were impressed by the live demo system we showed at the conference, which could precisely recognize dice under various illumination conditions.

Annex 2

Report on Attending International Academic Conferences

Gee-Sern Hsu, September 10, 2011



Dice Recognition in Uncontrolled Illumination Conditions by Local Invariant Features

Gee-Sern Hsu, Hsiao-Chia Peng, Chyi-Yeu Lin, and Pendry Alexandra

Department of Mechanical Engineering, National Taiwan University of Science and Technology


Abstract. A system is proposed for recognizing the number of dots on dice in general table game settings. Different from previous dice recognition systems, which use a single top-view camera and work only under controlled illumination, the proposed one uses multiple cameras and works under uncontrolled illumination. Under controlled illumination, edges are the prominent features considered by most approaches. But strong specular reflection, often observed under uncontrolled illumination, paralyzes approaches based solely on edges. The proposed system exploits local invariant features that are robust to illumination variation and good for building homographies across multiple views. The homographies are used to enhance coplanar features and weaken non-coplanar features, giving a way to segment the top faces of the dice and make up the features ruined by possible specular reflection. To identify the dots on the segmented top faces, an MSER detector is applied for its consistency in rendering local interest regions across large illumination variation. Experiments show that the proposed system can achieve a superb recognition rate in various uncontrolled illumination conditions.

Keywords: Object recognition, invariant feature, local descriptor.

A. Berciano et al. (Eds.): CAIP 2011, Part II, LNCS 6855, pp. 188–195, 2011.
© Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Dice is a popular table game in casinos, especially in Asia. As automatic or computer-controlled games emerge and become popular, many are interested in technologies able to assist or replace human bankers. A computer vision system is proposed in this paper for dice recognition, that is, the automatic recognition of the number of dots on dice, in normal table game settings. Different from existing dice recognition systems, for example [4] and [5], which work under controlled illumination, the proposed system can work under uncontrolled illumination. Under controlled illumination, edges are the prominent features considered. But specular reflection, often observed under uncontrolled illumination, paralyzes approaches based solely on edges. Fig. 1 shows in the middle an image with strong specular reflection; on the left is its edge map obtained by previous methods. Because it is not limited to controlled illumination, the proposed system allows a much wider scope of applications, e.g., integration with table games or different designs of automatic dice games.

Fig. 1. Middle: specular reflection on the dice; Left: the edge map obtained by previous methods; Right: the edge map obtained by the proposed method

Existing dice recognition systems only consider the top view of dice. But a top-view camera is difficult to install on a game table, as a specially designed camera support is needed. To enable easy integration with a game table, the proposed system considers tilted views of the dice captured by cameras held on peripheral supports around the table; peripheral cameras are easier to install on a game table than top-view ones. However, whereas top views capture only the top faces of the dice, tilted views reveal both the top and side surfaces. The latter are harder to handle, as a method is required to segment the top faces and remove the side surfaces.

The proposed system consists of two major modules: dice segmentation and dot identification. To segment the dice, it exploits local invariant features that are robust to illumination variation and good for building homographies across multiple views. The homographies are used to enhance coplanar features, segment the top faces of the dice, and make up the features ruined by possible specular reflection.

To identify the dots on the segmented top faces, an MSER (Maximally Stable Extremal Region) [8] detector is applied for its consistency in rendering local interest regions across large illumination variation. Although one could consider classifiers for the segmentation and identification, such as that proposed by Viola and Jones [12], they are not considered here because a large number of training samples would be required. The proposed system needs only a few samples as references.

The rest of this paper is organized as follows: dice segmentation is presented in Section 2, and dot identification is elaborated in Section 3. Section 4 presents an experimental study of the proposed methods, followed by a conclusion in Section 5.

2 Dice Segmentation Using Local Invariant Features

Because dice can lie at arbitrary locations and orientations on a dice roller base, and their sizes vary slightly with the distance to the camera, local invariant features are explored to capture these variations. Many local invariant feature detectors have been proposed and applied in a broad range of applications; reviews of these detectors can be found in [10], [9], and [3]. The invariant feature detectors can be generally categorized into three types [11]. The first detects corner-like features, e.g., Harris-affine, Harris-Laplace, and multi-scale Harris detectors. The second detects blob-like features, e.g., Hessian-affine, Hessian-Laplace, multi-scale Hessian, and Difference of Gaussians (DoG) [7]. Different from the former two types, region detectors extract homogeneous local areas, e.g., the MSER detector [8], which is used in this work to identify the dots on dice and is addressed in detail in Section 3.

Fig. 2. Correspondences across two different views on the local invariant features detected by a multi-scale Harris-Hessian detector. Many of the detected correspondences are removed for better visual inspection.

Because the Harris and Hessian detectors are limited in handling multiple scales, both are extended to multiple scales and made scale-invariant in [1].

To determine the most appropriate scale for a local feature, Harris-Laplace and Hessian-Laplace both search for the characteristic scale with a Laplace operator applied on top of the multiple scales. Harris-affine and Hessian-affine obtain affine invariant corners or blobs by the iterative estimation of elliptical affine regions proposed by Lindeberg et al. [6]. The shape of the feature region is adapted to ensure that the same region is covered when it is extracted from a different viewpoint.
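The Laplace step shared by Harris-Laplace and Hessian-Laplace can be illustrated by scanning the scale-normalized Laplacian of Gaussian at one point and keeping the scale where its magnitude peaks. In this sketch the helper names and the list of candidate scales are our assumptions. For a Gaussian blob of standard deviation s0, σ²|∇²L| at the blob center is proportional to σ²s0²/(s0² + σ²)², which is maximal at σ = s0, so a blob with s0 = 3 should return a characteristic scale of 3.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Sampled, normalised 1-D Gaussian truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def smooth(img, sigma):
    """Separable Gaussian blur with zero padding."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def normalized_log(img, sigma):
    """Scale-normalised Laplacian of Gaussian, sigma^2 * (Lxx + Lyy)."""
    L = smooth(np.asarray(img, dtype=float), sigma)
    Lxx = np.gradient(np.gradient(L, axis=1), axis=1)
    Lyy = np.gradient(np.gradient(L, axis=0), axis=0)
    return sigma**2 * (Lxx + Lyy)


def characteristic_scale(img, point, sigmas):
    """Scale in `sigmas` where |normalised LoG| at `point` (row, col) is largest."""
    r, c = point
    responses = [abs(normalized_log(img, s)[r, c]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]


yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 3.0**2))   # s0 = 3
scale = characteristic_scale(blob, (32, 32), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Without the σ² factor the Laplacian magnitude would simply decrease with σ, and no scale would stand out; the normalization is what makes the selected scale track the size of the structure.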

The performance of the aforementioned 8 invariant feature detectors in rendering the most accurate homographies between different viewpoints is evaluated by comparison with the ground truth obtained from manually selected correspondences. All of the invariant regions (or interest regions) are represented by the SIFT descriptor [7], as it has been experimentally shown to be one of the most effective descriptors [10]. The match of invariant features across views is measured by the Euclidean distance between the feature descriptors, and a threshold on this distance is determined to select correspondences. Because a dot on a die in one view can appear quite similar to a different dot in another view, the scale factor of each local feature detector is first chosen as the one that yields the maximum number of correct correspondences. RANSAC [2] is then applied to filter out outliers and determine the most appropriate homographies across different views from the matched correspondences.
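The matching and outlier-rejection steps described above can be sketched as nearest-neighbour matching under a Euclidean distance threshold, followed by RANSAC around a normalized DLT homography fit. Everything here is an illustrative assumption rather than the authors' code: the function names, the 2-pixel inlier threshold, the 500 iterations, and the use of Hartley normalization. In practice a library routine such as OpenCV's `cv2.findHomography` with its RANSAC flag would normally be used.

```python
import numpy as np


def match_descriptors(desc1, desc2, max_dist):
    """Nearest neighbour in desc2 for each row of desc1, kept if closer than max_dist."""
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)
    nn = np.argmin(d, axis=1)
    keep = d[np.arange(len(desc1)), nn] < max_dist
    return np.flatnonzero(keep), nn[keep]


def _normalize(pts):
    """Hartley normalisation: zero mean, mean distance sqrt(2) from the origin."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (pts - c) * s, T


def homography_dlt(src, dst):
    """Normalised DLT: H with dst ~ H @ src, from >= 4 point pairs."""
    (ns, Ts), (nd, Td) = _normalize(src), _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(ns, nd):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))        # null vector = last row of V^T
    H = np.linalg.inv(Td) @ vt[-1].reshape(3, 3) @ Ts
    return H / H[2, 2]


def project(H, pts):
    """Apply H to (n, 2) points and dehomogenise."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]


def ransac_homography(src, dst, thresh=2.0, iters=500, seed=0):
    """Fit H to random 4-point samples, keep the largest inlier set, then refit on it."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        inliers = np.linalg.norm(project(H, src) - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return homography_dlt(src[best], dst[best]), best


# Synthetic check: 30 correspondences under a known homography, 6 of them corrupted.
rng = np.random.default_rng(1)
H_true = np.array([[1.1, 0.05, 12.0], [-0.03, 0.95, -7.0], [1e-4, 2e-4, 1.0]])
src = rng.uniform(0, 200, size=(30, 2))
dst = project(H_true, src)
ang = rng.uniform(0, 2 * np.pi, size=6)
dst[:6] += 50.0 * np.column_stack([np.cos(ang), np.sin(ang)])   # 50-pixel outliers
H_est, inl = ransac_homography(src, dst)
```

The final refit on all inliers, rather than keeping the best 4-point model, is what gives an accurate homography once the outliers are known.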

Our experiments reveal that the multi-scale Harris-Hessian detector gives the best performance. Fig. 2 shows an example of the correspondences across two viewpoints obtained using this detector. The settings and other details of the performance evaluation are reported in Section 4.


Dice Recognition in Uncontrolled Illumination Conditions 191

Given N different viewpoints of the dice, N(N − 1)/2 homographies can be obtained from the invariant feature correspondences; in most cases 2 ≤ N ≤ 4 suffices. Each homography and its inverse define the transformation between a pair of viewpoints. Such a transformation holds only for the top faces of the dice, because these surfaces are coplanar. This property motivates stacking the coplanar surfaces to segment the top faces of the dice, even when specular reflection appears in some viewpoints. The dice image from any viewpoint can be chosen as the reference image, and the remaining N − 1 images are transformed to it using the corresponding homographies.
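The bookkeeping for N views can be sketched as follows. The sketch enumerates the N(N − 1)/2 unordered view pairs and returns the homography that brings any view into a chosen reference view, using the inverse when only the opposite direction was estimated. The dictionary layout is assumed for illustration, and so is the direction convention, in which H[(a, b)] maps view b into view a, as for H^(a,b) in Eq. (2).

```python
import itertools
import numpy as np

def view_pairs(n_views):
    """All unordered pairs (a, b), a < b, of n_views viewpoints: n(n - 1)/2 of them."""
    return list(itertools.combinations(range(n_views), 2))

def homography_to_reference(H_pairs, ref, k):
    """Homography mapping points of view k into the reference view ref.

    H_pairs[(a, b)] with a < b is assumed to map view b into view a; when only
    the opposite direction is stored, its inverse is returned."""
    if k == ref:
        return np.eye(3)
    if (ref, k) in H_pairs:
        return H_pairs[(ref, k)]
    return np.linalg.inv(H_pairs[(k, ref)])
```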

Stacking the reference image and the N − 1 transformed images not only enhances the coplanar features but also weakens the non-coplanar ones, because features on the lateral sides of the dice overlap with features from different planes. Specular reflection can be considered a view-dependent feature, unlike the coplanar features observed in the majority of the other views, so it can be removed by imposing a threshold on a similarity measure. An example with N = 3 is shown in Fig. 1. The middle image shows a view of the dice with strong specular reflection, and the right image shows the edge map obtained by stacking the homography-transformed images from the other two views.
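The stacking step can be sketched with boolean edge maps that have already been warped into the reference view. This section does not specify the similarity measure, so the sketch assumes a simple one: the number of views that agree on an edge pixel, thresholded at a strict majority. It only illustrates why coplanar edges survive while single-view highlights and misaligned lateral edges do not. The function name is hypothetical.

```python
import numpy as np

def stack_and_threshold(aligned_edges, min_votes=None):
    """Stack N boolean edge maps of shape (N, H, W) already warped into the reference view.

    A pixel is kept if at least min_votes views mark it as an edge (default: a strict
    majority). Coplanar top-face edges line up in every view and survive; edges seen
    in a single view (e.g. a specular highlight), or landing at different places in
    different views (lateral faces), fall below the threshold."""
    e = np.asarray(aligned_edges, dtype=bool)
    n = e.shape[0]
    if min_votes is None:
        min_votes = n // 2 + 1
    votes = e.sum(axis=0)  # per-pixel count of agreeing views
    return votes >= min_votes
```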

3 Dot Identification and Dice Recognition

Given a segmented top face of a die, an MSER detector [8] is used to extract the dots from the segmented area because, as the illumination varies, it yields persistent or slowly varying edges around the dots. The extraction of MSERs considers the set of all possible thresholds that binarize an intensity image $I(x)$ into a binary image $E_{t_M}(x)$,

$$E_{t_M}(x) = \begin{cases} 1, & \text{if } I(x) \le t_M \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $t_M$ is the threshold. An MSER is a connected region in $E_{t_M}(x)$ whose size changes little over a range of thresholds; such regions are extracted with a watershed-like segmentation algorithm. The homogeneous intensity regions extracted in this way are stable over a wide range of thresholds. The number of thresholds over which the connected region keeps a similar size is known as the margin of the region.
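The threshold sweep behind Eq. (1) and the notion of margin can be illustrated for a single seed pixel. The sketch binarizes with E_t(x) = [I(x) ≤ t] for every 8-bit threshold, measures the area of the connected region containing the seed, and counts the longest run of consecutive thresholds over which that area changes by less than a relative tolerance. It is a simplified, hypothetical sketch (4-connectivity, BFS flood fill, a 10% tolerance). The MSER detector of [8] instead enumerates all extremal regions at once with an efficient union-find procedure.

```python
import numpy as np
from collections import deque

def region_area(mask, seed):
    """Area of the 4-connected True region of mask containing seed (0 if seed is False)."""
    if not mask[seed]:
        return 0
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    seen[seed] = True
    queue, area = deque([seed]), 0
    while queue:
        r, c = queue.popleft()
        area += 1
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                seen[nr, nc] = True
                queue.append((nr, nc))
    return area

def area_profile(img, seed, thresholds=range(256)):
    """Seed-region area in the binary image E_t(x) = [I(x) <= t] for each threshold t."""
    return np.array([region_area(img <= t, seed) for t in thresholds])

def margin(areas, rel_change=0.1):
    """Largest number of consecutive thresholds over which a nonzero area changes
    by less than rel_change (relative) from one threshold to the next."""
    best = run = 0
    for a0, a1 in zip(areas[:-1], areas[1:]):
        if a0 > 0 and abs(int(a1) - int(a0)) / a0 < rel_change:
            run = run + 1 if run else 2  # a stable step links two thresholds
        else:
            run = 0
        best = max(best, run)
    return best
```

For example, a dark dot of intensity 40 on a background of 200 keeps the same area for every threshold from 40 to 199, so its margin is 160 thresholds.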

The dots on dice are blob-like objects, and MSER usually anchors on the boundaries of such objects, so the dots can be located more reliably by MSER than by other interest region detectors. With some preprocessing, such as histogram equalization, MSER achieves a high identification rate. Fig. 3 shows, for one case, the segmented top faces and the regions detected by MSER before and after preprocessing. Note that MSER may also detect incomplete or partial interest regions, which can result from imperfect segmentation.
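Histogram equalization, the preprocessing step mentioned above, can be sketched as a lookup table built from the cumulative histogram of an 8-bit image. This is the textbook global variant. The text does not state whether a global or a local (adaptive) version was used, so treat that choice as an assumption.

```python
import numpy as np

def equalize_hist(img):
    """Global histogram equalization of an 8-bit grayscale image via its cumulative histogram."""
    img = np.asarray(img, dtype=np.uint8)
    cdf = np.cumsum(np.bincount(img.ravel(), minlength=256))
    cdf_min = cdf[cdf > 0][0]      # count of the darkest occupied intensity
    n = img.size
    if n == cdf_min:               # constant image: nothing to stretch
        return img.copy()
    lut = np.clip(np.round((cdf - cdf_min) / (n - cdf_min) * 255), 0, 255).astype(np.uint8)
    return lut[img]
```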

(a) Segmentation of top faces

(b) Regions detected before preprocessing

(c) Regions detected after preprocessing

Fig. 3. The performance of MSER in the identification of the dots

The dots identified by MSER are clustered by k-means (with k equal to the number of dice), subject to two constraints: a cluster must contain fewer than 7 dots, and the distance between its two farthest dots must be less than the diagonal of a die. The spatial distribution of the dots in each cluster is then verified against the 6 known patterns. For example, a six-dot face must contain two parallel rows of three dots each. A five-dot face must have two crossing rows of three dots each that share the same central dot. Specific patterns are likewise configured for the 4-, 3-, and 2-dot cases. The pattern corresponding to the number of dots in a given cluster is examined first; if the cluster is incompatible with it, two possibilities are verified.

The first is a non-dot spot falsely detected as a dot; the second is a valid dot that failed to be detected. A large number of casts and experiments, detailed in Section 4, show that this combination of size-constrained clustering and spatial pattern confirmation yields a very high recognition rate.
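The size-constrained clustering and the pattern confirmation can be sketched as follows, covering the six- and five-dot patterns described in the text. The farthest-point initialization of k-means, the collinearity and parallelism tolerance tol = 0.1, and all function names are assumptions. The 4-, 3-, and 2-dot patterns are not spelled out in the text and are left out, as is the recovery logic for a spurious or missing dot.

```python
import itertools
import numpy as np

def kmeans(pts, k, n_iter=50):
    """Lloyd's k-means; farthest-point initialization is an assumption, not from the paper."""
    centers = [pts[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centers], axis=0)
        centers.append(pts[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(pts[:, None] - centers[None], axis=2), axis=1)
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def cluster_ok(cluster, die_diagonal):
    """Constraints from the text: fewer than 7 dots, farthest pair closer than a die diagonal."""
    if len(cluster) >= 7:
        return False
    return all(np.linalg.norm(a - b) < die_diagonal for a, b in itertools.combinations(cluster, 2))

def _outer_pair(tri):
    """Indices (i, j) of the two points of a triple that are farthest apart."""
    return max(itertools.combinations(range(3), 2),
               key=lambda ij: np.linalg.norm(tri[ij[0]] - tri[ij[1]]))

def _cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def collinear(tri, tol=0.1):
    """The middle point lies within tol * span of the line through the two outer points."""
    i, j = _outer_pair(tri)
    a, b, c = tri[i], tri[j], tri[3 - i - j]
    return abs(_cross(b - a, c - a)) <= tol * np.dot(b - a, b - a)

def is_six(p, tol=0.1):
    """Two parallel rows of three dots each."""
    for pair in itertools.combinations(range(1, 6), 2):
        first = [0, *pair]  # dot 0 fixed in the first row, so each split is tried once
        second = [i for i in range(6) if i not in first]
        r1, r2 = p[first], p[second]
        if collinear(r1, tol) and collinear(r2, tol):
            i1, j1 = _outer_pair(r1)
            i2, j2 = _outer_pair(r2)
            u, v = r1[j1] - r1[i1], r2[j2] - r2[i2]
            if abs(_cross(u, v)) <= tol * np.linalg.norm(u) * np.linalg.norm(v):
                return True
    return False

def is_five(p, tol=0.1):
    """Two crossing rows of three dots that share the same central dot m."""
    for m in range(5):
        rest = [i for i in range(5) if i != m]
        a = rest[0]
        for b in rest[1:]:
            c, d = [i for i in rest if i not in (a, b)]
            # each row must be collinear with m, and m must lie between its two ends
            rows_ok = all(collinear(p[[x, m, y]], tol) and np.dot(p[x] - p[m], p[y] - p[m]) < 0
                          for x, y in ((a, b), (c, d)))
            u, v = p[b] - p[a], p[d] - p[c]
            if rows_ok and abs(_cross(u, v)) > tol * np.linalg.norm(u) * np.linalg.norm(v):
                return True
    return False
```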

4 Experiments

The experimental setup follows the common three-dice table game Sic Bo, with three cameras at different viewpoints installed on the sides of a game table. Twelve illumination conditions are configured to study the performance of the proposed system: 3 are chosen as the training set and the remaining 9 as the test set, as shown in Fig. 4. The average intensity on the dice in the training set is 67, 108, and 138 on an 8-bit gray scale, with deviations of 8, 10, and 11, respectively. In the test set, the average intensity ranges from 45 to 158, with deviations from 7 to 12. Under each illumination condition, 120 random-cast sessions and 30 manual-placement sessions are carried out. Manual placement is used to create special layouts of the dice, such as three dice in a row.

4.1 Homography Based on Local Invariant Features

The training set is used to evaluate the 8 invariant feature detectors mentioned in Section 2 for their ability to create homographies with the least error across different illumination conditions. The error $E_{F_i}$ is measured as the difference between the correspondences from the invariant-feature-based homography $H_{F_i}$ and those from the ground-truth homography $H_G$, which is obtained using manually selected correspondences, i.e.,


Fig. 4. First column from the left is the training set with 3 illumination conditions; the rest is the test set with 9 illumination conditions

$$E_{F_i}^{(a,b)} = \frac{\left\| \left( H_{F_i}^{(a,b)} - H_G^{(a,b)} \right) \mathbf{x}_b^{F_i} \right\|}{N_{F_i}} \qquad (2)$$

where $H_{F_i}^{(a,b)}$ is the homography that maps the invariant features $\mathbf{x}_b^{F_i}$, detected by the invariant feature detector $F_i$ in image $I_b$, to the corresponding ones in $I_a$; $H_G^{(a,b)}$ is the ground-truth homography obtained from manually selected correspondences between $I_a$ and $I_b$; $N_{F_i}$ is the number of features detected by $F_i$; and $a, b$ denote two different viewpoints.
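One reading of Eq. (2) treats the norm as the per-feature distance between the points predicted by H_F and by H_G, averaged over the N_F features. The dehomogenization, the optional normalization by image width, and the function names are assumptions made for illustration, not details confirmed by the text.

```python
import numpy as np

def apply_h(H, pts):
    """Map (N, 2) points through a 3x3 homography and dehomogenize."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def homography_error(H_feat, H_gt, x_b, image_width=None):
    """Mean distance between points predicted by H_F and by H_G for the N_F features x_b.
    Optional normalization by image width is an assumed convention, not stated in the text."""
    err = np.mean(np.linalg.norm(apply_h(H_feat, x_b) - apply_h(H_gt, x_b), axis=1))
    return err / image_width if image_width else err
```

A homography that differs from the ground truth by a one-pixel horizontal shift gives an error of exactly 1 pixel, or 1/192 of a 192-pixel-wide image.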

It is also desirable that the correspondences from the feature-based homographies remain consistent across scales, as some features change with scale. To investigate which features render the desired homographies across illumination and scale better than others, the original 320×240 images are scaled down to smaller sizes. The error is computed at each size and averaged over the three illumination conditions in the training set. Fig. 5 shows this comparison. The smallest scale, 128×96, yields relatively high errors, indicating that some details of the dice are lost at such a small scale, which degrades the accuracy of homography estimation. Among the eight invariant feature detectors tested, the multi-scale Harris-Hessian detector gives the lowest error, 0.87%, which corresponds to about 1.7 pixels in a 192 × 144 image.

Fig. 5. Normalized error of feature-based homography across scales and three illumination conditions
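The two figures quoted for the best detector can be cross-checked with a line of arithmetic. This section does not state the normalization behind the percentage, so the following is an inference. An error of 0.87% matches about 1.7 pixels if it is expressed relative to the image width. Normalizing by the image diagonal would give about 2.1 pixels instead.

$$0.0087 \times 192 \approx 1.67\ \text{px}, \qquad 0.0087 \times \sqrt{192^2 + 144^2} = 0.0087 \times 240 \approx 2.09\ \text{px}$$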

4.2 Dice Identification

The performance evaluation on the 9 test sets reveals the following observations and results:

– As long as the correspondences from the feature-based homography are consistent over at least two scales, the average match error stays below or near 1%, and the top faces of the dice are perfectly segmented in all tested conditions.

– Two identification rates are measured under each test illumination condition: the identification rate of the dots and the identification rate of the dot number on each die. In Fig. 6, the former is shown by the left bar and the latter by the right bar at each indexed illumination condition. On the training set, the MSER dot detector was tuned for a zero miss rate at the cost of additional false positives, so the imperfections in dot identification in Fig. 6 are all caused by false positives. For example, in the brightest illumination condition, indexed 1, 1.8% (= 1 − 98.2%) of the identified dots are false positives. All false positives were found to be caused by specular reflection or insufficient lighting. As the illumination intensity increases, specular reflection becomes stronger and more false positives appear.

– The combination of size-constrained clustering and spatial pattern confirmation effectively removes the false positives and yields very high dice recognition rates in all tested conditions, as shown by the right bar at each indexed illumination condition in Fig. 6.

Fig. 6. Identification rates in 9 illumination conditions, indexed from 1 to 9; at each index the left bar shows the rate of dot identification, and the right bar shows the rate of dice number identification


5 Conclusion

A solution using invariant features across multiple views is proposed for dice recognition under uncontrolled illumination. An extensive comparison of various invariant feature detectors in rendering correct homographies under various test conditions and parameters shows that the multi-scale Harris-Hessian detector performs best, better than the commonly selected SIFT features.

The homographies built on the multi-scale Harris-Hessian features are exploited to enhance the coplanar features and weaken the non-coplanar features on the dice. This enables the extraction of the coplanar features and the segmentation of the top faces of the dice, even when the features observed from some viewpoints are ruined by specular reflection. An MSER detector is applied to identify the dots on the top faces, followed by a pattern-specific confirmation of the spatial distribution of the dots. Experiments show that, although false-positive dots occur in a few cases, such as under strong or insufficient illumination, the proposed solution still recognizes the number of dots on each die accurately.

References

1. Dufournaud, Y., Schmid, C., Horaud, R.: Matching images with different resolutions. In: CVPR, pp. 1612–1618 (2000)

2. Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

3. Hsu, G.S.J.: Stereo Correspondence with Local Descriptors for Object Recognition. In: Advances in Theory and Applications of Stereo Vision, ch. 7, pp. 129–150. InTech (2011)

4. Huang, K.Y.: An auto-recognizing system for dice games using a modified unsupervised grey clustering algorithm. Sensors 8(2), 1212–1221 (2008)

5. Lai, Y.N., Hsu, S.T., Wang, C.Y., Tsai, M.T.: Method for recognizing dice dots. U.S. Patent No. 2009/0263008 A1 (October 2009)

6. Lindeberg, T., Gårding, J.: Shape-adapted smoothing in estimation of 3-d shape cues from affine deformations of local 2-d brightness structure. Image Vision Comput. 15(6), 415–434 (1997)

7. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)

8. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: BMVC (2002)

9. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A comparison of affine region detectors. International Journal of Computer Vision 65(1-2), 43–72 (2005)

10. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005)

11. Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision 3(3), 177–280 (2007)

12. Viola, P.A., Jones, M.J.: Rapid object detection using a boosted cascade of simple features. In: CVPR, vol. (1), pp. 511–518 (2001)


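NSC-Funded Project: Promotion Data Sheet for Derived R&D Results

Date: 2011/10/18

NSC-funded project

Project title: Subproject 3: Robot Vision Technology for Fast Recognition of 3D Objects    Principal investigator: 徐繼聖

Project number: 99-2221-E-011-098-    Discipline: Automation System Integration Technology

No R&D result promotion data.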


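Summary of Research Outcomes, FY99 (2010) Research Projects

Principal investigator: 徐繼聖    Project number: 99-2221-E-011-098-

Project title: Fast Task Definition and Fully Autonomous Execution Technologies for Next-Generation Service Robots -- Subproject 3: Robot Vision Technology for Fast Recognition of 3D Objects

Quantitative outcomes. Each entry lists: number achieved (accepted or published) / expected total (including those achieved) / actual contribution of this project, followed by the unit and remarks where given. (Remarks: qualitative notes, e.g., a joint outcome of several projects, or an outcome featured as a journal's cover story.)

Domestic
  Publications
    Journal papers: 0 / 0 / 0%
    Research/technical reports: 0 / 0 / 0%
    Conference papers: 2 / 1 / 100%
    Books: 0 / 0 / 0%
  Patents
    Pending: 0 / 0 / 0%
    Granted: 0 / 0 / 0%
  Technology transfer
    Cases: 0 / 0 / 0%
    Royalties: 0 / 0 / 0% (unit: NT$ thousand)
  Participating personnel (ROC nationals; unit: person-times)
    Master's students: 2 / 2 / 100%
    PhD students: 0 / 0 / 0%
    Postdoctoral researchers: 0 / 0 / 0%
    Full-time assistants: 0 / 0 / 0%

International
  Publications
    Journal papers: 0 / 1 / 50% (remark: journal paper under review)
    Research/technical reports: 0 / 0 / 100%
    Conference papers: 3 / 2 / 100%
    Books: 1 / 0 / 100% (unit: chapter/volume)
  Patents
    Pending: 1 / 1 / 100%
    Granted: 0 / 0 / 0%
  Technology transfer
    Cases: 0 / 0 / 0%
    Royalties: 0 / 0 / 0% (unit: NT$ thousand)
  Participating personnel (foreign nationals; unit: person-times)
    Master's students: 1 / 1 / 100%
    PhD students: 0 / 0 / 0%
    Postdoctoral researchers: 0 / 0 / 0%
    Full-time assistants: 0 / 0 / 0%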


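Other outcomes (outcomes that cannot be expressed quantitatively, such as organizing academic activities, awards received, important international collaborations, the international impact of research results, and other concrete benefits to industrial technology development; to be described in text):

1. The paper presented at ICMT received the Best Paper Award.

2. Collaboration with the Machine Vision Group, University of Oulu, Finland, is being initiated.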

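Outcome item / Quantity / Brief description of name or nature of content

Assessment tools (qualitative and quantitative): 0

Courses/modules: 0

Computer and network systems or tools: 0

Teaching materials: 0

Events/competitions organized: 0

Seminars/workshops: 0

E-newsletters, websites: 0

Number of participants (audience) in the dissemination of project outcomes: 0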


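Self-Evaluation Form for NSC-Funded Research Project Reports

Please give an overall assessment covering the degree to which the research content matches the original proposal, the extent to which the expected goals were achieved, the academic or applied value of the results (briefly describing their significance, value, impact, or potential for further development), their suitability for publication in academic journals or for patent applications, the main findings, and any other relevant value.

1. Overall assessment of the degree to which the research content matches the original proposal and the extent to which the expected goals were achieved

■ Goals achieved

□ Goals not achieved (please explain, max. 100 characters)

□ Experiment failed

□ Experiment interrupted for some reason

□ Other reasons. Explanation:

2. Publication in academic journals, patent applications, etc.:

Papers: ■ Published □ Unpublished manuscript □ In preparation □ None    Patents: □ Granted ■ Pending □ None

Technology transfer: □ Transferred ■ Under negotiation □ None    Other (max. 100 characters):

The current outcomes of this project include one book chapter, three international conference papers (one of which received a Best Paper Award), two domestic conference papers, one pending patent application, and one journal paper under review.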

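3. Assess the academic or applied value of the results in terms of academic achievement, technological innovation, and social impact (briefly describe their significance, value, impact, or potential for further development; max. 500 characters)

This technology can be extended to the design and manufacture of innovative products in industries such as leisure and entertainment, intelligent robots, and premium toys. It can also be used to enhance the functions of existing products and strengthen their international competitiveness.

Unlike the enclosed dice-recognition systems already in use at entertainment venues in Las Vegas and Macau, the system developed in this work is currently the only automatic dice-recognition system that can be used in an open environment, and it can be applied to product development in the industries above. For example, the dice cups used in ordinary entertainment venues need not be replaced: automatic recognition only requires installing the system designed in this invention. The system uses multiple cameras to capture images of the dice from different viewpoints, at different distances, and under different illumination conditions. It then uses stereo vision to obtain the spatial geometry of the dot distribution, enabling accurate dice recognition.

In addition to the cameras, the system includes a hood and an auxiliary light source, which filter out the ambient light that degrades recognition and thus effectively improve the recognition rate. The system can be applied in the leisure and entertainment, intelligent robot, and premium toy industries.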
