3D object tracking using mean-shift and similarity-based aspect-graph modeling


The 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON), Nov. 5-8, 2007, Taipei, Taiwan


Jwu-Sheng Hu, Member, IEEE, Tzung-Min Su, Student Member, IEEE, Chung-Wei Juan, and George Wang

Department of Electrical and Control Engineering, National Chiao-Tung University, Taiwan

jshu@cn.nctu.edu.tw

Abstract-The mean-shift algorithm is a popular method in the field of 2D object tracking due to its simplicity and its robustness to slight variations in lighting condition, scale, and viewpoint over time. However, the appearance of a 3D object may vary distinctly across viewpoints over time. In this work, a novel method for tracking 3D objects using the mean-shift algorithm and a 3D object database is proposed to achieve more precise tracking. The 3D object database is built as a similarity-based aspect-graph from 2D images sampled at random intervals from the viewing sphere. Contour and color features of each 2D image are used for modeling the 3D object database. To conduct tracking, a suitable object model is selected from the database and mean-shift tracking is applied to find the local minima of a similarity measure between the color histograms of the object model and the target image. The effectiveness of the proposed method is demonstrated by experiments with objects rotating and translating in space.

I. INTRODUCTION

OBJECT tracking is a challenging problem due to the need for real-time computing, complex backgrounds, and variations in lighting condition, scale, and viewpoint over time. The mean-shift algorithm [1] is a popular method in this field because of its simplicity and robustness. The goal of the mean-shift algorithm is to find the local minima of a similarity measure between the weighted feature histograms of the object model and the target image. In [1], the object location is found by iteratively seeking the local minima of the similarity measure of weighted color histograms using a fixed kernel. Much research has been conducted in recent years to improve the work of [1]. For example, location and scale were both estimated in the work of [2]. In feature space, both color and shape features are adopted in [3] instead of only the color feature. Furthermore, the spatiogram is proposed in [4] to replace the histogram and improve robustness. An adaptive kernel model is proposed in [5] to handle rotation and translation, and a tunable representation for tracking using a set of spatial kernels with variable bandwidths is proposed in [6]. Despite these research efforts, the tracking accuracy of the mean-shift algorithm on 3D objects still suffers from variations in the appearance of the object due to changes of viewpoint over time.

In this paper, recognizing 3D objects with a 3D object database is proposed to provide a suitable template for mean-shift tracking. Existing theories of high-level 3D object perception can be classified as object-centered and viewer-centered representations based on the coordinate system [7]. An alternative classification into model-based and view-based representations is based on the constituent elements [8]. The viewer-centered and view-based framework conforms to the intuition of human perception that a person can memorize an unknown object from several major views of it and does not need an exhaustive 3D object model. Usually only a single view is needed later for identifying the 3D object based on past experience.

The simplest view-based description of an object is a densely sampled collection of views which are treated independently. Although the object can be described in greater detail when a large number of 2D views are collected and memorized, the computing time for recognition as well as the memory space requirement prohibit its usage in practice. Therefore, some methods have been studied to extract a minimal set of object views. For example, the aspect-graph representation [9] focuses on changes in the shape of the projection of the object. The vertices of an aspect-graph are the characteristic views that are extracted from some points on a transparent viewing sphere with the object at its center. Those characteristic views are extracted as the aspects to describe the object from the densely sampled collection of object views using visual events. A visual event occurs when the appearance of an object changes between two neighboring views.

The traditional aspect-graph method [10] is based on the assumption that an object belongs to a limited class of shapes and that characteristic views can be extracted using prior knowledge of the object. In our previous work [11], a similarity-based aspect-graph, which extended the work of [12], was proposed to extract a minimal set of object views while allowing an incremental update of the aspect-graph database. Training views of an object in [11] are sampled at random intervals from a viewing sphere, and the object representation can be updated after gathering a new object view without resorting to all the previously collected 2D views.

The object database described above is used as a pool of templates when performing object tracking. The contour obtained from the current tracking result is utilized to find the view angle of the object, and the template for the mean-shift algorithm is updated when necessary. It can be shown that a dynamic template (e.g., a template generated from each image) leads to an accumulated error. The proposed method, with absolute template indexing, can prevent the error from drifting. Fig. 1 illustrates the block diagram of the overall scheme.

The rest of the paper is organized as follows: Section II describes the procedure of extracting the contour and color features that are used to measure the similarity between two 2D views. Next, Section III describes the novelty of this work: similarity-based aspect-graph representations of 3D objects are built from 2D object views and then used for selecting a suitable template for mean-shift tracking. Subsequently, some experimental results are presented to demonstrate the performance of the proposed method in rigid object tracking. Conclusions are finally made in Section V.

Fig. 1. Basic workflow of the proposed framework: $T_1$ denotes the sufficient number of sampled views for building the aspect-graph representation of an object.

Fig. 2. The image database that contains 4 3D rigid objects, where object 1, object 2, ..., object 4 are listed from left to right.

II. FEATURE EXTRACTION

A. Foreground Detection and Contour Extraction

The first image database contains 4 real rigid objects and is shown in Fig. 2. Because of the effects of lighting, shadows and highlights need to be removed before extracting the object features. Therefore, the robust background subtraction framework from our previous work [13] is applied to extract the foreground regions with consideration of shadows and highlights. The use of foreground detection provides the flexibility to build the object database even in an uncontrolled environment.

In order to extract the shape information from the foreground object, Canny edge detection [14] is applied to extract the shape edges, and the Gradient Vector Flow (GVF) snake [15] is then applied to extract the contour information. The contour information is stored in a set $Z$, which is composed of $N$ points $z_i$, where $z_i$ is written in complex form as (1).

$$Z = \{ z_i = x_i + j y_i \}, \quad 0 \le i < N \tag{1}$$

B. Contour Feature: Fourier Descriptor

In order to remove the variation in shift and scale, the edge points inside the set $Z$ are re-sampled by (2).

$$\tilde{Z} = \left\{ \tilde{z}_i = \tilde{x}_i + j \tilde{y}_i = \frac{\hat{L}}{L} \left[ (x_i - x_c) + j (y_i - y_c) \right] \right\}, \quad 0 \le i < N \tag{2}$$

where $(x_c, y_c)$ is the mean of $(x, y)$, $L$ means the contour length, and $\hat{L}$ means the expected contour length.

Then the Fourier transform is applied to $\tilde{Z}$ to calculate the Fourier descriptors. The first $T_2$ magnitude parts are extracted as the main feature to describe the object shape without the variations caused by high-frequency noise. The equation to extract the main feature is given in (3).

$$F = \{ f_t \}, \quad 0 \le t < T_2 \tag{3}$$

where $f_t$ means the magnitude part of the Fourier descriptor at the frequency $2\pi t / N$.
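To make (1) to (3) concrete, the following Python sketch normalizes a closed contour and computes the first few Fourier-descriptor magnitudes. It is only a minimal sketch under stated assumptions: the function names are hypothetical, the contour is assumed to be already sampled at $N$ points, the contour length is measured along the closed polygon, and the discrete Fourier transform is written out directly instead of calling an FFT library.

```python
import cmath
import math


def normalize_contour(points, expected_length):
    """Shift the contour to its centroid and rescale it to a fixed length, as in (2).

    points: list of (x, y) tuples along a closed contour (assumed already sampled).
    expected_length: the target contour length (an assumed stand-in for L-hat).
    """
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    # Contour length measured along the closed polygon (assumption).
    length = sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))
    scale = expected_length / length
    return [complex((x - xc) * scale, (y - yc) * scale) for x, y in points]


def fourier_descriptor(z, num_coeffs):
    """Return the magnitudes of the first num_coeffs DFT coefficients, as in (3)."""
    n = len(z)
    magnitudes = []
    for t in range(num_coeffs):
        coeff = sum(z[i] * cmath.exp(-2j * math.pi * t * i / n) for i in range(n))
        magnitudes.append(abs(coeff))
    return magnitudes


# Two squares that differ only by translation and scale.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(5, 5), (8, 5), (8, 8), (5, 8)]
feat_a = fourier_descriptor(normalize_contour(square, 100.0), 3)
feat_b = fourier_descriptor(normalize_contour(moved, 100.0), 3)
```

In this example the two squares produce the same descriptor, because the centroid shift and the length rescaling of (2) cancel the translation and the scale difference. Keeping only magnitudes additionally discards the phase, so a rotation of the contour or a different starting point leaves the feature unchanged.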

C. Color feature: background-weighted histogram

In the work of [1], the background-weighted histogram is proposed to improve the robustness of the color histogram by incorporating background information. The background-weighted histogram reduces the variations caused by target features that are similar to the background and by inaccurate description of the target.

Let $\{o_u\}_{u=0 \ldots M-1}$ (with $\sum_{u=0}^{M-1} o_u = 1$) be the discrete histogram of the background in the feature space and $o^*$ be its smallest nonzero entry. This representation is computed in a region around the target. The extent of the region is application dependent, and we use an area equal to 1.5 times the target area. The weights are then computed as (4)

$$\left\{ v_u = \min\left( \frac{o^*}{o_u},\, 1 \right) \right\}_{u=0 \ldots M-1} \tag{4}$$

The background-weighted histogram is then defined as (5)

$$q_u = C\, v_u \sum_{i=0}^{N-1} k\left( \| x_i^* \|^2 \right) \delta\left[ b(x_i^*) - u \right] \tag{5}$$

with the normalization constant $C$ expressed as (6)

$$C = \frac{1}{\sum_{i=0}^{N-1} k\left( \| x_i^* \|^2 \right) \sum_{u=0}^{M-1} v_u\, \delta\left[ b(x_i^*) - u \right]} \tag{6}$$

where $\{x_i^*\}_{i=0 \ldots N-1}$ are the normalized pixel locations in the region defined as the target model. The region is centered at 0. $k(x)$ is a kernel function that assigns smaller weights to pixels farther from the center. The function $b: R^2 \to \{0 \ldots M-1\}$ associates to the pixel at location $x_i^*$ the index $b(x_i^*)$ of its bin in the quantized feature space. The function $\delta$ is the delta function.
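The computation in (4) to (6) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names and the pixel representation are hypothetical, the Epanechnikov profile $k(r) = 1 - r$ is an assumed choice of kernel, and background bins with $o_u = 0$ are assumed to receive weight 1.

```python
def epanechnikov_profile(r2):
    """Kernel profile k(r) = 1 - r on [0, 1], zero outside (assumed kernel choice)."""
    return 1.0 - r2 if r2 <= 1.0 else 0.0


def background_weights(bg_hist):
    """Weights v_u = min(o*/o_u, 1) from (4); empty background bins get weight 1 (assumption)."""
    nonzero = [o for o in bg_hist if o > 0]
    if not nonzero:
        return [1.0] * len(bg_hist)
    o_star = min(nonzero)
    return [min(o_star / o, 1.0) if o > 0 else 1.0 for o in bg_hist]


def weighted_histogram(pixels, bg_hist):
    """Background-weighted histogram q_u from (5), normalized as in (6).

    pixels: list of ((x, y), bin_index), with (x, y) normalized so that the
            target region is centered at 0 and fits inside the unit circle.
    """
    v = background_weights(bg_hist)
    q = [0.0] * len(bg_hist)
    for (x, y), b in pixels:
        q[b] += v[b] * epanechnikov_profile(x * x + y * y)
    total = sum(q)  # equals 1 / C in (6)
    return [value / total for value in q] if total > 0 else q


# Bin 0 dominates the background, so its contribution to the model is damped.
model = weighted_histogram([((0.0, 0.0), 0), ((0.0, 0.0), 1)], [0.8, 0.2])
```

Here the two pixels would contribute equally to a plain histogram, but bin 0 is four times more frequent in the background than the rarest background bin, so its weight drops to 0.25 and the model becomes [0.2, 0.8].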

III. SIMILARITY-BASED ASPECT-GRAPH AND MEAN-SHIFT TRACKING

A. Similarity Measure

In order to calculate the similarity between 2D views, a similarity measure must be applied to the extracted contour and color features. Suppose $U$ and $V$ denote one kind of feature extracted from two 2D views and $L$ means the feature length, as described in (7).

$$U = \{u_0, \ldots, u_{L-1}\}, \quad V = \{v_0, \ldots, v_{L-1}\} \tag{7}$$

The similarity measure between contour features is then calculated using the 1-norm distance, which is defined as (8).

$$d_F(U, V) = \sum_{i=0}^{L-1} | u_i - v_i | \tag{8}$$

In the work of [1], the Bhattacharyya coefficient is proposed to measure the similarity between the target model and the candidates. We apply it to the similarity measure between two weighted color histograms, which is defined as (9).

$$d_B(U, V) = \sqrt{1 - \rho(U, V)}, \quad \rho(U, V) = \sum_{i=0}^{L-1} \sqrt{u_i v_i} \tag{9}$$
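Both measures are short enough to write out directly. The sketch below assumes the contour feature is a list of Fourier-descriptor magnitudes and the color feature is a normalized histogram; the function names are hypothetical.

```python
import math


def contour_distance(u, v):
    """1-norm distance between two contour feature vectors, as in (8)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def color_distance(p, q):
    """Distance derived from the Bhattacharyya coefficient, as in (9).

    p and q are assumed to be histograms that each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same length")
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp at zero to guard against rounding pushing rho slightly above 1.
    return math.sqrt(max(0.0, 1.0 - rho))
```

With these definitions, identical histograms give a color distance of 0 and histograms with no overlapping bins give 1, so the color distance is bounded while the contour distance is not. This difference in range is one reason the two distances need separate thresholds.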

B. Generation of Aspects and Characteristic Views

In our previous work [11], a similarity-based aspect-graph is proposed to represent the 3D object using a minimal set of 2D views. The aspects of 3D objects are extracted using 2D views sampled at random intervals. Moreover, object representations become more and more detailed as new 2D views arrive, by only calculating the similarity measures between the new view and the characteristic views.

Suppose $v^n$ means the new sampled view of the $n$th object, $C_m^n(i)$ means the $i$th characteristic view of the $m$th aspect of the $n$th object, $C_{\min-1}^n$ and $C_{\min+1}^n$ mean the neighbor views of $C_{\min}^n$, which has the minimum distance to $v^n$, and $A_{\min}^n$ means the aspect that has the minimum distance to $v^n$, where $\min$ means the index of $A_{\min}^n$. Then four steps are imposed to form aspects and characteristic views as Step A-1 to A-4, and the flowchart of the modified aspect-graph representation is illustrated in Fig. 3.

Step A-1:
When the number of existing aspects of the $n$th object equals zero, $v^n$ is regarded as a characteristic view of a new aspect.

Step A-2:
When the number of existing aspects of the $n$th object equals one or two:
(A-2.1) If (10) and (11) are both met, $v^n$ is combined into the $\min$th aspect and the characteristic view of the aspect stays the same;
(A-2.2) Otherwise, if (10) is satisfied but (11) is not, $v^n$ is combined into the $\min$th aspect and is regarded as a new characteristic view of the $\min$th aspect;
(A-2.3) Otherwise, if (10) and (11) are both violated, a new aspect of the $n$th object is built, and $v^n$ is regarded as the new characteristic view of the new aspect.

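$$\min_{\text{all } m,\, i} d_F\left( v^n, C_m^n(i) \right) < T_3 \tag{10}$$

$$\min_{\text{all } m,\, i} d_B\left( v^n, C_m^n(i) \right) < T_5 \tag{11}$$

where $T_3$ and $T_5$ are both predefined threshold values.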

Step A-3:
When the number of existing aspects of the $n$th object is equal to or greater than three:
(A-3.1) If (12) or (13) is met and (11) is violated, a new aspect is built and $v^n$ is regarded as the characteristic view of the new aspect.
(A-3.2) Otherwise, if (12) and (13) are both violated and (11) is met, $v^n$ is combined into the $\min$th aspect and the characteristic view of the $\min$th aspect stays the same.
(A-3.3) Otherwise, if (11), (12) and (13) are all violated, $v^n$ is combined into the $\min$th aspect and is regarded as a new characteristic view of the $\min$th aspect.

(12) allCe4dFA(P mCj4

aC

< min

dF

(TlW7f

Cn)<T4 and

dF

(Fs11,C;;e

C )>¾ (13)

Moreover, if a new aspect is built, the aspect order can be decided using (14). If the similarity distance between $v^n$ and $C_{\min-1}^n$ is larger than the similarity distance between $v^n$ and $C_{\min+1}^n$, the new aspect is inserted between aspect $\min$ and aspect $\min+1$. Otherwise, the new aspect is inserted between aspect $\min$ and aspect $\min-1$. Therefore, similar aspects are close to each other.
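The bookkeeping in Steps A-1 to A-3 and the ordering rule of (14) can be summarized as a small decision function. The sketch below is only an illustration: it takes the outcomes of conditions (10) to (13) as booleans rather than evaluating them, the function names and returned labels are hypothetical, and any combination of conditions that the steps do not mention is reported as "unspecified" instead of being guessed.

```python
def classify_view(num_aspects, c10=False, c11=False, c12=False, c13=False):
    """Decide what to do with a new view, following Steps A-1 to A-3.

    c10..c13 are the truth values of conditions (10) to (13).
    """
    if num_aspects == 0:                      # Step A-1
        return "new_aspect"
    if num_aspects <= 2:                      # Step A-2
        if c10 and c11:
            return "merge_keep_views"         # (A-2.1)
        if c10 and not c11:
            return "merge_add_view"           # (A-2.2)
        if not c10 and not c11:
            return "new_aspect"               # (A-2.3)
        return "unspecified"
    # Step A-3: three or more aspects
    if (c12 or c13) and not c11:
        return "new_aspect"                   # (A-3.1)
    if not c12 and not c13 and c11:
        return "merge_keep_views"             # (A-3.2)
    if not c11 and not c12 and not c13:
        return "merge_add_view"               # (A-3.3)
    return "unspecified"


def insertion_slot(dist_prev, dist_next, min_index):
    """Return the pair of aspect indices between which a new aspect is inserted.

    dist_prev / dist_next: distances from the new view to the neighbor views
    on either side of the closest characteristic view (index min_index).
    """
    if dist_prev > dist_next:
        return (min_index, min_index + 1)
    return (min_index - 1, min_index)
```

For example, `classify_view(2, c10=True)` returns `"merge_add_view"`, and `insertion_slot(3.0, 1.0, 4)` returns `(4, 5)` because the new view is closer to the neighbor on the far side of aspect 4.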

$$d_F\left( v^n, C_{\min-1}^n \right) > d_F\left( v^n, C_{\min+1}^n \right) \tag{14}$$

Fig. 3. The flowchart of the proposed aspect-graph representation.

Fig. 4. The results of 3D object tracking using the mean-shift algorithm without the 3D object database, where frames 76, 141, 205, 341, 373, 437, 469, 509, 733, 805, 885, 965, 990, 1095, 1135, 1261 are listed from left to right and from top to bottom.

C. Object Recognition Using 2D Characteristic Views

After building the aspect-graph representation of each 3D object, a test view of an unknown object can be recognized using the similarity measures with the contour and color features. Two steps are imposed as follows:

Step B-1:
The test view of an unknown object is compared with the characteristic views of the database via contour features. Then, the first $T_6$ 2D characteristic views in the database having the smallest similarity distance to the test 2D view via contour features are preserved for further recognition.

Step B-2:
Suppose $A_{T_6}$ is defined as the set that contains the $T_6$ 2D characteristic views described in Step B-1; then the final similarity distance can be calculated with the color features by (15).

$$d\left( v, C_m^n \right) = \left( d_B\left( v, C_m^n \right) \right)^2 + \left( w_d \frac{L_{mn}}{L_{Max}} \right)^2 \tag{15}$$

where $w_d$ is a weight, $v$ means the 2D view of an unknown object, $C_m^n$ denotes the $m$th characteristic view of the $n$th object in the database, and $L_{mn}$ denotes the similarity distance calculated using contour features between the unknown object and the $m$th characteristic view of the $n$th object, which are defined as (16) and (17).

$$L_{mn} = d_F\left( v, C_m^n \right), \quad \text{where } C_m^n \in A_{T_6} \tag{16}$$

$$L_{Max} = \max_{C_m^n \in A_{T_6}} d_F\left( v, C_m^n \right) \tag{17}$$

Fig. 5. The results of 3D object tracking using the mean-shift algorithm with the 3D object database. The frame indices are the same as in Fig. 4.

D. 3D Object Mean-Shift Tracking

Let $\{x_i\}_{i=0 \ldots n-1}$ be the normalized pixel locations of the target candidate, centered at $y$ in the current frame. The probability of the feature $u = 0 \ldots M-1$ in the target candidate is given by (18)

$$p_u(y) = C_h\, v_u \sum_{i=0}^{n-1} k\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \delta\left[ b(x_i) - u \right] \tag{18}$$

with the normalization constant $C_h$ expressed as (19)

$$C_h = \frac{1}{\sum_{i=0}^{n-1} k\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \sum_{u=0}^{M-1} v_u\, \delta\left[ b(x_i) - u \right]} \tag{19}$$

From the work of [1], the new location of the object $y_1$ can be calculated by (20) and (21), and the object location is moved from the current location $y_0$ to the new location $y_1$ until (22) is satisfied; otherwise $y_0 = y_1$ and (20) to (22) are repeated.

$$y_1 = \frac{\sum_{i=0}^{n-1} x_i\, w_i\, g\left( \left\| \frac{y_0 - x_i}{h} \right\|^2 \right)}{\sum_{i=0}^{n-1} w_i\, g\left( \left\| \frac{y_0 - x_i}{h} \right\|^2 \right)} \tag{20}$$

where $g(x) = -k'(x)$ and $w_i$ can be described as (21)

$$w_i = \sum_{u=0}^{M-1} \sqrt{\frac{q_u}{p_u(y_0)}}\; \delta\left[ b(x_i) - u \right] \tag{21}$$

$$\| y_1 - y_0 \| < \varepsilon \tag{22}$$

After calculating the new location, a new region is defined using the new location, and then the object inside the region is recognized using the 3D object database. A suitable template is selected as the new target model $q$ for the next mean-shift tracking.
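The iteration in (20) to (22) can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and it simplifies in several labeled ways: the Epanechnikov profile is assumed, so that $g$ is constant inside the window; the background weights $v_u$ are taken as 1; the frame is given as a hypothetical list of pixel positions with precomputed bin indices; and the function names, the tolerance `eps` and the iteration cap are illustrative choices.

```python
import math


def candidate_histogram(pixels, y, h, num_bins):
    """Candidate histogram p_u(y) from (18) with k(r) = 1 - r and v_u = 1 (assumptions)."""
    p = [0.0] * num_bins
    for (px, py), b in pixels:
        r2 = ((px - y[0]) ** 2 + (py - y[1]) ** 2) / (h * h)
        if r2 <= 1.0:
            p[b] += 1.0 - r2
    total = sum(p)
    return [value / total for value in p] if total > 0 else p


def mean_shift(pixels, q, y0, h, eps=0.5, max_iter=20):
    """Move the window center from y0 until the shift is below eps, as in (20) to (22)."""
    y = y0
    for _ in range(max_iter):
        p = candidate_histogram(pixels, y, h, len(q))
        sum_x = sum_y = sum_w = 0.0
        for (px, py), b in pixels:
            r2 = ((px - y[0]) ** 2 + (py - y[1]) ** 2) / (h * h)
            if r2 <= 1.0 and p[b] > 0:
                w = math.sqrt(q[b] / p[b])   # weight w_i from (21)
                sum_x += w * px              # g is constant inside the window
                sum_y += w * py
                sum_w += w
        if sum_w == 0:
            return y                         # nothing matching the model in the window
        y_new = (sum_x / sum_w, sum_y / sum_w)
        if math.hypot(y_new[0] - y[0], y_new[1] - y[1]) < eps:   # stopping rule (22)
            return y_new
        y = y_new
    return y


# Synthetic frame: a 5x5 patch of bin 1 centered at (12, 12) on a bin-0 background.
frame = [((x, y), 1 if 10 <= x <= 14 and 10 <= y <= 14 else 0)
         for x in range(21) for y in range(21)]
target_model = [0.0, 1.0]                    # the model contains only bin 1
location = mean_shift(frame, target_model, (9.0, 9.0), h=5.0)
```

In this synthetic frame the window starts off-center at (9, 9) and settles on the center of the patch at (12, 12). In the proposed method the model `q` would not stay fixed: after each converged step, the region around the new location is recognized against the 3D object database and the selected template replaces `q`.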

IV. EXPERIMENTAL RESULTS

This section describes several experiments to demonstrate the effectiveness of the proposed method. A SONY EVI-D30 PTZ camera is used to capture object views. The 3D object database is built to test the proposed method. The training views of each object are captured at random intervals, and each object has 72 training views. To test the proposed algorithm, a motion video of each object is captured, which contains about 2500 views for each object. Furthermore, the computing time taken for object recognition and mean-shift tracking was about one second with a P4 2.8G CPU and 1 GB RAM. The parameters used in the following experiments are: $T_1 = 72$, $T_2 = 25$, $T_3 = 336$, $T_4 = 640$, $T_5 = 0.85$, $T_6 = 3$, $N = 256$, $M = 4096$, and $\hat{L} = 250$.

Fig. 6. The results of 3D object tracking using the mean-shift algorithm without the 3D object database, where frames 13, 36, 80, 127, 135, 144, 194, 213, 229, 242, 271, 958, 1028, 1345, 1394, 1646 are listed from left to right and from top to bottom.

Fig. 7. The results of 3D object tracking using the mean-shift algorithm with the 3D object database. The frame indices are the same as in Fig. 6.

A. Similar appearances from each view-point over time on simple background

In the first experiment, an object that has a similar appearance from each viewpoint over time is used for testing the robustness of mean-shift tracking with the 3D object database. Certain representative frames are selected and shown in Fig. 4 and Fig. 5. These frames show that the object moves with rotation and translation. In Fig. 4, the target model is set as the front of the monkey, and mean-shift tracking is then applied without using the 3D object database. Because the monkey has a similar color distribution on each side, the mean-shift tracker tracks the candidate in all frames. The result is good due to the similar appearance of the object from each viewpoint. In Fig. 5, the tracking results are calculated using the proposed method, and the results are almost the same as those in Fig. 4. The tracker with our proposed method tracks the candidate correctly.

B. Different appearances from each view-point over time on simple background

In the second experiment, an object that has a different appearance from each viewpoint over time is used for testing the efficiency of the proposed method. The green turtle shell is chosen as the target model in this experiment. In Fig. 6, the green shell can be tracked in frames 13, 36, and 80. With the rotation, the green part vanishes gradually while the cream-colored part increases, and the mean-shift tracker loses the candidate from frame 144 to frame 229. When the green part is back, the tracker tracks the candidate again. Here mean-shift tracking is applied without using the 3D object database, and the estimated new location becomes inaccurate when the appearance is different from the initial template. In Fig. 7, the template used in each frame is replaced with the suitable one from the 3D object database, and the tracker with our proposed method tracks the candidate all the time.

V. CONCLUSIONS

This work proposes a novel method for tracking 3D objects using the mean-shift algorithm and a 3D object database. The 3D object database is built using contour and color features and represents each 3D object as a similarity-based aspect-graph. When the appearance of a 3D object changes with the viewpoint, a suitable object model is provided from the 3D object database and mean-shift tracking is applied to find the local minima of a similarity measure between the color histograms of the object model and the target image. When an object has a similar appearance from each viewpoint, both the proposed method and the traditional mean-shift method track the candidate properly. However, when an object has a different appearance from each viewpoint, the proposed method tracks the candidate well, whereas the traditional mean-shift method fails. The proposed method addresses the 3D object tracking problem, and its effectiveness is demonstrated by experiments.

ACKNOWLEDGMENTS

This work was supported by the National Science Council of the R.O.C. under grant no. NSC94-2218-E009064 and the DOIT TDPA Program under project number 95-EC-17-A-04-S1-054.

REFERENCES

[1] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, May 2002.

[2] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 234-240, 2003.

[3] K. She, G. Bebis, H. Gu, and R. Miller, "Vehicle tracking using on-line fusion of color and shape features," in Proc. Int. Conf. on Intelligent Transportation Sys., Washington DC, Oct. 2004.

[4] S. T. Birchfield and S. Rangarajan, "Spatiograms versus histograms for region-based tracking," in Proc. Computer Vision and Pattern Recognition, pp. 20-25, June 2005.

[5] H. Zhang, W. Huang, Z. Huang, and L. Li, "Kernel-Based Method for Tracking Objects with Rotation and Translation," International Conference of Pattern Recognition (ICPR), pp. 23-26 August, 2004.

[6] V. Parameswaran, V. Ramesh, and I. Zoghlami, "Tunable Kernels for Tracking," in Proc. of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.

[7] G. Peters, "Theories of Three-Dimensional Object Perception - A Survey," Recent Research Developments in Pattern Recognition, Transworld Research Network, 2000.

[8] I. Weiss and M. Ray, "Model-Based Recognition of 3D Objects from Single Images," IEEE Trans. on PAMI, vol. 23, no. 2, pp. 116-125, 2001.

[9] J. J. Koenderink and A. J. van Doorn, "The singularities of the visual mapping," Biol. Cyber., 24:51-59, 1976.

[10] I. Shimshoni and J. Ponce, "Finite-Resolution Aspect Graphs of Polyhedral Objects," IEEE Trans. on Pattern Anal. Mach. Intell., 19(4):315-327, 1997.

[11] J. S. Hu, T. M. Su, and C. C. Lin, "Shape Memorization and Recognition of 3-D Objects Using a Similarity-Based Aspect-Graph Approach," IEEE Int'l Conf. on Systems, Man, and Cybernetics, Oct. 2006.

[12] C. M. Cyr and B. Kimia, "A Similarity-Based Aspect-Graph Approach to 3D Object Recognition," International Journal of Computer Vision, 57(1):5-22, 2004.

[13] J. S. Hu, T. M. Su, and S. C. Jeng, "Robust Background Subtraction with Shadow and Highlight Removal for Indoor Environment Surveillance," IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, Oct. 2006.

[14] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, 1986.

[15] C. Xu and J. L. Prince, "Gradient Vector Flow: A New External Force for Snakes," IEEE Conference on Computer Vision and Pattern Recognition, pp. 66-71, 1997.