The 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON), Nov. 5-8, 2007, Taipei, Taiwan
3D Object Tracking Using Mean-Shift and Similarity-Based Aspect-Graph Modeling
Jwu-Sheng Hu, Member, IEEE, Tzung-Min Su, Student Member, IEEE, Chung-Wei Juan, and George Wang
Department of Electrical and Control Engineering, National Chiao-Tung University, Taiwan
jshu@cn.nctu.edu.tw
Abstract-The mean-shift algorithm is a popular method in the field of 2D object tracking due to its simplicity and robustness over slight variations of lighting condition, scale, and viewpoint over time. However, the appearance of a 3D object might vary distinctively across viewpoints over time. In this work, a novel method for tracking 3D objects using the mean-shift algorithm and a 3D object database is proposed to achieve more precise tracking. A 3D object database using a similarity-based aspect-graph is built from 2D images sampled at random intervals from the viewing sphere. Contour and color features of each 2D image are used for modeling the 3D object database. To conduct tracking, a suitable object model is selected from the database and mean-shift tracking is applied to find the local minima of a similarity measure between the color histograms of the object model and the target image. The effectiveness of the proposed method is demonstrated by experiments with objects rotating and translating in space.
I. INTRODUCTION
OBJECT tracking is a challenging problem due to the need for real-time computing, complex backgrounds, and variations of lighting condition, scale, and viewpoint over time. The mean-shift algorithm [1] is a popular method in this field because of its simplicity and robustness. The goal of the mean-shift algorithm is to find the local minima of a similarity measure between the weighted feature histograms of the object model and the target image. In [1], the object location is found by iteratively finding the local minima of the similarity measure of weighted color histograms using a fixed kernel. Much research has been carried out in recent years to improve the work of [1]. For example, location and scale were both estimated in the work of [2]. In feature space, both color and shape features are adopted in [3] instead of only the color feature. Furthermore, the spatiogram is proposed in [4] to replace the histogram and improve robustness. An adaptive kernel model is proposed in [5] to handle rotation and translation, and a tunable representation for tracking using a set of spatial kernels with variable bandwidths is proposed in [6]. Despite these research efforts, the tracking accuracy of the mean-shift algorithm on 3D objects still suffers from the variations of appearance of the object due to the change of viewpoint over time. In this paper, recognizing 3D objects with a 3D object database is proposed to provide a suitable template for mean-shift tracking.
Existing theories about high-level 3D object perception can be classified as object-centered and viewer-centered representations based on the coordinate system [7]. An alternative classification distinguishes model-based and view-based representations based on the constituent elements [8]. The viewer-centered and view-based framework conforms to the intuition of human perception that a person can memorize an unknown object with several major views of it and does not need an exhaustive 3D object model. Usually only a single view is needed later for identifying the 3D object based on past experiences.
The simplest view-based description of an object is a densely sampled collection of views which are treated independently. Although the object can be described in greater detail when a large number of 2D views are collected and memorized, the computing time for recognition as well as the memory space requirement prohibit its usage in practice. Therefore, some methods have been studied to extract a minimal set of object views. For example, the aspect-graph representation [9] focuses on changes in the shape of the projection of the object. The vertices of an aspect-graph are the characteristic views that are extracted from some points on a transparent viewing sphere with the object in its center. Those characteristic views are extracted as the aspects to describe the object from the densely sampled collection of object views using visual events. A visual event occurs when the appearance of an object changes between two neighbor views.
The traditional aspect-graph method [10] is based on the assumption that an object belongs to a limited class of shapes and that characteristic views can be extracted using prior knowledge of the object. In our previous work [11], a similarity-based aspect-graph, which extended the work of [12], is proposed to extract a minimal set of object views and to allow an incremental update of the aspect-graph database. Training views of an object in [11] are sampled at random intervals from a viewing sphere, and the object representation can be updated after gathering a new object view without resorting to all the previously collected 2D views.
The object database described above is used as a pool of templates when performing object tracking. The contour obtained from the current tracking result is utilized to find the view angle of the object, and the template for the mean-shift algorithm is updated when necessary. It can be shown that a dynamic template (e.g., a template generated from each image) leads to an accumulated error. The proposed method with absolute template indexing can prevent the error from drifting. Fig. 1 illustrates the block diagram of the overall scheme.
The rest of the paper is organized as follows: Section II
describes the procedure of extracting the contour and color features that are used to measure the similarity between two 2D views. Next, Section III describes the novelty of this work: similarity-based aspect-graph representations of 3D objects are built from 2D object views and then used for selecting a suitable template for mean-shift tracking. Subsequently, some experimental results are presented to demonstrate the performance of the proposed method in rigid object tracking. Conclusions are finally made in Section V.
Fig. 1. Basic workflow of the proposed framework: $T_1$ denotes the sufficient number of sampled views for building the aspect-graph representation of an object.
II. FEATURE EXTRACTION
A. Foreground Detection and Contour Extraction
The first image database contains four real rigid objects and is shown in Fig. 2. Because of the effects of lighting, shadows and highlights need to be removed before extracting the object features. Therefore, the robust background subtraction framework of our previous work [13] is applied to extract the foreground regions with consideration of shadows and highlights. The use of foreground detection provides the flexibility to build the object database even in an uncontrolled environment.
Fig. 2. The image database that contains four 3D rigid objects, where object 1, object 2, ..., object 4 are listed from left to right.
In order to extract the shape information from the foreground object, Canny edge detection [14] is applied to extract the shape edge and the Gradient Vector Flow (GVF) snake [15] is then applied to extract the contour information. The contour information is included in a set $Z$, which is composed of $N$ points $z_i$, where $z_i$ can be described in complex form as (1).
$$Z = \{ z_i = x_i + j y_i \}, \quad 0 \le i < N \tag{1}$$
B. Contour Feature: Fourier Descriptor
In order to avoid the variation in shift and scale, the edge points inside the set $Z$ are re-sampled by (2)
$$Z' = \left\{ z'_i = x'_i + j y'_i = \frac{L_e}{L_c} \left[ (x_i - x_c) + j (y_i - y_c) \right] \right\}, \quad 0 \le i < N \tag{2}$$
where $(x_c, y_c)$ is the mean of $(x, y)$, $L_c$ means the contour length, and $L_e$ means the expected contour length. Then the Fourier transform is applied to $Z'$ to calculate the Fourier descriptors. The first $T_2$ magnitude parts are extracted as the main feature to describe the object shape without the variations caused by high-frequency noise. The equation to extract the main feature is described as (3)
$$F = \{ f_t \}, \quad 0 \le t < T_2 \tag{3}$$
where $f_t$ means the magnitude part of the Fourier descriptor at the frequency $2\pi t / N$.
C. Color Feature: Background-Weighted Histogram
In the work of [1], the background-weighted histogram is proposed to improve the robustness of the color histogram by incorporating background information. The background-weighted histogram reduces the variations caused by target features that are similar to the background and by an inaccurate description of the target. Let $\{\hat{o}_u\}_{u=0,\dots,M-1}$ (with $\sum_{u=0}^{M-1} \hat{o}_u = 1$) be the discrete histogram of the background in the feature space and $\hat{o}^*$ be its smallest nonzero entry. This representation is computed in a region around the target. The extent of the region is application dependent, and we use an area equal to 1.5 times the target area. The weights are then computed as (4)
$$\left\{ v_u = \min\left( \hat{o}^* / \hat{o}_u, 1 \right) \right\}_{u=0,\dots,M-1} \tag{4}$$
The background-weighted histogram is then defined as (5)
$$q_u = C \, v_u \sum_{i=0}^{N-1} k\left( \| x_i \|^2 \right) \delta\left[ b(x_i) - u \right] \tag{5}$$
with the normalization constant $C$ expressed as (6)
$$C = \frac{1}{\sum_{i=0}^{N-1} k\left( \| x_i \|^2 \right) \sum_{u=0}^{M-1} v_u \, \delta\left[ b(x_i) - u \right]} \tag{6}$$
where $\{x_i\}_{i=0,\dots,N-1}$ are the normalized pixel locations in the region defined as the target model. The region is centered at 0. $k(x)$ is a kernel function that assigns smaller weights to pixels farther from the center. The function $b: R^2 \to \{0, \dots, M-1\}$ associates to the pixel at location $x_i$ the index $b(x_i)$ of its bin in the quantized feature space. The function $\delta$ is the delta function.
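As an illustration of (4) to (6), the following minimal Python sketch computes the background weights and the resulting weighted histogram. The paper does not state which kernel is used, so the Epanechnikov profile here is an assumption, and the function names `epanechnikov`, `background_weights`, and `weighted_histogram` are hypothetical.

```python
def epanechnikov(r2):
    """Kernel profile k(r^2): assumed Epanechnikov, zero outside the unit disc."""
    return max(0.0, 1.0 - r2)


def background_weights(bg_hist):
    """Eq. (4): v_u = min(o* / o_u, 1), with o* the smallest nonzero entry."""
    o_star = min(o for o in bg_hist if o > 0)
    # Bins that never occur in the background keep full weight.
    return [min(o_star / o, 1.0) if o > 0 else 1.0 for o in bg_hist]


def weighted_histogram(points, bins, weights, n_bins):
    """Eqs. (5)-(6): background-weighted, kernel-weighted color histogram.

    points  : normalized pixel locations (x, y), region centered at the origin
    bins    : bin index b(x_i) of each pixel
    weights : background weights v_u from background_weights()
    """
    q = [0.0] * n_bins
    for (x, y), u in zip(points, bins):
        q[u] += weights[u] * epanechnikov(x * x + y * y)
    total = sum(q)  # equals 1 / C in eq. (6)
    return [value / total for value in q] if total > 0 else q


# Toy example with M = 4 bins: bin 0 dominates the background.
v = background_weights([0.5, 0.25, 0.25, 0.0])
q = weighted_histogram([(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5)], [0, 1, 1, 2], v, 4)
```

Dividing by the accumulated total is equivalent to multiplying by $C$, since summing (5) over all bins gives exactly the denominator of (6).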
III. SIMILARITY-BASED ASPECT-GRAPH AND MEAN-SHIFT TRACKING
A. Similarity Measure
In order to calculate the similarity between 2D views, a similarity measure needs to be applied to the extracted contour and color features. Suppose $U$ and $V$ denote one kind of feature extracted from two 2D views and $L$ means the feature length, which are described as (7).
$$U = \{ u_0, u_1, \dots, u_{L-1} \}, \quad V = \{ v_0, v_1, \dots, v_{L-1} \} \tag{7}$$
The similarity measures between contour features are then calculated using the 1-norm distance, which is defined as (8).
$$d_F(U, V) = \sum_{l=0}^{L-1} | u_l - v_l | \tag{8}$$
In the work of [1], the Bhattacharyya coefficient is proposed to measure the similarity between the target model and the candidates. We apply it to the similarity measure between two weighted color histograms, which is defined as (9).
$$d_B(U, V) = \sqrt{1 - \rho(U, V)}, \quad \rho(U, V) = \sum_{l=0}^{L-1} \sqrt{u_l v_l} \tag{9}$$
B. Generation of Aspects and Characteristic Views
In our previous work [11], a similarity-based aspect-graph is proposed to present the 3D object using a minimal set of 2D views. The aspects of 3D objects are extracted using 2D views sampled at random intervals. Moreover, object representations become more and more detailed as new 2D views arrive, by only calculating the similarity measures between the new view and the characteristic views.
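The two similarity measures that this incremental update relies on can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the contour length is approximated by the perimeter of the polygon through the contour points, a plain discrete Fourier transform is used, and the function names are hypothetical.

```python
import cmath
import math


def fourier_descriptor(contour, expected_length, n_coeffs):
    """Contour feature: center and rescale the contour, then keep the
    magnitudes of the first n_coeffs Fourier coefficients."""
    n = len(contour)
    xc = sum(x for x, _ in contour) / n
    yc = sum(y for _, y in contour) / n
    # Assumed contour length: perimeter of the closed polygon.
    length = sum(
        math.hypot(contour[(i + 1) % n][0] - contour[i][0],
                   contour[(i + 1) % n][1] - contour[i][1])
        for i in range(n)
    )
    scale = expected_length / length
    z = [complex(scale * (x - xc), scale * (y - yc)) for x, y in contour]
    return [
        abs(sum(z[i] * cmath.exp(-2j * math.pi * t * i / n) for i in range(n)))
        for t in range(n_coeffs)
    ]


def contour_distance(u, v):
    """1-norm distance between two feature vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def bhattacharyya_distance(p, q):
    """Distance derived from the Bhattacharyya coefficient of two
    normalized histograms."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - rho))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(3 * x + 5, 3 * y - 2) for x, y in square]  # shifted and scaled copy
fd_a = fourier_descriptor(square, 8.0, 2)
fd_b = fourier_descriptor(moved, 8.0, 2)
```

Because the contour is centered and rescaled before the transform, `fd_a` and `fd_b` agree up to rounding, which is the shift and scale invariance the contour feature is meant to provide.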
Suppose $v^n$ means the newly sampled view of the $n$th object, $C^n_m(i)$ means the $i$th characteristic view of the $m$th aspect of the $n$th object, and $C^n_{min-1}$ and $C^n_{min+1}$ mean the neighbor views of $C^n_{min}$, the characteristic view that has the minimum distance to $v^n$. $A^n_{min}$ means the aspect that has the minimum distance to $v^n$, where $min$ means the index of $A^n_{min}$. Then four steps are imposed to form aspects and characteristic views as Step A-1 to A-4, and the flowchart of the modified aspect-graph representation is illustrated in Fig. 3.
Step A-1:
When the number of existing aspects of the $n$th object equals zero, $v^n$ is regarded as a characteristic view of a new aspect.
Step A-2:
When the number of existing aspects of the $n$th object equals one or two:
(A-2.1) If (10) and (11) both hold, $v^n$ is combined into the $min$-th aspect and the characteristic view of the aspect stays the same;
(A-2.2) Otherwise, if (10) is satisfied but (11) is not, $v^n$ is combined into the $min$-th aspect and is regarded as a new characteristic view of the $min$-th aspect;
(A-2.3) Otherwise, if (10) and (11) are both violated, a new aspect of the $n$th object is built, and $v^n$ is regarded as the new characteristic view of the new aspect.
(10)
allmildAF(V5C,)<¾
alnl dB(vB( 7Cm;)<
(11)
where ¾ andT¾ are both predefined threshold
value.
Step A-3:
When the number of existing aspects of the $n$th object is equal to or greater than three:
(A-3.1) If (12) or (13) holds and (11) is violated, a new aspect is built and $v^n$ is regarded as the characteristic view of the new aspect.
(A-3.2) Otherwise, if (12) and (13) are both violated and (11) holds, $v^n$ is combined into the $min$-th aspect and the characteristic view of the $min$-th aspect stays the same.
(A-3.3) Otherwise, if (11), (12) and (13) are all violated, $v^n$ is combined into the $min$-th aspect and is regarded as a new characteristic view of the $min$-th aspect.
(12) allCe4dFA(P mCj4
aC
< min
dF
(TlW7f
Cn)<T4 anddF
(Fs11,C;;e
C )>¾ (13) Moreover, if a newaspect is built, theaspectordercanbe decided using (14). Ifthe similarity distance between vCand Cn 1 islarger thani thesimilarity distance between g
andC" ' the nevw aspect is inserted between aspect -in
and aspect min 1 Otherwise, the new aspect is inserted
betweenaspect mm andaspect7 mm 1 Therefore, the similar
aspects are closetoeach other.
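The ordering rule for a newly built aspect can be sketched as follows. This is only an illustration: it assumes the aspects of one object are kept in a flat Python list, ignores any wrap-around on the viewing sphere, and the name `insert_new_aspect` is hypothetical.

```python
def insert_new_aspect(aspects, new_aspect, min_idx, d_prev, d_next):
    """Insert new_aspect next to aspects[min_idx], on the side of the
    closer neighbor.

    d_prev : similarity distance between the new view and aspect min-1
    d_next : similarity distance between the new view and aspect min+1
    """
    if d_prev > d_next:
        position = min_idx + 1  # between aspect min and aspect min+1
    else:
        position = min_idx      # between aspect min-1 and aspect min
    return aspects[:position] + [new_aspect] + aspects[position:]


# The new view is closer to "c" than to "a", so it lands between "b" and "c".
ordered = insert_new_aspect(["a", "b", "c"], "x", 1, d_prev=0.9, d_next=0.4)
```

Returning a new list rather than mutating the input keeps the sketch free of side effects; an actual database update would more likely insert in place.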
$$d_F\left( v^n, C^n_{min-1} \right) > d_F\left( v^n, C^n_{min+1} \right) \tag{14}$$
Fig. 3. The flowchart of the proposed aspect-graph representation.
C. Object Recognition using 2D Characteristic Views
After building the aspect-graph representation of each 3D object, a test view of an unknown object can be recognized using the similarity measure with the contour and color features. Two steps are imposed as follows:
Step B-1:
The test view of an unknown object is compared with the characteristic views of the database via contour features. Then, the first $T_6$ 2D characteristic views in the database having the smallest similarity distance to the test 2D view via contour features are preserved to be further recognized.
Step B-2:
Suppose $A_{T_6}$ is defined as the set that contains the $T_6$ 2D characteristic views described in Step B-1; then the final similarity distance can be calculated with the color features by (15)
$$d\left( v, C^n_m \right) = \left( d_B\left( v, C^n_m \right) \right)^2 + \left( w_d \, \frac{L_{mn}}{L_{max}} \right)^2 \tag{15}$$
where $w_d$ is a weight, $v$ means the 2D view of an unknown object, $C^n_m$ denotes the $m$th characteristic view of the $n$th object in the database, and $L_{mn}$ denotes the similarity distance calculated using contour features between the unknown object and the $m$th characteristic view of the $n$th object, which is defined as (16) and (17).
$$L_{mn} = d_F\left( v, C^n_m \right), \quad \text{where } C^n_m \in A_{T_6} \tag{16}$$
$$L_{max} = \max_{C^n_m \in A_{T_6}} d_F\left( v, C^n_m \right) \tag{17}$$
Fig. 4. The results of 3D object tracking using the mean-shift algorithm without the 3D object database, where frames 76, 141, 205, 341, 373, 437, 469, 509, 733, 805, 885, 965, 990, 1095, 1135, and 1261 are listed from left to right and from top to bottom.
Fig. 5. The results of 3D object tracking using the mean-shift algorithm with the 3D object database. The frame index is the same as in Fig. 4.
D. 3D Object Mean-Shift Tracking
Let $\{x_i\}_{i=1,\dots,n_h}$ be the normalized pixel locations of the target candidates, centered at $y$ in the current frame. The probability of the feature $u = 0, \dots, M-1$ in the target candidate is given by (18)
$$p_u(y) = C_h \, v_u \sum_{i=1}^{n_h} k\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \delta\left[ b(x_i) - u \right] \tag{18}$$
with the kernel bandwidth $h$ and the normalization constant $C_h$ expressed as (19)
$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \sum_{u=0}^{M-1} v_u \, \delta\left[ b(x_i) - u \right]} \tag{19}$$
From the work of [1], the new location of the object $\hat{y}_1$ can be calculated by (20) and (21), and the object location is moved from the current location $\hat{y}_0$ to the new location $\hat{y}_1$ until (22) is satisfied; otherwise, $\hat{y}_0 = \hat{y}_1$ and (20) to (22) are repeated.
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i \, w_i \, g\left( \left\| \frac{\hat{y}_0 - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n_h} w_i \, g\left( \left\| \frac{\hat{y}_0 - x_i}{h} \right\|^2 \right)} \tag{20}$$
where $g(x) = -k'(x)$ and $w_i$ can be described as (21)
$$w_i = \sum_{u=0}^{M-1} \sqrt{\frac{q_u}{p_u(\hat{y}_0)}} \, \delta\left[ b(x_i) - u \right] \tag{21}$$
$$\left\| \hat{y}_1 - \hat{y}_0 \right\| < \varepsilon \tag{22}$$
After calculating the new location, a new region is defined using the new location and then the object inside the region is recognized using the 3D object database. A suitable template is selected as the new target model $q$ for the next mean-shift tracking.
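The location update loop can be sketched in plain Python as below. This is a minimal illustration and not the authors' implementation: it assumes an Epanechnikov kernel profile, for which $g = -k'$ is constant inside the window so the new location reduces to a weighted mean, and the names `mean_shift`, `pixels`, and `bin_of` are hypothetical. Template re-selection from the database is not included.

```python
import math


def mean_shift(pixels, bin_of, q, y0, h, weights=None, eps=0.5, max_iter=20):
    """Move the window center until the shift is smaller than eps.

    pixels  : list of (x, y) pixel coordinates in the current frame
    bin_of  : dict mapping each pixel to its histogram bin b(x_i)
    q       : target model histogram of length M
    weights : background weights v_u (all ones if omitted)
    """
    m = len(q)
    v = weights or [1.0] * m
    y = y0
    for _ in range(max_iter):
        # Pixels strictly inside the kernel window of bandwidth h.
        window = []
        for px in pixels:
            r2 = ((px[0] - y[0]) ** 2 + (px[1] - y[1]) ** 2) / (h * h)
            if r2 < 1.0:
                window.append((px, r2))
        # Candidate histogram p_u(y) with the assumed profile k(r2) = 1 - r2.
        p = [0.0] * m
        for px, r2 in window:
            p[bin_of[px]] += v[bin_of[px]] * (1.0 - r2)
        total = sum(p)
        if total == 0:
            break
        p = [value / total for value in p]
        # Pixel weights w_i = sqrt(q_u / p_u(y)) for the bin u of pixel i,
        # then the weighted mean of the window pixels as the new location.
        sw = sx = sy = 0.0
        for px, _ in window:
            u = bin_of[px]
            w = math.sqrt(q[u] / p[u]) if p[u] > 0 else 0.0
            sw += w
            sx += w * px[0]
            sy += w * px[1]
        if sw == 0:
            break
        y_new = (sx / sw, sy / sw)
        shift = math.hypot(y_new[0] - y[0], y_new[1] - y[1])
        y = y_new
        if shift < eps:
            break
    return y


# Toy frame: a 5x5 blob of bin 1 on a background of bin 0.
pixels = [(x, y) for x in range(20) for y in range(20)]
bin_of = {p: 1 if 10 <= p[0] <= 14 and 10 <= p[1] <= 14 else 0 for p in pixels}
center = mean_shift(pixels, bin_of, q=[0.0, 1.0], y0=(9.0, 9.0), h=5.0)
```

Started off-center at (9, 9), the window drifts onto the blob and settles at its centroid. In the proposed scheme the histogram `q` would then be replaced by the template selected from the 3D object database before the next frame is processed.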
IV. EXPERIMENTAL RESULTS
Fig. 6. The results of 3D object tracking using the mean-shift algorithm without the 3D object database, where frames 13, 36, 80, 127, 135, 144, 194, 213, 229, 242, 271, 958, 1028, 1345, 1394, and 1646 are listed from left to right and from top to bottom.
Fig. 7. The results of 3D object tracking using the mean-shift algorithm with the 3D object database. The frame index is the same as in Fig. 6.
This section describes several experiments to demonstrate the effectiveness of the proposed method. A SONY EVI-D30 PTZ camera is used to capture object views. The 3D object database is built to test the proposed method. The training views of each object are captured at random intervals, and each object contains 72 views. To test the proposed algorithm, a motion video of each object is captured, which contains about 2500 views for each object. Furthermore, the computing time taken for object recognition and mean-shift tracking was about one second with a P4 2.8 GHz CPU and 1 GB RAM. The
parameters used in the following experiments are: T =72,
',=25- T =336 ? T =640. 1 =0.85. T= 3 N= 256.
$M = 4096$, and the expected contour length $L_e = 250$.
A. Similar appearances of each view-point over time on simple background
In the first experiment, an object that has a similar appearance from each viewpoint over time is used for testing the robustness of mean-shift tracking with the 3D object database. Certain representative frames are selected and shown in Fig. 4 and Fig. 5. These frames show that the object moves with rotation and translation. In Fig. 4, the target model is set as the front of the monkey and then mean-shift tracking is applied without using the 3D object database. Because the monkey has a similar color distribution on each side, the mean-shift tracker tracks the candidate in all frames. The result is good due to the similar appearance of the object from each viewpoint. In Fig. 5, the tracking results are calculated using the proposed method and are almost the same as the results in Fig. 4. The tracker with our proposed method tracks the candidate correctly.
B. Different appearances from each view-point over time on simple background
In the second experiment, an object that has a different appearance from each viewpoint over time is used for testing the efficiency of the proposed method. The green turtle shell is chosen as the target model in this experiment. In Fig. 6, the green shell can be tracked in frames 13, 36, and 80. With the rotation, the green part vanishes gradually while the cream-colored part increases, and the mean-shift tracker loses the candidate from frame 144 to frame 229. When the green part is back, the tracker tracks the candidate again. Here mean-shift tracking is applied without using the 3D object database, and the estimated new location becomes inaccurate when the appearance differs from the initial template. In Fig. 7, the template used in each frame is replaced with the suitable one from the 3D object database, and the tracker with our proposed method tracks the candidate all the time.
V. CONCLUSIONS
This work proposes a novel method for tracking 3D objects using the mean-shift algorithm and a 3D object database. The 3D object database is built using contour and color features and presents the similarity-based aspect-graph of each 3D object. When the appearance of a 3D object changes with the viewpoint, a suitable object model is provided from the 3D object database and mean-shift tracking is applied to find the local minima of a similarity measure between the color histograms of the object model and the target image. When an object has a similar appearance from each viewpoint, both the proposed method and the traditional mean-shift method track the candidate properly. However, when an object has a different appearance from each viewpoint, the proposed method tracks the candidate well, but the traditional mean-shift method fails. The proposed method thus addresses the 3D object tracking problem, and its effectiveness is demonstrated by experiments.
ACKNOWLEDGMENTS
This work was supported by the National Science Council of the R.O.C. under grant no. NSC94-2218-E009064 and the DOIT TDPA Program under project number 95-EC-17-A-04-S1-054.
REFERENCES
[1] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, May 2002.
[2] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 234-240, 2003.
[3] K. She, G. Bebis, H. Gu, and R. Miller, "Vehicle tracking using on-line fusion of color and shape features," in Proc. Int. Conf. on Intelligent Transportation Sys., Washington DC, Oct. 2004.
[4] S. T. Birchfield and S. Rangarajan, "Spatiograms versus histograms for region-based tracking," in Proc. Computer Vision and Pattern Recognition, pp. 20-25, June 2005.
[5] H. Zhang, W. Huang, Z. Huang, and L. Li, "Kernel-Based Method for Tracking Objects with Rotation and Translation," International Conference of Pattern Recognition (ICPR), pp. 23-26, August 2004.
[6] V. Parameswaran, V. Ramesh, and I. Zoghlami, "Tunable Kernels for Tracking," in Proc. of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.
[7] G. Peters, "Theories of Three-Dimensional Object Perception - A Survey," Recent Research Developments in Pattern Recognition, Transworld Research Network, 2000.
[8] I. Weiss and M. Ray, "Model-Based Recognition of 3D Objects from Single Images," IEEE Trans. on PAMI, vol. 23, no. 2, pp. 116-125, 2001.
[9] J. J. Koenderink and A. J. van Doorn, "The singularities of the visual mapping," Biol. Cyber., 24:51-59, 1976.
[10] I. Shimshoni and J. Ponce, "Finite-Resolution Aspect Graphs of Polyhedral Objects," IEEE Trans. on Pattern Anal. Mach. Intell., 19(4):315-327, 1997.
[11] J. S. Hu, T. M. Su, and C. C. Lin, "Shape Memorization and Recognition of 3-D Objects Using a Similarity-Based Aspect-Graph Approach," IEEE Int'l Conf. on Systems, Man, and Cybernetics, Oct. 2006.
[12] C. M. Cyr and B. Kimia, "A Similarity-Based Aspect-Graph Approach to 3D Object Recognition," International Journal of Computer Vision, 57(1):5-22, 2004.
[13] J. S. Hu, T. M. Su, and S. C. Jeng, "Robust Background Subtraction with Shadow and Highlight Removal for Indoor Environment Surveillance," IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, Oct. 2006.
[14] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, 1986.
[15] C. Xu and J. L. Prince, "Gradient Vector Flow: A New External Force for Snakes," IEEE Conference on Computer Vision and Pattern Recognition, pp. 66-71, 1997.