## Chung Hua University Master's Thesis

### A 3D Model Retrieval System Integrating Exterior Projection and Interior Shape Features

### A 3D Model Retrieval System Based on Exterior Projection and Interior Shape Features

### Department: Master Program, Department of Computer Science and Information Engineering; Student: M09702051 張翔淵; Advisor: Dr. 石昭玲

### January 2011


### Abstract (Chinese)

With the growth of the Internet, the number of 3D models in digital libraries is increasing steadily, and a search system is needed to help people find the 3D data they want. Managing a 3D model database first requires an efficient classification and retrieval method, replacing the traditional approach of using text keywords as the search key. Retrieval that uses the content of the 3D model itself as the search key is currently the best tool for managing 3D model databases. The primary goal of this thesis is therefore to build an effective 3D model retrieval system with which a user can quickly find, in a large 3D model database, similar models that match the user's expectations.

This thesis proposes a 3D model feature extraction algorithm that integrates exterior projection and interior shape features, together with a 3D model retrieval system. The system thus has two parts: feature extraction and 3D model retrieval. Feature extraction is divided into exterior and interior features. The exterior features take angle projections as the main exterior characteristic of a 3D model (the angle descriptor) and derive similarity features from corresponding pairs of projection images (the symmetric descriptor). The interior features apply the 3D angular radial transform (3D-ART) to capture different shape characteristics of a 3D model, yielding the angle 3D-ART, radial 3D-ART, and curvature 3D-ART descriptors. The exterior and interior features are then combined to make the retrieval system more complete. In the retrieval part, the feature vectors are used to find the models in the database most similar to the user's query model and return them to the user.

Keywords: 3D model retrieval system, interior and exterior features, angle projection features, symmetric planes, 3D-ART.

### Acknowledgments

My two-plus years as a graduate student have finally come to an end. There is so much I want to say, yet I hardly know how to say it. Scenes from these two years come back to me one after another: laughter, moments that moved me, and hard work, but above all a gratitude that is hard to put into words.

First I thank my advisor, Professor 石昭玲, for her patient teaching, care, and guidance over these two years. I also thank Professors 李建興, 韓欽銓, 連振昌, and 周智勳, from whom I benefited greatly; to them I offer my sincerest thanks and respect. I am also grateful to Professors 石昭玲, 李建興, 謝朝和, 韓欽銓, and 李遠坤 for their many valuable comments at my thesis oral defense.

I thank my family and friends for helping me relax when I was most tired and for giving me the confidence to finish my studies. I also thank the senior students, classmates, and junior students of our laboratory for their help and encouragement during this period, as well as the many people in other laboratories who lent a warm hand whenever I needed it.

Thank you all!


**Abstract **

The advances in 3D data acquisition, graphics hardware, and 3D data modeling and visualization techniques have led to a proliferation of 3D models. Searching for specific 3D models has therefore become an important issue, and techniques for effective and efficient content-based 3D model retrieval have become an essential research topic. In this thesis, exterior and interior features are proposed for 3D model retrieval. The exterior features are the angle descriptor (AD) and the symmetric descriptor (SD). The interior features are the angle 3D-ART, radial 3D-ART, and curvature 3D-ART descriptors. To improve the retrieval results, the exterior and interior features are combined based on dynamic multi-descriptor fusion (DMDF). Experimental results show that the proposed methods perform well.

Keywords: 3D model retrieval, exterior and interior features, angle projection, symmetric planes, 3D-ART.

**Contents **

**Abstract ... i **

**Contents ... ii **

**List of Figures ... iv **

**List of Tables ... vii **

**Chapter 1 ... 1 **

**Introduction ... 1 **

**Chapter 2 ... 3 **

**Related Work ... 3 **

**2.1 Spatial Shape Feature Based Methods... 3 **

2.1.1 Geometric Descriptor ... 3

2.1.2 Cord-Based Descriptor ... 3

2.1.3 Surface Penetration Map Descriptor ... 3

2.1.4 Shape Histogram ... 4

2.1.5 Shape Distribution ... 4

2.1.6 Extended D2 Descriptor ... 4

2.1.7 Parameterized Statistics ... 5

2.1.8 Grid D2 Descriptor ... 6

2.1.9 Surflet-Pair-Relation Histograms ... 6

2.1.10 Shape Spectrum Descriptor ... 6

2.1.11 Combined Angle and Distance Descriptor ... 8

2.1.12 Distance Descriptor ... 8

2.1.13 Sliced Histogram... 8

2.1.14 Probability Density-Based Shape Descriptor ... 9

**2.2 Frequency Based Methods ... 10 **

2.2.1 Ray-Based Sampling with Spherical Harmonics Representation... 10

2.2.2 3D Zernike Moments ... 11

2.2.3 Rotation Descriptor ... 11

2.2.4 3D Angular Radial Transform ... 12

2.2.5 The Concrete Radialized Spherical Projection Representation ... 13

**2.3 View Based Methods ... 14 **

2.3.1 Description with 2D Silhouette Contours ... 14

2.3.2 Lightfield Descriptor ... 15

2.3.3 Principal plane analysis descriptor ... 16

2.3.4 Optimal Selection of 2D Views ... 17

2.3.5 Grid Sphere and Dodecahedron ... 17

2.3.6 Depth Information Descriptor with DFT ... 18


2.3.7 Elevation Descriptor ... 18

2.3.8 Curvature Maps Descriptor ... 19

2.3.9 Principal Plane Descriptor ... 20

2.3.10 The Spherical Trace Transform ... 21

2.3.11 2D Depth Image Descriptor ... 21

2.3.12 Shape Impact Descriptor ... 23

2.3.13 Panoramic Views ... 23

2.3.14 Spatial Structure Circular Descriptor ... 24

**Chapter 3 ... 26 **

**The Proposed Method for 3D Model Retrieval ... 26**

**3.1 Pre-Processing ... 26 **

3.1.1 3D model Normalization ... 26

3.1.2 Alignment of 3D model ... 27

**3.2 The Exterior Feature: ... 28 **

3.2.1 The Angle Projection ... 28

3.2.2 Feature Extraction by 2D-ART... 30

3.2.3 The Symmetric Projection ... 31

3.2.4 Feature Extraction by 2D-ART... 33

**3.3 The Interior Feature: ... 34 **

3.3.1 3D-ART ... 34

3.3.2 Angle 3D-ART Descriptor... 35

3.3.3 Radial 3D-ART Descriptor ... 36

3.3.4 Curvature 3D-ART Descriptor ... 37

**3.4 The Feature Combination ... 37 **

**Chapter 4 ... 39 **

**Experimental Result ... 39 **

**4.1 Experiments on the First Database, the PSB Database ... 40 **

**4.2 Experiments on the Second Database, the ESB Database... 45 **

**4.3 Experiments on the Third Database, the SHREC-W Database ... 47 **

**4.4 Experiments on the Fourth Database, the NIST Database ... 49 **

**Chapter 5 ... 56 **

**Conclusion ... 56 **

**Reference ... 57 **

**List of Figures **

Fig. 1.1 The 3D model of a stealth bomber ... 1

Fig. 2.1 The Geometric Descriptor [1] ... 3

Fig. 2.2 Shells and sectors as basic space decompositions for shape histogram [4] ... 4

Fig. 2.3 Five shape functions based on angle (A3), lengths (D1 and D2), area (D3), and volumes (D4) [5] ... 4

Fig. 2.4 The classification of point-pair distances, IN (A), OUT (B), and MIXED (C) [6] ... 5

Fig. 2.5 AD and AAD shape function from the surface based 3D shape model [9] ... 5

Fig. 2.6 Alpha multi-resolution shapes of chair [10] ... 6

Fig. 2.7 Four-dimensional of SPRH feature [13] ... 6

Fig. 2.8 The MPEG-7’s SSD of two 3D models [15] ... 7

Fig. 2.9 Elementary shapes and their corresponding shape index of MPEG-7’s SSD [15] ... 7

Fig. 2.10 Each plane equation for the x-,y-,and z-coordinates, respectively [18] ... 8

Fig. 2.11 Find intersection line between a plane equation and a mesh [18] ... 9

Fig. 2.12 (a) Examples of 100 sliced shapes for 3D model of the x-coordinate. (b) Estimate the distances from the original point to the sliced shape for the z-coordinate of 3D model. (c) A slice histogram descriptor for the x-, y-, and z-coordinate [18] ... 9

Fig. 2.13 Illustration of local surface features [19]... 10

Fig. 2.14 Density-based shape description: Measurements of the (multivariate) feature S obtained from the 3D object surface are processed into descriptor vectors, that is, the probability density function of the feature [19] ... 10

Fig. 2.15 Multi-resolution from Fourier coefficients for spherical harmonics [21] ... 11

Fig. 2.16 The Spherical Harmonic Shape Representation framework [23] ... 11

Fig. 2.17 Gaussian sphere (a) 8 sections (b) 24 sections [26] ... 12

Fig. 2.18 Real parts of 3D ART BF [27]... 12

Fig. 2.19 Real parts of 3D ART BF [27]... 12

Fig. 2.20 (a) The emanating rays and intersection of a model surface. (b) The filled points from mass center to the intersection [28] ... 14

Fig. 2.21 Sixteen views of 3D dinosaur model [29] ... 15

Fig. 2.22 The lightfield descriptor comparison of two 3D models [30] ... 15

Fig. 2.23 20 2D silhouettes of the 3D river horse model [31] ... 16

Fig. 2.24 (a) the original 3D model; (b) all points is projected onto principal plane; (c) transforming a 3D model into an adjacent triangular sequence (ATS) using the proposed triangulation process [32] ... 16

Fig. 2.25 Feature view selection process [33] ... 17

Fig. 2.26 (a) The 3D model is segmented by sphere grid. (b) Six silhouettes from the face of dodecahedron over a hemisphere [36] ... 18


Fig. 2.27 The depth buffers of a car and the 2D DFT of the six images [37] ... 18

Fig. 2.28 (a) The voxel grid of 3D tank model. (b) Six elevations of a 3D military tank model including front, top, left, right, rear, and bottom elevations [38] ... 19

Fig. 2.29 The concentric circles of an elevation [38] ... 19

Fig. 2.30 Curvature maps and three best matching tiles of two 3D objects [40, 42] ... 20

Fig. 2.31 The principal planes [43] ... 20

Fig. 2.32 3D models and their corresponding projection images [43]... 20

Fig. 2.33 The projection image segmented by several concentric circles of the 3D dragon model [43] ... 21

Fig. 2.34 The coarser voxel [44] ... 21

Fig. 2.35 The different gray-level image at the different sampled points [44]... 21

Fig. 2.36 The depth images from different vertices [45] ... 22

Fig. 2.37 The depth lines of a race car [45] ... 22

Fig. 2.38 The method of the dynamic programming distance [45] ... 22

Fig. 2.39 The SID computation [46] ... 23

Fig. 2.40 (a) Pose normalized 3D model (b) The position of the 3D model surface is captured by the unfolded cylindrical projection (c) The orientation of the 3D model surface is captured by the unfolded cylindrical projection [47] ... 24

Fig. 2.41 The 3D model comparison framework [48] ... 25

Fig. 3.1 The original and decomposed 3D stealth bomber model. (a) The 3D stealth bomber model circumscribed by a bounding box. (b) The bounding box of the 3D stealth bomber model is decomposed into a 100×100×100 voxel grid. (c) The normalized 3D stealth bomber model ... 27

Fig. 3.2 The three principal planes of 3D stealth bomber model ... 27

Fig. 3.3 Six different views of 3D model ... 28

Fig. 3.4 The angle between the direction r and the normal vector n of the intersected mesh of the 3D model ... 29

Fig. 3.5 3D stealth bomber model and its six gray-level angle projection planes. (a) The right plane I1 and the left plane I4. (b) The top plane I2 and the bottom plane I5. (c) The front plane I3 and the rear plane I6 ... 30

Fig. 3.6 The ART basis functions. (a) The real parts of the ART basis functions. (b) The imaginary parts of the ART basis functions [14] ... 31

Fig. 3.7 3D stealth bomber model and its six symmetric projection planes. (a) The difference plane I'1. (b) The difference plane I'2. (c) The difference plane I'3. (d) The sum plane I'4. (e) The sum plane I'5. (f) The sum plane I'6 ... 32

Fig. 3.8 An opaque voxel, notated Voxel(x, y, z) = 1, if there is a polygonal surface located within this voxel; otherwise, this voxel is regarded as a transparent voxel, notated Voxel(x, y, z) = 0 ... 34

Fig. 3.9 Spherical coordinates system ... 34

Fig. 3.10 Real parts of 3D ART basis function ... 35

Fig. 3.11 Gray line represents the distance from the 3D model surface to the mass center ... 36

Fig. 4.1 All of the 3D models in the Helicopter class on PSB [55] ... 43

Fig. 4.2 All testing classes of the 3D model on PSB [55] ... 44

Fig. 4.3 All of the 3D models in the Handles class on ESB [56] ... 46

Fig. 4.4 All classes of the 3D model on ESB [56] ... 47

Fig. 4.5 All of the 3D models in the Airplane class on SHREC-W [57] ... 49

Fig. 4.6 All classes of the 3D model on SHREC-W [57] ... 49

Fig. 4.5 All of the 3D model in the First class on NIST [58] ... 50

Fig. 4.6 All classes of the 3D model on NIST [58] ... 51

Fig. 4.7 Comparative precision-recall diagrams for the PSB database... 54

Fig. 4.8 Comparative precision-recall diagrams for the ESB database ... 54

Fig. 4.9 Comparative precision-recall diagrams for the SHREC-W database ... 55

Fig. 4.10 Comparative precision-recall diagrams for the NIST database ... 55


**List of Tables **

Table 4.1 The 92 testing categories in the Princeton Shape Benchmark database. |Nc| is the
number of models in a category [55] ... 41
Table 4.2 The 92 training categories in the Princeton Shape Benchmark database. |Nc| is
the number of models in a category [55] ... 42
Table 4.3 Retrieval accuracy of the proposed approach on the PSB database in terms of the recall value (%), Re, and DCG (%). Note that the number in parentheses denotes the iteration number corresponding to the best retrieval result during the 10 iterations ... 42
Table 4.4 Comparison of the proposed approach with other descriptors on the PSB database in terms of DCG (%). Note that the approaches marked with * are referenced from Akgül et al. [19] ... 43
Table 4.5 The 45 categories in the Engineering Shape Benchmark database. |Nc| is the
number of models in a category [56] ... 45
Table 4.6 Retrieval accuracy of the proposed approach on the ESB database in terms of the recall value (%), Re, and DCG (%). Note that the number in parentheses denotes the iteration number corresponding to the best retrieval result during the 10 iterations ... 46
Table 4.7 Comparison of the proposed approach with other descriptors on the ESB database in terms of DCG (%). Note that the approaches marked with * are referenced from Akgül et al. [19] ... 46
Table 4.8 The 20 categories in the SHREC Watertight database. |Nc| is the number of
models in a category [57] ... 48
Table 4.9 Retrieval accuracy of the proposed approach on the SHREC-W database in terms of the recall value (%), Re, and DCG (%). Note that the number in parentheses denotes the iteration number corresponding to the best retrieval result during the 10 iterations ... 48
Table 4.10 Comparison of the proposed approach with other descriptors on the SHREC-W database in terms of DCG (%). Note that the approaches marked with * are referenced from Akgül et al. [19] ... 48
Table 4.11 The 20 categories in the National Institute of Standards and Technology
database. |Nc| is the number of models in a category [58] ... 50
Table 4.12 Retrieval accuracy of the proposed approach on the NIST database in terms of the recall value (%), Re, and DCG (%). Note that the number in parentheses denotes the iteration number corresponding to the best retrieval result during the 10 iterations ... 50
Table 4.13 The retrieval time of the proposed method on the four different databases ... 52
Table 4.14 Retrieval accuracy of the proposed approach on the four different databases in terms of the recall value (%), Re, and DCG (%). Note that the number in parentheses denotes the iteration number corresponding to the best retrieval result during the 10 iterations ... 53

**Chapter 1 ** **Introduction **

Recent developments in advanced techniques for modeling, digitizing, and visualizing 3D models have made 3D models (see Fig. 1.1) as plentiful as images and video. Therefore, it is necessary to design a 3D model retrieval system that enables users to efficiently and effectively search for 3D models of interest. Many retrieval systems and search engines provide a keyword-based interface for multimedia data retrieval. In general, the multimedia data is annotated with appropriate keywords, typically labeled manually by experienced managers. However, differences in interpretation of the same multimedia data among different people make the annotated keywords differ from person to person. To overcome the difficulties of keyword-based retrieval, content-based retrieval has become a widely accepted research direction.

Fig. 1.1 The 3D model of a stealth bomber

The primary challenge to a content-based 3D model retrieval system is to extract proper features for searching similar 3D models effectively. In general, there are three paradigms for 3D model retrieval: Spatial shape descriptors [1-19], frequency descriptors [20-28], and view-based descriptors [29-48].

Spatial shape descriptors consider the statistical distributions or histograms of local features measured at the vertices or meshes of a 3D model. The main drawback of these spatial shape descriptors is that they do not take into account how the local features are spatially distributed over the model surface.

Frequency descriptors are extracted by mapping the 3D data into frequency domain representations. The effectiveness of these frequency descriptors relies on the quality of the voxel decomposition of a 3D model.

View-based descriptors are generally obtained by projecting a 3D model onto a number of 2D projections from different views. Discriminative features extracted from these 2D projection planes are combined to index similar 3D models. The problem is that rotation invariance has to be achieved either by pose normalization prior to the 2D projections, by extracting rotation-invariant features, or by matching 2D feature descriptors over many different alignments simultaneously.

The rest of this thesis is organized as follows. Related work is described in Chapter 2. In Chapter 3, the exterior and interior features are proposed for 3D model retrieval: the exterior features are the angle descriptor (AD) and the symmetric descriptor (SD), and the interior features are the angle 3D-ART, radial 3D-ART, and curvature 3D-ART descriptors. Chapter 4 gives experimental results that show the effectiveness of the proposed exterior and interior features. Finally, conclusions are given in Chapter 5.

**Chapter 2 ** **Related Work **

In this section, some related work for 3D model retrieval will be described. The 3D model retrieval methods are classified into three categories: spatial shape feature based methods, frequency based methods, and view based methods.

**2.1 Spatial Shape Feature Based Methods **

Spatial shape feature based methods consider the statistical distributions or histograms of local features measured at the vertices or meshes of a 3D model.

2.1.1 Geometric Descriptor

Zhang and Chen [1] proposed methods to efficiently calculate features such as area, volume, moments, and Fourier transform coefficients from mesh representation of 3D models as shown in Fig. 2.1.

Fig. 2.1 The Geometric Descriptor [1].

2.1.2 Cord-Based Descriptor

Paquet et al. [2] employed moments to describe symmetries of 3D objects, cord-based descriptors to represent shape information in fine details, and wavelet transform descriptors to describe the density distribution through a volume.

2.1.3 Surface Penetration Map Descriptor

Yu et al. [3] generated a surface penetration map in which the number of surfaces intersecting the ray emitted from the center of the sphere is counted. Features are extracted from the Fourier transform of the penetration map for retrieval or comparison purposes.

2.1.4 Shape Histogram

Ankerst et al. [4] used shape histograms for 3D model retrieval, where the space around a 3D model is partitioned into a number of cells by a collection of concentric shells and sectors, as shown in Fig. 2.2. The quadratic form distance measure is employed to compute the distance between the histogram bins.

Fig. 2.2 Shells and sectors as basic space decompositions for shape histogram [4].
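As an illustration of the shell model, the following sketch bins randomly sampled surface points into concentric shells around the centroid. The function name and the equal-width binning are illustrative choices, not the implementation of [4]:

```python
import numpy as np

def shell_histogram(points, n_shells=8):
    """Bin sampled surface points into concentric shells around the
    centroid (the shell decomposition of the shape histogram)."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)      # translation invariance
    radii = np.linalg.norm(centered, axis=1)
    r_max = radii.max()
    if r_max == 0:
        return np.zeros(n_shells)
    # equal-width shells from the centroid out to the farthest point
    hist, _ = np.histogram(radii, bins=n_shells, range=(0.0, r_max))
    return hist / len(points)                    # normalize to a distribution
```

Two such histograms can then be compared with a quadratic form distance, as the original work does.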

2.1.5 Shape Distribution

Osada et al. [5] represented each 3D model by probability distributions of geometric properties computed from a set of randomly selected points located on the surface of the model. These properties, including distance, angle, area, and volume, describe the shape distribution as shown in Fig. 2.3. Among these distributions, the most effective is D2, which measures the distribution of distances between any two randomly selected points.

Fig. 2.3 Five shape functions based on angle (A3), lengths (D1 and D2), area (D3), and volumes (D4) [5].
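The D2 distribution can be sketched as follows for a model given as a triangle mesh. The names `sample_surface` and `d2_descriptor`, the bin range, and the mean-distance normalization are illustrative simplifications, not the exact choices of [5]:

```python
import numpy as np

def sample_surface(vertices, triangles, n, rng):
    """Draw n random points on a triangle mesh, area-weighted."""
    v = np.asarray(vertices, float)
    tri = v[np.asarray(triangles)]                      # (T, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    area = 0.5 * np.linalg.norm(cross, axis=1)
    idx = rng.choice(len(tri), size=n, p=area / area.sum())
    # uniform barycentric coordinates via the square-root trick
    r1, r2 = rng.random(n), rng.random(n)
    s = np.sqrt(r1)
    a, b, c = 1 - s, s * (1 - r2), s * r2
    t = tri[idx]
    return a[:, None] * t[:, 0] + b[:, None] * t[:, 1] + c[:, None] * t[:, 2]

def d2_descriptor(vertices, triangles, n_pairs=10000, n_bins=64, seed=0):
    """D2 shape distribution: histogram of distances between random
    pairs of surface points, normalized for scale."""
    rng = np.random.default_rng(seed)
    p = sample_surface(vertices, triangles, n_pairs, rng)
    q = sample_surface(vertices, triangles, n_pairs, rng)
    d = np.linalg.norm(p - q, axis=1)
    d = d / d.mean()                                    # scale invariance
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, 4.0))
    return hist / n_pairs
```

Because only distances enter the histogram, the descriptor is invariant to rotation and translation by construction.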

2.1.6 Extended D2 Descriptor

Ip et al. [6, 7] proposed a modified D2 descriptor in which the D2 distance is classified as IN, OUT, or MIXED depending on whether the line segment connecting the two points lies inside or outside the model. The dissimilarity measure is a weighted distance of the D2, IN, OUT, and MIXED distributions, as shown in Fig. 2.4. However, it is difficult to perform this classification if a 3D model is represented by polygon meshes.

Fig. 2.4 The classification of point-pair distances, IN (A), OUT (B), and MIXED (C) [6].

2.1.7 Parameterized Statistics

Ohbuchi et al. [8, 9] combined the absolute angle-distance histogram (AAD) with the D2 descriptor for 3D model retrieval. AAD measures the distribution of angles between the normal vectors of the two surfaces on which the two randomly selected points are located, as shown in Fig. 2.5. In their experimental results, AAD outperforms D2 at roughly 1.5 times the computational cost.

Ohbuchi et al. [10, 11] used a multi-resolution representation of shapes to represent a 3D model as shown in Fig. 2.6. The AAD derived from each of the resolution level is used as the shape descriptor.

Fig. 2.5 AD and AAD shape function from the surface based 3D shape model [9].


Fig. 2.6 Alpha multi-resolution shapes of chair [10].
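The angle statistic underlying AAD can be illustrated as below. This sketch histograms only the absolute angle between the normals of two area-weighted random triangles and omits the distance dimension of the full AAD of [8, 9]; the function name is illustrative:

```python
import numpy as np

def angle_histogram(vertices, triangles, n_pairs=10000, n_bins=32, seed=0):
    """Histogram of absolute angles between the normals of two
    area-weighted random triangles (the angle part of AAD)."""
    rng = np.random.default_rng(seed)
    v = np.asarray(vertices, float)
    tri = v[np.asarray(triangles)]
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    area = 0.5 * np.linalg.norm(cross, axis=1)
    normals = cross / np.linalg.norm(cross, axis=1, keepdims=True)
    p = area / area.sum()
    i = rng.choice(len(tri), size=n_pairs, p=p)
    j = rng.choice(len(tri), size=n_pairs, p=p)
    # absolute angle: the orientation (sign) of the normals is ignored
    cos = np.abs(np.sum(normals[i] * normals[j], axis=1)).clip(0, 1)
    ang = np.arccos(cos)                          # in [0, pi/2]
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi / 2))
    return hist / n_pairs
```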

2.1.8 Grid D2 Descriptor

Shih et al. [12] proposed a new descriptor, grid D2 (GD2), to alleviate this problem. In GD2, a 3D model is first decomposed into a voxel grid, and the random sampling is performed on voxels that contain polygonal surfaces rather than on random surface points.

2.1.9 Surflet-Pair-Relation Histograms

Wahl et al. [13] proposed a statistical representation of 3D models based on a novel four-dimensional feature. The intrinsic geometric relation of an oriented surface-point pair is calculated as the feature. These features represent both local and global characteristics of the 3D model surface.

Fig. 2.7 Four-dimensional of SPRH feature [13].

2.1.10 Shape Spectrum Descriptor

The MPEG-7 standard defines the shape spectrum descriptor (SSD) [14] for 3D model retrieval. SSD represents the histogram of the curvatures of all points on the 3D surface. The advantages of SSD are that it can match two 3D models without first aligning them and that it is robust to the tessellation of the 3D polygonal model. See Fig. 2.8 and Fig. 2.9.

Fig. 2.8 The MPEG-7’s SSD of two 3D models [15].

Fig. 2.9 Elementary shapes and their corresponding shape index of MPEG-7’s SSD [15].

2.1.11 Combined Angle and Distance Descriptor

Reisert and Burkhardt [16] exploited various further improvements of the D2 and alpha/distance (AD) descriptors. They showed that small and compact representations obtained by group integration can lead to reliable and informative descriptions of objects. Seven improved descriptors are described in their paper: SHT distance histograms (SD), extended SHT distance histograms (SDE), beta/distance histograms (BD), alpha/beta histograms (AB), alpha/beta/distance histograms (ABD), alpha/beta/distance SHT histograms (ABSD), and alpha/beta/distance extended SHT histograms (ABSDE).

2.1.12 Distance Descriptor

Vranic and Saupe proposed a modified PCA that is based not only on the collection of vertex vectors but also accounts for the corresponding triangle areas as weighting factors [17]. After alignment, rays are cast in the directions of the 20 vertices of a dodecahedron, and the distances to the farthest intersections are used as the feature.

2.1.13 Sliced Histogram

Park et al. [18] proposed a 3D model retrieval system that uses principal component analysis (PCA) to normalize all models. Histograms of the 2D images sliced along the x-, y-, and z-axes are used as the shape descriptor for measuring similarity between 3D models. For each 3D model, one hundred planes orthogonal to each of the x-, y-, and z-axes define the sliced shapes, i.e., the 2D cross-sections that intersect the 3D model. After the principal axes are set by PCA, the slices are computed along each axis, and the resulting histograms are used to match the query model against the database.

Fig. 2.10 Each plane equation for the x-,y-,and z-coordinates, respectively [18].

Fig. 2.11 Find intersection line between a plane equation and a mesh [18].


Fig. 2.12 (a) Examples of 100 sliced shapes for 3D model of the x-coordinate. (b) Estimate the distances from the original point to the sliced shape for the z-coordinate of 3D model.

(c) A slice histogram descriptor for the x-, y-, and z-coordinate [18].
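A much-simplified stand-in for the sliced shapes can be written for a binary voxel grid by counting occupied voxels per slice along each axis. This keeps only the per-slice occupancy and omits the 2D contour histograms of [18]; the function name is illustrative:

```python
import numpy as np

def sliced_counts(voxels):
    """Per-slice occupancy counts of a binary voxel grid along the
    x, y, and z axes (a crude analogue of sliced-shape histograms)."""
    v = np.asarray(voxels, bool)
    return [v.sum(axis=(1, 2)),   # one count per x-slice
            v.sum(axis=(0, 2)),   # one count per y-slice
            v.sum(axis=(0, 1))]   # one count per z-slice
```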

2.1.14 Probability Density-Based Shape Descriptor

Akgül et al. [19] proposed content-based 3D model retrieval using a probabilistic generative description of local shape properties. The shape description framework characterizes a 3D model with sampled multivariate probability density functions of its local surface features. The density-based descriptor is efficiently computed via kernel density estimation (KDE) coupled with the fast Gauss transform. The density-based characterization provides invariance properties useful for shape matching.


Fig. 2.13 Illustration of local surface features [19].

Fig. 2.14 Density-based shape description: Measurements of the (multivariate) feature S obtained from the 3D object surface are processed into descriptor vectors, that is, the probability density function of the feature [19].

**2.2 Frequency Based Methods **

Frequency based methods extract features by mapping the 3D data into frequency domain representations. The effectiveness of these frequency descriptors relies on the quality of the voxel decomposition of a 3D model.

2.2.1 Ray-Based Sampling with Spherical Harmonics Representation

Vranic et al. [20, 21] applied the Fourier transform on the sphere with spherical harmonics to generate embedded multi-resolution 3D shape feature vectors, as shown in Fig. 2.15. This method requires pose normalization to be rotation invariant. A modified rotation-invariant shape descriptor based on spherical harmonics without pose normalization has been proposed by Funkhouser et al. [22-24], as shown in Fig. 2.16. First, a 3D model is decomposed into a collection of spherical functions derived by intersecting the model with a set of concentric spheres of different radii. Each spherical function is decomposed into a number of harmonics of different frequencies. The sum of the norms of all frequency components at the same radius is regarded as the shape descriptor of a spherical function, and the descriptors of all spherical functions constitute the shape descriptor of the 3D model. The descriptor is rotation invariant because rotating a spherical function does not change the energy in each frequency component.

Fig. 2.15 Multi-resolution from Fourier coefficients for spherical harmonics [21].

Fig. 2.16 The Spherical Harmonic Shape Representation framework [23].

2.2.2 3D Zernike Moments

Novotni and Klein [25] used 3D Zernike moments, which is naturally an extension of spherical harmonics based descriptors, for 3D shape retrieval. These moments are computed as a projection of the function within the unit ball. The benefits of 3D Zernike moments are that they are rotation invariant and less sensitive to geometric and topological artifacts.

2.2.3 Rotation Descriptor

Pan et al. [26] proposed a rotation-based 3D shape descriptor. For a 3D model, N random points are sampled uniformly on the surface. The model is rotated T times with T groups of rotation angles to obtain T Gaussian spheres. Each spherical surface is segmented into 8 sections by the x-y, y-z, and x-z planes, as shown in Fig. 2.17(a), and the number of normal vectors in each section is counted. Based on these 8 sections, the spherical surface can be further segmented into 24 sections, as shown in Fig. 2.17(b), and the number of normal vectors in each sub-section is counted again. These 8-dimensional and 24-dimensional vectors are used as the feature vector for each group.

Fig. 2.17 Gaussian sphere (a) 8 sections (b) 24 sections [26].
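The 8-section counting step can be sketched directly from the sign pattern of each normal vector; the function name is illustrative, and zero components are counted as positive here by convention:

```python
import numpy as np

def octant_counts(normals):
    """Count normal vectors falling into each of the 8 octants of the
    Gaussian sphere (the 8-section feature of Pan et al. [26])."""
    n = np.asarray(normals, float)
    # map the sign pattern of (x, y, z) to an octant index 0..7;
    # a zero component is treated as positive
    bits = (n >= 0).astype(int)
    idx = bits[:, 0] * 4 + bits[:, 1] * 2 + bits[:, 2]
    return np.bincount(idx, minlength=8)
```

The 24-section variant of [26] further subdivides each octant; the same bincount pattern applies with a finer index.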

2.2.4 3D Angular Radial Transform

Ricard et al. [27] presented a 3D shape descriptor, the 3D Angular Radial Transform (3D-ART) for 3D model retrieval. First, the 3D models are represented in spherical coordinates. Next, a Principal Components Analysis (PCA) is applied to align the 3D models along the z-axis. Then, the 3D extension of MPEG-7’s ART [14] is applied to extract feature vectors.

Fig. 2.18 Real parts of 3D ART BF [27].

Fig. 2.19 Real parts of 3D ART BF [27].

2.2.5 The Concrete Radialized Spherical Projection Representation

Papadakis et al. [28] proposed a 3D model retrieval system based on spherical harmonics, adopting two shape descriptors. The 3D model is aligned either by continuous principal component analysis (CPCA) or by a modified principal component analysis (NPCA): CPCA uses the point coordinates to align models, whereas NPCA uses the unit normal vectors of the meshes. The model's surface is intersected with rays emanating from the mass center of the 3D model (see Fig. 2.20 (a)), and between each intersection and the mass center the model is filled with equidistant points (see Fig. 2.20 (b)). The spherical harmonics (SH) transform is then applied to the filled 3D model to extract two feature vectors, one based on the CPCA alignment and one based on the NPCA alignment.



Fig. 2.20 (a) The emanating rays and intersection of a model surface. (b) The filled points from mass center to the intersection [28].
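The pose-normalization step common to PCA-based alignment methods can be sketched as follows. This is plain point-based PCA, not the continuous formulation of CPCA or the normals-based NPCA of [28]:

```python
import numpy as np

def pca_align(points):
    """Pose-normalize a point set: translate the centroid to the origin
    and rotate so the principal axes coincide with x, y, z, with the
    largest-variance axis first."""
    p = np.asarray(points, float)
    centered = p - p.mean(axis=0)
    cov = np.cov(centered.T)
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    rot = eigvec[:, ::-1]                    # largest-variance axis first
    return centered @ rot
```

A sign ambiguity remains in each principal axis; practical systems resolve it with additional rules (e.g., moment-based sign conventions).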

**2.3 View Based Methods **

View-based descriptors are generally obtained by projecting a 3D model on a number of 2D projections from different views. Discriminative features extracted from these 2D projection planes are combined to index similar 3D models. The problem is that rotation invariance has to be solved by either pose normalization prior to 2D projections, by extracting rotation-invariant features, or by matching 2D feature descriptors over many different alignments simultaneously.

2.3.1 Description with 2D Silhouette Contours

Super and Lu [29] exploited the curvature scale space to partition the silhouette contour of a 3D model into overlapping local parts at all scales, as shown in Fig. 2.21. The contour scale space is then used to classify the parts into types, and 3D model matching is achieved by comparing the part-type histograms.

Fig. 2.21 Sixteen views of 3D dinosaur model [29].

2.3.2 Lightfield Descriptor

Chen et al. [30] introduced the lightfield descriptor to represent 3D models, as shown in Fig. 2.22 and Fig. 2.23. The lightfield descriptor is computed from ten silhouettes, each represented by a 2D binary image. Zernike moments and Fourier descriptors are employed to describe each binary image. Since a 3D model may be rotated or deformed, the number of 2D silhouettes must be large enough to represent the model; on the other hand, the retrieval time increases as the number of silhouettes increases.

Fig. 2.22 The lightfield descriptor comparison of two 3D models [30].


Fig. 2.23 20 2D silhouettes of the 3D river horse model [31].

2.3.3 Principal Plane Analysis Descriptor

Kuo and Cheng [32] proposed a 3D shape retrieval system based on principal plane analysis. First, by projecting the 3D model onto its principal plane as shown in Fig. 2.24 (b), the model is transformed into a 2D binary image. The convex hull of the binary shape image is then segmented into a number of disjoint triangles, as shown in Fig. 2.24 (c). For each triangle, a projection score histogram and some moments are extracted as the feature vectors. However, a single 2D binary shape cannot represent a complex 3D model well.

Fig. 2.24 (a) the original 3D model; (b) all points is projected onto principal plane; (c) transforming a 3D model into an adjacent triangular sequence (ATS) using the proposed triangulation process [32].
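The projection onto the principal plane can be sketched with PCA as below. This covers only the first step (producing the binary image), not the triangulation of [32], and the function name and raster size are illustrative:

```python
import numpy as np

def principal_plane_image(points, size=64):
    """Project a 3D point set onto its principal plane (spanned by the
    two largest-variance principal axes) and rasterize the result to a
    binary image."""
    p = np.asarray(points, float)
    c = p - p.mean(axis=0)
    _, eigvec = np.linalg.eigh(np.cov(c.T))      # ascending eigenvalues
    uv = c @ eigvec[:, ::-1][:, :2]              # coords in the principal plane
    span = np.abs(uv).max()
    if span == 0:
        span = 1.0
    # map plane coordinates from [-span, span] to pixel indices
    pix = np.clip(((uv / span + 1) / 2 * (size - 1)).round().astype(int),
                  0, size - 1)
    img = np.zeros((size, size), dtype=np.uint8)
    img[pix[:, 1], pix[:, 0]] = 1
    return img
```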

2.3.4 Optimal Selection of 2D Views

Ansary et al. [33-35] provided an "optimal" selection of 2D views to represent a 3D model. To generate the initial set of views, 320 2D views are projected from multiple viewpoints equally distributed on the unit sphere. Based on the Zernike moment descriptor, 49 coefficients are used to represent these 2D views. Using the Bayesian Information Criterion (BIC), 40 representative views are extracted as the feature vector for 3D model retrieval, as shown in Fig. 2.25.

Fig. 2.25 Feature view selection process [33].

2.3.5 Grid Sphere and Dodecahedron

Shih et al. [36] proposed two features, the grid sphere descriptor (GSD) and the dodecahedral silhouette descriptor (DSD), to describe the inside and outside information of a 3D model. For GSD, a 3D model is segmented by a 32 × 64 × 64 sphere grid: there are 32 shells, and each shell is segmented by a 64 × 64 grid, as shown in Fig. 2.26 (a). For each shell, the number of valid grids is counted to obtain the GSD. For DSD, six silhouettes are rendered from the faces of a dodecahedron over a hemisphere, each represented by a 2D binary image as shown in Fig. 2.26 (b), and the angular radial transform (ART) is extracted from each silhouette as the feature vector. The two features, GSD and DSD, can be combined for 3D model retrieval.


Fig. 2.26 (a) The 3D model is segmented by sphere grid. (b) Six silhouettes from the face of dodecahedron over a hemisphere [36].

In fact, 2D silhouettes represented by binary images cannot describe the altitude information of a 3D model from different views. Thus some authors [37-41] describe altitude information with gray-level images.

2.3.6 Depth Information Descriptor with DFT

Vranic et al. [37] proposed the depth buffer descriptor. Six gray-scale images are rendered using parallel projection. The six images are transformed using the discrete Fourier transform (DFT), as shown in Fig. 2.27. The magnitudes of the first k low-frequency DFT coefficients are used as the depth buffer feature vector.

Fig. 2.27 The depth buffers of a car and the 2D DFT of the six images [37].

2.3.7 Elevation Descriptor

Shih et al. [38, 39] proposed the elevation descriptor for 3D model retrieval. The elevation descriptor is invariant to translation and scaling of 3D models and is robust to rotation. First, a 3D model is represented by six gray-level images which describe the altitude information of the model from six different views, including front, left, right, rear, top and bottom, in Fig. 2.28 (b). Each gray-level image, called an elevation, is decomposed into several concentric circles in Fig. 2.29. The sum of the altitude information within each concentric circle is then calculated. To be less sensitive to rotations, the elevation descriptor is obtained by taking the difference between the altitude sums of two successive concentric circles. Since there are six elevations for each 3D model, an efficient similarity matching method is provided to find the best match for an input model.

Fig. 2.28 (a) The voxel grid of 3D tank model. (b) Six elevations of a 3D military tank model including front, top, left, right, rear, and bottom elevations [38].

Fig. 2.29 The concentric circles of an elevation [38].

2.3.8 Curvature Maps Descriptor

Assfalg et al. [40-42] provided content-based 3D model retrieval through curvature maps. The geometric structure of a 3D object is described through the following steps: 1. smoothing and polygon simplification, 2. curvature estimation, 3. deformation, 4. curvature mapping. In this approach, information about curvature maps is captured as: 1. tiles obtained by a uniform tessellation of the map, 2. homogeneous regions obtained by segmenting the map. Histograms are used to capture global properties of map tiles, as shown in Fig. 2.30. Weighted walkthroughs are then computed to describe the spatial arrangement and local properties of regions on the map.


Fig. 2.30 Curvature maps and three best matching tiles of two 3D objects [40, 42].

2.3.9 Principal Plane Descriptor

Shih et al. [43] proposed a 3D model retrieval approach based on the principal plane descriptor. First, a 3D model is transformed into a 2D binary image by projecting it onto the principal plane (see Fig. 2.31), the symmetric surface of a 3D model [32]. Moreover, to represent a 3D model more exactly, the second and third planes (see Fig. 2.32) are also calculated to obtain two more binary projection images. Each binary image is decomposed into several concentric circles in Fig. 2.33. The sum of the altitude information within each concentric circle is then calculated. To be less sensitive to rotations, the principal plane descriptor is obtained by taking the difference between the altitude sums of two successive concentric circles. Since there are three binary images for each 3D model, an efficient similarity matching method is provided to find the best match for an input model.

Fig. 2.31 The principal planes [43].

Fig. 2.32 3D models and their corresponding projection images [43].

Fig. 2.33 The projection image segmented by several concentric circles of the 3D dragon model [43].

2.3.10 The Spherical Trace Transform

Dimitrios Zarpalas et al. [44] proposed a novel methodology for content-based 3D model search and retrieval. A 3D model is decomposed into 128×128×128 voxels. A coarser voxel is constructed by combining every eight (2×2×2) neighboring voxels; thus, a 3D model is integrated into 64×64×64 coarser voxels (see Fig. 2.34). The 12 vertices of an icosahedron are used as the sampled points. Twelve planes tangential to the icosahedron can be obtained based on these sampled points (see Fig. 2.35). Each plane is a gray-level image whose values (0~8) represent the number of voxels in a coarser voxel.

There are 20 icosahedrons with different radii. Therefore, a 3D model is represented by 240 (12×20) gray-level images. A set of feature vectors can be extracted from these 240 gray-level images. Further, weights are assigned to these descriptors to improve the retrieval result.

Fig. 2.34 The coarser voxel [44].

Fig. 2.35 The different gray-level image at the different sampled points [44].

2.3.11 2D Depth Image Descriptor

Mohamed Chaouch et al. [45] proposed a 3D model retrieval system based on 20 depth images rendered from the vertices of a regular dodecahedron (see Fig. 2.36). For each depth image, each row (depth line) is encoded into a sequence of five characters, o, c, /, -, \, representing exterior background, interior background, increased depth, constant depth and decreased depth (see Fig. 2.37). The depth sequence information provides a more accurate description of 3D shape boundaries than other 2D shape descriptions. The dynamic programming distance (DPD) is used to compare the depth line descriptors (see Fig. 2.38).

Fig. 2.36 The depth images from different vertices [45].

Fig. 2.37 The depth lines of a race car [45].

Fig. 2.38 The method of the dynamic programming distance [45].
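The depth-line encoding can be sketched as below. This is a simplified illustration, not the exact state machine of [45]: the function name `encode_depth_line` is hypothetical, and the rule that background runs outside the object's extent count as exterior is an assumption.

```python
def encode_depth_line(depths, background=0):
    """Simplified sketch of the depth-line encoding of [45]: background
    outside the object's extent is 'o', background between object pixels
    is 'c', and each step between consecutive object depths is '/' (rise),
    '-' (constant) or '\\' (fall)."""
    obj = [i for i, d in enumerate(depths) if d != background]
    if not obj:
        return "o" * len(depths)          # empty row: all exterior background
    first, last = obj[0], obj[-1]
    out = []
    prev = None
    for i, d in enumerate(depths):
        if d == background:
            out.append("o" if i < first or i > last else "c")
            prev = None                   # restart depth comparison after a gap
        else:
            if prev is None or d == prev:
                out.append("-")           # constant depth (or first object pixel)
            elif d > prev:
                out.append("/")           # increased depth
            else:
                out.append("\\")          # decreased depth
            prev = d
    return "".join(out)
```

Two such strings would then be compared with the dynamic programming distance described in the paper.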

2.3.12 Shape Impact Descriptor

Athanasios Mademlis et al. [46] proposed the 3D shape impact descriptor (SID) based on the physical laws of gravity. This approach offers native invariance with respect to rotation and translation of the object. The SID combines both local and global features which are robust against object degeneracies. Two instances of the impact descriptor are proposed: the Newtonian impact descriptor (NID) and the relativistic impact descriptor (RID). The NID is computed in the surrounding area of the 3D object, which is common to view-based approaches, while the RID captures the internal structure of the 3D object, which is not taken into account by view-based approaches.

Fig. 2.39 The SID computation [46].

2.3.13 Panoramic Views

Papadakis et al. [47] proposed a 3D model retrieval system based on panoramic views. The panoramic views capture the global shape of the 3D model. The object is projected onto three perpendicular cylinders, which are aligned with the three principal axes (see Fig. 2.40 (a)). The cylindrical projections capture the position and orientation of the 3D model's surface (see Fig. 2.40 (b) and (c)). For each projection, the 2D discrete Fourier transform and the 2D discrete wavelet transform are applied to extract feature vectors.


Fig. 2.40 (a) Pose-normalized 3D model. (b) The position of the 3D model surface is captured by the unfolded cylindrical projection. (c) The orientation of the 3D model surface is captured by the unfolded cylindrical projection [47].

2.3.14 Spatial Structure Circular Descriptor

Yue Gao et al. [48] proposed the spatial structure circular descriptor (SSCD) for 3D model retrieval (see Fig. 2.41). The SSCD describes the global spatial structure of a 3D model by 2D images whose pixel attribute values represent 3D spatial information. The SSCD is invariant to rotation and scaling. First, an SSCD is generated by projecting the model onto its minimal bounding sphere. Each SSCD is represented by several SSCD images. The SSCD images are then decomposed into concentric shells and sectors around the center of the circle. Finally, a histogram based on the attribute values of the pixels in each block is used as the feature vector for 3D model retrieval. A weighted bipartite graph is constructed to match two SSCDs, and the Kuhn-Munkres algorithm is used to measure the similarity of two 3D models.

Fig. 2.41 The 3D model comparison framework [48].


**Chapter 3 **

**The Proposed Method for 3D Model Retrieval **

In this chapter, the interior and exterior features of 3D models will be proposed for 3D model retrieval. The exterior features include the angle descriptor and the symmetric descriptor. The interior features are the angle 3D-ART, radial 3D-ART and curvature 3D-ART descriptors. To obtain a better retrieval result, dynamic multi-descriptor fusion (DMDF) [59] is used to combine these interior and exterior features.

**3.1 Pre-Processing **

Before extracting the features, 3D models are first normalized using voxel grids and aligned by their principal planes [43]. The pre-processing steps are as follows.

**3.1.1 3D model Normalization **

Initially, the smallest bounding box circumscribing the 3D model is constructed (see Fig. 3.1(a)). The bounding box is then decomposed into 100×100×100 voxel grids (see Fig. 3.1(b)). A voxel located at coordinate (x, y, z) is regarded as an opaque voxel, notated Voxel(x, y, z) = 1, if there is a polygonal surface located within this voxel; otherwise, this voxel is regarded as a transparent voxel, notated Voxel(x, y, z) = 0. Secondly, the model's mass center is moved to location (50, 50, 50). The 3D model is then scaled such that the average distance from all opaque voxels to the mass center becomes 32. Thus, the 3D model will be invariant to translation and scaling, as shown in Fig. 3.1(c).

Fig. 3.1 The original and decomposed 3D stealth bomber model. (a) The 3D stealth bomber model circumscribed by a bounding box. (b) The bounding box of the 3D stealth bomber model is decomposed into a 100×100×100 voxel grid. (c) The normalized 3D stealth bomber model.
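The translation and scaling normalization can be sketched as follows, assuming the model is already given as a boolean numpy occupancy grid; the helper name `normalize_voxels` is illustrative, not part of the thesis.

```python
import numpy as np

def normalize_voxels(opaque, grid=100, target_mean_dist=32.0):
    """Translate the opaque voxels so their mass center sits at the grid
    center (50, 50, 50 for a 100^3 grid), then scale so the average
    distance from opaque voxels to the mass center becomes 32."""
    coords = np.argwhere(opaque)                       # (N, 3) opaque voxel coords
    center = coords.mean(axis=0)                       # mass center
    shifted = coords - center                          # move mass center to origin
    mean_dist = np.linalg.norm(shifted, axis=1).mean()
    scaled = shifted * (target_mean_dist / mean_dist)  # mean distance -> 32
    # re-quantize into the voxel grid, centered at (grid/2, grid/2, grid/2)
    new_coords = np.clip(np.round(scaled + grid / 2).astype(int), 0, grid - 1)
    out = np.zeros((grid,) * 3, dtype=bool)
    out[tuple(new_coords.T)] = True
    return out
```

Re-quantizing after scaling loses sub-voxel precision, which is acceptable here since the descriptors are computed on the voxel grid anyway.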

**3.1.2 Alignment of 3D model **

In this section, the 3D model is aligned by its principal planes [43]. The principal planes are extracted based on principal component analysis (PCA). After the covariance matrix of all opaque voxels of the 3D model is found, its eigenvalues and eigenvectors are computed. The three eigenvectors corresponding to the eigenvalues in decreasing order are taken as the unit normal vectors of the three principal planes (see Fig. 3.2).

Fig. 3.2 The three principal planes of 3D stealth bomber model.
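The eigen-decomposition step can be sketched as below, again assuming a boolean occupancy grid; `principal_axes` is a hypothetical helper name.

```python
import numpy as np

def principal_axes(opaque):
    """Return the three unit normal vectors of the principal planes:
    eigenvectors of the covariance matrix of the opaque-voxel
    coordinates, ordered by decreasing eigenvalue."""
    coords = np.argwhere(opaque).astype(float)
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered.T)                 # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to decreasing eigenvalues
    return eigvecs[:, order].T               # rows are the three unit normals
```

Note that eigenvector signs are arbitrary, so a full alignment step would still need a sign-disambiguation convention on top of this.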


**3.2 The Exterior Feature **

The angle descriptor is proposed as the exterior feature of a 3D model, using the angle information on the surface of the model. First, a 3D model is represented by six gray-level projection planes which describe the angle information of the model from six different views. For each gray-level image, the 2D angular radial transform (ART) [14] is used to extract the feature vector. The exterior feature represents only the surface information of a 3D model.

**3.2.1 The Angle Projection **

To capture the orientation of the model's surface, for each voxel we calculate the intersections of the surface with the ray in each direction r = (x, y, z) and measure the angle between the ray and the normal vector n of the intersected mesh (see Fig. 3.4). The cosine of the angle between r and the normal vector n is defined as:

$$Ang(x, y, z) = \frac{|\mathbf{n}^{T}\mathbf{r}|}{|\mathbf{n}|\,|\mathbf{r}|} \times 255. \qquad (3.1)$$

Let the six projection planes be notated as $I_k$, $k = 1, 2, \ldots, 6$. Then, the gray value of each pixel on these images (see Fig. 3.5) is defined as follows:

Fig. 3.3 Six different views of 3D model.

Fig. 3.4 The angle between the direction r and the normal vector n of the intersected mesh of the 3D model.

$$
\begin{aligned}
I_1(x, y) &= Ang(x, y, z_{\max}(x, y)), && \text{for } -50 \le x, y \le 50,\\
I_2(x, z) &= Ang(x, y_{\max}(x, z), z), && \text{for } -50 \le x, z \le 50,\\
I_3(y, z) &= Ang(x_{\max}(y, z), y, z), && \text{for } -50 \le y, z \le 50,\\
I_4(x, y) &= Ang(x, y, z_{\min}(x, y)), && \text{for } -50 \le x, y \le 50,\\
I_5(x, z) &= Ang(x, y_{\min}(x, z), z), && \text{for } -50 \le x, z \le 50,\\
I_6(y, z) &= Ang(x_{\min}(y, z), y, z), && \text{for } -50 \le y, z \le 50,
\end{aligned}
$$

where

$$
\begin{aligned}
z_{\max}(x, y) &= \max_{1 \le z \le 50}\big(z \cdot Voxel(x, y, z)\big), & z_{\min}(x, y) &= \min_{-50 \le z \le -1}\big(z \cdot Voxel(x, y, z)\big),\\
y_{\max}(x, z) &= \max_{1 \le y \le 50}\big(y \cdot Voxel(x, y, z)\big), & y_{\min}(x, z) &= \min_{-50 \le y \le -1}\big(y \cdot Voxel(x, y, z)\big),\\
x_{\max}(y, z) &= \max_{1 \le x \le 50}\big(x \cdot Voxel(x, y, z)\big), & x_{\min}(y, z) &= \min_{-50 \le x \le -1}\big(x \cdot Voxel(x, y, z)\big).
\end{aligned}
$$



Fig. 3.5 3D stealth bomber model and its six gray-level angle projection planes. (a) The right plane I1 and the left plane I4. (b) The top plane I2 and the bottom plane I5. (c) The front plane I3 and the rear plane I6.
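One of these planes (the top view, i.e. the z_max case) can be sketched as below. This assumes a boolean occupancy grid plus an array holding a unit surface normal per opaque voxel; the helper name `angle_projection_top` and the fixed view direction are illustrative only.

```python
import numpy as np

def angle_projection_top(opaque, normals, view_dir=np.array([0.0, 0.0, 1.0])):
    """Gray-level angle projection seen from the +z direction: for each
    (x, y) column, find the topmost opaque voxel and record 255 * |cos|
    of the angle between the view ray and that voxel's unit normal.
    `normals` has shape (n, n, n, 3), zeros at transparent voxels."""
    n = opaque.shape[0]
    image = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            zs = np.flatnonzero(opaque[x, y, :])
            if zs.size == 0:
                continue                    # background column stays 0
            z = zs[-1]                      # topmost opaque voxel (z_max)
            nv = normals[x, y, z]
            image[x, y] = 255.0 * abs(nv @ view_dir)  # Eq. (3.1), unit vectors
    return image
```

The other five planes follow by swapping the scan axis and taking the first or last opaque voxel along it.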

**3.2.2 Feature Extraction by 2D-ART **

MPEG-7's angular radial transform (ART) is an orthogonal unitary transform. ART consists of a complete set of orthonormal sinusoidal basis functions defined on a unit disk in the polar coordinate system. Fig. 3.6 shows a set of ART basis functions. Let $f_k^P(\rho, \theta)$ denote the gray-level value located at $(\rho, \theta)$ on the k-th angle projection plane $I_k$. The corresponding ART coefficient $F_k^P(n, m)$ is defined as:

$$F_k^P(n, m) = \big\langle V_{n,m}(\rho, \theta),\, f_k^P(\rho, \theta)\big\rangle = \int_0^{2\pi}\!\!\int_0^1 V_{n,m}^{*}(\rho, \theta)\, f_k^P(\rho, \theta)\, \rho\, d\rho\, d\theta,$$

where $F_k^P(n, m)$ is the ART coefficient of order $n$ and $m$, and $V_{n,m}(\rho, \theta)$ is the ART basis function, which is separable along the angular and radial directions:

$$V_{n,m}(\rho, \theta) = A_m(\theta)\, R_n(\rho).$$

The angular basis function is defined as:

$$A_m(\theta) = \frac{1}{2\pi} \exp(jm\theta),$$

and the radial basis function is defined as:

$$R_n(\rho) = \begin{cases} 1, & n = 0 \\ 2\cos(\pi n \rho), & n \neq 0. \end{cases}$$


The ART descriptor is formed by the magnitudes of all complex ART coefficients. The default ART descriptor consists of 36 coefficients, $F_k^P(n, m)$, for $0 \le n \le 2$ and $0 \le m \le 11$. In summary, the angle descriptor is defined as:

$$\mathbf{x}^P = [(\mathbf{x}_1^P)^T, (\mathbf{x}_2^P)^T, \ldots, (\mathbf{x}_6^P)^T]^T,$$

where $\mathbf{x}_k^P$, $1 \le k \le 6$, is the ART feature vector extracted from the k-th projection plane:

$$\mathbf{x}_k^P = [x_k^P(1), x_k^P(2), \ldots, x_k^P(36)]^T = [|F_k^P(0,0)|, \ldots, |F_k^P(0,11)|, |F_k^P(1,0)|, \ldots, |F_k^P(1,11)|, |F_k^P(2,0)|, \ldots, |F_k^P(2,11)|]^T.$$

Fig. 3.6 The ART basis functions. (a) The real parts of the ART basis functions. (b) The imaginary parts of the ART basis functions [14].

Let $\mathbf{x}^P = [(\mathbf{x}_1^P)^T, (\mathbf{x}_2^P)^T, \ldots, (\mathbf{x}_6^P)^T]^T$ and $\mathbf{y}^P = [(\mathbf{y}_1^P)^T, (\mathbf{y}_2^P)^T, \ldots, (\mathbf{y}_6^P)^T]^T$ denote the angle descriptors of a query model and the t-th matching model in the database, respectively. The distance between the query model and the t-th matching model is defined as follows:

$$Dis_{q,t}^P = \sum_{k=1}^{6} |\mathbf{x}_k^P - \mathbf{y}_k^P| = \sum_{k=1}^{6}\sum_{i=1}^{36} |x_k^P(i) - y_k^P(i)|.$$
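A numerical sketch of the coefficient extraction and L1 matching steps is given below, assuming a square gray-level image sampled over its inscribed unit disk; `art_coefficients` and `angle_distance` are illustrative helper names, not the thesis implementation.

```python
import numpy as np

def art_coefficients(image, n_max=3, m_max=12):
    """Discrete approximation of the 2D ART coefficient magnitudes of a
    square gray-level image, over the unit disk inscribed in the image."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    u = (2 * xs + 1) / w - 1            # pixel centers mapped to [-1, 1]
    v = (2 * ys + 1) / h - 1
    rho = np.hypot(u, v)
    theta = np.arctan2(v, u) % (2 * np.pi)
    inside = rho <= 1.0
    darea = (2.0 / w) * (2.0 / h)       # pixel area; dA = rho drho dtheta
    coeffs = np.zeros((n_max, m_max), dtype=complex)
    for n in range(n_max):
        radial = np.ones_like(rho) if n == 0 else 2 * np.cos(np.pi * n * rho)
        for m in range(m_max):
            basis = np.exp(1j * m * theta) / (2 * np.pi) * radial
            # F(n, m) = integral of V*(rho, theta) f(rho, theta) rho drho dtheta;
            # in Cartesian pixels the Jacobian rho is absorbed into dA.
            coeffs[n, m] = np.sum(np.conj(basis[inside]) * image[inside]) * darea
    return np.abs(coeffs).ravel()       # 36 magnitudes per plane

def angle_distance(x_planes, y_planes):
    """L1 distance between two descriptors, each a list of six 36-d vectors."""
    return sum(np.abs(xk - yk).sum() for xk, yk in zip(x_planes, y_planes))
```

For a constant image of value c, only the (n, m) = (0, 0) coefficient survives, with magnitude approximately c/2, which is a convenient sanity check for the discretization.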

**3.2.3 The Symmetric Projection **

To extract the symmetric information of a 3D model, the difference and sum of the angle projection planes are calculated. The six angle projection planes are first obtained. Next, the difference and sum of each pair of corresponding planes are computed.


Fig. 3.7 3D stealth bomber model and its six symmetric projection planes. (a) The difference plane I'1. (b) The difference plane I'2. (c) The difference plane I'3. (d) The sum plane I'4. (e) The sum plane I'5. (f) The sum plane I'6.

The main steps for computing the symmetric descriptor of a 3D model are described as follows:

(1) First, the six projection planes, $I_1, I_2, \ldots, I_6$, are extracted (see Fig. 3.5).

(2) The three difference planes are represented as $I_1'$, $I_2'$, $I_3'$ (see Fig. 3.7). The k-th difference projection plane $I_k'$ is defined as:

$$I_k' = I_k - I_{k+3}, \qquad (3.2)$$

where k = 1, 2, 3.

(3) The sum projection planes are represented as $I_4'$, $I_5'$, $I_6'$ (see Fig. 3.7). The k-th sum projection plane $I_k'$ is defined as:

$$I_k' = (I_k + I_{k-3})\,/\,2, \qquad (3.3)$$

where k = 4, 5, 6.
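The steps above amount to simple elementwise arithmetic on the six planes; a minimal sketch (with the hypothetical helper name `symmetric_planes`, using 0-based list indexing for I1..I6):

```python
import numpy as np

def symmetric_planes(planes):
    """Given the six angle projection planes I1..I6 as a list of
    equal-size arrays, return the three difference planes of Eq. (3.2)
    followed by the three sum planes of Eq. (3.3)."""
    I = planes
    diffs = [I[k] - I[k + 3] for k in range(3)]          # I'_k = I_k - I_{k+3}
    sums = [(I[k] + I[k - 3]) / 2 for k in range(3, 6)]  # I'_k = (I_k + I_{k-3}) / 2
    return diffs + sums
```

Intuitively, the difference planes vanish for models symmetric about the corresponding axis, while the sum planes average the two opposite views.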

**3.2.4 Feature Extraction by 2D-ART **

Let $f_k^S(\rho, \theta)$ denote the gray-level value located at $(\rho, \theta)$ on the k-th symmetric projection plane $I_k'$. The corresponding ART coefficient $F_k^S(n, m)$ is defined as:

$$F_k^S(n, m) = \big\langle V_{n,m}(\rho, \theta),\, f_k^S(\rho, \theta)\big\rangle = \int_0^{2\pi}\!\!\int_0^1 V_{n,m}^{*}(\rho, \theta)\, f_k^S(\rho, \theta)\, \rho\, d\rho\, d\theta.$$

The ART descriptor is formed by the magnitudes of all complex ART coefficients. The default ART descriptor consists of 36 coefficients, $F_k^S(n, m)$, for $0 \le n \le 2$ and $0 \le m \le 11$. In summary, the symmetric descriptor is defined as:

$$\mathbf{x}^S = [(\mathbf{x}_1^S)^T, (\mathbf{x}_2^S)^T, \ldots, (\mathbf{x}_6^S)^T]^T,$$

where $\mathbf{x}_k^S$, $1 \le k \le 6$, is the ART feature vector extracted from the k-th symmetric projection plane:

$$\mathbf{x}_k^S = [x_k^S(1), x_k^S(2), \ldots, x_k^S(36)]^T = [|F_k^S(0,0)|, \ldots, |F_k^S(0,11)|, |F_k^S(1,0)|, \ldots, |F_k^S(1,11)|, |F_k^S(2,0)|, \ldots, |F_k^S(2,11)|]^T.$$

Let $\mathbf{x}^S = [(\mathbf{x}_1^S)^T, (\mathbf{x}_2^S)^T, \ldots, (\mathbf{x}_6^S)^T]^T$ and $\mathbf{y}^S = [(\mathbf{y}_1^S)^T, (\mathbf{y}_2^S)^T, \ldots, (\mathbf{y}_6^S)^T]^T$ denote the symmetric descriptors of a query model and the t-th matching model in the database, respectively. The distance between the query model and the t-th matching model is defined as follows:

$$Dis_{q,t}^S = \sum_{k=1}^{6} |\mathbf{x}_k^S - \mathbf{y}_k^S| = \sum_{k=1}^{6}\sum_{i=1}^{36} |x_k^S(i) - y_k^S(i)|.$$


**3.3 The Interior Feature **

In this section, three interior features, the angle 3D-ART descriptor, radial 3D-ART descriptor and curvature 3D-ART descriptor, will be proposed to improve the traditional 3D-ART [27]. To describe the shape information of a 3D model, the 3D-ART will be computed based on all voxels within the model. Moreover, these three interior features will be combined with the exterior features to improve retrieval efficiency.

**3.3.1 3D-ART **

The 3D-ART [27] is an extension of MPEG-7's 2D region-based shape descriptor, the angular radial transform (ART). The 3D-ART can express the voxel distribution within a 3D model (Fig. 3.8).

Fig. 3.8 An opaque voxel, notated Voxel(x, y, z) = 1, has a polygonal surface located within it; otherwise, the voxel is regarded as a transparent voxel, notated Voxel(x, y, z) = 0.

Fig. 3.9 Spherical coordinates system.

First, the object is represented in spherical coordinates $(\rho, \theta, \varphi)$, as shown in Fig. 3.9. The 3D-ART is a complex unitary transform defined on a unit sphere, and its coefficients are defined by:

$$F_{nm_\theta m_\varphi}^{\mathrm{3DART}} = \int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^1 V_{nm_\theta m_\varphi}^{*}(\rho, \theta, \varphi)\, Voxel(\rho, \theta, \varphi)\, \rho\, d\rho\, d\theta\, d\varphi,$$

where $F_{nm_\theta m_\varphi}^{\mathrm{3DART}}$ is the ART coefficient of order $n$, $m_\theta$ and $m_\varphi$, $Voxel(\rho, \theta, \varphi)$ is the 3D object function in spherical coordinates, and $V_{nm_\theta m_\varphi}(\rho, \theta, \varphi)$ is a 3D-ART basis function, which is separable along the two angular directions and the radial direction:

$$V_{nm_\theta m_\varphi}(\rho, \theta, \varphi) = A_{m_\theta}(\theta)\, A_{m_\varphi}(\varphi)\, R_n(\rho).$$

The angular basis functions are defined as:

$$A_{m_\theta}(\theta) = \frac{1}{2\pi}\exp(2 j m_\theta \theta), \qquad A_{m_\varphi}(\varphi) = \frac{1}{2\pi}\exp(2 j m_\varphi \varphi),$$

and the radial basis function is defined as:

$$R_n(\rho) = \begin{cases} 1, & n = 0 \\ 2\cos(\pi n \rho), & n \neq 0. \end{cases}$$

The 3D-ART descriptor is formed by the magnitudes of all complex ART coefficients. The 3D-ART descriptor consists of 144 coefficients, $|F_{n m_\theta m_\varphi}^{\mathrm{3DART}}|$, for $0 \le n \le 3$, $0 \le m_\theta \le 5$, and $0 \le m_\varphi \le 5$. The real parts of the 3D-ART basis functions are shown in Fig. 3.10.

Fig. 3.10 Real parts of 3D ART basis function.
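A numerical sketch of the 3D-ART coefficients is given below. It follows the separable basis above, evaluates the voxel function on a midpoint grid in spherical coordinates, and is illustrative only: the helper name `art3d_coefficients`, the sampling resolution, and taking the voxel function as a callable are all assumptions.

```python
import numpy as np

def art3d_coefficients(voxel_fn, n_max=4, m_max=6, samples=24):
    """Discrete approximation of the 3D-ART coefficient magnitudes
    F_{n, m_theta, m_phi} of a voxel function given in spherical
    coordinates, on a midpoint (rho, theta, phi) grid over
    [0, 1] x [0, pi] x [0, 2*pi]."""
    rho = (np.arange(samples) + 0.5) / samples             # midpoints in (0, 1)
    theta = (np.arange(samples) + 0.5) * np.pi / samples   # midpoints in (0, pi)
    phi = (np.arange(samples) + 0.5) * 2 * np.pi / samples
    R, T, P = np.meshgrid(rho, theta, phi, indexing="ij")
    f = voxel_fn(R, T, P)
    dvol = (1.0 / samples) * (np.pi / samples) * (2 * np.pi / samples)
    coeffs = np.zeros((n_max, m_max, m_max), dtype=complex)
    for n in range(n_max):
        radial = np.ones_like(R) if n == 0 else 2 * np.cos(np.pi * n * R)
        for mt in range(m_max):
            at = np.exp(2j * mt * T) / (2 * np.pi)
            for mp in range(m_max):
                ap = np.exp(2j * mp * P) / (2 * np.pi)
                basis = at * ap * radial
                # F = integral of V* Voxel rho drho dtheta dphi
                coeffs[n, mt, mp] = np.sum(np.conj(basis) * f * R) * dvol
    return np.abs(coeffs).ravel()                          # 144 magnitudes
```

For the constant function Voxel = 1, only the (0, 0, 0) coefficient survives, with magnitude 1/4, which serves as a quick sanity check of the discretization.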

**3.3.2 Angle 3D-ART Descriptor **

The angle 3D-ART descriptor is proposed using the angle information of the 3D model's surface. According to Sec. 3.2.1, the cosine of the angle of the mesh located at $(\rho, \theta, \varphi)$ is $Ang(\rho, \theta, \varphi)$. The corresponding angle 3D-ART coefficient $F_{nm_\theta m_\varphi}^{A}$ is defined as: