Texture and Shape for Image Retrieval – Multimedia Analysis and Indexing
Winston H. Hsu
National Taiwan University, Taipei
October 23, 2007
Office: R512, CSIE Building
Communication and Multimedia Lab (通訊與多媒體實驗室) http://www.csie.ntu.edu.tw/~winston
Outline
Texture
Statistical features
Spectral features
Edge
Shape
MMAI, Fall 07 - Winston Hsu, NTU -3-
Reminder
Homework #2
Due: TA@501 (noon, Tuesday, November 13)
Rule – “deliver quality work on time with integrity!!”
Midterm
A short recap of what we have covered (major literature)
High-level concepts mentioned in the course
Open book (no computer) but requiring no print-out
Mailing list
http://cmlmail.csie.ntu.edu.tw/mailman/listinfo/mmai
Syllabus (tentative)
1 9/25/07 holiday
2 10/02/07 introduction
3 10/09/07 mpeg; shot detection
4 10/16/07 cbr overview; color
5 10/23/07 texture+shape; relevance feedback
6 10/30/07 multidimensional indexing; feature reduction
7 11/06/07 midterm
8 11/13/07 gmm+cbir; svm+cbir (graphical/discriminative models)
9 11/20/07 structure discovery (sports; story)
10 11/27/07 TRECVID; concept detection; image annotation
11 12/04/07 concept detection; image annotation
12 12/11/07 un-/supervised clustering (clustering)
13 12/18/07 video retrieval
14 12/25/07 intro audio/music
15 01/01/08 holiday
16 01/08/08 project presentation #1, #2
17 01/15/08 final (no course)
18 01/22/08 project report due
Scenario of Content-Based Image Retrieval
[Figure: image database and query image -> feature extraction -> feature (vector) space -> distance metric -> retrieved images]
Fusion of Multimodal Features
How to weight the significance of each feature?
Cross-validation approach
User-selected
Automatic weighting by relevance feedback
[Figure: per-feature retrieval results, ordered by ranking or by score]
Fusion approaches such as:
Sum (Borda fuse)
WtSum (weighted Borda fuse)
Max (Round-Robin)
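The three fusion rules can be sketched as follows. This is only an illustration under stated assumptions: each feature is assumed to give one similarity score per database image, ranks are turned into Borda points (best image gets n-1 points), and the names `borda_points` and `fuse` are hypothetical helpers, not part of the lecture.

```python
import numpy as np

def borda_points(scores):
    """Turn similarity scores into rank points: best item gets n-1, worst gets 0."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")      # indices from best to worst
    points = np.empty(len(scores))
    points[order] = np.arange(len(scores) - 1, -1, -1)
    return points

def fuse(score_lists, weights=None, method="sum"):
    """Fuse per-feature rankings; returns one fused value per image."""
    pts = np.array([borda_points(s) for s in score_lists])
    if method == "sum":                              # Sum (Borda fuse)
        return pts.sum(axis=0)
    if method == "wtsum":                            # weighted Borda fuse
        w = np.asarray(weights, dtype=float)[:, None]
        return (w * pts).sum(axis=0)
    if method == "max":                              # best rank in any feature
        return pts.max(axis=0)
    raise ValueError(f"unknown method: {method}")

# Made-up scores for four images under two features.
color = [0.9, 0.2, 0.5, 0.4]
texture = [0.6, 0.1, 0.8, 0.7]
fused = fuse([color, texture])                       # higher is better
```

The weights passed to `wtsum` are where cross-validation, user selection, or relevance feedback would enter.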
Texture
What is texture
Has structure or repetitive patterns, e.g., a checkerboard
Has statistical patterns, e.g., grass, sand, rock
Why texture?
Applications to satellite images, medical images
Describes contents of real-world images, e.g., clouds, fabrics, surfaces, wood, stone
Data set
e.g., Brodatz: famous texture photographs for image texture analysis
Man-made textures & natural objects
Mosaic of Brodatz Texture
Types of Computational Texture Features
Structural – describing arrangement of texture elements
Statistical – characterizing texture in terms of statistical features
Co-occurrence matrix
Tamura (coarseness, directionality, contrast)
Multiresolution simultaneous autoregressive model (MRSAR)
Edge histogram
Spectral – based on analysis in the spatial-frequency domain
Fourier domain energy distribution
Gabor
Pyramid-structure wavelet transform (PWT)
Tree-structure wavelet transform (TWT)
Laws Filter
Co-occurrence Matrix
Co-occurrence matrix $C_d$
Specified with a displacement vector d = (row, column)
Entry $C_d(i, j)$ indicates how many times a pixel with gray level i is separated from a pixel of gray level j by the displacement vector d
Usually use the normalized version of $C_d$
Sometimes use the symmetric version of $C_d$
Example: d = (1, 1)
Physical meaning?
Co-occurrence Matrix (cont.)
Examples
* From Prof. Leow Wee Kheng, NUS
Co-occurrence Matrix (cont.)
Consider the following example (black = 1, white = 0)
For d=(1,1), the only non-zero entries are at (0,0) and (1,1): this captures the diagonal structure
For d=(0,1), the only non-zero entries are at (0,1) and (1,0): this captures the horizontal structure
Co-occurrence Matrix (cont.)
Measures on the following features
What does it mean that the entropy is largest when all N_d(i, j) are equal?
An almost obsolete feature
Not effective for classification and retrieval
Expensive to compute
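A minimal sketch of the co-occurrence computation, assuming a small integer-valued image and d = (row, column) as defined earlier. The 4x4 checkerboard stands in for the black/white example (black = 1, white = 0); the function names `cooccurrence` and `entropy` are illustrative only.

```python
import numpy as np

def cooccurrence(img, d, levels):
    """Count pairs (i, j): gray level i at p and gray level j at p + d."""
    dr, dc = d
    rows, cols = img.shape
    C = np.zeros((levels, levels), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:    # skip pairs leaving the image
                C[img[r, c], img[r2, c2]] += 1
    return C

def entropy(C):
    """Entropy (in bits) of the normalized co-occurrence matrix."""
    P = C / C.sum()
    nz = P[P > 0]
    return float(-(nz * np.log2(nz)).sum())

checker = np.indices((4, 4)).sum(axis=0) % 2          # 4x4 checkerboard of 0/1
C_diag = cooccurrence(checker, (1, 1), levels=2)      # non-zero only on the diagonal
C_horiz = cooccurrence(checker, (0, 1), levels=2)     # non-zero only off the diagonal
```

Entropy is maximal when all normalized entries are equal, i.e., when every gray-level pair is equally likely at displacement d and the texture shows no structure along d.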
Tamura – Selected Textural Properties
fine / coarse
high contrast / low contrast
rough / smooth
directional / non-directional
line-like / blob-like
regular / irregular
Psychophysical experiments – high correlation between some groups of properties
Coarseness
Contrast
Roughness
Orientation
Line-like
Regularity
Computational measures
Coarseness
Contrast
Orientation
Usefulness in Describing Texture
Similar correlations
Tamura – Coarseness
Goal
Pick a large size as best when coarse texture is present, or a small size when only fine texture
Step 1: compute averages at different scales at every point
Tamura – Coarseness (cont.)
Step 2: compute neighborhood differences at each scale between opposite sides in different directions
Tamura – Coarseness (cont.)
Step 3: select the scale with the largest variation
Step 4: compute the coarseness $F_{crs}$
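Steps 1 to 4 can be sketched as below. This assumes the usual Tamura formulation (window sizes 2^k, horizontal and vertical differences, coarseness as the mean of the best window size). Two choices are specific to this sketch and not from the slides: ties keep the smaller window, and a border of 2^kmax pixels is ignored to avoid edge effects.

```python
import numpy as np

def box_mean(img, size):
    """Mean over a size x size window at every pixel (edges replicated)."""
    lo = size // 2
    hi = size - 1 - lo
    padded = np.pad(img, ((lo, hi), (lo, hi)), mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (ii[size:size + h, size:size + w] - ii[:h, size:size + w]
             - ii[size:size + h, :w] + ii[:h, :w])
    return total / (size * size)

def tamura_coarseness(img, kmax=4):
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    best_e = np.full((h, w), -1.0)
    best_size = np.zeros((h, w))
    for k in range(1, kmax + 1):
        size, half = 2 ** k, 2 ** (k - 1)
        a = np.pad(box_mean(img, size), half, mode="edge")       # Step 1
        # Step 2: differences between windows on opposite sides.
        e_h = np.abs(a[half:half + h, 2 * half:2 * half + w] - a[half:half + h, :w])
        e_v = np.abs(a[2 * half:2 * half + h, half:half + w] - a[:h, half:half + w])
        e = np.maximum(e_h, e_v)
        better = e > best_e                                      # Step 3
        best_e[better] = e[better]
        best_size[better] = size
    m = 2 ** kmax                                                # ignore the border
    return float(best_size[m:h - m, m:w - m].mean())             # Step 4
```

For example, vertical stripes 8 pixels wide should score higher than stripes 1 pixel wide, since larger windows respond more strongly near the wide stripe boundaries.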
Tamura – Contrast
Gaussian-like histogram distribution -> low contrast
Histogram polarization: Is it Gaussian? How many peaks does it have? Where are they?
Polarization can be estimated by the kurtosis
Tamura – Contrast (cont.)
Contrast estimate is given by:
[Figure: a unimodal distribution vs. a distribution with two separate peaks]
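A hedged sketch of the contrast estimate, assuming the usual Tamura definition: the standard deviation divided by the fourth root of the kurtosis, with kurtosis taken as the fourth central moment over the squared variance. The function name is illustrative.

```python
import numpy as np

def tamura_contrast(img):
    """sigma / kurtosis**(1/4), with kurtosis = mu4 / sigma**4 (assumed form)."""
    x = np.asarray(img, dtype=float).ravel()
    var = x.var()
    if var == 0:                        # flat image: no contrast
        return 0.0
    mu4 = ((x - x.mean()) ** 4).mean()  # fourth central moment
    kurtosis = mu4 / var ** 2
    return float(np.sqrt(var) / kurtosis ** 0.25)

bimodal = np.array([[0, 255], [0, 255]])                 # two separate peaks
unimodal = np.array([[120, 127, 128], [128, 129, 136]])  # narrow single peak
```

A polarized (two-peak) histogram has low kurtosis and high variance, so it scores much higher than a narrow unimodal one.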
Tamura – Orientation
Building the histogram of local edges at different orientations
By deriving the edge magnitude at X and Y directions
Tamura – Orientation (cont.)
Compute the estimate from the sharpness of the peaks
By summing the second moments around each peak
e.g., flat histogram -> large 2nd moment (variance) -> small orientation (weak directionality)
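The orientation histogram and the peak-sharpness idea can be sketched as follows. Several simplifications are assumptions of this sketch: gradients come from `np.gradient` rather than a specific edge operator, weak edges are dropped with an arbitrary threshold, and only the single dominant peak is measured (in units of histogram bins).

```python
import numpy as np

def orientation_histogram(img, bins=16, thresh=1e-3):
    """Normalized histogram of local gradient orientations in [0, pi)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))   # d/drow, d/dcolumn
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    keep = mag > thresh                                   # ignore near-flat pixels
    hist, _ = np.histogram(theta[keep], bins=bins, range=(0, np.pi))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def peak_spread(hist):
    """Second moment of the histogram around its dominant peak (circular)."""
    n = len(hist)
    p = int(np.argmax(hist))
    offsets = (np.arange(n) - p + n // 2) % n - n // 2    # signed circular distance
    return float((offsets ** 2 * hist).sum())

stripes = np.tile(np.sin(2 * np.pi * np.arange(32) / 8), (32, 1))
hist = orientation_histogram(stripes)
```

A sharply peaked histogram gives a small spread (strongly directional texture), while a flat histogram gives a large spread.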
(MR)SAR
Each pixel is a random variable whose value is estimated from its neighboring pixels + noise
A kind of Markov Random Field model
SAR Model (Simultaneous Autoregressive)
Describes each pixel in terms of its neighboring pixels.
MRSAR Model (MultiResolution SAR)
Describes granularities by representing textures at a variety of resolutions
SAR applied at various image levels
Metric: differences between model parameters
[Mao’92]
[Figure: input image -> image pyramid -> SAR at each level -> model parameters]
Edge Histogram
Edge histogram (EHD)
Captures the spatial distribution of edges in six states: 0°, 45°, 90°, 135°, non-directional, and no edge.
Utilizing the filters
Global EHD of an image: concatenating 16 sub-EHDs into 96 bins
Local EHD of a segment
Grouping the edge histograms of the image-blocks that fall into the segment
[Figure: macro-blocks and image-blocks; filters for the 90°, 0°, 45°, 135°, and non-directional edges]
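A rough sketch of a 96-bin global EHD (16 sub-images x 6 states). The filter coefficients below are the 2x2 ones commonly quoted for the MPEG-7 edge histogram descriptor; the block size and the threshold value are assumptions made here for illustration only.

```python
import numpy as np

# Each filter acts on the mean intensities of a block's four quadrants,
# ordered (top-left, top-right, bottom-left, bottom-right).
FILTERS = np.array([
    [1, -1, 1, -1],                  # 90 deg (vertical edge)
    [1, 1, -1, -1],                  # 0 deg (horizontal edge)
    [2 ** 0.5, 0, 0, -2 ** 0.5],     # 45 deg
    [0, 2 ** 0.5, -2 ** 0.5, 0],     # 135 deg
    [2, -2, -2, 2],                  # non-directional
])

def edge_histogram(img, block=8, thresh=11.0):
    """Concatenate one 6-bin edge histogram per sub-image of a 4x4 grid."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    sh, sw = h // 4, w // 4
    half = block // 2
    feats = []
    for i in range(4):
        for j in range(4):
            sub = img[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            hist = np.zeros(6)
            for r in range(0, sh - block + 1, block):
                for c in range(0, sw - block + 1, block):
                    b = sub[r:r + block, c:c + block]
                    means = np.array([b[:half, :half].mean(), b[:half, half:].mean(),
                                      b[half:, :half].mean(), b[half:, half:].mean()])
                    resp = np.abs(FILTERS @ means)
                    k = int(resp.argmax())
                    hist[k if resp[k] > thresh else 5] += 1   # state 5 = no edge
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)
```

A local EHD for a segment would sum the per-block votes of only those image-blocks that fall inside the segment.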
Vector Space Concept
Orthonormal Bases (d-dim. vectors)
Any vector in a vector space can be expanded by the set of orthonormal signals
Response for basis k,
Transform to the new bases
(1D/2D) Fourier bases are sets of orthonormal signals
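In symbols, the expansion described above can be written as follows. The notation is assumed here for illustration: x is a d-dimensional vector, b_k are the orthonormal basis vectors, and c_k is the response for basis k.

```latex
x = \sum_{k=1}^{d} c_k \, b_k,
\qquad
c_k = \langle x, b_k \rangle,
\qquad
\langle b_j, b_k \rangle = \delta_{jk}
```

Transforming to the new bases amounts to computing all the responses c_k.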
The Fourier Transform
$$F(g(x,y))(u,v) = \iint_{\mathbb{R}^2} g(x,y)\, e^{-i 2\pi (ux+vy)}\, dx\, dy$$
Represent function on a new basis
Think of functions as vectors, with many components
We now apply a linear transformation to transform the basis
dot product with each basis element
In the expression, u and v select the basis element, so a function of x and y becomes a function of u and v
basis elements have the form $e^{-i 2\pi (ux+vy)}$
Visual Sine Pattern*
*The following 5 slides are from Jaap van de Loosdrecht, Noordelijke Hogeschool Leeuwarden
Visual Sine Pattern w/ Low Frequency
Sine Pattern Rotated 45 Deg.
2D Sine Pattern
Difference in spatial vs. frequency domain
1D sinc function of different scales
2D Rectangle
Interpreting the Power Spectrum
Explain structures in power spectrum
[Figure: power spectrum with the DC, low-frequency, and high-frequency regions marked; numbered dark and bright structures 1-3]
Phase and Magnitude
Fourier transform of a real function is complex
difficult to plot, visualize
instead, we can think of the phase and magnitude of the transform
Phase is the phase of the complex transform
Magnitude is the magnitude of the complex transform
Curious fact
all natural images have roughly similar magnitude transforms
hence, phase seems to matter, but magnitude largely doesn’t
Same for audio?
Demonstration
Take two pictures, swap the phase transforms, compute the inverse - what does the result look like?
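The demonstration can be sketched with NumPy's FFT, assuming two grayscale images of the same size given as 2D arrays. The function name `swap_phase` is illustrative.

```python
import numpy as np

def swap_phase(a, b):
    """Return (a's magnitude with b's phase, b's magnitude with a's phase)."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    a_mag_b_phase = np.fft.ifft2(np.abs(Fa) * np.exp(1j * np.angle(Fb))).real
    b_mag_a_phase = np.fft.ifft2(np.abs(Fb) * np.exp(1j * np.angle(Fa))).real
    return a_mag_b_phase, b_mag_a_phase
```

Each reconstruction tends to resemble the image that contributed the phase, which is the point of the demonstration.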
This is the magnitude transform of the zebra pic
This is the phase transform of the zebra pic
This is the magnitude transform of the cheetah pic
This is the phase transform of the cheetah pic
Reconstruction with zebra phase, cheetah magnitude
Reconstruction with cheetah phase, zebra magnitude
Natural Images and Their FT
What happens to the FT patterns when the texture scale and orientation are changed?
Frequency Domain Features
Fourier domain energy distribution
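Angular features (directionality): energy of the power spectrum within each orientation wedge
Radial features (coarseness): energy of the power spectrum within each radial-frequency ring
Uniform division may not be the best!!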
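A minimal sketch of ring and wedge energies computed from the power spectrum. It uses a uniform division into rings and wedges, and the numbers of rings and wedges are arbitrary choices made for illustration.

```python
import numpy as np

def fourier_energy_features(img, n_rings=4, n_wedges=4):
    """Power-spectrum energy per radial ring and per orientation wedge."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Remove the mean so the DC term does not dominate; center the spectrum.
    P = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    v, u = np.indices((h, w))
    v, u = v - h // 2, u - w // 2                 # frequency coordinates
    r = np.hypot(u, v)
    theta = np.mod(np.arctan2(v, u), np.pi)       # orientation folded into [0, pi)
    r_max = min(h, w) / 2
    ring = np.minimum((r / r_max * n_rings).astype(int), n_rings - 1)
    wedge = np.minimum((theta / np.pi * n_wedges).astype(int), n_wedges - 1)
    inside = r < r_max                            # keep the inscribed disk only
    radial = np.bincount(ring[inside], weights=P[inside], minlength=n_rings)
    angular = np.bincount(wedge[inside], weights=P[inside], minlength=n_wedges)
    return radial, angular
```

Fine textures push energy into the outer rings, and oriented textures concentrate energy in a few wedges.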
Gabor Texture
Fourier coefficients depend on the entire image (global), so we lose spatial information
Objective: local spatial frequency analysis
Gabor kernels: look like a Fourier basis multiplied by a Gaussian
The product of a symmetric (even) Gaussian with an oriented sinusoid
Gabor filters come in pairs: symmetric (even) and anti-symmetric (odd)
Each pair recovers the symmetric and anti-symmetric components in a particular direction
(kx, ky): the spatial frequency to which the filter responds strongly
σ: the scale of the filter. As σ goes to infinity, the filter becomes similar to the FT
We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies
Example – Gabor Kernel
[Figure: Gabor kernel; zebra image; magnitude of the filtered image]
Zebra stripes at different scales and orientations are convolved with the Gabor kernel
The response falls off when the stripes are larger or smaller
The response is large when the spatial frequency of the bars roughly matches that of the sinusoid windowed by the Gaussian in the Gabor kernel
Local spatial frequency analysis
Gabor Texture (cont.)
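Image I(x, y) is convolved with Gabor filters $h_{mn}$ (M x N in total)
Using the first and second moments (mean and standard deviation) of the response magnitude for each scale and orientation
Features: e.g., 4 scales x 6 orientations x 2 moments = 48 dimensions
[Figure: odd and even Gabor kernels]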
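A sketch of the 48-dimensional feature under stated assumptions. Each complex Gabor filter is applied in the frequency domain as a Gaussian centered on its preferred frequency; the center frequencies and the bandwidth rule (half the center frequency) are illustrative choices, not the filter-bank design of any particular paper.

```python
import numpy as np

def gabor_features(img, freqs=(0.05, 0.1, 0.2, 0.4), n_orient=6):
    """Mean and std of the Gabor response magnitude per scale and orientation."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(h)[:, None]               # cycles/pixel along rows
    fx = np.fft.fftfreq(w)[None, :]               # cycles/pixel along columns
    feats = []
    for f0 in freqs:                              # scale
        sigma_f = f0 / 2                          # assumed bandwidth
        for n in range(n_orient):                 # orientation
            th = n * np.pi / n_orient
            cx, cy = f0 * np.cos(th), f0 * np.sin(th)
            G = np.exp(-((fx - cx) ** 2 + (fy - cy) ** 2) / (2 * sigma_f ** 2))
            mag = np.abs(np.fft.ifft2(F * G))     # magnitude of the even/odd pair
            feats += [mag.mean(), mag.std()]
    return np.array(feats)
```

The feature order is (scale, orientation, moment), so the means can also be reshaped into a scale-by-orientation grid.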
Gabor Texture (cont.)
Arranging the mean energy in a 2D form
structured: localized pattern
oriented (or directional): column pattern
granular: row pattern
random: random pattern
[Figure: mean energy arranged on a grid of orientation vs. scale, alongside the frequency-domain layout]
Laws Texture Energy Features
Non-Fourier type bases
Match better to intuitive texture features
The filter algorithm
Filter the input image using texture filters
Compute texture energy by summing the absolute values of the filtered results in local neighborhoods around each pixel
Combine features to achieve rotational invariance
Laws' Texture Masks (1)
Basic 1D masks can be extended to create 2D masks
L5 (Level) = [ 1 4 6 4 1 ]
(Gaussian) gives a center-weighted local average
E5 (Edge) = [ -1 -2 0 2 1 ]
(gradient) responds to row or column step edges
S5 (Spot) = [ -1 0 2 0 -1 ]
(LoG) detects spots
R5 (Ripple) = [ 1 -4 6 -4 1 ]
(Gabor) detects ripples
Laws' Texture Masks (2)
Create 2D masks from pairs of 1D masks, e.g., E5L5 from E5 and L5
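A sketch of building a 2D mask from two 1D masks and computing local texture energy. It assumes the usual construction of E5L5 as the outer product of E5 (vertical) and L5 (horizontal). The 7x7 energy window is an arbitrary choice, and `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

L5 = np.array([1, 4, 6, 4, 1])       # Level
E5 = np.array([-1, -2, 0, 2, 1])     # Edge
S5 = np.array([-1, 0, 2, 0, -1])     # Spot
R5 = np.array([1, -4, 6, -4, 1])     # Ripple

def filter2d(img, mask):
    """Correlate img with mask, keeping only fully covered positions."""
    windows = sliding_window_view(img, mask.shape)
    return np.einsum("ijkl,kl->ij", windows, mask)

def laws_energy(img, vertical, horizontal, win=7):
    """Texture energy: local sum of absolute filter responses."""
    mask = np.outer(vertical, horizontal)          # e.g. E5L5 = outer(E5, L5)
    filtered = filter2d(np.asarray(img, dtype=float), mask)
    return sliding_window_view(np.abs(filtered), (win, win)).sum(axis=(2, 3))
```

Rotational invariance is typically obtained by combining symmetric pairs, e.g., averaging the E5L5 and L5E5 energy maps.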
Laws Filters (2D)
Laws Process
Wavelet Features (PWT, TWT)
Wavelet
Decomposition of a signal with a family of basis functions, using recursive filtering and sub-sampling
At each level, the 2D signal is decomposed into 4 subbands: LL, LH, HL, HH (L=low, H=high)
PWT: pyramid-structured wavelet transform
Recursively decomposes the LL band
Feature dimension (3x3+1)x2 = 20 (3 levels x 3 detail subbands, plus the final LL band, with 2 moments each)
TWT: tree-structured wavelet transform
Also decomposes the other subbands, since some information lies in the middle-frequency channels
Feature dimension 40x2 = 80
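A minimal sketch of the 20-dimensional PWT feature. The Haar wavelet is used only because it is easy to write down (the slides do not name a wavelet), the two moments are taken as the mean absolute value and the standard deviation of each subband, and the image sides are assumed to be divisible by 2^levels.

```python
import numpy as np

def haar_level(x):
    """One 2D Haar analysis step: returns the LL, LH, HL, HH subbands."""
    lo = (x[0::2] + x[1::2]) / 2                  # low-pass along rows
    hi = (x[0::2] - x[1::2]) / 2                  # high-pass along rows
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh

def pwt_features(img, levels=3):
    """Mean |coefficient| and std for each of the 3*levels + 1 subbands."""
    ll = np.asarray(img, dtype=float)
    feats = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_level(ll)           # recurse on the LL band only
        for band in (lh, hl, hh):
            feats += [np.abs(band).mean(), band.std()]
    feats += [np.abs(ll).mean(), ll.std()]        # final low-pass band
    return np.array(feats)
```

A TWT variant would also recurse into LH, HL, and HH, which is where the larger feature dimension comes from.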
Texture Comparisons
Retrieval performance of different texture features according to the number of relevant images retrieved at various scopes using Corel Photo galleries
[Figure: # of relevant images vs. # of top matches considered, for MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura]
[Ma’98]
Texture Comparisons (cont.)
Retrieval performance of texture features in terms of the number of top matches considered using Brodatz album
[Figure: recall vs. # of top matches considered, for MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura]
[Ma’98]
Texture Comparisons (cont.)
Images of rock samples in applications related to oil exploitation
Gabor descriptors outperform the others
[Li’00]
Learned Similarity
Distance metrics DO matter
All based on Gabor features
Euclidean vs. learned (supervised) distance metric
The latter was maintained with a texture thesaurus
[Ma’96]
[Figure: retrieval results with Euclidean distance vs. learned (supervised) distance]
Shape
Region-based Shape Descriptor
Contour-based Shape Descriptor
2D/3D Shape Descriptor
Some relevant ones are included in MPEG-7
Not easy to derive automatically
[Bober’01]
Region-based vs. Contour-based Descriptor
Columns indicate contour similarity
Outline of contours
Rows indicate region similarity
Distribution of pixels
Region-based Descriptor
Express pixel distribution within a 2D object region
Employs a complex 2D Angular Radial Transformation (ART)
35 fields each of 4 bits
Rotational and scale invariance
Robust to some non-rigid transformation
$L_1$ metric on transformed coefficients
Advantages
Describing complex shapes with disconnected regions
Robust to segmentation noise
Small size
Fast extraction and matching
[Figure: example shapes in panels (a)-(e)]
Contour-based Descriptor
It is based on the Curvature Scale-Space (CSS) representation
Found to be superior to
Zernike moments
ART
Fourier-based
Turning angles
Wavelets
Rotational and scale invariance
Robust to some non-rigid transformations
For example
Applicable to (a)
Discriminating differences in (b)
Finding similarities in (c)-(e)
Problems in Shape-based Indexing
Many existing approaches assume
Segmentation is given
A human operator circles the object of interest
Lack of clutter and shadows
Objects are rigid
Planar (2-D) shape models
Models are known in advance
Summary
Texture features
Statistical
Spectral
Texture computations are time-consuming
compressed domain features?
Shape features
Multimodal fusion is quite helpful
Next week
Efficient indexing on high-dimensional data