Texture and Shape for Image Retrieval – Multimedia Analysis and Indexing

Winston H. Hsu

National Taiwan University, Taipei

October 23, 2007

Office: R512, CSIE Building

Communication and Multimedia Lab (通訊與多媒體實驗室) http://www.csie.ntu.edu.tw/~winston

Outline

Texture

Statistical features

Spectral features

Edge

Shape

MMAI, Fall 07 - Winston Hsu, NTU -3-

Reminder

Homework #2

Due: TA@501 (noon, Tuesday, November 13)

Rule – “deliver quality work on time with integrity!!”

Midterm

A small recap of what we mentioned (major literature)

High-level concepts mentioned in the course

Open book (no computer) but requiring no print-out

Mailing list

http://cmlmail.csie.ntu.edu.tw/mailman/listinfo/mmai

Syllabus (tentative)

Week  Date      Topic
1     9/25/07   holiday
2     10/02/07  introduction
3     10/09/07  mpeg; shot detection
4     10/16/07  cbr overview; color
5     10/23/07  texture+shape; relevance feedback
6     10/30/07  multidimensional indexing; feature reduction
7     11/06/07  midterm
8     11/13/07  gmm+cbir; svm+cbir (graphical/discriminative models)
9     11/20/07  structure discovery (sports; story)
10    11/27/07  TRECVID; concept detection; image annotation
11    12/04/07  concept detection; image annotation
12    12/11/07  un-/supervised clustering (clustering)
13    12/18/07  video retrieval
14    12/25/07  intro audio/music
15    01/01/08  holiday
16    01/08/08  project presentation #1, #2
17    01/15/08  final (no course)
18    01/22/08  project report due

Scenario of Content-Based Image Retrieval

[Figure: image database, feature extraction, feature (vector) space, query image, distance metric, retrieved images]

Fusion of Multimodal Features

How to weigh the significance of each feature?

Cross-validation approach

User-selected

Automatically weighting by relevance feedback

Retrieval Results by Ranking ->

Score ->

Fusion approaches such as:

Sum (Borda fuse)

WtSum (weighted Borda fuse)

Max (Round-Robin)
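
The fusion rules above can be sketched as follows. This is a minimal illustration, assuming each feature (e.g., color, texture) has already produced one similarity score per image; the function names, weights, and sample scores are hypothetical and not from the lecture. Scores are first turned into Borda points (rank positions) so that features with different score ranges can be combined.

```python
def borda_points(scores):
    """Map {image_id: score} to Borda points: the best image gets N-1, the worst 0."""
    ranked = sorted(scores, key=scores.get)          # ascending similarity
    return {img: pts for pts, img in enumerate(ranked)}

def fuse(score_lists, mode="sum", weights=None):
    """Fuse per-feature results into one ranking (best image first)."""
    points = [borda_points(s) for s in score_lists]
    weights = weights or [1.0] * len(points)
    images = points[0].keys()
    if mode == "sum":        # Sum: plain Borda fuse
        fused = {i: sum(p[i] for p in points) for i in images}
    elif mode == "wtsum":    # WtSum: weighted Borda fuse
        fused = {i: sum(w * p[i] for w, p in zip(weights, points)) for i in images}
    elif mode == "max":      # Max: keep the best position reached in any list
        fused = {i: max(p[i] for p in points) for i in images}
    else:
        raise ValueError(mode)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical similarity scores from two features
color = {"a": 0.9, "b": 0.5, "c": 0.1}
texture = {"a": 0.2, "b": 0.8, "c": 0.3}

print(fuse([color, texture], "sum"))
print(fuse([color, texture], "wtsum", weights=[3, 1]))
```

The weights in the WtSum case are exactly the quantities that cross-validation, user selection, or relevance feedback would have to supply.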

Texture

What is texture

Has structures or repetitive patterns, e.g., a checkerboard

Has statistical patterns, e.g., grass, sand, rock

Why texture?

Applications to satellite images, medical images

Describe contents of real-world images, e.g., clouds, fabrics, surfaces, wood, stone

Data set

e.g., Brodatz: famous texture photographs for image-texture analysis

Man-made textures & natural objects

[Figure: mosaic of Brodatz textures]

Types of Computational Texture Features

Structural – describing arrangement of texture elements

Statistical – characterizing texture in terms of statistical features

Co-occurrence matrix

Tamura (coarseness, directionality, contrast)

Multiresolution simultaneous autoregressive model (MRSAR)

Edge histogram

Spectral – based on analysis in the spatial-frequency domain

Fourier domain energy distribution

Gabor

Pyramid-structure wavelet transform (PWT)

Tree-structure wavelet transform (TWT)

Laws Filter

Co-occurrence Matrix

Co-occurrence matrix C_d

Specified with a displacement vector d = (row, column)

Entry C_d(i, j) indicates how many times a pixel with gray level i is separated from a pixel of gray level j by the displacement vector d

Usually use the normalized version of C_d

Sometimes use the symmetric version of C_d

d = (1, 1)

physical meaning?
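
A minimal sketch of the definition above, assuming an integer gray-level image stored as a NumPy array. The function name and its `symmetric` / `normalize` flags are illustrative choices, not from the lecture. A small checkerboard (black = 1, white = 0) is used as input because its diagonal and horizontal structure shows up directly in the matrix.

```python
import numpy as np

def cooccurrence(img, d, levels, symmetric=False, normalize=False):
    """Count pairs (i, j): gray level i at (r, c) and gray level j at (r + d[0], c + d[1])."""
    img = np.asarray(img)
    dr, dc = d
    rows, cols = img.shape
    C = np.zeros((levels, levels), dtype=float)
    # Visit only pixels whose displaced partner is still inside the image
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            C[img[r, c], img[r + dr, c + dc]] += 1
    if symmetric:
        C = C + C.T          # also count the pair in the opposite direction
    if normalize:
        C = C / C.sum()      # entries become joint probabilities
    return C

# 4x4 checkerboard: pixel value is (row + column) mod 2
board = np.indices((4, 4)).sum(axis=0) % 2

print(cooccurrence(board, (1, 1), levels=2))   # diagonal neighbours always match
print(cooccurrence(board, (0, 1), levels=2))   # horizontal neighbours always differ
```

One way to read d = (1, 1): the matrix summarizes how gray levels co-vary along the diagonal, so a strongly diagonal texture concentrates its counts on the main diagonal of C_d.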

Co-occurrence Matrix (cont.)

Examples

* From Prof. Leow Wee Kheng, NUS

Co-occurrence Matrix (cont.)

Consider the following example (black = 1, white = 0)

For d=(1,1), the only non-zero entries are at (0,0) and (1,1) → captures diagonal structure

For d=(0,1), the only non-zero entries are at (0,1) and (1,0) → captures horizontal structure

Co-occurrence Matrix (cont.)

Measures on the following features

What does it mean that entropy takes its largest value when all N_d(i, j) are equal?

An almost-obsolete feature

Not effective for classification and retrieval

Expensive to compute
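
The feature formulas are not reproduced in this text, so the sketch below assumes the usual entropy definition, -sum p log2 p over the normalized co-occurrence entries. It only illustrates the entropy question: a matrix with all entries equal carries no preferred gray-level pairing and reaches the maximum, while a matrix with a few dominant entries scores lower. The function name and the two sample matrices are illustrative.

```python
import numpy as np

def cooccurrence_entropy(N):
    """Entropy (in bits) of a co-occurrence matrix after normalizing it to sum to 1."""
    p = np.asarray(N, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]                     # treat 0 * log(0) as 0
    return float(-(nz * np.log2(nz)).sum())

uniform = np.ones((2, 2))             # all N_d(i, j) equal: no structure at displacement d
peaked = np.array([[5, 0], [0, 4]])   # counts concentrated on the diagonal

print(cooccurrence_entropy(uniform))  # maximum for a 2x2 matrix: log2(4) = 2 bits
print(cooccurrence_entropy(peaked))   # lower: the texture is predictable along d
```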

Tamura – Selected Textural Properties

fine / coarse

high contrast / low contrast

rough / smooth

directional / non-directional

line-like / blob-like

regular / irregular

Usefulness in Describing Texture

Psychophysical experiments – high correlation between some groups of properties

Coarseness

Contrast

Roughness

Orientation

Line-like

Regularity

Computational measures

Coarseness

Contrast

Orientation

Similar correlations

Tamura – Coarseness

Goal

Pick a large size as best when coarse texture is present, or a small size when only fine texture is present

Step 1: compute averages at different scales at every point

Tamura – Coarseness (cont.)

Step 2: compute neighborhood differences at each scale between opposite sides, in different directions

Tamura – Coarseness (cont.)

Step 3: select the scale with the largest variation

Step 4: compute the coarseness F_crs
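
The four steps can be sketched as below. The slide formulas are not reproduced in this text, so this follows the commonly described Tamura procedure: window sizes 2^k, horizontal and vertical differences between windows on opposite sides, and the mean of the best window size as the final value. Those specifics, the edge padding, and the names `box_mean`, `tamura_coarseness`, and `kmax` are assumptions of this sketch.

```python
import numpy as np

def box_mean(img, size):
    """Mean over a size x size window around every pixel (image is edge-padded)."""
    lo, hi = size // 2, size - size // 2 - 1
    padded = np.pad(img, ((lo, hi), (lo, hi)), mode="edge")
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))   # integral image
    h, w = img.shape
    total = (ii[size:size + h, size:size + w] - ii[:h, size:size + w]
             - ii[size:size + h, :w] + ii[:h, :w])
    return total / size**2

def tamura_coarseness(img, kmax=4):
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    best_diff = np.full(img.shape, -1.0)
    best_size = np.ones(img.shape)
    for k in range(1, kmax + 1):
        A = box_mean(img, 2**k)                       # Step 1: averages at scale 2^k
        half = 2**(k - 1)
        Ap = np.pad(A, half, mode="edge")
        # Step 2: differences between windows on opposite sides of each pixel
        Eh = np.abs(Ap[half:half + h, 2 * half:2 * half + w] - Ap[half:half + h, :w])
        Ev = np.abs(Ap[2 * half:2 * half + h, half:half + w] - Ap[:h, half:half + w])
        E = np.maximum(Eh, Ev)
        better = E > best_diff                        # Step 3: keep the scale with the
        best_diff[better] = E[better]                 # largest variation (ties keep the
        best_size[better] = 2**k                      # smaller scale)
    return float(best_size.mean())                    # Step 4: average best size

rng = np.random.default_rng(0)
print(tamura_coarseness(rng.random((32, 32))))
```

The result always lies between 2 and 2^kmax. On real textures, coarser patterns are expected to push it toward the upper end, although ideal synthetic patterns can behave differently because many differences are exactly zero.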

Tamura – Contrast

Gaussian-like histogram distribution → low contrast

Histogram polarization: is it Gaussian? How many peaks does it have? Where are they?

Polarization can be estimated by the kurtosis

Tamura – Contrast (cont.)

Contrast estimate is given by:

[Figure: unimodal distribution vs. distribution with two separate peaks]
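
The contrast formula itself did not survive in this text. The sketch below uses the commonly cited Tamura definition, F_con = sigma / (alpha_4)^n with kurtosis alpha_4 = mu_4 / sigma^4 and n = 1/4; treat that formula, the function name, and the sample images as assumptions.

```python
import numpy as np

def tamura_contrast(img, n=0.25):
    """Standard deviation of the gray levels, damped by the kurtosis of their histogram."""
    x = np.asarray(img, dtype=float).ravel()
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    if var == 0:
        return 0.0                               # flat image: no contrast
    kurt = ((x - mu) ** 4).mean() / var**2       # alpha_4 = mu_4 / sigma^4
    return float(np.sqrt(var) / kurt**n)

two_peaks = np.array([[0, 255], [255, 0]])       # fully polarized histogram
narrow = np.array([[120, 125], [130, 135]])      # gray levels clustered together

print(tamura_contrast(two_peaks))                # large
print(tamura_contrast(narrow))                   # small
```

A histogram with two well-separated peaks has a low kurtosis and a large spread, so both factors push the estimate up.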

Tamura – Orientation

Building the histogram of local edges at different orientations

By computing the edge strength in the X and Y directions

Tamura – Orientation (cont.)

Compute the estimate from the sharpness of the peaks

By summing the second moments around each peak

e.g., flat histogram → large 2nd moment (variance) → small orientation
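
A simplified sketch of the orientation estimate, assuming central-difference gradients (`np.gradient`) instead of the edge operators on the slides, and a single dominant peak instead of a sum over all peaks. The bin count, the magnitude threshold, and both function names are illustrative.

```python
import numpy as np

def orientation_histogram(img, bins=16, threshold=1e-3):
    """Normalized histogram of local gradient directions, folded into [0, pi)."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                    # derivatives along Y (rows) and X (columns)
    mag = (np.abs(gx) + np.abs(gy)) / 2          # edge strength
    theta = np.mod(np.arctan2(gy, gx), np.pi)    # direction, ignoring edge polarity
    hist, _ = np.histogram(theta[mag > threshold], bins=bins, range=(0, np.pi))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def peak_second_moment(hist):
    """Second moment of the histogram around its highest peak (bins wrap around)."""
    bins = len(hist)
    p = int(np.argmax(hist))
    offset = (np.arange(bins) - p + bins // 2) % bins - bins // 2   # signed bin distance
    return float(((offset * np.pi / bins) ** 2 * hist).sum())

x = np.arange(32)
stripes = np.tile(np.sin(2 * np.pi * x / 8), (32, 1))   # vertical stripes
noise = np.random.default_rng(0).random((32, 32))       # no preferred direction

print(peak_second_moment(orientation_histogram(stripes)))   # sharp peak: small moment
print(peak_second_moment(orientation_histogram(noise)))     # flat histogram: large moment
```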

(MR)SAR

Each pixel is a random variable whose value is estimated from its neighboring pixels + noise

A kind of Markov Random Field model

SAR Model (Simultaneous Autoregressive)

Describes each pixel in terms of its neighboring pixels.

MRSAR Model (MultiResolution SAR)

Describing granularities by representing textures at a variety of resolutions

SAR applied at various image levels

Metric → parameter differences

[Figure: input image, image pyramid, SAR applied at each level, model parameters] [Mao’92]

Edge Histogram

Edge histogram (EHD)

Captures the spatial distribution of edges in six states: 0º, 45º, 90º, 135º, non-directional, and no edge.

Utilizing the filters

Global EHD of an image: concatenating 16 sub-EHDs into 96 bins

Local EHD of a segment

Grouping the edge histograms of the image-blocks falling into the segment

[Figure: macro-block and image-block partition; filters for the 90° edge, 0° edge, 45° edge, 135° edge, and non-directional edge]

Vector Space Concept

Orthonormal Bases (d-dim. vectors)

Any vector in a vector space can be expanded by the set of orthonormal signals

Response for basis k,

Transform to the new bases

(1D/2D) Fourier bases are sets of orthonormal signals
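
A small numerical check of the vector-space view, using the 1D Fourier basis scaled to unit length. The dimension d = 8 and the sample vector are arbitrary choices for illustration.

```python
import numpy as np

d = 8
n = np.arange(d)
# Row k of B is the k-th 1D Fourier basis vector, scaled to unit length
B = np.exp(2j * np.pi * np.outer(n, n) / d) / np.sqrt(d)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

coeffs = B.conj() @ x        # response for basis k: inner product <b_k, x>
x_rebuilt = B.T @ coeffs     # expansion: sum over k of coeffs[k] * b_k

print(np.allclose(B @ B.conj().T, np.eye(d)))   # the basis is orthonormal
print(np.allclose(x_rebuilt, x))                # the expansion recovers x
```

Up to the 1/sqrt(d) scaling, `coeffs` is what `np.fft.fft(x)` returns, which is the sense in which the Fourier transform is a change of basis.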

The Fourier Transform

\[ F(g(x, y))(u, v) = \iint_{\mathbb{R}^2} g(x, y)\, e^{-i 2\pi (ux + vy)}\, dx\, dy \]

Represent function on a new basis

Think of functions as vectors, with many components

We now apply a linear transformation to transform the basis

dot product with each basis element

In the expression, u and v select the basis element, so a function of x and y becomes a function of u and v

basis elements have the form $e^{-i 2\pi (ux + vy)}$

Visual Sinusoidal Pattern*

*The following 5 slides are from Jaap van de Loosdrecht, Noordelijke Hogeschool Leeuwarden

Visual Sinusoidal Pattern w/ Low Frequency

Sinusoidal Pattern Rotated 45 Deg.

2D Sinusoidal Pattern

Difference in spatial vs. frequency domain

1D sinc function at different scales

2D Rectangle

Interpreting the Power Spectrum

Explain structures in power spectrum

[Figure: power spectrum annotated with DC, low-frequency and high-frequency regions, and numbered structures 1 to 3 (dark, bright)]

Phase and Magnitude

Fourier transform of a real function is complex

difficult to plot, visualize

instead, we can think of the phase and magnitude of the transform

Phase is the phase of the complex transform

Magnitude is the magnitude of the complex transform

Curious fact

all natural images have roughly similar magnitude transforms

hence, phase seems to matter, but magnitude largely doesn’t

Same for audio?

Demonstration

Take two pictures, swap the phase transforms, compute the inverse - what does the result look like?
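
The demonstration can be sketched with NumPy's FFT. Random arrays stand in for the two pictures here, and the function name `swap_phase` is illustrative. With real photographs, each reconstruction tends to resemble the image that supplied the phase.

```python
import numpy as np

def swap_phase(img_a, img_b):
    """Return (magnitude of A with phase of B, magnitude of B with phase of A)."""
    Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    mag_a, mag_b = np.abs(Fa), np.abs(Fb)
    pha_a, pha_b = np.angle(Fa), np.angle(Fb)
    # Recombine magnitude and phase, then invert; the imaginary part is round-off only
    a_mag_b_phase = np.fft.ifft2(mag_a * np.exp(1j * pha_b)).real
    b_mag_a_phase = np.fft.ifft2(mag_b * np.exp(1j * pha_a)).real
    return a_mag_b_phase, b_mag_a_phase

rng = np.random.default_rng(0)
a, b = rng.random((32, 32)), rng.random((32, 32))   # stand-ins for two pictures
hybrid, _ = swap_phase(a, b)                        # magnitude of a, phase of b

corr = lambda p, q: np.corrcoef(p.ravel(), q.ravel())[0, 1]
print(corr(hybrid, b), corr(hybrid, a))             # closer to the phase donor b
```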

This is the magnitude transform of the zebra pic

This is the phase transform of the zebra pic

This is the magnitude transform of the cheetah pic

This is the phase transform of the cheetah pic

Reconstruction with zebra phase, cheetah magnitude

Reconstruction with cheetah phase, zebra magnitude

Natural Images and Their FT

What happens to the FT patterns when the texture scale and orientation change?

Frequency Domain Features

Fourier domain energy distribution

Angular features (directionality)

where,

Radial features (coarseness)

where,

Uniform division may not be the best!!
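
The defining formulas are not reproduced in this text. The sketch below assumes the common construction: sum the power spectrum over angular wedges (directionality) and over radial rings (coarseness), with uniform divisions. The function name and the bin counts are illustrative.

```python
import numpy as np

def fourier_energy_features(img, n_rings=4, n_wedges=4):
    """Energy of the power spectrum in uniform radial rings and angular wedges."""
    img = np.asarray(img, dtype=float)
    P = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2     # power spectrum, DC at the center
    h, w = img.shape
    v, u = np.indices((h, w))
    v, u = v - h // 2, u - w // 2                          # frequency coordinates
    r = np.hypot(u, v)
    theta = np.mod(np.arctan2(v, u), np.pi)                # spectrum is symmetric: fold to [0, pi)
    P[h // 2, w // 2] = 0.0                                # ignore the DC term
    ring_edges = np.linspace(0, min(h, w) / 2, n_rings + 1)
    wedge_edges = np.linspace(0, np.pi, n_wedges + 1)
    radial = [P[(r >= a) & (r < b)].sum()
              for a, b in zip(ring_edges[:-1], ring_edges[1:])]
    angular = [P[(theta >= a) & (theta < b)].sum()
               for a, b in zip(wedge_edges[:-1], wedge_edges[1:])]
    return np.array(radial), np.array(angular)

x = np.arange(32)
stripes = np.tile(np.cos(2 * np.pi * 4 * x / 32), (32, 1))   # 4 cycles across the width
radial, angular = fourier_energy_features(stripes)
print(radial.argmax(), angular.argmax())
```

Replacing the `np.linspace` edges with hand-picked or logarithmic ones is one way to move away from the uniform division.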

Gabor Texture

Fourier coefficients depend on the entire image (global) → we lose spatial information

Objective: local spatial frequency analysis

Gabor kernels: look like a Fourier basis multiplied by a Gaussian

The product of a symmetric (even) Gaussian with an oriented sinusoid

Gabor filters come in pairs: symmetric (even) and anti-symmetric (odd)

Each pair recovers symmetric and anti-symmetric components in a particular direction

(kx, ky): the spatial frequency to which the filter responds strongly

σ: the scale of the filter. When σ = infinity, similar to FT

We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies
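
A minimal sketch of one Gabor pair, assuming an isotropic Gaussian envelope and an unnormalized kernel. The function name, the kernel size, and the parameter values in the example are illustrative, not from the lecture.

```python
import numpy as np

def gabor_pair(size, kx, ky, sigma):
    """Even (cosine) and odd (sine) Gabor kernels on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # symmetric Gaussian window
    phase = kx * x + ky * y                           # oriented sinusoid
    return gauss * np.cos(phase), gauss * np.sin(phase)

even, odd = gabor_pair(21, np.pi / 4, 0.0, 4.0)       # tuned to 8-pixel-wide vertical stripes

# Response of the pair to one patch: magnitude of (even, odd) inner products
y, x = np.mgrid[-10:11, -10:11]
response = lambda patch: np.hypot((even * patch).sum(), (odd * patch).sum())

print(response(np.cos(np.pi / 4 * x)))   # stripes matching the kernel: strong response
print(response(np.cos(np.pi / 4 * y)))   # same stripes rotated by 90 degrees: weak response
```

As sigma grows, the Gaussian window flattens and the kernels approach plain Fourier basis elements, which is the sigma = infinity remark above.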

Example – Gabor Kernel

[Figure: Gabor kernel, zebra image, magnitude of the filtered image]

Zebra stripes at different scales and orientations are convolved with the Gabor kernel

The response falls off when the stripes are larger or smaller

The response is large when the spatial frequency of the bars roughly matches that of the sinusoid windowed by the Gaussian in the Gabor kernel

Local spatial frequency analysis

Gabor Texture (cont.)

Image I(x, y) convolved with Gabor filters h_mn (M x N filters in total)

Using first and 2nd moments for each scale and orientation

Features: e.g., 4 scales, 6 orientations → 48 dimensions (4 x 6 filters x 2 moments)

[Figure: odd and even Gabor kernels]
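
A sketch of the resulting feature vector: filter with every (scale, orientation) kernel and keep the mean and standard deviation of the response magnitude. The center frequencies, the envelope widths, and the circular FFT-based convolution are assumptions made to keep the example short; they are not the filter-bank design from the lecture.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    """Complex Gabor kernel: real part is the even filter, imaginary part the odd one."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return gauss * np.exp(1j * 2 * np.pi * freq * xr)

def gabor_features(img, n_scales=4, n_orients=6):
    """[mean, std] of the response magnitude for every scale m and orientation n."""
    img = np.asarray(img, dtype=float)
    F = np.fft.fft2(img)
    feats = []
    for m in range(n_scales):
        freq = 0.4 / 2**m              # assumed: one octave between scales (cycles/pixel)
        sigma = 0.5 / freq             # assumed: envelope width tied to the wavelength
        size = int(6 * sigma) | 1      # odd size covering about +-3 sigma; must fit in img
        for n in range(n_orients):
            h = gabor_kernel(size, freq, n * np.pi / n_orients, sigma)
            # Circular convolution via the FFT (kernel zero-padded to the image size)
            response = np.abs(np.fft.ifft2(F * np.fft.fft2(h, s=img.shape)))
            feats += [response.mean(), response.std()]
    return np.array(feats)

x = np.arange(80)
stripes = np.tile(np.cos(2 * np.pi * 0.2 * x), (80, 1))   # vertical stripes, 5-pixel period
features = gabor_features(stripes)
print(features.shape)                                     # 4 scales x 6 orientations x 2
```

Two such vectors can then be compared with a distance metric, as in the retrieval scenario at the start of the lecture.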

Gabor Texture (cont.)

Arranging the mean energy in a 2D form

structured: localized pattern

oriented (or directional): column pattern

granular: row pattern

random: random pattern

[Figure: mean energy arranged in 2D, with axes orientation and scale; frequency domain]

Laws Texture Energy Features

Non-Fourier type bases

Match better to intuitive texture features

The filter algorithm

Filter the input image using texture filters

Compute texture energy by summing the absolute value of filtered results in local neighborhoods around each pixel

Combine features to achieve rotational invariance

Laws Texture Masks (1)

Basic 1D masks → can be extended to create 2D masks

L5 (Level) = [ 1 4 6 4 1 ]

(Gaussian) gives a center-weighted local average

E5 (Edge) = [ -1 -2 0 2 1 ]

(gradient) responds to row or column step edges

S5 (Spot) = [ -1 0 2 0 -1 ]

(LoG) detects spots

R5 (Ripple) = [ 1 -4 6 -4 1 ]

(Gabor) detects ripples

Laws Texture Masks (2)

Create 2D masks from pairs of 1D masks, e.g., E5 (as a column) times L5 (as a row) gives the 5x5 mask E5L5
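
The filter algorithm and the mask construction can be sketched as below, using the 1D masks listed earlier. The 7x7 energy window, the restriction to the valid region, and averaging a mask with its transpose as the combination step are assumptions of this sketch. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

L5 = np.array([1, 4, 6, 4, 1])      # Level
E5 = np.array([-1, -2, 0, 2, 1])    # Edge
S5 = np.array([-1, 0, 2, 0, -1])    # Spot
R5 = np.array([1, -4, 6, -4, 1])    # Ripple

E5L5 = np.outer(E5, L5)             # 5x5 mask: E5 as a column times L5 as a row
L5E5 = np.outer(L5, E5)

def filter2d(img, mask):
    """Correlate img with a small mask (valid region only, no padding)."""
    windows = np.lib.stride_tricks.sliding_window_view(img, mask.shape)
    return (windows * mask).sum(axis=(-2, -1))

def texture_energy(img, mask, window=7):
    """Sum of absolute filter responses in a local window around each pixel."""
    response = np.abs(filter2d(np.asarray(img, dtype=float), mask))
    return filter2d(response, np.ones((window, window)))

def combined_energy(img):
    """Average the energy maps of a mask and its transpose to reduce orientation dependence."""
    return (texture_energy(img, E5L5) + texture_energy(img, L5E5)) / 2

img = np.zeros((20, 20))
img[10:] = 1.0                               # step edge between the upper and lower half

print(texture_energy(img, E5L5).max())       # responds to the row-wise step
print(texture_energy(img, L5E5).max())       # no column-wise change: zero
```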

Laws Filters (2D)

Laws Process

Wavelet Features (PWT, TWT)

Wavelet

Decomposition of signal with a family of basis functions with recursive filtering and sub-sampling

Each level decomposes the 2D signal into 4 subbands: LL, LH, HL, HH (L = low, H = high)

PWT: pyramid-structured wavelet transform

Recursively decomposes the LL band

Feature dimension (3x3x1+1)x2 = 20

TWT: tree-structured wavelet transform

Also decomposes the other subbands, since some information lies in the middle-frequency channels

Feature dimension 40x2 = 80
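
A sketch of the PWT feature, assuming the Haar wavelet (the lecture does not name a wavelet here) and mean and standard deviation of the coefficient magnitudes as the two statistics per subband. With 3 levels this gives 3 x 3 detail subbands plus the final LL band, i.e. 10 subbands and 20 values. Function names are illustrative, and the LH/HL naming convention varies between texts.

```python
import numpy as np

def haar_step(x):
    """One 2D Haar level: returns LL, LH, HL, HH, each half the size of x."""
    a, b = x[0::2, :], x[1::2, :]
    lo, hi = (a + b) / 2, (a - b) / 2        # low/high-pass along rows, sub-sampled by 2
    def split_cols(z):                       # same filtering along columns
        return (z[:, 0::2] + z[:, 1::2]) / 2, (z[:, 0::2] - z[:, 1::2]) / 2
    LL, LH = split_cols(lo)
    HL, HH = split_cols(hi)
    return LL, LH, HL, HH

def pwt_features(img, levels=3):
    """Mean and standard deviation of |coefficients| in every PWT subband.

    Image height and width must be divisible by 2**levels.
    """
    ll = np.asarray(img, dtype=float)
    bands = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_step(ll)       # recurse on the LL band only
        bands += [lh, hl, hh]
    bands.append(ll)
    return np.array([f(np.abs(b)) for b in bands for f in (np.mean, np.std)])

rng = np.random.default_rng(0)
print(pwt_features(rng.random((32, 32))).shape)   # (3 x 3 + 1) subbands x 2 statistics
```

A TWT variant would additionally recurse into the LH, HL, and HH bands that carry significant energy, which is where the larger feature dimension comes from.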

Texture Comparisons

Retrieval performance of different texture features according to the number of relevant images retrieved at various scopes using Corel Photo galleries

[Figure: number of relevant images vs. number of top matches considered; curves: MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, Tamura] [Ma’98]

Texture Comparisons (cont.)

Retrieval performance of texture features in terms of the number of top matches considered using Brodatz album

[Figure: recall vs. number of top matches considered; curves: MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, Tamura] [Ma’98]

Texture Comparisons (cont.)

Images of rock samples in applications related to oil exploitation

Gabor descriptors outperform the others

[Li’00]

Learned Similarity

Distance metrics DO matter

All based on Gabor features

Euclidean vs. learned (supervised) distance metric

The latter was maintained with a texture thesaurus

[Ma’96]

[Figure: retrieval results with Euclidean distance vs. learned (supervised) distance]

Shape

Region-based descriptor

Contour-based Shape Descriptor

2D/3D Shape Descriptor

Some relevant ones are included in MPEG-7

Not easy to derive automatically

[Bober’01]

Region-based vs. Contour-based Descriptor

Columns indicate contour similarity

Outline of contours

Rows indicate region similarity

Distribution of pixels

Region-based Descriptor

Express pixel distribution within a 2D object region

Employs a complex 2D Angular Radial Transformation (ART)

35 fields each of 4 bits

Rotational and scale invariance

Robust to some non-rigid transformations

L1 metric on transformed coefficients

Advantages

Describing complex shapes with disconnected regions

Robust to segmentation noise

Small size

Fast extraction and matching

Contour-based Descriptor

[Figure: example shapes (a)-(e)]

It’s based on the Curvature Scale-Space (CSS) representation

Found to be superior to

Zernike moments

ART

Fourier-based

Turning angles

Wavelets

Rotational and scale invariance

Robust to some non-rigid transformations

For example

Applicable to (a)

Discriminating differences in (b)

Finding similarities in (c)-(e)

Problems in Shape-based Indexing

Many existing approaches assume

Segmentation is given

A human operator circles the object of interest

Lack of clutter and shadows

Objects are rigid

Planar (2-D) shape models

Models are known in advance

Summary

Texture features

Statistical

Spectral

Texture computations are time-consuming

compressed domain features?

Shape features

Multimodal fusion is quite helpful

Next week

Efficient indexing on high-dimensional data

Feature reduction
