## Image-based modeling

### Digital Visual Effects, Spring 2006 *Yung-Yu Chuang*

### 2005/5/17

*with slides by Richard Szeliski, Steve Seitz and Alexei Efros*

**Outline**

### • Models from multiple (sparse) images

– Structure from motion
– Facade

### • Models from single images

– Tour into pictures
– Single view metrology
– Other approaches

**Models from multiple images (Façade, Debevec et al. 1996)**

**Facade**

### • Use a sparse set of images

### • Calibrated camera (intrinsic only)

### • Designed specifically for modeling architecture

### • Use a set of blocks to approximate architecture

### • 3 steps: geometry reconstruction, texture mapping and model refinement

**Idea**

**Geometric modeling**

A block is a geometric primitive
**with a small set of parameters**

Hierarchical modeling for a scene

Rotation and translation could be constrained

**Reasons for block modeling**

### • Architectural scenes are well modeled by geometric primitives

### • Blocks provide a high-level abstraction, making it easier to manage the model and add constraints

### • No need to infer surfaces from discrete features; blocks essentially provide prior models for architecture

### • Hierarchical block modeling effectively reduces the number of parameters for robustness and efficiency

**Reconstruction**

Minimize the reprojection error between edges observed in the photographs and the corresponding projected edges of the block model. This objective is nonlinear with respect to both the camera and the model parameters.

**Results**

**3 of 12 photographs**

**Results**

**Texture mapping**

**Texture mapping in real world**

### Demo movie: Michael Naimark, San Francisco Museum of Modern Art, 1984

**Texture mapping**

**View-dependent texture mapping**

Figure: the model rendered with a single texture map vs. VDTM.

**Model-based stereo**

### • Use stereo to refine the geometry

Figure: photographs taken from known camera viewpoints.

**Stereo**

Figure: a scene point projects through each camera's optical center onto its image plane.

**Stereo**

### • Basic Principle: Triangulation

– Gives reconstruction as intersection of two rays
– Requires

• calibration

• **point correspondence**
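Triangulation from two calibrated rays can be sketched in a few lines. This is the common midpoint method (the reconstructed point is the midpoint of the shortest segment between the two rays); the camera centers and scene point below are made up for illustration.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Reconstruct a 3D point as the midpoint of the shortest segment
    between two viewing rays c + t*d (the 'midpoint method')."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Normal equations for t1, t2 minimizing |(c1 + t1 d1) - (c2 + t2 d2)|^2
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(c2 - c1) @ d1,
                  (c2 - c1) @ d2])
    t1, t2 = np.linalg.solve(A, b)
    return 0.5 * ((c1 + t1 * d1) + (c2 + t2 * d2))

# Two cameras (made-up centers) both seeing the point (0, 0, 5)
X = np.array([0.0, 0.0, 5.0])
c1 = np.array([-1.0, 0.0, 0.0])
c2 = np.array([ 1.0, 0.0, 0.0])
print(triangulate_midpoint(c1, X - c1, c2, X - c2))  # ~[0, 0, 5]
```

With noisy correspondences the two rays no longer intersect, which is exactly why the midpoint (rather than an exact intersection) is used.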

**Stereo correspondence**

### • Determine Pixel Correspondence

– Pairs of points that correspond to same scene point

### • Epipolar Constraint

– Reduces correspondence problem to 1D search along
*conjugate epipolar lines*

Figure: the epipolar plane intersects each image plane in an epipolar line.

**Finding correspondences**

### • apply feature matching criterion (e.g., correlation or Lucas-Kanade) at all pixels simultaneously

### • search only over epipolar lines (much fewer candidate positions)

**Image registration (revisited)**

### • How do we determine correspondences?

– *block matching or SSD (sum of squared differences)*

*d is the disparity (horizontal motion)*

### • How big should the neighborhood be?
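The block-matching search above can be sketched directly: for each pixel, slide a window along the horizontal epipolar line and keep the disparity with the smallest SSD. The window size, disparity range, and synthetic image pair below are illustrative choices.

```python
import numpy as np

def ssd_disparity(left, right, window=5, max_disp=8):
    """Brute-force block matching along horizontal epipolar lines:
    for each left-image pixel, keep the disparity d minimizing the
    sum of squared differences over a (window x window) neighborhood."""
    h, w = left.shape
    r = window // 2
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best_ssd, best_d = np.inf, 0
            for d in range(max_disp + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best_ssd:
                    best_ssd, best_d = ssd, d
            disp[y, x] = best_d
    return disp

# Synthetic rectified pair: the right image is the left shifted 4 px
# to the left, so the true disparity is 4 (away from the wrap-around)
rng = np.random.default_rng(0)
left = rng.random((40, 60))
right = np.roll(left, -4, axis=1)
disp = ssd_disparity(left, right)
print(disp[20, 30])  # 4
```

Real implementations vectorize this triple loop, but the structure — a 1D search along the epipolar line per pixel — is the same.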

**Neighborhood size**

### • Smaller neighborhood: more details

### • Larger neighborhood: fewer isolated mistakes

Figure: matching results with window sizes w = 3 and w = 20.

**Depth from disparity**

Figure: rectified stereo geometry with camera centers C and C′ separated by the baseline, focal length f, and a scene point X at depth z projecting to x and x′. By similar triangles,

$$d = x - x' = \frac{f \cdot \text{baseline}}{z}$$

so disparity is inversely proportional to depth.

Figure: input image (1 of 2), depth map, 3D rendering [Szeliski & Kang '95].
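For a rectified pair, depth follows from disparity as Z = f·B/d (f in pixels, baseline B in meters). A minimal sketch with hypothetical values:

```python
import numpy as np

# For a rectified pair: disparity d = x - x' = f * B / Z, so Z = f * B / d
f = 700.0   # focal length in pixels (hypothetical)
B = 0.12    # baseline in meters (hypothetical)

disparity = np.array([[14.0, 28.0],
                      [ 7.0, 70.0]])   # disparities in pixels
depth = f * B / disparity              # depth in meters
print(depth)  # [[6.0, 3.0], [12.0, 1.2]]
```

Note the inverse relation: halving the disparity doubles the depth, which is why distant points (small d) are the least accurately reconstructed.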

**Stereo reconstruction pipeline**

### • Steps

– Calibrate cameras
– Rectify images
– Compute disparity
– Estimate depth

### • What will cause errors?

– Camera calibration errors
– Poor image resolution
– Occlusions
– Violations of brightness constancy (specular reflections)
– Large motions
– Low-contrast image regions

**Model-based stereo**

Figure: key image, offset image, and warped offset image.

**Epipolar geometry**

**Results**

**Comparisons**

Figure: single texture (flat) vs. VDTM (flat) vs. VDTM with model-based stereo.

**Final results**

**Kite photography**

**Final results**

**Results**

**Commercial packages**

### • REALVIZ ImageModeler

**The Matrix**

Cinefex #79, October 1999.

**The Matrix**

**• Academy Awards for Scientific and Technical achievement for 2000**


*To George Borshukov, Kim Libreri and Dan Piponi for the development of a system for image-based rendering allowing choreographed camera movements through computer graphic reconstructed sets.*

This was used in The Matrix and Mission Impossible II; See The Matrix Disc #2 for more details

**Models from single images**

**Vanishing points**

### • Vanishing point

– projection of a point at infinity

image plane

camera center

ground plane vanishing point

**Vanishing points (2D)**

image plane

camera center

line on ground plane vanishing point

**Vanishing points**

### • Properties

– Any two parallel lines have the same vanishing point **v**
– The ray from **C** through **v** is parallel to the lines
– An image may have more than one vanishing point

Figure: two parallel lines on the ground plane meet at the vanishing point **V**; **C** is the camera center.

**Vanishing lines**

### • Multiple Vanishing Points

– Any set of parallel lines on the plane defines a vanishing point

– *The union of all of these vanishing points is the horizon line*

• *also called vanishing line*

– Note that different planes define different vanishing lines

Figure: vanishing points **v**_{1} and **v**_{2} lie on the horizon line.


**Computing vanishing points**

### • Properties

– **P**_{∞} is a point at infinity; **v** is its projection
– They depend only on the line direction
– Parallel lines **P**_{0} + t**D**, **P**_{1} + t**D** intersect at **P**_{∞}

Figure: as t → ∞, the image of **P** = **P**_{0} + t**D** converges to **v**.

$$\mathbf{P} = \mathbf{P}_0 + t\mathbf{D} = \begin{bmatrix} P_X + tD_X \\ P_Y + tD_Y \\ P_Z + tD_Z \\ 1 \end{bmatrix} \cong \begin{bmatrix} P_X/t + D_X \\ P_Y/t + D_Y \\ P_Z/t + D_Z \\ 1/t \end{bmatrix} \;\xrightarrow{\,t \to \infty\,}\; \begin{bmatrix} D_X \\ D_Y \\ D_Z \\ 0 \end{bmatrix} = \mathbf{P}_\infty$$

$$\mathbf{v} = \mathbf{\Pi}\mathbf{P}_\infty$$
**D**

**Computing vanishing lines**

### • Properties

– **l is intersection of horizontal plane through C with image plane**
– **Compute l from two sets of parallel lines on ground plane**
– **All points at same height as C project to l**

• points higher than C project above l

– Provides a way of comparing the heights of objects in the scene

Figure: the vanishing line **l** of the ground plane, with camera center **C**.

**Tour into pictures**

• Create a 3D “theatre stage” of five billboards

• Specify foreground objects through bounding polygons

• Use camera transformations to navigate through the scene

**Tour into pictures**

**The idea**

### • Many scenes (especially paintings) can be represented as an axis-aligned box volume (i.e., a stage)

### • Key assumptions:

– All walls of volume are orthogonal

– Camera view plane is parallel to back of volume
– Camera up is normal to volume bottom

– Volume bottom is y=0

### • Can use the vanishing point to fit the box to the particular scene!

**Fitting the box volume**

### • User controls the inner box and the vanishing point placement (6 DOF)

**Foreground Objects**

### • Use separate billboard for each

### • For this to work, three separate images are used:

– Original image
– Mask to isolate desired foreground objects
– Background with objects removed

**Foreground Objects**

• Add vertical rectangles for each foreground object

• Can compute 3D coordinates P0, P1 since they lie on a known plane

• P2, P3 can be computed as before (similar triangles)
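One way to sketch the first step (recovering 3D coordinates of points on the known ground plane) is to intersect each pixel's viewing ray with the plane y = 0. The intrinsics, camera height, axis conventions, and the pixel below are all hypothetical, not taken from the paper.

```python
import numpy as np

def backproject_to_ground(pixel, K, cam_height):
    """Intersect a pixel's viewing ray with the ground plane y = 0.
    Conventions (an assumption): camera at (0, cam_height, 0), looking
    along +z, world y pointing up, image y pointing down."""
    # Ray direction in camera coordinates; flip y so that +y is up
    d = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    d = np.array([d[0], -d[1], d[2]])
    c = np.array([0.0, cam_height, 0.0])   # camera center
    if d[1] >= 0:
        raise ValueError("ray does not hit the ground plane")
    t = -cam_height / d[1]                 # solve cam_height + t*d_y = 0
    return c + t * d

# Hypothetical intrinsics: f = 500 px, principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# A pixel 100 px below the principal point lands 8 m in front
print(backproject_to_ground((320.0, 340.0), K, cam_height=1.6))  # ~[0, 0, 8]
```

Once the base points P0, P1 of a billboard are fixed this way, the top points follow from similar triangles, as the slide notes.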

**Example**

**Example**

**glTip**

### • **http://www.cs.ust.hk/~cpegnel/glTIP/**

**Criminisi et al. ICCV 1999**


1. Find world coordinates (X,Y,Z) for a few points
2. Connect the points with planes to model geometry

– Texture map the planes


**Measurements on planes**

Approach: unwarp, then measure. What kind of warp is this?

**Image rectification**

To unwarp (rectify) an image

**• solve for homography H given p and p’**

**• solve equations of the form: wp’ = Hp**
**– linear in unknowns: w and coefficients of H**
– H is defined up to an arbitrary scale factor
**– how many points are necessary to solve for H?**

**p** **p’**

**Solving for homographies**


$$\mathbf{A}\,\mathbf{h} = \mathbf{0}, \qquad \mathbf{A}: 2n \times 9,\quad \mathbf{h}: 9 \times 1,\quad \mathbf{0}: 2n \times 1$$

### • Defines a least squares problem: minimize $\|\mathbf{A}\mathbf{h} - \mathbf{0}\|^{2}$

– **Since h is only defined up to scale, solve for unit vector ĥ**

– Works with 4 or more points
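A minimal sketch of this solution (the direct linear transform): build the 2n × 9 matrix A from the correspondences and take ĥ as the right singular vector with the smallest singular value.

```python
import numpy as np

def solve_homography(src, dst):
    """Direct linear transform: stack two rows of A per correspondence
    (from w p' = H p) and take h as the right singular vector of A with
    the smallest singular value, i.e. the unit vector minimizing |A h|."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # remove the arbitrary scale

# Four correspondences: the unit square translated by (2, 3)
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (3, 3), (3, 4), (2, 4)]
print(solve_homography(src, dst))  # ~[[1, 0, 2], [0, 1, 3], [0, 0, 1]]
```

With exactly 4 points the system has an exact null vector; with more (noisy) points the SVD gives the least-squares unit solution.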

**Finding world coordinates (X,Y,Z)**

1. Define the ground plane (Z=0)

2. Compute points (X,Y,0) on that plane
*3. Compute the heights Z of all other points*

**Measuring height**

Figure: heights of objects in the scene (5.4, 2.8, 3.3) measured relative to the camera height.

**Computing vanishing points**

### • Intersect p_{1}q_{1} with p_{2}q_{2}

Figure: segments **p**_{1}**q**_{1} and **p**_{2}**q**_{2} meet at the vanishing point **v**.

### • Least squares version

– Better to use more than two lines and compute the "closest" point of intersection

– See notes by Bob Collins for one good way of doing this:

• http://www-2.cs.cmu.edu/~ph/869/www/notes/vanishing.txt
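A sketch of one common least-squares formulation (not necessarily the one in the linked notes): represent each segment as a homogeneous line l = p × q, then find the unit vector v minimizing Σᵢ(lᵢ·v)² via SVD.

```python
import numpy as np

def vanishing_point(segments):
    """Least-squares vanishing point from 2D segments: each segment (p, q)
    gives a homogeneous line l = p x q, and v is the unit vector minimizing
    sum_i (l_i . v)^2 -- the singular vector of the stacked line matrix
    with the smallest singular value."""
    lines = []
    for p, q in segments:
        l = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
        lines.append(l / np.linalg.norm(l))
    _, _, Vt = np.linalg.svd(np.asarray(lines))
    v = Vt[-1]
    return v[:2] / v[2]   # back to inhomogeneous image coordinates

# Three segments on lines that all pass through (5, 7) (made-up data)
segs = [((0.0, 2.0), (1.0, 3.0)),    # y = x + 2
        ((0.0, 7.0), (1.0, 7.0)),    # y = 7
        ((0.0, -3.0), (1.0, -1.0))]  # y = 2x - 3
print(vanishing_point(segs))  # ~(5, 7)
```

The homogeneous formulation is what lets this handle near-parallel image lines gracefully: a vanishing point near infinity simply comes out with a tiny third coordinate.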

**Criminisi et al., ICCV 99**

• Load in an image

• Click on lines parallel to X axis – repeat for Y, Z axes

• Compute vanishing points

Figure: two horizontal vanishing points define the vanishing line; the vertical vanishing point is at infinity.

**Criminisi et al., ICCV 99**


• Load in an image

• Click on lines parallel to X axis – repeat for Y, Z axes

• Compute vanishing points

• Specify 3D and 2D positions of 4 points on reference plane

• Compute homography H

• Specify a reference height

• Compute 3D positions of several points

• Create a 3D model from these points

• Extract texture maps

• Output a VRML model

**Results**

**Zhang et al. CVPR 2001**

**Oh et al. SIGGRAPH 2001**

### video

**Automatic popup**

Figure: pipeline from input image, via learned models, to geometric labels (ground, vertical, sky), then cut'n'fold into a 3D model.

**Geometric cues**

– Color
– Location
– Texture
– Perspective

**Automatic popup**

**Results**

Figure: input images and the corresponding Automatic Photo Pop-up results.

**Results**

This approach works for roughly 35% of images.

**Failures**

Labeling Errors

**Failures**

Foreground Objects

**References**

• P. Debevec, C. Taylor and J. Malik. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach, SIGGRAPH 1996.

• Y. Horry, K. Anjyo and K. Arai. Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image, SIGGRAPH 1997.

• A. Criminisi, I. Reid and A. Zisserman. Single View Metrology, ICCV 1999.

• L. Zhang, G. Dugas-Phocion, J.-S. Samson and S. Seitz. Single View Modeling of Free-Form Scenes, CVPR 2001.

• B. Oh, M. Chen, J. Dorsey and F. Durand. Image-Based Modeling and Photo Editing, SIGGRAPH 2001.

• D. Hoiem, A. Efros and M. Hebert. Automatic Photo Pop-up, SIGGRAPH 2005.