## Image-based modeling

### Digital Visual Effects *Yung-Yu Chuang*

*with slides by Richard Szeliski, Steve Seitz and Alexei Efros*

**Outline**

### • Models from multiple (sparse) images

– Structure from motion
– Facade

### • Models from single images

– Tour into pictures
– Single view metrology
– Other approaches

**Models from multiple images**

**(Façade, Debevec et al. 1996)**

**Facade**

### • Use a sparse set of images

### • Calibrated camera (intrinsic only)

### • Designed specifically for modeling architecture

### • Use a set of blocks to approximate architecture

### • Three components:

– geometry reconstruction
– texture mapping
– model refinement

**Idea**


**Geometric modeling**

A block is a geometric primitive with a small set of parameters

Hierarchical modeling for a scene

Rotation and translation could be constrained

**Reasons for block modeling**

### • Architectural scenes are well modeled by geometric primitives.

### • Blocks provide a high level abstraction, easier to manage and add constraints.

### • No need to infer surfaces from discrete features; blocks essentially provide prior models for architectures.

### • Hierarchical block modeling effectively reduces the number of parameters for robustness and efficiency.

**Reconstruction**

### minimize an objective function

The objective is nonlinear w.r.t. the camera and model parameters.

**Results**

**3 of 12 photographs**


**Texture mapping**

**Texture mapping in real world**

### Demo movie: Michael Naimark, San Francisco Museum of Modern Art, 1984

**View-dependent texture mapping**

Figure panels: model; VDTM; VDTM; single texture map

**Model-based stereo**

### • Use stereo to refine the geometry

Figure labels: known camera viewpoints

**Stereo**

Figure labels: scene point, optical center, image plane

### • Basic Principle: Triangulation

– Gives reconstruction as intersection of two rays
– Requires
  • calibration
  • point correspondence
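Triangulation as the intersection of two rays can be sketched as follows. This is a minimal illustration, assuming each calibrated camera provides an optical center and a viewing direction for the matched pixel. The function name `triangulate_midpoint` and the midpoint rule (used because noisy rays rarely meet exactly) are choices made for this sketch, not something taken from the slides.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to the rays c1 + s*d1 and c2 + t*d2."""
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    # Solve s*d1 - t*d2 = c2 - c1 in the least squares sense.
    A = np.stack([d1, -d2], axis=1)          # 3x2 system matrix
    b = c2 - c1
    (s, t), *_ = np.linalg.lstsq(A, b, rcond=None)
    # With noise the rays rarely meet exactly, so take the midpoint
    # of the two closest points.
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

# Two optical centers one unit apart, both observing the same scene point.
X_true = np.array([0.5, 0.2, 4.0])
c1, c2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
X = triangulate_midpoint(c1, X_true - c1, c2, X_true - c2)
```

With exact rays the recovered `X` coincides with `X_true`; with noisy correspondences it is only the point nearest to both rays.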

**Stereo correspondence**

### • Determine Pixel Correspondence

– Pairs of points that correspond to same scene point

### • Epipolar Constraint

– Reduces correspondence problem to 1D search along *conjugate epipolar lines*

Figure labels: epipolar plane, epipolar lines
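The epipolar constraint can be illustrated with a fundamental matrix. The sketch below assumes a rectified pair (pure horizontal translation between the cameras), for which the fundamental matrix takes the simple form used here. The matrix values and the helper name `epipolar_line` are assumptions made for illustration.

```python
import numpy as np

def epipolar_line(F, x):
    """Line l' = F x in the second image, as (a, b, c) with a*u + b*v + c = 0."""
    return F @ np.array([x[0], x[1], 1.0])

# Assumed fundamental matrix of a rectified pair (translation along x only).
F_rectified = np.array([[0.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])

line = epipolar_line(F_rectified, (120.0, 45.0))
# A candidate match on the same scanline (row 45) lies on the epipolar line...
on_line = line @ np.array([300.0, 45.0, 1.0])
# ...while a candidate on a different row does not.
off_line = line @ np.array([300.0, 46.0, 1.0])
```

For this rectified case the epipolar line of a pixel is simply its own scanline, which is why the search for a match becomes one-dimensional.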

**Finding correspondences**

### • apply feature matching criterion (e.g., correlation or Lucas-Kanade) at all pixels simultaneously

### • search only over epipolar lines (far fewer candidate positions)

**Image registration (revisited)**

### • How do we determine correspondences?

*– block matching or SSD (sum squared differences)*

*d is the disparity (horizontal motion)*

### • How big should the neighborhood be?

**Neighborhood size**

### • Smaller neighborhood: more details

### • Larger neighborhood: fewer isolated mistakes

Figure panels: w = 3; w = 20
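Block matching with SSD can be sketched as below. The sketch assumes a rectified pair in which a pixel at column x in the left image appears at column x - d in the right image, and it scores each candidate disparity d by the sum of squared differences over a square window. The names `ssd_disparity`, `half_w` and `max_d` are hypothetical; `half_w` plays the role of the neighborhood size discussed above.

```python
import numpy as np

def ssd_disparity(left, right, row, col, half_w, max_d):
    """Disparity d minimizing the SSD between a window in `left` and `right`."""
    patch = left[row - half_w:row + half_w + 1, col - half_w:col + half_w + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_d + 1):
        c = col - d                      # search along the same scanline
        if c - half_w < 0:
            break                        # window would leave the image
        cand = right[row - half_w:row + half_w + 1, c - half_w:c + half_w + 1]
        cost = np.sum((patch - cand) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic pair: the right image is the left image shifted 4 pixels to the left.
rng = np.random.default_rng(0)
left = rng.random((40, 60))
right = np.roll(left, -4, axis=1)
d = ssd_disparity(left, right, row=20, col=30, half_w=3, max_d=10)
```

A small `half_w` keeps fine detail but is more easily fooled by noise; a large one averages over more pixels and produces fewer isolated mistakes, matching the trade-off listed above.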

**Depth from disparity**

Figure labels: scene point X at depth z; camera centers C and C’ separated by the baseline; focal length f; image coordinates x and x’

Figure panels: input image (1 of 2); depth map; 3D rendering [Szeliski & Kang ‘95]

**Stereo reconstruction pipeline**

### • Steps

– Calibrate cameras
– Rectify images
– Compute disparity
– Estimate depth

### • What will cause errors?

– Camera calibration errors
– Poor image resolution
– Occlusions
– Violations of brightness constancy (specular reflections)
– Large motions
– Low-contrast image regions
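The "estimate depth" step of the pipeline can be sketched with the standard relation for a rectified pair, disparity = x - x’ = baseline · f / z, so that z = f · baseline / disparity. The function name and the numeric values below are assumptions chosen for illustration.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline):
    """z = f * B / d for a rectified pair; zero disparity maps to infinity."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth

# Hypothetical camera: focal length 800 pixels, baseline 0.1 (scene units).
depth = depth_from_disparity([[8.0, 4.0], [2.0, 0.0]],
                             focal_px=800.0, baseline=0.1)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small disparity) than for nearby ones.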

**Model-based stereo**

Figure panels: key image; offset image; warped offset image

**Results**

**Comparisons**

Figure panels: single texture, flat; VDTM, flat; VDTM, model-based stereo

**Final results**

**Kite photography**


**Commercial packages**

### • Autodesk REALVIZ ImageModeler

**The Matrix**

**Cinefex #79, October 1999.**

**• Academy Awards for Scientific and Technical achievement for 2000**

*To George Borshukov, Kim Libreri and Dan Piponi for the development of a system for image-based rendering allowing choreographed camera movements through computer graphic reconstructed sets.*

This was used in The Matrix and Mission Impossible II; see The Matrix Disc #2 for more details.

**Models from single images**

**Vanishing points**

### • Vanishing point

– projection of a point at infinity

Figure labels: image plane, camera center, ground plane, vanishing point

**Vanishing points (2D)**

Figure labels: image plane, camera center, line on ground plane, vanishing point

**Vanishing points**

### • Properties

– Any two parallel lines have the same vanishing point **v**
– The ray from **C** through **v** is parallel to the lines
– An image may have more than one vanishing point

Figure labels: image plane, camera center **C**, two lines on ground plane, vanishing point **v**

**Vanishing lines**

### • Multiple Vanishing Points

– Any set of parallel lines on the plane defines a vanishing point
– The union of all of these vanishing points is the *horizon line*
  • also called *vanishing line*
– Note that different planes define different vanishing lines

Figure labels: vanishing points $\mathbf{v}_1$ and $\mathbf{v}_2$

**Computing vanishing points**

### • Properties

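– $\mathbf{P}_\infty$ is a point at *infinity*, $\mathbf{v}$ is its projection
– They depend only on line *direction*
– Parallel lines $\mathbf{P}_0 + t\mathbf{D}$, $\mathbf{P}_1 + t\mathbf{D}$ intersect at $\mathbf{P}_\infty$

$$
\mathbf{P}_t =
\begin{bmatrix} P_X + tD_X \\ P_Y + tD_Y \\ P_Z + tD_Z \\ 1 \end{bmatrix}
\cong
\begin{bmatrix} P_X/t + D_X \\ P_Y/t + D_Y \\ P_Z/t + D_Z \\ 1/t \end{bmatrix}
\qquad
t \to \infty: \quad
\mathbf{P}_\infty =
\begin{bmatrix} D_X \\ D_Y \\ D_Z \\ 0 \end{bmatrix}
$$

$$
\mathbf{v} = \mathbf{\Pi}\,\mathbf{P}_\infty
$$

Here $(P_X, P_Y, P_Z)$ are the coordinates of $\mathbf{P}_0$ and $\mathbf{\Pi}$ is the projection matrix.

Figure labels: $\mathbf{P}_0$, $\mathbf{D}$, $\mathbf{v}$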
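The statement that a vanishing point depends only on the line direction can be checked numerically. The sketch below assumes a simple pinhole camera at the origin (the intrinsics in `K` are made-up values). It projects the point at infinity (D, 0) and compares it with the projections of points far along two different parallel lines.

```python
import numpy as np

def project(Pi, X_h):
    """Project a homogeneous 3D point with a 3x4 matrix; return pixel (u, v)."""
    x = Pi @ X_h
    return x[:2] / x[2]

# Assumed camera: focal length 500 px, principal point (320, 240), at the origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
Pi = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

D = np.array([1.0, 0.0, 2.0])                 # shared line direction
v = project(Pi, np.append(D, 0.0))            # v = Pi * P_inf, with P_inf = (D, 0)

# Points far along two different parallel lines project close to v.
P0, P1 = np.array([0.0, -1.0, 4.0]), np.array([3.0, 2.0, 5.0])
t = 1e6
far0 = project(Pi, np.append(P0 + t * D, 1.0))
far1 = project(Pi, np.append(P1 + t * D, 1.0))
```

Neither `P0` nor `P1` enters the computation of `v`, which is the point of the derivation: only `D` matters.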

**Tour into pictures**

• Create a 3D “theatre stage” of five billboards

• Specify foreground objects through bounding polygons

• Use camera transformations to navigate through the scene


**The idea**

### • Many scenes (especially paintings) can be represented as an axis-aligned box volume (i.e. a stage)

### • Key assumptions:

– All walls of volume are orthogonal
– Camera view plane is parallel to back of volume
– Camera up is normal to volume bottom
– Volume bottom is y=0

### • Can use the vanishing point to fit the box to the particular scene!

**Fitting the box volume**

### • User controls the inner box and the vanishing point placement (6 DOF)

**Foreground Objects**

### • Use separate billboard for each

### • For this to work, three separate images are used:

– Original image
– Mask to isolate desired foreground images
– Background with objects removed

**Foreground Objects**

• Add vertical rectangles for each foreground object

• Can compute 3D coordinates P0, P1 since they are on known plane.

• P2, P3 can be computed as before (similar triangles)
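The billboard construction above can be sketched under the box-volume assumptions (floor at y = 0, camera up normal to the floor). The sketch additionally assumes a pinhole camera at height `cam_h` looking along +z with focal length `f` in pixels, and pixel coordinates measured from the principal point with v pointing up. These conventions, the function names and the numbers are all hypothetical.

```python
import numpy as np

def ground_point(u, v, f, cam_h):
    """Intersect the viewing ray of pixel (u, v) with the floor y = 0."""
    # Camera center at (0, cam_h, 0); the ray is (0, cam_h, 0) + s * (u, v, f).
    # Floor pixels lie below the horizon, so v < 0 and s > 0.
    s = -cam_h / v
    return np.array([s * u, 0.0, s * f])

def height_at(v_top, f, cam_h, depth):
    """Height of a point imaged at row v_top on a vertical plane at `depth`."""
    return cam_h + v_top * depth / f          # similar triangles

f, cam_h = 500.0, 1.6
P0 = ground_point(-50.0, -100.0, f, cam_h)    # bottom-left corner of billboard
P1 = ground_point(25.0, -100.0, f, cam_h)     # bottom-right corner
top = height_at(20.0, f, cam_h, P0[2])        # shared height of P2 and P3
```

P0 and P1 come from the known floor plane; P2 and P3 then sit directly above them at height `top`, since the billboard is a vertical rectangle at the same depth.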

**Example**


**glTip**

**• http://www.cs.ust.hk/~cpegnel/glTIP/**

**Criminisi et al. ICCV 1999**


1. Find world coordinates (X,Y,Z) for a few points

2. Connect the points with planes to model geometry
   – Texture map the planes

**Measurements on planes**

Approach: unwarp, then measure

What kind of warp is this?

**Image rectification**

To unwarp (rectify) an image

• solve for homography **H** given **p** and **p’**
• solve equations of the form: w**p’** = **Hp**
  – linear in unknowns: w and coefficients of **H**
  – **H** is defined up to an arbitrary scale factor
  – how many points are necessary to solve for **H**?

Figure labels: **p**, **p’**

**Solving for homographies**

$$
\underbrace{\mathbf{A}}_{2n \times 9}\;\underbrace{\mathbf{h}}_{9} = \underbrace{\mathbf{0}}_{2n}
$$

### • Defines a least squares problem: minimize $\|\mathbf{A}\mathbf{h} - \mathbf{0}\|^2$

– Since **h** is only defined up to scale, solve for unit vector **ĥ**
– Works with 4 or more points
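A minimal sketch of this least squares solve, assuming the standard direct linear transform: each correspondence contributes two rows to A, and the unit vector minimizing ‖Ah‖ is the right singular vector with the smallest singular value (equivalently, the eigenvector of AᵀA with the smallest eigenvalue). The function names and the sample coordinates are made up for illustration.

```python
import numpy as np

def fit_homography(p, p_prime):
    """Estimate H with w * p' = H p from n >= 4 point pairs (n x 2 arrays)."""
    rows = []
    for (x, y), (xp, yp) in zip(p, p_prime):
        # Each correspondence contributes two rows of the 2n x 9 matrix A.
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.asarray(rows, dtype=float)
    # Unit vector minimizing |A h|: last right singular vector of A.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)

def apply_homography(H, pts):
    """Map n x 2 points through H and divide out the scale w."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]

# Corners of a unit square and their (made-up) positions in a photograph.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
photo = np.array([[10.0, 20.0], [90.0, 30.0], [80.0, 95.0], [5.0, 70.0]])
H = fit_homography(square, photo)
mapped = apply_homography(H, square)
```

To rectify an image, the same routine can be called with the arguments swapped (photo points first), giving the warp from the photograph back to the fronto-parallel plane.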

**Finding world coordinates (X,Y,Z)**

1. Define the ground plane (Z=0)

2. Compute points (X,Y,0) on that plane

*3. Compute the heights Z of all other points*

**Measuring height**

Figure labels: scale marks 1 to 5; heights 5.4, 2.8 and 3.3; camera height

**Computing vanishing points**

### • Intersect $p_1 q_1$ with $p_2 q_2$

Figure labels: $\mathbf{v}$, $\mathbf{p}_1$, $\mathbf{q}_1$, $\mathbf{p}_2$, $\mathbf{q}_2$

### • Least squares version

– Better to use more than two lines and compute the “closest” point of intersection
– See notes by Bob Collins for one good way of doing this:
  • http://www-2.cs.cmu.edu/~ph/869/www/notes/vanishing.txt
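One simple algebraic way to compute a "closest" intersection is sketched below. It is not necessarily the method in the notes linked above. It assumes homogeneous coordinates, where the line through two points is their cross product, and finds the unit vector v that minimizes the sum of squared products l·v over all lines. The segment coordinates are hypothetical.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (cross product)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(segments):
    """Least squares intersection of the lines through the given segments."""
    L = np.array([line_through(p, q) for p, q in segments])
    # Normalize each line so that l . x is a point-to-line distance.
    L /= np.linalg.norm(L[:, :2], axis=1, keepdims=True)
    # Unit vector minimizing sum (l_i . v)^2: last right singular vector.
    _, _, vt = np.linalg.svd(L)
    v = vt[-1]
    return v[:2] / v[2]        # assumes a finite vanishing point (v[2] != 0)

# Three hypothetical segments whose lines all pass through (400, 100).
segments = [((0.0, 0.0), (200.0, 50.0)),
            ((0.0, 300.0), (200.0, 200.0)),
            ((100.0, 400.0), (250.0, 250.0))]
v = vanishing_point(segments)
```

With exactly two lines this reduces to their cross product; with more lines and noisy clicks it returns a compromise point. A vanishing point at infinity would need the homogeneous vector kept as is rather than divided by its last component.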

**Criminisi et al., ICCV 99**

Figure labels: vanishing points, vanishing line, vertical vanishing point (at infinity)

• Load in an image

• Click on lines parallel to X axis – repeat for Y, Z axes

• Compute vanishing points

• Specify 3D and 2D positions of 4 points on reference plane

• Compute homography H

• Specify a reference height

• Compute 3D positions of several points

• Create a 3D model from these points

• Extract texture maps

• Output a VRML model

**Results**

**Zhang et al. CVPR 2001**

**Oh et al. SIGGRAPH 2001**

### video

**Automatic popup**

Pipeline figure: Input Image → Learned Models → Geometric Labels (Ground, Vertical, Sky) → Cut’n’Fold → 3D Model

**Geometric cues**

– Color
– Location
– Texture
– Perspective

**Results**

Figure panels: input images; Automatic Photo Pop-up

This approach works for roughly 35% of images.

**Failures**

– Labeling errors
– Foreground objects

**References**

• P. Debevec, C. Taylor and J. Malik. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach, SIGGRAPH 1996.

• Y. Horry, K. Anjyo and K. Arai. Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image, SIGGRAPH 1997.

• A. Criminisi, I. Reid and A. Zisserman. Single View Metrology, ICCV 1999.

• L. Zhang, G. Dugas-Phocion, J.-S. Samson and S. Seitz. Single View Modeling of Free-Form Scenes, CVPR 2001.

• B. Oh, M. Chen, J. Dorsey and F. Durand. Image-Based Modeling and Photo Editing, SIGGRAPH 2001.

• D. Hoiem, A. Efros and M. Hebert. Automatic Photo Pop-up, SIGGRAPH 2005.