Chapter 13 Image-Based Rendering

(1)

Chapter 13 Image-Based Rendering

Presented by 翁丞世 0925970510 [email protected]

(2)

 View Interpolation

 Layered Depth Images

 Light Fields and Lumigraphs

 Environment Mattes

 Video-Based Rendering

Outline

(3)

13. Image-Based Rendering

(4)

 13.1.1 View-Dependent Texture Maps

 13.1.2 Application: Photo Tourism

13.1 View Interpolation

(5)

 View interpolation creates a seamless transition between a pair of reference images using one or more pre-computed depth maps.

 Closely related to this idea are view-dependent texture maps, which blend multiple texture maps on a 3D model’s surface.

13.1 View Interpolation

(6)

13.1 View Interpolation

(7)

 View-dependent texture maps are closely related to view interpolation.

 Instead of associating a separate depth map with each input image, a single 3D model is created for the scene, but different images are used as texture map sources depending on the virtual camera’s current position.

13.1.1 View-Dependent Texture Maps

(8)

13.1.1 View-Dependent Texture Maps

(9)

 Given a new virtual camera position, the similarity of this camera’s view of each polygon (or pixel) is compared with that of potential source images.

 The images are then blended using a weighting that is inversely proportional to the angles between the virtual view and the source views (Figure 13.3a).

13.1.1 View-Dependent Texture Maps
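The inverse-angle weighting described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function names, the `eps` guard against division by zero, and the use of one viewing direction per camera are all assumptions made for the example.

```python
import math

def _unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def view_blend_weights(virtual_dir, source_dirs, eps=1e-6):
    """Weights inversely proportional to the angle between the virtual
    viewing direction and each source viewing direction (hypothetical helper)."""
    v = _unit(virtual_dir)
    raw = []
    for d in source_dirs:
        s = _unit(d)
        cos_a = max(-1.0, min(1.0, sum(a * b for a, b in zip(v, s))))
        angle = math.acos(cos_a)
        raw.append(1.0 / (angle + eps))  # eps (assumed) avoids dividing by zero
    total = sum(raw)
    return [w / total for w in raw]      # normalize so the weights sum to 1

def blend_colors(weights, colors):
    """Weighted average of per-source colors for one polygon or pixel."""
    return [sum(w * c[k] for w, c in zip(weights, colors))
            for k in range(len(colors[0]))]
```

A source view that looks at the surface from nearly the same direction as the virtual camera dominates the blend, and two source views at equal angles contribute equally.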

(10)

 Even though the geometric model can be fairly coarse, blending between different views gives a strong sense of more detailed geometry because of the parallax (visual motion) between corresponding pixels.

13.1.1 View-Dependent Texture Maps

(11)

13.1.2 Photo Tourism

(12)

 13.2.1 Impostors, Sprites, and Layers

13.2 Layered Depth Images

(13)

 When a depth map is warped to a novel view, holes and cracks inevitably appear behind the foreground objects.

 One way to alleviate this problem is to keep several depth and color values (depth pixels) at every pixel in a reference image (or at least for pixels near foreground–background transitions) (Figure 13.5).

13.2 Layered Depth Images

(14)

 The resulting data structure, which is called a Layered Depth Image (LDI), can be used to render new views using a back-to-front forward warping algorithm (Shade, Gortler, He et al. 1998).

13.2 Layered Depth Images
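The idea of storing several depth pixels per location and painting them back to front can be sketched on a single scanline. This is a toy illustration under stated assumptions: the `baseline / depth` parallax model, the function name, and the explicit depth sort are simplifications chosen for the example; the published algorithm obtains a back-to-front order without sorting.

```python
def render_ldi_scanline(ldi, baseline, width, background=None):
    """Forward-warp a 1D layered depth image, drawing far samples first.

    ldi      : list over pixels; each entry is a list of (depth, color) samples
    baseline : hypothetical camera shift; a sample moves baseline / depth pixels
    """
    samples = []
    for x, layers in enumerate(ldi):
        for depth, color in layers:
            samples.append((depth, x, color))
    # Back to front: the farthest samples are painted first, nearer ones overwrite.
    samples.sort(key=lambda s: -s[0])
    out = [background] * width
    for depth, x, color in samples:
        x_new = x + int(round(baseline / depth))
        if 0 <= x_new < width:
            out[x_new] = color
    return out

# A foreground object "F" (depth 1) in front of a background "B" (depth 2).
single_layer = [[(2.0, "B")], [(2.0, "B")], [(1.0, "F")],
                [(1.0, "F")], [(2.0, "B")], [(2.0, "B")]]
# The LDI also stores the background hidden behind the foreground pixels.
layered = [[(2.0, "B")], [(2.0, "B")], [(1.0, "F"), (2.0, "B")],
           [(1.0, "F"), (2.0, "B")], [(2.0, "B")], [(2.0, "B")]]
```

Warping `single_layer` with `baseline=2.0` leaves an unfilled pixel where the foreground has moved away from the background, while warping `layered` fills that pixel from the stored hidden samples.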

(15)

13.2 Layered Depth Images

(16)

 An alternative to keeping lists of color-depth values at each pixel, as is done in the LDI, is to organize objects into different layers or sprites.

13.2.1 Impostors, Sprites, and Layers

(17)

13.2.1 Impostors, Sprites, and Layers

(18)

 13.3.1 Unstructured Lumigraph

 13.3.2 Surface Light Fields

 13.3.3 Application: Concentric Mosaics

13.3 Light Fields and Lumigraphs

(19)

13.3 Light Fields and Lumigraphs

 Is it possible to capture and render the appearance of a scene from all possible viewpoints and, if so, what is the complexity of the resulting structure?

 Let us assume that we are looking at a static scene, i.e., one where the objects and illuminants are fixed, and only the observer is moving around.

(20)

13.3 Light Fields and Lumigraphs

 We can describe each image by the location and orientation of the virtual camera (6 DOF).

 If we capture a two-dimensional spherical image around each possible camera location, we can re-render any view from this information.

DOF: degrees of freedom

(21)

13.3 Light Fields and Lumigraphs

 To make the parameterization of the 4D light field (the function giving the color of every ray passing through free space) simpler, let us put two planes in the 3D scene roughly bounding the area of interest, as shown in Figure 13.7a.

 Any light ray terminating at a camera that lives in front of the st plane (assuming that this space is empty) passes through the two planes at (s, t) and (u, v) and can be described by its 4D coordinate (s, t, u, v).
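The two-plane parameterization can be made concrete with a short sketch. It assumes, purely for illustration, that the st and uv planes are parallel and axis-aligned at z = `z_st` and z = `z_uv`; the function names and default plane positions are hypothetical.

```python
def ray_to_stuv(origin, direction, z_st=0.0, z_uv=1.0):
    """Return (s, t, u, v): where a 3D ray crosses the planes z = z_st and z = z_uv."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0:
        raise ValueError("ray is parallel to the two planes")
    a = (z_st - oz) / dz  # ray parameter at the st plane
    b = (z_uv - oz) / dz  # ray parameter at the uv plane
    return (ox + a * dx, oy + a * dy, ox + b * dx, oy + b * dy)

def stuv_to_ray(s, t, u, v, z_st=0.0, z_uv=1.0):
    """Inverse mapping: a point on the st plane plus the direction toward (u, v)."""
    origin = (s, t, z_st)
    direction = (u - s, v - t, z_uv - z_st)
    return origin, direction
```

Converting a ray to (s, t, u, v) and back yields a ray along the same line, which is why four numbers suffice as long as the space between the planes is empty.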

(22)

13.3 Light Fields and Lumigraphs

(23)

13.3 Light Fields and Lumigraphs

 This diagram (and parameterization) can be interpreted as describing a family of cameras living on the st plane with their image planes being the uv plane.

 The uv plane can be placed at infinity, which corresponds to all the virtual cameras looking in the same direction.

(24)

13.3 Light Fields and Lumigraphs

 While a light field can be used to render a complex 3D scene from novel viewpoints, a much better rendering (with less ghosting) can be obtained if something is known about its 3D geometry.

 The Lumigraph system of Gortler, Grzeszczuk, Szeliski et al. (1996) extends the basic light field rendering approach by taking into account the 3D location of surface points corresponding to each 3D ray.

(25)

13.3 Light Fields and Lumigraphs

(26)

13.3 Light Fields and Lumigraphs

 Consider the ray (s, u) corresponding to the dashed line in Figure 13.8, which intersects the object’s surface at a distance z from the uv plane.

 Instead of using quadrilinear interpolation of the nearest sampled (s, t, u, v) values around a given ray to determine its color, the (u, v) values are modified for each discrete (s_i, t_i) camera.
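The modification of u for a neighboring camera s_i can be worked out with similar triangles in the 2D (s, u) slice. The sketch below assumes parallel planes separated by a distance `d`, with z measured from the uv plane toward the st plane; these conventions and the function name are assumptions for illustration, and the same formula would apply to v with t in place of s.

```python
def depth_corrected_u(s, u, s_i, z, d=1.0):
    """u' such that the ray (s_i, u') passes through the surface point hit by ray (s, u).

    d : assumed distance between the st and uv planes
    z : distance of the surface point from the uv plane (must be less than d)
    """
    if z >= d:
        raise ValueError("surface point must lie beyond the st plane, i.e. z < d")
    # Similar triangles: the offset (s - s_i) on the camera plane is scaled by z / (d - z).
    return u + (s - s_i) * z / (d - z)
```

When z is 0 the surface lies on the uv plane and no correction is needed; the correction grows as the surface moves toward the camera plane.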

(27)

13.3 Light Fields and Lumigraphs

(28)

13.3.1 Unstructured Lumigraph

 When the images in a Lumigraph are acquired in an unstructured (irregular) manner, it can be counterproductive to resample the resulting light rays into a regularly binned (s, t, u, v) data structure.

 This is both because resampling always introduces a certain amount of aliasing and because the resulting gridded light field can be populated very sparsely or irregularly.

(29)

13.3.1 Unstructured Lumigraph

 The alternative is to render directly from the acquired images, by finding for each light ray in a virtual camera the closest pixels in the original images.

 The Unstructured Lumigraph Rendering (ULR) system of Buehler, Bosse, McMillan et al. (2001) describes how to select such pixels by combining a number of fidelity criteria, including epipole consistency (distance of rays to a source camera’s center), angular deviation (similar incidence direction on the surface), resolution (similar sampling density along the surface), continuity (to nearby pixels), and consistency (along the ray).
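One way to combine several fidelity criteria is to fold them into a single penalty per source camera and blend only the lowest-penalty cameras. The sketch below is a simplified illustration in that spirit, not the published formulation: the penalty terms, the `alpha` and `beta` factors, and the cutoff rule are all assumptions.

```python
def combined_penalty(angle, resolution_mismatch, alpha=1.0, beta=0.5):
    """Fold two criteria into one penalty (alpha and beta are assumed tuning factors)."""
    return alpha * angle + beta * resolution_mismatch

def blend_weights(penalties, k=2):
    """Normalized weights that fall smoothly to zero at a cutoff penalty.

    The cutoff is the (k+1)-th smallest penalty, so at most k cameras contribute.
    """
    ranked = sorted(penalties)
    thresh = ranked[min(k, len(ranked) - 1)]
    raw = [max(0.0, 1.0 - p / thresh) if thresh > 0 else 0.0 for p in penalties]
    total = sum(raw)
    if total == 0.0:
        # Degenerate case (e.g. a single camera): use the lowest-penalty view only.
        best = penalties.index(min(penalties))
        return [1.0 if i == best else 0.0 for i in range(len(penalties))]
    return [w / total for w in raw]
```

Because a camera's weight reaches zero exactly when its penalty reaches the cutoff, cameras enter and leave the blend without a visible jump as the virtual viewpoint moves.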

(30)

13.3.2 Surface Light Fields

 If we know the 3D shape of the object or scene whose light field is being modeled, we can effectively compress the field because nearby rays emanating from nearby surface elements have similar color values.

(31)

13.3.2 Surface Light Fields

 These observations underlie the surface light field representation introduced by Wood, Azuma, Aldinger et al. (2000).

 In their system, an accurate 3D model of the object being represented is built.

(32)

13.3.2 Surface Light Fields

 The Lumisphere of all rays emanating from each surface point is estimated or captured (Figure 13.9).

 Nearby Lumispheres will be highly correlated and hence amenable to both compression and manipulation.

(33)

13.3.2 Surface Light Fields

(34)

13.3.2 Surface Light Fields

 To estimate the diffuse component of each Lumisphere, a median filtering over all visible exiting directions is first performed for each channel.

 Once this has been subtracted from the Lumisphere, the remaining values, which should consist mostly of the specular components, are reflected around the local surface normal, which turns each Lumisphere into a copy of the local environment around that point.
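The two steps above (per-channel median, then reflection about the normal) can be sketched for a single surface point. The sampled representation of a Lumisphere as parallel lists of unit directions and RGB colors, and the function name, are assumptions for this example.

```python
from statistics import median

def split_lumisphere(directions, colors, normal):
    """Separate one Lumisphere into a diffuse color and reflected specular samples.

    directions : unit exiting directions (assumed sampling of the Lumisphere)
    colors     : one color per direction, same channel count throughout
    normal     : unit surface normal at this point
    """
    channels = len(colors[0])
    # Diffuse estimate: per-channel median over all visible exiting directions.
    diffuse = [median(c[k] for c in colors) for k in range(channels)]
    # What remains after subtraction should be mostly specular.
    residual = [[c[k] - diffuse[k] for k in range(channels)] for c in colors]
    # Reflect each exiting direction about the normal: r = 2 (n . d) n - d.
    reflected = []
    for d in directions:
        dot = sum(a * b for a, b in zip(d, normal))
        reflected.append([2.0 * dot * n - a for a, n in zip(d, normal)])
    return diffuse, residual, reflected
```

The median is robust to a narrow highlight: a few bright specular samples do not shift the diffuse estimate, so they survive in the residual and are indexed by the direction the reflected light came from.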

(35)

13.3.2 Surface Light Fields

(36)

13.4 Environment Mattes

 What if, instead of moving around a virtual camera, we take a complex, refractive object, such as the water goblet shown in Figure 13.10, and place it in front of a new background?

 Instead of modeling the 4D space of rays emanating from a scene, we now need to model how each pixel in our view of this object refracts incident light coming from its environment.

(37)

13.4 Environment Mattes

(38)

13.4 Environment Mattes

 If we assume that other objects and illuminants are sufficiently distant (the same assumption we made for surface light fields), a 4D mapping from each pixel to an incident light direction captures all the information between a refractive object and its environment.

 Zongker, Werner, Curless et al. (1999) call such a representation an environment matte, since it generalizes the process of object matting to not only cut and paste an object from one image into another but also take into account the subtle refractive or reflective interplay between the object and its environment.
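The per-pixel mapping can be illustrated with a deliberately simplified, point-sampled matte: each pixel stores its own foreground color, one environment location it gathers light from, and a weight. The dictionary layout, field names, and the single-sample lookup are assumptions made for this sketch; a full environment matte would integrate over a region of the environment rather than read one sample.

```python
def composite_with_matte(matte, new_env):
    """Re-render an object in front of a new environment image.

    matte   : 2D grid of dicts with hypothetical keys
              "foreground" (color), "weight" (scalar), "source" ((row, col) in new_env)
    new_env : 2D grid of colors for the new background or environment
    """
    out = []
    for row in matte:
        out_row = []
        for px in row:
            y, x = px["source"]  # where this pixel gathers its light from
            env = new_env[y][x]
            out_row.append([f + px["weight"] * e
                            for f, e in zip(px["foreground"], env)])
        out.append(out_row)
    return out
```

Because the lookup locations are stored per pixel, swapping `new_env` for a different image re-renders the refraction pattern without re-capturing the object.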



(39)

13.4.1 Higher-Dimensional Light Fields

 An environment matte in principle maps every pixel into a 4D distribution over light rays and is, hence, a six-dimensional representation.

 If we want to handle dynamic light fields, we need to add another temporal dimension.

 Similarly, if we want a continuous distribution over wavelengths, this becomes another dimension.



(40)

13.4.2 The Modeling to Rendering Continuum

 The image-based rendering representations and algorithms we have studied in this chapter span a continuum ranging from classic 3D texture-mapped models all the way to pure sampled ray-based representations such as light fields.

(41)

13.4.2 The Modeling to Rendering Continuum

(42)

13.4.2 The Modeling to Rendering Continuum

 Representations such as view-dependent texture maps and Lumigraphs still use a single global geometric model, but select the colors to map onto these surfaces from nearby images.

 The best choice of representation and rendering algorithm depends on both the quantity and quality of the input imagery as well as the intended application.

(43)

13.5 Video-Based Rendering

 A fair amount of work has been done in the area of video-based rendering and video-based animation, two terms first introduced by Schödl, Szeliski, Salesin et al. (2000) to denote the process of generating new video sequences from captured video footage.

(44)

13.5 Video-Based Rendering

 We start with video-based animation (Section 13.5.1), in which video footage is re-arranged or modified, e.g., in the capture and re-rendering of facial expressions.

 Next, we turn our attention to 3D video (Section 13.5.4), in which multiple synchronized video cameras are used to film a scene from different directions.

(45)

13.5.1 Video-Based Animation

 An early example of video-based animation is Video Rewrite, in which frames from original video footage are rearranged in order to match them to novel spoken utterances, e.g., for movie dubbing (Figure 13.12).

 This is similar in spirit to the way that concatenative speech synthesis systems work (Taylor 2009).

(46)

13.5.1 Video-Based Animation

(47)

13.5.2 Video Textures

 A video texture is a short video clip that can be arbitrarily extended by re-arranging video frames while preserving visual continuity (Schödl, Szeliski, Salesin et al. 2000).

 The simplest approach is to match frames by visual similarity and to jump between frames that appear similar.
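Matching frames by visual similarity can be sketched with a plain distance between frames. The version below assumes one common formulation: a jump from frame i to frame j looks continuous when frame j resembles the frame that would normally follow i. Frames are treated as flat lists of pixel values, and the function names and `min_gap` parameter are hypothetical.

```python
import math

def frame_distance(a, b):
    """Euclidean (L2) distance between two frames given as flat pixel lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def best_jump(frames, i, min_gap=1):
    """Frame index to jump to after showing frame i (requires i + 1 < len(frames)).

    Picks the frame most similar to the natural successor frames[i + 1],
    excluding indices closer than min_gap to that successor.
    """
    if i + 1 >= len(frames):
        raise ValueError("frame i has no successor to compare against")
    successor = frames[i + 1]
    candidates = [j for j in range(len(frames)) if abs(j - (i + 1)) >= min_gap]
    return min(candidates, key=lambda j: frame_distance(successor, frames[j]))
```

To extend a clip indefinitely, one could play it up to the second-to-last frame and then continue from `best_jump(frames, len(frames) - 2)` instead of running off the end.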

(48)

(49)

13.5.4 3D Videos

 In recent years, the popularity of 3D movies has grown dramatically, with recent releases ranging from Hannah Montana, through U2’s 3D concert movie, to James Cameron’s Avatar.

 Currently, such releases are filmed using stereoscopic camera rigs and displayed in theaters (or at home) to viewers wearing polarized glasses.

(50)

13.5.4 3D Videos

 The stereo matching techniques developed in the computer vision community along with image-based rendering (view interpolation) techniques from graphics are both essential components in such scenarios, which are sometimes called free-viewpoint video (Carranza, Theobalt, Magnor et al. 2003) or virtual viewpoint video (Zitnick, Kang, Uyttendaele et al. 2004).

(51)

13.5.4 3D Videos

(52)

13.5.4 3D Videos
