
In document Computer Vision: (pages 185-192)


3.6 Geometric transformations

3.6.1 Parametric transformations

Parametric transformations apply a global deformation to an image, where the behavior of the transformation is controlled by a small number of parameters. Figure 3.45 shows a few examples of such transformations, which are based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.

Transformation       Matrix            # DoF   Preserves
translation          [ I | t ]  2×3      2     orientation
rigid (Euclidean)    [ R | t ]  2×3      3     lengths
similarity           [ sR | t ] 2×3      4     angles
affine               [ A ]      2×3      6     parallelism
projective           [ H̃ ]      3×3      8     straight lines

Table 3.5 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2×3 matrices are extended with a third [0ᵀ 1] row to form a full 3 × 3 matrix for homogeneous coordinate transformations.

In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)?

Think about this for a minute before proceeding and see if you can figure it out.

If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?

procedure forwardWarp(f, h, out g):

    For every pixel x in f(x)

        1. Compute the destination location x′ = h(x).
        2. Copy the pixel f(x) to g(x′).

Algorithm 3.1 Forward warping algorithm for transforming an image f(x) into an image g(x′) through the parametric transform x′ = h(x).
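For concreteness, the rounding variant of this procedure can be sketched in Python (a hypothetical minimal implementation; the per-pixel loop is for exposition only, and `h` is any callable returning the destination coordinates):

```python
import numpy as np

def forward_warp(f, h):
    """Forward-warp image f through x' = h(x), rounding to the nearest pixel."""
    g = np.zeros_like(f)
    height, width = f.shape
    for y in range(height):
        for x in range(width):
            xp, yp = h(x, y)                          # destination location x' = h(x)
            xi, yi = int(round(xp)), int(round(yp))
            if 0 <= xi < width and 0 <= yi < height:  # clip to destination bounds
                g[yi, xi] = f[y, x]
    return g
```

The rounding step is precisely what introduces the aliasing and the unfilled "holes" discussed next.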


Figure 3.46 Forward warping algorithm: (a) a pixel f(x) is copied to its corresponding location x′ = h(x) in image g(x′); (b) detail of the source and destination pixel locations.

In fact, this approach suffers from several limitations. The process of copying a pixel f(x) to a location x′ in g is not well defined when x′ has a non-integer value. What do we do in such a case? What would you do?

You can round the value of x′ to the nearest integer coordinate and copy the pixel there, but the resulting image has severe aliasing and pixels that jump around a lot when animating the transformation. You can also “distribute” the value among its four nearest neighbors in a weighted (bilinear) fashion, keeping track of the per-pixel weights and normalizing at the end. This technique is called splatting and is sometimes used for volume rendering in the graphics community (Levoy and Whitted 1985; Levoy 1988; Westover 1989; Rusinkiewicz and Levoy 2000). Unfortunately, it suffers from both moderate amounts of aliasing and a fair amount of blur (loss of high-resolution detail).
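A minimal sketch of such bilinear splatting (function and variable names are illustrative; the weight accumulation and final normalization are the essential steps):

```python
import numpy as np

def splat(f, h, out_shape):
    """Forward warp that distributes each source pixel over its four nearest
    destination neighbors, then normalizes by the accumulated weights."""
    acc = np.zeros(out_shape)
    wgt = np.zeros(out_shape)
    height, width = f.shape
    for y in range(height):
        for x in range(width):
            xp, yp = h(x, y)
            x0, y0 = int(np.floor(xp)), int(np.floor(yp))
            ax, ay = xp - x0, yp - y0
            # bilinear weights for the four destination neighbors
            for dy, dx, w in ((0, 0, (1 - ax) * (1 - ay)),
                              (0, 1, ax * (1 - ay)),
                              (1, 0, (1 - ax) * ay),
                              (1, 1, ax * ay)):
                yi, xi = y0 + dy, x0 + dx
                if 0 <= xi < out_shape[1] and 0 <= yi < out_shape[0]:
                    acc[yi, xi] += w * f[y, x]
                    wgt[yi, xi] += w
    return np.divide(acc, wgt, out=np.zeros(out_shape), where=wgt > 0)
```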

The second major problem with forward warping is the appearance of cracks and holes, especially when magnifying an image. Filling such holes with their nearby neighbors can lead to further aliasing and blurring.

What can we do instead? A preferable solution is to use inverse warping (Algorithm 3.2), where each pixel in the destination image g(x′) is sampled from the original image f(x) (Figure 3.47).

How does this differ from the forward warping algorithm? For one thing, since ĥ(x′) is (presumably) defined for all pixels in g(x′), we no longer have holes. More importantly, resampling an image at non-integer locations is a well-studied problem (general image interpolation, see Section 3.5.2) and high-quality filters that control aliasing can be used.

Where does the function ĥ(x′) come from? Quite often, it can simply be computed as the inverse of h(x). In fact, all of the parametric transforms listed in Table 3.5 have closed-form solutions for the inverse transform: simply take the inverse of the 3 × 3 matrix specifying the transform.
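For example, for a similarity transform (the parameter values below are made up for illustration), the inverse warp ĥ is just the inverse of the extended 3 × 3 matrix:

```python
import numpy as np

# Build a similarity transform [sR | t] extended to a full 3x3 matrix.
theta, s, t = np.deg2rad(30.0), 1.5, np.array([10.0, 5.0])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M = np.eye(3)
M[:2, :2] = s * R          # scaled rotation sR
M[:2, 2] = t               # translation t

M_inv = np.linalg.inv(M)   # the inverse transform, used by inverse warping

# Round-tripping a homogeneous 2D point recovers the original location.
x = np.array([3.0, 4.0, 1.0])
assert np.allclose(M_inv @ (M @ x), x)
```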

In other cases, it is preferable to formulate the problem of image warping as that of resampling a source image f(x) given a mapping x = ĥ(x′) from destination pixels x′ to source pixels x. For example, in optical flow (Section 8.4), we estimate the flow field as the

procedure inverseWarp(f, h, out g):

    For every pixel x′ in g(x′)

        1. Compute the source location x = ĥ(x′).
        2. Resample f(x) at location x and copy to g(x′).

Algorithm 3.2 Inverse warping algorithm for creating an image g(x′) from an image f(x) using the parametric transform x′ = h(x).
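A sketch of Algorithm 3.2 with bilinear resampling (illustrative only; a production implementation would use the higher-quality filters of Section 3.5.2 and more careful boundary handling):

```python
import numpy as np

def bilinear_sample(f, x, y):
    """Sample f at a non-integer location (x, y) by bilinear interpolation."""
    height, width = f.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    if x0 < 0 or y0 < 0 or x0 + 1 >= width or y0 + 1 >= height:
        return 0.0                                # outside the source image
    ax, ay = x - x0, y - y0
    top = (1 - ax) * f[y0, x0] + ax * f[y0, x0 + 1]
    bot = (1 - ax) * f[y0 + 1, x0] + ax * f[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bot

def inverse_warp(f, h_hat, out_shape):
    """For every destination pixel x', resample f at x = h_hat(x')."""
    g = np.zeros(out_shape)
    for yp in range(out_shape[0]):
        for xp in range(out_shape[1]):
            x, y = h_hat(xp, yp)                  # source location
            g[yp, xp] = bilinear_sample(f, x, y)
    return g
```

Note that every destination pixel receives a value, so no holes can appear, in contrast to forward warping.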

Figure 3.47 Inverse warping algorithm: (a) a pixel g(x′) is sampled from its corresponding location x = ĥ(x′) in image f(x); (b) detail of the source and destination pixel locations.

location of the source pixel which produced the current pixel whose flow is being estimated, as opposed to computing the destination pixel to which it is going. Similarly, when correcting for radial distortion (Section 2.1.6), we calibrate the lens by computing for each pixel in the final (undistorted) image the corresponding pixel location in the original (distorted) image.

What kinds of interpolation filters are suitable for the resampling process? Any of the filters we studied in Section 3.5.2 can be used, including nearest neighbor, bilinear, bicubic, and windowed sinc functions. While bilinear is often used for speed (e.g., inside the inner loop of a patch-tracking algorithm, see Section 8.1.3), bicubic and windowed sinc are preferable where visual quality is important.

To compute the value of f(x) at a non-integer location x, we simply apply our usual FIR resampling filter,

g(x, y) = Σ_{k,l} f(k, l) h(x − k, y − l),    (3.89)

where (x, y) are the sub-pixel coordinate values and h(x, y) is some interpolating or smoothing kernel. Recall from Section 3.5.2 that when decimation is being performed, the smoothing kernel is stretched and re-scaled according to the downsampling rate r.
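In one dimension, (3.89) with a linear “tent” kernel looks like the following (the kernel choice is illustrative; any of the interpolators from Section 3.5.2 could be substituted for h):

```python
import numpy as np

def tent(t):
    """Linear-interpolation ('tent') kernel, one possible choice of h."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def resample_1d(f, x):
    """Evaluate g(x) = sum_k f(k) h(x - k) at a sub-pixel coordinate x."""
    k = np.arange(len(f))
    return float(np.sum(f * tent(x - k)))
```

With f = [0, 2, 4], resample_1d(f, 0.5) returns 1.0, the midpoint of the first two samples.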

Unfortunately, for a general (non-zoom) image transformation, the resampling rate r is not well defined. Consider a transformation that stretches the x dimension while squashing


Figure 3.48 Anisotropic texture filtering: (a) Jacobian of transform A and the induced horizontal and vertical resampling rates {ax′x, ax′y, ay′x, ay′y}; (b) elliptical footprint of an EWA smoothing kernel; (c) anisotropic filtering using multiple samples along the major axis. Image pixels lie at line intersections.

the y dimension. The resampling kernel should be performing regular interpolation along the x dimension and smoothing (to anti-alias the blurred image) in the y direction. This gets even more complicated for the case of general affine or perspective transforms.

What can we do? Fortunately, Fourier analysis can help. The two-dimensional generalization of the one-dimensional domain scaling law given in Table 3.1 is

g(Ax) ⇔ |A|⁻¹ G(A⁻ᵀ f).    (3.90)

For all of the transforms in Table 3.5 except perspective, the matrix A is already defined.

For perspective transformations, the matrix A is the linearized derivative of the perspective transformation (Figure 3.48a), i.e., the local affine approximation to the stretching induced by the projection (Heckbert 1986; Wolberg 1990; Gomes, Darsa, Costa et al. 1999; Akenine-Möller and Haines 2002).
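This local affine approximation is simply the Jacobian of the homography at the point of interest, which has a closed form; a sketch, assuming the usual 3 × 3 matrix H acting on homogeneous coordinates:

```python
import numpy as np

def local_affine(H, x, y):
    """Jacobian of the perspective transform given by H, evaluated at (x, y)."""
    p = np.array([x, y, 1.0])
    w = H[2] @ p                       # homogeneous scale factor
    xp, yp = (H[:2] @ p) / w           # transformed point (x', y')
    A = np.empty((2, 2))
    A[0] = (H[0, :2] - xp * H[2, :2]) / w   # d(x')/d(x, y)
    A[1] = (H[1, :2] - yp * H[2, :2]) / w   # d(y')/d(x, y)
    return A
```

For an affine H (bottom row [0 0 1]) this reduces to the upper-left 2 × 2 block, as expected.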

To prevent aliasing, we need to pre-filter the image f(x) with a filter whose frequency response is the projection of the final desired spectrum through the A⁻ᵀ transform (Szeliski, Winder, and Uyttendaele 2010). In general (for non-zoom transforms), this filter is non-separable and hence is very slow to compute. Therefore, a number of approximations to this filter are used in practice, including MIP-mapping, elliptically weighted Gaussian averaging, and anisotropic filtering (Akenine-Möller and Haines 2002).

MIP-mapping

MIP-mapping was first proposed by Williams (1983) as a means to rapidly pre-filter images being used for texture mapping in computer graphics. A MIP-map^18 is a standard image pyramid (Figure 3.32), where each level is pre-filtered with a high-quality filter rather than a poorer quality approximation, such as Burt and Adelson's (1983b) five-tap binomial. To resample an image from a MIP-map, a scalar estimate of the resampling rate r is first computed. For example, r can be the maximum of the absolute values in A (which suppresses aliasing) or it can be the minimum (which reduces blurring). Akenine-Möller and Haines (2002) discuss these issues in more detail.

^18 The term ‘MIP’ stands for multum in parvo, meaning ‘many things in a small place’.

Once a resampling rate has been specified, a fractional pyramid level is computed using the base 2 logarithm,

l = log2 r.    (3.91)

One simple solution is to resample the texture from the next higher or lower pyramid level, depending on whether it is preferable to reduce aliasing or blur. A better solution is to resample both images and blend them linearly using the fractional component of l. Since most MIP-map implementations use bilinear resampling within each level, this approach is usually called trilinear MIP-mapping. Computer graphics rendering APIs, such as OpenGL and Direct3D, have parameters that can be used to select which variant of MIP-mapping (and of the sampling rate r computation) should be used, depending on the desired tradeoff between speed and quality. Exercise 3.22 has you examine some of these tradeoffs in more detail.
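A sketch of the level selection and trilinear blend (using the conservative max-of-|A| rate described above; the pyramid itself and the bilinear per-level lookups are omitted):

```python
import numpy as np

def mip_level(A):
    """Fractional MIP-map level l = log2(r), with r = max |a_ij| (eq. 3.91)."""
    r = np.max(np.abs(A))
    return np.log2(r)

def trilinear_blend_levels(level):
    """Split a fractional level into the two integer levels and a blend weight:
    result = (1 - frac) * pyramid[lo] + frac * pyramid[lo + 1]."""
    lo = int(np.floor(level))
    frac = level - lo
    return lo, lo + 1, frac
```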

Elliptical Weighted Average

The Elliptical Weighted Average (EWA) filter invented by Greene and Heckbert (1986) is based on the observation that the affine mapping x = Ax′ defines a skewed two-dimensional coordinate system in the vicinity of each source pixel x (Figure 3.48a). For every destination pixel x′, the ellipsoidal projection of a small pixel grid in x′ onto x is computed (Figure 3.48b). This is then used to filter the source image f(x) with a Gaussian whose inverse covariance matrix is this ellipsoid.

Despite its reputation as a high-quality filter (Akenine-Möller and Haines 2002), we have found in our work (Szeliski, Winder, and Uyttendaele 2010) that because a Gaussian kernel is used, the technique suffers simultaneously from both blurring and aliasing, compared to higher-quality filters. The EWA is also quite slow, although faster variants based on MIP-mapping have been proposed (Szeliski, Winder, and Uyttendaele (2010) provide some additional references).

Anisotropic filtering

An alternative approach to filtering oriented textures, which is sometimes implemented in graphics hardware (GPUs), is to use anisotropic filtering (Barkans 1997; Akenine-Möller and Haines 2002). In this approach, several samples at different resolutions (fractional levels in the MIP-map) are combined along the major axis of the EWA Gaussian (Figure 3.48c).


[Signal-processing pipeline: interpolate (∗ h1(x)) → warp (ax + t) → filter (∗ h2(x)) → sample (∗ δ(x)).]

Figure 3.49 One-dimensional signal resampling (Szeliski, Winder, and Uyttendaele 2010): (a) original sampled signal f(i); (b) interpolated signal g1(x); (c) warped signal g2(x); (d) filtered signal g3(x); (e) sampled signal f′(i). The corresponding spectra are shown below the signals, with the aliased portions shown in red.

Multi-pass transforms

The optimal approach to warping images without excessive blurring or aliasing is to adaptively pre-filter the source image at each pixel using an ideal low-pass filter, i.e., an oriented skewed sinc or low-order (e.g., cubic) approximation (Figure 3.48a). Figure 3.49 shows how this works in one dimension. The signal is first (theoretically) interpolated to a continuous waveform, (ideally) low-pass filtered to below the new Nyquist rate, and then re-sampled to the final desired resolution. In practice, the interpolation and decimation steps are concatenated into a single polyphase digital filtering operation (Szeliski, Winder, and Uyttendaele 2010).

For parametric transforms, the oriented two-dimensional filtering and resampling operations can be approximated using a series of one-dimensional resampling and shearing transforms (Catmull and Smith 1980; Heckbert 1989; Wolberg 1990; Gomes, Darsa, Costa et al. 1999; Szeliski, Winder, and Uyttendaele 2010). The advantage of using a series of one-dimensional transforms is that they are much more efficient (in terms of basic arithmetic operations) than large, non-separable, two-dimensional filter kernels.
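One well-known factorization of this kind (often attributed to Paeth, in the spirit of Catmull and Smith's separable passes) decomposes a 2D rotation into three one-dimensional shears; the sketch below shows only the matrix identity, not the per-pass resampling:

```python
import numpy as np

def rotation_as_shears(theta):
    """Express a 2D rotation as shear_x @ shear_y @ shear_x."""
    t = np.tan(theta / 2.0)
    s = np.sin(theta)
    shear_x = np.array([[1.0, -t], [0.0, 1.0]])   # horizontal (x) shear
    shear_y = np.array([[1.0, 0.0], [s, 1.0]])    # vertical (y) shear
    return shear_x @ shear_y @ shear_x

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(rotation_as_shears(theta), R)
```

Each shear moves pixels only along rows or only along columns, so it can be implemented as a bank of one-dimensional resampling operations.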

In order to prevent aliasing, however, it may be necessary to upsample in the opposite direction before applying a shearing transformation (Szeliski, Winder, and Uyttendaele 2010).

Figure 3.50 shows this process for a rotation, where a vertical upsampling stage is added before the horizontal shearing (and upsampling) stage. The upper image shows the appearance of the letter being rotated, while the lower image shows its corresponding Fourier transform.

Figure 3.50 Four-pass rotation (Szeliski, Winder, and Uyttendaele 2010): (a) original pixel grid, image, and its Fourier transform; (b) vertical upsampling; (c) horizontal shear and upsampling; (d) vertical shear and downsampling; (e) horizontal downsampling. The general affine case looks similar except that the first two stages perform general resampling.
