
Geometric primitives

In the document Computer Vision: (pages 54–80)


2.1 Geometric primitives and transformations

2.1.1 Geometric primitives

Geometric primitives form the basic building blocks used to describe three-dimensional shapes.

In this section, we introduce points, lines, and planes. Later sections of the book discuss curves (Sections5.1and11.2), surfaces (Section12.3), and volumes (Section12.5).

2D points. 2D points (pixel coordinates in an image) can be denoted using a pair of values, x = (x, y) ∈ R², or alternatively,

x = [ x ]
    [ y ] .    (2.1)

(As stated in the introduction, we use the (x1, x2, . . .) notation to denote column vectors.) 2D points can also be represented using homogeneous coordinates, x̃ = (x̃, ỹ, w̃) ∈ P², where vectors that differ only by scale are considered to be equivalent. P² = R³ − (0, 0, 0) is called the 2D projective space.

A homogeneous vector x̃ can be converted back into an inhomogeneous vector x by dividing through by the last element w̃, i.e.,

x̃ = (x̃, ỹ, w̃) = w̃(x, y, 1) = w̃ x̄,    (2.2)

where x̄ = (x, y, 1) is the augmented vector. Homogeneous points whose last element is w̃ = 0 are called ideal points or points at infinity and do not have an equivalent inhomogeneous representation.
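In code, the conversion in (2.2) is a small sketch (the helper names below are ours, not the book's), with the special case of ideal points made explicit:

```python
def from_homogeneous(xt, yt, wt):
    """Convert a homogeneous 2D point (x~, y~, w~) to inhomogeneous (x, y).

    Ideal points (w~ = 0) have no inhomogeneous equivalent."""
    if wt == 0:
        raise ValueError("point at infinity has no inhomogeneous representation")
    return (xt / wt, yt / wt)

def to_augmented(x, y):
    """Return the augmented vector (x, y, 1)."""
    return (x, y, 1.0)
```

Any nonzero scaling of a homogeneous point maps back to the same inhomogeneous point, e.g., (4, 6, 2) and (2, 3, 1) both give (2, 3).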

2D lines. 2D lines can also be represented using homogeneous coordinates, l̃ = (a, b, c). The corresponding line equation is

x̄ · l̃ = ax + by + c = 0.    (2.3)

We can normalize the line equation vector so that l = (n̂x, n̂y, d) = (n̂, d) with ‖n̂‖ = 1. In this case, n̂ is the normal vector perpendicular to the line and d is its distance to the origin (Figure 2.2). (The one exception to this normalization is the line at infinity, l̃ = (0, 0, 1), which includes all (ideal) points at infinity.)

We can also express n̂ as a function of rotation angle θ, n̂ = (n̂x, n̂y) = (cos θ, sin θ) (Figure 2.2a). This representation is commonly used in the Hough transform line-finding algorithm, which is discussed in Section 4.3.2. The combination (θ, d) is also known as polar coordinates.

Figure 2.2 (a) 2D line equation and (b) 3D plane equation, expressed in terms of the normal n̂ and distance to the origin d.

When using homogeneous coordinates, we can compute the intersection of two lines as

x̃ = l̃1 × l̃2,    (2.4)

where × is the cross product operator. Similarly, the line joining two points can be written as

l̃ = x̃1 × x̃2.    (2.5)

When trying to fit an intersection point to multiple lines or, conversely, a line to multiple points, least squares techniques (Section 6.1.1 and Appendix A.2) can be used, as discussed in Exercise 2.1.
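Equations (2.4) and (2.5) reduce intersection and joining to a single cross product; a minimal sketch (function names are ours):

```python
def cross(a, b):
    """Cross product of two 3-vectors (homogeneous 2D lines or points)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_lines(l1, l2):
    """Homogeneous intersection point of two 2D lines (2.4)."""
    return cross(l1, l2)

def join_points(x1, x2):
    """Homogeneous line through two 2D points (2.5)."""
    return cross(x1, x2)
```

For example, the lines y = 0, i.e., (0, 1, 0), and x = 1, i.e., (1, 0, −1), intersect in (−1, 0, −1), which normalizes to the point (1, 0); two parallel lines yield an ideal point with w̃ = 0.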

2D conics. There are other algebraic curves that can be expressed with simple polynomial homogeneous equations. For example, the conic sections (so called because they arise as the intersection of a plane and a 3D cone) can be written using a quadric equation

x̃ᵀ Q x̃ = 0.    (2.6)

Quadric equations play useful roles in the study of multi-view geometry and camera calibration (Hartley and Zisserman 2004; Faugeras and Luong 2001) but are not used extensively in this book.

3D points. Point coordinates in three dimensions can be written using inhomogeneous coordinates x = (x, y, z) ∈ R³ or homogeneous coordinates x̃ = (x̃, ỹ, z̃, w̃) ∈ P³. As before, it is sometimes useful to denote a 3D point using the augmented vector x̄ = (x, y, z, 1) with x̃ = w̃ x̄.

Figure 2.3 3D line equation, r = (1 − λ)p + λq.

3D planes. 3D planes can also be represented as homogeneous coordinates m̃ = (a, b, c, d) with a corresponding plane equation

x̄ · m̃ = ax + by + cz + d = 0.    (2.7)

We can also normalize the plane equation as m = (n̂x, n̂y, n̂z, d) = (n̂, d) with ‖n̂‖ = 1. In this case, n̂ is the normal vector perpendicular to the plane and d is its distance to the origin (Figure 2.2b). As with the case of 2D lines, the plane at infinity, m̃ = (0, 0, 0, 1), which contains all the points at infinity, cannot be normalized (i.e., it does not have a unique normal or a finite distance).

We can express n̂ as a function of two angles (θ, φ), i.e., using spherical coordinates,

n̂ = (cos θ cos φ, sin θ cos φ, sin φ),    (2.8)

but these are less commonly used than polar coordinates since they do not uniformly sample the space of possible normal vectors.
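Normalizing a plane and evaluating the signed distance ax + by + cz + d of a point follow directly from (2.7); a small sketch with names of our own choosing:

```python
import math

def normalize_plane(m):
    """Scale m = (a, b, c, d) so the normal (a, b, c) has unit length."""
    a, b, c, d = m
    n = math.sqrt(a * a + b * b + c * c)
    if n == 0:
        raise ValueError("the plane at infinity (0, 0, 0, d) cannot be normalized")
    return (a / n, b / n, c / n, d / n)

def signed_distance(x, m):
    """Signed distance from 3D point x to a normalized plane m = (n^, d)."""
    a, b, c, d = m
    return a * x[0] + b * x[1] + c * x[2] + d
```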

3D lines. Lines in 3D are less elegant than either lines in 2D or planes in 3D. One possible representation is to use two points on the line, (p, q). Any other point on the line can be expressed as a linear combination of these two points,

r = (1 − λ)p + λq,    (2.9)

as shown in Figure 2.3. If we restrict 0 ≤ λ ≤ 1, we get the line segment joining p and q.

If we use homogeneous coordinates, we can write the line as

r̃ = μp̃ + λq̃.    (2.10)

A special case of this is when the second point is at infinity, i.e., q̃ = (d̂x, d̂y, d̂z, 0) = (d̂, 0). Here, we see that d̂ is the direction of the line. We can then re-write the inhomogeneous 3D line equation as

r = p + λd̂.    (2.11)

A disadvantage of the endpoint representation for 3D lines is that it has too many degrees of freedom: six (three for each endpoint) instead of the four degrees that a 3D line truly has. However, if we fix the two points on the line to lie in specific planes, we obtain a representation with four degrees of freedom. For example, if we are representing nearly vertical lines, then z = 0 and z = 1 form two suitable planes, i.e., the (x, y) coordinates in both planes provide the four coordinates describing the line. This kind of two-plane parameterization is used in the light field and Lumigraph image-based rendering systems described in Chapter 13 to represent the collection of rays seen by a camera as it moves in front of an object. The two-endpoint representation is also useful for representing line segments, even when their exact endpoints cannot be seen (only guessed at).

If we wish to represent all possible lines without bias towards any particular orientation, we can use Plücker coordinates (Hartley and Zisserman 2004, Chapter 2; Faugeras and Luong 2001, Chapter 3). These coordinates are the six independent non-zero entries in the 4 × 4 skew symmetric matrix

L = p̃q̃ᵀ − q̃p̃ᵀ,    (2.12)

where p̃ and q̃ are any two (non-identical) points on the line. This representation has only four degrees of freedom, since L is homogeneous and also satisfies det(L) = 0, which results in a quadratic constraint on the Plücker coordinates.

In practice, the minimal representation is not essential for most applications. An adequate model of 3D lines can be obtained by estimating their direction (which may be known ahead of time, e.g., for architecture) and some point within the visible portion of the line (see Section 7.5.1) or by using the two endpoints, since lines are most often visible as finite line segments. However, if you are interested in more details about the topic of minimal line parameterizations, Förstner (2005) discusses various ways to infer and model 3D lines in projective geometry, as well as how to estimate the uncertainty in such fitted models.

3D quadrics. The 3D analog of a conic section is a quadric surface

x̄ᵀ Q x̄ = 0    (2.13)

(Hartley and Zisserman 2004, Chapter 2). Again, while quadric surfaces are useful in the study of multi-view geometry and can also serve as useful modeling primitives (spheres, ellipsoids, cylinders), we do not study them in great detail in this book.

2.1.2 2D transformations

Having defined our basic primitives, we can now turn our attention to how they can be transformed. The simplest transformations occur in the 2D plane and are illustrated in Figure 2.4.

Figure 2.4 Basic set of 2D planar transformations.

Translation. 2D translations can be written as x′ = x + t or

x′ = [ I  t ] x̄,    (2.14)

where I is the (2 × 2) identity matrix, or

x̄′ = [ I   t ] x̄,    (2.15)
      [ 0ᵀ  1 ]

where 0 is the zero vector. Using a 2 × 3 matrix results in a more compact notation, whereas using a full-rank 3 × 3 matrix (which can be obtained from the 2 × 3 matrix by appending a [0ᵀ 1] row) makes it possible to chain transformations using matrix multiplication. Note that in any equation where an augmented vector such as x̄ appears on both sides, it can always be replaced with a full homogeneous vector x̃.
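The chaining property of the full-rank 3 × 3 form (2.15) can be checked in a few lines (function names below are ours):

```python
def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation_matrix(tx, ty):
    """Full-rank 3x3 form of a 2D translation, as in (2.15)."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]
```

Chaining two translations by matrix multiplication simply adds their offsets, as expected for a group that is closed under composition.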

Rotation + translation. This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as x′ = Rx + t or

x′ = [ R  t ] x̄,    (2.16)

where

R = [ cos θ  −sin θ ]    (2.17)
    [ sin θ   cos θ ]

is an orthonormal rotation matrix with RRᵀ = I and |R| = 1.
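A direct transcription of (2.16)–(2.17), with a hypothetical function name:

```python
import math

def rigid_transform(theta, t, x):
    """Apply x' = R x + t for the 2D Euclidean transformation (2.16)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + t[0],
            s * x[0] + c * x[1] + t[1])
```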

Scaled rotation. Also known as the similarity transform, this transformation can be expressed as x′ = sRx + t, where s is an arbitrary scale factor. It can also be written as

x′ = [ sR  t ] x̄ = [ a  −b  tx ] x̄,    (2.18)
                   [ b   a  ty ]

where we no longer require that a² + b² = 1. The similarity transform preserves angles between lines.

Affine. The affine transformation is written as x′ = Ax̄, where A is an arbitrary 2 × 3 matrix, i.e.,

x′ = [ a00  a01  a02 ] x̄.    (2.19)
     [ a10  a11  a12 ]

Parallel lines remain parallel under affine transformations.

Projective. This transformation, also known as a perspective transform or homography, operates on homogeneous coordinates,

x̃′ = H̃ x̃,    (2.20)

where H̃ is an arbitrary 3 × 3 matrix. Note that H̃ is homogeneous, i.e., it is only defined up to a scale, and that two H̃ matrices that differ only by scale are equivalent. The resulting homogeneous coordinate x̃′ must be normalized in order to obtain an inhomogeneous result x, i.e.,

x′ = (h00 x + h01 y + h02) / (h20 x + h21 y + h22)  and  y′ = (h10 x + h11 y + h12) / (h20 x + h21 y + h22).    (2.21)

Perspective transformations preserve straight lines (i.e., they remain straight after the transformation).
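The perspective division in (2.21) can be sketched as follows (row-major H, function name ours):

```python
def apply_homography(H, x, y):
    """Map an inhomogeneous point through a 3x3 homography, then
    divide by the third (homogeneous) coordinate, per (2.21)."""
    xt = H[0][0] * x + H[0][1] * y + H[0][2]
    yt = H[1][0] * x + H[1][1] * y + H[1][2]
    wt = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xt / wt, yt / wt)
```

Scaling H by any nonzero constant leaves the result unchanged, illustrating that H̃ is only defined up to scale.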

Hierarchy of 2D transformations. The preceding set of transformations are illustrated in Figure 2.4 and summarized in Table 2.1. The easiest way to think of them is as a set of (potentially restricted) 3 × 3 matrices operating on 2D homogeneous coordinate vectors. Hartley and Zisserman (2004) contains a more detailed description of the hierarchy of 2D planar transformations.

The above transformations form a nested set of groups, i.e., they are closed under composition and have an inverse that is a member of the same group. (This will be important later when applying these transformations to images in Section 3.6.) Each (simpler) group is a subset of the more complex group below it.

Co-vectors. While the above transformations can be used to transform points in a 2D plane, can they also be used directly to transform a line equation? Consider the homogeneous equation l̃ · x̃ = 0. If we transform x̃′ = H̃ x̃, we obtain

l̃′ · x̃′ = l̃′ᵀ H̃ x̃ = (H̃ᵀ l̃′)ᵀ x̃ = l̃ · x̃ = 0,    (2.22)

i.e., l̃′ = H̃⁻ᵀ l̃. Thus, the action of a projective transformation on a co-vector such as a 2D line or 3D normal can be represented by the transposed inverse of the matrix, which is equivalent to the adjoint of H̃, since projective transformation matrices are homogeneous. Jim Blinn (1998) describes (in Chapters 9 and 10) the ins and outs of notating and manipulating co-vectors.

Transformation       Matrix            # DoF   Preserves
translation          [ I  t ] (2 × 3)    2     orientation
rigid (Euclidean)    [ R  t ] (2 × 3)    3     lengths
similarity           [ sR t ] (2 × 3)    4     angles
affine               [ A ]    (2 × 3)    6     parallelism
projective           [ H̃ ]    (3 × 3)    8     straight lines

Table 2.1 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2 × 3 matrices are extended with a third [0ᵀ 1] row to form a full 3 × 3 matrix for homogeneous coordinate transformations.
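The claim that the adjoint can stand in for H̃⁻ᵀ (lines being defined only up to scale) can be verified numerically. In the sketch below (names are ours), we use the cofactor matrix C of H, which is proportional to H⁻ᵀ, so transformed lines can be computed as l′ = C l without any division by the determinant:

```python
def cofactor3(H):
    """Cofactor matrix C of a 3x3 matrix H; C is proportional to the
    inverse transpose of H, so l' = C l transforms a line (up to scale)."""
    def minor(i, j):
        rows = [H[a] for a in range(3) if a != i]
        m = [[rows[a][b] for b in range(3) if b != j] for a in range(2)]
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[(-1) ** (i + j) * minor(i, j) for j in range(3)] for i in range(3)]

def matvec3(M, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(M[i][k] * v[k] for k in range(3)) for i in range(3))
```

Transforming a point by H and its incident line by cofactor3(H) preserves the incidence relation l′ · x′ = 0.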

While the above transformations are the ones we use most extensively, a number of additional transformations are sometimes used.

Stretch/squash. This transformation changes the aspect ratio of an image,

x′ = sx x + tx
y′ = sy y + ty,

and is a restricted form of an affine transformation. Unfortunately, it does not nest cleanly with the groups listed in Table 2.1.

Planar surface flow. This eight-parameter transformation (Horn 1986; Bergen, Anandan, Hanna et al. 1992; Girod, Greiner, and Niemann 2000),

x′ = a0 + a1 x + a2 y + a6 x² + a7 xy
y′ = a3 + a4 x + a5 y + a7 x² + a6 xy,

arises when a planar surface undergoes a small 3D motion. It can thus be thought of as a small motion approximation to a full homography. Its main attraction is that it is linear in the motion parameters, ak, which are often the quantities being estimated.

Transformation       Matrix            # DoF   Preserves
translation          [ I  t ] (3 × 4)    3     orientation
rigid (Euclidean)    [ R  t ] (3 × 4)    6     lengths
similarity           [ sR t ] (3 × 4)    7     angles
affine               [ A ]    (3 × 4)   12     parallelism
projective           [ H̃ ]    (4 × 4)   15     straight lines

Table 2.2 Hierarchy of 3D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 3 × 4 matrices are extended with a fourth [0ᵀ 1] row to form a full 4 × 4 matrix for homogeneous coordinate transformations.

Bilinear interpolant. This eight-parameter transform (Wolberg 1990),

x′ = a0 + a1 x + a2 y + a6 xy
y′ = a3 + a4 x + a5 y + a7 xy,

can be used to interpolate the deformation due to the motion of the four corner points of a square. (In fact, it can interpolate the motion of any four non-collinear points.) While the deformation is linear in the motion parameters, it does not generally preserve straight lines (only lines parallel to the square axes). However, it is often quite useful, e.g., in the interpolation of sparse grids using splines (Section 8.3).

2.1.3 3D transformations

The set of three-dimensional coordinate transformations is very similar to that available for 2D transformations and is summarized in Table 2.2. As in 2D, these transformations form a nested set of groups. Hartley and Zisserman (2004, Section 2.4) give a more detailed description of this hierarchy.

Translation. 3D translations can be written as x′ = x + t or

x′ = [ I  t ] x̄,    (2.23)

where I is the (3 × 3) identity matrix and 0 is the zero vector.

Rotation + translation. Also known as 3D rigid body motion or the 3D Euclidean transformation, it can be written as x′ = Rx + t or

x′ = [ R  t ] x̄,    (2.24)

where R is a 3 × 3 orthonormal rotation matrix with RRᵀ = I and |R| = 1. Note that sometimes it is more convenient to describe a rigid motion using

x′ = R(x − c) = Rx − Rc,    (2.25)

where c is the center of rotation (often the camera center).

Compactly parameterizing a 3D rotation is a non-trivial task, which we describe in more detail below.

Scaled rotation. The 3D similarity transform can be expressed as x′ = sRx + t, where s is an arbitrary scale factor. It can also be written as

x′ = [ sR  t ] x̄.    (2.26)

This transformation preserves angles between lines and planes.

Affine. The affine transform is written as x′ = Ax̄, where A is an arbitrary 3 × 4 matrix, i.e.,

x′ = [ a00  a01  a02  a03 ]
     [ a10  a11  a12  a13 ] x̄.    (2.27)
     [ a20  a21  a22  a23 ]

Parallel lines and planes remain parallel under affine transformations.

Projective. This transformation, variously known as a 3D perspective transform, homography, or collineation, operates on homogeneous coordinates,

x̃′ = H̃ x̃,    (2.28)

where H̃ is an arbitrary 4 × 4 homogeneous matrix. As in 2D, the resulting homogeneous coordinate x̃′ must be normalized in order to obtain an inhomogeneous result x. Perspective transformations preserve straight lines (i.e., they remain straight after the transformation).

Figure 2.5 Rotation around an axis n̂ by an angle θ.

2.1.4 3D rotations

The biggest difference between 2D and 3D coordinate transformations is that the parameterization of the 3D rotation matrix R is not as straightforward; several possibilities exist.

Euler angles

A rotation matrix can be formed as the product of three rotations around three cardinal axes, e.g., x, y, and z, or x, y, and x. This is generally a bad idea, as the result depends on the order in which the transforms are applied. What is worse, it is not always possible to move smoothly in the parameter space, i.e., sometimes one or more of the Euler angles change dramatically in response to a small change in rotation.¹ For these reasons, we do not even give the formula for Euler angles in this book—interested readers can look in other textbooks or technical reports (Faugeras 1993; Diebel 2006). Note that, in some applications, if the rotations are known to be a set of uni-axial transforms, they can always be represented using an explicit set of rigid transformations.

Axis/angle (exponential twist)

A rotation can be represented by a rotation axis n̂ and an angle θ, or equivalently by a 3D vector ω = θn̂. Figure 2.5 shows how we can compute the equivalent rotation. First, we project the vector v onto the axis n̂ to obtain

v∥ = n̂(n̂ · v) = (n̂n̂ᵀ)v,    (2.29)

which is the component of v that is not affected by the rotation. Next, we compute the perpendicular residual of v from n̂,

v⊥ = v − v∥ = (I − n̂n̂ᵀ)v.    (2.30)

¹ In robotics, this is sometimes referred to as gimbal lock.

We can rotate this vector by 90° using the cross product,

v× = n̂ × v⊥ = [n̂]× v,    (2.31)

where [n̂]× is the matrix form of the cross product operator with the vector n̂ = (n̂x, n̂y, n̂z),

[n̂]× = [  0    −n̂z   n̂y ]
        [  n̂z    0   −n̂x ]    (2.32)
        [ −n̂y   n̂x    0  ] .

Note that rotating this vector by another 90° is equivalent to taking the cross product again,

v×× = n̂ × v× = [n̂]²× v = −v⊥,

and hence

v∥ = v − v⊥ = v + v×× = (I + [n̂]²×)v.

We can now compute the in-plane component of the rotated vector u as

u⊥ = cos θ v⊥ + sin θ v× = (sin θ [n̂]× − cos θ [n̂]²×)v.

Putting all these terms together, we obtain the final rotated vector as

u = u⊥ + v∥ = (I + sin θ [n̂]× + (1 − cos θ)[n̂]²×)v.    (2.33)

We can therefore write the rotation matrix corresponding to a rotation by θ around an axis n̂ as

R(n̂, θ) = I + sin θ [n̂]× + (1 − cos θ)[n̂]²×,    (2.34)

which is known as Rodriguez's formula (Ayache 1989).
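Applying (2.34) to a single vector needs only a dot and a cross product; a minimal sketch (our own function name), using the equivalent form u = v cos θ + (n̂ × v) sin θ + n̂(n̂ · v)(1 − cos θ):

```python
import math

def rodriguez_rotate(v, n, theta):
    """Rotate v about the unit axis n by theta radians, per (2.34)."""
    dot = n[0] * v[0] + n[1] * v[1] + n[2] * v[2]
    crs = (n[1] * v[2] - n[2] * v[1],
           n[2] * v[0] - n[0] * v[2],
           n[0] * v[1] - n[1] * v[0])
    c, s = math.cos(theta), math.sin(theta)
    return tuple(c * v[i] + s * crs[i] + (1 - c) * dot * n[i] for i in range(3))
```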

The product of the axis n̂ and angle θ, ω = θn̂ = (ωx, ωy, ωz), is a minimal representation for a 3D rotation. Rotations through common angles such as multiples of 90° can be represented exactly (and converted to exact matrices) if θ is stored in degrees. Unfortunately, this representation is not unique, since we can always add a multiple of 360° (2π radians) to θ and get the same rotation matrix. As well, (n̂, θ) and (−n̂, −θ) represent the same rotation.

However, for small rotations (e.g., corrections to rotations), this is an excellent choice.

In particular, for small (infinitesimal or instantaneous) rotations and θ expressed in radians, Rodriguez's formula simplifies to

R(ω) ≈ I + sin θ [n̂]× ≈ I + [θn̂]× = [  1   −ωz   ωy ]
                                      [  ωz   1   −ωx ] ,    (2.35)
                                      [ −ωy   ωx   1  ]

which gives a nice linearized relationship between the rotation parameters ω and R. We can also write R(ω)v ≈ v + ω × v, which is handy when we want to compute the derivative of Rv with respect to ω,

∂(Rv)/∂ωᵀ = −[v]× = [  0    z   −y ]
                     [ −z    0    x ] .    (2.36)
                     [  y   −x    0 ]

Another way to derive a rotation through a finite angle is called the exponential twist (Murray, Li, and Sastry 1994). A rotation by an angle θ is equivalent to k rotations through θ/k. In the limit as k → ∞, we obtain

R(n̂, θ) = lim_{k→∞} (I + (1/k)[θn̂]×)ᵏ = exp [ω]×.    (2.37)

If we expand the matrix exponential as a Taylor series (using the identity [n̂]×^(k+2) = −[n̂]×^k, k > 0, and again assuming θ is in radians),

exp [ω]× = I + θ[n̂]× + (θ²/2)[n̂]²× + (θ³/3!)[n̂]³× + · · ·
         = I + (θ − θ³/3! + · · ·)[n̂]× + (θ²/2 − θ⁴/4! + · · ·)[n̂]²×
         = I + sin θ [n̂]× + (1 − cos θ)[n̂]²×,    (2.38)

which yields the familiar Rodriguez's formula.

Unit quaternions

The unit quaternion representation is closely related to the angle/axis representation. A unit quaternion is a unit length 4-vector whose components can be written as q = (qx, qy, qz, qw) or q = (x, y, z, w) for short. Unit quaternions live on the unit sphere ‖q‖ = 1 and antipodal (opposite sign) quaternions, q and −q, represent the same rotation (Figure 2.6). Other than this ambiguity (dual covering), the unit quaternion representation of a rotation is unique.

Furthermore, the representation is continuous, i.e., as rotation matrices vary continuously, one can find a continuous quaternion representation, although the path on the quaternion sphere may wrap all the way around before returning to the "origin" q0 = (0, 0, 0, 1). For these and other reasons given below, quaternions are a very popular representation for pose and for pose interpolation in computer graphics (Shoemake 1985).

Quaternions can be derived from the axis/angle representation through the formula

q = (v, w) = (sin(θ/2) n̂, cos(θ/2)),    (2.39)

Figure 2.6 Unit quaternions live on the unit sphere ‖q‖ = 1. This figure shows a smooth trajectory through the three quaternions q0, q1, and q2. The antipodal point to q2, namely −q2, represents the same rotation as q2.

where n̂ and θ are the rotation axis and angle. Using the trigonometric identities sin θ = 2 sin(θ/2) cos(θ/2) and (1 − cos θ) = 2 sin²(θ/2), Rodriguez's formula can be converted to

R(n̂, θ) = I + sin θ [n̂]× + (1 − cos θ)[n̂]²×
         = I + 2w[v]× + 2[v]²×.    (2.40)
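The conversion (2.39) from axis/angle to a quaternion, as a sketch (function name ours):

```python
import math

def axis_angle_to_quaternion(n, theta):
    """q = (sin(theta/2) n, cos(theta/2)) for a unit axis n, per (2.39)."""
    s = math.sin(theta / 2)
    return (s * n[0], s * n[1], s * n[2], math.cos(theta / 2))
```

The result is always a unit 4-vector, since sin² + cos² = 1 and n̂ has unit length.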

This suggests a quick way to rotate a vector v by a quaternion using a series of cross products, scalings, and additions. To obtain a formula for R(q) as a function of (x, y, z, w), recall that

[v]× = [  0  −z   y ]
       [  z   0  −x ]
       [ −y   x   0 ]

and

[v]²× = [ −y² − z²     xy         xz     ]
        [    xy     −x² − z²      yz     ]
        [    xz        yz      −x² − y²  ] .

We thus obtain

R(q) = [ 1 − 2(y² + z²)    2(xy − zw)        2(xz + yw)     ]
       [ 2(xy + zw)       1 − 2(x² + z²)     2(yz − xw)     ]    (2.41)
       [ 2(xz − yw)        2(yz + xw)       1 − 2(x² + y²)  ] .

The diagonal terms can be made more symmetrical by replacing 1 − 2(y² + z²) with (x² + w² − y² − z²), etc.

The nicest aspect of unit quaternions is that there is a simple algebra for composing rotations expressed as unit quaternions. Given two quaternions q0 = (v0, w0) and q1 = (v1, w1), the quaternion multiply operator is defined as

q2 = q0 q1 = (v0 × v1 + w0 v1 + w1 v0, w0 w1 − v0 · v1),    (2.42)

with the property that R(q2) = R(q0)R(q1). Note that quaternion multiplication is not commutative, just as 3D rotations and matrix multiplications are not.

Taking the inverse of a quaternion is easy: just flip the sign of v or w (but not both!). (You can verify this has the desired effect of transposing the R matrix in (2.41).) Thus, we can also define quaternion division as

q2 = q0 / q1 = q0 q1⁻¹ = (v0 × v1 + w0 v1 − w1 v0, −w0 w1 − v0 · v1).    (2.43)

This is useful when the incremental rotation between two rotations is desired.

In particular, if we want to determine a rotation that is partway between two given rotations, we can compute the incremental rotation, take a fraction of the angle, and compute the new rotation. This procedure is called spherical linear interpolation or slerp for short (Shoemake 1985) and is given in Algorithm 2.1. Note that Shoemake presents two formulas other than the one given here. The first exponentiates qr by α before multiplying the original quaternion,

q2 = qr^α q0,    (2.44)

while the second treats the quaternions as 4-vectors on a sphere and uses

q2 = (sin(1 − α)θ / sin θ) q0 + (sin αθ / sin θ) q1,    (2.45)

where θ = cos⁻¹(q0 · q1) and the dot product is directly between the quaternion 4-vectors. All of these formulas give comparable results, although care should be taken when q0 and q1 are close together, which is why I prefer to use an arctangent to establish the rotation angle.

Which rotation representation is better?

The choice of representation for 3D rotations depends partly on the application.

The axis/angle representation is minimal, and hence does not require any additional constraints on the parameters (no need to re-normalize after each update). If the angle is expressed in degrees, it is easier to understand the pose (say, a 90° twist around the x-axis), and it is also easier to express exact rotations. When the angle is in radians, the derivatives of R with respect to ω can easily be computed (2.36).

Quaternions, on the other hand, are better if you want to keep track of a smoothly moving camera, since there are no discontinuities in the representation. It is also easier to interpolate between rotations and to chain rigid transformations (Murray, Li, and Sastry 1994; Bregler and Malik 1998).

My usual preference is to use quaternions, but to update their estimates using an incremental rotation, as described in Section 6.2.2.

procedure slerp(q0, q1, α):

1. qr = q1 / q0 = (vr, wr)
2. if wr < 0 then qr ← −qr
3. θr = 2 tan⁻¹(‖vr‖ / wr)
4. n̂r = N(vr) = vr / ‖vr‖
5. θα = α θr
6. qα = (sin(θα/2) n̂r, cos(θα/2))
7. return q2 = qα q0

Algorithm 2.1 Spherical linear interpolation (slerp). The axis and total angle are first computed from the quaternion ratio. (This computation can be lifted outside an inner loop that generates a set of interpolated positions for animation.) An incremental quaternion is then computed and multiplied by the starting rotation quaternion.
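A direct transcription of Algorithm 2.1 (it reuses the quaternion product from (2.42); all names are ours):

```python
import math

def qmul(q0, q1):
    """Quaternion product (2.42), with q = (x, y, z, w)."""
    x0, y0, z0, w0 = q0
    x1, y1, z1, w1 = q1
    return (w0 * x1 + w1 * x0 + y0 * z1 - z0 * y1,
            w0 * y1 + w1 * y0 + z0 * x1 - x0 * z1,
            w0 * z1 + w1 * z0 + x0 * y1 - y0 * x1,
            w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1)

def slerp(q0, q1, alpha):
    """Spherical linear interpolation, following Algorithm 2.1."""
    # qr = q1 / q0 (unit-quaternion inverse flips the vector part)
    x, y, z, w = qmul(q1, (-q0[0], -q0[1], -q0[2], q0[3]))
    if w < 0:                        # take the short way around
        x, y, z, w = -x, -y, -z, -w
    vnorm = math.sqrt(x * x + y * y + z * z)
    if vnorm == 0:                   # identical rotations
        return q0
    theta_r = 2 * math.atan2(vnorm, w)
    nr = (x / vnorm, y / vnorm, z / vnorm)
    ta = alpha * theta_r
    qa = (math.sin(ta / 2) * nr[0], math.sin(ta / 2) * nr[1],
          math.sin(ta / 2) * nr[2], math.cos(ta / 2))
    return qmul(qa, q0)
```

Interpolating halfway between the identity and a 90° rotation about z yields a 45° rotation about z, as expected.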

2.1.5 3D to 2D projections

Now that we know how to represent 2D and 3D geometric primitives and how to transform them spatially, we need to specify how 3D primitives are projected onto the image plane. We can do this using a linear 3D to 2D projection matrix. The simplest model is orthography, which requires no division to get the final (inhomogeneous) result. The more commonly used model is perspective, since this more accurately models the behavior of real cameras.

Orthography and para-perspective

An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote 2D points.) This can be written as

x = [ I2×2 | 0 ] p.    (2.46)

If we are using homogeneous (projective) coordinates, we can write

x̃ = [ 1 0 0 0 ]
    [ 0 1 0 0 ] p̃,    (2.47)
    [ 0 0 0 1 ]


Figure 2.7 Commonly used projection models: (a) 3D view of world, (b) orthography, (c) scaled orthography, (d) para-perspective, (e) perspective, (f) object-centered. Each diagram shows a top-down view of the projection. Note how parallel lines on the ground plane and box sides remain parallel in the non-perspective projections.

i.e., we drop the z component but keep the w component. Orthography is an approximate model for long focal length (telephoto) lenses and objects whose depth is shallow relative to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric lenses (Baker and Nayar 1999, 2001).

In practice, world coordinates (which may measure dimensions in meters) need to be scaled to fit onto an image sensor (physically measured in millimeters, but ultimately measured in pixels). For this reason, scaled orthography is actually more commonly used,

x = [ sI2×2 | 0 ] p.    (2.48)

This model is equivalent to first projecting the world points onto a local fronto-parallel image plane and then scaling this image using regular perspective projection. The scaling can be the same for all parts of the scene (Figure 2.7b) or it can be different for objects that are being modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to frame when estimating structure from motion, which can better model the scale change that occurs as an object approaches the camera.
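Orthographic and scaled orthographic projection, (2.46) and (2.48), as a sketch (names ours):

```python
def orthographic(p):
    """Drop the z component of a 3D point, per (2.46)."""
    return (p[0], p[1])

def scaled_orthographic(p, s):
    """Scaled orthography x = [s I | 0] p, per (2.48)."""
    return (s * p[0], s * p[1])
```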

Scaled orthography is a popular model for reconstructing the 3D shape of objects far away from the camera, since it greatly simplifies certain computations. For example, pose (camera orientation) can be estimated using simple least squares (Section 6.2.1). Under orthography, structure and motion can simultaneously be estimated using factorization (singular value decomposition), as discussed in Section 7.3 (Tomasi and Kanade 1992).

A closely related projection model is para-perspective (Aloimonos 1990; Poelman and Kanade 1997). In this model, object points are again first projected onto a local reference plane parallel to the image plane. However, rather than being projected orthogonally to this plane, they are projected parallel to the line of sight to the object center (Figure 2.7d). This is followed by the usual projection onto the final image plane, which again amounts to a scaling. The combination of these two projections is therefore affine and can be written as

x̃ = [ a00  a01  a02  a03 ]
    [ a10  a11  a12  a13 ] p̃.    (2.49)
    [  0    0    0    1  ]

Note how parallel lines in 3D remain parallel after projection in Figure 2.7b–d. Para-perspective provides a more accurate projection model than scaled orthography, without incurring the added complexity of per-pixel perspective division, which invalidates traditional factorization methods (Poelman and Kanade 1997).

Perspective

The most commonly used projection in computer graphics and computer vision is true 3D perspective (Figure 2.7e). Here, points are projected onto the image plane by dividing them by their z component. Using inhomogeneous coordinates, this can be written as

x̄ = Pz(p) = [ x/z ]
             [ y/z ]    (2.50)
             [  1  ] .

In homogeneous coordinates, the projection has a simple linear form,

x̃ = [ 1 0 0 0 ]
    [ 0 1 0 0 ] p̃,    (2.51)
    [ 0 0 1 0 ]

i.e., we drop the w component of p. Thus, after projection, it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor.
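The perspective division of (2.50), as a one-line sketch (function name ours):

```python
def perspective(p):
    """Project a 3D point by dividing by its z component, per (2.50)."""
    x, y, z = p
    return (x / z, y / z)
```

Points at different depths along the same viewing ray project to the same pixel, which is exactly why depth cannot be recovered after projection.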

A form often seen in computer graphics systems is a two-step projection that first projects 3D coordinates into normalized device coordinates in the range (x, y, z) ∈ [−1, 1] × [−1, 1] × [0, 1], and then rescales these coordinates to integer pixel coordinates using a viewport transformation (Watt 1995; OpenGL-ARB 1997). The (initial) perspective projection is then represented using a 4 × 4 matrix

x̃ = [ 1  0        0                 0           ]
    [ 0  1        0                 0           ]
    [ 0  0  −zfar/zrange    znear zfar/zrange   ] p̃,    (2.52)
    [ 0  0        1                 0           ]

where znear and zfar are the near and far z clipping planes and zrange = zfar − znear. Note that the first two rows are actually scaled by the focal length and the aspect ratio so that visible rays are mapped to (x, y) ∈ [−1, 1]². The reason for keeping the third row, rather than dropping it, is that visibility operations, such as z-buffering, require a depth for every graphical element that is being rendered.

If we set znear = 1, zfar → ∞, and switch the sign of the third row, the third element of the normalized screen vector becomes the inverse depth, i.e., the disparity (Okutomi and Kanade 1993). This can be quite convenient in many cases since, for cameras moving around outdoors, the inverse depth to the camera is often a more well-conditioned parameterization than direct 3D distance.

While a regular 2D image sensor has no way of measuring distance to a surface point, range sensors (Section 12.2) and stereo matching algorithms (Chapter 11) can compute such values. It is then convenient to be able to map from a sensor-based depth or disparity value d directly back to a 3D location using the inverse of a 4 × 4 matrix (Section 2.1.5). We can do this if we represent perspective projection using a full-rank 4 × 4 matrix, as in (2.64).
