Acceleration

Digital Image Synthesis

Yung-Yu Chuang

with slides by Mario Costa Sousa, Gordon Stoll and Pat Hanrahan

Acceleration techniques

Bounding volume hierarchy

1) Find bounding box of objects

2) Split objects into two groups

3) Recurse

Where to split?

• At midpoint

• Sort, and put half of the objects on each side

• Use modeling hierarchy

BVH traversal

• If hit parent, then check all children

• Don't return the intersection immediately, because the other subvolumes may have a closer intersection

Space subdivision approaches

• Uniform grid

• Quadtree (2D), Octree (3D)

• KD tree

• BSP tree

Uniform grid

Preprocess scene

1. Find bounding box

2. Determine grid resolution

3. Place object in cell if its bounding box overlaps the cell

4. Check that object overlaps cell (expensive!)

Uniform grid traversal

Preprocess scene

Traverse grid

3D line = 3D-DDA (Digital Differential Analyzer)

$$y = mx + b, \quad m = \frac{y_2 - y_1}{x_2 - x_1}$$

$$x_{i+1} = x_i + 1$$

naive: $y_{i+1} = m x_{i+1} + b$

DDA: $y_{i+1} = y_i + m$
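The two update rules for a line can be compared with a minimal sketch. The function names `naiveLine` and `ddaLine` are illustrative assumptions, not part of the slides or of pbrt. The naive version re-evaluates $y = mx + b$ at every sample, while the DDA version adds the slope to the previous value.

```cpp
#include <vector>

// Naive evaluation: one multiply and one add per sample.
std::vector<float> naiveLine(float m, float b, int x0, int n) {
    std::vector<float> ys;
    for (int i = 0; i < n; ++i)
        ys.push_back(m * float(x0 + i) + b);
    return ys;
}

// DDA: evaluate y once, then add the slope m for every unit step in x.
std::vector<float> ddaLine(float m, float b, int x0, int n) {
    std::vector<float> ys;
    float y = m * float(x0) + b;
    for (int i = 0; i < n; ++i) {
        ys.push_back(y);
        y += m;  // y_{i+1} = y_i + m
    }
    return ys;
}
```

For slopes that are not exactly representable in floating point, the DDA result can drift slightly from the naive one because rounding errors accumulate across steps.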

Octree

K-d tree

Leaf nodes correspond to unique regions in space

[Figure sequence: the space is split by planes A, B, C and D in turn, and the corresponding tree is built alongside]

K-d tree traversal

[Figure: the finished tree with planes A, B, C and D]

BSP tree

[Figure sequence: polygons numbered 1 to 11 are recursively partitioned into inside ones and outside ones; polygons 9 and 11 are split into 9a/9b and 11a/11b]

BSP tree traversal

[Figure sequence: traversal of the tree relative to a point]

Classes

• Primitive (in core/primitive.*)

– GeometricPrimitive

– InstancePrimitive

– Aggregate

• Three types of accelerators are provided (in accelerators/*.cpp)

– GridAccel – BVHAccel – KdTreeAccel

Hierarchy

Primitive

– GeometricPrimitive: holds a Material and a Shape

– TransformedPrimitive: supports both instancing and animation

– Aggregate: stores intersectable primitives, calls Refine if necessary

Primitive

class Primitive : public ReferenceCounted {
  <Primitive interface>
  const int primitiveId;
  static int nextprimitiveId;
}

class TransformedPrimitive: public Primitive {
  …
  Reference<Primitive> instance;
}

Interface

// geometry
BBox WorldBound();
bool CanIntersect();
bool Intersect(const Ray &r, Intersection *in);  // update maxt
bool IntersectP(const Ray &r);
void Refine(vector<Reference<Primitive>> &refined);
void FullyRefine(vector<Reference<Primitive>> &refined);

// material
AreaLight *GetAreaLight();
BSDF *GetBSDF(const DifferentialGeometry &dg, Transform &WorldToObject);
BSSRDF *GetBSSRDF(DifferentialGeometry &dg, Transform &WorldToObject);

Intersection

struct Intersection {
  <Intersection interface>
  DifferentialGeometry dg;
  const Primitive *primitive;
  Transform WorldToObject, ObjectToWorld;
  int shapeId, primitiveId;
  float rayEpsilon;  // adaptively estimated
};

primitive stores the actual intersecting primitive, hence Primitive->GetAreaLight and GetBSDF can only be called for GeometricPrimitive

GeometricPrimitive

• represents a single shape

• holds a reference to a Shape and its Material, and a pointer to an AreaLight

Reference<Shape> shape;
Reference<Material> material;  // BRDF
AreaLight *areaLight;          // emittance

• Most operations are forwarded to shape

GeometricPrimitive

bool Intersect(Ray &r, Intersection *isect) {
  float thit, rayEpsilon;
  if (!shape->Intersect(r, &thit, &rayEpsilon, &isect->dg))
    return false;
  isect->primitive = this;
  isect->WorldToObject = *shape->WorldToObject;
  isect->ObjectToWorld = *shape->ObjectToWorld;
  isect->shapeId = shape->shapeId;
  isect->primitiveId = primitiveId;
  isect->rayEpsilon = rayEpsilon;
  r.maxt = thit;
  return true;
}

Object instancing

61 unique plants, 4000 individual plants, 19.5M triangles

With instancing, store only 1.1M triangles, 11GB->600MB

TransformedPrimitive

Reference<Primitive> primitive;
AnimatedTransform WorldToPrimitive;  // for instancing and animation

Ray ray = WorldToPrimitive(r);
if (!instance->Intersect(ray, isect)) return false;
r.maxt = ray.maxt;
isect->WorldToObject = isect->WorldToObject * WorldToInstance;

TransformedPrimitive

bool Intersect(Ray &r, Intersection *isect) {
  Transform w2p;
  WorldToPrimitive.Interpolate(r.time, &w2p);
  Ray ray = w2p(r);
  if (!primitive->Intersect(ray, isect))
    return false;
  r.maxt = ray.maxt;
  isect->primitiveId = primitiveId;
  if (!w2p.IsIdentity()) {
    // Compute world-to-object transformation for instance
    isect->WorldToObject = isect->WorldToObject * w2p;
    isect->ObjectToWorld = Inverse(isect->WorldToObject);
    // Transform instance's differential geometry to world space
    Transform PrimitiveToWorld = Inverse(w2p);
    isect->dg.p = PrimitiveToWorld(isect->dg.p);
    isect->dg.nn = Normalize(PrimitiveToWorld(isect->dg.nn));
    isect->dg.dpdu = PrimitiveToWorld(isect->dg.dpdu);
    isect->dg.dpdv = PrimitiveToWorld(isect->dg.dpdv);
    isect->dg.dndu = PrimitiveToWorld(isect->dg.dndu);
    isect->dg.dndv = PrimitiveToWorld(isect->dg.dndv);
  }
  return true;
}

Aggregates

• Acceleration is a core component of a ray tracer because ray/scene intersection accounts for the majority of execution time

• Goal: reduce the number of ray/primitive intersections by quickly rejecting whole groups of primitives at once and by exploiting the fact that nearby intersections are likely to be found first

• Two main approaches: spatial subdivision, object subdivision

• No clear winner

Ray-Box intersections

• Almost all accelerators require it

• Quick rejection, use enter and exit point to traverse the hierarchy

• AABB is the intersection of three slabs

Ray-Box intersections

For the slab bounded by the planes $x = x_0$ and $x = x_1$:

$$t_0 = \frac{x_0 - O_x}{D_x}, \quad t_1 = \frac{x_1 - O_x}{D_x}$$

Ray-Box intersections

bool BBox::IntersectP(const Ray &ray, float *hitt0, float *hitt1) {
  float t0 = ray.mint, t1 = ray.maxt;
  for (int i = 0; i < 3; ++i) {
    float invRayDir = 1.f / ray.d[i];
    float tNear = (pMin[i] - ray.o[i]) * invRayDir;
    float tFar = (pMax[i] - ray.o[i]) * invRayDir;
    if (tNear > tFar) swap(tNear, tFar);
    t0 = tNear > t0 ? tNear : t0;  // segment intersection
    t1 = tFar < t1 ? tFar : t1;
    if (t0 > t1) return false;     // intersection is empty
  }
  if (hitt0) *hitt0 = t0;
  if (hitt1) *hitt1 = t1;
  return true;
}
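A self-contained sketch of the same slab test follows, so that it can be compiled outside pbrt. The types `Vec3`, `SimpleRay` and `SimpleBox` and the function `intersectBox` are stand-ins assumed for illustration; they are not pbrt's classes.

```cpp
#include <algorithm>

// Minimal stand-in types (assumptions for this sketch, not pbrt's own).
struct Vec3 {
    float v[3];
    float operator[](int i) const { return v[i]; }
};
struct SimpleRay { Vec3 o, d; float mint, maxt; };
struct SimpleBox { Vec3 pMin, pMax; };

// Clip the ray's parametric range [mint, maxt] against the three slabs.
bool intersectBox(const SimpleBox &box, const SimpleRay &ray,
                  float *hitt0 = nullptr, float *hitt1 = nullptr) {
    float t0 = ray.mint, t1 = ray.maxt;
    for (int i = 0; i < 3; ++i) {
        float invDir = 1.f / ray.d[i];
        float tNear = (box.pMin[i] - ray.o[i]) * invDir;
        float tFar  = (box.pMax[i] - ray.o[i]) * invDir;
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);   // latest entry over all slabs
        t1 = std::min(t1, tFar);    // earliest exit over all slabs
        if (t0 > t1) return false;  // the range became empty: no hit
    }
    if (hitt0) *hitt0 = t0;
    if (hitt1) *hitt1 = t1;
    return true;
}
```

A ray starting at (-1, 0.5, 0.5) with direction (1, 0.1, 0.1) enters the unit box at t = 1 and leaves at t = 2, which is the range the function reports through `hitt0` and `hitt1`.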

Grid accelerator

• Uniform grid

Teapot in a stadium problem

• Not adaptive to distribution of primitives.

• Have to determine the number of voxels (problem with too many or too few).

GridAccel

class GridAccel : public Aggregate {
  <GridAccel methods>
  u_int nMailboxes;
  MailboxPrim *mailboxes;
  vector<Reference<Primitive>> primitives;
  int nVoxels[3];
  BBox bounds;
  Vector Width, InvWidth;
  Voxel **voxels;
  MemoryArena voxelArena;
  RWMutex *rwMutex;
}

mailbox

struct MailboxPrim {
  Reference<Primitive> primitive;
  int lastMailboxId;
}

GridAccel

GridAccel(vector<Reference<Primitive> > &p, bool forRefined, bool refineImmediately)
    : gridForRefined(forRefined) {
  // Initialize with primitives for grid
  if (refineImmediately)
    for (int i = 0; i < p.size(); ++i)
      p[i]->FullyRefine(primitives);
  else
    primitives = p;
  for (int i = 0; i < primitives.size(); ++i)
    bounds = Union(bounds, primitives[i]->WorldBound());

Determine number of voxels

• Too many voxels → slow traverse, large memory consumption (bad cache performance)

• Too few voxels → too many primitives in a voxel

• Let the axis with the largest extent have $3\sqrt[3]{N}$ partitions (N: number of primitives)

Vector delta = bounds.pMax - bounds.pMin;

int maxAxis=bounds.MaximumExtent();

float invMaxWidth=1.f/delta[maxAxis];

float cubeRoot=3.f*powf(float(prims.size()),1.f/3.f);

float voxelsPerUnitDist=cubeRoot * invMaxWidth;

Calculate voxel size and allocate voxels

for (int axis = 0; axis < 3; ++axis) {
  nVoxels[axis] = Round2Int(delta[axis] * voxelsPerUnitDist);
  nVoxels[axis] = Clamp(nVoxels[axis], 1, 64);
  width[axis] = delta[axis] / nVoxels[axis];
  invWidth[axis] = (width[axis] == 0.f) ? 0.f : 1.f / width[axis];
}
int nv = nVoxels[0] * nVoxels[1] * nVoxels[2];
voxels = AllocAligned<Voxel *>(nv);
memset(voxels, 0, nv * sizeof(Voxel *));

Conversion between voxel and position

int posToVoxel(const Point &P, int axis) {
  int v = Float2Int((P[axis] - bounds.pMin[axis]) * InvWidth[axis]);
  return Clamp(v, 0, NVoxels[axis]-1);
}

float voxelToPos(int p, int axis) const {
  return bounds.pMin[axis] + p * Width[axis];
}

Point voxelToPos(int x, int y, int z) const {
  return bounds.pMin + Vector(x*Width[0], y*Width[1], z*Width[2]);
}

inline int offset(int x, int y, int z) {
  return z*NVoxels[0]*NVoxels[1] + y*NVoxels[0] + x;
}
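The index arithmetic above can be tried in isolation with a small sketch. `GridLayout` is a hypothetical stand-in for the relevant grid members (it is not a pbrt type), and plain integer truncation is used here in place of pbrt's `Float2Int`.

```cpp
#include <algorithm>

// Hypothetical stand-in for the grid's layout data.
struct GridLayout {
    float pMin[3];      // lower corner of the grid bounds
    float invWidth[3];  // reciprocal voxel width per axis
    int   nVoxels[3];   // voxel count per axis

    // Map a coordinate on one axis to a voxel index, clamped to the grid.
    int posToVoxel(float p, int axis) const {
        int v = int((p - pMin[axis]) * invWidth[axis]);
        return std::max(0, std::min(v, nVoxels[axis] - 1));
    }

    // Flatten (x, y, z) into an index into the voxel array; x varies fastest.
    int offset(int x, int y, int z) const {
        return z * nVoxels[0] * nVoxels[1] + y * nVoxels[0] + x;
    }
};
```

With a 4×4×4 grid over the box from 0 to 2 on every axis (voxel width 0.5), the coordinate 0.6 falls in voxel 1, and positions outside the bounds are clamped to the first or last voxel.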

Add primitives into voxels

for (u_int i=0; i<prims.size(); ++i) {

<Find voxel extent of primitive>

<Add primitive to overlapping voxels>

}

<Find voxel extent of primitive>

BBox pb = prims[i]->WorldBound();
int vmin[3], vmax[3];
for (int axis = 0; axis < 3; ++axis) {
  vmin[axis] = posToVoxel(pb.pMin, axis);
  vmax[axis] = posToVoxel(pb.pMax, axis);
}

<Add primitive to overlapping voxels>

for (int z = vmin[2]; z <= vmax[2]; ++z)
  for (int y = vmin[1]; y <= vmax[1]; ++y)
    for (int x = vmin[0]; x <= vmax[0]; ++x) {
      int o = offset(x, y, z);
      if (!voxels[o]) {
        voxels[o] = voxelArena.Alloc<Voxel>();
        *voxels[o] = Voxel(primitives[i]);
      } else {
        // Add primitive to already-allocated voxel
        voxels[o]->AddPrimitive(primitives[i]);
      }
    }

Voxel structure

struct Voxel {
  <Voxel methods>
  vector<Reference<Primitive>> primitives;
  bool allCanIntersect;
}

Voxel(Reference<Primitive> op) {
  allCanIntersect = false;
  primitives.push_back(op);
}

void AddPrimitive(Reference<Primitive> p) {
  primitives.push_back(p);
}

GridAccel traversal

bool GridAccel::Intersect(Ray &ray, Intersection *isect) {
  <Check ray against overall grid bounds>
  <Set up 3D DDA for ray>
  <Walk ray through voxel grid>
}

Check against overall bound

float rayT;
if (bounds.Inside(ray(ray.mint)))
  rayT = ray.mint;
else if (!bounds.IntersectP(ray, &rayT))
  return false;
Point gridIntersect = ray(rayT);

Set up 3D DDA (Digital Differential Analyzer)

• Similar to Bresenham’s line drawing algorithm

[Figure: a ray entering the grid at rayT, annotated with Pos (voxel index), Step[0]=1, DeltaT[0], NextCrossingT[0], NextCrossingT[1] and Out; the blue values change along the traversal]

DeltaT: the change in the ray parameter t when the voxel index changes by 1 in that direction

Set up 3D DDA

Pos[axis] = posToVoxel(gridIntersect, axis);
if (ray.d[axis] >= 0) {
  NextCrossingT[axis] = rayT +
      (voxelToPos(Pos[axis]+1, axis) - gridIntersect[axis]) / ray.d[axis];
  DeltaT[axis] = width[axis] / ray.d[axis];  // e.g. width[0] * (1 / D_x)
  Step[axis] = 1;
  Out[axis] = nVoxels[axis];
} else {
  ...
  Step[axis] = -1;
  Out[axis] = -1;
}

Walk through grid

for (;;) {
  Voxel *voxel = voxels[offset(Pos[0], Pos[1], Pos[2])];
  if (voxel != NULL)
    hitSomething |= voxel->Intersect(ray, isect, rayId);
  <Advance to next voxel>
}
return hitSomething;

Do not return on the first hit; cut tmax instead. Return when entering a voxel that is beyond the closest found intersection.

Advance to next voxel

int bits = ((NextCrossingT[0] < NextCrossingT[1]) << 2) +
           ((NextCrossingT[0] < NextCrossingT[2]) << 1) +
           ((NextCrossingT[1] < NextCrossingT[2]));
const int cmpToAxis[8] = { 2, 1, 2, 1, 2, 2, 0, 0 };
int stepAxis = cmpToAxis[bits];
if (ray.maxt < NextCrossingT[stepAxis]) break;
Pos[stepAxis] += Step[stepAxis];
if (Pos[stepAxis] == Out[stepAxis]) break;
NextCrossingT[stepAxis] += DeltaT[stepAxis];

Conditions (x, y, z stand for NextCrossingT[0], [1], [2]):

x<y  x<z  y<z   ordering   stepAxis
 0    0    0    x≥y≥z      2
 0    0    1    x≥z>y      1
 0    1    0    -          -
 0    1    1    z>x≥y      1
 1    0    0    y>x≥z      2
 1    0    1    -          -
 1    1    0    y≥z>x      0
 1    1    1    z>y>x      0
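The lookup table can be checked with a short sketch. The wrapper function `stepAxisFromTable` is an illustrative name assumed here; the bit packing and the table contents are the ones shown above.

```cpp
// Return the axis whose next voxel crossing is nearest, using the 3-bit
// comparison code as an index into the lookup table from the slide.
int stepAxisFromTable(const float t[3]) {
    static const int cmpToAxis[8] = { 2, 1, 2, 1, 2, 2, 0, 0 };
    int bits = ((t[0] < t[1]) << 2) +
               ((t[0] < t[2]) << 1) +
               ((t[1] < t[2]));
    return cmpToAxis[bits];
}
```

For example, crossing times {3, 1, 2} give the bit pattern 001, so the table selects axis 1, which holds the smallest crossing time. The two patterns marked "-" in the table cannot occur, because they would require a cyclic ordering of the three values.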

Bounding volume hierarchies

• Object subdivision. Each primitive appears in the hierarchy exactly once. Additionally, the required space for the hierarchy is bounded.

• BVH vs. grid: both are efficient to build, but BVH provides much faster intersection.

• BVH vs. kd-tree: kd-tree could be slightly faster for intersection, but takes much longer to build. In addition, BVH is generally more numerically robust and less prone to subtle round-off bugs.

• accelerators/bvh.*

BVHAccel

class BVHAccel : public Aggregate {
  <member functions>
  uint32_t maxPrimsInNode;
  enum SplitMethod { SPLIT_MIDDLE, SPLIT_EQUAL_COUNTS, SPLIT_SAH };
  SplitMethod splitMethod;
  vector<Reference<Primitive> > primitives;
  LinearBVHNode *nodes;
}

BVHAccel construction

BVHAccel::BVHAccel(vector<Reference<Primitive> > &p, uint32_t mp, const string &sm) {
  maxPrimsInNode = min(255u, mp);
  for (uint32_t i = 0; i < p.size(); ++i)
    p[i]->FullyRefine(primitives);
  if (sm == "sah") splitMethod = SPLIT_SAH;
  else if (sm == "middle") splitMethod = SPLIT_MIDDLE;
  else if (sm == "equal") splitMethod = SPLIT_EQUAL_COUNTS;
  else {
    Warning("BVH split method \"%s\" unknown. Using \"sah\".", sm.c_str());
    splitMethod = SPLIT_SAH;
  }

BVHAccel construction

  <Initialize buildData array for primitives>
  <Recursively build BVH tree for primitives>
  <Compute representation of depth-first traversal of BVH tree>
}

It is possible to construct a pointer-less BVH tree directly, but it is less straightforward.

Initialize buildData array

vector<BVHPrimitiveInfo> buildData;
buildData.reserve(primitives.size());
for (int i = 0; i < primitives.size(); ++i) {
  BBox bbox = primitives[i]->WorldBound();
  buildData.push_back(BVHPrimitiveInfo(i, bbox));
}

struct BVHPrimitiveInfo {
  BVHPrimitiveInfo() { }
  BVHPrimitiveInfo(int pn, const BBox &b)
    : primitiveNumber(pn), bounds(b) {
    centroid = .5f * b.pMin + .5f * b.pMax;
  }
  int primitiveNumber;
  Point centroid;
  BBox bounds;
};

Recursively build BVH tree

MemoryArena buildArena;
uint32_t totalNodes = 0;
vector<Reference<Primitive> > orderedPrims;
orderedPrims.reserve(primitives.size());
// The index pair (0, primitives.size()) is the half-open range [start, end)
BVHBuildNode *root = recursiveBuild(buildArena, buildData, 0,
    primitives.size(), &totalNodes, orderedPrims);
primitives.swap(orderedPrims);

BVHBuildNode

struct BVHBuildNode {
  void InitLeaf(int first, int n, BBox &b) {
    firstPrimOffset = first;
    nPrimitives = n; bounds = b;
  }
  void InitInterior(int axis, BVHBuildNode *c0, BVHBuildNode *c1) {
    children[0] = c0; children[1] = c1;
    bounds = Union(c0->bounds, c1->bounds);
    splitAxis = axis; nPrimitives = 0;
  }
  BBox bounds;
  BVHBuildNode *children[2];
  int splitAxis, firstPrimOffset, nPrimitives;
};

The leaf contains primitives from BVHAccel::primitives[firstPrimOffset] to [firstPrimOffset+nPrimitives-1]

recursiveBuild

• Given n primitives, there are in general $2^n - 2$ possible ways to partition them into two non-empty groups. In practice, one considers partitions along a coordinate axis, resulting in 6n candidate partitions.

1. Choose axis

2. Choose split

3. Interior(dim, recursiveBuild(.., start, mid, ..), recursiveBuild(.., mid, end, ..))

Choose axis

BBox centroidBounds;
for (int i = start; i < end; ++i)
  centroidBounds = Union(centroidBounds, buildData[i].centroid);
int dim = centroidBounds.MaximumExtent();

If centroidBounds has zero volume, create a leaf

Choose split (split_middle)

float pmid = .5f * (centroidBounds.pMin[dim] + centroidBounds.pMax[dim]);
BVHPrimitiveInfo *midPtr = std::partition(&buildData[start],
    &buildData[end-1]+1, CompareToMid(dim, pmid));
mid = midPtr - &buildData[0];

CompareToMid returns true if the centroid of the given primitive’s bound is below the given midpoint

Choose split (split_equal_count)

mid = (start + end) / 2;
std::nth_element(&buildData[start], &buildData[mid],
    &buildData[end-1]+1, ComparePoints(dim));

It orders the array so that the middle pointer has the median, the first half is smaller and the second half is larger, in O(n).
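Both split rules can be tried on a plain array of 1D centroids. The sketch below is a simplification under stated assumptions: `splitMiddle` and `splitEqualCounts` are illustrative names, and bare `float` values stand in for `BVHPrimitiveInfo` records compared along one axis.

```cpp
#include <algorithm>
#include <vector>

// Midpoint split: centroids below pmid are moved to the front.
// Returns the index of the first element of the second group.
int splitMiddle(std::vector<float> &centroids, float pmid) {
    auto midIt = std::partition(centroids.begin(), centroids.end(),
                                [pmid](float c) { return c < pmid; });
    return int(midIt - centroids.begin());
}

// Equal-count split: place the median at the middle position, with smaller
// values before it and larger values after it (linear time on average).
int splitEqualCounts(std::vector<float> &centroids) {
    int mid = int(centroids.size()) / 2;
    std::nth_element(centroids.begin(), centroids.begin() + mid,
                     centroids.end());
    return mid;
}
```

For the centroids {0, 1, 2, 3, 100}, the midpoint of the extent is 50, so the midpoint split puts four primitives on one side and one on the other, while the equal-count split divides them two and three. This is the kind of uneven distribution where the two rules disagree.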

Choose split

[Figures: a case where both heuristics work well; a case where both are sub-optimal; a better solution]

Choose split (split_SAH)

Do not split:

$$\sum_{i=1}^{N} t_{isect}(i)$$

Split:

$$c(A,B) = t_{trav} + p_A \sum_{i=1}^{N_A} t_{isect}(a_i) + p_B \sum_{i=1}^{N_B} t_{isect}(b_i)$$

$$p_A = \frac{s_A}{s_C}, \quad p_B = \frac{s_B}{s_C}$$

where $s_A$, $s_B$ and $s_C$ are the surface areas of the two children A and B and of the node C being split.
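As a worked instance of the split cost, the sketch below assumes that every primitive has the same intersection cost of 1, so the sums reduce to primitive counts. The function names `sahSplitCost` and `leafCost` are illustrative and do not appear in pbrt.

```cpp
// Estimated cost of splitting node C into children A and B, measured in
// units of one ray-primitive intersection (every primitive assumed equal).
// sA, sB, sC are surface areas; nA, nB are primitive counts.
float sahSplitCost(float tTrav, float sA, int nA,
                   float sB, int nB, float sC) {
    float pA = sA / sC;  // probability that a ray through C also hits A
    float pB = sB / sC;  // probability that a ray through C also hits B
    return tTrav + pA * float(nA) + pB * float(nB);
}

// Cost of not splitting: intersect every primitive in the node.
float leafCost(int n) { return float(n); }
```

With a traversal cost of 0.125 (one eighth of an intersection), a node of area 8 holding 8 primitives, and children of area 4 with 3 primitives and area 2 with 5 primitives, the split costs 0.125 + 0.5·3 + 0.25·5 = 2.875, well below the leaf cost of 8.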

Choose split (split_SAH)

• If there are no more than 4 primitives, use the equal-size heuristic instead.

• Instead of testing 2n candidates, the extent is divided into a small number (12) of buckets of equal extent. Only bucket boundaries are considered.

Choose split (split_SAH)

const int nBuckets = 12;
struct BucketInfo {
  int count; BBox bounds;
};
BucketInfo buckets[nBuckets];
for (int i = start; i < end; ++i) {
  int b = nBuckets *
    ((buildData[i].centroid[dim] - centroidBounds.pMin[dim]) /
     (centroidBounds.pMax[dim] - centroidBounds.pMin[dim]));
  if (b == nBuckets) b = nBuckets-1;
  buckets[b].count++;
  buckets[b].bounds = Union(buckets[b].bounds, buildData[i].bounds);
}

Choose split (split_SAH)

float cost[nBuckets-1];
for (int i = 0; i < nBuckets-1; ++i) {
  BBox b0, b1;
  int count0 = 0, count1 = 0;
  for (int j = 0; j <= i; ++j) {
    b0 = Union(b0, buckets[j].bounds);
    count0 += buckets[j].count;
  }
  for (int j = i+1; j < nBuckets; ++j) {
    b1 = Union(b1, buckets[j].bounds);
    count1 += buckets[j].count;
  }
  cost[i] = .125f + (count0*b0.SurfaceArea() +
                     count1*b1.SurfaceArea()) / bbox.SurfaceArea();
}

Traverse cost : Intersection cost = 1 : 8

Choose split (split_SAH)

float minCost = cost[0];
uint32_t minCostSplit = 0;
for (int i = 1; i < nBuckets-1; ++i) {
  if (cost[i] < minCost) {
    minCost = cost[i];
    minCostSplit = i;
  }
}
if (nPrimitives > maxPrimsInNode || minCost < nPrimitives) {
  BVHPrimitiveInfo *pmid = std::partition(&buildData[start],
      &buildData[end-1]+1,
      CompareToBucket(minCostSplit, nBuckets, dim, centroidBounds));
  mid = pmid - &buildData[0];
} else
  <create a leaf>

Compact BVH

• The last step is to convert the BVH tree into a compact representation, which improves cache, memory and thus overall performance.

BVHAccel traversal

bool BVHAccel::Intersect(const Ray &ray, Intersection *isect) const {
  if (!nodes) return false;
  bool hit = false;
  Point origin = ray(ray.mint);
  Vector invDir(1.f / ray.d.x, 1.f / ray.d.y, 1.f / ray.d.z);
  uint32_t dirIsNeg[3] = { invDir.x < 0, invDir.y < 0, invDir.z < 0 };
  uint32_t nodeNum = 0;     // offset into the nodes array to be visited
  uint32_t todo[64];        // nodes to be visited; acts like a stack
  uint32_t todoOffset = 0;  // next free element in the stack

BVHAccel traversal

  while (true) {
    const LinearBVHNode *node = &nodes[nodeNum];
    if (::IntersectP(node->bounds, ray, invDir, dirIsNeg)) {
      if (node->nPrimitives > 0) {
        // Leaf node: intersect ray with primitives in leaf BVH node
        for (uint32_t i = 0; i < node->nPrimitives; ++i) {
          if (primitives[node->primitivesOffset+i]->Intersect(ray, isect))
            hit = true;
        }
        if (todoOffset == 0) break;
        nodeNum = todo[--todoOffset];
      } else {
        // Interior node: visit the child nearer to the ray origin first
        if (dirIsNeg[node->axis]) {
          todo[todoOffset++] = nodeNum + 1;
          nodeNum = node->secondChildOffset;
        } else {
          todo[todoOffset++] = node->secondChildOffset;
          nodeNum = nodeNum + 1;
        }
      }
    } else {
      // Do not hit the bounding box; retrieve the next one if any
      if (todoOffset == 0) break;
      nodeNum = todo[--todoOffset];
    }
  }

KD-Tree accelerator

• Non-uniform space subdivision (for example, kd-tree and octree) is better than a uniform grid if the scene is irregularly distributed.

Spatial hierarchies

[Figure sequence: the space is split by plane A, then B, then C and D, with the tree shown alongside]

Letters correspond to planes (A, B, C, D)

Point location by recursive search

Variations

kd-tree, octree, bsp-tree

“Hack” kd-tree building

• Split axis

– Round-robin; largest extent

• Split location

– Middle of extent; median of geometry (balanced tree)

• Termination

– Target # of primitives, limited tree depth

• All of these techniques stink.

Building good kd-trees

• What split do we really want?

– Clever idea: the one that makes ray tracing cheap

– Write down an expression of cost and minimize it

– Greedy cost optimization

• What is the cost of tracing a ray through a cell?

Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R)

Splitting with cost in mind

Split in the middle

• Makes the L & R probabilities equal

• Pays no attention to the L & R costs

To get through this part of empty space, you need to test all triangles on the right.

Split at the median

• Makes the L & R costs equal

• Pays no attention to the L & R probabilities

Cost-optimized split

Since Cost(R) is much higher, make it as small as possible

• Automatically and rapidly isolates complexity

• Produces large chunks of empty space

Building good kd-trees

• Need the probabilities

– Turns out to be proportional to surface area

• Need the child cell costs

– Simple triangle count works great (very rough approx.)

– Empty cell “boost”

Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R)

= C_trav + SA(L) * TriCount(L) + SA(R) * TriCount(R)

C_trav is the ratio of the cost to traverse to the cost to intersect

C_trav = 1:80 in pbrt (found by experiments)

Surface area heuristic

2n splits; must coincide with object boundary. Why?

$$p_a = \frac{S_a}{S}, \quad p_b = \frac{S_b}{S}$$

Termination criteria

• When should we stop splitting?

– Bad: depth limit, number of triangles

– Good: when split does not help any more.

• Threshold of cost improvement

– Stretch over multiple levels

– For example, if cost does not go down after three splits in a row, terminate

• Threshold of cell size

– Absolute probability SA(node)/SA(scene) small

Basic building algorithm

1. Pick an axis, or optimize across all three

2. Build a set of candidate split locations (cost extrema must be at bbox vertices)

3. Sort or bin the triangles

4. Sweep to incrementally track L/R counts, cost

5. Output position of minimum cost split

Running time:

$$T(N) = N \log N + 2T(N/2)$$

$$T(N) = N \log^2 N$$

• Characteristics of highly optimized tree

– very deep, very small leaves, big empty cells

Ray traversal algorithm

• Recursive inorder traversal

[Figure: three cases for the split-plane distance $t^*$ relative to the ray segment from $t_{min}$ to $t_{max}$]

– $t_{max} < t^*$: Intersect(L,tmin,tmax)

– $t_{min} < t^* < t_{max}$: Intersect(L,tmin,t*), then Intersect(R,t*,tmax)

– $t^* < t_{min}$: Intersect(R,tmin,tmax)
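The three cases can be written down as a small decision function. This is a sketch under the assumption that the ray travels from the L side to the R side of the split plane and that `tstar` is the ray parameter at the plane; `Visit` and `childrenToVisit` are illustrative names, not pbrt code.

```cpp
#include <vector>

// One child to visit, with the sub-range of the ray that lies inside it.
struct Visit {
    char child;   // 'L' or 'R'
    float t0, t1;
};

// Decide which children the segment [tmin, tmax] overlaps, in visit order.
std::vector<Visit> childrenToVisit(float tmin, float tmax, float tstar) {
    if (tstar > tmax)   // plane lies beyond the segment: only L
        return { {'L', tmin, tmax} };
    if (tstar < tmin)   // plane lies before the segment: only R
        return { {'R', tmin, tmax} };
    // Plane cuts the segment: L first, then R
    return { {'L', tmin, tstar}, {'R', tstar, tmax} };
}
```

Visiting L before R in the third case is what allows traversal to stop early: once a hit is found inside L's sub-range, nothing in R can be closer.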

a video for kdtree

Tree representation

8-byte (reduced from 16-byte, 20% gain)

struct KdAccelNode {
  ...
  union {
    float split;         // Interior
    u_int onePrimitive;  // Leaf
    u_int *primitives;   // Leaf
  };
  union {
    u_int flags;         // Both
    u_int nPrims;        // Leaf
    u_int aboveChild;    // Interior
  };
}

Tree representation

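[Figure: bit layout of a 32-bit float (1 sign bit S, 8 exponent bits E, 23 mantissa bits M) next to the node's 2-bit flags field and n]

float is irrelevant in pbrt2

Flag: 0, 1, 2 (interior x, y, z), 3 (leaf)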

KdTreeAccel construction

• Recursive top-down algorithm

• max depth = $8 + 1.3 \log(N)$

if (nPrims <= maxPrims || depth == 0) {
  <create leaf>
}

Interior node

• Choose split axis position

– Midpoint

– Median cut

– Area heuristic

• Create leaf if no good splits were found

• Classify primitives with respect to split

Choose split axis position

cost of no split: $\sum_{k=1}^{N} t_i(k)$

cost of split: $t_t + p_B \sum_{k=1}^{N_B} t_i(b_k) + p_A \sum_{k=1}^{N_A} t_i(a_k)$

assumptions:

1. $t_i$ is the same for all primitives

2. $t_i : t_t = 80 : 1$ (determined by experiments, main factor for the performance)

cost of no split: $t_i N$

cost of split: $t_t + t_i (1 - b_e)(p_B N_B + p_A N_A)$

[Figure: a region A containing two regions B and C]

$$p(B|A) = \frac{s_B}{s_A}, \quad p(C|A) = \frac{s_C}{s_A}$$

Choose split axis position

Start from the axis with maximum extent, sort all edge events and process them in order

[Figure: primitives A, B and C along the axis, with edge events in sorted order $a_0$, $b_0$, $a_1$, $b_1$, $c_0$, $c_1$]

Choose split axis position

If there is no split along this axis, try other axes.

When all fail, create a leaf.
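The edge-event sweep along one axis can be sketched as follows. This is a deliberately simplified 1D version and not pbrt's implementation: probabilities are approximated by length ratios along the axis instead of surface-area ratios, the empty-cell bonus is omitted, and `Interval`, `Edge` and `bestSplit` are hypothetical names. The default costs follow the 80 : 1 ratio quoted above.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Interval { float lo, hi; };       // a primitive's extent on the axis
struct Edge { float t; bool isStart; };  // one edge event per interval end

// Sweep the sorted edge events and return the candidate position with the
// lowest cost  tTrav + tIsect * (pBelow * nBelow + pAbove * nAbove).
float bestSplit(const std::vector<Interval> &prims, float nodeLo,
                float nodeHi, float tTrav = 1.f, float tIsect = 80.f) {
    std::vector<Edge> edges;
    for (const Interval &p : prims) {
        edges.push_back({p.lo, true});
        edges.push_back({p.hi, false});
    }
    // Sort by position; at equal positions, end events come first.
    std::sort(edges.begin(), edges.end(), [](const Edge &a, const Edge &b) {
        return a.t == b.t ? (!a.isStart && b.isStart) : a.t < b.t;
    });
    int nBelow = 0, nAbove = int(prims.size());
    float bestCost = std::numeric_limits<float>::infinity();
    float bestT = nodeLo;
    for (const Edge &e : edges) {
        if (!e.isStart) --nAbove;  // this primitive ends at e.t
        if (e.t > nodeLo && e.t < nodeHi) {
            float pBelow = (e.t - nodeLo) / (nodeHi - nodeLo);
            float pAbove = (nodeHi - e.t) / (nodeHi - nodeLo);
            float cost = tTrav + tIsect * (pBelow * nBelow + pAbove * nAbove);
            if (cost < bestCost) { bestCost = cost; bestT = e.t; }
        }
        if (e.isStart) ++nBelow;   // this primitive starts at e.t
    }
    return bestT;
}
```

For three primitives spanning [0, 1], [1, 2] and [8, 10] inside a node spanning [0, 10], the sweep picks the position 2, which cuts off the two clustered primitives and leaves most of the empty space on the side with a single primitive.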

KdTreeAccel traversal

[Figure sequence: traversal of a ray segment from $t_{min}$ to $t_{max}$, showing $t_{plane}$, the near and far children, and the ToDo stack at each step]

KdTreeAccel traversal

bool KdTreeAccel::Intersect(const Ray &ray, Intersection *isect) {
  float tmin, tmax;
  if (!bounds.IntersectP(ray, &tmin, &tmax))
    return false;
  KdAccelNode *node = &nodes[0];
  while (node != NULL) {
    if (ray.maxt < tmin) break;
    if (!node->IsLeaf()) <Interior>
    else <Leaf>
  }
}

ToDo stack (max depth)

Leaf node

1. Check whether ray intersects primitive(s) inside the node; update ray’s maxt

2. Grab next node from ToDo stack

Interior node

1. Determine near and far (by testing which side O is)

below: node+1

above: &(nodes[node->aboveChild])

2. Determine whether we can skip a node

[Figure: two cases for $t_{plane}$ relative to $t_{min}$ and $t_{max}$, each with the near and far children marked]

Acceleration techniques

Best efficiency scheme

References

• J. Goldsmith and J. Salmon, Automatic Creation of Object Hierarchies for Ray Tracing, IEEE CG&A, 1987.

• Brian Smits, Efficiency Issues for Ray Tracing, Journal of Graphics Tools, 1998.

• K. Klimaszewski and T. Sederberg, Faster Ray Tracing Using Adaptive Grids, IEEE CG&A Jan/Feb 1999.
