Acceleration
Digital Image Synthesis
Yung-Yu Chuang
Acceleration techniques
Bounding volume hierarchy
1) Find bounding box of objects
2) Split objects into two groups
3) Recurse
Where to split?
• At midpoint
• Sort, and put half of the objects on each side
• Use modeling hierarchy
BVH traversal
• If hit parent, then check all children
• Don't return the intersection immediately, because the other subvolumes may have a closer intersection
Space subdivision approaches
• Uniform grid
• Quadtree (2D)
• KD tree
• BSP tree
Uniform grid
Preprocess scene
1. Find bounding box
2. Determine grid resolution
3. Place object in cell if its bounding box overlaps the cell
4. Check that object overlaps cell (expensive!)
Uniform grid traversal
Preprocess scene, then traverse grid
3D line = 3D-DDA (Digital Differential Analyzer)
$y = mx + b$, $x_{i+1} = x_i + 1$
naive: $y_{i+1} = m x_{i+1} + b$
DDA: $y_{i+1} = y_i + m$
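The difference between the naive update and the DDA update can be sketched in a few lines of C++. This is only an illustration, not code from pbrt: the names `naiveLine` and `ddaLine` are made up here, and x is assumed to advance by exactly one per step.

```cpp
#include <cassert>
#include <vector>

// Naive: evaluate y = m*x + b from scratch at every step (one multiply per sample).
std::vector<float> naiveLine(float m, float b, int x0, int n) {
    assert(n >= 0);
    std::vector<float> ys;
    for (int i = 0; i < n; ++i)
        ys.push_back(m * float(x0 + i) + b);
    return ys;
}

// DDA: because x_{i+1} = x_i + 1, we have y_{i+1} = y_i + m (one add per sample).
std::vector<float> ddaLine(float m, float b, int x0, int n) {
    assert(n >= 0);
    std::vector<float> ys;
    float y = m * float(x0) + b;
    for (int i = 0; i < n; ++i) {
        ys.push_back(y);
        y += m;   // incremental update
    }
    return ys;
}
```

The grid traversal uses the same idea in 3D: the ray parameter at the next voxel boundary is updated by a constant increment per axis instead of being recomputed.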
Octree
K-d tree
[Figure sequence: space is split recursively by planes A, B, C, D; the tree has A at the root and the other planes as interior nodes]
Leaf nodes correspond to unique regions in space
K-d tree traversal
[Figure: the same tree and subdivision, traversed by a ray]
BSP tree
[Figure sequence: polygons 1 to 11 are partitioned recursively; each node's plane separates the inside ones from the outside ones, and polygons that straddle a plane are split (9 into 9a and 9b, 11 into 11a and 11b)]
BSP tree traversal
[Figure sequence: traversal of the BSP tree with respect to a given point]
Classes
• Primitive (in core/primitive.*)
– GeometricPrimitive
– InstancePrimitive
– Aggregate
• Three types of accelerators are provided (in accelerators/*.cpp)
– GridAccel
– BVHAccel
– KdTreeAccel
Hierarchy
Primitive
– GeometricPrimitive: holds a Material and a Shape
– TransformedPrimitive: supports both instancing and animation
– Aggregate: stores intersectable primitives, calls Refine if necessary
Primitive
class Primitive : public ReferenceCounted {
    <Primitive interface>
    const int primitiveId;
    static int nextprimitiveId;
};
class TransformedPrimitive : public Primitive {
    …
};
Interface
// geometry
BBox WorldBound();
bool CanIntersect();
bool Intersect(const Ray &r, Intersection *in);   // updates maxt
bool IntersectP(const Ray &r);
void Refine(vector<Reference<Primitive>> &refined);
void FullyRefine(vector<Reference<Primitive>> &refined);
// material
AreaLight *GetAreaLight();
BSDF *GetBSDF(const DifferentialGeometry &dg, Transform &WorldToObject);
BSSRDF *GetBSSRDF(DifferentialGeometry &dg, Transform &WorldToObject);
Intersection
struct Intersection {
    <Intersection interface>
    DifferentialGeometry dg;
    const Primitive *primitive;
    Transform WorldToObject, ObjectToWorld;
    int shapeId, primitiveId;
    float rayEpsilon;   // adaptively estimated
};
primitive stores the actual intersecting primitive, hence
GeometricPrimitive
• represents a single shape
• holds a reference to a Shape and its Material, and a pointer to an AreaLight
Reference<Shape> shape;
Reference<Material> material;   // BRDF
AreaLight *areaLight;           // emittance
• Most operations are forwarded to shape
GeometricPrimitive
bool Intersect(Ray &r, Intersection *isect) {
    float thit, rayEpsilon;
    if (!shape->Intersect(r, &thit, &rayEpsilon, &isect->dg))
        return false;
    // Record which primitive was hit and its transformations
    isect->primitive = this;
    isect->WorldToObject = *shape->WorldToObject;
    isect->ObjectToWorld = *shape->ObjectToWorld;
    isect->shapeId = shape->shapeId;
    isect->primitiveId = primitiveId;
    isect->rayEpsilon = rayEpsilon;
    r.maxt = thit;   // shorten the ray so that farther hits are rejected
    return true;
}
Object instancing
61 unique plants, 4000 individual plants, 19.5M triangles
With instancing, store only 1.1M triangles, 11GB->600MB
TransformedPrimitive
Reference<Primitive> primitive;
AnimatedTransform WorldToPrimitive;   // for instancing and animation
Ray ray = WorldToPrimitive(r);
if (!primitive->Intersect(ray, isect)) return false;
r.maxt = ray.maxt;
TransformedPrimitive
bool Intersect(Ray &r, Intersection *isect) {
    Transform w2p;
    WorldToPrimitive.Interpolate(r.time, &w2p);
    Ray ray = w2p(r);
    if (!primitive->Intersect(ray, isect))
        return false;
    r.maxt = ray.maxt;
    isect->primitiveId = primitiveId;
    if (!w2p.IsIdentity()) {
        // Compute world-to-object transformation for instance
        isect->WorldToObject = isect->WorldToObject * w2p;
        isect->ObjectToWorld = Inverse(isect->WorldToObject);
TransformedPrimitive
        // Transform instance's differential geometry to world space
        Transform PrimitiveToWorld = Inverse(w2p);
        isect->dg.p = PrimitiveToWorld(isect->dg.p);
        isect->dg.nn = Normalize(PrimitiveToWorld(isect->dg.nn));
        isect->dg.dpdu = PrimitiveToWorld(isect->dg.dpdu);
        isect->dg.dpdv = PrimitiveToWorld(isect->dg.dpdv);
        isect->dg.dndu = PrimitiveToWorld(isect->dg.dndu);
        isect->dg.dndv = PrimitiveToWorld(isect->dg.dndv);
    }
    return true;
}
Aggregates
• Acceleration is a core component of a ray tracer because ray/scene intersection accounts for the majority of execution time
• Goal: reduce the number of ray/primitive intersections by rejecting whole groups of primitives at once, and by exploiting the fact that nearby intersections are likely to be found first
• Two main approaches: spatial subdivision, object subdivision
• No clear winner
Ray-Box intersections
• Almost all accelerators require it
• Quick rejection; use the enter and exit points to traverse the hierarchy
• AABB is the intersection of three slabs
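Ray-Box intersections
$t_1 = \frac{x_1 - O_x}{D_x}$ (and likewise $t_0$ for the plane $x = x_0$)
[Figure: a ray with origin O and direction D crosses the slab bounded by the planes $x = x_0$ and $x = x_1$]
Ray-Box intersections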
bool BBox::IntersectP(const Ray &ray, float *hitt0, float *hitt1) {
    float t0 = ray.mint, t1 = ray.maxt;
    for (int i = 0; i < 3; ++i) {
        // Intersect the ray with the slab along axis i
        float invRayDir = 1.f / ray.d[i];
        float tNear = (pMin[i] - ray.o[i]) * invRayDir;
        float tFar = (pMax[i] - ray.o[i]) * invRayDir;
        if (tNear > tFar) swap(tNear, tFar);
        // segment intersection
        t0 = tNear > t0 ? tNear : t0;
        t1 = tFar < t1 ? tFar : t1;
        if (t0 > t1) return false;   // intersection is empty
    }
    if (hitt0) *hitt0 = t0;
    if (hitt1) *hitt1 = t1;
    return true;
}
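The slab test can be tried outside of pbrt with a small stand-alone sketch. `Vec3` and `slabTest` are hypothetical names introduced for this example only, and the code relies on IEEE floating-point behavior (division by zero yields an infinity) for rays parallel to a slab.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct Vec3 { float v[3]; };   // hypothetical minimal 3-vector

// Returns true if the ray o + t*d, with t in [tMin, tMax], overlaps the
// axis-aligned box [lo, hi]. On success *t0 and *t1 hold the entry and
// exit parameters.
bool slabTest(const Vec3 &o, const Vec3 &d, const Vec3 &lo, const Vec3 &hi,
              float tMin, float tMax, float *t0, float *t1) {
    assert(t0 && t1);
    float a = tMin, b = tMax;
    for (int i = 0; i < 3; ++i) {
        float inv = 1.f / d.v[i];               // +/-infinity when d.v[i] == 0
        float tNear = (lo.v[i] - o.v[i]) * inv;
        float tFar = (hi.v[i] - o.v[i]) * inv;
        if (tNear > tFar) std::swap(tNear, tFar);
        a = std::max(a, tNear);                 // latest entry over all slabs
        b = std::min(b, tFar);                  // earliest exit over all slabs
        if (a > b) return false;                // the interval became empty
    }
    *t0 = a;
    *t1 = b;
    return true;
}
```

For a ray starting at (-1, 0.5, 0.5) and travelling along +x, the unit box [0,1]^3 is entered at t = 1 and left at t = 2.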
Grid accelerator
• Uniform grid
• Teapot in a stadium problem: not adaptive to the distribution of primitives.
• Have to determine the number of voxels (problems with too many or too few).
GridAccel
class GridAccel : public Aggregate {
    <GridAccel methods>
    u_int nMailboxes;
    MailboxPrim *mailboxes;
    vector<Reference<Primitive>> primitives;
    int nVoxels[3];
    BBox bounds;
    Vector width, invWidth;
    Voxel **voxels;
    MemoryArena voxelArena;
    RWMutex *rwMutex;
};
mailbox
struct MailboxPrim {
    Reference<Primitive> primitive;
    int lastMailboxId;
};
GridAccel
GridAccel(vector<Reference<Primitive> > &p, bool forRefined, bool refineImmediately)
    : gridForRefined(forRefined) {
    // Initialize with primitives for grid
    if (refineImmediately)
        for (int i = 0; i < p.size(); ++i)
            p[i]->FullyRefine(primitives);
    else
        primitives = p;
    // Compute the bounds of all primitives
    for (int i = 0; i < primitives.size(); ++i)
        bounds = Union(bounds, primitives[i]->WorldBound());
Determine number of voxels
• Too many voxels → slow traverse, large memory consumption (bad cache performance)
• Too few voxels → too many primitives in a voxel
• Let the axis with the largest extent have $3\sqrt[3]{N}$ partitions (N: number of primitives)
Vector delta = bounds.pMax - bounds.pMin;
int maxAxis = bounds.MaximumExtent();
float invMaxWidth = 1.f / delta[maxAxis];
float cubeRoot = 3.f * powf(float(primitives.size()), 1.f/3.f);
float voxelsPerUnitDist = cubeRoot * invMaxWidth;
Calculate voxel size and allocate voxels
for (int axis = 0; axis < 3; ++axis) {
    nVoxels[axis] = Round2Int(delta[axis] * voxelsPerUnitDist);
    nVoxels[axis] = Clamp(nVoxels[axis], 1, 64);
}
for (int axis = 0; axis < 3; ++axis) {
    width[axis] = delta[axis] / nVoxels[axis];
    invWidth[axis] = (width[axis] == 0.f) ? 0.f : 1.f / width[axis];
}
int nv = nVoxels[0] * nVoxels[1] * nVoxels[2];
voxels = AllocAligned<Voxel *>(nv);
memset(voxels, 0, nv * sizeof(Voxel *));
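As a rough stand-alone sketch of the resolution rule, the function below computes the voxel counts for a box with given extents. `gridResolution` is a hypothetical helper written for this note; it follows the slide's rule (about $3\sqrt[3]{N}$ voxels along the longest axis, clamped to [1, 64]) and uses only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// delta: extents of the scene bounds; nPrims: number of primitives.
// Writes the number of voxels per axis into nVoxels.
void gridResolution(const float delta[3], int nPrims, int nVoxels[3]) {
    assert(nPrims > 0);
    int maxAxis = 0;
    for (int a = 1; a < 3; ++a)
        if (delta[a] > delta[maxAxis]) maxAxis = a;
    assert(delta[maxAxis] > 0.f);
    // About 3 * cbrt(N) voxels along the longest axis, same density on the others
    float voxelsPerUnitDist = 3.f * std::cbrt(float(nPrims)) / delta[maxAxis];
    for (int a = 0; a < 3; ++a) {
        int n = int(std::lround(delta[a] * voxelsPerUnitDist));
        nVoxels[a] = std::min(std::max(n, 1), 64);   // clamp to [1, 64]
    }
}
```

With 1000 primitives in a 10 x 5 x 0 box this gives 30 x 15 x 1 voxels; a flat axis still gets one voxel because of the clamp.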
Conversion between voxel and position
int posToVoxel(const Point &P, int axis) const {
    int v = Float2Int((P[axis] - bounds.pMin[axis]) * invWidth[axis]);
    return Clamp(v, 0, nVoxels[axis]-1);
}
float voxelToPos(int p, int axis) const {
    return bounds.pMin[axis] + p * width[axis];
}
Point voxelToPos(int x, int y, int z) const {
    return bounds.pMin + Vector(x*width[0], y*width[1], z*width[2]);
}
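A one-axis version of the two conversions makes the clamping behavior easy to check. `Grid1D` is a hypothetical type used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>

struct Grid1D {           // hypothetical: one axis of a uniform grid
    float pMin;           // lower bound of the grid along this axis
    float width;          // width of one voxel
    int nVoxels;          // number of voxels along this axis

    // Position -> voxel index, clamped so that points on or outside the
    // bounds still map to a valid voxel.
    int posToVoxel(float p) const {
        assert(width > 0.f && nVoxels > 0);
        int v = int((p - pMin) / width);
        return std::min(std::max(v, 0), nVoxels - 1);
    }
    // Voxel index -> position of the voxel's lower boundary.
    float voxelToPos(int v) const { return pMin + v * width; }
};
```

The clamp matters for points exactly on the upper bound, which would otherwise map to index nVoxels.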
Add primitives into voxels
for (u_int i = 0; i < primitives.size(); ++i) {
    <Find voxel extent of primitive>
    <Add primitive to overlapping voxels>
}
<Find voxel extent of primitive>
BBox pb = primitives[i]->WorldBound();
int vmin[3], vmax[3];
for (int axis = 0; axis < 3; ++axis) {
    vmin[axis] = posToVoxel(pb.pMin, axis);
    vmax[axis] = posToVoxel(pb.pMax, axis);
}
<Add primitive to overlapping voxels>
for (int z = vmin[2]; z <= vmax[2]; ++z)
    for (int y = vmin[1]; y <= vmax[1]; ++y)
        for (int x = vmin[0]; x <= vmax[0]; ++x) {
            int o = offset(x, y, z);
            if (!voxels[o]) {
                // Allocate a new voxel and store the primitive in it
                voxels[o] = voxelArena.Alloc<Voxel>();
                *voxels[o] = Voxel(primitives[i]);
            }
            else {
                // Add primitive to already-allocated voxel
                voxels[o]->AddPrimitive(primitives[i]);
            }
        }
Voxel structure
struct Voxel {
    <Voxel methods>
    vector<Reference<Primitive>> primitives;
    bool allCanIntersect;
};
Voxel(Reference<Primitive> op) {
    allCanIntersect = false;
    primitives.push_back(op);
}
GridAccel traversal
bool GridAccel::Intersect(Ray &ray, Intersection *isect) {
    <Check ray against overall grid bounds>
    <Set up 3D DDA for ray>
    <Walk ray through voxel grid>
}
Check against overall bound
float rayT;
if (bounds.Inside(ray(ray.mint)))
    rayT = ray.mint;
else if (!bounds.IntersectP(ray, &rayT))
    return false;
Point gridIntersect = ray(rayT);
Set up 3D DDA (Digital Differential Analyzer)
• Similar to Bresenham's line drawing algorithm
[Figure: DDA state for a ray that enters the grid at rayT, labelled with the voxel index, NextCrossingT[1], DeltaT[0], Step[0]=1 and Out; the values drawn in blue change along the traversal]
Set up 3D DDA
for (int axis = 0; axis < 3; ++axis) {
    Pos[axis] = posToVoxel(gridIntersect, axis);
    if (ray.d[axis] >= 0) {
        // Ray moves in the positive direction along this axis
        NextCrossingT[axis] = rayT +
            (voxelToPos(Pos[axis]+1, axis) - gridIntersect[axis]) / ray.d[axis];
        DeltaT[axis] = width[axis] / ray.d[axis];
        Step[axis] = 1;
        Out[axis] = nVoxels[axis];
    } else {
        ...
        Step[axis] = -1;
        Out[axis] = -1;
    }
}
Walk through grid
for (;;) {
    Voxel *voxel = voxels[offset(Pos[0], Pos[1], Pos[2])];
    if (voxel != NULL)
        // Do not return on a hit; the hit cuts the ray's maxt instead
        hitSomething |= voxel->Intersect(ray, isect, rayId);
    <Advance to next voxel>
}
return hitSomething;
Advance to next voxel
int bits = ((NextCrossingT[0] < NextCrossingT[1]) << 2) +
           ((NextCrossingT[0] < NextCrossingT[2]) << 1) +
           ((NextCrossingT[1] < NextCrossingT[2]));
const int cmpToAxis[8] = { 2, 1, 2, 1, 2, 2, 0, 0 };
int stepAxis = cmpToAxis[bits];
if (ray.maxt < NextCrossingT[stepAxis]) break;
Pos[stepAxis] += Step[stepAxis];
if (Pos[stepAxis] == Out[stepAxis]) break;
NextCrossingT[stepAxis] += DeltaT[stepAxis];
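conditions (x, y, z stand for NextCrossingT[0], NextCrossingT[1], NextCrossingT[2])
x<y  x<z  y<z  ordering  stepAxis
0    0    0    x≥y≥z     2
0    0    1    x≥z>y     1
0    1    0    -         (impossible)
0    1    1    z>x≥y     1
1    0    0    y>x≥z     2
1    0    1    -         (impossible)
1    1    0    y≥z>x     0
1    1    1    z>y>x     0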
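The lookup table can be checked against a plain branching version. The sketch below is stand-alone; `stepAxisFromTable` and `stepAxisBranching` are names made up for the comparison, and the table values are the ones from the slide.

```cpp
#include <cassert>

// Index of the smallest of t[0..2], using three comparisons and a table.
int stepAxisFromTable(const float t[3]) {
    static const int cmpToAxis[8] = { 2, 1, 2, 1, 2, 2, 0, 0 };
    int bits = ((t[0] < t[1]) << 2) + ((t[0] < t[2]) << 1) + (t[1] < t[2]);
    return cmpToAxis[bits];
}

// Reference version with branches; ties are broken the same way as in the table.
int stepAxisBranching(const float t[3]) {
    if (t[0] < t[1] && t[0] < t[2]) return 0;
    return (t[1] < t[2]) ? 1 : 2;
}
```

The two bit patterns marked as impossible in the table never occur for ordinary (non-NaN) values, so their table entries are arbitrary.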
Bounding volume hierarchies
• Object subdivision. Each primitive appears in the hierarchy exactly once. Additionally, the required space for the hierarchy is bounded.
• BVH vs. grid: both are efficient to build, but BVH provides much faster intersection.
• BVH vs. kd-tree: the kd-tree could be slightly faster for intersection, but takes much longer to build. In addition, BVH is generally more numerically robust and less prone to subtle round-off bugs.
• accelerators/bvh.*
BVHAccel
class BVHAccel : public Aggregate {
    <member functions>
    uint32_t maxPrimsInNode;
    enum SplitMethod { SPLIT_MIDDLE, SPLIT_EQUAL_COUNTS, SPLIT_SAH };
    SplitMethod splitMethod;
    vector<Reference<Primitive> > primitives;
    LinearBVHNode *nodes;
};
BVHAccel construction
BVHAccel::BVHAccel(vector<Reference<Primitive> > &p, uint32_t mp, const string &sm) {
    maxPrimsInNode = min(255u, mp);
    for (uint32_t i = 0; i < p.size(); ++i)
        p[i]->FullyRefine(primitives);
    if (sm == "sah") splitMethod = SPLIT_SAH;
    else if (sm == "middle") splitMethod = SPLIT_MIDDLE;
    else if (sm == "equal") splitMethod = SPLIT_EQUAL_COUNTS;
    else {
        Warning("BVH split method \"%s\" unknown. Using \"sah\".", sm.c_str());
        splitMethod = SPLIT_SAH;
    }
BVHAccel construction
    <Initialize buildData array for primitives>
    <Recursively build BVH tree for primitives>
    <Compute representation of depth-first traversal of BVH tree>
}
It is possible to construct a pointer-less BVH tree directly, but it is less straightforward.
Initialize buildData array
vector<BVHPrimitiveInfo> buildData;
buildData.reserve(primitives.size());
for (int i = 0; i < primitives.size(); ++i) {
    BBox bbox = primitives[i]->WorldBound();
    buildData.push_back(BVHPrimitiveInfo(i, bbox));
}
struct BVHPrimitiveInfo {
    BVHPrimitiveInfo() { }
    BVHPrimitiveInfo(int pn, const BBox &b)
        : primitiveNumber(pn), bounds(b) {
        centroid = .5f * b.pMin + .5f * b.pMax;
    }
    int primitiveNumber;
    Point centroid;
    BBox bounds;
};
Recursively build BVH tree
MemoryArena buildArena;
uint32_t totalNodes = 0;
vector<Reference<Primitive> > orderedPrims;
orderedPrims.reserve(primitives.size());
// The range passed to recursiveBuild is [start, end)
BVHBuildNode *root = recursiveBuild(buildArena, buildData, 0, primitives.size(),
                                    &totalNodes, orderedPrims);
primitives.swap(orderedPrims);
BVHBuildNode
struct BVHBuildNode {
    void InitLeaf(int first, int n, BBox &b) {
        firstPrimOffset = first;
        nPrimitives = n; bounds = b;
    }
    void InitInterior(int axis, BVHBuildNode *c0, BVHBuildNode *c1) {
        children[0] = c0; children[1] = c1;
        bounds = Union(c0->bounds, c1->bounds);
        splitAxis = axis; nPrimitives = 0;
    }
    BBox bounds;
    BVHBuildNode *children[2];
    int splitAxis, firstPrimOffset, nPrimitives;
};
The leaf contains primitives from BVHAccel::primitives[firstPrimOffset] to [firstPrimOffset+nPrimitives-1]
recursiveBuild
• Given n primitives, there are in general $2^n - 2$ possible ways to partition them into two non-empty groups. In practice, one considers partitions along a coordinate axis, resulting in 6n candidate partitions.
1. Choose axis
2. Choose split
3. Interior(dim,
Choose axis
BBox centroidBounds;
for (int i = start; i < end; ++i)
    centroidBounds = Union(centroidBounds, buildData[i].centroid);
int dim = centroidBounds.MaximumExtent();
If centroidBounds has zero volume, create a leaf
Choose split (split_middle)
float pmid = .5f * (centroidBounds.pMin[dim] + centroidBounds.pMax[dim]);
BVHPrimitiveInfo *midPtr = std::partition(
    &buildData[start], &buildData[end-1]+1, CompareToMid(dim, pmid));
mid = midPtr - &buildData[0];
CompareToMid returns true if the centroid of the given primitive's bound is below the given midpoint
Choose split (split_equal_count)
mid = (start + end) / 2;
std::nth_element(&buildData[start], &buildData[mid], &buildData[end-1]+1,
                 ComparePoints(dim));
It orders the array in O(n) so that the middle pointer holds the median, nothing in the first half is larger than it, and nothing in the second half is smaller.
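Both split rules map directly onto standard-library algorithms, as the following stand-alone sketch shows. It works on bare centroid coordinates instead of BVHPrimitiveInfo records, and `splitMiddle` and `splitEqualCounts` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Midpoint split: centroids below pmid are moved to the front.
// Returns the index of the first element of the second group.
std::size_t splitMiddle(std::vector<float> &c, float pmid) {
    auto it = std::partition(c.begin(), c.end(),
                             [pmid](float x) { return x < pmid; });
    return std::size_t(it - c.begin());
}

// Equal-counts split: after the call, c[mid] is the median, the elements
// before it are not larger and the elements after it are not smaller.
std::size_t splitEqualCounts(std::vector<float> &c) {
    assert(!c.empty());
    std::size_t mid = c.size() / 2;
    std::nth_element(c.begin(), c.begin() + mid, c.end());
    return mid;
}
```

For centroids {0.1, 9.0, 0.3, 0.2} the midpoint of the extent is 4.55, so the midpoint split puts three primitives on one side and one on the other, while the equal-counts split always yields two halves of (nearly) equal size.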
Choose split
[Figures: a scene where both heuristics work well, and a scene where both are sub-optimal]
Choose split (split_SAH)
Do not split: $\sum_{i=1}^{N} t_{isect}(i)$
Split: $c(A,B) = t_{trav} + p_A \sum_{i=1}^{N_A} t_{isect}(a_i) + p_B \sum_{i=1}^{N_B} t_{isect}(b_i)$
$p(A \mid C) = \frac{s_A}{s_C}$, where $s$ denotes surface area; hence $p_A = s_A / s_C$ and $p_B = s_B / s_C$
Choose split (split_SAH)
• If there are no more than 4 primitives, use the equal-counts heuristic instead.
• Instead of testing 2n candidates, the extent is divided into a small number (12) of buckets of equal extent. Only bucket boundaries are considered.
Choose split (split_SAH)
const int nBuckets = 12;
struct BucketInfo {
    BucketInfo() { count = 0; }   // counts must start at zero
    int count; BBox bounds;
};
BucketInfo buckets[nBuckets];
for (int i = start; i < end; ++i) {
    // Find the bucket that contains the primitive's centroid
    int b = nBuckets *
        ((buildData[i].centroid[dim] - centroidBounds.pMin[dim]) /
         (centroidBounds.pMax[dim] - centroidBounds.pMin[dim]));
    if (b == nBuckets) b = nBuckets-1;
    buckets[b].count++;
    buckets[b].bounds = Union(buckets[b].bounds, buildData[i].bounds);
}
Choose split (split_SAH)
float cost[nBuckets-1];
for (int i = 0; i < nBuckets-1; ++i) {
    BBox b0, b1;
    int count0 = 0, count1 = 0;
    for (int j = 0; j <= i; ++j) {
        b0 = Union(b0, buckets[j].bounds);
        count0 += buckets[j].count; }
    for (int j = i+1; j < nBuckets; ++j) {
        b1 = Union(b1, buckets[j].bounds);
        count1 += buckets[j].count; }
    // SAH cost of splitting after bucket i (bbox: bounds of the current node)
    cost[i] = .125f + (count0*b0.SurfaceArea() + count1*b1.SurfaceArea()) /
              bbox.SurfaceArea();
}
Choose split (split_SAH)
float minCost = cost[0]; uint32_t minCostSplit = 0;
for (int i = 1; i < nBuckets-1; ++i) {
    if (cost[i] < minCost) {
        minCost = cost[i];
        minCostSplit = i;
    }
}
if (nPrimitives > maxPrimsInNode || minCost < nPrimitives) {
    BVHPrimitiveInfo *pmid = std::partition(&buildData[start], &buildData[end-1]+1,
        CompareToBucket(minCostSplit, nBuckets, dim, centroidBounds));
    mid = pmid - &buildData[0];
}
else <create a leaf>
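The bucketed SAH evaluation can be reproduced in a self-contained sketch. `Box` and `bestBucketSplit` are hypothetical stand-ins for BBox and the build code, the intersection cost is taken as 1 per primitive, and the relative traversal cost of 0.125 is an assumption of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Box {   // hypothetical minimal axis-aligned box; default state is empty
    float lo[3], hi[3];
    Box() {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::numeric_limits<float>::max();
            hi[a] = std::numeric_limits<float>::lowest();
        }
    }
    Box(float x0, float y0, float z0, float x1, float y1, float z1) {
        lo[0] = x0; lo[1] = y0; lo[2] = z0;
        hi[0] = x1; hi[1] = y1; hi[2] = z1;
    }
    void grow(const Box &b) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], b.lo[a]);
            hi[a] = std::max(hi[a], b.hi[a]);
        }
    }
    float area() const {
        if (hi[0] < lo[0]) return 0.f;   // empty box
        float dx = hi[0] - lo[0], dy = hi[1] - lo[1], dz = hi[2] - lo[2];
        return 2.f * (dx * dy + dx * dz + dy * dz);
    }
};

// Returns the bucket index i such that splitting after bucket i has the
// lowest SAH cost along axis dim; the cost itself is written to *bestCost.
int bestBucketSplit(const std::vector<Box> &prims, int dim, int nBuckets,
                    float *bestCost) {
    assert(!prims.empty() && nBuckets >= 2 && bestCost);
    Box total;
    float cMin = std::numeric_limits<float>::max();
    float cMax = std::numeric_limits<float>::lowest();
    for (const Box &p : prims) {
        total.grow(p);
        float c = 0.5f * (p.lo[dim] + p.hi[dim]);
        cMin = std::min(cMin, c);
        cMax = std::max(cMax, c);
    }
    assert(cMax > cMin);   // otherwise a leaf should be created instead
    std::vector<int> count(nBuckets, 0);
    std::vector<Box> bounds(nBuckets);
    for (const Box &p : prims) {
        float c = 0.5f * (p.lo[dim] + p.hi[dim]);
        int b = int(nBuckets * (c - cMin) / (cMax - cMin));
        if (b == nBuckets) b = nBuckets - 1;
        ++count[b];
        bounds[b].grow(p);
    }
    int best = 0;
    *bestCost = std::numeric_limits<float>::max();
    for (int i = 0; i < nBuckets - 1; ++i) {
        Box b0, b1;
        int n0 = 0, n1 = 0;
        for (int j = 0; j <= i; ++j) { b0.grow(bounds[j]); n0 += count[j]; }
        for (int j = i + 1; j < nBuckets; ++j) { b1.grow(bounds[j]); n1 += count[j]; }
        float cost = 0.125f + (n0 * b0.area() + n1 * b1.area()) / total.area();
        if (cost < *bestCost) { *bestCost = cost; best = i; }
    }
    return best;
}
```

With two overlapping boxes near x = 0 and one box near x = 10, the cheapest split separates the two clusters, and its cost (about 0.55) is well below the leaf cost of 3.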
Compact BVH
• The last step is to convert the BVH tree into a compact representation, which improves cache behavior, memory use and thus overall performance.
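One way to picture the compact form is a depth-first flattening in which the first child always directly follows its parent, so only the offset of the second child has to be stored. The sketch below uses simplified hypothetical node types (`BuildNode`, `LinearNode`) that carry an integer payload instead of bounds and primitive ranges.

```cpp
#include <cassert>
#include <vector>

struct BuildNode {            // pointer-based node, as produced by the build
    int value;                // stands in for bounds / primitive range
    BuildNode *children[2];   // both null for a leaf
};

struct LinearNode {           // array-based node
    int value;
    int secondChildOffset;    // index of the second child; -1 for a leaf
};

// Appends the subtree rooted at node in depth-first order and returns the
// index at which the root of that subtree was stored.
int flatten(const BuildNode *node, std::vector<LinearNode> &out) {
    assert(node);
    int myIndex = int(out.size());
    out.push_back({node->value, -1});
    if (node->children[0]) {
        assert(node->children[1]);
        flatten(node->children[0], out);              // stored at myIndex + 1
        int second = flatten(node->children[1], out);
        out[myIndex].secondChildOffset = second;
    }
    return myIndex;
}
```

During traversal the first child is then reached with nodeNum + 1 and the second child through the stored offset.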
BVHAccel traversal
bool BVHAccel::Intersect(const Ray &ray, Intersection *isect) const {
    if (!nodes) return false;
    bool hit = false;
    Point origin = ray(ray.mint);
    Vector invDir(1.f / ray.d.x, 1.f / ray.d.y, 1.f / ray.d.z);
    uint32_t dirIsNeg[3] = { invDir.x < 0, invDir.y < 0, invDir.z < 0 };
    uint32_t nodeNum = 0;      // offset into the nodes array to be visited
    uint32_t todo[64];         // nodes to be visited; acts like a stack
    uint32_t todoOffset = 0;   // next free element in the stack
BVHAccel traversal
    while (true) {
        const LinearBVHNode *node = &nodes[nodeNum];
        if (::IntersectP(node->bounds, ray, invDir, dirIsNeg)) {
            if (node->nPrimitives > 0) {   // leaf node
                // Intersect ray with primitives in leaf BVH node
                for (uint32_t i = 0; i < node->nPrimitives; ++i) {
                    if (primitives[node->primitivesOffset+i]->Intersect(ray, isect))
                        hit = true;
                }
                if (todoOffset == 0) break;
                nodeNum = todo[--todoOffset];   // continue with the next node on the stack
            }
BVHAccel traversal
            else {   // interior node
                // Visit the child nearer to the ray origin first; push the other one
                if (dirIsNeg[node->axis]) {
                    todo[todoOffset++] = nodeNum + 1;
                    nodeNum = node->secondChildOffset;
                }
                else {
                    todo[todoOffset++] = node->secondChildOffset;
                    nodeNum = nodeNum + 1;
                }
            }
        }
        else {
            // Did not hit the bounding box; retrieve the next node if any
            if (todoOffset == 0) break;
            nodeNum = todo[--todoOffset];
        }
    }
    return hit;
}
KD-Tree accelerator
• Non-uniform space subdivision (for example, kd-tree and octree) is better than a uniform grid if the scene is irregularly distributed.
Spatial hierarchies
[Figure sequence: space is split by planes A, B, C, D in turn, and the tree grows accordingly]
Letters correspond to planes (A, B, C, D)
Point location by recursive search
Variations
kd-tree, octree, bsp-tree
“Hack” kd-tree building
• Split axis
– Round-robin; largest extent
• Split location
– Middle of extent; median of geometry (balanced tree)
• Termination
– Target # of primitives, limited tree depth
• All of these techniques stink.
Building good kd-trees
• What split do we really want?
– Clever idea: the one that makes ray tracing cheap
– Write down an expression of cost and minimize it
– Greedy cost optimization
• What is the cost of tracing a ray through a cell?
Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R)
Splitting with cost in mind
Split in the middle
To get through this part of empty space, you need to test all triangles on the right.
Split at the median
• Makes the L & R costs equal
• Pays no attention to the L & R probabilities
Cost-optimized split
Since Cost(R) is much higher, make it as small as possible
Building good kd-trees
• Need the probabilities
– Turns out to be proportional to surface area
• Need the child cell costs
– Simple triangle count works great (very rough approx.)
– Empty cell "boost"
Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R)
= C_trav + SA(L) * TriCount(L) + SA(R) * TriCount(R)
C_trav is the ratio of the cost to traverse to the cost to intersect
C_trav = 1:80 in pbrt (found by experiments)
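The cost expression is easy to evaluate for a box-shaped cell. In the sketch below the hit probabilities are the child surface areas divided by the cell's surface area, the intersection cost is 1 per triangle, and C_trav is passed in; `boxArea` and `splitCost` are hypothetical helpers, and the empty-cell boost is left out.

```cpp
#include <cassert>

// Surface area of an axis-aligned box with extents dx, dy, dz.
float boxArea(float dx, float dy, float dz) {
    return 2.f * (dx * dy + dx * dz + dy * dz);
}

// Cost of splitting a cell with extents (dx, dy, dz) at offset s along x,
// with nL triangles in the left child and nR triangles in the right child.
float splitCost(float dx, float dy, float dz, float s,
                int nL, int nR, float cTrav) {
    assert(s >= 0.f && s <= dx);
    float total = boxArea(dx, dy, dz);
    float pL = boxArea(s, dy, dz) / total;        // Prob(hit L)
    float pR = boxArea(dx - s, dy, dz) / total;   // Prob(hit R)
    return cTrav + pL * nL + pR * nR;
}
```

For a 10 x 1 x 1 cell whose 100 triangles all lie in x between 8 and 10, splitting at s = 8 costs roughly 24, splitting in the middle roughly 52, and not splitting at all costs 100. This matches the picture of the cost-optimized split: the expensive child is made as small as possible.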
Surface area heuristic
2n splits; must coincide with object boundary. Why?
$p_a = S_a / S$, $p_b = S_b / S$
Termination criteria
• When should we stop splitting?
– Bad: depth limit, number of triangles
– Good: when split does not help any more.
• Threshold of cost improvement
– Stretch over multiple levels
– For example, if cost does not go down after three splits in a row, terminate
• Threshold of cell size
– Absolute probability SA(node)/SA(scene) small
Basic building algorithm
1. Pick an axis, or optimize across all three
2. Build a set of candidate split locations (cost extrema must be at bbox vertices)
3. Sort or bin the triangles
4. Sweep to incrementally track L/R counts, cost
5. Output position of minimum cost split
Running time: $T(N) = N \log N + 2T(N/2) \Rightarrow T(N) = N \log^2 N$
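Assuming the recurrence $T(N) = N \log N + 2T(N/2)$ (an $N \log N$ sort at every node, two half-sized subproblems, logarithms base 2, and $N$ a power of two), unrolling it level by level gives the bound:

```latex
\begin{aligned}
T(N) &= N\log N + 2\,T(N/2) \\
     &= N\log N + N\log\frac{N}{2} + 4\,T(N/4) \\
     &= \sum_{k=0}^{\log N - 1} N\log\frac{N}{2^k} + N\,T(1) \\
     &= N\sum_{j=1}^{\log N} j + O(N)
      = N\,\frac{\log N\,(\log N + 1)}{2} + O(N)
      = O(N\log^2 N)
\end{aligned}
```

Level $k$ of the recursion has $2^k$ subproblems of size $N/2^k$, so each level costs $N(\log N - k)$ in total.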
Ray traversal algorithm
• Recursive inorder traversal
$t_{max} < t^*$: Intersect(L, tmin, tmax)
$t_{min} < t^* < t_{max}$: Intersect(L, tmin, t*), then Intersect(R, t*, tmax)
$t^* < t_{min}$: Intersect(R, tmin, tmax)
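The three cases can be written as a small decision function. This sketch assumes that L is the child on the ray origin's side of the plane and that the split plane lies in front of the origin; `Visit` and `planVisits` are names introduced here for illustration.

```cpp
#include <cassert>
#include <vector>

struct Visit {       // one recursive call: which child, and over which ray segment
    char child;      // 'L' (near side) or 'R' (far side)
    float t0, t1;
};

// tstar is the ray parameter at which the ray crosses the split plane.
std::vector<Visit> planVisits(float tmin, float tmax, float tstar) {
    assert(tmin <= tmax);
    if (tstar > tmax) return { {'L', tmin, tmax} };   // plane beyond the segment
    if (tstar < tmin) return { {'R', tmin, tmax} };   // plane before the segment
    return { {'L', tmin, tstar}, {'R', tstar, tmax} };   // segment straddles the plane
}
```

Visiting L before R in the straddling case is what makes the traversal front to back, so the search can stop as soon as a hit is found inside the current segment.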
a video for kdtree
Tree representation
8-byte (reduced from 16-byte, 20% gain)
struct KdAccelNode {
    ...
    union {
        float split;          // Interior
        u_int onePrimitive;   // Leaf
        u_int *primitives;    // Leaf
    };
    union {
        u_int flags;          // Both
        u_int nPrims;         // Leaf
        u_int aboveChild;     // Interior
    };
};
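Tree representation
[Figure: IEEE float layout with sign S (1 bit), exponent E (8 bits) and mantissa M (23 bits); flags occupy 2 bits, n the rest]
The float layout is irrelevant in pbrt2
Flag: 0, 1, 2 (interior x, y, z), 3 (leaf)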
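A possible 8-byte layout in the spirit of the slide keeps the flags in the low 2 bits of the second word and uses the remaining 30 bits for either the primitive count or the above-child index. `PackedKdNode` and its member names are hypothetical, and only a single primitive index is stored per leaf to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

struct PackedKdNode {
    union {
        float split;             // interior: position of the split plane
        uint32_t onePrimitive;   // leaf: index of a single primitive
    };
    uint32_t packed;             // low 2 bits: flags; high 30 bits: count or child index

    void initLeaf(uint32_t nPrims, uint32_t primIndex) {
        assert(nPrims < (1u << 30));
        packed = 3u | (nPrims << 2);       // flag 3 marks a leaf
        onePrimitive = primIndex;
    }
    void initInterior(uint32_t axis, uint32_t above, float s) {
        assert(axis < 3 && above < (1u << 30));
        packed = axis | (above << 2);      // flags 0, 1, 2 encode the split axis
        split = s;
    }
    bool isLeaf() const { return (packed & 3u) == 3u; }
    uint32_t splitAxis() const { return packed & 3u; }
    uint32_t nPrimitives() const { return packed >> 2; }
    uint32_t aboveChild() const { return packed >> 2; }
};
static_assert(sizeof(PackedKdNode) == 8, "node should occupy 8 bytes");
```

The below child needs no index at all because it is stored directly after its parent.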
KdTreeAccel construction
• Recursive top-down algorithm
• max depth = $8 + 1.3 \log(N)$
if (nPrims <= maxPrims || depth == 0) {
    <create leaf>
}
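For a feel of the numbers, the depth limit can be computed as follows. The sketch assumes a base-2 integer logarithm (rounded down) and rounds the result to the nearest integer; `kdMaxDepth` is a hypothetical helper, not a pbrt function.

```cpp
#include <cassert>

// Depth limit 8 + 1.3 * log2(N), with log2 rounded down.
int kdMaxDepth(int nPrims) {
    assert(nPrims > 0);
    int log2i = 0;
    while ((nPrims >> (log2i + 1)) > 0) ++log2i;   // floor(log2(nPrims))
    return int(8.0 + 1.3 * log2i + 0.5);           // round to nearest
}
```

A scene with about a thousand primitives gets a limit of roughly 20 levels, and every doubling of the primitive count adds a bit more than one level.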
Interior node
• Choose split axis position
– Midpoint
– Median cut
– Area heuristic
• Create leaf if no good splits were found
• Classify primitives with respect to split
Choose split axis position
cost of no split: $\sum_{k=1}^{N} t_i(k)$
cost of split: $t_t + p_B \sum_{k=1}^{N_B} t_i(b_k) + p_A \sum_{k=1}^{N_A} t_i(a_k)$
assumptions:
1. $t_i$ is the same for all primitives
2. $t_i : t_t = 80 : 1$ (determined by experiments, main factor for the performance)
cost of no split: $t_i N$
cost of split: $t_t + t_i (1 - b_e)(p_B N_B + p_A N_A)$, where $b_e$ is the bonus for an empty child
Choose split axis position
Start from the axis with maximum extent, sort all edge events and process them in order
[Figure: primitives A, B, C projected onto the axis give the edge events a0, b0, a1, b1, c0, c1]
Choose split axis position
If there is no split along this axis, try other axes. When all fail, create a leaf.
KdTreeAccel traversal
[Figure sequence: a ray traversing the kd-tree, annotated with t_min, t_plane, t_max, the near and far children, and the ToDo stack]
KdTreeAccel traversal
bool KdTreeAccel::Intersect(const Ray &ray, Intersection *isect) {
    float tmin, tmax;
    if (!bounds.IntersectP(ray, &tmin, &tmax))
        return false;
    KdAccelNode *node = &nodes[0];
    while (node != NULL) {
        if (ray.maxt < tmin) break;
        if (!node->IsLeaf()) <Interior>
        else <Leaf>
    }
}
ToDo stack (max depth)
Leaf node
1. Check whether ray intersects primitive(s) inside the node; update ray's maxt
2. Grab next node from ToDo stack
Interior node
1. Determine near and far (by testing which side O is)
   below: node+1, above: &(nodes[node->aboveChild])
2. Determine whether we can skip a node