Chapter 3 Related Work
3.1 Fast KD tree Construction
3.1.1 Introduction
KD tree is a variant of BSP tree, it is widely used in the area of computer graphics
and science computing, the relevant applications are not limited to ray tracing, but any
problem related to nearest neighbor search. For the scene with few triangles, the
construction time can be insignificant compared to the render time. However, modern
animations with elaborate scenes and detailed characters usually may have millions of
triangles, and the construction time becomes critical.
For earlier GPUs uniform grid was used [21, 22], the structure is simple but fits
badly to sparse scenes.
Although structures like BVH can efficiently represent some structured models like
characters, where a motion can be handled with local updates of BVH [9], unstructured
models like dynamics system make BVH degrade immediately, and a reconstruction
from scratch is necessary.
KD tree has been recognized as the best structure for ray tracing [11]. It is able to
fast cull away the empty spaces in the scene, which is crucial for wide open scenes
where the triangle distribution is sparse. And further, KD tree uses AABB (axis aligned
bounding box) as a proxy for geometry primitives, these axis aligned voxels offer
extremely fast traversal. Structure like OBB tree [8] is able to fit the primitives well,
however the arbitrary orientation make it much slow for traversal.
Variants of KD tree like B-KD tree [29, 34] and SKD tree [12] use two slabs for the
split, and simplify the problem of straddling triangles [14]. For such trees left node and
right node have overlapped region, and also the two empty spaces at the two ends can
be culled. However, they lack an optimized traversal approach like [4, 24, 25]. These
trees are not the mainstream and will not be discussed here.
3.1.2 Fast KD tree construction overview
KD tree construction begins as a global voxel which contains the whole scene,
called root node, then a top-down fashion binary split is proceeded recursively. Each
node is going to split into two children. The pattern continues until the current node no
longer has to split, by the time the leaf node usually has only a few triangles.
The process of fast KD tree construction in O(NlogN) can be roughly divided in
several stages :
(1) Termination criterion checking
(2) Determine the split axis
(3) Split position evaluation
(4) Classification of triangles
(5) Classification of events
(6) Splice to left and right node
Given a node, we first check some criteria to qualify the node for upcoming stages.
If the depth has reached a threshold, or the number of triangles within the node has
fallen below a user-specified value, the node then is tagged leaf. Otherwise the node is
an internal node.
To split a non leaf node, first we have to choose the split axis. Evaluation of
geometry distribution is not trivial and time consuming, the common sense is to always
choose the axis with longest extent, or choose in round robin fashion.
3.1.3 Surface Area Heuristic
Split at spatial median is easy, but inefficient. It is important to discard most of the
spaces and triangles as early as possible. Our goal is to terminate the ray as fast as we
can, and a carefully picked split can make a big difference. The trick is to produce large
chunks of empty space close to the root of the tree. [28] has mentioned “Benefits of a
good tree are not small”, and may be “several times faster than a mediocre tree”.
Fig. 3-1 Split at (A) spatial median (B) object median (C) minimum SAH cost
A cost model based on geometric probability of ray has been proposed [11, 15].
The model assumes the rays are uniformly distributed, and the likelihood a ray will
intersect the node is roughly proportional to its surface area. Also if the cost of
intersection is Ci, the cost of intersection test against N triangles roughly equals NCi.
Based on [15], the cost of tree can be expressed as follows.
T I tri
C :cost of traversal
C : cost of intersection test
N : number of triangles in the node
( ) ( )
With this cost function, the same triangle may appear in different leaves, and will
be tested several times against the same ray. A correct version is derived as follows.
( ) ( )
( | |
( ) ( )
T I L R
SA left subnode SA right subnode
C C C C C
SA node SA node
= + + | |) (3.2)
Although theoretically CL and CR are going to be recursively evaluated, in fact this
approximation works great. The concept is called surface area heuristic, and based on
this we can pick the split to minimize the cost of the node.
3.1.4 Split Candidate
Theoretically on the split axis there can be infinite split candidates, nevertheless
the cost is piecewise linear along the axis, and it changes only at the AABB planes of
the triangle. What we have to consider are these plane positions, also called events of
triangle.
Fig. 3-2 Events of triangle [32] suggested that there are three kinds of event.
starting min position of triangle on the axis ending max position of triangle on the axis
lying min=max if the triangle is perpendicular to the axis
⇒
⇒
⇒
(3.3)
All events are sorted once at root in ascending order, the order is maintained during
the entire construction process. Every node has its own sorted event list, and to each
event we perform SAH evaluation. One thing to note is that if we do not maintain the
events as sorted, but merely compute the SAH cost for each event individually, then
each time we have to count again the triangles to the left and to the right, the complexity
is terribly O(N^2).
Taking the advantage of the sorted events, the number of triangles to the left and to
the right can be counted incrementally [19, 32]. At the first event, all triangles lie on the
right side, so NL is zero and NR has all. By sweeping over the events from left to right,
each time we simply increment the NL and decrement the NR, and the cost of all
candidates can be known in O(N). Record the event with the minimum cost as the split.
Fig. 3-3 Incremental sweeping for split
|
p : Triangles starting on plane i p : Triangles ending on plane i p : Triangles lying on plane i
A bonus for culling away empty space is suggested [25, 32]. The cost is scaled by
about 80% if we choose to split at the position where one side is empty.
Exact SAH evaluation can be approximated by coarsely sampling at finite discrete
positions [14]. Especially for nodes near the root this method saves a lot of time without
losing much accuracy.
Binning [14, 19, 27] is an alternative to incremental sweeping, the method also has a
sort-free like feature, and is able to fast evaluate the cost regardless of the number of
triangles. The main problem about binning is the accuracy of the SAH cost, and deeper
the node is, more obvious the problem has become. The issues about binning will be
discussed later.
3.1.5 Triangle Classification and Clipping
Once the split position has been decided, we are going to distribute the events to left
and right. However, some of the events belong to straddling triangles, which contribute
its events to both sides, and should be further clipped into two pieces.
So before we process the events directly, we want to tag these triangles as “LEFT”,
“RIGHT”, or “BOTH”. An efficient sweeping can be applied as follows.
(3.5)
Most of the triangles will be marked “LEFT” or “RIGHT”, and a few will remain as
“BOTH”, these triangles straddle the split plane, and need special care.
3.1.6 Event Classification and Generation
There is no problem about dealing with events associated with triangles marked
“LEFT” or “RIGHT”, these events will be copied to one side with the ascending order
unchanged. But for a straddling triangle, its “starting” event lies in left and its “ending”
event lies in right. And the triangle should be clipped into two polygons, one for each
side.
The precise way is to apply a Sutherland Hodgeman clipping to the triangle, two
polygons are generated. We have to calculate the AABB for the two polygons, and
replace the old events of the triangle with the new events of the two polygons, on both
axis
flag pos tri
flag pos tri
for all t
side[t]=BOTH;
for all e && e ==splitAxis
if(e == ending && e <= splitPosition)side[e ]=LEFT;
else if(e == starting && e >= splitPosition)side[e ]=RIGHT;
sides. These new events should be merged with the event arrays to maintain the
ascending order.
The process of triangle clipping, with generation of new events and merging of
events can be an impact to performance. Fortunately in reasonable cases the number of
straddling triangles is minor.
A trivial way is to just copy these events to both sides, and update the events if they
exceed the boundaries.
3.1.7 Splice to two children
New memory is dynamically allocated for the two children, or the bias address is
incremented if we have preallocated pools.
3.1.8 Termination Criteria
The stages above are performed for each node split, and the ascending order of new
event lists are always preserved. The pattern continues until some of the criteria have
been achieved.
(1) The number of triangles of the node is smaller than a threshold.
(2) The depth of the node exceeds a threshold.
(3) The split position cannot be found for the node.
The first is mandatory, and the second can be optional. It is notable that we cannot
always find a split for every node. In a case where many triangles have the same
minimum and maximum on the split axis, the only two split candidates are the
minimum and maximum of the node’s AABB. The split like this will not be performed,
and we either switch to another split axis, or force the node to be a leaf.
3.1.9 Conclusion
All stages required for a node split are presented here.
(1) Termination criteria checking O(1) (2) Split axis picking => O(1)
(3) Incremental sweeping for split => O(N) (4) Triangle classification => O(N)
(5) Event classification, generation, and merging => O(N)
=>
(3.6)
Sum up all the stages we get O(N) for one node split. From an easy math the
complexity of the whole tree is derived as following.
(3.7)
This is the fastest we can get for KD tree construction so far. However, for
interactive games with millions of triangles even the best optimized algorithm is not
enough, and we have to look for help from hardware.