Fast KD tree Construction - Related Work - 光線追蹤法之平行化加速結構於多核處理器

Chapter 3 Related Work

3.1 Fast KD tree Construction

3.1.1 Introduction

KD tree is a variant of BSP tree, it is widely used in the area of computer graphics

and science computing, the relevant applications are not limited to ray tracing, but any

problem related to nearest neighbor search. For the scene with few triangles, the

construction time can be insignificant compared to the render time. However, modern

animations with elaborate scenes and detailed characters usually may have millions of

triangles, and the construction time becomes critical.

For earlier GPUs uniform grid was used [21, 22], the structure is simple but fits

badly to sparse scenes.

Although structures like BVH can efficiently represent some structured models like

characters, where a motion can be handled with local updates of BVH [9], unstructured

models like dynamics system make BVH degrade immediately, and a reconstruction

from scratch is necessary.

KD tree has been recognized as the best structure for ray tracing [11]. It is able to

fast cull away the empty spaces in the scene, which is crucial for wide open scenes

where the triangle distribution is sparse. And further, KD tree uses AABB (axis aligned

bounding box) as a proxy for geometry primitives, these axis aligned voxels offer

extremely fast traversal. Structure like OBB tree [8] is able to fit the primitives well,

however the arbitrary orientation make it much slow for traversal.

Variants of KD tree like B-KD tree [29, 34] and SKD tree [12] use two slabs for the

split, and simplify the problem of straddling triangles [14]. For such trees left node and

right node have overlapped region, and also the two empty spaces at the two ends can

be culled. However, they lack an optimized traversal approach like [4, 24, 25]. These

trees are not the mainstream and will not be discussed here.

3.1.2 Fast KD tree construction overview

KD tree construction begins as a global voxel which contains the whole scene,

called root node, then a top-down fashion binary split is proceeded recursively. Each

node is going to split into two children. The pattern continues until the current node no

longer has to split, by the time the leaf node usually has only a few triangles.

The process of fast KD tree construction in O(NlogN) can be roughly divided in

several stages :

(1) Termination criterion checking

(2) Determine the split axis

(3) Split position evaluation

(4) Classification of triangles

(5) Classification of events

(6) Splice to left and right node

Given a node, we first check some criteria to qualify the node for upcoming stages.

If the depth has reached a threshold, or the number of triangles within the node has

fallen below a user-specified value, the node then is tagged leaf. Otherwise the node is

an internal node.

To split a non leaf node, first we have to choose the split axis. Evaluation of

geometry distribution is not trivial and time consuming, the common sense is to always

choose the axis with longest extent, or choose in round robin fashion.

3.1.3 Surface Area Heuristic

Split at spatial median is easy, but inefficient. It is important to discard most of the

spaces and triangles as early as possible. Our goal is to terminate the ray as fast as we

can, and a carefully picked split can make a big difference. The trick is to produce large

chunks of empty space close to the root of the tree. [28] has mentioned “Benefits of a

good tree are not small”, and may be “several times faster than a mediocre tree”.

Fig. 3-1 Split at (A) spatial median (B) object median (C) minimum SAH cost

A cost model based on geometric probability of ray has been proposed [11, 15].

The model assumes the rays are uniformly distributed, and the likelihood a ray will

intersect the node is roughly proportional to its surface area. Also if the cost of

intersection is Ci, the cost of intersection test against N triangles roughly equals NCi.

Based on [15], the cost of tree can be expressed as follows.

T I tri

C :cost of traversal

C : cost of intersection test

N : number of triangles in the node

( ) ( )

With this cost function, the same triangle may appear in different leaves, and will

be tested several times against the same ray. A correct version is derived as follows.

( ) ( )

( | |

( ) ( )

T I L R

SA left subnode SA right subnode

C C C C C

SA node SA node

= + + | |) (3.2)

Although theoretically CL and CR are going to be recursively evaluated, in fact this

approximation works great. The concept is called surface area heuristic, and based on

this we can pick the split to minimize the cost of the node.

3.1.4 Split Candidate

Theoretically on the split axis there can be infinite split candidates, nevertheless

the cost is piecewise linear along the axis, and it changes only at the AABB planes of

the triangle. What we have to consider are these plane positions, also called events of

triangle.

Fig. 3-2 Events of triangle [32] suggested that there are three kinds of event.

starting min position of triangle on the axis ending max position of triangle on the axis

lying min=max if the triangle is perpendicular to the axis

⇒

(3.3)

All events are sorted once at root in ascending order, the order is maintained during

the entire construction process. Every node has its own sorted event list, and to each

event we perform SAH evaluation. One thing to note is that if we do not maintain the

events as sorted, but merely compute the SAH cost for each event individually, then

each time we have to count again the triangles to the left and to the right, the complexity

is terribly O(N^2).

Taking the advantage of the sorted events, the number of triangles to the left and to

the right can be counted incrementally [19, 32]. At the first event, all triangles lie on the

right side, so NL is zero and NR has all. By sweeping over the events from left to right,

each time we simply increment the NL and decrement the NR, and the cost of all

candidates can be known in O(N). Record the event with the minimum cost as the split.

Fig. 3-3 Incremental sweeping for split

p : Triangles starting on plane i p : Triangles ending on plane i p : Triangles lying on plane i

A bonus for culling away empty space is suggested [25, 32]. The cost is scaled by

about 80% if we choose to split at the position where one side is empty.

Exact SAH evaluation can be approximated by coarsely sampling at finite discrete

positions [14]. Especially for nodes near the root this method saves a lot of time without

losing much accuracy.

Binning [14, 19, 27] is an alternative to incremental sweeping, the method also has a

sort-free like feature, and is able to fast evaluate the cost regardless of the number of

triangles. The main problem about binning is the accuracy of the SAH cost, and deeper

the node is, more obvious the problem has become. The issues about binning will be

discussed later.

3.1.5 Triangle Classification and Clipping

Once the split position has been decided, we are going to distribute the events to left

and right. However, some of the events belong to straddling triangles, which contribute

its events to both sides, and should be further clipped into two pieces.

So before we process the events directly, we want to tag these triangles as “LEFT”,

“RIGHT”, or “BOTH”. An efficient sweeping can be applied as follows.

(3.5)

Most of the triangles will be marked “LEFT” or “RIGHT”, and a few will remain as

“BOTH”, these triangles straddle the split plane, and need special care.

3.1.6 Event Classification and Generation

There is no problem about dealing with events associated with triangles marked

“LEFT” or “RIGHT”, these events will be copied to one side with the ascending order

unchanged. But for a straddling triangle, its “starting” event lies in left and its “ending”

event lies in right. And the triangle should be clipped into two polygons, one for each

side.

The precise way is to apply a Sutherland Hodgeman clipping to the triangle, two

polygons are generated. We have to calculate the AABB for the two polygons, and

replace the old events of the triangle with the new events of the two polygons, on both

axis

flag pos tri

for all t

side[t]=BOTH;

for all e && e ==splitAxis

if(e == ending && e <= splitPosition)side[e ]=LEFT;

else if(e == starting && e >= splitPosition)side[e ]=RIGHT;

sides. These new events should be merged with the event arrays to maintain the

ascending order.

The process of triangle clipping, with generation of new events and merging of

events can be an impact to performance. Fortunately in reasonable cases the number of

straddling triangles is minor.

A trivial way is to just copy these events to both sides, and update the events if they

exceed the boundaries.

3.1.7 Splice to two children

New memory is dynamically allocated for the two children, or the bias address is

incremented if we have preallocated pools.

3.1.8 Termination Criteria

The stages above are performed for each node split, and the ascending order of new

event lists are always preserved. The pattern continues until some of the criteria have

been achieved.

(1) The number of triangles of the node is smaller than a threshold.

(2) The depth of the node exceeds a threshold.

(3) The split position cannot be found for the node.

The first is mandatory, and the second can be optional. It is notable that we cannot

always find a split for every node. In a case where many triangles have the same

minimum and maximum on the split axis, the only two split candidates are the

minimum and maximum of the node’s AABB. The split like this will not be performed,

and we either switch to another split axis, or force the node to be a leaf.

3.1.9 Conclusion

All stages required for a node split are presented here.

(1) Termination criteria checking O(1) (2) Split axis picking => O(1)

(3) Incremental sweeping for split => O(N) (4) Triangle classification => O(N)

(5) Event classification, generation, and merging => O(N)

(3.6)

Sum up all the stages we get O(N) for one node split. From an easy math the

complexity of the whole tree is derived as following.

(3.7)

This is the fastest we can get for KD tree construction so far. However, for

interactive games with millions of triangles even the best optimized algorithm is not

enough, and we have to look for help from hardware.

在文檔中光線追蹤法之平行化加速結構於多核處理器 (頁 17-26)