Optimizing KD Tree Construction - Parallel Acceleration Structures for Ray Tracing on Multi-Core Processors

4.1 Augmented AABB

We have made some modifications to the fast KD tree construction of [32]. That method
uses three kinds of events, “starting”, “ending” and “lying”; triangles lying in the split
plane are placed on either the left or the right side, depending on the SAH evaluation [32].
In our experiments, however, we observed that “lying” events need additional conditions,
and that these conditions complicate the process.

We do not see a benefit in using “lying” events; instead we advocate using the minimum
and maximum of the AABBs [19]. This is also more general and robust, since a KD tree is
not restricted to triangles: any primitive can be an element of a KD tree, and any
primitive can easily be enclosed in an AABB.

Based on our observations, using the min-max events of the AABBs has the following
benefits.

(1) Only a one-bit flag is required to distinguish a minimum from a maximum.
(2) The number of events is always exactly twice the number of triangles on each of the three axes.
(3) The incremental sweep is simplified.
(4) The SAH cost function is simplified; the case in which some triangles lie on the split candidate is eliminated.
(5) The classification of triangles and events is simplified.
(6) The conditional branches in ray traversal are simplified.
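
The event layout implied by points (1) and (2) can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this work; the names `make_events`, `Box` and `Event`, the tuple layout, and the tie-breaking order of the sort are all assumptions made for the example.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]          # (minimum corner, maximum corner) of an AABB
Event = Tuple[float, int, int]   # (position, is_max flag, primitive index)


def make_events(boxes: List[Box]) -> List[List[Event]]:
    """Build one sorted event list per axis from the AABBs of the primitives."""
    events = []
    for axis in range(3):
        axis_events = []
        for prim, (lo, hi) in enumerate(boxes):
            axis_events.append((lo[axis], 0, prim))  # flag 0: minimum event
            axis_events.append((hi[axis], 1, prim))  # flag 1: maximum event
        axis_events.sort()  # sort by position (ties: flag, then primitive index)
        events.append(axis_events)
    return events
```

Every primitive contributes exactly one minimum and one maximum event per axis, so each axis list has twice as many entries as there are primitives, and a single flag is enough to tell the two kinds apart.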

In our experiments this yields a speedup of at least 10%. Unfortunately, the
modification also gives rise to a problem. The incremental sweep for the SAH now
works as follows.

Triangles lying in a plane orthogonal to the split axis generate a minimum and a
maximum event with the same value. Such triangles are omitted by the SAH function:
by the time the sweep reaches that plane they have already been removed from NR, but
they have not yet been added to NL. Consider the case below.

Fig. 4-1 At the time the sweeping reaches the plane p, we have NL = 2 and NR = 2

A remedy is to augment the AABB: we decrease the minimum and increase the maximum
by an epsilon. The SAH function then works correctly.
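
The effect of the augmentation on the sweep can be sketched on a single axis. The snippet below is a simplified illustration under stated assumptions: `sweep_counts` is a hypothetical name, primitives are reduced to one-dimensional intervals, and only the counts NL and NR are tracked rather than a full SAH cost.

```python
from itertools import groupby
from typing import List, Tuple


def sweep_counts(intervals: List[Tuple[float, float]], eps: float = 0.0):
    """Return (position, NL, NR) for every split candidate on one axis."""
    events = []
    for lo, hi in intervals:
        events.append((lo - eps, 0))  # minimum event of the (augmented) AABB
        events.append((hi + eps, 1))  # maximum event of the (augmented) AABB
    events.sort()

    n_left, n_right = 0, len(intervals)
    counts = []
    for pos, group in groupby(events, key=lambda e: e[0]):
        flags = [flag for _, flag in group]
        n_right -= sum(flags)                  # boxes ending at pos leave the right side
        counts.append((pos, n_left, n_right))  # counts the SAH would see at pos
        n_left += len(flags) - sum(flags)      # boxes starting at pos enter the left side
    return counts
```

For the intervals `[(0.0, 1.0), (2.0, 2.0), (3.0, 4.0)]`, the flat interval at 2.0 is counted on neither side at the candidate 2.0 when `eps` is 0. With a small positive `eps`, NL + NR is never smaller than the number of primitives at any candidate.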

Fig. 4-2 Augmented min-max events of the AABB

4.2 A Trivial and General Method for Event Classification

The standard event classification for straddling triangles [32] requires
Sutherland-Hodgman clipping and merging of the newly generated events with the original
ones. This approach seriously complicates the event classification, and it cannot be
applied when primitives other than triangles are used. Here we use a general and robust
method to handle straddling AABBs. The pseudocode is as follows.

Fig. 4-3 A trivial and general method for event classification

for all e on the splitAxis S0

for all e on the non-splitAxis S=(S0+1)%3 and S=(S0+2)%3
    if(side[e] == LEFT)

This method works very well when most of the primitives have similar sizes.
Furthermore, it lets us handle surfaces that are not triangle-based.
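
One way to realize such a clipping-free classification is sketched below. This is an illustration under assumptions, not the algorithm of Fig. 4-3 itself: the name `classify_events`, the `(position, is_max, primitive)` event tuples, and the choice to copy the events of a straddling AABB unchanged into both children are all hypothetical.

```python
LEFT, RIGHT, BOTH = 0, 1, 2


def classify_events(events, split_axis, split_pos):
    """Distribute sorted per-axis event lists to the two children of a node.

    events[axis] is a sorted list of (position, is_max, primitive) tuples.
    """
    # Pass 1: read the extent of every primitive on the split axis.
    lo, hi = {}, {}
    for pos, is_max, prim in events[split_axis]:
        if is_max:
            hi[prim] = pos
        else:
            lo[prim] = pos

    side = {}
    for prim in lo:
        if hi[prim] <= split_pos:
            side[prim] = LEFT
        elif lo[prim] >= split_pos:
            side[prim] = RIGHT
        else:
            side[prim] = BOTH  # straddling AABB: kept whole, no clipping

    # Pass 2: copy the events of all three axes according to the side flag.
    left = [[], [], []]
    right = [[], [], []]
    for axis in range(3):
        for event in events[axis]:
            s = side[event[2]]
            if s != RIGHT:
                left[axis].append(event)
            if s != LEFT:
                right[axis].append(event)
    return left, right
```

Because each child list is filled by filtering an already sorted list, it stays sorted, so no new events are generated and no merge step is needed. The price is that a large straddling primitive keeps its full extent in both children.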

4.3 Using Preallocated Pools

A major bottleneck of KD tree construction is frequent memory allocation. The concept
of a preallocated pool was proposed in [2, 27], and we use a similar method.

The difficulty lies in predicting the final size of the pools: no method is known for
precisely estimating the number of nodes and leaves of the whole tree in advance.

The authors of [27] use chunks of memory linked into lists for nodes and leaves. They
perform the construction in DFS order and allocate a new chunk whenever the current one
is full.

Similarly, we use one node pool and one leaf pool. At each level we first check whether
the remaining space is sufficient for the next level. If it is not, we allocate a larger
pool and move the original arrays over.
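
The check-then-grow step can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Pool` class and a doubling growth policy; the text above does not specify how much larger the new pool is.

```python
class Pool:
    """Fixed-capacity array that is grown explicitly, once per tree level."""

    def __init__(self, capacity):
        self.data = [None] * capacity  # preallocated storage
        self.size = 0                  # number of slots in use

    def reserve(self, extra):
        """Make sure `extra` more items fit; grow and copy if they do not."""
        needed = self.size + extra
        if needed > len(self.data):
            # Assumed policy: at least double, so that growth stays rare.
            new_data = [None] * max(needed, 2 * len(self.data))
            new_data[:self.size] = self.data[:self.size]  # move the old array over
            self.data = new_data

    def append(self, item):
        """Store an item in the next free slot and return its offset."""
        self.data[self.size] = item  # the caller has already reserved space
        self.size += 1
        return self.size - 1
```

Before a level with `k` nodes is split, a call such as `node_pool.reserve(2 * k)` guarantees room for all possible children, so no allocation happens inside the per-node loop. Offsets returned by `append` remain valid after the pool has grown.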

In our experiments, construction with preallocated pools is at least twice as fast as
construction with dynamic allocation.

4.4 DFS KD Tree Construction with Preallocated Pools

A KD tree is generally constructed in DFS order. A DFS construction with preallocated
memory was proposed in [27]; here we present a similar implementation.

The node pool and the leaf pool grow steadily during construction. Only two temporary
arrays are needed: array A for left sub-nodes and array B for right sub-nodes. When a
node splits into two, we replace the events of the node in array A with the events of
its left child, and the events of its right child are “pushed” onto array B. Array B
effectively serves as a stack: each time the splitting reaches a leaf, the events of the
newest node in array B are “popped” into array A. Node splitting always takes place in
array A.

Fig. 4-4 DFS KD tree construction (Blue for internal, green for leaf, red for empty), IDs in brackets are undefined

Note that while nodes reside on the stack (array B), their node IDs are still
undetermined. A node is assigned an ID and added to the node pool only after it is
popped into array A.
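
The interplay of array A, the stack in array B, and the late assignment of node IDs can be sketched as follows. To keep the example short it is deliberately simplified: the function `build_dfs` is hypothetical, the "events" are plain sortable keys, and a median split stands in for the SAH-based split.

```python
def build_dfs(keys, leaf_size=2):
    """DFS construction over scalar keys, returning (node_pool, leaf_pool)."""
    node_pool, leaf_pool = [], []
    array_a = sorted(keys)  # events of the node currently being split
    array_b = []            # stack of (events, parent ID) for pending right children
    parent = None           # parent waiting for the ID of a popped right child
    while True:
        node_id = len(node_pool)  # the ID is fixed only now, in array A
        if parent is not None:
            node_pool[parent][2] = node_id  # record the right-child ID in the parent
        if len(array_a) <= leaf_size:
            leaf_pool.append(array_a)
            node_pool.append(["leaf", len(leaf_pool) - 1])
            if not array_b:
                return node_pool, leaf_pool
            array_a, parent = array_b.pop()  # "pop" the newest right child into A
        else:
            mid = len(array_a) // 2  # stand-in for the SAH split
            node_pool.append(["inner", array_a[mid], None])
            array_b.append((array_a[mid:], node_id))  # "push" the right child
            array_a, parent = array_a[:mid], None     # the left child replaces the node
```

In this sketch the left child of an inner node always receives the next ID, so only the ID of the right child has to be stored, and it is filled in when that child is popped.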

4.5 BFS KD Tree Construction with Preallocated Pools

We perform the construction in BFS style, processing one level of the tree at a time.
The same two temporary arrays are needed. All the nodes of the first level reside in
array A. After we perform the split, all the children are moved to array B. For the next
level the roles of the two arrays are swapped. The two arrays are used alternately until
the maximum level is reached.

Every generated node is moved to the node pool, and its offset in the pool is recorded
as the left or right child of its parent node. Likewise, every node identified as a leaf
has its events moved to the leaf pool.
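
The level-by-level scheme can be sketched in the same simplified setting. As before, this is an illustration under assumptions: `build_bfs` is a hypothetical name, the "events" are plain sortable keys, and a median split stands in for the SAH-based split.

```python
def build_bfs(keys, leaf_size=2, max_level=16):
    """BFS construction over scalar keys, returning (node_pool, leaf_pool)."""
    node_pool = [None]  # slot 0 is reserved for the root
    leaf_pool = []
    array_a = [(sorted(keys), 0)]  # (events, node offset) for the current level
    level = 0
    while array_a:
        array_b = []  # receives all children of this level
        for events, node_id in array_a:
            if len(events) <= leaf_size or level >= max_level:
                leaf_pool.append(events)  # leaf: its events go to the leaf pool
                node_pool[node_id] = ["leaf", len(leaf_pool) - 1]
            else:
                mid = len(events) // 2  # stand-in for the SAH split
                left_id = len(node_pool)
                node_pool.extend([None, None])  # slots for both children
                # The pool offsets of the children are recorded in the parent.
                node_pool[node_id] = ["inner", events[mid], left_id, left_id + 1]
                array_b.append((events[:mid], left_id))
                array_b.append((events[mid:], left_id + 1))
        array_a = array_b  # the output of this level is the input of the next
        level += 1
    return node_pool, leaf_pool
```

The outer loop runs once per tree level, and the two lists swap roles after every iteration.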

Fig. 4-5 BFS KD tree construction (Blue for internal, green for leaf, red for empty)

4.6 KD Tree Construction: DFS VS BFS

The DFS approach benefits from always choosing the left child to split, and it has no
need to store the AABB min/max. It therefore consumes less memory than the BFS
approach.

At the traversal stage, DFS has the further advantage of always placing the left child
immediately after its parent in the node pool. This improves data coherence and may
perform better when the cache is small. This layout is necessary for efficient
traversal [13, 20].

The BFS construction requires far fewer iterations: only one per level. DFS, on the
other hand, suffers from frequent stack operations: each time a node is split, the
right child has to be pushed onto array B, and each time the splitting reaches a leaf,
one node has to be popped from array B into array A. This frequent memory traffic
considerably limits performance.

In our experiments, construction using BFS is about 25% faster than construction using
DFS. We have therefore adopted BFS as our default in all the following chapters.

4.7 KD Tree Node Structures: AOS VS SOA

Although storing nodes as an array of structures (AOS) may be more straightforward, a
structure of arrays (SOA) is necessary for parallel implementations, especially on
architectures such as NVIDIA CUDA, where structures and classes are not supported and
SOA must be used.

When traversal uses KD-restart rather than KD-backtrack, the AABB min/max and the
pointer to the parent can be omitted.
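
The difference between the two layouts can be sketched as follows. The field names (`split_axis`, `split_pos`, `left`, `right`), the class `NodePoolSoA`, and the helper `from_aos` are assumptions made for the example; the node holds no AABB and no parent pointer, as would suffice for KD-restart.

```python
from array import array


class NodePoolSoA:
    """Structure of arrays: entry i of every array describes node i."""

    def __init__(self):
        self.split_axis = array("b")  # 0, 1 or 2 for inner nodes, -1 for leaves
        self.split_pos = array("d")   # split position (unused for leaves)
        self.left = array("i")        # offset of the left child, or leaf-pool offset
        self.right = array("i")       # offset of the right child, -1 for leaves

    def add(self, axis, pos, left, right):
        """Append one node and return its offset in the pool."""
        self.split_axis.append(axis)
        self.split_pos.append(pos)
        self.left.append(left)
        self.right.append(right)
        return len(self.split_axis) - 1


def from_aos(nodes):
    """Convert an array of structures (a list of dicts) to the SOA layout."""
    pool = NodePoolSoA()
    for node in nodes:
        pool.add(node["axis"], node["pos"], node["left"], node["right"])
    return pool
```

In the AOS form all fields of one node are adjacent in memory, whereas in the SOA form the same field of consecutive nodes is adjacent, which is the access pattern that data-parallel code reads most efficiently.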

Fig. 4-6 BFS construction with the node pool implemented as an array of structures (AOS)
