4.1 Augmented AABB
We have made some modifications to the fast KD tree construction of [32]. That work uses
three kinds of events, “starting”, “ending” and “lying”, and triangles lying in the split
plane are placed either left or right, depending on the SAH evaluation [32]. However, our
experiments showed that “lying” events require additional conditions, and these conditions
complicate the process.
We see no benefit in using “lying” events; instead we advocate the method that uses the
minimum and maximum of the AABBs [19]. This method is also more general and robust, since
KD trees are not restricted to triangles: any primitive that can be bounded by an AABB can
be an element of a KD tree.
We observed the following benefits from using the min-max events of the AABBs.
(1) A single bit flag suffices to distinguish a minimum event from a maximum event.
(2) On each of the three axes, the number of events is exactly twice the number of
triangles.
(3) The incremental sweeping is simpler.
(4) The SAH cost function is simpler, since the case of triangles lying on the split
candidate is eliminated.
(5) The classification of triangles and events is simpler.
(6) The conditional branches in ray traversal are simpler.
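To illustrate benefits (1) and (2), here is a minimal C++ sketch of per-axis event generation. It assumes that each event stores a position plus one packed word whose lowest bit is the min/max flag and whose remaining bits hold the primitive index; the names `AABB`, `Event` and `makeEvents`, and this particular packing, are hypothetical rather than taken from [19] or [32].

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Axis-aligned bounding box of one primitive (triangle or any other shape).
struct AABB {
    float min[3];
    float max[3];
};

// One sweep event: a position plus a packed word whose lowest bit is the
// min/max flag (0 = minimum, 1 = maximum) and whose other bits are the
// primitive index. (Hypothetical layout.)
struct Event {
    float pos;
    std::uint32_t packed;

    std::uint32_t prim() const { return packed >> 1; }
    bool isMax() const { return (packed & 1u) != 0; }
};

// Builds the sorted events of one axis: exactly two per primitive.
std::vector<Event> makeEvents(const std::vector<AABB>& boxes, int axis) {
    assert(axis >= 0 && axis < 3);
    std::vector<Event> events;
    events.reserve(2 * boxes.size());
    for (std::uint32_t i = 0; i < boxes.size(); ++i) {
        events.push_back({boxes[i].min[axis], i << 1});         // minimum event
        events.push_back({boxes[i].max[axis], (i << 1) | 1u});  // maximum event
    }
    // Sort by position; at equal positions maximum events come first.
    std::sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
        if (a.pos != b.pos) return a.pos < b.pos;
        return a.isMax() && !b.isMax();
    });
    assert(events.size() == 2 * boxes.size());
    return events;
}
```

Placing maximum events before minimum events at equal positions matches a sweep that removes primitives from NR before adding primitives to NL at the same plane.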
Based on our observations, the modification achieves a speedup of at least 10%.
Unfortunately, it also raises a problem in the incremental sweeping for SAH, described as
follows.
Triangles lying in a plane orthogonal to the split axis generate a minimum event and a
maximum event with the same value. Such triangles are omitted by the SAH function: by the
time the sweep evaluates that plane, they have already been removed from NR but have not
yet been added to NL. Consider the case below.
Fig. 4-1 When the sweep reaches plane p, NL = 2 and NR = 2
A remedy is to augment the AABB: we decrease the minimum and increase the maximum by a
small epsilon, so that every minimum is strictly less than the corresponding maximum. With
this change the SAH function works correctly.
Fig. 4-2 Augmented min-max events of the AABB
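The omission and its remedy can be reproduced with a small sweep over one axis. This is a sketch under our own assumptions: each primitive is reduced to its extent along the split axis, and at every distinct event position p the primitives ending at p leave NR before the (omitted) SAH evaluation, while those starting at p join NL after it. The names `Interval`, `Candidate`, `augment` and `sweep` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Extent of one primitive's AABB along the split axis.
struct Interval { float lo, hi; };

// Primitive counts seen by the SAH at one candidate plane.
struct Candidate { float pos; std::size_t nl, nr; };

// Hypothetical helper: grow every extent by eps on both sides, so that
// lo < hi holds even for primitives lying in a plane orthogonal to the axis.
std::vector<Interval> augment(std::vector<Interval> boxes, float eps) {
    assert(eps > 0.f);
    for (Interval& b : boxes) { b.lo -= eps; b.hi += eps; }
    return boxes;
}

// Incremental sweep over the min/max events of one axis.
std::vector<Candidate> sweep(const std::vector<Interval>& boxes) {
    struct Ev { float pos; bool isMax; };
    std::vector<Ev> ev;
    ev.reserve(2 * boxes.size());
    for (const Interval& b : boxes) {
        ev.push_back({b.lo, false});
        ev.push_back({b.hi, true});
    }
    std::sort(ev.begin(), ev.end(),
              [](const Ev& a, const Ev& b) { return a.pos < b.pos; });
    std::size_t nl = 0, nr = boxes.size();
    std::vector<Candidate> out;
    for (std::size_t i = 0; i < ev.size();) {
        const float p = ev[i].pos;
        std::size_t ending = 0, starting = 0;
        for (; i < ev.size() && ev[i].pos == p; ++i) {  // all events at p
            if (ev[i].isMax) ++ending; else ++starting;
        }
        nr -= ending;                 // these boxes lie entirely left of p
        out.push_back({p, nl, nr});   // the SAH cost would be computed here
        nl += starting;               // these boxes lie entirely right of p
    }
    return out;
}
```

For the five extents [0,1], [0,1], [2,2], [3,4], [3,4], the unaugmented sweep reports NL = 2 and NR = 2 at p = 2, so the flat extent is counted on neither side. After `augment` with a small epsilon, NL + NR is at least 5 at every candidate.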
4.2 A Trivial and General Method for Event Classification
The standard event classification for straddling triangles [32] requires Sutherland-Hodgman
clipping and merging the new events with the original ones. This approach seriously
complicates event classification and cannot be applied when primitives other than
triangles are used. Here we use a general and robust method to handle straddling AABBs.
The pseudocode is as follows.
Fig. 4-3 A trivial and general method for event classification
for all e on the splitAxis S0
for all e on the non-splitAxis S = (S0+1)%3 and S = (S0+2)%3
    if (side[e] == LEFT)
This method works very well when most of the primitives have similar sizes. Furthermore,
it allows us to handle surfaces that are not triangle-based.
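A C++ sketch of the classification in Fig. 4-3, as we read it: the first pass visits the events on split axis S0 and marks each primitive LEFT, RIGHT or both; the second pass copies every event into the left and/or right child according to that mark. Straddling AABBs are not clipped, so a straddling primitive keeps its original extent in both children. The names `Side`, `Event`, `AxisEvents` and `classify`, and the handling of a box lying exactly on the plane, are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum Side : std::uint8_t { NONE = 0, LEFT = 1, RIGHT = 2, BOTH = 3 };

struct Event { float pos; std::uint32_t prim; bool isMax; };
using AxisEvents = std::array<std::vector<Event>, 3>;  // sorted events, one list per axis

// Classifies the events of a node split at `split` on axis s0.
// Straddling primitives are copied, unclipped, into both children.
void classify(const AxisEvents& node, int s0, float split, std::size_t numPrims,
              AxisEvents& left, AxisEvents& right) {
    assert(s0 >= 0 && s0 < 3);
    std::vector<std::uint8_t> side(numPrims, NONE);
    // Pass 1 ("for all e on the splitAxis S0"): decide each primitive's side.
    for (const Event& e : node[s0]) {
        if (!e.isMax && e.pos < split) side[e.prim] |= LEFT;  // starts left of the plane
        if (e.isMax && e.pos > split) side[e.prim] |= RIGHT;  // ends right of the plane
    }
    // Pass 2: distribute the events of S0, (S0+1)%3 and (S0+2)%3.
    for (int k = 0; k < 3; ++k) {
        const int a = (s0 + k) % 3;
        left[a].clear();
        right[a].clear();
        for (const Event& e : node[a]) {
            std::uint8_t s = side[e.prim];
            if (s == NONE) s = LEFT;  // flat box exactly on the plane (ruled out by augmentation)
            if (s & LEFT) left[a].push_back(e);
            if (s & RIGHT) right[a].push_back(e);
        }
    }
}
```

Because the second pass only filters lists that are already sorted, the children's events stay sorted and no re-sort is required.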
4.3 Using Preallocated Pools
A major bottleneck of KD tree construction is frequent memory allocation. Preallocated
pools were proposed in [2, 27], and we use a similar method.
The difficulty lies in predicting the final size of the pools: a method to precisely
estimate the number of nodes and leaves of the whole tree remains unknown.
[27] uses linked lists of memory chunks for nodes and leaves. The construction proceeds in
DFS fashion, and a new chunk is allocated whenever the current one is full.
Similarly, we maintain one node pool and one leaf pool. At each level we first check
whether the remaining space is enough for the next level; if not, we allocate a larger pool
and move the existing arrays into it.
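A minimal sketch of such a growable pool in C++, assuming one contiguous array per pool that is checked once per level and, if too small, replaced by a larger array into which the old entries are moved. `Pool`, `reserveFor` and the doubling policy are illustrative choices, not necessarily the implementation used in our experiments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T>
class Pool {
public:
    explicit Pool(std::size_t capacity)
        : data_(std::make_unique<T[]>(capacity)), capacity_(capacity) {}

    // Called before a level is built, with an upper bound on the entries it adds.
    void reserveFor(std::size_t incoming) {
        if (size_ + incoming <= capacity_) return;  // enough space left
        const std::size_t newCap = std::max(2 * capacity_, size_ + incoming);
        auto bigger = std::make_unique<T[]>(newCap);
        std::move(data_.get(), data_.get() + size_, bigger.get());  // move old entries over
        data_ = std::move(bigger);
        capacity_ = newCap;
    }

    // Appends without allocating; returns the entry's offset in the pool.
    std::size_t push(const T& value) {
        assert(size_ < capacity_ && "reserveFor() must be called first");
        data_[size_] = value;
        return size_++;
    }

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    std::unique_ptr<T[]> data_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};
```

Entries are referred to by the offset returned from `push` rather than by pointer, since a reallocation moves the whole array and would invalidate pointers.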
In our experiments, construction with preallocated pools is at least twice as fast as
construction using dynamic allocation.
4.4 DFS KD Tree Construction with Preallocated Pools
KD trees are generally constructed in DFS order. [27] proposed a DFS construction with
preallocated memory; here we present a similar implementation.
The node pool and the leaf pool grow continuously during construction. Only two temporary
arrays are needed: array A for left child nodes and array B for right child nodes.
When a node splits in two, we replace its events in array A with the events of its left
child, and push the events of its right child onto array B. Array B thus serves as a
stack: each time the split reaches a leaf, the events of the newest node in array B are
popped into array A. Node splitting always takes place in array A.
Fig. 4-4 DFS KD tree construction (Blue for internal, green for leaf, red for empty), IDs in brackets are undefined
Note that while nodes reside in the stack (array B), their node IDs are still
undetermined. A node is assigned an ID and added to the node pool only after it is popped
into array A.
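The DFS scheme can be sketched in C++ as follows. To stay self-contained, the sketch replaces events with 1-D positions and the SAH with a median split; the names `Node`, `Pending` and `buildDFS`, the `maxLeaf` criterion and the median split are all assumptions. Array A holds the node being split, array B is the stack of right children, and a node receives its ID only when it becomes the current node, at which point its parent's right-child link is filled in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node record in DFS order: an inner node's left child is the
// next ID, so only the right child's ID is stored.
struct Node {
    bool leaf = false;
    int right = -1;     // inner nodes: ID of the right child
    int leafBegin = 0;  // leaves: first entry in the leaf pool
    int leafCount = 0;  // leaves: number of entries
};

// An entry of array B: a right child waiting for its ID, plus its parent's ID.
struct Pending { int parent; std::vector<float> items; };

// DFS build over 1-D items standing in for events; splits at the median.
void buildDFS(std::vector<float> items, std::size_t maxLeaf,
              std::vector<Node>& nodePool, std::vector<float>& leafPool) {
    assert(maxLeaf >= 1);                     // guarantees both halves shrink
    std::vector<float> A = std::move(items);  // array A: the node being split
    std::vector<Pending> B;                   // array B: stack of right children
    int parent = -1;                          // set when A was just popped from B
    for (;;) {
        const int id = static_cast<int>(nodePool.size());  // ID assigned only now
        nodePool.push_back(Node{});
        if (parent >= 0) nodePool[parent].right = id;
        if (A.size() > maxLeaf) {
            std::sort(A.begin(), A.end());
            const auto mid = static_cast<std::ptrdiff_t>(A.size() / 2);
            B.push_back({id, std::vector<float>(A.begin() + mid, A.end())});  // push right child
            A.resize(static_cast<std::size_t>(mid));  // A now holds the left child
            parent = -1;                              // left child is id + 1
            continue;
        }
        Node& n = nodePool[id];
        n.leaf = true;
        n.leafBegin = static_cast<int>(leafPool.size());
        n.leafCount = static_cast<int>(A.size());
        leafPool.insert(leafPool.end(), A.begin(), A.end());
        if (B.empty()) break;              // construction finished
        A = std::move(B.back().items);     // pop the newest node into A
        parent = B.back().parent;
        B.pop_back();
    }
}
```

For the items 0 to 7 (in any order) with `maxLeaf = 2`, the sketch produces seven nodes, and the root's right child receives ID 4 only after the whole left subtree (IDs 1 to 3) has been written.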
4.5 BFS KD Tree Construction with Preallocated Pools
We perform the construction in BFS fashion, processing one level of the tree per
iteration. The same two temporary arrays are needed. All the nodes of the first level
reside in array A. After the split, all the children are moved to array B, which becomes
the input of the next iteration while array A receives that iteration's children. The two
arrays are used alternately until the maximum level is reached.
Every generated node is moved to the node pool, and its offset (bias address) in the pool
is recorded as the left or right child of its parent node. In addition, any node detected
as a leaf has its events moved to the leaf pool.
Fig. 4-5 BFS KD tree construction (Blue for internal, green for leaf, red for empty)
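A corresponding BFS sketch in C++, under the same simplifying assumptions (1-D items, median split, hypothetical names `BNode`, `Work` and `buildBFS`): array A holds the current level, the children go to array B, and the two are swapped after each level. Each node's offset in the node pool is written into its parent as soon as the node is appended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node record: children are stored as offsets into the node pool.
struct BNode {
    bool leaf = false;
    int left = -1, right = -1;         // inner nodes: offsets of the children
    int leafBegin = 0, leafCount = 0;  // leaves: range in the leaf pool
};

// A node of the current level that has not been written to the pool yet.
struct Work { int parent; bool isRight; std::vector<float> items; };

// BFS build: one iteration per level, arrays A and B used alternately.
void buildBFS(std::vector<float> items, std::size_t maxLeaf, int maxLevel,
              std::vector<BNode>& nodePool, std::vector<float>& leafPool) {
    assert(maxLeaf >= 1 && maxLevel >= 0);
    std::vector<Work> A, B;  // A: input level, B: next level
    A.push_back({-1, false, std::move(items)});
    for (int level = 0; !A.empty(); ++level) {
        B.clear();
        for (Work& w : A) {
            const int id = static_cast<int>(nodePool.size());
            nodePool.push_back(BNode{});
            if (w.parent >= 0) {  // record the offset in the parent
                if (w.isRight) nodePool[w.parent].right = id;
                else nodePool[w.parent].left = id;
            }
            if (w.items.size() <= maxLeaf || level >= maxLevel) {
                BNode& n = nodePool[id];  // leaf: move its events to the leaf pool
                n.leaf = true;
                n.leafBegin = static_cast<int>(leafPool.size());
                n.leafCount = static_cast<int>(w.items.size());
                leafPool.insert(leafPool.end(), w.items.begin(), w.items.end());
                continue;
            }
            std::sort(w.items.begin(), w.items.end());
            const auto mid = w.items.begin() + static_cast<std::ptrdiff_t>(w.items.size() / 2);
            B.push_back({id, false, std::vector<float>(w.items.begin(), mid)});
            B.push_back({id, true, std::vector<float>(mid, w.items.end())});
        }
        std::swap(A, B);  // the children become the next input
    }
}
```

For the same eight items with `maxLeaf = 2`, nodes are written level by level, so the root's children get offsets 1 and 2; with `maxLevel = 1`, both children become leaves and only three nodes are produced.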
4.6 KD Tree Construction: DFS VS BFS
DFS construction benefits from always choosing the left child to split next, so there is
no need to store the AABB min/max of the nodes. Therefore it consumes less memory than BFS
construction.
At the traversal stage, DFS has another advantage: the left child is always placed
immediately after its parent in the node pool. This improves data coherence and may
perform better when the cache is small. This layout is necessary for efficient
traversal [13, 20].
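With the left child stored immediately after its parent, an inner node only needs to store the index of its right child. The C++ sketch below shows a point-location descent over such a DFS-ordered array; the `DfsNode` layout and `findLeaf` are hypothetical and cover only child selection, not a full ray traversal.

```cpp
#include <cassert>
#include <vector>

// Hypothetical DFS-ordered node: the left child of inner node i is node i + 1,
// so only the right child's index is stored.
struct DfsNode {
    int axis;     // split axis 0..2, or -1 for a leaf
    float split;  // split position (inner nodes only)
    int right;    // index of the right child (inner nodes only)
};

// Descends from the root (index 0) to the leaf whose cell contains point p.
int findLeaf(const std::vector<DfsNode>& nodes, const float p[3]) {
    int i = 0;
    while (nodes[i].axis >= 0) {
        const DfsNode& n = nodes[i];
        assert(n.axis < 3);
        i = (p[n.axis] < n.split) ? i + 1 : n.right;  // left child is adjacent
    }
    return i;
}
```

Descending to the left never requires reading a stored index, which is where the coherence of this layout comes from.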
BFS construction requires far fewer iterations: only one per level. DFS, on the other
hand, suffers from frequent stack operations: each time a node is split, its right child
has to be pushed onto array B, and each time the split reaches a leaf, one node in array B
has to be popped into array A. This frequent memory traffic considerably limits
performance.
In our experiments, construction using BFS is about 25% faster than construction using
DFS. We therefore adopt BFS as the default throughout the remaining chapters.
4.7 KD Tree Node Structures: AOS VS SOA
Although storing nodes as an array of structures (AOS) may be more straightforward, a
structure of arrays (SOA) is necessary for parallel implementations. This is especially
true for architectures such as NVIDIA CUDA, where, at the time of this work, structures
and classes were not supported and SOA had to be used.
When the traversal uses KD-restart rather than KD-backtrack, both the AABB min/max and the
pointer to the parent can be omitted.
Fig. 4-6 BFS construction with node pool implemented as array of structures (AOS)
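The difference between the two layouts can be sketched in C++ as below. The field set (split axis, split position, child offsets) is an assumption; following the remark on KD-restart, neither the AABB min/max nor a parent index is stored. One commonly cited benefit of SOA on GPUs is that consecutive threads reading the same field access consecutive addresses. `NodeAOS`, `NodePoolSOA` and `toSOA` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// AOS: one record per node.
struct NodeAOS {
    std::int8_t axis;  // split axis 0..2, or -1 for a leaf
    float split;       // split position
    int left, right;   // child offsets in the node pool (unused for leaves here)
};

// SOA: one array per field. For KD-restart traversal, no AABB min/max and
// no parent index are kept.
struct NodePoolSOA {
    std::vector<std::int8_t> axis;
    std::vector<float> split;
    std::vector<int> left, right;

    // Appends a node and returns its offset in the pool.
    int push(std::int8_t a, float s, int l, int r) {
        axis.push_back(a);
        split.push_back(s);
        left.push_back(l);
        right.push_back(r);
        return static_cast<int>(axis.size()) - 1;
    }
    std::size_t size() const { return axis.size(); }
};

// Converts an AOS node pool into the SOA layout, field by field.
NodePoolSOA toSOA(const std::vector<NodeAOS>& aos) {
    NodePoolSOA soa;
    for (const NodeAOS& n : aos) soa.push(n.axis, n.split, n.left, n.right);
    assert(soa.size() == aos.size());
    return soa;
}
```

On a GPU, the four `std::vector` members would correspond to four separate device arrays indexed by the same node offset.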