
For unclosed meshes, we perform boundary handling to extract violated triangles each frame. We perform traversal for the BVHs of ghost triangles and the deformable object.

On CPUs, we traverse the BVHs of all holes in the deformable object sequentially. In other words, traversal in the boundary-handling step on CPUs is BVH-based. If a triangle collides with a ghost triangle, it is determined to be violated. We keep the types of all violated triangles until the deformable object is free of self-collision. Note that we compute exactly whether or not a collision occurs between the deformable object and the ghost triangles.
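The BVH-based boundary handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated here: the `Node` layout, the function names, and the `exact_test` callback (which stands in for the exact triangle-triangle test) are all hypothetical, and every internal node is assumed to have two children.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    tri: Optional[int] = None  # triangle index, set only on leaves


def overlap(a: Box, b: Box) -> bool:
    """Axis-aligned boxes overlap iff their extents overlap on every axis."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def collide(obj: Node, ghost: Node,
            exact_test: Callable[[int, int], bool],
            violated: Set[int]) -> None:
    """Descend both hierarchies; run the exact test only on leaf pairs."""
    if not overlap(obj.box, ghost.box):
        return
    if obj.tri is not None and ghost.tri is not None:
        if exact_test(obj.tri, ghost.tri):  # hypothetical exact test
            violated.add(obj.tri)
        return
    if obj.tri is None:
        collide(obj.left, ghost, exact_test, violated)
        collide(obj.right, ghost, exact_test, violated)
    else:
        collide(obj, ghost.left, exact_test, violated)
        collide(obj, ghost.right, exact_test, violated)


def boundary_handling_bvh_based(obj_root: Node, hole_bvhs: List[Node],
                                exact_test: Callable[[int, int], bool]) -> Set[int]:
    """Process the ghost-triangle BVH of each hole one after another."""
    violated: Set[int] = set()
    for ghost_root in hole_bvhs:
        collide(obj_root, ghost_root, exact_test, violated)
    return violated
```

The returned set holds the indices of the triangles of the deformable object that the exact test reports as colliding with some ghost triangle.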

On GPUs, we also process the BVHs of all holes sequentially, but we create nt threads, where nt is the number of triangles of the deformable object. Each thread is responsible for the computation between one triangle and the BVHs of the ghost triangles, so the degree of parallelism is higher. In other words, traversal in the boundary-handling step on GPUs is triangle-based. During traversal, we collect the potentially colliding pairs but do not perform elementary tests on them. Instead, we determine whether two triangles collide from their bounding boxes alone and set the type of the triangle according to the result. If we performed exact elementary tests on the potentially colliding pairs of the deformable object and the ghost triangles, the computation in the kernel function would be more complicated, and there would be more global memory accesses.
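The triangle-based, inexact variant can be sketched in the same spirit. The code below is only an illustration: the loop over triangles stands in for the nt GPU threads, and the `Node` layout and function names are assumptions introduced for this example. A triangle is marked as violated as soon as its bounding box overlaps the bounding box of a ghost triangle; no elementary test is run.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    tri: Optional[int] = None  # ghost triangle index, set only on leaves


def overlap(a: Box, b: Box) -> bool:
    """Axis-aligned boxes overlap iff their extents overlap on every axis."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def hits_ghost(tri_box: Box, ghost_root: Node) -> bool:
    """Work of one 'thread': one triangle box against one ghost BVH."""
    stack = [ghost_root]
    while stack:
        node = stack.pop()
        if not overlap(tri_box, node.box):
            continue
        if node.tri is not None:
            return True  # box-level overlap only, no elementary test
        stack.extend((node.left, node.right))
    return False


def boundary_handling_triangle_based(tri_boxes: List[Box],
                                     hole_bvhs: List[Node]) -> List[bool]:
    """Holes are processed sequentially; triangles are independent of each other."""
    violated = [False] * len(tri_boxes)
    for ghost_root in hole_bvhs:
        for t, box in enumerate(tri_boxes):  # one iteration per GPU thread
            if not violated[t] and hits_ghost(box, ghost_root):
                violated[t] = True
    return violated
```

Because the decision uses bounding boxes only, this sketch can mark more triangles as violated than an exact test would, which is the trade-off discussed above.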

Tables 6.10 and 6.11 show the results on CPUs and on GPUs. On CPUs, it is better to perform boundary handling with exact collision detection for Ani. three, four, and five, because this reduces the number of violated triangles and the computation time of traversal. For Ani. six, however, the hole of the deformable object is huge, and the computation of boundary handling can be reduced considerably with inexact collision detection. As a result, the performance of the whole procedure for Ani. six is better with inexact collision detection.

On GPUs, we observe that exact collision detection in the boundary-handling step yields fewer violated triangles and a shorter traversal time, but the execution time of boundary handling increases significantly. The performance of the whole procedure is therefore worse.

Traversal

We perform traversal to collect potentially colliding pairs. The situation is similar to the step of boundary handling.

Ani.
Three  4.24  2032  6.6   16.41  2.22  5548  10.9  19.26
Four   2.22  1690  4.72  11.93  1.45  3898  8.32  15.55
Five   0.54  622   5.15  12.71  0.39  1367  8.21  14.71
Six    7.37  1593  6.14  20.64  3.56  3116  8.01  18.75

Table 6.10: Boundary handling with the view-line scheme by exact and inexact methods on CPUs.

Three  1.43  0.74  2.43  0.29  0.86  1.47
Four   0.88  0.53  1.64  0.27  0.62  1.1
Five   1.08  0.51  1.73  0.21  0.6   0.98
Six    0.91  0.71  1.78  0.6   0.76  1.5

Table 6.11: Boundary handling with the view-line scheme by exact and inexact methods on GPUs.

On CPUs, after performing view tests and boundary handling, we get three kinds of view sets for closed meshes and four kinds of view sets for unclosed meshes. Then, we build vBVHs according to all kinds of view sets. After that, we perform traversal for the vBVHs sequentially. In other words, performing traversal on CPUs is BVH-based.

On GPUs, after performing view tests and boundary handling, we get a set of triangles that are negatively oriented or violated, and traversal is performed only for these triangles. We create nnv threads, where nnv is the number of negatively oriented or violated triangles, so the traversal of each triangle is handled by one thread. Note that we do not need to build vBVHs for all kinds of view sets; we perform traversal between the negatively oriented and violated triangles and the BVH of the deformable object. In other words, performing traversal on GPUs is triangle-based. In addition, we employ the front-based method presented by Tang et al. [TMT09] to improve the performance of traversal, but we do not construct the bounding volume test tree. Instead, every frame we record for each triangle the list of nodes at which traversal terminated. In the next frame, we therefore do not need to start traversal from the root of the BVH but can start from these history nodes. In short, our approach is triangle-based, and the degree of parallelism is high.
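The idea of restarting from history nodes can be illustrated with a small sketch. It assumes a BVH whose boxes are refitted every frame and whose internal nodes have two children; the `Node` layout and the `traverse` function are hypothetical names for this example and are not taken from [TMT09]. Traversal returns both the overlapping leaf triangles and the list of nodes at which it stopped, and that list seeds the traversal of the next frame.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    tri: Optional[int] = None  # triangle index, set only on leaves


def overlap(a: Box, b: Box) -> bool:
    """Axis-aligned boxes overlap iff their extents overlap on every axis."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def traverse(tri_box: Box, start_nodes: List[Node]) -> Tuple[List[int], List[Node]]:
    """Return (overlapping leaf triangles, nodes where traversal terminated)."""
    hits: List[int] = []
    history: List[Node] = []
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        if node.tri is not None:
            history.append(node)  # leaves always terminate traversal
            if overlap(tri_box, node.box):
                hits.append(node.tri)
        elif overlap(tri_box, node.box):
            stack.extend((node.left, node.right))
        else:
            history.append(node)  # pruned subtree: remember where we stopped
    return hits, history
```

In the first frame the start list is `[root]`; in later frames it is the history list returned by the previous call. The history nodes form a cut through the tree, so every leaf is still reachable, but in this sketch the cut can only move downward over time, which is one reason to release the history nodes from time to time.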

Table 6.12 shows the execution time of performing traversal on GPUs with the view-line scheme for three different policies. For the first policy, we perform traversal without using history nodes. For the second policy, we use history nodes to speed up traversal; after the history nodes are released, traversal restarts from the root node of the BVH. For the third policy, traversal restarts from the initial traversal history nodes when the history nodes are released. We observe that the third policy performs best and the first policy performs worst, because traversing the BVH from the root every frame is too expensive.

Ani.   without using    using history nodes
       history nodes    restart from root    restart from initial history nodes
One    0.66             0.23                 0.19

Table 6.12: Execution time (in ms) of performing traversal on GPUs with the view-line scheme for three different policies.