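Conclusions and Future Directions - A Study of the Performance of Stack Usage in Acceleration Structures for Ray Tracing on the OpenCL Platform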

6.1 Conclusions

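Because of the constraints OpenCL imposes, few ray tracing programs with an acceleration structure have been implemented in an OpenCL environment, and the reference material available is limited. Implementing an acceleration structure also raises a practical question: in which type of OpenCL memory should a user-defined stack be placed?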

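The series of experiments in this thesis shows that, although the differences in execution time are small, performance is affected by the memory type, and that the size of the effect varies from one piece of hardware to another. On Intel HD Graphics, both local memory and global memory go through the L3 cache level, but local memory offers more access patterns that can reach full bandwidth, so its access times are usually shorter and it performs better. On the GTX 960 the differences in execution time are smaller still. Although the memory types differ in speed in NVIDIA's hardware implementation of the OpenCL memory model, the hardware is better optimized than Intel HD Graphics, so accessing data through the cache takes roughly the same time for every memory type, and the resulting performance is very close.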

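When the stack is implemented in local memory, its declared capacity must be chosen carefully: declaring more than is needed directly degrades performance, so an appropriately sized stack is the best choice. These conclusions give programmers concrete recommendations to consult when implementing a ray tracing program with an acceleration structure, and they constitute the contribution of this thesis.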
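
The sizing advice above comes down to simple arithmetic. The following is a minimal sketch in plain C, not the kernel code used in this thesis. It assumes that each work-item keeps its own stack of 32-bit node indices inside one work-group-wide local-memory array; the names `STACK_DEPTH`, `stack_bytes_per_group`, and `max_depth_for_budget`, as well as the value 32, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-work-item stack capacity in entries (illustrative, not from the thesis). */
#define STACK_DEPTH 32

/* Local-memory bytes one work-group needs when every work-item owns
   `depth` stack entries, each a 32-bit node index (assumed layout). */
size_t stack_bytes_per_group(size_t work_group_size, size_t depth)
{
    return work_group_size * depth * sizeof(uint32_t);
}

/* Largest per-work-item depth that still fits in `local_mem_bytes`. */
size_t max_depth_for_budget(size_t local_mem_bytes, size_t work_group_size)
{
    assert(work_group_size > 0);
    return local_mem_bytes / (work_group_size * sizeof(uint32_t));
}
```

For example, 64 work-items with `STACK_DEPTH` entries each need 64 × 32 × 4 = 8192 bytes of local memory per work-group. On a real device the available budget can be queried with `clGetDeviceInfo` and `CL_DEVICE_LOCAL_MEM_SIZE`, and the depth chosen accordingly.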

6.2 Future Work and Improvements

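Two directions are worth pursuing. The first is rendering quality. Each pixel currently receives the contribution of only one ray, shaded with a simple Phong model, so the image quality is unsatisfactory. Distributed ray tracing [9] could be adopted so that each pixel receives the contributions of multiple rays, which makes more optical effects possible. The second is tree traversal speed. Stack-less BVH traversal [10] could be adopted to accelerate traversal, or the traversal could be reworked along the lines of the PBRT acceleration structure so that it no longer visits every node and only needs to visit the nearest object. These are the directions for improving this work.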
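
The idea of visiting only the nearest object rather than every node can be sketched as follows. This is a minimal C illustration under stated assumptions, not PBRT's code and not the implementation in this thesis: leaf primitives are taken to be axis-aligned boxes, the tree is a flat array whose root is node 0, and every name (`Node`, `Ray`, `hit_box`, `closest_hit`, `STACK_DEPTH`) is hypothetical. It keeps an explicit fixed-size stack, visits the nearer child first, and skips any subtree that begins beyond the closest hit found so far.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define STACK_DEPTH 32          /* assumed capacity; must cover the tree depth */
#define NO_HIT (-1)
#define STACK_OVERFLOW (-2)

typedef struct { float min[3], max[3]; } Aabb;

typedef struct {
    Aabb box;
    int left, right;            /* child indices; left < 0 marks a leaf */
    int object;                 /* object id, meaningful for leaves only */
} Node;

typedef struct { float org[3], inv_dir[3]; } Ray;  /* inv_dir = 1 / direction */

/* Slab test: reports whether the ray enters the box within [0, t_max]. */
static int hit_box(const Aabb *b, const Ray *r, float t_max, float *t_near)
{
    float t0 = 0.0f, t1 = t_max;
    for (int a = 0; a < 3; ++a) {
        float ta = (b->min[a] - r->org[a]) * r->inv_dir[a];
        float tb = (b->max[a] - r->org[a]) * r->inv_dir[a];
        if (ta > tb) { float tmp = ta; ta = tb; tb = tmp; }
        if (ta > t0) t0 = ta;
        if (tb < t1) t1 = tb;
        if (t0 > t1) return 0;
    }
    *t_near = t0;
    return 1;
}

/* Returns the id of the closest leaf object, NO_HIT, or STACK_OVERFLOW. */
int closest_hit(const Node *nodes, const Ray *ray, float *t_hit)
{
    assert(nodes != NULL && ray != NULL && t_hit != NULL);
    int stack[STACK_DEPTH];
    int top = 0, best = NO_HIT;
    float t_best = INFINITY, t = 0.0f;

    stack[top++] = 0;                               /* start at the root */
    while (top > 0) {
        const Node *n = &nodes[stack[--top]];
        /* Skip subtrees that start beyond the closest hit found so far. */
        if (!hit_box(&n->box, ray, t_best, &t)) continue;
        if (n->left < 0) { best = n->object; t_best = t; continue; }

        float tl = INFINITY, tr = INFINITY;
        int hl = hit_box(&nodes[n->left].box, ray, t_best, &tl);
        int hr = hit_box(&nodes[n->right].box, ray, t_best, &tr);
        if (top + 2 > STACK_DEPTH) return STACK_OVERFLOW;
        /* Push the farther child first so the nearer one is popped first. */
        if (hl && hr) {
            int first = (tl <= tr) ? n->left : n->right;
            int second = (tl <= tr) ? n->right : n->left;
            stack[top++] = second;
            stack[top++] = first;
        } else if (hl) {
            stack[top++] = n->left;
        } else if (hr) {
            stack[top++] = n->right;
        }
    }
    *t_hit = t_best;
    return best;
}
```

Because the nearer child is processed first, a hit found there shrinks `t_best`, and the farther child is usually rejected by a single box test when it is popped. A stack-less scheme such as [10] would remove the `stack` array altogether; that variant is not shown here.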

Sources of Figures and Tables

(1) Figure 2.1.1

Wikipedia, Ray tracing

https://en.wikipedia.org/wiki/Ray_tracing_(graphics)

(2) Figure 2.2.1

Wikipedia, Bounding volume hierarchy

https://en.wikipedia.org/wiki/Bounding_volume_hierarchy

(3) Figures 2.3.1, 2.3.5, 3.2, 3.3

Taking Advantage of Intel® Graphics with OpenCL

https://software.intel.com/en-us/articles/taking-advantage-of-intel-graphics-with-opencl

(4) Figures 2.3.2, 2.3.3, 2.3.4, 2.3.6, 2.3.7, 2.3.8

The Compute Architecture of Intel® Processor Graphics Gen8

https://software.intel.com/sites/default/files/Compute%20Architecture%20of%20Intel%20Processor%20Graphics%20Gen8.pdf

(5) Figure 2.3.9

http://news.mydrivers.com/1/320/320824.htm

(6) Figure 2.3.10

Nvidia, GeForce - Whitepaper NVIDIA GeForce GTX 980.

https://international.download.nvidia.com/geforcecom/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF

(7) Figure 2.3.11

Nvidia, GeForce series

http://www.geforce.com.tw/hardware/desktop-gpus/geforce-gtx-960

(8) Figures 2.4.1, 2.4.2, 2.4.4, 2.4.5

Khronos Group, OpenCL - The OpenCL Specification version 1.2

https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf

(9) Figure 2.4.6

OpenCL memory data flow

https://sites.google.com/site/csc8820/opencl-basics/opencl-terms-explained

(10) Figure 3.4

OpenCL Memory Model in Kernel

https://scs.senecac.on.ca/~gpu610/pages/content/openm.html

(11) Table 3.1

Memory Region - Allocation and Memory Access Capabilities

https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf

References

[1] T. Whitted, "An improved illumination model for shaded display," Communications of the ACM, Volume 23, Issue 6, Pages 343-349, 1980.

[2] H. J. Haverkort, Results on geometric networks and data structures, Chapter 1: Introduction, pages 9-10 and 16, 2004.

[3] Intel®, Developer Zone - The Compute Architecture of Intel® Processor Graphics Gen8. Available: https://software.intel.com/sites/default/files/Compute%20Architecture%20of%20Intel%20Processor%20Graphics%20Gen8.pdf

[4] Nvidia, GeForce - Whitepaper NVIDIA GeForce GTX 980. Available: https://international.download.nvidia.com/geforcecom/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF

[5] Khronos Group, OpenCL - The OpenCL Specification version 1.2. Available: https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf

[6] Intel®, Developer Zone - Taking Advantage of Intel® Graphics with OpenCL. Available: https://software.intel.com/en-us/articles/taking-advantage-of-intel-graphics-with-opencl

[7] Assimp - Open Asset Import Library. Available: http://www.assimp.org/

[8] M. Pharr and G. Humphreys, Physically based rendering: From theory to implementation, 2004.

[9] R. L. Cook, T. Porter, and L. Carpenter, "Distributed ray tracing," ACM SIGGRAPH Computer Graphics, Volume 18, Issue 3, Pages 137-145, 1984.

[10] M. Hapala, T. Davidovič, I. Wald, V. Havran, and P. Slusallek, "Efficient stack-less BVH traversal for ray tracing," Proceedings of the 27th Spring Conference on Computer Graphics, SCCG '11.
