
Chapter 4 Experimental Results and Analysis

4.2.  Evaluation and Analysis

4.2.4.  Comparison of Different AOIs

Owing to the different design methodologies used to represent the virtual world, we observe some differences between the grid-based approach and the GPU-based approach. In the grid-based approach, we simply allocate a large array in which each element is a variable-length linked list; client objects are stored in these lists and searched sequentially on each update. In the GPU-based approach, recall that we keep no grid in GPU memory; instead, we sort the client objects by their bucket indices and then perform an N-way binary search to find the affected neighbors.
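The two lookup strategies can be contrasted in a small host-side sketch (plain C++ stand-in for the actual CUDA kernels; the grid dimensions, `Client` layout, and square AOI of radius `r` cells are illustrative assumptions, and the caller is assumed to keep the AOI inside the grid bounds):

```cpp
#include <algorithm>
#include <vector>

struct Client { int id; int bucket; };  // bucket = cellY * gridW + cellX

// Grid-based (CPU): each cell keeps its own list; an AOI query walks
// every covered cell and scans that cell's list sequentially.
std::vector<int> queryGrid(const std::vector<std::vector<int>>& grid,
                           int gridW, int cx, int cy, int r) {
    std::vector<int> out;
    for (int y = cy - r; y <= cy + r; ++y)
        for (int x = cx - r; x <= cx + r; ++x)
            for (int id : grid[y * gridW + x]) out.push_back(id);
    return out;
}

// GPU-style: no grid in memory; clients are pre-sorted by bucket index,
// and each covered row of buckets is located with one binary search
// (the "N-way binary search" of the text, N = rows covered by the AOI).
std::vector<int> querySorted(const std::vector<Client>& sorted,
                             int gridW, int cx, int cy, int r) {
    std::vector<int> out;
    auto less = [](const Client& c, int b) { return c.bucket < b; };
    for (int y = cy - r; y <= cy + r; ++y) {
        int lo = y * gridW + cx - r, hi = y * gridW + cx + r;
        auto it = std::lower_bound(sorted.begin(), sorted.end(), lo, less);
        for (; it != sorted.end() && it->bucket <= hi; ++it)
            out.push_back(it->id);
    }
    return out;
}
```

Both functions return the same neighbor set; the difference is that the second needs only a flat sorted array, which is what makes it amenable to GPU memory.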

The performance of the grid-based approach is clearly dominated by the average number of clients in the area of interest and by the size of the AOI itself: the larger the AOI, the more grid cells must be traversed. Changing the AOI, however, does not alter the behavior of the GPU-based approach, whose performance stays the same as long as the average number of clients in the AOI remains the same.

From Table 4-12, a CPU performance loss is observed when the configuration changes from 2500x2500 with AOI=10x10 to 5000x5000 with AOI=20x20, while the GPU performance in the two configurations is almost identical.

Average Execution Time (GPU w/ Computation Only)

Clients   MAP=2500x2500, AOI=10x10   MAP=5000x5000, AOI=20x20
          CPU        GPU             CPU        GPU
512       2.303      9.189           5.872      9.191
1024      4.624      10.319          11.717     9.191
2048      9.259      11.821          23.392     11.889
4096      18.711     13.126          47.006     13.234
8192      37.954     16.112          94.659     16.006
16384     77.139     21.570          190.806    21.628
32768     160.524    33.607          388.601    33.684
65536     346.258    59.629          804.078    59.650
131072    787.703    115.858         1707.657   114.769
262144    1917.623   241.282         3780.537   241.614
524288    4982.445   550.882         8741.187   550.768

Table 4-12 Comparison of Different AOIs with the Same Client Density in the Virtual World
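As a quick check on the rows above, the CPU-over-GPU speedup and the crossover point can be computed directly from the measured times; a minimal sketch (values copied from the 2500x2500 column, in the table's own time units):

```cpp
#include <cstddef>

// Measured average execution times from Table 4-12 (MAP=2500x2500,
// AOI=10x10), indexed by client count 512 * 2^i for i = 0..10.
const double kCpu[] = {2.303, 4.624, 9.259, 18.711, 37.954, 77.139,
                       160.524, 346.258, 787.703, 1917.623, 4982.445};
const double kGpu[] = {9.189, 10.319, 11.821, 13.126, 16.112, 21.570,
                       33.607, 59.629, 115.858, 241.282, 550.882};

// CPU-over-GPU speedup at row i of the table.
double speedup(std::size_t i) { return kCpu[i] / kGpu[i]; }

// Smallest client count at which the GPU beats the CPU.
int crossover() {
    int clients = 512;
    for (std::size_t i = 0; i < 11; ++i, clients *= 2)
        if (speedup(i) > 1.0) return clients;
    return -1;  // GPU never wins in this column
}
```

Running this over the 2500x2500 column gives a crossover at 4096 clients and roughly a 9x speedup at 512K clients, consistent with the crossover claim in Section 5.1.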

Chapter 5 Conclusions and Future Works

5.1. Conclusions

Practical and scalable middleware is the key to successful and painless MMOG development, shortening time to market while reducing cost. In this work, we surveyed existing MMOG platforms and observed that the core obstacle to scalability in current MMOG platform architectures is the sequential, CPU-based logic processing model. Based on this observation, we proposed GPU-based algorithms that perform logic processing, merge update conflicts, and execute range queries in parallel. The experimental results show that the GPU is capable of handling 0.5M concurrent clients with reasonable response time. Although the aggressive volume of generated update messages must be sent back to clients through some super-scalable I/O architecture, the GPU outperforms the CPU once the number of clients grows beyond 4K, and the performance gain of our approach exceeds 100 times in certain scenarios.

With the rapid growth of GPU computation power, exploiting the GPU in an MMOG server platform is promising. This research reveals a new direction for work on optimizing MMOG server performance and on simulating large numbers of avatars in a distributed virtual environment (DVE). Since the algorithm is completely parallel, its performance grows linearly with the number of SIMD processors in the GPU.

5.2. Future Works

Although we have derived parallel algorithms for MMOG server computing on the GPU architecture, which provide a practical solution to the scalability issue in client command processing, several problems remain to be considered, as follows:

(1) Ease of GPU Logic Development

So far we have hard-coded the move logic and attach logic in the GPU kernel. Game logic, however, should be easily customizable. Although programming with CUDA spares us from writing GPU assembly code, the paradigm gap between sequential and parallel programming remains cumbersome. To ease the development of game logic, we could define a scripting language (or adopt an existing one) and translate between the scripts and the GPU code.

(2) GPU Memory Management

Memory management is crucial for server-side applications. If we want to apply the GPU to server-side computing, we must ensure that memory management is stable enough. However, the current CUDA runtime is quite raw: if an error occurs during memory allocation or de-allocation, the GPU simply halts and never returns. Therefore, on top of the current CUDA runtime, we must manage memory ourselves to ensure that allocation always succeeds.
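One way to make run-time allocation failure-free, as suggested above, is to reserve all memory once at startup and sub-allocate from the slab ourselves. A minimal host-side sketch of such a fixed-size-block pool (the class name and sizes are illustrative; a real version would back the slab with a single `cudaMalloc` at startup):

```cpp
#include <cstddef>
#include <vector>

// Fixed-size-block pool: one up-front reservation, then O(1) alloc/free
// from a free list. An allocation can only be refused when the pool is
// exhausted, which the caller can plan for, instead of halting the device.
class BlockPool {
public:
    BlockPool(std::size_t blockSize, std::size_t blockCount)
        : slab_(blockSize * blockCount) {
        for (std::size_t i = 0; i < blockCount; ++i)
            free_.push_back(slab_.data() + i * blockSize);
    }
    void* alloc() {                  // returns nullptr rather than failing hard
        if (free_.empty()) return nullptr;
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void release(void* p) { free_.push_back(static_cast<char*>(p)); }
    std::size_t available() const { return free_.size(); }
private:
    std::vector<char> slab_;         // stand-in for one big device allocation
    std::vector<char*> free_;
};
```

Per-frame kernel buffers could then be drawn from such a pool, keeping the CUDA runtime's allocator out of the steady-state path entirely.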

(3) Scalable I/O Architecture to 0.5M Clients

Since the GPU can handle up to 0.5M clients, the I/O architecture must scale to the same degree. As we off-load command processing to the GPU, the CPU becomes a mediator between the network and the GPU. This should be achievable with better hardware and a fine-tuned Linux-based operating system.

(4) Dynamic Load Balancing among GPUs

In a very large-scale system, we may have multiple GPUs in a single server and many interconnected gateways. The workload on the different GPUs should be adjusted dynamically to avoid the flash-crowd effect; dynamic load balancing among GPUs can also reduce latency.

