
Media applications are characterized by abundant available parallelism [3], little data reuse, and a high ratio of computation to memory access [1]. These characteristics are poorly matched to conventional microprocessor micro-architectures. Recent research has therefore proposed streaming micro-architectures, which exploit modern VLSI technology by placing many ALUs on a single chip and feeding them through a hierarchical communication-bandwidth design, to provide a leap in media-application performance. Related research projects include the Imagine stream processor [12], Smart Memories [16], and Processing-In-Memory [10].

To achieve the required computation rates, current media processors often use special-purpose [2], fixed-function hardware tailored to one specific application.

However, special-purpose solutions lack the flexibility to work effectively across a wide application space. The demand for flexibility in media processing motivates the use of programmable processors [2]. To bridge the gap between inflexible special-purpose solutions and current programmable micro-architectures that cannot meet the computational demands of media-processing applications, the stream micro-architecture developed at Stanford University has been chosen [3]. Stream processors directly exploit the parallelism and locality exposed by the stream programming model [4] to achieve high performance.

Since various stream micro-architectures with different ALU cluster configurations are suitable for media applications, mapping a multimedia application onto an adequate stream programming model becomes essential. However, the best organization of the stream model differs from one multimedia application to another: the number of ALUs in a cluster and the number of clusters in the stream micro-architecture may vary according to the algorithm of the media application. Two approaches exist for determining how much hardware a dedicated application needs. One is to implement the different stream micro-architectures in hardware and evaluate their performance, which is expensive and time-consuming. The other is a software approach, which simulates the application on different virtual stream micro-architectures and compares performance across them. In this way, the best hardware organization, the one that makes the fullest use of the hardware resources and achieves the best performance, can be identified. The second approach, a “micro-architecture simulator”, meets this need.

This project is shared by a team of three graduate students. The major tasks are low-power ALU cluster design, memory design, and simulator design. In this thesis, a micro-architecture simulator is implemented. For each organization, performance is evaluated in terms of CPU time, total memory access time, the time needed to access each level of the memory hierarchy, and the actual memory in use. The performance of a conventional micro-architecture and of the stream micro-architecture is compared to demonstrate the improvement offered by the stream micro-architecture.

Before a media application is simulated in the micro-architecture simulator, the “micro-architecture decision” procedure has to be carried out. This step fixes the micro-architecture to be simulated, including the number of clusters, the number of function units in a cluster, and the capacity of each level of the memory hierarchy. Once the parameters that affect the micro-architecture have been determined, the ISA of the micro-architecture follows directly. Based on that ISA and on the instruction format of each function unit that may be included in a cluster, the selected media application is mapped into binary stream programming code that can be executed on a stream processor. After the micro-architecture has been decided and the dedicated stream programming code has been generated, both are fed into the simulator, which then produces a simulation result.
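The parameters fixed in the micro-architecture decision step can be pictured as a small configuration record. The following is only a minimal sketch: the class name `StreamArchConfig`, its field names, the function-unit types, and the memory level labels are hypothetical and are not taken from the simulator described in this thesis.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StreamArchConfig:
    """Hypothetical record of one candidate stream micro-architecture."""
    num_clusters: int                      # number of ALU clusters
    units_per_cluster: Dict[str, int]      # function-unit type -> count in one cluster
    memory_capacity_bytes: Dict[str, int]  # memory level label -> capacity in bytes

    def validate(self) -> None:
        """Reject configurations that cannot describe a real machine."""
        if self.num_clusters < 1:
            raise ValueError("at least one cluster is required")
        if not self.units_per_cluster or min(self.units_per_cluster.values()) < 1:
            raise ValueError("each listed function-unit type needs a count >= 1")
        if not self.memory_capacity_bytes or min(self.memory_capacity_bytes.values()) < 1:
            raise ValueError("each memory level needs a positive capacity")

    def total_function_units(self) -> int:
        """Function units across all clusters (every cluster is identical)."""
        return self.num_clusters * sum(self.units_per_cluster.values())


# Example values are illustrative only.
config = StreamArchConfig(
    num_clusters=4,
    units_per_cluster={"adder": 3, "multiplier": 2},
    memory_capacity_bytes={"level1": 1024, "level2": 131072},
)
config.validate()
```

Keeping every tunable parameter in one record makes it straightforward to generate several candidate organizations and hand each one to a simulator in turn.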

The simulation result for one organization of the stream micro-architecture is compared with the results for other organizations, and the parameters that affect the organization are adjusted. Simulation and adjustment are repeated until the best-performing organization is found.
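The simulate-compare-adjust cycle can be sketched as a sweep over candidate organizations. In the sketch below, `SimResult`, `explore`, and `toy_simulate` are hypothetical names, and `toy_simulate` is a made-up cost model that merely stands in for a real simulator run; its numbers do not reproduce any result from this thesis.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class SimResult(NamedTuple):
    cpu_cycles: int            # stand-in for CPU time
    memory_access_cycles: int  # stand-in for total memory access time


def explore(
    cluster_counts: Iterable[int],
    simulate: Callable[[int], SimResult],
) -> Tuple[int, Dict[int, SimResult]]:
    """Simulate every candidate and return the best one plus all results.

    Candidates are ranked by CPU cycles first, then by memory access cycles.
    """
    results = {n: simulate(n) for n in cluster_counts}
    best = min(results, key=lambda n: results[n])
    return best, results


def toy_simulate(num_clusters: int) -> SimResult:
    """Made-up cost model: work splits across clusters, overhead grows with them."""
    work = 4096
    overhead = 64 * num_clusters
    return SimResult(
        cpu_cycles=work // num_clusters + overhead,
        memory_access_cycles=1024 + 32 * num_clusters,
    )


best, results = explore([1, 2, 4, 8, 16], toy_simulate)
for n, r in sorted(results.items()):
    print(n, r.cpu_cycles, r.memory_access_cycles)
```

Passing the simulator in as a callable keeps the sweep independent of how a single organization is evaluated, so the toy model can be swapped for an actual simulator invocation.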

The simulation result can also be used to verify the correctness of the hand-coded binary stream programming code. With the micro-architecture simulator, the optimized micro-architecture for a dedicated media application can be discovered and then implemented in hardware.

With the micro-architecture simulator, a media application is simulated to evaluate the performance of different stream micro-architectures. In this thesis, the FFT (Fast Fourier Transform) is chosen as the benchmark and the micro-architecture of a single cluster is fixed, so the only variable between the different organizations is the number of clusters. The FFT is simulated on stream micro-architectures with different numbers of clusters, and the performance of each organization is compared in terms of CPU time and the memory access times of each level of the memory hierarchy. On the basis of performance and memory-access behavior, the 4-cluster stream micro-architecture is chosen as the best micro-architecture for the FFT.
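To illustrate why the FFT lends itself to a multi-cluster organization, the sketch below implements a standard iterative radix-2 FFT, in which each of the log2(N) stages contains N/2 mutually independent butterflies. The helper `butterfly_steps` assumes, purely for illustration, that each cluster completes one butterfly per step and that communication is free; this is not the cost model of the simulator described in this thesis, and both function names are hypothetical.

```python
import cmath
import math


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    bits = n.bit_length() - 1
    # Bit-reversal permutation puts the inputs in butterfly order.
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if i < j:
            a[i], a[j] = a[j], a[i]
    size = 2
    while size <= n:  # one pass of this loop is one FFT stage
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):  # butterflies within a stage are independent
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


def butterfly_steps(n, num_clusters):
    """Steps needed if each cluster performs one butterfly per step (assumption)."""
    stages = n.bit_length() - 1
    return stages * math.ceil((n // 2) / num_clusters)
```

Under this simplified assumption, an 8-point FFT has 3 stages of 4 butterflies each, so `butterfly_steps(8, 1)` is 12 and `butterfly_steps(8, 4)` is 3. Real organizations also pay for data movement between memory levels, which is what the simulation measures.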

The remainder of this thesis is organized as follows. Chapter 2 presents background information on media processors, on micro-architectures that enable high performance on media applications with fully programmable processors, and on the design of current micro-architecture simulators. Chapter 3 presents the design methodology. Chapter 4 provides experimental results and a comparison with a conventional micro-architecture. Finally, future work and conclusions are presented in Chapters 5 and 6.
