Micro-Architecture of an ALU cluster Intellectual Property

Chapter 3 Development Roadmap and Proposed Design

3.3 An ALU cluster Intellectual Property

3.3.2 Micro-Architecture of an ALU cluster Intellectual Property

The detailed architecture is shown in Fig 3.12. The design is composed of four main blocks: the AMBA AHB wrapper, the ALU cluster, the instruction memory and the data memory. The instruction and data memories feed the functional units with the instructions and data required for operation.

Fig 3.12 The Proposed ALU Cluster IP Architecture

The main part that handles media applications is the ALU cluster described in Section 3.2. The arithmetic units and internal storage of the ALU cluster in this ALU cluster IP are the same as those introduced in Section 3.2.1.

However, the control logic and the access to internal storage are improved in this ALU cluster IP. The ALU cluster in this design has an improved ability to read sources and write destinations. All banks of the data and instruction memories are exposed to the AMBA bus, so these memory banks can be accessed directly from the AMBA bus through the AMBA AHB wrapper, which is introduced later. In addition, performance is improved by shortening the read cycles. In the original ALU cluster, four cycles are needed for a read access in a burst read operation. In the improved ALU cluster, a burst read incurs a two-cycle latency, after which data is read sequentially, one item every cycle.
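The read timing described above can be illustrated with a rough cycle count. This is only a sketch under stated assumptions: the function names are hypothetical, the improved burst is assumed to cost its two-cycle latency plus one cycle per data item, and the original design is assumed to spend four cycles on every item of the burst. The text does not say whether the latency overlaps the first data item, so the totals may be off by one.

```python
def improved_burst_read_cycles(beats: int, latency: int = 2) -> int:
    """Cycles for a burst read that streams one item per cycle after a fixed latency.

    Assumption: the two-cycle latency does not overlap the first data item.
    """
    if beats < 1:
        raise ValueError("a burst needs at least one beat")
    return latency + beats


def original_burst_read_cycles(beats: int, cycles_per_read: int = 4) -> int:
    """Cycles for a burst read when every item costs four cycles (assumed reading of the text)."""
    if beats < 1:
        raise ValueError("a burst needs at least one beat")
    return cycles_per_read * beats


if __name__ == "__main__":
    for beats in (1, 4, 8, 16):
        print(beats, original_burst_read_cycles(beats), improved_burst_read_cycles(beats))
```

Under these assumptions the gain grows with the burst length, because the two-cycle latency is paid once per burst rather than once per item.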

The ALU cluster IP must be able to keep executing while the AMBA bus is granted to other masters, so the ALU cluster needs a functional block that feeds addresses to the instruction memory automatically. As illustrated in Fig 3.12, the Pc_counter performs this job. It increments the program counter by one every clock cycle. The decoder compares the value of the program counter with the end value of Pc_counter every cycle to check whether the ALU cluster has finished the job. If the job is completed, the alu_work signal is activated to inform the wrapper.

If the alu_work signal is inactive, the IP cannot be accessed and returns a RETRY response to the AMBA bus. In addition, one special input signal combination can clear the end value of Pc_counter in the decoder and force the IP to stop execution. This mechanism is designed to avoid the possibility of deadlock.
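The interaction between the Pc_counter, the decoder comparison and the alu_work signal can be sketched as a small behavioral model. The class and method names are hypothetical, and the model makes two assumptions the text does not state: the comparison happens right after the increment, and a completed or stopped job leaves the IP accessible with an OKAY response. The polarity of alu_work follows the text (asserted once the job is complete).

```python
class PcCounterModel:
    """Behavioral sketch of the Pc_counter and decoder end-value check (not RTL)."""

    def __init__(self, end_value: int):
        if end_value < 1:
            raise ValueError("end_value must be at least 1")
        self.pc = 0
        self.end_value = end_value
        self.running = True

    @property
    def alu_work(self) -> bool:
        # Per the text: activated once the job is completed.
        return not self.running

    def tick(self) -> None:
        """One clock cycle: increment the program counter, then compare with the end value."""
        if not self.running:
            return
        self.pc += 1
        if self.pc == self.end_value:
            self.running = False

    def force_stop(self) -> None:
        """Models the special input combination: clear the end value and stop execution."""
        self.end_value = 0
        self.running = False

    def bus_response(self) -> str:
        # While alu_work is inactive the IP cannot be accessed and answers RETRY.
        return "OKAY" if self.alu_work else "RETRY"


if __name__ == "__main__":
    model = PcCounterModel(end_value=3)
    for cycle in range(4):
        print(cycle, model.pc, model.bus_response())
        model.tick()
```

The forced stop gives the bus master a way out if the end value is never reached, which is how the mechanism avoids a deadlock in this model.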

Another key component of the ALU cluster IP is the AMBA AHB wrapper, which is discussed next. The wrapper interface conforms to the Advanced Microcontroller Bus Architecture (AMBA) Advanced High-performance Bus (AHB) protocol described in Section 3.3.1. It provides a common interface for integrating the proposed design with the ARM Versatile baseboard to form a media processing system.

The proposed wrapper is composed of a finite state machine (FSM) and an address generation unit (AGU). The FSM controls the wrapper states and responds to requests from the AMBA bus. It provides the communication between the AHB slave interface and the ALU cluster inside the proposed IP: it receives signals from the AMBA bus and activates the ALU cluster to respond. The FSM also controls the AGU to produce the addresses needed by the ALU cluster, in either the incrementing mode or the wrapping mode of a burst operation.
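The two burst addressing modes handled by the AGU can be illustrated with the generic AHB address rule. The internals of the proposed AGU are not described here, so the sketch below is not the thesis design: the function names are hypothetical and it only shows how incrementing and wrapping bursts differ. In a wrapping burst the address wraps at a boundary equal to the number of beats multiplied by the transfer size in bytes.

```python
def next_burst_address(addr: int, size_bytes: int, beats: int, wrap: bool) -> int:
    """Return the address of the next beat of an AHB-style burst.

    size_bytes and beats are assumed to be powers of two, as in AHB bursts.
    """
    nxt = addr + size_bytes
    if not wrap:
        return nxt  # incrementing burst: simply advance by the transfer size
    boundary = size_bytes * beats        # wrap boundary in bytes
    base = addr & ~(boundary - 1)        # start of the aligned block
    return base | (nxt & (boundary - 1))  # stay inside the block


def burst_addresses(start: int, size_bytes: int, beats: int, wrap: bool) -> list[int]:
    """List all beat addresses of one burst."""
    addrs = [start]
    for _ in range(beats - 1):
        addrs.append(next_burst_address(addrs[-1], size_bytes, beats, wrap))
    return addrs


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x34, 4, 4, wrap=False)])
    print([hex(a) for a in burst_addresses(0x34, 4, 4, wrap=True)])
```

For a four-beat burst of word transfers starting at 0x34, the incrementing sequence runs past 0x3C to 0x40, while the wrapping sequence returns to 0x30 at the 16-byte boundary.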

The FSM has six states: Idle, Accessible, ALU_Work, Un-readable Wait, Un-writable Wait and Error. As shown in the state diagram in Fig 3.13, the FSM stays in the Idle state while the IP is not being accessed or after the operation of the ALU cluster is finished. Whether the IP has completed its work or has encountered an error, the FSM returns to the Idle state. In this state, the wrapper is ready to receive signals from the bus and to prepare the next operation. It moves to another state when the bus is granted and either the IP is to be accessed or the ALU cluster is activated. The FSM leaves the Idle state only when the HTRANS signal equals NONSEQ. When NONSEQ is encountered, the HWRITE signal identifies which operation of the IP is requested, and the FSM moves to the corresponding state.

Fig 3.13 The state diagram of the finite state machine

The next state is the Accessible state, in which the IP can be accessed. When the HTRANS signal equals NONSEQ and the HWRITE signal is logic high, the FSM moves directly to this state. While the FSM stays in this state, a control signal distinguishes the type of access and whether the incrementing mode or the wrapping mode is used in the burst transfer. In one type of access, the IP is accessed at a new address with HTRANS equal to NONSEQ. In the other, the IP is accessed continuously, following the address of the previous access, in the wrapping or incrementing mode of a burst transfer. Three conditions force the FSM into other states: the access is finished, a read is requested but the data is not ready, and the master is busy during a write. The FSM then moves to the Idle, Un-readable Wait and Un-writable Wait states, respectively. The latter two states are described below.

The Un-readable Wait state exists because of the two-cycle read latency. There are two paths into this state. The first is taken when the FSM is in the Idle state, the HTRANS signal is NONSEQ and the HWRITE signal is logic low, which indicates that the IP is being read. The first read operation needs two cycles to prepare the data, so the FSM must remain in this state until the data is ready. It then enters the Accessible state to serve the following read requests. The second path leads from the Accessible state to the Un-readable Wait state, again because of the required latency. In addition, when the IP is being written in a wrapping or incrementing burst and the HTRANS signal seen by the AHB slave changes to BUSY, the FSM enters the Un-writable Wait state. After HTRANS changes from BUSY back to NONSEQ or SEQ, the FSM returns from the Un-writable Wait state to the Accessible state.

The last two states of the six-state FSM are the Error and ALU_Work states.

When the proposed IP is accessed illegally because of an invalid address or transaction, the FSM goes to the Error state. Invalid addresses and transactions result from the depth limitation of the data and instruction memories. The other reason for entering this state is that the IP is being accessed although it has not been granted as expected. In both cases, the Error state is entered so that the wrapper does not violate the AMBA AHB protocol. When an error occurs, the Error state must obey the AHB protocol and therefore issues a two-cycle response, replying to the bus with the proper HREADY and HRESP signals as defined in the AMBA AHB specification.
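The two-cycle response required by the AHB protocol can be written out explicitly. In the AHB specification, an ERROR, RETRY or SPLIT response drives HRESP for two cycles, with HREADY low in the first cycle and high in the second. The helper below is a hypothetical illustration of that signal sequence, not the wrapper's actual logic.

```python
def two_cycle_response(resp: str) -> list[tuple[bool, str]]:
    """Return the (HREADY, HRESP) pairs for a two-cycle AHB slave response."""
    if resp not in ("ERROR", "RETRY", "SPLIT"):
        raise ValueError("only ERROR, RETRY and SPLIT use a two-cycle response")
    return [
        (False, resp),  # first cycle: HREADY low, response already driven
        (True, resp),   # second cycle: HREADY high, response held
    ]


if __name__ == "__main__":
    for cycle, (hready, hresp) in enumerate(two_cycle_response("ERROR"), start=1):
        print(f"cycle {cycle}: HREADY={int(hready)} HRESP={hresp}")
```

The extra cycle gives the bus master time to cancel the address it has already placed on the bus for the next transfer.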

Finally, the ALU_Work state indicates that an application is being processed in the ALU cluster. The only path into this state is from the Idle state. Whether the IP is accessed by a read or a write operation, the FSM issues a two-cycle response to the AHB bus in the ALU_Work state. In addition, the ALU cluster keeps working, unaffected by any unexpected access, until it finishes its operations.
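The six-state FSM described in this section can be summarized as a next-state function. This is a simplified sketch rather than the design in Fig 3.13. The input names (`start_alu`, `alu_done`, `access_done`, `data_ready`, `invalid`) are hypothetical, the Error state is assumed to be reachable from the Idle and Accessible states, and the two-cycle responses are collapsed into a single step.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ACCESSIBLE = auto()
    ALU_WORK = auto()
    UNREADABLE_WAIT = auto()
    UNWRITABLE_WAIT = auto()
    ERROR = auto()


@dataclass
class Inputs:
    htrans: str = "IDLE"       # AHB transfer type: IDLE, BUSY, NONSEQ or SEQ
    hwrite: bool = False       # True for a write, False for a read
    start_alu: bool = False    # hypothetical: the access activates the ALU cluster
    alu_done: bool = False     # hypothetical: the ALU cluster has finished its job
    access_done: bool = False  # hypothetical: the current access is finished
    data_ready: bool = True    # hypothetical: read data is available
    invalid: bool = False      # hypothetical: invalid address or transaction


def next_state(state: State, i: Inputs) -> State:
    if state is State.IDLE:
        if i.htrans != "NONSEQ":  # Idle is left only on a NONSEQ transfer
            return State.IDLE
        if i.invalid:
            return State.ERROR
        if i.start_alu:
            return State.ALU_WORK
        # HWRITE selects the target: write -> Accessible, read -> wait for data
        return State.ACCESSIBLE if i.hwrite else State.UNREADABLE_WAIT
    if state is State.ACCESSIBLE:
        if i.invalid:
            return State.ERROR
        if i.access_done:
            return State.IDLE
        if i.hwrite and i.htrans == "BUSY":
            return State.UNWRITABLE_WAIT
        if not i.hwrite and not i.data_ready:
            return State.UNREADABLE_WAIT
        return State.ACCESSIBLE
    if state is State.UNREADABLE_WAIT:
        return State.ACCESSIBLE if i.data_ready else State.UNREADABLE_WAIT
    if state is State.UNWRITABLE_WAIT:
        resumed = i.htrans in ("NONSEQ", "SEQ")
        return State.ACCESSIBLE if resumed else State.UNWRITABLE_WAIT
    if state is State.ALU_WORK:
        return State.IDLE if i.alu_done else State.ALU_WORK
    return State.IDLE  # ERROR returns to Idle after its response


if __name__ == "__main__":
    s = State.IDLE
    for step in (Inputs(htrans="NONSEQ"), Inputs(data_ready=False),
                 Inputs(data_ready=True), Inputs(access_done=True)):
        s = next_state(s, step)
        print(s.name)
```

The example run traces a read: the FSM waits in Un-readable Wait until the data is ready, serves the access in Accessible, and returns to Idle when the access is finished.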

Finally, one characteristic of the wrapper should be noted: the data and instruction memories embedded in the IP can be accessed directly through the proposed wrapper.

One point about the ALU cluster IP should be kept in mind.

An instruction must pass through several stages, so it takes additional cycles to write the executed results back. The ALU is a two-stage pipelined unit, so it takes six cycles: two extra cycles plus the four cycles that every operation needs for steps such as instruction decoding, data source selection and result writing. Likewise, the four-stage pipelined multiplier needs eight cycles and the divider needs twenty cycles to write back its results.
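The write-back latencies quoted above fit a simple model: the number of pipeline stages of the unit plus the four cycles common to every operation. The sketch below encodes that reading. The names are hypothetical, and because the stage count of the divider is not given, its twenty-cycle figure is recorded as stated rather than derived.

```python
COMMON_CYCLES = 4  # decoding, source selection, result writing, etc. (per the text)


def write_back_cycles(pipeline_stages: int) -> int:
    """Cycles until the result is written back, assuming stages + common overhead."""
    if pipeline_stages < 1:
        raise ValueError("a functional unit needs at least one stage")
    return pipeline_stages + COMMON_CYCLES


ALU_CYCLES = write_back_cycles(2)         # two-stage ALU
MULTIPLIER_CYCLES = write_back_cycles(4)  # four-stage multiplier
DIVIDER_CYCLES = 20                       # stated in the text; stage count not given

if __name__ == "__main__":
    print(ALU_CYCLES, MULTIPLIER_CYCLES, DIVIDER_CYCLES)
```

A program scheduled for the cluster has to leave at least this many cycles between issuing an operation and reading its result.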
