
3.3.4 Extension of ALU cluster IP at magnetic RAM (MRAM)

This subsection describes how the IP is extended by replacing its data memory with magnetic RAM (MRAM), a new type of memory that stores data magnetically. Most of the other blocks are unchanged; only a few are slightly modified to work with MRAM. The first part gives an overview of MRAM. The second describes the modifications needed for the interface between the IP and MRAM, including an additional load_store unit that is responsible for the communication between them. The last part lists the implementation results.

3.3.4.1 Overview of MRAM

General-purpose memories used for data access, such as SRAM and DRAM, offer high access speed. However, they are volatile: once the power is turned off, the stored data is lost, so they cannot retain data over long periods.

Battery-backed SRAM solves the data-retention problem, but at the cost of large power consumption and area overhead. Non-volatile memories such as EEPROM and Flash memory were therefore introduced to overcome this difficulty. However, their access control is much more complex, and they can endure only a limited number of rewrite cycles.

MRAM was developed as a new kind of non-volatile memory. Unlike conventional non-volatile memories, which rely on extra processing of the transistor gate (for example, a floating gate), MRAM uses magnetism to represent the logic state. MRAM has several advantages: process compatibility, a large number of access cycles, and non-volatile storage with a long lifetime. Because MRAM is fabricated in the upper metal layers, it is compatible with CMOS technology and adds little area overhead. Its endurance, over 10¹⁵ access cycles, is much greater than that of conventional non-volatile memory, so it can be reused almost indefinitely, an attractive feature for consumer electronic products. MRAM therefore has great potential to become a mainstream memory technology.
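To put the endurance figure in perspective, the short Python sketch below converts a write-endurance limit into the time needed to wear out a single cell that is rewritten at a constant rate. The 10¹⁵ figure comes from the text; the Flash endurance of 10⁵ cycles and the rate of one write per second are illustrative assumptions, not measurements from this design.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year in seconds


def wear_out_seconds(endurance_cycles: float, writes_per_second: float) -> float:
    """Time until one cell reaches its endurance limit under a constant write rate."""
    if writes_per_second <= 0:
        raise ValueError("write rate must be positive")
    return endurance_cycles / writes_per_second


# 1e15 is the MRAM endurance quoted in the text; 1e5 for Flash is an assumed,
# commonly quoted order of magnitude used here only for comparison.
ENDURANCE = {"MRAM": 1e15, "Flash (assumed)": 1e5}
RATE = 1.0  # hypothetical workload: one write to the same cell per second

for name, cycles in ENDURANCE.items():
    t = wear_out_seconds(cycles, RATE)
    print(f"{name}: {t / 3600:.3g} hours = {t / SECONDS_PER_YEAR:.3g} years")
```

Under this workload the assumed Flash cell wears out in about a day, while the MRAM cell lasts tens of millions of years, which is what "reused almost indefinitely" means in practice.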

3.3.4.2 Modifications for MRAM

Because MRAM and SRAM have different interfaces, and MRAM supports a smaller data bandwidth than SRAM, the IP needs considerable adjustment before it can be connected to MRAM. We therefore add an extra unit, the Load_store unit (LSU), to handle this, as shown in Figure 3.3.9. The decoder must in turn be able to control the LSU, so we widen the instruction from the original 142 bits to 143 bits. If the 143rd bit is not set, the whole IP executes applications exactly as before.
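The mode selection can be illustrated with a minimal Python model of the decoder's first step. It assumes, for illustration only, that the added 143rd bit is the most significant bit (bit index 142) and that the remaining 142 bits keep the original encoding; the names `decode` and `LSU_BIT` are hypothetical and do not come from the design.

```python
ORIG_WIDTH = 142              # width of the original ALU-cluster instruction
INSTR_WIDTH = ORIG_WIDTH + 1  # 143 bits after adding the LSU bit
LSU_BIT = ORIG_WIDTH          # assumed position of the 143rd bit (0-based index 142)


def decode(word: int) -> tuple[bool, int]:
    """Split a 143-bit instruction into (lsu_enable, original 142-bit instruction)."""
    if not 0 <= word < (1 << INSTR_WIDTH):
        raise ValueError("instruction does not fit in 143 bits")
    lsu_enable = bool((word >> LSU_BIT) & 1)   # 143rd bit selects LSU operation
    original = word & ((1 << ORIG_WIDTH) - 1)  # legacy fields, passed through unchanged
    return lsu_enable, original


# With the 143rd bit clear, the legacy instruction passes through untouched.
assert decode(0x1234) == (False, 0x1234)
# With it set, the same low-order fields are kept and the LSU is enabled.
assert decode((1 << LSU_BIT) | 0x1234) == (True, 0x1234)
```

Keeping the new bit outside the original 142 bits is what lets existing programs run unchanged: their encodings are simply the 143-bit words with the top bit clear.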

When the 143rd bit is set, the LSU transfers data between the IRF and MRAM as specified by the instruction. Because the data bandwidth between the IRF and MRAM is limited by MRAM, we also modify the IRF to support byte access. Byte access causes no problem for the AHB wrapper, which was originally designed for byte, halfword, and word accesses in little-endian order.
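The little-endian byte, halfword, and word accesses that the AHB wrapper already supports can be sketched as follows. The `HSIZE` encodings (0 = byte, 1 = halfword, 2 = word) and the little-endian lane placement follow the AMBA AHB convention; the function name `read_lane` and the 32-bit bus width are assumptions for illustration, not details taken from the wrapper itself.

```python
# AMBA AHB HSIZE encodings for the three transfer sizes the wrapper handles
HSIZE_BYTE, HSIZE_HALFWORD, HSIZE_WORD = 0, 1, 2


def read_lane(bus_word: int, addr: int, hsize: int) -> int:
    """Extract the addressed byte/halfword/word from a 32-bit little-endian bus word."""
    if hsize not in (HSIZE_BYTE, HSIZE_HALFWORD, HSIZE_WORD):
        raise ValueError("unsupported HSIZE")
    nbytes = 1 << hsize  # 1, 2 or 4 bytes
    if addr % nbytes:
        raise ValueError("transfer must be aligned to its size")
    shift = 8 * (addr % 4)  # little endian: lowest address = least significant byte
    return (bus_word >> shift) & ((1 << (8 * nbytes)) - 1)


word = 0x44332211  # bytes 0x11, 0x22, 0x33, 0x44 at addresses 0..3
print(hex(read_lane(word, 1, HSIZE_BYTE)))      # 0x22
print(hex(read_lane(word, 2, HSIZE_HALFWORD)))  # 0x4433
print(hex(read_lane(word, 0, HSIZE_WORD)))      # 0x44332211
```

Byte transfers are one of the cases this logic already covers, which is why adding byte access on the IRF side does not require changes to the wrapper.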

Figure 3.3.9: Modifications for MRAM with Added Load_store Unit

3.3.4.3 Implementation Results

The implementation results are summarized in Table 3.4. The design was implemented in TSMC's 0.18 µm process with Artisan's cell-based design kit. Post-layout simulation runs at 105 MHz. The chip measures about 3.15 × 3.15 mm², and its core 2.31 × 2.30 mm². The gate count of the core is 260,910. Note that because the data memory of the IP is replaced with MRAM, there is no area cost for the data memory. The instruction memory is the same as in the original IP except for the added bit for the load_store unit, so it requires eighteen 128 × 8 single-port static RAMs (SRAMs), generated by the memory compiler with the Artisan library.
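The number of SRAM macros follows from simple width arithmetic. The sketch below assumes, as the 18 × 128 × 8 entry in Table 3.4 suggests, that each macro is 8 bits wide and 128 words deep, and that the macros sit side by side so that each address holds one full instruction; the helper name `macros_needed` is hypothetical.

```python
import math

MACRO_WIDTH = 8    # bits per SRAM word (assumed from the 128 x 8 macro size)
MACRO_DEPTH = 128  # words per SRAM macro


def macros_needed(instr_width: int, macro_width: int = MACRO_WIDTH) -> int:
    """Macros placed side by side so that one address holds one whole instruction."""
    return math.ceil(instr_width / macro_width)


for width in (142, 143):
    n = macros_needed(width)
    spare = n * MACRO_WIDTH - width
    print(f"{width}-bit instructions: {n} macros, {spare} spare bit(s), "
          f"{MACRO_DEPTH} instructions deep")
```

Under these assumptions the original 142-bit instruction already needs 18 byte-wide macros with 2 bits to spare, so the extra LSU bit fits in a spare bit and no additional macro is required.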

Simulated with a net toggle probability of 0.9, the total chip dissipates 403.36 mW. Without the instruction memory, the pure core measures about 1.43 × 1.43 mm², its gate count is 203,893.67, and its power dissipation drops to 273.6 mW. The physical layout of this co-project is shown in Figure 3.3.10, and its floorplan in Figure 3.3.11.
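Several secondary figures can be derived from the reported numbers. The Python sketch below computes the clock period, the total-chip energy per clock cycle at the simulated 0.9 toggle probability, and the differences between the full and pure-core figures. The differences cover everything the pure-core numbers exclude; because the 403.36 mW figure is for the total chip, the power difference should not be read as the instruction memory's power alone.

```python
FREQ_HZ = 105e6                          # post-layout clock rate
CORE_MM = (2.31, 2.30)                   # core dimensions in mm
PURE_CORE_MM = (1.43, 1.43)              # core without instruction memory, in mm
GATES, PURE_GATES = 260910, 203893.67    # gate counts
POWER_MW, PURE_POWER_MW = 403.36, 273.6  # total chip vs. pure core

period_ns = 1e9 / FREQ_HZ
energy_nj = POWER_MW * 1e-3 / FREQ_HZ * 1e9  # total-chip energy per clock cycle
core_area = CORE_MM[0] * CORE_MM[1]
pure_area = PURE_CORE_MM[0] * PURE_CORE_MM[1]

print(f"clock period        : {period_ns:.2f} ns")
print(f"energy per cycle    : {energy_nj:.2f} nJ")
print(f"excluded area       : {core_area - pure_area:.3f} mm^2")
print(f"excluded gate count : {GATES - PURE_GATES:.2f}")
print(f"power difference    : {POWER_MW - PURE_POWER_MW:.2f} mW")
```

The computed period of about 9.52 ns is consistent with the 9.5 ns listed in Table 3.4.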

Table 3.4: Summary of Implementation Results

Process: TSMC 0.18 µm
Library: Artisan SAGE-X Standard Cell Library
Post-layout clock rate: 105 MHz (9.5 ns)
Chip size: 3.15 × 3.15 mm²
Core size (without instruction memory): 2.31 × 2.30 mm² (1.43 × 1.43 mm²)
Gate count (without instruction memory): 260,910 (203,893.67)
Power dissipation (without instruction memory): 403.36 mW (273.6 mW)
On-chip memory: 18 × 128 × 8 single-port SRAM
Pads: Input 34 pins, Output 25 pins, Inout 32 pins, Power 40 pins

Figure 3.3.10: Layout of an ALU Cluster IP Extended at MRAM

[Floorplan pad-ring labels, partially recovered: MRAM load/store interface pads mram_ls_a_0 to mram_ls_a_8, mem_ls_q_01 to mem_ls_q_07, mram_ls_wen and mram_ls_oen; HTRANS_01; I/O and core power/ground pads (IOVDD/IOVSS, CoreVDD/CoreVSS) and a corner cell.]

Figure 3.3.11: Floorplan of an ALU Cluster IP Extended at MRAM

This section has presented a complete synthesizable ALU cluster IP, derived from the previous ALU cluster design with an added AHB slave wrapper; together they form a hardware accelerator for media applications. Because it is a soft IP, it is portable to different processes. The MRAM extension is one example, in which the process was changed from UMC to TSMC. Implementation results for different processes are presented in the next section.

Source: Design of a Reconfigurable Computing Unit Hardware Accelerator Silicon IP for Multimedia Stream Processing (pp. 61-64)