Chapter 3 Design and implementation

3.1 The asynchronous AVR microcontroller architecture

3.1.5 Write back stage (WB)

The write back stage contains two registers, which hold the destination register index and the destination register value. Figure 16 illustrates the organization of the write back stage. Q4 (WB) sends the request signal to the write back stage latch and presents the destination register index and value to the register file. The register file then acknowledges Q4 to complete the instruction in this stage.

Figure 16. The organization of the write back stage

3.1.6 The register file

The register file comprises 32 general-purpose registers, each 8 bits wide. It provides two read ports and one write port, which makes data manipulation between registers more efficient. To observe the register contents, we connect the data output port of register R31 to the outside. This function is demonstrated in Chapter 4.
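The register file described above can be sketched as a small behavioral model. This is only an illustration of the port structure (two reads, one write, R31 observable); the class and method names are hypothetical and the handshake signalling is not modeled.

```python
class RegisterFile:
    """Behavioral model: 32 registers of 8 bits, two read ports, one write port."""

    NUM_REGS = 32
    MASK = 0xFF  # registers are 8 bits wide

    def __init__(self):
        self.regs = [0] * self.NUM_REGS

    def read(self, index_a, index_b):
        """Return the contents of two registers at once (the two read ports)."""
        return self.regs[index_a], self.regs[index_b]

    def write(self, index, value):
        """Store a value through the single write port, truncated to 8 bits."""
        self.regs[index] = value & self.MASK

    @property
    def output(self):
        """R31 is wired to the outside so that its contents can be observed."""
        return self.regs[31]


rf = RegisterFile()
rf.write(31, 0x1FF)      # truncated to 0xFF
rf.write(30, 1)
print(rf.read(31, 30))   # (255, 1)
print(rf.output)         # 255
```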

3.2 Design methodology

In this section, we introduce the design procedure and its development tools. Section 3.2.1 introduces the data dependency graph (DDG) of an instruction, a method for describing the micro-operations of an AVR instruction. Section 3.2.2 introduces the design methodology and explains how synchronous CAD tools are combined to develop asynchronous circuits.

3.2.1 Data dependency graph of AVR microcontroller

After defining the hardware organization of the asynchronous AVR, we map the AVR instruction set onto it. The DDG describes the micro-operations of each AVR instruction, and every part of the DDG is mapped to the corresponding hardware [8].

The instruction ADD Rd, Rs is used to explain how a DDG describes an AVR instruction. ADD Rd, Rs adds the value of register Rd to the value of register Rs and stores the result in the destination register Rd. Its operation and instruction format are shown in Figure 17.

Figure 17. The instruction content of the ADD instruction
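Since the figure is not reproduced here, the instruction format can be sketched in code. The sketch assumes the standard AVR encoding of ADD, 0000 11sd dddd ssss, where d and s are the 5-bit indices of Rd and Rs; the function names are hypothetical.

```python
ADD_OPCODE = 0b000011  # upper six bits of the 16-bit ADD instruction word


def encode_add(rd, rs):
    """Pack ADD Rd, Rs into the 16-bit pattern 0000 11sd dddd ssss."""
    if not (0 <= rd < 32 and 0 <= rs < 32):
        raise ValueError("register index out of range")
    return (ADD_OPCODE << 10) | ((rs & 0x10) << 5) | (rd << 4) | (rs & 0x0F)


def decode_add(word):
    """Recover (rd, rs) from a 16-bit ADD instruction word."""
    if (word >> 10) != ADD_OPCODE:
        raise ValueError("not an ADD instruction")
    rd = (word >> 4) & 0x1F
    rs = ((word >> 5) & 0x10) | (word & 0x0F)
    return rd, rs


print(hex(encode_add(31, 30)))  # 0xffe
print(decode_add(0x0FFE))       # (31, 30)
```

The word 0x0FFE for ADD R31, R30 matches the instruction code that appears on INSTR_Id1 in the simulation of Section 4.4.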

According to the organization of the AVR microcontroller, the DDG of the ADD Rd, Rs instruction is shown in Figure 18. Because the first stages of all AVR instructions are the same, we only show the DDG from the instruction decode stage to the write back stage.

After the decoder receives the instruction code, it provides the Rd and Rs register contents to the ID_EX latch. The execution stage adds these two values and stores the result in the EX_WB latch. The write back stage sends the Rd data back to the register file. The whole procedure is the same as described in Section 3.1.

Figure 18. The DDG of the ADD instruction
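The data flow of the DDG from decode to write back can be sketched as three functions joined by the ID_EX and EX_WB latches. This is a minimal behavioral sketch: the function names and dictionary fields are hypothetical, and status flags and handshaking are left out.

```python
def id_stage(regs, rd, rs):
    """Decode: read both operands and place them in the ID_EX latch."""
    return {"rd_index": rd, "a": regs[rd], "b": regs[rs]}


def ex_stage(id_ex):
    """Execute: add the operands (8-bit wraparound) into the EX_WB latch."""
    return {"rd_index": id_ex["rd_index"],
            "value": (id_ex["a"] + id_ex["b"]) & 0xFF}


def wb_stage(regs, ex_wb):
    """Write back: store the result in the destination register."""
    regs[ex_wb["rd_index"]] = ex_wb["value"]


regs = [0] * 32
regs[31], regs[30] = 250, 10
wb_stage(regs, ex_stage(id_stage(regs, 31, 30)))  # ADD R31, R30
print(regs[31])  # 4, since 260 wraps around in 8 bits
```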

3.2.2 Design steps

Figure 19 illustrates the asynchronous design steps and how the existing tools are integrated into our design. The design steps are described below.

First, following [6], we created the required delay-insensitive FPGA elements in Verilog and verified their function with Modelsim. Second, according to the specification, the AVR organization, and the DDGs of the AVR instruction set, we wrote the HDL description of the AVR in Verilog and used Modelsim to verify its correctness.

Finally, we downloaded our design to the physical chip to validate the AVR functions.

Figure 19. The design and implementation procedure

3.2.3 Design environment

Table 1 lists the existing tools used in our design and their purposes.

| Synchronous tool | Purpose |
|---|---|
| Xilinx ISE 4.2.03i | Project management; download of the design into the target board |
| Synplify Pro 7.6 | Logic synthesis |
| Modelsim SE/EE PLUS 5.4 | Simulation and verification |

Table 1. The design environment tools

3.3 Implementation

In this section, we introduce the implementation environment. Section 3.3.1 discusses the issues in using an FPGA to implement asynchronous circuits. Section 3.3.2 introduces the target board and the selected FPGA chip.

3.3.1 FPGA design issues for asynchronous logic

There is no doubt that the FPGA is an extremely effective means of rapidly developing and testing digital circuits. Employing large numbers of simple logic gates and datapaths that can be rapidly programmed and reprogrammed until a desired solution is found is a very cost-effective method of hardware design.

Asynchronous circuit design is sensitive to hazards introduced by the FPGA implementation phases, such as mapping, placement, and routing. A delay-insensitive (DI) circuit operates correctly regardless of the delays on its gates and wires, which makes it well suited to FPGA implementation.

3.3.2 Implementation environment

Xilinx prototype board

The Xilinx prototype board (model: AFX BG560-100) is used as the target board. It allows us to implement our asynchronous AVR microcontroller design without additional effort. The traditional approach to experimenting with new devices, wiring together several ICs on a circuit board, is becoming impractical and ineffective, while building custom PC boards for new high-density devices represents a substantial investment of time and money. Prototype boards from device manufacturers meet this need for experimentation. More details about the Xilinx prototype board can be found in [17]. Figure 20 shows a photo of the physical Xilinx prototype board.

Figure 20. Xilinx prototype board

FPGA chip

The target FPGA chip is the XCV1600E BG560, a Virtex™-E 1.8 V device. The main features of the Virtex-E chip are high speed, a high system gate count, and low power consumption. More details about the Virtex FPGA chip can be found in [18].

Chapter 4 Testing

To test the implementation of the AVR, we designed a simple program to exercise the datapaths. The general purpose of this program is to load data from an external memory, manipulate that data through the ALU, and then change the register file contents. The resulting values can be observed through the register file. Table 2 shows a typical sequence of instruction execution.

Table 2. Sequence of instruction execution

The execution of a number of AVR instructions requires concurrent access to the register file or other such logic. However, the AVR was not designed to function in a concurrent manner: only sequential processing of instructions can occur in any stage, even with the modified micropipeline structure.

The obvious conclusion is that there is likely to be a large performance compromise. The design is therefore not intended as a high-performance device, but rather as a prototype to investigate how existing tools can cope with such a task.

The AVR test program, along with a number of numeric constants (e.g. values representing the data that would actually be found in the external memory), is connected to the AVR. The general configuration of this external logic and its connectivity is shown in Figure 21 and introduced below.

Figure 21. Testing configuration

4.1 Testing configuration introduction

In Figure 21, the right block represents the asynchronous AVR. To observe the register contents, we route the data of general-purpose register R31 to an external port named OUTPUT, shown on the right side of the asynchronous AVR. An LED is connected to the OUTPUT port to indicate the register contents. The upper left block controls the testing procedure and receives a 50 MHz signal for its internal timing. When this control block detects the reset signal (active low), it sends a reset signal lasting at least 20 ms to the asynchronous chip through the sim_clr port to initialize the internal registers.
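As a rough check of the reset timing, the arithmetic below assumes that the control block times the pulse by counting cycles of the 50 MHz signal. The text does not say how the 20 ms is generated, so the counter is a hypothetical illustration.

```python
CLOCK_HZ = 50_000_000   # 50 MHz timing signal, from the text
MIN_RESET_MS = 20       # minimum reset duration, from the text

# Number of clock cycles the (hypothetical) counter must span.
min_cycles = CLOCK_HZ * MIN_RESET_MS // 1000

# Width of a binary counter able to count from 0 to min_cycles - 1.
counter_bits = (min_cycles - 1).bit_length()

print(min_cycles)    # 1000000
print(counter_bits)  # 20
```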

The Sim_req request signal then rises to start the first instruction. After completing its internal operation, the AVR returns an acknowledgement to the control block. The control block responds to the acknowledgement by lowering Sim_req. Once the transaction is complete, both request and acknowledgement return to the idle state. Each subsequent instruction repeats this procedure.

The instruction fetch stage of the AVR sends out an EPROM enable signal and an address. Because the enable signal of the 2764 EPROM is active low whereas the AVR enable signal is active high, a voltage level conversion is needed. The instruction code read from the EPROM passes through the single-rail to dual-rail circuit, which converts the single-rail instruction code into dual-rail form, and finally reaches the AVR microcontroller.
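The request and acknowledgement sequence of one instruction can be written out as a trace of signal levels. The sketch below is a simplified model of one four-phase transaction between the control block and the AVR; the function name and the callback are hypothetical, and real signal delays are ignored.

```python
def four_phase_transaction(execute):
    """Return the (Sim_req, ack) levels seen during one four-phase transaction."""
    trace = [(0, 0)]       # idle: both signals low
    trace.append((1, 0))   # control block raises Sim_req
    execute()              # AVR carries out the instruction
    trace.append((1, 1))   # AVR raises the acknowledgement
    trace.append((0, 1))   # control block lowers Sim_req in response
    trace.append((0, 0))   # AVR lowers the acknowledgement: idle again
    return trace


executed = []
trace = four_phase_transaction(lambda: executed.append("instruction"))
print(trace)     # [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(executed)  # ['instruction']
```

Because the trace starts and ends in the idle state, the next instruction can begin with the same sequence.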

4.2 16-bit instruction composition

The EPROM employed is an 8K MD2764 device with 8-bit addressing. To present a 16-bit instruction or data value to the AVR instruction input, two of these devices are enabled in parallel to construct the 16-bit word. The organization of these EPROM units is shown in Figure 22.

Figure 22. EPROM 16-bit word configuration
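The word composition can be sketched as follows. Both devices receive the same address and each contributes one byte; which device holds the high byte is an assumption here, and the names are hypothetical.

```python
def read_word(high_rom, low_rom, address):
    """Combine the bytes read from two parallel 8-bit devices into one 16-bit word."""
    return (high_rom[address] << 8) | low_rom[address]


# Hypothetical contents: ADD R31, R30 (0x0FFE) stored at address 0.
high_rom = bytes([0x0F])
low_rom = bytes([0xFE])

print(hex(read_word(high_rom, low_rom, 0)))  # 0xffe
```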

4.3 Memory interface

To connect the AVR to synchronous components, some interface circuits must be inserted to meet the requirements of the synchronous parts and ensure correct operation. They are introduced below.

4.3.1 Dual-rail to single-rail circuit

The internal registers of our AVR use the dual-rail format, in which every data bit is represented by two one-bit latches. The address provided by the NPC register is taken from the positive rail of the dual-rail data. The EPROM strobe signal is generated by combining the dual-rail address data through Muller C-elements. The strobe is thus guaranteed to arrive later than the address data, so that the EPROM outputs the correct instruction code. Figure 23 illustrates how the dual-rail data produces the strobe signal through Muller C-elements and how the Y signal is taken directly from the positive rail of the dual-rail X data. More details can be found in [13].

Figure 23. Dual-rail to single-rail circuit
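The idea of the circuit can be sketched behaviorally. The model below assumes that each bit is considered valid when either of its rails is high and that the per-bit validity signals are combined by a chain of Muller C-elements; the exact gate arrangement of the figure is not reproduced, and the class names are hypothetical.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they agree, else holds."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


class DualToSingle:
    """Take Y from the positive rails and derive a strobe from all rail pairs."""

    def __init__(self, width):
        self.cells = [CElement() for _ in range(width - 1)]

    def update(self, x):
        y = [pos for pos, neg in x]            # single-rail data: positive rail
        valid = [pos | neg for pos, neg in x]  # a bit is valid if either rail is high
        strobe = valid[0]
        for cell, v in zip(self.cells, valid[1:]):
            strobe = cell.update(strobe, v)
        return y, strobe


conv = DualToSingle(width=2)
print(conv.update([(0, 0), (0, 0)]))  # ([0, 0], 0)  spacer, no strobe
print(conv.update([(1, 0), (0, 0)]))  # ([1, 0], 0)  only one bit has arrived
print(conv.update([(1, 0), (0, 1)]))  # ([1, 0], 1)  all bits valid, strobe rises
```

The strobe rises only after every bit has become valid, which is why it cannot precede the address data.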

4.3.2 Single-rail to dual-rail circuit

The instruction register uses the dual-rail format, so the instruction code read from the EPROM must pass through the single-rail to dual-rail circuit. A differential line driver IC (AM26LS31) is used for the conversion. Figure 24 illustrates how two bits of single-rail data are converted to two dual-rail data bits. The strobe signal controls the dual-rail output.

The Z data is valid when the strobe signal is high, and Z is in the high-impedance state when the strobe signal is low. More details can be found in [13].

Figure 24. Single-rail to dual-rail circuit
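The behavior of one driver channel can be sketched as below. A differential driver outputs the input and its complement, which gives the two rails; the high-impedance state is modeled as None. The function name and the rail ordering (positive rail first) are assumptions.

```python
HIGH_Z = None  # stands for a high-impedance (undriven) output


def single_to_dual(bit, strobe):
    """Return the (positive, negative) rails for one single-rail bit."""
    if not strobe:
        return (HIGH_Z, HIGH_Z)  # outputs disabled while the strobe is low
    return (bit, 1 - bit)        # true and complemented outputs of the driver


print(single_to_dual(1, strobe=1))  # (1, 0)
print(single_to_dual(0, strobe=1))  # (0, 1)
print(single_to_dual(1, strobe=0))  # (None, None)
```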

4.3.3 Return to zero circuits

A four-phase handshake ends with both the request and acknowledgement signals returned to the idle (low) state. Extra circuits are needed to satisfy this rule, so we connect return-to-zero circuits to the outputs of the single-rail to dual-rail circuits. The handshake between the instruction fetch stage and the EPROM ends when the strobe signal goes low. At that point the outputs of the single-rail to dual-rail circuits are in the high-impedance state, and the return-to-zero circuits hold the dual-rail data low. This satisfies the four-phase return-to-zero handshake protocol. Figure 25 illustrates the return-to-zero circuits.

Figure 25. Return-to-zero circuits
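The effect of the return-to-zero circuits can be sketched as resolving an undriven rail to logic 0. Modeling this as a pull-down is an assumption, since the text does not give the circuit details; None stands for a high-impedance rail and the function name is hypothetical.

```python
def return_to_zero(rails):
    """Resolve each undriven (None) rail of a dual-rail pair to logic 0."""
    return tuple(0 if rail is None else rail for rail in rails)


print(return_to_zero((1, 0)))        # (1, 0)  driven data passes through
print(return_to_zero((None, None)))  # (0, 0)  undriven rails read as the idle state
```

With both rails at 0 the instruction input shows the idle (spacer) value, which is the state the four-phase protocol requires between transactions.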

4.4 HDL verification

Modelsim is used to simulate the system-level behavior of the AVR and to verify the function of the AVR instruction set. Figure 26 illustrates a looping addition.

The clr signal is sent from outside to reset the internal state of the AVR. The req_AIC signal is sent from outside to start an instruction. After the AVR finishes one instruction, it sends the acknowledgement signal ack_AIC to the outside. EPROM_AD and EPROM_ENABLE are issued from the instruction fetch stage to the external EPROM. The instruction data returned in dual-rail format are INSTR_Id0 and INSTR_Id1. REG_31_OUT shows the contents of register R31.

After the reset signal is received, the contents of register R31 are cleared to zero. The contents are updated once the first instruction has executed. REG_31_OUT is in complemented form to match the external LED circuit, so its value decreases by one after each ADD instruction (INSTR_Id1 = 0ffe) is executed.

Figure 26. Simulation result in Modelsim SE/EE PLUS 5.4

4.5 Physical circuit validation

Simulation software is used to verify the hardware behavior. The verified design is then downloaded to the physical circuit to check its function. Table 3 shows the test program.

| Memory address | Assembly code | Description |
|---|---|---|
| 000000 | LDI R31, 0 | R31 = 0 |
| 000001 | LDI R30, 1 | R30 = 1 |
| 000010 | ADD R31, R30 | R31 = R31 + R30 |
| 000011 | JMP 00000010 | Branch to 00000010 |

Table 3. The test program of looping addition
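The expected LED behavior can be predicted with a small interpreter for the three instructions used in the test program. This is only a behavioral sketch with hypothetical names; the output is complemented to mirror the REG_31_OUT convention described in Section 4.4.

```python
def run(program, steps):
    """Execute `steps` instructions and record the complemented R31 after each."""
    regs = [0] * 32
    pc = 0
    led = []
    for _ in range(steps):
        op, a, b = program[pc]
        if op == "LDI":
            regs[a] = b & 0xFF
            pc += 1
        elif op == "ADD":
            regs[a] = (regs[a] + regs[b]) & 0xFF
            pc += 1
        elif op == "JMP":
            pc = a
        led.append(~regs[31] & 0xFF)  # complemented output, as on REG_31_OUT
    return regs[31], led


program = [
    ("LDI", 31, 0),     # 000000: R31 = 0
    ("LDI", 30, 1),     # 000001: R30 = 1
    ("ADD", 31, 30),    # 000010: R31 = R31 + R30
    ("JMP", 2, None),   # 000011: branch to 00000010
]

r31, led = run(program, steps=8)
print(r31)  # 3
print(led)  # [255, 255, 254, 254, 253, 253, 252, 252]
```

R31 counts up by one on every pass through the loop, so the complemented value seen on the LED counts down.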

We can validate the physical circuit running the downloaded design by observing the LED status. Correct operation of the physical circuit has the following implications. First, the software simulation with the delay-insensitive model is reliable. Second, the peripheral devices and the protocol conversion circuits also work correctly. Finally, it demonstrates the correctness of the design methodology we described. Figures 27 and 28 show the physical circuits. The Xilinx prototype board in Figure 27 carries the Virtex-E FPGA chip, which implements the AVR microcontroller core and the control circuits. The I/O card in Figure 28 comprises the EPROMs, the memory interface, the output LED, and the remote RESET button.

Figure 27. Xilinx prototype board with Virtex-E FPGA chip

Figure 28. I/O card

Chapter 5 Conclusion

In this thesis, we propose an asynchronous AVR microcontroller and implement it on an FPGA chip using asynchronous circuit models. The AVR is a Reduced Instruction Set Computer (RISC) architecture, and its core is a standard 8-bit microcontroller widely used in Atmel products.

We describe the behavior of AVR instructions using data dependency graphs. In addition, we established several asynchronous FPGA cell libraries in Verilog. Through verification in simulation and validation on the physical circuit, these cells were shown to work correctly in the FPGA environment. With the delay-insensitive model, we need not worry about variations in synthesis and place-and-route. The design can be downloaded to any FPGA chip without alteration.

We did not implement all 90 instructions in the AVR specification. The implemented instructions are listed in Table 4.

Owing to the lack of asynchronous design tools, we used existing synchronous tools, such as Modelsim and Xilinx ISE 4.2i, to complete and simulate the design. The successful combination with existing synchronous tools is demonstrated. However, the development period is long, and the complexity grows with the scale of the design. Dedicated asynchronous design tools are needed to shorten the development time.

We modified Sutherland's micropipeline into a dual-rail, delay-insensitive micropipeline that can be implemented in an FPGA chip without losing its asynchronous characteristics.

Once the design methodology is established, we can use the same approach to study and implement other asynchronous CPU cores. The asynchronous AVR is not intended to be a fully custom-designed microcontroller such as the AMULET processors; it is a prototype device for investigating how existing tools can cope with such a task. In the future, a fully custom-designed microcontroller may be realized in this way.

| Instruction type | Instruction count | Instruction list |
|---|---|---|
| Arithmetic and logic instructions | 20 | ADD, ADC, SUB, SUBI, SBC, SBCI, AND, ANDI, OR, ORI, EOR, COM, NEG, SBR, CBR, INC, DEC, TST, CLR, SER |
| Branch instructions | 25 | RJMP, JMP, CP, CPC, CPI, BRBS, BRBC, BREQ, BRNE, BRCS, BRCC, BRSH, BRLO, |

References:

[1] Al Davis and Steven M. Nowick, An introduction to asynchronous circuit design, technical report, 1997.

[2] C. Mead and L. Conway, Introduction to VLSI systems, chapter 7, 1980.

[3] Chris J. Myers, Asynchronous circuit design, Wiley Interscience Publication.

[4] David L. Dill, Trace theory for automatic hierarchical verifications of speed independent circuits, ACM Distinguished Dissertations, MIT Press, 1989.

[5] F. C. Cheng, S. H. Unger, M. Theobald and W.-C. Cho, Delay-insensitive carry look-ahead adders, In Proc. International Conference on VLSI Design, pages 322-328, 1997.

[6] F. C. Cheng, Asynchronous Systems & System-on-a-Chip Design Lecture, http://www.cse.ttu.edu.tw/~cheng/courses/soc.htm

[7] Gaute Myklebust, Embedded systems and uC PowerPoint, www.atmel.com

[8] Han-Chun Lin, Design of an Asynchronous Thumb Microprocessor, Master Thesis, Department of Computer Science and Engineering, Tatung University, July, 2002.

[9] I. E. Sutherland, Micropipelines, Communications of the ACM, 32(6):720-738, June, 1989.

[10] Jan T. Udding, A formal model for defining and classifying delay insensitive circuits, Distributed Computing, pp. 197-204, 1986.

[11] Jo C. Ebergen and Parallelaham Birtwistle, Higher Order Workshop, pp. 85-104, Springer-Verlag, 1991.

[12] M. B. Josephs and J. T. Udding, An overview of DI algebra, In Proc. Hawaii International Conf. System Sciences, volume I, IEEE Computer Society Press, Jan 1993.

[13] R. E. Miller, Combinational Circuits, volume 1 of Switching Theory, 1965.

[14] Steven M. Ban, Introduction to performance analysis and optimization of asynchronous circuits, PhD thesis, California Institute of Technology, 1991.

[15] Takashi Nanya, Yoichiro Ueno, Hiroto Kagotani, Masashi Kuwako and Akihiro Takamura, TITAC: Design of a quasi delay insensitive microprocessor, IEEE Design & Test of Computers, 11(2):50-63, 1994.

[16] Wesley A. Clark, Macro modular computer systems, In AFIPS Conference, Volume 30, pages 335-336, Spr. 1967.

[17] Xilinx Prototype Platforms User Guide for Virtex and Virtex-E Series FPGAs, www.xilinx.com

[18] Virtex™-E 1.8 V Field Programmable Gate Arrays Production Product Specification, www.xilinx.com
