
IMPLEMENTATION AND VERIFICATION

This chapter is organized as follows. First, we present a design flow for implementing an asynchronous circuit on a synchronous FPGA. Then we describe several implementation issues. Finally, we present the verification methods.

4-1 The Design Flow

The asynchronous 8051 core is modeled in the Balsa language. The Balsa description of the core (.balsa file) is compiled by balsa-c in a syntax-directed fashion: each language construct is mapped onto a network of parameterized instances of “handshake components” (.breeze file), each of which has a concrete gate-level implementation. balsa-netlist then automatically generates a Verilog netlist for the Xilinx synthesis tool.
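As a rough illustration of syntax-directed translation, the Python sketch below walks a tiny, hypothetical syntax tree of Balsa-like constructs and emits one handshake component per construct. The construct-to-component mapping used here (sequence to BrzSequence, parallel composition to BrzConcur, an unconditional loop to BrzLoop, an assignment to a BrzFetch that writes a BrzVariable) is a simplified assumption for illustration; the real balsa-c output contains many more component types, channel widths and connections.

```python
from dataclasses import dataclass
from typing import List, Set, Union


# Hypothetical, minimal syntax tree for four Balsa-like constructs.
@dataclass
class Assign:      # var := <expression>
    var: str


@dataclass
class Seq:         # s1 ; s2 ; ...
    body: List["Stmt"]


@dataclass
class Par:         # s1 || s2 || ...
    body: List["Stmt"]


@dataclass
class Loop:        # loop s end
    body: "Stmt"


Stmt = Union[Assign, Seq, Par, Loop]


def compile_to_components(program: Stmt) -> List[str]:
    """Syntax-directed translation: every construct emits one handshake
    component, then its sub-constructs are translated recursively.
    Each assigned variable gets exactly one BrzVariable."""
    components: List[str] = []
    variables: Set[str] = set()

    def visit(stmt: Stmt) -> None:
        if isinstance(stmt, Assign):
            components.append(f"BrzFetch -> {stmt.var}")
            variables.add(stmt.var)
        elif isinstance(stmt, Seq):
            components.append(f"BrzSequence ({len(stmt.body)} outputs)")
            for sub in stmt.body:
                visit(sub)
        elif isinstance(stmt, Par):
            components.append(f"BrzConcur ({len(stmt.body)} outputs)")
            for sub in stmt.body:
                visit(sub)
        elif isinstance(stmt, Loop):
            components.append("BrzLoop")
            visit(stmt.body)
        else:
            raise TypeError(f"unsupported construct: {stmt!r}")

    visit(program)
    components.extend(f"BrzVariable {v}" for v in sorted(variables))
    return components


# Usage: loop { pc := ... ; (acc := ... || psw := ...) }
program = Loop(Seq([Assign("pc"), Par([Assign("acc"), Assign("psw")])]))
print(compile_to_components(program))
```

Because the translation is purely structural, the size of the component network grows linearly with the size of the program text, which is what makes the compilation transparent to the designer.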

The following steps make up the FPGA part of the design flow. In the synthesis step, the Verilog netlist generated by balsa-netlist is converted into a netlist of basic gates. This netlist may then be optimized by technology-independent logic minimization algorithms. However, logic minimization must be avoided for the hazard-free circuits and the buffers generated by balsa-netlist, so we add the “keep hierarchy” constraint to prevent it. The synthesized netlist is then mapped to the target device by a technology-mapping algorithm.

The placement algorithm maps the logic blocks of the netlist to physical locations on the FPGA. Once placement has been done, the routing algorithm determines how to interconnect the logic blocks using the available routing resources. The final output of the design flow is the FPGA programming file, a bit stream that determines the state of every programmable element inside the FPGA. The design flow is shown in Figure 27.

Figure 27: The Balsa and FPGA design flow

4-2 Implementation Issues

Compilation from a Balsa program to a Xilinx netlist proceeds in two steps. In the first step, the program is compiled into a handshake circuit, which serves as the intermediate architecture. An important characteristic of this compilation is its transparency: feedback about characteristics such as performance, area, timing, and testability can be generated at the handshake-circuit level and presented to the VLSI programmer at the Balsa level. In the second step, once the designer is satisfied with the performance of the Balsa program, the corresponding handshake circuit is expanded into a gate-level netlist. At this level the design can be simulated with commercial simulators to obtain more accurate performance figures.

We choose the four-phase bundled-data protocol to implement the handshake circuits instead of dual-rail encoding, in order to reduce the area cost. Because there are no asynchronous cells in an FPGA, all handshake circuits are implemented only with standard cells such as AND gates, OR gates, inverters, and flip-flops, which results in an area overhead for the handshake circuits. With bundled data we must pay attention to delay matching and to the verification (after routing) of the timing assumptions that have been made. To minimize the verification effort, delay matching is done conservatively: we add enough buffers on all request signals of push channels and on all acknowledge signals of pull channels, so that the control signal never arrives before the data bundled with it.
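The bundling constraint and the conservative buffer insertion can be illustrated with a small timing model. The Python sketch below is not part of the original flow: all delays are made-up numbers in nanoseconds, `buf_delay_ns` is a hypothetical per-buffer delay, and wire delays are treated as fixed. It computes how many buffers a push channel's request wire needs and traces the four phases of one transfer, refusing the transfer if the request would overtake the data. For a pull channel the same calculation applies to the acknowledge wire, since there the data travels with the acknowledge.

```python
import math
from typing import List, Tuple


def buffers_needed(data_delay_ns: float, req_delay_ns: float,
                   buf_delay_ns: float, margin_ns: float = 0.0) -> int:
    """Smallest number of buffers to add to the request wire of a push
    channel so that the request arrives no earlier than the data plus a
    safety margin (the bundling constraint)."""
    slack = data_delay_ns + margin_ns - req_delay_ns
    if slack <= 0:
        return 0
    return math.ceil(slack / buf_delay_ns)


def push_handshake(data_delay_ns: float, req_delay_ns: float,
                   ack_delay_ns: float) -> List[Tuple[float, str]]:
    """Event trace of one four-phase (return-to-zero) push transfer that
    starts at t = 0 when the sender drives data and raises req."""
    if req_delay_ns < data_delay_ns:
        raise RuntimeError("bundling constraint violated: req arrives before data")
    t_req_up = req_delay_ns                  # phase 1: req+ reaches the receiver
    t_ack_up = t_req_up + ack_delay_ns       # phase 2: data latched, ack+ back at sender
    t_req_dn = t_ack_up + req_delay_ns       # phase 3: sender lowers req, req- arrives
    t_ack_dn = t_req_dn + ack_delay_ns       # phase 4: ack- returns, channel idle
    return [(0.0, "sender drives data, raises req"),
            (t_req_up, "req+ seen, receiver latches data"),
            (t_ack_up, "ack+ seen, sender may change data"),
            (t_req_dn, "req- seen"),
            (t_ack_dn, "ack- seen, channel idle")]


# Usage: data path 4.0 ns, bare request wire 1.0 ns, 0.5 ns per buffer.
n = buffers_needed(4.0, 1.0, 0.5)
for t, event in push_handshake(4.0, 1.0 + n * 0.5, 1.0):
    print(f"{t:5.1f} ns  {event}")
```

Rounding the buffer count up, and optionally adding a margin, mirrors the conservative choice described above: a few extra buffers cost area but reduce the number of post-route timing checks that can fail.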

Note that the Xilinx synthesis tool may perform logic minimization, which must be avoided here: the asynchronous circuits contain hazard-free logic and delay-matching buffers, and minimization could remove the redundant terms that keep the logic hazard-free or remove the buffers altogether. We avoid this by adding the “keep hierarchy” constraint on the handshake modules.
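Why the constraint matters can be seen with a toy minimizer. The Python sketch below is a deliberately simplified stand-in for technology-independent optimization, not a model of the Xilinx tool: it removes buffers, which compute the identity function, by rewiring their fan-out to the buffer's input. Modules listed in a hypothetical `keep` set, playing the role of the “keep hierarchy” constraint, are copied unchanged, so only there does a delay-matching chain survive.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Gate:
    kind: str            # e.g. "BUF", "INV", "AND2"
    inputs: List[str]    # input net names
    output: str          # output net name


def minimize(modules: Dict[str, List[Gate]], keep: Set[str]) -> Dict[str, List[Gate]]:
    """Toy logic minimization: a BUF is logically redundant, so it is
    removed and its fan-out is rewired to the buffer's input. Modules in
    `keep` are left untouched. Gates must be listed in topological order."""
    result: Dict[str, List[Gate]] = {}
    for name, gates in modules.items():
        if name in keep:
            result[name] = list(gates)
            continue
        alias: Dict[str, str] = {}   # removed buffer output -> surviving net

        def resolve(net: str) -> str:
            while net in alias:
                net = alias[net]
            return net

        survivors: List[Gate] = []
        for g in gates:
            if g.kind == "BUF":
                alias[g.output] = resolve(g.inputs[0])
            else:
                survivors.append(Gate(g.kind, [resolve(i) for i in g.inputs], g.output))
        result[name] = survivors
    return result


# Usage: the same two-buffer delay chain in a kept and an unprotected module.
chain = [Gate("BUF", ["req"], "d1"), Gate("BUF", ["d1"], "d2"),
         Gate("AND2", ["d2", "en"], "y")]
optimized = minimize({"matched_delay": chain, "plain": list(chain)},
                     keep={"matched_delay"})
print(optimized["plain"])          # the delay chain has been optimized away
print(len(optimized["matched_delay"]))
```

In the unprotected module the request now reaches the AND gate with no added delay, which is exactly the loss of delay matching that the constraint prevents.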

The RAM and ROM are not modeled in Balsa. To reduce the area cost, we implement them with the block RAM of the FPGA and add a handshake interface between the 8051 core and the memory. The signal rfd is used in the RAM and ROM to provide completion detection for read and write operations.
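The role of rfd as a completion signal can be sketched behaviorally. The Python model below is a hypothetical illustration, not the thesis interface: it assumes a synchronous memory that finishes an access a fixed number of clock edges after the request (`latency`), keeps rfd low while busy and raises it on completion, and returns the stored value on the data output after a write. The handshake wrapper raises its acknowledge only after rfd has gone high.

```python
from typing import List, Optional, Tuple


class SyncMemory:
    """Hypothetical block-RAM model: an access started by start() completes
    `latency` clock edges later, when the completion flag rfd goes high."""

    def __init__(self, size: int, latency: int = 1) -> None:
        if latency < 1:
            raise ValueError("latency must be at least one clock")
        self.cells = [0] * size
        self.latency = latency
        self.rfd = True                       # high while idle or after completion
        self.dout = 0
        self._pending: Optional[Tuple[int, bool, int]] = None
        self._countdown = 0

    def start(self, addr: int, write: bool, din: int = 0) -> None:
        if not self.rfd:
            raise RuntimeError("request issued before previous access completed")
        self._pending = (addr, write, din & 0xFF)
        self._countdown = self.latency
        self.rfd = False

    def clock(self) -> None:
        """One rising clock edge of the memory."""
        if self._pending is None:
            return
        self._countdown -= 1
        if self._countdown == 0:
            addr, write, din = self._pending
            if write:
                self.cells[addr] = din
            self.dout = self.cells[addr]
            self._pending = None
            self.rfd = True


def four_phase_access(mem: SyncMemory, addr: int, write: bool,
                      din: int = 0) -> Tuple[int, List[str]]:
    """Handshake interface: req+ starts the access, ack+ is raised only after
    rfd reports completion, then req-/ack- return the channel to idle."""
    trace = ["req+"]
    mem.start(addr, write, din)
    cycles = 0
    while not mem.rfd:                        # completion detection via rfd
        mem.clock()
        cycles += 1
    trace.append(f"rfd+ after {cycles} clock(s)")
    trace.append("ack+")                      # read data is valid from here on
    trace.extend(["req-", "ack-"])            # return-to-zero half of the handshake
    return mem.dout, trace


# Usage: write 0xAB to address 30h, then read it back.
ram = SyncMemory(256, latency=2)
print(four_phase_access(ram, 0x30, True, 0xAB))
print(four_phase_access(ram, 0x30, False))
```

Deriving the acknowledge from rfd, rather than from a fixed delay, means the asynchronous core never has to assume how long a block RAM access takes.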

At present, all instructions except MUL, DIV, and MOVX can be executed. Peripherals such as the timers and the UART are not included. The design was realized on a Xilinx SPARTAN IIE 300 ft256 FPGA.

4-3 Verification

In this section we present the verification of the SA8051, which proceeds in three steps. First, we perform behavior simulation in the Balsa environment. Then, we perform timing simulation in the Xilinx environment; in this step we must check the validity of the timing assumptions in the control circuits. Finally, we verify the design on an FPGA board.

4-3-1 Behavior Simulation

The behavior simulation environment for the SA8051 is illustrated in Figure 28.

The ROM and RAM memory models are two procedures predefined in Balsa, as shown in Figure 29. Their sizes are determined by the address width and data width we assign: the ROM is 4K bytes (a 12-bit address) and the RAM is 256 bytes (an 8-bit address). During initialization, the ROM contents are loaded as 8-bit quantities in hexadecimal format from a hexadecimal file, which is produced from a C program by the KEIL tools [15]. Whenever an address arrives at the ROM model on the ROM address channel, the ROM outputs the corresponding instruction code. When the processor wants to write data, it sets the signal rNw and sends out the address and the data. When the processor wants to read data from the RAM, it resets the signal rNw and sends out only the address; the RAM then returns the stored data.
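KEIL's OH51 converts the linked object file into an Intel HEX file. The Python sketch below shows how such a file could be loaded into a 4K-byte ROM image; it is an illustration only, not the Balsa ROM procedure of Figure 29, and it assumes that only data (type 00) and end-of-file (type 01) records appear, which is typical for a program that fits in the first 64K. Each record's checksum is verified: the sum of all its bytes must be zero modulo 256.

```python
def load_intel_hex(text: str, rom_size: int = 4096, fill: int = 0x00) -> bytearray:
    """Load Intel HEX data (type 00) and end-of-file (type 01) records into
    a ROM image of `rom_size` bytes; other record types are rejected."""
    rom = bytearray([fill]) * rom_size
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: record must start with ':'")
        # Record layout: count, address high, address low, type, data..., checksum
        record = bytes.fromhex(line[1:])
        if len(record) < 5 or len(record) != record[0] + 5:
            raise ValueError(f"line {lineno}: length field does not match record")
        if sum(record) & 0xFF:
            raise ValueError(f"line {lineno}: checksum error")
        count, rtype = record[0], record[3]
        addr = (record[1] << 8) | record[2]
        if rtype == 0x00:
            if addr + count > rom_size:
                raise ValueError(f"line {lineno}: data beyond {rom_size}-byte ROM")
            rom[addr:addr + count] = record[4:4 + count]
        elif rtype == 0x01:
            break
        else:
            raise ValueError(f"line {lineno}: unsupported record type {rtype:02X}")
    return rom


# Usage: MOV A,#55h (74 55) followed by SJMP $ (80 FE), then end of file.
rom = load_intel_hex(":04000000745580FEB5\n:00000001FF\n")
print(rom[:4].hex())
```

Unused ROM locations are filled with `fill`; rejecting unknown record types, rather than silently skipping them, makes a program that does not fit the 4K ROM fail loudly instead of producing a truncated image.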

The 8051 simulator executes the instructions in the same hexadecimal file, and its execution results are compared with the contents of the RAM model. If the results are not equal, the Balsa description of the processor must be corrected.
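The comparison step can be sketched as a byte-by-byte check of two RAM dumps. The Python code below assumes that both the 8051 simulator result and the final RAM contents of the SA8051 simulation are available as 256-byte sequences; how the dumps are exported is not specified in the text and is left open here. The optional `addresses` argument lets the check be restricted to the locations the test program writes its results to.

```python
from typing import Dict, Iterable, Sequence, Tuple


def compare_ram(expected: Sequence[int], observed: Sequence[int],
                addresses: Iterable[int] = range(256)) -> Dict[int, Tuple[int, int]]:
    """Return {address: (expected, observed)} for every address whose bytes differ."""
    if len(expected) != len(observed):
        raise ValueError("RAM dumps have different sizes")
    return {a: (expected[a], observed[a])
            for a in addresses if expected[a] != observed[a]}


def report(mismatches: Dict[int, Tuple[int, int]]) -> str:
    """Return 'Correct' if there are no mismatches, else one line per address."""
    if not mismatches:
        return "Correct"
    lines = [f"{a:02X}h: expected {e:02X}h, got {o:02X}h"
             for a, (e, o) in sorted(mismatches.items())]
    return "Mismatch\n" + "\n".join(lines)


# Usage: the processor left 55h at address 30h where the simulator has 00h.
simulator_ram = [0] * 256
sa8051_ram = list(simulator_ram)
sa8051_ram[0x30] = 0x55
print(report(compare_ram(simulator_ram, sa8051_ram)))
```

Listing every mismatching address, instead of stopping at the first one, helps narrow down which instruction in the test program exposed the fault.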

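[Diagram: the C source file is processed by KEIL-c51, KEIL-a51, KEIL-bl51, and KEIL-oh51 into 8051 object code and an 8051 HEX file; the HEX file is loaded into the ROM model (Balsa) of the SA8051 (Balsa), which also uses the RAM model (Balsa), and is executed by the 8051 simulator; the two execution results are compared (=): Yes means correct, No means the design must be revised.]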

Figure 28: SA8051 behavior simulation environment

Figure 29: Balsa description for memory model (a) ROM model (b) RAM model

4-3-2 Timing Simulation

After the behavior simulation in the Balsa environment is complete, the next step is timing simulation, as shown in Figure 30. The ROM model is generated automatically from the hexadecimal file by the Xilinx CORE Generator [16], which also generates the RAM model.
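The text does not say how the hexadecimal file is handed to the CORE Generator. One common route, assumed here, is a COE initialization file, whose `memory_initialization_radix` and `memory_initialization_vector` keywords list one memory word per entry. The Python sketch below renders a byte image (for example, a 4K-byte ROM image decoded from the hexadecimal file) in that format.

```python
def to_coe(image: bytes) -> str:
    """Render a byte image as a COE memory-initialization file (radix 16),
    one comma-separated entry per memory word, terminated by a semicolon."""
    if not image:
        raise ValueError("empty memory image")
    vector = ",\n".join(f"{b:02X}" for b in image)
    return ("memory_initialization_radix=16;\n"
            "memory_initialization_vector=\n"
            f"{vector};\n")


# Usage: a 4K-byte ROM image whose first four bytes hold a short program.
rom_image = bytes([0x74, 0x55, 0x80, 0xFE]) + bytes(4092)
coe_text = to_coe(rom_image)
print(coe_text.splitlines()[:4])
```

Emitting the full 4K bytes, padded with zeros, keeps the COE depth equal to the configured ROM depth.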

The memory models, implemented with the block RAM of the FPGA, are connected through the handshake interface to the processor core netlist generated by balsa-netlist. Before synthesis, a constraint file must be added. It contains the “keep hierarchy” constraint on selected handshake modules and is used to satisfy the timing constraints and to avoid logic minimization. Synthesis, mapping, placement, and routing are then performed in order. An NCD file is generated after PAR (place and route); an NCD file may contain placement and routing information in varying degrees of completion. Finally, NetGen generates a netlist for the simulators supported by Xilinx, such as ModelSim.

The result of the timing simulation is compared with the result of the 8051 simulator. If the results are not equal, the processor netlist generated by balsa-netlist must be modified. For example, when a timing violation occurs on the flip-flops in a BrzVariable module, we trace the write request signal, find the corresponding write acknowledge signal, and add buffers ahead of that acknowledge signal to delay it.
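The netlist repair described above can be sketched as a small graph edit. In the Python code below, the netlist representation, the BrzVariable pin name `write_0a` (chosen to follow the `_0r`/`_0a` naming seen in `Activate_0r`), the size of the violation, and the per-buffer delay are all hypothetical; `BUF` with pins `I` and `O` follows the Xilinx buffer primitive. The function finds the write acknowledge driven by a given BrzVariable instance and inserts just enough buffers between its driver and all of its loads to cover the violation reported by timing simulation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    name: str
    cell: str
    inputs: Dict[str, str]    # pin -> net
    outputs: Dict[str, str]   # pin -> net


def insert_buffers(netlist: List[Instance], net: str, count: int) -> List[Instance]:
    """Insert a chain of `count` BUF cells right after the driver of `net`;
    every instance that used to read `net` now reads the end of the chain."""
    if count <= 0:
        return list(netlist)
    chain = [f"{net}_dly{i}" for i in range(1, count + 1)]
    rewired = [Instance(inst.name, inst.cell,
                        {pin: (chain[-1] if n == net else n) for pin, n in inst.inputs.items()},
                        dict(inst.outputs))
               for inst in netlist]
    src = net
    for i, dst in enumerate(chain, start=1):
        rewired.append(Instance(f"{net}_buf{i}", "BUF", {"I": src}, {"O": dst}))
        src = dst
    return rewired


def delay_variable_write_ack(netlist: List[Instance], variable: str,
                             violation_ns: float,
                             buf_delay_ns: float) -> Tuple[List[Instance], int]:
    """Delay the write acknowledge of one BrzVariable instance by enough
    buffers to cover a timing violation seen in timing simulation."""
    inst = next(i for i in netlist if i.name == variable)
    ack_net = inst.outputs["write_0a"]          # hypothetical pin name
    count = math.ceil(violation_ns / buf_delay_ns)
    return insert_buffers(netlist, ack_net, count), count


# Usage: a Fetch writes variable "acc"; timing simulation reports 1.25 ns too little.
netlist = [
    Instance("var_acc", "BrzVariable",
             {"write_0r": "w_req", "write_0d": "w_data"}, {"write_0a": "w_ack"}),
    Instance("fetch_1", "BrzFetch", {"out_0a": "w_ack"}, {"out_0r": "w_req"}),
]
fixed, n = delay_variable_write_ack(netlist, "var_acc", 1.25, 0.5)
print(n, [i.name for i in fixed])
```

The inserted buffers must themselves be protected by the keep-hierarchy constraint; otherwise the next synthesis run would remove them again.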

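[Diagram: the C source file is processed by KEIL-c51, KEIL-a51, KEIL-bl51, and KEIL-oh51 into an 8051 HEX file, which is executed by the 8051 simulator and used for the ROM model (EDIF); the CPU core netlist (Xilinx), the ROM and RAM models (EDIF), and the constraint file pass through Synthesis, MAP, Placement and Routing, and NetGen to the timing simulation of the SA8051; its result is compared (=) with the simulator result: Yes means correct, No means the netlist must be revised.]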

Figure 30: SA8051 timing simulation environment

4-3-3 Board Level Verification

After timing simulation, the design is verified on a Digilent D2-FT system board [17] carrying a Xilinx SPARTAN IIE 300 ft256 FPGA. Because the board provides a 50 MHz clock, the top module contains a frequency divider circuit. The two input ports Activate_0r and reset are connected to Switch 1 and Switch 2, respectively. The reset signal is set when the frequency divider is enabled, and setting Activate_0r activates the CPU. To display the results, the four output ports drive four seven-segment displays and 16 LEDs: the nibbles of P0 and P1 are shown on the seven-segment displays, and P2 and P3 are shown on the LEDs. The board level verification environment is shown in Figure 31.
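The text does not state the divided frequency, so the Python sketch below only shows how a counter-based divider for the 50 MHz board clock could be dimensioned and what its output looks like. It assumes a toggle-type divider, whose output flips every N input clocks, so that f_out = f_in / (2N); the 1 kHz target in the usage line is a hypothetical value.

```python
from typing import List


def divider_half_period(f_in_hz: int, f_out_hz: int) -> int:
    """Counter terminal value N for a toggle-type divider: the output flips
    every N input clocks, so f_out = f_in / (2 * N). Integer division
    truncates, so the result is exact only when f_in is divisible by 2*f_out."""
    if not 0 < f_out_hz <= f_in_hz // 2:
        raise ValueError("output frequency must be in (0, f_in / 2]")
    return f_in_hz // (2 * f_out_hz)


def simulate_divider(half_period: int, cycles: int) -> List[int]:
    """Output level after each of `cycles` rising edges of the input clock."""
    out, count, wave = 0, 0, []
    for _ in range(cycles):
        count += 1
        if count == half_period:
            count = 0
            out ^= 1
        wave.append(out)
    return wave


# Usage: a hypothetical 1 kHz output from the 50 MHz board clock.
n = divider_half_period(50_000_000, 1_000)
print(n, simulate_divider(2, 8))
```

A toggle-type divider gives a 50 % duty cycle for any N, which is why it is preferred here over a divider that emits a one-cycle pulse every 2N clocks.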

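[Diagram: the Top module contains the CPU core, RAM, and ROM and is driven by the 50 MHz board clock; SW 1 drives Activate_0r and SW 2 drives reset; P0[3:0], P0[7:4], P1[7:4], and P1[3:0] each drive one 7-segment display, and P2[7:0] and P3[7:0] drive the 16 LEDs.]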
Figure 31: Board level verification environment
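The port-to-display mapping of Figure 31 can be expressed as a small lookup. The Python sketch below shows each nibble of P0 and P1 as a hexadecimal digit using the standard seven-segment encoding (bit 0 = segment a through bit 6 = segment g). Two details are assumptions not given in the text: whether the display segments are driven active-low, which is selectable with `active_low`, and that P2 occupies the upper eight of the 16 LEDs.

```python
from typing import Dict, Tuple

# Segment patterns for hex digits 0-F; bit 0 = segment a ... bit 6 = segment g,
# 1 = segment lit (active-high form).
SEG_HEX = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,
           0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71]


def display_outputs(p0: int, p1: int, p2: int, p3: int,
                    active_low: bool = True) -> Tuple[Dict[str, int], int]:
    """Return the segment pattern for each displayed nibble and the 16-bit
    LED pattern for the four 8051 output ports."""
    nibbles = {"P0[3:0]": p0 & 0xF, "P0[7:4]": (p0 >> 4) & 0xF,
               "P1[7:4]": (p1 >> 4) & 0xF, "P1[3:0]": p1 & 0xF}
    mask = 0x7F if active_low else 0x00        # invert all seven segments if active-low
    segments = {name: SEG_HEX[v] ^ mask for name, v in nibbles.items()}
    leds = ((p2 & 0xFF) << 8) | (p3 & 0xFF)    # assumption: P2 on the upper eight LEDs
    return segments, leds


# Usage: P0 = 5Ah, P1 = 3Ch, P2 = 12h, P3 = 34h on active-high displays.
segments, leds = display_outputs(0x5A, 0x3C, 0x12, 0x34, active_low=False)
print({k: hex(v) for k, v in segments.items()}, hex(leds))
```

Showing each port nibble as a hexadecimal digit lets a whole result byte be read directly from two displays without decoding LED patterns by hand.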

4-4 Concluding Remarks

In this chapter we introduce a design flow for implementing asynchronous circuits on an FPGA, describe several implementation issues, and illustrate the verification flow at the behavior, timing, and board levels.
