
Chapter 3: Asynchronous PIC18 Design

3.1 Introduction of Microchip’s PIC18 microcontroller

3.1.3 Instruction Set

The PIC18 instruction set adds many enhancements to the earlier PICmicro instruction sets (other families, such as PIC12, PIC16, …), while maintaining an easy migration path from them.

Most instructions are a single program memory word (16 bits), but there are three instructions that require two program memory locations.

Each single-word instruction is a 16-bit word divided into an opcode, which specifies the instruction type and one or more operands, which further specify the operation of the instruction.

The instruction set is highly orthogonal and is grouped into four basic categories, which are listed in detail in Table 3.1:

• Byte-oriented operations

• Bit-oriented operations

• Literal operations

• Control operations

Most byte-oriented instructions have three operands:

1. The file register (specified by ‘f’)

2. The destination of the result (specified by ‘d’)

3. The accessed memory (specified by ‘a’)

The file register designator ‘f’ specifies which file register is to be used by the instruction. The destination designator ‘d’ specifies where the result of the operation is to be placed. If ‘d’ is zero, the result is placed in the WREG register. If ‘d’ is one, the result is placed in the file register specified in the instruction.

All bit-oriented instructions have three operands:

1. The file register (specified by ‘f’)

2. The bit in the file register (specified by ‘b’)

3. The accessed memory (specified by ‘a’)

The bit field designator ‘b’ selects the number of the bit affected by the operation, while the file register designator ‘f’ represents the number of the file in which the bit is located.
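The operand fields described above can be illustrated with a small decoder. This is a minimal sketch, not part of the original design: the bit positions are assumed from Microchip's published PIC18 opcode map (they are not stated in this text), and the function names are hypothetical.

```python
# Assumed field layout (from the published PIC18 opcode map, not from this text):
#   byte-oriented: bits [15:10] opcode, [9] d, [8] a, [7:0] f
#   bit-oriented:  bits [15:12] opcode, [11:9] b, [8] a, [7:0] f

def decode_byte_oriented(word):
    """Split a 16-bit byte-oriented instruction word into its fields."""
    return {
        "opcode": (word >> 10) & 0x3F,
        "d": (word >> 9) & 0x1,   # 0: result to WREG, 1: result to file register
        "a": (word >> 8) & 0x1,   # accessed-memory selector
        "f": word & 0xFF,         # file register number
    }

def decode_bit_oriented(word):
    """Split a 16-bit bit-oriented instruction word into its fields."""
    return {
        "opcode": (word >> 12) & 0xF,
        "b": (word >> 9) & 0x7,   # bit number inside the file register
        "a": (word >> 8) & 0x1,
        "f": word & 0xFF,
    }

def destination(fields):
    """Name the destination selected by the 'd' designator."""
    return "WREG" if fields["d"] == 0 else "f"

# Example words under the assumed layout.
ADDWF_EXAMPLE = 0x2610  # 0010 01 | d=1 | a=0 | f=0x10
BCF_EXAMPLE = 0x9620    # 1001 | b=3 | a=0 | f=0x20
```

Decoding `ADDWF_EXAMPLE` yields d = 1, so the result would go to the file register rather than to WREG.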

The literal instructions may use some of the following operands:

• A literal value to be loaded into a file register (specified by ‘k’)

• The desired FSR register to load the literal value into (specified by ‘f’)

The control instructions may use some of the following operands:

• A program memory address (specified by ‘n’)

• The mode of the CALL or RETURN instructions (specified by ‘s’)

Instruction   Description

Byte-Oriented
COMF          Complement f
CPFSEQ        Compare f with WREG, skip if f = WREG
CPFSGT        Compare f with WREG, skip if f > WREG
CPFSLT        Compare f with WREG, skip if f < WREG
DECF          Decrement f
DECFSZ        Decrement f, skip if 0
DCFSNZ        Decrement f, skip if not 0
INCF          Increment f
INCFSZ        Increment f, skip if 0
INFSNZ        Increment f, skip if not 0
IORWF         Inclusive OR WREG with f
MOVF          Move f
MOVFF         Move fs (source, 1st word) to fd (destination, 2nd word)
MOVWF         Move WREG to f
MULWF         Multiply WREG with f
NEGF          Negate f
RLCF          Rotate Left f through Carry
RLNCF         Rotate Left f (No Carry)
RRCF          Rotate Right f through Carry
RRNCF         Rotate Right f (No Carry)
SETF          Set f
SUBFWB        Subtract f from WREG with borrow
SUBWF         Subtract WREG from f
SUBWFB        Subtract WREG from f with borrow
SWAPF         Swap nibbles in f
TSTFSZ        Test f, skip if 0
XORWF         Exclusive OR WREG with f

Bit-Oriented

Control
BN            Branch if Negative
BNC           Branch if Not Carry
BNN           Branch if Not Negative
BNOV          Branch if Not Overflow
BNZ           Branch if Not Zero
BOV           Branch if Overflow
BRA           Branch Unconditionally
BZ            Branch if Zero
CALL          Call Subroutine
CLRWDT        Clear Watchdog Timer
DAW           Decimal Adjust WREG
GOTO          Go To Address
NOP           No Operation
POP           Pop Top of Return Stack
PUSH          Push Top of Return Stack
RCALL         Relative Call
RESET         Software Device Reset
RETFIE        Return from Interrupt Enable
RETLW         Return with Literal in WREG
RETURN        Return from Subroutine
SLEEP         Go into Standby Mode

Literal Operation
ADDLW         Add literal and WREG
ANDLW         AND literal and WREG
IORLW         Inclusive OR literal with WREG
LFSR          Move literal to FSR
MOVLB         Move literal to BSR
MOVLW         Move literal to WREG
MULLW         Multiply literal with WREG
RETLW         Return with literal in WREG
SUBLW         Subtract WREG from literal
XORLW         Exclusive OR literal with WREG

Data memory <-> Program memory operations
TBLRD         Table Read
TBLWT         Table Write

Table 3.1: The PIC18 Instruction Set

3.2 Asynchronous PIC18 microprocessor architecture

The asynchronous PIC18 is a pipelined 8-bit processor based on the 4-phase dual-rail handshake protocol and the quasi-delay-insensitive (QDI) delay model. The system block diagram is shown in Figure 3.3.

Figure 3.3: Asynchronous PIC18 system block diagram

Instructions are executed through 4 stages, namely, IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch) and EX/WB (Execution and Write Back).

The pipeline structure is also shown in the system block diagram.

We implemented only the frequently used registers: PC (Program Counter), BSR (Bank Select Register), WREG (Working Register), STATUS (for flags), STKPTR (Stack Pointer) and Stall (for instruction stall). The remaining PIC18 registers were not implemented because they are rarely used or are related to peripherals.

3.2.1 Instruction Set

The instruction set of the asynchronous PIC18 is compatible with Microchip’s PIC18 family. However, some instructions were not implemented because the peripherals they depend on were not implemented. The instructions we implemented are shown in Table 3.2.

Operation Byte-Oriented Bit-Oriented Literal Control

ADDWF BCF ADDLW BC

SETF SUBFWB SUBWF SUBWFB XORWF MOVFF

Table 3.2: Asynchronous PIC18 instruction sets

3.2.2 Construction of the Basic Elements

The first element we built is a C-element; Figure 3.4 shows the CMOS transistor-level design. Based on the C-element, we can construct the dual-rail OR gate and the dual-rail AND gate.

Figure 3.4: CMOS transistor implementation for a generalized C-element

The return-to-zero (RTZ) protocol requires every two valid data codewords to be separated by a null codeword.

In logic built with this protocol, the output should become a valid data codeword only when all the inputs are valid data, and it should remain valid until all the inputs have returned to zero. Figure 3.5 shows the construction of a dual-rail OR gate that meets these requirements. The row of C-elements forces both inputs to switch before the output switches. The return-to-zero protocol therefore assumes that once a gate outputs a valid signal, all its inputs are valid, and that the output will not return to zero until all inputs have returned to zero.

Figure 3.5: Dual-rail OR gate symbol and schematic
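The behavior described above can be sketched with a small behavioral model. This is an illustration under stated assumptions, not the schematic of Figure 3.5: it assumes one two-input C-element per input combination (a common way to build such a gate), it represents a dual-rail signal as a (t, f) pair, and the class and constant names are hypothetical.

```python
NULL = (0, 0)    # no data
TRUE = (1, 0)    # t rail high
FALSE = (0, 1)   # f rail high

class CElement:
    """Two-input Muller C-element: the output follows the inputs when they
    agree and holds its previous value when they differ."""
    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out

class DualRailOr:
    """Dual-rail OR gate assumed to use one C-element per input combination."""
    def __init__(self):
        self.c_ff = CElement()
        self.c_ft = CElement()
        self.c_tf = CElement()
        self.c_tt = CElement()

    def update(self, a, b):
        a_t, a_f = a
        b_t, b_f = b
        m_ff = self.c_ff.update(a_f, b_f)   # both inputs false
        m_ft = self.c_ft.update(a_f, b_t)
        m_tf = self.c_tf.update(a_t, b_f)
        m_tt = self.c_tt.update(a_t, b_t)
        # The t rail is the OR of the three "at least one input true" terms.
        return (m_ft | m_tf | m_tt, m_ff)
```

In this model the output stays null while only one input is valid, becomes valid once both are valid, and returns to null only after both inputs have returned to zero.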

3.2.3 Pipeline latch

Our pipeline latches follow the Muller pipeline design shown in Figure 3.6. If there is no acknowledgement from the next stage (Ack_in is low), any valid data on the input progresses through the C-elements to the output. Once one of the data outputs is active, the latch acknowledges its input (Ack_out goes high).

The data output stays valid until the target has acknowledged and the source has returned to zero. Once the next stage has taken the output, Ack_in goes high, which means the next stage is ready to accept null data; the inputs then have to send null data. If these latches are placed in a pipeline, the maximum occupancy is 50%, because for each latch that holds data, another latch holds a null that separates it from the other data in the pipeline.

[Figure 3.6 signal labels: Din.t, Din.f, Dout.t, Dout.f, Ack_out, Ack_in]

Figure 3.6: Pipeline latch (1-bit)
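The handshake described for the 1-bit latch can be traced with a short behavioral sketch. It assumes each data rail passes through a C-element whose second input is the inverted Ack_in, and that Ack_out is the OR of the two output rails; this matches the textual description, but the exact gate-level structure of Figure 3.6 is an assumption, and the class names are hypothetical.

```python
class CElement:
    """Two-input Muller C-element (holds its output when the inputs differ)."""
    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out

class PipelineLatch:
    """1-bit dual-rail Muller pipeline latch (behavioral sketch)."""
    def __init__(self):
        self.c_t = CElement()
        self.c_f = CElement()

    def update(self, din_t, din_f, ack_in):
        enable = 1 - ack_in                 # next stage has not acknowledged
        dout_t = self.c_t.update(din_t, enable)
        dout_f = self.c_f.update(din_f, enable)
        ack_out = dout_t | dout_f           # acknowledge once an output rail is active
        return dout_t, dout_f, ack_out
```

Stepping the model through one cycle (valid data in, acknowledge from the next stage, null in, acknowledge released) reproduces the four phases of the protocol.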

3.2.4 Design for each Stage

After constructing the basic dual-rail elements, such as the C-element and the AND and OR gates, we built other required components on top of them, such as multiplexers, demultiplexers and encoders. We then used these gates and modules to construct the whole processor.

IF stage:

In the IF stage, the “Read” signal controls whether the output is valid data or null data.

When the Read signal is high, the stage reads the PC value from the PC register and then retrieves the instruction from the program ROM. In addition, the current PC value is sent to the ID stage for calculating the next PC value. After the output data has been completely sent to the pipeline latch, the pipeline latch returns an acknowledge signal. We use this signal to drive the Read signal low, and the IF stage then enters the null state.

Figure 3.7: Block diagram of the IF stage

ID stage:

Figure 3.8 shows the block diagram of the ID stage. The purpose of each inner block is as follows:

Instruction Decode: Decodes the input instruction and generates control signals for the inner blocks and for the OF and EXE/WB stages.

Branch Control: If the current instruction is a branch instruction, Instruction Decode sends a control signal to Branch Control. Branch Control reads the STATUS register to decide whether the branch is taken or not taken, and then sends a control signal to NPC Control.

Stall Control: If the instruction is what we call a two-cycle instruction, such as a branch that needs the STATUS register value in the ID stage, we set the NPC value so that this instruction is read again. This stall mechanism ensures that the STATUS register has been completely written in the EXE/WB stage and that the STATUS value read in the ID stage is correct.

NPC Control: This block receives this information and generates the next PC (NPC) value for the PC register.

Figure 3.8: Block diagram of the ID stage
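The decision made by Branch Control and NPC Control can be sketched as follows. This is an illustration only: the STATUS bit positions and the target calculation PC + 2 + 2n are assumed from Microchip's PIC18 documentation rather than taken from this text, the stall mechanism is not modelled, and the function names are hypothetical.

```python
# Assumed STATUS bit positions (PIC18 documentation): C=0, DC=1, Z=2, OV=3, N=4.
STATUS_BITS = {"C": 0, "DC": 1, "Z": 2, "OV": 3, "N": 4}

# mnemonic -> (flag, value the flag must have for the branch to be taken)
BRANCH_CONDITIONS = {
    "BC": ("C", 1), "BNC": ("C", 0),
    "BZ": ("Z", 1), "BNZ": ("Z", 0),
    "BN": ("N", 1), "BNN": ("N", 0),
    "BOV": ("OV", 1), "BNOV": ("OV", 0),
}

def branch_taken(mnemonic, status):
    """Branch Control: decide taken / not taken from the STATUS value."""
    if mnemonic == "BRA":
        return True
    flag, wanted = BRANCH_CONDITIONS[mnemonic]
    return ((status >> STATUS_BITS[flag]) & 1) == wanted

def next_pc(pc, mnemonic, offset_words, status):
    """NPC Control: byte address of the next instruction.
    offset_words is the signed word offset n carried by the branch."""
    if branch_taken(mnemonic, status):
        return pc + 2 + 2 * offset_words
    return pc + 2
```

For example, a BZ at address 0x100 with n = 5 goes to 0x10C when the Z flag is set and falls through to 0x102 otherwise.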

OF stage:

The main purpose of the OF stage is to prepare the source and destination information for the EXE/WB stage. Figure 3.9 shows the block diagram of the OF stage.

The signals on the left of the stage are inputs from the ID stage. The signals at the top are inputs from and outputs to the registers. The data memory interface is at the bottom. The signals on the right are outputs to the EXE/WB stage.

Source1 contains data from the data memory. Some special function registers in the data memory have been mapped onto real registers. That means that if the accessed memory address belongs to one of these special registers, the data is retrieved directly from the register without any data memory access. Therefore, S1 selects data from either a register or the data memory.

Source2 contains the WREG value or the Bit-Op information.

The Carry signal is used by instructions that need the carry bit, such as “rotate left with carry”.

The Destination signal tells the EXE/WB stage where the calculated result should be written.

Figure 3.9: Block diagram of the OF stage

The inner blocks of the OF stage are described as follows:

Bank Select Control: If access bank is enabled, it reads the 4-bit BSR value and concatenates it with the 8-bit address in the instruction. The resulting 12-bit address is used to access the data memory. If it is disabled, the data memory can be accessed only in segment 0 of bank 0 or segment 1 of bank 15.

Register Address Mapping: In the synchronous PIC microcontroller, the registers are located in the data memory. In our asynchronous PIC design, we implemented real registers instead of registers in the data memory in order to reduce the number of memory accesses. The Register Address Mapping unit detects whether a memory access targets one of these registers and, if so, redirects the access to our duplicated registers.

Bit-Op Control: Indicates which bit is to be modified; it is used only by bit-operation instructions.

RAM Control: Data memory access control.
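The address handling performed by Bank Select Control and Register Address Mapping can be sketched as below. The sketch rests on several assumptions that are not stated in this text: the split between the two access segments (0x80) and the register addresses are taken from Microchip's PIC18 documentation and vary by device, the `use_bsr` flag stands for the case in which the BSR is concatenated with the instruction address, and all names are hypothetical.

```python
ACCESS_SPLIT = 0x80   # assumed boundary between the two access segments

# Assumed data memory addresses of the registers implemented as real registers.
MAPPED_REGISTERS = {
    0xFE8: "WREG",
    0xFD8: "STATUS",
    0xFE0: "BSR",
    0xFFC: "STKPTR",
}

def effective_address(f, bsr, use_bsr):
    """Bank Select Control: form the 12-bit data memory address."""
    f &= 0xFF
    if use_bsr:
        return ((bsr & 0xF) << 8) | f   # 4-bit BSR concatenated with 8-bit f
    if f < ACCESS_SPLIT:
        return f                        # segment 0 of bank 0
    return 0xF00 | f                    # segment 1 of bank 15

def select_source(address):
    """Register Address Mapping: pick a real register or the data memory."""
    return MAPPED_REGISTERS.get(address, "RAM")
```

With these assumptions, an access to f = 0xE8 without the BSR resolves to address 0xFE8 and is served by the WREG register instead of the data memory.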

3.2.5 Memory

Although our asynchronous PIC18 is based on the QDI delay model, the main storage does not follow this model. In order to access conventional memory, we added a dual-rail to single-rail converter, a single-rail to dual-rail converter and a matching delay as the interface to the memory.

The system block diagram is shown in Figure 3.10 (a). Figure 3.10 (b) is the dual-rail to single-rail (2-1) converter and Figure 3.10 (c) is the single-rail to dual-rail (1-2) converter. After the address has been sent to the 2-1 converter, the completion detector generates a “strobe” signal.

The strobe passes through a delay and then controls the output of the 1-2 converter. The delay has to be longer than the memory read time to make sure that the output data is correct.

Figure 3.10: Program ROM interface: block diagram (a); dual-rail to single-rail converter (b); single-rail to dual-rail converter (c).
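The two converters can be illustrated with a purely functional sketch. It assumes a dual-rail bit is a (t, f) pair and that the strobe is raised when every address bit carries a valid codeword; the matched delay is represented only by the `delayed_strobe` argument and is not timed, and the function names are hypothetical.

```python
def dual_to_single(rails):
    """2-1 converter with completion detection.
    rails: list of (t, f) pairs. Returns (single-rail bits, strobe)."""
    strobe = int(all(t ^ f for t, f in rails))   # every bit holds a valid codeword
    bits = [t for t, _ in rails]
    return bits, strobe

def single_to_dual(bits, delayed_strobe):
    """1-2 converter: emit null until the delayed strobe arrives,
    then emit one valid codeword per memory output bit."""
    if not delayed_strobe:
        return [(0, 0) for _ in bits]
    return [(b, 1 - b) for b in bits]
```

While any address bit is still null the strobe stays low, so the 1-2 converter keeps presenting null data to the dual-rail side.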

3.2.6 Register

To meet the QDI requirement, our asynchronous PIC18 was implemented without a common bus. Data is transferred from a block to a register over a direct data path. It is therefore important to design a dual-rail asynchronous register that matches the properties of our design. Figure 3.11 illustrates a 1-bit dual-rail register, which is modified from the design of the TITAC [12].

When a data codeword ({0, 1} or {1, 0}) is applied to din.t and din.f, the two NOR gates hold the data on their outputs. Once the data has been written into the register, the register issues an ACK signal as a completion signal to inform the previous stage that the write is done.

For a read operation, it is sufficient to assert the “read” signal, and the dual-rail data is read out correctly.

Figure 3.11: dual-rail register (1-bit)

3.2.7 Reset circuit

Before the reset signal is applied, the latches can hold random data and the gates may have unknown values. To eliminate this random data at the start of circuit operation, it is important to construct a reset circuit with minimum hardware cost. Our approach is to attach a reset line to all C-elements, thereby forcing all nets in the circuit to reset. If all inputs to a dual-rail gate are low, its output will switch to low. Based on this property, it is possible to reset the whole circuit to the null state by resetting only the latches.

Figure 3.12 shows the CMOS transistor-level design of the resettable C-element. The resettable C-element is used only in the pipeline latches.

Figure 3.12: MOS level implementation of the C-element with reset.

3.3 Design Flow

Figure 3.13 shows the design flow of our asynchronous PIC. First, the specifications of the asynchronous PIC have to be defined. The instruction specifications follow the synchronous PIC18, and the handshake protocol is 4-phase dual-rail with the QDI timing assumption.

Second, Verilog HDL is used to build the dual-rail logic gates. Based on these dual-rail logic gates, we construct frequently used modules such as the MUX, DEMUX and decoder. The functional blocks needed by our asynchronous PIC18 are then constructed from these modules.

Finally, we construct the whole system from these functional blocks.

Figure 3.13: Asynchronous PIC design flow

Chapter 4: Simulation Result

In order to verify the correctness of our asynchronous PIC design, we wrote some simple programs. Table 4.1 shows one example. This program runs a loop, and on each iteration 1 is added to the WREG register.

Memory address   Assembly code     Description
00000000         MOVLW 1           WREG <= 1
00000010         MOVWF f(0),0      f(0) <= WREG
00000100         ADDWF f(0),1,0    WREG <= WREG + f(0)
00000110         GOTO 00000100     PC <= 00000100

Table 4.1: A simple test program.
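The expected WREG values for this test program can be worked out with a few lines of Python. This is only a sketch of the register transfers in the Description column, not a model of the processor: it assumes that the ADDWF result is written to WREG, as that column states, and that WREG and f(0) are 8 bits wide. The function name is hypothetical.

```python
def run_test_program(iterations):
    """Follow the register transfers of Table 4.1 and record WREG
    after each pass through the loop."""
    wreg = 1                  # MOVLW 1      : WREG <= 1
    f0 = wreg                 # MOVWF f(0)   : f(0) <= WREG
    trace = []
    for _ in range(iterations):
        # ADDWF f(0): WREG <= WREG + f(0), truncated to 8 bits (assumed width)
        wreg = (wreg + f0) & 0xFF
        trace.append(wreg)
        # GOTO 00000100: jump back to the ADDWF instruction
    return trace
```

Under these assumptions WREG increases by 1 on every pass through the loop, which is the behavior the simulation waveform is checked against.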

ModelSim is used to simulate the function of our system and to verify that the instructions execute correctly. Figure 4.1 shows the simulation waveform. In this example, we verified the results by reading the signals of the WREG register.

Figure 4.1: functional simulation

4.1 Performance

In order to synthesize our asynchronous PIC design, we have to replace our MOS-level C-element with AND-OR gate-level code. Figure 4.2 shows the maximum path delay in each stage or block. The simulation result is based on the Altera Cyclone EP1C20F400C8 FPGA.

Module        Maximum Path Delay (ns)
Latch         ~ 34
IF stage      ~ 27
ID stage      ~ 455
OF stage      ~ 157
EX/WB stage   ~ 216

Figure 4.2: Maximum path delay for each stage.

This result shows that the critical path of our asynchronous PIC design is in the ID stage. The ID stage not only has to decode the instructions but also has to handle the branch instructions. To improve performance, the branch handling mechanism therefore has to be implemented carefully.

4.2 Area

For comparison with our design, we chose an open-source implementation, which is a synchronous version of the PIC processor written in Verilog that can be downloaded into an FPGA [11]. The Altera Quartus II software is used to synthesize the asynchronous and synchronous circuits, and the target device is the Cyclone EP1C20F400C8, which is a 1-million-gate (or 20,000-logic-element) FPGA. The comparison result is shown in Table 4.2.

                 Asynchronous PIC   Synchronous PIC
Logic Elements   ~ 13,800           ~ 3,900

Table 4.2: The gate count comparison

The result shows that our asynchronous PIC uses about 3.5 times as many logic elements as the synchronous version. There are two reasons. One is our design style, which uses dual-rail data encoding. The other is branch handling: in our asynchronous PIC, branch instructions are resolved in the ID stage, so the ID stage needs more logic to handle them.

Chapter 5: Conclusions

In this thesis, we have used a traditional synchronous design tool, Verilog, to implement an asynchronous PIC microprocessor. Our asynchronous PIC can run most of the instructions of the PIC18 ISA.

The PIC18 is widely used in embedded system design. However, there are not many resources we can obtain for comparing with our asynchronous design. Thus, we just verified the correctness of the function and compared the cost with a synchronous implementation so far.

Our design is based on the QDI delay model. However, a great deal of work is still needed before it can be realized as a real chip.

In the future, we will implement the full interrupt mechanism, optimize the system blocks, and finally realize the design as a real chip.

References

[1] Alain J. Martin. “Synthesis of Asynchronous VLSI Circuits.” Formal Methods for VLSI Design, J. Staunstrup (ed.), North-Holland, 1990.

[2] C.H. van Berkel, M.B. Josephs, and S.M. Nowick. “Scanning the technology: Applications of asynchronous circuits.” Proceedings of the IEEE, 87(2):223-233, February 1999.

[3] Elston, C.J.; Christianson, D.B.; Findlay, P.A.; Steven, G.B. “HADES-an asynchronous superscalar processor”; Design and Test of Asynchronous Systems, IEE Colloquium. 28 Feb 1996 Page(s):10/1 - 10/6

[4] H. van Gageldonk, K. van Berkel, A. Peeters, D. Baumann, D. Gloor, and G. Stegmann, “An asynchronous low-power 80C51 microcontroller,” pp. 96, 1998

[5] Hauck S., “Asynchronous Design Methodologies: An Overview”, Proceedings of the IEEE, 83(1):69-93, January 1995.

[6] Sparsø, J., Furber, S., “Principles of Asynchronous Circuit Design - A Systems Perspective”, Kluwer Academic Publishers, Hardcover ISBN 0-7923-7613-7, 2001

[7] I. Sutherland, “Micropipelines,” Communications of the ACM, Volume 32, No. 6, pp. 720-738, June 1989.

[8] I. Sutherland and S. Fairbanks, “GasP: a minimal FIFO control,” Asynchronous Circuits and Systems, 2001. ASYNC 2001. Seventh International Symposium on, 2001.

[9] Montek Singh, Steven M. Nowick, “High-Throughput Asynchronous Pipelines for Fine-Grain Dynamic Datapaths”, ASYNC 2000, p. 198.

[10] D. E. Muller and W. S. Bartky. “A theory of asynchronous circuits.” In Proceedings of an International Symposium on the Theory of Switching, pages 204–243. Harvard University Press, Apr. 1959.

[11] Shawn Tan, “AE18 CPU CORE”, www.opencores.org/projects/ae18, 2003

[12] Takashi Nanya, Yoichiro Ueno, Hiroto Kagotani, Masashi Kuwako, and Akihiro Takamura. “TITAC: Design of a quasi-delay-insensitive microprocessor”. IEEE Design & Test of Computers, 11(2):50-63, Summer 1994.

[13] A. Takamura, M. Kuwako, et al. “TITAC-2: An Asynchronous 32-bit Microprocessor based on Scalable-Delay-Insensitive Model”. ICCD 1997, 288-294.

[14] J.V. Woods, P. Day, S.B. Furber, J.D. Garside, N.C. Paver, S. Temple, “AMULET1: an asynchronous ARM microprocessor”, IEEE Transactions on Computers, Volume: 46 Issue: 4, pp. 385 – 398, April 1997.

[15] Yuan-Teng Chang, Master’s Thesis, “SA8051: An Asynchronous Soft-core Processor for Low-Power System-On-Chip Application.” National Chiao Tung University, 2005.

[16] http://www.microchip.com
