• 沒有找到結果。

ALL-Digital Phase-locked Loop

在文檔中 鎖相迴路軟體化之研究 (頁 11-0)

Chapter 2 Overview of Software-defined Phase-locked Loop

2.3 ALL-Digital Phase-locked Loop

The all-digital phase-locked loop (ADPLL) consists of the phase detector, the time-to-digit converter (TDC), the controller, the loop filter, the digital controlled oscillator (DCO), and the frequency divider. The basic block diagram is shown in Fig. 2.1. The working flow is as following:

Step 1: The phase detector detects the phase error between the reference clock and the divided clock and outputs the phase error to TDC.

Step 2 : TDC converts the phase error to digits and output the value to controller.

Step 3 : The controller calculates the proper the DCO control word and passes the DCO control word through the loop filter to DCO.

Step 4 : DCO use the DCO control word to oscillate proper clock frequency.

Step 5 : The frequency divider divides the frequency of DCO clock cycle.

Fig. 2.1 The basic block diagram of ADPLL

OPENRISC

Instruction length 32bits

Register length 32bits

Number of general purpose registers 32

Support multiplication Yes

Support division No

Cycle count of multiplication instruction 4 Cycle count of store instruction 4

Gate count 88000

Host interface WISHBONE

Table 2.1 The OPENRISC Specification

Chapter 3

The Proposed SDPLL Architecture

3.1. SDPLL Architecture Overview

The proposed SDPLL architecture is shown in Fig. 3.1. There are nine basic modules in the proposed SDPLL. The first is Semi Asynchronous Clock Access. Semi Asynchronous Clock Access applies the entire architecture clock access. The second is Memory Controller.

Memory Controller controls Memory and communicates with CPU BUS, Memory and Error Detector. The third is Error Detector. Error Detector includes Time-to-Digit Converter, Phase Detector, Frequency Divider and Pulse Amplifier. The fourth CPU BUS is the bridge of OPENRISC. The fifth is Digital Control Oscillator Interface. Digital Control Oscillator Interface controls DCO and synchronizes the DCO control words and DCO_CLK. The sixth is State controller. State Controller decodes the CPU output message to change states of other modules. The seventh is Digital Control Oscillator. The eighth is 256X32 bits-Memory. The last is OPENRISC, which is described in chapter 2. The details of modules will be illustrated in next section and the system Specification shows in Table 3.1.

Ref_clk

Fig. 3.1 The proposed SDPLL architecture System Spec Reference clock max frequency 6.3 MHz Reference clock min frequency 50KHz Table 3.1 The system Specification

3.2. Semi Asynchronous Clock Access

Semi Asynchronous Clock Access can apply the better performance in circuit noise environment and power consumption. By means of modifying two parameter, it can perform low noise and low power environment. One parameter is operating frequency and the other is operating cycle count. The parameters affect the circuit transition. Controlling circuit transition results in low power environment. Fig. 3.2(a) and Fig. 3.2(b) illustrate the relation of transition tuning parameter. As the clock signal is idle, the clock signal is maintained high because of construct low noise environment.

Cycle count 5 10 15 20

Fig. 3.2(a) The relation of transition tuning the cycle count in 134MHz frequency

Frequency 263MHz 134MHz 90MHz 67MHz Fig. 3.2(b) The relation of transition tuning the frequency at 10 cycles

Fig. 3.3 shows Semi Asynchronous Clock Access which includes four major parts depend on application. The first is the clock signal synchronizer. The synchronizer is combined with two D flip-flops with the clear pin. If Ref_clk rise, the synchronizer sends the pulse to counters to reset the counter so that EMBED_CLK begins to oscillate. The second is the switch of delay matrix. The basic idea of the switch is a NAND gate. If one input of the NAND gate is high, the NAND gate has the same function as inverter. On the other hand, if one input is low, the output of the NAND gate maintain high. Accordingly, the switch decides EMBED_CLK oscillated or idle based on the two states of the NAND gate. The third is delay matrix to provide the variable clock period. The last is counter combined with comparator to control the operating cycle count. As the counter number equals to M_CYCLE comparator sends a signal to the counter and the switch of delay matrix in order to disable counter and make EMBED_CLK idle.

Delay

Fig. 3.3 The hardware architecture Semi Asynchronous Clock

3.3. Memory Controller

Memory Controller which plays an important role in the proposed SDPLL is shown in Fig. 3.5. CPU BUS, Memory and Error Detector communicate with different protocols through Memory Controller. Memory Controller also has strong connection with the proposed instruction set architecture. The proposed instruction set architecture (ISA) will be discussed in chapter 4.

The flow chart of the propose state of Memory Controller is shown in Fig. 3.4. When the system resets, Memory Controller is in the initial state. In the meantime, Memory Controller initializes the address counter. As the load signal high, Memory Controller changes the state to the load state. In this state, Memory Controller loads executing instructions to Memory until load signal falls. After that, Memory Controller changes the state to the transition state. In the transition state, Memory Controller receives the error value as Error Valid is high. After that, Memory Controller changes the state to the algorithm state. In order to send this instruction to CPU BUS, Memory Controller reads instruction from memory and combines the error value to instruction in the algorithm state. After sending instructions, Memory controller changes the state back to the transition state.

Subsequently, memory controller repeats the above flow again.

Initial state

Memory

Fig. 3.5 The pin assignment of the memory

3.4. Error Detector

Error Detector which is shown in Fig. 3.6 has four functional blocks such as Time-to-Digit Converter (TDC), Phase Detector, Frequency Divider and Pulse Amplifier with One Pulse Lock. The core of Error Detector is TDC. The objects which are measured by TDC are Ref_Clk and the phase error between Ref_Clk and the Div_Clk. TDC will be described in section 3.5. However, Error Detector can not only detect the error but also divide the frequency of clock cycle. The details of functional blocks will be illustrated in next section.

Ref_clk

Fig. 3.6 The block architecture of Error Detector

3.4.1. Phase Detector

Phase Detector converts the difference of Ref_Clk and Div_Clk to the pulse and judges which clock signal is lead to another.

3.4.2. Frequency Divider

Frequency Divider also divides the frequency of DCO clock cycle. The core of Frequency Divider is the counter which driven by DCO clock cycle.

3.4.3. Pulse Amplifier with One Pulse Lock

There are two serious problems in the input of TDC such as the pulse account and the pulse width. First, Pulse Amplifier which is showed in Fig. 3.7 with One Pulse Lock modifies the narrow pulse to meet the circuit requirement so as to prevent the input width violation of TDC. If the pulse is wide enough to the circuit requirement, Pulse Amplifier with One Pulse Lock keeps the pulse in the original width. Second, Pulse Amplifier with One Pulse Lock filters the pulse after the first pulse has come. The reason is that TDC accumulates all pulses width until Error_set rises.

Fig. 3.7 The hardware architecture of Pulse Amplifier with One Pulse Lock

3.5. Time-to-Digit Converter

The proposed Time-to-Digit Converter (TDC) which is shown in Fig. 3.8 consists of two major modules, Gate delay TDC and Differential delay TDC. Gate delay TDC has different resolution form Differential delay TDC. The resolution of Gate delay TDC is 10ps but the resolution of Differential delay TDC. Although Differential delay TDC has higher resolution

than Gate delay TDC, Differential delay TDC needs ten times gate counts to Gate delay TDC.

Therefore, the proposed TDC combine Gate delay TDC and Differential delay TDC to minimize the total gate count. The resolution of the proposed TDC is 1ps but the gate count of the proposed TDC is double gate count to Gate delay TDC. Differential delay TDC decides the least digit of TDC_OUT and Gate delay TDC decides the other digits. Thus, the resolution of proposed TDC is 1ps.

Fig. 3.8 The functional blocks of the proposed TDC

3.5.1. Gate delay TDC

Gated delay TDC which is shown in Fig. 3.9 is comprised of four functional blocks. The four functional blocks are TDC_CHAIN, Latch Chain Buffer, Counter and TDC Decoder.

TDC

Fig. 3.9 The functional blocks of Gate delay TDC

First, TDC_CAHIN which is shown in Fig. 3.10 is composed of 255 inverters and 1 and gate. Each inverter connects another inverter. Thus, the inverters constitute a delay chain. One input of the and gate connects the end of the delay chain and the output of the and gate connects to the start of the delay. The delay path and the and gate make up the delay ring. The purpose of the and gate is a switch to control the delay ring. When the other input of the switch is high, the switch does the same function as the buffer. On the other hand, the switch

clears the delay ring. The propagate delay of the inverter is 10ps on Faraday 90nm process.

Therefore, the resolution Gate delay TDC is 10ps.

Second, Latch Chain Buffer which is shown in Fig. 3.10 is composed of the D latches.

As the pulse is high, Latch Chain Buffer stores the state of TDC CHAIN. On the other hand, Latch Chain Buffer keeps the storing information until Error_Set rises. We choose the D latch to be the unit of Latch Chain Buffer due to the issue of the latching time.

Third, Fig. 3.10 illustrates TDC Decoder. The basic idea of TDC Decoder is finding out the position of the transition in the TDC Chain. Thus, we choose the prienc decoder at Deignware library to accomplish TDC Decoder.

Last, Fig. 3.10 illustrates the counter. The output of the last inverter of TDC CHAIN triggers the counter. The counter combines the multiplexer inside because of the dead zone.

Since the pulse goes down, TDC CHAIN will not be clear immediately. Thus, counter operates at expected case. The situation results in wrong value of TDC_OUT. Therefore, the counter which combines the multiplexer inside can prevent this situation.

As a result, Gate delay TDC has 10-ps resolution because of the inverter. As the process upgrades, the resolution of Gate delay TDC gets more higher.

... (for PULSE high latch)

Odd bits prienc decoder

Fig. 3.10 The hardware architecture of Gate delay TDC

3.5.2. Differential delay TDC

Fig. 3.11 shows the functional blocks of Differential delay TDC. The functional blocks are DELAY PULSE, EVEN GROUP, ODD GROUP and Differential Decoder. Some parts of Differential delay TDC are similar to Gate delay TDC. Differential delay TDC uses the

different delay pulse to decode the value of the different delay pulse. Because of the ten values, we can convert least digit of TDC_OUT. Therefore, the resolution of Differential delay TDC is 1ps.

Fig. 3.11 The function blocks of differential delay TDC

First, the DELAY PULSE produces ten different delay pulses. Fig. 3.12 shows that different delays are 11ps, 22ps, 33ps, 44ps, 55ps, 66ps, 77ps, 88ps and 99ps. The delay are 11ps, 33ps, 55ps, 77ps, and 99ps belong to ODD GROUP. On the other hand, The delay are 0ps, 22ps, 44ps, 66ps, and 88ps belong to EVEN GROUP. DELAY PULSE has nine inverters which have 11-ps resolution. The purpose of choosing 11-ps resolution is that 11ps minus 10ps leaves 1ps. Therefore, Differential delay TDC uses the 10-ps resolution inverter and the 11-ps resolution inverter to reach the 1-ps resolution.

Fig. 3.12 The hardware architecture of the delay pulse module

Second, Fig. 3.13 and Fig. 3.14 illustrate EVEN GROUP and ODD GROUP. EVEN GROUP has five EVEN SUB TDC (ESTDC). ODD GROUP has five ODD SUB TDC (OSTDC). The difference between ESTDC and OSTDC is the switch. One is the and gate;

the other is the and gate with one inverse pin. ESTDC and OSTDC are similar with Gate delay. TDC .CHAIN of ESTDC and OSTDC is the chain with twenty 10-ps resolution inverter. Latch Chain Buffer is the latch which stores the chain information. When Pulse is high, Latch Chain Buffer stores the information. On the other hand, Latch Chain Buffer maintains the storing information. Each output pin of Latch Chain Buffer connects to the nor gate with the next to output pin. The output of the nor gate is the input the prienc decoder.

Finally, the prienc decoder outputs the value.

However, ESTDC and OSTDC have one difference from Gate delay TDC. At Gate delay TDC, the same pulse supplies the chain and buffer. At ESTDC and OSTDC, the original pulse supplies the buffer and the delayed pulse supplies the chain. In this action, ESTDC and OSTDC output the value the diminished pulse. For example, if the 22ps-delayed pulse is the input of ESTDC, ESTDC outputs the value of 22ps-diminished pulse. Similarly, if the 11ps-delayed pulse is the input of OSTDC, OSTDC outputs the value of 11ps-diminished pulse.

Finally, the ten values form EVEN GROUP and ODD GROUP has regular patterns. We use the regular patterns to decide the TDC_OUT [3:0] by the differential decoder shown in Fig. 3.15.

Fig. 3.13 The hardware architecture of EVEN GROUP

Fig. 3.14 The hardware architecture of ODD GROUP

Fig. 3.15 The hardware architecture of differential decoder

3.6. CPU Bus

The connection of CPU is the instruction pins, the data pin and the WISHBONE control signals.

3.7. Digital Control Oscillator Interface

Fig. 3.16 and Fig 3.17 show the state diagram of Digital Control Oscillator Interface and pins of Digital Control Oscillator Interface. At the Coarse Frequency state, Digital Control Oscillator Interface outputs the control word to DCO so as to lock the frequency. At the Coarse Phase state, the control word is for phase locking. At the Coarse Transition state, the control word is the same as the coarse Frequency state.

Fig. 3.16 The finite state machine of Digital Control Oscillator Interface

Fig. 3.17 Pins of Digital Control Oscillator Interface

The DCO interface has two different the clock system. One is the CPU clock. The CPU clock supplies blocks which communicate with CPU. The other is the DCO clock. The DCO clock supplies the finite state machine of the DCO interface because the control word needs to synchronize with the DCO clock.

3.8. Digital Control Oscillator

The proposed DCO is a behavior model. The basis frequency is 333 MHz. Each step of DCO is 10fs.

Chapter 4

The proposed SDPLL Software

4.1. OPENRISC ISA Overview

The OPENRISC instruction set which shows in Fig. 4.1 includes the following principal features. First, Simple and uniform-length instruction formats featuring five Instruction Subsets. Second, OPENRISC Basic Instruction Set (ORBIS32/64) with 32-bit wide instructions aligned on 32-bit boundaries in memory and operating on 32-bit and 64-bit data.

Third, OPENRISC Vector/DSP eXtension (ORVDX64) with 32-bit wide instructions aligned on 32-bit boundaries in memory and operating on 8-, 16-, 32- and 64-bit data. Last, OPENRISC loating-Point eXtension (ORFPX32/64) with 32-bit wide instructions aligned on 32-bit boundaries in memory and operating on 32-bit and 64-bit data. The Table 4.1 shows difference between subsets. The proposed of SDPLL uses ORBIS32 to be instruction set.

Fig. 4.1 OPENRISC instruction set ORBIS32 32-bit instructions

ORBIS64 64-bit instructions

ORFPX32 Single-precision floating instruction ORFPX64 Double-precision floating instruction ORVDX64 Vector instruction

Table 4.1 Lists of RISC instruction set

4.2. The Proposed ISA

The proposed ISA has three major ideas: the input instructions, the block jump instruction and the output instructions.

4.2.1. Input Instructions

The purpose of the input instructions is to pass the value to OPENRISC. We choose the ori instruction (Format: ori rD,rA,K. The immediate value is zero-extended and combined with the contents of general-purpose register rA in a bit-wise logical OR operation. The result is placed into general-purpose register rD) and the movi instruction (Format: movhi rD,K.

The 16-bit immediate value is zero-extended, shifted left by 16 bits, and placed into general-purpose register rD.) to be input instruction.

Fig_4.2(a) and Fig. 4.2(b) show the machine code and the assembly code of the instruction. The memory stores the input instructions whose machine code are 0x198004d2 and 0xa98c162f. As the memory controller which is described in the section 3.3 reads instructions from memory, the memory controller detect the instruction whether the instruction is 0x198004d2 or 0xa98c162f. or not. If the instruction is 0x198004d2, the memory controller replaces the lower 16-bit part of the instruction to the higher 16-bit part of the instruction. Similarly, if the instruction is 0xa98c162f, the memory controller replaces the high 16-bit part of the instruction to the lower 16-bit part of the instruction. The above description shows in Fig. 4.3.

Fig. 4.2(a)

The input instruction of lower 16bits

Fig. 4.2(b)

The input instruction of higher 16bits

Fig. 4.3 The input instruction working flow

4.2.2. Block Jump Instruction

The instruction memory which is 256 x 32 bits is distributed into 16 blocks which consists of 16 32-bit instructions. We use the block to be a algorithm operation because most algorithm operation need less than 16 instructions.

The block jump instruction especially focuses on the clock phase issue whether REF_CLK is lead to DIV_CLK or not. Because the different issue has the different operation, we use the block jump instruction which shows in Fig. 4.4. If REF_CLK is lead to DIV_CLK and the memory controller reads the block jump instruction, the reading address is changed to the begin address of the num_block_read block. On the other hand, if REF_CLK is lag to DIV_CLK and the memory controller reads the block jump instruction, the reading address is changed to the begin address of the num_block_lag block. Fig. 4.5 shows the jumping process.

Jump_block, num_block_lag, num_block_lead

0

1c num_block_lag num_block_lead

31 26 25 8 7 4 3 0

Instruction format

machine code

Fig.4.4 The block jump instruction

Fig. 4.5 The working flow of the block jump instruciton

4.2.3. Output Instruction

The output instruction is the way of outputting the information of OPENRISC such as the DCO control word and the state control word. We choose the store instruction of ORBIS32. The store instruction has two parts to be information: data and address. Storing data is the DCO control word. Storing address is the state control word. Fig. 4.6 shows the machine code of the store instruction. Target Rf is a register which maintains the value of the DCO control word. Infor which is shown at Table 4.2 is the input of the state controller.

Fig. 4.6 The output instruction Infor Pin assignment State

0 Frequency Lock operation [0] DCO_mode

1 Phase Lock operation 0 Frequency detection [1] detect_mode

1 Phase Error detection 0 coarse tracking [2] Tracking_mode

1 fine tracking

Table 4.2 The pin assignment of the state controller

4.3. Proposed SDPLL Algorithm

The proposed SDPLL has two basic states: the coarse frequency state and the coarse phase state. At the coarse frequency state, we use the TDC to get the half value of the period

The proposed SDPLL has two basic states: the coarse frequency state and the coarse phase state. At the coarse frequency state, we use the TDC to get the half value of the period

在文檔中 鎖相迴路軟體化之研究 (頁 11-0)

相關文件