Chapter 1 Introduction
1.3. Thesis Organization
Chapter 2 gives an overview of the conventional ADDLL and the SDDLL. Chapter 3 describes the architecture of the proposed SDDLL. Chapter 4 describes the control strategy of the proposed SDDLL and the simulation results. Chapter 5 presents conclusions and future work.
Chapter 2
Overview of SDDLL
The key characteristics of the proposed SDDLL are software controllability and programmability. The SDDLL combines a CPU with the silicon IPs of a delay-locked loop, so that hardware and software work in coordination. The components are discussed in the following sections.
2.1. Basic Concept
There are several types of delay-locked loop (DLL). On the whole, the all-digital approach locks faster and tolerates process and supply-voltage variation better, but its skew and jitter are relatively worse. For SoC implementation, however, the all-digital approach is more suitable because it integrates easily with the rest of the system and is insensitive to supply noise. The All-digital Delay-locked Loop (ADDLL) is therefore chosen as the basic DLL IP for the platform of the proposed SDDLL.
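A delay-locked loop generates an output clock whose phase is related to the reference clock through a delay chain. Therefore, when the DLL is locked, the delay time should be an integer multiple of the reference clock period:
$T_{delay} = N \times T_{ref}$ (1)
where $T_{delay}$ is the delay time of the delay chain, $N$ is a positive integer, and $T_{ref}$ is the clock period of the reference clock.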
The conventional All-digital Delay-locked Loop contains several components: a Phase Detector (PD), a Digital-controlled Delay Line (DCDL), and a control unit for the DCDL.
Fig. 1. Basic block diagram of ADDLL.
The architecture of the ADDLL is shown in Fig. 1. The Phase Detector compares the phase of the output clock with that of the reference clock. According to the output of the Phase Detector, the control unit changes the digital control signal to adjust the delay time of the DCDL. If the output clock leads the reference clock, the control unit extends the delay time of the DCDL. Conversely, if the output clock lags the reference clock, the control unit shortens the delay time. Fig. 2 illustrates phase tracking for a 4-bit DCDL.
Assume that the intrinsic delay of the DCDL is 1 ns. After appropriate phase tracking of the DCDL delay time, the DLL should reach the locked state.
Fig. 2. The delay adjustment of DLL.
(a) The example 4-bit delay line.
(b) The timing diagram of phase tracking when the input clock period is 5 ns (signals: Control Signal, CLKref, CLKout; control bits 0011, 0101, 0100; marked delay 4 ns).
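The lead/lag rule described above can be sketched as a simple step-by-step tracking loop. The sketch below is illustrative only: the 1 ns delay step per control code and the lock criterion (error within half a step) are assumptions that do not come from the text, and the loop is not meant to reproduce the exact control-bit sequence of Fig. 2. It also assumes that the phase detector always resolves lead and lag correctly, i.e., no false-locking or harmonic-locking.

```python
def track_phase(t_ref_ns, code, intrinsic_ns=1.0, step_ns=1.0, bits=4, max_iters=32):
    """Step the DCDL code up or down until the delay matches the reference period.

    step_ns and the half-step lock criterion are illustrative assumptions.
    Returns the list of (code, delay_ns) pairs visited.
    """
    max_code = (1 << bits) - 1
    history = []
    for _ in range(max_iters):
        delay_ns = intrinsic_ns + code * step_ns
        history.append((code, delay_ns))
        error_ns = delay_ns - t_ref_ns
        if abs(error_ns) < step_ns / 2:
            break  # close enough to one reference period: treat as locked
        if error_ns < 0:
            code = min(code + 1, max_code)  # output clock leads: extend the delay
        else:
            code = max(code - 1, 0)         # output clock lags: shorten the delay
    return history


history = track_phase(t_ref_ns=5.0, code=0b0001)
for code, delay_ns in history:
    print(f"{code:04b} -> {delay_ns} ns")
```

With a 5 ns reference period and the assumed 1 ns step, the loop settles at control code 0100, where the 1 ns intrinsic delay plus 4 ns of adjustable delay equals one reference period.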
In practice, phase tracking faces several difficulties. Input jitter, the intrinsic jitter of the DCDL, and the dead zone of the PD all make the locked state harder to reach, so the design must account for these non-ideal effects. Even when the DLL is locked, a small phase error remains. In general, the amount of phase error is determined by the resolution of the DCDL.
2.2. Locking Issue
2.2.1. False-locking
False-locking is also called stuck-locking. It prevents the DLL from ever reaching the phase-locked state. False-locking occurs if the initial delay of the DCDL is shorter than half of the input clock period.
$T_{init} < \frac{1}{2} \times T_{ref}$ (2)
Tinit is the initial delay of the DCDL, and Tref is the period of the reference clock. If inequality (2) holds, false-locking occurs. Fig. 3 shows the result of false-locking.
Fig. 3. The timing diagram of false-locking when the input clock period is 5ns.
Because the initial delay is shorter than half of the reference clock period, the output clock edge is closer to the reference edge that launched it than to the next reference edge. The PD therefore determines that the output clock is lagging and that the delay should be shortened. Eventually the control signal reaches zero, that is, the delay becomes the shortest possible. However, the DLL still cannot lock, because the DCDL cannot have zero delay. The clock-deskew function of the DLL is then meaningless, so the DLL design has to prevent false-locking.
2.2.2. Harmonic-locking
With harmonic-locking, the DLL does lock the phase in the end, but the delay time of the DCDL is longer than one reference cycle. Although the DLL can still lock, the longer delay path increases the intrinsic jitter of the DCDL. Furthermore, harmonic-locking is not allowed if the DLL is used in multiphase applications, because the DLL must divide a delay of exactly one reference cycle into multiple parts to generate the multiphase output clocks.
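The result of harmonic-locking is as follows.
$T_{delay} = N \times T_{ref},\quad N \ge 2$ (3)
where $T_{delay}$ is the delay time of the DCDL and $T_{ref}$ is the clock period of the reference clock. Moreover, the condition that brings about harmonic-locking is as follows.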
$T_{init} > \frac{3}{2} \times T_{ref}$ (4)
Tinit is the initial delay of the DCDL. When inequality (4) holds, the delay time will approach two or more reference cycles. The timing diagram is shown in Fig. 4.
Fig. 4. The timing diagram of Harmonic-locking when the input clock period is 5ns.
The final delay time in Fig. 4 is two reference clock cycles. Note that the DLL also needs more than two cycles for each adjustment of the DCDL delay, because it must wait about two reference clock cycles to observe the actual delayed phase after each tuning step. Apart from this, harmonic-locking is not a serious problem when the DLL does not serve multiphase applications. To avoid both false-locking and harmonic-locking, the DLL should fulfill the following condition.
$\frac{1}{2} \times T_{ref} < T_{init} < \frac{3}{2} \times T_{ref}$ (5)
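The three cases distinguished by inequalities (2), (4), and (5) can be checked with a small helper. This is a minimal sketch: the function name and the returned labels are illustrative, and treating the exact boundary values as "normal" is an arbitrary choice made here, since the inequalities do not cover the boundaries.

```python
def classify_initial_delay(t_init, t_ref):
    """Classify the expected locking behavior from the initial DCDL delay.

    t_init and t_ref must use the same time unit. Boundary values
    (exactly 0.5 * t_ref or 1.5 * t_ref) are reported as "normal" here,
    which is an assumption of this sketch.
    """
    if t_ref <= 0:
        raise ValueError("t_ref must be positive")
    if t_init < 0.5 * t_ref:
        return "false-locking"      # inequality (2)
    if t_init > 1.5 * t_ref:
        return "harmonic-locking"   # inequality (4)
    return "normal"                 # condition (5)


for t_init in (2.0, 5.0, 8.0):
    print(t_init, classify_initial_delay(t_init, t_ref=5.0))
```

For a 5 ns reference period, an initial delay of 2 ns falls below 2.5 ns and leads to false-locking, while 8 ns exceeds 7.5 ns and leads to harmonic-locking.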
2.3. Locking Strategy
The locking strategies of digital DLLs fall into several categories. The most conventional is the sequential search algorithm, as used in the shift-register-controlled DLL and the counter-controlled DLL, but its lock-in time increases exponentially with the number of control bits. The second is the successive-approximation-register-controlled DLL (SARDLL). Its strategy resembles a binary search, so its lock-in time is shorter. The last is the Time-to-digital Converter (TDC) scheme. A TDC roughly estimates the input clock period and represents it as a digital output, from which the DLL sets up the delay of the DCDL. The TDC scheme achieves the shortest lock-in time at the cost of area and power. In this work, the SDDLL adopts the above methods.
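The difference in lock-in time between the sequential search and the successive-approximation search can be illustrated by counting tuning steps. In the sketch below, the comparison against `target_code` stands in for the phase detector decision, and the delay line is assumed to be ideal and monotonic. Both functions are illustrative models rather than the circuits themselves.

```python
def sequential_search(target_code, bits):
    """Counter-style search: raise the code one step at a time."""
    max_code = (1 << bits) - 1
    code, steps = 0, 0
    while code < target_code and code < max_code:
        code += 1
        steps += 1
    return code, steps


def sar_search(target_code, bits):
    """Successive approximation: decide one bit per step, MSB first."""
    code, steps = 0, 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        steps += 1
        if trial <= target_code:  # delay not yet too long: keep this bit
            code = trial
    return code, steps


print(sequential_search(200, bits=8))
print(sar_search(200, bits=8))
```

For an 8-bit control word, the sequential search needs up to 255 steps in the worst case, whereas the successive-approximation search always needs 8.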
2.3.1. TDC
The architecture of the TDC is shown in Fig. 5. The TDC measures an input pulse and produces a corresponding digital output by means of a cascaded counter and D-type flip-flops.
Fig. 5. The architecture of TDC.
The TDC scheme helps the DLL lock faster. If the mapping from the TDC output to the DCDL control code is appropriate, the initial delay time of the DCDL quickly approaches one reference clock period, so the DLL achieves phase lock sooner. Moreover, the locking issues described above do not occur, because the initial delay is appropriate. The TDC is thus used both to accelerate phase locking and to solve the conventional locking issues.
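One possible form of such a mapping is sketched below. It assumes a linear TDC with a known time per count and a linear DCDL with a known intrinsic delay and step size. All parameter names and values are hypothetical and are chosen only to illustrate the idea.

```python
def initial_code_from_tdc(tdc_count, tdc_lsb_ns, intrinsic_ns, step_ns, bits):
    """Map a TDC reading to an initial DCDL control code (illustrative model).

    The estimated period is tdc_count * tdc_lsb_ns; the code is chosen so that
    intrinsic_ns + code * step_ns is as close as possible to that estimate,
    clamped to the range of a `bits`-wide control word.
    """
    period_estimate_ns = tdc_count * tdc_lsb_ns
    raw_code = round((period_estimate_ns - intrinsic_ns) / step_ns)
    return max(0, min(raw_code, (1 << bits) - 1))


# Hypothetical numbers: 10 counts of 0.5 ns give an estimated period of 5 ns.
print(initial_code_from_tdc(10, tdc_lsb_ns=0.5, intrinsic_ns=1.0, step_ns=1.0, bits=4))
```

Because the initial code already places the delay near one reference period, the initial delay is expected to satisfy condition (5) whenever the TDC estimate is reasonably accurate.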
2.3.2. Pulse Amplifier with One Pulse Lock
The TDC has a minimum measurable pulse width because of the setup time of the D-type flip-flops and a small gate delay. If the input pulse is too short, the TDC cannot detect it. Therefore, this design adopts a Pulse Amplifier that extends narrow pulses so that the input pulse does not violate the minimum width of the TDC. Fig. 6 shows the architecture of the Pulse Amplifier.
Fig. 6. The architecture of Pulse Amplifier with One Pulse Lock (labels: Delay Path, two D flip-flops with RB inputs, Error_set, INPUT_PULSE, OUTPUT_PULSE).
If the input pulse is too narrow, the Pulse Amplifier extends its length to that of the total delay path and outputs the new pulse. This solves the input violation problem of the TDC.
Under the control of Error_set, the Pulse Amplifier can also filter out any pulses that follow the first pulse. This is the one-pulse-lock function.
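The two functions of the Pulse Amplifier can be summarized in a small behavioral model. This sketch makes two assumptions that the text does not state: a pulse already wider than the delay path is passed through unchanged, and the `rearm` method stands in for whatever Error_set does to release the lock. The class and method names are hypothetical.

```python
class OnePulseLockAmplifier:
    """Behavioral sketch of a pulse amplifier with one-pulse lock."""

    def __init__(self, delay_path_ns):
        self.delay_path_ns = delay_path_ns
        self.armed = True

    def rearm(self):
        """Release the lock (assumed role of the Error_set control)."""
        self.armed = True

    def feed(self, width_ns):
        """Return the output pulse width, or None if the pulse is filtered."""
        if not self.armed:
            return None          # pulses after the first one are blocked
        self.armed = False       # lock after the first pulse
        # Assumption: wide pulses pass unchanged, narrow ones are stretched.
        return max(width_ns, self.delay_path_ns)


amp = OnePulseLockAmplifier(delay_path_ns=2.0)
print(amp.feed(0.3))   # narrow pulse stretched to the delay path length
print(amp.feed(0.3))   # second pulse filtered
```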
Chapter 3
The Architecture of The Proposed SDDLL
The SDDLL combines the Or1200 CPU and the DLL IPs via the WISHBONE bus. Resolution, operating frequency range, and lock-in time are important performance measures for a DLL, so they should be taken into account in the DLL design.
It is a big challenge for the SDDLL to maintain these performance measures while hardware and software communicate. The proposed SDDLL supplies multiphase output clocks and duty cycle calibration, so a multiphase DCDL and duty cycle correctors are adopted in this work.
This chapter is organized as follows. Section 3.1 introduces the basic concept of the SDDLL. Section 3.2 shows the architecture of the proposed SDDLL and the communication interface between hardware and software. Section 3.3 details the silicon IPs of the hardware part of the DLL.
3.1. Basic Concept of SDDLL
In the conventional ADDLL, the control unit implements the control strategy and adjusts the control signal to change the delay time of the DCDL. The main idea of the SDDLL is to replace the control unit with a CPU and software. The CPU executes the control strategy and tunes the delay chain, because software offers more flexibility and portability. The SDDLL is thus an embedded hardware/software co-design. Many systems today contain several CPUs. If some of them are idle, the DLL can borrow one to perform phase tracking.
The control strategy can easily be modified for different uses, but the software must be written with care.
Fig. 3.1. The basic concept of SDDLL.
This design uses the WISHBONE bus to integrate the CPU and the other DLL IPs. The software is stored in flash memory. The CPU reads the software over the bus and executes it, and it also exchanges data with the DLL over the bus. The CPU can therefore control the delay line in the DLL block.
3.2. The Architecture of SDDLL
The Or1200 CPU provides a WISHBONE bus interface, so this work selects a compatible WISHBONE bus to integrate the CPU and all the silicon IPs. The WISHBONE bus is a master-slave interface with an asynchronous access mechanism. The Or1200 CPU is the master, and it can issue read or write requests to the slaves. The architecture is shown in Fig. 3.2.
The software is first compiled with the GNU toolchain, and the compiled machine code is stored in the read-only flash. After the system is reset, the CPU fetches the instructions from the flash and executes them. The CPU accesses memory to read and write data as the instructions require. The CPU also communicates with the DLL via the bus, so they can exchange information such as the TDC output, the phase state, and the digital control signal. SACA serves as the system clock generator. The system clock is distributed to all of the blocks via the WISHBONE bus.
Fig. 3.2. The architecture and data flow of SDDLL platform.
3.2.1. CPU
The control unit is replaced with the Or1200 CPU. The Or1200 is a free, open-source core released by OpenCores. It is a 32-bit scalar RISC with a Harvard architecture, so it performs instruction and data accesses separately. The Or1200 used here is a single-core CPU.
This design enables a 1 KB instruction cache in order to reduce the number of instruction accesses over the bus. In general, one bus access takes three system cycles. On a cache hit, however, the instruction access takes only one system cycle. On a cache miss, a miss penalty is paid to fetch the missing instruction: the Or1200 fetches the following four instructions, so the miss takes twelve system cycles.
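The effect of the instruction cache on the average fetch time can be estimated from the cycle counts given above. The sketch below uses a simple weighted average. The hit rate is a free parameter, not a measured value from this work.

```python
HIT_CYCLES = 1    # instruction cache hit, from the text
MISS_CYCLES = 12  # four bus accesses of three cycles each, from the text


def average_fetch_cycles(hit_rate):
    """Average system cycles per instruction fetch for a given hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * HIT_CYCLES + (1.0 - hit_rate) * MISS_CYCLES


for rate in (0.5, 0.9, 1.0):
    print(rate, average_fetch_cycles(rate))
```

A tight phase-tracking loop that fits entirely in the cache would be expected to run at a hit rate close to 1 after the first pass.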
The data cache is disabled because repeated data accesses to the same address are rare and the addresses are not contiguous, so enabling the data cache is not worthwhile.
The gate count of the CPU with the 1 KB instruction cache is about 150K in the TSMC 65 nm process. The Or1200 was chosen because it is open source and has been implemented in various commercial systems.
Fig. 3.3. The overview of OPENRISC Or1200.
3.2.2. BUS
The Or1200 provides a WISHBONE bus interface. The WISHBONE bus has high compatibility because it is an asynchronous bus, that is, it uses a handshaking mechanism for communication. The master issues a request with the access address. The bus forwards the request to the addressed slave. The slave returns an acknowledgment (ack) to the master, and then the data transfer starts.
3.2.3. Semi Asynchronous Clock Access (SACA)
The SACA serves as the system clock generator for CPU computation. It distributes the system clock via the WISHBONE bus. The SACA can multiply the clock frequency according to a digital control signal. It offers better performance with respect to circuit noise and power consumption.
Fig. 3.4. An example of SACA (waveforms: Ref. clk, SACA clk; annotation: 8 cycles).
3.3. The Hardware Architecture of DLL
The hardware of the DLL is the key part of the SDDLL. It provides clock deskew, multiphase output clocks, and duty cycle calibration. The architecture of the DLL is shown in Fig. 3.5.
In this work, an 8-stage multiphase delay line is chosen. A DCDL with more stages can generate more multiphase output clocks, but if the number of stages is too large, the highest operating frequency is limited by the intrinsic delay of the Multiphase DCDL. In addition, 8 is an even number, so half of the total DCDL delay is easy to generate, which is convenient for duty cycle calibration.
The DLL communicates with the CPU via the WISHBONE bus. The DLL transfers the phase state (Lead or Lag) and the TDC-measured values of the extended reference clock and the phase error.
The CPU executes instructions to decide the next digital control signal according to the information from the DLL, and transfers the digital control signal back to the DLL. The delay of the Multiphase DCDL changes accordingly.
Fig. 3.5. The architecture and the data flow of DLL.
The relation between the extended pulse and the reference clock is shown in Fig. 3.6.
The clock extender works like a divide-by-2 frequency divider, so the length of the extended pulse is one whole reference cycle. The TDC measures this pulse so that, in the first step, the delay of the DCDL can be set close to one reference cycle.
Fig. 3.6. The waveform of reference clock and extended pulse.
As mentioned in Chapter 2, the Pulse Amplifier with One Pulse Lock (OPL_PA) is applied to prevent input pulse violations at the TDC. It lengthens a narrow pulse so that it meets the minimum pulse width of the TDC.
With the pulse amplifier, the TDC can measure pulses of any width. The DLL uses the TDC to measure both the reference clock cycle and the phase error, which helps the SDDLL lock faster.
The DLL transfers this information to the CPU, and the software executed by the CPU makes the decisions. The CPU transfers the resulting digital control signal back to the DLL, where it controls the delay time of the Multiphase DCDL.
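The division of work between the DLL hardware and the CPU software can be sketched as a loop of bus reads and writes. Everything specific in the sketch below is hypothetical: the register offsets, the phase encoding, the mock slave class, the linear delay model, and the rule that a flip between Lead and Lag indicates lock. It models only the data flow described above, not the actual control program of this work.

```python
# Hypothetical register offsets and phase encoding (not from the thesis).
REG_PHASE, REG_TDC, REG_CTRL = 0x0, 0x4, 0x8
PHASE_LEAD, PHASE_LAG = 0, 1


class MockDllSlave:
    """Toy stand-in for the DLL block seen as a bus slave (linear delay model)."""

    def __init__(self, t_ref_ps, intrinsic_ps, step_ps, tdc_lsb_ps):
        self.t_ref_ps = t_ref_ps
        self.intrinsic_ps = intrinsic_ps
        self.step_ps = step_ps
        self.tdc_lsb_ps = tdc_lsb_ps
        self.ctrl = 0

    def read(self, offset):
        if offset == REG_TDC:
            return self.t_ref_ps // self.tdc_lsb_ps  # quantized period
        if offset == REG_PHASE:
            delay_ps = self.intrinsic_ps + self.ctrl * self.step_ps
            return PHASE_LEAD if delay_ps < self.t_ref_ps else PHASE_LAG
        raise ValueError(f"unknown register offset {offset:#x}")

    def write(self, offset, value):
        if offset != REG_CTRL:
            raise ValueError(f"register {offset:#x} is not writable")
        self.ctrl = value


def software_lock(dll, intrinsic_ps, step_ps, tdc_lsb_ps, max_iters=64):
    """Coarse setting from the TDC, then lead/lag stepping until the state flips."""
    period_estimate_ps = dll.read(REG_TDC) * tdc_lsb_ps
    code = max(0, (period_estimate_ps - intrinsic_ps) // step_ps)
    dll.write(REG_CTRL, code)
    previous = None
    for iteration in range(max_iters):
        state = dll.read(REG_PHASE)
        if previous is not None and state != previous:
            return code, iteration  # lead/lag flipped: within one step of lock
        code = code + 1 if state == PHASE_LEAD else max(code - 1, 0)
        dll.write(REG_CTRL, code)
        previous = state
    raise RuntimeError("no lead/lag flip within max_iters")


dll = MockDllSlave(t_ref_ps=5030, intrinsic_ps=200, step_ps=10, tdc_lsb_ps=100)
code, iterations = software_lock(dll, intrinsic_ps=200, step_ps=10, tdc_lsb_ps=100)
print(code, iterations)
```

In this toy run, the TDC reading places the delay within a few steps of the reference period, and the lead/lag stepping then closes the remaining gap.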
3.3.1. Multiphase DCDL
The 8-stage Multiphase DCDL has a coarse-fine structure. It consists of eight identical delay chains, each of which is divided into two parts: a coarse delay line and a fine delay line. The architecture of the Multiphase DCDL is shown in Fig. 3.7.
Fig. 3.7. The architecture of 8-stage Multiphase DCDL.
P0~P7 are the multiphase output clocks, and the delay of each delay line should be 1/8 of the reference clock cycle. The total delay of the Multiphase DCDL is one reference cycle when the DLL is locked.
The design of the 8-stage Multiphase DCDL must balance a high operating frequency, a wide frequency range, a high resolution, and a low intrinsic jitter, which is a difficult design problem for an 8-stage multiphase delay line. The intrinsic delay should be as short as possible to allow a high operating frequency. All delay lines should be identical, because the delays between adjacent multiphase output clocks must be precisely equal. An imbalance between the rise and fall times of the delay chain may limit the highest operating frequency and should be avoided.
Fig. 3.8 shows the waveform of 8-stage Multiphase DCDL when DLL is locking. Each delay between Multiphase clocks is about 1/8 reference clock cycle.
Fig. 3.8. The waveform of 8-stage Multiphase DCDL.
For multiphase applications, the total delay of the DCDL should be exactly one reference clock cycle in order to generate the eight multiphase output clocks. The frequency range of this 8-stage multiphase delay line is 1.035 MHz ~ 161.29 MHz, i.e., the delay range is 6.2 ns ~ 966.183 ns. The resolution of the 8-stage Multiphase DCDL is 90 fs.
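As a quick check, the quoted delay range follows from the frequency range through T = 1/f. Since each of the eight delay lines contributes 1/8 of the reference cycle, the corresponding per-stage delay range follows as well:

```latex
\[
T_{\min} = \frac{1}{161.29\,\mathrm{MHz}} \approx 6.200\,\mathrm{ns},
\qquad
T_{\max} = \frac{1}{1.035\,\mathrm{MHz}} \approx 966.18\,\mathrm{ns}
\]
\[
\frac{T_{\min}}{8} \approx 0.775\,\mathrm{ns},
\qquad
\frac{T_{\max}}{8} \approx 120.77\,\mathrm{ns}
\]
```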
Table 1. The specification of one delay line
                 Coarse delay line              Fine delay line
                 C1                 C2          F1               F2          F3
Used elements    AO21D4 & counter   AO21D4      2 parallel AOI
Resolution       0.95 ns            20.32 ps    1.516 ps         133.22 fs   11.53 fs
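To illustrate how the coarse and fine parts combine, the sketch below adds up the contributions of the five control fields using the resolutions in Table 1. It assumes that each field contributes linearly and independently, and it treats the intrinsic delay as a separate parameter with a default of zero. Both are simplifications, and the field widths are not modeled.

```python
# Resolution of each control field in picoseconds, taken from Table 1.
RESOLUTION_PS = {
    "C1": 950.0,
    "C2": 20.32,
    "F1": 1.516,
    "F2": 0.13322,
    "F3": 0.01153,
}


def delay_line_ps(codes, intrinsic_ps=0.0):
    """Delay of one delay line under an assumed linear, additive model.

    codes maps field names (C1, C2, F1, F2, F3) to integer settings;
    missing fields are treated as zero.
    """
    return intrinsic_ps + sum(
        RESOLUTION_PS[name] * codes.get(name, 0) for name in RESOLUTION_PS
    )


print(delay_line_ps({"C1": 2, "C2": 3}))
```

Under this model, the coarse fields bring the delay near the target and the fine fields remove the residual error in progressively smaller steps.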
3.3.2. Coarse Delay Line
The coarse delay line [2] is divided into two parts. The first part (C1 delay line) is composed of several delay cells (AO21D4) and a counter. A differential circuit generates a narrow pulse at each positive and negative edge of the input clock. The narrow pulse triggers counting in the delay chain. The counter counts up to C1 and then stops, and its output becomes 1 at the same moment. The D-type flip-flop and the counter are then reset, so the counter output soon returns to 0. The counter output is therefore also a narrow pulse. The counter then waits for the next pulse to trigger counting again. This counter scheme extends the frequency range with a smaller area.
The second part (C2 delay line) is a selectable delay path. The control signal selects one path and thereby sets the length of the delay path. A frequency divider is adopted to recover the waveform of the input clock and to solve the rise/fall time imbalance of the C2 delay line.
3.3.3. Fine Delay Line
In this work, the fine delay line is composed of variable capacitive delay elements. Parallel gates are used as parallel capacitors on the loading line, and the control signal switches these capacitors in. The number of capacitors connected in parallel affects the delay time through the RC time constant. The fine delay line consists of three components: the F1, F2, and F3 delay lines. The F3 delay line has the highest resolution among them. The architecture of the fine delay line is shown in Fig. 3.10.
Fig. 3.9. The architecture of fine delay line.
A NOT gate is adopted to drive the parallel capacitors, and the buffer also provides isolation. Changes in driving ability and capacitive loading produce different delay times. This method offers low power consumption and high resolution, but it is very sensitive to capacitive loading, which makes the layout more difficult.