CHAPTER 3 DESIGN TECHNIQUES OF FLASH ADC
3.4 Design Issues and Architecture of Ultra High Speed Flash ADC
3.4.1 Design Guidelines
After studying various designs, we have arrived at the following design guidelines.
1. Auto-zero should be avoided since the series input capacitors degrade the speed.
2. A track-and-hold (T/H) should be included to improve the resolution bandwidth.
3. Replacing the distributed T/H with a single front-end T/H avoids clock skew.
4. Background digital calibration should be avoided since it is area-inefficient and needs complicated control timing.
5. Offset averaging should be included for accuracy improvement.
6. Replacing resistive averaging with active averaging saves power and avoids dummy amplifiers.
7. The choice between one and two stages of averaging is a trade-off set by the accuracy requirement.
8. Interpolation can be included to reduce power consumption and chip area.
9. Replacing resistive interpolation with active interpolation achieves high speed operation.
10. Cascaded continuous-time amplifiers should be avoided to improve the overall signal bandwidth before the latch.
11. Reset switches should be added in the comparator for fast overdrive recovery [18].
12. Reset switches should be avoided in the preamplifier behind the reference ladder for low kick-back noise.
13. The speed and gain of the comparator are optimized by cascading a preamplifier and latches [19].
14. Replacing the ROM-based digital encoder with a logic-based one achieves high speed.
3.4.2 4-bit 4-GSps Flash Architecture
The flash architecture shown in Figure 3.10 includes a front-end T/H, a reference ladder, 15 comparators for quantization, and a thermometer-Gray-binary digital encoder. The comparator has three stages to improve speed: a continuous-time preamplifier, a first latch with a reset switch, and a second latch with a reset switch.
The resolution of this ADC is only 4 bits, so it does not need offset averaging because of its low accuracy requirement. The detailed circuit implementation is discussed in Chapter 4.
Figure 3.10 4-bit flash ADC architecture
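As a rough behavioral illustration of the quantizer described above, the following sketch models an ideal N-bit flash converter: a uniform reference ladder with 2^N - 1 taps and one ideal comparator per tap. The function name, the 0 V to 1 V reference range, and the ideal (offset-free) comparators are assumptions made for this example only, not part of the design.

```python
def flash_quantize(vin, v_low, v_high, bits=4):
    """Ideal flash quantizer: return the thermometer code, lowest tap first.

    Assumes a uniform reference ladder and offset-free comparators.
    """
    levels = 2 ** bits
    lsb = (v_high - v_low) / levels
    # One reference tap (and one comparator) per code transition: 2^N - 1 taps.
    taps = [v_low + k * lsb for k in range(1, levels)]
    return [1 if vin > tap else 0 for tap in taps]


code4 = flash_quantize(0.40, 0.0, 1.0, bits=4)
code5 = flash_quantize(0.40, 0.0, 1.0, bits=5)
print(len(code4), sum(code4))  # comparator count and number of tripped comparators
print(len(code5), sum(code5))
```

The comparator count doubles (plus one) for each added bit, which is why the 5-bit architectures in the following sections pay more attention to offset, power, and area.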
3.4.3 5-bit 4-GSps Flash Architecture Using Active Averaging Technique
The flash architecture shown in Figure 3.11 includes a front-end T/H, a reference ladder, 31 comparators for quantization, two dummy amplifiers for the preamplifier array, and a thermometer-Gray-binary digital encoder. The comparator also has three stages to improve speed: a continuous-time preamplifier, a first latch with a reset switch, and a second latch with a reset switch.
The resolution of this ADC is 5 bits, and it needs offset averaging for higher accuracy. Active averaging is used in this ADC. The detailed circuit implementation is discussed in Chapter 4 and Chapter 5.
Figure 3.11 5-bit flash ADC with active averaging technique
3.4.4 5-bit 4-GSps Flash Architecture Using Active Interpolation Technique
The flash architecture shown in Figure 3.12 includes the same sub-circuits described in the previous section. To save power and chip area, active interpolation is used in this ADC. This technique reduces the number of amplifiers in the preamplifier stage from 33 to 16. The detailed circuit implementation is discussed in Chapter 4 and Chapter 5.
Figure 3.12 5-bit flash ADC with active interpolation technique
Chapter 4
A 4-bit 4 GSps Flash ADC Circuit Design
Based on the design methodology and circuit architecture presented at the end of Chapter 3, this chapter discusses the circuit implementation and related issues in detail. The required resolution is 4 bits, with a target sampling rate above 3.125 GSps and a maximum of 4 GSps. Section 4.1 addresses the design issues of the ultra high speed front-end T/H. Section 4.2 shows how the preamplifier recovers from large overdrive. Section 4.3 describes the operation of the first latch. Section 4.4 presents the advantages of the clocked S-R latch. Section 4.5 explains how to build the digital encoder from combinational logic. Section 4.6 considers the high speed full-swing clock input and digital code output. Section 4.7 gives the whole-chip design considerations.
Finally, Section 4.8 summarizes the circuits of the 4-bit architecture.
4.1 Front-End Track-and-Hold
The front-end track-and-hold circuit improves the dynamic performance of an ADC. With a front-end track-and-hold, the distributed timing skew problem of the quantizer is alleviated. For gigahertz sampling rate operation, the dynamic requirements, linearity in particular, shift to the T/H circuit design.
The open loop T/H architecture shown in Figure 4.1 is the most popular for high sampling rate. The T/H consists mainly of a sampling switch, a holding capacitor, and unity gain buffers on the input and output. This simple architecture makes it possible to design for very high speed. Since it does not have the benefits of feedback, the accuracy cannot be high.
Figure 4.1 Track-and-Hold in open loop configuration
Differential structure has several benefits over single ended structure. The circuit is less sensitive to common mode noise. The clock-feed-through error is ideally zero. Finally, the even order distortion tones are significantly reduced.
The source follower is the best choice for the unity gain buffer in the open loop structure, but it is difficult to design a fully differential source follower.
Thus, [20] proposed the pseudo-differential T/H shown in Figure 4.2.
Figure 4.2 Pseudo-differential type Track-and-Hold in open loop configuration
The T/H circuit shown in Figure 4.3 precedes the flash quantization [17]. The circuit is constructed as follows. First, dummy switches absorb the signal-dependent charge injection and clock feed-through released from the sampling switches [21][22]. Second, the holding capacitors are made large enough to overcome the gate capacitance variation of the MOSFET. Third, the buffer is built with PMOS transistors whose bulk is connected to the source to suppress the body effect.
Figure 4.3 Track-and-Hold with PMOS constant current source
In Figure 4.3, the constant current source follower is the simplest realization of a unity gain buffer, but it has limitations. When the input to the buffer is fast and has large amplitude, the slew rate at the output of the circuit is limited. Thus, its speed cannot be improved linearly by increasing the bias current.
Although we can suppress the body effect by connecting the bulk to the source, the gain still cannot reach true unity. Due to the finite output resistance, its gain can only
transistor to obtain the unity gain. However, it still has the drawback that a large input swing easily forces the cascade transistor into the triode region, which distorts the buffer output. Furthermore, the non-linearity of the source follower degrades the dynamic performance of the T/H.
Figure 4.4 PMOS push-pull source follower buffer
As discussed above, we use a buffer structure that combines the constant current source follower and the push-pull source follower. Taking NMOS as an example, Figure 4.5 shows three types of pseudo-differential buffers. Figure 4.5(a) has the best linearity but the worst slewing property and less than unity gain. Figure 4.5(b) is a push-pull source follower; it has the best slewing property but the worst linearity, and its gain can be set to unity. Figure 4.5(c) is a new NMOS source follower that combines the two. Properly controlling the ratio of the two source followers gives the desired slewing property. The gain of this buffer approximates unity, and its linearity is sufficient for our T/H specification.
Figure 4.5 (a) NMOS constant current source follower, (b) NMOS push-pull source follower (c) NMOS constant current & push-pull source follower
Based on the new source follower, Figure 4.6 shows the T/H architecture. The input buffer is built with NMOS transistors whose bulk is connected to the source to suppress the body effect, which requires a deep N-well process, and the output buffer is built with PMOS transistors.
Figure 4.6 New pseudo-differential architecture
The full T/H circuit schematic is shown in Figure 4.7. Charge injection and clock feed-through from the sampling switches cause distortion by adding or removing charge on the holding capacitor when the switches disconnect the signal source.
Dummy switches driven by the complement of the switch clock mainly lower the common mode jump. Because both the drain and the source of each dummy switch are connected to the holding capacitor, the dummy switches are initially sized at half the size of the sampling switches and then fine-tuned through simulation.
For ultra high speed operation, the input common mode voltage must be high (1.3 V), and the output common mode voltage of the input buffer, at which sampling takes place, must be low. The output buffer then shifts the common mode voltage of the sampled signal high again. Because the sampling common mode voltage is low (0.5 V), only NMOS transistors are used for the sampling and dummy switches to obtain high speed.
Figure 4.7 Track-and-Hold circuit schematic diagram
4.2 Preamplifier
The main purpose of the preamplifier is to provide information about the difference between input signal and reference voltage generated by a resistor reference ladder. For high speed operation, the preamplifier stage should be wideband with sufficient gain to overcome the comparator offsets. It should also recover from large overdrive within one clock cycle.
Figure 4.8 shows the simplest differential amplifier. It is an open-loop single-pole amplifier with a large gain-bandwidth product. Although adding a reset switch can improve the operation speed, it induces kick-back noise on the reference ladder. Thus we choose a continuous-time amplifier without a reset switch for the preamplifier [17].
Figure 4.8 Open-loop single-pole amplifier
The following analysis addresses the fundamental limitation of an open-loop single-pole amplifier in the overdrive recovery. The preamplifier is completely unbalanced at t=0-. With a step input applied to the preamplifier at t=0+, the output transient is shown in Figure 4.9. The step response of the amplifier is given by
\[ V_{out}(t) = (A \cdot \Delta V + I \cdot R)\,(1 - e^{-t/\tau}) - I \cdot R \qquad (4.1) \]
where A is the voltage gain, ∆V is the voltage difference between the input and the reference tap, I is the tail current, R is the load resistance, and τ is the time constant.
Figure 4.9 Step-input response of preamplifier
Assume that the output step response does not settle within one clock cycle as shown in Figure 4.9. The gain-bandwidth requirement to obtain the desired gain (G = Vout(t=T)/∆V) can be derived from Eq. (4.1). The output voltage at t=T is about 3. The GBW required to obtain a gain of more than unity (G > 1), enough to overcome the offset of the first latch in one period (T = 250 ps) of the 4 GHz sampling rate, is 6.36 GHz.
The higher the DC gain, the lower the required GBW. However, increasing the load resistor for higher gain also lowers the output common mode. This causes the input transistors to enter the triode region and lowers the operation speed of the first latch.
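The overdrive recovery described by Eq. (4.1) can be explored numerically with a small sketch. It assumes the usual single-pole relation GBW = A/(2πτ); the DC gain of 3, the 0.1 V input difference, and the 0.7 V value for I·R (0.5 mA times 1.4 kΩ) are illustrative assumptions chosen for this example, not verified design values.

```python
import math


def preamp_step_response(t, gain, delta_v, i_r, tau):
    """Eq. (4.1): output starts fully unbalanced at -I*R and settles toward A*dV."""
    return (gain * delta_v + i_r) * (1.0 - math.exp(-t / tau)) - i_r


def time_constant_from_gbw(gain, gbw_hz):
    """Single-pole amplifier (assumed model): GBW = A / (2*pi*tau)."""
    return gain / (2.0 * math.pi * gbw_hz)


def recovery_time(gain, delta_v, i_r, tau):
    """Time at which the output of Eq. (4.1) crosses zero after full overdrive."""
    return tau * math.log(1.0 + i_r / (gain * delta_v))


# Illustrative (assumed) operating point.
GAIN, DELTA_V, I_R = 3.0, 0.1, 0.7
T_CLK = 250e-12                      # one period at 4 GHz
tau = time_constant_from_gbw(GAIN, 6.36e9)

print("tau [ps]:", tau * 1e12)
print("zero-crossing time [ps]:", recovery_time(GAIN, DELTA_V, I_R, tau) * 1e12)
print("Vout(T) [V]:", preamp_step_response(T_CLK, GAIN, DELTA_V, I_R, tau))
```

The zero-crossing time grows only logarithmically with the overdrive ratio I·R/(A·∆V), which is why a wideband single-pole stage can recover within one clock period even from a fully unbalanced state.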
Based on the simple open-loop single-pole amplifier, Figure 4.10 shows our fully differential preamplifier. The input signal and reference voltage are all differential.
Figure 4.10 Fully differential open-loop single-pole preamplifier
[Figure 4.10 labels: inputs Vin1, Vin2; references Vref1, Vref2; outputs Vo1, Vo2; bias Vbn; 1.4k (two resistors); 0.5mA (two current sources)]
4.3 First Latch
The latch shown in Figure 4.11 operates at a 1.35 GHz clock rate in a 0.35-um CMOS process [17]. The output of the preamplifier drives the first latch, which must provide a large enough output swing to the second latch even in the worst case. As with the preamplifier, overdrive recovery limits the highest ADC clock frequency. Because the continuous-time preamplifier isolates the reference ladder from kick-back noise, a reset switch can be inserted between the two output nodes to optimize power at the highest clock frequency.
Figure 4.11 First latch circuit
While the track-and-hold is in track mode, the reset switches are turned on to erase the residual voltage from the previous overdrive. During this reset mode, the output is reset through two parallel discharge paths for fast overdrive recovery. When the track-and-hold is in hold mode, the reset switches are turned off. During this regeneration mode, the differential pair (M1, M2, and M9) configured from the cross-coupled inverters (M1-M4) steers the tail current from one side to the other, speeding up
Figure 4.12 Two first latch operation modes: (a) reset mode, (b) regeneration mode
4.4 Second Latch
Although the first latch provides a large enough output swing, it does not reach rail-to-rail logic levels, so another latch is needed to provide rail-to-rail swing. This has some drawbacks. First, a second latch array consumes more power and area. Second, the second latch may not reach rail-to-rail levels at a high clock rate. Combining the clocked latch with a continuous-time S-R latch makes it easy to reach rail-to-rail logic levels and obtain fast overdrive recovery [12][17].
Based on the first latch architecture, the second latch is designed without the tail current to save power in regeneration mode. Replacing the back-to-back inverters with two cross-coupled PMOS transistors lowers the parasitic capacitance and improves the regeneration speed. Figure 4.13 shows the second latch. When the second latch output is taken single-ended, a dummy inverter must be added at the unused node to balance the regeneration speed.
Figure 4.13 Second latch circuit
[Figure labels: inputs Vin1, Vin2; outputs Vo1, Vo2; CLK; transistors M1 to M6]
4.5 Digital Encoder
In many converters, different internal coding schemes are used before the final binary code is generated. The function of the digital encoder in a flash ADC is to convert the thermometer code into binary code. Many digital encoding schemes have been developed to suppress glitch errors caused by the meta-stability of the comparators [24][25] and by bubbles in the thermometer code [26][27].
Meta-stability errors occur when non-binary comparator levels drive the digital encoder and produce senseless outputs. The meta-stable state can be suppressed by increasing the clock period and/or the gain during the regeneration phase. In our work, cascading two latch stages lowers the meta-stable error at the same clock period.
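The dependence of meta-stability on clock period and gain can be illustrated with the common first-order latch model, in which the differential output grows as V(t) = V0·exp(t/τ) during regeneration. This model, the function name, and the numbers below (10 ps regeneration time constant, 1 V logic swing) are assumptions for illustration only, not values from this design.

```python
import math


def regeneration_time(v_initial, v_logic, tau_regen):
    """Time for an ideal regenerative latch to grow v_initial up to v_logic.

    Assumes exponential regeneration V(t) = v_initial * exp(t / tau_regen).
    """
    return tau_regen * math.log(v_logic / v_initial)


TAU = 10e-12      # assumed regeneration time constant
V_LOGIC = 1.0     # assumed valid logic swing

for v0 in (10e-3, 1e-3, 0.1e-3):
    t = regeneration_time(v0, V_LOGIC, TAU)
    print(f"initial {v0 * 1e3:.1f} mV -> {t * 1e12:.1f} ps to resolve")
```

Under this model, every tenfold reduction of the initial latch input costs about 2.3 time constants of extra regeneration time, so more gain ahead of the latch or a second regenerating stage shrinks the range of inputs that fail to resolve in the available time.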
There are three major sources of bubble errors. First, an overall input-referred random offset greater than 0.5 LSB can swap the order of two adjacent thresholds. Second, without a front-end track-and-hold, the zero-crossings of the comparators occur at different time delays. Third, the propagation delay differs from one comparator path to another. In our work, the analysis of the random offset decides whether averaging is needed. Adding the front-end track-and-hold circuit eliminates the clock-skew-dependent bubble errors.
Inserting reset switches in the first and second latches suppresses the propagation-delay-dependent bubble errors.
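The first bubble source can be made concrete with a small sketch: when two adjacent thresholds each move by more than 0.5 LSB toward each other, their order swaps and an input between them produces a 1 above a 0. The threshold values, the 0.6 LSB offsets, and the helper names are hypothetical and chosen only to show the effect.

```python
def thermometer(vin, thresholds):
    """Compare vin against each threshold, lowest nominal threshold first."""
    return [1 if vin > th else 0 for th in thresholds]


def has_bubble(code):
    """A bubble is a 1 sitting above a 0 in the thermometer code."""
    return any(lo == 0 and hi == 1 for lo, hi in zip(code, code[1:]))


LSB = 1.0 / 16
ideal = [k * LSB for k in range(1, 16)]     # 4-bit ladder, 15 thresholds

skewed = list(ideal)
skewed[6] += 0.6 * LSB   # threshold 7 shifted up by more than 0.5 LSB
skewed[7] -= 0.6 * LSB   # threshold 8 shifted down by more than 0.5 LSB

vin = 7.5 * LSB          # input between the two swapped thresholds
print(thermometer(vin, ideal), has_bubble(thermometer(vin, ideal)))
print(thermometer(vin, skewed), has_bubble(thermometer(vin, skewed)))
```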
Besides the analog methods described above, Gray encoding can also suppress meta-stability and bubble errors. The probability of meta-stable states is lowered because in Gray encoding no signal is applied to more than one input.
That allows the use of pipelining to increase the time for regeneration. The effect of bubbles is reduced because the accuracy of the Gray code degrades gradually as more bubbles appear in the thermometer code. Table 4-1 shows the correspondence among thermometer, Gray, and binary codes of 4-bit.
Binary Code Gray Code Thermometer Code
0000 0000 000000000000000
0001 0001 000000000000001
0010 0011 000000000000011
0011 0010 000000000000111
0100 0110 000000000001111
0101 0111 000000000011111
0110 0101 000000000111111
0111 0100 000000001111111
1000 1100 000000011111111
1001 1101 000000111111111
1010 1111 000001111111111
1011 1110 000011111111111
1100 1010 000111111111111
1101 1011 001111111111111
1110 1001 011111111111111
1111 1000 111111111111111
Table 4-1 Binary-Gray-thermometer code implementation
At ultra high speed operation, logic based encoder is more suitable than ROM based encoder [25]. Logic based encoder can be pipelined easily for higher speed operation. Each thermometer bit influences only one Gray bit (as shown in the 4-bit example in Figure 4.14) [28]. Extra delay cells are added to match the delay difference among the signal paths. The Gray code is converted to binary code using two-input EXOR cells with delay matching. To operate at 4GHz sampling rate and beyond, the D-flip-flops are implemented with true single-phase clocked (TSPC) circuits [29].
Figure 4.14 Digital encoder
Although this encoder is very fast, it is still not fast enough to operate at 4 GHz.
The propagation delay of the critical path is longer than one period (250 ps), so the circuit must be pipelined. The modified digital encoder is shown in Figure 4.15.
Extra delay cells (shaded lines) are added to compensate for the delay mismatch.
Figure 4.15 Pipelined digital encoder
[Figure labels: thermometer inputs Th1, Th2, Th3b, Th4, Th5, Th6b, Th7b, Th8, Th9, Th10, Th11b, Th12b, Th13, Th14b, Th15b; Gray0 to Gray3; Bin0 to Bin3; CLK; delay cells]
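The thermometer-Gray-binary conversion of Table 4-1 can be checked with a short behavioral sketch. The Boolean equations below are one standard way to realize the mapping so that each thermometer bit feeds exactly one Gray bit, and the Gray-to-binary step uses a chain of two-input XORs. The function names are hypothetical, and the sketch ignores pipelining, delay cells, and all timing.

```python
def thermometer_to_gray(th):
    """th[k] is thermometer bit Th(k+1), k = 0..14. Returns (G3, G2, G1, G0)."""
    t = [None] + list(th)          # shift to 1-based indexing: t[1]..t[15]

    def inv(x):
        return 1 - x

    g3 = t[8]
    g2 = t[4] & inv(t[12])
    g1 = (t[2] & inv(t[6])) | (t[10] & inv(t[14]))
    g0 = ((t[1] & inv(t[3])) | (t[5] & inv(t[7]))
          | (t[9] & inv(t[11])) | (t[13] & inv(t[15])))
    return g3, g2, g1, g0


def gray_to_binary(gray):
    """Convert Gray bits (MSB first) to binary with a chain of two-input XORs."""
    g3, g2, g1, g0 = gray
    b3 = g3
    b2 = b3 ^ g2
    b1 = b2 ^ g1
    b0 = b1 ^ g0
    return b3, b2, b1, b0


def encode(th):
    """Thermometer code (Th1 first) to integer output code."""
    b3, b2, b1, b0 = gray_to_binary(thermometer_to_gray(th))
    return (b3 << 3) | (b2 << 2) | (b1 << 1) | b0


for value in range(16):
    th = [1 if k <= value else 0 for k in range(1, 16)]
    print(value, thermometer_to_gray(th), encode(th))
```

Note that the list passed to these functions starts with Th1, whereas Table 4-1 prints the thermometer code with the highest bit on the left.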
4.6 Clock Generator and Output Driver
In this ADC, clock jitter randomly modulates the periodic sampling instants of the T/H. Non-uniform sampling raises the noise floor of the digitized system and degrades the signal-to-noise ratio (SNR). Clock jitter is a major concern for high-speed ADCs. Given a sinusoidal input waveform with amplitude A and radian frequency ω, the SNR due to clock jitter only is \(\mathrm{SNR} = -20\log_{10}(\omega\,\sigma_t)\), where \(\sigma_t\) is the rms clock jitter. To reach 25.84 dB (the ideal SNR for 4-bit quantization of a sine wave) at an input frequency of 2 GHz, the rms clock jitter should be less than 4 ps.
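The jitter budget can be reproduced with a few lines. The sketch assumes the standard jitter-limited expression SNR = -20·log10(ω·σ) for a full-scale sine wave and the ideal quantization SNR of 6.02·N + 1.76 dB; the function names are hypothetical.

```python
import math


def ideal_snr_db(bits):
    """Ideal quantization SNR of an N-bit converter for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def jitter_snr_db(f_in_hz, sigma_t_s):
    """SNR limited by rms clock jitter only (assumed standard expression)."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)


def max_rms_jitter(f_in_hz, snr_db):
    """Largest rms jitter for which the jitter-limited SNR still meets snr_db."""
    return 10.0 ** (-snr_db / 20.0) / (2.0 * math.pi * f_in_hz)


target = ideal_snr_db(4)
sigma = max_rms_jitter(2e9, target)
print(f"target SNR: {target:.2f} dB")
print(f"max rms jitter at 2 GHz: {sigma * 1e12:.2f} ps")
```

With a 2 GHz input and a 4-bit target, the bound comes out slightly above 4 ps, consistent with the requirement stated above.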
Low-noise methods taken from analog circuit design are applied to the clock generator (see Figure 4.16). The circuit converts a differential sine wave input with 600 mV amplitude into a two-phase full-swing clock. It is followed by stages of ratioed inverters that drive the clock load of this ADC.
Figure 4.16 Clock generator
To drive the output load at ultra high speed, the driving circuit uses an open-drain configuration (Figure 4.17). Each single-ended mode with an output swing of 200 mV needs 18 mA of driving current.
Figure 4.17 Open-drain output driver
[Figure labels: binary code input; off-chip 50 Ω; 2 nH; 1 pF; 0.8 pF]
4.7 Whole Chip Design Issues
The chip floorplan is shown in Figure 4.18.
Figure 4.18 Layout floorplan
To avoid large supply and ground bounce, decoupling capacitors are added in the analog and digital blocks. The power supply basically divides the chip layout into three domains. The large distance between the analog and digital circuits reduces the noise coupled from digital to analog. The sensitive analog lines are kept as short as possible.
The guard ring and device matching techniques are implemented as well. The chip layout is shown in Figure 4.19.
[Figure 4.18 labels: Digital; Reference Ladder; Analog; Decoupling Capacitance; noise and sensitivity to noise, each ranging from large to small]
Figure 4.19 Layout of the 4-bit 4GSps flash ADC
4.8 Summary
The key features of this work are summarized as follows. A front-end T/H enables these flash ADCs to handle beyond-Nyquist inputs at conversion rates up to 4 GHz. The continuous-time preamplifier provides low kick-back noise. Reset switches in the first and second latches give fast overdrive recovery. The second latch is the fastest such CMOS circuit with rail-to-rail output. Replacing the ROM-based encoder with a logic-based encoder and pipelining the encoder improves the operation speed. Using a thermometer-Gray-binary digital encoder lowers the bubble errors.
Chapter 5
A 5-bit 4 GSps Flash ADC Circuit Design
Based on the 4-bit 4 GSps architecture and circuits described in Chapter 4, this chapter discusses two 5-bit architectures and related issues in detail. The required resolution is 5 bits, with a target sampling rate above 3.125 GSps and a maximum of 4 GSps. Section 5.1 presents the design issues of the higher-gain preamplifier for the comparators. Section 5.2 introduces how to use active averaging to improve accuracy, and Section 5.3 introduces how to use active interpolation to add one more bit. Section 5.4 compares the circuit differences between the 4-bit and 5-bit architectures.
5.1 Preamplifier
The preamplifiers described in previous chapter use passive load. Although using passive load has larger GBW than using active load, it is hard to increase gain