A Symbol-Rate Timing Synchronization Method for
Low Power Wireless OFDM Systems
Jui-Yuan Yu, Ching-Che Chung, and Chen-Yi Lee
Abstract—This work addresses power reduction and performance improvement for wireless orthogonal frequency-division multiplexing (OFDM) systems using a dynamic sample-timing controller (DSTC) and a phase-tunable clock generator (PTCG). A receiver applying the proposed DSTC algorithm searches for the optimal sampling phase at the symbol rate, instead of at the Nyquist rate (or higher), to avoid the extra power consumed by high-rate operations. The proposed PTCG circuit provides the desired clock phase for optimum sampling to improve system performance. Both the DSTC and the PTCG are evaluated in a multiband OFDM (MB-OFDM) ultra-wideband system. Simulation results indicate that the overall system performance improves by 1.7 dB in signal-to-noise ratio at a packet error rate of 8%, and the total baseband power is reduced by 40%.
Index Terms—Dynamic sample-timing controller (DSTC), orthogonal frequency-division multiplexing (OFDM), phase-tunable clock generator (PTCG), synchronization, ultra-wideband.
The tradeoff between system performance and power dissipation is one of the most critical issues in the design of wireless portable devices. Timing synchronization plays an important role in ensuring good signal decoding performance, since it determines the sampling timing and frequency of the analog-to-digital converter (ADC) for incoming signals or packets. Existing design approaches apply multirate sampling (at the Nyquist rate or higher than the symbol rate) to the incoming waveform, using a fixed high-rate clock source to drive the ADC circuit. The high-rate samples are then processed by an interpolation algorithm to yield a symbol-rate signal stream for data decoding. This methodology is increasingly difficult to apply to power-constrained portable devices, because both the ADC circuits and the interpolation circuits operate at a higher processing rate, resulting in higher power consumption.
To enable power reduction with symbol-rate sampling, both Mueller–Muller detection (MMD) and MMD-based timing recovery methods have been proposed under a pulse amplitude modulation (PAM) scheme to search for the best sampling timing within a sample period. The literature also explores timing synchronization in orthogonal frequency-division multiplexing (OFDM) systems based on a search for the best block boundary of each fast Fourier transform (FFT) window. However, those studies do not guarantee that the signals in each block are sampled at the best sampling timing. Accordingly, multirate sampling schemes have been developed to maintain system performance; however, their high-rate operations significantly increase power dissipation.
Manuscript received August 14, 2007; revised November 5, 2007 and December 5, 2007. First published May 23, 2008; current version published September 12, 2008. This work was supported by MOEA of Taiwan, R.O.C., under Grant 95-EC-17-A-03-S1-0005. This paper was recommended by Associate Editor L. Larson.
The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: blues@si2lab.org).
Digital Object Identifier 10.1109/TCSII.2008.923405
To maintain system performance while reducing power dissipation, this work presents a dynamic sample-timing control (DSTC) scheme for symbol-rate synchronization in OFDM systems, in which the optimal sampling timing within a symbol period is calculated. Unlike multirate sampling methods, the DSTC requires auxiliary circuitry in the clock source to generate a phase-tunable clock waveform that corresponds to the best sampling instant calculated by the DSTC. A digitally controlled oscillator (DCO) design concept is applied to the phase-tunable clock generator (PTCG) to enable this symbol-rate DSTC for low-power wireless applications.
The rest of this paper is organized as follows. Section II presents an overview of the proposed system. Section III then derives the proposed DSTC algorithm. Section IV shows the design of the proposed PTCG. Section V analyzes the system performance and the hardware design complexity of our proposal.
OFDM signals generated by an $N$-point inverse discrete Fourier transform (IDFT) and passed through digital-to-analog conversion (DAC) are expressed as (1), where the information symbol stream is encoded with phase-shift keying (PSK) or quadrature amplitude modulation (QAM), and $T$ is the sample period. During up/down conversion and analog/digital data conversion, the signal suffers from nonideal hardware distortion, including the filter responses of both the transmitter (TX) and the receiver (RX). Therefore, the down-converted signals in a receiver are given by (2), where $f(t)$ is the combined TX and RX filter response, $\Delta f$ is the carrier frequency offset (CFO) between the TX and RX, and the remaining term is additive white Gaussian noise (AWGN). After the ADC circuits, the signals in the digital time domain are given by (3),
Fig. 1. Block diagram of the proposed baseband receiver with the aid of the proposed DSTC and PTCG.
Fig. 2. SIR power ratio versus sampling timing error $\varepsilon$ with $f(t)$ = raised-cosine filter (roll-off factor 0.5).
where $\varepsilon$ is the sampling phase offset, expressed as a fraction of the sample period, and $\delta(\cdot)$ is an impulse function. Once a packet has been detected, the DSTC is activated and commands the PTCG to generate the optimal clock phase for signal sampling in the ADCs. The signals then follow the conventional decoding flow. Fig. 1 depicts the system block diagram and its operation.
The goal of this algorithm is to determine a signal sampling instant, with the sampling rate equal to the symbol rate $1/T$, at which the intersymbol interference (ISI) associated with the filter pulse response is minimized. Hence, the optimum sampling instant is defined as

$$\varepsilon_{\mathrm{opt}}=\arg\max_{\varepsilon}\frac{|f_\varepsilon[0]|^2}{\sum_{m\neq 0}|f_\varepsilon[m]|^2} \tag{4}$$

where $f_\varepsilon[m]$ is a simplified notation for $f((m+\varepsilon)T)$, and the ratio is the signal-to-ISI power ratio (SIR). Thus, $\varepsilon_{\mathrm{opt}}$ is found where the ISI power sum in the denominator of (4) is minimized; in other words, the SIR of the sampled signals is maximized when the optimum sampling instant is chosen. Here, $f(t)$ is modeled as a raised-cosine filter impulse response with a roll-off factor of 0.5, as shown in Fig. 2. An uncalibrated sampling timing error may yield poor signal integrity even in the absence of noise, which degrades system performance when the sampling time is not well calculated.
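The SIR criterion in (4) can be checked numerically. The sketch below is a minimal illustration, not the authors' tool: it assumes $f(t)$ is an ideal raised-cosine pulse with roll-off 0.5 (as in Fig. 2), truncates it to ±32 taps, and evaluates the SIR at eight candidate offsets spaced $T/8$ apart. The helper names `raised_cosine` and `sir` are hypothetical.

```python
import numpy as np


def raised_cosine(t, beta=0.5):
    """Raised-cosine impulse response f(t); t is in units of the sample period T."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)          # t = +/- 1/(2*beta) gives 0/0
    safe = np.where(singular, 1.0, denom)
    f = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    # Replace the 0/0 points by the analytic limit (pi/4) * sinc(1/(2*beta)).
    return np.where(singular, (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta)), f)


def sir(eps, beta=0.5, taps=32):
    """Signal-to-ISI power ratio of (4): |f_eps[0]|^2 / sum_{m != 0} |f_eps[m]|^2."""
    m = np.arange(-taps, taps + 1)
    f = raised_cosine(m + eps, beta)           # f_eps[m] = f((m + eps) T)
    main = f[taps] ** 2                        # index `taps` corresponds to m = 0
    isi = np.sum(f[m != 0] ** 2)
    # At eps = 0 the ISI vanishes up to rounding, so guard the division.
    return main / max(isi, np.finfo(float).tiny)


eps_grid = np.arange(-4, 4) / 8.0              # -0.5, -0.375, ..., 0.375 (in units of T)
# The ideal eps = 0 case is unbounded, so clip it to 120 dB for display.
sir_db = [min(10.0 * np.log10(sir(e)), 120.0) for e in eps_grid]
for e, s in zip(eps_grid, sir_db):
    print(f"eps = {e:+.3f} T   SIR = {s:7.2f} dB")
```

The SIR falls off steeply away from $\varepsilon = 0$ and drops below 0 dB at $\varepsilon = \pm 0.5$, where the main tap and its neighbor carry equal power.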
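Accordingly, the absolute-squared value of the received signals is

$$|x_{R,\varepsilon}[n]|^2=\Big|\sum_m x_{T,\varepsilon}[n-m]\,f_\varepsilon[m]\,e^{j2\pi\Delta f(n+\varepsilon)T}\Big|^2+|w_{B,\varepsilon}[n]|^2+2\,\mathrm{Re}\Big\{w_{B,\varepsilon}^{*}[n]\sum_m x_{T,\varepsilon}[n-m]\,f_\varepsilon[m]\,e^{j2\pi\Delta f(n+\varepsilon)T}\Big\} \tag{5}$$

where $w_{B,\varepsilon}[n]$ is the band-limited zero-mean additive noise sampled at timing offset $\varepsilon$. Notably, $w_{B,\varepsilon}[n]$ is assumed to be independent of the transmitted signals, so the cross term in (5) vanishes in expectation, and the expected value of the received signal power is

$$E\{|x_{R,\varepsilon}[n]|^2\}=E\Big\{\Big|\sum_m x_{T,\varepsilon}[n-m]\,f_\varepsilon[m]\Big|^2\Big\}+\sigma_B^2 \tag{6}$$

where $\sigma_B^2$ represents the power of the colored noise $w_{B,\varepsilon}[n]$. The absolute-value operation removes the CFO factor, since $e^{j2\pi\Delta f(n+\varepsilon)T}$ does not depend on $m$ and has unit magnitude. Therefore, the expected received signal power consists of the transmitted signals filtered by $f_\varepsilon$ and the band-limited noise power. The effect of $f_\varepsilon$ on the transmitted signals is expressed through the main signal tap $f_\varepsilon[0]$ and the filter interference $f_\varepsilon[m]$, $m\neq 0$. Moreover, the expected power $E\{|x_{T,\varepsilon}[n]|^2\}$ may be assumed to be a constant, say unit power, because the received signal power is adjusted by an automatic gain control (AGC) mechanism that normalizes the signal power to the dynamic range of the ADC. For simplicity, $E\{|x_{T,\varepsilon}[n]|^2\}=1$ is defined. Equation (6) becomes

$$E\{|x_{R,\varepsilon}[n]|^2\}=\sum_m\sum_k E\{x_{T,\varepsilon}[n-m]\,x_{T,\varepsilon}^{*}[n-k]\}\,f_\varepsilon[m]\,f_\varepsilon^{*}[k]+\sigma_B^2 \tag{7}$$

These information symbols are assumed to be independent, so $E\{x_{T,\varepsilon}[n-m]\,x_{T,\varepsilon}^{*}[n-k]\}=0$ for $m\neq k$. Therefore, (7) reduces to

$$E\{|x_{R,\varepsilon}[n]|^2\}=\sum_m |f_\varepsilon[m]|^2+\sigma_B^2 \tag{8}$$

Consequently, the expected absolute-squared value of the received signals is determined by the power of both the filter response and the noise. Based on the SIR definition, (8) is rewritten as

$$E\{|x_{R,\varepsilon}[n]|^2\}=|f_\varepsilon[0]|^2\left(1+\frac{1}{\mathrm{SIR}(\varepsilon)}\right)+\sigma_B^2,\qquad \mathrm{SIR}(\varepsilon)=\frac{|f_\varepsilon[0]|^2}{I(\varepsilon)} \tag{9}$$

where $I(\varepsilon)=\sum_{m\neq 0}|f_\varepsilon[m]|^2$ is the interference power of the filter tail. From (9), a characteristic function (CF) of $\varepsilon$, $\mathrm{CF}(\varepsilon)$, is defined. A sharper CF curve makes sampling timing errors easier to recognize and calibrate. Fig. 3 plots the CF curve that corresponds to the raised-cosine filter of Fig. 2 in a noiseless channel. It shows that the maximum $\mathrm{CF}(\varepsilon)$ indicates the optimum sampling instant $\varepsilon_{\mathrm{opt}}$. Therefore, the search based on the SIR curve in (4) is transferred to a search for the maximum $\mathrm{CF}(\varepsilon)$, i.e.,

$$\varepsilon_{\mathrm{opt}}=\arg\max_{\varepsilon}\,\mathrm{CF}(\varepsilon) \tag{10}$$

Each sample period is divided into eight phases, as shown in Fig. 3, to balance the finite hardware resolution against CF degradation. Consequently, the best of these eight positions always yields a CF value close to the maximum. The next section describes the design of the eight-phase clock generator.
Fig. 3. CF for timing error search.
Fig. 4. Proposed PTCG.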
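The eight-phase search in (10) can be sketched numerically. This is not the authors' implementation: it assumes that $\mathrm{CF}(\varepsilon)$ is the noise-free, $\varepsilon$-dependent term of (8), $\sum_m |f_\varepsilon[m]|^2$, with the raised-cosine pulse of Fig. 2, and it models an unknown channel delay (hypothetical parameter `delay`, in units of $T$) that sets the residual timing error seen by each PTCG phase. The helpers `cf` and `best_phase` are hypothetical.

```python
import numpy as np


def raised_cosine(t, beta=0.5):
    """Raised-cosine impulse response f(t); t is in units of the sample period T."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)
    safe = np.where(singular, 1.0, denom)
    f = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    return np.where(singular, (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta)), f)


def cf(eps, beta=0.5, taps=32):
    """Assumed CF: noise-free expected power from (8), sum_m |f((m + eps) T)|^2."""
    m = np.arange(-taps, taps + 1)
    return float(np.sum(raised_cosine(m + eps, beta) ** 2))


def best_phase(delay, n_phases=8, beta=0.5):
    """Return the phase index k (sampling at (n + k/n_phases) T) that maximizes the CF,
    for a hypothetical pulse delay `delay` given as a fraction of T."""
    scores = []
    for k in range(n_phases):
        # Residual timing error of candidate k, wrapped into [-0.5, 0.5).
        eps = (k / n_phases - delay + 0.5) % 1.0 - 0.5
        scores.append(cf(eps, beta))
    return int(np.argmax(scores)), scores


k, scores = best_phase(delay=0.3)
print("selected phase:", k)
print("CF at the selected phase:", round(scores[k], 4))
print("worst-case CF with eight phases (|eps| = 1/16):", round(cf(1 / 16), 4))
```

Because the nearest of eight phases is never more than $T/16$ from the optimum, the CF loss is bounded by the value at $|\varepsilon| = 1/16$, which is the quantization argument given above for using eight phases.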
An all-digital PTCG provides eight clock sampling candidates for phase selection and outputs the one corresponding to $\varepsilon_{\mathrm{opt}}$ calculated in (10). Phase tuning in the PTCG completes within a few cycles, and the clock output remains glitch-free during this tuning period. Fig. 4 presents the proposed PTCG, which primarily consists of an all-digital phase-locked loop (ADPLL), a time-to-digital converter (TDC), and a cell-based delay line. Initially, the ADPLL is locked to the target frequency with period $T$. This generated clock is used as the reference source for multiphase clock generation.
In an earlier delay-locked loop (DLL)-based multiphase clock generation approach, the TDC locks a delay line to a single clock period $T$, giving a delay of $T/8$ in each delay stage. In a high-speed cell-based DLL design, however, it is difficult to maintain such a short delay and a high resolution simultaneously. Thus, in this design, the TDC measures three periods and makes the DLL lock to $3T$. After the DLL is locked, each delay stage presents a delay of $3T/8$. Hence, the minimum delay constraint for each delay stage (D) is relaxed to three times its original value. Moreover, the numerator and denominator of the delay fraction 3/8 are coprime. As a result, the phase generated after each delay cell corresponds to a unique fraction of the period.
Fig. 5. Proposed TDC in the PTCG.
Fig. 6. Packet frame used for the DSTC computation.
Fig. 7. (a) SNR required at PER = 8% and the probability of estimating a timing error $\varepsilon$. (b) Overall system PER of the proposed DSTC and the 2×-interpolation design schemes.
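The benefit of locking to $3T$ rather than $T$ can be checked with exact fractions. The sketch below uses a hypothetical helper `tap_phases` and assumes an eight-stage line with uniform stage delay; which physical tap is labeled P0 to P7 in the chip is not specified here. It reproduces the $T/8 \approx 237$-ps phase step at 528 MHz and shows that a non-coprime ratio such as 2/8 would repeat phases.

```python
from fractions import Fraction

F_CLK_HZ = 528e6                 # PTCG output frequency (528 MHz)
T_PS = 1e12 / F_CLK_HZ           # sample period T in picoseconds


def tap_phases(num=3, den=8):
    """Phase (as a fraction of T) at the output of each of `den` stages when the
    delay line is locked to num*T, so every stage delays num*T/den."""
    return [Fraction((k * num) % den, den) for k in range(1, den + 1)]


phases = tap_phases()
print("stage delay: %.1f ps" % (3 * T_PS / 8))
print("phase step:  %.1f ps" % (T_PS / 8))
print("tap phases (x T):", [str(p) for p in phases])
print("unique phases with 3T lock:", len(set(phases)))
print("unique phases with 2T lock:", len(set(tap_phases(num=2))))
```

Because 3 and 8 share no common factor, the eight stage outputs visit every multiple of $T/8$ exactly once, while each physical stage only needs to provide about 710 ps of delay.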
Fig. 5 shows the proposed TDC design architecture. The TDC takes the input PLL528 from the ADPLL. From this PLL528,
in a variable RANGE to the PTCG controller. According to the RANGE, the controller determines whether the periods of both PLL528 and PULSE_IN are correctly generated to avoid a false lock in this loop. Then, the phase detector (PD) of the PTCG continues fine tuning the delay of the delay elements to improve the accuracy of the output phase position.
An example is shown here. The delay between the
PLL528 and P0 is .
Therefore, the P0 phase shift to the PLL528 is
. The clocks are
generated accordingly. The PTCG receives the estimated timing error from the DSTC as Forward or Backward commands and selects the proper clock phase for ADC sampling. To avoid glitches in CLK528, a Forward command is cyclically converted into several Backward commands by a glitch-free controller, namely a phase rotator block.
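The Forward-to-Backward conversion can be illustrated with a small model of the phase-rotator index sequence. This is a hypothetical sketch (`forward_as_backward` is not from the paper): it assumes phases are indexed 0 to 7, with Forward meaning the next higher index as in the P5-to-P6 example of Fig. 8(a), and it models only the PH_SEL index, not the clock waveform itself.

```python
def forward_as_backward(current, n_phases=8):
    """PH_SEL sequence that realizes one Forward step (current -> current + 1)
    using only Backward (decrement-by-one) steps with cyclic wrap-around."""
    target = (current + 1) % n_phases
    sequence = []
    sel = current
    while sel != target:
        sel = (sel - 1) % n_phases   # one Backward step to an earlier phase
        sequence.append(sel)
    return sequence


print(forward_as_backward(5))   # [4, 3, 2, 1, 0, 7, 6]
```

One Forward step therefore costs seven Backward steps, trading a few cycles of tuning latency for a switching rule that never jumps to a later clock edge.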
V. SIMULATION AND MEASUREMENT RESULTS
The proposed DSTC and PTCG are evaluated in a multiband OFDM (MB-OFDM)-based ultra-wideband (UWB) system with a low-density parity-check (LDPC) code for error correction. The signal bandwidth is 528 MHz with quadrature phase-shift keying (QPSK) and OFDM modulation, and the maximum data rate of 480 Mb/s is selected in the following simulations.
The dynamic timing recovery starts its search right after a packet is detected. The packet synchronization sequence (Packet Sync Seq) at the beginning of each preamble consists of 21 OFDM symbols, which are used for the DSTC computation as shown in Fig. 6. Each of these 21 identical OFDM symbols yields one absolute-squared sum, and the sampling phase is changed only in the time slots between OFDM symbols. In other words, the PTCG changes its output clock phase only during the time slots associated with band transitions, so that all samples within an OFDM symbol are taken with the same clock phase.
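The preamble-based measurement can be sketched as a small Monte Carlo model: each Packet Sync Seq symbol is sampled with one clock phase, its absolute-squared sum is accumulated for that phase, and the phase with the largest average power is chosen, which by (8) tracks the filter power plus a constant noise power. The round-robin phase schedule, the unit-power QPSK stand-in for the preamble samples, the 128 samples per symbol, and the function names are all assumptions for illustration.

```python
import numpy as np


def raised_cosine(t, beta=0.5):
    """Raised-cosine impulse response f(t); t is in units of the sample period T."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)
    safe = np.where(singular, 1.0, denom)
    f = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    return np.where(singular, (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta)), f)


def dstc_search(n_symbols=21, samples_per_symbol=128, n_phases=8,
                snr_db=10.0, beta=0.5, taps=16, seed=0):
    """Estimate the average received power per clock phase over the preamble and
    return (best phase index, per-phase estimates). Hypothetical model only."""
    rng = np.random.default_rng(seed)
    m = np.arange(-taps, taps + 1)
    noise_var = 10.0 ** (-snr_db / 10.0)          # signal power is normalized to 1
    power_sum = np.zeros(n_phases)
    n_samples = np.zeros(n_phases)
    for s in range(n_symbols):
        k = s % n_phases                           # one phase for the whole symbol
        h = raised_cosine(m + k / n_phases, beta)  # f_eps[m] with eps = k / n_phases
        n_in = samples_per_symbol + 2 * taps       # margin so every output sees all taps
        x = (rng.choice([-1.0, 1.0], n_in)
             + 1j * rng.choice([-1.0, 1.0], n_in)) / np.sqrt(2.0)
        y = np.convolve(x, h, mode="valid")        # y[n] = sum_m x[n - m] f_eps[m]
        w = np.sqrt(noise_var / 2.0) * (rng.standard_normal(y.size)
                                        + 1j * rng.standard_normal(y.size))
        power_sum[k] += np.sum(np.abs(y + w) ** 2) # absolute-squared sum of the symbol
        n_samples[k] += y.size
    cf_est = power_sum / np.maximum(n_samples, 1)  # average power per phase
    return int(np.argmax(cf_est)), cf_est


best, cf_est = dstc_search()
print("DSTC decision (phase index):", best)
print("estimated power per phase:", np.round(cf_est, 3))
```

With only 21 symbols at moderate SNR, the decision can fall on a phase adjacent to the optimum, which is why the DSTC outcome is better described by a decision probability than by a single result.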
Fig. 7 plots the overall system performance. The curve $\mathrm{SNR}(\varepsilon)$ in Fig. 7(a) represents the signal-to-noise ratio (SNR) required to reach a packet error rate (PER) of 8% when whole packets are sampled at a fixed, identical sampling offset $\varepsilon$. When the DSTC algorithm is applied, the optimal sampling instant is sought during the preamble, and before the end of the preamble the DSTC decides which timing instant is best for sampling in terms of system performance. Since the DSTC operates in a noisy environment, it does not always choose the best sampling instant. Consequently, the curve $P(\varepsilon)$ represents the probability of each final decision made by the DSTC, and the SNR that the proposed system requires to reach PER = 8% is given by

$$\mathrm{SNR}_{\mathrm{req}}=\sum_{\varepsilon}P(\varepsilon)\,\mathrm{SNR}(\varepsilon)$$
On the other hand, the system with the interpolation scheme takes two samples (a sample pair) within each symbol period for timing synchronization. Although the signals interpolated from the sample pairs are noise-averaged, one sample of each pair always suffers from stronger ICI effects, which degrades the signal quality. Therefore, this interpolation-based approach does not outperform our proposal, in which signals are sampled at the optimal instant. Moreover, the interpolation approaches in the existing literature do not support phase tuning, so the probability function in this case can be regarded as a uniform distribution. Fig. 7(b) plots the system performance of the proposed DSTC-PTCG and the interpolation schemes.

Fig. 8 shows both the simulated and measured waveforms of the PTCG design. The PTCG provides eight clock phases operating at 528 MHz, and consecutive phases are separated by about 237 ps. As shown in Fig. 8(a), the output CLK528 is initially aligned to P5. When a Forward command is asserted, the output clock phase selected by the multiplexer (PH_SEL) counts down to zero and cyclically rotates back to P7 and then P6. When the targeted clock phase is reached, a phase-ready signal (PH_RDY) is asserted to indicate that the clock has been updated to the new phase. To further explain the conversion of a Forward command into several Backward commands, assume again that P5 is initially selected as the system clock (CLK528) and that the value of PH_SEL changes at the rising edge of the system clock, i.e., P5. If CLK528 is directly updated to P6 before the rising edge of P6, a glitch may occur. Conversely, changing CLK528 from P5 to P4 avoids this glitch problem, at the cost of a duty-cycle change of CLK528 during the phase-change intervals. Fig. 8(b) shows the measured clock phase waveforms. The measured RMS and jitters are 30 s and 101 ps, respectively.
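The probability-weighted combination described for Fig. 7 can be sketched as follows. This assumes the combined requirement is the per-phase required SNR (in dB) averaged with the decision probabilities as weights; for a scheme without phase tuning, the uniform distribution mentioned above is used. The function names are hypothetical, and no values from Fig. 7 are reproduced.

```python
import numpy as np


def required_snr_db(p_decision, snr_at_phase_db):
    """Probability-weighted SNR (dB) needed to reach the target PER.

    p_decision[k]      : probability that the timing search ends at phase k
    snr_at_phase_db[k] : SNR (dB) needed to reach the target PER at fixed phase k
    """
    p = np.asarray(p_decision, dtype=float)
    s = np.asarray(snr_at_phase_db, dtype=float)
    if p.shape != s.shape or not np.isclose(p.sum(), 1.0):
        raise ValueError("p_decision must be a probability vector matching snr_at_phase_db")
    return float(np.dot(p, s))


def uniform_decision(n_phases=8):
    """Decision distribution for a scheme without phase tuning (e.g., interpolation)."""
    return np.full(n_phases, 1.0 / n_phases)
```

A DSTC whose decisions concentrate near the best phase pulls the weighted requirement toward the minimum of the per-phase curve, whereas the uniform distribution yields the plain average over all phases.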
The resulting PTCG power is 10.9 mW in a 0.13-µm standard CMOS process. Table I presents both the performance and the power reduction of this work. The proposed scheme offers an SNR improvement of approximately 1.7 dB over the interpolation method. In this MB-OFDM UWB system, the symbol rate is 528 MHz, whereas the interpolation scheme requires a sampling rate of 1056 MHz in the ADC circuits. Taking the referenced ADC circuits into account, the estimated ADC power is reduced from 160 mW × 2 to 70 mW × 2 (for both the I and Q paths). When the 31.2-mW baseband processor power is also included, this reduced sampling rate yields a total baseband power saving of about 40%. Note that the proposed symbol-rate synchronization method requires both the DSTC and PTCG circuits, which consume 1.9 and 10.9 mW, respectively. Fig. 9 presents a microphotograph of this baseband test chip.
Fig. 8. Generated PTCG waveforms. (a) Simulated waveforms. (b) Measured waveforms.
Fig. 9. Microphotograph of the test chip in 0.13-µm standard CMOS technology.
In this work, the DSTC and PTCG schemes are proposed to enable symbol-rate synchronization, reducing power consumption by avoiding high-rate circuit operations. The proposal offers better signal sampling quality and enhances overall system performance compared with interpolation-based solutions. In addition, it has low design complexity and low power consumption, making it well suited to cost-effective OFDM-based wireless communication solutions.
The authors would like to thank United Microelectronics Corporation for fabricating the test chip through its University Shuttle Program. In addition, the measurement services provided by the Chip Implementation Center are gratefully acknowledged.
M. Bhardwaj, “A 180 MS/s, 162 MS/s wide-band three-channel baseband and MAC processor for 802.11a/b/g,” in ISSCC Dig. Tech. Papers, Feb. 2005, pp. 454–455.
J. Thomson, “An integrated 802.11a baseband and MAC processor,” in ISSCC Dig. Tech. Papers, Feb. 2002, pp. 126–127.
M. Simon, “Nonlinear analysis of an absolute value type of an early-late gate bit synchronizer,” IEEE Trans. Commun., vol. COM-18, no. 10, pp. 589–596, Oct. 1970.
F. M. Gardner, “Interpolation in digital modems—Part II: Implementation and performance,” IEEE Trans. Commun., vol. 41, no. 6, pp. 998–1008, Jun. 1993.
K. Mueller and M. Muller, “Timing recovery in digital synchronous data receivers,” IEEE Trans. Commun., vol. COM-24, no. 5, pp. 516–531, May 1976.
F. A. Musa and A. C. Carusone, “A baud-rate timing recovery scheme with a dual-function analog filter,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 12, pp. 1393–1397, Dec. 2006.
C. Williams, M. A. Beach, and S. McLaughlin, “Robust OFDM timing synchronization,” IEEE Electron. Lett., vol. 14, pp. 751–752, Jun. 2005.
H.-Y. Liu and C.-Y. Lee, “A low-complexity synchronizer for OFDM-based UWB system,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 11, pp. 1269–1273, Nov. 2006.
D. Sheng, C.-C. Chung, and C.-Y. Lee, “An ultra-low-power and portable digitally controlled oscillator for SoC applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 954–958, Nov.
J.-Y. Yu, C.-C. Chung, H.-Y. Liu, Y.-W. Lin, W.-C. Liao, T.-Y. Hsu, and C.-Y. Lee, “A 31.2 mW UWB baseband transceiver with all-digital I/Q-mismatch calibration and dynamic sampling,” in Proc. IEEE Int. Symp. VLSI Circuits, 2006, pp. 236–237.
C.-C. Chung and C.-Y. Lee, “A new DLL-based approach for all-digital multiphase clock generation,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 469–475, Mar. 2004.
Multi-Band OFDM Physical Layer Proposal Merger #1 for IEEE 802.15.3a, IEEE P802.15 Working Group for Wireless Personal Area Networks, Mar. 2004.
H.-Y. Liu, C.-C. Lin, Y.-W. Lin, C.-C. Chung, K.-L. Lin, W.-C. Chang, L.-H. Chen, H.-C. Chang, and C.-Y. Lee, “A 480 Mb/s LDPC-COFDM-based UWB baseband transceiver,” in Dig. IEEE Int. Conf. Solid-State Circuits, 2005, pp. 444–445, 609.
C. Sandner, M. Clara, A. Santner, T. Hartig, and F. Kuttner, “A 6-bit 1.2 GSps low-power flash-ADC in 0.13-µm digital CMOS,” IEEE J.