Chapter 3 Tracking Algorithm

3.5. Scheduling of MIMO-SDPLL

In MIMO-SDPLL, there can be more than one error detector and more than one DCO. Because OpenRISC is a single-core CPU, it must switch between the IP cores it services. The CPU repeatedly polls the error flag of each error detector; when an error flag is high, the CPU performs the corresponding action. In this work, the proposed MIMO-SDPLL is 2-by-2. Because the computing power of the CPU is limited, the simplest scheduling is used: set B of error detector and DCO starts the tracking algorithm after set A reaches the frequency maintain state. Fig.3.16 shows this simple scheduling.

[Figure: timeline over CPU time for IP core A and IP core B, with phases labeled waiting, tracking and maintain.]

Fig.3.16. The scheduling of 2×2 MIMO-SDPLL
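The schedule in Fig.3.16 can be sketched in C as a host-side model. This is only an illustration under stated assumptions: the type pll_set, the state names, the stages_left counter and the function tracking_stage() are hypothetical and are not taken from the original program. In the real system the error flag would be read from the error detector's register instead of a struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one error detector + DCO set. */
typedef enum { WAITING, TRACKING, MAINTAIN } pll_state;

typedef struct {
    pll_state state;
    bool error_flag;   /* stands in for the error detector's flag register */
    int stages_left;   /* hypothetical: tracking stages left until lock */
} pll_set;

/* Placeholder for one stage of the tracking algorithm. */
static void tracking_stage(pll_set *s)
{
    assert(s != NULL);
    s->error_flag = false;   /* the raised flag has been serviced */
    if (s->state == TRACKING && --s->stages_left <= 0)
        s->state = MAINTAIN;
}

/* One pass of the polling loop over set A and set B. */
static void schedule_once(pll_set *a, pll_set *b)
{
    assert(a != NULL && b != NULL);
    /* Set B is released only after set A has reached the maintain state. */
    if (b->state == WAITING && a->state == MAINTAIN)
        b->state = TRACKING;
    if (a->state != WAITING && a->error_flag)
        tracking_stage(a);
    if (b->state != WAITING && b->error_flag)
        tracking_stage(b);
}
```

Calling schedule_once() in an endless loop gives the behaviour of the figure: set A is serviced alone until it locks, after which both sets are polled in turn.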

Note, however, that the execution time of one stage of the tracking algorithm must be shorter than the reference clock period of every combination of error detector and DCO. This guarantees the correctness of the tracking algorithm.
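The timing condition can be checked with simple arithmetic. The sketch below assumes the stage cost is known as a number of CPU cycles; the function name stage_fits and the cycle counts used in the usage note are hypothetical, since the document gives no cycle count for a tracking stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when one tracking stage, costing stage_cycles CPU cycles at
 * cpu_hz, finishes within one reference clock period (given in picoseconds).
 * The product stage_cycles * 1e12 must fit in 64 bits, which holds for
 * stage costs up to roughly 18 million cycles. */
static bool stage_fits(uint64_t stage_cycles, uint64_t cpu_hz, uint64_t ref_period_ps)
{
    assert(cpu_hz > 0);
    uint64_t stage_ps = stage_cycles * 1000000000000ULL / cpu_hz;
    return stage_ps < ref_period_ps;
}
```

For instance, with a 250MHz clock and a reference period of 23333.333ns, a hypothetical 1000-cycle stage takes 4µs and fits, while a 10000-cycle stage takes 40µs and does not.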

Chapter 4 Implementation and Simulation Result

In this chapter, the implementation of the 2×2 MIMO-SDPLL is discussed in two parts, hardware and software. The hardware section presents the implementation details of the MIMO-SDPLL architecture. The software section presents how the hardware is controlled by software and how the software is programmed. The hardware and software co-simulation result is shown at the end of this chapter.

4.1. Hardware

Fig.4.1 shows an overview of the hardware architecture. The IP cores include the CPU, bus, error detector [3], DCO [7], SACA, memory and flash. All IP cores are connected to the bus. In this architecture, only the CPU can actively issue transaction requests. The DCO is the high-resolution, wide-frequency-range design proposed in [7]. The specifications of the IP cores are listed in Table 1. The details of the IP cores are described in the following subsections.

After the reset signal is asserted, the OpenRISC or1200 CPU starts fetching instructions from flash and executing them. These instructions are compiled from C source code; the entire tracking algorithm is written in C. After hardware initialization, the CPU polls the error detectors alternately. When the reference clock and the divided clock have a phase error, the error detector raises its error flag. The CPU then executes one stage of the tracking algorithm for the corresponding device.

Fig.4.1. 2×2 MIMO-SDPLL architecture

Table 1 Hardware specification

Item                      Description
Process                   UMC 90nm SP_RVT process
CPU                       OpenRISC or1200; Maximum clock frequency: 250MHz; Gate count: 10k
Bus                       WISHBONE bus; Architecture: shared bus; Maximum clock frequency: 250MHz
Error detector: PFD       Minimum error pulse: 200ps; Minimum detectable clock difference: 45ps; Gate count: 107
Error detector: TDC       Resolution: 15ps; Gate count: 6k
Error detector: Divider   N range: 1~1023
DCO                       Frequency range: 660kHz~460MHz; Resolution: 10fs
Reference clock           Frequency range: 660k/N~460M/N Hz
SACA                      103~1231MHz in 64 stages; Each stage: 140ps
Memory                    Memory on FPGA board; Address space: 8MB
Flash                     Flash on FPGA board; Address space: 8MB

4.1.1. Memory Map

MIMO-SDPLL contains several I/O devices, and the CPU needs to access them through the bus. The common method is to treat all I/O devices, together with memory, as one unified memory space. Software can then communicate with a specific hardware device according to this memory map. Fig.4.2 shows the memory map used in this work.

Fig.4.2 Memory map of 2×2 MIMO-SDPLL

4.1.2. CPU

The CPU selected is the OpenRISC or1200. The or1200 supports cache, MMU and basic DSP capabilities. For ease of use and to reduce area, these functions are not implemented in this work.

In the default configuration, division is emulated with the multiplier with the help of the compiler. In this work the divider must be implemented, because the software is compiled without the standard library. In the UMC 90nm process, the or1200 has a gate count of 10k and a maximum frequency of up to 250MHz.

4.1.3. WISHBONE Bus Protocol

For CPU compatibility and IP core connection, the WISHBONE [8] bus is chosen.

WISHBONE has two types of interface: MASTER and SLAVE. A MASTER issues transaction requests to a SLAVE; the CPU, for example, is a MASTER. A SLAVE replies to transaction requests from a MASTER. In this work, all IP cores except the CPU are SLAVEs. Fig.4.3 shows a simple example of the single read and single write protocol. The bus protocol works as follows:

Single read-

CLK_I EDGE 0: MASTER presents a valid address on ADR_O.

MASTER presents bank select SEL_O.

MASTER negates WE_O to indicate a read cycle.

MASTER asserts STB_O to indicate the start of phase.

MASTER asserts CYC_O to indicate the start of cycle.

CLK_I EDGE 1: SLAVE decodes input, and responding SLAVE asserts ACK_I.

SLAVE presents a valid data on DAT_I.

MASTER monitors ACK_I, and prepares to latch data on DAT_I.

Note: SLAVE can insert any number of wait states (WSS) before asserting ACK_I.

CLK_I EDGE 2: MASTER latches data on DAT_I.

MASTER negates STB_O and CYC_O to indicate the end of the cycle.

SLAVE negates ACK_I in response to negated STB_O.

Single write-

CLK_I EDGE 0: MASTER presents a valid address on ADR_O.

MASTER presents valid data on DAT_O.

MASTER presents bank select SEL_O.

MASTER asserts WE_O to indicate a write cycle.

MASTER asserts STB_O to indicate the start of phase.

MASTER asserts CYC_O to indicate the start of cycle.

CLK_I EDGE 1: SLAVE decodes input, and responding SLAVE asserts ACK_I.

SLAVE prepares to latch data on DAT_O.

MASTER monitors ACK_I, and prepares to terminate the cycle.

Note: SLAVE can insert any number of wait states (WSS) before asserting ACK_I.

CLK_I EDGE 2: SLAVE latches data on DAT_O.

MASTER negates STB_O and CYC_O to indicate the end of the cycle.

SLAVE negates ACK_I in response to negated STB_O.
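The single read and single write cycles listed above can be illustrated with a small behavioural model in C. This is a sketch, not RTL and not part of the original design: the types wb_bus and wb_slave, the functions slave_clock and master_single, and the four-register SLAVE are hypothetical. As a simplification, the model performs the SLAVE's transfer on the edge where it asserts ACK_I, whereas the listing separates "prepares to latch" and "latches" across two edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bus signals, named from the MASTER side as in the text. */
typedef struct {
    uint32_t adr_o, dat_o, dat_i;
    uint8_t  sel_o;
    bool     we_o, stb_o, cyc_o, ack_i;
} wb_bus;

/* Hypothetical SLAVE: four registers and a fixed number of wait states. */
typedef struct {
    uint32_t regs[4];
    unsigned wait_states;   /* wait states inserted before ACK_I */
    unsigned waited;
} wb_slave;

/* SLAVE behaviour on one rising edge of CLK_I. */
static void slave_clock(wb_slave *s, wb_bus *b)
{
    if (!(b->cyc_o && b->stb_o)) {       /* no active phase: negate ACK_I */
        b->ack_i = false;
        s->waited = 0;
        return;
    }
    if (s->waited < s->wait_states) {    /* still inserting wait states */
        s->waited++;
        return;
    }
    unsigned idx = (b->adr_o >> 2) & 3u; /* decode a word address */
    if (b->we_o)
        s->regs[idx] = b->dat_o;         /* write: take data from DAT_O */
    else
        b->dat_i = s->regs[idx];         /* read: present data on DAT_I */
    b->ack_i = true;
}

/* One single read or write cycle; returns the number of clock edges used. */
static unsigned master_single(wb_bus *b, wb_slave *s, uint32_t adr, bool we,
                              uint32_t *data)
{
    assert(b != 0 && s != 0 && data != 0);
    /* Edge 0: present address, data (write only), SEL_O and WE_O,
     * then assert STB_O and CYC_O. */
    b->adr_o = adr;
    b->sel_o = 0xF;
    b->we_o = we;
    if (we)
        b->dat_o = *data;
    b->stb_o = b->cyc_o = true;
    unsigned edges = 1;
    /* Following edges: SLAVE decodes and, after its wait states, asserts ACK_I. */
    do {
        slave_clock(s, b);
        edges++;
    } while (!b->ack_i);
    /* Last edge: MASTER latches DAT_I on a read and ends the cycle. */
    if (!we)
        *data = b->dat_i;
    b->stb_o = b->cyc_o = false;
    edges++;
    slave_clock(s, b);                   /* SLAVE negates ACK_I in response */
    return edges;
}
```

With no wait states a cycle uses three edges (EDGE 0 to EDGE 2); each wait state adds one edge.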

Fig.4.3 WISHBONE read / write timing graph

4.1.4. IP Cores Interconnection

In MIMO-SDPLL, the IP cores need to be connected so that they can communicate. WISHBONE defines four types of interconnection: point-to-point, data flow, shared bus and crossbar switch. The shared bus and the crossbar switch are useful for connecting two or more MASTERs with SLAVEs, so they are the suitable interconnections for this work. The shared bus requires less interconnection logic and fewer routing resources than the crossbar switch, so the shared bus is chosen to connect the IP cores.

Fig.4.4 shows the interconnection block diagram. There are two MASTERs and four SLAVEs. The MASTERs are the or1200's data channel and instruction channel. The SLAVEs are PLL module A, PLL module B, the storage module and the SACA module. Each PLL module combines an error detector module and a DCO module. The storage module includes memory and flash.

The bus arbiter allocates bus access between the MASTERs; the data channel has higher priority than the instruction channel because of data dependency. The address comparator routes each transaction from the MASTERs to the correct SLAVE according to the memory map in fig.4.2.
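The two functions of the shared bus described above, fixed-priority arbitration and address comparison, can be sketched in C as follows. The names are hypothetical, and the address ranges in MEMORY_MAP are placeholders chosen for illustration only; the real ranges are the ones given by the memory map in fig.4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MASTER_NONE, MASTER_DATA, MASTER_INSN } wb_master;
typedef enum { SLAVE_NONE, SLAVE_PLL_A, SLAVE_PLL_B,
               SLAVE_STORAGE, SLAVE_SACA } wb_slave_id;

/* Fixed-priority arbiter: the data channel wins over the instruction channel. */
static wb_master arbitrate(int data_req, int insn_req)
{
    if (data_req)
        return MASTER_DATA;
    if (insn_req)
        return MASTER_INSN;
    return MASTER_NONE;
}

typedef struct { uint32_t base, size; wb_slave_id id; } map_entry;

/* Placeholder address ranges (assumptions, not the values of fig.4.2). */
static const map_entry MEMORY_MAP[] = {
    { 0x00000000u, 0x01000000u, SLAVE_STORAGE },
    { 0x90000000u, 0x00000100u, SLAVE_PLL_A   },
    { 0x91000000u, 0x00000100u, SLAVE_PLL_B   },
    { 0x92000000u, 0x00000100u, SLAVE_SACA    },
};

/* Address comparator: select the SLAVE whose range contains adr. */
static wb_slave_id decode(uint32_t adr)
{
    for (size_t i = 0; i < sizeof MEMORY_MAP / sizeof MEMORY_MAP[0]; i++) {
        assert(MEMORY_MAP[i].size > 0);  /* an empty range would never match */
        /* Unsigned subtraction wraps for adr < base, so one compare suffices. */
        if (adr - MEMORY_MAP[i].base < MEMORY_MAP[i].size)
            return MEMORY_MAP[i].id;
    }
    return SLAVE_NONE;
}
```

When both channels request the bus in the same cycle, arbitrate() grants the data channel, and decode() then steers the granted MASTER's address to one SLAVE.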

Fig.4.4. The interconnection block diagram of MIMO-SDPLL IP cores with shared bus system.

4.1.5. Semi-asynchronous clock access (SACA) module

Choosing the system clock is important in this work. There are two candidates: the reference clock and the DCO clock. However, the reference clock is too slow, and the DCO clock is not stable enough for CPU computation. For this reason, semi-asynchronous clock access (SACA) [3] is used.

SACA is a clock generator that is synchronized to the rising edge of the reference clock. On each rising edge it triggers a fixed number of cycles with a specific period, asynchronous to the reference clock. The number of cycles and the clock period are defined by the user. Fig.4.5 shows an example of SACA with a cycle count of four.

Fig.4.5. An example of SACA.

However, if the SACA clock frequency is higher than the upper frequency bound of the system, the system will fail. To prevent this condition, a modified SACA architecture is proposed. Fig.4.6 shows its block diagram. It uses a TDC, a frequency divider, an encoder and a semi-asynchronous clocker (SAC) to generate the nearest available clock. The output clock will be close to Nf × fref, where Nf is the frequency multiplication factor defined by the user and fref is the reference clock frequency. The output clock range of SACA is from 103MHz to 1231MHz in 64 stages; each stage is 140ps.

For example, suppose the reference clock is 30MHz and the system needs 120MHz. The TDC converts the reference clock period into digital data, and the divider divides this data by four. The encoder then takes the divided data and maps it to the nearest clock of the SAC. Finally, the SAC outputs the clock nearest to 30×4 = 120MHz for four cycles.
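The example above can be followed numerically with a short C sketch of the TDC, divider and encoder path. Several values are assumptions made for illustration: the TDC step of 15ps is taken from Table 1, the shortest SAC period of 812ps is derived from the 1231MHz upper limit, and the 64 stages are assumed to be spaced linearly by 140ps. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TDC_RES_PS        15u   /* TDC resolution (Table 1) */
#define SAC_MIN_PERIOD_PS 812u  /* assumed: about 1 / 1231MHz */
#define SAC_STEP_PS       140u  /* delay of one stage */
#define SAC_STAGES        64u

/* TDC: convert the reference period into a digital code. */
static uint32_t tdc_convert(uint32_t ref_period_ps)
{
    return ref_period_ps / TDC_RES_PS;
}

/* Period produced by SAC stage k under the linear-spacing assumption. */
static uint32_t stage_period_ps(uint32_t k)
{
    assert(k < SAC_STAGES);
    return SAC_MIN_PERIOD_PS + k * SAC_STEP_PS;
}

/* Encoder: map a target period to the nearest stage, clamped to 0..63. */
static uint32_t encode_stage(uint32_t target_ps)
{
    if (target_ps <= SAC_MIN_PERIOD_PS)
        return 0;
    uint32_t k = (target_ps - SAC_MIN_PERIOD_PS + SAC_STEP_PS / 2) / SAC_STEP_PS;
    return k >= SAC_STAGES ? SAC_STAGES - 1 : k;
}

/* TDC -> divider -> encoder: choose the stage closest to Nf x fref. */
static uint32_t saca_select(uint32_t ref_period_ps, uint32_t nf)
{
    assert(nf > 0);
    uint32_t code = tdc_convert(ref_period_ps);   /* digitized period */
    uint32_t target_ps = (code / nf) * TDC_RES_PS; /* divided by Nf */
    return encode_stage(target_ps);
}
```

For a 30MHz reference (period about 33333ps) and Nf = 4, this model selects stage 54 with a period of 8372ps, that is about 119MHz, the nearest available clock to 120MHz. Clamping at stage 0 keeps the output from exceeding the fastest clock the SAC can produce.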

In this work, Nf is set to more than 2048, because the execution time of one algorithm step must be less than one reference clock cycle.

Fig.4.6. The modified SACA block diagram

4.2. Software

4.2.1. Software programming

The software programming environment is listed in Table 2. The software development flow is conventional. First, the program is written in C. Second, the source code is compiled with the gcc cross-compiler, which generates an executable binary file. Finally, this binary file is placed in flash.

Table 2 Software environment

Item                   Description
Development language   C
Cross compiler         gcc 3.4.4 for OpenRISC 32 bit architecture
Host                   CentOS release 5.2; Kernel version: 2.6.18-92.1.17.el5

4.2.2. Memory map I/O control

To check and configure the hardware, memory-mapped I/O is used in this work. This makes it simpler and easier for the software programmer to access the hardware IP cores. The memory map of the IP cores is shown in fig.4.2.

The following is an example of memory-mapped I/O control. First, define the base address of the IP core according to fig.4.2.

    #define DEVICE_BASE_ADDR 0x95000000    (8)

Second, declare a volatile pointer and assign the base address to it. The volatile qualifier must be used when reading a memory location whose value can change without the knowledge of the current program.

    volatile unsigned long *DEVICE_PTR;
    DEVICE_PTR = (unsigned long *)DEVICE_BASE_ADDR;    (9)

Finally, software can read or write the registers of the IP core through this pointer.
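Register access through such a volatile pointer might look like the sketch below. The helper names reg_write and reg_read and the word-index argument are hypothetical and do not come from the original program. On the target, the base argument would be the device pointer declared above; on a host machine an ordinary array can stand in for the device, which is how the sketch can be tried without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write value to the register at word index `index`. */
static void reg_write(volatile unsigned long *base, size_t index,
                      unsigned long value)
{
    assert(base != NULL);
    base[index] = value;   /* volatile store: the compiler must emit it */
}

/* Hypothetical helper: read the register at word index `index`. */
static unsigned long reg_read(volatile unsigned long *base, size_t index)
{
    assert(base != NULL);
    return base[index];    /* volatile load: re-read on every call */
}
```

Because the pointer is volatile, the compiler does not cache a register value between two reads, which matters when polling a flag that the hardware changes on its own.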

This section shows the simulation result. The simulation settings are listed in Table 3. The simulation waveform is presented in fig.4.7. Because the waveform is hard to read, the phase error information is recorded and plotted as a curve, which makes the variation of the phase error easier to see. Fig.4.8 shows the resulting phase error variation. The X-axis is the index of the reference clock cycle and is unrelated to the clock period. The Y-axis is the phase error, in picoseconds, between the Nth reference clock and the Nth divided clock. The upper plot shows that clock B starts tracking after clock A reaches the phase maintain state, and that both clocks converge at the 550th reference clock. The lower plot shows the detail from the 550th to the 800th reference clock. Divided clock A is maintained within ±3ps and clock B within ±9ps.

Table 3 Simulation setting

Item                     Description
Reference clock period   Divided clock A: 23333.333ns; Divided clock B: 25555.555ns
Divided N                Divided clock A: 100; Divided clock B: 100
Phase error variation    Divided clock A: ±3ps; Divided clock B: ±9ps

Fig.4.7. The waveforms of simulation result.

(a) The full view of the waveform. (b) A zoomed view of (a): clock A performs coarse tracking. (c) A zoomed view in which clocks A and B enter the frequency maintain stage.

Fig.4.8. 2×2 MIMO-SDPLL simulation result of phase error variation

Chapter 5 Conclusion and Future Work

The simulation result shows that the proposed 2×2 MIMO-SDPLL achieves high resolution under software control, and that more than one clock can be handled through scheduling. When the specification must be modified substantially, the platform can meet the new specification by replacing the software.

In the future, the CPU scheduling can be improved to support multi-tasking, and the software development can provide different software IPs for different applications.

Reference

[1] Terng-Yin Hsu, Bai-Jue Shieh, Chen-Yi Lee, "An all-digital phase-locked loop (ADPLL)-based clock recovery circuit," IEEE Journal of Solid-State Circuits, vol. 34, no. 8, pp. 1063-1073, Aug. 1999.

[2] Ching-Che Chung, Chen-Yi Lee, "An all-digital phase-locked loop for high-speed clock generation," IEEE Journal of Solid-State Circuits, vol. 38, pp. 347-351, Feb. 2003.

[3] Chang-Ying Chuang, Terng-Yin Hsu, "The Study of Software-defined Phase-locked Loop," Thesis, CS, NCTU, 2008.

[4] "OpenRISC 1200 IP Core Specification," Rev. 0.7, Sep. 6, 2001.

[5] "OpenRISC 1000 Architecture Manual," July 13, 2004.

[6] Li Jyun-Rong, Hsu Terng-Yin, "The Study of All Digital Phase-Locked Loop (ADPLL) and its Applications," Thesis, CS, NCTU, 2006.

[7] Jung-Chin Lai, Terng-Yin Hsu, "The study of Wideband, Cell-based Digital Controlled Oscillator and its Implementation," Thesis, CS, NCTU, 2007.

[8] "WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores," Revision B.3, released September 7, 2002.
