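Design and Investigation of Algorithms and System Architectures for Joint MIMO Detection and Channel Decoding Applicable to General System Configurations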


National Science Council, Executive Yuan: Research Project Final Report


Project category: Individual project

Project number: NSC 101-2218-E-011-045-

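Project period: November 1, 2012 to September 30, 2013 (ROC years 101 to 102)

Executing institution: Department of Electronic Engineering, National Taiwan University of Science and Technology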

Principal investigator: Chung-An Shen (沈中安)

Project participants: 黃建豪 (master's student, part-time research assistant); 游家博 (master's student, part-time research assistant)

Attachments: Report on attending an international conference, and the presented paper

Public access: This project report is available for public inquiry

December 24, 2013 (ROC year 102)


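Chinese abstract (translated): This report presents a configurable joint detection and decoding (CJDD) algorithm for multiple-input multiple-output (MIMO) wireless communication systems. The approach performs MIMO detection and the decoding of convolutional codes in a single stage. Previously proposed methods target only a single, specific system configuration. In contrast, the CJDD algorithm can be configured to support different combinations of encoder code rates and modulation schemes. The proposed CJDD algorithm is therefore more readily applicable to practical MIMO wireless systems.

This project also investigated an error-resilient MIMO detection algorithm and circuit architecture. With this design, MIMO detection achieves near-optimal results while the overall system power consumption, including both the MIMO detector and the memory, is effectively reduced. Finally, because our MIMO detection algorithm is based on tree search, we worked with other researchers and found that the tree-search algorithm studied here can be applied effectively to the routing problem in networks on chip (NoC). This broadened the scope of the project, and we took part in research on high-performance, low-power NoC routing.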

Chinese keywords: detection, convolutional codes, configurable joint detection and decoding, error-resilient, memory, network on chip

English abstract: This report presents a configurable joint detection and decoding (CJDD) algorithm for MIMO wireless communication systems. The proposed approach performs MIMO detection and the decoding of convolutional codes in a single stage. Moreover, the CJDD algorithm can be configured to support various combinations of modulation schemes and encoder code rates. In contrast, the previously reported method operates only with specific system settings. The CJDD algorithm presented in this report is therefore more readily realizable in practical MIMO wireless systems.

Furthermore, this report describes the algorithm and VLSI architecture of an error-resilient MIMO detector for memory-dominated wireless systems. This scheme achieves close-to-optimal signal quality while the memory operates at a reduced supply voltage. As a result, the overall system power consumption, including the MIMO detector and the embedded buffering memories, can be significantly reduced. Finally, we applied the tree-search algorithm designed for MIMO detection to the routing problem in networks on chip (NoC), which yields a low-power routing method.

English keywords: MIMO, convolutional codes, configurable joint detection and decoding, error-resilient, memory, network on chip (NoC)


NSC Research Project Results Report

I. Abstract and Keywords

II. Introduction

III. Research Objectives

IV. Literature Review

V. Research Methods

VI. Results and Discussion

VII. References


I. Abstract and Keywords



Keywords: MIMO, detection, convolutional codes, configurable joint detection and decoding (CJDD), error-resilient, memory, network on chip (NoC).

II. Introduction

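With advances in science and engineering, wireless communication has become an indispensable part of daily life. To increase data rates and transmission quality, multiple-input multiple-output (MIMO) communication systems have attracted growing attention. Fig. 1 shows the MIMO system model, which works as follows:

- Information from speech, pictures, video, control signals, and so on is converted into source information bits.
- A channel encoder turns these into coded bits.
- Modulation maps the coded bits onto transmit symbols, which are sent through one or more transmit antennas.
- On the receiving side, one or more receive antennas feed the received symbols into the MIMO detector, which recovers the transmitted symbols as closely as possible.
- The detector output is passed to the channel decoder, which recovers the source information bits.

As Fig. 1 shows, the transmitter has M transmit antennas and the receiver has N receive antennas. The M transmit symbols, denoted $\hat{s}_i$ ($i = 1, 2, \ldots, M$), form an $M \times 1$ transmit vector $\hat{\mathbf{s}}$. After passing through the $N \times M$ wireless channel matrix $\hat{\mathbf{H}}$, they arrive at the receiver. There the received symbols, denoted $\hat{y}_i$ ($i = 1, 2, \ldots, N$), form an $N \times 1$ received vector $\hat{\mathbf{y}}$. This relationship is given by Eq. (1), where $\hat{\mathbf{n}}$ denotes the $N \times 1$ Gaussian noise vector:

$$\hat{\mathbf{y}} = \hat{\mathbf{H}}\hat{\mathbf{s}} + \hat{\mathbf{n}} \qquad (1)$$

Fig. 1. MIMO communication system model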
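The signal model of Eq. (1) can be made concrete with a short Python sketch. The sketch does three things:

- draws a 16-QAM transmit vector;
- passes it through an N×M channel;
- builds the equivalent real-valued model, which splits every symbol into its real and imaginary parts as the tree search in Section VI does.

The Rayleigh-fading channel, the unnormalized per-axis 4-PAM alphabet, and all function names are assumptions for illustration. They are not the report's simulation setup.

```python
import random

# Assumption: per-axis 4-PAM levels of an unnormalized 16-QAM constellation.
PAM4 = (-3, -1, 1, 3)

def random_qam16(m, rng):
    """M x 1 transmit vector of 16-QAM symbols (real and imaginary parts from PAM4)."""
    return [complex(rng.choice(PAM4), rng.choice(PAM4)) for _ in range(m)]

def rayleigh_channel(n, m, rng):
    """N x M channel matrix; i.i.d. CN(0, 1) entries are an assumption."""
    sd = 0.5 ** 0.5
    return [[complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(m)]
            for _ in range(n)]

def receive(H, s, noise_std, rng):
    """Eq. (1): y = H s + n, with complex Gaussian noise of per-axis std noise_std."""
    return [sum(h * x for h, x in zip(row, s))
            + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            for row in H]

def real_model(H, s, y):
    """Equivalent 2N x 2M real model used by real/imaginary tree search:
    [Re y; Im y] = [[Re H, -Im H], [Im H, Re H]] [Re s; Im s] + noise."""
    Hr = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    Hr += [[h.imag for h in row] + [h.real for h in row] for row in H]
    sr = [x.real for x in s] + [x.imag for x in s]
    yr = [v.real for v in y] + [v.imag for v in y]
    return Hr, sr, yr

rng = random.Random(1)
M, N = 4, 4
s = random_qam16(M, rng)
H = rayleigh_channel(N, M, rng)
y = receive(H, s, noise_std=0.0, rng=rng)
Hr, sr, yr = real_model(H, s, y)
print(len(Hr), len(Hr[0]), len(sr), len(yr))  # 8 8 8 8
```

With noise_std=0.0, every entry of yr equals the matching row of Hr multiplied by sr. This is a quick consistency check between the complex model and its real-valued form.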

Convolutional codes have strong error-correction capability. They are therefore frequently used over noisy wireless channels, for example in GSM (Global System for Mobile Communications), CDMA (Code-Division Multiple Access), and DAB (Digital Audio Broadcasting).

Channel encoding adds redundant bits to increase system reliability and thereby improve transmission quality. Different amounts of redundancy give different code rates: if every x source information bits produce n coded bits, the code rate is x/n. The larger x/n is, the higher the transmission efficiency. This study mainly uses two code rates, 1/2 and 1/3.

In modern wireless communication systems, MIMO technology [1], [3] has become a core technique for supporting higher data rates and better transmission quality. However, MIMO significantly increases system complexity, including area and power consumption. The reason is that coordinating the transmit and receive antenna arrays requires multi-dimensional signal processing involving vector and matrix operations. To use MIMO in battery-powered wireless devices, this complexity must be reduced further, which is a particular challenge for the receiver.
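As a sketch of the code-rate definition, the following feedforward convolutional encoder emits n coded bits for every source bit, giving rate 1/n. The generator taps are textbook choices assumed for illustration:

- octal (7, 5) with memory 2 for rate 1/2;
- octal (7, 7, 5) with memory 2 for rate 1/3.

The report does not state which generators it uses.

```python
from fractions import Fraction

def conv_encode(bits, generators, memory):
    """Feedforward rate-1/n convolutional encoder (n = len(generators)).

    Each generator is a tap mask over memory + 1 bits: the most significant
    bit taps the current input, the remaining bits tap previous inputs.
    """
    state = 0                                    # last `memory` inputs, newest in the MSB
    coded = []
    for u in bits:
        reg = (u << memory) | state              # current input followed by past inputs
        for g in generators:
            coded.append(bin(reg & g).count("1") % 2)  # modulo-2 sum of the tapped bits
        state = reg >> 1                         # shift register: drop the oldest input
    return coded

source = [1, 0, 1, 1]
half = conv_encode(source, [0b111, 0b101], memory=2)           # rate 1/2
third = conv_encode(source, [0b111, 0b111, 0b101], memory=2)   # rate 1/3
print(half)                                                    # [1, 1, 1, 0, 0, 0, 0, 1]
print(Fraction(len(source), len(half)), Fraction(len(source), len(third)))  # 1/2 1/3
```

A larger ratio x/n means fewer redundant bits per source bit. This is why the rate-1/3 code carries more protection but lower efficiency than the rate-1/2 code.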

III. Research Objectives

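The results of this study fall into three parts.

First, to reduce receiver complexity in MIMO systems, we propose a configurable joint detection and decoding (CJDD) algorithm [27]. It reduces MIMO system complexity by exploiting the computational relationship between the MIMO detector and the channel decoder. Combined with the system parameters defined in this project, the algorithm applies to any combination of modulation scheme and convolutional code rate, which realizes the configurable concept. Compared with the results reported in [4], the proposed algorithm covers a wider range of MIMO wireless systems and is more practical to implement in real systems.

Second, this study investigated error-resilient MIMO detection algorithms and circuit architectures [25]. The proposed architecture maintains good signal quality even when the memory supply voltage is lowered to save power. It therefore effectively reduces the overall system power consumption, including the MIMO detector and the memory.

Finally, because this study uses tree-search algorithms, we found that the algorithms we investigated can be applied widely in other fields. For example, we applied them to the routing problem in networks on chip and obtained a low-power routing method [26].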

IV. Literature Review

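In the MIMO receiver described above, the MIMO detector and the channel decoder are inter-dependent. A high-performance MIMO receiver therefore requires the two to be considered jointly. Two architectures are most often discussed in the literature: separate and joint.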

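The separate architecture treats the MIMO detector and the channel decoder as two completely separate components, in both the algorithm and the hardware architecture. For example, [2], [5], [7], [9] design efficient MIMO detectors and channel decoders individually, while [4], [6], [8], [9] optimize the interface of the detector-decoder chain.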

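In particular, [2]-[3], [5], [9] perform MIMO detection with tree-search methods, which greatly reduce complexity while achieving optimal (or near-optimal) bit error rate (BER).

Approaches that treat the detector-decoder chain as a whole have been proposed in [2], [4], [6]. In [2], the authors propose an iterative structure between the MIMO detector and the channel decoder. The output of the channel decoder is fed back to the MIMO detector to update its information, and the MIMO detector sends the updated data back to the channel decoder to obtain a more reliable decoding result. After the two exchange information several times, the quality of the decoded bits improves. Since then, many works [5], [9], [10] have published MIMO detector architectures and circuits that support iterative detection and decoding. In addition, [8] implemented an FPGA-based MIMO orthogonal frequency-division multiplexing (OFDM) system with iterative detection and decoding.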

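The iterative approach lowers the BER, but its many iterations cause excessive processing latency, which is a serious problem in practical systems, especially real-time systems. The iterative architecture also needs large amounts of memory for temporary storage. Therefore, despite its theoretical advantages, it is rarely used in practical wireless communication systems.

In contrast, [4] and [6] realize the MIMO detector and the channel decoder in a single stage, an approach called joint detection and decoding (JDD). The algorithm and circuit architecture of [4] combine MIMO detection with the decoding of convolutional codes, while [6] targets block codes. With comparable system complexity, the design in [4] improves the BER by roughly 2 to 2.5 dB over the conventional separate approach.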

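In addition, memory accounts for a growing share of area and power in wireless communication systems, approaching 50% of the overall complexity. Lowering the memory supply voltage saves power but causes errors in the stored data. Over the past few years, many studies have examined how such errors affect the associated signal processing systems, such as MIMO detection and channel decoding. Many works have also proposed error-resilient (or error-tolerant) systems that can correct these memory errors:

- [?] and [?] studied how memory errors caused by voltage reduction affect the BER of MIMO systems.
- [?]-[22] proposed several error-resilient channel decoders.
- [23], [24] examined how memory errors affect the performance of digital signal processing systems.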

V. Research Methods

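The literature review shows that joint detection and decoding (JDD) both reduces system complexity and improves the BER. It has a clear drawback, however: the system can be used only with 16-QAM and a rate-1/2 convolutional code. Most wireless systems require other configurations, such as 64-QAM with a rate-1/3 convolutional code.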

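In practice, this limitation restricts the use of JDD in MIMO systems. To improve the efficiency of MIMO receivers, the goal of this project is to extend JDD so that it applies to different combinations of system settings.

To make JDD flexible enough for any combination of QAM scheme and code rate, we analyze the system characteristics and define a set of system parameters. Combined with tree search, these parameters realize the configurable joint detection and decoding (CJDD) algorithm. We also investigated error-resilient MIMO detector algorithms and circuit architectures to further reduce the overall system power consumption. Finally, because tree-search algorithms are widely applicable, we applied the algorithms we developed to other research fields to broaden the scope of this study.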

VI. Results and Discussion (Including Conclusions and Recommendations)

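This chapter first presents the results on the configurable joint detection and decoding (CJDD) algorithm [27] in four parts:

- The first part presents the system parameters defined to make the algorithm configurable.
- The second part presents the Valid Symbol Finder (VSF).
- The third part uses K-Best tree search as an example to realize the CJDD algorithm.
- The last part gives a summary and discussion.

We then present the error-resilient MIMO detector [25] and the low-power routing method for networks on chip [26].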

1. System Parameter Definition

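By its nature, the JDD algorithm can be used only with 16-QAM and a rate-1/2 code, or with 64-QAM and a rate-1/3 code. The two cannot be combined freely, which limits its flexibility in practical applications. To allow any combination and thus realize a configurable design, this study defines the following control parameters:

• $B_l$: the number of bits used by each node in the tree search. This parameter is determined by the QAM scheme. For example, 16-QAM has $2^4 = 16$ constellation points ($M_c = 4$ bits per symbol). Because the tree search decomposes each symbol into its real and imaginary parts, each node can be represented by 2 bits. It is computed as

$$B_l = \frac{M_c}{2}, \qquad (2)$$

where $2^{M_c}$-QAM is the adopted QAM scheme.

• $B_v$: the number of bits required for each JDD step. This parameter depends on $B_l$ and on the number of coded bits $n$ produced per source bit (code rate $1/n$):

$$B_v = \mathrm{LCM}(B_l, n), \qquad (3)$$

where $\mathrm{LCM}(a, b)$ is the least common multiple of $a$ and $b$.

• $L_v$: the number of tree levels that must be searched in each JDD step. It is derived from $B_v$ and $B_l$:

$$L_v = \frac{B_v}{B_l}. \qquad (4)$$

• $L_t$: the number of trellis stages traversed in each JDD step. It is determined by $B_v$ and the number of coded bits $n$:

$$L_t = \frac{B_v}{n}. \qquad (5)$$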

Table I. Control parameters for different combinations of modulation scheme and code rate

Modulation    16-QAM   16-QAM   64-QAM   64-QAM
Code rate     1/2      1/3      1/2      1/3
n             2        3        2        3
M_c           4        4        6        6
B_l           2        2        3        3
B_v           2        6        6        3
L_v           1        3        2        1
L_t           1        2        3        1


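Applying these formulas to each combination of modulation scheme and code rate gives Table I. Consider, for example, a system that uses 16-QAM with a rate-1/3 code:

- Each node in the tree search requires 2 bits ($B_l = 2$).
- Each source bit produces 3 coded bits in the channel encoder. To match the two, each JDD step requires at least 6 bits ($B_v = 6$).
- For the tree search, each node carries 2 bits and each JDD step needs 6 bits, so each search step spans 3 levels ($L_v = 3$).
- For decoding, a rate-1/3 code needs at least 3 coded bits per trellis stage to recover one source bit. Because each JDD step handles 6 bits, two trellis stages are traversed per step ($L_t = 2$).

The next subsection walks through the detailed procedure with an example.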
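The parameter definitions can be checked against Table I with a short Python sketch. It uses $B_l = M_c/2$, $B_v = \mathrm{LCM}(B_l, n)$, $L_v = B_v/B_l$ and $L_t = B_v/n$, which reproduce every column of the table. The function name and the dictionary keys are hypothetical.

```python
from math import gcd

def cjdd_parameters(qam_order, n):
    """Control parameters for 2**Mc-QAM with a rate-1/n convolutional code."""
    Mc = qam_order.bit_length() - 1            # bits per complex QAM symbol
    if (1 << Mc) != qam_order or Mc % 2:
        raise ValueError("expected a square QAM order such as 16 or 64")
    Bl = Mc // 2                               # bits per tree node (real/imag split)
    Bv = Bl * n // gcd(Bl, n)                  # LCM(Bl, n): bits per JDD step
    Lv = Bv // Bl                              # tree levels per JDD step
    Lt = Bv // n                               # trellis stages per JDD step
    return {"n": n, "Mc": Mc, "Bl": Bl, "Bv": Bv, "Lv": Lv, "Lt": Lt}

for qam in (16, 64):
    for n in (2, 3):
        print(f"{qam}-QAM, rate 1/{n}:", cjdd_parameters(qam, n))
```

The only configurations where $L_v = L_t = 1$ are 16-QAM with rate 1/2 and 64-QAM with rate 1/3. These are exactly the settings where one tree level lines up with one trellis stage, which is why the original JDD is limited to them.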

2. Valid Symbol Finder (VSF) Algorithm

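The VSF is the core of the CJDD algorithm. Its main function is to check the valid symbols in the tree search, skip the invalid symbols, and thereby decode the output bits. A valid node is one that the channel encoder can map to output bits. For example, for 16-QAM with a rate-1/2 code ($n = 2$), the tree-search engine marks the valid symbols among the branches at each level and discards the invalid ones. The example of the previous subsection (16-QAM, rate 1/3, $n = 3$) is harder. There the number of coded bits does not match the number of bits per node, so the system parameters computed in the previous subsection are needed.

Fig. 2 shows the finite state machine (FSM) of the channel encoder for this example (16-QAM, rate 1/3). Each [xxxx] denotes the value of the state register, and v/yzw denotes the input/output bits. Starting from the initial state [0000] (leftmost column), the input bit leads to one of two next states, [0000] or [1000] (second column), with outputs 000 or 111, respectively. This is how channel encoding works: the encoder takes the current FSM state and the source bit as inputs and produces the output, i.e., the coded bits.

The first FSM step produces three bits. The first two (00) map to a 16-QAM symbol. The remaining bit (0) is combined with the three coded bits of the next FSM step, and these four bits map to two more 16-QAM symbols. Because $L_v = 3$, the VSF searches 3 levels of nodes in the tree at a time. Because $L_t = 2$, the encoder FSM yields $4 = 2^{L_t}$ search paths, and each search path contains $L_v$ valid symbols.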
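The enumeration of the $2^{L_t}$ search paths described above can be sketched in Python. The report does not list the generator polynomials of its rate-1/3 encoder, so this sketch assumes the octal (25, 33, 37) taps of a common memory-4 code. With these taps the first transition from [0000] produces 000 or 111 and leads to [0000] or [1000], as in Fig. 2. The later transitions need not match the report's figure.

```python
# Assumption: generator taps of the rate-1/3 encoder (the report does not list them).
MEMORY = 4                          # 4-bit state register [xxxx], as in Fig. 2
GENERATORS = (0o25, 0o33, 0o37)     # n = 3 coded bits per source bit

def step(state, v):
    """One trellis transition: (next_state, coded_bits) for input bit v.
    The newest input enters the leftmost state bit, so [0000] -1-> [1000]."""
    reg = (v << MEMORY) | state
    coded = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return reg >> 1, coded

def enumerate_paths(state, Lt, Bl):
    """Follow all 2**Lt input sequences from `state` and regroup the
    Lt * n coded bits into tree nodes of Bl bits each."""
    frontier = [(state, ())]
    for _ in range(Lt):
        frontier = [(nxt, bits + out)
                    for st, bits in frontier
                    for nxt, out in (step(st, 0), step(st, 1))]
    return [(st, [bits[i:i + Bl] for i in range(0, len(bits), Bl)])
            for st, bits in frontier]

for final_state, nodes in enumerate_paths(0b0000, Lt=2, Bl=2):
    print(f"[{final_state:04b}]", nodes)
```

Each of the four printed paths has $L_v = 3$ nodes of $B_l = 2$ bits. The 3-bit codewords straddle node boundaries, and only after $L_t = 2$ trellis stages do the bits line up with whole nodes.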

Accordingly, the main task of the VSF algorithm is to set the behavior of the tree search from the system parameters, including the number of levels to search and the valid paths to generate. Fig. 3 shows the flowchart of the VSF algorithm. The algorithm receives the node state to be processed and declares three counter variables, which control three loops:

1. The first loop controls the number of encoder queries. For each state, the bits 0 and 1 are fed to the encoder to obtain the next valid states and valid output bits. This is repeated $L_t$ times.
2. The second loop maps the valid output bits to symbols in units of $B_l$ bits, so that each path yields $L_v$ symbols.
3. The third loop uses these symbols to compute the path metrics of the $2^{L_t}$ paths. It then returns the valid states, paths, and path metrics, and the algorithm ends.

The algorithm thus takes the system parameters defined above as control parameters and generates the correct valid paths. To verify its functionality, we implemented the algorithm in C and confirmed that it works correctly. Fig. 4 gives the pseudocode of the VSF algorithm in more detail, and experimental results are presented in a later subsection. The algorithm proposed in this subsection has been written up and submitted as a paper [27].

Fig. 2. Encoder state diagram for the 16-QAM, rate-1/3 configuration

Fig. 3. Flowchart of the VSF algorithm

Fig. 4. Pseudocode of the VSF algorithm

Algorithm I: Valid Symbol Finder
VSF(input_state, Lt, Lv, Bl)
    Lt_cnt = 0; i = 0; j = 0; k = 0; size_dst = 0
    arrays State_src, State_dst, Metric, Obits_src, Obits_dst, Paths
    State_src[0] ← input_state; Obits_src ← ∅; Obits_dst ← ∅
    repeat
        repeat
            initial_state ← State_src[i]
            for each input bit v in {0, 1}:
                identify the next state and the output bit sequence c_1 c_2 … c_n
                State_dst[j] ← next state
                Obits_dst[j] ← Obits_src[i] ‖ c_1 c_2 … c_n
                j ← j + 1
            end
            i ← i + 1
        until i = sizeof(State_src)
        i ← 0; j ← 0
        State_src ← State_dst; clear State_dst
        Obits_src ← Obits_dst; clear Obits_dst
        Lt_cnt ← Lt_cnt + 1
    until Lt_cnt = Lt
    repeat
        read the bit sequence b_1 b_2 … b_Bv from Obits_src[k]
        group every Bl bits into a modulation point, generating Lv symbols s_1 … s_Lv
        Paths[k] ← s_1 … s_Lv
        Metric[k] ← path metric of Paths[k]
        k ← k + 1
    until k = sizeof(State_src)
    return {State_src, Paths, Metric}

Fig. 5. Pseudocode of the CJDD algorithm based on K-Best tree search

Algorithm II: CJDD, K-Best approach
K-Best(root_state, m, Nch, Lt, Lv, Bl)
    arrays State_sur, State_tmp, PM, PM_tmp, Path
    i = 0; j = 0; L = Nch·m + Lv
    State_sur[0] ← root_state
    repeat
        i = 0
        repeat
            {State_tmp[i], Path[i], PM_tmp[i]} ← VSF(State_sur[i], Lt, Lv, Bl)
            i = i + 1
        until State_sur[i] = NULL
        sort PM_tmp
        keep the K best entries of PM_tmp, Path, and State_tmp
        State_sur ← State_tmp
        L = L - Lv
    until L = 0
    return the source bits of the path with the minimum path metric
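Algorithm I can be sketched in Python as follows. Loop 1 queries the encoder $L_t$ times. Loops 2 and 3 map every $B_l$ bits to a PAM level and accumulate a path metric. Several details are assumptions and do not come from the report:

- a hypothetical memory-4, rate-1/3 encoder with octal taps 25, 33, 37;
- a natural-binary mapping of $B_l$ bits to the levels {-3, -1, 1, 3};
- a metric that treats the channel as already triangularized to the identity, so each level contributes $(z_k - s_k)^2$.

```python
# Assumptions: hypothetical rate-1/3 encoder, natural-binary PAM mapping,
# identity (already triangularized) channel in the path metric.
MEMORY = 4
GENERATORS = (0o25, 0o33, 0o37)   # n = 3

def encoder_step(state, v):
    """Next encoder state and the n coded bits for input bit v."""
    reg = (v << MEMORY) | state
    return reg >> 1, [bin(reg & g).count("1") % 2 for g in GENERATORS]

def to_pam(bits):
    """Map Bl bits (MSB first) to a PAM level, e.g. 2 bits -> {-3, -1, 1, 3}."""
    value = int("".join(map(str, bits)), 2)
    return 2 * value - (2 ** len(bits) - 1)

def vsf(input_state, Lt, Lv, Bl, z):
    """Valid Symbol Finder sketch: returns (states, paths, metrics),
    one entry per valid path (2**Lt of them). z holds Lv observations."""
    state_src, obits_src = [input_state], [[]]
    for _ in range(Lt):                          # loop 1: query the encoder Lt times
        state_dst, obits_dst = [], []
        for st, obits in zip(state_src, obits_src):
            for v in (0, 1):
                nxt, out = encoder_step(st, v)
                state_dst.append(nxt)
                obits_dst.append(obits + out)
        state_src, obits_src = state_dst, obits_dst
    paths, metrics = [], []
    for obits in obits_src:                      # loops 2-3: map bits, compute metrics
        symbols = [to_pam(obits[i:i + Bl]) for i in range(0, Lv * Bl, Bl)]
        paths.append(symbols)
        metrics.append(sum((zk - sk) ** 2 for zk, sk in zip(z, symbols)))
    return state_src, paths, metrics

states, paths, metrics = vsf(0b0000, Lt=2, Lv=3, Bl=2, z=[3.1, 0.8, 2.7])
best = min(range(len(paths)), key=metrics.__getitem__)
print(paths[best], f"next state [{states[best]:04b}]")
```

For the 16-QAM, rate-1/3 case ($L_t = 2$, $L_v = 3$, $B_l = 2$), one call returns the four valid paths from the given state. With observations close to (3, 1, 3), the third path has the smallest metric.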

3. CJDD Algorithm Based on K-Best Tree Search

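As noted in the literature review, tree-search-based MIMO detection achieves low computational complexity and is therefore widely studied and used. Previous work has investigated many tree-search algorithms for MIMO detection and analyzed their advantages, drawbacks, and complexity. Among them, K-Best is most widely regarded as the best suited to circuit architecture design and implementation. K-Best follows a breadth-first principle and processes the tree level by level from top to bottom. At each level only the K best nodes are kept, hence the name K-Best. Its breadth-first nature gives it a fixed structure that is easy to implement in hardware.

In this subsection we describe how K-Best tree search realizes the CJDD algorithm. K-Best expands the tree level by level. Using the system parameters computed with the rules proposed earlier in this report (Eqs. (2)-(5)), each step searches one or more tree levels, until the VSF can identify both the valid paths and the symbols. The symbols are then used to compute path metrics, and the K best survivors are kept. Regardless of how many paths the survivors generate, K-Best expands from the current K survivors and then selects the next set of K survivors, and so on. Unlike typical K-Best tree search, the K-Best search used here expands the tree only as required, one level group at a time, which saves computation.

Fig. 5 shows the pseudocode of the CJDD algorithm based on K-Best tree search. The algorithm can be viewed as two loops:

- The outer loop controls the tree levels, one step at a time, until the whole tree has been searched.
- The inner loop iterates over the survivors, feeding each one into the VSF until the current set of survivors has been processed.

At the start, the initial encoder state is fed to the VSF as the first survivor. The VSF returns the valid paths, path metrics, and valid states. K-Best then sorts the returned path metrics and checks whether there are more than K survivors; if so, only the K best are kept as the next set of survivors. After each step, the algorithm checks whether the whole tree has been searched. If not, it continues. Otherwise, it returns all valid output bits on the current shortest path and terminates.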
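The two loops of Algorithm II can be sketched in Python on top of a compact VSF. The sketch assumes:

- a hypothetical memory-4, rate-1/3 encoder with octal taps 25, 33, 37;
- a natural-binary mapping to 4-PAM levels;
- an identity (already triangularized) channel, so the observation vector z holds one noisy level per tree level.

It illustrates the survivor bookkeeping only. It is not the report's implementation.

```python
import random

# Hypothetical rate-1/3 encoder with a 4-bit state register (octal taps 25, 33, 37).
MEMORY, GENERATORS = 4, (0o25, 0o33, 0o37)

def encoder_step(state, v):
    reg = (v << MEMORY) | state
    return reg >> 1, [bin(reg & g).count("1") % 2 for g in GENERATORS]

def to_pam(bits):
    """Natural-binary mapping of Bl bits to a PAM level (assumption)."""
    return 2 * int("".join(map(str, bits)), 2) - (2 ** len(bits) - 1)

def vsf(state, Lt, Lv, Bl, z):
    """Expand one survivor by Lt trellis stages (= Lv tree levels).
    Returns (next_state, source_bits, metric_increment) per valid path."""
    frontier = [(state, [], [])]                 # (state, source bits, coded bits)
    for _ in range(Lt):
        expanded = []
        for st, src, coded in frontier:
            for v in (0, 1):
                nxt, out = encoder_step(st, v)
                expanded.append((nxt, src + [v], coded + out))
        frontier = expanded
    results = []
    for st, src, coded in frontier:
        symbols = [to_pam(coded[i:i + Bl]) for i in range(0, Lv * Bl, Bl)]
        results.append((st, src, sum((zk - sk) ** 2 for zk, sk in zip(z, symbols))))
    return results

def cjdd_kbest(z, K, Lt, Lv, Bl):
    """K-Best CJDD sketch; len(z) must be a multiple of Lv."""
    survivors = [(0.0, 0, [])]                   # (path metric, encoder state, source bits)
    for start in range(0, len(z), Lv):           # outer loop: Lv tree levels per step
        children = []
        for pm, st, src in survivors:            # inner loop: feed each survivor to the VSF
            for nxt, bits, inc in vsf(st, Lt, Lv, Bl, z[start:start + Lv]):
                children.append((pm + inc, nxt, src + bits))
        children.sort(key=lambda c: c[0])        # sort by accumulated path metric
        survivors = children[:K]                 # keep the K best
    return survivors[0][2]                       # source bits on the best path

def transmit(src, Bl, noise_std, rng):
    """Encode, map to PAM levels, and add real Gaussian noise (identity channel)."""
    state, coded = 0, []
    for v in src:
        state, out = encoder_step(state, v)
        coded += out
    return [to_pam(coded[i:i + Bl]) + rng.gauss(0, noise_std)
            for i in range(0, len(coded), Bl)]

rng = random.Random(7)
Lt, Lv, Bl = 2, 3, 2                             # 16-QAM with a rate-1/3 code (Table I)
src = [rng.randint(0, 1) for _ in range(12)]     # a multiple of Lt source bits
z = transmit(src, Bl, noise_std=0.5, rng=rng)
print(cjdd_kbest(z, K=16, Lt=Lt, Lv=Lv, Bl=Bl) == src)
```

With noise_std=0.0 the transmitted path has metric 0 at every step, so it always survives and is returned. With noise, whether the decoded bits match depends on K and the noise level.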

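To verify the proposed algorithm and confirm that it achieves a better BER than the conventional separate architecture, we simulated the BER under different system parameters. Fig. 6 shows the comparison. The setup was as follows:

- a 4×4 MIMO wireless communication system with 16-QAM modulation;
- convolutional codes of rate 1/2 and rate 1/3, for both the proposed CJDD algorithm and the conventional separate approach;
- for the conventional architecture, a K-Best MIMO detector with K = 64 that produces soft outputs, followed by a Viterbi decoder that decodes the convolutional code.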

Fig. 6. BER comparison of the CJDD algorithm and the conventional separate approach (4x4 MIMO, 16-QAM, K = 64; BER versus Eb/N0 in dB; curves: CJDD at rates 1/3 and 1/2, conventional soft-output at rates 1/3 and 1/2)

Fig. 7. Error-resilient MIMO detection system with low-voltage memory

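Our CJDD algorithm also uses a K-Best architecture with K = 64 and performs MIMO detection and channel decoding simultaneously. Fig. 6 shows that with a rate-1/2 code, CJDD improves the BER by about 2 dB over the conventional architecture. With a rate-1/3 code, the improvement is even more pronounced. The experimental results therefore show that the proposed CJDD algorithm significantly improves the BER of MIMO wireless communication systems.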

4. Error-Resilient MIMO Detection Algorithm and Circuit Architecture for Low-Voltage Memories

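Because memory takes a large share of the overall system power consumption, we would like to lower the memory supply voltage to reduce that consumption. However, lowering the memory voltage causes errors in the stored data, which in turn cause computation errors in later signal processing components such as the MIMO detector. In this phase of the research, we collaborated with researchers at the University of California, Irvine to develop MIMO detection algorithms and circuits that work with low-supply-voltage memories. The goal is to reduce system power consumption by lowering the memory supply voltage, while the proposed error-resilient architecture keeps the signal correct.

Fig. 7 illustrates this system concept. As the figure shows, the main task is to develop a MIMO detector that corrects the memory errors caused by the lowered supply voltage and restores the original signal reliability. This detector must correct signal impairments on top of the original channel interference, so its complexity inevitably rises compared with the original MIMO detector. The main challenge of this work is therefore to develop a low-complexity error-resilient MIMO detector.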

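The MIMO detector studied here is again based on tree search. Unlike a conventional tree-search MIMO detector, every node in this system also searches the branch nodes that correspond to possible memory errors. If such a branch node turns out to lie on the shortest path, we conclude that a stored data error has occurred in memory and correct it. An error-resilient MIMO detector can therefore be built by extending a conventional tree-search MIMO detector to correct memory errors as well.

Fig. 8 illustrates this concept. The left branch is the original tree, and the two branches on the right are the extra outcomes that account for memory errors. The main challenge of the error-resilient system, compared with a conventional MIMO detection system, is that it searches more nodes and so has higher complexity. We proposed efficient algorithms and circuits that reduce this complexity and keep the increase in power consumption within an acceptable range. As a result, the overall power consumption (memory plus MIMO detector) improves significantly [25].

We also found that the larger the memory area, the larger the power saving of the proposed architecture, because memory then accounts for a larger share of the power consumption. This makes the system especially suitable for future wireless communication systems, which will need more memory as the numbers of antennas and OFDM subcarriers grow.

Fig. 8. Concept of the error-resilient MIMO detection algorithm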
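The following toy Python example illustrates the idea behind Fig. 8: one tree node is expanded both for the stored value and for an extra "memory error" hypothesis, and the detector takes whichever branch has the smallest metric. All details here are assumptions, and the actual algorithm of [25] is not reproduced:

- a 6-bit two's-complement storage format;
- a single sign-bit-flip hypothesis, since that bit causes the largest distortion;
- a fixed penalty on the error branch.

```python
# Toy illustration; every parameter below is an assumption, not taken from [25].
WORD_BITS = 6            # two's-complement word stored in low-voltage memory
FRAC_BITS = 2            # fixed-point format: LSB = 0.25
PAM4 = (-3, -1, 1, 3)    # candidate levels at this tree node

def to_word(x):
    """Quantize x to a WORD_BITS two's-complement integer (saturating)."""
    lo, hi = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1
    return max(lo, min(hi, round(x * (1 << FRAC_BITS))))

def from_word(w):
    return w / (1 << FRAC_BITS)

def flip(w, bit):
    """Flip one bit of a two's-complement word and reinterpret it as signed."""
    u = (w & ((1 << WORD_BITS) - 1)) ^ (1 << bit)
    return u - (1 << WORD_BITS) if u >> (WORD_BITS - 1) else u

def resilient_node(stored_word, flip_bits=(WORD_BITS - 1,), penalty=1.0):
    """Expand one node with extra branches for possible memory bit flips.
    Returns (best_symbol, corrected_word, error_detected)."""
    hypotheses = [(stored_word, 0.0)]                       # the "no error" branch
    hypotheses += [(flip(stored_word, b), penalty) for b in flip_bits]
    best = None
    for word, cost in hypotheses:
        for s in PAM4:
            metric = (from_word(word) - s) ** 2 + cost
            if best is None or metric < best[0]:
                best = (metric, s, word)
    _, symbol, word = best
    return symbol, word, word != stored_word

stored = flip(to_word(0.8), WORD_BITS - 1)   # simulate a sign-bit upset in memory
print(stored, resilient_node(stored))        # -29 (1, 3, True)
```

Adding error branches multiplies the number of nodes visited, which is the complexity increase the section describes. Bounding which bits are considered, as done here with the sign bit only, is one way to keep that increase small.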

5. Routing Algorithm for Low-Power Networks on Chip (NoC)

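Both the error-resilient MIMO detector and the CJDD algorithm are built on tree search. During this project we therefore studied many tree-search methods in depth and gained a thorough understanding of their performance and complexity trade-offs. Because tree-search algorithms apply widely, we also broadened the research to other problems. Through joint research with several other researchers, we found that the tree-search algorithms studied here can inspire solutions to the routing problem in networks on chip.

A network on chip (NoC) is a recent solution that uses network-style connections for communication among the components inside a system on chip. Each component of the system on chip, such as a processor core, memory device, input/output device, or coprocessor, is treated as a node of a network. Data moves between nodes according to a routing algorithm. For example, when a processor core wants to store processed data back to memory, it injects the data into the NoC, and the NoC controller forwards it to the memory device's node according to the routing rule. The quality of the routing algorithm therefore decisively affects NoC performance, such as speed and power consumption.

Routing can be viewed as a path optimization problem: finding the most suitable path between two nodes. This closely resembles the tree-search problem in MIMO detection, which looks for the shortest path. Inspired by tree-search algorithms, we therefore improved NoC routing and achieved low power consumption in the NoC system [26].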

6. Conclusions and Recommendations

The specific contributions of this project are threefold:

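(1) The configurable algorithm proposed in this project performs detection and convolutional decoding simultaneously in a single stage. Previous JDD algorithms work only with one specific modulation and code rate. Our method stays effective for arbitrary combinations of the two, because the system parameters defined in this project adjust the tree-search structure and make the algorithm configurable. The proposed algorithm is therefore better suited to MIMO wireless communication systems than previous designs. This work has been written up as a paper [27] and submitted to a 2014 conference.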

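(2) Together with international collaborators, this project proposed an error-resilient MIMO detection algorithm and circuit architecture. Compared with conventional architectures, the proposed method effectively reduces the overall system power consumption while maintaining signal reliability. The results have been written up as a journal paper [25], which has been accepted and will be published soon.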

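(3) This project inspired further improvements to NoC routing. According to our results, the proposed routing algorithm effectively reduces power consumption and improves NoC performance. This result was published in a 2013 conference paper [26].

We believe this line of research is highly extensible and has strong potential for further in-depth study. Future directions of this work may include the following: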

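(1) Apply the proposed CJDD algorithm to other tree-search architectures and compare their advantages and drawbacks. The algorithm in this project is based on K-Best, and we are now investigating what performance gains and complexity reductions other tree-search algorithms may bring. Candidates include depth-first approaches and the Fano algorithm.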

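(2) Implement the proposed CJDD algorithm in hardware, which involves design issues such as reducing hardware complexity. A hardware implementation would let us run more experiments and explore more aspects of system design. This work is in progress: we plan to design a CJDD digital integrated circuit and an FPGA implementation.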

(3) Extend the proposed CJDD algorithm to more advanced codes, such as turbo codes, to broaden its applicability.

VII. References

[1] D. Gesbert, M. Shafi, D.-S. Shiu, P. J. Smith, and A. Naguib, "From theory to practice: An overview of MIMO space-time coded wireless systems," IEEE J. Sel. Areas Commun., vol. 21, no. 3, pp. 281–302, Apr. 2003.

[2] B. Hochwald and S. ten Brink, "Achieving near-capacity on a multiple-antenna channel," IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.

[3] A. D. Murugan, H. El Gamal, M. O. Damen, and G. Caire, "A unified framework for tree search decoding: Rediscovering the sequential decoder," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 933–953, Mar. 2006.

[4] C. P. Sukumar, C.-A. Shen, and A. M. Eltawil, "Joint Detection and Decoding for MIMO Systems Using Convolutional Codes: Algorithm and VLSI Architecture," IEEE Trans. Circuits Syst. I, vol. 59, no. 9, pp. 1919–1931, Sep. 2012.

[5] C. Studer, A. Burg, and H. Bölcskei, "Soft-output sphere decoding: Algorithms and VLSI implementation," IEEE J. Sel. Areas Commun., vol. 26, no. 2, pp. 290–300, Feb. 2008.

[6] H. Vikalo and B. Hassibi, "On joint detection and decoding of linear block codes on Gaussian vector channels," IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3330–3342, Sep. 2006.

[7] F. Sun and T. Zhang, "Low-power state-parallel relaxed adaptive Viterbi decoder," IEEE Trans. Circuits Syst. I, vol. 54, no. 5, pp. 1060–1068, May 2007.

[8] L. Boher, R. Rabineau, and M. Hélard, "FPGA implementation of an iterative receiver for MIMO-OFDM systems," IEEE J. Sel. Areas Commun., vol. 26, no. 6, pp. 857–866, Aug. 2008.

[9] M. Mahdavi and M. Shabany, "Novel MIMO Detection Algorithm for High-Order Constellations in the Complex Domain," IEEE Trans. VLSI Syst., vol. 21, no. 5, pp. 834–847, May 2013.

[10] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance–Complexity Comparison of Receivers for a LTE MIMO–OFDM System," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3360–3372, Jun. 2010.

[11] C.-Y. Yang and D. Markovic, "A flexible DSP architecture for MIMO sphere decoding," IEEE J. Solid-State Circuits, vol. 56, no. 10, pp. 2301–2314, Oct. 2009.

[12] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bölcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, Jul. 2005.

[13] L. Liu, F. Ye, X. Ma, T. Zhang, and J. Ren, "A 1.1-Gb/s 115-pJ/bit Configurable MIMO Detector Using 0.13-μm CMOS Technology," IEEE Trans. Circuits Syst. II, vol. 57, no. 9, pp. 701–705, Sep. 2010.

[14] C. Novak, C. Studer, A. Burg, and G. Matz, "The effect of unreliable LLR storage on the performance of MIMO-BICM," in Proc. Asilomar Conf. Signals, Systems and Computers (ASILOMAR), pp. 736–740, 2010.

[15] C. Gimmler-Dumont, C. Brehm, and N. Wehn, "Reliability Study on System Memories of an Iterative MIMO-BICM System," in Proc. IEEE/IFIP 20th Int. Conf. VLSI and System-on-Chip (VLSI-SoC), pp. 255–258, 2012.

[16] A. M. A. Hussien, M. S. Khairy, A. Khajeh, A. M. Eltawil, and F. J. Kurdahi, "A Class of Low Power Error Compensation Iterative Decoders," in Proc. IEEE Global Telecommunications Conf. (GLOBECOM 2011), pp. 1–6, Dec. 2011.

[17] J. Geldmacher and J. Götze, "On fault tolerant decoding of Turbo codes," in Proc. 2012 7th Int. Symp. Turbo Codes and Iterative Information Processing (ISTC), pp. 245–249, 2012.

[18] J. Geldmacher and J. Götze, "EXIT-Optimized Index Assignments for Turbo Decoders with Unreliable LLR Transfer," IEEE Commun. Lett., vol. 17, no. 5, pp. 992–995, 2013.

[19] J. Geldmacher, K. Hueske, and J. Götze, "Turbo Equalization for Receivers with Unreliable Buffer Memory," in Proc. IEEE Vehicular Technology Conf. (VTC Fall), pp. 1–5, 2011.

[20] E. Kim and N. Shanbhag, "Energy-Efficient LDPC Decoders Based on Error-Resiliency," in Proc. IEEE Workshop on Signal Processing Systems (SiPS), pp. 149–154, 2012.

[21] M. May, M. Alles, and N. Wehn, "A Case Study in Reliability-Aware Design: A Resilient LDPC Code Decoder," in Proc. Design, Automation and Test in Europe (DATE '08), pp. 456–461, 2008.

[22] R. A. Abdallah and N. R. Shanbhag, "Error-Resilient Low-Power Viterbi Decoder Architectures," IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4906–4917, 2009.

[23] C. Roth, C. Benkeser, C. Studer, G. Karakonstantis, and A. Burg, "Data mapping for unreliable memories," in Proc. Annual Allerton Conf. Communication, Control, and Computing (Allerton), pp. 679–685, 2012.

[24] C. Gimmler-Dumont, M. May, and N. Wehn, "Cross-Layer Error Resilience and Its Application to Wireless Communication Systems," J. Low Power Electronics (JOLPE), vol. 9, no. 1, Apr. 2013.

[25] M. S. Khairy, C.-A. Shen, A. M. Eltawil, and F. J. Kurdahi, "Algorithms and Architectures of Energy-Efficient Error-Resilient MIMO Detectors for Memory-Dominated Wireless Communication Systems," IEEE Trans. Circuits Syst. I, to appear, Dec. 2013.

[26] C.-K. Hsu, K.-L. Tsai, J.-F. Jheng, S.-J. Ruan, and C.-A. Shen, "A low power deflection routing method for bufferless NoC," in Proc. 14th Int. Symp. Quality Electronic Design (ISQED), pp. 364–367, Mar. 2013.

[27] C.-H. Huang, C.-P. Yu, and C.-A. Shen, "A Configurable Joint Detection and Decoding Algorithm for MIMO Wireless Communications," submitted for review, ISCAS 2014, Oct. 2013.

Report on Attending an International Conference under an NSC-Funded Research Project

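Date: December 20, 2013 (ROC year 102)

Project number: NSC 101-2218-E-011-045-

Project title: Design and Investigation of Algorithms and System Architectures for Joint MIMO Detection and Channel Decoding Applicable to General System Configurations

Traveler: Chung-An Shen (沈中安); affiliation and title: Assistant Professor, Department of Electronic Engineering, National Taiwan University of Science and Technology

Conference dates: March 4 to March 6, 2013 (ROC year 102); location: Santa Clara, CA, USA

Conference name: International Symposium on Quality Electronic Design (ISQED)

Presented paper: A low power deflection routing method for bufferless NoC

1. Conference Attendance

Both the error-resilient MIMO detector and the CJDD algorithm are built on tree search, so this project studied many tree-search methods in depth and gained a thorough understanding of their performance and complexity trade-offs. Because tree-search algorithms apply widely, we also broadened the research to other problems. Through joint research with several other researchers, we found that the tree-search algorithms studied here can inspire solutions to the routing problem in networks on chip. A network on chip (NoC) is a recent solution that uses network-style connections for communication among the components inside a system on chip. Inspired by tree-search algorithms, we improved NoC routing and achieved low power consumption in the NoC system. We published these results at ISQED 2013, and I attended the conference to present the paper. ISQED is a small but high-quality conference in electronic design that focuses on the performance and complexity of practical electronic circuits and systems.

The conference is held every spring in Santa Clara, in Northern California. Being close to Silicon Valley, it attracts many employees and managers of Silicon Valley high-tech companies. This year's conference ran from March 4 to March 6. I attended the entire conference and listened to many talks, and our poster was presented in the evening of the second day (March 5). The photo below was taken in the lobby of the ISQED 2013 venue.

2. Impressions

Overall, many papers at this conference addressed the design, fabrication, and production of 3D integrated circuits (ICs). For example, several talks discussed how RTL design should be modified for 3D ICs, and several others discussed low-power design for 3D ICs. The authors presented their latest results, and since many talks came from local Silicon Valley companies, they were highly practical. Notably, one keynote speech was given by Prof. Chenming Hu, an authority on electronic devices and circuits, on "The Changing Device Technology." The talk mainly covered the characteristics and design concepts of FinFET devices. Prof. Hu is a pioneer of FinFET research, so hearing him present the topic in person was very rewarding.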

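I was also impressed by the many talks on soft errors in circuits and memory devices. As IC fabrication processes continue to advance, soft errors will become an important issue. System development, architecture design, and even algorithm development will all need to understand the circuit characteristics of soft errors and find solutions.

A broader, related topic is energy-aware system design. With the rapid growth of handheld embedded systems, energy consumption has become the most important constraint and consideration in electronic circuit and system design. Several papers at this conference therefore accounted for energy consumption already during system (or even algorithm) design, so that energy savings can be pursued from the earliest design stage. These latest trends in the practical design of electronic circuits and systems made attending this conference well worthwhile.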

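3. Full Text or Abstract of the Presented Paper

The full text of the paper presented at this conference is attached after this report.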

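4. Suggestions

ISQED is known for the participation of many Silicon Valley high-tech companies. Attending it shows not only the research focus of academia but also the practical experience of industry. This conference suggests that the development of electronic circuits and systems currently focuses on the following three areas, which I believe can serve as key directions for future teaching and research.

• Issues related to FinFET devices. 3D IC will certainly be a future trend, so related research and teaching topics, such as low-power design, design flows, and architecture design, must be adjusted accordingly.

• Soft errors in electronic circuits, systems, or memory devices, and energy-aware circuits and systems.

• Low-energy green systems and green circuits.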

5. Materials Brought Back

The conference program and proceedings.

6. Other


A Low Power Deflection Routing Method for Bufferless NoC

Chung-Kai Hsu¹, Kun-Lin Tsai²*, Jing-Fu Jheng¹, Shanq-Jang Ruan¹, Chung-An Shen¹

¹Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan

²Department of Electrical Engineering, Tunghai University, Taichung, Taiwan

*E-mail: kltsai@thu.edu.tw

Abstract—Network-on-Chip (NoC) has been proposed for high-performance on-chip communication. The major component of an NoC architecture is the router, which affects data transmission latency, chip area, and power consumption. Inside the router, buffers consume a significant amount of power and occupy a large portion of the chip area. Bufferless NoC, which removes the buffers from the routers, has therefore been proposed to solve the power and area problem. In this paper, a low power deflection routing method is proposed for bufferless on-chip networks that handles the routing problem while achieving the low-power goal. The proposed method uses a routing matrix to construct the possible routing paths and then selects the best route for each data packet. Because the method requires only a few calculations, it lowers power consumption. The experimental results show that the proposed approach greatly reduces power consumption and chip area compared with previous work.

Keywords—Deflection routing, low power, bufferless router, NoC

I. Introduction

With shrinking chip sizes and advances in process technology, millions of transistors can be placed on a single chip. The development of System-on-a-Chip (SoC) allows more functional components to be integrated into a single chip. Driven by functional demands, SoC design has become more complex, and an SoC can contain hundreds of silicon intellectual property (SIP) modules [1], [2]. Improving system performance and on-chip communication capability under area and power limitations has therefore become a major design challenge [3].

To overcome the challenges of on-chip data communication, Network-on-Chip (NoC) has been proposed [3]. Data between SIPs are transmitted as packets and forwarded by routers [4], [5]. Thanks to its high regularity and easily controllable structure, NoC offers high efficiency and reliability, and its regular structure allows the modular NoC to be reused repeatedly.

In router design, the buffer is a key component that affects NoC bandwidth. Over the past few years, a considerable amount of research has been done on NoC buffer design [6]–[8]. However, NoCs with buffers consume significant power and circuit area, so several studies have proposed bufferless NoCs to reduce power consumption and circuit area [9]–[11]. A bufferless NoC removes the router buffers and transmits data packets by deflecting them to a free output port [10]. Following the data flow, a deflection routing algorithm based on a permutation tree has been proposed to compute the routing matrix [12]. However, permutation tree routing also consumes considerable power and area: because the values of the routing matrix are used to construct the routing path, the permutation tree must evaluate many complicated cases.

Unlike previous work that uses many calculations to decide the packet routing path in bufferless NoCs, this paper proposes a novel deflection routing approach with simple operations to achieve low power and low area.

The deflection routing method automatically selects a routing path according to the current traffic load in the NoC and the characteristics of the packets. To avoid deadlock and livelock, the proposed method also uses a priority scheme. The experimental results show that the proposed method reduces power and area considerably. The packet transmission latency is also improved.
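The basic idea of bufferless deflection routing can be sketched in Python. Every packet must leave through some free output port in the same cycle, and a priority rule decides who gets its preferred port first. This sketch uses a generic oldest-first rule on a 2-D mesh. That rule, the port names, and the coordinate convention are assumptions for illustration; this is not the LPDR method proposed in this paper.

```python
# Ports of a hypothetical 2-D mesh router (local injection/ejection omitted).
PORTS = ("N", "E", "S", "W")

def productive_ports(cur, dst):
    """Ports that reduce the Manhattan distance to dst; y grows toward N."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports

def deflection_route(cur, packets):
    """Assign every packet an output port; nothing is buffered.

    packets: list of (packet_id, destination, age), at most len(PORTS) of them.
    Older packets are served first, a common way to prevent livelock.
    Returns {packet_id: port}.
    """
    free = list(PORTS)
    assignment = {}
    for pid, dst, age in sorted(packets, key=lambda p: -p[2]):
        wanted = [p for p in productive_ports(cur, dst) if p in free]
        port = wanted[0] if wanted else free[0]   # deflect if no productive port is free
        free.remove(port)
        assignment[pid] = port
    return assignment

print(deflection_route((1, 1), [("A", (3, 1), 5), ("B", (2, 1), 2), ("C", (1, 0), 1)]))
```

In this example packets A and B both want port E. The older packet A wins, and B is deflected to the first free port instead of waiting in a buffer.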

The rest of this paper is organized as follows. Section II introduces the permutation tree routing method. Section III describes the proposed low power deflection routing approach. The implementation of the deflection router is discussed in Section IV. Section V reports the experimental results and discussion. Finally, Section VI concludes the paper.

II. Permutation Tree Routing (PTR)

To decide the routing path, a routing matrix is used with the permutation tree to find the optimal routing permutation. All possible routing permutations can be found by constructing a permutation tree [12].

Let $i$ be the input direction and $j$ be the output direction. The routing cost function $c$ can be defined as follows:

$$c_{ij} = \begin{cases} 0, & \text{if } i \neq j \wedge d_{ij} = \text{true}, \\ 1, & \text{if } i \neq j \wedge d_{ij} = \text{false}, \\ 2, & \text{if } i = j, \end{cases} \qquad (1)$$

where $d_{ij}$ indicates whether output port $j$ brings the packet closer to its destination. That is, $d_{ij}$ is true when the remaining distance to the destination via output port $j$ is shorter than that via input port $i$; otherwise, $d_{ij}$ is false. The larger the value of $c$, the higher the penalty, i.e., the worse the routing decision.

TABLE I. PERMUTATION COMBINATIONS FOR DIFFERENT TOPOLOGIES

Type           Mesh    Star    3-D Mesh
Port number    4       5       6
Cases          24      120     720

According to the cost function (1), the entries of the routing matrix can be calculated by using the weighted value:

$$
r_{ij} = w_i \, c_{ij}, \tag{2}
$$

where $w_i$ represents the weighted value of input port $i$. The weighted value reflects the packet priority: for a mesh, the lowest-priority packet has $w_i = 1$ and the highest-priority packet has $w_i = 4$. Note that $w_i$ is set to 0 when no valid packet exists at input port $i$.

The permutation tree constructs the datapath from the entries of the routing matrix to optimize the routing decision. For the four-port mesh, the value of each case is the sum of four routing-matrix entries, which requires 3 additions.

After the cost values are summed, the cost-driven routing decision is the case with the minimum sum. However, the number of routing permutations depends on the number of ports in the topology, which affects the overall routing performance. As Table I shows, the number of routing cases grows dramatically (as n! for n ports) for more advanced topologies. This large number of routing cases degrades performance, consumes high power, and occupies a large chip area.
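The cost-driven search described above can be sketched as an exhaustive enumeration in Python. This is a behavioral model, not the permutation-tree datapath itself: `ptr_route` and the example cost matrix are hypothetical, and ties are resolved by keeping the first minimum in lexicographic permutation order, which the text does not specify.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def ptr_route(weights: Sequence[int],
              cost: Sequence[Sequence[int]]) -> Tuple[Tuple[int, ...], int]:
    """Exhaustive permutation search in the spirit of PTR (sketch).

    weights[i]: w_i, priority weight of input port i (0 when no valid packet)
    cost[i][j]: c_ij from Eq. (1)
    Returns (perm, total), where perm[i] is the output port chosen for input port i.
    """
    n = len(weights)
    # Routing matrix of Eq. (2): r_ij = w_i * c_ij
    r: List[List[int]] = [[weights[i] * cost[i][j] for j in range(n)]
                          for i in range(n)]
    best_perm, best_total = None, None
    for perm in permutations(range(n)):                 # n! routing cases
        total = sum(r[i][perm[i]] for i in range(n))    # n - 1 additions per case
        if best_total is None or total < best_total:    # keep the first minimum found
            best_perm, best_total = perm, total
    return best_perm, best_total


# 4-port example: only inputs 1 and 3 hold packets, and both prefer output 2.
cost = [
    [2, 0, 1, 1],
    [1, 2, 0, 1],
    [0, 1, 2, 1],
    [1, 1, 0, 2],
]
weights = [0, 4, 0, 1]          # input 1 has the higher priority
perm, total = ptr_route(weights, cost)
print(perm, total)              # -> (0, 2, 3, 1) 1
```

In the example, input 1 (w = 4) keeps its productive output 2 and input 3 (w = 1) is deflected at cost 1. A deflection to output 0 would cost the same, so the tie rule decides between them.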

III. Low Power Deflection Routing (LPDR)

This section proposes an effective routing algorithm that adjusts the routing path of data packets so that the routing can handle every traffic situation.

Figure 1 shows the flow of the low power deflection routing method for a star router. The proposed algorithm first checks the header of the input packet to obtain the destination coordinate. To obey the basic rule that every output packet must be directed toward its destination, a subset of the possible connections is used to construct a valid routing. The priority information is also included in the input packet, so that packets with a larger accumulated latency can be assigned higher priorities.

In addition, the coordinates of the packets are checked so that the routing achieves effective transmission between SIPs.

After the header information of the incoming packets is checked, a reference table is generated that lists the routing options: the preferred direction, the deflection direction, and the priority.

In the reference table, the deflection direction is fixed and corresponds to the input direction. It is important to note that the reference table affects the determination of the output ports for all incoming packets.

Based on the reference table, the I/O pattern that decides the packet routing is calculated. In the star router, a packet can arrive at the current router from, and leave toward the next router in, five different directions. To achieve the low-power and small-area goals, only simple arithmetic without multiplication or division is used, while the router still maintains high communication performance.
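Because the cost values of Eq. (1) only take the values 0, 1 and 2, a weighted entry of the form w_i c_ij never needs a general multiplier: it is zero, the weight itself, or the weight shifted left by one bit. The exact arithmetic of the LPDR I/O pattern is not given in this section, so the Python below is only an assumption-level illustration of that multiplier-free idea; `weighted_entry` is a hypothetical name.

```python
def weighted_entry(w_i: int, c_ij: int) -> int:
    """Form w_i * c_ij without a multiplier, given that c_ij is 0, 1 or 2."""
    if c_ij == 0:
        return 0            # gate the entry to zero
    if c_ij == 1:
        return w_i          # pass the weight through
    if c_ij == 2:
        return w_i << 1     # doubling is a 1-bit left shift
    raise ValueError("c_ij must be 0, 1 or 2")


# Exhaustive check against ordinary multiplication for star-router weights 0..5
for w in range(6):
    for c in (0, 1, 2):
        assert weighted_entry(w, c) == w * c
```

In hardware terms, this corresponds to a small multiplexer that selects between zero, the weight, and the weight shifted left by one bit.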

[Figure 1, flow for a five-port star router (ports 1 to 5): packet address fields (valid, hop, current coordinate, destination coordinate) → generate reference table (preference route, deflection route, priority) → calculate I/O pattern (entries r11 to r55 and I/O sums s1 to s5).]

Fig. 1. The flow of low power deflection routing method.

A. Packet Address

Each packet header contains four fields: valid bits, priority bits, current coordinates, and destination coordinates. The proposed method uses these fields to construct the datapath.

The valid bit indicates whether a packet comes from a valid router or is faulty. When the packet comes from another valid router, the valid bit is set to 1; otherwise, it is set to 0. In addition, when a packet arrives at its target router, its valid bit is also set to 0. When packets enter the router, all valid bits are checked to determine whether the packets are valid.
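A minimal Python sketch of the valid-bit handling described above. The `PacketHeader` class, its field names and types, and the helper names `on_arrival` and `valid_ports` are hypothetical, since the paper does not give field widths or an encoding. The hop-count field stands in for the priority-related header field that the reference-table step compares.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class PacketHeader:
    valid: int                      # 1: from a valid router; 0: faulty, empty, or delivered
    hop_count: int                  # hop field, compared later to assign priorities
    current: Tuple[int, int]        # current coordinate (x_C, y_C)
    destination: Tuple[int, int]    # destination coordinate (x_D, y_D)


def on_arrival(h: PacketHeader) -> PacketHeader:
    """Clear the valid bit once the packet has reached its target router."""
    if h.valid == 1 and h.current == h.destination:
        return replace(h, valid=0)
    return h


def valid_ports(headers: List[PacketHeader]) -> List[int]:
    """Check every valid bit and return the input ports that hold valid packets."""
    return [port for port, h in enumerate(headers) if h.valid == 1]


incoming = [
    PacketHeader(valid=1, hop_count=3, current=(1, 1), destination=(2, 3)),
    PacketHeader(valid=0, hop_count=0, current=(0, 0), destination=(0, 0)),  # faulty/empty
    PacketHeader(valid=1, hop_count=5, current=(1, 1), destination=(1, 1)),  # at its target
]
checked = [on_arrival(h) for h in incoming]
print(valid_ports(checked))   # -> [0]
```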

All packets in the module are assigned different priorities according to their transmission latency. According to the standard header format, the packet with the highest hop count must be routed first. Therefore, the proposed method compares all hop counts to determine the priority of each incoming packet. For a star router, the lowest priority is 1 and the highest priority is 5.
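Priority assignment by hop-count comparison can be sketched as a ranking. Two details are assumptions not stated in the text: ties between equal hop counts go to the lower port index, and an input without a valid packet receives priority 0 (mirroring w_i = 0 in PTR). With all five star-router inputs valid, the ranks span 1 (lowest) to 5 (highest) as described; `assign_priorities` is a hypothetical name.

```python
from typing import List


def assign_priorities(hop_counts: List[int], valid: List[int]) -> List[int]:
    """Rank valid packets by hop count: more hops means higher priority (1 = lowest).

    Assumptions: invalid inputs get priority 0, and equal hop counts are
    resolved in favor of the lower port index.
    """
    ports = [p for p in range(len(hop_counts)) if valid[p]]
    # Ascending by hop count; among equal hop counts the lower port index sorts
    # last (key -p), so it receives the larger rank.
    order = sorted(ports, key=lambda p: (hop_counts[p], -p))
    priority = [0] * len(hop_counts)
    for rank, port in enumerate(order, start=1):
        priority[port] = rank
    return priority


# Five-port star router, all inputs valid
print(assign_priorities([3, 7, 1, 7, 2], [1, 1, 1, 1, 1]))   # -> [3, 5, 1, 4, 2]
```

If fewer than five packets are valid, this sketch simply ranks the valid ones from 1 upward; the text does not say how the hardware handles that case.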

B. Reference Table

Priorities are assigned to the packets from every input direction based on a comparison of the hop count field in the packet header. To determine whether a packet is routed to the preferred direction or to the deflection direction, the distance between the current coordinate and the destination coordinate must be analyzed.

Let $(x_C, y_C)$ be the current coordinate and $(x_D, y_D)$ be the destination coordinate. The distance $\mathrm{disc}$ can be calculated by function (3):

$$
\mathrm{disc}: \quad x_{\text{disc}} = x_D - x_C, \qquad y_{\text{disc}} = y_D - y_C. \tag{3}
$$

Note that the deflection direction can be determined directly from the input direction of each packet.

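For each incoming packet, a sign function is used for measuring the relative position. The sign function of $x_{\text{disc}}$ and $y_{\text{disc}}$ can be defined as follows:

$$
\operatorname{sgn}(x_{\text{disc}}) =
\begin{cases}
-1, & \text{if } x_{\text{disc}} < 0, \\
0, & \text{if } x_{\text{disc}} = 0, \\
1, & \text{if } x_{\text{disc}} > 0.
\end{cases}
$$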
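Eq. (3) and the sign function can be combined into a small sketch that lists the directions that reduce the remaining distance. The direction labels E/W/N/S, the convention that +x is east and +y is north, and the LOCAL result for a zero displacement are hypothetical, because this section does not give the mapping from sign values to the star router's port numbers.

```python
from typing import List, Tuple


def sgn(v: int) -> int:
    """Sign function: -1 if v < 0, 0 if v == 0, 1 if v > 0."""
    return (v > 0) - (v < 0)


def displacement(current: Tuple[int, int], dest: Tuple[int, int]) -> Tuple[int, int]:
    """Eq. (3): x_disc = x_D - x_C and y_disc = y_D - y_C."""
    (x_c, y_c), (x_d, y_d) = current, dest
    return x_d - x_c, y_d - y_c


def preferred_directions(current: Tuple[int, int], dest: Tuple[int, int]) -> List[str]:
    """List the directions that shorten the remaining distance (hypothetical labels)."""
    x_disc, y_disc = displacement(current, dest)
    x_dir = {1: "E", -1: "W"}.get(sgn(x_disc))   # None when x_disc == 0
    y_dir = {1: "N", -1: "S"}.get(sgn(y_disc))   # None when y_disc == 0
    dirs = [d for d in (x_dir, y_dir) if d is not None]
    return dirs or ["LOCAL"]                      # zero displacement: packet has arrived


print(preferred_directions((1, 1), (3, 0)))   # -> ['E', 'S']
print(preferred_directions((2, 2), (2, 2)))   # -> ['LOCAL']
```

Because only subtraction and sign checks are involved, this step matches the section's goal of avoiding multiplication and division.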
