Markov Cluster Algorithm - Mesoscopic Analysis of the Structure of Low-Density Parity-Check Codes

4. Mesoscopic Analysis of the Structure of Low-Density Parity-Check Codes

4.2. Markov Cluster Algorithm

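The Markov Cluster Algorithm (MCL) is a clustering method based on structural equivalence. It was proposed by Stijn van Dongen [7] and can cluster a large number of nodes. MCL has been applied to the analysis of complex biological networks. One example is the topology of protein interaction networks (Topological Similarity of Protein Interaction Network), where MCL is used for protein family detection. Another is the study of human disease genes (Disease-Gene network). The human disease-gene network is a bipartite graph, and MCL can be used to predict associations in it: after clustering, the genes in a cluster tend to influence the diseases in the same cluster through biological reaction processes. Compared with arbitrarily chosen genes, genes in a cluster have a higher probability of influencing the diseases in that cluster.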

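MCL is based on random walks: it repeatedly simulates the paths that a walker takes through the graph. After some time, the walker has a higher probability of staying within certain regions, and these regions are called clusters. MCL relies on two operations, expansion and inflation. Expansion lets the walk reach different nodes, while inflation amplifies the differences between the transition probabilities of the edges, strengthening strong edges and weakening weak ones. Starting from an initial stochastic matrix, expansion and inflation are applied repeatedly until the nodes separate into clusters. Table 7 lists the MCL algorithm.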

Step 1. Input a Graph, expansion parameter 𝑒, inflation parameter 𝑟;

Step 2. Create the adjacency matrix from the graph;

Step 3. Add self-loops;

Step 4. Normalize the matrix 𝑀;

Step 5. Expand by taking the 𝑒-th power of the matrix, i.e. 𝑀^𝑒;

Step 6. Inflate by taking inflation of the resulting matrix with parameter 𝑟;

Step 7. Repeat step 5 and step 6 until a steady state is achieved;

Step 8. Interpret resulting matrix to discover clusters.

Definition [18]: Given a matrix $M \in \mathbb{R}^{k \times l}$, $M \geq 0$, and a real nonnegative number $r$, the matrix resulting from rescaling each of the columns of $M$ with power coefficient $r$ is called $\Gamma_r M$, and $\Gamma_r$ is called the inflation operator with power coefficient $r$. Formally, the action of $\Gamma_r : \mathbb{R}^{k \times l} \to \mathbb{R}^{k \times l}$ is defined by

$$(\Gamma_r M)_{pq} = \frac{(M_{pq})^r}{\sum_{i=1}^{k} (M_{iq})^r}$$

Table 7: Markov cluster algorithm
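The steps of Table 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the iteration cap `max_iter`, the convergence tolerance `tol`, and the threshold `eps` used to read off clusters are all introduced here for illustration, and the matrix is stored as nested lists, which is only practical for small graphs.

```python
def normalize_columns(m):
    """Step 4: scale every column so that it sums to 1."""
    rows, cols = len(m), len(m[0])
    sums = [sum(m[i][j] for i in range(rows)) for j in range(cols)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(cols)]
            for i in range(rows)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def expand(m, e):
    """Step 5: e-th matrix power (e is a positive integer)."""
    result = m
    for _ in range(e - 1):
        result = matmul(result, m)
    return result


def inflate(m, r):
    """Step 6: raise every entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, e=2, r=2, max_iter=100, tol=1e-9, eps=1e-6):
    n = len(adjacency)
    # Step 3: add self-loops by setting the diagonal to 1.
    m = [[1.0 if i == j else float(adjacency[i][j]) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    # Step 7: repeat expansion and inflation until the matrix stops changing.
    for _ in range(max_iter):
        nxt = inflate(expand(m, e), r)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Step 8: each nonzero row belongs to an attractor; the columns with
    # nonzero entries in that row are the members of its cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy graph: two disconnected triangles.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

A larger inflation parameter 𝑟 sharpens each column more strongly and therefore tends to produce more, smaller clusters; the toy graph above is chosen so that the result does not depend on 𝑟.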

Table 8: Markov cluster algorithm example (expansion parameter = 2, inflation parameter = 2)

Table 8 shows an example of the Markov cluster algorithm. After the MCL computation, nodes 1, 2, 3

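The Tanner graph of the low-density parity-check codes that we analyze is a bipartite graph with a large number of nodes, so it is well suited to clustering with MCL. We use Cytoscape [19], a platform for analyzing complex networks, which offers many plugins for this purpose. One of them, clusterMaker [20], provides the MCL clustering algorithm.

Table 9 lists the MCL parameter settings. After the MCL analysis, the cluster structures of the different construction methods are distributed as shown in Figure 21. MCL partitions the bipartite graph into clusters of many different sizes, and Figure 24 counts the number of clusters of each size.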

MCL Parameter           Value
Expansion parameter     2
Inflation parameter     2
Number of iterations    16

Table 9: Markov cluster algorithm parameter settings
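To make the input to MCL concrete, the following sketch builds the adjacency matrix of a Tanner graph from a parity-check matrix and counts clusters by size, as a cluster histogram would. The analysis in this work was carried out with Cytoscape and clusterMaker; the function names, the node numbering (check nodes first, then variable nodes), and the small matrix `H_TOY` are hypothetical and serve only as an illustration.

```python
from collections import Counter


def tanner_adjacency(h):
    """Adjacency matrix of the Tanner graph of parity-check matrix h.

    Nodes 0..m-1 are the check nodes (rows of h) and nodes m..m+n-1 are
    the variable nodes (columns of h). Edges only join the two sides,
    so the graph is bipartite.
    """
    m, n = len(h), len(h[0])
    size = m + n
    adj = [[0] * size for _ in range(size)]
    for c in range(m):
        for v in range(n):
            if h[c][v]:
                adj[c][m + v] = 1
                adj[m + v][c] = 1
    return adj


def cluster_size_histogram(clusters):
    """Map each cluster size to the number of clusters of that size."""
    return Counter(len(cluster) for cluster in clusters)


# Hypothetical 3 x 6 parity-check matrix, not one of the codes studied here.
H_TOY = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
adjacency = tanner_adjacency(H_TOY)
print(len(adjacency))  # 9 nodes: 3 check nodes + 6 variable nodes
```

A random walk on a bipartite graph alternates between the two sides at every step, which is one reason the self-loops of Step 3 matter for Tanner graphs: they let the walk stay in place and so remove this parity effect.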

Figure 21: Topology by MCL (panels: random, zigzag, PEG, IPEG, MIPEG, and QC-LDPC codes topologies; check nodes in red, variable nodes in blue; the largest cluster is marked)

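Figure 21 shows that, after MCL clustering, each of the PEG-based codes contains one large cluster. We next analyze the properties of the nodes inside the large cluster of the PEG-based codes.

Degree distribution

The MCL analysis shows that high-degree variable nodes do not belong to any cluster; the variable nodes inside the clusters are all low-degree variable nodes. Table 10 gives the original degree distribution of the variable nodes and check nodes inside the largest cluster of each PEG-based code.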

Number of variable nodes
Degree         PEG    IPEG   MIPEG
2              307    331    306
3              201    210    172
5              54     65     55
Total nodes    562    606    533

Number of check nodes
Degree         PEG    IPEG   MIPEG

1. Average shortest path length

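Figure 22 shows the average shortest path length of the PEG-based clusters at different degrees. At every degree, the average shortest path length of MIPEG is longer than that of IPEG and PEG. From the Tanner graph point of view, cycles should be as long as possible, and we observe that the codes with better performance have longer shortest path lengths between variable nodes. If the shortest path length between variable nodes is taken into account when designing low-density parity-check codes, increasing it may also increase the cycle length. In the large cluster extracted by MCL, the average shortest path length over all nodes affects the performance in the waterfall region.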
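As an illustration of the quantity discussed above, the sketch below computes an average shortest path length by breadth-first search on an unweighted graph. It is not the tool used for Figure 22: the function names, the `targets` option that restricts the average to variable nodes, and the toy chain graph are assumptions made for this example, and unreachable nodes are simply left out of the average.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_shortest_path_length(adj, sources, targets=None):
    """Mean distance from each source to the other reachable nodes.

    If targets is given, only distances to nodes in targets are averaged.
    """
    wanted = None if targets is None else set(targets)
    total, count = 0, 0
    for s in sources:
        for node, d in bfs_distances(adj, s).items():
            if node != s and (wanted is None or node in wanted):
                total += d
                count += 1
    return total / count if count else 0.0


# Hypothetical Tanner-graph fragment: v0 - c0 - v1 - c1 - v2.
chain = {
    "v0": ["c0"],
    "c0": ["v0", "v1"],
    "v1": ["c0", "c1"],
    "c1": ["v1", "v2"],
    "v2": ["c1"],
}
variables = ["v0", "v1", "v2"]
print(average_shortest_path_length(chain, ["v0"]))  # 2.5
print(average_shortest_path_length(chain, variables, targets=variables))
```

In a bipartite Tanner graph every path between two variable nodes has even length, so the variable-to-variable distances in the second call are 2 or 4.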

Figure 22: Average shortest path length of PEG-based cluster

2. Betweenness centrality

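Figure 23 shows that the betweenness centrality of MIPEG is larger than that of the other methods, which means that the variable nodes of MIPEG, at every degree, are less likely to become part of a trapping set than those of the other methods. Taking the betweenness centrality of the nodes into account when designing low-density parity-check codes may help prevent nodes from forming trapping sets and so improve the performance in the error floor region.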
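Betweenness centrality counts, for each node, the fraction of shortest paths between other node pairs that pass through it. The sketch below follows Brandes' algorithm for unweighted, undirected graphs and returns unnormalized values. It is meant only to show what is being measured: the function name and the toy graphs are assumptions, and the values plotted in Figure 23 may use a different normalization.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness of every node of an undirected graph."""
    centrality = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}


# Hypothetical toy graph: a check node joining three variable nodes.
star = {"c": ["v1", "v2", "v3"], "v1": ["c"], "v2": ["c"], "v3": ["c"]}
print(betweenness_centrality(star)["c"])  # 3.0
```

In the star, all three shortest paths between pairs of leaves pass through the center, so its value is 3 while every leaf has 0.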

[Plot: Average Shortest Path Length Distribution. x-axis: degree (2, 3, 5); y-axis: average shortest path length (3.8 to 5). Series: PEG, IPEG, MIPEG, each with a polynomial trend line. Trend lines as printed: y = 0.0352x² - 0.3043x + 4.4196 (R² = 1); y = 0.0389x² - 0.3148x + 4.4856 (R² = 1); y = 0.0107x² - 0.2666x + 5.2311 (R² = 1).]

Figure 23: Betweenness centrality of PEG-based cluster

Figure 24: Cluster histogram by MCL

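Figures 21 and 24 show that the QC-LDPC codes form a single large cluster, while the PEG-based codes have one relatively large cluster, of size about 1000, together with some small clusters. The Random codes consist entirely of small clusters, which reflects that Random,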

[Figure 24 annotations: series random, zigzag, PEG, IPEG, MIPEG, QC-LDPC codes; the largest clusters (QC-LDPC codes, PEG-based) and the small clusters are marked.]

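From Section 2.1.1 we know that the smaller 𝑎 is, the more likely the trapping set is to occur, and the smaller 𝑏 is, the higher the error probability. This thesis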

[Figure: Cluster 1, Cluster 2, Cluster 3, and Cluster 4 (small clusters) and the out_cluster. Number of variable nodes: 14. Red nodes: check nodes; blue nodes: variable nodes.]

𝑏    Cluster 1    Cluster 2    Cluster 3    Cluster 4    Ratio
2    4            5            1            1            11/22 = 0.5
3    3            3            0            0            6/22 = 0.27
4    3            2            0            0            5/22 = 0.23

Table 11: MCL intra-cluster distribution of 𝑏 for 𝑎 = 2

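The green-shaded cell indicates that, when 2 variable nodes are chosen inside Cluster 1, there are 4 combinations with (𝑎, 𝑏) = (2, 2). The number of (𝑎, 𝑏) = (2, 2) combinations obtained by choosing any 2 variable nodes intra-cluster is the sum of the 𝑏 = 2 row, which is 11. This count is then normalized by the number of ways to choose 2 variable nodes inside any of the small clusters, which is 22. The result, 11/22 = 0.5, means that the ratio of (𝑎, 𝑏) = (2, 2) among all intra-cluster choices of 2 variable nodes is 0.5. The same calculation gives the ratio distribution for choosing any 𝑎 variable nodes intra-cluster.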
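The arithmetic behind the Ratio column of Table 11 can be checked with a short script. The counts are those of Table 11, with the empty cells of Cluster 3 and Cluster 4 taken as 0; the cluster sizes 5, 5, 2, 2 are an inference made here from the column totals (10, 10, 1, 1) and the 14 variable nodes, not a value stated in the text. The names `table_11` and `intra_cluster_ratios` are introduced for this example.

```python
from math import comb

# Table 11: b -> number of (2, b) pairs found in Cluster 1..4.
# Empty cells for Cluster 3 and Cluster 4 are taken as 0 (assumption).
table_11 = {
    2: [4, 5, 1, 1],
    3: [3, 3, 0, 0],
    4: [3, 2, 0, 0],
}


def intra_cluster_ratios(counts_by_b):
    """Row sum divided by the total number of intra-cluster selections."""
    total = sum(sum(row) for row in counts_by_b.values())
    return {b: sum(row) / total for b, row in counts_by_b.items()}


ratios = intra_cluster_ratios(table_11)
print(ratios[2])            # 0.5
print(round(ratios[3], 2))  # 0.27
print(round(ratios[4], 2))  # 0.23

# Cross-check of the denominator with the inferred cluster sizes:
# choosing 2 nodes inside each cluster gives 10 + 10 + 1 + 1 = 22 pairs.
inferred_sizes = [5, 5, 2, 2]
print(sum(comb(n, 2) for n in inferred_sizes))  # 22
```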

2. inter-cluster: an (𝑎, 𝑏) trapping set is chosen from the nodes that belong to the small clusters, as in Figure 28; the normalization denominator is $\binom{14}{a}$.

Figure 28: MCL inter-cluster example (small clusters; number of variable nodes: 14; selections: $\binom{14}{a}$)

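3. in-out-cluster: 1 node is chosen inside the small clusters and 𝑎 − 1 nodes are chosen anywhere in the out-cluster, as in Figure 29; the normalization denominator is $\binom{14}{1} \times \binom{N}{a-1}$.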

Figure 29: MCL in-out-cluster example

4. out-cluster: an (𝑎, 𝑏) trapping set is chosen inside the out-cluster, as in Figure 30; the normalization denominator is $\binom{N}{a}$.

Figure 30: MCL out-cluster example
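The four normalization denominators can be written side by side. The sketch below assumes the reading used in the example above: the small clusters hold 14 variable nodes in total, the out-cluster holds 𝑁 variable nodes, and the intra-cluster denominator is the number of ways to choose 𝑎 nodes inside any single small cluster. The function name, the inferred cluster sizes 5, 5, 2, 2, and the value 𝑁 = 100 are hypothetical.

```python
from math import comb


def normalization_denominators(cluster_sizes, n_out, a):
    """Number of possible selections of a variable nodes in each case.

    cluster_sizes: variable-node counts of the small clusters
    n_out:         number of variable nodes in the out-cluster (N)
    a:             number of variable nodes in the trapping set
    """
    n_small = sum(cluster_sizes)
    return {
        # all a nodes inside the same small cluster
        "intra-cluster": sum(comb(n, a) for n in cluster_sizes),
        # a nodes anywhere among the small-cluster nodes
        "inter-cluster": comb(n_small, a),
        # 1 node from the small clusters, a - 1 from the out-cluster
        "in-out-cluster": comb(n_small, 1) * comb(n_out, a - 1),
        # all a nodes inside the out-cluster
        "out-cluster": comb(n_out, a),
    }


# Hypothetical sizes: four small clusters with 14 nodes in total, N = 100.
print(normalization_denominators([5, 5, 2, 2], n_out=100, a=2))
# {'intra-cluster': 22, 'inter-cluster': 91,
#  'in-out-cluster': 1400, 'out-cluster': 4950}
```

Dividing the number of (𝑎, 𝑏) trapping sets found in each case by the matching denominator gives ratios that can be compared across the four cases even though the cases contain very different numbers of candidate selections.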

In the following, we compare the ratio of selecting an (𝑎, 𝑏) trapping set in the four cases above for PEG, IPEG, and MIPEG.

[Figures 29 and 30 annotations: the small clusters contain 14 variable nodes; the out_cluster contains 𝑁 variable nodes. Red nodes: check nodes; blue nodes: variable nodes.]

Figure 31: PEG MCL trapping (𝑎 = 2) comparison results

Figure 32: IPEG MCL trapping (𝑎 = 2) comparison results

Figure 33: MIPEG MCL trapping (𝑎 = 2) comparison results

Figures 31, 32, and 33 show that, for 𝑎 = 2, the intra-cluster case has the highest probability of selecting an (𝑎, 𝑏) trapping set.
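One way such ratios could be obtained is to enumerate every choice of 𝑎 variable nodes inside each small cluster and count 𝑏 for each choice. The sketch below assumes the common definition in which 𝑏 is the number of check nodes connected an odd number of times to the chosen variable nodes; the definition used in Section 2.1.1 is not reproduced here, so this is an assumption, as are the function names and the toy parity-check matrix.

```python
from collections import Counter
from itertools import combinations


def odd_check_count(h, variable_nodes):
    """b: number of rows of h with an odd number of ones in the chosen columns."""
    chosen = list(variable_nodes)
    return sum(1 for row in h if sum(row[v] for v in chosen) % 2 == 1)


def intra_cluster_b_distribution(h, clusters, a):
    """Ratio of each b over all choices of a variable nodes within a cluster.

    clusters: lists of variable-node (column) indices, one list per cluster
    """
    counts = Counter()
    for cluster in clusters:
        for subset in combinations(cluster, a):
            counts[odd_check_count(h, subset)] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())} if total else {}


# Hypothetical 4 x 4 parity-check matrix whose Tanner graph is one 8-cycle.
H_TOY = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
print(odd_check_count(H_TOY, [0, 1]))  # 2: the pair shares one check node
print(odd_check_count(H_TOY, [0, 2]))  # 4: the pair shares no check node
print(intra_cluster_b_distribution(H_TOY, [[0, 1], [2, 3]], a=2))  # {2: 1.0}
```

Exhaustive enumeration grows quickly with cluster size and 𝑎, which is acceptable for small clusters but would need pruning or sampling for the out-cluster case.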

Figure 34: PEG MCL trapping (𝑎 = 3) comparison results

Figure 35: IPEG MCL trapping (𝑎 = 3) comparison results

Figure 36: MIPEG MCL trapping (𝑎 = 3) comparison results

Figures 34, 35, and 36 show that, for 𝑎 = 3, the intra-cluster case again has the highest probability of selecting an (𝑎, 𝑏) trapping set.

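These results show that (𝑎, 𝑏) trapping sets are indeed easy to select inside the small clusters. We noted earlier the relationship between trapping sets and betweenness centrality, so we next collect the betweenness centrality of the nodes inside the small clusters, as shown in Figure 39. The nodes inside the small clusters all have relatively low betweenness centrality, while the high-degree nodes have much higher betweenness centrality than the other nodes. The red portion at the far right of the figure shows the betweenness centrality of the high-degree nodes.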

Figure 37: The betweenness centrality in clusters of PEG

Figure 38: The betweenness centrality in clusters of IPEG

Figure 39: The betweenness centrality in clusters of MIPEG


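The experimental results of this section show what happens when the Tanner graphs of LDPC codes are clustered with MCL. The QC-LDPC codes form a single cluster, which indicates that they are not easily affected by sudden events at other nodes and reflects the tendency of QC-LDPC codes to perform better in the error floor region. The Random codes have many small clusters, which indicates that Random LDPC codes are easily affected by such events and reflects the tendency of Random codes to perform worse than the other methods. The PEG-based codes have one larger cluster and a few small clusters, which indicates that erroneous nodes inside the large cluster have a higher chance of being recovered; the analysis of the network parameters also shows that the values for MIPEG (average shortest path length, betweenness centrality) are larger than those of the other methods. For the small clusters, comparing the intra-cluster, inter-cluster, in-out-cluster, and out-cluster cases shows that the intra-cluster case is the most likely to yield an (𝑎, 𝑏) trapping set. In other words, these small clusters do tend to form (𝑎, 𝑏) trapping sets.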
