
National Chiao Tung University

Department of Electronics Engineering & Institute of Electronics

Master's Thesis

Throughput Enhancement Using LT Codes in Erasure Network Communications

Student: Chien-Fu Hu

Advisor: Prof. Hsie-Chia Chang

Throughput Enhancement Using LT Codes in Erasure Network Communications

Student: Chien-Fu Hu
Advisor: Hsie-Chia Chang

A Thesis Submitted to the Department of Electronics Engineering & Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Electronics Engineering

December 2008

Throughput Enhancement Using LT Codes in Erasure Network Communications

Student: Chien-Fu Hu    Advisor: Hsie-Chia Chang

Department of Electronics Engineering & Institute of Electronics, National Chiao Tung University

Bottlenecks arising during network transmission cause serious throughput degradation at the receivers. In this thesis, we introduce a coding system that applies LT codes in the transmission environment to enhance the throughput. The proposed LT coding improves the throughput to approach the theoretical maximum, while also providing significant data protection against packet loss in the channel. In the LT coding system, the buffer required at the intermediate nodes becomes an insignificant factor relative to the total transmitted data and can be reduced substantially. In certain networks, we combine the LT code with a specified network coding mechanism to obtain a further improvement in receiver throughput while still strongly protecting the data. Over the entire transmission, the overhead caused by the coding mechanism also remains a reasonable cost.

The proposed LT codes operate on packet counts from 4K to 64K, with each packet of size 1 KB. The proposed method successfully relieves the bottlenecks in two kinds of networks: with edge-disjoint paths (EDPs) and without EDPs. It also gives strong data protection at packet loss rates from 0% to 20%. The proposed LT code improves the throughput by 20% to 30%. In networks containing EDPs, the combined LT-network code provides a further improvement of up to about 50%. Finally, the coding operations are in the finite field GF(2), and the required overhead is about 30 exclusive-or operations per codeword.

Throughput Enhancement Using LT Codes in Erasure

Network Communications

Student: Chien-Fu Hu    Advisor: Hsie-Chia Chang

Department of Electronics Engineering & Institute of Electronics

National Chiao Tung University

Abstract

Bottlenecks occurring during transmission in the network cause serious throughput degradation at the receivers. In this thesis, a coding system that applies low-degree LT codes in a transmission environment to enhance the throughput is introduced. The proposed LT code can improve the throughput to approach the theoretical maximum. In the meantime, it provides significant data protection against packet loss in erasure channels. The buffer size required in the intermediates of the LT code system becomes an inconsequential factor that can be reduced considerably compared to the data size. In some network topologies, we combine low-degree LT codes with a specified network coding mechanism to obtain a further improvement in the throughput of every receiver and also give strong protection of the data. The overhead of the coding mechanism incurs a reasonable computation cost over the whole transmission.

For each packet of 1 KB, the proposed LT codes work for entire packet numbers from 4K to 64K. The proposed method alleviates the bottlenecks successfully in two kinds of topologies: with edge-disjoint paths (EDPs) and without EDPs. It also gives strong protection of data in erasure channels with loss rates from 0% to 20%. Our proposed low-degree LT code can enhance the throughput by 20% to 30%. In networks with edge-disjoint paths, the combined LT-network code offers a further improvement of up to about 50%. Finally, the computation is over GF(2) and the coding overhead is about 30 XOR operations per codeword.

Acknowledgments

Time has passed quickly; more than two years of research life rushed by just like that. Looking back, it was the help of many people that allowed me to finish smoothly. First, I thank my advisor, Prof. Hsie-Chia Chang, who trusted me to make new attempts in my research and corrected my course at the right moments. Beyond that, he took good care of his students; when I was at the lowest point of my research, he never stopped encouraging me, which moved me deeply. I was also fortunate to stay in OASIS and OCEAN, two labs with a wonderful atmosphere. I thank 建青 for teaching me LATEX, and 彥欽, who even after graduating spent time studying a book on Network Coding to strengthen my theoretical foundation. 國光 patiently taught me the LT code and was also a companion in everyday fun. 齊哥 listened patiently to my oral-defense rehearsals again and again and offered suggestions, while 大頭 and 阿龍 reminded me to revise my slides. 胖達, 義凱, 佳瑋, and 元 also gave me much guidance. 大嘴, 永裕, QMO, and 鑫偉 were great partners in the struggle; especially in the last half year, we discussed, cheered each other on, and ran experiments and rushed theses day and night. 高守, 振揚, 裕淳, 晶今, and 廷聿 were exceptional junior labmates. In addition, 大大, 承曄, JUJU, and the graduated apu and 阿俊 of Lab 316 helped me relax outside the experiments. I thank VAN for the mutual encouragement during my gloomy third year, and 篤雄, 老大, and acer for always teaching and correcting me patiently and carefully whenever I asked programming questions. I also thank 國權 and 孟琦 for playing ball with me to keep fit, and 包子, far away in America, for chatting with me now and then to keep my spirits up. 秀逗, one of the only two members of the baseball club and my companion on a cycling tour around the island, supported and spurred me on from the entrance exam all the way to the end, which moved me greatly. Special thanks to my friends spice and 俊男, who have supported and encouraged me in research and in life ever since I retook the exam, and with whom I shared many good times. Finally, I thank my parents and my sister for being my greatest pillars in life and giving me positive energy, and 小嗨 for responding to my joys and sorrows every day through my long master's journey, accompanying me all the way as my safe harbor. A thousand words amount only to thanks, and thanks again!


Throughput Enhancement Using LT Codes in Erasure

Network Communications

Student: Chien-Fu Hu

Advisor: Dr. Hsie-Chia Chang

Department of Electronics Engineering

National Chiao Tung University

Contents

1 Introduction
2 Network Coding
  2.1 Max-flow Theorem
  2.2 Main Concept of Network Coding
    2.2.1 Butterfly Network (Coding in Intermediates)
    2.2.2 One-source Three-sinks Network (Coding in Source)
  2.3 Mathematical Representation
    2.3.1 Butterfly Network over GF(2)
    2.3.2 Butterfly Network over GF(F)
  2.4 Random Network Coding
  2.5 Summary
3 LT Code
  3.1 Fountain Code
  3.2 LT Code
    3.2.1 Encoding
    3.2.2 Decoding
    3.2.3 Distribution Design
  3.3 Summary
4 Cooperative Network Coding with LT Code
  4.1 Network Topology Specification
  4.2 Data Fragment
  4.3 LT Encoding
  4.4 Packets Transmission / Receiving
  4.5 Buffering
  4.6 LT Decoding
  4.7 Degree Distribution Analysis
5 Comparison and Simulation Result
  5.1 Throughput
  5.2 Buffer Size
  5.3 Coding Overhead Analysis
6 Conclusion and Discussion
  6.1 Conclusion
  6.2 Discussion
A Several Ideas to Transmit Packets Efficiently
  A.1 Drawback Discovery and Node Analysis
  A.2 Intermediates Coding
    A.2.1 Thought
    A.2.2 Phenomenon
    A.2.3 Note
  A.3 Codeword Cache
    A.3.1 Thought
    A.3.2 Phenomenon
    A.3.3 Note
  A.4 Repeated Codeword Table
    A.4.1 Thought
    A.4.2 Phenomenon
    A.4.3 Note
  A.5 Constraints Alteration on Intermediates
  A.6 Summary

List of Figures

2.1 Butterfly network
2.2 Traditional routing in the butterfly network
2.3 Network coding in the butterfly network
2.4 One-source three-sinks network
2.5 Traditional routing in the one-source three-sinks network
2.6 Network coding in the one-source three-sinks network
2.7 Corresponding mapping of the butterfly network in Fig 2.3
2.8 General mapping of the butterfly network in Fig 2.7
3.1 LT decoding procedure, cited from Fig. 4 in [7]
4.1 Simulation flow chart
4.2 Basic concept of fragmentation
4.3 Data fragment
4.4 Illustration of an encoding symbol
4.5 Decoding flow chart
5.1 (a) Butterfly. (b) One-source three-sinks.
5.2 Normalized throughput of two networks
5.3 Butterfly network with different buffer sizes
6.1 Edge-disjoint paths in the butterfly network
A.1 Butterfly network
A.2 Encoding flow chart of the encode-and-store mechanism
A.3 Illustration of the labeling operation
A.4 Same operation while coding
A.5 Inverse operation while coding
A.6 (a) Step I. (b) Step II.
A.7 Distribution table
A.8 Modified encoding flow chart adding repeated LUT and distribution table

List of Tables

4.1 Design parameters
4.2 Example of the parameters setup
4.3 Parameters of the illustrated distribution
5.1 Parameters of simulation
5.2 Relations between N_original and N_coding
5.3 Average run cycles for file sizes 4 MB, 8 MB, 16 MB in the butterfly network
5.4 Average run cycles for file sizes 32 MB, 62.5 MB in the butterfly network
5.5 Average run cycles for file sizes 4 MB, 8 MB, 16 MB in the one-source three-sinks network
5.6 Average run cycles for file sizes 32 MB, 62.5 MB in the one-source three-sinks network
5.7 XOR operations of coding systems in Fig 5.1(a)
5.8 XOR operations of coding systems in Fig 5.1(b)
6.1 Average run cycles with LT-network codes for file sizes from 4 MB to 62.5 MB in the butterfly network


Chapter 1

Introduction

Networks have existed for nearly 30 years and play a significant role in computer communication. Owing to enhanced bandwidth, large quantities of data can be delivered from servers to users, as in MOD (Multimedia On Demand), where a server multicasts data to the users who request the service. During transmission, data travels from a source to its destinations through nodes called intermediates. In existing network systems, the traditional method is that every intermediate node simply performs store-and-forward to pass the information to the next node. Traditional routing, however, cannot achieve the max flow established by the max-flow min-cut theorem, particularly in multicast applications. Therefore, network coding theory was proposed to alleviate the intermediate bottleneck and to raise the utilization of the network channel.

Packets traveling through communication networks suffer losses due to disturbances. In an erasure channel, every sink either trusts what it receives or gets nothing. Since we cannot avoid loss, we have to equip systems with a good capability to fight errors and protect the information. Our goal is to enhance the throughput of each sink and, in the meantime, to give the system an error-protection mechanism that reduces information loss.

The thesis is organized as follows. First, we introduce basic concepts of network coding, including some simple examples, the mathematical representation, and an innovative published method called random network coding.

Although the theory confirms the benefit of network coding, implementation is cumbersome due to the restrictions of real network transmission such as packet loss, network topology, etc. To give consideration to both network throughput and error protection, we apply the LT code to network coding to fulfill our goals. In Chapter 3, we introduce the LT code, a famous code for erasure channels. We give an explicit description of the encoding procedure, the decoding procedure, and the design of the degree distribution.

In Chapter 4, we evaluate the method in which network coding collaborates with the LT code to approach the optimal performance proved by the theory. The construction of the simulation environment is explained. For the sake of simplicity, our simulations specifically consider acyclic networks and the single-source multicast condition.

We show our simulation results in Chapter 5. We compare three different systems, including routing, a previously proposed coding method, and the LT code applied to the network, under different packet loss rates.

Finally, conclusions and discussions are given in Chapter 6. In the meantime, we point out some experimental experience in the appendix.

Chapter 2

Network Coding

Over the last decade there has been large interest in network coding. The concept was first propounded in [1], and the number of publications on it has increased dramatically. First, we describe the main concept of network coding, illustrated with the well-known butterfly network. Second, we generalize the method in the mathematical domain, representing the equivalent network coding. Because assigning efficient coefficients to every intermediate node individually is impractical, random network coding was proposed [3]. At the end of the chapter we consider packet loss during realistic transmission, and we therefore propose a cooperative method between network coding and the LT code that enhances the throughput of the network while providing error protection for the data.

2.1 Max-flow Theorem

Before describing the concept of network coding, we should know how to evaluate the theoretical max flow of a sink in the network, which is the target we want to achieve. A network consisting of nodes and edges can be viewed as a graph; an acyclic network is a network that contains no cycle formed by the directed edges. The transmission model is the single-source multicast circumstance: there exists only one source node, which transmits the data to some number of destinations. From graph theory we know that, for a given network topology, we can calculate the max flow of each sink. The max-flow min-cut theorem is stated as follows.

Theorem 2.1.1 (Max-flow min-cut theorem). If f is a flow in a flow network G = (V, E) with source s and sink t, then the following conditions are equivalent:

1. f is a maximum flow in G.

2. The residual network G_f contains no augmenting paths.

3. |f| = c(S, T) for some cut (S, T) of G.

There are some algorithms to calculate the max flow, such as the Ford-Fulkerson algorithm, the Edmonds-Karp algorithm, and the relabel-to-front algorithm in [6].
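These algorithms are straightforward to prototype. As an illustrative sketch (not code from this thesis; the adjacency-dictionary graph representation is our own choice), the following Python function implements the Edmonds-Karp method, i.e. Ford-Fulkerson with BFS augmenting paths, and evaluates the butterfly network used later in this chapter:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest (BFS) path."""
    nodes = set(capacity) | {v for nbrs in capacity.values() for v in nbrs}
    # Residual capacities, initialized from the given edge capacities.
    residual = {u: {v: 0 for v in nodes} for u in nodes}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] = c
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # BFS for an augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                        # no augmenting path remains
        path, v = [], t
        while parent[v] is not None:           # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:                      # update the residual graph
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Butterfly network of Fig 2.1: every edge has capacity 1.
butterfly = {'S': {'T': 1, 'U': 1}, 'T': {'W': 1, 'Y': 1},
             'U': {'W': 1, 'Z': 1}, 'W': {'X': 1}, 'X': {'Y': 1, 'Z': 1}}
print(max_flow(butterfly, 'S', 'Y'), max_flow(butterfly, 'S', 'Z'))  # 2 2
```

Both sinks report a max flow of 2, the throughput target used throughout this chapter.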

2.2 Main Concept of Network Coding

As mentioned before, data packets are delivered from the source to the destination along a path formed by a chain of intermediate nodes. Every intermediate node receives packets on its input links, stores them, and passes them to the next node on its output links. When an intermediate node must forward data toward multiple nodes or destinations, it copies the data from the input link and sends the same copy on the different output links. In some situations, this store-and-forward method causes a node to receive the same data on different input links belonging to distinct nodes, decreasing the utilization of the bandwidth. Since the intermediate nodes already handle data during the transmission, we let them perform arithmetic operations rather than merely store and pass. The packets traveling along the path then carry either the true information or some combination (linear or non-linear) of the data. Every destination node, also called a sink, receives sufficiently many processed packets and decodes them to recover the true information. Network coding aims to resolve the bottleneck at the intermediate nodes and to let every sink achieve its theoretical max flow. The butterfly network is illustrated in Figure 2.1.

[Figure 2.1: Butterfly network]

2.2.1 Butterfly Network (Coding in Intermediates)

Fig 2.1 shows a communication network represented by nodes and directed edges (links). Nodes are categorized into three types: source, intermediate, and sink. A node without any incoming edges is called a source, which transmits the information; by contrast, a node without outgoing edges is called a sink, which is the destination of the messages. A node that is neither a source nor a sink is called an intermediate. In the figure, the node labeled S is the source, the nodes labeled Y and Z are sinks, and the other nodes labeled T, U, W, X are intermediates. A directed edge represents the direction of a lossless packet transmission channel, and each edge has its own capacity per unit time. Each edge capacity in the example is set to 1. The network is said to be acyclic if there exists no directed cycle in the whole network topology. The multicast condition is that the source wants to transmit the data to all the sinks in the network. In the example, the source S multicasts the data to both sinks, nodes Y and Z.

First, we consider the traditional store-and-forward method. In the first transmission, S sends data b1 and b2 to T and U by the edges ST and SU respectively, and then every intermediate sends the data it receives to the next node. Obviously, node W has two incoming edges, TW and UW, but only one outgoing edge, WX. Therefore, node W chooses either b1 or b2 to pass to X. Assuming that W chooses b1 and then b2 in order, node Z will receive both b1 and b2, but node Y will only receive b1. That is to say, we need an extra transmission to let W send data b2 to Y through the edges WX and XY. The equivalent throughput of the entire network for nodes Y and Z is 1, meaning that every sink receives one data packet per transmission.

[Figure 2.2: Traditional routing in the butterfly network]

There is one modification of store-and-forward. The improved method is as follows: the first time, S sends the two data b1 and b2 to nodes T and U, and W forwards b1. After the first run, the outcome is the same as mentioned above: Y receives only the data b1, while Z receives both b1 and b2. The difference is that the second time, S sends the data b2 and b3 to nodes T and U, and node W passes b3 to X. When the second transmission ends, every sink has the three data b1, b2, and b3. Hence every sink obtains 3 data in 2 transmissions, and the throughput of each sink rises to 1.5. The procedure is shown in Fig 2.2.

[Figure 2.3: Network coding in the butterfly network]

Based on Theorem 2.1.1, the theoretical max throughput of each sink in the example is 2. Unfortunately, the traditional method achieves 1 and even the improved method achieves only 1.5, which cannot reach the max flow.

In the example above, we see that the network obstacle is node W. Its number of incoming edges is 2, whereas that of its outgoing edges is 1, so one of the two data packets must be stored in the buffer awaiting an extra delivery. The bottleneck at W can be resolved by the network coding technique. Let W perform the exclusive-or of the two packets from its incoming edges; the edge WX then carries b1 ⊕ b2 to node X. Consequently, Y receives b1 and b1 ⊕ b2, and Z receives b2 and b1 ⊕ b2. Both of them can recover the true information b1 and b2 by XORing the two packets they receive, as shown in Fig 2.3. The equivalent throughput is elevated to 2, the theoretical maximum.
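The relief of the bottleneck can be verified in a few lines. The sketch below is illustrative only; the 1 KB random payloads stand in for b1 and b2:

```python
import os

def xor(a, b):
    """Bytewise exclusive-or of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

b1, b2 = os.urandom(1024), os.urandom(1024)   # the two 1 KB source packets

# Node W codes instead of choosing: edge WX carries b1 XOR b2,
# which X then copies to both sinks.
coded = xor(b1, b2)

recovered_at_Y = xor(b1, coded)   # Y holds b1 (via T) and the coded packet
recovered_at_Z = xor(b2, coded)   # Z holds b2 (via U) and the coded packet
assert recovered_at_Y == b2 and recovered_at_Z == b1
```

Each sink obtains both packets within a single transmission round, matching the max flow of 2.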

2.2.2 One-source Three-sinks Network (Coding in Source)

Fig 2.4 shows another network topology. The capacity of every edge is also set to 1. Source S needs to multicast data to all the destinations, nodes X, Y, and Z. The max flow of every sink is 2.

[Figure 2.4: One-source three-sinks network]

The traditional routing is shown in Fig 2.5. First, node S sends data b1, b2, b3 to its adjacent nodes U, T, and W. After the first transmission, each sink has received two data, as shown in Fig 2.5(a). Then we reallocate the data by shifting them to different edges clockwise (shifting counterclockwise gives the same outcome), as shown in Fig 2.5(b). Each sink gets the third data from one of its two edges, and a repeated data from the other. The effective throughput of each sink is 1.5.

[Figure 2.5: Traditional routing in the one-source three-sinks network]

How can we apply network coding in this topology? In this situation, all we need to do is let the source S do the coding. S sends data b1 to U by edge SU and data b2 to T by edge ST, as routing does. However, S does not send b3 but b1 ⊕ b2. Obviously, with this alteration the sinks X, Y, and Z receive the two data (b1, b2), (b1, b1 ⊕ b2), and (b2, b1 ⊕ b2) respectively. Every sink obtains two data in one transmission; hence the throughput of every sink approaches 2, the theoretical maximum.

[Figure 2.6: Network coding in the one-source three-sinks network]

The benefit of network coding is illustrated by the two examples above. We claim that network coding can enlarge the throughput by letting intermediates or sources perform some simple operations. However, merely pointing out the specific operations in a few nodes is insufficient to explain the delicacy of network coding; it leaves the impression that the improvement is coincidental. In the following, we formulate the method mathematically and theoretically.

2.3 Mathematical Representation

We have exemplified two cases of how network coding applies to a multicast system in different networks and gains an improvement over the method used today. However, we cannot intuitively foresee and control every operation in every node and simply expect our straightforward innovation to work. In this section, we formulate a mathematical model that generalizes the network coding problem. With this formulation, we can analyze and resolve the problem systematically.

Network coding is proposed to enhance the flow in the network by performing some computation on the original data, either in the sources or in the intermediates. Every data packet flowing in the network can be regarded as one combination of all the intrinsic data. (Here, only linear operations are discussed, for implementation simplicity.) The original data span one space, and the packets on every edge span another. That is to say, there exists a mapping on every edge between the two spaces. The functionality of every node becomes mapping the symbols received from its incoming edges to a symbol for each outgoing edge. Network coding can thus be converted into an encoding process for every edge.

For clarity, the definitions and symbol notations used in our mathematical representation are listed as follows. The notation is quoted from [2].

Notations

• Source: a node without any truly incoming edges.
• In(T)/Out(T): the set of incoming/outgoing edges of node T.
• In(S): a set of imaginary edges without originating nodes.
• ω: the number of the imaginary edges.
• Data unit: an element of GF(F).
• Message x: an ω-dimensional row vector ∈ F^ω.
• A network code is over GF(F) and of dimension ω.

Definition 2.3.1. A network code consists of a local encoding mapping

k̃_e : F^|In(T)| → F

for each node T in the network and each channel e ∈ Out(T).

By Definition 2.3.1, we construct the transform between the incoming and outgoing edges at each node. Since the acyclic network provides an upstream-to-downstream order, data is transmitted along paths composed of edges, and the mapping of each edge is equivalent to the successive transforms of the edges passed before it. Hence, we give another definition to represent the outcome of this recursive mapping.

Definition 2.3.2. A network code consists of a local encoding mapping k̃_e : F^|In(T)| → F and a global encoding mapping f̃_e : F^ω → F for each edge e in the network such that:

• For every node T and edge e ∈ Out(T), f̃_e(x) is uniquely determined by (f̃_d(x), d ∈ In(T)), and k̃_e is the mapping via (f̃_d(x), d ∈ In(T)) ↦ f̃_e(x).

• For the ω imaginary channels e ∈ In(S), the mappings f̃_e are the natural projections from the space F^ω to the ω different coordinates, respectively.

Considering physical implementation, fast computation and a simple circuit in each node are desirable; therefore, linear transformations are used. If the encoding mapping f̃_e(x) is linear, there exists a corresponding ω-dimensional column vector f_e such that the product x · f_e equals f̃_e(x), where x is the ω-dimensional row vector of data generated by the source. Similarly, there exists an |In(T)|-dimensional column vector k_e such that y · k_e = k̃_e(y), where y ∈ F^|In(T)| represents the symbols received at node T. Since every edge has its own mapping column vector, we can formulate, within each node, the operation of all the edges connected to it. If a pair of edges (d, e) is linked by one node T with d ∈ In(T) and e ∈ Out(T), we call these two edges an adjacent pair. Therefore, we can formulate the coding process in matrix form at every node.

Definition 2.3.3. A network code consists of a scalar k_{d,e}, called the local encoding kernel, for every adjacent pair (d, e). Meanwhile, the local encoding kernel at the node T means the |In(T)| × |Out(T)| matrix

K_T = [k_{d,e}]_{d∈In(T), e∈Out(T)}

Network coding can therefore be viewed as forming an effective matrix at every node, and every edge can be viewed as a series of products of the column vectors in the matrices of the nodes that the data passes. Note that the structure of the matrix preserves the order of the linked edges.
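As a toy instance of Definition 2.3.3, node W of the butterfly network has |In(W)| = 2 and |Out(W)| = 1, so K_W is a 2 × 1 matrix; the sketch below (our own numpy rendering, not the thesis's notation) applies it to the received symbols over GF(2):

```python
import numpy as np

# Local encoding kernel of node W: incoming edges (TW, UW), outgoing edge WX.
K_W = np.array([[1],
                [1]])            # |In(W)| x |Out(W)| = 2 x 1

y = np.array([1, 0])             # symbols received on TW and UW
out = (y @ K_W) % 2              # symbol sent on WX: y_TW + y_UW over GF(2)
print(out)                       # [1]
```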

Definition 2.3.4. A network code consists of a scalar k_{d,e} for every adjacent pair (d, e) in the network, as well as an ω-dimensional column vector f_e for every channel e, such that:

• f_e = Σ_{d∈In(T)} k_{d,e} f_d, where e ∈ Out(T).

• The vectors f_e for the ω imaginary channels e ∈ In(S) form the natural basis of the vector space F^ω.

• The vector f_e is called the global encoding kernel for the channel e.

2.3.1 Butterfly Network over GF(2)

[Figure 2.7: Corresponding mapping of the butterfly network in Fig 2.3]

The corresponding edge mappings and the operation matrix of every node in Fig 2.3 are shown in Fig 2.7. Source S has two imaginary edges, and the global encoding kernels of these two edges, f_OS and f_OS′, represent the mapping of the original data that produces the information data b1 and b2. The exclusive-or operation means the computation is in GF(2). According to the matrix of every node, we can calculate the global encoding kernel f_e of every edge.

We give some examples of deriving the global encoding kernel in Definition 2.3.4. Observing the source matrix K_S with 2 incoming and 2 outgoing edges, each element of the matrix represents the scalar of the two linked edges it specifies. Based on Definition 2.3.4, we find that the global encoding kernel of an edge is the summation of the global encoding kernels of the incoming edges weighted by the corresponding scalars in the node matrix.

f_ST = Σ_{d∈In(S)} k_{d,ST} f_d = 1 · f_OS + 0 · f_OS′ = 1 · (1, 0)^T + 0 · (0, 1)^T = (1, 0)^T

f_WX = Σ_{d∈In(W)} k_{d,WX} f_d = 1 · f_TW + 1 · f_UW = 1 · (1, 0)^T + 1 · (0, 1)^T = (1, 1)^T
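Definition 2.3.4 turns this derivation into a mechanical recursion. The following sketch (edge names follow Fig 2.3; the dictionary layout and the topological edge order are our own choices) propagates the local kernels of the butterfly network over GF(2) and reproduces f_ST and f_WX:

```python
import numpy as np

# Local encoding kernels k_{d,e} for every adjacent edge pair (d, e).
# 'OS' and "OS'" are the two imaginary channels entering the source S.
k = {('OS', 'ST'): 1, ("OS'", 'ST'): 0,
     ('OS', 'SU'): 0, ("OS'", 'SU'): 1,
     ('ST', 'TW'): 1, ('ST', 'TY'): 1,
     ('SU', 'UW'): 1, ('SU', 'UZ'): 1,
     ('TW', 'WX'): 1, ('UW', 'WX'): 1,
     ('WX', 'XY'): 1, ('WX', 'XZ'): 1}

# Kernels of the imaginary channels form the natural basis of F^2.
f = {'OS': np.array([1, 0]), "OS'": np.array([0, 1])}

# Process the real edges in topological order (the network is acyclic).
for e in ['ST', 'SU', 'TW', 'TY', 'UW', 'UZ', 'WX', 'XY', 'XZ']:
    f[e] = sum(k[d, ee] * f[d] for d, ee in k if ee == e) % 2

print(f['ST'], f['WX'])  # [1 0] [1 1]
```

The coded edge WX indeed carries the combination b1 + b2, in agreement with the hand derivation above.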

Fig 2.7 is the special case in which the chosen finite field is GF(2). In general, the scalars in every matrix and the computations are in GF(F), as generalized in Fig 2.8.

[Figure 2.8: General mapping of the butterfly network in Fig 2.7 over GF(F)]

2.3.2 Butterfly Network over GF(F)

In Fig 2.8, each global kernel can be calculated by the same steps described above. The design parameters are the scalars in every matrix, n, p, q, r, . . . , z. The assignment of all the scalars influences the efficiency of the network utilization. Concerning sink Y, if we want to approach the theoretical maximum of 2, the global kernels f_TY and f_XY should be linearly independent; namely, the space spanned by these two vectors should have dimension 2. The condition for the other sink Z is the same. If the two vectors are linearly dependent, the sink suffers a flow decrease. Therefore, we can remark that when the source transmits a message of ω data units into the network, a receiving node T obtains sufficient information to decode the message if and only if dim(V_T) = ω, of which a necessary prerequisite is that maxflow(T) ≥ ω. This prerequisite indicates when applying network coding is necessary to enhance the utilization of the network: if maxflow(T) > ω, the entire network is capable of carrying the whole transmitted data, there exists no bottleneck, and the transmission is certainly accomplished without difficulty.
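The criterion dim(V_T) = ω is easy to check numerically. In the sketch below (an illustration with our own bit-mask encoding of GF(2) vectors, not the thesis's notation), each global kernel is an integer whose bits are the coefficients of (b1, b2):

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors given as integer bit masks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)    # greedily clear the highest shared bit
        if v:
            basis.append(v)      # v is independent of the current basis
    return len(basis)

omega = 2
# Global kernels arriving at the sinks of the coded butterfly (Fig 2.3).
assert gf2_rank([0b10, 0b11]) == omega   # sink Y: b1 and b1 + b2 -> decodes
assert gf2_rank([0b01, 0b11]) == omega   # sink Z: b2 and b1 + b2 -> decodes
# If W merely forwarded b1, sink Y's kernels would be dependent:
assert gf2_rank([0b10, 0b10]) == 1
```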

We have converted linear network coding into matrix form, and we see that the key to enhancing the throughput and decoding the information successfully is a well-designed set of coefficients in the matrix of each node in the whole network. However, it is difficult to implement this concept directly, so a random coding mechanism is introduced in the next section.

2.4 Random Network Coding

We have derived an equivalent formulation of network coding from the inference above. Designing an effective linear network code is equivalent to finding an adequate coding matrix K for every node in the network, guaranteeing that every sink receives enough information to decode the data. However, a real communication network is so enormous and complex that we should avoid cumbersome and inefficient tasks such as probing the entire network. Observing the example in Fig 2.8, the number of design parameters in a network with only seven nodes is twelve, and it increases explosively as the network grows.

Since the coefficient assignment for each node is time-consuming and exhausting, we let each node produce the coefficients randomly rather than appointing them. Linear random network coding provides a method in which every node independently and randomly selects a linear mapping from inputs to outputs over some finite field. By doing this, we do not need to design the coefficients tediously, and the key factor becomes how to choose the linear combinations effectively. Coefficients are chosen uniformly or, more generally, based on a distribution; the uniform choice can be regarded as the special case in which every candidate is selected with equal probability. Each sink receives packets that are linear combinations of the intrinsic information and recovers the data from them. If the distribution performs well, every sink is able to recover the original data after it receives N packets, where N is the total number of data in the source. Namely, the packets originating from linear random coding should span an N-dimensional space, equivalent to the space spanned by the N data.

Random network coding offers a coding mechanism with statistical properties instead of a deterministic structure. We ought to note that designing a well-performing encoding matrix for every intermediate is difficult but not impossible: with enough effort we can obtain the optimal solution, as in the two examples mentioned above. At the same time, the assignment of the encoding matrices varies significantly with the network topology; namely, we would have to design specific encoding matrices whenever we meet a different network. Random network coding lets the intermediates encode independently, regardless of the topology. Its performance is basically not as good as a well-designed optimal structure for a specific network; however, if we can find a good mechanism to combine packets effectively, the outcome can approach the optimal solutions.

Another factor we should consider is the field size we choose. If the computation is done over an insufficient finite field, the combinations of data will easily become dependent on each other. This shrinks the codeword space, which should be as large as the data space, and therefore degrades the network utility. Random network coding was introduced, together with some theorems, in [3] and [4].
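To make the field-size effect concrete, the following Monte Carlo sketch (our own illustration, not from the thesis; the helper `gf2_rank` and all parameters are assumptions) estimates how often K random combinations over GF(2) are linearly independent. Even over the smallest field, a random K × K binary matrix is invertible only about 29% of the time, so a sink typically needs a few extra combinations before its received packets span the full data space:

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are int bitmasks."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            bit = pivot & -pivot            # lowest set bit = pivot column
            rows = [r ^ pivot if r & bit else r for r in rows]
    return rank

rng = random.Random(7)
K, trials = 16, 2000
hits = sum(
    gf2_rank([rng.getrandbits(K) for _ in range(K)]) == K
    for _ in range(trials)
)
p_invertible = hits / trials                # theory: ~0.289 for large K
```

A larger field (or a few extra received packets) pushes this probability toward 1, which is why rateless schemes ask the sink to collect slightly more than K packets.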

2.5

Summary

The discussion above assumes that packets are delivered over a lossless communication channel. However, in a real system, packets suffer loss in an unsteady and noisy environment, so even a well-designed random method performs poorly under packet loss. In order to work against packet loss during the transmission, we require a coding mechanism with the following properties.

• Coding is based on distribution.
• Simple encoding operation.
• Good protection of data.

The first property continues the random network coding method, and the second simplifies hardware implementation and operation complexity. The final property protects the data against the inevitable loss. Hence, we bring up a method that applies the LT code to network coding to fulfill these demands and enhance the throughput of every sink during the entire transmission.


Chapter 3

LT Code

The LT code (Luby Transform code) is a sparse random linear fountain code designed by Michael Luby with an outstandingly cheap decoding algorithm. It especially excels in communication over channels with erasures, such as the internet. Every receiver collects any N packets to recover the original data, where N is slightly greater than the original file size K. The computational complexity is astonishingly small, growing linearly with the file size K.

The chapter is organized as follows. First, we introduce the main concept of fountain codes. Second, we discuss the LT code specifically, including the encoding process, the decoding process, and the code structure with respect to the degree distribution. Finally, we focus on how to apply the LT code to network coding to enhance the effective flow quantity and fulfill our goal.

3.1 Fountain Code

Fountain codes are a kind of rateless code for the erasure channel, in which packets are either received correctly or lost. A packet passing through an erasure channel is lost with some probability, causing the sink to receive incomplete data and to ask for retransmission of the erased parts. The retransmission mechanism results in inefficient utilization of the network, and the situation becomes worse when multicasting or broadcasting is applied in the system. Thus, an erasure-correcting code is needed to avoid retransmission. The concept of the fountain code is that the source produces a considerable quantity of encoded packets, potentially limitless. In return, the sink is expected to receive a quantity only slightly larger than the total size of the data in order to recover it successfully. Every encoded packet is a random linear sum, under modulo-2 arithmetic, of randomly picked pieces of the file of size K. If the process continues, it forms a generator matrix of infinite length. However, the sink only receives N packets due to the erasures. The received N packets and the K file pieces form a generator matrix G, where every element G_nk is set to 1 if source piece k contributes to encoded packet n, and 0 otherwise. Supposing we know the matrix G, we can decode the whole data without retransmission. If N < K, there is no chance to decode successfully because of insufficient information. If the number of received packets is exactly K, we need the K × K matrix to be invertible, meaning it spans the same space as the file space. The key to performance, i.e., keeping N close to K, is whether a K × N submatrix is capable of forming a space isomorphic to the file space. More detail is introduced in [7].

3.2 LT Code

The LT code is introduced in [9] and is the first realization of the class of random linear fountain codes, a record-breaking sparse-graph code for erasure channels. It substantially reduces encoding and decoding complexity.

3.2.1 Encoding


1. Randomly choose the degree d_n of the packet from a degree distribution ρ(d); the appropriate choice of ρ depends on the source file size K, as we will discuss later.

2. Uniformly choose d_n distinct input packets, and set t_n equal to the bitwise sum, modulo 2, of these d_n packets. The equivalent operation can be done by exclusive-or-ing the chosen packets one by one, d_n times in total.
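The two steps above can be sketched as follows (a minimal illustration, not the thesis's implementation; packets are modeled as integer bitmasks and the toy degree distribution is our own assumption):

```python
import random

def lt_encode_symbol(source, degree_dist, rng):
    """One LT encoding step: draw a degree from the distribution, pick
    that many distinct source packets uniformly, XOR them together.
    Packets are modeled as ints so XOR is the bitwise modulo-2 sum."""
    degrees, probs = zip(*degree_dist)
    d = rng.choices(degrees, weights=probs)[0]
    idx = rng.sample(range(len(source)), d)     # d distinct packets, uniform
    payload = 0
    for i in idx:
        payload ^= source[i]
    return sorted(idx), payload

rng = random.Random(1)
source = [0b0001, 0b0010, 0b0100, 0b1000]       # four one-hot toy packets
dist = [(1, 0.3), (2, 0.5), (3, 0.2)]           # a toy distribution, for illustration
idx, payload = lt_encode_symbol(source, dist, rng)
```

Because the toy packets are one-hot, the XOR of distinct packets equals their sum, which makes the result easy to check by eye.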

After the encoding process, the source defines a bipartite graph between source information and encoded symbols. The connection structure between the two groups depends significantly on the degree distribution. The degree d_n is the number of distinct source packets connected to an encoded symbol. If the mean degree d̄ is much smaller than K, the graph is sparse, and we can regard the produced code as an irregular low-density generator-matrix code.

In order to decode successfully, sinks have to know, for each received symbol, its degree and the members of its connected source information. There are two methods for the source to communicate this code information to the sinks. The first relies on synchronized clocks: a random number generator seeded by the clock lets the sink reproduce, for every encoded symbol, its random degree and connection members. The second is to carry the information within the packets. In this case the overhead depends significantly on the max degree of the distribution and on the size of the index bits needed to assign an identical number to each source packet. The cost is tiny if the packet is much longer than this carried information.

3.2.2 Decoding

The decoding process is easy over the erasure channel. All a decoder needs to do is solve the equation t = Gs to recover s from t, where s is the source information and t are the received symbols. Since the channel is an erasure channel, we either receive certainly correct symbols or get nothing due to loss. A simple way to decode is by message passing, using the completely certain symbols to recover the uncertain ones. The decoding procedure is described as follows.

1. Find a check node t_n with degree 1 (connected to only one source packet s_k). (If no such check node exists, the decoding algorithm stops right now and fails to decode all the source packets.)

(a) Set s_k = t_n.

(b) Add s_k to all check nodes t_n′ that are connected to s_k:

t_n′ := t_n′ + s_k for all n′ such that G_n′k = 1.

(c) Remove all the edges connected to the source packet s_k.

2. Repeat step 1 until all s_k are decoded.
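The peeling procedure above can be sketched directly (a minimal illustration in the spirit of Fig 3.1; the helper name and the sample values are our own assumptions):

```python
def lt_decode(symbols, k):
    """Peeling (message-passing) decoder for the erasure channel.
    symbols: iterable of (indices, payload_int) pairs.
    Returns a dict mapping source index -> recovered payload;
    the dict is partial if decoding stalls with no degree-one symbol."""
    work = [[set(idx), val] for idx, val in symbols]
    decoded = {}
    progress = True
    while progress and len(decoded) < k:
        progress = False
        for sym in work:
            idx, val = sym
            # Peel off every already-decoded packet from this symbol.
            for i in [j for j in idx if j in decoded]:
                idx.discard(i)
                val ^= decoded[i]
            sym[1] = val
            # A symbol reduced to degree one reveals a new source packet.
            if len(idx) == 1:
                (i,) = idx
                if i not in decoded:
                    decoded[i] = val
                    progress = True
    return decoded

# A small instance: t1 = S1, t2 = S1+S2, t3 = S2+S3, t4 = S2
recovered = lt_decode([({0}, 5), ({0, 1}, 12), ({1, 2}, 5), ({1}, 9)], 3)
```

Tracing the loop reproduces the cascade described in the text: t1 yields S1, which reduces t2 to a fresh degree-one symbol for S2, which in turn unlocks S3 from t3.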

A simple example is illustrated in Fig 3.1. There are three source packets (S1, S2, S3) and four encoded check symbols (t1, t2, t3, t4). At the beginning, only t1 is connected merely to S1. We set S1 = t1 = 1 and cancel the edge between them after the first iteration, as shown in (b). Then we add S1 to all of its connected check nodes, deleting the edges between S1 and its connected group. In the second iteration, we find that t4 is only connected to S2, and we can recover S2 from t4. Similarly, we delete the edge between S2 and t4. Repeating the iteration, we can finally recover all three source packets successfully.

In our example above, the total data can be decoded. However, if at some point there exists no check node with degree one, the decoding procedure stops, meaning that the process crashes. Namely, we need to receive extra symbols to decode the remainder and recover the source information.

Figure 3.1: LT decoding procedure cited from Fig.4 in [7]

3.2.3 Distribution Design

We have described the encoding and decoding processes assuming the operation is based on a determined distribution ρ(d). In the following, we discuss how to design a distribution that performs well.

In the decoding process, the procedure fails if there exists no symbol with degree one. If we want decoding to continue, there must also exist some symbols with low degree that have a chance to become new degree-one symbols to keep the process going. At the same time, if the max degree of the distribution is too low, some source packets may be connected to none of the encoded symbols, causing tremendous loss. Thus, the performance depends vitally on the designed distribution. In order to fulfill these requirements, the ideal soliton distribution is derived from mathematical theory.

The ideal soliton distribution defines ρ(d) as follows:

ρ(d) = 1/K for d = 1, and ρ(d) = 1/(d(d−1)) for d = 2, 3, . . . , K.  (3.1)

The expected value of the degree is roughly log_e K.
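Equation (3.1) can be checked numerically: the weights sum to one exactly, since Σ 1/(d(d−1)) telescopes to 1 − 1/K, and the mean degree is close to log_e K. A small sketch (our own illustration, with an assumed K):

```python
import math

def ideal_soliton(K):
    """rho(d) from Eq. (3.1); returned as a list where entry 0 is d = 1."""
    return [1.0 / K] + [1.0 / (d * (d - 1)) for d in range(2, K + 1)]

K = 10000
rho = ideal_soliton(K)
# E[d] = 1/K + H_{K-1}, which is roughly log_e K for large K
mean_degree = sum(d * p for d, p in enumerate(rho, start=1))
```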

The ideal soliton distribution works poorly in real transmission, because when we obey this distribution there is a high probability that no degree-one check symbol exists at some point during the decoding process. Thus, the robust soliton distribution modifies the degree distribution.

The robust soliton distribution defines two extra parameters, c and δ.

c : a constant determined by the designer.

δ : the probability that the decoding fails to decode completely after a certain number K′ of symbols have been received.

The modified term is

τ(d) = S/(Kd) for d = 1, 2, . . . , (K/S) − 1;
τ(d) = (S/K) log(S/δ) for d = K/S;
τ(d) = 0 for d > K/S.  (3.2)

S is a constant calculated by

S ≡ c · log_e(K/δ) · √K.  (3.3)

Adding the modified term to the ideal soliton distribution and normalizing, we get the robust soliton distribution µ(d):


µ(d) = (ρ(d) + τ(d)) / Z,  (3.4)

where

Z = Σ_d (ρ(d) + τ(d)).  (3.5)

Regarding the additional distribution τ(d) summed with the ideal soliton ρ(d), the max degree d is extended to K/S. The spike at d = K/S ensures that all source packets are connected with higher probability during the encoding process. The required max degree is proportional to the file size and inversely proportional to S, which is calculated from the tuning parameters c and δ. δ can be viewed as the probability of decoding failure; if we want a lower failure probability, we get a correspondingly higher max degree, which fits the straightforward intuition.

If we want to decode the source information completely, after receiving all the symbols, with probability at least (1 − δ), then the required number of received packets is K′ = KZ. Obviously Z will be slightly larger than 1, and equal to 1 for the optimal solution we look forward to.

The robust soliton distribution offers two design parameters, c and δ. The weight of each degree depends essentially on c, δ, and K. The more representative factor is Z in equation 3.5, the excess fraction of required symbols. A good distribution can be tuned so that the needed overhead is usually about 5 to 10 percent, and the constant c is usually chosen smaller than 1 to get better performance.
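Putting equations (3.1) through (3.5) together, the construction can be sketched as follows (our own illustration; the example parameters c = 0.2, δ = 0.05, K = 10000 are assumptions, the logarithms are taken as natural logs, and K/S is rounded to the nearest integer):

```python
import math

def robust_soliton(K, c, delta):
    """Build mu(d) for d = 1..K following Eqs. (3.1)-(3.5).
    Returns (mu, Z, S, spike), where spike = K/S rounded to an integer."""
    S = c * math.log(K / delta) * math.sqrt(K)          # Eq. (3.3)
    spike = int(round(K / S))
    rho = [1.0 / K] + [1.0 / (d * (d - 1)) for d in range(2, K + 1)]
    tau = [0.0] * K
    for d in range(1, spike):                           # tau(d) = S/(Kd), d < K/S
        tau[d - 1] = S / (K * d)
    tau[spike - 1] += (S / K) * math.log(S / delta)     # the spike at d = K/S
    Z = sum(r + t for r, t in zip(rho, tau))            # Eq. (3.5)
    mu = [(r + t) / Z for r, t in zip(rho, tau)]        # Eq. (3.4)
    return mu, Z, S, spike

# Assumed example parameters
mu, Z, S, spike = robust_soliton(10000, 0.2, 0.05)
```

With these parameters the sketch gives S ≈ 244, a spike at d = 41, and Z in the low 1.3 range; the exact value of Z varies slightly with the rounding conventions used for K/S.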

3.3 Summary


• Coding is based on distribution.
• Simple encoding operation.
• Good protection of data.

We can easily discover that the LT code meets these desired requirements. Therefore, we propose a method to apply the LT code to network coding to accomplish our targets. More details will be illustrated in the next chapter.


Chapter 4

Cooperative Network Coding with LT Code

We introduced network coding theoretically in chapter 2 and the LT code in chapter 3. In this chapter, we demonstrate the innovative ideas of how to combine them to achieve our targets: high throughput and prominent error protection. The flow chart below shows the simulation procedure, cut into several partitions. We will explain the work of each section according to Fig 4.1.

The chapter is organized as follows. First, we specify the network and calculate the max flow quantity, the goal we pursue, of every sink. Due to the coding, every packet must carry an extra index of the combined information, which causes the fragmentation. After the initialization, the packets are encoded, transmitted, received, buffered, and decoded recursively until every sink decodes the whole information the source delivered. The simulation ends when every node recovers the total information. The whole simulation environment is implemented in C++, and the details are discussed in each section.

(41)

[Figure 4.1: Simulation flow chart — Start → Data Fragment → LT Encoding → Packet Transmission and Receiving → Buffering → LT Decoding → "Is data decoded completely?" → if yes, End; if no, repeat]


4.1 Network Topology Specification

In the simulation, acyclic networks are specifically considered. The transmission model is single-source multicast communication, and no edges are connected between sinks, meaning that sinks receive data from sources, from intermediates, or from both. The conditions are listed below.

• Acyclic network.

• Single source multicast.

• No shared content among sinks.

Based on the conditions above, the two network topologies exemplified in Fig 2.1 and Fig 2.4, whose max flow is 2 unit capacities at every sink, are particularly discussed.
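The max-flow figure can be checked with a standard Edmonds–Karp computation. The sketch below is our own illustration; the butterfly wiring (S→T,U; T→W,Y; U→W,Z; W→X; X→Y,Z, all unit capacity) is our assumption of the classic topology of Fig 2.1:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap[u][v] is the capacity of edge u->v."""
    residual = {u: dict(vs) for u, vs in cap.items()}
    for u in cap:
        for v in cap[u]:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:        # BFS for an augmenting path
            u = q.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:        # trace the path back to s
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

# Assumed butterfly wiring, every edge with unit capacity
butterfly = {
    'S': {'T': 1, 'U': 1}, 'T': {'W': 1, 'Y': 1}, 'U': {'W': 1, 'Z': 1},
    'W': {'X': 1}, 'X': {'Y': 1, 'Z': 1}, 'Y': {}, 'Z': {},
}
```

Running it for each sink confirms a max flow of 2 units, the theoretical throughput target the coding system tries to reach.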

4.2 Data Fragment

We apply the LT code to network coding. Thus, every packet includes two parts. One part is the outcome of a series of exclusive-or operations on the original information. The other is the overhead that records the indices of all original information involved during the encoding process. Intuitively, the overhead lengthens as the total transmitted data grows, since we need more bits to index the data correspondingly. We can evaluate this with the following formula.

F + D × I = B

The parameters are listed in Table 4.1.


Table 4.1: Design parameters

M : total file size
B : unit capacity
D : max degree of LT code
F : fragment information size
I : data index bits

M : The total file size to be transmitted. When M enlarges, the number of transmissions increases.

B : The unit capacity of an edge. We set this by finding the greatest common divisor (G.C.D.) of all edge capacities, so every capacity is a multiple of B. In reality, it should be the least guaranteed bandwidth of the network.

D : The max degree of the LT code; equivalently, the largest number of fragments allowed to be combined.

F : The actual information size of an encoded symbol. If network coding is not executed, F is the same as B.

I : The number of bits used to assign an identification number to every fragment. It can be calculated by I = ⌈log₂(M/F)⌉.
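Since I depends on the fragment count, which in turn depends on F, the relation F + D × I = B can be solved by fixed-point iteration. The sketch below is our own helper, not the thesis's code; it assumes fragment ID 0 is reserved for unused index slots and reproduces the first row of Table 4.2:

```python
import math

def fragment_params(M_bits, B_bits, D):
    """Solve F + D*I = B by fixed-point iteration, where I is the number
    of bits needed to index every fragment (ID 0 reserved, so n fragments
    need ceil(log2(n + 1)) bits).
    Returns (F, I, O_frag), or None when the index field cannot fit."""
    F = B_bits
    while True:
        n_frag = math.ceil(M_bits / F)
        I = max(1, math.ceil(math.log2(n_frag + 1)))
        new_F = B_bits - D * I
        if new_F <= 0:
            return None                     # the "x" cases of Table 4.2
        if new_F == F:
            return F, I, B_bits / F         # O_frag = N_coding / N_original
        F = new_F

# First row of Table 4.2: M = 512KB, B = 1KB, D = 2
F, I, O_frag = fragment_params(512 * 1024 * 8, 8192, 2)
```

The iteration always converges because F can only decrease toward 0, and it returns None exactly when the index overhead D × I exceeds the unit capacity, matching the infeasible cases marked × in Table 4.2.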

We find that the real information carried in a packet shrinks due to the overhead of the encoding information. The efficiency drops after the segmentation, causing extra transmissions compared to simple routing. (In existing systems, particular headers already record information about each transmitted packet.) We define some notation as follows.

• N_original: total number of packets a sink should receive with the routing method.
• N_coding: total number of packets a sink should receive with the coding method.


• M = N_original × B = N_coding × F.

• O_frag: normalized overhead after the fragmentation. The loss can be calculated by O_frag = N_coding / N_original.

Figure 4.2: Basic concept of fragmentation

We describe the basic concept discussed above in Fig 4.2. Assume the unit capacity of the transmission channel is B, and the number of deliveries needed without any coding mechanism is N_original = 10. The small blocks labeled 1 to 10 help show the ratio of every fragment in the different examples. Fig 4.2(a) shows a fragment F_a that is 90% of B with max codeword degree D = 2. The required number of transmissions therefore increases to N_coding = 12; that is, we have two extra transmissions due to the fragmentation, and the overhead O_frag is 1.2. Comparing (a) and (b), the difference is that the number of bits indexing each fragment in (b) is twice as large as in (a). The quantity of true information a packet can carry drops from 90% to 80% of B, causing one extra transmission. Considering the two examples in (c) and (d), the max degree D increases to 10 and 18, weighting 50% and 90% of a packet. When true information makes up a lower percentage of a packet, a huge number of transmissions results: in examples (c) and (d), N_coding increases to 20 and 100, much more than N_original. This means we must design an outstanding coding mechanism to make up for the additional transmissions due to fragmentation. The accurate calculation is shown below.

Table 4.2: Example of the parameters setup

M       D    B              F (bits)  D × I  O_frag
512KB   2    1KB = 8192b    8172      20b    1.0024
512KB   2    4KB = 32768b   32752     16b    1.0005
512KB   2    8KB = 65536b   65522     14b    1.0002
5MB     2    1KB = 8192b    8164      28b    1.0034
5MB     2    4KB = 32768b   32744     24b    1.0007
5MB     2    8KB = 65536b   65514     22b    1.0003
384MB   2    1KB = 8192b    8152      40b    1.0049
384MB   2    4KB = 32768b   32732     36b    1.0011
384MB   2    8KB = 65536b   65502     34b    1.0005
384MB   2    4B = 32b       ×         ×      ×
384MB   2    16B = 128b     76        52b    1.6842
384MB   2    64B = 512b     462       50b    1.108
384MB   30   1KB = 8192b    7592      600b   1.0790
384MB   60   1KB = 8192b    6992      1200b  1.1716
384MB   120  1KB = 8192b    5792      2400b  1.4144

Table 4.2 illustrates the relation of the design parameters. We can find that O_frag closely approaches 1 if the unit bandwidth is not extremely small, meaning the overhead after fragmentation is slight. However, if the bandwidth is quite small compared to the total data, the significantly large number of carried index bits causes vital overhead. If this happens, as with the × sign in the table (implying that the required number of index bits is larger than the unit capacity can afford), we recommend the traditional routing method. Another factor, the max degree D of the LT code, also influences the overhead. We have to keep the overhead adequate and reasonable, lest even a well-performing LT code cannot compensate for the fragmentation overhead, consequently lowering the utility of the whole network.

The required index bits can be derived from the formula introduced above and vary with the number of transmitted packets. We provide two modes in which the index bits recording each fragment are either 16b or 32b. The former permits 2^16 − 1 transmitted fragments in total and the latter 2^32 − 1. If we consider the case where the total file size is 32MB, the unit capacity is 1KB, and the max degree is 10, then the required index size is 13b, the total overhead is 130b, and the corresponding O_frag is 1.016. If we apply the 16b mode, the required overhead is 160b and O_frag is 1.02.

When the data are fragmented, we cut the total file into pieces and assign a number to each slice as the packet ID. Thus, every non-coded packet can be viewed as an encoded symbol of degree one. Fig 4.3 is the example.

[Figure 4.3: Data fragment example — blocks B1–B5 without fragmentation versus fragments F1–F7, numbered 1 to 7, after data fragmentation]


4.3 LT Encoding

The coding process is completely the same as the LT code encoding operation. A slight difference is that, since the packets contain the indices of the combined fragments, we need to reallocate the indices after finishing the encoding process. The details are presented below.

The source node possesses all the fragments, which can be viewed as symbols with degree one, so it can easily encode a symbol of any requested degree. The encoding process is described as follows.

1. Randomly choose the degree d of this encoding based on the distribution.
2. Randomly and uniformly choose an index out of N_coding in each of d iterations.
3. Do the XOR operation on the chosen fragments.
4. Reallocate the indices of the chosen fragments in order.
5. Repeat 3 and 4 until the degree of the packet is d.

[Figure 4.4 content: payload F1 ⊕ F3 ⊕ F4 ⊕ F7 ⊕ F8 ⊕ F9 with an index field of size D = 10 holding the indices 1, 3, 4, 7, 8, 9 and four unused slots set to 0]

Figure 4.4: Illustration of an encoding symbol

Fig 4.4 is the example of an encoded symbol whose degree is 6. The max degree D is 10, and we set the residual indices to 0 in the unused index positions. At the same time, the index numbering starts from 1, not 0. The index sequence is ordered from low to high during the encoding process.
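The fixed-size index field of Fig 4.4 can be sketched as follows (our own helper names; the conventions assumed match the text: 1-based indices sorted low to high, with unused slots set to 0):

```python
def pack_header(indices, D):
    """Fixed-size index field as in Fig 4.4: D slots, 1-based fragment
    indices sorted low to high, unused slots set to 0."""
    assert len(indices) <= D and all(i >= 1 for i in indices)
    return sorted(indices) + [0] * (D - len(indices))

def unpack_header(slots):
    """Recover the degree and members: the nonzero slots are the indices."""
    return [i for i in slots if i != 0]
```

Reserving 0 for unused slots is why the index numbering must start from 1: a real fragment ID of 0 would be indistinguishable from padding.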


4.4 Packet Transmission / Receiving

Packets are transmitted from node to node over edges with packet loss rate L. Since we transmit the data over an erasure channel, we either lose a packet or trust every value we receive, so we set up a packet loss mechanism to indicate whether a packet is lost. If no loss occurs, the packet is sent to the adjacent node over the edge successfully; otherwise, the adjacent node receives a zero packet representing the null transmission due to the loss. We create all-zero data to represent a loss occurrence. Simulation results will compare the throughput and the error-protection ability of every sink under different packet loss rates.
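The loss mechanism described here can be sketched as a Bernoulli erasure (our own illustration; the all-zero placeholder follows the convention stated in the text):

```python
import random

def transmit(packet, loss_rate, rng):
    """Bernoulli erasure: with probability loss_rate the packet is lost
    and the receiver gets an all-zero placeholder of the same length."""
    if rng.random() < loss_rate:
        return bytes(len(packet))
    return packet

# Empirical check of the loss rate over many transmissions
rng = random.Random(42)
losses = sum(transmit(b'\x01', 0.2, rng) == b'\x00' for _ in range(10000))
```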

4.5 Buffering

Buffers are used to store the packets from the incoming edges. They provide temporary storage to preserve data, which is especially useful when the total incoming flow is larger than the total outgoing flow. The buffering method obeys FIFO, the first-in-first-out mechanism. In traditional routing, we should choose an adequate buffer size, since when a bottleneck occurs at some intermediate there will be considerable packet loss if the buffer is too small. When we apply the LT code to the network, all packets flooding the network are encoded symbols. If the receiving rate of a certain node is higher than its transmitting rate, some packets are definitely discarded. But since all packets are encoded symbols obeying the designed distribution, even if we lose some of them, the deviation from the desired distribution the source should conform to will be subtly tiny. That is to say, the buffer size is no longer a critical problem thanks to the coding mechanism, and we can use a smaller buffer to save hardware cost while still achieving well-performing throughput.


4.6 LT Decoding

The decoding process resembles LT decoding. A sink receives symbols continuously and meanwhile activates the decoding procedure; it receives and decodes recursively until the whole data are recovered completely. A decoded symbol is one segment of the original data, so we have to recover the true information with an extra de-fragmenting operation. The procedure for a received symbol is listed below.

1. Check the symbol degree. If the degree is one, go to step 2; else go to step 4.

2. Check whether this degree-one symbol has been decoded before. If so, go to step 6; else go to step 3.

3. Defragment the new symbol, labeling it in the decoded index list.

4. Check whether any index is shared between the symbol and the decoded index list. If yes, go to step 5; else go to step 6.

5. XOR the two matched codewords, erasing the shared index from the symbol. Return to step 1.

6. Finish.

The decoding process can be briefly summarized as searching for matching indices in the decoded index list, performing the XOR operation, and erasing the computed index to decrease the degree iteratively. Since the LT decoding property depends vitally on degree-one symbols, we check two conditions to start the decoding: the first is a new decoded index, and the other is a shared index between the received symbol and the list of decoded indices. If the received symbol fits neither condition, no decoding process is started. In the meantime, after one decoding pass, there should exist no indices shared with any index in the decoded list. That is to say, the decoding procedure stops if there is no valuable information among the un-decoded symbols. The flow chart is in Fig 4.5.

[Figure 4.5: Decoding flow chart]

The decoding flow chart describes the decoding process upon receiving one encoded symbol. First, we check the degree of the newly arrived symbol. If the degree is not one, we examine the index members of this symbol to see whether we can reduce its degree by exclusive-or-ing with matching indices in the decoded list. If not, the decoding finishes, leaving the symbol undecoded. If we find any decoded symbol whose index matches an index member of the symbol, we extract its information, reduce the degree, and return to the degree check. If the degree of the received symbol is one, we first examine whether we have decoded this symbol before. If so, this symbol is of no help for getting more information, and we stop the decoding procedure right away to avoid spending time on a meaningless search over this useless symbol. If this symbol is newly decoded, we search for symbols containing this new information to reduce the degrees of the un-decoded symbols and help the procedure go on.

4.7 Degree Distribution Analysis

The discussion above is established on a known, well-designed degree distribution. In chapter 3, we studied how to obtain the ideal soliton distribution and the robust soliton distribution modified by the two additional parameters c and δ. We also realized that the ratios of degree one and of the max degree relevantly influence the decoding performance. If we want to obtain a distribution from the robust soliton distribution, the example below shows the corresponding procedure with the following parameters.

Table 4.3: Parameters of illustrated distribution

M           10000KB
B           1KB
N_original  10000
I           16b
c           0.2
δ           0.05
F           (to be determined)
D           (to be determined)


We design the distribution following the robust soliton distribution and realize that the max degree D can be decided by K/S, the max degree used in the robust soliton distribution. Assume we choose c = 0.2 and δ = 0.05; we can then calculate S = 244, K/S = 41, and Z ≃ 1.33. The corresponding distributions ρ(d) and τ(d) are shown in Fig 4.6, where we find that the modified term τ(d) adds weight mostly at the max degree K/S and at degree one. The weight of every degree in the distribution is the sum of ρ(d) and τ(d). After the calculation, we get a max degree D of 41, and the fragment size will be 1KB − 41 × 16b = 7356b. N_coding is therefore increased to 11137. Since the LT code concerns the number of actually transmitted symbols, we have to put N_coding = 10871 into the calculation to get another required max degree and distribution, obtaining another new fragment size. The process goes on iteratively until the outcome converges. The final design is N_coding = 10917 and D = 43.

If we obey the soliton distribution, we can tune an adequate one with sufficient tries. In our experience, what we care about is the equivalent throughput at the sink. If we attempt to elevate the probability of successful decoding and reduce the parameter Z for fewer transmissions, we must pay with a high max degree in the coding system, which fatally decreases the amount of true information a packet can carry. A well-designed distribution is usually hard-pressed to compensate for N_coding, the actual number of transmissions.

Since what we care about most is the throughput, we hope to make N_coding as close to N_original as possible. The intuitive thought is that the quantity of information carried in one packet should be larger. Our methodology is to limit the max degree D and tune the distribution below this restriction. The goal is quite the same; the difference is the way we approach it. If the degree we set is too small, it is almost impossible to tune an adequate distribution; therefore, we have to enlarge our max degree and tune again. The process continues until the overall throughput is enhanced.



Figure 4.6: Robust Soliton Distribution with two components ρ and τ

The first modification is to enlarge the weight of degree one, because it is the key to keeping the decoding going by belief propagation. On the other hand, we lower the weight of degree two, because without sufficient degree-one codewords, more degree-two codewords exist in vain. The third modification is to lower the weight of the max degree. The max degree functions as the trunk of the code structure, seizing the total information; we let this task be distributed to other degrees, expecting to alleviate the risk of being unable to decode because of too many codewords with max degree. Hence, we get a much lower weight at the max degree and let the weight transfer to other degrees.


Chapter 5

Comparison and Simulation Result

In this chapter, we show our simulations to verify the proposed method. We take the two kinds of network topologies introduced in chapter 2; every network operates in a multicasting environment. The chapter is separated into several parts according to different topics.

Every section contains comparisons of different methods, including traditional routing, the coding method proposed in chapter 2, and the proposed LT code applied to network coding. We simulate in environments with packet loss rates of 0%, 5%, 10%, and 20%. We compare the throughput of every methodology and calculate the total overhead of the coding system, such as encoding and decoding operations.

5.1 Throughput

In this section, we focus on the throughput of the network under different transmission mechanisms. We show the results for two different kinds of networks in Fig 5.1. Since the capacity of all channels is the same, we use 1KB as the unit capacity. On the other hand, since the LT code performs differently with different numbers of transmitted packets, we set up file sizes M of 4MB, 8MB, 16MB, 32MB, and 62.5MB.

[Figure 5.1: (a) Butterfly. (b) One-source three-sinks.]

The number of bits a packet index uses is 16b. The max degree we design here is 10, so that the overhead due to fragmentation is substantially tiny. The corresponding N_original for each file size is 4K, 8K, 16K, 32K, and 64K. The buffer size of each intermediate is set to 1K. The parameters and the relations between N_original and N_coding are summarized in Table 5.1 and Table 5.2.

Table 5.1: Parameters of simulation

M               4MB, 8MB, 16MB, 32MB, 62.5MB
B               1KB
N_original      4K, 8K, 16K, 32K, 62.5K
I               16b
D               10
Buffer size     1K
L (loss rate)   0%, 5%, 10%, 20%

In our simulations, we compare three different methods. The first is the traditional routing method used in the current system. The second is the network coding proposed in [1]. The third is our proposed LT code applied to the network environment. Since what we care about is the network utility at every sink, a system that uses fewer cycles to let all sinks receive the data has higher utility.

Table 5.2: Relations between N_original and N_coding

File Size M   4MB    8MB    16MB    32MB    62.5MB
N_original    4K     8K     16K     32K     62.5K
N_coding      4178   8356   16711   33421   65275
O_frag        1.02   1.02   1.02    1.02    1.02

Results are shown in Table 5.3 to Table 5.6.

Table 5.3 and Table 5.4 show the average required run cycles in Fig 5.1(a) according to different levels of completeness at different loss rates. As the tables show, the run cycles of routing are close to the number of times a source should transmit when edges suffer no loss. We can also find that in a lossless environment, the method proposed in [1] outperforms both routing and the proposed LT considerably. This should be no surprise, because this code is specifically designed for this particular network, so it performs outstandingly. However, under different levels of packet loss, we observe that the routing mechanism and the specified design degrade sharply. This means their ability to protect packets from loss is insufficient: the lost packets cannot be recovered from what was received, so more cycles are required to transmit the data to the sinks. Regarding the proposed LT, we find that the cycles required to decode also increase with severe loss. Since the proposed LT code offers a mechanism against packet loss, we can still decode a certain part of the information when some packets are lost. The run cycles of the proposed LT are significantly smaller than all the others, especially when sinks receive 90% and 95% of the entire data. Namely, if a data loss of about 5% to 10% is tolerable, the proposed LT supplies a well-constructed coding system that enhances throughput in erasure channels with loss rates of up to 20%.

