Outline of Thesis - Research Overview - Flexible Application Layer Multicast for Dynamic Users with Asymmetric Network Connectivity

CHAPTER 1 RESEARCH OVERVIEW

1.3 OUTLINE OF THESIS

In Chapter 2, we briefly introduce three different multicast architectures and related work on application layer multicast. In Chapter 3, we describe the three approaches in detail. In Chapter 4, we describe how the three approaches are used in our application layer multicast mechanism and how the mechanism works. In Chapter 5, we present simulation results and evaluate the performance. Chapter 6 gives the conclusion and future work.

Chapter 2 Background

2.1 Architecture of Multicast

Multi-receiver multimedia applications such as video conferencing, video streaming, e-learning, and online gaming are increasingly popular on the Internet, so multicast is an important mechanism to develop. Multicast is a one-to-many transmission mechanism that efficiently reduces duplicate packets and bandwidth consumption in multi-receiver applications.

There are three different multicast architectures: IP multicast, application layer multicast (ALM), and overlay multicast [21]. Figure 2 illustrates the differences among them. IP multicast is implemented at the network layer and uses routers as relaying nodes. It is the most direct implementation of multicast and eliminates most duplicate packets during transmission. However, because of several issues [1], IP multicast has not yet been globally deployed.

Figure 2: Three different architecture of multicast.

(a) IP multicast, (b) application layer multicast, (c) overlay multicast

Overlay multicast first constructs a backbone of overlay proxies and then establishes multicast trees among the overlay proxies and end hosts. Its performance is close to that of IP multicast; however, its main issue is the deployment of the overlay proxies.

Application layer multicast has been an active research topic in recent years [2~5, 7, 11, 12, 14, 16] and has become an attractive multicast solution. Among the three architectures, it is also the easiest to deploy immediately. ALM is implemented at the application layer and does not require modifying any existing lower-layer protocol. Users' desktops or mobile devices (called peers) take over the role of routers in IP multicast: data packets are replicated at peers and sent to other peers in the same application layer multicast service.

2.2 Related Work

We briefly introduce some research on application layer multicast.

2.2.1 Multiple Description Coding (MDC)

An overview of multiple description coding is given in [22]. MDC encodes a signal into multiple separate descriptions (streams), and any subset of the descriptions can be decoded to restore the original signal at a corresponding quality. The more descriptions a user receives, the lower the distortion and the higher the quality of the restored signal. One difference between MDC and IDA is that MDC is a source coding method, whereas IDA is a channel coding method.

2.2.2 Multiple Stripes

SplitStream [2] and CoopNet [3] are two mechanisms that use multiple stripes. Both conclude that using multiple stripes increases robustness, resilience, and load balance. Multiple stripes are described in detail in the next chapter. There are two main differences between them. First, CoopNet uses a centralized tree building algorithm, whereas SplitStream is decentralized because it is based on Scribe. Second, CoopNet does not handle the bandwidth contribution of peers in the trees.

2.2.3 Waypoint

A new overlay architecture is proposed in [9]. Besides normal participants, it employs additional machines called waypoints, and the two kinds of nodes are interwoven in the overlay tree. Waypoints differ from the overlay proxies of overlay multicast: they run the same protocol as normal participants, so they behave like other users rather than like statically provisioned infrastructure nodes such as the Overcast nodes in [6]. The purpose of adding waypoints is to increase the total amount of resources in the system.

According to the authors' experience, waypoints are needed in some cases, and their investigation is still in progress. The difference between waypoints and helpers is described in the next chapter.

2.2.4 Tree Building Algorithm

Tree building methods fall into two main categories: those based on distributed hash tables (DHTs) and those based on hierarchical clustering. Scribe [14] and Bayeux [16] are ALM tree building methods based on the Pastry [13] and Tapestry [15] DHTs, respectively. DHTs were originally designed for efficient search and routing with a bounded number of hops (usually log(N)). Every peer is assigned an ID generated by a hash function, and routing paths are determined by these hash IDs. DHT-based tree building methods then construct transmission paths along the routing paths using reverse path forwarding (RPF). Because all these transmission paths share the same source peer, together they form a tree.
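The following Python sketch shows how reverse path forwarding turns routing paths into a tree. It assumes a simplified greedy prefix-routing rule over a global view of all node IDs, not the actual Pastry or Tapestry routing tables. Each hop moves to a node that shares a longer ID prefix with the group key, and each node adopts its next hop as its parent. The helper names (`shared_prefix`, `pick_root`, `next_hop`, `build_tree`) and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List, Optional


def shared_prefix(a: str, b: str) -> int:
    """Length of the common prefix of two hex IDs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_root(key: str, nodes: List[str]) -> str:
    """Rendezvous node: longest shared prefix with the key, ties broken by smallest ID (assumption)."""
    return min(nodes, key=lambda n: (-shared_prefix(n, key), n))


def next_hop(current: str, key: str, nodes: List[str], root: str) -> Optional[str]:
    """Greedy prefix routing: move to a node that matches the key on more digits."""
    if current == root:
        return None
    here = shared_prefix(current, key)
    better = [n for n in nodes if shared_prefix(n, key) > here]
    if not better:
        return root  # no node with a longer prefix exists, so hand over to the rendezvous node
    # take the smallest prefix improvement, mimicking digit-by-digit routing
    return min(better, key=lambda n: (shared_prefix(n, key), n))


def build_tree(members: List[str], key: str, nodes: List[str]) -> Dict[str, str]:
    """Reverse path forwarding: each routing hop becomes a child -> parent edge."""
    root = pick_root(key, nodes)
    parent: Dict[str, str] = {}
    for m in members:
        cur = m
        # stop as soon as the route reaches a node that is already on the tree
        while cur not in parent:
            nxt = next_hop(cur, key, nodes, root)
            if nxt is None:
                break
            parent[cur] = nxt
            cur = nxt
    return parent
```

For example, `build_tree(["1a0", "7b4", "3e9"], "3e5", ["1a0", "3f2", "3e1", "3e9", "7b4"])` roots the tree at `3e1`. The non-member `3f2` becomes a forwarder for `1a0` and `7b4`, which matches the way Scribe-style trees recruit nodes that lie on routing paths.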

NICE [4] and ZIGZAG [5] are ALM tree building methods based on hierarchical clustering. Each peer that wants to join the tree is placed into a cluster according to a specified metric (such as distance). One or more peers in each cluster are promoted to a higher-level cluster, and they are responsible for transmitting data to all peers in their lower-level cluster. Each higher-level cluster is in turn served by a cluster above it, and the only member of the highest-level cluster is the source peer. These clusters form a tree structure in which each level of clusters represents one level of the tree.
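The layering described above can be illustrated with a small Python sketch. It is not the NICE or ZIGZAG protocol. Several details are hypothetical: peers are placed on a 1-D coordinate that stands in for the distance metric, clusters have a fixed size `k`, and each cluster promotes its middle member as leader. To match the text, the source peer is always promoted, so it ends up alone at the top.

```python
from typing import Dict, List


def cluster_layers(pos: Dict[str, float], source: str, k: int = 3) -> List[List[List[str]]]:
    """Group peers into clusters level by level until only the source remains.

    pos maps each peer to a hypothetical 1-D coordinate used as the distance metric.
    """
    if k < 2:
        raise ValueError("cluster size must be at least 2")
    if source not in pos:
        raise ValueError("the source peer must be one of the peers")
    layers: List[List[List[str]]] = []
    level = sorted(pos, key=lambda p: pos[p])
    while len(level) > 1:
        # peers that are close on the coordinate end up in the same cluster
        clusters = [level[i:i + k] for i in range(0, len(level), k)]
        layers.append(clusters)
        # promote one leader per cluster; the source is always promoted
        leaders = [source if source in c else c[len(c) // 2] for c in clusters]
        level = sorted(leaders, key=lambda p: pos[p])
    layers.append([level])  # highest level: only the source peer remains
    return layers
```

Each leader forwards data to the members of the cluster it was promoted from, so every list in `layers` corresponds to one level of the resulting tree.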

2.2.5 Enhancement of Resilience

Probabilistic Resilient Multicast (PRM) [11] uses a proactive forwarding approach, randomized forwarding, to increase the data delivery ratio. Every node randomly chooses a constant number of other nodes and forwards data to them with a low probability. Randomized forwarding runs simultaneously with the usual data forwarding mechanism, so some nodes may receive duplicate data, but nodes can still receive data even when their parents fail. PRM also proposes an extension called Ephemeral Guaranteed Forwarding (EGF). While some nodes are repairing their transmission paths (finding new parents), they can ask other nodes to temporarily increase the probability of forwarding data to them, so they continue to receive data during the repair process.
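The interplay of tree forwarding and randomized forwarding can be simulated with the Python sketch below. It is a simplified model of the idea described above, not PRM's actual protocol. The parameter names `r` (number of random peers) and `beta` (forwarding probability) are assumptions, and failed nodes are modeled as neither receiving nor forwarding.

```python
import random
from typing import Dict, List, Set


def prm_deliver(children: Dict[str, List[str]], root: str, failed: Set[str],
                r: int, beta: float, rng: random.Random) -> Set[str]:
    """Return the set of live nodes that receive one packet.

    Tree forwarding always happens. In addition, each receiving node picks r random
    peers and forwards to each with probability beta (randomized forwarding).
    """
    nodes = set(children) | {c for kids in children.values() for c in kids}
    received = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        targets = list(children.get(node, []))  # usual tree forwarding
        others = sorted(nodes - {node})
        for peer in rng.sample(others, min(r, len(others))):
            if rng.random() < beta:  # low-probability random forwarding
                targets.append(peer)
        for t in targets:
            # duplicates are dropped; failed nodes cannot receive or forward
            if t not in failed and t not in received:
                received.add(t)
                frontier.append(t)
    return received
```

With `beta = 0` this reduces to plain tree forwarding, so the children of a failed node receive nothing. A small positive `beta` gives those orphaned nodes a chance to receive the packet anyway, at the cost of some duplicate transmissions.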

A proactive approach to reconstructing multicast trees is proposed in [12]. It minimizes the service disruption of the affected nodes when interior nodes leave the tree or fail. In this approach, every interior node computes a parent-to-be for each of its children, taking the degree constraint of each node into account; the approach also handles the case of multiple nodes leaving. If an interior node later leaves the tree or fails, each of its children immediately switches to its precomputed new parent (its parent-to-be). In this way, when edges of the tree break, every node can quickly find a new parent and recover its transmission path.
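To make the idea of a precomputed parent-to-be concrete, here is a hypothetical greedy rule in Python. It is not the algorithm of [12]. When `node` leaves, its slot at the grandparent frees up. The rule re-homes children under the grandparent while it has spare degree, and then under siblings that have already been re-homed, so no cycle can form. The function name and the preference for high-capacity children are assumptions.

```python
from typing import Dict, List, Optional


def plan_parents_to_be(node: str, grandparent: str,
                       children: Dict[str, List[str]],
                       max_degree: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Hypothetical greedy rescue plan computed in advance by interior node `node`.

    None means no slot was found and the child must fall back to a normal rejoin.
    """
    # the grandparent regains the slot that `node` occupied
    spare = {grandparent: max_degree[grandparent] - len(children.get(grandparent, [])) + 1}
    plan: Dict[str, Optional[str]] = {}
    placed: List[str] = []
    # place high-capacity children first so they can adopt their siblings
    for c in sorted(children.get(node, []), key=lambda x: -max_degree[x]):
        hosts = [h for h in [grandparent] + placed if spare[h] > 0]
        if not hosts:
            plan[c] = None
            continue
        # prefer the grandparent (shorter path), then the host with most spare degree
        host = max(hosts, key=lambda h: (h == grandparent, spare[h]))
        plan[c] = host
        spare[host] -= 1
        placed.append(c)
        spare[c] = max_degree[c] - len(children.get(c, []))
    return plan
```

Because every child attaches only to the grandparent or to an already-placed sibling, the repaired structure is still a tree. The degree check keeps each new parent within its upstream limit.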

Chapter 3 Principle

3.1 Overall System

The three approaches we use are the information dispersal algorithm (IDA), multiple stripes, and helpers. Based on these three approaches, we build the application layer multicast system shown in Figure 3.

Figure 3: Application Layer Multicast System

The whole graph is an ALM system containing many independent ALM services, each providing different streaming data. The circle nodes are the users who have joined the system, generally called peers. Each peer is named according to its role in the system.

Peers that provide streaming data are called source peers. Peers that subscribe to a service are called subscribers of that service, and peers that do not subscribe to it are called helpers of that service. Peers that do not subscribe to any service are called idle peers, and they can act as helpers for all services. We now point out where our three approaches are applied.

Information dispersal algorithm: The dark circle node is the source peer. Before it sends out the streaming data, the data is processed by IDA.

Multiple stripes: In Figure 3, the ALM service on the left contains two different kinds of lines. They represent two different trees, and the source peer transmits different stripes generated by IDA through different trees.

Helper: The ALM service on the left shows two helpers contributing their bandwidth to the service. They help retransmit stripes to the subscribers of the service.

We make several hypotheses about the ALM system. First, most peers in the ALM system have asymmetric connectivity, meaning that their upstream bandwidth is poor. Second, every peer has a degree constraint determined by the upstream bandwidth it offers. Third, the data rate of the streaming data provided in the ALM system is too high for general ALM mechanisms. Fourth, the number of peers in the ALM system is very large; in other words, a lack of helpers is not a problem. Fifth, the streaming data transmitted in an ALM service may have security requirements.
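The first three hypotheses interact: a high-rate stream may exceed a peer's upstream bandwidth entirely, yet the peer can still forward smaller stripes. The Python arithmetic sketch below assumes that each of the stripes produced by (n, m) IDA carries 1/m of the stream rate (each piece is 1/m of the data, as Section 3.2 explains). It also assumes that a peer's degree constraint is counted in stripe copies. Both the assumptions and the example rates are hypothetical.

```python
import math


def stripe_degree(upstream_kbps: float, stream_kbps: float, m: int) -> int:
    """How many stripe copies a peer can forward, assuming each stripe needs stream/m."""
    stripe_rate = stream_kbps / m
    return math.floor(upstream_kbps / stripe_rate)
```

With a hypothetical 256 kbps upstream and a 768 kbps stream, the peer can forward no full copy (`m = 1`, degree 0). It can forward one stripe copy with `m = 3` and two with `m = 6`.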

In the following sections, we introduce the principles of the three approaches in detail.

3.2 Information Dispersal Algorithm (IDA)

The (n, m) information dispersal algorithm (IDA) [8] disperses data for security, fault tolerance, and other purposes. It disperses the original data into n pieces, and any m or more of them, where $m \le n$, suffice to restore the original data. For security, a peer cannot reconstruct the data if it holds fewer than m pieces. For fault tolerance, the scheme tolerates missing pieces and can still restore the original data. A special property is that the m pieces can be any m pieces, in any order; they need not be consecutive, and no particular piece is required. On the other hand, IDA makes the data larger. If the size of data F is $|F|$ and we disperse it into n pieces, each piece has size $|F|/m$, so the total size of the n pieces is $|F| \cdot (n/m)$. In the following subsections, we briefly describe how IDA is used in the transmission process.
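The size arithmetic above can be checked with a small Python sketch. The function name is hypothetical, and pieces are rounded up to whole units to account for the zero padding introduced in the split step.

```python
def ida_sizes(size_f: int, n: int, m: int) -> tuple:
    """Return (size of one piece, total size of n pieces) in the same unit as size_f."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    piece = -(-size_f // m)  # |F| / m, rounded up (ceiling division on integers)
    return piece, n * piece
```

For a 1,000,000-byte block with (n, m) = (5, 3), each piece is 333,334 bytes and the five pieces total 1,666,670 bytes, which is about n/m ≈ 1.67 times the original size.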

3.2.1 Split

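First, we decide (n, m) and then apply IDA to the transmitted data F. Let the content of F be

$$F = b_1, b_2, \ldots, b_N \tag{1}$$

that is, F is divided into N units. We use the content of F to generate a matrix B, filling the blank places with 0. The size of matrix B is $m \times \lceil N/m \rceil$:

$$B = \begin{bmatrix} b_1 & b_{m+1} & \cdots \\ b_2 & b_{m+2} & \cdots \\ \vdots & \vdots & \\ b_m & b_{2m} & \cdots \end{bmatrix} \tag{2}$$

Next, we choose n vectors

$$a_i = (a_{i1}, a_{i2}, \ldots, a_{im}), \quad 1 \le i \le n \tag{3}$$

such that any m different vectors among $a_1, a_2, \ldots, a_n$ are linearly independent. We then use them as rows to compose matrix A, whose size is $n \times m$:

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix} \tag{4}$$

Steps (1) to (4) form the initialization. Then we apply IDA in the transmission process. First, we divide F into n pieces. Based on the matrices A and B computed above,

$$A \cdot B = C \tag{5}$$

where C is an $n \times \lceil N/m \rceil$ matrix. We divide C into n row vectors $c_i$, $1 \le i \le n$, each of length $\lceil N/m \rceil$:

$$c_i = (c_{i1}, c_{i2}, \ldots, c_{i\lceil N/m \rceil}) \tag{6}$$

These n vectors are the pieces we want. They are transmitted through different transmission paths to all peers (based on the multiple stripes approach). Each element of a piece is

$$c_{ik} = a_{i1} \cdot b_{(k-1)m+1} + \cdots + a_{im} \cdot b_{km} \tag{7}$$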
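The split step can be sketched in Python as follows. The section does not fix the arithmetic or the choice of vectors, so this sketch makes two assumptions. It works in the prime field GF(257), so that every byte value is a field element, as Rabin's IDA works over a finite field. It also uses Vandermonde vectors $a_i = (1, x_i, \ldots, x_i^{m-1})$ with distinct $x_i = i$, a common choice that makes any m of them linearly independent. As a consequence, piece values range over 0..256, slightly more than a byte. The names `P`, `ida_matrix`, and `ida_split` are hypothetical.

```python
from typing import List

P = 257  # assumed prime field; every byte value 0..255 is a field element


def ida_matrix(n: int, m: int) -> List[List[int]]:
    """n x m Vandermonde matrix A; any m rows are linearly independent mod P (needs n < P)."""
    if not 0 < m <= n < P:
        raise ValueError("need 0 < m <= n < P")
    return [[pow(x, j, P) for j in range(m)] for x in range(1, n + 1)]


def ida_split(data: bytes, n: int, m: int) -> List[List[int]]:
    """Return the n pieces c_1..c_n, each of length ceil(N/m), following equation (7)."""
    A = ida_matrix(n, m)
    cols = -(-len(data) // m)                      # ceil(N/m)
    b = list(data) + [0] * (cols * m - len(data))  # blank places of B are filled with 0
    pieces = []
    for i in range(n):
        # column k of B holds units b[k*m] .. b[k*m + m - 1] (0-based indices)
        row = [sum(A[i][j] * b[k * m + j] for j in range(m)) % P for k in range(cols)]
        pieces.append(row)
    return pieces
```

`ida_split(b"hello", 4, 2)` produces four pieces of length 3. Under this choice of A, the first row is (1, 1), so the first piece is simply the pairwise sums of consecutive bytes modulo 257.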

3.2.2 Restoration

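Having split F into n pieces in the previous section, we now show that F can be restored from any m pieces. When a peer receives m or more pieces, it can restore the original F. We choose any m pieces $c_{i_1}, \ldots, c_{i_m}$ and the corresponding m vectors $a_{i_1}, \ldots, a_{i_m}$ of A, which form an $m \times m$ matrix $A'$. Stacking the chosen pieces as rows gives an $m \times \lceil N/m \rceil$ matrix $C'$ with $A' \cdot B = C'$. Since the rows of $A'$ are linearly independent, $A'$ is invertible, and the expression below restores matrix B and hence the original data F:

$$B = A'^{-1} \cdot C', \qquad b_{(k-1)m+j} = \sum_{t=1}^{m} \left(A'^{-1}\right)_{jt} \, c_{i_t k}, \quad 1 \le j \le m$$

Therefore, as long as every peer generates the same matrix A as the source peer, it can restore the original data from any m pieces.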
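The restoration step can be sketched in Python under the same assumptions as a GF(257) split with Vandermonde vectors, which this section does not mandate. The block regenerates A, inverts the m × m submatrix $A'$ by Gauss-Jordan elimination modulo P, and applies $B = A'^{-1} \cdot C'$ column by column. The function names and the 0-based piece indices are hypothetical.

```python
from typing import Dict, List

P = 257  # assumed prime field; must match the one used by the source peer


def ida_matrix(n: int, m: int) -> List[List[int]]:
    """Regenerate the same n x m Vandermonde matrix A as the source peer."""
    return [[pow(x, j, P) for j in range(m)] for x in range(1, n + 1)]


def mat_inv_mod(M: List[List[int]]) -> List[List[int]]:
    """Invert a square matrix modulo P with Gauss-Jordan elimination."""
    size = len(M)
    aug = [row[:] + [int(i == j) for j in range(size)] for i, row in enumerate(M)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # Fermat inverse, valid because P is prime
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] % P:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def ida_restore(received: Dict[int, List[int]], n: int, m: int, length: int) -> bytes:
    """Restore F from any m received pieces; keys are 0-based piece indices."""
    idx = sorted(received)[:m]
    if len(idx) < m:
        raise ValueError("need at least m pieces")
    A = ida_matrix(n, m)
    inv = mat_inv_mod([A[i] for i in idx])  # A'^{-1}
    units: List[int] = []
    for k in range(len(received[idx[0]])):
        col = [received[i][k] for i in idx]  # k-th column of C'
        # B[:, k] = A'^{-1} . C'[:, k]; this column holds m consecutive units of F
        units.extend(sum(inv[j][t] * col[t] for t in range(m)) % P for j in range(m))
    return bytes(units[:length])  # drop the zero padding
```

The matrix inversion depends only on which pieces arrived, not on the data. A peer could therefore cache $A'^{-1}$ for a given set of received stripes and reuse it for every message.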

3.2.3 Efficiency

Only producing matrix C (split) and restoring matrix B (restoration) affect the efficiency of IDA in Trickle, because matrices A and $A^{-1}$ are produced only once and need not be regenerated. We now count the operations required by split and restoration.

For split, computing C requires $n \times m \times \lceil N/m \rceil$ multiplications and $n \times (m-1) \times \lceil N/m \rceil$ additions. The cost of each operation also depends on the size of each unit $b_i$, which is $|F|/N$. Since $n \times m \times \lceil N/m \rceil \approx n \times N$, split is affected by the parameter n and the size of the data, and its complexity is $O(n \times |F|)$. For restoration, computing B requires $m \times m \times \lceil N/m \rceil$ multiplications and $m \times (m-1) \times \lceil N/m \rceil$ additions. Therefore, restoration is affected by the parameter m and the size of the data, and its complexity is $O(m \times |F|)$.
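The operation counts above can be tabulated with a few lines of Python. The function names are hypothetical, and the formulas are exactly the ones stated in the text.

```python
def split_ops(N: int, n: int, m: int) -> tuple:
    """(multiplications, additions) to compute C = A . B for N units."""
    cols = -(-N // m)  # ceil(N / m)
    return n * m * cols, n * (m - 1) * cols


def restore_ops(N: int, m: int) -> tuple:
    """(multiplications, additions) to compute B = A'^{-1} . C'."""
    cols = -(-N // m)
    return m * m * cols, m * (m - 1) * cols
```

For N = 12 units and (n, m) = (5, 3), split needs 60 multiplications and 40 additions, and restoration needs 36 and 24. Doubling n to 10 doubles the split cost to (120, 80) but leaves the restoration cost unchanged, consistent with split depending on n and restoration on m.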

3.2.4 Advantage

Combining IDA with multiple stripes in the transmission process has several advantages. First, security and fault tolerance, the design goals of IDA, match our requirements well. Because Trickle introduces helpers, secure data transmission is necessary when the data is sensitive. Furthermore, because the whole ALM service consists of peers (end hosts), the service cannot guarantee that peers receive all the data, and the fault tolerance of IDA makes the transmission process more resilient. Second, since we only need to receive m of the n pieces to restore the original data, we do not need to wait for the remaining n − m pieces. Every piece is transmitted through a different path, and some paths are congested while others are not. Therefore, the delay in receiving data is determined by the fastest m paths, not by all n paths. Even if the congestion of the paths changes over time, we still avoid being affected by the n − m most congested paths without changing our transmission paths. Hence, the receiving process is more efficient.
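The delay argument reduces to an order statistic: a peer can restore a message at the arrival time of its m-th fastest stripe. The Python sketch below illustrates this. The function name and the per-path arrival times are hypothetical.

```python
import heapq
from typing import Sequence


def restore_delay(arrival_ms: Sequence[float], m: int) -> float:
    """Time at which a peer holds m stripes: the m-th smallest arrival time."""
    if not 0 < m <= len(arrival_ms):
        raise ValueError("need 0 < m <= number of paths")
    return heapq.nsmallest(m, arrival_ms)[-1]
```

With hypothetical arrival times of 40, 12, 95, 30, and 18 ms on five paths and m = 3, the message is restorable at 30 ms. Waiting for all five stripes would take 95 ms, so the two most congested paths no longer determine the delay.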

3.2.5 Influence of n and m

Because our IDA and multiple stripes approaches influence each other, the choice of n and m has several effects. First, n determines how many trees we build in an ALM service. Too many trees increase the amount of control signaling, and the control overhead consumes our scarce upstream bandwidth. Second, the ratio of n to m determines the transmission overhead. As mentioned before, the size of the data after applying IDA is $n/m$ times that of the original, so the amount of data transmitted through the network increases by a factor of $n/m$. However, we do not really need to receive all n stripes; we explain this improvement in the next chapter. In addition, the ratio of n to m affects the resilience of the ALM service: the larger the ratio, the more resilient the service, because it can tolerate the loss of more stripes. Hence, the choice of n and m is very important and must depend on the application the service provides.
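The trade-off between overhead and resilience can be quantified under a simplifying assumption that is not part of the section: each stripe is lost independently with the same probability. Real stripe losses are usually correlated. Under this assumption, the chance that a peer can restore a message is a binomial tail, as the Python sketch below computes. The function name is hypothetical.

```python
from math import comb


def restore_probability(n: int, m: int, p_loss: float) -> float:
    """P(at least m of n stripes arrive) when each stripe is lost independently."""
    q = 1.0 - p_loss
    return sum(comb(n, k) * q ** k * p_loss ** (n - k) for k in range(m, n + 1))
```

With n = 4 and a 10% per-stripe loss rate, the restore probability is about 0.656 for m = 4 (overhead 1), 0.948 for m = 3 (overhead 1.33), and 0.996 for m = 2 (overhead 2). This illustrates that a larger n/m buys resilience at the cost of transmitted data.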

3.3 Multiple Stripes/Trees

The concept of multiple stripes was introduced in previous research [2], [3]. Each message sent from the source peer is divided into several stripes. In the tree building step, instead of constructing a single tree, we construct as many different trees as there are stripes per message, and each tree carries a different stripe of the message to the subscribers. Hence, a peer receives different stripes of the message through different transmission paths. Figure 4 presents an example of multiple stripes. This approach provides the advantages mentioned before. We further enhance multiple stripes by combining them with the IDA approach, as described below.
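Plain striping without IDA, as in the two-stripe example of Figure 4, can be sketched in Python as follows. The round-robin assignment of units to stripes and the function names are assumptions, since the figure does not specify how a message is cut. Stripe i would be sent down tree i.

```python
from typing import List


def make_stripes(message: bytes, k: int) -> List[bytes]:
    """Round-robin striping: unit j goes to stripe j mod k (one stripe per tree)."""
    return [message[i::k] for i in range(k)]


def merge_stripes(stripes: List[bytes], length: int) -> bytes:
    """Inverse of make_stripes; requires every stripe to have arrived."""
    out = bytearray(length)
    k = len(stripes)
    for i, s in enumerate(stripes):
        out[i::k] = s  # put stripe i back into positions i, i + k, i + 2k, ...
    return bytes(out)
```

Without IDA, every stripe is needed to rebuild the message, so losing one tree loses part of every message. Combining striping with IDA removes this requirement.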

Figure 4: An example in which a message is divided into 2 stripes

3.3.1 Combine with IDA

Combining multiple stripes with IDA means that each message is divided into several stripes by IDA, and each stripe is transmitted through a different tree.

The purpose of multiple stripes/trees is to avoid losing a whole message when one link breaks. The purpose of IDA is to guarantee that peers can restore the original message even if some stripes are lost.

When we use (n, m) IDA, every peer that joins the ALM service has n different transmission paths, each originating at the source peer and ending at the peer itself (in practice there are usually fewer than n, as we explain in the next chapter). Each transmission path carries a different stripe, and a peer only needs to receive any m of the n stripes to restore the original message. In other words, the transmission process can tolerate the loss of at most n − m stripes.

Therefore, we must ensure that the departure of any peer from the ALM service never prevents the remaining peers from restoring the original message. This situation arises only when the leaving peer lies on more than n − m of the n transmission paths of some peer. When such a peer leaves, more than n − m transmission paths of that peer break, so it receives fewer than m stripes and cannot restore the original message. This situation can be prevented by building disjoint transmission paths.
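The failure condition above can be checked with the Python sketch below. The function name and the representation of a path as its list of intermediate peers are hypothetical. The check is written as "fewer than m surviving paths" so that it also covers peers with fewer than n paths; with exactly n paths, it is equivalent to the leaving peer lying on more than n − m of them.

```python
from typing import Dict, List, Set


def broken_subscribers(paths: Dict[str, List[List[str]]], leaving: str, m: int) -> Set[str]:
    """Subscribers that can no longer restore the message after `leaving` departs.

    paths[s] lists, for each stripe path of subscriber s, the intermediate peers on it.
    """
    broken = set()
    for sub, sub_paths in paths.items():
        if sub == leaving:
            continue
        hit = sum(1 for p in sub_paths if leaving in p)
        # the subscriber needs at least m intact paths to collect m stripes
        if len(sub_paths) - hit < m:
            broken.add(sub)
    return broken
```

For (n, m) = (3, 2), a subscriber whose paths are via `a`, via `a` and `d`, and via `e` loses two paths when `a` leaves and can no longer restore. A subscriber whose three paths pass through distinct peers survives any single departure.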

3.3.2 Building Disjoint Paths

Based on the description above, we must prevent any peer from lying on more than n − m transmission paths of any other peer. One way to build such transmission paths is to restrict every peer to being an interior node in a constant number of trees, where this number must not exceed n − m. There are two choices for this number. The first is to let every peer be an interior node in more than one tree; the second is to restrict every peer to being an interior node in only one tree. The advantage of the first choice is that peers with high upstream bandwidth can spread their bandwidth evenly over multiple trees instead of contributing all of it to a single tree. This suits Trickle, because with IDA all stripes are equally important. However, contributing bandwidth to multiple trees has a critical disadvantage: the degree of each node in each tree becomes smaller, so the trees in the ALM service become much taller and the end-to-end delay increases.

Therefore, we adopt the second choice: every peer becomes an interior node in only one tree. This ensures that any peer lies on at most one of the n transmission paths of other subscribers. We then need a method to decide in which tree each peer becomes an interior node. SplitStream [2] provides a method that establishes interior-node-disjoint trees. The groupId of each tree root has a different prefix, and, because of the tree building method, the groupId and the nodeIds of the peers determine which peers become interior nodes. As a result, every peer acts as an interior node in at most one tree and as a leaf node in the remaining trees. This method only fits tree building algorithms based on DHTs.
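Whether a set of trees satisfies the interior-node-disjoint property can be verified mechanically, as in the Python sketch below. Each tree is represented as a hypothetical mapping from a peer to its children. The source peer is excluded from the check because it is the root of every tree.

```python
from typing import Dict, List


def interior_counts(trees: List[Dict[str, List[str]]]) -> Dict[str, int]:
    """Number of trees in which each peer is an interior node (has children)."""
    counts: Dict[str, int] = {}
    for children in trees:
        for peer, kids in children.items():
            if kids:
                counts[peer] = counts.get(peer, 0) + 1
    return counts


def is_interior_disjoint(trees: List[Dict[str, List[str]]], source: str) -> bool:
    """True if every peer other than the source is interior in at most one tree."""
    return all(c <= 1 for p, c in interior_counts(trees).items() if p != source)
```

In two trees where `A` forwards only in the first and `C` only in the second, the check passes. If `A` also forwards in another tree, it fails, because `A` would then lie on two paths of the same subscriber.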

In our hypothesis, we want Trickle to be applicable to any tree building algorithm. Therefore, we use a different method to decide in which tree each peer becomes an interior node.

3.3.3 Become Interior Node

Each peer decides by itself in which tree it will become an interior node. When a new peer joins a service, there are two decision methods. First, if the tree building algorithm is based on a DHT, the peer becomes an interior node in the tree whose root has the same hash ID prefix as its own, so that the DHT concept is not violated. This is approximately the same as SplitStream; the only difference is that the peer decides by itself, at the beginning of joining the service, in which tree it wants to be an interior node. Second, the peer randomly decides in which tree it will be an interior node and becomes a leaf node in all other trees.
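Both decision methods can be sketched in Python. One rule here is an assumption not stated in the text. When no root shares the peer's first digit, for example because there are fewer trees than hash digits, the DHT method picks the root with the longest shared prefix, with ties going to the lowest tree index. The function names are hypothetical.

```python
import random
from typing import List


def shared_prefix(a: str, b: str) -> int:
    """Length of the common prefix of two hash IDs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def choose_interior_tree(peer_id: str, root_ids: List[str], dht_based: bool,
                         rng: random.Random) -> int:
    """Index of the single tree in which a joining peer will act as an interior node."""
    if dht_based:
        # method 1: the tree whose root ID shares the longest prefix with the peer's hash ID
        return max(range(len(root_ids)), key=lambda i: shared_prefix(peer_id, root_ids[i]))
    # method 2: a uniformly random tree; the peer is a leaf in all other trees
    return rng.randrange(len(root_ids))
```

Because either decision is made once at join time and never changes, each peer is interior in exactly one tree regardless of the tree building algorithm used afterwards.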
