Multicast has been studied as an effective way to distribute data to a potentially large group of peers, and it is useful for scaling multi-party applications.
However, network layer multicast [1] has not been widely adopted by most commercial Internet service providers (ISPs). Although multicast has been researched for more than a decade, large parts of the Internet still lack native multicast support. Some researchers have therefore shifted their focus to implementing multicast services in the application layer [2][3][4][5]. This kind of multicast is called application layer multicast (ALM). It does not change the network infrastructure; instead, it implements multicast forwarding functionality exclusively at end hosts. An overlay network, which in this context can also be called an application layer peer-to-peer network, is a computer network built on top of the application layer and is regarded as a platform for realizing multicast service.
Figure 1.1 shows the difference between network layer multicast and overlay network multicast. The blue nodes are peers and the orange nodes are routers. In network layer multicast, routers duplicate and forward packets to peers; in overlay networks, the peers themselves duplicate and forward packets to other peers.
Figure 1.1: Network Layer Multicast vs. Overlay Network Multicast
Overlay network architectures can be classified into three kinds: centralized, decentralized and unstructured, and decentralized and structured.
1) Centralized architecture: As Figure 1.2 shows, routing discovery between two arbitrary nodes is supported by a centralized server. The centralized server maintains the routing information of all nodes in the network. A node sends a routing discovery request to the centralized server to obtain the destination’s position. The weakness of the centralized architecture is scalability: because the centralized server maintains all routing discovery information, this heavy load limits the size of the network. The centralized server is also a single point of failure. If it shuts down or is attacked, nodes cannot communicate with each other; if it is compromised, a man-in-the-middle attack on the nodes’ routing discovery requests becomes possible. This kind of architecture therefore suffers from both efficiency and security problems.
Figure 1.2: Centralized Architecture
2) Decentralized and unstructured architecture: Because of the problems of the centralized architecture, some research has shifted to decentralized systems. In such a system there is no centralized server, and each node needs to record its neighbors’ information. When a node wants to communicate with another node, it uses flooding or random walks to spread the routing discovery request. Any node that receives the request message helps forward it to the destination. Figure 1.3 shows a source node randomly selecting one node to send the request to; the selected node then helps forward the message to the destination. Because nodes are chosen randomly, the resulting path may not be optimal. Without a centralized server, this kind of architecture has no single point of failure. However, because routing discovery requests rely on flooding or random walks, the system cannot guarantee that a request reaches its destination within a limited TTL (Time to Live) value, while an unlimited TTL value may consume excessive network resources. This kind of architecture therefore also cannot support a large number of nodes. KaZaA, BitTorrent and Overnet/eDonkey2000 are overlay networks of this kind.
Figure 1.3: Decentralized and Unstructured Architecture
3) Decentralized and structured architecture: To address the problems of the two architectures above, current research focuses on structured architectures, in which routing discovery requests are forwarded according to routing rules rather than by flooding or random walks. These systems have no centralized server to handle routing discovery requests; instead, nodes form a hierarchy to help forward request messages. Each node also records its neighbors’ information.
When a node sends a routing discovery request, it follows the routing rule that the system maintains. The routing rule is based on the hierarchy that the nodes form. There are many kinds of hierarchies, such as ring-based hierarchies like Chord [7], tree-based hierarchies like Pastry [9] and Tapestry [10], and others like CAN [8]. Based on its specific hierarchy, each system can guarantee that a routing discovery request reaches the destination node within a given maximum number of routing hops. Many studies have also shown that decentralized and structured overlay networks provide better data routing performance than the two architectures above [11].
Figure 1.4: Decentralized and Structured Architecture
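The contrast between the unstructured and structured architectures can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the routing algorithm of any cited system: the function names `flood` and `ring_route`, the 16-node example graph, and the assumption that every identity on the ring is occupied are all hypothetical. `ring_route` only mimics the spirit of ring-based routing with power-of-two jumps; it is not Chord itself.

```python
from collections import deque


def flood(neighbors, source, dest, ttl):
    """Unstructured lookup: flood a request, decrementing the TTL per hop.

    Returns (found, messages_sent).
    """
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    found = False
    while frontier:
        node, remaining = frontier.popleft()
        if node == dest:
            found = True
            continue  # the destination does not forward further
        if remaining == 0:
            continue  # TTL exhausted: the request dies here
        for peer in neighbors[node]:
            messages += 1
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, remaining - 1))
    return found, messages


def ring_route(source, dest, bits):
    """Structured lookup on a fully populated ring of 2**bits identities.

    Each hop jumps the largest power of two that does not overshoot the
    destination, so a lookup needs at most `bits` hops (assumed model).
    Returns the list of visited identities.
    """
    size = 1 << bits
    path = [source]
    current = source
    while current != dest:
        distance = (dest - current) % size
        jump = 1 << (distance.bit_length() - 1)
        current = (current + jump) % size
        path.append(current)
    return path


# Hypothetical 16-node graph in which each node knows two neighbors.
ring = {i: [(i - 1) % 16, (i + 1) % 16] for i in range(16)}
print(flood(ring, 0, 8, ttl=4))   # found is False: the TTL is too small
print(flood(ring, 0, 8, ttl=8))   # found is True, at the cost of more messages
print(ring_route(3, 2, bits=4))   # [3, 11, 15, 1, 2]: 4 hops for 16 identities
```

In this toy setting the flooded request either fails under a small TTL or succeeds only by sending more messages, whereas the rule-based lookup always terminates within a fixed number of hops.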
Based on their routing mechanisms, these systems also propose their own multicast architectures: CAN multicast [12], Internet Indirection Infrastructure [13], Scribe [14], and Bayeux [15], built on CAN, Chord, Pastry, and Tapestry, respectively.
Figure 1.5: (a) Ring-based Multicast and (b) Tree-based Multicast
Because popular files in overlay networks are shared by many peers and queried by many others, intermediate nodes on routing paths may become the bottleneck of data transmission.
In this paper we propose a network coding based multicast to relieve the load on intermediate nodes and to improve the system throughput. Network coding is a method proposed to improve the efficiency of a given network topology.
In conventional networks, each node either relays or replicates data from input links to output links. In networks with network coding, however, each node can decode data from its input links and encode data onto its output links. With network coding, the data provider needs to know the topology of the nodes on the multicast paths. We therefore add topology search information to the data request and acknowledgement packets. We also propose a group key distribution scheme in which the group key is piggybacked on the topology search packets, so that no additional packet is needed to distribute it.
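The difference between relaying and coding can be seen in the standard textbook "butterfly" example, sketched below in Python. This is only an illustration of the general idea, assuming XOR as the coding operation and two equal-length packets; it is not necessarily the coding scheme proposed in this paper, and the names `xor_bytes`, `packet_a`, and `packet_b` are hypothetical.

```python
def xor_bytes(x, y):
    """Combine two equal-length packets with a bytewise XOR."""
    if len(x) != len(y):
        raise ValueError("packets must have the same length")
    return bytes(p ^ q for p, q in zip(x, y))


# The source wants to deliver both packets to two receivers.
packet_a = b"PKT-A"
packet_b = b"PKT-B"

# A relaying intermediate node could forward only one of the two packets
# per time slot on its single outgoing link. A coding node sends their XOR.
coded = xor_bytes(packet_a, packet_b)

# Receiver 1 got packet_a on a side link and the coded packet from the
# intermediate node; receiver 2 got packet_b and the coded packet.
receiver1_b = xor_bytes(packet_a, coded)
receiver2_a = xor_bytes(packet_b, coded)

assert receiver1_b == packet_b
assert receiver2_a == packet_a
```

One transmission on the shared link serves both receivers, which is the kind of load reduction at intermediate nodes that motivates using network coding for multicast.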
1.1 Data Request in Overlay Networks
First we introduce how data is requested in decentralized and structured overlay networks. The system assumes that entries are roughly evenly distributed in both the node and data namespaces. This means that each peer can request data using the identity of the data, which is sometimes the name of the data. Next we show the basic procedure of a data request in overlay networks.
Any data provider holding data D computes H using a hash function (H = hash(D)). The data provider then informs the node whose identity is equal or closest to H. This node is called the session node. The session node records the association between the data provider and data D. Any peer that wants data D first computes H with the hash function and then generates a routing discovery request. It sends the routing discovery request to the session node, which forwards the request message to the data provider. After the data provider receives the request message, it knows who wants the data and can send D directly to that peer.
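The procedure above can be sketched in a few lines of Python. This is a simplified model under several assumptions that the text does not fix: SHA-1 as the hash function, a 16-bit circular namespace (`ID_BITS`), hashing the name of the data rather than its content, and "closest identity" as the rule for choosing the session node. The class `Node` and the functions `hash_id`, `session_node_for`, `publish`, and `request` are hypothetical names, and hop-by-hop routing is omitted.

```python
import hashlib

ID_BITS = 16  # assumed namespace size of 2**16 identities


def hash_id(name):
    """Map a data name to an identity H in the node namespace."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.providers = {}  # H -> provider, used when acting as session node
        self.store = {}      # H -> data, used when acting as data provider
        self.inbox = {}      # H -> data received as a requester


def session_node_for(h, nodes):
    """Pick the node whose identity is closest to H on the circular namespace."""
    size = 1 << ID_BITS
    return min(nodes, key=lambda n: min((n.node_id - h) % size,
                                        (h - n.node_id) % size))


def publish(provider, name, data, nodes):
    """The provider computes H and informs the session node."""
    h = hash_id(name)
    provider.store[h] = data
    session_node_for(h, nodes).providers[h] = provider
    return h


def request(requester, name, nodes):
    """The requester computes H; the session node forwards to the provider."""
    h = hash_id(name)
    provider = session_node_for(h, nodes).providers.get(h)
    if provider is None:
        return None  # nobody has announced this data
    data = provider.store[h]   # the provider sends D directly to the peer
    requester.inbox[h] = data
    return data


nodes = [Node(i * 4096) for i in range(16)]
publish(nodes[3], "song.mp3", b"payload", nodes)
print(request(nodes[11], "song.mp3", nodes))  # b'payload'
```

The requester never needs to know the provider in advance: both sides derive the same H independently, so they meet at the same session node.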
The routing of the request is based on each peer’s identity. If the namespace size of the system and the base of the identity are denoted N and b, respectively, a request can be routed in at most log_b(N) hops. The basic ideas of request routing in the various decentralized and structured overlay network architectures are very similar. Data location can be easily implemented on overlay network architectures by associating a hash value H with each data item D and storing the (H, D) pair at the node to which the hash value H maps. Here we give only an overview; the details can be found in the description of each system. Based on its data request routing, each architecture also has its own multicast protocol. Next we describe the basic idea of how multicast can be realized in overlay networks.
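The log_b(N) bound can be made concrete with a small sketch. It assumes an idealized model that is not taken from any specific cited system: identities are strings of `length` digits in base b (so N = b**length), every identity is occupied, and each hop corrects the most significant digit that still differs from the destination. The function names `to_digits`, `from_digits`, and `prefix_route` are hypothetical.

```python
def to_digits(value, base, length):
    """Write an identity as `length` digits in the given base (MSB first)."""
    digits = []
    for _ in range(length):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits[::-1]


def from_digits(digits, base):
    """Inverse of to_digits."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


def prefix_route(source, dest, base, length):
    """Route by fixing one differing digit per hop, most significant first.

    With N = base**length identities, at most length = log_base(N) hops
    are needed. Returns the list of visited identities.
    """
    current = to_digits(source, base, length)
    target = to_digits(dest, base, length)
    path = [source]
    for i in range(length):
        if current[i] != target[i]:
            current[i] = target[i]
            path.append(from_digits(current, base))
    return path


# b = 4 and N = 64 give a bound of log_4(64) = 3 hops.
print(prefix_route(0, 63, base=4, length=3))  # [0, 48, 60, 63]
```

In this example the request travels from identity 000 to 333 (base 4) through 300 and 330, which uses exactly the three hops allowed by the bound.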
1.2 Multicast in Overlay Networks
First, each peer who wants to query the data D computes H = hash(D).
Second, the peer sends the request to the session node. The session node may receive many requests. Peers that have sent requests form a group for this session. The session node then forwards all the requests to the data provider, so the data provider knows all the peers requesting the data. It chooses some peers from the group as intermediate nodes and sends data D to them. The intermediate nodes help duplicate and forward the data to the other peers in the group.
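The group formation and forwarding steps can be sketched as follows. The text does not specify how the data provider selects intermediate nodes or assigns the remaining peers to them, so this sketch assumes the simplest policy: the first `num_intermediate` peers of the group become intermediate nodes, and the others are assigned to them in round-robin order. The function name `build_multicast_plan` and the peer names are hypothetical.

```python
def build_multicast_plan(group, num_intermediate):
    """Return {intermediate_node: [peers it forwards data D to]}.

    Assumed policy: the first `num_intermediate` requesters act as
    intermediate nodes; the rest are spread over them round-robin.
    """
    if not 1 <= num_intermediate <= len(group):
        raise ValueError("need between 1 and len(group) intermediate nodes")
    intermediates = group[:num_intermediate]
    others = group[num_intermediate:]
    plan = {node: [] for node in intermediates}
    for index, peer in enumerate(others):
        plan[intermediates[index % num_intermediate]].append(peer)
    return plan


# Ten peers requested data D during this session.
group = [f"p{i}" for i in range(10)]
plan = build_multicast_plan(group, num_intermediate=2)

# The provider sends D only to the intermediate nodes ...
provider_copies = len(plan)
# ... and each intermediate node duplicates D for its assigned peers.
forwarding_load = {node: len(peers) for node, peers in plan.items()}

print(provider_copies)   # 2
print(forwarding_load)   # {'p0': 4, 'p1': 4}
```

The provider sends only two copies, while each intermediate node must send four, which shows how the forwarding work shifts from the provider to the intermediate nodes.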
In this paper we observe a phenomenon of the multicast service in decentralized and structured overlay networks: the intermediate nodes may become the bottleneck of the system during multicast. We study the use of an information-theoretic technique called network coding to relieve the heavy load on the intermediate nodes. Network coding is a coding method proposed to improve the data transmission efficiency of a given network topology. We describe network coding in more detail in section two. By solving the bottleneck problem, we show that the overall system throughput is also enhanced.
The remainder of this paper is organized as follows. Section two reviews related work; section three presents our observations; section four describes our proposed scheme; section five evaluates its performance; and section six concludes the paper.