Chapter 1 Introduction
1.4 Summary
The remainder of this thesis is organized as follows. Chapter 2 reviews existing P2P streaming research related to our work. Chapter 3 presents the idea, design and implementation of our system in detail. Chapter 4 presents the experimental setup, results, system performance and analysis. Finally, Chapter 5 gives our conclusions.
Chapter 2
Related Work
There have been comprehensive studies on P2P systems. In this chapter, we first discuss the overlay topologies used in P2P streaming systems and their data delivery mechanisms, and then briefly describe recent developments in P2P live streaming systems, P2P VoD streaming systems, and P2P streaming systems with time-shift support.
2.1 P2P streaming overlay
P2P streaming overlays can be broadly divided into two classes, tree-based and mesh-based, which are briefly described below.
2.1.1 Tree-based
In a tree-based overlay, nodes are connected to form a tree-shaped graph with the source node as the root and peer nodes as interior or leaf nodes, establishing parent-child relations. Parent nodes are responsible for sending the streaming data to their children. The single tree is the simplest structure of this type. Its advantage is that the transmission delay is usually short, because the streaming data are transmitted along fixed paths. However, it has obvious drawbacks. First, when an interior node fails, all of its descendants are disconnected from the source and cannot receive streaming data until the tree is rebuilt, which causes extra overhead. Second, most nodes in the system are leaf nodes; since they have no children, they cannot contribute their uplink capacity to the system. To solve these problems, multiple-tree overlays have been proposed. By transmitting a different part of the streaming content over each of several independent multicast trees, the system distributes the forwarding load across all nodes and aims to reduce the disruption that peer churn causes to the streaming data.
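The two single-tree drawbacks can be made concrete with a small sketch. It assumes a hypothetical representation of the overlay as a map from each peer to its parent (the source `S` has no entry) and computes which peers lose their feed when one interior node fails, and which peers are leaves that contribute no uplink. It models topology only, not any real protocol.

```python
from collections import defaultdict


def build_children(parent_of):
    """Invert a child -> parent map into parent -> [children]."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def disconnected_by_failure(parent_of, failed):
    """Return every descendant of `failed`: peers cut off from the source."""
    children = build_children(parent_of)
    lost, stack = set(), list(children[failed])
    while stack:  # depth-first walk of the failed node's subtree
        node = stack.pop()
        if node not in lost:
            lost.add(node)
            stack.extend(children[node])
    return lost


def leaf_nodes(parent_of):
    """Peers with no children: they receive data but never forward it."""
    children = build_children(parent_of)
    return {peer for peer in parent_of if not children[peer]}


# Hypothetical tree: S -> A, B; A -> C, D; B -> E; C -> F
tree = {"A": "S", "B": "S", "C": "A", "D": "A", "E": "B", "F": "C"}
print(sorted(disconnected_by_failure(tree, "A")))  # ['C', 'D', 'F']
print(sorted(leaf_nodes(tree)))                    # ['D', 'E', 'F']
```

In this small example, the failure of a single interior peer cuts off half of the peers, and half of the peers never use their uplink; a multiple-tree overlay tries to give such leaves an interior position in some other tree.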
2.1.2 Mesh-based
In a mesh-based overlay, each node connects to a subset of the other nodes in the system, forming a mesh distribution graph. Since there is no parent-child relationship between connected nodes, a common strategy is for connected nodes to exchange availability information about the streaming data periodically and then to request missing data from the nodes that own it. Mesh-based systems may have a longer setup delay and need extra control messages, such as data availability information and pull requests for missing streaming data. However, their self-organizing nature makes them robust to node failures and peer churn.
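As an illustration of the availability exchange in a mesh overlay, the sketch below assumes that each neighbor's advertised availability is a set of block ids (a hypothetical simplification of a buffer map) and lists, for every missing block in the current window, the neighbors that could serve it. It deliberately does not decide which neighbor to ask; that scheduling step is where mesh systems differ.

```python
def candidate_providers(own, window, neighbor_maps):
    """Map each block in `window` that we lack to the neighbors advertising it.

    own:           set of block ids already in the local buffer
    window:        block ids the node currently wants (e.g. near playback)
    neighbor_maps: neighbor name -> set of block ids it advertised
    """
    return {
        block: sorted(n for n, have in neighbor_maps.items() if block in have)
        for block in window
        if block not in own
    }


own = {10, 11, 13}
neighbors = {"p1": {10, 11, 12, 14}, "p2": {12, 13, 15}}
print(candidate_providers(own, range(10, 17), neighbors))
# {12: ['p1', 'p2'], 14: ['p1'], 15: ['p2'], 16: []}
```

An empty list (block 16 here) means no neighbor has advertised the block yet, so the node can only wait for the next availability exchange.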
2.2 P2P streaming data delivery mechanisms
Three different data delivery mechanisms have been used in P2P streaming systems: push mechanism, pull mechanism, and hybrid push-pull mechanism.
2.2.1 Push mechanism
With the push mechanism, a node that receives data pushes it to other nodes in the network without explicit requests from them. Because no data requests are needed, this mechanism reduces control message overhead and shortens the setup delay, but recovering from lost data or broken connections is costly. For example, if the connection between two nodes is broken, the streaming data cannot be transmitted across it until the topology is rebuilt.
2.2.2 Pull mechanism
With the pull mechanism, a node pulls the data it needs by sending requests to other nodes. The ability to pull makes the system robust to lost data and broken connections, but the overhead of requesting every single data block also results in a longer setup delay, and pull operations must be scheduled carefully to avoid redundant data transmission. For example, if a request is sent to an overloaded node and the requested data do not arrive in time, the requester may re-request the data from another node and consequently receive duplicate data blocks.
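One way to schedule pulls carefully, shown here as a hypothetical sketch rather than a mechanism from any cited system, is to track outstanding requests, re-request a block from another node only after a timeout, and drop any late copy that still arrives. The class name `PullTracker` and the `timeout_s` parameter are assumptions for illustration.

```python
class PullTracker:
    """Hypothetical bookkeeping for pull requests (not from a cited system)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.pending = {}      # block id -> (provider, time the request was sent)
        self.received = set()  # block ids already delivered

    def should_request(self, block, now):
        """Ask for a block only if it is missing and no live request is pending."""
        if block in self.received:
            return False
        sent = self.pending.get(block)
        return sent is None or now - sent[1] >= self.timeout_s

    def record_request(self, block, provider, now):
        self.pending[block] = (provider, now)

    def on_block(self, block):
        """Return True for a first copy, False for a duplicate to discard."""
        if block in self.received:
            return False
        self.received.add(block)
        self.pending.pop(block, None)
        return True


t = PullTracker(timeout_s=2.0)
t.record_request(7, "slow-peer", now=0.0)
print(t.should_request(7, now=1.0))  # False: request still in flight
print(t.should_request(7, now=2.5))  # True: timed out, try another node
t.record_request(7, "fast-peer", now=2.5)
print(t.on_block(7), t.on_block(7))  # True False: the late copy is a duplicate
```

The timeout is the main design knob: a short timeout reacts quickly to an overloaded provider but issues more redundant requests, while a long one saves bandwidth at the risk of missing playback deadlines.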
2.2.3 Hybrid push-pull mechanism
The hybrid push-pull mechanism combines the advantages of the push and pull mechanisms. It is used in GridMedia [4] and the new version of CoolStreaming [5]. In GridMedia, a node first pulls the data it needs; once it detects that pulling proceeds smoothly, it asks the sending peer to push data to it. In the new version of CoolStreaming, when a node pulls data from
...
streaming media content, so the issue here is to form the overlay structure and adopt a content delivery mechanism. CoopNet [6] adopts a centralized model: the source node is responsible for collecting information from joining nodes and maintaining a multi-tree structure. Using multiple description coding (MDC), each tree transmits a different MDC description. However, CoopNet is not a pure P2P system but a complement to a client-server framework; the multi-tree overlay is invoked only when the server is unable to handle the load imposed by clients.
In SplitStream [7], the streaming content is split into multiple stripes, and an independent multicast tree is constructed to deliver each stripe. By constructing a forest of multicast trees such that an interior node in one tree is a leaf node in all the remaining trees, the forwarding load can be spread evenly across all participating nodes; however, such node-disjointness is hard to achieve, especially in heterogeneous environments [8]. In GridMedia, a rendezvous point assists the bootstrap of the overlay. A newly joined node first contacts the rendezvous point to obtain a list of nodes that have already joined the overlay. It then measures the end-to-end delay to each node in the list and selects a number of them as partners, each with a probability inversely proportional to its end-to-end delay, so that nearby nodes are more likely to be selected. In DONet/CoolStreaming [3], a newly joined node first contacts an origin node, which randomly selects a deputy and redirects the new node to it. The new node obtains a list of partner candidates from the deputy and establishes partnerships with these candidates. The video stream is divided into segments of uniform length, and the availability of segments in a node's buffer is represented as a bitmap called the Buffer Map (BM).
Each node continuously exchanges its BM with its partners and then schedules its pull operations accordingly. The scheduling algorithm takes both block availability and the partners' upload capacity into account: the block with the fewest potential suppliers is pulled first, and when several partners can supply it, it is pulled from the one with the highest sufficient available bandwidth.
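The rarest-first rule can be sketched as follows. This is a simplified illustration, not the original DONet algorithm: each partner's "available and sufficient bandwidth" is modeled as a hypothetical per-round budget of blocks it can still send, and the deadline check of the real scheduler is omitted.

```python
def schedule(missing, buffer_maps, bandwidth):
    """Assign each missing block to a partner, rarest block first.

    missing:     block ids the node still needs
    buffer_maps: partner -> set of block ids advertised in its Buffer Map
    bandwidth:   partner -> blocks it can still send this round (hypothetical
                 stand-in for DONet's available-bandwidth estimate)
    """
    remaining = dict(bandwidth)
    suppliers = {b: [p for p, bm in buffer_maps.items() if b in bm] for b in missing}
    plan = {}
    # Fewest potential suppliers first; ties broken by block id.
    for block in sorted(suppliers, key=lambda b: (len(suppliers[b]), b)):
        usable = [p for p in suppliers[block] if remaining.get(p, 0) > 0]
        if not usable:
            continue  # no supplier has spare capacity this round
        best = max(usable, key=lambda p: (remaining[p], p))  # most spare capacity
        plan[block] = best
        remaining[best] -= 1
    return plan


plan = schedule([1, 2, 3, 4],
                {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3}},
                {"A": 2, "B": 1, "C": 1})
print(plan)  # {1: 'A', 4: 'B', 2: 'A', 3: 'C'}
```

Handling the rare blocks 1 and 4 first ensures that their only suppliers are not used up by blocks that several partners could have served.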
2.3.2 P2P VoD streaming
Video-on-Demand (VoD) service lets users watch whatever they want, whenever they want. Here, the issues are what each peer should cache to support the system, and how to find such cached content in the system. In P2Cast [9], peers that request the same video clip within a time threshold form a session organized as a single tree; each peer caches the beginning part of the video, and a newly joined peer can be patched with the cached beginning part and its parent's buffer contents. In P2Vod [10], peers are grouped into generations, and peers within a generation have synchronized buffer starting points.
A newly joined peer tries to join an existing generation or forms a new generation appended after the existing ones. Generations are numbered from G1, the oldest, to Gn, the youngest. The nodes in these generations, excluding the server, form a video session. If no client in a session still holds the first block of the video, the session is closed, and a new video session must be created for newly joined clients.
Both P2Cast and P2Vod support only start-from-beginning VoD viewing. oStream [11] lets peers watch from arbitrary positions, but because new peers are inserted into the existing delivery structure, the child nodes of the new peers may experience noticeable video disruption.
BASS [12] applies the BitTorrent protocol to download video content, with the VoD server supplying emergency content, i.e., content whose playback deadline is near but which has not yet arrived. Simulation results show that this mechanism reduces server bandwidth by 34% when users' average outgoing bandwidth is about the same as
...
download priority so that urgent data can be downloaded first. PONDER also gives up tit-for-tat incentives; peers are served based only on their needs, without considering their contributions. This maximizes the amount of data that can be downloaded before the playback time. PONDER saves 70% of server bandwidth when users' average outgoing bandwidth is about 80% of the video bit-rate, and up to 93% when it is 112% of the video bit-rate.
2.3.3 P2P live streaming with time-shift streaming support
To the best of our knowledge, P2TSS [14], LiveShift [15] and an IPTV variant [16] are among the few studies that provide both live streaming and time-shift streaming.
P2TSS presents two distributed caching algorithms, Initial Play-out Position Caching (IPP) and Live Stream Position Caching (LSP), which let peers decide which video blocks to cache locally and share with other peers. Simulation results indicate that P2TSS achieves low server stress by utilizing peer resources. However, in IPP the availability of video blocks is not uniform, while in LSP, although block availability is uniform, each peer needs extra bandwidth and more connections to fill its distributed streaming cache.
LiveShift is a software prototype of a live streaming system based on a multiple-tree overlay. As a peer watches the video, each time the received data reach a predefined size, the data are stored as a segment and the peer adds a reference to that segment in a DHT. Although the authors present a demonstration scenario, they do not report detailed analytical results for the system.
The IPTV system in [16] is an integrated media delivery architecture that provides four basic video delivery functions: linear TV, video on demand (VoD), time-shifted TV (tsTV) and network personal video recorder (nPVR). It uses native IP multicast for linear TV, and distributed caching and P2P mechanisms for the VoD, tsTV and nPVR services.
2.4 Live streaming framework based on DONet/Coolstreaming
system, we did not create a new one; instead, we adopted the new implementation of DONet/Coolstreaming as the live streaming framework for delivering live content. In the following, we introduce the characteristics of the new DONet/Coolstreaming.
1. Node hierarchy
Each node in the system maintains three levels of nodes: members, partners and parents. Members give a partial view of the currently active nodes in the system; no connection is established between the node and its known members. Partners are nodes with which connections are established to exchange block availability information. Parent-child relations are formed when connections are established for actual block transmission. Hence, a node's parents and children are a subset of its partners.
2. Multiple Sub-Streams
The video stream is encoded and packed into consecutive blocks and can be decomposed into S sub-streams by grouping the blocks whose timestamps have the same remainder modulo S. Because each sub-stream can be retrieved independently from a different parent node, a node can retrieve data from up to S parents at the same time. Figure 2-1 shows a video stream divided into four sub-streams (S = 4).
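The modulo rule is simple enough to state in a few lines. The sketch assumes blocks are identified by consecutive integer timestamps (sequence numbers), which is an assumption about the block numbering, and shows that merging the sub-streams restores the original order.

```python
def substream_of(timestamp, s):
    """Sub-stream index of a block: its timestamp modulo S."""
    return timestamp % s


def split_substreams(timestamps, s):
    """Group block timestamps into S sub-streams by timestamp modulo S."""
    subs = [[] for _ in range(s)]
    for ts in timestamps:
        subs[substream_of(ts, s)].append(ts)
    return subs


def merge_substreams(subs):
    """Reassemble the original block order from blocks of all sub-streams."""
    return sorted(ts for sub in subs for ts in sub)


subs = split_substreams(range(12), 4)
print(subs)  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
print(merge_substreams(subs) == list(range(12)))  # True
```

Since consecutive blocks of one sub-stream are S timestamps apart, each parent only needs to supply roughly 1/S of the stream bit-rate.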
Figure 2-1 Dividing a video stream into sub-streams
3. Joining procedure
A newly joining node first contacts the bootstrap server and retrieves a list of available channels. After selecting a channel, the node retrieves a partial list of the currently active nodes in the channel and puts them in its membership cache. The node then randomly selects some of these nodes and establishes connections with them so that they (1) exchange their membership cache knowledge and (2) periodically exchange block availability information. The exchanged information helps the node decide where in the stream it should start requesting data. Finally, the node randomly selects some partners as parents and establishes parent-child relationships with them, over which the actual data transmission takes place. A single parent can be subscribed to for multiple sub-streams.
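The joining steps and the resulting node hierarchy can be summarized in a short sketch. The names `NodeState` and `join`, the fixed partner count, and the choice of one random parent per sub-stream are hypothetical; channel selection and the use of exchanged buffer maps to pick a starting position are omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class NodeState:
    members: set = field(default_factory=set)    # known nodes, no connection
    partners: set = field(default_factory=set)   # connected, exchange buffer maps
    parents: dict = field(default_factory=dict)  # sub-stream id -> parent node


def join(active_nodes, num_partners, num_substreams, rng):
    """Hypothetical join: members -> random partners -> a parent per sub-stream."""
    # The partial node list from the bootstrap server fills the membership cache.
    state = NodeState(members=set(active_nodes))
    candidates = sorted(state.members)
    # Connect to randomly chosen members, which become partners.
    state.partners = set(rng.sample(candidates, min(num_partners, len(candidates))))
    # Subscribe each sub-stream at a randomly chosen partner; the same
    # partner may be picked for several sub-streams.
    ordered_partners = sorted(state.partners)
    for sub in range(num_substreams):
        state.parents[sub] = rng.choice(ordered_partners)
    return state


node = join([f"n{i}" for i in range(8)], num_partners=4, num_substreams=4,
            rng=random.Random(1))
print(set(node.parents.values()) <= node.partners <= node.members)  # True
```

The final check mirrors the hierarchy described in item 1: every parent is a partner, and every partner came from the membership cache.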
4. Hybrid push-pull mechanism
To form a parent-child relationship, a node subscribes to a sub-stream at another node. When a node receives a subscription message with a designated starting timestamp, it becomes the parent of the subscriber and stores the subscriber's information, including its IP address, communication port number and data port number, in a sub-stream subscriber list. The parent then starts sending the subscriber all blocks of the subscribed sub-stream, beginning at the given timestamp. The parent can be either the source or another peer. The source pushes a block to its subscribers whenever it finishes packing a new block, while a peer parent pushes a block to its subscribers whenever it receives a new block. The subscription ends when the subscriber sends an unsubscribe message, or when the parent is unable to push blocks to the subscriber because of underlying network problems.
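A parent's side of the subscription contract can be sketched as below. The class and method names are hypothetical, a subscriber is identified by a single address string instead of the IP/port triple, and sending is simulated by appending to an `outbox` list instead of writing to a TCP connection. The same `on_new_block` method covers both the source (called after packing a block) and an ordinary peer (called on receiving one).

```python
from collections import defaultdict


class Parent:
    """Hypothetical sketch of a parent's per-sub-stream subscriber list."""

    def __init__(self, num_substreams):
        self.s = num_substreams
        self.buffer = {}                       # timestamp -> block payload
        self.subscribers = defaultdict(dict)   # sub-stream -> {addr: start ts}
        self.outbox = []                       # (addr, ts) pairs "sent"

    def subscribe(self, addr, substream, start_ts):
        self.subscribers[substream][addr] = start_ts
        # Push already-buffered blocks of this sub-stream from start_ts onward.
        for ts in sorted(self.buffer):
            if ts >= start_ts and ts % self.s == substream:
                self._push(addr, ts)

    def unsubscribe(self, addr, substream):
        self.subscribers[substream].pop(addr, None)

    def on_new_block(self, ts, payload):
        """Store a new block and push it to every subscriber of its sub-stream."""
        self.buffer[ts] = payload
        for addr, start in self.subscribers[ts % self.s].items():
            if ts >= start:
                self._push(addr, ts)

    def _push(self, addr, ts):
        self.outbox.append((addr, ts))  # stands in for a TCP write


p = Parent(num_substreams=4)
for ts in (0, 2, 4):
    p.on_new_block(ts, b"data")
p.subscribe("peerA", substream=0, start_ts=4)
p.on_new_block(8, b"data")
p.on_new_block(9, b"data")  # sub-stream 1: no subscribers
print(p.outbox)  # [('peerA', 4), ('peerA', 8)]
```

A fuller version would also call `unsubscribe` when a push fails because of a network error, which is the second way the contract ends.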
5. Parent re-selection
As the number of subscriptions grows, a node may become overloaded and start to lag in pushing blocks to its subscribers. A node can detect such lagging by (1) comparing the receiving status of the sub-streams in its own buffer, which are supplied by different parents, or (2) comparing the receiving status of sub-streams between its own buffer and its partners' buffers. As shown in the upper part of Figure 2-2, by comparing the receiving status within its buffer, the node can discover that sub-stream 2 is lagging behind sub-stream 1 by three blocks. As shown in the lower part of Figure 2-2, by comparing its buffer with a partner's buffer, the node can discover that its sub-stream 2 is lagging behind the partner's sub-stream 2 by three blocks. If the lag exceeds a certain threshold, which may indicate that the parent supplying that sub-stream is overloaded, parent re-selection is triggered: a new parent is selected to supply the lagging sub-stream, and the original subscription is cancelled. The new parent can be selected from the current partners, if any are available, or otherwise from the current parents with better buffering status.
Figure 2-2 Comparing sub-stream status in parent re-selection
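The two lag checks can be sketched as follows, assuming (hypothetically) that a node summarizes its receiving status as a list whose i-th entry is the number of blocks of sub-stream i received so far, and that the threshold is a tunable parameter measured in blocks. Sub-streams are numbered from 0 here, whereas the figure's numbering may start from 1.

```python
def lag_within(status):
    """Check (1): lag of each sub-stream behind the most advanced one."""
    head = max(status)
    return [head - x for x in status]


def lag_vs_partner(own, partner):
    """Check (2): lag of each own sub-stream behind the same partner sub-stream."""
    return [max(0, p - o) for o, p in zip(own, partner)]


def lagging_substreams(own, partner_statuses, threshold):
    """Sub-streams whose lag exceeds `threshold` under either check."""
    lagging = {i for i, lag in enumerate(lag_within(own)) if lag > threshold}
    for partner in partner_statuses:
        lagging |= {i for i, lag in enumerate(lag_vs_partner(own, partner))
                    if lag > threshold}
    return sorted(lagging)


print(lagging_substreams([10, 10, 7, 10], [], threshold=2))                     # [2]
print(lagging_substreams([10, 10, 10, 10], [[12, 12, 15, 12]], threshold=2))    # [2]
```

Each index returned would trigger parent re-selection for that sub-stream; choosing the replacement parent is a separate step not shown here.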
2.5 Kademlia DHT
To store information about the cached content in a distributed manner, we use a distributed hash table (DHT); specifically, we adopted Kademlia. Kademlia is a DHT based on the XOR metric. Each Kademlia node has a 160-bit identifier, which it chooses at random when joining the system. The keys used for the hash table mapping are also 160-bit identifiers, which we generate by applying the SHA-1 hash function to the name of the wanted file. Given two identifiers x and y, the distance between them is their bitwise XOR interpreted as an integer. We do not describe the detailed operation here; the two main functions used in our system are PUT<key, value> and GET<key>. PUT<key, value> stores the <key, value> pair on the K nodes closest to the key, where K is an adjustable system parameter. GET<key> retrieves the value associated with the given key, provided that PUT<key, value> has been performed.
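The two Kademlia properties the system relies on, SHA-1 keys and the XOR distance, can be illustrated as follows. The sketch computes which K nodes a PUT<key, value> should reach by sorting a known list of node identifiers; real Kademlia nodes do not have this global view and locate the same nodes through iterative lookups over their routing tables. The example file name is hypothetical.

```python
import hashlib
import random

ID_BITS = 160


def key_for(name):
    """160-bit key: SHA-1 digest of the file name, read as a big-endian integer."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(x, y):
    """Kademlia distance between two identifiers."""
    return x ^ y


def k_closest(node_ids, key, k):
    """Nodes a PUT<key, value> would target, computed from a global view."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


rng = random.Random(7)
nodes = [rng.getrandbits(ID_BITS) for _ in range(50)]  # random node identifiers
key = key_for("channel-1/block-42")  # hypothetical file name
closest = k_closest(nodes, key, k=3)
print(len(closest))  # 3
```

Because XOR is symmetric and zero only for identical identifiers, every node computes the same "closest K" set for a key, which is what lets GET<key> find the values stored by PUT.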
Chapter 3
System Design & Implementation
P2P time-shift streaming is similar to P2P VoD service, except that the size and duration of a VoD program are known in advance, whereas those of a time-shifted video stream cannot be determined beforehand. Therefore, the caching mechanism is the major issue in the system.
3.1 System overview
By studying related research on P2P live streaming and P2P VoD streaming systems, we conclude that our system must handle the following issues: live streaming, content caching, publishing, searching and fetching. We group them into three major topics:
1. Live streaming framework
The live streaming framework is the basis of this system, because time-shift streaming content is supplied from the content that live streaming viewers have watched and cached in their local storage. The details of the live streaming framework, which is based on DONet/Coolstreaming, were presented in Section 2.4.
2. Caching strategy and cache replacement policy
Live streaming nodes cache the content they have watched to support time-shift streaming nodes, so the caching strategy is an important issue in the system design. We considered two factors: cached data redundancy and time-shift service span. Having every live streaming node cache all the content it has watched provides the most data redundancy but the shortest service span: each node has only limited storage space and every node holds the same content, so the time-shift service span of the whole system equals what a single node's storage space can hold. On the other hand, keeping only one replica of each block in the system makes the total storage equal to the sum of all nodes' storage space, but provides poor data redundancy, since the departure or failure of a node means the loss of its data. A mechanism that balances the two is therefore important, and we propose a probabilistic algorithm to keep a desired number of replicas in the system.
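The trade-off can be quantified with a simple approximation: with N nodes that can each hold C blocks, keeping R replicas of every block gives a service span of roughly N·C/R blocks, so R = N matches the all-nodes-cache case and R = 1 matches the single-replica case. The algorithm proposed in this thesis is not reproduced here; the caching rule below, in which each viewer caches a block with probability R/N, is only a hypothetical illustration of how probabilistic caching can target R replicas on average.

```python
import random


def service_span_blocks(num_nodes, blocks_per_node, replicas):
    """Approximate time-shift span (in blocks) if every block has `replicas` copies."""
    return num_nodes * blocks_per_node // replicas


def cache_decision(num_viewers, target_replicas, rng):
    """Hypothetical rule: cache with probability R/N so ~R viewers keep each block."""
    p = min(1.0, target_replicas / num_viewers)
    return rng.random() < p


def simulate_replicas(num_viewers, target_replicas, num_blocks, seed=0):
    """Replica count per block when every viewer applies `cache_decision`."""
    rng = random.Random(seed)
    return [sum(cache_decision(num_viewers, target_replicas, rng)
                for _ in range(num_viewers))
            for _ in range(num_blocks)]


print(service_span_blocks(100, 600, 100))  # 600: every node caches everything
print(service_span_blocks(100, 600, 1))    # 60000: a single replica per block
counts = simulate_replicas(num_viewers=100, target_replicas=3, num_blocks=2000)
print(sum(counts) / len(counts))           # close to 3 on average
```

Because each replica count is random, some blocks receive no copy at all: with N = 100 and R = 3, the probability is 0.97^100, about 4.8%. A practical algorithm therefore likely needs some feedback about existing replicas rather than a fixed probability alone.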
3. Time-shift content search/fetching mechanism
Because cached content must be located before it can be retrieved, we adopted the Kademlia [17-18] distributed hash table (DHT) for content publishing and searching. With the published information collected from the DHT, time-shift content can be fetched from multiple sources in an efficient, load-balanced manner.
3.2 System architecture
Figure 3-1 System Architecture
Figure 3-1 depicts the architecture of our system. The system contains three types of components: the bootstrap server, providers and viewers. The bootstrap server maintains a list of available channels and a list of the participating nodes of each channel, in order to bootstrap newly joined nodes. A provider is the source node of its channel in the live streaming network, and it registers the information of the channel it provides with the bootstrap server.
Viewers first join the system with the help of the bootstrap server and then retrieve the desired video content for live streaming playback or time-shift playback.
Figure 3-2 System diagram of a node
Figure 3-2 depicts the system diagram of a node, which can be either a channel provider or a viewer. The relationship between the player and the buffer depends on the type of the node. For a provider, the player encodes the original video stream into a packet stream, and the stream data are put in its buffer for data transmission and for generating its buffering status.
For a viewer, the received video content is likewise put in its buffer for data transmission, buffering status generation and playback. To share video content among peers, the buffered content can be transmitted through either the live streaming mechanism or the time-shift streaming mechanism. The live streaming part handles content transmission for live streaming and cooperates with the time-shift streaming part to cache and publish the content.
Streaming data are transmitted over TCP connections, so that packets lost in the network are recovered by TCP retransmission.
The Kademlia DHT is used to publish the cached content, and its messages are transmitted as UDP packets.
3.3 Transmission unit in streaming
The basic streaming flow of our system is depicted in Figure 3-3. The video stream is generated by the video source. A video server encodes the video stream into continuous packets and transmits them to the viewer nodes. Each viewer node receives the video packets, and the video player decodes the received packets back into video. It is