(1) National Chiao Tung University, Department of Computer Science and Information Engineering, Doctoral Dissertation. TCP 壅塞控制技術之研究與設計 (Study and Design of TCP Congestion Control Techniques). Student: Yi-Cheng Chan. Advisor: Dr. Yaw-Chung Chen. June 2004.

(2) TCP 壅塞控制技術之研究與設計 (Study and Design of TCP Congestion Control Techniques). Student: Yi-Cheng Chan. Advisor: Prof. Yaw-Chung Chen. A Dissertation Submitted to the Institute of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science and Information Engineering. June 2004, Hsinchu, Taiwan, Republic of China.

(3) Study and Design of TCP Congestion Control Techniques Student: Yi-Cheng Chan Advisor: Dr. Yaw-Chung Chen. A Dissertation Submitted to the Department of Computer Science and Information Engineering College of Electrical Engineering and Computer Science National Chiao-Tung University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science and Information Engineering. Hsinchu, Taiwan, Republic of China June 2004.

(4) Study and Design of TCP Congestion Control Techniques (TCP 壅塞控制技術之研究與設計). Student: Yi-Cheng Chan. Advisor: Dr. Yaw-Chung Chen. Department of Computer Science and Information Engineering, National Chiao Tung University.

Abstract in Chinese

With the rapid growth of Internet traffic, using network resources efficiently is the fundamental problem that a successful congestion control mechanism must face. TCP, a widely used end-to-end transport-layer protocol on the Internet, has spawned many versions intended to improve network performance. Among the current TCP versions, two deserve particular attention: Reno, which is widely deployed on today's Internet, and Vegas, which claims a 37 to 71 percent throughput improvement over Reno.

TCP Vegas can detect network congestion at an early stage and successfully avoids the periodic packet losses that frequently occur with TCP Reno. Many studies have shown that Vegas outperforms Reno in many respects, such as overall network utilization, stability, fairness, and throughput. Unfortunately, Vegas is not perfect; several weaknesses remain in its congestion control mechanism. These problems, or their causes, include rerouting, persistent congestion, fairness among competing connections, transmission over asymmetric networks, high bandwidth-delay product networks, and non-congestion packet losses in wireless transmission. In this dissertation, we propose four mechanisms that improve Vegas and remove the obstacles on its way to success. Some of the proposed mechanisms are purely end-to-end; others exploit information provided by routers to improve connection performance.

The first proposal, RoVegas, is an enhancement that uses feedback from routers. With the routers along the packet path performing a specific mechanism, RoVegas can solve the problems caused by rerouting, resolve persistent congestion, enhance fairness among competing connections, and recover the performance TCP may lose when transmitting over asymmetric networks.

Enhanced Vegas is a purely end-to-end improvement. Without router assistance, it can measure the degree of congestion on the forward and backward paths separately, so it adjusts the packet sending rate precisely and appropriately, effectively raising connection throughput when congestion occurs on the backward path.

Because TCP reacts too slowly when its congestion window is large, it performs poorly in high bandwidth-delay product networks; the third mechanism, Quick Vegas, addresses this.

(5) Quick Vegas adjusts the TCP congestion control algorithm based on the history of congestion window adjustments and the connection's estimate of the number of packets backlogged in queues. This change makes a sender adjust its congestion window more effectively and aggressively, so connections perform better in high bandwidth-delay product networks.

A well-known problem of TCP congestion control is that it cannot distinguish the causes of packet loss. Traditional TCP attributes every packet loss to network congestion, an assumption that is no longer appropriate in an increasingly heterogeneous Internet. Mistaking losses caused by transmission errors for signals of network congestion leads to unnecessary TCP performance loss. The last proposal, RedVegas, exploits the innate properties of TCP Vegas together with congestion marks placed on packets by routers to identify losses caused by transmission errors accurately. By classifying packet losses according to their causes, RedVegas can react appropriately to each kind of loss, thereby improving TCP performance over heterogeneous networks.

(6) Study and Design of TCP Congestion Control Techniques. Student: Yi-Cheng Chan. Advisor: Dr. Yaw-Chung Chen. Institute of Computer Science and Information Engineering, National Chiao Tung University.

ABSTRACT

With the fast growth of Internet traffic, efficient utilization of network resources is essential to successful congestion control. Transmission Control Protocol (TCP) is a widely used end-to-end transport protocol in the Internet; it has several implementation versions (e.g., Tahoe, Reno, Vegas) that intend to improve network utilization. Among these TCP variants, there are two notable approaches. One is Reno, which has been widely deployed on the Internet; the other is Vegas, which claims a 37 to 71 percent throughput improvement over Reno. TCP Vegas detects network congestion at an early stage and successfully prevents the periodic packet loss that usually occurs in TCP Reno. It has been demonstrated that TCP Vegas outperforms TCP Reno in the aspects of overall network utilization, stability, fairness, and throughput. However, TCP Vegas still suffers from problems inherent in its congestion control algorithm; these include issues of rerouting, persistent congestion, fairness, network asymmetry, high bandwidth-delay product (BDP) networks, and internetworking of wired and wireless networks. In this dissertation, we propose four enhanced mechanisms to remove the obstacles that prevent TCP Vegas from achieving real success. Some of these mechanisms adopt purely end-to-end approaches, while others utilize information provided by routers to improve the performance of connections. The first proposed mechanism, RoVegas, uses a router-assisted approach. By performing the proposed scheme in routers along the round-trip path, RoVegas can solve the problems of rerouting and persistent congestion, enhance the fairness

(7) among competing connections, and improve the throughput when congestion occurs on the backward path. An end-to-end scheme, Enhanced Vegas, is also presented to mitigate the performance degradation of TCP Vegas in asymmetric networks. By distinguishing whether congestion occurs in the forward path or not, Enhanced Vegas significantly improves the connection throughput when the backward path is congested. TCP congestion control may function poorly in high BDP networks because of its slow response with a large congestion window. In the third mechanism, we propose an improved version of TCP Vegas called Quick Vegas, in which we present an efficient congestion window control algorithm for a TCP source. The modification allows TCP connections to react faster and better in high BDP networks and therefore improves the overall performance. A well-known problem in providing TCP congestion control over wired and wireless networks is that TCP may encounter both congestion loss and random loss. Traditional TCP interprets every packet loss as caused by congestion, which may not be the case in the current Internet. In the last proposed mechanism, RedVegas, we utilize the innate nature of TCP Vegas and congestion indications marked by routers to detect random packet losses precisely. Through packet loss differentiation, RedVegas reacts appropriately to the losses, and therefore the throughput of connections over heterogeneous networks can be significantly improved.

Keywords: Asymmetric networks, heterogeneous networks, high bandwidth-delay product networks, Internet, TCP congestion control, TCP Vegas, transport protocol, wireless networks.

(8) Acknowledgements

I come, I research, I conquer. Finally and fortunately, I have acquired my Ph.D. degree. Years ago, after passing the entrance exam of NCTU, I got into the science research hall and started leading a very different and unforgettable life from the one I used to have. Especially during the recent two years, my daily life was: studying, researching, experimenting, writing, submitting, waiting, and then repeating it all. This was not an easy life for me; sometimes I would say it was "suffering" and "frustrating". During this suffering and frustrating time, there had to be some strength to help me through the dilemma. That strength came from my supervisor, Prof. Yaw-Chung Chen; my senior, Dr. Chia-Tai Chan; my wife; my two lovely kids; and my parents as well. Here I devote my gratitude to them: they always stayed with me, encouraged me to pass through the way overgrown with brambles, and let me find my broad road. First I say thanks to Prof. Yaw-Chung Chen. He is such a nice supervisor, one who never gives me too much pressure but helps me at the right time. Whether in life or in research, he is my best supervisor. Second I thank my senior, Dr. Chia-Tai Chan. During the preceding six years at NCTU, I still had a job at Accton Corp. At that time, having no precise idea about my research, I was a little bit scared to throw myself into the unknown future. But Dr. Chia-Tai Chan called me on the phone a couple of times and encouraged me: "When you give up, you'll regret it your whole life." Yes,

(9) I agreed with him. Then I quit my job and concentrated on my research. And all the way, he stayed beside me as a best friend as well as a good teacher. Next I appreciate my beloved family, my dear wife and lovely kids. Since I dedicated myself to finishing my doctoral degree, my wife, Jeannie, has become the backbone of my family. She needs to earn money to support the high tuition of our two kids in kindergarten. She takes care of the two kids and most of the household chores. More importantly, she hardly complains. Moreover, I have to thank the two lovely kids: my daughter, Mia, and my son, Lucas. Coming home tired from my lab, I look forward to the smiles of my two kids. They always make me relaxed and recharge my exhausted energy. Finally, I devote my thankfulness to my father and mother: without them, without me. They made me what I am today. Here I dedicate this dissertation to both of them: my dear father, Mr. De-Zhen Chan, and my dear mother, Mrs. Shu Qien-Dai Chan.

Yi-Cheng Chan
National Chiao Tung University, Hsinchu, Taiwan
June 2004

(10) Contents

Abstract in Chinese  i
Abstract in English  iii
Acknowledgements  v
Contents  vii
List of Figures  x
List of Tables  xiii

1 Introduction  1
1.1 Motivation  2
1.2 Contributions  3
1.3 Dissertation Outline  5

2 Background  6
2.1 TCP Reno  6
2.1.1 TCP NewReno  9
2.1.2 SACK  10
2.1.3 FACK  11
2.2 TCP Vegas  11
2.3 Other Congestion Avoidance Mechanisms  17
2.4 Chapter Summary  18

(11) 3 RoVegas: A Router-Assisted Congestion Avoidance Mechanism for TCP Vegas  19
3.1 Problem Statements  20
3.2 RoVegas  22
3.2.1 Proposed Mechanism  23
3.2.2 Implementation Issue  25
3.3 Related Work  26
3.4 Performance Analysis  28
3.4.1 Analysis on Vegas  29
3.4.2 Analysis on RoVegas  31
3.5 Performance Evaluation  32
3.5.1 Throughput Improvement  33
3.5.2 Persistent Congestion  38
3.5.3 Fairness Enhancement  40
3.5.4 Gradual Deployment  43
3.6 Chapter Summary  45

4 Enhanced Vegas: An End-to-End Approach of TCP Vegas for Asymmetric Networks  46
4.1 Enhanced Vegas  47
4.2 Performance Evaluation  49
4.3 Chapter Summary  51

5 Quick Vegas: Improving Performance of TCP Vegas for High Bandwidth-Delay Product Networks  52
5.1 Problem Statements  53
5.2 Related Work  54
5.3 Quick Vegas  55
5.4 Performance Evaluation  57
5.4.1 Basic Behavior  58
5.4.2 Convergence Time  60

(12) 5.4.3 Utilization, Queue Length, and Fairness  61
5.5 Chapter Summary  65

6 RedVegas: Performance Improvement of TCP Vegas over Heterogeneous Networks  67
6.1 Problem and Motivation  68
6.2 Related Work  69
6.3 RedVegas  70
6.4 Performance Evaluation  73
6.4.1 Basic Behavior  75
6.4.2 Impact of Random Loss  76
6.4.3 Impact of Random Loss and Cross Traffic  78
6.4.4 Numeric Analysis  79
6.5 Chapter Summary  81

7 Conclusions and Future Work  82
7.1 Summary of Contributions  82
7.2 Future Work  84

Bibliography  85
Curriculum Vitae  92
Publication List  94

(13) List of Figures

2.1 Packets in transit during slow-start.  7
2.2 TCP Reno's window evolution.  9
2.3 TCP Vegas' window evolution.  13
2.4 Phase transition diagram of TCP Vegas.  14
2.5 Flowchart of the procedure for TCP Vegas upon receiving an ACK.  16
3.1 Fields of an AQT option.  24
3.2 Network model for analysis.  28
3.3 A single bottleneck network topology for investigating throughputs of Vegas and RoVegas when the congestion occurs on the backward path.  33
3.4 Throughput of Vegas in asymmetric networks.  34
3.5 Throughput of RoVegas in asymmetric networks.  35
3.6 Throughput comparison between Vegas and RoVegas with a backward traffic load of 0.9 in the single bottleneck network topology.  36
3.7 Average throughput versus different backward traffic loads for Vegas and RoVegas in the single bottleneck network topology.  37
3.8 A parking lot network topology for investigating throughputs of Vegas and RoVegas when the congestion occurs on the backward path.  37
3.9 Throughput comparison between Vegas and RoVegas with a backward traffic load of 0.8 in the parking lot network topology.  38
3.10 Average throughput versus different backward traffic loads for Vegas and RoVegas in the parking lot network topology.  39

(14) 3.11 Network topology for studying the bias and fairness issues of Vegas and RoVegas.  40
3.12 Queue occupancy of the forward bottleneck for Vegas and RoVegas.  40
3.13 Network topology for exploring the fairness issue of Vegas and RoVegas, in which the traffic pairs are featured by different propagation delays.  41
3.14 Fairness investigation of Vegas and RoVegas, in which connections with the same propagation delay successively enter the network every 30 seconds. (a) Vegas (α = 1, β = 3). (b) Vegas (α = β = 2). (c) RoVegas (α = 1, β = 3). (d) RoVegas (α = β = 2).  42
3.15 Fairness investigation of Vegas and RoVegas, in which connections with different propagation delays enter the network at the same time. (a) Vegas (α = 1, β = 3). (b) Vegas (α = β = 2). (c) RoVegas (α = 1, β = 3). (d) RoVegas (α = β = 2).  43
3.16 Throughput comparison between Vegas and RoVegas when only R2 is AQT-enabled in the parking lot network.  45
4.1 Network configuration for the simulations.  49
4.2 Throughput comparison between Vegas and Enhanced Vegas with a backward traffic load of 0.9.  50
4.3 Average throughput versus backward traffic load for Vegas and Enhanced Vegas.  51
5.1 Network configuration for the simulations.  57
5.2 Basic behavior of TCP Vegas.  58
5.3 Basic behavior of Quick Vegas.  59
5.4 Convergence time of new connections.  61
5.5 Convergence time of connections when available bandwidth is halved.  62
5.6 Convergence time of connections when available bandwidth is doubled.  62
5.7 Queue status of the bottleneck.  64

(15) 6.1 Snapshot of the consecutive packets in the network pipe.  72
6.2 Flowchart illustrating the procedure of Vegas/RedVegas upon receiving an ACK; shaded blocks apply to RedVegas only.  74
6.3 Network configuration for the simulations.  75
6.4 Average goodput versus cross traffic load for the three TCP variants.  75
6.5 Average goodput versus data packet random loss rate for the three TCP variants.  76
6.6 Average goodput versus data packet random loss rate for the three TCP variants with a cross traffic load of 0.5.  77
6.7 Average goodput versus cross traffic load for the three TCP variants with a data packet random loss rate of 0.05.  78
7.1 Operation fields of proposed mechanisms.  84

(16) List of Tables

2.1 RFCs for the TCP implementation.  7
2.2 Variable description of Fig. 2.5.  15
5.1 Link utilization of the bottleneck.  63
5.2 Average queue length (packets).  65
5.3 Fairness index.  65
6.1 Numeric analysis of packet loss differentiation for RedVegas.  80

(17) Chapter 1 Introduction

Transmission Control Protocol (TCP) is the most popular transport protocol in the current Internet. It provides reliable data transport between the two end hosts of a connection and controls the connection's bandwidth usage to avoid network congestion. Many Internet applications use it as the underlying communication protocol; the behavior of TCP is therefore tightly coupled with overall Internet performance. The essential strategy of TCP is to send packets into the network without a reservation and then react to observed events. Original TCP is officially defined in [1]. It has a simple sliding-window flow control mechanism without any congestion control. After observing a series of congestion collapses in the 1980s, Jacobson introduced several innovative congestion control mechanisms into TCP in 1988. This TCP version, called TCP Tahoe [2], includes the slow-start, additive increase and multiplicative decrease (AIMD), and fast retransmit algorithms. Two years later, the fast recovery algorithm was added to Tahoe to form a new TCP version called TCP Reno [3]. TCP Reno is currently the dominant TCP version deployed in the Internet.

TCP congestion control is an active research area. Over the years, considerable research regarding TCP has been done [4, 5, 6, 7, 8, 9, 10]. TCP Reno can be thought of as a reactive congestion control scheme. It uses packet

(18) loss as an indicator for congestion. In order to probe the available bandwidth along the end-to-end path, the TCP congestion window is increased until a packet loss is detected, at which point the congestion window is halved and a linear increase algorithm takes over until further packet loss is experienced. It is known that TCP Reno may periodically generate packet losses by itself and cannot efficiently recover from multiple packet losses within a window of data. Moreover, the AIMD strategy of TCP Reno leads to periodic oscillations in congestion window size, round-trip delay, and the queue length of the bottleneck node. Recent works have shown that this oscillation may induce chaotic behavior into the network and thus adversely affect overall network performance [11, 12].

To alleviate the performance degradation caused by packet loss, many researchers have attempted to refine the fast recovery algorithm embedded in TCP Reno. New proposals include TCP NewReno [13], SACK [14], FACK [15], Net Reno [16], and LT [17]. All these algorithms bring performance improvement for a connection after a packet loss is detected. To combat the inherent oscillation problem of TCP Reno, many congestion avoidance mechanisms have been proposed. These works include DUAL [18], CARD [19], Tri-S [20], Packet-Pair [21], TCP Vegas [22, 23, 24, 25], and TCP Santa Cruz [26]. Among these creative mechanisms, there is a notable approach, TCP Vegas, which claims a 37 to 71 percent throughput improvement over TCP Reno.

1.1 Motivation

With the fast growth of Internet traffic, efficient utilization of network resources is essential to successful congestion control. TCP Vegas, with its proactive congestion control strategy, has the potential to provide a stable and efficient network environment. To prevent the performance degradation caused by AIMD, TCP Vegas employs a fundamentally different congestion avoidance approach. It uses the difference between the expected and actual throughput to estimate the available bandwidth in the

(19) network. The idea is that when the network is not congested, the actual throughput will be close to the expected throughput; otherwise the actual throughput will be smaller than the expected throughput. TCP Vegas uses this difference in throughput to gauge the congestion level in the network and updates the congestion window size accordingly. As a result, TCP Vegas is able to detect network congestion at an early stage and successfully prevents the periodic packet loss that usually occurs in TCP Reno. Many studies have demonstrated that TCP Vegas outperforms TCP Reno in the aspects of overall network utilization [23, 24], stability [9, 10], fairness [9, 10], and throughput [11, 23, 24].

Although TCP Vegas is superior to TCP Reno in the aforementioned aspects, it still suffers from problems inherent in its congestion control algorithm. These include issues of rerouting [9], persistent congestion [9], fairness [10, 27, 28], network asymmetry [29, 30, 31], high bandwidth-delay product (BDP) networks [32], internetworking of wired and wireless networks [33, 34], and incompatibility between TCP Reno and Vegas [9, 35, 36]. All these problems may prevent TCP Vegas from achieving success.

1.2 Contributions

In this dissertation, we propose four enhanced mechanisms to remove the obstacles that prevent TCP Vegas from achieving success. Some of these mechanisms adopt purely end-to-end approaches, while others utilize information provided by routers to improve the performance of connections. We now briefly describe each proposed mechanism and its contributions as follows:

• RoVegas: The first proposed mechanism, RoVegas, is a router-assisted approach. In RoVegas, we define a new IP option named AQT (accumulate queuing time) to collect the queuing time experienced by a probing packet. By performing the proposed scheme in routers along the round-trip path, a RoVegas source may obtain the queuing time in both the forward and backward directions as well as the fixed delay of the round-trip path. As a result, it can

(20) solve the problems of rerouting and persistent congestion, enhance the fairness among competing connections, and improve the throughput when congestion occurs along the backward path.

• Enhanced Vegas: An end-to-end scheme, Enhanced Vegas, is also presented to mitigate the performance degradation of TCP Vegas in asymmetric networks. The mechanism uses the TCP timestamps option to estimate the queueing delay on the forward and backward paths separately, without clock synchronization. By distinguishing whether congestion occurs in the forward path or not, Enhanced Vegas significantly improves the connection throughput when the backward path is congested.

• Quick Vegas: TCP congestion control may function poorly in high BDP networks because of its slow response with a large congestion window. In the third mechanism, we propose an improved version of TCP Vegas called Quick Vegas, in which we present an efficient congestion window control algorithm for a TCP source. The algorithm uses the increment history and the estimated amount of extra data to update the congestion window intelligently. The modification allows TCP connections to react faster and better in high BDP networks and therefore improves the overall performance.

• RedVegas: A well-known problem in providing TCP congestion control over wired and wireless networks is that TCP may encounter both congestion losses and random losses. Traditional TCP interprets every packet loss as caused by congestion, which may not be the case in the current Internet. Misinterpreting a random loss as an indication of network congestion makes TCP slow down its sending rate unnecessarily. In the last proposed mechanism, RedVegas, we utilize the innate nature of TCP Vegas together with congestion indications marked by routers to detect random packet losses precisely. Through packet loss differentiation, RedVegas reacts appropriately to the losses, and therefore the throughput of connections over heterogeneous networks can be significantly improved.
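The expected-versus-actual throughput comparison that all four mechanisms build on can be sketched as follows. This is an illustrative sketch rather than the dissertation's code; the function and variable names are ours, and the threshold defaults of 1 and 3 packets are the commonly used Vegas values of alpha and beta.

```python
def vegas_window_update(cwnd, base_rtt, observed_rtt, alpha=1, beta=3):
    """One congestion-avoidance step of the Vegas estimator (sketch).

    cwnd is in packets, RTTs are in seconds.  alpha and beta bound the
    packets of "extra" data the connection may keep queued in the network.
    """
    expected = cwnd / base_rtt             # throughput if nothing were queued
    actual = cwnd / observed_rtt           # throughput actually achieved
    diff = (expected - actual) * base_rtt  # estimated backlog, in packets
    if diff < alpha:                       # little backlog: probe for bandwidth
        return cwnd + 1
    if diff > beta:                        # backlog too large: back off linearly
        return cwnd - 1
    return cwnd                            # within [alpha, beta]: hold steady
```

For example, with a base RTT of 100 ms and an observed RTT of 200 ms at cwnd = 10, the estimated backlog is 5 packets, which exceeds beta, so the window shrinks by one packet.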

(21) 1.3 Dissertation Outline

The rest of this dissertation is organized as follows. In Chapter 2, we review the design principles of two notable TCP implementations, TCP Reno and TCP Vegas. Some variants of TCP Reno and several innovative congestion avoidance mechanisms are also discussed in that chapter. The proposed RoVegas, Enhanced Vegas, Quick Vegas, and RedVegas are described and evaluated in Chapters 3, 4, 5, and 6, respectively. Finally, we conclude the main work of this dissertation and point out some future work in Chapter 7.

(22) Chapter 2 Background

End hosts sharing a best-effort network need to respond to congestion by implementing congestion control mechanisms to ensure network stability. Otherwise, the network may be driven into congestion collapse. Over the past decades, two main congestion control algorithms were proposed and tested in real networks. One is TCP Reno [2, 3], and the other is TCP Vegas [22, 23, 24, 25]. TCP Reno has been widely deployed on the current Internet. Several RFCs document the implementation of TCP Reno: the basic functionality is recommended by [1, 37, 38, 39, 40] and extensions are exhibited in [14, 41, 42, 43]. Table 2.1 lists these RFCs. In the following sections, we summarize the congestion control mechanisms embedded in TCP Reno and TCP Vegas. Besides, some variants of TCP Reno and several innovative congestion avoidance mechanisms are also discussed.

2.1 TCP Reno

TCP Reno is a window-based congestion control mechanism (the TCP window refers to the amount of outstanding data that can be transmitted by the sender without acknowledgements). Its window-adjustment algorithm consists of three phases: slow-start, AIMD (additive increase/multiplicative decrease), and fast retransmit and recovery. A connection begins with the slow-start phase. The objective of slow-start is to enable a TCP connection to discover the

(23) Table 2.1: RFCs for the TCP implementation.

RFC number   Topic
793          Transmission Control Protocol
1122         Requirements for Internet Hosts - Communication Layers
1323         TCP Extensions for High Performance
2018         TCP Selective Acknowledgement Options
2581         TCP Congestion Control
2582         The NewReno Modification to TCP's Fast Recovery Algorithm
2914         Congestion Control Principles
3168         The Addition of Explicit Congestion Notification (ECN) to IP
3390         Increasing TCP's Initial Window

[Figure 2.1: Packets in transit during slow-start; CWND grows 1, 2, 4, 8 as packets travel between source and destination.]

available bandwidth by gradually increasing the amount of data injected into the network from the initial window size. Upon receiving an acknowledgement packet (ACK), the congestion window size (CWND) is increased by one packet. With reference to Fig. 2.1, initially the sender starts by transmitting one packet and waits for its ACK. When that ACK is received, the congestion window is incremented from one to two, and two packets can be sent. When both of these packets are acknowledged, the congestion window is increased to four, and so on. (RFC 2581 suggests an initial window size of two packets, and RFC 3390 suggests that a larger initial window can be used to reduce the duration of the startup period, specifically for connections running in long-propagation-delay networks.)
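The doubling-per-round-trip growth described above can be traced with a short sketch. The function name and the one-round-per-step model are our illustrative assumptions, not part of any TCP implementation:

```python
def slow_start_trace(initial_cwnd, ssthresh):
    """Trace CWND at the start of each round trip during slow-start.

    Each step models one round-trip time: every packet in the window is
    acknowledged, and each ACK adds one packet to CWND, so the window
    doubles per RTT until it reaches ssthresh.
    """
    trace = [initial_cwnd]
    cwnd = initial_cwnd
    while cwnd < ssthresh:
        cwnd = min(2 * cwnd, ssthresh)  # one ACK per packet in flight
        trace.append(cwnd)
    return trace

print(slow_start_trace(1, 16))  # the pattern of Fig. 2.1: [1, 2, 4, 8, 16]
```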

(24) Since the CWND in the slow-start phase expands exponentially, packets sent at this increasing rate would quickly lead to network congestion. To avoid this, the AIMD phase begins when the CWND reaches the slow-start threshold (SSTHRESH). In the AIMD phase, the CWND is increased by 1/CWND packets for each ACK received, which makes the window size grow linearly. The process continues until a packet loss is detected, at which point the CWND is cut by half. There are two ways for TCP Reno to detect packet loss: one is the reception of three duplicate ACKs, the other is a retransmission timeout. When a source receives three duplicate ACKs, the fast retransmit and recovery algorithm is performed. It retransmits the lost packet immediately without waiting for a coarse-grained timer to expire. In the meantime, the SSTHRESH is set to half of the CWND, which is then set to SSTHRESH plus the number of duplicate ACKs. The CWND is increased by one packet for each duplicate ACK received. When the ACK of a retransmitted packet is received, the CWND is set to SSTHRESH and the source reenters the AIMD phase. If serious congestion occurs and there are not enough surviving packets to trigger three duplicate ACKs, the congestion will be detected by a coarse-grained retransmission timeout. When the retransmission timer expires, the SSTHRESH is set to half of the CWND, the CWND is reset to one, and the source restarts from the slow-start phase.

A window evolution example including the three window-adjustment phases of TCP Reno is shown in Fig. 2.2. A connection starts from the slow-start phase with an exponentially increasing rate. Since the connection has no idea about the available bandwidth of the network, the over-expanded window size quickly incurs severe congestion. After a retransmission timeout, the connection restarts from the slow-start phase. When the CWND grows up to the SSTHRESH, the window size is increased linearly.
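The three adjustment rules just described (slow-start growth, the AIMD increase, and the fast-retransmit and timeout reactions) can be collected into one sketch. This is a simplified illustration in units of packets rather than bytes; the class and method names are ours, not part of any real TCP stack:

```python
class RenoWindow:
    """Sketch of TCP Reno's window-adjustment rules (units: packets)."""

    def __init__(self, ssthresh=32.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow-start: +1 per ACK, doubles per RTT
        else:
            self.cwnd += 1.0 / self.cwnd  # AIMD: roughly +1 packet per RTT

    def on_three_dup_acks(self):
        # Fast retransmit/recovery: halve, then inflate by the 3 duplicate ACKs.
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = self.ssthresh + 3.0

    def on_timeout(self):
        # Coarse-grained timeout: collapse the window and redo slow-start.
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0
```

Driving such an object with one event per ACK reproduces the window evolution of Fig. 2.2: exponential growth, a timeout, then the linear sawtooth.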
After that, the pattern of periodic additive increase and multiplicative decrease of the window size continues throughout the lifetime of the connection. The fast retransmit and recovery algorithm of TCP Reno allows a connection to quickly recover from isolated packet losses. However, when multiple packets are

(25) [Figure 2.2: TCP Reno's window evolution (CWND in packets versus time in seconds).]

dropped from a window of data, TCP Reno may suffer serious performance problems, since it retransmits at most one dropped packet per round-trip time, and furthermore the CWND may be decreased more than once when multiple packet losses occur during one round-trip time interval. In this situation, TCP Reno operates at a very low rate and loses a significant amount of throughput. A number of enhanced loss recovery algorithms have been proposed to address this problem. In the following subsections, we briefly describe three noted remedies of TCP Reno: TCP NewReno [13], SACK [14], and FACK [15].

2.1.1 TCP NewReno

TCP NewReno makes a small change at the connection source that may eliminate TCP Reno's waiting for a retransmission timeout when multiple packets are lost from a window. The change enhances the fast recovery algorithm of TCP Reno. In TCP Reno, a partial ACK (an acknowledgement that acknowledges some but not all of the packets outstanding at the start of the fast recovery phase) brings the connection out of fast recovery, which results

in a retransmission timeout in case of multiple packet losses. In TCP NewReno, when a source receives a partial ACK, it does not leave fast recovery [5, 42, 13]. Instead, it assumes that the packet immediately following the most recently acknowledged packet has been lost, and hence retransmits that packet. Thus, in the situation of multiple packet losses, TCP NewReno retransmits one lost packet per round-trip time until all of the lost packets from the same window have been recovered, without incurring a retransmission timeout. It remains in the fast recovery phase until all of the packets outstanding at the start of that fast recovery phase have been acknowledged. Although this avoids the unnecessary window reduction, the recovery time is still long. The implementation details of TCP NewReno have been specified in RFC 2582.

2.1.2 SACK

Another way to deal with multiple packet losses is to tell the source which packets have arrived at the destination. Selective Acknowledgments (SACK) does exactly this. TCP adopts a cumulative acknowledgement strategy to acknowledge successfully transmitted packets; this improves the robustness of acknowledgement when the path back to the source has a high loss rate. However, the drawback of cumulative acknowledgement is that after a packet loss the source is unable to find out which packets were successfully transmitted. Therefore, it is unable to recover more than one lost packet in each round-trip time. The SACK option [14] field contains a number of SACK blocks, where each SACK block reports a non-contiguous set of data that has been received and buffered. The destination uses an ACK with the SACK option to inform the source of a contiguous block of data that has been received out of order at the destination. When SACK blocks are received by the source, they are used to maintain an image of the receiver queue, i.e., which packets are missing and which have been received at the destination.
A scoreboard is set up to track transmitted and received packets according to the information carried in the SACK option. For

each transmitted packet, the scoreboard records its sequence number and a flag bit that indicates whether the packet has been "SACKed". A packet with the SACKed bit turned on does not require retransmission, but packets with the SACKed bit off and a sequence number less than that of the highest SACKed packet are eligible for retransmission. Whether its SACKed bit is on or off, a packet is removed from the retransmission buffer only when it has been cumulatively acknowledged. The SACK TCP implementation still uses the same congestion control algorithms as TCP Reno. The main difference between SACK TCP and TCP Reno is the behavior in the event of multiple packet losses. SACK TCP refines the fast retransmit and fast recovery strategy of TCP Reno so that multiple lost packets in a single window can be recovered within one round-trip time.

2.1.3 FACK

Forward Acknowledgments (FACK) [15] was developed to decouple the congestion control algorithms from the data recovery algorithms. It uses the additional information provided by the SACK option to keep an explicit measure of the total amount of outstanding data in the network. The goal of the FACK algorithm is to perform precise congestion control during recovery. By accurately controlling the outstanding data in the network, FACK can improve the connection throughput during the data recovery phase.

2.2 TCP Vegas

TCP Vegas is a rate-based congestion control mechanism. It can detect network congestion in the early stage and successfully prevents the periodic packet loss that usually occurs in TCP Reno. TCP Vegas features three improvements as compared with TCP Reno: (1) a new retransmission mechanism, (2) an improved congestion avoidance mechanism, and (3) a more effective slow-start mechanism. We summarize the design principles of TCP Vegas as follows.

TCP Vegas adopts a more sophisticated bandwidth estimation scheme that tries to avoid rather than to react to congestion. It uses the measured round-trip time (RTT) to accurately calculate the amount of data packets that a source can send. Its window adjustment algorithm consists of three phases: slow-start (SS), congestion avoidance (CA), and fast retransmit and fast recovery (FF). The congestion window is updated based on the currently executing phase. During the congestion avoidance phase, TCP Vegas does not continually increase the congestion window. Instead, it tries to detect incipient congestion by comparing the actual throughput to the expected throughput. Vegas estimates a proper amount of extra data to be kept in the network pipe and controls the congestion window size accordingly. It records the RTT and sets BaseRTT to the minimum of the round-trip times measured so far. The amount of extra data (∆) is estimated as follows:

∆ = (Expected − Actual) × BaseRTT,    (2.1)

where the Expected throughput is the current congestion window size (CWND) divided by BaseRTT, and the Actual throughput is the CWND divided by the newly measured smoothed RTT. The CWND is kept constant when ∆ is between two thresholds α and β. If ∆ is greater than β, it is taken as a sign of incipient congestion, and the CWND will be reduced. On the other hand, if ∆ is smaller than α, the available bandwidth may be underutilized, and the CWND will be increased. The CWND is updated on a per-RTT basis. The rule for congestion window adjustment can be expressed as follows:

         ⎧ CWND + 1,  if ∆ < α
CWND =   ⎨ CWND − 1,  if ∆ > β        (2.2)
         ⎩ CWND,      if α ≤ ∆ ≤ β

During the slow-start phase, TCP Vegas is similar to TCP Reno in that it allows a connection to quickly ramp up to the available bandwidth. However, to ensure that the sending rate does not increase too fast, TCP Vegas doubles the size of its congestion window only every other RTT.
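The congestion-avoidance update of Eqs. (2.1) and (2.2) can be sketched as follows. This is an illustrative model only; the measurements passed in are hypothetical, and α = 1, β = 3 are merely example threshold values.

```python
# Sketch of TCP Vegas' per-RTT congestion-avoidance update,
# following Eqs. (2.1)-(2.2). Times are in seconds, windows in packets.

def vegas_update(cwnd, base_rtt, rtt, alpha=1, beta=3):
    expected = cwnd / base_rtt               # Expected throughput
    actual = cwnd / rtt                      # Actual throughput (smoothed RTT)
    delta = (expected - actual) * base_rtt   # extra data in the pipe, Eq. (2.1)
    if delta < alpha:                        # bandwidth under-utilized: grow
        return cwnd + 1
    if delta > beta:                         # incipient congestion: shrink
        return cwnd - 1
    return cwnd                              # keep the window constant
```

For example, with a BaseRTT of 100 ms, a measured RTT of 120 ms and a window of 20 packets, ∆ ≈ 3.33 > β, so the window is reduced by one; with an RTT of 110 ms, ∆ ≈ 1.82 lies between the thresholds and the window is left unchanged.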
A similar congestion detection mechanism

Figure 2.3: TCP Vegas' window evolution.

is applied during slow-start to decide when to switch phases. If the estimated amount of extra data is greater than γ, TCP Vegas leaves the slow-start phase, reduces its congestion window size by 1/8, and enters the congestion avoidance phase. By keeping a proper amount of extra data in the network, TCP Vegas does not generate packet loss by itself. Ideally, it can maintain a stable window size as well as fully utilize the network resources if the network resources remain constant. An example of TCP Vegas' window evolution in a stable network environment is shown in Fig. 2.3. As in TCP Reno, a triple-duplicate acknowledgement (ACK) always results in packet retransmission. However, in order to retransmit lost packets quickly, TCP Vegas extends TCP Reno's fast retransmission strategy. TCP Vegas measures the RTT for every packet sent based on fine-grained clock values. Using these fine-grained RTT measurements, a timeout period for each packet is computed. When a duplicate ACK is received, TCP Vegas checks whether the timeout period of the oldest unacknowledged packet has expired. If so, the packet is retransmitted. This modification leads to packet retransmission after just one or two duplicate ACKs. When a non-duplicate ACK that is the first or second ACK after a fast

Figure 2.4: Phase transition diagram of TCP Vegas.

retransmission is received, TCP Vegas will again check for the expiration of the timer and may retransmit another packet. Note that packet retransmission due to an expired fine-grained timer is conditioned on the reception of certain ACKs. After a packet retransmission has been triggered by a duplicate ACK and the ACK of the lost packet is received, the congestion window size is reduced to alleviate the network congestion. There are two cases for TCP Vegas to set the CWND. If the lost packet was transmitted just once, the CWND is set to three-fourths of the previous congestion window size. Otherwise, the loss is taken as a sign of more serious congestion, and the CWND is set to one-half of the previous congestion window size. Notably, when multiple packet losses during one round-trip time trigger more than one fast retransmission, the congestion window is reduced only for the first retransmission. If a loss episode is severe enough that no ACKs are received to trigger fast

Table 2.2: Variable description of Fig. 2.5.

  Variable     Description
  ACKSeqNo     sequence number of the last successfully received packet
  NumDupACK    number of duplicate ACKs
  RTO          duration of the coarse-grained retransmission timer
  FGRTO        duration of the fine-grained retransmission timer
  CWNDCT       time of the last congestion window adjustment due to a packet loss detection
  SendTime     sending time of the lost packet
  Delta        amount of extra data
  LostSeqNo    sequence number of the lost packet
  NumTransmit  number of times the lost packet has been transmitted
  NewCWND      congestion window size to be used once a lost packet is recovered
  IncrFlag     a flag used to adjust the congestion window every other RTT
  IncrAmt      increment of the congestion window size for each new ACK received
  WorriedCtr   a counter used to check the FGRTO after a lost packet is recovered

retransmit algorithm, the losses will eventually be identified by a Reno-style coarse-grained timeout. When this occurs, the slow-start threshold (SSTHRESH) is set to one-half of the current CWND, the CWND is reset to two, and the connection restarts from slow-start. Figure 2.4 shows the phase transition diagram of TCP Vegas. A connection begins with the slow-start phase. The window-adjustment phase transitions are caused by the specific events depicted along the edges. TCP congestion control is mainly based on the feedback of ACKs. The control procedure is triggered whenever an ACK is received by the connection source. Figure 2.5 illustrates the detailed procedure of TCP Vegas upon receiving an ACK. The variables used in Fig. 2.5 are described in Table 2.2.
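The loss-response rules just described, i.e., the fine-grained timeout check on a duplicate ACK and the 3/4 versus 1/2 window cut, can be sketched as follows. The helper names are our own illustration and are not taken from any implementation.

```python
# Sketch of Vegas' fine-grained retransmit check and loss-triggered
# window reduction (simplified from the behavior described above).

def should_fast_retransmit(now, send_time, fine_rto):
    """Retransmit if the fine-grained timeout of the oldest
    unacknowledged packet has expired (even after 1-2 duplicate ACKs)."""
    return (now - send_time) > fine_rto

def window_after_loss(cwnd, num_transmits):
    # Lost on first transmission: cut to 3/4 of the window.
    # Lost again after retransmission: more serious congestion, cut to 1/2.
    return cwnd * 3 // 4 if num_transmits == 1 else cwnd // 2
```

For instance, a 16-packet window is reduced to 12 packets when a once-transmitted packet is lost, but to 8 packets when the loss recurs.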

Figure 2.5: Flowchart of the procedure for TCP Vegas upon receiving an ACK.

2.3 Other Congestion Avoidance Mechanisms

In the operation of TCP Reno, the TCP window size and the queue length of the bottleneck node often exhibit a clear oscillating behavior when the traffic volume exceeds the available resources. Such oscillation is inherent in the additive increase and multiplicative decrease algorithm and serves as a means of probing for resource changes. In addition to TCP Vegas, many end-to-end congestion control mechanisms, such as DUAL [18], CARD [19], and Tri-S [20], have been proposed since 1988 to steer the system away from periodic congestion losses, the expectation being that a connection can operate at the equilibrium point. However, these proposals have not attracted as much attention as TCP Vegas. We briefly describe them as follows. The window in Jain's CARD [19] approach is increased by one packet and decreased by one-eighth based on the gradient of the delay-window curve, which is used to evaluate the optimal point of the system. The performance of this window control mechanism was studied with a deterministic simulation model of a connection in a wide-area network. Note that the window changes during every adjustment; that is, it oscillates around its optimal point. The DUAL scheme [18] defines an optimal point in terms of queue length and uses the corresponding delay as the congestion signal. The congestion window normally uses fine-tuning to adjust the window size, namely an increase of 1/CWND for each ACK received. Every two RTTs, the algorithm decreases the congestion window by one-eighth if the current RTT is greater than the average of the minimum and maximum RTTs observed so far. If a timeout is detected, the algorithm assumes that a substantial traffic increase and severe congestion have occurred. It uses quick-turning to reduce the window size, similar to the TCP Tahoe timeout action (CWND is set to 1 and SSTHRESH is set to half of the window).
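DUAL's periodic delay test can be sketched in a few lines. The function name and parameters below are our own illustration of the rule just described, checked every two RTTs.

```python
# Minimal sketch of DUAL's delay-based decrease test: every two RTTs,
# shrink the window by one-eighth if the current RTT exceeds the
# midpoint of the minimum and maximum RTTs observed so far.

def dual_adjust(cwnd, rtt, rtt_min, rtt_max):
    if rtt > (rtt_min + rtt_max) / 2.0:
        return cwnd - cwnd / 8.0   # decrease by one-eighth
    return cwnd                    # otherwise keep fine-tuning (+1/CWND per ACK)
```

For example, with observed RTTs between 100 ms and 250 ms, a current RTT of 200 ms exceeds the 175 ms midpoint, so a 16-packet window is cut to 14.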
The Tri-S scheme proposed in [20] searches for the operating point based on continuous evaluation of the current throughput gradient. For every RTT, Tri-S increases the window size by one packet and compares the throughput achieved to the

throughput when the window was one packet smaller. It is this difference that determines whether the window is increased, decreased, or left unchanged. TCP Santa Cruz [26] was designed with transmission-media heterogeneity in mind. Using the timestamp option of RFC 1323, it operates by summing the relative delays from the beginning of a session and then updating the measurements at discrete intervals. The bandwidth probing in this work is closely related to Packet Pair [21], which uses the spacing of the ACKs to determine the available bandwidth in the network. Similar to the proactive congestion avoidance mechanism in TCP Vegas [22, 23], this monitoring of the available bandwidth permits detection of the incipient stage of congestion, and allows the congestion window to increase or decrease in response to early warning signs so as to reach a target optimal operating point.

2.4 Chapter Summary

In this chapter, we outlined the design principles of TCP Reno and TCP Vegas. TCP Reno uses packet loss as a signal that the network is congested and reduces its window size accordingly. Therefore, TCP Reno can be regarded as a reactive congestion control mechanism. An appealing alternative, TCP Vegas, uses a sophisticated bandwidth estimation scheme to keep a proper amount of extra data in the network. As a result, it may steer the system away from congestion loss before it actually occurs. Thus, TCP Vegas is a proactive congestion control mechanism. Some variants of TCP Reno and several innovative congestion avoidance mechanisms were also reviewed in this chapter.

Chapter 3

RoVegas: A Router-Assisted Congestion Avoidance Mechanism for TCP Vegas

The most innovative idea of TCP Vegas is its congestion avoidance mechanism. It uses queueing delay as the congestion measure to predict whether congestion is about to happen. Queueing delay may provide more fine-grained information about the network status than the binary signal of packet loss. Based on this additional fine-grained information, TCP Vegas not only reacts to but also avoids congestion. As a result, it can prevent the performance degradation caused by the AIMD strategy and may provide a more stable and efficient transmission than TCP Reno. However, the measurement of queueing delay is noisy, and an inaccurate queueing delay estimate may seriously affect performance. In this chapter, we propose a router-assisted congestion avoidance mechanism (RoVegas) for TCP Vegas. Through the proposed mechanism performed in routers along the round-trip path, RoVegas may obtain more precise queueing delay and fixed delay measurements, and solve several problems inherent in TCP Vegas. The rest of this chapter is organized as follows. Section 3.1 describes the problems

that inhere in the congestion avoidance mechanism of TCP Vegas. Section 3.2 discusses RoVegas. In Section 3.3, related work is reviewed. Sections 3.4 and 3.5 present the analysis and simulation results, respectively. Lastly, we summarize this chapter in Section 3.6.

3.1 Problem Statements

In TCP Vegas, several problems may adversely affect the connection performance. We summarize these problems as follows.

Rerouting: TCP Vegas estimates the BaseRTT to compute the expected throughput and adjusts its window size accordingly; thus it is very important for Vegas connections to estimate this quantity accurately. Rerouting may change the fixed delay (the sum of the propagation delay and packet processing time along the round-trip path; in other words, the round-trip time without queuing delay), which could result in substantial throughput degradation. When the routing path of a connection is changed, a new route with a shorter fixed delay will not cause any serious problem for Vegas, because most likely some packets will experience shorter round-trip times and the BaseRTT will be updated eventually. On the other hand, if the new route has a longer fixed delay, the connection is unable to tell whether the increased round-trip time is due to network congestion or the route change. The source host may misinterpret the increased round-trip time as a signal of congestion in the network and decrease its window size. This is just the opposite of what the source should do.

Persistent Congestion: Persistent congestion is another problem caused by the incorrect estimation of BaseRTT [9]. Overestimation of the BaseRTT in Vegas may have a substantial influence on performance. Suppose that a connection starts while many other active connections exist, the network is congested, and packets are accumulated in the bottleneck. Then, due to the queuing delay caused by the packets of other connections, the packets of the new connection may experience round-trip times that are considerably larger than the actual fixed delay along the path. Hence, the window size of the new connection will be set to a value such that its expected amount of extra data lies between α and β; in fact, there may be much more extra data in the bottleneck queue due to the inaccurate estimation of the fixed delay. The situation can be described more explicitly as follows. TCP Vegas uses the following inequality to detect and control the extra data in the network pipe:

α ≤ (Expected − Actual) × BaseRTT ≤ β.    (3.1)

We can rewrite Eq. (3.1) as:

α ≤ CWND × (1 − BaseRTT/RTT) ≤ β.    (3.2)

An overestimated BaseRTT will shrink the estimated amount of extra data (i.e., CWND × (1 − BaseRTT/RTT)) and cause the Vegas source to misjudge the network as less congested than it is. As a result, the Vegas source sets its window size larger than it should be and therefore puts more extra data into the bottleneck queue. This scenario repeats for each newly added connection, and it may cause the bottleneck node to remain in persistent congestion. Persistent congestion is likely to happen in TCP Vegas due to its fine-tuned congestion avoidance mechanism.

Unfairness: Unlike TCP Reno, TCP Vegas is not biased against connections with longer round-trip times [9, 10]. However, there is still unfairness coming with the nature of Vegas. According to the difference between the expected and actual throughputs, a Vegas source attempts to maintain an amount of extra data between two thresholds α and β by adjusting its congestion window size. The range between α and β introduces uncertainty into the achievable throughput of connections, since Vegas may keep different amounts of extra data in the bottleneck even for connections with the same round-trip path. This prevents better fairness among the competing connections. Furthermore, the inaccurate computation of the expected throughput may also lead to unfairness.
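The effect of an overestimated BaseRTT on Eq. (3.2) can be illustrated with hypothetical numbers: the true fixed delay is 100 ms, but queuing at connection startup inflates the source's estimate to 150 ms, so the source sees only half of the extra data it is actually keeping in the bottleneck.

```python
# Numeric illustration of Eq. (3.2) with hypothetical values: an
# overestimated BaseRTT shrinks the estimated amount of extra data.

def extra_data(cwnd, base_rtt, rtt):
    return cwnd * (1.0 - base_rtt / rtt)    # Eq. (3.2)

cwnd, rtt = 40, 0.200        # packets, seconds
true_base = 0.100            # actual fixed delay
seen_base = 0.150            # BaseRTT inflated by startup queuing

print(extra_data(cwnd, true_base, rtt))   # truly ~20 packets queued
print(extra_data(cwnd, seen_base, rtt))   # source believes only ~10
```

With β set to a few packets, the source would conclude it must shrink its window far less than it actually should, leaving the bottleneck queue persistently loaded.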
Recall that the computation of the expected throughput is based on the measurement of BaseRTT. If Vegas connections cannot estimate the BaseRTT

accurately, fairness may suffer. When a new connection starts sending data while many other connections are active, it may overestimate the fixed delay, resulting in an unfair distribution of bandwidth among the Vegas connections.

Network Asymmetry: Based on the estimated extra data kept in the bottleneck, Vegas updates its congestion window to avoid congestion as well as to maintain high throughput. However, a roughly measured RTT may lead to a coarse adjustment of the congestion window size. If network congestion occurs in the direction of the ACKs (the backward path), Vegas may underestimate the actual throughput and decrease the congestion window size unnecessarily. Ideally, congestion in the backward path should not affect the network throughput in the forward path, which is the data transfer direction. Obviously, the control mechanism must be able to distinguish whether congestion occurs in the forward path and adjust the congestion window size more intelligently.

Incompatibility: TCP Vegas adopts a proactive congestion avoidance scheme; it reduces its congestion window before an actual packet loss occurs. TCP Reno, on the other hand, employs a reactive congestion control mechanism: it keeps increasing its congestion window until a packet loss is detected. Researchers [9, 35, 36] have found that when Reno and Vegas perform head-to-head, Reno generally steals bandwidth from Vegas. This incompatibility between Vegas and Reno depresses the adoption of Vegas on the Internet.

3.2 RoVegas

From the above discussion, we find that the coarse estimation of the fixed delay along the round-trip path, BaseRTT, results in the problems of rerouting, persistent congestion, and unfairness. A Vegas source is also unable to distinguish whether congestion occurs in the forward path, which further leads to unnecessary throughput degradation when congestion occurs on the backward path.
In this section, we propose a router-assisted congestion avoidance mechanism (RoVegas)

for TCP Vegas to deal with these problems. The details of the proposed mechanism are described as follows.

3.2.1 Proposed Mechanism

TCP Vegas estimates a proper amount of extra data to be kept in the network pipe and controls the congestion window size accordingly. The amount is between two thresholds α and β, as shown in Eq. (3.1). When backward congestion occurs, the increased backward queuing time affects the Actual throughput and enlarges the difference between the Expected and Actual throughputs, which results in a decreased congestion window size. Since the network resources in the backward path should not affect the traffic in the forward path, it is unnecessary to reduce the congestion window size when only backward congestion occurs. A measured RTT can be divided into four parts: forward fixed delay (i.e., propagation delay and packet processing time), forward queuing time, backward fixed delay, and backward queuing time. To utilize the network bandwidth efficiently, we redefine the Actual throughput as

Actual′ = CWND / (RTT − QTb),    (3.3)

where RTT is the newly measured round-trip time, QTb is the backward queuing time, and CWND is the current congestion window size. Consequently, Actual′ is the throughput that could be achieved if there were no backward queuing delay along the path. To realize our scheme, we define a new IP option named AQT (accumulated queuing time) to collect the queuing time along the path. Following the general format of IP options described in [44], the fields of an AQT option are laid out as in Fig. 3.1. The option type and length fields indicate the type and length of this IP option. The AQT field expresses the accumulated queuing time that a packet experienced along the routing path. The AQT-Echo field echoes the accumulated queuing time value of an AQT option that was sent by the remote TCP.
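Putting Eq. (3.3) together with the AQT fields just described, the source-side computation on receiving a probing ACK can be sketched as follows. This is a simplified illustration; the function name is our own, and α = 1, β = 3 are merely example thresholds. At the source, the AQT field of a received ACK carries the backward queuing time QTb, while AQT-Echo carries the forward queuing time.

```python
# Sketch of the RoVegas source-side update on receiving a probing ACK.
# Times in seconds, window in packets; aqt = backward queuing time (QTb),
# aqt_echo = forward queuing time, both reported by AQT-enabled routers.

def rovegas_update(cwnd, rtt, aqt, aqt_echo, alpha=1, beta=3):
    base_rtt = rtt - (aqt + aqt_echo)        # fixed delay: RTT minus all queuing
    actual_p = cwnd / (rtt - aqt)            # Actual': drop backward queuing, Eq. (3.3)
    expected = cwnd / base_rtt
    diff = (expected - actual_p) * base_rtt  # extra data on the forward path only
    if diff < alpha:
        return cwnd + 1
    if diff > beta:
        return cwnd - 1
    return cwnd
```

For example, with a 100 ms fixed delay and an RTT of 200 ms caused mostly by 80 ms of backward queuing, plain Vegas would see ∆ well above β and shrink the window, while the computation above attributes only the forward 20 ms of queuing to the data path and keeps the window unchanged.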

Option Type (1 byte) | Option Length (1 byte) | AQT, Accumulated Queuing Time (3 bytes) | AQT-Echo (3 bytes)

Figure 3.1: Fields of an AQT option.

A probing packet is a normal TCP packet (data or ACK) with the AQT option in its IP header. When a RoVegas source sends out a probing packet, it sets the AQT field to zero. An AQT-enabled router (i.e., a router capable of AQT option processing) adds the queuing delay of a received probing packet to the AQT field. The queuing time is computed based on the queuing discipline; the details of how to compute the queuing time of each received probing packet under various queuing disciplines are beyond the scope of our discussion. Whenever a RoVegas destination acknowledges a probing packet, it inserts an AQT option into the ACK. The AQT-Echo field is set to the value of the AQT field of the received packet, and the AQT field is then reset to zero. Through the AQT-enabled routers along the round-trip path, a RoVegas source is able to obtain both the forward queuing time (the value of the AQT-Echo field) and the backward queuing time (the value of the AQT field) from the received probing packet. Moreover, for each probing packet received by a RoVegas source, the BaseRTT can be derived as follows:

BaseRTT = RTT − (AQT + AQT-Echo).    (3.4)

Notice that the derived BaseRTT of a connection will be identical for each probing packet received when both the route and the size of the probing packets are fixed. The derived BaseRTT of RoVegas represents the actual fixed delay along the round-trip path; if the path of a connection is rerouted and the fixed delay changes, the newly derived BaseRTT reflects the rerouting. As a result, the issue of rerouting can be solved. Furthermore, since each RoVegas connection is able to measure the fixed delay without bias, the problem of persistent

congestion can be avoided and the fairness among the competing connections can also be improved. To avoid the unnecessary reduction of the congestion window size, the proposed router-assisted congestion avoidance mechanism operates as follows:

• Derive the Expected throughput, defined as the current congestion window size divided by BaseRTT.
• Calculate Actual′ as the current congestion window size divided by the difference between the newly measured RTT and the backward queuing time.
• Let Diff = (Expected − Actual′) × BaseRTT.
• Let wcur and wnext be the congestion window sizes for the current RTT and the next RTT, respectively. The rule for congestion window adjustment is as follows:

          ⎧ wcur + 1,  if Diff < α
wnext =   ⎨ wcur − 1,  if Diff > β        (3.5)
          ⎩ wcur,      if α ≤ Diff ≤ β

3.2.2 Implementation Issue

RoVegas relies on probing packets to probe the network status; therefore, how often a probing packet is sent on a connection is an important issue, since the window adjustment of RoVegas is performed on a per-RTT basis. Inserting probing packets frequently makes the proposed mechanism robust against network congestion, but it also imposes more overhead on RoVegas. For the overhead induced by the probing packets, we consider the worst case in which every packet carries the AQT option. If the data packet size is 1500 bytes, which is the maximum transmission unit of Ethernet, the overhead ratio for data packets is 8/1500, which is about 0.53%. In a practical implementation, the number of probing packets per RTT can be dynamically adjusted depending on the network status. That is, the more severe the backward congestion, the more frequently the AQT option should be

inserted into a data packet. In this way, the overhead induced by the AQT option can be reduced to an even smaller amount. We make every packet a probing packet and demonstrate that the proposed mechanism effectively improves the performance of TCP Vegas through the analysis and simulation results shown in Sections 3.4 and 3.5.

3.3 Related Work

Congestion control for TCP is an active research area. Since Brakmo et al. [22, 23] proposed TCP Vegas in 1994, claiming to achieve higher throughput with one-fifth to one-half the losses of TCP Reno, there have been quite a lot of studies focusing on TCP Vegas. Ahn et al. [24] performed live Internet experiments with TCP Vegas. They reproduced the claims in [22, 23] under varying background traffic and concluded that Vegas indeed offers improved throughput of at least 3 to 8 percent over Reno. TCP Vegas was also found to retransmit fewer packets and to have a lower average and a lower variance of RTT than Reno. Using a fluid model and simulations, Mo et al. [9] show that Vegas is not biased against connections with longer round-trip times as Reno is. It achieves better fairness of bandwidth sharing among competing connections with different propagation delays. However, they also pointed out that TCP Vegas does not receive a fair share of bandwidth in the presence of a TCP Reno connection. Two problems of Vegas that could seriously impact its performance are also described in [9]. One is the rerouting problem: rerouting may change the fixed delay and therefore bring about an inaccurate estimation of BaseRTT, which may erroneously affect the adjustment of the congestion window size. The other is persistent congestion, which is likewise caused by the inaccurate estimation of BaseRTT. Hasegawa et al. [10] focus on the fairness and stability of congestion control mechanisms for TCP. They use an analytical model to show that TCP Vegas can

offer higher performance and much more stable operation than Reno. However, because of the default values of α and β in the implementation, Vegas connections could not share the total bandwidth in a fair manner. Thus Hasegawa et al. propose an enhanced Vegas that sets α equal to β to remove the uncertainty induced by the range between α and β. Through analytical study and simulation, Boutremans et al. [27] show that in addition to the setting of α and β, the fairness of TCP Vegas critically requires an accurate estimation of the propagation delay. Nevertheless, they see no obvious way to achieve this. To prevent the performance degradation of TCP Vegas in asymmetric networks, Elloumi et al. [29] proposed a modified algorithm. It divides a round-trip time into a forward trip time and a backward trip time in order to remove the effects of backward path congestion. However, it seems unlikely to work without clock synchronization. Another mechanism for solving the issue of Vegas in asymmetric networks is proposed in [30, 31]. Fu et al. employ an end-to-end method to measure the actual flow rate on the forward path at a TCP Vegas source. Based on the difference between the expected rate along the round-trip path and the actual flow rate on the forward path, the source adjusts the congestion window size accordingly. However, in a backward-congested environment the self-clocking behavior of TCP is disturbed, and the bursty nature of TCP traffic makes it hard for the source to decide the measurement interval between two consecutive tagged packets. Moreover, the actual flow rate on the forward path measured by the source may usually be greater than the expected rate along the round-trip path. This may lead to an over-increased congestion window size and cause congestion along the forward path.
To enhance the throughput of Vegas when it performs head-to-head with TCP Reno, Lai [35] suggests two approaches: one is using the random early detection (RED) mechanism in the router, the other is adjusting the parameters of Vegas. Both may improve the performance of Vegas. Feng et al. [36] show that the default configuration of Vegas is indeed incompatible with TCP Reno. However, with a careful analysis of how Reno and Vegas use

(44) lf .... S1. uf R2. R1. ub. D1. .... lb. Figure 3.2: Network model for analysis. buffer space in the routers, Vegas and Reno can be compatible with one another if Vegas is configured properly. Nevertheless, no mechanism has been proposed to configure Vegas automatically.. 3.4. Performance Analysis. In this section, we present a steady-state performance analysis of both Vegas and RoVegas. By investigating the queue length of the bottleneck buffer through the analytical approach, we can clarify the essential nature of these two mechanisms. The network model used in the analysis is depicted in Fig. 3.2. Assuming the source S1 is a greedy source. The destination D1 generates an ACK immediately upon receiving a data packet sent from S1 . Either the forward or backward link between two routers R1 and R2 is the bottleneck along the path. The forward link between two routers has a capacity of uf (data packets per second) and backward link has a capacity ub (ACKs per second). To facilitate the analysis as the backward path is congested, a normalized asymmetric factor k, k =uf /ub , is introduced [33]. The network is defined as asymmetric if the asymmetric factor is greater than one. The service discipline is assumed to be First-In-First-Out (FIFO). Let τ be the BaseRTT (without any queuing delay), lf and lb be the mean numbers of packets queued in the forward and backward bottleneck buffer respectively. Since the win28.

(45) dow size of Vegas converges to a fixed value in steady state, the mean number of packets queued in bottleneck buffer should also be converged to a fixed level [10].. 3.4.1. Analysis on Vegas. The congestion avoidance mechanism of Vegas shown in Eq. (3.1) can be rewritten as below:. RT T RT T × β. × α ≤ CW N D ≤ RT T − BaseRT T RT T − BaseRT T. (3.6). The BaseRTT and RTT can be expressed as follows: BaseRT T = τ, RT T = τ +. (3.7). lb lf + . ub uf. (3.8). After substitution of Eq. (3.7) and Eq. (3.8), Eq. (3.6) can be rewritten as:. τ u f ub + l f ub + l b uf τ u f ub + l f ub + l b uf × β. × α ≤ CW N D ≤ l f ub + l b uf l f ub + l b uf. (3.9). Symmetric Network (k ≤ 1): If the bottleneck is in the forward path, packets will be accumulated in the forward bottleneck queue and no packets will be queued in the backward path, that is lb = 0, thus Eq. (3.9) can be simplified as:. τ uf + l f τ uf + l f × β. × α ≤ CW N D ≤ lf lf. (3.10). Since S1 is the only traffic source in the network thus it may occupy all the bandwidth of the bottleneck. Based on the fluid approximation, the congestion window size of S1 can be obtained through the bandwidth-delay product of the bottleneck as follows: CW N D = uf × (τ +. lf ). uf. (3.11). By substituting Eq. (3.11) into Eq. (3.10), we have α ≤ lf ≤ β.. 29. (3.12).
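As a concrete illustration of the rule in Eq. (3.6), the per-RTT window adjustment of Vegas can be sketched in code. This is a minimal fluid-model sketch, not the actual Vegas implementation; the function name, the unit window steps, and the defaults α = 1, β = 3 are our illustrative choices:

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=1, beta=3):
    """One congestion-avoidance step of TCP Vegas (fluid-model sketch).

    diff estimates how many of this connection's packets sit queued at
    the bottleneck; Vegas tries to hold alpha <= diff <= beta, which is
    Eq. (3.6) rearranged.
    """
    expected = cwnd / base_rtt             # rate with no queuing delay
    actual = cwnd / rtt                    # rate actually observed
    diff = (expected - actual) * base_rtt  # extra packets in the queue
    if diff < alpha:
        return cwnd + 1                    # queue too short: grow window
    if diff > beta:
        return cwnd - 1                    # queue too long: shrink window
    return cwnd                            # in band: hold steady
```

In steady state the update leaves the window unchanged exactly when the estimated backlog diff lies between α and β, which is the regime characterized by Eq. (3.12).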

The throughput T of S1 can also be derived from Eq. (3.8) and Eq. (3.11) as:

    T = CWND/RTT = uf.    (3.13)

From Eq. (3.12) and Eq. (3.13), we observe that when the bottleneck appears in the forward path, the mean number of packets queued in the forward bottleneck buffer is kept stable between α and β, and the link bandwidth is always fully utilized in steady state. This observation matches the design goal of Vegas.

Asymmetric Network (k > 1): If the bottleneck exists in the backward path, then the queue of the backward bottleneck node will build up and no packets will be queued in the forward path, that is, lf = 0; therefore Eq. (3.9) can be rewritten as:

    (τ ub + lb)/lb × α ≤ CWND ≤ (τ ub + lb)/lb × β.    (3.14)

Similar to Eq. (3.11), the window size of S1 can also be obtained from the bandwidth-delay product of the bottleneck link:

    CWND = ub × (τ + lb/ub).    (3.15)

By substituting Eq. (3.15) into Eq. (3.14), we have

    α ≤ lb ≤ β.    (3.16)

In the meantime, the throughput T of S1 can be derived from Eq. (3.8) and Eq. (3.15) as:

    T = CWND/RTT = ub = uf/k.    (3.17)

From Eq. (3.16) we find that Vegas is unable to distinguish whether or not congestion occurs in the forward path. It keeps a steady quantity of extra data between α and β on the backward path, which may lead to poor utilization of the forward path. As shown in Eq. (3.17), the throughput of S1 is limited by the capacity of the backward path. Notably, an ACK in the backward path implies that a data packet has arrived at its destination. Therefore, the throughput of S1 is uf/k (data packets per second).
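The throughput ceiling of Eq. (3.17) can be checked with a toy calculation; the link capacities below are assumed for illustration only:

```python
# Toy check of Eq. (3.17): with a backward-path bottleneck, Vegas
# throughput is clocked by the ACK stream and capped at ub = uf / k.
# The capacities are illustrative, not taken from the dissertation.
uf = 1000.0        # forward capacity (data packets/s), assumed
k = 8.0            # normalized asymmetric factor k = uf / ub
ub = uf / k        # backward capacity (ACKs/s)

vegas_throughput = ub                    # Eq. (3.17): T = CWND/RTT = ub = uf/k
utilization = vegas_throughput / uf      # fraction of forward capacity used
print(vegas_throughput, utilization)     # 125.0 0.125
```

With k = 8, Vegas leaves 7/8 of the forward capacity idle, since the backward ACK stream clocks data packets out at only ub per second.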

3.4.2 Analysis on RoVegas

The congestion avoidance mechanism of RoVegas can be briefly expressed as follows:

    α/BaseRTT ≤ CWND/BaseRTT − CWND/(RTT − lb/ub) ≤ β/BaseRTT.    (3.18)

By Eq. (3.18), we have the congestion window size of RoVegas as:

    (RTT − lb/ub)/(RTT − BaseRTT − lb/ub) × α ≤ CWND ≤ (RTT − lb/ub)/(RTT − BaseRTT − lb/ub) × β.    (3.19)

From Eq. (3.7) and Eq. (3.8), Eq. (3.19) can be rewritten as:

    (τ uf + lf)/lf × α ≤ CWND ≤ (τ uf + lf)/lf × β.    (3.20)

The result of Eq. (3.20) is identical to that of Eq. (3.10), so if the bottleneck is in the forward path (i.e., lb = 0), the behavior of RoVegas will be the same as that of Vegas. However, Eq. (3.20) also reveals that under backward congestion the throughput of RoVegas is not simply limited by the bandwidth of the backward path, as that of Vegas is. As shown in Eq. (3.18), RoVegas always attempts to maintain a proper amount of extra data in the forward path regardless of where the congestion occurs.

However, TCP is a "self-clocking" protocol, that is, it uses ACKs as a "clock" to strobe new packets into the network [2]. Hence, when the backward path is congested, the rate of the data flow in the forward direction is throttled to some extent by the rate of the ACK flow. There exists a further restriction in Vegas that may limit the growth of the congestion window: the congestion window will not be increased if the source cannot keep up with it, that is, if the difference between the congestion window size and the amount of outstanding data is larger than two maximum-sized packets [25]. Being a variant of TCP Vegas, RoVegas also complies with this restriction.

In an asymmetric network, for example k = 8, assuming that in steady state the forward path can be fully utilized by S1, 7/8 of the ACKs will be dropped in the backward path. With TCP, the ACKs are cumulative [45], that is, later ACKs carry all the information contained in earlier ACKs. In this case, a surviving ACK may indicate that eight data packets have arrived at the destination. Once a surviving ACK is received by the source, the difference between the congestion window size and the amount of outstanding data is eight packets, which restricts the growth of the congestion window. In fact, the forward path may not be fully utilized by RoVegas with k = 8. For an asymmetric network, if the dropping ratio of ACKs reaches 2/3, the congestion window of RoVegas will not be increased, since for each ACK received by the RoVegas source, the difference between the congestion window size and the amount of outstanding data will be three packets. In such a situation, RoVegas enters the steady state and the growth of the congestion window stops. For each ACK received, the RoVegas source may send three packets back-to-back.

Let F be the throughput ratio of RoVegas to Vegas (i.e., F = throughput of RoVegas / throughput of Vegas). In asymmetric networks, we have the following throughput relationship between Vegas and RoVegas:

    1 < F ≤ 3, ∀k > 1.    (3.21)

Note that the throughput of RoVegas includes the overhead induced by the AQT option, so the actual throughput ratio of RoVegas to Vegas should be slightly smaller than F. Equation (3.21) will be further verified by the following performance evaluation.

3.5 Performance Evaluation

In this section, we compare the performance of TCP RoVegas with that of TCP Vegas by using the network simulator ns-2.1b9a [46]. We show the performance results in backward-congestion environments, the bias experiments, the fairness investigations among competing connections, and the study of gradual deployment. The FIFO service discipline is assumed. Every packet of RoVegas is a probing packet. Whenever a throughput of RoVegas is computed, the overhead induced by
