4. Behind Traditional TCP: protocols for high-throughput and high-latency networks
PA191: Advanced Computer Networking
Eva Hladká (slides by Petr Holub and Tomáš Rebok)
Faculty of Informatics, Masaryk University, Autumn 2015

Lecture overview
1. Traditional TCP and its issues
2. Improving the traditional TCP
   • Multi-stream TCP
   • Web100
3. Conservative Extensions to TCP
   • GridDT
   • Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
4. TCP Extensions with IP Support
   • QuickStart, E-TCP, FAST
5. Approaches Different from TCP
   • tsunami
   • RBUDP
   • XCP
   • SCTP, DCCP, STP, Reliable UDP, XTP
6. Conclusions
7. Literature

Traditional TCP and its issues

Protocols for reliable data transmission
Protocols for reliable data transmission have to ensure:
• reliability of the transfer
  • retransmissions of lost packets
  • FEC might be usefully employed
• protection from congestion
  • of the network and of the receiver
Behavior evaluation:
• aggressiveness - the ability to utilise the available bandwidth
• responsiveness - the ability to recover from a packet loss
• fairness - getting a fair portion of the network throughput when more streams/participants use the network

Problem statement
• network links with high capacity and high latency
  • iGrid 2005: San Diego - Brno, RTT = 205 ms
  • SC|05: Seattle - Brno, RTT = 174 ms
• traditional TCP is not suitable for such an environment:
  • 10 Gb/s, RTT = 100 ms, 1500 B MTU ⇒ a sending/outstanding window of ≈ 83,333 packets ⇒ at most one packet may be lost per roughly 1:36 hours to sustain the full rate
  • terribly slow; if losses are more frequent, the maximum throughput cannot be reached
• How can better network utilization be achieved?
• How can a reasonable co-existence with traditional TCP be ensured?
• How can a gradual deployment of a new protocol be ensured?

Traditional TCP - flow control vs. congestion control
[Figure: sender - network - receiver; with no control the sender may overrun both; flow control (rwnd) protects the receiver, congestion control (cwnd) protects the network]

Traditional TCP I.
• transmission rate adjustment
• flow control is for receivers (a small-capacity receiver must not be overrun)
• congestion control is for the network (internal congestion can occur even with a large-capacity receiver)
• congestion collapse was first observed in 1986 by V. Jacobson; congestion control was added to TCP (TCP Tahoe) in 1988
(From Computer Networks, A. Tanenbaum)

Traditional TCP II.
Flow control
• an explicit feedback from the receiver(s) using rwnd
• deterministic
Congestion control
• an approximate sender-side estimation of the available throughput (using cwnd)
The final window used:
  ownd = min{rwnd, cwnd}
The bandwidth bw can be computed as:
  bw = 8 · MSS · ownd / RTT        (1)
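To make the problem-statement figures concrete, a minimal Python sketch that inverts formula (1) to obtain the outstanding window a sender must keep in flight; the numbers (10 Gb/s, RTT = 100 ms, 1500 B segments) are the ones quoted above, and the helper name is only illustrative.

    def required_window(bw_bps, rtt_s, mss_bytes):
        """Invert formula (1), bw = 8 * MSS * ownd / RTT, for ownd (in segments)."""
        return bw_bps * rtt_s / (8 * mss_bytes)

    if __name__ == "__main__":
        ownd = required_window(bw_bps=10e9, rtt_s=0.100, mss_bytes=1500)
        # ~83,333 outstanding segments, matching the problem-statement slide
        print(f"required outstanding window: {ownd:,.0f} segments")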
Traditional TCP II. - Flow Control
[Figure: TCP segments exchanged in both directions (Source Port, Destination Port, Sequence Number, Acknowledgment, HL/Flags, Window, Checksum, Options); the sliding window distinguishes data already acknowledged, data sent but unacknowledged, data that may still be sent, and data outside the window]

Traditional TCP - Tahoe and Reno
Congestion control:
• traditionally based on the AIMD (Additive Increase Multiplicative Decrease) approach
Tahoe [1]:
• cwnd = cwnd + 1 ... per RTT without loss (above ssthresh)
• ssthresh = 0.5 · cwnd, cwnd = 1 ... per every loss
Reno [2] adds:
• fast retransmission
  • a TCP receiver sends an immediate duplicate ACK when an out-of-order segment arrives
  • all segments after the dropped one trigger duplicate ACKs
  • a loss is indicated by 3 duplicate ACKs (i.e., four successive identical ACKs without intervening segments)
  • once these are received, TCP performs a fast retransmission without waiting for the retransmission timer to expire
• fast recovery - the slow-start phase is not used any more; ssthresh = cwnd = 0.5 · cwnd

Traditional TCP - Tahoe state machine
• connection opening: cwnd = 1 segment
• Slow Start: exponential increase of cwnd until cwnd = SSTHRESH; for every useful acknowledgment received, cwnd := cwnd + (1 segment size)
• Congestion Avoidance: additive increase of cwnd; for every useful acknowledgment received, cwnd := cwnd + (segment size)·(segment size)/cwnd, i.e., it takes a full window to increment the window size by one
• retransmission timeout: SSTHRESH := cwnd/2, cwnd := 1 segment (back to Slow Start)

[Figure: TCP Tahoe - evolution of cwnd and ssthresh over time (RTT)]

Traditional TCP - Reno state machine
• Slow Start and Congestion Avoidance as in Tahoe; a retransmission timeout sets SSTHRESH := cwnd/2 and cwnd := 1 segment
• Fast Recovery: entered when 3 duplicate ACKs are received; exponential increase beyond cwnd; when the expected ACK is received, cwnd := cwnd/2 and Congestion Avoidance continues

Traditional TCP - Reno II.
[Figure: TCP Reno - evolution of cwnd and ssthresh over time (RTT)]

TCP Vegas
Vegas - a concept of congestion control [3]:
• when a network is congested, the RTT becomes higher
• the RTT is monitored during the transmission
• when an RTT increase is detected, the congestion window size is linearly reduced
• a possibility to measure the available network bandwidth using inter-packet spacing/dispersion

Traditional TCP - reaction to packet loss
A reaction to packet loss - retransmission of:
• Tahoe: the whole actual window ownd
• Reno: a single segment in the Fast Retransmission mode
• NewReno: more segments in the Fast Retransmission mode
• Selective Acknowledgement (SACK): just the lost packets
Fundamental question: How can a sufficient size of cwnd (under real conditions) be achieved in a network with high capacity and high RTT ... without affecting/disallowing the "common" users from using the network?
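A minimal sketch of the Tahoe/Reno window dynamics described above (slow start, congestion avoidance, multiplicative decrease); the loss instants and the initial ssthresh are made-up illustration values, not measurements from the slides.

    def reno_trace(rtts=60, ssthresh=32.0, loss_rtts=(25, 45)):
        """Return a per-RTT trace of cwnd for a Reno-like sender."""
        cwnd, trace = 1.0, []
        for t in range(rtts):
            if t in loss_rtts:                      # loss signalled by 3 duplicate ACKs
                ssthresh = cwnd / 2                 # multiplicative decrease
                cwnd = ssthresh                     # fast recovery: no return to slow start
            elif cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)      # slow start: exponential growth per RTT
            else:
                cwnd += 1                           # congestion avoidance: +1 segment per RTT
            trace.append((t, round(cwnd, 1)))
        return trace

    if __name__ == "__main__":
        for t, w in reno_trace():
            print(t, w)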
Traditional TCP Response Function
• represents the relation between bw and a steady-state packet loss rate p
• cwnd_average ≈ 1.2 / √p  (for MSS-sized segments)
• using (1): bw ≈ 8 · MSS · 1.2 / (RTT · √p)
The responsiveness of traditional TCP:
• assuming that a packet is lost when cwnd = bw · RTT (with bw expressed in bytes per second), the recovery time is
  ρ ≈ bw · RTT² / (2 · MSS)

Traditional TCP - Responsiveness
[Figure: TCP responsiveness (recovery time) as a function of RTT (0-200 ms) for link capacities C = 622 Mbit/s, 2.5 Gbit/s and 10 Gbit/s]

Traditional TCP - Fairness I.
• fairness in a point of equilibrium
• the fairness is considered for:
  • streams with different RTT
  • streams with different MTU
• the speed of convergence to the point of equilibrium DOES matter!

[Figure: convergence of two competing flows under cwnd += MSS, cwnd *= 0.5 (30 steps)]
[Figure: convergence of two competing flows under cwnd += MSS, cwnd *= 0.83 (30 steps)]

Improving the traditional TCP

Multi-stream TCP
• assumes multiple TCP streams transferring a single data flow
• in fact, it improves the TCP's performance/behavior only in cases of isolated packet losses
  • a loss of more packets usually affects more TCP streams
• usually available because of a simple implementation (a minimal sketch of such striping follows the Web100 slide below)
  • bbftp, GridFTP, Internet Backplane Protocol, ...
• drawbacks:
  • more complicated than traditional TCP (more threads are necessary)
  • the startup is accelerated only linearly
  • leads to a synchronous overloading of queues and caches in the routers

TCP implementation tuning I.
• cooperation with HW
  • Rx/Tx TCP Checksum Offloading
  • ordinarily available
• zero copy
  • accessing the network usually leads to several data copies: user-land ↔ kernel ↔ network card
  • page flipping - user-land ↔ kernel data movement
  • support for, e.g., sendfile()
  • implementations for Linux, FreeBSD, Solaris, ...

TCP implementation tuning II. - Web100
• Web100 [4, 5]
• a software package that implements instruments in the Linux TCP/IP stack - the TCP Kernel Instrumentation Set (TCP-KIS)
  • more than 125 instruments
  • information available via /proc
• distributed in two pieces:
  • a kernel patch adding the instruments
  • a suite of "userland" libraries and tools for accessing the kernel instrumentation (command-line, GUI)
• the Web100 software allows:
  • monitoring (extended statistics)
  • tuning of the instruments
  • support for auto-tuning
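The multi-stream striping sketch referred to above: one logical data flow split across several parallel TCP connections, using plain Python sockets in a localhost demo. The port number and the fixed-size striping scheme are illustrative assumptions, not how bbftp or GridFTP actually frame their streams.

    import socket
    import threading

    def sink_server(port, nconn, totals, ready):
        """Accept nconn connections and count the bytes received on each."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("127.0.0.1", port))
        srv.listen(nconn)
        ready.set()                                  # the listening socket is up
        def drain(conn, idx):
            while True:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                totals[idx] += len(chunk)
            conn.close()
        workers = []
        for i in range(nconn):
            conn, _ = srv.accept()
            t = threading.Thread(target=drain, args=(conn, i))
            t.start()
            workers.append(t)
        for t in workers:
            t.join()
        srv.close()

    def multi_stream_send(data, port, nstreams=4):
        """Stripe a single data flow over nstreams parallel TCP connections."""
        stripe = (len(data) + nstreams - 1) // nstreams
        def send_stripe(i):
            with socket.create_connection(("127.0.0.1", port)) as s:
                s.sendall(data[i * stripe:(i + 1) * stripe])
        senders = [threading.Thread(target=send_stripe, args=(i,)) for i in range(nstreams)]
        for t in senders:
            t.start()
        for t in senders:
            t.join()

    if __name__ == "__main__":
        PORT, N = 50007, 4                           # illustrative values
        totals, ready = [0] * N, threading.Event()
        srv = threading.Thread(target=sink_server, args=(PORT, N, totals, ready))
        srv.start()
        ready.wait()
        multi_stream_send(b"x" * 1_000_000, PORT, N)
        srv.join()
        print("bytes received per stream:", totals)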
Conservative Extensions to TCP

GridDT
• a collection of ad-hoc modifications :(
• correction of ssthresh
• faster slow start
• a modification of AIMD for congestion control:
  • cwnd = cwnd + a ... per RTT without packet loss
  • cwnd = b · cwnd ... per packet loss
• just the sender's side has to be modified

GridDT - fairness
[Figure: convergence of two competing flows with per-flow additive increments (GridDT)]

GridDT - example
[Figure: test topology - Host #1 at CERN (GVA), bottleneck POS 2.5 Gb/s to Starlight (CHI), POS 10 Gb/s onward to Host #2 at Sunnyvale]
• TCP Reno performance (cf. formula (1) above):
  • first stream GVA ↔ Sunnyvale: RTT = 181 ms; average throughput over a period of 7000 s = 202 Mb/s
  • second stream GVA ↔ CHI: RTT = 117 ms; average throughput over a period of 7000 s = 514 Mb/s
  • links utilization 71.6 %
• GridDT tuning in order to improve fairness between two TCP streams with different RTT:
  • first stream GVA ↔ Sunnyvale: RTT = 181 ms, additive increment A = 7; average throughput = 330 Mb/s
  • second stream GVA ↔ CHI: RTT = 117 ms, additive increment B = 3; average throughput = 388 Mb/s
  • links utilization 71.8 %
[Figure: throughput over time of the two streams with different RTT sharing a 1 Gbps bottleneck (A = 7, RTT = 181 ms vs. B = 3, RTT = 117 ms, with per-connection averages)]

Scalable TCP
• proposed by Tom Kelly [6]
• congestion control is not AIMD any more:
  • cwnd = cwnd + 0.01 · cwnd ... per RTT without packet loss (cwnd = cwnd + 0.01 ... per ACK)
  • cwnd = 0.875 · cwnd ... per packet loss
  ⇒ Multiplicative Increase Multiplicative Decrease (MIMD)
• for smaller window sizes and/or a higher loss rate in the network, Scalable TCP switches into AIMD mode

Scalable TCP - recovery times
[Figure: Packet loss recovery times for the traditional TCP (left) are proportional to cwnd and RTT. A Scalable TCP connection (right) has packet loss recovery times that are proportional to the connection's RTT only. (Note: link capacity c < C.)]
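A minimal sketch of the Scalable TCP update rule quoted above (+0.01·cwnd per loss-free RTT, ×0.875 on loss), together with the fall-back to classic AIMD for small windows; the switch-over threshold of 16 segments is an illustrative assumption, not a value given on the slide.

    LEGACY_WND = 16.0      # assumed window below which traditional AIMD behaviour is used

    def scalable_on_rtt_without_loss(cwnd):
        if cwnd < LEGACY_WND:
            return cwnd + 1.0        # traditional additive increase
        return cwnd * 1.01           # MIMD increase: cwnd += 0.01 * cwnd per RTT

    def scalable_on_loss(cwnd):
        if cwnd < LEGACY_WND:
            return cwnd / 2          # traditional multiplicative decrease
        return cwnd * 0.875          # Scalable TCP: back off by 1/8

    if __name__ == "__main__":
        w = 1000.0
        for rtt in range(200):       # a loss roughly every 50 RTTs, for illustration
            w = scalable_on_loss(w) if rtt % 50 == 49 else scalable_on_rtt_without_loss(w)
        print(round(w, 1))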
Scalable TCP - fairness I.
[Figure: two concurrent Scalable TCP streams; Scalable control switched on above 30 Mb/s; twice the number of steps in comparison with the previous simulations]

Scalable TCP - fairness II.
[Figure: a Scalable TCP stream competing with a traditional TCP stream; Scalable control switched on above 30 Mb/s; twice the number of steps]

Scalable TCP - Response curve
[Figure: response curves (throughput vs. loss rate 0.0001 to 0.1) of Standard TCP and Scalable TCP]

High-Speed TCP (HSTCP)
• Sally Floyd, RFC 3649 [7]
• congestion control AIMD/MIMD:
  • cwnd = cwnd + a(cwnd) ... per RTT without loss (cwnd = cwnd + a(cwnd)/cwnd ... per ACK)
  • cwnd = b(cwnd) · cwnd ... per packet loss
• emulates the behavior of traditional TCP for small window sizes and/or higher packet loss rates in the network

High-Speed TCP (HSTCP) - proposed parametrization
  b(cwnd) = 0.5 - 0.4 · (ln(cwnd) - 3.64) / 7.69
  a(cwnd) = 2 · cwnd^0.8 · b(cwnd) / (12.8 · (2 - b(cwnd)))
(i.e., b falls from 0.5 at cwnd = 38 segments to 0.1 at cwnd = 83,000 segments, and a grows with the window)
[Figure: evolution of cwnd (up to ~80,000 segments) over time (RTT) under the HSTCP parametrization]

High-Speed TCP (HSTCP) - properties
• a parametrization equivalent to Scalable TCP is possible: Linear HSTCP
• a comparison with Multi-stream TCP:
  N(cwnd) ≈ 0.23 · cwnd^0.4
  • N(cwnd) is the number of parallel TCP connections emulated by the HighSpeed TCP response function with congestion window cwnd
• neither Scalable TCP nor HSTCP deal (in any sophisticated way) with the slow-start phase

HSTCP - Response curve
[Figure: response curves (sending rate vs. loss rate 1e-10 to 0.1) of Regular TCP, HighSpeed TCP and Scalable TCP]
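A sketch of the HSTCP a(cwnd)/b(cwnd) functions as reconstructed above, using the RFC 3649 anchor points Low_Window = 38 and High_Window = 83,000 segments; treat the closed forms as an approximation of the RFC's tabulated values rather than a definitive implementation.

    import math

    LOW_W, HIGH_W = 38.0, 83000.0     # RFC 3649 anchor windows (in segments)

    def b(w):
        """Decrease factor: 0.5 at w = 38, falling to 0.1 at w = 83,000."""
        if w <= LOW_W:
            return 0.5
        return 0.5 - 0.4 * (math.log(w) - math.log(LOW_W)) / (math.log(HIGH_W) - math.log(LOW_W))

    def a(w):
        """Per-RTT additive increase: 1 segment for small windows, ~70 at w = 83,000."""
        if w <= LOW_W:
            return 1.0
        return 2.0 * w**0.8 * b(w) / (12.8 * (2.0 - b(w)))

    if __name__ == "__main__":
        for w in (38, 1000, 10000, 83000):
            print(w, round(a(w), 1), round(b(w), 3))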
H-TCP I.
• created by researchers at the Hamilton Institute in Ireland
• a simple change to the cwnd increase function
• increases its aggressiveness (in particular, the rate of additive increase) as the time since the previous loss (backoff) increases
• the increase rate a is a function of the elapsed time since the last backoff
• the AIMD mechanism is used
• preserves many of the key properties of standard TCP: fairness, responsiveness, relationship to buffering

H-TCP II. - notation
• Δ ... the time elapsed since the last congestion event
• Δ_L ... for Δ ≤ Δ_L the standard TCP growth is used
• Δ_B ... the bandwidth-change threshold above which the standard 0.5 back-off is used (i.e., for significant bandwidth changes)
• T_min, T_max ... the minimal, resp. maximal, RTTs measured
• B(k) ... the maximum throughput measured during the last interval without packet loss

H-TCP III.
  cwnd = cwnd + 2(1 - β) · a(Δ) / cwnd   ... per ACK (β denotes the current decrease factor b(B))
  cwnd = b(B) · cwnd                     ... per loss
where
  a(Δ) = 1                               for Δ ≤ Δ_L
  a(Δ) = max{ ā(Δ) · T_min, 1 }          for Δ > Δ_L
  b(B) = 0.5                             if |B(k+1) - B(k)| / B(k) > Δ_B
  b(B) = min{ T_min / T_max, 0.8 }       otherwise
  ā(Δ) = 1 + 10(Δ - Δ_L) + ((Δ - Δ_L)/2)²   ... a quadratic increment function

BIC-TCP
• the default algorithm in Linux kernels (2.6.8 and above)
• uses a binary-search algorithm for the cwnd update [8]
• 4 phases: (1) a reaction to a packet loss, (2) additive increase, (3) binary search, (4) maximum probing

BIC-TCP - (1) Packet loss
• BIC-TCP starts from the TCP slow start
• when a loss is detected, it uses multiplicative decrease (as standard TCP does) and remembers the windows just before and after the loss event:
  • previous window size → W_max (the size of cwnd before the loss)
  • reduced window size → W_min (the size of cwnd after the loss)
• ⇒ because the loss occurred when cwnd reached W_max, the point of equilibrium of cwnd will be searched for in the range (W_min; W_max)

BIC-TCP - (2) Additive increase
• starting the search directly from cwnd = (W_min + W_max)/2 might be too challenging for the network
• thus, when (W_min + W_max)/2 > W_min + S_max, additive increase takes place: cwnd = W_min + S_max
• the window then increases linearly by S_max every RTT

BIC-TCP - (3) Binary search
• once the target (cwnd = (W_min + W_max)/2) is reached, W_min = cwnd
• otherwise (a packet loss happened) W_max = cwnd
• the search continues towards the new target (using the additive increase, if necessary) until the change of cwnd is smaller than the constant S_min
• at that point, cwnd = W_max is set
The points (2) and (3) lead to a linear (additive) increase, which turns into a logarithmic one (binary search).

BIC-TCP - (4) Maximum probing
• the inverse process to points (3) and (2)
• first, the inverse binary search takes place (until the cwnd growth is greater than S_max)
• once the cwnd growth is greater than S_max, linear growth (by a reasonably large fixed increment) takes place
• i.e., first exponential growth, then linear growth
Assumed benefits:
• traditional TCP "friendliness"
• during the "plateau" (3), the competing TCP flows are able to grow
• AIMD behavior (even though faster) during the (2) and (4) phases
• a more stable window size ⇒ better network utilization
• most of the time, BIC-TCP should spend in the "plateau" (3)
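A simplified sketch of the BIC-TCP search described above: multiplicative decrease on loss, additive increase clamped by S_max towards the midpoint, and a binary-search step once the midpoint is within reach. The S_max/S_min/β constants are illustrative, and slow start as well as the maximum-probing phase are omitted.

    S_MAX, S_MIN, BETA = 32.0, 0.25, 0.875   # illustrative constants

    def bic_on_loss(cwnd):
        """(1) Remember the window around the loss and reduce multiplicatively."""
        w_max = cwnd                  # window just before the loss
        cwnd = cwnd * BETA            # reduced window after the loss
        return cwnd, cwnd, w_max      # new cwnd, W_min, W_max

    def bic_per_rtt(cwnd, w_min, w_max):
        """(2)+(3) One loss-free RTT: grow towards the midpoint of (W_min, W_max)."""
        target = (w_min + w_max) / 2
        if target - cwnd > S_MAX:
            cwnd += S_MAX             # (2) additive increase, S_max per RTT
        elif w_max - cwnd > S_MIN:
            cwnd = target             # (3) binary-search jump to the midpoint
        else:
            cwnd = w_max              # search converged; max probing would follow
        w_min = cwnd                  # no loss at this window: raise the lower bound
        return cwnd, w_min, w_max

    if __name__ == "__main__":
        cwnd, w_min, w_max = bic_on_loss(1000.0)
        for _ in range(20):
            cwnd, w_min, w_max = bic_per_rtt(cwnd, w_min, w_max)
            print(round(cwnd, 2))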
CUBIC-TCP
• even though it is quite scalable, fair, and stable, BIC's growth function is considered to be still too aggressive for TCP
  • especially under short RTTs or in low-speed networks
• CUBIC-TCP:
  • a new release of BIC, which uses a cubic window function
  • for the purpose of simplicity in protocol analysis, the number of phases was further reduced
  W_cubic = C · (T - K)³ + W_max
  where C is a scaling constant, T is the time elapsed since the last loss event, W_max is the window size before the loss event, K = ∛(W_max · β / C), and β is a constant decrease factor
[Figure: CUBIC cwnd over time - steady-state behavior around W_max, followed by max probing]

TCP Extensions with IP Support

Quick-Start (QS) / Limited Slow-Start I.
• there is a strong assumption that the slow-start phase cannot be improved without an interaction with lower network layers
• a proposal: a 4-byte option in the IP header, which comprises the QS TTL and Initial Rate fields
• a sender that wants to use QS sets the QS TTL to an arbitrary (but high enough) value and the Initial Rate to the requested rate at which it wants to start sending, and sends the SYN packet

Quick-Start (QS) / Limited Slow-Start II.
• each router on the path that supports QS decreases the QS TTL by one and decreases the Initial Rate, if necessary
• the receiver sends the QS TTL and Initial Rate back to the sender in the SYN/ACK packet
• the sender knows whether all the routers on the path support QS (by comparing the QS TTL with the TTL); a toy simulation of this negotiation follows the E-TCP slides below
• the sender sets the appropriate cwnd and starts using its congestion control mechanism (e.g., AIMD)
• requires changes in the IP layer! :-(

E-TCP I.
Explicit Congestion Notification (ECN)
• a component of Active Queue Management (AQM)
• a bit which is set by routers when congestion of a link/buffer/queue is approaching
• the ECN flag has to be mirrored by the receiver
• TCP should react to the ECN bit being set in the same way as to a packet loss
• requires the routers' administrators to configure AQM/ECN :-(

E-TCP II.
E-TCP
• proposes to mirror the ECN bit just once (for the first time only)
• freezes the cwnd when an ACK having the ECN bit set is received from the receiver
• requires introducing small (synthetic) losses into the network in order to perform multiplicative decrease, because of fairness
• requires a change in the receivers' reaction to the ECN bit :-(
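The toy simulation of the Quick-Start negotiation referred to above: routers that understand the option decrement the QS TTL and may lower the requested rate, and the sender accepts the rate only if the QS TTL was decremented at every hop. The path description and rate units are made up for the illustration.

    def quick_start(path, qs_ttl=64, requested_rate=1000.0):
        """path: list of (supports_qs, approvable_rate) tuples, one per router.
        Returns the approved initial rate, or None if any router ignored the option."""
        ttl, rate, hops = qs_ttl, requested_rate, 0
        for supports_qs, approvable in path:
            hops += 1                         # the ordinary IP TTL drops at every hop
            if supports_qs:
                ttl -= 1                      # a QS-capable router decrements the QS TTL
                rate = min(rate, approvable)  # ... and may reduce the Initial Rate
            # a router without QS support forwards the option untouched
        # the receiver echoes (ttl, rate) in the SYN/ACK; the sender compares decrements
        if qs_ttl - ttl == hops:
            return rate                       # every hop approved: start at this rate
        return None                           # otherwise fall back to the normal slow start

    if __name__ == "__main__":
        print(quick_start([(True, 800.0), (True, 500.0), (True, 900.0)]))   # -> 500.0
        print(quick_start([(True, 800.0), (False, 500.0), (True, 900.0)]))  # -> None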
FAST
• Fast AQM Scalable TCP (FAST) [10]
• uses end-to-end delay, ECN and packet losses for congestion detection/avoidance
• if too few packets are queued in the routers (detected by RTT monitoring), the sending rate is increased
• differences from TCP Vegas:
  • TCP Vegas makes fixed-size adjustments to the rate, independent of how far the current rate is from the target rate
  • FAST TCP makes larger steps when the system is further from equilibrium and smaller steps near equilibrium
• if ECN is available in the network, FAST TCP can be extended to use ECN marking to replace/supplement queueing delay and packet loss as the congestion measure

Approaches Different from TCP

tsunami
• a TCP connection for an out-of-band control channel:
  • negotiation of connection parameters
  • requests for retransmissions - uses NACKs instead of ACKs
  • negotiation of connection termination
• a UDP channel for the data transmission
• MIMD congestion control
• highly configurable/customizable
  • MIMD parameters, loss threshold, maximum size of the retransmission queue, the interval of sending the retransmission requests, etc.

Reliable Blast UDP - RBUDP
• similar to tsunami - an out-of-band TCP channel for control, UDP for the data transmission
• proposed for disk-to-disk transmissions, resp. transmissions where the complete transmitted data can be kept in the sender's memory
• sends data at a user-defined rate
  • app_perf (a clone of iperf) is used for an estimation of the network's/receiver's capacity

Reliable Blast UDP - RBUDP: time sequence
[Figure 1: The Time Sequence Diagram of RBUDP - UDP data traffic and TCP signaling traffic between sender and receiver. Source: E. He, J. Leigh, O. Yu, T. A. DeFanti, "Reliable Blast UDP: Predictable High Performance Bulk Data Transfer," IEEE Cluster Computing 2002, Chicago, Illinois, Sept. 2002.]
• A - start of the transmission (using the pre-defined rate)
• B - end of the transmission
• C - sending the DONE signal via the control channel; the receiver responds with a bitmap of the data that have arrived
• D - re-sending of the missing data
• E-F-G - end of the transmission
• the steps C and D repeat until all the data are delivered (a simplified sketch of this loop follows the XCP slide below)

eXplicit Control Protocol - XCP
• uses a per-packet feedback from the routers
• the congestion header of each packet carries the sender's Round Trip Time and Congestion Window; a router writes its feedback (e.g., +0.1 packet or -0.3 packet) into the header
• the sender then applies: Congestion Window = Congestion Window + Feedback
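Referring back to the RBUDP time sequence above: a pure-Python simulation of its blast-and-repair loop (steps C and D), with the UDP blast and the TCP bitmap exchange replaced by in-memory stand-ins; the block count and loss rate are made-up illustration values.

    import random

    def rbudp_transfer(num_blocks=1000, loss_rate=0.05, seed=1):
        """Blast all blocks, then keep re-blasting whatever the receiver's
        bitmap reports as missing, until everything has arrived."""
        rng = random.Random(seed)
        received = [False] * num_blocks
        to_send = list(range(num_blocks))       # step A: the first blast sends everything
        rounds = 0
        while to_send:                          # steps C and D repeat until completion
            for blk in to_send:                 # UDP blast at the user-defined rate
                if rng.random() > loss_rate:    # lossy channel: some datagrams vanish
                    received[blk] = True
            rounds += 1
            # step C: DONE over the TCP channel; the receiver answers with its bitmap
            to_send = [i for i, ok in enumerate(received) if not ok]
        return rounds

    if __name__ == "__main__":
        print("blast rounds needed:", rbudp_transfer())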
Different approaches I.
SCTP
• a multi-stream, multi-homed transport (an end node might have several IP addresses)
• message-oriented like UDP; ensures a reliable, in-sequence transport of messages with congestion control like TCP
• http://www.sctp.org/
DCCP
• a non-reliable protocol (like UDP) with a congestion control compatible with TCP
• http://www.ietf.org/html.charters/dccp-charter.html
• http://www.icir.org/kohler/dcp/

Different approaches II.
STP
• based on CTS/RTS
• a simple protocol designed for a simple implementation in HW
• without any sophisticated congestion control mechanism
• http://lwn.net/2001/features/OLS/pdf/pdf/stlinux.pdf
Reliable UDP
• ensures reliable and in-order delivery (up to the maximum number of retransmissions)
• RFC 908 and RFC 1151
• originally proposed for IP telephony
• connection parameters can be set per-connection
• http://www.javvin.com/protocolRUDP.html
XTP (Xpress Transfer Protocol), ...

Conclusions

Conclusions I.
Current state:
• multi-stream TCP is intensively used (e.g., by Grid applications)
• a way is being sought that will allow safe (i.e., backward compatible) development/deployment of post-TCP protocols
• aggressive protocols are used on private/dedicated networks/circuits (e.g., the λ-networks CzechLight/CESNET2, SurfNet, CA*net 4, ...)
• an implementation of SCTP under FreeBSD 7.0
• an implementation of DCCP under Linux

Conclusions II.
Open issues:
• interaction with L3 (IP)
• interaction with the data link layer
  • variable delay and throughput in wireless networks
  • optical burst switching
• specific per-flow states in routers:
  • e.g., a per-flow setting for packet loss generation (→ E-TCP)
  • may help short-term flows with high capacity demands (macro-bursts)
  • problem with scalability and cost :-(

Literature
[1] Jacobson V.: "Congestion Avoidance and Control", Proceedings of ACM SIGCOMM '88 (Stanford, CA, Aug. 1988), pp. 314-329. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
[2] Allman M., Paxson V., Stevens W.: "TCP Congestion Control", RFC 2581, Apr. 1999. http://www.rfc-editor.org/rfc/rfc2581.txt
[3] Brakmo L., Peterson L.: "TCP Vegas: End to End Congestion Avoidance on a Global Internet", IEEE Journal on Selected Areas in Communications, Vol. 13, No. 8, pp. 1465-1480, Oct. 1995. ftp://ftp.cs.arizona.edu/xkernel/Papers/jsac.ps
[4] The Web100 project, http://www.web100.org
[5] Hacker T. J., Athey B. D., Sommerfield J.: "Experiences Using Web100 for End-To-End Network Performance Tuning". http://www.web100.org/docs/ExperiencesUsingWeb100forHostTuning.pdf
[6] Kelly T.: "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", PFLDnet 2003. http://datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf, http://wwwlce.eng.cam.ac.uk/~ctk21/scalable/
[7] Floyd S.: "HighSpeed TCP for Large Congestion Windows", 2003. http://www.potaroo.net/ietf/all-ids/draft-floyd-tcp-highspeed-03.txt
[8] BIC-TCP, http://www.esc.ncsu.edu/faculty/rhee/export/bitcp/
[9] Floyd S., Allman M., Jain A., Sarolahti P.: "Quick-Start for TCP and IP", 2006. http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-quickstart-02.txt
[10] Jin C., Wei D., Low S. H., Buhrmaster G., Bunn J., Choe D. H., Cottrell R. L. A., Doyle J. C., Newman H., Paganini F., Ravot S., Singh S.: "FAST - Fast AQM Scalable TCP". http://netlab.caltech.edu/FAST/, http://netlab.caltech.edu/pub/papers/FAST-infocom2004.pdf
[11] tsunami, http://www.anml.iu.edu/anmlresearch.html
[12] He E., Leigh J., Yu O., DeFanti T. A.: "Reliable Blast UDP: Predictable High Performance Bulk Data Transfer", IEEE Cluster Computing 2002, Chicago, Illinois, Sept. 2002.

Further materials
• Workshops PFLDnet 2003-2010
  • http://datatag.web.cern.ch/datatag/pfldnet2003/program.html
  • http://www-didc.lbl.gov/PFLDnet2004/
  • http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
  • http://www.hpcc.jp/pfldnet2006/
  • http://wil.cs.caltech.edu/pfldnet2007/
• prof. Sally Floyd's pages: http://www.icir.org/floyd/papers.html
• RFC 3426 - "General Architectural and Policy Considerations"
• http://www.Hamilton.ie/net/eval/results_HI2005.pdf