Transmission Control Protocol
==Protocol operation==
[[File:Tcp state diagram fixed new.svg|right|thumbnail|250px|A Simplified TCP State Diagram.]]

TCP protocol operations may be divided into three phases. ''Connection establishment'' is a multi-step handshake process that establishes a connection before entering the ''data transfer'' phase. After data transfer is completed, the ''connection termination'' closes the connection and releases all allocated resources.

A TCP connection is managed by an operating system through a resource that represents the local end-point for communications, the ''[[Internet socket]]''. During the lifetime of a TCP connection, the local end-point undergoes a series of [[state (computer science)|state]] changes:{{sfn|RFC 9293|loc=3.3.2. State Machine Overview}}

{| class="wikitable"
|+ TCP socket states
! State
! Endpoint
! Description
|-
|LISTEN
|Server
|Waiting for a connection request from any remote TCP end-point.
|-
|SYN-SENT
|Client
|Waiting for a matching connection request after having sent a connection request.
|-
|SYN-RECEIVED
|Server
|Waiting for a confirming connection request acknowledgment after having both received and sent a connection request.
|-
|ESTABLISHED
|Server and client
|An open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.
|-
|FIN-WAIT-1
|Server and client
|Waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.
|-
|FIN-WAIT-2
|Server and client
|Waiting for a connection termination request from the remote TCP.
|-
|CLOSE-WAIT
|Server and client
|Waiting for a connection termination request from the local user.
|-
|CLOSING
|Server and client
|Waiting for a connection termination request acknowledgment from the remote TCP.
|-
|LAST-ACK
|Server and client
|Waiting for an acknowledgment of the connection termination request previously sent to the remote TCP (which includes an acknowledgment of its connection termination request).
|-
|TIME-WAIT
|Server or client
|Waiting for enough time to pass to be sure that all remaining packets on the connection have expired.
|-
|CLOSED
|Server and client
|No connection state at all.
|}

==={{Anchor|CONNECTION-ESTABLISHMENT}}Connection establishment===
Before a client attempts to connect with a server, the server must first bind to and listen at a port to open it up for connections: this is called a passive open. Once the passive open is established, a client may establish a connection by initiating an active open using the three-way (or 3-step) handshake:
# '''SYN''': The active open is performed by the client sending a SYN to the server. The client sets the segment's sequence number to a random value A.
# '''SYN-ACK''': In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number i.e. A+1, and the sequence number that the server chooses for the packet is another random number, B.
# '''ACK''': Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgment value i.e. A+1, and the acknowledgment number is set to one more than the received sequence number i.e. B+1.

Steps 1 and 2 establish and acknowledge the sequence number for one direction (client to server). Steps 2 and 3 establish and acknowledge the sequence number for the other direction (server to client). Following the completion of these steps, both the client and server have received acknowledgments and a full-duplex communication is established.
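The sequence and acknowledgment numbers exchanged in the three steps above can be traced with a short Python sketch. It only models the arithmetic, not real packets; the function name <code>three_way_handshake</code> and the tuple layout are illustrative assumptions, not part of any TCP implementation.

```python
import secrets

MOD = 2**32  # TCP sequence numbers are 32-bit values and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return (flags, seq, ack) for the SYN, SYN-ACK and ACK segments.

    client_isn and server_isn play the roles of A and B in the text;
    when omitted they are drawn at random (hypothetical helper).
    """
    a = secrets.randbelow(MOD) if client_isn is None else client_isn
    b = secrets.randbelow(MOD) if server_isn is None else server_isn

    syn = ("SYN", a, None)                        # step 1: client -> server
    syn_ack = ("SYN-ACK", b, (a + 1) % MOD)       # step 2: server -> client
    ack = ("ACK", (a + 1) % MOD, (b + 1) % MOD)   # step 3: client -> server
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for flags, seq, ack_no in three_way_handshake(client_isn=100, server_isn=300):
        print(flags, "seq =", seq, "ack =", ack_no)
```

The modulo operation reflects that a randomly chosen initial sequence number may sit at the top of the 32-bit range, in which case A+1 wraps to 0.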
===Connection termination===<!--[[FIN (TCP)]] redirects here-->
[[File:TCP CLOSE.svg|right|thumbnail|260px|Connection termination]]
[[File:TCP close() - sequence diagram.svg|thumb|Detailed TCP close() sequence diagram]]

The connection termination phase uses a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After the side that sent the first FIN has responded with the final ACK, it waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this TIME-WAIT state lets the TCP client resend the final acknowledgment to the server in case the ACK is lost in transit. The time duration is implementation-dependent, but some common values are 30 seconds, 1 minute, and 2 minutes. After the timeout, the client enters the CLOSED state and the local port becomes available for new connections.<ref>{{Cite book |last=Kurose |first=James F. |url=https://www.worldcat.org/oclc/936004518 |title=Computer networking : a top-down approach |date=2017 |others=Keith W. Ross |isbn=978-0-13-359414-0 |edition=7th |location=Harlow, England |pages=286 |oclc=936004518}}</ref>

It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a combined FIN and ACK (merging two steps into one) and host A replies with an ACK.<ref>{{cite book|last= Tanenbaum|first= Andrew S.|author-link= Andrew S. Tanenbaum|title= Computer Networks|edition= Fourth|date= 2003-03-17|publisher= Prentice Hall|isbn= 978-0-13-066102-9|url= https://archive.org/details/computernetworks00tane_2}}</ref>

Some operating systems, such as [[Linux]],<ref>{{Cite web |title=linux/net/ipv4/tcp_minisocks.c at master · torvalds/linux |url=https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_minisocks.c |access-date=2025-04-24 |website=GitHub |language=en}}</ref> implement a half-duplex close sequence. If the host actively closes a connection while still having unread incoming data available, the host sends the signal RST (losing any received data) instead of FIN. This assures that a TCP application is aware there was a data loss.{{sfn|RFC 1122|loc=4.2.2.13. Closing a Connection}}

A connection can be in a half-closed state (sometimes loosely called [[TCP half-open|half-open]]), in which case one side has terminated its direction of the connection, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can. The terminating side should continue reading the data until the other side terminates as well.<ref>{{Cite web |date=2020-03-02 |title=TCP (Transmission Control Protocol) – The transmission protocol explained |url=https://www.ionos.com/digitalguide/server/know-how/introduction-to-tcp/ |access-date=2025-04-24 |website=IONOS Digital Guide |language=en}}</ref><ref>{{Cite web |title=The TCP/IP Guide - TCP Connection Termination |url=http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm |access-date=2025-04-24 |website=www.tcpipguide.com}}</ref>

===Resource usage===
Most implementations allocate an entry in a table that maps a session to a running operating system process. Because TCP packets do not include a session identifier, both endpoints identify the session using the combination of source and destination addresses and ports. Whenever a packet is received, the TCP implementation must perform a lookup on this table to find the destination process.
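The per-packet lookup can be pictured as a dictionary keyed by the local and remote endpoints. The following Python sketch is a simplified model under that assumption; the names <code>ControlBlock</code>, <code>ConnectionTable</code> and <code>owner_pid</code> are hypothetical and do not correspond to any particular operating system's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ControlBlock:
    """Illustrative per-connection record (fields are assumptions)."""
    owner_pid: int
    state: str = "ESTABLISHED"
    recv_buffer: bytearray = field(default_factory=bytearray)


class ConnectionTable:
    """Maps (local endpoint, remote endpoint) to a control block."""

    def __init__(self):
        self._table = {}

    def add(self, local, remote, block):
        # local and remote are (ip, port) tuples
        self._table[(local, remote)] = block

    def deliver(self, local, remote, payload):
        """Queue payload for the owning process; return its pid or None."""
        block = self._table.get((local, remote))
        if block is None:
            return None  # no matching connection
        block.recv_buffer += payload
        return block.owner_pid


if __name__ == "__main__":
    table = ConnectionTable()
    server = ("192.0.2.1", 80)
    table.add(server, ("198.51.100.7", 50000), ControlBlock(owner_pid=11))
    table.add(server, ("198.51.100.7", 50001), ControlBlock(owner_pid=12))
    print(table.deliver(server, ("198.51.100.7", 50001), b"GET /"))
```

Two connections from the same client address to the same server port remain distinct because their client ports differ.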
Each entry in the table is known as a Transmission Control Block or TCB. It contains information about the endpoints (IP and port), status of the connection, running data about the packets that are being exchanged and buffers for sending and receiving data.

The number of sessions on the server side is limited only by memory and can grow as new connections arrive, but the client must allocate an [[ephemeral port]] before sending the first SYN to the server. This port remains allocated during the whole conversation and effectively limits the number of outgoing connections from each of the client's IP addresses. If an application fails to properly close unneeded connections, a client can run out of resources and become unable to establish new TCP connections, even from other applications.

Both endpoints must also allocate space for unacknowledged packets and received (but unread) data.

===Data transfer===
The Transmission Control Protocol differs in several key features compared to the [[User Datagram Protocol]]:
* Ordered data transfer: the destination host rearranges segments according to a sequence number<ref name=comer/>
* Retransmission of lost packets: any cumulative stream not acknowledged is retransmitted<ref name=comer/>
* Error-free data transfer: corrupted packets are treated as lost and are retransmitted{{sfn|RFC 9293|loc=2.2. Key TCP Concepts}}
* Flow control: limits the rate at which a sender transfers data to guarantee reliable delivery. The receiver continually hints the sender on how much data can be received. When the receiving host's buffer fills, the next acknowledgment suspends the transfer and allows the data in the buffer to be processed.<ref name=comer/>
* Congestion control: lost packets (presumed due to congestion) trigger a reduction in data delivery rate<ref name=comer/>

====Reliable transmission====
TCP uses a ''sequence number'' to identify each byte of data.
The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any [[out-of-order delivery]] that may occur. The sequence number of the first byte is chosen by the transmitter for the first packet, which is flagged SYN. This number can be arbitrary, and should, in fact, be unpredictable to defend against [[TCP sequence prediction attack]]s.

Acknowledgments (ACKs) are sent with a sequence number by the receiver of data to tell the sender that data has been received up to the specified byte. ACKs do not imply that the data has been delivered to the application; they merely signify that it is now the receiver's responsibility to deliver the data.

Reliability is achieved by the sender detecting lost data and retransmitting it. TCP uses two primary techniques to identify loss: retransmission timeout (RTO) and duplicate cumulative acknowledgments (DupAcks).

When a TCP segment is retransmitted, it retains the same sequence number as the original delivery attempt. This conflation of delivery and logical data ordering means that, when acknowledgment is received after a retransmission, the sender cannot tell whether the original transmission or the retransmission is being acknowledged, the so-called ''retransmission ambiguity''.{{sfn|Karn|Partridge|1991|p=364}} TCP incurs complexity due to retransmission ambiguity.{{sfn|RFC 9002|loc=4.2. Monotonically Increasing Packet Numbers}}

=====Duplicate-ACK-based retransmission=====
If a single segment (say segment number 100) in a stream is lost, then the receiver cannot acknowledge packets above that segment number (100) because it uses cumulative ACKs. Hence the receiver acknowledges packet 99 again on the receipt of another data packet. This duplicate acknowledgement is used as a signal for packet loss. That is, if the sender receives three duplicate acknowledgments, it retransmits the earliest unacknowledged packet (segment 100 in this example).
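The duplicate-acknowledgment rule can be sketched in a few lines of Python. This is a simplified model that, like the example above, numbers whole segments rather than bytes; the function name <code>fast_retransmits</code> is a hypothetical helper, and real stacks combine this logic with congestion control and, where supported, selective acknowledgments.

```python
DUP_ACK_THRESHOLD = 3  # number of duplicate ACKs that signals a loss


def fast_retransmits(acks):
    """Return the segment numbers a sender would retransmit.

    acks lists, in arrival order, the highest in-order segment number
    each cumulative ACK acknowledges (simplified, segment-based model).
    """
    retransmitted = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == DUP_ACK_THRESHOLD:
                # The earliest unacknowledged segment follows the ACKed one.
                retransmitted.append(ack + 1)
        else:
            last_ack = ack
            duplicates = 0
    return retransmitted


if __name__ == "__main__":
    # Segment 100 is lost; segments 101-103 each trigger a repeated ACK of 99.
    print(fast_retransmits([97, 98, 99, 99, 99, 99, 105]))
```

In the usage example the first ACK of 99 is the original acknowledgment and the following three are duplicates, so the sketch reports segment 100 for retransmission.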
A threshold of three is used because the network may reorder segments causing duplicate acknowledgements. This threshold has been demonstrated to avoid spurious retransmissions due to reordering.<ref>{{cite journal|last1=Mathis|last2=Mathew|last3=Semke|last4=Mahdavi|last5=Ott|title=The macroscopic behavior of the TCP congestion avoidance algorithm|journal=ACM SIGCOMM Computer Communication Review|volume=27|issue=3|pages=67–82|year=1997|doi=10.1145/263932.264023|citeseerx=10.1.1.40.7002|s2cid=1894993}}</ref> Some TCP implementations use [[selective acknowledgement]]s (SACKs) to provide explicit feedback about the segments that have been received. This greatly improves TCP's ability to retransmit the right segments.

Retransmission ambiguity can cause spurious fast retransmissions and congestion avoidance if there is reordering beyond the duplicate acknowledgment threshold.{{sfn|RFC 3522|p=4}} In the last two decades more packet reordering has been observed over the Internet,<ref>{{cite journal |last1=Leung |first1=Ka-cheong |last2=Li |first2=Victor O.k. |last3=Yang |first3=Daiqin |date=2007 |title=An Overview of Packet Reordering in Transmission Control Protocol (TCP): Problems, Solutions, and Challenges |url=https://ieeexplore.ieee.org/document/4118693 |journal=IEEE Transactions on Parallel and Distributed Systems |volume=18 |issue=4 |pages=522–535 |doi=10.1109/TPDS.2007.1011}}</ref> which led TCP implementations, such as the one in the Linux kernel, to adopt heuristic methods to scale the duplicate acknowledgment threshold.<ref>{{cite thesis |last=Johannessen |first=Mads |date=2015 |title=Investigate reordering in Linux TCP |publisher=University of Oslo |url=http://urn.nb.no/URN:NBN:no-51662 |degree=MSc}}</ref> More recently, there have been efforts to completely phase out duplicate-ACK-based fast retransmissions and replace them with timer-based ones (not to be confused with the classic RTO discussed below).<ref>{{cite conference |url=https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-6.pdf |title=RACK: a time-based fast loss detection for TCP draft-cheng-tcpm-rack-00 |last1=Cheng |first1=Yuchung |date=2015 |publisher=IETF |location=Yokohama |conference=IETF94}}</ref> The time-based loss detection algorithm called Recent Acknowledgment (RACK){{Sfn|RFC 8985}} has been adopted as the default algorithm in Linux and Windows.<ref>{{cite conference |url=https://datatracker.ietf.org/meeting/100/materials/slides-100-tcpm-draft-ietf-tcpm-rack-01.pdf |title=RACK: a time-based fast loss recovery draft-ietf-tcpm-rack-02 |last1=Cheng |first1=Yuchung |last2=Cardwell |first2=Neal |last3=Dukkipati |first3=Nandita |last4=Jha |first4=Priyaranjan |date=2017 |publisher=IETF |location=Singapore |conference=IETF100}}</ref>

=====Timeout-based retransmission=====
When a sender transmits a segment, it initializes a timer with a conservative estimate of the arrival time of the acknowledgment.
The segment is retransmitted if the timer expires, with a new timeout threshold of twice the previous value, resulting in [[exponential backoff]] behavior. Typically, the initial timer value is {{math|smoothed RTT + max(''G'', 4{{times}}RTT variation)}}, where {{mvar|G}} is the clock granularity.{{sfn|RFC 6298|p=2}} This guards against excessive transmission traffic due to faulty or malicious actors, such as [[man-in-the-middle attack|man-in-the-middle]] [[denial of service attack]]ers.

Accurate RTT estimates are important for loss recovery, as they allow a sender to assume an unacknowledged packet to be lost after sufficient time elapses (i.e., determining the RTO time).{{sfn|Zhang|1986|p=399}} Retransmission ambiguity can lead a sender's estimate of RTT to be imprecise.{{sfn|Zhang|1986|p=399}} In an environment with variable RTTs, spurious timeouts can occur:{{sfn|Karn|Partridge|1991|p=365}} if the RTT is under-estimated, then the RTO fires and triggers a needless retransmit and slow-start.
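The timer computation and its exponential backoff can be illustrated with a small Python sketch that follows the update rules of RFC 6298 (smoothing gains of 1/8 and 1/4, a multiplier of 4 on the RTT variation, and a 1-second lower bound). The class name <code>RtoEstimator</code>, the default clock granularity of 1 ms and the 60-second upper bound are assumptions chosen for the example.

```python
class RtoEstimator:
    """Sketch of an RFC 6298-style retransmission timeout estimator."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # multiplier applied to the RTT variation

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity  # clock granularity G (assumed 1 ms)
        self.min_rto = min_rto
        self.max_rto = max_rto          # assumed cap for this example
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                  # initial RTO before any sample

    def on_rtt_sample(self, rtt):
        """Update the estimate from one RTT sample, in seconds.

        Samples from retransmitted segments must not be passed in
        (Karn's algorithm); this sketch leaves that to the caller.
        """
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self):
        """Double the timeout after the timer expires (exponential backoff)."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator()
    for sample in (0.100, 0.200, 0.120):
        print("sample", sample, "-> RTO", est.on_rtt_sample(sample))
    print("after timeout -> RTO", est.on_timeout())
```

With round-trip times far below one second, the lower bound dominates, which is why the usage example prints the minimum RTO until a timeout doubles it.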
After a spurious retransmission, when the acknowledgments for the original transmissions arrive, the sender may believe them to be acknowledging the retransmission and conclude, incorrectly, that segments sent between the original transmission and retransmission have been lost, causing further needless retransmissions to the extent that the link truly becomes congested;{{sfn|Ludwig|Katz|2000|p=31-33}}{{sfn|Gurtov|Ludwig|2003|p=2}} selective acknowledgement can reduce this effect.{{sfn|Gurtov|Floyd|2004|p=1}} {{harvtxt|RFC 6298}} specifies that implementations must not use retransmitted segments when estimating RTT.{{sfn|RFC 6298|p=4}} [[Karn's algorithm]] ensures that a good RTT estimate will eventually be produced by waiting until there is an unambiguous acknowledgment before adjusting the RTO.{{sfn|Karn|Partridge|1991|p=370-372}} After spurious retransmissions, however, it may take significant time before such an unambiguous acknowledgment arrives, degrading performance in the interim.{{sfn|Allman|Paxson|1999|p=268}} TCP timestamps also resolve the retransmission ambiguity problem in setting the RTO,{{sfn|RFC 6298|p=4}} though they do not necessarily improve the RTT estimate.{{sfn|RFC 7323|p=7}}

====Error detection====
Sequence numbers allow receivers to discard duplicate packets and properly sequence out-of-order packets. Acknowledgments allow senders to determine when to retransmit lost packets.

To assure correctness, a checksum field is included; see {{slink||Checksum computation}} for details. The TCP checksum is a weak check by modern standards and is normally paired with a [[cyclic redundancy check|CRC]] integrity check at [[layer 2]], below both TCP and IP, such as is used in [[Point-to-Point Protocol|PPP]] or the [[Ethernet]] frame.
However, introduction of errors in packets between CRC-protected hops is common and the 16-bit TCP checksum catches most of these.<ref>{{Cite conference|last1=Stone |last2=Partridge |title=Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication |chapter=When the CRC and TCP checksum disagree |journal=ACM SIGCOMM Computer Communication Review |pages=309–319 |year=2000 |chapter-url=http://citeseer.ist.psu.edu/stone00when.html |doi=10.1145/347059.347561 |citeseerx=10.1.1.27.7611 |isbn=978-1581132236 |s2cid=9547018 |access-date=2008-04-28 |archive-date=2008-05-05 |archive-url=https://web.archive.org/web/20080505024952/http://citeseer.ist.psu.edu/stone00when.html |url-status=live }}</ref>

====Flow control====
TCP uses an end-to-end [[flow control (data)|flow control]] protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. Having a mechanism for flow control is essential in an environment where machines of diverse network speeds communicate. For example, if a PC sends data to a smartphone that is slowly processing received data, the smartphone must be able to regulate the data flow so as not to be overwhelmed.<ref name=comer/>

TCP uses a [[sliding window]] flow control protocol. In each TCP segment, the receiver specifies in the ''receive window'' field the amount of additional data (in bytes) that it is willing to buffer for the connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and receive window update from the receiving host.

[[File:Tcp.svg|right|thumbnail|250px|TCP sequence numbers and receive windows behave very much like a clock. The receive window shifts each time the receiver receives and acknowledges a new segment of data. Once it runs out of sequence numbers, the sequence number loops back to 0.]]

When a receiver advertises a window size of 0, the sender stops sending data and starts its ''persist timer''. The persist timer is used to protect TCP from a [[deadlock (computer science)|deadlock]] situation that could arise if a subsequent window size update from the receiver is lost, and the sender cannot send more data until receiving a new window size update from the receiver. When the persist timer expires, the TCP sender attempts recovery by sending a small packet so that the receiver responds by sending another acknowledgment containing the new window size.

If a receiver is processing incoming data in small increments, it may repeatedly advertise a small receive window. This is referred to as the [[silly window syndrome]], since it is inefficient to send only a few bytes of data in a TCP segment, given the relatively large overhead of the TCP header.

====Congestion control====
{{Main|TCP congestion control}}
The final main aspect of TCP is [[congestion control]]. TCP uses a number of mechanisms to achieve high performance and avoid [[congestive collapse]], a gridlock situation where network performance is severely degraded. These mechanisms control the rate of data entering the network, keeping the data flow below a rate that would trigger collapse. They also yield an approximately [[max-min fair]] allocation between flows.

Acknowledgments for data sent, or the lack of acknowledgments, are used by senders to infer network conditions between the TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as congestion control or congestion avoidance.
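How acknowledgments and timeouts steer the sending rate can be sketched with a sender-side congestion window. The Python model below follows the slow start and congestion avoidance rules of RFC 5681 in simplified form; the class name <code>CongestionWindow</code>, the segment size of 1460 bytes and the initial values are assumptions for illustration, and fast retransmit and fast recovery are left out.

```python
SMSS = 1460  # sender maximum segment size in bytes (assumed value)


class CongestionWindow:
    """Simplified slow start / congestion avoidance model."""

    def __init__(self, initial_cwnd=SMSS, ssthresh=64 * 1024):
        self.cwnd = initial_cwnd    # congestion window, in bytes
        self.ssthresh = ssthresh    # slow start threshold, in bytes

    def on_ack(self, acked_bytes):
        """Grow the window when new data is acknowledged."""
        if self.cwnd < self.ssthresh:
            # Slow start: roughly doubles cwnd every round trip.
            self.cwnd += min(acked_bytes, SMSS)
        else:
            # Congestion avoidance: about one segment per round trip.
            self.cwnd += max(1, SMSS * SMSS // self.cwnd)

    def on_timeout(self, flight_size):
        """Shrink the window when the retransmission timer expires."""
        self.ssthresh = max(flight_size // 2, 2 * SMSS)
        self.cwnd = SMSS


if __name__ == "__main__":
    cw = CongestionWindow(initial_cwnd=SMSS, ssthresh=4 * SMSS)
    for _ in range(4):
        cw.on_ack(SMSS)
        print("cwnd =", cw.cwnd)
    cw.on_timeout(flight_size=cw.cwnd)
    print("after timeout: cwnd =", cw.cwnd, "ssthresh =", cw.ssthresh)
```

The amount of data a sender may have in flight is bounded by the smaller of this congestion window and the receive window advertised for flow control.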
Modern implementations of TCP contain four intertwined algorithms: [[TCP congestion control#Slow start|slow start]], [[TCP congestion avoidance algorithm|congestion avoidance]], [[fast retransmit]], and [[fast recovery]].{{sfn|RFC 5681}}

In addition, senders employ a ''retransmission timeout'' (RTO) that is based on the estimated [[round-trip time]] (RTT) between the sender and receiver, as well as the variance in this round-trip time.{{sfn|RFC 6298}} There are subtleties in the estimation of RTT. For example, senders must be careful when calculating RTT samples for retransmitted packets; typically they use [[Karn's Algorithm]] or TCP timestamps.{{sfn|RFC 7323}} These individual RTT samples are then averaged over time to create a smoothed round trip time (SRTT) using [[Jacobson's algorithm]]. This SRTT value is what is used as the round-trip time estimate.

Enhancing TCP to reliably handle loss, minimize errors, manage congestion and go fast in very high-speed environments are ongoing areas of research and standards development. As a result, there are a number of [[TCP congestion avoidance algorithm]] variations.

===Maximum segment size===
The [[maximum segment size]] (MSS) is the largest amount of data, specified in bytes, that TCP is willing to receive in a single segment. For best performance, the MSS should be set small enough to avoid [[IP fragmentation]], which can lead to packet loss and excessive retransmissions. To accomplish this, typically the MSS is announced by each side using the MSS option when the TCP connection is established. The option value is derived from the [[MTU (networking)|maximum transmission unit]] (MTU) size of the data link layer of the networks to which the sender and receiver are directly attached. TCP senders can use [[path MTU discovery]] to infer the minimum MTU along the network path between the sender and receiver, and use this to dynamically adjust the MSS to avoid IP fragmentation within the network.
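The relationship between MTU and MSS is simple arithmetic, shown here as a Python sketch. It assumes IP and TCP headers without options (20 bytes each for IPv4 and TCP, 40 bytes for IPv6); the function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, without TCP options


def mss_for_mtu(mtu, ipv6=False):
    """Largest TCP payload that fits in one unfragmented IP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def effective_send_mss(peer_announced_mss, path_mtu, ipv6=False):
    """Segment size a sender might use after path MTU discovery."""
    return min(peer_announced_mss, mss_for_mtu(path_mtu, ipv6))


if __name__ == "__main__":
    print(mss_for_mtu(1500))                # Ethernet, IPv4
    print(mss_for_mtu(1500, ipv6=True))     # Ethernet, IPv6
    print(effective_send_mss(1460, 1400))   # narrower link on the path
```

For a standard 1500-byte Ethernet MTU this gives 1460 bytes over IPv4 and 1440 bytes over IPv6; a path with a 1400-byte MTU reduces the usable segment size to 1360 bytes even if the peer announced 1460.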
MSS announcement may also be called ''MSS negotiation'' but, strictly speaking, the MSS is not ''negotiated''. Two completely independent values of MSS are permitted for the two directions of data flow in a TCP connection,{{sfn|RFC 1122}}{{sfn|RFC 9293}} so there is no need to agree on a common MSS configuration for a bidirectional connection.

===Selective acknowledgments===
{{see also|SACK Panic}}
Relying purely on the cumulative acknowledgment scheme employed by the original TCP can lead to inefficiencies when packets are lost. For example, suppose bytes with sequence number 1,000 to 10,999 are sent in 10 different TCP segments of equal size, and the second segment (sequence numbers 2,000 to 2,999) is lost during transmission. In a pure cumulative acknowledgment protocol, the receiver can only send a cumulative ACK value of 2,000 (the sequence number immediately following the last sequence number of the received data) and cannot say that it received bytes 3,000 to 10,999 successfully. Thus the sender may then have to resend all data starting with sequence number 2,000.

To alleviate this issue TCP employs the ''selective acknowledgment (SACK)'' option, defined in 1996 in {{harvtxt|RFC 2018}}, which allows the receiver to acknowledge discontinuous blocks of packets that were received correctly, in addition to the sequence number immediately following the last sequence number of the last contiguous byte received successfully, as in the basic TCP acknowledgment. The acknowledgment can include a number of ''SACK blocks'', where each SACK block is conveyed by the ''Left Edge of Block'' (the first sequence number of the block) and the ''Right Edge of Block'' (the sequence number immediately following the last sequence number of the block), with a ''Block'' being a contiguous range that the receiver correctly received.
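The way a receiver could derive the cumulative ACK and the SACK blocks from the byte ranges it holds can be sketched in Python. This is an illustrative model only: the function name <code>build_ack</code> is hypothetical, ranges are half-open (the right edge is the byte after the last one received, matching the definition above), and the limit on how many blocks fit in one option is ignored.

```python
def build_ack(received_ranges, next_expected):
    """Return (cumulative_ack, sack_blocks) for the data received so far.

    received_ranges: iterable of (left_edge, right_edge) byte ranges,
    where right_edge is one past the last byte received.
    next_expected: sequence number of the first byte not yet received
    in order before these ranges are considered.
    """
    # Merge overlapping or adjacent ranges into contiguous blocks.
    merged = []
    for left, right in sorted(received_ranges):
        if merged and left <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))

    cumulative = next_expected
    sack_blocks = []
    for left, right in merged:
        if left <= cumulative:
            cumulative = max(cumulative, right)  # extends in-order data
        else:
            sack_blocks.append((left, right))    # data beyond a gap
    return cumulative, sack_blocks


if __name__ == "__main__":
    # Ten 1,000-byte segments starting at 1,000; the second one is lost.
    received = [(1000, 2000)] + [(s, s + 1000) for s in range(3000, 11000, 1000)]
    print(build_ack(received, next_expected=1000))
```

For the ten-segment scenario described earlier in this section, the sketch yields a cumulative acknowledgment of 2,000 and a single block with edges 3,000 and 11,000.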
In the example above, the receiver would send an ACK segment with a cumulative ACK value of 2,000 and a SACK option header with sequence numbers 3,000 and 11,000. The sender would accordingly retransmit only the second segment with sequence numbers 2,000 to 2,999.

A TCP sender may interpret an out-of-order segment delivery as a lost segment. If it does so, the TCP sender will retransmit the segment previous to the out-of-order packet and slow its data delivery rate for that connection. The duplicate-SACK option, an extension to the SACK option that was defined in May 2000 in {{harvtxt|RFC 2883}}, solves this problem. Once the TCP receiver detects a second duplicate packet, it sends a D-ACK to indicate that no segments were lost, allowing the TCP sender to reinstate the higher transmission rate.

The SACK option is not mandatory and comes into operation only if both parties support it. This is negotiated when a connection is established. SACK uses a TCP header option (see {{slink||TCP segment structure}} for details). The use of SACK has become widespread; all popular TCP stacks support it. Selective acknowledgment is also used in [[Stream Control Transmission Protocol]] (SCTP).

Selective acknowledgements can be 'reneged', where the receiver unilaterally discards the selectively acknowledged data. {{harvtxt|RFC 2018}} discouraged such behavior, but did not prohibit it, so that receivers keep the option of reneging if they, for example, run out of buffer space.{{sfn|RFC 2018|p=10}} The possibility of reneging leads to implementation complexity for both senders and receivers, and also imposes memory costs on the sender.{{sfn|RFC 9002|loc=4.4. No Reneging}}

===Window scaling===
{{Main|TCP window scale option}}
For more efficient use of high-bandwidth networks, a larger TCP window size may be used. A 16-bit TCP window size field controls the flow of data and its value is limited to 65,535 bytes.
Since the size field cannot be expanded beyond this limit, a scaling factor is used. The [[TCP window scale option]], as defined in {{harvtxt|RFC 1323}}, is an option used to increase the maximum window size to 1 gigabyte. Scaling up to these larger window sizes is necessary for [[TCP tuning]].

The window scale option is exchanged only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field when interpreting it. The window scale value can be set from 0 (no shift) to 14 for each direction independently. With the maximum shift of 14, the largest window is 65,535 × 2<sup>14</sup> = 1,073,725,440 bytes, just under 1 GiB. Both sides must send the option in their SYN segments to enable window scaling in either direction.

Some routers and packet firewalls rewrite the window scaling factor during a transmission. This causes sending and receiving sides to assume different TCP window sizes. The result is unstable traffic that may be very slow. The problem is visible on some sites behind a defective router.<ref>{{cite web |url=https://lwn.net/Articles/92727/ |title=TCP window scaling and broken routers |website=LWN.net |access-date=2016-07-21 |archive-date=2020-03-31 |archive-url=https://web.archive.org/web/20200331213612/https://lwn.net/Articles/92727/ |url-status=live }}</ref>

===TCP timestamps===
TCP timestamps, defined in {{harvtxt|RFC 1323}} in 1992, can help TCP determine in which order packets were sent. TCP timestamps are not normally aligned to the system clock and start at some random value. Many operating systems will increment the timestamp for every elapsed millisecond; however, the RFC only states that the ticks should be proportional.

There are two timestamp fields:
* a 4-byte sender timestamp value (my timestamp)
* a 4-byte echo reply timestamp value (the most recent timestamp received from you).

TCP timestamps are used in an algorithm known as ''Protection Against Wrapped Sequence'' numbers, or ''PAWS''. PAWS is used when the receive window crosses the sequence number wraparound boundary.
In the case where a packet was potentially retransmitted, it answers the question "Is this sequence number in the first 4 GB or the second?", with the timestamp used to break the tie.

Also, the Eifel detection algorithm uses TCP timestamps to determine if retransmissions are occurring because packets are lost or simply out of order.{{sfn|RFC 3522}}

TCP timestamps are enabled by default in Linux,<ref>{{cite web |title=IP sysctl |url=https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt |website=Linux Kernel Documentation |access-date=15 December 2018 |archive-date=5 March 2016 |archive-url=https://web.archive.org/web/20160305080444/https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt |url-status=live }}</ref> and disabled by default in Windows Server 2008, 2012 and 2016.<ref>{{cite web |last1=Wang |first1=Eve |title=TCP timestamp is disabled |url=https://social.technet.microsoft.com/Forums/office/en-US/6b1e4653-320f-4dbf-8b1a-64d27d8464fc/tcp-timestamp-is-disabled |website=Technet – Windows Server 2012 Essentials |publisher=Microsoft |access-date=2018-12-15 |archive-url=https://web.archive.org/web/20181215225201/https://social.technet.microsoft.com/Forums/office/en-US/6b1e4653-320f-4dbf-8b1a-64d27d8464fc/tcp-timestamp-is-disabled |archive-date=2018-12-15 |url-status=dead }}</ref>

Statistics published in 2017 show that the level of TCP timestamp adoption has stagnated, at ~40%, owing to Windows Server dropping support since Windows Server 2008.<ref name="2017stats">{{cite web |url=http://profiles.murdoch.edu.au/myprofile/david-murray/files/2012/06/An_Analysis_of_Changing_Enterprise_Network_Traffic_Characteristics-22.pdf |title=An Analysis of Changing Enterprise Network Traffic Characteristics |author1=David Murray |author2=Terry Koziniec |author3=Sebastian Zander |author4=Michael Dixon |author5=Polychronis Koutsakis |publisher=The 23rd Asia-Pacific Conference on Communications (APCC 2017) |date=2017 |access-date=3 October 2017 |archive-date=3 October 2017 |archive-url=https://web.archive.org/web/20171003124654/http://profiles.murdoch.edu.au/myprofile/david-murray/files/2012/06/An_Analysis_of_Changing_Enterprise_Network_Traffic_Characteristics-22.pdf |url-status=live }}</ref>

===Out-of-band data===
It is possible to interrupt or abort the queued stream instead of waiting for the stream to finish. This is done by specifying the data as ''urgent''. This marks the transmission as [[out-of-band data]] (OOB) and tells the receiving program to process it immediately. When finished, TCP informs the application and resumes the stream queue. An example is when TCP is used for a remote login session where the user can send a keyboard sequence that interrupts or aborts the remotely running program without waiting for the program to finish its current transfer.<ref name=comer/>

The ''urgent'' pointer only alters the processing on the remote host and doesn't expedite any processing on the network itself. The capability is implemented differently or poorly on different systems or may not be supported.
Where it is available, it is prudent to assume only single bytes of OOB data will be reliably handled.<ref>{{cite web |last= Gont |first= Fernando |title= On the implementation of TCP urgent data |publisher= 73rd IETF meeting |date= November 2008 |url= http://www.gont.com.ar/talks/IETF73/ietf73-tcpm-urgent-data.ppt |access-date= 2009-01-04 |archive-date= 2019-05-16 |archive-url= https://web.archive.org/web/20190516181338/https://www.gont.com.ar/talks/IETF73/ietf73-tcpm-urgent-data.ppt |url-status= live }}</ref><ref>{{cite book |last= Peterson |first= Larry |title= Computer Networks |url= https://archive.org/details/computernetworks00pete_974 |url-access= limited |publisher= Morgan Kaufmann |year= 2003 |page= [https://archive.org/details/computernetworks00pete_974/page/n419 401] |isbn= 978-1-55860-832-0}}</ref> Since the feature is not frequently used, it is not well tested on some platforms and has been associated with [[Vulnerability (computing)|vulnerabilities]], such as [[WinNuke]].

===Forcing data delivery===
Normally, TCP waits for 200 ms for a full packet of data to send ([[Nagle's Algorithm]] tries to group small messages into a single packet). This wait creates small, but potentially serious delays if repeated constantly during a file transfer. For example, a typical send block would be 4 KB and a typical MSS is 1460 bytes, so 2 packets go out on a 10 Mbit/s Ethernet taking ~1.2 ms each, followed by a third carrying the remaining 1176 bytes after a 197 ms pause because TCP is waiting for a full buffer. In the case of telnet, each user keystroke is echoed back by the server before the user can see it on the screen. This delay would become very annoying.

Setting the [[network socket|socket]] option <code>TCP_NODELAY</code> overrides the default 200 ms send delay. Application programs use this socket option to force output to be sent after writing a character or line of characters.
{{harvtxt|RFC 793}} defines the <code>PSH</code> push bit as "a message to the receiving TCP stack to send this data immediately up to the receiving application".<ref name=comer/> There is no way to indicate or control it in [[user space]] using [[Berkeley sockets]]; it is controlled by the [[protocol stack]] only.<ref name="Stevens2006">{{cite book|author=Richard W. Stevens|title=TCP/IP Illustrated. Vol. 1, The protocols|url=https://archive.org/details/tcpipillustrated00stev|url-access=registration|date=November 2011|publisher=Addison-Wesley|isbn=978-0-201-63346-7|pages=Chapter 20}}</ref>
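As an illustration of the <code>TCP_NODELAY</code> option discussed in this section, the following Python sketch sets and reads the option through the standard <code>socket</code> module, and reproduces the segment arithmetic of the 4 KB example. The helper names are hypothetical, and no connection is opened.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Report whether TCP_NODELAY is set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def segment_sizes(block_size, mss):
    """Split one application write into MSS-sized segment payloads."""
    full, remainder = divmod(block_size, mss)
    return [mss] * full + ([remainder] if remainder else [])


if __name__ == "__main__":
    sock = make_low_latency_socket()
    print("TCP_NODELAY set:", nagle_disabled(sock))
    sock.close()
    # A 4 KB write with an MSS of 1460 bytes: two full segments and a tail.
    print(segment_sizes(4096, 1460))
```

The trailing 1176-byte segment is the one that would be held back while waiting for more data when the option is not set.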