(October 27, 1995)
Daniel Z. Tabor Jr.
New Jersey Institute of Technology
ACKs and Retransmission:
Acknowledgments refer to a position in the data stream
using stream sequence numbers (not datagrams or segments).
Sequence numbers are used to reorder arriving segments.
The largest contiguous prefix of the stream is ACKed.
ACKs always specify the sequence number of the next
octet the receiver expects to receive.
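The cumulative acknowledgment rule above can be sketched in a few lines of Python. The function name `next_expected_octet` and the representation of segments as (starting sequence number, length) pairs are assumptions made for illustration; sequence-number wraparound is ignored.

```python
def next_expected_octet(initial_seq, segments):
    """Return the ACK value: the sequence number of the next octet expected.

    segments is an iterable of (start_seq, length) pairs in arrival order
    (a hypothetical representation chosen for this sketch).
    """
    expected = initial_seq
    # Sorting by starting sequence number reorders the arriving segments.
    for start, length in sorted(segments):
        if start > expected:
            break  # gap found: the contiguous prefix ends here
        expected = max(expected, start + length)
    return expected


# Octets 100-199 and 300-399 have arrived; 200-299 is still missing.
print(next_expected_octet(100, [(300, 100), (100, 100)]))
# Once the missing segment arrives, the ACK covers all three segments.
print(next_expected_octet(100, [(300, 100), (100, 100), (200, 100)]))
```

The first call acknowledges only up to the gap, which is why a lost segment holds back the ACK even when later data has already arrived.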
Time-Out and Retransmission:
Timers are started for every segment sent and TCP waits
for ACKs for each of them.
When a timer expires:
TCP assumes the segment was lost or corrupted and
retransmits the segment.
A new timer is started.
ACK response times vary greatly due to:
Traversal of different speed networks.
Different paths mean a different number of networks and
gateways.
Delays at gateways differ based on traffic.
Adaptive Retransmission:
TCP accommodates varying internet delays by using an
adaptive retransmission algorithm which monitors the
connection performance and revises timers accordingly.
Sample Round Trip Time:
Computed from the elapsed time between sending a
segment and receiving its ACK.
Estimated Round Trip Time (RTT):
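Stored as a weighted average, which is slowly changed
based on sample round trip times:
RTT = Lambda * Old_RTT + (1 - Lambda) * Sample
(where 0 <= Lambda < 1)
Lambda close to 1:
Makes the weighted average immune to changes lasting a
short time.
Lambda close to 0:
Makes the weighted average respond quickly to delay
changes.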
Time-out = Beta * RTT
(where Beta > 1)
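Beta is difficult to choose:
A Beta close to 1 detects packet loss quickly, which improves
throughput by reducing unnecessary waiting before retransmission.
But any small delay then causes an unnecessary retransmission,
which wastes network bandwidth.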
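As a rough illustration of the weighted average and the time-out computation, here is a minimal Python sketch. It assumes the usual form of the weighted average, RTT = Lambda * Old_RTT + (1 - Lambda) * Sample; the function names and the default values Lambda = 0.9 and Beta = 2 are illustrative choices, not values taken from these notes.

```python
def update_rtt(old_rtt, sample, lam=0.9):
    """Weighted average: lam near 1 changes slowly, lam near 0 reacts quickly."""
    return lam * old_rtt + (1 - lam) * sample


def timeout(rtt, beta=2.0):
    """Time-out = Beta * RTT, with Beta > 1."""
    return beta * rtt


rtt = 100.0  # milliseconds; a hypothetical starting estimate
for sample in (100.0, 300.0, 100.0):  # one short-lived delay spike
    rtt = update_rtt(rtt, sample)
    print(round(rtt, 1), round(timeout(rtt), 1))
```

With Lambda at 0.9 the single 300 ms spike moves the estimate only a little; rerunning with a Lambda near 0 shows the estimate chasing each sample instead.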
Acknowledgment Ambiguity:
Since ACKs refer to the data received (not to datagrams), an
ACK for a retransmitted segment may belong to either the
original transmission or the retransmission.
This situation is known as acknowledgment ambiguity.
Associating the ACK with either transmission can throw off
the estimated RTT.
Karn's Algorithm:
Avoids the problem of ambiguous ACKs by only adjusting
the estimated RTT for unambiguous ACKs.
It only takes samples from segments that were transmitted once.
Timer Backoff Strategy:
Required by TCP to handle the shortcoming of Karn's algorithm, where
segments keep being retransmitted after a sharp increase in delay
because the RTT estimate is never updated.
The timer backoff strategy uses timers normally, but
increases the time-out value if the timer expires and causes
a retransmission.
The time-out is upper-bounded by a value larger than the
longest path delay in the internet.
New_time-out = Alpha * time-out
(where Alpha >= 2)
It keeps the backed-off time-out for
subsequent packets, until a valid sample is taken.
This works even with networks that lose many packets.
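Karn's algorithm and the timer backoff strategy work together, and the combination can be sketched as a small Python class. The class name `RetransmitTimer`, its method names, the `max_timeout` bound, and the default constants are all hypothetical and chosen only for illustration.

```python
class RetransmitTimer:
    """Sketch of Karn's algorithm combined with timer backoff."""

    def __init__(self, rtt, lam=0.9, beta=2.0, alpha=2.0, max_timeout=64.0):
        self.rtt = rtt
        self.lam, self.beta, self.alpha = lam, beta, alpha
        self.max_timeout = max_timeout  # hypothetical upper bound
        self.timeout = beta * rtt

    def on_timer_expired(self):
        # Back off: New_time-out = Alpha * time-out, capped at the bound.
        self.timeout = min(self.alpha * self.timeout, self.max_timeout)

    def on_ack(self, sample, was_retransmitted):
        if was_retransmitted:
            return  # ambiguous ACK: keep the estimate and the backed-off time-out
        # Valid sample: update the estimate and recompute the time-out from it.
        self.rtt = self.lam * self.rtt + (1 - self.lam) * sample
        self.timeout = self.beta * self.rtt


timer = RetransmitTimer(rtt=1.0)
timer.on_timer_expired()                      # time-out doubles
timer.on_ack(5.0, was_retransmitted=True)     # sample ignored
timer.on_ack(3.0, was_retransmitted=False)    # estimate updated
print(round(timer.rtt, 2), round(timer.timeout, 2))
```

The backed-off time-out stays in force until an unambiguous ACK arrives, which is what lets the sender recover after a sharp increase in delay.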
High Variance in Delay:
Improvements to the former RTT computations include:
Estimating an average RTT.
Estimating the trip delay variance.
Using the estimated variance in place of Beta.
DIFF = SAMPLE - Old_RTT
RTT = Old_RTT + Gamma * DIFF
DEV = Old_DEV + Gamma * (|DIFF| - Old_DEV)
Where:
DEV is the estimated mean deviation.
Gamma controls how quickly new samples affect the
weighted average (0..1)
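The update rules can be sketched in Python, with the RTT estimate moving toward each new sample and DEV tracking how far samples stray from it. The time-out formula `rtt + eta * dev` and the constants Gamma = 0.125 and Eta = 4 are assumptions added for illustration; they are not stated in these notes.

```python
def update_estimates(old_rtt, old_dev, sample, gamma=0.125):
    """Return the new (RTT, DEV) pair after one round trip sample."""
    diff = sample - old_rtt
    rtt = old_rtt + gamma * diff                   # move the average toward the sample
    dev = old_dev + gamma * (abs(diff) - old_dev)  # smooth the mean deviation
    return rtt, dev


def timeout(rtt, dev, eta=4.0):
    """Assumed form: time-out = RTT + Eta * DEV (Eta is a hypothetical constant)."""
    return rtt + eta * dev


rtt, dev = update_estimates(old_rtt=100.0, old_dev=10.0, sample=140.0)
print(rtt, dev, timeout(rtt, dev))
```

Because the time-out grows with DEV, a connection with highly variable delay gets a wider safety margin than one whose samples cluster tightly around the average.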
Congestion Control:
A global issue that networks must face: as congestion
builds, the network delay increases.
Deadlock:
May result from attempts to retransmit packets/segments
(congestion collapse).
Is the extreme case of congestion.
TCP uses slow-start and multiplicative decrease to
avoid congestion, and also maintains a congestion window
limit.
Allowed_window = The minimum of:
(receiver_advertised_limit, congestion_window)
Receiver_advertised_limit and congestion_window are equal during a non-congested connection.
Multiplicative Decrease Congestion Avoidance:
Assumes most datagram losses come from congestion.
Upon segment loss:
Reduce the congestion_window by half (minimum = 1 segment)
Backoff retransmission timers exponentially for segments remaining in the allowed_window.
By using multiplicative decrease congestion avoidance:
It provides quick and significant traffic reduction
through exponential backoff of traffic and retransmissions.
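The window limit and the reaction to loss can be sketched as two small Python functions. Measuring windows in whole segments, the function names, and the backoff factor of 2 are simplifying assumptions for this illustration.

```python
def allowed_window(receiver_advertised_limit, congestion_window):
    """Allowed_window is the smaller of the two limits."""
    return min(receiver_advertised_limit, congestion_window)


def on_segment_loss(congestion_window, timeout, alpha=2.0):
    """Halve the window (never below 1 segment) and back off the timer."""
    return max(congestion_window // 2, 1), alpha * timeout


cwnd, rto = 16, 1.0  # hypothetical starting values: segments, seconds
for _ in range(5):
    cwnd, rto = on_segment_loss(cwnd, rto)
    print(cwnd, rto, allowed_window(32, cwnd))
```

Repeated losses shrink the window geometrically until it reaches one segment, while the retransmission time-out grows geometrically at the same time.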
Slow-Start Recovery:
Start the congestion_window at the size of a single
segment.
Increase the congestion_window size by one segment each
time an acknowledgment arrives.
This is used for a new connection or after a period of
congestion.
By increasing the window on every acknowledgment:
The window doubles each round trip.
It takes only about log2(N) round trips before TCP can send N segments.
When the congestion_window reaches half of the size it had before congestion,
TCP slows down the rate of increment: the window grows by one segment only after all segments in the window have been acknowledged.
This is the congestion avoidance phase.
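The two growth phases can be compared with a short Python simulation. The function `grow_window`, the `threshold` parameter (standing for half of the window size before congestion), and the assumption that every segment in a window is acknowledged within one round trip are all simplifications made for this sketch.

```python
def grow_window(cwnd, threshold):
    """Return the congestion window after one round trip (units: segments)."""
    if cwnd < threshold:
        # Slow start: each of the cwnd ACKs adds one segment,
        # so the window doubles in one round trip.
        for _ in range(cwnd):
            cwnd += 1
        return cwnd
    # Congestion avoidance: add one segment once the whole window is ACKed.
    return cwnd + 1


cwnd, history = 1, []
for _ in range(6):
    history.append(cwnd)
    cwnd = grow_window(cwnd, threshold=8)
print(history)  # [1, 2, 4, 8, 9, 10]
```

The window reaches 8 segments after three round trips (log2 of 8), then grows by only one segment per round trip.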
TCP Performance Enhancements:
TCP performance increase results from:
Slow-start increase.
Multiplicative decrease.
Congestion avoidance.
Measurement of variation.
Exponential timer backoff.
Establishing TCP Connections:
TCP uses a three-way handshake to set up and
release a connection, since it uses ACKs and piggybacking.
This is necessary and sufficient for end-to-end
synchronization.
The appropriate bits must be set in the headers of outgoing
and incoming segments to indicate whether a connection
setup or release is requested.
SYN:
Synchronization bit is set in segments 1 and 2 of the
handshake.
ACK:
Acknowledgment bit is set in segments 2 and 3 of the
handshake.
Full-Duplex connections can be established in various orders:
Passive wait (A), Active initiate (B).
Active initiate (A), Passive wait (B).
Simultaneous active initiation (A & B).
TCP ignores additional requests for connections to the
same place after the first has been established.
Sequence Numbers (seq =):
Sent and ACKed during the handshake.
The initial sequence number is chosen at random by the
initiating machine to identify bytes in the stream it is
sending.
A sends:     (SYN, seq. # octet X)
B records:   (sequence # octet X)
B sends:     (SYN, ACK X+1, seq. # octet Y)
A records:   (sequence # octet Y)
A sends:     (ACK Y+1)
B receives:  (ACK Y+1)
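The exchange of sequence numbers in the handshake can be sketched in Python. Representing segments as dictionaries and the function name `three_way_handshake` are assumptions for illustration; the arithmetic is done modulo 2**32 because TCP sequence numbers are 32 bits wide.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits wide


def three_way_handshake():
    """Return the three handshake segments as dictionaries (a toy model)."""
    x = random.randrange(SEQ_SPACE)  # A's initial sequence number
    y = random.randrange(SEQ_SPACE)  # B's initial sequence number
    syn = {"from": "A", "flags": {"SYN"}, "seq": x}
    syn_ack = {"from": "B", "flags": {"SYN", "ACK"}, "seq": y,
               "ack": (x + 1) % SEQ_SPACE}
    ack = {"from": "A", "flags": {"ACK"}, "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for segment in three_way_handshake():
    print(segment)
```

Each ACK value is one more than the sequence number it acknowledges, matching the rule that ACKs name the next octet expected.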
Closing a TCP Connection:
TCP uses a modified three-way handshake for a required graceful closure of a connection.
After transmitting the remaining data, one end will attempt
to close its half of the connection by sending a segment
with the FIN bit set.
The FIN is acknowledged and only half of the connection is closed (until the other end closes its half).
The first FIN is acknowledged only.
Steps taken for closure:
The receiving machine notifies the application of the request to shut down.
The second FIN is sent, including a redundant ACK for the first FIN.
The original machine sends a final ACK, which completes the closure of the connection.
TCP Connection Reset:
When a segment is sent with the CODE field RST bit set
(initiating a connection termination):
The other end immediately aborts the connection.
It then notifies the application of the connection
reset.
An instantaneous abort frees all buffer space allocated
for that connection and ceases the full-duplex
communication.
TCP Finite State Machine:
TIMED WAIT is used to handle unreliable delivery
problems.
Maximum Segment Lifetime:
The maximum time an old segment can live on the
internet (similar to TTL).
Forcing Delivery with Push:
Push (PSH bit)
Sent with a segment that requires immediate delivery of
the segment by TCP.
An example of its use would be with interactive users
(usually used after each keystroke).
Push is a way to immediately send information without
waiting for the buffer to fill.
Reserved Port Numbers:
Well-known port assignments are used along with dynamic
port assignment (just like UDP).
Ports less than 256 (integer number) are used for well-known ports.
Usually ports less than 1023 are used, but only ports less than 256 are standard.
Although UDP and TCP ports are independent, designers use the same integer port number when accessing services
available to both protocols.
Example:
ECHO = port 7
TIME = port 37
Silly Window Syndrome and Small Packets:
TCP buffers incoming data:
Buffer size of K-bytes
Uses the WINDOW field in acknowledgments to advertise
its size.
If the buffer is filled, a WINDOW size of 0 is advertised in the ACK.
As each single byte (1 byte) in the buffer becomes
available, TCP advertises a WINDOW size of one.
The sender then sends one segment for each byte of buffer space that becomes available.
This eventually limits the transfer of data to
single-octet segments (1 segment = 1 octet) each time.
Small segments waste network bandwidth: each single-octet window
advertisement and each single-octet segment carries a full set of headers as overhead.
Early TCP implementations exhibited a problem known as Silly Window Syndrome in which:
Each ACK advertises a small amount of space available
Each segment only carries a small amount of data.
Receive-side Silly Window Avoidance:
Before sending an updated window advertisement after
advertising a zero window, wait for space to become
available that is either:
At least 50% of the total buffer size or
Equal to a maximum sized segment.
Two approaches:
TCP adds each segment but does not advertise a window size
increase until the available buffer space (window at the
receiver) reaches the limit specified by the silly window avoidance heuristic.
TCP delays sending an ACK when the silly window avoidance algorithm specifies that the window is not
large enough to advertise.
Delayed ACKs can cause communication problems: an ACK delayed too long makes the sender retransmit, and delayed ACKs distort the sender's round trip estimates.
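The receive-side heuristic amounts to a single comparison, sketched here in Python. The function name and parameter names are hypothetical, and all sizes are in octets.

```python
def window_to_advertise(free_space, buffer_size, max_segment_size):
    """Window to advertise after a zero window has been advertised.

    Keeps advertising 0 until the free space is at least half of the
    buffer or at least one maximum-sized segment.
    """
    if 2 * free_space >= buffer_size or free_space >= max_segment_size:
        return free_space
    return 0


print(window_to_advertise(1, 8192, 1460))     # too small: keep advertising 0
print(window_to_advertise(1460, 8192, 1460))  # one full segment fits
print(window_to_advertise(500, 1000, 1460))   # half of a small buffer is free
```

Checking both conditions lets a receiver with a buffer smaller than one maximum-sized segment still reopen its window once half of the buffer is free.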
Sender-side Silly Window Avoidance:
The sender delays sending segments until it can accumulate a
reasonable amount of data in its output buffer (known as
clumping).
Sufficient data usually constitutes a maximum-sized segment.
The sending node buffers data until:
A maximum-sized segment is filled, or
An ACK arrives while it is still waiting to send data, at
which point it sends all available buffered data.
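The sender-side rule can be sketched as a small Python class. The class name `ClumpingSender`, the `sent` list that stands in for the network, and the simplification that one ACK acknowledges all outstanding data are hypothetical. The rule that data is sent at once when nothing is awaiting acknowledgment is also an assumption, drawn from the usual statement of this heuristic rather than from these notes.

```python
class ClumpingSender:
    """Toy model of sender-side silly window avoidance (clumping)."""

    def __init__(self, max_segment_size):
        self.mss = max_segment_size
        self.buffer = b""
        self.unacked = False  # True while sent data awaits an ACK
        self.sent = []        # segments handed to the network (illustration only)

    def _send(self, n):
        self.sent.append(self.buffer[:n])
        self.buffer = self.buffer[n:]
        self.unacked = True

    def write(self, data):
        self.buffer += data
        # A filled segment is always sent.
        while len(self.buffer) >= self.mss:
            self._send(self.mss)
        # Assumption: a partial segment goes out only if nothing is in flight.
        if self.buffer and not self.unacked:
            self._send(len(self.buffer))

    def on_ack(self):
        self.unacked = False
        if self.buffer:
            self._send(len(self.buffer))  # send all available buffered data


s = ClumpingSender(max_segment_size=4)
s.write(b"a")      # nothing in flight: sent immediately
s.write(b"b")      # held in the buffer
s.write(b"c")      # held in the buffer
s.on_ack()         # ACK arrives: the clumped b"bc" goes out
s.write(b"defgh")  # fills one segment; the leftover octet waits
print(s.sent, s.buffer)
```

Because the ACK arrival paces the sender, a fast typist on a slow path ends up with several keystrokes per segment, while a fast path still delivers each keystroke promptly.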