rfc9768.original   rfc9768.txt 
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe Internet Engineering Task Force (IETF) B. Briscoe
Internet-Draft Independent Request for Comments: 9768 Independent
Updates: 3168 (if approved) M. Kühlewind Updates: 3168 M. Kühlewind
Intended status: Standards Track Ericsson Category: Standards Track Ericsson
Expires: 11 September 2025 R. Scheffenegger ISSN: 2070-1721 R. Scheffenegger
NetApp NetApp
10 March 2025 August 2025
More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP
draft-ietf-tcpm-accurate-ecn-34
Abstract Abstract
Explicit Congestion Notification (ECN) is a mechanism where network Explicit Congestion Notification (ECN) is a mechanism by which
nodes can mark IP packets instead of dropping them to indicate network nodes can mark IP packets instead of dropping them to
incipient congestion to the endpoints. Receivers with an ECN-capable indicate incipient congestion to the endpoints. Receivers with an
transport protocol feed back this information to the sender. ECN was ECN-capable transport protocol feed back this information to the
originally specified for TCP in such a way that only one feedback sender. ECN was originally specified for TCP in such a way that only
signal can be transmitted per Round-Trip Time (RTT). Recent new TCP one feedback signal can be transmitted per Round-Trip Time (RTT).
mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP) Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center
or Low Latency, Low Loss, and Scalable Throughput (L4S) need more TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S)
Accurate ECN (AccECN) feedback information whenever more than one need more Accurate ECN (AccECN) feedback information whenever more
marking is received in one RTT. This document updates the original than one marking is received in one RTT. This document updates the
ECN specification in RFC 3168 to specify a scheme that provides more original ECN specification defined in RFC 3168 by specifying a scheme
than one feedback signal per RTT in the TCP header. Given TCP header that provides more than one feedback signal per RTT in the TCP
space is scarce, it allocates a reserved header bit previously header. Given TCP header space is scarce, it allocates a reserved
assigned to the ECN-Nonce. It also overloads the two existing ECN header bit previously assigned to the ECN-nonce. It also overloads
flags in the TCP header. The resulting extra space is additionally the two existing ECN flags in the TCP header. The resulting extra
exploited to feed back the IP-ECN field received during the TCP space is additionally exploited to feed back the IP-ECN field
connection establishment. Supplementary feedback information can received during the TCP connection establishment. Supplementary
optionally be provided in two new TCP option alternatives, which are feedback information can optionally be provided in two new TCP option
never used on the TCP SYN. The document also specifies the treatment alternatives, which are never used on the TCP SYN. The document also
of this updated TCP wire protocol by middleboxes. specifies the treatment of this updated TCP wire protocol by
middleboxes.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on 11 September 2025. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9768.
Copyright Notice Copyright Notice
Copyright (c) 2025 IETF Trust and the persons identified as the Copyright (c) 2025 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
This document may contain material from IETF Documents or IETF This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this 10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process. modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction
1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 5 1.1. Document Roadmap
1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Goals
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.3. Terminology
1.4. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7 1.4. Recap of Existing ECN Feedback in IP/TCP
2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 9 2. AccECN Protocol Overview and Rationale
2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 10 2.1. Capability Negotiation
2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 11 2.2. Feedback Mechanism
2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 11 2.3. Delayed ACKs and Resilience Against ACK Loss
2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 12 2.4. Feedback Metrics
2.5. Generic (Mechanistic) Reflector . . . . . . . . . . . . . 12 2.5. Generic (Mechanistic) Reflector
3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 13 3. AccECN Protocol Specification
3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 13 3.1. Negotiating to Use AccECN
3.1.1. Negotiation during the TCP three-way handshake . . . 13 3.1.1. Negotiation During the TCP Three-Way Handshake
3.1.2. Backward Compatibility . . . . . . . . . . . . . . . 15 3.1.2. Backward Compatibility
3.1.3. Forward Compatibility . . . . . . . . . . . . . . . . 17 3.1.3. Forward Compatibility
3.1.4. Multiple SYNs or SYN/ACKs . . . . . . . . . . . . . . 18 3.1.4. Multiple SYNs or SYN/ACKs
3.1.4.1. Retransmitted SYNs . . . . . . . . . . . . . . . 18 3.1.4.1. Retransmitted SYNs
3.1.4.2. Retransmitted SYN/ACKs . . . . . . . . . . . . . 19 3.1.4.2. Retransmitted SYN/ACKs
3.1.5. Implications of AccECN Mode . . . . . . . . . . . . . 20 3.1.5. Implications of AccECN Mode
3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 24 3.2. AccECN Feedback
3.2.1. Initialization of Feedback Counters . . . . . . . . . 25 3.2.1. Initialization of Feedback Counters
3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 25 3.2.2. The ACE Field
3.2.2.1. ACE Field on the ACK of the SYN/ACK . . . . . . . 26 3.2.2.1. ACE Field on the ACK of the SYN/ACK
3.2.2.2. Encoding and Decoding Feedback in the ACE 3.2.2.2. Encoding and Decoding Feedback in the ACE Field
Field . . . . . . . . . . . . . . . . . . . . . . . 29 3.2.2.3. Testing for Mangling of the IP/ECN Field
3.2.2.3. Testing for Mangling of the IP/ECN Field . . . . 31 3.2.2.4. Testing for Zeroing of the ACE Field
3.2.2.4. Testing for Zeroing of the ACE Field . . . . . . 33 3.2.2.5. Safety Against Ambiguity of the ACE Field
3.2.2.5. Safety against Ambiguity of the ACE Field . . . . 34 3.2.3. The AccECN Option
3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 37
3.2.3.1. Encoding and Decoding Feedback in the AccECN Option 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option
Fields . . . . . . . . . . . . . . . . . . . . . . 39 Fields
3.2.3.2. Path Traversal of the AccECN Option . . . . . . . 39 3.2.3.2. Path Traversal of the AccECN Option
3.2.3.3. Usage of the AccECN TCP Option . . . . . . . . . 44 3.2.3.3. Usage of the AccECN TCP Option
3.3. AccECN Compliance Requirements for TCP Proxies, Offload 3.3. AccECN Compliance Requirements for TCP Proxies, Offload
Engines and other Middleboxes . . . . . . . . . . . . . . 46 Engines, and Other Middleboxes
3.3.1. Requirements for TCP Proxies . . . . . . . . . . . . 46 3.3.1. Requirements for TCP Proxies
3.3.2. Requirements for Transparent Middleboxes and TCP 3.3.2. Requirements for Transparent Middleboxes and TCP
Normalizers . . . . . . . . . . . . . . . . . . . . . 46 Normalizers
3.3.3. Requirements for TCP ACK Filtering . . . . . . . . . 47 3.3.3. Requirements for TCP ACK Filtering
3.3.4. Requirements for TCP Segmentation Offload and Large 3.3.4. Requirements for TCP Segmentation Offload and Large
Receive Offload . . . . . . . . . . . . . . . . . . . 48 Receive Offload
4. Updates to RFC 3168 . . . . . . . . . . . . . . . . . . . . . 49 4. Updates to RFC 3168
5. Interaction with TCP Variants . . . . . . . . . . . . . . . . 51 5. Interaction with TCP Variants
5.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 51 5.1. Compatibility with SYN Cookies
5.2. Compatibility with TCP Experiments and Common TCP 5.2. Compatibility with TCP Experiments and Common TCP Options
Options . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.3. Compatibility with Feedback Integrity Mechanisms
5.3. Compatibility with Feedback Integrity Mechanisms . . . . 52 6. Summary: Protocol Properties
6. Summary: Protocol Properties . . . . . . . . . . . . . . . . 53 7. IANA Considerations
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 8. Security and Privacy Considerations
8. Security and Privacy Considerations . . . . . . . . . . . . . 57 9. References
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 58 9.1. Normative References
9.1. Normative References . . . . . . . . . . . . . . . . . . 58 9.2. Informative References
9.2. Informative References . . . . . . . . . . . . . . . . . 59 Appendix A. Example Algorithms
Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 62 A.1. Example Algorithm to Encode/Decode the AccECN Option
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 62
A.2. Example Algorithm for Safety Against Long Sequences of ACK A.2. Example Algorithm for Safety Against Long Sequences of ACK
Loss . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Loss
A.2.1. Safety Algorithm without the AccECN Option . . . . . 64 A.2.1. Safety Algorithm Without the AccECN Option
A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 66 A.2.2. Safety Algorithm with the AccECN Option
A.3. Example Algorithm to Estimate Marked Bytes from Marked A.3. Example Algorithm to Estimate Marked Bytes from Marked
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 68 Packets
A.4. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 68 A.4. Example Algorithm to Count Not-ECT Bytes
Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 69 Appendix B. Rationale for Usage of TCP Header Flags
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 69 B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake
B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 70 B.2. Four Codepoints in the SYN/ACK
B.3. Space for Future Evolution . . . . . . . . . . . . . . . 70 B.3. Space for Future Evolution
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 72 Acknowledgements
Comments Solicited . . . . . . . . . . . . . . . . . . . . . . . 72 Authors' Addresses
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 73
1. Introduction 1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by
network nodes can mark IP packets instead of dropping them to which network nodes can mark IP packets instead of dropping them to
indicate incipient congestion to the endpoints. Receivers with an indicate incipient congestion to the endpoints. Receivers with an
ECN-capable transport protocol feed back this information to the ECN-capable transport protocol feed back this information to the
sender. In RFC 3168, ECN was specified for TCP in such a way that sender. In RFC 3168, ECN was specified for TCP in such a way that
only one feedback signal could be transmitted per Round-Trip Time only one feedback signal could be transmitted per Round-Trip Time
(RTT). This is sufficient for congestion control scheme like Reno (RTT). This is sufficient for congestion control schemes like Reno
[RFC6582] and Cubic [RFC9438], as those schemes reduce their [RFC6582] and CUBIC [RFC9438], as those schemes reduce their
congestion window by a fixed factor if congestion occurs within an congestion window by a fixed factor if congestion occurs within an
RTT independent of the number of received congestion markings. RTT independent of the number of received congestion markings.
Recently, proposed mechanisms like Congestion Exposure (ConEx Recently, proposed mechanisms like Congestion Exposure (ConEx
[RFC7713]), DCTCP [RFC8257] or L4S [RFC9330] need to know when more [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more
than one marking is received in one RTT, which is information that than one marking is received in one RTT, which is information that
cannot be provided by the feedback scheme as specified in [RFC3168]. cannot be provided by the feedback scheme as specified in [RFC3168].
This document specifies an update to the ECN feedback scheme of RFC This document specifies an update to the ECN feedback scheme of RFC
3168 that provides more accurate information and could be used by 3168 that provides more accurate information and could be used by
these and potentially other future TCP extensions, while still also these and potentially other future TCP extensions, while still also
supporting the pre-existing TCP congestion controllers that use just supporting the pre-existing TCP congestion controllers that use just
one feedback signal per round. Congestion control is the term the one feedback signal per round. Congestion control is the term the
IETF uses to describe data rate management. It is the algorithm that IETF uses to describe data rate management. It is the algorithm that
a sender uses to optimize its sending rate so that it transmits data a sender uses to optimize its sending rate so that it transmits data
as fast as the network can carry it, but no faster. A fuller as fast as the network can carry it, but no faster. A fuller
treatment of the motivation for this specification is given in the description of the motivation for this specification is given in the
associated requirements document [RFC7560]. associated requirements document [RFC7560].
This document specifies a standards track scheme for ECN feedback in This document specifies a Standards Track scheme for ECN feedback in
the TCP header to provide more than one feedback signal per RTT. It the TCP header to provide more than one feedback signal per RTT. It
will be called the more Accurate ECN feedback scheme, or AccECN for is called the more "Accurate ECN" feedback scheme, or AccECN for
short. This document updates RFC 3168 with respect to negotiation short. This document updates RFC 3168 with respect to negotiation
and use of the feedback scheme for TCP. All aspects of RFC 3168 and use of the feedback scheme for TCP. All aspects of RFC 3168
other than the TCP feedback scheme and its negotiation remain other than the TCP feedback scheme and its negotiation remain
unchanged by this specification. In particular the definition of ECN unchanged by this specification. In particular, the definition of
at the IP layer is unaffected. Section 4 gives a more detailed ECN at the IP layer is unaffected. Section 4 details the aspects of
specification of exactly which aspects of RFC 3168 this document RFC 3168 that are updated by this document.
updates.
This document uses the term Classic ECN feedback when it needs to This document uses the term "Classic ECN feedback" when it needs to
distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the
AccECN TCP feedback scheme. AccECN is intended to offer a complete AccECN TCP feedback scheme. AccECN is intended to offer a complete
replacement for Classic TCP/ECN feedback, not a fork in the design of replacement for Classic TCP/ECN feedback, not a fork in the design of
TCP. AccECN feedback complements TCP's loss feedback and it can TCP. AccECN feedback complements TCP's loss feedback and it can
coexist alongside hosts using Classic TCP/ECN feedback. So its coexist alongside hosts using Classic TCP/ECN feedback. So its
applicability is intended to include the public Internet as well as applicability is intended to include the public Internet as well as
private IP network such as data centres (and even any non-IP networks private IP networks such as data centres (and even any non-IP
over which TCP is used), whether or not any nodes on the path support networks over which TCP is used), whether or not any nodes on the
ECN, of whatever flavour. path support ECN, of whatever flavour.
AccECN feedback overloads the two existing ECN flags in the TCP AccECN feedback overloads the two existing ECN flags in the TCP
header and allocates the currently reserved flag (previously called header and allocates the currently reserved flag (previously called
NS) in the TCP header, to be used as one three-bit counter field for NS) in the TCP header to be used as one 3-bit counter field for
feeding back the number of packets marked as congestion experienced feeding back the number of packets marked as congestion experienced
(CE). Given the new definitions of these three bits, both ends have (CE). Given the new definitions of these three bits, both ends have
to support the new wire protocol before it can be used. Therefore, to support the new wire protocol before it can be used. Therefore,
during the TCP handshake, the two ends use these three bits in the during the TCP handshake, the two ends use these three bits in the
TCP header to negotiate the most advanced feedback protocol that they TCP header to negotiate the most advanced feedback protocol that they
can both support, in a way that is backward compatible with can both support, in a way that is backward compatible with
[RFC3168]. [RFC3168].
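
As a rough illustration of what a 3-bit counter field implies, the sketch below shows a packet count being reduced to three bits and a reader recovering the increase between two successive values. The function names are hypothetical, and the sketch assumes fewer than eight new marks arrive between the two observations; the counter's initial value and the safety rules for ACK loss are specified later in the document and are not modelled here.

```python
# Illustrative only: a 3-bit field can carry a packet count modulo 8.
# Names are hypothetical and not taken from the specification.

ACE_BITS = 3
ACE_MODULUS = 1 << ACE_BITS  # 8 distinct values fit in three bits


def encode_counter(total_ce_packets: int) -> int:
    """Reduce a running count of CE-marked packets to a 3-bit field value."""
    if total_ce_packets < 0:
        raise ValueError("packet count cannot be negative")
    return total_ce_packets % ACE_MODULUS


def newly_marked(prev_field: int, new_field: int) -> int:
    """Smallest non-negative increase consistent with two field values.

    Assumes fewer than 8 new CE marks occurred between the two
    observations; a larger increase would be indistinguishable.
    """
    return (new_field - prev_field) % ACE_MODULUS
```

For example, under these assumptions a field that moves from 6 to 1 is read as three newly marked packets, because the counter wrapped past 7.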
AccECN is solely a change to the TCP wire protocol; it covers the AccECN is solely a change to the TCP wire protocol; it covers the
negotiation and signaling of more Accurate ECN feedback from a TCP negotiation and signaling of more Accurate ECN feedback from a TCP
Data Receiver to a Data Sender. It is completely independent of how Data Receiver to a Data Sender. It is completely independent of how
TCP might respond to congestion feedback, which is out of scope, but TCP might respond to congestion feedback, which is out of scope, but
ultimately the motivation for Accurate ECN feedback. Like Classic ultimately the motivation for Accurate ECN feedback. Like Classic
ECN feedback, AccECN can be used by standard Reno or CUBIC congestion ECN feedback, AccECN can be used by standard Reno or CUBIC congestion
control [RFC5681] [RFC9438] to respond to the existence of at least control [RFC5681] [RFC9438] to respond to the existence of at least
one congestion notification within a round trip. Or, unlike Reno or one congestion notification within a round trip. Or, unlike Reno or
CUBIC, AccECN can be used to respond to the extent of congestion CUBIC, AccECN can be used to respond to the extent of congestion
notification over a round trip, as for example DCTCP does in notification over a round trip, as for example DCTCP does in
controlled environments [RFC8257]. For congestion response, this controlled environments [RFC8257]. For congestion response, this
specification refers to the original ECN specificiation adopted in specification refers to the original ECN specification adopted in
2001 [RFC3168], as updated by the more relaxed rules introduced in 2001 [RFC3168], as updated by the more relaxed rules introduced in
2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low
Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or
Alternative Backoff with ECN (ABE) [RFC8511]. Alternative Backoff with ECN (ABE) [RFC8511].
Section 5.2 explains how AccECN is compatible with current commonly Section 5.2 explains how AccECN is compatible with current commonly
used TCP options, and a number of current experimental modifications used TCP options, and a number of current experimental modifications
to TCP, as well as SYN cookies. to TCP, as well as SYN cookies.
1.1. Document Roadmap 1.1. Document Roadmap
The following introductory section outlines the goals of AccECN The following introductory section outlines the goals of AccECN
(Section 1.2). Then, terminology is defined (Section 1.3) and a (Section 1.2). Then, terminology is defined (Section 1.3) and a
recap of existing prerequisite technology is given (Section 1.4). recap of existing prerequisite technology is given (Section 1.4).
Section 2 gives an informative overview of the AccECN protocol. Then Section 2 gives an informative overview of the AccECN protocol. Then
Section 3 gives the normative protocol specification, and Section 3.3 Section 3 gives the normative protocol specification, and Section 3.3
collects together requirements for proxies, offload engines and other collects requirements for proxies, offload engines, and other
middleboxes. Section 4 clarifies which aspects of RFC 3168 are middleboxes. Section 4 clarifies which aspects of RFC 3168 are
updated by AccECN. Section 5 assesses the interaction of AccECN with updated by AccECN. Section 5 assesses the interaction of AccECN with
commonly used variants of TCP, whether standardized or not. Then commonly used variants of TCP, whether they are standardized or not.
Section 6 summarizes the features and properties of AccECN. Then Section 6 summarizes the features and properties of AccECN.
Section 7 summarizes the protocol fields and numbers that IANA will Section 7 summarizes the protocol fields and numbers that IANA
need to assign and Section 8 points to the aspects of the protocol assigned, and Section 8 points to the aspects of the protocol that
that will be of interest to the security community. will be of interest to the security community.
Appendix A gives pseudocode examples for the various algorithms that Appendix A gives pseudocode examples for the various algorithms that
AccECN uses and Appendix B explains why AccECN uses flags in the main AccECN uses, and Appendix B explains why AccECN uses flags in the
TCP header and quantifies the space left for future use. main TCP header and quantifies the space left for future use.
1.2. Goals 1.2. Goals
[RFC7560] enumerates requirements that a candidate feedback scheme [RFC7560] enumerates requirements that a candidate feedback scheme
will need to satisfy, under the headings: resilience, timeliness, needs to satisfy, under the headings: resilience, timeliness,
integrity, accuracy (including ordering and lack of bias), integrity, accuracy (including ordering and lack of bias),
complexity, overhead and compatibility (both backward and forward). complexity, overhead, and compatibility (both backward and forward).
It recognizes that a perfect scheme that fully satisfies all the It recognizes that a perfect scheme that fully satisfies all the
requirements is unlikely and trade-offs between requirements are requirements is unlikely and trade-offs between requirements are
likely. Section 6 presents the properties of AccECN against these likely. Section 6 considers the properties of AccECN against these
requirements and discusses the trade-offs made. requirements and discusses the trade-offs.
The requirements document recognizes that a protocol as ubiquitous as The requirements document recognizes that a protocol as ubiquitous as
TCP needs to be able to serve as-yet-unspecified requirements. TCP needs to be able to serve as-yet-unspecified requirements.
Therefore an AccECN receiver acts as a generic (mechanistic) Therefore, an AccECN receiver acts as a generic (mechanistic)
reflector of congestion information with the aim that in future new reflector of congestion information with the aim that new sender
sender behaviours can be deployed unilaterally (see Section 2.5). behaviours can be deployed unilaterally (see Section 2.5) in the
future.
1.3. Terminology 1.3. Terminology
AccECN: The more Accurate ECN feedback scheme will be called AccECN AccECN: The more Accurate ECN feedback scheme is called AccECN for
for short. short.
Classic ECN: The ECN protocol specified in [RFC3168]. Classic ECN: The ECN protocol specified in [RFC3168].
Classic ECN feedback: The feedback aspect of the ECN protocol Classic ECN feedback: The feedback aspect of the ECN protocol
specified in [RFC3168], including generation, encoding, specified in [RFC3168], including generation, encoding,
transmission and decoding of feedback, but not the Data Sender's transmission and decoding of feedback, but not the Data Sender's
subsequent response to that feedback. subsequent response to that feedback.
ACK: A TCP acknowledgement, with or without a data payload (ACK=1). ACK: A TCP acknowledgement, with or without a data payload (ACK=1).
skipping to change at page 7, line 30 skipping to change at line 308
data and sends AccECN feedback. data and sends AccECN feedback.
Data Sender: The endpoint of a TCP half-connection that sends data Data Sender: The endpoint of a TCP half-connection that sends data
and receives AccECN feedback. and receives AccECN feedback.
In a mild abuse of terminology, this document sometimes refers to In a mild abuse of terminology, this document sometimes refers to
'TCP packets' instead of 'TCP segments'. 'TCP packets' instead of 'TCP segments'.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
1.4. Recap of Existing ECN feedback in IP/TCP 1.4. Recap of Existing ECN Feedback in IP/TCP
Explicit Congestion Notification (ECN) [RFC3168] can be split into Explicit Congestion Notification (ECN) [RFC3168] can be split into
two parts conceptually. In the forward direction, alongside the two parts conceptually. In the forward direction, alongside the
data stream, it uses a two-bit field in the IP header. This is data stream, it uses a 2-bit field in the IP header. This is
referred to as IP-ECN later on. This signal carried in the IP (Layer referred to as IP-ECN later on. This signal carried in the IP (Layer
3) header is exposed to network devices and may be modified when such 3) header is exposed to network devices and may be modified when such
a device starts to experience congestion (see Table 1). The second a device starts to experience congestion (see Table 1). The second
part is the feedback mechanism, by which the original data sender is part is the feedback mechanism, by which the original data sender is
notified of the current congestion state of the intermediate path. notified of the current congestion state of the intermediate path.
That returned signal is carried in a protocol specific manner, and is That returned signal is carried in a protocol-specific manner, and is
not to be modified by intermediate network devices. While ECN is in not to be modified by intermediate network devices. While ECN is in
active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP
[RFC6679] and Remote Direct Memory Access over Converged Ethernet [RFC6679], and Remote Direct Memory Access over Converged Ethernet
[RoCEv2], this document only concerns itself with the specific [RoCEv2], this document only concerns itself with the specific
implementation for the TCP protocol. implementation for the TCP protocol.
Once ECN has been negotiated for a transport layer connection, the Once ECN has been negotiated for a transport layer connection, the
Data Sender for either half-connection can set two possible Data Sender for either half-connection can set two possible
codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to
indicate an ECN-capable transport (ECT). If the ECN codepoint is indicate an ECN-capable transport (ECT). If the ECN codepoint is
0b00, the packet is considered to have been sent by a Not ECN-capable 0b00, the packet is considered to have been sent by a Not ECN-capable
Transport (Not-ECT). When a network node experiences congestion, it Transport (Not-ECT). When a network node experiences congestion, it
will occasionally either drop or mark a packet, with the choice will occasionally either drop or mark a packet, with the choice
skipping to change at page 8, line 32 skipping to change at line 356
+------------------+----------------+---------------------------+ +------------------+----------------+---------------------------+
| 0b01 | ECT(1) | ECN-Capable Transport (1) | | 0b01 | ECT(1) | ECN-Capable Transport (1) |
+------------------+----------------+---------------------------+ +------------------+----------------+---------------------------+
| 0b10 | ECT(0) | ECN-Capable Transport (0) | | 0b10 | ECT(0) | ECN-Capable Transport (0) |
+------------------+----------------+---------------------------+ +------------------+----------------+---------------------------+
| 0b11 | CE | Congestion Experienced | | 0b11 | CE | Congestion Experienced |
+------------------+----------------+---------------------------+ +------------------+----------------+---------------------------+
Table 1: The ECN Field in the IP Header Table 1: The ECN Field in the IP Header
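The codepoints in Table 1, together with the 0b00 (Not-ECT) value described in the text, can be read from a packet as in the following minimal sketch. It assumes, per RFC 3168 rather than this section, that the ECN field occupies the two least significant bits of the IPv4 ToS or IPv6 Traffic Class byte. The names ECN_CODEPOINTS and ip_ecn_codepoint are hypothetical and chosen only for illustration.

```python
# Names of the four IP-ECN codepoints (Table 1 plus the Not-ECT case).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ip_ecn_codepoint(tos_byte: int) -> str:
    """Return the ECN codepoint name carried in a ToS / Traffic Class byte.

    Assumption: the ECN field is the two least significant bits of the byte.
    """
    return ECN_CODEPOINTS[tos_byte & 0b11]
```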
In the TCP header the first two bits in byte 14 (the TCP header flags In the TCP header, the first two bits in byte 14 (the TCP header
at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR) and flags at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR)
Explicit Congestion Notification Echo (ECE) in Figure 1) are defined and Explicit Congestion Notification Echo (ECE) in Figure 1) are
as flags for the use of Classic ECN [RFC3168]. A TCP Client defined as flags for the use of Classic ECN [RFC3168]. A TCP Client
indicates that it supports Classic ECN feedback by setting (CWR,ECE) indicates that it supports Classic ECN feedback by setting (CWR,ECE)
= (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic = (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic
ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On
reception of a CE-marked packet at the IP layer, the Data Receiver reception of a CE-marked packet at the IP layer, the Data Receiver
for that half-connection starts to set the Echo Congestion for that half-connection starts to set the Echo Congestion
Experienced (ECE) flag continuously in the TCP header of ACKs, which Experienced (ECE) flag continuously in the TCP header of ACKs, which
gives the signal resilience to loss or reordering of ACKs. The Data gives the signal resilience to loss or reordering of ACKs. The Data
Sender for the same half-connection confirms that it has received at Sender for the same half-connection confirms that it has received at
least one ECE signal by responding with the congestion window reduced least one ECE signal by responding with the CWR flag, which allows
(CWR) flag, which allows the Data Receiver to stop repeating the ECN- the Data Receiver to stop repeating the ECN-Echo flag. This always
Echo flag. This always leads to a full RTT of ACKs with ECE set. leads to a full RTT of ACKs with ECE set. Thus Classic ECN cannot
Thus Classic ECN cannot feed back any additional CE markings arriving feed back any additional CE markings arriving within this RTT.
within this RTT.
The last bit in byte 13 of the TCP header (the TCP header flag at bit The last bit in byte 13 of the TCP header (the TCP header flag at bit
offset 7 in Figure 1) was defined as the Nonce Sum (NS) for the ECN offset 7 in Figure 1) was defined as the Nonce Sum (NS) for the ECN-
Nonce [RFC3540]. In the absence of widespread deployment RFC 3540 nonce [RFC3540]. In the absence of widespread deployment, RFC 3540
has been reclassified as historic [RFC8311] and the respective flag was reclassified as Historic [RFC8311] and the respective flag was
has been marked as "reserved", making this TCP flag available for use marked as "Reserved", which made this TCP flag available for use by
by AccECN instead. AccECN instead.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | N | C | E | U | A | P | R | S | F | | | | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 1: TCP header flags as defined before the Nonce Sum flag Figure 1: TCP Header Flags as Defined Before the Nonce Sum Flag
reverted to Reserved Reverted to Reserved
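As an illustration of the bit offsets in Figure 1, the sketch below extracts the three flags at offsets 7, 8, and 9 from a raw TCP header. It assumes the 16-bit word shown in the figure starts at zero-based byte offset 12 of the header (that is, bytes 13 and 14 when counting from one, as the text does) and that bit offset 0 is the most significant bit. The function name tcp_ecn_flags is hypothetical.

```python
def tcp_ecn_flags(tcp_header: bytes) -> tuple[int, int, int]:
    """Return (NS, CWR, ECE) from a raw TCP header.

    Assumption: the 16-bit word of Figure 1 is at zero-based bytes 12..13,
    with bit offset 0 being the most significant bit of that word.
    """
    word = int.from_bytes(tcp_header[12:14], "big")
    ns = (word >> 8) & 1   # bit offset 7 (later renamed AE)
    cwr = (word >> 7) & 1  # bit offset 8
    ece = (word >> 6) & 1  # bit offset 9
    return ns, cwr, ece
```

With the later renaming of NS to AE, the same extraction yields the (AE,CWR,ECE) triple used during the handshake.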
2. AccECN Protocol Overview and Rationale 2. AccECN Protocol Overview and Rationale
This section provides an informative overview of the AccECN protocol This section provides an informative overview of the AccECN protocol
that will be normatively specified in Section 3 that is normatively specified in Section 3.
Like the general TCP approach, the Data Receiver of each TCP half- Like the general TCP approach, the Data Receiver of each TCP half-
connection sends AccECN feedback to the Data Sender on TCP connection sends AccECN feedback to the Data Sender on TCP
acknowledgements, reusing data packets of the other half-connection acknowledgements, reusing data packets of the other half-connection
whenever possible. whenever possible.
The AccECN protocol has had to be designed in two parts: The AccECN protocol has had to be designed in two parts:
* an essential feedback part that re-uses the TCP-ECN header bits * an essential feedback part that reuses the TCP-ECN header bits for
for the Data Receiver to feed back the number of packets arriving the Data Receiver to feed back the number of packets arriving with
with CE in the IP-ECN field. This provides more accuracy than CE in the IP-ECN field. This provides more accuracy than Classic
Classic ECN feedback, but limited resilience against ACK loss; ECN feedback, but limited resilience against ACK loss;
* a supplementary feedback part using one of two new alternative * a supplementary feedback part using one of two new alternative
AccECN TCP options that provide additional feedback on the number AccECN TCP options that provide additional feedback on the number
of payload bytes that arrive marked with each of the three ECN of payload bytes that arrive marked with each of the three ECN
codepoints in the IP-ECN field (not just CE marks). See the BCP codepoints in the IP-ECN field (not just CE marks). See the BCP
on Byte and Packet Congestion Notification [RFC7141] for the on Byte and Packet Congestion Notification [RFC7141] for the
rationale determining that conveying congested payload bytes rationale determining that conveying congested payload bytes
should be preferred over just providing feedback about congested should be preferred over just providing feedback about congested
packets. This also provides greater resilience against ACK loss packets. This also provides greater resilience against ACK loss
than the essential feedback, but it is currently more likely to than the essential feedback, but it is currently more likely to
suffer from middlebox interference. suffer from middlebox interference.
The two-part design was necessary, given limitations on the space The two-part design was necessary, given limitations on the space
available for TCP options and given the possibility that certain available for TCP options and given the possibility that certain
incorrectly designed middleboxes might prevent TCP using any new incorrectly designed middleboxes might prevent TCP from using any new
options. options.
The essential feedback part overloads the previous definition of the The essential feedback part overloads the previous definition of the
three flags in the TCP header that had been assigned for use by three flags in the TCP header that had been assigned for use by
Classic ECN. This design choice deliberately allows AccECN peers to Classic ECN. This design choice deliberately allows AccECN peers to
replace the Classic ECN feedback protocol, rather than leaving replace the Classic ECN feedback protocol, rather than leaving
Classic ECN feedback intact and adding more accurate feedback Classic ECN feedback intact and adding more accurate feedback
separately because: separately because:
* this efficiently reuses scarce TCP header space, given TCP option * this efficiently reuses scarce TCP header space, given TCP option
space is approaching saturation; space is approaching saturation;
* a single upgrade path for the TCP protocol is preferable to a fork * a single upgrade path for the TCP protocol is preferable to a fork
in the design which modifies the TCP header to convey all ECN in the design that modifies the TCP header to convey all ECN
feedback; feedback;
* otherwise Classic and Accurate ECN feedback could give conflicting * otherwise, Classic and Accurate ECN feedback could give
feedback about the same segment, which could open up new security conflicting feedback about the same segment, which could open up
concerns and make implementations unnecessarily complex; new security concerns and make implementations unnecessarily
complex;
* middleboxes are more likely to faithfully forward the TCP ECN * middleboxes are more likely to faithfully forward the TCP ECN
flags than newly defined areas of the TCP header. flags than newly defined areas of the TCP header.
AccECN is designed to work even if the supplementary feedback part is AccECN is designed to work even if the supplementary feedback part is
removed or zeroed out, as long as the essential feedback part gets removed or zeroed out, as long as the essential feedback part gets
through. through.
2.1. Capability Negotiation 2.1. Capability Negotiation
AccECN is a change to the wire protocol of the main TCP header, AccECN changes the wire protocol of the main TCP header; therefore,
therefore it can only be used if both endpoints have been upgraded to it can only be used if both endpoints have been upgraded to
understand it. The TCP Client signals support for AccECN on the understand it. The TCP Client signals support for AccECN on the
initial SYN of a connection and the TCP Server signals whether it initial SYN of a connection, and the TCP Server signals whether it
supports AccECN on the SYN/ACK. The TCP flags on the SYN that the supports AccECN on the SYN/ACK. The TCP flags on the SYN that the
TCP Client uses to signal AccECN support have been carefully chosen TCP Client uses to signal AccECN support have been carefully chosen
so that a TCP Server will interpret them as a request to support the so that a TCP Server will interpret them as a request to support the
most recent variant of ECN feedback that it supports. Then the TCP most recent variant of ECN feedback that it supports. Then the TCP
Client falls back to the same variant of ECN feedback. Client falls back to the same variant of ECN feedback.
An AccECN TCP Client does not send an AccECN Option on the SYN as SYN An AccECN TCP Client does not send an AccECN Option on the SYN as SYN
option space is limited. The TCP Server sends an AccECN Option on option space is limited. The TCP Server sends an AccECN Option on
the SYN/ACK and the TCP Client sends one on the first ACK to test the SYN/ACK, and the TCP Client sends one on the first ACK to test
whether the network path forwards these options correctly. whether the network path forwards these options correctly.
2.2. Feedback Mechanism 2.2. Feedback Mechanism
A Data Receiver maintains four counters initialized at the start of A Data Receiver maintains four counters initialized at the start of
the half-connection. Three count the number of arriving payload the half-connection. Three count the number of arriving payload
bytes marked CE, ECT(1) and ECT(0) in the IP-ECN field. These byte bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte
counters reflect only the TCP payload length, excluding the TCP counters reflect only the TCP payload length, excluding the TCP
header and TCP options. The fourth counter counts the number of header and TCP options. The fourth counter counts the number of
packets arriving marked with a CE codepoint (including control packets arriving marked with a CE codepoint (including control
packets without payload if they are CE-marked). packets without payload if they are CE-marked).
The Data Sender maintains four equivalent counters for the half The Data Sender maintains four equivalent counters for the half
connection, and the AccECN protocol is designed to ensure they will connection, and the AccECN protocol is designed to ensure they will
match the values in the Data Receiver's counters, albeit after a match the values in the Data Receiver's counters, albeit after a
little delay. little delay.
Each ACK carries the three least significant bits (LSBs) of the Each ACK carries the three least significant bits (LSBs) of the
packet-based CE counter using the ECN bits in the TCP header, now packet-based CE counter using the ECN bits in the TCP header, now
renamed the Accurate ECN (ACE) field (see Figure 3 later). The 24 renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of
LSBs of some or all of the byte counters can be optionally carried in some or all of the byte counters can be optionally carried in an
an AccECN Option. For efficient use of limited option space, two AccECN Option. For efficient use of limited option space, two
alternative forms of AccECN Option are specified with the fields in alternative forms of the AccECN Option are specified with the fields
the opposite order to each other. in the opposite order to each other.
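The counters described above could be sketched as follows. This is an illustration under stated assumptions, not the normative behaviour: the class and field names (ReceiverCounters, cep, ceb, e1b, e0b) are hypothetical, and the initial counter values are not given in this overview, so the zero defaults here are placeholders only.

```python
from dataclasses import dataclass


@dataclass
class ReceiverCounters:
    # Zero defaults are placeholders; this overview does not state the
    # initial values used at the start of a half-connection.
    cep: int = 0  # packets that arrived CE-marked (control packets included)
    ceb: int = 0  # payload bytes that arrived marked CE
    e1b: int = 0  # payload bytes that arrived marked ECT(1)
    e0b: int = 0  # payload bytes that arrived marked ECT(0)

    def on_packet(self, ip_ecn: int, payload_len: int) -> None:
        """Update the counters for one arriving packet.

        payload_len is the TCP payload only (no TCP header or options).
        """
        if ip_ecn == 0b11:      # CE
            self.cep += 1
            self.ceb += payload_len
        elif ip_ecn == 0b01:    # ECT(1)
            self.e1b += payload_len
        elif ip_ecn == 0b10:    # ECT(0)
            self.e0b += payload_len

    def ace(self) -> int:
        """Three least significant bits of the CE packet counter."""
        return self.cep & 0x7

    def option_fields(self) -> dict[str, int]:
        """24 least significant bits of each byte counter."""
        mask = 0xFFFFFF
        return {
            "ceb": self.ceb & mask,
            "e1b": self.e1b & mask,
            "e0b": self.e0b & mask,
        }
```

Masking at read time, rather than storing truncated values, keeps the full counters available locally while only the LSBs go on the wire.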
2.3. Delayed ACKs and Resilience Against ACK Loss 2.3. Delayed ACKs and Resilience Against ACK Loss
With both the ACE and the AccECN Option mechanisms, the Data Receiver With both the ACE and the AccECN Option mechanisms, the Data Receiver
continually repeats the current LSBs of each of its respective continually repeats the current LSBs of each of its respective
counters. There is no need to acknowledge these continually repeated counters. There is no need to acknowledge these continually repeated
counters, so the congestion window reduced (CWR) mechanism of counters, so the Congestion Window Reduced (CWR) mechanism of
[RFC3168] is no longer used. Even if some ACKs are lost, the Data [RFC3168] is no longer used. Even if some ACKs are lost, the Data
Sender ought to be able to infer how much to increment its own Sender ought to be able to infer how much to increment its own
counters, even if the protocol field has wrapped. counters, even if the protocol field has wrapped.
The 3-bit ACE field can wrap fairly frequently. Therefore, even if The 3-bit ACE field can wrap fairly frequently. Therefore, even if
it appears to have incremented by one (say), the field might have it appears to have incremented by one (say), the field might have
actually cycled completely then incremented by one. The Data actually cycled completely and then incremented by one. The Data
Receiver is not allowed to delay sending an ACK to such an extent Receiver is not allowed to delay sending an ACK to such an extent
that the ACE field would cycle. However ACKs received at the Data that the ACE field would cycle. However, ACKs received at the Data
Sender could still cycle because a whole sequence of ACKs carrying Sender could still cycle because a whole sequence of ACKs carrying
intervening values of the field might all be lost or delayed in intervening values of the field might all be lost or delayed in
transit. transit.
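The wrap ambiguity of the 3-bit ACE field can be illustrated with modulo-8 arithmetic. The sketch below only enumerates the increments that are consistent with two successive ACE values; it does not show how a Data Sender chooses among them, which this overview does not describe. All names are hypothetical.

```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide


def ace_min_increment(prev_ace: int, new_ace: int) -> int:
    """Smallest non-negative increment that takes prev_ace to new_ace."""
    return (new_ace - prev_ace) % ACE_MODULUS


def ace_candidate_increments(prev_ace: int, new_ace: int,
                             max_cycles: int = 2) -> list[int]:
    """All increments consistent with the two ACE values, allowing the
    field to have cycled completely up to max_cycles times."""
    base = ace_min_increment(prev_ace, new_ace)
    return [base + k * ACE_MODULUS for k in range(max_cycles + 1)]
```

For example, moving from ACE value 3 to 4 is consistent with 1 newly CE-marked packet, but also with 9 or 17 if intervening ACKs were lost.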
The fields in an AccECN Option are larger, but they will increment in The fields in an AccECN Option are larger, but they will increment in
larger steps because they count bytes not packets. Nonetheless, larger steps because they count bytes not packets. Nonetheless,
their size has been chosen such that a whole cycle of the field would their size has been chosen such that a whole cycle of the field would
never occur between ACKs unless there had been an infeasibly long never occur between ACKs unless there has been an infeasibly long
sequence of ACK losses. Therefore, provided that an AccECN Option is sequence of ACK losses. Therefore, provided that an AccECN Option is
available, it can be treated as a dependable feedback channel. available, it can be treated as a dependable feedback channel.
If an AccECN Option is not available, e.g., it is being stripped by a If an AccECN Option is not available, e.g., it is being stripped by a
middlebox, the AccECN protocol will only feed back information on CE middlebox, the AccECN protocol will only feed back information on CE
markings (using the ACE field). Although not ideal, this will be markings (using the ACE field). Although not ideal, this will be
sufficient, because it is envisaged that neither ECT(0) nor ECT(1) sufficient, because it is envisaged that neither ECT(0) nor ECT(1)
will ever indicate more severe congestion than CE, even though future will ever indicate more severe congestion than CE, even though future
uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the
3-bit ACE field is so small, when it is the only field available, the 3-bit ACE field is so small, when it is the only field available, the
skipping to change at page 12, line 26 skipping to change at line 536
AccECN Option on an ACK. The rules are designed to ensure that the AccECN Option on an ACK. The rules are designed to ensure that the
order in which different markings arrive at the receiver is order in which different markings arrive at the receiver is
communicated to the sender (as long as options are reaching the communicated to the sender (as long as options are reaching the
sender and as long as there is no ACK loss). Implementations are sender and as long as there is no ACK loss). Implementations are
encouraged to send an AccECN Option more frequently, but this is left encouraged to send an AccECN Option more frequently, but this is left
up to the implementer. up to the implementer.
2.4. Feedback Metrics 2.4. Feedback Metrics
The CE packet counter in the ACE field and the CE byte counter in The CE packet counter in the ACE field and the CE byte counter in
AccECN Options both provide feedback on received CE-marks. The CE AccECN Options both provide feedback on received CE marks. The CE
packet counter includes control packets that do not have payload packet counter includes control packets that do not have payload
data, while the CE byte counter solely includes marked payload bytes. data, while the CE byte counter solely includes marked payload bytes.
If both are present, the byte counter in an AccECN Option will If both are present, the byte counter in an AccECN Option will
provide the more accurate information needed for modern congestion provide the more accurate information needed for modern congestion
control and policing schemes, such as L4S, DCTCP or ConEx. If AccECN control and policing schemes, such as L4S, DCTCP, or ConEx. If
Options are stripped, a simple algorithm to estimate the number of AccECN Options are stripped, a simple algorithm to estimate the
marked bytes from the ACE field is given in Appendix A.3. number of marked bytes from the ACE field is given in Appendix A.3.
The AccECN design has been generalized so that it ought to be able to The AccECN design has been generalized so that it ought to be able to
support possible future uses of the experimental ECT(1) codepoint support possible future uses of the experimental ECT(1) codepoint
other than the L4S experiment [RFC9330], such as a lower severity or other than the L4S experiment [RFC9330], such as a lower severity or
a more instant congestion signal than CE. a more instant congestion signal than CE.
Feedback in bytes is provided to protect against the receiver or a Feedback in bytes is provided to protect against the receiver or a
middlebox using attacks similar to 'ACK-Division' to artificially middlebox using attacks similar to 'ACK-Division' to artificially
inflate the congestion window, which is why [RFC5681] now recommends inflate the congestion window, which is why [RFC5681] now recommends
that TCP counts acknowledged bytes not packets. that TCP counts acknowledged bytes not packets.
2.5. Generic (Mechanistic) Reflector 2.5. Generic (Mechanistic) Reflector
The ACE field provides feedback about CE markings in the IP-ECN field The ACE field provides feedback about CE markings in the IP-ECN field
of both data and control packets. According to [RFC3168] the Data of both data and control packets. According to [RFC3168], the Data
Sender is meant to set the IP-ECN field of control packets to Not- Sender is meant to set the IP-ECN field of control packets to Not-
ECT. However, mechanisms in certain private networks (e.g., data ECT. However, mechanisms in certain private networks (e.g., data
centres) set control packets to be ECN capable because they are centres) set control packets to be ECN-capable because they are
precisely the packets that performance depends on most. precisely the packets that performance depends on most.
For this reason, AccECN is designed to be a generic reflector of For this reason, AccECN is designed to be a generic reflector of
whatever ECN markings it sees, whether or not they are compliant with whatever ECN markings it sees, whether or not they are compliant with
a current standard. Then as standards evolve, Data Senders can a current standard. Then as standards evolve, Data Senders can
upgrade unilaterally without any need for receivers to upgrade too. upgrade unilaterally without any need for receivers to upgrade too.
It is also useful to be able to rely on generic reflection behaviour It is also useful to be able to rely on generic reflection behaviour
when senders need to test for unexpected interference with markings when senders need to test for unexpected interference with markings
(for instance Section 3.2.2.3, Section 3.2.2.4 and Section 3.2.3.2 of (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present
the present document and paragraph 2 of Section 20.2 of [RFC3168]). document and paragraph 2 of Section 20.2 of [RFC3168]).
The initial SYN and SYN/ACK are the most critical control packets, so The initial SYN and SYN/ACK are the most critical control packets, so
AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits
ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on
the SYN and SYN/ACK supports future scenarios in which SYNs might be the SYN and SYN/ACK supports future scenarios in which SYNs might be
ECN-enabled (without prejudging whether they ought to be). For ECN-enabled (without prejudging whether they ought to be). For
instance, [RFC8311] updates this aspect of RFC 3168 to allow instance, [RFC8311] updates this aspect of RFC 3168 to allow
experimentation with ECN-capable TCP control packets. experimentation with ECN-capable TCP control packets.
Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to
not-ECT in compliance with RFC 3168, feedback on the state of the IP- Not-ECT in compliance with RFC 3168, feedback on the state of the IP-
ECN field when it arrives at the receiver could still be useful, ECN field when it arrives at the receiver could still be useful,
because middleboxes have been known to overwrite the IP-ECN field as because middleboxes have been known to overwrite the IP-ECN field as
if it is still part of the old Type of Service (ToS) field if it is still part of the old Type of Service (ToS) field
[Mandalari18]. For example, if a TCP Client has set the SYN to Not- [Mandalari18]. For example, if a TCP Client has set the SYN to Not-
ECT, but receives feedback that the IP-ECN field on the SYN arrived ECT, but receives feedback that the IP-ECN field on the SYN arrived
with a different codepoint, it can detect such middlebox with a different codepoint, it can detect such middlebox
interference. Previously, neither end knew what IP-ECN field the interference. Previously, neither end knew what IP-ECN field the
other had sent. So, if a TCP Server received ECT or CE on a SYN, it other sent. So, if a TCP Server received ECT or CE on a SYN, it
could not know whether it was invalid because only the TCP Client could not know whether it was invalid because only the TCP Client
knew whether it originally marked the SYN as Not-ECT (or ECT). knew whether it originally marked the SYN as Not-ECT (or ECT).
Therefore, prior to AccECN, the Server's only safe course of action Therefore, prior to AccECN, the Server's only safe course of action
in this example was to disable ECN for the connection. Instead, the in this example was to disable ECN for the connection. Instead, the
AccECN protocol allows the Server and Client to feed back the ECN AccECN protocol allows the Server and Client to feed back the ECN
field received on the SYN and SYN/ACK to their peer, which then has field received on the SYN and SYN/ACK to their peer, which now has
all the information to decide whether the connection has to fall-back all the information to decide whether the connection has to fall back
from supporting ECN (or not). from supporting ECN (or not).
3. AccECN Protocol Specification 3. AccECN Protocol Specification
3.1. Negotiating to use AccECN 3.1. Negotiating to Use AccECN
3.1.1. Negotiation during the TCP three-way handshake 3.1.1. Negotiation During the TCP Three-Way Handshake
Given the ECN Nonce [RFC3540] has been reclassified as historic Given the ECN-nonce [RFC3540] has been reclassified as Historic
[RFC8311], the TCP flag that was previously called NS (Nonce Sum) is [RFC8311], the TCP flag that was previously called NS (Nonce Sum) is
renamed as the AE (Accurate ECN) flag (the TCP header flag at bit renamed as the AE (Accurate ECN) flag (the TCP header flag at bit
offset 7 in Figure 2). See the IANA Considerations in Section 7. offset 7 in Figure 2). See the IANA Considerations in Section 7.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | A | C | E | U | A | P | R | S | F | | | | A | C | E | U | A | P | R | S | F |
| Header Length | Reserved | E | W | C | R | C | S | S | Y | I | | Header Length | Reserved | E | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 2: The new definition of the TCP header flags during the Figure 2: The New Definition of the TCP Header Flags During the
TCP three-way handshake TCP Three-Way Handshake
During the TCP three-way handshake at the start of a connection, to During the TCP three-way handshake at the start of a connection, to
request more Accurate ECN feedback the TCP Client (host A) MUST set request more Accurate ECN feedback the TCP Client (host A) MUST set
the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment.
If a TCP Server (host B) that is AccECN-enabled receives a SYN with If a TCP Server (host B) that is AccECN-enabled receives a SYN with
the above three flags set, it MUST set both its half connections into the above three flags set, it MUST set both its half connections into
AccECN mode. Then it MUST set the AE, CWR and ECE TCP flags on the AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the
SYN/ACK to the combination in the top block of Table 2 that feeds SYN/ACK to the combination in the top block of Table 2 that feeds
back the IP-ECN field that arrived on the SYN. This applies whether back the IP-ECN field that arrived on the SYN. This applies whether
or not the Server itself supports setting the IP-ECN field on a SYN or not the Server itself supports setting the IP-ECN field on a SYN
or SYN/ACK (see Section 2.5 for rationale). or SYN/ACK (see Section 2.5 for rationale).
When the TCP Server returns any of the 4 combinations in the top When the TCP Server returns any of the four combinations in the top
block of Table 2, it confirms that it supports AccECN. The TCP block of Table 2, it confirms that it supports AccECN. The TCP
Server MUST NOT set one of these 4 combination of flags on the SYN/ Server MUST NOT set one of these four combinations of flags on the
ACK unless the preceding SYN requested support for AccECN as above. SYN/ACK unless the preceding SYN requested support for AccECN as
above.
Once a TCP Client (A) has sent the above SYN to declare that it Once a TCP Client (A) has sent the above SYN to declare that it
supports AccECN, and once it has received the above SYN/ACK segment supports AccECN, and once it has received the above SYN/ACK segment
that confirms that the TCP Server supports AccECN, the TCP Client that confirms that the TCP Server supports AccECN, the TCP Client
MUST set both its half connections into AccECN mode. The TCP Client MUST set both its half connections into AccECN mode. The TCP Client
MUST NOT enter AccECN mode (or any feedback mode) before it has MUST NOT enter AccECN mode (or any feedback mode) before it has
received the first SYN/ACK. received the first SYN/ACK.
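The handshake rules above, combined with the flag combinations listed in Table 2, could be sketched as follows. This is a simplified illustration only: it covers just the SYN/ACK combinations visible in Table 2 for a Client that sent (AE,CWR,ECE) = (1,1,1), returns None for anything else, and does not specify how a Client handles the reserved combination. The function and constant names are hypothetical.

```python
ACCECN_SYN = (1, 1, 1)  # (AE, CWR, ECE) on a SYN requesting AccECN

# Top block of Table 2: IP-ECN field seen on the SYN -> SYN/ACK flags.
SYNACK_FEEDBACK = {
    "Not-ECT": (0, 1, 0),
    "ECT(1)": (0, 1, 1),
    "ECT(0)": (1, 0, 0),
    "CE": (1, 1, 0),
}


def server_synack_flags(syn_flags, syn_ip_ecn):
    """Flags an AccECN-enabled Server sets on the SYN/ACK, or None if the
    SYN did not request AccECN (those cases are not sketched here)."""
    if tuple(syn_flags) != ACCECN_SYN:
        return None
    return SYNACK_FEEDBACK[syn_ip_ecn]


def client_feedback_mode(synack_flags):
    """Map SYN/ACK flags to (feedback mode, IP-ECN fed back for the SYN)
    for a Client that sent the AccECN SYN; None if not listed here."""
    flags = tuple(synack_flags)
    for ip_ecn, combo in SYNACK_FEEDBACK.items():
        if flags == combo:
            return ("AccECN", ip_ecn)
    other = {
        (1, 0, 1): "Reserved",
        (0, 0, 1): "Classic ECN",
        (0, 0, 0): "Not ECN",
    }
    if flags in other:
        return (other[flags], None)
    return None
```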
Once in AccECN mode, a TCP Client or Server has the rights and Once in AccECN mode, a TCP Client or Server has the rights and
obligations to participate in the ECN protocol defined in obligations to participate in the ECN protocol defined in
Section 3.1.5. Section 3.1.5.
The procedures to follow for retransmission of SYNs or SYN/ACKs are The procedures for retransmission of SYNs or SYN/ACKs are given in
given in Section 3.1.4. Section 3.1.4.
It is RECOMMENDED that the AccECN protocol is implemented alongside It is RECOMMENDED that the AccECN protocol be implemented alongside
Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented
with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883]
MUST also be implemented. MUST also be implemented.
3.1.2. Backward Compatibility 3.1.2. Backward Compatibility
The three flags set to 1 to indicate AccECN support on the SYN have The three flags set to 1 to indicate AccECN support on the SYN
been carefully chosen to enable natural fall-back to prior stages in have been carefully chosen to enable natural fall-back to prior
the evolution of ECN. Table 2 tabulates all the negotiation stages in the evolution of ECN. Table 2 tabulates all the
possibilities for ECN-related capabilities that involve at least one negotiation possibilities for ECN-related capabilities that involve
AccECN-capable host. The entries in the first two columns have been at least one AccECN-capable host. The entries in the first two
abbreviated, as follows: columns have been abbreviated, as follows:
AccECN: Supports more Accurate ECN Feedback (the present AccECN: Supports more Accurate ECN feedback (the present
specification) specification)
Nonce: Supports ECN Nonce feedback [RFC3540] Nonce: Supports ECN-nonce feedback [RFC3540]
ECN: Supports 'Classic' ECN feedback [RFC3168] ECN: Supports 'Classic' ECN feedback [RFC3168]
No ECN: Not ECN-capable. Implicit congestion notification using No ECN: Not ECN-capable. Implicit congestion notification using
packet drop. packet drop.
+========+========+============+============+======================+ +========+========+============+============+======================+
| Host A | Host B | SYN | SYN/ACK | Feedback Mode | | Host A | Host B | SYN | SYN/ACK | Feedback Mode |
| | | A->B | B->A | of Host A | | | | A->B | B->A | of Host A |
| | | AE CWR ECE | AE CWR ECE | | | | | AE CWR ECE | AE CWR ECE | |
+========+========+============+============+======================+ +========+========+============+============+======================+
| AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) |
| AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) | | AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) |
| AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) | | AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) |
| AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) | | AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) |
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
| AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) | | AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) |
| AccECN | ECN | 1 1 1 | 0 0 1 | Classic ECN | | AccECN | ECN | 1 1 1 | 0 0 1 | Classic ECN |
| AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN | | AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN |
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
| Nonce | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | Nonce | AccECN | 0 1 1 | 0 0 1 | Classic ECN |
| ECN | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | ECN | AccECN | 0 1 1 | 0 0 1 | Classic ECN |
| No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN | | No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN |
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
| AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN | | AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN |
+--------+--------+------------+------------+----------------------+ +--------+--------+------------+------------+----------------------+
Table 2: ECN capability negotiation between Client (A) and Table 2: ECN Capability Negotiation Between Client (A) and
Server (B) Server (B)
Table 2 is divided into blocks each separated by an empty row. Table 2 is divided into blocks, with each block separated by an empty
row.
1. The top block shows the case already described in Section 3.1 1. The top block shows the case already described in Section 3.1
where both endpoints support AccECN and how the TCP Server (B) where both endpoints support AccECN and how the TCP Server (B)
indicates congestion feedback. indicates congestion feedback.
2. The second block shows the cases where the TCP Client (A) 2. The second block shows the cases where the TCP Client (A)
supports AccECN but the TCP Server (B) supports some earlier supports AccECN but the TCP Server (B) supports some earlier
variant of TCP feedback, indicated in its SYN/ACK. Therefore, as variant of TCP feedback, as indicated in its SYN/ACK. Therefore,
soon as an AccECN-capable TCP Client (A) receives the SYN/ACK as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK
shown it MUST set both its half connections into the feedback shown, it MUST set both its half connections into the feedback
mode shown in the rightmost column. If the TCP Client has set mode shown in the rightmost column. If the TCP Client has set
itself into Classic ECN feedback mode it MUST then comply with itself into Classic ECN feedback mode, it MUST comply with
[RFC3168]. [RFC3168].
An AccECN implementation has no need to recognize or support the An AccECN implementation has no need to recognize or support the
Server response labelled 'Nonce' or ECN Nonce feedback more Server response labelled 'Nonce' or ECN-nonce feedback more
generally [RFC3540], which has been reclassified as historic generally [RFC3540], as RFC 3540 has been reclassified as
[RFC8311]. AccECN is compatible with alternative ECN feedback Historic [RFC8311]. AccECN is compatible with alternative ECN
integrity approaches to the nonce (see Section 5.3). The SYN/ACK feedback integrity approaches to the nonce (see Section 5.3).
labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is
future use. A TCP Client (A) that receives such a SYN/ACK reserved for future use. A TCP Client (A) that receives such a
follows the procedure for forward compatibility given in SYN/ACK follows the procedure for forward compatibility given in
Section 3.1.3. Section 3.1.3.
3. The third block shows the cases where the TCP Server (B) supports 3. The third block shows the cases where the TCP Server (B) supports
AccECN but the TCP Client (A) supports some earlier variant of AccECN but the TCP Client (A) supports some earlier variant of
TCP feedback, indicated in its SYN. TCP feedback, as indicated in its SYN.
When an AccECN-enabled TCP Server (B) receives a SYN with When an AccECN-enabled TCP Server (B) receives a SYN with
(AE,CWR,ECE) = (0,1,1) it MUST do one of the following: (AE,CWR,ECE) = (0,1,1), it MUST do one of the following:
* set both its half connections into the Classic ECN feedback * set both its half connections into the Classic ECN feedback
mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as
shown. Then it MUST comply with [RFC3168]. shown. Then it MUST comply with [RFC3168].
* set both its half-connections into Not ECN mode and return a * set both its half-connections into Not ECN mode and return a
SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN
disabled. This latter case is unlikely to be desirable, but disabled. This latter case is unlikely to be desirable, but
it is allowed as a possibility, e.g., for minimal TCP it is allowed as a possibility, e.g., for minimal TCP
implementations. implementations.
When an AccECN-enabled TCP Server (B) receives a SYN with When an AccECN-enabled TCP Server (B) receives a SYN with
(AE,CWR,ECE) = (0,0,0) it MUST set both its half connections into (AE,CWR,ECE) = (0,0,0), it MUST set both its half connections
the Not ECN feedback mode, return a SYN/ACK with (AE,CWR,ECE) = into the Not ECN feedback mode, return a SYN/ACK with
(0,0,0) as shown and continue with ECN disabled. (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled.
4. The fourth block displays a combination labelled `Broken'. Some 4. The fourth block displays a combination labelled 'Broken'. Some
older TCP Server implementations incorrectly set the TCP-ECN older TCP Server implementations incorrectly set the TCP-ECN
flags in the SYN/ACK by reflecting those in the SYN. Such broken flags in the SYN/ACK by reflecting those in the SYN. Such broken
TCP Servers (B) cannot support ECN, so as soon as an AccECN- TCP Servers (B) cannot support ECN; so as soon as an AccECN-
capable TCP Client (A) receives such a broken SYN/ACK it MUST capable TCP Client (A) receives such a broken SYN/ACK, it MUST
fall back to Not ECN mode for both its half connections and fall back to Not ECN mode for both its half connections and
continue with ECN disabled. continue with ECN disabled.
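The client side of Table 2 can be read as a lookup from the flags on the SYN/ACK to a feedback mode. The following is a minimal, non-normative sketch of that lookup for a Client that sent a SYN with (AE,CWR,ECE) = (1,1,1). The names `Mode`, `SYN_ECN_FEEDBACK`, and `client_mode_from_synack` are hypothetical and do not come from the specification; the handling of (1,0,1) follows the forward-compatibility rule in Section 3.1.3.

```python
from enum import Enum


class Mode(Enum):
    ACCECN = "AccECN"
    CLASSIC_ECN = "Classic ECN"
    NOT_ECN = "Not ECN"


# Top block of Table 2: (AE, CWR, ECE) on the SYN/ACK -> IP-ECN field
# that the Server saw on the SYN (labels as used in the table).
SYN_ECN_FEEDBACK = {
    (0, 1, 0): "Not-ECT",
    (0, 1, 1): "ECT1",
    (1, 0, 0): "ECT0",
    (1, 1, 0): "CE",
}


def client_mode_from_synack(ae, cwr, ece):
    """Return (mode, ip_ecn_fed_back) for a Client that sent SYN (1,1,1)."""
    flags = (ae, cwr, ece)
    if flags in SYN_ECN_FEEDBACK:
        return Mode.ACCECN, SYN_ECN_FEEDBACK[flags]
    if flags == (1, 0, 1):
        # Reserved combination: enable AccECN as if the SYN's IP-ECN
        # field had arrived unchanged (Section 3.1.3).
        return Mode.ACCECN, "unchanged"
    if flags == (0, 0, 1):
        return Mode.CLASSIC_ECN, None
    # (0,0,0): Server is not ECN-capable.
    # (1,1,1): 'Broken' Server that reflected the SYN flags.
    return Mode.NOT_ECN, None
```

The mode returned applies to both half connections of the Client.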
The following additional rules do not fit the structure of the table, The following additional rules do not fit the structure of the table,
but they complement it: but they complement it:
Simultaneous Open: An originating AccECN Host (A), having sent a SYN Simultaneous Open: An originating AccECN Host (A), having sent a SYN
with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host
B. Host A MUST then enter the same feedback mode as it would have B. Host A MUST then enter the same feedback mode as it would have
entered had it been a responding host and received the same SYN. entered had it been a responding host and received the same SYN.
skipping to change at page 17, line 30 skipping to change at line 782
new TCP connection if they receive an in-window SYN packet during new TCP connection if they receive an in-window SYN packet during
TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED
state, it ought to ignore any previous state about the negotiation state, it ought to ignore any previous state about the negotiation
of AccECN for that connection and renegotiate the feedback mode of AccECN for that connection and renegotiate the feedback mode
according to Table 2. according to Table 2.
3.1.3. Forward Compatibility 3.1.3. Forward Compatibility
If a TCP Server that implements AccECN receives a SYN with the three If a TCP Server that implements AccECN receives a SYN with the three
TCP header flags (AE,CWR,ECE) set to any combination other than TCP header flags (AE,CWR,ECE) set to any combination other than
(0,0,0), (0,1,1) or (1,1,1) and it does not have logic specific to (0,0,0), (0,1,1), or (1,1,1) and it does not have logic specific to
such a combination, the Server MUST negotiate the use of AccECN as if such a combination, the Server MUST negotiate the use of AccECN as if
the three flags had been set to (1,1,1). However, an AccECN Client the three flags had been set to (1,1,1). However, an AccECN Client
implementation MUST NOT send a SYN with any combination other than implementation MUST NOT send a SYN with any combination other than
the three listed. the three listed.
If a TCP Client has sent a SYN requesting AccECN feedback with If a TCP Client sent a SYN requesting AccECN feedback with
(AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with the currently (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently
reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have
logic specific to such a combination, the Client MUST enable AccECN logic specific to such a combination, the Client MUST enable AccECN
mode as if the SYN/ACK confirmed that the Server supported AccECN and mode as if the SYN/ACK confirmed that the Server supported AccECN and
as if it fed back that the IP-ECN field on the SYN had arrived as if it fed back that the IP-ECN field on the SYN had arrived
unchanged. However, an AccECN Server implementation MUST NOT send a unchanged. However, an AccECN Server implementation MUST NOT send a
SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1).
| For the avoidance of doubt, the behaviour described in the | For the avoidance of doubt, the behaviour described in the
| present specification applies whether or not the three | present specification applies whether or not the three
| remaining reserved TCP header flags are zero. | remaining reserved TCP header flags are zero.
All these requirements ensure that future uses of all the Reserved All of these requirements ensure that future uses of all the Reserved
combinations on a SYN or SYN/ACK can rely on consistent behaviour combinations on a SYN or SYN/ACK can rely on consistent behaviour
from the installed base of AccECN implementations. See Appendix B.3 from the installed base of AccECN implementations. See Appendix B.3
for related discussion. for related discussion.
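The Server-side rules of Sections 3.1.2 and 3.1.3 can be sketched as a single function over the flags on the first SYN. This is an illustrative sketch only: the function name, the `allow_classic` parameter, and the string labels are assumptions, and the sketch assumes the Server has no logic specific to any reserved SYN combination.

```python
# Top block of Table 2, read in the Server's direction: IP-ECN field
# seen on the SYN -> (AE, CWR, ECE) to set on the AccECN SYN/ACK.
SYNACK_FLAGS_FOR_SYN_ECN = {
    "Not-ECT": (0, 1, 0),
    "ECT1": (0, 1, 1),
    "ECT0": (1, 0, 0),
    "CE": (1, 1, 0),
}


def server_response_to_syn(syn_flags, ip_ecn_on_syn, allow_classic=True):
    """Return (feedback_mode, synack_flags) for an AccECN-enabled Server.

    allow_classic is a hypothetical knob modelling the choice the text
    leaves open when the SYN offers Classic ECN.
    """
    if syn_flags == (0, 0, 0):
        return "Not ECN", (0, 0, 0)
    if syn_flags == (0, 1, 1):
        if allow_classic:
            return "Classic ECN", (0, 0, 1)
        # Permitted but unlikely to be desirable (minimal implementations).
        return "Not ECN", (0, 0, 0)
    # (1,1,1), or any other combination without specific logic:
    # negotiate AccECN as if the flags had been (1,1,1).
    return "AccECN", SYNACK_FLAGS_FOR_SYN_ECN[ip_ecn_on_syn]
```

Note that the function never returns (1,0,1) or (1,1,1) as SYN/ACK flags, consistent with the rule that an AccECN Server does not send the reserved combination.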
3.1.4. Multiple SYNs or SYN/ACKs 3.1.4. Multiple SYNs or SYN/ACKs
3.1.4.1. Retransmitted SYNs 3.1.4.1. Retransmitted SYNs
If the sender of an AccECN SYN (the TCP Client) times out before If the sender of an AccECN SYN (the TCP Client) times out before
receiving the SYN/ACK, it SHOULD attempt to negotiate the use of receiving the SYN/ACK, it SHOULD attempt to negotiate the use of
AccECN at least one more time by continuing to set all three TCP ECN AccECN at least one more time by continuing to set all three TCP ECN
flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using
the usual retransmission time-outs). If this first retransmission the usual retransmission timeouts). If this first retransmission
also fails to be acknowledged, in deployment scenarios where AccECN also fails to be acknowledged, in deployment scenarios where AccECN
path traversal might be problematic, the TCP Client SHOULD send path traversal might be problematic, the TCP Client SHOULD send
subsequent retransmissions of the SYN with the three TCP-ECN flags subsequent retransmissions of the SYN with the three TCP-ECN flags
cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use
the same initial sequence number (ISN) as the original SYN. the same initial sequence number (ISN) as the original SYN.
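As a rough illustration of the recommended fall-back, the sketch below lists the flags a Client might place on the original SYN and each retransmission. The function name and the `path_may_block_accecn` parameter are hypothetical; the text does not say what to do after the first retransmission when path traversal is not considered problematic, so continuing with (1,1,1) in that case is an assumption of this sketch.

```python
def syn_schedule(isn, attempts, path_may_block_accecn=True):
    """Return a list of (isn, (AE, CWR, ECE)) for successive SYNs.

    Attempt 0 is the original SYN and attempt 1 is the first
    retransmission; both request AccECN. Later retransmissions clear
    the flags if AccECN path traversal might be problematic.
    """
    schedule = []
    for attempt in range(attempts):
        if attempt <= 1 or not path_may_block_accecn:
            flags = (1, 1, 1)
        else:
            flags = (0, 0, 0)
        # Every retransmitted SYN reuses the ISN of the original SYN.
        schedule.append((isn, flags))
    return schedule
```

For example, `syn_schedule(1000, 4)` yields two AccECN SYNs followed by two SYNs with the flags cleared, all carrying the same ISN.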
Retrying once before fall-back adds delay in the case where a Retrying once before fall-back adds delay in the case where a
middlebox drops an AccECN (or ECN) SYN deliberately. However, recent middlebox drops an AccECN (or ECN) SYN deliberately. However, recent
measurements [Mandalari18] imply that a drop is less likely to be due measurements [Mandalari18] imply that a drop is less likely to be due
to middlebox interference than other intermittent causes of loss, to middlebox interference than other intermittent causes of loss,
e.g., congestion, wireless transmission loss, etc. e.g., congestion, wireless transmission loss, etc.
Implementers MAY use other fall-back strategies if they are found to Implementers MAY use other fall-back strategies if they are found to
be more effective (e.g., attempting to negotiate AccECN on the SYN be more effective (e.g., attempting to negotiate AccECN on the SYN
only once or more than twice (most appropriate during high levels of only once or more than twice (most appropriate during high levels of
congestion)). congestion)).
Further, it might make sense to also remove any other new or Further, it might make sense to also remove any other new or
experimental fields or options on the SYN in case a middlebox might experimental fields or options on the SYN in case a middlebox might
be blocking them, although the required behaviour will depend on the be blocking them, although the required behaviour will depend on the
specification of the other option(s) and any attempt to co-ordinate specification of the other option(s) and any attempt to coordinate
fall-back between different modules of the stack. For instance, even fall-back between different modules of the stack. For instance, even
if taking part in an [RFC8311] experiment that allows ECT on a SYN, if taking part in an [RFC8311] experiment that allows ECT on a SYN,
it would be advisable to try it without. it would be advisable to try it without.
Whichever fall-back strategy is used, the TCP initiator SHOULD cache Whichever fall-back strategy is used, the TCP initiator SHOULD cache
failed connection attempts. If it does, it SHOULD NOT give up failed connection attempts. If it does, it SHOULD NOT give up
attempting to negotiate AccECN on the SYN of subsequent connection attempting to negotiate AccECN on the SYN of subsequent connection
attempts until it is clear that the blockage is persistently and attempts until it is clear that the blockage is persistently and
specifically due to AccECN. The cache needs to be arranged to expire specifically due to AccECN. The cache needs to be arranged to expire
so that the initiator will infrequently attempt to check whether the so that the initiator will infrequently attempt to check whether the
skipping to change at page 19, line 15 skipping to change at line 858
All fall-back strategies will need to follow all the normative rules All fall-back strategies will need to follow all the normative rules
in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs
negotiating different types of feedback have been sent within the negotiating different types of feedback have been sent within the
same connection, including the possibility that they arrive out of same connection, including the possibility that they arrive out of
order. As examples, the following non-normative bullets call out order. As examples, the following non-normative bullets call out
those rules from Section 3.1.5 that apply to the above fall-back those rules from Section 3.1.5 that apply to the above fall-back
strategies: strategies:
* Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and
with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK
from the Server in response to one, the other, or both and from the Server in response to one, the other, or both, and
possibly reordered; possibly reordered;
* Such a TCP Client enters the feedback mode appropriate to the * Such a TCP Client enters the feedback mode appropriate to the
first SYN/ACK it receives according to Table 2, and it does not first SYN/ACK it receives according to Table 2, and it does not
switch to a different mode, whatever other SYN/ACKs it might switch to a different mode, whatever other SYN/ACKs it might
receive or send; receive or send;
* If a TCP Client has entered AccECN mode but then subsequently * If a TCP Client has entered AccECN mode but then subsequently
sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it
is still allowed to set ECT on packets for the rest of the is still allowed to set ECT on packets for the rest of the
connection. Note that this rule is different to that of a Server connection. Note that this rule is different than that of a
in an equivalent position (Section 3.1.5 explains). Server in an equivalent position (Section 3.1.5 explains).
* Having entered AccECN mode, in general a TCP Client commits to * Having entered AccECN mode, in general a TCP Client commits to
respond to any incoming congestion feedback, whether or not it respond to any incoming congestion feedback, whether or not it
sets ECT on outgoing packets (for rationale and some exceptions sets ECT on outgoing packets (for rationale and some exceptions
see Section 3.2.2.3, Section 3.2.2.4); see Section 3.2.2.3, Section 3.2.2.4);
* Having entered AccECN mode, a TCP Client commits to using AccECN * Having entered AccECN mode, a TCP Client commits to using AccECN
to feed back the IP-ECN field in incoming packets for the rest of to feed back the IP-ECN field in incoming packets for the rest of
the connection, as specified in Section 3.2, even if it is not the connection, as specified in Section 3.2, even if it is not
itself setting ECT on outgoing packets. itself setting ECT on outgoing packets.
skipping to change at page 20, line 10 skipping to change at line 902
negotiating different types of feedback are sent within the same negotiating different types of feedback are sent within the same
connection, including the possibility that they arrive out of order. connection, including the possibility that they arrive out of order.
As examples, the following non-normative bullets call out those rules As examples, the following non-normative bullets call out those rules
from Section 3.1.5 that apply to the above fall-back strategies: from Section 3.1.5 that apply to the above fall-back strategies:
* An AccECN-capable TCP Server enters the feedback mode appropriate * An AccECN-capable TCP Server enters the feedback mode appropriate
to the first SYN it receives using Table 2, and it does not switch to the first SYN it receives using Table 2, and it does not switch
to a different mode, whatever other SYNs it might receive and to a different mode, whatever other SYNs it might receive and
whatever SYN/ACKs it might send; whatever SYN/ACKs it might send;
* if a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) =
(0,0,0), it preferably acknowledges it first using an AccECN SYN/ (0,0,0), it preferably acknowledges it first using an AccECN SYN/
ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0);
* If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it
uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN
field on the latest SYN to have arrived; field on the latest SYN to have arrived;
* If a TCP Server enters AccECN mode then subsequently sends a SYN/ * If a TCP Server enters AccECN mode and then subsequently sends a
ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is
prohibited from setting ECT on any packet for the rest of the prohibited from setting ECT on any packet for the rest of the
connection; connection;
* Having entered AccECN mode, in general a TCP Server commits to * Having entered AccECN mode, in general a TCP Server commits to
respond to any incoming congestion feedback, whether or not it respond to any incoming congestion feedback, whether or not it
sets ECT on outgoing packets (for rationale and some exceptions sets ECT on outgoing packets (for rationale and some exceptions
see Section 3.2.2.3, Section 3.2.2.4); see Sections 3.2.2.3, 3.2.2.4);
* Having entered AccECN mode, a TCP Server commits to using AccECN * Having entered AccECN mode, a TCP Server commits to using AccECN
to feed back the IP-ECN field in incoming packets for the rest of to feed back the IP-ECN field in incoming packets for the rest of
the connection, as specified in Section 3.2, even if it is not the connection, as specified in Section 3.2, even if it is not
itself setting ECT on outgoing packets. itself setting ECT on outgoing packets.
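The first, fourth, and last bullets above can be sketched as a small piece of per-connection state on the Server. This is a non-normative illustration: the class and method names are hypothetical, it only models events during the handshake, and it does not model the Classic ECN rules of [RFC3168] beyond recording the mode.

```python
class ServerEcnState:
    """Tracks the feedback mode and the ECT prohibition for one connection."""

    def __init__(self):
        self.mode = None            # set once, by the first valid SYN
        self.ect_prohibited = False

    def on_valid_syn(self, flags):
        if self.mode is None:
            if flags == (0, 0, 0):
                self.mode = "Not ECN"
            elif flags == (0, 1, 1):
                self.mode = "Classic ECN"
            else:
                self.mode = "AccECN"
        # The mode never changes after the first SYN, but a later
        # (0,0,0) SYN prohibits ECT for the rest of the connection.
        elif self.mode == "AccECN" and flags == (0, 0, 0):
            self.ect_prohibited = True

    def on_synack_sent(self, flags):
        if self.mode == "AccECN" and flags == (0, 0, 0):
            self.ect_prohibited = True

    def may_set_ect(self):
        return self.mode in ("AccECN", "Classic ECN") and not self.ect_prohibited

    def must_feed_back_accecn(self):
        # Feedback continues in AccECN mode even if ECT is prohibited.
        return self.mode == "AccECN"
```

Keeping the mode and the ECT prohibition as separate fields reflects that a Server can remain in AccECN mode, and keep feeding back AccECN, while no longer being allowed to set ECT.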
3.1.5. Implications of AccECN Mode 3.1.5. Implications of AccECN Mode
Section 3.1.1 describes the only ways that a host can enter AccECN Section 3.1.1 describes the only ways that a host can enter AccECN
mode, whether as a Client or as a Server. mode, whether as a Client or as a Server.
skipping to change at page 21, line 5 skipping to change at line 946
synchronization; synchronization;
'Valid SYN': A SYN that has the same port numbers and the same ISN 'Valid SYN': A SYN that has the same port numbers and the same ISN
as the SYN that first caused the Server to open the connection. as the SYN that first caused the Server to open the connection.
An 'Acceptable' packet is defined in Section 1.3. An 'Acceptable' packet is defined in Section 1.3.
Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back):
* Any implementation that supports AccECN: * Any implementation that supports AccECN:
- MUST NOT switch into a different feedback mode to the one it - MUST NOT switch into a different feedback mode than the one it
first entered according to Table 2, no matter whether it first entered according to Table 2, no matter whether it
subsequently receives valid SYNs or Acceptable SYN/ACKs of subsequently receives valid SYNs or Acceptable SYN/ACKs of
different types. different types.
- SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are
received after the implementation reaches the Established received after the implementation reaches the Established
state, in line with the general TCP approach [RFC9293]; state, in line with the general TCP approach [RFC9293];
Reason: Reaching established state implies that at least one Reason: Reaching established state implies that at least one
SYN and one SYN/ACK have successfully been delivered. And all SYN and one SYN/ACK have successfully been delivered. And all
skipping to change at page 22, line 35 skipping to change at line 1024
- SHOULD respond to any subsequent valid SYN using a SYN/ACK with - SHOULD respond to any subsequent valid SYN using a SYN/ACK with
(AE,CWR,ECE) = (0,0,0), even if the SYN is offering to (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to
negotiate Classic ECN or AccECN feedback mode; negotiate Classic ECN or AccECN feedback mode;
Rationale: There would be no point in the Server offering any Rationale: There would be no point in the Server offering any
type of ECN feedback, because the Client will not be using ECN. type of ECN feedback, because the Client will not be using ECN.
However, there is no interoperability reason to make this rule However, there is no interoperability reason to make this rule
mandatory. mandatory.
If for any reason a host is not willing to provide ECN feedback on a If for any reason a host is not willing to provide ECN feedback on a
particular TCP connection, it SHOULD clear the AE, CWR and ECE flags particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags
in all SYN and/or SYN/ACK packets that it sends. in all SYN and/or SYN/ACK packets that it sends.
Sending ECT: Sending ECT:
* Any implementation that supports AccECN: * Any implementation that supports AccECN:
- MUST NOT set ECT if it is in Not ECN feedback mode. - MUST NOT set ECT if it is in Not ECN feedback mode.
A Data Sender in AccECN mode: A Data Sender in AccECN mode:
skipping to change at page 23, line 12 skipping to change at line 1049
- MAY not set ECT on any packet (for instance if it has reason to - MAY not set ECT on any packet (for instance if it has reason to
believe such a packet would be blocked); believe such a packet would be blocked);
A TCP Server in AccECN mode: A TCP Server in AccECN mode:
- MUST NOT set ECT on any packet for the rest of the connection, - MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable if it has received or sent at least one valid SYN or Acceptable
SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake.
This rule solely applies to a Server because, when a Server This rule solely applies to a Server because, when a Server
enters AccECN mode it doesn't know for sure whether the Client enters AccECN mode, it doesn't know for sure whether the Client
will end up in AccECN mode. But when a Client enters AccECN will end up in AccECN mode. But when a Client enters AccECN
mode, it can be certain that the Server is already in AccECN mode, it can be certain that the Server is already in AccECN
feedback mode. feedback mode.
Congestion response: Congestion response:
* A host in AccECN mode: * A host in AccECN mode:
- is obliged to respond appropriately to AccECN feedback that - is obliged to respond appropriately to AccECN feedback that
indicates there were ECN marks on packets it had previously indicates there were ECN marks on packets it had previously
sent, where 'appropriately' is defined in Section 6.1 of sent, where 'appropriately' is defined in Section 6.1 of
[RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311];
- is still obliged to respond appropriately to congestion - is still obliged to respond appropriately to congestion
feedback, even when it is solely sending non-ECN-capable feedback, even when it is solely sending non-ECN-capable
packets (for rationale, some examples and some exceptions see packets (for rationale, some examples and some exceptions see
Section 3.2.2.3, Section 3.2.2.4). Sections 3.2.2.3 and 3.2.2.4).
- is still obliged to respond appropriately to congestion - is still obliged to respond appropriately to congestion
feedback, even if it has sent or received a SYN or SYN/ACK feedback, even if it has sent or received a SYN or SYN/ACK
packet with (AE,CWR,ECE) = (0,0,0) during the handshake; packet with (AE,CWR,ECE) = (0,0,0) during the handshake;
- MUST NOT set CWR to indicate that it has received and responded - MUST NOT set CWR to indicate that it has received and responded
to indications of congestion. to indications of congestion.
For the avoidance of doubt, this is unlike an RFC 3168 data For the avoidance of doubt, this is unlike an RFC 3168 data
sender and this does not preclude the Data Sender from setting sender and this does not preclude the Data Sender from setting
skipping to change at page 24, line 29 skipping to change at line 1112
- MUST NOT use reception of packets with ECT set in the IP-ECN - MUST NOT use reception of packets with ECT set in the IP-ECN
field as an implicit signal that the peer is ECN-capable. field as an implicit signal that the peer is ECN-capable.
Reason: ECT at the IP layer does not explicitly confirm the Reason: ECT at the IP layer does not explicitly confirm the
peer has the correct ECN feedback logic, because the packets peer has the correct ECN feedback logic, because the packets
could have been mangled at the IP layer. could have been mangled at the IP layer.
3.2. AccECN Feedback 3.2. AccECN Feedback
Each Data Receiver of each half connection maintains four counters, Each Data Receiver of each half connection maintains four counters,
r.cep, r.ceb, r.e0b and r.e1b: r.cep, r.ceb, r.e0b, and r.e1b:
* The Data Receiver MUST increment the CE packet counter (r.cep), * The Data Receiver MUST increment the CE packet counter (r.cep),
for every Acceptable packet that it receives with the CE code for every Acceptable packet that it receives with the CE code
point in the IP ECN field, including CE marked control packets and point in the IP-ECN field, including CE-marked control packets and
retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). retransmissions but excluding CE on SYN packets (SYN=1; ACK=0).
* A Data Receiver that supports sending of AccECN TCP Options MUST * A Data Receiver that supports sending of AccECN TCP Options MUST
increment the r.ceb, r.e0b or r.e1b byte counters by the number of increment the r.ceb, r.e0b, or r.e1b byte counters by the number
TCP payload octets in Acceptable packets marked with the CE, of TCP payload octets in Acceptable packets marked with the CE,
ECT(0) and ECT(1) codepoint in their IP-ECN field, including any ECT(0), and ECT(1) codepoint in their IP-ECN field, including any
payload octets on control packets and retransmissions, but not payload octets on control packets and retransmissions, but not
including any payload octets on SYN packets (SYN=1; ACK=0). including any payload octets on SYN packets (SYN=1; ACK=0).
Each Data Sender of each half connection maintains four counters, Each Data Sender of each half connection maintains four counters,
s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent
counters at the Data Receiver. counters at the Data Receiver.
A Data Receiver feeds back the CE packet counter using the Accurate A Data Receiver feeds back the CE packet counter using the Accurate
ECN (ACE) field, as explained in Section 3.2.2. And it optionally ECN (ACE) field, as explained in Section 3.2.2. And it optionally
feeds back all the byte counters using the AccECN TCP Option, as feeds back all the byte counters using the AccECN TCP Option, as
specified in Section 3.2.3. specified in Section 3.2.3.
Whenever a Data Receiver feeds back the value of any counter, it MUST Whenever a Data Receiver feeds back the value of any counter, it MUST
report the most recent value, no matter whether it is in a pure ACK, report the most recent value, no matter whether it is in a pure ACK,
or an ACK piggybacked on a packet used by the other half-connection, or an ACK piggybacked on a packet used by the other half-connection,
whether new payload data or a retransmission. Therefore the feedback whether a new payload data or a retransmission. Therefore, the
piggybacked on a retransmitted packet is unlikely to be the same as feedback piggybacked on a retransmitted packet is unlikely to be the
the feedback on the original packet. same as the feedback on the original packet.
3.2.1. Initialization of Feedback Counters 3.2.1. Initialization of Feedback Counters
When a host first enters AccECN mode, in its role as a Data Receiver When a host first enters AccECN mode, in its role as a Data Receiver,
it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1 and r.ceb it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and
= 0. r.ceb = 0.
Non-zero initial values are used to support a stateless handshake Non-zero initial values are used to support a stateless handshake
(see Section 5.1) and to be distinct from cases where the fields are (see Section 5.1) and to be distinct from cases where the fields are
incorrectly zeroed (e.g., by middleboxes - see Section 3.2.3.2.4). incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4).
When a host enters AccECN mode, in its role as a Data Sender it When a host enters AccECN mode, in its role as a Data Sender, it
initializes its counters to s.cep = 5, s.e0b = s.e1b = 1 and s.ceb = initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb =
0. 0.
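As a rough illustration of the counter rules above (a sketch only, not part of the specification), the Data Receiver's counters and their initial values could be modelled as follows. The names ReceiverCounters, on_packet and is_syn_only are hypothetical. The sketch assumes the caller has already decided that the packet is Acceptable, and it deliberately leaves out the rule for incrementing r.cep, which is not restated in this part of the text.

```python
from dataclasses import dataclass

# IP-ECN codepoints as 2-bit field values (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverCounters:
    """Hypothetical model of a Data Receiver's AccECN counters."""
    cep: int = 5  # r.cep, CE packet counter (initial value 5)
    e0b: int = 1  # r.e0b, ECT(0) byte counter (initial value 1)
    e1b: int = 1  # r.e1b, ECT(1) byte counter (initial value 1)
    ceb: int = 0  # r.ceb, CE byte counter (initial value 0)

    def on_packet(self, ip_ecn, payload_len, is_syn_only=False):
        """Update the byte counters for one Acceptable packet.

        is_syn_only is True for a SYN packet (SYN=1; ACK=0), whose
        payload octets are not counted.
        """
        if is_syn_only:
            return
        if ip_ecn == CE:
            self.ceb += payload_len
        elif ip_ecn == ECT0:
            self.e0b += payload_len
        elif ip_ecn == ECT1:
            self.e1b += payload_len
        # Not-ECT packets do not change any byte counter.
```

Payload octets on retransmissions and control packets other than the SYN go through the same path, so the caller does not need to distinguish them.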
3.2.2. The ACE Field 3.2.2. The ACE Field
After AccECN has been negotiated on the SYN and SYN/ACK, both hosts After AccECN has been negotiated on the SYN and SYN/ACK, both hosts
overload the three TCP flags (AE, CWR and ECE) in the main TCP header overload the three TCP flags (AE, CWR, and ECE) in the main TCP
as one 3-bit field. Then the field is given a new name, ACE, as header as one 3-bit field. Then the field is given a new name, ACE,
shown in Figure 3. as shown in Figure 3.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | | U | A | P | R | S | F | | | | | U | A | P | R | S | F |
| Header Length | Reserved | ACE | R | C | S | S | Y | I | | Header Length | Reserved | ACE | R | C | S | S | Y | I |
| | | | G | K | H | T | N | N | | | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 3: Definition of the ACE field within bytes 13 and 14 of Figure 3: Definition of the ACE Field Within Bytes 13 and 14 of
the TCP Header (when AccECN has been negotiated and SYN=0). the TCP Header (When AccECN Has Been Negotiated and SYN=0).
The original definition of these three flags in the TCP header, The original definition of these three flags in the TCP header,
including the addition of support for the ECN Nonce, is shown for including the addition of support for the ECN-nonce, is shown for
comparison in Figure 1. This specification does not rename these comparison in Figure 1. This specification does not rename these
three TCP flags to ACE unconditionally; it merely overloads them with three TCP flags to ACE unconditionally; it merely overloads them with
another name and definition once an AccECN connection has been another name and definition once an AccECN connection has been
established. established.
With one exception (Section 3.2.2.1), a host with both of its half- With one exception (Section 3.2.2.1), a host with both of its half-
connections in AccECN mode MUST interpret the AE, CWR and ECE flags connections in AccECN mode MUST interpret the AE, CWR, and ECE flags
as the 3-bit ACE counter on a segment with the SYN flag cleared as the 3-bit ACE counter on a segment with the SYN flag cleared
(SYN=0). On such a packet, a Data Receiver MUST encode the three (SYN=0). On such a packet, a Data Receiver MUST encode the 3 least
least significant bits of its r.cep counter into the ACE field that significant bits of its r.cep counter into the ACE field that it
it feeds back to the Data Sender. The least significant bit is at feeds back to the Data Sender. The least significant bit is at bit
bit offset 9 in Figure 3. A host MUST NOT interpret the 3 flags as a offset 9 in Figure 3. A host MUST NOT interpret the three flags as a
3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or 3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or
if AccECN negotiation is incomplete or has not succeeded. if AccECN negotiation is incomplete or has not succeeded.
Both parts of each of these conditions are equally important. For Both parts of each of these conditions are equally important. For
instance, even if AccECN negotiation has been successful, the ACE instance, even if AccECN negotiation has been successful, the ACE
field is not defined on any segments with SYN=1 (e.g., a field is not defined on any segments with SYN=1 (e.g., a
retransmission of an unacknowledged SYN/ACK, or when both ends send retransmission of an unacknowledged SYN/ACK, or when both ends send
SYN/ACKs after AccECN support has been successfully negotiated during SYN/ACKs after AccECN support has been successfully negotiated during
a simultaneous open). a simultaneous open).
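The overloading of the three flags can be sketched as below. This is an illustrative sketch, not a definitive implementation: the function names are hypothetical, AE is taken to be the most significant bit and ECE the least significant (consistent with the least significant bit being at bit offset 9), and the handshake exception of Section 3.2.2.1 is not handled.

```python
def encode_ace(r_cep):
    """Return (AE, CWR, ECE) carrying the 3 least significant bits of r.cep."""
    ace = r_cep & 0b111
    return (ace >> 2) & 1, (ace >> 1) & 1, ace & 1


def read_ace(ae, cwr, ece, syn, accecn_negotiated):
    """Return the 3-bit ACE value, or None if the flags must not be
    interpreted as an ACE field (SYN=1, or AccECN not negotiated)."""
    if syn or not accecn_negotiated:
        return None
    return (ae << 2) | (cwr << 1) | ece
```

Both conditions are checked independently, so a retransmitted SYN/ACK on a connection that has already negotiated AccECN still yields None.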
skipping to change at page 26, line 46 skipping to change at line 1221
with a packet that does not satisfy these conditions (e.g., it has with a packet that does not satisfy these conditions (e.g., it has
data to include on the ACK), it SHOULD first send a pure ACK that data to include on the ACK), it SHOULD first send a pure ACK that
does satisfy these conditions (see Section 5.2), so that it can feed does satisfy these conditions (see Section 5.2), so that it can feed
back which of the four values of the IP-ECN field arrived on the SYN/ back which of the four values of the IP-ECN field arrived on the SYN/
ACK. A valid exception to this "SHOULD" would be where the ACK. A valid exception to this "SHOULD" would be where the
implementation will only be used in an environment where mangling of implementation will only be used in an environment where mangling of
the ECN field is unlikely. the ECN field is unlikely.
The TCP Client MUST also use the handshake encoding for the pure ACK The TCP Client MUST also use the handshake encoding for the pure ACK
of any retransmitted SYN/ACK that confirms that the TCP Server of any retransmitted SYN/ACK that confirms that the TCP Server
supports AccECN. The procedure for the TCP Server to follow if the supports AccECN. If the final ACK of the handshake does not arrive
final ACK of the handshake does not arrive before its retransmission before its retransmission timer expires, the TCP Server is to follow the
timer expires is given in Section 3.1.4.2. procedure given in Section 3.1.4.2.
+==================+================+=====================+ +==================+================+=====================+
| IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | IP-ECN codepoint | ACE on pure | r.cep of TCP Client |
| on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | on SYN/ACK | ACK of SYN/ACK | in AccECN mode |
+==================+================+=====================+ +==================+================+=====================+
| Not-ECT | 0b010 | 5 | | Not-ECT | 0b010 | 5 |
+------------------+----------------+---------------------+ +------------------+----------------+---------------------+
| ECT(1) | 0b011 | 5 | | ECT(1) | 0b011 | 5 |
+------------------+----------------+---------------------+ +------------------+----------------+---------------------+
| ECT(0) | 0b100 | 5 | | ECT(0) | 0b100 | 5 |
+------------------+----------------+---------------------+ +------------------+----------------+---------------------+
| CE | 0b110 | 6 | | CE | 0b110 | 6 |
+------------------+----------------+---------------------+ +------------------+----------------+---------------------+
Table 3: The encoding of the ACE field in the ACK of Table 3: The Encoding of the ACE Field in the ACK of
the SYN-ACK to reflect the SYN-ACK's IP-ECN field the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field
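Table 3 amounts to a small lookup, which might be sketched as follows. The dictionary and function names are hypothetical, and the string keys are simply the codepoint labels used in the table.

```python
# IP-ECN codepoint on the SYN/ACK -> (ACE on pure ACK of SYN/ACK,
#                                     r.cep of TCP Client in AccECN mode)
HANDSHAKE_ACE = {
    "Not-ECT": (0b010, 5),
    "ECT(1)": (0b011, 5),
    "ECT(0)": (0b100, 5),
    "CE": (0b110, 6),
}


def ack_of_synack(ip_ecn_on_synack):
    """Return (ace, r_cep) for the pure ACK that acknowledges a SYN/ACK."""
    return HANDSHAKE_ACE[ip_ecn_on_synack]
```

Only the CE row moves r.cep away from its initial value of 5.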
When an AccECN Server in SYN-RCVD state receives a pure ACK with When an AccECN Server in SYN-RCVD state receives a pure ACK with
SYN=0 and no SACK blocks, instead of treating the ACE field as a SYN=0 and no SACK blocks, instead of treating the ACE field as a
counter, it MUST infer the meaning of each possible value of the ACE counter, it MUST infer the meaning of each possible value of the ACE
field from Table 4, which also shows the value that an AccECN Server field from Table 4, which also shows the value that an AccECN Server
MUST set s.cep to as a result. MUST set s.cep to as a result.
Given this encoding of the ACE field on the ACK of a SYN/ACK is Given this encoding of the ACE field on the ACK of a SYN/ACK is
exceptional, an AccECN Server using large receive offload (LRO) might exceptional, an AccECN Server using large receive offload (LRO) might
prefer to disable LRO until such an ACK has transitioned it out of prefer to disable LRO until such an ACK has transitioned it out of
skipping to change at page 28, line 28 skipping to change at line 1275
+------------+--------------------------+---------------------+ +------------+--------------------------+---------------------+
| 0b101 | Currently Unused {Note | 5 | | 0b101 | Currently Unused {Note | 5 |
| | 2} | | | | 2} | |
+------------+--------------------------+---------------------+ +------------+--------------------------+---------------------+
| 0b110 | CE | 6 | | 0b110 | CE | 6 |
+------------+--------------------------+---------------------+ +------------+--------------------------+---------------------+
| 0b111 | Currently Unused {Note | 5 | | 0b111 | Currently Unused {Note | 5 |
| | 2} | | | | 2} | |
+------------+--------------------------+---------------------+ +------------+--------------------------+---------------------+
Table 4: Meaning of the ACE field on the ACK of the SYN/ACK Table 4: Meaning of the ACE Field on the ACK of the SYN/ACK
{Note 1}: If the Server is in AccECN mode and in SYN-RCVD state, and Note 1: If the Server is in AccECN mode and in SYN-RCVD state, and
if it receives a value of zero on a pure ACK with SYN=0 and no SACK if it receives a value of zero on a pure ACK with SYN=0 and
blocks, for the rest of the connection the Server MUST NOT set ECT on no SACK blocks, for the rest of the connection the Server
outgoing packets and MUST NOT respond to AccECN feedback. MUST NOT set ECT on outgoing packets and MUST NOT respond to
Nonetheless, as a Data Receiver it MUST NOT disable AccECN feedback. AccECN feedback. Nonetheless, as a Data Receiver, it MUST
NOT disable AccECN feedback.
Any of the circumstances below could cause a value of zero but, Any of the circumstances below could cause a value of zero
whatever the cause, the actions above would be the appropriate but, whatever the cause, the actions above would be the
response: appropriate response:
* The TCP Client has somehow entered No ECN feedback mode (most * The TCP Client has somehow entered No ECN feedback mode
likely if the Server received a SYN or sent a SYN/ACK with (most likely if the Server received a SYN or sent a SYN/
(AE,CWR,ECE) = (0,0,0) after entering AccECN mode, but possible ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN
even if it didn't); mode, but possible even if it didn't);
* The TCP Client genuinely might be in AccECN mode, but its count of * The TCP Client genuinely might be in AccECN mode, but its
received CE marks might have caused the ACE field to wrap to zero. count of received CE marks might have caused the ACE
This is highly unlikely, but not impossible because the Server field to wrap to zero. This is highly unlikely, but not
might have already sent multiple packets while still in SYN-RCVD impossible because the Server might have already sent
state, e.g., using TFO (see Section 5.2) and some might have been multiple packets while still in SYN-RCVD state, e.g.,
CE-marked. Then ACE on the first ACK seen by the Server might be using TFO (see Section 5.2), and some might have been CE-
zero, due to previous ACKs experiencing an unfortunate pattern of marked. Then ACE on the first ACK seen by the Server
loss or delay. might be zero, due to previous ACKs experiencing an
unfortunate pattern of loss or delay.
* Some form of non-compliance at the TCP Client or on the path (see * There is some form of non-compliance at the TCP Client or
Section 3.2.2.4). on the path (see Section 3.2.2.4).
{Note 2}: If the Server is in AccECN mode, these values are Currently Note 2: If the Server is in AccECN mode, these values are Currently
Unused but the AccECN Server's behaviour is still defined for forward Unused but the AccECN Server's behaviour is still defined
compatibility. Then the designer of a future protocol can know for for forward compatibility. Then the designer of a future
certain what AccECN Servers will do with these codepoints. protocol can know for certain what AccECN Servers will do
with these codepoints.
{Note 3}: In the case where a Server that implements AccECN is also Note 3: In the case where a Server that implements AccECN is also
using a stateless handshake (termed a SYN cookie) it will not using a stateless handshake (termed a SYN cookie), it will
remember whether it entered AccECN mode. The values 0b000 or 0b001 not remember whether it entered AccECN mode. The values
will remind it that it did not enter AccECN mode, because AccECN does 0b000 or 0b001 will remind it that it did not enter AccECN
not use them (see Section 5.1 for details). If a Server that uses a mode, because AccECN does not use them (see Section 5.1 for
stateless handshake and implements AccECN receives either of these details). If a Server that uses a stateless handshake and
two values in the ACK, its action is implementation-dependent and implements AccECN receives either of these two values in the
outside the scope of this document. It will certainly not take the ACK, its action is implementation-dependent and outside the
action in the third column because, after it receives either of these scope of this document. It will certainly not take the
values, it is not in AccECN mode. In example, it will not disable action in the third column because, after it receives either
ECN (at least not just because ACE is 0b000) and it will not set of these values, it is not in AccECN mode. For example, it
s.cep. will not disable ECN (at least not just because ACE is
0b000) and it will not set s.cep.
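A Server-side reading of the ACE field on the ACK of the SYN/ACK might be sketched as below. This is a hedged sketch under stated assumptions: the names are hypothetical; the entries for 0b010, 0b011 and 0b100 are assumed to be the inverse of Table 3 with s.cep set to 5; and the values 0b000 and 0b001 are deliberately not modelled, because their handling is governed by Notes 1 and 3 rather than by a simple lookup.

```python
# ACE value -> (inferred IP-ECN codepoint on the SYN/ACK, s.cep to set)
# Entries 0b010..0b100 are ASSUMED to mirror Table 3; 0b101..0b111 are
# taken from the rows of Table 4 shown above.
SERVER_HANDSHAKE_DECODE = {
    0b010: ("Not-ECT", 5),
    0b011: ("ECT(1)", 5),
    0b100: ("ECT(0)", 5),
    0b101: ("Currently Unused", 5),
    0b110: ("CE", 6),
    0b111: ("Currently Unused", 5),
}


def decode_ack_of_synack(ace):
    """Return (meaning, s_cep) for a pure ACK with SYN=0 and no SACK
    blocks received in SYN-RCVD state, or None for 0b000 and 0b001,
    which need the separate handling described in Notes 1 and 3."""
    return SERVER_HANDSHAKE_DECODE.get(ace & 0b111)
```

Returning None rather than a default forces the caller to make an explicit decision for the two values that signal a Client that is not providing AccECN feedback.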
3.2.2.2. Encoding and Decoding Feedback in the ACE Field 3.2.2.2. Encoding and Decoding Feedback in the ACE Field
Whenever the Data Receiver sends an ACK with SYN=0 (with or without Whenever the Data Receiver sends an ACK with SYN=0 (with or without
data), unless the handshake encoding in Section 3.2.2.1 applies, the data), unless the handshake encoding in Section 3.2.2.1 applies, the
Data Receiver MUST encode the least significant 3 bits of its r.cep Data Receiver MUST encode the least significant 3 bits of its r.cep
counter into the ACE field (see Appendix A.2). counter into the ACE field (see Appendix A.2).
Whenever the Data Sender receives an ACK with SYN=0 (with or without Whenever the Data Sender receives an ACK with SYN=0 (with or without
data), it first checks whether it has already been superseded data), it first checks whether it has already been superseded
(defined in Appendix A.1) by another ACK in which case it ignores the (defined in Appendix A.1) by another ACK in which case it ignores the
ECN feedback. If the ACK has not been superseded, and if the special ECN feedback. If the ACK has not been superseded, and if the special
handshake encoding in Section 3.2.2.1 does not apply, the Data Sender handshake encoding in Section 3.2.2.1 does not apply, the Data Sender
decodes the ACE field as follows (see Appendix A.2 for examples). decodes the ACE field as follows (see Appendix A.2 for examples).
* It takes the least significant 3 bits of its local s.cep counter * It takes the least significant 3 bits of its local s.cep counter
and subtracts them from the incoming ACE counter to work out the and subtracts them from the incoming ACE counter to work out the
minimum positive increment it could apply to s.cep (assuming the minimum positive increment it could apply to s.cep (assuming the
ACE field only wrapped at most once). ACE field only wrapped once at most).
* It then follows the safety procedures in Section 3.2.2.5.2 to * It then follows the safety procedures in Section 3.2.2.5.2 to
calculate or estimate how many packets the ACK could have calculate or estimate how many packets the ACK could have
acknowledged under the prevailing conditions to determine whether acknowledged under the prevailing conditions to determine whether
the ACE field might have wrapped more than once. the ACE field might have wrapped more than once.
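The first decoding step is modulo-8 arithmetic and can be sketched as follows. The function name is hypothetical, and the sketch covers only the minimum increment; it does not implement the safety procedures of Section 3.2.2.5.2 that detect more than one wrap.

```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide


def min_ace_increment(s_cep, ace):
    """Smallest increment to s.cep that is consistent with the incoming
    ACE field, assuming the field wrapped at most once."""
    return (ace - (s_cep & 0b111)) % ACE_MODULUS
```

For example, with s.cep = 7 and an incoming ACE of 0b001, the result is 2: the counter advanced from 7 to 9 and the field wrapped once. A result of 0 means the ACK reports no new CE marks.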
The encode/decode procedures during the three-way handshake are The encode/decode procedures during the three-way handshake are
exceptions to the general rules given so far, so they are spelled out exceptions to the general rules given so far, so they are spelled out
step by step below for clarity: step by step below for clarity:
skipping to change at page 30, line 19 skipping to change at line 1368
Reason: It would be redundant for the Server to include CE-marked Reason: It would be redundant for the Server to include CE-marked
SYNs in its r.cep counter, because it already reliably delivers SYNs in its r.cep counter, because it already reliably delivers
feedback of any CE marking using the encoding in the top block of feedback of any CE marking using the encoding in the top block of
Table 2 in the SYN/ACK. This also ensures that, when the Server Table 2 in the SYN/ACK. This also ensures that, when the Server
starts using the ACE field, it has not unnecessarily consumed more starts using the ACE field, it has not unnecessarily consumed more
than one initial value, given they can be used to negotiate than one initial value, given they can be used to negotiate
variants of the AccECN protocol (see Appendix B.3). variants of the AccECN protocol (see Appendix B.3).
* If a TCP Client in AccECN mode receives CE feedback in the TCP * If a TCP Client in AccECN mode receives CE feedback in the TCP
flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its
initial value of 5), so that it stays in step with r.cep on the initial value of 5) so that it stays in step with r.cep on the
Server. Nonetheless, the TCP Client still triggers the congestion Server. Nonetheless, the TCP Client still triggers the congestion
control actions necessary to respond to the CE feedback. control actions necessary to respond to the CE feedback.
* If a TCP Client in AccECN mode receives a CE mark in the IP-ECN * If a TCP Client in AccECN mode receives a CE mark in the IP-ECN
field of a SYN/ACK, it MUST increment r.cep, but no more than once field of a SYN/ACK, it MUST increment r.cep, but no more than once
no matter how many CE-marked SYN/ACKs it receives no matter how many CE-marked SYN/ACKs it receives (i.e.,
(i.e., incremented from 5 to 6, but no further). incremented from 5 to 6, but no further).
Reason: Incrementing r.cep ensures the Client will eventually Reason: Incrementing r.cep ensures the Client will eventually
deliver any CE marking to the Server reliably when it starts using deliver any CE marking to the Server reliably when it starts using
the ACE field. Even though the Client also feeds back any CE the ACE field. Even though the Client also feeds back any CE
marking on the ACK of the SYN/ACK using the encoding in Table 3, marking on the ACK of the SYN/ACK using the encoding in Table 3,
this ACK is not delivered reliably, so it can be considered as a this ACK is not delivered reliably, so it can be considered as a
timely notification that is redundant but unreliable. The Client timely notification that is redundant but unreliable. The Client
does not increment r.cep more than once, because the Server can does not increment r.cep more than once, because the Server can
only increment s.cep once (see next bullet). Also, this limits only increment s.cep once (see next bullet). Also, this limits
the unnecessarily consumed initial values of the ACE field to two. the unnecessarily consumed initial values of the ACE field to two.
* If a TCP Server in AccECN mode and in SYN-RCVD state receives CE * If a TCP Server in AccECN mode and in SYN-RCVD state receives CE
feedback in the TCP flags of a pure ACK with no SACK blocks, it feedback in the TCP flags of a pure ACK with no SACK blocks, it
MUST increment s.cep (from 5 to 6). The TCP Server then triggers MUST increment s.cep (from 5 to 6). The TCP Server then triggers
the congestion control actions necessary to respond to the CE the congestion control actions necessary to respond to the CE
feedback. feedback.
Reasoning: The TCP Server can only increment s.cep once, because Reasoning: The TCP Server can only increment s.cep once, because
the first ACK it receives will cause it to transition out of SYN- the first ACK it receives will cause it to transition out of SYN-
RCVD state. The Server's congestion response would be no RCVD state. The Server's congestion response would be no
different even if it could receive feedback of more than one CE- different, even if it could receive feedback of more than one CE-
marked SYN/ACK. marked SYN/ACK.
Once the TCP Server transitions to ESTABLISHED state, it might Once the TCP Server transitions to ESTABLISHED state, it might
later receive other pure ACK(s) with the handshake encoding in the later receive other pure ACK(s) with the handshake encoding in the
ACE field. A Server MAY implement a test for such a case, but it ACE field. A Server MAY implement a test for such a case, but it
is not required. Therefore, once in the ESTABLISHED state, it is not required. Therefore, once in the ESTABLISHED state, it
will be sufficient for the Server to consider the ACE field to be will be sufficient for the Server to consider the ACE field to be
encoded as the normal ACE counter on all packets with SYN=0. encoded as the normal ACE counter on all packets with SYN=0.
Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or
skipping to change at page 31, line 46 skipping to change at line 1444
comparison implies an invalid transition of the IP-ECN field, for comparison implies an invalid transition of the IP-ECN field, for
the remainder of the half-connection the Server is advised to send the remainder of the half-connection the Server is advised to send
non-ECN-capable packets, but it still ought to respond to any non-ECN-capable packets, but it still ought to respond to any
feedback of CE markings (explained below). However, the Server feedback of CE markings (explained below). However, the Server
MUST remain in the AccECN feedback mode and it MUST continue to MUST remain in the AccECN feedback mode and it MUST continue to
feed back any ECN markings on arriving packets (in its role as feed back any ECN markings on arriving packets (in its role as
Data Receiver). Data Receiver).
If a Data Sender in AccECN mode starts sending non-ECN-capable If a Data Sender in AccECN mode starts sending non-ECN-capable
packets because it has detected mangling, it is still advised to packets because it has detected mangling, it is still advised to
respond to CE feedback. Reason: any CE-marking arriving at the Data respond to CE feedback. Reason: Any CE marking arriving at the Data
Receiver could be due to something early in the path mangling the Receiver could be due to something early in the path mangling the
non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, non-ECN-capable IP-ECN field into an ECN-capable codepoint and then,
later in the path, a network bottleneck might be applying CE-markings later in the path, a network bottleneck might be applying CE markings
to indicate genuine congestion. This argument applies whether the to indicate genuine congestion. This argument applies whether the
handshake packet originally sent by the TCP Client or Server was non- handshake packet originally sent by the TCP Client or Server was non-
ECN-capable or ECN-capable because, in either case, an unsafe ECN-capable or ECN-capable because, in either case, an unsafe
transition could imply that non-ECN-capable packets later in the transition could imply that non-ECN-capable packets later in the
connection might get mangled. connection might get mangled.
Once a Data Sender has entered AccECN mode it is advised to check Once a Data Sender has entered AccECN mode it is advised to check
whether it is receiving continuous feedback of CE. Specifying whether it is receiving continuous feedback of CE. Specifying
exactly how to do this is beyond the scope of the present exactly how to do this is beyond the scope of the present
specification, but the sender might check whether the feedback for specification, but the sender might check whether the feedback for
every packet it sends for the first three or four rounds indicates every packet it sends for the first three or four rounds indicates CE
CE-marking. If continuous CE-marking is detected, for the remainder marking. If continuous CE marking is detected, for the remainder of
of the half-connection, the Data Sender ought to send non-ECN-capable the half-connection, the Data Sender ought to send non-ECN-capable
packets and it is advised not to respond to any feedback of CE packets, and it is advised not to respond to any feedback of CE
markings. The Data Sender might occasionally test whether it can markings. The Data Sender might occasionally test whether it can
resume sending ECN-capable packets. resume sending ECN-capable packets.
The above advice on switching to sending non-ECN-capable packets but The above advice on switching to sending non-ECN-capable packets but
still responding to CE-markings unless they become continuous is not still responding to CE markings unless they become continuous is not
stated normatively (in capitals), because the best strategy might stated normatively (in capitals), because the best strategy might
depend on experience of the most likely types of mangling, which can depend on experience of the most likely types of mangling, which can
only be known at the time of deployment. The same is true for other only be known at the time of deployment. The same is true for other
forms of mangling (or resumption of expected marking) during later forms of mangling (or resumption of expected marking) during later
stages of a connection. stages of a connection.
As always, once a host has entered AccECN mode, it follows the As always, once a host has entered AccECN mode, it follows the
general mandatory requirements (Section 3.1.5) to remain in the same general mandatory requirements (Section 3.1.5) to remain in the same
feedback mode and to continue feeding back any ECN markings on feedback mode and to continue feeding back any ECN markings on
arriving packets using AccECN feedback. This follows the general arriving packets using AccECN feedback. This follows the general
skipping to change at page 32, line 42 skipping to change at line 1488
whatever it receives (Section 2.5). whatever it receives (Section 2.5).
The ACK of the SYN/ACK is not reliably delivered (nonetheless, the The ACK of the SYN/ACK is not reliably delivered (nonetheless, the
count of CE marks is still eventually delivered reliably). If this count of CE marks is still eventually delivered reliably). If this
ACK does not arrive, the Server is advised to continue to send ECN- ACK does not arrive, the Server is advised to continue to send ECN-
capable packets without having tested for mangling of the IP-ECN capable packets without having tested for mangling of the IP-ECN
field on the SYN/ACK. field on the SYN/ACK.
All the fall-back behaviours in this section are necessary in case All the fall-back behaviours in this section are necessary in case
mangling of the IP-ECN field is asymmetric, which is currently common mangling of the IP-ECN field is asymmetric, which is currently common
over some mobile networks [Mandalari18]. Then one end might see no over some mobile networks [Mandalari18]. In this case, one end might
unsafe transition and continue sending ECN-capable packets, while the see no unsafe transition and continue sending ECN-capable packets,
other end sees an unsafe transition and stops sending ECN-capable while the other end sees an unsafe transition and stops sending ECN-
packets. capable packets.
Invalid transitions of the IP-ECN field are defined in section 18 of Invalid transitions of the IP-ECN field are defined in Section 18 of
the Classic ECN specification [RFC3168] and repeated here for the Classic ECN specification [RFC3168] and repeated here for
convenience: convenience:
* the not-ECT codepoint changes; * the Not-ECT codepoint changes;
* either ECT codepoint transitions to not-ECT;
* either ECT codepoint transitions to Not-ECT;
* the CE codepoint changes. * the CE codepoint changes.
RFC 3168 says that a router that changes ECT to not-ECT is invalid RFC 3168 says that a router that changes ECT to Not-ECT is invalid
but safe. However, from a host's viewpoint, this transition is but safe. However, from a host's viewpoint, this transition is
unsafe because it could be the result of two transitions at different unsafe because it could be the result of two transitions at different
routers on the path: ECT to CE (safe) then CE to not-ECT (unsafe). routers on the path: ECT to CE (safe) then CE to Not-ECT (unsafe).
This scenario could well happen where an ECN-enabled home router This scenario could well happen where an ECN-enabled home router
congests its upstream mobile broadband bottleneck link, then the congests its upstream mobile broadband bottleneck link, then the
ingress to the mobile network clears the ECN field [Mandalari18]. ingress to the mobile network clears the ECN field [Mandalari18].
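The three invalid transitions listed above can be checked with a few comparisons. The following is a minimal sketch with hypothetical names; it treats any transition not in the list (such as ECT to CE) as valid.

```python
NOT_ECT, ECT1, ECT0, CE = "Not-ECT", "ECT(1)", "ECT(0)", "CE"


def is_invalid_transition(sent, received):
    """Return True if the IP-ECN field changed from 'sent' to 'received'
    in one of the ways listed as invalid."""
    if sent == received:
        return False  # no transition at all
    if sent == NOT_ECT:
        return True   # the Not-ECT codepoint changes
    if sent == CE:
        return True   # the CE codepoint changes
    # sent is ECT(0) or ECT(1): only a change to Not-ECT is invalid
    return received == NOT_ECT
```

A host comparing the codepoint it sent on a handshake packet with the codepoint fed back by its peer could use such a check to decide whether to stop sending ECN-capable packets.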
3.2.2.4. Testing for Zeroing of the ACE Field 3.2.2.4. Testing for Zeroing of the ACE Field
Section 3.2.1 required the Data Receiver to initialize the r.cep Section 3.2.1 required the Data Receiver to initialize the r.cep
counter to a non-zero value. Therefore, in either direction the counter to a non-zero value. Therefore, in either direction the
initial value of the ACE counter ought to be non-zero. initial value of the ACE counter ought to be non-zero.
skipping to change at page 34, line 13 skipping to change at line 1554
the other half connection. the other half connection.
If reordering occurs, the first feedback packet that arrives will not If reordering occurs, the first feedback packet that arrives will not
necessarily be the same as the first packet in sequence order. The necessarily be the same as the first packet in sequence order. The
test has been specified loosely like this to simplify implementation, test has been specified loosely like this to simplify implementation,
and because it would not have been any more precise to have specified and because it would not have been any more precise to have specified
the first packet in sequence order, which would not necessarily be the first packet in sequence order, which would not necessarily be
the first ACE counter that the Data Receiver fed back anyway, given the first ACE counter that the Data Receiver fed back anyway, given
it might have been a retransmission. it might have been a retransmission.
The possibility of re-ordering means that there is a small chance The possibility of reordering means that there is a small chance that
that the ACE field on the first packet to arrive is genuinely zero the ACE field on the first packet to arrive is genuinely zero
(without middlebox interference). This would cause a host to (without middlebox interference). This would cause a host to
unnecessarily disable ECN for a half connection. Therefore, in unnecessarily disable ECN for a half connection. Therefore, in
environments where there is no evidence of the ACE field being environments where there is no evidence of the ACE field being
zeroed, implementations MAY skip this test. zeroed, implementations MAY skip this test.
Note that the Data Sender MUST NOT test whether the arriving counter Note that the Data Sender MUST NOT test whether the arriving counter
in the initial ACE field has been initialized to a specific valid in the initial ACE field has been initialized to a specific valid
value - the above check solely tests whether the ACE fields have been value -- the above check solely tests whether the ACE fields have
incorrectly zeroed. This allows hosts to use different initial been incorrectly zeroed. This allows hosts to use different initial
values as an additional signalling channel in future. values as an additional signalling channel in the future.
3.2.2.5. Safety against Ambiguity of the ACE Field 3.2.2.5. Safety Against Ambiguity of the ACE Field
If too many CE-marked segments are acknowledged at once, or if a long If too many CE-marked segments are acknowledged at once, or if a long
run of ACKs is lost or thinned out, the 3-bit counter in the ACE run of ACKs is lost or thinned out, the 3-bit counter in the ACE
field might have cycled between two ACKs arriving at the Data Sender. field might have cycled between two ACKs arriving at the Data Sender.
The following safety procedures minimize this ambiguity. The following safety procedures minimize this ambiguity.
3.2.2.5.1. Packet Receiver Safety Procedures 3.2.2.5.1. Packet Receiver Safety Procedures
The following rules define when the receiver of a packet in AccECN The following rules define when the receiver of a packet in AccECN
mode emits an ACK: mode emits an ACK:
Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK
whenever a data packet marked CE arrives after the previous packet whenever a data packet marked CE arrives after the previous packet
was not CE. was not CE.
Even though this rule is stated as a "SHOULD", it is important for Even though this rule is stated as a "SHOULD", it is important for
a transition to trigger an ACK if at all possible, The only valid a transition to trigger an ACK if at all possible. The only valid
exception to this rule is given below these bullets. exception to this rule is given below these bullets.
For the avoidance of doubt, this rule is deliberately worded to For the avoidance of doubt, this rule is deliberately worded to
apply solely when _data_ packets arrive, but the comparison with apply solely when _data_ packets arrive, but the comparison with
the previous packet includes any packet, not just data packets. the previous packet includes any packet, not just data packets.
Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit
an ACK if 'n' CE marks have arrived since the previous ACK. If an ACK if 'n' CE marks have arrived since the previous ACK. If
there is unacknowledged data at the receiver, 'n' SHOULD be 2. If there is unacknowledged data at the receiver, 'n' SHOULD be 2. If
there is no unacknowledged data at the receiver, 'n' SHOULD be 3 there is no unacknowledged data at the receiver, 'n' SHOULD be 3
and MUST be no less than 3. In either case, 'n' MUST be no and MUST be no less than 3. In either case, 'n' MUST be no
greater than 7. greater than 7.
The above rules for when to send an ACK are designed to be The above rules for when to send an ACK are designed to be
complemented by those in Section 3.2.3.3, which concern whether an complemented by those in Section 3.2.3.3, which concern whether an
AccECN TCP Option ought to be included on ACKs. AccECN TCP Option ought to be included on ACKs.
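As a rough illustration of the two rules above, the following Python sketch decides, for each arriving packet, whether an ACK is due. The class AckTrigger, its attributes, and its method names are hypothetical and not part of the specification. The sketch uses the recommended values of 'n' (2 and 3) and leaves out the receive offload considerations discussed next.

```python
class AckTrigger:
    """Hypothetical per-connection state for the two ACK-triggering rules."""

    def __init__(self):
        self.prev_was_ce = False  # was the previous packet (of any kind) CE-marked?
        self.ce_since_ack = 0     # CE marks that have arrived since the previous ACK

    def on_ack_sent(self):
        """Call whenever an ACK is emitted for any other reason (e.g. delayed ACK)."""
        self.ce_since_ack = 0

    def on_packet(self, is_ce, is_data, unacked_data):
        """Return True if an ACK should be emitted for this arrival.

        unacked_data: True if the receiver holds data it has not yet ACKed.
        """
        if is_ce:
            self.ce_since_ack += 1
        # Change-triggered: a CE-marked *data* packet arriving after a
        # previous packet (data or not) that was not CE.
        change = is_data and is_ce and not self.prev_was_ce
        # Increment-triggered: recommended n is 2 with unacknowledged data,
        # otherwise 3 (n must never exceed 7).
        n = 2 if unacked_data else 3
        increment = self.ce_since_ack >= n
        self.prev_was_ce = is_ce
        if change or increment:
            self.ce_since_ack = 0
            return True
        return False
```

The caller is assumed to report every arriving packet, including pure ACKs, so that the comparison with "the previous packet" covers packets of any kind.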
If the arrivals of a number of data packets are all processed as one If the arrivals of a number of data packets are all processed as one
event, e.g., using large receive offload (LRO) or generic receive event, e.g., using large receive offload (LRO) or generic receive
offload (GRO), both the above rules SHOULD be interpreted as offload (GRO), both the above rules SHOULD be interpreted as
requiring multiple ACKs to be emitted back-to-back (for each requiring multiple ACKs to be emitted back to back (for each
transition and for each sequence of 'n' CE marks). If this is transition and for each sequence of 'n' CE marks). If this is
problematic for high performance, either rule can be interpreted as problematic for high performance, either rule can be interpreted as
requiring just a single ACK at the end of the whole receive event. requiring just a single ACK at the end of the whole receive event.
Even if a number of data packets do not arrive as one event, the Even if a number of data packets do not arrive as one event, the
'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be
problematic for high performance (although high performance protocols problematic for high performance (although high performance protocols
such as DCTCP already successfully use change-triggered ACKs). The such as DCTCP already successfully use change-triggered ACKs). The
rationale for change-triggered ACKs is so that the Data Sender can rationale for change-triggered ACKs is so that the Data Sender can
rely on them to detect queue growth as soon as possible, particularly rely on them to detect queue growth as soon as possible, particularly
at the start of a flow. The approach can lead to some additional at the start of a flow. The approach can lead to some additional
ACKs but it feeds back the timing and the order in which ECN marks ACKs but it feeds back the timing and the order in which ECN marks
are received with minimal additional complexity. If CE marks are are received with minimal additional complexity. If CE marks are
infrequent, as is the case for most Active Queue Managment (AQM) infrequent, as is the case for most Active Queue Management (AQM)
packet schedulers at the time of writing, or there are multiple marks packet schedulers at the time of writing, or there are multiple marks
in a row, the additional load will be low. However, marking patterns in a row, the additional load will be low. However, marking patterns
with numerous non-contiguous CE marks could increase the load with numerous non-contiguous CE marks could increase the load
significantly. One possible compromise would be for the receiver to significantly. One possible compromise would be for the receiver to
heuristically detect whether the sender is in slow-start, then to heuristically detect whether the sender is in slow-start, then to
implement change-triggered ACKs while the sender is in slow-start, implement change-triggered ACKs while the sender is in slow-start,
and offload otherwise. and offload otherwise.
In a scenario where both endpoints support AccECN, if host B has In a scenario where both endpoints support AccECN, if host B has
chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] chosen to use ECN-capable pure ACKs (as allowed in [RFC8311]
experiments) and enough of these ACKs become CE-marked, then the experiments) and enough of these ACKs become CE marked, then the
'Increment-Triggered ACKs' rule ensures that its peer (host A) gives 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives
B sufficient feedback about this congestion on the ACKs from B to A. B sufficient feedback about this congestion on the ACKs from B to A.
Normally, for instance in a unidirectional data scenario from host A Normally, for instance in a unidirectional data scenario from host A
to B, the Data Sender (A) can piggyback that feedback on its data. to B, the Data Sender (A) can piggyback that feedback on its data.
But if A stops sending data, the second part of the 'Increment- But if A stops sending data, the second part of the 'Increment-
Triggered ACKs' rule requires A to emit a pure ACK for at least every Triggered ACKs' rule requires A to emit a pure ACK for at least every
third CE-marked incoming ACK over the subsequent round trip. third CE-marked incoming ACK over the subsequent round trip.
Although TCP normally only ACKs data segments, in this case the Although TCP normally only ACKs data segments, in this case the
increment-triggered ACK rule makes it mandatory for A to emit ACKs of increment-triggered ACK rule makes it mandatory for A to emit ACKs of
skipping to change at page 36, line 21 skipping to change at line 1655
even if A also uses ECN-capable pure ACKs, and even if there is even if A also uses ECN-capable pure ACKs, and even if there is
pathological congestion in both directions, any resulting ping-pong pathological congestion in both directions, any resulting ping-pong
of ACKs will be rapidly damped. of ACKs will be rapidly damped.
In the above bidirectional scenario, incoming ACKs of ACKs could be In the above bidirectional scenario, incoming ACKs of ACKs could be
mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished
from duplicate ACKs because they do not contain any SACK blocks even from duplicate ACKs because they do not contain any SACK blocks even
when SACK has been negotiated. It is outside the scope of this when SACK has been negotiated. It is outside the scope of this
AccECN specification to normatively specify this additional test for AccECN specification to normatively specify this additional test for
DupACKs, because ACKs of ACKs can only arise if the original ACKs are DupACKs, because ACKs of ACKs can only arise if the original ACKs are
ECN-capable. Instead any specification that allows ECN-capable pure ECN-capable. Instead, any specification that allows ECN-capable pure
ACKs MUST make sending ACKs of ACKs conditional on measures to ACKs MUST make sending ACKs of ACKs conditional on measures to
distinguish ACKs of ACKs from DupACKs (see for example distinguish ACKs of ACKs from DupACKs (see for example [ECN++]). All
[I-D.ietf-tcpm-generalized-ecn]). All that is necessary here is to that is necessary here is to require that these ACKs of ACKs MUST NOT
require that these ACKs of ACKs MUST NOT contain any SACK blocks contain any SACK blocks (which would normally not happen anyway).
(which would normally not happen anyway).
3.2.2.5.2. Data Sender Safety Procedures 3.2.2.5.2. Data Sender Safety Procedures
If the Data Sender has not received AccECN TCP Options to give it If the Data Sender has not received AccECN TCP Options to give it
more dependable information, and it detects that the ACE field could more dependable information, and it detects that the ACE field could
have cycled, it SHOULD deem whether it cycled by taking the safest have cycled, it SHOULD deem whether it cycled by taking the safest
likely case under the prevailing conditions. It can detect if the likely case under the prevailing conditions. It can detect if the
counter could have cycled by using the jump in the acknowledgement counter could have cycled by using the jump in the acknowledgement
number since the last ACK to calculate or estimate how many segments number since the last ACK to calculate or estimate how many segments
could have been acknowledged. An example algorithm to implement this could have been acknowledged. An example algorithm to implement this
skipping to change at page 37, line 33 skipping to change at line 1715
| Kind = 174 | Length = 11 | EE1B field | | Kind = 174 | Length = 11 | EE1B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B (cont'd) | ECEB field | | EE1B (cont'd) | ECEB field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B field | Order 1 | EE0B field | Order 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: The Two Alternative AccECN TCP Options Figure 4: The Two Alternative AccECN TCP Options
Figure 4 shows two option field orders; order 0 and order 1. They Figure 4 shows two option field orders; order 0 and order 1. They
both consists of three 24-bit fields. Order 0 provides the 24 least both consist of three 24-bit fields. Order 0 provides the 24 least
significant bits of the r.e0b, r.ceb and r.e1b counters, significant bits of the r.e0b, r.ceb, and r.e1b counters,
respectively. Order 1 provides the same fields, but in the opposite respectively. Order 1 provides the same fields, but in the opposite
order. On each packet, the Data Receiver can use whichever order is order. On each packet, the Data Receiver can use whichever order is
more efficient. In either case, the bytes within the fields are in more efficient. In either case, the bytes within the fields are in
network byte order (big-endian). network byte order (big-endian).
The choice to use three bytes (24 bits) fields in the options was The choice to use three bytes (24 bits) fields in the options was
made to strike a balance between TCP option space usage, and the made to strike a balance between TCP option space usage, and the
required fidelity of the counters to accomodate typical scenarios required fidelity of the counters to accommodate typical scenarios
such as hardware TCP segmentation offloading (TSO), and periods where such as hardware TCP Segmentation Offloading (TSO), and periods
no option may be transmitted (e.g., SACK loss recovery). Providing during which no option may be transmitted (e.g., SACK loss recovery).
only 2 bytes (16 bits) for these counters could easily roll over Providing only 2 bytes (16 bits) for these counters could easily roll
within a single TSO transmission or large/generic receive offload over within a single TSO transmission or large/generic receive
(LRO/GRO) event. Having two distinct orderings further allows the offload (LRO/GRO) event. Having two distinct orderings further
transmission of the most pertinent changes in an abbreviated option allows the transmission of the most pertinent changes in an
(see below). abbreviated option (see below).
When a Data Receiver sends an AccECN Option, it MUST set the Kind When a Data Receiver sends an AccECN Option, it MUST set the Kind
field to 172 if using Order 0, or to 174 if using Order 1. These two field to 172 if using Order 0, or to 174 if using Order 1. These two
new TCP Option Kinds are registered in Section 7 and called new TCP Option Kinds are registered in Section 7 and are called
respectively AccECN0 and AccECN1. AccECN0 and AccECN1, respectively.
Note that there is no field to feed back Not-ECT bytes. Nonetheless Note that there is no field to feed back Not-ECT bytes. Nonetheless,
an algorithm for the Data Sender to calculate the number of payload an algorithm for the Data Sender to calculate the number of payload
bytes received as Not-ECT is given in Appendix A.4. bytes received as Not-ECT is given in Appendix A.4.
Whenever a Data Receiver sends an AccECN Option, the rules in Whenever a Data Receiver sends an AccECN Option, the rules in
Section 3.2.3.3 allow it to omit unchanged fields from the tail of Section 3.2.3.3 allow it to omit unchanged fields from the tail of
the option, to help cope with option space limitations, as long as it the option, to help cope with option space limitations, as long as it
preserves the order of the remaining fields and includes any field preserves the order of the remaining fields and includes any field
that has changed. The length field MUST indicate which fields are that has changed. The length field MUST indicate which fields are
present as follows: present as follows:
skipping to change at page 38, line 48 skipping to change at line 1776
but there is very limited space for the option. but there is very limited space for the option.
All implementations of a Data Sender that read any AccECN Option MUST All implementations of a Data Sender that read any AccECN Option MUST
be able to read AccECN Options of any of the above lengths. For be able to read AccECN Options of any of the above lengths. For
forward compatibility, if the AccECN Option is of any other length, forward compatibility, if the AccECN Option is of any other length,
implementations MUST use those whole 3-octet fields that fit within implementations MUST use those whole 3-octet fields that fit within
the length and ignore the remainder of the option, treating it as the length and ignore the remainder of the option, treating it as
padding. padding.
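The length handling above can be sketched as follows. This is an illustrative Python parser, not a normative one: the function name parse_accecn_option and the returned dictionary are hypothetical, and capping the result at three fields for over-long options is an assumption of this sketch. The field orders follow Figure 4.

```python
ACCECN0, ACCECN1 = 172, 174  # TCP Option Kinds for Order 0 and Order 1

# Field order on the wire for each Kind, as in Figure 4.
FIELD_ORDER = {
    ACCECN0: ("ee0b", "eceb", "ee1b"),
    ACCECN1: ("ee1b", "eceb", "ee0b"),
}


def parse_accecn_option(option: bytes) -> dict:
    """Parse one AccECN TCP Option: Kind, Length, then up to three fields."""
    if len(option) < 2:
        raise ValueError("option too short")
    kind, length = option[0], option[1]
    if kind not in FIELD_ORDER:
        raise ValueError("not an AccECN Option")
    if length < 2 or length > len(option):
        raise ValueError("bad option length")
    payload = option[2:length]
    # Use only the whole 3-octet fields that fit within the length; any
    # remaining octets are treated as padding. Capping at three fields is
    # an assumption of this sketch.
    count = min(len(payload) // 3, 3)
    fields = {}
    for i, name in enumerate(FIELD_ORDER[kind][:count]):
        # Bytes within each field are in network byte order (big-endian).
        fields[name] = int.from_bytes(payload[3 * i:3 * i + 3], "big")
    return fields
```

Fields omitted from the tail of the option are simply absent from the result, so the caller can leave the corresponding local counters unchanged.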
AccECN Options have to be optional to implement, because both sender AccECN Options have to be optional to implement, because both sender
and receiver have to be able to cope without options anyway - in and receiver have to be able to cope without options anyway -- in
cases where they do not traverse a network path. It is RECOMMENDED cases where they do not traverse a network path. It is RECOMMENDED
to implement both sending and receiving of AccECN Options. Support to implement both sending and receiving of AccECN Options. Support
for AccECN Options is particularly valuable over paths that introduce for AccECN Options is particularly valuable over paths that introduce
a high degree of ACK filtering, where the 3-bit ACE counter alone a high degree of ACK filtering, where the 3-bit ACE counter alone
might sometimes be insufficient, when it is ambiguous whether it has might sometimes be insufficient, when it is ambiguous whether it has
wrapped. If sending of AccECN Options is implemented, the fall-backs wrapped. If sending of AccECN Options is implemented, the fall-backs
described in this document will need to be implemented as well described in this document will need to be implemented as well
(unless solely for a controlled environment where path traversal is (unless solely for a controlled environment where path traversal is
not considered a problem). Even if a developer does not implement not considered a problem). Even if a developer does not implement
logic to understand received AccECN Options, it is RECOMMENDED that logic to understand received AccECN Options, it is RECOMMENDED that
they implement logic to send AccECN Options. Otherwise, those remote they implement logic to send AccECN Options. Otherwise, those remote
peers that implement the receiving logic will still be excluded from peers that implement the receiving logic will still be excluded from
congestion feedback that is robust against the increasingly congestion feedback that is robust against the increasingly
aggressive ACK filtering in the Internet. The logic to send AccECN aggressive ACK filtering in the Internet. The logic to send AccECN
Options is the simpler to implement of the two sides. Options is the simpler to implement of the two sides.
If a Data Receiver intends to send an AccECN Option at any time If a Data Receiver intends to send an AccECN Option at any time
during the rest of the connection it is RECOMMENDED to also test path during the rest of the connection, it is RECOMMENDED to also test
traversal of the AccECN Option as specified in Section 3.2.3.2. path traversal of the AccECN Option as specified in Section 3.2.3.2.
3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields
Whenever the Data Receiver includes any of the counter fields (ECEB, Whenever the Data Receiver includes any of the counter fields (ECEB,
EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least
significant bits of the current value of the associated counter into significant bits of the current value of the associated counter into
the field (respectively r.ceb, r.e0b, r.e1b). the field (respectively r.ceb, r.e0b, r.e1b).
Whenever the Data Sender receives an ACK carrying an AccECN Option, Whenever the Data Sender receives an ACK carrying an AccECN Option,
it first checks whether the ACK has already been superseded by it first checks whether the ACK has already been superseded by
another ACK in which case it ignores the ECN feedback. If the ACK another ACK in which case it ignores the ECN feedback. If the ACK
has not been superseded, the Data Sender normally decodes the fields has not been superseded, the Data Sender normally decodes the fields
in the AccECN Option as follows. For each field, it takes the least in the AccECN Option as follows. For each field, it takes the least
significant 24 bits of its associated local counter (s.ceb, s.e0b or significant 24 bits of its associated local counter (s.ceb, s.e0b, or
s.e1b) and subtracts them from the counter in the associated field of s.e1b) and subtracts them from the counter in the associated field of
the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work
out the minimum positive increment it could apply to s.ceb, s.e0b or out the minimum positive increment it could apply to s.ceb, s.e0b, or
s.e1b (assuming the field in the option only wrapped at most once). s.e1b (assuming the field in the option only wrapped once at most).
Appendix A.1 gives an example algorithm for the Data Receiver to Appendix A.1 gives an example algorithm for the Data Receiver to
encode its byte counters into an AccECN Option, and for the Data encode its byte counters into an AccECN Option, and for the Data
Sender to decode the AccECN Option fields into its byte counters. Sender to decode the AccECN Option fields into its byte counters.
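The modular arithmetic described above can be illustrated with a minimal Python sketch. It is not the example algorithm of Appendix A.1: the function names are hypothetical, the check for superseded ACKs is omitted, and the field is assumed to have wrapped at most once.

```python
MASK24 = (1 << 24) - 1  # each option field carries 24 bits


def encode_field(receiver_counter: int) -> int:
    """Data Receiver: put the 24 least significant bits on the wire."""
    return receiver_counter & MASK24


def decode_field(local_counter: int, field: int) -> int:
    """Data Sender: return the local counter advanced to match the field."""
    # Subtract the low 24 bits of the local counter from the received
    # field, modulo 2^24, to get the smallest non-negative increment.
    delta = (field - (local_counter & MASK24)) & MASK24
    return local_counter + delta
```

For example, a local counter of 0xFFFFFE combined with a received field of 0x000003 gives an increment of 5, so the local counter becomes 0x1000003.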
Note that, as specified in Section 3.2, any data on the SYN (SYN=1, Note that, as specified in Section 3.2, any data on the SYN (SYN=1,
ACK=0) is not included in any of the byte counters held locally for ACK=0) is not included in any of the byte counters held locally for
each ECN marking nor in an AccECN Option on the wire. each ECN marking nor in an AccECN Option on the wire.
3.2.3.2. Path Traversal of the AccECN Option 3.2.3.2. Path Traversal of the AccECN Option
3.2.3.2.1. Testing the AccECN Option during the Handshake 3.2.3.2.1. Testing the AccECN Option During the Handshake
The TCP Client MUST NOT include an AccECN TCP Option on the SYN. If The TCP Client MUST NOT include an AccECN TCP Option on the SYN. If
there is somehow an AccECN Option on a SYN, it MUST be ignored when there is somehow an AccECN Option on a SYN, it MUST be ignored when
forwarded or received. forwarded or received.
A TCP Server that confirms its support for AccECN (in response to an A TCP Server that confirms its support for AccECN (in response to an
AccECN SYN from the Client as described in Section 3.1) SHOULD AccECN SYN from the Client as described in Section 3.1) SHOULD
include an AccECN TCP Option on the SYN/ACK. include an AccECN TCP Option on the SYN/ACK.
A TCP Client that has successfully negotiated AccECN SHOULD include A TCP Client that has successfully negotiated AccECN SHOULD include
an AccECN Option in the first ACK at the end of the three-way an AccECN Option in the first ACK at the end of the three-way
handshake. However, this first ACK is not delivered reliably, so the handshake. However, this first ACK is not delivered reliably, so the
TCP Client SHOULD also include an AccECN Option on the first data TCP Client SHOULD also include an AccECN Option on the first data
segment it sends (if it ever sends one). segment it sends (if it ever sends one).
A host MAY omit an AccECN Option in any of the above three cases due A host MAY omit an AccECN Option in any of the above three cases
to insufficient option space or if it has cached knowledge that the because of insufficient option space or because it has cached
packet would be likely to be blocked on the path to the other host if knowledge that the packet would be likely to be blocked on the path
it included an AccECN Option. to the other host if it included an AccECN Option.
3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option 3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option
If the TCP Server has not received an ACK to acknowledge its SYN/ACK If the TCP Server has not received an ACK to acknowledge its SYN/ACK
after the normal TCP timeout or it receives a second SYN with a after the normal TCP timeout or if it receives a second SYN with a
request for AccECN support, then either the SYN/ACK might just have request for AccECN support, then either the SYN/ACK might just have
been lost, e.g., due to congestion, or a middlebox might be blocking been lost, e.g., due to congestion, or a middlebox might be blocking
AccECN Options. To expedite connection setup in deployment scenarios AccECN Options. To expedite connection setup in deployment scenarios
where AccECN path traversal might be problematic, the TCP Server where AccECN path traversal might be problematic, the TCP Server
SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this
retransmission times out, to expedite connection setup, the TCP retransmission times out, to expedite connection setup, the TCP
Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and
no AccECN Option, but it remains in AccECN feedback mode (per no AccECN Option, but it remains in AccECN feedback mode (per
Section 3.1.5). Section 3.1.5).
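The fall-back sequence in the paragraph above can be summarized in a short Python sketch. The function synack_attempt and its return format are hypothetical, the negotiated flags are treated as an opaque tuple, and keeping (0,0,0) for any attempt beyond the second retransmission is an assumption of this sketch.

```python
def synack_attempt(attempt: int, negotiated_flags: tuple) -> dict:
    """Describe the SYN/ACK for a given attempt (0 = first transmission).

    negotiated_flags: the (AE, CWR, ECE) flags of the original AccECN
    SYN/ACK, treated as opaque here.
    """
    if attempt == 0:
        # First transmission: AccECN flags plus an AccECN Option.
        return {"flags": negotiated_flags, "accecn_option": True}
    if attempt == 1:
        # First retransmission: same flags, but no AccECN Option.
        return {"flags": negotiated_flags, "accecn_option": False}
    # Later retransmissions: (AE,CWR,ECE) = (0,0,0) and no AccECN Option.
    # The Server nonetheless remains in AccECN feedback mode.
    return {"flags": (0, 0, 0), "accecn_option": False}
```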
skipping to change at page 41, line 7 skipping to change at line 1875
The above fall-back approach limits any interference by middleboxes The above fall-back approach limits any interference by middleboxes
that might drop packets with unknown options, even though it is more that might drop packets with unknown options, even though it is more
likely that SYN/ACK loss is due to congestion. The TCP Server MAY likely that SYN/ACK loss is due to congestion. The TCP Server MAY
try to send another packet with an AccECN Option at a later point try to send another packet with an AccECN Option at a later point
during the connection but it ought to monitor if that packet got lost during the connection but it ought to monitor if that packet got lost
as well, in which case it SHOULD disable the sending of AccECN as well, in which case it SHOULD disable the sending of AccECN
Options for this half-connection. Options for this half-connection.
Implementers MAY use other fall-back strategies if they are found to Implementers MAY use other fall-back strategies if they are found to
be more effective (e.g., retrying an AccECN Option for a second time be more effective (e.g., retrying an AccECN Option for a second time
before fall-back - most appropriate during high levels of before fall-back -- most appropriate during high levels of
congestion). However, other fall-back strategies will need to follow congestion). However, other fall-back strategies will need to follow
all the rules in Section 3.1.5, which concern behaviour when SYNs or all the rules in Section 3.1.5, which concern behaviour when SYNs or
SYN/ACKs negotiating different types of feedback have been sent SYN/ACKs negotiating different types of feedback have been sent
within the same connection. within the same connection.
Further it might make sense to also remove any other new or Further it might make sense to also remove any other new or
experimental fields or options on the SYN/ACK, although the required experimental fields or options on the SYN/ACK, although the required
behaviour will depend on the specification of the other option(s) and behaviour will depend on the specification of the other option(s) and
on any attempt to co-ordinate fall-back between different modules of on any attempt to coordinate fall-back between different modules of
the stack. the stack.
If the TCP Client detects that the first data segment it sent with an If the TCP Client detects that the first data segment it sent with an
AccECN Option was lost, in deployment scenarios where AccECN path AccECN Option was lost, in deployment scenarios where AccECN path
traversal might be problematic, it SHOULD fall back to no AccECN traversal might be problematic, it SHOULD fall back to no AccECN
Option on the retransmission. Again, implementers MAY use other Option on the retransmission. Again, implementers MAY use other
fall-back strategies such as attempting to retransmit a second fall-back strategies such as attempting to retransmit a second
segment with an AccECN Option before fall-back, and/or caching segment with an AccECN Option before fall-back, and/or caching
whether AccECN Options are blocked for subsequent connections. whether AccECN Options are blocked for subsequent connections.
[RFC9040] further discusses caching of TCP parameters and status [RFC9040] further discusses caching of TCP parameters and status
skipping to change at page 41, line 40 skipping to change at line 1908
recognize, a host that is sending little or no data but mostly pure recognize, a host that is sending little or no data but mostly pure
ACKs will not inherently detect such losses. Such a host MAY detect ACKs will not inherently detect such losses. Such a host MAY detect
loss of ACKs carrying the AccECN Option by detecting whether the loss of ACKs carrying the AccECN Option by detecting whether the
acknowledged data always reappears as a retransmission. In such acknowledged data always reappears as a retransmission. In such
cases, the host SHOULD disable the sending of the AccECN Option for cases, the host SHOULD disable the sending of the AccECN Option for
this half-connection. this half-connection.
If a host falls back to not sending AccECN Options, it will continue If a host falls back to not sending AccECN Options, it will continue
to process any incoming AccECN Options as normal. to process any incoming AccECN Options as normal.
Either host MAY include AccECN Options in a subsequent segment or Either host MAY include AccECN Options in one or more subsequent
segments to retest whether AccECN Options can traverse the path. segments to retest whether AccECN Options can traverse the path.
Similarly, an AccECN endpoint MAY separately memorize which data Similarly, an AccECN endpoint MAY separately memorize which data
packets carried an AccECN Option and disable the sending of AccECN packets carried an AccECN Option and disable the sending of AccECN
Options if the loss probability of those packets is significantly Options if the loss probability of those packets is significantly
higher than that of all other data packets in the same connection. higher than that of all other data packets in the same connection.
3.2.3.2.3. Testing for Absence of the AccECN Option 3.2.3.2.3. Testing for Absence of the AccECN Option
If the TCP Client has successfully negotiated AccECN but does not If the TCP Client has successfully negotiated AccECN but does not
skipping to change at page 43, line 5 skipping to change at line 1962
the initial value of the EE0B field or EE1B field in an AccECN Option the initial value of the EE0B field or EE1B field in an AccECN Option
(if one exists) ought to be non-zero. If AccECN has been negotiated: (if one exists) ought to be non-zero. If AccECN has been negotiated:
* the TCP Server MAY check that the initial value of the EE0B field * the TCP Server MAY check that the initial value of the EE0B field
or the EE1B field is non-zero in the first segment that or the EE1B field is non-zero in the first segment that
acknowledges sequence space that at least covers the ISN plus 1. acknowledges sequence space that at least covers the ISN plus 1.
If it runs a test and either initial value is zero, the Server If it runs a test and either initial value is zero, the Server
will switch into a mode that ignores AccECN Options for this half will switch into a mode that ignores AccECN Options for this half
connection. connection.
* the TCP Client MAY check the initial value of the EE0B field or * the TCP Client MAY check that the initial value of the EE0B field
the EE1B field is non-zero on the SYN/ACK. If it runs a test and or the EE1B field is non-zero on the SYN/ACK. If it runs a test
either initial value is zero, the Client will switch into a mode and either initial value is zero, the Client will switch into a
that ignores AccECN Options for this half connection. mode that ignores AccECN Options for this half connection.
While a host is in the mode that ignores AccECN Options it MUST adopt While a host is in the mode that ignores AccECN Options, it MUST
the conservative interpretation of the ACE field discussed in adopt the conservative interpretation of the ACE field discussed in
Section 3.2.2.5. Section 3.2.2.5.
Note that the Data Sender MUST NOT test whether the arriving byte Note that the Data Sender MUST NOT test whether the arriving byte
counters in an initial AccECN Option have been initialized to counters in an initial AccECN Option have been initialized to
specific valid values - the above checks solely test whether these specific valid values -- the above checks solely test whether these
fields have been incorrectly zeroed. This allows hosts to use fields have been incorrectly zeroed. This allows hosts to use
different initial values as an additional signalling channel in different initial values as an additional signalling channel in the
future. Also note that the initial value of either field might be future. Also note that the initial value of either field might be
greater than its expected initial value, because the counters might greater than its expected initial value, because the counters might
already have been incremented. Nonetheless, the initial values of already have been incremented. Nonetheless, the initial values of
the counters have been chosen so that they cannot wrap to zero on the counters have been chosen so that they cannot wrap to zero on
these initial segments. these initial segments.
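The zeroing test above can be sketched as follows. This is only an
illustration under stated assumptions: the function name and the use
of None for an absent field are hypothetical, not taken from the
specification. In line with the note above, it tests solely for zero
and never compares against specific expected initial values.

```python
from typing import Optional


def option_fields_look_zeroed(ee0b: Optional[int], ee1b: Optional[int]) -> bool:
    """Return True if the first AccECN Option seen appears incorrectly zeroed.

    ee0b / ee1b are the initial EE0B and EE1B field values, or None when the
    field is absent from the option (hypothetical convention for this sketch).
    Only a zero value fails the test; any non-zero value is accepted, so that
    different initial values remain usable as a future signalling channel.
    """
    present = [value for value in (ee0b, ee1b) if value is not None]
    return any(value == 0 for value in present)


# A host that runs the test and gets True would switch into the mode that
# ignores AccECN Options for this half connection and rely on the ACE field.
ignore_options = option_fields_look_zeroed(ee0b=1, ee1b=0)
```

A value larger than the expected initial value passes this check, which
is consistent with counters that have already been incremented.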
3.2.3.2.5. Consistency between AccECN Feedback Fields 3.2.3.2.5. Consistency Between AccECN Feedback Fields
When AccECN Options are available they ought to provide more When AccECN Options are available, they ought to provide more
unambiguous feedback. However, they supplement but do not replace unambiguous feedback. However, they supplement but do not replace
the ACE field. An endpoint using AccECN feedback MUST always the ACE field. An endpoint using AccECN feedback MUST always
reconcile the information provided in the ACE field with that in any reconcile the information provided in the ACE field with that in any
AccECN Option, so that the state of the ACE-related packet counter AccECN Option, so that the state of the ACE-related packet counter
can be relied on if future feedback does not carry an AccECN Option. can be relied on if future feedback does not carry an AccECN Option.
If an AccECN Option is present, the s.cep counter might increase more If an AccECN Option is present, the s.cep counter might increase more
than expected from the increase of the s.ceb counter (e.g., due to a than expected from the increase of the s.ceb counter (e.g., due to a
CE-marked control packet). The sender's response to such a situation CE-marked control packet). The sender's response to such a situation
is out of scope, and needs to be dealt with in a specification that is out of scope, and needs to be dealt with in a specification that
skipping to change at page 44, line 8 skipping to change at line 2012
the s.cep has not (and by testing ACK coverage it is certain how much the s.cep has not (and by testing ACK coverage it is certain how much
the ACE field has wrapped), and if there is no explanation other than the ACE field has wrapped), and if there is no explanation other than
an invalid protocol transition due to some form of feedback mangling, an invalid protocol transition due to some form of feedback mangling,
the Data Sender MUST disable sending ECN-capable packets for the the Data Sender MUST disable sending ECN-capable packets for the
remainder of the half-connection by setting the IP-ECN field in all remainder of the half-connection by setting the IP-ECN field in all
subsequent packets to Not-ECT. subsequent packets to Not-ECT.
3.2.3.3. Usage of the AccECN TCP Option 3.2.3.3. Usage of the AccECN TCP Option
If a Data Receiver in AccECN mode intends to use AccECN TCP Options If a Data Receiver in AccECN mode intends to use AccECN TCP Options
to provide feedback, the rules below determine when it includes an to provide feedback, the rules below determine when to include an
AccECN TCP Option, and which fields to include, given other options AccECN TCP Option, and which fields to include, given other options
might be competing for limited option space: might be competing for limited option space:
Importance of Congestion Control: AccECN is for congestion control, Importance of Congestion Control: AccECN is for congestion control,
which implementations SHOULD generally prioritize over other TCP which implementations SHOULD generally prioritize over other TCP
options when there is insufficient space for all the options in options when there is insufficient space for all the options in
use. use.
If SACK has been negotiated [RFC2018], and the smallest If SACK has been negotiated [RFC2018], and the smallest
recommended AccECN Option would leave insufficient space for two recommended AccECN Option would leave insufficient space for two
skipping to change at page 44, line 38 skipping to change at line 2042
A scheduled ACK means an ACK that the Data Receiver would send by A scheduled ACK means an ACK that the Data Receiver would send by
its regular delayed ACK rules. Recall that Section 1.3 defines an its regular delayed ACK rules. Recall that Section 1.3 defines an
'ACK' as either with data payload or without. But the above rule 'ACK' as either with data payload or without. But the above rule
is worded so that, in the common case when most of the data is is worded so that, in the common case when most of the data is
from a Server to a Client, the Server only includes an AccECN TCP from a Server to a Client, the Server only includes an AccECN TCP
Option while it is acknowledging data from the Client. Option while it is acknowledging data from the Client.
When available TCP option space is limited on particular packets, the When available TCP option space is limited on particular packets, the
recommended scheme will need to include compromises. To guide the recommended scheme will need to include compromises. To guide the
implementer the rules below are ranked in order of importance, but implementer, the rules below are ranked in order of importance, but
the final decision has to be implementation-dependent, because the final decision has to be implementation-dependent, because
tradeoffs will alter as new TCP options are defined and new use-cases tradeoffs will alter as new TCP options are defined and new use-cases
arise. arise.
Necessary Option Length: When TCP option space is limited, an AccECN Necessary Option Length: When TCP option space is limited, an AccECN
TCP option MAY be truncated to omit one or two fields from the end TCP option MAY be truncated to omit one or two fields from the end
of the option, as indicated by the permitted variants listed in of the option, as indicated by the permitted variants listed in
Table 5, provided that the counter(s) that have changed since the Table 5, provided that the counter(s) that have changed since the
previous AccECN TCP option are not omitted. previous AccECN TCP option are not omitted.
skipping to change at page 45, line 51 skipping to change at line 2104
available for payload data with counter field(s) that have never available for payload data with counter field(s) that have never
changed. changed.
As an example of the recommended scheme, if ECT(0) is the only As an example of the recommended scheme, if ECT(0) is the only
codepoint that has ever arrived in the IP-ECN field, the Data codepoint that has ever arrived in the IP-ECN field, the Data
Receiver will feed back an AccECN0 TCP Option with only the EE0B Receiver will feed back an AccECN0 TCP Option with only the EE0B
field on every packet that acknowledges new data. However, as soon field on every packet that acknowledges new data. However, as soon
as even one CE-marked packet arrives, on every packet that as even one CE-marked packet arrives, on every packet that
acknowledges new data it will start to include an option with two acknowledges new data it will start to include an option with two
fields, EE0B and ECEB. As a second example, if the first packet to fields, EE0B and ECEB. As a second example, if the first packet to
arrive happens to be CE-marked, the Data Receiver will have to arrive happens to be CE marked, the Data Receiver will have to
arbitrarily choose whether to precede the ECEB field with an EE0B arbitrarily choose whether to precede the ECEB field with an EE0B
field or an EE1B field. If it chooses, say, EE0B but it turns out field or an EE1B field. If it chooses, say, EE0B but it turns out
never to receive ECT(0), it can start sending EE1B and ECEB instead - never to receive ECT(0), it can start sending EE1B and ECEB instead
it does not have to include the EE0B field if the r.e0b counter has -- it does not have to include the EE0B field if the r.e0b counter
never changed during the connection. never changed during the connection.
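The two examples above can be sketched as a small selection routine.
This is a hedged illustration, not the recommended scheme itself: the
field orders are assumed to follow the permitted variants of Table 5
(not reproduced in this section), the function name is hypothetical,
and the tie-break in favour of AccECN0 is one arbitrary choice among
those the text allows.

```python
# Assumed field orders of the two option kinds (see Table 5 of the
# specification); truncation can only drop fields from the end.
FIELD_ORDERS = {
    "AccECN0": ("EE0B", "ECEB", "EE1B"),
    "AccECN1": ("EE1B", "ECEB", "EE0B"),
}


def choose_option_fields(changed):
    """Pick the shortest variant that keeps every counter that has changed.

    `changed` is the set of field names whose counters have ever changed
    during the connection. Returns (option_kind, list_of_fields).
    """
    best = None
    for kind, order in FIELD_ORDERS.items():
        needed = 0
        for index, field in enumerate(order):
            if field in changed:
                needed = index + 1  # must keep everything up to this field
        if best is None or needed < best[0]:
            best = (needed, kind, list(order[:needed]))
    return best[1], best[2]


only_ect0 = choose_option_fields({"EE0B"})
after_first_ce = choose_option_fields({"EE0B", "ECEB"})
ce_and_ect1 = choose_option_fields({"EE1B", "ECEB"})
```

What to send when no counter has ever changed is not covered by this
sketch; it simply returns an empty field list in that case.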
With the recommended scheme, if the data sending direction switches With the recommended scheme, if the data sending direction switches
during a connection, there can be cases where the AccECN TCP Option during a connection, there can be cases where the AccECN TCP Option
that is meant to feed back the counter values at the end of a volley that is meant to feed back the counter values at the end of a volley
in one direction never reaches the other peer, due to packet loss. in one direction never reaches the other peer due to packet loss.
ACE feedback ought to be sufficient to fill this gap, given accurate ACE feedback ought to be sufficient to fill this gap, given accurate
feedback becomes moot after data transmission has paused. feedback becomes moot after data transmission has paused.
Appendix A.3 gives an example algorithm to estimate the number of Appendix A.3 gives an example algorithm to estimate the number of
marked bytes from the ACE field alone, if AccECN Options are not marked bytes from the ACE field alone, if AccECN Options are not
available. available.
If a host has determined that segments with AccECN Options always If a host has determined that segments with AccECN Options always
seem to be discarded somewhere along the path, it is no longer seem to be discarded somewhere along the path, it is no longer
obliged to follow any of the rules in this section. obliged to follow any of the rules in this section.
3.3. AccECN Compliance Requirements for TCP Proxies, Offload Engines 3.3. AccECN Compliance Requirements for TCP Proxies, Offload Engines,
and other Middleboxes and Other Middleboxes
Given AccECN alters the TCP protocol on the wire, this section Given AccECN alters the TCP protocol on the wire, this section
specifies new requirements on certain networking equipment that specifies new requirements on certain networking equipment that
forwards TCP and inspects TCP header information. forwards TCP and inspects TCP header information.
3.3.1. Requirements for TCP Proxies 3.3.1. Requirements for TCP Proxies
A large class of middleboxes split TCP connections. Such a middlebox A large class of middleboxes split TCP connections. Such a middlebox
would be compliant with the AccECN protocol if the TCP implementation would be compliant with the AccECN protocol if the TCP implementation
on each side complied with the present AccECN specification and each on each side complied with the present AccECN specification and each
side negotiated AccECN independently of the other side. side negotiated AccECN independently of the other side.
3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers
Another large class of middleboxes intervenes to some degree at the Another large class of middleboxes intervenes to some degree at the
transport layer, but attempts to be transparent (invisible) to the transport layer, but attempts to be transparent (invisible) to the
end-to-end connection. A subset of this class of middleboxes end-to-end connection. A subset of this class of middleboxes
attempts to `normalize' the TCP wire protocol by checking that all attempts to 'normalize' the TCP wire protocol by checking that all
values in header fields comply with a rather narrow interpretation of values in header fields comply with a rather narrow interpretation of
the TCP specifications that is also not always up to date. the TCP specifications that is not always up to date.
A middlebox that is not normalizing the TCP protocol and does not A middlebox that is not normalizing the TCP protocol and does not
itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox
that intends to be transparent or invisible at the transport layer) that intends to be transparent or invisible at the transport layer)
ought to forward AccECN TCP Options unaltered, whether or not the ought to forward AccECN TCP Options unaltered, whether or not the
length value matches one of those specified in Section 3.2.3, and length value matches one of those specified in Section 3.2.3, and
whether or not the initial values of the byte-counter fields match whether or not the initial values of the byte-counter fields match
those in Section 3.2.1. This is because blocking apparently invalid those in Section 3.2.1. This is because blocking apparently invalid
values prevents the standardized set of values being extended in values prevents the standardized set of values from being extended in
future (such outdated normalizers would block updated hosts from the future (such outdated normalizers would block updated hosts from
using the extended AccECN standard). using the extended AccECN standard).
A TCP normalizer is likely to block or alter an AccECN TCP Option if A TCP normalizer is likely to block or alter an AccECN TCP Option if
the length value or the initial values of its byte-counter fields do the length value or the initial values of its byte-counter fields do
not match one of those specified in Section 3.2.3 or Section 3.2.1. not match one of those specified in Sections 3.2.3 or 3.2.1.
However, to comply with the present AccECN specification, a middlebox However, to comply with the present AccECN specification, a middlebox
MUST NOT change the ACE field; or those fields of an AccECN Option MUST NOT change the ACE field; or those fields of an AccECN Option
that are currently specified in Section 3.2.3; or any AccECN field that are currently specified in Section 3.2.3; or any AccECN field
covered by integrity protection (e.g., [RFC5925]). covered by integrity protection (e.g., [RFC5925]).
3.3.3. Requirements for TCP ACK Filtering 3.3.3. Requirements for TCP ACK Filtering
Section 5.2.1 of BCP 69 [RFC3449] gives best current practice on Section 5.2.1 of [RFC3449] gives best current practice on filtering
filtering (aka. thinning or coalescing) of pure TCP ACKs. It advises (aka thinning or coalescing) of pure TCP ACKs. It advises that
that filtering ACKs carrying ECN feedback ought to preserve the filtering ACKs carrying ECN feedback ought to preserve the correct
correct operation of ECN feedback. As the present specification operation of ECN feedback. As the present specification updates the
updates the operation of ECN feedback, this section discusses how an operation of ECN feedback, this section discusses how an ACK filter
ACK filter might preserve correct operation of AccECN feedback as might preserve correct operation of AccECN feedback as well.
well.
The problem divides into two parts: determining if an ACK is part of The problem divides into two parts: determining if an ACK is part of
a connection that is using AccECN and then preserving the correct a connection that is using AccECN and then preserving the correct
operation of AccECN feedback: operation of AccECN feedback:
* To determine whether a pure TCP ACK is part of an AccECN * To determine whether a pure TCP ACK is part of an AccECN
connection without resorting to connection tracking and per-flow connection without resorting to connection tracking and per-flow
state, a useful heuristic would be to check for a non-zero ECN state, a useful heuristic would be to check for a non-zero ECN
field at the IP layer (because the ECN++ experiment only allows field at the IP layer (because the ECN++ experiment only allows
TCP pure ACKs to be ECN-capable if AccECN has been negotiated TCP pure ACKs to be ECN-capable if AccECN has been negotiated
[I-D.ietf-tcpm-generalized-ecn]). This heuristic is simple and [ECN++]). This heuristic is simple and stateless. However, it
stateless. However, it might omit some AccECN ACKs, because might omit some AccECN ACKs, because AccECN can be used without
AccECN can be used without ECN++ and even if it is, ECN++ does not ECN++ and even if it is, ECN++ does not have to make pure ACKs
have to make pure ACKs ECN-capable - only deployment experience ECN-capable -- only deployment experience will tell. Also, TCP
will tell. Also, TCP ACKs might be ECN-capable owing to some ACKs might be ECN-capable owing to some scheme other than AccECN,
scheme other than AccECN, e.g., [RFC5690] or some future standards e.g., [RFC5690] or some future standards action. Again, only
action. Again, only deployment experience will tell. deployment experience will tell.
* The main concern with preserving correct AccECN operation involves * The main concern with preserving correct AccECN operation involves
leaving enough ACKs for the Data Sender to work out whether the leaving enough ACKs for the Data Sender to work out whether the
3-bit ACE field has wrapped. In the worst case, in feedback about 3-bit ACE field has wrapped. In the worst case, in feedback about
a run of received packets that were all ECN-marked, the ACE field a run of received packets that were all ECN-marked, the ACE field
will wrap every 8 acknowledged packets. ACE field wrap might be will wrap every 8 acknowledged packets. ACE field wrap might be
of less concern if packets also carry AccECN TCP Options. of less concern if packets also carry AccECN TCP Options.
However, note that logic to read an AccECN TCP Option is optional However, note that logic to read an AccECN TCP Option is optional
to implement (albeit recommended see Section 3.2.3). So one end to implement (albeit recommended -- see Section 3.2.3). So one
writing an AccECN TCP Option into a packet does not necessarily end writing an AccECN TCP Option into a packet does not
imply that the other end will read it. necessarily imply that the other end will read it.
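To see why an ACK filter needs to leave enough ACKs, the wrap
ambiguity of the 3-bit ACE field can be sketched as below. The helper
name is hypothetical, and the sketch assumes that every newly
acknowledged packet could have been CE-marked and that no other
information (such as an AccECN TCP Option) is available.

```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide, so it wraps every 8 marks


def possible_ce_counts(prev_ace, new_ace, newly_acked_pkts):
    """List every CE-marked packet count consistent with one ACE change.

    The ACE field only reveals the count modulo 8, so if thinning leaves
    a large gap between surviving ACKs, several counts become plausible.
    An empty list means the inputs are inconsistent under this sketch's
    assumptions.
    """
    delta = (new_ace - prev_ace) % ACE_MODULUS
    return list(range(delta, newly_acked_pkts + 1, ACE_MODULUS))


unambiguous = possible_ce_counts(prev_ace=3, new_ace=5, newly_acked_pkts=4)
ambiguous = possible_ce_counts(prev_ace=3, new_ace=5, newly_acked_pkts=12)
```

With 4 newly acknowledged packets only one count fits, whereas with
12 the Data Sender cannot tell 2 marks from 10 using the ACE field
alone.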
Note that the present specification of AccECN in TCP does not presume Note that the present specification of AccECN in TCP does not presume
to rely on any of the above ACK filtering behaviour in the network, to rely on any of the above ACK filtering behaviour in the network,
because it has to be robust against pre-existing network nodes that because it has to be robust against pre-existing network nodes that
do not distinguish AccECN ACKs, and robust against ACK loss during do not distinguish AccECN ACKs, and robust against ACK loss during
overload more generally. overload more generally.
3.3.4. Requirements for TCP Segmentation Offload and Large Receive 3.3.4. Requirements for TCP Segmentation Offload and Large Receive
Offload Offload
skipping to change at page 48, line 30 skipping to change at line 2227
Offloading can happen in the transmit path, usually referred to as Offloading can happen in the transmit path, usually referred to as
TCP Segmentation Offload (TSO), and the receive path where it is TCP Segmentation Offload (TSO), and the receive path where it is
called Large Receive Offload (LRO). called Large Receive Offload (LRO).
In the transmit direction, with AccECN, all segments created from the In the transmit direction, with AccECN, all segments created from the
same super-segment should retain the same ACE field, which should same super-segment should retain the same ACE field, which should
make TSO straightforward. make TSO straightforward.
However, with TSO hardware that supports [RFC3168], the CWR bit is However, with TSO hardware that supports [RFC3168], the CWR bit is
usually masked out on the middle and last segment. If applied to an usually masked out on the middle and last segments. If applied to an
AccECN segment, this would change the ACE field, and would be AccECN segment, this would change the ACE field, and would be
interpreted as having received numerous CE marks in the receive interpreted as having received numerous CE marks in the receive
direction. Therefore, currently available TSO hardware with direction. Therefore, currently available TSO hardware with
[RFC3168] support may need some minor driver changes, to adjust the [RFC3168] support may need some minor driver changes, to adjust the
bitmask for the first, middle and last segment processed with TSO. bitmask for the first, middle, and last segments processed with TSO.
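The effect of masking out CWR can be illustrated with a short sketch.
It assumes the ACE field is formed from the AE, CWR and ECE header
flags with AE as the most significant bit, as defined elsewhere in
this specification, and it uses the usual TCP flag bit positions. The
function names and the simplified model of RFC 3168 style TSO
hardware are hypothetical.

```python
TCP_FLAG_ECE = 0x040
TCP_FLAG_CWR = 0x080
TCP_FLAG_AE = 0x100


def ace_from_flags(flags):
    """Read the 3-bit ACE value (AE, CWR, ECE) out of the TCP flags."""
    ae = 1 if flags & TCP_FLAG_AE else 0
    cwr = 1 if flags & TCP_FLAG_CWR else 0
    ece = 1 if flags & TCP_FLAG_ECE else 0
    return (ae << 2) | (cwr << 1) | ece


def flags_from_ace(ace):
    """Encode a 3-bit ACE value into the AE, CWR and ECE flag bits."""
    flags = 0
    if ace & 0b100:
        flags |= TCP_FLAG_AE
    if ace & 0b010:
        flags |= TCP_FLAG_CWR
    if ace & 0b001:
        flags |= TCP_FLAG_ECE
    return flags


def classic_tso_segment_flags(flags, n_segments):
    """Model TSO that keeps CWR only on the first segment (simplified)."""
    masked = flags & ~TCP_FLAG_CWR
    return [flags] + [masked] * (n_segments - 1)


segments = classic_tso_segment_flags(flags_from_ace(3), n_segments=3)
aces_on_wire = [ace_from_flags(f) for f in segments]
apparent_ce_marks = (aces_on_wire[1] - aces_on_wire[0]) % 8
```

In this example, a super-segment sent with ACE = 3 leaves the host as
segments carrying ACE values 3, 1, 1, which the peer would read as an
advance of 6 CE marks.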
Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist
on the same offloading engine, the host software may need to work on the same offloading engine, the host software may need to work
around incompatibilities (e.g., when only global configurable TSO TCP around incompatibilities (e.g., when only global configurable TSO TCP
Flag bitmasks are available), otherwise this would cause some issues. Flag bitmasks are available), otherwise this would cause some issues.
One way around this could be to only negotiate for Accurate ECN, but One way around this could be to only negotiate for Accurate ECN, but
not offer a fall back to [RFC3168] ECN. Another way could be to not offer a fall back to [RFC3168] ECN. Another way could be to
allow TSO only as long as the CWR flag in the TCP header is not set - allow TSO only as long as the CWR flag in the TCP header is not set
at the cost of more processing overhead while the ACE field has this -- at the cost of more processing overhead while the ACE field has
bit set. this bit set.
For LRO in the receive direction, a different issue may get exposed For LRO in the receive direction, a different issue may get exposed
with [RFC3168] ECN supporting hardware. with [RFC3168] ECN supporting hardware.
The ACE field changes with every received CE marking, so today's The ACE field changes with every received CE marking, so today's
receive offloading could lead to many interrupts in high congestion receive offloading could lead to many interrupts in high congestion
situations. Although that would be useful (because congestion situations. Although that would be useful (because congestion
information is received sooner), it could also significantly increase information is received sooner), it could also significantly increase
processor load, particularly in scenarios such as DCTCP or L4S where processor load, particularly in scenarios such as DCTCP or L4S where
the marking rate is generally higher. the marking rate is generally higher.
Current offload hardware ejects a segment from the coalescing process Current offload hardware ejects a segment from the coalescing process
whenever the TCP ECN flags change. In data centres it has been whenever the TCP ECN flags change. In data centres, it has been
fortunate for this offload hardware that DCTCP-style feedback changes fortunate for this offload hardware that DCTCP-style feedback changes
less often when there are long sequences of CE marks, which is more less often when there are long sequences of CE marks, which is more
common with a step marking threshold (but less likely the more short common with a step marking threshold (but less likely the more short
flows are in the mix). The ACE counter approach has been designed so flows are in the mix). The ACE counter approach has been designed so
that coalescing can continue over arbitrary patterns of marking and that coalescing can continue over arbitrary patterns of marking and
only needs to stop when the counter wraps. Nonetheless, until the only needs to stop when the counter wraps. Nonetheless, until the
particular offload hardware in use implements this more efficient particular offload hardware in use implements this more efficient
approach, it is likely to be more efficient for AccECN connections to approach, it is likely to be more efficient for AccECN connections to
implement this counter-style logic using software segmentation implement this counter-style logic using software segmentation
offload. offload.
skipping to change at page 49, line 35 skipping to change at line 2278
ECN encodes a varying signal in the ACK stream, so it is inevitable ECN encodes a varying signal in the ACK stream, so it is inevitable
that offload hardware will ultimately need to handle any form of ECN that offload hardware will ultimately need to handle any form of ECN
feedback exceptionally. The ACE field has been designed as a counter feedback exceptionally. The ACE field has been designed as a counter
so that it is straightforward for offload hardware to pass on the so that it is straightforward for offload hardware to pass on the
highest counter, and to push a segment from its cache before the highest counter, and to push a segment from its cache before the
counter wraps. The purpose of working towards standardized TCP ECN counter wraps. The purpose of working towards standardized TCP ECN
feedback is to reduce the risk for hardware developers, who would feedback is to reduce the risk for hardware developers, who would
otherwise have to guess which scheme is likely to become dominant. otherwise have to guess which scheme is likely to become dominant.
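The counter-style coalescing described above could look roughly like
the following software sketch. It is not taken from any real offload
engine: the class and method names are hypothetical, and it assumes
in-order segments of one flow whose ACE value advances by fewer than
8 between consecutive segments.

```python
ACE_MODULUS = 8  # 3-bit ACE counter


class AceCoalescer:
    """Coalesce segments until the ACE counter would wrap within a batch."""

    def __init__(self):
        self.count = 0        # segments currently held in the batch
        self.advance = 0      # total ACE advance since the batch started
        self.last_ace = None  # most recent ACE value in the batch

    def flush(self):
        """Push the batch upward as (segment_count, latest_ace)."""
        summary = (self.count, self.last_ace)
        self.count = 0
        self.advance = 0
        self.last_ace = None
        return summary

    def add(self, ace):
        """Add a segment; return a flushed batch if one had to be pushed."""
        flushed = None
        if self.count > 0:
            step = (ace - self.last_ace) % ACE_MODULUS
            if self.advance + step >= ACE_MODULUS:
                # Holding this segment would hide a wrap, so push first.
                flushed = self.flush()
            else:
                self.advance += step
        self.last_ace = ace
        self.count += 1
        return flushed


coalescer = AceCoalescer()
results = [coalescer.add(ace) for ace in [1, 1, 3, 6, 7, 0, 2]]
```

Here the first six segments are coalesced regardless of the marking
pattern, and the batch is pushed only when the seventh segment would
take the total advance to 8 or more.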
The above process has been designed to enable a continuing The above process has been designed to enable a continuing
incremental deployment path - to more highly dynamic congestion incremental deployment path -- to more highly dynamic congestion
control. Once offload hardware supports AccECN, it will be able to control. Once offload hardware supports AccECN, it will be able to
coalesce efficiently for any sequence of marks, instead of relying coalesce efficiently for any sequence of marks, instead of relying on
for efficiency on the long marking sequences from step marking. In the long marking sequences from step marking for efficiency. In the
the next stage, marking can evolve from a step to a ramp function. next stage, marking can evolve from a step to a ramp function. That
That in turn will allow host congestion control algorithms to respond in turn will allow host congestion control algorithms to respond
faster to dynamics, while being backwards compatible with existing faster to dynamics, while being backwards compatible with existing
host algorithms. host algorithms.
4. Updates to RFC 3168 4. Updates to RFC 3168
This section clarifies which parts of RFC3168 are updated and maps This section clarifies which parts of RFC 3168 are updated and maps
them to the sections of the present AccECN specification that update them to the relevant updated sections of the present AccECN
them: specification.
* The whole of "6.1.1 TCP Initialization" of [RFC3168] is updated by * The whole of Section 6.1.1 of [RFC3168] is updated by Section 3.1
Section 3.1 of the present specification. of the present specification.
* In "6.1.2. The TCP Sender" of [RFC3168], all mentions of a * In Section 6.1.2 of [RFC3168], all mentions of a congestion
congestion response to an ECN-Echo (ECE) ACK packet are updated by response to an ECN-Echo (ECE) ACK packet are updated by
Section 3.2 of the present specification to mean an increment to Section 3.2 of the present specification to mean an increment to
the sender's count of CE-marked packets, s.cep. And the the sender's count of CE-marked packets, s.cep. And the
requirements to set the CWR flag no longer apply, as specified in requirements to set the CWR flag no longer apply, as specified in
Section 3.1.5 of the present specification. Otherwise, the Section 3.1.5 of the present specification. Otherwise, the
remaining requirements in "6.1.2. The TCP Sender" still stand. remaining requirements in Section 6.1.2 of [RFC3168] still stand.
It will be noted that RFC 8311 already updates, or potentially It will be noted that [RFC8311] already updates, or potentially
updates, a number of the requirements in "6.1.2. The TCP Sender". updates, a number of the requirements in Section 6.1.2 of
Section 6.1.2 of RFC 3168 extended standard TCP congestion control [RFC3168]. Section 6.1.2 of RFC 3168 extended standard TCP
[RFC5681] to cover ECN marking as well as packet drop. Whereas, congestion control [RFC5681] to cover ECN marking as well as
RFC 8311 enables experimentation with alternative responses to ECN packet drop. Whereas, [RFC8311] enables experimentation with
marking, if specified for instance by an experimental RFC on the alternative responses to ECN marking, if specified for instance by
IETF document stream. RFC 8311 also strengthened the statement an Experimental RFC produced by the IETF Stream. [RFC8311] also
that "ECT(0) SHOULD be used" to a "MUST" (see [RFC8311] for the strengthened the statement that "ECT(0) SHOULD be used" to a
details). "MUST" (see [RFC8311] for the details).
* The whole of "6.1.3. The TCP Receiver" of [RFC3168] is updated by * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2
Section 3.2 of the present specification, with the exception of of the present specification, with the exception of the last
the last paragraph (about congestion response to drop and ECN in paragraph (about congestion response to drop and ECN in the same
the same round trip), which still stands. Incidentally, this last round trip), which still stands. Incidentally, this last
paragraph is in the wrong section, because it relates to "TCP paragraph is in the wrong section, because it relates to "TCP
Sender" behaviour. Sender" behaviour.
* The following text within "6.1.5. Retransmitted TCP packets": * The following text within Section 6.1.5 of [RFC3168]:
"the TCP data receiver SHOULD ignore the ECN field on arriving | the TCP data receiver SHOULD ignore the ECN field on arriving
data packets that are outside of the receiver's current | data packets that are outside of the receiver's current window.
window."
is updated by more stringent acceptability tests for any packet is updated by more stringent acceptability tests for any packet
(not just data packets) in the present specification. (not just data packets) in the present specification.
Specifically, in the normative specification of AccECN (Section 3) Specifically, in the normative specification of AccECN
only 'Acceptable' packets contribute to the ECN counters at the (Section 3), only 'Acceptable' packets contribute to the ECN
AccECN receiver and Section 1.3 defines an Acceptable packet as counters at the AccECN receiver and Section 1.3 defines an
one that passes acceptability tests equivalent in strength to Acceptable packet as one that passes acceptability tests
those in both [RFC9293] and [RFC5961]. equivalent in strength to those in both [RFC9293] and [RFC5961].
* Sections 5.2, 6.1.1, 6.1.4, 6.1.5 and 6.1.6 of [RFC3168] prohibit * Sections 5.2, 6.1.1, 6.1.4, 6.1.5, and 6.1.6 of [RFC3168] prohibit
use of ECN on TCP control packets and retransmissions. The use of ECN on TCP control packets and retransmissions. The
present specification does not update that aspect of RFC 3168, but present specification does not update that aspect of [RFC3168],
it does say what feedback an AccECN Data Receiver ought to provide but it does say what feedback an AccECN Data Receiver ought to
if it receives an ECN-capable control packet or retransmission. provide if it receives an ECN-capable control packet or
This ensures AccECN is forward compatible with any future scheme retransmission. This ensures AccECN is forward compatible with
that allows ECN on these packets, as provided for in section 4.3 any future scheme that allows ECN on these packets, as provided
of [RFC8311] and as proposed in [I-D.ietf-tcpm-generalized-ecn]. for in Section 4.3 of [RFC8311] and as proposed in [ECN++].
5. Interaction with TCP Variants 5. Interaction with TCP Variants
This section is informative, not normative. This section is informative, not normative.
5.1. Compatibility with SYN Cookies 5.1. Compatibility with SYN Cookies
A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to
protect itself from SYN flooding attacks. It places minimal commonly protect itself from SYN flooding attacks. It places minimal commonly
used connection state in the SYN/ACK, and deliberately does not hold used connection state in the SYN/ACK, and deliberately does not hold
any state while waiting for the subsequent ACK (e.g., it closes the any state while waiting for the subsequent ACK (e.g., it closes the
thread). Therefore it cannot record the fact that it entered AccECN thread). Therefore, it cannot record the fact that it entered AccECN
mode for both half-connections. Indeed, it cannot even remember mode for both half-connections. Indeed, it cannot even remember
whether it negotiated the use of Classic ECN [RFC3168]. whether it negotiated the use of Classic ECN [RFC3168].
Nonetheless, such a Server can determine that it negotiated AccECN as Nonetheless, such a Server can determine that it negotiated AccECN as
follows. If a TCP Server using SYN Cookies supports AccECN and if it follows. If a TCP Server using SYN Cookies supports AccECN and if it
receives a pure ACK that acknowledges an ISN that is a valid SYN receives a pure ACK that acknowledges an ISN that is a valid SYN
cookie, and if the ACK contains an ACE field with the value 0b010 to cookie, and if the ACK contains an ACE field with the value 0b010 to
0b111 (decimal 2 to 7), the Server can infer the first two stages of 0b111 (decimal 2 to 7), the Server can infer the first two stages of
the handshake: the handshake:
* the TCP Client has to have requested AccECN support on the SYN; * the TCP Client has to have requested AccECN support on the SYN;
* then, even though the Server kept no state, it has to have * then, even though the Server kept no state, it has to have
confirmed that it supported AccECN. confirmed that it supported AccECN.
Therefore the Server can switch itself into AccECN mode, and continue Therefore, the Server can switch itself into AccECN mode, and
as if it had never forgotten that it switched itself into AccECN mode continue as if it had never forgotten that it switched itself into
earlier. AccECN mode earlier.
If the pure ACK that acknowledges a SYN cookie contains an ACE field If the pure ACK that acknowledges a SYN cookie contains an ACE field
with the value 0b000 or 0b001, these values indicate that the TCP with the value 0b000 or 0b001, these values indicate that the TCP
Client did not request support for AccECN and therefore the Server Client did not request support for AccECN; therefore, the Server does
does not enter AccECN mode for this connection. Further, 0b001 on not enter AccECN mode for this connection. Further, 0b001 on the ACK
the ACK implies that the Server sent an ECN-capable SYN/ACK, which implies that the Server sent an ECN-capable SYN/ACK, which was marked
was marked CE in the network, and the non-AccECN TCP Client fed this CE in the network, and the non-AccECN TCP Client fed this back by
back by setting ECE on the ACK of the SYN/ACK. setting ECE on the ACK of the SYN/ACK.
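
The inference described in Section 5.1 can be sketched as a small, stateless decision on the 3-bit ACE field of the pure ACK that validates the SYN cookie. This is only an illustration under stated assumptions: the names `EcnMode` and `infer_mode_from_cookie_ack` are hypothetical, cookie validation is assumed to have already succeeded, and the further decoding of ACE values 0b010 to 0b111 is left out.

```python
from enum import Enum
from typing import Tuple


class EcnMode(Enum):
    """Hypothetical labels for the mode a stateless Server can infer."""
    ACCECN = "accecn"
    NOT_ACCECN = "not-accecn"


def infer_mode_from_cookie_ack(ace: int) -> Tuple[EcnMode, bool]:
    """Infer the negotiated mode from the ACE field of a pure ACK.

    Assumes the ACK has already been shown to acknowledge a valid SYN cookie.
    Returns (mode, classic_ece_feedback), where the boolean is True only for
    ACE == 0b001, i.e. a non-AccECN Client echoing a CE-marked SYN/ACK.
    """
    if not 0 <= ace <= 0b111:
        raise ValueError("ACE is a 3-bit field")
    if ace >= 0b010:
        # 0b010..0b111: the Client requested AccECN and the Server
        # must have confirmed it, so the Server enters AccECN mode.
        return EcnMode.ACCECN, False
    # 0b000 or 0b001: the Client did not request AccECN.
    return EcnMode.NOT_ACCECN, ace == 0b001
```

Because the decision depends only on the arriving ACK, the Server needs no per-connection state between sending the SYN/ACK and receiving this ACK.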
5.2. Compatibility with TCP Experiments and Common TCP Options 5.2. Compatibility with TCP Experiments and Common TCP Options
AccECN is compatible (at least on paper) with the most commonly used AccECN is compatible (at least on paper) with the most commonly used
TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It
also compatible with Multipath TCP (MPTCP [RFC8684]) and the is also compatible with Multipath TCP (MPTCP [RFC8684]) and the
experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is
friendly to all these protocols, because space for TCP options is friendly to all these protocols, because space for TCP options is
particularly scarce on the SYN, where AccECN consumes zero additional particularly scarce on the SYN, where AccECN consumes zero additional
header space. header space.
When option space is under pressure from other options, When option space is under pressure from other options,
Section 3.2.3.3 provides guidance on how important it is to send an Section 3.2.3.3 provides guidance on how important it is to send an
AccECN Option relative to other options, and which fields are more AccECN Option relative to other options, and which fields are more
important to include. important to include.
Implementers of TFO need to take careful note of the recommendation Implementers of TFO need to take careful note of the recommendation
in Section 3.2.2.1. That section recommends that, if the TCP Client in Section 3.2.2.1. That section recommends that, if the TCP Client
has successfully negotiated AccECN, when acknowledging the SYN/ACK, has successfully negotiated AccECN, when acknowledging the SYN/ACK,
even if it has data to send, it sends a pure ACK immediately before even if it has data to send, it sends a pure ACK immediately before
the data. Then it can reflect the IP-ECN field of the SYN/ACK on the data. Then it can reflect the IP-ECN field of the SYN/ACK on
this pure ACK, which allows the Server to detect ECN mangling. Note this pure ACK, which allows the Server to detect ECN mangling. Note
that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0)
is not included in any of the byte counters held locally for each ECN is not included in any of the byte counters held locally for each ECN
marking, nor in the AccECN Option on the wire. marking, nor in the AccECN Option on the wire.
AccECN feedback is compatible with the ECN++ AccECN feedback is compatible with the ECN++ experiment [ECN++],
[I-D.ietf-tcpm-generalized-ecn] experiment, which allows TCP control which allows TCP control packets and retransmissions to be ECN-
packets and retransmissions to be ECN-capable ([RFC3168] was updated capable ([RFC3168] was updated by [RFC8311] to permit such
by [RFC8311] to permit such experiments). AccECN is likely to experiments). AccECN is likely to inherently support any experiment
inherently support any experiment with ECN-capable packets, because with ECN-capable packets, because it feeds back the contents of the
it feeds back the contents of the ECN field mechanistically, without ECN field mechanistically, without judging whether or not a packet
judging whether a packet ought to use the ECN capability or not ought to use the ECN capability (Section 2.5). This specification
(Section 2.5). This specification does not discuss implementing does not discuss implementing AccECN alongside [RFC5562], which was
AccECN alongside [RFC5562], which was an earlier experimental an earlier experimental protocol with narrower scope than ECN++ and a
protocol with narrower scope than ECN++ and a 5-way handshake. 5-way handshake.
5.3. Compatibility with Feedback Integrity Mechanisms 5.3. Compatibility with Feedback Integrity Mechanisms
Three alternative mechanisms are available to assure the integrity of Three alternative mechanisms are available to assure the integrity of
ECN and/or loss signals. AccECN is compatible with any of these ECN and/or loss signals. AccECN is compatible with any of these
approaches: approaches:
* The Data Sender can test the integrity of the receiver's ECN (or * The Data Sender can test the integrity of the receiver's ECN (or
loss) feedback by occasionally setting the IP-ECN field to a value loss) feedback by occasionally setting the IP-ECN field to a value
normally only set by the network (and/or deliberately leaving a normally only set by the network (and/or deliberately leaving a
sequence number gap). Then it can test whether the Data sequence number gap). Then it can test whether the Data
Receiver's feedback faithfully reports what it expects (similar to Receiver's feedback faithfully reports what it expects (similar to
paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce
[RFC3540], this approach does not waste the ECT(1) codepoint in [RFC3540], this approach does not waste the ECT(1) codepoint in
the IP header, it does not require standardization and it does not the IP header, it does not require standardization, and it does
rely on misbehaving receivers volunteering to reveal feedback not rely on misbehaving receivers volunteering to reveal feedback
information that allows them to be detected. However, setting the information that allows them to be detected. However, setting the
CE mark by the sender might conceal actual congestion feedback CE mark by the sender might conceal actual congestion feedback
from the network and therefore ought to only be done sparingly. from the network and therefore ought to only be done sparingly.
* Networks generate congestion signals when they are becoming * Networks generate congestion signals when they are becoming
congested, so networks are more likely than Data Senders to be congested, so networks are more likely than Data Senders to be
concerned about the integrity of the receiver's feedback of these concerned about the integrity of the receiver's feedback of these
signals. A network can enforce a congestion response to its ECN signals. A network can enforce a congestion response to its ECN
markings (or packet losses) using congestion exposure (ConEx) markings (or packet losses) using congestion exposure (ConEx)
audit [RFC7713]. Whether the receiver or a downstream network is audit [RFC7713]. Whether the receiver or a downstream network is
suppressing congestion feedback or the sender is unresponsive to suppressing congestion feedback, or the sender is unresponsive to
the feedback, or both, ConEx audit can neutralize any advantage the feedback, or both, ConEx audit can neutralize any advantage
that any of these three parties would otherwise gain. that any of these three parties would otherwise gain.
ConEx is an experimental change to the Data Sender that would be ConEx is an experimental change to the Data Sender that would be
most useful when combined with AccECN. Without AccECN, the ConEx most useful when combined with AccECN. Without AccECN, the ConEx
behaviour of a Data Sender would have to be more conservative than behaviour of a Data Sender would have to be more conservative than
would be necessary if it had the accurate feedback of AccECN. would be necessary if it had the accurate feedback of AccECN.
* The standards track TCP authentication option (TCP-AO [RFC5925]) * The Standards Track TCP authentication option (TCP-AO [RFC5925])
can be used to detect any tampering with AccECN feedback between can be used to detect any tampering with AccECN feedback between
the Data Receiver and the Data Sender (whether malicious or the Data Receiver and the Data Sender (whether malicious or
accidental). The AccECN fields are immutable end-to-end, so they accidental). The AccECN fields are immutable end to end, so they
are amenable to TCP-AO protection, which covers TCP options by are amenable to TCP-AO protection, which covers TCP options by
default. However, TCP-AO is often too brittle to use on many end- default. However, TCP-AO is often too brittle to use on many end-
to-end paths, where middleboxes can make verification fail in to-end paths, where middleboxes can make verification fail in
their attempts to improve performance or security, e.g., Network their attempts to improve performance or security, e.g., Network
Address (and Port) Translation (NAT/NAPT), resegmentation or Address Translation (NAT) and Network Address Port Translation
shifting the sequence space. (NAPT), resegmentation, or shifting the sequence space.
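
The first approach in the list above (the Data Sender testing the receiver's feedback) can be illustrated with a very small bookkeeping sketch. Everything here is an assumption made for illustration: the class name `SenderCeProbe` and its methods are hypothetical, the reported value is taken to be an already unwrapped packet count, and every self-marked packet is assumed to have been delivered and acknowledged before the check is made.

```python
class SenderCeProbe:
    """Hypothetical helper: track packets the sender itself marked CE."""

    def __init__(self) -> None:
        self.self_marked = 0  # packets deliberately sent with CE set

    def mark_sent_as_ce(self, packets: int = 1) -> None:
        """Record that the sender set CE on some of its own packets."""
        self.self_marked += packets

    def feedback_is_plausible(self, reported_ce_packets: int) -> bool:
        """Check the receiver reported at least the sender's own CE marks.

        The network may add further CE marks, so the reported count can
        exceed the self-marked count, but it should never fall below it.
        """
        return reported_ce_packets >= self.self_marked
```

As the text notes, marking packets CE at the sender can conceal genuine congestion feedback, so a real sender would use such a probe sparingly.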
6. Summary: Protocol Properties 6. Summary: Protocol Properties
This section is informative not normative. It describes how well the This section is informative, not normative. It describes how well
protocol satisfies the agreed requirements for a more Accurate ECN the protocol satisfies the agreed requirements for a more Accurate
feedback protocol [RFC7560]. ECN feedback protocol [RFC7560].
Accuracy: From each ACK, the Data Sender can infer the number of new Accuracy: From each ACK, the Data Sender can infer the number of new
CE marked segments since the previous ACK. This provides better CE-marked segments since the previous ACK. This provides better
accuracy on CE feedback than Classic ECN. In addition if an accuracy on CE feedback than Classic ECN. In addition, if an
AccECN Option is present (not blocked by the network path) the AccECN Option is present (not blocked by the network path), the
number of bytes marked with CE, ECT(1) and ECT(0) are provided. number of bytes marked with CE, ECT(1), and ECT(0) are provided.
Overhead: The AccECN scheme is divided into two parts. The Overhead: The AccECN scheme is divided into two parts. The
essential feedback part reuses the 3 flags already assigned to ECN essential feedback part reuses the three flags already assigned to
in the TCP header. The supplementary feedback part adds an ECN in the TCP header. The supplementary feedback part adds an
additional TCP option consuming up to 11 bytes. However, no TCP additional TCP option consuming up to 11 bytes. However, no TCP
option space is consumed in the SYN. option space is consumed in the SYN.
Ordering: The order in which marks arrive at the Data Receiver is Ordering: The order in which marks arrive at the Data Receiver is
preserved in AccECN feedback, because the Data Receiver is preserved in AccECN feedback, because the Data Receiver is
expected to send an ACK immediately whenever a different mark expected to send an ACK immediately whenever a different mark
arrives. arrives.
Timeliness: While the same ECN markings are arriving continually at Timeliness: While the same ECN markings are arriving continually at
the Data Receiver, it can defer ACKs as TCP does normally, but it the Data Receiver, it can defer ACKs as TCP does normally, but it
skipping to change at page 54, line 18 skipping to change at line 2500
Timeliness vs Overhead: Change-Triggered ACKs are intended to enable Timeliness vs Overhead: Change-Triggered ACKs are intended to enable
latency-sensitive uses of ECN feedback by capturing the timing of latency-sensitive uses of ECN feedback by capturing the timing of
transitions but not wasting resources while the state of the transitions but not wasting resources while the state of the
signalling system is stable. Within the constraints of the signalling system is stable. Within the constraints of the
change-triggered ACK rules, the receiver can control how change-triggered ACK rules, the receiver can control how
frequently it sends AccECN TCP Options and therefore to some frequently it sends AccECN TCP Options and therefore to some
extent it can control the overhead induced by AccECN. extent it can control the overhead induced by AccECN.
Resilience: All information is provided based on counters. Resilience: All information is provided based on counters.
Therefore if ACKs are lost, the counters on the first ACK Therefore if ACKs are lost, the counters on the first ACK
following the losses allows the Data Sender to immediately recover following the losses allow the Data Sender to immediately recover
the number of the ECN markings that it missed. And if data or the number of the ECN markings that it missed. If data or ACKs
ACKs are reordered, stale congestion information can be identified are reordered, stale congestion information can be identified and
and ignored. ignored.
Resilience against Bias: Because feedback is based on repetition of Resilience against Bias: Because feedback is based on repetition of
counters, random losses do not remove any information, they only counters, random losses do not remove any information, they only
delay it. Therefore, even though some ACKs are change-triggered, delay it. Therefore, even though some ACKs are change-triggered,
random losses will not alter the proportions of the different ECN random losses will not alter the proportions of the different ECN
markings in the feedback. markings in the feedback.
Resilience vs Overhead: If space is limited in some segments Resilience vs Overhead: If space is limited in some segments (e.g.,
(e.g., because more options are needed on some segments, such as because more options are needed on some segments, such as the SACK
the SACK option after loss), the Data Receiver can send AccECN option after loss), the Data Receiver can send AccECN Options less
Options less frequently or truncate fields that have not changed, frequently or truncate fields that have not changed, usually down
usually down to as little as 5 bytes. to as little as 5 bytes.
Resilience vs Timeliness and Ordering: Ordering information and the Resilience vs Timeliness and Ordering: Ordering information and the
timing of transitions cannot be communicated in three cases: i) timing of transitions cannot be communicated in three cases: i)
during ACK loss; ii) if something on the path strips AccECN during ACK loss; ii) if something on the path strips AccECN
Options; or iii) if the Data Receiver is unable to support Change- Options; or iii) if the Data Receiver is unable to support Change-
Triggered ACKs. Following ACK reordering, the Data Sender can Triggered ACKs. Following ACK reordering, the Data Sender can
reconstruct the order in which feedback was sent, but not until reconstruct the order in which feedback was sent, but not until
all the missing feedback has arrived. all the missing feedback has arrived.
Complexity: An AccECN implementation solely involves simple counter Complexity: An AccECN implementation solely involves simple counter
increments, some modulo arithmetic to communicate the least increments, some modulo arithmetic to communicate the least
significant bits and allow for wrap, and some heuristics for significant bits and allow for wrap, and some heuristics for
safety against fields cycling due to prolonged periods of ACK safety against fields cycling due to prolonged periods of ACK
loss. Each host needs to maintain eight additional counters. The loss. Each host needs to maintain eight additional counters. The
hosts have to apply some additional tests to detect tampering by hosts have to apply some additional tests to detect tampering by
middleboxes, but in general the protocol is simple to understand, middleboxes, but in general the protocol is simple to understand
simple to implement and requires few cycles per packet to execute. and implement and requires few cycles per packet to execute.
Integrity: AccECN is compatible with at least three approaches that Integrity: AccECN is compatible with at least three approaches that
can assure the integrity of ECN feedback. If AccECN Options are can assure the integrity of ECN feedback. If AccECN Options are
stripped the resolution of the feedback is degraded, but the stripped, the resolution of the feedback is degraded, but the
integrity of this degraded feedback can still be assured. integrity of this degraded feedback can still be assured.
Backward Compatibility: If only one endpoint supports the AccECN Backward Compatibility: If only one endpoint supports the AccECN
scheme, it will fall-back to the most advanced ECN feedback scheme scheme, it will fall back to the most advanced ECN feedback scheme
supported by the other end. supported by the other end.
If AccECN Options are stripped by a middlebox, AccECN still If AccECN Options are stripped by a middlebox, AccECN still
provides basic congestion feedback in the ACE field. Further, provides basic congestion feedback in the ACE field. Further,
AccECN can be used to detect mangling of the IP ECN field; AccECN can be used to detect mangling of the IP-ECN field;
mangling of the TCP ECN flags; blocking of ECT-marked segments; mangling of the TCP ECN flags; blocking of ECT-marked segments;
and blocking of segments carrying an AccECN Option. It can detect and blocking of segments carrying an AccECN Option. It can detect
these conditions during TCP's three-way handshake so that it can these conditions during TCP's three-way handshake so that it can
fall back to operation without ECN and/or operation without AccECN fall back to operation without ECN and/or operation without AccECN
Options. Options.
Forward Compatibility: The behaviour of endpoints and middleboxes is Forward Compatibility: The behaviour of endpoints and middleboxes is
carefully defined for all reserved or currently unused codepoints carefully defined for all reserved or currently unused codepoints
in the scheme. Then, the designers of security devices can in the scheme. Then, the designers of security devices can
understand which currently unused values might appear in future. understand which currently unused values might appear in the
So, even if they choose to treat such values as anomalous while future. So, even if they choose to treat such values as anomalous
they are not widely used, any blocking will at least be under while they are not widely used, any blocking will at least be
policy control not hard-coded. Then, if previously unused values under policy control and not hard-coded. Then, if previously
start to appear on the Internet (or in standards), such policies unused values start to appear on the Internet (or in standards),
could be quickly reversed. such policies could be quickly reversed.
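
The "Resilience" and "Complexity" properties above rest on repeating the least significant bits of a counter and using modulo arithmetic to allow for wrap. A minimal sketch of that arithmetic for a 3-bit field follows. The function names are hypothetical, the counter values are arbitrary examples, and the safety heuristics against a field cycling during prolonged ACK loss are deliberately omitted.

```python
ACE_BITS = 3
ACE_MOD = 1 << ACE_BITS  # a 3-bit field wraps every 8 increments


def ace_field(full_counter: int) -> int:
    """Least significant bits of the receiver's counter, as sent on the wire."""
    return full_counter % ACE_MOD


def newly_marked(prev_ace: int, curr_ace: int) -> int:
    """Smallest non-negative increase consistent with two successive fields."""
    return (curr_ace - prev_ace) % ACE_MOD


# The receiver's counter advances 5 -> 9 while the intermediate ACKs are lost.
# The first ACK after the losses still lets the sender recover all 4 marks.
recovered = newly_marked(ace_field(5), ace_field(9))

# An increase of exactly 8 is indistinguishable from no increase at all,
# which is why additional heuristics are needed after long runs of ACK loss.
ambiguous = newly_marked(ace_field(5), ace_field(13))
```

Here `recovered` evaluates to 4 and `ambiguous` to 0, which shows both the recovery after loss and the limit of a 3-bit field on its own.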
7. IANA Considerations 7. IANA Considerations
This document reassigns the TCP header flag at bit offset 7 to the This document reassigns the TCP header flag at bit offset 7 to the
AccECN protocol. This bit was previously called the Nonce Sum (NS) AccECN protocol. This bit was previously called the Nonce Sum (NS)
flag [RFC3540], but RFC 3540 has been reclassified as historic flag [RFC3540], but RFC 3540 has been reclassified as Historic
[RFC8311]. The flag will now be defined as the following in the "TCP [RFC8311]. The flag is now defined as the following in the "TCP
Header Flags" registry in the "Transmission Control Protocol (TCP) Header Flags" registry in the "Transmission Control Protocol (TCP)
Parameters" registry group: Parameters" registry group:
+=====+==============+===========+==============================+ +=====+==============+===========+==============================+
| Bit | Name | Reference | Assignment Notes | | Bit | Name | Reference | Assignment Notes |
+=====+==============+===========+==============================+ +=====+==============+===========+==============================+
| 7 | AE (Accurate | RFC XXXX | Previously used as NS (Nonce | | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce |
| | ECN) | | Sum) by [RFC3540], which is | | | ECN) | | Sum) by [RFC3540], which is |
| | | | now historic [RFC8311] | | | | | now Historic [RFC8311] |
+-----+--------------+-----------+------------------------------+ +-----+--------------+-----------+------------------------------+
Table 6: TCP header flag reassignment Table 6: TCP Header Flag Reassignment
[TO BE REMOVED: IANA is requested to update the existing entry in the
TCP Header Flags registry (https://www.iana.org/assignments/tcp-
parameters/tcp-parameters.xhtml#tcp-header-flags) for Bit 7 to "AE
(Accurate ECN)" and to change the reference to this RFC-to-be instead
of RFC8311. Also IANA is requested to change the assignment note to
"Previously used as NS (Nonce Sum) by [RFC3540], which is now
historic [RFC8311]."]
This document also defines two new TCP options for AccECN, assigned This document also defines two new TCP options for AccECN from the
values of 172 and 174 (decimal) from the TCP option space. These TCP option space. These values are defined as the following in the
values are defined as the following in the "TCP Option Kind Numbers" "TCP Option Kind Numbers" registry in the "Transmission Control
registry in the "Transmission Control Protocol (TCP) Parameters" Protocol (TCP) Parameters" registry group:
registry group:
+======+========+================================+===========+ +======+========+================================+===========+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+======+========+================================+===========+ +======+========+================================+===========+
| 172 | N | Accurate ECN Order 0 (AccECN0) | RFC XXXX | | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 |
+------+--------+--------------------------------+-----------+ +------+--------+--------------------------------+-----------+
| 174 | N | Accurate ECN Order 1 (AccECN1) | RFC XXXX | | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 |
+------+--------+--------------------------------+-----------+ +------+--------+--------------------------------+-----------+
Table 7: New TCP Option assignments Table 7: New TCP Option assignments
[TO BE REMOVED: These registrations have taken place using the early
registration procedure, which may be temporary if this draft does not
proceed, at the following location: http://www.iana.org/assignments/
tcp-parameters/tcp-parameters.xhtml#tcp-parameters-1 ]
Early experimental implementations of the two AccECN Options used Early experimental implementations of the two AccECN Options used
experimental option 254 per [RFC6994] with the 16-bit magic numbers experimental option 254 per [RFC6994] with the 16-bit magic numbers
0xACC0 and 0xACC1 respectively for Order 0 and 1, as allocated in the 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in
IANA "TCP Experimental Option Experiment Identifiers (TCP ExIDs)" the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP
registry. Even earlier experimental implementations used the single ExIDs)" registry. Even earlier experimental implementations used the
magic number 0xACCE (16 bits). Uses of these experimental options single magic number 0xACCE (16 bits). Uses of these experimental
SHOULD migrate to use the new option kinds (172 & 174). options SHOULD migrate to use the new option kinds (172 and 174).
[TO BE REMOVED: IANA is requested to replace the references for all
three of the above experimental options (0xACC0, 0xACC1 and 0xACCE)
with a reference to the present RFC XXXX.]
[TO BE REMOVED: If the early registrations, which may be temporary,
do not proceed, the three references to them in the TCP ExIDs
registry at the following location will also need to be edited out:
https://www.iana.org/assignments/tcp-parameters/tcp-
parameters.xhtml#tcp-exids ]
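
For implementations that still need to recognise the early experimental encodings while migrating, the option kinds and magic numbers listed in this section could be mapped as in the sketch below. The function name `classify_option` and the returned labels are hypothetical. The layout assumed for the experimental option (Kind 254, Length, then a 16-bit identifier in network byte order) follows the general format of [RFC6994].

```python
from typing import Optional

ACCECN_KINDS = {172: 0, 174: 1}          # assigned kinds -> Order 0 / Order 1
EXPERIMENTAL_KIND = 254                  # experimental option per RFC 6994
ACCECN_EXIDS = {0xACC0: 0, 0xACC1: 1}    # 16-bit magic numbers -> Order
LEGACY_EXID = 0xACCE                     # single magic number used even earlier


def classify_option(option: bytes) -> Optional[str]:
    """Return a label for an AccECN-related TCP option, else None.

    `option` is assumed to hold one complete option: Kind, Length, payload.
    """
    if len(option) < 2:
        return None
    kind = option[0]
    if kind in ACCECN_KINDS:
        return f"AccECN{ACCECN_KINDS[kind]}"
    if kind == EXPERIMENTAL_KIND and len(option) >= 4:
        exid = int.from_bytes(option[2:4], "big")
        if exid in ACCECN_EXIDS:
            return f"experimental AccECN{ACCECN_EXIDS[exid]}"
        if exid == LEGACY_EXID:
            return "legacy experimental AccECN"
    return None
```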
8. Security and Privacy Considerations 8. Security and Privacy Considerations
If ever the supplementary feedback part of AccECN based on one of the If ever the supplementary feedback part of AccECN that is based on
new AccECN TCP Options is unusable (due for example to middlebox one of the new AccECN TCP Options is unusable (due for example to
interference) the essential feedback part of AccECN's congestion middlebox interference), the essential feedback part of AccECN's
feedback offers only limited resilience to long runs of ACK loss (see congestion feedback offers only limited resilience to long runs of
Section 3.2.2.5). These problems are unlikely to be due to malicious ACK loss (see Section 3.2.2.5). These problems are unlikely to be
intervention (because if an attacker could strip a TCP option or due to malicious intervention (because if an attacker could strip a
discard a long run of ACKs it could wreak other arbitrary havoc). TCP option or discard a long run of ACKs, it could wreak other
However, it would be of concern if AccECN's resilience could be arbitrary havoc). However, it would be of concern if AccECN's
indirectly compromised during a flooding attack. AccECN is still resilience could be indirectly compromised during a flooding attack.
considered safe though, because if AccECN Options are not present, AccECN is still considered safe though, because if AccECN Options are
the AccECN Data Sender is then required to switch to more not present, the AccECN Data Sender is then required to switch to
conservative assumptions about wrap of congestion indication counters more conservative assumptions about wrap of congestion indication
(see Section 3.2.2.5 and Appendix A.2). counters (see Section 3.2.2.5 and Appendix A.2).
Section 5.1 describes how a TCP Server can negotiate AccECN and use Section 5.1 describes how a TCP Server can negotiate AccECN and use
the SYN cookie method for mitigating SYN flooding attacks. the SYN cookie method for mitigating SYN flooding attacks.
There is concern that ECN feedback could be altered or suppressed, There is concern that ECN feedback could be altered or suppressed,
particularly because a misbehaving Data Receiver could increase its particularly because a misbehaving Data Receiver could increase its
own throughput at the expense of others. AccECN is compatible with own throughput at the expense of others. AccECN is compatible with
the three schemes known to assure the integrity of ECN feedback (see the three schemes known to assure the integrity of ECN feedback (see
Section 5.3 for details). If AccECN Options are stripped by an Section 5.3 for details). If AccECN Options are stripped by an
incorrectly implemented middlebox, the resolution of the feedback incorrectly implemented middlebox, the resolution of the feedback
will be degraded, but the integrity of this degraded information can will be degraded, but the integrity of this degraded information can
still be assured. Assuring that Data Senders respond appropriately still be assured. Assuring that Data Senders respond appropriately
to ECN feedback is possible, but the scope of the present document is to ECN feedback is possible, but the scope of the present document is
confined to the feedback protocol, and excludes the response to this confined to the feedback protocol and excludes the response to this
feedback. feedback.
In Section 3.2.3 a Data Sender is allowed to ignore an unrecognized In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized
TCP AccECN Option length and read as many whole 3-octet fields from TCP AccECN Option length and read as many whole 3-octet fields from
it as possible up to a maximum of 3, treating the remainder as it as possible up to a maximum of 3, treating the remainder as
padding. This opens up a potential covert channel of up to 29B (40 - padding. This opens up a potential covert channel of up to 29B (40 -
(2+3*3)) B. However, it is really an overt channel (not hidden) and (2+3*3)) B. However, it is really an overt channel (not hidden) and
it is no different to the use of unknown TCP options with unknown it is no different than the use of unknown TCP options with unknown
option lengths in general. Therefore, where this is of concern, it option lengths in general. Therefore, where this is of concern, it
can already be adequately mitigated by regular TCP normalizer can already be adequately mitigated by regular TCP normalizer
technology (see Section 3.3.2). technology (see Section 3.3.2).
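
The reading rule and the 29-byte figure in the paragraph above can be checked with a short sketch. It assumes the option is laid out as a 1-byte Kind, a 1-byte Length, and then 3-octet fields in network byte order. The function name `read_accecn_fields` and the constant names are hypothetical, and the meaning of each field is not interpreted.

```python
from typing import List, Tuple

MAX_TCP_OPTION_SPACE = 40  # bytes available for all TCP options
HEADER_LEN = 2             # Kind + Length
FIELD_LEN = 3              # each field is 3 octets
MAX_FIELDS = 3


def read_accecn_fields(option: bytes) -> Tuple[List[int], int]:
    """Read as many whole 3-octet fields as possible, up to three.

    Returns (fields, ignored_bytes), where ignored_bytes is the remainder
    that is treated as padding.
    """
    if len(option) < HEADER_LEN or option[1] != len(option):
        raise ValueError("malformed option")
    payload = option[HEADER_LEN:]
    n_fields = min(len(payload) // FIELD_LEN, MAX_FIELDS)
    fields = [
        int.from_bytes(payload[i * FIELD_LEN:(i + 1) * FIELD_LEN], "big")
        for i in range(n_fields)
    ]
    ignored = len(payload) - n_fields * FIELD_LEN
    return fields, ignored


# Worst case: one option fills the whole option space.
worst_case = bytes([172, MAX_TCP_OPTION_SPACE]) + bytes(MAX_TCP_OPTION_SPACE - HEADER_LEN)
fields, ignored = read_accecn_fields(worst_case)
```

For the worst case, `ignored` is 40 - (2 + 3*3) = 29 bytes, matching the size of the potential channel stated in the text.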
The AccECN protocol is not believed to introduce any new privacy The AccECN protocol is not believed to introduce any new privacy
concerns, because it merely counts and feeds back signals at the concerns, because it merely counts and feeds back signals at the
transport layer that had already been visible at the IP layer. A transport layer that had already been visible at the IP layer. A
covert channel can be used to compromise privacy. However, as covert channel can be used to compromise privacy. However, as
explained above, undefined TCP options in general open up such explained above, undefined TCP options in general open up such
channels and common techniques are available to close them off. channels, and common techniques are available to close them off.
There is a potential concern that a Data Receiver could deliberately There is a potential concern that a Data Receiver could deliberately
omit AccECN Options, pretending that they had been stripped by a omit AccECN Options, pretending that they had been stripped by a
middlebox. No known way can yet be contrived for a receiver to take middlebox. No known way can yet be contrived for a receiver to take
advantage of this behaviour, which seems to always degrade its own advantage of this behaviour, which seems to always degrade its own
performance. However, the concern is mentioned here for performance. However, the concern is mentioned here for
completeness. completeness.
A generic privacy concern of any new protocol is that for a while it A generic privacy concern of any new protocol is that for a while it
will be used by a small population of hosts, and thus show up more will be used by a small population of hosts, and thus show up more
easily. However, it is expected that this option will become easily. However, it is expected that AccECN will become available in
available in operating systems over time, and eventually turned on by operating systems over time and that it will eventually be turned on
default in them. Thus a individual identification of a particular by default. Thus, an individual identification of a particular user
user is less of a concern than the fingerprinting of specific is less of a concern than the fingerprinting of specific versions of
versions of operating systems. However, the latter can be done using operating systems. However, the latter can be done using different
different means independent of Accurate ECN. means independent of Accurate ECN.
As Accurate ECN exposes more bits in the TCP header which could be As Accurate ECN exposes more bits in the TCP header that could be
tampered with without interfering with the transport excessively, it tampered with without interfering with the transport excessively, it
may allow an additional way to identify specific data streams across may allow an additional way to identify specific data streams across
a virtual private network (VPN) to an attacker which has access to a virtual private network (VPN) to an attacker that has access to the
the datastream before and after the VPN tunnel endpoints. This may datastream before and after the VPN tunnel endpoints. This may be
be achieved by injecting or modifying the ACE field in specific achieved by injecting or modifying the ACE field in specific patterns
patters that can be recognized. that can be recognized.
Overall, Accurate ECN does not change the risk profile on privacy to Overall, Accurate ECN does not change the risk profile on privacy to
a user dramatically beyond what is already possible using Classic a user dramatically beyond what is already possible using Classic
ECN. However, in order to prevent such attacks and means of easier ECN. However, in order to prevent such attacks and means of easier
identification of flows, it is adviseable for privacy conscious users identification of flows, it is advisable for privacy-conscious users
behind VPNs not to enable Accurate ECN, or Classic ECN for that behind VPNs not to enable Accurate ECN, or Classic ECN for that
matter. matter.
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, Selective Acknowledgment Options", RFC 2018,
DOI 10.17487/RFC2018, October 1996, DOI 10.17487/RFC2018, October 1996,
skipping to change at page 59, line 30 skipping to change at line 2722
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)",
STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022,
<https://www.rfc-editor.org/info/rfc9293>. <https://www.rfc-editor.org/info/rfc9293>.
9.2. Informative References 9.2. Informative References
[I-D.ietf-tcpm-generalized-ecn] [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets", Congestion Notification (ECN) to TCP Control Packets",
Work in Progress, Internet-Draft, draft-ietf-tcpm- Work in Progress, Internet-Draft, draft-ietf-tcpm-
generalized-ecn-16, 20 October 2024, generalized-ecn-17, 21 April 2025,
<https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-
generalized-ecn-16>. generalized-ecn-17>.
[Mandalari18] [Mandalari18]
Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö.
Alay, "Measuring ECN++: Good News for ++, Bad News for ECN Alay, "Measuring ECN++: Good News for ++, Bad News for ECN
over Mobile", IEEE Communications Magazine, March 2018, over Mobile", IEEE Communications Magazine, March 2018,
<http://www.it.uc3m.es/amandala/ <http://www.it.uc3m.es/amandala/
ecn++/ecn_commag_2018.html>. ecn++/ecn_commag_2018.html>.
[RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M.
Sooriyabandara, "TCP Performance Implications of Network Sooriyabandara, "TCP Performance Implications of Network
skipping to change at page 62, line 21 skipping to change at line 2852
(L4S) Internet Service: Architecture", RFC 9330, (L4S) Internet Service: Architecture", RFC 9330,
DOI 10.17487/RFC9330, January 2023, DOI 10.17487/RFC9330, January 2023,
<https://www.rfc-editor.org/info/rfc9330>. <https://www.rfc-editor.org/info/rfc9330>.
[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
"CUBIC for Fast and Long-Distance Networks", RFC 9438, "CUBIC for Fast and Long-Distance Networks", RFC 9438,
DOI 10.17487/RFC9438, August 2023, DOI 10.17487/RFC9438, August 2023,
<https://www.rfc-editor.org/info/rfc9438>. <https://www.rfc-editor.org/info/rfc9438>.
[RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture
Specification Volume 1, Release 1.4", 2020, Specification", Volume 1, Release 1.4, 2020,
<https://www.infinibandta.org/ibta-specification/>. <https://www.infinibandta.org/ibta-specification/>.
Appendix A. Example Algorithms Appendix A. Example Algorithms
This appendix is informative, not normative. It gives example This appendix is informative, not normative. It gives example
algorithms that would satisfy the normative requirements of the algorithms that would satisfy the normative requirements of the
AccECN protocol. However, implementers are free to choose other ways AccECN protocol. However, implementers are free to choose other ways
to implement the requirements. to implement the requirements.
A.1. Example Algorithm to Encode/Decode the AccECN Option A.1. Example Algorithm to Encode/Decode the AccECN Option
skipping to change at page 62, line 46 skipping to change at line 2877
the ECEB field into its byte counter s.ceb. The other counters for the ECEB field into its byte counter s.ceb. The other counters for
bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly
encoded and decoded. encoded and decoded.
It is assumed that each local byte counter is an unsigned integer It is assumed that each local byte counter is an unsigned integer
greater than 24b (probably 32b), and that the following constant has greater than 24b (probably 32b), and that the following constant has
been assigned: been assigned:
DIVOPT = 2^24 DIVOPT = 2^24
Every time a CE marked data segment arrives, the Data Receiver Every time a CE-marked data segment arrives, the Data Receiver
increments its local value of r.ceb by the size of the TCP Data. increments its local value of r.ceb by the size of the TCP Data.
Whenever it sends an ACK with an AccECN Option, the value it writes Whenever it sends an ACK with an AccECN Option, the value it writes
into the ECEB field is into the ECEB field is
ECEB = r.ceb % DIVOPT ECEB = r.ceb % DIVOPT
where '%' is the remainder operator. where '%' is the remainder operator.
On the arrival of an AccECN Option, the Data Sender first makes sure On the arrival of an AccECN Option, the Data Sender first makes sure
the ACK has not been superseded in order to avoid winding the s.ceb the ACK has not been superseded in order to avoid winding the s.ceb
counter backwards. It uses the TCP acknowledgement number and any counter backwards. It uses the TCP acknowledgement number and any
SACK options [RFC2018] to calculate newlyAckedB, the amount of new SACK options [RFC2018] to calculate newlyAckedB, the amount of new
data that the ACK acknowledges in bytes (newlyAckedB can be zero but data that the ACK acknowledges in bytes (newlyAckedB can be zero but
not negative). If newlyAckedB is zero, either the ACK has been not negative). If newlyAckedB is zero, either the ACK has been
superseded or CE-marked packet(s) without data could have arrived. superseded or CE-marked packet(s) without data could have arrived.
To break the tie for the latter case, the Data Sender could use To break the tie for the latter case, the Data Sender could use
timestamps [RFC7323] (if present) to work out newlyAckedT, the amount of timestamps [RFC7323] (if present) to work out newlyAckedT, the amount of
new time that the ACK acknowledges. If the Data Sender determines new time that the ACK acknowledges. If the Data Sender determines
that the ACK has been superseded it ignores the AccECN Option. that the ACK has been superseded, it ignores the AccECN Option.
Otherwise, the Data Sender calculates the minimum non-negative Otherwise, the Data Sender calculates the minimum non-negative
difference d.ceb between the ECEB field and its local s.ceb counter, difference d.ceb between the ECEB field and its local s.ceb counter,
using modulo arithmetic as follows: using modulo arithmetic as follows:
if ((newlyAckedB > 0) || (newlyAckedT > 0)) { if ((newlyAckedB > 0) || (newlyAckedT > 0)) {
d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT
s.ceb += d.ceb s.ceb += d.ceb
} }
For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal),
then then
s.ceb % DIVOPT = 1 s.ceb % DIVOPT = 1
d.ceb = (1461 + 2^24 - 1) % 2^24 d.ceb = (1461 + 2^24 - 1) % 2^24
= 1460 = 1460
s.ceb = 33,554,433 + 1460 s.ceb = 33,554,433 + 1460
= 33,555,893 = 33,555,893
In practice an implementation might use heuristics to guess the In practice, an implementation might use heuristics to guess the
feedback in missing ACKs, then when it subsequently receives feedback feedback in missing ACKs. Then when it subsequently receives
it might find that it needs to correct its earlier heuristics as part feedback, it might find that it needs to correct its earlier
of the decoding process. The above decoding process does not include heuristics as part of the decoding process. The above decoding
any such heuristics. process does not include any such heuristics.
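
The encoding and decoding steps of this section can be sketched in Python as follows. This is only an illustrative sketch, not part of the specification: the function names are hypothetical, Python integers are unbounded so the wrap of a 32-bit local counter is not modelled, and newlyAckedT is assumed to be 0 when timestamps are not in use.

```python
DIVOPT = 2**24  # the ECEB field in the AccECN Option is 24 bits wide


def encode_eceb(r_ceb: int) -> int:
    """Data Receiver: value written into the ECEB field of an ACK."""
    return r_ceb % DIVOPT


def decode_eceb(s_ceb: int, eceb: int,
                newly_acked_b: int, newly_acked_t: int = 0) -> int:
    """Data Sender: return the updated s.ceb counter.

    The option is ignored (s.ceb unchanged) when the ACK acknowledges
    neither new data nor new time, i.e. it is treated as superseded.
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Minimum non-negative difference, modulo 2^24.
        d_ceb = (eceb + DIVOPT - (s_ceb % DIVOPT)) % DIVOPT
        s_ceb += d_ceb
    return s_ceb


# The worked example from the text: s.ceb = 33,554,433 and ECEB = 1461.
print(decode_eceb(33_554_433, 1461, newly_acked_b=1460))
```

With the values from the worked example, the sender's counter advances by 1460 bytes, and a superseded ACK leaves it untouched.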
A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss
The example algorithms below show how a Data Receiver in AccECN mode The example algorithms below show how a Data Receiver in AccECN mode
could encode its CE packet counter r.cep into the ACE field, and how could encode its CE packet counter r.cep into the ACE field, and how
the Data Sender in AccECN mode could decode the ACE field into its the Data Sender in AccECN mode could decode the ACE field into its
s.cep counter. The Data Sender's algorithm includes code to s.cep counter. The Data Sender's algorithm includes code to
heuristically detect a long enough unbroken string of ACK losses that heuristically detect a long enough unbroken string of ACK losses that
could have concealed a cycle of the congestion counter in the ACE could have concealed a cycle of the congestion counter in the ACE
field of the next ACK to arrive. field of the next ACK to arrive.
Two variants of the algorithm are given: i) a more conservative Two variants of the algorithm are given: i) a more conservative
variant for a Data Sender to use if it detects that AccECN Options variant for a Data Sender to use if it detects that AccECN Options
are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii)
a less conservative variant that is feasible when complementary a less conservative variant that is feasible when complementary
information is available from AccECN Options. information is available from AccECN Options.
A.2.1. Safety Algorithm without the AccECN Option A.2.1. Safety Algorithm Without the AccECN Option
It is assumed that each local packet counter is a sufficiently sized It is assumed that each local packet counter is a sufficiently sized
unsigned integer (probably 32b) and that the following constant has unsigned integer (probably 32b) and that the following constant has
been assigned: been assigned:
DIVACE = 2^3 DIVACE = 2^3
Every time an Acceptable CE-marked packet arrives (Section 3.2.2.2), Every time an Acceptable CE-marked packet arrives (Section 3.2.2.2),
the Data Receiver increments its local value of r.cep by 1. It the Data Receiver increments its local value of r.cep by 1. It
repeats the same value of ACE in every subsequent ACK until the next repeats the same value of ACE in every subsequent ACK until the next
CE marking arrives, where CE marking arrives, where
ACE = r.cep % DIVACE. ACE = r.cep % DIVACE.
If the Data Sender received an earlier value of the counter that had If the Data Sender received an earlier value of the counter that had
been delayed due to ACK reordering, it might incorrectly calculate been delayed due to ACK reordering, it might incorrectly calculate
that the ACE field had wrapped. Therefore, on the arrival of every that the ACE field had wrapped. Therefore, on the arrival of every
ACK, the Data Sender ensures the ACK has not been superseded using ACK, the Data Sender ensures the ACK has not been superseded using
the TCP acknowledgement number, any SACK options and timestamps (if the TCP acknowledgement number, any SACK options, and timestamps (if
available) to calculate newlyAckedB, as in Appendix A.1. If the ACK available) to calculate newlyAckedB, as in Appendix A.1. If the ACK
has not been superseded, the Data Sender calculates the minimum has not been superseded, the Data Sender calculates the minimum
difference d.cep between the ACE field and its local s.cep counter, difference d.cep between the ACE field and its local s.cep counter,
using modulo arithmetic as follows: using modulo arithmetic as follows:
if ((newlyAckedB > 0) || (newlyAckedT > 0)) if ((newlyAckedB > 0) || (newlyAckedT > 0))
d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE
Section 3.2.2.5 expects the Data Sender to assume that the ACE field Section 3.2.2.5 expects the Data Sender to assume that the ACE field
cycled if it is the safest likely case under prevailing conditions. cycled if it is the safest likely case under prevailing conditions.
The 3-bit ACE field in an arriving ACK could have cycled and become The 3-bit ACE field in an arriving ACK could have cycled and become
ambiguous to the Data Sender if a sequence of ACKs goes missing that ambiguous to the Data Sender if a sequence of ACKs goes missing that
covers a stream of data long enough to contain 8 or more CE marks. covers a stream of data long enough to contain 8 or more CE marks.
We use the word `missing' rather than `lost', because some or all the We use the word 'missing' rather than 'lost', because some or all the
missing ACKs might arrive eventually, but out of order. Even if some missing ACKs might arrive eventually, but out of order. Even if some
of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), of the missing ACKs were piggy-backed on data (i.e., not pure ACKs),
retransmissions will not repair the lost AccECN information, because retransmissions will not repair the lost AccECN information, because
AccECN requires retransmissions to carry the latest AccECN counters, AccECN requires retransmissions to carry the latest AccECN counters,
not the original ones. not the original ones.
The phrase `under prevailing conditions' allows for implementation- The phrase 'under prevailing conditions' allows for implementation-
dependent interpretation. A Data Sender might take account of the dependent interpretation. A Data Sender might take account of the
prevailing size of data segments and the prevailing CE marking rate prevailing size of data segments and the prevailing CE marking rate
just before the sequence of missing ACKs. However, we shall start just before the sequence of missing ACKs. However, we shall start
with the simplest algorithm, which assumes that all segments are with the simplest algorithm, which assumes that all segments are
full-sized and, ultra-conservatively, that ECN marking was 100% full-sized and, ultra-conservatively, that ECN marking was 100%
on the forward path when all ACKs on the reverse path started to be on the forward path when all ACKs on the reverse path started to be
dropped. Specifically, if newlyAckedB is the amount of data that an dropped. Specifically, if newlyAckedB is the amount of data that an
ACK acknowledges since the previous ACK, then the Data Sender could ACK acknowledges since the previous ACK, then the Data Sender could
assume that this acknowledges newlyAckedPkt full-sized segments, assume that this acknowledges newlyAckedPkt full-sized segments,
where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the
ACE field incremented by ACE field incremented by
dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE), dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE)
For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- For example, imagine an ACK acknowledges newlyAckedPkt=9 more full-
size segments than any previous ACK, and that ACE increments by a size segments than any previous ACK, and that ACE increments by a
minimum of 2 CE marks (d.cep=2). The above formula works out that it minimum of 2 CE marks (d.cep=2). The above formula works out that it
would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) =
2). However, if ACE increases by a minimum of 2 but acknowledges 10 2). However, if ACE increases by a minimum of 2 but acknowledges 10
full-sized segments, then it would be necessary to assume that there full-sized segments, then it would be necessary to assume that there
could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). could have been 10 CE marks (because 10 - ((10-2) % 8) = 10).
Note that checks would need to be added to the above pseudocode for Note that checks would need to be added to the above pseudocode for
(d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been
wrongly estimated using an inappropriate packet size. wrongly estimated using an inappropriate packet size.
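
The calculation of d.cep and dSafer.cep described above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, newlyAckedPkt is rounded down (the text does not specify the rounding), and the guard for d.cep > newlyAckedPkt simply falls back to d.cep, which is one possible choice among several.

```python
DIVACE = 2**3  # the ACE field is 3 bits wide


def d_cep_min(ace: int, s_cep: int) -> int:
    """Minimum non-negative difference between ACE and s.cep, modulo 8."""
    return (ace + DIVACE - (s_cep % DIVACE)) % DIVACE


def d_cep_safer(newly_acked_b: int, mss: int, d_cep: int) -> int:
    """Largest increment congruent to d.cep (mod 8) that fits within the
    number of full-sized segments the ACK could have acknowledged."""
    newly_acked_pkt = newly_acked_b // mss  # assumption: round down
    if d_cep > newly_acked_pkt:
        # Assumed guard: the packet estimate is too small to be trusted,
        # so keep the minimum difference.
        return d_cep
    return newly_acked_pkt - ((newly_acked_pkt - d_cep) % DIVACE)


# The two examples from the text, with MSS = 1460 and d.cep = 2.
print(d_cep_safer(9 * 1460, 1460, 2))
print(d_cep_safer(10 * 1460, 1460, 2))
```

For 9 newly acknowledged segments the safer estimate stays at 2 CE marks; for 10 segments it rises to 10, because a full cycle of the 3-bit counter could have been concealed.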
ACKs that acknowledge a large stretch of packets might be common in ACKs that acknowledge a large stretch of packets might be common in
data centres to achieve a high packet rate or might be due to ACK data centres to achieve a high packet rate or might be due to ACK
thinning by a middlebox. In these cases, cycling of the ACE field thinning by a middlebox. In these cases, cycling of the ACE field
would often appear to have been possible, so the above algorithm would often appear to have been possible, so the above algorithm
would be over-conservative, leading to a false high marking rate and would be overly conservative, leading to a false high marking rate
poor performance. Therefore it would be reasonable to only use and poor performance. Therefore, it would be reasonable to only use
dSafer.cep rather than d.cep if the moving average of newlyAckedPkt dSafer.cep rather than d.cep if the moving average of newlyAckedPkt
was well below 8. was well below 8.
Implementers could build in more heuristics to estimate prevailing Implementers could build in more heuristics to estimate a prevailing
average segment size and prevailing ECN marking. For instance, average segment size and prevailing ECN marking. For instance,
newlyAckedPkt in the above formula could be replaced with newlyAckedPkt in the above formula could be replaced with
newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing
segment size and p is the prevailing ECN marking probability. segment size and p is the prevailing ECN marking probability.
However, ultimately, if TCP's ECN feedback becomes inaccurate it However, ultimately, if TCP's ECN feedback becomes inaccurate, it
still has loss detection to fall back on. Therefore, it would seem still has loss detection to fall back on. Therefore, it would seem
safe to implement a simple algorithm, rather than a perfect one. safe to implement a simple algorithm, rather than a perfect one.
The simple algorithm for dSafer.cep above requires no monitoring of The simple algorithm for dSafer.cep above requires no monitoring of
prevailing conditions and it would still be safe if, for example, prevailing conditions and it would still be safe if, for example,
segments were on average at least 5% of full size as long as ECN segments were on average at least 5% of full size as long as ECN
marking was 5% or less. Assuming it was used, the Data Sender would marking was 5% or less. Assuming it was used, the Data Sender would
increment its packet counter as follows: increment its packet counter as follows:
s.cep += dSafer.cep s.cep += dSafer.cep
If missing acknowledgement numbers arrive later (due to reordering), If missing acknowledgement numbers arrive later (due to reordering),
Section 3.2.2.5 says "the Data Sender MAY attempt to neutralize the Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the
effect of any action it took based on a conservative assumption that effect of any action it took based on a conservative assumption that
it later found to be incorrect". To do this, the Data Sender would it later found to be incorrect". To do this, the Data Sender would
have to store the values of all the relevant variables whenever it have to store the values of all the relevant variables whenever it
made assumptions, so that it could re-evaluate them later. Given made assumptions, so that it could re-evaluate them later. Given
this could become complex and it is not required, we do not attempt this could become complex and it is not required, we do not attempt
to provide an example of how to do this. to provide an example of how to do this.
A.2.2. Safety Algorithm with the AccECN Option A.2.2. Safety Algorithm with the AccECN Option
When AccECN Options are available on the ACKs before and after the When AccECN Options are available on the ACKs before and after the
possible sequence of ACK losses, if the Data Sender only needs CE- possible sequence of ACK losses, if the Data Sender only needs CE-
marked bytes, it will have sufficient information in AccECN Options marked bytes, it will have sufficient information in AccECN Options
without needing to process the ACE field. If for some reason it without needing to process the ACE field. If for some reason it
needs CE-marked packets and dSafer.cep is different from d.cep, it needs CE-marked packets and dSafer.cep is different from d.cep, it
can determine whether d.cep is likely to be a safe enough estimate by can determine whether d.cep is likely to be a safe enough estimate by
checking whether the average marked segment size (s = d.ceb/d.cep) is checking whether the average marked segment size (s = d.ceb/d.cep) is
less than the MSS (where d.ceb is the amount of newly CE-marked bytes less than the MSS (where d.ceb is the amount of newly CE-marked bytes
- see Appendix A.1). Specifically, it could use the following -- see Appendix A.1). Specifically, it could use the following
algorithm: algorithm:
SAFETY_FACTOR = 2 SAFETY_FACTOR = 2
if (dSafer.cep > d.cep) { if (dSafer.cep > d.cep) {
if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but avoids division by zero if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but avoids division by zero
sSafer = d.ceb/dSafer.cep sSafer = d.ceb/dSafer.cep
if (sSafer < MSS/SAFETY_FACTOR) if (sSafer < MSS/SAFETY_FACTOR)
dSafer.cep = d.cep % d.cep is a safe enough estimate dSafer.cep = d.cep % d.cep is a safe enough estimate
} % else } % else
% No need for else; dSafer.cep is already correct, % No need for else; dSafer.cep is already correct,
skipping to change at page 67, line 22 skipping to change at line 3084
MSS/SAFETY_FACTOR+--------------+ safest MSS/SAFETY_FACTOR+--------------+ safest
| | | |
| d.cep is safe| | d.cep is safe|
| enough | | enough |
+--------------------> +-------------------->
MSS s MSS s
The following examples give the reasoning behind the algorithm, The following examples give the reasoning behind the algorithm,
assuming MSS=1460: assuming MSS=1460:
* if d.cep=0, dSafer.cep=8 and d.ceb=1460, then s=infinity and * if d.cep=0, dSafer.cep=8, and d.ceb=1460, then s=infinity and
sSafer=182.5. sSafer=182.5.
Therefore even though the average size of 8 data segments is Therefore, even though the average size of 8 data segments is
unlikely to have been as small as MSS/8, d.cep cannot have been unlikely to have been as small as MSS/8, d.cep cannot have been
correct, because it would imply an average segment size greater correct, because it would imply an average segment size greater
than the MSS. than the MSS.
* if d.cep=2, dSafer.cep=10 and d.ceb=1460, then s=730 and * if d.cep=2, dSafer.cep=10, and d.ceb=1460, then s=730 and
sSafer=146. sSafer=146.
Therefore, d.cep is safe enough, because the average size of 10 Therefore, d.cep is safe enough, because the average size of 10
data segments is unlikely to have been as small as MSS/10. data segments is unlikely to have been as small as MSS/10.
* if d.cep=7, dSafer.cep=15 and d.ceb=10200, then s=1457 and * if d.cep=7, dSafer.cep=15, and d.ceb=10200, then s=1457 and
sSafer=680. sSafer=680.
Therefore, d.cep is safe enough, because the average data segment Therefore, d.cep is safe enough, because the average data segment
size is more likely to have been just less than one MSS, rather size is more likely to have been just less than one MSS, rather
than below MSS/2. than below MSS/2.
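
The three examples above can be checked against a Python rendering of the visible part of the algorithm. This is an illustrative sketch only: the function name choose_d_cep is hypothetical, and it follows just the lines of the pseudocode shown above (the remainder is elided in this comparison).

```python
SAFETY_FACTOR = 2


def choose_d_cep(d_cep: int, d_safer_cep: int, d_ceb: int, mss: int) -> int:
    """Return the CE packet increment to use, given the byte increment
    d.ceb taken from the AccECN Option."""
    if d_safer_cep > d_cep:
        # Same test as (d.ceb / d.cep <= MSS), written as a product so
        # that d.cep == 0 cannot cause a division by zero.
        if d_ceb <= mss * d_cep:
            s_safer = d_ceb / d_safer_cep  # d_safer_cep > d_cep >= 0
            if s_safer < mss / SAFETY_FACTOR:
                # Segments would have to be implausibly small for
                # dSafer.cep to be right, so d.cep is safe enough.
                return d_cep
    return d_safer_cep


MSS = 1460
print(choose_d_cep(0, 8, 1460, MSS))     # d.cep cannot have been correct
print(choose_d_cep(2, 10, 1460, MSS))    # d.cep is safe enough
print(choose_d_cep(7, 15, 10200, MSS))   # d.cep is safe enough
```

The sketch returns 8, 2, and 7 for the three cases, matching the reasoning given in the examples.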
If pure ACKs were allowed to be ECN-capable, missing ACKs would be If pure ACKs were allowed to be ECN-capable, missing ACKs would be
far less likely. However, because [RFC3168] currently precludes far less likely. However, because [RFC3168] currently precludes
this, the above algorithm assumes that pure ACKs are not ECN-capable. this, the above algorithm assumes that pure ACKs are not ECN-capable.
A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets
If AccECN Options are not available, the Data Sender can only decode If AccECN Options are not available, the Data Sender can only decode
CE-marking from the ACE field in packets. Every time an ACK arrives, a CE marking from the ACE field in packets. Every time an ACK
to convert this into an estimate of CE-marked bytes, it needs an arrives, to convert this into an estimate of CE-marked bytes, it
average of the segment size, s_ave. Then it can add or subtract needs an average of the segment size, s_ave. Then it can add or
s_ave from the value of d.ceb as the value of d.cep increments or subtract s_ave from the value of d.ceb as the value of d.cep
decrements. Some possible ways to calculate s_ave are outlined increments or decrements. Some possible ways to calculate s_ave are
below. The precise details will depend on why an estimate of marked outlined below. The precise details will depend on why an estimate
bytes is needed. of marked bytes is needed.
The implementation could keep a record of the byte numbers of all the The implementation could keep a record of the byte numbers of all the
boundaries between packets in flight (including control packets), and boundaries between packets in flight (including control packets), and
recalculate s_ave on every ACK. However it would be simpler to recalculate s_ave on every ACK. However, it would be simpler to
merely maintain a counter packets_in_flight for the number of packets merely maintain a counter packets_in_flight for the number of packets
in flight (including control packets), which is reset once per RTT. in flight (including control packets), which is reset once per RTT.
Either way, it would estimate s_ave as: Either way, it would estimate s_ave as:
s_ave ~= flightsize / packets_in_flight, s_ave ~= flightsize / packets_in_flight,
where flightsize is the variable that TCP already maintains for the where flightsize is the variable that TCP already maintains for the
number of bytes in flight and '~=' means 'approximately equal to'. number of bytes in flight and '~=' means 'approximately equal to'.
To avoid floating point arithmetic, it could right-bit-shift by To avoid floating point arithmetic, it could right-bit-shift by
lg(packets_in_flight), where lg() means log base 2. lg(packets_in_flight), where lg() means log base 2.
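
The shift-based estimate of s_ave can be sketched in Python as follows. This is a sketch under assumptions: the function name is hypothetical, and lg(packets_in_flight) is rounded down to an integer, which the text does not specify. With that rounding, the result is exact when packets_in_flight is a power of two and otherwise overestimates s_ave.

```python
def s_ave_estimate(flightsize: int, packets_in_flight: int) -> int:
    """Approximate flightsize / packets_in_flight using only a shift."""
    if packets_in_flight <= 0:
        return 0  # assumption: nothing in flight, no estimate available
    # floor(log2(n)) for a positive integer n
    shift = packets_in_flight.bit_length() - 1
    return flightsize >> shift


print(s_ave_estimate(8 * 1460, 8))    # power of two: exact
print(s_ave_estimate(10 * 1460, 10))  # not a power of two: overestimate
```

For 10 packets of 1460 bytes, the shift divides by 8 rather than 10, giving 1825 instead of 1460. An implementation that needs a tighter estimate would have to accept a real division or a finer approximation.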
skipping to change at page 68, line 45 skipping to change at line 3149
where a is the decay constant for the EWMA. However, then it is where a is the decay constant for the EWMA. However, then it is
necessary to choose a good value for this constant, which ought to necessary to choose a good value for this constant, which ought to
depend on the number of packets in flight. Also, the decay constant depend on the number of packets in flight. Also, the decay constant
needs to be a power of two to avoid floating point arithmetic. needs to be a power of two to avoid floating point arithmetic.
A.4. Example Algorithm to Count Not-ECT Bytes A.4. Example Algorithm to Count Not-ECT Bytes
A Data Sender in AccECN mode can infer the amount of TCP payload data A Data Sender in AccECN mode can infer the amount of TCP payload data
arriving at the receiver marked Not-ECT from the difference between arriving at the receiver marked Not-ECT from the difference between
the amount of newly ACKed data and the sum of the bytes with the the amount of newly ACKed data and the sum of the bytes with the
other three markings, d.ceb, d.e0b and d.e1b. other three markings, d.ceb, d.e0b, and d.e1b.
For this approach to be precise, it has to be assumed that spurious For this approach to be precise, it has to be assumed that spurious
(unnecessary) retransmissions do not lead to double counting. This (unnecessary) retransmissions do not lead to double counting. This
assumption is currently correct, given that RFC 3168 requires that assumption is currently correct, given that RFC 3168 requires that
the Data Sender marks retransmitted segments as Not-ECT. However, the Data Sender mark retransmitted segments as Not-ECT. However, the
the converse is not true; necessary retransmissions will result in converse is not true; necessary retransmissions will result in
under-counting. undercounting.
However, such precision is unlikely to be necessary. The only known However, such precision is unlikely to be necessary. The only known
use of a count of Not-ECT marked bytes is to test whether equipment use of a count of Not-ECT marked bytes is to test whether equipment
on the path is clearing the ECN field (perhaps due to an out-dated on the path is clearing the ECN field (perhaps due to an out-dated
attempt to clear, or bleach, what used to be the IPv4 ToS byte or the attempt to clear, or bleach, what used to be the IPv4 ToS byte or the
IPv6 Traffic Class field). To detect bleaching it will be sufficient IPv6 Traffic Class field). To detect bleaching, it will be
to detect whether nearly all bytes arrive marked as Not-ECT. sufficient to detect whether nearly all bytes arrive marked as Not-
Therefore there ought to be no need to keep track of the details of ECT. Therefore, there ought to be no need to keep track of the
retransmissions. details of retransmissions.
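
The inference described in this section can be sketched as follows. The function names, the clamping at zero, and the 95% threshold for "nearly all bytes" are illustrative assumptions; the text does not specify a threshold:

```python
def not_ect_bytes(newly_acked: int, d_ceb: int, d_e0b: int, d_e1b: int) -> int:
    """Infer Not-ECT payload bytes as newly ACKed bytes minus marked bytes."""
    marked = d_ceb + d_e0b + d_e1b
    # Assumption: clamp at zero in case the byte counters run ahead of
    # the cumulative ACK within a single sample.
    return max(newly_acked - marked, 0)


def looks_bleached(newly_acked: int, d_ceb: int, d_e0b: int, d_e1b: int,
                   threshold: float = 0.95) -> bool:
    """Return True if nearly all newly ACKed bytes arrived as Not-ECT.

    The threshold value is hypothetical.
    """
    if newly_acked <= 0:
        return False
    inferred = not_ect_bytes(newly_acked, d_ceb, d_e0b, d_e1b)
    return inferred >= threshold * newly_acked
```

Because only a coarse "nearly all" test is needed, this sketch makes no attempt to correct for retransmissions.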
Appendix B. Rationale for Usage of TCP Header Flags Appendix B. Rationale for Usage of TCP Header Flags
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake
AccECN uses a rather unorthodox approach to negotiate the highest AccECN uses a rather unorthodox approach to negotiate the highest
version TCP ECN feedback scheme that both ends support, as justified version TCP ECN feedback scheme that both ends support, as justified
below. It follows from the original TCP ECN capability negotiation below. It follows from the original TCP ECN capability negotiation
[RFC3168], in which the Client set the 2 least significant of the [RFC3168], in which the Client set the 2 least significant of the
original reserved flags in the TCP header, and fell back to no ECN original reserved flags in the TCP header, and fell back to No ECN
support if the Server responded with the 2 flags cleared, which had support if the Server responded with the 2 flags cleared, which had
previously been the default. previously been the default.
Classic ECN used header flags rather than a TCP option because it was Classic ECN used header flags rather than a TCP option because it was
considered more efficient to use a header flag for 1 bit of feedback considered more efficient to use a header flag for 1 bit of feedback
per ACK, and this bit could be overloaded to indicate support for per ACK, and this bit could be overloaded to indicate support for
Classic ECN during the handshake. During the development of ECN, 1 Classic ECN during the handshake. During the development of ECN, 1
bit crept up to 2, in order to deliver the feedback reliably and to bit crept up to 2, in order to deliver the feedback reliably and to
work round some broken hosts that reflected the reserved flags during work round some broken hosts that reflected the reserved flags during
the handshake. the handshake.
In order to be backward compatible with RFC 3168, AccECN continues In order to be backward compatible with RFC 3168, AccECN continues
this approach, using the 3rd least significant TCP header flag that this approach, using the 3rd least significant TCP header flag that
had previously been allocated for the ECN nonce (now historic). had previously been allocated for the ECN-nonce (now historic).
Then, whatever form of Server an AccECN Client encounters, the Then, whatever form of Server an AccECN Client encounters, the
connection can fall back to the highest version of feedback protocol connection can fall back to the highest version of feedback protocol
that both ends support, as explained in Section 3.1. that both ends support, as explained in Section 3.1.
If AccECN capability negotiation had used the more orthodox approach If AccECN capability negotiation had used the more orthodox approach
of a TCP option, it would still have had to set the two ECN flags in of a TCP option, it would still have had to set the two ECN flags in
the main TCP header, in order to be able to fall back to Classic RFC the main TCP header, in order to be able to fall back to Classic ECN
3168 ECN, or to disable ECN support, without another round of [RFC3168], or to disable ECN support, without another round of
negotiation. Then AccECN would also have had to handle all the negotiation. Then AccECN would also have had to handle all the
different ways that Servers currently respond to settings of the ECN different ways that Servers currently respond to settings of the ECN
flags in the main TCP header, including all the conflicting cases flags in the main TCP header, including all of the conflicting cases
where a Server might have said it supported one approach in the flags where a Server might have said it supported one approach in the flags
and another approach in a new TCP option. And AccECN would have had and another approach in a new TCP option. And AccECN would have had
to deal with all the additional possibilities where a middlebox might to deal with all of the additional possibilities where a middlebox
have mangled the ECN flags, or removed TCP options. Thus, usage of might have mangled the ECN flags, or removed TCP options. Thus,
the 3rd reserved TCP header flag simplified the protocol. usage of the 3rd reserved TCP header flag simplified the protocol.
The third flag was used in a way that could be distinguished from the The third flag was used in a way that could be distinguished from the
ECN nonce, in case any nonce deployment was encountered. Previous ECN-nonce, in case any nonce deployment was encountered. Previous
usage of this flag for the ECN nonce was integrated into the original usage of this flag for the ECN-nonce was integrated into the original
ECN negotiation. This further justified the 3rd flag's use for ECN negotiation. This further justified the third flag's use for
AccECN, because a non-ECN usage of this flag would have had to use it AccECN, because a non-ECN usage of this flag would have had to use it
as a separate single bit, rather than in combination with the other 2 as a separate single bit, rather than in combination with the other 2
ECN flags. ECN flags.
Indeed, having overloaded the original uses of these three flags for Indeed, having overloaded the original uses of these three flags for
its handshake, AccECN overloads all three bits again as a 3-bit its handshake, AccECN overloads all three bits again as a 3-bit
counter. counter.
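
To illustrate how three flags can double as one 3-bit counter, the following sketch packs and unpacks the flags in the order (AE,CWR,ECE) and computes a wrapped counter difference. Treating AE as the most significant bit is an assumption here, and the helper names are hypothetical:

```python
def pack_flags(ae: int, cwr: int, ece: int) -> int:
    """Combine the three flags into one 3-bit value (AE assumed as MSB)."""
    return (ae & 1) << 2 | (cwr & 1) << 1 | (ece & 1)


def unpack_flags(value: int):
    """Split a 3-bit value back into the tuple (AE, CWR, ECE)."""
    return (value >> 2) & 1, (value >> 1) & 1, value & 1


def counter_delta(prev: int, curr: int) -> int:
    """Smallest non-negative increase from prev to curr, modulo 8.

    A 3-bit counter wraps every 8 increments, so this is only the minimum
    possible increase; the true increase could be larger by a multiple of 8.
    """
    return (curr - prev) & 0x7
```

For instance, a counter that moves from 6 to 1 has advanced by at least 3, because it wrapped through 7 and 0.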
B.2. Four Codepoints in the SYN/ACK B.2. Four Codepoints in the SYN/ACK
Of the 8 possible codepoints that the 3 TCP header flags can indicate Of the eight possible codepoints that the three TCP header flags can
on the SYN/ACK, 4 already indicated earlier (or broken) versions of indicate on the SYN/ACK, four already indicated earlier (or broken)
ECN support, 1 now being historic. In the early design of AccECN, an versions of ECN support, one now being Historic. In the early design
AccECN Server could use only 2 of the 4 remaining codepoints. They of AccECN, an AccECN Server could use only 2 of the 4 remaining
both indicated AccECN support, but one fed back that the SYN had codepoints. They both indicated AccECN support, but one fed back
arrived marked as CE. Even though ECN support on a SYN is not yet on that the SYN had arrived marked as CE. Even though ECN support on a
the standards track, the idea is for either end to act as a SYN is not yet on the Standards Track, the idea is for either end to
mechanistic reflector, so that future capabilities can be act as a mechanistic reflector, so that future capabilities can be
unilaterally deployed without requiring 2-ended deployment (justified unilaterally deployed without requiring 2-ended deployment (justified
in Section 2.5). in Section 2.5).
During traversal testing it was discovered that the IP-ECN field in During traversal testing, it was discovered that the IP-ECN field in
the SYN was mangled on a non-negligible proportion of paths. the SYN was mangled on a non-negligible proportion of paths.
Therefore it was necessary to allow the SYN/ACK to feed all four IP- Therefore, it was necessary to allow the SYN/ACK to feed all four IP-
ECN codepoints that the SYN could arrive with back to the Client. ECN codepoints that the SYN could arrive with back to the Client.
Without this, the Client could not know whether to disable ECN for Without this, the Client could not know whether to disable ECN for
the connection due to mangling of the IP-ECN field (also explained in the connection due to mangling of the IP-ECN field (also explained in
Section 2.5). This development consumed the remaining 2 codepoints Section 2.5). This development consumed the remaining two codepoints
on the SYN/ACK that had been reserved for future use by AccECN in on the SYN/ACK that had been reserved for future use by AccECN in
earlier versions. earlier versions.
B.3. Space for Future Evolution B.3. Space for Future Evolution
Despite availability of usable TCP header space being extremely Despite availability of usable TCP header space being extremely
scarce, the AccECN protocol has taken all possible steps to ensure scarce, the AccECN protocol has taken all possible steps to ensure
that there is space to negotiate possible future variants of the that there is space to negotiate possible future variants of the
protocol, either if a variant of AccECN is required, or if a protocol, either if a variant of AccECN is required, or if a
completely different ECN feedback approach is needed: completely different ECN feedback approach is needed.
Future AccECN variants: When the AccECN capability is negotiated Future AccECN variants: When the AccECN capability is negotiated
during TCP's three-way handshake, the rows in Table 2 tagged as during TCP's three-way handshake, the rows in Table 2 tagged as
'Nonce' and 'Broken' in the column for the capability of node B 'Nonce' and 'Broken' in the column for the capability of node B
are unused by any current protocol in the RFC series. These could are unused by any current protocol defined in the RFC series.
be used by TCP Servers in future to indicate a variant of the These could be used by TCP Servers in the future to indicate a
AccECN protocol. In recent measurement studies in which the variant of the AccECN protocol. In recent measurement studies in
response of large numbers of Servers to an AccECN SYN has been which the response of large numbers of Servers to an AccECN SYN
tested, e.g., [Mandalari18], a very small number of SYN/ACKs has been tested, e.g., [Mandalari18], a very small number of SYN/
arrive with the pattern tagged as 'Nonce', and a small but more ACKs arrive with the pattern tagged as 'Nonce', and a small but
significant number arrive with the pattern tagged as 'Broken'. more significant number arrive with the pattern tagged as
The 'Nonce' pattern could be a sign that a few Servers have 'Broken'. The 'Nonce' pattern could be a sign that a few Servers
implemented the ECN Nonce [RFC3540], which has now been have implemented the ECN-nonce [RFC3540], which has now been
reclassified as historic [RFC8311], or it could be the random reclassified as Historic [RFC8311], or it could be the random
result of some unknown middlebox behaviour. The greater result of some unknown middlebox behaviour. The greater
prevalence of the 'Broken' pattern suggests that some instances prevalence of the 'Broken' pattern suggests that some instances
still exist of the broken code that reflects the reserved flags on still exist of the broken code that reflects the reserved flags on
the SYN. the SYN.
The requirement not to reject unexpected initial values of the ACE The requirement not to reject unexpected initial values of the ACE
counter (in the main TCP header) in the last paragraph of counter (in the main TCP header) in the last paragraph of
Section 3.2.2.4 ensures that 3 unused codepoints on the ACK of the Section 3.2.2.4 ensures that three unused codepoints on the ACK of
SYN/ACK, 6 unused values on the first SYN=0 data packet from the the SYN/ACK, six unused values on the first SYN=0 data packet from
Client and 7 unused values on the first SYN=0 data packet from the the Client, and seven unused values on the first SYN=0 data packet
Server could be used to declare future variants of the AccECN from the Server could be used to declare future variants of the
protocol. The word 'declare' is used rather than 'negotiate' AccECN protocol. The word 'declare' is used rather than
because, at this late stage in the three-way handshake, it would 'negotiate' because, at this late stage in the three-way
be too late for a negotiation between the endpoints to be handshake, it would be too late for a negotiation between the
completed. A similar requirement not to reject unexpected initial endpoints to be completed. A similar requirement not to reject
values in AccECN TCP Options (Section 3.2.3.2.4) is for the same unexpected initial values in AccECN TCP Options
purpose. If traversal of AccECN TCP Options were reliable, this (Section 3.2.3.2.4) is for the same purpose. If traversal of
would have enabled a far wider range of future variation of the AccECN TCP Options were reliable, this would have enabled a far
whole AccECN protocol. Nonetheless, it could be used to reliably wider range of future variation of the whole AccECN protocol.
negotiate a wide range of variation in the semantics of the AccECN Nonetheless, it could be used to reliably negotiate a wide range
Option. of variation in the semantics of the AccECN Option.
Future non-AccECN variants: Five codepoints out of the 8 possible in Future non-AccECN variants: Five codepoints out of the eight
the 3 TCP header flags used by AccECN are unused on the initial possible in the three TCP header flags used by AccECN are unused
SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0), (1,0,0), on the initial SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0),
(1,0,1), (1,1,0). Section 3.1.3 ensures that the installed base (1,0,0), (1,0,1), (1,1,0). Section 3.1.3 ensures that the
of AccECN Servers will all assume these are equivalent to AccECN installed base of AccECN Servers will all assume these are
negotiation with (1,1,1) on the SYN. These codepoints would not equivalent to AccECN negotiation with (1,1,1) on the SYN. These
allow fall-back to Classic ECN support for a Server that did not codepoints would not allow fall-back to Classic ECN support for a
understand them, but this approach ensures they are available in Server that did not understand them, but this approach ensures
future, perhaps for uses other than ECN alongside the AccECN they are available in the future, perhaps for uses other than ECN
scheme. All possible combinations of SYN/ACK could be used in alongside the AccECN scheme. All possible combinations of SYN/ACK
response except either (0,0,0) or reflection of the same values could be used in response except either (0,0,0) or reflection of
sent on the SYN. the same values sent on the SYN.
In order to extend AccECN or ECN in future, other ways could be In order to extend AccECN or ECN in the future, other ways could
resorted to, although their traversal properties are likely to be be resorted to, although their traversal properties are likely to
inferior. They include a new TCP option; using the remaining be inferior. They include a new TCP option; using the remaining
reserved flags in the main TCP header (preferably extending the reserved flags in the main TCP header (preferably extending the
3-bit combinations used by AccECN to 4-bit combinations, rather 3-bit combinations used by AccECN to 4-bit combinations, rather
than burning one bit for just one state); a non-zero urgent than burning one bit for just one state); a non-zero urgent
pointer in combination with the URG flag cleared; or some other pointer in combination with the URG flag cleared; or some other
unexpected combination of fields yet to be invented. unexpected combination of fields yet to be invented.
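
The five unused SYN codepoints listed above can be reproduced by enumeration. The sketch assumes that the three combinations already in use on an initial SYN are (0,0,0) for no ECN, (0,1,1) for Classic ECN [RFC3168], and (1,1,1) for AccECN; the names are illustrative:

```python
from itertools import product

# (AE, CWR, ECE) combinations assumed to have an existing meaning on a SYN.
USED_ON_SYN = {
    (0, 0, 0): "no ECN",
    (0, 1, 1): "Classic ECN [RFC3168]",
    (1, 1, 1): "AccECN",
}


def unused_syn_codepoints():
    """List every (AE, CWR, ECE) combination with no assigned SYN meaning."""
    return [bits for bits in product((0, 1), repeat=3)
            if bits not in USED_ON_SYN]
```

Calling unused_syn_codepoints() yields the same five combinations, in the same order, as the list in the text.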
Acknowledgements Acknowledgements
We want to thank Koen De Schepper, Praveen Balasubramanian, Michael We want to thank Koen De Schepper, Praveen Balasubramanian, Michael
Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf,
skipping to change at page 72, line 23 skipping to change at line 3322
Järvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan Järvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan
Morton, Vidhi Goel, Alex Burr, Markku Kojo, Grenville Armitage and Morton, Vidhi Goel, Alex Burr, Markku Kojo, Grenville Armitage and
Wes Eddy for their input and discussion. The idea of using the three Wes Eddy for their input and discussion. The idea of using the three
ECN-related TCP flags as one field for more accurate TCP-ECN feedback ECN-related TCP flags as one field for more accurate TCP-ECN feedback
was first introduced in the re-ECN protocol that was the ancestor of was first introduced in the re-ECN protocol that was the ancestor of
ConEx. ConEx.
The following contributed implementations of AccECN that validated The following contributed implementations of AccECN that validated
and helped to improve this specification: and helped to improve this specification:
Linux: Mirja Kühlewind, Ilpo Järvinen, Neal Cardwell and Chia-Yu Linux: Mirja Kühlewind, Ilpo Järvinen, Neal Cardwell, and Chia-Yu
Chang; Chang
FreeBSD: Richard Scheffenegger; FreeBSD: Richard Scheffenegger
Apple OSs: Vidhi Goel. Apple OSs: Vidhi Goel
Bob Briscoe was part-funded by Apple Inc, the Comcast Innovation Bob Briscoe was part-funded by Apple Inc, the Comcast Innovation
Fund, the European Community under its Seventh Framework Programme Fund, the European Community under its Seventh Framework Programme
through the Reducing Internet Transport Latency (RITE) project (ICT- through the Reducing Internet Transport Latency (RITE) project (ICT-
317700) and through the Trilogy 2 project (ICT-317756), and the 317700) and through the Trilogy 2 project (ICT-317756), and the
Research Council of Norway through the TimeIn project. The views Research Council of Norway through the TimeIn project. The views
expressed here are solely those of the authors. expressed here are solely those of the authors.
Mirja Kühlewind was partly supported by the European Commission under Mirja Kühlewind was partly supported by the European Commission under
Horizon 2020 grant agreement no. 688421 Measurement and Architecture Horizon 2020 grant agreement no. 688421 Measurement and Architecture
for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
for Education, Research, and Innovation under contract no. 15.0268. for Education, Research, and Innovation under contract no. 15.0268.
This support does not imply endorsement. This support does not imply endorsement.
Comments Solicited
This section is to be removed before publishing as an RFC.
Comments and questions are encouraged and very welcome. They can be
addressed to the IETF TCP maintenance and minor modifications working
group mailing list <tcpm@ietf.org>, and/or to the authors.
Authors' Addresses Authors' Addresses
Bob Briscoe Bob Briscoe
Independent Independent
United Kingdom United Kingdom
Email: ietf@bobbriscoe.net Email: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/ URI: http://bobbriscoe.net/
Mirja Kühlewind Mirja Kühlewind
Ericsson Ericsson
 End of changes. 286 change blocks. 
740 lines changed or deleted 706 lines changed or added
