<?xml version='1.0' encoding='UTF-8'?> <!DOCTYPE rfc [ <!ENTITY nbsp "&#160;"> <!ENTITY zwsp "&#8203;"> <!ENTITY nbhy "&#8209;"> <!ENTITY wj "&#8288;"> ]> <rfc xmlns:xi="http://www.w3.org/2001/XInclude" 
category="std" consensus="true" docName="draft-ietf-tcpm-accurate-ecn-34" number="9768" submissionType="IETF" ipr="pre5378Trust200902" updates="3168" obsoletes="" tocInclude="true" tocDepth="4" symRefs="true" sortRefs="true" version="3" xml:lang="en"> <front> <title abbrev="Accurate TCP-ECN Feedback">More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP</title> <seriesInfo name="RFC" value="9768"/> <author fullname="Bob Briscoe" initials="B." surname="Briscoe"> <organization>Independent</organization> <address> <postal> <country>United Kingdom</country> </postal> <email>ietf@bobbriscoe.net</email> <uri>http://bobbriscoe.net/</uri> </address> </author> <author fullname="Mirja Kühlewind" initials="M." surname="Kühlewind"> <organization>Ericsson</organization> <address> <postal> <country>Germany</country> </postal> <email>ietf@kuehlewind.net</email> </address> </author> <author fullname="Richard Scheffenegger" initials="R." surname="Scheffenegger"> <organization>NetApp</organization> <address> <postal> <city>Vienna</city> <country>Austria</country> </postal> <email>Richard.Scheffenegger@netapp.com</email> </address> </author> <date year="2025" month="August"/> <area>WIT</area> <workgroup>tcpm</workgroup> <keyword>Congestion Control and Management</keyword> <keyword>Congestion Notification</keyword> <keyword>Feedback</keyword> <keyword>Reliable</keyword> <keyword>Ordered</keyword> <keyword>Protocol</keyword> <keyword>ECN</keyword> <abstract> <t>Explicit Congestion Notification (ECN) is a mechanism by which network nodes can mark IP packets instead of dropping them to indicate incipient congestion to the endpoints. 
Receivers with an ECN-capable transport protocol feed back this information to the sender. ECN was originally specified for TCP in such a way that only one feedback signal can be transmitted per Round-Trip Time (RTT). Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S) need more Accurate ECN (AccECN) feedback information whenever more than one marking is received in one RTT. This document updates the original ECN specification defined in RFC 3168 by specifying a scheme that provides more than one feedback signal per RTT in the TCP header. Given TCP header space is scarce, it allocates a reserved header bit previously assigned to the ECN-nonce. It also overloads the two existing ECN flags in the TCP header. The resulting extra space is additionally exploited to feed back the IP-ECN field received during the TCP connection establishment. Supplementary feedback information can optionally be provided in two new TCP option alternatives, which are never used on the TCP SYN. The document also specifies the treatment of this updated TCP wire protocol by middleboxes.</t> </abstract> </front> <middle> <section anchor="accecn_Introduction"> <name>Introduction</name> <t>Explicit Congestion Notification (ECN) <xref target="RFC3168"/> is a mechanism by which network nodes can mark IP packets instead of dropping them to indicate incipient congestion to the endpoints. Receivers with an ECN-capable transport protocol feed back this information to the sender. In RFC 3168, ECN was specified for TCP in such a way that only one feedback signal could be transmitted per Round-Trip Time (RTT). 
This is sufficient for congestion control schemes like Reno <xref target="RFC6582"/> and CUBIC <xref target="RFC9438"/>, as those schemes reduce their congestion window by a fixed factor if congestion occurs within an RTT, independent of the number of received congestion markings. Newer mechanisms like Congestion Exposure (ConEx <xref target="RFC7713"/>), DCTCP <xref target="RFC8257"/>, and L4S <xref target="RFC9330"/> need to know when more than one marking is received in one RTT, which is information that cannot be provided by the feedback scheme as specified in <xref target="RFC3168"/>. This document specifies an update to the ECN feedback scheme of RFC 3168 that provides more accurate information and could be used by these and potentially other future TCP extensions, while still also supporting the pre-existing TCP congestion controllers that use just one feedback signal per round. Congestion control is the term the IETF uses to describe data rate management. It is the algorithm that a sender uses to optimize its sending rate so that it transmits data as fast as the network can carry it, but no faster. 
A fuller description of the motivation for this specification is given in the associated requirements document <xref target="RFC7560"/>.</t> <t>This document specifies a Standards Track scheme for ECN feedback in the TCP header to provide more than one feedback signal per RTT. It is called the more "Accurate ECN" feedback scheme, or AccECN for short. This document updates RFC 3168 with respect to negotiation and use of the feedback scheme for TCP. All aspects of RFC 3168 other than the TCP feedback scheme and its negotiation remain unchanged by this specification. In particular, the definition of ECN at the IP layer is unaffected. <xref target="accecn_3168_updates"/> details the aspects of RFC 3168 that are updated by this document.</t> <t>This document uses the term "Classic ECN feedback" when it needs to distinguish the TCP/ECN feedback scheme defined in <xref target="RFC3168"/> from the AccECN TCP feedback scheme. AccECN is intended to offer a complete replacement for Classic TCP/ECN feedback, not a fork in the design of TCP. AccECN feedback complements TCP's loss feedback and it can coexist alongside hosts using Classic TCP/ECN feedback. So its applicability is intended to include the public Internet as well as private IP networks such as data centres (and even any non-IP networks over which TCP is used), whether or not any nodes on the path support ECN, of whatever flavour.</t> <t>AccECN feedback overloads the two existing ECN flags in the TCP header and allocates the currently reserved flag (previously called NS) in the TCP header to be used as one 3-bit counter field for feeding back the number of packets marked as congestion experienced (CE). Given the new definitions of these three bits, both ends have to support the new wire protocol before it can be used. 
Therefore, during the TCP handshake, the two ends use these three bits in the TCP header to negotiate the most advanced feedback protocol that they can both support, in a way that is backward compatible with <xref target="RFC3168"/>.</t> <t>AccECN is solely a change to the TCP wire protocol; it covers the negotiation and signaling of more Accurate ECN feedback from a TCP Data Receiver to a Data Sender. It is completely independent of how TCP might respond to congestion feedback, which is out of scope, but ultimately the motivation for Accurate ECN feedback. Like Classic ECN feedback, AccECN can be used by standard Reno or CUBIC congestion control <xref target="RFC5681"/> <xref target="RFC9438"/> to respond to the existence of at least one congestion notification within a round trip. Or, unlike Reno or CUBIC, AccECN can be used to respond to the extent of congestion notification over a round trip, as for example DCTCP does in controlled environments <xref target="RFC8257"/>. 
For congestion response, this specification refers to the original ECN specification adopted in 2001 <xref target="RFC3168"/>, as updated by the more relaxed rules introduced in 2018 to allow ECN experiments <xref target="RFC8311"/>, namely: a TCP-based Low Latency Low Loss Scalable (L4S) congestion control <xref target="RFC9330"/>; or Alternative Backoff with ECN (ABE) <xref target="RFC8511"/>.</t> <t><xref target="accecn_Interaction_Other"/> explains how AccECN is compatible with current commonly used TCP options, and a number of current experimental modifications to TCP, as well as SYN cookies.</t> <section> <name>Document Roadmap</name> <t>The following introductory section outlines the goals of AccECN (<xref target="accecn_Goals"/>). Then, terminology is defined (<xref target="accecn_Terminology"/>) and a recap of existing prerequisite technology is given (<xref target="accecn_Recap"/>).</t> <t><xref target="accecn_Overview"/> gives an informative overview of the AccECN protocol. Then <xref target="accecn_Spec"/> gives the normative protocol specification, and <xref target="accecn_Mbox_Operation"/> collects requirements for proxies, offload engines, and other middleboxes. <xref target="accecn_3168_updates"/> clarifies which aspects of RFC 3168 are updated by AccECN. <xref target="accecn_Interact_Variants"/> assesses the interaction of AccECN with commonly used variants of TCP, whether they are standardized or not. 
Then <xref target="accecn_Properties"/> summarizes the features and properties of AccECN.</t> <t><xref target="accecn_IANA_Considerations"/> summarizes the protocol fields and numbers that IANA assigned, and <xref target="accecn_Security_Considerations"/> points to the aspects of the protocol that will be of interest to the security community.</t> <t><xref target="accecn_Algo_Examples"/> gives pseudocode examples for the various algorithms that AccECN uses, and <xref target="accecn_flags_rationale"/> explains why AccECN uses flags in the main TCP header and quantifies the space left for future use.</t> </section> <section anchor="accecn_Goals"> <name>Goals</name> <t><xref target="RFC7560"/> enumerates requirements that a candidate feedback scheme needs to satisfy, under the headings: resilience, timeliness, integrity, accuracy (including ordering and lack of bias), complexity, overhead, and compatibility (both backward and forward). It recognizes that a perfect scheme that fully satisfies all the requirements is unlikely and trade-offs between requirements are likely. 
<xref target="accecn_Properties"/> considers the properties of AccECN against these requirements and discusses the trade-offs.</t> <t>The requirements document recognizes that a protocol as ubiquitous as TCP needs to be able to serve as-yet-unspecified requirements. Therefore, an AccECN receiver acts as a generic (mechanistic) reflector of congestion information with the aim that new sender behaviours can be deployed unilaterally (see <xref target="accecn_demb_reflector"/>) in the future.</t> </section> <section anchor="accecn_Terminology"> <name>Terminology</name> <dl newline="false" spacing="normal"> <dt>AccECN:</dt> <dd>The more Accurate ECN feedback scheme is called AccECN for short.</dd> <dt>Classic ECN:</dt> <dd>The ECN protocol specified in <xref target="RFC3168"/>.</dd> <dt>Classic ECN feedback:</dt> <dd>The feedback aspect of the ECN protocol specified in <xref target="RFC3168"/>, including generation, encoding, transmission, and decoding of feedback, but not the Data Sender's subsequent response to that feedback.</dd> <dt>ACK:</dt> <dd>A TCP acknowledgement, with or without a data payload (ACK=1).</dd> <dt>Pure ACK:</dt> <dd>A TCP acknowledgement without a data payload.</dd> <dt>Acceptable packet / segment:</dt> <dd>A packet or segment that passes the acceptability tests in <xref target="RFC9293"/> and <xref target="RFC5961"/>, or that has passed other tests with equivalent protection.</dd> <dt>TCP Client:</dt> <dd>The TCP stack that originates a connection (the initiator).</dd> 
<dt>TCP Server:</dt> <dd>The TCP stack that responds to a connection request (the listener).</dd> <dt>Three-way handshake:</dt> <dd>The procedure used to establish a TCP connection as described in the TCP protocol specification <xref target="RFC9293"/>.</dd> <dt>Data Receiver:</dt> <dd>The endpoint of a TCP half-connection that receives data and sends AccECN feedback.</dd> <dt>Data Sender:</dt> <dd>The endpoint of a TCP half-connection that sends data and receives AccECN feedback.</dd> </dl> <t> In a mild abuse of terminology, this document sometimes refers to 'TCP packets' instead of 'TCP segments'.</t> <t> The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described in BCP&nbsp;14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here. </t> </section> <section anchor="accecn_Recap"> <name>Recap of Existing ECN Feedback in IP/TCP</name> <t>Explicit Congestion Notification (ECN) <xref target="RFC3168"/> can be split into two parts conceptually. In the forward direction, alongside the data stream, it uses a 2-bit field in the IP header. This is referred to as IP-ECN later on. 
This signal carried in the IP (Layer 3) header is exposed to network devices and may be modified when such a device starts to experience congestion (see <xref target="accecn_Tab_ECN"/>). The second part is the feedback mechanism, by which the original data sender is notified of the current congestion state of the intermediate path. That returned signal is carried in a protocol-specific manner, and is not to be modified by intermediate network devices. While ECN is in active use for protocols such as QUIC <xref target="RFC9000"/>, SCTP <xref target="RFC9260"/>, RTP <xref target="RFC6679"/>, and Remote Direct Memory Access over Converged Ethernet <xref target="RoCEv2"/>, this document only concerns itself with the specific implementation for the TCP protocol.</t> <t>Once ECN has been negotiated for a transport layer connection, the Data Sender for either half-connection can set two possible codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to indicate an ECN-capable transport (ECT). If the ECN codepoint is 0b00, the packet is considered to have been sent by a Not ECN-capable Transport (Not-ECT). When a network node experiences congestion, it will occasionally either drop or mark a packet, with the choice depending on the packet's ECN codepoint. If the codepoint is Not-ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), the node can mark the packet by setting the ECN codepoint to 0b11, which is termed 'Congestion Experienced' (CE), or loosely a 'congestion mark'. 
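</t>
<t>As a rough illustration of the marking rule just described, the following Python sketch models the choice a congested node makes for one packet. It is not part of the protocol: the names IpEcn and on_congestion are hypothetical, the decision of when to signal congestion is assumed to have been taken already by the node's queue management, and leaving an already CE-marked packet as CE is an assumption of this sketch.</t>
<sourcecode type="python"><![CDATA[
from enum import IntEnum


class IpEcn(IntEnum):
    """The four codepoints of the 2-bit IP-ECN field."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def on_congestion(codepoint):
    """Return (action, codepoint) for a packet picked to carry a signal.

    Hypothetical helper: a real node may also drop ECN-capable
    packets, for example under heavy overload.
    """
    if codepoint == IpEcn.NOT_ECT:
        # The sender did not declare ECN support, so only drop
        # is appropriate.
        return ("drop", codepoint)
    # ECT(0) or ECT(1) can be re-marked to CE. Assumption of this
    # sketch: a packet that already carries CE simply stays CE.
    return ("mark", IpEcn.CE)
]]></sourcecode>
<t>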
<xref target="accecn_Tab_ECN"/> summarises these codepoints.</t>
<table anchor="accecn_Tab_ECN"> <name>The ECN Field in the IP Header</name>
<thead> <tr> <th>IP-ECN codepoint</th> <th>Codepoint name</th> <th>Description</th> </tr> </thead>
<tbody>
<tr> <td>0b00</td> <td>Not-ECT</td> <td>Not ECN-Capable Transport</td> </tr>
<tr> <td>0b01</td> <td>ECT(1)</td> <td>ECN-Capable Transport (1)</td> </tr>
<tr> <td>0b10</td> <td>ECT(0)</td> <td>ECN-Capable Transport (0)</td> </tr>
<tr> <td>0b11</td> <td>CE</td> <td>Congestion Experienced</td> </tr>
</tbody> </table>
<t>In the TCP header, the first two bits in byte 14 (the TCP header flags at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR) and Explicit Congestion Notification Echo (ECE) in <xref target="accecn_Fig_TCPHdr"/>) are defined as flags for the use of Classic ECN <xref target="RFC3168"/>. A TCP Client indicates that it supports Classic ECN feedback by setting (CWR,ECE) = (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On reception of a CE-marked packet at the IP layer, the Data Receiver for that half-connection starts to set the Echo Congestion Experienced (ECE) flag continuously in the TCP header of ACKs, which gives the signal resilience to loss or reordering of ACKs. The Data Sender for the same half-connection confirms that it has received at least one ECE signal by responding with the CWR flag, which allows the Data Receiver to stop repeating the ECN-Echo flag. This always leads to a full RTT of ACKs with ECE set. 
Thus Classic ECN cannot feed back any additional CE markings arriving within this RTT.</t> <t>The last bit in byte 13 of the TCP header (the TCP header flag at bit offset 7 in <xref target="accecn_Fig_TCPHdr"/>) was defined as the Nonce Sum (NS) for the ECN-nonce <xref target="RFC3540"/>. In the absence of widespread deployment, RFC 3540 was reclassified as Historic <xref target="RFC8311"/> and the respective flag was marked as "Reserved", which made this TCP flag available for use by AccECN instead.</t>
<figure anchor="accecn_Fig_TCPHdr"> <name>TCP Header Flags as Defined Before the Nonce Sum Flag Reverted to Reserved</name>
<artwork align="center"><![CDATA[
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
</figure> </section> </section> <section anchor="accecn_Overview"> <name>AccECN Protocol Overview and Rationale</name> <t>This section provides an informative overview of the AccECN protocol that is normatively specified in <xref target="accecn_Spec"/>.</t> <t>Like the general TCP approach, the Data Receiver of each TCP half-connection sends AccECN feedback to the Data Sender on TCP acknowledgements, reusing data packets of the other half-connection whenever possible.</t> <t>The AccECN protocol has had to be designed in two parts:</t> <ul spacing="normal"> <li> <t>an essential feedback part that reuses the TCP-ECN header bits for the 
Data Receiver to feed back the number of packets arriving with CE in the IP-ECN field. This provides more accuracy than Classic ECN feedback, but limited resilience against ACK loss;</t> </li> <li> <t>a supplementary feedback part using one of two new alternative AccECN TCP options that provide additional feedback on the number of payload bytes that arrive marked with each of the three ECN codepoints in the IP-ECN field (not just CE marks). See the BCP on Byte and Packet Congestion Notification <xref target="RFC7141"/> for the rationale determining that conveying congested payload bytes should be preferred over just providing feedback about congested packets. This also provides greater resilience against ACK loss than the essential feedback, but it is currently more likely to suffer from middlebox interference.</t> </li> </ul> <t>The two-part design was necessary, given limitations on the space available for TCP options and given the possibility that certain incorrectly designed middleboxes might prevent TCP from using any new options.</t> <t>The essential feedback part overloads the previous definition of the three flags in the TCP header that had been assigned for use by Classic ECN. 
This design choice deliberately allows AccECN peers to replace the Classic ECN feedback protocol, rather than leaving Classic ECN feedback intact and adding more accurate feedback separately because:</t> <ul spacing="normal"> <li> <t>this efficiently reuses scarce TCP header space, given TCP option space is approaching saturation;</t> </li> <li> <t>a single upgrade path for the TCP protocol is preferable to a fork in the design that modifies the TCP header to convey all ECN feedback;</t> </li> <li> <t>otherwise, Classic and Accurate ECN feedback could give conflicting feedback about the same segment, which could open up new security concerns and make implementations unnecessarily complex;</t> </li> <li> <t>middleboxes are more likely to faithfully forward the TCP ECN flags than newly defined areas of the TCP header.</t> </li> </ul> <t>AccECN is designed to work even if the supplementary feedback part is removed or zeroed out, as long as the essential feedback part gets through.</t> <section> <name>Capability Negotiation</name> <t>AccECN changes the wire protocol of the main TCP header; therefore, it can only be used if both endpoints have been upgraded to understand it. The TCP Client signals support for AccECN on the initial SYN of a connection, and the TCP Server signals whether it supports AccECN on the SYN/ACK. The TCP flags on the SYN that the TCP Client uses to signal AccECN support have been carefully chosen so that a TCP Server will interpret them as a request to support the most recent variant of ECN feedback that it supports. Then the TCP Client falls back to the same variant of ECN feedback.</t> <t>An AccECN TCP Client does not send an AccECN Option on the SYN as SYN option space is limited. 
The TCP Server sends an AccECN Option on the SYN/ACK, and the TCP Client sends one on the first ACK to test whether the network path forwards these options correctly.</t> </section> <section> <name>Feedback Mechanism</name> <t>A Data Receiver maintains four counters initialized at the start of the half-connection. Three count the number of arriving payload bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte counters reflect only the TCP payload length, excluding the TCP header and TCP options. The fourth counter counts the number of packets arriving marked with a CE codepoint (including control packets without payload if they are CE-marked).</t> <t>The Data Sender maintains four equivalent counters for the half-connection, and the AccECN protocol is designed to ensure they will match the values in the Data Receiver's counters, albeit after a little delay.</t> <t>Each ACK carries the three least significant bits (LSBs) of the packet-based CE counter using the ECN bits in the TCP header, now renamed the Accurate ECN (ACE) field (see <xref target="accecn_Fig_ACE_ACK"/>). The 24 LSBs of some or all of the byte counters can be optionally carried in an AccECN Option. For efficient use of limited option space, two alternative forms of the AccECN Option are specified with the fields in the opposite order to each other.</t> </section> <section> <name>Delayed ACKs and Resilience Against ACK Loss</name> <t>With both the ACE and the AccECN Option mechanisms, the Data Receiver continually repeats the current LSBs of each of its respective counters. There is no need to acknowledge these continually repeated counters, so the Congestion Window Reduced (CWR) mechanism of <xref target="RFC3168"/> is no longer used. 
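</t>
<t>To make the four counters and their continually repeated LSBs concrete, here is a minimal Python sketch of the receiver side. It is an illustration under stated assumptions, not the normative algorithm: the class, method, and attribute names are hypothetical, the initial counter values (which the normative specification defines) are simply passed in by the caller, and the option is modelled as a plain dictionary rather than in its wire format.</t>
<sourcecode type="python"><![CDATA[
ACE_MASK = 0x7           # ACE field: 3 LSBs of the CE packet counter
OPTION_MASK = 0xFFFFFF   # option fields: 24 LSBs of each byte counter


class ReceiverCounters:
    """Hypothetical model of a Data Receiver's four AccECN counters."""

    def __init__(self, cep, ceb, e0b, e1b):
        self.cep = cep   # packets that arrived marked CE
        self.ceb = ceb   # payload bytes that arrived marked CE
        self.e0b = e0b   # payload bytes that arrived marked ECT(0)
        self.e1b = e1b   # payload bytes that arrived marked ECT(1)

    def on_packet(self, ip_ecn, payload_len):
        """Update the counters for one arriving packet."""
        if ip_ecn == "CE":
            self.cep += 1            # counted even with no payload
            self.ceb += payload_len  # payload only, no headers
        elif ip_ecn == "ECT(0)":
            self.e0b += payload_len
        elif ip_ecn == "ECT(1)":
            self.e1b += payload_len
        # Not-ECT packets change none of the four counters.

    def ace_field(self):
        """Value repeated in the ACE field of each ACK."""
        return self.cep & ACE_MASK

    def option_fields(self):
        """Values an AccECN Option could carry (some or all)."""
        return {
            "ceb": self.ceb & OPTION_MASK,
            "e0b": self.e0b & OPTION_MASK,
            "e1b": self.e1b & OPTION_MASK,
        }
]]></sourcecode>
<t>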
Even if some ACKs are lost, the Data Sender ought to be able to infer how much to increment its own counters, even if the protocol field has wrapped.</t> <t>The 3-bit ACE field can wrap fairly frequently. Therefore, even if it appears to have incremented by one (say), the field might have actually cycled completely and then incremented by one. The Data Receiver is not allowed to delay sending an ACK to such an extent that the ACE field would cycle. However, ACKs received at the Data Sender could still cycle because a whole sequence of ACKs carrying intervening values of the field might all be lost or delayed in transit.</t> <t>The fields in an AccECN Option are larger, but they will increment in larger steps because they count bytes not packets. Nonetheless, their size has been chosen such that a whole cycle of the field would never occur between ACKs unless there has been an infeasibly long sequence of ACK losses. Therefore, provided that an AccECN Option is available, it can be treated as a dependable feedback channel.</t> <t>If an AccECN Option is not available, e.g.,&nbsp;it is being stripped by a middlebox, the AccECN protocol will only feed back information on CE markings (using the ACE field). Although not ideal, this will be sufficient, because it is envisaged that neither ECT(0) nor ECT(1) will ever indicate more severe congestion than CE, even though future uses for ECT(0) or ECT(1) are still unclear <xref target="RFC8311"/>. Because the 3-bit ACE field is so small, when it is the only field available, the Data Sender has to interpret it assuming the most likely wrap, but with a degree of conservatism.</t> <t>Certain specified events trigger the Data Receiver to include an AccECN Option on an ACK. The rules are designed to ensure that the order in which different markings arrive at the receiver is communicated to the sender (as long as options are reaching the sender and as long as there is no ACK loss). 
Implementations are encouraged to send an AccECN Option more frequently, but this is left up to the implementer.</t> <!--As one ACK might acknowledge multiple data segments at the same time the proposed scheme providing accumulated information does not preserve the order at which the marking were received.This decision was taken deliberately to reduce complexity.--> </section> <section> <name>Feedback Metrics</name> <t>The CE packet counter in the ACE field and the CE byte counter in AccECN Options both provide feedback on received CE marks. The CE packet counter includes control packets that do not have payload data, while the CE byte counter solely includes marked payload bytes. If both are present, the byte counter in an AccECN Option will provide the more accurate information needed for modern congestion control and policing schemes, such as L4S, DCTCP, or ConEx. If AccECN Options are stripped, a simple algorithm to estimate the number of marked bytes from the ACE field is given in <xref target="accecn_Algo_ACE_Bytes"/>.</t> <t>The AccECN design has been generalized so that it ought to be able to support possible future uses of the experimental ECT(1) codepoint other than the L4S experiment <xref target="RFC9330"/>, such as a lower severity or a more instant congestion signal than CE.</t> <t>Feedback in bytes is provided to protect against the receiver or a middlebox using attacks similar to 'ACK-Division' to artificially inflate the congestion window, which is why <xref target="RFC5681"/> now recommends that TCP counts acknowledged bytes not packets.</t> </section> <section anchor="accecn_demb_reflector"> <name>Generic (Mechanistic) Reflector</name> <t>The ACE field provides feedback about CE markings in the IP-ECN field of both data and control packets. 
According to <xref target="RFC3168"/>, the Data Sender is meant to set the IP-ECN field of control packets to Not-ECT. However, mechanisms in certain private networks (e.g.,&nbsp;data centres) set control packets to be ECN-capable because they are precisely the packets that performance depends on most.</t> <t>For this reason, AccECN is designed to be a generic reflector of whatever ECN markings it sees, whether or not they are compliant with a current standard. Then as standards evolve, Data Senders can upgrade unilaterally without any need for receivers to upgrade too.</t> <t>It is also useful to be able to rely on generic reflection behaviour when senders need to test for unexpected interference with markings (for instance Sections <xref target="accecn_sec_ecn-mangling" format="counter"/>, <xref target="accecn_sec_ACE_init_invalid" format="counter"/>, and <xref target="accecn_Mbox_Interference" format="counter"/> of the present document and paragraph 2 of <xref target="RFC3168" sectionFormat="of" section="20.2"/>).</t> <t>The initial SYN and SYN/ACK are the most critical control packets, so AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on the SYN and SYN/ACK supports future scenarios in which SYNs might be ECN-enabled (without prejudging whether they ought to be). 
For instance, <xref target="RFC8311"/> updates this aspect of RFC 3168 to allow experimentation with ECN-capable TCP control packets.</t>
<t>Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to Not-ECT in compliance with RFC 3168, feedback on the state of the IP-ECN field when it arrives at the receiver could still be useful, because middleboxes have been known to overwrite the IP-ECN field as if it is still part of the old Type of Service (ToS) field <xref target="Mandalari18"/>. For example, if a TCP Client has set the SYN to Not-ECT, but receives feedback that the IP-ECN field on the SYN arrived with a different codepoint, it can detect such middlebox interference. Previously, neither end knew what IP-ECN field the other had sent. So, if a TCP Server received ECT or CE on a SYN, it could not know whether it was invalid because only the TCP Client knew whether it originally marked the SYN as Not-ECT (or ECT). Therefore, prior to AccECN, the Server's only safe course of action in this example was to disable ECN for the connection.
Instead, the AccECN protocol allows the Server and Client to feed back the ECN field received on the SYN and SYN/ACK to their peer, which now has all the information to decide whether the connection has to fall back from supporting ECN (or not).</t>
</section>
</section>
<!-- ================================================================ -->
<section anchor="accecn_Spec"> <name>AccECN Protocol Specification</name>
<section anchor="accecn_Negotiation"> <name>Negotiating to Use AccECN</name>
<section anchor="accecn_Negotiation_3WHS"> <name>Negotiation During the TCP Three-Way Handshake</name>
<t>Given that the ECN-nonce <xref target="RFC3540"/> has been reclassified as Historic <xref target="RFC8311"/>, the TCP flag that was previously called NS (Nonce Sum) is renamed as the AE (Accurate ECN) flag (the TCP header flag at bit offset 7 in <xref target="accecn_Fig_TCPHdr_AE"/>).
See the IANA Considerations in <xref target="accecn_IANA_Considerations"/>.</t>
<figure anchor="accecn_Fig_TCPHdr_AE"> <name>The New Definition of the TCP Header Flags During the TCP Three-Way Handshake</name>
<artwork align="center"><![CDATA[
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | A | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | E | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
</figure>
<t>During the TCP three-way handshake at the start of a connection, to request more Accurate ECN feedback the TCP Client (host A) <bcp14>MUST</bcp14> set the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment.</t>
<t>If a TCP Server (host B) that is AccECN-enabled receives a SYN with the above three flags set, it <bcp14>MUST</bcp14> set both its half connections into AccECN mode. Then it <bcp14>MUST</bcp14> set the AE, CWR, and ECE TCP flags on the SYN/ACK to the combination in the top block of <xref target="accecn_Tab_Negotiation"/> that feeds back the IP-ECN field that arrived on the SYN. This applies whether or not the Server itself supports setting the IP-ECN field on a SYN or SYN/ACK (see <xref target="accecn_demb_reflector"/> for rationale).</t>
<t>When the TCP Server returns any of the four combinations in the top block of <xref target="accecn_Tab_Negotiation"/>, it confirms that it supports AccECN.
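</t>
<t>As a non-normative illustration of the rule above, the following minimal Python sketch maps the IP-ECN field that arrived on an AccECN SYN to the flags a Server would set on the SYN/ACK. The codepoint values are those of RFC 3168, and the flag combinations are copied from the top block of <xref target="accecn_Tab_Negotiation"/>. The names SYNACK_FLAGS and accecn_synack_flags are hypothetical and are not defined by this specification.</t>
<sourcecode type="python"><![CDATA[
```python
# IP-ECN codepoints (2-bit field values) as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Top block of the negotiation table:
# IP-ECN field seen on the SYN -> (AE, CWR, ECE) to set on the SYN/ACK.
SYNACK_FLAGS = {
    NOT_ECT: (0, 1, 0),
    ECT1: (0, 1, 1),
    ECT0: (1, 0, 0),
    CE: (1, 1, 0),
}


def accecn_synack_flags(syn_flags, syn_ip_ecn):
    """Return the SYN/ACK flags, or None if the SYN did not request AccECN.

    Hypothetical helper: syn_flags is the (AE, CWR, ECE) tuple on the SYN
    and syn_ip_ecn is the IP-ECN codepoint that arrived on that SYN.
    """
    if syn_flags != (1, 1, 1):
        # Not an AccECN request; the four combinations must not be used.
        return None
    return SYNACK_FLAGS[syn_ip_ecn]
```
]]></sourcecode>
<t>For example, under these assumptions a Server that receives an AccECN SYN marked CE would return a SYN/ACK with (AE,CWR,ECE) = (1,1,0).</t>
<t>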
The TCP Server <bcp14>MUST NOT</bcp14> set one of these four combinations of flags on the SYN/ACK unless the preceding SYN requested support for AccECN as above.</t>
<t>Once a TCP Client (A) has sent the above SYN to declare that it supports AccECN, and once it has received the above SYN/ACK segment that confirms that the TCP Server supports AccECN, the TCP Client <bcp14>MUST</bcp14> set both its half connections into AccECN mode. The TCP Client <bcp14>MUST NOT</bcp14> enter AccECN mode (or any feedback mode) before it has received the first SYN/ACK.</t>
<t>Once in AccECN mode, a TCP Client or Server has the rights and obligations to participate in the ECN protocol defined in <xref target="accecn_implications_accecn_mode"/>.</t>
<t>The procedures for retransmission of SYNs or SYN/ACKs are given in <xref target="accecn_sec_multiple_SYNs_or_SYN-ACKs"/>.</t>
<t>It is <bcp14>RECOMMENDED</bcp14> that the AccECN protocol be implemented alongside Selective Acknowledgement (SACK) <xref target="RFC2018"/>.
If SACK is implemented with AccECN, Duplicate Selective Acknowledgement (D-SACK) <xref target="RFC2883"/> <bcp14>MUST</bcp14> also be implemented.</t>
</section>
<section anchor="accecn_sec_backward_compat"> <name>Backward Compatibility</name>
<t>The three flags set to 1 to indicate AccECN support on the SYN have been carefully chosen to enable natural fall-back to prior stages in the evolution of ECN. <xref target="accecn_Tab_Negotiation"/> tabulates all the negotiation possibilities for ECN-related capabilities that involve at least one AccECN-capable host. The entries in the first two columns have been abbreviated, as follows:</t>
<dl newline="false" spacing="normal" indent="4">
<dt>AccECN:</dt> <dd>Supports more Accurate ECN feedback (the present specification)</dd>
<dt>Nonce:</dt> <dd>Supports ECN-nonce feedback <xref target="RFC3540"/></dd>
<dt>ECN:</dt> <dd>Supports 'Classic' ECN feedback <xref target="RFC3168"/></dd>
<dt>No ECN:</dt> <dd>Not ECN-capable.
Implicit congestion notification using packet drop.</dd>
</dl>
<table align="center" anchor="accecn_Tab_Negotiation"> <name>ECN Capability Negotiation Between Client (A) and Server (B)</name>
<thead>
<tr> <th align="left">Host A</th> <th align="left">Host B</th> <th align="center">SYN<br/>A->B<br/>AE CWR ECE</th> <th align="center">SYN/ACK<br/>B->A<br/>AE CWR ECE</th> <th align="left">Feedback Mode<br/>of Host A</th> </tr>
</thead>
<tbody>
<tr> <td align="left">AccECN<br/>AccECN<br/>AccECN<br/>AccECN</td> <td align="left">AccECN<br/>AccECN<br/>AccECN<br/>AccECN</td> <td align="center">1 1 1<br/>1 1 1<br/>1 1 1<br/>1 1 1</td> <td align="center">0 1 0<br/>0 1 1<br/>1 0 0<br/>1 1 0</td> <td align="left">AccECN (Not-ECT SYN)<br/>AccECN (ECT1 on SYN)<br/>AccECN (ECT0 on SYN)<br/>AccECN (CE on SYN)</td> </tr>
<tr> <td align="left"/> <td align="left"/> <td align="center"/> <td align="center"/> <td align="left"/> </tr>
<tr> <td align="left">AccECN<br/>AccECN<br/>AccECN</td> <td align="left">Nonce<br/>ECN<br/>No ECN</td> <td align="center">1 1 1<br/>1 1 1<br/>1 1 1</td> <td align="center">1 0 1<br/>0 0 1<br/>0 0 0</td> <td align="left">(Reserved)<br/>Classic ECN<br/>Not ECN</td> </tr>
<tr> <td align="left"/> <td align="left"/> <td align="center"/> <td align="center"/> <td align="left"/> </tr>
<tr> <td align="left">Nonce<br/>ECN<br/>No ECN</td> <td align="left">AccECN<br/>AccECN<br/>AccECN</td> <td align="center">0 1 1<br/>0 1 1<br/>0 0 0</td> <td align="center">0 0 1<br/>0 0 1<br/>0 0 0</td> <td align="left">Classic ECN<br/>Classic ECN<br/>Not ECN</td> </tr>
<tr> <td align="left"/> <td align="left"/> <td align="center"/> <td align="center"/> <td align="left"/> </tr>
<tr> <td align="left">AccECN</td> <td align="left">Broken</td> <td align="center">1 1 1</td> <td align="center">1 1 1</td> <td
align="left">Not ECN</td> </tr>
</tbody>
</table>
<t><xref target="accecn_Tab_Negotiation"/> is divided into blocks, with each block separated by an empty row.</t>
<ol spacing="normal" type="1"><li>
<t>The top block shows the case already described in <xref target="accecn_Negotiation"/> where both endpoints support AccECN and how the TCP Server (B) indicates congestion feedback.</t>
</li>
<li>
<t>The second block shows the cases where the TCP Client (A) supports AccECN but the TCP Server (B) supports some earlier variant of TCP feedback, as indicated in its SYN/ACK. Therefore, as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK shown, it <bcp14>MUST</bcp14> set both its half connections into the feedback mode shown in the rightmost column. If the TCP Client has set itself into Classic ECN feedback mode, it <bcp14>MUST</bcp14> comply with <xref target="RFC3168"/>.</t>
<t>An AccECN implementation has no need to recognize or support the Server response labelled 'Nonce' or ECN-nonce feedback more generally <xref target="RFC3540"/>, as RFC 3540 has been reclassified as Historic <xref target="RFC8311"/>. AccECN is compatible with alternative ECN feedback integrity approaches to the nonce (see <xref target="accecn_Integrity"/>). The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for future use.
A TCP Client (A) that receives such a SYN/ACK follows the procedure for forward compatibility given in <xref target="accecn_sec_forward_compat"/>.</t>
</li>
<li>
<t>The third block shows the cases where the TCP Server (B) supports AccECN but the TCP Client (A) supports some earlier variant of TCP feedback, as indicated in its SYN.</t>
<t>When an AccECN-enabled TCP Server (B) receives a SYN with (AE,CWR,ECE) = (0,1,1), it <bcp14>MUST</bcp14> do one of the following:</t>
<ul spacing="normal">
<li> <t>set both its half connections into the Classic ECN feedback mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as shown. Then it <bcp14>MUST</bcp14> comply with <xref target="RFC3168"/>.</t> </li>
<li> <t>set both its half connections into Not ECN mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN disabled. This latter case is unlikely to be desirable, but it is allowed as a possibility, e.g., for minimal TCP implementations.</t> </li>
</ul>
<t>When an AccECN-enabled TCP Server (B) receives a SYN with (AE,CWR,ECE) = (0,0,0), it <bcp14>MUST</bcp14> set both its half connections into the Not ECN feedback mode, return a SYN/ACK with (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled.</t>
</li>
<li>
<t>The fourth block displays a combination labelled 'Broken'. Some older TCP Server implementations incorrectly set the TCP-ECN flags in the SYN/ACK by reflecting those in the SYN.
Such broken TCP Servers (B) cannot support ECN; so as soon as an AccECN-capable TCP Client (A) receives such a broken SYN/ACK, it <bcp14>MUST</bcp14> fall back to Not ECN mode for both its half connections and continue with ECN disabled.</t>
</li>
</ol>
<t>The following additional rules do not fit the structure of the table, but they complement it:</t>
<dl newline="false" spacing="normal">
<dt>Simultaneous Open:</dt> <dd>An originating AccECN Host (A), having sent a SYN with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host B. Host A <bcp14>MUST</bcp14> then enter the same feedback mode as it would have entered had it been a responding host and received the same SYN. Then host A <bcp14>MUST</bcp14> send the same SYN/ACK as it would have sent had it been a responding host.</dd>
<dt>In-window SYN during TIME-WAIT:</dt> <dd>Many TCP implementations create a new TCP connection if they receive an in-window SYN packet during TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED state, it ought to ignore any previous state about the negotiation of AccECN for that connection and renegotiate the feedback mode according to <xref target="accecn_Tab_Negotiation"/>.</dd>
</dl>
</section>
<section anchor="accecn_sec_forward_compat"> <name>Forward Compatibility</name>
<t>If a TCP Server that implements AccECN receives a SYN with the three TCP header flags (AE,CWR,ECE) set to any combination other than (0,0,0), (0,1,1), or (1,1,1) and it does not have logic specific to such a combination, the Server <bcp14>MUST</bcp14> negotiate the use of AccECN as if the three flags had been set to (1,1,1).
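</t>
<t>The Server side of this rule, together with the third block of <xref target="accecn_Tab_Negotiation"/>, can be sketched non-normatively as follows. This is a minimal Python illustration, assuming a Server with no logic specific to any reserved combination and one that chooses Classic ECN (rather than Not ECN) in response to a Classic ECN-setup SYN. The function name server_feedback_mode and the mode strings are hypothetical.</t>
<sourcecode type="python"><![CDATA[
```python
def server_feedback_mode(syn_flags):
    """Return the feedback mode an AccECN-capable Server enters.

    Hypothetical helper: syn_flags is the (AE, CWR, ECE) tuple on the SYN.
    """
    if syn_flags == (0, 0, 0):
        # The Client did not offer any ECN feedback.
        return "Not ECN"
    if syn_flags == (0, 1, 1):
        # Classic ECN-setup SYN. The Server is also allowed to answer
        # with Not ECN; this sketch assumes it accepts Classic ECN.
        return "Classic ECN"
    # (1,1,1) requests AccECN. Any other combination is currently
    # unassigned and is treated exactly as if it were (1,1,1).
    return "AccECN"
```
]]></sourcecode>
<t>Under these assumptions, a SYN carrying a currently unassigned combination such as (AE,CWR,ECE) = (1,0,0) results in AccECN mode, which is what allows future uses of such combinations to rely on the installed base.</t>
<t>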
However, an AccECN Client implementation <bcp14>MUST NOT</bcp14> send a SYN with any combination other than the three listed.</t>
<t>If a TCP Client has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a combination, the Client <bcp14>MUST</bcp14> enable AccECN mode as if the SYN/ACK confirmed that the Server supported AccECN and as if it fed back that the IP-ECN field on the SYN had arrived unchanged. However, an AccECN Server implementation <bcp14>MUST NOT</bcp14> send a SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1).</t>
<aside>
<t>For the avoidance of doubt, the behaviour described in the present specification applies whether or not the three remaining reserved TCP header flags are zero.</t>
</aside>
<t>All of these requirements ensure that future uses of all the Reserved combinations on a SYN or SYN/ACK can rely on consistent behaviour from the installed base of AccECN implementations.
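</t>
<t>The Client side of the negotiation can be sketched non-normatively in the same way. The following minimal Python illustration assumes a Client that has sent a SYN with (AE,CWR,ECE) = (1,1,1) and has no logic specific to the reserved combination (1,0,1). It combines the rightmost column of <xref target="accecn_Tab_Negotiation"/> with the forward compatibility rule above. The names ACCECN_SYNACK and client_feedback_mode, and the mode strings, are hypothetical.</t>
<sourcecode type="python"><![CDATA[
```python
# IP-ECN codepoints (2-bit field values) as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# SYN/ACK (AE, CWR, ECE) -> IP-ECN field the Server saw on the SYN.
ACCECN_SYNACK = {
    (0, 1, 0): NOT_ECT,
    (0, 1, 1): ECT1,
    (1, 0, 0): ECT0,
    (1, 1, 0): CE,
}


def client_feedback_mode(synack_flags, sent_ip_ecn=NOT_ECT):
    """Return (mode, IP-ECN fed back for the SYN, or None).

    Hypothetical helper for a Client that sent an AccECN SYN whose
    IP-ECN field was set to sent_ip_ecn.
    """
    if synack_flags in ACCECN_SYNACK:
        return ("AccECN", ACCECN_SYNACK[synack_flags])
    if synack_flags == (1, 0, 1):
        # Reserved: behave as if the SYN had arrived unchanged.
        return ("AccECN", sent_ip_ecn)
    if synack_flags == (0, 0, 1):
        return ("Classic ECN", None)
    # (0,0,0) is a non-ECN Server; (1,1,1) is the 'Broken' reflection.
    return ("Not ECN", None)
```
]]></sourcecode>
<t>A Client could compare the fed-back codepoint with the one it originally set on the SYN to detect interference with the IP-ECN field on the path.</t>
<t>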
See <xref target="accecn_space_evolution"/> for related discussion.</t>
</section>
<section anchor="accecn_sec_multiple_SYNs_or_SYN-ACKs"> <name>Multiple SYNs or SYN/ACKs</name>
<section anchor="accecn_sec_SYN_rexmt"> <name>Retransmitted SYNs</name>
<t>If the sender of an AccECN SYN (the TCP Client) times out before receiving the SYN/ACK, it <bcp14>SHOULD</bcp14> attempt to negotiate the use of AccECN at least one more time by continuing to set all three TCP-ECN flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using the usual retransmission timeouts). If this first retransmission also fails to be acknowledged, in deployment scenarios where AccECN path traversal might be problematic, the TCP Client <bcp14>SHOULD</bcp14> send subsequent retransmissions of the SYN with the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN <bcp14>MUST</bcp14> use the same initial sequence number (ISN) as the original SYN.</t>
<t>Retrying once before fall-back adds delay in the case where a middlebox drops an AccECN (or ECN) SYN deliberately. However, recent measurements <xref target="Mandalari18"/> imply that a drop is less likely to be due to middlebox interference than other intermittent causes of loss, e.g., congestion, wireless transmission loss, etc.</t>
<t>Implementers <bcp14>MAY</bcp14> use other fall-back strategies if they are found to be more effective (e.g., attempting to negotiate AccECN on the SYN only once or more than twice (most appropriate during high levels of congestion)).</t>
<t>Further, it might make sense to also remove any other new or experimental fields or options on the SYN in case a middlebox might be blocking them, although the required behaviour will depend on the specification of the other option(s) and any attempt to coordinate fall-back between different modules of the stack. For instance, even if taking part in an <xref target="RFC8311"/> experiment that allows ECT on a SYN, it would be advisable to try sending the SYN without ECT.</t>
<t>Whichever fall-back strategy is used, the TCP initiator <bcp14>SHOULD</bcp14> cache failed connection attempts. If it does, it <bcp14>SHOULD NOT</bcp14> give up attempting to negotiate AccECN on the SYN of subsequent connection attempts until it is clear that the blockage is persistently and specifically due to AccECN. The cache needs to be arranged to expire so that the initiator will infrequently attempt to check whether the problem has been resolved.</t>
<t>All fall-back strategies will need to follow all the normative rules in <xref target="accecn_implications_accecn_mode"/>, which concern behaviour when SYNs or SYN/ACKs negotiating different types of feedback have been sent within the same connection, including the possibility that they arrive out of order.
As examples, the following non-normative bullets call out those rules from <xref target="accecn_implications_accecn_mode"/> that apply to the above fall-back strategies:</t>
<ul spacing="normal">
<li> <t>Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK from the Server in response to one, the other, or both, and possibly reordered;</t> </li>
<li> <t>Such a TCP Client enters the feedback mode appropriate to the first SYN/ACK it receives according to <xref target="accecn_Tab_Negotiation"/>, and it does not switch to a different mode, whatever other SYN/ACKs it might receive or send;</t> </li>
<li> <t>If a TCP Client has entered AccECN mode but then subsequently sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it is still allowed to set ECT on packets for the rest of the connection.
Note that this rule differs from that of a Server in an equivalent position (<xref target="accecn_implications_accecn_mode"/> explains).</t> </li>
<li> <t>Having entered AccECN mode, in general a TCP Client commits to respond to any incoming congestion feedback, whether or not it sets ECT on outgoing packets (for rationale and some exceptions see <xref target="accecn_sec_ecn-mangling"/> and <xref target="accecn_sec_ACE_init_invalid"/>);</t> </li>
<li> <t>Having entered AccECN mode, a TCP Client commits to using AccECN to feed back the IP-ECN field in incoming packets for the rest of the connection, as specified in <xref target="accecn_feedback"/>, even if it is not itself setting ECT on outgoing packets.</t> </li>
</ul>
</section>
<section anchor="accecn_sec_SYN-ACK_rexmt"> <name>Retransmitted SYN/ACKs</name>
<t>A TCP Server might send multiple SYN/ACKs indicating different feedback modes. For instance, this can happen when it falls back to sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed out (<xref target="accecn_AccECN_Option_Loss"/>), or when it acknowledges different retransmissions of the SYN (<xref target="accecn_sec_SYN_rexmt"/>).</t>
<t>All fall-back strategies will need to follow all the normative rules in <xref target="accecn_implications_accecn_mode"/>, which concern behaviour when SYNs or SYN/ACKs negotiating different types of feedback are sent within the same connection, including the possibility that they arrive out of order.
As examples, the following non-normative bullets call out those rules from <xref target="accecn_implications_accecn_mode"/> that apply to the above fall-back strategies:</t>
<ul spacing="normal">
<li> <t>An AccECN-capable TCP Server enters the feedback mode appropriate to the first SYN it receives using <xref target="accecn_Tab_Negotiation"/>, and it does not switch to a different mode, whatever other SYNs it might receive and whatever SYN/ACKs it might send;</t> </li>
<li> <t>If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = (0,0,0), it preferably acknowledges it first using an AccECN SYN/ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0);</t> </li>
<li> <t>If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN field on the latest SYN to have arrived;</t> </li>
<li> <t>If a TCP Server enters AccECN mode and then subsequently sends a SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is prohibited from setting ECT on any packet for the rest of the connection;</t> </li>
<li> <t>Having entered AccECN mode, in general a TCP Server commits to respond to any incoming congestion feedback, whether or not it sets ECT on outgoing packets (for rationale and some exceptions see Sections <xref target="accecn_sec_ecn-mangling" format="counter"/> and <xref target="accecn_sec_ACE_init_invalid" format="counter"/>);</t> </li>
<li> <t>Having entered AccECN mode, a TCP Server commits to using AccECN to feed back the IP-ECN field in incoming packets for the rest of the connection, as specified in <xref target="accecn_feedback"/>, even if it is not itself setting ECT on outgoing packets.</t> </li>
</ul>
</section>
</section>
<section anchor="accecn_implications_accecn_mode"> <name>Implications of
AccECN Mode</name>
<t><xref target="accecn_Negotiation_3WHS"/> describes the only ways that a host can enter AccECN mode, whether as a Client or as a Server.</t>
<t>An implementation that supports AccECN has the rights and obligations concerning the use of ECN defined below, which update those in <xref target="RFC3168" sectionFormat="of" section="6.1.1"/>. This section uses the following definitions:</t>
<dl newline="false" spacing="normal">
<dt>'During the handshake':</dt> <dd>The connection states prior to synchronization;</dd>
<dt>'Valid SYN':</dt> <dd>A SYN that has the same port numbers and the same ISN as the SYN that first caused the Server to open the connection. An 'Acceptable' packet is defined in <xref target="accecn_Terminology"/>.</dd>
</dl>
<t>Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back):</t>
<ul spacing="normal">
<li> <t>Any implementation that supports AccECN:</t>
<ul spacing="normal">
<li> <t><bcp14>MUST NOT</bcp14> switch into a different feedback mode than the one it first entered according to <xref target="accecn_Tab_Negotiation"/>, no matter whether it subsequently receives valid SYNs or Acceptable SYN/ACKs of different types.</t> </li>
<li> <t><bcp14>SHOULD</bcp14> ignore the TCP-ECN flags in SYNs or SYN/ACKs that are received after the implementation reaches the Established state, in line with the general TCP approach <xref target="RFC9293"/>;</t>
<t>Reason: Reaching the Established state implies that at least one SYN and one SYN/ACK have successfully been delivered.
And all the rules for handshake fall-back are designed to work based on those packets that successfully traverse the path, whatever other handshake packets are lost or delayed.</t> </li>
<li> <t><bcp14>MUST NOT</bcp14> send a 'Classic' ECN-setup SYN <xref target="RFC3168"/> with (AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) requesting AccECN feedback within the same connection;</t> </li>
<li> <t><bcp14>MUST NOT</bcp14> send a 'Classic' ECN-setup SYN/ACK <xref target="RFC3168"/> with (AE,CWR,ECE) = (0,0,1) and a SYN/ACK agreeing to use AccECN feedback within the same connection;</t> </li>
<li> <t><bcp14>MUST</bcp14> reset the connection with a RST packet, if it receives a 'Classic' ECN-setup SYN with (AE,CWR,ECE) = (0,1,1) and a SYN requesting AccECN feedback during the same handshake;</t> </li>
<li> <t><bcp14>MUST</bcp14> reset the connection with a RST packet, if it receives a 'Classic' ECN-setup SYN/ACK with (AE,CWR,ECE) = (0,0,1) and a SYN/ACK agreeing to use AccECN feedback during the same handshake;</t> </li>
</ul>
<t>The last four rules are necessary because, if one peer were to negotiate the feedback mode in two different types of handshake, it would not be possible for the other peer to know for certain which handshake packet(s) the other end had eventually received or in which order it received them.
So, in the absence of these rules, the two peers could end up using different ECN feedback modes without knowing it.</t>
</li>
<li> <t>A host in AccECN mode that is feeding back the IP-ECN field on a SYN or SYN/ACK:</t>
<ul spacing="normal">
<li> <t><bcp14>MUST</bcp14> feed back the IP-ECN field on the latest valid SYN or Acceptable SYN/ACK to arrive.</t> </li>
</ul>
</li>
<li> <t>A TCP Server already in AccECN mode:</t>
<ul spacing="normal">
<li> <t><bcp14>SHOULD</bcp14> acknowledge a valid SYN arriving with (AE,CWR,ECE) = (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate combination of TCP-ECN flags to feed back the IP-ECN field of this latest SYN);</t> </li>
<li> <t><bcp14>MAY</bcp14> acknowledge a valid SYN arriving with (AE,CWR,ECE) = (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0);</t> </li>
</ul>
<t>Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP Server that is already in AccECN mode, it implies that the TCP Client had probably not received the previous AccECN SYN/ACK emitted by the TCP Server. Therefore, the first bullet recommends attempting at least one more AccECN SYN/ACK. Nonetheless, the second bullet recognizes that the Server might eventually need to fall back to a non-ECN SYN/ACK. In either case, the TCP Server remains in AccECN feedback mode (according to the earlier requirement not to switch modes).</t>
</li>
<li> <t>An AccECN-capable TCP Server already in Not ECN mode:</t>
<ul spacing="normal">
<li> <t><bcp14>SHOULD</bcp14> respond to any subsequent valid SYN using a SYN/ACK with (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to negotiate Classic ECN or AccECN feedback mode;</t>
<t>Rationale: There would be no point in the Server offering any type of ECN feedback, because the Client will not be using ECN.
However, there is no interoperability reason to make this rule mandatory.</t> </li>
</ul>
</li>
</ul>
<t>If for any reason a host is not willing to provide ECN feedback on a particular TCP connection, it <bcp14>SHOULD</bcp14> clear the AE, CWR, and ECE flags in all SYN and/or SYN/ACK packets that it sends.</t>
<t>Sending ECT:</t>
<ul spacing="normal">
<li> <t>Any implementation that supports AccECN:</t>
<ul spacing="normal">
<li> <t><bcp14>MUST NOT</bcp14> set ECT if it is in Not ECN feedback mode.</t> </li>
</ul>
<t>A Data Sender in AccECN mode:</t>
<ul spacing="normal">
<li> <t><bcp14>SHOULD</bcp14> set an ECT codepoint in the IP header of packets to indicate to the network that the transport is capable and willing to participate in ECN for this packet;</t> </li>
<li> <t><bcp14>MAY</bcp14> not set ECT on any packet (for instance if it has reason to believe such a packet would be blocked);</t> </li>
</ul>
<t>A TCP Server in AccECN mode:</t>
<ul spacing="normal">
<li> <t><bcp14>MUST NOT</bcp14> set ECT on any packet for the rest of the connection, if it has received or sent at least one valid SYN or Acceptable SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake.</t>
<t>This rule solely applies to a Server because, when a Server enters AccECN mode, it doesn't know for sure whether the Client will end up in AccECN mode.
But when a Client enters AccECN mode, it can be certain that the Server is already in AccECN feedback mode.</t> </li>
</ul>
</li>
</ul>
<t>Congestion response:</t>
<ul spacing="normal">
<li> <t>A host in AccECN mode:</t>
<ul spacing="normal">
<li> <t>is obliged to respond appropriately to AccECN feedback that indicates there were ECN marks on packets it had previously sent, where 'appropriately' is defined in <xref target="RFC3168" sectionFormat="of" section="6.1"/> and updated by Sections <xref target="RFC8311" sectionFormat="bare" section="2.1"/> and <xref target="RFC8311" sectionFormat="bare" section="4.1"/> of <xref target="RFC8311"/>;</t> </li>
<li> <t>is still obliged to respond appropriately to congestion feedback, even when it is solely sending non-ECN-capable packets (for rationale, some examples and some exceptions see Sections <xref target="accecn_sec_ecn-mangling" format="counter"/> and <xref target="accecn_sec_ACE_init_invalid" format="counter"/>);</t> </li>
<li> <t>is still obliged to respond appropriately to congestion feedback, even if it has sent or received a SYN or SYN/ACK packet with (AE,CWR,ECE) = (0,0,0) during the handshake;</t> </li>
<li> <t><bcp14>MUST NOT</bcp14> set CWR to indicate that it has received and responded to indications of congestion.</t>
<t>For the avoidance of doubt, this is unlike an RFC 3168 data sender and this does not preclude the Data Sender from setting the bits of the ACE counter field, which includes an overloaded use of the same bit.</t> </li>
</ul>
</li>
</ul>
<t>Receiving ECT:</t>
<ul spacing="normal">
<li> <t>A host in AccECN mode:</t>
<ul spacing="normal">
<li> <t><bcp14>MUST</bcp14> feed back the
information in the IP-ECN field of incoming packets using Accurate ECN feedback, as specified in <xref target="accecn_feedback"/>.</t> <t>For the avoidance of doubt, this requirement stands even if the AccECN host has also sent or received a SYN or SYN/ACK with (AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies some form of packet mangling might be present. Even if the remote peer is not setting ECT, it could still be set erroneously by packet mangling at the IP layer (see <xref target="accecn_sec_ecn-mangling"/>). In such cases, the Data Sender is best placed to decide whether ECN markings are valid, but it can only do that if the Data Receiver mechanistically feeds back any ECN markings. This approach will not lead to TCP Options being generated unnecessarily if the recommended simple scheme in <xref target="accecn_option_usage"/> is used, because no byte counters will change if no packets are set to ECT.</t> </li> <li> <t><bcp14>MUST NOT</bcp14> use reception of packets with ECT set in the IP-ECN field as an implicit signal that the peer is ECN-capable.</t> <t>Reason: ECT at the IP layer does not explicitly confirm the peer has the correct ECN feedback logic, because the packets could have been mangled at the IP layer.</t> </li> </ul> </li> </ul> </section> </section> <section anchor="accecn_feedback"> <name>AccECN Feedback</name> <t>Each Data Receiver of each half connection maintains four counters, r.cep, r.ceb, r.e0b, and r.e1b:</t> <ul spacing="normal"> <li> <t>The Data Receiver <bcp14>MUST</bcp14> increment the CE packet counter (r.cep), for every Acceptable packet that it receives with the CE code point in the IP-ECN field, including CE-marked control packets and retransmissions but excluding CE on SYN packets (SYN=1; 
ACK=0).</t> </li> <li> <t>A Data Receiver that supports sending of AccECN TCP Options <bcp14>MUST</bcp14> increment the r.ceb, r.e0b, or r.e1b byte counters by the number of TCP payload octets in Acceptable packets marked with the CE, ECT(0), or ECT(1) codepoint, respectively, in their IP-ECN field, including any payload octets on control packets and retransmissions, but not including any payload octets on SYN packets (SYN=1; ACK=0).</t> </li> </ul> <t>Each Data Sender of each half connection maintains four counters, s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent counters at the Data Receiver.</t> <t>A Data Receiver feeds back the CE packet counter using the Accurate ECN (ACE) field, as explained in <xref target="accecn_ACE"/>. It optionally feeds back all the byte counters using the AccECN TCP Option, as specified in <xref target="accecn_option"/>.</t> <t>Whenever a Data Receiver feeds back the value of any counter, it <bcp14>MUST</bcp14> report the most recent value, no matter whether it is in a pure ACK or an ACK piggybacked on a packet used by the other half-connection, and whether that packet carries new payload data or is a retransmission. Therefore, the feedback piggybacked on a retransmitted packet is unlikely to be the same as the feedback on the original packet.</t> <section anchor="accecn_init_counters"> <name>Initialization of Feedback Counters</name> <t>When a host first enters AccECN mode, in its role as a Data Receiver, it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and r.ceb = 0.</t> <t>Non-zero initial values are used to support a stateless handshake (see <xref target="accecn_Interaction_SYN_Cookies"/>) and to be distinct from cases where the fields are incorrectly zeroed (e.g., by middleboxes; see <xref target="accecn_sec_zero_option"/>).</t> <t>When a host enters AccECN mode, in its role as a Data Sender, it initializes its 
counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = 0.</t> </section> <section anchor="accecn_ACE"> <name>The ACE Field</name> <t>After AccECN has been negotiated on the SYN and SYN/ACK, both hosts overload the three TCP flags (AE, CWR, and ECE) in the main TCP header as one 3-bit field. Then the field is given a new name, ACE, as shown in <xref target="accecn_Fig_ACE_ACK"/>.</t> <!-- <?rfc needLines="9" ?> --> <figure anchor="accecn_Fig_ACE_ACK"> <name>Definition of the ACE Field Within Bytes 13 and 14 of the TCP Header (When AccECN Has Been Negotiated and SYN=0).</name> <artwork align="center"><![CDATA[
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           |           | U | A | P | R | S | F |
| Header Length | Reserved  |    ACE    | R | C | S | S | Y | I |
|               |           |           | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork> </figure> <t>The original definition of these three flags in the TCP header, including the addition of support for the ECN-nonce, is shown for comparison in <xref target="accecn_Fig_TCPHdr"/>. This specification does not rename these three TCP flags to ACE unconditionally; it merely overloads them with another name and definition once an AccECN connection has been established.</t> <t>With one exception (<xref target="accecn_ACE_3rdACK"/>), a host with both of its half-connections in AccECN mode <bcp14>MUST</bcp14> interpret the AE, CWR, and ECE flags as the 3-bit ACE counter on a segment with the SYN flag cleared (SYN=0). On such a packet, a Data Receiver <bcp14>MUST</bcp14> encode the 3 least significant bits of its r.cep counter into the ACE field that it feeds back to the Data Sender. The least significant bit is at bit offset 9 in <xref target="accecn_Fig_ACE_ACK"/>. 
A host <bcp14>MUST NOT</bcp14> interpret the three flags as a 3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or if AccECN negotiation is incomplete or has not succeeded.</t> <t>Both parts of each of these conditions are equally important. For instance, even if AccECN negotiation has been successful, the ACE field is not defined on any segments with SYN=1 (e.g., a retransmission of an unacknowledged SYN/ACK, or when both ends send SYN/ACKs after AccECN support has been successfully negotiated during a simultaneous open).</t> <section anchor="accecn_ACE_3rdACK"> <name>ACE Field on the ACK of the SYN/ACK</name> <t>A TCP Client (A) in AccECN mode <bcp14>MUST</bcp14> feed back which of the 4 possible values of the IP-ECN field was on the SYN/ACK by writing it into the ACE field of a pure ACK with no SACK blocks using the binary encoding in <xref target="accecn_Tab_SYN-ACK_fb2"/> (which is the same as that used on the SYN/ACK in <xref target="accecn_Tab_Negotiation"/>). This shall be called the handshake encoding of the ACE field, and it is the only exception to the rule that the ACE field carries the 3 least significant bits of the r.cep counter on packets with SYN=0.</t> <t>Normally, a TCP Client acknowledges a SYN/ACK with an ACK that satisfies the above conditions anyway (SYN=0, no data, no SACK blocks). 
If an AccECN TCP Client intends to acknowledge the SYN/ACK with a packet that does not satisfy these conditions (e.g., it has data to include on the ACK), it <bcp14>SHOULD</bcp14> first send a pure ACK that does satisfy these conditions (see <xref target="accecn_Interaction_Other"/>), so that it can feed back which of the four values of the IP-ECN field arrived on the SYN/ACK. A valid exception to this "<bcp14>SHOULD</bcp14>" would be where the implementation will only be used in an environment where mangling of the ECN field is unlikely.</t> <t>The TCP Client <bcp14>MUST</bcp14> also use the handshake encoding for the pure ACK of any retransmitted SYN/ACK that confirms that the TCP Server supports AccECN. If the final ACK of the handshake does not arrive before its retransmission timer expires, the TCP Server is to follow the procedure given in <xref target="accecn_sec_SYN-ACK_rexmt"/>.</t> <table anchor="accecn_Tab_SYN-ACK_fb2"> <name>The Encoding of the ACE Field in the ACK of the SYN/ACK to Reflect the SYN/ACK's IP-ECN Field</name> <thead> <tr> <th>IP-ECN codepoint on SYN/ACK</th> <th>ACE on pure ACK of SYN/ACK</th> <th>r.cep of TCP Client in AccECN mode</th> </tr> </thead> <tbody> <tr> <td>Not-ECT</td> <td>0b010</td> <td>5</td> </tr> <tr> <td>ECT(1)</td> <td>0b011</td> <td>5</td> </tr> <tr> <td>ECT(0)</td> <td>0b100</td> <td>5</td> </tr> <tr> <td>CE</td> <td>0b110</td> <td>6</td> </tr> </tbody> </table> <!-- [rfced] For readability, may we break this text into two sentences? 
Original: When an AccECN Server in SYN-RCVD state receives a pure ACK with SYN=0 and no SACK blocks, instead of treating the ACE field as a counter, it MUST infer the meaning of each possible value of the ACE field from Table 4, which also shows the value that an AccECN Server MUST set s.cep to as a result. Perhaps: When an AccECN Server in SYN-RCVD state receives a pure ACK with SYN=0 and no SACK blocks, it MUST infer the meaning of each possible value of the ACE field from Table 4 instead of treating the ACE field as a counter. Table 4 also shows the value to which an AccECN Server MUST set s.cep as a result. --> <t>When an AccECN Server in SYN-RCVD state receives a pure ACK with SYN=0 and no SACK blocks, instead of treating the ACE field as a counter, it <bcp14>MUST</bcp14> infer the meaning of each possible value of the ACE field from <xref target="accecn_Tab_SYN-ACK_fb"/>, which also shows the value that an AccECN Server <bcp14>MUST</bcp14> set s.cep to as a result.</t> <!-- [rfced] We are unclear what "it" refers to in the following. Perhaps "it" can be deleted? Original: Given this encoding of the ACE field on the ACK of a SYN/ACK is exceptional, an AccECN Server using large receive offload (LRO) might prefer to disable LRO until such an ACK has transitioned it out of SYN-RCVD state. 
--> <t>Given that this encoding of the ACE field on the ACK of a SYN/ACK is exceptional, an AccECN Server using large receive offload (LRO) might prefer to disable LRO until such an ACK has transitioned the Server out of SYN-RCVD state.</t> <table anchor="accecn_Tab_SYN-ACK_fb"> <name>Meaning of the ACE Field on the ACK of the SYN/ACK</name> <thead> <tr> <th>ACE on ACK of SYN/ACK</th> <th>IP-ECN codepoint on SYN/ACK inferred by Server</th> <th>s.cep of TCP Server in AccECN mode</th> </tr> </thead> <tbody> <tr> <td>0b000</td> <td>{Notes 1, 3}</td> <td>Disable s.cep</td> </tr> <tr> <td>0b001</td> <td>{Notes 2, 3}</td> <td>5</td> </tr> <tr> <td>0b010</td> <td>Not-ECT</td> <td>5</td> </tr> <tr> <td>0b011</td> <td>ECT(1)</td> <td>5</td> </tr> <tr> <td>0b100</td> <td>ECT(0)</td> <td>5</td> </tr> <tr> <td>0b101</td> <td>Currently Unused {Note 2}</td> <td>5</td> </tr> <tr> <td>0b110</td> <td>CE</td> <td>6</td> </tr> <tr> <td>0b111</td> <td>Currently Unused {Note 2}</td> <td>5</td> </tr> </tbody> </table> <dl indent="9"><dt>Note 1:</dt><dd><t>If the Server is in AccECN mode and in SYN-RCVD state, and if it receives a value of zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the connection the Server <bcp14>MUST NOT</bcp14> set ECT on outgoing packets and <bcp14>MUST NOT</bcp14> respond to AccECN feedback. 
Nonetheless, as a Data Receiver, it <bcp14>MUST NOT</bcp14> disable AccECN feedback.</t> <t>Any of the circumstances below could cause a value of zero but, whatever the cause, the actions above would be the appropriate response:</t> <ul spacing="normal"> <li> <t>The TCP Client has somehow entered Not ECN feedback mode (most likely if the Server received a SYN or sent a SYN/ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN mode, but possible even if it didn't);</t> </li> <li> <t>The TCP Client genuinely might be in AccECN mode, but its count of received CE marks might have caused the ACE field to wrap to zero. This is highly unlikely, but not impossible because the Server might have already sent multiple packets while still in SYN-RCVD state, e.g., using TFO (see <xref target="accecn_Interaction_Other"/>), and some might have been CE-marked. Then ACE on the first ACK seen by the Server might be zero, due to previous ACKs experiencing an unfortunate pattern of loss or delay.</t> </li> <li> <t>There is some form of non-compliance at the TCP Client or on the path (see <xref target="accecn_sec_ACE_init_invalid"/>).</t> </li> </ul></dd> <dt>Note 2:</dt><dd> If the Server is in AccECN mode, these values are Currently Unused but the AccECN Server's behaviour is still defined for forward compatibility. Then the designer of a future protocol can know for certain what AccECN Servers will do with these codepoints.</dd> <dt>Note 3:</dt><dd> In the case where a Server that implements AccECN is also using a stateless handshake (termed a SYN cookie), it will not remember whether it entered AccECN mode. The values 0b000 or 0b001 will remind it that it did not enter AccECN mode, because AccECN does not use them (see <xref target="accecn_Interaction_SYN_Cookies"/> for details). 
If a Server that uses a stateless handshake and implements AccECN receives either of these two values in the ACK, its action is implementation-dependent and outside the scope of this document. It will certainly not take the action in the third column because, after it receives either of these values, it is not in AccECN mode. For example, it will not disable ECN (at least not just because ACE is 0b000) and it will not set s.cep.</dd></dl> </section> <section anchor="accecn_sec_ACE_feedback"> <name>Encoding and Decoding Feedback in the ACE Field</name> <t>Whenever the Data Receiver sends an ACK with SYN=0 (with or without data), unless the handshake encoding in <xref target="accecn_ACE_3rdACK"/> applies, the Data Receiver <bcp14>MUST</bcp14> encode the least significant 3 bits of its r.cep counter into the ACE field (see <xref target="accecn_Algo_ACE_Wrap"/>).</t> <t>Whenever the Data Sender receives an ACK with SYN=0 (with or without data), it first checks whether it has already been superseded (defined in <xref target="accecn_Algo_Option_Coding"/>) by another ACK, in which case it ignores the ECN feedback. 
If the ACK has not been superseded, and if the special handshake encoding in <xref target="accecn_ACE_3rdACK"/> does not apply, the Data Sender decodes the ACE field as follows (see <xref target="accecn_Algo_ACE_Wrap"/> for examples).</t> <ul spacing="normal"> <li> <t>It takes the least significant 3 bits of its local s.cep counter and subtracts them from the incoming ACE counter to work out the minimum positive increment it could apply to s.cep (assuming the ACE field only wrapped once at most).</t> </li> <li> <t>It then follows the safety procedures in <xref target="accecn_ACE_Safety_S"/> to calculate or estimate how many packets the ACK could have acknowledged under the prevailing conditions to determine whether the ACE field might have wrapped more than once.</t> </li> </ul> <t>The encode/decode procedures during the three-way handshake are exceptions to the general rules given so far, so they are spelled out step by step below for clarity:</t> <ul spacing="normal"> <li> <t>If a TCP Server in AccECN mode receives a CE mark in the IP-ECN field of a SYN (SYN=1, ACK=0), it <bcp14>MUST NOT</bcp14> increment r.cep (it remains at its initial value of 5).</t> <t>Reason: It would be redundant for the Server to include CE-marked SYNs in its r.cep counter, because it already reliably delivers feedback of any CE marking using the encoding in the top block of <xref target="accecn_Tab_Negotiation"/> in the SYN/ACK. 
This also ensures that, when the Server starts using the ACE field, it has not unnecessarily consumed more than one initial value, given they can be used to negotiate variants of the AccECN protocol (see <xref target="accecn_space_evolution"/>).</t> </li> <li> <t>If a TCP Client in AccECN mode receives CE feedback in the TCP flags of a SYN/ACK, it <bcp14>MUST NOT</bcp14> increment s.cep (it remains at its initial value of 5) so that it stays in step with r.cep on the Server. Nonetheless, the TCP Client still triggers the congestion control actions necessary to respond to the CE feedback.</t> </li> <li> <t>If a TCP Client in AccECN mode receives a CE mark in the IP-ECN field of a SYN/ACK, it <bcp14>MUST</bcp14> increment r.cep, but no more than once no matter how many CE-marked SYN/ACKs it receives (i.e., incremented from 5 to 6, but no further).</t> <t>Reason: Incrementing r.cep ensures the Client will eventually deliver any CE marking to the Server reliably when it starts using the ACE field. Even though the Client also feeds back any CE marking on the ACK of the SYN/ACK using the encoding in <xref target="accecn_Tab_SYN-ACK_fb2"/>, this ACK is not delivered reliably, so it can be considered as a timely notification that is redundant but unreliable. The Client does not increment r.cep more than once, because the Server can only increment s.cep once (see next bullet). Also, this limits the unnecessarily consumed initial values of the ACE field to two.</t> </li> <li> <t>If a TCP Server in AccECN mode and in SYN-RCVD state receives CE feedback in the TCP flags of a pure ACK with no SACK blocks, it <bcp14>MUST</bcp14> increment s.cep (from 5 to 6). 
The TCP Server then triggers the congestion control actions necessary to respond to the CE feedback.</t> <t>Reasoning: The TCP Server can only increment s.cep once, because the first ACK it receives will cause it to transition out of SYN-RCVD state. The Server's congestion response would be no different, even if it could receive feedback of more than one CE-marked SYN/ACK.</t> <t>Once the TCP Server transitions to ESTABLISHED state, it might later receive other pure ACK(s) with the handshake encoding in the ACE field. A Server <bcp14>MAY</bcp14> implement a test for such a case, but it is not required. Therefore, once in the ESTABLISHED state, it will be sufficient for the Server to consider the ACE field to be encoded as the normal ACE counter on all packets with SYN=0.</t> <t>Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or ACK of the SYN/ACK) that is delayed for longer than the Server's retransmission timeout; or packet duplication by the network. And the impact of any error in the feedback on such ACKs will only be temporary.</t> </li> </ul> </section> <section anchor="accecn_sec_ecn-mangling"> <name>Testing for Mangling of the IP/ECN Field</name> <ul spacing="normal"> <li> <t>TCP Client side:</t> <t>The value of the TCP-ECN flags on the SYN/ACK indicates the value of the IP-ECN field when the SYN arrived at the Server. The TCP Client can compare this with how it originally set the IP-ECN field on the SYN. If this comparison implies an invalid transition (defined below) of the IP-ECN field, for the remainder of the half-connection the Client is advised to send non-ECN-capable packets, but it still ought to respond to any feedback of CE markings (explained below). 
However, the TCP Client <bcp14>MUST</bcp14> remain in the AccECN feedback mode and it <bcp14>MUST</bcp14> continue to feed back any ECN markings on arriving packets (in its role as Data Receiver). <!--There is no need to say the following for forward compatibility: "If the server deliberately sends false feedback in the ACE field that implies an unsafe transition, it MUST continue the connection even if the client does not disable sending ECN-capable packets"--> </t> </li> <li> <t>TCP Server side:</t> <t>The value of the ACE field on the last ACK of the three-way handshake indicates the value of the IP-ECN field when the SYN/ACK arrived at the TCP Client. The Server can compare this with how it originally set the IP-ECN field on the SYN/ACK. If this comparison implies an invalid transition of the IP-ECN field, for the remainder of the half-connection the Server is advised to send non-ECN-capable packets, but it still ought to respond to any feedback of CE markings (explained below). However, the Server <bcp14>MUST</bcp14> remain in the AccECN feedback mode and it <bcp14>MUST</bcp14> continue to feed back any ECN markings on arriving packets (in its role as Data Receiver). <!--There is no need to say the following for forward compatibility: "If the client deliberately sends false feedback in the ACE field that implies an unsafe transition, it MUST continue the connection even if the server does not disable sending ECN-capable packets"--> </t> </li> </ul> <t>If a Data Sender in AccECN mode starts sending non-ECN-capable packets because it has detected mangling, it is still advised to respond to CE feedback. 
Reason: Any CE marking arriving at the Data Receiver could be due to something early in the path mangling the non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, later in the path, a network bottleneck might be applying CE markings to indicate genuine congestion. This argument applies whether the handshake packet originally sent by the TCP Client or Server was non-ECN-capable or ECN-capable because, in either case, an unsafe transition could imply that non-ECN-capable packets later in the connection might get mangled.</t> <t>Once a Data Sender has entered AccECN mode, it is advised to check whether it is receiving continuous feedback of CE. Specifying exactly how to do this is beyond the scope of the present specification, but the sender might check whether the feedback for every packet it sends for the first three or four rounds indicates CE marking. If continuous CE marking is detected, for the remainder of the half-connection, the Data Sender ought to send non-ECN-capable packets, and it is advised not to respond to any feedback of CE markings. The Data Sender might occasionally test whether it can resume sending ECN-capable packets.</t> <t>The above advice on switching to sending non-ECN-capable packets but still responding to CE markings unless they become continuous is not stated normatively (in capitals), because the best strategy might depend on experience of the most likely types of mangling, which can only be known at the time of deployment. The same is true for other forms of mangling (or resumption of expected marking) during later stages of a connection.</t> <t>As always, once a host has entered AccECN mode, it follows the general mandatory requirements (<xref target="accecn_implications_accecn_mode"/>) to remain in the same feedback mode and to continue feeding back any ECN markings on arriving packets using AccECN feedback. 
This follows the general approach where an AccECN Data Receiver mechanistically reflects whatever it receives (<xref target="accecn_demb_reflector"/>).</t> <t>The ACK of the SYN/ACK is not reliably delivered (nonetheless, the count of CE marks is still eventually delivered reliably). If this ACK does not arrive, the Server is advised to continue to send ECN-capable packets without having tested for mangling of the IP-ECN field on the SYN/ACK.</t> <t>All the fall-back behaviours in this section are necessary in case mangling of the IP-ECN field is asymmetric, which is currently common over some mobile networks <xref target="Mandalari18"/>. In this case, one end might see no unsafe transition and continue sending ECN-capable packets, while the other end sees an unsafe transition and stops sending ECN-capable packets.</t> <t>Invalid transitions of the IP-ECN field are defined in Section <xref target="RFC3168" sectionFormat="bare" section="18"/> of the Classic ECN specification <xref target="RFC3168"/> and repeated here for convenience:</t> <ul spacing="normal"> <li> <t>the Not-ECT codepoint changes;</t> </li> <li> <t>either ECT codepoint transitions to Not-ECT;</t> </li> <li> <t>the CE codepoint changes.</t> </li> </ul> <t>RFC 3168 says that a router that changes ECT to Not-ECT is invalid but safe. However, from a host's viewpoint, this transition is unsafe because it could be the result of two transitions at different routers on the path: ECT to CE (safe) then CE to Not-ECT (unsafe). 
This scenario could well happen where an ECN-enabled home router congests its upstream mobile broadband bottleneck link, then the ingress to the mobile network clears the ECN field <xref target="Mandalari18"/>.</t> </section> <section anchor="accecn_sec_ACE_init_invalid"> <name>Testing for Zeroing of the ACE Field</name> <t><xref target="accecn_ACE"/> required the Data Receiver to initialize the r.cep counter to a non-zero value. Therefore, in either direction the initial value of the ACE counter ought to be non-zero.</t> <t>This section does not concern the case where the ACE field is zero when the handshake encoding has been used on the ACK of the SYN/ACK under the carefully worded conditions in <xref target="accecn_ACE_3rdACK"/>.</t> <t>If AccECN has been successfully negotiated, the Data Sender <bcp14>MAY</bcp14> check the value of the ACE counter in the first feedback packet (with or without data) that arrives after the three-way handshake. If the value of this ACE field is found to be zero (0b000), for the remainder of the half-connection the Data Sender ought to send non-ECN-capable packets and it is advised not to respond to any feedback of CE markings.</t> <t>Reason: the symptoms imply any or all of the following:</t> <ul spacing="normal"> <li> <t>the remote peer has somehow entered Not ECN feedback mode;</t> </li> <li> <t>a broken remote TCP implementation;</t> </li> <li> <t>potential mangling of the ECN fields in the TCP headers (although unlikely given they clearly survived during the handshake).</t> </li> </ul> <!-- [rfced] We are having trouble parsing "depend on experience of the most likely scenarios". Does it depend on how good the experience is, the outcome, etc? Please consider whether this text can be clarified. 
Original: This advice is not stated normatively (in capitals), because the best strategy might depend on experience of the most likely scenarios, which can only be known at the time of deployment. --> <t>This advice is not stated normatively (in capitals), because the best strategy might depend on experience of the most likely scenarios, which can only be known at the time of deployment.</t> <t>Note that a host in AccECN mode <bcp14>MUST</bcp14> continue to provide Accurate ECN feedback to its peer, even if it is no longer sending ECT itself over the other half connection. <!--There is no need to say the following for forward compatibility: "If a data receiver negotiates AccECN but then zeros the ACE field in its first segment with SYN=0, it MUST continue the connection even if the data sender does not disable sending ECN-capable packets."--> </t> <t>If reordering occurs, the first feedback packet that arrives will not necessarily be the same as the first packet in sequence order. The test has been specified loosely like this to simplify implementation, and because it would not have been any more precise to have specified the first packet in sequence order, which would not necessarily be the first ACE counter that the Data Receiver fed back anyway, given it might have been a retransmission.</t> <t>The possibility of reordering means that there is a small chance that the ACE field on the first packet to arrive is genuinely zero (without middlebox interference). This would cause a host to unnecessarily disable ECN for a half connection. Therefore, in environments where there is no evidence of the ACE field being zeroed, implementations <bcp14>MAY</bcp14> skip this test.</t> <t>Note that the Data Sender <bcp14>MUST NOT</bcp14> test whether the arriving counter in the initial ACE field has been initialized to a specific valid value; the above check solely tests whether the ACE fields have been incorrectly zeroed. 
This allows hosts to use different initial values as an additional signalling channel in the future.</t> </section> <section anchor="accecn_ACE_Safety"> <name>Safety Against Ambiguity of the ACE Field</name> <t>If too many CE-marked segments are acknowledged at once, or if a long run of ACKs is lost or thinned out, the 3-bit counter in the ACE field might have cycled between two ACKs arriving at the Data Sender. The following safety procedures minimize this ambiguity.</t> <section anchor="accecn_ACE_Safety_R"> <name>Packet Receiver Safety Procedures</name> <t>The following rules define when the receiver of a packet in AccECN mode emits an ACK:</t> <dl newline="false" spacing="normal"> <dt>Change-Triggered ACKs:</dt> <dd> <t>An AccECN Data Receiver <bcp14>SHOULD</bcp14> emit an ACK whenever a data packet marked CE arrives after the previous packet was not CE.</t> <t>Even though this rule is stated as a "<bcp14>SHOULD</bcp14>", it is important for a transition to trigger an ACK if at all possible. 
The only valid exception to this rule is given below this list.</t> <t>For the avoidance of doubt, this rule is deliberately worded to apply solely when <em>data</em> packets arrive, but the comparison with the previous packet includes any packet, not just data packets.</t> </dd> <dt>Increment-Triggered ACKs:</dt> <dd>An AccECN receiver of a packet <bcp14>MUST</bcp14> emit an ACK if 'n' CE marks have arrived since the previous ACK. If there is unacknowledged data at the receiver, 'n' <bcp14>SHOULD</bcp14> be 2. If there is no unacknowledged data at the receiver, 'n' <bcp14>SHOULD</bcp14> be 3 and <bcp14>MUST</bcp14> be no less than 3. In either case, 'n' <bcp14>MUST</bcp14> be no greater than 7.</dd> </dl> <t>The above rules for when to send an ACK are designed to be complemented by those in <xref target="accecn_option_usage"/>, which concern whether an AccECN TCP Option ought to be included on ACKs.</t> <t>If the arrivals of a number of data packets are all processed as one event, e.g., using large receive offload (LRO) or generic receive offload (GRO), both the above rules <bcp14>SHOULD</bcp14> be interpreted as requiring multiple ACKs to be emitted back to back (for each transition and for each sequence of 'n' CE marks). If this is problematic for high performance, either rule can be interpreted as requiring just a single ACK at the end of the whole receive event.</t> <t>Even if a number of data packets do not arrive as one event, the 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be problematic for high performance (although high performance protocols such as DCTCP already successfully use change-triggered ACKs). The rationale for change-triggered ACKs is so that the Data Sender can rely on them to detect queue growth as soon as possible, particularly at the start of a flow. 
The approach can lead to some additional ACKs but it feeds back the timing and the order in which ECN marks are received with minimal additional complexity. If CE marks are infrequent, as is the case for most Active Queue Management (AQM) packet schedulers at the time of writing, or there are multiple marks in a row, the additional load will be low. However, marking patterns with numerous non-contiguous CE marks could increase the load significantly. One possible compromise would be for the receiver to heuristically detect whether the sender is in slow-start, then to implement change-triggered ACKs while the sender is in slow-start, and offload otherwise.</t> <t>In a scenario where both endpoints support AccECN, if host B has chosen to use ECN-capable pure ACKs (as allowed in <xref target="RFC8311"/> experiments) and enough of these ACKs become CE marked, then the 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives B sufficient feedback about this congestion on the ACKs from B to A. Normally, for instance in a unidirectional data scenario from host A to B, the Data Sender (A) can piggyback that feedback on its data. But if A stops sending data, the second part of the 'Increment-Triggered ACKs' rule requires A to emit a pure ACK for at least every third CE-marked incoming ACK over the subsequent round trip.</t> <t>Although TCP normally only ACKs data segments, in this case the increment-triggered ACK rule makes it mandatory for A to emit ACKs of ACKs. This is justifiable because the ACKs in this case are ECN-capable and so, even though the ACKs of these ACKs do not acknowledge new data, they feed back new congestion state (useful in case B starts sending). 
The minimum of 3 for 'n' in this case ensures that, even if A also uses ECN-capable pure ACKs, and even if there is pathological congestion in both directions, any resulting ping-pong of ACKs will be rapidly damped.</t> <t>In the above bidirectional scenario, incoming ACKs of ACKs could be mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished from duplicate ACKs because they do not contain any SACK blocks even when SACK has been negotiated. It is outside the scope of this AccECN specification to normatively specify this additional test for DupACKs, because ACKs of ACKs can only arise if the original ACKs are ECN-capable. Instead, any specification that allows ECN-capable pure ACKs <bcp14>MUST</bcp14> make sending ACKs of ACKs conditional on measures to distinguish ACKs of ACKs from DupACKs (see for example <xref target="I-D.ietf-tcpm-generalized-ecn"/>). All that is necessary here is to require that these ACKs of ACKs <bcp14>MUST NOT</bcp14> contain any SACK blocks (which would normally not happen anyway).</t> </section> <section anchor="accecn_ACE_Safety_S"> <name>Data Sender Safety Procedures</name> <t>If the Data Sender has not received AccECN TCP Options to give it more dependable information, and it detects that the ACE field could have cycled, it <bcp14>SHOULD</bcp14> deem whether it cycled by taking the safest likely case under the prevailing conditions. It can detect if the counter could have cycled by using the jump in the acknowledgement number since the last ACK to calculate or estimate how many segments could have been acknowledged. An example algorithm to implement this policy is given in <xref target="accecn_Algo_ACE_Wrap"/>. 
An implementation <bcp14>MAY</bcp14> use an alternative algorithm as long as it satisfies the requirements in this subsection.</t> <t>If missing acknowledgement numbers arrive later (reordering) and prove that the counter did not cycle, the Data Sender <bcp14>MAY</bcp14> attempt to neutralize the effect of any action it took based on a conservative assumption that it later found to be incorrect.</t> <t>The Data Sender can estimate how many packets (of any marking) an ACK acknowledges. If the ACE counter on an ACK seems to imply that the minimum number of newly CE-marked packets is greater than the number of newly acknowledged packets, the Data Sender <bcp14>SHOULD</bcp14> consider the ACE counter to be correct (and its count of control packets to be incomplete), unless it can be sure that it is counting all control packets correctly.</t> </section> </section> </section> <section anchor="accecn_option"> <name>The AccECN Option</name> <t>Two alternative AccECN Options are defined as shown in <xref target="accecn_Fig_TCPopt"/>. 
The initial 'E' of each field name stands for 'Echo'.</t> <figure anchor="accecn_Fig_TCPopt"> <name>The Two Alternative AccECN TCP Options</name> <artwork align="center"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Kind = 172  |  Length = 11  |          EE0B field           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B (cont'd) |                  ECEB field                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  EE1B field                   |     Order 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Kind = 174  |  Length = 11  |          EE1B field           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B (cont'd) |                  ECEB field                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  EE0B field                   |     Order 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork> </figure> <t><xref target="accecn_Fig_TCPopt"/> shows two option field orders: order 0 and order 1. They both consist of three 24-bit fields. Order 0 provides the 24 least significant bits of the r.e0b, r.ceb, and r.e1b counters, respectively. Order 1 provides the same fields, but in the opposite order. On each packet, the Data Receiver can use whichever order is more efficient. In either case, the bytes within the fields are in network byte order (big-endian).</t> <t>The choice to use 3-byte (24-bit) fields in the options was made to strike a balance between TCP option space usage and the required fidelity of the counters to accommodate typical scenarios such as hardware TCP Segmentation Offloading (TSO), and periods during which no option may be transmitted (e.g., SACK loss recovery). 
With only 2 bytes (16 bits), these counters could easily roll over within a single TSO transmission or large/generic receive offload (LRO/GRO) event. Having two distinct orderings further allows the transmission of the most pertinent changes in an abbreviated option (see below).</t> <t>When a Data Receiver sends an AccECN Option, it <bcp14>MUST</bcp14> set the Kind field to 172 if using Order 0, or to 174 if using Order 1. These two new TCP Option Kinds are registered in <xref target="accecn_IANA_Considerations"/> and are called AccECN0 and AccECN1, respectively.</t> <t>Note that there is no field to feed back Not-ECT bytes. Nonetheless, an algorithm for the Data Sender to calculate the number of payload bytes received as Not-ECT is given in <xref target="accecn_Algo_Not-ECT"/>.</t> <t>Whenever a Data Receiver sends an AccECN Option, the rules in <xref target="accecn_option_usage"/> allow it to omit unchanged fields from the tail of the option, to help cope with option space limitations, as long as it preserves the order of the remaining fields and includes any field that has changed. 
The length field <bcp14>MUST</bcp14> indicate which fields are present as follows:</t> <table anchor="accecn_Fig_TCPopttab"> <name>Fields included in AccECN TCP Options of each length and order</name> <thead> <tr> <th>Length</th> <th>Order 0</th> <th>Order 1</th> </tr> </thead> <tbody> <tr> <td>11</td> <td>EE0B, ECEB, EE1B</td> <td>EE1B, ECEB, EE0B</td> </tr> <tr> <td>8</td> <td>EE0B, ECEB</td> <td>EE1B, ECEB</td> </tr> <tr> <td>5</td> <td>EE0B</td> <td>EE1B</td> </tr> <tr> <td>2</td> <td>(empty)</td> <td>(empty)</td> </tr> </tbody> </table> <t>The empty option of Length=2 is provided to allow for a case where an AccECN Option has to be sent (e.g., on the SYN/ACK to test the path), but there is very limited space for the option.</t> <t>All implementations of a Data Sender that read any AccECN Option <bcp14>MUST</bcp14> be able to read AccECN Options of any of the above lengths. For forward compatibility, if the AccECN Option is of any other length, implementations <bcp14>MUST</bcp14> use those whole 3-octet fields that fit within the length and ignore the remainder of the option, treating it as padding.</t> <t>AccECN Options have to be optional to implement, because both sender and receiver have to be able to cope without options anyway, in cases where they do not traverse a network path. It is <bcp14>RECOMMENDED</bcp14> to implement both sending and receiving of AccECN Options. Support for AccECN Options is particularly valuable over paths that introduce a high degree of ACK filtering, where the 3-bit ACE counter alone might sometimes be insufficient, when it is ambiguous whether it has wrapped. 
If sending of AccECN Options is implemented, the fall-backs described in this document will need to be implemented as well (unless solely for a controlled environment where path traversal is not considered a problem). Even if a developer does not implement logic to understand received AccECN Options, it is <bcp14>RECOMMENDED</bcp14> that they implement logic to send AccECN Options. Otherwise, those remote peers that implement the receiving logic will still be excluded from congestion feedback that is robust against the increasingly aggressive ACK filtering in the Internet. Of the two sides, the logic to send AccECN Options is the simpler to implement.</t> <t>If a Data Receiver intends to send an AccECN Option at any time during the rest of the connection, it is <bcp14>RECOMMENDED</bcp14> to also test path traversal of the AccECN Option as specified in <xref target="accecn_Mbox_Interference"/>.</t> <section> <name>Encoding and Decoding Feedback in the AccECN Option Fields</name> <t>Whenever the Data Receiver includes any of the counter fields (ECEB, EE0B, EE1B) in an AccECN Option, it <bcp14>MUST</bcp14> encode the 24 least significant bits of the current value of the associated counter into the field (respectively r.ceb, r.e0b, r.e1b).</t> <t>Whenever the Data Sender receives an ACK carrying an AccECN Option, it first checks whether the ACK has already been superseded by another ACK, in which case it ignores the ECN feedback. If the ACK has not been superseded, the Data Sender normally decodes the fields in the AccECN Option as follows. 
For each field, it takes the least significant 24 bits of its associated local counter (s.ceb, s.e0b, or s.e1b) and subtracts them from the counter in the associated field of the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work out the minimum positive increment it could apply to s.ceb, s.e0b, or s.e1b (assuming the field in the option only wrapped once at most).</t> <t><xref target="accecn_Algo_Option_Coding"/> gives an example algorithm for the Data Receiver to encode its byte counters into an AccECN Option, and for the Data Sender to decode the AccECN Option fields into its byte counters.</t> <t>Note that, as specified in <xref target="accecn_feedback"/>, any data on the SYN (SYN=1, ACK=0) is not included in any of the byte counters held locally for each ECN marking nor in an AccECN Option on the wire.</t> </section> <section anchor="accecn_Mbox_Interference"> <name>Path Traversal of the AccECN Option</name> <section anchor="accecn_AccECN_Option_3WHS"> <name>Testing the AccECN Option During the Handshake</name> <t>The TCP Client <bcp14>MUST NOT</bcp14> include an AccECN TCP Option on the SYN. If there is somehow an AccECN Option on a SYN, it <bcp14>MUST</bcp14> be ignored when forwarded or received.</t> <t>A TCP Server that confirms its support for AccECN (in response to an AccECN SYN from the Client as described in <xref target="accecn_Negotiation"/>) <bcp14>SHOULD</bcp14> include an AccECN TCP Option on the SYN/ACK.</t> <t>A TCP Client that has successfully negotiated AccECN <bcp14>SHOULD</bcp14> include an AccECN Option in the first ACK at the end of the three-way handshake. 
However, this first ACK is not delivered reliably, so the TCP Client <bcp14>SHOULD</bcp14> also include an AccECN Option on the first data segment it sends (if it ever sends one).</t> <t>A host <bcp14>MAY</bcp14> omit an AccECN Option in any of the above three cases because of insufficient option space or because it has cached knowledge that the packet would be likely to be blocked on the path to the other host if it included an AccECN Option.</t> </section> <section anchor="accecn_AccECN_Option_Loss"> <name>Testing for Loss of Packets Carrying the AccECN Option</name> <t>If the TCP Server has not received an ACK to acknowledge its SYN/ACK after the normal TCP timeout or if it receives a second SYN with a request for AccECN support, then either the SYN/ACK might just have been lost, e.g., due to congestion, or a middlebox might be blocking AccECN Options. To expedite connection setup in deployment scenarios where AccECN path traversal might be problematic, the TCP Server <bcp14>SHOULD</bcp14> retransmit the SYN/ACK, but with no AccECN Option. If this retransmission times out, to expedite connection setup, the TCP Server <bcp14>SHOULD</bcp14> retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and no AccECN Option, but it remains in AccECN feedback mode (per <xref target="accecn_implications_accecn_mode"/>).</t> <aside> <t>Note that a retransmitted AccECN SYN/ACK will not necessarily have the same TCP-ECN flags as the original SYN/ACK, because it feeds back the IP-ECN field of the latest SYN to have arrived (by the rule in <xref target="accecn_implications_accecn_mode"/>).</t> </aside> <t>The above fall-back approach limits any interference by middleboxes that might drop packets with unknown options, even though it is more likely that SYN/ACK loss is due to congestion. 
The TCP Server <bcp14>MAY</bcp14> try to send another packet with an AccECN Option at a later point during the connection, but it ought to monitor whether that packet got lost as well, in which case it <bcp14>SHOULD</bcp14> disable the sending of AccECN Options for this half-connection.</t> <t>Implementers <bcp14>MAY</bcp14> use other fall-back strategies if they are found to be more effective (e.g., retrying an AccECN Option for a second time before fall-back, which is most appropriate during high levels of congestion). However, other fall-back strategies will need to follow all the rules in <xref target="accecn_implications_accecn_mode"/>, which concern behaviour when SYNs or SYN/ACKs negotiating different types of feedback have been sent within the same connection.</t> <t>Further, it might make sense to also remove any other new or experimental fields or options on the SYN/ACK, although the required behaviour will depend on the specification of the other option(s) and on any attempt to coordinate fall-back between different modules of the stack.</t> <t>If the TCP Client detects that the first data segment it sent with an AccECN Option was lost, in deployment scenarios where AccECN path traversal might be problematic, it <bcp14>SHOULD</bcp14> fall back to no AccECN Option on the retransmission. Again, implementers <bcp14>MAY</bcp14> use other fall-back strategies such as attempting to retransmit a second segment with an AccECN Option before fall-back, and/or caching whether AccECN Options are blocked for subsequent connections. <xref target="RFC9040"/> further discusses caching of TCP parameters and status information.</t> <t>If a middlebox is dropping packets with options it does not recognize, a host that is sending little or no data but mostly pure ACKs will not inherently detect such losses. 
Such a host <bcp14>MAY</bcp14> detect loss of ACKs carrying the AccECN Option by detecting whether the acknowledged data always reappears as a retransmission. In such cases, the host <bcp14>SHOULD</bcp14> disable the sending of the AccECN Option for this half-connection.</t> <t>If a host falls back to not sending AccECN Options, it will continue to process any incoming AccECN Options as normal.</t> <t>Either host <bcp14>MAY</bcp14> include AccECN Options in one or more subsequent segments to retest whether AccECN Options can traverse the path.</t> <t>Similarly, an AccECN endpoint <bcp14>MAY</bcp14> separately memorize which data packets carried an AccECN Option and disable the sending of AccECN Options if the loss probability of those packets is significantly higher than that of all other data packets in the same connection.</t> </section> <section> <name>Testing for Absence of the AccECN Option</name> <t>If the TCP Client has successfully negotiated AccECN but does not receive an AccECN Option on the SYN/ACK (e.g., because it has been stripped by a middlebox or not sent by the Server), the Client switches into a mode that assumes that the AccECN Option is not available for this half-connection.</t> <t>Similarly, if the TCP Server has successfully negotiated AccECN but does not receive an AccECN Option on the first segment that acknowledges sequence space at least covering the ISN, it switches into a mode that assumes that the AccECN Option is not available for this half-connection.</t> <t>While a host is in this mode that assumes incoming AccECN Options are not available, it <bcp14>MUST</bcp14> adopt the conservative interpretation of the ACE field discussed in <xref target="accecn_ACE_Safety"/>. 
However, it cannot make any assumption about support of outgoing AccECN Options on the other half-connection, so it <bcp14>SHOULD</bcp14> continue to send AccECN Options itself (unless it has established that sending AccECN Options is causing packets to be blocked as in <xref target="accecn_AccECN_Option_Loss"/>).</t> <t>If a host is in the mode that assumes incoming AccECN Options are not available, but it receives an AccECN Option at any later point during the connection, this clearly indicates that AccECN Options are no longer blocked on the respective path, and the AccECN endpoint <bcp14>MAY</bcp14> switch out of the mode that assumes AccECN Options are not available for this half-connection.</t> </section> <section anchor="accecn_sec_zero_option"> <name>Test for Zeroing of the AccECN Option</name> <t>For a related test for invalid initialization of the ACE field, see <xref target="accecn_sec_ACE_init_invalid"/>.</t> <t><xref target="accecn_init_counters"/> requires the Data Receiver to initialize the r.e0b and r.e1b counters to a non-zero value. Therefore, in either direction the initial value of the EE0B field or EE1B field in an AccECN Option (if one exists) ought to be non-zero. If AccECN has been negotiated:</t> <ul spacing="normal"> <li> <t>the TCP Server <bcp14>MAY</bcp14> check that the initial value of the EE0B field or the EE1B field is non-zero in the first segment that acknowledges sequence space that at least covers the ISN plus 1. If it runs a test and either initial value is zero, the Server will switch into a mode that ignores AccECN Options for this half-connection.</t> </li> <li> <t>the TCP Client <bcp14>MAY</bcp14> check that the initial value of the EE0B field or the EE1B field is non-zero on the SYN/ACK. 
If it runs a test and either initial value is zero, the Client will switch into a mode that ignores AccECN Options for this half-connection.</t> </li> </ul> <t>While a host is in the mode that ignores AccECN Options, it <bcp14>MUST</bcp14> adopt the conservative interpretation of the ACE field discussed in <xref target="accecn_ACE_Safety"/>.</t> <t>Note that the Data Sender <bcp14>MUST NOT</bcp14> test whether the arriving byte counters in an initial AccECN Option have been initialized to specific valid values; the above checks solely test whether these fields have been incorrectly zeroed. This allows hosts to use different initial values as an additional signalling channel in the future. Also note that the initial value of either field might be greater than its expected initial value, because the counters might already have been incremented. Nonetheless, the initial values of the counters have been chosen so that they cannot wrap to zero on these initial segments.</t> </section> <section> <name>Consistency Between AccECN Feedback Fields</name> <t>When AccECN Options are available, they ought to provide less ambiguous feedback. However, they supplement but do not replace the ACE field. An endpoint using AccECN feedback <bcp14>MUST</bcp14> always reconcile the information provided in the ACE field with that in any AccECN Option, so that the state of the ACE-related packet counter can be relied on if future feedback does not carry an AccECN Option.</t> <t>If an AccECN Option is present, the s.cep counter might increase more than expected from the increase of the s.ceb counter (e.g., due to a CE-marked control packet). The sender's response to such a situation is out of scope, and needs to be dealt with in a specification that uses ECN-capable control packets. Theoretically, this situation could also occur if a middlebox mangled an AccECN Option but not the ACE field. 
However, the Data Sender has to assume that the integrity of AccECN Options is sound, based on the above test of the well-known initial values and optionally other integrity tests (<xref target="accecn_Integrity"/>).</t> <t>If either endpoint detects that the s.ceb counter has increased but the s.cep counter has not (and by testing ACK coverage it is certain how much the ACE field has wrapped), and if there is no explanation other than an invalid protocol transition due to some form of feedback mangling, the Data Sender <bcp14>MUST</bcp14> disable sending ECN-capable packets for the remainder of the half-connection by setting the IP-ECN field in all subsequent packets to Not-ECT.<!--There is no need to say the following for forward compatibility: "If a data receiver negotiates AccECN but then deliberately makes the counters inconsistent, it MUST continue the connection even if the data sender does not disable sending ECN-capable packets."--> </t> </section> </section> <section anchor="accecn_option_usage"> <name>Usage of the AccECN TCP Option</name> <t>If a Data Receiver in AccECN mode intends to use AccECN TCP Options to provide feedback, the rules below determine when to include an AccECN TCP Option, and which fields to include, given other options might be competing for limited option space:</t> <dl newline="false" spacing="normal"> <dt>Importance of Congestion Control:</dt> <dd> <t>AccECN is for congestion control, which implementations <bcp14>SHOULD</bcp14> generally prioritize over other TCP options when there is insufficient space for all the options in use.</t> <t>If SACK has been negotiated <xref target="RFC2018"/>, and the smallest recommended AccECN Option would leave insufficient space for two SACK blocks on a particular ACK, the Data Receiver <bcp14>MUST</bcp14> give precedence to the 
SACK option (total 18 octets), because loss feedback is more critical.</t> </dd> <dt>Recommended Simple Scheme:</dt> <dd> <!-- [rfced] For ease of the reader, we suggest adding a pointer to the examples. Original: Recommended Simple Scheme: The Data Receiver SHOULD include an AccECN TCP Option on every scheduled ACK if any byte counter has incremented since the last ACK. Whenever possible, it SHOULD include a field for every byte counter that has changed at some time during the connection (see examples later). --> <t>The Data Receiver <bcp14>SHOULD</bcp14> include an AccECN TCP Option on every scheduled ACK if any byte counter has incremented since the last ACK. Whenever possible, it <bcp14>SHOULD</bcp14> include a field for every byte counter that has changed at some time during the connection (see examples later). </t> <t>A scheduled ACK means an ACK that the Data Receiver would send by its regular delayed ACK rules. Recall that <xref target="accecn_Terminology"/> defines an 'ACK' as either with data payload or without. But the above rule is worded so that, in the common case when most of the data is from a Server to a Client, the Server only includes an AccECN TCP Option while it is acknowledging data from the Client.</t> </dd> </dl> <t>When available TCP option space is limited on particular packets, the recommended scheme will need to include compromises. 
To guide the implementer, the rules below are ranked in order of importance, but the final decision has to be implementation-dependent, because tradeoffs will alter as new TCP options are defined and new use-cases arise.</t> <dl newline="false" spacing="normal"> <dt>Necessary Option Length:</dt> <dd> <t>When TCP option space is limited, an AccECN TCP Option <bcp14>MAY</bcp14> be truncated to omit one or two fields from the end of the option, as indicated by the permitted variants listed in <xref target="accecn_Fig_TCPopttab"/>, provided that the counter(s) that have changed since the previous AccECN TCP Option are not omitted.</t> <t>If there is insufficient space to include an AccECN TCP Option containing the counter(s) that have changed since the previous AccECN TCP Option, then the entire AccECN TCP Option <bcp14>MUST</bcp14> be omitted (see <xref target="accecn_option"/>).</t> </dd> <dt>Change-Triggered AccECN TCP Options:</dt> <dd> <t>If an arriving packet increments a different byte counter to that incremented by the previous packet, the Data Receiver <bcp14>SHOULD</bcp14> feed it back in an AccECN Option on the next scheduled ACK.</t> <t>For the avoidance of doubt, this rule does not concern the arrival of control packets with no payload, because they cannot alter any byte counters.</t> </dd> <dt>Continual Repetition:</dt> <dd> <t>Otherwise, if arriving packets continue to increment the same byte counter:</t> <ul spacing="normal"> <li> <t>the Data Receiver <bcp14>SHOULD</bcp14> include a counter that has continued to increment on the next scheduled ACK following a change-triggered AccECN TCP Option;</t> </li> <li> <t>while the same counter continues to increment, it <bcp14>SHOULD</bcp14> include the 
counter every n ACKs as consistently as possible, where n can be chosen by the implementer;</t> </li> <li> <t>it <bcp14>SHOULD</bcp14> always include an AccECN Option if the r.ceb counter is incrementing, and it <bcp14>MAY</bcp14> include an AccECN Option if r.e0b or r.e1b is incrementing;</t> </li> <li> <t>it <bcp14>SHOULD</bcp14> include each counter at least once for every 2^22 bytes incremented to prevent overflow during continual repetition.</t> </li> </ul> </dd> </dl> <t>The above rules complement those in <xref target="accecn_ACE_Safety"/>, which determine when to generate an ACK irrespective of whether an AccECN TCP Option is to be included.</t> <t>The recommended scheme is intended as a simple way to ensure that all the relevant byte counters will be carried on any ACK that reaches the Data Sender, no matter how many pure ACKs are filtered or coalesced along the network path, and without consuming the space available for payload data with counter field(s) that have never changed.</t> <t>As an example of the recommended scheme, if ECT(0) is the only codepoint that has ever arrived in the IP-ECN field, the Data Receiver will feed back an AccECN0 TCP Option with only the EE0B field on every packet that acknowledges new data. However, as soon as even one CE-marked packet arrives, on every packet that acknowledges new data it will start to include an option with two fields, EE0B and ECEB. As a second example, if the first packet to arrive happens to be CE marked, the Data Receiver will have to arbitrarily choose whether to precede the ECEB field with an EE0B field or an EE1B field. 
If it chooses, say, EE0B but it turns out never to receive ECT(0), it can start sending EE1B and ECEB instead; it does not have to include the EE0B field if the r.e0b counter has never changed during the connection.</t> <t>With the recommended scheme, if the data sending direction switches during a connection, there can be cases where the AccECN TCP Option that is meant to feed back the counter values at the end of a volley in one direction never reaches the other peer due to packet loss. ACE feedback ought to be sufficient to fill this gap, given that accurate feedback becomes moot after data transmission has paused.</t> <t><xref target="accecn_Algo_ACE_Bytes"/> gives an example algorithm to estimate the number of marked bytes from the ACE field alone, if AccECN Options are not available.</t> <t>If a host has determined that segments with AccECN Options always seem to be discarded somewhere along the path, it is no longer obliged to follow any of the rules in this section.</t> </section> </section> </section> <section anchor="accecn_Mbox_Operation"> <name>AccECN Compliance Requirements for TCP Proxies, Offload Engines, and Other Middleboxes</name> <t>Given that AccECN alters the TCP protocol on the wire, this section specifies new requirements on certain networking equipment that forwards TCP and inspects TCP header information.</t> <section> <name>Requirements for TCP Proxies</name> <t>A large class of middleboxes splits TCP connections. 
Such a middlebox would be compliant with the AccECN protocol if the TCP implementation on each side complied with the present AccECN specification and each side negotiated AccECN independently of the other side.</t> </section> <section anchor="accecn_middlebox_transparent_normalizers"> <name>Requirements for Transparent Middleboxes and TCP Normalizers</name> <t>Another large class of middleboxes intervenes to some degree at the transport layer, but attempts to be transparent (invisible) to the end-to-end connection. A subset of this class of middleboxes attempts to 'normalize' the TCP wire protocol by checking that all values in header fields comply with a rather narrow interpretation of the TCP specifications that is also not always up to date.</t> <t>A middlebox that is not normalizing the TCP protocol and does not itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox that intends to be transparent or invisible at the transport layer) ought to forward AccECN TCP Options unaltered, whether or not the length value matches one of those specified in <xref target="accecn_option"/>, and whether or not the initial values of the byte-counter fields match those in <xref target="accecn_init_counters"/>. This is because blocking apparently invalid values prevents the standardized set of values from being extended in the future (such outdated normalizers would block updated hosts from using the extended AccECN standard).</t> <t>A TCP normalizer is likely to block or alter an AccECN TCP Option if the length value or the initial values of its byte-counter fields do not match one of those specified in Sections <xref target="accecn_option" format="counter"/> or <xref target="accecn_init_counters" format="counter"/>. 
However, to comply with the present AccECN specification, a middlebox <bcp14>MUST NOT</bcp14> change the ACE field; or those fields of an AccECN Option that are currently specified in <xref target="accecn_option"/>; or any AccECN field covered by integrity protection (e.g., <xref target="RFC5925"/>).</t> <!-- This includes the explicitly stated requirements to forward Reserved (Rsvd) and Currently Unused (CU) values unaltered. An 'ideal' TCP normalizer would not have to change to accommodate AccECN, because AccECN does not directly contravene any existing TCP specifications, even though it uses existing TCP fields in unorthodox ways. --> </section> <section> <name>Requirements for TCP ACK Filtering</name> <!-- [rfced] Mention of BCP 69 was removed so the HTML and PDF could link directly to Section 5.2.1 of RFC 3449. Would you prefer that BCP 69 be included as the cite tag? Original: Section 5.2.1 of BCP 69 [RFC3449] gives best current practice on filtering (aka. thinning or coalescing) of pure TCP ACKs. Perhaps: Section 5.2.1 of RFC 3449 [BCP69] gives best current practice on filtering (aka thinning or coalescing) of pure TCP ACKs. --> <t><xref target="RFC3449" sectionFormat="of" section="5.2.1"/> gives best current practice on filtering (aka thinning or coalescing) of pure TCP ACKs. It advises that filtering ACKs carrying ECN feedback ought to preserve the correct operation of ECN feedback. As the present specification updates the operation of ECN feedback, this section discusses how an ACK filter might preserve correct operation of AccECN feedback as well.</t> <t>The problem divides into two parts: determining if an ACK is part of a connection that is using AccECN and then preserving the correct operation of AccECN feedback:</t> <ul spacing="normal"> <li> <!-- [rfced] Does "even if it is" refer to using AccECN without ECN++ or with ECN++? 
Original: However, it might omit some AccECN ACKs, because AccECN can be used without ECN++ and even if it is, ECN++ does not have to make pure ACKs ECN-capable - only deployment experience will tell. Perhaps: However, it might omit some AccECN ACKs because AccECN can be used without ECN++. Even if ECN++ is used, it does not have to make pure ACKs ECN-capable - only deployment experience will tell. --> <t>To determine whether a pure TCP ACK is part of an AccECN connection without resorting to connection tracking and per-flow state, a useful heuristic would be to check for a non-zero ECN field at the IP layer (because the ECN++ experiment only allows TCP pure ACKs to be ECN-capable if AccECN has been negotiated <xref target="I-D.ietf-tcpm-generalized-ecn"/>). This heuristic is simple and stateless. However, it might omit some AccECN ACKs because AccECN can be used without ECN++. Even if ECN++ is used, it does not have to make pure ACKs ECN-capable; only deployment experience will tell. Also, TCP ACKs might be ECN-capable owing to some scheme other than AccECN, e.g., <xref target="RFC5690"/> or some future standards action. Again, only deployment experience will tell.</t> </li> <li> <t>The main concern with preserving correct AccECN operation involves leaving enough ACKs for the Data Sender to work out whether the 3-bit ACE field has wrapped. In the worst case, in feedback about a run of received packets that were all ECN-marked, the ACE field will wrap every 8 acknowledged packets. ACE field wrap might be of less concern if packets also carry AccECN TCP Options. However, note that logic to read an AccECN TCP Option is optional to implement (albeit recommended; see <xref target="accecn_option"/>). 
So one end writing an AccECN TCP Option into a packet does not necessarily imply that the other end will read it.</t> </li> </ul> <t>Note that the present specification of AccECN in TCP does not presume to rely on any of the above ACK filtering behaviour in the network, because it has to be robust against pre-existing network nodes that do not distinguish AccECN ACKs, and robust against ACK loss during overload more generally.</t> </section> <section> <name>Requirements for TCP Segmentation Offload and Large Receive Offload</name> <t>Hardware to offload certain TCP processing represents another large class of middleboxes (even though it is often a function of a host's network interface and rarely in its own 'box').</t> <t>Offloading can happen in the transmit path, usually referred to as TCP Segmentation Offload (TSO), and the receive path where it is called Large Receive Offload (LRO).</t> <t>In the transmit direction, with AccECN, all segments created from the same super-segment should retain the same ACE field, which should make TSO straightforward.</t> <t>However, with TSO hardware that supports <xref target="RFC3168"/>, the CWR bit is usually masked out on the middle and last segments. If applied to an AccECN segment, this would change the ACE field, and would be interpreted as having received numerous CE marks in the receive direction. 
Therefore, currently available TSO hardware with <xref target="RFC3168"/> support may need some minor driver changes, to adjust the bitmask for the first, middle, and last segments processed with TSO.</t> <t>Initially, when Classic ECN <xref target="RFC3168"/> and Accurate ECN flows coexist on the same offloading engine, the host software may need to work around incompatibilities (e.g., when only global configurable TSO TCP Flag bitmasks are available), otherwise this would cause some issues.</t> <!-- [rfced] Instead of using [RFC3168] as an adjective, may we update this text to refer to "Classic ECN"? Original: One way around this could be to only negotiate for Accurate ECN, but not offer a fall back to [RFC3168] ECN. Perhaps: One way around this could be to only negotiate for Accurate ECN, but not offer a fall back to Classic ECN [RFC3168]. Original: For LRO in the receive direction, a different issue may get exposed with [RFC3168] ECN supporting hardware. Perhaps: For LRO in the receive direction, a different issue may get exposed with Classic-ECN [RFC3168] supporting hardware. --> <t>One way around this could be to only negotiate for Accurate ECN, but not offer a fall back to <xref target="RFC3168"/> ECN. Another way could be to allow TSO only as long as the CWR flag in the TCP header is not set, at the cost of more processing overhead while the ACE field has this bit set.</t> <t>For LRO in the receive direction, a different issue may get exposed with <xref target="RFC3168"/> ECN supporting hardware.</t> <t>The ACE field changes with every received CE marking, so today's receive offloading could lead to many interrupts in high congestion situations. 
Although that would be useful (because congestion information is received sooner), it could also significantly increase processor load, particularly in scenarios such as DCTCP or L4S where the marking rate is generally higher.</t> <t>Current offload hardware ejects a segment from the coalescing process whenever the TCP ECN flags change. In data centres, it has been fortunate for this offload hardware that DCTCP-style feedback changes less often when there are long sequences of CE marks, which is more common with a step marking threshold (but less likely the more short flows are in the mix). The ACE counter approach has been designed so that coalescing can continue over arbitrary patterns of marking and only needs to stop when the counter wraps. Nonetheless, until the particular offload hardware in use implements this more efficient approach, it is likely to be more efficient for AccECN connections to implement this counter-style logic using software segmentation offload.</t> <t>ECN encodes a varying signal in the ACK stream, so it is inevitable that offload hardware will ultimately need to handle any form of ECN feedback exceptionally. The ACE field has been designed as a counter so that it is straightforward for offload hardware to pass on the highest counter, and to push a segment from its cache before the counter wraps. The purpose of working towards standardized TCP ECN feedback is to reduce the risk for hardware developers, who would otherwise have to guess which scheme is likely to become dominant.</t> <t>The above process has been designed to enable a continuing incremental deployment path to more highly dynamic congestion control. Once offload hardware supports AccECN, it will be able to coalesce efficiently for any sequence of marks, instead of relying on the long marking sequences from step marking for efficiency. In the next stage, marking can evolve from a step to a ramp function. 
That in turn will allow host congestion control algorithms to respond faster to dynamics, while being backwards compatible with existing host algorithms.</t> </section> </section> </section> <section anchor="accecn_3168_updates"> <name>Updates to RFC 3168</name> <t>This section clarifies which parts of RFC 3168 are updated and maps them to the relevant updated sections of the present AccECN specification.</t> <ul spacing="normal"> <li> <t>The whole of <xref target="RFC3168" sectionFormat="of" section="6.1.1"/> is updated by <xref target="accecn_Negotiation"/> of the present specification.</t> </li> <!-- [rfced] Throughout: We have removed the section titles and linked the section numbers directly to the section of the RFC specified. For example, the text has been updated as follows: Original: * The whole of "6.1.1 TCP Initialization" of [RFC3168] is updated by Section 3.1 of the present specification. Current: * The whole of Section 6.1.1 of [RFC3168] is updated by Section 3.1 of the present specification. In the HTML and PDF files, "Section 6.1.1" links to Section 6.1.1 of RFC 3168. Please review and let us know if you prefer the section titles be included. --> <li> <t>In <xref target="RFC3168" sectionFormat="of" section="6.1.2"/>, all mentions of a congestion response to an ECN-Echo (ECE) ACK packet are updated by <xref target="accecn_feedback"/> of the present specification to mean an increment to the sender's count of CE-marked packets, s.cep. And the requirements to set the CWR flag no longer apply, as specified in <xref target="accecn_implications_accecn_mode"/> of the present specification. Otherwise, the remaining requirements in 
<xref target="RFC3168" sectionFormat="of" section="6.1.2"/> still stand.</t> <!-- [rfced] We are unclear why "potentially updates" is mentioned here. Is it mentioned to cover implementations of RFC 3168 that have not been updated yet and/or potential future updates? Otherwise, may it be cut? Original: It will be noted that RFC 8311 already updates, or potentially updates, a number of the requirements in "6.1.2. The TCP Sender". --> <t>It will be noted that <xref target="RFC8311"/> already updates, or potentially updates, a number of the requirements in <xref target="RFC3168" sectionFormat="of" section="6.1.2"/>. Section 6.1.2 of RFC 3168 extended standard TCP congestion control <xref target="RFC5681"/> to cover ECN marking as well as packet drop. Whereas, <xref target="RFC8311"/> enables experimentation with alternative responses to ECN marking, if specified for instance by an Experimental RFC produced by the IETF Stream. <xref target="RFC8311"/> also strengthened the statement that "ECT(0) <bcp14>SHOULD</bcp14> be used" to a "<bcp14>MUST</bcp14>" (see <xref target="RFC8311"/> for the details).</t> </li> <li> <t>The whole of <xref target="RFC3168" sectionFormat="of" section="6.1.3"/> is updated by <xref target="accecn_feedback"/> of the present specification, with the exception of the last paragraph (about congestion response to drop and ECN in the same round trip), which still stands. Incidentally, this last paragraph is in the wrong section, because it relates to "TCP Sender" behaviour.</t> </li> <li> <t>The following text within 
<xref target="RFC3168" sectionFormat="of" section="6.1.5"/>:</t> <blockquote><t>the TCP data receiver <bcp14>SHOULD</bcp14> ignore the ECN field on arriving data packets that are outside of the receiver's current window.</t></blockquote> <t>is updated by more stringent acceptability tests for any packet (not just data packets) in the present specification. Specifically, in the normative specification of AccECN (<xref target="accecn_Spec"/>), only 'Acceptable' packets contribute to the ECN counters at the AccECN receiver and <xref target="accecn_Terminology"/> defines an Acceptable packet as one that passes acceptability tests equivalent in strength to those in both <xref target="RFC9293"/> and <xref target="RFC5961"/>.</t> </li> <li> <t>Sections <xref target="RFC3168" sectionFormat="bare" section="5.2"/>, <xref target="RFC3168" sectionFormat="bare" section="6.1.1"/>, <xref target="RFC3168" sectionFormat="bare" section="6.1.4"/>, <xref target="RFC3168" sectionFormat="bare" section="6.1.5"/>, and <xref target="RFC3168" sectionFormat="bare" section="6.1.6"/> of <xref target="RFC3168"/> prohibit use of ECN on TCP control packets and retransmissions. The present specification does not update that aspect of <xref target="RFC3168"/>, but it does say what feedback an AccECN Data Receiver ought to provide if it receives an ECN-capable control packet or retransmission. 
This ensures AccECN is forward compatible with any future scheme that allows ECN on these packets, as provided for in <xref target="RFC8311" sectionFormat="of" section="4.3"/> and as proposed in <xref target="I-D.ietf-tcpm-generalized-ecn"/>.</t> </li> </ul> </section> <section anchor="accecn_Interact_Variants"> <name>Interaction with TCP Variants</name> <t>This section is informative, not normative.</t> <section anchor="accecn_Interaction_SYN_Cookies"> <name>Compatibility with SYN Cookies</name> <t>A TCP Server can use SYN Cookies (see <xref section="A" target="RFC4987"/>) to protect itself from SYN flooding attacks. It places minimal commonly used connection state in the SYN/ACK, and deliberately does not hold any state while waiting for the subsequent ACK (e.g., it closes the thread). Therefore, it cannot record the fact that it entered AccECN mode for both half-connections. Indeed, it cannot even remember whether it negotiated the use of Classic ECN <xref target="RFC3168"/>.</t> <t>Nonetheless, such a Server can determine that it negotiated AccECN as follows. 
If a TCP Server using SYN Cookies supports AccECN and if it receives a pure ACK that acknowledges an ISN that is a valid SYN cookie, and if the ACK contains an ACE field with the value 0b010 to 0b111 (decimal 2 to 7), the Server can infer the first two stages of the handshake:</t> <ul spacing="normal"> <li> <t>the TCP Client has to have requested AccECN support on the SYN;</t> </li> <li> <t>then, even though the Server kept no state, it has to have confirmed that it supported AccECN.</t> </li> </ul> <t>Therefore, the Server can switch itself into AccECN mode, and continue as if it had never forgotten that it switched itself into AccECN mode earlier.</t> <t>If the pure ACK that acknowledges a SYN cookie contains an ACE field with the value 0b000 or 0b001, these values indicate that the TCP Client did not request support for AccECN; therefore, the Server does not enter AccECN mode for this connection. Further, 0b001 on the ACK implies that the Server sent an ECN-capable SYN/ACK, which was marked CE in the network, and the non-AccECN TCP Client fed this back by setting ECE on the ACK of the SYN/ACK.</t> </section> <section anchor="accecn_Interaction_Other"> <name>Compatibility with TCP Experiments and Common TCP Options</name> <t>AccECN is compatible (at least on paper) with the most commonly used TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It is also compatible with Multipath TCP (MPTCP <xref target="RFC8684"/>) and the experimental TCP option TCP Fast Open (TFO <xref target="RFC7413"/>). AccECN is friendly to all these protocols, because space for TCP options is particularly scarce on the SYN, where AccECN consumes zero additional header space.</t> <!-- [rfced] As we believe "pressure" refers to options vying for limited space, perhaps this update would be more clear? 
Original: When option space is under pressure from other options, Section 3.2.3.3 provides guidance on how important it is to send an AccECN Option relative to other options, and which fields are more important to include. Perhaps: Because option space is limited, Section 3.2.3.3 provides guidance on how important it is to send an AccECN Option relative to other options and specifies which fields are more important to include. --> <t>When option space is under pressure from other options, <xref target="accecn_option_usage"/> provides guidance on how important it is to send an AccECN Option relative to other options, and which fields are more important to include.</t> <t>Implementers of TFO need to take careful note of the recommendation in <xref target="accecn_ACE_3rdACK"/>. That section recommends that, if the TCP Client has successfully negotiated AccECN, when acknowledging the SYN/ACK, even if it has data to send, it sends a pure ACK immediately before the data. Then it can reflect the IP-ECN field of the SYN/ACK on this pure ACK, which allows the Server to detect ECN mangling. Note that, as specified in <xref target="accecn_feedback"/>, any data on the SYN (SYN=1, ACK=0) is not included in any of the byte counters held locally for each ECN marking, nor in the AccECN Option on the wire.</t> <t>AccECN feedback is compatible with the ECN++ experiment <xref target="I-D.ietf-tcpm-generalized-ecn"/>, which allows TCP control packets and retransmissions to be ECN-capable (<xref target="RFC3168"/> was updated by <xref target="RFC8311"/> to permit such experiments). AccECN is likely to inherently support any experiment with ECN-capable packets, because it feeds back the contents of the ECN field mechanistically, without judging whether or not a packet ought to use the ECN capability (<xref target="accecn_demb_reflector"/>). 
This specification does not discuss implementing AccECN alongside <xref target="RFC5562"/>, which was an earlier experimental protocol with narrower scope than ECN++ and a 5-way handshake.</t> </section> <section anchor="accecn_Integrity"> <name>Compatibility with Feedback Integrity Mechanisms</name> <t>Three alternative mechanisms are available to assure the integrity of ECN and/or loss signals. AccECN is compatible with any of these approaches:</t> <ul spacing="normal"> <li> <t>The Data Sender can test the integrity of the receiver's ECN (or loss) feedback by occasionally setting the IP-ECN field to a value normally only set by the network (and/or deliberately leaving a sequence number gap). Then it can test whether the Data Receiver's feedback faithfully reports what it expects (similar to paragraph 2 of <xref target="RFC3168" sectionFormat="of" section="20.2"/>). Unlike the ECN-nonce <xref target="RFC3540"/>, this approach does not waste the ECT(1) codepoint in the IP header, it does not require standardization, and it does not rely on misbehaving receivers volunteering to reveal feedback information that allows them to be detected. However, setting the CE mark by the sender might conceal actual congestion feedback from the network and therefore ought to only be done sparingly.</t> </li> <li> <t>Networks generate congestion signals when they are becoming congested, so networks are more likely than Data Senders to be concerned about the integrity of the receiver's feedback of these signals. A network can enforce a congestion response to its ECN markings (or packet losses) using congestion exposure (ConEx) audit <xref target="RFC7713"/>. 
Whether the receiver or a downstream network is suppressing congestion feedback, or the sender is unresponsive to the feedback, or both, ConEx audit can neutralize any advantage that any of these three parties would otherwise gain.</t> <!-- [rfced] Please confirm "experimental" is correct here. We ask because RFC 7713 is an Informational RFC. Original: ConEx is an experimental change to the Data Sender that would be most useful when combined with AccECN. --> <t>ConEx is an experimental change to the Data Sender that would be most useful when combined with AccECN. Without AccECN, the ConEx behaviour of a Data Sender would have to be more conservative than would be necessary if it had the accurate feedback of AccECN.</t> </li> <li> <t>The Standards Track TCP authentication option (TCP-AO <xref target="RFC5925"/>) can be used to detect any tampering with AccECN feedback between the Data Receiver and the Data Sender (whether malicious or accidental). The AccECN fields are immutable end to end, so they are amenable to TCP-AO protection, which covers TCP options by default. However, TCP-AO is often too brittle to use on many end-to-end paths, where middleboxes can make verification fail in their attempts to improve performance or security, e.g., Network Address Translation (NAT) and Network Address Port Translation (NAPT), resegmentation, or shifting the sequence space.</t> </li> </ul> </section> </section> <!-- ================================================================ --> <section anchor="accecn_Properties"> <name>Summary: Protocol Properties</name> <t>This section is informative, not normative. 
It describes how well the protocol satisfies the agreed requirements for a more Accurate ECN feedback protocol <xref target="RFC7560"/>.</t> <dl newline="false" spacing="normal"> <dt>Accuracy:</dt> <dd>From each ACK, the Data Sender can infer the number of new CE-marked segments since the previous ACK. This provides better accuracy on CE feedback than Classic ECN. In addition, if an AccECN Option is present (not blocked by the network path), the numbers of bytes marked with CE, ECT(1), and ECT(0) are provided.</dd> <dt>Overhead:</dt> <dd>The AccECN scheme is divided into two parts. The essential feedback part reuses the three flags already assigned to ECN in the TCP header. The supplementary feedback part adds an additional TCP option consuming up to 11 bytes. However, no TCP option space is consumed in the SYN.</dd> <dt>Ordering:</dt> <dd>The order in which marks arrive at the Data Receiver is preserved in AccECN feedback, because the Data Receiver is expected to send an ACK immediately whenever a different mark arrives.</dd> <dt>Timeliness:</dt> <dd>While the same ECN markings are arriving continually at the Data Receiver, it can defer ACKs as TCP does normally, but it will immediately send an ACK as soon as a different ECN marking arrives.</dd> <dt>Timeliness vs Overhead:</dt> <dd>Change-Triggered ACKs are intended to enable latency-sensitive uses of ECN feedback by capturing the timing of transitions but not wasting resources while the state of the signalling system is stable. 
Within the constraints of the change-triggered ACK rules, the receiver can control how frequently it sends AccECN TCP Options and therefore to some extent it can control the overhead induced by AccECN.</dd> <dt>Resilience:</dt> <dd>All information is provided based on counters. Therefore if ACKs are lost, the counters on the first ACK following the losses allow the Data Sender to immediately recover the number of the ECN markings that it missed. If data or ACKs are reordered, stale congestion information can be identified and ignored.</dd> <dt>Resilience against Bias:</dt> <dd>Because feedback is based on repetition of counters, random losses do not remove any information, they only delay it. Therefore, even though some ACKs are change-triggered, random losses will not alter the proportions of the different ECN markings in the feedback.</dd> <dt>Resilience vs Overhead:</dt> <dd>If space is limited in some segments (e.g., because more options are needed on some segments, such as the SACK option after loss), the Data Receiver can send AccECN Options less frequently or truncate fields that have not changed, usually down to as little as 5 bytes.</dd> <dt>Resilience vs Timeliness and Ordering:</dt> <dd>Ordering information and the timing of transitions cannot be communicated in three cases: i) during ACK loss; ii) if something on the path strips AccECN Options; or iii) if the Data Receiver is unable to support Change-Triggered ACKs. 
Following ACK reordering, the Data Sender can reconstruct the order in which feedback was sent, but not until all the missing feedback has arrived.</dd> <dt>Complexity:</dt> <dd>An AccECN implementation solely involves simple counter increments, some modulo arithmetic to communicate the least significant bits and allow for wrap, and some heuristics for safety against fields cycling due to prolonged periods of ACK loss. Each host needs to maintain eight additional counters. The hosts have to apply some additional tests to detect tampering by middleboxes, but in general the protocol is simple to understand and implement, and requires few cycles per packet to execute.</dd> <dt>Integrity:</dt> <dd>AccECN is compatible with at least three approaches that can assure the integrity of ECN feedback. If AccECN Options are stripped, the resolution of the feedback is degraded, but the integrity of this degraded feedback can still be assured.</dd> <dt>Backward Compatibility:</dt> <dd> <t>If only one endpoint supports the AccECN scheme, it will fall back to the most advanced ECN feedback scheme supported by the other end.</t> <t>If AccECN Options are stripped by a middlebox, AccECN still provides basic congestion feedback in the ACE field. Further, AccECN can be used to detect mangling of the IP-ECN field; mangling of the TCP ECN flags; blocking of ECT-marked segments; and blocking of segments carrying an AccECN Option. It can detect these conditions during TCP's three-way handshake so that it can fall back to operation without ECN and/or operation without AccECN Options.</t> </dd> <dt>Forward Compatibility:</dt> <dd>The behaviour of endpoints and middleboxes is carefully defined for all reserved or currently unused codepoints in the scheme. 
Then, the designers of security devices can understand which currently unused values might appear in the future. So, even if they choose to treat such values as anomalous while they are not widely used, any blocking will at least be under policy control and not hard-coded. Then, if previously unused values start to appear on the Internet (or in standards), such policies could be quickly reversed.</dd> </dl> </section> <!-- ================================================================ --> <section anchor="accecn_IANA_Considerations"> <name>IANA Considerations</name> <t>This document reassigns the TCP header flag at bit offset 7 to the AccECN protocol. This bit was previously called the Nonce Sum (NS) flag <xref target="RFC3540"/>, but RFC 3540 has been reclassified as Historic <xref target="RFC8311"/>. The flag is now defined as the following in the "TCP Header Flags" registry in the "Transmission Control Protocol (TCP) Parameters" registry group:</t> <table> <name>TCP Header Flag Reassignment</name> <thead> <tr> <th>Bit</th> <th>Name</th> <th>Reference</th> <th>Assignment Notes</th> </tr> </thead> <tbody> <tr> <td>7</td> <td>AE (Accurate ECN)</td> <td>RFC 9768</td> <td>Previously used as NS (Nonce Sum) by <xref target="RFC3540"/>, which is now Historic <xref target="RFC8311"/></td> </tr> </tbody> </table> <t>This document also defines two new TCP options for AccECN from the TCP option space. 
These values are defined as the following in the "TCP Option Kind Numbers" registry in the "Transmission Control Protocol (TCP) Parameters" registry group:</t> <table> <name>New TCP Option Assignments</name> <thead> <tr> <th>Kind</th> <th>Length</th> <th>Meaning</th> <th>Reference</th> </tr> </thead> <tbody> <tr> <td>172</td> <td>N</td> <td>Accurate ECN Order 0 (AccECN0)</td> <td>RFC 9768</td> </tr> <tr> <td>174</td> <td>N</td> <td>Accurate ECN Order 1 (AccECN1)</td> <td>RFC 9768</td> </tr> </tbody> </table> <!-- [rfced] We have updated the registry title per the note below from IANA. While draft-ietf-tsvwg-udp-options has not yet been published, this title matches what currently appears on the IANA site. Please let us know any concerns. NOTE: The name of the registry called "TCP Experimental Option Experiment Identifiers (TCP ExIDs)" in the IANA Considerations section has been changed to "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP ExIDs)," per draft-ietf-tsvwg-udp-options-45. 
Original: Early experimental implementations of the two AccECN Options used experimental option 254 per [RFC6994] with the 16-bit magic numbers 0xACC0 and 0xACC1 respectively for Order 0 and 1, as allocated in the IANA "TCP Experimental Option Experiment Identifiers (TCP ExIDs)" registry. --> <t>Early experimental implementations of the two AccECN Options used experimental option 254 per <xref target="RFC6994"/> with the 16-bit magic numbers 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP ExIDs)" registry. Even earlier experimental implementations used the single magic number 0xACCE (16 bits). Uses of these experimental options <bcp14>SHOULD</bcp14> migrate to use the new option kinds (172 and 174).</t> </section> <!-- ================================================================ --> <section anchor="accecn_Security_Considerations"> <name>Security and Privacy Considerations</name> <t>If ever the supplementary feedback part of AccECN that is based on one of the new AccECN TCP Options is unusable (due for example to middlebox interference), the essential feedback part of AccECN's congestion feedback offers only limited resilience to long runs of ACK loss (see <xref target="accecn_ACE_Safety"/>). 
These problems are unlikely to be due to malicious intervention (because if an attacker could strip a TCP option or discard a long run of ACKs, it could wreak other arbitrary havoc). However, it would be of concern if AccECN's resilience could be indirectly compromised during a flooding attack. AccECN is still considered safe though, because if AccECN Options are not present, the AccECN Data Sender is then required to switch to more conservative assumptions about wrap of congestion indication counters (see <xref target="accecn_ACE_Safety"/> and <xref target="accecn_Algo_ACE_Wrap"/>).</t> <t><xref target="accecn_Interaction_SYN_Cookies"/> describes how a TCP Server can negotiate AccECN and use the SYN cookie method for mitigating SYN flooding attacks.</t> <t>There is concern that ECN feedback could be altered or suppressed, particularly because a misbehaving Data Receiver could increase its own throughput at the expense of others. AccECN is compatible with the three schemes known to assure the integrity of ECN feedback (see <xref target="accecn_Integrity"/> for details). If AccECN Options are stripped by an incorrectly implemented middlebox, the resolution of the feedback will be degraded, but the integrity of this degraded information can still be assured. Assuring that Data Senders respond appropriately to ECN feedback is possible, but the scope of the present document is confined to the feedback protocol and excludes the response to this feedback.</t> <!-- [rfced] Please consider whether the placement of B at the end of the sentence is correct. Original: This opens up a potential covert channel of up to 29B (40 - (2+3*3)) B. --> <t>In <xref target="accecn_option"/>, a Data Sender is allowed to ignore an unrecognized TCP AccECN Option length and read as many whole 3-octet fields from it as possible up to a maximum of 3, treating the remainder as padding. This opens up a potential covert channel of up to 29 B (40 - (2+3*3)).
However, it is really an overt channel (not hidden) and it is no different than the use of unknown TCP options with unknown option lengths in general. Therefore, where this is of concern, it can already be adequately mitigated by regular TCP normalizer technology (see <xref target="accecn_middlebox_transparent_normalizers"/>).</t> <t>The AccECN protocol is not believed to introduce any new privacy concerns, because it merely counts and feeds back signals at the transport layer that had already been visible at the IP layer. A covert channel can be used to compromise privacy. However, as explained above, undefined TCP options in general open up such channels, and common techniques are available to close them off.</t> <!-- [rfced] This sentence reads a bit awkwardly. Perhaps this can be rephrased? Original: No known way can yet be contrived for a receiver to take advantage of this behaviour, which seems to always degrade its own performance. Perhaps: Currently, there is no known way for a receiver to take advantage of this behaviour, which seems to always degrade its own performance. --> <t>There is a potential concern that a Data Receiver could deliberately omit AccECN Options, pretending that they had been stripped by a middlebox. Currently, there is no known way for a receiver to take advantage of this behaviour, which seems to always degrade its own performance. However, the concern is mentioned here for completeness.</t> <!-- [rfced] Instead of "show up more easily", perhaps "be more easily identified" would improve readability? Original: A generic privacy concern of any new protocol is that for a while it will be used by a small population of hosts, and thus show up more easily. --> <!-- [rfced] We have updated the text as shown below. Please let us know any concerns.
Original: However, it is expected that this option will become available in operating systems over time, and eventually turned on by default in them. Current: However, it is expected that AccECN will become available in operating systems over time and that it will eventually be turned on by default. --> <t>A generic privacy concern of any new protocol is that for a while it will be used by a small population of hosts, and those hosts will thus be more easily identified. However, it is expected that AccECN will become available in operating systems over time and that it will eventually be turned on by default. Thus, an individual identification of a particular user is less of a concern than the fingerprinting of specific versions of operating systems. However, the latter can be done using different means independent of Accurate ECN.</t> <t>As Accurate ECN exposes more bits in the TCP header that could be tampered with without interfering with the transport excessively, it may allow an additional way to identify specific data streams across a virtual private network (VPN) to an attacker that has access to the datastream before and after the VPN tunnel endpoints. This may be achieved by injecting or modifying the ACE field in specific patterns that can be recognized.</t> <t>Overall, Accurate ECN does not change the risk profile on privacy to a user dramatically beyond what is already possible using Classic ECN.
However, in order to prevent such attacks and means of easier identification of flows, it is advisable for privacy-conscious users behind VPNs not to enable Accurate ECN, or Classic ECN for that matter.</t> </section> </middle> <back> <displayreference target="I-D.ietf-tcpm-generalized-ecn" to="ECN++"/> <references> <name>References</name> <references> <name>Normative References</name> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9293.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2018.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2883.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5961.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/> </references> <references> <name>Informative References</name> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3449.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3540.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4987.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5562.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5681.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5690.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5925.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8684.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6994.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6582.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7323.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7560.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7413.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7713.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8257.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8311.xml"/> <!-- [I-D.ietf-tcpm-generalized-ecn] draft-ietf-tcpm-generalized-ecn-17 IESG State: I-D Exists as of 04/25/25. --> <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-tcpm-generalized-ecn.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7141.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9260.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9000.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6679.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9438.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9040.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8511.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9330.xml"/> <!-- [rfced] [RoCEv2] Please review. We could not confirm the Volume or Release number for this reference. Note that there is information at the current URL which mentions "Volume 1 Release 1.8" (see: https://www.infinibandta.org/wp-content/uploads/2024/09/IBTA-Overview-of-IBTA-Volume-1-Release-1.8.pdf). Would you like us to update this reference to Release 1.8, use a version-less reference, or keep the Release 1.4 version of the reference? Current: [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture Specification", Volume 1, Release 1.4, 2020, <https://www.infinibandta.org/ibta-specification/>. Perhaps: [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture Specification", <https://www.infinibandta.org/ibta-specification/>. OR [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture Specification", Volume 1, Release 1.8, July 2024, <https://www.infinibandta.org/ibta-specification/>. --> <reference anchor="RoCEv2" target="https://www.infinibandta.org/ibta-specification/"> <front> <title>InfiniBand Architecture Specification</title> <author> <organization>InfiniBand Trade Association</organization> </author> <date year="2020"/> </front> <refcontent>Volume 1, Release 1.4</refcontent> <!-- https://cw.infinibandta.org/document/dl/7781"/> --> </reference> <reference anchor="Mandalari18" target="http://www.it.uc3m.es/amandala/ecn++/ecn_commag_2018.html"> <front> <title>Measuring ECN++: Good News for ++, Bad News for ECN over Mobile</title> <author fullname="Anna Mandalari" initials="A." surname="Mandalari"> <organization>UC3M</organization> </author> <author fullname="Andra Lutu" initials="A."
surname="Lutu"> <organization>Simula</organization> <address> <postal> <street/> <city/> <region/> <code/> <country/> </postal> <phone/> <email/> <uri/> </address> </author> <author fullname="Bob Briscoe" initials="B." surname="Briscoe"> <organization>Simula</organization> <address> <postal> <street/> <city/> <region/> <code/> <country/> </postal> <phone/> <email/> <uri/> </address> </author> <author fullname="Marcelo Bagnulo" initials="M." surname="Bagnulo"> <organization>UC3M</organization> <address> <postal> <street/> <city/> <region/> <code/> <country/> </postal> <phone/> <email/> <uri/> </address> </author> <author fullname="Özgü Alay" initials="Ö." surname="Alay"> <organization>Simula</organization> <address> <postal> <street/> <city/> <region/> <code/> <country/> </postal> <phone/> <email/> <uri/> </address> </author> <date month="March" year="2018"/> </front> <seriesInfo name="IEEE Communications Magazine" value=""/> </reference> </references> </references> <section anchor="accecn_Algo_Examples"> <name>Example Algorithms</name> <!-- [rfced] May we update "implement" to "satisfy" to clarify the text and avoid "implementers implement"? Original: However, implementers are free to choose other ways to implement the requirements. --> <t>This appendix is informative, not normative. It gives example algorithms that would satisfy the normative requirements of the AccECN protocol. However, implementers are free to choose other ways to satisfy the requirements.</t> <!-- [rfced] The following note was included in the XML. ToDo: Note to RFC Editor: Pls change all bare <artwork> elements (without any keywords like align) to <sourcecode>.
Reason: My XML editor doesn't support the <sourcecode> element, so it mangles line breaks within sourcecode, ignoring even CDATA protection. We have updated the XML file as noted. Please let us know how/if the "type" attribute of each sourcecode element should be set. Perhaps some/all should be marked as pseudocode? If the current list of preferred values for "type" (https://www.rfc-editor.org/rpc/wiki/doku.php?id=sourcecode-types) does not contain an applicable type, then feel free to let us know. Also, it is acceptable to leave the "type" attribute not set. --> <section anchor="accecn_Algo_Option_Coding"> <name>Example Algorithm to Encode/Decode the AccECN Option</name> <t><!--ToDo: Example code to check the AccECN Option fields are consistent with the ACE field.-->The example algorithms below show how a Data Receiver in AccECN mode could encode its CE byte counter r.ceb into the ECEB field within an AccECN TCP Option, and how a Data Sender in AccECN mode could decode the ECEB field into its byte counter s.ceb. The other counters for bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly encoded and decoded.</t> <t>It is assumed that each local byte counter is an unsigned integer wider than 24 bits (probably 32 bits), and that the following constant has been assigned:</t> <sourcecode><![CDATA[ DIVOPT = 2^24]]></sourcecode> <t>Every time a CE-marked data segment arrives, the Data Receiver increments its local value of r.ceb by the size of the TCP Data. Whenever it sends an ACK with an AccECN Option, the value it writes into the ECEB field is</t> <sourcecode><![CDATA[ ECEB = r.ceb % DIVOPT]]></sourcecode> <t>where '%' is the remainder operator.</t> <t>On the arrival of an AccECN Option, the Data Sender first makes sure the ACK has not been superseded in order to avoid winding the s.ceb counter backwards.
It uses the TCP acknowledgement number and any SACK options <xref target="RFC2018"/> to calculate newlyAckedB, the amount of new data that the ACK acknowledges in bytes (newlyAckedB can be zero but not negative). If newlyAckedB is zero, either the ACK has been superseded or CE-marked packet(s) without data could have arrived. To break the tie for the latter case, the Data Sender could use timestamps <xref target="RFC7323"/> (if present) to work out newlyAckedT, the amount of new time that the ACK acknowledges. If the Data Sender determines that the ACK has been superseded, it ignores the AccECN Option. Otherwise, the Data Sender calculates the minimum non-negative difference d.ceb between the ECEB field and its local s.ceb counter, using modulo arithmetic as follows:</t> <sourcecode><![CDATA[
   if ((newlyAckedB > 0) || (newlyAckedT > 0)) {
       d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT
       s.ceb += d.ceb
   }
]]></sourcecode> <t>For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), then</t> <sourcecode><![CDATA[
   s.ceb % DIVOPT = 1
   d.ceb = (1461 + 2^24 - 1) % 2^24
         = 1460
   s.ceb = 33,554,433 + 1460
         = 33,555,893
]]></sourcecode> <t>In practice, an implementation might use heuristics to guess the feedback in missing ACKs. Then, when it subsequently receives feedback, it might find that it needs to correct its earlier heuristics as part of the decoding process. The above decoding process does not include any such heuristics.</t> </section> <section anchor="accecn_Algo_ACE_Wrap"> <name>Example Algorithm for Safety Against Long Sequences of ACK Loss</name> <t>The example algorithms below show how a Data Receiver in AccECN mode could encode its CE packet counter r.cep into the ACE field, and how the Data Sender in AccECN mode could decode the ACE field into its s.cep counter.
The Data Sender's algorithm includes code to heuristically detect a long enough unbroken string of ACK losses that could have concealed a cycle of the congestion counter in the ACE field of the next ACK to arrive.</t> <t>Two variants of the algorithm are given: i) a more conservative variant for a Data Sender to use if it detects that AccECN Options are not available (see <xref target="accecn_ACE_Safety"/> and <xref target="accecn_Mbox_Interference"/>); and ii) a less conservative variant that is feasible when complementary information is available from AccECN Options.</t> <section> <name>Safety Algorithm Without the AccECN Option</name> <t>It is assumed that each local packet counter is a sufficiently sized unsigned integer (probably 32 bits) and that the following constant has been assigned:</t> <sourcecode><![CDATA[ DIVACE = 2^3]]></sourcecode> <t>Every time an Acceptable CE-marked packet arrives (<xref target="accecn_sec_ACE_feedback"/>), the Data Receiver increments its local value of r.cep by 1. It repeats the same value of ACE in every subsequent ACK until the next CE marking arrives, where</t> <sourcecode><![CDATA[ ACE = r.cep % DIVACE.]]></sourcecode> <t>If the Data Sender received an earlier value of the counter that had been delayed due to ACK reordering, it might incorrectly calculate that the ACE field had wrapped. Therefore, on the arrival of every ACK, the Data Sender ensures the ACK has not been superseded using the TCP acknowledgement number, any SACK options, and timestamps (if available) to calculate newlyAckedB, as in <xref target="accecn_Algo_Option_Coding"/>.
If the ACK has not been superseded, the Data Sender calculates the minimum difference d.cep between the ACE field and its local s.cep counter, using modulo arithmetic as follows:</t> <sourcecode><![CDATA[
   if ((newlyAckedB > 0) || (newlyAckedT > 0))
       d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE
]]></sourcecode> <t><xref target="accecn_ACE_Safety"/> expects the Data Sender to assume that the ACE field cycled if it is the safest likely case under prevailing conditions. The 3-bit ACE field in an arriving ACK could have cycled and become ambiguous to the Data Sender if a sequence of ACKs goes missing that covers a stream of data long enough to contain 8 or more CE marks. We use the word 'missing' rather than 'lost', because some or all of the missing ACKs might arrive eventually, but out of order. Even if some of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), retransmissions will not repair the lost AccECN information, because AccECN requires retransmissions to carry the latest AccECN counters, not the original ones.</t> <!-- [rfced] We are having trouble parsing this sentence. Where does the "which" statement end - after "full-sized"? Does "it" refer to the algorithm? Original: However, we shall start with the simplest algorithm, which assumes segments are all full- sized and ultra-conservatively it assumes that ECN marking was 100% on the forward path when ACKs on the reverse path started to all be dropped. --> <t>The phrase 'under prevailing conditions' allows for implementation-dependent interpretation. A Data Sender might take account of the prevailing size of data segments and the prevailing CE marking rate just before the sequence of missing ACKs. However, we shall start with the simplest algorithm, which assumes that all segments are full-sized and, ultra-conservatively, that ECN marking was 100% on the forward path when ACKs on the reverse path started to all be dropped.
Specifically, if newlyAckedB is the amount of data that an ACK acknowledges since the previous ACK, then the Data Sender could assume that this acknowledges newlyAckedPkt full-sized segments, where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the ACE field incremented by</t> <sourcecode><![CDATA[ dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) ]]></sourcecode> <!-- [rfced] May we change "works out" to "indicates" or "determines"? Original: The above formula works out that it would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = 2). --> <t>For example, imagine an ACK acknowledges newlyAckedPkt=9 more full-sized segments than any previous ACK, and that ACE increments by a minimum of 2 CE marks (d.cep=2). The above formula determines that it would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = 2). However, if ACE increases by a minimum of 2 but acknowledges 10 full-sized segments, then it would be necessary to assume that there could have been 10 CE marks (because 10 - ((10-2) % 8) = 10).</t> <t>Note that checks would need to be added to the above pseudocode for (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been wrongly estimated using an inappropriate packet size.</t> <t>ACKs that acknowledge a large stretch of packets might be common in data centres to achieve a high packet rate or might be due to ACK thinning by a middlebox. In these cases, cycling of the ACE field would often appear to have been possible, so the above algorithm would be overly conservative, leading to a false high marking rate and poor performance. Therefore, it would be reasonable to only use dSafer.cep rather than d.cep if the moving average of newlyAckedPkt was well below 8.</t> <t>Implementers could build in more heuristics to estimate a prevailing average segment size and prevailing ECN marking.
For instance, newlyAckedPkt in the above formula could be replaced with newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing segment size and p is the prevailing ECN marking probability. However, ultimately, if TCP's ECN feedback becomes inaccurate, it still has loss detection to fall back on. Therefore, it would seem safe to implement a simple algorithm, rather than a perfect one.</t> <!-- [rfced] Does "5% of full-sized" mean segments are "5% of their full size"? May we change "as long as" to "while" for readability? Original: The simple algorithm for dSafer.cep above requires no monitoring of prevailing conditions and it would still be safe if, for example, segments were on average at least 5% of full-sized as long as ECN marking was 5% or less. --> <t>The simple algorithm for dSafer.cep above requires no monitoring of prevailing conditions, and it would still be safe if, for example, segments were on average at least 5% of the full size, as long as ECN marking was 5% or less. Assuming it was used, the Data Sender would increment its packet counter as follows:</t> <sourcecode><![CDATA[ s.cep += dSafer.cep]]></sourcecode> <!-- [rfced] We updated the text to point directly to Section 3.2.2.5.2 (where the quoted text appears). Please let us know any concerns. Original: If missing acknowledgement numbers arrive later (due to reordering), Section 3.2.2.5 says "the Data Sender MAY attempt to neutralize the effect of any action it took based on a conservative assumption that it later found to be incorrect". --> <t>If missing acknowledgement numbers arrive later (due to reordering), <xref target="accecn_ACE_Safety_S"/> says "the Data Sender <bcp14>MAY</bcp14> attempt to neutralize the effect of any action it took based on a conservative assumption that it later found to be incorrect".
To do this, the Data Sender would have to store the values of all the relevant variables whenever it made assumptions, so that it could re-evaluate them later. Given this could become complex and it is not required, we do not attempt to provide an example of how to do this.</t> </section> <section> <name>Safety Algorithm with the AccECN Option</name> <!--ToDo: Ilpo says this algo is useless, 'cos (I think) you don't have the state of d.ceb and d.cep at the same time. See emails 3/1/20.--> <t>When AccECN Options are available on the ACKs before and after the possible sequence of ACK losses, if the Data Sender only needs CE-marked bytes, it will have sufficient information in AccECN Options without needing to process the ACE field. If for some reason it needs CE-marked packets and dSafer.cep is different from d.cep, it can determine whether d.cep is likely to be a safe enough estimate by checking whether the average marked segment size (s = d.ceb/d.cep) is less than the MSS (where d.ceb is the amount of newly CE-marked bytes; see <xref target="accecn_Algo_Option_Coding"/>). Specifically, it could use the following algorithm:</t> <sourcecode><![CDATA[
   SAFETY_FACTOR = 2
   if (dSafer.cep > d.cep) {
       if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ
           sSafer = d.ceb/dSafer.cep
           if (sSafer < MSS/SAFETY_FACTOR)
               dSafer.cep = d.cep    % d.cep is a safe enough estimate
       } % else
           % No need for else; dSafer.cep is already correct,
           % because d.cep must have been too small
   }
]]></sourcecode> <!-- [rfced] We are having trouble parsing "will consider d.cep can replace". Please clarify. Original: The chart below shows when the above algorithm will consider d.cep can replace dSafer.cep as a safe enough estimate of the number of CE- marked packets: Perhaps: The chart below shows when the above algorithm will consider the number of CE-marked packets as a safe enough estimate to replace dsafer.cep with d.cep.
--> <t>The chart below shows when the above algorithm will consider d.cep to be a safe enough estimate of the number of CE-marked packets to replace dSafer.cep:</t> <artwork align="left"><![CDATA[
                 ^
           sSafer|
                 |
              MSS+
                 |
                 |         dSafer.cep
                 |                  is
MSS/SAFETY_FACTOR+--------------+    safest
                 |              |
                 | d.cep is safe|
                 |    enough    |
                 +-------------------->
                               MSS   s
]]></artwork> <t>The following examples give the reasoning behind the algorithm, assuming MSS=1460:</t> <ul spacing="normal"> <li> <t>If d.cep=0, dSafer.cep=8, and d.ceb=1460, then s=infinity and sSafer=182.5.</t> <t>Therefore, even though the average size of 8 data segments is unlikely to have been as small as MSS/8, d.cep cannot have been correct, because it would imply an average segment size greater than the MSS.</t> </li> <li> <t>If d.cep=2, dSafer.cep=10, and d.ceb=1460, then s=730 and sSafer=146.</t> <t>Therefore, d.cep is safe enough, because the average size of 10 data segments is unlikely to have been as small as MSS/10.</t> </li> <li> <t>If d.cep=7, dSafer.cep=15, and d.ceb=10200, then s=1457 and sSafer=680.</t> <t>Therefore, d.cep is safe enough, because the average data segment size is more likely to have been just less than one MSS, rather than below MSS/2.</t> </li> </ul> <t>If pure ACKs were allowed to be ECN-capable, missing ACKs would be far less likely. However, because <xref target="RFC3168"/> currently precludes this, the above algorithm assumes that pure ACKs are not ECN-capable.</t> </section> </section> <section anchor="accecn_Algo_ACE_Bytes"> <name>Example Algorithm to Estimate Marked Bytes from Marked Packets</name> <!-- [rfced] To what does "this" refer - the ACK? The sentence prior is included for context.
Original: If AccECN Options are not available, the Data Sender can only decode CE-marking from the ACE field in packets. Every time an ACK arrives, to convert this into an estimate of CE-marked bytes, it needs an average of the segment size, s_ave. --> <t>If AccECN Options are not available, the Data Sender can only decode a CE marking from the ACE field in packets. Every time an ACK arrives, to convert this packet count into an estimate of CE-marked bytes, it needs an average of the segment size, s_ave. Then it can add or subtract s_ave from the value of d.ceb as the value of d.cep increments or decrements. Some possible ways to calculate s_ave are outlined below. The precise details will depend on why an estimate of marked bytes is needed.</t> <t>The implementation could keep a record of the byte numbers of all the boundaries between packets in flight (including control packets), and recalculate s_ave on every ACK. However, it would be simpler to merely maintain a counter packets_in_flight for the number of packets in flight (including control packets), which is reset once per RTT. Either way, it would estimate s_ave as:</t> <sourcecode><![CDATA[ s_ave ~= flightsize / packets_in_flight,]]></sourcecode> <t>where flightsize is the variable that TCP already maintains for the number of bytes in flight and '~=' means 'approximately equal to'. To avoid floating point arithmetic, it could right-bit-shift by lg(packets_in_flight), where lg() means log base 2.</t> <t>An alternative would be to maintain an exponentially weighted moving average (EWMA) of the segment size:</t> <sourcecode><![CDATA[ s_ave = a * s + (1-a) * s_ave,]]></sourcecode> <t>where a is the decay constant for the EWMA. However, then it is necessary to choose a good value for this constant, which ought to depend on the number of packets in flight.
Also, the decay constant needs to be a power of two to avoid floating point arithmetic.</t> </section> <section anchor="accecn_Algo_Not-ECT"> <name>Example Algorithm to Count Not-ECT Bytes</name> <t>A Data Sender in AccECN mode can infer the amount of TCP payload data arriving at the receiver marked Not-ECT from the difference between the amount of newly ACKed data and the sum of the bytes with the other three markings, d.ceb, d.e0b, and d.e1b.</t> <!--ToDo: write-up pseudocode, rather than just describe it.--> <t>For this approach to be precise, it has to be assumed that spurious (unnecessary) retransmissions do not lead to double counting. This assumption is currently correct, given that RFC 3168 requires that the Data Sender mark retransmitted segments as Not-ECT. However, the converse is not true; necessary retransmissions will result in undercounting.</t> <t>However, such precision is unlikely to be necessary. The only known use of a count of Not-ECT marked bytes is to test whether equipment on the path is clearing the ECN field (perhaps due to an outdated attempt to clear, or bleach, what used to be the IPv4 ToS byte or the IPv6 Traffic Class field). To detect bleaching, it will be sufficient to detect whether nearly all bytes arrive marked as Not-ECT. Therefore, there ought to be no need to keep track of the details of retransmissions.</t> </section> </section> <section anchor="accecn_flags_rationale"> <name>Rationale for Usage of TCP Header Flags</name> <section> <name>Three TCP Header Flags in the SYN-SYN/ACK Handshake</name> <t>AccECN uses a rather unorthodox approach to negotiate the highest version of the TCP ECN feedback scheme that both ends support, as justified below.
It follows from the original TCP ECN capability negotiation <xref target="RFC3168"/>, in which the Client set the 2 least significant of the original reserved flags in the TCP header, and fell back to No ECN support if the Server responded with the 2 flags cleared, which had previously been the default.</t>
<t>Classic ECN used header flags rather than a TCP option because it was considered more efficient to use a header flag for 1 bit of feedback per ACK, and this bit could be overloaded to indicate support for Classic ECN during the handshake. During the development of ECN, 1 bit crept up to 2, in order to deliver the feedback reliably and to work round some broken hosts that reflected the reserved flags during the handshake.</t>
<t>In order to be backward compatible with RFC 3168, AccECN continues this approach, using the 3rd least significant TCP header flag that had previously been allocated for the ECN-nonce (now historic). Then, whatever form of Server an AccECN Client encounters, the connection can fall back to the highest version of feedback protocol that both ends support, as explained in <xref target="accecn_Negotiation"/>.</t>
<t>If AccECN capability negotiation had used the more orthodox approach of a TCP option, it would still have had to set the two ECN flags in the main TCP header, in order to be able to fall back to Classic ECN <xref target="RFC3168"/>, or to disable ECN support, without another round of negotiation. Then AccECN would also have had to handle all the different ways that Servers currently respond to settings of the ECN flags in the main TCP header, including all of the conflicting cases where a Server might have said it supported one approach in the flags and another approach in a new TCP option. And AccECN would have had to deal with all of the additional possibilities where a middlebox might have mangled the ECN flags, or removed TCP options.
Thus, usage of the 3rd reserved TCP header flag simplified the protocol.</t>
<t>The third flag was used in a way that could be distinguished from the ECN-nonce, in case any nonce deployment was encountered. Previous usage of this flag for the ECN-nonce was integrated into the original ECN negotiation. This further justified the third flag's use for AccECN, because a non-ECN usage of this flag would have had to use it as a separate single bit, rather than in combination with the other 2 ECN flags.</t>
<t>Indeed, having overloaded the original uses of these three flags for its handshake, AccECN overloads all three bits again as a 3-bit counter.</t>
</section>
<section>
<name>Four Codepoints in the SYN/ACK</name>
<t>Of the eight possible codepoints that the three TCP header flags can indicate on the SYN/ACK, four already indicated earlier (or broken) versions of ECN support, one now being Historic. In the early design of AccECN, an AccECN Server could use only 2 of the 4 remaining codepoints. They both indicated AccECN support, but one fed back that the SYN had arrived marked as CE. Even though ECN support on a SYN is not yet on the Standards Track, the idea is for either end to act as a mechanistic reflector, so that future capabilities can be unilaterally deployed without requiring 2-ended deployment (justified in <xref target="accecn_demb_reflector"/>).</t>
<!-- [rfced] Does "earlier versions" refer to earlier draft versions of this document? Original: This development consumed the remaining 2 codepoints on the SYN/ACK that had been reserved for future use by AccECN in earlier versions. -->
<t>During traversal testing, it was discovered that the IP-ECN field in the SYN was mangled on a non-negligible proportion of paths. Therefore, it was necessary to allow the SYN/ACK to feed back to the Client all four IP-ECN codepoints that the SYN could arrive with.
Without this, the Client could not know whether to disable ECN for the connection due to mangling of the IP-ECN field (also explained in <xref target="accecn_demb_reflector"/>). This development consumed the remaining two codepoints on the SYN/ACK that had been reserved for future use by AccECN in earlier versions.</t>
</section>
<section anchor="accecn_space_evolution">
<name>Space for Future Evolution</name>
<t>Despite availability of usable TCP header space being extremely scarce, the AccECN protocol has taken all possible steps to ensure that there is space to negotiate possible future variants of the protocol, either if a variant of AccECN is required, or if a completely different ECN feedback approach is needed.</t>
<dl newline="false" spacing="normal">
<dt>Future AccECN variants:</dt>
<dd>
<t>When the AccECN capability is negotiated during TCP's three-way handshake, the rows in <xref target="accecn_Tab_Negotiation"/> tagged as 'Nonce' and 'Broken' in the column for the capability of node B are unused by any current protocol defined in the RFC series. These could be used by TCP Servers in the future to indicate a variant of the AccECN protocol. In recent measurement studies in which the response of large numbers of Servers to an AccECN SYN has been tested, e.g., <xref target="Mandalari18"/>, a very small number of SYN/ACKs arrive with the pattern tagged as 'Nonce', and a small but more significant number arrive with the pattern tagged as 'Broken'. The 'Nonce' pattern could be a sign that a few Servers have implemented the ECN-nonce <xref target="RFC3540"/>, which has now been reclassified as Historic <xref target="RFC8311"/>, or it could be the random result of some unknown middlebox behaviour.
The greater prevalence of the 'Broken' pattern suggests that some instances still exist of the broken code that reflects the reserved flags on the SYN.</t>
<t>The requirement not to reject unexpected initial values of the ACE counter (in the main TCP header) in the last paragraph of <xref target="accecn_sec_ACE_init_invalid"/> ensures that three unused codepoints on the ACK of the SYN/ACK, six unused values on the first SYN=0 data packet from the Client, and seven unused values on the first SYN=0 data packet from the Server could be used to declare future variants of the AccECN protocol. The word 'declare' is used rather than 'negotiate' because, at this late stage in the three-way handshake, it would be too late for a negotiation between the endpoints to be completed. A similar requirement not to reject unexpected initial values in AccECN TCP Options (<xref target="accecn_sec_zero_option"/>) is for the same purpose. If traversal of AccECN TCP Options were reliable, this would have enabled a far wider range of future variation of the whole AccECN protocol. Nonetheless, it could be used to reliably negotiate a wide range of variation in the semantics of the AccECN Option.</t>
</dd>
<dt>Future non-AccECN variants:</dt>
<dd>
<t>Five codepoints out of the eight possible in the three TCP header flags used by AccECN are unused on the initial SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0), (1,0,0), (1,0,1), (1,1,0). <xref target="accecn_sec_forward_compat"/> ensures that the installed base of AccECN Servers will all assume these are equivalent to AccECN negotiation with (1,1,1) on the SYN. These codepoints would not allow fall-back to Classic ECN support for a Server that did not understand them, but this approach ensures they are available in the future, perhaps for uses other than ECN alongside the AccECN scheme.
All possible combinations of SYN/ACK could be used in response except either (0,0,0) or reflection of the same values sent on the SYN.</t>
<t>In order to extend AccECN or ECN in the future, other ways could be resorted to, although their traversal properties are likely to be inferior. They include a new TCP option; using the remaining reserved flags in the main TCP header (preferably extending the 3-bit combinations used by AccECN to 4-bit combinations, rather than burning one bit for just one state); a non-zero urgent pointer in combination with the URG flag cleared; or some other unexpected combination of fields yet to be invented.</t>
</dd>
</dl>
</section>
</section>
<!-- ================================================================ -->
<section anchor="accecn_Acknowledgements" numbered="false">
<name>Acknowledgements</name>
<t>We want to thank <contact fullname="Koen De Schepper"/>, <contact fullname="Praveen Balasubramanian"/>, <contact fullname="Michael Welzl"/>, <contact fullname="Gorry Fairhurst"/>, <contact fullname="David Black"/>, <contact fullname="Spencer Dawkins"/>, <contact fullname="Michael Scharf"/>, <contact fullname="Michael Tüxen"/>, <contact fullname="Yuchung Cheng"/>, <contact fullname="Kenjiro Cho"/>, <contact fullname="Olivier Tilmans"/>, <contact fullname="Ilpo Järvinen"/>, <contact fullname="Neal Cardwell"/>, <contact fullname="Yoshifumi Nishida"/>, <contact fullname="Martin Duke"/>, <contact fullname="Jonathan Morton"/>, <contact fullname="Vidhi Goel"/>, <contact fullname="Alex Burr"/>, <contact fullname="Markku Kojo"/>, <contact fullname="Grenville Armitage"/>, and <contact fullname="Wes Eddy"/> for their input and discussion. The idea of using the three ECN-related TCP flags as one field for more accurate TCP-ECN feedback was first introduced in the re-ECN protocol that was the ancestor of ConEx.</t>
<t>The following contributed implementations of AccECN that validated and helped to improve this specification:</t>
<dl newline="false" spacing="normal">
<dt>Linux:</dt>
<dd><t><contact fullname="Mirja Kühlewind"/>, <contact fullname="Ilpo Järvinen"/>, <contact fullname="Neal Cardwell"/>, and <contact fullname="Chia-Yu Chang"/></t></dd>
<dt>FreeBSD:</dt>
<dd><t><contact fullname="Richard Scheffenegger"/></t></dd>
<dt>Apple OSs:</dt>
<dd><t><contact fullname="Vidhi Goel"/></t></dd>
</dl>
<t><contact fullname="Bob Briscoe"/> was part-funded by Apple Inc, the Comcast Innovation Fund, the European Community under its Seventh Framework Programme through the Reducing Internet Transport Latency (RITE) project (ICT-317700) and through the Trilogy 2 project (ICT-317756), and the Research Council of Norway through the TimeIn project. The views expressed here are solely those of the authors.</t>
<t><contact fullname="Mirja Kühlewind"/> was partly supported by the European Commission under Horizon 2020 grant agreement no. 688421 Measurement and Architecture for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat for Education, Research, and Innovation under contract no. 15.0268.
This support does not imply endorsement.</t>
</section>
</back>
<!-- [rfced] Please review the following terminology-related questions.

A) We updated the following to the form on the right. Please let us know if any corrections are needed.

   not-ECT vs Not-ECT
   no ECN vs No ECN
   ECN Nonce vs ECN-Nonce vs ECN nonce (to match RFC 3540)
   Cubic vs CUBIC (to match RFC 9438)
   IP ECN field vs IP-ECN field
   ECN capable vs ECN-capable (to match RFC 3168, though we wonder if it should be open (ECN capable) when not acting as an adjective appearing before the noun)
   time-out vs timeout
   CE mark* vs CE-mark* - updated to use the hyphen when acting as an adjective appearing before the noun

B) Please review occurrences of the terms below and let us know if/how they may be made consistent.

   TCP Option vs TCP option (perhaps TCP Option when referring to a specific option?)
   Established state vs established state vs ESTABLISHED state
   half connection vs half-connection

C) We note that "time-stamp" is used consistently. However, RFC 7323 uses "timestamp". May we update the text for consistency?
-->
<!-- [rfced] Please review whether any of the notes in this document should be in the <aside> element. It is defined as "a container for content that is semantically less important or tangential to the content that surrounds it" (https://authors.ietf.org/en/rfcxml-vocabulary#aside). -->
<!-- [rfced] Some author comments are present in the XML. Please confirm that no updates related to these comments are outstanding. Note that the comments will be deleted prior to publication. -->
<!-- [rfced] Please review the "Inclusive Language" portion of the online Style Guide <https://www.rfc-editor.org/styleguide/part2/#inclusive_language> and let us know if any changes are needed. Updates of this nature typically result in more precise language, which is helpful for readers. Note that our script did not flag any words in particular, but this should still be reviewed as a best practice. -->
</rfc>