==Types==

Instead of replacing the TCP stack with a TOE entirely, there are alternative techniques to offload some operations in co-operation with the operating system's TCP stack. [[TCP checksum offload]] and [[large segment offload]] are supported by the majority of today's Ethernet NICs. Newer techniques like [[large receive offload]] and TCP acknowledgment offload are already implemented in some high-end Ethernet hardware, but are effective even when implemented purely in software.<ref name=lwn-lro>{{cite news |author=Jonathan Corbet |date=2007-08-01 |publisher=[[LWN.net]] |title=Large receive offload |url=https://lwn.net/Articles/243949/ |access-date=2007-08-22 }}</ref><ref name=menon>{{cite conference |date=2008-04-28 |author1=Aravind Menon |author2=Willy Zwaenepoel |title=Optimizing TCP Receive Performance |conference=USENIX Annual Technical Conference |publisher=USENIX |url=http://www.usenix.org/event/usenix08/tech/full_papers/menon/menon_html/paper.html }}</ref>

===Parallel-stack full offload===

Parallel-stack full offload gets its name from the concept of two parallel TCP/IP stacks. The first is the main host stack, which is included with the host OS. The second, or "parallel stack", is connected between the [[Internet protocol suite#Application layer|application layer]] and the [[Internet protocol suite#Transport layer|transport layer]] (TCP) using a "vampire tap". The vampire tap intercepts TCP connection requests by applications and is responsible for TCP connection management as well as TCP data transfer (a sketch appears below, after the TCP chimney subsection). Many of the criticisms in the following section relate to this type of TCP offload.

===HBA full offload===

HBA (host bus adapter) full offload is found in iSCSI [[host adapter]]s, which present themselves as disk controllers to the host system while connecting (via TCP/IP) to an [[iSCSI]] storage device. This type of TCP offload not only offloads TCP/IP processing but also offloads the iSCSI initiator function. Because the HBA appears to the host as a disk controller, it can only be used with iSCSI devices and is not appropriate for general TCP/IP offload.

===TCP chimney partial offload===

TCP chimney offload addresses the major security criticism of parallel-stack full offload. In partial offload, the main system stack controls all connections to the host. After a connection has been established between the local host (usually a server) and a foreign host (usually a client), the connection and its state are passed to the TCP offload engine, as sketched below. The heavy lifting of data transmission and reception is handled by the offload device. Almost all TCP offload engines use some type of TCP/IP hardware implementation to perform the data transfer without host CPU intervention. When the connection is closed, the connection state is returned from the offload engine to the main system stack. Maintaining control of TCP connections allows the main system stack to implement and control connection security.
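The "vampire tap" in the parallel-stack model above can be pictured, by loose analogy, as an interposer that captures an application's connection attempts before they reach the host stack. The sketch below uses the standard Linux <code>LD_PRELOAD</code> mechanism purely as an illustration; it only logs and passes through, and no vendor's actual tap works this way:

<syntaxhighlight lang="c">
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/socket.h>

/* Illustrative "vampire tap": intercepts connect() calls made by
   applications before they reach the host TCP stack. */
static int (*real_connect)(int, const struct sockaddr *, socklen_t);

int connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    if (!real_connect)
        real_connect = (int (*)(int, const struct sockaddr *, socklen_t))
                       dlsym(RTLD_NEXT, "connect");
    /* A real parallel-stack product would hand the request to the
       offload device here; this sketch just logs and passes through. */
    fprintf(stderr, "intercepted connect() on fd %d\n", fd);
    return real_connect(fd, addr, len);
}
</syntaxhighlight>

Built with <code>gcc -shared -fPIC -o tap.so tap.c -ldl</code> and run via <code>LD_PRELOAD=./tap.so</code>, it reports each intercepted connection while delegating to the real host stack.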
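The connection-state handoff in chimney partial offload can be illustrated with a hypothetical snapshot structure; every name below is invented for illustration and does not correspond to any real driver interface:

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the TCP state that the host stack hands to
   the offload engine in chimney partial offload. */
struct tcp_offload_state {
    uint32_t saddr, daddr;      /* IPv4 addresses */
    uint16_t sport, dport;      /* TCP ports */
    uint32_t snd_nxt, rcv_nxt;  /* next sequence numbers to send/expect */
    uint16_t snd_wnd;           /* peer's advertised receive window */
};

/* The host stack establishes the connection, then delegates it. */
static void offload_to_engine(const struct tcp_offload_state *st)
{
    printf("handing off %u:%u -> %u:%u at snd_nxt=%u\n",
           st->saddr, (unsigned)st->sport, st->daddr,
           (unsigned)st->dport, st->snd_nxt);
    /* A real engine would now move data without host CPU involvement. */
}

/* On close, the engine returns the (possibly advanced) state. */
static void reclaim_from_engine(struct tcp_offload_state *st)
{
    st->snd_nxt += 1;  /* e.g. account for the FIN the engine sent */
    printf("reclaimed connection, snd_nxt=%u\n", st->snd_nxt);
}

int main(void)
{
    struct tcp_offload_state st = { 0x0a000001, 0x0a000002,
                                    443, 51515, 1000, 2000, 65535 };
    offload_to_engine(&st);
    reclaim_from_engine(&st);
    return 0;
}
</syntaxhighlight>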
===Large receive offload===

'''Large receive offload''' ('''LRO''') is a technique for increasing inbound [[throughput]] of high-[[bandwidth (computing)|bandwidth]] network connections by reducing [[central processing unit]] (CPU) overhead. It works by aggregating multiple incoming [[packet (information technology)|packet]]s from a single [[stream (computing)|stream]] into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed (a sketch of this aggregation appears after the next subsection).

[[Linux]] implementations generally use LRO in conjunction with the [[New API]] (NAPI) to also reduce the number of [[interrupt]]s. According to benchmarks, even implementing this technique entirely in software can increase network performance significantly.<ref name=lwn-lro/><ref name=menon /><ref>{{cite mailing list |author= Andrew Gallatin |title= lro: Generic Large Receive Offload for TCP traffic |mailing-list= linux-kernel |date= 2007-07-25 |url= https://lkml.org/lkml/2007/7/25/313 |access-date= 2007-08-22 }}</ref> {{As of | 2007 | April}}, the [[Linux kernel]] supports LRO for [[Transmission Control Protocol|TCP]] in software only. [[FreeBSD]] 8 supports LRO in hardware on adapters that support it.<ref>{{cite web|url=http://www.freebsd.org/cgi/man.cgi?cxgb|title=Cxgb|website=Freebsd.org|access-date=12 July 2018}}</ref><ref>{{cite web|url=http://www.freebsd.org/cgi/man.cgi?mxge|title=Mxge|website=Freebsd.org|access-date=12 July 2018}}</ref><ref>{{cite web|url=http://www.freebsd.org/cgi/man.cgi?nxge|title=Nxge|website=Freebsd.org|access-date=12 July 2018}}</ref><ref name=vmxnet-lro>{{cite news | date= 2011-07-04 |publisher= [[VMware]] | title= Poor TCP performance can occur in Linux virtual machines with LRO enabled |url= http://kb.vmware.com/kb/1027511 |access-date= 2011-08-17 }}</ref>

LRO should not operate on machines acting as routers, as it breaks the [[end-to-end principle]] and can significantly impact performance.<ref>{{cite web|url= https://web.archive.org/web/20191124112839/http://downloadmirror.intel.com/14687/eng/readme.txt|title= Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Family of Adapters|publisher= [[Intel Corporation]]|date= 2013-02-12|access-date= 2013-04-24}}</ref><ref>{{cite web|url= https://bugzilla.redhat.com/show_bug.cgi?id=772317|title= Disable LRO for all NICs that have LRO enabled|publisher= [[Red Hat, Inc.]]|date= 2013-01-10|access-date= 2013-04-24}}</ref>

====Generic receive offload====

'''Generic receive offload''' ('''GRO''') implements a generalised LRO in software that is not restricted to TCP/[[IPv4]] and does not have the issues created by LRO.<ref>{{cite web|url=https://lwn.net/Articles/358910/|title=JLS2009: Generic receive offload|website=[[lwn.net]]}}</ref><ref>{{cite conference | last1 = Huang| first1 = Shu| last2 = Baldine| first2 = Ilia| title = Performance Evaluation of 10GE NICs with SR-IOV Support: I/O Virtualization and network Stack Optimizations| editor1-last = Schmitt| editor1-first = Jens B. | conference = Measurement, Modeling, and Evaluation of Computing Systems and Dependability and Fault Tolerance: 16th International GI/ITG Conference, MMB & DFT 2012 |location=Kaiserslautern, Germany |date=March 2012 | url = https://books.google.com/books?id=C3wQBwAAQBAJ| series = Lecture Notes in Computer Science | volume = 7201| publisher = Springer| publication-date = 2012| page = 198| isbn = 9783642285400| access-date = 2016-10-11| quote = Large-Receive-Offload (LRO) reduces the per-packet processing overhead by aggregating smaller packets into larger ones and passing them up to the network stack. Generic-Receive-Offload (GRO) provides a generalized software version of LRO [...].}}</ref>
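The aggregation step shared by LRO and GRO can be sketched as follows. The flow-matching and in-order checks mirror the description above; all identifiers are hypothetical, and real implementations live in NIC firmware, drivers, or the kernel:

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { MAX_AGG = 65536 };

struct pkt {
    uint32_t saddr, daddr;   /* IPv4 addresses */
    uint16_t sport, dport;   /* TCP ports */
    uint32_t seq;            /* TCP sequence number of first payload byte */
    uint32_t len;            /* payload length */
    uint8_t payload[1500];
};

struct agg {
    struct pkt head;         /* headers of the first merged segment */
    uint8_t buf[MAX_AGG];    /* the larger buffer handed up the stack */
    uint32_t len;
};

/* Returns 1 if p was merged into a; 0 if it belongs to another flow
   or arrived out of order, and so must be delivered separately. */
static int try_merge(struct agg *a, const struct pkt *p)
{
    int same_flow = a->head.saddr == p->saddr && a->head.daddr == p->daddr
                 && a->head.sport == p->sport && a->head.dport == p->dport;
    int in_order  = p->seq == a->head.seq + a->len;

    if (!same_flow || !in_order || a->len + p->len > MAX_AGG)
        return 0;
    memcpy(a->buf + a->len, p->payload, p->len);  /* append payload */
    a->len += p->len;
    return 1;
}

int main(void)
{
    struct pkt p1 = { 1, 2, 80, 5000, 1000, 1460, {0} };
    struct pkt p2 = { 1, 2, 80, 5000, 2460, 1460, {0} };  /* 1000 + 1460 */
    struct agg a = { p1, {0}, 0 };

    memcpy(a.buf, p1.payload, p1.len);
    a.len = p1.len;
    printf("merged second segment: %d, total %u bytes\n",
           try_merge(&a, &p2), a.len);
    return 0;
}
</syntaxhighlight>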
===Large send offload===

In [[computer network]]ing, '''large send offload''' ('''LSO''') is a technique for increasing egress [[throughput]] of high-[[Bandwidth (computing)|bandwidth]] network connections by reducing [[central processing unit|CPU]] overhead. It works by passing a multipacket buffer to the [[network interface card]] (NIC), which then splits this buffer into separate packets. The technique is also called '''TCP segmentation offload''' ('''TSO''') or '''generic segmentation offload''' ('''GSO''') when applied to [[Transmission Control Protocol|TCP]]. LSO and LRO are independent, and use of one does not require the use of the other.

When a system needs to send large chunks of data out over a computer network, the chunks must first be broken down into smaller segments that can pass through all the network elements, such as routers and switches, between the source and destination computers. This process is referred to as ''[[Packet segmentation|segmentation]]''. Often the TCP protocol in the host computer performs this segmentation. Offloading this work to the NIC is called ''TCP segmentation offload'' (TSO).

For example, a unit of 64 KiB (65,536 bytes) of data is usually segmented into 45 segments of at most 1460 bytes each before it is sent through the NIC and over the network. With some intelligence in the NIC, the host CPU can hand over the 64 KiB of data to the NIC in a single transmit request; the NIC can break that data down into segments of 1460 bytes, add the TCP, [[Internet Protocol|IP]], and data link layer protocol headers, according to a template provided by the host's TCP/IP stack, to each segment, and send the resulting frames over the network. This significantly reduces the work done by the CPU (the segmentation arithmetic is sketched below). {{As of | 2014}}, many new NICs on the market support TSO. Some network cards implement TSO generically enough that it can be used for offloading fragmentation of other [[transport layer]] protocols, or for doing [[IP fragmentation]] for protocols that don't support fragmentation by themselves, such as [[User Datagram Protocol|UDP]].
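The segmentation that TSO moves onto the NIC can be sketched as follows, reproducing the 64 KiB example above; identifiers are hypothetical, and real hardware would also rewrite IP lengths, TCP sequence fields, flags, and checksums in each emitted frame:

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { MSS = 1460, HDR_LEN = 54 };  /* Ethernet + IPv4 + TCP headers */

struct hdr_template {
    uint8_t  bytes[HDR_LEN]; /* header bytes provided by the host stack */
    uint32_t seq;            /* TCP sequence number of the next byte */
};

/* Split one large buffer at the MSS, stamping each segment with a copy
   of the header template and advancing the sequence number. */
static unsigned segment(const uint8_t *data, uint32_t len,
                        struct hdr_template *tmpl)
{
    unsigned count = 0;
    for (uint32_t off = 0; off < len; off += MSS) {
        uint32_t seg = len - off < MSS ? len - off : MSS;
        uint8_t frame[HDR_LEN + MSS];

        memcpy(frame, tmpl->bytes, HDR_LEN);      /* copy header template */
        memcpy(frame + HDR_LEN, data + off, seg); /* attach payload slice */
        tmpl->seq += seg;                         /* advance sequence no. */
        count++;                                  /* frame would be sent  */
    }
    return count;
}

int main(void)
{
    static uint8_t data[65536];                   /* the 64 KiB example */
    struct hdr_template tmpl = { {0}, 1 };
    printf("%u segments\n", segment(data, sizeof data, &tmpl));
    return 0;
}
</syntaxhighlight>

Run as-is, it prints <code>45 segments</code> (44 full segments of 1460 bytes plus a final 1296-byte segment), matching the worked example.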
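Whether a given interface currently has TSO enabled can be checked on Linux with <code>ethtool -k</code>; the equivalent query from a program uses the <code>SIOCETHTOOL</code> ioctl. The minimal example below relies on the legacy <code>ETHTOOL_GTSO</code> command, which newer kernels have superseded with feature masks but still honour:

<syntaxhighlight lang="c">
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "eth0";
    struct ethtool_value ev = { .cmd = ETHTOOL_GTSO };  /* query TSO */
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&ev;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("SIOCETHTOOL");
        close(fd);
        return 1;
    }
    printf("%s: TSO %s\n", dev, ev.data ? "on" : "off");
    close(fd);
    return 0;
}
</syntaxhighlight>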