== Applications ==
Applications that require low latency (such as telephone conversations) cannot use [[automatic repeat request]] (ARQ); they must use [[forward error correction]] (FEC). By the time an ARQ system discovers an error and retransmits the affected data, the re-sent data arrives too late to be usable.

Applications where the transmitter forgets the information as soon as it is sent (such as most television cameras) cannot use ARQ; they must use FEC because when an error occurs, the original data is no longer available.

Applications that use ARQ must have a [[return channel]]; applications having no return channel cannot use ARQ. Applications that require extremely low error rates (such as digital money transfers) must use ARQ due to the possibility of uncorrectable errors with FEC.
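
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not a model of any particular protocol: the <code>flaky_channel</code> helper, the stop-and-wait retry loop, and the three-fold repetition code are all assumptions chosen for brevity. The ARQ sender needs a return path to learn that a frame failed its check, while the FEC receiver repairs the error on its own.

```python
import zlib

def flaky_channel(bad_transmissions):
    """Hypothetical channel that corrupts the first `bad_transmissions` frames."""
    state = {"sent": 0}

    def channel(frame: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= bad_transmissions:
            # Flip one bit of the first byte to simulate a transmission error.
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return channel

def send_with_arq(payload: bytes, channel, max_tries: int = 5):
    """Stop-and-wait ARQ: append a CRC-32 and resend until the check passes."""
    frame = payload + zlib.crc32(payload).to_bytes(4, "big")
    for attempt in range(1, max_tries + 1):
        received = channel(frame)
        data, crc = received[:-4], received[-4:]
        if zlib.crc32(data).to_bytes(4, "big") == crc:
            return data, attempt  # receiver would acknowledge here
        # Otherwise the receiver requests a retransmission (needs a return channel).
    raise RuntimeError("no error-free frame within max_tries")

def fec_encode(bits):
    """Three-fold repetition code: each bit is sent three times."""
    return [b for b in bits for _ in range(3)]

def fec_decode(coded):
    """Majority vote over each group of three; corrects one flip per group."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0 for i in range(0, len(coded), 3)]

data, tries = send_with_arq(b"hello", flaky_channel(2))
coded = fec_encode([1, 0, 1])
coded[4] ^= 1  # a single bit error in transit
decoded = fec_decode(coded)
```

In this sketch ARQ costs extra round trips but delivers exactly the original data, whereas the repetition code adds no delay but triples the transmitted volume and fails if two of the three copies of a bit are corrupted.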

The theory of error-correcting codes is also applied in reliability and inspection engineering,<ref>{{cite journal |url=http://www.eng.tau.ac.il/~bengal/SCI_paper.pdf|journal=IIE Transactions |title=Self-correcting inspection procedure under inspection errors |author1=Ben-Gal I. |author2=Herer Y. |author3=Raz T. |publisher=IIE Transactions on Quality and Reliability, 34(6), pp. 529-540. |year=2003 |access-date=2014-01-10 |archive-url=https://web.archive.org/web/20131013171945/http://www.eng.tau.ac.il/~bengal/SCI_paper.pdf |archive-date=2013-10-13 |url-status=dead }}</ref> as well as in the study of natural language.<ref name="DOI10.7275/bjvb-2n37">
{{cite journal
| author = Yvo Meeres, Tommi A. Pirinen
| authorlink =
| year = 2021
| title = Vowel Harmony Viewed as Error-Correcting Code
| journal = Proceedings of the Society for Computation in Linguistics
| volume = 4
| issue = 1
| pages = 313–322
| doi = 10.7275/bjvb-2n37
| pmid = 
}}
</ref>

=== Internet ===
In a typical [[TCP/IP]] stack, error control is performed at multiple levels:
* Each [[Ethernet frame]] uses [[Cyclic redundancy check|CRC-32]] error detection. Frames with detected errors are discarded by the receiver hardware.
* The [[IPv4]] header contains a [[IPv4 header checksum|checksum]] protecting the contents of the header. [[Network packet|Packets]] with incorrect checksums are dropped within the network or at the receiver.
* The checksum was omitted from the [[IPv6]] header in order to minimize processing costs in [[network routing]] and because current [[link layer]] technology is assumed to provide sufficient error detection (see also RFC 3819).
* [[User Datagram Protocol|UDP]] has an optional checksum covering the payload and addressing information in the UDP and IP headers. Packets with incorrect checksums are discarded by the [[network stack]]. The checksum is optional under IPv4, and required under IPv6. When omitted, it is assumed the data-link layer provides the desired level of error protection.
* [[Transmission Control Protocol|TCP]] provides a checksum for protecting the payload and addressing information in the TCP and IP headers. Packets with incorrect checksums are discarded by the network stack and eventually get retransmitted using ARQ, either explicitly (such as after duplicate acknowledgments) or implicitly due to a [[timeout (computing)|timeout]].
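
The IPv4 header checksum, and the UDP and TCP checksums computed over a pseudo-header plus segment, share the same 16-bit ones' complement arithmetic described in RFC 1071. The following Python sketch shows that computation. The function name and the sample header bytes are illustrative assumptions, not taken from any particular network stack.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

# Illustrative 20-byte IPv4 header with the checksum field set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)

# The sender stores the checksum in bytes 10-11 of the header.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]

# A receiver recomputes over the whole header; zero means no error detected.
verified = internet_checksum(filled) == 0
```

A receiver that obtains a non-zero result drops the packet. Because the sum is only 16 bits wide and insensitive to the order of the words, it is a much weaker check than the CRC-32 used at the Ethernet layer.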

=== Deep-space telecommunications ===
The development of error-correction codes was tightly coupled with the history of deep-space missions due to the extreme dilution of signal power over interplanetary distances, and the limited power availability aboard space probes. Whereas early missions sent their data uncoded, starting in 1968, digital error correction was implemented in the form of (sub-optimally decoded) [[convolutional code]]s and [[Reed–Muller code]]s.<ref name="deep-space-codes">K. Andrews et al., ''The Development of Turbo and LDPC Codes for Deep-Space Applications'', Proceedings of the IEEE, Vol. 95, No. 11, Nov. 2007.</ref> The Reed–Muller code was well suited to the noise the spacecraft was subject to (approximately matching a [[Gaussian function|bell curve]]), and was implemented for the Mariner spacecraft and used on missions between 1969 and 1977.

The [[Voyager 1]] and [[Voyager 2]] missions, which started in 1977, were designed to deliver color imaging and scientific information from [[Jupiter]] and [[Saturn]].<ref name="voyager">{{cite book |first1=William Cary |last1=Huffman |first2=Vera S. |last2=Pless |author-link2=Vera Pless |title=Fundamentals of Error-Correcting Codes |publisher=[[Cambridge University Press]] |year=2003 |isbn=978-0-521-78280-7 |url-access=registration |url=https://archive.org/details/fundamentalsofer0000huff }}</ref> This resulted in increased coding requirements, and thus, the spacecraft were supported by (optimally [[Viterbi decoder|Viterbi-decoded]]) convolutional codes that could be [[concatenated code|concatenated]] with an outer [[Binary Golay code|Golay (24,12,8) code]]. The Voyager 2 craft additionally supported an implementation of a [[Reed–Solomon code]]. The concatenated Reed–Solomon–Viterbi (RSV) code allowed for very powerful error correction, and enabled the spacecraft's extended journey to [[Uranus]] and [[Neptune]]. After ECC system upgrades in 1989, both craft used the Voyager 2 RSV coding.

The [[Consultative Committee for Space Data Systems]] currently recommends the use of error correction codes with performance similar to the Voyager 2 RSV code as a minimum. Concatenated codes are increasingly falling out of favor with space missions, and are being replaced by more powerful codes such as [[turbo code]]s or [[LDPC code]]s.

The variety of deep-space and orbital missions suggests that finding a one-size-fits-all error correction system will remain an ongoing problem. For missions close to Earth, the nature of the [[Noise (electronics)|noise]] in the [[communication channel]] is different from that which a spacecraft on an interplanetary mission experiences. Additionally, as a spacecraft increases its distance from Earth, the problem of correcting for noise becomes more difficult.

=== Satellite broadcasting ===
The demand for satellite [[transponder]] bandwidth continues to grow, fueled by the desire to deliver television (including new channels and [[high-definition television]]) and IP data. Transponder availability and bandwidth constraints have limited this growth. Transponder capacity is determined by the selected [[modulation]] scheme and the proportion of capacity consumed by FEC.
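
The dependence of capacity on modulation and FEC overhead can be made concrete with a rough calculation: the usable bit rate is approximately the symbol rate multiplied by the bits carried per symbol and by the FEC code rate. The Python sketch below uses hypothetical figures (a 30 megasymbol-per-second carrier, QPSK and 8PSK constellations, code rates of 3/4 and 2/3) and ignores framing, pilot symbols, and any outer code.

```python
from fractions import Fraction
from math import log2

def net_bit_rate(symbol_rate: float, constellation_points: int,
                 code_rate: Fraction) -> float:
    """Approximate information bit rate of a coded, modulated carrier."""
    bits_per_symbol = log2(constellation_points)  # e.g. 2 for QPSK, 3 for 8PSK
    return symbol_rate * bits_per_symbol * float(code_rate)

# Hypothetical carrier: 30 Msymbol/s.
qpsk_rate = net_bit_rate(30e6, 4, Fraction(3, 4))   # 2 bits/symbol, rate 3/4
psk8_rate = net_bit_rate(30e6, 8, Fraction(2, 3))   # 3 bits/symbol, rate 2/3
```

Under these assumptions the QPSK carrier delivers about 45 Mbit/s and the 8PSK carrier about 60 Mbit/s. A stronger code (lower code rate) tolerates more noise but leaves less of the same transponder for payload data.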

=== Data storage ===
Error detection and correction codes are often used to improve the reliability of data storage media.<ref>{{Cite book|last1=Kurtas|first1=Erozan M.|url=https://books.google.com/books?id=Vx_NBQAAQBAJ&q=Error+detection+and+correction+codes+are+often+used+to+improve+the+reliability+of+data+storage+media&pg=PR5|title=Advanced Error Control Techniques for Data Storage Systems|last2=Vasic|first2=Bane|date=2018-10-03|publisher=CRC Press|isbn=978-1-4200-3649-7|language=en}}{{Dead link|date=March 2020 |bot=InternetArchiveBot |fix-attempted=yes }}</ref> A parity track capable of detecting single-bit errors was present on the first [[magnetic tape data storage]] in 1951. The [[optimal rectangular code]] used in [[group coded recording]] tapes not only detects but also corrects single-bit errors. Some [[file format]]s, particularly [[archive formats]], include a checksum (most often [[CRC32]]) to detect corruption and truncation, and can employ redundancy or [[parity file]]s to recover portions of corrupted data. [[Cross-interleaved Reed–Solomon coding|Reed–Solomon codes]] are used in [[compact disc]]s to correct errors caused by scratches.
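
Both detection mechanisms mentioned above are easy to demonstrate. The Python sketch below is an illustration only: it assumes even parity for the parity bit and uses <code>zlib.crc32</code> from the standard library as a stand-in for the CRC-32 stored by an archive format. It shows that a single parity bit catches one flipped bit but misses two, and that a stored CRC-32 exposes both corruption and truncation in this example.

```python
import zlib

def parity_bit(bits):
    """Even parity: the extra bit that makes the number of ones even."""
    return sum(bits) % 2

def parity_ok(bits_with_parity):
    """True if the word (data bits plus parity bit) has an even number of ones."""
    return sum(bits_with_parity) % 2 == 0

word = [1, 0, 1, 1, 0, 0, 1, 0]
stored = word + [parity_bit(word)]

one_flip = stored.copy()
one_flip[3] ^= 1                 # single-bit error: detected
two_flips = one_flip.copy()
two_flips[5] ^= 1                # second error cancels the first: undetected

payload = b"123456789"
stored_crc = zlib.crc32(payload)  # value an archive would store with the data

truncated_ok = zlib.crc32(payload[:-1]) == stored_crc
corrupted_ok = zlib.crc32(b"123456780") == stored_crc
```

Neither check can repair the data; recovery requires additional redundancy such as the parity files or Reed–Solomon codes described above.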

Modern hard drives use Reed–Solomon codes to detect and correct minor errors in sector reads, and to recover corrupted data from failing sectors so that it can be stored in spare sectors.<ref>{{cite web |archive-url=https://web.archive.org/web/20080202143103/http://www.myharddrivedied.com/presentations_whitepaper.html |archive-date=2008-02-02 |url=http://www.myharddrivedied.com/presentations_whitepaper.html |title=My Hard Drive Died |author=Scott A. Moulton}}</ref> [[RAID]] systems use a variety of error correction techniques to recover data when a hard drive completely fails. Filesystems such as [[ZFS]] or [[Btrfs]], as well as some [[RAID]] implementations, support [[data scrubbing]] and resilvering, which allow bad blocks to be detected and, where possible, recovered before they are used.<ref>{{Cite book|last1=Qiao|first1=Zhi|last2=Fu|first2=Song|last3=Chen|first3=Hsing-Bung|last4=Settlemyer|first4=Bradley|title=2019 IEEE International Conference on Cluster Computing (CLUSTER) |chapter=Building Reliable High-Performance Storage Systems: An Empirical and Analytical Study |date=2019|pages=1–10|doi=10.1109/CLUSTER.2019.8891006|isbn=978-1-7281-4734-5|s2cid=207951690}}</ref> The recovered data may be rewritten to exactly the same physical location, to spare blocks elsewhere on the same piece of hardware, or to replacement hardware.
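
One of the simplest techniques for surviving the loss of a whole drive is single XOR parity, used by some RAID levels. The Python sketch below is a toy model under stated assumptions: three hypothetical data drives each hold one four-byte block, and a fourth drive holds their bytewise XOR. It does not reflect the on-disk layout of any real RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Blocks held by three hypothetical data drives.
drives = [b"AAAA", b"BBBB", b"CCCC"]

# The parity drive stores the XOR of all data blocks.
parity = xor_blocks(drives)

# Drive 1 fails completely: XOR the survivors with the parity block.
rebuilt = xor_blocks([drives[0], drives[2], parity])
```

Because XOR is its own inverse, <code>rebuilt</code> equals the lost block. The scheme tolerates exactly one missing block per stripe; losing two requires a stronger code.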

=== {{Anchor|LINUX-EDAC|BLUESMOKE}}Error-correcting memory ===
{{Main|ECC memory}}

[[Dynamic random-access memory]] (DRAM) may provide stronger protection against [[soft error]]s by relying on error-correcting codes. Such error-correcting memory, known as ''ECC'' or ''EDAC-protected'' memory, is particularly desirable for mission-critical applications, such as scientific computing, finance, and medicine, as well as for extraterrestrial applications due to the increased [[cosmic ray|radiation]] in space.

Error-correcting memory controllers traditionally use [[Hamming code]]s, although some use [[triple modular redundancy]]. [[Error correction code#Interleaving|Interleaving]] assigns physically neighboring bits to different words, so that a single cosmic ray that upsets several adjacent bits has its effect spread across multiple words. As long as a [[single-event upset]] (SEU) does not exceed the error threshold (e.g., a single error) in any particular word between accesses, it can be corrected (e.g., by a single-bit error-correcting code), and the illusion of an error-free memory system may be maintained.<ref>{{cite web
 |title       = Using StrongArm SA-1110 in the On-Board Computer of Nanosatellite
 |publisher   = Tsinghua Space Center, [[Tsinghua University]], Beijing
 |access-date  = 2009-02-16
 |url         = http://www.apmcsta.org/File/doc/Conferences/6th%20meeting/Chen%20Zhenyu.doc
|archive-url = https://web.archive.org/web/20111002152735/http://www.apmcsta.org/File/doc/Conferences/6th%20meeting/Chen%20Zhenyu.doc
|url-status  = dead
|archive-date = 2011-10-02
}}<!-- I wish I had a better reference --></ref>
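
How a single-error-correcting code and interleaving work together can be sketched in Python. The example assumes a Hamming(7,4) code with parity bits at positions 1, 2 and 4 and a simple column-wise interleaver; real memory controllers typically use wider codes, and the function names here are illustrative.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(c):
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome is the 1-based error position
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position

def interleave(words):
    """Lay words out so that physically adjacent cells belong to different words."""
    return [word[i] for i in range(len(words[0])) for word in words]

def deinterleave(cells, num_words):
    """Inverse of interleave: gather every num_words-th cell into one word."""
    return [cells[w::num_words] for w in range(num_words)]

data_words = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
cells = interleave([hamming74_encode(d) for d in data_words])

# A single event upsets four physically adjacent cells.
for i in range(9, 13):
    cells[i] ^= 1

recovered = [hamming74_correct(w)[0] for w in deinterleave(cells, len(data_words))]
```

With four interleaved words, the four-bit burst lands as one error in each word, which the code can correct. Without interleaving, the same burst would put four errors into a single codeword and exceed what a single-error-correcting code can repair.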

In addition to hardware providing features required for ECC memory to operate, [[operating system]]s usually contain related reporting facilities that are used to provide notifications when soft errors are transparently recovered. One example is the [[Linux kernel]]'s ''EDAC'' subsystem (previously known as ''Bluesmoke''), which collects the data from error-checking-enabled components inside a computer system; besides collecting and reporting back the events related to ECC memory, it also supports other checksumming errors, including those detected on the [[PCI bus]].<ref>{{cite magazine
 | url = http://www.admin-magazine.com/Articles/Monitoring-Memory-Errors
 | title = Error Detection and Correction
 | access-date = 2014-08-12
 | author = Jeff Layton | magazine = [[Linux Magazine]]
}}</ref><ref>{{cite web
 | url = http://bluesmoke.sourceforge.net/
 | title = EDAC Project
 | access-date = 2014-08-12
 | website = bluesmoke.sourceforge.net
}}</ref><ref>{{cite web
 |url         = https://www.kernel.org/doc/Documentation/edac.txt
 |title       = Documentation/edac.txt
 |work        = Linux kernel documentation
 |date        = 2014-06-16
 |access-date  = 2014-08-12
 |publisher   = [[kernel.org]]
 |url-status     = dead
 |archive-url  = https://web.archive.org/web/20090905174616/http://www.kernel.org/doc/Documentation/edac.txt
 |archive-date = 2009-09-05
}}</ref> A few systems{{specify|date=December 2021}} also support [[memory scrubbing]] to catch and correct errors early before they become unrecoverable.
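
On Linux systems where an EDAC driver is loaded, per-controller error counters are exposed as text files under <code>/sys/devices/system/edac/mc/</code>. The Python sketch below reads the corrected (<code>ce_count</code>) and uncorrected (<code>ue_count</code>) totals. It is a minimal example that assumes this sysfs layout is present; the function name is hypothetical, and on machines without EDAC support it simply returns an empty result.

```python
from pathlib import Path

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller name: (corrected errors, uncorrected errors)}."""
    counts = {}
    for controller in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((controller / "ce_count").read_text())
            uncorrected = int((controller / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # counter missing or unreadable: skip this controller
        counts[controller.name] = (corrected, uncorrected)
    return counts

if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

A steadily rising corrected-error count on one controller is commonly treated as an early warning that a memory module should be replaced.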