Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Checksum
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Data used to detect errors in other data}} {{more citations needed|date=January 2024}} [[File:Checksum.svg|thumb|330px|right|Effect of a typical checksum function (the Unix<code>[[cksum]]</code> utility)]] A '''checksum''' is a small-sized [[Block (data storage)|block]] of data derived from another block of [[digital data]] for the purpose of [[error detection|detecting errors]] that may have been introduced during its [[telecommunications|transmission]] or [[computer storage|storage]]. By themselves, checksums are often used to verify [[data integrity]] but are not relied upon to verify [[data authenticity]].<ref>{{Cite web |title=Definition of CHECKSUM |url=https://www.merriam-webster.com/dictionary/checksum |access-date=2022-03-10 |website=Merriam-Webster |language=en |archive-date=2022-03-10 |archive-url=https://web.archive.org/web/20220310132715/https://www.merriam-webster.com/dictionary/checksum |url-status=live }}</ref> The [[algorithm|procedure]] which generates this checksum is called a '''checksum function''' or '''checksum algorithm'''. Depending on its design goals, a good checksum algorithm usually outputs a significantly different value, even for small changes made to the input.<ref>{{Cite web |last=Hoffman |first=Chris |title=What Is a Checksum (and Why Should You Care)? |url=https://www.howtogeek.com/363735/what-is-a-checksum-and-why-should-you-care/ |access-date=2022-03-10 |website=How-To Geek |date=30 September 2019 |language=en-US |archive-date=2022-03-09 |archive-url=https://web.archive.org/web/20220309114058/https://www.howtogeek.com/363735/what-is-a-checksum-and-why-should-you-care/ |url-status=live }}</ref> This is especially true of [[cryptographic hash function]]s, which may be used to detect many data corruption errors and verify overall [[data integrity]]; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted. Checksum functions are related to [[hash function]]s, [[fingerprint (computing)|fingerprint]]s, [[randomization function]]s, and [[cryptographic hash function]]s. However, each of those concepts has different applications and therefore different design goals. For instance, a function returning the start of a string can provide a hash appropriate for some applications but will never be a suitable checksum. Checksums are used as [[cryptographic primitive]]s in larger authentication algorithms. For cryptographic systems with these two specific design goals{{Clarify|reason=What two specific design goals?|date=March 2023}}, see [[HMAC]]. [[Check digit]]s and [[parity bit]]s are special cases of checksums, appropriate for small blocks of data (such as [[Social Security number]]s, [[bank account]] numbers, [[computer word]]s, single [[byte]]s, etc.). Some [[error-correcting code]]s are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases. ==Algorithms== ===Parity byte or parity word=== The simplest checksum algorithm is the so-called [[longitudinal redundancy check|longitudinal parity check]], which breaks the data into "words" with a fixed number {{math|''n''}} of bits, and then computes the bitwise [[exclusive or]] (XOR) of all those words. The result is appended to the message as an extra word. In simpler terms, for {{math|''n''}}=1 this means adding a bit to the end of the data bits to guarantee that there is an even number of '1's. To check the integrity of a message, the receiver computes the bitwise exclusive or of all its words, including the checksum; if the result is not a word consisting of {{math|''n''}} zeros, the receiver knows a transmission error occurred.<ref>{{Cite web |last=Fairhurst |first=Gorry |date=2014 |title=Checksums & Integrity Checks |url=https://erg.abdn.ac.uk/users/gorry/eg3576/checksums.html |access-date=March 11, 2022 |archive-date=April 8, 2022 |archive-url=https://web.archive.org/web/20220408011213/https://erg.abdn.ac.uk/users/gorry/eg3576/checksums.html |url-status=live }}</ref> With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. Also swapping of two or more words will not be detected. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is {{math|1/''n''}}. ===Sum complement=== A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the [[two's complement]] of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant, too, detects any single-bit error, but the pro modular sum is used in [[SAE J1708]].<ref>{{cite web|url=http://www.kvaser.com/zh/about-can/related-protocols-and-standards/50.html |title=SAE J1708 |publisher=Kvaser.com |archive-url=https://web.archive.org/web/20131211152639/http://www.kvaser.com/zh/about-can/related-protocols-and-standards/50.html |archive-date=11 December 2013 }}</ref> ===Position-dependent=== The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as [[Fletcher's checksum]], [[Adler-32]], and [[cyclic redundancy check]]s (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the [[Analysis of algorithms|cost]] of computing the checksum. ==={{anchor|fuzzy checksum}}Fuzzy checksum=== The idea of fuzzy checksum was developed for detection of [[email spam]] by building up cooperative databases from multiple ISPs of email suspected to be spam. The content of such spam may often vary in its details, which would render normal checksumming ineffective. By contrast, a "fuzzy checksum" reduces the body text to its characteristic minimum, then generates a checksum in the usual manner. This greatly increases the chances of slightly different spam emails producing the same checksum. The ISP spam detection software, such as [[SpamAssassin]], of co-operating ISPs, submits checksums of all emails to the centralised service such as [[Distributed Checksum Clearinghouse|DCC]]. If the count of a submitted fuzzy checksum exceeds a certain threshold, the database notes that this probably indicates spam. ISP service users similarly generate a fuzzy checksum on each of their emails and request the service for a spam likelihood.<ref>{{cite web | url=https://cwiki.apache.org/confluence/display/spamassassin/iXhash | title=IXhash | publisher=Apache | access-date=7 January 2020 | archive-date=31 August 2020 | archive-url=https://web.archive.org/web/20200831125801/https://cwiki.apache.org/confluence/display/spamassassin/iXhash | url-status=live }}</ref> ===General considerations=== A message that is {{math|''m''}} bits long can be viewed as a corner of the {{math|''m''}}-dimensional [[hypercube]]. The effect of a checksum algorithm that yields an {{math|''n''}}-bit checksum is to map each {{math|''m''}}-bit message to a corner of a larger hypercube, with dimension {{math|''m'' + ''n''}}. The {{math|2<sup>''m'' + ''n''</sup>}} corners of this hypercube represent all possible received messages. The valid received messages (those that have the correct checksum) comprise a smaller set, with only {{math|2<sup>''m''</sup>}} corners. A single-bit transmission error then corresponds to a displacement from a valid corner (the correct message and checksum) to one of the {{math|''m''}} adjacent corners. An error which affects {{math|''k''}} bits moves the message to a corner which is {{math|''k''}} steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, to increase the likelihood "typical" transmission errors will end up in an invalid corner. ==See also== {{div col|colwidth=20em}} '''General topic''' * [[Algorithm]] * [[Check digit]] * [[Damm algorithm]] * [[Data rot]] * [[File verification]] * [[Fletcher's checksum]] * [[Frame check sequence]] * [[cksum]] * [[md5sum]] * [[sha1sum]] * [[Parchive]] * [[Sum (Unix)]] * [[SYSV checksum]] * [[BSD checksum]] * [[xxHash]] '''Error correction''' * [[Hamming code]] * [[Reed–Solomon error correction]] * [[IPv4 header checksum]] '''Hash functions''' * [[List of hash functions]] * [[Luhn algorithm]] * [[Parity bit]] * [[Rolling checksum]] * [[Verhoeff algorithm]] '''File systems''' * [[Bcachefs]], [[Btrfs]], [[ReFS]] and [[ZFS]] – file systems that perform automatic file integrity checking using checksums '''Related concepts''' *[[Isopsephy]] *[[Gematria]] *[[File fixity]] {{div col end}} ==References== {{reflist}} ==Further reading== *{{cite web |first1=Philip |last1=Koopman |first2=Kevin |last2=Driscoll |first3=Brendan |last3=Hall |title=Cyclic Redundancy Code and Checksum Algorithms to Ensure Critical Data Integrity |date=March 2015 |id=DOT/FAA/TC-14/49 |publisher=Federal Aviation Administration |url=http://www.tc.faa.gov/its/worldpac/techrpt/tc14-49.pdf |archive-url=https://web.archive.org/web/20150518073214/http://www.tc.faa.gov/its/worldpac/techrpt/tc14-49.pdf |archive-date=2015-05-18 |url-status=live }} *{{cite arXiv |first1=Philip |last1=Koopman |title=Large-Block Modular Addition Checksum Algorithms |eprint=2302.13432 |date=2023|class=cs.DS }} ==External links== {{wikibooks |1= Algorithm Implementation |2= Checksums }} *[http://www.netrino.com/Embedded-Systems/How-To/Additive-Checksums Additive Checksums (C)] theory from Barr Group *[http://www.peterjockisch.de/texte/computerartikel/Kryptographische-Pruefsummen/Kryptographische-Pruefsummen_EN.html Practical Application of Cryptographic Checksums] *[https://www.peterjockisch.de/Checksums_A4.pdf A4] *[https://www.peterjockisch.de/Checksums_US-Letter.pdf US-Letter] *[https://www.peterjockisch.de/Checksums_US-Letter_2.pdf US-Letter two-column] *[https://codebeautify.org/checksum-calculator Checksum Calculator] *[https://github.com/CRPrinzler/HASH-verify Open source python based application with GUI used to verify downloads.] [[Category:Checksum algorithms| ]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Anchor
(
edit
)
Template:Cite arXiv
(
edit
)
Template:Cite web
(
edit
)
Template:Clarify
(
edit
)
Template:Div col
(
edit
)
Template:Div col end
(
edit
)
Template:Math
(
edit
)
Template:More citations needed
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)
Template:Sister project
(
edit
)
Template:Wikibooks
(
edit
)