{{Use dmy dates|date=April 2022}}
{{Short description|Unicode character}}
{{Redirect|FEFF}}
The '''byte-order mark''' ('''BOM''') is a particular usage of the special [[Unicode]] character code, {{unichar|FEFF|Zero Width No-Break Space}}, whose appearance as a [[Magic number (programming)#Magic numbers in files|magic number]] at the start of a text stream can signal several things to a [[computer program|program]] reading the text:<ref name="unicode FAQ">{{cite web|url=https://www.unicode.org/faq/utf_bom.html#BOM |title=FAQ - UTF-8, UTF-16, UTF-32 & BOM |website=Unicode.org |access-date=28 January 2017}}</ref>
* the byte order, or [[endianness]], of the text stream in the cases of 16-[[bit]] and 32-bit encodings;
* the fact that the text stream's encoding is Unicode, to a high level of confidence;
* which Unicode character encoding is used.

BOM use is optional. Its presence interferes with the use of [[UTF-8]] by software that does not expect non-[[ASCII]] bytes at the start of a file but that could otherwise handle the text stream.

Unicode can be encoded in units of 8-bit, 16-bit, or 32-bit integers. For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. The BOM is encoded in the same scheme as the rest of the document and becomes a [[Universal Character Set characters#Noncharacters|{{Proper name|noncharacter}}]] Unicode code point if its bytes are swapped. Hence, the process accessing the text can examine these first few bytes to determine the endianness, without requiring some contract or [[metadata]] outside of the text stream itself. Generally, the receiving computer will swap the bytes to its own endianness, if necessary, and will then no longer need the BOM for processing.

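The endianness check described above can be sketched in Python. This is only an illustration, assuming the stream is already known to be UTF-16; the function name <code>utf16_byte_order</code> is hypothetical and not part of any library.

```python
from typing import Optional


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Return 'big', 'little', or None when no UTF-16 BOM is present.

    Assumption: the caller already knows the stream is UTF-16.
    """
    head = data[:2]
    if head == b"\xfe\xff":  # U+FEFF stored most significant byte first
        return "big"
    if head == b"\xff\xfe":  # the same code point with its two bytes swapped
        return "little"
    return None


# Reading the big-endian BOM with the wrong byte order yields U+FFFE,
# which is a noncharacter, so the mistake is detectable.
assert int.from_bytes(b"\xfe\xff", "little") == 0xFFFE

sample = "\ufeffhi".encode("utf-16-le")  # BOM followed by "hi", little-endian
order = utf16_byte_order(sample)
# Once the byte order is known, the BOM itself is no longer needed.
text = sample[2:].decode("utf-16-le" if order == "little" else "utf-16-be")
```

Here <code>order</code> is <code>"little"</code> and <code>text</code> is <code>"hi"</code>. A stream with no BOM returns <code>None</code>, in which case the byte order has to come from elsewhere.
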
The byte sequence of the BOM differs per Unicode encoding (including ones outside the Unicode standard such as [[UTF-7]], see [[#Byte_order_marks_by_encoding|table below]]), and none of the sequences is likely to appear at the start of text streams stored in other encodings. Therefore, placing an encoded BOM at the start of a text stream can indicate that the text is Unicode and identify the encoding scheme used. This use of the BOM is called a "Unicode signature".<ref>{{cite web|title=The Unicode® Standard Version 9.0|url=https://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf|website=The Unicode Consortium|ref=2.13 Special Characters}}</ref>
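A reader can use the signature to choose a decoder, as in the following sketch. It relies on the BOM constants in Python's standard <code>codecs</code> module; the helper name <code>sniff_signature</code> is hypothetical, and encodings outside the standard (such as UTF-7) are left out for brevity.

```python
import codecs
from typing import Optional, Tuple

# Longer signatures are tested first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_signature(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding name, data without the BOM), or (None, data)."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


encoding, body = sniff_signature(codecs.BOM_UTF8 + "é".encode("utf-8"))
decoded = body.decode(encoding) if encoding else None
```

One known limitation of this ordering is that UTF-16 LE text whose first character after the BOM is U+0000 would be reported as UTF-32 LE. Such input is unusual, but a production detector may want additional checks.
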