Unicode and HTML
{{Short description|Relationship between Unicode characters and HTML}}
{{Multiple issues|
{{primary sources|date=December 2011}}
{{essay-like|date=December 2011}}
{{refimprove|date=January 2011}}
}}
{{SpecialChars}}
{{Html series}}
Web pages authored using HyperText Markup Language ([[HTML]]) may contain multilingual text represented with the [[Unicode]] universal character set. Key to the relationship between Unicode and HTML is the relationship between two things: the "document character set", which defines the set of characters that may be present in an HTML document and assigns a number to each of them, and the "external character encoding", or "charset", which is used to encode a given document as a sequence of bytes.

In RFC 1866, the initial HTML 2.0 standard, the document character set was defined as ISO-8859-1 (later HTML standards default to the [[Windows-1252]] encoding). {{IETF RFC|2070}} extended the document character set to [[ISO 10646]], which is basically equivalent to Unicode. The document character set does not vary between documents written in different languages or created on different platforms.

The external character encoding is chosen by the author of the document (or by the software the author uses to create it) and determines how the bytes used to store or transmit the document map to characters from the document character set. Characters not present in the chosen external character encoding may still be represented by character references, either numeric character references or character entity references.

The relationship between Unicode and HTML tends to be a difficult topic for computer professionals, document authors, and [[World Wide Web|web]] users alike. The accurate representation of text in [[web page]]s from different [[natural language]]s and [[writing system]]s is complicated by the details of [[character encoding]], [[markup language]] syntax, [[Computer font|fonts]], and varying levels of support in [[web browser]]s.
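The distinction between the document character set and the external character encoding can be illustrated with a minimal Python sketch. It is not taken from any HTML specification: the helper name <code>encode_for_charset</code> and the sample string are hypothetical, and Python's built-in <code>xmlcharrefreplace</code> error handler stands in for what an authoring tool might do when a character is missing from the chosen charset.

```python
import html


def encode_for_charset(text: str, charset: str) -> bytes:
    """Encode text in the given external character encoding.

    Characters the charset cannot represent are replaced with numeric
    character references (&#N;), where N is the character's number in
    the document character set (its Unicode code point).
    """
    return text.encode(charset, errors="xmlcharrefreplace")


# Hypothetical sample: "é" exists in ISO-8859-1, the other non-ASCII
# characters do not.
sample = "café Ω 日本"

latin1_bytes = encode_for_charset(sample, "iso-8859-1")
utf8_bytes = encode_for_charset(sample, "utf-8")

# ISO-8859-1 stores "é" as one byte and falls back to references
# for the characters it lacks.
print(latin1_bytes)  # b'caf\xe9 &#937; &#26085;&#26412;'

# UTF-8 can represent every Unicode character, so no references appear.
print(b"&#" in utf8_bytes)  # False

# Decoding the bytes and resolving the references recovers the same
# characters in both cases: the document character set is unchanged.
decoded = html.unescape(latin1_bytes.decode("iso-8859-1"))
print(decoded == sample)  # True
```

The two byte sequences differ, but both describe the same sequence of characters, which is why the document character set can stay fixed while the external encoding varies from document to document.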