Editing UTF-32 (section)

== History ==
The original [[ISO/IEC 10646]] standard defines a 32-bit ''encoding form'' called '''UCS-4''', in which each code point in the [[Universal Character Set]] (UCS) is represented by a 31-bit value from 0 to 0x7FFFFFFF (the sign bit was unused and zero). In November 2003, Unicode was restricted by RFC&nbsp;3629 to match the constraints of the [[UTF-16]] encoding: explicitly prohibiting code points greater than U+10FFFF (and also the high and low surrogates U+D800 through U+DFFF). This limited subset defines UTF-32.<ref>{{Cite web|title=Publicly Available Standards - ISO/IEC 10646:2020 |url=https://standards.iso.org/ittf/PubliclyAvailableStandards/index.html|access-date=2021-10-12|website=ISO Standards |quote=Clause 9.4: "Because surrogate code points are not UCS scalar values, UTF-32 code units in the range 0000 D800-0000 DFFF are ill-formed". Clause 4.57: "[UCS codespace] <!--doubt this is in any source, or then redundant previously: codespace--> consisting of the integers from 0 to 10 FFFF (hexadecimal)".<!--in an older(?) source like: "uses the UCS codespace which consists of the integers from 0 to 10FFFF."[http://std.dkuug.dk/JTC1/sc2/WG2/docs/n3967.zip/FCD-10646-00-Main.pdf]--> Clause 4.58: "[UCS scalar value] any UCS code point except high-surrogate and low-surrogate code points".}}</ref><ref name="4_or_3_bytes">{{Cite web |title=Mapping codepoints to Unicode encoding forms |url=https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixA |first1=Peter |last1=Constable |date=2001-06-13 |access-date=2022-10-03 |website=Computers and Writing Systems - SIL International }}</ref> Although the ISO standard had (as of 1998 in Unicode 2.1) "reserved for private use" 0xE00000 to 0xFFFFFF, and 0x60000000 to 0x7FFFFFFF<ref>{{Cite web |title=Annex B - The Universal Character Set (UCS) |url=http://std.dkuug.dk/cen/tc304/guidecharactersets/guideannexb.html |access-date=2022-10-03 |website=DKUUG Standardizing |url-status=live |archive-url=https://web.archive.org/web/20220122081513/http://std.dkuug.dk/cen/tc304/guidecharactersets/guideannexb.html |archive-date= Jan 22, 2022 }}</ref> these areas were removed in later versions. Because the Principles and Procedures document of [[ISO/IEC JTC 1/SC 2]] Working Group 2 states that all future assignments of code points will be constrained to the Unicode range, UTF-32 will be able to represent all UCS code points and UTF-32 and UCS-4 are identical.<ref>{{Cite book |title=The Unicode Standard, version 6.0 |date=February 2011 |publisher=[[Unicode Consortium]] |isbn=978-1-936213-01-6 |location=Mountain View, CA |pages=573 |chapter=C.2 Encoding Forms in ISO/IEC 10646 |quote=It [UCS-4] is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in 10646. |chapter-url=https://www.unicode.org/versions/Unicode6.0.0/appC.pdf}}</ref>