
===Code points===
A code point is represented by a sequence of one or more code units, and the encoding defines how code points map to those sequences. The number of code units needed for a code point therefore depends on the encoding:
* UTF-8: code units are 8 bits long, and each code point maps to a sequence of one, two, three, or four code units.
* UTF-16: code units are 16 bits long, twice the size of 8-bit code units. A single code unit is therefore wide enough to hold any code point with a scalar value below U+10000, and such code points are encoded as one code unit. Code points at U+10000 or above require two code units each; such a pair of code units is called a [[UTF-16#Code points from U+010000 to U+10FFFF|surrogate pair]].
* UTF-32: the 32-bit code unit is large enough that every code point is represented as a single code unit.
* GB 18030: code units are 8 bits long, so a code point often needs several of them. Each code point maps to one, two, or four code units.<ref>{{cite web | url=https://docs.oracle.com/javase/tutorial/i18n/text/terminology.html | title=Terminology (The Java Tutorials) | publisher=Oracle | access-date=25 March 2018 }}</ref>
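A short Python sketch can make the differences between the encodings above concrete by counting the code units that a few sample characters occupy in each. It assumes Python's built-in codec names (<code>utf-8</code>, <code>utf-16-le</code>, <code>utf-32-le</code>, <code>gb18030</code>). The little-endian variants are chosen so that no byte order mark is added. The names <code>CODE_UNIT_BYTES</code> and <code>code_units</code> are hypothetical helpers made up for this illustration.

```python
# Code-unit size in bytes for each codec used here (illustrative table).
# The "-le" variants are used so that Python does not prepend a byte order mark.
CODE_UNIT_BYTES = {
    "utf-8": 1,
    "utf-16-le": 2,
    "utf-32-le": 4,
    "gb18030": 1,
}


def code_units(ch: str, encoding: str) -> int:
    """Return how many code units the string `ch` occupies in `encoding`."""
    data = ch.encode(encoding)
    return len(data) // CODE_UNIT_BYTES[encoding]


if __name__ == "__main__":
    # LATIN CAPITAL A, e-acute, CJK "zhong", and an emoji outside the BMP.
    for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        counts = {enc: code_units(ch, enc) for enc in CODE_UNIT_BYTES}
        print(f"U+{ord(ch):04X}", counts)
```

The emoji (U+1F600) shows the contrast most clearly. It needs four UTF-8 code units, two UTF-16 code units (a surrogate pair), one UTF-32 code unit and four GB 18030 code units.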
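UTF-16 turns a code point at U+10000 or above into two code units in a fixed way. It subtracts 0x10000 to get a 20-bit value. The top 10 bits are then added to 0xD800 to form the high surrogate, and the bottom 10 bits are added to 0xDC00 to form the low surrogate. The minimal Python sketch below shows this arithmetic. The function name <code>to_surrogate_pair</code> is hypothetical. The result is checked against Python's own UTF-16 encoder.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use a surrogate pair")
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    ch = "\U0001F600"
    high, low = to_surrogate_pair(ord(ch))
    print(f"U+{ord(ch):X} -> {high:X} {low:X}")
    # Cross-check against the built-in big-endian UTF-16 codec (no BOM).
    assert ch.encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```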