Double-byte character set
{{redirect|DBCS}}
{{No footnotes|date=September 2021}}
A '''double-byte character set''' ('''DBCS''') is a [[character encoding]] in which either all characters (including [[control characters]]) are encoded in two bytes, or only those [[graphic character]]s not representable by an accompanying [[single-byte character set]] ([[SBCS]]) are encoded in two [[bytes]]. [[Han characters]] generally make up most of these two-byte characters.

A DBCS supports national languages that contain many unique characters or symbols: one byte can represent at most [[256 (number)|256]] characters, while two bytes can represent up to [[65536 (number)|65,536]]. Examples of such languages include [[Japanese language|Japanese]] and [[Chinese language|Chinese]]. [[Hangul]] does not contain as many characters, but [[KS X 1001]] supports both Hangul and [[Hanja]], and uses two bytes per character.

==In CJK computing==
The term ''DBCS'' traditionally refers to a character encoding where each graphic character is encoded in two bytes.

In an 8-bit code, such as [[Big5]] or [[Shift JIS]], a character from the DBCS is represented with a lead (first) byte that has the [[most significant bit]] set (i.e., a value of 0x80 or greater), and the DBCS is paired with a single-byte character set (SBCS). For the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with [[half-width character]]s and the DBCS with [[full-width character]]s. In a 7-bit code such as [[ISO-2022-JP]], [[ANSI escape sequence|escape sequences]] or [[Shift Out|shift codes]] are used to switch between the SBCS and DBCS.

Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with [[ISO 2022]]. For example, "DBCS" can sometimes mean a double-byte encoding that is specifically not [[Extended Unix Code]] (EUC).
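The lead-byte mechanism described above can be illustrated with a short Python sketch. It assumes Shift JIS, with lead bytes in the ranges 0x81–0x9F and 0xE0–0xFC (the upper part of the second range is used by vendor extensions), and relies on Python's built-in <code>shift_jis</code> codec. The helper names <code>is_sjis_lead_byte</code> and <code>split_sjis</code> are hypothetical and chosen only for this example. It is not a complete or validating decoder.

```python
from typing import List


def is_sjis_lead_byte(b: int) -> bool:
    """Return True if b starts a two-byte Shift JIS character.

    Assumed lead-byte ranges: 0x81-0x9F and 0xE0-0xFC.
    """
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis(data: bytes) -> List[bytes]:
    """Split Shift JIS bytes into one-byte (SBCS) and two-byte (DBCS) units.

    The scan must start at the beginning of the data, because a trail
    byte can have the same value as an ordinary single-byte character.
    """
    units = []
    i = 0
    while i < len(data):
        width = 2 if is_sjis_lead_byte(data[i]) else 1
        units.append(data[i:i + width])
        i += width
    return units


# Latin "A", half-width katakana, full-width katakana, and a kanji.
text = "A\uff71\u30a2\u65e5"
units = split_sjis(text.encode("shift_jis"))
for unit in units:
    kind = "DBCS" if len(unit) == 2 else "SBCS"
    print(unit.hex(), kind, unit.decode("shift_jis"))
```

In this sample the half-width characters occupy one byte each and the full-width characters two bytes each. The second byte of the full-width katakana is 0x41, the same value as the letter "A", which is why such text cannot be safely split at an arbitrary byte offset.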
This original meaning of DBCS is different from what some consider correct usage today. Some insist that these character encodings be properly called [[multi-byte character set]]s (MBCS) or [[variable-width encoding]]s, because character encodings such as [[EUC-JP]], [[EUC-KR]], [[EUC-TW]], [[GB 18030]], and [[UTF-8]] encode some characters in a single byte and others in two or more bytes.

==Ambiguity <span class="anchor" id="Controversy"></span> ==
Some people use DBCS to mean the [[UTF-16]] and [[UTF-8]] encodings, while others use the term to mean older (pre-[[Unicode]]) character encodings that use more than one byte per character. [[Shift JIS]], [[GB 2312]] and [[Big5]] are a few character encodings that can use more than one byte per character, but even for these the term DBCS is strictly a misnomer, because they are really [[variable-width encoding]]s (as are both UTF-16 and UTF-8). Some [[IBM]] mainframes do have true DBCS code pages, which contain only the double-byte portion of a multi-byte code page.

The term "DBCS enablement", as used in software [[internationalization]], is ambiguous. It can mean either writing software for [[East Asian]] markets using older, code-page-based technology, or using Unicode. Sometimes the term also implies [[translation]] into an East Asian language. Usually "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means using the mutually incompatible character encodings of the various East Asian countries. Since Unicode, unlike many other character encodings, supports all the major languages in East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually only desired when much older operating systems or applications do not support Unicode.
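The variable-width behaviour of the encodings named in this section can be observed with a small Python sketch. It assumes the standard-library codecs <code>shift_jis</code>, <code>gb2312</code>, <code>big5</code>, <code>utf-8</code> and <code>utf-16-le</code>; the helper name <code>byte_widths</code> and the sample characters are chosen only for illustration.

```python
from typing import List


def byte_widths(text: str, encoding: str) -> List[int]:
    """Return how many bytes each character of text occupies in encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# A Latin letter and a Han character available in all three legacy encodings.
legacy_sample = "A\u4e2d"
for name in ("shift_jis", "gb2312", "big5"):
    print(name, byte_widths(legacy_sample, name))

# Latin letter, accented letter, Han character, and a character outside
# the Basic Multilingual Plane (U+2000B).
unicode_sample = "A\u00e9\u4e2d\U0002000B"
print("utf-8", byte_widths(unicode_sample, "utf-8"))
print("utf-16-le", byte_widths(unicode_sample, "utf-16-le"))
```

For these samples, each legacy encoding mixes one-byte and two-byte characters, UTF-8 uses between one and four bytes, and UTF-16 uses two bytes for most characters but four for the one outside the Basic Multilingual Plane. None of them uses exactly two bytes for every character.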
==TBCS==
<!-- Section header used as target for redirects -->
A triple-byte character set (TBCS) is a character encoding in which characters (including control characters) are encoded in three bytes.<!-- to be improved per IBM's definition. -->

==See also==
* [[Variable-width encoding]] (also known as MBCS, multi-byte character set)
* [[DOS/V]]

==External links==
* [https://docs.microsoft.com/en-us/windows/win32/intl/double-byte-character-sets Microsoft's definition of "double-byte character set"]
* {{webarchive|url=https://web.archive.org/web/20181018165225/http://www-01.ibm.com/software/globalization/terminology/d.html#x2001652|date=October 18, 2018|title=IBM's definition of "double-byte character set"}}

{{character encoding}}

[[Category:Character encoding]]