==EUC-KR==
{{Redirect|EUC-KR|the variant so named in HTML standards|Unified Hangul Code}}
{{Infobox character encoding
|name=EUC-KR
|alias=Wansung, IBM-970
|mime=EUC-KR
|image=EUC-KR_without_extensions.svg
|caption=EUC-KR code structure
|standard=KS X 2901 (KS C 5861)
|lang=[[Korean language|Korean]], [[English language|English]], [[Russian language|Russian]]
|encodes=[[KS X 1001]]
|extends=[[ASCII]] or [[ISO 646|ISO 646:KR]]
|extensions=[[MacKorean|Mac OS Korean]], [[Code page 949 (IBM)|IBM-949]], [[Unified Hangul Code|Unified Hangul Code (Windows-949)]]
|next=[[Unified Hangul Code]] (web standards)
|classification = [[Extended ASCII|Extended]] [[ISO 646]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}

'''EUC-KR''' is a [[variable-width encoding|variable-length encoding]] for representing Korean text using two coded character sets: {{nowrap|[[KS X 1001]]}} (formerly KS C 5601)<ref>{{cite web |url=https://examples.oreilly.com/cjkvinfo/AppL/ksx1001.pdf |title=KS X 1001:1992}}</ref><ref>{{cite iso-ir |number=149 |title=KS C 5601:1987 |sponsor=Korea Bureau of Standards |date=1988-10-01}}</ref> and either {{nowrap|[[ISO 646]]:KR}} ({{nowrap|KS X 1003}}, formerly {{nowrap|KS C 5636}}) or [[ASCII]], depending on the variant. {{nowrap|[[KS X 2901]]}} (formerly {{nowrap|KS C 5861}}) specifies the encoding, and {{IETF RFC|1557}} gave it the name EUC-KR.

A character drawn from KS X 1001 (G1, code set 1) is encoded as two bytes in GR (0xA1–0xFE) and a character from {{nowrap|KS X 1003}} or ASCII (G0, code set 0) takes one byte in GL (0x21–0x7E).
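
The byte structure described above can be illustrated with a short Python sketch. The helper name <code>split_euc_kr</code> is hypothetical, and the function only checks the EUC byte ranges; it does not verify that a two-byte code is actually assigned in KS X 1001. Passing control bytes and the space (everything up to 0x7F) through as single bytes is an assumption of this sketch rather than something stated above.

```python
def split_euc_kr(data: bytes) -> list[bytes]:
    """Split EUC-KR bytes into one-byte (G0) and two-byte (G1) codes."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # Code set 0: a single byte (GL, plus controls and space).
            chars.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= b <= 0xFE:
            # Code set 1: a lead byte in GR must be followed by a trail byte in GR.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"invalid trail byte after 0x{b:02X} at offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {i} is outside the EUC-KR structure")
    return chars


# Python's built-in "euc_kr" codec is used here only to produce sample bytes.
sample = "A한글".encode("euc_kr")
print([c.hex() for c in split_euc_kr(sample)])
```

Because the lead and trail byte ranges coincide and never overlap with the single-byte range, a splitter of this kind needs no lookup tables, only the two range tests.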

EUC-KR is usually referred to as Wansung ({{korean|완성|rr=Wanseong|lit=precomposed<ref>{{cite book|chapter-url=https://books.google.com/books?id=SA92uQqTB-AC&pg=PA146|title=CJKV Information Processing|last=Lunde|first=Ken|author-link=Ken Lunde|page=146|chapter=Chapter 3: Character Set Standards|isbn=978-0596514471|date=2009|publisher="O'Reilly Media, Inc." }}</ref>}}) in the [[Republic of Korea]]. IBM refers to the double-byte component as '''Code page 971''',<ref>{{Cite web|title=IBM Globalization – Coded character set identifiers – CCSID 971|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid971.html|archive-url=https://web.archive.org/web/20141130005339/http://www-01.ibm.com/software/globalization/ccsid/ccsid971.html|access-date=2021-09-03|archive-date=2014-11-30}}</ref> and to EUC-KR with ASCII as '''Code page 970'''.<ref>{{cite web|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid970.html|title=CCSID 970|publisher=IBM|work=IBM Globalization|archive-url=https://web.archive.org/web/20141201233141/http://www-01.ibm.com/software/globalization/ccsid/ccsid970.html|archive-date=2014-12-01}}</ref><ref>{{cite web|url=https://icu4c-demos.unicode.org/icu-bin/convexp?conv=euc-kr|title=ibm-970_P110_P110-2006_U2 (alias euc-kr)|work=Converter Explorer – ICU Demonstration|publisher=International Components for Unicode}}</ref><ref>{{Citation|title=International Components for Unicode (ICU), ibm-970_P110_P110-2006_U2.ucm|url=https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-970_P110_P110-2006_U2.ucm|date=2002-12-03}}</ref> Microsoft implements it as '''Code page 20949''' ("Korean Wansung")<ref name="winids" /><ref>{{cite web |url=https://source.winehq.org/git/wine.git/blob/6f68543692a7588daa581d00c475715395036b15:/tools/make_unicode#l946 |title=dump_krwansung_codepage: build Korean Wansung table from the KSX1001 file |work=make_unicode: Generate code page .c files from ftp.unicode.org descriptions |first=Alexandre |last=Julliard |date=11 March 2021 |publisher=[[Wine (software)|Wine Project]]}}</ref> and '''Code page 51949''' ("EUC Korean").<ref name="winids">{{cite web |url=https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers |title=Code Page Identifiers |publisher=Microsoft |department=Windows Dev Center|date=7 January 2021 }}</ref>

{{As of|2025|05}}, less than 0.065% of all web pages worldwide declare EUC-KR as their encoding,<ref>{{Cite web |title=Usage Statistics and Market Share of EUC-KR for Websites, March 2025 |url=https://w3techs.com/technologies/details/en-euckr |access-date=2025-05-02 |website=w3techs.com}}</ref> but <!-- At 7.4% but rather showing result of calculation, that number plus other encoding adds up to way more than 100.0%. [[KS C 5601]] seems to be related, and is factored into the calculation: 100-95.5% = --> 4.5% of South Korean web pages use it.<ref>{{Cite web|title=Distribution of Character Encodings among websites that use .kr|url=https://w3techs.com/technologies/segmentation/tld-kr-/character_encoding|website=w3techs.com|access-date=2025-05-02}}</ref> <!-- Cyrillic Windows-1251 in Russia at 100-96.1% = 3.9% (4.5%) is often higher, currently lower, and some small languages have higher non-UTF-8 use: https://w3techs.com/technologies/segmentation/cl-br-/character_encoding meaning the following is an untrue statement: making it the most popular non-[[UTF-8]]/Unicode encoding for a language/web domain.<ref>{{Cite web|url=https://w3techs.com/technologies/segmentation/cl-ko-/character_encoding|title=Distribution of Character Encodings among websites that use Korean|website=w3techs.com|access-date=2022-06-18}}</ref> --> Including extensions, it is the most widely used legacy character encoding in Korea on all three major platforms ([[macOS]], other Unix-like OSes, and Windows), but usage has been slowly shifting to [[UTF-8]] as the latter gains popularity, especially on Linux and macOS.

As is the case for most other legacy encodings, [[UTF-8]] is now preferred for new use, because it avoids inconsistencies between platforms and vendors.

===Unified Hangul Code===
{{Main|Unified Hangul Code}}

A common extension of EUC-KR is the [[Unified Hangul Code]] ({{korean|통합형 한글 코드|rr=Tonghabhyeong Hangeul Kodeu|labels=no}},<ref>{{cite web|url=https://www.w3c.or.kr/i18n/hangul-i18n/ko-code.html|title=한글 코드에 대하여|publisher=W3C|language=ko|access-date=2019-01-07|archive-url=https://web.archive.org/web/20130524175322/http://www.w3c.or.kr/i18n/hangul-i18n/ko-code.html|archive-date=2013-05-24|url-status=dead}}</ref> or {{korean|통합 완성형|rr=Tonghab Wansunghyung|labels=no}}), which is the default Korean codepage on Microsoft Windows. It is given the code page number 949 by Microsoft, and 1261<ref>In [https://opensource.apple.com/source/ICU/ICU-59180.0.1/icuSources/common/ucnv_lmb.cpp.auto.html ucnv_lmb.cpp], a file originating from [[IBM]] and included in the [[International Components for Unicode]] source tree, the lead byte 0x11 is commented as referring to "Korean: ibm-1261" after the definition of <code>ULMBCS_GRP_KO</code>, and is mapped to the <code>"windows-949"</code> ICU codec in the <code>OptGroupByteToCPName</code> array later in the file.</ref> or 1363<ref>{{citation|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1363.html|publisher=IBM|title=Coded character set identifiers – CCSID 1363|work=IBM Globalization|archive-url=https://web.archive.org/web/20141129210404/http://www-01.ibm.com/software/globalization/ccsid/ccsid1363.html|archive-date=2014-11-29|url-status=dead}}</ref> by IBM. [[Code page 949 (IBM)|IBM's code page 949]] is a different, unrelated, EUC-KR extension.

Unified Hangul Code extends EUC-KR by using codes that do not conform to the EUC structure to incorporate additional syllable blocks, completing the coverage of the composed syllable blocks available in [[Johab]] and Unicode. The [[W3C]]/[[WHATWG]] Encoding Standard used by [[HTML5]] incorporates the Unified Hangul Code extensions into its definition of EUC-KR.<ref>{{citation|url=https://encoding.spec.whatwg.org/#index-euc-kr|title=5. Indexes (§ index EUC-KR)|work=Encoding Standard|publisher=WHATWG}}</ref>
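
As a rough illustration of codes that fall outside the EUC structure, the following Python sketch encodes two syllable blocks with Python's built-in <code>cp949</code> codec (its implementation of Unified Hangul Code) and checks whether both bytes of each code lie in GR. The helper name <code>is_euc_pair</code> is hypothetical, and the choice of 똠 (U+B620) as a syllable block absent from KS X 1001 is an assumption of the example.

```python
def is_euc_pair(code: bytes) -> bool:
    """Return True if a two-byte code has both bytes in GR (0xA1-0xFE)."""
    return len(code) == 2 and all(0xA1 <= b <= 0xFE for b in code)


# "한" is in KS X 1001; "똠" is assumed here to be one of the added syllable blocks.
for ch in "한똠":
    code = ch.encode("cp949")
    kind = "EUC-conformant" if is_euc_pair(code) else "outside the EUC structure"
    print(f"U+{ord(ch):04X} -> {code.hex()} ({kind})")
```

A decoder that accepts only the EUC byte ranges would reject the second code, which is why text saved as Unified Hangul Code cannot always be read back as plain EUC-KR.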

===Mac OS Korean (HangulTalk)===
Other encodings incorporating EUC-KR as a subset include the Mac OS Korean script (known as Code page 10003 or <code>x-mac-korean</code>),<ref name="msdnlabels"/> which was used by HangulTalk (MacOS-KH), the Korean localization of the [[classic Mac OS]]. It was developed by Elex Computer ({{lang|ko|일렉스}}), who were at the time the authorised distributor of Apple Macintosh computers in South Korea.<ref>{{cite web |url=http://hojin.freeservers.com/beige/hom/11HangulTalk.html |title=HangulTalk: De facto standard Hangul environment for Mac |work=Guide to using Hangul on Macintosh |last=Gil |first=Hojin}}</ref><ref name="lunde2009appE"/>

HangulTalk adds extension characters with lead bytes between 0xA1 and 0xAD, both in unused space within the EUC-KR GR plane (trail bytes 0xA1&ndash;0xFE), and using non-EUC codes outside of it (trail bytes 0x41&ndash;0xA0). Some of these characters are font-style-independent stylized [[dingbat]]s.<ref name="lunde2009appE">{{citation|mode=cs1 |title=Appendix E: Vendor Character Set Standards |work=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing |last=Lunde |first=Ken |author-link=Ken Lunde |year=2009 |edition=2nd |publisher=[[O'Reilly Media|O'Reilly]] |location=[[Sebastopol, CA]] |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appE.pdf}}</ref> Many of these characters do not have exact Unicode mappings, and Apple software maps these cases variously to [[combining character|combining sequences]], to approximate mappings with an appended [[Private Use Area|private-use]] character as a modifier for round-trip purposes, or to private-use characters.<ref name="mackoreantxt">{{cite web |url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT |author=Apple |author-link=Apple, Inc |title=Map (external version) from Mac OS Korean encoding to Unicode 3.2 and later |date=2005-04-05 |publisher=[[Unicode Consortium]]}}</ref>

Apple also uses certain single-byte codes outside of the EUC-KR plane for additional characters: 0x80 for a [[required space]], 0x81 for a [[won sign]] (₩), 0x82 for an [[en dash]] (&ndash;), 0x83 for a [[copyright sign]] ({{not a typo|©}}), 0x84 for a wide [[underscore]] ({{not a typo|＿}}) and 0xFF for an [[ellipsis]] (…).<ref name="mackoreantxt" /> Although none of these additional single-byte codes are within the lead byte range of plain EUC-KR (unlike Apple's extensions to EUC-CN, [[#x-mac-chinesesimp|see above]]), some are within the lead byte range of Unified Hangul Code (specifically, 0x81, 0x82, 0x83 and 0x84).
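
The overlap noted above can be checked with a small Python sketch. The names <code>MAC_KOREAN_SINGLE_BYTE</code> and <code>clashes</code> are hypothetical. The Unicode code points are simply the ordinary ones for the named characters (with U+00A0, the no-break space, standing in for the required space); they are not taken from Apple's mapping file, which may attach private-use modifiers. The Unified Hangul Code lead byte range of 0x81–0xFE is likewise an assumption, following the EUC-KR decoder in the WHATWG Encoding Standard.

```python
# Single-byte extension codes listed in the text (code points are illustrative).
MAC_KOREAN_SINGLE_BYTE = {
    0x80: "\u00A0",  # required (no-break) space
    0x81: "\u20A9",  # won sign
    0x82: "\u2013",  # en dash
    0x83: "\u00A9",  # copyright sign
    0x84: "\uFF3F",  # wide underscore
    0xFF: "\u2026",  # ellipsis
}

EUC_KR_LEAD = range(0xA1, 0xFF)  # lead bytes 0xA1-0xFE of plain EUC-KR
UHC_LEAD = range(0x81, 0xFF)     # lead bytes 0x81-0xFE (assumed, per WHATWG)


def clashes(lead_range: range) -> list[int]:
    """Return the single-byte codes that are also valid lead bytes."""
    return sorted(b for b in MAC_KOREAN_SINGLE_BYTE if b in lead_range)


print([hex(b) for b in clashes(EUC_KR_LEAD)])
print([hex(b) for b in clashes(UHC_LEAD)])
```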