==EUC-KR==
{{Redirect|EUC-KR|the variant so named in HTML standards|Unified Hangul Code}}
{{Infobox character encoding |name=EUC-KR |alias=Wansung, IBM-970 |mime=EUC-KR |image=EUC-KR_without_extensions.svg |caption=EUC-KR code structure |standard=KS X 2901 (KS C 5861) |lang=[[Korean language|Korean]], [[English language|English]], [[Russian language|Russian]] |encodes=[[KS X 1001]] |extends=[[ASCII]] or [[ISO 646|ISO 646:KR]] |extensions=[[MacKorean|Mac OS Korean]], [[Code page 949 (IBM)|IBM-949]], [[Unified Hangul Code|Unified Hangul Code (Windows-949)]] |next=[[Unified Hangul Code]] (web standards) |classification = [[Extended ASCII|Extended]] [[ISO 646]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC }}
'''EUC-KR''' is a [[variable-width encoding|variable-length encoding]] to represent Korean text using two coded character sets, {{nowrap|[[KS X 1001]]}} (formerly KS C 5601)<ref>{{cite web |url=https://examples.oreilly.com/cjkvinfo/AppL/ksx1001.pdf |title=KS X 1001:1992}}</ref><ref>{{cite iso-ir |number=149 |title=KS C 5601:1987 |sponsor=Korea Bureau of Standards |date=1988-10-01}}</ref> and either {{nowrap|[[ISO 646]]:KR}} ({{nowrap|KS X 1003}}, formerly {{nowrap|KS C 5636}}) or [[ASCII]], depending on variant. {{nowrap|[[KS X 2901]]}} (formerly {{nowrap|KS C 5861}}) stipulates the encoding and {{IETF RFC|1557}} dubbed it EUC-KR. A character drawn from KS X 1001 (G1, code set 1) is encoded as two bytes in GR (0xA1–0xFE) and a character from {{nowrap|KS X 1003}} or ASCII (G0, code set 0) takes one byte in GL (0x21–0x7E). It is usually referred to as Wansung ({{korean|완성|rr=Wanseong|lit=precomposed<ref>{{cite book|chapter-url=https://books.google.com/books?id=SA92uQqTB-AC&pg=PA146|title=CJKV Information Processing|last=Lunde|first=Ken|author-link=Ken Lunde|page=146|chapter=Chapter 3: Character Set Standards|isbn=978-0596514471|date=2009|publisher="O'Reilly Media, Inc." }}</ref>}}) in the [[Republic of Korea]]. 
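The byte structure described above can be illustrated with a short Python sketch. The function name <code>split_euc_kr</code> is hypothetical, and the treatment of bytes below 0x21 (controls and space) as single-byte units is an assumption of this sketch rather than something the standard text above spells out. Python's built-in <code>euc_kr</code> codec is used only to produce sample input.

```python
def split_euc_kr(data: bytes) -> list[bytes]:
    """Split plain EUC-KR bytes into one-byte (GL) and two-byte (GR) units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Code set 0 (ASCII or KS X 1003): one byte.
            # Assumption: controls and space are passed through as single bytes.
            units.append(data[i:i + 1])
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # Code set 1 (KS X 1001): two bytes, both in GR.
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(
                f"byte 0x{b:02X} at offset {i} does not fit the EUC-KR structure"
            )
    return units


encoded = "한글 KS".encode("euc_kr")
for unit in split_euc_kr(encoded):
    print(unit.hex(), len(unit))
```

Each Hangul syllable in the sample comes out as a two-byte unit, while the space and the Latin letters come out as one-byte units. Byte pairs that use values outside GR, such as those introduced by vendor extensions, are rejected by this sketch.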
IBM refers to the double-byte component as '''Code page 971''',<ref>{{Cite web|title=IBM Globalization – Coded character set identifiers – CCSID 971|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid971.html|archive-url=https://web.archive.org/web/20141130005339/http://www-01.ibm.com/software/globalization/ccsid/ccsid971.html|access-date=2021-09-03|archive-date=2014-11-30}}</ref> and to EUC-KR with ASCII as '''Code page 970'''.<ref>{{cite web|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid970.html|title=CCSID 970|publisher=IBM|work=IBM Globalization|archive-url=https://web.archive.org/web/20141201233141/http://www-01.ibm.com/software/globalization/ccsid/ccsid970.html|archive-date=2014-12-01}}</ref><ref>{{cite web|url=https://icu4c-demos.unicode.org/icu-bin/convexp?conv=euc-kr|title=ibm-970_P110_P110-2006_U2 (alias euc-kr)|work=Converter Explorer – ICU Demonstration|publisher=International Components for Unicode}}</ref><ref>{{Citation|title=International Components for Unicode (ICU), ibm-970_P110_P110-2006_U2.ucm|url=https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-970_P110_P110-2006_U2.ucm|date=2002-12-03}}</ref> It is implemented as '''Code page 20949''' ("Korean Wansung")<ref name="winids" /><ref>{{cite web |url=https://source.winehq.org/git/wine.git/blob/6f68543692a7588daa581d00c475715395036b15:/tools/make_unicode#l946 |title=dump_krwansung_codepage: build Korean Wansung table from the KSX1001 file |work=make_unicode: Generate code page .c files from ftp.unicode.org descriptions |first=Alexandre |last=Julliard |date=11 March 2021 |publisher=[[Wine (software)|Wine Project]]}}</ref> and '''Code page 51949''' ("EUC Korean") by Microsoft.<ref name="winids">{{cite web |url=https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers |title=Code Page Identifiers |publisher=Microsoft |department=Windows Dev Center|date=7 January 2021 }}</ref> {{As of|2025|05}}, less than 0.065% of all web pages 
globally declare the use of EUC-KR,<ref>{{Cite web |title=Usage Statistics and Market Share of EUC-KR for Websites, March 2025 |url=https://w3techs.com/technologies/details/en-euckr |access-date=2025-05-02 |website=w3techs.com}}</ref> but <!-- At 7.4% but rather showing result of calculation, that number plus other encoding adds up to way more than 100.0%. [[KS C 5601]] seems to be related, and is factored into the calculation: 100-95.5% = --> 4.5% of South Korean web pages use EUC-KR.<ref>{{Cite web|title=Distribution of Character Encodings among websites that use .kr|url=https://w3techs.com/technologies/segmentation/tld-kr-/character_encoding|website=w3techs.com|access-date=2025-05-02}}</ref> <!-- Cyrillic Windows-1251 in Russia at 100-96.1% = 3.9% (4.5%) is often higher, currently lower, and some small languages have higher non-UTF-8 use: https://w3techs.com/technologies/segmentation/cl-br-/character_encoding meaning the following is an untrue statement: making it the most popular non-[[UTF-8]]/Unicode encoding for a language/web domain.<ref>{{Cite web|url=https://w3techs.com/technologies/segmentation/cl-ko-/character_encoding|title=Distribution of Character Encodings among websites that use Korean|website=w3techs.com|access-date=2022-06-18}}</ref> --> Including extensions, it is the most widely used legacy character encoding in Korea on all three major platforms ([[macOS]], other Unix-like OSes, and Windows), but usage has been slowly shifting to [[UTF-8]] as the latter gains popularity, especially on Linux and macOS. As with most other encodings, [[UTF-8]] is now preferred for new use, solving problems with consistency between platforms and vendors. 
===Unified Hangul Code===
{{Main|Unified Hangul Code}}
A common extension of EUC-KR is the [[Unified Hangul Code]] ({{korean|통합형 한글 코드|rr=Tonghabhyeong Hangeul Kodeu|labels=no}},<ref>{{cite web|url=https://www.w3c.or.kr/i18n/hangul-i18n/ko-code.html|title=한글 코드에 대하여|publisher=W3C|language=ko|access-date=2019-01-07|archive-url=https://web.archive.org/web/20130524175322/http://www.w3c.or.kr/i18n/hangul-i18n/ko-code.html|archive-date=2013-05-24|url-status=dead}}</ref> or {{korean|통합 완성형|rr=Tonghab Wansunghyung|labels=no}}), which is the default Korean codepage on Microsoft Windows. It is given the code page number 949 by Microsoft, and 1261<ref>In [https://opensource.apple.com/source/ICU/ICU-59180.0.1/icuSources/common/ucnv_lmb.cpp.auto.html ucnv_lmb.cpp], a file originating from [[IBM]] and included in the [[International Components for Unicode]] source tree, the lead byte 0x11 is commented as referring to "Korean: ibm-1261" after the definition of <code>ULMBCS_GRP_KO</code>, and is mapped to the <code>"windows-949"</code> ICU codec in the <code>OptGroupByteToCPName</code> array later in the file.</ref> or 1363<ref>{{citation|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1363.html|publisher=IBM|title=Coded character set identifiers – CCSID 1363|work=IBM Globalization|archive-url=https://web.archive.org/web/20141129210404/http://www-01.ibm.com/software/globalization/ccsid/ccsid1363.html|archive-date=2014-11-29|url-status=dead}}</ref> by IBM. [[Code page 949 (IBM)|IBM's code page 949]] is a different, unrelated, EUC-KR extension. Unified Hangul Code extends EUC-KR by using codes that do not conform to the EUC structure to incorporate additional syllable blocks, completing the coverage of the composed syllable blocks available in [[Johab]] and Unicode. 
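As a rough illustration of codes that "do not conform to the EUC structure", the following Python sketch compares two syllables under Python's built-in <code>cp949</code> codec, which is assumed here to implement Unified Hangul Code. The helper name <code>fits_euc_structure</code> is hypothetical, and the syllable 뷁 is chosen on the assumption that it lies outside the KS X 1001 repertoire.

```python
def fits_euc_structure(pair: bytes) -> bool:
    """Return True if a two-byte code has both bytes in GR (0xA1-0xFE)."""
    return len(pair) == 2 and all(0xA1 <= b <= 0xFE for b in pair)


# A syllable covered by KS X 1001 keeps its plain EUC-KR code under cp949.
in_ksx1001 = "한".encode("cp949")
# A syllable assumed to be outside KS X 1001 gets an extension code instead.
extension = "뷁".encode("cp949")

print(in_ksx1001.hex(), fits_euc_structure(in_ksx1001))
print(extension.hex(), fits_euc_structure(extension))
```

The first code stays inside the GR plane, while the second uses at least one byte outside 0xA1–0xFE, which is why a strict EUC-KR parser cannot read it.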
The [[W3C]]/[[WHATWG]] Encoding Standard used by [[HTML5]] incorporates the Unified Hangul Code extensions into its definition of EUC-KR.<ref>{{citation|url=https://encoding.spec.whatwg.org/#index-euc-kr|title=5. Indexes (§ index EUC-KR)|work=Encoding Standard|publisher=WHATWG}}</ref>
===Mac OS Korean (HangulTalk)===
Other encodings incorporating EUC-KR as a subset include the Mac OS Korean script (known as Code page 10003 or <code>x-mac-korean</code>),<ref name="msdnlabels"/> which was used by HangulTalk (MacOS-KH), the Korean localization of the [[classic Mac OS]]. It was developed by Elex Computer ({{lang|ko|엘렉스}}), who were at the time the authorised distributor of Apple Macintosh computers in South Korea.<ref>{{cite web |url=http://hojin.freeservers.com/beige/hom/11HangulTalk.html |title=HangulTalk: De facto standard Hangul environment for Mac |work=Guide to using Hangul on Macintosh |last=Gil |first=Hojin}}</ref><ref name="lunde2009appE"/> HangulTalk adds extension characters with lead bytes between 0xA1 and 0xAD, both in unused space within the EUC-KR GR plane (trail bytes 0xA1–0xFE), and using non-EUC codes outside of it (trail bytes 0x41–0xA0). 
Some of these characters are font-style-independent stylized [[dingbat]]s.<ref name="lunde2009appE">{{citation|mode=cs1 |title=Appendix E: Vendor Character Set Standards |work=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing |last=Lunde |first=Ken |author-link=Ken Lunde |year=2009 |edition=2nd |publisher=[[O'Reilly Media|O'Reilly]] |location=[[Sebastopol, CA]] |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appE.pdf}}</ref> Many of these characters do not have exact Unicode mappings, and Apple software maps these cases variously to [[combining character|combining sequences]], to approximate mappings with an appended [[Private Use Area|private-use]] character as a modifier for round-trip purposes, or to private-use characters.<ref name="mackoreantxt">{{cite web |url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT |author=Apple |author-link=Apple, Inc |title=Map (external version) from Mac OS Korean encoding to Unicode 3.2 and later |date=2005-04-05 |publisher=[[Unicode Consortium]]}}</ref> Apple also uses certain single-byte codes outside of the EUC-KR plane for additional characters: 0x80 for a [[required space]], 0x81 for a [[won sign]] (₩), 0x82 for an [[en dash]] (–), 0x83 for a [[copyright sign]] ({{not a typo|©}}), 0x84 for a wide [[underscore]] ({{not a typo|＿}}) and 0xFF for an [[ellipsis]] (...).<ref name="mackoreantxt" /> Although none of these additional single-byte codes are within the lead byte range of plain EUC-KR (unlike Apple's extensions to EUC-CN, [[#x-mac-chinesesimp|see above]]), some are within the lead byte range of Unified Hangul Code (specifically, 0x81, 0x82, 0x83 and 0x84).
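
The overlap described above can be checked mechanically. In this minimal Python sketch the dictionary simply restates the single-byte codes listed in the text. The names <code>APPLE_SINGLE_BYTE</code>, <code>EUC_KR_LEAD</code>, <code>UHC_LEAD</code> and <code>clashes</code> are hypothetical, and the Unified Hangul Code lead byte range of 0x81–0xFE is an assumption of the sketch, not a figure given in this section.

```python
# Single-byte codes Apple adds outside the EUC-KR plane, as listed in the text.
APPLE_SINGLE_BYTE = {
    0x80: "required space",
    0x81: "won sign",
    0x82: "en dash",
    0x83: "copyright sign",
    0x84: "wide underscore",
    0xFF: "ellipsis",
}

EUC_KR_LEAD = range(0xA1, 0xFF)  # plain EUC-KR lead bytes, 0xA1-0xFE
UHC_LEAD = range(0x81, 0xFF)     # assumed Unified Hangul Code lead bytes, 0x81-0xFE


def clashes(codes, lead_range):
    """Return the codes that would also be read as lead bytes, in ascending order."""
    return sorted(code for code in codes if code in lead_range)


print([hex(c) for c in clashes(APPLE_SINGLE_BYTE, EUC_KR_LEAD)])
print([hex(c) for c in clashes(APPLE_SINGLE_BYTE, UHC_LEAD)])
```

Under these assumptions no Apple single-byte code collides with a plain EUC-KR lead byte, while 0x81 to 0x84 collide with Unified Hangul Code lead bytes, so a decoder has to know which of the two extensions it is reading.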