Rich Text Format
==Character encoding==
A standard RTF file can consist only of 7-bit [[ASCII]] characters, but can use [[escape sequence]]s to encode other characters.<ref name="Microsoft RTF Syntax">{{citation |url=http://msdn.microsoft.com/en-us/library/aa140284(office.10).aspx |title=Microsoft RTF Syntax}}</ref> The two kinds of character escape are [[code page]] escapes and, starting with RTF 1.5, [[Unicode]] escapes. In a code page escape, two [[hexadecimal]] digits following a backslash and [[typewriter apostrophe]] denote a character taken from a Windows code page. For example, if the code page is set to [[Windows-1256]]<!-- this does not require control codes, it may be a system or application default -->, the sequence <code>\'c8</code> will encode the Arabic letter ''bāʼ'' ب.

It is also possible to specify a "Character Set" in the preamble of the RTF document and associate it with a font declared in the header. For example, if the preamble contains the text <code>\f3\fnil\fcharset134</code>, then, in the body of the document, the text <code>\f3\'bd\'f0</code> will represent the byte sequence <code>0xbd 0xf0</code> in Character Set 134 (which corresponds to the [[GBK (character encoding)|GBK]] code page), which encodes "金".

{| class="wikitable sortable" style="white-space: nowrap;"
|-
! RTF Character Set !! Code Page !! Description
|-
| 0 || [[Windows-1252]] || Latin alphabet, Western Europe / Americas
|-
| 1 || 0 || Default Windows API code page for system locale
|-
| 2 || 42 || Symbol ([[Private Use Area|PUA-mapped]])<ref>{{cite web |url=http://archives.miloush.net/michkap/archive/2005/11/08/490495.html |last=Kaplan |first=Michael S |title=More than you ever wanted to know about CP_SYMBOL |work=Sorting It All Out |date=2005-11-08}}</ref> character set
|-
| 77 || 2 || Default Macintosh-compatibility code page for system locale
|-
| 128 || [[Code page 932 (Microsoft Windows)|Windows-932]] || Japanese, [[Shift JIS]] (Windows version)
|-
| 129 || [[Windows-949]] || Korean, Unified Hangul Code (extended Wansung)
|-
| 130 || [[Code page 1361|Windows-1361]] || Korean, [[Johab]] (ASCII-based version)
|-
| 134 || [[Code page 936 (Microsoft Windows)|Windows-936]] || Chinese, [[GBK (character encoding)|GBK]] (extended [[GB 2312]])
|-
| 136 || [[Windows-950]] || Chinese, [[Big5]]
|-
| 161 || [[Windows-1253]] || Greek
|-
| 162 || [[Windows-1254]] || Latin alphabet, Turkish
|-
| 163 || [[Windows-1258]] || Latin alphabet, Vietnamese
|-
| 177 || [[Windows-1255]] || Hebrew
|-
| 178 || [[Windows-1256]] || Arabic
|-
| 186 || [[Windows-1257]] || Baltic
|-
| 204 || [[Windows-1251]] || Cyrillic
|-
| 238 || [[Windows-1250]] || Latin alphabet, Eastern Europe
|-
| 255 || 1 || Default [[Windows OEM character set|OEM code page]] for system locale
|}

For a Unicode escape, the control word <code>\u</code> is used, followed by a 16-bit signed decimal integer giving the Unicode UTF-16 code unit. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, <code>\u1576?</code> would give the Arabic letter ''bāʼ'' ب, but indicates that older programs which do not support Unicode should render it as a question mark instead.
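
As an illustration of the code page escapes described above, the following Python sketch decodes runs of <code>\'xx</code> escapes given an RTF character set number. The function name <code>decode_hex_escapes</code> and the mapping <code>CHARSET_TO_CODEC</code> are hypothetical names chosen for this example, and the Python codec names are assumed to be close equivalents of the Windows code pages in the table; the sketch does not parse font tables or any other control words.

```python
import re

# Partial, illustrative mapping from RTF character set numbers to Python
# codec names. The codec names are Python's nearest equivalents of the
# Windows code pages (an assumption of this sketch). System-dependent
# entries (1, 2, 77, 255) are deliberately left out.
CHARSET_TO_CODEC = {
    0: "cp1252",    # Western Europe / Americas
    128: "cp932",   # Japanese, Shift JIS
    129: "cp949",   # Korean, Unified Hangul Code
    134: "gbk",     # Chinese, GBK
    136: "cp950",   # Chinese, Big5
    161: "cp1253",  # Greek
    162: "cp1254",  # Turkish
    177: "cp1255",  # Hebrew
    178: "cp1256",  # Arabic
    186: "cp1257",  # Baltic
    204: "cp1251",  # Cyrillic
    238: "cp1250",  # Eastern Europe
}

# One or more consecutive \'xx escapes. Consecutive escapes are decoded
# together so that multi-byte characters are reassembled correctly.
HEX_ESCAPE_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_escapes(text: str, charset: int) -> str:
    """Replace each run of \\'xx escapes with the characters it encodes."""
    codec = CHARSET_TO_CODEC[charset]

    def replace_run(match: re.Match) -> str:
        # Strip the backslash-apostrophe prefixes, leaving only hex digits.
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(codec)

    return HEX_ESCAPE_RUN.sub(replace_run, text)
```

For instance, <code>decode_hex_escapes(r"\'c8", 178)</code> yields the Arabic letter ب, matching the Windows-1256 example above.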
The control word <code>\uc0</code> can be used to indicate that subsequent Unicode escape sequences within the current group do not specify the substitution character.

Until the release of RTF specification version 1.5 in 1997, RTF handled only 7-bit characters directly, with 8-bit characters encoded as hexadecimal (using <code>\'xx</code>). Since RTF 1.5, however, RTF control words generally accept signed 16-bit numbers as arguments. Unicode values greater than 32767 must be expressed as negative numbers.<ref name="rtf15" /> If a Unicode character is [[Unicode plane|outside the BMP]], it is encoded as a surrogate pair.

Unicode support was added because of text-handling changes in Microsoft Word: Microsoft Word 97 is a partially Unicode-enabled application that handles text using the [[UTF-16|16-bit Unicode character encoding scheme]].<ref name="rtf15" /> Microsoft Word 2000 and later versions are Unicode-enabled applications that handle text using the 16-bit Unicode character encoding scheme.<ref name="rtf16" />

Because RTF files are usually 7-bit ASCII [[plain text]], they can be easily transmitted between PC-based operating systems. Converters that communicate with Microsoft Word for MS Windows or Macintosh generally expect data transfer as 8-bit characters and binary data which can contain any 8-bit values.<ref name="rtf191" />
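
The rules for Unicode escapes given above (signed 16-bit arguments, negative numbers for values greater than 32767, and surrogate pairs for characters outside the BMP) can be sketched in Python as follows. The function name <code>rtf_escape</code> and its <code>fallback</code> parameter are hypothetical; the sketch assumes the default of one substitution character per <code>\u</code> escape and simply emits the same fallback after every code unit. It escapes only backslashes, braces and non-ASCII characters, and does not handle line breaks or other control characters.

```python
import struct


def rtf_escape(text: str, fallback: str = "?") -> str:
    """Encode text as 7-bit RTF body text using \\uN escapes (sketch)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Characters with special meaning in RTF are backslash-escaped.
            out.append("\\" + ch)
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            # Encoding to UTF-16 yields one code unit for BMP characters
            # and a surrogate pair (two units) for all others.
            data = ch.encode("utf-16-le")
            # The "h" format reads each unit as a signed 16-bit integer,
            # so values above 32767 come out negative.
            units = struct.unpack("<%dh" % (len(data) // 2), data)
            for unit in units:
                out.append("\\u%d%s" % (unit, fallback))
    return "".join(out)
```

With this sketch, <code>rtf_escape("ب")</code> gives <code>\u1576?</code>, the fullwidth letter U+FF21 (65313) becomes <code>\u-223?</code>, and U+1F600, which lies outside the BMP, becomes the surrogate pair <code>\u-10179?\u-8704?</code>.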