Bidirectional text
=== Explicit formatting ===
Explicit formatting characters, also referred to as "directional formatting characters", are special Unicode characters that direct the algorithm to modify its default behavior. These characters are subdivided into "marks", "embeddings", "isolates", and "overrides". The effects of embeddings, isolates, and overrides continue until the occurrence of either a paragraph separator or a matching "pop" character; marks have no scope and act only at the position where they are inserted.

==== Marks ====
{{see also|Right-to-left mark|Left-to-right mark|Arabic letter mark}}
If a "weak" character is followed by another "weak" character, the algorithm will look at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such [[Unicode control characters]] are called ''marks''. A mark ({{unichar|200E|Left-to-right mark|note=LRM}} or {{unichar|200F|Right-to-left mark|note=RLM}}) is inserted so that a weak character is enclosed between the mark and a strong character of the same direction, and therefore inherits that direction.

For example, to correctly display the {{unichar|2122|trade mark sign}} for an English brand name (LTR) in an Arabic (RTL) passage, an LRM is inserted after the trademark symbol if the symbol is not followed by LTR text (e.g. "{{lang|ar|قرأ Wikipedia™&lrm; طوال اليوم.|rtl=yes}}"). If the LRM is not added, the weak character ™ is neighbored by a strong LTR character on one side and a strong RTL character on the other. Hence, in an RTL context, it is treated as RTL and displayed in the wrong position (e.g. "{{lang|ar|قرأ Wikipedia™ طوال اليوم.|rtl=yes}}").

==== Embeddings ====
The "embedding" directional formatting characters are the classical Unicode method of explicit formatting and, as of Unicode 6.3, are discouraged in favor of "isolates". An "embedding" signals that a piece of text is to be treated as directionally distinct. The text within the scope of the embedding formatting characters is not independent of the surrounding text.
Also, characters within an embedding can affect the ordering of characters outside. Unicode 6.3 recognized that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use.

==== Isolates ====
The "isolate" directional formatting characters signal that a piece of text is to be treated as directionally isolated from its surroundings. As of Unicode 6.3, these are the formatting characters encouraged in new documents, once target platforms are known to support them. They were introduced to address the problem with embeddings described above: unlike the legacy "embedding" characters, "isolate" characters have no effect on the ordering of the text outside their scope. Isolates can be nested, and may be placed within embeddings and overrides.

==== Overrides ====
The "override" directional formatting characters allow for special cases, such as part numbers (e.g. to force a part number made of mixed English letters, digits, and Hebrew letters to be written from right to left). Their use should be avoided wherever possible. Like the other directional formatting characters, "overrides" can be nested one inside another, and inside embeddings and isolates.

===== Using Unicode to override =====
{{Anchor|Using unicode to override}}Using {{Unichar|202D}} forces the text that follows to be displayed left-to-right, regardless of the inherent direction of its characters. Similarly, using {{Unichar|202E}} forces the text that follows to be displayed right-to-left. For details, refer to the [https://www.unicode.org/reports/tr9/ Unicode Bidirectional Algorithm].

==== Pops ====
The "pop directional formatting" character, encoded at {{Unichar|202C}}, terminates the scope of the most recent "embedding" or "override". An "isolate" is terminated by a separate character, the "pop directional isolate", encoded at {{Unichar|2069}}.
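
The categories described above can be inspected programmatically. The following is a minimal sketch in Python, using only the standard <code>unicodedata</code> module, that prints the bidirectional class of each explicit formatting character and wraps a string in an isolate. The names <code>FORMATTING_CHARACTERS</code>, <code>describe</code>, and <code>isolate</code> are hypothetical helpers introduced for illustration. The code points for the isolates and the Arabic letter mark are taken from the Unicode Standard rather than from the text of this section. It assumes a Python build whose Unicode database is at version 6.3 or later.

```python
import unicodedata

# Code points grouped by the categories used in the section above.
# Assumption: the isolate and Arabic letter mark code points come from
# the Unicode Standard, not from the text of this section.
FORMATTING_CHARACTERS = {
    "marks": ["\u200E", "\u200F", "\u061C"],
    "embeddings": ["\u202A", "\u202B"],
    "overrides": ["\u202D", "\u202E"],
    "isolates": ["\u2066", "\u2067", "\u2068"],
    "pops": ["\u202C", "\u2069"],
}


def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
    )


def isolate(text):
    """Hypothetical helper: wrap text in FIRST STRONG ISOLATE ... POP DIRECTIONAL ISOLATE."""
    return "\u2068" + text + "\u2069"


if __name__ == "__main__":
    for kind, chars in FORMATTING_CHARACTERS.items():
        for ch in chars:
            code, name, bidi_class = describe(ch)
            print(f"{kind:<10} {code}  {bidi_class:<3}  {name}")
    # The trade mark sign from the Marks example is a weak (neutral) character.
    print(describe("\u2122"))
```

Wrapping inserted text with <code>isolate()</code> follows the recommendation to prefer isolates over embeddings. It does not by itself guarantee correct display, which still depends on the rendering platform supporting isolates.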