=== Security<span class="anchor" id="Security issues"></span> ===
Unicode has a large number of [[homoglyphs]], many of which look very similar or identical to ASCII letters. Substituting them can produce an identifier or URL that looks correct but directs to a different location than expected.<ref>{{Cite web |title=UTR #36: Unicode Security Considerations |url=https://unicode.org/reports/tr36/ |website=Unicode}}</ref> Homoglyphs can also be used to manipulate the output of [[NLP (computer science)|natural language processing (NLP)]] systems.<ref>{{Cite book |last1=Boucher |first1=Nicholas |last2=Shumailov |first2=Ilia |last3=Anderson |first3=Ross |last4=Papernot |first4=Nicolas |title=2022 IEEE Symposium on Security and Privacy (SP) |chapter=Bad Characters: Imperceptible NLP Attacks |year=2022 |chapter-url=https://ieeexplore.ieee.org/document/9833641 |location=San Francisco, CA, US |publisher=IEEE |pages=1987–2004 |arxiv=2106.09898 |doi=10.1109/SP46214.2022.9833641 |isbn=978-1-66541-316-9 |s2cid=235485405}}</ref> Mitigation requires disallowing these characters, displaying them differently, or requiring that they resolve to the same identifier;<ref>{{Cite web |last=Engineering |first=Spotify |date=2013-06-18 |title=Creative usernames and Spotify account hijacking |url=https://engineering.atspotify.com/2013/06/creative-usernames/ |access-date=2023-04-15 |website=Spotify Engineering |language=en-US}}</ref> all of these approaches are complicated by the huge and constantly changing set of characters.<ref>{{cite tech report |last=Wheeler |first=David A. |title=Initial Analysis of Underhanded Source Code |year=2020 |jstor=resrep25332.7 |url=http://www.jstor.org/stable/resrep25332.7 |page=4–1–4–10}}</ref><ref>{{Cite web |title=UTR #36: Unicode Security Considerations |url=https://unicode.org/reports/tr36/ |access-date=27 June 2022 |website=Unicode}}</ref>

A security advisory was released in 2021 by two researchers, one from the [[University of Cambridge]] and the other from the [[University of Edinburgh]], in which they assert that [[Bidirectional text#Explicit formatting|BiDi marks]] can be used to make large sections of code do something different from what they appear to do. The problem was named "[[Trojan Source]]".<ref>{{Cite web |first1=Nicholas |last1=Boucher |first2=Ross |last2=Anderson |title=Trojan Source: Invisible Vulnerabilities |url=https://www.trojansource.codes/trojan-source.pdf |access-date=2 November 2021}}</ref> In response, code editors started highlighting these marks to indicate forced text-direction changes.<ref>{{Cite web |title=Visual Studio Code October 2021 |url=https://code.visualstudio.com/updates/v1_62#_unicode-directional-formatting-characters |access-date=11 November 2021 |website=code.visualstudio.com |language=en}}</ref>

The [[UTF-8]] and [[UTF-16]] encodings do not accept all possible sequences of code units. Implementations vary in what they do when reading an invalid sequence, which has led to security bugs.<ref>{{Cite web |first1=Dominique |last1=Dittert |title=From Unicode to Exploit: The Security Risks of Overlong UTF-8 Encodings |date=6 September 2024 |url=https://herolab.usd.de/en/the-security-risks-of-overlong-utf-8-encodings/ |access-date=26 December 2024}}</ref><ref>{{Cite web |first1=Kevin |last1=Boone |title=UTF-8 and the problem of over-long characters |url=https://kevinboone.me/overlong.html |access-date=26 December 2024}}</ref>
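
The three classes of issue described in this section can be illustrated with a minimal Python sketch. It is not taken from any of the cited sources; the helper names <code>non_ascii_chars</code>, <code>find_bidi_controls</code> and <code>decode_strict</code> are hypothetical, and the list of directional formatting characters is an assumption limited to the explicit embedding, override and isolate controls. Flagging every non-ASCII character is only the crudest form of the "disallow" mitigation and would reject legitimate international text.

```python
import unicodedata

# Assumption: the explicit directional embedding, override and isolate
# controls (U+202A..U+202E and U+2066..U+2069).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def non_ascii_chars(identifier):
    """List (index, Unicode name) for every non-ASCII character.

    A reviewer can use the names to spot look-alike letters, for example
    a Cyrillic letter inside an otherwise Latin identifier.
    """
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(identifier)
        if ord(ch) > 0x7F
    ]


def find_bidi_controls(text):
    """List (index, Unicode name) for each explicit BiDi control in text."""
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def decode_strict(data):
    """Decode bytes as UTF-8, returning None for any invalid sequence.

    Rejecting the input outright avoids silently repairing or skipping
    bytes, which is where implementations tend to disagree.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(non_ascii_chars("p\u0430ypal"))     # second letter is Cyrillic
    print(find_bidi_controls("a\u202eb"))     # contains an override mark
    print(decode_strict(b"\xc0\xaf"))         # overlong encoding of "/"
```

Returning <code>None</code> rather than substituting a replacement character is a deliberate choice in this sketch: it forces the caller to handle invalid input explicitly instead of continuing with text that another component might interpret differently.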