Internationalized Resource Identifier
{{Short description|Expanded set of characters on the URI protocol}}
{{Infobox technology standard
| title = Internationalized Resource Identifier
| long_name = Internationalized Resource Identifier
| status = Proposed Standard
| year_started = {{Start date|2002|04|22|df=y}}
| first_published = {{Start date|2002|04|22|df=y}}
| version_date = {{Start date|2020|01|21|df=y}}
| authors = {{plainlist|
* Martin Dürst
* Michel Suignard
}}
| organization = [[Internet Engineering Task Force|{{abbr|IETF|Internet Engineering Task Force}}]]
| domain = [[Character encoding]]
| website = {{IETF RFC|3987}}
| abbreviation = IRI
| base_standards = {{plainlist|
* [[Internationalized domain name|{{abbr|IDNA|Internationalized Domain Names in Applications}}]]
* [[UTF-8]]
* [[Unicode_equivalence#Normalization|Unicode Normalization]] (UAX #15)
}}
}}
The '''Internationalized Resource Identifier''' ('''IRI''') is an [[Internet Standard|internet protocol standard]] which builds on the [[Uniform Resource Identifier]] (URI) protocol by greatly expanding the set of permitted characters.<ref name="gangemi">{{cite journal|last1=Gangemi|first1=Aldo|last2=Presutti|first2=Valentina|date=2006|title=The bourne identity of a web resource|url=http://ra.ethz.ch/CDstore/www2006/www.ibiblio.org/hhalpin/irw2006/vpresutti.pdf|journal=Proceedings of Identity Reference and the Web Workshop (IRW)|series=Laboratory for Applied Ontology|page=3|quote=Notice that IRIs (Internationalized Resource Identifier) [11] are supposed to replace URIs in next future.}}</ref><ref>{{Cite journal|url=https://tools.ietf.org/html/rfc3987#section-1.3|title=Internationalized Resource Identifiers (IRIs)|last=Suignard|first=Michel|website=tools.ietf.org|date=January 2005 |language=en|access-date=2018-06-09|quote=This document defines a new protocol element, the Internationalized Resource Identifier (IRI), as a complement to the Uniform Resource Identifier (URI).
An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to URIs is defined, which means that IRIs can be used instead of URIs, where appropriate, to identify resources. The approach of defining a new protocol element was chosen instead of extending or changing the definition of URIs.}}</ref><ref>{{Cite journal|url=https://tools.ietf.org/html/rfc3987#page-3|title=Internationalized Resource Identifiers (IRIs)|last=Suignard|first=Michel|website=tools.ietf.org|date=January 2005 |language=en|access-date=2018-06-09|quote=This document defines a new protocol element called Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters. It also defines "internationalized" versions corresponding to other constructs from [RFC3986], such as URI references. The syntax of IRIs is defined in section 2, and the relationship between IRIs and URIs in section 3.}}</ref> It was defined by the [[Internet Engineering Task Force]] (IETF) in 2005 in RFC 3987. URIs are limited to a subset of the [[US-ASCII]] character set: characters outside that set must first be mapped to octets according to some unspecified character encoding and then [[percent-encoding|percent-encoded]]. IRIs may additionally contain most characters from the [[Universal Character Set]] (Unicode/[[ISO 10646]]),<ref>{{Cite journal|url=http://tools.ietf.org/html/rfc3987|title=Internationalized Resource Identifiers (IRIs)|last=Suignard|first=Michel|website=tools.ietf.org|date=January 2005 |language=en|access-date=2018-06-09}}</ref><ref>{{Cite journal|url=https://tools.ietf.org/html/rfc3987#section-1.3|title=Internationalized Resource Identifiers (IRIs)|last=Suignard|first=Michel|website=tools.ietf.org|date=January 2005 |language=en|access-date=2018-06-09}}</ref> including [[Chinese characters|Chinese]], [[Japanese writing system|Japanese]], [[Korean alphabet|Korean]], and [[Cyrillic script|Cyrillic]] characters.

== Syntax ==
IRIs extend URIs by using the [[Universal Character Set]], whereas URIs are limited to [[ASCII]], which has far fewer characters. IRIs may be represented by a sequence of octets but are defined as a sequence of characters, because IRIs may be spoken or written by hand.<ref name="rfc3987" /><!-- section 2.0 -->

== Compatibility ==
IRIs are mapped to URIs to retain backward compatibility with systems that do not support the new format.<ref name=rfc3987>{{cite journal|last1=Duerst|first1=M.|title=RFC 3987|journal=Network Working Group|date=2005|volume=Standards Track|url=http://tools.ietf.org/html/rfc3987|access-date=12 October 2014}}</ref>

For applications and protocols that do not allow direct consumption of IRIs, an IRI that is not already in Unicode form should first be converted to Unicode, using [[Unicode equivalence|canonical composition normalization (NFC)]]. All non-ASCII code points in the IRI should next be encoded as [[UTF-8]], and the resulting bytes [[Percent-encoding|percent-encoded]], to produce a valid URI.

Example: The IRI https://en.wiktionary.org/wiki/Ῥόδος becomes the URI https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82

ASCII code points that are invalid URI characters ''may'' be encoded the same way, depending on implementation.<ref name="rfc3987" />

This conversion is easily reversible; by definition, converting an IRI to a URI and back again will yield an IRI that is semantically equivalent to the original IRI, even though it may differ in exact representation.<ref>{{cite book|editor1-last=Fensel|editor1-first=Dieter|editor2-last=Domingue|editor2-first=John|editor3-last=Hendler|editor3-first=James A.|title=Handbook of Semantic Web Technologies|date=2010|publisher=Springer-Verlag GmbH|location=Berlin|isbn=978-3-540-92912-3|edition=1st|url=https://books.google.com/books?id=sdEFvSb9WNsC|access-date=12 October 2014}}</ref> Some protocols may impose further transformations, e.g. [[Punycode]] for [[Domain Name System|DNS]] labels.

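The mapping described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete implementation of RFC 3987: the function name <code>iri_to_uri</code> is hypothetical, every ASCII character is left untouched, and the host part is not converted to Punycode.

```python
import unicodedata
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI (hypothetical helper, simplified).

    Assumptions: the input is already a Python (Unicode) string, ASCII
    characters are copied unchanged, and no Punycode step is applied
    to the host name.
    """
    # Step 1: canonical composition normalization (NFC).
    normalized = unicodedata.normalize("NFC", iri)

    # Step 2: encode each non-ASCII code point as UTF-8 and
    # percent-encode the resulting bytes; quote() does both.
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in normalized
    )


if __name__ == "__main__":
    # Prints the URI given in the example above.
    print(iri_to_uri("https://en.wiktionary.org/wiki/Ῥόδος"))
```

Because normalization runs first, a decomposed input such as "e" followed by a combining acute accent yields the same percent-encoded bytes as the precomposed "é".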
== Advantages ==
Displaying resource identifiers in a user's own script makes them easier to read and use for people who are unfamiliar with the Latin (A–Z) alphabet. Provided that users can readily enter the required Unicode characters on their keyboards, this can make the [[URI]] system more accessible.<ref>{{cite web|last1=Clark|first1=Kendall|title=Internationalizing the URI|url=http://www.xml.com/pub/a/2003/05/07/deviant.html|publisher=O’Reilly Media, Inc.|access-date=12 October 2014|date=2003-05-07}}</ref>

== Disadvantages ==
Mixing IRIs and [[ASCII]] [[uniform resource identifier|URIs]] can make it much easier to execute [[phishing]] attacks that trick someone into believing they are on a different site than they really are. For example, one can replace an ASCII "a" in <code>www.myfictionalbank.com</code> with the Unicode look-alike "[[α]]" to give <code>www.myfictionαlbank.com</code> and point that IRI to a malicious site. This is known as an [[IDN homograph attack]].

While a URI does not provide people with a way to specify web resources using their own alphabets, an IRI does not make clear how web resources can be accessed with keyboards that are not capable of generating the requisite internationalized characters. In practice, entering an IRI is handled in much the same way as in other software that may require a non-keyboard [[input method]] when dealing with text in various languages.

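The look-alike substitution described in this section can be illustrated with a short Python sketch. It is a rough heuristic under stated assumptions, not a vetted security check: the helper names <code>scripts_in_label</code> and <code>looks_mixed_script</code> are hypothetical, and the first word of a character's Unicode name is used as a stand-in for its script.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Collect a rough script tag for each letter in one host label.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "GREEK") approximates the character's script.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed_script(hostname: str) -> bool:
    """Return True if any dot-separated label mixes several scripts."""
    return any(
        len(scripts_in_label(label)) > 1
        for label in hostname.split(".")
    )


if __name__ == "__main__":
    genuine = "www.myfictionalbank.com"
    spoofed = "www.myfiction\u03b1lbank.com"  # Greek alpha replaces "a"

    print(looks_mixed_script(genuine))
    print(looks_mixed_script(spoofed))

    # The names look alike on screen, but the spoofed host maps to a
    # different ASCII ("xn--") form through Python's built-in IDNA codec.
    print(spoofed.encode("idna"))
```

A check of this kind flags labels that combine, for example, Latin and Greek letters; it does not catch a label written entirely in one look-alike script.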
== See also ==
* [[Internationalized domain name|IDN]] (Internationalized Domain Name)
* [[Semantic Web]]
* [[Punycode]]
* [[XRI]] (Extensible Resource Identifier)

== References ==
{{reflist}}

== External links ==
* [https://datatracker.ietf.org/doc/rfc3987/ RFC 3987: Proposed Standard of Internationalized Resource Identifiers (IRIs)]
* [http://www.w3.org/International/ W3C Internationalization Activity]
* [https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml#uri-schemes-3 IANA List of Registered URI Schemes]
* [https://www.w3.org/International/iri-edit/spec-use-survey.html Survey of use of IRIs in W3C Specs]

{{Semantic Web|state=collapsed}}
{{URI scheme}}

[[Category:Application layer protocols]]
[[Category:Internet protocols]]
[[Category:Internet Standards]]
[[Category:Semantic Web| ]]
[[Category:URL]]
[[Category:Web services]]