Uniform Resource Identifier
== Design ==

=== URLs and URNs ===
A [[Uniform Resource Name]] (URN) is a URI that identifies a resource by name in a particular namespace. A URN may be used to talk about a resource without implying its location or how to access it. For example, in the [[International Standard Book Number]] (ISBN) system, ''<nowiki>ISBN</nowiki> 0-486-27557-4'' identifies a specific edition of the [[William Shakespeare]] play ''[[Romeo and Juliet]]''. The URN for that edition would be ''<nowiki>urn:isbn:0-486-27557-4</nowiki>''. However, it gives no information as to where to find a copy of that book.

A [[Uniform Resource Locator]] (URL) is a URI that specifies the means of acting upon or obtaining the representation of a resource, i.e. specifying both its primary access mechanism and network location. For example, the URL <code><nowiki>http://example.org/wiki/Main_Page</nowiki></code> refers to a resource identified as <code><nowiki>/wiki/Main_Page</nowiki></code>, whose representation is obtainable via the [[Hypertext Transfer Protocol]] (''http:'') from a network host whose [[domain name]] is <code><nowiki>example.org</nowiki></code>. (In this case, HTTP usually implies it to be in the form of [[HTML]] and related code. In practice, that is not necessarily the case, as HTTP allows specifying arbitrary formats in its header.)

A URN is analogous to a person's name, while a URL is analogous to their street address. In other words, a URN identifies an item and a URL provides a method for finding it.

Technical publications, especially standards produced by the IETF and by the W3C, normally reflect a view outlined in a [[W3C Recommendation]] of 30 July 2001, which acknowledges the precedence of the term URI rather than endorsing any formal subdivision into URL and URN.
{{cquote|URL is a useful but informal concept: a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network "location"), rather than by some other attributes it may have.<ref>{{cite web |last1=((URI Planning Interest Group, W3C/IETF)) |title=URIs, URLs, and URNs: Clarifications and Recommendations 1.0 |url=https://www.w3.org/TR/uri-clarification/ |website=www.w3.org |publisher=W3C/IETF |access-date=8 December 2020 |date=September 2001}}</ref>}}

As such, a URL is simply a URI that happens to point to a resource over a network.{{efn|A report published in 2002 by a joint W3C/IETF working group aimed to normalize the divergent views held within the IETF and W3C over the relationship between the various 'UR*' terms and standards. While not published as a full standard by either organization, it has become the basis for the above common understanding and has informed many standards since then.}}{{Ref RFC|3305}} However, in non-technical contexts and in software for the World Wide Web, the term "URL" remains widely used. Additionally, the term "web address" (which has no formal definition) often occurs in non-technical publications as a synonym for a URI that uses the ''http'' or ''https'' schemes. Such assumptions can lead to confusion, for example, in the case of XML namespaces that have a [[#Relation to XML namespaces|visual similarity to resolvable URIs]].

Specifications produced by the [[WHATWG]] prefer ''URL'' over ''URI'', and so newer HTML5 APIs use ''URL'' over ''URI''.<ref>{{cite web |title= 6.3. URL APIs elsewhere |url=https://url.spec.whatwg.org/#url-apis-elsewhere |date= 12 May 2025 |website=URL Standard }}</ref>

{{cquote|Standardize on the term URL. URI and IRI [Internationalized Resource Identifier] are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone.
URL also easily wins the search result popularity contest.<ref>{{cite web |title=URL Standard: Goals |url=https://url.spec.whatwg.org/#goals}}</ref>}}

While most URI schemes were originally designed to be used with a particular [[protocol (computing)|protocol]], and often have the same name, they are semantically different from protocols. For example, the scheme ''http'' is generally used for interacting with [[web resource]]s using HTTP, but the scheme ''[[file URI scheme|file]]'' has no protocol.

===<span class="anchor" id="syntax"></span> Syntax ===
{{see also|List of URI schemes}}
A URI has a scheme that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme. The URI generic syntax is a superset of the syntax of all URI schemes. It was first defined in {{IETF RFC|2396}}, published in August 1998,{{Ref RFC|2396}} and finalized in {{IETF RFC|3986}}, published in January 2005.{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|p=46|ps=; "9. Acknowledgements"}}

A URI is composed of an allowed set of [[ASCII]] characters consisting of [[Filename|reserved characters]] (gen-delims: <code>:</code>, <code>/</code>, <code>?</code>, <code>#</code>, <code>[</code>, <code>]</code>, and <code>@</code>; sub-delims: <code>!</code>, <code>$</code>, <code>&</code>, <code>'</code>, <code>(</code>, <code>)</code>, <code>*</code>, <code>+</code>, <code>,</code>, <code>;</code>, and <code>=</code>),{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|ps=; "2.2. Reserved Characters", "2.3. Unreserved Characters"|pp=13–14}} unreserved characters ([[Latin-script alphabet|uppercase and lowercase letters]], [[Arabic numerals|decimal digits]], <code>-</code>, <code>.</code>, <code>_</code>, and <code>~</code>),{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|ps=; "2.2. Reserved Characters", "2.3. Unreserved Characters"|pp=13–14}} and the character <code>%</code>.{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|ps=; "2.1. Percent-Encoding"|p=12}} Syntax components and subcomponents are separated by ''delimiters'' from the reserved characters (only from generic reserved characters for components) and define ''identifying data'' represented as unreserved characters, reserved characters that do not act as delimiters in the component and subcomponent respectively,{{Ref RFC|3986|rsection=2}} and [[percent-encoding]]s when the corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoding of an identifying data [[Octet (computing)|octet]] is a sequence of three characters, consisting of the character <code>%</code> followed by the two hexadecimal digits representing that octet's numeric value.{{Ref RFC|3986|rsection=2.1}}

<section begin="syntax"/><!-- This section is transcluded in other articles. See Help:Labeled section transclusion -->The URI generic syntax consists of five ''components'' organized hierarchically in order of decreasing significance from left to right:{{Ref RFC|3986|rsection=3}}
<pre>
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
</pre>

A component is ''undefined'' if it has an associated delimiter and the delimiter does not appear in the URI; the scheme and path components are always defined.{{Ref RFC|3986|rsection=5.2.1}} A component is ''empty'' if it has no characters; the scheme component is always non-empty.{{Ref RFC|3986|rsection=3}}

The authority component consists of ''subcomponents'':
<pre>
authority = [userinfo "@"] host [":" port]
</pre>

This is represented in a [[syntax diagram]] as:

<div class="skin-invert-image">{{wide image|URI syntax diagram.svg|900px|alt=URI syntax diagram}}</div>

The URI comprises:
* A non-empty '''{{visible anchor|scheme}}''' component followed by a colon (<code>:</code>), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (<code>+</code>), period (<code>.</code>), or hyphen (<code>-</code>). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include <code>[[Hypertext Transfer Protocol|http]]</code>, <code>[[HTTP Secure|https]]</code>, <code>[[File Transfer Protocol|ftp]]</code>, <code>[[mailto]]</code>, <code>[[File URI scheme|file]]</code>, <code>[[Data URI scheme|data]]</code> and <code>[[Internet Relay Chat#URI scheme|irc]]</code>. URI schemes should be registered with the [[Internet Assigned Numbers Authority|Internet Assigned Numbers Authority (IANA)]], although non-registered schemes are used in practice.{{efn|The procedures for registering new URI schemes were originally defined in 1999 by {{IETF RFC|2717}}, and are now defined by {{IETF RFC|7595|link=no}}, published in June 2015.{{Ref RFC|7595}}}}
* An optional '''{{visible anchor|authority}}''' component preceded by two slashes (<code>//</code>), comprising:
** An optional '''{{visible anchor|userinfo}}''' subcomponent followed by an at symbol (<code>@</code>), which may consist of a [[User (computing)|user name]] and an optional [[password]] preceded by a colon (<code>:</code>). Use of the format <code>username:password</code> in the userinfo subcomponent is deprecated for security reasons. Applications should not render as clear text any data after the first colon (<code>:</code>) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password).
** A '''{{visible anchor|host}}''' subcomponent, consisting of either a registered name (including but not limited to a [[hostname]]) or an [[IP address]]. [[IPv4]] addresses must be in [[dot-decimal notation]], and [[IPv6]] addresses must be enclosed in brackets (<code>[]</code>).{{Ref RFC|3986|rsection=3.2.2}}{{efn|For URIs relating to resources on the World Wide Web, some web browsers allow {{code|.0}} portions of dot-decimal notation to be dropped or raw integer IP addresses to be used.{{sfnp|Lawrence|2014}}}}
** An optional '''{{visible anchor|port}}''' subcomponent preceded by a colon (<code>:</code>), consisting of decimal digits.
* A '''{{visible anchor|path}}''' component, consisting of a sequence of path segments separated by a slash (<code>/</code>). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (<code>//</code>) in the path component. A path component may resemble or map exactly to a [[path (computing)|file system path]] but does not always imply a relation to one. If an authority component is defined, then the path component must either be empty or begin with a slash (<code>/</code>). If an authority component is undefined, then the path cannot begin with an empty segment, that is, with two slashes (<code>//</code>), since the following characters would be interpreted as an authority component.{{Ref RFC|2396|rsection=3.3}}
: By convention, in '''http''' and '''https''' URIs, the last part of a ''path'' is named '''{{visible anchor|pathinfo}}''' and it is optional. It is composed of zero or more path segments that do not refer to an existing physical resource name (e.g. a file, an internal module program or an executable program) but to a logical part (e.g. a command or a qualifier part) that has to be passed separately to the first part of the path that identifies an executable module or program managed by a [[web server]]; this is often used to select dynamic content (a document, etc.) or to tailor it as requested (see also: [[Common Gateway Interface|CGI]] and PATH_INFO, etc.).
: Example:
:: URI: {{code|1="http://www.example.com/questions/3456/my-document"}}
:: where: {{code|1="/questions"}} is the first part of the ''path'' (an executable module or program) and {{code|1="/3456/my-document"}} is the second part of the ''path'' named ''pathinfo'', which is passed to the executable module or program named {{code|1="/questions"}} to select the requested document.
: An '''http''' or '''https''' URI containing a ''pathinfo'' part without a [[#query|query]] part may also be referred to as a '[[clean URL]],' whose last part may be a '[[Clean URL#Slug|slug]].'
{| class="wikitable" style="float: right; font-size: 0.9em; margin-left: 1em"
|-
! Query delimiter
! Example
|-
| Ampersand (<code>&</code>)
| <code>key1=value1&key2=value2</code>
|-
| Semicolon (<code>;</code>){{efn|Historic {{IETF RFC|1866}} (obsoleted by {{IETF RFC|2854|link=no}}) encourages CGI authors to support ';' in addition to '&'.{{Ref RFC|1866|rsection=8.2.1}}}}
| <code>key1=value1;key2=value2</code>
|}
* An optional '''{{visible anchor|query}}''' component preceded by a question mark (<code>?</code>), consisting of a [[query string]] of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of [[attribute–value pair]]s separated by a [[delimiter]].
* An optional '''{{visible anchor|fragment}}''' component preceded by a [[Number sign|hash]] (<code>#</code>). The fragment contains a [[fragment identifier]] providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an [[HTML]] document, the fragment is often an [[HTML#Attributes|<code>id</code> attribute]] of a specific element, and web browsers will scroll this element into view.<section end="syntax"/>

The scheme- or implementation-specific reserved character <code>+</code> may be used in the scheme, userinfo, host, path, query, and fragment, and the scheme- or implementation-specific reserved characters <code>!</code>, <code>$</code>, <code>&</code>, <code>'</code>, <code>(</code>, <code>)</code>, <code>*</code>, <code>,</code>, <code>;</code>, and <code>=</code> may be used in the userinfo, host, path, query, and fragment. Additionally, the generic reserved character <code>:</code> may be used in the userinfo, path, query and fragment, the generic reserved characters <code>@</code> and <code>/</code> may be used in the path, query and fragment, and the generic reserved character <code>?</code> may be used in the query and fragment.{{Ref RFC|3986|rsection=A}}

=== Example URIs ===
The following figure displays example URIs and their component parts.
{{Pre|<nowiki/>
          {{color|rgb(0, 76, 178)|userinfo}}       {{color|rgb(0, 177, 17)|host}}      {{color|rgb(178, 111, 0)|port}}
          {{color|rgb(0, 76, 178)|┌──┴───┐}} {{color|rgb(0, 177, 17)|┌──────┴──────┐}} {{color|rgb(178, 111, 0)|┌┴─┐}}
  <nowiki>https://john.doe@www.example.com:1234/forum/questions/?tag=networking&order=newest#top</nowiki>
  {{color|rgb(178, 111, 0)|└─┬─┘}}   {{color|rgb(176, 0, 177)|└─────────────┬─────────────┘}}{{color|rgb(0, 76, 178)|└───────┬───────┘}} {{color|rgb(0, 178, 17)|└────────────┬────────────┘}} {{color|rgb(178, 111, 0)|└┬┘}}
  {{color|rgb(178, 111, 0)|scheme}}            {{color|rgb(176, 0, 177)|authority}}                 {{color|rgb(0, 76, 178)|path}}                  {{color|rgb(0, 178, 17)|{{color|rgb(0, 178, 17)|query}}}}         {{color|rgb(178, 111, 0)|fragment}}

          {{color|rgb(0, 76, 178)|userinfo}}       {{color|rgb(0, 177, 17)|host}}      {{color|rgb(178, 111, 0)|port}}
          {{color|rgb(0, 76, 178)|┌──┴───┐}} {{color|rgb(0, 177, 17)|┌──────┴──────┐}} {{color|rgb(178, 111, 0)|┌┴─┐}}
  <nowiki>https://john.doe@www.example.com:1234/forum/questions/?tag=networking&order=newest#:~:text=whatever</nowiki>
  {{color|rgb(178, 111, 0)|└─┬─┘}}   {{color|rgb(176, 0, 177)|└─────────────┬─────────────┘}}{{color|rgb(0, 76, 178)|└───────┬───────┘}} {{color|rgb(0, 178, 17)|└────────────┬────────────┘}} {{color|rgb(178, 111, 0)|└──────┬───────┘}}
  {{color|rgb(178, 111, 0)|scheme}}            {{color|rgb(176, 0, 177)|authority}}                 {{color|rgb(0, 76, 178)|path}}                  {{color|rgb(0, 178, 17)|{{color|rgb(0, 178, 17)|query}}}}               {{color|rgb(178, 111, 0)|fragment}}

  <nowiki>ldap://[2001:db8::7]/c=GB?objectClass?one</nowiki>
  {{color|rgb(178, 111, 0)|└┬─┘}}   {{color|rgb(176, 0, 177)|└─────┬─────┘}}{{color|rgb(0, 76, 178)|└─┬─┘}} {{color|rgb(0, 178, 17)|└──────┬──────┘}}
  {{color|rgb(178, 111, 0)|scheme}}   {{color|rgb(176, 0, 177)|authority}}   {{color|rgb(0, 76, 178)|path}}      {{color|rgb(0, 178, 17)|query}}

  <nowiki>mailto:John.Doe@example.com</nowiki>
  {{color|rgb(178, 111, 0)|└─┬──┘}} {{color|rgb(0, 76, 178)|└────┬─────────────┘}}
  {{color|rgb(178, 111, 0)|scheme}}     {{color|rgb(0, 76, 178)|path}}

  <nowiki>news:comp.infosystems.www.servers.unix</nowiki>
  {{color|rgb(178, 111, 0)|└┬─┘}} {{color|rgb(0, 76, 178)|└─────────────┬─────────────────┘}}
  {{color|rgb(178, 111, 0)|scheme}}            {{color|rgb(0, 76, 178)|path}}

  <nowiki>tel:+1-816-555-1212</nowiki>
  {{color|rgb(178, 111, 0)|└┬┘}} {{color|rgb(0, 76, 178)|└──────┬──────┘}}
  {{color|rgb(178, 111, 0)|scheme}}    {{color|rgb(0, 76, 178)|path}}

  <nowiki>telnet://192.0.2.16:80/</nowiki>
  {{color|rgb(178, 111, 0)|└─┬──┘}}   {{color|rgb(176, 0, 177)|└─────┬─────┘}}{{color|rgb(0, 76, 178)|│}}
  {{color|rgb(178, 111, 0)|scheme}}     {{color|rgb(176, 0, 177)|authority}}  {{color|rgb(0, 76, 178)|path}}

  <nowiki>urn:oasis:names:specification:docbook:dtd:xml:4.1.2</nowiki>
  {{color|rgb(178, 111, 0)|└┬┘}} {{color|rgb(0, 76, 178)|└──────────────────────┬──────────────────────┘}}
  {{color|rgb(178, 111, 0)|scheme}}                    {{color|rgb(0, 76, 178)|path}}
}}

DOIs ([[digital object identifier]]s) fit within the [[Handle System]] and fit within the URI system, [[Handle System#DOIs-Handles-URIs|as facilitated by appropriate syntax]].<!--Per the [[Digital object identifier]] and [[Handle System]] articles, which see.-->

=== URI references ===
A ''URI reference'' is either a URI or, when it does not begin with a scheme component followed by a colon (<code>:</code>), a ''relative reference''.{{Ref RFC|3986|rsection=4.1}} A path segment that contains a colon character (e.g., <code>foo:bar</code>) cannot be used as the first path segment of a relative reference if its path component does not begin with a slash (<code>/</code>), as it would be mistaken for a scheme component.
Such a path segment must be preceded by a dot path segment (e.g., <code>./foo:bar</code>).{{Ref RFC|3986|rsection=4.2}}

Web document [[markup language]]s frequently use URI references to point to other resources, such as external documents or specific portions of the same logical document:{{Ref RFC|3986|rsection=4.4}}
* in [[HTML]], the value of the <code>src</code> attribute of the <code>img</code> element provides a URI reference, as does the value of the <code>href</code> attribute of the <code>a</code> or <code>link</code> element;
* in [[XML]], the [[system identifier]] appearing after the <code>SYSTEM</code> keyword in a [[Document Type Definition|DTD]] is a fragmentless URI reference;
* in [[XSLT]], the value of the <code>href</code> attribute of the <code>xsl:import</code> element/instruction is a URI reference; likewise the first argument to the <code>document()</code> function.
<pre>
https://example.com/path/resource.txt#fragment
//example.com/path/resource.txt
/path/resource.txt
path/resource.txt
../resource.txt
./resource.txt
resource.txt
#fragment
</pre>

=== Resolution ===
''Resolving'' a URI reference against a ''base URI'' results in a ''target URI''. This implies that the base URI exists and is an ''absolute URI'' (a URI with no fragment component). The base URI can be obtained, in order of precedence, from:{{Ref RFC|3986|rsection=5.1}}
* the reference URI itself if it is a URI;
* the content of the representation;
* the entity encapsulating the representation;
* the URI used for the actual retrieval of the representation;
* the context of the application.
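As a rough illustration of resolution, the following sketch uses <code>urljoin</code> from Python's standard <code>urllib.parse</code> module, which is one existing implementation of reference resolution and is assumed here to follow the RFC 3986 rules for these ordinary cases. The base URI is the one used in the RFC 3986 section 5.4 examples; the helper name <code>resolve</code> and the selection of references are illustrative choices, not part of any specification.

```python
from urllib.parse import urljoin

# Base URI taken from the RFC 3986 section 5.4 examples.
BASE = "http://a/b/c/d;p?q"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a URI reference against a base URI (hypothetical helper)."""
    return urljoin(base, reference)


# A few references and the target URIs they are expected to resolve to.
EXAMPLES = {
    "g": "http://a/b/c/g",        # relative path, merged with the base path
    "/g": "http://a/g",           # absolute path, keeps scheme and authority
    "//g": "http://g",            # network-path reference, keeps only the scheme
    "?y": "http://a/b/c/d;p?y",   # query only, keeps the base path
    "#s": "http://a/b/c/d;p?q#s", # fragment only, keeps path and query
    "../g": "http://a/b/g",       # dot segments are removed
    "../../g": "http://a/g",
    "g:h": "g:h",                 # already a URI, so the base is not used
}

for reference, expected in EXAMPLES.items():
    target = resolve(reference)
    print(f"{reference!r:10} -> {target}")
    assert target == expected
```

The last entry corresponds to the first item in the precedence list above: a reference that is itself a URI carries its own scheme, so the base URI plays no part in the result.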
Within a representation with a well-defined base URI of
<pre>
http://a/b/c/d;p?q
</pre>
a relative reference is resolved to its target URI as follows:{{Ref RFC|3986|rsection=5.4}}
<pre>
"g:h"     -> "g:h"
"g"       -> "http://a/b/c/g"
"./g"     -> "http://a/b/c/g"
"g/"      -> "http://a/b/c/g/"
"/g"      -> "http://a/g"
"//g"     -> "http://g"
"?y"      -> "http://a/b/c/d;p?y"
"g?y"     -> "http://a/b/c/g?y"
"#s"      -> "http://a/b/c/d;p?q#s"
"g#s"     -> "http://a/b/c/g#s"
"g?y#s"   -> "http://a/b/c/g?y#s"
";x"      -> "http://a/b/c/;x"
"g;x"     -> "http://a/b/c/g;x"
"g;x?y#s" -> "http://a/b/c/g;x?y#s"
""        -> "http://a/b/c/d;p?q"
"."       -> "http://a/b/c/"
"./"      -> "http://a/b/c/"
".."      -> "http://a/b/"
"../"     -> "http://a/b/"
"../g"    -> "http://a/b/g"
"../.."   -> "http://a/"
"../../"  -> "http://a/"
"../../g" -> "http://a/g"
</pre>

=== URL munging ===
URL munging is a technique by which a [[Command (computing)|command]] is appended to a URL, usually at the end, after a "?" [[Lexical analysis#Token|token]]. It is commonly used in [[WebDAV]] as a mechanism of adding functionality to [[HTTP]]. In a versioning system, for example, to add a "checkout" command to a URL, it is written as <code><nowiki>http://editing.com/resource/file.php?command=checkout</nowiki></code>. It has the advantage of being easy for [[Common Gateway Interface|CGI parsers]] to handle, and in this case it also acts as an intermediary between HTTP and the underlying resource.{{sfn|Whitehead|1998|p=38}}

=== Relation to XML namespaces ===
In [[XML]], a [[XML namespace|namespace]] is an abstract domain to which a collection of element and attribute names can be assigned.<!-- who or what can do such assignation? --> The namespace name is a character string which must adhere to the generic URI syntax.{{sfnp|Morrison|2006}} However, the name is generally not considered to be a URI,{{sfnp|Harold|2004}} because the URI specification bases the decision not only on lexical components, but also on their intended use.
A namespace name does not necessarily imply any of the semantics of URI schemes; for example, a namespace name beginning with ''http:'' may have no connection to the use of [[HTTP]]. Originally, the namespace name could match the syntax of any non-empty URI reference, but the use of relative URI references was deprecated by the W3C.{{sfnp|W3C|2009}} A separate W3C specification for namespaces in XML 1.1 permits [[Internationalized Resource Identifier]] (IRI) references to serve as the basis for namespace names in addition to URI references.{{sfnp|W3C|2006}}
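The point that a namespace name is only an identifying string can be sketched with Python's standard <code>xml.etree.ElementTree</code> module. The two documents and the helper name <code>qualified_tag</code> below are hypothetical; the sketch assumes only that the parser records the namespace name as written, without fetching it or normalizing it as an HTTP URI.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents whose namespace names differ only in letter case.
# The names look like HTTP URLs, but the parser never dereferences them.
DOC_A = '<item xmlns="http://example.org/ns"/>'
DOC_B = '<item xmlns="http://EXAMPLE.org/ns"/>'


def qualified_tag(xml_text: str) -> str:
    """Return the root element's tag in ElementTree's {namespace}local form."""
    return ET.fromstring(xml_text).tag


tag_a = qualified_tag(DOC_A)
tag_b = qualified_tag(DOC_B)

print(tag_a)           # {http://example.org/ns}item
print(tag_b)           # {http://EXAMPLE.org/ns}item
print(tag_a == tag_b)  # False: the names are compared as plain strings
```

Although the two host names would be equivalent in an ''http'' URI, the namespace names are distinct strings, so the two <code>item</code> elements belong to different namespaces.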