Uniform Resource Identifier
===<span class="anchor" id="syntax"></span> Syntax ===
{{see also|List of URI schemes}}
A URI has a scheme that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

The URI generic syntax is a superset of the syntax of all URI schemes. It was first defined in {{IETF RFC|2396}}, published in August 1998,{{Ref RFC|2396}} and finalized in {{IETF RFC|3986}}, published in January 2005.{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|p=46|ps=; "9. Acknowledgements"}}

A URI is composed of characters from a limited subset of [[ASCII]]: [[Filename|reserved characters]] (gen-delims: <code>:</code>, <code>/</code>, <code>?</code>, <code>#</code>, <code>[</code>, <code>]</code>, and <code>@</code>; sub-delims: <code>!</code>, <code>$</code>, <code>&</code>, <code>'</code>, <code>(</code>, <code>)</code>, <code>*</code>, <code>+</code>, <code>,</code>, <code>;</code>, and <code>=</code>),{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|ps=; "2.2. Reserved Characters", "2.3. Unreserved Characters"|pp=13–14}} unreserved characters ([[Latin-script alphabet|uppercase and lowercase letters]], [[Arabic numerals|decimal digits]], <code>-</code>, <code>.</code>, <code>_</code>, and <code>~</code>),{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|ps=; "2.2. Reserved Characters", "2.3. Unreserved Characters"|pp=13–14}} and the character <code>%</code>.{{Sfn|Berners-Lee, Tim; Fielding, Roy T.; Masinter, Larry|2005|ps=; "2.1. Percent-Encoding"|p=12}} Syntax components and subcomponents are separated by ''delimiters'' drawn from the reserved characters (components themselves are separated only by generic reserved characters). The ''identifying data'' within a component or subcomponent is represented by unreserved characters, by reserved characters that do not act as delimiters in that component or subcomponent,{{Ref RFC|3986|rsection=2}} and by [[percent-encoding]]s when the corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoding of an identifying data [[Octet (computing)|octet]] is a sequence of three characters: the character <code>%</code> followed by the two hexadecimal digits representing that octet's numeric value (for example, <code>%20</code> for the space character, octet 0x20).{{Ref RFC|3986|rsection=2.1}}

<section begin="syntax"/><!-- This section is transcluded in other articles. See Help:Labeled section transclusion -->The URI generic syntax consists of five ''components'' organized hierarchically in order of decreasing significance from left to right:{{Ref RFC|3986|rsection=3}}
<pre>
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
</pre>
A component is ''undefined'' if it has an associated delimiter and the delimiter does not appear in the URI; the scheme and path components are always defined.{{Ref RFC|3986|rsection=5.2.1}} A component is ''empty'' if it has no characters; the scheme component is always non-empty.{{Ref RFC|3986|rsection=3}}

The authority component consists of ''subcomponents'':
<pre>
authority = [userinfo "@"] host [":" port]
</pre>
This is represented in a [[syntax diagram]] as:

<div class="skin-invert-image">{{wide image|URI syntax diagram.svg|900px|alt=URI syntax diagram}}</div>

The URI comprises:
* A non-empty '''{{visible anchor|scheme}}''' component followed by a colon (<code>:</code>), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (<code>+</code>), period (<code>.</code>), or hyphen (<code>-</code>). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include <code>[[Hypertext Transfer Protocol|http]]</code>, <code>[[HTTP Secure|https]]</code>, <code>[[File Transfer Protocol|ftp]]</code>, <code>[[mailto]]</code>, <code>[[File URI scheme|file]]</code>, <code>[[Data URI scheme|data]]</code> and <code>[[Internet Relay Chat#URI scheme|irc]]</code>. URI schemes should be registered with the [[Internet Assigned Numbers Authority|Internet Assigned Numbers Authority (IANA)]], although non-registered schemes are used in practice.{{efn|The procedures for registering new URI schemes were originally defined in 1999 by {{IETF RFC|2717}}, and are now defined by {{IETF RFC|7595|link=no}}, published in June 2015.{{Ref RFC|7595}}}}
* An optional '''{{visible anchor|authority}}''' component preceded by two slashes (<code>//</code>), comprising:
** An optional '''{{visible anchor|userinfo}}''' subcomponent followed by an at symbol (<code>@</code>), which may consist of a [[User (computing)|user name]] and an optional [[password]] preceded by a colon (<code>:</code>). Use of the format <code>username:password</code> in the userinfo subcomponent is deprecated for security reasons. Applications should not render as clear text any data after the first colon (<code>:</code>) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password).
** A '''{{visible anchor|host}}''' subcomponent, consisting of either a registered name (including but not limited to a [[hostname]]) or an [[IP address]]. [[IPv4]] addresses must be in [[dot-decimal notation]], and [[IPv6]] addresses must be enclosed in brackets (<code>[]</code>).{{Ref RFC|3986|rsection=3.2.2}}{{efn|For URIs relating to resources on the World Wide Web, some web browsers allow {{code|.0}} portions of dot-decimal notation to be dropped or raw integer IP addresses to be used.{{sfnp|Lawrence|2014}}}}
** An optional '''{{visible anchor|port}}''' subcomponent preceded by a colon (<code>:</code>), consisting of decimal digits.
* A '''{{visible anchor|path}}''' component, consisting of a sequence of path segments separated by a slash (<code>/</code>). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (<code>//</code>) in the path component. A path component may resemble or map exactly to a [[path (computing)|file system path]] but does not always imply a relation to one. If an authority component is defined, then the path component must either be empty or begin with a slash (<code>/</code>). If an authority component is undefined, then the path cannot begin with an empty segment (that is, with two slashes, <code>//</code>), since the following characters would be interpreted as an authority component.{{Ref RFC|2396|rsection=3.3}}
: By convention, in '''http''' and '''https''' URIs, the last part of a ''path'' is named '''{{visible anchor|pathinfo}}''', and it is optional. It is composed of zero or more path segments that do not refer to an existing physical resource name (e.g. a file, an internal module program or an executable program) but to a logical part (e.g. a command or a qualifier part) that is passed separately to the executable module or program, managed by a [[web server]], that the first part of the path identifies. This is often used to select dynamic content (a document, etc.) or to tailor it as requested (see also: [[Common Gateway Interface|CGI]] and PATH_INFO).
: Example:
:: URI: {{code|1="http://www.example.com/questions/3456/my-document"}}
:: where: {{code|1="/questions"}} is the first part of the ''path'' (an executable module or program) and {{code|1="/3456/my-document"}} is the second part of the ''path'', named ''pathinfo'', which is passed to the executable module or program named {{code|1="/questions"}} to select the requested document.
: An '''http''' or '''https''' URI containing a ''pathinfo'' part without a [[#query|query]] part may also be referred to as a '[[clean URL]],' whose last part may be a '[[Clean URL#Slug|slug]].'
{| class="wikitable" style="float: right; font-size: 0.9em; margin-left: 1em"
|-
! Query delimiter
! Example
|-
| Ampersand (<code>&</code>)
| <code>key1=value1&key2=value2</code>
|-
| Semicolon (<code>;</code>){{efn|Historic {{IETF RFC|1866}} (obsoleted by {{IETF RFC|2854|link=no}}) encourages CGI authors to support ';' in addition to '&'.{{Ref RFC|1866|rsection=8.2.1}}}}
| <code>key1=value1;key2=value2</code>
|}
* An optional '''{{visible anchor|query}}''' component preceded by a question mark (<code>?</code>), consisting of a [[query string]] of non-hierarchical data. Its internal structure is not specified by the generic syntax, but by convention it is most often a sequence of [[attribute–value pair]]s separated by a [[delimiter]].
* An optional '''{{visible anchor|fragment}}''' component preceded by a [[Number sign|hash]] (<code>#</code>). The fragment contains a [[fragment identifier]] providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an [[HTML]] document, the fragment is often an [[HTML#Attributes|<code>id</code> attribute]] of a specific element, and web browsers will scroll this element into view.<section end="syntax"/>

The scheme- or implementation-specific reserved character <code>+</code> may be used in the scheme, userinfo, host, path, query, and fragment, and the scheme- or implementation-specific reserved characters <code>!</code>, <code>$</code>, <code>&</code>, <code>'</code>, <code>(</code>, <code>)</code>, <code>*</code>, <code>,</code>, <code>;</code>, and <code>=</code> may be used in the userinfo, host, path, query, and fragment. Additionally, the generic reserved character <code>:</code> may be used in the userinfo, path, query and fragment, the generic reserved characters <code>@</code> and <code>/</code> may be used in the path, query and fragment, and the generic reserved character <code>?</code> may be used in the query and fragment.{{Ref RFC|3986|rsection=A}}
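The rules on which reserved characters may appear literally in which component can be collected into a lookup table; any other character in that component's identifying data would have to be percent-encoded. The following is a minimal Python sketch derived from the RFC 3986 grammar, not a complete validator: the names <code>ALLOWED</code> and <code>must_encode</code> are hypothetical, the host entry covers only registered names (not IP literals), and the scheme entry ignores the rule that the first character must be a letter.

```python
import string

UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = frozenset("!$&'()*+,;=")

# Characters that may appear literally in each component (sketch based on
# RFC 3986, Appendix A). '%' is absent everywhere because it only ever
# introduces a percent-encoding.
ALLOWED = {
    # Per-character check only; the leading-letter rule is not modelled here.
    "scheme": frozenset(string.ascii_letters + string.digits + "+-."),
    "userinfo": UNRESERVED | SUB_DELIMS | {":"},
    # Registered names only; bracketed IP literals are not covered.
    "host": UNRESERVED | SUB_DELIMS,
    "path": UNRESERVED | SUB_DELIMS | {":", "@", "/"},
    "query": UNRESERVED | SUB_DELIMS | {":", "@", "/", "?"},
    "fragment": UNRESERVED | SUB_DELIMS | {":", "@", "/", "?"},
}


def must_encode(char: str, component: str) -> bool:
    """Return True if `char` cannot appear literally in `component`."""
    return char not in ALLOWED[component]


print(must_encode("?", "path"), must_encode("?", "query"))  # True False
```

A table of sets keeps each rule in one place, so the per-component differences listed above (for example, <code>?</code> being allowed in the query but not in the path) are visible at a glance.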
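The percent-encoding rule (a <code>%</code> followed by two hexadecimal digits for each octet) can be sketched in Python as below. This is an illustration under stated assumptions rather than a reference implementation: it assumes that non-ASCII text is first encoded as UTF-8, and it encodes every character outside the unreserved set, which is stricter than some components require. The names <code>percent_encode</code> and <code>percent_decode</code> are hypothetical.

```python
import string

# Unreserved characters never need to be percent-encoded.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
HEXDIGITS = frozenset(string.hexdigits)


def percent_encode(text: str) -> str:
    """Encode every octet outside the unreserved set as %HH.

    Assumption of this sketch: text is converted to octets with UTF-8.
    """
    out = []
    for octet in text.encode("utf-8"):
        char = chr(octet)
        if char in UNRESERVED:
            out.append(char)
        else:
            # '%' plus two uppercase hex digits, the form RFC 3986
            # recommends producers use for consistency.
            out.append(f"%{octet:02X}")
    return "".join(out)


def percent_decode(text: str) -> str:
    """Reverse percent_encode; raise ValueError on a malformed %HH triplet."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            pair = text[i + 1 : i + 3]
            if len(pair) != 2 or not all(c in HEXDIGITS for c in pair):
                raise ValueError(f"malformed percent-encoding at index {i}")
            out.append(int(pair, 16))
            i += 3
        else:
            out.extend(text[i].encode("utf-8"))
            i += 1
    return out.decode("utf-8")


print(percent_encode("a b/c"))  # a%20b%2Fc
print(percent_encode("é"))      # %C3%A9
```

Decoding works on octets and only converts back to text at the end, so a multi-octet character such as <code>é</code> (<code>%C3%A9</code>) is reassembled correctly.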
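The five-component generic syntax, and the distinction between ''undefined'' and ''empty'' components, can be illustrated with the regular expression given in Appendix B of RFC 3986, which splits a URI reference into its components without validating them. In the Python sketch below, an undefined component is returned as <code>None</code> and an empty one as <code>""</code>; the names <code>split_uri</code> and <code>URIComponents</code> are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. It splits any string into the
# five generic components; it does not check that each one is well-formed.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class URIComponents(NamedTuple):
    scheme: Optional[str]     # None for a relative reference
    authority: Optional[str]  # None when "//" is absent (undefined)
    path: str                 # always defined, possibly empty
    query: Optional[str]      # None when "?" is absent (undefined)
    fragment: Optional[str]   # None when "#" is absent (undefined)


def split_uri(uri: str) -> URIComponents:
    match = URI_REFERENCE.match(uri)
    # Every group is optional, so the pattern matches any string.
    assert match is not None
    return URIComponents(
        scheme=match.group(2),
        authority=match.group(4),
        path=match.group(5),
        query=match.group(7),
        fragment=match.group(9),
    )


print(repr(split_uri("http://example.com?").query))  # '' (defined but empty)
print(repr(split_uri("http://example.com").query))   # None (undefined)
```

Because the pattern is the one RFC 3986 gives for URI references in general, it also accepts relative references such as <code>../a?b</code>, for which <code>scheme</code> is <code>None</code>. For comparison, Python's <code>urllib.parse.urlsplit</code> by default reports both an absent and an empty query as <code>''</code>.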
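The scheme rule (a letter followed by any combination of letters, digits, <code>+</code>, <code>.</code> or <code>-</code>, compared case-insensitively with lowercase as the canonical form) maps directly onto a regular expression. A minimal Python sketch; <code>normalize_scheme</code> is a hypothetical helper, and it does not check whether the scheme is registered with IANA.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def normalize_scheme(scheme: str) -> str:
    """Validate a scheme name and return its canonical lowercase form."""
    if not SCHEME.fullmatch(scheme):
        raise ValueError(f"invalid scheme: {scheme!r}")
    # Schemes are case-insensitive; lowercase is the canonical form.
    return scheme.lower()


print(normalize_scheme("HTTPS"))  # https
```

Using <code>fullmatch</code> rather than <code>match</code> ensures that trailing characters outside the allowed set, such as the <code>:</code> delimiter itself, cause a rejection.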
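The authority rules (an optional userinfo before <code>@</code>, a host that is bracketed when it is an IPv6 literal, and an optional decimal port after <code>:</code>), together with the advice not to display a password in clear text, can be sketched as follows. This Python sketch assumes the authority string has already been separated from the rest of the URI and does not validate host names or IP addresses; <code>split_authority</code>, <code>display_userinfo</code> and the <code>****</code> mask are hypothetical.

```python
from typing import NamedTuple, Optional

DIGITS = frozenset("0123456789")


class Authority(NamedTuple):
    userinfo: Optional[str]  # None when no "@" is present
    host: str                # IPv6 literals keep their brackets
    port: Optional[str]      # None when no ":" follows the host; may be ""


def split_authority(authority: str) -> Authority:
    # "@" may not appear unencoded in userinfo or host, so at most one is expected.
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None
    if hostport.startswith("["):
        # IP literal: the host runs up to and including the closing bracket.
        end = hostport.find("]")
        if end == -1:
            raise ValueError("unterminated IP literal")
        host, rest = hostport[: end + 1], hostport[end + 1 :]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected characters after IP literal")
        port = rest[1:] if rest else None
    else:
        # A registered name or IPv4 address cannot contain ":".
        host, colon, port_text = hostport.partition(":")
        port = port_text if colon else None
    if port is not None and not set(port) <= DIGITS:
        raise ValueError(f"invalid port: {port!r}")
    return Authority(userinfo, host, port)


def display_userinfo(userinfo: str) -> str:
    """Mask data after the first colon unless it is empty (no password)."""
    user, colon, password = userinfo.partition(":")
    return f"{user}:****" if colon and password else userinfo


print(split_authority("[2001:db8::7]:8080"))
# Authority(userinfo=None, host='[2001:db8::7]', port='8080')
print(display_userinfo("alice:secret"))  # alice:****
```

Handling the bracketed case first is what allows the colons inside an IPv6 address to be distinguished from the port delimiter.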
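How a server might separate the program part of a path from its ''pathinfo'', as in the <code>/questions/3456/my-document</code> example, can be sketched with a longest-prefix match. This is only an illustration: real web servers decide the split through their own configuration, and the set of program paths passed in here, like the name <code>split_pathinfo</code>, is hypothetical.

```python
from typing import Container, Optional, Tuple


def split_pathinfo(path: str, programs: Container[str]) -> Tuple[Optional[str], str]:
    """Split a path into (program, pathinfo) using the longest matching prefix.

    `programs` is a hypothetical collection of paths that a server maps to
    executable modules; if none matches, the program part is None.
    """
    segments = path.split("/")
    # Try the longest prefix first, dropping one trailing segment at a time.
    for i in range(len(segments), 0, -1):
        prefix = "/".join(segments[:i])
        if prefix in programs:
            return prefix, path[len(prefix):]
    return None, path


print(split_pathinfo("/questions/3456/my-document", {"/questions"}))
# ('/questions', '/3456/my-document')
```

Trying the longest prefix first means that a more specific program path, if one is configured, takes precedence over a shorter one.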
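The conventional query structure shown in the table (attribute–value pairs separated by <code>&</code>, or historically by <code>;</code>) can be parsed as sketched below. This follows convention, not the generic syntax, which leaves the query's structure unspecified. In this Python sketch the name <code>parse_query</code> and its <code>delimiters</code> parameter are hypothetical; values are decoded with <code>urllib.parse.unquote</code>, which leaves <code>+</code> unchanged.

```python
from typing import List, Tuple
from urllib.parse import unquote


def parse_query(query: str, delimiters: str = "&") -> List[Tuple[str, str]]:
    """Split a query string into (key, value) pairs.

    `delimiters` lists the accepted pair separators; pass "&;" to also
    accept the historic semicolon. A key without "=" gets the value "".
    """
    # Normalize every accepted delimiter to the first one, then split once.
    # This happens before decoding, so an encoded "%3B" is not a separator.
    primary = delimiters[0]
    for other in delimiters[1:]:
        query = query.replace(other, primary)
    pairs = []
    for part in query.split(primary):
        if not part:
            continue  # skip empty pieces, e.g. from "a=1&&b=2"
        key, _, value = part.partition("=")
        pairs.append((unquote(key), unquote(value)))
    return pairs


print(parse_query("key1=value1;key2=value2", delimiters="&;"))
# [('key1', 'value1'), ('key2', 'value2')]
```

For HTML form submissions, Python's <code>urllib.parse.parse_qsl</code> is the more usual choice; it additionally decodes <code>+</code> as a space.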