URI: The Uniform Resource Identifier Explained
Most people are familiar with URLs: a URL is a web address that directs users to a website on the internet. But what is a URI? The concept was conceived by Tim Berners-Lee, the founder of the World Wide Web. When he first used the term in RFC 1630, it still stood for Universal Resource Identifier. Since then, through publications by the World Wide Web Consortium (W3C) among others, URI has become established as the abbreviation for Uniform Resource Identifier, and it still goes by that name today. The original idea, however, has not changed.
What is the Uniform Resource Identifier (URI)?
The Uniform Resource Identifier (URI) is intended to identify abstract or physical resources on the Internet. What counts as a resource depends on the situation: it can be a website, for example, but email senders and recipients can also be identified via a URI. Applications use this unambiguous designation to identify a resource or to request data from it.
Protocols such as HTTP or FTP can work on this basis because the form of identification is predefined by the URI syntax. From a URI, a system can read where and how certain information is to be found.
URI Syntax
A URI consists of up to five parts. However, only two of these are mandatory.
- scheme: Gives information about the protocol being used.
- authority: Identifies the host (for example, a domain), optionally with user information and a port.
- path: Shows the exact path to the resource.
- query: Carries additional data for the request, such as search parameters.
- fragment: Refers to a partial aspect of a resource.
Only scheme and path must appear in every identifier. In the URI syntax, all components are listed successively and separated by specific, predefined characters.
scheme :// authority path ? query # fragment
The double forward slashes after the first colon are only necessary if the authority part is present. In addition, the authority can contain user information, which is separated from the domain by an @ symbol, and a port designation, which is separated from the domain by a colon.
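The structure of the authority part can be illustrated with a short Python sketch. It uses urlsplit from the standard library; the URI itself, including the user name "user" and the port 8080, is a made-up example and not taken from the text above.

```python
from urllib.parse import urlsplit

# Hypothetical URI whose authority contains user information and a port.
parts = urlsplit("https://user@example.org:8080/test/test1")

authority = parts.netloc    # the full authority: 'user@example.org:8080'
userinfo = parts.username   # the part before the @ symbol
host = parts.hostname       # the domain between @ and the colon
port = parts.port           # the part after the colon, as an integer

print(authority, userinfo, host, port)
```

If the user information or the port is missing from a URI, the corresponding attribute is simply None.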
A typical web address is a good example: 'https://example.org/test/test1?search=test-question#part2'
- scheme: https
- authority: example.org
- path: /test/test1
- query: search=test-question
- fragment: part2
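The breakdown above can be reproduced with a few lines of Python. This is only a minimal sketch: split_uri is a hypothetical helper name, and urlsplit from the standard library is just one of several ways to take a URI apart. Note that the library reports the path with its leading slash.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> dict:
    """Return the five URI components under the names used in this article."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


components = split_uri("https://example.org/test/test1?search=test-question#part2")
for name, value in components.items():
    print(f"{name}: {value}")
```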
In the example, the URI refers to one part of a website. This part (part2) is accessed via HTTPS, is located on a host identified as example.org, and can be found at the specified path, with search=test-question passed along as the query. An email address can also be identified with a Uniform Resource Identifier: 'mailto:user@example.org'.
- scheme: mailto
- path: user@example.org
In this case, the URI contains only the mandatory components. Other resources can be identified with this syntax as well, such as files or even telephone numbers.
Although the path is a mandatory component of every URI, its content can be empty. In other words, “http://example.org” is a valid URI with an empty path.
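That a URI can get by with scheme and path alone can be checked with the same standard-library function. The following sketch splits the mailto example; the variable names are chosen for illustration only.

```python
from urllib.parse import urlsplit

parts = urlsplit("mailto:user@example.org")

scheme = parts.scheme                 # 'mailto'
path = parts.path                     # 'user@example.org'
has_authority = parts.netloc != ""    # False: no '//' follows the colon
has_query = parts.query != ""         # False
has_fragment = parts.fragment != ""   # False

print(scheme, path, has_authority, has_query, has_fragment)
```

Because there are no double slashes after the colon, everything after the scheme is treated as the path, even though it contains an @ symbol.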
URI schemes, that is, the first part of every URI, are managed by the IANA. Although anyone can define and use their own schemes, those registered with the organisation are known throughout the entire Internet. The best-known schemes are:
- about: Browser information
- data: Embedded data
- feed: Web feeds
- file: Files
- ftp: File Transfer Protocol
- git: Version management with Git
- http: Hypertext Transfer Protocol
- https: Hypertext Transfer Protocol Secure
- imap: Internet Message Access Protocol
- mailto: Email addresses
- news: Usenet newsgroups
- pop: Post Office Protocol version 3 (POP3)
- rsync: Data synchronization
- sftp: SSH File Transfer Protocol
- ssh: Secure shell
- tel: Telephone numbers
- urn: Uniform Resource Names
The IANA publishes an official list of all known URI schemes.
URI Reference
To avoid always having to write (and save) a complete URI, many applications use a shorter version of the syntax. For the shortened version to be understood properly, there must always be a fully specified base URI. The URI references are then resolved internally. For this reason, one distinguishes absolute references from relative ones. The absolute URI works independently of context and consists of at least a scheme and a path. The relative reference is the actual short form: it specifies only the deviation from the base URI. For this reason, a relative URI must always be located in the same namespace as the base URI.
In a relative reference, no scheme is specified. To keep relative URIs distinguishable from absolute URIs, no colon may appear in the first segment of the path, because the part before the colon would otherwise be interpreted as a scheme. Among relative references, one distinguishes three types, each recognisable by a marker at the beginning of the path:
- A relative path reference begins without a forward slash.
- An absolute path reference begins with a forward slash.
- A network-path reference begins with two forward slashes.
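How the three types are resolved against a base URI can be sketched with urljoin from the Python standard library. The base URI is the example used earlier in this article; the three relative references ("test2", "/other/page" and "//cdn.example.org/lib.js") are invented for illustration.

```python
from urllib.parse import urljoin

# Fully specified base URI against which the short forms are resolved.
base = "https://example.org/test/test1?search=test-question#part2"

# Relative path reference: replaces the last path segment of the base.
relative_path = urljoin(base, "test2")

# Absolute path reference: replaces the whole path, keeps scheme and authority.
absolute_path = urljoin(base, "/other/page")

# Network-path reference: replaces authority and path, keeps only the scheme.
network_path = urljoin(base, "//cdn.example.org/lib.js")

print(relative_path)   # https://example.org/test/test2
print(absolute_path)   # https://example.org/other/page
print(network_path)    # https://cdn.example.org/lib.js
```

In all three cases the query and fragment of the base URI are dropped, because the reference supplies its own (empty) values for them.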
URI vs. URL vs. URN
There is a lot of confusion regarding the very similar sounding abbreviations URI, URL and URN. The uncertainty is compounded by the fact that all three concepts are also technically related to each other. The Uniform Resource Locator indicates where a resource is located. For this reason, URLs are used when surfing the Internet to navigate to specific websites. In contrast, the Uniform Resource Name is location-independent and permanently designates a resource. While URLs are primarily known in the form of web addresses, a URN can, for example, contain an ISBN to permanently identify a book.
URL and URN follow the URI syntax. For this reason, both designation types are subsets of URIs. URLs and URNs are thus always URIs. Conversely, a Uniform Resource Identifier is not necessarily a URL or a URN.
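That both designation types follow the same generic syntax can be seen by running a URL and a URN through the same parser. In the following sketch, the ISBN value and the helper name scheme_and_rest are illustrative assumptions.

```python
from urllib.parse import urlsplit


def scheme_and_rest(uri: str) -> tuple:
    """Split a URI into its scheme and the path that follows it."""
    parts = urlsplit(uri)
    return parts.scheme, parts.path


# A URL: says where the resource is located.
url_parts = scheme_and_rest("https://example.org/test/test1")

# A URN: names the resource without saying where it is located.
urn_parts = scheme_and_rest("urn:isbn:0451450523")

print(url_parts)   # ('https', '/test/test1')
print(urn_parts)   # ('urn', 'isbn:0451450523')
```

The parser does not need to know whether it is handling a locator or a name: in both cases it finds a scheme followed by a path.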