HTML URL Encoding with Examples
In this article, I am going to discuss HTML URL Encoding with Examples. Please read our previous article where we discussed HTML Responsive Web Design with Examples.
HTML Uniform Resource Locators
A Uniform Resource Locator (URL) is the address used to access content on the web, such as www.dotnettutorials.net. In simple words, a URL specifies the location of a resource so that users can reach it.
A URL directs visitors to a specific online resource, such as a video or a webpage. When you search for something on Google, it returns a list of URLs for resources that are relevant to your search query.
URL – Uniform Resource Locator
URL stands for Uniform Resource Locator; in other words, it is a web address. A URL can contain a domain name (e.g., dotnettutorials.net) or an Internet Protocol (IP) address (e.g., 192.168.67.52). Most users prefer domain names because words are easier to remember than numbers.
Web browsers use URLs to request documents from web servers, so a URL is effectively a reference to a document on the internet.
URL Syntax: Scheme://prefix.domain:port/path/filename
- Scheme – Specifies the type of Internet service to be provided (the most common are HTTP and HTTPS)
- Prefix – Specifies the domain prefix (www is a common convention, not a requirement)
- Domain – Specifies the name of an Internet domain (like dotnettutorials.net)
- Port – Specifies the host’s port number (the default is 80 for HTTP and 443 for HTTPS)
- Path – Specifies a path on the server.
- Filename – Specifies the name of a file or resource.
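To make the syntax concrete, here is a minimal Python sketch that splits a URL into the parts listed above using the standard library's `urllib.parse.urlsplit`. The path and filename in the sample URL are made up for illustration, and the default-port table is an assumption that covers only HTTP and HTTPS.

```python
from urllib.parse import urlsplit
import posixpath

# Well-known default ports, used only when the URL does not spell out a port
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url):
    """Split a URL into scheme, host (prefix + domain), port, path, and filename."""
    parts = urlsplit(url)
    # posixpath.split separates the directory part of the path from its last segment
    directory, filename = posixpath.split(parts.path)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # e.g. www.dotnettutorials.net
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": directory,
        "filename": filename,
    }

# Hypothetical path and filename, used only to show each component
info = describe_url("https://www.dotnettutorials.net/lesson/html-url-encoding/index.html")
print(info)
```

Because this URL does not include a port, the sketch falls back to 443, the HTTPS default.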
Common URL Schemes
The following are some common schemes:
- http: It stands for HyperText Transfer Protocol and is used for ordinary web pages. Not encrypted.
- https: It stands for HyperText Transfer Protocol Secure and is used for secure web pages. Encrypted.
- ftp: It stands for File Transfer Protocol and is used for downloading or uploading files.
- file: Used for a file on your own computer.
URL Encoding in HTML
URL encoding is the practice of translating unprintable characters, or characters with a special meaning within URLs, into a representation that is unambiguous and universally accepted by web browsers and servers. These characters include:
- ASCII control characters – Unprintable characters typically used for output control: the ranges 00-1F hex (0-31 decimal) and 7F (127 decimal).
- Non-ASCII characters – Characters beyond the 128-character ASCII set, for example the entire “top half” of the ISO-Latin set, 80-FF hex (128-255 decimal). Today such characters are normally converted to UTF-8 bytes first, and each byte is then encoded.
- Reserved characters – Special characters such as the dollar sign, ampersand, plus, comma, forward slash, colon, semicolon, equals sign, question mark, and “at” symbol. Each has a special meaning inside a URL, so it must be encoded whenever it appears as ordinary data.
- Unsafe characters – Space, quotation mark, less-than symbol, greater-than symbol, pound (number sign) character, percent character, left and right curly braces, pipe, backslash, caret, left and right square brackets, and grave accent. These characters can be misunderstood within URLs for various reasons, so they should always be encoded. (Older specifications also listed the tilde as unsafe, but RFC 3986 treats it as unreserved.)
Encoding works by replacing each restricted character with a % sign followed by two hexadecimal digits. For an ASCII character, these two digits are the character’s numeric value in the ASCII character set; a non-ASCII character is first converted to its UTF-8 bytes, and each byte gets its own %XX. For example, a backspace (ASCII 8) is not acceptable in a URL and is replaced by ‘%08’ while encoding. Similarly, an “=” sign (ASCII 61, which is 3D hex) is replaced by ‘%3D’.
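The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming UTF-8 as the byte encoding and treating only the RFC 3986 unreserved characters as safe; the function name `percent_encode` is hypothetical, and the last line compares it with the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

# Characters that never need encoding (RFC 3986 "unreserved" set)
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def percent_encode(text):
    """Replace every byte that is not an unreserved ASCII character with %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # % followed by two uppercase hexadecimal digits of the byte value
            out.append(f"%{byte:02X}")
    return "".join(out)

print(percent_encode("\b"))  # %08 (backspace)
print(percent_encode("="))   # %3D
print(percent_encode("©"))   # %C2%A9 (two UTF-8 bytes)

# quote() with safe="" (so "/" is encoded too) gives the same result
print(percent_encode("a b&c=d") == quote("a b&c=d", safe=""))  # True
```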
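Example 1: Welcome to Dot Net Tutorials
After encoding, the above text becomes: Welcome+to+Dot+Net+Tutorials
Example 2: Copyright © DotNetTutorials
After encoding, the above text becomes: Copyright+%C2%A9+DotNetTutorials
In the above examples, each blank space is encoded as “+” and the © sign is represented by %C2%A9 (its two UTF-8 bytes). Note that “+” for a space is the HTML form-encoding convention (application/x-www-form-urlencoded); in other parts of a URL, such as the path, a space is encoded as %20.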
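Both conventions can be reproduced with Python's standard library, as a rough illustration: `quote_plus` applies the form-style rule (a space becomes “+”), while `quote` uses %20. This assumes the text is encoded as UTF-8, which is the default for both functions.

```python
from urllib.parse import quote, quote_plus

texts = ["Welcome to Dot Net Tutorials", "Copyright © DotNetTutorials"]
for text in texts:
    # Form-style encoding (application/x-www-form-urlencoded): space -> "+"
    print(quote_plus(text))
    # General percent-encoding, e.g. for a path segment: space -> "%20"
    print(quote(text))
```

The first line printed for each text matches the encoded examples above; the second shows the same text as it would appear in a URL path.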
Reserved Characters
Reserved characters are the dollar sign ($), ampersand (&), plus (+), comma (,), forward slash (/), colon (:), semicolon (;), equals sign (=), question mark (?), and “at” symbol (@). Each has a special meaning inside a URL; for example, ? starts the query string and & separates query parameters. When one of these characters is part of the data itself rather than acting as a delimiter, we must encode it to avoid problems.
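The sketch below, written in Python with the standard `urllib.parse` module, shows why this matters for a query string. The parameter names `q` and `page` and the search term are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical search term that itself contains reserved characters (& and =)
params = {"q": "salt&pepper=yes", "page": "2"}

# Without encoding, the & and = inside the value are read as delimiters
naive = "q=salt&pepper=yes&page=2"
print(parse_qs(naive))  # {'q': ['salt'], 'pepper': ['yes'], 'page': ['2']}

# urlencode() percent-encodes reserved characters inside each value
encoded = urlencode(params)
print(encoded)                 # q=salt%26pepper%3Dyes&page=2
print(parse_qs(encoded)["q"])  # ['salt&pepper=yes']
```

Only the encoded version lets the server recover the original search term intact.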
Unreserved Characters
Unreserved characters are those that are allowed in a URL but do not have a reserved purpose, so they never need to be encoded. Uppercase and lowercase letters, decimal digits, hyphens (-), periods (.), underscores (_), and tildes (~) are unreserved characters.
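As a small sketch, the following Python function checks whether a string can be placed in a URL as-is, assuming the RFC 3986 unreserved set described above. The function name `needs_encoding` is hypothetical, not a standard API.

```python
import string

# RFC 3986 unreserved characters: letters, digits, and - . _ ~
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

def needs_encoding(text):
    """Return True if any character in text must be percent-encoded in a URL."""
    return any(ch not in UNRESERVED for ch in text)

print(needs_encoding("html-url-encoding"))  # False: only unreserved characters
print(needs_encoding("html url encoding"))  # True: contains spaces
print(needs_encoding("café"))               # True: é is non-ASCII
```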
URL Encoding Examples (UTF-8)
| Character | URL-encoded (UTF-8) |
|---|---|
| € | %E2%82%AC |
| £ | %C2%A3 |
| © | %C2%A9 |
| ® | %C2%AE |
| À | %C3%80 |
| Á | %C3%81 |
| Â | %C3%82 |
| Ã | %C3%83 |
| Ä | %C3%84 |
| Å | %C3%85 |
| space | %20 |
| ! | %21 |
| " | %22 |
| # | %23 |
| $ | %24 |
| % | %25 |
| & | %26 |
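The rows of this table can be regenerated with Python's `urllib.parse.quote`, which is also a quick way to look up other characters. Passing `safe=""` is a choice made here so that every character outside the unreserved set, including “/”, is encoded; the helper name `encode_char` is hypothetical.

```python
from urllib.parse import quote

def encode_char(ch):
    """Percent-encode a single character as UTF-8, leaving no character 'safe'."""
    return quote(ch, safe="")

# Characters listed in the table above
for ch in ["€", "£", "©", "®", "À", "Á", "Â", "Ã", "Ä", "Å",
           " ", "!", '"', "#", "$", "%", "&"]:
    label = "space" if ch == " " else ch
    print(f"| {label} | {encode_char(ch)} |")
```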
In the next article, I am going to discuss HTML vs XHTML with examples. Here, in this article, I tried to explain HTML URL Encoding with examples, and I hope you enjoy this HTML URL Encoding with Examples article.