Handling reserved and unsafe characters
In addition to the nonprinting characters, you’ll need to encode reserved and unsafe characters in your URLs.
Reserved characters are those characters that have a specific meaning within the URL itself. For example, many URLs use the slash character to separate elements of a pathname within the URL. If you need to include a slash in a URL that is not intended to be an element separator, you’ll need to encode it as %2F:
http://www.calculator.com/compute?3%2F4
This URL references the resource named compute on the www.calculator.com server and passes the string 3/4 to it; the question mark (?) separates the resource name from the passed value. Presumably, the resource is a server-side program that performs some arithmetic function on the passed value and returns a result.
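As a minimal sketch of this encoding step, the following Python fragment builds the calculator URL with the standard urllib.parse module. The helper name build_compute_url is hypothetical, and the calculator server is only the illustrative one from the example above. The hexadecimal digits in an encoding are case-insensitive, so %2f and %2F decode to the same slash.

```python
from urllib.parse import quote, unquote

# Illustrative resource from the text; not a real service.
BASE = "http://www.calculator.com/compute"

def build_compute_url(expression: str) -> str:
    """Hypothetical helper: append an encoded expression after the '?'."""
    # safe="" tells quote() to encode every reserved character,
    # including the slash, which it would otherwise leave alone.
    return BASE + "?" + quote(expression, safe="")

url = build_compute_url("3/4")
print(url)      # http://www.calculator.com/compute?3%2F4

# Decoding the part after the question mark recovers the original string.
decoded = unquote(url.split("?", 1)[1])
print(decoded)  # 3/4
```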
Unsafe characters are those that have no special meaning within the URL, but may have a special meaning in the context in which the URL is written. For example, the double quotation mark character (") is used to delimit URLs in many HTML tags. If you were to include a double quotation mark directly in a URL, the browser would probably treat it as the end of the URL and misread the rest of the tag. Instead, encode the double quotation mark as %22 to avoid any possible conflict.
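The short Python sketch below illustrates the quotation mark problem. The www.example.com address and the variable names are assumptions made up for the illustration. The characters : / and ? are left alone here because they serve as real delimiters in this URL, while the space and the double quotation marks are encoded.

```python
from urllib.parse import quote

# Hypothetical URL whose query contains spaces and double quotation marks.
raw_url = 'http://www.example.com/search?say "hi"'

# Keep the delimiters this URL actually relies on; encode everything else
# that is not a letter, digit, or one of _ . - ~
encoded_url = quote(raw_url, safe=":/?")
print(encoded_url)  # http://www.example.com/search?say%20%22hi%22

# The encoded URL can now sit inside a quoted HTML attribute without
# ending the attribute value early.
anchor_tag = f'<a href="{encoded_url}">search</a>'
print(anchor_tag)
```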
Other reserved and unsafe characters that should always be encoded are shown in Table 7.1.
| Character | Description | Usage | Encoding |
|---|---|---|---|
| ; | Semicolon | Reserved | %3B |
| / | Slash | Reserved | %2F |
| ? | Question mark | Reserved | %3F |
| : | Colon | Reserved | %3A |
| @ | At sign | Reserved | %40 |
| = | Equal sign | Reserved | %3D |
| & | Ampersand | Reserved | %26 |
| < | Less than sign | Unsafe | %3C |
| > | Greater than sign | Unsafe | %3E |
| " | Double quotation mark | Unsafe | %22 |
| # | Hash symbol | Unsafe | %23 |
| % | Percent | Unsafe | %25 |
| { | Left curly brace | Unsafe | %7B |
| } | Right curly brace | Unsafe | %7D |
| \| | Vertical bar | Unsafe | %7C |
| \ | Backslash | Unsafe | %5C |
| ^ | Caret | Unsafe | %5E |
| ~ | Tilde | Unsafe | %7E |
| [ | Left square bracket | Unsafe | %5B |
| ] | Right square bracket | Unsafe | %5D |
| ` | Back single quotation mark | Unsafe | %60 |
In general, you should always encode a character if there is some doubt as to whether it can be placed as-is in a URL. As a rule of thumb, any character other than a letter, a digit, or one of the punctuation characters $-_.+!*'(), (the final comma is part of the set) should be encoded.
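The rule of thumb can be sketched as a small Python function. The name encode_component is hypothetical, and the choice to encode non-ASCII text as UTF-8 bytes is an assumption that the text above does not specify. Each encoded byte is written as a percent sign followed by its two-digit hexadecimal value, which is how the encodings in Table 7.1 are formed.

```python
import string

# Characters the rule of thumb allows to appear as-is.
SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def encode_component(text: str) -> str:
    """Hypothetical helper: percent-encode everything outside SAFE."""
    pieces = []
    # Assumption: non-ASCII characters are encoded byte by byte as UTF-8.
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in SAFE:
            pieces.append(char)
        else:
            pieces.append(f"%{byte:02X}")
    return "".join(pieces)

print(encode_component("3/4"))      # 3%2F4
print(encode_component('a "b"'))    # a%20%22b%22
print(encode_component("a-b_c.d"))  # a-b_c.d
```

Use a function like this only on an individual piece of a URL, such as a single value passed to a program, never on a complete URL, because it also encodes the characters that act as delimiters.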
It is never an error to encode a character, unless that character has a specific meaning in the URL. For example, encoding the slashes in an http URL will cause them to be treated as regular characters, not as pathname delimiters, breaking the URL.
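The difference is easy to see in a short Python sketch. The path segments below are invented for the illustration. Encoding each segment separately and then joining the results with real slashes keeps the delimiters intact, whereas encoding the finished path turns every slash into %2F.

```python
from urllib.parse import quote

# Hypothetical path segments; the second contains a slash that is data,
# not a delimiter.
segments = ["reports", "3/4 results.html"]

# Encode each segment on its own, then join with real delimiters.
good_path = "/" + "/".join(quote(segment, safe="") for segment in segments)
print(good_path)  # /reports/3%2F4%20results.html

# Encoding the whole path at once also encodes the delimiters,
# so no pathname separators remain.
broken_path = quote("/" + "/".join(segments), safe="")
print(broken_path)  # %2Freports%2F3%2F4%20results.html
```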














































