
How to Handle Reserved and Unsafe Characters

Handling reserved and unsafe characters

In addition to the nonprinting characters, you’ll need to encode reserved and unsafe characters in your URLs as well.

Reserved characters are those characters that have a specific meaning within the URL itself. For example, many URLs use the slash character to separate elements of a pathname within the URL. If you need to include a slash in a URL that is not intended to be an element separator, you’ll need to encode it as %2F:

http://www.calculator.com/compute?3%2F4

This URL actually references the resource named compute on the www.calculator.com server and passes the string 3/4 to it, as delineated by the question mark (?). Presumably, the resource is actually a server-side program that performs some arithmetic function on the passed value and returns a result.
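As a minimal sketch of how such a URL might be built and taken apart again, the following uses Python's standard urllib.parse module. Python is an assumption made here for illustration only; the variable names are hypothetical, and the URL is the one from the example above.

```python
from urllib.parse import quote, unquote, urlsplit

# safe="" asks quote() to encode every reserved character, including "/".
encoded = quote("3/4", safe="")
url = "http://www.calculator.com/compute?" + encoded

# Split the URL the way a server might, then decode the query string.
parts = urlsplit(url)
print(url)                   # the full URL with the slash encoded
print(parts.path)            # the resource name
print(unquote(parts.query))  # the value passed to the resource
```

Because the slash travels as %2F, it is not mistaken for a path separator, and the receiving program can decode it back to the original string.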

Unsafe characters are those that have no special meaning within the URL, but may have a special meaning in the context in which the URL is written. For example, the double quotation mark character (") is used to delimit URLs in many HTML tags. If you were to include a double quotation mark directly in a URL, the browser would probably treat it as the end of the URL. Instead, encode the double quotation mark as %22 to avoid any possible conflict.
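The sketch below illustrates the problem under a few assumptions: the host www.example.com, the search resource, and the query text are all hypothetical, and Python's urllib.parse.quote stands in for whatever encoder you actually use.

```python
from urllib.parse import quote

# Hypothetical query text that contains double quotation marks.
query = 'say "hello"'

# Encode the query so the quotation marks become %22 (and the space %20).
href = "http://www.example.com/search?" + quote(query, safe="")

# The only quotation marks left in the tag are the two attribute delimiters.
anchor = '<a href="' + href + '">search</a>'
print(anchor)
```

Had the quotation marks been left unencoded, the first one inside the URL would have closed the href attribute early.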

Other reserved and unsafe characters that should always be encoded are shown in Table 7.1.

Table 7.1: Reserved and unsafe characters and their URL encodings
Character  Description                 Usage     Encoding
;          Semicolon                   Reserved  %3B
/          Slash                       Reserved  %2F
?          Question mark               Reserved  %3F
:          Colon                       Reserved  %3A
@          At sign                     Reserved  %40
=          Equal sign                  Reserved  %3D
&          Ampersand                   Reserved  %26
<          Less than sign              Unsafe    %3C
>          Greater than sign           Unsafe    %3E
"          Double quotation mark       Unsafe    %22
#          Hash symbol                 Unsafe    %23
%          Percent                     Unsafe    %25
{          Left curly brace            Unsafe    %7B
}          Right curly brace           Unsafe    %7D
|          Vertical bar                Unsafe    %7C
\          Backslash                   Unsafe    %5C
^          Caret                       Unsafe    %5E
~          Tilde                       Unsafe    %7E
[          Left square bracket         Unsafe    %5B
]          Right square bracket        Unsafe    %5D
`          Back single quotation mark  Unsafe    %60
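Each encoding in the table is simply a percent sign followed by the character's ASCII code as two hexadecimal digits. A small sketch, with the hypothetical helper name percent_encode, shows how the Encoding column can be derived:

```python
def percent_encode(ch: str) -> str:
    """Return '%' plus the two-digit uppercase hex ASCII code of ch."""
    return "%{:02X}".format(ord(ch))

# The characters from Table 7.1, in the same order.
reserved = ';/?:@=&'
unsafe = '<>"#%{}|\\^~[]`'

for ch in reserved + unsafe:
    print(ch, percent_encode(ch))
```

The helper only handles single ASCII characters, which is all the table requires.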

In general, you should encode a character whenever there is any doubt as to whether it can be placed as-is in a URL. As a rule of thumb, encode any character other than a letter, a digit, or one of the characters $-_.+!*'(), (the trailing comma is part of that set).
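The rule of thumb can be sketched as a short function. The name encode_by_rule_of_thumb is hypothetical, and treating the input as UTF-8 bytes is an assumption made here so that non-ASCII text has a defined encoding; the text above does not specify one.

```python
import string

# Letters, digits, and the special characters named in the rule of thumb.
ALLOWED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def encode_by_rule_of_thumb(text: str) -> str:
    """Percent-encode every byte that is not in the ALLOWED set."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 input
        ch = chr(byte)
        if ch in ALLOWED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)

print(encode_by_rule_of_thumb("a b/c~d"))
```

Note that this encodes reserved characters such as the slash too, so it suits individual values rather than a complete URL. Library encoders may draw the line differently; Python's own urllib.parse.quote, for instance, leaves the tilde unencoded.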

It is never an error to encode a character, unless that character is serving its special purpose in the URL. For example, encoding the slashes in an HTTP URL causes them to be treated as regular characters rather than as pathname delimiters, which breaks the URL.
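To see the difference, the sketch below encodes a hypothetical path twice with Python's urllib.parse.quote: once encoding everything, and once leaving the slash alone so it keeps its role as a delimiter.

```python
from urllib.parse import quote

# Hypothetical path; the space needs encoding, the slashes do not.
path = "/docs/my file.html"

# Encoding everything turns the delimiters into ordinary characters.
broken = quote(path, safe="")
# Keeping "/" safe preserves the path structure.
kept = quote(path, safe="/")

print(broken, broken.split("/"))
print(kept, kept.split("/"))
```

In the first result the path collapses into a single segment; in the second it still separates into its directory and file name components.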
