Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. While this is known as URL encoding it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the "application/x-www-form-urlencoded" media type, as is often used in email messages and the submission of HTML form data in HTTP requests.
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
| |
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
| |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
| - | _ | . | ~ | |||||||||||||
! | * | ' | ( | ) | ; | : | @ | & | = | + | sign >$" target="_blank" >* | , | / | ? | % | # | |
|
No other characters are allowed in a URI.
%"), are then used in the URI in place of the reserved character.
For example, the reserved character "/", if used in the "path" component of a URI, has the special meaning of being a delimiter between path segments. If, according to a given URI scheme, "/" needs to be in a path segment, then the three characters "%2F" or "%2f" must be used in the segment instead of a raw "/".
Reserved characters that have no reserved purpose in a particular context may also be percent-encoded, but are not semantically different from those that are not percent-encoded.
For example, in the "query" component of a URI, "/" is still considered a reserved character, but it normally has no reserved purpose, unless a particular URI scheme says otherwise. When it has no reserved purpose, the character does not need to be percent-encoded.
URIs that differ only by the percent-encoding of reserved characters are not normally considered equivalent (denoting the same resource), unless it can be determined that the reserved characters in question have no reserved purpose. This determination is dependent upon the rules established for reserved characters by individual URI schemes.
URIs that differ only by the percent-encoding of unreserved characters are equivalent by definition, but URI processors, in practice, may not always recognize this equivalence. For example, URI consumers shouldn't treat "%41" differently than "A" or "%7E" differently than "~", but some do. For maximum interoperability, URI producers are discouraged from percent-encoding unreserved characters.
%FF", but byte value 41 (hexadecimal) should be represented by "A", not "%41".
For example, many URI schemes and protocols based on RFCs 1738 and 2396 presume that the data characters will be converted to bytes according to some unspecified character encoding before being represented in a URI by unreserved characters or percent-encoded bytes. If the scheme does not allow the URI to provide a hint as to what encoding was used, or if the encoding conflicts with the use of ASCII to percent-encode reserved and unreserved characters, then the URI cannot be reliably interpreted. Some schemes fail to account for encoding at all, and instead just suggest that data characters map directly to URI characters, which leaves it up to implementations to decide whether and how to percent-encode data characters that are in neither the reserved nor unreserved sets.
Not addressed by the current specification is what to do with encoded character data. For example, in computers, character data manifests in encoded form, at some level, and thus could be treated as either binary data or as character data when being mapped to URI characters. Presumably, it is up to the URI scheme specifications to account for this possibility and require one or the other, but in practice, few, if any, actually do.
%uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits. This behavior is not specified by any RFC and has been rejected by the W3C.
+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications. In addition, the CGI specification contains rules for how web servers decode data of this type and make it available to applications.
When sent in an HTTP GET request, application/x-www-form-urlencoded data is included in the query component of the request URI. When sent in an HTTP POST request or via email, the data is placed in the body of the message, and the name of the media type is included in the message's Content-Type header.
Other resources:
URI scheme | Internet standards | Binary to text encoding formats
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Percent-encoding".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world