ISO 2022, more formally ISO/IEC 2022, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying a technique for including multiple character sets in a single character encoding.
ISO 2022 was developed to address two problems: representing characters from multiple character sets within a single character encoding, and representing large character sets.
Being based on ISO 646, ISO 2022 exhibits many of ISO 646's properties. For example, the most significant bit of each byte does not carry any meaning; this allows ISO 2022 (like ISO 646) to be easily transmitted through 7-bit communication channels. (This 7-bit property also forms the basis of the EUC code.)
To represent multiple character sets, the ISO 2022 character encodings include escape sequences which indicate the character set for characters which follow. The escape sequences are registered with ISO and are often three characters long starting with the ASCII ESCAPE character (hexadecimal 1B, octal 33). These character encodings require data to be processed sequentially in a forward direction since the correct interpretation of the data depends on the most recently encountered escape sequence.
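The forward-only, stateful decoding described above can be illustrated with a minimal sketch. It assumes the three-byte escape sequences that RFC 1468 lists for ISO-2022-JP, which is only a small subset of the ISO registry; the names `DESIGNATIONS` and `split_segments` are hypothetical and chosen for this illustration.

```python
ESC = 0x1B  # ASCII ESCAPE (hexadecimal 1B, octal 33)

# Assumption: only the four designations used by ISO-2022-JP (RFC 1468).
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_segments(data: bytes):
    """Scan the stream forward and return (charset, payload) pairs.

    The charset attached to each payload is whatever the most recently
    seen escape sequence designated, which is why the data cannot be
    interpreted from an arbitrary position.
    """
    charset = "ASCII"  # ISO-2022-JP text starts in ASCII
    segments = []
    start = 0
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unrecognised escape sequence: {seq!r}")
            if i > start:
                segments.append((charset, data[start:i]))
            charset = DESIGNATIONS[seq]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments


# Hand-built sample: ASCII, then a two-byte run, then ASCII again.
sample = b"abc\x1b$BF|K\\8l\x1b(Bxyz"
for name, payload in split_segments(sample):
    print(name, payload)
```

Every byte in the sample is below hexadecimal 80, so the stream stays within seven bits. The same payload bytes would be read as ordinary ASCII punctuation and letters if the preceding escape sequence were lost.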
To represent large character sets, ISO 2022 builds on ISO 646's property that 1 byte can define 94 graphic (printable) characters (in addition to space and 33 control characters). Using two bytes, it is thus possible to represent up to 8836 (94×94) characters; and, using three bytes, up to 830584 (94×94×94) characters. For the two-byte character sets, the code point of each character is normally specified in so-called kuten form (sometimes called quwei, especially when dealing with GB2312 and related standards), which specifies a zone (ku or qu), and the point (ten) or position (wei) of that character within the zone.
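The arithmetic above can be checked with a short sketch. It assumes the usual convention that the 94 graphic positions occupy bytes hexadecimal 21 through 7E, so that zone and point are each obtained by subtracting hexadecimal 20 from the corresponding byte; the function names are hypothetical.

```python
FIRST_GRAPHIC = 0x21  # first graphic position after space (0x20)
LAST_GRAPHIC = 0x7E   # last graphic position before DELETE (0x7F)
GRAPHIC_COUNT = LAST_GRAPHIC - FIRST_GRAPHIC + 1  # 94


def bytes_to_kuten(b1: int, b2: int) -> tuple:
    """Convert a two-byte code to (ku, ten), each in the range 1..94."""
    for b in (b1, b2):
        if not FIRST_GRAPHIC <= b <= LAST_GRAPHIC:
            raise ValueError(f"byte {b:#04x} is not a graphic position")
    return b1 - 0x20, b2 - 0x20


def kuten_to_bytes(ku: int, ten: int) -> bytes:
    """Inverse of bytes_to_kuten."""
    if not (1 <= ku <= GRAPHIC_COUNT and 1 <= ten <= GRAPHIC_COUNT):
        raise ValueError("ku and ten must each be between 1 and 94")
    return bytes((ku + 0x20, ten + 0x20))


print(GRAPHIC_COUNT ** 2)            # 8836
print(GRAPHIC_COUNT ** 3)            # 830584
print(bytes_to_kuten(0x30, 0x21))    # (16, 1)
print(kuten_to_bytes(16, 1).hex())   # 3021
```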
The escape sequences therefore not only declare which character set is in use but also, through the known properties of that set, tell the decoder whether it is dealing with a 94-, 8836-, or 830584-character (or some other sized) encoding.
In practice, the escape sequences declaring the national character sets may be absent if context or convention dictates that a certain national character set is to be used. For example, RFC 1922, which defines ISO-2022-CN, allows the ASCII shift characters (SHIFT OUT and SHIFT IN) to switch between character sets in place of repeated escape sequences.
Although ISO 2022 character encodings, particularly ISO-2022-JP, are still in common use, most modern e-mail applications are moving to the simpler Unicode character encodings such as UTF-8.
This article is licensed under the GNU Free Documentation License.
It uses material from the article "ISO/IEC 2022".