Unicode contains a certain amount of duplication because it aims to allow legacy encodings to be converted to Unicode without losing any information. Sometimes these duplicate characters are rendered identically; at other times they are rendered with different sizes or styles, at least in fonts intended to match the expectations of legacy systems.
CJK fullwidth forms
In traditional CJK encodings, characters usually took either a single byte (known as halfwidth) or two bytes (known as fullwidth). Characters that took a single byte were generally displayed at half the width of those that took two bytes. Some characters, such as those of the Latin alphabet, were available in both halfwidth and fullwidth versions. Because the halfwidth versions were more commonly used, they were generally the ones mapped to the standard code points for those characters. A separate section was therefore needed for the fullwidth forms to preserve the distinction.
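As a minimal illustration of the distinction described above, the following Python sketch folds fullwidth Latin letters back to their standard code points. It assumes the fullwidth ASCII variants occupy U+FF01 to U+FF5E at a fixed offset of 0xFEE0 from their ASCII counterparts; the helper name `to_halfwidth` is hypothetical and chosen only for this example. The standard library's NFKC normalization is shown alongside as an alternative way to obtain the same folding.

```python
import unicodedata


def to_halfwidth(text: str) -> str:
    """Fold fullwidth ASCII variants (U+FF01..U+FF5E) to plain ASCII.

    Hypothetical helper for illustration; other characters pass through.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            # Fullwidth forms sit at a constant offset from ASCII.
            out.append(chr(cp - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


fullwidth = "\uFF21\uFF22\uFF23"  # fullwidth A, B, C

# East Asian Width property: "F" marks a fullwidth character.
print(unicodedata.east_asian_width(fullwidth[0]))

# Both approaches map the fullwidth letters to the standard code points.
print(to_halfwidth(fullwidth))
print(unicodedata.normalize("NFKC", fullwidth))
```

Note that NFKC also folds many other compatibility characters, so the explicit offset is the narrower choice when only the fullwidth forms should be affected.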
Greek
Many Greek letters are used as technical symbols. All of the Greek letters are encoded in the Greek section of Unicode, but many are encoded a second time under the name of the technical symbol they represent. Of these, the micro sign is in the Latin-1 range and most of the rest are in the Letterlike Symbols range. The micro sign (U+00B5, µ) is clearly inherited from ISO 8859-1, but the origin of the others is less clear.
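The duplication described above can be observed with Python's `unicodedata` module. This is only a sketch: the ohm sign (U+2126) is used here as one example of a Letterlike Symbols duplicate and is not named in the text above. The micro sign is related to the Greek letter mu by a compatibility mapping, so it is folded only by the NFKC form, while the ohm sign is canonically equivalent to capital omega and is already replaced under NFC.

```python
import unicodedata

MICRO_SIGN = "\u00B5"   # Latin-1 range
GREEK_MU = "\u03BC"     # Greek block
OHM_SIGN = "\u2126"     # Letterlike Symbols (example chosen for this sketch)
GREEK_OMEGA = "\u03A9"  # Greek block

# Show the distinct code points and character names of each pair.
for ch in (MICRO_SIGN, GREEK_MU, OHM_SIGN, GREEK_OMEGA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility normalization maps the micro sign onto the Greek letter.
micro_folded = unicodedata.normalize("NFKC", MICRO_SIGN)

# Canonical normalization is enough to replace the ohm sign.
ohm_folded = unicodedata.normalize("NFC", OHM_SIGN)

print(micro_folded == GREEK_MU, ohm_folded == GREEK_OMEGA)
```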
Roman numerals
Unicode has a number of characters specifically designated as Roman numerals, as part of the Number Forms range, from U+2160 to U+2183. For example, MCMLXXXVIII could alternatively be written as ⅯⅭⅯⅬⅩⅩⅩⅧ. This range includes both upper- and lowercase numerals, as well as precombined glyphs for numbers up to 12 (Ⅻ or XII), mainly intended for clock faces, for compatibility with non–West-European encodings. The precombined glyphs should only be used to represent the individual numbers where the use of individual glyphs is not wanted, and not to replace compounded numbers. Similarly, precombined glyphs for 5000 and 10000 exist.
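The relationship between the dedicated Roman numeral characters and ordinary Latin letters can be checked with a short Python sketch. It assumes only the standard `unicodedata` module; the variable names are illustrative. NFKC normalization replaces each Roman numeral character that has a compatibility mapping with the equivalent Latin letters, and `unicodedata.numeric` reports the numeric value assigned to a single character.

```python
import unicodedata

# MCMLXXXVIII written with dedicated Roman numeral characters.
roman = "\u216F\u216D\u216F\u216C\u2169\u2169\u2169\u2167"

# NFKC folds each numeral character to ordinary Latin letters.
latin_form = unicodedata.normalize("NFKC", roman)
print(latin_form)

# The precombined glyph for twelve expands to three Latin letters
# and carries the numeric value 12.
twelve = "\u216B"
print(unicodedata.normalize("NFKC", twelve))
print(unicodedata.numeric(twelve))

# The precombined glyphs for 5000 and 10000 also carry numeric values.
five_thousand = "\u2181"
ten_thousand = "\u2182"
print(unicodedata.numeric(five_thousand), unicodedata.numeric(ten_thousand))
```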
Unicode