Character Encoding: Which Schemes Encode to Which Sizes?


Character mapping and encoding is a formidable subject, and the various schemes and standards are easy to confuse. For reference, the list below gives the main encodings and the size of a single encoded character in each.

  • ASCII -> 7 bits

  • “Extended ASCII” -> 8 bits

  • UTF-7 -> 7-bit code units, variable length (non-ASCII characters take several bytes)

  • IBM (OEM) code pages -> 8 bits

  • ANSI (Microsoft Windows) code pages -> 8 bits (the East Asian code pages are double-byte)

  • ISO 8859 -> 8 bits

  • UTF-8 -> 1-4 bytes

  • UTF-16 -> 2 or 4 bytes

  • UTF-32 -> 4 bytes

  • UCS-2 -> 2 bytes (obsolete)

  • UCS-4 -> 4 bytes (equivalent to UTF-32)
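
The sizes in the list can be checked directly. The following is a minimal Python sketch, not a definitive tool: the function name encoded_sizes and the choice of sample characters are illustrative assumptions, and cp437 and cp1252 stand in for the OEM and ANSI code page families. It encodes one character at a time and reports the byte count, or None where the encoding cannot represent the character.

```python
# Codec names as Python's standard library spells them.
# "utf-16-le" and "utf-32-le" are used so that no byte order mark is counted.
ENCODINGS = ("ascii", "latin-1", "cp437", "cp1252",
             "utf-7", "utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch):
    """Return {encoding name: byte count} for a single character.

    The value is None when the encoding has no representation for `ch`.
    """
    sizes = {}
    for name in ENCODINGS:
        try:
            sizes[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


if __name__ == "__main__":
    samples = (
        "A",           # U+0041, plain ASCII
        "\u00e9",      # U+00E9, e with acute accent
        "\u20ac",      # U+20AC, euro sign
        "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
    )
    for ch in samples:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Running it shows the pattern from the list: the ASCII letter is one byte in every 8-bit scheme and in UTF-8, the emoji needs four bytes in UTF-8, UTF-16 and UTF-32 alike, and the single-byte code pages return None for anything outside their 256-entry tables.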
