Tuesday, November 3, 2015

EBCDIC - Few details...

From a very early age, most of us are taught about ASCII, and how this is used by computers to convert single byte numbers to the characters we see on our screen. So an 'a' is really 97 as far as the computer is concerned. So imagine my surprise when I found out that mainframes don't use ASCII, but EBCDIC.

I remember my reaction: "You've got to be kidding! Didn't EBCDIC die out years ago?"

Nope. And it's not just z/OS that uses it.
       IBM i,
       Fujitsu BS2000/OSD,
       Unisys MCP,
       z/VSE,
       z/VM and
       z/TPF all happily continue to use EBCDIC today.

To them an 'a' is really 129, not 97.
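
If you have Python handy, you can check those two numbers yourself. This is only a quick sketch: it assumes Python's built-in `cp037` codec as a stand-in for "EBCDIC" (one particular EBCDIC table, which is my choice for the example).

```python
# Compare the numeric value of 'a' in ASCII and in EBCDIC.
# Assumption: 'cp037' (EBCDIC code page 37) stands in for "EBCDIC" here.
ascii_a = "a".encode("ascii")[0]
ebcdic_a = "a".encode("cp037")[0]

print(ascii_a)   # 97
print(ebcdic_a)  # 129
```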

This all worked out fine for many years. In fact EBCDIC was the most popular encoding system in the world until the Personal Computer revolution brought ASCII to the limelight.

But EBCDIC falls down when we need to display languages other than English. Words like "på" (Swedish) and "brève" (French) need special characters not necessarily available in the standard EBCDIC table. 

What's worse, there's no way that all these special characters for all the languages in the world are going to fit into the 256 values that an eight-bit number can hold. To get around this, IBM created code pages.

EBCDIC Code Pages

Today there's no such thing as a single EBCDIC code table. You can find a few websites that claim to convert from ASCII to EBCDIC. But the chances are that they're really converting from ASCII to EBCDIC code page 37, or EBCDIC 0037.

EBCDIC 0037 is the default code page used by the United States and other English speaking countries when working with MVS: the traditional side of z/OS. It has all the normal a-z, A-Z, 0-9 characters, and other symbols like +, () and *. It also includes a few of the foreign characters for when we've borrowed foreign words like "resumé".
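
As a rough illustration of what such a conversion looks like, here is a small Python sketch. It assumes Python's built-in `cp037` codec matches EBCDIC 0037 closely enough for the example; the variable names are just mine.

```python
# Convert a short string to EBCDIC code page 37 and back again.
# Assumption: Python's 'cp037' codec is used for EBCDIC 0037.
text = "resumé"

ebcdic = text.encode("cp037")        # one byte per character
print(ebcdic.hex(" "))               # the raw EBCDIC byte values, in hex

round_trip = ebcdic.decode("cp037")  # back to a normal Python string
print(round_trip == text)            # True
```

Note that the accented "é" survives the trip, because code page 37 has a slot for it.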

However, if you're in France, the chances are that you'll be using EBCDIC 0297. In EBCDIC 0297, the standard a-z, A-Z and 0-9 characters are the same as in EBCDIC 0037. But to support French words, some of the other byte values map to different characters.

For example, the byte value 177 is
       - a pound sign (£) in EBCDIC 0037, and
       - a hash sign (#)  in EBCDIC 0297.
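
You can see this kind of difference from Python too. A caveat: as far as I know, Python's standard library does not ship a codec for code page 297, so this sketch compares code page 37 with code page 500 (another EBCDIC code page that Python does include) purely as a stand-in. The idea is the same: letters and digits agree, a handful of other values don't.

```python
import string

# Byte value 177 decoded with code page 37.
pound = bytes([177]).decode("cp037")
print(pound)  # £

# Assumption: cp500 is used only because Python has no built-in cp297 codec.
# Letters and digits sit at the same positions in both code pages...
letters_match = all(
    c.encode("cp037") == c.encode("cp500")
    for c in string.ascii_letters + string.digits
)
print(letters_match)  # True

# ...but some other byte values decode to different characters.
differences = [
    (b, bytes([b]).decode("cp037"), bytes([b]).decode("cp500"))
    for b in range(256)
    if bytes([b]).decode("cp037") != bytes([b]).decode("cp500")
]
for value, in_037, in_500 in differences:
    print(value, repr(in_037), repr(in_500))
```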

There are many different code pages for all the different regions, from Spain and Iceland to Thailand and Japan.

Compare with ASCII

This is not very different from ASCII, which has grown from the original 7-bit code into ISO8859 with its various numbered parts.

For example
  - ISO8859-1 is the standard 'Extended ASCII' that we all love,
  - ISO8859-2 is better for Eastern Europe, and
  - ISO8859-4 for countries like Latvia, Lithuania, Estonia and Greenland.
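
The same "one byte, several meanings" effect is easy to show on the ASCII side. A minimal sketch, using Python's built-in ISO8859 codecs and byte value 163 (picked only because it happens to differ between these three):

```python
# Decode the same single byte with three ISO8859 variants.
raw = bytes([163])

decoded = {
    name: raw.decode(name)
    for name in ("iso8859_1", "iso8859_2", "iso8859_4")
}

for name, char in decoded.items():
    print(name, char)
```

With ISO8859-1 this prints a pound sign (£), while the other two print different letters for the very same byte.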

IBM controls these EBCDIC code pages, and assigns an ID to them called the Coded Character Set Identifier (CCSID).
 - The CCSID for EBCDIC 0037 is, you guessed it, 37.
 - IBM has also set CCSIDs for other character sets: CCSID 1208 is Unicode (UTF-8).
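
To make the CCSID idea a bit more concrete, here is a hedged sketch of converting data from one CCSID to another in Python. The `CCSID_TO_CODEC` table and the `convert` function are hypothetical helpers I made up for this example (Python itself knows nothing about CCSIDs), and only the two CCSIDs mentioned above are mapped.

```python
# Hypothetical lookup table: IBM CCSID -> Python codec name.
# Only the two CCSIDs mentioned in this post are included.
CCSID_TO_CODEC = {
    37: "cp037",    # EBCDIC 0037
    1208: "utf-8",  # Unicode (UTF-8)
}


def convert(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Re-encode bytes from one CCSID to another (illustrative helper)."""
    text = data.decode(CCSID_TO_CODEC[from_ccsid])
    return text.encode(CCSID_TO_CODEC[to_ccsid])


ebcdic = "på".encode("cp037")     # pretend this came from a mainframe
utf8 = convert(ebcdic, 37, 1208)  # CCSID 37 -> CCSID 1208
print(utf8.decode("utf-8"))       # på
```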
