Characters
- Written language is an amazing invention.
- Unlike spoken language, which evolved over eons in Homo sapiens, the
concept of a written form for language has been invented independently
a number of times.
- The earliest of these inventions quickly developed into Sumerian cuneiform
around 3000 BC.
- Wikipedia defines a character as a grapheme. A grapheme designates the
atomic unit in written language. Graphemes include letters, Chinese
ideograms, numerals, punctuation marks, and other symbols.
- Every language has a corresponding finite set of characters, whether it
uses a phonetic system (like English) or one based on ideograms (like Chinese).
- A language's common character set tends to be pretty stable. When was
the last time that English added a new character?
- Families of languages (e.g., Romance, Arabic) tend to converge to a
common set of characters for many practical reasons.
- So, it's natural to represent a character as an integer, which identifies
the character in the character set.
- The important computer character sets you should be familiar with are ASCII
and Unicode.
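- ASCII, the American Standard Code for Information Interchange, has 128
characters (codes 0 to 127) designed to encode the unaccented Roman alphabet
used in English, along with digits, punctuation, and control codes. It does
not cover the accented letters needed by most other Western European languages.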
- Unicode, closely aligned with the Universal Character Set (ISO/IEC 10646),
is an international standard designed to handle all known languages and is
now widely used on the web.
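
The idea of representing a character as an integer can be tried out directly. The following is a minimal sketch in Python, a language chosen here only for illustration since these notes do not name one. The helper names `code_points` and `is_ascii_char` are hypothetical; `ord` and `chr` are Python built-ins that convert between a character and its integer code.

```python
def code_points(text):
    """Return the integer code of each character in text."""
    return [ord(ch) for ch in text]


def is_ascii_char(ch):
    """Return True if the character is one of ASCII's 128 codes (0 to 127)."""
    return ord(ch) < 128


print(code_points("Hi!"))        # [72, 105, 33]
print(chr(65))                   # A
print(is_ascii_char("A"))        # True
# "\u00e9" is e with an acute accent: a Unicode character outside ASCII.
print(is_ascii_char("\u00e9"))   # False
# "\u4e2d" is a Chinese ideogram; its integer code is far above 127.
print(ord("\u4e2d"))             # 20013
```

Under these assumptions, an ASCII character and a Chinese ideogram are handled the same way: each is just an index into a character set, and only the size of the set differs.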