Characters
- Written language is an amazing invention.
- Unlike spoken language, which evolved over eons in Homo sapiens, the
concept of a written form for language has been invented independently
a number of times.
- The earliest of these inventions quickly developed into Sumerian
cuneiform around 3000 BC.
- Wikipedia defines a character as a grapheme. A grapheme
designates the atomic unit in written language. Graphemes include
letters, Chinese ideograms, numerals, punctuation marks, and other
symbols.
- Every written language has a corresponding finite set of characters,
whether it uses a phonetic system (like English) or one based on
ideograms (like Chinese).
- A language's common character set tends to be pretty stable.
When was the last time that English added a new character?
- Families of languages (e.g., Romance, Arabic) tend to converge
to a common set of characters for many practical reasons.
- So, it's natural to represent a character as an integer, which
identifies the character in the character set.
- The important computer character sets you should be familiar with are
ASCII and Unicode.
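- ASCII, the American
Standard Code for Information Interchange, has 128 characters
(codes 0 through 127) designed to encode the unaccented Roman alphabet
used in English, along with digits, punctuation, and control codes.
Accented letters used in other Western European languages are not part
of ASCII.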
- Unicode, whose character repertoire is kept in step with the
Universal Character Set (ISO/IEC 10646), is an international standard
designed to handle all known written languages and is now widely used
on the web.
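- As a rough illustration of representing a character as an integer,
here is a minimal Python sketch. It relies on the built-in ord() and
chr() functions, which convert between a one-character string and its
Unicode code point. The helper names code_of and is_ascii are made up
for this example and are not part of any standard.

```python
# Hypothetical helpers for illustration only.

def code_of(ch: str) -> int:
    """Return the integer (Unicode code point) that identifies ch."""
    return ord(ch)


def is_ascii(ch: str) -> bool:
    """True if ch falls in the 128-character ASCII range (0 through 127)."""
    return ord(ch) < 128


# "\u00e9" is e with an acute accent; "\u4e2d" is a Chinese ideogram.
samples = ["A", "a", "7", "\u00e9", "\u4e2d"]

for ch in samples:
    # Print the character, its integer code, and whether ASCII covers it.
    print(ch, code_of(ch), is_ascii(ch))

# chr() goes the other way: from integer back to character.
letter = chr(65)
```

The first three samples have codes below 128, so ASCII can represent
them. The last two need a larger character set such as Unicode.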