Transliteration Table README
----------------------------

1. Rationale: Identifiers within the database or programming languages
only accept latin-1 characters, numbers and the '_' character.

Application developers can enter captions (titles) to give
objects or variables a meaningful name using the full Unicode set.

Transliteration is used to convert Unicode captions to identifiers
without losing the meaning of the names.

More info:
http://en.wikipedia.org/wiki/Transliteration
http://en.wikipedia.org/wiki/Romanization

2. We use a special kind of romanization, as we only allow the characters
described in 1.

3. Implementation: the transliteration table, generated by the
generate_transliteration_table.sh shell script, is used
to transliterate any Unicode character (having code < 65535)
to an identifier. Because the table is indexed by character code,
converting a single character takes constant time.

The resulting generated code is kept in the transliteration_table.{h|cpp} files,
included by identifier.cpp for use in public utility functions.

For each item, the table (basically a table of C strings) contains:
- a NULL string if the resulting conversion has to be the "_" string;
- a C string of size 1 or more containing a valid transliteration
  as described in 1;
- an empty string "" if the transliteration should return an empty string
  (can be useful e.g. for soft signs in Cyrillic).

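The three kinds of table entries can be illustrated with a minimal sketch.
It is not the generated code: the names kDemoFirstCode, kDemoTable and
transliterateChar are hypothetical, the excerpt covers only five Cyrillic
code points, and its mappings are chosen for illustration only. The real
table has one entry per code point below 65535, so no base offset is needed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical excerpt covering U+0448..U+044C only (assumption, for
// illustration). The mappings are examples, not the generated table's values.
static const unsigned int kDemoFirstCode = 0x0448;
static const char *const kDemoTable[] = {
    "sh",    // U+0448: C string of size 1 or more
    nullptr, // U+0449: left as NULL here to show the "_" case
    "",      // U+044A (hard sign): empty string, character is dropped
    "y",     // U+044B: single-character transliteration
    "",      // U+044C (soft sign): empty string, character is dropped
};
static const std::size_t kDemoTableSize =
    sizeof(kDemoTable) / sizeof(kDemoTable[0]);

// Looks up one character by its code: a single array access, hence
// constant time per character.
std::string transliterateChar(unsigned int code)
{
    if (code < kDemoFirstCode || code >= kDemoFirstCode + kDemoTableSize)
        return "_"; // outside this demo excerpt
    const char *entry = kDemoTable[code - kDemoFirstCode];
    if (!entry)
        return "_"; // NULL entry: the conversion result is "_"
    return entry;   // may be "" (dropped) or a valid transliteration
}
```

A caller would append the result for each character of a caption to build
the identifier, so an empty result simply contributes nothing.
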
4. Fixes: Because the iconv/recode tools do not fully implement
transliteration to latin-1 (e.g. there is no good support
for Greek and Cyrillic/Serbian characters),
the transliteration_table.cpp file is patched with
transliteration_table.cpp.patch, which provides fixes written by hand.

If you find invalid or missing transliterations:
a) edit transliteration_table.cpp (using a UTF-8-compliant text editor!)
   - if the transliteration_table.cpp file does not exist,
     extract it from the transliteration_table.bz2 archive
b) run the update_transliteration_table_patch.sh shell script,
   which will update the transliteration_table.cpp.patch file
c) send the transliteration_table.cpp.patch file to the Kexi team

5. Credits
Jaroslaw Staniek <js at iidea.pl>
Michael Drueing <michael at drueing.de>
Chusslove Illich <caslav.ilic at gmx.net>
Michal Svec <rebel at atrey.karlin.mff.cuni.cz>