|
Using Khmer script (and
many other Indic scripts) on computers is more complicated than
using languages written with standard Latin encoding (such as Spanish
or French): fonts have to be interpreted, characters have to be put in
the right place (reordered) and many exceptions have to be handled. Special
software is needed to handle Khmer and other Indic scripts.
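
As a small illustration of the reordering mentioned above, the following Python sketch lists the Unicode characters of the word "Khmer" in the order in which they are stored. The helper name `describe` is only illustrative. The vowel sign AE is stored fourth, but a rendering engine has to draw it to the left of the consonant cluster, and the COENG sign tells the engine to draw the following consonant as a subscript.

```python
import unicodedata

# The word "Khmer" in Khmer script, in stored (logical) order.
word = "\u1781\u17d2\u1798\u17c2\u179a"


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code, name in describe(word):
    print(code, name)
```

The stored order follows pronunciation, not the left-to-right position of the shapes on screen, which is why a plain Latin-style renderer is not enough.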
There are several high
quality user interfaces (UI) already developed in OpenSource
Software (these interfaces are the equivalent of the Windows desktop
in a Microsoft environment, the program that allows users to access
the different applications). Among these, some seem
to be well adapted to the goals of this project, as they already
support many complex Indic scripts (such as those used for Thai, Hindi
or Kannada, which are not very different from Khmer in complexity) and
are being translated into many languages, so the mechanics of translation are
very well developed. Using one of these interfaces that can easily
integrate the capability of using Khmer script seems to be a good
technical solution that would permit a low-risk adaptation of an
interface that will be maintained and improved by the worldwide
computing community.
Besides the user
interface, support for Khmer Script also needs to be developed
independently for office applications, Internet applications and
some utilities. As with user interfaces, the state of
internationalization of office and Internet applications is already
very high, including support for some Indic languages, which
simplifies the work.
The last three years
have seen important advances in the handling of Indic language
scripts by OpenSource Software.
Unfortunately, as Khmer
was not yet standardized in Unicode (and no fonts were available),
Khmer was not included in these developments, which embraced Indian
languages such as Hindi, Kannada or Tamil, as well as other languages that
are written using Indic scripts, such as Thai.
Khmer is now handled
almost correctly by fully up-to-date versions of Windows XP
using the latest version of Uniscribe (the Microsoft rendering engine,
Usp10.dll, not included in the XP standard distribution). It may also work correctly in Windows 2000. For more
information, see
our page on this
subject. MS
Word still has some problems but, reportedly, MS Publisher handles
Khmer very well. Other MS applications, such as PowerPoint, still have
significant problems with Khmer. Many Win32 (Microsoft Windows) versions of
OpenSource software get language support from Uniscribe. Mozilla
and OpenOffice handle Khmer correctly under Windows 2000 and XP.
Several projects aim
to enable OpenSource Software to handle all the languages of the
world (I18n
or Internationalization projects) under Linux and to handle local
date, currency and other formats (Localization or L10n). Many of
these projects are supported by major computer manufacturers
such as IBM or Sun Microsystems.
The most global of these
projects may be the
OpenI18N group that “aims
to provide a common open-source environment where applications can
be executed and behave correctly worldwide, with different scripts,
cultures and languages.”
Here are some specific implementations that are required for the
KhmerOS initiative:
The
ICU project (managed by IBM) has included
support for many Indic languages. ICU gives script layout support
to the
OpenOffice suite. No work has been done
yet to implement Khmer in ICU, but the work done in Pango opens the
way for the implementation (the two are very similar). See
our page on ICU.
A modified form of the
ICU libraries
is used to give support to
Pango, a rendering infrastructure that is
used in high level interfaces such as
Gnome (see our
page on Gnome and Pango) and is partially used in the
Mozilla browser. The problem with using
Pango is that the printing modules for Gnome still do not use Pango,
so being able to display Khmer on screen does not imply being able to
print it (this is the case for now, but the Gnome-print maintainer is
working to integrate Pango and Gnome-print).
Gnome
(together with other Pango based applications) has been
preliminarily chosen as the user interface for this project. Once
Khmer is implemented in it, Pango also gives support to quite a number of
applications in the Gnome environment, including the Evolution
e-mail/agenda tool, the Gimp graphic editor and some multimedia
utilities.
When
time for implementation comes, it will be very important to make
sure that the user interface used allows correct handling of Khmer
on screen and in printing. If Gnome is not prepared to handle
Khmer correctly, another interface will have to be selected.
The
OpenSource alternative for a user interface seems to be
KDE, which receives support from
Qt. It could be an option for this initiative if Gnome proved not
to work 100% well in Khmer. This change would require replacing the
set of tools included in the project with tools that use the KDE
toolkit.
In connection with these
developments, it is necessary to develop locales for Khmer. A
locale is a data file that contains information about date formats,
number formats, sorting and other cultural conventions, so that
when dates and other data are printed, they follow local conventions.
See ICU in the status pages
for what is happening with locales.
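
To make the idea of a locale more concrete, here is a minimal Python sketch of the kind of information such a data file carries and how a program might use it. The record layout, the names `KM_LOCALE_SKETCH`, `localize_digits` and `format_date`, and the day-first date pattern are all assumptions made for illustration. They are not taken from ICU or from any agreed Khmer locale. Only the Khmer digits (U+17E0 to U+17E9) come from Unicode itself.

```python
from datetime import date

# Hypothetical locale record: field names and date pattern are assumptions.
KM_LOCALE_SKETCH = {
    # Khmer digits zero to nine, U+17E0..U+17E9
    "digits": "\u17e0\u17e1\u17e2\u17e3\u17e4\u17e5\u17e6\u17e7\u17e8\u17e9",
    # Assumed day-first order, for illustration only
    "date_pattern": "{day}/{month}/{year}",
}


def localize_digits(text, locale):
    """Replace ASCII digits in text with the locale's native digits."""
    table = {ord(str(i)): locale["digits"][i] for i in range(10)}
    return text.translate(table)


def format_date(d, locale):
    """Format a date with the locale's pattern, then localize its digits."""
    plain = locale["date_pattern"].format(day=d.day, month=d.month, year=d.year)
    return localize_digits(plain, locale)


print(format_date(date(2004, 4, 13), KM_LOCALE_SKETCH))
```

A real locale holds much more (month and day names, currency, sorting rules), but the principle is the same: the application code stays unchanged and only the data file differs between languages.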
It is also necessary to develop a "dictionary"
of Khmer words (a word list).
This dictionary is used for spell checking, for telling the
word processor and other programs where they may hyphenate or
end a line of text (as in Khmer no spaces are inserted between
words), and also for dictionary-based ordering (instead of
rule-based ordering, as the official dictionary is not always
systematic). The same dictionary format is used by OpenOffice and
Mozilla (e-mail). A synonym dictionary should be considered in later
versions of the software. See the
status page for
developments.
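
The use of a word list to find line-break positions can be sketched as follows. This is only one simple approach (greedy longest match), chosen here for illustration; it is not necessarily what OpenOffice or Mozilla do. The two-word list and the function name `segment` are assumptions made for the example.

```python
# Toy word list; a real list would hold many thousands of entries.
WORDS = {
    "\u1797\u17b6\u179f\u17b6",        # "language"
    "\u1781\u17d2\u1798\u17c2\u179a",  # "Khmer"
}


def segment(text, words):
    """Split text into words by greedy longest match against the word list.

    Characters that do not start any known word are emitted one at a time.
    """
    longest = max(map(len, words), default=1)
    pieces, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                pieces.append(candidate)
                i += size
                break
    return pieces


# "Khmer language", written without a space between the two words.
print(segment("\u1797\u17b6\u179f\u17b6\u1781\u17d2\u1798\u17c2\u179a", WORDS))
```

Each boundary between two returned pieces is a place where a program may end a line. Greedy matching can choose a wrong split when words overlap, so the quality of the result depends heavily on the word list.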
Next, ordering algorithms have to be developed. One, following
governmental guidelines, should follow the
Chuon Nath
dictionary order (which is not systematic). Words not in the dictionary
will be classified according to a specific ordering rule. A second
algorithm with similar rules, but not taking into consideration the
specificities of the Chuon Nath
dictionary, also needs to be developed.
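
The difference between the two algorithms can be sketched in Python as below. The sketch makes several assumptions that are not settled in this document: the word list is a Latin placeholder standing in for the Chuon Nath headword sequence, the fallback rule is plain code point order, and words missing from the dictionary are simply placed after all dictionary words. How unknown words should really be interleaved with dictionary words is left open here. The names `make_sort_key` and `code_point_rule` are hypothetical.

```python
# Placeholder headword sequence, deliberately not in alphabetical order,
# standing in for a dictionary whose order is not systematic.
DICTIONARY_ORDER = ["beta", "alpha", "gamma"]


def code_point_rule(word):
    """Assumed fallback rule: compare words by their code points."""
    return [ord(ch) for ch in word]


def make_sort_key(dictionary_order, rule_key=code_point_rule):
    """Build a sort key: dictionary position first, rule-based order otherwise."""
    rank = {word: i for i, word in enumerate(dictionary_order)}

    def key(word):
        if word in rank:
            return (0, rank[word], [])   # known word: use dictionary position
        return (1, 0, rule_key(word))    # unknown word: use the ordering rule

    return key


words = ["delta", "alpha", "beta", "apple"]
# First algorithm: dictionary order, with the rule as a fallback.
print(sorted(words, key=make_sort_key(DICTIONARY_ORDER)))
# Second algorithm: the rule alone (an empty dictionary).
print(sorted(words, key=make_sort_key([])))
```

Passing an empty dictionary gives the purely rule-based ordering, so both algorithms can share the same code and differ only in their data.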
Some software development has to be done in order to have all
these applications work in Khmer in an OpenSource operating system
such as Linux.
These developments can
be done by the “maintainer” of the application, a person who usually
does this work voluntarily and on a non-profit basis. This
person knows the program very well, so the effort required is not too
large. The problem is that this person usually has a day job that
leaves very little time for this work, and has many other
priorities as a maintainer. The developments for Khmer, if done by
maintainers, could take a long time.
They can also be done by
a volunteer or a student in a Cambodian university who will write
the necessary code. This solution could be used if the project
does not have enough funds, but it can also take a long time, as
volunteers and students are not always available and have other
priorities.
The third solution is to
contract the development to a person or company who will take care
of it. This person needs to learn how the program works, find
similar developments, adjust them for Khmer and add them to the
standard code of the applications.
Please look at the
status page for work that has already been done.
Another project to keep an
eye on is
FreeType. The project maintains
FreeType 2, “a software font engine that is designed to be small,
efficient, highly customizable and portable while capable of
producing high-quality output (glyph images). It can be used in
graphics libraries, display servers, font conversion tools, text
image generation tools, and many other products as well”.
There are a couple of other projects that
so far do not seem to be moving much, but should be watched closely.
They have not yet produced any results of interest for this project.
They are:
The Indian GNU/Linux Project
(“The goal of this project is to create a Linux distribution
that supports Indian Languages from a GUI/Application level as well
as Kernel level.”) and
The Indic-Computing Project.
(“We
create open-source infrastructural code, and provide technical
documentation on Indian language computing issues. Our mailing lists
provide forums where Indian language computing can be discussed”.)
|