Hi all,
I would like to ask about the different charsets used by the po-files. The counts are:
UTF-8:
grep charset=UTF-8 *.po | wc -l
23
non-UTF-8:
grep charset= *.po | grep -v UTF-8 | wc -l
21
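A per-charset tally, with each charset named, can be produced in a single pipeline. This is a minimal sketch that assumes GNU or BSD grep (for `-h` and `-o`) and PO headers of the usual form `"Content-Type: text/plain; charset=...\n"`. The function name `po_charsets` is hypothetical. Note that `grep ... | wc -l` counts matching lines, so it equals the number of files only if each header declares its charset exactly once.

```shell
#!/bin/sh
# Print how many PO files declare each charset, most common first.
# po_charsets is a hypothetical helper; it assumes one "charset=" per header.
po_charsets() {
    dir=${1:-.}
    # -h: omit file names; -o: print only the matched "charset=..." token
    grep -ho 'charset=[A-Za-z0-9_.-]*' "$dir"/*.po \
        | sed 's/^charset=//' \
        | sort | uniq -c | sort -rn
}
```

Usage: `po_charsets po/` prints one line per charset, such as a count followed by `UTF-8`.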
Is there a particular reason for this diversity? Why not use only UTF-8?
Best, Martin
Martin Landa wrote:
> I would like to kindly ask you about different charset of po-files. There are
>
> UTF-8:
> grep charset=UTF-8 *.po | wc -l
> 23
>
> non-UTF-8:
> grep charset= *.po | grep -v UTF-8 | wc -l
> 21
>
> Is there a particular reason for this diversity? Why not use only UTF-8?
Unibyte encodings are more widely supported, and are more compatible
with non-GNU gettext() implementations[1]. It also acts as a safety
measure against "gratuitous" use of characters outside of the locale's
normal repertoire.
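The "safety measure" can also be checked mechanically: try converting a UTF-8 catalogue into the language's unibyte charset and see whether any character fails. This is a minimal sketch that assumes an iconv (such as glibc's) which exits non-zero on an unconvertible character instead of silently substituting one; `fits_charset` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed only if every character of a UTF-8 PO file is representable
# in the given unibyte charset. fits_charset is a hypothetical helper.
# Assumes iconv exits non-zero on unconvertible input (glibc behaviour).
fits_charset() {
    charset=$1   # target unibyte encoding, e.g. ISO-8859-2
    po_file=$2   # PO file currently encoded as UTF-8
    # Discard the converted output; only the exit status matters here.
    iconv -f UTF-8 -t "$charset" "$po_file" >/dev/null 2>&1
}
```

For example, `fits_charset ISO-8859-2 cs.po || echo "cs.po: outside ISO-8859-2"` would flag a Czech catalogue containing a character such as `€`, which ISO-8859-2 lacks.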
Also, in locales whose primary language doesn't use the Roman alphabet
(e.g. Russian), the historical encoding(s) are usually sufficiently
well entrenched that UTF-8 isn't a realistic option.
[1] GNU gettext will automatically convert UTF-8 message catalogues to
the locale's encoding, while other versions may just pass the strings
through untouched, meaning that a UTF-8 message catalogue will only
work in a UTF-8 locale.
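The difference footnote [1] describes is visible at the byte level. This sketch compares one Russian string encoded as UTF-8 and as KOI8-R. It assumes the script is saved as UTF-8 and that iconv supports KOI8-R (glibc and GNU libiconv both do); the string itself is a hypothetical example.

```shell
#!/bin/sh
# The same msgstr in UTF-8 and in KOI8-R (hypothetical example string).
msgstr='Файл'

# wc -c counts bytes; tr strips the padding some wc implementations add.
utf8_bytes=$(printf '%s' "$msgstr" | wc -c | tr -d ' ')
koi8_bytes=$(printf '%s' "$msgstr" | iconv -f UTF-8 -t KOI8-R | wc -c | tr -d ' ')

echo "UTF-8:  $utf8_bytes bytes"
echo "KOI8-R: $koi8_bytes bytes"
```

This prints 8 and 4 bytes respectively. GNU gettext performs the equivalent of this iconv step at run time in a KOI8-R locale; an implementation that passes strings through untouched would emit the 8 UTF-8 bytes, which a KOI8-R terminal displays as 8 unrelated characters.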
--
Glynn Clements <glynn@gclements.plus.com>
Glynn,
OK, thanks for the clear explanation.
Cheers Martin