[GRASS-dev] Re: [Qgis-developer] again on encoding problems

Hi all.
I assume grass-dev are aware of the problem. Has this been solved in the wxPython GUI? How? It looks like a serious issue, as it is keeping lots of people away from GRASS in QGIS, at least.
If you have a solution, we'll be happy to implement it in QGIS, workload permitting.
Thanks.

-------- Original Message --------
Subject: Re: [Qgis-developer] again on encoding problems
Date: Thu, 25 Oct 2012 15:16:38 +0900
From: Paolo Cavallini <cavallini@faunalia.it>
To: Maris Nartiss <maris.gis@gmail.com>
CC: qgis-developer <qgis-developer@lists.osgeo.org>

Thanks, Maris.
Are the GRASS developers aware of the problem? Could we then display GRASS in English on Windows for these languages?
It seems important.
All the best.

Maris Nartiss <maris.gis@gmail.com> wrote:

Paolo Cavallini wrote:

I assume grass-dev are aware of the problem.

Yes.

Has this been solved in the wxPython GUI? How?

No.

It looks like a serious issue, as it is keeping lots of people
away from GRASS in QGIS, at least.
If you have a solution, we'll be happy to implement it in QGIS, workload
permitting.

There are two issues for which there is no viable solution:

1. OEM encoding.
2. Shift-JIS.

Regarding #1: GRASS neither knows nor cares whether a string is in
ANSI or OEM encoding. Much of it doesn't care about encodings at all,
and just treats strings as sequences of bytes. Anything which needs to
care about the encoding (e.g. the GUI) will just use "the locale's
encoding", which on Windows means "the ANSI codepage". If you use the
OEM codepage for anything, you lose.

Suggestions as to how to determine whether a string uses the ANSI or
OEM codepage are welcome, if unlikely.
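To make the point concrete, here is a small Python sketch (not GRASS code), assuming a Western European Windows setup where the ANSI codepage is cp1252 and the OEM codepage is cp850; the codepage pair and the sample word are illustrative assumptions. The same bytes decode without error under either codepage, so nothing in the data itself says which one was used:

```python
# Assumption: a Western European Windows system, where the ANSI codepage
# is cp1252 and the OEM (console) codepage is cp850.
ANSI, OEM = "cp1252", "cp850"

text = "café"
raw = text.encode(OEM)  # bytes as written by a program using the OEM codepage

# Both decodes succeed; nothing in the bytes reveals which codepage was used.
as_oem = raw.decode(OEM)    # correct interpretation
as_ansi = raw.decode(ANSI)  # what "the locale's encoding" (ANSI) would give

print(raw)              # b'caf\x82'
print(as_oem == text)   # True
print(as_ansi == text)  # False: 0x82 is "é" in cp850 but a quotation mark in cp1252
```

The wrongly decoded string is still perfectly valid text, which is why any automatic detection would have to rely on guesswork about what "looks right".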

Regarding #2: On Windows, any byte within the range 0-127 is assumed
to represent the corresponding ASCII character. For encodings which
assign other characters to any byte within that range (either
individually or as part of a multi-byte sequence), that is likely to
cause problems.

The most obvious example is that any occurrence of the byte 0x5C (the
backslash) within a filename is assumed to be a directory separator.
Unfortunately, Shift-JIS can use 0x5C as the second byte of a two-byte
sequence, meaning that Japanese filenames may be parsed incorrectly.
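A minimal Python sketch of the pathname problem, assuming a hypothetical path containing the kanji 表, whose Shift-JIS encoding is the byte pair 0x95 0x5C. Splitting the raw bytes on 0x5C, as byte-oriented code effectively does, yields one component too many:

```python
# Hypothetical Windows path containing the kanji 表 ("table"),
# which Shift-JIS encodes as the two bytes 0x95 0x5C.
path = "C:\\data\\表.txt"
raw = path.encode("shift_jis")

# Byte-oriented code treats every 0x5C byte as a directory separator...
naive_parts = raw.split(b"\\")
# ...whereas splitting the decoded string respects character boundaries.
proper_parts = path.split("\\")

print(naive_parts)        # [b'C:', b'data', b'\x95', b'.txt']
print(len(proper_parts))  # 3
```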

Neither EUC-JP nor UTF-8 has this problem (as these only re-purpose
codes 128 and above), but unfortunately Windows doesn't provide locales
which use either of these encodings.
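The difference between the three encodings can be checked with a small Python helper. The function name reuses_ascii_bytes is hypothetical; it simply reports whether any non-ASCII character in a sample string is encoded using a byte below 0x80:

```python
def reuses_ascii_bytes(text: str, encoding: str) -> bool:
    """Return True if any non-ASCII character in `text` is encoded
    using at least one byte in the ASCII range (0x00-0x7F)."""
    for ch in text:
        if ord(ch) < 0x80:
            continue  # real ASCII characters are expected to map to themselves
        if any(byte < 0x80 for byte in ch.encode(encoding)):
            return True
    return False


# Sample characters whose Shift-JIS second byte is 0x5C ("\") or 0x7C ("|").
sample = "表ソポ"
for enc in ("shift_jis", "euc_jp", "utf-8"):
    print(enc, reuses_ascii_bytes(sample, enc))
# shift_jis True
# euc_jp False
# utf-8 False
```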

And I can't think of any solution which doesn't involve re-writing all
code which handles pathnames.

Similar issues may exist with the other punctuation characters which
are "mingled" with the alphabetic characters, i.e. "[\]^_{|}~" (e.g. |
is commonly used as a field separator, so tabular data which includes
Japanese text may be parsed incorrectly).
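The same effect for the field separator, sketched in Python with a hypothetical "|"-separated attribute record. The katakana ポ is encoded in Shift-JIS as 0x83 0x7C, and 0x7C is "|":

```python
# Hypothetical attribute record using "|" as the field separator.
# ポ (katakana "po") is encoded in Shift-JIS as 0x83 0x7C, and 0x7C is "|".
record = "1|ポイント|point"
raw = record.encode("shift_jis")

naive_fields = raw.split(b"|")     # byte-level split: the 0x7C inside ポ also splits
proper_fields = record.split("|")  # split after decoding: character-aware

print(len(naive_fields), len(proper_fields))  # 4 3
```

Splitting the decoded string avoids the problem, but that is exactly the kind of change that would be needed wherever strings are parsed.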

While such cases are probably less common than the pathname issue, a
fix is even less viable (i.e. fixing all string-handling code).

--
Glynn Clements <glynn@gclements.plus.com>

On 26/10/2012 06:32, Glynn Clements wrote:

Paolo Cavallini wrote:

I assume grass-dev are aware of the problem.

Yes.

Has this been solved in the wxPython GUI? How?

No.

Thanks a lot, Glynn, for the explanation.
All the best.

--
Paolo Cavallini - Faunalia
www.faunalia.eu
Full contact details at www.faunalia.eu/pc
New QGIS and PostGIS courses: http://www.faunalia.it/calendario