Hello all,
Followers of WinGRASS activity might have noticed error reports dealing
with file/folder encoding problems on Windows. I have been trying to
get rid of some of those problems, and my current conclusion is
really bad: it's impossible. Good news: it's Windows that's broken
by design, not GRASS. Bad news: it's impossible to fix non-latin
letter support on Windows for file/folder names.
The root of the problem: CMD (and thus also .bat files) uses the OEM
encoding, while GUI applications use the ANSI encoding. When passing
strings between the GUI and CMD (CLI or .bat files), one has to convert
from one encoding to the other. As we can't know where a string comes
from or where it will be used, it's not possible to implement a
universal solution without a gazillion hacks.
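To make the conversion problem concrete, here is a minimal Python sketch. It assumes a Russian locale, where the ANSI codepage is 1251 and the OEM codepage is 866; the helper name `oem_to_ansi` is hypothetical. The same bytes turn into different characters depending on which codepage the reader assumes, and a correct conversion is only possible when the source codepage is already known.

```python
# Assumed Russian locale: ANSI = cp1251 (GUI programs), OEM = cp866 (CMD).
ANSI = "cp1251"
OEM = "cp866"


def oem_to_ansi(raw: bytes) -> bytes:
    """Re-encode OEM bytes as ANSI bytes (hypothetical helper).

    This only works if the caller already knows the bytes are OEM;
    nothing in the bytes themselves says which codepage produced them.
    """
    return raw.decode(OEM).encode(ANSI)


name = "Мария"                  # a non-latin user name
ansi_bytes = name.encode(ANSI)  # what a GUI program would write
oem_bytes = name.encode(OEM)    # what CMD would write

# Reading ANSI bytes as if they were OEM silently yields other characters.
print(ansi_bytes.decode(OEM) == name)                # False

# Converting from the correct source codepage recovers the name.
print(oem_to_ansi(oem_bytes).decode(ANSI) == name)   # True
```

Note that the misread case raises no error: cp866 assigns a character to every byte, so the damage is silent.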
List of unsupported things (in both GUIs):
Starting GRASS if the user's name contains a non-latin letter (because
.grassrc is in %HOME%);
Setting GISDBASE to a path with a non-latin letter;
Importing/exporting data from/to a folder with a non-latin letter in its path;
and there must be more.
Unless somebody comes up with a brilliant idea for fixing the encoding
issues, I suggest downgrading the priority of all Windows encoding
related bugs and adding a warning about issues with non-latin file names
to the documentation, the WinGRASS installer and QGIS.
What to do if you have data or a user name with a non-latin letter and you
can't change it (e.g. no permission)? You're SOL.
The root of the problem: CMD (and thus also .bat files) use OEM
encoding, GUI applications use ANSI encoding.
The console functions (WriteConsole etc) use the OEM encoding.
Everything else uses ANSI encoding. In particular, the argv parameter
passed to main uses ANSI encoding. This is true for both "console" and
"GUI" programs.
AFAICT, the only situation where the OEM encoding will make its way
into GRASS is via curses.
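To check which codepages a particular Windows machine actually uses, one can ask kernel32 directly. A minimal Python sketch via ctypes, using the documented `GetACP`, `GetOEMCP`, `GetConsoleCP` and `GetConsoleOutputCP` functions; the wrapper name is hypothetical. On a Latvian system one would expect ANSI 1257 and OEM 775, on a Russian one 1251 and 866.

```python
import ctypes
import sys


def windows_codepages():
    """Return the active Windows codepages, or None on other platforms."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return {
        "ansi": kernel32.GetACP(),                  # ANSI codepage
        "oem": kernel32.GetOEMCP(),                 # OEM codepage
        "console_in": kernel32.GetConsoleCP(),      # may be 0 without a console
        "console_out": kernel32.GetConsoleOutputCP(),
    }


print(windows_codepages())
```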
Hello,
You are overlooking issues coming from the current GRASS architecture. We
have a CLI+GUI mixture and thus have to deal with both codepages at the same
time. Also, GRASS has to rely on other components working fine too
(GDAL/OGR).
I wrote a small test example for the wish GUI.
Requirements to run:
Rename file_check_code to a .zip file and extract its contents (GMail
doesn't allow sending executable files)
WinGRASS-6.4.SVN-r45713-1-Setup.exe
All system locale settings set to some non-latin locale (Latvian or
Russian would do)
GRASS installed into C:\Program Files\GRASS 6.4.SVN (fix the paths in the
files if it differs)
A username with a non-latin letter (Māris is a nice one)
Some shapefile in your user's home directory.
Extract the attached files to the user's home directory and run kodejums.bat from CMD.exe.
When the test asks for a file, point it to the shapefile.
I'm also attaching the output of the code run on my Vista machine: one run
used the username Māris and the other the username test.
Glynn, if you have an idea how to fix my example code, we could track
down all similar use patterns in gis.m etc. and fix them.
Somebody with better wxPython-fu could write similar testing code that I
could run and provide its results.
You are overlooking issues coming from the current GRASS architecture. We
have a CLI+GUI mixture and thus have to deal with both codepages at the same
time.
We have to deal with Unicode and the current ANSI codepage. There's no
fundamental reason why we should need to deal with the OEM codepage.
I wrote a small test example for the wish GUI.
Okay; I think that the immediate problem here is due to the
"FOR ... usebackq" trick. AFAICT, it treats the output from the
command within the backquotes as being in the OEM codepage, when it's
actually in the ANSI codepage.
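The suspected "FOR ... usebackq" behaviour can be reproduced outside CMD. A minimal Python sketch, assuming a Russian locale (ANSI 1251, OEM 866): a child process stands in for a GRASS tool and writes a non-latin name to stdout as raw ANSI bytes. Decoding those bytes as OEM, as the FOR trick is suspected to do, garbles the name; decoding with the ANSI codepage recovers it. The child script and the function name are hypothetical.

```python
import subprocess
import sys

ANSI = "cp1251"  # assumed ANSI codepage (Russian locale)
OEM = "cp866"    # assumed OEM codepage (Russian locale)
NAME = "\u041c\u0430\u0440\u0438\u044f"  # "Мария"

# Hypothetical stand-in for a tool whose stdout is ANSI-encoded bytes.
CHILD = (
    "import sys; "
    "sys.stdout.buffer.write('\\u041c\\u0430\\u0440\\u0438\\u044f'.encode('cp1251'))"
)


def run_and_decode(encoding: str) -> str:
    """Run the child, capture its raw stdout bytes, decode with `encoding`."""
    raw = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, check=True
    ).stdout
    return raw.decode(encoding)


print(run_and_decode(OEM) == NAME)   # False: bytes misread as OEM
print(run_and_decode(ANSI) == NAME)  # True: decoded with the codepage used
```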
Also:
echo "%HOME%" > out.txt
will use the OEM codepage when writing the file, but anything which
reads it is going to assume the ANSI codepage.
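The out.txt mismatch amounts to a writer and a reader disagreeing on the encoding. A minimal Python sketch, again assuming a Russian locale (ANSI 1251, OEM 866) and a hypothetical %HOME% value; the helper name is hypothetical. Writing in OEM, as the redirected echo does, and reading as ANSI returns a different string; writing in the codepage the reader expects round-trips cleanly.

```python
import os
import tempfile

ANSI = "cp1251"  # assumed ANSI codepage; what readers of out.txt expect
OEM = "cp866"    # assumed OEM codepage; what CMD's redirect writes
home = "C:\\Users\\\u041c\u0430\u0440\u0438\u044f"  # hypothetical %HOME%


def write_then_read(text: str, write_enc: str, read_enc: str) -> str:
    """Write `text` to a temporary out.txt, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        with open(path, "w", encoding=write_enc) as f:
            f.write(text)
        with open(path, encoding=read_enc) as f:
            return f.read()


print(write_then_read(home, OEM, ANSI) == home)   # False: OEM read as ANSI
print(write_then_read(home, ANSI, ANSI) == home)  # True: both sides agree
```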
Other than that, it would just mean that the console displays
non-ASCII characters incorrectly, which doesn't affect programs which
aren't actually using the console.
I suspect that the simplest fix is to replace init.bat with e.g. a
Python version.
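A Python replacement for init.bat could set up the session without routing anything through CMD. A minimal sketch of one small piece, under stated assumptions: the helper names are hypothetical, it only writes a gisrc file (GISDBASE, LOCATION_NAME, MAPSET) and sets GISRC, and the file encoding is a parameter defaulting to cp1251 on the assumption that the GRASS C libraries read it in the ANSI codepage of a Russian system.

```python
import os
import tempfile


def write_gisrc(path, gisdbase, location, mapset, encoding):
    """Write a minimal GRASS gisrc file (hypothetical helper).

    The encoding must match whatever the reader of the file expects;
    here that is assumed to be the ANSI codepage.
    """
    with open(path, "w", encoding=encoding) as f:
        f.write(f"GISDBASE: {gisdbase}\n")
        f.write(f"LOCATION_NAME: {location}\n")
        f.write(f"MAPSET: {mapset}\n")


def setup_session(gisdbase, location, mapset, encoding="cp1251"):
    """Prepare GISRC for a GRASS session (hypothetical helper)."""
    rc = os.path.join(tempfile.mkdtemp(), "gisrc")
    write_gisrc(rc, gisdbase, location, mapset, encoding)
    # The value is set from Python directly; it never passes through a .bat file.
    os.environ["GISRC"] = rc
    return rc
```

This only sidesteps the OEM round trip for the startup step; it does not address modules that are themselves .bat or shell scripts.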
We have to deal with Unicode and the current ANSI codepage. There's no
fundamental reason why we should need to deal with the OEM codepage.
There is: .bat files and input from CMD.
I wrote a small test example for the wish GUI.
Okay; I think that the immediate problem here is due to the
"FOR ... usebackq" trick. AFAICT, it treats the output from the
command within the backquotes as being in the OEM codepage, when it's
actually in the ANSI codepage.
Correct.
Also:
echo "%HOME%" > out.txt
will use the OEM codepage when writing the file, but anything which
reads it is going to assume the ANSI codepage.
Correct.
Other than that, it would just mean that the console displays
non-ASCII characters incorrectly, which doesn't affect programs which
aren't actually using the console.
I suspect that the simplest fix is to replace init.bat with e.g. a
Python version.
Not that easy. I set up all GRASS env variables via TCL. g.gisenv
started to work (hooray!) and I managed to launch gis.m (double
hooray!). Tools like d.vect/d.rast, w.what, g.copy and g.region seemed to
work, but v.in.ogr and r.in.gdal were both still failing with "DSN not
found" and "File doesn't exist" with every file separator I tried (/, \ and
\\). Without being able to import or export data, there is little point in
running GRASS. Also, I haven't tested any heavy shell/.bat files.
If we get rid of all non-Python code for startup and modules in GRASS 7,
it might work.