[GRASS-dev] [GRASS GIS] #2525: Unable to open sqlite database if path contains non-latin letters

#2525: Unable to open sqlite database if path contains non-latin letters
-------------------------+--------------------------------------------------
Reporter: marisn | Owner: grass-dev@…
     Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: wxGUI | Version: svn-releasebranch70
Keywords: | Platform: MSWindows Vista
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
It seems that any operation touching the attribute database fails if the
path contains a non-Latin letter.
This is a single instance of the general problem of passing file names as
arguments between the GUI and modules.
Output in CMD window:
{{{
GRASS_INFO_WARNING(5668,2): Unable open database <C:\Users\Māris\Documents\grassdata\nc_basic_spm_grass7\PERMANENT\sqlite\sqlite.db> by driver <sqlite>
GRASS_INFO_END(5668,2)
}}}

One of the outputs in the wxGUI command console:
{{{
Exception in thread Thread-26:
Traceback (most recent call last):
   File "C:\Program Files\GRASS GIS 7.0.0svn\Python27\lib\threading.py", line 810, in __bootstrap_inner
     self.run()
   File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\forms.py", line 374, in run
     self.resultQ.put((requestId, self.request.run()))
   File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\forms.py", line 289, in run
     cparams[map]['dbInfo'] = gselect.VectorDBInfo(map)
   File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\gselect.py", line 743, in __init__
     self._DescribeTables() # -> self.tables
   File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\gselect.py", line 770, in _DescribeTables
     database = self.layers[layer]["database"])['cols']:
   File "C:\Program Files\GRASS GIS 7.0.0svn\etc\python\grass\script\db.py", line 43, in db_describe
     s = read_command('db.describe', flags='c', table=table, **args)
   File "C:\Program Files\GRASS GIS 7.0.0svn\etc\python\grass\script\core.py", line 425, in read_command
     return handle_errors(returncode, stdout, args, kwargs)
   File "C:\Program Files\GRASS GIS 7.0.0svn\etc\python\grass\script\core.py", line 308, in handle_errors
     returncode=returncode)
CalledModuleError: Module run None ['db.describe', '-c', 'table=census', 'driver=sqlite', 'database=C:\\Users\\M\xe2ris\\Documents\\grassdata\\nc_basic_spm_grass7\\PERMANENT\\sqlite\\sqlite.db'] ended with error
Process ended with non-zero return code 1. See errors in the (error) output.
}}}

GRASS version: 7.0.0svn
GRASS SVN Revision: 63925
Build Date: 2015-01-02
Build Platform: i686-pc-mingw32
GDAL/OGR: 1.11.1
PROJ.4: 4.8.0
GEOS: 3.4.2
SQLite: 3.7.17
Python: 2.7.4
wxPython: 2.8.12.1
Platform: Windows-Vista-6.0.6002-SP2

Note: could CommandLineToArgvW be helpful?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525>
GRASS GIS <http://grass.osgeo.org>


Comment(by glynn):

Replying to [ticket:2525 marisn]:

> Note: could CommandLineToArgvW be helpful?

What we need is the reverse: something which reliably converts argv to a
command string.

We actually have one of those (make_command_line() in lib/gis/spawn.c),
and Python also has one (list2cmdline() in the subprocess module). The
problem is that both of these only reverse the parsing which is done by
the executable itself, not that done by the shell. The shell's parsing
rules are even less well documented than those of the executable, and even
less sane.
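
To illustrate that gap, here is a minimal Python sketch using list2cmdline() from the subprocess module. The argument values are made up for the example. It quotes for the executable's own argv parser (spaces, embedded quotes), but characters which only the shell treats specially, such as `&`, pass through untouched:

```python
import subprocess

# Hypothetical argv; the path is only an example.
argv = ["db.describe", "-c", r"database=C:\grass data\sqlite.db"]

# list2cmdline() follows the MS C runtime argv rules: arguments
# containing spaces or tabs are wrapped in double quotes.
cmdline = subprocess.list2cmdline(argv)
print(cmdline)

# Characters that only the shell (cmd.exe) treats specially are not
# escaped, because the executable's own parser does not care about them.
shell_unsafe = subprocess.list2cmdline(["echo", "a&b"])
print(shell_unsafe)
```

If the second string were handed to cmd.exe, the `&` would be read as a command separator, which is the part neither function accounts for.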

The other issue is that the shell uses two different encodings
(codepages): "ANSI" and "OEM". Most of the time this doesn't matter; you
can just pass the byte strings straight through. But there are cases (such
as using the FOR command with backticks to take process output and use it
as an argument) where this doesn't work, and any character which doesn't
have the same codepoint in both encodings will cause problems (problems
which can't realistically be solved).
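
A small Python sketch of the two-codepage problem, assuming a Baltic Windows setup where the "ANSI" codepage is cp1257 and the "OEM" codepage is cp775 (the actual pair depends on the system locale):

```python
# Assumption: "ANSI" codepage is cp1257, "OEM" (console) codepage is cp775.
name = "M\u0101ris"  # "Māris"

ansi_bytes = name.encode("cp1257")
oem_bytes = name.encode("cp775")
print(ansi_bytes, oem_bytes)

# Bytes written under one codepage and read back under the other
# no longer spell the same name.
misread = ansi_bytes.decode("cp775")
print(misread)
```

The ASCII letters survive either way; only the character whose codepoint differs between the two encodings is damaged.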

As for filenames, the main issues are

  1. If you use byte strings (i.e. char*) (e.g. fopen()), you can't access
any file whose name isn't representable in the current codepage. Those
files effectively don't exist in the char* world.

  2. The only supported encoding for Japanese is Shift-JIS (cp932), which
has the unfortunate feature of not being entirely compatible with ASCII.
Specifically, 0x5c is used both for the directory separator (normally
backslash, but actually prints as a yen (¥) sign in Japanese locales) and
as the second byte of some multi-byte sequences. Meaning that any code
which tries to parse filenames as byte strings with 0x5c as a directory
separator will often fail on Japanese filenames.
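
The second issue can be reproduced in a few lines of Python. This is only a sketch with a made-up path; it uses the kanji U+8868, whose cp932 encoding is the two bytes 0x95 0x5c:

```python
# Hypothetical Windows path containing the kanji U+8868.
path = "C:\\data\\\u8868.tif"
encoded = path.encode("cp932")

# Naive byte-level split on 0x5c (backslash) cuts the kanji in half.
naive_parts = encoded.split(b"\x5c")

# Splitting the decoded text keeps the character intact.
text_parts = path.split("\\")
print(naive_parts)
print(text_parts)
```

The byte-level split yields four components instead of three, with the lone lead byte 0x95 as a bogus directory name.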

Neither of these has any simple solution (not even unreliable "hacks").
The only effective solution is to use the Unicode (i.e. wchar_t*) API.

In practical terms, that would mean writing a compatibility layer which
re-implements all of the standard ANSI C and POSIX filesystem calls,
taking UTF-8 char* arguments, converting to UTF-16 wchar_t*, then using
the Windows-specific wchar_t* functions. Anything which uses third-party
library functions which take filenames as char* won't work.
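
The conversion step at the heart of such a layer, sketched in Python rather than C. The helper name utf8_to_wide is hypothetical; a real layer would do the equivalent in C (for instance with MultiByteToWideChar) before calling the wchar_t* function:

```python
def utf8_to_wide(path_utf8: bytes) -> bytes:
    """Hypothetical helper: UTF-8 bytes -> NUL-terminated UTF-16LE buffer,
    i.e. the memory layout a Windows wchar_t* function expects."""
    text = path_utf8.decode("utf-8")  # raises on invalid UTF-8
    return text.encode("utf-16-le") + b"\x00\x00"


wide = utf8_to_wide("C:\\Users\\M\u0101ris".encode("utf-8"))
print(wide.hex())
```

Every wrapped call (fopen, stat, opendir, ...) would need this conversion on the way in, and the reverse on the way out for anything that returns a filename.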

We'd also need custom startup code which used main16(int argc, wchar_t
**argv) as the entry point, converted all arguments to UTF-8, then called
main(). We'd still have issues with reading filenames from files, stdin,
or output from child processes, as these would either have to be in UTF-8
or would need to be converted to UTF-8 (which means that we'd need to know
the encoding).
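
Why knowing the encoding matters, as a short Python sketch. It uses the byte string from the traceback in the ticket description and assumes it was written in cp1257:

```python
# The byte string from the traceback in this ticket.
raw = b"M\xe2ris"

# Decoded with the codepage it was (presumably) written in:
as_cp1257 = raw.decode("cp1257")
# Decoded with a wrong guess: no error, just a different name.
as_cp1252 = raw.decode("cp1252")

utf8 = as_cp1257.encode("utf-8")
print(as_cp1257, as_cp1252, utf8)
```

Both decodes succeed, so nothing in the bytes themselves says which one is right; the encoding has to come from somewhere else.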

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:1>


Comment(by glynn):

Replying to [ticket:2525 marisn]:

> Seems that any operation touching attribute database fails if path
> contains non-latin letter.
> It is a single instance of a general problem of passing file names as
> arguments between GUI and modules.

Do you have the same problem with filenames other than database files?
E.g. do r.in.* or r.out.* work?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:2>


Comment(by hellik):

Replying to [comment:2 glynn]:
> Replying to [ticket:2525 marisn]:
>
> > Seems that any operation touching attribute database fails if path
> > contains non-latin letter.
> > It is a single instance of a general problem of passing file names as
> > arguments between GUI and modules.
>
> Do you have the same problem with filenames other than database files?
> E.g. do r.in.* or r.out.* work?

{{{
r.out.gdal --verbose input=MRVBF4@user1 output=C:\wd\eudata\Māris\test.tif format=GTiff
}}}

{{{
Exception in thread Thread-278:
Traceback (most recent call last):
   File "C:\OSGeo4W\apps\Python27\lib\threading.py", line 810, in __bootstrap_inner
     self.run()
   File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gconsole.py", line 155, in run
     self.resultQ.put((requestId, self.requestCmd.run()))
   File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gcmd.py", line 575, in run
     env = self.env)
   File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gcmd.py", line 161, in __init__
     args = map(EncodeString, args)
   File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gcmd.py", line 92, in EncodeString
     return string.encode(_enc)
   File "C:\OSGeo4W\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
     return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0101' in position 21: character maps to <undefined>
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:3>


Comment(by hellik):

Replying to [comment:2 glynn]:
> Replying to [ticket:2525 marisn]:
>
> > Seems that any operation touching attribute database fails if path
> > contains non-latin letter.
> > It is a single instance of a general problem of passing file names as
> > arguments between GUI and modules.
>
> Do you have the same problem with filenames other than database files?
> E.g. do r.in.* or r.out.* work?

It doesn't work with r.in.gdal either:

{{{
r.in.gdal --verbose input=C:\wd\eudata\Māris\test.tif output=testfile
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:4>


Comment(by glynn):

Replying to [comment:3 hellik]:

>
{{{
   File "C:\OSGeo4W\apps\Python27\lib\encodings\cp1252.py",
}}}

Again, that character doesn't exist in cp1252. This isn't specific to
GRASS; any portable C code will have exactly the same problems. The only
way that you can open that file is to use Windows-specific functions (e.g.
_wfopen() or CreateFileW()). Even passing it as an argument requires using
wmain() instead of main().

I'm only interested in whether it works in a locale which uses cp1257. The
fact that it doesn't work with cp1252 is a "wontfix".
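
The difference between the two codepages can be checked directly in Python. A minimal sketch, using the name from the reported path:

```python
name = "M\u0101ris"  # "Māris"; U+0101 is the character from the traceback

try:
    name.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    # Same failure that EncodeString() hit in the quoted traceback.
    cp1252_ok = False

# In cp1257 the character has a codepoint, so encoding succeeds.
cp1257_bytes = name.encode("cp1257")
print(cp1252_ok, cp1257_bytes)
```

So the encode step itself should pass in a cp1257 locale; whether the rest of the chain then works is the open question.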

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:5>