[GRASS-dev] [GRASS GIS] #3220: WinGRASS not recognizing accented utf-8 (nor cp1252) attribute values

#3220: WinGRASS not recognizing accented utf-8 (nor cp1252) attribute values
-------------------------+---------------------------------
 Reporter: hellik        |     Owner: grass-dev@…
     Type: defect        |    Status: new
 Priority: normal        | Milestone: 7.2.0
Component: Default       |   Version: svn-releasebranch72
 Keywords:               |       CPU: Unspecified
 Platform: MSWindows 8   |
-------------------------+---------------------------------
taken from the user ML:

https://lists.osgeo.org/pipermail/grass-user/2016-December/075682.html

{{{
I've got shapefiles with Swedish accented letters (ÄÖÅ) in some of the
attribute values. The attributes are shown correctly in the GUI. SQL
statements, however, do not recognize them. They are also garbled in the
command output when other (unaccented) values are queried.

I first set GRASS_DB_ENCODING to cp1252 and it didn't work. Then I
converted the dbf file to utf-8 and set that as the value of the variable,
to no avail. I also tried the 'encoding' parameter of v.in.ogr in both
cases; that didn't work either.

I tried it on Windows 8.1 and Windows 10. The same happens on both, with
stable GRASS 7.0.5 and GRASS 7.2.0RC1.

The problem only occurs on Windows. Fedora and Mac OS X don't have this
issue with the same shapefiles.
}}}

https://lists.osgeo.org/pipermail/grass-user/2016-December/075688.html

{{{
confirmed with

GRASS version: 7.3.svn
GRASS SVN revision: r70001
Build date: 2016-12-06
Build platform: x86_64-w64-mingw32
GDAL: 2.1.2
PROJ.4: 4.9.3
GEOS: 3.5.0
SQLite: 3.14.1
Python: 2.7.5
wxPython: 2.8.12.1
Platform: Windows-8-6.2.9200 (OSGeo4W)
}}}

{{{
and a test vector with following attributes

v.db.select map=test_points@data file=D:\temp\test_point.txt

cat|id|names
1|1|ÄÖÅ
2||Æ
3||Ø
4||Å,å,Æ,æ,Ø,ø
5||ø, Ø
6||Þ
7||Ð
8||Å
9||æ
}}}

{{{
d.vect map=test_points2@data where="names = 'Å,å,Æ,æ,Ø,ø'" width=1 icon=basic/point size=10

doesn't show the selected point in the map display.
}}}
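
One possible explanation for the empty selection, offered as an assumption rather than a diagnosis: if the attribute value is stored in one encoding and the `where` string reaches the database driver in another, the two are compared as different byte sequences and cannot match. The sketch below imitates that effect with Python's built-in `sqlite3` module and a BLOB column. The table layout and the helper `select_cats` are made up for illustration and do not reflect how GRASS stores or queries attributes.

```python
import sqlite3


def select_cats(conn, raw_value):
    """Return the cats whose stored bytes equal raw_value exactly."""
    rows = conn.execute(
        "SELECT cat FROM test_points WHERE names = ?", (raw_value,)
    )
    return [cat for (cat,) in rows]


conn = sqlite3.connect(":memory:")
# Store the name as raw UTF-8 bytes (BLOB) so the comparison is byte-wise.
conn.execute("CREATE TABLE test_points (cat INTEGER, names BLOB)")
conn.execute(
    "INSERT INTO test_points VALUES (?, ?)",
    (4, "Å,å,Æ,æ,Ø,ø".encode("utf-8")),
)

name = "Å,å,Æ,æ,Ø,ø"
utf8_hits = select_cats(conn, name.encode("utf-8"))     # same bytes as stored
cp1252_hits = select_cats(conn, name.encode("cp1252"))  # different bytes
print(utf8_hits, cp1252_hits)
```

Only the query whose bytes match the stored bytes returns the row; the same visible text in another encoding selects nothing.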

{{{
v.report map=test_points@data option=coor
cat|id|names|x|y|z
1|1|ÄÖÅ|1.37409120951759|47.039352838731|0.0
2||Æ|2.62326503635168|28.5515802015863|0.0
3||Ø|44.095836087244|57.2825782187707|0.0
4||Å,å,Æ,æ,Ø,ø|30.8545935228025|49.787535257766|0.0
5||ø, Ø|10.1183079973563|51.0367090846001|0.0
6||Þ|20.361533377396|52.0360481460674|0.0
8||Ã…|15.1491119517375|60.3621017805262|0.0
9||æ|-1.26290587954035|52.5879880709736|0.0

Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 473, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 772, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 721, in AddTextWrapped
    txt = EncodeString(txt)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\core\gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
  File "C:\OSGEO4~1\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
}}}
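
The traceback is consistent with a well-known Python 2 behaviour: calling `.encode()` on a byte string first decodes it implicitly with the ASCII codec, which fails on byte 0xc3, the lead byte of a UTF-8 encoded "Å". The Python 3 sketch below makes that implicit step explicit and shows a defensive variant that decodes byte strings with a stated encoding first. The helper `encode_for_output` and its default encodings are assumptions for illustration, not the GRASS `EncodeString` implementation. The sample row is chosen so that 0xc3 falls at position 3; whether that row triggered the error is a guess.

```python
def encode_for_output(text, enc="cp1252", assumed_input="utf-8"):
    """Encode text for output, decoding byte strings explicitly first.

    Hypothetical helper, not the GRASS EncodeString implementation.
    """
    if isinstance(text, bytes):
        # Python 2 would decode here implicitly with the ASCII codec.
        text = text.decode(assumed_input)
    return text.encode(enc, errors="replace")


raw = "8||Å".encode("utf-8")  # b'8||\xc3\x85'

# The implicit decode that Python 2 performs, written out explicitly.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

print(ascii_error)
print(encode_for_output(raw))
```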

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220>
GRASS GIS <https://grass.osgeo.org>

Comment (by martinl):

Import (`v.import/v.in.ogr`) with `encoding=cp1252` will not help?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:1>
GRASS GIS <https://grass.osgeo.org>

Comment (by hellik):

Replying to [comment:1 martinl]:
> Import (`v.import/v.in.ogr`) with `encoding=cp1252` will not help?

{{{
v.import encoding=cp1252 input=D:\temp\test_points.shp layer=test_points output=testimportcp1252
WARNING: All available OGR layers will be imported into vector map <test_points>
Check if OGR layer <test_points> contains polygons...
Importing 9 features (OGR layer <test_points>)...
-----------------------------------------------------
Building topology for vector map <testimportcp1252@data2>...
Registering primitives...
9 primitives registered
9 vertices registered
Building areas...
0 areas built
0 isles built
Attaching islands...
Attaching centroids...
Number of nodes: 0
Number of primitives: 9
Number of points: 9
Number of lines: 0
Number of boundaries: 0
Number of centroids: 0
Number of areas: 0
Number of isles: 0
Input <D:\temp\test_points.shp> successfully imported without reprojection
}}}

{{{
v.report map=testimportcp1252@data2 option=coor
cat|id|names|x|y|z
1|1|ÄÖÅ|1.37409120951759|47.039352838731|0.0
2||Æ|2.62326503635168|28.5515802015863|0.0
3||Ø|44.095836087244|57.2825782187707|0.0
4||Å,å,Æ,æ,Ø,ø|30.8545935228025|49.787535257766|0.0
5||ø, Ø|10.1183079973563|51.0367090846001|0.0
6||Þ|20.361533377396|52.0360481460674|0.0
8||Ã…|15.1491119517375|60.3621017805262|0.0
9||æ|-1.26290587954035|52.5879880709736|0.0

Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 473, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 772, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 721, in AddTextWrapped
    txt = EncodeString(txt)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\core\gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
  File "C:\OSGEO4~1\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
}}}

It doesn't help.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:2>
GRASS GIS <https://grass.osgeo.org>

Changes (by hellik):

* Attachment "test_points_encoding_errors.zip" added.

Zipped shapefile in WGS84 for testing.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220>
GRASS GIS <https://grass.osgeo.org>

Comment (by razz):

Replying to [comment:1 martinl]:
> Import (`v.import/v.in.ogr`) with `encoding=cp1252` will not help?

`encoding=cp1252` in v.in.ogr did not help. The values also come out
garbled with v.db.select, though without any error output:

{{{
v.db.select map=test_points
cat|id|names
1|1|ÄÖÅ
2||Æ
3||Ø
4||Å,å,Æ,æ,Ø,ø
5||ø, Ø
6||Þ
7||Ð
8||Ã…
9||æ
(Wed Dec 07 15:32:51 2016) Command finished (0 sec)
}}}
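
The garbled value `Ã…` in row 8 is what the UTF-8 bytes of "Å" look like when they are decoded as cp1252. The short sketch below reproduces that and reverses it. It only illustrates the mechanism and is an assumption about what happens to this row; it does not explain why the other rows are displayed correctly.

```python
original = "Å"
utf8_bytes = original.encode("utf-8")   # two bytes: 0xC3 0x85
misread = utf8_bytes.decode("cp1252")   # each byte read as one cp1252 character

# Reversing the misreading recovers the original character.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes, misread, repaired)
```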

I used the stand-alone installer for both GRASS 7.0.5 and GRASS 7.2.0svn,
if it matters.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:3>
GRASS GIS <https://grass.osgeo.org>

Comment (by mlennert):

Replying to [comment:1 martinl]:
> Import (`v.import/v.in.ogr`) with `encoding=cp1252` will not help?

Looking at the file, I do not have the feeling that it is in cp1252, but
rather in utf-8, so IIUC the parameter setting for v.in.ogr should be
encoding=utf-8.
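
A rough way to check such a guess, sketched here under the assumption that the attribute bytes can be read out of the file: text that decodes strictly as UTF-8 and contains non-ASCII characters is very likely UTF-8, while cp1252 bytes for accented letters usually fail that test. The function `guess_encoding` is a hypothetical helper, not part of GRASS or GDAL, and pure ASCII input passes as UTF-8 too.

```python
def guess_encoding(raw, fallback="cp1252"):
    """Return 'utf-8' if raw decodes strictly as UTF-8, else the fallback.

    Heuristic only: it cannot tell cp1252 from other single-byte code pages.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


utf8_sample = "ÄÖÅ".encode("utf-8")
cp1252_sample = "ÄÖÅ".encode("cp1252")
print(guess_encoding(utf8_sample), guess_encoding(cp1252_sample))
```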

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:4>
GRASS GIS <https://grass.osgeo.org>

Comment (by hellik):

Replying to [comment:4 mlennert]:
> Replying to [comment:1 martinl]:
> > Import (`v.import/v.in.ogr`) with `encoding=cp1252` will not help?
>
> Looking at the file, I do not have the feeling that it is in cp1252, but
> rather in utf-8, so IIUC the parameter setting for v.in.ogr should be
> encoding=utf-8.

I also tried it with UTF-8; it fails here too.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:5>
GRASS GIS <https://grass.osgeo.org>

Changes (by hellik):

* Attachment "qgis_shapefile_cp1252.zip" added.

QGIS-generated cp1252 example shapefile.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220>
GRASS GIS <https://grass.osgeo.org>

Comment (by hellik):

Replying to [comment:4 mlennert]:
> Replying to [comment:1 martinl]:
> > Import (`v.import/v.in.ogr`) with `encoding=cp1252` will not help?
>
> Looking at the file, I do not have the feeling that it is in cp1252, but
> rather in utf-8, so IIUC the parameter setting for v.in.ogr should be
> encoding=utf-8.

I have now added a QGIS-generated (hopefully) cp1252 example shapefile. This
one also fails here on a self-compiled Linux GRASS trunk.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:6>
GRASS GIS <https://grass.osgeo.org>

Comment (by marisn):

A comment without looking into actual code.
It is necessary to provide clear info on reproducing the issue. Crucial
info is:
* Windows locale (will influence assumed encoding);
* The mechanism of executing example command (CMD.exe will have different
encoding than other places. Think ANSI vs OEM).
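
To illustrate the ANSI vs OEM point: the same character has different byte values in the two families of code pages, so bytes written under one and read under the other change meaning. The sketch assumes cp1252 as the ANSI code page and cp850 as the OEM code page, which is a common Western European pairing but depends on the Windows locale.

```python
text = "Å"

ansi_bytes = text.encode("cp1252")  # assumed ANSI code page
oem_bytes = text.encode("cp850")    # assumed OEM (console) code page

# Bytes produced under the ANSI code page, interpreted by an OEM console.
ansi_read_as_oem = ansi_bytes.decode("cp850")

print(ansi_bytes, oem_bytes, repr(ansi_read_as_oem))
```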

Some related reading:
https://trac.osgeo.org/grass/ticket/2525#comment:1
https://trac.osgeo.org/grass/ticket/2120#comment:10
http://stackoverflow.com/a/17177904
https://bugs.python.org/issue6135

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:7>
GRASS GIS <https://grass.osgeo.org>

Comment (by razz):

Replying to [comment:7 marisn]:
> ... Crucial info is:
> * Windows locale (will influence assumed encoding);
> * The mechanism of executing example command (CMD.exe will have
different encoding than other places. Think ANSI vs OEM).

Here is what I've got for the Windows locale:

{{{
systeminfo
System Locale: sv;Svenska
Input Locale: sv;Svenska
}}}

Originally, the console code page was:

{{{
chcp
850
}}}

But since it's not working, I tried using

{{{
chcp 1252
}}}
and the Nordic OEM:

{{{
chcp 865
}}}

before importing, in the cmd and from within GRASS in the command console.
Nothing really changed.
The rest of the reading I did was way over my head, sorry. But here's a
link to the original shape file I'm having issues with. I can't attach it
here because it's a bit over 2MB and I'm afraid that taking a sample from
it and exporting it might change the encoding on export:
https://www.dropbox.com/s/2ptgaf5owco63f0/stockholm.zip?dl=0 (the link
should be valid for a month).
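
The console code page reported by `chcp` is not necessarily the encoding a Python process assumes, so it can help to report both when describing the setup. A minimal sketch using only standard-library calls; the function name `report_encodings` is made up here, and the values it returns depend on the platform and on the Python version.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python assumes in the current process."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in report_encodings().items():
    print(f"{name}: {value}")
```

Running this from the GRASS command console and from a plain cmd.exe window would show whether the two environments disagree.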

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:8>
GRASS GIS <https://grass.osgeo.org>

#3220: WinGRASS not recognizing accented utf-8 (nor cp1252) attribute values
--------------------------+---------------------------------
  Reporter: hellik | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.4
Component: Default | Version: svn-releasebranch72
Resolution: | Keywords:
       CPU: Unspecified | Platform: MSWindows 8
--------------------------+---------------------------------

Comment (by hellik):

see also #3925

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3220#comment:14>
GRASS GIS <https://grass.osgeo.org>