[GRASS-dev] wxgui encoding issues

Hi all,

I'm sending this to the mailing list because I'm not sure where the discussions about encoding issues in the wxgui currently stand (I know Maris has been active on the Windows front). If my problems turn out to be real bugs, I'll file the relevant tickets.

In every version I have tried, I get the following encoding errors in the GUI (my machine is configured with the fr_BE.UTF-8 locale):

- Entering r.reclass rules containing accents interactively gives me:

"Traceback (most recent call last):
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gui_core/forms.py", line 1738, in OnFileText
     f.write(text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)"

The same rules work without problems when read from a file.

To reproduce, here are example reclass rules for the NC landclass96 raster:

1 = 1 développé
2 = 2 agricole
3 = 3 herbacé
4 = 4 broussaille
5 = 5 forêt
6 = 6 eau
7 = 7 sédiment

- Trying to save any model from the graphical modeller, even a completely empty one, gives me:

Traceback (most recent call last):
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gmodeler/frame.py", line 868, in WriteModelFile
     WriteModelFile(fd = tmpfile, model = self.model)
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gmodeler/model.py", line 1758, in __init__
     self._properties()
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gmodeler/model.py", line 1807, in _properties
     self.fd.write('%s<name>%s</name>\n' % (' ' * self.indent, self.properties['name']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 17: ordinal not in range(128)

This second issue seems to come from line 58 in gui/wxpython/gmodeler/model.py:

self.properties = { 'name' : _("model"),

Does this property really have to be translatable? If I change line 1807 mentioned in the traceback to

             self.fd.write('%s<name>%s</name>\n' % (' ' * self.indent, self.properties['name'].encode('UTF-8')))

it works. What I don't understand is why I don't have to do this for the description on the next line, which also contains an accent in French.

- In all versions except 6.4.2, when I try to import a file that resides in a directory with an accent in its name ("données"), I see this in the file text field of the import wizard after browsing to the file:

/home/mlennert/Desktop/données/boundary_county.shp

Exploring this a bit further, I see the following behaviour:

- My own locale is defined as follows:

LANG=fr_BE.UTF-8
LANGUAGE=
LC_CTYPE="fr_BE.UTF-8"
LC_NUMERIC="fr_BE.UTF-8"
LC_TIME="fr_BE.UTF-8"
LC_COLLATE="fr_BE.UTF-8"
LC_MONETARY="fr_BE.UTF-8"
LC_MESSAGES="fr_BE.UTF-8"
LC_PAPER="fr_BE.UTF-8"
LC_NAME="fr_BE.UTF-8"
LC_ADDRESS="fr_BE.UTF-8"
LC_TELEPHONE="fr_BE.UTF-8"
LC_MEASUREMENT="fr_BE.UTF-8"
LC_IDENTIFICATION="fr_BE.UTF-8"
LC_ALL=

- When I start 6.4.2 and then type 'locale' at the command prompt, I get exactly the same output.

- But when I start 6.4.3RC1, I see:

LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

and the GUI starts in English.

When I set LC_ALL=fr_BE.UTF-8 before starting GRASS 6.4.3RC1 and then type 'locale' at the command prompt, I get

LANG=
LANGUAGE=
LC_CTYPE="fr_BE.UTF-8"
LC_NUMERIC="fr_BE.UTF-8"
LC_TIME="fr_BE.UTF-8"
LC_COLLATE="fr_BE.UTF-8"
LC_MONETARY="fr_BE.UTF-8"
LC_MESSAGES="fr_BE.UTF-8"
LC_PAPER="fr_BE.UTF-8"
LC_NAME="fr_BE.UTF-8"
LC_ADDRESS="fr_BE.UTF-8"
LC_TELEPHONE="fr_BE.UTF-8"
LC_MEASUREMENT="fr_BE.UTF-8"
LC_IDENTIFICATION="fr_BE.UTF-8"
LC_ALL=fr_BE.UTF-8

and the GUI starts in French. I can then import the file from the directory with accents, but the problems with saving the model and with entering reclass rules with accents interactively remain.
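The three `locale` listings above match what the standard POSIX precedence rules predict: a non-empty LC_ALL overrides everything, then the category's own variable (LC_CTYPE, LC_MESSAGES, ...), then LANG; only if all of these are empty does the default locale (shown as "POSIX" above) apply. The following minimal Python sketch models that rule. `effective_locale` is a hypothetical helper written for illustration, and the three environments are reconstructed from the listings, not read from Init.sh.

```python
# Minimal model of the POSIX rule for choosing the locale of one category.
# LANGUAGE is deliberately ignored: it is a GNU gettext extension that
# only influences which message translations are used.

def effective_locale(env, category):
    """Return the locale name in effect for `category` (e.g. "LC_CTYPE")."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:  # a variable that is set but empty counts as unset
            return value
    return "POSIX"  # default locale, as reported by `locale` above


# Hypothetical environments reconstructed from the listings above:
inherited = {"LANG": "fr_BE.UTF-8", "LANGUAGE": "", "LC_ALL": ""}  # 6.4.2
cleared = {"LANG": "", "LANGUAGE": "", "LC_ALL": ""}               # 6.4.3RC1
forced = dict(cleared, LC_ALL="fr_BE.UTF-8")                       # 6.4.3RC1 + LC_ALL

for label, env in (("6.4.2", inherited), ("6.4.3RC1", cleared), ("LC_ALL set", forced)):
    print(label, effective_locale(env, "LC_CTYPE"), effective_locale(env, "LC_MESSAGES"))
```

This prints `fr_BE.UTF-8` for both categories in the first and third cases and `POSIX` in the second. Under this rule, an empty LANG with no LC_* overrides would be enough to produce the POSIX listing, and with LC_MESSAGES in the POSIX locale gettext falls back to the untranslated (English) messages.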

I see some differences in the way LANG and LANGUAGE are handled in Init.sh, but I don't really understand where that comes into play for the GUI.

In any case, these encoding errors are quite an issue for my students...

So, if anyone can point me to existing discussions or bug tickets that I have failed to find, I'm more than happy to document these issues there. Or I can file a new ticket. Unless I'm just doing something wrong here...

Moritz

Moritz Lennert wrote:

Using any version, I have the following encoding issues in the GUI
(knowing that my machine is configured with locale fr_BE.UTF-8) :

- r.reclass interactive input with accents gives me

"Traceback (most recent call last):
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gui_core/forms.py", line 1738, in OnFileText
     f.write(text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)"

File objects will normally[1] use ASCII as the encoding, regardless of
any environment settings. Writing a unicode object to a file will
convert it to a byte string according to the file's encoding (or, if it
doesn't have one, according to the default encoding, which is normally
ASCII).

[1] One exception is that a tty will automatically use the
environment's encoding.
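The r.reclass traceback can be reproduced outside the GUI. In Python 3 terms, where `str` holds Unicode text, encoding the first of the example rules with the ASCII codec fails at exactly the reported position. This is a sketch only: `first_unencodable` is a hypothetical helper, and the rule text is copied from the reclass example earlier in the thread.

```python
def first_unencodable(text, encoding="ascii"):
    """Return (position, character) for the first character that `encoding`
    cannot represent, or None if the whole text encodes cleanly."""
    try:
        # the conversion that writing Unicode text to an ASCII file implies
        text.encode(encoding)
    except UnicodeEncodeError as err:
        return err.start, text[err.start]
    return None


rule = "1 = 1 développé"
print(first_unencodable(rule))           # (7, 'é'): "1 = 1 d" fills positions 0-6
print(first_unencodable(rule, "utf-8"))  # None: UTF-8 can represent every character
```

The position 7 and the character u'\xe9' (é) are the same ones reported in the traceback.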

The same data in a file works without issues.

If you tell r.reclass to read from a file, Python doesn't get
involved, and the data stays as bytes.

Data entered via wxWidgets will originate as Unicode. The
environment's encoding only matters if the Python code explicitly
makes it matter.

Python's built-in conversions between "str" and "unicode" types use
the default encoding, which is independent of the environment (it can
be set in sitecustomize.py, but cannot be changed after start-up).

If you want to read/write Unicode data from/to files (including pipes,
etc), you either need to use codecs.open() instead of the built-in
open() or file(), or explicitly use the str.decode() and/or
unicode.encode() methods.
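The two options can be sketched in Python 3 as follows. This is an illustration under assumptions: UTF-8 is chosen as the target encoding, and the file names and helper functions are hypothetical. In Python 3, the built-in open(path, "w", encoding=...) serves the same purpose as codecs.open().

```python
import codecs
import os
import tempfile

RULES = "1 = 1 développé\n3 = 3 herbacé\n7 = 7 sédiment\n"


def write_with_codecs(path, text, encoding="utf-8"):
    # codecs.open() wraps the file so that write() encodes the text itself
    with codecs.open(path, "w", encoding=encoding) as f:
        f.write(text)


def write_with_encode(path, text, encoding="utf-8"):
    # plain binary file: the caller turns the text into bytes explicitly
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
via_codecs = os.path.join(tmpdir, "rules_codecs.txt")
via_encode = os.path.join(tmpdir, "rules_encode.txt")
write_with_codecs(via_codecs, RULES)
write_with_encode(via_encode, RULES)

print(read_bytes(via_codecs) == read_bytes(via_encode))  # True
print(read_bytes(via_codecs).decode("utf-8") == RULES)   # True
```

codecs.open() always opens the underlying file in binary mode, so both files contain byte-for-byte the same UTF-8 data, which is what r.reclass would then read.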

Most GRASS libraries and modules don't care about the encoding; it's
all just bytes. If you use non-ASCII characters in a reclass file,
categories, etc., the file may not be interpreted correctly on another
system with a different encoding.

--
Glynn Clements <glynn@gclements.plus.com>

On 08/11/12 13:42, Moritz Lennert wrote:

- Trying to save any model from the graphical modeller, even a
completely empty one gives me:

Traceback (most recent call last):
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gmodeler/frame.py", line 868, in WriteModelFile
     WriteModelFile(fd = tmpfile, model = self.model)
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gmodeler/model.py", line 1758, in __init__
     self._properties()
   File "/home/mlennert/SRC/GRASS/grass_trunk/dist.i686-pc-linux-gnu/etc/gui/wxpython/gmodeler/model.py", line 1807, in _properties
     self.fd.write('%s<name>%s</name>\n' % (' ' * self.indent, self.properties['name']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 17: ordinal not in range(128)

This second issue seems to come from line 58 in
gui/wxpython/gmodeler/model.py:

self.properties = { 'name' : _("model"),

Does this property really have to be translatable? If I change line
1807 mentioned in the traceback to

self.fd.write('%s<name>%s</name>\n' % (' ' * self.indent, self.properties['name'].encode('UTF-8')))

it works. What I don't understand is why I don't have to do this for
the description on the next line, which also contains an accent in French.

I see an attempt at solving this in r52329 (and subsequent merges into other branches):

@@ -2001,9 +2001,11 @@
  # DATE: %s
  #
-#############################################################################

-""" % (properties['name'],
- properties['author'],
- properties['description'],
- time.asctime()))
+#%s
+""" % ('#' * 79,
+ properties['name'],
+ EncodeString(properties['author']),
+ EncodeString('\n# '.join(properties['description'].splitlines())),
+ time.asctime(),
+ '#' * 79))

          self.fd.write(

Any reason why you do not encode 'name'? This also causes an error.

Moritz

Hi,

2012/11/19 Moritz Lennert <mlennert@club.worldonline.be>:

[...]

Any reason why you do not encode 'name'? This also causes an error.

Not really, I just expected that 'name' would not contain any
non-ASCII characters. Changed in r53917.

Martin

--
Martin Landa <landa.martin gmail.com> * http://geo.fsv.cvut.cz/~landa

On 19/11/12 18:05, Martin Landa wrote:

Hi,

2012/11/19 Moritz Lennert<mlennert@club.worldonline.be>:

[...]

Any reason why you do not encode 'name'? This also causes an error.

Not really, I just expected that 'name' would not contain any
non-ASCII characters.

Its default value is 'Model', and it is marked as a translatable string in the code. In French, 'Model' is 'Modèle'...

More generally, I think we should always expect users to use the most "natural" language, i.e. the one they speak and write in everyday life. My students constantly use accents, spaces and the like in file and directory names, even though I tell them not to. People just don't want to have to think differently for a program; they expect programs to adjust to their needs...

Changed in r53917.

Thanks! Can this also be applied to the 6.4 release branch?

Moritz

Hi,

2012/11/21 Moritz Lennert <mlennert@club.worldonline.be>:

[...]

Changed in r53917.

Thanks! Can this also be applied to the 6.4 release branch?

done in r53946. Martin

--
Martin Landa <landa.martin gmail.com> * http://geo.fsv.cvut.cz/~landa

On Wed, November 21, 2012 10:25, Martin Landa wrote:

[...]

done in r53946. Martin

Thanks! Now I hit the next problem along the same lines:

When I try to save the model (with the default French name and
description, which contain accents) as a Python script, I get:

Traceback (most recent call last):
  File "/usr/lib/grass64/etc/wxpython/gmodeler/frame.py", line 1593, in OnSaveAs
    self.SaveAs(force = False)
  File "/usr/lib/grass64/etc/wxpython/gmodeler/frame.py", line 1582, in SaveAs
    fd.write(self.body.GetText())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 126: ordinal not in range(128)

Moritz

On Wed, November 21, 2012 14:12, Moritz Lennert wrote:

[...]

And another one, this time because the variable type
'character chain' is translated as 'chaîne de caractères':

Traceback (most recent call last):
  File "/usr/lib/grass64/etc/wxpython/gmodeler/dialogs.py", line 706, in OnEndEdit
    self.parent.UpdateModelVariables()
  File "/usr/lib/grass64/etc/wxpython/gmodeler/frame.py", line 1413, in UpdateModelVariables
    variables[name] = { 'type' : str(values[1]) }
UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 3: ordinal not in range(128)

Moritz

On Wed, November 21, 2012 14:20, Moritz Lennert wrote:

[...]
And another one, this time due to the fact that the variable type
'character chain' is translated into 'chaîne de caractères':

Sorry, the original is obviously 'string', not 'character chain'. We can
probably require variable names to be free of such characters, but
variable descriptions, at least, are likely to contain special
characters as well.
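Since the failing call is `str(values[1])` applied to the translated label, one possible approach is to store an untranslated type identifier and translate only what is displayed, mapping the user's choice back to the identifier. A minimal sketch; the list of types and the dictionary standing in for gettext's `_()` are hypothetical and not taken from the modeler code:

```python
# Hypothetical stand-in for gettext's _(): only "string" has a French label here.
FR_LABELS = {"string": "chaîne de caractères"}


def _(msgid):
    return FR_LABELS.get(msgid, msgid)


# Hypothetical internal identifiers: plain ASCII, never translated.
VARIABLE_TYPES = ("integer", "float", "string")


def display_labels():
    """Labels to show in the variables dialog."""
    return [_(t) for t in VARIABLE_TYPES]


def type_from_label(label):
    """Map a displayed (possibly translated) label back to its identifier."""
    lookup = {_(t): t for t in VARIABLE_TYPES}
    return lookup[label]


print(display_labels())                         # ['integer', 'float', 'chaîne de caractères']
print(type_from_label("chaîne de caractères"))  # string
```

With this split, only ASCII identifiers are stored as the variable type, while free text such as descriptions still needs explicit encoding when it is written out.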

Moritz