[GRASS-dev] fonts and character encoding

Do I no longer need to provide a way to manually set character encoding with default display fonts (i.e., it accompanies the font)?

Michael


Michael Barton, Professor of Anthropology
School of Human Evolution & Social Change
Center for Social Dynamics and Complexity
Arizona State University

phone: 480-965-6213
fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

Michael Barton wrote:

Do I no longer need to provide a way to manually set character encoding with
default display fonts (i.e., it accompanies the font)?

Each font has a default encoding, but the user may wish to specify a
different encoding.

Also, it makes a difference where the text is coming from.

If it is being read from a file, the encoding needs to match that used
in the file. But display commands which read text from files should
really have an encoding= option (well, it tends to be named charset=;
if we're going to fix that, it should be sooner rather than later).

If the text is passed as a command-line argument, the encoding needs
to match whatever the GUI uses. Both Python and Tcl/Tk use Unicode
internally, but (I think) use the locale's encoding when passing text
to system functions which expect a char*, e.g. execve()'s argv
parameter.

So, I think that the GUI should probably set GRASS_FT_ENCODING to the
locale's encoding (e.g. from "locale charmap").

--
Glynn Clements <glynn@gclements.plus.com>

On 5/12/07 4:44 AM, "Glynn Clements" <glynn@gclements.plus.com> wrote:

So, I think that the GUI should probably set GRASS_FT_ENCODING to the
locale's encoding (e.g. from "locale charmap").

You mean instead of having the user set it? How do I access locale charmap
from TclTk or wxPython?

Michael

Michael Barton wrote:

> So, I think that the GUI should probably set GRASS_FT_ENCODING to the
> locale's encoding (e.g. from "locale charmap").

You mean instead of having the user set it?

Yes.

AFAICT:

Tcl and Python both use Unicode internally, and automatically convert
to the locale's encoding when calling system functions. So if the user
types a value for e.g. "d.text text=..." into the GUI, the sequence of
bytes which ends up in the corresponding argv will be in the locale's
encoding.

At least, that's the default. With Tcl, you can change the default
encoding with "encoding system <name>", where <name> is one of the
values returned by "encoding names". Also, when reading/writing
streams (files or pipes), you can change the encoding for that
particular stream with "fconfigure $fh -encoding ...".

But, in general, the GUI should probably stick with the locale's
encoding, as that's what external commands will normally be expecting.
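
To make the argv point concrete, here is a small Python sketch. It is not taken from the GRASS sources; the sample string and the three encodings are arbitrary choices for illustration. It shows that the same text becomes a different byte sequence depending on which encoding is in effect, which is why the font encoding has to match:

```python
# Illustrative only: the sample text and the encodings are arbitrary choices.
text = "25°C"


def argv_bytes(s, encoding):
    """Return the bytes a child process would receive for s in this encoding."""
    return s.encode(encoding)


for name in ("UTF-8", "ISO-8859-1", "cp850"):
    # The degree sign is two bytes in UTF-8 and one (different) byte
    # in each of the other two encodings.
    print(name, argv_bytes(text, name))
```

A command that receives these bytes and interprets them with a different encoding will draw the wrong glyph (or reject the string).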

How do I access locale charmap from TclTk or wxPython?

Tcl:
  set encoding [exec locale charmap]
Python:
  from subprocess import Popen, PIPE
  encoding = Popen(["locale", "charmap"], stdout=PIPE).communicate()[0].decode().strip()

AFAICT, the resulting value will be valid for iconv(), and thus for
$GRASS_FT_ENCODING.
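
A slightly fuller Python sketch of the same idea follows, for a GUI that wants to pass the value on to the commands it launches. The helper names locale_charmap() and child_environment() are hypothetical, and the fallback to Python's locale module is an assumption added here, not part of the suggestion above:

```python
import locale
import os
from subprocess import Popen, PIPE


def locale_charmap():
    """Return the locale's charmap name, e.g. 'UTF-8' or 'ISO-8859-1'."""
    try:
        out = Popen(["locale", "charmap"], stdout=PIPE).communicate()[0]
        name = out.decode("ascii", "replace").strip()
    except OSError:
        # No "locale" program on this system (e.g. Windows).
        name = ""
    # Assumed fallback: ask Python's own locale module instead.
    return name or locale.getpreferredencoding(False)


def child_environment(environ=None):
    """Return a copy of the environment with GRASS_FT_ENCODING set."""
    env = dict(os.environ if environ is None else environ)
    env["GRASS_FT_ENCODING"] = locale_charmap()
    return env


print(child_environment()["GRASS_FT_ENCODING"])
```

The returned dictionary could then be handed to Popen(..., env=...) when the GUI starts a display command, so the user's own environment is left untouched.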

Needless to say, this won't work on Windows. You can get the Windows
codepage by parsing the output from "mode con cp /status", although
there might be better solutions. iconv appears to accept cp??? for
most of the common ones (e.g. cp437 = US, cp850 = western Europe,
etc).

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

Tcl and Python both use Unicode internally, and automatically convert
to the locale's encoding when calling system functions. So if the user
types a value for e.g. "d.text text=..." into the GUI, the sequence of
bytes which end up in the corresponding argv will be in the locale's
encoding.

..

But, in general, the GUI should probably stick with the locale's
encoding, as that's what external commands will normally be expecting.

..

Needless to say, this won't work on Windows. You can get the Windows
codepage by parsing the output from "mode con cp /status", although
there might be better solutions. iconv appears to accept cp??? for
most of the common ones (e.g. cp437 = US, cp850 = western Europe,
etc).

I was thinking about this WRT the meta-data systems.

e.g. say a raster map's units are °C or kg/m³ or µg/l.

Could "r.support units=" support that or would a user have to edit the
cell_misc/$MAP/units file themselves? And if they did edit it by hand,
would the (new) file be compatible with what GRASS expects? Would
d.legend, GUI profile tool, etc, transfer the degree symbol (or whatever)
unharmed? Could the map be transferred to another (foreign) system intact?

Hamish

Hamish wrote:

> Tcl and Python both use Unicode internally, and automatically convert
> to the locale's encoding when calling system functions. So if the user
> types a value for e.g. "d.text text=..." into the GUI, the sequence of
> bytes which end up in the corresponding argv will be in the locale's
> encoding.
..
> But, in general, the GUI should probably stick with the locale's
> encoding, as that's what external commands will normally be expecting.
..
> Needless to say, this won't work on Windows. You can get the Windows
> codepage by parsing the output from "mode con cp /status", although
> there might be better solutions. iconv appears to accept cp??? for
> most of the common ones (e.g. cp437 = US, cp850 = western Europe,
> etc).

I was thinking about this WRT the meta-data systems.

e.g. say a raster map's units are °C or kg/m³ or µg/l.

Could "r.support units=" support that or would a user have to edit the
cell_misc/$MAP/units file themselves?

No reason why not.

And if they did edit it by hand,
would the (new) file be compatible with what GRASS expects?

What does GRASS expect?

Would
d.legend, GUI profile tool, etc, transfer the degree symbol (or whatever)
unharmed? Could the map be transferred to another (foreign) system intact?

d.* commands will work so long as the encoding used in the units file
matches the current font encoding ($GRASS_FT_ENCODING or "d.font
charset=...").

If the strings are used by the GUI, they'll probably need to be in the
locale's encoding. If they are simply passed through, they need to at
least be valid in the locale's encoding (everything is valid in
ISO-8859-1, but not all strings are valid in UTF-8).
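
The difference in validity can be checked with a short Python sketch. The helper is_valid() and the sample units string are made up for illustration:

```python
def is_valid(data, encoding):
    """Return True if the byte string decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


units_latin1 = "°C".encode("ISO-8859-1")  # degree sign stored as one byte
units_utf8 = "°C".encode("UTF-8")         # degree sign stored as two bytes

# Any byte sequence decodes as ISO-8859-1, even one written as UTF-8
# (the result is just the wrong characters).
print(is_valid(units_latin1, "ISO-8859-1"))
print(is_valid(units_utf8, "ISO-8859-1"))

# The ISO-8859-1 bytes are not a valid UTF-8 sequence.
print(is_valid(units_latin1, "UTF-8"))
```

So a units file written in ISO-8859-1 can fail outright when read in a UTF-8 locale, while the reverse case passes through silently but displays incorrectly.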

Most of GRASS doesn't deal with characters, just bytes. The only time
that the encoding matters is when passing strings to something which
wants to convert them to Unicode, e.g. FreeType, Tcl, Python.

--
Glynn Clements <glynn@gclements.plus.com>

On 5/12/07 2:08 PM, "Glynn Clements" <glynn@gclements.plus.com> wrote:

How do I access locale charmap from TclTk or wxPython?

Tcl:
set encoding [exec locale charmap]
Python:
encoding = Popen(["locale", "charmap"], stdout=PIPE).communicate()[0].strip()

AFAICT, the resulting value will be valid for iconv(), and thus for
$GRASS_FT_ENCODING.

Needless to say, this won't work on Windows.

I'm reluctant to implement code that doesn't work with Windows.

You can get the Windows
codepage by parsing the output from "mode con cp /status", although
there might be better solutions. iconv appears to accept cp??? for
most of the common ones (e.g. cp437 = US, cp850 = western Europe,
etc).

Sorry to bother, but how can I get this information from within TclTk or
wxPython?

Michael

On 5/13/07, Michael Barton <michael.barton@asu.edu> wrote:

On 5/12/07 2:08 PM, "Glynn Clements" <glynn@gclements.plus.com> wrote:

[...]

> You can get the Windows
> codepage by parsing the output from "mode con cp /status", although
> there might be better solutions. iconv appears to accept cp??? for
> most of the common ones (e.g. cp437 = US, cp850 = western Europe,
> etc).

Sorry to bother, but how can I get this information from within TclTk or
wxPython?

system("mode con cp /status > some_temp_name_probably_from_g.temp")

then open and parse it?

Daniel.

--
-- Daniel Calvelo Aros

On Sun, 13 May 2007, Michael Barton wrote:

You can get the Windows
codepage by parsing the output from "mode con cp /status", although
there might be better solutions. iconv appears to accept cp??? for
most of the common ones (e.g. cp437 = US, cp850 = western Europe,
etc).

Sorry to bother, but how can I get this information from within TclTk or
wxPython?

Well, on my Windows system the output from running "mode con cp /status" is:

Status for device CON:
----------------------
     Code page: 850

That includes one blank line beforehand and two afterwards, in case that makes any difference to the parsing.

Paul

Michael Barton wrote:

On 5/12/07 2:08 PM, "Glynn Clements" <glynn@gclements.plus.com> wrote:

>> How do I access locale charmap from TclTk or wxPython?
>
> Tcl:
> set encoding [exec locale charmap]
> Python:
> encoding = Popen(["locale", "charmap"], stdout=PIPE).communicate()[0].strip()
>
> AFAICT, the resulting value will be valid for iconv(), and thus for
> $GRASS_FT_ENCODING.
>
> Needless to say, this won't work on Windows.

I'm reluctant to implement code that doesn't work with Windows.

> You can get the Windows
> codepage by parsing the output from "mode con cp /status", although
> there might be better solutions. iconv appears to accept cp??? for
> most of the common ones (e.g. cp437 = US, cp850 = western Europe,
> etc).

Sorry to bother, but how can I get this information from within TclTk or
wxPython?

  set str [exec mode con cp /status]
  regexp {[0-9]+} $str num
  set encoding cp$num
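
For the wxPython side, a rough Python equivalent of the Tcl above might look like the following. The function names are hypothetical. Like the Tcl version, it assumes that the first run of digits in the output is the code page number, which holds for the sample output posted earlier in this thread but has not been checked against localized versions of Windows:

```python
import re
from subprocess import Popen, PIPE


def parse_codepage(status_text):
    """Turn "mode con cp /status" output into a name such as 'cp850'."""
    match = re.search(r"[0-9]+", status_text)
    if match is None:
        return None
    return "cp" + match.group(0)


def windows_encoding():
    """Run the command through the shell (Windows only) and parse the result."""
    proc = Popen("mode con cp /status", shell=True, stdout=PIPE)
    out = proc.communicate()[0]
    return parse_codepage(out.decode("ascii", "replace"))


# Sample output as posted earlier in this thread.
sample = "\nStatus for device CON:\n----------------------\n    Code page: 850\n\n\n"
print(parse_codepage(sample))
```

Keeping the parsing separate from the command call means the parser can be tried on captured output without needing a Windows machine.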

--
Glynn Clements <glynn@gclements.plus.com>

Thanks. I can use TclTk to find out what kind of system I'm on and implement
this or the other code to find out the encoding.

Michael

On 5/14/07 12:32 AM, "Glynn Clements" <glynn@gclements.plus.com> wrote:

Sorry to bother, but how can I get this information from within TclTk or
wxPython?

set str [exec mode con cp /status]
regexp {[0-9]+} $str num
set encoding cp$num
