[GRASS-dev] g.mlist, g.mremove fail in non-English locale

  My locale is:

$ locale
LANG=pl_PL.UTF-8
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=

  In this locale g.mlist is broken. E.g. the command:

g.mlist rast pat=* sep=,

  prints the following:

dostępnych,mapsecie,N51E015.hgt,N51E016.hgt,<PERMANENT>:,plików,raster,w

  instead of only:

N51E015.hgt,N51E016.hgt

This also breaks g.mremove in turn.

It looks like the Polish translation "raster plików dostępnych w mapsecie <PERMANENT>:", used in g.list, gets mixed into the g.mlist output. What's wrong?

Maciek

Maciej Sieczka wrote:

> In this locale g.mlist is broken. E.g. the command:
>
> g.mlist rast pat=* sep=,
>
> prints the following:
>
> dostępnych,mapsecie,N51E015.hgt,N51E016.hgt,<PERMANENT>:,plików,raster,w
>
> instead of only:
>
> N51E015.hgt,N51E016.hgt
>
> This also breaks g.mremove in turn.
>
> It looks like the Polish translation "raster plików dostępnych w
> mapsecie <PERMANENT>:", used in g.list, gets mixed into g.mlist output.
> What's wrong?

What's wrong is a matter of perspective.

In the immediate sense, the problem is that g.mlist removes the header
lines with:

  | grep -vE '^-+$|files available' \

which doesn't allow for NLS. Checking for e.g. ">:$" would probably be
better (although technically both > and : are legal in map names, it
seems unlikely to occur in practice).
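To see why the "files available" match fails in a translated locale and why a ">:$" suffix test does not, here is a small POSIX sh sketch. It does not call GRASS: fake_glist is a hypothetical stand-in whose layout (dashed rules, a header ending in "<MAPSET>:", space-separated names) only approximates real g.list output, and to_csv is a rough imitation of what g.mlist does with the names, not its actual code.

```sh
#!/bin/sh
# Hypothetical stand-in for g.list output. The layout is an approximation:
# dashed rules around a header line ending in "<MAPSET>:", names on one line.
fake_glist() {
    printf '%s\n' \
        '----------------------------------------------' \
        "$1" \
        'N51E015.hgt  N51E016.hgt' \
        '----------------------------------------------'
}

# The filter quoted above: it only recognises the English header.
filter_english() {
    grep -vE '^-+$|files available'
}

# Locale-independent alternative: drop rules and any line ending in ">:".
filter_suffix() {
    grep -vE '^-+$|>:$'
}

# Rough imitation of "g.mlist sep=,": one word per line, sorted, comma-joined.
to_csv() {
    tr -s ' ' '\n' | grep -v '^$' | sort | paste -s -d, -
}

en='raster files available in mapset <PERMANENT>:'
pl='raster plików dostępnych w mapsecie <PERMANENT>:'

fake_glist "$en" | filter_english | to_csv   # N51E015.hgt,N51E016.hgt
fake_glist "$pl" | filter_english | to_csv   # header words leak into the list
fake_glist "$pl" | filter_suffix | to_csv    # N51E015.hgt,N51E016.hgt
```

With the Polish header, the first filter lets the header words through, which reproduces the leak reported at the top of the thread; the exact order of the leaked words depends on the collation of the locale that sort runs in.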

In a more general sense, the problem is that modules which output
information in machine-readable form are somewhat scarce compared to
modules which output in human-readable form.

--
Glynn Clements <glynn@gclements.plus.com>

Maciej Sieczka wrote:

> In this locale g.mlist is broken. E.g. the command:
>
> g.mlist rast pat=* sep=,
>
> prints the following:
>
> dostępnych,mapsecie,N51E015.hgt,N51E016.hgt,<PERMANENT>:,plików,raster,w
>
> instead of only:
>
> N51E015.hgt,N51E016.hgt
>
> This also breaks g.mremove in turn.
>
> It looks like the Polish translation "raster plików dostępnych w
> mapsecie <PERMANENT>:", used in g.list, gets mixed into g.mlist output.
>
> What's wrong?

The g.mlist script relies on the g.list module being run untranslated.

The immediate fix is to add a couple of lines to the start of the script
temporarily disabling the locale settings, much like is done for scripts
that use awk to stop "," being used as the decimal marker.
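A minimal sketch of what such lines might look like in a POSIX sh script (the exact lines used in GRASS scripts are not shown in this thread, so treat them as an assumption). Setting and exporting LC_ALL overrides LANG and every LC_* variable for all commands the script runs afterwards.

```sh
#!/bin/sh
# Near the top of the script: switch the whole script to the C locale.
# Every command started after this point inherits LC_ALL=C.
LC_ALL=C
export LC_ALL

# A child process (standing in for g.list) now sees the C locale ...
child_locale=$(sh -c 'printf "%s" "$LC_ALL"')

# ... and awk prints "." as the decimal marker whatever the user's
# original LANG / LC_NUMERIC settings were.
half=$(awk 'BEGIN { printf "%.1f", 1 / 2 }')

printf '%s %s\n' "$child_locale" "$half"   # C 0.5
```

The trade-off is that anything the script itself prints through gettext-based tools is also untranslated from that point on, which is one argument for limiting the override to the commands whose output is parsed.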

The long term fix is to rewrite g.mlist not to be a hack.

Hamish


Hamish wrote:

> Maciej Sieczka wrote:
>
>> What's wrong?
>
> The g.mlist script relies on the g.list module being run untranslated.

Ah, right.

> The immediate fix is to add a couple of lines to the start of the script
> temporarily disabling the locale settings, much like is done for scripts
> that use awk to stop "," being used as the decimal marker.

I think this is a good idea for all shell scripts, then. Many of them use grep, sed, tr, etc. as if the output were always in English.

> The long term fix is to rewrite g.mlist not to be a hack.

I'm afraid the issue is not limited to g.mlist.

Maciek

Hamish <hamish_b@yahoo.com> writes:

[...]

>> In this locale g.mlist is broken. E.g. the command:

>> g.mlist rast pat=* sep=,

>> prints the following:

[...]

>> It looks like the Polish translation "raster plików dostępnych w
>> mapsecie <PERMANENT>:", used in g.list, gets mixed into g.mlist
>> output.

>> What's wrong?

> The g.mlist script relies on the g.list module being run
> untranslated.

> The immediate fix is to add a couple of lines to the start of the
> script temporarily disabling the locale settings, much like is done
> for scripts that use awk to stop "," being used as the decimal
> marker.

  It's not even a couple of lines. Just prepending `LC_ALL=C' to
  `g.list' should do the trick, e.g.:

- g.list ...
+ LC_ALL=C g.list ...
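The prefix form relies on a standard shell rule: an assignment written before an external command name goes into that command's environment only. The sketch below checks this with sh -c standing in for g.list, so it runs without GRASS.

```sh
#!/bin/sh
# What the calling script had before (the word "unset" if LC_ALL is not set).
before=${LC_ALL-unset}

# Same shape as "LC_ALL=C g.list ...", with sh standing in for g.list:
# the child process runs with LC_ALL=C ...
seen_by_child=$(LC_ALL=C sh -c 'printf "%s" "$LC_ALL"')

# ... while the script's own LC_ALL is left exactly as it was.
after=${LC_ALL-unset}

printf 'child saw: %s\n' "$seen_by_child"   # child saw: C
printf 'script before/after: %s / %s\n' "$before" "$after"
```

Unlike exporting LC_ALL at the top of the script, this keeps the user's locale for everything else the script runs.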

> The long term fix is to rewrite g.mlist not to be a hack.

  It's rather `g.list' that should be extended to output in
  machine-readable form. (I've already suggested the `-1' and
  `--no-decoration' options to achieve that, though I haven't
  prepared a patch as of yet.)

>> In this locale g.mlist is broken. E.g. the command:

...

>> It looks like the Polish translation "raster plików dostępnych w
>> mapsecie <PERMANENT>:", used in g.list, gets mixed into g.mlist
>> output.

>> What's wrong?

Hamish:

> The g.mlist script relies on the g.list module being run
> untranslated.

Ivan:

- g.list ...
+ LC_ALL=C g.list ...

thanks, committed in r30881 and backported to 6.3.0 in r30882.

many other modules use grep, but g.mlist was/is the worst script for
depending on module decorations. Others to consider are ones that parse
'd.mon -L', g.region without -g, and 'db.connect -p'. I haven't checked
if any of those three use i18n macros; of them, db.connect shouldn't IMO.
Anyway when we come across them the fix is easy to implement. (I don't
like an "apply to all scripts" solution)

>> The long term fix is to rewrite g.mlist not to be a hack.
>
> It's rather `g.list' that should be extended to output in
> machine-readable form. (I've already suggested the `-1' and
> `--no-decoration' options to achieve that, though I haven't
> prepared a patch as of yet.)

right. I wonder how to deal with multiple mapsets? fully qualify
name@mapset for all map names or use a #header line with the mapset name
before each new mapset listing? Probably fully qualify everything: it's
easy enough to strip off the mapset with 's/@.*$//' or `cut -f1 -d@` and
much less weird to deal with.
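A sketch of both stripping idioms applied to a hypothetical qualified listing (qualified_list and its map names are made up for illustration):

```sh
#!/bin/sh
# Hypothetical machine-readable listing: one fully qualified name per line.
qualified_list() {
    printf '%s\n' 'N51E015.hgt@PERMANENT' 'elev@PERMANENT' 'elev@user1'
}

# The two idioms mentioned above; both keep everything before the first "@".
strip_sed() { sed 's/@.*$//'; }
strip_cut() { cut -f1 -d@; }

qualified_list | strip_sed
qualified_list | strip_cut
```

Stripping is lossy: elev@PERMANENT and elev@user1 both become plain "elev", which is the same ambiguity that unqualified g.mlist output has when two mapsets in the search path hold maps of the same name.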

FWIW g.mlist just gives you all map names without qualification, but they
will be in the mapset search path so it only becomes an issue if you have
multiple maps of the same name in the mapset search path.

(g.mremove explicitly limits itself to maps in the current mapset. It
could use 'g.mlist mapset=.' instead of 'g.mlist mapset=`g.gisenv
MAPSET`' but that's just cosmetic)

I worry about using -1 and -l as they can be confused depending on the
font; for many other modules we have used -g to tell the module to create
parsable output. (why -g? I've no idea. but so it is)

Hamish


Hamish <hamish_b@yahoo.com> writes:

[...]

> many other modules use grep, but g.mlist was/is the worst script for
> depending on module decorations. Others to consider are ones that
> parse 'd.mon -L', g.region without -g, and 'db.connect -p'. I haven't
> checked if any of those three use i18n macros. Of them db.connect
> shouldn't IMO. Anyway when we come across them the fix is easy to
> implement. (I don't like an "apply to all scripts" solution)

  For every module formatting its output in a human-readable
  manner by default there should probably be an option to format
  it in a machine-readable one instead.

>>> The long term fix is to rewrite g.mlist not to be a hack.

>> It's rather `g.list' that should be extended to output in
>> machine-readable form. (I've already suggested the `-1' and
>> `--no-decoration' options to achieve that, though I haven't prepared
>> a patch as of yet.)

> right. I wonder how to deal with multiple mapsets? fully qualify
> name@mapset for all map names or use a #header line with the mapset
> name before each new mapset listing? Probably fully qualify
> everything: it's easy enough to strip off the mapset with 's/@.*$//'
> or `cut -f1 -d@` and much less weird to deal with.

  I think it's the right solution.

> FWIW g.mlist just gives you all map names without qualification, but
> they will be in the mapset search path so it only becomes an issue if
> you have multiple maps of the same name in the mapset search path.

  If `g.list' were able to produce fully qualified names,
  `g.mlist' could be made to either strip the mapset or pass
  the names unaltered. It seems better not to alter the
  names unless an option is given.

  It seems to me that the whole reason for producing a list of
  non-qualified names in current `g.mlist' was that the qualified
  list is much harder to produce. With `g.list' changed, it'd be
  straightforward.

> (g.mremove explicitly limits itself to maps in the current mapset. It
> could use 'g.mlist mapset=.' instead of 'g.mlist mapset=`g.gisenv
> MAPSET`' but that's just cosmetic)

> I worry about using -1 and -l as they can be confused depending on
> the font;

  However, this is consistent with other utilities. Consider,
  e.g., ls(1) or enscript(1).

> for many other modules we have used -g to tell the module to create
> parsable output. (why -g? I've no idea. but so it is)

  I'd opt for a set of options, rather than a single one, to
  control the `g.list' output. In particular, I find one-column
  decorated output useful.
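To illustrate why independent switches cover more cases than one "parsable" flag, here is a hypothetical sh formatter. The option names -1 and --no-decoration are the ones proposed in this thread, not an existing g.list interface, and the header text is only modelled on g.list's English message.

```sh
#!/bin/sh
# Hypothetical formatter: usage is
#   format_listing [-1] [--no-decoration] MAPSET NAME...
format_listing() {
    one_column=no
    decorate=yes
    while [ $# -gt 0 ]; do
        case "$1" in
            -1) one_column=yes ;;
            --no-decoration) decorate=no ;;
            *) break ;;
        esac
        shift
    done
    mapset=$1
    shift
    if [ "$decorate" = yes ]; then
        echo '----------------------------------------------'
        echo "raster files available in mapset <$mapset>:"
    fi
    if [ "$one_column" = yes ]; then
        printf '%s\n' "$@"      # one name per line
    else
        echo "$*"               # all names on one space-separated line
    fi
    if [ "$decorate" = yes ]; then
        echo '----------------------------------------------'
    fi
}

# One-column but still decorated: a combination a single flag cannot express.
format_listing -1 PERMANENT N51E015.hgt N51E016.hgt
```

With both switches given, the output is one bare name per line, which is what a script like g.mlist would want to consume.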

Hamish wrote:

> > The long term fix is to rewrite g.mlist not to be a hack.
>
> It's rather `g.list' that should be extended to output in
> machine-readable form. (I've already suggested the `-1' and
> `--no-decoration' options to achieve that, though I haven't
> prepared a patch as of yet.)

> right. I wonder how to deal with multiple mapsets? fully qualify
> name@mapset for all map names or use a #header line with the mapset name
> before each new mapset listing? Probably fully qualify everything: it's
> easy enough to strip off the mapset with 's/@.*$//' or `cut -f1 -d@` and
> much less weird to deal with.

Using header lines requires stateful parsing, which complicates
matters.
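The difference shows up in a small sketch with a hypothetical "# mapset" header format (not something g.list produces): turning it into per-map records needs awk to carry the current mapset from line to line, whereas a qualified listing can be filtered one line at a time.

```sh
#!/bin/sh
# Hypothetical header-line format: a "# MAPSET" line before each group.
header_format() {
    printf '%s\n' '# PERMANENT' 'N51E015.hgt' 'N51E016.hgt' '# user1' 'elev'
}

# Stateful: awk must remember the most recent header ("mapset") to know
# which mapset each following name belongs to.
qualify_from_headers() {
    awk '/^# / { mapset = substr($0, 3); next }
         { print $0 "@" mapset }'
}

# Stateless: with qualified names each line stands alone, so selecting
# one mapset is a plain line filter.
select_mapset() {
    grep "@$1\$"
}

header_format | qualify_from_headers
header_format | qualify_from_headers | select_mapset user1   # elev@user1
```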

For machine readable output, provide an option to output either
qualified or unqualified names. Qualified names should be the default,
IMHO. Unqualified names are a convenience for interactive use, to save
the user from having to type @mapset every time. Where map names are
passed within or between programs, qualified names should always be
used.
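A sketch of that policy, using a hypothetical list_names function and an invented -u flag (neither exists in GRASS): names are printed qualified unless the caller explicitly asks for the short form.

```sh
#!/bin/sh
# Hypothetical lister: qualified names by default; "-u" (invented for this
# illustration) strips "@mapset" for interactive convenience.
list_names() {
    strip=no
    if [ "${1-}" = "-u" ]; then
        strip=yes
        shift
    fi
    for name in "$@"; do
        if [ "$strip" = yes ]; then
            printf '%s\n' "${name%%@*}"   # drop "@" and everything after it
        else
            printf '%s\n' "$name"
        fi
    done
}

list_names N51E015.hgt@PERMANENT elev@user1      # passed through unchanged
list_names -u N51E015.hgt@PERMANENT elev@user1   # short names, on request
```

Making the qualified form the default means a script that forgets the option still gets unambiguous names.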

--
Glynn Clements <glynn@gclements.plus.com>