[GRASS-dev] [GRASS GIS] #745: r.report formatting doesn't seem to be multi-byte aware

#745: r.report formatting doesn't seem to be multi-byte aware
-------------------------+--------------------------------------------------
Reporter: peifer | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: minor | Milestone: 6.4.0
Component: Raster | Version: 6.4.0 RCs
Keywords: | Platform: Linux
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
{{{
| 126|DEG02 - Gera, Kreisfreie Stadt |
| 127|ES617 - Málaga |
| 128|DE915 - Göttingen |
| 129|DE413 - Märkisch-Oderland |
| 130|CH023 - Solothurn |
| 131|UKI22 - Outer London - South |
| 132|UKD42 - Blackpool |
| 133|GR124 - Pella |
| 134|ITC31 - Imperia |
| 135|DK022 - Vest- og Sydsjælland |
| 136|NL422 - Midden-Limburg |
| 137|RO423 - Hunedoara |
| 138|DEA1D - Rhein-Kreis Neuss |
| 139|CH013 - Genève |
| 140|BE256 - Arr. Roeselare |
| 141|FR625 - Lot |
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/745>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

A similar issue has been reported for gawk and mawk. The Debian bug report
includes suggestions:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404980

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/745#comment:1>

Changes (by neteler):

  * milestone: 6.4.0 => 6.4.1

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/745#comment:2>

Comment(by glynn):

Replying to [ticket:745 peifer]:

The same issue applies to anything which attempts to format output
containing user-supplied text (hard-coded text should all be ASCII).

The main problem with fixing this is that the necessary functions aren't
available on all systems, so we will need either wrapper functions or a
lot of #ifdefs.

An outline approach is to convert the string from multi-byte to wide with
mbstowcs() or mbsrtowcs(), then either use wcslen() to find the number of
characters in the wide string, or use wcswidth() to find the width (in
columns) of the wide string.
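
A minimal sketch of the first variant of that outline (convert with
mbstowcs(), then count with wcslen()). The helper name mb_char_count() is
hypothetical and not part of GRASS, and the fallback to the byte count for
strings that are invalid in the current locale is an assumption of this
sketch, not an agreed design:

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), which callers need before using this */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a GRASS function): number of characters in the
 * multi-byte string `s`, interpreted in the current LC_CTYPE locale.
 * Assumption of this sketch: if the string cannot be converted (invalid
 * sequence, or allocation failure), the byte count is returned instead. */
size_t mb_char_count(const char *s)
{
    size_t bytes, converted, chars;
    wchar_t *wide;

    assert(s != NULL);

    /* A multi-byte string never has more characters than bytes, so
     * bytes + 1 wide characters always leave room for the terminator. */
    bytes = strlen(s);
    wide = malloc((bytes + 1) * sizeof *wide);
    if (wide == NULL)
        return bytes;

    converted = mbstowcs(wide, s, bytes + 1);
    if (converted == (size_t)-1) {
        free(wide);
        return bytes;
    }

    chars = wcslen(wide);
    free(wide);
    return chars;
}
```

The caller has to select the user's locale first, e.g. with
setlocale(LC_CTYPE, ""); otherwise the "C" locale is active and non-ASCII
input may be rejected or counted byte by byte.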

wcswidth() correctly handles the "full-width" characters found in CJK
locales, which occupy two columns. However, wcswidth() is POSIX, while
wcslen() and mbsrtowcs() date from the 1995 amendment to C89 and are in
C99. Of the necessary functions, only mbstowcs() is in C89 itself.
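
For the column-based variant, a sketch along the same lines might look as
follows. The helper name mb_display_width() is hypothetical, and defining
_XOPEN_SOURCE before the first include is an assumption about what the
platform (glibc here) needs in order to declare wcswidth():

```c
#define _XOPEN_SOURCE 700   /* assumption: needed on glibc to declare wcswidth() */
#include <assert.h>
#include <locale.h>   /* setlocale(), which callers need before using this */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a GRASS function): number of terminal columns
 * occupied by the multi-byte string `s` in the current LC_CTYPE locale.
 * Returns -1 if the string cannot be converted, if memory runs out, or if
 * wcswidth() reports a non-printable character. */
int mb_display_width(const char *s)
{
    size_t bytes, converted;
    wchar_t *wide;
    int columns;

    assert(s != NULL);

    bytes = strlen(s);
    wide = malloc((bytes + 1) * sizeof *wide);
    if (wide == NULL)
        return -1;

    converted = mbstowcs(wide, s, bytes + 1);
    if (converted == (size_t)-1)
        columns = -1;
    else
        columns = wcswidth(wide, converted);

    free(wide);
    return columns;
}
```

On a system without wcswidth(), this is the part that would have to sit
behind a wrapper or an #ifdef, with the wcslen() count as the fallback.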

Also, this rules out relying upon printf() etc. for formatting, as printf's
width specifiers are in "char"s (i.e. bytes), not characters or columns.
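
As an illustration of what replacing the width specifier could involve,
here is a sketch that counts characters with mbsrtowcs() and then emits
the padding explicitly. The name format_left() and its signature are
hypothetical; it pads to a character count, so full-width CJK text would
still need the wcswidth() variant:

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), which callers need before using this */
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a GRASS function): write `s` into `out`,
 * left-aligned and padded with spaces to `width` characters rather than
 * bytes. Returns whatever snprintf() returns.
 * Assumption of this sketch: a string that is invalid in the current
 * locale is measured in bytes, which matches plain printf() behaviour. */
int format_left(char *out, size_t size, const char *s, int width)
{
    mbstate_t state;
    const char *p = s;
    size_t chars;
    int pad;

    assert(out != NULL && s != NULL);

    /* With a NULL destination, mbsrtowcs() only counts the wide
     * characters the conversion would produce. */
    memset(&state, 0, sizeof state);
    chars = mbsrtowcs(NULL, &p, 0, &state);
    if (chars == (size_t)-1)
        chars = strlen(s);

    pad = width > (int)chars ? width - (int)chars : 0;

    /* "%*s" applied to an empty string emits exactly `pad` spaces. */
    return snprintf(out, size, "%s%*s", s, pad, "");
}
```

A caller would format each cell into a buffer this way and print the
buffer with a plain "%s", instead of using "%-30s" or similar directly.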

BTW, regarding gawk/mawk: as the ticket notes, POSIX explicitly states
that awk matches the behaviour of printf, i.e. field widths are in bytes,
not characters/columns.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/745#comment:3>

Comment(by peifer):

Replying to [comment:3 glynn]:

Thanks for the explanations; the situation seems to be complex. It might
not be worth investing too much time in this just to get nicer
"pretty print" functionality.

> BTW, regarding gawk/mawk: as the ticket notes, POSIX explicitly states
> that awk matches the behaviour of printf, i.e. field widths are in bytes,
> not characters/columns.

Gawk maintainer Arnold Robbins changed gawk's printf behaviour after I
pointed him to the issue, some 2 years ago. (Needless to say, he knows
that POSIX specifies it differently.)

{{{
# bash built-in printf
[peifer:~]> printf "%-12s|\n" "dôležité"
dôležité |

# The other printf
[peifer:~]> /usr/bin/printf "%-12s|\n" "dôležité"
dôležité |

# Gawk 3.1.7 and higher
[peifer:~]> gawk 'BEGIN{ printf "%-12s|\n", "dôležité" }'
dôležité    |
}}}

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/745#comment:4>