[GRASS-dev] [GRASS GIS] #32: r.what: shouldn't use static buffers for the inputs

#32: r.what: shouldn't use static buffers for the inputs
---------------------+------------------------------------------------------
Reporter: 1gray | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: minor | Milestone: 6.4.0
Component: default | Version: svn-trunk
Keywords: |
---------------------+------------------------------------------------------
It seems that I'm a lucky one to face all these !``static buffer!''
issues.

{{{
$ r.what \
       input=2006-
08-{0{1,2,3,4,5,6,7,8,9},{1,2}{0,1,2,3,4,5,6,7,8,9},3{0,1}}.[...].{{1,2,3,4,5,6,7,8,9},1{0,1,2,3,4,5,6,7,8,9},20}
\
       east_north=[...]
r.what: can only do up to 150 cell files, sorry
$
}}}

Surely I can raise {{{NFILES}}} in {{{raster/r.what/main.c}}}, but
wouldn't it be better to replace all the mess in {{{r.what}}} with a
pure dynamic buffer-based solution? (Hopefully I could make a patch.)

I've tested the command above with GRASS 6.2.3-debian-1, but I can see
the relevant code in a recent SVN trunk as well.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/32&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#32: r.what: shouldn't use static buffers for the inputs
----------------------+-----------------------------------------------------
  Reporter: 1gray | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: minor | Milestone: 6.4.0
Component: default | Version: svn-trunk
Resolution: | Keywords:
----------------------+-----------------------------------------------------
Comment (by 1gray):

On the other hand, (much of) the {{{r.what}}} functionality could easily
be achieved by using {{{v.what.rast}}} together with {{{v.db.select}}}.

Shouldn't therefore {{{r.rast}}} be replaced by a (supposedly simpler)
Shell script?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/32#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#32: r.what: shouldn't use static buffers for the inputs
----------------------+-----------------------------------------------------
  Reporter: 1gray | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: minor | Milestone: 6.4.0
Component: default | Version: svn-trunk
Resolution: | Keywords:
----------------------+-----------------------------------------------------
Comment (by glynn):

Replying to [comment:1 1gray]:
> Shouldn't therefore {{{r.rast}}} be replaced by a (supposedly simpler)
Shell script?
No.
For a start, v.what.rast only accepts a single raster map. Also, r.what
shouldn't depend upon the vector and database functionality (most raster
modules should still work if e.g. you can't GDAL to work).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/32#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

GRASS GIS <trac@osgeo.org> writes:

> Comment (by glynn):

  Thanks for a quick response!

> Replying to [comment:1 1gray]:

>> Shouldn't therefore {{{r.rast}}} be replaced by a (supposedly
>> simpler) Shell script?

> No. For a start, v.what.rast only accepts a single raster map.

  May it be appropriate to extend it to handle any number of
  raster maps (to fill the corresponding number of the DB
  columns)?

> Also, r.what shouldn't depend upon the vector and database
> functionality

  Indeed, it's a valid point.

> (most raster modules should still work if e.g. you can't GDAL to
> work).

  Is the GRASS vector implementation tied that closely to GDAL
  (OGR)?

  I've taken a look at both `r.what' and `v.what.rast', and the
  code seems to be quite similar. It's not surprising, since the
  processing scheme is similar as well:

  * collect the points to query the raster at (either from the
    command line, stdin or from the vector given);

  * query the raster;

  * output the results (either to stdout or to the vector given.)

  It makes me question, could this task be generalized into a set
  of helper functions to maintain lists of the points to query
  (``cache''), and to perform the query itself? What library
  should these functions be added to? lib/raster/?

  Both of the modules currently have some minor deficiencies,
  e. g.:

  * it's not possible to query several rasters with `v.what.rast';

  * it's not possible to specify a different field separator
    string (`fs') with `r.what' (while it's possible with
    `v.db.select'.)

  With the query code (and the output formatting code) split off
  to the library, it may become feasible to overcome these issues.

Ivan Shmakov wrote:

  Both of the modules currently have some minor deficiencies,
  e. g.:

...

  * it's not possible to specify a different field separator
    string (`fs') with `r.what' (while it's possible with
    `v.db.select'.)

That one's trival. done in svn/trunk.
  http://trac.osgeo.org/grass/changeset/30112

I did it as %c like r.cats, other modules are more liberal and allow
%s.
Shrug.

Hamish

      ____________________________________________________________________________________
Be a better friend, newshound, and
know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

Ivan Shmakov wrote:

> (most raster modules should still work if e.g. you can't GDAL to
> work).

  Is the GRASS vector implementation tied that closely to GDAL
  (OGR)?

libvect has libgdal as a dependency.

Even if v.what.rast doesn't end up calling into GDAL, it won't even
run if libgdal won't load (e.g. if one of GDAL's many dependencies
fails to load).

  I've taken a look at both `r.what' and `v.what.rast', and the
  code seems to be quite similar. It's not surprising, since the
  processing scheme is similar as well:

  * collect the points to query the raster at (either from the
    command line, stdin or from the vector given);

  * query the raster;

  * output the results (either to stdout or to the vector given.)

  It makes me question, could this task be generalized into a set
  of helper functions to maintain lists of the points to query
  (``cache''), and to perform the query itself? What library
  should these functions be added to? lib/raster/?

One option would be for v.what.rast to use r.what as a slave process
to perform the actual raster lookups.

Some history: GRASS originally only supported raster maps. The vector
code was initially a separate package named "mapdev" (in versions
<=5.3, the vector modules reside in the src/mapdev directory).

The result is that raster functionality is built into libgis, while
anything related to vectors is in separate libraries. Many simple
raster modules have no dependencies other than libgis and its
dependencies (libdatetime, zlib, libc, libm, maybe libraries for
sockets and XDR if those aren't built into libgis).

OTOH, any related to vector modules requires several other modules,
including GDAL. GDAL has an almost open-ended set of dependencies,
some of which are proprietary (and thus prime candidates for binary
incompatibility). It's use of C++ also used to be a source of binary
incompatibility, but the Linux C++ ABI seems to have stabilised in
recent years.

  Both of the modules currently have some minor deficiencies,
  e. g.:

  * it's not possible to query several rasters with `v.what.rast';

  * it's not possible to specify a different field separator
    string (`fs') with `r.what' (while it's possible with
    `v.db.select'.)

  With the query code (and the output formatting code) split off
  to the library, it may become feasible to overcome these issues.

Adding fs= to r.what would be simple enough.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements <glynn@gclements.plus.com> writes:

>>> (most raster modules should still work if e.g. you can't GDAL to
>>> work).

>> Is the GRASS vector implementation tied that closely to GDAL (OGR)?

> libvect has libgdal as a dependency.

  ACK.

  The next silly question is, why both `v.sample' and
  `v.what.rast'? Couldn't ``no interpolation'' be added to
  `v.sample' and `v.what.rast' removed? (I haven't looked at the
  `v.sample' source as of yet.)

> Even if v.what.rast doesn't end up calling into GDAL, it won't even
> run if libgdal won't load (e.g. if one of GDAL's many dependencies
> fails to load).

  Just out of curiosity, would it make sense to split `libvect'
  into two libraries, one depending on GDAL and one not, so that
  some basic vector operations would be available irrespective to
  GDAL's presence?

>> I've taken a look at both `r.what' and `v.what.rast', and the code
>> seems to be quite similar. It's not surprising, since the
>> processing scheme is similar as well:

[...]

>> It makes me question, could this task be generalized into a set of
>> helper functions to maintain lists of the points to query
>> (``cache''), and to perform the query itself? What library should
>> these functions be added to? lib/raster/?

> One option would be for v.what.rast to use r.what as a slave process
> to perform the actual raster lookups.

  Well, yes. But why not a set of library functions? (Having to
  convert the numbers to their ASCII representations and then back
  to machine's doesn't sound like a great idea.)

[...]

Hamish <hamish_b@yahoo.com> writes:

>> Both of the modules currently have some minor deficiencies, e. g.:

> ...

>> * it's not possible to specify a different field separator string
>> (`fs') with `r.what' (while it's possible with `v.db.select'.)

> That one's trival. done in svn/trunk.
> http://trac.osgeo.org/grass/changeset/30112

  Thanks!

> @@ -144,4 +147,16 @@
> projection = G_projection();

  BTW, did anyone notice that the `projection' variable is unused?

  Apparently, the only side effect of this line is to have
  G__init_window () called (via G_get_set_window ().) The
  questions are:

  * is this call necessary in `r.what.rast'?

  * shouldn't this side effect of G_projection () and
    G_get_set_window () be documented?

> + /* see v.in.ascii for a better solution */
> + if (opt_fs->answer != NULL) {
> + if (strcmp(opt_fs->answer, "space") == 0)
> + fs = ' ';
> + else if (strcmp(opt_fs->answer, "tab") == 0)
> + fs = '\t';
> + else if (strcmp(opt_fs->answer, "\\t") == 0)
> + fs = '\t';
> + else
> + fs = opt_fs->answer[0];
> + }
> +

  Oh, the piece of code like this is a sure candidate for a
  library function! (taking into account the number of times it
  appears in the GRASS source.)

> withcats = 0;
> nfiles = 0;

> I did it as %c like r.cats, other modules are more liberal and allow
> %s. Shrug.

  I have no use for %s just now, but there should probably be a
  GRASS-wide policy on using either %c or %s for `fs'. (In order
  for the user to learn this exactly once.)

Ivan Shmakov wrote:

> @@ -144,4 +147,16 @@
> projection = G_projection();

  BTW, did anyone notice that the `projection' variable is unused?

  Apparently, the only side effect of this line is to have
  G__init_window () called (via G_get_set_window ().) The
  questions are:

  * is this call necessary in `r.what.rast'?

I doubt it.

  * shouldn't this side effect of G_projection () and
    G_get_set_window () be documented?

Almost everything ends up calling G_get_set_window() at some point.

This is one of the most common examples of the idiom of initialising
values upon first use.

However, this particular case is a bit of a mess. There are actually
two static windows: dbwindow in (G_get_window()) and G__.window. In
practice, both G_get_window() and G__init_window() invariably end up
setting both to the same value.

[Actually, G_get_window() doesn't set either if GRASS_REGION is set;
but as it's unlikely to change over the lifetime of the process, it
doesn't make any noticable difference.]

--
Glynn Clements <glynn@gclements.plus.com>

Ivan Shmakov wrote:

> Even if v.what.rast doesn't end up calling into GDAL, it won't even
> run if libgdal won't load (e.g. if one of GDAL's many dependencies
> fails to load).

  Just out of curiosity, would it make sense to split `libvect'
  into two libraries, one depending on GDAL and one not, so that
  some basic vector operations would be available irrespective to
  GDAL's presence?

The problem is that the vector reading code all depends upon OGR. The
vector library understands two types of vector map: native and OGR.
The top-level input functions (open, rewind, read, close) all dispatch
to the corresponding native or OGR function depending upon the map
type.

You can build without OGR support, in which case it only supports the
native format. But if you build with OGR support, the generic vector
input functions all depend upon OGR. So, a separate library of vector
functions which don't depend upon GDAL/OGR wouldn't include the
ability to read vector maps, which probably wouldn't be much use.

>> I've taken a look at both `r.what' and `v.what.rast', and the code
>> seems to be quite similar. It's not surprising, since the
>> processing scheme is similar as well:

[...]

>> It makes me question, could this task be generalized into a set of
>> helper functions to maintain lists of the points to query
>> (``cache''), and to perform the query itself? What library should
>> these functions be added to? lib/raster/?

> One option would be for v.what.rast to use r.what as a slave process
> to perform the actual raster lookups.

  Well, yes. But why not a set of library functions? (Having to
  convert the numbers to their ASCII representations and then back
  to machine's doesn't sound like a great idea.)

There doesn't appear to be all that much code which is actually common
to the two. Even the code which is structurally similar has enough
differences that the code required to interface to generalised library
functions may outweight the common code.

--
Glynn Clements <glynn@gclements.plus.com>

Michael,

re. http://trac.osgeo.org/grass/changeset/30212

i.rectify does not require an xterm to run, it's a normal processing
module.

Hamish

      ____________________________________________________________________________________
Be a better friend, newshound, and
know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

On Mon, February 18, 2008 02:59, Glynn Clements wrote:

Ivan Shmakov wrote:

> Even if v.what.rast doesn't end up calling into GDAL, it won't even
> run if libgdal won't load (e.g. if one of GDAL's many dependencies
> fails to load).

  Just out of curiosity, would it make sense to split `libvect'
  into two libraries, one depending on GDAL and one not, so that
  some basic vector operations would be available irrespective to
  GDAL's presence?

The problem is that the vector reading code all depends upon OGR. The
vector library understands two types of vector map: native and OGR.
The top-level input functions (open, rewind, read, close) all dispatch
to the corresponding native or OGR function depending upon the map
type.

You can build without OGR support, in which case it only supports the
native format. But if you build with OGR support, the generic vector
input functions all depend upon OGR. So, a separate library of vector
functions which don't depend upon GDAL/OGR wouldn't include the
ability to read vector maps, which probably wouldn't be much use.

Also out of pure curiosity: I really don't understand why this is so. Gdal
and ogr are only necessary to import/export vector data and to treat maps
linked into grass via v.external. I would guess that most users that
really work with vector maps in grass import them into GRASS' native
format and thus don't need any ogr features from then on. So is the
dependency of ogr only there for v.external ?

Moritz

Moritz Lennert wrote:

>> > Even if v.what.rast doesn't end up calling into GDAL, it won't even
>> > run if libgdal won't load (e.g. if one of GDAL's many dependencies
>> > fails to load).
>>
>> Just out of curiosity, would it make sense to split `libvect'
>> into two libraries, one depending on GDAL and one not, so that
>> some basic vector operations would be available irrespective to
>> GDAL's presence?
>
> The problem is that the vector reading code all depends upon OGR. The
> vector library understands two types of vector map: native and OGR.
> The top-level input functions (open, rewind, read, close) all dispatch
> to the corresponding native or OGR function depending upon the map
> type.
>
> You can build without OGR support, in which case it only supports the
> native format. But if you build with OGR support, the generic vector
> input functions all depend upon OGR. So, a separate library of vector
> functions which don't depend upon GDAL/OGR wouldn't include the
> ability to read vector maps, which probably wouldn't be much use.

Also out of pure curiosity: I really don't understand why this is so. Gdal
and ogr are only necessary to import/export vector data and to treat maps
linked into grass via v.external. I would guess that most users that
really work with vector maps in grass import them into GRASS' native
format and thus don't need any ogr features from then on. So is the
dependency of ogr only there for v.external ?

Yes, exactly.

In order to be able to "link" to OGR maps, the code to read such maps
has to be part of the vector library rather than in a module.

FWIW, I've been considering doing something similar for raster maps
and GDAL for 7.x.

--
Glynn Clements <glynn@gclements.plus.com>

Ivan Shmakov <ivan@theory.asu.ru> writes:
Hamish <hamish_b@yahoo.com> writes:

[...]

>> + /* see v.in.ascii for a better solution */
>> + if (opt_fs->answer != NULL) {
>> + if (strcmp(opt_fs->answer, "space") == 0)
>> + fs = ' ';
>> + else if (strcmp(opt_fs->answer, "tab") == 0)
>> + fs = '\t';
>> + else if (strcmp(opt_fs->answer, "\\t") == 0)
>> + fs = '\t';
>> + else
>> + fs = opt_fs->answer[0];
>> + }
>> +

> Oh, the piece of code like this is a sure candidate for a library
> function! (taking into account the number of times it appears in
> the GRASS source.)

  The base for this function may be like the following:

void
backslash (char *d, const char *s)
{
  /* http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap05.html
   */
  char *dp;
  const char *sp;

  for (dp = d, sp = s; *sp != '\0'; dp++, sp++) {
    if (*sp != '\\') {
      /* not an escape, just copy */
      *dp = *sp;
      continue;
    }

    /* advance to the next character */
    ++sp;

    switch (*sp) {
    case '\\': *dp = '\\'; break;
    case 'a': *dp = '\a'; break;
    case 'b': *dp = '\b'; break;
    case 'f': *dp = '\f'; break;
    case 'n': *dp = '\n'; break;
    case 'r': *dp = '\r'; break;
    case 't': *dp = '\t'; break;
    case 'v': *dp = '\v'; break;
    /* check if it's \O, \OO, \OOO or \xHH */
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7':
    case 'x': case 'X':
      {
        const int hex_p = ((*sp) == 'x' || (*sp) == 'X');
        char buf[4], *tail;
        long r;

        if (hex_p) {
          strncpy (buf, sp + 1, 2);
          buf[2] = '\0';
        } else {
          strncpy (buf, sp, sizeof (buf) - 1);
          buf[sizeof (buf) - 1] = '\0';
        }

        /* 1 .. 3 octal or exactly two hexadecimal digits are expected
         */
        if ((hex_p && (buf[0] == '\0' || buf[1] == '\0')
            || ((r = strtol (buf, &tail, hex_p ? 16 : 8)),
                tail == buf))) {
          /* NB: skips over unhandled escapes */
          *(dp++) = '\\';
          *(dp) = *(sp);
          continue;
        }

        *dp = r;
        sp += -1 + (hex_p ? 1 : 0) + (tail - buf);
      }
      break;
    default:
      /* NB: skips over unhandled escapes */
      *(dp++) = '\\';
      *(dp) = *(sp);
      continue;
    }
  }

  /* terminate the destination with NUL */
  *dp = '\0';
}

  Surprisingly enough, coreutils/src/printf.c has its print_esc ()
  function putchar ()-based. So, I had to write the above.

  Please note that the implementation above has one subtle
  difference from that of the GNU Coreutils `printf' utility.
  Namely, the latter allows for `\xH', while the former only
  accepts `\xHH' (and passes `\xH' untransformed.) Also, the
  implementation above gives no warning when for a malformed
  escape.

  Note also that the function above doesn't generally allow `d'
  and `s' to overlap, except for the `d == s' case, which is
  allowed.

  May I suggest the function above (as `G_backslash') for
  `lib/gis/strings.c'?

>> withcats = 0; nfiles = 0;

>> I did it as %c like r.cats, other modules are more liberal and allow
>> %s. Shrug.

> I have no use for %s just now, but there should probably be a
> GRASS-wide policy on using either %c or %s for `fs'. (In order for
> the user to learn this exactly once.)

  Since the function above allows strings, I'd opt for allowing a
  string for `sep' everywhere.

Ivan Shmakov wrote:

>> I did it as %c like r.cats, other modules are more liberal and allow
>> %s. Shrug.

> I have no use for %s just now, but there should probably be a
> GRASS-wide policy on using either %c or %s for `fs'. (In order for
> the user to learn this exactly once.)

  Since the function above allows strings, I'd opt for allowing a
  string for `sep' everywhere.

When a string is used as a separator, the behaviour isn't entirely
obvious. If the string is passed as the separator argument to
G_tokenize(), it treats the string as a set of characters, all of
which are treated as separators.

For a module to provide this behaviour, it would be more natural to
use e.g. "opt.sep->multiple = YES" and require the individual values
to be single characters (after any parsing of backslash sequences).

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements <glynn@gclements.plus.com> writes:

>>>> I did it as %c like r.cats, other modules are more liberal and
>>>> allow %s. Shrug.

>>> I have no use for %s just now, but there should probably be a
>>> GRASS-wide policy on using either %c or %s for `fs'. (In order for
>>> the user to learn this exactly once.)

>> Since the function above allows strings, I'd opt for allowing a
>> string for `sep' everywhere.

  s/sep/fs/

> When a string is used as a separator, the behaviour isn't entirely
> obvious. If the string is passed as the separator argument to
> G_tokenize(), it treats the string as a set of characters, all of
> which are treated as separators.

  I should have been more specific. I had in mind the /output/
  field separator option only.

> For a module to provide this behaviour, it would be more natural to
> use e.g. "opt.sep->multiple = YES" and require the individual values
> to be single characters (after any parsing of backslash sequences).

  It may make sense.

  Alternatively, it may be suggested to follow the Awk approach
  and allow a regular expression as an input field separator. (It
  looks like that Gnulib contains an RE matching module.)

#32: r.what: shouldn't use static buffers for the inputs
--------------------------+-------------------------------------------------
  Reporter: 1gray | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: minor | Milestone: 6.4.0
Component: Raster | Version: svn-trunk
Resolution: | Keywords: r.what
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Changes (by hamish):

  * keywords: => r.what
  * platform: => Unspecified
  * component: default => Raster
  * cpu: => Unspecified

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/32#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>