[GRASS-dev] what is the meaning of: "Error reading raster data for row 239 of <MASK>"

Hello,

When I run a script that loops over a long series of point data sets and then does a series of raster calculations based on these data sets, I sometimes get the following error:

ERROR: Error reading raster data for row 239 of <MASK>

Can someone explain what this means and how to debug this?

Moritz

On Jul 2, 2015 5:18 PM, “Moritz Lennert” <mlennert@club.worldonline.be> wrote:

Can someone explain what this means and how to debug this?

Maybe some off_t issue?
Please post the region settings and related info…

Markus



grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

* Moritz Lennert <mlennert@club.worldonline.be> [2015-07-02 17:18:49 +0200]:

Can someone explain what this means and how to debug this?

I am also interested in this question. I've seen this in my script for
converting Landsat 8 data to land surface temperature, when using a very
large number of neighborhood modifiers in r.mapcalc.

Some attempts fail with this or a similar error. Re-running the same
command works fine most of the time. All this on a machine with plenty of
memory.

Nikos

On 02/07/15 19:26, Markus Neteler wrote:

Maybe some off_t issue?
Please post the region settings and related info...

> g.version -reb
GRASS 7.1.svn (2015)

  ./configure --prefix=/usr/lib --sysconfdir=/etc --sharedstatedir=/var --enable-socket --enable-shared --enable-largefile --with-postgres --with-mysql --with-pthread --with-cxx --with-x --with-gdal --with-freetype --with-motif --with-readline --with-nls --with-odbc --with-sqlite --with-freetype-includes=/usr/include/freetype2 --with-tcltk-includes=/usr/include/tcl --with-postgres-includes=/usr/include/postgresql --with-mysql-includes=/usr/include/mysql --with-proj-share=/usr/share/proj --with-python=/usr/bin/python-config --with-cairo --with-geos --with-blas --with-lapack --with-liblas=/usr/bin/liblas-config
libgis Revision: 64732
libgis Date: 2015-02-25 01:54:05 +0100 (Wed 25 Feb 2015)
PROJ.4: 4.9.1
GDAL/OGR: 1.10.1
GEOS: 3.4.2
SQLite: 3.8.10.2

> g.region -p
projection: 99 (Lambert Azimuthal Equal Area)
zone: 0
datum: etrs89
ellipsoid: grs80
north: 3177000
south: 2937000
west: 3790000
east: 4071000
nsres: 1000
ewres: 1000
rows: 240
cols: 281
cells: 67440

> r.info -g MASK
north=3177000
south=2937000
east=4071000
west=3790000
nsres=1000
ewres=1000
rows=240
cols=281
cells=67440
datatype=CELL
ncats=1

Here's the error:

ERROR: Error reading raster data for row 239 of <MASK>
Traceback (most recent call last):
   File "../../calulate_huff.py", line 346, in <module>
     quiet=True)
   File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-unknown-linux-gnu/etc/python/grass/script/core.py", line 376, in run_command
     return handle_errors(returncode, returncode, args, kwargs)
   File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-unknown-linux-gnu/etc/python/grass/script/core.py", line 312, in handle_errors
     returncode=returncode)
grass.exceptions.CalledModuleError: Module run None ['r.mapcalc', '--o', '--q', 'expression=temp_prob = float(firm_rate_364596) / float(sum_rates)'] ended with error
Process ended with non-zero return code 1. See errors in the (error) output.

And here is the info for the two maps in the r.mapcalc call:

> r.info -g firm_rate_364596
north=3177000
south=2937000
east=4071000
west=3790000
nsres=1000
ewres=1000
rows=240
cols=281
cells=67440
datatype=DCELL
ncats=0

> r.info -g sum_rates
north=3177000
south=2937000
east=4071000
west=3790000
nsres=1000
ewres=1000
rows=240
cols=281
cells=67440
datatype=DCELL
ncats=0

But when I run the same script on the same maps, the error appears at a different stage each time (with a different firm_rate_* map).

Moritz

On 03-07-15 11:04, Moritz Lennert wrote:

> Can someone explain what this means and how to debug this?

Just to note that this does not seem to be an isolated problem:

* http://lists.osgeo.org/pipermail/grass-dev/2014-September/070584.html
* http://lists.osgeo.org/pipermail/grass-dev/2015-May/074937.html


On 03/07/15 11:27, Paulo van Breugel wrote:

Just to note that this does not seem to be an isolated problem:

* http://lists.osgeo.org/pipermail/grass-dev/2014-September/070584.html
* http://lists.osgeo.org/pipermail/grass-dev/2015-May/074937.html

Right. And as in those two cases, when I do not use a mask, I do not have the problem, but calculations take longer, which makes working without a mask a pain.

The error comes from here:

raster/get_row.c:142: G_fatal_error(_("Error reading raster data for row %d of <%s>"),

which is part of the 'read_data_compressed' function.

read(fcb->data_fd, cmp, readamount) returns 0, while readamount is 18.

I'm not familiar enough with the raster lib to understand what this implies.
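For reference, POSIX read() returns the number of bytes it actually transferred: 0 means the read started at or past end of file, -1 means an error with errno set, and anything in between is a short read. The G_fatal_error() call quoted above reports all three cases with the same message. A minimal diagnostic sketch, assuming plain POSIX I/O; describe_short_read() is a hypothetical helper, not part of lib/raster:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic wrapper around read(fd, buf, want).
 * Returns read()'s result and writes an explanation into msg. */
static ssize_t describe_short_read(int fd, void *buf, size_t want,
                                   char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);
    ssize_t got = read(fd, buf, want);
    int saved_errno = errno;             /* lseek() below may overwrite errno */
    off_t pos = lseek(fd, 0, SEEK_CUR);  /* file offset after the read */

    if (got == (ssize_t) want)
        snprintf(msg, msglen, "ok: %zd bytes", got);
    else if (got == 0)
        snprintf(msg, msglen, "EOF: offset %lld is at end of file",
                 (long long) pos);
    else if (got < 0)
        snprintf(msg, msglen, "error: %s", strerror(saved_errno));
    else
        snprintf(msg, msglen, "short read: %zd of %zu bytes", got, want);
    return got;
}
```

A 0 return, as reported here for an 18-byte request, means either the file is shorter than the row index implies or the descriptor's offset was not where the preceding lseek() put it.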

Moritz

Moritz Lennert wrote:

read(fcb->data_fd, cmp, readamount) returns 0, while readamount is 18.

I'm not familiar enough with the raster lib to understand what this implies.

The odd thing is that a MASK raster is read using the same functions
which are used for reading any other raster.

Is the MASK raster itself fine? If you rename it to something other
than MASK, can it be read by r.* commands without error?

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, 4 Jul 2015 18:50:44 +0100, Glynn Clements
<glynn@gclements.plus.com> wrote:

Is the MASK raster itself fine? If you rename it to something other
than MASK, can it be read by r.* commands without error?

I've tried r.info, r.stats, r.report, r.univar, r.neighbors, r.series,
r.mapcalc and all of them work fine on the result of 'g.rename
MASK,mymask'.

Moritz

On 2015-07-07 09:45, Moritz Lennert wrote:

I've tried r.info, r.stats, r.report, r.univar, r.neighbors, r.series,
r.mapcalc and all of them work fine on the result of 'g.rename
MASK,mymask'.

Same for displaying the map in the wxGUI.

I don't know how to debug this...

Moritz

Moritz Lennert wrote:

I don't know how to debug this...

Can you identify a repeatable test case?

If I could make it happen, I could debug it.

--
Glynn Clements <glynn@gclements.plus.com>

On 2015-07-09 01:01, Glynn Clements wrote:

Can you identify a repeatable test case?

If I could make it happen, I could debug it.

You can get a location named TEST here:

http://tomahawk.ulb.ac.be/moritz/mask_bug_testlocation.tgz

This contains only a PERMANENT mapset.

In that mapset, launch the following command:

r.mask vect=hull; for map in $(g.list rast pat="firm_rate*"); do echo $map ; r.mapcalc "temp_prob = float($map) / sum_rates" --o --q; done; r.mask -r

I get the error arbitrarily for different firm_rate_* maps, sometimes only for one, sometimes for many, but on each run it's for different maps.

Moritz

Moritz Lennert wrote:

I get the error arbitrarily for different firm_rate_* maps, sometimes
only for one, sometimes for many, but on each run it's for different
maps.

So it's non-deterministic (I'm getting one error for every 10-20
passes over the data, i.e. every 1200-2500 commands), and only applies
to r.mapcalc.

My first guess was a race condition related to pthreads. I tried

  export WORKERS=0

before running the test, and it hasn't happened since.

And actually I'm now fairly certain as to the specific cause.

When compiled with pthread support, r.mapcalc has a mutex for each map
to prevent concurrent access to a single map from multiple threads.

Concurrent access to different maps (and to core lib/gis and
lib/raster functionality) from different threads is supposed to be
safe (see r34485 and the interval surrounding it), but the MASK was
overlooked.

If a MASK is in use, reading a row from any raster map will read the
corresponding row from the MASK, and there's nothing to prevent
different threads from concurrently accessing two different maps and
thus accessing the MASK.

So, in read_data_{compressed,uncompressed,fp_compressed} in
lib/raster/get_row.c we have code like:

    if (lseek(fcb->data_fd, (off_t) row * bufsize, SEEK_SET) == -1)
        G_fatal_error(_("Error reading raster data for row %d of <%s>"),
                      row, fcb->name);

    if (read(fcb->data_fd, data_buf, bufsize) != bufsize)
        G_fatal_error(_("Error reading raster data for row %d of <%s>"),
                      row, fcb->name);

If multiple threads execute this code concurrently, you can end up
with the calls being interleaved like so:

  Thread 1        Thread 2

  lseek
                  lseek
                  read
  read

meaning that the file offset has changed between the lseek() and the
read() (this is why X/Open and POSIX added pread(), but that's still
relatively new).

This only results in an error at the end of the file (the first read()
will leave the file offset at EOF, so the second read() fails), but in
other situations it's likely causing the wrong row of the MASK to be
read.
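Because the failure depends only on the order of the calls on a shared file offset, the interleaving above can be replayed deterministically in a single thread. A minimal sketch, assuming a scratch file of 240 fixed-size 18-byte rows as a stand-in for the MASK (the real MASK here goes through read_data_compressed, so its rows vary in length); all function names are hypothetical. Since r.mapcalc works row by row, both input maps in the failing expression are presumably read at the same row, so both readers ask for the same MASK row. The second variant uses pread(), which takes the offset as an argument and leaves the shared offset alone:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

enum { NROWS = 240, ROWSIZE = 18 };   /* illustrative sizes only */

/* Scratch file whose row r is filled with the byte 'A' + r % 26. */
static int make_rows_file(void)
{
    char path[] = "/tmp/maskdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                       /* file is removed once fd is closed */
    for (int r = 0; r < NROWS; r++) {
        char row[ROWSIZE];
        for (int i = 0; i < ROWSIZE; i++)
            row[i] = (char) ('A' + r % 26);
        if (write(fd, row, ROWSIZE) != ROWSIZE) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Replays: thread 1 lseek, thread 2 lseek + read, thread 1 read.
 * Returns thread 1's read() result; its data lands in out1. */
static ssize_t replay_lseek_read(int fd, int row1, int row2, char *out1)
{
    char out2[ROWSIZE];
    assert(row1 >= 0 && row1 < NROWS && row2 >= 0 && row2 < NROWS);
    lseek(fd, (off_t) row1 * ROWSIZE, SEEK_SET);   /* thread 1: lseek */
    lseek(fd, (off_t) row2 * ROWSIZE, SEEK_SET);   /* thread 2: lseek */
    if (read(fd, out2, ROWSIZE) != ROWSIZE)        /* thread 2: read  */
        return -1;
    return read(fd, out1, ROWSIZE);                /* thread 1: read at a moved offset */
}

/* Same call order with pread(): each call carries its own offset. */
static ssize_t replay_pread(int fd, int row1, int row2, char *out1)
{
    char out2[ROWSIZE];
    assert(row1 >= 0 && row1 < NROWS && row2 >= 0 && row2 < NROWS);
    if (pread(fd, out2, ROWSIZE, (off_t) row2 * ROWSIZE) != ROWSIZE)
        return -1;
    return pread(fd, out1, ROWSIZE, (off_t) row1 * ROWSIZE);
}
```

When both readers want row 239, the last row of the 240-row region, replay_lseek_read() returns 0, which is the reported error; for a middle row it returns 18 bytes of the next row and nothing is reported at all.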

A possible quick fix:

  if (R__.auto_mask > 0)
      putenv("WORKERS=0");
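The export WORKERS=0 test above suggests r.mapcalc takes its thread count from the WORKERS environment variable, and the quick fix forces it to 0 whenever a MASK is active. A hypothetical sketch of the same decision made without modifying the environment; choose_workers(), the mask_active flag, the default_workers fallback and the sanity cap are assumptions for illustration, not r.mapcalc's code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: pick the number of worker threads.
 * An active MASK forces 0 (the serial path), mirroring the quick fix;
 * otherwise a well-formed, non-negative WORKERS value wins, and
 * default_workers is used when WORKERS is unset or malformed. */
static int choose_workers(int mask_active, int default_workers)
{
    assert(default_workers >= 0);
    if (mask_active)
        return 0;

    const char *p = getenv("WORKERS");
    if (p == NULL || *p == '\0')
        return default_workers;

    char *end;
    long n = strtol(p, &end, 10);
    if (*end != '\0' || n < 0 || n > 1024)   /* 1024: arbitrary sanity cap */
        return default_workers;
    return (int) n;
}
```

Returning 0 instead of calling putenv() keeps WORKERS=0 out of the environment inherited by child processes; either way, all parallelism is given up while a MASK is set.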

A slightly better fix would be to check for masking and if it's
enabled, have a single mutex which guards *all* raster reads so that
even concurrent access to different maps is blocked. Unlike the above
hack, this still allows computations to be executed in parallel.
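A minimal sketch of the single-mutex option, assuming POSIX threads and the lseek()+read() pattern from the excerpt above; guarded_read_row(), masking_enabled and the demo helpers are hypothetical names. While masking is on, every row read of any map takes the same lock, so no other read can move a shared offset between a thread's lseek() and its read(); work done outside this function can still run in parallel:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t all_reads_lock = PTHREAD_MUTEX_INITIALIZER;
static int masking_enabled = 1;   /* assumption: fixed before threads start */

/* lseek()+read() as one unit with respect to all other guarded reads. */
static ssize_t guarded_read_row(int fd, int row, void *buf, size_t bufsize)
{
    ssize_t got = -1;
    if (masking_enabled)
        pthread_mutex_lock(&all_reads_lock);
    if (lseek(fd, (off_t) row * bufsize, SEEK_SET) != -1)
        got = read(fd, buf, bufsize);
    if (masking_enabled)
        pthread_mutex_unlock(&all_reads_lock);
    return got;
}

/* Demo: scratch file of nrows rows, row r filled with 'A' + r % 26. */
static int make_demo_file(int nrows, size_t rowsize)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);
    for (int r = 0; r < nrows; r++)
        for (size_t i = 0; i < rowsize; i++) {
            char c = (char) ('A' + r % 26);
            if (write(fd, &c, 1) != 1) {
                close(fd);
                return -1;
            }
        }
    return fd;
}

/* Demo worker: repeatedly reads every row (last row first) through the
 * shared descriptor and counts rows that come back short or wrong. */
struct demo_job { int fd; int nrows; size_t rowsize; int errors; };

static void *demo_reader(void *arg)
{
    struct demo_job *job = arg;
    char buf[64];
    assert(job->rowsize <= sizeof buf);
    for (int pass = 0; pass < 50; pass++)
        for (int r = job->nrows - 1; r >= 0; r--) {
            ssize_t got = guarded_read_row(job->fd, r, buf, job->rowsize);
            if (got != (ssize_t) job->rowsize || buf[0] != (char) ('A' + r % 26))
                job->errors++;
        }
    return NULL;
}
```

Without the lock the same workers can hit the interleaving described above, but whether they do in a given run depends on scheduling, which fits the non-deterministic behaviour reported in this thread.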

Better still would be to guard access to the MASK so that the other
aspects of raster input can be parallelised (raster I/O is still a
major bottleneck, and mostly because of processing rather than actual
disc access).

But that would involve either adding pthread code directly into the
base raster input code in lib/raster/get_row.c (undesirable) or at
least adding a mechanism to allow r.mapcalc to hook into it to provide
the mutex.
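One possible shape for such a hook, as a sketch only: the raster-reading side keeps two optional callbacks with no-op defaults and calls them around the MASK read alone, so it contains no pthread code, while a threaded caller such as r.mapcalc registers callbacks that take and release its own mutex. set_mask_lock_hooks(), read_mask_row() and struct mask_guard are hypothetical names, not existing lib/raster API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* ---- raster-library side: no pthread calls, only optional hooks ---- */
typedef void (*mask_hook_fn)(void *ctx);

static void noop_hook(void *ctx) { (void) ctx; }

static mask_hook_fn mask_lock_hook = noop_hook;
static mask_hook_fn mask_unlock_hook = noop_hook;
static void *mask_hook_ctx = NULL;

/* Register both hooks, or pass NULL for both to restore the no-ops. */
static void set_mask_lock_hooks(mask_hook_fn lock, mask_hook_fn unlock, void *ctx)
{
    assert((lock == NULL) == (unlock == NULL));
    mask_lock_hook = lock ? lock : noop_hook;
    mask_unlock_hook = unlock ? unlock : noop_hook;
    mask_hook_ctx = ctx;
}

/* Only the MASK read is bracketed; reads of ordinary maps stay unlocked. */
static ssize_t read_mask_row(int mask_fd, int row, void *buf, size_t bufsize)
{
    ssize_t got = -1;
    mask_lock_hook(mask_hook_ctx);
    if (lseek(mask_fd, (off_t) row * bufsize, SEEK_SET) != -1)
        got = read(mask_fd, buf, bufsize);
    mask_unlock_hook(mask_hook_ctx);
    return got;
}

/* ---- threaded-caller side: supplies the mutex through the hooks ---- */
struct mask_guard {
    pthread_mutex_t mutex;
    int locks_taken;          /* incremented while holding the mutex */
};

static void guard_lock(void *ctx)
{
    struct mask_guard *g = ctx;
    pthread_mutex_lock(&g->mutex);
    g->locks_taken++;
}

static void guard_unlock(void *ctx)
{
    struct mask_guard *g = ctx;
    pthread_mutex_unlock(&g->mutex);
}
```

Because the lock covers only the MASK read, reads of the other input maps (and their decompression) can still proceed in parallel, which is the advantage over a single lock around all raster reads.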

--
Glynn Clements <glynn@gclements.plus.com>

On 14/07/15 09:46, Glynn Clements wrote:

But that would involve either adding pthread code directly into the
base raster input code in lib/raster/get_row.c (undesirable) or at
least adding a mechanism to allow r.mapcalc to hook into it to provide
the mutex.

Thanks for the detailed analysis and explanation!

So, for me, the best solution at this stage is to just set WORKERS to 0?

The rest of your proposed solutions are over my head, so I couldn't help with the implementation.

Moritz

Moritz Lennert wrote:

So, for me, the best solution at this stage is to just set WORKERS to 0?

That should work. Also, the issue should be fixed by r65591.

--
Glynn Clements <glynn@gclements.plus.com>

On 15/07/15 05:21, Glynn Clements wrote:

That should work. Also, the issue should be fixed by r65591.

Seems to work nicely. Thanks for the quick fix!

One thing I noticed is that on the one test case I used here for
testing your fix, running with WORKERS=0 is slightly faster than without
setting it. I didn't test rigorously, but is that expected?

Moritz

Moritz Lennert wrote:

One thing I noticed is that on the one test case I used here for
testing your fix, running with WORKERS=0 is slightly faster than without
setting it. I didn't test rigorously, but is that expected?

Maybe. It avoids the overhead of switching threads. And using multiple
threads only provides a gain if it results in using cores which would
otherwise be idle.

--
Glynn Clements <glynn@gclements.plus.com>

On 18/07/15 11:59, Glynn Clements wrote:

Maybe. It avoids the overhead of switching threads. And using multiple
threads only provides a gain if it results in using cores which would
otherwise be idle.

Ok. In any case, this is probably a candidate for backporting to grass70release before the upcoming release. Can it be backported as such?

Moritz

Thank you guys, this has been an ongoing issue for me for the last
year or two. I have experienced this error sporadically but could not
recreate it and thus figured it was something specific to my machine.
I noticed this error when using r.series and some of the new temporal
modules; in each case there was an active MASK. I could reliably
cause this error to occur (but not in a deterministic manner) by
canceling (Ctrl-C) a module that had opened a large number of raster
maps. The next run of r.series (or a temporal module) would then fail
with the error as described in this thread.

Excellent work and +1 on a backport to grass70.

Dylan

On Sat, Jul 18, 2015 at 12:06 PM, Moritz Lennert
<mlennert@club.worldonline.be> wrote:

Ok. In any case, this is probably a candidate for backporting to grass70release
before the upcoming release. Can it be backported as such?

Moritz

Moritz Lennert wrote:

Ok. In any case, this is probably a candidate for backporting to
grass70release before the upcoming release. Can it be backported as such?

Yes.

--
Glynn Clements <glynn@gclements.plus.com>

2015-07-21 20:36 GMT+02:00 Glynn Clements <glynn@gclements.plus.com>:

Ok. In any case, this is probably a candidate for backporting to
grass70release before the upcoming release. Can it be backported as such?

Yes.

I backported r65591 to relbr70 as r65764.

Martin

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa