On Wed, Oct 14, 2015 at 9:45 PM, Dylan Beaudette
<dylan.beaudette@gmail.com> wrote:
On Wed, Oct 14, 2015 at 12:55 PM, Dylan Beaudette
<dylan.beaudette@gmail.com> wrote:
On Wed, Oct 14, 2015 at 10:50 AM, Dylan Beaudette
<dylan.beaudette@gmail.com> wrote:
Some additional clues:
The original stack was 365 maps with 3105 x 7025 cells.

1. zooming into a smaller region (30 x 40 cells) and running
t.rast.series 100x resulted in 100 "correct" maps, no errors.
2. returning to the full extent and running t.rast.series 30x on the
first 31 maps resulted in 30 "correct" maps, no errors.
3. returning to the full extent and running t.rast.series 30x on the
last 31 maps resulted in 30 "correct" maps, no errors.

So, it seems that t.rast.series (r.series) is throwing an error, or
generating wrong output, when a large set of maps is supplied as
input and the region has a moderate number of total cells.

Yeah, I know, that isn't very specific. I will try re-compiling with
debugging and no optimization next.

Dylan
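
The repeated runs described above could be scripted along these lines. This is only a sketch: repeat_count_failures is a hypothetical helper name, and it counts non-zero exit statuses only, so a run that exits cleanly but writes wrong output would not be caught.

```shell
# Run a command N times and print how many runs exited non-zero.
# Usage: repeat_count_failures N command [args...]
repeat_count_failures() {
    n=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Output is discarded; only the exit status is inspected.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (inside a GRASS session; the dataset names are assumptions):
#   repeat_count_failures 30 t.rast.series --o input=gdd output=gdd_sum method=sum
```

A result of 0 after 30 runs would correspond to the "no errors" outcomes listed above.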
More data,

1. re-compiled with CFLAGS="-g -Wall":
* Multiple runs of t.rast.series with the full stack (365 maps with
3105 x 7025 cells), no errors.
* each run required about 8.5 minutes to complete

2. re-compiled with CFLAGS="-O2 -mtune=native -march=native" LDFLAGS="-s":
* 10x tests with full stack, no errors
* each run required about 3.5 minutes

3. re-run original script (see listing below)
* random errors from t.rast.series

This doesn't make much sense to me. The only difference between my
latest "tests" and the original code is that the input to
t.rast.series was static over the course of my "tests", vs. dynamic
within the original code (see below). I purposely selected a stack
that caused t.rast.series to throw an error for my tests.

OK, this does make sense: t.rast.series (r.series) was not the source
of the problems. I was able to verify this by running t.rast.univar on
the output from the previous step:

# NOTE: 4 CPUs so that external disk isn't thrashed
gdd_max_C=30
gdd_min_C=10
gdd_base_C=10
t.rast.mapcalc --q --o nprocs=4 input=tmin_subset,tmax_subset \
output=gdd basename=gdd \
expr="max(((min(tmax_subset, $gdd_max_C) + max(tmin_subset, $gdd_min_C)) / 2.0) - $gdd_base_C, 0)"

... which means that t.rast.mapcalc was generating one (or more)
outputs with some kind of problem, which was then causing
t.rast.univar and t.rast.series to fail.

I can now verify that t.rast.mapcalc is creating some raster maps with
corrupt (?) data. Corrupt in the sense that subsequent reading of the
maps results in the "Error reading raster data for row ..." error.

Just in case anyone is interested, I have opened a ticket for more
informative errors raised by lib/raster/get_row.c
(https://trac.osgeo.org/grass/ticket/2762).
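
One way to track down which maps trigger the read error is to read every map in full and note the ones that fail. The sketch below assumes that r.univar exits non-zero when a row cannot be read; check_map and find_unreadable are hypothetical helper names, not GRASS commands.

```shell
# Succeed only if the whole map can be read.
# Assumption: r.univar aborts with a non-zero status on a row read error.
check_map() {
    r.univar -g map="$1" >/dev/null 2>&1
}

# Read map names on stdin (one per line) and print those that fail check_map.
find_unreadable() {
    while IFS= read -r map; do
        [ -n "$map" ] || continue      # skip blank lines
        check_map "$map" || echo "$map"
    done
}

# Example (inside a GRASS session; the pattern is an assumption):
#   g.list type=raster pattern='gdd_*' | find_unreadable
```

Any name printed is a map that could not be read to the end, which narrows the search to individual outputs rather than the whole series.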

As previously reported, errors seem to occur about 50-60% of the time
and _do not_ appear to be related to the number of concurrent
t.rast.mapcalc instances.

After some more testing, I have found that t.rast.mapcalc does not
(randomly) generate corrupt maps when the mapcalc expression results
in a CELL type map. Expressions that result in either FCELL or DCELL
seem to trigger the corruption.

Fortunately my current project isn't too discriminating and is fine
with CELL output from t.rast.mapcalc.
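
For illustration, one way to force CELL output from the growing degree day expression is to wrap it in r.mapcalc's int() function, which truncates to an integer. This is an assumption about how the workaround could be written, not necessarily what was done here, and truncation loses the fractional degree days.

```shell
gdd_max_C=30
gdd_min_C=10
gdd_base_C=10

# Floating-point GDD expression, as in the listing above.
expr_float="max(((min(tmax_subset, $gdd_max_C) + max(tmin_subset, $gdd_min_C)) / 2.0) - $gdd_base_C, 0)"

# Assumption: truncating with int() is acceptable; the result is a CELL map.
expr_cell="int($expr_float)"

# Only attempt the call where the GRASS command is available (inside a session).
if command -v t.rast.mapcalc >/dev/null 2>&1; then
    t.rast.mapcalc --q --o nprocs=4 input=tmin_subset,tmax_subset \
        output=gdd basename=gdd expr="$expr_cell"
fi
```

Because the expression is already clamped at 0, int() here behaves like rounding down.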

I now suspect that this is an overflow issue in t.rast.mapcalc (well,
the library functions that it calls) that may or may not be influenced
by the use of files linked via r.external.

The inputs to t.rast.mapcalc are files that have been registered with
r.external. I suspect that the multiple concurrent r.mapcalc instances
may be to blame. I don't have an explanation other than some evidence
from the last time I encountered this type of issue. The workflow then
was:

1. spawn 8 concurrent processes via backgrounding: r.sun -> r.mapcalc
2. when finished with daily solar models, sum maps with r.series

I would occasionally encounter the "Error reading raster data for row
xxx" error from r.series in this case and assume that r.series had
somehow broken the map in question.

It would seem that concurrent use of r.mapcalc may be worth
investigating... however, it is strange that it only occurs sometimes.
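
When jobs are backgrounded like this, a failing job is easy to miss because its exit status is never looked at. A minimal sketch of a batch runner that keeps track of each job's status follows; run_batch is a hypothetical helper, and it assumes the job arguments (for example day numbers) contain no spaces or colons.

```shell
# run_batch CMD ARG...: start CMD once per ARG in the background, then
# wait for each job and print the ARGs whose job exited non-zero.
run_batch() {
    cmd=$1
    shift
    pids=""
    for arg in "$@"; do
        "$cmd" "$arg" &
        pids="$pids $!:$arg"           # remember which PID ran which ARG
    done
    for entry in $pids; do
        pid=${entry%%:*}
        arg=${entry#*:}
        wait "$pid" || echo "$arg"     # wait returns that job's exit status
    done
}

# Example: solar_day would be a user-defined function wrapping the
# r.sun -> r.mapcalc steps for one day (hypothetical name):
#   run_batch solar_day 1 2 3 4 5 6 7 8
```

This does not explain the corruption, but it separates jobs that failed outright from jobs that finished and still left an unreadable map.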
I stand corrected. My previous encounters with the "Error reading
raster data for row ..." error were likely associated with this
related problem, which is now fixed:
http://lists.osgeo.org/pipermail/grass-dev/2015-July/075627.html

Oddly enough, I didn't have problems with maps generated with the
following (similar) code:

# spring frost
# if tmin never drops below 0 before the start of summer, then the
# last "spring frost" is on day 0
# NOTE: 2 CPUs so that disk isn't thrashed
t.rast.mapcalc --o -n nprocs=2 input=tmin output=spring_frost \
basename=spring_frost \
expr="if(start_doy() < 182, if(tmin < 0, start_doy(), 0), null())"

# fall frost
# NOTE: 2 CPUs so that disk isn't thrashed
t.rast.mapcalc --o -n nprocs=2 input=tmin output=fall_frost \
basename=fall_frost \
expr="if(start_doy() > 213, if(tmin < 0, start_doy(), 365), null())"

... Not so odd anymore, as these t.rast.mapcalc expressions always
resulted in CELL maps.

Dylan