[GRASS-dev] [GRASS GIS] #1421: scalability of r.terraflow

#1421: scalability of r.terraflow
-------------------------------------+--------------------------------------
Reporter: dnewcomb | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Default | Version: svn-develbranch6
Keywords: r.terraflow large grids | Platform: Linux
      Cpu: x86-64 |
-------------------------------------+--------------------------------------
I have an fcell grid of elevations for the state of North Carolina (51000
rows, 133000 columns, 6783000000 cells). I tried to run r.terraflow in
GRASS7 (8/8/2011 svn snapshot) and ran into the dimension limits. So I
patched them according to Glynn's email,
http://www.osgeo.org/pipermail/grass-user/2004-February/024722.html,
and tried again. (Would it be better to change the dimension variable to
int instead of short int?)
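
To put numbers on the dimension limits, here is a minimal C sketch that checks where each quantity stops fitting, assuming the grid size quoted above. The helper names are hypothetical and not part of r.terraflow. On platforms where short is 16 bits and int is 32 bits, it suggests that int would be wide enough for a row or column count of this grid, but not for the total cell count.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helpers for illustration; not taken from r.terraflow. */

/* True if a row or column count can be stored in a short. */
bool fits_in_short(long long n)
{
    return n >= SHRT_MIN && n <= SHRT_MAX;
}

/* True if a value can be stored in an int. */
bool fits_in_int(long long n)
{
    return n >= INT_MIN && n <= INT_MAX;
}

/* Cell count computed in long long, so the product does not wrap
   for grids of this size. */
long long cell_count(long long nrows, long long ncols)
{
    assert(nrows >= 0 && ncols >= 0);
    return nrows * ncols;
}
```

For example, fits_in_short(133000) is false and fits_in_int(133000) is true, while fits_in_int(cell_count(51000, 133000)) is false.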

This time my Streams file builds to about 26 GB and then r.terraflow bombs
with:

MFD flow direction
D8CUT=999999986991104.000000
Memory size: 808.00M (847249408) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
r.terraflow: grass2str.h:145: AMI_STREAM<T>*
cell2stream(char*, elevation_type, long int*) [with T =
float, elevation_type = float]: Assertion `nrows * ncols ==
str->stream_len()' failed.

The memory size is interesting, because I'm giving it 8 GB of RAM out of
16 GB in the command. The temp directory has about 900 GB of space, so it
has plenty of room.

The box is 64-bit Ubuntu 11.04.

Possibly related to:

http://trac.osgeo.org/grass/ticket/1006

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1421>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

OK, I got it to work with the big grid.

In types.h:
typedef short dimension_type; /* represent dimension of the grid */
static const dimension_type dimension_type_max=SHRT_MAX;

changed to:

typedef long dimension_type; /* represent dimension of the grid */
static const dimension_type dimension_type_max=LONG_MAX;

In 3scan.h, line 127:
         assert(ae == AMI_ERROR_END_OF_STREAM);
changed to:
         assert((off_t)ae == AMI_ERROR_END_OF_STREAM);
and line 141:
         assert(ae == AMI_ERROR_END_OF_STREAM);
changed to:
         assert((off_t)ae == AMI_ERROR_END_OF_STREAM);

output from command:

GRASS 7.0.svn (ncstpft_nad83):/data2/grass7_svn/grass_trunk/bin.x86_64-unknown-linux-gnu >
r.terraflow --overwrite elevation=nc_20ft_ncfpm filled=nc_fill
direction=nc_direct swatershed=nc_sink accumulation=nc_flow_accum
tci=nc_tci memory=8000 stream_dir=/data2/bareearth
stats=/data2/bareearth/stats2.out
STREAM temporary files in /data2/bareearth (THESE INTERMEDIATE STREAMS
WILL NOT BE DELETED IN CASE OF ABNORMAL TERMINATION OF THE PROGRAM. TO
SAVE SPACE PLEASE DELETE THESE FILES MANUALLY!)
MFD flow direction
D8CUT=999999986991104.000000
Memory size: 7.81G (8388608000) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
total elements=6783000000, nodata elements=3291491362
largest temporary files:
FILL: 454.84G (488376000000) [-1806934592 elements, 72B each]
FLOW: 312.17G (335184829248) [3491508638 elements, 96B each]
Will need at least 909.67G (976752000000) space available in
/data2/bareearth
------------------------------
COMPUTING FLOW DIRECTIONS
classifying nodata (inner & boundary)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.93MB
EMPQUEUEADAPTIVE: desired memory: 7997.93MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8386435434.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047221117
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.15MB
EMPQUEUEADAPTIVE: desired memory: 7997.15MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8385624130.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047119704
assigning preliminary directions
finding flat areas (plateaus and depressions)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.41MB
EMPQUEUEADAPTIVE: desired memory: 7997.41MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8385894538.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047153505
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7996.64MB
EMPQUEUEADAPTIVE: desired memory: 7996.64MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8385083234.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047052092
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7995.86MB
EMPQUEUEADAPTIVE: desired memory: 7995.86MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8384271930.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046950679
assigning directions on plateaus
generating watersheds and watershed graph
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7998.96MB
EMPQUEUEADAPTIVE: desired memory: 7998.96MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail:
8387517074.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 261837856
flooding depressions
available memory: 7999MB (8387787594B)
UnionFind::makeSet: reallocate double 2000
UnionFind::makeSet: reallocate double 4000
UnionFind::makeSet: reallocate double 8000
UnionFind::makeSet: reallocate double 16000
UnionFind::makeSet: reallocate double 32000
UnionFind::makeSet: reallocate double 64000
UnionFind::makeSet: reallocate double 128000
UnionFind::makeSet: reallocate double 256000
UnionFind::makeSet: reallocate double 512000
UnionFind::makeSet: reallocate double 1024000
UnionFind::makeSet: reallocate double 2048000
UnionFind::makeSet: reallocate double 4096000
UnionFind::makeSet: reallocate double 8192000
UnionFind::makeSet: reallocate double 16384000
UnionFind::makeSet: reallocate double 32768000
warning: watershed 1 (R=1) not done
warning: watershed 31667557 (R=31688834) not done
warning: watershed 31667558 (R=31688834) not done
warning: watershed 31674901 (R=31688834) not done
warning: watershed 31676231 (R=31688834) not done
warning: watershed 31688834 (R=31688834) not done
------------------------------
REASSIGNING DIRECTIONS
finding flat areas (plateaus and depressions)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.15MB
EMPQUEUEADAPTIVE: desired memory: 7997.15MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8385624138.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047119705
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7996.38MB
EMPQUEUEADAPTIVE: desired memory: 7996.38MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8384812834.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047018292
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7995.61MB
EMPQUEUEADAPTIVE: desired memory: 7995.61MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8384001530.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046916879
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7994.83MB
EMPQUEUEADAPTIVE: desired memory: 7994.83MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail:
8383190226.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046815466
assigning directions on plateaus
creating flowStream: [AMI_STREAM /data2/bareearth/flowStream 0]
compute flow directions done.
  100%
  100%
  100%
------------------------------
COMPUTING FLOW ACCUMULATION
creating sweep stream from fill output stream
   sorting sweep stream
sweeping: EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7999.73MB
EMPQUEUEADAPTIVE: desired memory: 7999.73MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail:
8388328213.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 261863204
  100%
sorting sweep output stream
  100%
r.terraflow complete.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1421#comment:1>
GRASS GIS <http://grass.osgeo.org>

#1421: scalability of r.terraflow
--------------------------------------+-------------------------------------
Reporter: dnewcomb | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-develbranch6
Keywords: r.terraflow, large grids | Platform: Linux
      Cpu: x86-64 |
--------------------------------------+-------------------------------------
Changes (by neteler):

  * keywords: r.terraflow large grids => r.terraflow, large grids
  * component: Default => Raster

Comment:

Could you please retry with a recent version of G7?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1421#comment:2>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

Running r.terraflow on Ubuntu 12.04.4 64-bit with
grass-7.0.svn_src_snapshot_2014_03_22
and the same input grid (51000 rows, 133000 columns).
Run from the GUI, the command stops with the error:

ERROR: [nrows=22004, ncols=33006] dimension_type overflow -- change
dimension_type and recompile

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:3>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

r.terraflow has been running for 9 hours with the modifications posted
in the diff files, on a grid of 51000 rows and 133000 columns. It should
take 5 days or so to complete on this computer.

The thing that confuses me at the moment is the FILL line in the temp
file listing below. Why is there a large negative number of elements?

total elements=6783000000, nodata elements=3291487486
largest temporary files:
FILL: 454.84G (488376000000) [-1806934592 elements, 72B each]
FLOW: 312.17G (335185201344) [3491512514 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data1

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:4>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:4 dnewcomb]:

> The thing that confuses me at the moment is the FILL line below in the
> temp file listing below. Why are there a large negative number of
> elements?

r.terraflow/main.cpp:410:
{{{
   G_message( "\t\t FILL: %s [%d elements, %dB each]",
                   formatNumber(buf, fillmaxsize),
                   nrows * ncols, sizeof(waterWindowType));
   G_message( "\t\t FLOW: %s [%ld elements, %dB each]",
                   formatNumber(buf, flowmaxsize),
                   (long)(nrows * ncols - nodata_count),
                   sizeof(sweepItem));
}}}

Even if dimension_type is changed to "long", the value is still being
formatted as an "int" ("%d" conversion specifier).

Also, the cast to "long" in the second call is wrong. If nrows and ncols
are of type "int" (or any smaller type, e.g. "short"), the multiplication
will be performed as "int", which may overflow; casting the (possibly
overflowed) result to "long" won't change that.

It should be:

{{{
   G_message( "\t\t FLOW: %s [%ld elements, %dB each]",
                   formatNumber(buf, flowmaxsize),
                   (long)nrows * ncols - nodata_count, sizeof(sweepItem));
}}}

Casting either of the operands to "long" will force the multiplication to
be performed as "long" and yield a "long" result (however, note that
"long" is still only 32 bits on 64-bit versions of Windows).

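The negative element count in the listing is consistent with the 64-bit cell count being reduced to 32 bits. A minimal C sketch of that arithmetic follows; the helper names are hypothetical and not from r.terraflow, and the wrap is emulated with unsigned arithmetic because real signed overflow is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value a 32-bit two's-complement int would
   hold if only the low 32 bits of a non-negative 64-bit count were
   kept. */
int64_t wrap_to_int32(int64_t value)
{
    assert(value >= 0);
    uint32_t low = (uint32_t)((uint64_t)value & 0xFFFFFFFFu);
    if (low > (uint32_t)INT32_MAX)
        return (int64_t)low - 4294967296LL; /* subtract 2^32 */
    return (int64_t)low;
}

/* Widen one operand before multiplying, so the product is computed
   in 64 bits rather than in int. */
int64_t cell_count_64(int32_t nrows, int32_t ncols)
{
    return (int64_t)nrows * ncols;
}
```

With nrows=51000 and ncols=133000, cell_count_64() gives 6783000000 and wrap_to_int32() of that value gives -1806934592, the number shown on the FILL line. Using a fixed-width 64-bit type (or "long long" printed with "%lld") avoids depending on the width of "long".
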
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:5>
GRASS GIS <http://grass.osgeo.org>

New description:

I have an fcell grid of elevations for the state of North Carolina (51000
rows, 133000 columns, 6783000000 cells). I tried to run r.terraflow in
GRASS7 (8/8/2011 svn snapshot) and ran into the dimension limits. So I
patched them according to Glynn's email,
http://www.osgeo.org/pipermail/grass-user/2004-February/024722.html,
and tried again. (Would it be better to change the dimension variable to
int instead of short int?)

This time my Streams file builds to about 26 GB and then r.terraflow bombs
with:
{{{
MFD flow direction
D8CUT=999999986991104.000000
Memory size: 808.00M (847249408) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
r.terraflow: grass2str.h:145: AMI_STREAM<T>*
cell2stream(char*, elevation_type, long int*) [with T =
float, elevation_type = float]: Assertion `nrows * ncols ==
str->stream_len()' failed.
}}}
The memory size is interesting, because I'm giving it 8 GB of RAM out of
16 GB in the command. The temp directory has about 900 GB of space, so it
has plenty of room.

The box is 64-bit Ubuntu 11.04.

Possibly related to:

http://trac.osgeo.org/grass/ticket/1006

--

Comment(by hamish):

add '!{{{' and '}}}' around code block.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:6>
GRASS GIS <http://grass.osgeo.org>

Comment(by hamish):

G_message %ld changes applied in trunk with r59505, and in devbr6 with
r59507.

I'd note that a few lines above this, a cast to `(long long)` is also used.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:7>
GRASS GIS <http://grass.osgeo.org>

Comment(by hamish):

fwiw, 'diff -u' which gives "Unified" diffs is the preferred diff format.
It provides a few lines of context around the change.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:8>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

Replying to [comment:8 hamish]:
> fwiw, 'diff -u' which gives "Unified" diffs is the preferred diff
> format. It provides a few lines of context around the change.
>
> Hamish

Thanks! Still learning..

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:9>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

Replying to [comment:7 hamish]:
> G_message %ld changes applied in trunk with r59505, and devbr6 with
> r59507.
>
> I'd note a few lines above this a cast to `(long long)` is also used.
>
> Hamish

Restarted the large grid run with r59507 and the edited 3scan.h and
types.h. The listing now reads:

COMPUTING FLOW DIRECTIONS
classifying nodata (inner & boundary)
total elements=6783000000, nodata elements=3291487486
largest temporary files:
FILL: 454.84G (488376000000) [6783000000 elements, 72B each]
FLOW: 312.17G (335185201344) [3491512514 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data1
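
The byte counts in this listing can be reproduced with plain 64-bit arithmetic. A minimal C sketch follows, assuming FILL holds one element per cell and FLOW one element per cell that is not nodata; the helper names are hypothetical, and the element sizes are copied from the listing rather than taken from the real structs.

```c
#include <assert.h>
#include <stdint.h>

/* Per-element sizes as printed in the listing; treated here as plain
   constants rather than sizeof() of the real structs. */
#define FILL_ELEMENT_BYTES 72
#define FLOW_ELEMENT_BYTES 96

/* Hypothetical helper: bytes for the FILL stream, one element per cell. */
int64_t fill_stream_bytes(int64_t total_cells)
{
    assert(total_cells >= 0);
    return total_cells * FILL_ELEMENT_BYTES;
}

/* Hypothetical helper: bytes for the FLOW stream, one element per
   cell that is not nodata. */
int64_t flow_stream_bytes(int64_t total_cells, int64_t nodata_cells)
{
    assert(nodata_cells >= 0 && nodata_cells <= total_cells);
    return (total_cells - nodata_cells) * FLOW_ELEMENT_BYTES;
}
```

For 6783000000 cells with 3291487486 nodata cells this gives 488376000000 bytes for FILL and 335185201344 bytes for FLOW. The "at least" figure of 976752000000 equals twice the FILL size for these numbers; whether that is how r.terraflow derives it is an assumption.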

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:10>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

Seems to have finished correctly in 53.3 hours.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1421#comment:11>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

Backported to relbr7 in r61340.

(unrelated:
dnewcomb, please add your large file calculation timings to
http://grasswiki.osgeo.org/wiki/GRASS_GIS_Performance )

Can the ticket be closed?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1421#comment:12>
GRASS GIS <http://grass.osgeo.org>

#1421: scalability of r.terraflow
--------------------------+-------------------------------------------------
  Reporter: dnewcomb | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-develbranch6
Resolution: fixed | Keywords: r.terraflow, large grids
  Platform: Linux | Cpu: x86-64
--------------------------+-------------------------------------------------
Changes (by dnewcomb):

  * status: new => closed
  * resolution: => fixed

Comment:

I will redo the timings when I get back from leave and post on the wiki.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1421#comment:13>
GRASS GIS <http://grass.osgeo.org>