[GRASS-user] r.watershed / r.terraflow - huge area (amazon)

Hello all

I’m working on deriving a nice drainage network for the Amazon basin. The problem is that the area is too large.

The region settings are these:

projection: 3 (Latitude-Longitude)
zone: 0
datum: wgs84
ellipsoid: wgs84
north: 6:17N
south: 21:30S
west: 80:37W
east: 44:50W
nsres: 0:00:03
ewres: 0:00:03
rows: 33340
cols: 42940
cells: 1431619600

With 14.3 billion cells, r.watershed would need about 450 GiB of disk space running in seg mode. Is that correct? (From the manual: 31 MB per 1 million cells.)

I do have that disk space available, but it’s on a secondary drive. If I set TMPDIR to that drive, will r.watershed use it? I ask because the documentation (https://grass.osgeo.org/grass72/manuals/variables.html) only says that this environment variable is used by “[Various GRASS GIS commands and wxGUI]”.

On a side note, r.terraflow should be an alternative for such a large dataset, but from the manual:
“r.terraflow has a limit on the number of rows and columns (max 32,767 each)”.

That’s the size of a positive short integer. Is this limit still needed?

thanks

Carlos


Prof. Carlos Henrique Grohmann
Institute of Energy and Environment - Univ. of São Paulo, Brazil

- Digital Terrain Analysis | GIS | Remote Sensing -

http://carlosgrohmann.com
http://orcid.org/0000-0001-5073-5572


Can’t stop the signal.

On Tue, Jan 10, 2017 at 12:30 PM, Carlos Grohmann
<carlos.grohmann@gmail.com> wrote:

> With 14.3 billion cells,

Isn't it 1431619600 = 1.431.619.600 = 1.4 billion cells?

> r.watershed would need about 450 GiB of disk space
> running in seg mode, is that correct? (from the manual, 31 MB for 1 million
> cells).

https://grass.osgeo.org/grass72/manuals/r.watershed.html#large-regions-with-many-cells
" The upper limit of the ram version is 2 billion (2^31 - 1) cells,
whereas the upper limit for the seg version is 9 billion-billion (2^63
- 1 = 9.223372e+18) cells."

So that would be supported by r.watershed.
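
A minimal sketch of that check, using only the two limits from the manual text quoted above. The function name is made up for illustration, and the 1-arc-second cell count is simply nine times the 3-arc-second count for the same region. It compares cell counts only and says nothing about whether the data would fit in memory.

```python
# Cell-count limits as quoted from the r.watershed manual.
RAM_LIMIT = 2**31 - 1  # upper limit for the ram version
SEG_LIMIT = 2**63 - 1  # upper limit for the seg version


def modes_within_limit(cells):
    """Return the r.watershed modes whose cell-count limit covers `cells`.

    Hypothetical helper: it only checks the documented limits and does
    not estimate memory or disk usage.
    """
    modes = []
    if cells <= RAM_LIMIT:
        modes.append("ram")
    if cells <= SEG_LIMIT:
        modes.append("seg")
    return modes


cells_3sec = 33340 * 42940   # region from this thread, 3 arc-sec resolution
cells_1sec = cells_3sec * 9  # same region at 1 arc-sec (3 x 3 as many cells)

print(cells_3sec, modes_within_limit(cells_3sec))
print(cells_1sec, modes_within_limit(cells_1sec))
```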

> I do have that disk space available, but it's in a secondary drive. If I set
> TMPDIR to that drive, will r.watershed use it? I ask because the
> documentation (https://grass.osgeo.org/grass72/manuals/variables.html) only
> says that this environmental variable is used by "[Various GRASS GIS
> commands and wxGUI]".

AFAIK the TMPDIR variable is not relevant here. The tmp dir is

location/mapset/.tmp/

which you could link to the extra drive. This should be simplified for sure:

Involved code:
https://trac.osgeo.org/grass/browser/grass/trunk/lib/gis/file_name.c
https://trac.osgeo.org/grass/browser/grass/trunk/lib/gis/tempfile.c

Please open a ticket with the relevant info in it.
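
One way the linking suggested above might be done from Python is sketched below. This is an untested assumption, not a documented GRASS procedure: the function name and both paths are hypothetical, it assumes no GRASS session is running in that mapset, and it assumes GRASS follows a symlinked .tmp directory.

```python
import shutil
from pathlib import Path


def relocate_tmp(mapset_dir, target_dir):
    """Replace <mapset>/.tmp with a symlink to a directory on another drive.

    Hypothetical helper. Any existing temporary files under .tmp are
    discarded, so run it only while no GRASS session uses the mapset.
    """
    mapset_tmp = Path(mapset_dir) / ".tmp"
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    if mapset_tmp.is_symlink():
        mapset_tmp.unlink()        # drop a previous link
    elif mapset_tmp.exists():
        shutil.rmtree(mapset_tmp)  # drop the old temporary directory

    mapset_tmp.symlink_to(target, target_is_directory=True)
    return mapset_tmp


# Example call with placeholder paths (adjust to your own setup):
# relocate_tmp("/home/user/grassdata/amazon/PERMANENT", "/mnt/bigdrive/grass_tmp")
```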

> On a side note, r.terraflow should be an alternative for such a large
> dataset, but from the manual:
> "r.terraflow has a limit on the number of rows and columns (max 32,767
> each)".
> That's the size of a positive short integer. Is this limit still needed?

I have no idea here, did you try it?

Markus

--
Markus Neteler
http://www.mundialis.de - free data with free software
http://grass.osgeo.org
http://courses.neteler.org/blog

Hi,

2017-01-10 17:51 GMT+01:00 Markus Neteler <neteler@osgeo.org>:

> AFAIK the TMPDIR variable is not relevant here. The tmp dir is
>
> location/mapset/.tmp/

btw, this is possible for the vector library via the GRASS_VECTOR_TMPDIR_MAPSET=0
and TMPDIR variables. There were some reasons why it was not also implemented
for the raster library, but I don't remember them exactly.

Ma

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

2017-01-10 18:10 GMT+01:00 Martin Landa <landa.martin@gmail.com>:

> btw, this is possible for the vector library via the GRASS_VECTOR_TMPDIR_MAPSET=0
> and TMPDIR variables. There were some reasons why it was not also implemented
> for the raster library, but I don't remember them exactly.

See [1]

"""
GRASS_VECTOR_TMPDIR_MAPSET[vectorlib]
By default GRASS temporary directory is located in
$LOCATION/$MAPSET/.tmp/$HOSTNAME. If GRASS_VECTOR_TMPDIR_MAPSET is set
to '0', the temporary directory is located in TMPDIR (environmental
variable defined by the user or GRASS initialization script if not
given).
Important note: This variable is currently used only in vector
library. In other words the variable is ignored by raster or raster3d
library.
"""
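
As a sketch of how those two variables could be passed to a child process from Python: the function name and the path are hypothetical, and per the note quoted above this would only affect the vector library, so it would not move r.watershed's temporary files.

```python
import os


def vector_tmp_env(tmpdir, base_env=None):
    """Return a copy of the environment with the vector temp dir redirected.

    Hypothetical helper: it only sets the two variables named in the
    manual text quoted above and leaves everything else unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GRASS_VECTOR_TMPDIR_MAPSET"] = "0"  # use TMPDIR instead of the mapset
    env["TMPDIR"] = tmpdir
    return env


# The result can be handed to subprocess.run(..., env=env) when starting
# a vector module from inside a GRASS session.
env = vector_tmp_env("/mnt/bigdrive/tmp")  # placeholder path
```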

Ma

[1] https://grass.osgeo.org/grass72/manuals/variables.html

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

Hi,

Sorry for the bad math :)

I mixed things up. With 03-sec resolution, I have 1.4 billion cells, and r.watershed would need 45.5 GB.
With 01-sec resolution, I have 12.8 billion cells, and r.watershed would need about 415 GB.
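
For anyone who wants to redo this arithmetic, here is a minimal sketch based on the region bounds from the first message and the manual's figure of 31 MB per 1 million cells. The helper names are made up, and it assumes decimal units (1 GB = 1000 MB). The estimates come out a few percent below the figures above, which probably reflects rounding and MB versus MiB conversion.

```python
def arcsec(deg, minutes=0):
    """Convert degrees and minutes to arc-seconds."""
    return deg * 3600 + minutes * 60


# Region bounds from the thread, in signed arc-seconds (south and west negative).
NORTH = arcsec(6, 17)
SOUTH = -arcsec(21, 30)
WEST = -arcsec(80, 37)
EAST = -arcsec(44, 50)

MB_PER_MILLION_CELLS = 31  # seg mode figure quoted from the r.watershed manual


def grid(res_arcsec):
    """Return (rows, cols, cells) for the region at the given resolution."""
    rows = (NORTH - SOUTH) // res_arcsec
    cols = (EAST - WEST) // res_arcsec
    return rows, cols, rows * cols


def seg_disk_gb(cells):
    """Rough seg-mode disk estimate in GB, assuming 1 GB = 1000 MB."""
    return cells / 1e6 * MB_PER_MILLION_CELLS / 1000


for res in (3, 1):
    rows, cols, cells = grid(res)
    print(f"{res} arc-sec: {rows} x {cols} = {cells:,} cells, "
          f"about {seg_disk_gb(cells):.0f} GB")
```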

The Location is in a drive with 1TB free. So this shouldn’t be a problem. I’m running it now and it seems it’s going to work. I’ll report on this tomorrow.

I think I didn’t express the r.terraflow issue correctly. I didn’t mean to ask whether the rows/columns limit is needed in the sense of it being a required parameter; it was more of a general question, in the sense of ‘do we need to have that limit hardcoded?’ I can open a ticket on this issue.

best

Carlos


On Wed, Jan 11, 2017 at 3:58 PM, Carlos Grohmann <carlos.grohmann@gmail.com>
wrote:

> I think I didn't express the r.terraflow issue correctly. I didn't mean to
> ask whether the rows/columns limit is needed in the sense of it being a
> required parameter; it was more of a general question, in the sense of 'do we
> need to have that limit hardcoded?' I can open a ticket on this issue.

The limit of 32,767 is indirectly hard-coded because r.terraflow uses
short integers to store row and column indices. The authors of the module
assumed that massive grids are not larger than 32,767 x 32,767 cells. This
limit can be changed by using 32-bit integers to store row and column
indices, but this will increase disk space requirements, which are already
higher than for r.watershed.
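
To illustrate the trade-off described above, a small sketch follows. It is not r.terraflow's actual record layout. It only compares the largest value a signed 16-bit integer can hold with the region from this thread, and shows how many bytes a (row, col) pair takes at each width.

```python
import struct

INT16_MAX = 2**15 - 1  # 32,767, the largest signed 16-bit (short) value


def fits_in_int16(rows, cols):
    """True if both dimensions are within the 32,767 limit from the manual."""
    return rows <= INT16_MAX and cols <= INT16_MAX


# Bytes needed to store one (row, col) pair, using standard sizes.
PAIR_BYTES_16 = struct.calcsize("<hh")  # two 16-bit integers
PAIR_BYTES_32 = struct.calcsize("<ii")  # two 32-bit integers

print(fits_in_int16(33340, 42940))   # region from this thread
print(PAIR_BYTES_16, PAIR_BYTES_32)  # the index pair doubles in size
```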

Markus M

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user