[GRASS-user] r.watershed and big files

Hello,

I am trying to use the r.watershed function to calculate streams (rivers)
from elevation data. As long as I make sure that the GRASS region is not
too big things work fine, but in the end I would like to do the
calculations at the resolution of the elevation data.

Let me first clarify what I mean by big in this case. I can do the
calculations with regions up to 10000x10000 pixels, but above that I start
to run into trouble. My elevation data is about 80000x60000 pixels, so I
guess that is quite big.

So my first question would be: is the file I am trying to process too big,
or has r.watershed been run successfully on such big files?

From the mailing list I understand GRASS needs to be compiled with large
file support to work with such big files. I am not sure if my version is; I
have tried it on Windows with the stable version 6.4 from the OSGeo4W
installer and on Ubuntu with the 6.4 version from their repository.

This is the error I got on Ubuntu while running r.watershed:

SECTION 3 beginning: Initiating Variables. 5 sections total.
WARNING: No such file or directory
WARNING: cseg_open(): could not write segment file
Floating point exception
WARNING: Subprocess failed with exit code 34817

Any advice on how to prevent this error?

Thanks,

Arno

On Wed, 24 Aug 2011, arno@agerrius.nl wrote:

From the mailing list I understand GRASS needs to be compiled with large
file support to work with such big files. I am not sure if my version is;
I have tried it on Windows with the stable version 6.4 from the OSGeo4W
installer and on Ubuntu with the 6.4 version from their repository.

Arno,

   Others probably have better advice, but my suggestion is to build the
source on Ubuntu yourself. That way you can be sure that LFS is built in. I
don't know whether this configuration option is standard in any pre-built
binary, as I always roll my own.
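
To illustrate why large file support (LFS) matters at these raster sizes, here is a minimal shell sketch. It assumes a build without LFS on a system with 32-bit signed file offsets, so a single file is capped at 2^31 - 1 bytes. The function name needs_lfs and the figure of 4 bytes per cell are assumptions made for illustration, not something taken from the GRASS sources.

```shell
#!/bin/sh
# Assumption: without large file support, a single file can hold at most
# 2^31 - 1 bytes (about 2 GiB) on systems with 32-bit signed file offsets.
MAX_BYTES_NO_LFS=2147483647

# needs_lfs ROWS COLS BYTES_PER_CELL
# Exit status 0 if one file holding one value per cell would exceed the
# limit above. Requires 64-bit shell arithmetic (bash, dash on 64-bit).
needs_lfs() {
    [ $(( $1 * $2 * $3 )) -gt "$MAX_BYTES_NO_LFS" ]
}

# 4 bytes per cell is an illustrative assumption (one 32-bit integer).
for size in "10000 10000" "80000 60000"; do
    set -- $size
    if needs_lfs "$1" "$2" 4; then
        echo "$1 x $2: a single file would exceed 2 GiB, LFS needed"
    else
        echo "$1 x $2: stays below 2 GiB"
    fi
done
```

Under these assumptions the 10000x10000 region stays below the limit, while the full 80000x60000 raster does not.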

Rich

On Wed, 2011-08-24 at 13:58 +0200, arno@agerrius.nl wrote:

Let me first clarify what I mean by big in this case. I can do the
calculations with regions up to 10000x10000 pixels, but above that I start
to run into trouble. My elevation data is about 80000x60000 pixels, so I
guess that is quite big.

There are some nice guidelines on the r.watershed man page regarding
memory requirements. Specifically:

------------------------------------------------------
In-memory mode and disk swap mode

There are two versions of this program: ram and seg. ram is used by
default, seg can be used by setting the -m flag.

The ram version requires a maximum of 31 MB of RAM for 1 million cells.
Together with the amount of system memory (RAM) available, this value
can be used to estimate whether the current region can be processed with
the ram version.

The ram version uses virtual memory managed by the operating system to
store all the data structures and is faster than the seg version; seg
uses the GRASS segmentation library which manages data in disk files.
seg uses only as much system memory (RAM) as specified with the memory
option, allowing other processes to operate on the same system, even
when the current geographic region is huge.

Due to memory requirements of both programs, it is quite easy to run out
of memory when working with huge map regions. If the ram version runs
out of memory and the resolution size of the current geographic region
cannot be increased, either more memory needs to be added to the
computer, or the swap space size needs to be increased. If seg runs out
of memory, additional disk space needs to be freed up for the program to
run. The r.terraflow module was specifically designed with huge regions
in mind and may be useful here as an alternative.
------------------------------------------------------

So with your 80,000 X 60,000 = 4,800,000,000 cells region you would need
about 150 GB (!) of RAM to run it in memory.
So you'll most likely need to use the -m option which writes everything
out to disk, making the process much slower.
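
The arithmetic behind that estimate can be sketched in a few lines of shell. This is only a back-of-the-envelope helper: the function name estimate_ram_mb is made up for illustration, and the 31 MB per million cells figure is the upper bound quoted from the man page above.

```shell
#!/bin/sh
# Upper bound from the r.watershed man page: 31 MB of RAM per 1 million
# cells in the in-memory ("ram") mode.
MB_PER_MILLION_CELLS=31

# estimate_ram_mb ROWS COLS
# Prints the estimated upper bound in MB. Requires 64-bit shell
# arithmetic (bash, or dash on a 64-bit system).
estimate_ram_mb() {
    cells=$(( $1 * $2 ))
    # Multiply before dividing so small regions do not round down to zero.
    echo $(( cells * MB_PER_MILLION_CELLS / 1000000 ))
}

estimate_ram_mb 10000 10000    # region that still worked: 3100 (MB)
estimate_ram_mb 80000 60000    # full DEM: 148800 (MB), roughly 150 GB
```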

This is the error I got on Ubuntu while running r.watershed:

SECTION 3 beginning: Initiating Variables. 5 sections total.
WARNING: No such file or directory
WARNING: cseg_open(): could not write segment file
Floating point exception
WARNING: Subprocess failed with exit code 34817

Not sure about these errors. They do *not* indicate memory problems.

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

On Wed, Aug 24, 2011 at 3:49 PM, Micha Silver <micha@arava.co.il> wrote:

So with your 80,000 X 60,000 = 4,800,000,000 cells region you would need
about 150 GB (!) of RAM to run it in memory.

This large number of cells is not yet supported in r.watershed (see a
similar problem with r.terraflow [0]). The maximum number of cells is
2^31 - 1 = 2,147,483,647. This limitation is 1) independent of whether
the in-memory or on-disk option is used, and 2) for the (recommended)
on-disk option, mainly imposed by the segment library of GRASS; it
therefore also applies to e.g. r.cost and r.walk.
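
As a quick way to check a region against that limit before starting a run, something like the following shell sketch could be used. The helper name fits_cell_limit is hypothetical; the only figure taken from the text above is the maximum of 2^31 - 1 cells.

```shell
#!/bin/sh
# Maximum number of cells mentioned above: 2^31 - 1.
MAX_CELLS=2147483647

# fits_cell_limit ROWS COLS
# Exit status 0 if ROWS * COLS does not exceed MAX_CELLS.
# Requires 64-bit shell arithmetic (bash, or dash on a 64-bit system).
fits_cell_limit() {
    [ $(( $1 * $2 )) -le "$MAX_CELLS" ]
}

for size in "10000 10000" "19800 16200" "80000 60000"; do
    set -- $size
    if fits_cell_limit "$1" "$2"; then
        echo "$1 x $2: within the limit"
    else
        echo "$1 x $2: too many cells"
    fi
done
```

Only the full 80000x60000 region exceeds the limit. As an illustration, halving the resolution in both directions would give 40000x30000 = 1.2 billion cells, which fits.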

This is the error I got on Ubuntu while running r.watershed:

SECTION 3 beginning: Initiating Variables. 5 sections total.
WARNING: No such file or directory
WARNING: cseg_open(): could not write segment file
Floating point exception
WARNING: Subprocess failed with exit code 34817

Not sure about these errors. They do *not* indicate memory problems.

I guess so too. The first message indicates that the on-disk option
(-m flag) is used. One reason could be not enough free disk space to
create temporary files. BTW, there seems to be a cut'n paste error,
the first message should read "SECTION 1 beginning: ..." and not
"SECTION 3 beginning: ..."

Just out of curiosity: how did you create this massive 4.8 billion cells DEM?

Markus M

[0] http://lists.osgeo.org/pipermail/grass-dev/2011-August/055314.html

Quoting Arno Gerretsen <arno@agerrius.nl>:

I guess so too. The first message indicates that the on-disk option
(-m flag) is used. One reason could be not enough free disk space to
create temporary files. BTW, there seems to be a cut'n paste error,
the first message should read "SECTION 1 beginning: ..." and not
"SECTION 3 beginning: ..."

I probably forgot to copy the first part then. Let me try again with the new knowledge about the maximum size. Hopefully that will prevent these errors.

I did another attempt to use r.watershed on my big file. This time I compiled GRASS 6.4.1 from source code and made sure it was configured for largefiles (--enable-largefile). However I still get errors from r.watershed and it seems related to the segment file:

GRASS 6.4.1 (ML):~/grass/bin > r.watershed elevation=ML_elev stream=stream_total accumulation=accumulation_total threshold=10000 --o -m memory=2000
SECTION 1 beginning: Initiating Variables. 5 sections total.
WARNING: No such file or directory
WARNING: dseg_open(): could not write segment file
WARNING: Subprocess failed with exit code 136

Can anybody tell me what is causing this (and how I could solve it)? In this case I was using a region of 19800x16200 pixels.

Thanks,

Arno

On Thu, Sep 1, 2011 at 4:07 PM, Arno Gerretsen <arno@agerrius.nl> wrote:

Can anybody tell me what is causing this (and how I could solve it)? In this
case I was using a region of 19800x16200 pixels.

I think the reason is that GRASS could not initialize a temporary
segment file, maybe because of insufficient free disk space?
dseg_open() is called relatively late in the initialization process, when
a number of other segment files have already been successfully
created. You could try to create a new GRASS database on a different
partition with more free disk space and try r.watershed again.
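
A simple way to see how much room a partition has before trying again is sketched below. It assumes that the temporary segment files end up inside the GRASS database directory, which is what the suggestion above implies; the helper names free_kb and has_free_gb are made up for this example.

```shell
#!/bin/sh
# free_kb DIR
# Prints the free space, in KiB, on the file system that holds DIR.
free_kb() {
    # -P forces the portable one-line-per-file-system format; -k reports
    # sizes in KiB. Field 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_gb DIR GB
# Exit status 0 if DIR has at least GB gibibytes free.
has_free_gb() {
    [ "$(free_kb "$1")" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example: check the current directory. In practice this would be the
# directory of the GRASS database that holds the mapset.
dir=.
echo "free space in $dir: $(( $(free_kb "$dir") / 1024 / 1024 )) GiB"
```

Inside a GRASS session, the database directory can be looked up with g.gisenv get=GISDBASE and passed to these helpers.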

If possible, you could also try GRASS7 and run r.watershed --verbose;
this will print out estimated disk and memory requirements (and is
much faster than r.watershed -m in GRASS 6).

Markus M

Quoting Markus Metz <markus.metz.giswork@googlemail.com>:

I think the reason is that GRASS could not initialize a temporary
segment file, maybe because of insufficient free disk space?
dseg_open() is called relatively late in the initialization process, when
a number of other segment files have already been successfully
created. You could try to create a new GRASS database on a different
partition with more free disk space and try r.watershed again.

If possible, you could also try GRASS7 and run r.watershed --verbose;
this will print out estimated disk and memory requirements (and is
much faster than r.watershed -m in GRASS 6).

I am already using the partition with the most disk space; there is about 170 GB left.

I think I will give GRASS 7 a go and see if that helps.

Thanks,

Arno

On Thu, Sep 1, 2011 at 4:07 PM, Arno Gerretsen <arno@agerrius.nl> wrote:
...

I did another attempt to use r.watershed on my big file. This time I
compiled GRASS 6.4.1 from source code and made sure it was configured for
largefiles (--enable-largefile).

I tried it myself with GRASS 6.4.2svn (note that you have 6.4.1). That may
make the difference.

GRASS 6.4.2svn (patUTM32):~ > g.region rast=italy_dem20_final res=60 -p
projection: 1 (UTM)
zone: 32
datum: wgs84
ellipsoid: wgs84
north: 5220298.87460934
south: 3930058.87460934
west: 313280.88065731
east: 1328680.88065731
nsres: 60
ewres: 60.00118182
rows: 21504
cols: 16923
cells: 363912192

GRASS 6.4.2svn (patUTM32):~ > r.watershed italy_dem20_final
stream=stream_total accumulation=accumulation_total threshold=10000 -m
memory=2000
SECTION 1 beginning: Initiating Variables. 5 sections total.
SECTION 1b (of 5): Determining Offmap Flow.
100%
SECTION 2: A * Search.
  46%
...

Looks fine!
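
For reference, the cell count of such a region can be recomputed from the rows and cols lines of the g.region -p output. The sketch below reads that kind of output on standard input; the function name region_cells is hypothetical, and the sample input is copied from the listing above.

```shell
#!/bin/sh
# region_cells
# Reads "g.region -p" style output on stdin and prints rows * cols,
# which should match the reported "cells:" value.
region_cells() {
    awk '$1 == "rows:" { rows = $2 }
         $1 == "cols:" { cols = $2 }
         END { printf "%d\n", rows * cols }'
}

# Sample input copied from the g.region output above.
region_cells <<'EOF'
rows: 21504
cols: 16923
cells: 363912192
EOF
```

With the man page figure of at most 31 MB per million cells, a region of this size would come to roughly 11 GB in ram mode, which fits with running the test with the -m flag.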

Markus

Quoting Markus Neteler <neteler@osgeo.org>:

I tried it myself with GRASS 6.4.2svn (note that you have 6.4.1). That may
make the difference.

I got the GRASS 7.0 trunk compiled now and that one seems to work as well.

Arno

Just for the record, the calculation time:

GRASS 6.4.2svn (patUTM32):~ > r.watershed italy_dem20_final
stream=stream_total accumulation=accumulation_total threshold=10000 -m
memory=2000

(cells: 364 million)

GRASS 6.4.2svn: done in two hours.

GRASS 7 svn, on the same machine: 36 min :)
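
To put those two timings side by side, here is a small shell sketch. It takes "two hours" as exactly 120 minutes, which is an assumption, and the helper names speedup and cells_per_second are made up for illustration.

```shell
#!/bin/sh
# speedup OLD_MINUTES NEW_MINUTES
# Prints OLD / NEW with one decimal place.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

# cells_per_second CELLS MINUTES
# Prints the throughput, rounded down to whole cells per second.
cells_per_second() {
    echo $(( $1 / ($2 * 60) ))
}

speedup 120 36                    # prints 3.3
cells_per_second 363912192 120    # GRASS 6.4.2svn: prints 50543
cells_per_second 363912192 36     # GRASS 7 svn: prints 168477
```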

Markus

On Fri, 2 Sep 2011, Markus Neteler wrote:

(cells: 364 million)
GRASS 6.4.2svn: done in two hours.
GRASS 7 svn, on the same machine: 36 min :)

Markus,

   Quite impressive. Is 7.svn ready for real-world production work?

   I'll be starting a project where I'll need a fairly high resolution (yet
to be determined) DEM of a 15 mi^2 area. Modeling hydrology over this area
will take some time.

Rich

On Fri, Sep 2, 2011 at 3:34 PM, Rich Shepard <rshepard@appl-ecosys.com> wrote:

Quite impressive. Is 7.svn ready for real-world production work?

Yes and no:

- the C modules are pretty stable in general,
- the Python code needs way more testing and fixing.

Markus

Markus Neteler wrote:

- the Python code needs way more testing and fixing.

Which Python code? The GUI or the scripts?

--
Glynn Clements <glynn@gclements.plus.com>

On Mon, Sep 5, 2011 at 3:08 AM, Glynn Clements <glynn@gclements.plus.com> wrote:

Which Python code? The GUI or the scripts?

Both.
See my recent fixes + grass-dev emails (but I am concentrating on the
scripts for now). However, it is already way better compared to
last week :)

Markus