[GRASS-user] r.stream.extract error

Hi,
I’m using the GRASS command r.stream.extract:

r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=35000 --o --verbose

where elv is a raster of 142690 * 80490 = 11,485,118,100 cells,

and I get this error:

12.97% of data are kept in memory
Will need up to 293.52 GB (300563 MB) of disk space
Creating temporary files…
Loading input raster maps…
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…48…51…54…57…60…63…66…69…72…75…78…81…84…87…90…93…96…99…100
ERROR: Unable to load input raster map(s)

According to the help manual, memory=35000 should be set according to the overall memory available. I set the HPC upper memory limit to 40 GB.

I tried several combinations of these parameters, but I still get the same error.
If r.stream.extract is based on r.watershed, then the segment library should be able to handle a huge raster.

Does anyone know how to get past this limitation/error?

Thank you
Best

Giuseppe Amatulli, Ph.D.

Research scientist at
Yale School of Forestry & Environmental Studies
Yale Center for Research Computing
Center for Science and Social Science Information
New Haven, 06511

Teaching: http://spatial-ecology.org
Work: https://environment.yale.edu/profile/giuseppe-amatulli/

On Mon, Oct 30, 2017 at 1:42 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi,
I’m using the GRASS command r.stream.extract:

r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=35000 --o --verbose

where elv is a raster of 142690 * 80490 = 11,485,118,100 cells,

and I get this error:

12.97% of data are kept in memory
Will need up to 293.52 GB (300563 MB) of disk space
Creating temporary files…
Loading input raster maps…
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…48…51…54…57…60…63…66…69…72…75…78…81…84…87…90…93…96…99…100
ERROR: Unable to load input raster map(s)

This error is caused by integer overflow, because not all variables necessary to support such large maps were 64-bit integers.

Fixed in trunk and relbr72 with r71620 and r71621, and tested with a DEM of 172800 * 67200 = 11,612,160,000 cells: r.stream.extract finished successfully in 18 hours (not on an HPC, but on a standard desktop machine with 32 GB of RAM and a 750 GB SSD).

According to the help manual, memory=35000 should be set according to the overall memory available. I set the HPC upper memory limit to 40 GB.

I tried several combinations of these parameters, but I still get the same error.
If r.stream.extract is based on r.watershed, then the segment library should be able to handle a huge raster.

r.stream.extract is based on a version of r.watershed that did not yet support such huge raster maps; therefore, support for such huge maps needed to be added to r.stream.extract separately.

Does anyone know how to get past this limitation/error?

Please use the latest GRASS 7.2 or GRASS 7.3 version from svn.

Markus M
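For readers following along, a minimal sketch of checking out and building the fixed 7.2 release branch from svn (the URL follows the OSGeo svn layout of the time; the target directory and configure options are examples, not from the thread):

```shell
# Check out the GRASS 7.2 release branch (contains the r71620/r71621 fix)
svn checkout https://svn.osgeo.org/grass/grass/branches/releasebranch_7_2 grass72
cd grass72

# Configure and build; real builds usually need extra options per system
./configure
make
```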

grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user

Thanks Markus!!
I will test it and let you know how it works.

I have a few more questions:

  1. What is now the upper limit on the number of cells that r.stream.extract can handle?
  2. Is the r.stream.basins add-on subject to the same limitation? If so, would it be possible to update r.stream.basins as well?
  3. Does r.stream.extract support multi-threading through OpenMP? Would it be difficult to implement?

Best
Giuseppe

On Wed, Nov 1, 2017 at 7:15 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Thanks Markus!!
I will test it and let you know how it works.

Your feedback is very helpful!

I have a few more questions:

  1. What is now the upper limit on the number of cells that r.stream.extract can handle?

About 1.15e+18 cells.

Another limitation is the number of detected stream segments. This must not be larger than 2,147,483,647 streams; therefore, you need to figure out a reasonable threshold with a smaller test region. A threshold of 0.5 is definitely too small, no matter how large or small the input is. The threshold should typically be larger than 1000, but it depends somewhat on the resolution of the input. As a rule of thumb, with a coarser resolution a smaller threshold might be suitable; with a higher resolution, the threshold should be larger. Testing different threshold values on a small subset of the full region can save a lot of time.
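A sketch of such a subset test, assuming the map names from the thread (elv, upa); the region bounds and threshold values are placeholders. The idea is to shrink the computational region, sweep a few thresholds, and compare the number of resulting stream cells:

```shell
# Zoom the computational region to a small test window (bounds are examples)
g.region raster=elv n=1200000 s=1100000 e=600000 w=500000 -p

# Sweep candidate thresholds and report the number of stream cells for each
for thr in 100 500 1000 5000; do
    r.stream.extract elevation=elv accumulation=upa threshold=${thr} \
        stream_raster=stream_t${thr} memory=4000 --o --quiet
    echo "threshold=${thr}:"
    r.univar -g map=stream_t${thr} | grep '^n='   # n = non-null (stream) cells
done
```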

  2. Is the r.stream.basins add-on subject to the same limitation? If so, would it be possible to update r.stream.basins as well?

The limitation in r.watershed and r.stream.extract comes from the search for drainage directions and flow accumulation. The other r.stream.* modules should support large input data, as long as the number of stream segments does not exceed 2,147,483,647.
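One way to check this limit after a run, assuming stream segment categories are assigned sequentially (so the maximum category approximates the segment count, an assumption on my part):

```shell
# Report min/max category of the stream raster; max should stay well
# below the 32-bit signed integer limit of 2,147,483,647
r.info -r map=stream
```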

  3. Does r.stream.extract support multi-threading through OpenMP? Would it be difficult to implement?

In your case, less than 13% of the temporary data is kept in memory. Parallelization with OpenMP or similar will not help here; your CPU will run at less than 20% load with a single thread anyway. The limit is disk I/O. You can make it faster by using more memory and/or a faster disk storage device.

Markus M

Thanks again!!

I’m working with an area flow accumulation, so the 0.5 threshold means 0.5 km2, which is about 60 cells of 90 m * 90 m. My intention is to prune back the streams later on with a machine-learning procedure. I will be careful not to exceed 2,147,483,647 detected stream segments.

To reduce I/O as much as possible, I save the *.tif files in /dev/shm on each node, read them with r.external, and build up the location on the fly in each node's /tmp. So it is quite fast. I will try to increase the RAM a bit.
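A sketch of that setup, with example paths, and the plain `grass` startup command standing in for whatever the installed binary is called (e.g. grass72):

```shell
# Keep the GeoTIFF on the RAM-backed filesystem of the node
cp elv.tif /dev/shm/elv.tif

# Create a throwaway location in /tmp, taking the projection from the GeoTIFF
grass -c /dev/shm/elv.tif /tmp/grassdb/loc_elv -e

# Link the raster with r.external instead of importing a copy of its cells
grass /tmp/grassdb/loc_elv/PERMANENT --exec \
    r.external input=/dev/shm/elv.tif output=elv
```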

I will post later how it goes.
Best
Giuseppe

On Wed, Nov 1, 2017 at 10:41 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Thanks again!!

I’m working with an area flow accumulation, so the 0.5 threshold means 0.5 km2, which is about 60 cells of 90 m * 90 m.

I forgot to mention that the unit of the threshold option is cells, not squared map units. That means you need to change the threshold value.

Markus M
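So, to approximate the intended 0.5 km2 area threshold in cells at the 90 m resolution mentioned in the thread, the conversion (a quick sketch, using awk for the floating-point division) would be:

```shell
# 0.5 km2 expressed in 90 m x 90 m cells: 500000 / 8100, about 61 cells
awk 'BEGIN {
    area_m2 = 0.5 * 1000 * 1000   # threshold area in square metres
    cell_m2 = 90 * 90             # area of one cell in square metres
    printf "threshold in cells: %d\n", area_m2 / cell_m2
}'
```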

Hi Markus,
I have compiled GRASS 7.2.3svn and run

r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=45000 --o --verbose

with a small area (600 * 310 = 186,000 cells), and everything works fine.

If I enlarge the area (72870 * 80040 = 5,832,514,800 cells),

I get the following warnings:

16.67% of data are kept in memory
Will need up to 293.52 GB (300563 MB) of disk space
Creating temporary files…
Loading input raster maps…
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…48…51…54…57…60…63…66…69…72…75…78…81…84…87…90…93…96…99…100
Initializing A* search…
0…WARNING: Segment pagein: read EOF
WARNING: segment lib: put: pagein failed
WARNING: Unable to write segment file
WARNING: Segment pagein: read EOF

and then the stream is not created.
Is this something to do with my compilation of GRASS 7.2.3svn, or something else?

Concerning the threshold: I’m using an area flow accumulation expressed in km2, which means each pixel holds the summed upstream area in km2. So if I set the threshold to 0.5, my streams start where the accumulation value exceeds 0.5. I have checked and it looks correct to me; in fact, my smallest upstream basin has 7 cells of 90 m * 90 m (~1/2 km2).

Thank you
Giuseppe

On Wed, Nov 8, 2017 at 3:47 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi Markus,
I have compiled GRASS 7.2.3svn and run

r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=45000 --o --verbose

with a small area (600 * 310 = 186,000 cells), and everything works fine.

If I enlarge the area (72870 * 80040 = 5,832,514,800 cells),

I get the following warnings:

16.67% of data are kept in memory
Will need up to 293.52 GB (300563 MB) of disk space
Creating temporary files…
Loading input raster maps…
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…48…51…54…57…60…63…66…69…72…75…78…81…84…87…90…93…96…99…100
Initializing A* search…
0…WARNING: Segment pagein: read EOF
WARNING: segment lib: put: pagein failed
WARNING: Unable to write segment file
WARNING: Segment pagein: read EOF

There was a small bug in the segment library, fixed in trunk r71648. You will need to update your local copy of GRASS 7.3.

Markus M
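A minimal sketch of pulling that fix into an existing trunk checkout (the path is an example, not from the thread):

```shell
# Update the local GRASS trunk (7.3) working copy to r71648 or later
cd ~/src/grass_trunk
svn update

# Rebuild; make only recompiles what changed
make
```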

and than the stream is not created.
Is something due with compile of the GRASS7.2.3svn ?
or something else?

Concerning the threshold: I’m using area-flow-accumulation expressed in km2. So this means that each pixel have a value of the upper stream sum-area in km2. So if I fix the threshold to 0.5 means that my stream will start when all the cells below have value > 0.5. I have check and look correct to me, in fact my smallest upper stream basin have 7 cell 90*90 ( ~ 1/2 km2).

Thank you
Giuseppe

On 1 November 2017 at 17:52, Markus Metz <markus.metz.giswork@gmail.com> wrote:

On Wed, Nov 1, 2017 at 10:41 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Thanks again!!

I’m working with a area-flowaccumulation so the 0.5 threshold means 0.5 km2, which is 90m * 90m * 60 cell.

I forgot to mention that the unit of the threshold option is cells, not squared map units. That means you need to change the threshold value.

Markus M

My intention is prune back the stream later on with a machine learning procedure. I will be carefully look not to overpass the 2,147,483,647 detected stream segments.

To reduce as much as possible I/O I save the *.tif file in the /dev/shm of each node, read then with r.external and build up the location on the flight in each /tmp. So, it quite fast. I will try to increase a bit the RAM.

Will post later how is going.
Best
Giuseppe

On 1 November 2017 at 17:12, Markus Metz <markus.metz.giswork@gmail.com> wrote:

On Wed, Nov 1, 2017 at 7:15 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Thanks Markus!!
I will test and I will let you know how it works.

Your feedback is very helpful!

I have few more questions

  1. now how much is the upper limit matrix cell number that r.stream.extract can handle?

About 1.15e+18 cells.

Another limitation is the number of detected stream segments. This must not be larger than 2,147,483,647 streams, therefore you need to figure out a reasonable threshold with a smaller test region. A threshold of 0.5 is definitively too small, no matter how large or small the input is. Threshold should typically be larger than 1000, but is somewhat dependent on the resolution of the input. As a rule of thumb, with a coarser resolution, a smaller threshold might be suitable, with a higher resolution, the threshold should be larger. Testing different threshold values in a small subset of the full region can safe a lot of time.

  1. is the r.stream.basins add-on subjects to the same limitation? In case would be possible to update also for r.stream.basins?

The limitation in r.watershed and r.stream.extract comes from the search for drainage directions and flow accumulation. The other r.stream.* modules should support large input data, as long as the number of stream segments does not exceed 2,147,483,647.

  3. Does r.stream.extract support multi-threading through OpenMP? Would it be difficult to implement?

In your case, less than 13% of the temporary data are kept in memory. Parallelization with OpenMP or similar will not help here; your CPU will run at less than 20% load with one thread anyway. The bottleneck is disk I/O. You can make it faster by using more memory and/or a faster disk storage device.
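A rough way to relate the memory option to the reported percentage of data kept in memory (my own back-of-the-envelope, not the module's exact accounting):

```python
def pct_kept_in_memory(memory_mb, total_temp_mb):
    """Approximate share of temporary data held in RAM when the rest is
    paged to disk. A rough estimate, not the module's exact accounting."""
    return 100.0 * min(memory_mb, total_temp_mb) / total_temp_mb

# memory=35000 against ~300563 MB of temporary files gives ~11.6%, in the
# same ballpark as the reported "12.97% of data are kept in memory"
print(round(pct_kept_in_memory(35000, 300563), 1))
```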

Markus M

Best
Giuseppe

On 31 October 2017 at 15:54, Markus Metz <markus.metz.giswork@gmail.com> wrote:

On Mon, Oct 30, 2017 at 1:42 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi,
I’m using the r.stream.extract grass command

r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=35000 --o --verbose

where elv is a raster of 142690 * 80490 = 11,485,118,100 cells

and I get this error

12.97% of data are kept in memory
Will need up to 293.52 GB (300563 MB) of disk space
Creating temporary files…
Loading input raster maps…
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…48…51…54…57…60…63…66…69…72…75…78…81…84…87…90…93…96…99…100
ERROR: Unable to load input raster map(s)

This error is caused by integer overflow because not all variables necessary to support such large maps were 64 bit integer.

Fixed in trunk and relbr72 with r71620,1, and tested with a DEM with 172800 * 67200 = 11,612,160,000 cells: r.stream.extract finished successfully in 18 hours (not on a HPC, but a standard desktop machine with 32 GB of RAM and a 750 GB SSD).

According to the help manual, the memory=35000 option should be set according to the overall memory available. I set the HPC upper memory limit to 40 GB.

I tried several combinations of these parameters but I still get the same error.
If r.stream.extract is based on r.watershed, then the segmentation library should be able to handle a huge raster.

r.stream.extract is based on a version of r.watershed that did not yet support such huge raster maps; therefore support for them needed to be added to r.stream.extract separately.

Does anyone know how to get past this limitation/error?

Please use the latest GRASS 7.2 or GRASS 7.3 version from svn.

Markus M

Thank you
Best

Giuseppe Amatulli, Ph.D.

Research scientist at
Yale School of Forestry & Environmental Studies
Yale Center for Research Computing
Center for Science and Social Science Information
New Haven, 06511
Teaching: http://spatial-ecology.org
Work: https://environment.yale.edu/profile/giuseppe-amatulli/


grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user



On Wed, Nov 8, 2017 at 5:29 PM, Markus Metz <markus.metz.giswork@gmail.com> wrote:

On Wed, Nov 8, 2017 at 3:47 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi Markus,
I have compiled GRASS 7.2.3svn and ran

r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=45000 --o --verbose

with a small area (600 * 310 = 186,000 cells), and everything works fine.

If I enlarge the area to 72870 * 80040 = 5,832,514,800 cells,

I get the following warning

16.67% of data are kept in memory
Will need up to 293.52 GB (300563 MB) of disk space
Creating temporary files…
Loading input raster maps…
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…48…51…54…57…60…63…66…69…72…75…78…81…84…87…90…93…96…99…100
Initializing A* search…
0…WARNING: Segment pagein: read EOF
WARNING: segment lib: put: pagein failed
WARNING: Unable to write segment file
WARNING: Segment pagein: read EOF

There was a small bug in the segment library, fixed in trunk r71648. You will need to update your local copy of GRASS 7.3.

Now also fixed for GRASS 7.2 in r71649.

Markus M

Markus M

and then the stream is not created.
Is it something to do with my compilation of GRASS 7.2.3svn,
or something else?

Concerning the threshold: I'm using area flow accumulation expressed in km2. This means that each pixel holds the upstream sum-area in km2, so if I set the threshold to 0.5, my streams will start where the cells have a value > 0.5. I have checked, and it looks correct to me; in fact my smallest upstream basin has 7 cells of 90 m * 90 m (~ 1/2 km2).

Thank you
Giuseppe

On 1 November 2017 at 17:52, Markus Metz <markus.metz.giswork@gmail.com> wrote:

On Wed, Nov 1, 2017 at 10:41 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Thanks again!!

I'm working with area flow accumulation, so the 0.5 threshold means 0.5 km2, which is about 60 cells of 90 m * 90 m.

I forgot to mention that the unit of the threshold option is cells, not squared map units. That means you need to change the threshold value.

Markus M


On Wed, Nov 1, 2017 at 10:12 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Wed, Nov 1, 2017 at 7:15 PM, Giuseppe Amatulli
<giuseppe.amatulli@gmail.com> wrote:

Thanks Markus!!
I will test and I will let you know how it works.

Your feedback is very helpful!

I have few more questions
1) now how much is the upper limit matrix cell number that
r.stream.extract can handle?

About 1.15e+18 cells.

Another limitation is the number of detected stream segments. This must not
be larger than 2,147,483,647 streams,

...

(Added as a note to
https://grasswiki.osgeo.org/wiki/GRASS_GIS_Performance#Some_benchmarks
)

best,
markusN

Hi Markus M.

I was testing r.stream.extract on 2 tiffs:

  1. 80040 x 72870 = 5,832,514,800: I got the stream results, no error

  2. 80490 x 142690 = 11,485,118,100: I got the following error

A* Search…
0…2…/var/spool/slurmd/job6514787/slurm_script: line 80: 28925 Bus error r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=45000 --o --verbose

Do you think it is something with the slurm RAM limitation, or something with r.stream.extract?
If the stream output has more than 2,147,483,647 stream segments, what happens?
Do I get an error, or are all values larger than 2,147,483,647 simply clamped to 2,147,483,647?

Moreover, if I use the stream obtained from option 1) as input for r.stream.basins,
I get the following error

reading raster map …
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…/var/spool/slurmd/job6514788/slurm_script: line 82: 17687 Bus error /gpfs/home/fas/sbsc/ga254/.grass7/addons/bin/r.stream.basins -l stream_rast=stream direction=dir

Is this something that needs to be fixed in r.stream.basins, or should I think it is due to other problems?

Thank you
Giuseppe


Hello

I think it was already mentioned in this thread: threshold=0.5 is certainly wrong. The threshold value is the number of pixels for the minimum basin size; usually it would be in the thousands. With your region size of 11 billion cells you probably want a threshold in the tens of thousands.

-- 
Micha Silver
Ben Gurion Univ.
Sde Boker, Remote Sensing Lab
cell: +972-523-665918

On Fri, Nov 17, 2017 at 7:52 AM, Micha Silver <tsvibar@gmail.com> wrote:

Hello


I think it was already mentioned in this thread: the threshold=0.5 is certainly wrong. The threshold value is number of pixels for minimum basin size. Usually it would be in the thousands. With your region size of 11 billion you probably want a threshold of tens of thousands.

As Giuseppe explained previously, accumulation has been rescaled to square kilometers, therefore the threshold is ok (still a bit small, but not nonsense).

Markus M


Ah, I missed that, thanks


On Thu, Nov 16, 2017 at 10:51 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi Markus M.

I was testing the r.stream.extract for 2 tiff

  1. 80040 x 72870 = 5,832,514,800 i got the stream results - no error

Great, that means large maps with more than 2 billion cells are supported.

  1. 80490 x 142690 = 11,485,118,100 i got the following error

A* Search…
0…2…/var/spool/slurmd/job6514787/slurm_script: line 80: 28925 Bus error r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=45000 --o --verbose

From wikipedia:
“a bus error is a fault raised by hardware, notifying an operating system (OS) that a process is trying to access memory that the CPU cannot physically address: an invalid address for the address bus, hence the name.”

I guess you either need to raise the RAM limit in slurm or slightly reduce the memory option for r.stream.extract.
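One possible rule of thumb for picking the memory option under a scheduler limit (my own suggestion, not from the GRASS docs): leave some headroom, since the process needs RAM beyond the segment cache.

```python
def safe_memory_mb(job_limit_mb, headroom_frac=0.2):
    """Pick a memory= value leaving headroom_frac of the job's RAM limit
    free for the process itself (a hypothetical heuristic)."""
    return int(job_limit_mb * (1.0 - headroom_frac))

# under a 60 GB slurm limit, memory=49152 leaves ~20% headroom
print(safe_memory_mb(60 * 1024))
```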

Markus M



Just finished the test with 100 GB and the computation finished without error.

So my conclusion is:
even when setting the maximum RAM for r.stream.basins to memory=25000, the effective requirement is a bit less than 100 GB.
A similar situation happens with r.stream.extract, where the actually used RAM is a bit less than 50 GB.

Any thoughts?

Thank you
Best

Giuseppe

···

On 26 November 2017 at 20:33, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi Markus M,
I re-tested (size 80490 x 142690 = 11,485,118,100 cells) r.stream.extract & r.stream.basins, setting 25 GB RAM for the 2 commands with a slurm upper limit of 60 GB RAM.

r.stream.extract used this much RAM

############################################################
JobID MaxVMSize


6731334.bat+ 47459272K
############################################################

and finished without problems,

whereas r.stream.basins used

############################################################
JobID MaxVMSize


6731334.bat+ 90241708K
############################################################

and got killed with this error:

Reading raster map …
0…3…6…9…12…15…/var/spool/slurmd/job6731334/slurm_script: line 88: 15041 Bus error /gpfs/home/fas/sbsc/ga254/.grass7/addons/bin/r.stream.basins -l stream_rast=stream direction=dir basins=lbasin memory=25000 --o --verbose

I think something is implemented differently in r.stream.basins compared to r.stream.extract.
I will try to ask for more RAM (100 GB) but I'm afraid it is going to fail again.

Any thoughts?

Thank you
Best

Giuseppe


On Mon, Nov 27, 2017 at 3:56 AM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Just finished the test with 100 GB and the computation finished without error.

So my conclusion is:
even when setting the maximum RAM for r.stream.basins to memory=25000, the effective requirement is a bit less than 100 GB.

For r.stream.basins you need to use the -m flag, otherwise everything is done in RAM. A nice enhancement for all affected r.stream.* modules would be to switch automatically to disk swap mode if the amount of memory needed for all-in-RAM processing is larger than what is allocated with the memory=MB option.
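The proposed automatic fallback could look roughly like this (hypothetical pseudologic in Python, not actual r.stream.* code):

```python
def choose_mode(needed_mb, memory_mb):
    """Fall back to disk-swap mode when the all-in-RAM estimate exceeds
    the user's memory=MB allowance."""
    return "ram" if needed_mb <= memory_mb else "disk-swap"

# ~88 GB needed (the observed 90241708K MaxVMSize) vs memory=25000 MB:
# swap to disk instead of exhausting RAM
print(choose_mode(90241708 // 1024, 25000))
```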

A similar situation happens with r.stream.extract, where the RAM actually used is a bit less than 50 GB.

This is strange. In my tests, r.stream.extract always uses a bit less than what is given with the memory option. I could process a raster with 11,612,160,000 cells and memory=25000 and actual peak memory consumption was about 24000 MB.

Markus M

Any thoughts?

Thank you
Best
Giuseppe

On 26 November 2017 at 20:33, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi Markus M,
I re-tested r.stream.extract & r.stream.basins on the large raster (80490 x 142690 = 11,485,118,100 cells), setting memory=25000 (25 GB of RAM) for both commands, with a slurm upper limit of 60 GB of RAM.

r.stream.extract used this much RAM
############################################################
JobID        MaxVMSize
6731334.bat+ 47459272K
############################################################
and finished without problems,

whereas r.stream.basins used
############################################################
JobID        MaxVMSize
6731334.bat+ 90241708K
############################################################
and got killed with this error:

Reading raster map …
0…3…6…9…12…15…/var/spool/slurmd/job6731334/slurm_script: line 88: 15041 Bus error /gpfs/home/fas/sbsc/ga254/.grass7/addons/bin/r.stream.basins -l stream_rast=stream direction=dir basins=lbasin memory=25000 --o --verbose

I think something is implemented differently in r.stream.basins compared to r.stream.extract.
I will try asking for more RAM (100 GB), but I'm afraid it is going to fail again.

Any thoughts?

Thank you
Best
Giuseppe

On 17 November 2017 at 03:33, Markus Metz <markus.metz.giswork@gmail.com> wrote:

On Thu, Nov 16, 2017 at 10:51 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi Markus M.

I was testing r.stream.extract on two TIFFs:

  1. 80040 x 72870 = 5,832,514,800: I got the stream results - no error

Great, that means large maps with more than 2 billion cells are supported.

  2. 80490 x 142690 = 11,485,118,100: I got the following error

A* Search…
0…2…/var/spool/slurmd/job6514787/slurm_script: line 80: 28925 Bus error r.stream.extract elevation=elv accumulation=upa threshold=0.5 depression=dep direction=dir stream_raster=stream memory=45000 --o --verbose

From wikipedia:
“a bus error is a fault raised by hardware, notifying an operating system (OS) that a process is trying to access memory that the CPU cannot physically address: an invalid address for the address bus, hence the name.”

I guess you either need to raise the RAM limit in slurm or slightly reduce the memory option for r.stream.extract.

Markus M
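One rough way to follow that advice is to derive the memory option from the slurm allocation, leaving headroom for memory the module uses outside its segment cache (illustrative shell arithmetic; the 20% headroom figure is my own assumption, not a GRASS recommendation):

```shell
# slurm grants the job this much RAM (e.g. #SBATCH --mem=60G)
slurm_mem_mb=60000
# reserve ~20% for overhead outside the memory=MB cache
headroom_pct=20
memory_opt=$(( slurm_mem_mb * (100 - headroom_pct) / 100 ))
echo "memory=$memory_opt"   # prints memory=48000
```

The resulting value would then be passed as memory=48000, keeping peak usage safely under the slurm hard limit.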

Do you think this is due to the slurm RAM limitation, or is it something in r.stream.extract?
If the stream output has more than 2,147,483,647 stream segments, what happens?
Do I get an error, or are all values larger than 2,147,483,647 just clamped to 2,147,483,647?

Moreover, if I take the stream output obtained from option 1) and use it as input for r.stream.basins,
I get the following error

reading raster map …
0…3…6…9…12…15…18…21…24…27…30…33…36…39…42…45…/var/spool/slurmd/job6514788/slurm_script: line 82: 17687 Bus error /gpfs/home/fas/sbsc/ga254/.grass7/addons/bin/r.stream.basins -l stream_rast=stream direction=dir

Is this something that needs to be fixed in r.stream.basins, or should I assume it is due to other problems?

Thank you
Giuseppe
