[GRASS-user] GRASS 7 error code 9

Pankaj wrote:

> Is there any GRASS-internal limit on memory usage?

No, not that I know of. Usually it is the operating
system which poses the limit.

There is of course the possibility of a small bug
somewhere getting in the way, but for the most
part I would expect it to cope with large rasters
remarkably well.

Note that at a region size of 46341 x 46341 we
pass 2^31 cells, the point where a signed
32-bit integer overflows and wraps around.
If that were the case I'd suspect a malloc
error or a segfault, not SIGKILL, but it may be a
clue.

> Perhaps due to my past experience of working
> with big files, I had set aside approximately
> 200 GB of swap space during the installation of Linux.

OK, then you are well ahead of me. :)

> And I understand that GRASS shares the same space
> for its memory requirements.

GRASS simply uses the operating system's available
resources, but individual modules typically handle
things themselves. In this case I hope that Markus
Metz is able to help, as he rewrote much of that
code and first made it possible to run the
r.watershed module with large arrays, and may
have run some large-raster tests in the past (?).

Have you tried the same region with r.terraflow?

Hamish

Hamish wrote:

> Note that at a region size of 46341 x 46341 we
> pass 2^31 cells, the point where a signed
> 32-bit integer overflows and wraps around.
> If that were the case I'd suspect a malloc
> error or a segfault, not SIGKILL, but it may be a
> clue.

Your region is 42001 x 42001, so this is probably
not related to the region size.
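
The threshold can be checked with a few lines of arithmetic. This is only a sketch: it assumes the cell count is held in a signed 32-bit integer, which is the hypothesis being discussed rather than something confirmed for r.watershed, and the helper name is made up for illustration.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def cell_count_overflows_int32(rows: int, cols: int) -> bool:
    """Return True if rows * cols no longer fits in a signed 32-bit integer."""
    return rows * cols > INT32_MAX


# Side length of the largest square region whose cell count still fits.
largest_safe_side = math.isqrt(INT32_MAX)

print(largest_safe_side)                         # 46340
print(cell_count_overflows_int32(46341, 46341))  # True
print(cell_count_overflows_int32(42001, 42001))  # False
```

So 46341 x 46341 is the first square region past the limit, and 42001 x 42001 (about 1.76 billion cells) stays below it.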

To recap:

  r.watershed -m ... memory=8000
works.

  r.watershed -m ... memory=15000
fails.

This is a lat/lon location, and it looks like the source is SRTM data. (Is r.watershed happy in lat/lon locations?)

The relevant section of the debug log is:

{{{
...
SECTION 1a: Mark masked and NULL cells
0%
2%
4%
100%
D1/3: open segments for A* points
D1/3: segment lib: fast address activated
D1/3: segment lib: fast seek activated
D1/3: open segments for A* search heap
D1/3: heap memory 1943.31 MB
D1/3: A* search heap open segments 971, total 6730
D1/3: segment lib: fast address activated
D1/3: segment lib: fast seek activated
SECTION 1b: Determining Offmap Flow.
WARNING: Subprocess failed with exit code 9
...
}}}

What happens if you try without the -m segmentation
flag?

Hamish

Hamish wrote:

> Note that at a region size of 46341 x 46341 we
> pass 2^31 cells, the point where a signed
> 32-bit integer overflows and wraps around.
> If that were the case I'd suspect a malloc
> error or a segfault, not SIGKILL, but it may be a
> clue.

The value of 9 indicates that the child process (etc/r.watershed.seg)
terminated either with exit(9) or due to signal 9 (SIGKILL). The
latter seems more likely (i.e. SIGKILL from the kernel's "OOM-killer"
is quite likely for a process which uses too much memory, while I
can't see any mechanism by which exit(9) would occur).

If the kernel runs low on either physical or virtual memory, it
identifies a process which is using a lot of memory and kills it as if
by sending SIGKILL (although SIGKILL isn't actually "sent"; it can't
be blocked, ignored or caught, so the kernel just deletes the process;
if the parent calls wait() etc on the process, it is reported that the
process was terminated via SIGKILL).
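
The difference between exit(9) and death by signal 9 can be seen from a parent process with a small sketch like the one below. It assumes a POSIX system and uses plain Python child processes as stand-ins; it says nothing about how the r.watershed front end itself reports the status. The second child is killed by the parent, which only imitates what the OOM killer would do.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a human-readable cause."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as the negated signal number.
        signum = -returncode
        return f"killed by signal {signum} ({signal.Signals(signum).name})"
    return f"exited normally with status {returncode}"


# Child 1: terminates itself with exit(9).
exited = subprocess.run([sys.executable, "-c", "import sys; sys.exit(9)"])

# Child 2: sleeps until the parent kills it with SIGKILL (stand-in for the OOM killer).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.send_signal(signal.SIGKILL)
killed_code = child.wait()

print(describe_exit(exited.returncode))  # exited normally with status 9
print(describe_exit(killed_code))        # killed by signal 9 (SIGKILL)
```

On Linux, an actual OOM kill is usually also recorded in the kernel log, which is another place to confirm the cause.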

AFAICT, r.watershed requires far more memory than just the size of the
underlying raster data. It's possible that it isn't interpreting the
memory= parameter correctly. It's also possible that the kernel is
including the memory used for caching the segment file in deciding
which process to kill.
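
For a sense of scale, the raw size of a single uncompressed layer for this region can be estimated as follows. This is a back-of-the-envelope sketch: the bytes per cell are assumptions (4 bytes for CELL or FCELL data, 8 bytes for DCELL), and how many such layers and auxiliary structures r.watershed keeps is not stated in this thread.

```python
def raw_raster_mib(rows: int, cols: int, bytes_per_cell: int) -> float:
    """Size in MiB of one uncompressed layer of rows x cols cells."""
    return rows * cols * bytes_per_cell / 2**20


rows = cols = 42001  # region size reported in this thread

for label, size in (("4-byte cells", 4), ("8-byte cells", 8)):
    print(f"{label}: {raw_raster_mib(rows, cols, size):.0f} MiB per layer")
```

Even one 4-byte layer is several GiB, so a module that needs multiple layers plus search structures can easily exceed 16 GB of RAM.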

--
Glynn Clements <glynn@gclements.plus.com>

Dear GRASS users and developers,

The region is only 42001 x 42001, i.e. smaller than the limit of 46341 x 46341.
r.watershed didn’t work without -m because of its memory requirement.
So the next best option, I thought, was to assign as much RAM as possible,
and I began trying from 15500 MB downwards.
I understand that it’s really difficult to isolate memory overflow errors in any program code.
Both modules (r.watershed and r.stream.extract) worked after allocating approximately half the available RAM.
I am really interested in investigating why r.stream.extract does not create the vector table in PostgreSQL for large regions.
Please note that for smaller regions everything works perfectly.
I have observed that the initial estimate of the memory requirement is not perfect, but it’s a good indicator.
I observed this almost two years ago with GRASS 6 on a quad-core AMD machine with 2 GB of RAM.
For a much smaller region and 1 km resolution data, the memory requirement reported by r.watershed was 60 GB (in RAM mode, i.e. without the -m flag).
With the usual swap space of 4 GB, it didn’t work.
However, it worked when I added a temporary swap space of 38 GB and used the -m flag.
I ran my computer uninterrupted for a week to get the results.

GRASS 7 is in much better shape and is moving in the right direction to fully utilise expected hardware developments (processing power, cheaper RAM, multiple cores)
and better-quality DEM data. (Eight years ago, it was a big pain digitising contours just for the fun of watching rivers flow on your desktop.)

In the coming month, I plan to reformat the computer
and reload everything from scratch.
Then, with 1 TB of swap space and 1 TB of disk space, I will try everything afresh.
This should highlight any issue related to disk space.
I will additionally try the things advised by the GRASS developers.

On Sun, Nov 20, 2011 at 3:38 AM, Glynn Clements <glynn@gclements.plus.com> wrote:


Glynn Clements wrote:

> AFAICT, r.watershed requires far more memory than just the size of the
> underlying raster data. It's possible that it isn't interpreting the
> memory= parameter correctly.

Yes, there was an error in the distribution of memory over the various
temporary data structures, fixed in r49314. r.watershed.seg should now
usually use a bit less than memory=X MB, and only in special cases get
close to the allowed limit.

Markus M

Pankaj Kr Sharma wrote:

> Dear GRASS users and developers,
>
> The region is only 42001 x 42001, i.e. smaller than the limit of 46341 x
> 46341.
> r.watershed didn't work without -m because of its memory requirement.
> So the next best option, I thought, was to assign as much RAM as possible,
> and I began trying from 15500 MB downwards.

It is not a good idea to assign all free memory to r.watershed -m;
this will actually make the module slower. If there is e.g. a total
of 16 GB RAM, some of that is used by the OS and possibly other
applications, so there may be only 15 GB free (currently unused). The
memory option should be set to some value smaller than the currently
available RAM, otherwise the module will become much slower when going
into swap space. The idea of the -m flag and memory option is to
prevent r.watershed from using up all memory. As a rough rule of
thumb, the memory option should be set to 50% - 80% of what free -m
reports as free. Anything larger would probably slow the module down.
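
The rule of thumb translates into a small calculation. This is only a sketch: the function name is hypothetical, and the free-memory figure has to be read off `free -m` by hand (it is not queried here).

```python
def suggested_memory_range(free_mb: int, low_pct: int = 50, high_pct: int = 80) -> tuple[int, int]:
    """Return (lower, upper) values in MB for memory=, as percentages of free RAM."""
    if free_mb <= 0:
        raise ValueError("free_mb must be positive")
    # Integer arithmetic keeps the results exact and already rounded down.
    return free_mb * low_pct // 100, free_mb * high_pct // 100


# Example: about 15 GB reported as free on a 16 GB machine.
lower_mb, upper_mb = suggested_memory_range(15000)
print(f"memory={lower_mb} to memory={upper_mb}")  # memory=7500 to memory=12000
```

Starting near the lower end and increasing only if the module stays out of swap is the cautious way to use this range.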

Markus M
