[GRASS-dev] GRASS on cluster: G_malloc: out of memory

hi,

I am running GRASS on an external cluster and have to specify in the
job scheduler how much memory I need. Despite a large amount
defined, I always get:

v.vol.rst ...
WARNING: Points are more dense than specified 'DMIN'--ignored 49295 points
         (remain 9976)
Percent complete: ERROR: G_malloc: out of memory

Is there a way to make this message more descriptive? Then I could
try to figure out if my memory request was simply ignored. Currently
I am quite in the dark (since the same job runs on our FEM-CEA
cluster with the same amount of RAM but a different job scheduler).

Knowing how much memory v.vol.rst tried to allocate would already be useful.

thanks
Markus

Markus Neteler wrote:

> I am running GRASS on an external cluster and have to specify in the
> job scheduler how much memory I need. Despite a large amount
> defined, I always get:
>
> v.vol.rst ...
> WARNING: Points are more dense than specified 'DMIN'--ignored 49295 points
>          (remain 9976)
> Percent complete: ERROR: G_malloc: out of memory
>
> Is there a way to make this message more descriptive? Then I could
> try to figure out if my memory request was simply ignored. Currently
> I am quite in the dark (since the same job runs on our FEM-CEA
> cluster with the same amount of RAM but a different job scheduler).
>
> Knowing how much memory v.vol.rst tried to allocate would already be useful.

--- lib/gis/alloc.c (revision 34189)
+++ lib/gis/alloc.c (working copy)
@@ -42,7 +42,7 @@
     if (buf)
         return buf;

-    G_fatal_error(_("G_malloc: out of memory"));
+    G_fatal_error(_("G_malloc: unable to allocate %lu bytes"), (unsigned long) n);
     return NULL;
 }

A slightly more informative option would be to report the immediate
caller, e.g.:

  void *G__malloc(const char *file, int line, size_t n)
  {
      void *buf;

      if (n <= 0)
          n = 1;  /* make sure we get a valid request */

      buf = malloc(n);
      if (buf)
          return buf;

      G_fatal_error(_("G_malloc: unable to allocate %lu bytes at %s:%d"),
                    (unsigned long) n, file, line);
      return NULL;
  }

Then in gisdefs.h:

  -void *G_malloc(size_t);
  +void *G__malloc(const char *, int, size_t);
  +#define G_malloc(n) G__malloc(__FILE__, __LINE__, (n))
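
For reference, here is a minimal, self-contained sketch of the same
__FILE__/__LINE__ wrapping technique outside of GRASS (the xmalloc
names are purely illustrative, not part of the GRASS API):

  #include <stdio.h>
  #include <stdlib.h>

  /* The wrapper receives the caller's location from the macro below. */
  static void *xmalloc_at(const char *file, int line, size_t n)
  {
      void *buf = malloc(n ? n : 1);    /* malloc(0) may legally return NULL */

      if (!buf) {
          fprintf(stderr, "xmalloc: unable to allocate %lu bytes at %s:%d\n",
                  (unsigned long) n, file, line);
          exit(EXIT_FAILURE);
      }
      return buf;
  }

  /* Callers write xmalloc(n); the preprocessor records where each
     call was made. */
  #define xmalloc(n) xmalloc_at(__FILE__, __LINE__, (n))

  int main(void)
  {
      char *p = xmalloc(100);    /* on failure, reported as this file:line */
      free(p);
      return 0;
  }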

Alternatively, have G_malloc() call abort() on error; this will
normally generate a coredump (if coredumps are enabled) which can be
examined with gdb to determine the complete call chain and the exact
state of the process.
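
A rough sketch of that variant (hypothetical, assuming the error path
in alloc.c simply calls abort() instead of G_fatal_error(), with
<stdio.h> and <stdlib.h> included):

  void *G_malloc(size_t n)
  {
      void *buf;

      if (n <= 0)
          n = 1;  /* make sure we get a valid request */

      buf = malloc(n);
      if (buf)
          return buf;

      fprintf(stderr, "G_malloc: unable to allocate %lu bytes\n",
              (unsigned long) n);
      abort();      /* raises SIGABRT; with coredumps enabled, load the
                       core into gdb and use "bt" to see the call chain */
      return NULL;  /* not reached */
  }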

However, bear in mind that abort()ing on out-of-memory is likely to
produce large coredumps; ensure that "ulimit -c" is set accordingly.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Nov 8, 2008 at 1:18 AM, Glynn Clements <glynn@gclements.plus.com> wrote:

> Markus Neteler wrote:
>
>> I am running GRASS on an external cluster

BTW: It is this nice cluster:
  http://www.hpc2n.umu.se/resources/Akka/
(Akka is ranked 39 on the latest Top 500 list and 16 on the Green 500)

>> and have to specify in the
>> job scheduler how much memory I need. Despite a large amount
>> defined, I always get:
>>
>> v.vol.rst ...
>> WARNING: Points are more dense than specified 'DMIN'--ignored 49295 points
>>          (remain 9976)
>> Percent complete: ERROR: G_malloc: out of memory
>>
>> Is there a way to make this message more descriptive? Then I could
>> try to figure out if my memory request was simply ignored. Currently
>> I am quite in the dark (since the same job runs on our FEM-CEA
>> cluster with the same amount of RAM but a different job scheduler).
>>
>> Knowing how much memory v.vol.rst tried to allocate would already be useful.

> --- lib/gis/alloc.c (revision 34189)
> +++ lib/gis/alloc.c (working copy)
> @@ -42,7 +42,7 @@
>      if (buf)
>          return buf;
>
> -    G_fatal_error(_("G_malloc: out of memory"));
> +    G_fatal_error(_("G_malloc: unable to allocate %lu bytes"), (unsigned long) n);
>      return NULL;
>  }
This already helped: I had to request a little more RAM
for the jobs. Now the queue is filled up.

> A slightly more informative option would be to report the immediate
> caller, e.g.:
>
>   void *G__malloc(const char *file, int line, size_t n)
>   {
>       void *buf;
>
>       if (n <= 0)
>           n = 1;  /* make sure we get a valid request */
>
>       buf = malloc(n);
>       if (buf)
>           return buf;
>
>       G_fatal_error(_("G_malloc: unable to allocate %lu bytes at %s:%d"),
>                     (unsigned long) n, file, line);
>       return NULL;
>   }
>
> Then in gisdefs.h:
>
>   -void *G_malloc(size_t);
>   +void *G__malloc(const char *, int, size_t);
>   +#define G_malloc(n) G__malloc(__FILE__, __LINE__, (n))

I think that it would help a lot (from time to time users come up with
out-of-memory problems, usually because they set the raster resolution
to nanometers).
If you submit it to GRASS 7, I'll backport it as usual.

> Alternatively, have G_malloc() call abort() on error; this will
> normally generate a coredump (if coredumps are enabled) which can be
> examined with gdb to determine the complete call chain and the exact
> state of the process.
>
> However, bear in mind that abort()ing on out-of-memory is likely to
> produce large coredumps; ensure that "ulimit -c" is set accordingly.

This sounds rather risky (and if I submit 1400 jobs and 40% of them
coredump, they may remove my account...).

The second solution sounds perfect.

Markus