[GRASS-dev] [GRASS GIS] #438: v.distance -a uses too much memory

#438: v.distance -a uses too much memory
------------------------------------------+---------------------------------
Reporter: mlennert | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Keywords: v.distance memory allocation | Platform: Unspecified
      Cpu: Unspecified |
------------------------------------------+---------------------------------
Not sure if this should be considered a bug or a wish for
enhancement... choosing bug for now, as it makes the module useless with
large files.

When trying to calculate a distance matrix between 20 000 points with
v.distance -a, I get:

ERROR: G_realloc: unable to allocate 1985321728 bytes at main.c:568

As the machine only has 1 GB of RAM, this is normal, but v.distance should
be rewritten to not keep everything in memory, at least when dealing with
the -a flag, and to only allocate memory for data really requested.

Currently, it allocates memory for a large number of NEAR structures
(3 x int + 10 x double, i.e., for example, 3x4 + 10x8 = 92 bytes for each
point), which contain space for all the potential upload options (lines
447-8 of vector/v.distance/main.c):

{{{
         anear = 2 * nfrom;
         Near = (NEAR *) G_calloc(anear, sizeof(NEAR));
}}}
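
To make the arithmetic above concrete, here is a minimal sketch. The struct is a hypothetical stand-in with the same field counts (3 int, 10 double), not the actual NEAR definition from main.c, and the field and function names are invented for illustration. The real sizeof depends on the platform's padding rules: typically 92 bytes on 32-bit x86 Linux and 96 bytes on most 64-bit platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NEAR: same field counts as described above
 * (3 int + 10 double). The field names are invented for illustration. */
typedef struct {
    int i[3];
    double d[10];
} near_like;

/* Bytes needed for n records of rec_size bytes each.
 * Returns 0 if the product would overflow 64 bits. */
uint64_t bytes_for(uint64_t n, uint64_t rec_size)
{
    assert(rec_size > 0);
    if (n > UINT64_MAX / rec_size)
        return 0;
    return n * rec_size;
}

/* Bytes needed to hold one record for every from/to pair. */
uint64_t matrix_bytes(uint64_t nfrom, uint64_t nto, uint64_t rec_size)
{
    if (nto != 0 && nfrom > UINT64_MAX / nto)
        return 0;
    return bytes_for(nfrom * nto, rec_size);
}
```

With 92-byte records, bytes_for(2 * 20000, 92) gives 3,680,000 bytes for the initial allocation, while matrix_bytes(20000, 20000, 92) gives 36,800,000,000 bytes (about 34 GiB) for the full matrix. The failed request of 1985321728 bytes is exactly 92 x 21579584, which is consistent with a 92-byte record on the machine in question.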

It then goes on to add, if necessary, memory space for the entire From x To
matrix (lines 566-8 of vector/v.distance/main.c) inside the loop over the to
objects (count = total number of distances calculated so far):

{{{

         if (anear <= count) {
             anear += 10 + nfrom / 10;
             Near = (NEAR *) G_realloc(Near, anear * sizeof(NEAR));
         }
}}}

I'm not sure I completely understand this last part, as it seems to create
huge jumps in allocation: whenever the count of distances reaches the
current value of anear (initially nfrom*2), it grows the array by
10 + nfrom/10 NEAR structures. In my case, when count reaches 40000,
anear = 40000 + 10 + 20000/10 = 42010, i.e. space for 2010 new NEAR
structures is added, without knowing (AFAICT) how many are still to come...
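
As a rough illustration of what this growth rule implies, the sketch below counts how many times the array would have to grow before it can hold a given total number of records. It only mirrors the arithmetic of the quoted snippet. The function names are invented, and the doubling variant is a hypothetical alternative for comparison, not something main.c does.

```c
#include <assert.h>
#include <stdint.h>

/* Count growth steps until the capacity exceeds `total` records, starting
 * at 2*nfrom and adding 10 + nfrom/10 per step, as in the quoted snippet.
 * Overflow is not handled; realistic inputs stay far below 2^64. */
uint64_t additive_growth_steps(uint64_t nfrom, uint64_t total)
{
    uint64_t anear = 2 * nfrom;
    uint64_t step = 10 + nfrom / 10;   /* always >= 10, so the loop ends */
    uint64_t steps = 0;

    while (anear <= total) {
        anear += step;
        steps++;
    }
    return steps;
}

/* Same count for a hypothetical doubling policy (not what main.c does). */
uint64_t doubling_growth_steps(uint64_t nfrom, uint64_t total)
{
    uint64_t cap = 2 * nfrom;
    uint64_t steps = 0;

    assert(nfrom > 0);                 /* a zero capacity would never grow */
    while (cap <= total) {
        cap *= 2;
        steps++;
    }
    return steps;
}
```

For nfrom = 20000 and a total of 400,000,000 distances, the additive rule needs 198986 growth steps and doubling needs 14. Either way the final array has to hold every record, so the growth rule changes how often the array is reallocated, not how much memory is needed in the end.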

But, as I said, I don't understand the code well enough to make a definite
judgement. It would seem, however, that it might be better to calculate
each distance and update the table immediately, or maybe write the
necessary queries to a temp file to be able to launch the query at the end
in one run, but without keeping everything in memory.
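
A minimal sketch of the first idea, assuming plain planar points and text output: each distance is written out as soon as it is computed, so nothing beyond the input points stays in memory. The pt struct, the dist_fn callback and the pipe-separated layout are all hypothetical choices for illustration. None of this uses the GRASS vector API, and it says nothing about how the attribute table update would be done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical point record: category number plus planar coordinates. */
typedef struct {
    int cat;
    double x, y;
} pt;

/* Distance callback, so the caller picks the metric. */
typedef double (*dist_fn)(const pt *a, const pt *b);

/* Squared Euclidean distance; avoids needing the math library here. */
double sq_dist(const pt *a, const pt *b)
{
    double dx = a->x - b->x, dy = a->y - b->y;
    return dx * dx + dy * dy;
}

/* Write one "from_cat|to_cat|value" line per pair as soon as it is
 * computed. Returns the number of lines written, or -1 on a write error. */
long long stream_distances(const pt *from, size_t nfrom,
                           const pt *to, size_t nto,
                           dist_fn dist, FILE *out)
{
    long long written = 0;

    assert(dist != NULL && out != NULL);
    for (size_t i = 0; i < nfrom; i++) {
        for (size_t j = 0; j < nto; j++) {
            double d = dist(&from[i], &to[j]);
            if (fprintf(out, "%d|%d|%.6f\n", from[i].cat, to[j].cat, d) < 0)
                return -1;
            written++;
        }
    }
    return written;
}
```

A caller would pass stdout or a temp file as out, and a callback based on hypot() from math.h if true distances rather than squared ones are wanted. Memory use then no longer depends on the number of pairs, at the cost of producing the output one line at a time.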

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/438>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

Is this still an issue with the current 7.SVN version?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/438#comment:1>

Comment(by mlennert):

Replying to [comment:1 neteler]:
> Is this still an issue with the current 7.SVN version?

The code has changed.

I just checked on a machine with 8GB of RAM and

v.distance -a -p from=ssbel from_type=centroid to=ssbel to_type=centroid
upload=dist col=dist > dist_ssbel

where ssbel has a bit more than 20000 centroids, still crashed after
memory _and_ swap went up to their maximum levels.

I'm aware that we're talking about 400,000,000 pairs of points, but I'm
still hoping that there is a way to avoid such heavy memory usage.

Moritz

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/438#comment:2>