Hamish,
Thanks for the tips. I've used r.in.xyz a few times, and it is quite an
improvement over the s.in.ascii/s.to.rast path (or the grass6
equivalent), but I do want to use v.in.ascii without topology support
because
ultimately I want to run a modified version of v.surf.rst that I wrote
to process very large lidar point sets. On grass5 and the sites format,
I could process over 350 million points using my modified v.surf.rst and
it was a bit faster than v.surf.rst on smaller inputs around 10 million
points. I would like to clean up my code a bit and submit it to the
GRASS project or perhaps have it as an add-on, but the new vector format
is limiting the number of points I can get into Grass6.
I followed the previous memory problems discussed by you, Helena, and
Radim, and I think this is a separate problem. I'm not sure whether the
slow freeing of memory applies to the tokenizer in the first pass over
the data or to something in the topology building; I had thought it was
the latter.
I think the recent change in G_free_tokens for LL projections
introduced a bug, but one that should be fixable fairly quickly. I
don't have any lat/long data to test on and I won't have much time to
look into it in the next few weeks, but perhaps I can take a look in
late July. The bigger problem is 64-bit file support in the vector
library. Are the 64-bit file I/O functions ftello and fseeko portable
to all the different platforms that run GRASS? Perhaps Glynn knows? If
there is any point in the code that writes file offsets to disk, we
would need to be careful about compatibility issues there. I don't know
this section of the code very well, so I'm reluctant to make any
changes before getting some advice.
-Andy
On Thu, 2006-06-29 at 17:59 +1200, Hamish wrote:
Andrew Danner wrote:
> I'm having problems importing huge lidar point sets using v.in.ascii.
> I thought this issue was resolved with the -b flag, but v.in.ascii is
> consuming all the memory even in the initial scan of the data (before
> building the topology, which should be skipped with the -b flag)
>
> My data set is comma separated x,y,z points
>
> v.in.ascii -ztb input=BasinPoints.txt output=NeuseBasinPts fs="," z=3
>
> Sample data:
>
> 1939340.84,825793.89,657.22
> 1939071.95,825987.78,660.22
> 1939035.52,826013.97,662.46
> 1938762.45,826210.15,686.28
> 1938744.05,826223.34,688.57
> 1938707.4,826249.58,694.1
> 1938689.21,826262.62,696.55
> 1938670.91,826275.77,698.07
> 1938616.48,826314.99,699.31
> 1938598.36,826328.09,698.58
>
> I have over 300 million such records and the input file is over 11GB.
>
> v.in.ascii runs out of memory and crashes during points_analyse in
> v.in.ascii/points.c
Hi,
you'll probably want to use the new r.in.xyz module for anything more
than 3 million points. I think v.surf.rst is the only module which can
do something useful with vector points without topology.
http://grass.ibiblio.org/grass61/manuals/html61_user/r.in.xyz.html
http://hamish.bowman.googlepages.com/grassfiles#xyz
v.in.ascii (without -b) has substantial memory needs due to its
topological support.
Search the mailing list archives for many comments on "v.in.ascii memory
leak" by Radim, Helena, and myself on the subject. Here is some valgrind
analysis I did on it at the time:
http://bambi.otago.ac.nz/hamish/grass/memleak/v.in.ascii/
If you can find a way to lessen the memory footprint, then great!
Same for large file and 64bit support fixes.
> I can now see why the free was originally moved outside the loop
> to fix lat/long problems: because tokens[i] is redirected to a
> different buffer in the LL case. This seems problematic and a possible
> source of memory leaks.
but LL parsing support was only added relatively recently? Need to
check the CVS log. Rev 1.9:
http://freegis.org/cgi-bin/viewcvs.cgi/grass6/vector/v.in.ascii/points.c
so it's not a "core" unfixable part of the code?
Radim said that freeing memory was slow. Maybe free a chunk of memory
every 50000th point or so?
Hamish