Markus N.
cc to Markus M. and dev list,
that is a lot of points and a huge raster and I think it ran out of memory due to memory leak
and it is swapping. There are several problems that v.surf.rst has with large files
(we have same problem too I just did not have
time to look for the fix - I may ask Markus M for help because it is related to things
that he has been dealing with):
- one is the use of 32 bit file offset instead of off_t (see below, my expertise in these issues is zero)
- another one is highly probable memory leak that might have been introduced with
crossvalidation - although it manages to create a quadtree in memory
it keeps filling the memory as it runs the computation so it looks like cleanup after
each segment computation is missing, I have some notes about where it is but I don't know how difficult
it is to find it and fix it.
For now we run what I call "supersegmentation" we just divide the whole region
into overlapping strips, interpolate and patch together using a script.
But the above issues are in fact bugs and should be fixed - I had it on my list for summer
but I guess it is starting to be an obstacle for more people than me so
I may as well start looking into it now - any advice will be appreciated,
Helena
P.S. For removing the waves in a DEM - get a small subset where it is a really big
problem and test the procedure with r.random - I found it quite challenging to get rid
of the waves once they are introduced.
And make sure they are not real - I was in for
quite a surprise when flying to FOSS4G in Victoria over the mountains that we used in the
1996 erosion modeling paper and where I diligently removed the waves after the reviewer
suggested they are artificial - there was no vegetation and the hillslopes have literally steps,
I should have looked at some photos when working with those data.
--------------------------------------------------------------------------------
The offending files for fseek problems are:
output2d.c, resout2d.c, resout2dmod.c, write2d.c
The offending files where the wrong type is used for offsets are:
interp2d.c, ressegm2d.c, segmen2d.c
GRASS 6.3.cvs (interp):~ > v.surf.rst -t in=OffShorePts4_xyz layer=0 dmin=0.0003 elev=TestAll_z zmult=1.0 tension=1300000 smooth=1 segmax=40 npmin=250 maskmap=InterpSouth_mask
Percent complete:
Reading lines from vector map ...
Reading nodes from vector map ...
WARNING: some points outside of region -- will ignore...
100%
WARNING: strip exists with insufficient data
WARNING: there are points outside specified 2D/3D region--ignored 48760
points
WARNING: ignoring 9075883 points -- too dense
The number of points from vector map is 10676718
The number of points outside of 2D/3D region 48760
The number of points being used is 1552075
Processing all selected output files
will require -268854816 bytes of disk space for temp files
dnorm = 0.034525, rescaled tension = 44.882578
Bitmap mask created
Percent complete:
WARNING: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...
Cannot fseek elev offset2=-1276198144
WARNING: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...
Cannot fseek elev offset2=-773086712
WARNING: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...
Cannot fseek elev offset2=-521417852
Cannot fseek elev offset2=-458453496
Cannot fseek elev offset2=-427071888
Cannot fseek elev offset2=-427070316
Helena,
Here are some more details regarding the 2GB file limit in the rst
code. The problem is that the current code uses 32-bit file offsets (int
and long) instead of 64-bit (off_t) offsets. The problems seem to be
limited to the src/libes/rst_gmsl/interp_float directory. You can grep
for "offset" and "fseek" and see where the problem lines are.
The offending files for fseek problems are:
output2d.c, resout2d.c, resout2dmod.c, write2d.c
The offending files where the wrong type is used for offsets are:
interp2d.c, ressegm2d.c, segmen2d.c
It is my understanding that the off_t data type should be used for all
file offsets. If the following compiler flags are set, then off_t is a
64 bit offset, otherwise it is 32-bits
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
The off_t data type is declared in <unistd.h>
I am trying to rebuild at least interp2d.c, ressegm2d.c, and segmen2d.c
to use the off_t data type. I have added the line #include <unistd.h>,
but for some reason, the files do not recognize the off_t data type.
e.g.,
make[3]: Entering directory
`grass-5.4.0/src/libes/rst_gmsl/interp_float'
gcc -Igrass54-build/src/libes/rst_gmsl/interp_float
-Igrass-5.4.0/src/libes/rst_gmsl/interp_float -Igrass-5.4.0/src/include
-Igrass54-build/src/include -g -I../tree -I../data -g -O2
-D_FILE_OFFSET_BITS=64 -D_LARGE_FILE_SOURCE -fPIC -c interp2d.c -o
grass54-build/src/libes/rst_gmsl/interp_float/interp2d.o
interp2d.c: In function `IL_grid_calc_2d':
interp2d.c:87: error: `off_t' undeclared (first use in this function)
interp2d.c:87: error: (Each undeclared identifier is reported only once
interp2d.c:87: error: for each function it appears in.)
interp2d.c:87: error: parse error before "offset"
interp2d.c:147: error: `offset' undeclared (first use in this function)
interp2d.c:279: error: `offset2' undeclared (first use in this function)
So my first problem is "How can I get the grass libraries in rst_gmsl to
recognize the off_t type?" I suspect that another problem will be that
fseek explicitly uses long data types for offsets. I am running on Linux
and can use the fseeko version in GNU libc 2.x which uses off_t, but
this makes the code non-portable. The real solution would be to convert
fseeks to lseeks, but lseek takes a integer file descriptor while fseek
uses FILE pointers.
Can you check with the GRASS devel list and see what is the right way to
resolve these problems?
-Andy
On Apr 4, 2009, at 1:27 PM, Markus Neteler wrote:
Helena,
an Italian colleague of mine (in cc) tries to reinterpolate a large DEM
to get rid of contour line artefacts. I suggested to her to
random sample (r.random -b ...) and then v.surf.rst.random -b input=DTM_10m n=30% raster_output=DTM_random \
vector_output=rand_vectv.surf.rst with default parameters:
Percentuale completa:
Reading lines from vector map ...
Reading nodes from vector map ...
ATTENZIONE: strip exists with insufficient dataThe number of points from vector map is 27746241
The number of points outside of 2D/3D region 0
The number of points being used is 27746241
Processing all selected output files
will require -2119504096 bytes of disk space for temp filesATTENZIONE: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...-> 2GB barrier reached I guess.
So, since it consumed too much RAM, I suggested to her to activate the
segmentation. So she used:v.surf.rst input=rand_vect@sqlite layer=1 elev=elev_surf slope=slope_surf
aspect=aspect_surf tension=10. segmax=25 npmin=120 dmin=10.000000
dmax=25.000000 zmult=1.0to not use too much memory (though she has 4GB installed).
However, after 12 hours it did not even show 0%...
Further details:
GRASS 6.5.svn (Arno):~ > g.region -p
projection: 1 (UTM)
zone: 32
datum: wgs84
ellipsoid: wgs84
north: 4905887.5471429
south: 4756307.5471429
west: 592474.77809005
east: 768334.77809005
nsres: 10
ewres: 10
rows: 14958
cols: 17586
cells: 263051388GRASS 6.5.svn (Arno):~ > uname -a
Linux margherita 2.6.24-23-generic #1 SMP Thu Feb 5 14:30:38 UTC 2009
x86_64 GNU/LinuxDo you have a pointer for us?
thanks
Markus