[GRASS-user] v.lidar.edgedetection very slow

Some of the nested for(;;) loops could be improved by
adjusting their start and end values. Maybe the attached patch for
vector/lidar/lidarlib/TcholBand.c helps. No warranty!
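
To illustrate what I mean by adjusting start and end values (this is only a
sketch of the general idea, not the actual patch; the function below is made
up), a banded Cholesky factorization only needs to sum over the non-zero band,
so the inner loops can be clamped to the half-bandwidth:

/*
 * Illustrative sketch only -- not the actual TcholBand.c patch.
 * Shows the idea of "adjusting start and end values": in a banded
 * Cholesky factorization of a symmetric positive definite matrix A
 * (n x n, dense storage here for simplicity, half-bandwidth BW) the
 * inner sums only need to run over the non-zero band.
 */
#include <math.h>

void banded_cholesky(double **A, double **L, int n, int BW)
{
    int i, j, k, kstart, iend;
    double sum;

    for (j = 0; j < n; j++) {
        /* diagonal element: start the sum at j - BW instead of 0 */
        kstart = (j - BW > 0) ? j - BW : 0;
        sum = A[j][j];
        for (k = kstart; k < j; k++)
            sum -= L[j][k] * L[j][k];
        L[j][j] = sqrt(sum);

        /* below-diagonal elements: stop at j + BW instead of n - 1 */
        iend = (j + BW < n - 1) ? j + BW : n - 1;
        for (i = j + 1; i <= iend; i++) {
            kstart = (i - BW > 0) ? i - BW : 0;
            sum = A[i][j];
            for (k = kstart; k < j; k++)
                sum -= L[i][k] * L[j][k];
            L[i][j] = sum / L[j][j];
        }
    }
}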

Tested with v.lidar.edgedetection; the coor and dbf binary files are identical
before and after the patch. Hopefully that module touches all of the modified
library functions, making it a good test. Applied in trunk and devbr6. It sped
things up by ~13% in my one test.

thanks,
Hamish


Did some more testing. The see and sen options have a drastic influence on
speed: with larger values the module completes in a minute, with the default
values it takes hours (lidar points in the North Carolina sample dataset,
10 m resolution in the current computational region).

I have not the faintest idea about the theory behind the edge detection
code, but something about the effect of the sen and see options appears
strange to me. These options are the interpolation spline step values
(according to the option description), apparently something like a step size.
The current computational region is subdivided into several overlapping
subregions (according to a source code comment). For each subregion, bilinear
interpolation, a T Cholesky decomposition, bicubic interpolation and
another T Cholesky decomposition are done, then point classification.
What I don't understand is why, with smaller steps, i.e. more and smaller
subregions, the resolution of these subregions increases and the
matrix becomes larger. That is what causes the incredibly long running time
of the module. It should be the other way around: larger step values
should give larger subregions and larger matrices, not smaller matrices.
Changing the code in main.c and edgedetection.c accordingly gives results very
similar to the original version, except that the time required is nearly
constant for different spline step values and always very short: nothing like
hours, just a minute or two. It would be nice to have some hints in the manual
on how to calculate reasonable values for a given lidar point set and current
computational region.
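
To put rough numbers on it (this is only my back-of-envelope assumption about
how the system is sized, not taken from the source): if each subregion gets
one spline node per step along each axis, the number of unknowns grows like
(edge/step)^2, and a banded Cholesky factorization costs roughly n*b^2
operations with b about one axis worth of splines, so halving the step makes
the factorization far more expensive. Something like:

/* Back-of-envelope sketch (my assumption, not code from lidarlib):
 * estimate the number of spline unknowns per subregion and a rough
 * relative cost figure for a banded Cholesky factorization,
 * ~ O(n * b^2), with n unknowns and bandwidth b ~ splines per axis. */
#include <stdio.h>

int main(void)
{
    double edge = 800.0;                  /* subregion edge length in metres */
    double steps[] = { 4.0, 10.0, 25.0 }; /* candidate sen/see spline steps */
    int i;

    for (i = 0; i < 3; i++) {
        long nspl = (long)(edge / steps[i]) + 1;  /* spline nodes per axis */
        long n = nspl * nspl;                     /* unknowns per subregion */
        double cost = (double)n * nspl * nspl;    /* ~ n * b^2, b ~ nspl */

        printf("step=%5.1f  nodes/axis=%4ld  unknowns=%7ld  relative cost=%.2e\n",
               steps[i], nspl, n, cost);
    }
    return 0;
}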

I'm not attaching any patches because I'm not sure whether the original
behaviour is intended or whether my arguments make sense. I haven't read all
the references...

Markus M

OK, so I've got back to this and it appears the problem may have been SQLite.
I imported with 'v.in.ascii -z -b -r z=3', then connected the dbf driver (definitely needed for v.outlier to work, though there are no more Auxiliar_outlier_table errors) and edge detection was quick. It only ever uses one core (25% CPU), and when RAM usage reaches around 520 MB it drops back to 80 MB and climbs again. But it works and only took around 2 minutes. This is for the 182 kB file.

We'll see how it handles the 47 MB ones later, particularly for dbf. I still want to get a working methodology for at least one tile first.

Much of what the other people have mentioned has gone over my head. I'm not a coder, and I do not want to start changing things that I do not understand (hardware is more my thing). Cheers for the help though, greatly appreciated.

John

I read through one of the papers today (M. A. Brovelli, M. Cannata and U. M. Longoni, 2002. Managing and processing LIDAR data within GRASS. Proceedings of the Open Source GIS - GRASS users conference 2002) and the authors recommend a step value of 3 to 4 times the average point spacing of the data set. So 4 m sen and see values seem appropriate for a data set that has an average coverage of 1 return/m^2. Unfortunately, the authors didn't really explain why it should be 3-4 times and not, say, 10. I can only guess that a smaller step size would produce finer detail in the interpolation and better feature isolation for the classification.
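
As a worked example of that rule of thumb (my own arithmetic, not something
from the paper): for a roughly uniform density d in returns/m^2 the average
spacing is about 1/sqrt(d), and the suggested step is 3-4 times that:

/* Worked example of the 3-4x average point spacing rule of thumb
 * (my own arithmetic, not code from the paper or from GRASS). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double density = 1.0;                  /* returns per m^2 */
    double spacing = 1.0 / sqrt(density);  /* average point spacing, ~1 m */
    double step_lo = 3.0 * spacing;        /* lower end of the suggestion */
    double step_hi = 4.0 * spacing;        /* upper end of the suggestion */

    printf("suggested sen/see step: %.1f - %.1f m\n", step_lo, step_hi);
    return 0;
}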

There definitely seems to be something funky going on between the spline step size and the division of the region into subregions. I ran tests using v.lidar.edgedetection on tiles of 500x500 m, 800x800 m, 801x801 m and 1000x1000 m. Times for each run were 1m54.065s, 6m35.420s, 22m16.708s and 68m25.265s respectively. There was a roughly 3-fold increase in processing time when I grew the data set by 1 m from (800 m)^2 (what I understand to be the maximum size of the region before it is broken up into subregions) to (801 m)^2, and it tripled again when I jumped to (1000 m)^2. I also recompiled GRASS with NSPLX_MAX and NSPLY_MAX set to 1500 instead of the current 200, and the time to process a (1000 m)^2 tile dropped from 68 minutes to 12 minutes.
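
If I read those numbers right (this is only my guess, I have not checked it
against the source), the 800 m threshold would be consistent with the maximum
subregion edge being about NSPLX_MAX times the spline step, i.e. 200 splines
times the 4 m step I used:

/* My guess at why 800 m is the threshold I observed (not verified
 * against the GRASS source): if a subregion can hold at most
 * NSPLX_MAX splines along one axis, its maximum edge length is
 * roughly NSPLX_MAX * step. */
#include <stdio.h>

int main(void)
{
    int nsplx_max = 200;   /* default limit mentioned above */
    double step = 4.0;     /* assumed sen/see step in metres */

    printf("max subregion edge ~ %.0f m\n", nsplx_max * step);  /* 800 m */

    /* With the limit raised to 1500, a (1000 m)^2 tile would again fit
     * in a single subregion, which would fit the observed speed-up. */
    printf("with limit 1500: ~ %.0f m\n", 1500 * step);          /* 6000 m */
    return 0;
}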

My understanding is that a region is broken up into smaller subregions to avoid memory limitations while processing. I can't see any reason why the division of the region into more, smaller subregions should increase the resolution of those subregions. I'm far from an expert on these programs, and I'm not skilled enough to read through the code and know what it is doing, but if I understand you correctly, your arguments make sense to me. If you're kind enough to share your changes I'll apply them and do some more testing.

Cheers,

Mike



I didn't get to look into that in more detail; I stopped at
edgedetection.c without getting a working solution. The general idea is
either to use the current geographic region resolution or to estimate a
resolution from the point density, and then use sen and see only as the
spline step size. I cannot say whether that makes sense; I would have to read
the references, but will not get to it soon. How about contacting the original
authors?
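
For illustration, a minimal sketch of what I mean by estimating a resolution
from point density while keeping sen and see purely as the spline step (my own
sketch, not code from the modules; the numbers are made up):

/* Minimal sketch (not code from the GRASS modules): estimate a raster
 * resolution from the lidar point density of the current region and
 * keep sen/see purely as the spline step size. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    long   n_points = 500000;          /* points in the region (example) */
    double ew = 1500.0, ns = 1000.0;   /* region extents in metres (example) */
    double sen = 4.0, see = 4.0;       /* spline steps, kept as given */

    double density = n_points / (ew * ns);   /* points per m^2 */
    double est_res = 1.0 / sqrt(density);    /* estimated resolution */

    printf("estimated resolution: %.2f m (spline steps stay %.1f/%.1f m)\n",
           est_res, sen, see);
    return 0;
}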

Markus M