[GRASS-dev] lidar tools update in grass7

Markus Metz wrote:

The core of the lidar tools is the lidarlib, that is AFAICT
robust and working. Bugs are most likely in modules, so I'll try
to add comments there. I have reorganized the code for v.surf.bspline
in the hope that it is now easier to read.

Slightly off-topic, but while we are thinking about this code ...

A while back I did some crude profiling of the v.lidar tools and found
that it was spending lots and lots of time in the Tcholetsky decomposition
loop (3-for loops deep).

It seemed like a good & simple test case for using OpenMP to multi-thread
it, but I got stuck with it segfaulting. AFAIR the problem was that OpenMP
wanted you to malloc the entire thing before starting, and this could
get big.

if it interests you, see the attached patch and
  https://trac.osgeo.org/grass/ticket/657
  http://lists.osgeo.org/pipermail/grass-dev/2009-June/044709.html
  http://lists.osgeo.org/pipermail/grass-dev/2009-June/044705.html
  http://grass.osgeo.org/wiki/OpenMP

I forgot to mention, there is a problem with sqlite, it is much slower
than dbf, no idea if this is a problem of my system or of the grass
sqlite driver or of the way auxiliary tables are accessed by the lidar
tools.

that sounds vaguely familiar, but as a general thing not necessarily
to do with v.lidar. probably something about it in the archives?

Hamish

(attachments)

openmp_lidarlib_trunk.patch (1.85 KB)

more updates in r40860, mostly documentation

Hamish wrote:

Markus Metz wrote:
  

The core of the lidar tools is the lidarlib, that is AFAICT
robust and working. Bugs are most likely in modules, so I'll try
to add comments there. I have reorganized the code for v.surf.bspline
in the hope that it is now easier to read.
    
Slightly off-topic, but while we are thinking about this code ...

A while back I did some crude profiling of the v.lidar tools and found
that it was spending lots and lots of time in the Tcholetsky decomposition
loop (3-for loops deep).
  

That's better now because the matrix is a bit smaller.

It seemed like a good & simple test case for using OpenMP to multi-thread
it, but I got stuck with it segfaulting. AFAIR the problem was that OpenMP
wanted you to malloc the entire thing before starting, and this could
get big.

if it interests you, see the attached patch and
  https://trac.osgeo.org/grass/ticket/657
  http://lists.osgeo.org/pipermail/grass-dev/2009-June/044709.html
  http://lists.osgeo.org/pipermail/grass-dev/2009-June/044705.html
  http://grass.osgeo.org/wiki/OpenMP
  

Hmm, if I understand the code right, only the innermost for loop can be
executed in parallel, because the first two for loops need the results of
previous iterations (it is not possible to run i + 1 before i, same for j).
But I don't know that parallel stuff; I would in any case first optimize the
code without parallelization and only then add multithreading.
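
For illustration, parallelising only that innermost accumulation would look
roughly like this (a minimal sketch of a textbook in-place Cholesky step,
not the actual lidarlib tchol code; the names are made up):

#include <math.h>

/* Sketch: only the innermost accumulation is independent per k, so it is
 * the only loop that gets an OpenMP pragma; the i and j loops stay serial
 * because each step needs results from the previous columns. */
void cholesky_sketch(double **A, int n)
{
    int i, j, k;

    for (j = 0; j < n; j++) {
        for (i = j; i < n; i++) {
            double acc = 0.0;

#pragma omp parallel for reduction(+:acc)
            for (k = 0; k < j; k++)
                acc += A[i][k] * A[j][k];

            if (i == j)
                A[j][j] = sqrt(A[j][j] - acc);
            else
                A[i][j] = (A[i][j] - acc) / A[j][j];
        }
    }
}

For short inner loops the thread start-up overhead can easily eat up the
gain, so this alone may not buy much.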

I forgot to mention, there is a problem with sqlite, it is much slower
than dbf, no idea if this is a problem of my system or of the grass sqlite driver or of the way auxiliary tables are accessed by the lidar
tools.
    
that sounds vaguely familiar, but as a general thing not necessarily
to do with v.lidar. probably something about it in the archives?
  

I discovered two little tricks: first, create the table, close the database and driver, then open the database and driver again before working with the table; second, creating an index on the table, where possible, helps too. Tested with both sqlite and dbf.
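
In C code the trick boils down to something like this (a rough sketch using
the usual dbmi client calls; driver, database, table and column names are
made up, and error checking is left out):

#include <grass/dbmi.h>

/* Sketch of the create / close / reopen sequence plus index creation. */
void prepare_aux_table(const char *drvname, const char *dbname)
{
    dbDriver *driver;
    dbString sql;

    db_init_string(&sql);

    /* create the auxiliary table ... */
    driver = db_start_driver_open_database(drvname, dbname);
    db_set_string(&sql,
                  "CREATE TABLE aux_interp (id integer, val double precision)");
    db_execute_immediate(driver, &sql);

    /* ... close database and driver ... */
    db_close_database_shutdown_driver(driver);

    /* ... open again before working with the table, and add an index */
    driver = db_start_driver_open_database(drvname, dbname);
    db_set_string(&sql, "CREATE INDEX aux_interp_id ON aux_interp (id)");
    db_execute_immediate(driver, &sql);

    /* work with the table here, then shut down as usual */
    db_close_database_shutdown_driver(driver);
}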

Markus M

Hello,

2010/2/8 Hamish <hamish_b@yahoo.com>:

Markus Metz wrote:

The core of the lidar tools is the lidarlib, that is AFAICT
robust and working. Bugs are most likely in modules, so I'll try
to add comments there. I have reorganized the code for v.surf.bspline
in the hope that it is now easier to read.

Slightly off-topic, but while we are thinking about this code ...

A while back I did some crude profiling of the v.lidar tools and found
that it was spending lots and lots of time in the Tcholetsky decomposition
loop (3-for loops deep).

It seemed like a good & simple test case for using OpenMP to multi-thread
it, but I got stuck with it segfaulting. AFAIR the problem was that OpenMP
wanted you to malloc the entire thing before starting, and this could
get big.

There is an OpenMP version of a bandwidth-optimized Cholesky
decomposition algorithm in gmathlib.
I will have a look at the lidarlib to evaluate whether we can use the
existing OpenMP-tuned Cholesky. Besides that, I will move the lidar
linear equation solver and the inversion algorithm into the gmath
library, so more modules may benefit from them.

I have made some performance tests with the OpenMP-tuned linear
equation solvers (Cholesky, Gauss, LU) in gmathlib; unfortunately
there is not much speedup, which is related to the design of the
solver algorithms.
The solver benchmarks are available in the grass7 lib/gmath/tests directory.
I have had measurable speedup only with the Intel compiler (which is
free for open source projects); the speedup with the gcc OpenMP
implementation is small.

Additionally, it would be interesting to consider using the OpenMP-tuned
conjugate gradient solver from gmath within the lidar tools.
This solver has an excellent linear speedup (thanks to the parallelized
matrix-vector multiplication), is quite fast (non-linear convergence)
and works with sparse matrices.
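
The speedup essentially comes from the matrix-vector product inside the CG
iteration; schematically it looks like this (a dense sketch for brevity,
not the actual gmath code, which works on sparse and band storage):

/* Sketch: the rows of y = A*x are independent, so the outer loop can be
 * parallelised directly; this is where a CG solver spends most of its time. */
void matvec_sketch(double **A, const double *x, double *y, int n)
{
    int i, j;

#pragma omp parallel for private(j)
    for (i = 0; i < n; i++) {
        double s = 0.0;

        for (j = 0; j < n; j++)
            s += A[i][j] * x[j];
        y[i] = s;
    }
}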

Best regards
Soeren

if it interests you, see the attached patch and
https://trac.osgeo.org/grass/ticket/657
http://lists.osgeo.org/pipermail/grass-dev/2009-June/044709.html
http://lists.osgeo.org/pipermail/grass-dev/2009-June/044705.html
http://grass.osgeo.org/wiki/OpenMP

I forgot to mention, there is a problem with sqlite, it is much slower
than dbf, no idea if this is a problem of my system or of the grass
sqlite driver or of the way auxiliary tables are accessed by the lidar
tools.

that sounds vaguely familiar, but as a general thing not necessarily
to do with v.lidar. probably something about it in the archives?

Hamish


On Mon, Feb 8, 2010 at 5:18 AM, Hamish <hamish_b@yahoo.com> wrote:

Markus Metz wrote:

...

I forgot to mention, there is a problem with sqlite, it is much slower
than dbf, no idea if this is a problem of my system or of the grass
sqlite driver or of the way auxiliary tables are accessed by the lidar
tools.

that sounds vaguely familiar, but as a general thing not necessarily
to do with v.lidar. probably something about it in the archives?

I found some older conversation incl. patches and speed tests:

http://lists.osgeo.org/pipermail/grass-user/2009-June/051009.html
... perhaps useful?

On Mon, Feb 8, 2010 at 9:17 AM, Markus Metz
<markus.metz.giswork@googlemail.com> wrote:

I discovered two little tricks: first, create the table, close database and
driver, open database and driver again, work with table, second, creating an
index on the table if possible helps too. Tested with sqlite and dbf.

I wonder if sqlite is creating some index when creating/closing/opening again?

Markus

Hello,

Slightly off-topic, but while we are thinking about this code ...

A while back I did some crude profiling of the v.lidar tools and found
that it was spending lots and lots of time in the Tcholetsky decomposition
loop (3-for loops deep).

That's better now because the matrix is a bit smaller.

Yes, it's a nice symmetric band matrix. It was a bit puzzling for me to
understand the creation of the matrix without any documentation!
So, I have implemented some conversion functions in the gmath
library while moving the tchol* code from the lidar lib into the gmath
lib: transformations quadratic <-> band <-> sparse matrices can now be
performed. A new CG band solver, the matrix type conversions and a new
band matrix-vector multiplication function are available, with tests,
in the svn trunk of GRASS 7.
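
For readers not familiar with the storage scheme: a symmetric band matrix
keeps only the diagonal and the bands above it, and the matrix-vector
product then looks roughly like this (a sketch that assumes row i of the
band array holds A[i][i+k] in column k; the actual gmath layout may differ):

/* Sketch of a symmetric band matrix-vector product y = A*x.
 * band[i][k] is assumed to hold A[i][i+k]; the lower triangle is
 * reconstructed from symmetry. */
void band_matvec_sketch(double **band, const double *x, double *y,
                        int rows, int bandwidth)
{
    int i, k;

    for (i = 0; i < rows; i++)
        y[i] = 0.0;

    for (i = 0; i < rows; i++) {
        y[i] += band[i][0] * x[i];              /* diagonal entry */
        for (k = 1; k < bandwidth && i + k < rows; k++) {
            y[i]     += band[i][k] * x[i + k];  /* upper part */
            y[i + k] += band[i][k] * x[i];      /* mirrored lower part */
        }
    }
}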

It seemed like a good & simple test case for using OpenMP to multi-thread
it, but I got stuck with it segfaulting. AFAIR the problem was that OpenMP
wanted you to malloc the entire thing before starting, and this could
get big.

if it interests you, see the attached patch and
https://trac.osgeo.org/grass/ticket/657
http://lists.osgeo.org/pipermail/grass-dev/2009-June/044709.html
http://lists.osgeo.org/pipermail/grass-dev/2009-June/044705.html
http://grass.osgeo.org/wiki/OpenMP

Hmm, if I understand the code right, only the innermost for loop can be
executed in parallel because the first two for-loops need results of
previous runs (not possible to run i + 1 before i, same for j). But I don't
know that parallel stuff, I would in any case first optimize the code
without parallelization, only then add multithreading.

I fully agree. The band Cholesky solver is well suited for the job but
not designed for parallelization. Thus I have implemented an iterative
Conjugate Gradient (CG) solver for band matrices (just by replacing
the matrix-vector multiplication) to replace the Cholesky band
solver. Well, the Cholesky solver outperforms the CG solver by a
factor of 10+. I have partly parallelized the CG solver for band
matrices, but even the speedup on an 8-core system is too small to
compete with the Cholesky band solver. Besides that, the created
band matrices are badly conditioned, so the CG solver may fail for
subregions.
Hence, I would suggest using MPI parallelization, or something else
process-based, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?

Best regards
Soeren

Soeren Gebbert wrote:

Hello,

A while back I did some crude profiling of the v.lidar tools and found
that it was spending lots and lots of time in the Tcholetsky decomposition
loop (3-for loops deep).

That's better now because the matrix is a bit smaller.

Yes, it's a nice symmetric band matrix. It was a bit puzzling for me to
understand the creation of the matrix without any documentation!

I added documentation where I understood the code but honestly, the
inner workings of the lidar_lib are still a mystery to me.

So, i have implemented some conversion functions into the gmath
library while moving the tchol* code from lidar lib into the gmath
lib

There are still bugs in the code, mostly minor ones concerning warning
and error messages, and one major one in v.surf.bspline for very large
raster output regions. I'm busy fixing it, and also converting it to a
new module r.resamp.bspline that does particularly well at filling NULL
cells, a real alternative to (the much slower) r.fillnulls.

[...] i would suggest to use MPI parallelization or something else on
process base, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?

I think that should be integrated into the current lidar tools because
they make use of the current subregion tiling and C code is still
faster than Python code (right?). Treatment of overlapping zones of
neighbouring subregions is done sometimes in the lidar_lib, sometimes
by the modules themselves. It took me quite some time to understand it
and fix it, therefore I would advise against code porting/duplication.

My 2c,

Markus M

Hello Markus,

2010/4/28 Markus Metz <markus.metz.giswork@googlemail.com>:

Soeren Gebbert wrote:

Hello,

A while back I did some crude profiling of the v.lidar tools and found
that it was spending lots and lots of time in the Tcholetsky decomposition
loop (3-for loops deep).

That's better now because the matrix is a bit smaller.

Yes, it's a nice symmetric band matrix. It was a bit puzzling for me to
understand the creation of the matrix without any documentation!

I added documentation where I understood the code but honestly, the
inner workings of the lidar_lib are still a mystery to me.

Thanks for adding documentation; that makes maintaining the modules
much easier!
The bilinear and bicubic interpolation is a mystery for me too ... at
both the module and the theoretical level.

So, i have implemented some conversion functions into the gmath
library while moving the tchol* code from lidar lib into the gmath
lib

There are still bugs in the code, mostly minor concerning warning and
error messages, one major in v.surf.bspline for very large raster
output regions. I'm busy fixing it, also converting it to a new module
r.resamp.bspline that does particularly well for filling NULL cells, a
real alternative to (the much slower) r.fillnulls.

I have renamed some variables (Italian -> English) in the lidar lib and
modules, to get a better understanding of what is happening in the
code. I have committed the changes to svn; in case the changes are
wrong, please tell me and I will revert them.

[...] i would suggest to use MPI parallelization or something else on
process base, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?

I think that should be integrated into the current lidar tools because
they make use of the current subregion tiling and C code is still
faster than Python code (right?). Treatment of overlapping zones of

The computation time of overlapping regions should be the same in
python as in C.
IMHO it is easier to program a python module which computes the tiles,
splits up the vector points based on the tiles and distributes the data
across a grid or cloud for computation. After the distributed
processes are finished, the script can gather the data and merge it
together.

If you implement this in the lidar modules, each module needs to be
changed to support MPI. Using MPI is IMHO much more complex than just
running the modules on different datasets on different nodes and
gathering and patching the data. There are already modules which
support this process (g.region, r.tileset, r.patch ...). Well, is
there a module which supports the extraction of vector points in the
current region without topology support?

neighbouring subregions is done sometimes in the lidar_lib, sometimes
by the modules themselves. It took me quite some time to understand it
and fix it, therefore I would advise against code porting/duplication.

Maybe the overlapping algorithm can be simplified in the python version?
In my vision, the LIDAR computation could be run in an elastic
computation cloud (EC2) as a WPS backend, using a recursive tiling
algorithm for data distribution.

Best regards
Soeren

My 2c,

Markus M

Soeren wrote:

The computation time of overlapping regions should be the
same in python as in C.
IMHO it is easier to program a python module which computes
the tiles, split up the vector points based on the tiles and
distribute the data across a grid or cloud for computation.
After the distributed processes are finished, the script can
gather the data and merge it together.

If you implement this into the lidar modules, each module
needs to be changed to support MPI. Using MPI is IMHO much more
complex, then just running the modules on different datasets
on different nodes, gathering and patching the data.

...

Maybe the overlapping algorithm can be simplified in the
python version?
In my vision, the LIDAR computation could be run in an
elastic computation cloud (EC2) as WPS backend using
recursive tiling algorithm for data distribution.

thinking big!

for my 2c (fwiw I run about 160 computational-hours per day
split up and dynamically sent out to a couple of clusters using
MPI), I would first ask how much processing are we talking
about? If the typical large-case lidar tools job only took ~3
hours to run on a modern PC, I'd much prefer the simplicity of
OpenMP and run it on the same quad-core desktop or 8-core Xeon
server that the GIS is running on.

If the jobs could take 12-24 hours and have to happen often I
would more seriously consider using MPI. Setting up the mpd
daemon and organizing the mpirun jobs is a lot of overhead,
which for smaller jobs seems slightly counter-productive to me.

also, I think it is quite natural to have the control and
management scripts in python, but get worried about splitting
up low-level processing into a hybrid python/C mix. for one
thing it makes the code harder to follow/understand, for another
you are at risk from different component versions interacting
in different ways. (see wxNviz, vdigit problems on WinGrass..)

(and of course this has to be an optional extra, and I'm all for
exploring all these different methods .. we don't know which
one will win out/survive in a few years time when 16-core
workstations are the norm)

regards,
Hamish

Hi Hamish,

2010/5/1 Hamish <hamish_b@yahoo.com>:

Soeren wrote:

The computation time of overlapping regions should be the
same in python as in C.
IMHO it is easier to program a python module which computes
the tiles, split up the vector points based on the tiles and
distribute the data across a grid or cloud for computation.
After the distributed processes are finished, the script can
gather the data and merge it together.

If you implement this into the lidar modules, each module
needs to be changed to support MPI. Using MPI is IMHO much more
complex, then just running the modules on different datasets
on different nodes, gathering and patching the data.

...

Maybe the overlapping algorithm can be simplified in the
python version?
In my vision, the LIDAR computation could be run in an
elastic computation cloud (EC2) as WPS backend using
recursive tiling algorithm for data distribution.

thinking big!

Thinking practical. :-)

Ignoring the WPS overhead, such an approach may work for every lidar
module and for other SIMD (single instruction, multiple data) type
modules, where the computational effort is much larger than the
tiling, distribution and gathering effort (rst interpolation, ...).
A tiling - distribution - gathering approach can be implemented in
python very simply (multiprocessing, pympi ...).

for my 2c (fwiw I run about 160 computational-hours per day
split up and dynamically sent out to a couple of clusters using
MPI), I would first ask how much processing are we talking
about? If the typical large-case lidar tools job only took ~3
hours to run on a modern PC, I'd much prefer the simplicity of
OpenMP and run it on the same quad-core desktop or 8-core Xeon
server that the GIS is running on.

If the jobs could take 12-24 hours and have to happen often I
would more seriously consider using MPI. Setting up the mpd
daemon and organizing the mpirun jobs is a lot of overhead,
which for smaller jobs seems slightly counter-productive to me.

In case the GRASS project has developers who have the time and the
knowledge to parallelize the lidar modules using OpenMP or MPI, that's
fine. But I do not see that we have such resources. I do not have the
knowledge and time to implement it. IMHO many modules would benefit
from the more general, generic approach described above. Such an
approach can be run on multi-core machines, on a cluster or in a cloud.

I do not have a cluster available, only a 4-core machine. If I needed to
run a huge LIDAR job in a short time (which I have never done before!),
I would need to buy additional computational power. E.g. Amazon EC2 is a
quite reasonable choice for that.

also, I think it is quite natural to have the control and
management scripts in python, but get worried about splitting
up low-level processing into a hybrid python/C mix. for one
thing it makes the code harder to follow/understand, for another
you are at risk from different component versions interacting
in different ways. (see wxNviz, vdigit problems on WinGrass..)

Using e.g. MPI for parallelism often means implementing two versions
of the program, a serial and a parallel one. OpenMP works only on
multi-core or ccNUMA machines (SGI Altix), not in a cluster or cloud.
Many parallelized programs are more complex than, and largely rewritten
from, the serial versions they are based on. IMHO, implementing a single
python approach with additional python dependencies is not as risky as
implementing and maintaining many modules of highly complex parallel C
code alongside additional serial versions.

A tiling - distribution - gathering python approach will have a much
lower speedup than pure OpenMP or MPI versions, but the benefit will
grow with the size of the computational problem.

Best regards
Soeren

(and of course this has to be an optional extra, and I'm all for
exploring all these different methods .. we don't know which
one will win out/survive in a few years time when 16-core
workstations are the norm)

regards,
Hamish

On Wed, Apr 28, 2010, Soeren Gebbert wrote:

Hmm, if I understand the code right, only the innermost for loop [of the tchol solver] can be
executed in parallel because the first two for-loops need results of
previous runs (not possible to run i + 1 before i, same for j). But I don't
know that parallel stuff, I would in any case first optimize the code
without parallelization, only then add multithreading.

I fully agree. The band Cholesky solver is well suited for the job but
not designed for parallelization. Thus I have implemented an iterative
Conjugate Gradient (CG) solver for band matrices (just by replacing
the matrix-vector multiplication) to replace the Cholesky band
solver. Well, the Cholesky solver outperforms the CG solver by a
factor of 10+. I have partly parallelized the CG solver for band
matrices, but even the speedup on an 8-core system is too small to
compete with the Cholesky band solver. Besides that, the created
band matrices are badly conditioned, so the CG solver may fail for
subregions.

IMHO, it doesn't really make sense to replace a function with another
function that's 10+ times slower and produces worse results... If you
need a parallelized version and a 10+ core system to achieve the same
speed, and the results are not as good as with the original tchol
solver, I would suggest keeping the original tchol solver until
the CG solver is faster and produces identical results (also for
asymmetrical matrices).

Hence, i would suggest to use MPI parallelization or something else on
process base, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?

The handling of the overlapping tiles is deeply embedded in the module
code; at first glance it seems to me that you would have to completely
translate the module to python and then call the interpolation
routines from python? Seems messy to me. The subregion tiling is a bit
tricky and I spent quite some time fixing and optimizing it.

On Fri, Apr 30 Soeren Gebbert wrote:

I have renamed some variables (Italian -> English) in the lidar lib and
modules, to get a better understanding of what is happening in the
code. I have committed the changes to svn; in case the changes are
wrong, please tell me and I will revert them.

Thanks, looks good to me!

[...] i would suggest to use MPI parallelization or something else on
process base, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?

I think that should be integrated into the current lidar tools because
they make use of the current subregion tiling and C code is still
faster than Python code (right?). Treatment of overlapping zones of

The computation time of overlapping regions should be the same in
python as in C.
IMHO it is easier to program a python module which computes the tiles,
split up the vector points based on the tiles and distribute the data
across a grid or cloud for computation. After the distributed
processes are finished, the script can gather the data and merge it
together.

As above, IMHO that would be a complete rewrite of the modules in
python. Maybe because I am more familiar with C. Changing the C code
so that tiles can be processed in parallel should not be too hard.

If you implement this in the lidar modules, each module needs to be
changed to support MPI. Using MPI is IMHO much more complex than just
running the modules on different datasets on different nodes and
gathering and patching the data. There are already modules which
support this process (g.region, r.tileset, r.patch ...).

r.tileset does not do the same as the subregion tiling done in the
lidar tools because of the overlapping. The outermost overlapping edge
is used for interpolation but the values interpolated there are never
used; it exists to avoid edge effects along the borders of the
subregion. The interpolation results of the second overlapping zone
are used but weighted according to their distance to the core zone.
The interpolation results of the core zone are used as is. The extents
of the overlapping zones dramatically influence accuracy and speed,
and the current design should be kept in order to avoid edge effects.
AFAICT, the interpolated surfaces are seamless, no edge effects
whatsoever are visible in the various diagnostic maps calculated.
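
Schematically, the weighting can be pictured like this (one plausible
linear ramp just to illustrate the idea; it is not the exact formula used
in lidarlib/raster.c or in v.outlier):

/* Sketch of a distance-based weight for the overlapping zones:
 * 1 inside the core zone, falling off towards the outermost overlapping
 * edge whose values are discarded anyway. */
double overlap_weight_sketch(double dist_to_core, double overlap_width)
{
    if (dist_to_core <= 0.0)
        return 1.0;          /* core zone: results used as is */
    if (dist_to_core >= overlap_width)
        return 0.0;          /* outermost edge: interpolated, never used */
    return 1.0 - dist_to_core / overlap_width;  /* second zone: weighted */
}

The weighted contributions of neighbouring subregions then have to be
summed up when patching.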

Well, is
there a module which supports the extraction of vector points of the
current region without topology support?

The only way to extract vector points for the current region without
topology is to go through all points and check for each point whether
it falls into the current region.
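
A minimal sketch of that brute-force scan, assuming the GRASS 7 vector
API at topology level 1 (opening the maps, error handling and attribute
transfer are left out):

#include <grass/gis.h>
#include <grass/vector.h>

/* Sketch: read every point sequentially (no topology needed) and keep
 * only those that fall inside the current computational region. */
void copy_points_in_region_sketch(struct Map_info *In, struct Map_info *Out)
{
    struct Cell_head window;
    struct line_pnts *Points = Vect_new_line_struct();
    struct line_cats *Cats = Vect_new_cats_struct();
    int type;

    G_get_window(&window);
    Vect_rewind(In);

    while ((type = Vect_read_next_line(In, Points, Cats)) > 0) {
        if (type != GV_POINT)
            continue;
        if (Points->x[0] >= window.west && Points->x[0] <= window.east &&
            Points->y[0] >= window.south && Points->y[0] <= window.north)
            Vect_write_line(Out, type, Points, Cats);
    }

    Vect_destroy_line_struct(Points);
    Vect_destroy_cats_struct(Cats);
}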

neighbouring subregions is done sometimes in the lidar_lib, sometimes
by the modules themselves. It took me quite some time to understand it
and fix it, therefore I would advise against code porting/duplication.

Maybe the overlapping algorithm can be simplified in the python version?

I had accuracy of results in mind when I modified the overlapping
algorithm, because the previous algorithm produced bad results; that is
the reason why v.surf.bspline was restricted in such a way that only one
subregion was needed, i.e. processing everything at once. If there is
a way to simplify it without changing the results, great!

In my vision, the LIDAR computation could be run in an elastic
computation cloud (EC2) as WPS backend using recursive tiling
algorithm for data distribution.

For massive LiDAR datasets, an approach like this would be really cool!

Markus M

Hello Markus,

2010/5/3 Markus Metz <markus.metz.giswork@googlemail.com>:

On Wed, Apr 28, 2010, Soeren Gebbert wrote:

Hmm, if I understand the code right, only the innermost for loop [of the tchol solver] can be
executed in parallel because the first two for-loops need results of
previous runs (not possible to run i + 1 before i, same for j). But I don't
know that parallel stuff, I would in any case first optimize the code
without parallelization, only then add multithreading.

I fully agree. The band Cholesky solver is well suited for the job but
not designed for parallelization. Thus I have implemented an iterative
Conjugate Gradient (CG) solver for band matrices (just by replacing
the matrix-vector multiplication) to replace the Cholesky band
solver. Well, the Cholesky solver outperforms the CG solver by a
factor of 10+. I have partly parallelized the CG solver for band
matrices, but even the speedup on an 8-core system is too small to
compete with the Cholesky band solver. Besides that, the created
band matrices are badly conditioned, so the CG solver may fail for
subregions.

IMHO, it doesn't really make sense to replace a function with another
function that's 10+ times slower and produces worse results... If you
need a parallelized version and a 10+ core system to achieve the same
speed, and the results are not as good as with the original tchol
solver, I would suggest to keep using the original tchol solver until
the CG solver is faster and produces identical results (also for
asymmetrical matrices).

Sorry for my misleading words, I did not suggest using the parallel
CG algorithm instead of the Cholesky one. I just wanted to say that I
tried to find an alternative and failed.

Hence, i would suggest to use MPI parallelization or something else on
process base, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?

The handling of the overlapping tiles is deeply embedded in the module
code, at first glance it seems to me that you would have to completely
translate the module to python and then call the interpolation
routines from python? Seems messy to me. The subregion tiling is a bit
tricky and I spent quite some time fixing and optimizing it.

I don't intend to move the tiling from the C code into the
Python code, but to have a Python module which is able to tile huge
regions into smaller subregions, avoiding edge effects using
overlapping strategies. These subregions should still be large enough
that the lidar tools can use their tiling strategies. No
interpolation is called from the python modules.
If that means re-implementing the lidar tiling mechanism in Python,
with spline estimation and all the fancy algorithms, it would indeed
be too messy. I was hoping that there are simpler overlapping
strategies avoiding edge effects (large overlapping edges which are
cut to smaller ones when patching the large tiles, using a pixel
averaging strategy ...).

[snip]

As above, IMHO that would be a complete rewrite of the modules in
python. Maybe because I am more familiar with C. Changing the C code
so that tiles can be processed in parallel should not be too hard.

I see. That would mean refactoring the C code, putting the
consolidated tiling algorithms in the lidar library and implementing
MPI parallelism in the lidarlib and each module. That's IMHO a lot of
work too. :-)

[snip]

r.tileset is not doing the same like the subregion tiling done in the
lidartools because of the overlapping. The outermost overlapping edge
is used for interpolation but the values interpolated there are never
used, it exists to avoid edge effects along the borders of the
subregion. The interpolation results of the second overlapping zone
are used but weighted according to their distance to the core zone.
The interpolation results of the core zone are used as is. The extends
of the overlapping zones dramatically influence accuracy and speed,
and the current design should be kept in order to avoid edge effects.
AFAICT, the interpolated surfaces are seamless, no edge effects
whatsoever visible in various diagnostic maps calculated.

Please excuse my ignorance, I am not familiar with the lidar tiling
algorithm. So I thought the size of the overlapping edges could be
computed in python and used as a parameter for r.tileset (parameter
overlap). After the lidar interpolation, the created tiles could be
patched by cutting the overlapping edges and using pixel averaging.
This will, as you mentioned above, not be sufficient to avoid edge
effects.

So there will be no good alternative to a native MPI lidar version?

Best regards
Soeren

On Mon, May 3 Soeren Gebbert wrote:

Hello Markus,

2010/5/3 Markus Metz:

IMHO, it doesn't really make sense to replace a function with another
function that's 10+ times slower and produces worse results... If you
need a parallelized version and a 10+ core system to achieve the same
speed, and the results are not as good as with the original tchol
solver, I would suggest to keep using the original tchol solver until
the CG solver is faster and produces identical results (also for
asymmetrical matrices).

Sorry for my misleading words, I did not suggest using the parallel
CG algorithm instead of the Cholesky one. I just wanted to say that I
tried to find an alternative and failed.

Um, don't give up that quickly ;-) The Cholesky decomposition is the
main bottleneck; it would be great to get it faster!

[snip]

I was hoping that there are simpler overlapping strategies
avoiding edge effects (large overlapping edges which are cut to
smaller ones when patching the large tiles using a pixel averaging
strategy ...).

Me too; I tried simpler tiling methods but they did not work. This fancy
method with two overlapping zones and weighting according to distance
from the core zone seems to be the only solution to avoid edge effects,
and it is present nearly everywhere in the C code, both in lidarlib and
in the modules.

[snip]

IMHO that would be a complete rewrite of the modules in
python. Maybe because I am more familiar with C. Changing the C code
so that tiles can be processed in parallel should not be too hard.

I see. That would mean refactoring the C code, putting the
consolidated tiling algorithms in the lidar library and implementing
MPI parallelism in the lidarlib and each module. That's IMHO a lot of
work too. :-)

I agree, but at least conceptually it seems relatively easy to me to
process the tiles in parallel.

[snip]

[...] i am not familiar with the lidar tiling
algorithm. So i thought the size of the overlapping edges can be
computed in python and used as parameter for r.tileset (parameter
overlap). After the lidar interpolation, the created tiles can be
patched by cutting the overlapping edges and using pixel averaging.
This will, as you mentioned above not be sufficient to avoid edge
effects.

Pixel averaging is not enough; what is needed is the sum of the weighted
interpolation values, and the weighting for e.g. v.surf.bspline is done
in lidarlib/raster.c and for v.outlier in v.outlier/outlier.c.

So there will be no good alternative to a native MPI lidar version?

Maybe something like this: calculate subregions in python as done in
lidarlib, with the help of r.tileset plus calculation of buffer zones
plus g.region, run the chosen lidar module in parallel for each
subregion set by g.region (can you use different computational regions
in parallel?), do some r.mapcalc stuff to weight the interpolated
values in the overlapping zones as done in the lidar tools, and patch
tiles by summing up with r.series method=sum. The r.mapcalc and
r.series steps will eat up (some of the) saved time, I guess. Both
meta/supertiling in a python script and rewriting the current modules
in C to process subregions in parallel are quite a bit of work, I guess.
But I am sure it's worth it for very large regions.

Markus M

Maybe this could help to speed up the work with minimum effort? A Google
search for Cholesky decomposition returned hits on it.

ScaLAPACK is a library of high-performance linear algebra routines for
distributed-memory message-passing MIMD computers and networks of
workstations supporting PVM and/or MPI

http://www.netlib.org/scalapack/slug/node9.html#SECTION04110000000000000000

Um, don't give up that quickly;-) The cholesky decomposition is the
main bottleneck, it would be great to get it faster!

Hi,

I'm Sara and I am a Ph.D. student at the Politecnico di Milano, with the
group that developed the LiDAR set of commands.
Unfortunately I'm not very fond of programming, but I work a lot with
LiDAR data.

I am not able to help on the programming side, but I can help by testing
the updated commands (e.g. running the "new commands" on LiDAR datasets
that I've already filtered, in order to evaluate differences in
performance/results).

Sara

Hi Sara,

your help is highly appreciated! There are no new commands, except one
in grass7 called r.resamp.bspline. Contrary to v.surf.bspline, it uses
a raster map as input and resamples that map to the resolution and
extent of the current computational region. The existing commands
have been modified and fixed where necessary. They should all run
faster now, and some problems with the zones overlapping adjacent
subregions were eliminated. Still, more testing is always good; it
seems that the previous version of the lidar tools did not receive a
lot of testing or bug fixing...

There is one open issue with edge detection. v.lidar.edgedetection
uses two criteria, a high gradient threshold and a low gradient
threshold (a hard and a soft criterion). All points with a gradient
larger than the high gradient threshold are classified as edges. All
points with a gradient larger than the low gradient threshold and
smaller than the high gradient threshold are investigated more closely,
but only if they are in the core interpolation zone and do not fall
into an overlapping zone. In my tests, it seemed to be OK to treat
points in the overlapping zone identically to points in the core zone.
I understand why the module is conservative and currently treats these
points as unknown, classifying them as neither terrain nor edge, but it
seems to me that this is a bit overly conservative. It would be great
if you could look into this issue! I could send you a patch for
v.lidar.edgedetection that classifies all points as either terrain or
edge, so you can analyse the classification accuracy.
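
Schematically, the current rule amounts to something like this (a sketch
with made-up names; the flag standing in for the closer inspection is
hypothetical, this is not the actual v.lidar.edgedetection code):

/* Sketch of the hard/soft criterion described above. */
enum point_class { CLASS_TERRAIN, CLASS_EDGE, CLASS_UNKNOWN };

enum point_class classify_sketch(double gradient, int in_core_zone,
                                 int closer_check_says_edge,
                                 double low_thresh, double high_thresh)
{
    if (gradient > high_thresh)
        return CLASS_EDGE;              /* hard criterion */
    if (gradient > low_thresh) {
        if (!in_core_zone)
            return CLASS_UNKNOWN;       /* current, conservative behaviour */
        return closer_check_says_edge ? CLASS_EDGE : CLASS_TERRAIN;
    }
    return CLASS_TERRAIN;
}

The patch would, roughly speaking, drop the in_core_zone check so that
these points are classified like core zone points.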

Markus M
