[GRASS-dev] anyone interested in helping to port an algorithm from matlab?

Hi,

I was taking another look on a paper published in Computers and GeoSciences on
something called a 'Variance Quadtree' algorithm[1]. The main idea is to
recursively partition an image into smaller and smaller rectangles, based on
the amount of variance within each rectangle. Thus, regions of more variation
get split into smaller nested rectangles.

The source code for the algorithm is given in Matlab source[2]. I have tried
unsuccessfully to port the code to R, mostly because I do not completely
understand several of the matlab matrix idioms used in the code.

I think that a C version of the algorithm, along with some minor modifications
to convert the quadtree structure into GRASS vectors (similar to what
v.surf.rst can output) would be a nice addition to imagery analysis in GRASS.

Anyone want to give some pointers / help on porting this to C? The algorithm
is relatively simple, however some of the matlab-specific matrix operations
may be a challenge.

1. http://dx.doi.org/10.1016/j.cageo.2006.08.009
2. http://www.usyd.edu.au/su/agric/acpa/software/matlab/vqt_matlab.zip

Cheers,

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

On Wednesday 05 November 2008, Dylan Beaudette wrote:

Hi,

I was taking another look on a paper published in Computers and GeoSciences
on something called a 'Variance Quadtree' algorithm[1]. The main idea is to
recursively partition an image into smaller and smaller rectangles, based
on the amount of variance within each rectangle. Thus, regions of more
variation get split into smaller nested rectangles.

The source code for the algorithm is given in Matlab source[2]. I have
tried unsuccessfully to port the code to R, mostly because I do not
completely understand several of the matlab matrix idioms used in the code.

I think that a C version of the algorithm, along with some minor
modifications to convert the quadtree structure into GRASS vectors (similar
to what v.surf.rst can output) would be a nice addition to imagery analysis
in GRASS.

Anyone want to give some pointers / help on porting this to C? The
algorithm is relatively simple, however some of the matlab-specific matrix
operations may be a challenge.

1. http://dx.doi.org/10.1016/j.cageo.2006.08.009
2. http://www.usyd.edu.au/su/agric/acpa/software/matlab/vqt_matlab.zip

Cheers,

Dylan

Just in case this help, attached is a working copy of a FORTRAN95 version of
the same algorithm.

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

(attachments)

vqt2c.for (7.92 KB)

Dylan Beaudette wrote:

The source code for the algorithm is given in Matlab source[2]. I have
tried unsuccessfully to port the code to R, mostly because I do not
completely understand several of the matlab matrix idioms used in the
code.

what are you stuck on in particular?

you can install GNU Octave to play around with Matlab-like syntax.

dot-before-operator (e.g. ".*") for array multiplication might be a little
confusing, it does like:

a = [ 1 2 ]
b = [ 10 20 ]
a .* b

ans =
    10 40

a * b'

ans =
    50

( ' transposes array from 1x2 to 2x1)

Hamish

On Wednesday 05 November 2008, Hamish wrote:

Dylan Beaudette wrote:
> The source code for the algorithm is given in Matlab source[2]. I have
> tried unsuccessfully to port the code to R, mostly because I do not
> completely understand several of the matlab matrix idioms used in the
> code.

what are you stuck on in particular?

you can install GNU Octave to play around with Matlab-like syntax.

dot-before-operator (e.g. ".*") for array multiplication might be a little

confusing, it does like:
>> a = [ 1 2 ]
>> b = [ 10 20 ]
>> a .* b

ans =
    10 40

>> a * b'

ans =
    50

( ' transposes array from 1x2 to 2x1)

Hamish

Hi Hamish,

Actually I was able to make some progress in R, until I got to something like
this:

# accumulate new quadrants
# WTF ?
cc = [cc;cord]
Qc=[Qc;Qh]

if cc is matrix, and cord is a matrix, what exactly is happening?

Thanks for the tip on Ocatave, I will give that a try too. I have a sneaking
suspicion that this algorithm is not very efficient (the fortran version), as
it takes quite a while (well 3 minutes) to process a 400x400 grid.

I would be interesting in developing a C version, that had some additional
functionality, and could output GRASS polygons. Also- I think that the
algorithm has some kind of rounding or indexing errors, as I see systematic
variation in areas of the quadrants (see attached).

Cheers,
Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

(attachments)

area_not_consistant.png

Dylan Beaudette wrote:

# accumulate new quadrants
# WTF ?
cc = [cc;cord]
Qc=[Qc;Qh]

if cc is matrix, and cord is a matrix, what exactly is
happening?

"coord" is appended to the end of "cc". e.g.:

a = [ 10 20 ]
b = [ 15 25 ]
c = [a; b]

c =

    10 20
    15 25

d = [ a; b; a ]

d =

    10 20
    15 25
    10 20

I have a sneaking suspicion that this algorithm is not very efficient
(the fortran version), as it takes quite a while (well 3 minutes) to
process a 400x400 grid.

in the above example, appending the variable has the effect of resizing
the array, which is very very slow. Whenever possible it is better to
determine/guess the final size of the array and create it (full of NaNs)
and populate it row by row. "cc = [cc; next]" will get slower and slower
as the iterations grow. You can crop off any leftover NaNs once finished.

many "for" loops in matlab can be converted to array calcs greatly
speeding things up. if things are done right, nested for loops should
be a rare sight.

matlab disk/sscanf i/o has traditionally been much much slower than C,
although I think this has somewhat improved with newer versions (or it
just seems that way because the CPUs get faster and faster?). I usually
try to use UNIX tools (sed/awk/...) to pre-process ascii data before
load()ing into matlab. you can use !cmd or system() calls in matlab to
do that within the script.

The Fortran compiler you use can greatly affect run time. Using something
modern?

Hamish

On Wed, Nov 5, 2008 at 7:39 PM, Hamish <hamish_b@yahoo.com> wrote:

Dylan Beaudette wrote:

# accumulate new quadrants
# WTF ?
cc = [cc;cord]
Qc=[Qc;Qh]

if cc is matrix, and cord is a matrix, what exactly is
happening?

"coord" is appended to the end of "cc". e.g.:

a = [ 10 20 ]
b = [ 15 25 ]
c = [a; b]

c =

   10 20
   15 25

d = [ a; b; a ]

d =

   10 20
   15 25
   10 20

I have a sneaking suspicion that this algorithm is not very efficient
(the fortran version), as it takes quite a while (well 3 minutes) to
process a 400x400 grid.

in the above example, appending the variable has the effect of resizing
the array, which is very very slow. Whenever possible it is better to
determine/guess the final size of the array and create it (full of NaNs)
and populate it row by row. "cc = [cc; next]" will get slower and slower
as the iterations grow. You can crop off any leftover NaNs once finished.

I thought that this was what was happening, but could not verify it.
In R this would be accomplished by 'row-binding' things together with
rbind(). Same problem in R when dynamically re-sizing an array/matrix-
usually the same strategy for speeding it up. I haven't run the
program in Octave/Matlab, so I am not sure how it performs compared to
the fortran version... probably a lot slower.

many "for" loops in matlab can be converted to array calcs greatly
speeding things up. if things are done right, nested for loops should
be a rare sight.

Ditto for R-- vectorized approaches are usually preferred. If I ever
did finish porting this thing to R, I would try and convert all of the
for() loops into *apply() statements and the like.

matlab disk/sscanf i/o has traditionally been much much slower than C,
although I think this has somewhat improved with newer versions (or it
just seems that way because the CPUs get faster and faster?). I usually
try to use UNIX tools (sed/awk/...) to pre-process ascii data before
load()ing into matlab. you can use !cmd or system() calls in matlab to
do that within the script.

OK-- good tips. I think that this algorithm really belongs in
something like C, as it is not much more than a simple set of wrappers
to a recursive partitioning approach.

The Fortran compiler you use can greatly affect run time. Using something
modern?

I was only able to compile it with f95 -- not sure if that is just an
alias for something in the gcc collection or not.

In thinking about how to implement this as a GRASS function, I wonder
if code could be borrowed / shared with the segment library-- which
essentially builds a quadtree for the partitioning of vector points.
That approach might be able to be modified to partition the region
based on rectangles of equal variance, instead of equal number of
points. I may contact Helena and see what she thinks.

Cheers,

Dylan

Hamish