[GRASSLIST:8906] distance algorithm

Hi all,

I'm trying to optimise a GRASS module that takes two maps of a
phenomenon at different times. It calculates the distance from each
raster cell in the later map to the nearest cell in the earlier map.

Obviously, when the earlier map has a lot of cells present it takes a
while to find the closest.

Currently I'm optimising it by making a list of cell coordinates for
the cells in the earlier map. I only add cells to the list that are
not surrounded by other cells - i.e. I'm only adding cells from the
outlines of contiguous areas. Each cell present in the later map is
first checked against the earlier map; if the same raster cell is
present there, the distance is 0.

I also split this list into a grid. The grid is basically a
lower-resolution raster over the same region, where each grid cell
holds a list of the coordinates that fall within it.

e.g.

Say the raster region is 100x100 and the grid spacing is 16.

This gives a grid size of 7, rounding up from 6.25.

Say we have a coordinate in the later map: (34, 50). This evaluates to
grid position (3, 4), because 34/16 = 2.125 and 50/16 = 3.125, and each
is rounded up.
To find the nearest cell in the past map we compare distances to all
the coords associated with grid position (3, 4) - we also compare the
distances to the coords in the 8 surrounding grid positions, in case
one of those is closer. If no coords are found, then the grid positions
surrounding the surrounding positions are searched, and so on.

The grid spacing has some effect on performance - and for my purposes
it seems like 16 does alright.
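
In rough Python, the lookup I described looks something like the sketch
below (placeholder names only - GRID is the grid spacing, and
earlier_cells is the list of boundary cells taken from the earlier map,
assumed non-empty):

import math

GRID = 16  # grid spacing, as above

def build_buckets(earlier_cells):
    # hash each earlier-map boundary cell into its grid bucket
    buckets = {}
    for (row, col) in earlier_cells:
        buckets.setdefault((row // GRID, col // GRID), []).append((row, col))
    return buckets

def nearest_distance(buckets, row, col):
    # search the cell's own bucket plus an expanding square ring of buckets
    grow, gcol = row // GRID, col // GRID
    ring = 1
    while True:
        candidates = []
        for dr in range(-ring, ring + 1):
            for dc in range(-ring, ring + 1):
                candidates.extend(buckets.get((grow + dr, gcol + dc), []))
        if candidates:
            # strictly, one more ring ought to be checked after the first
            # hit, since a slightly nearer cell can sit just across a
            # bucket boundary
            return min(math.hypot(row - r, col - c) for (r, c) in candidates)
        ring += 1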

----

So, the point of this post is:

Is this the best way to be doing this? Does anyone know of a better
algorithm? I know that for one-dimensional data I could use a range of
structures to find the closest value, but I don't know of any for
two-dimensional data.

Alternatively, if my current method is reasonable, are there any
additional things I could do to speed it up?

Note: I am doing stochastic dispersal simulations, and although this
runs a 25-timestep simulation within half an hour, I need to do as
many replications as possible. The major hold-up at the moment is this
distance evaluation. Part of the problem is that the present cells are
very scattered, so only adding the boundary cells to the lists doesn't
have as great an impact as it usually would.

Any suggestions or comments appreciated, thanks!

Joel

Doesn't r.distance already do this?

Michael
__________________________________________
Michael Barton, Professor of Anthropology
School of Human Evolution and Social Change
Arizona State University
Tempe, AZ 85287-2402

phone: 480-965-6213
fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton


From the r.distance page:
"Locates the closest points between "objects" in two raster maps. An "object" is defined as all the grid cells that have the same category number, and closest means having the shortest "straight-line" distance."

Unless I give every cell a unique category, r.distance isn't suitable.
I also calculate statistics based on the distances as my module runs.

As an update - the described algorithm does work pretty fast. I hadn't
run "make install" after my changes, so my tweaks weren't actually
being used. Doh.

Perhaps I should alter r.distance to allow for this alternative
behaviour? Does this sound useful to anyone?

Cheers,
Joel


what about r.cost with a cost map of solid ones? (r.mapcalc one=1)
Then multiply answer by map resolution to get a "distance from" map.
Does the -k flag fix the distance-to-diagonals problem in this case?

Hamish


Good point, Hamish.

After digging through the man page, the following gives me a list of distances:

r.mapcalc "area.one=1"
r.cost -k input=area.one output=distance start_rast=past_sites
r.mapcalc "distance2=if(isnull(present_sites),null(),distance)"
r.mapcalc "dist_meters=distance2 * (ewres()+nsres())/2."
r.stats -1 -n input=dist_meters

This takes about the same cumulative amount of time as my module,
although mine also calculates the angle to the nearest past site for
each distance and generates moment statistics.
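
For completeness, the extra quantities are nothing exotic. Roughly, in
the same Python-style sketch as above, with (row, col) a later-map cell
and (nrow, ncol) its nearest past site (the exact conventions in my
module may differ):

import math

def angle_to_nearest(row, col, nrow, ncol):
    # bearing from the later-map cell towards its nearest past site
    # (one convention; the module may measure it differently)
    return math.atan2(nrow - row, ncol - col)

def moment_stats(distances):
    # simple moment statistics of the distance distribution
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    sd = math.sqrt(var)
    skew = sum((d - mean) ** 3 for d in distances) / (n * sd ** 3)
    kurt = sum((d - mean) ** 4 for d in distances) / (n * sd ** 4)
    return mean, var, skew, kurt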

Cheers,
Joel

----

Note: I've forwarded this to the dev list on Michael's suggestion.


--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joel Pitt, Room B534
Bio-Protection & Ecology Division
PO Box 84
Lincoln University 8150
New Zealand


how do the results compare between the two methods? i.e. how good a job
does r.cost do with the knight's move? without?

r.mapcalc residuals=map1-map2
r.univar residuals

sort of thing
?

Hamish

r.mapcalc "area.one=1"
r.cost -k input=area.one output=distance start_rast=past_sites
r.mapcalc "distance2=if(isnull(present_sites),null(),distance)"
r.mapcalc "dist_meters=distance2 * (ewres()+nsres())/2."
r.stats -1 -n input=dist_meters

This takes about the same cumalative amount of time as my module,

oh yeah, try using 100 percent memory in r.cost to speed things up.

Hamish


As an example, 10% random cells with distances to 10% of those:

r.mapcalc residuals=map1-map2
r.univar residuals

n: 4720
minimum: -0.682179
maximum: 6.86607
range: 7.54825
mean: 0.0994347
standard deviation: 0.418746
variance: 0.175348
variation coefficient: 421.127 %
sum: 469.332

So it doesn't seem like a huge difference, but when I compare the
distributions of distances with a Kolmogorov-Smirnov test they are
statistically different. I'm not a statistician though, so maybe this
isn't a valid measure.
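
If anyone wants to reproduce the comparison, a quick SciPy sketch would
be something like the following, where the file names are just
placeholders for the two lists of distances dumped by r.stats:

import numpy as np
from scipy.stats import ks_2samp

# one distance per line, as written by "r.stats -1 -n"
d_module = np.loadtxt("dist_module.txt")
d_rcost = np.loadtxt("dist_rcost.txt")

statistic, pvalue = ks_2samp(d_module, d_rcost)
print(statistic, pvalue)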

Any comments?

Joel