[GRASS-user] proximity mapping / site clustering

So far, working with the network engineer/planner (who happens to be my Dad) on using GRASS for WAN network engineering and design has been going great. After a successful Steiner tree, I have come to another problem to solve with GRASS.

I have all these sites on the minimal connection network. What I need to do now, on a simplistic level, is cluster or aggregate the sites. Clustering them by proximity is the first step, even though traffic generation is another factor that will have to be added.

I have looked at v.net.iso and v.net.alloc, but neither seems to produce the results I am after. Is there a function or combination of functions that might cluster sites on the network based upon their proximity to each other?

I used v.net.iso with roughly a one-mile iso band. I was able to highlight on the optimal network all sites within a mile of each other, but those were not the desired results.

Any ideas?

Thanks in advance

Follow-up/correction.

The “hubbing” does not have to occur along the optimal route (Steiner tree). This is “line of sight” oriented (cell towers, for example).

This sounds like a fun job for GRASS-R coupling!

sample case with the spearfish dataset and the vector 'bugsites'

#GRASS6.1
v.out.ascii bugsites > bugsites.xy

start R:
-----------------------------------------------------------------
x <- read.table('bugsites.xy', sep="|")
names(x) <- c('easting', 'northing', 'cat')

y <- data.frame(x[,1:2])

library(cluster)

y.pam <- pam(y, 5)
plot(y, col=y.pam$clustering, main="Bugsites Spatial Clustering, 5 classes",
cex=0.5, pch=16)

#the medoids of the clustering
points(y.pam$medoids, pch=15, col=1:5)

y$cluster <- y.pam$clustering
#keep the original GRASS category of each point
y$orig_cat <- x$cat

write.table(y, file='bugsites.clust', row.names=FALSE)
-----------------------------------------------------------------

#back in GRASS6.1
v.in.ascii in=bugsites.clust out=bclust fs=" " skip=1 \
  columns='x double, y double, cluster integer, orig_cat integer'

#display the clustered data with convex hulls:
d.rast elevation.10m

# there are 5 clusters
for x in `seq 1 5`
do v.extract --o in=bclust where="cluster=$x" out=bclust_$x
v.hull in=bclust_$x out=bclust_hull_$x; g.remove vect=bclust_$x
d.vect bclust_hull_$x
done

#display the clustered points
d.vect bclust icon=basic/box fcol=blue col=blue size=5

#note that the attribute table holds both the original CAT and the spatial
#cluster number:
v.db.select bclust

cat|x|y|cluster|orig_cat
1|590232|4915039|1|1
2|590430|4915204|1|2
3|590529|4914625|1|3
4|590546|4915353|1|4
5|590612|4915320|1|5
6|590744|4915535|1|6

just for fun, here is a link to the image.
http://169.237.35.250/~dylan/temp/spatial_clustering_idea1.png

Cheers,

Dylan

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

One more example for fun...

#see previous email in thread by D.E.B.

# add in cost derived from slope map, and distance to some initial point
# (b_start):
echo "596204.6875|4917668.75" | v.in.ascii out=b_start
r.cost in=slope out=b_cost start_points=b_start stop_points=bugsites -k

#add this to bugsites att tables
echo "ALTER TABLE bugsites add column s_cost double" | db.execute
v.what.rast vect=bugsites rast=b_cost column=s_cost

#export bugsites
v.out.ascii.db in=bugsites out=bugsites.xy columns='s_cost'

# similar R commands, but standardize the data frame prior to the
# partitioning-around-medoids approach:

y.pam <- pam(y, 5, stand=TRUE)

#yields slightly different results. note that the red 'dot' is the location of
#the arbitrary starting point 'b_start'
http://169.237.35.250/~dylan/temp/spatial_clustering_idea2-cost.png
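
What stand=TRUE does can be sketched on made-up data. This is only a minimal illustration, assuming the behavior described in the pam() documentation (each column is centered on its mean and divided by its mean absolute deviation). The data frame `y` below is simulated, not the bugsites export, and `std` is a helper invented for the example.

```r
library(cluster)

set.seed(3)
# simulated stand-in for the exported table: coordinates plus a cost column
y <- data.frame(easting  = runif(60, 590000, 600000),
                northing = runif(60, 4914000, 4925000),
                s_cost   = runif(60, 0, 50))

# hypothetical helper: center on the mean, scale by the mean absolute deviation
std <- function(v) (v - mean(v)) / mean(abs(v - mean(v)))
y.std <- as.data.frame(lapply(y, std))

# let pam() standardize internally, then compare with the manual version
cl.auto   <- pam(y, 5, stand = TRUE)$clustering
cl.manual <- pam(y.std, 5)$clustering

# compare the two cluster assignments
all(cl.auto == cl.manual)
```

Without standardization, the coordinates (which span thousands of map units) would dominate the cost column (which spans tens) in the distance calculation.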

Seriously, I really wish that GRASS, R, PostGIS, etc. weren't so damn
flexible - then I would stick to my work instead of trying out things like
this!

Cheers,

Dylan

Indeed. The examples posted were using standard numerical clustering,
which just happened to incorporate spatial coordinates. the packages
mentioned up thread are probably better suited for spatial studies.
Also, see the most recent GRASS Newsletter for a great article by R.
Bivand on getting raster / point data into R.

Good luck,

Dylan

On 7/22/06, Jarek Jasiewicz <jarekj@amu.edu.pl> wrote:

Yes, you should use R with the following packages:
DCluster, splancs, spdep and spatstat. These packages contain many
clustering methods.
Best regards
Jarek Jasiewicz

Sure thing. GRASS, R, PostGIS, QGIS, GDAL -- all worth their weight in
gold if you printed the source code.

GRASS newsletter: see vol. 3, the article by Roger Bivand:
http://grass.itc.it/newsletter/index.php

and possibly the summary here:
http://casoilresource.lawr.ucdavis.edu/drupal/node/221

With a small number of sites, pam() clustering works well; for larger
sample sizes use clara(). Both are found in the cluster package. Note
that pam() implements k-medoids, a close relative of k-means that is
generally more robust. Also note that instructing the algorithm to
automatically standardize the input data is usually a good idea, but
see the references for details.
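
A minimal sketch of the pam()/clara() choice, using simulated coordinates rather than real site data (the data frame `sites` and its values are made up for illustration). Both functions come from the cluster package, and stand=TRUE asks them to standardize each column before computing dissimilarities.

```r
library(cluster)

set.seed(42)
# hypothetical sites: three well-separated groups of 500 points each
sites <- data.frame(
  easting  = c(rnorm(500, 590000, 100), rnorm(500, 595000, 100),
               rnorm(500, 600000, 100)),
  northing = c(rnorm(500, 4915000, 100), rnorm(500, 4920000, 100),
               rnorm(500, 4915000, 100))
)

# small sample: pam() on every tenth site
small <- sites[seq(1, nrow(sites), by = 10), ]
small.pam <- pam(small, 3, stand = TRUE)

# full data set: clara() clusters sub-samples, so it scales to more rows
sites.clara <- clara(sites, 3, stand = TRUE)

# cluster sizes and medoid coordinates
table(sites.clara$clustering)
sites.clara$medoids
```

Because clara() works on sub-samples, its result can vary slightly from run to run unless the random seed is fixed.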

Clustering around set "medoids" with pam() or set "centroids" with
kmeans() is fairly simple; just see the documentation associated with
these functions. The CRAN website will be a good start. Also, for some
rather generic R examples and documents see the link:
http://casoilresource.lawr.ucdavis.edu/drupal/node/100
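
For the kmeans() side of this, here is a minimal sketch with simulated points; the coordinates and the three starting centers are made up. Note that kmeans() treats the supplied centers only as starting positions and moves them toward the cluster means as it iterates, so this is not the same as clustering around fixed locations.

```r
set.seed(1)
# hypothetical sites scattered over a 1000 x 1000 area
pts <- cbind(easting  = runif(50, 0, 1000),
             northing = runif(50, 0, 1000))

# hypothetical starting centroids, one row per cluster
start <- rbind(c(200, 200), c(800, 300), c(500, 800))

# kmeans() is part of the stats package that loads with R
km <- kmeans(pts, centers = start)

# cluster membership of each site, and the final (moved) centroids
km$cluster
km$centers
```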

Regarding the Steiner problem: if you have a chance, could you post
some sample data along with your commands so others can give it a
try? I have been meaning to experiment with the v.net.* commands for
some time now. If you would like, I can post it all on our lab's
website, crediting you. The GRASS wiki might be another option.

I will post the spatial clustering examples to a formatted webpage in
the next couple of days for clarity.

Good luck,

Dylan

On 7/22/06, M S <mseibel@gmail.com> wrote:

OK... say you have a bunch of sites as points, 50 sites for example, and
another set of points, say 3.

Is there a way to cluster the 50 sites to or around the 3 other points,
sort of using a gravity model to cluster the 50 points to the 3 by
shortest distance?

On 7/22/06, M S <mseibel@gmail.com> wrote:

This is awesome. I can't thank you enough. I am a long-time Arc/Info user
and know GIS very well. I've been getting into GRASS recently, but it is
taking a steep learning curve to untrain my mind from the E$RI way of doing
things. You could never do this with Arc/Info. Like you said, the
flexibility of GRASS and R makes this so sweet!

I might have to learn R too. Funny that it comes from the S package from
Bell Labs, because my dad used to work for them, then AT&T, then Lucent. He
has been working with this stuff a long time and I am just now showing him
how GRASS GIS can solve the network planning problems.

I'll have to look at the article from the recent GRASS newsletter. I don't
think I get that publication. Is it off the main website?

Thanks so much. This is very encouraging and awesome!

Fantastic. For now some of this R and statistics is over my head! I’ll have to spend some time getting oriented. Thank you for the links.

Is the part below referring to my previous question about clustering one large set of points around another, much smaller set of points (e.g., 50 points clustered around 3 points from another layer)?

> clustering around set "medoids" with pam() or set "centroids" with
> kmeans() is fairly simple, just see the documents associated with
> these functions. the CRAN website will be a good start. also, for some
> rather generic R examples and documents see the link:
> http://casoilresource.lawr.ucdavis.edu/drupal/node/100

I would love to post my Steiner results. Maybe I can do the same thing on Spearfish data, because the data I used is client specific. I’ll make a run on the spearfish60 data and provide a step-by-step walk-through.

Dear all,
the topic was really interesting, and really useful for me!
So, although I'm not very experienced with shell programming, I have
decided to try to create a shell script for performing clustering in
GRASS using R.
I have relied heavily on Dylan's previous e-mails.

I attach the script to this e-mail.
Please try it and give me suggestions on how to improve it (the
code is not so good :( ).

The script needs an input point map, the number of clusters to create and
the name of the output map.

Cheers,

Ivan

--
Ivan Marchesini
Department of Civil and Environmental Engineering
University of Perugia
Via G. Duranti 93/a
06125
Perugia (Italy)
e-mail: marchesini@unipg.it
        ivan.marchesini@gmail.com
tel: +39(0)755853760
fax: +39(0)755853756
jabber: geoivan73@jabber.org

(attachments)

v.cluster (4.06 KB)

Be sure to read up on the documentation, as it is all spelled out there.

A quick example of "partitioning around medoids", with known medoid
locations might be accomplished by combining the bugsites along with
user-digitized centers in R.

quick example:

#interactively digitize 3 "centers"
d.where | awk '{print $1"|"$2}' | v.in.ascii out=b_my_centers

#export as text:
v.out.ascii in=b_my_centers out=b_my_centers.xy

#switch to R
x <- read.table('bugsites.xy', sep="|")
y <- read.table('b_my_centers.xy', sep="|")

#name the columns in the imported dataframes
names(y) <- c('easting', 'northing', 'cat')
head(x)
names(x) <- c('cat','easting', 'northing', 'cost')
head(x)

#composite the two lists of points, keeping only the easting,northing
z <- rbind(x[,2:3], y[,1:2])
str(z)

#cluster around the medoids (the last three rows in z: rows 91-93, since
#bugsites has 90 points)
z.pam <- pam(z, 3, stand=TRUE, medoids=c(91,92,93))

#plot the results
plot(z$easting, z$northing, col=z.pam$clustering, main="Spatial Clustering around Medoids")
#color each digitized center by the cluster it ended up in
points(y$easting, y$northing, col=z.pam$clustering[91:93], pch=16)

output image here:
http://169.237.35.250/~dylan/temp/spatial_clustering_idea3-pam.png

Filled circles are the medoids digitized in GRASS, hollow circles are
the original bugsites.
Follow the earlier email in this thread for how to get this data back
into GRASS.
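
One caveat, offered as a sketch rather than a correction of the result above: by default pam() treats supplied medoids as a starting point and may swap them for other observations. If the centers must stay exactly where they were digitized, current versions of the cluster package accept do.swap=FALSE (whether the version used in this thread had that argument is not something I can confirm). The sites and hub locations below are simulated, and names such as `hubs` and `hub.rows` are invented for the example.

```r
library(cluster)

set.seed(7)
# hypothetical data: 50 sites and 3 fixed hub locations
sites <- data.frame(easting  = runif(50, 0, 1000),
                    northing = runif(50, 0, 1000))
hubs  <- data.frame(easting  = c(200, 800, 500),
                    northing = c(200, 300, 800))

# stack sites and hubs; the hubs become rows 51-53
z <- rbind(sites, hubs)
hub.rows <- 51:53

# do.swap=FALSE skips the swap phase, so the supplied medoids are kept
z.pam <- pam(z, 3, medoids = hub.rows, do.swap = FALSE)

# cross-check: nearest hub for each site by plain Euclidean distance
d <- as.matrix(dist(z))[1:50, hub.rows]
nearest <- apply(d, 1, which.min)

# each site should carry the same cluster label as its nearest hub
hub.cluster <- z.pam$clustering[hub.rows]
all(z.pam$clustering[1:50] == hub.cluster[nearest])
```

With fixed medoids the result is simply a nearest-hub assignment, which is close to the "cluster 50 points around 3 points" question asked earlier in the thread.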

note that a better way to do this would use the R/GRASS interface,
such that REAL spatial objects are moved between GRASS and R, instead
of just coordinates.

With regard to large datasets, pam() should work for perhaps 1,000
records, whereas clara() should be used for anything larger.

Cheers,

Dylan
