[GRASS-user] algorithm used in i.cluster

Georg_Kaspar1 · April 21, 2010, 3:29pm

hi,

a quote from Markus' "Introduction to GRASS GIS Software" (1998):

"[...] GRASS uses the "minimum distance to means" algorithm for deriving the clusters [...]

"minimum distance to means" is described in Lillesand and Kiefer (2000), as a supervised classification technique which requires a priori information. Does the i.cluster module use a modified version or have there been changes since 1998?
the manual page does not contain any further information on how the module works...

does anybody know which exact algorithm is used (k-means probably...)?

thanks & regards,
georg

neteler · April 22, 2010, 1:48am

On Wed, Apr 21, 2010 at 5:29 PM, Georg Kaspar <georg@muenster.de> wrote:

hi,

a quote from Markus' "Introduction to GRASS GIS Software" (1998):

"[...] GRASS uses the "minimum distance to means" algorithm for deriving the
clusters [...]

This knowledge is essentially based on
http://grass.osgeo.org/grass64/manuals/html64_user/i.cluster.html

and "The GRASS 4 Image Processing manual"
http://grass.itc.it/gdp/imagery/grass4_image_processing.pdf

and
http://trac.osgeo.org/grass/browser/grass/branches/releasebranch_6_4/imagery/i.cluster
http://download.osgeo.org/grass/grass6_progman/c__exec_8c_source.html
-> /* generate class means */
00063 I_cluster_means(C);

etc.

"minimum distance to means" is described in Lillesand and Kiefer (2000), as
a supervised classification technique which requires a priori information.
Does the i.cluster module use a modified version or have there been changes
since 1998?

To my knowledge, no algorithm changes have been taken place.

the manual page does not contain any further information on how the module
works...

Luckily it is open source

does anybody know which exact algorithm is used (k-means probably...)?

I'm afraid that you need to compare it to a k-means algorithm. Please let
us know your findings.

Markus

Georg_Kaspar1 · April 22, 2010, 11:59am

Markus Neteler wrote:

Luckily it is open source

and also quite well documented (at least it is possible to understand what's going on without being a C-guru)

does anybody know which exact algorithm is used (k-means probably...)?

I'm afraid that you need to compare it to a k-means algorithm. Please let
us know your findings.

the steps described in

http://download.osgeo.org/grass/grass6_progman/c__exec_8c_source.html

seem to fit the migrating means algorithm described in richards and jia (2006) and based on the isodata algorithm in ball and hall (1965):

1. set C initial cluster centers (at random)
2. assign each pixel to the nearest center
3. compute a new set of means
4. repeat until no further changes occur. (shapiro also implemented a maximum number of iterations)

empty clusters can also be deleted and similar clusters can be merged as implemented in i.cluster.

best regards,
Georg

Georg_Kaspar1 · April 22, 2010, 3:21pm

Georg Kaspar schrieb:

the steps described in

http://download.osgeo.org/grass/grass6_progman/c__exec_8c_source.html

seem to fit the migrating means algorithm described in richards and jia (2006) and based on the isodata algorithm in ball and hall (1965):

which is in principle the same as k-means.
I wrote a mail to Michael Shapiro, who wrote the code back in the 90's and received this answer:

Georg,

It has been a very long time since I wrote that code so my memory may be suspect, but I think that is correct.

----- Original Message -----
From: "Georg Kaspar" <georg.kaspar@uni-muenster.de>
To: mshapiro@ncsa.illinois.edu
Sent: Thursday, April 22, 2010 10:09:57 AM
Subject: algorithm used in GRASS module i.cluster

Dear Mr Shapiro,
for my diploma thesis I need to know the algorithm implemented in the GRASS GIS module "i.cluster". I had a quick look at the source code and think that it uses an iterative process similar to k-means clustering. Can you confirm this assumption?
Thank you very much in advance!
best regards,
Georg Kaspar

Georg_Kaspar1 · April 22, 2010, 3:35pm

Sorry for flooding the list, but i found an interesting note in Schowengerdt (2007), p. 400:

"The ISODATA algorithm (Ball and Hall, 1967) is a common modification of the K-means algorithm and includes merging of clusters if their separation is below a threshold, and splitting of a single cluster into two clusters if it becomes too large"

The algorithm implemented in the i.cluster module involves merging of classes (I_cluster_merge) though no splitting function seems to be implemented.
Since Michael Shapiro stated that he might have used the K-means algorithm, I think we can be pretty shure that it is a modified version similar to the isodata algorithm, which is described as migrating means in Richards (2006).

Markus_GRASS · April 22, 2010, 3:48pm

On Thu, Apr 22, 2010 at 5:35 PM, Georg Kaspar <georg@muenster.de> wrote:

Sorry for flooding the list, but i found an interesting note in Schowengerdt (2007), p. 400:

IMHO, you’re welcome:-)

Would you mind assembling all this info in a nice little paragraph including full references so that the i.cluster manual can be updated?

Thanks,

Markus M

“The ISODATA algorithm (Ball and Hall, 1967) is a common modification of the K-means algorithm and includes merging of clusters if their separation is below a threshold, and splitting of a single cluster into two clusters if it becomes too large”

The algorithm implemented in the i.cluster module involves merging of classes (I_cluster_merge) though no splitting function seems to be implemented.
Since Michael Shapiro stated that he might have used the K-means algorithm, I think we can be pretty shure that it is a modified version similar to the isodata algorithm, which is described as migrating means in Richards (2006).

grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user