[GRASS-user] Calculating eigen values and % variance explained after PCA analysis

Thanks Nikos,

I have read your mail and the associated links. Many thanks, they have confirmed some things for me.

Firstly, I would like to standardise the PCA so that each input contributes equally to the result. As such, I use the correlation matrix as opposed to the covariance matrix. By using this method I do not need to centre the data, yes?

Secondly, using the by-hand method (r.covar -r -> m.eigensystem -> r.mapcalculator), in particular when applying the eigen vectors to the input imagery, I can disregard the signs and take them as absolute values, yes?

Finally, the size of these values indicates their relative contribution to the component, so e.g. if band 1 has an eigen vector value of 0.8 and band 2 has a value of 0.1, band 1 contributes more to the PC than band 2, yes?

I will run some tests this afternoon and continue next week and report back. Let me know if my knowledge above is correct.

Many thanks,
Wesley

Wesley Roberts MSc.
Researcher: Earth Observation (Ecosystems)
Natural Resources and the Environment
CSIR
Tel: +27 (21) 888-2490
Fax: +27 (21) 888-2693

"To know the road ahead, ask those coming back."
- Chinese proverb

Nikos Alexandris <nikos.alexandris@felis.uni-freiburg.de> 02/27/09 1:27 PM >>>

Wesley

I downloaded and installed GRASS 6.4 and after much "wailing and
gnashing of teeth" I got m.eigensystem to work. Below are some
comments and questions.

Nice that it finally worked out. Hopefully my comments are useful for
you (and correct). You can have a look at the following links
[1][2][3][4].

Over the last couple of days I have been running PCA analyses using
the i.pca and r.covar -> m.eigensystem -> r.mapcalc. The analysis
seeks to create a component surface where tree crowns are separated
from understory and ground in a plantation forest. Inputs are three
digital aerial photographs (red, green, blue), a top of canopy height
model, and an intensity surface derived from lidar return intensity
measures. Output from the PCA will be input into a tree counting method
which (if all goes well) will use mathematical morphology to isolate
tree crowns for counting purposes.

Interesting stuff!

My results are interesting and worth mentioning to the list. Firstly,
the results from both the automated (i.pca) and the
'by-hand-method' (r.covar -> m.eigensystem -> r.mapcalc) differ. For
example; the eigen values from the automated approach are as follows

(-0.50 -0.53 -0.49 -0.47 -0.08)
(-0.38 -0.30 -0.13 0.86 0.11)
(-0.34 -0.35 0.86 -0.14 0.05)
(0.70 -0.71 -0.01 0.06 0.03)
(0.00 -0.03 0.07 0.13 -0.99)

while the eigen values from the 'by-hand-method' are completely
different. In fact I am a little confused with regards to the output
from i.pca and m.eigensystem. i.pca returns the n components plus the
eigen values for each component (or are those vectors?).

Yes, those are the eigen_VECTORS_ (= loadings, in other words how much
each of the original dimensions contributes to the resulting
components). Each row corresponds to one principal component.
In your example above you "know" that the 1st component (1st row) is
composed of the original dimensions (each column) and each original
dimension has "contributed" according to the _loadings_:

So dimension 1 -> -0.50, dimension 2 -> -0.53, dimension 3 -> -0.49,
dimension 4 -> -0.47 and dimension 5 -> -0.08

If I understand PCA well myself, you can disregard the "signs" and
see the loadings as absolute values.
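
A minimal sketch of that reading, using only the first row of the i.pca
output quoted above. It ranks the dimensions by absolute loading, for
interpretation only. The "share" computation assumes the row is a
unit-length eigenvector, which the numbers suggest but the thread does not
state.

```python
# Loadings of the 1st component, copied from the first row of the matrix above.
pc1 = [-0.50, -0.53, -0.49, -0.47, -0.08]

# Rank the original dimensions by absolute loading (sign ignored for
# interpretation only, not for computing the component itself).
ranked = sorted(enumerate(pc1, start=1), key=lambda item: abs(item[1]), reverse=True)
for dim, loading in ranked:
    print(f"dimension {dim}: |loading| = {abs(loading):.2f}")

# Assumption: the row is a unit-length eigenvector, so the squared loadings
# sum to roughly 1 and each squared loading can be read as a share.
squared_sum = sum(v * v for v in pc1)
shares = [v * v / squared_sum for v in pc1]
print(f"sum of squared loadings: {squared_sum:.4f}")
```

Dimensions 1 to 4 contribute almost equally to this component, while
dimension 5 contributes very little.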

Would it be fair in saying that these are the coefficients which have
been applied to the input imagery to attain the output components (in
the same way the m.eigensystem works with r.mapcalc)?

Yes.

Output from the m.eigensystem approach only gives one eigen value per
component (see below).
Are the above values from i.pca not the eigen vectors?

It should be the case with i.pca as well, since the eigen_VALUES_ (= the
variances of the original dimensions that are "kept" in each component)
are important for interpreting what exactly each of the components is.
But i.pca just does not report the eigen_VALUES_.
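
Once the eigen values are available (for example from m.eigensystem), the
percentage of variance explained by each component is its eigen value
divided by the sum of all eigen values. A minimal sketch follows. The five
eigen values are hypothetical and are not taken from this thread.

```python
# Hypothetical eigen values, one per component, largest first.
eigenvalues = [5.0, 2.5, 1.5, 0.75, 0.25]

total = sum(eigenvalues)
percent = [100.0 * ev / total for ev in eigenvalues]

# Running total, useful for deciding how many components to keep.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:.1f}% of variance, {c:.1f}% cumulative")
```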

At some point some C-expert needs to have a look at the code (i.pca) and
correct the "bug" which prevents the eigen_VALUES_ from being
printed.

If this is the case then both methods still differ significantly. Is
this possible, and which should I use?

Please have a look at my comments/questions in link [2]. i.pca follows
the "SVD" method. You performed the non-standardised PCA using the
covariance matrix. Note that you can also use the standardised method by
using the correlation matrix.
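
The relation between the two matrices can be sketched as follows: each
covariance is divided by the product of the two standard deviations, so
every variable ends up with unit variance on the diagonal. The 3x3
covariance matrix is hypothetical and only stands in for what a covariance
tool would print.

```python
import math

# Hypothetical symmetric covariance matrix for three input maps.
cov = [
    [4.0, 2.0, 0.6],
    [2.0, 9.0, 1.5],
    [0.6, 1.5, 1.0],
]

n = len(cov)
# Standard deviations are the square roots of the diagonal (the variances).
std = [math.sqrt(cov[i][i]) for i in range(n)]

# corr[i][j] = cov[i][j] / (std[i] * std[j])
corr = [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]

for row in corr:
    print(" ".join(f"{value:6.3f}" for value in row))
```

An eigen decomposition of `corr` instead of `cov` gives the standardised
PCA; the two generally produce different eigen vectors and eigen values.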

Qualitatively, the 'by-hand-method' seems to isolate the crowns very
nicely in PC1 while the automated (i.pca) approach isolates crowns in
PC3?? I rescaled the output in the i.pca method, would this contribute
to the differences seen?

I am going to run more tests on the rest of my data and will see if
these issues arise again. In the meantime if anyone on the list can
offer some insight into the two different PCA analysis examples I
would greatly appreciate it.

I would be happy to hear more. It's a tool I also need.
Kindest regards, Nikos

[...]

---
Links:

# in grass-user mailing list

[1] # In these posts I didn't know much about PCA #
http://n2.nabble.com/i.pca--vs.--r.covar-m.eigensystem-r.mapcalc-td1885820.html#a1885821

[2] # this is the one I have sent you already #
http://n2.nabble.com/Comparison-between-"i.pca"-and-R's-"prcomp()"%3A-explanations-and-questions-td2283997.html#a2284070

# in grass-trac

[3] http://trac.osgeo.org/grass/ticket/341

[4] http://trac.osgeo.org/grass/ticket/430


Wesley:

Firstly, I would like to standardise the PCA so that each input
contributes equally to the result. As such, I use the correlation
matrix as opposed to the covariance matrix. By using this method
I do not need to centre the data, yes?

I think we have to make it clear that _data centering_ and
_standardising_ are two different things.

# data centering: is about whether the variables should be shifted to be
zero centered (copy-pasted from R's help for prcomp() function).#

# standardising: //Ehmm... thinking...// if I got the PCA concept
right, using the correlation matrix is a kind of normalising with which
the variance of each input variable/feature/dimension is set to 1 just
before the analysis is applied.#
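
The difference can be sketched on a single variable. Centering only
subtracts the mean; standardising additionally divides by the standard
deviation, so the variance becomes 1. Note also that covariances and
correlations are computed from deviations about the mean, so an eigen
decomposition of either matrix already implies centred data. The band
values below are hypothetical.

```python
import statistics

# Hypothetical values of one input band.
band = [10.0, 12.0, 14.0, 16.0, 18.0]

# Centering: shift so that the mean is zero; the variance is unchanged.
mean = statistics.fmean(band)
centred = [v - mean for v in band]

# Standardising: additionally scale so that the variance is one.
sd = statistics.pstdev(band)
standardised = [c / sd for c in centred]

print("variance, original:    ", statistics.pvariance(band))
print("variance, centred:     ", statistics.pvariance(centred))
print("variance, standardised:", round(statistics.pvariance(standardised), 6))
```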

It's up to you to test data centering or not, and/or standardising or
not. For several applications it has been shown that standardising
improves results in several ways. (!?).

Secondly, using the by-hand method (r.covar -r -> m.eigensystem ->
r.mapcalculator), in particular when applying the eigen vectors to the
input imagery, I can disregard the signs and take them as absolute
values, yes?

No. My apologies for not being clear before. You can ignore the signs
when you just try to interpret "how much" each original dimension has
affected a component. When it comes to producing a component by hand (as
you describe it above) you _certainly_ need to consider the signs.
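
A small sketch of why the signs matter when computing a component. A
component value at one pixel is the sum of each loading multiplied by the
corresponding input value. The loadings are the second row of the i.pca
output quoted earlier in the thread; the pixel values are hypothetical.

```python
# Loadings of the 2nd component, copied from the i.pca output quoted earlier.
loadings = [-0.38, -0.30, -0.13, 0.86, 0.11]

# Hypothetical values of the five input maps at a single pixel.
pixel = [120.0, 110.0, 90.0, 15.0, 40.0]

# Correct: keep the signs when forming the weighted sum.
signed = sum(w * x for w, x in zip(loadings, pixel))

# Wrong for this purpose: absolute values give a different surface.
unsigned = sum(abs(w) * x for w, x in zip(loadings, pixel))

print(f"with signs:      {signed:.1f}")
print(f"absolute values: {unsigned:.1f}")
```

With absolute values every input pushes the result in the same direction,
so the contrast between inputs that the component is meant to capture is
lost.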

Finally, the size of these values indicates their relative contribution
to the component, so e.g. if band 1 has an eigen vector value of 0.8 and
band 2 has a value of 0.1, band 1 contributes more to the PC than band
2, yes?

Correct (given that you are talking about the _same_ component).

I will run some tests this afternoon and continue next week and report
back. Let me know if my knowledge above is correct.

Let _me_ know, if you have the time, if my statements are correct! :-)

Cheers, Nikos