[GRASS-user] Calculating eigen values and %varianceexplainedafter PCA analysis

Hi Hamish,

I would like to check out the new i.pca module described below in the svn repository. Where can I find devbr6 (6.5svn)? So far I have only been able to find 6.4svn.

Many thanks,
Wesley

Wesley Roberts MSc.
Researcher: Earth Observation (Ecosystems)
Natural Resources and the Environment
CSIR
Tel: +27 (21) 888-2490
Fax: +27 (21) 888-2693

"To know the road ahead, ask those coming back."
- Chinese proverb

Nikos Alexandris <nikos.alexandris@felis.uni-freiburg.de> 03/01/09 3:31 PM >>>

Nikos:

> It should be the case with i.pca as well since eigen_VALUES_ (=represent
> the variances of the original dimensions that are "kept" in each
> component) are important for the interpretation of what exactly are each
> of the components. But, i.pca just does not report the eigen_VALUES_.
>
> At some point some C-expert needs to have a look in the code (i.pca) and
> correct the "bug" which does not let the eigen_VALUES_ from being
> printed.

Hamish:

done in devbr6 (6.5svn) please test, I'm not a multivariate stats guru
and may have done something dumb so didn't port to other branches yet.

I changed the i.pca output to be like:

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45

As it was previously sent to stderr via G_message() I don't feel bad about
breaking output text compatibility. I wanted to add "%" to the values but
due to the sprintf()+strcat() method in the code that was a pain, so I
didn't.

Wesley:

> > If this is the case then both methods still differ significantly. Is
> > this possible, and which should I use.

> Please have a look at my comments/questions in link [2].
> i.pca follows the "SVD" method. You performed the non-standartised PCA
> using the covariance matrix. Note that you can use also the
> standartised method by using the correlation matrix.

does the r.mapcalc command at the end of the m.eigensystem help page*
do that conversion, or ...? ie how can we test these against each other?
how to do the standardized method?

We can easily cross-compare now. I will run my test in GRASS and R.
In GRASS:
* i.pca --> SVD without data centering method

* "r.covar" + "m.eigensystem" --> non-standardised PCA (about data
centering I am unsure here, I will investigate the _numbers_)

* "r.covar -r" + "m.eigensystem" # note the "-r" flag --> standardised
PCA (about data centering I am unsure, willing to test).

[*] (is "\-" there a typo or some old mapcalc syntax?)

IMHO: yes. Maybe it's some forgotten _backslash_ !?

also ISTR somebody (Dylan?) doing a comparison with the R-stats interface.

It would be nice to run tests using the Spearfish imagery dataset. After
my own tests I noticed it matched what was used in the m.eigensystem help
page.

I will probably run the test with my own data. Since I just need to
copy-paste the commands. I don't have the spearfish dataset currently.

my results follow.

Hamish

----------------------------
#Spearfish imagery sample dataset
g.region rast=spot.ms.1

# 'by-hand-method'
G65> echo "3" > test_m.eigensystem # number of input maps
G65> r.covar map=spot.ms.1,spot.ms.2,spot.ms.3 >> test_m.eigensystem

G65> cat test_m.eigensystem
3
462.876649 480.411218 281.758307
480.411218 513.015646 278.914813
281.758307 278.914813 336.326645

G65> m.eigensystem < test_m.eigensystem
-----
C The output is N sets of values. One E line and N V W lines
C
C E real imaginary percent-importance
C V real imaginary
C N real imaginary
C W real imaginary
C ...
C
C where E is the eigen value (and it relative importance)
C and V are the eigenvector for this eigenvalue.
C N are the normalized eigenvector for this eigenvalue.
C W are the N vector multiplied by the square root of the
C magnitude of the eigen value (E).
-----

E 1159.7452017844 0.0000000000 88.38
V 0.6910021591 0.0000000000
V 0.7205280412 0.0000000000
V 0.4805108400 0.0000000000
N 0.6236808478 0.0000000000
N 0.6503301526 0.0000000000
N 0.4336967751 0.0000000000
W 21.2394712045 0.0000000000
W 22.1470141296 0.0000000000
W 14.7695575384 0.0000000000

E 5.9705414972 0.0000000000 0.45
V 0.7119385973 0.0000000000
V -0.6358200627 0.0000000000
V -0.0703936743 0.0000000000
N 0.7438340890 0.0000000000
N -0.6643053754 0.0000000000
N -0.0735473745 0.0000000000
W 1.8175356507 0.0000000000
W -1.6232096923 0.0000000000
W -0.1797107407 0.0000000000

E 146.5031967184 0.0000000000 11.16
V 0.2265837636 0.0000000000
V 0.3474697082 0.0000000000
V -0.8468727535 0.0000000000
N 0.2402770238 0.0000000000
N 0.3684685345 0.0000000000
N -0.8980522763 0.0000000000
W 2.9082771721 0.0000000000
W 4.4598880523 0.0000000000
W -10.8698904856 0.0000000000

# 'all-in-one method' using r.covar+m.eigensystem:
G65> (echo 3; r.covar spot.ms.1,spot.ms.2,spot.ms.3 ) | m.eigensystem

Then, using the W vector, new maps are created:
r.mapcalc 'pc.1 = 21.2395*map.1 + 22.1470*map.2 + 14.7696*map.3'
r.mapcalc 'pc.2 = 2.9083*map.1 + 4.4599*map.2 - 10.8699*map.3'
r.mapcalc 'pc.3 = 1.8175*map.1 - 1.6232*map.2 \- 0.1797*map.3'

(is "\-" above a typo or some old mapcalc syntax?)

Probably a typo.

which look highly similar (but not identical) to i.pca output maps.

It's ok. R print's out lot's of decimals. Perhaps the difference is due
to some rounding of the numbers.

(after 'r.colors color=grey')

# 'automatic method'
imagery60:G6.5svn> i.pca in=spot.ms.1,spot.ms.2,spot.ms.3 out=spot_pca

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45

--
This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

This message has been scanned for viruses and dangerous content by MailScanner,
and is believed to be clean. MailScanner thanks Transtec Computers for their support.

On Mon, 2009-03-02 at 16:33 +0200, Wesley Roberts wrote:

I would like to check out the new i.pca module described below in the
svn repository. Where can I find devbr6 (6.5svn)? So far I have only
been able to find 6.4svn.

Many thanks,
Wesley

Wesley,

you can grab the grass6_devel source code (now ="6.5") for example with
the following command:
svn checkout https://svn.osgeo.org/grass/grass/branches/develbranch_6
grass6_devel

Kind regards, Nikos