[GRASS-user] Calculating eigen values and % varianceexplainedafter PCA analysis

Dear Nikos and list,

I downloaded and installed GRASS 6.4 and after much "wailing and gnashing of teeth" I got m.eigensystem to work. Below are some comments and questions.

Over the last couple of days I have been running PCA analyses using the i.pca and r.covar -> m.eigensystem -> r.mapcalc. The analysis seeks to create a component surface where tree crowns are separated from understory and ground in a plantation forest. Inputs are three digital aerial photographs (red, green, blue), a top of canopy height model, and an intensity surface derived from lidar return intensity measures. Output from the PCA will be input into a tree couting method which (if all goes well) will use mathematical morphology to isolate tree crowns for counting purposes

My results are interesting and worth mentioning to the list. Firstly, the results from both the automated (i.pca) and the 'by-hand-method' (r.covar -> m.eigensystem -> r.mapcalc) differ. For example; the eigen values from the automated approach are as follows

(-0.50 -0.53 -0.49 -0.47 -0.08)

(-0.38 -0.30 -0.13 0.86 0.11)

(-0.34 -0.35 0.86 -0.14 0.05)

(0.70 -0.71 -0.01 0.06 0.03)

(0.00 -0.03 0.07 0.13 -0.99)

while the eigen values from the 'by-hand-method' are completely different, in fact I am a little confused with regards to the ouput from i.pca and the m.eigensystem.

i.pca returns the n number of components plus the eigen values for each component (or are those vectors?). Would it be fair in saying that these are the coefficients which have been applied to the input imagery to attain the output components (in the same way the m.eigensystem works with r.mapcalc)? Output from the m.eigensystem approach only gives one eigen value per component (see below). Are the above values from i.pca not the eigen vectors? If this is the case then both methods still differ significantly. Is this possible, and which should I use.

Qualitatively, the 'by-hand-method' seems to isolate the crowns very nicely in PC1 while the automated (i.pca) approach isolates crowns in PC3?? I rescaled the output in the i.pca method, would this contribute to the differences seen?

I am going to run more tests on the rest of my data and will see if these issues arise again. In the meantime if anyone of the list can offer some insight into the two different pca analysis examples I would greatly appreciate it.

Many thanks and kind regards,
Wesley

PC1
E 3759.8207866502 0.0000000000 89.81
V -0.7126171230 0.0000000000
V -0.7014527001 0.0000000000
V -0.5479326672 0.0000000000
V -0.0063691433 0.0000000000
V 0.0003086030 0.0000000000
N -0.6249753328 0.0000000000
N -0.6151839755 0.0000000000
N -0.4805447273 0.0000000000
N -0.0055858291 0.0000000000
N 0.0002706492 0.0000000000
W -38.3218484280 0.0000000000
W -37.7214680719 0.0000000000
W -29.4657424614 0.0000000000
W -0.3425083898 0.0000000000
W 0.0165955013 0.0000000000
PC3
E 83.1252237615 0.0000000000 1.99
V 0.2494768845 0.0000000000
V 0.2555418427 0.0000000000
V -0.6518350827 0.0000000000
V 0.0206244955 0.0000000000
V 0.0053148479 0.0000000000
N 0.3355163716 0.0000000000
N 0.3436730101 0.0000000000
N -0.8766397026 0.0000000000
N 0.0277374632 0.0000000000
N 0.0071478305 0.0000000000
W 3.0590046010 0.0000000000
W 3.1333711503 0.0000000000
W -7.9925902599 0.0000000000
W 0.2528908714 0.0000000000
W 0.0651689407 0.0000000000
PC4
E 19.5501569351 0.0000000000 0.47
V 0.7054176684 0.0000000000
V -0.7117116152 0.0000000000
V -0.0069731939 0.0000000000
V 0.0578916991 0.0000000000
V 0.0277215748 0.0000000000
N 0.7025026853 0.0000000000
N -0.7087706239 0.0000000000
N -0.0069443787 0.0000000000
N 0.0576524745 0.0000000000
N 0.0276070215 0.0000000000
W 3.1061549216 0.0000000000
W -3.1338689624 0.0000000000
W -0.0307049588 0.0000000000
W 0.2549136409 0.0000000000
W 0.1220659899 0.0000000000
PC2
E 322.3045348724 0.0000000000 7.70
V -0.0625878598 0.0000000000
V 0.0327193611 0.0000000000
V 0.0258859244 0.0000000000
V 1.1718777756 0.0000000000
V -0.0080859959 0.0000000000
N -0.0532972371 0.0000000000
N 0.0278624569 0.0000000000
N 0.0220433843 0.0000000000
N 0.9979227246 0.0000000000
N -0.0068857002 0.0000000000
W -0.9568368746 0.0000000000
W 0.5002102855 0.0000000000
W 0.3957413952 0.0000000000
W 17.9155489758 0.0000000000
W -0.1236178876 0.0000000000
PC5
E 1.6296377808 0.0000000000 0.04
V -0.0220049147 0.0000000000
V 0.0174806810 0.0000000000
V 0.0067441814 0.0000000000
V 0.0050864663 0.0000000000
V 0.9998141827 0.0000000000
N -0.0219995304 0.0000000000
N 0.0174764038 0.0000000000
N 0.0067425312 0.0000000000
N 0.0050852217 0.0000000000
N 0.9995695447 0.0000000000
W -0.0280839993 0.0000000000
W 0.0223098995 0.0000000000
W 0.0086073311 0.0000000000
W 0.0064916550 0.0000000000
W 1.2760231622 0.0000000000

Wesley Roberts MSc.
Researcher: Earth Observation (Ecosystems)
Natural Resources and the Environment
CSIR
Tel: +27 (21) 888-2490
Fax: +27 (21) 888-2693

"To know the road ahead, ask those coming back."
- Chinese proverb

Nikos Alexandris <nikos.alexandris@felis.uni-freiburg.de> 02/25/09 3:47 PM >>>

Nikos:

If you have success running m.eigensystem, can I please ask you to let
the list (and me :-p) know about it? I compiled m.eigensystem without
errors but it doesn't seem to work within from grass65. I am currently
re-compiling grass6_devel to see whether it will work or not.

Or do I need to work with grass6.4?

# Answering to myself :-p

Yes, it works. I was confused by the fact that nomatter what argument is
given to m.eigensystem it does not print short description/help
messages. Instead it runs. Of course "man m.eigensystem" lets you read
the manual.

So here is an(other) example:

        # get covariance matrix
        r.covar
        map=MOD2006_239_500_sur_refl_b02,MOD2006_239_500_sur_refl_b06,MOD2006_239_500_sur_refl_b07,MOD2007_242_500_sur_refl_b02,MOD2007_242_500_sur_refl_b06,MOD2007_242_500_sur_refl_b07
        
        r.covar: complete ...
         100%
        230737.894890 195532.919270 135003.507092 160548.524865
        157011.283237 109163.152597
        195532.919270 370156.630631 302611.016829 169840.571354
        329762.496795 255222.040018
        135003.507092 302611.016829 269579.148609 123245.319257
        267537.923455 218257.721900
        160548.524865 169840.571354 123245.319257 283021.657148
        227212.808304 97810.797191
        157011.283237 329762.496795 267537.923455 227212.808304
        411853.700150 306427.271345
        109163.152597 255222.040018 218257.721900 97810.797191
        306427.271345 287565.577718
        
        # output in a file
        r.covar
        map=MOD2006_239_500_sur_refl_b02,MOD2006_239_500_sur_refl_b06,MOD2006_239_500_sur_refl_b07,MOD2007_242_500_sur_refl_b02,MOD2007_242_500_sur_refl_b06,MOD2007_242_500_sur_refl_b07 > test_m.eigensystem
        
        # edit the file and add a number K which corresponds to the
        number of rows and columns of the cov matrix
        # in my example: add the number "6" in the top line of the
        test_m.eigensystem file
        
        # run m.eigensystem
        m.eigensystem < test_m.eigensystem
        
        E 1384795.0108207434 .0000000000 74.74
        V .4835516921 .0000000000
        V .8488469532 .0000000000
        V .6938880512 .0000000000
        V .5233792650 .0000000000
        V .8927982533 .0000000000
        V .6795723454 .0000000000
        N .2806476574 .0000000000
        N .4926606871 .0000000000
        N .4027243813 .0000000000
        N .3037631075 .0000000000
        N .5181694995 .0000000000
        N .3944157157 .0000000000
        W 330.2586235389 .0000000000
        W 579.7498611168 .0000000000
        W 473.9152326520 .0000000000
        W 357.4602642457 .0000000000
        W 609.7679463257 .0000000000
        W 464.1378182325 .0000000000
        E 241514.2792070895 .0000000000 13.03
        
        [...]
        
--
This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

This message has been scanned for viruses and dangerous content by MailScanner,
and is believed to be clean. MailScanner thanks Transtec Computers for their support.

Wesley

I downloaded and installed GRASS 6.4 and after much "wailing and
gnashing of teeth" I got m.eigensystem to work. Below are some
comments and questions.

Nice that it worked-out finally. Hopefully my comments are useful for
you (and correct). You can have a look in the following links
[1][2][3][4].

Over the last couple of days I have been running PCA analyses using
the i.pca and r.covar -> m.eigensystem -> r.mapcalc. The analysis
seeks to create a component surface where tree crowns are separated
from understory and ground in a plantation forest. Inputs are three
digital aerial photographs (red, green, blue), a top of canopy height
model, and an intensity surface derived from lidar return intensity
measures. Output from the PCA will be input into a tree couting method
which (if all goes well) will use mathematical morphology to isolate
tree crowns for counting purposes

Interesting stuff!

My results are interesting and worth mentioning to the list. Firstly,
the results from both the automated (i.pca) and the
'by-hand-method' (r.covar -> m.eigensystem -> r.mapcalc) differ. For
example; the eigen values from the automated approach are as follows

(-0.50 -0.53 -0.49 -0.47 -0.08)
(-0.38 -0.30 -0.13 0.86 0.11)
(-0.34 -0.35 0.86 -0.14 0.05)
(0.70 -0.71 -0.01 0.06 0.03)
(0.00 -0.03 0.07 0.13 -0.99)

while the eigen values from the 'by-hand-method' are completely
different, in fact I am a little confused with regards to the ouput
from i.pca and the m.eigensystem. i.pca returns the n number of
components plus the eigen values for each component (or are those
vectors?).

Yes, those are the eigen_VECTORS_(=loadings, on other words the amount
of information that contribute each of the original dimensions in the
resulting components). Each row corresponds to one principal components.
In your example above you "know" that the 1st component (1st row) is
composed by the original dimensions (each column) and each original
dimension has "contributed" according to the _loadings_:

So dimenions 1 -> -0.50, dimension 2 -> -0.53 , dimension 3 -> -0.49,
dimension 4 -> -0.47 and dimension 5 -> -0.08

If I understand well the PCA myself, you can disregard the "signs" and
see the loadings as absolute values.

Would it be fair in saying that these are the coefficients which have
been applied to the input imagery to attain the output components (in
the same way the m.eigensystem works with r.mapcalc)?

Yes.

Output from the m.eigensystem approach only gives one eigen value per
component (see below).
Are the above values from i.pca not the eigen vectors?

It should be the case with i.pca as well since eigen_VALUES_ (=represent
the variances of the original dimensions that are "kept" in each
component) are important for the interpretation of what exactly are each
of the components. But, i.pca just does not report the eigen_VALUES_.

At some point some C-expert needs to have a look in the code (i.pca) and
correct the "bug" which does not let the eigen_VALUES_ from being
printed.

If this is the case then both methods still differ significantly. Is
this possible, and which should I use.

Please have a look at my comments/questions in link [2]. i.pca follows
the "SVD" method. You performed the non-standartised PCA using the
covariance matrix. Note that you can use also the standartised method by
using the correlation matrix.

Qualitatively, the 'by-hand-method' seems to isolate the crowns very
nicely in PC1 while the automated (i.pca) approach isolates crowns in
PC3?? I rescaled the output in the i.pca method, would this contribute
to the differences seen?

I am going to run more tests on the rest of my data and will see if
these issues arise again. In the meantime if anyone of the list can
offer some insight into the two different pca analysis examples I
would greatly appreciate it.

I would be happy to hear more. It's a tool I also need.
Kindest regards, Nikos

[...]

---
Links:

# in grass-user mailing list

[1] # In these posts I didn't know much about PCA #
http://n2.nabble.com/i.pca--vs.--r.covar-m.eigensystem-r.mapcalc-td1885820.html#a1885821

[2] # this is the one I have sent you already #
http://n2.nabble.com/Comparison-between-&quot;i\.pca&quot;\-and\-R&#39;s\-&quot;prcomp\(\)&quot;%
3A-explanations-and-questions-td2283997.html#a2284070

# in grass-trac

[3] http://trac.osgeo.org/grass/ticket/341

[4] http://trac.osgeo.org/grass/ticket/430

Nikos wrote:

It should be the case with i.pca as well since eigen_VALUES_ (=represent
the variances of the original dimensions that are "kept" in each
component) are important for the interpretation of what exactly are each
of the components. But, i.pca just does not report the eigen_VALUES_.

At some point some C-expert needs to have a look in the code (i.pca) and
correct the "bug" which does not let the eigen_VALUES_ from being
printed.

done in devbr6 (6.5svn) please test, I'm not a multivariate stats guru
and may have done something dumb so didn't port to other branches yet.

I changed the i.pca output to be like:

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45

As it was previously sent to stderr via G_message() I don't feel bad about
breaking output text compatibility. I wanted to add "%" to the values but
due to the sprintf()+strcat() method in the code that was a pain, so I
didn't.

> If this is the case then both methods still differ significantly. Is
> this possible, and which should I use.

Please have a look at my comments/questions in link [2].
i.pca follows the "SVD" method. You performed the non-standartised PCA
using the covariance matrix. Note that you can use also the
standartised method by using the correlation matrix.

does the r.mapcalc command at the end of the m.eigensystem help page*
do that conversion, or ...? ie how can we test these against each other?
how to do the standardized method?

[*] (is "\-" there a typo or some old mapcalc syntax?)

also ISTR somebody (Dylan?) doing a comparison with the R-stats interface.

It would be nice to run tests using the Spearfish imagery dataset. After
my own tests I noticed it matched what was used in the m.eigensystem help
page.

my results follow.

Hamish

----------------------------
#Spearfish imagery sample dataset
g.region rast=spot.ms.1

# 'by-hand-method'
G65> echo "3" > test_m.eigensystem # number of input maps
G65> r.covar map=spot.ms.1,spot.ms.2,spot.ms.3 >> test_m.eigensystem

G65> cat test_m.eigensystem
3
462.876649 480.411218 281.758307
480.411218 513.015646 278.914813
281.758307 278.914813 336.326645

G65> m.eigensystem < test_m.eigensystem
-----
C The output is N sets of values. One E line and N V W lines
C
C E real imaginary percent-importance
C V real imaginary
C N real imaginary
C W real imaginary
C ...
C
C where E is the eigen value (and it relative importance)
C and V are the eigenvector for this eigenvalue.
C N are the normalized eigenvector for this eigenvalue.
C W are the N vector multiplied by the square root of the
C magnitude of the eigen value (E).
-----

E 1159.7452017844 0.0000000000 88.38
V 0.6910021591 0.0000000000
V 0.7205280412 0.0000000000
V 0.4805108400 0.0000000000
N 0.6236808478 0.0000000000
N 0.6503301526 0.0000000000
N 0.4336967751 0.0000000000
W 21.2394712045 0.0000000000
W 22.1470141296 0.0000000000
W 14.7695575384 0.0000000000

E 5.9705414972 0.0000000000 0.45
V 0.7119385973 0.0000000000
V -0.6358200627 0.0000000000
V -0.0703936743 0.0000000000
N 0.7438340890 0.0000000000
N -0.6643053754 0.0000000000
N -0.0735473745 0.0000000000
W 1.8175356507 0.0000000000
W -1.6232096923 0.0000000000
W -0.1797107407 0.0000000000

E 146.5031967184 0.0000000000 11.16
V 0.2265837636 0.0000000000
V 0.3474697082 0.0000000000
V -0.8468727535 0.0000000000
N 0.2402770238 0.0000000000
N 0.3684685345 0.0000000000
N -0.8980522763 0.0000000000
W 2.9082771721 0.0000000000
W 4.4598880523 0.0000000000
W -10.8698904856 0.0000000000

# 'all-in-one method' using r.covar+m.eigensystem:
G65> (echo 3; r.covar spot.ms.1,spot.ms.2,spot.ms.3 ) | m.eigensystem

Then, using the W vector, new maps are created:
r.mapcalc 'pc.1 = 21.2395*map.1 + 22.1470*map.2 + 14.7696*map.3'
r.mapcalc 'pc.2 = 2.9083*map.1 + 4.4599*map.2 - 10.8699*map.3'
r.mapcalc 'pc.3 = 1.8175*map.1 - 1.6232*map.2 \- 0.1797*map.3'

(is "\-" above a typo or some old mapcalc syntax?)

which look highly similar (but not identical) to i.pca output maps.
(after 'r.colors color=grey')

# 'automatic method'
imagery60:G6.5svn> i.pca in=spot.ms.1,spot.ms.2,spot.ms.3 out=spot_pca

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45

Hamish wrote:

# 'automatic method'
imagery60:G6.5svn> i.pca in=spot.ms.1,spot.ms.2,spot.ms.3 out=spot_pca

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45

changed to:
Eigen values, (vectors), and [percent importance]:
Eigenvalue 1: 1170.12 ( -0.63 -0.65 -0.43 ) [88.07%]
Eigenvalue 2: 152.49 ( 0.23 0.37 -0.90 ) [11.48%]
Eigenvalue 3: 6.01 ( 0.75 -0.66 -0.08 ) [0.45%]

comments welcome.

Hamish

On Sun, 2009-03-01 at 01:59 -0800, Hamish wrote:

Hamish wrote:
> # 'automatic method'
> imagery60:G6.5svn> i.pca in=spot.ms.1,spot.ms.2,spot.ms.3 out=spot_pca
>
> Eigen (vectors) and values:
> PC1 ( -0.63 -0.65 -0.43 ) 88.07
> PC2 ( 0.23 0.37 -0.90 ) 11.48
> PC3 ( 0.75 -0.66 -0.08 ) 0.45

changed to:
Eigen values, (vectors), and [percent importance]:
Eigenvalue 1: 1170.12 ( -0.63 -0.65 -0.43 ) [88.07%]
Eigenvalue 2: 152.49 ( 0.23 0.37 -0.90 ) [11.48%]
Eigenvalue 3: 6.01 ( 0.75 -0.66 -0.08 ) [0.45%]

comments welcome.
Hamish

Congrats after I see that... it works :-p

.

Seriously now: Thank You Hamish!!
Kindest regards, Nikos

P.S. I overdid it with ideas below... :smiley:
---

I am testing... (after the compilation completes). Even before testing I
dould like to drop-in my idea about the output, that is "be more close
to the _standards_ (=e.g. the books present the results)".

   * Present first the variance (=eigenvalues) because it's the first
thing you will look at to know "how much variance of the original data
is _expressed_ in each new component.

   * The importance, since it refers to the eigenvalue, it's better to
come right after it.

   * Present the loadings (eigenvectors) for each new component.

   * Column-wise or row-wise? The results can be either presented
column-wise, that is one column for each new component _or_ row-wise,
as they are currently printed. I think row-wise just looks better :slight_smile:

"Some" examples... (only 2 for column-wise and all the rest row-wise...
playing around).

# column-wise examples ##############################################

  PC1 PC2 PC3
1170.12 152.49 6.01
[88.07%] [11.48%] [0.47%]
-0.63 0.23 0.75
-0.65 0.37 -0.66
-0.43 -0.90 -0.08

or

#...and perhaps naming each row after
# _original feature_ or
# _original variable_ or
# _original image_ or
# _original dimension_ or
# _original input_ ?

Dimensions PCA PC2 PC3
Variance 1170.12 152.49 6.01
Importance [88.07%] [11.48%] [0.47%]
1st input -0.63 0.23 0.75
2nd input -0.65 0.37 -0.66
3rd input -0.43 -0.90 -0.08

or

Dimensions PCA PC2 PC3
Variance 1170.12 152.49 6.01
Importance(%) 88.07 11.48 0.47
1st input -0.63 0.23 0.75
2nd input -0.65 0.37 -0.66
3rd input -0.43 -0.90 -0.08

or

[...]

# row-wise examples ##############################################

Eigenvalues, [importance] and (eigenvectors)
PC1 1170.12 [88.07%] ( -0.63 -0.65 -0.43 )
PC2 152.49 [11.48%] ( 0.23 0.37 -0.90 )
PC3 6.01 [0.45%] ( 0.75 -0.66 -0.08 )

or

Eigenvalues, importance and (eigenvectors)
PC1 1170.12 88.07% ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48% ( 0.23 0.37 -0.90 )
PC3 6.01 0.45% ( 0.75 -0.66 -0.08 )

or

    Eigenvalues Importance Eigenvectors
PC1 1170.12 88.07% ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48% ( 0.23 0.37 -0.90 )
PC3 6.01 0.45% ( 0.75 -0.66 -0.08 )

or

    Eigenvalues Importance Eigenvectors
PC1 1170.12 88.07% ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48% ( 0.23 0.37 -0.90 )
PC3 6.01 0.45% ( 0.75 -0.66 -0.08 )

or

     Eigenvalues % Eigenvectors
PC1 1170.12 88.07 ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48 ( 0.23 0.37 -0.90 )
PC3 6.01 0.45 ( 0.75 -0.66 -0.08 )

or

     Eigenvalues % Eigenvectors
PC1 1170.12 88.07 ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48 ( 0.23 0.37 -0.90 )
PC3 6.01 0.45 ( 0.75 -0.66 -0.08 )

or

     Eigenvalues % Eigenvectors
PC1 1170.12 88.07 ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48 ( 0.23 0.37 -0.90 )
PC3 6.01 0.45 ( 0.75 -0.66 -0.08 )

or

     Eigenvalues [%] Eigenvectors
PC1 1170.12 [88.07] ( -0.63 -0.65 -0.43 )
PC2 152.49 [11.48] ( 0.23 0.37 -0.90 )
PC3 6.01 [0.45] ( 0.75 -0.66 -0.08 )

or

     Eigenvalues [%] Eigenvectors
PC1 1170.12 [88.07] ( -0.63 -0.65 -0.43 )
PC2 152.49 [11.48] ( 0.23 0.37 -0.90 )
PC3 6.01 [0.45] ( 0.75 -0.66 -0.08 )

or

    Variance Variance(%) Eigenvectors
PC1 1170.12 88.07 ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48 ( 0.23 0.37 -0.90 )
PC3 6.01 0.45 ( 0.75 -0.66 -0.08 )

or

        Std % Eigenvectors
PC1 1170.12 88.07 ( -0.63 -0.65 -0.43 )
PC2 152.49 11.48 ( 0.23 0.37 -0.90 )
PC3 6.01 0.45 ( 0.75 -0.66 -0.08 )

Nikos:

[...] i.pca just does not report the eigen_VALUES_. At some point some
C-expert needs to have a look in the code (i.pca) and correct the
"bug" which does not let the eigen_VALUES_ from being printed.

Hamish:

done in devbr6 (6.5svn) please test, I'm not a multivariate stats guru
and may have done something dumb so didn't port to other branches yet.

First test:

[...]
Following modules are missing the 'description.html' file in src code:
----------------------------------------------------------------------
GRASS GIS compilation log
-------------------------
Started compilation: Sun Mar 1 11:41:04 CET 2009
--
Errors in:
/usr/local/src/grass6_devel/imagery/i.pca
--
In case of errors please change into the directory with error and run 'make'.
If you get multiple errors, you need to deal with them in the order they
appear in the error log. If you get an error building a library, you will
also get errors from anything which uses the library.
--
Finished compilation: Sun Mar 1 11:46:34 CET 2009
make: *** [default] Error 1
nik@vertical:/usr/local/src/grass6_devel$ cd imagery/i.pca/
nik@vertical:/usr/local/src/grass6_devel/imagery/i.pca$ make
gcc -I/usr/local/src/grass6_devel/dist.x86_64-unknown-linux-gnu/include -g -Wall -DPACKAGE=\""grassmods"\" -I/usr/local/src/grass6_devel/dist.x86_64-unknown-linux-gnu/include -o OBJ.x86_64-unknown-linux-gnu/support.o -c support.c
support.c: In function ‘write_history’:
support.c:43: error: expected expression before ‘<<’ token
support.c:46: error: expected expression before ‘==’ token
support.c:49: error: expected expression before ‘>>’ token
support.c:67: warning: format not a string literal and no format arguments
make: *** [OBJ.x86_64-unknown-linux-gnu/support.o] Error 1

# Am I doing something wrong?

Hi,

2009/3/1 Nikos Alexandris <nikos.alexandris@felis.uni-freiburg.de>:

gcc -I/usr/local/src/grass6_devel/dist.x86_64-unknown-linux-gnu/include -g -Wall -DPACKAGE=\""grassmods"\" -I/usr/local/src/grass6_devel/dist.x86_64-unknown-linux-gnu/include -o OBJ.x86_64-unknown-linux-gnu/support.o -c support.c
support.c: In function ‘write_history’:
support.c:43: error: expected expression before ‘<<’ token
support.c:46: error: expected expression before ‘==’ token
support.c:49: error: expected expression before ‘>>’ token
support.c:67: warning: format not a string literal and no format arguments
make: *** [OBJ.x86_64-unknown-linux-gnu/support.o] Error 1

simply, your local copy of support.c is corrupted (in conflict). Run

svn revert . -R

in imagery/i.pca to revert your local changes.

M.

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

On Sun, 2009-03-01 at 12:46 +0100, Nikos Alexandris wrote:

Nikos:
> [...] i.pca just does not report the eigen_VALUES_. At some point some
> C-expert needs to have a look in the code (i.pca) and correct the
> "bug" which does not let the eigen_VALUES_ from being printed.

Hamish:
> done in devbr6 (6.5svn) please test, I'm not a multivariate stats guru
> and may have done something dumb so didn't port to other branches yet.

First test:

[...]

Errors in:
/usr/local/src/grass6_devel/imagery/i.pca

[...]

Finished compilation: Sun Mar 1 11:46:34 CET 2009
make: *** [default] Error 1
nik@vertical:/usr/local/src/grass6_devel$ cd imagery/i.pca/
nik@vertical:/usr/local/src/grass6_devel/imagery/i.pca$ make
gcc -I/usr/local/src/grass6_devel/dist.x86_64-unknown-linux-gnu/include -g -Wall -DPACKAGE=\""grassmods"\" -I/usr/local/src/grass6_devel/dist.x86_64-unknown-linux-gnu/include -o OBJ.x86_64-unknown-linux-gnu/support.o -c support.c
support.c: In function ‘write_history’:
support.c:43: error: expected expression before ‘<<’ token
support.c:46: error: expected expression before ‘==’ token
support.c:49: error: expected expression before ‘>>’ token
support.c:67: warning: format not a string literal and no format arguments
make: *** [OBJ.x86_64-unknown-linux-gnu/support.o] Error 1

OK, I removed the lines below, compiled again and now it works.
Perfect!!!

43 +<<<<<<< .mine
44 if (!to_std)
45 - G_message(_("Eigen values:"));

47 +=======

Nikos:

> It should be the case with i.pca as well since eigen_VALUES_ (=represent
> the variances of the original dimensions that are "kept" in each
> component) are important for the interpretation of what exactly are each
> of the components. But, i.pca just does not report the eigen_VALUES_.
>
> At some point some C-expert needs to have a look in the code (i.pca) and
> correct the "bug" which does not let the eigen_VALUES_ from being
> printed.

Hamish:

done in devbr6 (6.5svn) please test, I'm not a multivariate stats guru
and may have done something dumb so didn't port to other branches yet.

I changed the i.pca output to be like:

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45

As it was previously sent to stderr via G_message() I don't feel bad about
breaking output text compatibility. I wanted to add "%" to the values but
due to the sprintf()+strcat() method in the code that was a pain, so I
didn't.

Wesley:

> > If this is the case then both methods still differ significantly. Is
> > this possible, and which should I use.

> Please have a look at my comments/questions in link [2].
> i.pca follows the "SVD" method. You performed the non-standartised PCA
> using the covariance matrix. Note that you can use also the
> standartised method by using the correlation matrix.

does the r.mapcalc command at the end of the m.eigensystem help page*
do that conversion, or ...? ie how can we test these against each other?
how to do the standardized method?

We can easily cross-compare now. I will run my test in GRASS and R.
In GRASS:
* i.pca --> SVD without data centering method

* "r.covar" + "m.eigensystem" --> non-standardised PCA (about data
centering I am unsure here, I will investigate the _numbers_)

* "r.covar -r" + "m.eigensystem" # note the "-r" flag --> standardised
PCA (about data centering I am unsure, willing to test).

[*] (is "\-" there a typo or some old mapcalc syntax?)

IMHO: yes. Maybe it's some forgotten _backslash_ !?

also ISTR somebody (Dylan?) doing a comparison with the R-stats interface.

It would be nice to run tests using the Spearfish imagery dataset. After
my own tests I noticed it matched what was used in the m.eigensystem help
page.

I will probably run the test with my own data. Since I just need to
copy-paste the commands. I don't have the spearfish dataset currently.

my results follow.

Hamish

----------------------------
#Spearfish imagery sample dataset
g.region rast=spot.ms.1

# 'by-hand-method'
G65> echo "3" > test_m.eigensystem # number of input maps
G65> r.covar map=spot.ms.1,spot.ms.2,spot.ms.3 >> test_m.eigensystem

G65> cat test_m.eigensystem
3
462.876649 480.411218 281.758307
480.411218 513.015646 278.914813
281.758307 278.914813 336.326645

G65> m.eigensystem < test_m.eigensystem
-----
C The output is N sets of values. One E line and N V W lines
C
C E real imaginary percent-importance
C V real imaginary
C N real imaginary
C W real imaginary
C ...
C
C where E is the eigen value (and it relative importance)
C and V are the eigenvector for this eigenvalue.
C N are the normalized eigenvector for this eigenvalue.
C W are the N vector multiplied by the square root of the
C magnitude of the eigen value (E).
-----

E 1159.7452017844 0.0000000000 88.38
V 0.6910021591 0.0000000000
V 0.7205280412 0.0000000000
V 0.4805108400 0.0000000000
N 0.6236808478 0.0000000000
N 0.6503301526 0.0000000000
N 0.4336967751 0.0000000000
W 21.2394712045 0.0000000000
W 22.1470141296 0.0000000000
W 14.7695575384 0.0000000000

E 5.9705414972 0.0000000000 0.45
V 0.7119385973 0.0000000000
V -0.6358200627 0.0000000000
V -0.0703936743 0.0000000000
N 0.7438340890 0.0000000000
N -0.6643053754 0.0000000000
N -0.0735473745 0.0000000000
W 1.8175356507 0.0000000000
W -1.6232096923 0.0000000000
W -0.1797107407 0.0000000000

E 146.5031967184 0.0000000000 11.16
V 0.2265837636 0.0000000000
V 0.3474697082 0.0000000000
V -0.8468727535 0.0000000000
N 0.2402770238 0.0000000000
N 0.3684685345 0.0000000000
N -0.8980522763 0.0000000000
W 2.9082771721 0.0000000000
W 4.4598880523 0.0000000000
W -10.8698904856 0.0000000000

# 'all-in-one method' using r.covar+m.eigensystem:
G65> (echo 3; r.covar spot.ms.1,spot.ms.2,spot.ms.3 ) | m.eigensystem

Then, using the W vector, new maps are created:
r.mapcalc 'pc.1 = 21.2395*map.1 + 22.1470*map.2 + 14.7696*map.3'
r.mapcalc 'pc.2 = 2.9083*map.1 + 4.4599*map.2 - 10.8699*map.3'
r.mapcalc 'pc.3 = 1.8175*map.1 - 1.6232*map.2 \- 0.1797*map.3'

(is "\-" above a typo or some old mapcalc syntax?)

Probably a typo.

which look highly similar (but not identical) to i.pca output maps.

It's ok. R print's out lot's of decimals. Perhaps the difference is due
to some rounding of the numbers.

(after 'r.colors color=grey')

# 'automatic method'
imagery60:G6.5svn> i.pca in=spot.ms.1,spot.ms.2,spot.ms.3 out=spot_pca

Eigen (vectors) and values:
PC1 ( -0.63 -0.65 -0.43 ) 88.07
PC2 ( 0.23 0.37 -0.90 ) 11.48
PC3 ( 0.75 -0.66 -0.08 ) 0.45