[GRASS-dev] [GRASS GIS] #1668: r.regression.line F-test incorrect

#1668: r.regression.line F-test incorrect
-------------------------+--------------------------------------------------
Reporter: cmbarton | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Keywords: | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
One of my students noticed that the "F-test" in r.regression.line does not
seem to be calculating F, but instead calculating -(R squared).

For example, from the Spearfish demo data,

{{{

r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT
y = a + b*x
    a (Offset): -16.675093
    b (Gain): 0.020833
    R (sumXY - sumX*sumY/N): 0.481666
    N (Number of elements): 2611107
    F (F-test significance): -0.232002
    meanX (Mean of map1): 1353.724982
    sdX (Standard deviation of map1): 176.754565
    meanY (Mean of map2): 11.527723
    sdY (Standard deviation of map2): 7.645157

}}}

0.481666^2 = 0.232002

I haven't checked, but this probably affects all versions of GRASS.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668>
GRASS GIS <http://grass.osgeo.org>

Comment(by mlennert):

Replying to [ticket:1668 cmbarton]:
> One of my students noticed that the "F-test" in r.regression.line does
> not seem to be calculating F, but instead calculating -(R squared).
>
> For example, from the Spearfish demo data,
>
> {{{
>
> r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT
> y = a + b*x
> a (Offset): -16.675093
> b (Gain): 0.020833
> R (sumXY - sumX*sumY/N): 0.481666
> N (Number of elements): 2611107
> F (F-test significance): -0.232002
> meanX (Mean of map1): 1353.724982
> sdX (Standard deviation of map1): 176.754565
> meanY (Mean of map2): 11.527723
> sdY (Standard deviation of map2): 7.645157
>
> }}}
>
> 0.481666^2 = 0.232002

The values only seem the same because of rounding.

However, the formula for calculating the statistic does not seem correct
in the code. IIUC, instead of

{{{
F = R * R / (1 - R * R / count - 2);
}}}

I think it should be

{{{
F = R * R / (1 - R * R) / (count - 2);
}}}

but this should be checked by a statistician.
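
To see why the original expression comes out so close to -(R squared), it helps to spell out how C parses it. The sketch below is only an illustration: the function names are hypothetical and are not taken from the r.regression.line source, and the inputs suggested in the note are the rounded R and N from the Spearfish output quoted above.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: the expression as written in the ticket.
 * Division binds tighter than subtraction, so the denominator is
 * (1 - (R*R)/count) - 2, which is close to -1 for large count. */
double f_as_written(double R, double count)
{
    return R * R / (1 - R * R / count - 2);
}

/* The same expression with the implicit grouping made explicit. */
double f_explicit_grouping(double R, double count)
{
    return (R * R) / ((1.0 - (R * R) / count) - 2.0);
}
```

With R = 0.481666 and count = 2611107, both functions return about -0.232002, which differs from -(R*R) only around the eighth decimal place.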

Moritz

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:1>
GRASS GIS <http://grass.osgeo.org>

Comment(by cmbarton):

These DO give VERY different values. For slope vs. elevation in my
Spearfish example, the current equation gives a value of -0.2320019531,
while the revised equation gives a value of 1.02410689008565E-006.

http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm
defines
F = SumSq Regression /( SumSq Errors / (count - 2))
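
The sums-of-squares definition can be related to the correlation coefficient as follows. This is the standard identity for simple linear regression with an intercept, written with assumed symbols that do not appear in the ticket: $SS_R$ for the regression sum of squares, $SS_E$ for the error sum of squares, $SS_T = SS_R + SS_E$ for the total, and $n$ for `count`.

```latex
\begin{align}
R^2 &= \frac{SS_R}{SS_T}, \qquad 1 - R^2 = \frac{SS_E}{SS_T} \\
F &= \frac{SS_R}{SS_E / (n - 2)}
   = \frac{SS_R / SS_T}{(SS_E / SS_T) / (n - 2)}
   = \frac{R^2}{(1 - R^2) / (n - 2)}
\end{align}
```

Dividing numerator and denominator by $SS_T$ leaves F unchanged, which is why it can be computed from R and the count alone.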

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:2>
GRASS GIS <http://grass.osgeo.org>

Comment(by mlennert):

Replying to [comment:2 cmbarton]:
> These DO give VERY different values. For slope vs. elevation in my
> Spearfish example, the current equation gives a value of -0.2320019531,
> while the revised equation gives a value of 1.02410689008565E-006.
>
> http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm
> defines
> F = SumSq Regression /( SumSq Errors / (count - 2))

Yup, sorry, in my proposal another set of parentheses was missing. It
should be:

F = R * R / ((1 - R * R) / (count - 2));

where

R*R = SumSq Regression
1-R*R = SumSq Errors

which in your example gives you a value for F = 788781, i.e. a probability
that there is a relationship so close to one that most software will
probably just give you 1 after rounding.

Again, I can commit this, but would like to have someone more versed in
statistics confirm.
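
As a rough check of the figure above, the corrected expression and the earlier proposal without the extra parentheses can be evaluated side by side. This is a minimal sketch with hypothetical function names, using the rounded R and N from the Spearfish output, so the last digits may differ from what the module itself would print.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: F for simple linear regression,
 * F = R^2 / ((1 - R^2) / (count - 2)). */
double f_corrected(double R, double count)
{
    return R * R / ((1 - R * R) / (count - 2));
}

/* Hypothetical helper: the earlier proposal. Without the outer
 * parentheses the two divisions associate left to right, so the
 * result is divided by (count - 2) instead of multiplied by it. */
double f_missing_parens(double R, double count)
{
    return R * R / (1 - R * R) / (count - 2);
}
```

Calling `f_corrected(0.481666, 2611107)` gives roughly 788781, while `f_missing_parens` with the same arguments gives a very small positive value.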

Moritz

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:3>
GRASS GIS <http://grass.osgeo.org>

Comment(by mmetz):

Replying to [comment:3 mlennert]:
> Replying to [comment:2 cmbarton]:
> > These DO give VERY different values. For slope vs. elevation in my
> > Spearfish example, the current equation gives a value of -0.2320019531,
> > while the revised equation gives a value of 1.02410689008565E-006.
> >
> > http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm
> > defines
> > F = SumSq Regression /( SumSq Errors / (count - 2))
>
> Yup, sorry, in my proposal another set of parentheses was missing. It
> should be:
>
> F = R * R / ((1 - R * R) / (count - 2));
>
> where
>
> R*R = SumSq Regression
> 1-R*R = SumSq Errors
>
> which in your example gives you a value for F = 788781, i.e. a
> probability that there is a relationship so close to one that most
> software will probably just give you 1 after rounding.
>
> Again, I can commit this, but would like to have someone more versed in
> statistics confirm.

You can compare to the grass7 addon r.regression.multi whose results are
identical to those of R.

Markus M

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:4>
GRASS GIS <http://grass.osgeo.org>

Comment(by mlennert):

Replying to [comment:4 mmetz]:
> You can compare to the grass7 addon r.regression.multi whose results are
> identical to those of R.

This is what I did, and it is actually r.regression.line with the above
change, not r.regression.multi, that gives the same result as R (regressing
Landsat bands 10 and 20 from the NC data set):

F (R): 6268922
F (r.regression.line with change): 6268922.212939
F (r.regression.multi): 6268947.256273

I think I've found the problem in r.regression.multi:

Instead of

{{{
F = ((SStot - SSerr) * (count - n_predictors)) / (SSerr * n_predictors);
}}}

it should be

{{{
F = ((SStot - SSerr) * (count - n_predictors - 1)) / (SSerr * n_predictors);
}}}
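
The degrees-of-freedom change can be checked against the single-predictor case. The sketch below is an illustration under stated assumptions, not the r.regression.multi code: `global_f` and `simple_f` are hypothetical functions, and the check relies on the fact that with one predictor the residual degrees of freedom are count - 2, matching the r.regression.line formula.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: global F for a regression with an intercept
 * and n_predictors explanatory variables. The residual degrees of
 * freedom are count - n_predictors - 1 because the intercept is
 * also estimated. */
double global_f(double SStot, double SSerr, double count, double n_predictors)
{
    return ((SStot - SSerr) * (count - n_predictors - 1)) /
           (SSerr * n_predictors);
}

/* Hypothetical helper: single-predictor F computed from R,
 * used only for comparison. */
double simple_f(double R, double count)
{
    return R * R / ((1 - R * R) / (count - 2));
}
```

With `SStot = 1`, `SSerr = 1 - R*R` and `n_predictors = 1`, `global_f` should agree with `simple_f` up to floating-point error.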

Moritz

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:5>
GRASS GIS <http://grass.osgeo.org>

Comment(by mlennert):

I've just committed a fix for r.regression.line to trunk, grass65 and
grass64_release. I'll leave it up to Markus to decide whether my fix for
r.regression.multi is the right one.

Moritz

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:6>
GRASS GIS <http://grass.osgeo.org>

Comment(by mmetz):

Replying to [comment:6 mlennert]:
> I've just committed a fix for r.regression.line to trunk, grass65 and
> grass64_release. I'll leave it up to Markus to decide whether my fix for
> r.regression.multi is the right one.

Your fix for r.regression.multi seems correct. Apparently I only
validated the F values for the individual predictors against R, not the
global F. Fixed in r51906.

Markus M

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:7>
GRASS GIS <http://grass.osgeo.org>

Comment(by cmbarton):

Does this mean that it is fixed in GRASS 7 too?

Michael

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:8>
GRASS GIS <http://grass.osgeo.org>

#1668: r.regression.line F-test incorrect
--------------------------+-------------------------------------------------
  Reporter: cmbarton | Owner: grass-dev@…
      Type: defect | Status: closed
  Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Resolution: fixed | Keywords:
  Platform: Unspecified | Cpu: Unspecified
--------------------------+-------------------------------------------------
Changes (by mlennert):

  * status: new => closed
  * resolution: => fixed

Comment:

Replying to [comment:8 cmbarton]:
> Does this mean that it is fixed in GRASS 7 too?

Yes, closing the bug.

Moritz

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:9>
GRASS GIS <http://grass.osgeo.org>

Changes (by cmbarton):

* cc: cmbarton (added)

Comment:

Thanks much

Michael

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1668#comment:10>
GRASS GIS <http://grass.osgeo.org>