Hi,

I wonder if this change is reasonable to make the script

working with FP maps:

diff -u -r1.17 scripts/r.regression.line

--- r.regression.line 3 Dec 2007 22:52:43 -0000 1.17

+++ r.regression.line 3 Dec 2007 22:54:09 -0000

@@ -90,7 +90,7 @@

#calculate regression equation

-r.stats -c input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

+r.stats -cAN input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

awk '{tot += $3;sumX +=$1 * $3; sumsqX +=$1*$1*$3;sumY +=$2 * $3; sumsqY +=$2*$2*$3;\

sumXY +=$1*$2*$3;\

}\

changes:

-n to suppress no data

-A to avoid range output for FP maps

?

(Un)related question:

Would it be a big mess to implement r.regression.line a C module?

Markus

Markus Neteler wrote:

Hi,

I wonder if this change is reasonable to make the script

working with FP maps:

diff -u -r1.17 scripts/r.regression.line

--- r.regression.line 3 Dec 2007 22:52:43 -0000 1.17

+++ r.regression.line 3 Dec 2007 22:54:09 -0000

@@ -90,7 +90,7 @@

#calculate regression equation

-r.stats -c input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

+r.stats -cAN input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

awk '{tot += $3;sumX +=$1 * $3; sumsqX +=$1*$1*$3;sumY +=$2 * $3; sumsqY

+=$2*$2*$3;\

sumXY +=$1*$2*$3;\

}\

changes:

-n to suppress no data

-A to avoid range output for FP maps

?

Committed to CVS.

(Un)related question:

Would it be a big mess to implement r.regression.line a C module?

Markus

--

View this message in context: http://www.nabble.com/r.regression.line-fix-proposal-tf4939973.html#a14158969

Sent from the Grass - Dev mailing list archive at Nabble.com.

On Tuesday 04 December 2007, Markus Neteler wrote:

Markus Neteler wrote:

> Hi,

>

> I wonder if this change is reasonable to make the script

> working with FP maps:

>

> diff -u -r1.17 scripts/r.regression.line

> --- r.regression.line 3 Dec 2007 22:52:43 -0000 1.17

> +++ r.regression.line 3 Dec 2007 22:54:09 -0000

> @@ -90,7 +90,7 @@

>

>

> #calculate regression equation

> -r.stats -c input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

> +r.stats -cAN input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

> awk '{tot += $3;sumX +=$1 * $3; sumsqX +=$1*$1*$3;sumY +=$2 * $3; sumsqY

> +=$2*$2*$3;\

> sumXY +=$1*$2*$3;\

> }\

>

>

> changes:

> -n to suppress no data

> -A to avoid range output for FP maps

> ?

Committed to CVS.

(Un)related question:

Would it be a big mess to implement r.regression.line a C module?

Markus

If written as a C module, could it take advantage of any stats library

functions lying around ?

Dylan

--

Dylan Beaudette

Soil Resource Laboratory

http://casoilresource.lawr.ucdavis.edu/

University of California at Davis

530.754.7341

Dylan Beaudette wrote:

> Would it be a big mess to implement r.regression.line a C module?

>

> Markus

If written as a C module, could it take advantage of any stats library

functions lying around ?

The aggregate functions in lib/stats require that the entire sample is

held in memory. This makes them unsuitable for computing an aggregate

over a substantial proportion of a map's values.

For r.regression.line, r.univar, r.statistics, etc, you need to use an

incremental approach. For some aggregates (count, sum, mean), this is

relatively straightforward.

For variance and deviation, there's the issue of a one-pass or

two-pass algorithm. A two-pass approach (calculating the mean on the

first pass) is more accurate, but requires two passes (which rules out

reading data from a pipe).

For quantiles, you don't want to sort vast amounts of data at

O(n.log(n)) complexity just to obtain specific quantiles. It's more

efficient to compute successive histograms, refining the interval(s)

containing the desired quantile(s) on each pass, and only sorting once

you've reduced the data to a manageable size. This could require

several passes, depending upon the amount of data, the amount of

memory available, and the distribution of the data.

--

Glynn Clements <glynn@gclements.plus.com>

I have added a new flag to r.regression.line since the current

approach wasn't very precise (due to r.stats -c which isn't perfect

for FP maps).

#calculate regression equation

-r.stats -cnA input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

+if [ $GIS_FLAG_S -eq 1 ] ; then

+ # slower but accurate

+ r.stats -n1 input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 | sed 's+$+ 1+g' > "$TMP"

+else

+ # count "identical" pixels

+ r.stats -cnA input=$GIS_OPT_MAP1,$GIS_OPT_MAP2 > "$TMP"

+fi

+

The flag -s select the slower method which writes out all

pixel values individually to the temporary file. The result is then

identical to that obtained from R-stats's lm() function.

Markus

--

View this message in context: http://www.nabble.com/r.regression.line-fix-proposal-tp14141009p15124457.html

Sent from the Grass - Dev mailing list archive at Nabble.com.