[GRASS-dev] r.univar -e

Hi all,

I have added an '-e' flag to r.univar, modelled on r.univar.sh. See the
attached patch. Please look at the code; any comments are welcome
(before committing to CVS, if desired).

Best, Martin

  GRASS 6.3.cvs (spearfish60):~ > r.univar help

Description:
  Calculates univariate statistics from the non-null cells of a raster map.

Keywords:
raster, statistics

Usage:
  r.univar [-qge] map=name [percentile=value]

Flags:
  -q Quiet mode
  -g Print the stats in shell script style
  -e Calculate extended statistics (quartiles and percentile)

Parameters:
         map Name of input raster map
  percentile Percentile to calculate (requires -e flag)
               default: 90

Just a simple test:

* for DCELL:
GRASS 6.3.cvs (spearfish60):~ > g.region rast=elevation.10m;r.univar elevation.10m -e
100%
total null and non-null cells: 2654802
total null cells : 0

Of the non-null cells:
----------------------
Number of cells (excluding NULL cells): 2654802
Minimum : 1061.06
Maximum : 1846.74
Range : 785.679
Arithmetic mean : 1348.37
Arithmetic mean of absolute values : 1348.37
Standard deviation : 175.494
Variance : 30798.3
Variation coefficient : 13.0153 %
Sum : 3579659211.6848597527
1st Quartile : 1196.8
Median (even number of cells) : 1309.37
3st Quartile : 1480.29
90 Percentile : 1613.6

* for FCELL:
GRASS 6.3.cvs (spearfish60):~ > g.region rast=slope;r.univar slope -e
100%
total null and non-null cells: 303052
total null cells : 12929

Of the non-null cells:
----------------------
Number of cells (excluding NULL cells): 290123
Minimum : 0
Maximum : 52.5202
Range : 52.5202
Arithmetic mean : 11.5277
Arithmetic mean of absolute values : 11.5277
Standard deviation : 7.64516
Variance : 58.4484
Variation coefficient : 66.3198 %
Sum : 3344457.5523851216
1st Quartile : 5.38598
Median (odd number of cells) : 9.97027
3st Quartile : 16.3104
90 Percentile : 22.4337

* for CELL:
GRASS 6.3.cvs (spearfish60):~ > g.region rast=roads;r.univar roads -e
100%
total null and non-null cells: 302418
total null cells : 291124

Of the non-null cells:
----------------------
Number of cells (excluding NULL cells): 11294
Minimum : 1
Maximum : 5
Range : 4
Arithmetic mean : 3.8091
Arithmetic mean of absolute values : 3.8091
Standard deviation : 1.29705
Variance : 1.68235
Variation coefficient : 34.0514 %
Sum : 43020
1st Quartile : 3
Median (even number of cells) : 4
3st Quartile : 5
90 Percentile : 5
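
For readers unfamiliar with how quartiles and percentiles are typically derived, the following is a minimal C sketch, not the code from the attached patch. It assumes the nearest-rank definition of a percentile and integer percentile values; the names cmp_double and percentile_nearest_rank are hypothetical.

```c
#include <stdlib.h>

/* Comparison callback for qsort(): orders doubles ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/*
 * Sort values[] in place and return the p-th percentile (1..100).
 * Assumption: nearest-rank definition, rank = ceil(p * n / 100).
 * The real module may round or interpolate differently.
 * n must be greater than zero.
 */
double percentile_nearest_rank(double *values, size_t n, unsigned int p)
{
    size_t rank;

    qsort(values, n, sizeof(double), cmp_double);

    rank = (n * p + 99) / 100;  /* integer form of ceil(n * p / 100) */
    if (rank < 1)
        rank = 1;
    if (rank > n)
        rank = n;

    return values[rank - 1];
}
```

With the five values 5, 1, 4, 2, 3, this definition gives 3 for the 50th percentile and 5 for the 90th. Quartiles are simply the 25th and 75th percentiles under the same rule.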

--
Martin Landa <landa.martin@gmail.com> * http://gama.fsv.cvut.cz/~landa *

(attachments)

r_univar-e.diff.gz (2.91 KB)

Hoorah!

Michael
__________________________________________
Michael Barton, Professor of Anthropology
School of Human Evolution & Social Change
Center for Social Dynamics and Complexity
Arizona State University

phone: 480-965-6213
fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

From: Martin Landa <landa.martin@gmail.com>
Date: Thu, 31 Aug 2006 09:42:00 +0200
To: grass-dev <grass-dev@grass.itc.it>
Subject: [GRASS-dev] r.univar -e

Hi all,

I have added '-e' flag to r.univar according to r.univar.sh. See the
attached patch. Please look at the code, any comments welcomed...
(before committing to CVS - if desired...).

Best, Martin


Martin Landa wrote:

> Hi all,
>
> I have added '-e' flag to r.univar according to r.univar.sh. See the
> attached patch. Please look at the code, any comments welcomed...
> (before committing to CVS - if desired...).

Nice work!

I have merged your patch locally with a few minor changes:
  http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.c
    or if you prefer,
  http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.diff

A few comments/questions before putting it in CVS:

* It is quite a bit faster than r.univar.sh! (I guess this is to be
expected, but it's always nice to see)

* qsort() comparison functions are declared as static int.
a) shouldn't they just be int?
b) could/should these fns be inlined for speed?

* GRASS 5's s.cellstats uses something called qisort() instead of
qsort(), which claims to be faster. Comments from the crowd?
http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/sites/s.cellstats/qisort.c?rev=HEAD&content-type=text/vnd.viewcvs-markup

* Are there any issues with having shell variables (-g flag) which start
with a number?

* I've held off on reformatting the output. Your patch assumes that the
font will be monospaced, while the new GUI(s) may prefer to use a
proportional font.

* fabsf() removed as it's non-POSIX. fabs() used instead.

* Kept "Mean" instead of "Arithmetic mean". Sample, population,
geometric means don't apply in this context, so keep it simple.

TODOs
easy: mean of squares (if anyone needs this just ask & I'll put it in)
harder: mode, skewness, kurtosis (begging required, file a wish)

thanks,
Hamish

Hamish,

2006/9/6, Hamish <hamish_nospam@yahoo.com>:

> I have merged your patch locally with a few minor changes:
>   http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.c
>     or if you prefer,
>   http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.diff
>
> A few comments/questions before putting it in CVS:
>
> * It is quite a bit faster than r.univar.sh! (I guess this is to be
> expected, but it's always nice to see)

It should be:-)

> * qsort() comparison functions are declared as static int.
> a) shouldn't they just be int?
> b) could/should these fns be inlined for speed?

Not sure, ...

> * GRASS 5's s.cellstats uses something called qisort() instead of
> qsort(), which claims to be faster. Comments from the crowd?
> http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/sites/s.cellstats/qisort.c?rev=HEAD&content-type=text/vnd.viewcvs-markup

Hm, I have simply tested

time r.univar elevation.10m -e 2>/dev/null| grep s$

real 0m0.817s
user 0m0.764s
sys 0m0.052s

time r.univar elevation.10m -e 2>/dev/null| grep s$

real 0m0.532s
user 0m0.516s
sys 0m0.016s

Not sure. Should the qisort() function be part of the GRASS library?

> * Are there any issues with having shell variables (-g flag) which start
> with a number?
>
> * I've held off on reformatting the output. Your patch assumes that the
> font will be monospaced, while the new GUI(s) may prefer to use a
> proportional font.
>
> * fabsf() removed as it's non-POSIX. fabs() used instead.
>
> * Kept "Mean" instead of "Arithmetic mean". Sample, population,
> geometric means don't apply in this context, so keep it simple.

Agreed

> TODOs
> easy: mean of squares (if anyone needs this just ask & I'll put it in)
> harder: mode, skewness, kurtosis (begging required, file a wish)

Nice to know:-)

Best, Martin

_______________________________________________
grass-dev mailing list
grass-dev@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass-dev

--
Martin Landa <landa.martin@gmail.com> * http://gama.fsv.cvut.cz/~landa *

On Wed, 2006-09-06 at 17:08 +1200, Hamish wrote:

> Martin Landa wrote:
> > Hi all,
> >
> > I have added '-e' flag to r.univar according to r.univar.sh. See the
> > attached patch. Please look at the code, any comments welcomed...
> > (before committing to CVS - if desired...).
>
> Nice work!
>
> I have merged your patch locally with a few minor changes:
>   http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.c
>     or if you prefer,
>   http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.diff

Great work guys!

> A few comments/questions before putting it in CVS:
>
> * qsort() comparison functions are declared as static int.
> a) shouldn't they just be int?

static is the best declaration. Declaring the function static means it
is "bound" to that file. As long as qsort() is called [only] from that
file, it will work as designed.

> b) could/should these fns be inlined for speed?

Not 100% sure. I would assume that it is legal, but ignored.

> * GRASS 5's s.cellstats uses something called qisort() instead of
> qsort(), which claims to be faster. Comments from the crowd?
> http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/sites/s.cellstats/qisort.c?rev=HEAD&content-type=text/vnd.viewcvs-markup

It seems that there exist a number of replacements for specific
applications (the glibc qsort() is a one-stop shop and not always
"efficient").

Here is another similar example:
http://www.corpit.ru/mjt/qsort.html

I'll defer to Glynn...

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

Hamish wrote:

> > I have added '-e' flag to r.univar according to r.univar.sh. See the
> > attached patch. Please look at the code, any comments welcomed...
> > (before committing to CVS - if desired...).
>
> Nice work!
>
> I have merged your patch locally with a few minor changes:
>   http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.c
>     or if you prefer,
>   http://bambi.otago.ac.nz/hamish/grass/r.univar_ext.diff
>
> A few comments/questions before putting it in CVS:
>
> * qsort() comparison functions are declared as static int.
> a) shouldn't they just be int?

They aren't used from outside the file in which they are defined, so
they should be declared "static".

Note that "static" is a storage specifier; it isn't part of the type.

> b) could/should these fns be inlined for speed?

Indirect function calls (e.g. qsort() callbacks) cannot be inlined.

> * GRASS 5's s.cellstats uses something called qisort() instead of
> qsort(), which claims to be faster. Comments from the crowd?
> http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/sites/s.cellstats/qisort.c?rev=HEAD&content-type=text/vnd.viewcvs-markup

It claims to be faster than some specific qsort() implementations on a
specific system for specific test cases.

Unless there is empirical evidence that qisort() beats the system's
qsort() on the majority of systems with representative test data, I
would recommend sticking with the system's qsort() routine.

> * Are there any issues with having shell variables (-g flag) which start
> with a number?

Yes; at least, bash doesn't allow them.
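
The rule behind this answer is that a POSIX shell variable name may contain only letters, digits and underscores, and must not begin with a digit. A small C check along those lines is sketched below; is_shell_name is a hypothetical helper, not part of GRASS or of the patch, and it assumes the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Returns 1 if name could be used as a shell variable name
 * (letters, digits, underscores; first character not a digit),
 * otherwise 0. Assumes the "C" locale for isalpha()/isalnum().
 */
int is_shell_name(const char *name)
{
    size_t i;

    if (name == NULL || name[0] == '\0')
        return 0;

    /* First character: letter or underscore only. */
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return 0;

    /* Remaining characters: letters, digits or underscores. */
    for (i = 1; name[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)name[i]) || name[i] == '_'))
            return 0;
    }
    return 1;
}
```

Under this check a key such as 1st_quartile is rejected, while first_quartile is accepted.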

--
Glynn Clements <glynn@gclements.plus.com>

Martin:

> > I have added '-e' flag to r.univar according to r.univar.sh. See
> > the attached patch. Please look at the code, any comments
> > welcomed... (before committing to CVS - if desired...).

Hamish:

> I have merged your patch locally with a few minor changes:

now in 6.3-CVS.

> A few comments/questions before putting it in CVS:

> * qsort() comparison functions are declared as static int.
> a) shouldn't they just be int?

Brad:

> static is the best declaration. Declaring the function static means
> it is "bound" to that file. As long as qsort() is called [only] from
> that file, it will work as designed.

Glynn:

> They aren't used from outside the file in which they are defined, so
> they should be declared "static".
>
> Note that "static" is a storage specifier; it isn't part of the type.

ok, I was thinking about the "global variable" use of that memory space.
I guess the multi-file thing precludes qisort.c becoming a fast G_qsort()?

fwiw, r.terraflow defines comparison functions in a similar way.

> b) could/should these fns be inlined for speed?

> Indirect function calls (e.g. qsort() callbacks) cannot be inlined.

> * GRASS 5's s.cellstats uses something called qisort() instead of
> qsort(), which claims to be faster. Comments from the crowd?
> http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/sites/s.cellstats/qisort.c?rev=HEAD&content-type=text/vnd.viewcvs-markup

> It claims to be faster than some specific qsort() implementations on a
> specific system for specific test cases.
>
> Unless there is empirical evidence that qisort() beats the system's
> qsort() on the majority of systems with representative test data, I
> would recommend sticking with the system's qsort() routine.

Martin:
> real 0m0.817s
> ..
> real 0m0.532s

But I suppose the gcc/glibc people have their [good] reasons..........

> * Are there any issues with having shell variables (-g flag) which
> start with a number?

> Yes; at least, bash doesn't allow them.

ok, changed 1st_quartile= to first_quartile=, etc.

I have updated i.landsat.rgb in CVS to use the new r.univar. For my
sample imagery, processing now takes 4.0 seconds instead of 31.5! 8x win!

Hamish

Hamish wrote on 09/07/2006 12:02 PM:

> Martin:
> > I have added '-e' flag to r.univar according to r.univar.sh. See
> > the attached patch. Please look at the code, any comments
> > welcomed... (before committing to CVS - if desired...).
>
> I have updated i.landsat.rgb in CVS to use the new r.univar. For my
> sample imagery, processing now takes 4.0 seconds instead of 31.5! 8x win!
  
Wow, this is really great. Thanks so much for the long
awaited improvement.

Markus

Hamish wrote:

> > A few comments/questions before putting it in CVS:
>
> > * qsort() comparison functions are declared as static int.
> > a) shouldn't they just be int?
Brad:
> static is the best declaration. Declaring the function static means
> it is "bound" to that file. As long as qsort() is called [only] from
> that file, it will work as designed.
Glynn:
> They aren't used from outside the file in which they are defined, so
> they should be declared "static".
>
> Note that "static" is a storage specifier; it isn't part of the type.

> ok, I was thinking about the "global variable" use of that memory space.
> I guess the multi-file thing precludes qisort.c becoming a fast G_qsort()?

No. In my comments above, "used" refers to the point where the
variable's name appears.

If you have e.g. G_qsort(..., cmp_int), cmp_int is "used" in the file
in which the call occurs, not the file where G_qsort() is defined. So
long as the call occurs in the file where cmp_int is defined, it
doesn't matter if it is declared "static".

The "static" modifier on a global variable causes the symbol to be
omitted from the symbol table of the object file, so the symbol cannot
be referenced from another file. The object (variable, function) to
which the symbol refers can still be referenced via a pointer, just
not by name.
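
The distinction between using a name and using a pointer can be sketched in a single file as follows. This is an illustration only: sort_wrapper is a hypothetical stand-in for a library routine such as the G_qsort() discussed here (which would normally live in a different file), and cmp_int and sort_cells are made-up names.

```c
#include <stdlib.h>

/*
 * Hypothetical library-style routine. It never mentions cmp_int by
 * name; it only receives a pointer to some comparison function.
 * Here it simply forwards to the system qsort().
 */
void sort_wrapper(void *base, size_t n, size_t size,
                  int (*cmp)(const void *, const void *))
{
    qsort(base, n, size, cmp);
}

/* "static": the name cmp_int is not visible to other files. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/*
 * The name cmp_int is "used" here, in the file that defines it,
 * so declaring it static causes no problem even though the actual
 * calls to it happen inside sort_wrapper() and qsort().
 */
void sort_cells(int *cells, size_t n)
{
    sort_wrapper(cells, n, sizeof(int), cmp_int);
}
```

If sort_wrapper() were moved to a separate library file, sort_cells() would still compile and link unchanged, because only the pointer crosses the file boundary.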

> > b) could/should these fns be inlined for speed?
>
> Indirect function calls (e.g. qsort() callbacks) cannot be inlined.
>
> > * GRASS 5's s.cellstats uses something called qisort() instead of
> > qsort(), which claims to be faster. Comments from the crowd?
> > http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/sites/s.cellstats/qisort.c?rev=HEAD&content-type=text/vnd.viewcvs-markup
>
> It claims to be faster than some specific qsort() implementations on a
> specific system for specific test cases.
>
> Unless there is empirical evidence that qisort() beats the system's
> qsort() on the majority of systems with representative test data, I
> would recommend sticking with the system's qsort() routine.

> Martin:
> real 0m0.817s
> ..
> real 0m0.532s

That's a sample population of one, which is a rather small sample.
Also, I don't know how representative the test data is.

> But I suppose the gcc/glibc people have their [good] reasons..........

The libc qsort() needs to work over a wide range of cases, in terms of
element size, number of elements, and whether the data is almost
sorted, unsorted, almost reverse-sorted etc.

Some approaches do better for almost-sorted data at the expense of the
general case (or vice-versa), or have better worst-case behaviour at
the expense of the general case (or vice-versa).

In general terms, a sorting algorithm optimised for specific cases
will do better than a general-case one. If we were to add a single
G_qsort() function, we would need to consider all use cases rather
than choosing an algorithm based upon a specific case. OTOH, if we
were to add a selection of sorting functions, each case would need to
determine the correct one to use.
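
To make the trade-off concrete, here is a small sketch of insertion sort, a textbook algorithm that is cheap on almost-sorted input and expensive on reverse-sorted input. It is not the algorithm used by qisort() or by any particular libc qsort(); the function name insertion_sort_count and the comparison counter are illustrative assumptions.

```c
#include <stddef.h>

/*
 * Sort a[] ascending with insertion sort and return the number of
 * element comparisons performed. Because the comparison is written
 * directly (no callback), the compiler is free to inline it.
 */
unsigned long insertion_sort_count(double *a, size_t n)
{
    unsigned long comparisons = 0;
    size_t i, j;

    for (i = 1; i < n; i++) {
        double key = a[i];

        /* Shift larger elements right until key's slot is found. */
        j = i;
        while (j > 0) {
            comparisons++;
            if (a[j - 1] <= key)
                break;
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

On five already-sorted values this performs 4 comparisons, while on five reverse-sorted values it performs 10; the gap grows quadratically with the number of elements, which is why such an algorithm suits only specific inputs.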

--
Glynn Clements <glynn@gclements.plus.com>

Hamish wrote:
> I guess the multi-file thing precludes qisort.c becoming a
> fast G_qsort()?

> No. In my comments above, "used" refers to the point where the
> variable's name appears.
>
> If you have e.g. G_qsort(..., cmp_int), cmp_int is "used" in the file
> in which the call occurs, not the file where G_qsort() is defined. So
> long as the call occurs in the file where cmp_int is defined, it
> doesn't matter if it is declared "static".

ok.

> The libc qsort() needs to work over a wide range of cases, in terms of
> element size, number of elements, and whether the data is almost
> sorted, unsorted, almost reverse-sorted etc.
>
> Some approaches do better for almost-sorted data at the expense of the
> general case (or vice-versa), or have better worst-case behaviour at
> the expense of the general case (or vice-versa).
>
> In general terms, a sorting algorithm optimised for specific cases
> will do better than a general-case one. If we were to add a single
> G_qsort() function, we would need to consider all use cases rather
> than choosing an algorithm based upon a specific case. OTOH, if we
> were to add a selection of sorting functions, each case would need to
> determine the correct one to use.

I suppose our general case is cells which are not fully random. Usually
raster data will be clumped, or at least cells will be part of a
continuous function (nearby cells won't be far off). Highly random raster
data is probably pretty unusual.

Hamish