[GRASS-dev] Re: [GRASS-SVN] r33673 - in grass/trunk/raster: . r.statistics2

Glynn,

On Sun, Oct 5, 2008 at 12:40 AM, <svn_grass@osgeo.org> wrote:

> Author: glynn
> Date: 2008-10-04 18:40:43 -0400 (Sat, 04 Oct 2008)
> New Revision: 33673

...

> Added: grass/trunk/raster/r.statistics2/r.statistics2.html
>
> --- grass/trunk/raster/r.statistics2/r.statistics2.html (rev 0)
> +++ grass/trunk/raster/r.statistics2/r.statistics2.html 2008-10-04 22:40:43 UTC (rev 33673)
> @@ -0,0 +1,16 @@
> +<h2>DESCRIPTION</h2>
> +
> +<em>r.statistics2</em> is intended to be a partial replacement for
> +r.statistics, with support for floating-point cover maps at the
> +expense of not support quantiles.

... could you elaborate on the limitations ("partial replacement")?
Any reason to keep r.statistics?

thanks
Markus

Markus Neteler wrote:

> Author: glynn
> Date: 2008-10-04 18:40:43 -0400 (Sat, 04 Oct 2008)
> New Revision: 33673
...
> Added: grass/trunk/raster/r.statistics2/r.statistics2.html
> ===================================================================
> --- grass/trunk/raster/r.statistics2/r.statistics2.html (rev 0)
> +++ grass/trunk/raster/r.statistics2/r.statistics2.html 2008-10-04 22:40:43 UTC (rev 33673)
> @@ -0,0 +1,16 @@
> +<h2>DESCRIPTION</h2>
> +
> +<em>r.statistics2</em> is intended to be a partial replacement for
> +r.statistics, with support for floating-point cover maps at the
> +expense of not support quantiles.

> ... could you elaborate on the limitations ("partial replacement")?
> Any reason to keep r.statistics?

r.statistics2 lacks quantiles, mode and diversity, as these can't be
calculated by accumulation (i.e. they require sorting and/or binning
of the cover values).
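
To illustrate what "calculated by accumulation" means here, the following is a minimal Python sketch, not the module's actual C code. The function name and the particular statistics are assumptions chosen for illustration. Each cell updates a few running totals for its base category and is then discarded, so memory grows with the number of categories rather than the number of cells.

```python
from collections import defaultdict
import math


def accumulate_stats(base, cover):
    """One pass over paired (base category, cover value) cells.

    Hypothetical helper: keeps only running totals per category,
    never the individual cover values.
    """
    acc = defaultdict(lambda: {"n": 0, "sum": 0.0, "sumsq": 0.0,
                               "min": math.inf, "max": -math.inf})
    for cat, val in zip(base, cover):
        a = acc[cat]
        a["n"] += 1
        a["sum"] += val
        a["sumsq"] += val * val
        a["min"] = min(a["min"], val)
        a["max"] = max(a["max"], val)

    result = {}
    for cat, a in acc.items():
        mean = a["sum"] / a["n"]
        # Population variance from the running sums; clamp tiny negative
        # values caused by floating-point rounding.
        variance = max(a["sumsq"] / a["n"] - mean * mean, 0.0)
        result[cat] = {"count": a["n"], "sum": a["sum"], "mean": mean,
                       "min": a["min"], "max": a["max"],
                       "variance": variance}
    return result
```

A median or mode cannot be produced this way, because no fixed set of running totals determines it; the values themselves (or bins of them) have to be kept.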

r.statistics relies upon the binning and sorting performed by r.stats,
but that relies upon quantisation to keep the number of bins down.
With unquantised FP maps, it's quite likely that you will end up with
approximately one bin per cell. For large maps, this may exceed
available memory, and sorting such large arrays will be slow.
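
The "one bin per cell" effect is easy to reproduce outside GRASS. The sketch below is purely illustrative: the synthetic data, the function name and the quantisation step are all assumptions. It counts distinct bins for random floating-point values with and without quantisation.

```python
import random


def count_bins(values, step=None):
    """Number of distinct bins; step=None means no quantisation."""
    if step is None:
        return len(set(values))
    return len({round(v / step) for v in values})


random.seed(0)
# Synthetic stand-in for the cells of an unquantised FP cover map.
cells = [random.uniform(0.0, 100.0) for _ in range(100_000)]

unquantised = count_bins(cells)           # close to one bin per cell
quantised = count_bins(cells, step=1.0)   # bounded by the value range
```

With unquantised values the bin count tracks the cell count, whereas with a step of 1.0 it cannot exceed 101 for this value range, however many cells there are.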

Computing the missing attributes (without loading the maps into
memory) would require an approach similar to r.quantile, but with
fewer bins (you need one set of bins for each base category) and thus
more passes. That's really a job for a separate module, as its
structure would be entirely different to r.statistics2.

If there are only a few base categories, you could just run r.quantile
once for each category.
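
The binning idea described above can be sketched in Python as follows. This is a simplified two-pass scheme under stated assumptions, not a transcription of r.quantile: `read_cells` is a hypothetical callable standing in for re-reading the maps, the value range is assumed to be known in advance, and the quantile is taken as the lower nearest rank.

```python
from bisect import bisect_right
from collections import defaultdict


def category_quantiles(read_cells, q, lo, hi, nbins=64):
    """Per-category q-quantile (0 <= q <= 1) via two passes.

    read_cells: zero-argument callable returning a fresh iterator of
    (category, value) pairs (hypothetical interface).
    lo, hi: known range of the cover values, with lo < hi.
    """
    width = (hi - lo) / nbins
    edges = [lo + k * width for k in range(nbins)]  # left edge of each bin

    def bin_of(v):
        return min(max(bisect_right(edges, v) - 1, 0), nbins - 1)

    # Pass 1: one set of bins (a histogram) per base category.
    hist = defaultdict(lambda: [0] * nbins)
    for cat, v in read_cells():
        hist[cat][bin_of(v)] += 1

    # For each category, find the bin that contains the target rank.
    wanted = {}
    for cat, counts in hist.items():
        rank = int(q * (sum(counts) - 1))  # zero-based rank
        below = 0
        for b, c in enumerate(counts):
            if below + c > rank:
                wanted[cat] = (b, rank - below)
                break
            below += c

    # Pass 2: keep only the values falling in each category's target bin.
    kept = defaultdict(list)
    for cat, v in read_cells():
        if bin_of(v) == wanted[cat][0]:
            kept[cat].append(v)

    # Sort the (hopefully small) retained sets and pick the exact value.
    return {cat: sorted(vals)[wanted[cat][1]] for cat, vals in kept.items()}
```

If a target bin still holds too many values to sort in memory, the same step could be repeated on that bin's range, which is where the extra passes would come from when the bins per category are few.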

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> > Author: glynn
> > Date: 2008-10-04 18:40:43 -0400 (Sat, 04 Oct 2008)
> > New Revision: 33673
> ...
> > Added: grass/trunk/raster/r.statistics2/r.statistics2.html
> > ===================================================================
> > --- grass/trunk/raster/r.statistics2/r.statistics2.html (rev 0)
> > +++ grass/trunk/raster/r.statistics2/r.statistics2.html 2008-10-04 22:40:43 UTC (rev 33673)
> > @@ -0,0 +1,16 @@
> > +<h2>DESCRIPTION</h2>
> > +
> > +<em>r.statistics2</em> is intended to be a partial replacement for
> > +r.statistics, with support for floating-point cover maps at the
> > +expense of not support quantiles.
>
> ... could you elaborate on the limitations ("partial replacement")?
> Any reason to keep r.statistics?

> r.statistics2 lacks quantiles, mode and diversity, as these can't be
> calculated by accumulation (i.e. they require sorting and/or binning
> of the cover values).
>
> Computing the missing attributes (without loading the maps into
> memory) would require an approach similar to r.quantile, but with
> fewer bins (you need one set of bins for each base category) and thus
> more passes. That's really a job for a separate module, as its
> structure would be entirely different to r.statistics2.

I've now added such a module, r.statistics3 (I'll let someone else
think up better names for these modules). It's based upon r.quantile,
and as such only computes quantiles.

I have no idea how you would compute diversity or mode for FP data
(other than by sorting the entire set of values), or if this is even
meaningful (if the values are physical quantities, then in the absence
of rounding due to limited measuring precision, each value should
theoretically be unique).

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> I have no idea how you would compute diversity or mode for FP data
> (other than by sorting the entire set of values), or if this is even
> meaningful (if the values are physical quantities, then in the absence
> of rounding due to limited measuring precision, each value should
> theoretically be unique).

The quick answer is that you can't calculate diversity or mode for FP data
without depending on the quirks of FP precision or generalizing into
quantiles.

A more abstract answer is that you can, if you consider peaks in the
histogram (like a gas chromatograph plot, or radiance absorption peaks in
satellite imagery). A bimodal distribution would be more diverse than a
simple Gaussian bell curve, etc. The spike with the biggest area under it
would be the mode? Again there is the issue of quantile limits: how wide is
a real spike, and what is just sampling noise? Where is the threshold
between a spike and a bump?
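
As a rough illustration of the histogram-peak idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function name, the fixed bin count, and the rule that a peak runs from one valley to the next. It does not answer the spike-versus-noise question; the bin count simply stands in for that threshold.

```python
def histogram_peaks(values, lo, hi, nbins=20):
    """Split a histogram at its valleys and report each peak's area.

    Assumes every value lies in [lo, hi]. Returns a list of
    (peak_centre, area) tuples, where area is the number of values
    between the valleys on either side of the peak.
    """
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        counts[min(int((v - lo) / width), nbins - 1)] += 1

    peaks = []
    start = 0
    while start < nbins:
        # Climb to the top of the current hill ...
        end = start
        while end + 1 < nbins and counts[end + 1] >= counts[end]:
            end += 1
        top = end
        # ... then descend to the next valley.
        while end + 1 < nbins and counts[end + 1] <= counts[end]:
            end += 1
        area = sum(counts[start:end + 1])
        if area > 0:
            peaks.append((lo + (top + 0.5) * width, area))
        start = end + 1
    return peaks
```

Under this scheme the "mode" would be `max(peaks, key=lambda p: p[1])`, and the number of peaks gives a crude diversity measure. Both results change with `nbins`, which is exactly the open question raised above.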

just an idea,
Hamish
