[GRASS-dev] r.statistics in G7

Hi,

sorry if my question sounds naive, but I don’t understand the rationale behind having the following modules in GRASS 7:

r.statistics
r.statistics2
r.statistics3

Couldn’t they be grouped into one at a certain stage?

Thanks to anyone who’ll take the time to answer.

Best regards,

Dr. Margherita DI LEO
Scientific / technical project officer

European Commission - DG JRC
Institute for Environment and Sustainability (IES)
Via Fermi, 2749
I-21027 Ispra (VA) - Italy - TP 261

Tel. +39 0332 78 3600
margherita.di-leo@jrc.ec.europa.eu

Disclaimer: The views expressed are purely those of the writer and may not in any circumstance be regarded as stating an official position of the European Commission.

hi,

r.statistics
r.statistics2
r.statistics3

r.statistics2 is intended to be a partial replacement for r.statistics, with
support for floating-point cover maps, at the expense of not supporting
quantiles. [1]

r.statistics3 is intended to be a partial replacement for r.statistics, with
support for floating-point cover maps. It provides quantile calculations,
which are absent from r.statistics2. [2]

make sense?

[1] http://grass.osgeo.org/grass70/manuals/r.statistics2.html
[2] http://grass.osgeo.org/grass70/manuals/r.statistics3.html

-----
best regards
Helmut

Margherita Di Leo wrote:

> sorry if my question sounds naive, but I don't understand the
> rationale behind having the following modules in GRASS 7:
>
> r.statistics
> r.statistics2
> r.statistics3
>
> Couldn't they be grouped into one at a certain stage?

r.statistics2 and r.statistics3 are intended to replace r.statistics.
But those two modules have almost nothing in common. r.statistics2
calculates statistics which are based upon accumulators (i.e. count,
sum of x^n, sum of (x-mean)^n), while r.statistics3 calculates
quantiles.
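
To illustrate what "based upon accumulators" means, here is a minimal Python sketch. It is not the modules' actual C code: plain lists stand in for the base and cover rasters, `None` stands in for NULL cells, and the function names are made up for this example. It uses the single-pass sum-of-powers form (count, sum of x, sum of x^2); the sum of (x-mean)^n form needs a second pass once the mean is known.

```python
from collections import defaultdict


def zonal_accumulate(base, cover):
    """Accumulate count, sum(x) and sum(x^2) of cover values per base category."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # [count, sum of x, sum of x^2]
    for zone, value in zip(base, cover):
        if zone is None or value is None:  # None stands in for a NULL cell
            continue
        a = acc[zone]
        a[0] += 1
        a[1] += value
        a[2] += value * value
    return acc


def zonal_stats(acc):
    """Derive mean and population variance from the accumulators alone."""
    stats = {}
    for zone, (n, s, s2) in acc.items():
        mean = s / n
        stats[zone] = {
            "count": n,
            "sum": s,
            "mean": mean,
            "variance": s2 / n - mean * mean,
        }
    return stats


if __name__ == "__main__":
    base = [1, 1, 2, 2, 2]
    cover = [1.0, 3.0, 2.0, 4.0, 6.0]
    print(zonal_stats(zonal_accumulate(base, cover)))
```

Each category needs only a fixed, small amount of state regardless of how many cells it contains. A quantile cannot be reduced to running sums like this, which is one way to see why the two modules share so little.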

If you want a work-alike replacement for r.statistics, it would be
simpler to create a script which just runs r.statistics2 and/or
r.statistics3 to do the work.
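
A wrapper of that kind could look roughly like the sketch below. It only builds the command line and does not run anything. The function name, the method list, and the parameter names (`base=`, `cover=`, `method=`, `output=`, `percentiles=`) are assumptions based on my reading of the two manual pages and should be checked against the installed modules.

```python
# Hypothetical subset of methods handled by the accumulator-based module.
ACCUMULATOR_METHODS = {
    "count", "sum", "average", "variance", "stddev", "skewness", "kurtosis",
}


def build_command(base, cover, output, method, percentile=None):
    """Return an argv list for whichever module can compute `method`.

    Parameter names in the returned command are assumptions; verify them
    against the manual pages before use.
    """
    if method in ACCUMULATOR_METHODS:
        return ["r.statistics2", f"base={base}", f"cover={cover}",
                f"method={method}", f"output={output}"]
    if method == "median":
        percentile = 50.0  # the median is the 50th percentile
    elif method != "percentile":
        # e.g. "mode", which neither replacement module provides
        raise ValueError(f"unsupported method: {method}")
    if percentile is None:
        raise ValueError("method=percentile needs a percentile value")
    return ["r.statistics3", f"base={base}", f"cover={cover}",
            f"percentiles={percentile}", f"output={output}"]


if __name__ == "__main__":
    print(build_command("zones", "elevation", "zone_mean", "average"))
    print(build_command("zones", "elevation", "zone_median", "median"))
```

Inside a GRASS session the returned list could be handed to something like `subprocess.run(cmd, check=True)`.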

In the event that you want both types of statistics, there could be
some efficiency gains to be had by merging the two, but only at the
cost of creating a module which is noticeably more complex than the
sum of its parts.

--
Glynn Clements <glynn@gclements.plus.com>

Hi Glynn,

Thank you for the explanation! I fully agree that it’s better to keep a couple of modules instead of one very complex one. But from the user’s POV their current names are not very informative. If you also consider r.stats… how could the user guess at first glance what each of them is for? Perhaps names like r.stats.*, where * is the particular function that they perform, would be a bit easier to understand (?)

Just my 2 cents

cheers madi

Madi wrote:

> Thank you for the explanation! I perfectly agree that it's better to
> keep a couple of modules instead of a very complex one. But from the
> user's POV their names at the moment are not very informative. If you
> consider also r.stats... how could the user guess what's the purpose of
> them all at the first glance? Perhaps names like r.stats.*, where * is
> the particular function that they perform, would be a bit easier to
> understand (?)

perhaps -> r.stats.cover and r.stats.quantile?

we should also add r.stats (and perhaps r.univar) into this discussion.
r.stats -> r.stats.summary ?

Hamish

On Sun, Jun 30, 2013 at 3:20 AM, Hamish <hamish_b@yahoo.com> wrote:

> perhaps -> r.stats.cover and r.stats.quantile?
>
> we should also add r.stats (and perhaps r.univar) into this discussion.
> r.stats -> r.stats.summary ?

+1

Thanks,
madi

Hamish wrote:

> perhaps -> r.stats.cover and r.stats.quantile?

I'm not sure about the first one. Is there a generic name for
aggregates which involve sums (count, sum, mean, variance, standard
deviation, skew, kurtosis)?

r.statistics3 was derived from r.quantile by keeping a separate state
for each category in the base map.
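
The idea of "a separate state for each category" can be sketched as follows. This is a deliberately simplified stand-in: it keeps every cover value per category and sorts them, rather than reproducing the actual algorithm of r.quantile or r.statistics3. Lists stand in for raster maps, `None` for NULL cells, and the function name is made up.

```python
from collections import defaultdict


def zonal_quantile(base, cover, q):
    """Return the q-quantile (0 <= q <= 1) of cover values per base category."""
    state = defaultdict(list)  # one independent list of values per category
    for zone, value in zip(base, cover):
        if zone is None or value is None:  # None stands in for a NULL cell
            continue
        state[zone].append(float(value))

    result = {}
    for zone, vals in state.items():
        vals.sort()
        pos = q * (len(vals) - 1)           # fractional index into sorted values
        lo = int(pos)
        hi = min(lo + 1, len(vals) - 1)
        frac = pos - lo
        result[zone] = vals[lo] + frac * (vals[hi] - vals[lo])  # linear interpolation
    return result


if __name__ == "__main__":
    base = [1, 1, 1, 2, 2]
    cover = [5, 1, 3, 10, 20]
    print(zonal_quantile(base, cover, 0.5))
```

Unlike the accumulator case, the state here grows with the number of cells in each category.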

Note that neither r.statistics2 nor r.statistics3 can calculate the
mode. I'm not sure if the concept of mode is even meaningful when the
inputs are floating-point maps (both modules automatically promote the
cover maps to DCELL, and always generate DCELL outputs (even for
method=count)).
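
A small Python illustration of why the mode is of doubtful use on floating-point data. The sample values and the function name are invented for this example.

```python
from collections import Counter


def mode_with_count(values):
    """Return the most frequent value and how often it occurs."""
    value, count = Counter(values).most_common(1)[0]
    return value, count


# Category (integer) data: values repeat, so the mode is informative.
categories = [3, 3, 5, 3, 7, 5]
# Floating-point data: nearly equal values are still distinct.
elevations = [101.37, 101.42, 99.80, 101.39, 100.05, 99.81]

if __name__ == "__main__":
    print(mode_with_count(categories))     # (3, 3)
    print(mode_with_count(elevations)[1])  # 1: every value occurs exactly once
```

With floating-point input, every value tends to be its own "mode" unless the data are first grouped into bins.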

However, r.mode still exists (maybe we should rename it to
r.statistics4 for consistency).

> we should also add r.stats (and perhaps r.univar) into this discussion.
> r.stats -> r.stats.summary ?

r.collate? r.stats basically groups the input values (or the cartesian
product of multiple inputs) into bins then dumps the <value(s),count>
pairs.
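
As a rough picture of that behaviour (not r.stats itself), the following Python sketch counts the value combinations of one or more co-registered maps. Lists stand in for raster maps, the map names and the function name are made up, and the binning of floating-point ranges is left out.

```python
from collections import Counter


def collate(*maps):
    """Count each combination of values across the given maps.

    Returns a sorted list of (values, count) pairs, where `values` is a
    tuple holding one value per input map.
    """
    counts = Counter()
    for cells in zip(*maps):  # one tuple per cell position
        counts[cells] += 1
    return sorted(counts.items())


if __name__ == "__main__":
    landuse = [1, 1, 2, 2, 2]
    soil = [10, 10, 10, 20, 20]
    for values, count in collate(landuse, soil):
        print(values, count)
```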

--
Glynn Clements <glynn@gclements.plus.com>

Hi,

it would be nice to decide this before we start releasing tech-previews. Martin

--
Martin Landa <landa.martin gmail.com> * http://geo.fsv.cvut.cz/~landa

Hi,

2013-08-04 23:51 GMT+02:00 Martin Landa <landa.martin@gmail.com>:

would be nice to decide before we start releasing tech-previews. Martin

for the record: after discussion at the OSGeo Vienna Code Sprint,
`r.statistics2` has been renamed in trunk to `r.stats.zonal` and
`r.statistics3` to `r.stats.quantile`. Martin

Martin Landa wrote:

>> would be nice to decide before we start releasing tech-previews. Martin
>
> for the record, after discussion in OSGeo Vienna Code Sprint -
> `r.statistics2` has been renamed in trunk to `r.stats.zonal` and
> `r.statistics3` to `r.stats.quantile`. Martin

In which case, r.statistics should probably be removed altogether.

It has been (almost completely) superseded by the other two modules (a
fact which is less likely to be noticed in light of the renaming).

The only functionality which isn't available via the other modules is
the mode calculation. The main reason is that the new modules assume
floating-point data (r.statistics uses r.stats, which is based upon
categories), for which the concept of "mode" isn't particularly
meaningful.

--
Glynn Clements <glynn@gclements.plus.com>