Disclaimer: The views expressed are purely those of the writer and may not in any circumstance be regarded as stating an official position of the European Commission.
r.statistics2 is intended to be a partial replacement for r.statistics, with
support for floating-point cover maps at the expense of not supporting
quantiles. [1]
r.statistics3 is intended to be a partial replacement for r.statistics, with
support for floating-point cover maps. It provides quantile calculations,
which are absent from r.statistics2. [2]
Sorry if my question sounds fairly naive, but I don't see the
rationale behind having the following modules in GRASS 7:
r.statistics
r.statistics2
r.statistics3
Couldn't they be merged into one at some stage?
r.statistics2 and r.statistics3 are intended to replace r.statistics.
But those two modules have almost nothing in common. r.statistics2
calculates statistics which are based upon accumulators (i.e. count,
sum of x^n, sum of (x-mean)^n), while r.statistics3 calculates
quantiles.
If you want a work-alike replacement for r.statistics, it would be
simpler to create a script which just runs r.statistics2 and/or
r.statistics3 to do the work.
In the event that you want both types of statistics, there could be
some efficiency gains to be had by merging the two, but only at the
cost of creating a module which is noticeably more complex than the
sum of its parts.
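To make the accumulator idea concrete, here is a minimal Python sketch. It is not the r.statistics2 source. It assumes the base and cover maps have already been read into two aligned, flat sequences, and it treats None/NaN as no-data. The function name zonal_stats is hypothetical. Each base category keeps a fixed-size accumulator (count, sum of x, sum of x^2), so a single pass over the cells is enough no matter how large a zone is.

```python
from collections import defaultdict
import math

def zonal_stats(base, cover):
    """Single-pass zonal statistics from per-zone power sums (illustrative).

    base:  iterable of integer zone categories (like a CELL base map)
    cover: iterable of floating-point values (like a DCELL cover map)
    Returns {zone: {"count", "sum", "mean", "variance", "stddev"}}.
    """
    # One fixed-size accumulator per zone: [count, sum of x, sum of x^2]
    acc = defaultdict(lambda: [0, 0.0, 0.0])
    for zone, x in zip(base, cover):
        if x is None or math.isnan(x):  # treat None/NaN as no-data
            continue
        a = acc[zone]
        a[0] += 1
        a[1] += x
        a[2] += x * x

    result = {}
    for zone, (n, s1, s2) in acc.items():
        mean = s1 / n
        # Population variance; max() guards against tiny negative rounding errors
        var = max(s2 / n - mean * mean, 0.0)
        result[zone] = {"count": n, "sum": s1, "mean": mean,
                        "variance": var, "stddev": math.sqrt(var)}
    return result
```

For example, zonal_stats([1, 1, 1, 2], [1.0, 2.0, 3.0, 10.0]) gives zone 1 a count of 3, a mean of 2.0 and a population variance of about 0.667. Computing the variance from raw power sums is the simplest accumulator scheme, but it can lose precision when the mean is large relative to the spread. Welford's update is a numerically safer accumulator with the same one-pass property.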
Thank you for the explanation! I fully agree that it’s better to keep a couple of modules instead of one very complex one. But from the user’s POV, their current names are not very informative. If you also consider r.stats… how could a user guess the purpose of each of them at first glance? Perhaps names like r.stats.*, where * is the particular function each one performs, would be a bit easier to understand (?)
Just my 2 cents
cheers madi
--
Best regards,
Dr. Margherita DI LEO
Scientific / technical project officer
European Commission - DG JRC
Institute for Environment and Sustainability (IES)
Via Fermi, 2749
I-21027 Ispra (VA) - Italy - TP 261
perhaps -> r.stats.cover and r.stats.quantile?
we should also add r.stats (and perhaps r.univar) into this discussion.
r.stats -> r.stats.summary ?
On Sun, Jun 30, 2013 at 3:20 AM, Hamish <hamish_b@yahoo.com> wrote:
Helmut wrote:
>>> r.statistics2 is intended to be a partial replacement for r.statistics,
>>> with support for floating-point cover maps at the expense of not
>>> supporting quantiles. [1]
>>>
>>> r.statistics3 is intended to be a partial replacement for r.statistics,
>>> with support for floating-point cover maps. It provides quantile
>>> calculations, which are absent from r.statistics2. [2]
Glynn wrote:
>> r.statistics2 and r.statistics3 are intended to replace r.statistics.
>> But those two modules have almost nothing in common. r.statistics2
>> calculates statistics which are based upon accumulators (i.e. count,
>> sum of x^n, sum of (x-mean)^n), while r.statistics3 calculates
>> quantiles.
>>
>> If you want a work-alike replacement for r.statistics, it would be
>> simpler to create a script which just runs r.statistics2 and/or
>> r.statistics3 to do the work.
>>
>> In the event that you want both types of statistics, there could be
>> some efficiency gains to be had by merging the two, but only at the
>> cost of creating a module which is noticeably more complex than the
>> sum of its parts.
>
Madi:
> Thank you for the explanation! I perfectly agree that it's better to
> keep a couple of modules instead of a very complex one. But from the
> user's POV their names at the moment are not very informative. If you
> consider also r.stats... how could the user guess what's the purpose of
> them all at the first glance? Perhaps names like r.stats.*, where * is
> the particular function that they perform, would be a bit easier to
> understand (?)
perhaps -> r.stats.cover and r.stats.quantile?
we should also add r.stats (and perhaps r.univar) into this discussion.
r.stats -> r.stats.summary ?
+1
Thanks,
madi
I'm not sure about the first one (r.stats.cover). Is there a generic name for
aggregates which involve sums (count, sum, mean, variance, standard
deviation, skew, kurtosis)?
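For reference, every aggregate in that list is a function of a few per-zone power sums. Writing n for the count and S_k for the sum of x_i^k over a zone, the standard population moments expand as follows. The normalisation actually used by r.statistics2 (e.g. n versus n-1, or excess versus plain kurtosis) is not stated in this thread.

```latex
\begin{aligned}
\mu &= \frac{S_1}{n}, \qquad m_k = \frac{1}{n}\sum_i (x_i - \mu)^k \\
m_2 &= \frac{S_2}{n} - \mu^2, \qquad \sigma = \sqrt{m_2} \\
m_3 &= \frac{S_3}{n} - 3\mu\frac{S_2}{n} + 2\mu^3 \\
m_4 &= \frac{S_4}{n} - 4\mu\frac{S_3}{n} + 6\mu^2\frac{S_2}{n} - 3\mu^4 \\
\text{skewness} &= \frac{m_3}{m_2^{3/2}}, \qquad \text{kurtosis} = \frac{m_4}{m_2^{2}}
\end{aligned}
```

So the whole family could be described as moment-based (or power-sum-based) aggregates. Any of them can be computed from the fixed accumulators n, S_1, ..., S_4.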
r.statistics3 was derived from r.quantile by keeping a separate state
for each category in the base map.
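Quantiles are different. No fixed-size accumulator yields an exact median, so the per-category state has to hold the values themselves (or something equivalent). The following is a minimal Python sketch of "a separate state for each category". It assumes the maps have been read into two aligned flat sequences and that None/NaN means no-data. zonal_quantiles is a hypothetical name. It simply sorts each zone's values and interpolates linearly between the closest ranks (the same rule as NumPy's default). r.quantile and r.statistics3 may well use a more memory-efficient algorithm and a different interpolation rule.

```python
from collections import defaultdict
import math

def zonal_quantiles(base, cover, probs):
    """Per-zone quantiles: keep a separate value list for each base category.

    Unlike count/sum/mean, a quantile cannot be derived from a fixed-size
    accumulator, so this sketch stores every value and sorts it per zone.
    probs: quantile probabilities in [0, 1].
    """
    values = defaultdict(list)
    for zone, x in zip(base, cover):
        if x is None or math.isnan(x):  # skip no-data
            continue
        values[zone].append(x)

    result = {}
    for zone, xs in values.items():
        xs.sort()
        n = len(xs)
        qs = []
        for p in probs:
            # Linear interpolation between the two closest ranks
            pos = p * (n - 1)
            lo = math.floor(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            qs.append(xs[lo] + (xs[hi] - xs[lo]) * frac)
        result[zone] = qs
    return result
```

Here zonal_quantiles([1, 1, 1, 1, 2, 2], [4.0, 1.0, 3.0, 2.0, 10.0, 20.0], [0.0, 0.5, 1.0]) returns {1: [1.0, 2.5, 4.0], 2: [10.0, 15.0, 20.0]}. Memory grows with the number of cells rather than the number of categories. This illustrates why the two kinds of statistics have so little in common.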
Note that neither r.statistics2 nor r.statistics3 can calculate the
mode. I'm not sure the concept of mode is even meaningful when the
inputs are floating-point maps: both modules automatically promote the
cover maps to DCELL and always generate DCELL outputs (even for
method=count).
However, r.mode still exists (maybe we should rename it to
r.statistics4 for consistency).
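A small Python sketch shows why the mode needs integer (category) data to be useful. zonal_mode is a hypothetical helper that counts exact occurrences of each cover value per base category. Ties are broken toward the smallest value. This is an arbitrary choice made here and is not necessarily r.mode's rule. No-data handling is omitted.

```python
from collections import Counter, defaultdict

def zonal_mode(base, cover):
    """Most frequent cover value per base category (ties -> smallest value)."""
    counts = defaultdict(Counter)
    for zone, x in zip(base, cover):
        counts[zone][x] += 1  # exact-value counting, as for integer categories

    result = {}
    for zone, c in counts.items():
        top = max(c.values())
        # Arbitrary tie-break: smallest of the most frequent values
        result[zone] = min(v for v, k in c.items() if k == top)
    return result
```

With integer covers, zonal_mode([1, 1, 1, 2, 2], [5, 5, 7, 3, 3]) gives {1: 5, 2: 3}. With continuous floating-point values, such as zonal_mode([1, 1, 1], [2.71, 3.14, 1.41]), every value typically occurs once. The "mode" (1.41 here) is therefore decided purely by the tie-break. A meaningful floating-point mode would first need the values to be binned, which reintroduces integer categories.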
> we should also add r.stats (and perhaps r.univar) into this discussion.
> r.stats -> r.stats.summary ?
r.collate? r.stats basically groups the input values (or the Cartesian
product of multiple inputs) into bins, then dumps the <value(s),count>
pairs.
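The behaviour described for r.stats can be sketched in a few lines of Python. collate is a hypothetical name. It counts each distinct tuple of values across aligned input sequences, so one input gives a plain histogram and several inputs give counts over the Cartesian product. quantize is an illustrative equal-width binning for floating-point inputs. r.stats has its own rules for floating-point ranges, and this sketch does not attempt to reproduce them.

```python
from collections import Counter

def collate(*maps):
    """Count each distinct combination of values across aligned input maps.

    With one input this is a value histogram; with several it bins the
    Cartesian product of the inputs' values into <values, count> records.
    """
    return Counter(zip(*maps))

def quantize(values, lo, hi, nsteps):
    """Map floats in [lo, hi] to bin indices 0..nsteps-1 (illustrative rule)."""
    width = (hi - lo) / nsteps
    # The top edge (x == hi) is folded into the last bin
    return [min(int((x - lo) // width), nsteps - 1) for x in values]

def print_pairs(counts):
    """Print one "v1 v2 ... count" line per combination, sorted by values."""
    for values, n in sorted(counts.items()):
        print(*values, n)
```

For example, print_pairs(collate([1, 1, 2, 2, 2], [7, 7, 7, 8, 8])) prints the lines "1 7 2", "2 7 1" and "2 8 2". It produces one record per combination that actually occurs.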
2013-08-04 23:51 GMT+02:00 Martin Landa <landa.martin@gmail.com>:
> would be nice to decide before we start releasing tech-previews.
> Martin
For the record: after discussion at the OSGeo Vienna Code Sprint,
`r.statistics2` has been renamed in trunk to `r.stats.zonal` and
`r.statistics3` to `r.stats.quantile`.
Martin
In which case, r.statistics should probably be removed altogether.
It has been (almost completely) superseded by the other two modules (a
fact which is less likely to be noticed in light of the renaming).
The only functionality which isn't available via the other modules is
the mode calculation. The main reason is that the new modules
assume floating-point data (r.statistics uses r.stats, which is based
upon categories), and for floating-point data the concept of "mode"
isn't particularly meaningful.