[GRASS-dev] G7: r.neighbors changes data type from CELL to DCELL

Hi,

in r59668 and r59669 I have fixed the issue that a map type
(CELL/FCELL/DCELL) was not preserved as expected but always converted
to DCELL.

Now CELL is generated for

    {c_count, w_count, NO_CATS, 0, 0, 1, "count", "count of non-NULL values"},
    {c_divr, NULL, divr_cats, 0, 0, 1, "diversity",
    {c_intr, NULL, intr_cats, 0, 0, 1, "interspersion",

Example:
g.region rast=landuse96_28m
r.info -g landuse96_28m | grep datatype
datatype=CELL
r.neighbors input=landuse96_28m output=landuse96_28m_count7x7 method=count
100%
r.info -g landuse96_28m_count7x7 | grep datatype
datatype=CELL

Remaining issues:

1) "quantile=0.5" creeps into the history:

r.info landuse96_28m_mode7x7
...
| r.neighbors input="landuse96_28m" output="landuse96_28m_mode7x7" met\ |
| hod="mode" size=3 quantile=0.5

--> quantile=0.5 should not be there (even if ignored).

2) The method=mode does not preserve CELL (likewise in r.series).

Any suggestion how to fix these two remaining problems?

thanks
Markus

Markus Neteler wrote:

in r59668 and r59669 I have fixed the issue that a map type
(CELL/FCELL/DCELL) was not preserved as expected but always converted
to DCELL.

Now CELL is generated for

    {c_count, w_count, NO_CATS, 0, 0, 1, "count", "count of non-NULL values"},
    {c_divr, NULL, divr_cats, 0, 0, 1, "diversity",
    {c_intr, NULL, intr_cats, 0, 0, 1, "interspersion",

Note that this change is incorrect for method=count when used with
weight=, as the result is the sum of the weights, which isn't
necessarily an integer.

Fixed in trunk in r59677.

Remaining issues:

1) "quantile=0.5" creeps into the history:

r.info landuse96_28m_mode7x7
...
| r.neighbors input="landuse96_28m" output="landuse96_28m_mode7x7" met\ |
| hod="mode" size=3 quantile=0.5

--> quantile=0.5 should not be there (even if ignored).

Remove the line:

    parm.quantile->answer = "0.5";

Optionally change:

  out->quantile = (parm.quantile->answer && parm.quantile->answers[i])
      ? atof(parm.quantile->answers[i])
      : 0;

to default to 0.5 rather than 0.

But given that we already a distinct "median" method, using
"quantile=0.5" is largely redundant.

2) The method=mode does not preserve CELL (likewise in r.series).

Any suggestion how to fix these two remaining problems?

You could just set is_int=1 (although that would make r59677
incorrect, as the weighted mode calulation does return an integer
result for integer values).

Using method=mode isn't particularly meaningful on floating-point
data, as the result is an artifact of the quantisation of the original
data. If measurement was perfectly accurate, every value should be
unique. This is largely why r.stats.zonal (formerly r.statistics2)
doesn't have method=mode.

Alternatively, replace the boolean is_int field with either a pointer
to a function which returns the result type based upon the input type
and whether weights are being used, or an enumeration (which would
either select such a function, or an equivalent block of code within a
switch statement).

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, Apr 10, 2014 at 8:56 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

Markus Neteler wrote:

in r59668 and r59669 I have fixed the issue that a map type
(CELL/FCELL/DCELL) was not preserved as expected but always converted
to DCELL.

Now CELL is generated for

    {c_count, w_count, NO_CATS, 0, 0, 1, "count", "count of non-NULL values"},
    {c_divr, NULL, divr_cats, 0, 0, 1, "diversity",
    {c_intr, NULL, intr_cats, 0, 0, 1, "interspersion",

Note that this change is incorrect for method=count when used with
weight=, as the result is the sum of the weights, which isn't
necessarily an integer.

Fixed in trunk in r59677.

Thanks,

Remaining issues:

1) "quantile=0.5" creeps into the history:

r.info landuse96_28m_mode7x7
...
| r.neighbors input="landuse96_28m" output="landuse96_28m_mode7x7" met\ |
| hod="mode" size=3 quantile=0.5

--> quantile=0.5 should not be there (even if ignored).

Remove the line:

    parm.quantile->answer = "0.5";

Optionally change:

        out->quantile = (parm.quantile->answer && parm.quantile->answers[i])
            ? atof(parm.quantile->answers[i])
            : 0;

to default to 0.5 rather than 0.

But given that we already a distinct "median" method, using
"quantile=0.5" is largely redundant.

... mmh, not sure what to change.

2) The method=mode does not preserve CELL (likewise in r.series).

Any suggestion how to fix these two remaining problems?

You could just set is_int=1 (although that would make r59677
incorrect, as the weighted mode calulation does return an integer
result for integer values).

Using method=mode isn't particularly meaningful on floating-point
data, as the result is an artifact of the quantisation of the original
data. If measurement was perfectly accurate, every value should be
unique. This is largely why r.stats.zonal (formerly r.statistics2)
doesn't have method=mode.

Alternatively, replace the boolean is_int field with either a pointer
to a function which returns the result type based upon the input type
and whether weights are being used, or an enumeration (which would
either select such a function, or an equivalent block of code within a
switch statement).

This is a bit over my programming skills.
But the broken "mode" operator is a critical issue.

Help (=svn commits) welcome,

thanks
Markus

Markus Neteler wrote:

>> 1) "quantile=0.5" creeps into the history:
>>
>> r.info landuse96_28m_mode7x7
>> ...
>> | r.neighbors input="landuse96_28m" output="landuse96_28m_mode7x7" met\ |
>> | hod="mode" size=3 quantile=0.5
>>
>> --> quantile=0.5 should not be there (even if ignored).
>
> Remove the line:
>
> parm.quantile->answer = "0.5";
>
> Optionally change:
>
> out->quantile = (parm.quantile->answer && parm.quantile->answers[i])
> ? atof(parm.quantile->answers[i])
> : 0;
>
> to default to 0.5 rather than 0.
>
> But given that we already a distinct "median" method, using
> "quantile=0.5" is largely redundant.

... mmh, not sure what to change.

Fixed in r60345.

>> 2) The method=mode does not preserve CELL (likewise in r.series).

Note that the same issue applies to minimum, maximum, range, and
(unweighted) sum.

This is a bit over my programming skills.
But the broken "mode" operator is a critical issue.

Help (=svn commits) welcome,

r60346 implements the following logic:

For any aggregate, there are 2 factors affecting the output type:

1. Whether the input map is integer or floating-point.
2. Whether the weighted or unweighted version of the aggregate is used.

These combine to create 4 possibilities:

type integer integer float float
weighted no yes no yes
                      
average float float float float
median [1] [1] float float
mode integer integer [2] [2]
minimum integer integer float float
maximum integer integer float float
range integer integer float float
stddev float float float float
sum integer float float float
count integer float integer float
variance float float float float
diversity integer integer integer integer
interspersion integer integer integer integer
quart1 [1] [1] float float
quart3 [1] [1] float float
perc90 [1] [1] float float
quantile [1] [1] float float

[1] For integer input, quantiles may produce float results from
interpolating between adjacent values.

[2] Calculating the mode of floating-point data is essentially
meaningless.

With the current aggregates, there are 5 cases:

1. Output is always float: average, variance, stddev, quantiles (with
interpolation).

2. Output is always integer: diversity, interspersion.

3. Output is integer if unweighted, float if weighted: count.

4. Output matches input: minimum, maximum, range, mode (subject to
note 2 above), quantiles (without interpolation).

5. Output is integer for integer input and unweighted aggregate,
otherwise float: sum.

--
Glynn Clements <glynn@gclements.plus.com>

2014-05-19 15:08 GMT+02:00 Glynn Clements <glynn@gclements.plus.com>:

r60346 implements the following logic:

For any aggregate, there are 2 factors affecting the output type:

1. Whether the input map is integer or floating-point.
2. Whether the weighted or unweighted version of the aggregate is used.

These combine to create 4 possibilities:

type integer integer float float
weighted no yes no yes

average float float float float
median [1] [1] float float
mode integer integer [2] [2]
minimum integer integer float float
maximum integer integer float float
range integer integer float float
stddev float float float float
sum integer float float float
count integer float integer float
variance float float float float
diversity integer integer integer integer
interspersion integer integer integer integer
quart1 [1] [1] float float
quart3 [1] [1] float float
perc90 [1] [1] float float
quantile [1] [1] float float

[1] For integer input, quantiles may produce float results from
interpolating between adjacent values.

[2] Calculating the mode of floating-point data is essentially
meaningless.

With the current aggregates, there are 5 cases:

1. Output is always float: average, variance, stddev, quantiles (with
interpolation).

2. Output is always integer: diversity, interspersion.

3. Output is integer if unweighted, float if weighted: count.

4. Output matches input: minimum, maximum, range, mode (subject to
note 2 above), quantiles (without interpolation).

5. Output is integer for integer input and unweighted aggregate,
otherwise float: sum.

would be nice to put to the manual I guess. Martin

--
Martin Landa * http://geo.fsv.cvut.cz/gwiki/Landa

r60346 implements the following logic:

For any aggregate, there are 2 factors affecting the output type:

1. Whether the input map is integer or floating-point.
2. Whether the weighted or unweighted version of the aggregate is used.

Out of curiosity, could this logic also be applied to r.resamp.stats?
Or did this module already receive a similar solution?

Cheers
Stefan

Blumentrath, Stefan wrote:

>r60346 implements the following logic:

>For any aggregate, there are 2 factors affecting the output type:

>1. Whether the input map is integer or floating-point.
>2. Whether the weighted or unweighted version of the aggregate is used.

Out of curiosity, could this logic also be applied to r.resamp.stats?
Or did this module already receive a similar solution?

r.resamp.stats always generates DCELL output.

In theory, the same logic could be used for anything which uses
lib/stats, i.e. r.neighbors, r.resamp.stats, r.series,
r.series.interp, r3.neighbors, and v.vect.stats.

Some of those only use the unweighted methods, in which case the logic
would be simpler.

v.vect.stats has the output type dictated by the type of the output
column. However, each method has a flag to indicate whether conversion
to integer should use truncation or rounding. Rounding is used for
methods which can return non-integer results for integer data.

--
Glynn Clements <glynn@gclements.plus.com>

On Mon, May 19, 2014 at 4:49 PM, Martin Landa <landa.martin@gmail.com> wrote:

2014-05-19 15:08 GMT+02:00 Glynn Clements <glynn@gclements.plus.com>:

r60346 implements the following logic:

For any aggregate, there are 2 factors affecting the output type:

...

would be nice to put to the manual I guess. Martin

I have taken liberty to add Glynn's comments in r60375 to the manual.

Markus

PS: Yes, even checked HTML with http://validator.w3.org/ :slight_smile:

2014-05-20 16:57 GMT+02:00 Markus Neteler <neteler@osgeo.org>:

On Mon, May 19, 2014 at 4:49 PM, Martin Landa <landa.martin@gmail.com> wrote:

2014-05-19 15:08 GMT+02:00 Glynn Clements <glynn@gclements.plus.com>:

r60346 implements the following logic:

For any aggregate, there are 2 factors affecting the output type:

...

would be nice to put to the manual I guess. Martin

I have taken liberty to add Glynn's comments in r60375 to the manual.

thanks! Martin

--
Martin Landa * http://geo.fsv.cvut.cz/gwiki/Landa

On Mon, May 19, 2014 at 3:08 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:
...

Fixed in r60345.

...

r60346 implements the following logic:

My tests went fine so far, any objections that I sync the r.neighbors
version in relbr7 to trunk?

Markus

On Wed, May 21, 2014 at 7:49 AM, Markus Neteler <neteler@osgeo.org> wrote:

On Mon, May 19, 2014 at 3:08 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:
...

Fixed in r60345.

...

r60346 implements the following logic:

My tests went fine so far, any objections that I sync the r.neighbors
version in relbr7 to trunk?

Done: relbr7 updated in r60456.

Markus

Hi Glynn,

Glynn Clements wrote:

In theory, the same logic could be used for anything which uses lib/stats, i.e. r.neighbors, r.resamp.stats, r.series, r.series.interp, r3.neighbors, and v.vect.stats.

Thanks for your reply, and sorry that mine took a while.

Provided that I neither have an idea how much work it would be to implement that in a more general approach, nor the competence to do so, would you mind if I open an enhancement request in trac?
I would consider it as a reasonable improvement, as it would help to both avoid that maps unnecessarily occupy hard disk space and speed up subsequent operations as well as it would spare users who care about the storage type from running a r.mapcal int() / float() and g.remove after one of the commands you named (maybe also r.in.xyz?). Finally, other modules like t.rast.aggregate would benefit from such an improvement as well.

May be having a "type-option" in the named modules, like in r.in.xyz could be another solution which also covers the FCELL/DCELL difference?

All the best,
Stefan

Blumentrath, Stefan wrote:

> In theory, the same logic could be used for anything which uses
> lib/stats, i.e. r.neighbors, r.resamp.stats, r.series,
> r.series.interp, r3.neighbors, and v.vect.stats.

Thanks for your reply, and sorry that mine took a while.

Provided that I neither have an idea how much work it would be to
implement that in a more general approach, nor the competence to do
so, would you mind if I open an enhancement request in trac?

That's probably the best course of action.

--
Glynn Clements <glynn@gclements.plus.com>