[GRASS-dev] GRASS 7 r.neighbours average method giving incorrect value

Hi,

I noticed today that "average" method in r.neighbours gives an incorrect value. Let me give you an example:
1) I have a raster soiltype1 consisting of 0's and 1's:

r.univar map=soiltype1@soil percentile=100
total null and non-null cells: 35000000
total null cells: 10299386
Of the non-null cells:
----------------------
n: 24700614
minimum: 0
maximum: 1
range: 1
mean: 0.0678528
mean of absolute values: 0.0678528
standard deviation: 0.251493
variance: 0.0632488
variation coefficient: 370.645 %
sum: 1676006

2) I want to get the ratio of 0s and 1s in a neighbourhood for each cell. Therefore I run the command 'r.neighbors input=soiltype1@soil output=soiltype1_avg size=11 method=average'

3) Judging from previous steps, it should be only logical that the output raster has values from 0 to 1. However it seems that 0.5 has been added:

r.univar map=soiltype1_avg@soil percentile=100
total null and non-null cells: 35000000
total null cells: 9921008
Of the non-null cells:
----------------------
n: 25078992
minimum: 0.5
maximum: 1.5
range: 1
mean: 0.567911
mean of absolute values: 0.567911
standard deviation: 0.200923
variance: 0.0403699
variation coefficient: 35.3792 %
sum: 14242639.875584

Is there any explanation for this behaviour? I doubt it is supposed to do that, am I right?
I'm using WinGRASS 7 revision 57043.

Regards,
Allar

Allar Haav wrote:

I noticed today that "average" method in r.neighbours gives an incorrect
value. Let me give you an example:

3) Judging from previous steps, it should be only logical that the
output raster has values from 0 to 1. However it seems that 0.5 has been
added:

When the output is an integer map, the result of the average, variance
and stddev methods is supposed to be rounded to the nearest integer
(rather than towards zero), so 0.5 is added prior to truncation.

r54820 removed the option to generate integer output maps, so the
"is the output map integer" check became redundant. But rather than
changing it to never adding 0.5, now it's always added.

This should be fixed by r57064.

--
Glynn Clements <glynn@gclements.plus.com>

Right, explanation makes sense.
Also, thanks for the quick fix. Much appreciated!

Allar

On 11/07/2013 12:30, Glynn Clements wrote:

When the output is an integer map, the result of the average, variance
and stddev methods is supposed to be rounded to the nearest integer
(rather than towards zero), so 0.5 is added prior to truncation.

r54820 removed the option to generate integer output maps, so the
"is the output map integer" check became redundant. But rather than
changing it to never adding 0.5, now it's always added.

This should be fixed by r57064.

On Thu, Jul 11, 2013 at 1:30 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

Allar Haav wrote:

I noticed today that "average" method in r.neighbours gives an incorrect
value. Let me give you an example:

3) Judging from previous steps, it should be only logical that the
output raster has values from 0 to 1. However it seems that 0.5 has been
added:

When the output is an integer map, the result of the average, variance
and stddev methods is supposed to be rounded to the nearest integer
(rather than towards zero), so 0.5 is added prior to truncation.

For rounding of negative values, 0.5 would need to be subtracted, not
added prior to truncation. See e.g. r.mapcalc's round().

Currently (in G6), the output map type is the same like the input map
type. For an integer map with zeros and ones, the average will never
be anything close to 0.5, but always either zero or one, not realistic
and most probably not desired.

I would rather have DCELL output all the times, maybe keep the maptype
for min, max?

Markus M

Markus Metz wrote:

I would rather have DCELL output all the times, maybe keep the maptype
for min, max?

Retaining the type for mode, sum and range would also make sense, as
the result will always be an integer for integer inputs.

For diversity, the result is always an integer even for floating-point
inputs.

For average, median, variance, standard deviation and interspersion,
the output should always be floating-point (although I'm not entirely
certain about median).

--
Glynn Clements <glynn@gclements.plus.com>

Glynn:

For average, median, variance, standard deviation and interspersion,
the output should always be floating-point (although I'm not entirely
certain about median).

why not for median? is there any other way to deal with even number of
values in the list? if value will often be 0.5 then round() isn't ideal..
It seems better to consistently say "median will output DCELL" than to
say "depending on the number of input maps, and the number of scattered
NULLs at each grid cell, sometimes median will save the map as CELL if it
can".

or is there some other reason or way to deal with that?

?,
Hamish

Hamish wrote:

> For average, median, variance, standard deviation and interspersion,
> the output should always be floating-point (although I'm not entirely
> certain about median).

why not for median? is there any other way to deal with even number of
values in the list? if value will often be 0.5 then round() isn't ideal..
It seems better to consistently say "median will output DCELL" than to
say "depending on the number of input maps, and the number of scattered
NULLs at each grid cell, sometimes median will save the map as CELL if it
can".

or is there some other reason or way to deal with that?

It's a question of definition.

If the number of values is odd, the median will be one of the input
values. If the number of values is even, but the two middle values are
equal, again the median will be one of the input values.

If the number of values is even and the two middle values are
different, then any value between the two is a median, insofar as half
the values will be less than that value while the other half will be
greater. If the two middle values are integers and not adjacent, then
some of the "candidate" medians are also integers.

Taking the arithmetic mean of the two values is one possible approach,
although not necessarily the best one. It's worth considering whether
it really justifies expanding to DCELL for one extra bit (for integer
inputs, the output will always be either an integer or an integer plus
a half).

--
Glynn Clements <glynn@gclements.plus.com>