Michael Barton wrote:

> > I thought that smoothing will take care of it but apparently just
> > the opposite

> This does seem strange. Even odder is the fact that the anomalies get
> bigger the larger the smoothing neighborhood. Usually, increasing the
> neighborhood creates greater smoothing.

That's the case if the neighbours are being averaged, but it doesn't
hold for a quantile.

> It might be the way we are using smoothing. We check to see if an
> erosion/deposition value exceeds the smoothed value by some
> user-determined amount. If it does, the smoothed value is used instead
> of the original value. The goal was to get rid of extreme values but
> leave the rest alone.

That will typically cause gradient discontinuities as you switch
between filtered and unfiltered data. That wouldn't necessarily be an
issue in itself, but it will be if you start calculating derivatives
on the result.
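
To make that concrete, here's a minimal sketch of the hard cut just
described (illustrative names, not the actual module code):

#include <math.h>

/* Hypothetical sketch of a thresholded replacement, not the actual
 * module code.  The returned value jumps between the two branches as
 * 'orig' crosses the cut, so derivatives computed from the result
 * will be discontinuous at those cells. */
double hard_clip(double orig, double smoothed, double threshold)
{
    if (fabs(orig - smoothed) > threshold)
        return smoothed;    /* extreme value: replace with smoothed */

    return orig;            /* otherwise leave the original alone */
}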

Actually, ignore my earlier suggestion of using a
median-of-filtered-values approach. That will have similar problems
regarding derivatives.

Moreover, any mechanism that has a discrete "cut" (i.e. if <criterion>
then <value1> else <value2>) will have the same issue. You need to
find some way to introduce a continuous transition. E.g. rather than
selecting filtered vs unfiltered based upon a threshold, interpolate
between filtered and unfiltered based upon the distance from the
threshold, turning the discrete step into an ogive (S-curve).
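
A minimal sketch of that idea, assuming a smoothstep weight and two
illustrative bounds t0/t1 around the threshold (none of these names
come from the module):

#include <math.h>

/* Blend the original and filtered values with a smoothstep weight
 * instead of a hard cut.  Below t0 the original is kept, above t1 the
 * filtered value is used; in between, the weight follows a
 * C1-continuous S-curve, so the output has no jumps in value or
 * slope. */
double blend(double orig, double filtered, double t0, double t1)
{
    double d = fabs(orig - filtered);
    double t;

    if (d <= t0)
        return orig;
    if (d >= t1)
        return filtered;

    t = (d - t0) / (t1 - t0);     /* normalize distance into [0,1] */
    t = t * t * (3 - 2 * t);      /* smoothstep S-curve */

    return orig * (1 - t) + filtered * t;
}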

Even then, any kind of nonlinearity risks introducing artifacts when
it occurs as part of a complex process, particularly if it's
iterative.

In general, iterative processes won't end up being stable just because
you want them to be stable. If there's any kind of "amplification"
involved, instability is likely to be the "default" behaviour;
stability has to be forced.
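
A toy illustration, with an assumed gain of 10% per pass:

#include <stdio.h>

/* Toy feedback loop: each iteration amplifies the previous state.
 * With any gain above 1 the state grows without bound; merely
 * iterating does nothing to make it settle down. */
int main(void)
{
    double x = 1.0;
    double gain = 1.1;          /* 10% amplification per iteration */
    int i;

    for (i = 0; i < 50; i++)
        x *= gain;

    printf("after 50 iterations: %g\n", x);   /* ~117.4 */

    return 0;
}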

> > BTW as I understand it median is computed by dividing the values
> > into a finite number of classes, so the median from a continuous
> > field would not change continuously from cell to cell, and the
> > effect you are getting could be expected, especially for larger
> > numbers of cells.

> I don't know how r.neighbors computes the median. However, it is
> supposed to be a value such that half the neighbor cells have larger
> values and half have smaller values. I would have thought that the way
> to compute it would be to order the neighborhood values from smallest
> to largest and find the middle. Since the neighborhood is always an
> odd number of cells, the middle is always a single cell.

This is essentially what happens, although if there are any null
cells, the median may be computed over an even number of cells,
resulting in the mean of the two middle cells. The actual
implementation (from lib/stats/c_median.c) is:

void c_median(DCELL *result, DCELL *values, int n)
{
    n = sort_cell(values, n);

    if (n < 1)
        G_set_d_null_value(result, 1);
    else
        *result = (values[(n-1)/2] + values[n/2]) / 2;
}

where sort_cell() sorts the cell values into ascending order.

[If the number of cells is odd, (n-1)/2 and n/2 will be equal, so the
middle value is averaged with itself.]
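
A standalone demonstration of the same even/odd behaviour, using plain
doubles and qsort() in place of DCELL and sort_cell():

#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;

    return (d > 0) - (d < 0);
}

int main(void)
{
    double odd[] = { 5, 1, 9, 3, 7 };   /* n = 5: single middle value */
    double even[] = { 5, 1, 9, 3 };     /* n = 4: mean of middle two  */

    qsort(odd, 5, sizeof(double), cmp_double);
    qsort(even, 4, sizeof(double), cmp_double);

    printf("%g\n", (odd[(5 - 1) / 2] + odd[5 / 2]) / 2);    /* 5 */
    printf("%g\n", (even[(4 - 1) / 2] + even[4 / 2]) / 2);  /* 4 */

    return 0;
}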

> > Mean is computed directly from the floating point values - is that
> > right?

> This ought to be the sum of the neighbor cell values divided by the
> number of cells.

> The median is less affected by a single extreme cell than the mean,
> which is why we preferred the median. However, if the median
> calculation algorithm is problematic, then we'll have to switch to
> the mean.

It's problematic if you're going to differentiate the result, because
it's a discontinuous function; it's essentially a tree of if/then/else
choices.
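
For comparison, a mean written in the same style would be a single
linear expression, so it varies continuously (and differentiably) with
its inputs. This is only a sketch of the assumed shape, not the actual
GRASS source:

/* Sketch of a c_median-style mean: sum the cells and divide by the
 * count.  Unlike the median's sort-and-select, the result is a
 * linear function of the inputs, with no if/then/else tree. */
void c_mean(DCELL *result, DCELL *values, int n)
{
    DCELL sum = 0;
    int i;

    if (n < 1) {
        G_set_d_null_value(result, 1);
        return;
    }

    for (i = 0; i < n; i++)
        sum += values[i];

    *result = sum / n;
}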

--

Glynn Clements <glynn@gclements.plus.com>