Michael Barton wrote:
> I thought that smoothing will take care of it but apparently just the
> opposite
This does seem strange. Even odder is the fact that the anomalies get bigger
the larger the smoothing neighborhood. Usually, increasing the neighborhood
creates greater smoothing.
That's the case if the neighbours are being averaged, but it doesn't
hold for a quantile.
It might be the way we are using smoothing. We check to see if an
erosion/deposition value exceeds the smoothed value by some user-determined
amount. If it does, the smoothed value is used instead of the original
value. The goal was to get rid of extreme values but leave the rest alone.
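In code terms, the scheme you describe amounts to a hard cut-off,
something like this (a rough sketch only; the function name and the
threshold parameter are mine, not the actual code):

#include <math.h>        /* fabs() */
#include <grass/gis.h>   /* DCELL (double-precision cell value) */

/* Replace a cell by its smoothed value only when it deviates from
   that value by more than a user-supplied threshold (illustrative). */
DCELL despike_hard(DCELL original, DCELL smoothed, DCELL threshold)
{
    if (fabs(original - smoothed) > threshold)
        return smoothed;    /* extreme value: use the filtered value */
    else
        return original;    /* otherwise leave the cell alone */
}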
That will typically cause gradient discontinuities as you switch
between filtered and unfiltered data. That wouldn't necessarily be an
issue in itself, but it will be if you start calculating derivatives
on the result.
Actually, ignore my earlier suggestion of using a
median-of-filtered-values approach. That will have similar problems
regarding derivatives.
Moreover, any mechanism that has a discrete "cut" (i.e. if <criterion>
then <value1> else <value2>) will have the same issue. You need to
find some way to introduce a continuous transition. E.g. rather than
selecting filtered vs unfiltered based upon a threshold, interpolate
between filtered and unfiltered based upon the distance from the
threshold, turning the discrete step into an ogive (S-curve).
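Something along these lines (again only a sketch; the ramp width, the
names and the cubic smoothstep are my own choices, not existing GRASS
code):

#include <math.h>        /* fabs() */
#include <grass/gis.h>   /* DCELL (double-precision cell value) */

/* Cubic S-curve: 0 for t <= 0, 1 for t >= 1, smooth in between. */
static DCELL smoothstep(DCELL t)
{
    if (t <= 0)
        return 0;
    if (t >= 1)
        return 1;
    return t * t * (3 - 2 * t);
}

/* Instead of switching at the threshold, blend between the original
   and filtered values over a transition band of width "ramp" (> 0)
   above the threshold. */
DCELL despike_smooth(DCELL original, DCELL filtered,
                     DCELL threshold, DCELL ramp)
{
    DCELL dev = fabs(original - filtered);
    DCELL w = smoothstep((dev - threshold) / ramp);  /* 0 = keep, 1 = filter */

    return (1 - w) * original + w * filtered;
}

That replaces the discrete step with a smooth transition: the blend
weight, and hence the output, varies continuously with how far the
cell is from the threshold.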
Even then, any kind of nonlinearity risks introducing artifacts when
it occurs as part of a complex process, particularly if it's
iterative.
In general, iterative processes won't end up being stable just because
you want them to be stable. If there's any kind of "amplification"
involved, instability is likely to be the "default" behaviour;
stability has to be forced.
> BTW as I understand it median is computed by dividing the values into
> a finite number of classes
> so the median from a continuous field would not change continuously
> from cell to cell and
> the effect you are getting could be expected especially for larger
> number of cells.
I don't know how r.neighbors computes the median. However, it is supposed to
be a value such that half the neighbor cells have larger values and half
have smaller values. I would have thought that the way to compute it would be
to order the neighborhood values from smallest to largest and find the
middle. Since the neighborhood always contains an odd number of cells, the
middle is always a single cell.
This is essentially what happens, although if there are any null
cells, the median may be computed over an even number of cells, in
which case the result is the mean of the two middle values. The actual
implementation (from lib/stats/c_median.c) is:
void c_median(DCELL *result, DCELL *values, int n)
{
    n = sort_cell(values, n);

    if (n < 1)
        G_set_d_null_value(result, 1);
    else
        *result = (values[(n-1)/2] + values[n/2]) / 2;
}
where sort_cell() sorts the cell values into ascending order.
[If the number of cells is odd, (n-1)/2 and n/2 will be equal, so the
middle value is averaged with itself.]
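For example (a standalone illustration of the index arithmetic only,
not GRASS code):

#include <stdio.h>

int main(void)
{
    /* already sorted, as sort_cell() would leave them */
    double odd[]  = {1.0, 2.0, 5.0, 7.0, 9.0};        /* n = 5 */
    double even[] = {1.0, 2.0, 5.0, 7.0, 9.0, 11.0};  /* n = 6, e.g. one null dropped */
    int n;

    n = 5;  /* (n-1)/2 == n/2 == 2: values[2] averaged with itself -> 5 */
    printf("odd  n=%d: %g\n", n, (odd[(n - 1) / 2] + odd[n / 2]) / 2);

    n = 6;  /* (n-1)/2 == 2, n/2 == 3: mean of values[2] and values[3] -> 6 */
    printf("even n=%d: %g\n", n, (even[(n - 1) / 2] + even[n / 2]) / 2);

    return 0;
}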
> Mean is computed directly from the floating point values - is that
> right?
This ought to be the sum of the neighbor cell values divided by the number
of cells.
The median is less affected by a single extreme cell than the mean, which is
why we preferred the median. However, if the median calculation algorithm is
problematic, then we'll have to switch to the mean.
It's problematic if you're going to differentiate the result, because
it's a discontinuous function; it's essentially a tree of if/then/else
choices.
--
Glynn Clements <glynn@gclements.plus.com>