[GRASS-dev] [GRASS GIS] #1401: r.series for time series

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------
The attached patch adds weighing windows to r.series. This is useful for
interpolation of NULL cells in time series data, e.g. with
{{{
r.series method=average window=gauss
}}}
which will provide more realistic results than the same command without a
weighing window.

The weighing windows are taken from r.resamp.filter with modifications to
the gauss and normal windows to make them pseudo-finite. At the moment,
only one windowing function can be selected, i.e. coupling an infinite
window with a finite window is currently not possible. In contrast to
r.resamp.filter, the sinc window will never have negative weights. In my
tests, the hann, hamming, and blackman windows seem to produce reasonable
results even without coupling with a finite window.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by glynn):

Replying to [ticket:1401 mmetz]:

> The attached patch adds weighing windows to r.series

A few points:

1. This patch places the centre of the window in the centre of the time
period; I would have expected this to be configurable. E.g. I can imagine
situations where the most recent data would have the greatest weight, with
weight decreasing as the data gets older. Unlike spatial data, time has an
inherent asymmetry between past and future.

2. I don't see the stated modifications to the gauss/normal or sinc
functions. They appear to be identical to the r.series versions. Also, I'm
not sure why those modifications would make sense; e.g. the fact that
sinc/lanczos have negative weights is one of their advantages, as it
results in sharper output than filters with only positive weights.

3. Is "radius" really the correct term for one-dimensional data?

Possible enhancements:

Allowing arbitrary weights (one per input map) to be read from a file
and/or a command line option.

Accepting a list of weight maps (one per input map), i.e. each input cell
is a value/weight pair rather than just a value.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by glynn):

Replying to [comment:1 glynn]:

> They appear to be identical to the r.series versions.

Er, I mean the r.resamp.filter versions.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by mmetz):

Replying to [comment:1 glynn]:
> Replying to [ticket:1401 mmetz]:
>
> > The attached patch adds weighing windows to r.series
>
> A few points:
>
> 1. This patch places the centre of the window in the centre of the time
period; I would have expected this to be configurable. E.g. I can imagine
situations where the most recent data would have the greatest weight, with
weight decreasing as the data gets older. Unlike spatial data, time has an
inherent asymmetry between past and future.

Sometimes yes. You could also achieve this with r.series
in=map01,map02,map03,map04,map03,map02,map01. The patch assumes, as does
the existing r.series for regression parameters, that the input is ordered
and the (time) distance between maps is constant. This can be exploited by
changing the order of the input maps. r.series
in=map01,map01,map01,map02,map02,map03,map04 should give more weight to
map01 and map02 and less to map03 and map04, effectively skewing the
weight function.
>
> 2. I don't see the stated modifications to the gauss/normal or sinc
functions. They appear to be identical to the r.series versions. Also, I'm
not sure why those modifications would make sense; e.g. the fact that
sinc/lanczos have negative weights is one of their advantages, as it
results in sharper output than filters with only positive weights.
>
The gauss/normal functions have a radius of 0 (zero) in
r.resamp.filter/main.c:L102 and L103 to indicate that they are infinite I
assume. I have changed this radius to 4 for gauss and 8 for normal since
normal = f_gauss(x/2) / 2. These functions now cover the 99.9% confidence
interval for the number of input maps (see also adjustment of the gauss
window in v.kernel).

The sinc function becomes negative for x larger than an odd multiple of PI
and smaller than the next even multiple of PI. In the patch, x is always
>= 0 && <= PI. In order to have a sharper sinc window, something like
sinc2 and sinc3 would be needed, like lanczos2 and lanczos3. Or an
additional option radius/distance.

> 3. Is "radius" really the correct term for one-dimensional data?
Not sure. It comes from cut'n paste and I kept the term to keep the code
similar. The user doesn't see the term anyway.
>
> Possible enhancements:
>
> Allowing arbitrary weights (one per input map) to be read from a file
and/or a command line option.
>
I was thinking about that too, similar to r.neighbors.

> Accepting a list of weight maps (one per input map), i.e. each input
cell is a value/weight pair rather than just a value.

Hmm, I would need a new module anyway to calculate regression parameters
between two time series. The second time series could also be treated as
weight maps if a weighed version of the selected function is available.

The current patch already makes r.series quite complicated, that's why I
did not introduce more options like skew or radius/distance or custom
weights.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by glynn):

Replying to [comment:3 mmetz]:

The more I think about it, the less sense this patch appears to make.

Filtering would make sense if you were generating a (time-domain) filtered
version of the data (with the filter radius on the order of the sample
interval) then computing the aggregate over the filtered data (i.e. a
1-dimensional analogue of r.resamp.filter as a pre-processing stemp), but
that isn't what's happening.

Computing a weighted aggregate would make sense if the weights were
inputs, e.g. "confidence" measures for individual maps or individual
cells. It might also make sense with calculated weights where the
weighting function had some theoretical basis (e.g. weight decreasing with
age). But that isn't what's happening either.

Maybe I'm missing something, but I can't see how any of the
r.resamp.filter functions actually make sense as a weighting function for
time-series data.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by mmetz):

Replying to [comment:4 glynn]:
> Replying to [comment:3 mmetz]:
>
> The more I think about it, the less sense this patch appears to make.
>
> Filtering would make sense if you were generating a (time-domain)
filtered version of the data (with the filter radius on the order of the
sample interval) then computing the aggregate over the filtered data (i.e.
a 1-dimensional analogue of r.resamp.filter as a pre-processing stemp),
but that isn't what's happening.
>
> Computing a weighted aggregate would make sense if the weights were
inputs, e.g. "confidence" measures for individual maps or individual
cells. It might also make sense with calculated weights where the
weighting function had some theoretical basis (e.g. weight decreasing with
age). But that isn't what's happening either.
>
> Maybe I'm missing something, but I can't see how any of the
r.resamp.filter functions actually make sense as a weighting function for
time-series data.

I came up with this path because I want to do NULL cell interpolation in
time. The time series data need to change gradually over time, e.g. NDVI
or temperature, not e.g. daily rainfall.

I will try to explain the motivation for this patch with an example. MODIS
16 day NDVI comes with many NULL cells. Spatial interpolation of these
NULL cells is problematic because NDVI does not necessarily change
gradually in space. NDVI changes gradually in time for most natural land
cover types if the time steps are not too large. I assume that NDVI at a
given time step will be more similar to the next or previous time step and
less similar to, say, 5 time steps before or ahead. Therefore I want to
give higher weight to time steps closer to the given time step instead of
using a simple arithmetic mean.

More accurate would probably be some interpolation method. I can not use
the Rast_interp_*() methods because they return NULL if any of the input
values is NULL, instead I would need a non-uniform method as implemented
in the RST and lidar (bspline) libs, but for 1D.

Since NDVI is cyclic, an FFT approach would also be an option, but again
FFT can not handle NULL cells, NULLs need to be replaced first with a
reasonable fill value.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by glynn):

Replying to [comment:5 mmetz]:

> I will try to explain the motivation for this patch with an example.

Okay, so you're filtering in the time domain, then taking a single sample
from the middle of the interval?

Even so, I would suggest making the "radius" (i.e. cut-off frequency) a
parameter, and probably also the centre (although that can be fudged by
padding with null maps).

In general, the cut-off frequency shouldn't depend upon the filtering
method; the reason the various filters in r.resamp.filter have associated
radii is so that the cut-off frequency doesn't vary with the filter method
(i.e. the user-supplied radius value is normalised).

Also, if you're likely to want more than one output, extending r.series to
support multiple outputs (with different offsets) would be significantly
more efficient than multiple runs of r.series (reading the inputs could
easily take more time than the actual processing).

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:6&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by mmetz):

Replying to [comment:6 glynn]:
>
> Okay, so you're filtering in the time domain, then taking a single
sample from the middle of the interval?

I calculate the weighed average for a given point in time from a given
time interval, same like r.resamp.filter calculates the weighed average
for a given point in space from a given space interval, e.g.
r.resamp.filter/main.c:L268,269,276 and w_ave() in lib/stats/c_ave.c.
r.resamp.filter could just as well use w_ave() for each dimension. In the
patch, the point in time is in the middle of the interval. Since there is
no multiple centre option (yet), I guess this is what you mean with taking
a single sample which is located in the middle of the interval.
>
> Even so, I would suggest making the "radius" (i.e. cut-off frequency) a
parameter, and probably also the centre (although that can be fudged by
padding with null maps).
>
The radius is currently determined by the number of input maps. Additional
options radius and centre would clearly speed up processing when several
different outputs are needed for different points in time. The way I
currently use the patched r.series, an option radius would only make sense
in combination with a multiple option centre. The number of outputs would
then be n_methods x n_centres.

> In general, the cut-off frequency shouldn't depend upon the filtering
method; the reason the various filters in r.resamp.filter have associated
radii is so that the cut-off frequency doesn't vary with the filter method
(i.e. the user-supplied radius value is normalised).

AFAICT, the patch does the same, the user-supplied radius is normalised.
It's not clear to me how the term frequency can be applied to a
gauss/normal distribution, for the sine and cosine-based filters this is
obvious. The associated radii for gauss and normal are chosen such that
filter(dx) is nearly 0 for dx = user-radius in order to be comparable with
lanczos, hermite or cosine-based filters where cos(normalised_dx) = 0 for
dx = user-radius. IOW, the gauss filter in r.resamp.filter sets the user-
radius to 1 SD (soft filter), and the gauss filter in the patch for
r.series sets the user-radius to 4 SD (sharper filter), granted that
f_gauss() assumes SD = 1 which I am not sure about. It looks more like SD
= 1/4, and for f_normal(), SD = 1/2 in the exponent and SD = 1 in the
square root?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by glynn):

Replying to [comment:7 mmetz]:

> I calculate the weighed average for a given point in time from a given
time interval, same like r.resamp.filter calculates the weighed average
for a given point in space from a given space interval,

In the case of r.resamp.filter, each output cell is typically calculated
based upon a small subset of the input cells, not the entire input.

> The way I currently use the patched r.series, an option radius would
only make sense in combination with a multiple option centre.

It would make sense even for a single output, as it controls how rapidly
the weight falls off with distance from the centre.

> > In general, the cut-off frequency shouldn't depend upon the filtering
method; the reason the various filters in r.resamp.filter have associated
radii is so that the cut-off frequency doesn't vary with the filter method
(i.e. the user-supplied radius value is normalised).
>
> AFAICT, the patch does the same, the user-supplied radius is normalised.

AFAICT, the code maps the input range to the width of the window function,
so wider windows will be "sharper" (have a higher cut-off frequency), e.g.
lanczos3 will be sharper than lanczos2.

> It's not clear to me how the term frequency can be applied to a
gauss/normal distribution,

r.resamp.filter implements FIR (finite impulse response) filtering (you
get an error if you don't have at least one finite window function). If
you calculate the DFT of any of the window functions, you'll get its
frequency response. All of the functions are low-pass filters, as they're
symmetric.

See http://en.wikipedia.org/wiki/Window_function for examples of common
window functions and their frequency responses.

A piecewise-continuous function defined by sampled data can be considered
a mixture (sum) of the underlying signal and quantisation noise. The
intent of a low pass filter is to discard the quantisation noise while
retaining the signal.

The cut-off frequency is normally chosen according to the sampling
frequency, as the quantisation noise is dominated by the sampling
frequency and its harmonics. In general, the cut-off frequency is
inversely proportional to the width of the central "lobe" of the window
function.

If you run r.resamp.filter with a specific radius, you get a specific cut-
off frequency regardless of the method chosen. So while lanczos3 uses 3
times as large a window as lanczos1, the cut-off frequency remains the
same. This is what I mean when I say that the radius is "normalised".

OTOH, if you map the range of inputs to the range of the filter, "long-
tailed" filters with a wider domain (e.g. lanczos3) will have a narrower
central lobe and thus a higher cut-off frequency.

The way that Lanczos filters are defined, the number of samples is
supposed to be proportional to the order ("a" parameter), so lanczos3
should use 3 times as many samples (at the same sampling frequency, i.e.
cover 3 times as large a time interval) as lanczos1 in order to get a
similar frequency response (higher-order filters will fall off faster, but
the frequency at which the fall-off starts should be the same).

See http://en.wikipedia.org/wiki/File:Lanczos-kernel.svg for an
illustration. If both graphs were drawn on the same axes, they would have
roughly the same shape, but the a=3 window would have a longer tail. By
scaling the axes to the same width, the a=3 window has a narrower central
lobe.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:8&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by mmetz):

Replying to [comment:8 glynn]:
> Replying to [comment:7 mmetz]:
>
> > The way I currently use the patched r.series, an option radius would
only make sense in combination with a multiple option centre.
>
> It would make sense even for a single output, as it controls how rapidly
the weight falls off with distance from the centre.
>

I was reading up a bit and thinking about it. As I understand it, the
fundamental difference between r.resamp.filter and the patched r.series is
that for r.resamp.filter, the window size is a function of the user-given
radius and the chosen (combination of) filtering window(s), whereas for
r.series, the radius is a function of the chosen window size (number of
input maps) and the filtering window. The reason for my implementation is
a simpler user-interface. Otherwise, i.e. with a required option radius,
warning/error messages are needed if the number of input maps exceeds or
is smaller than the window size. And I wanted an easy way to test various
filters with differing sharpness (e.g. lanczos1 vs. lanczos3) using the
same window size.

In short, I think that a user-controlled window size results in a simpler
user-interface, but an option radius would make the patch more versatile
and is more in line with the concept of FIR filtering.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:9&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
---------------------------------+------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: raster, time series | Platform: All
      Cpu: All |
---------------------------------+------------------------------------------

Comment(by glynn):

Replying to [comment:9 mmetz]:

> I was reading up a bit and thinking about it. As I understand it, the
fundamental difference between r.resamp.filter and the patched r.series is
that for r.resamp.filter, the window size is a function of the user-given
radius and the chosen (combination of) filtering window(s), whereas for
r.series, the radius is a function of the chosen window size (number of
input maps) and the filtering window.

A more fundamental difference is that r.resamp.filter performs convolution
(i.e. a weighted sum is calculated for every cell), while r.series
calculates a single value. The main reason why r.resamp.filter raises an
error for an unbounded window is that it's almost certainly a user error.
A secondary reason is that it's ridiculously slow; if there was a use for
such cases, an FFT-based implementation would be needed.

In theory, convolution calculates an unbounded integral (or, in the
discrete case, an infinite sum). The window size is merely an optimisation
(there's no point in summing an infinite number of zeros), and shouldn't
affect the result.

> The reason for my implementation is a simpler user-interface. Otherwise,
i.e. with a required option radius, warning/error messages are needed if
the number of input maps exceeds or is smaller than the window size.

It's perfectly reasonable for the domain of the filter function to extend
beyond the set of input maps, particularly for a "wide" filter such as
lanczos3 or gauss. The correct solution is to simply clip the kernel to
the window (and calculate the divisor accordingly), not to "squash" the
kernel to fit the input range.

> And I wanted an easy way to test various filters with differing
sharpness (e.g. lanczos1 vs. lanczos3) using the same window size.

The various Lanczos filters should all have the same sharpness; higher-
order filters should just be more accurate at the cost of performance.
That is why the width of the filter's domain and the number of input
samples shouldn't affect the result.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1401#comment:10&gt;
GRASS GIS <http://grass.osgeo.org>

#1401: r.series for time series
-------------------------------------------+--------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: r.series, raster, time series | Platform: All
      Cpu: All |
-------------------------------------------+--------------------------------
Changes (by neteler):

  * keywords: raster, time series => r.series, raster, time series

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1401#comment:11&gt;
GRASS GIS <http://grass.osgeo.org>