[GRASS-dev] r.series: threshold/count method?

Hi,

it would be ideal to have a new threshold/count method in r.series.
Example: I am calculating growing degree days (accumulated temperatures
over one year, [1]) and want to know at which day of year (count) this
happens. Input would be 365 growing degree days maps, parameters
a threshold value (e.g. 440 for insect moulting) and the result a
count value per pixel which corresponds to the map number which
I gave as input (which is the day of year DOY in this case).

Would that be hard to implement? A bit of rounding/epsilon will be
involved to match the requested threshold since the input maps
are typically FP maps.

I can imagine more use cases: first frost day in autumn (theshold -0.1°C
or so), input daily minimum temperature maps and more...

Markus

[1] http://en.wikipedia.org/wiki/Growing_degree_day

On Fri, Aug 28, 2009 at 10:47 PM, Markus Neteler<neteler@osgeo.org> wrote:

Hi,

it would be ideal to have a new threshold/count method in r.series.
Example: I am calculating growing degree days (accumulated temperatures
over one year, [1]) and want to know at which day of year (count) this
happens. Input would be 365 growing degree days maps, parameters
a threshold value (e.g. 440 for insect moulting) and the result a
count value per pixel which corresponds to the map number which
I gave as input (which is the day of year DOY in this case).

Would that be hard to implement? A bit of rounding/epsilon will be
involved to match the requested threshold since the input maps
are typically FP maps.

I can imagine more use cases: first frost day in autumn (theshold -0.1°C
or so), input daily minimum temperature maps and more...

Markus

[1] http://en.wikipedia.org/wiki/Growing_degree_day

Attached an attempt for review.
My problem is how to pass the threshold value into the function
(currently hardcoded sample value).

Markus

(attachments)

c_thresh.c (765 Bytes)
thresh.diff (981 Bytes)

On Sun, Aug 30, 2009 at 9:05 PM, Markus Neteler<neteler@osgeo.org> wrote:

On Fri, Aug 28, 2009 at 10:47 PM, Markus Neteler<neteler@osgeo.org> wrote:

Hi,

it would be ideal to have a new threshold/count method in r.series.
Example: I am calculating growing degree days (accumulated temperatures
over one year, [1]) and want to know at which day of year (count) this
happens. Input would be 365 growing degree days maps, parameters
a threshold value (e.g. 440 for insect moulting) and the result a
count value per pixel which corresponds to the map number which
I gave as input (which is the day of year DOY in this case).

Would that be hard to implement? A bit of rounding/epsilon will be
involved to match the requested threshold since the input maps
are typically FP maps.

I can imagine more use cases: first frost day in autumn (theshold -0.1°C
or so), input daily minimum temperature maps and more...

Markus

[1] http://en.wikipedia.org/wiki/Growing_degree_day

Attached an attempt for review.
My problem is how to pass the threshold value into the function

It seems that adding a new parameter would break the internal API.
Hope that I am wrong...

Markus

(currently hardcoded sample value).

Markus

Markus Neteler wrote:

> Attached an attempt for review.
> My problem is how to pass the threshold value into the function

It seems that adding a new parameter would break the internal API.
Hope that I am wrong...

In 7.0, the prototype for aggregates has recently been changed to
accept an additional parameter of type "const void *". r.series uses
this to implement the quantile= option for method=quantile.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

Markus Neteler wrote:
   

Attached an attempt for review.
My problem is how to pass the threshold value into the function
       

It seems that adding a new parameter would break the internal API.
Hope that I am wrong...
     
In 7.0, the prototype for aggregates has recently been changed to
accept an additional parameter of type "const void *". r.series uses
this to implement the quantile= option for method=quantile.
   

Ah, thanks - I didn't realize the difference. This will help.

Markus

On Thu, Sep 3, 2009 at 11:47 PM, Markus Neteler<neteler.osgeo@gmail.com> wrote:

Glynn Clements wrote:

Markus Neteler wrote:

In 7.0, the prototype for aggregates has recently been changed to
accept an additional parameter of type "const void *". r.series uses
this to implement the quantile= option for method=quantile.

Ah, thanks - I didn't realize the difference. This will help.

It does.

Since I eventually need it in (locally) 6.4, I have backported
"arbitrary quantiles" from trunk.

I have now added the new threshold method (locally):

   quantile Quantile to calculate for method=quantile
              options: 0.0-1.0
  threshold Threshold to calculate for method=threshold

but need to pass the value to the function. This happens for quantile here
in main.c from r.series:

   240 if (null && flag.nulls->answer)
   241 G_set_d_null_value(&out->buf[col], 1);
   242 else {
   243 memcpy(values_tmp, values, num_inputs *
sizeof(DCELL));
   244 (*out->method_fn)(&out->buf[col],
values_tmp, num_inputs, &out->quantile);
   245 }

Since I am getting lost in the pointers, how to pass '&out->threshold'
instead of
&out->quantile when I use the threshold method? I guess I don't want to add
more parameters to line 244...

Some condition upon presence of quantile/threshold is IMHO needed...

Markus

Markus Neteler wrote:

>> In 7.0, the prototype for aggregates has recently been changed to
>> accept an additional parameter of type "const void *". r.series uses
>> this to implement the quantile= option for method=quantile.
>>
>
> Ah, thanks - I didn't realize the difference. This will help.

It does.

Since I eventually need it in (locally) 6.4, I have backported
"arbitrary quantiles" from trunk.

I have now added the new threshold method (locally):

   quantile Quantile to calculate for method=quantile
              options: 0.0-1.0
  threshold Threshold to calculate for method=threshold

but need to pass the value to the function. This happens for quantile here
in main.c from r.series:

   240 if (null && flag.nulls->answer)
   241 G_set_d_null_value(&out->buf[col], 1);
   242 else {
   243 memcpy(values_tmp, values, num_inputs * sizeof(DCELL));
   244 (*out->method_fn)(&out->buf[col], values_tmp, num_inputs, &out->quantile);
   245 }

Since I am getting lost in the pointers, how to pass '&out->threshold'
instead of
&out->quantile when I use the threshold method? I guess I don't want to add
more parameters to line 244...

I would have renamed the quantile= option to parameter= rather than
having a separate threshold= option, and done likewise for the
"quantile" field in struct output.

But if additional aggregates start getting more complex parameters
(e.g. multiple parameters or different types), the more general
solution is to add a "const void *parameter" to struct output, which
is set when the structure is initialised.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Sep 5, 2009 at 8:02 PM, Glynn Clements<glynn@gclements.plus.com> wrote:

Markus Neteler wrote:

>> In 7.0, the prototype for aggregates has recently been changed to
>> accept an additional parameter of type "const void *". r.series uses
>> this to implement the quantile= option for method=quantile.
>>
>
> Ah, thanks - I didn't realize the difference. This will help.

It does.

Since I eventually need it in (locally) 6.4, I have backported
"arbitrary quantiles" from trunk.

I have now added the new threshold method (locally):

quantile Quantile to calculate for method=quantile
options: 0.0-1.0
threshold Threshold to calculate for method=threshold

but need to pass the value to the function. This happens for quantile here
in main.c from r.series:

240 if (null && flag.nulls->answer)
241 G_set_d_null_value(&out->buf[col], 1);
242 else {
243 memcpy(values_tmp, values, num_inputs * sizeof(DCELL));
244 (*out->method_fn)(&out->buf[col], values_tmp, num_inputs, &out->quantile);
245 }

Since I am getting lost in the pointers, how to pass '&out->threshold'
instead of
&out->quantile when I use the threshold method? I guess I don't want to add
more parameters to line 244...

I would have renamed the quantile= option to parameter= rather than
having a separate threshold= option, and done likewise for the
"quantile" field in struct output.

Sounds good, I am happy about suggestions.
Drawback: then currently enforced 0.-1. range for quantiles
would be removed. Maybe a minor issue or to be tested in
the c_percentile.c file.

But if additional aggregates start getting more complex parameters
(e.g. multiple parameters or different types), the more general
solution is to add a "const void *parameter" to struct output, which
is set when the structure is initialised.

I have an extra issue: For the particular growing degree day calculation
I am facing the problem that the values I am comparing with a FP values
and that GRASS_EPSILON is useless in this particular case. It works
only with

/* EPSILON = GRASS_EPSILON; */
  EPSILON = 10.
  if (fabs(tval - values[i]) < EPSILON ) {

because the pixel values are growing so quickly from one map to the
next. Since we need a generic solution, another "precision" or whatever
parameter needs to be passed otherwise the thresholding fails completely.
EPSILON however depends on the input maps and needs to be user
controllable.

How to deal with that?

Markus

Markus Neteler wrote:

I have an extra issue: For the particular growing degree day calculation
I am facing the problem that the values I am comparing with a FP values
and that GRASS_EPSILON is useless in this particular case. It works
only with

/* EPSILON = GRASS_EPSILON; */
  EPSILON = 10.
  if (fabs(tval - values[i]) < EPSILON ) {

because the pixel values are growing so quickly from one map to the
next. Since we need a generic solution, another "precision" or whatever
parameter needs to be passed otherwise the thresholding fails completely.
EPSILON however depends on the input maps and needs to be user
controllable.

How to deal with that?

What exactly are you trying to measure?

From your original post, it sounds like you have a monotonically

increasing input, and you want the time (index) at which the value
first exceeds the threshold, e.g.:

  for (i = 0; i < n; i++) {
      if (values[i] >= threshold) {
          *result = (DCELL) i;
          return;
      }
  }
  Rast_set_d_null_value(result, 1);

If you need an approximate equality comparison, the usual solution is
to use the ratio of the difference to one of the values, e.g.:

  double EPSILON = 1e-6; /* 1 ppm */
  if (fabs(values[i] - threshold) < EPSILON * threshold)
    ...

--
Glynn Clements <glynn@gclements.plus.com>

On Mon, Sep 7, 2009 at 11:49 PM, Glynn Clements<glynn@gclements.plus.com> wrote:

Markus Neteler wrote:

I have an extra issue: For the particular growing degree day calculation
I am facing the problem that the values I am comparing with a FP values
and that GRASS_EPSILON is useless in this particular case. It works
only with

/* EPSILON = GRASS_EPSILON; */
EPSILON = 10.
if (fabs(tval - values[i]) < EPSILON ) {

because the pixel values are growing so quickly from one map to the
next. Since we need a generic solution, another "precision" or whatever
parameter needs to be passed otherwise the thresholding fails completely.
EPSILON however depends on the input maps and needs to be user
controllable.

How to deal with that?

What exactly are you trying to measure?

From your original post, it sounds like you have a monotonically
increasing input,

Here an example (doy = day of year; gdd = growing degree days):

"doy","gdd"
1,0.0
2,0.0
3,0.0
4,0.0
...
44,3.4
45,3.4
46,4.5
47,6.3
...
99,164.6
100,170.3
101,175.4
102,178.5
...
193,997.8
194,1008.4
195,1018.3
196,1026.9
...
363,1980.1
364,1980.1
365,1980.1

it is a kind of sigmoidal function which differs from year to year and
pixel stack to pixel stack.

and you want the time (index) at which the value
first exceeds the threshold, e.g.:

   for \(i = 0; i &lt; n; i\+\+\) \{
       if \(values\[i\] &gt;= threshold\) \{
           \*result = \(DCELL\) i;
           return;
       \}
   \}
   Rast\_set\_d\_null\_value\(result, 1\);

If you need an approximate equality comparison, the usual solution is
to use the ratio of the difference to one of the values, e.g.:

   double EPSILON = 1e\-6; /\* 1 ppm \*/
   if \(fabs\(values\[i\] \- threshold\) &lt; EPSILON \* threshold\)
           \.\.\.

The problem is the non-linear increase here, I guess: If I would check
for 1000 GDD in above example, I have the input

...
193,997.8
194,1008.4
...

but get
EPSILON = 1e-6

abs(997 - 1000) < EPSILON * 1000

[1] FALSE

abs(1008.4 - 1000) < EPSILON * 1000

[1] FALSE

and still won't find it.

Markus

Markus:

The problem is the non-linear increase here, I guess: If I
would check for 1000 GDD in above example, I have the input

...
193,997.8
194,1008.4
...

but get
EPSILON = 1e-6
> abs(997 - 1000) < EPSILON * 1000
[1] FALSE
> abs(1008.4 - 1000) < EPSILON * 1000
[1] FALSE

and still won't find it.

the correct output here is "194", yes? ie the first day to cross that
threshold? Can you guarantee that the input data array is sorted?

why worry about EPSILON at all? If sorted:

for(i=0; i < data.n; i++)
{
   if ( data.val[i] > threshold )
      return data.day[i];
}

If not sorted just run through the whole loop and

first_day=366;
for (day)
{
if ((val > thresh) && (day < first_day))
    first_day = day;
}
return first_day;

?
Hamish

On Tue, Sep 8, 2009 at 8:05 AM, Hamish<hamish_b@yahoo.com> wrote:

Markus:

The problem is the non-linear increase here, I guess: If I
would check for 1000 GDD in above example, I have the input

...
193,997.8
194,1008.4
...

but get
EPSILON = 1e-6
> abs(997 - 1000) < EPSILON * 1000
[1] FALSE
> abs(1008.4 - 1000) < EPSILON * 1000
[1] FALSE

and still won't find it.

the correct output here is "194", yes?

Yes.

ie the first day to cross that threshold?

Yes.

Can you guarantee that the input data array is sorted?

Yes, because it is an accumulation function.

why worry about EPSILON at all? If sorted:

for(i=0; i < data.n; i++)
{
if ( data.val[i] > threshold )
return data.day[i];
}

Perhaps you are right...

If not sorted just run through the whole loop and

first_day=366;
for (day)
{
if ((val > thresh) && (day < first_day))
first_day = day;
}
return first_day;

Since the threshold function should be a generic one, I had
implemented this EPSILON stuff which then effectively causes
problems in this particular case. Maybe you code snippet is more
generic...

Markus