[GRASS-dev] [GRASS GIS] #2466: r.param.scale slow on Windows when there is no data in raster

#2466: r.param.scale slow on Windows when there is no data in raster
--------------------------------+-------------------------------------------
Reporter: annakrat | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Keywords: r.param.scale, nan | Platform: MSWindows 8
      Cpu: Unspecified |
--------------------------------+-------------------------------------------
When I run r.param.scale on Windows with raster map with a lot of no data,
the computation is significantly slower (90 s on Ubuntu, 50 min on
Windows). r.param.scale doesn't propagate nulls. I read somewhere that
computing with nan can be slower on some architectures, could this be the
reason?

I attached a patch to propagate nans. I also realized that at one point
during preprocessing, there are large numbers computed (especially with
big cell size and high window size), so I thought double would be more
appropriate.

If there are no objection, I would commit it to test it on Windows.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2466&gt;
GRASS GIS <http://grass.osgeo.org>

#2466: r.param.scale slow on Windows when there is no data in raster
--------------------------------+-------------------------------------------
Reporter: annakrat | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Keywords: r.param.scale, nan | Platform: MSWindows 8
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [ticket:2466 annakrat]:

> When I run r.param.scale on Windows with raster map with a lot of no
data, the computation is significantly slower (90 s on Ubuntu, 50 min on
Windows). r.param.scale doesn't propagate nulls. I read somewhere that
computing with nan can be slower on some architectures, could this be the
reason?

Normal floating-point operations will be handled directly by the CPU,
without the OS getting involved. But it's possible that converting NaN to
integer raises an exception which must be handled by the OS.

Aside from performance, it's not guaranteed that converting a NaN to an
integer will produce the integer null value (-2^31^), although that
happens to be the case on Linux/x86-64.

Code which converts potentially-null values needs to handle nulls
explicitly, e.g.
{{{
double d = ...;
int x;

if (Rast_is_d_null_value(&d))
     Rast_set_c_null_value(&x, 1);
else
     x = (CELL)floor(d);
}}}

Also, conversion should use normally use floor, ceil or rounding
(floor(x+0.5)). A C cast rounds toward zero (i.e. downward for positive
values, upward for negative values), which is rarely the correct choice.

> I attached a patch to propagate nans. I also realized that at one point
during preprocessing, there are large numbers computed (especially with
big cell size and high window size), so I thought double would be more
appropriate.

Are these values ever converted to integer? If so, conversion of out-of-
range values needs to be handled explicitly.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2466#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#2466: r.param.scale slow on Windows when there is no data in raster
--------------------------------+-------------------------------------------
Reporter: annakrat | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Keywords: r.param.scale, nan | Platform: MSWindows 8
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by annakrat):

Replying to [comment:1 glynn]:
> Replying to [ticket:2466 annakrat]:
>
> > When I run r.param.scale on Windows with raster map with a lot of no
data, the computation is significantly slower (90 s on Ubuntu, 50 min on
Windows). r.param.scale doesn't propagate nulls. I read somewhere that
computing with nan can be slower on some architectures, could this be the
reason?
>
> Normal floating-point operations will be handled directly by the CPU,
without the OS getting involved. But it's possible that converting NaN to
integer raises an exception which must be handled by the OS.
>
> Aside from performance, it's not guaranteed that converting a NaN to an
integer will produce the integer null value (-2^31^), although that
happens to be the case on Linux/x86-64.
>
> Code which converts potentially-null values needs to handle nulls
explicitly, e.g.
> {{{
> double d = ...;
> int x;
>
> if (Rast_is_d_null_value(&d))
> Rast_set_c_null_value(&x, 1);
> else
> x = (CELL)floor(d);
> }}}
>
> Also, conversion should use normally use floor, ceil or rounding
(floor(x+0.5)). A C cast rounds toward zero (i.e. downward for positive
values, upward for negative values), which is rarely the correct choice.

It's not clear to me, what the algorithm should do when there is a null
value at one of the cells in the search window. It could just skip the
entire computation and write null in the central cell. This would be
probably the most correct option but it results in null edges. Or we could
assign 0 values but then the result on edges would be distorted by the
zeros. Which approach should we implement?

>
> > I attached a patch to propagate nans. I also realized that at one
point during preprocessing, there are large numbers computed (especially
with big cell size and high window size), so I thought double would be
more appropriate.
>
> Are these values ever converted to integer? If so, conversion of out-of-
range values needs to be handled explicitly.

I don't think so.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2466#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#2466: r.param.scale slow on Windows when there is no data in raster
--------------------------------+-------------------------------------------
Reporter: annakrat | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Keywords: r.param.scale, nan | Platform: MSWindows 8
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [comment:2 annakrat]:

> It's not clear to me, what the algorithm should do when there is a null
value at one of the cells in the search window. It could just skip the
entire computation and write null in the central cell. This would be
probably the most correct option but it results in null edges. Or we could
assign 0 values but then the result on edges would be distorted by the
zeros. Which approach should we implement?

AFAICT, you either need to accept that nulls will propagate (i.e. if any
cell in the window is null, the result will be null), or replace the
least-squares fitting algorithm with something which can handle nulls.

FWIW, that module dates back to at least GRASS 4.x, when neither nulls nor
floating-point existed.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2466#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#2466: r.param.scale slow on Windows when there is no data in raster
--------------------------------+-------------------------------------------
Reporter: annakrat | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Keywords: r.param.scale, nan | Platform: MSWindows 8
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by annakrat):

Replying to [comment:3 glynn]:
> AFAICT, you either need to accept that nulls will propagate (i.e. if any
cell in the window is null, the result will be null), or replace the
least-squares fitting algorithm with something which can handle nulls.

I changed it so that nulls propagate (r62648). I will test on Windows if
there will be any speed difference.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2466#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#2466: r.param.scale slow on Windows when there is no data in raster
--------------------------+-------------------------------------------------
  Reporter: annakrat | Owner: grass-dev@…
      Type: defect | Status: closed
  Priority: normal | Milestone: 7.0.0
Component: Raster | Version: unspecified
Resolution: fixed | Keywords: r.param.scale, nan
  Platform: MSWindows 8 | Cpu: Unspecified
--------------------------+-------------------------------------------------
Changes (by annakrat):

  * status: new => closed
  * resolution: => fixed

Comment:

Replying to [comment:4 annakrat]:
> Replying to [comment:3 glynn]:
> > AFAICT, you either need to accept that nulls will propagate (i.e. if
any cell in the window is null, the result will be null), or replace the
least-squares fitting algorithm with something which can handle nulls.
>
> I changed it so that nulls propagate (r62648). I will test on Windows if
there will be any speed difference.

Tested on Windows, it is running fast now. Backported in r62692. Closing
ticket.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2466#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>