[GRASS-dev] r.statistics limitation to CELL

Hi,

would it be much work to fix this:

GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
               cover=elevation out=elevstats_avg method=average
ERROR: This module currently only works for integer (CELL) maps

Rounding elevation to CELL first isn't a great option.

Markus

Markus Neteler wrote:

would it be much work to fix this:

GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
               cover=elevation out=elevstats_avg method=average
ERROR: This module currently only works for integer (CELL) maps

Rounding elevation to CELL first isn't a great option.

1. r.statistics works by reclassing the base map, so the base map
can't be FP.

2. r.statistics uses r.stats to calculate the statistics, and r.stats
reads its inputs as CELL.

r.stats is inherently based upon discrete categories. Even if it reads
FP maps as FP, you would need to quantise the values, otherwise you
would end up with every cell as its own category with count == 1. This
would require memory proportional to the size of the input map
multiplied by a significant factor (48-64 bytes per cell, or even
more).
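
A minimal sketch of what such quantisation might look like (illustrative
only, not the actual r.stats code; the function and its parameters are
made up): each FP value is mapped into one of nsteps bins between the
map's minimum and maximum, so the number of categories stays bounded:

    #include <math.h>

    /* Map an FP cell value into one of nsteps discrete bins. */
    static int quantise(double value, double min, double max, int nsteps)
    {
        int bin = (int)floor((value - min) / (max - min) * nsteps);

        if (bin < 0)
            bin = 0;
        if (bin >= nsteps)      /* value == max falls into the last bin */
            bin = nsteps - 1;

        return bin;
    }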

To handle FP data, you really need a completely new approach which
computes aggregates incrementally, using an accumulator. This would
limit it to aggregates which can be computed that way, e.g. count,
sum, mean, variance and standard deviation.

[The last two would need to either use the one-pass algorithm (which
can result in negative variance for near-constant data due to rounding
error), or use two passes (computing the mean in the first pass so
that the actual deviations can be used in the second pass). See also:
the history of r.univar.]
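
As a rough sketch of that kind of accumulator (not existing GRASS code;
all names here are made up): a struct of running totals is enough for
count, sum, mean, minimum and maximum, and the two variance strategies
look roughly like this:

    /* Running totals: constant memory, one update per cell read. */
    struct accum {
        long   count;
        double sum;
        double sum_sq;          /* only needed for the one-pass variance */
        double min, max;
    };

    static void accum_add(struct accum *a, double x)
    {
        if (a->count == 0 || x < a->min)
            a->min = x;
        if (a->count == 0 || x > a->max)
            a->max = x;
        a->count++;
        a->sum += x;
        a->sum_sq += x * x;
    }

    /* One pass: E[x^2] - E[x]^2. Cancellation between two nearly equal
     * terms can make this slightly negative for near-constant data. */
    static double var_one_pass(const struct accum *a)
    {
        double mean = a->sum / a->count;

        return a->sum_sq / a->count - mean * mean;
    }

    /* Two passes: sum squared deviations from the already-known mean.
     * The second pass is shown over an in-memory array for brevity;
     * a module would instead re-read the map row by row. */
    static double var_two_pass(const double *x, long n, double mean)
    {
        double ss = 0.0;
        long i;

        for (i = 0; i < n; i++)
            ss += (x[i] - mean) * (x[i] - mean);

        return ss / n;
    }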

As I've mentioned several times before, computing quantiles (e.g.
median) of large amounts of floating-point data is an open-ended
problem; any given approach has both pros and cons.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Jun 02, 2007 at 01:00:12PM +0100, Glynn Clements wrote:

Markus Neteler wrote:

> would it be much work to fix this:
>
> GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
> cover=elevation out=elevstats_avg method=average
> ERROR: This module currently only works for integer (CELL) maps
>
> Rounding elevation to CELL first isn't a great option.

1. r.statistics works by reclassing the base map, so the base map
can't be FP.

In this case I meant the cover map "elevation" which is rejected.
landuse96_28m is a CELL map, elevation FCELL.

2. r.statistics uses r.stats to calculate the statistics, and r.stats
reads its inputs as CELL.

In this case, r.statistics could also accept an FCELL map
without complaining? Currently I need extra steps to round
elevation to a CELL map before running r.statistics.

r.stats is inherently based upon discrete categories. Even if it reads
FP maps as FP, you would need to quantise the values, otherwise you
would end up with every cell as its own category with count == 1. This
would require memory proportional to the size of the input map
multiplied by a significant factor (48-64 bytes per cell, or even
more).

To handle FP data, you really need a completely new approach which
computes aggregates incrementally, using an accumulator. This would
limit it to aggregates which can be computed that way, e.g. count,
sum, mean, variance and standard deviation.

I vaguely remember that Soeren already has something in the works.

[The last two would need to either use the one-pass algorithm (which
can result in negative variance for near-constant data due to rounding
error), or use two passes (computing the mean in the first pass so
that the actual deviations can be used in the second pass). See also:
the history of r.univar.]

As I've mentioned several times before, computing quantiles (e.g.
median) of large amounts of floating-point data is an open-ended
problem; any given approach has both pros and cons.

--
Glynn Clements <glynn@gclements.plus.com>

Markus

Hi,

Markus Neteler schrieb:

On Sat, Jun 02, 2007 at 01:00:12PM +0100, Glynn Clements wrote:

Markus Neteler wrote:

would it be much work to fix this:

GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
               cover=elevation out=elevstats_avg method=average
ERROR: This module currently only works for integer (CELL) maps

Rounding elevation to CELL first isn't a great option.

1. r.statistics works by reclassing the base map, so the base map
can't be FP.

In this case I meant the cover map "elevation" which is rejected.
landuse96_28m is a CELL map, elevation FCELL.

2. r.statistics uses r.stats to calculate the statistics, and r.stats
reads its inputs as CELL.

In this case, r.statistics could also accept an FCELL map
without complaining? Currently I need extra steps to round
elevation to a CELL map before running r.statistics.

r.stats is inherently based upon discrete categories. Even if it reads
FP maps as FP, you would need to quantise the values, otherwise you
would end up with every cell as its own category with count == 1. This
would require memory proportional to the size of the input map
multiplied by a significant factor (48-64 bytes per cell, or even
more).

To handle FP data, you really need a completely new approach which
computes aggregates incrementally, using an accumulator. This would
limit it to aggregates which can be computed that way, e.g. count,
sum, mean, variance and standard deviation.

I vaguely remember that Soeren already has something in the works.

I have implemented r3.stats from scratch to handle fp ranges:

GRASS 6.3.cvs > r3.stats help

Description:
  Generates volume statistics for raster3d maps.

Keywords:
  raster3d, statistics

Usage:
  r3.stats [-e] input=name [nsteps=value] [--verbose] [--quiet]

Flags:
   -e Calculate statistics based on equal value groups
  --v Verbose module output
  --q Quiet module output

Parameters:
    input Name of input raster3d map
   nsteps Number of sub-ranges to collect stats from
            default: 20

Example:

#region
north: 1000
south: 0
west: 0
east: 2000
top: 10.00000000
bottom: 0.00000000
nsres: 50
nsres3: 50
ewres: 50
ewres3: 50
tbres: 1
rows: 20
rows3: 20
cols: 40
cols3: 40
depths: 10
cells: 800
3dcells: 8000

# create a 3d map
r3.mapcalc "map3d = depth()"

# automatically calculated value range sub-groups

GRASS 6.3.cvs > r3.stats map3d nsteps=5
  100%
   num | minimum <= value | value < maximum | volume | perc | cell count
       1 1.000000000 2.800000000 4000000.000 20.00000 1600
       2 2.800000000 4.600000000 4000000.000 20.00000 1600
       3 4.600000000 6.400000000 4000000.000 20.00000 1600
       4 6.400000000 8.200000000 4000000.000 20.00000 1600
       5 8.200000000 10.000000001 4000000.000 20.00000 1600
       6 * * 0.000 0.00000 0

Sum of non Null cells:
         Volume = 20000000.000
         Percentage = 100.000
         Cell count = 8000

Sum of all cells:
         Volume = 20000000.000
         Percentage = 100.000
         Cell count = 8000

# groups of equal values
GRASS 6.3.cvs > r3.stats -e map3d
  100%
Sort non-null values
   num | value | volume | perc | cell count
       1 1.000000 2000000.000 10.00000 800
       2 2.000000 2000000.000 10.00000 800
       3 3.000000 2000000.000 10.00000 800
       4 4.000000 2000000.000 10.00000 800
       5 5.000000 2000000.000 10.00000 800
       6 6.000000 2000000.000 10.00000 800
       7 7.000000 2000000.000 10.00000 800
       8 8.000000 2000000.000 10.00000 800
       9 9.000000 2000000.000 10.00000 800
      10 10.000000 2000000.000 10.00000 800
      11 * 0.000 0.00000 0

Number of groups with equal values: 10
Sum of non Null cells:
         Volume = 20000000.000
         Percentage = 100.000
         Cell count = 8000

Sum of all cells:
         Volume = 20000000.000
         Percentage = 100.000
         Cell count = 8000

The approach is based on a binary search tree algorithm.
The memory consumption increases with the number of sub-ranges or
the number of groups of equal values.

The map does not need to be read completely into memory.
AFAICT the sub-range search algorithm has O(n log(n)) complexity.
I guess the equal-value computation can be improved, because the
computation time increases with the number of detected equal-value groups.
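
For illustration only (this is a sketch of the idea, not the actual
r3.stats source): a binary search tree keyed by cell value, with a
counter per node, needs one node per distinct value or sub-range rather
than one per cell, and the cells can be fed in one at a time:

    #include <stdlib.h>

    struct node {
        double value;                 /* group key */
        long   count;                 /* cells seen with this value */
        struct node *left, *right;
    };

    static struct node *tree_insert(struct node *root, double value)
    {
        if (root == NULL) {
            root = malloc(sizeof(struct node));
            root->value = value;
            root->count = 1;
            root->left = root->right = NULL;
        }
        else if (value < root->value)
            root->left = tree_insert(root->left, value);
        else if (value > root->value)
            root->right = tree_insert(root->right, value);
        else
            root->count++;

        return root;
    }

Each insertion is O(log n) on average, which matches the O(n log(n))
estimate above.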

Sören

[The last two would need to either use the one-pass algorithm (which
can result in negative variance for near-constant data due to rounding
error), or use two passes (computing the mean in the first pass so
that the actual deviations can be used in the second pass). See also:
the history of r.univar.]

As I've mentioned several times before, computing quantiles (e.g. median) of large amounts of floating-point data is an open-ended
problem; any given approach has both pros and cons.

--
Glynn Clements <glynn@gclements.plus.com>

Markus


Markus Neteler wrote:

> > would it be much work to fix this:
> >
> > GRASS 6.3.cvs (nc_spm_05):~ > r.statistics base=landuse96_28m \
> > cover=elevation out=elevstats_avg method=average
> > ERROR: This module currently only works for integer (CELL) maps
> >
> > Rounding elevation to CELL first isn't a great option.
>
> 1. r.statistics works by reclassing the base map, so the base map
> can't be FP.

In this case I meant the cover map "elevation" which is rejected.
landuse96_28m is a CELL map, elevation FCELL.

> 2. r.statistics uses r.stats to calculate the statistics, and r.stats
> reads its inputs as CELL.

In this case, r.statistics could also accept an FCELL map
without complaining? Currently I need extra steps to round
elevation to a CELL map before running r.statistics.

You should be able to simply remove the check at
raster/r.statistics/main.c:83:

    if( G_raster_map_is_fp(covermap->answer, mapset) != 0 )
        G_fatal_error (_("This module currently only works for integer (CELL) maps"));

Or you might want to replace the error with a warning that the result
is inaccurate.
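
For example, the warning variant might look roughly like this (just a
sketch, not a committed patch; the message wording is made up):

    if (G_raster_map_is_fp(covermap->answer, mapset) != 0)
        G_warning(_("Cover map is not an integer (CELL) map; "
                    "values will be converted to integers and "
                    "the results will be approximate"));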

--
Glynn Clements <glynn@gclements.plus.com>

On Sun, Jun 03, 2007 at 03:58:40AM +0100, Glynn Clements wrote:

Markus Neteler wrote:
> Glynn Clements wrote:
> > 2. r.statistics uses r.stats to calculate the statistics, and r.stats
> > reads its inputs as CELL.
>
> In this case, r.statistics could also accept an FCELL map
> without complaining? Currently I need extra steps to round
> elevation to a CELL map before running r.statistics.

You should be able to simply remove the check at
raster/r.statistics/main.c:83:

    if( G_raster_map_is_fp(covermap->answer, mapset) != 0 )
        G_fatal_error (_("This module currently only works for integer (CELL) maps"));

Or you might want to replace the error with a warning that the result
is inaccurate.

If so, the latter. Currently I have to round() the elevation map
before running r.statistics, so the results are inaccurate anyway,
but the procedure requires extra work.

The main question is: would it be possible to conditionalize calls in
r3.stats to make it work for 2D maps? The new implementation from
Soeren sounds promising and along the lines of your suggestion to
rewrite r.stats from scratch.

Markus

Hi,

Markus Neteler schrieb:

On Sun, Jun 03, 2007 at 03:58:40AM +0100, Glynn Clements wrote:

Markus Neteler wrote:

Glynn Clements wrote:

2. r.statistics uses r.stats to calculate the statistics, and r.stats
reads its inputs as CELL.

In this case, r.statistics could also accept an FCELL map
without complaining? Currently I need extra steps to round
elevation to a CELL map before running r.statistics.

You should be able to simply remove the check at
raster/r.statistics/main.c:83:

    if( G_raster_map_is_fp(covermap->answer, mapset) != 0 )
        G_fatal_error (_("This module currently only works for integer (CELL) maps"));

Or you might want to replace the error with a warning that the result
is inaccurate.

If so, the latter. Currently I have to round() the elevation map
before running r.statistics, so the results are inaccurate anyway,
but the procedure requires extra work.

The main question is: would it be possible to conditionalize calls in
r3.stats to make it work for 2D maps? The new implementation from
Soeren sounds promising and along the lines of your suggestion to
rewrite r.stats from scratch.

The range and equal-value group algorithm in r3.stats is generic, so
it would not be complicated to extend r3.stats with raster support.
With very small modifications (e.g. computing area instead of volume),
the statistics creation and computation functions in r3.stats can be
called for any double/float array for which volume/area statistics
should be computed.
They are mostly independent of whether the data come from raster,
volume or other maps.
But implementing all the nice features of r.stats in r3.stats is not that easy.

I currently have no time to implement this. :-(
More important for me right now is to finish the core design of the gpde
library and to implement a faster preconditioned conjugate gradient solver
(PCG) for better numerical stability in v.surf.rst, and to check that the
gpde library works reliably with multiple threads (which is not easy).

Best regards
Soeren

Markus


Markus Neteler wrote:

> > > 2. r.statistics uses r.stats to calculate the statistics, and r.stats
> > > reads its inputs as CELL.
> >
> > In this case, r.statistics could also accept an FCELL map
> > without complaining? Currently I need extra steps to round
> > elevation to a CELL map before running r.statistics.
>
> You should be able to simply remove the check at
> raster/r.statistics/main.c:83:
>
> if( G_raster_map_is_fp(covermap->answer, mapset) != 0 )
> G_fatal_error (_("This module currently only works for integer (CELL) maps"));
>
> Or you might want to replace the error with a warning that the result
> is inaccurate.

If so, the latter. Currently I have to round() the elevation map
before running r.statistics, so the results are inaccurate anyway,
but the procedure requires extra work.

But if you round() manually, you know that you're introducing
inaccuracy, hence the suggestion of a warning if you allow r.stats to
do it automatically.

The main question is: would it be possible to conditionalize calls in
r3.stats to make it work for 2D maps? The new implementation from
Soeren sounds promising and along the lines of your suggestion to
rewrite r.stats from scratch.

I don't know how r3.stats works.

However, if we were to re-write something, it would probably be better
to re-write r.statistics to avoid using r.stats than to re-write
r.stats itself. r.stats has other uses beyond being a back-end for
r.statistics.

r.statistics is a simpler case, as I would expect the number of
distinct categories in the base map to be low enough that you could
comfortably store an accumulator per category.

To simplify matters further, I would use a linear array, indexed by
base category, rather than an associative array (btree, hash table
etc). Base categories would normally be small non-negative integers
(i.e. suitable for use as indices), and if they aren't (i.e. the
allocation is sparse or includes negative values), you can always
"renumber" the categories using a reclass, e.g.:

r.stats -n $inmap | awk '{print $1,"=",NR-1}' | r.reclass input=$inmap output=$outmap
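
A rough sketch of that linear-array idea (not existing code; ncats,
base_row and cover_row are made-up names): one accumulator per base
category, updated as the base and cover maps are read row by row:

    struct cat_stats {
        long   count;
        double sum;
    };

    /* Fold one row of the cover map into the per-category accumulators. */
    static void update_row(struct cat_stats *acc, int ncats,
                           const int *base_row, const double *cover_row,
                           int ncols)
    {
        int col;

        for (col = 0; col < ncols; col++) {
            int cat = base_row[col];

            if (cat < 0 || cat >= ncats)   /* skip categories outside the array */
                continue;

            acc[cat].count++;
            acc[cat].sum += cover_row[col];
        }
    }

    /* average for category c: acc[c].sum / acc[c].count */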

Finally, it may be better to move the "awkward" aggregates (median,
mode) to a separate module, as the algorithms involved are completely
different.

It would probably be worth offering a choice of a single- or
multiple-pass algorithm. Count, sum, mean, minimum and maximum only
require a single pass. AFAICT, variance, standard deviation, skew and
kurtosis can all be done in a single pass, but a multiple-pass
algorithm would be more accurate.

Actually, computing the mode of a large amount of FP data (or integer
data with a large range) is even worse than computing quantiles.
AFAICT, any general solution must have worst-case memory consumption
proportional to either the number of cells in the input map or the
number of possible values (i.e. 2^32 for CELL/FCELL, 2^64 for DCELL).

For a large map with a small range, you need one counter for each
possible value. For a map with a large range, the optimal solution is
to simply sort the input map then scan the sorted data while tracking
the most common value found to date.

For a DCELL map which is too large to fit into memory, the only
practical solution is to reduce the precision.
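
A sketch of the sort-and-scan approach for the large-range case (again
not existing GRASS code, and shown over an in-memory array): sort the
values, then track the longest run of equal values in a single pass:

    #include <stdlib.h>

    static int cmp_double(const void *pa, const void *pb)
    {
        double a = *(const double *)pa, b = *(const double *)pb;

        return (a > b) - (a < b);
    }

    /* Mode of n > 0 values; the array is sorted in place. */
    static double mode_of(double *x, long n)
    {
        long i, run = 1, best_run = 1;
        double best;

        qsort(x, (size_t)n, sizeof(double), cmp_double);
        best = x[0];

        for (i = 1; i < n; i++) {
            run = (x[i] == x[i - 1]) ? run + 1 : 1;
            if (run > best_run) {
                best_run = run;
                best = x[i];
            }
        }

        return best;
    }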

--
Glynn Clements <glynn@gclements.plus.com>

Hi developers,
can someone please tell me where the alpha-blending value of a map is
kept in GRASS? Inside the color rules file? It is something that was
added not long ago, right?
I couldn't find it with a quick search of the manuals and dev archives.

Thanks
Andrea

Andrea Antonello wrote:

can someone please tell me where the alpha-blending value of a map is
kept in GRASS? Inside the color rules file? It is something that was
added not long ago, right?

Maps don't have a translucency (alpha-blending) value.

Translucency is a property of a layer in gis.m and the wx GUI; it
doesn't exist outside of those applications.

--
Glynn Clements <glynn@gclements.plus.com>

Thanks Glynn,
does that mean there is also no plan to make it persistent in some way?
I feel that this is pretty close to the color rules system.

Andrea

Glynn Clements probably wrote:

Andrea Antonello wrote:

can someone please tell me where the alpha-blending value of a map is
kept in GRASS? Inside the color rules file? It is something that was
added not long ago, right?

Maps don't have a translucency (alpha-blending) value.

Translucency is a property of a layer in gis.m and the wx GUI; it
doesn't exist outside of those applications.

Glynn Clements wrote:

Maps don't have a translucency (alpha-blending) value.

Just to note, r.in.gdal will save the .alpha channel, if present in the
input map. But AFAIK the GIS treats it as just another map.

You can also try d.his to replicate the effect.

Hamish

That is interesting, since that really seems to give an alpha value for
every pixel. I was looking towards having alpha inside the color rules in
order to have parts of the map at different translucency.
However, if GDAL creates a separate map for alpha, the path to persistence
will be long. I'll have to check what QGIS does in this case.
In JGrass we would like to have alpha persistent, and since we want to
keep compatibility with GRASS, I would really love to hear some opinions
from the developer team so that we can follow their conventions instead of
having to change things afterwards to fix compatibility.
Any idea about what would be best?

Andrea

Hamish probably wrote:

Glynn Clements wrote:

Maps don't have a translucency (alpha-blending) value.

Just to note, r.in.gdal will save the .alpha channel, if present in the
input map. But AFAIK the GIS treats it as just another map.

You can also try d.his to replicate the effect.

Hamish