[GRASS-dev] v.rast.stats: NULL values for very small areas

Hi all,

`v.rast.stats` uploads NULL values for areas which are smaller that
computation region resolution (ie. fits completely into one cell), see
[1] --- region grid in black, red lines for area of given category. In
this case v.rast.stats uploads NULL values. I wonder how to change
this behaviour since the modules uses `v.to.rast` [2].

Any idea? Thanks, Martin

[1] http://geo102.fsv.cvut.cz/~landa/tmp/v-rast-stats-null.png
[2] https://trac.osgeo.org/grass/browser/grass/trunk/scripts/v.rast.stats/v.rast.stats.py#L142

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

On 06/10/15 16:57, Martin Landa wrote:

Hi all,

`v.rast.stats` uploads NULL values for areas which are smaller that
computation region resolution (ie. fits completely into one cell), see
[1] --- region grid in black, red lines for area of given category. In
this case v.rast.stats uploads NULL values. I wonder how to change
this behaviour since the modules uses `v.to.rast` [2].

One can argue that setting them NULL is actually a valid choice as it might not be legitimate to give them values of pixel so much larger. But maybe this should be decided by a flag ?

Any idea?

Get area and if area < pixel size then

- number = 1
- range = 0
- stddev,variance,coeff_var = 0 (or NULL)
- all other values = pixel value which you can get by replacing the areas by their centroids, you could use v.what.rast to get the pixel value.

But I wonder if this is not a bit of overkill...

Moritz

Hi,

2015-10-06 17:29 GMT+02:00 Moritz Lennert <mlennert@club.worldonline.be>:

One can argue that setting them NULL is actually a valid choice as it might
not be legitimate to give them values of pixel so much larger. But maybe
this should be decided by a flag ?

right, a flag could be an option.

[...]

Get area and if area < pixel size then

- number = 1

There can be more areas with same category.

- range = 0
- stddev,variance,coeff_var = 0 (or NULL)
- all other values = pixel value which you can get by replacing the areas by
their centroids, you could use v.what.rast to get the pixel value.

But I wonder if this is not a bit of overkill...

Why? I have request from the users, they are not happy about NULL
values. I just wonder how to implement within Python script. Probably
rewriting to C would make sense(?)

Martin

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

On 06/10/15 17:51, Martin Landa wrote:

Hi,

2015-10-06 17:29 GMT+02:00 Moritz Lennert <mlennert@club.worldonline.be>:

One can argue that setting them NULL is actually a valid choice as it might
not be legitimate to give them values of pixel so much larger. But maybe
this should be decided by a flag ?

right, a flag could be an option.

[...]

Get area and if area < pixel size then

- number = 1

There can be more areas with same category.

Yes, but the 'number' statistic gives you the number of pixels in the area. This number is important to interpret the other statistics (a mean and stddev based on 2 pixels do not necessarily have the same statistical value as the same statistics based on several hundred pixels). Putting number = 1 is a bit of a 'lie', here as there is less than a pixel in the area, but the only way to get any numbers into the statistics is to use the value of a pixel, so number=1 means: we used one single pixel to determine these statistics.

- range = 0
- stddev,variance,coeff_var = 0 (or NULL)
- all other values = pixel value which you can get by replacing the areas by
their centroids, you could use v.what.rast to get the pixel value.

But I wonder if this is not a bit of overkill...

Why? I have request from the users, they are not happy about NULL
values.

But, as already said, one could argue that NULL is the scientifically most correct answer in this case...

v.rast.stats is not about getting pixel values, but about getting statistics of collections of pixel values. When you do not even have one single pixel entirely containted in the area, then it makes sense to say that you cannot calculate any statistics and set the value to NULL.

If you just want pixel values, you can always use v.what.rast based on the area centroids.

I just wonder how to implement within Python script.

Get areas of all polygons (or lenghts of lines) with v.to.db and then divide the features into those with areas above pixel size and those with areas below. Treat each set differently (i.e. the former as before, the latter by using v.what.rast - with the -p flag - to get the pixel value). As the updating of the table is done feature by feature anyhow, all you lose in terms of performance is the additional step of calculating areas/lengths of features.

Probably
rewriting to C would make sense(?)

It might make it more efficient, but I think the python version would work well and the change in python version should not be too complicated.

Moritz

2015-10-06 19:26 GMT+03:00 Moritz Lennert <mlennert@club.worldonline.be>:

I just wonder how to implement within Python script.

Get areas of all polygons (or lenghts of lines) with v.to.db and then divide
the features into those with areas above pixel size and those with areas
below. Treat each set differently (i.e. the former as before, the latter by
using v.what.rast - with the -p flag - to get the pixel value). As the
updating of the table is done feature by feature anyhow, all you lose in
terms of performance is the additional step of calculating areas/lengths of
features.

First of all - v.rast.stats is not affected by the size of polygon but
by its shape/location. It uses v.to.rast thus data will be provided if
raster cell centre falls into vector polygon (no matter how small it
could be). Thus theoretically it is possible to construct a large
polygon that still gets NULL value.
Thus the discussion should be - should statistics be collected also
for cells that only partially lie within the polygon. Raster cell is
the smallest unit and it is homogeneous thus interpretation "collect
data on all cells even if polygon just touches it" might seem to be a
valid idea. Proposed solution with v.what.rast is not a solution as it
will sample raster map at the location of centroid thus in case of
polygon covering more than one cell, a value of one of cells will be
provided.

Personally I tend to favour current behaviour. It could be a bit
better documented as current documentation states: "The vector map
will be rasterized according to the raster map resolution." It is
correct but an extra explanation would be helpful like: "Stats are
provided only for polygons covering raster cell centres (or whole
raster cell area)."

Just my 0.02c,
Māris.

Hi,

What about a flag for checking if all polygon categories are present in the raster version of the vector map, and giving a warning if that is not the case.

That will take more time but makes output more reliable.

Maybe one can - in case of non-rasterized polygons/lines - edit the raster map by writing cats to cells containing centroids (or midpoints for lines)....

However, also then - in case of significant mismatch between map scale and resolution of the region - can cats be not present in the raster (if more than one centroid falls into a cell). That only indicates that the user either has to rethink settings, choose another approach (repeat the command for only the non-rasterized polygons) or ignore the issues...

Cheers
Stefan

-----Original Message-----
From: grass-dev-bounces@lists.osgeo.org [mailto:grass-dev-bounces@lists.osgeo.org] On Behalf Of Maris Nartiss
Sent: 7. oktober 2015 10:23
To: Moritz Lennert <mlennert@club.worldonline.be>
Cc: Martin Landa <landa.martin@gmail.com>; GRASS developers list <grass-dev@lists.osgeo.org>
Subject: Re: [GRASS-dev] v.rast.stats: NULL values for very small areas

2015-10-06 19:26 GMT+03:00 Moritz Lennert <mlennert@club.worldonline.be>:

I just wonder how to implement within Python script.

Get areas of all polygons (or lenghts of lines) with v.to.db and then
divide the features into those with areas above pixel size and those
with areas below. Treat each set differently (i.e. the former as
before, the latter by using v.what.rast - with the -p flag - to get
the pixel value). As the updating of the table is done feature by
feature anyhow, all you lose in terms of performance is the additional
step of calculating areas/lengths of features.

First of all - v.rast.stats is not affected by the size of polygon but by its shape/location. It uses v.to.rast thus data will be provided if raster cell centre falls into vector polygon (no matter how small it could be). Thus theoretically it is possible to construct a large polygon that still gets NULL value.
Thus the discussion should be - should statistics be collected also for cells that only partially lie within the polygon. Raster cell is the smallest unit and it is homogeneous thus interpretation "collect data on all cells even if polygon just touches it" might seem to be a valid idea. Proposed solution with v.what.rast is not a solution as it will sample raster map at the location of centroid thus in case of polygon covering more than one cell, a value of one of cells will be provided.

Personally I tend to favour current behaviour. It could be a bit better documented as current documentation states: "The vector map will be rasterized according to the raster map resolution." It is correct but an extra explanation would be helpful like: "Stats are provided only for polygons covering raster cell centres (or whole raster cell area)."

Just my 0.02c,
Māris.
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev