[GRASS-dev] how to reproject a raster map with absolute numbers without losing data

Hi,

In discussions with students the following problem has arisen, and I'm not sure how to respond to this:

When reprojecting raster data, one has to chose a method of interpolation. However, when the data you have represents absolute numbers (such as population grids), you do not wish to interpolate. What is needed is a technique to redistribute all units (the counts) into a new partition of space. This is implement in r.resamp.stats, but is not possible with r.proj.

As an example, I took the 2010 Worldpop population grid [1] of Madagascar. The data is described as:

SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the equator)
PROJECTION: Geographic, WGS84

I import the data into a epsg:4326 location and calculate the sum of the pixel values, i.e. the total population:

r.univar worldpop_madagascar_2010

total null and non-null cells: 143500959
total null cells: 70052410

Of the non-null cells:
----------------------
n: 73448549
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282021
mean of absolute values: 0.282021
standard deviation: 4.3961
variance: 19.3257
variation coefficient: 1558.79 %
sum: 20714000.1804286

I then reproject the data to UTM 38S (epsg:32738), by first estimating the correct region settings with

r.proj -g location=LL_WGS84 mapset=mlennert input=worldpop_madagascar_2010

which gives me

n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598 rows=16387 cols=8757

I then set the region accordingly in order to stay close to the original grid and reproject

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598 rows=16387 cols=8757
r.proj location=LL_WGS84 mapset=mlennert input=worldpop_madagascar_2010

When I then run r.univar on the resulting map, I get:

r.univar worldpop_madagascar_2010 100%
total null and non-null cells: 143500959
total null cells: 73263298

Of the non-null cells:
----------------------
n: 70237661
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.281982
mean of absolute values: 0.281982
standard deviation: 4.37828
variance: 19.1693
variation coefficient: 1552.68 %
sum: 19805754.8529859

Madagascar has just lost 908,246 inhabitants !

Note also that while the total number of cells is equal between the two, the number of non-null cells is lower after reprojection.

Also note that the new region has resolutions:

nsres: 92.9014409
ewres: 90.82802033

while g.region -m in the lat-long location gives me:

nsres=92.24099389
ewres=87.22776965

When I use these resolution settings in the UTM location:

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598 nsres=92.24099389 ewres=87.22776965 -a

I get the following result:

r.univar wp_mdg_2010
  100%
total null and non-null cells: 150525600
total null cells: 76865463

Of the non-null cells:
----------------------
n: 73660137
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282092
mean of absolute values: 0.282092
standard deviation: 4.38898
variance: 19.2632
variation coefficient: 1555.87 %
sum: 20778920.5206973

i.e. a total sum and a number of non-null cells much closer to those of the original map.

Two questions out of this (and maybe a potential bug report);

- How does r.proj determine the region settings. These seem quite "off" at least in what I would expect. How does g.region -m do it ? Could r.proj just use that procedure ? (BTW, reprojecting with a simple 'Save as' in QGIS, I get a pixel size of 88.8198,-91.3023, and using zonal statistiques on the entire country, I get exactly the same population be in lat-long or in UTM...)

- Would it be possible (desirable?) to implement some equivalent of r.resamp.stats, ideally with the -w flag for weighting the sum.

Or is there something I fundamentally misunderstand ?

Moritz

P.S. I've tried to summarise the issue on a the wiki page about the computational region [2]. If anyone wants to add to that, please feel free.

[1] http://www.worldpop.org.uk/
[2] https://grasswiki.osgeo.org/wiki/Computational_region#Understanding_the_impact_of_region_settings

On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Hi,

In discussions with students the following problem has arisen, and I’m not sure how to respond to this:

When reprojecting raster data, one has to chose a method of interpolation. However, when the data you have represents absolute numbers (such as population grids), you do not wish to interpolate. What is needed is a technique to redistribute all units (the counts) into a new partition of space. This is implement in r.resamp.stats, but is not possible with r.proj.

As an example, I took the 2010 Worldpop population grid [1] of Madagascar. The data is described as:

SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the equator)
PROJECTION: Geographic, WGS84

I import the data into a epsg:4326 location and calculate the sum of the pixel values, i.e. the total population:

r.univar worldpop_madagascar_2010

total null and non-null cells: 143500959
total null cells: 70052410

Of the non-null cells:

n: 73448549
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282021
mean of absolute values: 0.282021
standard deviation: 4.3961
variance: 19.3257
variation coefficient: 1558.79 %
sum: 20714000.1804286

I then reproject the data to UTM 38S (epsg:32738), by first estimating the correct region settings with

r.proj -g location=LL_WGS84 mapset=mlennert input=worldpop_madagascar_2010

which gives me

n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598 rows=16387 cols=8757

I then set the region accordingly in order to stay close to the original grid and reproject

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598 rows=16387 cols=8757
r.proj location=LL_WGS84 mapset=mlennert input=worldpop_madagascar_2010

When I then run r.univar on the resulting map, I get:

r.univar worldpop_madagascar_2010 100%
total null and non-null cells: 143500959
total null cells: 73263298

Of the non-null cells:

n: 70237661
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.281982
mean of absolute values: 0.281982
standard deviation: 4.37828
variance: 19.1693
variation coefficient: 1552.68 %
sum: 19805754.8529859

Madagascar has just lost 908,246 inhabitants !

You would need to convert absolute numbers to densities before reprojection. Note that in latlong, cells have different sizes when expressed in square meters.

Note also that while the total number of cells is equal between the two, the number of non-null cells is lower after reprojection.

For illustration, you could create a vector grid in the source location, then reproject the vector grid to the target location to see how cells would be differently reprojected in different parts of the target computational region.

Also note that the new region has resolutions:

nsres: 92.9014409
ewres: 90.82802033

while g.region -m in the lat-long location gives me:

nsres=92.24099389
ewres=87.22776965

When I use these resolution settings in the UTM location:

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598 nsres=92.24099389 ewres=87.22776965 -a

I get the following result:

r.univar wp_mdg_2010
100%
total null and non-null cells: 150525600
total null cells: 76865463

Of the non-null cells:

n: 73660137
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282092
mean of absolute values: 0.282092
standard deviation: 4.38898
variance: 19.2632
variation coefficient: 1555.87 %
sum: 20778920.5206973

i.e. a total sum and a number of non-null cells much closer to those of the original map.

Two questions out of this (and maybe a potential bug report);

  • How does r.proj determine the region settings. These seem quite “off” at least in what I would expect.

What would you expect? r.proj walks along the border of the input raster map and reprojects the border coordinates for each cell. Target e,w,n,s are the repsective maxima and minima of the reprojected coordinates. The resolution is simply e.g. (n - s) / rows.

How does g.region -m do it ?

g.region -m uses the mean of the edge lengths (western, eastern, northern, southern edge) divided by rows or cols.

Could r.proj just use that procedure ?

r.proj -g is more accurate than g.region -m because r.proj walks along the borders. The purpose of r.proj -g is to figure out extents that cover the whole input image.

(BTW, reprojecting with a simple ‘Save as’ in QGIS, I get a pixel size of 88.8198,-91.3023, and using zonal statistiques on the entire country, I get exactly the same population be in lat-long or in UTM…)

Is QGIS using gdalwarp internally?

  • Would it be possible (desirable?) to implement some equivalent of r.resamp.stats, ideally with the -w flag for weighting the sum.

IMHO, the -w flag does not make sense because cell borders are no longer rectangles after reprojection, they are rather something like trapezoids.

Markus M

Or is there something I fundamentally misunderstand ?

Moritz

P.S. I’ve tried to summarise the issue on a the wiki page about the computational region [2]. If anyone wants to add to that, please feel free.

[1] http://www.worldpop.org.uk/
[2] https://grasswiki.osgeo.org/wiki/Computational_region#Understanding_the_impact_of_region_settings


grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Le 29 novembre 2016 15:10:11 GMT+01:00, Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Hi,

In discussions with students the following problem has arisen, and

I'm
not sure how to respond to this:

When reprojecting raster data, one has to chose a method of

interpolation. However, when the data you have represents absolute
numbers
(such as population grids), you do not wish to interpolate. What is
needed
is a technique to redistribute all units (the counts) into a new
partition
of space. This is implement in r.resamp.stats, but is not possible with
r.proj.

As an example, I took the 2010 Worldpop population grid [1] of

Madagascar. The data is described as:

SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the

equator)

PROJECTION: Geographic, WGS84

I import the data into a epsg:4326 location and calculate the sum of

the
pixel values, i.e. the total population:

r.univar worldpop_madagascar_2010

total null and non-null cells: 143500959
total null cells: 70052410

Of the non-null cells:
----------------------
n: 73448549
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282021
mean of absolute values: 0.282021
standard deviation: 4.3961
variance: 19.3257
variation coefficient: 1558.79 %
sum: 20714000.1804286

I then reproject the data to UTM 38S (epsg:32738), by first

estimating
the correct region settings with

r.proj -g location=LL_WGS84 mapset=mlennert

input=worldpop_madagascar_2010

which gives me

n=8678821.71318899 s=7156445.80116155 w=302657.5841309

e=1098038.55818598
rows=16387 cols=8757

I then set the region accordingly in order to stay close to the

original
grid and reproject

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309

e=1098038.55818598 rows=16387 cols=8757

r.proj location=LL_WGS84 mapset=mlennert

input=worldpop_madagascar_2010

When I then run r.univar on the resulting map, I get:

r.univar worldpop_madagascar_2010 100%
total null and non-null cells: 143500959
total null cells: 73263298

Of the non-null cells:
----------------------
n: 70237661
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.281982
mean of absolute values: 0.281982
standard deviation: 4.37828
variance: 19.1693
variation coefficient: 1552.68 %
sum: 19805754.8529859

Madagascar has just lost 908,246 inhabitants !

You would need to convert absolute numbers to densities before
reprojection. Note that in latlong, cells have different sizes when
expressed in square meters.

Yes. Do we have any existing module to calculate area by pixel ? An area() function in r.mapcalc would be nice...

Note also that while the total number of cells is equal between the

two,
the number of non-null cells is lower after reprojection.

For illustration, you could create a vector grid in the source
location,
then reproject the vector grid to the target location to see how cells
would be differently reprojected in different parts of the target
computational region.

That would be a nice pedagogical tool, yes.

Also note that the new region has resolutions:

nsres: 92.9014409
ewres: 90.82802033

while g.region -m in the lat-long location gives me:

nsres=92.24099389
ewres=87.22776965

When I use these resolution settings in the UTM location:

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309

e=1098038.55818598 nsres=92.24099389 ewres=87.22776965 -a

I get the following result:

r.univar wp_mdg_2010
100%
total null and non-null cells: 150525600
total null cells: 76865463

Of the non-null cells:
----------------------
n: 73660137
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282092
mean of absolute values: 0.282092
standard deviation: 4.38898
variance: 19.2632
variation coefficient: 1555.87 %
sum: 20778920.5206973

i.e. a total sum and a number of non-null cells much closer to those

of
the original map.

Two questions out of this (and maybe a potential bug report);

- How does r.proj determine the region settings. These seem quite

"off"
at least in what I would expect.

What would you expect?

I guess something that preserves as much as possible the number of pixels ? But yes I'm conscious that that's probably not a generalizable criterion...

r.proj walks along the border of the input

raster
map and reprojects the border coordinates for each cell. Target e,w,n,s
are
the repsective maxima and minima of the reprojected coordinates. The
resolution is simply e.g. (n - s) / rows.

How does g.region -m do it ?

g.region -m uses the mean of the edge lengths (western, eastern,
northern,
southern edge) divided by rows or cols.

Could r.proj just use that procedure ?

r.proj -g is more accurate than g.region -m because r.proj walks along
the
borders. The purpose of r.proj -g is to figure out extents that cover
the
whole input image.

OK, thanks for the explanation.

(BTW, reprojecting with a simple 'Save as' in QGIS, I get a pixel

size of
88.8198,-91.3023, and using zonal statistiques on the entire country, I
get
exactly the same population be in lat-long or in UTM...)

Is QGIS using gdalwarp internally?

Don't know....

- Would it be possible (desirable?) to implement some equivalent of

r.resamp.stats, ideally with the -w flag for weighting the sum.

IMHO, the -w flag does not make sense because cell borders are no
longer
rectangles after reprojection, they are rather something like
trapezoids.

And why does that imply that -w doesn't make sense ?

Thanks for the explanation !!

Moritz

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Le 29 novembre 2016 15:10:11 GMT+01:00, Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Hi,

In discussions with students the following problem has arisen, and
I’m
not sure how to respond to this:

When reprojecting raster data, one has to chose a method of
interpolation. However, when the data you have represents absolute
numbers
(such as population grids), you do not wish to interpolate. What is
needed
is a technique to redistribute all units (the counts) into a new
partition
of space. This is implement in r.resamp.stats, but is not possible with
r.proj.

As an example, I took the 2010 Worldpop population grid [1] of
Madagascar. The data is described as:

SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the
equator)
PROJECTION: Geographic, WGS84

I import the data into a epsg:4326 location and calculate the sum of
the
pixel values, i.e. the total population:

r.univar worldpop_madagascar_2010

total null and non-null cells: 143500959
total null cells: 70052410

Of the non-null cells:

n: 73448549
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282021
mean of absolute values: 0.282021
standard deviation: 4.3961
variance: 19.3257
variation coefficient: 1558.79 %
sum: 20714000.1804286

I then reproject the data to UTM 38S (epsg:32738), by first
estimating
the correct region settings with

r.proj -g location=LL_WGS84 mapset=mlennert
input=worldpop_madagascar_2010

which gives me

n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598
rows=16387 cols=8757

I then set the region accordingly in order to stay close to the
original
grid and reproject

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598 rows=16387 cols=8757
r.proj location=LL_WGS84 mapset=mlennert
input=worldpop_madagascar_2010

When I then run r.univar on the resulting map, I get:

r.univar worldpop_madagascar_2010 100%
total null and non-null cells: 143500959
total null cells: 73263298

Of the non-null cells:

n: 70237661
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.281982
mean of absolute values: 0.281982
standard deviation: 4.37828
variance: 19.1693
variation coefficient: 1552.68 %
sum: 19805754.8529859

Madagascar has just lost 908,246 inhabitants !

You would need to convert absolute numbers to densities before
reprojection. Note that in latlong, cells have different sizes when
expressed in square meters.

Yes. Do we have any existing module to calculate area by pixel ?

Not that I know of. Only for vector areas.

An area() function in r.mapcalc would be nice…

Indeed.

[…]

  • Would it be possible (desirable?) to implement some equivalent of
    r.resamp.stats, ideally with the -w flag for weighting the sum.

IMHO, the -w flag does not make sense because cell borders are no
longer
rectangles after reprojection, they are rather something like
trapezoids.

And why does that imply that -w doesn’t make sense ?

Because the -w flag in r.resamp.stats works with a rectangle overlapping other rectangles. It only works with rectangles, i.e. different cell extents in the same coordinate reference system (GRASS location). For reprojection, something like r.in.xyz could work:

  1. convert raster cells to vector points with raster cell value as vector z value.

  2. reproject these vector points to the target location

  3. export the vector points with v.out.ascii format=point

  4. import the exported points with r.in.xyz method=sum

The method options of r.in.xyz could be added to r.proj for spatial aggregation, but that would mean that most of the code of r.in.xyz would need to be copied to r.proj, then adapted. That would be an interesting enhancement, also considering that gdalwarp offers as resampling methods near, bilinear, cubic, cubicspline, lanczos, average, mode, max, min, med, Q1, Q3.

Markus M

Would indeed be a nice addition, but it isn’t too difficult to compute, using: From . In the comment section of that post there is a link to an addon on github that computes the (fractional) total area of all grid cells in a raster map. I haven’t tried it out, but perhaps this has some relevant code.

···

On 29-11-16 22:33, Markus Metz wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Le 29 novembre 2016 15:10:11 GMT+01:00, Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Hi,

In discussions with students the following problem has arisen, and
I’m
not sure how to respond to this:

When reprojecting raster data, one has to chose a method of
interpolation. However, when the data you have represents absolute
numbers
(such as population grids), you do not wish to interpolate. What is
needed
is a technique to redistribute all units (the counts) into a new
partition
of space. This is implement in r.resamp.stats, but is not possible with
r.proj.

As an example, I took the 2010 Worldpop population grid [1] of
Madagascar. The data is described as:

SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the
equator)
PROJECTION: Geographic, WGS84

I import the data into a epsg:4326 location and calculate the sum of
the
pixel values, i.e. the total population:

r.univar worldpop_madagascar_2010

total null and non-null cells: 143500959
total null cells: 70052410

Of the non-null cells:

n: 73448549
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282021
mean of absolute values: 0.282021
standard deviation: 4.3961
variance: 19.3257
variation coefficient: 1558.79 %
sum: 20714000.1804286

I then reproject the data to UTM 38S (epsg:32738), by first
estimating
the correct region settings with

r.proj -g location=LL_WGS84 mapset=mlennert
input=worldpop_madagascar_2010

which gives me

n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598
rows=16387 cols=8757

I then set the region accordingly in order to stay close to the
original
grid and reproject

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598 rows=16387 cols=8757
r.proj location=LL_WGS84 mapset=mlennert
input=worldpop_madagascar_2010

When I then run r.univar on the resulting map, I get:

r.univar worldpop_madagascar_2010 100%
total null and non-null cells: 143500959
total null cells: 73263298

Of the non-null cells:

n: 70237661
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.281982
mean of absolute values: 0.281982
standard deviation: 4.37828
variance: 19.1693
variation coefficient: 1552.68 %
sum: 19805754.8529859

Madagascar has just lost 908,246 inhabitants !

You would need to convert absolute numbers to densities before
reprojection. Note that in latlong, cells have different sizes when
expressed in square meters.

Yes. Do we have any existing module to calculate area by pixel ?

Not that I know of. Only for vector areas.

An area() function in r.mapcalc would be nice…

g.region rast=afripop10@PERMANENT
PI=3.1415926536
export g.region -g `export `g.proj -j
r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres * $PI/180) * float($a)^2";

https://pvanb.wordpress.com/2012/12/17/calculating-the-raster-cell-area-of-an-unprojected-raster-layer/

Indeed.

[…]

  • Would it be possible (desirable?) to implement some equivalent of
    r.resamp.stats, ideally with the -w flag for weighting the sum.

IMHO, the -w flag does not make sense because cell borders are no
longer
rectangles after reprojection, they are rather something like
trapezoids.

And why does that imply that -w doesn’t make sense ?

Because the -w flag in r.resamp.stats works with a rectangle overlapping other rectangles. It only works with rectangles, i.e. different cell extents in the same coordinate reference system (GRASS location). For reprojection, something like r.in.xyz could work:

  1. convert raster cells to vector points with raster cell value as vector z value.

  2. reproject these vector points to the target location

  3. export the vector points with v.out.ascii format=point

  4. import the exported points with r.in.xyz method=sum

The method options of r.in.xyz could be added to r.proj for spatial aggregation, but that would mean that most of the code of r.in.xyz would need to be copied to r.proj, then adapted. That would be an interesting enhancement, also considering that gdalwarp offers as resampling methods near, bilinear, cubic, cubicspline, lanczos, average, mode, max, min, med, Q1, Q3.

Markus M

_______________________________________________
grass-dev mailing list
[grass-dev@lists.osgeo.org](mailto:grass-dev@lists.osgeo.org)
[http://lists.osgeo.org/mailman/listinfo/grass-dev](http://lists.osgeo.org/mailman/listinfo/grass-dev)

On 30/11/16 09:40, Paulo van Breugel wrote:

On 29-11-16 22:33, Markus Metz wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert
<mlennert@club.worldonline.be <mailto:mlennert@club.worldonline.be>>
wrote:
>
> Le 29 novembre 2016 15:10:11 GMT+01:00, Markus Metz
<markus.metz.giswork@gmail.com <mailto:markus.metz.giswork@gmail.com>>
a écrit :
> >On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
> >mlennert@club.worldonline.be <mailto:mlennert@club.worldonline.be>>
wrote:
> >>
> >> Hi,
> >>
> >> In discussions with students the following problem has arisen, and
> >I'm
> >not sure how to respond to this:
> >>
> >> When reprojecting raster data, one has to chose a method of
> >interpolation. However, when the data you have represents absolute
> >numbers
> >(such as population grids), you do not wish to interpolate. What is
> >needed
> >is a technique to redistribute all units (the counts) into a new
> >partition
> >of space. This is implement in r.resamp.stats, but is not possible with
> >r.proj.
> >>
> >> As an example, I took the 2010 Worldpop population grid [1] of
> >Madagascar. The data is described as:
> >>
> >> SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the
> >equator)
> >> PROJECTION: Geographic, WGS84
> >>
> >> I import the data into a epsg:4326 location and calculate the sum of
> >the
> >pixel values, i.e. the total population:
> >>
> >> r.univar worldpop_madagascar_2010
> >>
> >> total null and non-null cells: 143500959
> >> total null cells: 70052410
> >>
> >> Of the non-null cells:
> >> ----------------------
> >> n: 73448549
> >> minimum: 0
> >> maximum: 1163.43
> >> range: 1163.43
> >> mean: 0.282021
> >> mean of absolute values: 0.282021
> >> standard deviation: 4.3961
> >> variance: 19.3257
> >> variation coefficient: 1558.79 %
> >> sum: 20714000.1804286
> >>
> >> I then reproject the data to UTM 38S (epsg:32738), by first
> >estimating
> >the correct region settings with
> >>
> >> r.proj -g location=LL_WGS84 mapset=mlennert
> >input=worldpop_madagascar_2010
> >>
> >> which gives me
> >>
> >> n=8678821.71318899 s=7156445.80116155 w=302657.5841309
> >e=1098038.55818598
> >rows=16387 cols=8757
> >>
> >> I then set the region accordingly in order to stay close to the
> >original
> >grid and reproject
> >>
> >> g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
> >e=1098038.55818598 rows=16387 cols=8757
> >> r.proj location=LL_WGS84 mapset=mlennert
> >input=worldpop_madagascar_2010
> >>
> >> When I then run r.univar on the resulting map, I get:
> >>
> >> r.univar worldpop_madagascar_2010 100%
> >> total null and non-null cells: 143500959
> >> total null cells: 73263298
> >>
> >> Of the non-null cells:
> >> ----------------------
> >> n: 70237661
> >> minimum: 0
> >> maximum: 1163.43
> >> range: 1163.43
> >> mean: 0.281982
> >> mean of absolute values: 0.281982
> >> standard deviation: 4.37828
> >> variance: 19.1693
> >> variation coefficient: 1552.68 %
> >> sum: 19805754.8529859
> >>
> >> Madagascar has just lost 908,246 inhabitants !
> >
> >You would need to convert absolute numbers to densities before
> >reprojection. Note that in latlong, cells have different sizes when
> >expressed in square meters.
>
> Yes. Do we have any existing module to calculate area by pixel ?

Not that I know of. Only for vector areas.

> An area() function in r.mapcalc would be nice...

Would indeed be a nice addition, but it isn't too difficult to compute,
using:

|g.region rast=afripop10@PERMANENT|
|PI=3.1415926536|
|export `g.region -g`|
|export `g.proj -j`|
|r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2";|

Yes, obviously, but:

- Not many people will remember such formulas by heart, and so having a function implementing it would be a nice convenience.

- This function does the calculations on the sphere. For most applications this is precise enough, but there might be some cases where taking into account the actual ellipsoid might be better. I don't know how far the geodesic distance functions in GRASS go concerning this.

In the case of Afripop/Worldpop the error of the estimation is probably much bigger than the error of approximating the earth to a sphere, so going beyond would probably be overkill.

Moritz

On Wed, Nov 30, 2016 at 1:22 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

On 30/11/16 09:40, Paulo van Breugel wrote:

On 29-11-16 22:33, Markus Metz wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert
<mlennert@club.worldonline.be mailto:[mlennert@club.worldonline.be](mailto:mlennert@club.worldonline.be)>
wrote:

Le 29 novembre 2016 15:10:11 GMT+01:00, Markus Metz
<markus.metz.giswork@gmail.com mailto:[markus.metz.giswork@gmail.com](mailto:markus.metz.giswork@gmail.com)>
a écrit :

On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
mlennert@club.worldonline.be mailto:[mlennert@club.worldonline.be](mailto:mlennert@club.worldonline.be)>

wrote:

Hi,

In discussions with students the following problem has arisen, and
I’m
not sure how to respond to this:

When reprojecting raster data, one has to chose a method of
interpolation. However, when the data you have represents absolute
numbers
(such as population grids), you do not wish to interpolate. What is
needed
is a technique to redistribute all units (the counts) into a new
partition
of space. This is implement in r.resamp.stats, but is not possible with
r.proj.

As an example, I took the 2010 Worldpop population grid [1] of
Madagascar. The data is described as:

SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the
equator)
PROJECTION: Geographic, WGS84

I import the data into a epsg:4326 location and calculate the sum of
the
pixel values, i.e. the total population:

r.univar worldpop_madagascar_2010

total null and non-null cells: 143500959
total null cells: 70052410

Of the non-null cells:

n: 73448549
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.282021
mean of absolute values: 0.282021
standard deviation: 4.3961
variance: 19.3257
variation coefficient: 1558.79 %
sum: 20714000.1804286

I then reproject the data to UTM 38S (epsg:32738), by first
estimating
the correct region settings with

r.proj -g location=LL_WGS84 mapset=mlennert
input=worldpop_madagascar_2010

which gives me

n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598
rows=16387 cols=8757

I then set the region accordingly in order to stay close to the
original
grid and reproject

g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598 rows=16387 cols=8757
r.proj location=LL_WGS84 mapset=mlennert
input=worldpop_madagascar_2010

When I then run r.univar on the resulting map, I get:

r.univar worldpop_madagascar_2010 100%
total null and non-null cells: 143500959
total null cells: 73263298

Of the non-null cells:

n: 70237661
minimum: 0
maximum: 1163.43
range: 1163.43
mean: 0.281982
mean of absolute values: 0.281982
standard deviation: 4.37828
variance: 19.1693
variation coefficient: 1552.68 %
sum: 19805754.8529859

Madagascar has just lost 908,246 inhabitants !

You would need to convert absolute numbers to densities before
reprojection. Note that in latlong, cells have different sizes when
expressed in square meters.

Yes. Do we have any existing module to calculate area by pixel ?

Not that I know of. Only for vector areas.

An area() function in r.mapcalc would be nice…

Would indeed be a nice addition, but it isn’t too difficult to compute,
using:

|g.region rast=afripop10@PERMANENT|
|PI=3.1415926536|
|export g.region -g|
|export g.proj -j|
|r.mapcalc “rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2”;|

Yes, obviously, but:

  • Not many people will remember such formulas by heart, and so having a function implementing it would be a nice convenience.

  • This function does the calculations on the sphere. For most applications this is precise enough, but there might be some cases where taking into account the actual ellipsoid might be better. I don’t know how far the geodesic distance functions in GRASS go concerning this.

GRASS has G_area_of_cell_at_row() in lib/gis/area.c. A new area() function for r.mapcalc would only need to initialize calculations with G_begin_cell_area_calculations() which considers the actual ellipsoid, then call G_area_of_cell_at_row().

Markus M

In the case of Afripop/Worldpop the error of the estimation is probably much bigger than the error of approximating the earth to a sphere, so going beyond would probably be overkill.

Moritz

On Wed, Nov 30, 2016 at 2:12 PM, Markus Metz <markus.metz.giswork@gmail.com>
wrote:

On Wed, Nov 30, 2016 at 1:22 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:
>
> On 30/11/16 09:40, Paulo van Breugel wrote:
>>
>>
>>
>> On 29-11-16 22:33, Markus Metz wrote:
>>>
>>>
>>>
>>> On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert
>>> <mlennert@club.worldonline.be <mailto:mlennert@club.worldonline.be>>
>>> wrote:
>>> >
>>> >
>>> >
>>> > Le 29 novembre 2016 15:10:11 GMT+01:00, Markus Metz
>>> <markus.metz.giswork@gmail.com <mailto:markus.metz.giswork@gmail.com>>
>>> a écrit :
>>> > >On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
>>> > >mlennert@club.worldonline.be <mailto:mlennert@club.worldonline.be>>
>>>
>>> wrote:
>>> > >>
>>> > >> Hi,
>>> > >>
>>> > >> In discussions with students the following problem has arisen, and
>>> > >I'm
>>> > >not sure how to respond to this:
>>> > >>
>>> > >> When reprojecting raster data, one has to chose a method of
>>> > >interpolation. However, when the data you have represents absolute
>>> > >numbers
>>> > >(such as population grids), you do not wish to interpolate. What is
>>> > >needed
>>> > >is a technique to redistribute all units (the counts) into a new
>>> > >partition
>>> > >of space. This is implement in r.resamp.stats, but is not possible
with
>>> > >r.proj.
>>> > >>
>>> > >> As an example, I took the 2010 Worldpop population grid [1] of
>>> > >Madagascar. The data is described as:
>>> > >>
>>> > >> SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at
the
>>> > >equator)
>>> > >> PROJECTION: Geographic, WGS84
>>> > >>
>>> > >> I import the data into a epsg:4326 location and calculate the sum
of
>>> > >the
>>> > >pixel values, i.e. the total population:
>>> > >>
>>> > >> r.univar worldpop_madagascar_2010
>>> > >>
>>> > >> total null and non-null cells: 143500959
>>> > >> total null cells: 70052410
>>> > >>
>>> > >> Of the non-null cells:
>>> > >> ----------------------
>>> > >> n: 73448549
>>> > >> minimum: 0
>>> > >> maximum: 1163.43
>>> > >> range: 1163.43
>>> > >> mean: 0.282021
>>> > >> mean of absolute values: 0.282021
>>> > >> standard deviation: 4.3961
>>> > >> variance: 19.3257
>>> > >> variation coefficient: 1558.79 %
>>> > >> sum: 20714000.1804286
>>> > >>
>>> > >> I then reproject the data to UTM 38S (epsg:32738), by first
>>> > >estimating
>>> > >the correct region settings with
>>> > >>
>>> > >> r.proj -g location=LL_WGS84 mapset=mlennert
>>> > >input=worldpop_madagascar_2010
>>> > >>
>>> > >> which gives me
>>> > >>
>>> > >> n=8678821.71318899 s=7156445.80116155 w=302657.5841309
>>> > >e=1098038.55818598
>>> > >rows=16387 cols=8757
>>> > >>
>>> > >> I then set the region accordingly in order to stay close to the
>>> > >original
>>> > >grid and reproject
>>> > >>
>>> > >> g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
>>> > >e=1098038.55818598 rows=16387 cols=8757
>>> > >> r.proj location=LL_WGS84 mapset=mlennert
>>> > >input=worldpop_madagascar_2010
>>> > >>
>>> > >> When I then run r.univar on the resulting map, I get:
>>> > >>
>>> > >> r.univar worldpop_madagascar_2010 100%
>>> > >> total null and non-null cells: 143500959
>>> > >> total null cells: 73263298
>>> > >>
>>> > >> Of the non-null cells:
>>> > >> ----------------------
>>> > >> n: 70237661
>>> > >> minimum: 0
>>> > >> maximum: 1163.43
>>> > >> range: 1163.43
>>> > >> mean: 0.281982
>>> > >> mean of absolute values: 0.281982
>>> > >> standard deviation: 4.37828
>>> > >> variance: 19.1693
>>> > >> variation coefficient: 1552.68 %
>>> > >> sum: 19805754.8529859
>>> > >>
>>> > >> Madagascar has just lost 908,246 inhabitants !
>>> > >
>>> > >You would need to convert absolute numbers to densities before
>>> > >reprojection. Note that in latlong, cells have different sizes when
>>> > >expressed in square meters.
>>> >
>>> > Yes. Do we have any existing module to calculate area by pixel ?
>>>
>>> Not that I know of. Only for vector areas.
>>>
>>> > An area() function in r.mapcalc would be nice...
>>
>>
>> Would indeed be a nice addition, but it isn't too difficult to compute,
>> using:
>>
>> |g.region rast=afripop10@PERMANENT|
>> |PI=3.1415926536 <0415%20926%20536>|
>> |export `g.region -g`|
>> |export `g.proj -j`|
>> |r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
>> $PI/180) * float($a)^2";|
>>
>
>
> Yes, obviously, but:
>
> - Not many people will remember such formulas by heart, and so having a
function implementing it would be a nice convenience.
>
> - This function does the calculations on the sphere. For most
applications this is precise enough, but there might be some cases where
taking into account the actual ellipsoid might be better. I don't know how
far the geodesic distance functions in GRASS go concerning this.

True, good enough for Afripop given the large error margin in that data.
However, would be great if something better / more accurate could be
implemented in r.mapcalc.

GRASS has G_area_of_cell_at_row() in lib/gis/area.c. A new area() function
for r.mapcalc would only need to initialize calculations with
G_begin_cell_area_calculations() which considers the actual ellipsoid,
then call G_area_of_cell_at_row().

I saw G_area_of_cell_at_row() was used in this addon on github (
https://github.com/joergsteinkamp/r.area), but that is in C and I was
wondering if / how this can be used / accessed in a python script.

Markus M

>
> In the case of Afripop/Worldpop the error of the estimation is probably
much bigger than the error of approximating the earth to a sphere, so going
beyond would probably be overkill.
>
> Moritz

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Le 29 novembre 2016 22:33:39 GMT+01:00, Markus Metz markus.metz.giswork@gmail.com a écrit :

For
reprojection, something like r.in.xyz could work:

I’m currently going through this process and it raises a series of issues:

  1. convert raster cells to vector points with raster cell value as
    vector z
    value.

I use r.to.vect for this. The -b flag allows +/- fast conversion, but it takes into account all cells. It would be great to have a flag telling it to only convert non-null cells. Or at least a hint to set a mask, which seems to have the desired effect.

  1. reproject these vector points to the target location

I locally modified v.proj to add a ‘Don’t build topology’ flag.

Is there any indication against this ?

  1. export the vector points with v.out.ascii format=point
  2. import the exported points with r.in.xyz method=sum

This works nicely.

Thanks ! If I find the time I will wrap this into a r.proj.aggregate (?) script…

The method options of r.in.xyz could be added to r.proj for spatial
aggregation, but that would mean that most of the code of r.in.xyz
would
need to be copied to r.proj, then adapted. That would be an interesting
enhancement, also considering that gdalwarp offers as resampling
methods
near, bilinear, cubic, cubicspline, lanczos, average, mode, max, min,
med,
Q1, Q3.

But it doesn’t offer sum…

Moritz

Hei,

What about something like this (example converts from UTM33 to WGS84 and creates ):

r.stats -gn1 INPUTMAP | cs2cs -E -f “%.6f” +proj=utm +no_defs +zone=33 +a=6378137 +rf=298.257222101 +towgs84=0.000,0.000,0.000 +to_meter=1 +to +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs | tr ‘\t’ ’ ’ | r.in.xyz --o --v z=6 input=- output=REPROJECTED_MAP separator=space

Or is cs2cs to slow or does it modify also z?

BTW, I think several modules would benefit from a -b do not build topology…

Cheers

Stefan

Le 30 novembre 2016 21:05:18 GMT+01:00, "Blumentrath, Stefan" <Stefan.Blumentrath@nina.no> a écrit :

Hei,

What about something like this (example converts from UTM33 to WGS84
and creates ):
r.stats -gn1 INPUTMAP | cs2cs -E -f "%.6f" +proj=utm +no_defs +zone=33
+a=6378137 +rf=298.257222101 +towgs84=0.000,0.000,0.000 +to_meter=1 +to
+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs | tr '\t' ' ' |
r.in.xyz --o --v z=6 input=- output=REPROJECTED_MAP separator=space

Or is cs2cs to slow

I did a superficial test and it did seem quite slow. But I didn't try very hard.

or does it modify also z?

AFAIK it modifies z. One would need to convert coordinates and then paste the new coordinates to the values.

BTW, I think several modules would benefit from a -b do not build
topology...

Which other modules are you thinking about?

Moritz

-----Original Message-----
From: Moritz Lennert [mailto:mlennert@club.worldonline.be]
Sent: torsdag 1. desember 2016 00.01

BTW, I think several modules would benefit from a -b do not build
topology...

Which other modules are you thinking about?
Moritz

All modules which produce points, e.g. v.to.points, but also v.extract would benefit (there may be more for sure).

In a project I had to patch data on a specific land cover type from three countries, which should be then merged into one dataset. Thus the workflow was v.extract -> v.patch. Building the topology, esp for maps with a lot of areas is often the most time consuming step...

Cheers
Stefan

On Wed, Nov 30, 2016 at 7:35 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Le 29 novembre 2016 22:33:39 GMT+01:00, Markus Metz <markus.metz.giswork@gmail.com> a écrit :

For
reprojection, something like r.in.xyz could work:

I’m currently going through this process and it raises a series of issues:

  1. convert raster cells to vector points with raster cell value as
    vector z
    value.

I use r.to.vect for this. The -b flag allows +/- fast conversion, but it takes into account all cells. It would be great to have a flag telling it to only convert non-null cells. Or at least a hint to set a mask, which seems to have the desired effect.

r.to.vect converts only non-null cells, no mask needed, at least for me in trunk.

  1. reproject these vector points to the target location

I locally modified v.proj to add a ‘Don’t build topology’ flag.

Is there any indication against this ?

No.

  1. export the vector points with v.out.ascii format=point
  2. import the exported points with r.in.xyz method=sum

This works nicely.

Thanks ! If I find the time I will wrap this into a r.proj.aggregate (?) script…

The method options of r.in.xyz could be added to r.proj for spatial
aggregation, but that would mean that most of the code of r.in.xyz
would
need to be copied to r.proj, then adapted. That would be an interesting
enhancement, also considering that gdalwarp offers as resampling
methods
near, bilinear, cubic, cubicspline, lanczos, average, mode, max, min,
med,
Q1, Q3.

But it doesn’t offer sum…

True, but it offers statistical aggregates. And for average, you need to have the sum internally anyway, then divide by the cell count (number of source cells falling into the current target cell), at least this is how I imagine how average would be calculated.

Markus M

Moritz

On 01/12/16 08:49, Markus Metz wrote:

r.to.vect converts only non-null cells, no mask needed, at least for me
in trunk.

Weird, I waws convinced that I saw otherwise before, but now I can confirm. Sorry for the noise...

>2. reproject these vector points to the target location

I locally modified v.proj to add a 'Don't build topology' flag.

Is there any indication against this ?

No.

Ok, added in trunk with r69960.

>
>The method options of r.in.xyz <http://r.in.xyz> could be added to

r.proj for spatial

>aggregation, but that would mean that most of the code of r.in.xyz

<http://r.in.xyz>

>would
>need to be copied to r.proj, then adapted. That would be an interesting
>enhancement, also considering that gdalwarp offers as resampling
>methods
>near, bilinear, cubic, cubicspline, lanczos, average, mode, max, min,
>med,
>Q1, Q3.

But it doesn't offer sum....

True, but it offers statistical aggregates. And for average, you need to
have the sum internally anyway, then divide by the cell count (number of
source cells falling into the current target cell), at least this is how
I imagine how average would be calculated.

True :wink:

Moritz

On 30/11/16 09:40, Paulo van Breugel wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert

> An area() function in r.mapcalc would be nice...

Would indeed be a nice addition, but it isn't too difficult to compute,
using:

|g.region rast=afripop10@PERMANENT|
|PI=3.1415926536|
|export `g.region -g`|
|export `g.proj -j`|

This doesn't work for me as the output is

g.proj -j

+proj=longlat
+no_defs
+a=6378137
+rf=298.257223563
+towgs84=0.000,0.000,0.000

and +a is not acceptable as variable name.

|r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2";|

I tested this approach as well. I.e.

area = above formula
pop_density = pop / area

reproject pop_density

pop = pop_density * (ewres*nsres)

It is faster than the r.in.xyz approach. But it does not seem to be as precise.

I still lost about 100.000 inhabitants, and more when I smooth more (the "loss" increases from nearest neighbor to bilinear to bicubic).

In order to avoid precision issues, I tried with both density by m2 and density by km2, but results are similar.

I don't know which part of the error comes from the use of spheroid approximation and which part comes from the resampling at reprojection.

But I do think that if it is important to maintain exact totals, then the r.to.vect - v.proj - v.out.ascii - r.in.xyz solution seems to be more reliable.

Moritz

On 01-12-16 12:01, Moritz Lennert wrote:

On 30/11/16 09:40, Paulo van Breugel wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert

> An area() function in r.mapcalc would be nice...

Would indeed be a nice addition, but it isn't too difficult to compute,
using:

|g.region rast=afripop10@PERMANENT|
|PI=3.1415926536|
|export `g.region -g`|
|export `g.proj -j`|

This doesn't work for me as the output is

g.proj -j

+proj=longlat
+no_defs
+a=6378137
+rf=298.257223563
+towgs84=0.000,0.000,0.000

and +a is not acceptable as variable name.

You are right, that should be g.proj -g I think

|r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2";|

I tested this approach as well. I.e.

area = above formula
pop_density = pop / area

reproject pop_density

pop = pop_density * (ewres*nsres)

It is faster than the r.in.xyz approach. But it does not seem to be as precise.

I still lost about 100.000 inhabitants, and more when I smooth more (the "loss" increases from nearest neighbor to bilinear to bicubic).

Just guessing, but an issue I had in the past with layers like Afripop is that it may contain cells with extremely high values (a result of the inter/extrapolation methods used to create those layers I guess, not sure it is an issue for e.g., Wordpop). Depending on the resampling method, this could also add to the differences in totals.

In order to avoid precision issues, I tried with both density by m2 and density by km2, but results are similar.

Does it make a difference if you use a much higher target resolution?

I don't know which part of the error comes from the use of spheroid approximation and which part comes from the resampling at reprojection.

It would be interesting to see how results are if using the same routine as above (convert to densities, reproject, convert back to totals), but using the more precise G_area_of_cell_at_row() function to compute the cell areas.

But I do think that if it is important to maintain exact totals, then the r.to.vect - v.proj - v.out.ascii - r.in.xyz solution seems to be more reliable.

Moritz

On Thu, Dec 1, 2016 at 1:16 PM, Paulo van Breugel <p.vanbreugel@gmail.com> wrote:

On 01-12-16 12:01, Moritz Lennert wrote:

On 30/11/16 09:40, Paulo van Breugel wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert

An area() function in r.mapcalc would be nice…

Would indeed be a nice addition, but it isn’t too difficult to compute,
using:

|g.region rast=afripop10@PERMANENT|
|PI=3.1415926536|
|export g.region -g|
|export g.proj -j|

This doesn’t work for me as the output is

g.proj -j

+proj=longlat
+no_defs
+a=6378137
+rf=298.257223563
+towgs84=0.000,0.000,0.000

and +a is not acceptable as variable name.

You are right, that should be g.proj -g I think

|r.mapcalc “rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2”;|

I tested this approach as well. I.e.

area = above formula
pop_density = pop / area

reproject pop_density

pop = pop_density * (ewres*nsres)

It is faster than the r.in.xyz approach. But it does not seem to be as precise.

I still lost about 100.000 inhabitants, and more when I smooth more (the “loss” increases from nearest neighbor to bilinear to bicubic).

Note that some of the loss happens when a non-zero cell is bordering a NULL cell, unless you use one of the *_f interpolation methods.

Just guessing, but an issue I had in the past with layers like Afripop is that it may contain cells with extremely high values (a result of the inter/extrapolation methods used to create those layers I guess, not sure it is an issue for e.g., Wordpop). Depending on the resampling method, this could also add to the differences in totals.

In order to avoid precision issues, I tried with both density by m2 and density by km2, but results are similar.

Why not per hectare? The target cell size is close to one hectare, it could be set to exactly one hectare.

Does it make a difference if you use a much higher target resolution?

In theory, a higher target resolution should result in a higher total sum, a lower target resolution in a lower total sum.

I don’t know which part of the error comes from the use of spheroid approximation and which part comes from the resampling at reprojection.

It would be interesting to see how results are if using the same routine as above (convert to densities, reproject, convert back to totals), but using the more precise G_area_of_cell_at_row() function to compute the cell areas.

Try the new area() function of r.mapcalc :wink:

in the source location

record absolute count as src_count

convert absolute count to density per hectare with

r.mapcalc “pop_dens_ha = 10000.0 * pop_count / area()”

in the target location

set the region according to r.proj -g

use reasonable grid geometry with e.g. g.region -pa res=100

reproject pop_dens_ha with r.proj method=bilinear_f

now density is equal to the absolute count because density is per hectare and cell size is 10000 square meter

record absolute count as tgt_count

fix any deviations between src_count and tgt_count with

r.mapcalc “pop_count_calibrated = pop_dens_ha * $src_count / $tgt_count”

this calbration could also be done for administrative units separately

Markus M

But I do think that if it is important to maintain exact totals, then the r.to.vect - v.proj - v.out.ascii - r.in.xyz solution seems to be more reliable.

Moritz

On Thu, Dec 1, 2016 at 2:31 PM, Markus Metz <markus.metz.giswork@gmail.com>
wrote:

On Thu, Dec 1, 2016 at 1:16 PM, Paulo van Breugel <p.vanbreugel@gmail.com>
wrote:
>
>
>
> On 01-12-16 12:01, Moritz Lennert wrote:
>>
>> On 30/11/16 09:40, Paulo van Breugel wrote:
>>>
>>>
>>>> On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert
>>
>>
>>>>
>>>> > An area() function in r.mapcalc would be nice...
>>>
>>>
>>> Would indeed be a nice addition, but it isn't too difficult to compute,
>>> using:
>>>
>>> |g.region rast=afripop10@PERMANENT|
>>> |PI=3.1415926536 <0415%20926%20536>|
>>> |export `g.region -g`|
>>> |export `g.proj -j`|
>>
>>
>> This doesn't work for me as the output is
>>
>> g.proj -j
>>
>> +proj=longlat
>> +no_defs
>> +a=6378137
>> +rf=298.257223563
>> +towgs84=0.000,0.000,0.000
>>
>> and +a is not acceptable as variable name.
>
>
> You are right, that should be g.proj -g I think
>
>>
>>> |r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
>>> $PI/180) * float($a)^2";|
>>>
>>
>> I tested this approach as well. I.e.
>>
>> area = above formula
>> pop_density = pop / area
>>
>> reproject pop_density
>>
>> pop = pop_density * (ewres*nsres)
>>
>> It is faster than the r.in.xyz approach. But it does not seem to be as
precise.
>>
>> I still lost about 100.000 inhabitants, and more when I smooth more
(the "loss" increases from nearest neighbor to bilinear to bicubic).

Note that some of the loss happens when a non-zero cell is bordering a
NULL cell, unless you use one of the *_f interpolation methods.
>
>
> Just guessing, but an issue I had in the past with layers like Afripop
is that it may contain cells with extremely high values (a result of the
inter/extrapolation methods used to create those layers I guess, not sure
it is an issue for e.g., Wordpop). Depending on the resampling method, this
could also add to the differences in totals.
>
>>
>> In order to avoid precision issues, I tried with both density by m2 and
density by km2, but results are similar.

Why not per hectare? The target cell size is close to one hectare, it
could be set to exactly one hectare.
>
> Does it make a difference if you use a much higher target resolution?

In theory, a higher target resolution should result in a higher total sum,
a lower target resolution in a lower total sum.
>>
>>
>> I don't know which part of the error comes from the use of spheroid
approximation and which part comes from the resampling at reprojection.
>
>
> It would be interesting to see how results are if using the same routine
as above (convert to densities, reproject, convert back to totals), but
using the more precise G_area_of_cell_at_row() function to compute the cell
areas.

Try the new area() function of r.mapcalc :wink:

Missed that in the meantime it was already implemented, awesome!

in the source location
record absolute count as src_count
convert absolute count to density per hectare with
r.mapcalc "pop_dens_ha = 10000.0 * pop_count / area()"

in the target location
set the region according to r.proj -g
use reasonable grid geometry with e.g. g.region -pa res=100
reproject pop_dens_ha with r.proj method=bilinear_f
now density is equal to the absolute count because density is per hectare
and cell size is 10000 square meter
record absolute count as tgt_count

fix any deviations between src_count and tgt_count with
r.mapcalc "pop_count_calibrated = pop_dens_ha * $src_count / $tgt_count"

this calbration could also be done for administrative units separately

Markus M
>
>
>>
>> But I do think that if it is important to maintain exact totals, then
the r.to.vect - v.proj - v.out.ascii - r.in.xyz solution seems to be more
reliable.
>>
>> Moritz
>>
>>
>

On Thu, Dec 1, 2016 at 12:01 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

On 30/11/16 09:40, Paulo van Breugel wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert

An area() function in r.mapcalc would be nice…
[…]

|r.mapcalc “rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2”;|

I tested this approach as well. I.e.

area = above formula
pop_density = pop / area

reproject pop_density

pop = pop_density * (ewres*nsres)

It is faster than the r.in.xyz approach. But it does not seem to be as precise.

I still lost about 100.000 inhabitants, and more when I smooth more (the “loss” increases from nearest neighbor to bilinear to bicubic).

In order to avoid precision issues, I tried with both density by m2 and density by km2, but results are similar.

I don’t know which part of the error comes from the use of spheroid approximation and which part comes from the resampling at reprojection.

But I do think that if it is important to maintain exact totals, then the r.to.vect - v.proj - v.out.ascii - r.in.xyz solution seems to be more reliable.

But with this approach you get spatial artefacts. Assume there are 9 cell in source with

10 10 10
10 10 10
10 10 10

they can become in target e.g.

10 0 20
10 0 20
10 10 10

Total count is maintained, but some target cells might receive no source cell whereas other target cells might receive more than one source cell.

Markus M

On 01/12/16 14:31, Markus Metz wrote:

On Thu, Dec 1, 2016 at 1:16 PM, Paulo van Breugel
<p.vanbreugel@gmail.com <mailto:p.vanbreugel@gmail.com>> wrote:

On 01-12-16 12:01, Moritz Lennert wrote:

On 30/11/16 09:40, Paulo van Breugel wrote:

On Tue, Nov 29, 2016 at 5:44 PM, Moritz Lennert

> An area() function in r.mapcalc would be nice...

Would indeed be a nice addition, but it isn't too difficult to compute,
using:

|g.region rast=afripop10@PERMANENT|
|PI=3.1415926536|
|export `g.region -g`|
|export `g.proj -j`|

This doesn't work for me as the output is

g.proj -j

+proj=longlat
+no_defs
+a=6378137
+rf=298.257223563
+towgs84=0.000,0.000,0.000

and +a is not acceptable as variable name.

You are right, that should be g.proj -g I think

No this does not provide a, but the name of the ellipsoid. I'm not sure that you can currently directly parse g.proj output for a...

This works for me:

eval $(g.proj -j | awk '{if(match($1,"=")>0){sub("+", "", $1); print $1}}')

|r.mapcalc "rcs = (sin(y()+$nsres/2) - sin(y()-$nsres/2)) * ($ewres *
$PI/180) * float($a)^2";|

I tested this approach as well. I.e.

area = above formula
pop_density = pop / area

reproject pop_density

pop = pop_density * (ewres*nsres)

It is faster than the r.in.xyz <http://r.in.xyz> approach. But it

does not seem to be as precise.

I still lost about 100.000 inhabitants, and more when I smooth more

(the "loss" increases from nearest neighbor to bilinear to bicubic).

Note that some of the loss happens when a non-zero cell is bordering a
NULL cell, unless you use one of the *_f interpolation methods.

Just guessing, but an issue I had in the past with layers like Afripop

is that it may contain cells with extremely high values (a result of the
inter/extrapolation methods used to create those layers I guess, not
sure it is an issue for e.g., Wordpop). Depending on the resampling
method, this could also add to the differences in totals.

In order to avoid precision issues, I tried with both density by m2

and density by km2, but results are similar.

Why not per hectare? The target cell size is close to one hectare, it
could be set to exactly one hectare.

Does it make a difference if you use a much higher target resolution?

In theory, a higher target resolution should result in a higher total
sum, a lower target resolution in a lower total sum.

Why is that if we work in densities ?

I don't know which part of the error comes from the use of spheroid

approximation and which part comes from the resampling at reprojection.

It would be interesting to see how results are if using the same

routine as above (convert to densities, reproject, convert back to
totals), but using the more precise G_area_of_cell_at_row() function to
compute the cell areas.

Try the new area() function of r.mapcalc :wink:

Thanks !!

FYI, for all of Madagascar, for pixels of resolution 0:00:02.99988 (i.e. +/- 100m at the equator), I get an RMSE of about 42m2 between Paulo's formula and area() in r.mapcalc. The error varies between 31 and 52m2, with highest values in the North, i.e. the closer you get to the equator.

in the source location
record absolute count as src_count
convert absolute count to density per hectare with
r.mapcalc "pop_dens_ha = 10000.0 * pop_count / area()"

in the target location
set the region according to r.proj -g
use reasonable grid geometry with e.g. g.region -pa res=100
reproject pop_dens_ha with r.proj method=bilinear_f

Actually, using area(), and a 100m target resolution, I get closer to the source total with bilinear than with bilinear_f, but in all cases the target total is now _above_ the source total:

Source total: 20714000

Target total - Source total:

nn: 22473
bilinear: 2300
bilinear_f: 26873

now density is equal to the absolute count because density is per
hectare and cell size is 10000 square meter
record absolute count as tgt_count

fix any deviations between src_count and tgt_count with
r.mapcalc "pop_count_calibrated = pop_dens_ha * $src_count / $tgt_count"

this calbration could also be done for administrative units separately

Yes, obviously, by calibrating this way we will get the same total by definition. But this means that we postulate that the "error" is distributed equally across space...

Thanks to both of you for the very helpful discussion. I'm afraid few people dealing with these data think much about this problem.

Moritz