[GRASS-user] Redefining computational region for vector overlay

Hello GRASS users!

I am working to make a certain vector process faster and more effective and would like to ask you for some advice on how to proceed.

I have a vector map containing points and a collection of vector maps containing polygons. My task is to calculate the area covered by these polygons en each polygon map but only under buffers of certain sizes around each of the points. I understand how to do this. My problem is how to do it faster and more effectively.

Each polygon map contains around 2 million of them. The point map has only ~60 points. Both cover the entire country (Sweden). The buffer zones extend only a relatively small area around each point (up to some tens of km). Selecting points and generating buffers around them is easy and fast. The problem appears in the overlay operation. The vast majority of polygons are not involved in the computations but both v.clip and v.overlay seem to need to process them all. This takes a long time.

I wish there could be a way of limiting the number of features involved in the computation to the size of each buffer zone in order to cut down the processing time. I understand (and confirmed empirically!) that setting the computational region does not work for vector operations in GRASS (except for v.in.region). I therefore wonder what can be done in this respect.

I am not new to GRASS but have not used it for heavy duty tasks in a long time, my skills have become rusty! Any hint will be greatly appreciated!

Thanks in advance!

Hernán

Le Sun, 13 May 2018 13:21:51 +0200,
Hernán De Angelis <dhdeangelis@comhem.se> a écrit :

Hello GRASS users!

I am working to make a certain vector process faster and more
effective and would like to ask you for some advice on how to proceed.

I have a vector map containing points and a collection of vector maps
containing polygons. My task is to calculate the area covered by
these polygons en each polygon map but only under buffers of certain
sizes around each of the points. I understand how to do this. My
problem is how to do it faster and more effectively.

Each polygon map contains around 2 million of them. The point map has
only ~60 points. Both cover the entire country (Sweden). The buffer
zones extend only a relatively small area around each point (up to
some tens of km). Selecting points and generating buffers around them
is easy and fast. The problem appears in the overlay operation. The
vast majority of polygons are not involved in the computations but
both v.clip and v.overlay seem to need to process them all. This
takes a long time.

I wish there could be a way of limiting the number of features
involved in the computation to the size of each buffer zone in order
to cut down the processing time. I understand (and confirmed
empirically!) that setting the computational region does not work for
vector operations in GRASS (except for v.in.region). I therefore
wonder what can be done in this respect.

I am not new to GRASS but have not used it for heavy duty tasks in a
long time, my skills have become rusty! Any hint will be greatly
appreciated!

I'm a bit surprised that v.overlay is so slow for you. In the the
little alternative (faster) script in [1] I got quite fast results
using v.overlay with other modules. Maybe it can help you identify a
faster way.

Moritz

[1] https://trac.osgeo.org/grass/ticket/3361.

Thanks for your answer and link, Moritz.

Of course, slow or fast are relative terms. May be I wasn't clear enough. Let's me explain better:

The core of my processing is simple:

1. extract point from point map (v.extract)

2. create buffer (v.buffer)

3. clip polygon map with buffered point (v.clip)

4. calculate areas (v.to.db)

This takes only 8 minutes. So we can agree to call it "fast".

The problem is: I have to do this for 60 points, in 5 different buffer sizes, for 14 sets of polygon maps, that is 4200 times.

At this rate, this process will take 4200 x 8 minutes = 33600 minutes, or about 23 days and 8 hours.

That's why my question was if there was setting I might activate to reduce computation time like, for example, limiting the computational region to the area under the buffer. In the meantime since I posted my question I read more carefully the v.clip manual and see that there is a switch (-r) to use in a defined region. This is what I am using now.

I will have a look at your script, but for what I see it does not seem critically different from my working routine. I believe I may explore another path, namely converting the polygons to a raster of suitably small pixel size and calculating the areas using r.stats instead. may be that's significantly faster.

Regards and thanks for the help!

/H.

On 2018-05-13 17:05, Moritz Lennert wrote:

Le Sun, 13 May 2018 13:21:51 +0200,
Hernán De Angelis <dhdeangelis@comhem.se> a écrit :

Hello GRASS users!

I am working to make a certain vector process faster and more
effective and would like to ask you for some advice on how to proceed.

I have a vector map containing points and a collection of vector maps
containing polygons. My task is to calculate the area covered by
these polygons en each polygon map but only under buffers of certain
sizes around each of the points. I understand how to do this. My
problem is how to do it faster and more effectively.

Each polygon map contains around 2 million of them. The point map has
only ~60 points. Both cover the entire country (Sweden). The buffer
zones extend only a relatively small area around each point (up to
some tens of km). Selecting points and generating buffers around them
is easy and fast. The problem appears in the overlay operation. The
vast majority of polygons are not involved in the computations but
both v.clip and v.overlay seem to need to process them all. This
takes a long time.

I wish there could be a way of limiting the number of features
involved in the computation to the size of each buffer zone in order
to cut down the processing time. I understand (and confirmed
empirically!) that setting the computational region does not work for
vector operations in GRASS (except for v.in.region). I therefore
wonder what can be done in this respect.

I am not new to GRASS but have not used it for heavy duty tasks in a
long time, my skills have become rusty! Any hint will be greatly
appreciated!

I'm a bit surprised that v.overlay is so slow for you. In the the
little alternative (faster) script in [1] I got quite fast results
using v.overlay with other modules. Maybe it can help you identify a
faster way.

Moritz

[1] https://trac.osgeo.org/grass/ticket/3361.

Le Sun, 13 May 2018 17:48:41 +0200,
Hernán De Angelis <dhdeangelis@comhem.se> a écrit :

Thanks for your answer and link, Moritz.

Of course, slow or fast are relative terms. May be I wasn't clear
enough. Let's me explain better:

The core of my processing is simple:

1. extract point from point map (v.extract)

2. create buffer (v.buffer)

3. clip polygon map with buffered point (v.clip)

4. calculate areas (v.to.db)

This takes only 8 minutes. So we can agree to call it "fast".

8 minutes for 1 buffer and the 2 million polygons ? And it is step 3
which takes a long time ? I agree that this should be faster. It would
be interesting to understand which part of v.clip takes the longest.

AFAICT, if you clip areas by areas it uses v.overlay so that is
identical with my script. However, by default there is a dissolve step,
unless you use the -d flag. Maybe try with this ?

I imagine that your buffers overlap ? Another option would be to run
v.buffer only once on all points with the -t flag, then v.overlay, then
calculate the areas of overlap between the buffers and the small
polygons and with some SQL magic group the areas by original point cats.

The problem is: I have to do this for 60 points, in 5 different
buffer sizes, for 14 sets of polygon maps, that is 4200 times.

At this rate, this process will take 4200 x 8 minutes = 33600
minutes, or about 23 days and 8 hours.

If this approach ends up being the only one, you might be able to
significantly shorten this by parallelizing it.

That's why my question was if there was setting I might activate to
reduce computation time like, for example, limiting the computational
region to the area under the buffer. In the meantime since I posted
my question I read more carefully the v.clip manual and see that
there is a switch (-r) to use in a defined region. This is what I am
using now.

Watch out, the -r flag in v.clip means that instead of clipping using
the buffers, you clip your polygons by the rectangle
representing your current computational region. This means you will not
have the area of polygons overlaid by the buffers, but the area that
falls in the current computational region.

I will have a look at your script, but for what I see it does not
seem critically different from my working routine. I believe I may
explore another path, namely converting the polygons to a raster of
suitably small pixel size and calculating the areas using r.stats
instead. may be that's significantly faster.

r.stats, or r.stats.zonal, or r.univar with the zones parameter, but
for all these the issue is how to handle overlapping buffer areas.

GRASS GIS normally uses a spatial index. This means that overlay
operations of vector features should not have to test in detail all
features, but only those that fall into the relevant bounding boxes.
Maybe there is some issue with this index not being
used ? CC'ing MarkusM who wrote the index code.

Moritz

Hi again Moritz, and thanks for your detailed answer.

Thank you for explaining what the -r switch does in v.clip. Yes, it is step 3 that takes long time to complete.

I suspect that the polygon maps may be problematic themselves. When I imported them from shapefiles it also took a long time with all the cleaning and topology checks.

Unfortunately I do not have much time to explore this in detail now. I will try converting to these polygon maps to a raster with small enough cell size. That will be fine.

Thanks again for the help!

Cheers!

H.

On 2018-05-13 19:06, Moritz Lennert wrote:

Le Sun, 13 May 2018 17:48:41 +0200,
Hernán De Angelis <dhdeangelis@comhem.se> a écrit :

Thanks for your answer and link, Moritz.

Of course, slow or fast are relative terms. May be I wasn't clear
enough. Let's me explain better:

The core of my processing is simple:

1. extract point from point map (v.extract)

2. create buffer (v.buffer)

3. clip polygon map with buffered point (v.clip)

4. calculate areas (v.to.db)

This takes only 8 minutes. So we can agree to call it "fast".

8 minutes for 1 buffer and the 2 million polygons ? And it is step 3
which takes a long time ? I agree that this should be faster. It would
be interesting to understand which part of v.clip takes the longest.

AFAICT, if you clip areas by areas it uses v.overlay so that is
identical with my script. However, by default there is a dissolve step,
unless you use the -d flag. Maybe try with this ?

I imagine that your buffers overlap ? Another option would be to run
v.buffer only once on all points with the -t flag, then v.overlay, then
calculate the areas of overlap between the buffers and the small
polygons and with some SQL magic group the areas by original point cats.

The problem is: I have to do this for 60 points, in 5 different
buffer sizes, for 14 sets of polygon maps, that is 4200 times.

At this rate, this process will take 4200 x 8 minutes = 33600
minutes, or about 23 days and 8 hours.

If this approach ends up being the only one, you might be able to
significantly shorten this by parallelizing it.

That's why my question was if there was setting I might activate to
reduce computation time like, for example, limiting the computational
region to the area under the buffer. In the meantime since I posted
my question I read more carefully the v.clip manual and see that
there is a switch (-r) to use in a defined region. This is what I am
using now.

Watch out, the -r flag in v.clip means that instead of clipping using
the buffers, you clip your polygons by the rectangle
representing your current computational region. This means you will not
have the area of polygons overlaid by the buffers, but the area that
falls in the current computational region.

I will have a look at your script, but for what I see it does not
seem critically different from my working routine. I believe I may
explore another path, namely converting the polygons to a raster of
suitably small pixel size and calculating the areas using r.stats
instead. may be that's significantly faster.

r.stats, or r.stats.zonal, or r.univar with the zones parameter, but
for all these the issue is how to handle overlapping buffer areas.

GRASS GIS normally uses a spatial index. This means that overlay
operations of vector features should not have to test in detail all
features, but only those that fall into the relevant bounding boxes.
Maybe there is some issue with this index not being
used ? CC'ing MarkusM who wrote the index code.

Moritz

On Sun, May 13, 2018 at 7:06 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Le Sun, 13 May 2018 17:48:41 +0200,
Hernán De Angelis <dhdeangelis@comhem.se> a écrit :

Thanks for your answer and link, Moritz.

Of course, slow or fast are relative terms. May be I wasn’t clear
enough. Let’s me explain better:

The core of my processing is simple:

  1. extract point from point map (v.extract)

  2. create buffer (v.buffer)

  3. clip polygon map with buffered point (v.clip)

  4. calculate areas (v.to.db)

This takes only 8 minutes. So we can agree to call it “fast”.

8 minutes for 1 buffer and the 2 million polygons ? And it is step 3
which takes a long time ? I agree that this should be faster. It would
be interesting to understand which part of v.clip takes the longest.

AFAICT, if you clip areas by areas it uses v.overlay so that is
identical with my script. However, by default there is a dissolve step,
unless you use the -d flag. Maybe try with this ?

I imagine that your buffers overlap ? Another option would be to run
v.buffer only once on all points with the -t flag, then v.overlay, then
calculate the areas of overlap between the buffers and the small
polygons and with some SQL magic group the areas by original point cats.

The problem is: I have to do this for 60 points, in 5 different
buffer sizes, for 14 sets of polygon maps, that is 4200 times.

At this rate, this process will take 4200 x 8 minutes = 33600
minutes, or about 23 days and 8 hours.

If this approach ends up being the only one, you might be able to
significantly shorten this by parallelizing it.

That’s why my question was if there was setting I might activate to
reduce computation time like, for example, limiting the computational
region to the area under the buffer. In the meantime since I posted
my question I read more carefully the v.clip manual and see that
there is a switch (-r) to use in a defined region. This is what I am
using now.

Watch out, the -r flag in v.clip means that instead of clipping using
the buffers, you clip your polygons by the rectangle
representing your current computational region. This means you will not
have the area of polygons overlaid by the buffers, but the area that
falls in the current computational region.

I will have a look at your script, but for what I see it does not
seem critically different from my working routine. I believe I may
explore another path, namely converting the polygons to a raster of
suitably small pixel size and calculating the areas using r.stats
instead. may be that’s significantly faster.

r.stats, or r.stats.zonal, or r.univar with the zones parameter, but
for all these the issue is how to handle overlapping buffer areas.

GRASS GIS normally uses a spatial index. This means that overlay
operations of vector features should not have to test in detail all
features, but only those that fall into the relevant bounding boxes.
Maybe there is some issue with this index not being
used ? CC’ing MarkusM who wrote the index code.

You could try to replace

  1. v.clip

with

3.1. v.select to select only those areas overlapping with the current buffer

3.2 v.overlay of the selected areas with the current buffer which should now take only seconds

Markus M

On 2018-05-13 21:29, Markus Metz wrote:

ou could try to replace

3. v.clip

with

3.1. v.select to select only those areas overlapping with the current buffer
3.2 v.overlay of the selected areas with the current buffer which should now take only seconds

Markus M

That made a *very* serious improvement in performance!

Now processing ~2.5 points per minute (30 in 12 minutes). From the previous 1 every 8 minutes with v.clip this change makes a ~20x improvement in performance!

Thanks Markus for the suggestion and thanks Moritz for your kind help!

/H.