[GRASS-dev] [GRASS GIS] #3361: v.select: very slow on within (GEOS) operator

#3361: v.select: very slow on within (GEOS) operator
---------------------------------------+-------------------------
Reporter: mlennert | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.4.0
Component: Vector | Version: svn-trunk
Keywords: v.select GEOS within slow | CPU: Unspecified
Platform: Unspecified |
---------------------------------------+-------------------------
I have not run similar tests with the other operators, but with the
within operator, v.select is very slow.

First I create a buffer around the NC railroads map:

{{{
v.buffer railroads dist=5000 out=rail5000
}}}

Then v.select:

{{{
time v.select ain=boundary_municp bin=rail5000 op=within out=select
real 2m13.989s
user 1m57.888s
sys 0m15.956s
}}}

Using the following script, I get the identical result much faster (maybe
using v.distance is another option, but I haven't tried that):

{{{
g.copy vect=boundary_municp,munic
v.db.addcolumn munic col="totalarea double precision"
v.to.db munic op=area col=totalarea
v.overlay ain=munic bin=rail5000 op=and out=munic_and_buffer
v.db.addcolumn munic_and_buffer col="area double precision"
v.to.db munic_and_buffer op=area col=area
sleep 1
v.extract boundary_municp \
    cat=$(db.select -c sql="select a_cat from munic_and_buffer where round(area,1)/round(a_totalarea,1)=1" | awk '{printf"%s,", $1}') \
    output=select_bis
}}}
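
The test in the last step of the script (overlay area equal to total area after rounding to one decimal) can be isolated as a small sketch. It assumes rows of the form `a_cat|area|a_totalarea` on stdin, which is a variation on the query above and not what the script itself selects. The function name `cats_fully_inside` is hypothetical and the sample rows are made up. Unlike the `awk` call above, it emits no trailing comma.

```shell
#!/bin/sh
# Hypothetical helper: read "a_cat|area|a_totalarea" rows on stdin and print
# the categories whose overlay area equals their total area (both rounded to
# one decimal, as in the SQL above) as a comma-separated list.
cats_fully_inside() {
    awk -F'|' '
        $3 > 0 {
            # Compare the two areas after rounding each to one decimal.
            if (sprintf("%.1f", $2) == sprintf("%.1f", $3)) {
                printf "%s%s", sep, $1
                sep = ","
            }
        }
        END { if (sep != "") printf "\n" }
    '
}

# Made-up sample rows, not taken from the NC data set.
printf '%s\n' '1|100.04|100.01' '2|50.0|80.0' '3|12.34|12.34' | cats_fully_inside
```

With the sample rows, categories 1 and 3 pass the test and category 2 does not. The resulting list could be used in place of the `$(...)` expression in the `v.extract` call.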

Time for running entire script:

{{{
real 0m14.611s
user 0m6.084s
sys 0m5.084s
}}}

I stumbled across this because a student had a within operation that kept
on running for hours and hours, and using an equivalent of the above
script we were able to get the same result within minutes.

I imagine that by going through GEOS we lose the spatial index, or that
there are other significant overheads, and that this is what causes such a
serious slowdown. The difference is so large, however, that I wonder whether
there is anything we could do to optimize v.select's GEOS operators. Or
is the only solution to implement the same operators natively? Maybe a
nice GSoC project?
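
To illustrate why a spatial prefilter matters for the hypothesis above, here is a minimal sketch of a bounding-box test applied before any exact predicate. It is only an illustration under assumptions, not how v.select or GEOS is implemented: the input format (`id xmin ymin xmax ymax`, one box per line), the function name `count_candidate_pairs` and the sample boxes are all hypothetical. The sketch still enumerates every pair, whereas a real spatial index would avoid even that.

```shell
#!/bin/sh
# Hypothetical illustration: count how many A/B feature pairs survive a cheap
# bounding-box test. $1 = boxes of map A, $2 = boxes of map B, one box per
# line in the form "id xmin ymin xmax ymax".
count_candidate_pairs() {
    awk '
        # First file on the command line (map B): remember every box.
        NR == FNR { bx1[$1] = $2; by1[$1] = $3; bx2[$1] = $4; by2[$1] = $5; next }
        # Second file (map A): test each A box against every B box.
        {
            for (id in bx1) {
                total++
                # Two boxes overlap unless one lies entirely to one side.
                if ($2 <= bx2[id] && $4 >= bx1[id] && $3 <= by2[id] && $5 >= by1[id])
                    kept++
            }
        }
        END { printf "%d of %d pairs need the exact test\n", kept, total }
    ' "$2" "$1"
}

# Made-up boxes for demonstration only.
a_boxes=$(mktemp) && b_boxes=$(mktemp) || exit 1
printf '%s\n' 'a1 0 0 10 10' 'a2 100 100 110 110' > "$a_boxes"
printf '%s\n' 'b1 5 5 20 20' 'b2 200 200 210 210' > "$b_boxes"
count_candidate_pairs "$a_boxes" "$b_boxes"
rm -f "$a_boxes" "$b_boxes"
```

Only the pairs that pass the box test would need the expensive exact predicate (within, intersects, ...), which is where the time is presumably spent.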

I'm classifying this as an enhancement, but I'm pretty close to
considering it a bug, since operations take this long as soon as there is
a significant amount of data...

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using within (GEOS) operator
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: new
  Priority: normal | Milestone: 7.4.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:1>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using GEOS operators
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: new
  Priority: normal | Milestone: 7.4.1
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------

Comment (by mlennert):

Actually, it is not only within. Comparing the native 'overlap' operator
with its GEOS equivalent, the 'intersects' operator, I get a significant
time difference:

{{{
time v.select -c ain=boundary_municp bin=rail5000 op=overlap out=select_overlap
real 0m27.363s
user 0m12.836s
sys 0m14.696s
}}}

{{{
time v.select -c ain=boundary_municp bin=rail5000 op=intersects out=select_intersects
real 1m12.190s
user 0m56.844s
sys 0m15.511s
}}}
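
To put numbers on these differences, the `real` times reported in this ticket can be turned into ratios. This is a small sketch; the helper names `to_seconds` and `slowdown` are hypothetical, and it assumes the `NmS.SSSs` format printed by the shell's `time` above.

```shell
#!/bin/sh
# Hypothetical helper: convert a `time`-style value such as "1m12.190s"
# to seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: ratio of two `time`-style values, slow over fast,
# rounded to one decimal.
slowdown() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 1m12.190s 0m27.363s   # intersects vs. overlap (real time)
slowdown 2m13.989s 0m14.611s   # within vs. the overlay script (real time)
```

With the figures above, intersects takes roughly 2.6 times as long as overlap, and within roughly 9.2 times as long as the overlay-based script from the description.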

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:3>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using GEOS operators
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.4.1
Component: Vector | Version: svn-trunk
Resolution: fixed | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------
Changes (by mmetz):

* status: new => closed
* resolution: => fixed

Comment:

In [changeset:"72705" 72705]:
{{{
#!CommitTicketReference repository="" revision="72705"
v.select: re-organize code to select features from vector map A by
features from other vector map B (fixes #3361)
}}}

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:4>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using GEOS operators
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.4.1
Component: Vector | Version: svn-trunk
Resolution: fixed | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------

Comment (by mmetz):

Replying to [comment:4 mmetz]:
> In [changeset:"72705" 72705]:
> {{{
> #!CommitTicketReference repository="" revision="72705"
> v.select: re-organize code to select features from vector map A by
> features from other vector map B (fixes #3361)
> }}}

Assuming that the result will be a subset of map A, selected by features
from map B, the code re-organization results in a substantial speed-up.
v.select is now nearly as fast as the alternative in the description.

The results of `operator=overlap` and the GEOS-equivalent
`operator=intersects` are identical, but the speed difference based on the
example in the description

{{{
v.select ain=boundary_municp bin=rail5000 out=select op=overlap/intersects
}}}

is astonishing, as of trunk r72705.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:5>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using GEOS operators
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.6.0
Component: Vector | Version: svn-trunk
Resolution: fixed | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------
Changes (by neteler):

* milestone: 7.4.1 => 7.6.0

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:6>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using GEOS operators
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.6.0
Component: Vector | Version: svn-trunk
Resolution: fixed | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------

Comment (by mlennert):

Replying to [comment:5 mmetz]:
> Replying to [comment:4 mmetz]:
> > In [changeset:"72705" 72705]:
> > {{{
> > #!CommitTicketReference repository="" revision="72705"
> > v.select: re-organize code to select features from vector map A by
> > features from other vector map B (fixes #3361)
> > }}}
>
> Assuming that the result will be a subset of map A, selected by features
> from map B, the code re-organization results in a substantial speed-up.
> v.select is now nearly as fast as the alternative in the description.
>
> The results of `operator=overlap` and the GEOS-equivalent
> `operator=intersects` are identical, but the speed difference based on the
> example in the description
>
> {{{
> v.select ain=boundary_municp bin=rail5000 out=select op=overlap/intersects
> }}}
>
> is astonishing, as of trunk r72705.

As reported on the [http://lists.osgeo.org/pipermail/grass-user/2018-May/078255.html grass-users list],
working with r72716, I actually get different results depending on whether
I use intersects or overlap when working with atype=areas and btype=lines.
I don't know whether this result is expected. I can provide the data
privately if useful.
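
One way to pin down such a difference is to compare the category lists of the two outputs. The sketch below assumes each list has been written to a file with one category per line; the helper name `diff_cats` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given two files with one category per line, print the
# categories that appear in only one of them, prefixed with "< " (first file
# only) or "> " (second file only).
diff_cats() {
    a_sorted=$(mktemp) && b_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted in the same (here: byte-wise) order.
    LC_ALL=C sort -u "$1" > "$a_sorted"
    LC_ALL=C sort -u "$2" > "$b_sorted"
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | sed 's/^/< /'
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | sed 's/^/> /'
    rm -f "$a_sorted" "$b_sorted"
}

# Possible usage, assuming the two result maps from comment 3 exist and have
# attribute tables with a "cat" column:
#   v.db.select -c map=select_overlap columns=cat > overlap.txt
#   v.db.select -c map=select_intersects columns=cat > intersects.txt
#   diff_cats overlap.txt intersects.txt
```

An empty output would mean that both operators selected the same set of categories.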

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:7>
GRASS GIS <https://grass.osgeo.org>

#3361: v.select: very slow using GEOS operators
--------------------------+---------------------------------------
  Reporter: mlennert | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.4.2
Component: Vector | Version: svn-trunk
Resolution: fixed | Keywords: v.select GEOS within slow
       CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------
Changes (by neteler):

* milestone: 7.6.0 => 7.4.2

Comment:

Replying to [comment:4 mmetz]:
> In [changeset:"72705" 72705]:
> {{{
> #!CommitTicketReference repository="" revision="72705"
> v.select: re-organize code to select features from vector map A by
> features from other vector map B (fixes #3361)
> }}}

Reopened for potential backport

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3361#comment:8>
GRASS GIS <https://grass.osgeo.org>