[GRASS-dev] [GRASS GIS] #2035: v.kcv optimization

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Jan Ruzicka and Jan Vandrol:

in connection with attempts to rewrite some GRASS module with OpenMP we
have rewrote the v.kcv module. The new version is much faster that the
previous (no OpenMP is now needed - there is only one loop now with direct
writing to the resulting layer). We believe that the new version of the
module do the same work as original. We are not familiar with GRASS GIS
development so the code is included in the attachment of this message.
Hope that it could be useful.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2035&gt;
GRASS GIS <http://grass.osgeo.org>

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by mmetz):

Some comments on the patch:

The patch makes sense to me and is indeed a substantial speed improval.
However, partitions were confused with categories. The patch was written
in C++ style, but the module is a C module. I made further changes to the
patch: the attribute table of current vector is modified instead of
creating a new vector, giving a further speed-up. The vector is opened
without topology, that makes it easier for larger point clouds. Updating
the database uses db_[begin|commit]_transaction(), giving a further speed-
up.

Thanks for your contribution!

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2035#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

Thank you, we would like to help more, but C is not our favorite language.

Kind regards

Jan Ruzicka

On 18.7.2013 11:02, GRASS GIS wrote:

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
  Reporter: mmetz | Owner: grass-dev@…
      Type: enhancement | Status: new
  Priority: normal | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
  Keywords: v.kcv | Platform: Unspecified
       Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by mmetz):

  Some comments on the patch:

  The patch makes sense to me and is indeed a substantial speed improval.
  However, partitions were confused with categories. The patch was written
  in C++ style, but the module is a C module. I made further changes to the
  patch: the attribute table of current vector is modified instead of
  creating a new vector, giving a further speed-up. The vector is opened
  without topology, that makes it easier for larger point clouds. Updating
  the database uses db_[begin|commit]_transaction(), giving a further speed-
  up.

  Thanks for your contribution!

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by mmetz):

Any testing of v.kcv must ensure that there is no bias both in the point
selection and the partition assignment. Thus a 2-step random selection
process seems to provide the least biased results: 1. randomly select a
point, 2. randomly assign a partition to this point. This is what v.kcv
does in GRASS 6, but in a very inefficient way.

v.kcv has been optimized in trunk r57231.

Some testing with vectors in the North Carolina dataset:

Vector geonames_wake with 1088 points copied to the sqlite mapset:

{{{
g.copy vect=geonames_wake,my_geonames_wake

# g6
time v.kcv in=my_geonames_wake out=my_geonames_wake_kvc col=part k=10

real 2m27.108s
user 0m0.979s
sys 0m0.754s

# with db_[begin|commit]_transaction() added to local copy
real 0m1.931s
user 0m0.742s
sys 0m0.358s

# g7
v.kcv map=my_geonames_wake col=part k=10

real 0m0.625s
user 0m0.133s
sys 0m0.068s

}}}

Vector geodetic_pts with 29939 points copied to the sqlite mapset:

{{{
g.copy vect=geodetic_pts,my_geodetic_pts

# g6 with db_[begin|commit]_transaction() added to local copy
v.kcv in=my_geodetic_pts out=my_geodetic_pts_kvc col=part k=10

real 3m26.652s
user 2m52.859s
sys 0m30.714s

# g7
v.kcv map=my_geodetic_pts col=part k=10

real 0m4.813s
user 0m2.836s
sys 0m1.527s
}}}

In GRASS 7, about 70% of the processing time are used by the database
driver sqlite for a vector with about 120 000 points (not feasible to
process in GRASS 6). Any further optimization should thus look at the
sqlite driver, or even sqlite itself instead of the v.kcv module.

For GRASS 6, I would recommend to add db_[begin|commit]_transaction().

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2035#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by mmetz):

The attached patch is for GRASS 6.4.3 for those interested.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2035#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 6.4.4
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * milestone: 7.0.0 => 6.4.4

Comment:

Replying to [comment:3 mmetz]:
> The attached patch is for GRASS 6.4.3 for those interested.

Backported to devbranch65 in r59417.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2035#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 6.4.4
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by mlennert):

I strongly prefer the GRASS65 version of v.kcv, and so would support
backporting it, with one serious remark though:

{{{
> g.gisenv
GISDBASE=/data/GRASS/DATA
LOCATION_NAME=nc_spm_08
MAPSET=user1

> v.info -c elev_lid792_randpts@PERMANENT
Displaying column types/names for database connection of layer 1:
INTEGER|cat
DOUBLE PRECISION|value

> v.kcv elev_lid792_randpts@PERMANENT k=10 col=partition
  100%
ERROR: Bug: attempt to update map which is not in current mapset

  > v.info -c elev_lid792_randpts@PERMANENT
Displaying column types/names for database connection of layer 1:
INTEGER|cat
DOUBLE PRECISION|value
INTEGER|partition
}}}

In other words: I have just updated the attribute table of a vector map
that is not in the current mapset. That looks like a serious breach of our
general rules here !

Trying the same in trunk, I get:

{{{
ERREUR :Vector map <elev_lid792_randpts> not found in current mapset
}}}

Moritz

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2035#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 6.4.4
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by neteler):

Replying to [comment:5 mlennert]:
> I strongly prefer the GRASS65 version of v.kcv, and so would support
> backporting it, with one serious remark though:
...
> In other words: I have just updated the attribute table of a vector map
> that is not in the current mapset.

I have added a test as used in v.build: r60263.

Still you can override that :frowning:

{{{
GRASS 6.5.svn (nc_spm_08):~ > v.kcv input=geonames_wake@PERMANENT
column=part k=10
  100%
ERROR: Bug: attempt to update map which is not in current mapset
}}}

This appears to affect also at least v.build:

{{{
GRASS 6.5.svn (nc_spm_08):~ > g.gisenv | grep MAPSET
MAPSET='neteler';
GRASS 6.5.svn (nc_spm_08):~ > v.build geonames_wake@PERMANENT
Building topology for vector map <geonames_wake>...
Registering primitives...
1088 primitives registered
...
}}}

... hence not scope of this ticket.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2035#comment:6&gt;
GRASS GIS <http://grass.osgeo.org>

#2035: v.kcv optimization
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 6.4.5
Component: Vector | Version: svn-trunk
Keywords: v.kcv | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * milestone: 6.4.4 => 6.4.5

Comment:

For the record: The attachment:v.kcv_G64.patch still awaits the 6.4
backport.
OK to apply it?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2035#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>