#2131: Terrible performance from v.what.rast due to per-iteration db_execute
-------------------------------------+--------------------------------------
Reporter: hamish | Owner: grass-dev@…
Type: defect | Status: new
Priority: major | Milestone: 6.4.4
Component: Database | Version: svn-develbranch6
Keywords: v.what.rast, db_execute | Platform: Linux
Cpu: x86-64 |
-------------------------------------+--------------------------------------
Hi,
I'm running v.what.rast on 175k query points in 6.x, and it's taking a
horribly long time.
At debug level 1 it shows the query processing finishing and reaching
the "Updating db table" stage in under one second, but over an *hour
later* I'm still waiting for the dbf process, which is running at 99%
CPU. This is a fast workstation, too.
v.out.ascii's columns= option suffered from the same trouble the last
time I tried it, to the point where it becomes unusable with more than
~10k vector points.
The v.colors, v.in.garmin, and v.in.gpsbabel scripts /used to/ suffer
from the same thing, but we sped them up by writing all the SQL commands
to a temp file and then running db.execute just once. It seems that
opening and closing the database has non-trivial overhead, and when you
do that for every single cat it adds up in a pretty impressive way. Even
if another DB backend is faster to start+write+stop, I doubt the
difference would be more than ~20%, max. Notably, 100k points takes much,
much longer than 10x the time for a 10k-point vector map, so the scaling
looks worse than linear.
demo:
{{{
g.region rast=elevation
v.random out=test_100k_pts n=100000
v.db.addtable test_100k_pts column='cat integer, elev double'  # gets slow too!
time v.what.rast vect=test_100k_pts rast=elevation column=elev
}}}
My current workaround is to add a flag to v.what.rast to optionally
print the result to stdout instead of writing it to a db column. (Done
locally; I'm still testing some other interpolation improvements, so I
haven't committed anything yet.)
With that -p flag, the module takes 0.5 seconds to complete when stdout
is redirected to /dev/null.
Any thoughts on the idea of writing the sql commands to a tempfile or
pipe, then running db_execute_immediate() just once for all of them?
(Maybe the per-iteration bsearch() in the loop is inefficient too, but
`top` shows that 'dbf' is the thing eating all the cpu time.)
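The tempfile idea itself can be sketched like this (again hedged: sqlite3's executescript() stands in for a single execute pass over the collected statements, since the real fix would go through GRASS's db library):

```python
# Sketch of "collect all SQL in a temp file, then execute it once".
import os
import sqlite3
import tempfile

# Step 1: write every statement to one temp file instead of executing
# each one against a freshly opened database.
sql_path = os.path.join(tempfile.mkdtemp(), "updates.sql")
with open(sql_path, "w") as f:
    f.write("CREATE TABLE pts (cat INTEGER, elev DOUBLE);\n")
    for cat in range(100):
        f.write(f"INSERT INTO pts VALUES ({cat}, {cat * 0.5});\n")

# Step 2: one database open, one pass over all statements, one close.
con = sqlite3.connect(":memory:")
with open(sql_path) as f:
    con.executescript(f.read())
n = con.execute("SELECT COUNT(*) FROM pts").fetchone()[0]
con.close()
print(n)  # 100
```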
In trunk it takes about 6 seconds to complete the 100k random points.
I'm not seeing anything obvious in the module changelog, so I guess
something in the libraries got fixed? Any hints?
thanks,
Hamish
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2131>
GRASS GIS <http://grass.osgeo.org>