[GRASS-dev] large vector problems

Dear Markus, Markus and Jens,

My MacBook survived the burn-in test :-) After 29 hours under full load,
alternating between CPU-bound and disk-bound phases, my mega vector file
came out cleaned. Thank you for the support and suggestions for solving my
problems with the large vector cleaning operation!

I recompiled the 6.5dev version on Linux: no large file support, no 64-bit,
using the standard pre-compiled binary and devel packages from the SUSE
repositories, so nothing special at all. A final test showed that the
6.4RC3 version worked too.

Probably two effects combined to cause my problems:
- lack of memory in the first place, and
- the database was on a Samba share (which works perfectly well, except with
  these mega vector datasets)

Why it doesn't work on the Mac _natively_ is still an unanswered question.
Maybe some built-in memory limit? At least my VMware SUSE guest did the job.
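
One thing I still want to check is whether the shell imposes a per-process
memory limit. Something like the following lists the limits in effect
(standard bash/sysctl commands; just an illustration, I have not verified
that this is actually the cause):

    # list the per-process resource limits the shell imposes (bash builtin);
    # an unusually low "data seg size" or "max memory size" would hint at a
    # built-in limit rather than a GRASS problem
    ulimit -a

    # on OS X, the amount of physical memory installed
    sysctl hw.memsize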

Although the dataset was cleaned, files of this size are virtually
impossible to handle, especially as standard querying, extract and overlay
operations with raster datasets simply take too much time. The fact that
ArcInfo Workstation (also a topological GIS) is an order of magnitude
faster and not nearly as memory hungry makes me believe there should be a
way to improve the speed of this kind of operation, but unfortunately I'm
not an algorithm guru...

I'll post a few related issues with mega files that make working with them
very difficult. I'll file them as (separate) enhancement requests on trac,
as they are in my opinion of major importance:
- selecting a large vector map from a dropdown box in the wxPython GUI takes
  a long time
- renaming this vector took 25 minutes (PostgreSQL access!)
- v.extract is also incredibly slow
- removing a vector file with an unreachable PostgreSQL database link does
  not work, not even in force mode
- v.what consumes several GB of RAM just for querying a large vector map??
- v.rast.stats suffers from setting masks, extracting polygons and querying;
  it is no longer usable for vector files of this size, as these are
  particularly slow operations

I'll put my shell script that creates a new mapset and automatically
generates a PostgreSQL schema somewhere on the wiki.
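
In outline it does something like the following (only a rough sketch with
placeholder names, not the actual script; it assumes the standard GRASS 6
modules g.mapset, db.connect and db.login, and the pg driver's
"host=...,dbname=..." database string):

    #!/bin/sh
    # Sketch only: create a new mapset plus a matching PostgreSQL schema and
    # point the GRASS attribute connection at it. MAPSET/PGHOST/PGDB are
    # placeholders.
    MAPSET=mymapset
    PGHOST=localhost
    PGDB=grassdata

    # create the mapset in the current location (-c creates it if missing)
    g.mapset -c mapset="$MAPSET"

    # create a schema of the same name on the PostgreSQL side
    psql -h "$PGHOST" -d "$PGDB" -c "CREATE SCHEMA $MAPSET;"

    # store attribute tables of newly created vectors in that schema
    db.connect driver=pg database="host=$PGHOST,dbname=$PGDB" schema="$MAPSET"
    db.login driver=pg database="host=$PGHOST,dbname=$PGDB" user=grass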

With kind regards,

Wouter Boasson (MSc)
Geo-IT Research and Coordination

RIVM - National Institute for Public Health and the Environment
Expertise Centre for Methodology and Information Services

Contact information
-----------------------
RIVM
VenZ/EMI, Pb 86
t.a.v. dhr. Drs. Wouter Boasson
Postbus 1
3720 BA Bilthoven

T +31(0)302748518
M +31(0)611131150
F +31(0)302744456
E wouter.boasson@rivm.nl
mo - th

Wouter Boasson wrote:

> Dear Markus, Markus and Jens,
>
> My MacBook survived the burn-in test :-) After 29 hours under full load,
> alternating between CPU-bound and disk-bound phases, my mega vector file
> came out cleaned. Thank you for the support and suggestions for solving
> my problems with the large vector cleaning operation!

Glad to hear that it worked in the end!

> Although the dataset was cleaned, files of this size are virtually
> impossible to handle, especially as standard querying, extract and overlay
> operations with raster datasets simply take too much time.

There are ways to improve both speed and memory consumption. There are hints in the source code and I have some ideas, but this is no easy task. The GRASS vector model is complex, and changes need a lot of testing before they can be applied. And there are not that many developers working on the core GRASS vector libraries... This will only happen in GRASS 7, I guess. Hopefully sometime this year...

> I'll post a few related issues with mega files that make working with
> them very difficult. I'll file them as (separate) enhancement requests
> on trac, as they are in my opinion of major importance:
> - selecting a large vector map from a dropdown box in the wxPython GUI
>   takes a long time
> - renaming this vector took 25 minutes (PostgreSQL access!)
> - v.extract is also incredibly slow
> - removing a vector file with an unreachable PostgreSQL database link
>   does not work, not even in force mode
> - v.what consumes several GB of RAM just for querying a large vector map??

Some of the above operations could be improved, but it will take some time.

> - v.rast.stats suffers from setting masks, extracting polygons and
>   querying; it is no longer usable for vector files of this size, as
>   these are particularly slow operations

Try the example script on the help page of r.univar.zonal, available in the grass-addons:
http://grass.osgeo.org/wiki/GRASS_AddOns#r.univar.zonal
It should be easy to modify it to your needs; it does something very similar to v.rast.stats, only faster.
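
The idea, roughly, is to rasterize the polygons once and compute all zonal
statistics in a single raster pass, instead of looping over polygons with a
MASK as v.rast.stats does. A minimal sketch (placeholder map names; the
zones= parameter of the r.univar.zonal addon is an assumption based on its
wiki page, so check the script there for the exact interface):

    # match region and resolution of the value raster
    g.region rast=elevation

    # rasterize the polygons once, using the category value as zone id
    v.to.rast input=bigpolys output=bigpolys_zones use=cat

    # univariate statistics per zone in a single pass over the raster
    r.univar.zonal map=elevation zones=bigpolys_zones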

Best regards,

Markus M

Hello,
reading this thread, and being sometimes concerned with large vector
files (associated with big related tables), I wonder if it's worth
manually creating indexes (on the cat field): can it be an effective way to
speed up queries, or is the bottleneck elsewhere, at the geometry level
(data handled by GRASS, not the linked DBMS)?

Thank you,
Vincent

On 26/02/09 08:56, Vincent Bain wrote:

> Hello,
> reading this thread, and being sometimes concerned with large vector
> files (associated with big related tables), I wonder if it's worth
> manually creating indexes (on the cat field): can it be an effective way
> to speed up queries, or is the bottleneck elsewhere, at the geometry
> level (data handled by GRASS, not the linked DBMS)?

AFAIK, an index on the cat field is created automatically when a new vector is created, but if you link an existing table to a vector map, no indices are created. So, yes, creating an index on your cat field can make a significant difference for operations involving that field...
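
For example (a minimal sketch; 'mymap', 'mytable' and 'mydb' are placeholder
names, and it assumes the table is linked through the default layer):

    # check which table, key column and driver the vector map actually uses
    v.db.connect -p map=mymap

    # create the index on the linked table; db.execute reads SQL from stdin
    echo "CREATE INDEX mytable_cat ON mytable (cat);" | db.execute

    # or, for a table living in PostgreSQL, directly with psql:
    # psql -d mydb -c "CREATE INDEX mytable_cat ON mytable (cat);"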

Moritz