[GRASS-user] FW: Working with large vector files

I sent a similar question about large vector files to a listserv I moderate (starserv), and one of the users made the comment below indicating that DBF files themselves can't be larger than 2 GB. Can other types of databases be used as the backend for vector files? Is this statement accurate? How would this limit affect things like v.in.ascii (which I noticed uses a process called dbf for most of the importing work)?

–j


Jonathan A. Greenberg, PhD
NRC Research Associate
NASA Ames Research Center
MS 242-4
Moffett Field, CA 94035-1000
Office: 650-604-5896
Cell: 415-794-5043
AIM: jgrn307
MSN: jgrn307@hotmail.com

------ Forwarded Message
From: Richard Pollock pollock@pcigeomatics.com
Reply-To: starserv@ucdavis.edu
Date: Thu, 5 Oct 2006 18:40:54 -0400
To: starserv@ucdavis.edu
Conversation: Working with large vector files
Subject: RE: Working with large vector files

The file format can also be an issue. The maximum size of a .DBF file is 2 GB. That is because the file has to contain offsets to various other locations within the file, and those offsets are stored as 32-bit integers. No software that writes to a .DBF file can get around this. Ideally, the software should detect when it has written as much data as the output file format can accommodate, refuse to write any more, close the file, and inform the user of the situation. If the software keeps writing, all it will do is convert a maximum-sized file that is at least usable into a corrupt file that may contain more data but is unusable because of messed-up offset values.
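To illustrate the kind of check Richard describes, here is a minimal Python sketch (not part of any real DBF library; the function name and record layout are illustrative) of refusing to append a record once a fixed-width file would pass the 2 GB ceiling that signed 32-bit offsets impose:

```python
# Largest byte position a signed 32-bit offset can address.
DBF_MAX_BYTES = 2**31 - 1

def can_append_record(header_bytes, record_bytes, n_records):
    """Return True if one more fixed-width record still fits under the 2 GB cap."""
    prospective_size = header_bytes + (n_records + 1) * record_bytes
    return prospective_size <= DBF_MAX_BYTES

# With a 512-byte header and 100-byte records, the format tops out
# around 21.4 million rows -- well short of "every tree in a scene".
```

Software that performs this check before each write can stop cleanly at the limit instead of corrupting the offsets, which is exactly the failure mode described above.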

Lots of file formats that have been around a while have similar problems. When they were designed, people weren’t worried about files anywhere near 2GB in size.

So, the first thing is to find a storage format that is not intrinsically limited in size.

I understand that GRASS can write to a PostGIS database. PostGIS is based on PostgreSQL (a free, open-source DBMS), which has a maximum table size of 32 TB. If GRASS can read a buffer-full of the input data, process it, and write the results out to a PostGIS table, and repeat until all the input data are processed, then that may be your solution. At least, as long as the processing doesn't involve displaying the data (displaying very large datasets has its own problems).
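The buffer-full loop suggested above can be sketched generically in Python. This is a hypothetical illustration, not GRASS or PostGIS code: the reader is any row iterator, and the writer is any callable (in practice it would insert into a PostGIS table); memory use stays bounded by the chunk size rather than the dataset size.

```python
from itertools import islice

def process_in_chunks(reader, process, write, chunk_size=100_000):
    """Stream rows chunk_size at a time: read, process, write, repeat."""
    total = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break  # input exhausted
        write(process(chunk))
        total += len(chunk)
    return total
```

With 2 million polygons and a chunk size of 100,000, only 20 passes are needed, and no more than one chunk of features is ever held in memory at once.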

Cheers,

Richard


From: owner-starserv@ucdavis.edu [mailto:owner-starserv@ucdavis.edu] On Behalf Of Jonathan Greenberg
Sent: Thursday, October 05, 2006 5:31 PM
To: STARServ
Subject: Re: Working with large vector files

I’ve been working on techniques to perform tree crown recognition using high spatial resolution remote sensing imagery. The final output of these algorithms is a polygon coverage representing each tree crown in an image, and as you can imagine that’s a LOT of trees for a standard QuickBird image (on the order of 2 million polygons). I understand that I could subset the rasters and do smaller extractions, but this is, at best, a hack. There has been a lot of work on efficient handling of massive raster images (look at RS packages like ENVI and GRASS), but massive vector handling is seriously lagging. Early estimates are that I’d need about 25 or so subsets of a single QuickBird scene to stay under the memory requirements.

Right now I’m just trying to import a CSV of xloc, yloc, crown radius (the output of my crown mapping algorithm) into SOME GIS, perform a buffer operation using that crown radius parameter (to give me the crown polygon), and work with that layer. ArcMap can actually import the points, but the buffering process completely overwhelms it (I noticed the DBF file hits 2 GB and then I get the error). I’m trying GRASS right now, but my first try also ended in a memory error (I’m working on a 32-bit PC and a 64-bit Mac, incidentally).
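For what it’s worth, the geometry of the buffer step itself is cheap per point. Here is a stand-alone Python sketch (illustrative only; a real GIS buffer such as GRASS’s v.buffer handles projections, topology, and attributes) that turns each xloc, yloc, radius row into an N-vertex ring approximating the crown circle:

```python
import csv
import io
import math

def crown_polygon(x, y, r, n_vertices=16):
    """Approximate a circle of radius r around (x, y) as a closed ring."""
    ring = [(x + r * math.cos(2 * math.pi * i / n_vertices),
             y + r * math.sin(2 * math.pi * i / n_vertices))
            for i in range(n_vertices)]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

def polygons_from_csv(text):
    """Parse xloc,yloc,radius lines and yield one crown ring per row."""
    for xloc, yloc, radius in csv.reader(io.StringIO(text)):
        yield crown_polygon(float(xloc), float(yloc), float(radius))
```

Since each crown is generated independently, the rings can be streamed straight to disk or a database rather than held in memory, which is where the 2 GB DBF ceiling bites.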

Besides GRASS and ArcMap, what else could I be trying out? I should point out that ENVI also has a vector size problem: displaying large vectors produces an out-of-memory error (at least on a 32-bit PC; I haven’t tried it on my Mac yet).

–j

On 10/5/06 2:08 PM, “Richard Pollock” pollock@pcigeomatics.com wrote:

What software created these large files in the first place?

What format are the files in?

Cheers,

Richard


From: owner-starserv@ucdavis.edu [mailto:owner-starserv@ucdavis.edu] On Behalf Of Joanna Grossman
Sent: Thursday, October 05, 2006 4:23 PM
To: starserv@ucdavis.edu
Subject: Re: Working with large vector files

I’m not sure Jonathan, but it’s certainly worth trying out GRASS and some of the other open source tools out there.
http://www.freegis.org/database/?cat=4

Good luck!

Joanna

Jonathan Greenberg jgreenberg@arc.nasa.gov wrote:

After banging my head against this issue for the Nth time, I’m putting out a
plaintive cry of “HELP!” I am working with (or would like to work with) vector
files which are larger than the 2 GB limit imposed on them by ArcMap. Can
anyone recommend a GIS program that CAN deal with massive vector coverages?
Efficiently would be nice, but simply being able to open and process them
without getting corruption errors would be a great start…

–j


------ End of Forwarded Message