[GRASS5] [bug #1170] (grass) m.in.e00 segmentation faults

this bug's URL: http://intevation.de/rt/webrt?serial_num=1170
-------------------------------------------------------------------------

Subject: m.in.e00 segmentation faults

Platform: GNU/Linux/i386
grass obtained from: Trento Italy site
grass binary for platform: Compiled from Sources
GRASS Version: 5.0pre4

I've been trying to import some fairly large binary arc export files. Files with only point coverages produced a segmentation fault with "action=all" (which is the default) but succeeded with "action=vector." Using m.in.e00 to import a very large (300+ MB) file with polygon coverage produced a segmentation fault immediately after writing the last dig_cats file. "action=analyse" revealed no problems. With a smaller (70+ MB) file with a polygon coverage, m.in.e00 created all of the dig_cats files but ran very slowly trying to import the vector map. It ran for approximately 4 hours and was about 80% done before the OS killed the process.

Importing a point coverage produced a site file with locations and numeric codes identifying the sites, but brought in no other attribute values, which made the site files nearly useless.

Roger Miller
rgrmill@rt66.com

-------------------------------------------- Managed by Request Tracker

Request Tracker wrote:

I've been trying to import some fairly large binary arc export files.
Files with only point coverages produced a segmentation fault with
"action=all" (which is the default) but succeeded with "action=vector."

Generally, segmentation faults are the consequence of a malformed e00 file
(m.in.e00 is especially confused by bad info table descriptions).
However, some bugs have been fixed in the latest release (and in CVS).
If you compiled from source, you can get the latest version of m.in.e00
through the CVS Web Interface and try again.

Using m.in.e00 to import a very large (300+ MB) file with polygon coverage
produced a segmentation fault immediately after writing the last dig_cats
file. "action=analyse" revealed no problems.

Unfortunately, I don't have such a big e00 file at hand. If you know a site
where I can download one, send me the URL (I have a fast internet connection).

With a smaller (70+ MB)
file with a polygon coverage m.in.e00 created all of the dig_cats files
but ran very slowly trying to import the vector map. It ran for
approximately 4 hours and was about 80% done before the OS killed the
process.

Writing the dig_cats file (adding each entry) takes time that grows
quadratically with the number of entries (I guess that each time you add
a new record, the library function tests it against all previous
entries). This happens right after the files are created, so I suspect
that is why it takes so much time... and never finishes. The only
solution I can think of is to write the cat files directly, without
using the GRASS library, since there cannot be duplicate entries in e00.
I also hope that this problem will be solved in the new vector format
(GRASS 5.1).

Importing a point coverage produced a site file with locations and numeric
codes identifying the sites, but brought in no other attribute values,
which made the site files nearly useless.

Yes, but you may also have some dig_cats files for the other attributes,
as if you had imported a line file... (It's a "feature" that I want to
correct soon.)

--
Michel WURTZ - DIG - Maison de la télédétection
               500, rue J.F. Breton
               34093 MONTPELLIER Cedex 5

On Wed, Jul 03, 2002 at 08:14:32AM +0000, Michel Wurtz wrote:

Writing the dig_cats file (adding each entry) takes time that grows
quadratically with the number of entries (I guess that each time you add
a new record, the library function tests it against all previous
entries). This happens right after the files are created, so I suspect
that is why it takes so much time... and never finishes. The only
solution I can think of is to write the cat files directly, without
using the GRASS library, since there cannot be duplicate entries in e00.
I also hope that this problem will be solved in the new vector format
(GRASS 5.1).

Yes, that could be a huge bottleneck. I was comparing an insertion sort
against inserting everything first and then merge sorting, both on linked
lists, and found dramatic time differences as the list got larger. The
merge sort after inserting 1.5 million ints took about 30 seconds on my
aged machine, while I gave up on the insertion sort after around 20 or
30 minutes... Making the results unique only requires one additional
pass through the data after sorting.
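
To illustrate the "sort first, then one pass to make unique" idea, here is
a minimal sketch in C. It is not the test code mentioned above: it uses the
standard qsort() on a plain array instead of a merge sort on a linked list,
and the names sort_unique and compare_ints are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending.
 * Uses comparisons rather than subtraction to avoid overflow. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts cats[0..n-1] in place, then removes
 * duplicates in a single pass. Returns the number of unique values,
 * which end up at the front of the array in ascending order. */
size_t sort_unique(int *cats, size_t n)
{
    size_t i, kept;

    assert(cats != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(cats, n, sizeof cats[0], compare_ints);

    /* After sorting, equal values are adjacent, so comparing each
     * element with the last one kept is enough to drop duplicates. */
    kept = 1;
    for (i = 1; i < n; i++) {
        if (cats[i] != cats[kept - 1])
            cats[kept++] = cats[i];
    }
    return kept;
}
```

The sort is typically O(n log n) and the uniqueness pass is O(n), which is
why it stays fast where checking every new value against all earlier ones
does not.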

You could bypass the API (even though I dislike recommending it) in the
creation of the category list(s), and just use it to write the files.
A better solution would be to add a bulk category update routine to
the library.
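
A bulk update routine along those lines might look roughly like the sketch
below. Everything in it is hypothetical: struct cat_entry, struct cat_table,
cat_table_append and cat_table_finalize are invented names, not part of the
GRASS library, and the real category structures differ. The point is only
the shape: append without checking, then sort and remove duplicates once.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; these are NOT the GRASS library structures. */
struct cat_entry {
    int num;            /* category number */
    const char *label;  /* label text, owned by the caller */
};

struct cat_table {
    struct cat_entry *entries;
    size_t count;
    size_t capacity;
};

/* Appends one entry with no duplicate check (amortized constant time).
 * Returns 0 on success, -1 if memory could not be allocated. */
int cat_table_append(struct cat_table *t, int num, const char *label)
{
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 1024;
        struct cat_entry *p = realloc(t->entries, new_cap * sizeof *p);

        if (p == NULL)
            return -1;
        t->entries = p;
        t->capacity = new_cap;
    }
    t->entries[t->count].num = num;
    t->entries[t->count].label = label;
    t->count++;
    return 0;
}

/* Orders entries by ascending category number. */
static int compare_entries(const void *a, const void *b)
{
    const struct cat_entry *x = a;
    const struct cat_entry *y = b;

    return (x->num > y->num) - (x->num < y->num);
}

/* Sorts all entries once and keeps one entry per category number.
 * qsort is not stable, so which duplicate survives is unspecified.
 * Returns the number of entries left in the table. */
size_t cat_table_finalize(struct cat_table *t)
{
    size_t i, kept;

    if (t->count == 0)
        return 0;

    qsort(t->entries, t->count, sizeof t->entries[0], compare_entries);

    kept = 1;
    for (i = 1; i < t->count; i++) {
        if (t->entries[i].num != t->entries[kept - 1].num)
            t->entries[kept++] = t->entries[i];
    }
    t->count = kept;
    return kept;
}
```

A caller would start from a zeroed struct cat_table, call
cat_table_append() for every record, call cat_table_finalize() once, hand
the result to whatever writes the file, and free() the entries array
afterwards.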

--
Eric G. Miller <egm2@jps.net>