[GRASS5] vector cats files

Folks,

I don't know if my problem here is with the documentation, the gis library or
with my application.

I have been modifying m.in.e00 to read some larger coverages. The relatively
small example I'm working with contains almost 25,000 polygons and the
database in the exported coverage describes 183 attributes for each polygon.

Without modification, m.in.e00 runs for 4 hours before it consumes all the
available memory (RAM and swap) and crashes. I've modified the code so that
it now reads and writes one attribute at a time, and allows the user to
select one or more attributes to save. That saves a lot of time and memory.

I also tried to reduce the size of the output cats files by using the GRASS5
cats file format in which each line can describe a range of category numbers.
The program runs, produces a good-looking cats file and a complete vector
file. The vector file displays correctly using d.vect, but any module that
accesses the cats file seems to either seg fault (d.what.vect) or ignore the
categories (v.digit).

If I use the old format with no ranges then everything works fine, but I have a
category file that is considerably bigger than I need.

Has the GRASS5 vector library not been updated to use the GRASS5 cats file
format? If it has been updated, then could there be a bug in the library
that shows up on large files? Or do I need to look elsewhere for the problem?

A couple further questions.

m.in.e00 produces one cats file for each attribute that it imports. The
user is supposed to activate a file by copying one of the cats files over to
a file with the default name for the map, then running v.support. If I
perform that step (which takes about 10 minutes) then change to a new cats
file and repeat the step I get a very long dump of duplicate labels. Is
there some way to avoid that? The only way I've found so far is to delete
and reimport the vector file. The step of running v.support on a second set
of cats seems unnecessary to me. Is there some way to avoid doing that?

Roger Miller

On Mon, Jul 08, 2002 at 05:27:45PM -0600, Roger Miller wrote:

> I also tried to reduce the size of the output cats files by using the GRASS5
> cats file format in which each line can describe a range of category numbers.
> The program runs, produces a good-looking cats file and a complete vector
> file. The vector file displays correctly using d.vect, but any module that
> accesses the cats file seems to either seg fault (d.what.vect) or ignore the
> categories (v.digit).
>
> If I use the old format with no ranges then everything works fine, but I have a
> category file that is considerably bigger than I need.
>
> Has the GRASS5 vector library not been updated to use the GRASS5 cats file
> format? If it has been updated, then could there be a bug in the library
> that shows up on large files? Or do I need to look elsewhere for the problem?

AFAIK, vector cats don't support ranges. Range support is primarily for
floating point rasters where exact quantities are possibly
unrepresentable (fractions like .1). The main problem I saw was in
G_set_cat() as it does a linear search/comparison for each insert. This
isn't desirable for data imports, since total insert time then grows roughly
quadratically with the number of categories. Better to just build a list or
array of categories, sort, then uniquify and write out the results.
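
A minimal sketch of the sort-then-uniquify idea, written in Python for brevity rather than as GRASS library code. The function name dedupe_categories and the (number, label) pair layout are assumptions made for illustration; nothing here calls the GRASS API.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_categories(pairs):
    """Return one (cat, label) pair per category number, sorted by number.

    `pairs` is any iterable of (int, str) tuples collected during import.
    Sorting costs O(n log n), versus roughly O(n^2) in total when every
    insert performs its own linear search. When a number appears more
    than once, the first label seen in the input is kept, because
    sorted() is stable.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    # groupby() yields runs of equal category numbers; keep the first pair of each run.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]
```

For example, dedupe_categories([(3, "clay"), (1, "sand"), (3, "clay"), (2, "silt")]) returns [(1, "sand"), (2, "silt"), (3, "clay")], which could then be written out in a single pass.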

> A couple further questions.
>
> m.in.e00 produces one cats file for each attribute that it imports. The
> user is supposed to activate a file by copying one of the cats files over to
> a file with the default name for the map, then running v.support. If I
> perform that step (which takes about 10 minutes) then change to a new cats
> file and repeat the step I get a very long dump of duplicate labels. Is
> there some way to avoid that? The only way I've found so far is to delete
> and reimport the vector file. The step of running v.support on a second set
> of cats seems unnecessary to me. Is there some way to avoid doing that?

The easy way is to use symlinks to change the "active" category label.
I don't recall whether v.support is necessary (I thought it wasn't)...
We are just talking about the dig_cats file, and not the dig_att?
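
One way the symlink switch could look, sketched in Python. The helper name activate_cats and the file names in the usage note are hypothetical; the only assumption about GRASS is that modules read the category file stored under the map's own name in dig_cats.

```python
import os


def activate_cats(cats_dir, map_name, attribute_file):
    """Point <cats_dir>/<map_name> at attribute_file via a symlink.

    attribute_file is a file name inside cats_dir. Any existing file or
    link carrying the map's name is replaced, so copy it elsewhere first
    if it holds labels that exist nowhere else.
    """
    if attribute_file == map_name:
        raise ValueError("attribute file and map name must differ")
    target = os.path.join(cats_dir, attribute_file)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    link = os.path.join(cats_dir, map_name)
    if os.path.lexists(link):
        os.remove(link)  # removes the link itself, not the file it points to
    # Relative link, so the directory can be moved without breaking it.
    os.symlink(attribute_file, link)
    return link
```

With hypothetical per-attribute files soils.SOILTYPE and soils.PH in the dig_cats directory, calling activate_cats(dig_cats_dir, "soils", "soils.PH") would make the pH labels the active ones without copying anything.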

--
Eric G. Miller <egm2@jps.net>

On Friday 12 July 2002 14:39, Eric G. Miller wrote:

> AFAIK, vector cats don't support ranges. Range support is primarily for
> floating point rasters where exact quantities are possibly
> unrepresentable (fractions like .1).

Thanks Eric. The programmer's manual says that vector cats files are
identical to raster cats files. This looks like something that needs to be
revised.

> The easy way is to use symlinks to change the "active" category label.
> I don't recall whether v.support is necessary (I thought it wasn't)...
> We are just talking about the dig_cats file, and not the dig_att?

We were talking about dig_cats files. I tested and found that it wasn't
necessary to run v.support after changing the dig_cats file. Good thing, as
once I did get one of the larger files imported it took about 20 minutes to
run it through v.support just once.

I have revised m.in.e00 so that it can be used to import larger e00 files.
Once I'm convinced that I'm done I'll commit the changes. If anyone really
needs a version now that will read big e00 files (50,000+ polygons with 340+
attribute items each) let me know.

Roger Miller

On Fri, Jul 12, 2002 at 04:44:59PM -0600, Roger Miller wrote:

> On Friday 12 July 2002 14:39, Eric G. Miller wrote:
>
> > AFAIK, vector cats don't support ranges. Range support is primarily for
> > floating point rasters where exact quantities are possibly
> > unrepresentable (fractions like .1).
>
> Thanks Eric. The programmer's manual says that vector cats files are
> identical to raster cats files. This looks like something that needs to be
> revised.

It does use the same functions, data structures and file format. But
vector cats are always indexed with integers and ranges are only used
for floating point "indices". The G_set_cat() function is essentially
the same as calling the G_set_c_raster_cat() function (which works on
single integer index/label pairs)...

> > The easy way is to use symlinks to change the "active" category label.
> > I don't recall whether v.support is necessary (I thought it wasn't)...
> > We are just talking about the dig_cats file, and not the dig_att?
>
> We were talking about dig_cats files. I tested and found that it wasn't
> necessary to run v.support after changing the dig_cats file. Good thing, as
> once I did get one of the larger files imported it took about 20 minutes to
> run it through v.support just once.

Right. That's what I thought. v.support doesn't care about dig_cats;
it uses the dig_att file to attach each category number to a feature of the
specified type when building topology. Which label (or tuple, when an
external DBMS is used) belongs to a category only matters on label (tuple)
lookups.

> I have revised m.in.e00 so that it can be used to import larger e00 files.
> Once I'm convinced that I'm done I'll commit the changes. If anyone really
> needs a version now that will read big e00 files (50,000+ polygons with 340+
> attribute items each) let me know.

I'd say, it might be worthwhile to look at adding some functions for
bulk loading to libes/gis/cats.c. The only real issue is how such
functions should behave when duplicate category indices with different
labels are encountered (maybe generate a duplicates list?). This
issue should go away for vectors in 5.1 (since dig_cats is not used),
but such bulk functions still might have utility for raster imports??
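
To make the duplicate question concrete, here is a rough Python sketch of one possible policy: keep the first label for each index and report every conflicting pair in a separate list. bulk_load_cats is a hypothetical name, not an existing function in cats.c, and a real version would be written in C against the library's category structure.

```python
def bulk_load_cats(pairs):
    """Collect (cat, label) pairs in a single pass.

    Returns (cats, duplicates):
      cats       : list of (cat, label) sorted by cat; the first label wins
      duplicates : list of (cat, kept_label, rejected_label) for every
                   pair whose label disagrees with the one already kept
    Exact repeats (same cat, same label) are dropped silently.
    """
    kept = {}
    duplicates = []
    for cat, label in pairs:
        if cat not in kept:
            kept[cat] = label
        elif kept[cat] != label:
            duplicates.append((cat, kept[cat], label))
    return sorted(kept.items()), duplicates
```

The caller can then decide what to do with the duplicates list (warn, abort, or write it to a log) instead of having that choice fixed inside the loader.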

--
Eric G. Miller <egm2@jps.net>