[GRASS-user] too many categories: buffer overflow

Hello,

I think I'm experiencing a buffer overflow. This is hard to search for in GRASS GIS because the words "buffer" and "overflow" appear throughout (r.buffer, overflowing weirs, etc.), but I'm referring to the C-code error type of buffer overflow:

r.to.vect -v input=basins output=basins type=area

DBMI-SQLite driver error:
Error in sqlite3_step():
UNIQUE constraint failed: basins.cat

ERROR: Unable to insert into table: insert into basins values (
       -2137121269, '(Category -2137121269)')

Does anyone have any suggestions for how I can convert a raster with MANY categories to a vector?

Thanks,

  -k.

It looks like you’ve run out of integer values for the “category” primary key.

Do you really want a vector polygon map with > 2 billion features?

-- 
Micha Silver
Ben Gurion Univ.
Sde Boker, Remote Sensing Lab
cell: +972-523-665918

On 18/06/2019 15:14, Ken Mankoff wrote:

r.to.vect -v input=basins output=basins type=area

DBMI-SQLite driver error:
Error in sqlite3_step():
UNIQUE constraint failed: basins.cat

ERROR: Unable to insert into table: insert into basins values (
       -2137121269, '(Category -2137121269)')

The value -2137121269 is close to the 32-bit signed integer limit but has not reached it: the lower limit for raster maps of type CELL is −2,147,483,648.

The error “UNIQUE constraint failed: basins.cat” indicates that the given category already exists. Another issue is that negative categories are not allowed for vectors. Yet another issue is that basin numbers should all be positive, so a negative value indicates that integer overflow occurred when the raster map basins was created, which is probably the root of this problem. What is the range of values in the raster map basins (r.info -r basins), and how was it created?

Markus M
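
To illustrate the integer overflow Markus describes, here is a minimal Python sketch of how a value larger than the 32-bit signed maximum wraps around to a negative number. The helper name wrap_int32 is hypothetical, and the "candidate" value assumes exactly one wrap occurred; the thread does not confirm what the intended value actually was.

```python
INT32_MIN = -2**31      # -2,147,483,648, lower limit of a CELL raster
INT32_MAX = 2**31 - 1   #  2,147,483,647, upper limit of a CELL raster


def wrap_int32(value: int) -> int:
    """Return value as a 32-bit two's-complement signed integer would hold it."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


# The category from the error message lies inside the CELL range, but is negative.
reported = -2137121269
print(INT32_MIN <= reported <= INT32_MAX)

# One positive value that would wrap to it (assumption: a single wrap of 2**32).
candidate = reported + 2**32
print(candidate, wrap_int32(candidate) == reported)
```

Values that already fit in 32 bits pass through wrap_int32 unchanged, which is why the problem only appears once IDs exceed roughly 2.1 billion.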

Hi Micha and Markus,

On 2019-06-18 at 10:07 -04, Micha Silver <tsvibar@gmail.com> wrote...

Do you really want a vector polygon map with > 2 billion features?

No, and there are not that many.

% r.info -r basins
  min=-2147474681
  max=2147429730

But I don't have categories from 1 to 2147429730. The values are sparse. I describe my workflow and why I've created these sparse values in more detail below.

Even though there are far fewer than 2 billion, there should be many basins. This is all of Greenland at 30 m resolution, which is 4.5 billion cells.

Taking a step back, I'm trying to generate unique basin values that match the stream and outlet CAT values. Here is my workflow, which doesn't appear to have any problems when run at 90x90 m resolution (400 million cells) but fails at 30x30 m resolution (roughly 10x as many, or 4.5 billion cells).

1) Find streams:

r.stream.extract elevation=head threshold=${THRESH} memory=16384 direction=dir stream_raster=streams stream_vector=streams

2) Find outlets. Where streams have outlets, use the same CAT value so the two can be linked in further analysis. But many outlets don't have streams. These need unique categories for the next step, when we find basins. This is where my error is: I set the unique value to the cell number, which exceeds 2 billion on the 30x30 m domain.

r.mapcalc "outlets_all = if(dir < 0, 1, null())"
r.mapcalc "outlets_streams_1 = if((dir < 0) && (not(isnull(streams))), streams, outlets_all)"
### BUG INTRODUCED HERE, setting (eventual) cat to cell number:
r.mapcalc "outlets_streams = if(outlets_streams_1 != 1, outlets_streams_1, max(outlets_streams_1)+1+col()+(max(col())*(row()-1)))"

# convert outlets to a vector.
r.out.xyz input=outlets_streams | \
    v.in.ascii input=- output=outlets_streams separator=pipe \
        columns="x int, y int, cat int" x=1 y=2 cat=3

Q: How can I create an outlets_streams vector covering all locations where dir < 0 (all outlets) that keeps the value of the streams raster where that raster is defined, and assigns unique values wherever dir < 0 but streams is not defined?

3) Find basins

r.stream.basins -m direction=dir points=outlets_streams basins=basins_all memory=16384 --verbose

4) Absorb small basins

r.clump -d input=basins_all output=basins_nosmall minsize=124
r.mode base=basins_nosmall cover=basins_all output=basins
### BUG APPEARS HERE
r.to.vect -v input=basins output=basins type=area

# drop outlets for absorbed basins.
r.mapcalc "outlets = if(outlets_streams == basins, basins, null())"
r.to.vect -v input=outlets output=outlets type=point

NOTE: I use r.mode instead of r.area because I need to maintain the category value, so that eventual vectors can have linked primary keys. r.area re-assigns categories.

Any advice how to generate streams, outlets, and basins all with linked primary key would be much appreciated.

Thanks,

  -k.
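
The reason the cell-number scheme in step 2 works at 90 m but not at 30 m can be checked with simple arithmetic. The sketch below uses the approximate cell counts quoted in the message (400 million and 4.5 billion) as assumptions, and ignores the additional offset added for existing stream IDs; fits_in_cell is a hypothetical helper name.

```python
INT32_MAX = 2**31 - 1  # largest value a CELL (32-bit signed) raster can store


def fits_in_cell(largest_id: int) -> bool:
    """True if an ID this large can be stored without overflowing 32 bits."""
    return largest_id <= INT32_MAX


# Approximate cell counts from the message; the last cell number is about this big.
for label, n_cells in (("90 m", 400_000_000), ("30 m", 4_500_000_000)):
    print(f"{label}: largest cell number ~{n_cells:,}, fits: {fits_in_cell(n_cells)}")
```

At 30 m the largest cell number is roughly twice the 32-bit limit, so any ID derived from it can wrap, regardless of how sparse the IDs actually in use are.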

On 18/06/19 21:51, Ken Mankoff wrote:

Any advice how to generate streams, outlets, and basins all with linked primary key would be much appreciated.

Just a rapid, wild guess here, but would it be feasible to just vectorize the three separately and then create the link afterwards using v.distance ?

Moritz

On 2019-06-20 at 09:37 +02, Moritz Lennert <mlennert@club.worldonline.be> wrote...

Just a rapid, wild guess here, but would it be feasible to just
vectorize the three separately and then create the link afterwards
using v.distance ?

This may work. Alternatively, if I do this through the Python GRASS interface, I can more easily create unique IDs that are not based on the simple "cell #" algorithm I'm currently using. That is harder to do in bash.

Or just recompile GRASS to use LONGs for the primary key rather than INT.

For now I'm sticking with a lower-resolution 90 m raster. Problem solved.

Thanks,

  -k.
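
The alternative Ken mentions, creating unique IDs that are not derived from the cell number, might look like the sketch below. It is plain Python on nested lists rather than GRASS API calls, with None standing in for null cells; the function name assign_outlet_ids and the toy grids are hypothetical. Outlets keep the stream ID where one exists, and all other outlets receive sequential IDs starting just above the largest stream ID, so the IDs grow with the number of outlets rather than with the number of cells.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def assign_outlet_ids(direction: Grid, streams: Grid) -> Grid:
    """Give every outlet cell (direction < 0) an ID; non-outlets become None."""
    stream_ids = [v for row in streams for v in row if v is not None]
    next_id = max(stream_ids, default=0) + 1  # first ID not used by any stream
    outlets: Grid = []
    for dir_row, stream_row in zip(direction, streams):
        out_row: List[Optional[int]] = []
        for d, s in zip(dir_row, stream_row):
            if d is None or d >= 0:
                out_row.append(None)      # not an outlet
            elif s is not None:
                out_row.append(s)         # outlet on a stream: reuse stream ID
            else:
                out_row.append(next_id)   # outlet without a stream: new ID
                next_id += 1
        outlets.append(out_row)
    return outlets


# Toy 2x3 example: negative direction marks an outlet.
direction = [[1, -1, 2], [-3, 4, -5]]
streams = [[None, 7, None], [None, 3, None]]
print(assign_outlet_ids(direction, streams))
# [[None, 7, None], [8, None, 9]]
```

Because the new IDs are sequential, they would stay well below the 32-bit limit as long as the total number of streams and outlets does, which the thread suggests is the case here.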