[GRASS5] Maximum number of categories

Bad news!

I have discovered that the number of categories for each element is stored
in 1 byte. That was enough when only one cat per field was allowed.
Now each field may have many cats, and a 1-byte count (255 at most) is not enough.
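
To illustrate the limit, here is a minimal sketch in C. It assumes the count is kept in an unsigned char; store_ncats() is a hypothetical helper for illustration only, not a function from the GRASS library.

```c
#include <limits.h>

/* Hypothetical helper: the value a 1-byte on-disk counter would hold
 * after being asked to store n_cats (assumes an unsigned 8-bit byte). */
unsigned char store_ncats(int n_cats)
{
    /* Anything above UCHAR_MAX (255) wraps around modulo 256. */
    return (unsigned char) n_cats;
}
```

With this sketch, an element with 300 cats would be recorded as having 44, so a reader would silently lose the rest.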

I think that there are already too many 5.7 vector files in use,
so I must increase the 'coor' format version (5.0->5.1) [1] and handle both versions
when reading. Unfortunately, it will not be possible to read new files in older versions
of GRASS 5.7. To minimize problems, I'll first add the code for reading and only
after 1-2 months the code for writing.

Do you have other suggestions?
What else should be changed in the coor format? For example, currently:
- 'field' is limited to a short
- 4 bits are used for the element type
See V1__rewrite_line_nat.

Thanks to faunalia.it, who reached the limit first.

Radim

[1] The 'coor' format version has nothing to do with the GRASS software version!

If the 'coor' format solution is a "second-best" solution in the form of a
hack (please correct me if it isn't), I would think that since 5.7 is
officially declared a development version, people have to live with the
risk of things being broken. So I think it would be better to revise the
format completely and cleanly to include all necessary changes, instead of
finding other solutions that might not be as clean and then carrying them
around for years to come.

Moritz

On Saturday 24 April 2004 08:43, Moritz Lennert wrote:

> If the 'coor' format solution is a "second-best" solution in the form of a
> hack (please correct me if it isn't), I would think that since 5.7 is
> officially declared a development version, people have to live with the
> risk of things being broken. So I think it would be better to revise the
> format completely and cleanly to include all necessary changes, instead of
> finding other solutions that might not be as clean and then carrying them
> around for years to come.
>
> Moritz

The 'coor' format was designed for 5.7 without any compromises.
AFAIK, the suggested change (number of cats 1 byte -> 2(4) bytes)
is the only one required.
I know that 5.7 is a devel version, and it is a devel version exactly
for these reasons. However, we (including me, I think) have said
a few times that 5.7 can already be used for work.
Supporting reading of both the 5.0 and 5.1 'coor' formats
takes only a few lines of code.
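
Roughly, the reading side could look like the sketch below. This is only an illustration under assumptions: the function names are hypothetical, the wider count is taken to be 4 bytes, and little-endian byte order is assumed for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: multi-byte values are little-endian. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical reader: returns the number of categories and stores the
 * number of bytes consumed in *used.
 * minor == 0 -> 'coor' 5.0 (1-byte count)
 * minor >= 1 -> 'coor' 5.1 (4-byte count, assumed here) */
uint32_t read_ncats(const unsigned char *buf, int minor, size_t *used)
{
    if (minor == 0) {
        *used = 1;
        return buf[0];
    }
    *used = 4;
    return read_u32_le(buf);
}
```

The only branch is on the format's minor version, so the rest of the element reader can stay the same for both formats.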

I think that I'll do both steps of the change (i.e. first reading,
then writing) without a long delay, but in 2 CVS commits.
A problem may appear only for groups using several installations
of GRASS and sharing the data. In that case, they can either:
1) first update all machines for reading and then
   also for writing
or
2) stop work for a while and update all installations directly
   to reading+writing.
Is it OK?

Radim

Sounds good to me!

Moritz

Hello Radim

On Mon, 26 Apr 2004, Radim Blazek wrote:

> The 'coor' format was designed for 5.7 without any compromises.
> AFAIK, the suggested change (number of cats 1 byte -> 2(4) bytes)
> is the only one required.
> I know that 5.7 is a devel version, and it is a devel version exactly
> for these reasons. However, we (including me, I think) have said
> a few times that 5.7 can already be used for work.
> Supporting reading of both the 5.0 and 5.1 'coor' formats
> takes only a few lines of code.
>
> I think that I'll do both steps of the change (i.e. first reading,
> then writing) without a long delay, but in 2 CVS commits.
> A problem may appear only for groups using several installations
> of GRASS and sharing the data. In that case, they can either:
> 1) first update all machines for reading and then
>    also for writing
> or
> 2) stop work for a while and update all installations directly
>    to reading+writing.
> Is it OK?

I think that after you add the code for reading you could release GRASS
5.7.0 (with code from 5.3 copied in to make it usable, of course). Then the
code for writing will only be in the development CVS versions, but if people
have problems reading files you only have to tell them to make sure they
are using 5.7.0 (rather than checking the date of the snapshot they used).

I also think you should use the opportunity to expand the other two items
you mentioned in the binary coor format to cover all possibilities for
expansion for the foreseeable future, if you think there is the slightest
chance it might be needed.

Paul

> [...] but if people have problems reading files you only have to
> tell them to make sure they are using 5.7.0 (rather than checking the
> date of the snapshot they used).

Is it possible to make g.version reference a macro of some sort to
return the build date? As it stands, "GRASS 5.3-cvs (2004)" and
"GRASS 5.7.-cvs (2004)" aren't very informative from a debugging point
of view. ^
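
As a rough idea of what I mean, standard C already provides a __DATE__ macro that the compiler expands to the build date. The sketch below is only an illustration; format_version() is a hypothetical name, and this is not how g.version works today.

```c
#include <stdio.h>

/* Hypothetical helper: fills buf with a version string that carries the
 * build date. Returns the value of snprintf(). */
int format_version(char *buf, size_t len, const char *version)
{
    /* __DATE__ is replaced at compile time by a string like "Mmm dd yyyy". */
    return snprintf(buf, len, "GRASS %s (built %s)", version, __DATE__);
}
```

Note that this gives the date of compilation, not of the cvs checkout.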

Even better than build date might be cvs checkout-date; the only way I
see how to do that is a cron job that touches and checks-in the VERSION
file with a $Date$ string in it, and then change the parsing on the
VERSION file. (or have the cron job update the file itself)

grass/src/CMD/VERSION
grass51/include/VERSION

?
Hamish

On Monday 26 April 2004 12:49, Paul Kelly wrote:

> I think that after you add the code for reading you could release GRASS
> 5.7.0 (with code from 5.3 copied in to make it usable, of course). Then the
> code for writing will only be in the development CVS versions, but if people
> have problems reading files you only have to tell them to make sure they
> are using 5.7.0 (rather than checking the date of the snapshot they used).

I prefer not to release 5.7.0 with a format which has known limits.
I also want to replace shapefile+PostGIS with OGR before 5.7.0 (a change in the format).
I'll return to the original plan of waiting a while between the code for reading and the code for writing.
I think that it is almost the same to say that they have to use 5.7.0
or a CVS version older than some date.

> I also think you should use the opportunity to expand the other two items
> you mentioned in the binary coor format to cover all possibilities for
> expansion for the foreseeable future, if you think there is the slightest
> chance it might be needed.

Currently elements are written as:
1 byte - header:
        bit 0: 1 - alive, 0 - dead
        bit 1: 1 - categories, 0 - no category
        bits 2-3: store type
        bits 4-5: reserved for store type expansion
        bits 6-7: not used

1 byte - number of categories (if categories are present)
n_cats * 2 bytes - field (2 bytes per field)
n_cats * 4 bytes - cats (4 bytes per cat)

4 bytes - number of coordinates (linear elements only)
n_coor * 8 bytes - x coordinates
n_coor * 8 bytes - y coordinates
[n_coor * 8 bytes - z coordinates]

That means 4 bits (+1 reserved) for the element type, i.e. a maximum of 32 types,
and we currently have 8 types defined, including the unused face, kernel
and volume. A possible new type could be Polygon, but it would not be
possible to read files with polygons in older versions anyway.
So I think that the first header byte may be left as it is.
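
For reference, decoding that header byte could be sketched as follows. This is an illustration only: the struct and function names are hypothetical, bit 0 is assumed to be the least significant bit, and the store type is read together with its reserved expansion bits (bits 2-5).

```c
/* Decoded view of the 1-byte element header (names are illustrative). */
struct elem_header {
    int alive;      /* bit 0: 1 - alive, 0 - dead */
    int has_cats;   /* bit 1: 1 - categories, 0 - no category */
    int type;       /* bits 2-5: store type plus reserved expansion bits */
};

struct elem_header decode_header(unsigned char h)
{
    struct elem_header out;

    out.alive    = h & 0x01;
    out.has_cats = (h >> 1) & 0x01;
    out.type     = (h >> 2) & 0x0F;   /* bits 6-7 are ignored (not used) */
    return out;
}
```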

I'll change only the number of categories and the field to 4 bytes each ->

4 bytes - number of categories (if categories are present)
n_cats * 4 bytes - field (4 bytes per field)
n_cats * 4 bytes - cats (4 bytes per cat)
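
To see the cost of the change, the size of the category block per element can be computed as in this sketch (the function names are hypothetical and only restate the two layouts above):

```c
#include <stddef.h>

/* Bytes used by the category block of one element in 'coor' 5.0:
 * 1-byte count, 2 bytes per field, 4 bytes per cat.
 * An element without categories has no category block at all. */
size_t cats_block_size_old(size_t n_cats)
{
    return n_cats ? 1 + n_cats * 2 + n_cats * 4 : 0;
}

/* The same for the proposed 'coor' 5.1:
 * 4-byte count, 4 bytes per field, 4 bytes per cat. */
size_t cats_block_size_new(size_t n_cats)
{
    return n_cats ? 4 + n_cats * 4 + n_cats * 4 : 0;
}
```

For the common case of a single category this means 12 bytes instead of 7 per element.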

Radim