[GRASS5] Compression (?) of CELL maps (int)

Question: Are CELL maps compressed? If yes, the compression
rate seems to me much lower than for FCELL.

GRASS:/ssi-data/grass/PERMANENT/cell > l
-rw-r--r-- 1 grass fepssi 78518805 Jun 5 15:42 060150.g.s9_SA_135.int

GRASS:/ssi-data/grass/PERMANENT/cell > gzip 060150.g.s9_SA_135.int

GRASS:/ssi-data/grass/PERMANENT/cell > l
-rw-r--r-- 1 grass fepssi 49847399 Jun 5 15:42 060150.g.s9_SA_135.int.gz

-> 63 %

Can we somehow improve the situation?

Markus

Markus Neteler wrote:

> Question: Are CELL maps compressed?

G_open_cell_new() and G_open_raster_new(..., CELL_TYPE) create a
compressed map; but ...

> If yes, the compression
> rate seems to me much lower than for FCELL.

FCELL/DCELL data is zlib (gzip) compressed; CELL is RLE-compressed,
which is much less effective.
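
A rough illustration of why run-length encoding can lose badly to zlib on integer data. This is a toy sketch, not the GRASS on-disk format: the one-count-byte-per-run layout, the 4-byte big-endian cell size, and the repeating-ramp test row are all assumptions chosen for the example.

```python
import struct
import zlib

def rle_encode(values):
    """Toy byte-oriented RLE: a list of [count, value] runs, count <= 255."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v and runs[-1][0] < 255:
            runs[-1][0] += 1
        else:
            runs.append([1, v])
    return runs

def rle_size(runs, bytes_per_value=4):
    # Assumed layout: one count byte plus the value bytes for every run.
    return len(runs) * (1 + bytes_per_value)

# Hypothetical row: a repeating ramp, so neighbouring cells always differ
# and RLE never finds a run longer than one cell.
row = [1000 + (i % 50) for i in range(10000)]
raw = struct.pack(">%di" % len(row), *row)

raw_bytes = len(raw)
rle_bytes = rle_size(rle_encode(row))
zlib_bytes = len(zlib.compress(raw))

print(raw_bytes, rle_bytes, zlib_bytes)
```

With no repeated neighbours, this RLE layout makes the row larger than the raw data, while zlib still exploits the repeating pattern. Real rasters fall somewhere in between, depending on how many long runs of equal values they contain.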

> GRASS:/ssi-data/grass/PERMANENT/cell > l
> -rw-r--r-- 1 grass fepssi 78518805 Jun 5 15:42 060150.g.s9_SA_135.int
>
> GRASS:/ssi-data/grass/PERMANENT/cell > gzip 060150.g.s9_SA_135.int
>
> GRASS:/ssi-data/grass/PERMANENT/cell > l
> -rw-r--r-- 1 grass fepssi 49847399 Jun 5 15:42 060150.g.s9_SA_135.int.gz
>
> -> 63 %
>
> Can we somehow improve the situation?

Use zlib for CELL maps. But that means messing with the core raster
I/O functions.

FWIW, I intend to completely re-write the raster I/O code once I move
to 5.1. The existing code isn't particularly legible; two significant
bugs which were introduced with the addition of FP and NULL support
managed to remain undetected until after 5.0.0 was released.

Also, there appear to be significant performance issues with that
code; the limiting factor ought to be the raw disk I/O rate, but that
doesn't appear to be the case at present.

--
Glynn Clements <glynn.clements@virgin.net>

On Thu, Jun 05, 2003 at 10:20:58PM +0100, Glynn Clements wrote:

> Markus Neteler wrote:
>
> > Question: Are CELL maps compressed?
>
> G_open_cell_new() and G_open_raster_new(..., CELL_TYPE) create a
> compressed map; but ...
>
> > If yes, the compression
> > rate seems to me much lower than for FCELL.
>
> FCELL/DCELL data is zlib (gzip) compressed; CELL is RLE-compressed,
> which is much less effective.
>
> > GRASS:/ssi-data/grass/PERMANENT/cell > l
> > -rw-r--r-- 1 grass fepssi 78518805 Jun 5 15:42 060150.g.s9_SA_135.int
> >
> > GRASS:/ssi-data/grass/PERMANENT/cell > gzip 060150.g.s9_SA_135.int
> >
> > GRASS:/ssi-data/grass/PERMANENT/cell > l
> > -rw-r--r-- 1 grass fepssi 49847399 Jun 5 15:42 060150.g.s9_SA_135.int.gz
> >
> > -> 63 %
> >
> > Can we somehow improve the situation?
>
> Use zlib for CELL maps. But that means messing with the core raster
> I/O functions.
>
> FWIW, I intend to completely re-write the raster I/O code once I move
> to 5.1.

Wouldn't it be better to wait for 5.3 for this then?

> The existing code isn't particularly legible; two significant
> bugs which were introduced with the addition of FP and NULL support
> managed to remain undetected until after 5.0.0 was released.

Is there a possibility to only fix the bugs first?

> Also, there appear to be significant performance issues with that
> code; the limiting factor ought to be the raw disk I/O rate, but that
> doesn't appear to be the case at present.

Bernhard Reiter wrote:

> > FWIW, I intend to completely re-write the raster I/O code once I move
> > to 5.1.
>
> Wouldn't it be better to wait for 5.3 for this then?

Why? The changes would be substantially less far-reaching than the
5.1 vector re-write.

> > The existing code isn't particularly legible; two significant
> > bugs which were introduced with the addition of FP and NULL support
> > managed to remain undetected until after 5.0.0 was released.
>
> Is there a possibility to only fix the bugs first?

I've already fixed those two bugs; it was the process of doing so that
led me to the conclusion that the raster I/O code needs to be
re-written to improve legibility.

--
Glynn Clements <glynn.clements@virgin.net>

On Fri, Jun 06, 2003 at 01:16:02PM +0100, Glynn Clements wrote:

> Bernhard Reiter wrote:
>
> > > FWIW, I intend to completely re-write the raster I/O code once I move
> > > to 5.1.
> >
> > Wouldn't it be better to wait for 5.3 for this then?
>
> Why? The changes would be substantially less far-reaching than the
> 5.1 vector re-write.

Would it comprise also a rewrite of the NULL handling?
Maybe also the idea of having virtual mapsets with GDAL could be
implemented after the lib is more legible...

Markus

On Fri, Jun 06, 2003 at 01:16:02PM +0100, Glynn Clements wrote:

> Bernhard Reiter wrote:
>
> > > FWIW, I intend to completely re-write the raster I/O code once I move
> > > to 5.1.
> >
> > Wouldn't it be better to wait for 5.3 for this then?
>
> Why? The changes would be substantially less far-reaching than the
> 5.1 vector re-write.

I was just trying to get a feeling for this.
If the changes are easy to make stable,
of course they are welcome for 5.1.

The idea is to not delay 5.2 too much as vector is already there.

> > > The existing code isn't particularly legible; two significant
> > > bugs which were introduced with the addition of FP and NULL support
> > > managed to remain undetected until after 5.0.0 was released.
> >
> > Is there a possibility to only fix the bugs first?
>
> I've already fixed those two bugs;

Great!

> it was the process of doing so that
> led me to the conclusion that the raster I/O code needs to be
> re-written to improve legibility.

  Bernhard

On Fri, 6 Jun 2003, Bernhard Reiter wrote:

> The idea is to not delay 5.2 too much as vector is already there.

But sites isn't. That's the major off-putting factor for me; I use sites
all the time especially as the ASCII format is so totally simple and
transparent and easy to deal with. Also there is substantial sites
functionality that isn't present in 5.1 or else requires re-learning how
to do things. I'm not sure though; is there s.sample, s.surf.idw,
s.voronoi, things like that in the 5.1 vector functionality? Maybe there
is.

But it will be a big wrench to have to stop using the sites format. I'm
not sure of the main reasons for leaving it out (that was before I used
GRASS and one reason why I wanted to look at the old developers mailing
list). At the time the decision was made, was it presumed that there would
be enough programmer labour available to re-implement all the sites
functionality of 5.0 in the 5.1 vector functions and modules? I can't see
this happening although I'm not saying I wouldn't help if I thought it was
a worthwhile effort.

Paul

On Fri, Jun 06, 2003 at 08:18:38PM +0100, Paul Kelly wrote:

> On Fri, 6 Jun 2003, Bernhard Reiter wrote:
>
> > The idea is to not delay 5.2 too much as vector is already there.
>
> But sites isn't. That's the major off-putting factor for me; I use sites
> all the time especially as the ASCII format is so totally simple and
> transparent and easy to deal with. Also there is substantial sites
> functionality that isn't present in 5.1 or else requires re-learning how
> to do things. I'm not sure though; is there s.sample, s.surf.idw,
> s.voronoi, things like that in the 5.1 vector functionality? Maybe there
> is.

Say, it is there partially now. The idea is to use vector nodes instead of
sites in a format of their own. This reduces the maintenance effort to two
formats instead of three. The sites format of 5.0 is limited:
- no NULL support
- troubles with strings
- slow
- huge files
etc.

> But it will be a big wrench to have to stop using the sites format. I'm
> not sure of the main reasons for leaving it out (that was before I used
> GRASS and one reason why I wanted to look at the old developers mailing
> list). At the time the decision was made, was it presumed that there would
> be enough programmer labour available to re-implement all the sites
> functionality of 5.0 in the 5.1 vector functions and modules? I can't see
> this happening although I'm not saying I wouldn't help if I thought it was
> a worthwhile effort.

In fact the upgrading of sites modules to the 5.1 vector format should
be cleaner than the sites API (Radim and others may correct me).
Using the Vect_*() functions gives you at the same time
- spatial index
- multiple DBMS support
- NULL support
etc

and all under a common interface. I assume also that you already find
some functionality of sites 5.0 *modules* in the 5.1 vector *library*.
It is desired to move more common functionality to the libs.

It would be great to welcome another 5.1 "sites" vector programmer.

Markus

Markus Neteler wrote:

> > > > FWIW, I intend to completely re-write the raster I/O code once I move
> > > > to 5.1.
> > >
> > > Wouldn't it be better to wait for 5.3 for this then?
> >
> > Why? The changes would be substantially less far-reaching than the
> > 5.1 vector re-write.
>
> Would it comprise also a rewrite of the NULL handling?

When a separate NULL bitmap is present, the nulls would be embedded
prior to the rescaling.

There should at least be the option of storing NULLs in the main
raster file rather than a separate file.

Also, the scaling should probably be done arithmetically (Bresenham
or DDA) rather than using a lookup table. On modern CPUs, arithmetic
operations are faster than memory accesses.
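
A minimal sketch of the two ideas above, under stated assumptions: the null bitmap is modelled as a list of 0/1 flags, nulls are embedded into the row before rescaling, and the column mapping uses an integer error accumulator (DDA/Bresenham style) instead of a precomputed column lookup table. The function names, the `None` null marker and the left-edge nearest-neighbour mapping are illustrative choices, not the GRASS library's API or behaviour.

```python
def embed_nulls(row, null_bits, null_value=None):
    """Replace cells flagged in the (hypothetical) null bitmap with a marker."""
    return [null_value if flag else v for v, flag in zip(row, null_bits)]

def resample_row_dda(src, dst_cols):
    """Nearest-neighbour column resampling with an integer error accumulator.

    Equivalent to src[(d * len(src)) // dst_cols] for each output column d,
    but computed incrementally, without a precomputed column lookup table.
    """
    src_cols = len(src)
    out = []
    s = 0      # current source column
    err = 0    # remainder of d * src_cols modulo dst_cols
    for _ in range(dst_cols):
        out.append(src[s])
        err += src_cols
        while err >= dst_cols:
            err -= dst_cols
            s += 1
    return out

# Embed nulls first, then rescale 4 source columns to 8 output columns.
row = embed_nulls([1, 2, 3, 4], [0, 1, 0, 0])
print(resample_row_dda(row, 8))
```

Because the nulls are embedded before the rescaling step, the resampling loop needs no separate pass over the bitmap; it copies the null marker like any other cell value.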

--
Glynn Clements <glynn.clements@virgin.net>

On Saturday 07 June 2003 00:05, Markus Neteler wrote:

> > > The idea is to not delay 5.2 too much as vector is already there.
> >
> > But sites isn't. That's the major off-putting factor for me; I use sites
> > all the time especially as the ASCII format is so totally simple and
> > transparent and easy to deal with. Also there is substantial sites
> > functionality that isn't present in 5.1 or else requires re-learning how
> > to do things. I'm not sure though; is there s.sample, s.surf.idw,
> > s.voronoi, things like that in the 5.1 vector functionality? Maybe there
> > is.

No. s.delaunay, s.probplt, s.sample, s.sv, s.voronoi, s.hull, s.medp, qcount,
s.surf.idw, s.territory, s.univar, s.normal, s.perturb, s.random, s.vol.rst,
s.windavg and maybe more do not have a vector equivalent. There is nobody who
could update these modules.

> Say, it is there partially now. The idea is to use vector nodes instead of
> sites in a format of their own. This reduces the maintenance effort to two
> formats instead of three. The sites format of 5.0 is limited:
> - no NULL support

NULL is supported by DBMI in theory, but it is missing in drivers and modules.

> - troubles with strings
> - slow

Or fast. I don't think that vectors are faster. Inserting many lines into PostgreSQL
(insert, not copy) is so slow that it is not very usable.

> - huge files

Coordinates take almost the same space, and if you add topology + spatial index,
vector files are much bigger.

> > But it will be a big wrench to have to stop using the sites format. I'm
> > not sure of the main reasons for leaving it out (that was before I used
> > GRASS and one reason why I wanted to look at the old developers mailing
> > list). At the time the decision was made, was it presumed that there
> > would be enough programmer labour available to re-implement all the sites
> > functionality of 5.0 in the 5.1 vector functions and modules? I can't see
> > this happening although I'm not saying I wouldn't help if I thought it
> > was a worthwhile effort.

I am not sure it is worthwhile. I worry that it is impossible to get
such performance with vector+dbmi(+driver+database) as with plain text files.
It would be good to compare the speed of v.surf.rst and s.surf.rst. But try
to tune the rst library a bit before comparing.

Yes, 5.1 vector format is bad, maybe it's time to stop 5.1 development,
skip 5.2 and start 5.3.

Radim

On Tue, 10 Jun 2003, Radim Blazek wrote:

> Yes, 5.1 vector format is bad, maybe it's time to stop 5.1 development,
> skip 5.2 and start 5.3.

:-) I was kind of thinking that maybe sites could exist alongside the new
vector format in 5.1 and 5.2 (so that all the sites modules that don't
have a 5.1 vector equivalent could still be used), and then perhaps for
5.3 sites could become fully integrated into the new vector format.
Something like that.

On Tue, Jun 10, 2003 at 06:07:51PM +0200, Radim Blazek wrote:

> Yes, 5.1 vector format is bad, maybe it's time to stop 5.1 development,
> skip 5.2 and start 5.3.

Sarcasm is not helpful in an international list.

Paul said: Where are sites in 5.1?

Markus said: Not there yet, but a lot of functionality can be solved
    differently and for the better.

You wrote: Not sure if it is for the better in all cases.

There are several possible ways to go on, and none of them will abandon 5.1.

On Tuesday 10 June 2003 18:21, Paul Kelly wrote:

> On Tue, 10 Jun 2003, Radim Blazek wrote:
> > Yes, 5.1 vector format is bad, maybe it's time to stop 5.1 development,
> > skip 5.2 and start 5.3.
>
> :-)

No it wasn't :), or at least not only :). I am really thinking about that
these days. There are various reasons for that, both technical and political.
I'll send more about it later.

> I was kind of thinking that maybe sites could exist alongside the new
> vector format in 5.1 and 5.2 (so that all the sites modules that don't
> have a 5.1 vector equivalent could still be used), and then perhaps for
> 5.3 sites could become fully integrated into the new vector format.
> Something like that.

I think that there is still a lot of time to update s.* modules
(if it is sensible). It may take 2 years to finish 5.1 so that
it may be released as 5.2. For those who want to use 5.1 for their
work already now, we could add s.* modules to cpbin.conf (make binmix).
The only problem is d.what.sites (R_get_location_* changed).

Radim