[GRASS5] [bug #4244] (grass) segmentation library short read()s

this bug's URL: http://intevation.de/rt/webrt?serial_num=4244
-------------------------------------------------------------------------

Subject: segmentation library short read()s

Hi,

as many have noticed, sometime after 6.0 was released modules using the
segmentation library started to see read errors. After fixing the reported
error message in pagein.c we see that a short read is happening:
WARNING: segment_pagein: short count during read(), got 4076, expected 4096
WARNING: segment_pagein: short count during read(), got 1004, expected 1024
...

seems to be consistently short 20 bytes.

spearfish60 example:

GRASS 6.1-cvs
G6.1> r.cost in=elevation.10m out=cost_test.61 coordinate=599732,4920952
WARNING: segment_pagein: short count during read(), got 524268, expected
         524288
WARNING: segment_pagein: short count during read(), got 524268, expected
         524288
[done]

================
GRASS 6.0.2
G6.0.2> r.cost in=elevation.10m out=cost_test.602 coordinate=599732,4920952
Peak cost value: 645941.903859
[done]

Resultant maps differ slightly!!

G61> r.univar -g cost_test61 | grep sum
sum=86555328769.3534545898
G61> r.univar -g cost_test602 | grep sum
sum=86555572145.8440551758

also seen in r.surf.contour, r.watershed, r.walk, ...

two relevant changes to the segment lib since 6.0:
"Roberto Flor (ITC-irst): use lseek"
http://freegis.org/cgi-bin/viewcvs.cgi/grass6/lib/segment/format.c.diff?r1=2.0&r2=2.1

"added missing config.h for off_t (as suggested by Glynn; bug #3974)"
http://freegis.org/cgi-bin/viewcvs.cgi/grass6/lib/segment/seek.c.diff?r1=2.0&r2=2.1

could it be some site_t change somewhere else...?

Hamish

-------------------------------------------- Managed by Request Tracker

Request Tracker wrote:

this bug's URL: http://intevation.de/rt/webrt?serial_num=4244
-------------------------------------------------------------------------

Subject: segmentation library short read()s

Hi,

as many have noticed, sometime after 6.0 was released modules using the
segmentation library started to see read errors. After fixing the reported
error message in pagein.c we see that a short read is happening:
WARNING: segment_pagein: short count during read(), got 4076, expected 4096
WARNING: segment_pagein: short count during read(), got 1004, expected 1024
...

seems to be consistently short 20 bytes.

It is caused by this change to zero_fill():

  RCS file: /grassrepository/grass6/lib/segment/format.c,v
  Working file: format.c

  ----------------------------
  revision 2.1
  date: 2005/06/30 15:48:40; author: markus; state: Exp; lines: +28 -4
  Roberto Flor (ITC-irst): use lseek
  ----------------------------

Specifically, the line:

    if ( lseek(fd,nbytes-1,SEEK_SET) < 0 ) {

should use SEEK_CUR rather than SEEK_SET, as it is supposed to be
writing nbytes from the current position, not from the start of the
file.

Note that there is a 20-byte header (5 "int"s) at the start of the
file, which would explain why the read() is always 20 bytes short
(zero_fill() is always called just after the header has been written).

I'm not sure why that header is there in the first place; segment
files aren't supposed to be shared across different processes, are
they?

--
Glynn Clements <glynn@gclements.plus.com>

> this bug's URL: http://intevation.de/rt/webrt?serial_num=4244
> -------------------------------------------------------------------
>
> Subject: segmentation library short read()s

..

> as many have noticed, sometime after 6.0 was released modules using
> the segmentation library started to see read errors. After fixing
> the reported error message in pagein.c we see that a short read is
> happening: WARNING: segment_pagein: short count during read(), got
> 4076, expected 4096 WARNING: segment_pagein: short count during
> read(), got 1004, expected 1024 ...
>
> seems to be consistently short 20 bytes.

It is caused by this change to zero_fill():

  RCS file: /grassrepository/grass6/lib/segment/format.c,v

..

  revision 2.1

..

Specifically, the line:

    if ( lseek(fd,nbytes-1,SEEK_SET) < 0 ) {

should use SEEK_CUR rather than SEEK_SET, as it is supposed to be
writing nbytes from the current position, not from the start of the
file.

I can confirm that change gets rid of the warnings and the r.univar
sum of values now matches the 6.0.2 value.

Note that there is a 20-byte header (5 "int"s) at the start of the
file, which would explain why the read() is always 20 bytes short
(zero_fill() is always called just after the header has been written).

I'm not sure why that header is there in the first place; segment
files aren't supposed to be shared across different processes, are
they?

Don't know. I wouldn't have thought so.

Hamish

Glynn Clements wrote:

Request Tracker wrote:

Specifically, the line:

   if ( lseek(fd,nbytes-1,SEEK_SET) < 0 ) {

should use SEEK_CUR rather than SEEK_SET, as it is supposed to be
writing nbytes from the current position, not from the start of the
file.

Note that there is a 20-byte header (5 "int"s) at the start of the
file, which would explain why the read() is always 20 bytes short
(zero_fill() is always called just after the header has been written).

I'm not sure why that header is there in the first place; segment
files aren't supposed to be shared across different processes, are
they?

The patch is quite correct; the segment file was shorter of 20 bytes
when using the lseek zero_fill.
The header is required because the segment_format just create the file,
while the segment_init initialize it.
The typical segment library usage was:
    fd=open_for_writing
    segment_format(fd,..)
    close(fd)
    fd=open_for_reading_writing
    segment_init(fd,...)

and the parameters are shared through the header.
The lseek zero_fill does not really allocate the file, it just extend
the file size, so it's possible a write error later, but it's a lot
faster than writing zero bytes to a file. The write error are handled
inside segment_pageout and segment_put_row with a G_warning and a -1
return code.

Request Tracker wrote:
this bug's URL: http://intevation.de/rt/webrt?serial_num=4244
----------------------------------------------------------------

Subject: segmentation library short read()s

...

>Specifically, the line:
>
> if ( lseek(fd,nbytes-1,SEEK_SET) < 0 ) {
>
>should use SEEK_CUR rather than SEEK_SET, as it is supposed to be
>writing nbytes from the current position, not from the start of the
>file.
>
>Note that there is a 20-byte header (5 "int"s) at the start of the
>file, which would explain why the read() is always 20 bytes short
>(zero_fill() is always called just after the header has been
>written).
>
>I'm not sure why that header is there in the first place; segment
>files aren't supposed to be shared across different processes, are
>they?

The patch is quite correct; the segment file was shorter of 20 bytes
when using the lseek zero_fill.
The header is required because the segment_format just create the
file, while the segment_init initialize it.
The typical segment library usage was:
    fd=open_for_writing
    segment_format(fd,..)
    close(fd)
    fd=open_for_reading_writing
    segment_init(fd,...)

and the parameters are shared through the header.
The lseek zero_fill does not really allocate the file, it just
extend the file size, so it's possible a write error later, but
it's a lot faster than writing zero bytes to a file. The write error
are handled inside segment_pageout and segment_put_row with a
G_warning and a -1 return code.

.... so, should this change be committed to CVS or not?

> if ( lseek(fd,nbytes-1,SEEK_SET) < 0 ) {
>
>should use SEEK_CUR rather than SEEK_SET

If not, I find it worring that r.cost output is not the same from GRASS
6.0 to GRASS 6.1.

thanks,
Hamish

Hamish wrote:

> Request Tracker wrote:
> this bug's URL: http://intevation.de/rt/webrt?serial_num=4244
> ----------------------------------------------------------------
>
> Subject: segmentation library short read()s
...

> >Specifically, the line:
> >
> > if ( lseek(fd,nbytes-1,SEEK_SET) < 0 ) {
> >
> >should use SEEK_CUR rather than SEEK_SET, as it is supposed to be
> >writing nbytes from the current position, not from the start of the
> >file.

[snip]

.... so, should this change be committed to CVS or not?

Yes.

--
Glynn Clements <glynn@gclements.plus.com>