[GRASS5] shift bug in r.in.ascii/r.out.ascii

On Fri, 29 Jun 2001, Markus Neteler wrote:

Hi all,

just found (obviously) a bug in r.in.ascii/r.out.ascii:

if you export/import the same map several times, it get's
shifted each time!

     r.out.ascii in=dem out=dem0.asc
     r.in.ascii in=dem0.asc out=dem1

     r.out.ascii in=dem1 out=dem1.asc
     r.in.ascii in=dem1.asc out=dem2

     r.out.ascii in=dem2 out=dem2.asc
     r.in.ascii in=dem2.asc out=dem3
     d.rast dem
     d.rast dem1
     d.rast dem2
     d.rast dem3

As being lack of time today, any volunteers?

Looks to me as if fseek is moving one byte too far back in the buffer on
line 142 in r.in.ascii/cmd/gethead.c, putting the last digit of the number
of columns as the first cell value read. Since it is followed by a
newline, it is treated as value.
      fseek(fd, -(len+1), SEEK_CUR);

For me, fseek(fd, -(len), SEEK_CUR); fixes it, but I haven't tried
it with other header fields or command line options.

Roger

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand@nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.

Roger Bivand wrote:

> just found (obviously) a bug in r.in.ascii/r.out.ascii:
>
> if you export/import the same map several times, it get's
> shifted each time!
>
> r.out.ascii in=dem out=dem0.asc
> r.in.ascii in=dem0.asc out=dem1
>
> r.out.ascii in=dem1 out=dem1.asc
> r.in.ascii in=dem1.asc out=dem2
>
> r.out.ascii in=dem2 out=dem2.asc
> r.in.ascii in=dem2.asc out=dem3
> d.rast dem
> d.rast dem1
> d.rast dem2
> d.rast dem3
>
> As being lack of time today, any volunteers?

Looks to me as if fseek is moving one byte too far back in the buffer on
line 142 in r.in.ascii/cmd/gethead.c, putting the last digit of the number
of columns as the first cell value read. Since it is followed by a
newline, it is treated as value.
      fseek(fd, -(len+1), SEEK_CUR);

For me, fseek(fd, -(len), SEEK_CUR); fixes it, but I haven't tried
it with other header fields or command line options.

Suggestion: call ftell() before G_getl() and use fseek(SEEK_SET) to
restore the position. Alternatively, you could use fgetpos() and
fsetpos(); these are more portable, but that probably isn't an issue
here.

--
Glynn Clements <glynn.clements@virgin.net>

From neteler Mon Jul 2 10:16:22 2001
Return-Path: <neteler>
Received: by hgeo02.geog.uni-hannover.de (SMI-8.6/SMI-SVR4)
  id KAA04622; Mon, 2 Jul 2001 10:16:22 +0100
Date: Mon, 2 Jul 2001 10:16:22 +0100
From: Markus Neteler <neteler@geog.uni-hannover.de>
To: grass5 developers list <grass5@geog.uni-hannover.de>
Subject: Re: [GRASS5] shift bug in r.in.ascii/r.out.ascii
Message-ID: <20010702101622.N16481@hgeo02.geog.uni-hannover.de>
Mail-Followup-To: grass5 developers list <grass5@geog.uni-hannover.de>
References: <20010629151210.L16481@hgeo02.geog.uni-hannover.de> <Pine.LNX.4.21.0106291928330.2921-100000@reclus.nhh.no> <15164.50234.521459.627900@cerise.nosuchdomain.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <15164.50234.521459.627900@cerise.nosuchdomain.co.uk>; from glynn.clements@virgin.net on Fri, Jun 29, 2001 at 07:08:58PM +0100
Sender: grass5-admin@geog.uni-hannover.de
Errors-To: grass5-admin@geog.uni-hannover.de
X-BeenThere: grass5@geog.uni-hannover.de
X-Mailman-Version: 2.0.5
Precedence: bulk
List-Help: <mailto:grass5-request@geog.uni-hannover.de?subject=help>
List-Post: <mailto:grass5@geog.uni-hannover.de>
List-Subscribe: <http://www.geog.uni-hannover.de/mailman/listinfo/grass5&gt;,
  <mailto:grass5-request@geog.uni-hannover.de?subject=subscribe>
List-Id: GRASS 5 Developers mailing list <grass5.geog.uni-hannover.de>
List-Unsubscribe: <http://www.geog.uni-hannover.de/mailman/listinfo/grass5&gt;,
  <mailto:grass5-request@geog.uni-hannover.de?subject=unsubscribe>
List-Archive: <http://www.geog.uni-hannover.de/pipermail/grass5/&gt;
Status: O
Content-Length: 1579
Lines: 46

On Fri, Jun 29, 2001 at 07:08:58PM +0100, Glynn Clements wrote:

Roger Bivand wrote:

> > just found (obviously) a bug in r.in.ascii/r.out.ascii:
> >
> > if you export/import the same map several times, it get's
> > shifted each time!
> >
> > r.out.ascii in=dem out=dem0.asc
> > r.in.ascii in=dem0.asc out=dem1
> >
> > r.out.ascii in=dem1 out=dem1.asc
> > r.in.ascii in=dem1.asc out=dem2
> >
> > r.out.ascii in=dem2 out=dem2.asc
> > r.in.ascii in=dem2.asc out=dem3
> > d.rast dem
> > d.rast dem1
> > d.rast dem2
> > d.rast dem3
> >
> > As being lack of time today, any volunteers?
>
> Looks to me as if fseek is moving one byte too far back in the buffer on
> line 142 in r.in.ascii/cmd/gethead.c, putting the last digit of the number
> of columns as the first cell value read. Since it is followed by a
> newline, it is treated as value.
> fseek(fd, -(len+1), SEEK_CUR);
>
> For me, fseek(fd, -(len), SEEK_CUR); fixes it, but I haven't tried
> it with other header fields or command line options.

Suggestion: call ftell() before G_getl() and use fseek(SEEK_SET) to
restore the position. Alternatively, you could use fgetpos() and
fsetpos(); these are more portable, but that probably isn't an issue
here.

Thanks for your help! I've tried Roger's suggestion and tested in a
Transverse Mercator and a lat/long location - it seems to work.

This version is uploaded to CVS now.

Glynn, do you think your idea is to be preferred?

Markus

From neteler Mon Jul 2 10:29:08 2001
Return-Path: <neteler>
Received: by hgeo02.geog.uni-hannover.de (SMI-8.6/SMI-SVR4)
  id KAA04712; Mon, 2 Jul 2001 10:29:08 +0100
Date: Mon, 2 Jul 2001 10:29:07 +0100
From: Markus Neteler <neteler@geog.uni-hannover.de>
To: Roger Bivand <Roger.Bivand@nhh.no>
Cc: grass5 developers list <grass5@geog.uni-hannover.de>
Subject: Re: [GRASS5] shift bug in r.in.ascii/r.out.ascii
Message-ID: <20010702102907.O16481@hgeo02.geog.uni-hannover.de>
Mail-Followup-To: Roger Bivand <Roger.Bivand@nhh.no>,
  grass5 developers list <grass5@geog.uni-hannover.de>
References: <20010629151210.L16481@hgeo02.geog.uni-hannover.de> <Pine.LNX.4.21.0106292127450.2921-200000@reclus.nhh.no>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.LNX.4.21.0106292127450.2921-200000@reclus.nhh.no>; from Roger.Bivand@nhh.no on Fri, Jun 29, 2001 at 09:29:48PM +0200
Sender: grass5-admin@geog.uni-hannover.de
Errors-To: grass5-admin@geog.uni-hannover.de
X-BeenThere: grass5@geog.uni-hannover.de
X-Mailman-Version: 2.0.5
Precedence: bulk
List-Help: <mailto:grass5-request@geog.uni-hannover.de?subject=help>
List-Post: <mailto:grass5@geog.uni-hannover.de>
List-Subscribe: <http://www.geog.uni-hannover.de/mailman/listinfo/grass5&gt;,
  <mailto:grass5-request@geog.uni-hannover.de?subject=subscribe>
List-Id: GRASS 5 Developers mailing list <grass5.geog.uni-hannover.de>
List-Unsubscribe: <http://www.geog.uni-hannover.de/mailman/listinfo/grass5&gt;,
  <mailto:grass5-request@geog.uni-hannover.de?subject=unsubscribe>
List-Archive: <http://www.geog.uni-hannover.de/pipermail/grass5/&gt;
Status: O
Content-Length: 411
Lines: 13

On Fri, Jun 29, 2001 at 09:29:48PM +0200, Roger Bivand wrote:

Markus:

Herewith the fix with Glynn's suggestion. I still haven't tested it with
more headers than r.out.ascii gives, or with command-line options set, but
this ought to be OK? (???)

Roger

... o.k (have been overlooking this mail). Thanks for sending the file,
I have tested as well, now Glynn's version is in place in CVS.

Markus

Markus Neteler wrote:

> > Looks to me as if fseek is moving one byte too far back in the buffer on
> > line 142 in r.in.ascii/cmd/gethead.c, putting the last digit of the number
> > of columns as the first cell value read. Since it is followed by a
> > newline, it is treated as value.
> > fseek(fd, -(len+1), SEEK_CUR);
> >
> > For me, fseek(fd, -(len), SEEK_CUR); fixes it, but I haven't tried
> > it with other header fields or command line options.
>
> Suggestion: call ftell() before G_getl() and use fseek(SEEK_SET) to
> restore the position. Alternatively, you could use fgetpos() and
> fsetpos(); these are more portable, but that probably isn't an issue
> here.

Thanks for your help! I've tried Roger's suggestion and tested in a
Transverse Mercator and a lat/long location - it seems to work.

This version is uploaded to CVS now.

Glynn, do you think your idea is to be preferred?

Yes.

Basically, restoring the position to one which was previously recorded
with ftell() or fgetpos() is more robust than trying to compute the
required offset from the current file position.

In this case, the code is assuming that the number of bytes by which
G_getl() advances the stream's position is equal to the length of the
string which G_getl() stores in the buffer.

Whilst this may happen to be the case, I can't see anything which
actually states that it is so (G_getl() doesn't appear to have an
entry in progman50.pdf).

Furthermore, the existing implementation of G_getl() suggests that it
isn't the case as G_getl() removes the trailing newline. This implies
that the original version (offset of "-(len+1)") should be correct;
maybe there is another (equal-but-opposite) off-by-one bug elsewhere?

There is another issue, which isn't applicable here but may be worth
bearing in mind generally, which is that the behaviour of fseek() on a
text (as opposed to binary) stream is undefined unless the offset is
either zero, or the value returned from a previous call to ftell() (in
the case of SEEK_SET).

--
Glynn Clements <glynn.clements@virgin.net>

Has anybody tested whether r.in.arc and r.out.arc does not have the same problem ?
At least one version of r.out.arc was written based on r.out.ascii so the bug might
have
been transfered there too.

Helena

Glynn Clements wrote:

Markus Neteler wrote:

> > > Looks to me as if fseek is moving one byte too far back in the buffer on
> > > line 142 in r.in.ascii/cmd/gethead.c, putting the last digit of the number
> > > of columns as the first cell value read. Since it is followed by a
> > > newline, it is treated as value.
> > > fseek(fd, -(len+1), SEEK_CUR);
> > >
> > > For me, fseek(fd, -(len), SEEK_CUR); fixes it, but I haven't tried
> > > it with other header fields or command line options.
> >
> > Suggestion: call ftell() before G_getl() and use fseek(SEEK_SET) to
> > restore the position. Alternatively, you could use fgetpos() and
> > fsetpos(); these are more portable, but that probably isn't an issue
> > here.
>
> Thanks for your help! I've tried Roger's suggestion and tested in a
> Transverse Mercator and a lat/long location - it seems to work.
>
> This version is uploaded to CVS now.
>
> Glynn, do you think your idea is to be preferred?

Yes.

Basically, restoring the position to one which was previously recorded
with ftell() or fgetpos() is more robust than trying to compute the
required offset from the current file position.

In this case, the code is assuming that the number of bytes by which
G_getl() advances the stream's position is equal to the length of the
string which G_getl() stores in the buffer.

Whilst this may happen to be the case, I can't see anything which
actually states that it is so (G_getl() doesn't appear to have an
entry in progman50.pdf).

Furthermore, the existing implementation of G_getl() suggests that it
isn't the case as G_getl() removes the trailing newline. This implies
that the original version (offset of "-(len+1)") should be correct;
maybe there is another (equal-but-opposite) off-by-one bug elsewhere?

There is another issue, which isn't applicable here but may be worth
bearing in mind generally, which is that the behaviour of fseek() on a
text (as opposed to binary) stream is undefined unless the offset is
either zero, or the value returned from a previous call to ftell() (in
the case of SEEK_SET).

--
Glynn Clements <glynn.clements@virgin.net>
_______________________________________________
grass5 mailing list
grass5@geog.uni-hannover.de
http://www.geog.uni-hannover.de/mailman/listinfo/grass5

Markus Neteler wrote:

On Thu, Jul 05, 2001 at 05:39:48PM -0500, Helena wrote:
> Has anybody tested whether r.in.arc and r.out.arc does not have the same problem ?
> At least one version of r.out.arc was written based on r.out.ascii so the bug might
> have
> been transfered there too.
>
> Helena

I have just run the same test - it seems to be working alright.

thanks for testing it

But... even here we have the precision problem:

ncols 140
nrows 144
xllcorner 3568200
yllcorner 5765462.5
cellsize 12.500000
NODATA_value -9999
126.870003 126.550003 126.25 125.93 125.599998
       ^^^ ^^^^ ^^^^^

Markus, the numbers are perfectly right for
floating point, if you want to have 9 digits (126.870000) then you would have
to store that number as double, which of course can be done but as we discussed
before do you want to have your milion cell DEM with vertical accuracy in dm stored
as doubles? I don't have a problem with this because this is a good reminder for me
how the data are stored and that it needs to be taken into account when running
the models. If it bothers you - change the output to just 7 digits. So for example if
you are using
r.out.ascii for floating point raster map and your elevations are say 100.000000 -
150.000000
you should use dp = 4 rather than the default 6, because the FP holds only 7 digits
(correct me somebody if I am not right). If you want those data output with 6 decimals
places
you would need double precission.
r.stats does not give you that choice so it may be useful to add it there and then there
are also
some programs which print out numbers with ridiculous number of digits, e.g. r.colors:
I counted 29 digits for r.colors data range when you run it with the rules option . That
should be
easy to fix.
As it was already discussed it would be possible to have grass automatically adjust
everything
related to fp to just 7 digits but there may be cases when this may be too restrictive.
Because nothing is really broken here maybe the best approach would be that whoever is
working
on a program where dp option should be added or number of digits for print out needs to
be adjusted
fixes that too, or when time allows it along with other more important things.

I guess that I have written more than I really wanted to,

Helena

Same story as with r.stats, r.to.sites, r.in./out.ascii, r.in./out.arc.

While the NULL thing is a severe change probably interesting for 5.1, the
precision issue might be solved now.

Regards

Markus

> Glynn Clements wrote:
>
> > Markus Neteler wrote:
> >
> > > > > Looks to me as if fseek is moving one byte too far back in the buffer on
> > > > > line 142 in r.in.ascii/cmd/gethead.c, putting the last digit of the number
> > > > > of columns as the first cell value read. Since it is followed by a
> > > > > newline, it is treated as value.
> > > > > fseek(fd, -(len+1), SEEK_CUR);
> > > > >
> > > > > For me, fseek(fd, -(len), SEEK_CUR); fixes it, but I haven't tried
> > > > > it with other header fields or command line options.
> > > >
> > > > Suggestion: call ftell() before G_getl() and use fseek(SEEK_SET) to
> > > > restore the position. Alternatively, you could use fgetpos() and
> > > > fsetpos(); these are more portable, but that probably isn't an issue
> > > > here.
> > >
> > > Thanks for your help! I've tried Roger's suggestion and tested in a
> > > Transverse Mercator and a lat/long location - it seems to work.
> > >
> > > This version is uploaded to CVS now.
> > >
> > > Glynn, do you think your idea is to be preferred?
> >
> > Yes.
> >
> > Basically, restoring the position to one which was previously recorded
> > with ftell() or fgetpos() is more robust than trying to compute the
> > required offset from the current file position.
> >
> > In this case, the code is assuming that the number of bytes by which
> > G_getl() advances the stream's position is equal to the length of the
> > string which G_getl() stores in the buffer.
> >
> > Whilst this may happen to be the case, I can't see anything which
> > actually states that it is so (G_getl() doesn't appear to have an
> > entry in progman50.pdf).
> >
> > Furthermore, the existing implementation of G_getl() suggests that it
> > isn't the case as G_getl() removes the trailing newline. This implies
> > that the original version (offset of "-(len+1)") should be correct;
> > maybe there is another (equal-but-opposite) off-by-one bug elsewhere?
> >
> > There is another issue, which isn't applicable here but may be worth
> > bearing in mind generally, which is that the behaviour of fseek() on a
> > text (as opposed to binary) stream is undefined unless the offset is
> > either zero, or the value returned from a previous call to ftell() (in
> > the case of SEEK_SET).
> >
> > --
> > Glynn Clements <glynn.clements@virgin.net>
> > _______________________________________________
> > grass5 mailing list
> > grass5@geog.uni-hannover.de
> > http://www.geog.uni-hannover.de/mailman/listinfo/grass5
>
> _______________________________________________
> grass5 mailing list
> grass5@geog.uni-hannover.de
> http://www.geog.uni-hannover.de/mailman/listinfo/grass5

--
Markus Neteler * University of Hannover
Institute of Physical Geography and Landscape Ecology
Schneiderberg 50 * D-30167 Hannover * Germany
Tel: ++49-(0)511-762-4494 Fax: -3984
_______________________________________________
grass5 mailing list
grass5@geog.uni-hannover.de
http://www.geog.uni-hannover.de/mailman/listinfo/grass5

On Fri, 6 Jul 2001, Helena wrote:

Markus Neteler wrote:

> I have just run the same test - it seems to be working alright.

thanks for testing it

>
> But... even here we have the precision problem:
>
> ncols 140
> nrows 144
> xllcorner 3568200
> yllcorner 5765462.5
> cellsize 12.500000
> NODATA_value -9999
> 126.870003 126.550003 126.25 125.93 125.599998
> ^^^ ^^^^ ^^^^^

Markus, the numbers are perfectly right for
floating point, if you want to have 9 digits (126.870000) then you would have
to store that number as double, which of course can be done but as we discussed
before do you want to have your milion cell DEM with vertical accuracy in dm stored
as doubles? I don't have a problem with this because this is a good reminder for me
how the data are stored and that it needs to be taken into account when running
the models. If it bothers you - change the output to just 7 digits. So for example if
you are using
r.out.ascii for floating point raster map and your elevations are say 100.000000 -
150.000000
you should use dp = 4 rather than the default 6, because the FP holds only 7 digits
(correct me somebody if I am not right). If you want those data output with 6 decimals
places
you would need double precission.
r.stats does not give you that choice so it may be useful to add it there and then there
are also
some programs which print out numbers with ridiculous number of digits, e.g. r.colors:
I counted 29 digits for r.colors data range when you run it with the rules option . That
should be
easy to fix.
As it was already discussed it would be possible to have grass automatically adjust
everything
related to fp to just 7 digits but there may be cases when this may be too restrictive.
Because nothing is really broken here maybe the best approach would be that whoever is
working
on a program where dp option should be added or number of digits for print out needs to
be adjusted
fixes that too, or when time allows it along with other more important things.

I guess that I have written more than I really wanted to,

Not at all! Some of these representation things need to be repeated when
the time is right - that is when people see the trailing digits as
representing something. FP gives you - as you say - about 7 digits, which
is fairly precise for continuous phenomena, DP will give you about 16. FP
loses out numerically when you do things to it, like lots of trigonometry,
and the number of digits you can trust will rot from the right - this
isn't COBOL or database stuff! The binary representation _is_ just a
representation, in DP does a couple of hectares more or less matter at
continental scales?

Roger

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand@nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.