[GRASS-dev] Re: [GRASS-user] r.stats: negative cell counts and percentages

Hermann wrote:

svn up https://svn.osgeo.org/grass/grass/branches/releasebranch_6_4 grass64_release

not that it matters here, but do you use 32 bit or 64 bit
CPU+OS?

(on 8 Sptember 2009)

...

>> I have the following raster map:
>>
>> Type of Map: raster
Number of Categories: 255 Data
Type: CELL Rows:
58000 Columns:
67000 Total
Cells: 3886000000
>>
>> r.stats (-c and -p) tells me:

...

>> I guess there is an integer overflow somewhere(?)

Yeah, I think you are right.

There are many int++'s in raster/r.stats/stats.c
Perhaps those counters++ should be changed to "long long"?

what is the result of r.univar ?

I see r.univar has been modified to use int for number of cells
and number of non-null cells. (earlier versions used unsigned
int, why the change?) But 'unsigned int' for the counter just
postpones the problem a couple of years, maybe long long is
better there?

Like r.info, I think r.univar and r.stats are pretty fundamental
raster modules to have working correctly for this.

#32bit linux system/gcc
sizeof(long long) is 8
sizeof(long) is 4
sizeof(int) is 4

#64bit linux system/gcc
sizeof(long long) is 8
sizeof(long) is 8
sizeof(int) is 4

I notice r.info uses unsigned long long + printf %llu
Shall we standardize on that? If it can be done without
dealing with --enable-lfs all the better, I'm pretty sure
that LFS has more to do with which 64bit friendly glibc
functions to use so not relevant. But I'm not 100% sure
about this stuff so I ask.

I'm guessing that off_t should only be used for addresses,
or if all else fails can it be repurposed?

Hamish

Hamish wrote:

I notice r.info uses unsigned long long + printf %llu
Shall we standardize on that? If it can be done without
dealing with --enable-lfs all the better, I'm pretty sure
that LFS has more to do with which 64bit friendly glibc
functions to use so not relevant. But I'm not 100% sure
about this stuff so I ask.

... and g.region uses an #ifdef:

#ifdef HAVE_LONG_LONG_INT
            fprintf(stdout, "%-*s %lld\n", width, "cells:",
                    (long long)window->rows * window->cols);
            if (print_flag & PRINT_3D)
                fprintf(stdout, "%-*s %lld\n", width, "3dcells:",
                        (long long)window->rows3 * window->cols3 *
                        window->depths);
#else
            fprintf(stdout, "%-*s %ld\n", width, "cells:",
                    (long)window->rows * window->cols);
            if (print_flag & PRINT_3D)
                fprintf(stdout, "%-*s %ld\n", width, "3dcells:",
                        (long)window->rows3 * window->cols3 * window->depths);
#endif

Hamish

Hamish wrote:

> >> I guess there is an integer overflow somewhere(?)

Yeah, I think you are right.

There are many int++'s in raster/r.stats/stats.c
Perhaps those counters++ should be changed to "long long"?

"long long" is C99, but provided by some C89 compilers as an
extension. Any use must be conditionalised upon:

  #ifdef HAVE_LONG_LONG_INT

Any "long long" values need to be printed with "%lld" ("%llu" for
unsigned), which also needs to be conditionalised.

I notice r.info uses unsigned long long + printf %llu
Shall we standardize on that?

The main downside is that you can end up needing a lot of conditional
code.

If it can be done without
dealing with --enable-lfs all the better, I'm pretty sure
that LFS has more to do with which 64bit friendly glibc
functions to use so not relevant. But I'm not 100% sure
about this stuff so I ask.

LFS has two parts: making off_t a 64-bit type, and redirecting I/O
functions to the 64-bit versions.

In order to make off_t a 64-bit type, there has to *be* a 64-bit type.
But that type will exist regardless of whether LFS is enabled.

I'm guessing that off_t should only be used for addresses,
or if all else fails can it be repurposed?

off_t should only be used for file offsets. There's no advantage of
using it as a general-purpose integer type, as you need to convert it
to a standard type for printing, so you still need to check for the
existence of "long long" to determine which type to convert it to.

--
Glynn Clements <glynn@gclements.plus.com>

Hamish wrote:

> I notice r.info uses unsigned long long + printf %llu
> Shall we standardize on that?

Glynn wrote:

The main downside is that you can end up needing a lot of
conditional code.

I'm not seeing any alternative though.

For any module which does math with it (eg casting prior to
variable multiplication/division), perhaps to save some noise in
the code the LONGTYPE could be set up as a macro at the top
of the file and then the casts should look like
(LONGTYPE)chellhd.rows * cellhb*cols

setting up strfmt=LONGFMT for %llu or %lu is a bit uglier, but
a number of other modules pull such tricks to aviod CELL/FCELL/
DCELL switch statements.

Hamish

Hamish wrote:

Hamish wrote:
> > I notice r.info uses unsigned long long + printf %llu
> > Shall we standardize on that?

Glynn wrote:
> The main downside is that you can end up needing a lot of
> conditional code.

I'm not seeing any alternative though.

For any module which does math with it (eg casting prior to
variable multiplication/division), perhaps to save some noise in
the code the LONGTYPE could be set up as a macro at the top
of the file and then the casts should look like
(LONGTYPE)chellhd.rows * cellhb*cols

I suggest adding the following (or something similar) to config.h:

  #ifdef HAVE_LONG_LONG_INT
  #define longest long long
  #define PRILONGEST_PREFIX "ll"
  #else
  #define longest long
  #define PRILONGEST_PREFIX "l"
  #endif

  #define PRIdLONGEST PRILONGEST_PREFIX "d"
  #define PRIiLONGEST PRILONGEST_PREFIX "i"
  #define PRIoLONGEST PRILONGEST_PREFIX "o"
  #define PRIuLONGEST PRILONGEST_PREFIX "u"
  #define PRIxLONGEST PRILONGEST_PREFIX "x"
  #define PRIXLONGEST PRILONGEST_PREFIX "X"

I'm unsure whether it's better to use a macro or a typedef for the
type. Using a macro allows the use of "unsigned longest" rather than
requiring a separate typedef (e.g. "u_longest").

I have no particular preference as to the name of the macro/type, or
whether to use upper or lower case for a macro (a typedef should use
lower case).

The PRI* syntax is intended to mirror the C99 macros from
<inttypes.h>. The definition doesn't include the leading "%", so that
you can include flags, a field width and/or a precision.

--
Glynn Clements <glynn@gclements.plus.com>

Hamish wrote:

Hermann wrote:

svn up https://svn.osgeo.org/grass/grass/branches/releasebranch_6_4 grass64_release

not that it matters here, but do you use 32 bit or 64 bit
CPU+OS?

32

what is the result of r.univar ?

total null and non-null cells: -408967296
total null cells: -913132378

Hermann

> what is the result of r.univar ?

Hermann:

total null and non-null cells: -408967296
total null cells: -913132378

fyi filed as bug # 771.

The problem and solution are known, so a fix shouldn't take too
long.

regards,
Hamish