[GRASS-dev] Usage of size_t

Hello,

I'm moving size_t topic from user to dev list.

As I understand, size_t should be used also for number of rows and
number of columns of a map. But for example Rast_window_{rows,cols}
returns int. So will this be also changed/fixed to size_t?

Note that some instances of using int instead of size_t can be
identified by GCC warning option -Wsign-conversion. It marks usage of
int as an argument of malloc/G_calloc/... or usage of int for
iterating C++ STL containers.

Vaclav

From topic "NVIZ can't allocate enough memory" on [GRASS-user]:

From: Glynn Clements <glynn@gclements.plus.com>
Date: Tue, Jun 12, 2012 at 5:23 AM
Subject: Re: [GRASS-user] NVIZ can't allocate enough memory
To: Daniel Lee <lee@isi-solutions.org>
Cc: GRASS user list <grass-user@lists.osgeo.org>

Daniel Lee wrote:

ERROR: G_malloc: unable to allocate 18446744071974792324 bytes at gsds.c:575

18446744071974792324 = 0xffffffff9899ac84

Does anybody know what could be the problem? If I'm interpreting the bytes
correctly, that'd be about 10^10 gigabytes, which several orders of
magnitude larger than the rasters involved. Thanks!

A calculation has overflowed the range of a signed 32-bit integer
resulting in a negative value. This value has then been converted to a
signed 64-bit integer, and then to an unsigned 64-bit integer.

Here's the output of g.region:

rows: 32001
cols: 20001
cells: 640052001

At 4 bytes per cell, 640052001 cells = 2560208004 bytes.

2560208004 = 0x9899ac84

The maximum value representable by a 32-bit integer is 0x7fffffff =
2147483647.

Using an unsigned integer would eliminate the problem in this
particular case, but the region wouldn't need to be much larger in
order for that to be insufficient (specifically, 32001 x 33553 would
overflow).

The correct solution is to use "size_t" rather than "int". E.g. for
this specific case:

--- gsds.c (revision 52015)
+++ gsds.c (working copy)
@@ -481,7 +481,8 @@
int gsds_alloc_typbuff(int id, int *dims, int ndims, int type)
{
dataset *ds;
- int i, siz = 1;
+ int i;
+ size_t siz = 1;

if \(\(ds = get\_dataset\(id\)\)\) \{
   /\*

More generally, it's important that the conversion comes before the
mulitply; e.g.:

   size\_t cells = \(size\_t\) rows \* cols;

will work but:

   size\_t cells = rows \* cols;

won't; the calculation will be perfomed using "int", overflow, then
the overflowed result will be promoted.

However: the number of instances of this particular issue in the GRASS
source code is probably somewhere between "dozens" and "hundreds", and
there's no systematic way to identify them. Running test cases with a
region of >= 2^32 cells (or even >= 2^29 cells, i.e. >= 2^31 bytes at
4 bytes per cell) would find a lot of them, but it requires a 64-bit
system with plenty of RAM, and it's going to be slow.

--
Glynn Clements <glynn@gclements.plus.com>
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

On Thu, Jun 21, 2012 at 5:16 PM, Vaclav Petras <wenzeslaus@gmail.com> wrote:

Hello,

I'm moving size_t topic from user to dev list.

As I understand, size_t should be used also for number of rows and
number of columns of a map. But for example Rast_window_{rows,cols}
returns int. So will this be also changed/fixed to size_t?

Considering current hardware, we struggle to support maps with
2147483647 (the maximum value representable by a 32-bit integer;
Glynn's previous answer) cells = rows * columns. Therefore I guess
that for the time being int is fine as data type for rows and columns.
Supporting 2147483647 rows x 2147483647 columns is still a far way to
go. Type size_t should be used for rows * columns * sizeof(some data)
when allocating memory, and type off_t should be used for rows *
columns * sizeof(some data) for disk IO. I would just cast as
suggested by Glynn.

Note that some instances of using int instead of size_t can be
identified by GCC warning option -Wsign-conversion. It marks usage of
int as an argument of malloc/G_calloc/... or usage of int for
iterating C++ STL containers.

The number of rows and columns should always be non-negative, thus no
worries about sign conversion.

Also note that when calling e.g. Rast_northing_to_row() which returns
the row number, not the number of rows, the resulting row can be
negative and must be treated as double or casted to some signed
integer, not a potentially unsigned size_t.

Markus M

Vaclav

From topic "NVIZ can't allocate enough memory" on [GRASS-user]:

From: Glynn Clements <glynn@gclements.plus.com>
Date: Tue, Jun 12, 2012 at 5:23 AM
Subject: Re: [GRASS-user] NVIZ can't allocate enough memory
To: Daniel Lee <lee@isi-solutions.org>
Cc: GRASS user list <grass-user@lists.osgeo.org>

Daniel Lee wrote:

ERROR: G_malloc: unable to allocate 18446744071974792324 bytes at gsds.c:575

18446744071974792324 = 0xffffffff9899ac84

Does anybody know what could be the problem? If I'm interpreting the bytes
correctly, that'd be about 10^10 gigabytes, which several orders of
magnitude larger than the rasters involved. Thanks!

A calculation has overflowed the range of a signed 32-bit integer
resulting in a negative value. This value has then been converted to a
signed 64-bit integer, and then to an unsigned 64-bit integer.

Here's the output of g.region:

rows: 32001
cols: 20001
cells: 640052001

At 4 bytes per cell, 640052001 cells = 2560208004 bytes.

2560208004 = 0x9899ac84

The maximum value representable by a 32-bit integer is 0x7fffffff =
2147483647.

Using an unsigned integer would eliminate the problem in this
particular case, but the region wouldn't need to be much larger in
order for that to be insufficient (specifically, 32001 x 33553 would
overflow).

The correct solution is to use "size_t" rather than "int". E.g. for
this specific case:

--- gsds.c (revision 52015)
+++ gsds.c (working copy)
@@ -481,7 +481,8 @@
int gsds_alloc_typbuff(int id, int *dims, int ndims, int type)
{
dataset *ds;
- int i, siz = 1;
+ int i;
+ size_t siz = 1;

if \(\(ds = get\_dataset\(id\)\)\) \{
   /\*

More generally, it's important that the conversion comes before the
mulitply; e.g.:

   size\_t cells = \(size\_t\) rows \* cols;

will work but:

   size\_t cells = rows \* cols;

won't; the calculation will be perfomed using "int", overflow, then
the overflowed result will be promoted.

However: the number of instances of this particular issue in the GRASS
source code is probably somewhere between "dozens" and "hundreds", and
there's no systematic way to identify them. Running test cases with a
region of >= 2^32 cells (or even >= 2^29 cells, i.e. >= 2^31 bytes at
4 bytes per cell) would find a lot of them, but it requires a 64-bit
system with plenty of RAM, and it's going to be slow.

--
Glynn Clements <glynn@gclements.plus.com>
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Markus Metz wrote:

> I'm moving size_t topic from user to dev list.
>
> As I understand, size_t should be used also for number of rows and
> number of columns of a map. But for example Rast_window_{rows,cols}
> returns int. So will this be also changed/fixed to size_t?

Considering current hardware, we struggle to support maps with
2147483647 (the maximum value representable by a 32-bit integer;
Glynn's previous answer) cells = rows * columns. Therefore I guess
that for the time being int is fine as data type for rows and columns.
Supporting 2147483647 rows x 2147483647 columns is still a far way to
go. Type size_t should be used for rows * columns * sizeof(some data)
when allocating memory, and type off_t should be used for rows *
columns * sizeof(some data) for disk IO. I would just cast as
suggested by Glynn.

FWIW:

size_t is specified by the C standard as the type of a "sizeof"
expression. It's used in the library as the argument to malloc(), and
wherever an array size is specified (e.g. memcpy, fread, etc).

On systems with a flat address space (i.e. anything which GRASS
targets), size_t will have the same size as a pointer.

off_t isn't specified by the C standard, but by POSIX, and refers to
the type used by lseek() to store a file offset. On 32-bit systems,
off_t may be either 32 or 64 bits. off_t is a signed type.

If you're going to read an entire map into memory, then using size_t
for the number of cells or bytes is sufficient for any map which will
actually fit into memory.

OTOH, if you're processing the file row-by-row, it may not actually be
possible to calculate the total number of cells or bytes.

For any file which can be opened, off_t is guaranteed to be large
enough to represent the size of the file on disk; without LFS, the OS
will refuse to open a file larger than 2GiB.

However, GRASS raster maps are normally compressed, so the number of
bytes or cells in the decompressed data may exceed 2^31 even if the
file is smaller than 2 GiB.

--
Glynn Clements <glynn@gclements.plus.com>