[GRASS5] potential problems in d.vect.thematic when using a column of type real in postgresql

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I just discovered a slight problem in d.vect.thematic, but I'm not
sure whether d.vect.thematic should/could be altered for this, or
whether a simple warning should be inserted in the man page.

When using a column that is defined as 'real' in postgresql, the
results of the comparisons are more or less arbitrary, since, as the
postgresql manual puts it:

"The data types real and double precision are inexact,
variable-precision numeric types", meaning that "comparing two
floating-point values for equality may or may not work as expected".

Now there seem to be two possible solutions for that in postgresql:

1) quote the constant you are comparing your real value with, i.e.
(within psql):

grass=> select * from espvie where e_=81.58;
grands_sec | pop | icm | e_ | cat | e2
- ------------+-----+-----+----+-----+----
(0 lignes)

grass=> select * from espvie where e_='81.58';
grands_sec | pop | icm | e_ | cat | e2
- ------------+-------+----------+-------+-----+-------
23103 | 13600 | 0.722814 | 81.58 | 51 | 81.58
(1 ligne)

Using this solution would mean changing the lines constructing the
sqlwhere variable from

sqlwhere="$GIS_OPT_column $mincomparison $rangemin and
$GIS_OPT_column<=$rangemax"

to

sqlwhere="$GIS_OPT_column $mincomparison '$rangemin' and
$GIS_OPT_column<='$rangemax'"

But I don't know what the effect of this would be on other drivers
(such as dbf). And don't have the time to test this just now.

2) use the 'numeric' data type:

grass=> \d espvie
               Table «public.espvie»
  Colonne | Type | Modificateurs
- ------------+----------------------+---------------
grands_sec | character varying(8) |
pop | real |
icm | real |
e_ | real |
cat | integer |
e2 | numeric |

grass=> select * from espvie where e2=81.58;
grands_sec | pop | icm | e_ | cat | e2
- ------------+-------+----------+-------+-----+-------
23103 | 13600 | 0.722814 | 81.58 | 51 | 81.58
(1 ligne)

This solution would imply a simple warning in the man page.

This problem is obviously wider than just d.vect.thematic. d.vect is
affected by it as well, and I imagine any command using a where
clause...

Any suggestions ?

Moritz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDKGvvrIrMbm76jD8RAhRIAJ9Xmhv2YbqevRow1PaMfq3voa98RQCfTY9M
o/3YjYr1it/fMFHBir/ni0A=
=vb29
-----END PGP SIGNATURE-----

From: Moritz Lennert <mlennert@club.worldonline.be>
Sent: Wed, 14 Sep 2005 20:29:03 +0200

Hello,

I just discovered a slight problem in d.vect.thematic, but I'm not
sure whether d.vect.thematic should/could be altered for this, or
whether a simple warning should be inserted in the man page.

When using a column that is defined as 'real' in postgresql, the
results of the comparisons are more or less arbitrary, since, as the
postgresql manual puts it:

"The data types real and double precision are inexact,
variable-precision numeric types", meaning that "comparing two
floating-point values for equality may or may not work as expected".

That's a general problem with floating-point arithmetic.

The common viable solution is to define "equality" as "very small
difference", e.g. instead of a=b compare abs(a-b) to something small (say 1e-
10) or a/b to something close to one.

If the values are stored as floating point but are supposed to be fixed-
point (e.g. currency) then 1) you may run into heavy trouble and 2) you may
compare int(a*r) to int(b*r) where r is 10^precision.

In GRASS, there are currently no functions in expressions for every DBMI
driver, thus the only possible rewrite for 'a=b' would be:

'a/b < 1+1e-10 and b/a < 1+1e-10'

which might run into division by zero errors and so on.

Now there seem to be two possible solutions for that in postgresql:

1) quote the constant you are comparing your real value with, i.e.
(within psql):

grass=> select * from espvie where e_=81.58;
grands_sec | pop | icm | e_ | cat | e2
- ------------+-----+-----+----+-----+----
(0 lignes)

grass=> select * from espvie where e_='81.58';
grands_sec | pop | icm | e_ | cat | e2
- ------------+-------+----------+-------+-----+-------
23103 | 13600 | 0.722814 | 81.58 | 51 | 81.58
(1 ligne)

Using this solution would mean changing the lines constructing the
sqlwhere variable from

sqlwhere="$GIS_OPT_column $mincomparison $rangemin and
$GIS_OPT_column<=$rangemax"

to

sqlwhere="$GIS_OPT_column $mincomparison '$rangemin' and
$GIS_OPT_column<='$rangemax'"

But I don't know what the effect of this would be on other drivers
(such as dbf). And don't have the time to test this just now.

Try it. DBF driver can be adjusted to deal with that, since expression
calculations perform type conversion. I'd say it works currently. The
problem is that there is no guarantee that C's conversion will produce the
proper value giving the expected result. We might try to be extremely smart
and detect an equality testing for reals and then use an approximate
equality. Seems dubious, but some packages actually do that, knowinf what
the machine epsilon is.

2) use the 'numeric' data type:

grass=> \d espvie
               Table «public.espvie»
  Colonne | Type | Modificateurs
- ------------+----------------------+---------------
grands_sec | character varying(8) |
pop | real |
icm | real |
e_ | real |
cat | integer |
e2 | numeric |

grass=> select * from espvie where e2=81.58;
grands_sec | pop | icm | e_ | cat | e2
- ------------+-------+----------+-------+-----+-------
23103 | 13600 | 0.722814 | 81.58 | 51 | 81.58
(1 ligne)

This solution would imply a simple warning in the man page.

I´d say a warning is in order. In the sql notes, preferably.

This problem is obviously wider than just d.vect.thematic. d.vect is
affected by it as well, and I imagine any command using a where
clause...

Any suggestions ?

Moritz

thus the only possible rewrite for 'a=b' would be:

'a/b < 1+1e-10 and b/a < 1+1e-10'

which might run into division by zero errors and so on.

note for floating points, divide by zero just gives a value of "nan".
(not a number)

It is only with integers that it causes an error.

Hamish

Hamish wrote:

> thus the only possible rewrite for 'a=b' would be:
>
> 'a/b < 1+1e-10 and b/a < 1+1e-10'
>
> which might run into division by zero errors and so on.

note for floating points, divide by zero just gives a value of "nan".
(not a number)

Dividing zero by zero gives NaN; dividing non-zero values by zero
gives infinity.

--
Glynn Clements <glynn@gclements.plus.com>