[GRASS-dev] [GRASS GIS] #3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
---------------------+-------------------------
Reporter: mikatt | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Keywords: | CPU: x86-64
Platform: Linux |
---------------------+-------------------------
GRASS 7.2.1 (on Linux Mint 18) produces an error when trying to read in
ESRI shapefiles that have empty cells. This used to work fine until
recently, and I have no problems with GRASS 7.2.0 on Windows 7.

Here is a minimal example including messages:

Create a point vector from csv (note the empty 'val' cell in the last
line):
{{{
v.in.ascii --overwrite input=points.csv output=points_imp_g
separator=comma skip=1 columns=num integer,X double precision,Y double
precision,val double precision x=2 y=3

v.db.select map=points_imp_g
cat|num|X|Y|val
1|1|2487491.643|5112118.33|1.5
2|2|2481985.459|5109162.78|2.3
3|3|2478284.289|5105331.04|
}}}

Export to ESRI shape (works fine):
{{{
v.out.ogr input=points_imp_g output=points_exp_g format=ESRI_Shapefile
output_layer=points_exp_g

Exporting 3 features...
v.out.ogr complete. 3 features (Point type) written to <points_exp_g>
(ESRI_Shapefile format).
}}}

Import the created shape fails:
{{{
v.import input=points_exp_g/points_exp_g.shp layer=points_exp_g
output=points_imp_g2

WARNING: All available OGR layers will be imported into vector map
<points_exp_g>
Check if OGR layer <points_exp_g> contains polygons...
Column name <cat> renamed to <cat_>
Importing 3 features (OGR layer <points_exp_g>)...
DBMI-SQLite driver error:
Error in sqlite3_prepare():
near ")": syntax error
DBMI-SQLite driver error:
Error in sqlite3_prepare():
near ")": syntax error
ERROR: Cannot insert new row: insert into points_imp_g2 values ( 3, 3, 3,
2478284.288999999873340, 5105331.040000000037253, )
ERROR: Unable to import <points_exp_g/points_exp_g.shp>
}}}

Have there been substantial changes to v.in.ogr or could this be a
RGDAL/OGR issue?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by mmetz):

Replying to [ticket:3407 mikatt]:
> GRASS 7.2.1 (on Linux Mint 18) produces an error when trying to read in
ESRI shapefiles that have empty cells. This used to work fine until
recently, and I have no problems with GRASS 7.2.0 on Windows 7.

What are the GDAL/OGR versions?
>
> Here is a minimal example including messages:
>
> [...]

Thanks for the example, I could reproduce the error.
>
> Have there been substantial changes to v.in.ogr or could this be a
RGDAL/OGR issue?
>
This is a GDAL/OGR issue. As of GDAL 2.2, `OGR_F_IsFieldSet()` apparently
returns true even if the field is empty, and `OGR_F_GetFieldAsString()`
returns a zero-length string (not NULL) for an empty field.

v.in.ogr has been fixed in r71431,2 (trunk, relbr72) to account for the
new behaviour of GDAL 2.2, supporting older versions of GDAL.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:1&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by rouault):

> This is a GDAL/OGR issue.

Actually it is a feature / per design. See
https://svn.osgeo.org/gdal/branches/2.2/gdal/MIGRATION_GUIDE.TXT about RFC
67 : ttps://trac.osgeo.org/gdal/wiki/rfc67_nullfieldvalues

> v.in.ogr has been fixed in r71431,2 (trunk, relbr72) to account for the
new behaviour of GDAL 2.2, supporting older versions of GDAL.

The fix is not ideal as it considers an empty string as NULL. I'd suggest
instead using OGR_F_IsFieldSetAndNotNull() when GDAL_VERSION_NUM >=
2020000 instead of OGR_F_IsFieldSet()

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:2&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by mmetz):

Replying to [comment:2 rouault]:
> > This is a GDAL/OGR issue.
>
> Actually it is a feature / per design. See
https://svn.osgeo.org/gdal/branches/2.2/gdal/MIGRATION_GUIDE.TXT about RFC
67 : https://trac.osgeo.org/gdal/wiki/rfc67_nullfieldvalues
>
> > v.in.ogr has been fixed in r71431,2 (trunk, relbr72) to account for
the new behaviour of GDAL 2.2, supporting older versions of GDAL.
>
> The fix is not ideal as it considers an empty string as NULL.

v.in.ogr translates attributes from OGR to the GRASS default database
driver, and (at least for sqlite), empty strings are not allowed, but need
to be replaced with something else. Therefore I decided to set empty
strings to "NULL". Testing the imported vector with v.db.select yields the
expected result, i.e. an empty field as in the original csv file.

Do you have another suggestion about how to replace an empty string?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:3&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by rouault):

I'm just speaking about the OGR side of things. With GDAL 2.2, you have 3
potential states for a field:
- unset -> OGR_F_IsFieldSet() == FALSE. OGR_F_GetFieldAsString() will
return an empty string (it is not allowed to return NULL)
- set, but NULL -> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull() ==
TRUE. OGR_F_GetFieldAsString() will also eturn an empty string
- set, and not NULL --> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull()
== FALSE. An empty string is considered as a not-null set field, so a
potential return of OGR_F_GetFieldAsString()

Depending on drivers, they will decide whether to dispatch a field as
unset, set or set with empty string.
The shapefile driver as of GDAL 2.2 will set all fields, either to NULL,
or to a not-NULL set value. Note: for strings, there's no way in .dbf to
distinguish NULL from empty string, so an empty string is mapped to a NULL
field (in previous GDAL versions, this was mapped to a unset field)
The CSV driver has traditionnally mapped an empty field as a an empty
string, and this hasn't changed in GDAL 2.2

Hope that clarifies things. Ah nullness and emptyness....

So perhaps given GRASS constraints and concepts, testing the return of
OGR_F_GetFieldAsString() against the empty string is good enough if you
don't need to distinguish between the 3 above states.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:4&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by mmetz):

Replying to [comment:4 rouault]:
> I'm just speaking about the OGR side of things. With GDAL 2.2, you have
3 potential states for a field:
> - unset -> OGR_F_IsFieldSet() == FALSE. OGR_F_GetFieldAsString() will
return an empty string (it is not allowed to return NULL)
> - set, but NULL -> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull() ==
TRUE. OGR_F_GetFieldAsString() will also eturn an empty string
> - set, and not NULL --> OGR_F_IsFieldSet() == TRUE and
OGR_F_IsFieldNull() == FALSE. An empty string is considered as a not-null
set field, so a potential return of OGR_F_GetFieldAsString()
>
> Depending on drivers, they will decide whether to dispatch a field as
unset, set or set with empty string.
> The shapefile driver as of GDAL 2.2 will set all fields, either to NULL,
or to a not-NULL set value. Note: for strings, there's no way in .dbf to
distinguish NULL from empty string, so an empty string is mapped to a NULL
field (in previous GDAL versions, this was mapped to a unset field)
> The CSV driver has traditionnally mapped an empty field as a an empty
string, and this hasn't changed in GDAL 2.2
>
> Hope that clarifies things. Ah nullness and emptyness....
>
> So perhaps given GRASS constraints and concepts,

Constraints and concepts of dbf/sqlite/postgresql/mysql, to be precise.

As you explained, the OGR dbf driver, which is also used internally in
GRASS for dbf databases, does not distinguish between unset and NULL, thus
import with dbf as GRASS database should work as before.

Regarding import into GRASS, there was no need to distinguish between the
3 possible states, at least there are no known problems.

> testing the return of OGR_F_GetFieldAsString() against the empty string
is good enough if you don't need to distinguish between the 3 above
states.

Regarding export from GRASS to OGR, what happens if a field is not
explicitly set or unset? Is it now unset or set, but NULL? Do we need to
adjust v.out.ogr using `OGR_F_SetFieldNull()` and/or `OGR_F_UnsetField()`
instead of not touching the field at all?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:5&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by rouault):

> Regarding export from GRASS to OGR, what happens if a field is not
explicitly set or unset? Is it now unset or set, but NULL? Do we need to
adjust v.out.ogr using `OGR_F_SetFieldNull()` and/or `OGR_F_UnsetField()`
instead of not touching the field at all?

By default and by definition, a field that is not set ... is unset :slight_smile:
This is no different in GDAL 2.2 than in previous versions. So depending
on drivers, they might react differently on writing if a field is unset or
if it set to an empty string. For CSV and Shapefile, this will result in
the same file. For GeoJSON, a unset field doesn't appear at all in the
result, whereas an empty string is written as '"field" : "" '

With GDAL 2.2, you can also set a field to NULL. The behaviour will depend
on drivers. For CSV and Shapefile, NULL will be dealt as unset or empty
string. For GeoJSON, it will be written as ' "field": null '.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:6&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by mmetz):

Replying to [comment:6 rouault]:
> > Regarding export from GRASS to OGR, what happens if a field is not
explicitly set or unset? Is it now unset or set, but NULL? Do we need to
adjust v.out.ogr using `OGR_F_SetFieldNull()` and/or `OGR_F_UnsetField()`
instead of not touching the field at all?
>
> By default and by definition, a field that is not set ... is unset :slight_smile:
This is no different in GDAL 2.2 than in previous versions. So depending
on drivers, they might react differently on writing if a field is unset or
if it set to an empty string. For CSV and Shapefile, this will result in
the same file. For GeoJSON, a unset field doesn't appear at all in the
result, whereas an empty string is written as '"field" : "" '

Ok, a field that is not set is unset or set, but NULL, depending on the
driver :wink:

>
> With GDAL 2.2, you can also set a field to NULL. The behaviour will
depend on drivers. For CSV and Shapefile, NULL will be dealt as unset or
empty string. For GeoJSON, it will be written as ' "field": null '.

From [https://svn.osgeo.org/gdal/branches/2.2/gdal/MIGRATION_GUIDE.TXT\]:

"before, a unset field was written as field_name: null. Starting with GDAL
2.2, only fields explicitly set as null with OGR_F_SetFieldNull() will be
written with a null value."

In order to preserve backwards compatibility for other software packages,
GRASS will need to use OGR_F_SetFieldNull() starting with GDAL 2.2 if data
export should not change.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:7&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by rouault):

> In order to preserve backwards compatibility for other software
packages, GRASS will need to use OGR_F_SetFieldNull() starting with GDAL
2.2 if data export should not change.

Yes that only affects GeoJSON based drivers and GML, that are the only
ones that can translate differently unset and NULL in the format. Other
drivers will translate in the same way a unset field or a null field.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:8&gt;
GRASS GIS <https://grass.osgeo.org>

#3407: v.in.ogr fails to import ESRI Shapefile with empty attribute table cells
----------------------+-------------------------
  Reporter: mikatt | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone:
Component: Default | Version: 7.2.1
Resolution: | Keywords:
       CPU: x86-64 | Platform: Linux
----------------------+-------------------------

Comment (by mmetz):

Replying to [comment:8 rouault]:
> > In order to preserve backwards compatibility for other software
packages, GRASS will need to use OGR_F_SetFieldNull() starting with GDAL
2.2 if data export should not change.
>
> Yes that only affects GeoJSON based drivers and GML, that are the only
ones that can translate differently unset and NULL in the format. Other
drivers will translate in the same way a unset field or a null field.

OK, export from GRASS to OGR (v.out.ogr) has been adjusted in r71433,4
(trunk, relbr72) such that exported data are the same independent of the
GDAL version, important for compatibility with other software not (yet)
using GDAL 2.2.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3407#comment:9&gt;
GRASS GIS <https://grass.osgeo.org>