[GRASS-dev] [GRASS GIS] #198: v.in.ascii: column scanning is borked

#198: v.in.ascii: column scanning is borked
------------------------+---------------------------------------------------
Reporter: hamish | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: critical | Milestone: 6.4.0
Component: Vector | Version: svn-develbranch6
Keywords: v.in.ascii | Platform: All
      Cpu: All |
------------------------+---------------------------------------------------
Hi,

This bug is related to the old RT bugs 2763 and 5209:
   http://intevation.de/rt/webrt?serial_num=2763
   http://intevation.de/rt/webrt?serial_num=5209

and the clumsy empty last-column work-around in v.in.gpsbabel:
http://trac.osgeo.org/grass/browser/grass/trunk/scripts/v.in.gpsbabel/v.in.gpsbabel#L298
  "FIXME: if last field (comments) is empty it causes a not-enough fields error in v.in.ascii"

The column type scanning step in v.in.ascii's points mode no longer
accepts empty fields as NULL values, and columns are missing from the
imported table. Note that passing empty values in double columns works
in GRASS 6.2.3!

It would be nice to allow empty or 'NULL' values in numeric columns to
denote a missing value, and to allow "nan" or "inf" without the scanning
function deciding that the column contains strings.
(For varchar columns, however, the word 'NULL' should not be stripped.)
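
A minimal sketch of the scanning rules asked for above, written as a shell function around awk. It is hypothetical: `scan_column_types` is not part of GRASS and is not how v.in.ascii's C scanner works. Assumptions: an empty field or the literal `NULL` counts as a missing value that fits any type, `nan`/`inf`/`infinity` (any case) count as double, and each column takes the widest type seen (null < integer < double < string). A column that is empty in every row is reported as `null`, leaving its SQL type to the `columns=` declaration.

```shell
# Hypothetical sketch of the requested type scan; not v.in.ascii's code.
# Usage: scan_column_types SEPARATOR SKIP_LINES FILE
scan_column_types() {
    awk -F "$1" -v skip="$2" '
    NR <= skip { next }
    {
        for (i = 1; i <= NF; i++) {
            v = $i
            lv = tolower(v)
            if (v == "" || v == "NULL")
                t = 0                    # missing value: fits any column type
            else if (v ~ /^[+-]?[0-9]+$/)
                t = 1                    # integer
            else if (v ~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$/ ||
                     lv ~ /^[+-]?(nan|inf|infinity)$/)
                t = 2                    # double, including nan/inf
            else
                t = 3                    # anything else is a string
            # a column keeps the widest type seen so far
            if (t > type[i]) type[i] = t
            if (i > ncol) ncol = i
        }
    }
    END {
        split("null integer double string", name, " ")
        for (i = 1; i <= ncol; i++)
            printf "Column: %d type: %s\n", i, name[type[i] + 1]
    }' "$3"
}
```

For example, `scan_column_types '|' 1 test.dat` would scan a file like the one below while skipping its header line. The sketch only decides column types; whether a literal `NULL` in a varchar column is kept as text is a separate decision at import time.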

Input file:
{{{
cat << EOF > test.dat
cat|x|y|name|value|count
1|2.3|4.5|Foo|3.1415|4
2|2.4|4.6|Bar|||
EOF
}}}

Import without column declaration:
{{{
G64svn> v.in.ascii in=test.dat out=test_null_import skip=1 \
            cat=1 x=2 y=3 --verbose

Scanning input for column types...
Maximum input row length: 25
Maximum number of columns: 6
Minimum number of columns: 6
Column: 1 type: integer
Column: 2 type: double
Column: 3 type: double
Column: 4 type: string length: 3
Column: 5 type: string length: 0
Column: 6 type: string length: 0
Importing points...
Populating table...
Building topology for vector map <test_null_import>...
2 primitives registered
Building areas: 100%
0 areas built
0 isles built
Attaching islands:
Attaching centroids: 100%
Topology was built
Number of nodes : 2
Number of primitives: 2
Number of points : 2
Number of lines : 0
Number of boundaries: 0
Number of centroids : 0
Number of areas : 0
Number of isles : 0
v.in.ascii complete.

G64svn> v.info -c test_null_import
Displaying column types/names for database connection of layer 1:
INTEGER|int_1
DOUBLE PRECISION|dbl_1
DOUBLE PRECISION|dbl_2
CHARACTER|str_1
}}}

  * What happened to columns 5 and 6?

{{{
  Column: 5 type: string length: 0
  Column: 6 type: string length: 0
}}}
  * Columns 5 and 6 are incorrectly scanned as (empty) "string" type.

Also, I am not sure if hiding the column scanning result behind --verbose
mode is advisable, given that it is buggy and it is the first line of
defense when the input file contains typos.

Import with column declaration:
{{{
G64svn> v.in.ascii in=test.dat out=test_null_import skip=1 \
           cat=1 x=2 y=3 --verbose \
           columns='cat int, x double, y double, name varchar(10), value double, count int'

Scanning input for column types...
Maximum input row length: 25
Maximum number of columns: 6
Minimum number of columns: 6
Column: 1 type: integer
Column: 2 type: double
Column: 3 type: double
Column: 4 type: string length: 3
Column: 5 type: string length: 0
Column: 6 type: string length: 0
WARNING: Table <test_null_import> linked to vector map <test_null_import>
          does not exist
ERROR: Column number 5 defined as double has string values
}}}

  * In addition to the previous errors, the meaning of the "table does not
exist" warning is a mystery.

Changing the empty {{{"||"}}} to "|NULL|" doesn't help: the scanning step
declares it a string column (length: 4) and refuses to continue.

This is important code, so tread with the greatest care...

Hamish

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/198>
GRASS GIS <http://grass.osgeo.org>

#198: v.in.ascii: column scanning is borked
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.0
Component: Vector | Version: svn-develbranch6
Resolution: | Keywords: v.in.ascii
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Comment (by mmetz):

Try the attached patch for the missing values problem. NULL, nan, and inf
are still not recognized. There is also still a spurious warning for
completely empty columns declared as double, but the import succeeds.

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:1>

#198: v.in.ascii: column scanning is borked
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.1
Component: Vector | Version: svn-develbranch6
Resolution: | Keywords: v.in.ascii
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Changes (by hamish):

  * milestone: 6.4.0 => 6.4.1

Comment:

patch applied in 6.5 and 7; looks like it's fine but deferring backport to
relbr64 until 6.4.1 to allow more testing.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:2>

#198: v.in.ascii: column scanning is borked
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.1
Component: Vector | Version: svn-develbranch6
Resolution: | Keywords: v.in.ascii
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Changes (by hamish):

  * milestone: => 6.4.1

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:4>

#198: v.in.ascii: column scanning is borked
------------------------+---------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
     Type: defect | Status: new
Priority: critical | Milestone: 6.4.1
Component: Vector | Version: svn-develbranch6
Keywords: v.in.ascii | Platform: All
      Cpu: All |
------------------------+---------------------------------------------------
Changes (by martinl):

* cc: martinl (added)

Comment:

Replying to [comment:2 hamish]:
> patch applied in 6.5 and 7; looks like it's fine but deferring backport
> to relbr64 until 6.4.1 to allow more testing.

it's already in relbr64. So can we close the ticket?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/198#comment:5>

#198: v.in.ascii: column scanning is borked
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@…
      Type: defect | Status: closed
  Priority: critical | Milestone: 6.4.1
Component: Vector | Version: svn-develbranch6
Resolution: fixed | Keywords: v.in.ascii
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Changes (by martinl):

  * status: new => closed
  * resolution: => fixed

Comment:

Replying to [comment:5 martinl]:
> Replying to [comment:2 hamish]:
> > patch applied in 6.5 and 7; looks like it's fine but deferring
> > backport to relbr64 until 6.4.1 to allow more testing.
>
> it's already in relbr64. So can we close the ticket?

Closing for now.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/198#comment:6>

#198: v.in.ascii: column scanning is borked
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@…
      Type: defect | Status: reopened
  Priority: critical | Milestone: 6.4.2
Component: Vector | Version: svn-develbranch6
Resolution: | Keywords: v.in.ascii
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Changes (by mmetz):

  * status: closed => reopened
  * resolution: fixed =>
  * milestone: 6.4.1 => 6.4.2

Comment:

Still not working. Test data are LiDAR laz data available here

http://liblas.org/samples/

The file I used is srs.laz

The commands

{{{
las2txt -i srs.laz -o srs.ascii --parse xyztiaunrcCpedRGB --delimiter "|"

# check ascii file
head srs.ascii
289814.15|4320978.61|170.76|499450.80599405|260|||6|0|2|Ground|0|0|0|0|0|0
289814.64|4320978.84|170.76|499450.80600805|280|||6|0|2|Ground|0|0|0|0|0|0
289815.12|4320979.06|170.75|499450.80602205|280|||6|0|2|Ground|0|0|0|0|0|0

# import in GRASS
las2txt -i srs.laz --stdout --parse xyztiaunrcCpedRGB --delimiter "|" |
v.in.ascii in=- out=srs_ascii -z x=1 y=2 z=3 --o

# only the first 5 columns were imported

# check table contents
v.db.select srs_ascii where="cat = 1"
cat|dbl_1|dbl_2|dbl_3|dbl_4|int_1
1|289814.15|4320978.61|170.76|499450.80599405|260
}}}

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:7>

#198: v.in.ascii: column scanning is borked
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@…
      Type: defect | Status: closed
  Priority: critical | Milestone: 6.4.2
Component: Vector | Version: svn-develbranch6
Resolution: invalid | Keywords: v.in.ascii
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Changes (by mmetz):

  * status: reopened => closed
  * resolution: => invalid

Comment:

Replying to [comment:7 mmetz]:
> Still not working. Test data are LiDAR laz data available here
>
> http://liblas.org/samples/
>
> The file I used is srs.laz
>
> [snip]
>
> # only the first 5 columns were imported
>
It's not v.in.ascii, it's G_getl2() that fails to fetch the whole line,
probably because of some obscure encoding of the output of las2txt which I
am not able to figure out, or las2txt writes weird characters for empty
fields.

Closing as invalid.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:8>

#198: v.in.ascii: column scanning is borked
Comment(by hamish):

Replying to [comment:8 mmetz]:
> It's not v.in.ascii, it's G_getl2() that fails to fetch the
> whole line, probably because of some obscure encoding of the
> output of las2txt which I am not able to figure out, or
> las2txt writes weird characters for empty fields.

Hi,

instead of piping to v.in.ascii can you save to a file which we can have a
peek at in hexdump? what version of las2txt? does the same happen with
the snake lidar sample data from the grass wiki lidar page or just this
dataset?

(If you found it, others probably will too.)
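
A minimal sketch of the check asked for here, assuming POSIX `od -c` as a stand-in for hexdump: unlike a terminal, `od -c` prints a NUL byte as `\0`, so it cannot masquerade as an empty field between two delimiters. `dump_lines` is a hypothetical helper; the `las2txt` options in the commented example are copied from comment 7.

```shell
# Hypothetical helper: show the raw characters of the first N lines of a file
# (default 1). od -c prints NUL bytes as \0, so they are visible.
dump_lines() {
    head -n "${2:-1}" "$1" | od -c
}

# Example (commented out: needs las2txt and srs.laz):
# las2txt -i srs.laz -o srs.ascii --parse xyztiaunrcCpedRGB --delimiter "|"
# dump_lines srs.ascii 1
```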

Hamish

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/198#comment:9>

#198: v.in.ascii: column scanning is borked
Comment(by mmetz):

Replying to [comment:9 hamish]:
> Replying to [comment:8 mmetz]:
> > It's not v.in.ascii, it's G_getl2() that fails to fetch the
> > whole line, probably because of some obscure encoding of the
> > output of las2txt which I am not able to figure out, or
> > las2txt writes weird characters for empty fields.
>
> Hi,
>
> instead of piping to v.in.ascii can you save to a file which we can have
> a peek at in hexdump? what version of las2txt? does the same happen with
> the snake lidar sample data from the grass wiki lidar page or just this
> dataset?
>
las2txt version: libLAS 1.6.1 with GeoTIFF 1.3.0 GDAL 1.8.0 LASzip 1.2.0

The same happens with "Serpent Mound Model LAS Data.las" from the grass
wiki lidar page.

Attached is the las2txt output for srs.laz.

I am pretty sure this problem is caused by las2txt which does not check if
a given attribute exists. If it does not exist, some weird value is
written.

Markus M

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/198#comment:10>

#198: v.in.ascii: column scanning is borked
Comment(by hamish):

Replying to [comment:10 mmetz]:
> Attached is the las2txt output for srs.laz.
>
> I am pretty sure this problem is caused by las2txt which does
> not check if a given attribute exists. If it does not exist,
> some weird value is written.

correct. columns 6 and 7 are not empty.

as viewed in `less`:

{{{
289814.15|4320978.61|170.76|499450.80599405|260|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289814.64|4320978.84|170.76|499450.80600805|280|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289815.12|4320979.06|170.75|499450.80602205|280|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289815.60|4320979.28|170.74|499450.80603605|280|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289816.08|4320979.50|170.68|499450.80605005|260|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289816.56|4320979.71|170.66|499450.80606405|240|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289817.03|4320979.92|170.63|499450.80607806|240|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289817.53|4320980.16|170.62|499450.80609206|280|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289818.01|4320980.38|170.61|499450.80610606|280|^@|^@|6|0|2|Ground|0|0|0|0|0|0
289818.50|4320980.59|170.58|499450.80612006|260|^@|^@|6|0|2|Ground|0|0|0|0|0|0
}}}

`^@` means the null char.

I think it is reasonable for G_getl2() to stop on null terminators, and
there's nothing more to do here but file a bug with `las2txt`.
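
A hedged workaround sketch for files that already contain NUL bytes (not something proposed in this ticket): count the NULs, then strip them with `tr` so that `|^@|` becomes `||`. This assumes that v.in.ascii, with the patch from comment 1, then treats the resulting empty fields as missing values. `count_nul_bytes` and `strip_nul_bytes` are hypothetical helpers.

```shell
# Hypothetical helpers, not part of GRASS or libLAS.

# Count the NUL bytes in a file: tr -cd deletes every byte that is NOT \000,
# wc counts what is left, and the last tr drops the padding some wc add.
count_nul_bytes() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Copy stdin to stdout with every NUL byte removed, so "|^@|" becomes "||".
strip_nul_bytes() {
    tr -d '\000'
}

# Example (commented out: needs GRASS and the file from comment 7):
# count_nul_bytes srs.ascii
# strip_nul_bytes < srs.ascii | v.in.ascii in=- out=srs_ascii -z x=1 y=2 z=3 --o
```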

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:11>

#198: v.in.ascii: column scanning is borked
Comment(by mmetz):

Replying to [comment:11 hamish]:
> Replying to [comment:10 mmetz]:
> > Attached is the las2txt output for srs.laz.
> >
> > I am pretty sure this problem is caused by las2txt which does
> > not check if a given attribute exists. If it does not exist,
> > some weird value is written.
>
> correct. columns 6 and 7 are not empty.
[snip]
>
> I think it is reasonable for G_getl2() to stop on null terminators, and
> there's nothing more to do here but file a bug with `las2txt`.
>
I would rather call this a user error on my part: the proper way would have
been to inspect the .la[s|z] file with `lasinfo` first, decide which
attributes to import based on those available, and then set the --parse
option accordingly. Or use v.in.lidar, which does it all automatically ;-)
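
Following this advice, a minimal sketch under the assumption that `lasinfo` has already shown which attributes are missing: drop those letters from the `--parse` string before calling `las2txt`. Here the dropped letters are `a` and `u`, the 6th and 7th letters of `xyztiaunrcCpedRGB`, i.e. the columns that came out as NUL bytes above; for other files the letters to drop will differ. `parse_without` and `import_las` are hypothetical helpers, and the `las2txt`/`v.in.ascii` options are the ones used earlier in this ticket.

```shell
# Hypothetical helper: remove the given field letters from a las2txt --parse
# string. tr -d is case-sensitive, so dropping "c" would not touch "C".
parse_without() {
    printf '%s\n' "$1" | tr -d "$2"
}

# Hypothetical wrapper: $1 = input .las/.laz, $2 = letters to drop,
# $3 = output vector map. x, y, z stay in columns 1-3 as long as
# none of x, y, z is dropped.
import_las() {
    las2txt -i "$1" --stdout --parse "$(parse_without xyztiaunrcCpedRGB "$2")" \
        --delimiter "|" |
        v.in.ascii in=- out="$3" -z x=1 y=2 z=3 --o
}

# Example (commented out: needs libLAS and GRASS):
# import_las srs.laz au srs_ascii
```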

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/198#comment:12>