[GRASS-dev] [GRASS GIS] #39: r.in.srtm fails to validate zip files

#39: r.in.srtm fails to validate zip files
-----------------------+----------------------------------------------------
Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Keywords: |
-----------------------+----------------------------------------------------
r.in.srtm uses "file -ib" on the zip file to check if it has the correct
mime type. "-i" seems to be a very ambiguous way to specify the mime
flag. The BSD version of file (at least on OSX) uses "-i" for the "do no
classify" regular files option, and "-I" (uppercase) for the mime option,
though has "-i" for mime listed as a legacy option. GNU file and old UNIX
versions use "-i" for the mime option. SUS does not have mime as a
minimum option to support.

It looks like all have the extended "--mime" flag. Attached is a patch
to use that flag. There should probably be a way to handle the
possibility of a system having a "file" command without any mime flag, but
I don't know how to handle that, and it may not be critical.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by hamish):

could 'unzip -t' help?

what does the IEEE standard say about the -i `file` option?

Hamish

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by kyngchaos):

I don't know why it was changed ([19303]), but it used to try unzipping
immediately, and testing whether that failed. "unzip -t" tests by
unzipping anyways before the real unzipping, thus wasting time (though it
is only in memory).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by kyngchaos):

Interesting - I was wondering why I didn't notice this before, I know I've
converted SRTM since that change. I totally forgot to check older OSX
versions. OSX 10.5 has "-I" as the mime flag. OSX 10.4, and probably
previous versions, has "-i" as the mime flag.

> what does the IEEE standard say about the -i file option?

Dunno, but the SUS standard (which OSX 10.5 now conforms to) has "-i" as
the "do not classify regular files" option.
[http://en.wikipedia.org/wiki/File_(Unix)#Usage]

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by kyngchaos):

I guess my main concerns are:

  * the "--mime" flag is standard on modern systems - it looks like it.
  * I've noticed that people tend to avoid the double-dash long versions of
flags, but I don't know if there is a technical reason other than less
typing. In this case it avoids confusion in the -i/-I short flag.

If there are no objections to "--mime", I'll patch r.in.srtm.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

GRASS GIS wrote:

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by kyngchaos):

I guess my main concerns are:

  * the "--mime" flag is standard on modern systems - it looks like it.
  * I've noticed that people tend to avoid the double-dash long versions of
flags, but I don't know if there is a technical reason other than less
typing. In this case it avoids confusion in the -i/-I short flag.

If there are no objections to "--mime", I'll patch r.in.srtm.

I'd suggest skipping the use of "file" altogether. It isn't a standard
utility on Windows.

I would assume that the filename ending in .zip is at least prima
facie evidence of the file actually being a ZIP file.

--
Glynn Clements <glynn@gclements.plus.com>

Makes sense. I wonder why it was changed in the past? It used to try unzipping and check if it failed. Markus?

http://trac.osgeo.org/grass/changeset/19303

On Feb 9, 2008, at 3:25 PM, Glynn Clements wrote:

GRASS GIS wrote:

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
Reporter: kyngchaos | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by kyngchaos):

I guess my main concerns are:

* the "--mime" flag is standard on modern systems - it looks like it.
* I've noticed that people tend to avoid the double-dash long versions of
flags, but I don't know if there is a technical reason other than less
typing. In this case it avoids confusion in the -i/-I short flag.

If there are no objections to "--mime", I'll patch r.in.srtm.

I'd suggest skipping the use of "file" altogether. It isn't a standard
utility on Windows.

I would assume that the filename ending in .zip is at least prima
facie evidence of the file actually being a ZIP file.

--
Glynn Clements <glynn@gclements.plus.com>
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

-----
William Kyngesburye <kyngchaos*at*kyngchaos*dot*com>
http://www.kyngchaos.com/

All generalizations are dangerous, even this one.

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: hamish
      Type: defect | Status: assigned
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Changes (by hamish):

  * status: new => assigned
  * owner: grass-dev@lists.osgeo.org => hamish
* cc: grass-dev@lists.osgeo.org (added)

Comment:

SUS says:
   http://www.opengroup.org/onlinepubs/009695399/utilities/file.html

--mime is not mentioned.

re "why was it changed" in r19303: it never used `zip` to test, it was
using `ls` to look for a file with a .zip extension. (input= refers to the
tile name not the file name)

{{{
-ls -1 "$FILE.hgt.zip" | grep zip > /dev/null
-if [ $? -ne 0 ] ; then
+# really a ZIP file?
+if [ "`file -ib $FILE.hgt.zip`" != "application/x-zip" ] ; then
    echo "$FILE.hgt.zip is no zip file."
    exit 1
  fi
}}}

for one thing "echo $foo.zip | grep zip" was always true.

Perhaps the reason for the test had to do with downloaders creating
incomplete or 0 byte files, or having wrongly named files?

I've removed `file` and added a call to `unzip -t` in 6.3 trunk.

waiting to backport to 6.3.0's branch and close the bug until someone
tests the change please.

Hamish

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: hamish
      Type: defect | Status: assigned
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Comment (by hamish):

see r30046

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:6&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/

GRASS GIS wrote:

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: hamish
      Type: defect | Status: assigned
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: | Keywords:
------------------------+---------------------------------------------------
Changes (by hamish):

  * status: new => assigned
  * owner: grass-dev@lists.osgeo.org => hamish
* cc: grass-dev@lists.osgeo.org (added)

Comment:

SUS says:
   http://www.opengroup.org/onlinepubs/009695399/utilities/file.html

--mime is not mentioned.

re "why was it changed" in r19303: it never used `zip` to test, it was
using `ls` to look for a file with a .zip extension. (input= refers to the
tile name not the file name)

{{{
-ls -1 "$FILE.hgt.zip" | grep zip > /dev/null
-if [ $? -ne 0 ] ; then
+# really a ZIP file?
+if [ "`file -ib $FILE.hgt.zip`" != "application/x-zip" ] ; then
    echo "$FILE.hgt.zip is no zip file."
    exit 1
  fi
}}}

for one thing "echo $foo.zip | grep zip" was always true.

Perhaps the reason for the test had to do with downloaders creating
incomplete or 0 byte files, or having wrongly named files?

If incomplete downloads are an issue, file probably won't help if you
manage to download anything at all, as it normally only checks the
first few bytes of the file. My /usr/share/misc/file/magic file has:

# ZIP archives (Greg Roelofs, c/o zip-bugs@wkuvx1.wku.edu)
0 string PK\003\004

4 byte 0x09 Zip archive data, at least v0.9 to extract
4 byte 0x0a Zip archive data, at least v1.0 to extract
4 byte 0x0b Zip archive data, at least v1.1 to extract
4 byte 0x14

30 ubelong !0x6d696d65 Zip archive data, at least v2.0 to extract

0x161 string WINZIP Zip archive data, WinZIP self-extracting

This suggests that the first 8 bytes are sufficient in most cases, 34
bytes for v2.0, or 360 bytes for a self-extracting archive.

I've removed `file` and added a call to `unzip -t` in 6.3 trunk.

waiting to backport to 6.3.0's branch and close the bug until someone
tests the change please.

"unzip -vb" would probably be enough to catch aborted downloads. ZIP
files have the catalog at the end of the file, so a truncated file
would normally lack a catalog.

--
Glynn Clements <glynn@gclements.plus.com>

> #39: r.in.srtm fails to validate zip files

Hamish:

> re "why was it changed" in r19303: it never used `zip` to test, it
> was using `ls` to look for a file with a .zip extension. (input=
> refers to the tile name not the file name)

....

> for one thing "echo $foo.zip | grep zip" was always true.

oops, wrong. that was used as a 'if [ -f ${FILE}.zip ] ;' test.
Not like echo as the file wasn't guaranteed to be there.

> Perhaps the reason for the test had to do with downloaders
> creating incomplete or 0 byte files, or having wrongly named files?

No, the module built the input filename internally, and was checking
that it came up with a correct filename. Getting mime type from `file`
was just a more advanced form of that.

Glynn:

If incomplete downloads are an issue,

probably not. It will exit with an error if the unzip step fails.

> I've removed `file` and added a call to `unzip -t` in 6.3 trunk.

and I think that should be fine.

> waiting to backport to 6.3.0's branch and close the bug until
> someone tests the change please.

....

Hamish

      ____________________________________________________________________________________
Be a better friend, newshound, and
know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

#39: r.in.srtm fails to validate zip files
------------------------+---------------------------------------------------
  Reporter: kyngchaos | Owner: hamish
      Type: defect | Status: closed
  Priority: major | Milestone: 6.3.0
Component: default | Version: unspecified
Resolution: fixed | Keywords:
------------------------+---------------------------------------------------
Changes (by hamish):

  * status: assigned => closed
  * resolution: => fixed

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/39#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>
GRASS Geographic Information System (GRASS GIS) - http://grass.osgeo.org/