[Geoserver-devel] random idea: hash checksum

Hi All,
As a non-developer lurker on this list, I thought I’d mention an idea that occurred to me as a potentially useful GeoServer feature: a hash checksum service for services like WFS, a sort of vendor extension to WFS for data verification. So if this is my WFS request:

http://localhost:8080/geoserver/metroparks/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=metroparks:cuyahoga_hydro_polygon&outputFormat=SHAPE-ZIP

then the checksummed version might look like this:

http://localhost:8080/geoserver/metroparks/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=metroparks:cuyahoga_hydro_polygon&outputFormat=SHAPE-ZIP-HASH-SHA512

I could see some logistical problems with this: WFS output is typically streamed, and I don’t know whether that makes generating a hash more difficult without temporarily storing the response in memory, but that’s what makes me a user :).

Anyway, back to your regularly scheduled program, i.e. things in the timeline and funded projects. :)

Best,
Steve

On Sat, Sep 15, 2012 at 6:32 PM, Stephen Mather <mather.stephen@anonymised.com> wrote:

I could see some logistical problems with this: WFS output is typically streamed, and I don’t know whether that makes generating a hash more difficult without temporarily storing the response in memory, but that’s what makes me a user :).

Computing a checksum in a streaming manner is not a problem. The problem, if anything, is: where do we put the checksum, especially since the checksum itself is available only once the download has been completed?
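A minimal Python sketch of this point (Python only for brevity; the function name `stream_with_digest` and the 64 KiB chunk size are illustrative assumptions, not GeoServer code): the SHA-512 state is updated chunk by chunk with `hashlib`, so the response never has to be held in memory as a whole, yet the final digest only exists after the last byte has already been written to the client.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # arbitrary buffer size chosen for this sketch


def stream_with_digest(source: BinaryIO, sink: BinaryIO) -> str:
    """Copy source to sink chunk by chunk, hashing each chunk on the way.

    At most one chunk is held in memory at a time, but the digest
    only exists after the last chunk has already been written out.
    """
    hasher = hashlib.sha512()
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:  # end of the response
            break
        hasher.update(chunk)  # incremental hash update, no full buffering
        sink.write(chunk)     # these bytes are already on their way to the client
    return hasher.hexdigest()


if __name__ == "__main__":
    payload = b"pretend this is a SHAPE-ZIP response " * 10_000
    out = io.BytesIO()
    digest = stream_with_digest(io.BytesIO(payload), out)
    # Chunked hashing gives the same digest as hashing everything at once.
    assert digest == hashlib.sha512(payload).hexdigest()
    assert out.getvalue() == payload
    print("sha512:", digest[:16], "...")
```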

Cheers
Andrea

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


On Sun, Sep 16, 2012 at 2:58 AM, Stephen Mather <mather.stephen@anonymised.com> wrote:

Is the issue that, since the hash changes whenever the underlying data changes, the hash would need to be stored for a particular request at a particular point in time?

Err, no, I would not store it anyway. The question is: how do I give the sum to the client, since I’m already using the stream to return the response?

Or do you mean that, when that particular format is used, only the checksum is returned?
The problem with such an approach is that, between the download and the checksum request,
something in the data (or in the configuration) might have changed, making the checksum
useless.

Cheers
Andrea

Before thinking it through, I imagined a separate checksum request, but I then reached the same conclusion: with changing data it would in many cases be misleading, and therefore useless.

Continuing to think aloud: for zipped WFS output formats such as shapefile, double zipping, with the checksum inside the outer ZIP file next to the inner one, seems viable. For something like GeoJSON there may be no solution: embedding the checksum in the GeoJSON itself would be impossible, I would think, since adding it changes the very bytes being hashed. An extreme case of a snake eating its own tail.
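The double-zip idea could look roughly like the Python sketch below. The entry names `data.zip` and `data.zip.sha512`, the `sha512sum`-style text line and both function names are hypothetical, not an existing GeoServer output format. Because the checksum entry is written after the data entry, the digest can be computed while the inner archive streams through, which avoids the ordering problem raised earlier in the thread.

```python
import hashlib
import io
import zipfile
from typing import Iterable


def wrap_with_checksum(chunks: Iterable[bytes], name: str = "data.zip") -> bytes:
    """Hypothetical 'double zip': store the inner SHAPE-ZIP bytes as one entry
    of an outer archive, then append a second entry holding their SHA-512."""
    hasher = hashlib.sha512()
    buf = io.BytesIO()
    # ZIP_STORED: the inner archive is already compressed
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as outer:
        with outer.open(name, "w") as entry:
            for chunk in chunks:
                hasher.update(chunk)  # hash while streaming the data entry
                entry.write(chunk)
        # Written last, once the digest is known; sha512sum-style line
        outer.writestr(name + ".sha512", f"{hasher.hexdigest()}  {name}\n")
    return buf.getvalue()


def verify(outer_zip: bytes, name: str = "data.zip") -> bool:
    """Client side: recompute the inner archive's digest and compare."""
    with zipfile.ZipFile(io.BytesIO(outer_zip)) as outer:
        inner = outer.read(name)
        expected = outer.read(name + ".sha512").decode("ascii").split()[0]
    return hashlib.sha512(inner).hexdigest() == expected


if __name__ == "__main__":
    # Stand-in for a SHAPE-ZIP response: a small zip with a dummy .shp entry
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner_zip:
        inner_zip.writestr("cuyahoga_hydro_polygon.shp", b"\x00" * 100)
    data = inner_buf.getvalue()
    packaged = wrap_with_checksum([data[:50], data[50:]])
    print("checksum verified:", verify(packaged))
```

The client only needs to unpack the outer archive and compare digests before opening the inner shapefile ZIP; the GeoJSON case has no equivalent container to hold the checksum.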

Best,
Steve

What are you trying to do?

Detect corruption? ZIP stores a CRC-32 checksum for each entry, which should detect accidental corruption.

Protect against malicious attack? Running GeoServer over SSL/TLS will protect against both malicious modification and server identity spoofing.

Detect data change? Hashes will not be useful for detecting data change because timestamps in many payload formats will cause them to be different every time.
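The first and third points can be seen side by side in a small Python sketch, using a ZIP entry's modification time as one concrete example of an embedded timestamp (the entry name `features.gml`, its content and the dates are arbitrary). The same bytes zipped at two different times give different SHA-512 digests of the whole file, while the per-entry CRC-32 that ZIP uses to detect corruption stays the same.

```python
import hashlib
import io
import zipfile


def zip_one(content: bytes, date_time: tuple) -> bytes:
    """Build a one-entry ZIP whose entry header records the given mtime."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("features.gml", date_time=date_time)
        zf.writestr(info, content)
    return buf.getvalue()


content = b"<wfs:FeatureCollection/>"  # identical payload in both archives
a = zip_one(content, (2012, 9, 15, 18, 32, 0))
b = zip_one(content, (2012, 9, 16, 2, 58, 0))

# Whole-file hashes differ only because the embedded timestamps differ...
same_hash = hashlib.sha512(a).hexdigest() == hashlib.sha512(b).hexdigest()
print("same SHA-512:", same_hash)  # False

# ...while each entry's CRC-32 (ZIP's own corruption check) is unchanged, and
# testzip() returns None when every entry's data matches its stored CRC.
for blob in (a, b):
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        print(zf.getinfo("features.gml").CRC, zf.testzip())
```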

Kind regards,
Ben.

On 16/09/12 00:32, Stephen Mather wrote:


--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre