[Geoserver-devel] community importer laundry list

The following is a list of items that came up during my development with the community importer.
The first part is bugs/issues with the importer, the second is geoserver bugs/issues, and the last is ideas for enhancements:

Import bugs/issues:

  • doc/markdown directory should be merged into sphinx docs and reviewed

  • revisit expand parameter in API - this provides more information on representations in the API - by expanding by default unless at the collection level

  • protect against a task being run again

  • prevent any localworkspace from being serialized

  • an unhandled exception in the importer may result in an error state but there is no way to track from the client side - for example, a runtime exception (non-user visible error) is hard to trace to the actual problem. Solution: introduce UUID for tracing the exception from the client to the server administrator.

  • change a layer name or nativeName results in an error when trying to get the attributes because the feature type no longer exists. Solution: put this at the task level, check for existing native name and throw if present. re-check again at import time

  • validate if direct import and the store format does not support a specific updateMode (or if it’s not implemented). probably not a good idea to support this on shapefile or other stores immediately. Longer term, would be nice to have but should 400 on such a request until supported.

  • fix Importer.run(ImportContext,ImportFilter,ProgressMonitor) to handle exceptions better - currently if the first task works but the second fails, state will be invalid

  • break implementation of Importer.doXXXImport into separate classes for testing to allow mocking and error-injection for verification of behavior

  • serialbinding works fine from my testing. remove use of xstreampersister to reduce permutations of configuration

  • for unifying (and constraining) docs and functionalty, perhaps the best approach is to note the philosophy of the API is to not duplicate anything from the other config API. Then, we enumerate the specific representation transitions supported in the docs, then ensure they’re covered. This came up for example with the way a LayerInfo was updated. The implementation was copying the (somehwat incomplete) FeatureInfo passed in ‘short-hand’ and basically overriding existing configuration. In reality, we only really want to allow changing the SRS (of the possible properties I could identify)

  • Any errors in the streaming response (see: ImportContextJSONFormat) are not caught because at that point the status has already been written. therefore the client just gets invalid JSON w/ a 200 status.

  • remove complexity of spring context - replace BeanResourceFinder with direct mapping

  • complete support for updateModes for database indirect import (probably should include truncating any cache as well)

Some general geoserver bugs/issues found during testing:

  • REST Config API - one can create a db datastore w/ invalid credentials and there is no way of getting feedback w/out reloading the catalog (otherwise it doesn’t seem to validate the new connection) ??

  • request to /geoserver/rest/imports just gives response /geoserver/rest response if importer not active (hard to test if the API is enabled/present)

  • Possible NPE when deleting resource after deleting layer?
    at org.geoserver.catalog.ResourcePool$FeatureTypeCache.dispose(ResourcePool.java:1697)
    at org.geoserver.catalog.ResourcePool$FeatureTypeCache.dispose(ResourcePool.java:1689)
    at org.geoserver.catalog.ResourcePool$CatalogResourceCache.remove(ResourcePool.java:1668)
    at org.geoserver.catalog.ResourcePool.clear(ResourcePool.java:1009)

Enhancements:

  • add capabilities/version endpoint (for example: /rest/importer/capabilities ) that describes supported formats, etc.

  • add support for testing projection from prj file before transferring data

  • if describing supported formats is too difficult, adding an endpoint for verifying if the provided files (by name only) can be processed before transferring any data.

  • support for fetching from remote URL

  • web-hooks for completion of import (to avoid polling the progress endpoint during async import)

Ian Schneider
Software Engineer | Boundless
ischneider@anonymised.com

  • doc/markdown directory should be merged into sphinx docs and reviewed

I had good luck converting uDig docs with “pandoc”.

On Thu, Nov 7, 2013 at 9:16 PM, Ian Schneider
<ischneider@anonymised.com>wrote:

The following is a list of items that came up during my development with
the community importer.
The first part is bugs/issues with the importer, the second is geoserver
bugs/issues, and the last is ideas for enhancements:

Import bugs/issues:

* doc/markdown directory should be merged into sphinx docs and reviewed

* revisit expand parameter in API - this provides more information on
representations in the API - by expanding by default unless at the
collection level

* protect against a task being run again

* prevent any localworkspace from being serialized

* an unhandled exception in the importer may result in an error state but
there is no way to track from the client side - for example, a runtime
exception (non-user visible error) is hard to trace to the actual problem.
Solution: introduce UUID for tracing the exception from the client to the
server administrator.

* change a layer name or nativeName results in an error when trying to get
the attributes because the feature type no longer exists. Solution: put
this at the task level, check for existing native name and throw if
present. re-check again at import time

* validate if direct import and the store format does not support a
specific updateMode (or if it's not implemented). probably not a good idea
to support this on shapefile or other stores immediately. Longer term,
would be nice to have but should 400 on such a request until supported.

* fix Importer.run(ImportContext,ImportFilter,ProgressMonitor) to handle
exceptions better - currently if the first task works but the second fails,
state will be invalid

* break implementation of Importer.doXXXImport into separate classes for
testing to allow mocking and error-injection for verification of behavior

* serialbinding works fine from my testing. remove use of xstreampersister
to reduce permutations of configuration

* for unifying (and constraining) docs and functionalty, perhaps the best
approach is to note the philosophy of the API is to not duplicate anything
from the other config API. Then, we enumerate the specific representation
transitions supported in the docs, then ensure they're covered. This came
up for example with the way a LayerInfo <http://LayerInfo.html> was
updated. The implementation was copying the (somehwat incomplete)
FeatureInfo <http://FeatureInfo.html> passed in 'short-hand' and
basically overriding existing configuration. In reality, we only really
want to allow changing the SRS (of the possible properties I could identify)

* Any errors in the streaming response (see: ImportContextJSONFormat) are
not caught because at that point the status has already been written.
therefore the client just gets invalid JSON w/ a 200 status.

* remove complexity of spring context - replace BeanResourceFinder with
direct mapping

* complete support for updateModes for database indirect import (probably
should include truncating any cache as well)

Ian, thanks for the list. I'm not sure I understand one thing though, are
these things that were fixed
since the code was contributed to GeoServer, or are they still to be fixed?
I'm asking since I see a number of commits in the module since it got
contributed back to GeoServer:
https://github.com/ischneider/geoserver/commits/importer-2.3.x/src/community/importer

Anyways:
* if they are still open, I'd say, let's open a ticket for each of them,
unless the issue is trivial and there is no much interest in fixing is
anyways
* if they have been solved, let's get the patches :slight_smile:

Some general geoserver bugs/issues found during testing:

* REST Config API - one can create a db datastore w/ invalid credentials
and there is no way of getting feedback w/out reloading the catalog
(otherwise it doesn't seem to validate the new connection) ??

Ticket

* request to /geoserver/rest/imports just gives response /geoserver/restresponse if importer not active (hard to test if the API is enabled/present)

Yes, annoying one, not sure we can do anything about it with the current
Restlet library (we were early implementors
of the REST concepts, and now we have everything based on a pretty old
version of a library that's probably not very
common anymore, at least, most of the REST oriented services I've seen
recently have been implemented using Jersey)

* Possible NPE when deleting resource after deleting layer?
at
org.geoserver.catalog.ResourcePool$FeatureTypeCache.dispose(ResourcePool.java:1697)
at
org.geoserver.catalog.ResourcePool$FeatureTypeCache.dispose(ResourcePool.java:1689)
at
org.geoserver.catalog.ResourcePool$CatalogResourceCache.remove(ResourcePool.java:1668)
at org.geoserver.catalog.ResourcePool.clear(ResourcePool.java:1009)

This one might have been fixed in the meantime, not sure

Enhancements:

* add capabilities/version endpoint (for example:
/rest/importer/capabilities ) that describes supported formats, etc.

Nice one

* add support for testing projection from prj file before transferring data

Hum... hu?

* if describing supported formats is too difficult, adding an endpoint for
verifying if the provided files (by name only) can be processed before
transferring any data.

Right.

* support for fetching from remote URL

interesting one as well

* web-hooks for completion of import (to avoid polling the progress
endpoint during async import)

You mean something like comet/websockets?

Cheers
Andrea

--

Our support, Your Success! Visit http://opensdi.geo-solutions.it for more
information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

Ah, by the way,
thinking about the laundry list again I realized it’s not really answering (at least to me, with limited knowledge
of the new importer code) the question of whether the importer is ready to move to extension or not.

A module in extension is not like a module in core, it is not supposed to have the same level of testing, nor
the same level of stability, but certainly one cannot just do as they please anymore and change wildly
its external interfaces once it’s there.

So I guess the overall question is:

  • can any of the importer issues bring down GeoServer, or impair the users to the point they will just want to
    get rid of it
  • do its external interfaces still need heavy refactoring?

Cheers
Andrea

···

On Sat, Nov 9, 2013 at 11:54 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

On Thu, Nov 7, 2013 at 9:16 PM, Ian Schneider <ischneider@anonymised.com> wrote:

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


The following is a list of items that came up during my development with the community importer.
The first part is bugs/issues with the importer, the second is geoserver bugs/issues, and the last is ideas for enhancements:

Import bugs/issues:

  • doc/markdown directory should be merged into sphinx docs and reviewed

  • revisit expand parameter in API - this provides more information on representations in the API - by expanding by default unless at the collection level

  • protect against a task being run again

  • prevent any localworkspace from being serialized

  • an unhandled exception in the importer may result in an error state but there is no way to track from the client side - for example, a runtime exception (non-user visible error) is hard to trace to the actual problem. Solution: introduce UUID for tracing the exception from the client to the server administrator.

  • change a layer name or nativeName results in an error when trying to get the attributes because the feature type no longer exists. Solution: put this at the task level, check for existing native name and throw if present. re-check again at import time

  • validate if direct import and the store format does not support a specific updateMode (or if it’s not implemented). probably not a good idea to support this on shapefile or other stores immediately. Longer term, would be nice to have but should 400 on such a request until supported.

  • fix Importer.run(ImportContext,ImportFilter,ProgressMonitor) to handle exceptions better - currently if the first task works but the second fails, state will be invalid

  • break implementation of Importer.doXXXImport into separate classes for testing to allow mocking and error-injection for verification of behavior

  • serialbinding works fine from my testing. remove use of xstreampersister to reduce permutations of configuration

  • for unifying (and constraining) docs and functionalty, perhaps the best approach is to note the philosophy of the API is to not duplicate anything from the other config API. Then, we enumerate the specific representation transitions supported in the docs, then ensure they’re covered. This came up for example with the way a LayerInfo was updated. The implementation was copying the (somehwat incomplete) FeatureInfo passed in ‘short-hand’ and basically overriding existing configuration. In reality, we only really want to allow changing the SRS (of the possible properties I could identify)

  • Any errors in the streaming response (see: ImportContextJSONFormat) are not caught because at that point the status has already been written. therefore the client just gets invalid JSON w/ a 200 status.

  • remove complexity of spring context - replace BeanResourceFinder with direct mapping

  • complete support for updateModes for database indirect import (probably should include truncating any cache as well)

Ian, thanks for the list. I’m not sure I understand one thing though, are these things that were fixed
since the code was contributed to GeoServer, or are they still to be fixed?
I’m asking since I see a number of commits in the module since it got contributed back to GeoServer:
https://github.com/ischneider/geoserver/commits/importer-2.3.x/src/community/importer

Anyways:

  • if they are still open, I’d say, let’s open a ticket for each of them, unless the issue is trivial and there is no much interest in fixing is anyways
  • if they have been solved, let’s get the patches :slight_smile:

Some general geoserver bugs/issues found during testing:

  • REST Config API - one can create a db datastore w/ invalid credentials and there is no way of getting feedback w/out reloading the catalog (otherwise it doesn’t seem to validate the new connection) ??

Ticket

  • request to /geoserver/rest/imports just gives response /geoserver/rest response if importer not active (hard to test if the API is enabled/present)

Yes, annoying one, not sure we can do anything about it with the current Restlet library (we were early implementors
of the REST concepts, and now we have everything based on a pretty old version of a library that’s probably not very
common anymore, at least, most of the REST oriented services I’ve seen recently have been implemented using Jersey)

  • Possible NPE when deleting resource after deleting layer?
    at org.geoserver.catalog.ResourcePool$FeatureTypeCache.dispose(ResourcePool.java:1697)
    at org.geoserver.catalog.ResourcePool$FeatureTypeCache.dispose(ResourcePool.java:1689)
    at org.geoserver.catalog.ResourcePool$CatalogResourceCache.remove(ResourcePool.java:1668)
    at org.geoserver.catalog.ResourcePool.clear(ResourcePool.java:1009)

This one might have been fixed in the meantime, not sure

Enhancements:

  • add capabilities/version endpoint (for example: /rest/importer/capabilities ) that describes supported formats, etc.

Nice one

  • add support for testing projection from prj file before transferring data

Hum… hu?

  • if describing supported formats is too difficult, adding an endpoint for verifying if the provided files (by name only) can be processed before transferring any data.

Right.

  • support for fetching from remote URL

interesting one as well

  • web-hooks for completion of import (to avoid polling the progress endpoint during async import)

You mean something like comet/websockets?

Cheers
Andrea

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it