[Geoserver-devel] Some headaches with CSW GetRecords

Hi,
the GetRecords operation is proving to be a bit of a hard nut to crack due to some
of its “features”.

Basically, a GetRecords operation is quite similar to a GetFeature one, in that
you can ask for several record types at once, e.g., both Dublin Core and
ISO, which are structurally different.
What we are going to do is turn each one into a Query object and then
ask the CatalogStore for the records, which can respond the way it wants:
it may decide that it does not have any ISO records, for example, but only some
csw:Record ones, or it may have an internal model mapping to both representations
and as such be able to respond to both queries (actually, if a store
targets ISO it has to answer Dublin Core too, since that is mandatory).

In the latter case we could be returning the same information twice, in two
different formats.

That per se would not be the end of the world, if it weren't for the fact that there
is a third parameter, outputSchema, that controls how the records get encoded.
The outputSchema defaults to the csw:Record representation, but one could
ask for ISO or ebRIM.

So, see, one can query csw:Record but then have it returned as an ISO representation,
or ask for ISO and ebRIM and have them returned as csw:Record… holy mess!

Now, I’ve tried to see what other CSW implementations do in this case.
GeoNetwork has one and only one internal representation (which might or might not
be equal to one of the output formats), and then uses XSLT to produce the canonical outputs.
The XSLT is configurable since the internal representation can vary.
PyCSW has internal representations that are equal to some of the canonical outputs,
and seems to have some sort of universal translator that converts between types.

I was leaning a bit towards the second approach, which we could programmatically
execute against the features containing the records, but while discussing it with Emanuele
he pointed out some severe limitations of such an approach:

  • ebRIM/EO and csw:Record are almost impossible to translate to each other in
    a generic manner, as they contain completely different information
  • csw:Record can hardly be translated to ISO in a compliant way, as we would
    not have enough info to build all of the compulsory ISO fields out of the
    Dublin Core representation

In the end it's the CatalogStore itself that is best placed to do such a transformation,
since it has the internal model handy: it can first translate the Query against
its internal model, execute it, and then convert the internal model to the desired
representation. For example, the store working against GeoServer's own catalog
could be queried with csw:Record but with outputSchema http://www.isotc211.org/2005/gmd;
it would then translate the csw:Record query against its internal model, run it,
and then encode the results as ISO records using the full set of information
available in the internal model.
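
To make the translate, execute, encode idea a bit more concrete, here is a minimal
sketch. Everything in it is a stand-in invented for illustration: InternalRecord,
Store and the string "encodings" are not GeoServer or GeoTools types, the filter is
assumed to be already translated to the internal model, and the target schema is
passed as a namespace URI rather than as a FeatureType.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TargetSchemaSketch {
    static final String CSW_RECORD = "http://www.opengis.net/cat/csw/2.0.2";
    static final String ISO = "http://www.isotc211.org/2005/gmd";

    /** Hypothetical internal model, richer than either output format. */
    static class InternalRecord {
        final String id;
        final String title;
        final String contact;

        InternalRecord(String id, String title, String contact) {
            this.id = id;
            this.title = title;
            this.contact = contact;
        }
    }

    /** Stand-in for a CatalogStore; plain strings stand in for encoded records. */
    static class Store {
        private final List<InternalRecord> records = new ArrayList<>();

        void add(InternalRecord r) {
            records.add(r);
        }

        // The filter is assumed to be already translated to the internal model.
        List<String> getRecords(Predicate<InternalRecord> filter, String targetSchema) {
            List<String> out = new ArrayList<>();
            for (InternalRecord r : records) {
                if (!filter.test(r)) {
                    continue;
                }
                if (ISO.equals(targetSchema)) {
                    // ISO output can use everything the internal model knows
                    out.add("gmd:MD_Metadata[" + r.id + ", " + r.title + ", " + r.contact + "]");
                } else if (CSW_RECORD.equals(targetSchema)) {
                    out.add("csw:Record[" + r.id + ", " + r.title + "]");
                } else {
                    throw new IllegalArgumentException("Unsupported target schema: " + targetSchema);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.add(new InternalRecord("r1", "Roads", "ops@example.org"));
        store.add(new InternalRecord("r2", "Rivers", "ops@example.org"));
        // Filter expressed on a Dublin Core style property (title), output in ISO
        System.out.println(store.getRecords(r -> r.title.startsWith("Ro"), ISO));
    }
}
```

The point of the sketch is only that the ISO encoding is built from the internal
record, not from a csw:Record, so no information is lost on the way.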

This would result in a modification of CatalogStore from:

FeatureCollection getRecords(Query q, Transaction t) throws IOException;

to

FeatureCollection getRecords(Query q, FeatureType targetSchema, Transaction t) throws IOException;

Now, doing this solves one problem but potentially leaves another open.
If I ask for both csw:Record and ISO with ISO as the output, it is most likely that the
result will contain the same records duplicated…
I guess it is such a corner case that we probably should not bother; imho the
GetRecords request is ill-posed to start with, but I wanted to gather some opinions about it
nevertheless.

Soo… what do you think?

Cheers
Andrea

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


On Fri, Sep 14, 2012 at 12:00 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

FeatureCollection getRecords(Query q, FeatureType targetSchema, Transaction t) throws IOException;

One observation about this move: the Query normally assumes symmetry between queried
and returned objects, thus the PropertyNames are contained in there.
However, in this case we have an asymmetry, so maybe we should switch to:

FeatureCollection getRecords(Query q, List targetProperties, FeatureType targetSchema, Transaction t) throws IOException;

and ignore the ones in the Query.
Or else, use the ones in the Query but assume they are part of the output.

Another possibility is to have a CSWQuery extending Query that has a targetSchema property.
Hmm… I believe I like this one better.
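
A rough sketch of that last option, just to show the shape. The Query class below is
a tiny hypothetical stand-in, not the GeoTools one, and the target schema is held as
a namespace URI string as an assumption (the signatures above pass a FeatureType).

```java
import java.util.Arrays;
import java.util.List;

class CswQuerySketch {
    /** Hypothetical stand-in for Query; only the bits needed here. */
    static class Query {
        private final String typeName;
        private final List<String> propertyNames;

        Query(String typeName, String... propertyNames) {
            this.typeName = typeName;
            this.propertyNames = Arrays.asList(propertyNames);
        }

        String getTypeName() {
            return typeName;
        }

        List<String> getPropertyNames() {
            return propertyNames;
        }
    }

    /** Query that also carries the schema the results should be encoded in. */
    static class CSWQuery extends Query {
        private final String targetSchema;

        CSWQuery(String typeName, String targetSchema, String... propertyNames) {
            super(typeName, propertyNames);
            this.targetSchema = targetSchema;
        }

        String getTargetSchema() {
            return targetSchema;
        }
    }

    public static void main(String[] args) {
        // Queried type is csw:Record, output is ISO; the property name is illustrative
        CSWQuery q = new CSWQuery("csw:Record", "http://www.isotc211.org/2005/gmd", "gmd:fileIdentifier");
        System.out.println(q.getTypeName() + " -> " + q.getTargetSchema() + " " + q.getPropertyNames());
    }
}
```

With this shape getRecords keeps its current two-argument signature, and a store
that cares about the output schema checks whether it was handed a CSWQuery.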

Finally, there is the bit about what happens if the store cannot do the type conversion.
I guess we should throw an exception in such a case? Maybe a specific one,
UnsupportedTargetTypeException, so that the CSW can return a proper service
exception referring to the outputSchema as the culprit.
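
Something along these lines, as a sketch only. Making the exception extend
IOException is an assumption (it would keep the existing throws clause valid), the
store is a stand-in that only knows csw:Record, and the returned string merely
imitates a service exception with an InvalidParameterValue code and outputSchema as
the locator; none of this is existing GeoServer code.

```java
import java.io.IOException;

class TargetTypeSketch {
    static final String CSW_RECORD = "http://www.opengis.net/cat/csw/2.0.2";

    /** Hypothetical: thrown by a store that cannot encode into the requested schema. */
    static class UnsupportedTargetTypeException extends IOException {
        private final String targetSchema;

        UnsupportedTargetTypeException(String targetSchema) {
            super("Cannot encode records as " + targetSchema);
            this.targetSchema = targetSchema;
        }

        String getTargetSchema() {
            return targetSchema;
        }
    }

    /** Stand-in store that can only produce csw:Record output. */
    static String getRecords(String targetSchema) throws UnsupportedTargetTypeException {
        if (!CSW_RECORD.equals(targetSchema)) {
            throw new UnsupportedTargetTypeException(targetSchema);
        }
        return "<csw:Record/>";
    }

    /** Service side: map the store failure to an error that blames outputSchema. */
    static String handle(String outputSchema) {
        try {
            return getRecords(outputSchema);
        } catch (UnsupportedTargetTypeException e) {
            return "InvalidParameterValue locator=outputSchema: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("http://www.isotc211.org/2005/gmd"));
    }
}
```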

Cheers
Andrea



Not much to add, just some general feedback: I would vote for keeping things simple rather than bending over backward to support something in the spec that we don’t feel makes much sense.

Would having the CatalogStore advertise its capabilities in terms of what it can output help? Or do we already have that?

Regarding the API changes, unfortunately I haven’t kept up enough to know what they really mean, and my questions at this point are still too basic. What does Query.getPropertyNames() actually refer to? The metadata fields of the source records?


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Mon, Sep 17, 2012 at 4:31 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Not much to add, just some general feedback: I would vote for keeping things simple rather than bending over backward to support something in the spec that we don’t feel makes much sense.

Would having the CatalogStore advertise its capabilities in terms of what it can output help? Or do we already have that?

We don’t have it, but we definitely could add it. The thing is, querying in one form and returning in the other
is allowed, and is sometimes unfortunately necessary.
For example, if I want to run a full text search, the only way I know is to query csw:Record and do a like
on csw:AnyText, regardless of the kind of record I want to return (not sure about ISO, but I’m not aware
of any way to run a full text search against ebRIM records, for example).
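
If we did add it, the capability advertisement could be as small as the sketch
below. The OutputSchemaAware interface, its method name and the example store are
all hypothetical, made up here just to show the idea.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class CapabilitiesSketch {
    /** Hypothetical addition to the store contract: which output schemas can it encode? */
    interface OutputSchemaAware {
        Set<String> getSupportedOutputSchemas();
    }

    /** Example store that can only produce csw:Record output. */
    static class DublinCoreOnlyStore implements OutputSchemaAware {
        @Override
        public Set<String> getSupportedOutputSchemas() {
            return new LinkedHashSet<>(Arrays.asList("http://www.opengis.net/cat/csw/2.0.2"));
        }
    }

    /** The service could check this up front, before building any Query. */
    static boolean canServe(OutputSchemaAware store, String outputSchema) {
        return store.getSupportedOutputSchemas().contains(outputSchema);
    }

    public static void main(String[] args) {
        OutputSchemaAware store = new DublinCoreOnlyStore();
        System.out.println(canServe(store, "http://www.isotc211.org/2005/gmd"));
    }
}
```

Note this only tells the service what a store can output; it does not remove the
need to query in one schema and return in another.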

Regarding the API changes, unfortunately I haven’t kept up enough to know what they really mean, and my questions at this point are still too basic. What does Query.getPropertyNames() actually refer to? The metadata fields of the source records?

Right, that was part of my dilemma. getPropertyNames() is not used for filtering; it’s definitely part of the output
management, as it defines which attributes we return, so it should refer to the output schema, not the
queried one.
That’s why I was tempted to make it explicitly separate from the Query itself:

FeatureCollection getRecords(Query q, List targetProperties, FeatureType targetSchema, Transaction t) throws IOException;
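
To illustrate the split, here is a small sketch where the filter works on a queried
property (a full text field playing the role of csw:AnyText) while the projection
picks properties of the output schema. The Rec class, the map-based "records" and
the gmd:-prefixed property names are stand-ins chosen for illustration, not real
GeoServer types or a statement about how ISO properties are actually addressed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TargetPropertiesSketch {
    /** Hypothetical internal record: full text plus values keyed by output property name. */
    static class Rec {
        final String anyText;
        final Map<String, String> outputValues;

        Rec(String anyText, Map<String, String> outputValues) {
            this.anyText = anyText;
            this.outputValues = outputValues;
        }
    }

    static List<Map<String, String>> getRecords(
            List<Rec> store, String anyTextContains, List<String> targetProperties) {
        List<Map<String, String>> out = new ArrayList<>();
        String needle = anyTextContains.toLowerCase();
        for (Rec r : store) {
            // Filtering: expressed against the queried schema (full text here)
            if (!r.anyText.toLowerCase().contains(needle)) {
                continue;
            }
            // Projection: expressed against the output schema
            Map<String, String> row = new LinkedHashMap<>();
            for (String p : targetProperties) {
                if (r.outputValues.containsKey(p)) {
                    row.put(p, r.outputValues.get(p));
                }
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("gmd:fileIdentifier", "r1");
        values.put("gmd:title", "Roads of Tuscany");
        List<Rec> store = new ArrayList<>();
        store.add(new Rec("roads tuscany transport", values));
        List<String> wanted = new ArrayList<>();
        wanted.add("gmd:title");
        System.out.println(getRecords(store, "Tuscany", wanted));
    }
}
```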

Cheers
Andrea
