[Geoserver-devel] ows service object models

Hi all,

Recently I have been asked to estimate what a wfs 2.0 implementation for geoserver would take. In doing so I started thinking about how to handle the long-standing issue of how an object model for a particular ows spec evolves.

Currently the general architecture we have in place works as follows:

1) From the xml schema for the spec use EMF to generate an object model
2) Instrument the model to make any needed customizations
3) Write xml bindings to encode/decode the object model
4) Implement the operations for the service using the generated object model
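
To make that concrete, here is roughly the shape of steps 1 and 4. The names are illustrative only, not the actual generated code:

  import java.util.List;

  // stand-in for a nested type out of the schema, for illustration
  interface QueryType { String getTypeName(); }

  // step 1: an interface generated from the GetFeature element of the schema
  interface GetFeatureType {
      String getVersion();
      List<QueryType> getQuery();
  }

  // step 4: the operation is implemented directly against the generated model
  class WebFeatureService {
      Object getFeature(GetFeatureType request) {
          for (QueryType q : request.getQuery()) {
              // resolve q.getTypeName(), run the query, collect the results...
          }
          return null; // placeholder
      }
  }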

This architecture is currently what is used for wfs 1.0, 1.1 and wcs 1.0, 1.1. However the two services differ in how their spec models evolved between versions: wfs did not change very much at all, while wcs changed drastically.

When implementing a new version of a service, one has two options. The first option (the one used for wfs) is to use a single object model and have both versions of the service share it, which is only really possible if the spec versions are relatively similar. The nice thing about this is that you only implement the operations once for the multiple versions.

The alternative (the one used for wcs) is to generate a separate object model for the new version of the spec, and have a completely separate service implement the operations. The downside here is that we have to implement an entirely new service; the upside is that the existing service remains untouched.

Getting back to wfs 2.0, it is sort of a middle ground. While the xml schema has changed a lot, the core operations remain largely the same. This means I don't think the three spec versions can share an object model, but at the same time the thought of reimplementing all the wfs operations on top of a new request model does not make sense either.

So... how do we proceed? I can think of a couple of different options.

The first would be to abandon the current architecture of autogenerating the object model from the xml schema and come up with a central, non-generated set of objects. Basically going back to the old way of doing things, and the way wms does it.

The pros of such an approach that I can see are:

* Flexibility. Often things from the xml schema don't translate across very well, or we want to model something slightly differently than the xml schema does.

* Stability. The service operations always get implemented in terms of a stable object model.

* Simplicity. This approach is much simpler than the gtxml/emf setup, which, through my own fault, has been over-architected in a lot of places.

The cons:

* Parsing/encoding work. One of the nice things about using EMF as the object model is that we can use dynamic bindings to do most of the parsing and encoding work, which would otherwise be a time-consuming task.

* Maintenance. There is still the burden of manually updating the internal object model to support changes in the spec. Depending on the changes this could be a significant task, since different versions of a spec can sometimes conflict.

The second option would be to stick with xsd/emf and generate a new model for the new spec. To reuse the operations across the different object models, we would use emf reflection to access the model, so as not to depend on any specific version of it.
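
A minimal sketch of what that reflective access could look like, using the standard EMF reflection API (the feature name here is made up):

  import org.eclipse.emf.ecore.EObject;
  import org.eclipse.emf.ecore.EStructuralFeature;

  // version-neutral access: works against the wfs 1.1 and wfs 2.0 generated
  // models alike, as long as both spell the feature the same way
  class RequestAccess {
      static Object get(EObject request, String featureName) {
          EStructuralFeature f = request.eClass().getEStructuralFeature(featureName);
          return f == null ? null : request.eGet(f);
      }
  }

  // usage: Object handle = RequestAccess.get(getFeatureRequest, "handle");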

Pros:

* Time. We still achieve the time saving on parsing/encoding work since we are using EMF.

* Separation. It is nice to have the different object models separated instead of trying to merge them into one beast.

Cons:

* Reflection. Doing all access via reflection is a painful way to code.

Anyways, I am interested in hearing what people think, as there are surely more pros/cons that should be considered, and possibly other alternatives for how to proceed.

-Justin

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Thanks for the summary; some ideas for discussion, comments and questions inline.

My expectations were:
- Generate a new data model for each major specification release
  (wcs is an outlier as the difference between 1.0 and 1.1 was massive)
- Gradually update/replace reflection-based bindings to track
  changes in 2.0, 2.1, etc. as we do for GML

Are there any specific areas of WFS 2.0 that show a new object model is required? Or
is it a case of a thousand paper cuts?

Jody

On 04/07/2010, at 4:03 AM, Justin Deoliveira wrote:

Hi all,

Recently I have been asked to estimate what a wfs 2.0 implementation for
geoserver would take. In doing so I started thinking about how to handle
the long-standing issue of how an object model for a particular ows spec
evolves.

Currently the general architecture we have in place works as follows:

1) From the xml schema for the spec use EMF to generate an object model
2) Instrument the model to make any needed customizations
3) Write xml bindings to encode/decode the object model
4) Implement the operations for the service using the generated object model

This architecture is currently what is used for wfs 1.0, 1.1 and wcs 1.0,
1.1. However the two services differ in how their spec models evolved
between versions: wfs did not change very much at all, while wcs changed
drastically.

When implementing a new version of a service, one has two options. The
first option (the one used for wfs) is to use a single object model and
have both versions of the service share it, which is only really
possible if the spec versions are relatively similar. The nice thing
about this is that you only implement the operations once for the
multiple versions.

The alternative (the one used for wcs) is to generate a separate
object model for the new version of the spec, and have a completely
separate service implement the operations. The downside here is
that we have to implement an entirely new service; the upside is that
the existing service remains untouched.

Getting back to wfs 2.0, it is sort of a middle ground. While the xml
schema has changed a lot, the core operations remain largely the
same. This means I don't think the three spec versions can share an
object model, but at the same time the thought of reimplementing all the
wfs operations on top of a new request model does not make sense either.

That is disappointing - but expected with a 2.0 specification. It was my hope we
could share a data model for an entire series: 1.0, 1.1 etc...

WCS is really the odd one out since wcs 1.0 was pretty half-baked; I think that
is more an issue with the OGC than a reflection of our general architecture.

So... how do we proceed? I can think of a couple of different options.

The first would be to abandon the current architecture of autogenerating
the object model from the xml schema and come up with a central,
non-generated set of objects. Basically going back to the old way of
doing things, and the way wms does it.

This is the approach used with GML parsing/encoding right now, is it not? A
non-EMF object model, with binding definitions shared between each version
of the GML specification.

The pros of such an approach that I can see are:
* Flexibility. Often things from the xml schema don't translate across
very well, or we want to model something slightly differently than the xml
schema does.
* Stability. The service operations always get implemented in terms of a
stable object model.
* Simplicity. This approach is much simpler than the gtxml/emf setup,
which, through my own fault, has been over-architected in a lot of places.

The ability to reuse the bindings for WPS has been very helpful.

The cons:
* Parsing/encoding work. One of the nice things about using EMF as the
object model is that we can use dynamic bindings to do most of the
parsing and encoding work, which would otherwise be a time-consuming task.
* Maintenance. There is still the burden of manually updating the
internal object model to support changes in the spec. Depending on the
changes this could be a significant task, since different versions of a
spec can sometimes conflict.

Two variations on your idea:
- generate the initial object model to save time; you can still use
  reflection-based dynamic bindings as needed ... but expect to
  supplement the dynamic bindings with each release of the wfs specification.
- don't use an object model; use a data structure ... and generate the
  bindings to build the data structure. This is not a great idea, but we could
  run things off XML as an alternative to an object model (see the sketch
  after this list).
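
For illustration, the data-structure variant might amount to no more than this (the keys are entirely made up):

  import java.util.*;

  class DataStructureRequest {
      public static void main(String[] args) {
          // the parser would build this map straight from the xml
          Map<String, Object> req = new HashMap<String, Object>();
          req.put("version", "2.0.0");
          req.put("queries", Arrays.asList(
              Collections.singletonMap("typeName", "topp:states")));
          // operations read from the structure rather than from typed getters
          System.out.println(req.get("version"));
      }
  }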

Questions:
- can we focus on generating the bindings; perhaps with the assumption
that the single data structure can be accessed using reflection: either java
bean reflection, or treating the data model more like a simple record and
using java reflection directly against the fields? (a sketch of both follows)
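
Both flavours in miniature; the property and field names are invented:

  import java.beans.Introspector;
  import java.beans.PropertyDescriptor;
  import java.lang.reflect.Field;

  class Reflect {
      // java bean reflection: go through the getter
      static Object byBean(Object request, String property) throws Exception {
          for (PropertyDescriptor pd : Introspector.getBeanInfo(
                  request.getClass()).getPropertyDescriptors()) {
              if (pd.getName().equals(property)) {
                  return pd.getReadMethod().invoke(request);
              }
          }
          return null;
      }

      // record style: go straight at the field
      static Object byField(Object request, String field) throws Exception {
          Field f = request.getClass().getDeclaredField(field);
          f.setAccessible(true);
          return f.get(request);
      }
  }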

The second option would be to stick with xsd/emf and generate a new
model for the new spec. To reuse the operations across the different
object models, we would use emf reflection to access the model, so as
not to depend on any specific version of it.

You may also be able to ask EMF to have the generated models implement a common
interface used by both the old and the new object model in a few key areas.
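
The kind of shared surface I mean, with names invented (wfs 1.1 says maxFeatures where wfs 2.0 says count, so the common interface has to pick a neutral name):

  import java.math.BigInteger;

  // hand-written; both generated GetFeature models would implement this
  interface GetFeatureRequest {
      String getVersion();
      BigInteger getResultLimit(); // maxFeatures in 1.1, count in 2.0
  }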

Pros:
* Time. We still achieve the time saving on parsing/encoding work since
we are using EMF.
* Separation. It is nice to have the different object models separated
instead of trying to merge them into one beast.

+1

Cons:
* Reflection. Doing all access via reflection is a painful way to code.

Anyways, I am interested in hearing what people think, as there are surely
more pros/cons that should be considered, and possibly other alternatives
for how to proceed.

Assorted questions:
- You are treating the generated object model more as a data structure, are
  you not?
- Right now the main attraction of EMF seems to be reflection-based binding?
Can the same results be reproduced (in slower form) using normal Java beans
reflection?
- Are the object model capabilities of EMF flexible enough for us to insert a common
interface into the generated object models of WFS 1.1 and WFS 2.0? While I understand
the two must be different, what the rest of GeoServer needs out of both data structures
may be limited in scope.
- EMF supports adaptors; could we leave the generated models (one for each spec) alone
and write a minimal adaptor gathering up the information needed to support the rest of
GeoServer? (a sketch of what I mean follows below)
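
Roughly what I have in mind with the adaptor, all names invented (a real version might use EMF's Adapter machinery rather than a plain wrapper class):

  // the thin, hand-written view the rest of geoserver codes against
  interface GetFeatureView {
      String getVersion();
  }

  // Wfs20GetFeature stands in for whatever EMF generates from the 2.0 schema
  class Wfs20GetFeature {
      String getVersion() { return "2.0.0"; }
  }

  class Wfs20GetFeatureAdapter implements GetFeatureView {
      private final Wfs20GetFeature delegate;
      Wfs20GetFeatureAdapter(Wfs20GetFeature delegate) { this.delegate = delegate; }
      public String getVersion() { return delegate.getVersion(); }
  }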

Justin Deoliveira wrote:

The first would be to abandon the current architecture of autogenerating the object model from the xml schema and come up with a central, non-generated set of objects. Basically going back to the old way of doing things, and the way wms does it.

The pros of such an approach that I can see are:

* Flexibility. Often things from the xml schema don't translate across very well, or we want to model something slightly differently than the xml schema does.

* Stability. The service operations always get implemented in terms of a stable object model.

* Simplicity. This approach is much simpler than the gtxml/emf setup, which, through my own fault, has been over-architected in a lot of places.

The cons:

* Parsing/encoding work. One of the nice things about using EMF as the object model is that we can use dynamic bindings to do most of the parsing and encoding work, which would otherwise be a time-consuming task.

* Maintenance. There is still the burden of manually updating the internal object model to support changes in the spec. Depending on the changes this could be a significant task, since different versions of a spec can sometimes conflict.

The second option would be to stick with xsd/emf and generate a new model for the new spec. To reuse the operations across the different object models, we would use emf reflection to access the model, so as not to depend on any specific version of it.

I prefer this option. Maybe starting from a pojo object model that has
been generated once from the xml schemas (if I may dream, with the
documentation in the schema translated into the beans' javadocs), but
that can then be hacked by hand into the desired shape.
As for bindings, I believe the EMF advantage over javabean reflection
is that it knows what's inside collections?
However that could be addressed by a form of reflective binding that
just needs to know the collection data type.
Maybe a bindings generator could scan
the object model and interactively ask which class is the type of
each collection around, and then generate the bindings for those
(whilst for the classes that do not have collection children, the
normal javabeans reflection-based one would do).
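
Actually, if the hand-written model declares its collections with generics, plain java reflection can already recover the element type, so the generator might not even need to ask. A tiny sketch, with the types invented:

  import java.lang.reflect.Field;
  import java.lang.reflect.ParameterizedType;
  import java.util.List;

  class Query {}

  class GetFeature {
      List<Query> queries;
  }

  class CollectionTypeScan {
      public static void main(String[] args) throws Exception {
          Field f = GetFeature.class.getDeclaredField("queries");
          ParameterizedType t = (ParameterizedType) f.getGenericType();
          System.out.println(t.getActualTypeArguments()[0]); // prints "class Query"
      }
  }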

Actually I believe maintenance is a plus for this approach.
Not having to depend on Eclipse tooling, or to play with regenerating
the interfaces or changing the schemas or whatever else, should
make maintenance easier and lower the learning curve for
people who want to play with the GeoServer sources.
The times I had to massage an EMF model I dreamed of
being able to just change the classes (I always found the model
update process quite convoluted and hard to remember).

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.