[Geoserver-devel] Problems integrating app-schema DataAccess with GeoServer on trunk

Jody,

I have performed the initial port of the app-schema DataAccess (formerly known as ComplexDataStore) to trunk. Although there is some support in DataAccessFinder for using SPI to find and load implementations of DataAccessFactory, this implementation is not used by GeoServer.

How should we proceed? Would the GeoServer community like to support DataAccess providers, or should we work around this problem? The solution on the 1.6.x branch was to fork the wfs and web modules into wfs-c and web-c. Now that we have DataAccess, we can do better.

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

Ben Caradoc-Davies wrote:

> Jody,
>
> I have performed the initial port of the app-schema DataAccess (formerly known as ComplexDataStore) to trunk. Although there is some support in DataAccessFinder for using SPI to find and load implementations of DataAccessFactory, this implementation is not used by GeoServer.
>
> How should we proceed? Would the GeoServer community like to support DataAccess providers, or should we work around this problem? The solution on the 1.6.x branch was to fork the wfs and web modules into wfs-c and web-c. Now that we have DataAccess, we can do better.

First a question: does DataAccessFinder pick up data stores too?
It should, since a DataStore is a DataAccess.

If so, I guess a limited patch could do the job: most of the
code accesses the catalog in search of a FeatureSource/Store,
and DataAccess provides them, so most of the intermediate code,
that is, the code that builds up queries, maps and the like
should keep on working untouched.

What needs modifications is whatever deals with FeatureType and
Feature, and there we have to be very, very careful to avoid
putting the code into silly slow code paths when that is not
necessary (I expect the part that needs care to be Feature
manipulation; FeatureType-wise it'll be slower, but not to a
point where it can be noticed).
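A minimal sketch of the kind of fast path being asked for: check once whether a feature is simple and, if so, use its flat attribute list directly; only genuinely complex features pay for the generic, name-based walk. `Feature`, `SimpleFeature`, `MapFeature`, `ListFeature` and `ValueExtractor` are hypothetical stand-ins, not the GeoAPI/GeoTools interfaces, so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the feature model (NOT the real GeoAPI interfaces).
interface Feature {
    // Generic access: every top-level property as an ordered name -> value map.
    Map<String, Object> getProperties();
}

interface SimpleFeature extends Feature {
    // Simple features additionally expose flat, index-based attribute access.
    List<Object> getAttributes();
}

final class MapFeature implements Feature {
    private final Map<String, Object> props;
    MapFeature(Map<String, Object> props) { this.props = props; }
    public Map<String, Object> getProperties() { return props; }
}

final class ListFeature implements SimpleFeature {
    private final List<String> names;
    private final List<Object> values;
    ListFeature(List<String> names, List<Object> values) { this.names = names; this.values = values; }
    public List<Object> getAttributes() { return values; }
    public Map<String, Object> getProperties() {
        // Building the map is the "slow" generic view of a simple feature.
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) m.put(names.get(i), values.get(i));
        return m;
    }
}

final class ValueExtractor {
    /** Returns the top-level values, taking the cheap path when the feature is simple. */
    static List<Object> values(Feature f) {
        if (f instanceof SimpleFeature) {
            // Fast path: reuse the existing attribute list, no map is built.
            return ((SimpleFeature) f).getAttributes();
        }
        // Generic path: works for any Feature, at the cost of walking the property map.
        return new ArrayList<>(f.getProperties().values());
    }
}
```

The `instanceof` check is a single cheap type test per feature, so the simple-feature path stays essentially as fast as before.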

So what I'd like to see is a set of incremental (small) patches
that unlock the usage of DataAccess and complex features
one bit at a time, going up from the catalog to the
output producers.
It would be nice if each patch could be reviewed and then
committed.
I also expect to see tests, so that we can prove stuff
is working and will keep on working as GeoServer evolves.

Can the complex data store be used on top of one or more property
data stores? This is how we do functional testing now, and it
works very nicely because there is no need to set up external
databases.

Cheers
Andrea
--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

> So what I'd like to see is a set of incremental (small) patches
> that unlock the usage of DataAccess and complex features
> one bit at a time, going up from the catalog to the
> output producers.
> It would be nice if each patch could be reviewed and then
> committed.
> I also expect to see tests, so that we can prove stuff
> is working and will keep on working as GeoServer evolves.

This sounds ideal; we really need to have an efficient way to roll in
key changes in the back-end and have confidence that it's all working
properly.

I suspect we may have a few situations where upstream modules may need
to force some type narrowing to keep working unaffected with a more
general backend.

> Can the complex data store be used on top of one or more property
> data stores? This is how we do functional testing now, and it
> works very nicely because there is no need to set up external
> databases.

Yes - that's exactly how we do unit tests for the "application schemas"
(note the name change: many application schemas will follow the GML simple
features profile)


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Andrea Aime wrote:

> First a question: does DataAccessFinder pick up data stores too?
> It should, since a DataStore is a DataAccess.

This is not a given (I don't think SPI considers inheritance), but it should work because DataAccessFinder.getAllDataStores goes out of its way to call DataStoreFinder to get all the DataStoreFactorySpi implementations as well as the DataAccessFactory implementations.
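Ben's point about SPI and inheritance can be illustrated with plain `java.util.ServiceLoader`, which only reads the `META-INF/services` file named after the exact type passed to `load`; a factory registered only under the `DataStoreFactorySpi` name is not returned when asking for `DataAccessFactory`. A minimal sketch, assuming hypothetical stand-in interfaces named after the GeoTools ones (GeoTools' own factory registry and the real `DataAccessFinder` code are not shown): query both registration names and merge, dropping duplicates by class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical stand-ins (NOT the real GeoTools interfaces).
interface DataAccessFactory { String getDisplayName(); }
interface DataStoreFactorySpi extends DataAccessFactory { }

final class AllFactories {
    /** Two separate ServiceLoader queries, one per registration name, merged into one list. */
    static List<DataAccessFactory> discover() {
        return merge(ServiceLoader.load(DataAccessFactory.class),
                     ServiceLoader.load(DataStoreFactorySpi.class));
    }

    /** Merge both provider lists, keeping the first instance seen for each implementation class. */
    static List<DataAccessFactory> merge(Iterable<? extends DataAccessFactory> accessFactories,
                                         Iterable<? extends DataAccessFactory> storeFactories) {
        Map<Class<?>, DataAccessFactory> byClass = new LinkedHashMap<>();
        for (DataAccessFactory f : accessFactories) byClass.putIfAbsent(f.getClass(), f);
        for (DataAccessFactory f : storeFactories) byClass.putIfAbsent(f.getClass(), f);
        return new ArrayList<>(byClass.values());
    }
}
```

De-duplicating by class is one possible choice; it assumes a provider registered under both names should be offered only once.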

> If so, I guess a limited patch could do the job: most of the
> code accesses the catalog in search of a FeatureSource/Store,
> and DataAccess provides them, so most of the intermediate code,
> that is, the code that builds up queries, maps and the like
> should keep on working untouched.

We will also need to get the web configuration interface to ignore DataAccess implementations until it can handle them.
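A minimal sketch of how the web configuration could hide DataAccess-only factories until it can handle them: only factories that are also `DataStoreFactorySpi` are offered. The interfaces and `WebUiFactoryFilter` are hypothetical stand-ins; the real GeoTools factories expose much more (parameter metadata, availability checks, and so on).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins (NOT the real GeoTools interfaces).
interface DataAccessFactory { String getDisplayName(); }
interface DataStoreFactorySpi extends DataAccessFactory { }

final class WebUiFactoryFilter {
    /** Keep only the factories a simple-features-only configuration UI knows how to handle. */
    static List<DataStoreFactorySpi> configurable(List<? extends DataAccessFactory> all) {
        return all.stream()
                  .filter(f -> f instanceof DataStoreFactorySpi)   // DataAccess-only factories are skipped
                  .map(f -> (DataStoreFactorySpi) f)
                  .collect(Collectors.toList());
    }
}
```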

> What needs modifications is whatever deals with FeatureType and
> Feature, and there we have to be very, very careful to avoid
> putting the code into silly slow code paths when that is not
> necessary (I expect the part that needs care to be Feature
> manipulation; FeatureType-wise it'll be slower, but not to a
> point where it can be noticed).

Would you be happy with instanceof tests, or can you recommend a better pattern? Strategy, perhaps?

> So what I'd like to see is a set of incremental (small) patches
> that unlock the usage of DataAccess and complex features
> one bit at a time, going up from the catalog to the
> output producers.
> It would be nice if each patch could be reviewed and then
> committed.
> I also expect to see tests, so that we can prove stuff
> is working and will keep on working as GeoServer evolves.

For testing purposes, it might be necessary to write a skeletal (for example) TestDataAccess, so that GeoServer does not depend on app-schema. In fact, this is where we should start (test-driven development).
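A minimal sketch of what a skeletal `TestDataAccess` might look like: an in-memory test double holding features per type name, so GeoServer tests need no dependency on app-schema. `SimpleDataAccess` is a hypothetical, heavily reduced contract (the real GeoTools `DataAccess` has many more methods and returns feature sources rather than lists), and the type name in the usage below is illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced DataAccess-like contract (NOT the real GeoTools interface).
interface SimpleDataAccess<F> {
    List<String> getNames();
    List<F> getFeatures(String typeName);   // stands in for getFeatureSource(name).getFeatures()
    void dispose();
}

/** In-memory test double: contents are registered up front, nothing touches disk or a database. */
final class TestDataAccess<F> implements SimpleDataAccess<F> {
    private final Map<String, List<F>> contents = new LinkedHashMap<>();
    private boolean disposed;

    TestDataAccess<F> with(String typeName, List<F> features) {
        contents.put(typeName, new ArrayList<>(features));
        return this;
    }

    public List<String> getNames() {
        checkOpen();
        return new ArrayList<>(contents.keySet());
    }

    public List<F> getFeatures(String typeName) {
        checkOpen();
        List<F> features = contents.get(typeName);
        if (features == null) throw new IllegalArgumentException("Unknown type: " + typeName);
        return Collections.unmodifiableList(features);
    }

    public void dispose() { disposed = true; }

    private void checkOpen() {
        if (disposed) throw new IllegalStateException("TestDataAccess already disposed");
    }
}
```

A test would build one with, for example, `new TestDataAccess<Object>().with("ex:ComplexFeature", features)` and register it in the catalog in place of a real store.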

> Can the complex data store be used on top of one or more property
> data stores? This is how we do functional testing now, and it
> works very nicely because there is no need to set up external
> databases.

Yes. It uses any DataStore. Most of the app-schema unit tests use property files.

Kind regards,

--
Ben Caradoc-Davies

Ben Caradoc-Davies wrote:

> Andrea Aime wrote:
>
> > First a question: does DataAccessFinder pick up data stores too?
> > It should, since a DataStore is a DataAccess.
>
> This is not a given (I don't think SPI considers inheritance), but it should work because DataAccessFinder.getAllDataStores goes out of its way to call DataStoreFinder to get all the DataStoreFactorySpi implementations as well as the DataAccessFactory implementations.

Hum, ok, that does not sound like a big problem anyways.

> > If so, I guess a limited patch could do the job: most of the
> > code accesses the catalog in search of a FeatureSource/Store,
> > and DataAccess provides them, so most of the intermediate code,
> > that is, the code that builds up queries, maps and the like
> > should keep on working untouched.
>
> We will also need to get the web configuration interface to ignore DataAccess implementations until it can handle them.

Correct.

> > What needs modifications is whatever deals with FeatureType and
> > Feature, and there we have to be very, very careful to avoid
> > putting the code into silly slow code paths when that is not
> > necessary (I expect the part that needs care to be Feature
> > manipulation; FeatureType-wise it'll be slower, but not to a
> > point where it can be noticed).
>
> Would you be happy with instanceof tests, or can you recommend a better pattern? Strategy, perhaps?

Ideally speaking, choose an optimized accessor before starting to work
on the features, and then keep on using that one for the whole
encoding session.
This is what the PropertyName impl already does, in a way: it caches
the last used accessor... not ideal, since it has to second-guess
itself and check every time whether the accessor is still good, but usually
good enough.
For XML encoding, for example, it would be nice to have a Feature binding
and a simple Feature binding; or, yes, just do an instanceof at runtime,
since instanceof is blazing fast.
I guess we'll have to decide on a case-by-case basis, trying to strike
a balance between cleanness and the resources required to implement
a certain approach.
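A minimal sketch of the "cache the last accessor, but second-guess it" behaviour described for PropertyName: the cached accessor is re-validated with `canHandle` on every call, and the full candidate list is scanned only when it no longer fits. `Accessor`, `MapAccessor`, `ListAccessor` and `CachingPropertyReader` are hypothetical; the real GeoTools property accessor API differs, and plain `Map`/`List` objects stand in for two different feature models.

```java
import java.util.List;
import java.util.Map;

// Hypothetical accessor abstraction (NOT the real GeoTools API).
interface Accessor {
    boolean canHandle(Object target, String property);
    Object get(Object target, String property);
}

/** Reads a key from java.util.Map targets. */
final class MapAccessor implements Accessor {
    public boolean canHandle(Object target, String property) {
        return target instanceof Map && ((Map<?, ?>) target).containsKey(property);
    }
    public Object get(Object target, String property) { return ((Map<?, ?>) target).get(property); }
}

/** Reads a numeric index such as "0" from java.util.List targets. */
final class ListAccessor implements Accessor {
    public boolean canHandle(Object target, String property) {
        if (!(target instanceof List)) return false;
        try {
            int i = Integer.parseInt(property);
            return i >= 0 && i < ((List<?>) target).size();
        } catch (NumberFormatException e) {
            return false;
        }
    }
    public Object get(Object target, String property) {
        return ((List<?>) target).get(Integer.parseInt(property));
    }
}

/** Caches the last accessor that worked and re-checks it on every call. */
final class CachingPropertyReader {
    private final String property;
    private final List<Accessor> candidates;
    private Accessor last;   // cached accessor, possibly stale
    int lookups;             // number of full scans of the candidate list

    CachingPropertyReader(String property, List<Accessor> candidates) {
        this.property = property;
        this.candidates = candidates;
    }

    Object evaluate(Object target) {
        if (last != null && last.canHandle(target, property)) {
            return last.get(target, property);   // cheap path: cached accessor still valid
        }
        lookups++;
        for (Accessor a : candidates) {
            if (a.canHandle(target, property)) {
                last = a;
                return a.get(target, property);
            }
        }
        throw new IllegalArgumentException("No accessor for " + property + " on " + target);
    }
}
```

The variant described as ideal drops the per-call `canHandle` re-check entirely: pick the accessor once, when the feature type is known before encoding starts, and reuse it for the whole session.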

> > So what I'd like to see is a set of incremental (small) patches
> > that unlock the usage of DataAccess and complex features
> > one bit at a time, going up from the catalog to the
> > output producers.
> > It would be nice if each patch could be reviewed and then
> > committed.
> > I also expect to see tests, so that we can prove stuff
> > is working and will keep on working as GeoServer evolves.
>
> For testing purposes, it might be necessary to write a skeletal (for example) TestDataAccess, so that GeoServer does not depend on app-schema. In fact, this is where we should start (test-driven development).

Sounds good to me.

> > Can the complex data store be used on top of one or more property
> > data stores? This is how we do functional testing now, and it
> > works very nicely because there is no need to set up external
> > databases.
>
> Yes. It uses any DataStore. Most of the app-schema unit tests use property files.

Great!
Cheers
Andrea


You should be talking with Gabriel on the GeoServer list to sort out a plan. I imagine the plan will be presented to that list as a GSIP proposal thing. Gabriel went through the process of introducing DataAccess and DataAccessFactory into the GeoTools codebase ... I am not sure what the best progression is to introduce it into GeoServer.

We should continue further discussion on the GeoServer list; if there is any API change required (as you start using DataAccess for real) please let me know - it should be within the scope of Gabriel's original DataAccess proposal.

Jody


Hi, chiming in late as usual...
(ahem... it turned out to be a long post, so sit down...)

My thoughts about integrating app-schema, and hence DataAccess, into GeoServer
are driven by my vision, which I'm not sure is the best one, but here it
goes.

Of course we do need a plan. What I'd really like to see is the whole
GeoServer code base rephrased in terms of DataAccess/Feature/FeatureType.

For the most part it shouldn't be that hard, since GeoServer essentially
shouldn't be doing a lot of that sort of resource handling by itself. But
there are always details that'll hit us hard.

What I mean is, ideally GeoServer picks up an OWS request, performs a GeoTools
request and encodes the result. That, and configuration. So it makes sense to
me for GeoServer to treat them all the generic way (DataAccess family)
instead of the special-case way (DataStore family).

Truth is, GeoServer does a lot of tweaking to GeoTools feature-related stuff,
like having its own feature reader decorators etc.

For instance, this is the number of references in GeoServer trunk to each
of the affected interfaces:
SimpleFeatureType: 532
SimpleFeature: 527
DataStore: 128
FeatureCollection<SFT, SF>: 204
FeatureSource<SFT, SF>: 185
FeatureStore<SFT,SF>: 71
FeatureReader<SFT, SF>: 22
SimpleFeatureBuilder: 33
SimpleFeatureTypeBuilder: 20

I am not sure how many of those are going to be mechanical changes and how many
will require additional thinking. Off the top of my head, what I can think of
is:
- everything that needs to build a Feature/Type uses
SimpleFeatureBuilder/SimpleFeatureTypeBuilder. We need generic Feature
replacements for them that still make use of the improved implementations
when the target schema is "simple". Also, since everything assumes simple
features, all the SimpleFeatureType-specific methods are used (eg,
SimpleFT.getAttributeDescriptors():List<AttributeDescriptor> instead of
FeatureType.getDescriptors():List<PropertyDescriptor>). This means getting rid
of all those assumptions requires adding a lot of tests (eg, ResourcePool has
no test case of its own), but the hardest part is perhaps working out whether,
once the code is adapted to FeatureType, the assumptions it makes are
still valid.

- FeatureReader/FeatureCollection etc. decorators need to be updated
- There is no support in GeoTools for a lot of widely used utility methods to
work on Feature/Type (eg, DataUtilities.subType, etc)
- Every GeoTools implementation of the above interfaces works on
SFT/SF, and a lot of them are abstract base classes GeoServer extends (eg,
DataFeatureCollection)
- DataAccess is defined in terms of FeatureSource/Store/Locking; there is no
FeatureWriter in there. If FeatureWriter turns out to be needed, DataAccess will
need to be reviewed. Afaik FeatureWriter as obtained from DataStore acts as
an editable cursor, and at the time of writing DataAccess it was not clear
that it was a very used/needed approach, in contrast to
FeatureStore.add/modify/removeFeatures()
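For the decorator item, one way to avoid duplicating work is to write decorators once against the type parameters, the way GeoTools' `FeatureReader<T, F>` is parameterized, so a decorator that never inspects feature contents serves simple and complex features alike. A minimal sketch; `Reader`, `FilteringReader` and `ListReader` are hypothetical, reduced stand-ins, not the GeoTools classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical reader contract shaped like FeatureReader<T, F> (NOT the real interface).
interface Reader<T, F> extends AutoCloseable {
    T getType();
    boolean hasNext();
    F next();
    @Override void close();
}

/** Filtering decorator written once against <T, F>; assumes the delegate never returns null. */
final class FilteringReader<T, F> implements Reader<T, F> {
    private final Reader<T, F> delegate;
    private final Predicate<? super F> filter;
    private F lookahead;   // next matching feature, if already fetched

    FilteringReader(Reader<T, F> delegate, Predicate<? super F> filter) {
        this.delegate = delegate;
        this.filter = filter;
    }

    public T getType() { return delegate.getType(); }

    public boolean hasNext() {
        // Pull from the delegate until a match is found or it is exhausted.
        while (lookahead == null && delegate.hasNext()) {
            F candidate = delegate.next();
            if (filter.test(candidate)) lookahead = candidate;
        }
        return lookahead != null;
    }

    public F next() {
        if (!hasNext()) throw new NoSuchElementException();
        F result = lookahead;
        lookahead = null;
        return result;
    }

    public void close() { delegate.close(); }
}

/** Trivial list-backed reader used to exercise the decorator. */
final class ListReader<T, F> implements Reader<T, F> {
    private final T type;
    private final Iterator<F> it;
    boolean closed;

    ListReader(T type, List<F> features) {
        this.type = type;
        this.it = new ArrayList<>(features).iterator();
    }

    public T getType() { return type; }
    public boolean hasNext() { return it.hasNext(); }
    public F next() { return it.next(); }
    public void close() { closed = true; }
}
```

Only decorators that actually look inside features (retyping, attribute subsetting in the style of `DataUtilities.subType`) would still need simple- and complex-specific code.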

If I had to start laying out a plan, I would say small steps, bottom up, a
subsystem at a time (if only our "subsystems" were as decoupled as
they could be...). It would be something like the pseudo-plan below, but
first a reasoning: if we're going to do this on trunk, which I think we
should, we need to be aware that it implies a lot of deviation from 1.7.x.
That is ok since trunk's goal is 2.0, but be aware of the possible
extra complexity in forward porting fixes from 1.7.x to trunk.

General plan: divide the refactoring into per-subsystem iterations; for each one,
refactor and add tests as required, make sure all CITE tests pass, and cut a
milestone.

The attached GeoServer modules dependency graph is meant to help picture this.
It does not contain transitive deps nor direct redundant ones, for the
sake of clarity.

0) get rid of as many deprecated classes as possible. Yes, there are still a lot.
Starting with the old-style, servlet-based services and friends.
1) port Catalog to DataAccess/Feature/FeatureType (DAFFT from now on :-) ).
This will force changes both in GeoTools and in the GeoServer classes
using DataStoreInfo/FeatureTypeInfo etc, and may provide a good sense of
the actual effort required.
2) port the rest of the main module
3) port validation and wfs
4) port wcs and wms
5) port web (the new one; get rid of the old one)

As you can see, this is not going to be a cheap process to accomplish. Yet,
it's one worth going for, imho. But I'm sure I'm missing a lot of details, or
there might even be a smarter approach, so please feel free to
argue.

best regards,

Gabriel

On Thursday 13 November 2008 09:17:19 pm Jody Garnett wrote:

> You should be talking with Gabriel on the GeoServer list to sort out a
> plan. I imagine the plan will be presented to that list as a GSIP
> proposal thing. Gabriel went through the process of introducing
> DataAccess and DataAccessFactory into the GeoTools codebase ... I am not
> sure what the best progression is to introduce it into GeoServer.
>
> We should continue further discussion on the GeoServer list; if there is
> any API change required (as you start using DataAccess for real) please
> let me know - it should be within the scope of Gabriel's original
> DataAccess proposal.
>
> Jody



(attachments)

geoserver_depgraph.png