[Geoserver-devel] GSIP 31 - Use Data Access API

Jody,

I have renamed the DataAccess proposal and now propose it as GSIP 31.
http://geoserver.org/display/GEOS/GSIP+31+-+Use+DataAccess+API

Note that the proposal is not specific to app-schema. It proposes
support for all DataAccess implementations.

I have also created a container Jira issue:
http://jira.codehaus.org/browse/GEOS-2568

It is my intent that the work be incremental and preserve existing
behaviour wherever possible. In particular, we want to maintain the
existing performance and simplicity of DataStore/SimpleFeature.

Preliminary analysis (and some test implementations) suggest that
Catalog requires no changes. The initial work will comprise changes in
ResourcePool, (catalog) DataStoreInfo, and (catalog) FeatureTypeInfo,
their implementations and consumers. I will submit a patch.

Open questions:

- What is the future of the vfny DataStoreInfo and vfny FeatureTypeInfo
classes? These are @deprecated but widely used. Are these to be retained
or should they or their consumers be refactored?

- Should DataStoreFinder/DataAccessFinder be retained as separate
classes or merged? (Here we are drifting into GeoTools).

- Which service modules are to be migrated first? Do we have any volunteers?

Other classes that may require work are RetypingDataStore, FeatureTypes,
and GeoServerFeatureLocking. No doubt, as the implementation proceeds,
more opportunities for scope expansion will become apparent. :slight_smile:

Tasks related to GSIP 31 should subtasks of GEOS-2568, to facilitate
communication.

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

Ben Caradoc-Davies ha scritto:

Jody,

I have renamed the DataAccess proposal and now propose it as GSIP 31.
http://geoserver.org/display/GEOS/GSIP+31+-+Use+DataAccess+API

Note that the proposal is not specific to app-schema. It proposes
support for all DataAccess implementations.

The general idea in the GSIP is fine, but it's a very, very big
topic that requires changes in both GeoTools and GeoServer.
So I'd say we need a linked improvement proposal in GeoTools
as well, and a vote in both control commitees.

The second thing that is required for any GSIP to pass (both
in GeoServer and in GeoTools) is resourcing.
As far as I know OpenGeo has agreed to only provide code
reviews for the changes that are going to happen, to make
sure we don't regress (too much) in the existing functionalities
while adding the new ones (and given the extensiveness
of the change, even only code reviews will take a lot
of time on our part, which means a significant financial
investment anyways). Anyways, I'm not the one paying the
bills on this one, Chris might want to chime in here :wink:

Do you have enough resources to pull this up on your own?
(the GSIP misses a resourcing part, the "who does what" one).
Moreover, what happens if someone pulls the plug mid-way?
The changes should be incremental enough to allow us to
keep them in even if the entire switch is not complete.

I have also created a container Jira issue:
http://jira.codehaus.org/browse/GEOS-2568

It is my intent that the work be incremental and preserve existing
behaviour wherever possible. In particular, we want to maintain the
existing performance and simplicity of DataStore/SimpleFeature.

Excellent.

Preliminary analysis (and some test implementations) suggest that
Catalog requires no changes. The initial work will comprise changes in
ResourcePool, (catalog) DataStoreInfo, and (catalog) FeatureTypeInfo,
their implementations and consumers. I will submit a patch.

Good. Yeah, I think that to preserve functionality/performance
one has to make really sure we use SimpleFeature when possible
in the long running tight loops (xml encoding, rendering, and so on),
I don't see super-big problems otherwise.
Of course that means keep on using simple feature types as well
when possible (a smart feature type builder could be used that
it recognizes when it's possible to return a simple feature type
maybe?).

Open questions:

- What is the future of the vfny DataStoreInfo and vfny FeatureTypeInfo
classes? These are @deprecated but widely used. Are these to be retained
or should they or their consumers be refactored?

- Should DataStoreFinder/DataAccessFinder be retained as separate
classes or merged? (Here we are drifting into GeoTools).

They could be merged, but since we're talking public api,
this should go thought a deprecation cycle.
Maybe DataAccessFinder could provide some methods that only return
data stores thought (what if you have code that can only handle a
DataStore anyways?).

- Which service modules are to be migrated first? Do we have any volunteers?

Complex features are all about WFS, so I'd say to start with that one?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Andrea Aime wrote:

Ben Caradoc-Davies ha scritto:

I have renamed the DataAccess proposal and now propose it as GSIP 31.
http://geoserver.org/display/GEOS/GSIP+31+-+Use+DataAccess+API

The general idea in the GSIP is fine, but it's a very, very big
topic that requires changes in both GeoTools and GeoServer.
So I'd say we need a linked improvement proposal in GeoTools
as well, and a vote in both control commitees.

Preliminary investigations indicate that only small changes are required in GeoTools. DataAccess/Feature/FeatureType (DAFFT), which is what I mean by "DataAccess API", is already well-supported in GeoTools. Furthermore, as a library with a decoupled architecture, GeoTools is amenable to extension, so I can readily add new builders and implementations without breaking backwards compatibility. The problem with GeoServer is its centralised resource management, which has not yet undergone the DAFFT transition to the same extent as GeoTools, and is a blocker of future development. This is why the proposal is restricted to GeoServer.

I anticipate that, at least for the first wave of changes, I may be able to get by with changes to GeoTools as small as spot cleaning with a little SimpleFeature Remover (TM). For example:
http://jira.codehaus.org/browse/GEOT-2270

The second thing that is required for any GSIP to pass (both
in GeoServer and in GeoTools) is resourcing.
As far as I know OpenGeo has agreed to only provide code
reviews for the changes that are going to happen, to make
sure we don't regress (too much) in the existing functionalities
while adding the new ones (and given the extensiveness
of the change, even only code reviews will take a lot
of time on our part, which means a significant financial
investment anyways). Anyways, I'm not the one paying the
bills on this one, Chris might want to chime in here :wink:

I would be delighted if OpenGeo:
1. Review the patches.
2. Run CITE tests.
3. Decide if performance is sufficient and no regressions are introduced.
4. Commit the changes.
5. The same ongoing support that is already provided.

Do you have enough resources to pull this up on your own?

I believe so. I am now working full-time on this task. It is essential for the success of my project that this work be completed. We can add more developers if required, although it is my preference that they continue on other areas (e.g. GeoTools app-schema).

Note also:
http://en.wikipedia.org/wiki/Mythical_Man_Month

(the GSIP misses a resourcing part, the "who does what" one).

I have not seen a GSIP that includes such a section.

I undertake to make the minimal changes that will allow DataAccess loading and encoding of WFS responses.

I do *not* propose to support DataAccess in WMS or other services, but some movement towards this functionality will likely be a by-product of the main/wfs work.

Moreover, what happens if someone pulls the plug mid-way?
The changes should be incremental enough to allow us to
keep them in even if the entire switch is not complete.

Changes will be designed to minimise impact on existing functionality, be incremental enough to preserve my sanity. And if the quality is not satisfactory, I trust that you will reject them. :slight_smile:

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

Sorry to chime in late. To add to Andrea's feedback, not only is resourcing important for this one, but also is the time. Since the changes are quite extensive in GeoServer they must be done in a way that does not disrupt the entire project. At the same time doing this on a branch has proven time and time again that it won't work.

An ideal timing situation I see is starting the work right after trunk branches to stable. For instance when current trunk branches to 2.0.x or when trunk branches to 2.1.x, etc... That will give you some freedom and some time to leave trunk somewhat unstable, at the same time not depart from the core code base.

As for patch review... i am happy to review patches but I ask that they be submitted in a clear way. Submitting a patch for the entire switch for example would not be manageable. Even patches at the module level are head to deal with.

So what I would like to see from this proposal is a set of subsystems that can be switched over in isolation. An example might be the WFS GetFeature. For each subsystem I would also like to see a jira and and a patch applied to the jira. I think that will make for the most efficient review process.

-Justin

Ben Caradoc-Davies wrote:

Jody,

I have renamed the DataAccess proposal and now propose it as GSIP 31.
http://geoserver.org/display/GEOS/GSIP+31+-+Use+DataAccess+API

Note that the proposal is not specific to app-schema. It proposes
support for all DataAccess implementations.

I have also created a container Jira issue:
http://jira.codehaus.org/browse/GEOS-2568

It is my intent that the work be incremental and preserve existing
behaviour wherever possible. In particular, we want to maintain the
existing performance and simplicity of DataStore/SimpleFeature.

Preliminary analysis (and some test implementations) suggest that
Catalog requires no changes. The initial work will comprise changes in
ResourcePool, (catalog) DataStoreInfo, and (catalog) FeatureTypeInfo,
their implementations and consumers. I will submit a patch.

Open questions:

- What is the future of the vfny DataStoreInfo and vfny FeatureTypeInfo
classes? These are @deprecated but widely used. Are these to be retained
or should they or their consumers be refactored?

- Should DataStoreFinder/DataAccessFinder be retained as separate
classes or merged? (Here we are drifting into GeoTools).

- Which service modules are to be migrated first? Do we have any volunteers?

Other classes that may require work are RetypingDataStore, FeatureTypes,
and GeoServerFeatureLocking. No doubt, as the implementation proceeds,
more opportunities for scope expansion will become apparent. :slight_smile:

Tasks related to GSIP 31 should subtasks of GEOS-2568, to facilitate
communication.

Kind regards,

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

Sorry to chime in late. To add to Andrea's feedback, not only is resourcing important for this one, but also is the time. Since the changes are quite extensive in GeoServer they must be done in a way that does not disrupt the entire project.

Of course.

> At the same time doing this on a

branch has proven time and time again that it won't work.

Indeed.

An ideal timing situation I see is starting the work right after trunk branches to stable. For instance when current trunk branches to 2.0.x or when trunk branches to 2.1.x, etc... That will give you some freedom and some time to leave trunk somewhat unstable, at the same time not depart from the core code base.

Is a freeze in effect on trunk? I was unaware of this. We can hold off if you are wanting to branch in the next few days.

As for patch review... i am happy to review patches but I ask that they be submitted in a clear way. Submitting a patch for the entire switch for example would not be manageable. Even patches at the module level are head to deal with.
So what I would like to see from this proposal is a set of subsystems that can be switched over in isolation. An example might be the WFS GetFeature. For each subsystem I would also like to see a jira and and a patch applied to the jira. I think that will make for the most efficient review process.

The switch cannot be done this way because (catalog) DataStoreInfo and (catalog) FeatureTypeInfo interfaces must be changed, and consequential changes affect several modules (main, data, wfs, wms). The changes are small, and consist mainly of type broadening. That is, the initial work consists of widespread but simple type and API changes that do not change the behaviour of GeoServer for DataStore/SimpleFeature/SimpleFeatureType. In my view, it is safest to apply these as an atomic commit across the codebase. Clean build before, clean build after. Functional enhancements will follow as later incremental patches.

If you really, really want, I can break up the switch into individual patches. I do not know how you prefer to work, by I would prefer to manage these as a single changeset. Given that we are changing interfaces, GS will not build until all the patches are applied, so it would seem to be making more work to split them up. Nonetheless, I am happy to do so if you want to follow this approach to patch submission.

Kind regards,
Ben.

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

Is a freeze in effect on trunk? I was unaware of this. We can hold off if you are wanting to branch in the next few days.

In the next few days no. But in the next few months yes. Since your proposal does not have any time lines attached to it, and is not on the road map with any time attached to it I am unable to determine when you plan to proceed and start submitting patches.

As for patch review... i am happy to review patches but I ask that they be submitted in a clear way. Submitting a patch for the entire switch for example would not be manageable. Even patches at the module level are head to deal with.
So what I would like to see from this proposal is a set of subsystems that can be switched over in isolation. An example might be the WFS GetFeature. For each subsystem I would also like to see a jira and and a patch applied to the jira. I think that will make for the most efficient review process.

The switch cannot be done this way because (catalog) DataStoreInfo and (catalog) FeatureTypeInfo interfaces must be changed, and consequential changes affect several modules (main, data, wfs, wms). The changes are small, and consist mainly of type broadening. That is, the initial work consists of widespread but simple type and API changes that do not change the behaviour of GeoServer for DataStore/SimpleFeature/SimpleFeatureType. In my view, it is safest to apply these as an atomic commit across the codebase. Clean build before, clean build after. Functional enhancements will follow as later incremental patches.

So you are planning an inside out approach? If that is the case yes I agree the work can't be broken up. But if you take an outside-in approach it might be able to. What i mean is starting web and services, and work down into core (main,data,etc...).

For instance... wfs should be able to be switched over w/o touching the catalog should it not? Since it will only be type broadening?

If you really, really want, I can break up the switch into individual patches. I do not know how you prefer to work, by I would prefer to manage these as a single changeset. Given that we are changing interfaces, GS will not build until all the patches are applied, so it would seem to be making more work to split them up. Nonetheless, I am happy to do so if you want to follow this approach to patch submission.

Kind regards,
Ben.

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

Is a freeze in effect on trunk? I was unaware of this. We can hold off if you are wanting to branch in the next few days.

In the next few days no. But in the next few months yes. Since your proposal does not have any time lines attached to it, and is not on the road map with any time attached to it I am unable to determine when you plan to proceed and start submitting patches.

The timeline for the first API changes is 30 Jan (two weeks from today). It is mostly done and will likely take a few more days, double and round up for troubleshooting and overheads.

We will get a better idea of how much work remains when we run community/app-schema/sample-data-access-test and later try to use gt-app-schema AppSchemaDataAccess. I want a working AppSchemaDataAccess wfs by the end of March.

I could not figure out how to add this to the Draft Roadmap, which is autogenerated from Jira. I am aiming to get these changes in to 2.0, but as long as they are on trunk I am happy. The proposal has stated 2.0 since the first draft was written by Jody.

I thought I needed to get a GSIP accepted before it was put on the roadmap? Chicken and egg?

So you are planning an inside out approach? If that is the case yes I agree the work can't be broken up. But if you take an outside-in approach it might be able to. What i mean is starting web and services, and work down into core (main,data,etc...).
For instance... wfs should be able to be switched over w/o touching the catalog should it not? Since it will only be type broadening?

One core service we require is resource loading, in particular, SPI detection and loading of DataAccessFactory implementations. Looking at ResourcePool indicates the changes required, and a clean global API change appears to be the simplest approach, hence the inside-out approach. Fix the infrastructure, then build on it.

Something closer to outside-in was the approach taken in 1.6.x (the web-c and wfs-c forks); this led to duplication, isolation, and the slow decay of these forks. I hope that the proposed work will lead to a more sustainable outcome.

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

The timeline for the first API changes is 30 Jan (two weeks from today). It is mostly done and will likely take a few more days, double and round up for troubleshooting and overheads.

We will get a better idea of how much work remains when we run community/app-schema/sample-data-access-test and later try to use gt-app-schema AppSchemaDataAccess. I want a working AppSchemaDataAccess wfs by the end of March.

I will be honest with you, I am skeptical of your time line given the current state of the proposal and the fact that you are more or less proposing a "wait and see" for an estimate of the work.

I guess I will wait for patches to make my final judgment and vote for or against the proposal, but this seems to be happening a bit opaquely. I have a lot of unanswered questions. I have not been asked to review any code up to this point. The only review that has been done has been internal afaik. And from previous emails on this subject I think there is a misconception that just b/c there is code in the geotools repository that people have been reviewing it.

And let me please re-inforce that I speak as someone who has tried to pull this work off before, not as someone who does not want to see it move forward, quite the contrary.

I could not figure out how to add this to the Draft Roadmap, which is autogenerated from Jira. I am aiming to get these changes in to 2.0, but as long as they are on trunk I am happy. The proposal has stated 2.0 since the first draft was written by Jody.

I thought I needed to get a GSIP accepted before it was put on the roadmap? Chicken and egg?

So you are planning an inside out approach? If that is the case yes I agree the work can't be broken up. But if you take an outside-in approach it might be able to. What i mean is starting web and services, and work down into core (main,data,etc...).
For instance... wfs should be able to be switched over w/o touching the catalog should it not? Since it will only be type broadening?

One core service we require is resource loading, in particular, SPI detection and loading of DataAccessFactory implementations. Looking at ResourcePool indicates the changes required, and a clean global API change appears to be the simplest approach, hence the inside-out approach. Fix the infrastructure, then build on it.

Something closer to outside-in was the approach taken in 1.6.x (the web-c and wfs-c forks); this led to duplication, isolation, and the slow decay of these forks. I hope that the proposed work will lead to a more sustainable outcome.

Hmmm... I don't think the inside-out approach was the cause of the problem. This is just my opinion but the lack of funding and failure to engage the core developer community is what led to these forks not going anywhere. Anyways, proceed with the patches as you see fit but if they are submitted in a way that is hard to manage, I warn you not to expect a timely review.

Again, I hope this email does not come across as too negative... I just know from experience that if these questions are left un asked this will result in yet another failed attempt at complex feature support.

-Justin

Kind regards,

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

I will be honest with you, I am skeptical of your time line given the current state of the proposal and the fact that you are more or less proposing a "wait and see" for an estimate of the work.

Scepticism is a good thing!

Estimation is hard, doubly so when I am the only person working on this project and this is my first time hacking the core of GeoServer. Only my exposure to DAFFT in GeoTools gives me some confidence.

Architectural wisdom is the hardest to obtain; fortunately I have benefited from the advice of Rob A, Gabriel, and Jody, amongst others.

In the absence of a detailed plan, I will use test-driven, incremental, iterative development to feel my way through the dark.

I guess I will wait for patches to make my final judgment and vote for or against the proposal, but this seems to be happening a bit opaquely.

The lack of information in the proposal reflects my lack of knowledge of GeoServer. Core developers are welcome to help and contribute to the plan. Gabriel and Jody have already done so.

I have a lot of unanswered questions.

So do I!

Hmmm... I don't think the inside-out approach was the cause of the problem. This is just my opinion but the lack of funding and failure to engage the core developer community is what led to these forks not going anywhere.

We have the resources, and are willing to engage, as demonstrated by this proposal. What more can we do?

Anyways, proceed with the patches as you see fit but if they are submitted in a way that is hard to manage, I warn you not to expect a timely review.

I ask only that if they are too cumbersome, you let me know so that I can revise and resubmit.

Again, I hope this email does not come across as too negative... I just know from experience that if these questions are left un asked this will result in yet another failed attempt at complex feature support.

Every attempt gets a bit better. Now we have DAFFT, built on the experience earned in the failures of the past, we should be able to get further. At least we should improve support for DAFFT in GeoServer core. Full support for complex features is a long, long road.

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

Hmmm... I don't think the inside-out approach was the cause of the
problem. This is just my opinion but the lack of funding and failure to
engage the core developer community is what led to these forks not going
anywhere.

Sounds like its worth being aware that in the last round of effort
these forks of geoserver modules were simply a tactic to allow us to
progress the core Geotools improvements required. Now is the time to
address the Geoserver end, and the resources are committed.

Limits to resource availability (especially at the reviwing end), and
general dynamic nature of the code base, requires a step by step
approach. Lets not throw a spanner in the works trying to argue that
because the first step didnt get us across the river we have chosen
the wrong place to cross. The question is simply, is there any better
way of doing this? We will try to ensure that the community has the
opportunity to review some coherent set of small changes that allow
progress to be made by the available resources. Yes - we need to test
how deep this river is at each step, and back out if its going to
drown us, but this looks feasible, and we dont see a better way of
getting across.

Rob

Sorry to chime in even later. A few thoughts

Ben Caradoc-Davies wrote:

Andrea Aime wrote:

Ben Caradoc-Davies ha scritto:

I have renamed the DataAccess proposal and now propose it as GSIP 31.
http://geoserver.org/display/GEOS/GSIP+31+-+Use+DataAccess+API

The general idea in the GSIP is fine, but it's a very, very big
topic that requires changes in both GeoTools and GeoServer.
So I'd say we need a linked improvement proposal in GeoTools
as well, and a vote in both control commitees.

Preliminary investigations indicate that only small changes are required in GeoTools. DataAccess/Feature/FeatureType (DAFFT), which is what I mean by "DataAccess API", is already well-supported in GeoTools. Furthermore, as a library with a decoupled architecture, GeoTools is amenable to extension, so I can readily add new builders and implementations without breaking backwards compatibility. The problem with GeoServer is its centralised resource management, which has not yet undergone the DAFFT transition to the same extent as GeoTools, and is a blocker of future development. This is why the proposal is restricted to GeoServer.

So my main problem with this paragraph is the notion that the DataAccess API is 'well supported' in GeoTools. Has it seen any real use in production? Has it been integrated in stable versions of any projects?

The part of the switch that scares me is what happens after the classes have been changed over. I've been a part of a feature model change and a data api change, and they are easy to do in GeoServer. But the bitch is in the months after when lots of bugs pop up.

So I guess the assurance more needed from my end is that there are tasked resources that we can assign issues to and have them fixed as quickly as possible. These will pop up probably until we hit 2.0.2. I want to be sure that the job is not seen as done when the data access api is on, for me that's only the very start of the job.

I anticipate that, at least for the first wave of changes, I may be able to get by with changes to GeoTools as small as spot cleaning with a little SimpleFeature Remover (TM). For example:
http://jira.codehaus.org/browse/GEOT-2270

The second thing that is required for any GSIP to pass (both
in GeoServer and in GeoTools) is resourcing.
As far as I know OpenGeo has agreed to only provide code
reviews for the changes that are going to happen, to make
sure we don't regress (too much) in the existing functionalities
while adding the new ones (and given the extensiveness
of the change, even only code reviews will take a lot
of time on our part, which means a significant financial
investment anyways). Anyways, I'm not the one paying the
bills on this one, Chris might want to chime in here :wink:

I would be delighted if OpenGeo:
1. Review the patches.
2. Run CITE tests.
3. Decide if performance is sufficient and no regressions are introduced.
4. Commit the changes.
5. The same ongoing support that is already provided.

Do you have enough resources to pull this up on your own?

All these are fine except I'm not sure on #5. If this introduces a bunch of bugs that no one is helping us fix we'll look in to backing out the changes. But that can easily be seen as much more support than we already supply. We are obviously willing to help fix a few bugs introduced, but we don't want to be left as the maintainers of a massive new work.

It doesn't sound like this is what you're saying, it sounds like you are committed to help out for the indefinite future. But I just want to be clear. We will be looking at the patches more closely than others since they are core and cut across everything, and no matter what will introduce bugs. I wish our testing was good enough to catch most all of the potential bugs immediately, but it's not.

I believe so. I am now working full-time on this task. It is essential for the success of my project that this work be completed. We can add more developers if required, although it is my preference that they continue on other areas (e.g. GeoTools app-schema).

When does funding on your project end? And are you able to task people on helping fix bugs until that time?

I apologize if any of this comes across as negative, it's really just us evaluating risk and trying to minimize it as much as possible. This will be a great improvement for GeoServer, and we're excited to grow the community.

best regards,

Chris

--
Chris Holmes
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Thanks Chris

These are of course exactly the right sort of questions.

And the good news is that we do indeed have a long term capability and willingness to support bug fixing, test support and build processes the changes.

The bad news is that testing of such stuff is extremely difficult until we can do regression tests against geoserver using the changes. Obviously some of these are available in the code base, but as you say, no set of tests is perfect (or free from bugs!) so we need to plan for this.

We would hope the community can generally pay some attention with all its private test cases as part of the review process of the incremental changes - to catch the bugs ASAP, with least pain. At least no one can complain that we are not doing this openly though :slight_smile:

Rob

-----Original Message-----
From: Chris Holmes [mailto:cholmes@anonymised.com]
Sent: Friday, 16 January 2009 2:44 PM
To: Caradoc-Davies, Ben (E&M, Kensington)
Cc: Andrea Aime; Woodcock, Robert (E&M, Kensington); Fraser, Ryan (E&M, Kensington); Geoserver-devel; Geotools-Devel list; Atkinson, Rob (CLW, Lucas Heights)
Subject: Re: [Geoserver-devel] GSIP 31 - Use Data Access API

Sorry to chime in even later. A few thoughts

Ben Caradoc-Davies wrote:

Andrea Aime wrote:

Ben Caradoc-Davies ha scritto:

I have renamed the DataAccess proposal and now propose it as GSIP 31.
http://geoserver.org/display/GEOS/GSIP+31+-+Use+DataAccess+API

The general idea in the GSIP is fine, but it's a very, very big
topic that requires changes in both GeoTools and GeoServer.
So I'd say we need a linked improvement proposal in GeoTools
as well, and a vote in both control commitees.

Preliminary investigations indicate that only small changes are required
in GeoTools. DataAccess/Feature/FeatureType (DAFFT), which is what I
mean by "DataAccess API", is already well-supported in GeoTools.
Furthermore, as a library with a decoupled architecture, GeoTools is
amenable to extension, so I can readily add new builders and
implementations without breaking backwards compatibility. The problem
with GeoServer is its centralised resource management, which has not yet
undergone the DAFFT transition to the same extent as GeoTools, and is a
blocker of future development. This is why the proposal is restricted to
GeoServer.

So my main problem with this paragraph is the notion that the DataAccess
API is 'well supported' in GeoTools. Has it seen any real use in
production? Has it been integrated in stable versions of any projects?

The part of the switch that scares me is what happens after the classes
have been changed over. I've been a part of a feature model change and
a data api change, and they are easy to do in GeoServer. But the bitch
is in the months after when lots of bugs pop up.

So I guess the assurance more needed from my end is that there are
tasked resources that we can assign issues to and have them fixed as
quickly as possible. These will pop up probably until we hit 2.0.2. I
want to be sure that the job is not seen as done when the data access
api is on, for me that's only the very start of the job.

I anticipate that, at least for the first wave of changes, I may be able
to get by with changes to GeoTools as small as spot cleaning with a
little SimpleFeature Remover (TM). For example:
http://jira.codehaus.org/browse/GEOT-2270

The second thing that is required for any GSIP to pass (both
in GeoServer and in GeoTools) is resourcing.
As far as I know OpenGeo has agreed to only provide code
reviews for the changes that are going to happen, to make
sure we don't regress (too much) in the existing functionalities
while adding the new ones (and given the extensiveness
of the change, even only code reviews will take a lot
of time on our part, which means a significant financial
investment anyways). Anyways, I'm not the one paying the
bills on this one, Chris might want to chime in here :wink:

I would be delighted if OpenGeo:
1. Review the patches.
2. Run CITE tests.
3. Decide if performance is sufficient and no regressions are introduced.
4. Commit the changes.
5. The same ongoing support that is already provided.

Do you have enough resources to pull this up on your own?

All these are fine except I'm not sure on #5. If this introduces a
bunch of bugs that no one is helping us fix we'll look in to backing out
the changes. But that can easily be seen as much more support than we
already supply. We are obviously willing to help fix a few bugs
introduced, but we don't want to be left as the maintainers of a massive
new work.

It doesn't sound like this is what you're saying, it sounds like you are
committed to help out for the indefinite future. But I just want to be
clear. We will be looking at the patches more closely than others since
they are core and cut across everything, and no matter what will
introduce bugs. I wish our testing was good enough to catch most all of
the potential bugs immediately, but it's not.

I believe so. I am now working full-time on this task. It is essential
for the success of my project that this work be completed. We can add
more developers if required, although it is my preference that they
continue on other areas (e.g. GeoTools app-schema).

When does funding on your project end? And are you able to task people
on helping fix bugs until that time?

I apologize if any of this comes across as negative, it's really just us
evaluating risk and trying to minimize it as much as possible. This
will be a great improvement for GeoServer, and we're excited to grow the
community.

best regards,

Chris

--
Chris Holmes
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Chris Holmes wrote:

So my main problem with this paragraph is the notion that the DataAccess API is 'well supported' in GeoTools. Has it seen any real use in production? Has it been integrated in stable versions of any projects?

No, nor will it, until it is supported by GeoServer. Chicken. Egg.

So I guess the assurance more needed from my end is that there are tasked resources that we can assign issues to and have them fixed as quickly as possible.

[snip]

If this introduces a bunch of bugs that no one is helping us fix we'll look in to backing out the changes.

As you should.

it sounds like you are committed to help out for the indefinite future.

Until June 2011.

But I just want to be clear. We will be looking at the patches more closely than others since they are core and cut across everything, and no matter what will introduce bugs.

Good.

I wish our testing was good enough to catch most all of the potential bugs immediately, but it's not.

No doubt an expanded and more diverse user base will help detect more bugs, and encourage better unit and regression testing.

When does funding on your project end?

June 2011.

And are you able to task people on helping fix bugs until that time?

Dr Robert Woodcock (AuScope Grid - Director) has confirmed to me that we will task people to help fix bugs in the GeoServer WFS DataAccess processing chain until June 2011.

We will not undertake to to support WMS, but I am confident that this work will also benefit the WMS implementation.

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer, CSIRO Exploration and Mining
Australian Resources Research Centre
26 Dick Perry Ave, Kensington WA 6151, Australia

Jody Garnett wrote:

    I guess I will wait for patches to make my final judgment and vote
    for or against the proposal, but this seems to be happening a bit
    opaquely.

I am not sure that approach works Justin - need a clear GSIP before starting work do we not?

The proposal lacks enough detail to get my vote so I am unsure of any other way to evaluate it. I of course support the proposal in theory (i would like to see us get to complex features), but I am more concerned with how it will be implemented, in terms of code.

    I have a lot of unanswered questions. I have not been asked to
    review any code up to this point. The only review that has been done
    has been internal afaik. And from previous emails on this subject I
    think there is a misconception that just b/c there is code in the
    geotools repository that people have been reviewing it.

    And let me please re-inforce that I speak as someone who has tried
    to pull this work off before, not as someone who does not want to
    see it move forward, quite the contrary.

This was one of my requests in asking Ben to make a test DataAccess available for unit tests to be written. Right now only Ben and Gabriel have hands on experience with the DataAccess API.

Perhaps we can review unit tests of the DataAccess API prior to accepting this proposal? That is the danger right - that this work will start and we will find out that the DataAccess API is missing something?

While unit tests would be a step forward, I would rather see a patch for a small section of the code base (non-trivial) ported to the new api. The fact that there seems to be an argument that everything has to be changed at once hints to me that this is not going to be a very clean transition.

Cheers,
Jody

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

chiming in (late)

Afaik these are some of the most important things to address geoserver wise:
- Catalog (already discussed and quite
- FilterFactory: it is namespace aware or its unuseful. The approach taken in
the app-schema modules is the one I like. I.e. PropertyName is given a ns
context (NamespaceSupport or such) which in turn is given to the FilterFactory
(as a hint? don't quite remember). GeoServer has knowledge of the (incoming
query) prefix to ns mappings so that's what it should pass.
- Namespaces as mere data objects: Recognize the fact that a single datastore
may be serving FeatureTypes in different namespaces (and we have two well known
candidates here: app-schema and wfs datastores) and stop assuming we can gige
the DataAccessFactory a "namespace" parameter and all and every FT it serves
will be tied to that namespace. Hence the need for trunk/Wicet UI, the old one
is just not up to the task, and the new one may need further tweaking.

There are more low level ones like like "generizicing" all those FeatureSource
decorators, overloading utility methods to support Feature/FeatureType, etc.

I guess the concern of availability of resources is already covered as
explained by Ben, as long as we give them enough support as to start making
this happen now. Waiting another six months may be disastrous.

So, what are we actually requiring them in order to give the go ahead? It
would be unfair to ask for a detailed plan with strict deadlines (anyone that
was in touch with this stuff knows how blurry that could be, working on
geotools trunk is that risky for set deadlines), but we could surely ask and
help in building a little more detailed plan that what's currently on the
GSIP.
About fear of broad changes, I can asure you that the tiniest change in
broading the typing of any DAFFT related class leads to a lot of cross cutting
changes, which doesn't mean we can't make then the smaller our codebase
allows.
As one that tried to figure out what it would mean doing this job, I strongly
assess that a bit of clean up is requisite zero. Clean up meaning getting rid
of all those deprecated classes, from old servlets to configuration; or you'll
end up getting frustrated by the number of backward compatibility _hacks_ you
need to make to preserve Data and friends instead of using the new
configuration classes directly.

So why don't we laid the plan out as the set of functional (very few) and non
functional (the most) requirements and plan on each of them?
for example, not messing up with performance was clearly stated as a non
functional one, so write this down and state what's going to be done in order
to preserve performance.

I guess my point is if we can turn that wiki page into a set of requirements,
a feature plan and a risk assessment and contention plan, we would be making
progress faster than exposing everyone's concerns on a email trhead which
makes loosing the big picture too easy.

My 2c.-

Gabriel

On Monday 19 January 2009 12:17:18 Justin Deoliveira wrote:

Jody Garnett wrote:
> I guess I will wait for patches to make my final judgment and vote
> for or against the proposal, but this seems to be happening a bit
> opaquely.
>
>
> I am not sure that approach works Justin - need a clear GSIP before
> starting work do we not?

The proposal lacks enough detail to get my vote so I am unsure of any
other way to evaluate it. I of course support the proposal in theory (i
would like to see us get to complex features), but I am more concerned
with how it will be implemented, in terms of code.

> I have a lot of unanswered questions. I have not been asked to
> review any code up to this point. The only review that has been done
> has been internal afaik. And from previous emails on this subject I
> think there is a misconception that just b/c there is code in the
> geotools repository that people have been reviewing it.
>
> And let me please re-inforce that I speak as someone who has tried
> to pull this work off before, not as someone who does not want to
> see it move forward, quite the contrary.
>
>
> This was one of my requests in asking Ben to make a test DataAccess
> available for unit tests to be written. Right now only Ben and Gabriel
> have hands on experience with the DataAccess API.
>
> Perhaps we can review unit tests of the DataAccess API prior to
> accepting this proposal? That is the danger right - that this work will
> start and we will find out that the DataAccess API is missing something?

While unit tests would be a step forward, I would rather see a patch for
a small section of the code base (non-trivial) ported to the new api.
The fact that there seems to be an argument that everything has to be
changed at once hints to me that this is not going to be a very clean
transition.

> Cheers,
> Jody

--
Gabriel Roldan
OpenGeo - http://www.opengeo.org

Gabriel I am concerned by both the “clean up cruft” line in your origional plan; and your commends about FilterFactory and issues with namespace in this email.

It sounds like you recognize the breath of scope you are talking about it beyond the requirements of this current proposal. This proposal cannot be a general request to clean up everything in GeoServer; it must remain focused on just making the transition to DataAccess.

Can we do an IRC breakout with Ben focused on cleaning up this proposal into the state it can be voted on

Jody

On Tue, Jan 20, 2009 at 2:38 AM, Gabriel Roldan <groldan@anonymised.com> wrote:

chiming in (late)

Afaik these are some of the most important things to address geoserver wise:

  • Catalog (already discussed and quite
  • FilterFactory: it is namespace aware or its unuseful. The approach taken in
    the app-schema modules is the one I like. I.e. PropertyName is given a ns
    context (NamespaceSupport or such) which in turn is given to the FilterFactory
    (as a hint? don’t quite remember). GeoServer has knowledge of the (incoming
    query) prefix to ns mappings so that’s what it should pass.
  • Namespaces as mere data objects: Recognize the fact that a single datastore
    may be serving FeatureTypes in different namespaces (and we have two well known
    candidates here: app-schema and wfs datastores) and stop assuming we can gige
    the DataAccessFactory a “namespace” parameter and all and every FT it serves
    will be tied to that namespace. Hence the need for trunk/Wicet UI, the old one
    is just not up to the task, and the new one may need further tweaking.

There are more low level ones like like “generizicing” all those FeatureSource
decorators, overloading utility methods to support Feature/FeatureType, etc.

I guess the concern of availability of resources is already covered as
explained by Ben, as long as we give them enough support as to start making
this happen now. Waiting another six months may be disastrous.

So, what are we actually requiring them in order to give the go ahead? It
would be unfair to ask for a detailed plan with strict deadlines (anyone that
was in touch with this stuff knows how blurry that could be, working on
geotools trunk is that risky for set deadlines), but we could surely ask and
help in building a little more detailed plan that what’s currently on the
GSIP.
About fear of broad changes, I can asure you that the tiniest change in
broading the typing of any DAFFT related class leads to a lot of cross cutting
changes, which doesn’t mean we can’t make then the smaller our codebase
allows.
As one that tried to figure out what it would mean doing this job, I strongly
assess that a bit of clean up is requisite zero. Clean up meaning getting rid
of all those deprecated classes, from old servlets to configuration; or you’ll
end up getting frustrated by the number of backward compatibility hacks you
need to make to preserve Data and friends instead of using the new
configuration classes directly.

So why don’t we laid the plan out as the set of functional (very few) and non
functional (the most) requirements and plan on each of them?
for example, not messing up with performance was clearly stated as a non
functional one, so write this down and state what’s going to be done in order
to preserve performance.

I guess my point is if we can turn that wiki page into a set of requirements,
a feature plan and a risk assessment and contention plan, we would be making
progress faster than exposing everyone’s concerns on a email trhead which
makes loosing the big picture too easy.

My 2c.-

Gabriel

On Monday 19 January 2009 12:17:18 Justin Deoliveira wrote:

Jody Garnett wrote:

I guess I will wait for patches to make my final judgment and vote
for or against the proposal, but this seems to be happening a bit
opaquely.

I am not sure that approach works Justin - need a clear GSIP before
starting work do we not?

The proposal lacks enough detail to get my vote so I am unsure of any
other way to evaluate it. I of course support the proposal in theory (i
would like to see us get to complex features), but I am more concerned
with how it will be implemented, in terms of code.

I have a lot of unanswered questions. I have not been asked to
review any code up to this point. The only review that has been done
has been internal afaik. And from previous emails on this subject I
think there is a misconception that just b/c there is code in the
geotools repository that people have been reviewing it.

And let me please re-inforce that I speak as someone who has tried
to pull this work off before, not as someone who does not want to
see it move forward, quite the contrary.

This was one of my requests in asking Ben to make a test DataAccess
available for unit tests to be written. Right now only Ben and Gabriel
have hands on experience with the DataAccess API.

Perhaps we can review unit tests of the DataAccess API prior to
accepting this proposal? That is the danger right - that this work will
start and we will find out that the DataAccess API is missing something?

While unit tests would be a step forward, I would rather see a patch for
a small section of the code base (non-trivial) ported to the new api.
The fact that there seems to be an argument that everything has to be
changed at once hints to me that this is not going to be a very clean
transition.

Cheers,
Jody

Gabriel Roldan
OpenGeo - http://www.opengeo.org