[Geoserver-devel] GSIP 69 - Catalog scalability enhancements - proof of concept

Moving forward with the discussion about the code being “proof of concept”.

By “proof of concept” I don’t mean that the code is bad, I mean that it’s at a stage
where it proves the concept but it’s not ready to be integrated in a code
base that was meant to become stable soon, simply because it’s last minute
(last minute changes are rarely good) and incomplete.

This one is mostly about what’s happening above the catalog.
Right now we have some exemplar use cases, which have indeed been picked with care.
However, there are only three of them, and that’s my biggest worry: I’m pretty sure that by
developing the full switch you would have seen more use cases and found more
bugs in the implementations.

Now, in order to reap the benefits of the scalable API one has to make the code
actually use the new scalable methods whenever large amounts of data are read
from the catalog, meaning switching also most other capabilities documents,
the Describe* methods (most of them can take no layer/coverage/feature type
identifier and describe the whole server as a result), the GUI, and I guess
some parts of RESTConfig.

Now, let’s say we commit the proposal as it is now, with only the exemplar cases.
You argue it is done to minimize the risk. I say the net effect is that
it actually makes it way too easy, if not natural, to do all of the above
work outside of the proposal framework with little scrutiny, because everything
related to scalability is turned to “bug” or “improvement” jiras, forgetting that
these jira wire up with code that is not as well tested as the rest, and thus
put us at risk of getting something fundamentally broken while we are
doing bugfix releases.
So in the end the same amount of work gets done in the 2.2.x series, but with
very little scrutiny, and the proposal looks less scary because it changes less
code. Seems like a trick, that’s why I called it the “trojan horse”.

Even if you “promise” not to do any of these changes in the 2.2.x series the
fact remains that these changes are getting in very late in the game, after
3 months since I asked to start the release process and was told to wait “two weeks”.
As much as you feel my feedback is unfair, try to put on the other plate of the
scale how unfair it has already been for me.

I’d much prefer to see the work done in a new trunk, done fully, done well,
and eventually be backported later if we don’t find a compromise on timed releases.
Again, this is negative feedback but I don’t want to be a show stopper, if everybody
else feels the proposal should go on I’ll vote -0 on it.

Cheers
Andrea

We are in agreement here, I don't think we have to continue debating over
2.2.x or not. I have already tried to articulate our initial reasoning for
including just the api changes so I am not going to do that anymore.
Clearly this discussion has brought to light that even the api changes are
not fleshed out well enough at this point to be included short term. Great,
that is what the proposal process is for. I have also tried to apologize
for the miscommunication on our part (mostly my fault) regarding this. Not
sure what more I can do there also... maybe I deserve to be
blacklisted/suspended for my misstep, ejection from the PSC perhaps... not
sure.

What I will do is try to articulate without ambiguity the current stance
and expectations from our side. We are in no way shape or
form targeting the 2.2.x branch with this work. Our only goal at this point
is to gather feedback from the community about the new api, which is
currently happening and that is great. Once discussion ramps down and we
have settled on what the initial api can look like we would like to start
working toward committing it to whatever the suitable unstable branch is at
that time, 2.3.x, 2.4.x, etc...

On Sat, Apr 28, 2012 at 2:30 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

C’mon, don’t be so dramatic… actually my first reaction after seeing that GSIP 69 was being proposed anyways was to step down myself in protest… so I should just shut up :-p

Hopefully we’re not that far away, betting my money on 2.3.x

Cheers
Andrea

On 29/04/12 02:21, Justin Deoliveira wrote:

maybe I deserve to be blacklisted/suspended for my misstep, ejection
from the PSC perhaps

No, your punishment is to stay *on* the PSC. :-)

This is exciting work, and has stimulated a lot of debate. Given the volume of the proposal, something is bound to be miscommunicated.

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

On Mon, Apr 30, 2012 at 10:02 AM, Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com> wrote:

On 29/04/12 02:21, Justin Deoliveira wrote:

maybe I deserve to be blacklisted/suspended for my misstep, ejection
from the PSC perhaps

No, your punishment is to stay on the PSC. :-)

LOL, very well put Ben :-)

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


What I will do is try to articulate without ambiguity the current stance and expectations from our side. We are in no way shape or form targeting the 2.2.x branch with this work. Our only goal at this point is to gather feedback from the community about the new api, which is currently happening and that is great. Once discussion ramps down and we have settled on what the initial api can look like we would like to start working toward committing it to whatever the suitable unstable branch is at that time, 2.3.x, 2.4.x, etc…

Hopefully we’re not that far away, betting my money on 2.3.x

I agree - while I did not grind through the patch (way to go Andrea) I also did not react quite as strongly to the duplication involved in the predicate data structures.

I have had experience using the GeoTools filter support with normal POJOs so I know it works reasonably well. I agree that the ECQL utility class helps with test case readability if that really is a driving factor.

Jody

On Sat, Apr 28, 2012 at 11:30 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

Moving forward with the discussion about the code being "proof of concept".

By "proof of concept" I don't mean that the code is bad, I mean that it's at a
stage where it proves the concept but it's not ready to be integrated in a code
base that was meant to become stable soon, simply because it's last minute
(last minute changes are rarely good) and incomplete.

Well, the "soon to be stable" 2.2 branch is out of the question.
Question is whether we'll allow progress to occur in the "soon to be
trunk" 2.3 branch, or (surprisingly) everything needs to be nailed
down to the minimal detail to allow new development to happen on
trunk.

This one is mostly about what's happening above the catalog.
Right now we have some exemplar use cases, which have been indeed picked up
with care.
However they are only three, that's my biggest worry, I'm pretty sure that by
developing the full switch you might have seen more use cases and found more
bugs in the implementations.

Now, in order to reap the benefits of the scalable API one has to make the code
actually use the new scalable methods whenever large amounts of data are read
from the catalog, meaning switching also most other capabilities documents,
the Describe* methods (most of them can take no layer/coverage/feature type
identifier and describe the whole server as a result), the GUI, I guess
some parts of RESTConfig.

I judged it smarter to identify the driving use cases first rather than
go and update the whole code base in one shot.
Note the use cases are meant to be representative of all the
(existing) different uses of the catalog where scalability is a
concern. If you can identify more, then that would be awesome.
For instance, the three picked represent the cases where:
- you need to process either the full list of a given type of
resource, or a subset of it obtained with some simple filtering and
sorting. The example is GetCaps, but it applies also to Describe* and
RESTConfig's lists of resources.
- paging, filtering with an "iLike"-like predicate, and sorting: GUI
- client side full scans where part of the filter is encodable and
part is not, which usually implies building a lot of objects that are
then discarded: SecureCatalog
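
The three usage patterns listed above could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: `Layer`, `sample`, `listSorted`, `page` and `splitFilter` are hypothetical names, the in-memory list merely stands in for the catalog, and nothing here is the GSIP 69 API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustration only: none of these names come from GeoServer or GSIP 69.
class CatalogAccessSketch {

    // Hypothetical stand-in for a catalog resource.
    static final class Layer {
        final String name;
        final String workspace;

        Layer(String name, String workspace) {
            this.name = name;
            this.workspace = workspace;
        }
    }

    // Small made-up data set used by the examples below.
    static List<Layer> sample() {
        return Arrays.asList(
                new Layer("roads", "topp"),
                new Layer("Rivers", "topp"),
                new Layer("parcels", "cite"),
                new Layer("railways", "cite"));
    }

    // Pattern 1 (GetCaps-like): walk the list with a simple filter and a sort.
    static List<String> listSorted(List<Layer> catalog, Predicate<Layer> filter) {
        return catalog.stream()
                .filter(filter)
                .map(l -> l.name)
                .sorted()
                .collect(Collectors.toList());
    }

    // Pattern 2 (GUI-like): case-insensitive "contains" match, sort, then page.
    static List<String> page(List<Layer> catalog, String text, int offset, int count) {
        String needle = text.toLowerCase(Locale.ROOT);
        return catalog.stream()
                .map(l -> l.name)
                .filter(n -> n.toLowerCase(Locale.ROOT).contains(needle))
                .sorted(String.CASE_INSENSITIVE_ORDER)
                .skip(offset)
                .limit(count)
                .collect(Collectors.toList());
    }

    // Pattern 3 (SecureCatalog-like): the filter is split in two parts.
    // In a real backend the "encodable" part would be pushed down to the
    // store; here both parts run in memory, the cheap one first.
    static List<String> splitFilter(List<Layer> catalog,
                                    Predicate<Layer> encodable,
                                    Predicate<Layer> clientSide) {
        return catalog.stream()
                .filter(encodable)
                .filter(clientSide)
                .map(l -> l.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Layer> catalog = sample();
        System.out.println(listSorted(catalog, l -> l.workspace.equals("topp")));
        System.out.println(page(catalog, "r", 0, 2));
        System.out.println(splitFilter(catalog,
                l -> l.workspace.equals("cite"),
                l -> l.name.length() > 7));
    }
}
```

The point of the third method is only the ordering: whatever part of the filter the store can evaluate runs first, so fewer objects are built just to be thrown away by the client-side check.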

With that in place, it looks like it'd be possible to migrate the rest
of the offending code where those usage patterns apply. Maybe it's
not such a good idea, but it seemed to be one to me and to the people
inside OpenGeo with whom I validated the proposal before going public
with it.

Now, let's say we commit the proposal as it is now, with only the exemplar
cases.
You argue it is done to minimize the risk. I say the net effect is that
it actually makes it way too easy, if not natural, to do all of the above
work outside of the proposal framework with little scrutiny, because
everything related to scalability is turned to "bug" or "improvement" jiras,
forgetting that these jira wire up with code that is not as well tested as
the rest, and thus put us at risk of getting something fundamentally broken
while we are doing bugfix releases.

I see your point. While the proposal remains under discussion,
I see no problem with starting to port more stuff over on the proposal's
branch. Yet we needed the proposal to come out of incubation, so I think
it has been a good approach: gather all this feedback earlier in the
process instead of going public with it once we have migrated
everything.

So in the end the same amount of work gets done in the 2.2.x series, but with
very little scrutiny, and the proposal looks less scary because it changes
less code. Seems like a trick, that's why I called it the "trojan horse".

If so every iterative approach is so too.

Even if you "promise" not to do any of these changes in the 2.2.x series the
fact remains that these changes are getting in very late in the game, after
3 months since I asked to start the release process and was told to wait
"two weeks".

I don't "promise". I _consult_ with the PSC about the feasibility of
getting any of this into the 2.2.x series, and obey the PSC decision.
I don't remember having told you to wait for two weeks to get GSIP69
in place for 2.2.x. Rather the contrary, I remember having told you
this work was not targeting 2.2.x but a new trunk. If later in the
game I ask the PSC what the opinion is about doing so, I don't see
what's disrespectful about asking. If, on the contrary, I ever did
tell you to wait for two weeks with regard to GSIP69, I very much
apologize.

As much as you feel my feedback is unfair, try to put on the other plate of
the scale how unfair it has already been for me.

Please, explain how the GSIP 69 proposal has been unfair for you, so
that I'm more careful in the future.

I'd much prefer to see the work done in a new trunk, done fully, done well,
and eventually be backported later if we don't find a compromise on timed
releases.
Again, this is negative feedback but I don't want to be a show stopper, if
everybody else feels the proposal should go on I'll vote -0 on it.

This is not negative feedback, it's feedback. I think by the time you
replied to this the 2.2.x debate was already out of the question, but
I may be wrong. In any case, I _agree_ it should be done on a new trunk.

Cheers,
Gabriel.

Cheers
Andrea

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Thu, May 3, 2012 at 6:00 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

On Sat, Apr 28, 2012 at 11:30 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

Moving forward with the discussion about the code being “proof of concept”.

By “proof of concept” I don’t mean that the code is bad, I mean that it’s at a
stage where it proves the concept but it’s not ready to be integrated in a code
base that was meant to become stable soon, simply because it’s last minute
(last minute changes are rarely good) and incomplete.

Well, the “soon to be stable” 2.2 branch is out of the question.
Question is whether we’ll allow progress to occur in the “soon to be
trunk” 2.3 branch, or (surprisingly) everything needs to be nailed
down to the minimal detail to allow new development to happen on
trunk.

My feedback was still based on the idea that you wanted to commit the work
on 2.2.x.
I saw Justin say it would not happen, but I did not see you say the
same, and you’re the proponent of the GSIP, so I (wrongly) assumed you wanted to
go on and still commit on 2.2.x.

So in the end the same amount of work gets done in the 2.2.x series, but with
very little scrutiny, and the proposal looks less scary because it changes
less code. Seems like a trick, that’s why I called it the “trojan horse”.

If so every iterative approach is so too.

Iterative on trunk is fine.

Even if you “promise” not to do any of these changes in the 2.2.x series the
fact remains that these changes are getting in very late in the game, after
3 months since I asked to start the release process and was told to wait
“two weeks”.

I don’t “promise”. I consult with the PSC about the feasibility of
getting any of this into the 2.2.x series, and obey the PSC decision.
I don’t remember having told you to wait for two weeks to get GSIP69
in place for 2.2.x. Rather the contrary, I remember having told you
this work was not targeting 2.2.x but a new trunk. If later in the
game I ask the PSC what the opinion is about doing so, I don’t see
what’s disrespectful about asking. If, on the contrary, I ever did
tell you to wait for two weeks with regard to GSIP69, I very much
apologize.

You were not the one telling me to wait two weeks, Justin was.
It’s the sum of having had to wait months first and then getting another
proposal in that made me snap.

Anyways, the above has been clarified already,
and if we get the timed releases proposals going we’ll eradicate
the very possibility of finding ourselves in this rut again in the future.

Cheers
Andrea

