[Geoserver-devel] GSIP 69 - Catalog scalability enhancements

Hi all,

I've put together a GSIP to enhance GeoServer's catalog vertical scalability:
<http://geoserver.org/display/GEOS/GSIP+69+-+Catalog+scalability+enhancements>

At OpenGeo we're hoping to get this proposal implemented on 2.2.x,
since the Catalog API changes are purely additive and no Catalog
client code really needs to be changed to maintain the current
functionality.
Yet it enables hungry client code to be _progressively_ upgraded to use
the new access methods in a streaming fashion.
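The kind of purely additive, streaming access method being described might look roughly like the following sketch. All names here are illustrative assumptions, not the actual GSIP 69 API: the point is just that the eager list-returning methods stay untouched while a new closeable-iterator method is added alongside them.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of an additive, streaming catalog accessor.
// The existing eager method (returning a full List) stays untouched;
// a new method returns a closeable iterator so callers can stream
// results without materializing the whole catalog in memory.
interface CloseableIterator<T> extends Iterator<T>, Closeable {
    @Override
    void close(); // narrowed: no checked exception
}

interface CatalogSketch {
    List<String> getLayers();               // existing, eager API
    CloseableIterator<String> listLayers(); // new, streaming API
}

class InMemoryCatalog implements CatalogSketch {
    private final List<String> layers;

    InMemoryCatalog(List<String> layers) { this.layers = layers; }

    @Override
    public List<String> getLayers() { return layers; }

    @Override
    public CloseableIterator<String> listLayers() {
        Iterator<String> it = layers.iterator();
        return new CloseableIterator<String>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
            @Override public void close() { /* release backing resources here */ }
        };
    }
}
```

Existing callers keep using `getLayers()` unchanged; memory-hungry code can migrate to `listLayers()` at its own pace, which is what makes the change additive.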
That said, we're also willing to push back to 2.3 if that seems like
too much for the stable 2.2.x branch, although based on your feedback
we'll try to make any amendments that could possibly allow the API
extensions to land on 2.2.x to avoid having to maintain a separate
branch for the OpenGeo Suite.

So, any feedback will be much appreciated.

Best regards,
Gabriel
--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Ciao Gabriel,
I do see the need for the changes you describe but, not to be picky,
the security changes were supposed to be the last ones bringing
instability to 2.2.x. Changes to the internals look a bit scary in
this regard.

Bottom line: I am personally against allowing these changes right away
for 2.2.x. I am OK if we let them settle a bit and then backport
them later on.

Regards,
Simone Giannecchini
-------------------------------------------------------
Ing. Simone Giannecchini
GeoSolutions S.A.S.
Founder

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 333 8128928

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/simonegiannecchini
http://twitter.com/simogeo

-------------------------------------------------------

On Tue, Apr 24, 2012 at 6:11 PM, Gabriel Roldan <groldan@anonymised.com> wrote:
> [...]
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Hi Simone,

thanks for the quick reply.
I would like to gather some more feedback on the proposal itself,
whether it looks good or there's something that's not clear enough or
looks outright odd.
I am expecting some resistance to incorporating all the changes into
2.2.x; that was my initial thought as well. But shouldn't the first step
be to evaluate the proposal and have 2.2.x split onto its own branch, so
that we have room on trunk?
That said, the concern is totally valid, and it is the reason why the
proposal is all about API extensions, purely additive, so _no_ client
code _needs_ to be touched, nor even any internal Catalog data structure
or data access method.
Perhaps a middle compromise could be agreed on: adding the new methods,
not changing anything else on 2.2.x, and working on a 2.3.x trunk, so
that we can backport any Catalog client code on a case-by-case basis
once it is consolidated on trunk?

But even for that it'd be good to have some feedback on the proposal
itself, leaving the 2.2.x discussion aside for a while.

Cheers,
Gabriel

On Tue, Apr 24, 2012 at 2:11 PM, Simone Giannecchini
<simone.giannecchini@anonymised.com> wrote:
> [...]

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Just some clarification: the proposal adds methods to the catalog
interface itself, and those additions are purely additive, with no
updates to client code. Even though the GitHub branch does contain
changes to client code, the idea is to only make the API changes.

So the desire to get this onto 2.2.x stems from the fact that these
changes, with nothing outside of the catalog changing, should be low
risk and not represent instability. But certainly we realize that
changes on any scale represent some amount of risk, and that we
previously agreed GeoServer 2.2.x should be striving for a stable
release. Which is why it was originally stated that we are perfectly happy
and willing to wait for 2.2.x to branch before pushing on this. Which, if
we are formally agreeing to only bug fixes from here on in, should happen
soon imo, but that is another discussion.

-Justin

On Tue, Apr 24, 2012 at 1:38 PM, Gabriel Roldan <groldan@anonymised.com> wrote:
> [...]

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Tue, Apr 24, 2012 at 6:11 PM, Gabriel Roldan <groldan@anonymised.com> wrote:
> [...]

I don't see a patch attached to the GSIP? No technical feedback for the
moment.

However, let me bring up this thread dated January 26 2012:
http://osgeo-org.1560.n6.nabble.com/Time-to-start-the-GeoServer-2-2-0-long-release-process-td4340171.html

In that thread I (and GeoSolutions by proxy) was proposing to start
releasing GeoServer 2.2.x before the GSIP storm that had accumulated
could land on trunk, in order to have the 2.2.0 release sometime in May
(2-3 months from then)... not out of a whim, but because we needed it.

The summary reaction to this:
- a desire to get the security GSIP in before the release, in the hope
  that it would be ready within "two weeks". The actual work landed
  slightly more than two months later (ok, delays happen)
- GSIP 69 was assured not to land on 2.2.x (see Gabriel's mail). Now it
  seems it will.

Long story short, it seems that all the assurances were disregarded and
we're back at square one, while in January we had what seemed a rather
good GeoServer almost ready for release.

I'm not making a negative judgement on the technical work, both components
are welcome developments, but personally I'm rather disappointed.

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

Fair enough. Point taken. This was poorly managed on our part and for that
I apologize. Additional comments inline.

On Tue, Apr 24, 2012 at 2:50 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:
> [...]
>
> I don't see a patch attached to the GSIP? No technical feedback for the
> moment.

I believe Gabriel should be remedying this soon.

> However, let me bring up this thread dated January 26 2012 [...]
>
> - GSIP 69 was assured not to land on 2.2.x (see Gabriel's mail). Now it
>   seems it will.

As stated previously, the idea was that only part of the proposal was
being *proposed* for 2.2.x: the non-risky part. I admit, though, that
this was not appropriately communicated.

> Long story short, it seems that all the assurances were disregarded and
> we're back at square one, while in January we had what seemed a rather
> good GeoServer almost ready for release.
>
> I'm not making a negative judgement on the technical work, both
> components are welcome developments, but personally I'm rather
> disappointed.

You are free to recommend constructive ways to remedy the situation, and
I believe you will find us accommodating. Let's take GSIP-69 off the
table for 2.2.x, and we will remind ourselves that no GSIPs (not even
ones proposed with a willingness to wait for 2.3.x) should be put forth
until 2.2.x is branched.

Shall we also roll back the security work? Doing so should leave us back
in a state in which GeoServer is good again and ready for release.



--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Hi Andrea,

On Tue, Apr 24, 2012 at 3:50 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:
> [...]
>
> I don't see a patch attached to the GSIP? No technical feedback for the
> moment.

I could make things more explicit on the main proposal page, but each
section contains links to more detailed information, including the
actual "patch" to be seen inline. Namely:
<http://geoserver.org/display/GEOS/GSIP+69+-+API+Proposal>
As it's all about new stuff, when it comes strictly to the API
proposal, I judged it convenient to have the whole proposal be seen
inline. Yet there's also a link to the github branch where the whole
work lives: the API proposal, the code migration for exemplary use
cases, and the alternative JDBC backend:
<https://github.com/groldan/geoserver/tree/GSIP69>

That said, I'm working on updating that branch with sensible,
step-by-step squashed commits, one per each of the above-mentioned
additions. So if you can, go ahead and give some feedback based on
what's on those pages; otherwise I'll let the list know when I've
squashed the commits into the aforementioned branch so each change can
easily be seen as an actual patch.

WRT 2.2.x or not, my intention is to just state what the ideal would
be for us given the current situation, stressing as much as possible
that it is fine for us if that is found unacceptable by the PSC.

Best regards,
Gabriel


--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

> [...]

All right, took half a day to go through the proposal and the code.

The proposal direction is good, agree with all the needs it expresses
and the general idea that we should be able to filter and page and load
stuff interactively.

I strongly disagree with the idea that rolling a new filter API is
better than using the OGC one; this is a first show stopper for me.
The Predicate API is very limited and has no spatial filter support,
and GeoServer core already depends on GeoTools heavily, so the whole
reasoning about Predicate advantages is pretty empty. I actually
see a lot more weaknesses in rolling a new API:
- it violates the KISS principle, adding more stuff does not make anything
  simpler
- it does not make writing encoders any easier; on the contrary, it
  demands more code to be written, while we already have a pretty simple
  way to split a complex OGC filter into a supported and unsupported
  part, and lots of encoders that we can mimic and/or copy code from
- it does not avoid external dependencies, as GeoTools is already there
- it misses a lot of expressiveness: instead of writing clumsy Predicates
  that can only run in memory (since they are not well known) we can
  actually use an API that can get translated to SQL (thinking about the
  name matching filters in the GUI here)
- the idea that the domain is different is quite surprising; most of the
  elements that grow in big numbers have a bbox attached to them, so
  they are indeed spatial. One of the things you normally want to do in
  a security subsystem is restrict access by geographic area, and we
  could not express that with Predicate

Moreover, with OGC filters it would get really easy to create a datastore
based catalog implementation if we want to, and it would be a much better
proof of concept than the current ones (more on this later).

The only drawback of Filter is that it is supposed to be a "closed" API,
with no way to implement a new filter, but that is actually less of a
limitation since the model is rich, and it is easily worked around by
implementing whatever filter function is missing.
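The "well known vs opaque" distinction can be made concrete with a toy example. The class below is mine, not GeoTools': a filter built from inspectable, well-known building blocks can be encoded to SQL and pushed down to the database, while an arbitrary `Predicate<T>` is a black box that can only be evaluated in memory.

```java
import java.util.function.Predicate;

// Toy "well known" filter: because its structure is inspectable, an
// encoder can translate it to SQL, unlike an opaque Predicate lambda.
final class PropertyEquals {
    final String property;
    final String value;

    PropertyEquals(String property, String value) {
        this.property = property;
        this.value = value;
    }

    // Encoder: possible only because the filter's structure is known.
    String toSql() {
        return property + " = '" + value.replace("'", "''") + "'";
    }

    // The same filter can still be evaluated in memory when needed,
    // so well-known filters lose nothing relative to opaque predicates.
    Predicate<java.util.Map<String, String>> asPredicate() {
        return row -> value.equals(row.get(property));
    }
}
```

An opaque `Predicate<CatalogInfo>` offers only the second capability; the first one is what lets a store evaluate the filter natively instead of loading everything into memory.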

Moving forward, I would advise against having to check whether the store
can sort data or not; it just makes the caller code more complicated
and forces it to do workarounds if sorting is really needed.
In GeoTools we have code that does a merge sort using disk space
if necessary, and it can sort any amount of data with a small memory
footprint (of course, at the price of performance).
It would be better to have a method that checks whether sorting can be
done fast instead, so code that needs sorting only as an optimization can
leverage it or take an alternate path, while code that really needs
sorting will just ask for it and have it done by the catalog impl,
without having to repeat that logic in every place that needs sorting
for good.
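The suggested alternative could be sketched as follows. All names are hypothetical, and a simple in-memory sort stands in for GeoTools' disk-backed merge sort: the store always honors a sort request, and merely advertises whether sorting on a given property is fast, so only optimization-driven callers need to check.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: the store always fulfils a sort request, possibly
// via a slow fallback (here an in-memory sort standing in for a
// disk-backed merge sort), and only *advertises* whether sorting on a
// given property is fast (e.g. backed by an index).
class SortingStore {
    private final List<String> names;
    private final boolean indexedByName;

    SortingStore(List<String> names, boolean indexedByName) {
        this.names = names;
        this.indexedByName = indexedByName;
    }

    // Hint for callers that only want sorting when it is cheap.
    boolean canSortFast(String property) {
        return "name".equals(property) && indexedByName;
    }

    // Always works: a real impl would stream from the index when fast,
    // and fall back to an external merge sort otherwise.
    Iterator<String> listSortedBy(String property) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder()); // slow-path fallback
        return copy.iterator();
    }
}
```

Callers that genuinely need sorted output just call `listSortedBy` and never duplicate fallback logic; callers using sort order as an optimization consult `canSortFast` first.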

Another small thing is that these methods are supposed to access the
file system or the network, but they don't throw any exception... I can
live with that; most likely the calling code does not have anything
meaningful to do in case of exception anyway, but I thought I'd point
it out.

A thing that I find surprising instead is seeing no trace of a
transaction concept. If the intent is to move to a model where multiple
GeoServer instances share the same db and write to it in parallel, being
able to use transactions seems quite important; there is a need for
coordination that is not addressed by this proposal.

The modifications done below and above the API changes are simple proofs
of concept, meaning the validation of the API is low and the checks on
its quality are low as well; not something we'd want to fast track on a
code base that we want to make more stable.

Let's start with what's above the API. All we have is a handful of
examples, but the code base is largely unchanged. On one side this means
the existing code is not going to be touched very much; on the other
side it means we get no benefit from the new API and we're not
validating it and its implementation at all. It looks like a trojan
horse to get the higher level modifications in later, which will
actually destabilize the code base while we are already in RC or bugfix
release mode.
Moreover, several of the predicates created have no chance of being
encoded natively since they are not "well known".
In fact the authorization subsystem should be changed too in order to
leverage the new scalability API, so that it returns a filter instead of
checking layers point by point.
The same goes for the GUI filtering code, which:
- loads the full data set in memory in case the store is not able to
  sort on the desired attribute
- builds a predicate that is not encodable (with an OGC filter we could
  actually encode it instead)

The bits below the API are baffling too. Both the JE and JDBC
implementations are based on a key/value store where the value is the
XML dump of each CatalogInfo.
This makes the whole point about filter encoding moot, as there is
almost no filter being encoded down at the DB level.
Say I want to do a GetMap request with a single layer: we know the name,
yet we end up scanning the whole database, loading all the blobs,
parsing them back in memory, and applying the filter in memory. Sure, it
scales to 100M items, but nobody will want to wait for the time it takes
this implementation to do a simple GetMap.
I know they are community modules, but even a community module should
have a fighting chance of being used; this implementation seems so weak
that I don't believe anyone will actually want to use it, and in order
to validate the API we should have an implementation that actually makes
use of some of its concepts (some actual native filtering, for example).

(Little aside: nice to see BoneCP in the mix, I actually wanted to try
out that connection pool myself.)
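The cost of the key/value layout can be illustrated with a toy comparison (hypothetical code, not the actual community module): when XML blobs are opaque values, a by-name lookup must touch and parse every blob, whereas keeping even one natively queryable column (simulated below by a name-to-id index) lets the filter run without parsing unrelated rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy contrast between a pure key/value layout (id -> XML blob), where a
// by-name lookup must scan and parse every blob, and a layout that keeps
// a natively queryable name column (simulated by a name -> id index).
class BlobCatalog {
    final Map<String, String> blobsById = new LinkedHashMap<>();
    final Map<String, String> idByName = new LinkedHashMap<>(); // the "index"
    int blobsParsed = 0; // instrumentation: how much work a lookup costs

    void add(String id, String name) {
        blobsById.put(id, "<layer><name>" + name + "</name></layer>");
        idByName.put(name, id);
    }

    // Full scan: touches (and would deserialize) every stored blob.
    String findByNameScanning(String name) {
        for (String blob : blobsById.values()) {
            blobsParsed++; // each blob must be parsed before filtering
            if (blob.contains("<name>" + name + "</name>")) return blob;
        }
        return null;
    }

    // Native filter: one indexed lookup, no parsing of unrelated blobs.
    String findByNameIndexed(String name) {
        String id = idByName.get(name);
        return id == null ? null : blobsById.get(id);
    }
}
```

With N layers the scanning path does O(N) parses per GetMap, which is where the "nobody will want to wait" complaint comes from; the indexed path does none.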

Long story short, the proposal seems weak in some API points, and the
implementation is a proof of concept which I don't think should be
allowed to land on trunk right now.

But I want to offer a reasonable alternative: shall we move to a
time-boxed 6 month release cycle? 4 months to add new stuff, 2 months to
stabilize, rinse and repeat, and push out 3-4 bugfix releases in the
stable series.
This way we don't have these long waits for new features to show up, and
this much needed scalability improvement can land on a stable series in
the next 8-9 months (assuming 2 months to release 2.2.0 and a 6 month
cycle), and be vetted and improved so that it's an API and an
implementation we can all be proud of.

I really want this GSIP in, just not now and not in its current state.
And I'm willing to put forward resources to help make it become a
reality.

I really do hope that the rest of the PSC chimes in as well; this is an
important GSIP and it deserves other people's opinions (besides my
personal rants).

Ah, next week I'll also try to prepare a GSIP for the 6 month release
cycle, unless of course there are very negative reactions to the idea.

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

Hello Andrea and all

On Fri, Apr 27, 2012 at 2:29 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

I could make things more explicit on the main proposal page, but each
section contains links to more detailed information, including the
actual "patch" to be seen inline. Namely:
<http://geoserver.org/display/GEOS/GSIP+69+-+API+Proposal&gt;
As it's all about new stuff, when it comes strictly to the API
proposal, I judged it'd be convenient to have the whole proposal to be
seed inline. Yet there's also a link to the github branch where the
whole work lives, both API proposal, code migration for exemplary use
cases, and alternative jdbc backend:
<https://github.com/groldan/geoserver/tree/GSIP69&gt;

All right, it took half a day to go through the proposal and the code.

Thanks for doing that, much appreciated.

The proposal direction is good, agree with all the needs it expresses
and the general idea that we should be able to filter and page and load
stuff interactively.

That's a good starting point.

I strongly disagree on the idea that rolling a new filter API is better than
using the OGC one, this is a first show stopper for me.
The Predicate API is very limited and has no spatial filter support,
GeoServer core already depends on GeoTools heavily so the whole
reasoning about Predicate advantages is pretty empty, I actually
see a lot more weaknesses in rolling a new API:
- it violates the KISS principle, adding more stuff does not make anything
simpler
- it does not make writing encoders any easier, on the contrary, demands
more code to be written while we already have a pretty simple way to
split a complex OGC filter into a supported and unsupported part,
lots of encoders that we can mimic and/or copy code from
- it does not avoid external dependencies, as geotools is already there
- it misses a lot of expressiveness, instead of writing clumsy Predicates
that can only run in memory (since they are not well known) we can
actually use an API that can get translated to SQL (thinking about the
name matching filters in the GUI here)
- the idea that the domain is different is quite surprising, most of the
elements
that grow in big numbers have a bbox attached to them, so they are
indeed spatial. One of the things you normally want to do in a security
subsystem is restrict access by geographic area, and we could not
express that with Predicate
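To make the "well known" distinction above concrete, here is a minimal sketch (all types and names hypothetical, not the actual GeoServer or GeoTools API) of why an opaque predicate forces in-memory evaluation while a declarative filter object can be inspected by an encoder and translated to SQL:

```java
import java.util.function.Predicate;

public class FilterSketch {

    record LayerInfo(String name) {}

    // An opaque predicate: the backend cannot look inside the lambda, so the
    // only option is a full scan of the catalog with in-memory evaluation.
    static Predicate<LayerInfo> opaque = l -> l.name().startsWith("topp:");

    // A declarative, "well known" filter: an encoder can inspect its
    // structure and translate it to native SQL instead.
    record PropertyLike(String property, String pattern) {}

    // Hypothetical encoder: turns the declarative filter into a WHERE clause.
    static String toSql(PropertyLike f) {
        return f.property() + " LIKE '" + f.pattern().replace('*', '%') + "'";
    }

    public static void main(String[] args) {
        PropertyLike byName = new PropertyLike("name", "topp:*");
        System.out.println(toSql(byName)); // prints: name LIKE 'topp:%'
    }
}
```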

This is arguable and I don't want to focus on this discussion just
now; I feel it's more important for the time being to point out
some possible errors of interpretation below. They're probably due to
the patches not being squashed into single functional units, which I
apologize for. As I mentioned before, I'm working on that.

Moreover, with OGC filters it would get really easy to create a datastore
based catalog implementation if we want to, and it would make for a much
better proof of concept than the current ones (more on this later).

The only drawback of Filter is that it is supposed to be a "closed" API,
with no way to implement a new filter, but that is actually less of a
limitation
since the model is rich, and easily worked around by implementing
whatever filter function is missing.

Just pointing out here that nothing prevents you from doing so either way
if you want to take advantage of the current feature-access oriented
infrastructure, but that'd be an implementation detail, better hidden
from the Catalog API. Given the proposed Catalog predicate object
model is so small and based on observed usage, it'd be a lot easier
going from Predicate to Filter than the other way around. Also, if you
wanted to leverage the DataStore API and adapt it to the Catalog API,
you would be making assumptions about the data structures the catalog
objects are stored with (i.e. a flat RDBMS table). I don't want those
assumptions at the API level, so I deliberately stayed away. That's how
a layered architecture usually works. More on this later; I'm already
starting to rant, and I said I wanted to focus on the code review.

Moving forward, I would advise against having to check if the store
can sort data or not, it just makes the caller code more complicated
and forces it to do workarounds if sorting is really needed.

I agree with that. Justin pointed that out already, but I left
it in just to get the proposal out and let others voice up. Leveraging
the GeoTools merge-sort code looks to me like the way to go, and I need
to look into it.

In GeoTools we have code that does merge-sort using disk space
if necessary that can sort whatever amount of data with little memory
footprint (of course, at the price of performance).
It would be better to have a method that checks if sorting can be done
fast instead, so if the code needs sorting as an optimization it can
leverage it or use an alternate path, but code that really needs sorting
will just ask it and have it done by the catalog impl without having to
repeat that in all places that do need sorting for good.

As a rule of thumb, I am against adding such a check just because, or
just in case, without an actual use case. We can always add the check
when the real need arises, but it's harder to get rid of it if that
need never materializes, plus the Catalog API complexity wouldn't be
reduced at all. My intention is hence to leave the canSort method on
CatalogFacade but remove it from Catalog. Or rather remove it from
both. Implementing the merge-sort in CatalogImpl and using canSort on
CatalogFacade seems to make the most sense: given the way the Catalog
API has evolved, it's unlikely that you have multiple Catalog
implementations, but you can certainly have multiple CatalogFacade ones.
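A minimal sketch of that arrangement (hypothetical interfaces, not the actual Catalog API): callers never see canSort, and the catalog implementation transparently falls back to sorting on its own side, which is where GeoTools' disk-backed merge sort would slot in:

```java
import java.util.Comparator;
import java.util.stream.Stream;

public class SortFallbackSketch {

    // Hypothetical facade: advertises whether it can sort natively.
    interface CatalogFacade {
        boolean canSort(String property);
        Stream<String> list(String sortBy); // sortBy honored only if canSort
    }

    // The catalog implementation hides the check: callers always get a
    // sorted stream, whether the backend sorted natively or not.
    static Stream<String> sortedList(CatalogFacade facade, String property) {
        if (facade.canSort(property)) {
            return facade.list(property); // backend sorts natively
        }
        // Fallback sort on the catalog's side; GeoTools' disk-backed
        // merge sort would slot in here instead of this in-memory sort.
        return facade.list(null).sorted(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        CatalogFacade noSort = new CatalogFacade() {
            public boolean canSort(String p) { return false; }
            public Stream<String> list(String s) { return Stream.of("roads", "lakes", "parks"); }
        };
        System.out.println(sortedList(noSort, "name").toList()); // [lakes, parks, roads]
    }
}
```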

One other small thing: these methods are supposed to access the file system
or the network, but they don't throw any exceptions... I can live with that,
as most likely the calling code does not have anything meaningful to do
in case of an exception anyway, but I thought I'd point it out.

That's right. There's a lot of literature out there on checked vs
unchecked exceptions. Plus the Catalog is already using unchecked ones,
so it kind of makes sense to follow suit.

A thing that I instead find surprising is seeing no trace of a
transaction concept. If the intent is to move to a model where multiple
GeoServer instances share the same db and write to it in parallel,
being able to use transactions seems quite important;
there is a need for coordination that is not addressed by this proposal.

I'm not sure if you mean just database transactions or some kind of
coordination between nodes in a cluster.
The latter is out of scope. The former is just a wrong assessment.
Check out the JDBCCatalogFacade add/remove/save methods. They're
transactional. It's just that spring-jdbc is taking care of the
boilerplate.

If, on the other hand, you mean transactions at the Catalog API level,
so that you can, for example, add multiple resources atomically,
that'd be subject of another GSIP.

The modifications done below and above the API changes are simple proofs
of concept, meaning the validation of the API is low and the checks on its
quality low as well, not something we'd want to fast track on a code base
that we want to make more stable.

Strongly disagree here. Call it a proof of concept if you want, but I
spent quite a bit of time identifying "exemplary", not "random sample",
use cases where the vertical scalability of the system was highly
compromised. If you have more exemplary use cases from current code
those will be very welcome. Most of the service code just queries
single objects by name or by id, which is not a scalability problem.

Let's start by what's above the API. All we have is a handful of examples,
but the code base is largely unchanged.

I think my above paragraph explains that. And I wonder how you can be
so concerned about large code base changes, yet now
complain about the lack of them. Identifying key scalability
compromising use cases and allowing for incremental adoption of the
new API seemed to me like a good way of addressing your original
concern, which I share btw.

On one side this means the new
code is not going to be touched very much,
on the other side it means we
get no benefit from the new API and we're not validating it and its
implementation
at all.

If starting up geoserver in seconds instead of minutes; loading the
home page almost instantly instead of waiting for seconds or even
minutes; sub-second response times for the layer list page with
thousands of layers, including filtering and paging; and not going OOM
or getting timeouts when doing a GetCapabilities request under
concurrency and/or a low heap size, but instead streaming out as quickly
as possible, using as little memory as possible, and gracefully
degrading under load, are not ways of exercising the new API, then I'm
lost.

Looks like a trojan horse to get in the higher level modifications
later,

that's discourteous.

which will actually destabilize the code base as we are already in RC or
bugfix release mode.

I don't get you. Haven't we said we don't want to destabilize the
stable code base? Isn't touching as little as possible on the stable
branch (even nothing at all!) a way of preserving its stability? Yet
you complain there are too few changes to the wider code base. Haven't
we already discussed putting only the API on 2.2 for peace of mind re
stability, adding the incremental changes on 2.3 only, or even
not touching 2.2.x at all?

Moreover various of the predicates created have no chance to be encoded
in native mode since they are not "well known".

Examples? The only one I can think of is the one in
SecureCatalogImpl.securityFilter.
It works by building the wrapper policy on an object-by-object
basis, just like it used to. It may be possible to write
it in a way that's "encodable" to SQL; that's a piece of code I don't
fully understand. In any case, the net effect is that that predicate,
which executes in-process, is and'ed to any other filter the
SecureCatalogImpl gets, and the backend can split the and'ed filter
into supported and unsupported parts, hence having the chance of
evaluating far fewer objects than if doing a full scan every time, as
it currently does.
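A tiny sketch of that split (hypothetical types, not the actual SecureCatalogImpl code): the encodable part of the AND'ed filter stands in for native SQL evaluation, and the in-process part only ever sees the pre-filtered stream:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class FilterSplitSketch {

    record Layer(String workspace, String name) {}

    // The AND'ed filter is split: the encodable part plays the role of
    // native SQL evaluation; only its survivors reach the in-process part.
    static Stream<Layer> query(List<Layer> db,
                               Predicate<Layer> encodable,   // runs "in the database"
                               Predicate<Layer> inProcess) { // e.g. the security check
        return db.stream()
                .filter(encodable)
                .filter(inProcess);
    }

    public static void main(String[] args) {
        List<Layer> db = List.of(
                new Layer("topp", "roads"),
                new Layer("topp", "lakes"),
                new Layer("sf", "parks"));
        long n = query(db,
                l -> l.workspace().equals("topp"),   // encodable: workspace = 'topp'
                l -> !l.name().equals("lakes"))      // in-process: not authorized for lakes
                .count();
        System.out.println(n); // prints: 1
    }
}
```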

In fact the authorization subsystem should be changed too in order to
leverage
the new scalability api, so that it returns a filter instead of checking
point by
point a single layer.

How's that a bad thing? If a user is authorized to 10 out of 1000
resources, isn't it better to get only those 10 out of the catalog,
and to create a wrapper object for only those 10 instead of for the
1000 of them?

Same goes for the GUI filtering code, which:
- loads the full data set in memory in case the store is not able to sort on
the desired attribute

That is true, but it's kind of a dead code path. The trick is that
BeanProperties (addressing a property name, like styles.name,
resource.store.workspace.name, etc.) can indeed be sorted at the db. For
the layers page, the only non-BeanProperty is the "enabled" one,
because it uses the derived enabled() property instead of the
isEnabled() property. But still, one could sort on "enabled" (as
per the isEnabled() accessor) natively. I should make the code account
for that. Or, if we end up using the geotools on-disk sorting code, that
wouldn't be a concern either.

- builds a predicate that is not encodable (with OGC filter we could
actually encode it instead).

Again, which one? The only non-encodable one is the enabled() property
in LayerProvider. It should actually be encodable by adding the smarts
to map it to the isEnabled() property, plus an in-process check to
account for derived non-enabled states (when the store or resource is
disabled but the layer itself is not).

The bits below the API are baffling too. Both the JE and JDBC
implementations
are based on key/value store where the value is the XML dump of each
CatalogInfo.

That's a good thing.

This makes the whole point about filter encoding moot, as there is almost no
filter
being encoded down at the DB level.

That is plain wrong.
Both implementations use indexes to filter on the different properties.

Say I want to do a GetMap request with a single layer, we know the name, we
end
up scanning the whole database, load all the blobs, parse them back in
memory,
apply the filter in memory.

Wrong assessment. The BDB JE implementation uses an index based on
name. The JDBC one uses a separate table where the properties are
stored and indexed.

Sure it scales to 100M items, but nobody will
want to wait
for the time it takes this implementation to do a simple GetMap.
I know they are community modules, but even a community module should have a
fighting
chance of being used; this implementation seems so weak that I don't believe
anyone
will actually want to use it, and in order to validate the API we
should have an
implementation that actually makes use of some of its concepts (some actual
native filtering
for example).

When you have a chance please re-review taking into consideration my
previous comments.
A tip: ConfigDatabase uses PredicateToSQL. It looks like you didn't find
that out and hence were so disappointed. I would have been too if that
were the case.
That said, I am not sure the current approach to translating a
predicate to SQL is the best one, given I honestly suck at SQL. But
for a proof of concept it does a pretty decent job, and as far as I
understand, a GSIP with "under discussion" status doesn't need to have
all its implementation details perfected.

(little aside, nice to see bonecp in the mix, I actually wanted to try
out that connection pool
myself)

yeah, seems to work better than dbcp.

Long story short, the proposal seems weak in some API points, and the
implementation is
a proof of concept which I don't think should be allowed to land on trunk
right now.

You say "proof of concept" as if it were pejorative. I call it a
functional prototype, and I have seen more than one land in the
community space over the years.

But I want to offer a reasonable alternative: shall we move to a time boxed
6 months
release cycle? 4 months to add new stuff, 2 months to stabilize, rinse and
repeat,
push out 3-4 bugfix releases in the stable series.
This way we don't have to have these long waits for new features to show up,
this
much needed scalability improvement can land on a stable series in the next
8-9 months (assuming 2 months to release 2.2.0 and 6 months cycle), be
vetted
and improved so that it's an API and an implementation we can all be proud
of.

Sounds reasonable to me. Separate email thread?

I really want this GSIP in, just not now and not in its current state.
But I'm willing to put forward resources to help make it become a
reality.

I really do hope that the rest of the PSC chimes in as well, this is an
important
GSIP and it deserves other people's opinions (besides my personal rants).

So do I.

Thanks for your time.
Gabriel.

Ah, next week I'll also try to prepare a GSIP for the 6 months release
cycle,
unless of course there are very negative reactions to the idea.

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Fri, Apr 27, 2012 at 1:29 PM, Andrea Aime andrea.aime@anonymised.com wrote:

I could make things more explicit on the main proposal page, but each
section contains links to more detailed information, including the
actual "patch" to be seen inline. Namely:
<http://geoserver.org/display/GEOS/GSIP+69+-+API+Proposal>
As it's all about new stuff, when it comes strictly to the API
proposal, I judged it convenient to have the whole proposal be
seen inline. Yet there's also a link to the github branch where the
whole work lives: the API proposal, code migration for exemplary use
cases, and the alternative jdbc backend:
<https://github.com/groldan/geoserver/tree/GSIP69>

All right, it took half a day to go through the proposal and the code.

The proposal direction is good, agree with all the needs it expresses
and the general idea that we should be able to filter and page and load
stuff interactively.

I strongly disagree on the idea that rolling a new filter API is better than
using the OGC one, this is a first show stopper for me.
The Predicate API is very limited and has no spatial filter support,
GeoServer core already depends on GeoTools heavily so the whole
reasoning about Predicate advantages is pretty empty, I actually
see a lot more weaknesses in rolling a new API:

Yeah, I agree there are definitely some upsides to using the existing geotools filter api. But I have some reservations, inline.

  • it violates the KISS principle, adding more stuff does not make anything
    simpler

Not sure I would classify the geotools filter api as simple… the predicate api looks simpler to me. But if by simple you mean minimal additions, then yes, I agree.

  • it does not make writing encoders any easier, on the contrary, demands
    more code to be written while we already have a pretty simple way to
    split a complex OGC filter into a supported and unsupported part,
    lots of encoders that we can mimic and/or copy code from
  • it does not avoid external dependencies, as geotools is already there
  • it misses a lot of expressiveness, instead of writing clumsy Predicates
    that can only run in memory (since they are not well known) we can
    actually use an API that can get translated to SQL (thinking about the
    name matching filters in the GUI here)

Well, I think the geotools filter is a bit too “expressive”… in that writing a simple filter requires too much code in my opinion. It really lacks a solid builder like we have for feature stuff. If I remember correctly, weren't you working on one a while back? Part of a new style builder or something? I guess we also have CQL, which solves that one too.

  • the idea that the domain is different is quite surprising, most of the elements
    that grow in big numbers have a bbox attached to them, so they are
    indeed spatial. One of the things you normally want to do in a security
    subsystem is restrict access by geographic area, and we could not
    express that with Predicate

While I agree that making use of the spatial aspect of the catalog makes a lot of sense, it's not surprising to me that it gets overlooked. The catalog has always just been considered a configuration store providing simple crud operations, so I don't think people readily jump to seeing it as a spatial store of information. And I can't think of a use case in geoserver today where we look up a layer based on its bounding box. But I think the idea is actually really cool and really powerful. And if geoserver ever wants to provide a CSW view or implementation, that will be crucial.

Moreover, with OGC filters it would get really easy to create a datastore
based catalog implementation if we want to, and it would make for a much
better proof of concept than the current ones (more on this later).

The only drawback of Filter is that it is supposed to be a “closed” API,
with no way to implement a new filter, but that is actually less of a limitation
since the model is rich, and easily worked around by implementing
whatever filter function is missing.

I think the reliance on datastore is one of the downfalls as well… yes, there is lots of good infrastructure for splitting filters up based on capabilities and the like… but it's pretty tied to the feature model, no? For instance, unless you have feature types around you can't really get any information about the types of attributes specified in predicates. Also, the in-memory implementations of the filters are pretty heavily based on feature objects. I believe this has been extracted so it's now possible to execute filters directly on java-bean-like objects, but personally I have never really used that, so I have no idea how well it works.

In the end I can see us writing a lot of code to turn catalog objects into Features and FeatureType representations to pull this off. Doable, but could also be rather clumsy.

So while going with the filter api is very tempting, I am not totally sold on it. Although I am interested to hear more about your idea of a datastore-backed catalog implementation.

Moving forward, I would advise against having to check if the store
can sort data or not, it just makes the caller code more complicated
and forces it to do workarounds if sorting is really needed.
In GeoTools we have code that does merge-sort using disk space
if necessary that can sort whatever amount of data with little memory
footprint (of course, at the price of performance).
It would be better to have a method that checks if sorting can be done
fast instead, so if the code needs sorting as an optimization it can
leverage it or use an alternate path, but code that really needs sorting
will just ask it and have it done by the catalog impl without having to
repeat that in all places that do need sorting for good.

I agree here totally. We made this mistake with datastores and it led to chaos. We shouldn't add any filtering or querying capability to the api without a default implementation to go in places where native capabilities are not available. Even if that default implementation is horribly inefficient, I think it is better than just throwing an exception back when a user tries to do something, or, as we see here, having to check a flag before usage. It defeats the purpose of having an api to abstract data access.

One other small thing: these methods are supposed to access the file system
or the network, but they don't throw any exceptions... I can live with that,
as most likely the calling code does not have anything meaningful to do
in case of an exception anyway, but I thought I'd point it out.

Right, I guess this stems from the fact that the original catalog api throws no exceptions. I don't have a strong opinion either way, but I know that much of the time having to deal with checked exceptions just means rethrowing them wrapped in a runtime exception anyway. This is the everlasting checked vs. unchecked argument.

A thing that I instead find surprising is seeing no trace of a transaction concept.
If the intent is to move to a model where multiple GeoServer instances share the same db
and write to it in parallel, being able to use transactions seems quite important;
there is a need for coordination that is not addressed by this proposal.

This is an interesting one. Indeed some notion of transaction is needed, that is for sure. But I'm not sure it has to be a first-class citizen in the api. Look at how Spring approaches transactions: it encourages declarative transaction management, keeping transaction handling isolated to an aspect and out of the main data access api, as a separate concern.

Anyways, imo making the catalog api support transactions will warrant its own proposal, and in the interest of making incremental progress it is practically something that could be put off to a future iteration.

The modifications done below and above the API changes are simple proofs
of concept, meaning the validation of the API is low and the checks on its
quality low as well, not something we’d want to fast track on a code base
that we want to make more stable.

Let’s start by what’s above the API. All we have is a handful of examples,
but the code base is largely unchanged. On one side this means the new
code is not going to be touched very much, on the other side it means we
get no benefit from the new API and we’re not validating it and its implementation
at all. Looks like a trojan horse to get in the higher level modifications later,
which will actually destabilize the code base as we are already in RC or
bugfix release mode.
Moreover various of the predicates created have no chance to be encoded
in native mode since they are not “well known”.
In fact the authorization subsystem should be changed too in order to leverage
the new scalability api, so that it returns a filter instead of checking point by
point a single layer.
Same goes for the GUI filtering code, which:

  • loads the full data set in memory in case the store is not able to sort on
    the desired attribute
  • builds a predicate that is not encodable (with OGC filter we could
    actually encode it instead).

Fair enough, but another way of looking at this is that it is low risk. It seems historically common that a developer develops some new functionality and wants to get it into the codebase to start giving it wider exposure. As long as users are not forced to use it or can easily turn it off, I think we have always considered that acceptable.

The other approach, which I think is what you are saying here, is to ensure the new api is used everywhere, in order to ensure it meets the requirements of the system. This is the approach more in line with the new catalog for 2.0: we ripped out the core and replaced it, but had all the client code help to validate the new stuff. It had the benefit of a large variety of unit tests ready and waiting, etc... But it was still painful in the early going, and something users had no way of avoiding. So I'm not sure which approach is better. Both have upsides and downsides.

The bits below the API are baffling too. Both the JE and JDBC implementations
are based on key/value store where the value is the XML dump of each CatalogInfo.
This makes the whole point about filter encoding moot, as there is almost no filter
being encoded down at the DB level.
Say I want to do a GetMap request with a single layer, we know the name, we end
up scanning the whole database, load all the blobs, parse them back in memory,
apply the filter in memory. Sure it scales to 100M items, but nobody will want to wait
for the time it takes this implementation to do a simple GetMap.
I know they are community modules, but even a community module should have a fighting
chance of being used; this implementation seems so weak that I don't believe anyone
will actually want to use it, and in order to validate the API we should have an
implementation that actually makes use of some of its concepts (some actual native filtering
for example).

I agree that the approach of serializing as a single blob and maintaining it by key doesn't map well to a relational database, so the jdbc implementation seems weird. But it does map more naturally to non-relational stores, document databases, etc., which could also be a nice fit. I'm thinking of something like Couch, which allows for easily working with JSON documents; that would be natural since we already easily emit JSON representations of all the catalog objects.

(little aside, nice to see bonecp in the mix, I actually wanted to try out that connection pool
myself)

Long story short, the proposal seems weak in some API points, and the implementation is
a proof of concept which I don't think should be allowed to land on trunk right now.

But I want to offer a reasonable alternative: shall we move to a time boxed 6 months
release cycle? 4 months to add new stuff, 2 months to stabilize, rinse and repeat,
push out 3-4 bugfix releases in the stable series.

I would love to see this, but am skeptical about it. Strict time boxed iterations sound great on paper but have practical issues. We did try to maintain this a few years back and did for a while, but the process was pretty frustrating. It’s hard to timebox on a project like geoserver in which so much large scale feature development happens, the mandates and deadlines for which are driven solely by customer requirements and schedules.

There is also the question of resourcing. Managing a process like this takes organizations stepping up with resources to actually do releases, help review and manage proposals, etc… all in a timely fashion. It's significant. In the past it has generally been the same people tasked with doing releases. People have talked about stepping up to share the burden, but I have yet to see it really happen.

Anyways, looking forward to trying to make this work. Hopefully we can draw on past experience to come up with something that will work long term.

This way we don’t have to have these long waits for new features to show up, this
much needed scalability improvement can land on a stable series in the next
8-9 months (assuming 2 months to release 2.2.0 and 6 months cycle), be vetted
and improved so that it’s an API and an implementation we can all be proud of.

I really want this GSIP in, just not now and not in its current state.
But I'm willing to put forward resources to help make it become a reality.

I really do hope that the rest of the PSC chimes in as well, this is an important
GSIP and it deserves other people's opinions (besides my personal rants).

Ah, next week I’ll also try to prepare a GSIP for the 6 months release cycle,
unless of course there are very negative reactions to the idea.

Nope, I would love the idea; it will remove ambiguity as to what development is appropriate and when. GeoServer has been drifting away from an actual process over the last couple of years and it is starting to show. We could definitely do with a bit more structure. Thanks for stepping up and taking this on.

Cheers

Andrea



Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Last one about the serialization subsystem.

The idea of storing all the elements in text/binary blobs is not new
(DeeGree actually does that to handle complex features, only making
some of the attributes explicit, those that the admin thinks are going
to be used as filters), and I fully agree it maps well to key/value
nosql databases, as well as any db that has first-class support for
xml and json and allows direct searches on them.

Doing that in a relational DBMS is clunky though, for a variety of
reasons, but I agree it's a quick way to get an R-DBMS catalog working
(though... say goodbye to spatial filtering, which is not good at all).

Ah, about the ability to filter and me not seeing PredicateToSQL:
I actually saw it and went over it a few times; what I did not
see was the exporting of attributes into the second table and their
usage while building the sql.

In my defense, how could you guess that the "relations" table actually
holds exported attributes from a bean? The naming makes no sense to me:

CREATE TABLE CATALOG (id VARCHAR(100) PRIMARY KEY, blob bytea);

CREATE TABLE DEFAULTS (id VARCHAR(256) PRIMARY KEY, type VARCHAR(100));

CREATE TABLE RELATIONS (id VARCHAR(100), type VARCHAR(50), relation VARCHAR(256), value VARCHAR(4096));

The code of PredicateToSQL is rather cryptic, and does a lot of string
manipulation; a standard OGC filter encoder actually seems simpler.
Now that I know it's exporting attributes and encoding searches on them,
I still have a hard time imagining what the actual query to the database
looks like, though it seems to involve one subquery for each
queried property, something like
"select blob from catalog where id in (subquery1) or id in (subquery2) or ..."
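Purely as a guess at that shape (not the actual PredicateToSQL output), the per-property subqueries against the RELATIONS table might be assembled along these lines:

```java
public class SubquerySketch {

    // One subquery per queried property against the RELATIONS table.
    static String subquery(String property, String value) {
        return "select id from relations where relation = '" + property
                + "' and value = '" + value + "'";
    }

    // The outer query pulls the blobs whose ids match any subquery.
    static String query(String... subqueries) {
        return "select blob from catalog where id in ("
                + String.join(") or id in (", subqueries) + ")";
    }

    public static void main(String[] args) {
        System.out.println(query(
                subquery("name", "roads"),
                subquery("resource.store.workspace.name", "topp")));
    }
}
```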

However, if it works, it works; the problematic point will be
selling it to the db admin who manages the database cluster sitting
behind GeoServer in an HA setup, and to the people wanting to interact
with the database.

Now, you might rightfully say that you don't care and that they are free to
implement their own.
On the other side, I believe the current in-memory catalog will keep on fitting
most simple installations' needs (100-1000 layers).
Installations that have a lot more and are set up for multitenancy will likely
be fully HA and have the dreaded db admin looking at the schema and indexes.
People who want to publish new layers and hope to use the familiar SQL commands
will also be disappointed as we point them to the REST config instead.

Which leaves the proposed implementations for green-field installations, or
places where the db admin actually does not mind seeing the db used as a key/value
store... that is, what looks like a relatively narrow use case.

Justin makes a good point about transaction handling in a servlet/dispatcher filter;
I agree it’s a good idea, something we should add and that would also
reduce to zero the need for the existing config locks. How hard would it be to wire
the catalog transaction handling with the typical Spring filters for transaction management?
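As a rough, self-contained sketch of the idea (hypothetical names throughout; a real version would implement javax.servlet.Filter and delegate to Spring's transaction machinery, e.g. a TransactionTemplate, instead of the toy TransactionManager below):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class TransactionFilterSketch {

    // Toy stand-in for a Spring PlatformTransactionManager (hypothetical).
    interface TransactionManager {
        void begin();
        void commit();
        void rollback();
    }

    // The filter idea: wrap the whole request dispatch in one transaction,
    // committing on success and rolling back on failure, which is what would
    // make per-operation config locks unnecessary.
    static <T> T inTransaction(TransactionManager tx, Callable<T> requestHandler) throws Exception {
        tx.begin();
        try {
            T result = requestHandler.call();
            tx.commit();
            return result;
        } catch (Exception e) {
            tx.rollback();
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Record the lifecycle calls to show the wrapping order.
        List<String> calls = new ArrayList<>();
        TransactionManager recording = new TransactionManager() {
            public void begin() { calls.add("begin"); }
            public void commit() { calls.add("commit"); }
            public void rollback() { calls.add("rollback"); }
        };
        String response = inTransaction(recording, () -> "OK");
        System.out.println(response + " " + calls);
    }
}
```

The appeal of doing this in a filter is that every catalog/config access within one HTTP request shares a single transaction boundary without the individual operations knowing about it.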

Long story short, the persistent implementations are good community modules meant
to demonstrate the feasibility of secondary storage catalog implementations.
I’m not happy about how the jdbc one looks, but they serve the need of showing multiple
implementations just fine.

Cheers
Andrea

Not much to comment on, except that however the implementation decides to do
storage is, I think, purely an implementation detail. As long as the API doesn't
force in any way or make an assumption about storage we should be fine. The
fact that the current jdbc implementation is done this way stems from the
fact that the original implementation was bdb, where this key/value blob
style makes more sense, and the idea was to get a quick jdbc prototype up
and running. Ideally someone steps up with a nice jdbc implementation
that does a mapping to a pure relational model. There is also the dbconfig
module to take into consideration. Given the choice of implementing it over, I
would stay away from Hibernate, but that said there is some good code there;
it could be an easy pickup for someone who knows Hibernate.

On Sat, Apr 28, 2012 at 3:00 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

Last one about the serialization subsystem.

The idea of storing all the elements in text/binary blobs is not new
(DeeGree actually does that to handle complex features, only making
explicit some of the attributes, those the admin thinks are going
to be used as filters), and I fully agree it maps well to key/value
NoSQL databases, as well as any db that has first-class support for
XML and JSON and allows direct searches on it.

Doing that in a relational DBMS is clunky though, for a variety of
reasons, but I agree it's a quick way to get an R-DBMS catalog working
(though... say goodbye to spatial filtering, which is not good at all).

Ah, about the ability to filter and my not seeing PredicateToSQL:
I actually saw it and went over it a few times; what I did not
see was the exporting of attributes into the second table and their
usage while building the SQL.


--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Hi guys,
since I don’t like to write a lot I will try to be very concise …

  1. As a PSC member and an active developer relying heavily on 2.2.x, I would like a guarantee that this GSIP does not introduce any instability in the code before branching and having a release.
    As far as I understand, this GSIP implies only API changes, so hopefully it won’t impact the actual functionality of the catalog and the GeoServer core.

  2. Technically speaking:
    a) are we sure the Predicate is the best option? Why not use generic DAOs? Do we have something already in place that performs filtering and pagination at a lower level?
    b) I guess any extension to the catalog is nice, but almost useless while we keep everything in memory. As part of such improvements we should envisage the possibility of serializing/deserializing objects on the fly and the introduction of Level 1 and possibly Level 2 caching mechanisms.

my vote for the moment is -0 on this GSIP, until we clarify some more things, especially regarding a longer-term refactoring of the catalog.

Regards,
Alessio.


On Sun, Apr 29, 2012 at 5:24 PM, Alessio Fabiani <alessio.fabiani@anonymised.com> wrote:

  1. Technically speaking:
    a) are we sure the Predicate is the best option? Why not use generic DAOs? Do we have something already in place that performs filtering and pagination at a lower level?

The current DAO in GeoServer is the CatalogFacade. No, we don’t have anything prior to this GSIP that can handle filtering, pagination and sorting.
How filtering/pagination is done is left to the specific implementation: the in-memory one just scans the whole set of beans, while the secondary storage
ones try to encode the data access indications in the native language of the storage (SQL for JDBC databases)
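Conceptually, the in-memory path boils down to something like the following (a much simplified, hypothetical sketch — the real CatalogFacade works on catalog beans and the GSIP's Predicate, not on the toy types below):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class InMemoryScanSketch {

    // Toy stand-in for a catalog bean such as LayerInfo (hypothetical).
    record Layer(String name, boolean enabled) {}

    // Full scan over the in-memory beans, then offset/limit for paging:
    // this is the "just scan the whole set of beans" strategy, while a
    // secondary-storage facade would translate the same predicate to SQL.
    static List<Layer> query(List<Layer> all, Predicate<Layer> filter, int offset, int limit) {
        return all.stream()
                .filter(filter)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Layer> catalog = List.of(
                new Layer("states", true),
                new Layer("roads", false),
                new Layer("rivers", true));
        // First page (size 1) of the enabled layers.
        System.out.println(query(catalog, Layer::enabled, 0, 1));
    }
}
```

The same filter/offset/limit triple is what a JDBC-backed facade would push down into the generated SQL instead of evaluating in memory.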

b) I guess any extension to the catalog is nice, but almost useless while we keep everything in memory. As part of such improvements we should envisage the possibility of serializing/deserializing objects on the fly and the introduction of Level 1 and possibly Level 2 caching mechanisms.

Yeah, the two community extensions do at least part of that; whether caching should sit on top of a persistence mechanism or inside
it is a matter of design. Both solutions are applicable today (but none is truly implemented, besides some light caching I’ve
seen in the jdbc store)

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


On Mon, Apr 30, 2012 at 7:08 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:


Yeah, the two community extensions do at least part of that; whether caching
should sit on top of a persistence mechanism or inside
it is a matter of design. Both solutions are applicable today (but none is
truly implemented, besides some light caching I've
seen in the jdbc store)

Just a quick one on caching. I managed to update the branch into
sensible squashed commits[1]. The old one is still available for
reference[2].
The jdbcconfig module uses caching. The construction of the cache is
decoupled from it, though[3]. The current default implementation of a cache
provider is very light; it needs to be made configurable[4]. On the
other hand, I have a working exploratory prototype that uses a
distributed cache among the nodes in a cluster[5], backed by a distributed
concurrent map with a number of configurable properties[6].

Cheers,
Gabriel

[1] <https://github.com/groldan/geoserver/tree/GSIP69>
[2] <https://github.com/groldan/geoserver/tree/GSIP69_old>
[3] <https://github.com/groldan/geoserver/blob/GSIP69/src/community/jdbcconfig/src/main/java/org/geoserver/jdbcconfig/internal/ConfigDatabase.java#L97>
[4] <https://github.com/groldan/geoserver/blob/GSIP69/src/main/src/main/java/org/geoserver/util/DefaultCacheProvider.java>
[5] <https://github.com/groldan/geoserver/blob/scalability_clustering/src/community/cluster/src/main/java/org/geoserver/cluster/DistributedCacheProvider.java>
[6] <https://github.com/groldan/geoserver/blob/scalability_clustering/src/community/cluster/src/main/java/org/geoserver/cluster/DistributedCacheProvider.java>
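To illustrate the decoupling Gabriel describes — this is a minimal sketch with hypothetical names, not the actual DefaultCacheProvider code — a provider hands out named caches, so a clustered deployment could swap in one backed by a distributed map while the in-process default just uses ConcurrentHashMap:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CacheProviderSketch {

    // Consumers ask for a cache by name instead of constructing one, so the
    // backing store (local map, EHCache, a distributed map...) stays pluggable.
    interface CacheProvider {
        <K, V> ConcurrentMap<K, V> getCache(String name);
    }

    // Simple in-process default; a cluster-aware provider would return a
    // distributed concurrent map for the same name instead.
    static class LocalCacheProvider implements CacheProvider {
        private final ConcurrentMap<String, ConcurrentMap<?, ?>> caches = new ConcurrentHashMap<>();

        @SuppressWarnings("unchecked")
        public <K, V> ConcurrentMap<K, V> getCache(String name) {
            return (ConcurrentMap<K, V>) caches.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
        }
    }

    public static void main(String[] args) {
        CacheProvider provider = new LocalCacheProvider();
        ConcurrentMap<String, Object> catalogCache = provider.getCache("catalog");
        catalogCache.put("layer:states", "cached bean");
        // Asking again by the same name returns the same cache instance.
        System.out.println(provider.getCache("catalog").get("layer:states"));
    }
}
```

The point of the interface is exactly what is argued later in the thread: consumers never see which caching technology sits behind the provider, so it stays easy to swap out.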



--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Sorry for hijacking the discussion, but I read about caching technologies for clustered environments. The upcoming new security module has a self-implemented cache for authentication tokens. This is extremely important for stateless services not having an HTTP session. Justin and I decided to defer the official cache discussion, but reality seems to be faster.

To support a clustered environment, the security module would need a distributed cache (like EHCache with a proper backend). I think it would be a good idea to have one single cache provider for all GeoServer needs.

Christian


On Mon, Apr 30, 2012 at 5:47 PM, Christian Mueller <mcrmcr21@anonymised.com> wrote:

Sorry for hijacking the discussion but I read about caching technologies for clustered environments. The upcoming new security module has a self implemented cache for authentication tokens. This is extremely important for stateless services not having an http session. Justin and me decided to defer the official cache discussion, but reality seems to be faster.

To support a clustered environment, the security module would need a distributed cache (like EHCache with a proper backend). I think it would be a good idea to have one single cache provider for all geoserver needs.

I agree having a shared default would be good, in that it would reduce the amount of libraries and systems
devs need to be familiar with and increase the general ability to maintain code; at the same time, imposing
it as “the one and only” would be too much of a limitation, so whatever we go for should be pluggable
and easy to swap out for something different

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


On Mon, Apr 30, 2012 at 12:47 PM, Christian Mueller <mcrmcr21@anonymised.com> wrote:


To support a clustered environment, the security module would need a
distributed cache (like EHCache with a proper backend). I think it would be
a good idea to have one single cache provider for all geoserver needs.

Indeed. I mentioned I had a working exploratory prototype, and was
going to bring up the concern of distributed caching when the time comes.
If anything, I think we could agree that a first step is decoupling the
cache from its user class, hence the "cache provider" abstraction. It
would also allow you to swap implementations depending on the
clustering technology/approach you choose. But yeah, I foresee the
need for the different caches of serializable objects to be
harmonized.

Cheers,
Gabriel.


--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Mon, Apr 30, 2012 at 1:07 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:


I agree having a shared default would be good, in that it would reduce the
amount of libraries and systems devs need to be familiar with and increase
general ability to maintain code, at the same time imposing it as "the one
and only" would be too much of a limitation, so whatever we go for should be
pluggable and easy to swap out for something different

Right. My previous reply states the same thing with other words I think.
Cheers,
Gabriel


--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.