[Geoserver-devel] new configuration gsip reworked

Justin_Deoliveira · May 21, 2008, 9:45pm

Hi all,

Here is the newest version of the configuration GSIP for your reading pleasure.

http://geoserver.org/display/GEOS/GSIP+8+-+New+Configuration+System

Questions/comments/feedback welcome. It would be nice to be able to vote on this in next week's IRC meeting.

-Justin

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

jive · May 22, 2008, 1:29am

Quick question:
- your diagram shows the existing persistence layer untouched? Is that still the case, or can we replace it with something that is simpler to maintain? Reading further, I see that appears to be future work. Your configuration persistence page currently only documents the XStream approach.

Comments:
- ResourcePool idea is well presented, nice work

Ignore if you want:
- can you show the difference between some information we keep explicitly, like setName in your example, and how extra property settings are handled over time
- may want to consider org.geoserver.repository.FeatureTypeInfo (ie change the package name) to avoid the use of the dreaded catalog word

Cheers,
Jody

Justin_Deoliveira · May 22, 2008, 6:04am

Jody Garnett wrote:

Quick question:
- your diagram shows the existing persistence layer untouched? Is that still the case, or can we replace it with something that is simpler to maintain? Reading further, I see that appears to be future work. Your configuration persistence page currently only documents the XStream approach.

Yeah, changing the persistence layer is phase 3, and targeted for GeoServer 2.x since it breaks backwards compatibility with our on-disk storage format. I should make that clearer in the GSIP.

Comments:
- ResourcePool idea is well presented, nice work

Ignore if you want:
- can you show the difference between some information we keep explicitly, like setName in your example, and how extra property settings are handled over time

You mean like maps of metadata?

- may want to consider org.geoserver.repository.FeatureTypeInfo (ie change the package name) to avoid the use of the dreaded catalog word

fair enough... catalog does kind of have a bad stigma attached to it.

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

Andrea_Aime3 · May 22, 2008, 1:36pm

Justin Deoliveira wrote:

Hi all,

Here is the newest version of the configuration GSIP for your reading pleasure.

http://geoserver.org/display/GEOS/GSIP+8+-+New+Configuration+System

Questions/comments/feedback welcome. It would be nice to be able to vote on this in next weeks IRC meeting.

Resource pool... wondering if we may be served better by a method
like:

catalog.getResourcePool().getResource(DataStore.class, this);

and have that method backed by a set of pluggable resource loaders.
The alternative is changing the ResourcePool API any time we need
to cache a new kind of resource. Which may not happen that often,
so it's a good alternative. Just wanted to present the options.
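
A minimal sketch of what "a set of pluggable resource loaders" behind getResource(Class, Object) could look like. Only the getResource call shape comes from the message above; the ResourceLoader interface, its method names, and the simple per-type cache are assumptions made for illustration, not the GSIP's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension point: one loader per kind of resource. */
interface ResourceLoader<T> {
    Class<T> getResourceType();

    /** 'info' stands in for the configuration object that describes the resource. */
    T load(Object info);
}

/** Hypothetical pool that delegates creation to whichever loader handles the requested type. */
class ResourcePool {
    private final List<ResourceLoader<?>> loaders = new ArrayList<>();
    // Cached resources, grouped by requested type and then keyed by config object.
    private final Map<Class<?>, Map<Object, Object>> cache = new HashMap<>();

    void register(ResourceLoader<?> loader) {
        loaders.add(loader);
    }

    <T> T getResource(Class<T> type, Object info) {
        Map<Object, Object> perType = cache.computeIfAbsent(type, t -> new HashMap<>());
        Object cached = perType.get(info);
        if (cached != null) {
            return type.cast(cached);
        }
        for (ResourceLoader<?> loader : loaders) {
            if (type.isAssignableFrom(loader.getResourceType())) {
                T resource = type.cast(loader.load(info));
                perType.put(info, resource);
                return resource;
            }
        }
        throw new IllegalArgumentException("No loader registered for " + type.getName());
    }
}
```

With this shape, caching a new kind of resource means registering another loader rather than adding a method to the pool.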

Finally, reading the document I'm not sure what will become of the
"apply/save" cycle. Is it still there, meaning that the "User Interface"
block contains the xxxConfig objects?

Cheers
Andrea

Alessio_Fabiani · May 22, 2008, 2:12pm

The proposal seems quite interesting, and I’m almost for it.

What is not fully clear to me is how the lazy loading of resources is handled by the new catalog system. Has some sort of caching system been considered? One of the weak points of the current configuration system, IMHO, is the loading and maintenance of all the resources in memory.

The use of XStream and Hibernate for the persistence layer is very good.

Cheers,
Alessio.

Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

--

Eng. Alessio Fabiani
Vice-President /CTO GeoSolutions S.A.S.
Via Carignoni 51
55041 Camaiore (LU)
Italy

phone: +39 0584983027
fax: +39 0584983027
mob: +39 349 8227000

http://www.geo-solutions.it

Justin_Deoliveira · May 22, 2008, 3:37pm

Andrea Aime wrote:

Resource pool... wondering if we may be served better by a method
like:

catalog.getResourcePool().getResource(DataStore.class, this);

and have that method backed by a set of pluggable resource loaders.
The alternative is changing the ResourcePool API any time we need
to cache a new kind of resource. Which may not happen that often,
so it's a good alternative. Just wanted to present the options.

This is an interesting thought Andrea. Just thinking about people who will want to write their own services which have special resources, like say perhaps WPS. I like this idea... +1.

Finally, reading the document I'm not sure what will become of the
"apply/save" cycle. Is it still there, meaning that the "User Interface"
block contains the xxxConfig objects?

Still there, and it works like before since the *.config objects still mirror the *.global objects. Once we remove the config layer then we will be playing a direct access game.

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

Justin_Deoliveira · May 22, 2008, 3:44pm

Alessio Fabiani wrote:

The proposal seems quite interesting, and I'm almost for it.

What is not fully clear to me is how the lazy loading of resources is handled by the new catalog system. Has some sort of caching system been considered? One of the weak points of the current configuration system, IMHO, is the loading and maintenance of all the resources in memory.

I agree with you 100% Alessio. ResourcePool is a bit better than what we had before. All resources are stored in an LRU map, keyed by their config object. When an object gets ejected from the map, its dispose() method is called (well, for datastores and coverage readers anyway).
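
For readers unfamiliar with the pattern, here is a minimal sketch of an LRU map that disposes entries on eviction, built on java.util.LinkedHashMap in access order. The Disposable interface and the LruResourceCache name are assumptions for illustration; this is not the actual ResourcePool code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a resource that holds connections or file handles. */
interface Disposable {
    void dispose();
}

/** LRU cache keyed by config object; the least recently used entry is disposed on overflow. */
class LruResourceCache<K, V extends Disposable> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruResourceCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the "young" end
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxSize) {
            eldest.getValue().dispose(); // release the resource before it leaves the map
            return true;
        }
        return false;
    }
}
```

Note that dispose() here runs only on eviction caused by an insert; explicit remove() calls would need their own cleanup.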

Caching of the resources is not so much the issue, imho, as how we access resources on startup. Currently on startup every datastore is connected to and every feature type is loaded. It will take a bit of work to make sure everything is loaded lazily.

But part of the reason why isolating all resource access is nice is it gives us one place to worry about resource loading. Which I think is the first step toward coming up with something better than what we have today.

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

jive · May 22, 2008, 7:37pm

Justin Deoliveira wrote:

- can you show the difference between some information we keep explicitly, like setName in your example, and how extra property settings are handled over time

You mean like maps of metadata?

I mean all the little flags we put in over time to control rendering, make something editable or not; etc. etc... The "seven steps of hell" was how David Blasby described doing this work when I first showed him; make a change in the xml; change it in the xml loader, change it in the xml saver, change it in global, change it in config, change it in the form, change it in the action ... all to record a new boolean.
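
One way to picture the "maps of metadata" idea raised earlier in the thread: keep a few explicit properties such as name, and let ad hoc flags live in a generic map so that recording a new boolean does not touch every layer. This is only a sketch; the class name, the getFlag helper, and the "editable" key are hypothetical and not taken from the GSIP.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical info object: 'name' is an explicit property, everything else is free-form metadata. */
class FeatureTypeInfoSketch {
    private String name;
    private final Map<String, Object> metadata = new HashMap<>();

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    Map<String, Object> getMetadata() {
        return metadata;
    }

    /** Reads a boolean flag from the metadata map, falling back to a default when unset. */
    boolean getFlag(String key, boolean defaultValue) {
        Object value = metadata.get(key);
        return value instanceof Boolean ? (Boolean) value : defaultValue;
    }
}
```

The trade-off is the usual one: the map avoids schema churn, while explicit setters keep the common properties discoverable and type-checked.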

Cheers,
Jody

jive · May 22, 2008, 7:40pm

Justin Deoliveira wrote:

Caching of the resources is not so much the issue, imho, as how we access resources on startup. Currently on startup every datastore is connected to and every feature type is loaded. It will take a bit of work to make sure everything is loaded lazily.

Reminds me; we originally were lazy and were forced to be greedy as requests would time out while waiting for ArcSDE (for example) to connect. Can we have some kind of priority for that LRU queue? Setting up an SeConnection is way more expensive than opening a shapefile, or even 80 shapefiles.

But part of the reason why isolating all resource access is nice is it gives us one place to worry about resource loading. Which I think is the first step toward coming up with something better than what we have today.

jdeolive++

Jody

jive · May 22, 2008, 7:43pm

Justin Deoliveira wrote:

Resource pool... wondering if we may be served better by a method
like:

catalog.getResourcePool().getResource(DataStore.class, this);

and have that method backed by a set of pluggable resource loaders.
The alternative is changing the ResourcePool API any time we need
to cache a new kind of resource. Which may not happen that often,
so it's a good alternative. Just wanted to present the options.

This is an interesting thought Andrea. Just thinking about people who will want to write their own services which have special resources, like say perhaps WPS. I like this idea... +1.

heh; now you are caught up to the major benefit of the uDig catalog. Also plus one; but you may want a getDataStore method to start out with just to be explicit (the major complaint we have with the uDig catalog).

Still there, and it works like before since the *.config objects still mirror the *.global objects. Once we remove the config layer then we will be playing a direct access game.

Is the direct access game (when it arrives) going to be confusing for us? ie do we need to think about events...
Jody

Andrea_Aime3 · May 23, 2008, 7:46am

Jody Garnett wrote:
...

heh; now you are caught up to the major benefit of the uDig catalog. Also plus one; but you may want a getDataStore method to start out with just to be explicit (the major complaint we have with the uDig catalog).

Extensible, but explicit for the most common cases... yeah, the API becomes
a little wider, but it makes sense.

Still there, and it works like before since the *.config objects still mirror the *.global objects. Once we remove the config layer then we will be playing a direct access game.

Is the direct access game (when it arrives) going to be confusing for us? ie do we need to think about events...

Hum? Confusing how? Services are already playing the direct access
game, it's only the UI that goes through the config objects.
And yes, if your service is at all stateful or has any caches, it will
have to listen to catalog changes I guess?

Cheers
Andrea

Andrea_Aime3 · May 23, 2008, 7:50am

Jody Garnett wrote:

Justin Deoliveira wrote:

Caching of the resources is not so much the issue, imho, as how we access resources on startup. Currently on startup every datastore is connected to and every feature type is loaded. It will take a bit of work to make sure everything is loaded lazily.

Reminds me; we originally were lazy and were forced to be greedy as requests would time out while waiting for ArcSDE (for example) to connect. Can we have some kind of priority for that LRU queue? Setting up an SeConnection is way more expensive than opening a shapefile, or even 80 shapefiles.

Ah ha, interesting one. LRU already has its own priority (you get to
the top when used and move towards the land of the dead when others are
used), wouldn't adding another priority concept mess things up?
What about having multiple queues, and a way to set a "regionId" or
something like that to decide in which queue a resource ends up?
Most datastores may end up in the standard ds queue, but some,
like ArcSDE, and maybe WFS and Oracle (when not using JNDI pools)
should end up in their own?
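
A rough sketch of the "multiple queues selected by a regionId" idea: each region gets its own LRU map, so heavy use of one region can never evict entries from another. The RegionedPool class, its method names, and the single shared capacity are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pool with one independent LRU queue per region. */
class RegionedPool<K, V> {
    private final int capacityPerRegion;
    private final Map<String, LinkedHashMap<K, V>> regions = new HashMap<>();

    RegionedPool(int capacityPerRegion) {
        this.capacityPerRegion = capacityPerRegion;
    }

    /** Returns the LRU map for a region, creating it on first use. */
    private LinkedHashMap<K, V> region(String regionId) {
        return regions.computeIfAbsent(regionId, id ->
            new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > capacityPerRegion; // eviction is local to this region
                }
            });
    }

    void put(String regionId, K key, V value) {
        region(regionId).put(key, value);
    }

    V get(String regionId, K key) {
        return region(regionId).get(key);
    }
}
```

In this sketch an ArcSDE store placed in its own region stays cached no matter how many shapefiles cycle through the default region.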

Cheers
Andrea

Andrea_Aime3 · May 23, 2008, 1:47pm

Justin Deoliveira wrote:

Hi all,

Here is the newest version of the configuration GSIP for your reading pleasure.

http://geoserver.org/display/GEOS/GSIP+8+-+New+Configuration+System

Questions/comments/feedback welcome. It would be nice to be able to vote on this in next weeks IRC meeting.

Just adding two extra bits before I forget them, though they are
not strictly related to this phase of the configuration subsystem
overhaul.

First, the granularity of modifications and storage. I believe this
proposal, in its full-blown implementation, will trigger specific
events, like "datastore x created", "featuretype y configured" instead
of the current, dumb, "catalog changed" event. Correct?
I also hope we'll have good storage granularity, that is, we'll have
to try and make sure we save just what's strictly necessary, and avoid
storing the whole catalog in a single shot, which may become an
expensive operation when hundreds of feature types are configured.
Finally, is the resource pool maintained accordingly,
that is, do we avoid dropping and recreating all datastores like we do now,
at great expense of time, especially for some datastores like SDE?
At the same time it would be wise to remove datastores from the pool
if removed from the configuration, since some of the cached objects
(connection pools) are scarce resources.

The second thing that came to mind is that it would be nice
to have "hidden" layers, that is, feature types that are configured,
available, but not advertised along with capabilities. This idea
comes from the frequent user requests to have transient layers
built from views and to be mainly used on a user by user basis.
I believe the use case calls for a layer that's not advertised
to the whole world... well, in fact it would be nice to have it
available only when a certain user is making the requests, but
this seems quite a bit harder. If the user is authenticated,
it would be possible to have the security subsystem handle it,
but I'm sceptical the vast majority of apps would want to
rely on the GeoServer security subsystem... though with a good
REST API to manage users and security it could become a viable
solution.

Anyways, just some food for thought
Cheers
Andrea

Justin_Deoliveira · May 23, 2008, 2:09pm

Andrea Aime wrote:

Just adding two extra bits before I forget them, though they are
not strictly related to this phase of the configuration subsystem
overhaul.

First, the granularity of modifications and storage. I believe this
proposal, in its full-blown implementation, will trigger specific
events, like "datastore x created", "featuretype y configured" instead
of the current, dumb, "catalog changed" event. Correct?

Correct, events are thrown for every addition, removal, and modification of any "configuration" object.
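
To make the contrast with a single "catalog changed" event concrete, here is a minimal sketch of per-object events and a listener. All names (ChangeType, CatalogEvent, CatalogListener, CatalogEvents) are hypothetical; the thread only states that events are thrown for every addition, removal, and modification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event kinds mirroring "addition, removal, and modification". */
enum ChangeType { ADDED, REMOVED, MODIFIED }

/** Hypothetical event naming the single configuration object that changed. */
class CatalogEvent {
    final ChangeType type;
    final String objectId;

    CatalogEvent(ChangeType type, String objectId) {
        this.type = type;
        this.objectId = objectId;
    }
}

interface CatalogListener {
    void handle(CatalogEvent event);
}

/** Minimal event source; a real catalog would fire these from its add/remove/save methods. */
class CatalogEvents {
    private final List<CatalogListener> listeners = new ArrayList<>();

    void addListener(CatalogListener listener) {
        listeners.add(listener);
    }

    void fire(ChangeType type, String objectId) {
        CatalogEvent event = new CatalogEvent(type, objectId);
        for (CatalogListener listener : listeners) {
            listener.handle(event);
        }
    }
}
```

A stateful service or cache can then drop only the entry named in the event instead of rebuilding everything.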

I also hope we'll have good storage granularity, that is, we'll have
to try and make sure we save just what's strictly necessary, and avoid
storing the whole catalog in a single shot, which may become an
expensive operation when hundreds of feature types are configured.

Yeah... depends on how we do it. As you say if we serialize in one shot it could lead to a huge file that needs to be read back in. Off the top of my head what might work well is to keep the separation we have today between the catalog.xml and individual info.xml files per feature type. That coupled with lazy loading should lead to some good results.

Finally, is the resource pool maintained accordingly,
that is, do we avoid dropping and recreating all datastores like we do now,
at great expense of time, especially for some datastores like SDE?
At the same time it would be wise to remove datastores from the pool
if removed from the configuration, since some of the cached objects
(connection pools) are scarce resources.

Unfortunately not as implemented currently. Having to maintain total backwards compatibility with the UI has forced us to mimic the way things work today.

The second thing that came to mind is that it would be nice
to have "hidden" layers, that is, feature types that are configured,
available, but not advertised along with capabilities. This idea
comes from the frequent user requests to have transient layers
built from views and to be mainly used on a user by user basis.
I believe the use case calls for a layer that's not advertised
to the whole world... well, in fact it would be nice to have it
available only when a certain user is making the requests, but
this seems quite a bit harder. If the user is authenticated,
it would be possible to have the security subsystem handle it,
but I'm sceptical the vast majority of apps would want to
rely on the GeoServer security subsystem... though with a good
REST API to manage users and security it could become a viable
solution.

Interesting... I am not sure I quite understand the use case. However, at first glance I think I prefer making this a function of the security system: being able to specify read access on a layer to specific groups, rather than adding a special type of layer which implies some access control constraints.

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

Andrea_Aime3 · May 23, 2008, 2:19pm

Justin Deoliveira wrote:
...

I also hope we'll have good storage granularity, that is, we'll have
to try and make sure we save just what's strictly necessary, and avoid
storing the whole catalog in a single shot, which may become an
expensive operation when hundreds of feature types are configured.

Yeah... depends on how we do it. As you say if we serialize in one shot it could lead to a huge file that needs to be read back in. Off the top of my head what might work well is to keep the separation we have today between the catalog.xml and individual info.xml files per feature type. That coupled with lazy loading should lead to some good results.

I'm wondering if we really need to reload anything back in... modify
the configuration model directly (in future), save the bit that needs to
be saved, and be done with it? Reloading has the advantage of
letting you know you're screwed... the very moment you are, instead
of waiting for the next restart, so it may be a good debugging tool,
but I'm wondering if it is necessary for normal operation.
Again, I'm talking of phase 2, not the current proposal scope.

Finally, is the resource pool maintained accordingly,
that is, do we avoid dropping and recreating all datastores like we do now,
at great expense of time, especially for some datastores like SDE?
At the same time it would be wise to remove datastores from the pool
if removed from the configuration, since some of the cached objects
(connection pools) are scarce resources.

Unfortunately not as implemented currently. Having to maintain total backwards compatibility with the UI has forced us to mimic the way things work today.

Yep, but when we get rid of it, we can be smarter and avoid reloading
everything, right?

...

Interesting... I am not sure I quite understand the use case.

Think of an application built on top of GeoServer that builds summarizing
views of data for a specific user, who then wants to see the data on the
web with WMS. To do so you have to configure
a new layer, but at the same time, that layer is sort of private
to that user, it's part of his conversation with the application.

Cheers
Andrea

jive · May 23, 2008, 3:01pm

Andrea Aime wrote:

Ah ha, interesting one.

That is always a scary statement coming from you Andrea. It usually means I have uncovered something that is going to hurt my brain to figure out.

LRU already has its own priority (you get to
the top when used and move towards the land of the dead when others are
used), wouldn't adding another priority concept mess things up?
What about having multiple queues, and a way to set a "regionId" or
something like that to decide in which queue a resource ends up?
Most datastores may end up in the standard ds queue, but some,
like ArcSDE, and maybe WFS and Oracle (when not using JNDI pools)
should end up in their own?

Okay, let me try getting out of this quickly:
- do *nothing* for the first cut; LRU is simple and we have enough going on (I think everyone is tired right now and we need to rest up before the sprint; draw some icons or something)
- do *nothing* for the second cut; punt the responsibility to your "DataSource" factory finder, which should take care of recycling JDBC "pools", should it not?

The only thing that leaves out in the cold is ArcSDE; a concern for a small portion of our users so ...
- consider a "priority" or a flag to keep something like an ArcSDE entry in the queue when an ArcSDE user notices and asks us for it

Jody

Andrea_Aime3 · May 23, 2008, 3:07pm

Jody Garnett wrote:

Okay, let me try getting out of this quickly:
- do *nothing* for the first cut; LRU is simple and we have enough going on (I think everyone is tired right now and we need to rest up before the sprint; draw some icons or something)
- do *nothing* for the second cut; punt the responsibility to your "DataSource" factory finder, which should take care of recycling JDBC "pools", should it not?

No, it should not; that seems like a scary mistake instead. If the factories
keep the pools around, how do you get rid of them?
Would you be happy if a misconfigured ArcSDE pool that you already
dropped kept on eating your precious, paid-for client licenses?
Such resource management should not be delegated to the factory finder
in my opinion, but to the user code, that knows the specific use
case you're dealing with.

The only thing that leaves out in the cold is ArcSDE; a concern for a small portion of our users so ...
- consider a "priority" or a flag to keep something like an ArcSDE entry in the queue when an ArcSDE user notices and asks us for it

I still feel a pool managed by GeoServer with multiple regions is a better solution. That ensures SDE gets its own queue, so that nobody will kick it out, and allows GeoServer to get rid of the connection and its pool when the user removes the datastore.

Cheers
Andrea

Justin_Deoliveira · May 23, 2008, 3:12pm

I still feel a pool managed by GeoServer with multiple regions is a better solution. That ensures SDE gets its own queue, so that nobody will kick it out, and allows GeoServer to get rid of the connection and its pool when the user removes the datastore.

Yeah I like this idea of grouping resources into "levels" based on how expensive they are to create / destroy. You said before that it would be nice to have an extension point based system for resources. So I imagine at some point there will be some sort of "ResourceFactory". I can see adding a method like:

int getLevel(Object)

Which would rate the resource in terms of its cost and it could be grouped accordingly.
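
As a sketch of how such a method might be used: only the name ResourceFactory and the signature int getLevel(Object) come from the message above. The canCreate method, the LevelGrouping helper, and the convention that a higher level means a more expensive resource are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical extension point; only getLevel(Object) is taken from the thread. */
interface ResourceFactory {
    boolean canCreate(Object info);

    /** Assumed convention: a higher value means more expensive to create / destroy. */
    int getLevel(Object info);
}

class LevelGrouping {
    /** Groups config objects by the level reported by the first factory that accepts them. */
    static Map<Integer, List<Object>> groupByLevel(List<ResourceFactory> factories, List<Object> infos) {
        Map<Integer, List<Object>> groups = new TreeMap<>();
        for (Object info : infos) {
            for (ResourceFactory factory : factories) {
                if (factory.canCreate(info)) {
                    groups.computeIfAbsent(factory.getLevel(info), level -> new ArrayList<>()).add(info);
                    break;
                }
            }
        }
        return groups;
    }
}
```

Each resulting group could then back its own queue, which ties this back to the "regions" idea discussed earlier.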

Just a thought.

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com