[Geoserver-devel] resource publishing split (long email warning)

I have been thinking about how to approach the resource/publishing split
lately and wanted to dump my thoughts to start some conversation and
perhaps come to a general consensus on approach before any formal
proposals.

So to recap, what is the "resource/publishing split"? Put simply, it is the
ability to decouple a resource from the services that publish it.
Some simple use cases:

* be able to publish a layer with WMS, but not WFS or WCS
* be able to add a resource (say a shapefile) once, but publish it
multiple times with the same or different services.

And the list goes on.

In terms of code, this means decoupling LayerInfo from
ResourceInfo. Currently we do some checks behind the scenes to ensure
that the relationship between a LayerInfo and its ResourceInfo is 1-1,
that they share the same name, etc.
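
To make the decoupling concrete, here is a minimal sketch of what a 1-to-many relationship could look like. The types below (OwsService, ResourceSketch, LayerSketch) are hypothetical stand-ins invented for illustration, not the real LayerInfo and ResourceInfo interfaces.

```java
import java.util.EnumSet;

// Hypothetical, simplified stand-ins; NOT the real LayerInfo/ResourceInfo interfaces.
enum OwsService { WMS, WFS, WCS }

final class ResourceSketch {
    final String name;

    ResourceSketch(String name) {
        this.name = name;
    }
}

final class LayerSketch {
    final String name;               // free to differ from the resource name
    final ResourceSketch resource;   // several layers may point at one resource
    private final EnumSet<OwsService> services;

    LayerSketch(String name, ResourceSketch resource, EnumSet<OwsService> services) {
        this.name = name;
        this.resource = resource;
        this.services = EnumSet.copyOf(services);
    }

    // True if this layer is published through the given service.
    boolean isPublishedBy(OwsService service) {
        return services.contains(service);
    }
}
```

With something like this, the same shapefile-backed resource could be wrapped by one layer that is WMS-only and another that is published by every service.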

Also lumped into the resource/publishing split is the idea of
"virtual services". This is basically the ability to have multiple "virtual GeoServers", each serving different layers, with different combinations of services, etc. Put simply, this is the equivalent of the mapfile in MapServer.

The running idea for pulling this off is to add the idea of a Map to the GeoServer configuration, a map being just a grouping of layers (although probably with some other configuration as well). Also note that the terms "map" and "layer" in this context do not imply just WMS: a "layer" is just a published resource, published by any service (WMS, WFS, WCS, etc.). The use cases for this one include:

* Supporting different resources published by different "virtual servers"
* Adding configuration on a per "virtual service" basis, for example the SRSs for a particular WMS configuration
* Running multiple WMS with different security policies

And many more. We achieve some of these today with workarounds for the capabilities document.

So that is more or less the *what*, what about the *how*? These are just my thoughts on what a reasonable implementation strategy would look like, taking into account allowance for a gradual upgrade path.

1) The MapInfo interface needs to be "beefed up". Currently it is not used by anything; it was just placed into the mix for brainstorming purposes when the catalog interfaces were redesigned, and it is just a container for a list of layers. It will need a list of allowed SRSs, and any other settings we want to store on a per virtual configuration basis.
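
As a rough idea of what "beefed up" might mean, here is a small sketch of a map that carries both its layers and a list of allowed SRS codes. MapInfoSketch and all of its methods are hypothetical; the real MapInfo interface may end up looking quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; the real MapInfo interface may end up looking quite different.
final class MapInfoSketch {
    private final String name;
    private final List<String> layerNames = new ArrayList<>();
    private final List<String> allowedSrs = new ArrayList<>();

    MapInfoSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    MapInfoSketch addLayer(String layerName) {
        layerNames.add(layerName);
        return this;
    }

    MapInfoSketch allowSrs(String code) {
        allowedSrs.add(code);
        return this;
    }

    List<String> getLayerNames() {
        return Collections.unmodifiableList(layerNames);
    }

    List<String> getAllowedSrs() {
        return Collections.unmodifiableList(allowedSrs);
    }

    // Assumption: a map only accepts the SRS codes explicitly listed on it.
    boolean allowsSrs(String code) {
        return allowedSrs.contains(code);
    }
}
```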

Here is where things get interesting and where we have some options: 2a) and 2b).

2a) We port all services to use the Map objects. This would mean that services would never look up resources directly. They would always look up layers (constrained by a Map object), similar to how resources are looked up now, always qualified by a namespace.

So in any request, the client would specify the map or virtual configuration to be used, probably with a simple kvp like "map=foo". The service handling the call would then look up that map, and use it to qualify any layers that are looked up.
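
A minimal sketch of that lookup, assuming the map name arrives under a "map" key in the parsed kvp. MapQualifiedLookup and its methods are made up for illustration and do not correspond to existing GeoServer classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of option 2a; class and method names are illustrative only.
final class MapQualifiedLookup {
    private final Map<String, Set<String>> layersByMap = new HashMap<>();

    void defineMap(String mapName, String... layerNames) {
        layersByMap.put(mapName, new HashSet<>(Arrays.asList(layerNames)));
    }

    // True only if the request names a known map and that map contains the layer.
    boolean isLayerVisible(Map<String, String> kvp, String layerName) {
        Set<String> layers = layersByMap.get(kvp.get("map"));
        return layers != null && layers.contains(layerName);
    }
}
```

Every service would have to make a call like this itself, which is where the amount of porting work comes from.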

The downside of 2a) is that it represents a ton of work, on the same scale as porting to the new catalog and configuration. This is where 2b) comes in.

2b) Use a view of the catalog to constrain the layers and resources that a service is allowed to see. This "view" would take the form of a thread-local variable, so a view per request. This view is driven by the map/configuration specified by the client.

So in any request, the first thing that happens is that some pre-processing step figures out what map is being requested, and then populates a thread local catalog variable with a view based on that map. The view is just a wrapper around the real catalog which filters out what is available based on the specified map. This is very similar in nature to how the security stuff works with "SecureCatalog".
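
Roughly, the mechanics could look like the sketch below: a wrapper that filters the real catalog, held in a thread local for the duration of a request. CatalogSketch, FilteredCatalogView and RequestCatalogHolder are hypothetical names, and the catalog is reduced to a list of layer names to keep the example small.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily reduced catalog: just a list of layer names.
interface CatalogSketch {
    List<String> getLayerNames();
}

// Wrapper around the real catalog that only exposes the layers of one map.
final class FilteredCatalogView implements CatalogSketch {
    private final CatalogSketch delegate;
    private final Set<String> visibleLayers;

    FilteredCatalogView(CatalogSketch delegate, Set<String> visibleLayers) {
        this.delegate = delegate;
        this.visibleLayers = new HashSet<>(visibleLayers);
    }

    @Override
    public List<String> getLayerNames() {
        List<String> result = new ArrayList<>();
        for (String name : delegate.getLayerNames()) {
            if (visibleLayers.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}

// Holds the view for the current request thread.
final class RequestCatalogHolder {
    private static final ThreadLocal<CatalogSketch> CURRENT = new ThreadLocal<>();

    // Called by the pre-processing step once the requested map is known.
    static void begin(CatalogSketch realCatalog, Set<String> mapLayers) {
        CURRENT.set(new FilteredCatalogView(realCatalog, mapLayers));
    }

    static CatalogSketch get() {
        return CURRENT.get();
    }

    // Must be called when the request finishes so pooled threads do not leak a view.
    static void end() {
        CURRENT.remove();
    }
}
```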

The only change in the short term is that the services need to get the catalog from the thread local, rather than the real catalog. This could pretty easily be implemented with a Spring proxy, and would be totally transparent.
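
To illustrate the "transparent" part without pulling in Spring, the sketch below uses a plain JDK dynamic proxy (java.lang.reflect.Proxy) instead of a Spring proxy. ProxiedCatalog and ThreadLocalCatalogProxy are hypothetical names; the point is only that services keep a single catalog reference while calls are routed to whatever view is set for the current thread.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical, heavily reduced catalog interface.
interface ProxiedCatalog {
    List<String> getLayerNames();
}

final class ThreadLocalCatalogProxy {
    // Per-request view; empty when no map was requested.
    static final ThreadLocal<ProxiedCatalog> VIEW = new ThreadLocal<>();

    // Returns a catalog that delegates to the thread-local view if one is set,
    // and to the real catalog otherwise.
    static ProxiedCatalog wrap(ProxiedCatalog realCatalog) {
        return (ProxiedCatalog) Proxy.newProxyInstance(
                ProxiedCatalog.class.getClassLoader(),
                new Class<?>[] { ProxiedCatalog.class },
                (proxy, method, args) -> {
                    ProxiedCatalog view = VIEW.get();
                    ProxiedCatalog target = (view != null) ? view : realCatalog;
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the original exception from the target.
                        throw e.getCause();
                    }
                });
    }
}
```

Services would be handed the wrapped catalog once at startup and would not need to know that a view exists at all.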

The downside of 2b) is that it does not really get us the resource/publishing split, but gets us the virtual configuration. That said, 2b) represents quite a nice gradual upgrade path, as we can port services over after the fact. See 3b).

3b) This is more or less the same process as 2a) *but* without having to worry about the map being requested, because this occurs transparently through the thread-local view of the catalog.

The only work to do (which is much less) is to have services go through the layer, rather than directly to the resource.

And that is more or less it folks. Please share your thoughts and opinions on this one.

-Justin

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

The b) path seems a lot less risky, and thus more appealing from my perspective.

I think it'd be good to get an RnD page going on this, to lay out the use cases in particular, and to spell out the expected outcomes in more detail: what the implications are for the UI, for the REST endpoints, and for the OGC endpoints.

But the general approach sounds sane to me.

Chris

--
Chris Holmes
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Justin Deoliveira wrote:

I have been thinking about how to approach the resource/publishing split
lately and wanted to dump my thoughts to start some conversation and
perhaps come to a general consensus on approach before any formal
proposals.

Sorry for my late reply, but here goes.

So to recap, what is the "resource/publishing split"? Put simply, it is the
ability to decouple a resource from the services that publish it.
Some simple use cases:

* be able to publish a layer with WMS, but not WFS or WCS
* be able to add a resource (say a shapefile) once, but publish it
multiple times with the same or different services.

And with different configurations, names, and so on. Right?

Also lumped into the resource/publishing split is the idea of
"virtual services". This is basically the ability to have multiple "virtual GeoServers", each serving different layers, with different combinations of services, etc. Put simply, this is the equivalent of the mapfile in MapServer.

Yep. Actually, this is reason number one to have the split, imho :)

The running idea for pulling this off is to add the idea of a Map to the GeoServer configuration, a map being just a grouping of layers (although probably with some other configuration as well). Also note that the terms "map" and "layer" in this context do not imply just WMS: a "layer" is just a published resource, published by any service (WMS, WFS, WCS, etc.). The use cases for this one include:

* Supporting different resources published by different "virtual servers"
* Adding configuration on a per "virtual service" basis, for example the SRSs for a particular WMS configuration
* Running multiple WMS with different security policies

And many more. We achieve some of these today with workarounds for the capabilities document.

So that is more or less the *what*, what about the *how*? These are just my thoughts on what a reasonable implementation strategy would look like, taking into account allowance for a gradual upgrade path.

1) The MapInfo interface needs to be "beefed up". Currently it is not used by anything; it was just placed into the mix for brainstorming purposes when the catalog interfaces were redesigned, and it is just a container for a list of layers. It will need a list of allowed SRSs, and any other settings we want to store on a per virtual configuration basis.

I wonder whether we want to move the service configuration fully under
MapInfo, and have a default map to make things work as usual
when you're not specifying which map you want to use.
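
Just to sketch the fallback: the resolver below returns the requested map name, or a default one when the request does not carry a "map" kvp. DefaultMapResolver and the name "default" are assumptions made up for this example.

```java
import java.util.Map;

// Hypothetical sketch; the class and the "default" map name are assumptions.
final class DefaultMapResolver {
    static final String DEFAULT_MAP_NAME = "default";

    // Returns the map named in the request, or the default map if none is given.
    static String resolveMapName(Map<String, String> kvp) {
        String requested = kvp.get("map");
        if (requested == null || requested.trim().isEmpty()) {
            return DEFAULT_MAP_NAME;
        }
        return requested.trim();
    }
}
```

Existing clients that never send a map name would keep hitting the default map and see no change in behaviour.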

Here is where things get interesting and where we have some options: 2a) and 2b).

2a) We port all services to use the Map objects. This would mean that services would never look up resources directly. They would always look up layers (constrained by a Map object), similar to how resources are looked up now, always qualified by a namespace.

So in any request, the client would specify the map or virtual configuration to be used, probably with a simple kvp like "map=foo". The service handling the call would then look up that map, and use it to qualify any layers that are looked up.

The downside of 2a) is that it represents a ton of work, on the same scale as porting to the new catalog and configuration. This is where 2b) comes in.

2b) Use a view of the catalog to constrain the layers and resources that a service is allowed to see. This "view" would take the form of a thread-local variable, so a view per request. This view is driven by the map/configuration specified by the client.

So in any request, the first thing that happens is that some pre-processing step figures out what map is being requested, and then populates a thread local catalog variable with a view based on that map. The view is just a wrapper around the real catalog which filters out what is available based on the specified map. This is very similar in nature to how the security stuff works with "SecureCatalog".

The only change in the short term is that the services need to get the catalog from the thread local, rather than the real catalog. This could pretty easily be implemented with a Spring proxy, and would be totally transparent.

The downside of 2b) is that it does not really get us the resource/publishing split, but gets us the virtual configuration. That said, 2b) represents quite a nice gradual upgrade path, as we can port services over after the fact. See 3b).

I'm not fully seeing the downside of the "virtual configuration" approach as far as the service code is concerned. I mean, the service code plays
within one of the Map contexts; that is its "GeoServer", so to speak.
What is the advantage of making this explicit?

3b) This is more or less the same process as 2a) *but* without having to worry about the map being requested, because this occurs transparently through the thread-local view of the catalog.

The only work to do (which is much less) is to have services go through the layer, rather than directly to the resource.

And that is more or less it folks. Please share your thoughts and opinions on this one.

I like the virtual config approach: it keeps the service code lean and
avoids adding an extra concern that the service code is not actually
interested in. It also leaves room for a "per user session
catalog", something we'll want to have once WPS is done: think of
having the ability to register the results of your WPS call into
a local catalog and be able to play with them from WMS and WFS
without having to publish them for everybody to see.
This would also work nicely for people interested in uploading
some data through RestConfig. From this point of view, a per-user
persistent catalog would also be interesting.
We'll have to add user registration eventually anyway; the current
setup, with the admin specifying usernames and passwords in a file,
is not good: the admin should control the user roles, but not
the user passwords.
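
To give an idea of the per user session catalog mentioned above, here is a tiny sketch of a session-local overlay on top of the shared catalog. SessionCatalogSketch is a hypothetical class, and the catalog is reduced to a map from layer name to data source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-session overlay; not an existing GeoServer class.
final class SessionCatalogSketch {
    // Layer name -> data source, visible to everyone. Never modified here.
    private final Map<String, String> shared;
    // Layer name -> data source, visible only within this session.
    private final Map<String, String> sessionLocal = new HashMap<>();

    SessionCatalogSketch(Map<String, String> shared) {
        this.shared = shared;
    }

    // Registers e.g. the result of a WPS call for this session only.
    void registerLocal(String layerName, String source) {
        sessionLocal.put(layerName, source);
    }

    // Session-local entries win over shared ones; returns null if unknown.
    String lookup(String layerName) {
        String local = sessionLocal.get(layerName);
        return (local != null) ? local : shared.get(layerName);
    }
}
```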

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.