[Geoserver-devel] Layers derived from Processes

So I’ve been hearing a few different use cases around processing, and wanted to get a discussion going on the possibilities for some GeoServer improvements that could improve our story.

One is to be able to more fully define layers derived from processes on the fly. Right now we have the rendering transformations, which are totally awesome. And you can use them to render a layer on the fly. But if you want to query the resulting layer you have to recreate the exact WPS request that’s embedded in the SLD. And if you want to change the visualization you also have to get comfortable with the full SLD. It seems like it’d be better if you could just define a layer that’s the result of a WPS process, coming from one or more layers. It’d be a layer that lives in the catalog, that can be queried with WMS or WFS/WCS, could have further WPS done on it, and could have a variety of SLDs associated with it. It’d be listed in the capabilities document. But it’d always be constructed on the fly, running the backend WPS process on the layers that make it up. The advantage would be that it always stays up to date, and also doesn’t take up lots of room in a database. It’d be ideal for WPS processes that can be applied quickly, and/or that work on relatively small datasets. It’d be similar to a SQL View layer, but instead of being defined by SQL it’d be defined by a Process and its inputs (be they layers or set variables, or perhaps even parametric like SQL views can be).

The other that jumps to mind is a related construct that would work for processes that may not work so well on the fly. Think like routing, where you need to load the whole graph in memory. Or even heatmaps could benefit, by doing a global heatmap on the whole layer, which would tile better. This would be sort of like a ‘cache’ of the process. I think it’d work similarly to how the WPS process that outputs to the GeoServer catalog works. It’d output the process result to a new layer, storing it in the default database or raster format. But I think a key difference/improvement would be for it to be ‘live’ - have knowledge of changes to the base layer. So just like we kill the tile cache when an edit comes in over WFS-T or we’re notified on GeoRSS, so too the derived WPS layer would rerun its whole process if there’s a change.

A related type of layer would be to just run the process on a schedule. So like if the notification stuff is not easy to set up, or the user knows they’ll want it run at certain times each day, they could define a derived WPS layer that re-runs the process at that set time.
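
As a rough illustration only, the scheduled variant could be sketched in plain Java like this. Everything here is hypothetical: `ScheduledDerivedLayer` is not a GeoServer class, the process is modelled as a simple `Supplier`, and a fixed period stands in for "certain times each day".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of a derived layer that re-runs its process on a schedule. */
class ScheduledDerivedLayer<T> {
    private final Supplier<T> process;   // stands in for the WPS process plus its inputs
    private volatile T latest;           // last stored result, served to clients
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ScheduledDerivedLayer(Supplier<T> process) {
        this.process = process;
        this.latest = process.get();     // compute once when the layer is created
    }

    /** Re-runs the process and replaces the stored result. */
    void refresh() {
        latest = process.get();
    }

    /** Re-runs the process every 'period' units (an assumption: fixed rate, not clock times). */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
    }

    T getLatest() {
        return latest;
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Calling `start(24, TimeUnit.HOURS)` would approximate a daily re-run; between runs, requests are served from the stored result.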

Do others have other types of layers derived from WPS that could be useful? I think it’d be great to have a GUI that lets our users easily make all three types of layers. I haven’t reached any great insight on how that GUI would work - they could all be different types of a Derived Layer, with different config options. Or they could each be their own type of layer, which might make it clearer? Or maybe more confusing. I guess they maybe wouldn’t depend on a datastore? Or we could make a ‘virtual datastore’ for convenience. But each layer would just be defined by one or more other layers plus one or more processes.

I do think having this could make the WPS builder tool a lot more useful. Right now you make your query, but it then just returns you a big XML document or maybe a GeoTIFF of what you made. It’d be great if instead you could build your query and then look at the results in OpenLayers. We could make that pretty easily now with a shortcut to the gs catalog import process. But with the above options you could have more control. The on-the-fly one in particular could be nice, as it could ease the creation of complex rendering transformations. Users could just define the transform, and then use a standard tool like uDig or GeoExplorer to define the styling. With these as options GeoServer becomes a place to explore spatial processing, instead of just building WPS requests. And it could make it easier to make pure JavaScript clients that let people apply processes to full layers, instead of just applying them to smaller datasets and getting the results in the browser.

Curious for people’s feedback, as I imagine others have been thinking on this too. And indeed just generally from everyone on what other improvements we could do on the WPS, since it’s been out there for a bit, and I’ve heard of some people doing some pretty cool stuff with it.

Chris

On Wed, Jan 16, 2013 at 10:50 PM, Chris Holmes <cholmes@anonymised.com> wrote:

So I’ve been hearing a few different use cases around processing, and wanted to get a discussion going on the possibilities for some GeoServer improvements that could improve our story.

One is to be able to more fully define layers derived from processes on the fly. Right now we have the rendering transformations, which are totally awesome. And you can use them to render a layer on the fly. But if you want to query the resulting layer you have to recreate the exact WPS request that’s embedded in the SLD. And if you want to change the visualization you also have to get comfortable with the full SLD. It seems like it’d be better if you could just define a layer that’s the result of a WPS process, coming from one or more layers. It’d be a layer that lives in the catalog, that can be queried with WMS or WFS/WCS, could have further WPS done on it, and could have a variety of SLDs associated with it. It’d be listed in the capabilities document. But it’d always be constructed on the fly, running the backend WPS process on the layers that make it up. The advantage would be that it always stays up to date, and also doesn’t take up lots of room in a database. It’d be ideal for WPS processes that can be applied quickly, and/or that work on relatively small datasets. It’d be similar to a SQL View layer, but instead of being defined by SQL it’d be defined by a Process and its inputs (be they layers or set variables, or perhaps even parametric like SQL views can be).

The other that jumps to mind is a related construct that would work for processes that may not work so well on the fly. Think like routing, where you need to load the whole graph in memory. Or even heatmaps could benefit, by doing a global heatmap on the whole layer, which would tile better. This would be sort of like a ‘cache’ of the process. I think it’d work similarly to how the WPS process that outputs to the GeoServer catalog works. It’d output the process result to a new layer, storing it in the default database or raster format. But I think a key difference/improvement would be for it to be ‘live’ - have knowledge of changes to the base layer. So just like we kill the tile cache when an edit comes in over WFS-T or we’re notified on GeoRSS, so too the derived WPS layer would rerun its whole process if there’s a change.

A related type of layer would be to just run the process on a schedule. So like if the notification stuff is not easy to set up, or the user knows they’ll want it run at certain times each day, they could define a derived WPS layer that re-runs the process at that set time.

Fully on board so far, that’s exactly in line with what I was thinking in terms of layers derived from WPS processes
(something I have been wanting for quite some time, but failed to secure funding for so far).

Do others have other types of layers derived from WPS that could be useful? I think it’d be great to have a GUI that lets our users easily make all three types of layers. I haven’t reached any great insight on how that GUI would work - they could all be different types of a Derived Layer, with different config options. Or they could each be their own type of layer, which might make it clearer? Or maybe more confusing. I guess they maybe wouldn’t depend on a datastore? Or we could make a ‘virtual datastore’ for convenience. But each layer would just be defined by one or more other layers plus one or more processes.

To me they seem like the same thing, with different optional behaviors:
a ProcessResourceInfo, with different ways for the ResourcePool to manage it: streaming, scheduled,
or on-demand update (with a “dirty” flag that REST config can manage).
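
A minimal sketch of that idea, in plain Java and under loud assumptions: `ProcessLayerSketch`, `UpdateMode` and `markDirty` are made-up names for illustration, not existing GeoServer API, and the process is reduced to a `Supplier`.

```java
import java.util.function.Supplier;

/** Hypothetical update behaviors for a process-derived resource. */
enum UpdateMode { STREAMING, SCHEDULED, ON_DEMAND }

/** Hypothetical sketch of a process-backed resource with a "dirty" flag. */
class ProcessLayerSketch<T> {
    private final Supplier<T> process;  // stands in for the process plus its input layers
    private final UpdateMode mode;
    private T cached;
    private boolean dirty = true;       // nothing computed yet

    ProcessLayerSketch(Supplier<T> process, UpdateMode mode) {
        this.process = process;
        this.mode = mode;
    }

    /** Would be called when a source layer changes, by a scheduler, or via REST config. */
    synchronized void markDirty() {
        dirty = true;
    }

    synchronized T getResult() {
        if (mode == UpdateMode.STREAMING) {
            return process.get();       // always computed on the fly, never cached
        }
        if (dirty) {                    // SCHEDULED and ON_DEMAND recompute only when flagged
            cached = process.get();
            dirty = false;
        }
        return cached;
    }
}
```

In this sketch the scheduled and on-demand variants differ only in who calls `markDirty()`: a timer in one case, a change notification or an administrator in the other.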

The idea of using a datastore in the middle may make things quicker to implement, but also clunkier; having
a dedicated resource type is imho cleaner, even if it’s more work.
One can refer to the latest resource type added, the cascaded WMS layers: it took a few days, but the
result is imho better than if we had tried to force that into a CoverageStore/CoverageInfo paradigm.

I do think having this could make the WPS builder tool a lot more useful. Right now you make your query, but it then just returns you a big XML document or maybe a GeoTIFF of what you made. It’d be great if instead you could build your query and then look at the results in OpenLayers. We could make that pretty easily now with a shortcut to the gs catalog import process. But with the above options you could have more control. The on-the-fly one in particular could be nice, as it could ease the creation of complex rendering transformations. Users could just define the transform, and then use a standard tool like uDig or GeoExplorer to define the styling. With these as options GeoServer becomes a place to explore spatial processing, instead of just building WPS requests. And it could make it easier to make pure JavaScript clients that let people apply processes to full layers, instead of just applying them to smaller datasets and getting the results in the browser.

Yep, fully agreed.

The distinction between processes that can stream and ones that need to load everything in memory
should probably be made clearer to the administrator, as it will influence which caching option
is chosen for the process results, and it’s a must in terms of resource control.

I also agree that having something like the process builder to help set up these “processing layers”
would be great.
Just a word of caution about the process builder: I created it as a quick and dirty tool to interactively
run processes, but it’s definitely “ugly inside”, so you might want to heavily refactor it, or just start
clean.

Cheers
Andrea


Ing. Andrea Aime
@geowolf
Technical Lead
