[Geoserver-devel] Discussing a deep refactor of the KML subsystem

Hi all,
I’m writing this mail to start a discussion on a GSIP we want to make later down the
line (that is, not by Jan 21) that would result in a full refactor of the KML generation
subsystem in GeoServer. That is a large amount of work, but luckily for us it is also funded,
with the goal of setting up the KML subsystem for the future, both in
terms of evolution and maintenance: it is no big secret that few (none?) developers
are happy to work in that portion of the code base.

The starting point is of course to set up a design that everybody would be happy with;
then we’ll make a proposal and do the work.
So, this is the phase in which we’re trying to gather as much feedback as possible
from the community in order to make the KML subsystem better :-)

A good starting point could be David’s retired proposal on this very topic, here:
http://geoserver.org/display/GEOS/GSIP+21+-+KML+Vector+Transformer+Refactoring

Some ideas I’d certainly like to reuse:

  • factor out code shared with streaming renderer in helper classes to avoid duplication
  • add some selected extension points to make it easier to interact with the kml generator
  • the idea of classifications is a nice one, but a bit out of the scope of the refactoring work
  • pluggable placemark generation is also interesting, though we’d have to figure out
    a way for the various listeners to work together instead of encoding the same element/attribute twice

One aspect that is not considered in the proposal above, but which is imho key to
untangling the existing code, is to have a clear separation between the code that’s
writing the KML and the code that manipulates the feature collections and styles
in order to get the desired result.
To make a parallel, the WMS renderer, StreamingRenderer, does something
similar: it’s not actually painting pixels at the low level, it’s telling a Graphics2D
to do so.

Now, a clean way to do so could be to create an object model of KML, have the
main code build the tree representing the KML output, and then have a second
bit of code doing the XML encoding.
This has some advantages: for example, having a target object model means
that we can have N listeners/plugins manipulate the structure iteratively
without the risk of emitting the same information in the output multiple times.
It has of course a deadly drawback: we’d have to keep everything in memory
to follow this approach, breaking streaming.
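
To make the idea a bit more concrete, here is a minimal sketch of what such an object
model could look like (all class and field names here are made up for illustration,
this is not an existing GeoTools/GeoServer API):

    import java.util.ArrayList;
    import java.util.List;

    import com.vividsolutions.jts.geom.Geometry;

    // A deliberately tiny KML object model: the feature/style manipulation code
    // would build a tree of these objects, and a separate encoder would walk the
    // tree and emit the actual XML
    abstract class KmlObject {
        String name;
        String description;
    }

    class KmlDocument extends KmlObject {
        final List<KmlObject> children = new ArrayList<KmlObject>();
    }

    class KmlFolder extends KmlObject {
        final List<KmlObject> children = new ArrayList<KmlObject>();
    }

    class KmlPlacemark extends KmlObject {
        String styleUrl;
        Geometry geometry; // the JTS geometry to encode as Point/LineString/Polygon
    }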

Keeping everything in memory is of course not an option, but there may be some compromise
between that and a solution that mixes feature/style manipulation and XML writing
in the same chunk of code.

Ideas:

  • have an object model, but work with it in chunks: build a placemark in memory,
    serialize it to XML, then move on to the next one, and so on
    (something similar would happen to Document and Link objects)
  • have a set of nested “builders” that write the XML and have the main
    code call upon them; these builders would have methods dealing
    with KML concepts as opposed to low level XML ones (see the sketch after this list)
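
To show the level of abstraction the main code would program against in the second
idea, here is a rough sketch of a builder sitting on top of a plain XMLStreamWriter
(the KmlStreamBuilder class and its methods are hypothetical, just for illustration):

    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamWriter;

    import com.vividsolutions.jts.geom.Coordinate;
    import com.vividsolutions.jts.geom.Point;

    // Hypothetical builder: the generation code talks KML concepts (document,
    // placemark, point) and never touches raw XML elements directly
    class KmlStreamBuilder {
        private final XMLStreamWriter xml;

        KmlStreamBuilder(XMLStreamWriter xml) {
            this.xml = xml;
        }

        void startDocument(String name) throws XMLStreamException {
            xml.writeStartElement("Document");
            element("name", name);
        }

        void placemark(String name, String styleUrl, Point point) throws XMLStreamException {
            xml.writeStartElement("Placemark");
            element("name", name);
            element("styleUrl", styleUrl);
            xml.writeStartElement("Point");
            Coordinate c = point.getCoordinate();
            element("coordinates", c.x + "," + c.y);
            xml.writeEndElement(); // Point
            xml.writeEndElement(); // Placemark
        }

        void endDocument() throws XMLStreamException {
            xml.writeEndElement(); // Document
        }

        private void element(String name, String value) throws XMLStreamException {
            xml.writeStartElement(name);
            xml.writeCharacters(value);
            xml.writeEndElement();
        }
    }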

I also had a look at xsd-kml, but it seems to me that it’s geared towards
parsing and there is no concept of “streaming writing” using it?

Well, that’s enough for a first cut in the discussion.
Feedback appreciated :-)

Cheers
Andrea

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


First off, this will be a welcome change for all the reasons you stated, so I am very much in support of this work. I will pose a couple of additional questions, not necessarily related to design.

  1. Will we go for feature parity with what is there now? Or shall we cut scope to the things we feel are used?
  2. Is this a good time to factor kml out of the main wms module?

For (2) I think there could be some benefits. What I envision is:

Step 1. Factor out the current kml stuff into its own module and demote it to a community module.
Step 2. Start on the new one as a community module.
Step 3. Once the new module is done, promote it to an extension / core module to replace the first.

Anyways, just a thought. I figured this could be a nice way to allow for backwards compatibility (users can still use the old module) without having to burden the new module with maintaining the same api/contract/features/etc… as the old.

And actually, a follow-up to (2):

  1. Does it make sense to try and do this, for the most part, at the GeoTools level?

More comments inline.


On Wed, Jan 9, 2013 at 8:06 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:


Ideas:

  • have an object model, but work with it in chunks: build a placemark in memory,
    serialize it to XML, then move on to the next one, and so on
    (something similar would happen to Document and Link objects)
  • have a set of nested “builders” that write the XML and have the main
    code call upon them; these builders would have methods dealing
    with KML concepts as opposed to low level XML ones

Object model sounds good to me. Actually, this was one thing that was needed for the recent parsing work: we needed a way to represent kml folders, so we rolled a new class to represent them.

I also had a look at xsd-kml, but it seems to me that it’s geared towards
parsing and there is no concept of “streaming writing” using it?

Yeah, it only does parsing at the moment. In terms of streaming writing, it is indeed possible, as we do this with GML. Basically, if we want to stream a bunch of objects back (say, placemarks in a document), instead of returning a list of all of them we just return an iterator, or a collection with a lazy iterator. As I understand it, this is more or less the way streaming with WPS processes works.
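
For instance, a minimal sketch of the lazy iterator idea, reusing the hypothetical KmlPlacemark class sketched earlier in the thread (again, just an illustration, not existing code; SimpleFeatureCollection/SimpleFeatureIterator are the usual GeoTools types):

    import java.util.Iterator;

    import org.geotools.data.simple.SimpleFeatureCollection;
    import org.geotools.data.simple.SimpleFeatureIterator;
    import org.opengis.feature.simple.SimpleFeature;

    import com.vividsolutions.jts.geom.Geometry;

    // The document exposes its placemarks as an Iterable, but each placemark is
    // only created when the encoder asks for it, so the whole collection is never
    // held in memory at once
    class LazyPlacemarks implements Iterable<KmlPlacemark> {
        private final SimpleFeatureCollection features;

        LazyPlacemarks(SimpleFeatureCollection features) {
            this.features = features;
        }

        public Iterator<KmlPlacemark> iterator() {
            final SimpleFeatureIterator fi = features.features();
            return new Iterator<KmlPlacemark>() {
                public boolean hasNext() {
                    boolean more = fi.hasNext();
                    if (!more) {
                        fi.close();
                    }
                    return more;
                }

                public KmlPlacemark next() {
                    SimpleFeature f = fi.next();
                    KmlPlacemark p = new KmlPlacemark();
                    p.name = f.getID();
                    p.geometry = (Geometry) f.getDefaultGeometry();
                    return p;
                }

                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }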

I do think an xsd-kml approach could work but I am well aware there are issues with it. I am happy to offer help if you folks want to go that way but am fine if you don’t.






Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

First off, this will be a welcome change for all the reasons you stated, so I am very much in support of this work. I will pose a couple of additional questions, not necessarily related to design.

  1. Will we go for feature parity with what is there now? Or shall we cut scope to the things we feel are used?

We have to maintain feature parity.

  2. Is this a good time to factor kml out of the main wms module?

For (2) I think there could be some benefits. What I envision is:

Step 1. Factor out the current kml stuff into its own module and demote it to a community module.
Step 2. Start on the new one as a community module.
Step 3. Once the new module is done, promote it to an extension / core module to replace the first.

Sounds like a sensible plan to me: this way we are free to work on the KML refactor without
impacting other people that might need to use KML on trunk, and at the same time
anyone interested in checking out the new module can do so.

And actually, a follow-up to (2):

  1. Does it make sense to try and do this, for the most part, at the GeoTools level?

Uh, I did not consider this at all. Sounds like a good idea, but I’d first develop everything
in a single code base to ease development, and then eventually backport to GeoTools the
portions that are not tied to GeoServer, like we did for the WPS processes.

Yep, noticed. However, I don’t see anything representing a Placemark, which
is a richer concept than GeoTools’ own Feature, and the styles are not
exactly matched 1-1 to GeoTools’ own styles either.

Yep, my concern here is that the streamable portion of GML/WPS is a flat list of features, whilst
in the case of KML the whole output is more of a tree, so it’s more structured.
I guess we could ask the caller to build the tree in a static way, and leave only the list
of placemarks as a streaming concept.
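
A tiny sketch of what that compromise could look like, reusing the hypothetical classes from
the earlier sketches and imagining that KmlFolder also carried a lazy placemarks Iterable
(all names made up):

    import java.io.IOException;

    import org.geotools.data.simple.SimpleFeatureSource;

    class KmlTreeAssembler {
        // The Document/Folder skeleton is small, so it is built eagerly up front;
        // only the placemark list hanging off each folder is lazy, which keeps
        // the encoding streamable
        KmlDocument assemble(SimpleFeatureSource source) throws IOException {
            KmlDocument doc = new KmlDocument();
            KmlFolder folder = new KmlFolder();
            folder.name = "topp:states";
            doc.children.add(folder);
            // hypothetical lazy field on KmlFolder: placemarks get created only
            // while the encoder walks the iterator
            folder.placemarks = new LazyPlacemarks(source.getFeatures());
            return doc;
        }
    }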

A concern I have about using an “object model” oriented approach is that creating a hierarchy
of classes representing the whole KML model seems like too much work:
https://developers.google.com/kml/documentation/kmlreference

There is a project providing a Java API for KML, but it seems completely dead; besides,
it’s not clear at all whether it can stream:
http://code.google.com/p/javaapiforkml/ (the last commit is over a year old)

I’d avoid the option of generating a model using EMF: that library tries to exert too much control
over the generated objects, it’s not possible to mix in plain beans, and
I have no idea whether one can make a streaming EList.

The downside of using a limited model is that you can only serialize what you can represent…
which is sort of the upside of a streaming-renderer oriented approach, with a streaming
kml builder playing the part of the XML writer: it seems easier to extend to add new features.

The downside of that is that it’s not as pluggable as using the object model, where each “listener”
can manipulate the object model before it gets written.

Help gladly accepted :-)

Cheers
Andrea
