[Geoserver-devel] wfs-ng

Hi all,

I'll be working on the wfs-ng module this week and next, with the
following high level goals:
- port to ContentDataStore
- have a single WFSDataStore implementation with strategy objects for
the different versions, instead of one datastore per wfs version
- make it share the http client code with the wms and wps modules
- implement transaction support for wfs 1.1
- verify interoperability with a bunch of customer provided wfs instances
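
To make the "single datastore with strategy objects" goal concrete, here is a minimal sketch. It is written in Python for brevity rather than in the module's Java, and every class and method name in it is hypothetical, not the wfs-ng API. Only the version numbers and the default GetFeature output formats are taken from the WFS 1.0.0 and 1.1.0 specs.

```python
from urllib.parse import urlencode


class WFSStrategy:
    """Base for per-version behaviour (hypothetical name, not the wfs-ng API)."""

    version = None
    output_format = None

    def get_feature_params(self, type_name):
        # Key-value pairs for a GetFeature request in this protocol version.
        return {
            "service": "WFS",
            "version": self.version,
            "request": "GetFeature",
            "typeName": type_name,
            "outputFormat": self.output_format,
        }


class WFS10Strategy(WFSStrategy):
    version = "1.0.0"
    output_format = "GML2"


class WFS11Strategy(WFSStrategy):
    version = "1.1.0"
    output_format = "text/xml; subtype=gml/3.1.1"


class WFSDataStore:
    """Single datastore; everything version-specific lives in the strategy."""

    _STRATEGIES = {cls.version: cls for cls in (WFS10Strategy, WFS11Strategy)}

    def __init__(self, base_url, version):
        self.base_url = base_url
        self.strategy = self._STRATEGIES[version]()

    def get_feature_url(self, type_name):
        query = urlencode(self.strategy.get_feature_params(type_name))
        return f"{self.base_url}?{query}"
```

The datastore never branches on the version itself; supporting another version would mean adding one more strategy class to the lookup table.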

Just ran into the question of what the earlier approach for "ng"
modules was before they're ready to replace the "original" ones, in
terms of connection parameter clashes.
Do we prefer "ng" modules:
1- to share the connection parameter names with the old ones, so that
they can be easily replaced
2- to temporarily use some different parameter name so that both
original and ng can be used at the same time, but fall back to the
original connection parameter when it's ready to replace the old
module
3- just use a different set of parameter names

Additionally, it just occurred to me that the datastore could be
configured to use both a pooling http client (as the wms client can)
and to fetch features from the upstream server using multiple threads.
Default behavior would be to use a single thread. The old wfs 1.0
client forced spawning a new fetching thread per request, which didn't
scale, so we had to avoid that. I'm thinking a more modern approach
could be taken though, in order to have a fixed number of threads that
hit a given wfs server, and still get some performance improvement by
allowing a single request to use multiple threads to fetch contents,
as long as the upstream server supports paging. I'm not committed yet
to doing that, but feedback would be much appreciated.
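
A rough sketch of the fixed-pool idea, in Python for brevity and under stated assumptions: `fetch_page` is a hypothetical callable that returns one page of features, the total feature count is assumed to be known up front (for example from a hits-only request), and the upstream server is assumed to support paging. None of these names come from the wfs-ng code.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_page, total, page_size, pool):
    """Fetch `total` features in pages, using a shared, fixed-size pool."""
    offsets = range(0, total, page_size)
    # Executor.map() yields results in submission order, so the pages are
    # concatenated in order even if they complete out of order.
    pages = pool.map(lambda offset: fetch_page(offset, page_size), offsets)
    return [feature for page in pages for feature in page]


def fake_fetch_page(start, count, total=10):
    # Stand-in for an HTTP GetFeature call against a paging-capable server.
    return list(range(start, min(start + count, total)))


# One pool per upstream server caps the number of threads hitting it,
# no matter how many client requests are in flight.
pool = ThreadPoolExecutor(max_workers=3)
features = fetch_all(fake_fetch_page, total=10, page_size=4, pool=pool)
pool.shutdown()
```

The point of the design is that the pool size, not the number of incoming requests, bounds the load on the upstream server, unlike the thread-per-request behavior of the old wfs 1.0 client.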

TIA,
Gabriel.

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Tue, Jan 24, 2012 at 1:56 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

Hi all,

I’ll be working on the wfs-ng module this and next weeks with the
following high level goals:

  • port to ContentDataStore
  • have a single WFSDataStore implementation with strategy objects for
    the different versions, instead of one datastore per wfs version
  • make it share the http client code with the wms and wps modules
  • implement transaction support for wfs 1.1
  • verify interoperability with a bunch of customer provided wfs instances

Just ran into the question of what the earlier approach for “ng”
modules was before they’re ready to replace the “original” ones, in
terms of connection parameter clashes.
Do we prefer “ng” modules:
1- to share the connection parameter names with the old ones, so that
they can be easily replaced

2- to temporarily use some different parameter name so that both
original and ng can be used at the same time, but fall back to the
original connection parameter when it’s ready to replace the old
module

3- just use a different set of parameter names

The jdbc case was a bit different because of the “dbtype” parameter, which was really what matched parameters to factory implementations. But I don’t think wfs has an equivalent. Anyway, postgis for example tried not to interfere during the deprecation phase of the old datastore, by only accepting “postgis-ng” as the dbtype. Only once the old datastore was moved to unsupported did the new factory take over and start to accept “postgis”.

I suggest following a similar approach, which I think is your #2.
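
The postgis hand-over described above can be sketched as follows. This is Python for brevity, and the class names, the `can_process` method, and the `replaces_old` flag are all hypothetical; only the “postgis” and “postgis-ng” dbtype values come from the description above.

```python
class OldPostgisFactory:
    """Original factory: claims the original dbtype."""

    def can_process(self, params):
        return params.get("dbtype") == "postgis"


class PostgisNGFactory:
    """New factory: stays out of the way until it replaces the old one."""

    def __init__(self, replaces_old=False):
        self.accepted = {"postgis-ng"}
        if replaces_old:
            # After the old datastore is retired, take over its dbtype too.
            self.accepted.add("postgis")

    def can_process(self, params):
        return params.get("dbtype") in self.accepted


def find_factory(factories, params):
    # First factory that claims the connection parameters wins.
    for factory in factories:
        if factory.can_process(params):
            return factory
    return None
```

While both factories are registered, each set of connection parameters matches exactly one of them. Since wfs has no dbtype equivalent, wfs-ng would need some other distinguishing parameter during the coexistence phase.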

Additionally, it just occurred to me that the datastore could be
configured to use both a pooling http client (as the wms client can)
and to fetch features from the upstream server using multiple threads.
Default behavior would be to use a single thread. The old wfs 1.0
client forced spawning a new fetching thread per request, which didn’t
scale, so we had to avoid that. I’m thinking a more modern approach
could be taken though, in order to have a fixed number of threads that
hit a given wfs server, and still get some performance improvement by
allowing a single request to use multiple threads to fetch contents,
as long as the upstream server supports paging. I’m not committed yet
to doing that, but feedback would be much appreciated.

Makes sense… WFS isn’t exactly built for speed as a protocol, so I imagine that threads will spend a lot of time waiting on I/O anyway… an asynchronous approach probably makes more sense.

TIA,
Gabriel.


Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.



Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Tue, Jan 24, 2012 at 7:41 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

On Tue, Jan 24, 2012 at 1:56 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

Hi all,

I'll be working on the wfs-ng module this and next weeks with the
following high level goals:
- port to ContentDataStore
- have a single WFSDataStore implementation with strategy objects for
the different versions, instead of one datastore per wfs version
- make it share the http client code with the wms and wps modules
- implement transaction support for wfs 1.1
- verify interoperability with a bunch of customer provided wfs instances

Just ran into the question of what the earlier approach for "ng"
modules was before they're ready to replace the "original" ones, in
terms of connection parameter clashes.
Do we prefer "ng" modules:
1- to share the connection parameter names with the old ones, so that
they can be easily replaced

2- to temporarily use some different parameter name so that both
original and ng can be used at the same time, but fall back to the
original connection parameter when it's ready to replace the old
module

3- just use a different set of parameter names

The jdbc case was a bit different because of the "dbtype" parameter, which
was really what did the matching of parameters to factory implementations.
But I don't think wfs has the equivalent. Anyway, postgis for example,
during the deprecation phase of the old datastore tried not to interfere, by
only accepting "postgis-ng" as the dbtype. Once the old datastore was moved
to unsupported only then did the new factory take over and start to accept
"postgis".

Thanks Justin, that makes sense. So I'll follow that approach, letting
them coexist until ng can replace, then it can take over the original
parameter set.

I suggest following a similar approach, which I think is your #2.

Additionally, it just occurred to me that the datastore could be
configured to use both a pooling http client (as the wms client can)
and to fetch features from the upstream server using multiple threads.
Default behavior would be to use a single thread. The old wfs 1.0
client forced spawning a new fetching thread per request, which didn't
scale, so we had to avoid that. I'm thinking a more modern approach
could be taken though, in order to have a fixed number of threads that
hit a given wfs server, and still get some performance improvement by
allowing a single request to use multiple threads to fetch contents,
as long as the upstream server supports paging. I'm not committed yet
to doing that, but feedback would be much appreciated.

Makes sense... WFS isn't exactly built for speed as a protocol, so I imagine
that threads will spend a lot of time waiting on I/O anyway... an
asynchronous approach probably makes more sense.

yup, let's see if something good comes out of it. I'm sure there's a
lot of good literature on how to do that well. Anyways, not a
requirement, just on the wish list.

Cheers,
Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Tue, Jan 24, 2012 at 9:56 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

Hi all,

I’ll be working on the wfs-ng module this and next weeks with the
following high level goals:

  • port to ContentDataStore
  • have a single WFSDataStore implementation with strategy objects for
    the different versions, instead of one datastore per wfs version
  • make it share the http client code with the wms and wps modules
  • implement transaction support for wfs 1.1
  • verify interoperability with a bunch of customer provided wfs instances

Nice. How are you going to deal with all the xml parsing/encoding required?

Just ran into the question of what the earlier approach for “ng”
modules was before they’re ready to replace the “original” ones, in
terms of connection parameter clashes.
Do we prefer “ng” modules:
1- to share the connection parameter names with the old ones, so that
they can be easily replaced
2- to temporarily use some different parameter name so that both
original and ng can be used at the same time, but fall back to the
original connection parameter when it’s ready to replace the old
module
3- just use a different set of parameter names

I agree 2 is probably the best way

Additionally, it just occurred to me that the datastore could be
configured to use both a pooling http client (as the wms client can)
and to fetch features from the upstream server using multiple threads.
Default behavior would be to use a single thread. The old wfs 1.0
client forced spawning a new fetching thread per request, which didn’t
scale, so we had to avoid that. I’m thinking a more modern approach
could be taken though, in order to have a fixed number of threads that
hit a given wfs server, and still get some performance improvement by
allowing a single request to use multiple threads to fetch contents,
as long as the upstream server supports paging. I’m not committed yet
to doing that, but feedback would be much appreciated.

Sounds like a cool optimization, at the same time I’m wondering how much
impact it will actually have. Rationale:

  • relying on paging it will work only against WFS 2.0 official implementations,
    or on WFS 1.0/1.1 vendor extensions. You did not mention WFS 2.0 above,
    is that going to be implemented as well?
    For vendor extensions, how do you recognize them? And how do you deal
    with the issue of the first record being 0 or 1 depending on the implementation
    (GeoServer uses 0, MapServer uses 1, the spec is not clear)
  • parallel requests will actually speed things up only if the bottleneck is the CPU
    and/or if each request is bandwidth capped, so that if you make parallel
    requests you either get more CPU power or more total bandwidth
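
The first-record ambiguity can be isolated in one small function. The sketch below is Python for brevity; the function names are hypothetical, and using `startIndex` and `maxFeatures` as the paging parameters is an assumption about how a given WFS 1.x vendor extension spells them, not something a client can rely on across servers.

```python
def start_index(page, page_size, first_record=0):
    """Index of the first feature on a zero-based `page`.

    `first_record` is 0 for servers that count records from 0 and 1 for
    servers that count from 1.
    """
    return first_record + page * page_size


def paging_params(page, page_size, first_record=0):
    # Parameter names are assumptions; see the note above.
    return {
        "startIndex": start_index(page, page_size, first_record),
        "maxFeatures": page_size,
    }
```

The same logical page therefore maps to different request parameters depending on the server, so the base has to be configured or detected per server.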

I agree it could be an interesting optimization for some cases, at the same
time making a truly interoperable client, that deals with WFS 1.1 servers
not flipping the axis as they should, usage of non EPSG codes (ESRI)
and so on, will be a “lot of fun” already :)

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


On Wed, Jan 25, 2012 at 10:21 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

On Tue, Jan 24, 2012 at 9:56 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

Hi all,

I'll be working on the wfs-ng module this and next weeks with the
following high level goals:
- port to ContentDataStore
- have a single WFSDataStore implementation with strategy objects for
the different versions, instead of one datastore per wfs version
- make it share the http client code with the wms and wps modules
- implement transaction support for wfs 1.1
- verify interoperability with a bunch of customer provided wfs instances

Nice. How are you going to deal with all the xml parsing/encoding required?

For any recognized oddity on each server, a unit test will ensure the
request we send is in the form that server expects. And for each server response
there'll be a mocked up response serving an xml file from the test
resources, plus any other needed thing like response headers etc to
account for server specific deviations from the spec.
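
A minimal sketch of that kind of test, in Python for brevity. `MockHTTPClient` and `fetch_feature_ids` are hypothetical names, the canned XML is inlined here instead of being read from a test resource file, and the `topp` namespace URI is made up for the example.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Stand-in for an XML file kept under the test resources.
CANNED_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:topp="http://example.com/topp">
  <gml:featureMember><topp:road gml:id="road.1"/></gml:featureMember>
  <gml:featureMember><topp:road gml:id="road.2"/></gml:featureMember>
</wfs:FeatureCollection>"""


class MockHTTPClient:
    """Serves a canned body and headers, and records what was requested."""

    def __init__(self, body, headers=None):
        self.body = body
        self.headers = headers or {"Content-Type": "text/xml"}
        self.requested = []

    def get(self, url):
        self.requested.append(url)
        return self.headers, self.body


def fetch_feature_ids(client, url):
    # Code under test: issue the request, then pull out each feature's gml:id.
    _headers, body = client.get(url)
    root = ET.fromstring(body)
    ids = []
    for member in root.findall(f"{{{GML_NS}}}featureMember"):
        for feature in member:
            ids.append(feature.get(f"{{{GML_NS}}}id"))
    return ids
```

A test can then assert both sides: that `client.requested` holds the request in the form a particular server expects, and that the canned response, including any server-specific headers, parses into the expected features.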

As for xml tech, there aren't enough resources to port all the WFS 1.0 xml
handling to the gt-xsd framework, so each protocol strategy object is
free to use whatever it wants. For 1.1 we stick to gt-xsd for most
things. A possible exception is GetFeature response parsing.
Justin mentioned he has a refurbished streaming parser somewhere, so
it'd be worth giving it a try. Otherwise the StAX custom parser
already in the wfs 1.1 client would prevail.

Just ran into the question of what the earlier approach for "ng"
modules was before they're ready to replace the "original" ones, in
terms of connection parameter clashes.
Do we prefer "ng" modules:
1- to share the connection parameter names with the old ones, so that
they can be easily replaced
2- to temporarily use some different parameter name so that both
original and ng can be used at the same time, but fall back to the
original connection parameter when it's ready to replace the old
module
3- just use a different set of parameter names

I agree 2 is probably the best way

yup, thanks.

Additionally, it just occurred to me that the datastore could be
configured to use both a pooling http client (as the wms client can)
and to fetch features from the upstream server using multiple threads.
Default behavior would be to use a single thread. The old wfs 1.0
client forced spawning a new fetching thread per request, which didn't
scale, so we had to avoid that. I'm thinking a more modern approach
could be taken though, in order to have a fixed number of threads that
hit a given wfs server, and still get some performance improvement by
allowing a single request to use multiple threads to fetch contents,
as long as the upstream server supports paging. I'm not committed yet
to doing that, but feedback would be much appreciated.

Sounds like a cool optimization, at the same time I'm wondering how much
impact it will actually have. Rationale:
- relying on paging it will work only against WFS 2.0 official
implementations,
or on WFS 1.0/1.1 vendor extensions. You did not mention WFS 2.0 above,
is that going to be implemented as well?
For vendor extensions, how do you recognize them? And how do you deal
with the issue of the first record being 0 or 1 depending on the
implementation
(GeoServer uses 0, MapServer uses 1, the spec is not clear)
- parallel requests will actually speed things up only if the bottleneck is
the CPU
and/or if each request is bandwidth capped, so that if you make parallel
requests you either get more CPU power or more total bandwidth

yeah, was mostly thinking out loud. Not something to do for this first
round, and certainly would need to evaluate the benefits and provide
enough configurability and perhaps heuristics adaptive to the actual
runtime conditions and/or enough profiling information as to be able
to tune on a case by case basis. Which would make that a project in
its own right, outside the current scope.

I agree it could be an interesting optimization for some cases, at the same
time making a truly interoperable client, that deals with WFS 1.1 servers
not flipping the axis as they should, usage of non EPSG codes (ESRI)
and so on, will be a "lot of fun" already :)

Indeed... if you want to call it fun.

Cheers,
Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Wed, Jan 25, 2012 at 9:34 AM, Gabriel Roldan <groldan@anonymised.com> wrote:

On Wed, Jan 25, 2012 at 10:21 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

On Tue, Jan 24, 2012 at 9:56 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

Hi all,

I’ll be working on the wfs-ng module this and next weeks with the
following high level goals:

  • port to ContentDataStore
  • have a single WFSDataStore implementation with strategy objects for
    the different versions, instead of one datastore per wfs version
  • make it share the http client code with the wms and wps modules
  • implement transaction support for wfs 1.1
  • verify interoperability with a bunch of customer provided wfs instances

Nice. How are you going to deal with all the xml parsing/encoding required?

For any recognized oddity on each server a unit test will ensure the
sent request is like the server likes it. And for each server response
there’ll be a mocked up response serving an xml file from the test
resources, plus any other needed thing like response headers etc to
account for server specific deviations from the spec.

As for xml tech, not enough resources to port all the WFS 1.0 xml
handling for the gt-xsd framework, so each protocol strategy object is
free to use whatever they want. For 1.1 we stick to gt-xsd for most
of the things. A possible exception being GetFeature response parsing.
Justin mentioned he has a refurbished streaming parser somewhere, so
it’d be worth giving it a try. Otherwise the StAX custom parser
already in the wfs 1.1 client would prevail.

Right now this lives on a branch in my github repo.

https://github.com/jdeolive/geotools/blob/kml/modules/extension/xsd/xsd-core/src/main/java/org/geotools/xml/PullParser.java

Personally I think it would be a shame not to use this opportunity to try and consolidate some of our feature parsing code. Although I understand that people generally hate the gt-xsd stuff, so I am happy to help out in this area.

Just ran into the question of what the earlier approach for “ng”
modules was before they’re ready to replace the “original” ones, in
terms of connection parameter clashes.
Do we prefer “ng” modules:
1- to share the connection parameter names with the old ones, so that
they can be easily replaced
2- to temporarily use some different parameter name so that both
original and ng can be used at the same time, but fall back to the
original connection parameter when it’s ready to replace the old
module
3- just use a different set of parameter names

I agree 2 is probably the best way

yup, thanks.

Additionally, it just occurred to me that the datastore could be
configured to use both a pooling http client (as the wms client can)
and to fetch features from the upstream server using multiple threads.
Default behavior would be to use a single thread. The old wfs 1.0
client forced spawning a new fetching thread per request, which didn’t
scale, so we had to avoid that. I’m thinking a more modern approach
could be taken though, in order to have a fixed number of threads that
hit a given wfs server, and still get some performance improvement by
allowing a single request to use multiple threads to fetch contents,
as long as the upstream server supports paging. I’m not committed yet
to doing that, but feedback would be much appreciated.

Sounds like a cool optimization, at the same time I’m wondering how much
impact it will actually have. Rationale:

  • relying on paging it will work only against WFS 2.0 official
    implementations,
    or on WFS 1.0/1.1 vendor extensions. You did not mention WFS 2.0 above,
    is that going to be implemented as well?
    For vendor extensions, how do you recognize them? And how do you deal
    with the issue of the first record being 0 or 1 depending on the
    implementation
    (GeoServer uses 0, MapServer uses 1, the spec is not clear)
  • parallel requests will actually speed things up only if the bottleneck is
    the CPU
    and/or if each request is bandwidth capped, so that if you make parallel
    requests you either get more CPU power or more total bandwidth

yeah, was mostly thinking out loud. Not something to do for this first
round, and certainly would need to evaluate the benefits and provide
enough configurability and perhaps heuristics adaptive to the actual
runtime conditions and/or enough profiling information as to be able
to tune on a case by case basis. Which would make that a project in
its own right, outside the current scope.

I agree it could be an interesting optimization for some cases, at the same
time making a truly interoperable client, that deals with WFS 1.1 servers
not flipping the axis as they should, usage of non EPSG codes (ESRI)
and so on, will be a “lot of fun” already :)

Indeed… if you want to call it fun.

Cheers,
Gabriel

Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Wed, Jan 25, 2012 at 1:45 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Right now this lives on a branch in my github repo.

https://github.com/jdeolive/geotools/blob/kml/modules/extension/xsd/xsd-core/src/main/java/org/geotools/xml/PullParser.java

Cool, I'll definitely give it a try.

Personally I think it would be a shame to not use this opportunity to try
and consolidate some of our feature parsing code. Although I understand that
generally people hate the gt-xsd stuff so I am happy to help out in this
area.

People only hate it when it breaks, which happens to everything. The
95-99% of the time that things work, you don't hear about it. We're
martyrs.

Cheers,
Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.