[Geoserver-devel] Proposing a new community module: join emulation support functions

Hi,
I'd like to propose a new community module that adds limited
support for server side joins by means of filter functions.

The idea is to support the commonly requested use case
of "find all bus stops within x meters of this bank" or "find all the
maple trees in this land parcel", where the bank or the land parcel
lives in another WFS layer and is identified by id or by some other
attribute query.

We don't support that right now, so the client has to make two
separate requests. That gets ugly pretty quickly: the client needs
two round trips, and JavaScript clients cannot really handle huge
geometries (what if the area of interest is the coastline of
Norway, for example?).

The idea of the module is to provide filter functions that run those
queries, so that everything happens server side, where far more
resources are available to do the job:

<wfs:GetFeature ... service="WFS" version="1.0.0">
  <wfs:Query typeName="busStops">
    <ogc:Filter>
      <ogc:DWithin>
        <ogc:PropertyName>busStopGeom</ogc:PropertyName>
        <ogc:Function name="querySingle">
           <ogc:Literal>banks</ogc:Literal> <!-- the layer -->
           <ogc:Literal>bankGeom</ogc:Literal> <!-- the geometry attribute -->
           <ogc:Literal>BANK_ID = 'abcde'</ogc:Literal> <!-- CQL filter -->
        </ogc:Function>
        <ogc:Distance units="meter">1000</ogc:Distance>
      </ogc:DWithin>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>

The module would add this and other functions that extract an
attribute value from a layer in GeoServer, effectively running
both queries right inside the server.
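As a rough illustration of what querySingle would compute, here is a minimal Java sketch. It is not the module's code: it assumes layers can be modelled as in-memory lists of attribute maps, uses a Java `Predicate` in place of the CQL filter, and returns the attribute of the first matching feature (or null). The class `QuerySingleSketch` and its signature are hypothetical; a real function would resolve the layer through the GeoServer catalog and parse the CQL string.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of the proposed querySingle function. A "catalog" maps
 * layer names to in-memory features (attribute maps); a Predicate stands in
 * for the parsed CQL filter.
 */
final class QuerySingleSketch {

    private QuerySingleSketch() {
    }

    /**
     * Returns {@code attribute} from the first feature of {@code layer}
     * accepted by {@code filter}, or null when no feature matches.
     */
    static Object querySingle(Map<String, List<Map<String, Object>>> catalog,
                              String layer,
                              String attribute,
                              Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> features = catalog.get(layer);
        if (features == null) {
            throw new IllegalArgumentException("Unknown layer: " + layer);
        }
        for (Map<String, Object> feature : features) {
            if (filter.test(feature)) {
                return feature.get(attribute);
            }
        }
        return null;
    }
}
```

In the actual request the returned value would become the geometry operand of DWithin; here geometries are plain WKT strings only to keep the sketch dependency-free.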

A version that picks multiple features would look like:

<wfs:GetFeature ... service="WFS" version="1.0.0">
  <wfs:Query typeName="busStops">
    <ogc:Filter>
      <ogc:DWithin>
        <ogc:PropertyName>busStopGeom</ogc:PropertyName>
        <ogc:Function name="collectGeometries"> <!-- collapse list to multigeom -->
          <ogc:Function name="queryCollection">
             <ogc:Literal>banks</ogc:Literal> <!-- the layer -->
             <ogc:Literal>bankGeom</ogc:Literal> <!-- the geometry -->
             <ogc:Literal>BANK_NAME = 'Metro'</ogc:Literal> <!-- CQL -->
          </ogc:Function>
        </ogc:Function>
        <ogc:Distance units="meter">1000</ogc:Distance>
      </ogc:DWithin>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>

Two filter functions: the first returns a list of values, the
second collapses them into a single geometry for DWithin to use.
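The nesting can be sketched in Java as two plain methods, the inner one producing a list and the outer one collapsing it. Everything here is hypothetical: `Geom` and `MultiGeom` are toy stand-ins for real geometry classes (such as JTS geometries), layers are in-memory lists of attribute maps, and the CQL filter is a Java `Predicate`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical toy geometry: a flat array of x,y ordinates. */
final class Geom {
    final double[] ordinates;

    Geom(double... ordinates) {
        this.ordinates = ordinates.clone();
    }
}

/** Hypothetical toy multi-geometry built from a list of parts. */
final class MultiGeom {
    final List<Geom> parts;

    MultiGeom(List<Geom> parts) {
        this.parts = Collections.unmodifiableList(new ArrayList<>(parts));
    }
}

final class JoinFunctionsSketch {

    private JoinFunctionsSketch() {
    }

    /** Inner function: all values of {@code attribute} in the matching features. */
    static List<Object> queryCollection(Map<String, List<Map<String, Object>>> catalog,
                                        String layer,
                                        String attribute,
                                        Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> features = catalog.get(layer);
        if (features == null) {
            throw new IllegalArgumentException("Unknown layer: " + layer);
        }
        List<Object> values = new ArrayList<>();
        for (Map<String, Object> feature : features) {
            if (filter.test(feature)) {
                values.add(feature.get(attribute));
            }
        }
        return values;
    }

    /** Outer function: collapse the list into one geometry usable by DWithin. */
    static MultiGeom collectGeometries(List<Object> values) {
        List<Geom> parts = new ArrayList<>();
        for (Object value : values) {
            if (!(value instanceof Geom)) {
                throw new IllegalArgumentException("Not a geometry: " + value);
            }
            parts.add((Geom) value);
        }
        return new MultiGeom(parts);
    }
}
```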

Of course the second function might get dangerous, so there
would be configurable limits on the number of records returned and
on their size:
- if queryCollection returns more than x records, boom, service exception
- if collectGeometries ends up with too large a geometry (as counted by
  the number of ordinates), boom again
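The two limits could be enforced along these lines. This is a sketch under assumptions: `JoinLimits`, `maxFeatures`, `maxOrdinates` and `JoinLimitException` are hypothetical names, and the exception stands in for whatever service exception GeoServer would actually return. Reading from an iterator lets the record check fail as soon as the limit is exceeded, without first loading the whole result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the service exception the client would receive. */
final class JoinLimitException extends RuntimeException {
    JoinLimitException(String message) {
        super(message);
    }
}

/** Hypothetical configurable limits for the join emulation functions. */
final class JoinLimits {
    final int maxFeatures;
    final long maxOrdinates;

    JoinLimits(int maxFeatures, long maxOrdinates) {
        this.maxFeatures = maxFeatures;
        this.maxOrdinates = maxOrdinates;
    }

    /** Drains the iterator, failing as soon as a value beyond maxFeatures shows up. */
    <T> List<T> collectWithLimit(Iterator<T> values) {
        List<T> result = new ArrayList<>();
        while (values.hasNext()) {
            if (result.size() == maxFeatures) {
                throw new JoinLimitException(
                        "queryCollection returned more than " + maxFeatures + " features");
            }
            result.add(values.next());
        }
        return result;
    }

    /** Fails if the collected geometries together exceed maxOrdinates ordinates. */
    void checkOrdinates(List<double[]> ordinateArrays) {
        long total = 0;
        for (double[] ordinates : ordinateArrays) {
            total += ordinates.length;
            if (total > maxOrdinates) {
                throw new JoinLimitException(
                        "collectGeometries result exceeds " + maxOrdinates + " ordinates");
            }
        }
    }
}
```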

To make this work efficiently we'll also need another bit in
GeoTools: constant function elision.
With the current implementation a function in the filter is evaluated
in memory for each returned feature, which is extremely inefficient.
I want a way to mark a function so that, if it does not use any
feature attribute, we can assume its result is a constant, and thus
optimize it out by evaluating it just once and replacing it with a
literal.
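Constant function elision amounts to constant folding over the filter's expression tree. The sketch below uses a hypothetical mini expression model (`Expr`, `Literal`, `Property`, `Call`), not the GeoTools `Expression` API; the `foldable` flag plays the role of the "mark" described above, so functions whose result is not determined by their arguments alone can opt out. A marked call whose arguments all reduce to literals is evaluated once and replaced by a `Literal`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical mini expression model, evaluated against one feature (an attribute map). */
interface Expr {
    Object evaluate(Map<String, Object> feature);
}

final class Literal implements Expr {
    final Object value;

    Literal(Object value) {
        this.value = value;
    }

    @Override
    public Object evaluate(Map<String, Object> feature) {
        return value;
    }
}

final class Property implements Expr {
    final String name;

    Property(String name) {
        this.name = name;
    }

    @Override
    public Object evaluate(Map<String, Object> feature) {
        return feature.get(name);
    }
}

final class Call implements Expr {
    final String name;
    final List<Expr> args;
    /** The "mark": true if the result depends only on the arguments. */
    final boolean foldable;
    final Function<List<Object>, Object> body;

    Call(String name, List<Expr> args, boolean foldable, Function<List<Object>, Object> body) {
        this.name = name;
        this.args = List.copyOf(args);
        this.foldable = foldable;
        this.body = body;
    }

    @Override
    public Object evaluate(Map<String, Object> feature) {
        List<Object> values = new ArrayList<>();
        for (Expr arg : args) {
            values.add(arg.evaluate(feature));
        }
        return body.apply(values);
    }
}

final class ConstantElision {

    private ConstantElision() {
    }

    /** Bottom-up folding of marked calls whose arguments are all literals. */
    static Expr simplify(Expr expr) {
        if (!(expr instanceof Call)) {
            return expr; // literals and attribute references stay as they are
        }
        Call call = (Call) expr;
        List<Expr> args = new ArrayList<>();
        boolean allLiteral = true;
        for (Expr arg : call.args) {
            Expr simplified = simplify(arg); // fold nested calls first
            args.add(simplified);
            allLiteral &= simplified instanceof Literal;
        }
        Call rebuilt = new Call(call.name, args, call.foldable, call.body);
        if (call.foldable && allLiteral) {
            // No feature attribute is involved: evaluate once, keep the result.
            return new Literal(rebuilt.evaluate(Map.of()));
        }
        return rebuilt;
    }
}
```

Folding bottom-up means a nested chain such as collectGeometries(queryCollection(...)) collapses into a single literal in one pass; after that the filter only holds a literal geometry, which a datastore could in principle encode natively instead of evaluating the function per feature.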

Opinions?

Cheers
Andrea

PS: I know the "right" thing would be to actually support joins, but
I also believe everybody understands that what I'm proposing can be
done in days and gets some of the benefit, with limits on how large
the join can be, whilst actual join support would take many weeks
of work and various changes in the gt2 API as well (Query and datastore
modifications, new datastores, native join support in databases and
the like).

--
Ing. Andrea Aime
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------

+1, sounds like a really great improvement. And I’d say that small improvements that give a concrete win without getting bogged down in the end-all ‘right’ thing are becoming ‘the GeoServer way’, and this is a great step towards joins.

One question: does the (probably lame) WFS-specified join stuff kick in at all here? Is there overlap between what that does and what this provides? Or does that only make sense to implement once we get ‘real’ join support? No worries at all if the answer is ‘we could do it with these constructs, but we don’t care about it’; I’m just curious whether, if a client comes along and says ‘I want official WFS joins’, this could be used as the basis for a limited version of that. Though I do remember that joins seemed to be one of the less thought-out areas of the WFS spec.

C

On Fri, Jan 28, 2011 at 5:17 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> [...]


Geoserver-devel mailing list
Geoserver-devel@anonymised.comsts.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

-----------------------------------------------------

On Fri, Jan 28, 2011 at 2:53 PM, Chris Holmes <cholmes@anonymised.com> wrote:

> [...]

Well, I don't know the joins in the WFS spec well enough to give you a full
answer.
The joins I've seen in WFS 2.0 generate a "tuple", a construct containing
two or more features, more or less equivalent to the following:

select a.p1, a.p2, b.p3, b.p4
from a inner join b on <join condition>

and the WFS result would look something like:

<tuple>
  <a>
     <p1>...</p1>
     <p2>...</p2>
  </a>
  <b>
     <p3>...</p3>
     <p4>...</p4>
  </b>
</tuple>
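For comparison, the tuple-producing join described above is, in its simplest form, a nested-loop inner join. The Java sketch below is a hypothetical illustration only: the `Tuple` and `TupleJoinSketch` names are made up, features are attribute maps, and the projection of p1..p4 is left out. It is not how WFS 2.0 servers or GeoServer implement joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical pair of joined features, mirroring the <tuple> element. */
final class Tuple {
    final Map<String, Object> a;
    final Map<String, Object> b;

    Tuple(Map<String, Object> a, Map<String, Object> b) {
        this.a = a;
        this.b = b;
    }
}

final class TupleJoinSketch {

    private TupleJoinSketch() {
    }

    /** Nested-loop inner join: one tuple per (a, b) pair satisfying the join condition. */
    static List<Tuple> innerJoin(List<Map<String, Object>> as,
                                 List<Map<String, Object>> bs,
                                 BiPredicate<Map<String, Object>, Map<String, Object>> on) {
        List<Tuple> tuples = new ArrayList<>();
        for (Map<String, Object> a : as) {
            for (Map<String, Object> b : bs) {
                if (on.test(a, b)) {
                    tuples.add(new Tuple(a, b));
                }
            }
        }
        return tuples;
    }
}
```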

However, the case I'm after is joining only for the sake
of filtering.
I'm not sure the functions I'm going to build would work
in that case: a generic join filter might involve multiple
conditions across two or more feature types. I have not
thought about it much, but it does not look like something
easy to coax into a simple, self-contained call against a
single layer (which is what the functions I'm proposing do).
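Joining only for the sake of filtering is what relational databases call a semi-join: the result contains only features of the first type, each at most once, provided at least one feature of the other type satisfies the condition. A minimal Java sketch, with hypothetical names and features modelled as attribute maps:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class SemiJoinSketch {

    private SemiJoinSketch() {
    }

    /**
     * Keeps each feature of {@code as} that has at least one match in {@code bs}.
     * No attributes of {@code bs} are returned and no feature is emitted twice.
     */
    static List<Map<String, Object>> semiJoin(List<Map<String, Object>> as,
                                              List<Map<String, Object>> bs,
                                              BiPredicate<Map<String, Object>, Map<String, Object>> on) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> a : as) {
            for (Map<String, Object> b : bs) {
                if (on.test(a, b)) {
                    kept.add(a);
                    break; // one match is enough
                }
            }
        }
        return kept;
    }
}
```

The proposed functions cover the special case where the other side can be computed once, independently of the feature being filtered; a general condition that refers to attributes of both sides, as `on` does here, is what does not reduce to a single self-contained query against one layer.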

Cheers
Andrea


-----------------------------------------------------

+1, sounds like a great idea. Looking forward to hearing how it progresses.

On Fri, Jan 28, 2011 at 7:08 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> [...]


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

-----------------------------------------------------

Cool, was just curious. Thanks for the explanation.

On Fri, Jan 28, 2011 at 9:08 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> [...]