[Geoserver-devel] Feature Source with vector generalisations

Last week I posed a question about having a FeatureSource that also holds some generalizations of the vector data, to speed up response time and reduce data transfer.

While implementing such a FeatureSource is surely not the problem, I need the ScaleDenominator from the caller.

I studied the source and I want to ask the specialists about the following assumptions:

1) StreamingRenderer and ShapeFileRenderer use FeatureSource>>getFeatures(queryObject)
2) The Query interface offers the possibility to pass Hints

My idea would be to pass the scaleDenominator as a hint to the queryObject. This makes sense since
the scaleDenominator is only useful for some use cases and should not be part of the FeatureSource API.

I am not sure whether the ScaleDenominator alone is enough; perhaps a unit of measure is also required. On the other hand, the Query interface offers a getter for the CRS, so I think I can get the unit from the CRS. Yes or no?

Anyway, if my assumptions come close to the truth, I would implement it and insert one line into the mentioned renderer classes to test it.
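The one-line change could look like the following sketch (in Python for brevity rather than the actual GeoTools Java API; the hint key name SCALE_DENOMINATOR and the toy classes are made up for illustration):

```python
# Sketch: a query object carrying optional hints, in the spirit of the
# GeoTools Query interface. SCALE_DENOMINATOR is a hypothetical hint key.
SCALE_DENOMINATOR = "scaleDenominator"

class Query:
    def __init__(self, type_name, hints=None):
        self.type_name = type_name
        self.hints = hints or {}

class HintAwareFeatureSource:
    """A feature source that picks a generalization level from the hint."""
    def get_features(self, query):
        scale = query.hints.get(SCALE_DENOMINATOR)
        if scale is None:
            return "base geometries"       # no hint: serve full-detail data
        return f"geometries generalized for 1:{int(scale)}"

# The renderer would add the one line the mail talks about:
query = Query("roads", hints={SCALE_DENOMINATOR: 50000.0})
source = HintAwareFeatureSource()
print(source.get_features(query))  # geometries generalized for 1:50000
```

The point of going through the hints map is exactly what the mail argues: sources that ignore the hint keep working unchanged.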

Opinions?

Hi Christian,

I will delegate to Andrea on this one since he is the rendering expert, but I will say that IMO this is a perfectly valid use of a query hint. And indeed, AFAIK many other types of hints are passed down by the renderer to perform similar functions, like decimation, etc.

2c,

-Justin

Christian Müller wrote:

[...]

------------------------------------------------------------------------------
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

A few comments inline.

On Tue, Mar 31, 2009 at 3:26 AM, Christian Müller
<christian.mueller@anonymised.com> wrote:

Last week I posed a question about having a FeatureSource which has also
some generalizations of the vector data to speed up response time and reduce
data transfer.

While implementing such a FeatureSource is surely not the problem, I
need the ScaleDenominator from the caller.

I studied the source and I want to ask the specialists about the following
assumptions:

1) StreamingRenderer and ShapeFileRenderer use
FeatureSource>>getFeatures(queryObject)
2) The Query interface offers the possibility to pass Hints

My idea would be to pass the scaleDenominator as a hint to the queryObject.
This makes sense since the scaleDenominator is only useful for some use cases and should not be part of the FeatureSource API.

Indeed; this is a good approach, and you may find such information
already available, used to control decimation.

I am not sure here if the ScaleDenominator is enough, perhaps a unit of
measure is also required. On the other side, the Query interface offers a
getter for the CRS, I think I can get the unit from the CRS, yes or no?

It depends how you want to think about the problem; you may also need
to consider the data CRS. You may wish to explore where to perform
your generalization: before decimation or after? Before transformation
or after?

Anyway, if my assumptions come close to the truth, I would do such an
implementation and insert one line in the mentioned renderer classes to test
it.

opinions ?

Sounds good; you should be able to provide a hint to the renderer
prior to calling it, i.e. you may not even need to insert a line in the
renderer classes for your experiment.

Jody

Christian Müller ha scritto:

[...]

If you're trying to speed up the rendering process by doing generalization in memory inside a FeatureSource wrapper, hem, sorry,
you're on the wrong path, at least for WMS.

The renderer is already doing that, and in a very efficient way, so
if you implemented it only in memory, as a wrapper, you would at
best get a small slowdown, because you'd be doing the generalization
inside the wrapper, and the renderer would try to do it again (uselessly
since you already did it).

The case is different when you're not using a wrapper, but have
some native way to access pre-decimated data in your datastore, for example if:
- your native store contains multiple versions of the same geometry at
   different generalization levels
- your native store has a way to efficiently generalize the data on the
   fly, and this can reduce network traffic and the associated data
   encoding/decoding cost

Algorithm-wise, there are many, but I've found that using generic and
well-behaved algorithms like Douglas-Peucker results in a _slowdown_
when performed inside the renderer; the recursive nature of the
algorithm makes generalization more expensive than the speedup gained
by rendering generalized data (if you are also reprojecting, maybe
you still get a speedup, as reprojection might be more expensive
than generalization; I don't know exactly because I did not try this
case out). So the current renderer uses a very simple one-pass algorithm
that avoids expensive calculations and is based on pure offset:
a point is skipped if its deltaX and deltaY are less than one pixel size
from the last retained point.
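The one-pass offset algorithm described above can be sketched as follows (a Python illustration, not the actual renderer code; `span` stands for the pixel size expressed in map units):

```python
def decimate(coords, span):
    """One-pass offset decimation: drop a point when both its deltaX and
    deltaY from the last retained point are smaller than `span`."""
    if not coords:
        return []
    kept = [coords[0]]                     # always keep the first point
    for x, y in coords[1:-1]:
        lx, ly = kept[-1]
        if abs(x - lx) >= span or abs(y - ly) >= span:
            kept.append((x, y))
    if len(coords) > 1:
        kept.append(coords[-1])            # always keep the last point too
    return kept

line = [(0, 0), (0.2, 0.1), (0.4, 0.3), (2, 2), (2.1, 2.2), (5, 5)]
print(decimate(line, 1.0))  # [(0, 0), (2, 2), (5, 5)]
```

Note the single pass and the absence of any distance computation; as the text says, this trades topological guarantees for speed.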

So, if you have some kind of native support it's worth experimenting
with it. I can add one hint, the generalization distance/offset,
that will be specified in the same unit as the native data, and you
can leverage it inside your datastore to decide how you want to
generalize.

When mixing generalization and transformation things get more complex.
As a rule of thumb, the renderer now never asks the datastore to
transform data, for two reasons:
- the datastores have historically been very unreliable at reprojecting
   data
- the renderer does reprojection after generalizing (this is
   very important to get good performance, as reprojection is expensive)
   and datastores so far did not have any generalization capability

Jody mentioned generalizing before/after reprojection.
This is an interesting topic too. The renderer always transforms
after generalization to get a good speedup of the whole rendering
process; this is good in the common case, but not in less
well behaved cases.
The issue is related to how the generalization distance is picked.
Right now the renderer gets the pixel size in the rendering CRS, and
back-transforms it into the native CRS by placing a small segment in
the middle of the rendered area. If the transformation is within
its "well behaved" area the result is going to be a good distance
for the whole rendering area, but if the linear deformation varies
wildly across the rendered area you may over-generalize the input
data, resulting in bad rendering afterwards.
Two examples I can make are:
- rendering a polar stereographic so that the pole is in the middle
   and the rendered area goes well below 80° latitude. In this case
   you'll see ruined polygons (e.g., they were touching in the
   native data, but they will be rendered as disconnected) at
   the borders of the map
- rendering a UTM zone well beyond 6° from the central meridian,
   with the same effect
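The back-transform of the pixel size described above can be sketched like this (a Python illustration; `inverse_transform` is a made-up stand-in for a real inverse map projection, and the crude "1 degree is about 100 km" factor is an assumption for the example only). The single mid-area sample is exactly what makes the badly behaved cases above fail:

```python
import math

def generalization_distance(inverse_transform, bbox, pixel_size):
    """Estimate the generalization distance in the native CRS by placing
    a small segment of length `pixel_size` (rendering-CRS units) in the
    middle of the rendered area and measuring it after back-transforming.
    `inverse_transform` maps rendering-CRS (x, y) to native-CRS (x, y)."""
    minx, miny, maxx, maxy = bbox
    cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
    x0, y0 = inverse_transform(cx, cy)
    x1, y1 = inverse_transform(cx + pixel_size, cy)
    return math.hypot(x1 - x0, y1 - y0)

# Rendering CRS in meters, native CRS in "degrees", using the crude
# linear approximation mentioned in the lead-in.
to_native = lambda x, y: (x / 100_000.0, y / 100_000.0)
d = generalization_distance(to_native, (0, 0, 10_000, 10_000), 25.0)
print(round(d, 8))  # 0.00025
```

With a constant-deformation transform like this one the sample is exact; with a polar stereographic or an over-stretched UTM zone the deformation at the map borders diverges from the mid-area sample, which is the over-generalization problem described above.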

If you value accuracy more than speed even in those badly behaved
cases, you should do the reprojection before the generalization.
A compromise could be performing some sampling of the area being
rendered and picking the lowest generalization distance, or generating
a grid of generalization distances so that each area uses
a more appropriate generalization.
Both of these approaches would fail anyway if the projection
has singularities, as there is no grid sampling good enough to
cope with a linear deformation that diverges close to a line
or a point inside (or at the border of) your map.
To be fair, though, you should not render such a map to start with,
as you've gone way past any reasonable standard of good cartographic
representation.

Jody's point about the choice of algorithm is also interesting.
The algorithm I described above is good for rendering, assuming
the renderer can cope with invalid geometries, as the algorithm
gives no guarantee that the result will be topologically correct.
However, for the rendering case that is good enough.

JTS has a topology-preserving Douglas-Peucker implementation,
and that's what I would choose if I had to deal with WFS output,
where speed is not such an issue (XML encoding is so expensive
that the difference in generalization algorithm speed
will be unnoticeable anyway) but you want to give your
clients good data.
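For reference, the classic (non topology-preserving) Douglas-Peucker algorithm that the JTS variant builds on looks roughly like this (a Python sketch, not the JTS code):

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                # degenerate chord: a == b
        return math.hypot(px - ax, py - ay)
    # parallelogram area divided by base length gives the height
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def douglas_peucker(coords, tolerance):
    """Classic recursive Douglas-Peucker simplification.
    NOT topology preserving: rings may self-intersect afterwards."""
    if len(coords) < 3:
        return list(coords)
    # find the point farthest from the chord between the endpoints
    dists = [perpendicular_distance(p, coords[0], coords[-1])
             for p in coords[1:-1]]
    split = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[split - 1] <= tolerance:      # whole run is within tolerance
        return [coords[0], coords[-1]]
    left = douglas_peucker(coords[:split + 1], tolerance)
    right = douglas_peucker(coords[split:], tolerance)
    return left[:-1] + right               # drop the duplicated split point

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, 1.0))  # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

The recursion over the farthest point is what makes it more expensive than the one-pass decimation the renderer uses, as argued earlier in this message.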

Hope this helps
Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Thanks for the information.

I never had the intention to generalize on the fly. I want to implement the same principle I used
with the imagemosaic-jdbc module: storing pyramid levels in the datastore.

All the pre-generalized geometries have to be valid, because WFS should also be supported.

As you pointed out, I think there are 2, perhaps 3, interesting cases:

1) The native store contains multiple generalizations of the same geometry.
  In the case of a shapefile, each generalized shapefile has the same number of records and
  the same attributes, but different geometries. In the case of a JDBC database, one could make it more intelligent by using views (I did it this way).
2) The native store has the possibility to generalize before sending the data
  (e.g. DB2 has an SQL function st_generalize(...) using Douglas-Peucker).
3) Another idea would be to generalize a FeatureSource at system startup and put the generalized geometries in a temp directory. These generalizations MUST be executed in a background thread
  at low priority and should not influence the response time of the system. If a generalization is finished, use it; otherwise fall back to an already finished generalization or to the base geometries. At any point in time, the FeatureSource is fully functional; only the returned geometries may differ.
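The fallback logic of idea 3 could be sketched like this (a Python illustration; the class and method names are invented, and the background worker is only indicated in a comment to keep the sketch deterministic):

```python
import threading

class PyramidRegistry:
    """Sketch of idea 3: generalization levels are produced by a
    low-priority background worker; until a level is finished, lookups
    fall back to the next coarser finished level or to the base
    geometries (registered under distance 0.0)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._finished = {0.0: "base"}     # base geometries always available

    def publish(self, distance, data):
        with self._lock:                   # called by the background worker
            self._finished[distance] = data

    def best_available(self, distance):
        with self._lock:
            # largest finished generalization distance not above the request
            usable = [d for d in self._finished if d <= distance]
            return self._finished[max(usable)]

reg = PyramidRegistry()
print(reg.best_available(50.0))            # base: nothing generalized yet

# In real code the worker would run in a low-priority daemon thread, e.g.
# threading.Thread(target=build_level, daemon=True).start()
reg.publish(10.0, "level-10")
print(reg.best_available(50.0))            # level-10 is now the best fit
print(reg.best_available(5.0))             # still the base geometries
```

The FeatureSource stays fully functional throughout, exactly as the message requires; only the detail of the returned geometries changes as levels finish.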

But anyway, I do not want any generalization on the fly and 1) is my first target.

Passing the distance/offset in the native CRS as a hint would make me happy :-)
If the hint is missing, you always get the base geometries; the same holds true if the value of the hint is 0.

I would like to avoid CRS transformations to be consistent with other FeatureSources.

Please give me a hint and I would start. If possible in 2.5.x and 2.6.x; I need this feature in GeoServer.

christian

Andrea Aime writes:

[...]

Christian Müller ha scritto:

But anyway, I do not want any generalization on the fly and 1) is my first target.

Passing the distance/offset in the native CRS as a hint would make me happy :-)
If the hint is missing, you always get the base geometries; the same holds true if the value of the hint is 0.

I would like to avoid CRS transformations to be consistent with other FeatureSources.

Please give me a hint and I would start. If possible in 2.5.x and 2.6.x; I need this feature in GeoServer.

Hum, I looked into the streaming renderer code and the thing is
unfortunately not that easy, as the generalization distance in the
general case has to be evaluated on a feature-by-feature basis:
- the same data source can provide different geometry properties
   in distinct spatial reference systems. In this case, no single
   rendering hint can be provided
- the same column can contain geometries with different spatial
   reference systems too. I know it sounds crazy, but as far as I remember
   OGC WFS makes sure that you can handle that case as well... Mixed
   with the fact that in GeoServer you have to declare an SRS anyway
   (we have no way to support an SRS-less dataset at the moment),
   it means the code cannot just check whether the native SRS is null
   (which is what I would expect from a mixed-SRS geometry column).
   Btw, Justin, can you confirm the above, or is it just my memory
   failing me?

So it seems I can pass down a generalization distance only if
there is no reprojection going on and the native SRS has not
been tampered with?
Alternatively, I guess I could pass down a generalization distance
in the rendering SRS, and pass down the rendering CRS as well.
It would then be the job of the datastore to figure out whether the
generalization distance needs to be adjusted given the native SRS of
the data. Opinions?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Andrea, I am not sure what you mean by

"same data source can provide different geometries properties"

1a)
I assume you mean that within one feature type we can have more than one geometry property.
Since a feature has a default geometry, this is the one we should use. If I have a feature with more
geometry properties with different CRSs, it is my responsibility to generalize all geometries in a proper way in advance. If there is no distance for the default geometry --> no hint.

1b)
If you mean that I have to render layers with different CRSs, I see no problem. The generalization distance is individual for each feature source.

2) Having different CRSs within the geometries of one geometry property: forget it. No hint. I don't want to make a master's thesis out of CRS calculations.

Btw, which names should I use?
package: org.geotools.data.gen
FeatureSource org.geotools.data.GenFeatureSource
.....

Andrea Aime writes:

[...]

Christian Müller ha scritto:

Andrea, I am not sure what you are meaning
"same data source can provide different geometries properties"
1a)
I assume you mean that within one feature type we can have more than one geometry property.

Indeed.

Since a feature has a default geometry, this is the one we should use.

Wrong:
- if it's WMS, each Symbolizer has a Geometry element that can be used
   to choose which geometry will be used for rendering
- if it's WFS, all geometries will be dumped into GML unless otherwise
   specified

The "default geometry" is a WMS-only concept: it's the geometry being
used when the Symbolizer element does not contain a Geometry sub-element
explicitly specifying the geometry property to be used.

If I have a feature with more
geometry properties with different CRSs, it is my responsibility to generalize all geometries in a proper way in advance. If you have no distance for the default geometry --> no hint

If you have different CRSs,
the generalization distance, expressed in the native data CRS (as
you said, you don't want to deal with transformations), is different
for every CRS due to the different linear deformation each projection
has. The query is one, but the generalization is one for each
SRS queried. I know this case is neither common nor practical, but
I cannot behave as if it did not exist; one behaviour for such
a case has to be specified.

1b)
If you mean you have to render layers with different CRSs, I see no problem. The generalization difference is individual for each feature source.

No, one source may have different geometry columns. Basically
only shapefiles are limited to a single geometry property; spatial
databases and GML files may have many (I believe one of the OGC
conformance tests mandates a feature type that has 5 separate geometry
columns, one of which has generic geometries, each of them in a
different CRS).

2) Having different CRSs within the geometries of one geometry property, forget it. No hint. I dont want to make a master thesis about CRS calculations.

Btw, which names should I use
package: org.geotools.data.gen
FeatureSource org.geotools.data.GenFeatureSource

The package is ok; we try not to use abbreviated names though.
MultiResolutionFeatureSource or GeneralizingFeatureSource could
be acceptable names, though I agree they are long.
Anyway, if you say you have native support, won't that be
part of the DB2 package or something?
Also remember that to have anything work in GeoServer you have to
start with a DataStore; it's the only pluggable thing.
Or else, explain how you want to proceed and let's see if
you need more extension points out of GeoServer.

Sorry to pester, but if you want to put this into GeoServer
it has to try and play well with the existing use cases.
Probably the quickest approach is to pass down the resolution
hint only if there is a single geometry column in use and
a single CRS. In this case, and assuming the linear deformation
does not vary wildly, it makes sense to talk about a single
generalization distance expressed in the datastore's native CRS.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

- the same column can contain geometries with different spatial
  reference system too. I know it sounds crazy, but as far as I remember
  OGC WFS makes sure that you can handle that case as well... mixed
  with the fact that in GeoServer you have to declare a SRS anyways
  (we have no way to support a srs-less dataset at the moment)
  it means the code cannot just check if the native SRS is null
  (which is what I would expect from a mixed SRS geometry column).
  Btw, Justin, can you confirm the above or it's just my memory failing
  me?

The spec definitely does not prohibit it. But the CITE tests only ensure we can handle multiple geoms in the same referencing system.

[...]

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira ha scritto:

[...]

The spec definitely does not prohibit it. But the CITE tests only ensure we can handle multiple geoms in the same referencing system.

Then I wonder why we have all this code around checking the JTS Geometry
user data for a per-geometry CRS (in the rendering code in particular;
it was already there before I joined GeoServer)... I thought I saw it
in the GML encoding path too, but I may be wrong.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Then I wonder why we have all this code around checking the JTS Geometry
user data for a per-geometry CRS (in the rendering code in particular,
it was already there before I joined GeoServer)... I thought I saw it
in the GML encoding path but I may be wrong

I meant that the CITE tests only test the same-referencing-system case. Having a different referencing system per geometry is still perfectly valid according to the spec; there is just no code that tests it.

And indeed most of the GML parsing/encoding code is written generically to handle this case.


--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

1)
Andrea, after the last messages I would propose:
pass the resolution hint if you are sure that nothing bad can happen. However you do it, I rely on getting the hint for feature types having one geometry property with a single CRS.

2) I did a quick implementation of GeneralizingFeatureSource which has a constructor

public GeneralizingFeatureSource(FeatureSource<SimpleFeatureType, SimpleFeature> baseFeatureSource,
      Map<Double,FeatureSource<SimpleFeatureType, SimpleFeature>> genMap)

Ok, if I have all feature sources and distances, life is easy.
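Given such a genMap, the lookup could be sketched like this (a Python illustration of the data the proposed constructor holds, not real GeoTools code; the hint semantics follow the earlier message: a missing or zero distance returns the base source):

```python
class GeneralizingFeatureSource:
    """Python sketch of the constructor above: a base feature source plus
    a map of generalization distance -> pre-generalized feature source."""
    def __init__(self, base_source, gen_map):
        self.base = base_source
        self.gen_map = dict(gen_map)       # {distance: feature source}

    def source_for(self, distance):
        # hint missing or 0 -> base geometries, as proposed in the thread
        if not distance:
            return self.base
        usable = [d for d in self.gen_map if d <= distance]
        if not usable:
            return self.base               # nothing coarse enough prepared
        return self.gen_map[max(usable)]   # most generalized level allowed

fs = GeneralizingFeatureSource("base", {5.0: "gen5", 25.0: "gen25"})
print(fs.source_for(None))   # base
print(fs.source_for(10.0))   # gen5
print(fs.source_for(100.0))  # gen25
```

Picking the largest prepared distance not exceeding the hint means the caller never receives data coarser than requested, which keeps the rendering correct at the given scale.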

Some points where I need some help:

a)
I think a better solution would be having the feature type names as arguments for the constructor, and additionally a datastore. This would avoid creating unused FeatureSource objects.

b) I need some plugin mechanism for a "GeneralizationFinder". We could resolve this like the JDBC datastore does, by specifying the class name of the JDBC driver class. A default implementation could read from an XML config file; I need another implementation querying the database.

c) The new GeneralizingFeatureSource objects need a name. And I have to add them to existing datastores. Here I have absolutely no idea at the moment.

Christian Müller ha scritto:

[...]

b) I need some plugin mechanismn for a "GeneralizationFinder". We could resolve this like the JDBC data store by specifing the classname of the JDBC Driver class. A default implementation could read from an xml config file, I need another implementation querying the database.

I still need to wrap my head around this, let's talk about it a bit
more. Your specific use case assumes you have pre-generalized data available. I see this possible as either:
- having multiple feature types available in a single data store
   that have the same structure, and whose geometry is generalized in
   a different view. Case in point: multiple shapefiles, multiple
   database tables or views
- having a single feature type with multiple geometry columns,
   generalized in different ways (that's the way we did it for the
   sigma.openplans.org demo for example, and we used SLD to select
   the proper geometry at the proper scale)

Now, where do we need this finder? And why do we need it in the first
place?
The most GeoServer compatible way I can think of is having a
PyramidDataStore that takes as parameters a reference to another
datastore and some configuration on how to merge the multiple ft/
multiple geom columns into one. This would not require any GeoServer
change, though it's probably going to require quite some fiddling
with text files.
Another way is to make jdbc-ng alone handle this, and add an extra
parameter pointing to a configuration file that specifies the
mapping.
Yet another way, which would work only for the "single feature type
with multiple geom columns" would be to have the ResourcePool
look for a generalizing feature source that can remap the feature
type into one with a single geometry, and support the generalization
hints.
We could support the thing at the datastore level if we had a
generalizing datastore wrapper; in this case we could add an
extension point to the resource pool that looks for a generalizing
wrapper for a given datastore. This is needed because if you
have 5 resolutions, and thus 5 separate feature types, we still
need to come out to the user and report only one merged feature
type, and that requires wrapping the DataStore.getTypeNames
method (which is used in the UI to allow you to choose which feature
types to add to GeoServer).

The latter seems the cleanest way, though it requires some
core changes.
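The getTypeNames merging described above could be as simple as collapsing a naming convention. A self-contained sketch, where the `_gen<distance>` suffix is purely an assumed convention for illustration, not anything existing in GeoTools:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the getTypeNames() wrapping: if the underlying store exposes
// roads, roads_gen10, roads_gen100, ... report only "roads" to the user.
class TypeNameMerger {
    private static final Pattern GENERALIZED = Pattern.compile("^(.*)_gen\\d+$");

    static String[] mergeTypeNames(String[] rawNames) {
        Set<String> merged = new LinkedHashSet<>();
        for (String name : rawNames) {
            Matcher m = GENERALIZED.matcher(name);
            // Collapse every generalized variant onto its base type name.
            merged.add(m.matches() ? m.group(1) : name);
        }
        return merged.toArray(new String[0]);
    }
}
```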
Opinions?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Andrea, can we agree on three terms?

1) Vertical Generalization
Having multiple structurally identical feature types

2) Horizontal Generalization
Having multiple geometry properties

3) Dynamic Generalization
Delegating the work to a backend (BTW, we have to include generalization in jdbc-ng)

As you pointed out, we will need some config file (I would prefer XML). Nevertheless, it should be possible to read this config from anywhere else. This was my idea with a "GeneralizationFinder". The default implementation should do the file handling.
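The pluggable finder could follow the JDBC pattern of naming the implementation class in configuration and loading it by reflection. A minimal sketch; every identifier here is an assumption for illustration, not existing GeoTools API:

```java
import java.util.Map;

// Hypothetical plugin interface for locating pre-generalized data.
interface GeneralizationFinder {
    // feature type name -> (generalization distance -> generalized type name)
    Map<String, Map<Double, String>> findGeneralizations();
}

// A trivial in-memory default; the real default would parse an XML file,
// and another implementation would query the database instead.
class StaticGeneralizationFinder implements GeneralizationFinder {
    public Map<String, Map<Double, String>> findGeneralizations() {
        return Map.of("roads",
                Map.of(10.0, "roads_gen10", 100.0, "roads_gen100"));
    }
}

class GeneralizationFinders {
    // Same pattern as loading a JDBC driver via Class.forName(driverClassName).
    static GeneralizationFinder forClassName(String className) throws Exception {
        return (GeneralizationFinder) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }
}
```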

To tell the truth, until now I have spent most of my time in GeoTools, so I am completely new to the concepts of the Catalog and the ResourcePool.

The data store wrapper sounds good, but first we should make a data access wrapper. I don't want to support modifications at this point in time.

If we implement this wrapper class, is there a need for a DataStore (DataAccess) and the corresponding factory class?

As you see, I am quite uncertain about the starting point: the creation of the wrapper object.


Christian Müller ha scritto:

Andrea, can we agree on three terms?
1) Vertical Generalization
Having multiple structurally identical feature types
2) Horizontal Generalization
Having multiple geometry properties

Hmmm... I don't feel the need to make up terms for this, but
if you'll use them, I'll know what you mean.

3) Dynamic Generalization
Delegating the work to a backend (BTW, we have to include generalization in jdbc-ng)

We could, but we don't have to. I ran some tests a while ago doing
Douglas-Peucker on-the-fly generalization inside PostGIS, and it gave me
no speedup (on the contrary, it was so slow that it made WMS rendering
slower, with the PostGIS database sitting on the other side of a
100Mbit home network). That's why I'm pushing for the non topology
preserving case too: it's the only generalization technique that
could make on-the-fly, in-database generalization worthwhile.
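For reference, the algorithm in question is compact enough to sketch. The following is a plain, self-contained Douglas-Peucker for x/y polylines, just to make concrete what the in-database test was computing; real code would use JTS's DouglasPeuckerSimplifier instead:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal Douglas-Peucker line simplification on double[]{x, y} points.
class SimpleDouglasPeucker {
    static List<double[]> simplify(List<double[]> pts, double tol) {
        if (pts.size() < 3) return new ArrayList<>(pts);
        // Find the point farthest from the segment joining the endpoints.
        int index = -1;
        double maxDist = 0;
        double[] a = pts.get(0), b = pts.get(pts.size() - 1);
        for (int i = 1; i < pts.size() - 1; i++) {
            double d = perpendicularDistance(pts.get(i), a, b);
            if (d > maxDist) { maxDist = d; index = i; }
        }
        // Everything within tolerance: keep only the endpoints.
        if (maxDist <= tol) {
            List<double[]> out = new ArrayList<>();
            out.add(a);
            out.add(b);
            return out;
        }
        // Otherwise recurse on both halves around the farthest point.
        List<double[]> left = simplify(pts.subList(0, index + 1), tol);
        List<double[]> right = simplify(pts.subList(index, pts.size()), tol);
        List<double[]> out = new ArrayList<>(left);
        out.addAll(right.subList(1, right.size()));
        return out;
    }

    static double perpendicularDistance(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / len;
    }
}
```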

As you pointed out, we will need some config file (I would prefer XML). Nevertheless, it should be possible to read this config from anywhere else. This was my idea with a "GeneralizationFinder". The default implementation should do the file handling.
To tell the truth, until now I have spent most of my time in GeoTools, so I am completely new to the concepts of the Catalog and the ResourcePool.
The data store wrapper sounds good, but first we should make a data access wrapper. I don't want to support modifications at this point in time.
If we implement this wrapper class, is there a need for a DataStore (DataAccess) and the corresponding factory class?
As you see, I am quite uncertain about the starting point: the creation of the wrapper object.

First off we need to hear back from other devs as this would be a core
modification in a very core class, then we'll decide how to handle
this. Anyway, the two most promising routes both suggest the
use of a DataStore wrapper (whether we need a factory or not
depends on how the hooking is done).
Also, if the datastore wrapper is made in GeoServer you can limit
yourself to a handful of methods (no need to worry about the
reader/writer methods DataStore provides, GeoServer does not use them),
and throw unsupported operation exceptions for the others.
Finally, yes, it's ok if the pre-generalized data is returned read-only.
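The read-only wrapping described here can be sketched with a small stand-in for the real interface: delegate the read methods GeoServer actually uses and throw for the writers. MiniDataStore and its methods are illustrative stand-ins, not the real DataStore API:

```java
// Tiny stand-in for the handful of DataStore methods that matter here.
interface MiniDataStore {
    String[] getTypeNames();
    void createSchema(String typeName); // stands in for the write/modify methods
}

// Wrapper that delegates reads and rejects modifications.
class ReadOnlyWrapper implements MiniDataStore {
    private final MiniDataStore delegate;

    ReadOnlyWrapper(MiniDataStore delegate) {
        this.delegate = delegate;
    }

    public String[] getTypeNames() {
        return delegate.getTypeNames();
    }

    public void createSchema(String typeName) {
        throw new UnsupportedOperationException("pre-generalized data is read-only");
    }
}
```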

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

An extension point seems reasonable to me. So what exactly would it look like:

DataStoreWrapper {

   boolean canHandle( FeatureTypeInfo featureType, DataStore dataStore );

   DataStore wrap( FeatureTypeInfo featureType, DataStore dataStore );

}

Would we want to handle the wrapping of FeatureSources as well? So perhaps adding a:

   FeatureSource wrap( FeatureTypeInfo featureType, FeatureSource featureSource );

Or is that overkill?


--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.