[Geoserver-devel] Creating new geoserver data store extensions

All,

I am exploring the possibility of developing a new GeoServer data store extension (along the lines of the SQL Server/Oracle/Teradata extensions) for the Hive distributed database. This would presumably allow me to store geo data at very large scale with excellent response time. I am particularly interested in storing map overlay features (e.g. individual address points with associated data, and area overlays such as zip codes, states, counties, etc.).

I have a few questions:

  1. Does this seem feasible? Are there any obvious roadblocks, or am I unlikely to succeed for some technical reason that I haven’t seen yet?

  2. What pieces of plumbing would I need to write? As I mentioned, this datastore is non-relational and does not have a formal structured query language like SQL, so my assumption is that I likely need to build some sort of predicate analyzer/query builder, but I’m OK with that. My main question is around the integration points with GeoServer.

  3. What is the recommended process for deploying, testing and debugging? I have read the developer manual, but it seems a little thin.

Thanks,
Chris

On Sat, Nov 12, 2011 at 4:48 PM, Chris Shain <chris@anonymised.com> wrote:

All,

I am exploring the possibility of developing a new GeoServer data store extension (along the lines of the sql server/oracle/teradata extensions) for the Hive distributed database. This would presumably allow me to store geo data at very large scale with excellent response time. I am particularly interested in storing map overlay features (i.e. individual address points with associated data and area overlays, like zip codes, states, counties, etc).

I have a few questions:

  1. Does this seem feasible, e.g. are there any obvious roadblocks to this or am I unlikely to succeed for some technical reason that I haven’t seen yet?

I guess you need some way to spatially index your data so that you can respond quickly to bbox queries.
I don’t know much about Hive, but if it is able to respond quickly to a “like” query, you could use geohashing to
turn a spatial entity into a string.
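To make the geohash suggestion concrete, here is a minimal sketch of the standard geohash encoding in Java. The class name `Geohash` is hypothetical (it is not a GeoTools API). Encoding alternately bisects the longitude and latitude ranges and packs every 5 bits into one base-32 character, so a shorter hash of a point is always a prefix of a longer one. That prefix property is what lets a “like 'abc%'” query act as a spatial filter. Nearby points usually, but not always, share a prefix: two points on opposite sides of a cell boundary can have very different hashes, which is why neighbouring cells also have to be searched.

```java
/**
 * Minimal geohash encoder (hypothetical helper, not part of GeoTools).
 * Bits alternate between longitude and latitude, starting with longitude;
 * every 5 bits become one base-32 character.
 */
public final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private Geohash() {}

    public static String encode(double lat, double lon, int length) {
        if (length < 1 || length > 12) {
            throw new IllegalArgumentException("length must be between 1 and 12");
        }
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // the first bit refines longitude
        int bits = 0;
        int ch = 0;
        while (hash.length() < length) {
            ch <<= 1;
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch |= 1; minLon = mid; } else { maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch |= 1; minLat = mid; } else { maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // a full base-32 digit is ready
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.6, -5.6, 5)); // ezs42
        System.out.println(encode(42.6, -5.6, 3)); // ezs, a prefix of the longer hash
    }
}
```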

  2. What pieces of plumbing would I need to write? As I mentioned, this datastore is non-relational and does not have a formal structured query language like SQL, so my assumption is that I likely need to build some sort of predicate analyzer/query builder, but I’m OK with that. My main question is around the integration points with GeoServer.

You need to write your own DataStore implementation

  3. What is the recommended process for deploying/testing/debugging? I have read the developer manual, but it seems a little thin

DataStores are GeoTools abstractions, so you should look into the GeoTools manuals.
Starting points:
http://docs.geotools.org/stable/tutorials/advanced/datastore.html
and have a look at any ContentDataStore-based implementation.
There are several: the JDBC stores, the aggregating store, the property-ng store,
the OGR store, etc. Have a look here:
http://svn.osgeo.org/geotools/trunk/

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Thanks Andrea! I will let you know how I make out and whether (read: when) I run into trouble next.

For the historical record, I mistyped and meant HBase, not Hive, but the questions and responses are the same.

For the curious, HBase is a distributed database built on the Hadoop platform, and is capable of handling petabytes of data, including binary data like map tiles. It is an open source implementation of Google’s BigTable architecture.

On Nov 12, 2011 11:59 AM, “Andrea Aime” <andrea.aime@anonymised.com> wrote:

[...]


Some progress, but I think I’m stuck again.

I have implemented a toy version of my data store following the AbstractDataStore docs here: http://docs.geotools.org/latest/userguide/tutorial/advanced/abstractdatastore.html, including some JUnit tests to make sure that it works (it does, for everything that I can test).

Now I want to get it working in the GeoServer web UI (so that I can create some stores and maybe get some data loaded into them). I have created my META-INF/services/org.geotools.data.DataStoreFactorySpi file and put the name of my data store factory into it (org.geotools.extension.HBaseDataStoreFactory, for what it’s worth). I have also packaged my data store and the associated DataStoreFactorySpi implementation into a jar located in WEB-INF/lib. Unfortunately, I am not seeing the new store type as an option on the “New data source” page of the web UI. I’ve cranked up the logging on my application server and in GeoServer itself, and I am not even seeing the name of my data store in the logs.

Any ideas? Is there some other magic that I’m not yet aware of?

On Sat, Nov 12, 2011 at 1:01 PM, Chris Shain <chris@anonymised.com> wrote:

[...]


On Wed, Nov 16, 2011 at 12:06 AM, Chris Shain <chris@anonymised.com> wrote:

Some progress, but I think I’m stuck again.

I have implemented a toy version of my data store as per the abstract data store docs here: http://docs.geotools.org/latest/userguide/tutorial/advanced/abstractdatastore.html, including some JUnit tests to make sure that it works (it does, for everything that I can test).

Ugh, AbstractDataStore is something we’re trying to move away from. I tried to steer you towards ContentDataStore for a reason: it’s the base class for all new data stores (and the few old data stores still using AbstractDataStore are being migrated to it as well).

Now I want to get it working in the geoserver web UI (so that I can create some stores and maybe get some data loaded into them). I have created my META-INF/services/org.geotools.data.DataStoreFactorySpi file, and put the name of my datastore into it (org.geotools.extension.HBaseDataStoreFactory, for what it’s worth). I have also packaged my data store and associated DataStoreFactorySpi implementation into a jar located in WEB-INF/lib. Unfortunately, I am not seeing the new store type as an option in the “New data source” page of the web UI. I’ve cranked up the logging on my application server and in Geoserver itself, and I am not even seeing the name of my data store in the logs.

Any ideas? Is there some other magic that I’m not yet aware of?

Not really. If you look at the existing data stores’ code, there is always a test class that creates the data store
via DataAccessFinder and a parameter map, which is the pluggable way of doing it.
If that does not work, GeoServer won’t be able to pick up your new store either.
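The pluggable lookup rests on the standard Java service-provider convention Chris already used: a `META-INF/services/<interface name>` file, visible on the classpath, listing implementation class names. The sketch below illustrates that convention with plain `java.util.ServiceLoader` and a hypothetical `StoreFactory` interface (a stand-in, not the GeoTools `DataStoreFactorySpi`, and not how GeoTools itself is wired internally). It simulates a plugin jar as a temporary directory and shows that a missing registration does not raise an error: the lookup simply finds nothing. Chris’s actual problem turned out to be a version mismatch, but a lookup test along these lines is a quick way to rule the registration in or out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

public class SpiLookupDemo {

    /** Hypothetical factory SPI; a stand-in for a real store factory interface. */
    public interface StoreFactory {
        boolean canProcess(Map<String, ?> params);
    }

    /** Hypothetical provider that accepts only parameter maps carrying "hbase.zookeeper". */
    public static class HBaseLikeFactory implements StoreFactory {
        public HBaseLikeFactory() {}

        @Override
        public boolean canProcess(Map<String, ?> params) {
            return params.containsKey("hbase.zookeeper");
        }
    }

    /** Returns the first registered factory that accepts the parameters, or null if none does. */
    public static StoreFactory find(ClassLoader loader, Map<String, ?> params) {
        for (StoreFactory factory : ServiceLoader.load(StoreFactory.class, loader)) {
            if (factory.canProcess(params)) {
                return factory;
            }
        }
        return null;
    }

    /** Simulates a plugin jar (a temp dir), optionally registering the provider in META-INF/services. */
    public static StoreFactory lookup(boolean registerProvider, Map<String, ?> params) {
        try {
            Path root = Files.createTempDirectory("plugin");
            if (registerProvider) {
                Path services = Files.createDirectories(root.resolve("META-INF/services"));
                // File name = binary name of the interface, content = provider class name.
                Files.write(services.resolve(StoreFactory.class.getName()),
                        (HBaseLikeFactory.class.getName() + "\n").getBytes(StandardCharsets.UTF_8));
            }
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()}, SpiLookupDemo.class.getClassLoader())) {
                return find(loader, params);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("hbase.zookeeper", "localhost:2181");
        System.out.println(lookup(true, params) != null);                        // true: registered
        System.out.println(lookup(false, params) != null);                       // false: no services file
        System.out.println(lookup(true, new HashMap<String, Object>()) != null); // false: params rejected
    }
}
```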

Cheers
Andrea


That’s OK. The documentation for AbstractDataStore was a lot more complete, so I went with that. I am sure I can re-implement it on ContentDataStore now that I know how things work.

Can you point me to the sources for the existing data store add-ins? There are no zipped sources on SourceForge, and I am wary of checking out everything via svn.

On Wed, Nov 16, 2011 at 2:47 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

[...]


On Wed, Nov 16, 2011 at 4:31 PM, Chris Shain <chris@anonymised.com> wrote:

That’s OK- the documentation for Abstract was a lot more complete, so I went with that. I am sure at this point I can re-implement on Content now that I know how things work.

Can you point me to the sources for the existing datastore addins? There are no zipped sources on sourceforge, and I am wary of checking out everything via svn.

The modules are in GeoTools and cannot be downloaded separately from the rest.
You can download the full sources by getting the “-project” file from the GeoTools downloads, for example:

http://sourceforge.net/projects/geotools/files/GeoTools%208.0%20Releases/8.0-M3/

Some direct svn links. The first group of stores using ContentDataStore is the JDBC one:
http://svn.osgeo.org/geotools/trunk/modules/plugin/jdbc/ (base classes) and http://svn.osgeo.org/geotools/trunk/modules/plugin/jdbc/ (implementations for the various databases)

We have a growing group of new stores based on the same class now:
http://svn.osgeo.org/geotools/trunk/modules/unsupported/couchdb
http://svn.osgeo.org/geotools/trunk/modules/unsupported/feature-aggregate/
http://svn.osgeo.org/geotools/trunk/modules/unsupported/ogr/
http://svn.osgeo.org/geotools/trunk/modules/unsupported/property-ng/
http://svn.osgeo.org/geotools/trunk/modules/unsupported/shapefile-ng/

Cheers
Andrea


Many thanks. That’s helped a bunch. If I ever get this working, I’ll make an attempt at getting the docs for the process up to date.

On Wed, Nov 16, 2011 at 11:06 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

[...]


FYI, the immediate reason my data source was not showing up was that I was building against the wrong version of the GeoTools libs (duh). I had been building against 8.0-M3, but the stable version of GeoServer seems to use GeoTools 2.7.3.

The next question I have is about packaging. Most of the other data stores carry very little additional baggage, but unfortunately the HBase client brings ~50MB of additional dependencies. Obviously, including all of that in the plugin jar isn’t optimal. Is there an application-level way to instruct GeoServer to append some folder (or set of jars) to its classpath? Everything I read suggests that it is the (WAR’d) application’s responsibility (in this case GeoServer’s) to manage dependencies, not the application server’s, which makes sense to me.

On Wed, Nov 16, 2011 at 11:16 AM, Chris Shain <chris@anonymised.com> wrote:

[...]


On Thu, Nov 17, 2011 at 1:05 AM, Chris Shain <chris@anonymised.com> wrote:

FYI the immediate reason that my data source was not showing up was that I was building against the wrong version of the GeoTools libs (duh). I had been building against 8.0-M3, but the stable version of GeoServer uses 2.7.3 it seems.

Correct. New developments are always made first against trunk and then backported to the stable series once they are good; that’s why I pointed you there.
Also, the set of stores based on ContentDataStore in 2.7.x is a lot smaller (basically just the JDBC ones).

The next question that I have is around packaging. It seems that most of the other data stores have very minimal additional baggage, but unfortunately the HBase client carries ~50MB of additional dependencies. Obviously including all of that in the plugin jar isn’t optimal. Is there an application-level way to instruct GeoServer to append some folder (or set of jars) to it’s classpath? Everything I read suggests that it is the (WAR’d) application’s responsibility (in this case GeoServer) to manage dependencies, not the application server, which kind of makes sense to me.

GeoServer does not currently have the ability to pick up jars from an arbitrary folder. Extensions are zip files containing a set of jars that you unzip and drop into WEB-INF/lib.
Adding support for loading external jars might be a good idea, I guess, if someone wants to try.
The thing is that, regardless, there is no guarantee that an extension built for 2.1.1 will work with 2.1.2 (a data store will; something that depends on some internal
GeoServer class might not), so putting them in, say, the data dir might cause quite a bit of confusion when people upgrade GeoServer.

Cheers
Andrea


So far I’m in pretty good shape: I can store and retrieve SimpleFeatures, and I can see those features in OpenLayers and in ArcMap (via WFS). In the end it was not too hard to do, once I got all of the dependencies right.

Now on to bboxes…

Bounding boxes, as near as I can tell, work as follows. Assume you have a 3D coordinate space X, Y, Z, where Z is elevation and is often 0. You need to figure out quickly which features of a given type intersect a given rectangular box (min/max along each dimension, representing what the user is currently looking at), keeping in mind that features may be non-rectangular and may not have any points inside the box. For example, the bounding box ((0,0),(10,10)) is intersected by the triangle (polygon) ((5,12),(12,5),(12,12)) even though none of the triangle’s vertices fall inside the box. This is complicated by the fact that the bounding box may be small or quite large (e.g. the whole world).

From my reading, it seems that the way this is done is to first geohash the centroid of each SimpleFeature, then truncate the geohash to an acceptable search range (I am assuming this derives from the size of the bbox?), then calculate the 8 neighbors of that geohash cell (resulting in a 3x3 block of 9 geohashes including the original). That gives you 9 prefixes to search on in order to quickly find the features in a bounding box around the feature. Is that right?
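The triangle example above can be checked with nothing but the JDK’s `java.awt.geom` classes, used here only as a convenient stand-in for a real geometry library such as JTS. The sketch contrasts three tests: a naive “is any vertex inside the box” check, which misses this triangle; a cheap envelope (bounding-rectangle) test, which is the usual first pass but can over-report; and an exact shape test. The class and method names are hypothetical.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

public class BboxIntersectionDemo {

    /** Builds a closed polygon from {x, y} vertex pairs. */
    public static Path2D.Double polygon(double[][] vertices) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(vertices[0][0], vertices[0][1]);
        for (int i = 1; i < vertices.length; i++) {
            path.lineTo(vertices[i][0], vertices[i][1]);
        }
        path.closePath();
        return path;
    }

    /** Naive check: does any vertex of the polygon fall inside the box? */
    public static boolean anyVertexInside(double[][] vertices, Rectangle2D box) {
        for (double[] v : vertices) {
            if (box.contains(v[0], v[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Rectangle2D box = new Rectangle2D.Double(0, 0, 10, 10); // the bbox ((0,0),(10,10))
        double[][] triangle = {{5, 12}, {12, 5}, {12, 12}};
        Path2D.Double shape = polygon(triangle);
        System.out.println(anyVertexInside(triangle, box));      // false: no vertex is inside
        System.out.println(shape.getBounds2D().intersects(box)); // true: envelope test (cheap first pass)
        System.out.println(shape.intersects(box));               // true: the shapes really overlap

        // The envelope test can over-report: this envelope overlaps the box, the triangle does not.
        Path2D.Double farCorner = polygon(new double[][] {{6, 20}, {20, 20}, {20, 6}});
        System.out.println(farCorner.getBounds2D().intersects(box)); // true
        System.out.println(farCorner.intersects(box));               // false
    }
}
```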

So in my case, I have effectively unlimited storage space (literally petabytes) and fixed lookup/sorted scan time. Given that, I could generate a row for every possible prefix of any feature centroid geohash I’ve indexed (e.g. “a”, “ab”, “abc”, “abc1”, “abc12”, “abc123”), and in each of those rows store a value for every feature id with that geohash prefix (I also have the pleasure of being schemaless). That way, given a bounding box with a given centroid geohash and precision (again, does geohash precision correspond to the size of the bounding box?), I can always look up everything in that bbox in exactly 9 fetch operations. Does that seem right?
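Here is a sketch of one way the query side of this scheme could work, under explicit assumptions; all names are hypothetical and none of this is a GeoTools or HBase API. In this sketch the precision does follow from the bbox size: it picks the longest geohash whose cells are at least as wide and as tall as the bbox. Under that rule the 3x3 block of cells around the cell holding the bbox centre is guaranteed to cover the whole bbox, so at most 9 prefix lookups are needed. It assumes features are indexed by a single point (e.g. the centroid); a large polygon indexed only by its centroid can still reach into the bbox from a cell outside the 3x3 block and be missed. It also assumes `minLon <= maxLon` (no antimeridian-crossing boxes). Since HBase keeps rows sorted by key, a prefix range scan over plain geohash keys may be an alternative worth comparing to materializing one row per prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helpers for a geohash-prefix index (assumes minLon <= maxLon). */
public class GeohashCover {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    private static final int MAX_LENGTH = 12;

    /** Standard geohash encoding: alternate lon/lat bisection, 5 bits per character. */
    public static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true;
        int bits = 0, ch = 0;
        while (hash.length() < length) {
            ch <<= 1;
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch |= 1; minLon = mid; } else { maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch |= 1; minLat = mid; } else { maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { hash.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return hash.toString();
    }

    /** Cell width in degrees; longitude gets the extra bit when 5 * length is odd. */
    static double cellWidth(int length) { return 360.0 / (1L << ((5 * length + 1) / 2)); }

    /** Cell height in degrees. */
    static double cellHeight(int length) { return 180.0 / (1L << ((5 * length) / 2)); }

    /** Longest hash length whose cells are at least as large as the bbox; 0 means "whole world". */
    public static int precisionFor(double bboxWidth, double bboxHeight) {
        int length = 0;
        while (length < MAX_LENGTH
                && cellWidth(length + 1) >= bboxWidth
                && cellHeight(length + 1) >= bboxHeight) {
            length++;
        }
        return length;
    }

    /** Geohashes of the 3x3 block of cells around the cell holding the bbox centre (up to 9). */
    public static Set<String> coveringPrefixes(double minLon, double minLat, double maxLon, double maxLat) {
        Set<String> prefixes = new LinkedHashSet<>();
        int length = precisionFor(maxLon - minLon, maxLat - minLat);
        if (length == 0) {
            prefixes.add(""); // bbox larger than any cell: every row matches
            return prefixes;
        }
        double w = cellWidth(length), h = cellHeight(length);
        double centreLat = (minLat + maxLat) / 2, centreLon = (minLon + maxLon) / 2;
        for (int dy = -1; dy <= 1; dy++) {
            double lat = centreLat + dy * h;
            if (lat < -90 || lat > 90) continue; // no neighbour beyond a pole
            for (int dx = -1; dx <= 1; dx++) {
                double lon = centreLon + dx * w;
                if (lon < -180) lon += 360; else if (lon >= 180) lon -= 360; // wrap the antimeridian
                prefixes.add(encode(lat, lon, length));
            }
        }
        return prefixes;
    }

    /** Every prefix of a feature's geohash, i.e. one index row per prefix. */
    public static List<String> prefixRows(String geohash) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= geohash.length(); i++) rows.add(geohash.substring(0, i));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(prefixRows("ezs42"));                              // [e, ez, ezs, ezs4, ezs42]
        System.out.println(precisionFor(1.0, 1.0));                           // 3 (1.40625-degree cells)
        System.out.println(coveringPrefixes(10.0, 42.0, 11.0, 43.0).size());  // 9
    }
}
```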

On Thu, Nov 17, 2011 at 4:26 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

[...]


Can anyone give me guidance on how to hook an optimized (indexed) bounding box feature lookup to the ContentDataStore/ContentFeatureStore/FeatureReader<SimpleFeatureType, SimpleFeature> class hierarchy?

That is, I was assuming there was a getFeaturesByBoundingBox( … ) type of function, but that doesn’t seem to be available. What is my integration point here?

On Tue, Nov 22, 2011 at 6:49 PM, Chris Shain <chris@anonymised.com> wrote:

[...]


On Mon, Nov 28, 2011 at 8:32 PM, Chris Shain <chris@anonymised.com> wrote:

Can anyone give me guidance on how to hook an optimized (indexed) bounding box feature lookup to the ContentDataStore/ContentFeatureStore/FeatureReader<SimpleFeatureType, SimpleFeature> class hierarchy?

i.e. I was assuming there was a getFeaturesByBoundingBox( … ) type of function, but that doesn’t seem to be available. What is my integration point here?

The Query object, and the Filter inside of it. If all the filtering you can do natively is by bounding box,
then the situation looks a lot like the shapefile data store: there is already a filter visitor that
will extract the bbox out of a filter, and then you have to run the full filter in memory. That is two-pass
filtering: the first pass runs natively against the bbox, the second runs in memory to support all kinds
of filters.

See the ShapefileFeatureSource.getReaderInternal() method here for an example:

http://svn.osgeo.org/geotools/trunk/modules/unsupported/shapefile-ng/src/main/java/org/geotools/data/shapefile/ng/ShapefileFeatureSource.java
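A stdlib-only sketch of the two-pass idea, with hypothetical `Feature` and `BBox` classes standing in for the GeoTools feature and filter types. The first pass plays the role of the native, index-backed bbox lookup: it may return extra candidates, but it must never drop a real match. The second pass evaluates the complete filter in memory on the candidates only. In a real data store this logic would live in the feature source’s reader, as in the shapefile example above, and the in-memory pass would use exact geometry rather than envelopes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class TwoPassFilterDemo {

    /** Hypothetical feature: id, envelope, and one attribute. */
    static final class Feature {
        final String id;
        final double minX, minY, maxX, maxY;
        final String state;

        Feature(String id, double minX, double minY, double maxX, double maxY, String state) {
            this.id = id; this.minX = minX; this.minY = minY;
            this.maxX = maxX; this.maxY = maxY; this.state = state;
        }
    }

    /** Hypothetical axis-aligned query box. */
    static final class BBox {
        final double minX, minY, maxX, maxY;

        BBox(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersectsEnvelope(Feature f) {
            return f.minX <= maxX && f.maxX >= minX && f.minY <= maxY && f.maxY >= minY;
        }
    }

    /** Pass 1: stand-in for the native bbox lookup (a linear scan here, an index in practice). */
    static List<Feature> nativeBboxLookup(List<Feature> table, BBox bbox) {
        List<Feature> candidates = new ArrayList<>();
        for (Feature f : table) {
            if (bbox.intersectsEnvelope(f)) candidates.add(f);
        }
        return candidates;
    }

    /** Pass 2: evaluate the full filter in memory, on the pass-1 candidates only. */
    static List<String> query(List<Feature> table, BBox bbox, Predicate<Feature> fullFilter) {
        List<String> ids = new ArrayList<>();
        for (Feature f : nativeBboxLookup(table, bbox)) {
            if (fullFilter.test(f)) ids.add(f.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Feature> table = Arrays.asList(
                new Feature("a", 1, 1, 2, 2, "NY"),
                new Feature("b", 3, 3, 4, 4, "NJ"),
                new Feature("c", 50, 50, 51, 51, "NY"));
        BBox bbox = new BBox(0, 0, 10, 10);
        // Full filter: the bbox condition plus an attribute test the native store cannot evaluate.
        Predicate<Feature> fullFilter = f -> bbox.intersectsEnvelope(f) && "NY".equals(f.state);
        System.out.println(query(table, bbox, fullFilter)); // [a]
    }
}
```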

Cheers
Andrea


Perfect, thanks again Andrea

On Mon, Nov 28, 2011 at 2:44 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

[...]