[Proposal] Add GeoParquet community module

Hi all,

I’d like to propose adding a new community module for the GeoParquet datastore.

GeoParquet is a specification for storing geospatial vector data in the Apache Parquet columnar storage format, enabling efficient querying and storage of large geospatial datasets.

The format is gaining popularity in the geospatial community because it provides compression, a column-oriented layout, and predicate pushdown, all of which make it well suited to large geospatial datasets.

I’ve implemented a community module that:

  • Provides a Wicket UI for configuring the GeoParquet datastore in the GeoServer admin interface
  • Supports various connection patterns including local files, HTTP/HTTPS, S3, and glob patterns for multiple files
  • Includes proper type conversion for numeric parameters
  • Has comprehensive validation for URIs and glob patterns
  • Is well-documented with detailed Javadocs

The implementation leverages the GeoTools GeoParquet datastore, and I’ve focused on making the UI intuitive and robust with proper validation. The module supports both single file access and directory/glob pattern access for partitioned or multi-file datasets.
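To make the validation idea concrete, here is a minimal sketch of how a location string could be classified and checked before it reaches the datastore. It is not the module's actual code: the class `LocationCheck`, its `Kind` enum, the accepted schemes, and the choice to treat `*`, `[` and `{` as glob markers are all assumptions made for illustration. It uses only the JDK (`java.net.URI` and the default file system's glob `PathMatcher`).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.FileSystems;
import java.util.Locale;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of GeoServer or GeoTools.
final class LocationCheck {

    enum Kind { LOCAL_FILE, HTTP, S3 }

    // Assumption: only these characters mark a glob; '?' is left out
    // because it also starts the query string of an HTTP URL.
    static boolean isGlob(String s) {
        return s.indexOf('*') >= 0 || s.indexOf('[') >= 0 || s.indexOf('{') >= 0;
    }

    // Returns the kind of location, or throws IllegalArgumentException
    // with a message suitable for showing next to a form field.
    static Kind validate(String location) {
        if (location == null || location.trim().isEmpty()) {
            throw new IllegalArgumentException("Location must not be empty");
        }
        String value = location.trim();
        int sep = value.indexOf("://");
        // No scheme means a plain local path.
        String scheme = sep > 0 ? value.substring(0, sep).toLowerCase(Locale.ROOT) : "file";

        Kind kind;
        switch (scheme) {
            case "file": kind = Kind.LOCAL_FILE; break;
            case "http":
            case "https": kind = Kind.HTTP; break;
            case "s3": kind = Kind.S3; break;
            default: throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }

        String rest = sep > 0 ? value.substring(sep + 3) : value;
        if (isGlob(rest)) {
            try {
                // Compiling the pattern is enough to catch unbalanced '{' or '['.
                FileSystems.getDefault().getPathMatcher("glob:" + rest);
            } catch (PatternSyntaxException e) {
                throw new IllegalArgumentException("Invalid glob pattern: " + e.getDescription(), e);
            }
        } else if (kind != Kind.LOCAL_FILE) {
            try {
                // Remote locations must parse as a URI and name a host or bucket.
                URI uri = new URI(value);
                if (uri.getAuthority() == null) {
                    throw new IllegalArgumentException("Missing host or bucket in: " + value);
                }
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Invalid URI: " + e.getReason(), e);
            }
        }
        return kind;
    }
}
```

With this sketch, `LocationCheck.validate("s3://my-bucket/data/*.parquet")` returns `Kind.S3`, while an unbalanced pattern such as `/data/{a,b.parquet` is rejected with an `IllegalArgumentException`.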

Regards,
Gabriel Roldan


Hi Gabriel,
+1 for the community module. While I have not tried it, I’ve been following the commits and it looks pretty interesting.

There’s only one thing that has me stumped, and mind, I only glanced quickly over the GeoTools store… why did you need
to create all those “ForwardingXYZ” classes, e.g. “ForwardingDataStore”, when they seem to be doing a very similar (maybe the same)
job as the “DecoratingXYZ” classes that we already have in our codebase?

I get the slight difference in meaning: a forwarder is straight delegation, while a decorator often involves a behavioral change.
But is it worth duplicating code for such a minor difference? (By the way, I had to look up the difference. I had never heard of a
“forwarder” before today; it seems to be something commonly present in Guava… I wonder how many others were aware
of it without a search.)
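For readers who also had to look it up, the distinction can be sketched in a few lines of Java. The `SimpleStore` interface below is a hypothetical stand-in, not the GeoTools `DataStore` API, and the class names are made up for illustration. The forwarder delegates every call unchanged; a decorator is then just a subclass that overrides the one method whose behavior it changes.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in for a data access interface (not the GeoTools API).
interface SimpleStore {
    List<String> typeNames();

    void dispose();
}

// Forwarder: pure delegation, no behavior of its own.
abstract class ForwardingSimpleStore implements SimpleStore {
    protected final SimpleStore delegate;

    protected ForwardingSimpleStore(SimpleStore delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public List<String> typeNames() {
        return delegate.typeNames();
    }

    @Override
    public void dispose() {
        delegate.dispose();
    }
}

// Decorator built on the forwarder: only typeNames() changes behavior,
// dispose() is inherited and still goes straight to the delegate.
final class SortedNamesStore extends ForwardingSimpleStore {
    SortedNamesStore(SimpleStore delegate) {
        super(delegate);
    }

    @Override
    public List<String> typeNames() {
        return delegate.typeNames().stream().sorted().collect(Collectors.toList());
    }
}
```

Seen this way, a forwarding base class and a decorating base class end up with the same body, which is the duplication being questioned here.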

Hi Andrea,
thanks.

Good question about the decorators.

  • ForwardingDataStore: looks like I just missed that there’s already a DecoratingDataStore
  • ForwardingFeatureSource: I couldn’t find such a decorator in GeoTools. There’s one in GeoServer though, GeoServerFeatureSource.
  • ForwardingDataStoreFactory: afaict there’s no such base decorator for DataStoreFactorySpi

I’m also using ReTypingFeatureCollection and DecoratingFeature, which are already there.

So I didn’t add them out of semantic purity or religious belief, but to keep the changes confined without touching core. I can get rid of ForwardingDataStore though, given there is a replacement in core that I missed.

We can have the discussion about providing a complete set of base decorators/forwarders for the GeoTools data APIs, but that’d be a separate one. I figured package-private classes in an unsupported module would be good enough for the time being, given the time constraints. But yeah, I’d rather not overpopulate the codebase if I can avoid it.

Cheers!

+1

Cheers,

Torben

On Mon, Apr 21, 2025 at 7:04 PM Gabriel Roldan via OSGeo Discourse <noreply@discourse.osgeo.org> wrote:


+0