[Geoserver-users] Loading vector data into SOLR

Hello all,

I see that there is a SOLR backend community module available, which I find quite interesting, and I was wondering two things:
How does the performance compare between SOLR and, say, PostgreSQL/PostGIS?
How do I load GML into SOLR?

I’ve been looking at https://cwiki.apache.org/confluence/display/solr/Spatial+Search and it says to use WKT - is that really the best way of doing things?

Any help would be much appreciated.

Regards,
Ramo

Hello, Ramo.

Andrea Aime is more expert than I am, but I can tell you a few things.

Yes, the spatial data does need to be stored in SOLR as a WKT string. SOLR has its own spatial search capability and this capability is utilized by the SOLR module.
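As a rough illustration of getting from GML to WKT without a database, here is a minimal Python sketch using only the standard library. It assumes GML 3.1-style geometries in the `http://www.opengis.net/gml` namespace (GML 3.2 uses a different namespace URI), two-dimensional coordinates, and x/y (lon/lat) ordering inside `gml:pos` and `gml:posList`. The function names are made up for this example, and only points and simple polygons are handled.

```python
import xml.etree.ElementTree as ET

# Assumption: GML 3.1 namespace; GML 3.2 documents use ".../gml/3.2" instead.
GML_URI = "http://www.opengis.net/gml"
NS = {"gml": GML_URI}


def _ring_to_wkt(ring):
    """Turn a gml:LinearRing with a gml:posList into a WKT ring '(x y, x y, ...)'."""
    values = ring.find("gml:posList", NS).text.split()
    if len(values) % 2 != 0:
        raise ValueError("posList must contain an even number of values")
    pairs = [f"{values[i]} {values[i + 1]}" for i in range(0, len(values), 2)]
    return "(" + ", ".join(pairs) + ")"


def gml_to_wkt(gml_text):
    """Convert a single gml:Point or gml:Polygon element to a WKT string."""
    geom = ET.fromstring(gml_text)
    if geom.tag == "{%s}Point" % GML_URI:
        x, y = geom.find("gml:pos", NS).text.split()
        return f"POINT ({x} {y})"
    if geom.tag == "{%s}Polygon" % GML_URI:
        # Exterior ring first, then any holes.
        rings = [geom.find("gml:exterior/gml:LinearRing", NS)]
        rings += geom.findall("gml:interior/gml:LinearRing", NS)
        return "POLYGON (" + ", ".join(_ring_to_wkt(r) for r in rings) + ")"
    raise ValueError("unsupported geometry: " + geom.tag)
```

For anything beyond simple cases (multi-geometries, curves, axis-order handling), loading the GML into PostGIS or using a dedicated GIS library is likely the safer route.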

If you can get your shapes/points into PostGIS, you should be able to create the WKT strings fairly easily - http://postgis.refractions.net/documentation/manual-1.4/ST_AsText.html
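A minimal sketch of that PostGIS route, in Python: export rows with `ST_AsText` and turn them into a JSON body for SOLR's update handler. The table name `reports`, its columns, and the SOLR field name `geom` are assumptions for illustration; the database and HTTP calls are left out so the snippet stays self-contained.

```python
import json

# Hypothetical table and column names; ST_AsText is the PostGIS function
# that returns a geometry as a WKT string.
EXPORT_SQL = "SELECT id, title, ST_AsText(geom) AS wkt FROM reports"


def rows_to_solr_docs(rows, geom_field="geom"):
    """Map (id, title, wkt) rows to SOLR documents.

    geom_field is an assumed name; it must match a spatial field
    defined in your own SOLR schema.
    """
    return [{"id": str(row[0]), "title": row[1], geom_field: row[2]} for row in rows]


def solr_update_body(rows):
    """Serialise the documents as a JSON array suitable for a JSON update request."""
    return json.dumps(rows_to_solr_docs(rows))
```

The resulting string could then be POSTed with `Content-Type: application/json` to the `/update` handler of your core (followed by a commit), using whatever HTTP client you prefer.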

The performance of spatial search in SOLR is not as good as in PostGIS, as you would probably expect, but it is still quite good.

The main idea of the module is to plot the results of any ‘q’ or ‘fq’ search against a SOLR store, i.e. to run very powerful text-based searches and then plot the results efficiently. This is particularly important if your search returns a large number of results.
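To make the ‘q’/‘fq’ idea concrete, here is a small Python sketch that builds the query string for a SOLR select request combining a text query, an ordinary filter, and a bounding-box filter. It illustrates plain SOLR request parameters, not what the GeoServer module sends internally. The field names (`geom`, `year`) are assumptions, and the spatial filter assumes the field is one of SOLR's spatial field types that accepts the `Intersects(ENVELOPE(minX, maxX, maxY, minY))` syntax.

```python
from urllib.parse import urlencode, parse_qs


def build_select_params(text_query, filters=(), bbox=None, geom_field="geom", rows=1000):
    """Return a URL-encoded query string for a SOLR /select request.

    bbox is (min_x, min_y, max_x, max_y); geom_field is an assumed field name.
    """
    params = [("q", text_query), ("rows", str(rows)), ("wt", "json")]
    for flt in filters:
        params.append(("fq", flt))
    if bbox is not None:
        min_x, min_y, max_x, max_y = bbox
        # Note the ENVELOPE argument order: minX, maxX, maxY, minY.
        spatial = f'{geom_field}:"Intersects(ENVELOPE({min_x}, {max_x}, {max_y}, {min_y}))"'
        params.append(("fq", spatial))
    return urlencode(params)
```

For example, `build_select_params("gold AND drilling", filters=["year:[2000 TO *]"], bbox=(140, -38, 154, -28))` yields a query string that can be appended to `.../select?` for your core.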

Regards,

David Collins

Geological Survey of NSW, Australia.

On Wed, Mar 11, 2015 at 8:55 PM, Tobias Reinicke <ramotswa@anonymised.com> wrote:

Hello all,

I see that there is a SOLR backend community module available to use, which I find quite interesting and was wondering two things;
How does the performance compare between SOLR and say Postgres/gis?
How do I load GML into SOLR?

I’ve been looking at https://cwiki.apache.org/confluence/display/solr/Spatial+Search and it says to use WKT - is that really the best way of doing things?

Any help would be much appreciated.

Regards,
Ramo


Geoserver-users mailing list
Geoserver-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-users

Hi David,

Many thanks, that’s some great info. If I may pry, though: why is the spatial search in SOLR not as good as in PostGIS? (I don’t know SOLR well, so I don’t know what to expect :) ) I presume it’s along the lines of PG having better spatial indexes (indices?)?
If so, that pretty much answers my question: there is very little point in trying to replace my PG with a SOLR backend for WMS rendering purposes. Would that be a fair assumption?

Thanks again,

Ramo

On 11 March 2015 at 22:31, David Collins <david.8.collins@anonymised.com> wrote:

Hello, Ramo.

Andre Aime is more expert than me, but I can tell you a few things.

Yes, the spatial data does need to be stored in SOLR as a WKT string. SOLR has its own spatial search capability and this capability is utilized by the SOLR module.

If you can get your shapes/points into PostGIS, you should be able to create the WKT strings fairly easily - http://postgis.refractions.net/documentation/manual-1.4/ST_AsText.html

The performance of the spatial search in SOLR is not as good as in PostGIS - as you probably would expect - but it is still quite good.

The main idea of the module is to be able to plot the results of any ‘q’, ‘fq’ search in a SOLR store - ie. run very powerful text-based searches - and then plot the results efficiently - particularly important if your search is returning large numbers of results.

Regards,

David Collins

Geological Survey of NSW, Australia.


On Thu, Mar 12, 2015 at 10:09 AM, Tobias Reinicke <ramotswa@anonymised.com>
wrote:

Hi David,

Many thanks - that's some great info - if I may pry though - why is the
spatial search of SOLR not as good as PostGIS (as I don't know SOLR much I
don't know what to expect :) ) - I presume it's along the lines of PG has
better spatial indexes (indices?)?

I don't have numbers to back up what I'm about to say, but I think the search
itself is actually good; it's the data transfer that is not as fast as with
PostGIS. That said, we would need to do some benchmarking to confirm this.
The impression of SOLR being slower comes from trying to render a lot of
points coming from SOLR in GeoServer, which is definitely slower than doing
the same from PostGIS.

Cheers
Andrea

--

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

Hello, Ramo.

There are two likely reasons why SOLR is slower than PG:

  1. SOLR is not built from the ground up for spatial queries. For example, it stores spatial data as WKT strings (although it may transform them into something more efficient behind the scenes; I don’t know).
  2. The transfer of results from SOLR to GeoServer is probably less efficient than from PG. We have chosen a binary format for the data transfer, though, so it is as good as it can be.

Despite the above, we have tested with queries that return tens of thousands of results, and the system works nicely. It is best, of course, if SOLR and GeoServer are situated close together.

Another performance enhancement we have made in the SOLR module: if you are doing a spatial search on polygons/multipolygons in SOLR, you can choose to return only the centroids of the polygons. This reduces the size of the returned data, and often it is all you want on the map, particularly if the polygons overlap and would make a big mess.
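For readers unfamiliar with the term, the sketch below shows what a polygon centroid is, computed in Python with the standard area-weighted (shoelace) formula. It is only an illustration of the concept, assuming a single ring without holes in planar coordinates; it is not how the SOLR module derives its centroids, and the function name is made up for this example.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring given as (x, y) pairs.

    The ring may be open or closed; holes and multipolygons are not handled.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)
```

If the data already lives in PostGIS, `ST_Centroid` gives the same kind of result without any custom code.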

Another general hint for spatial searches in SOLR: if you have polygons with hundreds or thousands of vertices, consider simplifying them, i.e. reducing the number of vertices and making the shapes coarser. The search will be much more efficient.
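One common way to reduce vertex counts is the Douglas-Peucker algorithm; the thread does not say which method was used, so the Python sketch below is just one possible approach. It works on an open sequence of (x, y) points, with the tolerance expressed in the same units as the coordinates. The function names are made up for this example.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        # Every intermediate vertex is within tolerance: keep only the ends.
        return [first, last]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point
```

If the geometries are in PostGIS anyway, `ST_Simplify` (or `ST_SimplifyPreserveTopology`) can do this before the WKT is exported.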

And the answer to your final question is yes: you would only hook GeoServer up to SOLR if you want to do powerful text-based searches (i.e. Google-style searches) on wordy data that is spatially located. For example, say you have a document management system containing thousands of reports (or more), and each report is assigned a location. You copy all the words from the reports, along with the report locations, into SOLR. You can then search for any word or phrase in any report and, using GeoServer and the SOLR module, see where these reports cluster on a map. Potentially quite powerful.

Regards,
David
