[Geoserver-devel] revamp of geosearch extension

Hi all,

this is just a short notice that I'm about to commit a revamp of the
geosearch extension, later today or tomorrow morning, so if anyone
has a concern, now is the time to speak up.

The rationale is that Google changed the way it indexes KML sitemaps. In
the past, it liked them to address specific placemarks, and hence the
geosearch extension produced deep sitemaps down to actual features.

But the way it works now, and the approach that is more likely to get
crawled and ranked well, is that the KML the sitemap points to should
be some sort of "metadata KML document": it should contain the layer's
title and abstract, may include some sample placemarks and/or a bounding
box, and should link back to the actual data.
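
To make the idea concrete, here is a minimal sketch of what such a
"metadata KML document" could look like when built with Python's standard
library. It is only an illustration, not the output of the geosearch
extension: the function name build_metadata_kml, its parameters, and the
choice of a Region element for the bounding box and an atom:link element
for the link back are all assumptions.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_metadata_kml(title, abstract, bbox, data_url, samples=()):
    """Return a KML string that describes a layer instead of listing its features.

    bbox is (west, south, east, north); samples is an iterable of
    (name, lon, lat) tuples used as a few representative placemarks.
    All names here are hypothetical, chosen for illustration only.
    """
    # Register prefixes so the output uses a default KML namespace and "atom:".
    # Note: this changes ElementTree's global prefix table.
    ET.register_namespace("", KML_NS)
    ET.register_namespace("atom", ATOM_NS)

    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    ET.SubElement(doc, f"{{{KML_NS}}}name").text = title
    # Link back to the actual data.
    ET.SubElement(doc, f"{{{ATOM_NS}}}link", href=data_url)
    ET.SubElement(doc, f"{{{KML_NS}}}description").text = abstract

    # Bounding box of the layer.
    west, south, east, north = bbox
    region = ET.SubElement(doc, f"{{{KML_NS}}}Region")
    box = ET.SubElement(region, f"{{{KML_NS}}}LatLonAltBox")
    for tag, value in (("north", north), ("south", south),
                       ("east", east), ("west", west)):
        ET.SubElement(box, f"{{{KML_NS}}}{tag}").text = str(value)

    # A handful of sample placemarks, not the full feature set.
    for name, lon, lat in samples:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"

    return ET.tostring(kml, encoding="unicode")
```

The point of the sketch is that the document stays small and descriptive:
one Document per layer with a few samples, rather than one placemark per
feature.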

On the other hand, the sitemaps generated by the geosearch module have
been broken since GeoServer 2.0.x (not sure about the exact version, but
way in the past), in the sense that the URLs they contain, other than the
top-level sitemap.xml itself, lead to HTTP 404 errors, so there's actually
no GeoServer sitemap that could be crawled.

Another reason why the geosearch extension was not working is that the
sitemap it published lived under /rest/sitemap.xml. That is, it cannot
be accessed anonymously, as /rest/ requires authentication, which the
Google bot of course does not provide. So the new sitemap is going to
live under /geosearch/sitemap.xml (thanks Justin for the solution).
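
As a rough sketch of the resulting layout, the snippet below builds a flat
sitemap with one entry per layer, each pointing at a per-layer KML
document. Only the /geosearch/ prefix comes from the plan above; the
build_sitemap helper and the per-layer URL pattern
(/geosearch/<layer>.kml) are hypothetical and used here for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, layer_names):
    """Return a sitemap XML string with one <url> entry per layer."""
    # Use the sitemap namespace as the default one in the serialized output.
    # Note: this changes ElementTree's global prefix table.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for layer in layer_names:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Hypothetical URL pattern: one metadata KML per layer, served from
        # an anonymously accessible path instead of /rest/.
        loc = f"{base_url.rstrip('/')}/geosearch/{quote(layer, safe=':')}.kml"
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap("http://example.com/geoserver", ["topp:states"]))
```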

So that's more or less it. I'm planning to commit to trunk, have a demo
instance crawled, and, once there is confirmation that Google likes it,
backport to 2.1.x.

Any comments are welcome.

Cheers,
Gabriel

--
Gabriel Roldan
groldan@anonymised.com
Expert service straight from the developers

Good stuff, Gabriel. As you noted, the geosearch extension never really worked, since how the Google indexing worked was pretty opaque. Sounds like a great improvement.

On Tue, May 24, 2011 at 2:36 PM, Gabriel Roldán <groldan@anonymised.com> wrote:

[...]




Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Tue, May 24, 2011 at 10:36 PM, Gabriel Roldán <groldan@anonymised.com.1501…> wrote:

[...]

Nice to see work being brought back up to a working state, +1

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf