[Geoserver-devel] [geoserver-scm] [17118] trunk/src/main/src/main/java/org/geoserver/config/util/ XStreamPersister.java: [GEOS-5126] Saving a layer whose native SRS has no native EPSG code takes several seconds

Hey Andrea,

Wondering if this change warrants some further discussion since as I understand it this will result in a element not being encoded in many cases. Which perhaps for storing configuration might not be an issue since we do have the native crs wkt but it could have negative side affects for restconfig in which clients may be expecting a srs.

-Justin

On Sun, May 27, 2012 at 11:35 AM, <aaime@anonymised.com> wrote:

Revision
[17118](http://fisheye.codehaus.org/changelog/geoserver/?cs=17118)
Author
aaime
Date
2012-05-27 12:35:41 -0500 (Sun, 27 May 2012)
### Log Message
[GEOS-5126] Saving a layer whose native SRS has no native EPSG code takes several seconds

Modified Paths- trunk/src/main/src/main/java/org/geoserver/config/util/XStreamPersister.java

Diff

Modified: trunk/src/main/src/main/java/org/geoserver/config/util/XStreamPersister.java (17117 => 17118)


--- trunk/src/main/src/main/java/org/geoserver/config/util/XStreamPersister.java	2012-05-27 17:34:30 UTC (rev 17117)
+++ trunk/src/main/src/main/java/org/geoserver/config/util/XStreamPersister.java	2012-05-27 17:35:41 UTC (rev 17118)
@@ -973,7 +973,7 @@
         public String toString(Object obj) {
             CoordinateReferenceSystem crs = (CoordinateReferenceSystem) obj;
             try {
<del>-                Integer epsg = CRS.lookupEpsgCode(crs, true);
</del><ins>+                Integer epsg = CRS.lookupEpsgCode(crs, false);
</ins>                 if (epsg != null) {
                     return "EPSG:" + epsg;
                 }


To unsubscribe from this list please visit:

http://xircles.codehaus.org/manage_email


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Mon, May 28, 2012 at 6:29 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Hey Andrea,

Wondering if this change warrants some further discussion since as I
understand it this will result in a <srs> element not being encoded in many
cases. Which perhaps for storing configuration might not be an issue since
we do have the native crs wkt but it could have negative side affects for
restconfig in which clients may be expecting a srs.

I've made the change straight away because of two reasons:
- it makes the GUI unusable if you have lots of those layers and need to
  change bits in the configurations (the delay happens at each save, not
  only when doing the first insert)
- rest-config clients need to be prepared to accept WKT anyways, since
  not all .prj files in the world can be matched ot a EPSG code, even
  after a full scan

Impact wise, it affects shapefiles and rasters
(databases and WFS come out with full EPSG codes).

Alternatives I see:
- have a configuration option to switch between the GUI unfriendly,
  but REST friendlier behavior
- keep some sort of central cache of the mappings, so that we
  can avoid doing full scans each time, the delay only happens
  the first time

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

On Mon, May 28, 2012 at 11:01 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

On Mon, May 28, 2012 at 6:29 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Hey Andrea,

Wondering if this change warrants some further discussion since as I
understand it this will result in a element not being encoded in many
cases. Which perhaps for storing configuration might not be an issue since
we do have the native crs wkt but it could have negative side affects for
restconfig in which clients may be expecting a srs.

I’ve made the change straight away because of two reasons:

  • it makes the GUI unusable if you have lots of those layers and need to
    change bits in the configurations (the delay happens at each save, not
    only when doing the first insert)
  • rest-config clients need to be prepared to accept WKT anyways, since
    not all .prj files in the world can be matched ot a EPSG code, even
    after a full scan

Yup, makes sense, aware of the problem, just don’t think the fix is that cut and dry.

Impact wise, it affects shapefiles and rasters
(databases and WFS come out with full EPSG codes).

Alternatives I see:

  • have a configuration option to switch between the GUI unfriendly,
    but REST friendlier behavior
  • keep some sort of central cache of the mappings, so that we
    can avoid doing full scans each time, the delay only happens
    the first time

This seems to make sense to me. In CRS.lookup keep a set of most recently looked up CRS objects as a secondary cache, my guess is that most systems are probably using a handful of different CRS’s and like you say, the price is paid only once.

Cheers
Andrea


Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Mon, May 28, 2012 at 7:14 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

I've made the change straight away because of two reasons:
- it makes the GUI unusable if you have lots of those layers and need to
change bits in the configurations (the delay happens at each save, not
only when doing the first insert)
- rest-config clients need to be prepared to accept WKT anyways, since
not all .prj files in the world can be matched ot a EPSG code, even
after a full scan

Yup, makes sense, aware of the problem, just don't think the fix is that cut
and dry.

I can go for that, though I don't understand how it changes on the rest clients
side, since it merely changes a likeliness of getting the code, not
the fact that
the client is broken if it cannot handle WKT too.
There is however a timing issue:
- Rolling back the fix is going to make GS unusable again for GUI people that
  happen to hit the issue described in the jira (which I've fixed because I've
  been bitten by it multiple times in the past, and in a very severe
way yesterday)
- I cannot work on an alternative approach till next weekend (I'm going to be
  away for a few days)
- 2.1.4 may be coming out this week

Hmm... what to do?

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

I can go for that, though I don't understand how it changes on the rest clients
side, since it merely changes a likeliness of getting the code, not
the fact that
the client is broken if it cannot handle WKT too.

Point in case, here is topp:states featureType.xml, from svn:
http://svn.codehaus.org/geoserver/trunk/data/release/workspaces/topp/states_shapefile/states/featuretype.xml

If a client cannot handle one of our oldest demo layers... I would say
it is broken.

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

On Mon, May 28, 2012 at 11:27 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

On Mon, May 28, 2012 at 7:14 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

I’ve made the change straight away because of two reasons:

  • it makes the GUI unusable if you have lots of those layers and need to
    change bits in the configurations (the delay happens at each save, not
    only when doing the first insert)
  • rest-config clients need to be prepared to accept WKT anyways, since
    not all .prj files in the world can be matched ot a EPSG code, even
    after a full scan

Yup, makes sense, aware of the problem, just don’t think the fix is that cut
and dry.

I can go for that, though I don’t understand how it changes on the rest clients
side, since it merely changes a likeliness of getting the code, not
the fact that
the client is broken if it cannot handle WKT too.

Well from my point of view this could be viewed a regression. WKT that we could map to an srs code automatically before is now not occurring.

There is however a timing issue:

  • Rolling back the fix is going to make GS unusable again for GUI people that
    happen to hit the issue described in the jira (which I’ve fixed because I’ve
    been bitten by it multiple times in the past, and in a very severe
    way yesterday)

Given that I have not heard anyone complain about this I feel stating that it makes the GUI “unusable” is an overstatement. That code has been in place for years. Although admittedly i don’t monitor the user list, irc, and jira as closely as others.

  • I cannot work on an alternative approach till next weekend (I’m going to be
    away for a few days)
  • 2.1.4 may be coming out this week

Well i am happy to try to help come up with the patch. I don’t know the referencing system very well but my idea was just a simple fixed sized thread safe set that kept recently looked up CRS objects around. Then in CRS.lookupEpsgCode when the fast path fails scan through the second level cache and do the comparison. If that fails fall back to the full scan. If that sound sane I will proceed. One question i have is whether the second level cache should be used when fullScan = false?

If not sane please advise.

Hmm… what to do?

Cheers
Andrea


Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Well my thought about the use case is not handling cases with pre-configured layers such as this one but one that is fairily common one among our clients that goes like this:

  1. user (via rest api) posts a new layer to the server
  2. user gets xml representation of the new layer to get all configuration info
  3. user uses config info, bounds, srs, etc… to make a proper OGC service request

On Mon, May 28, 2012 at 11:35 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

I can go for that, though I don’t understand how it changes on the rest clients
side, since it merely changes a likeliness of getting the code, not
the fact that
the client is broken if it cannot handle WKT too.

Point in case, here is topp:states featureType.xml, from svn:
http://svn.codehaus.org/geoserver/trunk/data/release/workspaces/topp/states_shapefile/states/featuretype.xml

If a client cannot handle one of our oldest demo layers… I would say
it is broken.

Cheers
Andrea


Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Mon, May 28, 2012 at 7:59 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Well my thought about the use case is not handling cases with pre-configured
layers such as this one but one that is fairily common one among our clients
that goes like this:

1. user (via rest api) posts a new layer to the server
2. user gets xml representation of the new layer to get all configuration
info
3. user uses config info, bounds, srs, etc... to make a proper OGC service
request

The change affects only the native srs, the declared one is filled by the rest
api doing a full scan at line 384 of DataStoreFileResource, calling
builder.lookupSRS(ftinfo, true);

The native srs is something that we expose for mere information, but
it's not what the clients should use, since the declared SRS is the one
actually being exposed to services.

The declared SRS is always going to be a code since it's hand filled
from the GUI p.o.v., and forcefully filled by full scan by the rest
API.

Is there some use case that I'm missing?

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

On Mon, May 28, 2012 at 7:57 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Given that I have not heard anyone complain about this I feel stating that
it makes the GUI "unusable" is an overstatement. That code has been in place
for years. Although admittedly i don't monitor the user list, irc, and jira
as closely as others.

I had to go through 25 shapefile layers to do manual changes on some HTTP
headers setup, each save took its own 5 seconds, during which the GUI
could not be used in other tabs also due to the lock protecting the GUI
from concurrent changes (which means you cannot do anything in another
tab either).
I'm quite sensitive to slowdowns and delay, so this makes the GUI unusable
for me (personal feeling, I understand it).

Well i am happy to try to help come up with the patch. I don't know the
referencing system very well but my idea was just a simple fixed sized
thread safe set that kept recently looked up CRS objects around. Then in
CRS.lookupEpsgCode when the fast path fails scan through the second level
cache and do the comparison. If that fails fall back to the full scan. If
that sound sane I will proceed. One question i have is whether the second
level cache should be used when fullScan = false?

Actually, why not go through the cache first?

I was actually wondering if the cache should not be better placed in
the GeoServer own configuration subsystem, and automatically filled
on startup with the associations that the admin already made in the
existing config, that would kill also the first lookup most of the time

Ah, another bit, the cache in GT would need to be wiped out when the admin
does a reset/reload anyways as part of the CRS.reset("all") call, one
in GeoServer
may not have to abide to the same issues, or at least could be refilled by the
existing associations during the reload).

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

On Mon, May 28, 2012 at 12:31 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

On Mon, May 28, 2012 at 7:59 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Well my thought about the use case is not handling cases with pre-configured
layers such as this one but one that is fairily common one among our clients
that goes like this:

  1. user (via rest api) posts a new layer to the server
  2. user gets xml representation of the new layer to get all configuration
    info
  3. user uses config info, bounds, srs, etc… to make a proper OGC service
    request

The change affects only the native srs, the declared one is filled by the rest
api doing a full scan at line 384 of DataStoreFileResource, calling
builder.lookupSRS(ftinfo, true);

Right. Well this code is only executed when uploading a file and not creating a resource directly, but the POST a new resource case will also do this scan.

The native srs is something that we expose for mere information, but
it’s not what the clients should use, since the declared SRS is the one
actually being exposed to services.

The declared SRS is always going to be a code since it’s hand filled
from the GUI p.o.v., and forcefully filled by full scan by the rest
API.

Is there some use case that I’m missing?

Nope, looks like the use case is still covered. One other thing though is that we have code outside of geoserver that uses the xstream persister to serialize catalog objects. I don’t think this should affect any of that but if it does i would like to reserve the right to roll this change back and attempt the solution of caching the lookup as discussed. But as it stands i am fine with this change for now.

Cheers
Andrea


Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Mon, May 28, 2012 at 12:43 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

On Mon, May 28, 2012 at 7:57 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Given that I have not heard anyone complain about this I feel stating that
it makes the GUI “unusable” is an overstatement. That code has been in place
for years. Although admittedly i don’t monitor the user list, irc, and jira
as closely as others.

I had to go through 25 shapefile layers to do manual changes on some HTTP
headers setup, each save took its own 5 seconds, during which the GUI
could not be used in other tabs also due to the lock protecting the GUI
from concurrent changes (which means you cannot do anything in another
tab either).
I’m quite sensitive to slowdowns and delay, so this makes the GUI unusable
for me (personal feeling, I understand it).

Right, my point was that this isn’t really the “typical” use case. But i totally understand, and am fine with the change I just think that any changes to the configuration serialization subsystem (no matter how small) can lead to subtle changes in behaviour, etc… and probably warrant a bit of discussion before hand.

Well i am happy to try to help come up with the patch. I don’t know the
referencing system very well but my idea was just a simple fixed sized
thread safe set that kept recently looked up CRS objects around. Then in
CRS.lookupEpsgCode when the fast path fails scan through the second level
cache and do the comparison. If that fails fall back to the full scan. If
that sound sane I will proceed. One question i have is whether the second
level cache should be used when fullScan = false?

Actually, why not go through the cache first?

I was actually wondering if the cache should not be better placed in
the GeoServer own configuration subsystem, and automatically filled
on startup with the associations that the admin already made in the
existing config, that would kill also the first lookup most of the time

Ah, another bit, the cache in GT would need to be wiped out when the admin
does a reset/reload anyways as part of the CRS.reset(“all”) call, one
in GeoServer
may not have to abide to the same issues, or at least could be refilled by the
existing associations during the reload).

Makes sense. A cache at the geoServer level makes sense… although given that calling this method with the extensive flag set is such a no-no performance wise it would be nice to have it at the core gt level. Anyways, no sense in worrying about this now but will flag this email in case I do have to revisit it.

Cheers
Andrea


Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Mon, May 28, 2012 at 8:45 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Right. Well this code is only executed when uploading a file and not
creating a resource directly, but the POST a new resource case will also do
this scan.

Ah ok, I see the use case now... a client that posts a configuration with
a WKT description of whatever srs and expects GeoServer persistence
to "fix it" and turn it into a official EPSG code when possible.
And that would occurr also for the declared SRS.

Nope, looks like the use case is still covered. One other thing though is
that we have code outside of geoserver that uses the xstream persister to
serialize catalog objects. I don't think this should affect any of that but
if it does i would like to reserve the right to roll this change back and
attempt the solution of caching the lookup as discussed. But as it stands i
am fine with this change for now.

Cool. The cache idea is going to be useful regardless in the future, one
of the things that "regressed" since 1.7.x is the "lookup srs" button
that people often did not understand, which provided a GUI way to
invoke the full scan.
If we have a working cache it may be that we can resurrect that mechanism
and simply make it automatic as the page is being presented to the user.
It would also speedup extensions like the importer (when importing large
number of layers in the same srs).

GUI wise something like the Lucene based mechanism of prj2epsg
would probably be even better though, so that you get an imprecise,
but possibly useful response, even out of of something that cannot
be matched.

Ah right, the cache mechanism should probably also cache the "lack
of match" responses, otherwise we'd end up with long scans every time
the native srs cannot be matched at all

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf