[Geoserver-devel] GML3EncodingUtils.idExists performance impact

Hey Justin,

when you have a chance please take a look at the comments in <http://jira.codehaus.org/browse/GEOT-2627&gt;, as GML3EncodingUtils.idExists() is making the gml encoding process almost twice slower and I would love you opinion on the proposed workaround

Cheers,
Gabriel
--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Hmmm... well one possibility might be to use a similar solution as FeatureTypeCache. Each time the encoder stats, the container context gets a FeatureTypeCache instance which basically serves as a cache for feature type objects, used while parsing features to avoid creating the feature type over and over again.

A similar class could be used ot track ids during encoding. GMLConfiguration.configureContext could register it, and then whatever binding (AbstractFeatureTypeBinding or whatever) could declare a dependency on it, grab the instnace and check for or update the set with encoded ids.

The downside being that for huge feature collections the set could get quite large... but i guess its better than the current solution of building up a document full of ids.

Gabriel Roldan wrote:

Hey Justin,

when you have a chance please take a look at the comments in <http://jira.codehaus.org/browse/GEOT-2627&gt;, as GML3EncodingUtils.idExists() is making the gml encoding process almost twice slower and I would love you opinion on the proposed workaround

Cheers,
Gabriel

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

That'd be a good solution to provide a context to the gml encoding process, my open question is now how to ensure that context is not shared between encoding processes, as the list of ids is to be checked independently for each one. My understanding is FeatureTypeCache is shared and that's ok, but this one may require a per process "context". What do you think?

Justin Deoliveira wrote:

Hmmm... well one possibility might be to use a similar solution as FeatureTypeCache. Each time the encoder stats, the container context gets a FeatureTypeCache instance which basically serves as a cache for feature type objects, used while parsing features to avoid creating the feature type over and over again.

A similar class could be used ot track ids during encoding. GMLConfiguration.configureContext could register it, and then whatever binding (AbstractFeatureTypeBinding or whatever) could declare a dependency on it, grab the instnace and check for or update the set with encoded ids.

The downside being that for huge feature collections the set could get quite large... but i guess its better than the current solution of building up a document full of ids.

Gabriel Roldan wrote:

Hey Justin,

when you have a chance please take a look at the comments in <http://jira.codehaus.org/browse/GEOT-2627&gt;, as GML3EncodingUtils.idExists() is making the gml encoding process almost twice slower and I would love you opinion on the proposed workaround

Cheers,
Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Even though the Configuration object is a singleton in geoserver (hopefully fix that someday), the binding context is not stored as a member of it, it is created by the encoder and passed into Configuration.configureContext(). So as long as two processes are not storing the same encoder instance as state there should be no problem.

Gabriel Roldan wrote:

That'd be a good solution to provide a context to the gml encoding process, my open question is now how to ensure that context is not shared between encoding processes, as the list of ids is to be checked independently for each one. My understanding is FeatureTypeCache is shared and that's ok, but this one may require a per process "context". What do you think?

Justin Deoliveira wrote:

Hmmm... well one possibility might be to use a similar solution as FeatureTypeCache. Each time the encoder stats, the container context gets a FeatureTypeCache instance which basically serves as a cache for feature type objects, used while parsing features to avoid creating the feature type over and over again.

A similar class could be used ot track ids during encoding. GMLConfiguration.configureContext could register it, and then whatever binding (AbstractFeatureTypeBinding or whatever) could declare a dependency on it, grab the instnace and check for or update the set with encoded ids.

The downside being that for huge feature collections the set could get quite large... but i guess its better than the current solution of building up a document full of ids.

Gabriel Roldan wrote:

Hey Justin,

when you have a chance please take a look at the comments in <http://jira.codehaus.org/browse/GEOT-2627&gt;, as GML3EncodingUtils.idExists() is making the gml encoding process almost twice slower and I would love you opinion on the proposed workaround

Cheers,
Gabriel

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

thaaat's what I call a satisfactory answer :slight_smile:

Rini, see how FeatureTypeCache is used in WFSConfiguration.configureContext

As Justin suggested, the same way you can register a new instance of, say, IdLookUp in GMLConfiguration.configureContext and have AbstractFeatureTypeBinding take an extra constructor argument of type IdLookUp (if my understanding is correct the picocontainer will take care of passing over the constructor argument as long as it's registered by GMLConfiguration)

Then IdLookUp might be something like:
  class IdLookUp{
  ...
  boolean idExists(String gmlId){...}
  void add(String gmlId){...}
}

May you have time to do that so we fix the perf penalty?

(Justin please confirm my above suggestion is correct)

Cheers,
Gabriel

Justin Deoliveira wrote:

Even though the Configuration object is a singleton in geoserver (hopefully fix that someday), the binding context is not stored as a member of it, it is created by the encoder and passed into Configuration.configureContext(). So as long as two processes are not storing the same encoder instance as state there should be no problem.

Gabriel Roldan wrote:

That'd be a good solution to provide a context to the gml encoding process, my open question is now how to ensure that context is not shared between encoding processes, as the list of ids is to be checked independently for each one. My understanding is FeatureTypeCache is shared and that's ok, but this one may require a per process "context". What do you think?

Justin Deoliveira wrote:

Hmmm... well one possibility might be to use a similar solution as FeatureTypeCache. Each time the encoder stats, the container context gets a FeatureTypeCache instance which basically serves as a cache for feature type objects, used while parsing features to avoid creating the feature type over and over again.

A similar class could be used ot track ids during encoding. GMLConfiguration.configureContext could register it, and then whatever binding (AbstractFeatureTypeBinding or whatever) could declare a dependency on it, grab the instnace and check for or update the set with encoded ids.

The downside being that for huge feature collections the set could get quite large... but i guess its better than the current solution of building up a document full of ids.

Gabriel Roldan wrote:

Hey Justin,

when you have a chance please take a look at the comments in <http://jira.codehaus.org/browse/GEOT-2627&gt;, as GML3EncodingUtils.idExists() is making the gml encoding process almost twice slower and I would love you opinion on the proposed workaround

Cheers,
Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Hi Gabriel,

Sure thing. I'll reopen the issue and assign to myself. Thank's for spotting this.

Cheers
Rini

-----Original Message-----
From: Gabriel Roldan [mailto:groldan@anonymised.com]
Sent: Tuesday, 15 September 2009 11:36 AM
To: Justin Deoliveira
Cc: Geoserver-devel; Angreani, Rini (E&M, Kensington)
Subject: Re: [Geoserver-devel] GML3EncodingUtils.idExists performance impact

thaaat's what I call a satisfactory answer :slight_smile:

Rini, see how FeatureTypeCache is used in WFSConfiguration.configureContext

As Justin suggested, the same way you can register a new instance of,
say, IdLookUp in GMLConfiguration.configureContext and have
AbstractFeatureTypeBinding take an extra constructor argument of type
IdLookUp (if my understanding is correct the picocontainer will take
care of passing over the constructor argument as long as it's registered
by GMLConfiguration)

Then IdLookUp might be something like:
  class IdLookUp{
  ...
  boolean idExists(String gmlId){...}
  void add(String gmlId){...}
}

May you have time to do that so we fix the perf penalty?

(Justin please confirm my above suggestion is correct)

Cheers,
Gabriel

Justin Deoliveira wrote:

Even though the Configuration object is a singleton in geoserver
(hopefully fix that someday), the binding context is not stored as a
member of it, it is created by the encoder and passed into
Configuration.configureContext(). So as long as two processes are not
storing the same encoder instance as state there should be no problem.

Gabriel Roldan wrote:

That'd be a good solution to provide a context to the gml encoding
process, my open question is now how to ensure that context is not
shared between encoding processes, as the list of ids is to be checked
independently for each one. My understanding is FeatureTypeCache is
shared and that's ok, but this one may require a per process
"context". What do you think?

Justin Deoliveira wrote:

Hmmm... well one possibility might be to use a similar solution as
FeatureTypeCache. Each time the encoder stats, the container context
gets a FeatureTypeCache instance which basically serves as a cache
for feature type objects, used while parsing features to avoid
creating the feature type over and over again.

A similar class could be used ot track ids during encoding.
GMLConfiguration.configureContext could register it, and then
whatever binding (AbstractFeatureTypeBinding or whatever) could
declare a dependency on it, grab the instnace and check for or update
the set with encoded ids.

The downside being that for huge feature collections the set could
get quite large... but i guess its better than the current solution
of building up a document full of ids.

Gabriel Roldan wrote:

Hey Justin,

when you have a chance please take a look at the comments in
<http://jira.codehaus.org/browse/GEOT-2627&gt;, as
GML3EncodingUtils.idExists() is making the gml encoding process
almost twice slower and I would love you opinion on the proposed
workaround

Cheers,
Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Gabriel Roldan ha scritto:

thaaat's what I call a satisfactory answer :slight_smile:

Rini, see how FeatureTypeCache is used in WFSConfiguration.configureContext

As Justin suggested, the same way you can register a new instance of, say, IdLookUp in GMLConfiguration.configureContext and have AbstractFeatureTypeBinding take an extra constructor argument of type IdLookUp (if my understanding is correct the picocontainer will take care of passing over the constructor argument as long as it's registered by GMLConfiguration)

Then IdLookUp might be something like:
  class IdLookUp{
  ...
  boolean idExists(String gmlId){...}
  void add(String gmlId){...}
}

May you have time to do that so we fix the perf penalty?

Sounding in just for a quick check. Is the IdLookup necessary even
for simple features? How much will it cost in terms of performance
having it always on?

I was thinking that maybe in case of simple features the need is
not there and a dummy implementation could just always return
false to idExists and avoid both the search and memory consumption
overhead?

Then again, I don't fully understand the use case. It's just that
yesterday was chatting with a user that was asking why GML3 output
was so much slower than GML2 (on 1.7.x) and this seems to make it
worse from that pov.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Gabriel Roldan wrote:

thaaat's what I call a satisfactory answer :slight_smile:

Rini, see how FeatureTypeCache is used in WFSConfiguration.configureContext

As Justin suggested, the same way you can register a new instance of, say, IdLookUp in GMLConfiguration.configureContext and have AbstractFeatureTypeBinding take an extra constructor argument of type IdLookUp (if my understanding is correct the picocontainer will take care of passing over the constructor argument as long as it's registered by GMLConfiguration)

Then IdLookUp might be something like:
  class IdLookUp{
  ...
  boolean idExists(String gmlId){...}
  void add(String gmlId){...}
}

May you have time to do that so we fix the perf penalty?

(Justin please confirm my above suggestion is correct)

Yup, that looks good.

Cheers,
Gabriel

Justin Deoliveira wrote:

Even though the Configuration object is a singleton in geoserver (hopefully fix that someday), the binding context is not stored as a member of it, it is created by the encoder and passed into Configuration.configureContext(). So as long as two processes are not storing the same encoder instance as state there should be no problem.

Gabriel Roldan wrote:

That'd be a good solution to provide a context to the gml encoding process, my open question is now how to ensure that context is not shared between encoding processes, as the list of ids is to be checked independently for each one. My understanding is FeatureTypeCache is shared and that's ok, but this one may require a per process "context". What do you think?

Justin Deoliveira wrote:

Hmmm... well one possibility might be to use a similar solution as FeatureTypeCache. Each time the encoder stats, the container context gets a FeatureTypeCache instance which basically serves as a cache for feature type objects, used while parsing features to avoid creating the feature type over and over again.

A similar class could be used ot track ids during encoding. GMLConfiguration.configureContext could register it, and then whatever binding (AbstractFeatureTypeBinding or whatever) could declare a dependency on it, grab the instnace and check for or update the set with encoded ids.

The downside being that for huge feature collections the set could get quite large... but i guess its better than the current solution of building up a document full of ids.

Gabriel Roldan wrote:

Hey Justin,

when you have a chance please take a look at the comments in <http://jira.codehaus.org/browse/GEOT-2627&gt;, as GML3EncodingUtils.idExists() is making the gml encoding process almost twice slower and I would love you opinion on the proposed workaround

Cheers,
Gabriel

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Andrea Aime wrote:

Gabriel Roldan ha scritto:

thaaat's what I call a satisfactory answer :slight_smile:

Rini, see how FeatureTypeCache is used in WFSConfiguration.configureContext

As Justin suggested, the same way you can register a new instance of, say, IdLookUp in GMLConfiguration.configureContext and have AbstractFeatureTypeBinding take an extra constructor argument of type IdLookUp (if my understanding is correct the picocontainer will take care of passing over the constructor argument as long as it's registered by GMLConfiguration)

Then IdLookUp might be something like:
  class IdLookUp{
  ...
  boolean idExists(String gmlId){...}
  void add(String gmlId){...}
}

May you have time to do that so we fix the perf penalty?

Sounding in just for a quick check. Is the IdLookup necessary even
for simple features? How much will it cost in terms of performance
having it always on?

From what I have seen from the two encoding chains they are pretty integrated, would probably take a bit of work to separate them out.

I was thinking that maybe in case of simple features the need is
not there and a dummy implementation could just always return
false to idExists and avoid both the search and memory consumption
overhead?

Then again, I don't fully understand the use case. It's just that
yesterday was chatting with a user that was asking why GML3 output
was so much slower than GML2 (on 1.7.x) and this seems to make it
worse from that pov.

Well at some point I hope to resurrect the optimized gml encoder stuff i wrote a while back. Just lacking time as of late i guess.

Cheers
Andrea

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.