[Geoserver-devel] GML3 encoding with XSDIdRegistry still not optimal

Hi Rini,

first off, sorry for the late feedback on XSDIdRegistry, I just happen to get to testing with large datasets now.

Trying to serve a 10M roads network in GML3 it happens that at about 1M encoded features XSDIdRegistry produces OOM with 512M Heap. And also I don't really see why idExists() and add() performs inside a synchronized block? as far as I can see those methods are not going to be called concurrently (there's a single instance of XSDIdRegistry per encoding process and it's sequential).

Wonder if at least when encoding a SimpleFeature we could avoid the id repeating check (is that a fair enough assumption? unfortunately there's no SimpleFeatureCollection that would make an instanceof check easier, and going back to check if the original source is DataStore instead of the more general DataAccess seems like going too far back in the encoding chain as to easily patch).

And removing the synchronization looks also like a safe move and less a performance penalty?

What others think?

Cheers,

Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Hi Gabriel,

Good point. I have no problem with removing the synchronized bit.
I'll try running the profiler and let you know how it goes.

With checking SimpleFeature, I think it should be enough, although I'm not very familiar with SimpleFeature stuff, so I might be wrong.
In the beginning of the associated method in GML3EncodingUtils, it has the feature, and we can check instanceof FeatureImpl or SimpleFeature:

public static Element AbstractFeatureType_encode(Object object, Document document,
            Element value, XSDIdRegistry idSet) {
        Feature feature = (Feature) object;

Cheers
Rini

-----Original Message-----
From: Gabriel Roldan [mailto:groldan@anonymised.com]
Sent: Friday, 2 October 2009 11:53 AM
To: Geoserver-devel; Angreani, Rini (E&M, Kensington); Andrea Aime; Justin Deoliveira
Subject: GML3 encoding with XSDIdRegistry still not optimal

Hi Rini,

first off, sorry for the late feedback on XSDIdRegistry, I just happen
to get to testing with large datasets now.

Trying to serve a 10M roads network in GML3 it happens that at about 1M
encoded features XSDIdRegistry produces OOM with 512M Heap. And also I
don't really see why idExists() and add() performs inside a synchronized
block? as far as I can see those methods are not going to be called
concurrently (there's a single instance of XSDIdRegistry per encoding
process and it's sequential).

Wonder if at least when encoding a SimpleFeature we could avoid the id
repeating check (is that a fair enough assumption? unfortunately there's
no SimpleFeatureCollection that would make an instanceof check easier,
and going back to check if the original source is DataStore instead of
the more general DataAccess seems like going too far back in the
encoding chain as to easily patch).

And removing the synchronization looks also like a safe move and less a
performance penalty?

What others think?

Cheers,

Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Gabriel Roldan ha scritto:

Hi Rini,

first off, sorry for the late feedback on XSDIdRegistry, I just happen to get to testing with large datasets now.

Trying to serve a 10M roads network in GML3 it happens that at about 1M encoded features XSDIdRegistry produces OOM with 512M Heap. And also I don't really see why idExists() and add() performs inside a synchronized block? as far as I can see those methods are not going to be called concurrently (there's a single instance of XSDIdRegistry per encoding process and it's sequential).

Ouch ouch. The ability to stream out limitless amount of data with
minor memory penalty is one of the major GeoServer features, please
please let's not regress on that. If GS cannot serve 10GB xml
with 64MB heap with have a regression, and a serious one in my book.

I understand the complex feauture case is... complex and might require
people to throw hardware at the problem, but let's spare the simple
one from these headaches.

A GeoServer that easily OOM's is not useful to anyone.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.