[Geoserver-devel] Cleaning up xml libs?

Hi,
a post on the user list reminded me of an xml libs
cleanup that I wanted to propose some time ago.

Nowadays we're running on JDK 5. Doesn't it contain
xerces and xalan equivalents already? Do we really
need to ship that extra 3.5MB of libraries? :)
(2.7MB xalan, 875KB xerces)
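
One way to see what the JDK alone provides is to ask JAXP which factory classes it resolves. The sketch below does that. With no xerces or xalan jars on the classpath, a Sun JDK 5 or later typically reports repackaged copies under com.sun.org.apache.xerces.internal and com.sun.org.apache.xalan.internal. The class name JaxpImplReport is made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper, not part of GeoServer: reports the JAXP
// implementations that the current classpath resolves to.
final class JaxpImplReport {

    // Returns the concrete class names of the DOM, SAX and XSLT factories.
    static String[] implementations() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(),
            SAXParserFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        for (String name : implementations()) {
            System.out.println(name);
        }
    }
}
```

Running it once with and once without the extra jars on the classpath shows which implementation actually gets picked up in each case.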

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

There's also a newcomer, XStream's XPP3 dependency.

But looking at the dependency tree for web,

...
[INFO] +- org.mortbay.jetty:jsp-2.0:pom:6.1.8:test
[INFO] | +- tomcat:jasper-compiler-jdt:jar:5.5.15:test
[INFO] | | \- org.eclipse.jdt:core:jar:3.1.1:test
[INFO] | +- xerces:xmlParserAPIs:jar:2.6.2:test
...

Doesn't that mean we need xerces? I couldn't quite figure it out from reading the maven metadata.

-Arne

--
Arne Kepp
OpenGeo - http://opengeo.org
Expert service straight from the developers

afaik gt-xsd depends on xerces classes directly:
Encoder.java
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

not sure if there are more around though, but I guess that's at least a reason
why we have the direct dependency.

Gabriel
On Thursday 08 January 2009 06:58:59 Andrea Aime wrote:
> Hi,
> a post on the user list reminded me of an xml libs
> cleanup that I wanted to propose some time ago.
>
> Nowadays we're running on JDK 5. Doesn't it contain
> xerces and xalan equivalents already? Do we really
> need to ship that extra 3.5MB of libraries? :)
> (2.7MB xalan, 875KB xerces)
>
> Cheers
> Andrea

Gabriel Roldan wrote:
> afaik gt-xsd depends on xerces classes directly:
> Encoder.java
> import org.apache.xml.serialize.OutputFormat;
> import org.apache.xml.serialize.XMLSerializer;
>
> not sure if there are more around though, but I guess that's at least a
> reason why we have the direct dependency.

In fact I was hoping Justin could jump on board
and comment on this one...

Cheers
Andrea

At first glance Gabriel is correct, there is a direct dependency on xerces. And indeed the encoder uses the XMLSerializer class quite extensively so getting rid of it might not be all that easy. I don't see an alternative to XMLSerializer in java 1.5, but I do in java 1.6, XMLStreamWriter.
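
As a rough illustration of the Java 1.6 option, here is a minimal sketch of writing a small document with XMLStreamWriter. It is not the gt-xsd Encoder code. The class name StaxWriterSketch and the element and attribute names are invented for the example. Unlike XMLSerializer with an OutputFormat, the writer does no indentation on its own, so that part would need separate handling.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: serializes one element using only the StAX API
// that ships with Java 1.6, with no xerces classes involved.
final class StaxWriterSketch {

    // Writes <feature name="...">value</feature> and returns it as a string.
    static String writeFeature(String name, String value) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("feature");
        writer.writeAttribute("name", name);
        writer.writeCharacters(value); // special characters such as & are escaped
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeFeature("roads", "a & b"));
    }
}
```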

Another alternative would be to create a stripped down version of xerces since we are only using a small part of the library. I think Andrea may have brought this up before.

Furthermore, at first glance it looks like we might be able to kill the following dependencies from the xml-xsd stuff:

* xml-apis
* xml-apis-xerces
* jdom

However the savings from removing those are negligible... less than 0.5M. The real hog is xalan at 2.7M, which is dragged in via main and ows. Taking it off the build path reveals a single compile error in ows which is easily fixable (actually it's just in the test scope). After fixing that and starting geoserver, things appear to run fine. I tried a variety of XML requests with no issues.

However the issue with XML libs always comes up in other deployments, various servlet containers and the like. So while it looks like we can kill the xalan dependency, I would want to do some extensive testing in other environments.

The xpp dependency comes in from the ows dispatcher. We could try and kill it as it is only used to read the root element of a request. Also an alternative (XMLStreamReader) seems to be available in java 1.6 as well. However the dependency on xpp is pretty minimal at < 0.2M.
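
For reference, reading just the root element with XMLStreamReader could look roughly like the sketch below. This is not the actual dispatcher code. The class name RootElementSketch and the sample request are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: pulls only the qualified name of the root element
// and stops parsing there, using the StAX reader available in Java 1.6.
final class RootElementSketch {

    static QName readRoot(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The first START_ELEMENT event is the root element.
                if (reader.next() == XMLStreamReader.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String request = "<wfs:GetFeature xmlns:wfs=\"http://www.opengis.net/wfs\" service=\"WFS\"/>";
        QName root = readRoot(request);
        System.out.println(root.getNamespaceURI() + " " + root.getLocalPart());
    }
}
```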

-Justin

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:
> At first glance Gabriel is correct, there is a direct dependency on xerces. And indeed the encoder uses the XMLSerializer class quite extensively so getting rid of it might not be all that easy. I don't see an alternative to XMLSerializer in java 1.5, but I do in java 1.6, XMLStreamWriter.

Pity.

> Another alternative would be to create a stripped down version of xerces since we are only using a small part of the library. I think Andrea may have brought this up before.

Actually that was for an older jaxb dependency, for which we already
acted afaik (by taking some code from the open source alternative
jaxme?).

> Furthermore, at first glance it looks like we might be able to kill the following dependencies from the xml-xsd stuff:
>
> * xml-apis
> * xml-apis-xerces
> * jdom
>
> However removing those is negligible... less than 0.5M.

Well, that might well allow us to add 2 more megabytes of uncompressed
sample data, so not that bad?

> The real hog is xalan which is 2.7M, which is dragged in via main and ows. Taking it off the build path reveals a single compile error in ows which is easily fixable (actually it's just in the test scope). Fixing that and starting geoserver it appears things run fine. I tried a variety of XML requests with no issues.
>
> However the issue with XML libs always comes up in other deployments, various servlet containers and the like. So while it looks like we can kill the xalan dependency i would want to do some extensive testing in other environments.

I guess we test in Tomcat and JBoss; if somebody needs to add the
lib for other containers, they are still free to do so?
2.7MB is the size of the release directory zipped :)

> The xpp dependency comes in from the ows dispatcher. We could try and kill it as it is only used to read the root element of a request. Also an alternative (XMLStreamReader) seems to be available in java 1.6 as well. However the dependency on xpp is pretty minimal at < 0.2M.

Sure, let's not get crazy with this one.

Another way to get some extra space would be to move all the SVG
related stuff into its own plugin; that batik*.jar is around 2MB,
and the PDF output is heavy too, at 1.1MB.

Maybe I'm over-reacting, but I still remember the pain of using
the internet in South Africa ;) (not at FOSS4G 2008, before that).

Cheers
Andrea

I agree with Justin, a couple comments inline

On Wednesday 14 January 2009 13:17:50 Justin Deoliveira wrote:

> At first glance Gabriel is correct, there is a direct dependency on
> xerces. And indeed the encoder uses the XMLSerializer class quite
> extensively so getting rid of it might not be all that easy. I don't see
> an alternative to XMLSerializer in java 1.5, but I do in java 1.6,
> XMLStreamWriter.
>
> Another alternative would be to create a stripped down version of xerces
> since we are only using a small part of the library. I think Andrea may
> have brought this up before.
>
> Furthermore, at first glance it looks like we might be able to kill the
> following dependencies from the xml-xsd stuff:
>
> * xml-apis
> * xml-apis-xerces
> * jdom
>
> However removing those is negligible... less than 0.5M. The real hog is
> xalan which is 2.7M, which is dragged in via main and ows. Taking it off
> the build path reveals a single compile error in ows which is easily
> fixable (actually it's just in the test scope). Fixing that and starting
> geoserver it appears things run fine. I tried a variety of XML requests
> with no issues.
>
> However the issue with XML libs always comes up in other deployments,
> various servlet containers and the like. So while it looks like we can
> kill the xalan dependency i would want to do some extensive testing in
> other environments.

That's a really good point... at some point I'd rather pay the 2.7 megs for the
assurance that things will work under different environments (provided we're
setting the system properties that tell the API which implementation we
want, which I'm not sure we're doing anyway), or else run extensive tests under
different setups, since each servlet engine seems to come with its own
preferred XML API implementation and, well, one never knows.
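
For the system properties mentioned above, the standard JAXP lookup keys are javax.xml.parsers.DocumentBuilderFactory, javax.xml.parsers.SAXParserFactory and javax.xml.transform.TransformerFactory. The following is only a sketch of pinning them at startup. The class PinJaxpSketch is hypothetical. To stay self-contained it pins to whatever the JVM already resolves, where a real deployment would pass the class names of the implementation it ships.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical sketch: fixes the JAXP implementations through system
// properties so that a container's own XML libraries are not picked instead.
final class PinJaxpSketch {

    // Standard JAXP system property names.
    static final String DOM_KEY = "javax.xml.parsers.DocumentBuilderFactory";
    static final String SAX_KEY = "javax.xml.parsers.SAXParserFactory";
    static final String XSLT_KEY = "javax.xml.transform.TransformerFactory";

    // Must run before the first factory lookup that should be affected.
    static void pin(String domImpl, String saxImpl, String xsltImpl) {
        System.setProperty(DOM_KEY, domImpl);
        System.setProperty(SAX_KEY, saxImpl);
        System.setProperty(XSLT_KEY, xsltImpl);
    }

    public static void main(String[] args) {
        // Assumption for the demo: pin to the currently resolved classes.
        pin(DocumentBuilderFactory.newInstance().getClass().getName(),
            SAXParserFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName());
        System.out.println(System.getProperty(DOM_KEY));
        System.out.println(System.getProperty(SAX_KEY));
        System.out.println(System.getProperty(XSLT_KEY));
    }
}
```

The same keys can be passed as -D options on the JVM command line, which avoids touching code but then applies to every webapp in the container.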

> The xpp dependency comes in from the ows dispatcher. We could try and
> kill it as it is only used to read the root element of a request. Also
> an alternative (XMLStreamReader) seems to be available in java 1.6 as
> well. However the dependency on xpp is pretty minimal at < 0.2M.

Right. Xpp is NOT StAX 1.0 compliant, while XMLStreamReader comes from the
StAX specification. So I'm all for removing the legacy xpp one and moving to
the standard StAX API, which also implies I have to do the same for the geotools
wfs module, or we'd be carrying xpp along from gt-wfs anyway.

Gabriel
