[Geoserver-devel] SLD and Internationalization

Hi everybody,
I would like to introduce support for internationalized Title and Abstract elements in SLD documents.
Luckily, they are already defined as InternationalString in the GeoTools SLD code, but the current implementation uses SimpleInternationalString, which doesn’t allow a different value depending on the Locale.
What I would like to do is use a modified version of ResourceInternationalString instead of SimpleInternationalString, to allow property lookup in the given locale while falling back gracefully to the current value if no translation is available.

To have an internationalized title / abstract when needed, a user should:

  • use a key instead of the real value in SLD for a title / abstract (the key could be any valid value for a properties file key)
  • create a property file named SLD_<locale>.properties in the org.geotools.styling package, somewhere on the classpath, with key=value rows for the given locale
  • repeat the above for any locale needed
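The lookup described above could look roughly like the following Java sketch. It is not the GeoTools ResourceInternationalString code: the class name KeyedInternationalString is hypothetical, and the per-locale bundle loader is passed in as a function so the sketch does not depend on a real SLD_<locale>.properties file being on the classpath. The SLD text doubles as the bundle key and is returned unchanged when no translation is found.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;
import java.util.function.Function;

/**
 * Hypothetical sketch: the Title/Abstract text found in the SLD is used as a
 * key into a per-locale bundle; the raw text is returned when no translation
 * exists. Roughly the shape of InternationalString, not the GeoTools class.
 */
final class KeyedInternationalString {
    private final String key;                               // value written in the SLD
    private final Function<Locale, ResourceBundle> bundles; // loads SLD_<locale> data

    KeyedInternationalString(String key, Function<Locale, ResourceBundle> bundles) {
        this.key = key;
        this.bundles = bundles;
    }

    /** Translation for the locale, or the untranslated key as a fallback. */
    String toString(Locale locale) {
        if (locale != null) {
            try {
                ResourceBundle bundle = bundles.apply(locale);
                if (bundle != null && bundle.containsKey(key)) {
                    return bundle.getString(key);
                }
            } catch (MissingResourceException e) {
                // no bundle available for this locale: use the fallback below
            }
        }
        return key;
    }

    @Override
    public String toString() {
        return key; // no locale requested: behave like a plain string
    }
}
```

With real files on the classpath, the loader could be something like locale -> ResourceBundle.getBundle("org.geotools.styling.SLD", locale). Note that getBundle throws MissingResourceException when not even a base bundle exists (caught above) and, by default, also tries the JVM default locale before the base file, which may or may not be the desired fallback.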

The ultimate goal is to support internationalized rule titles in GeoServer WMS GetLegendGraphic rendering, through a new LEGEND_OPTIONS value named locale.
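GetLegendGraphic already accepts LEGEND_OPTIONS as semicolon-separated key:value pairs (for example fontName:Arial;fontSize:12), so the proposed locale entry could be read along these lines. This is only a sketch: the class and method names are hypothetical, and lower-casing the keys and accepting both it_IT and it-IT style values are assumptions, not part of the proposal.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical reader for the proposed "locale" entry of LEGEND_OPTIONS. */
final class LegendOptionsLocale {
    /** Splits "key1:value1;key2:value2" into a map; keys are lower-cased here. */
    static Map<String, String> parse(String legendOptions) {
        Map<String, String> options = new HashMap<>();
        if (legendOptions == null) {
            return options;
        }
        for (String pair : legendOptions.split(";")) {
            int colon = pair.indexOf(':');
            if (colon > 0) {
                options.put(pair.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        pair.substring(colon + 1).trim());
            }
        }
        return options;
    }

    /** Locale requested via e.g. "locale:it" or "locale:it_IT", if any. */
    static Optional<Locale> locale(String legendOptions) {
        String value = parse(legendOptions).get("locale");
        if (value == null || value.isEmpty()) {
            return Optional.empty();
        }
        // "it_IT" -> "it-IT", the language-tag form Locale.forLanguageTag expects
        return Optional.of(Locale.forLanguageTag(value.replace('_', '-')));
    }
}
```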

Any opinions on this?

Thanks.
Mauro Bartolomeoli

==
GeoServer training in Milan, 6th & 7th June 2013! Visit
http://geoserver.geo-solutions.it for more information.

Dott. Mauro Bartolomeoli
@mauro_bart
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Mauro,
this is a very interesting proposal.

To have an internationalized title / abstract when needed, a user should:

  • use a key instead of the real value in SLD for a title / abstract (the key could be any valid value for a properties file key)
  • create a property file named SLD_<locale>.properties in the org.geotools.styling package, somewhere on the classpath, with key=value rows for the given locale

Isn’t the geoserver_data_dir a good place to store the i18n property files for styles?

Carlo

GeoTools-Devel mailing list
GeoTools-Devel@anonymised.coms.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

==
GeoServer training in Milan, 6th & 7th June 2013! Visit
http://geoserver.geo-solutions.it for more information.

Dott. Carlo Cancellieri
@cancellieric
Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
mobile: +39 3371094494
fax: +39 0584 1660272

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


2013/5/21 carlo cancellieri <carlo.cancellieri@anonymised.com>

Mauro,
this is a very interesting proposal.

To have an internationalized title / abstract when needed, a user should:

- use a key instead of the real value in SLD for a title / abstract (the
key could be any valid value for a properties file key)
- create a property file named SLD_<locale>.properties in the
org.geotools.styling package, somewhere on the classpath, with key=value
rows for the given locale

Isn't the geoserver_data_dir a good place to store the i18n property files
for styles?

Yes, it could be, but at the GeoTools level there is no knowledge of the
geoserver_data_dir, so we should eventually introduce this at the GeoServer
level.
I should look at how to make this pluggable and allow various sources for
internationalization resource bundles.
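One possible shape for such a pluggable mechanism, sketched in Java under the assumption that each translation source (classpath bundles, a file in the GeoServer data directory, ...) can sit behind a small interface. TranslationSource, MapTranslationSource and ChainedTranslator are hypothetical names; sources are queried in registration order, so registering a data-directory source before the classpath one would let it override the defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical extension point: one place translations can come from. */
interface TranslationSource {
    Optional<String> translate(String key, Locale locale);
}

/** In-memory source standing in for, e.g., a properties file in the data dir. */
final class MapTranslationSource implements TranslationSource {
    private final Map<Locale, Map<String, String>> table;

    MapTranslationSource(Map<Locale, Map<String, String>> table) {
        this.table = table;
    }

    @Override
    public Optional<String> translate(String key, Locale locale) {
        Map<String, String> entries = table.get(locale);
        return entries == null ? Optional.empty() : Optional.ofNullable(entries.get(key));
    }
}

/** Queries sources in registration order; the first hit wins, else the key. */
final class ChainedTranslator {
    private final List<TranslationSource> sources = new ArrayList<>();

    ChainedTranslator register(TranslationSource source) {
        sources.add(source);
        return this;
    }

    String translate(String key, Locale locale) {
        for (TranslationSource source : sources) {
            Optional<String> hit = source.translate(key, locale);
            if (hit.isPresent()) {
                return hit.get();
            }
        }
        return key; // no source knows this key: keep the untranslated value
    }
}
```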

Mauro

-------------------------------------------------------

Mauro,

Right.

Have you considered adding i18n as an extension point in the style itself? A week ago I asked Andrea (added in c.c.) about a possible extension point for styles, and he suggested I try adding a VendorOption to the UserStyle.
This would keep all the information in the same SLD, while also providing a new extension point for the style that could be used for other purposes too (such as default format options).

Cheers,
Carlo

You should also be able to use XML language support (the xml:lang attribute) to list the different translations right in the SLD file itself (and then parse the whole title out as an international string).

On Tue, May 21, 2013 at 5:47 PM, Jody Garnett <jody.garnett@anonymised.com> wrote:

You should also be able to use XML language support to list the different
translations right in the SLD file itself (and then parse the whole title
out as an international string).

Eh, I'm afraid not:

  <xsd:element name="UserStyle">
    <xsd:annotation>
      <xsd:documentation>
        A UserStyle allows user-defined styling and is semantically
        equivalent to a WMS named style.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="sld:Name" minOccurs="0"/>
        <xsd:element ref="sld:Title" minOccurs="0"/>
        <xsd:element ref="sld:Abstract" minOccurs="0"/>
        <xsd:element ref="sld:IsDefault" minOccurs="0"/>
        <xsd:element ref="sld:FeatureTypeStyle" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="IsDefault" type="xsd:string"/>

Omitting maxOccurs means maxOccurs=1, and in order to use XML support
we'd have to do something like:

<Title lang="en">The Title</Title>
<Title lang="it">Il titolo</Title>
...

which requires maxOccurs > 1, unless I'm missing something.

Cheers
Andrea

--

GeoServer training in Milan, 6th & 7th June 2013! Visit
http://geoserver.geo-solutions.it for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

2013/5/21 carlo cancellieri <carlo.cancellieri@anonymised.com>

Mauro,

Yes, it could be, but at the GeoTools level there is no knowledge of the
geoserver_data_dir, so we should eventually introduce this at the GeoServer
level.
I should look at how to make this pluggable and allow various sources for
internationalization resource bundles.

Right.

Have you considered adding i18n as an extension point in the style
itself? A week ago I asked Andrea (added in c.c.) about a possible
extension point for styles, and he suggested I try adding a VendorOption to
the UserStyle.
This would keep all the information in the same SLD, while also
providing a new extension point for the style that could be used for
other purposes too (such as default format options).

Cheers,
Carlo

Can you explain, with an example, how you would use the VendorOption to
specify localized versions of a title or abstract?

Thanks.
Mauro

-------------------------------------------------------

Mauro,

Well, I was thinking of standardizing the keys using the standard i18n locale code as a prefix plus a fixed keyword, something like:

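<VendorOption name="it_IT.TITLE">Il titolo</VendorOption>
<VendorOption name="en_US.TITLE">The title</VendorOption>

<VendorOption name="it_IT.ABSTRACT">Questo e' un esempio</VendorOption>
<VendorOption name="en_US.ABSTRACT">This is an example</VendorOption>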

Ref:
http://www.w3.org/TR/2012/NOTE-ws-i18n-20120522/
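A minimal Java sketch of how options named <locale>.<FIELD> could be grouped per field once the VendorOption map has been read from the style. The helper name is hypothetical, and converting it_IT to the it-IT language tag for Locale.forLanguageTag is an assumption about how the prefix would be interpreted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper: groups VendorOption entries named "<locale>.<FIELD>"
 * (e.g. "it_IT.TITLE") into FIELD -> (Locale -> text).
 */
final class LocalizedVendorOptions {
    static Map<String, Map<Locale, String>> group(Map<String, String> options) {
        Map<String, Map<Locale, String>> byField = new HashMap<>();
        for (Map.Entry<String, String> option : options.entrySet()) {
            String name = option.getKey();
            int dot = name.lastIndexOf('.');
            if (dot <= 0 || dot == name.length() - 1) {
                continue; // no "<locale>." prefix or no field: not a localized option
            }
            // "it_IT" -> "it-IT", the BCP 47 form Locale.forLanguageTag expects
            Locale locale = Locale.forLanguageTag(name.substring(0, dot).replace('_', '-'));
            if (locale.getLanguage().isEmpty()) {
                continue; // prefix is not a well-formed language tag
            }
            String field = name.substring(dot + 1);
            byField.computeIfAbsent(field, f -> new HashMap<>()).put(locale, option.getValue());
        }
        return byField;
    }
}
```

Ordinary vendor options without a dot, such as spaceAround, are skipped, so the same option map could keep serving other purposes.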

Carlo

On Tue, May 21, 2013 at 6:42 PM, carlo cancellieri <
carlo.cancellieri@anonymised.com> wrote:

Mauro,

Can you explain, with an example, how you would use the VendorOption to
specify localized versions of a title or abstract?

Well, I was thinking of standardizing the keys using the standard i18n locale code as a prefix
plus a fixed keyword, something like:

<VendorOption name="it_IT.TITLE">Il titolo</VendorOption>
<VendorOption name="en_US.TITLE">The title</VendorOption>
<VendorOption name="it_IT.ABSTRACT">Questo e' un esempio</VendorOption>
<VendorOption name="en_US.ABSTRACT">This is an example</VendorOption>

I guess it's a nice balance: it doesn't break SLD too much, it's clearly a
vendor extension, and we already have option maps in the symbolizers, so
why not have them in UserStyle as well?
More importantly, it keeps everything in a single place.

Cheers
Andrea

-------------------------------------------------------

2013/5/21 Andrea Aime <andrea.aime@anonymised.com>

On Tue, May 21, 2013 at 6:42 PM, carlo cancellieri <
carlo.cancellieri@anonymised.com> wrote:

Mauro,

Can you explain, with an example, how you would use the VendorOption
to specify localized versions of a title or abstract?

Well, I was thinking of standardizing the keys using the standard i18n locale code
as a prefix plus a fixed keyword, something like:

<VendorOption name="it_IT.TITLE">Il titolo</VendorOption>
<VendorOption name="en_US.TITLE">The title</VendorOption>
<VendorOption name="it_IT.ABSTRACT">Questo e' un esempio</VendorOption>
<VendorOption name="en_US.ABSTRACT">This is an example</VendorOption>

I guess it's a nice balance: it doesn't break SLD too much, it's clearly a
vendor extension, and we already have option maps in the symbolizers, so
why not have them in UserStyle as well?
More importantly, it keeps everything in a single place.

Just a note: for my specific needs I would add VendorOption support to
Rule, not to UserStyle, since it's the Rule title that is shown on
GetLegendGraphic labels. I think the same can be extended to other SLD
elements containing Title and Abstract, such as UserStyle, FeatureTypeStyle
and maybe others.

I will try to prepare a patch (for Geotools and GeoServer) and inform you
when it's ready.

Thanks.
Mauro Bartolomeoli

-------------------------------------------------------

Thinking about it, I would like to propose a further variant that is
possibly easier on the user's eyes, and easier to parse.

Just as we have already extended the content of some SLD elements (think
geometry, which is no longer limited to a property name),
we could allow mixed content in Title and Abstract and accept either:

<Title>My title</Title>

or:

<Title>
    <value lang="en">xxx</value>
    <value lang="fr">yyy</value>
    ...
</Title>

which would parse directly into an InternationalString.
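A rough DOM-based Java sketch of that parsing, not the actual SLDParser change: free text directly inside Title becomes the default (stored under the null key), and each value child contributes one translation keyed by its lang attribute. It assumes a namespace-unaware parse and the element and attribute names used in the example above; MixedContentTitle is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of reading a mixed-content Title/Abstract element. */
final class MixedContentTitle {
    /** Returns lang -> text; the free-text default, if any, is stored under null. */
    static Map<String, String> parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed element", e);
        }
        Map<String, String> translations = new LinkedHashMap<>();
        StringBuilder freeText = new StringBuilder();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                freeText.append(child.getNodeValue()); // text outside <value> tags
            } else if (child.getNodeType() == Node.ELEMENT_NODE
                    && "value".equals(child.getNodeName())) {
                Element value = (Element) child;
                translations.put(value.getAttribute("lang"), value.getTextContent().trim());
            }
        }
        String defaultText = freeText.toString().trim();
        if (!defaultText.isEmpty()) {
            translations.put(null, defaultText);
        }
        return translations;
    }
}
```

A plain <Title>My title</Title> yields a single default entry, so existing SLDs would keep parsing as before; how the null-key default relates to a document-wide default language is left open here.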

Opinions?

Cheers
Andrea

-------------------------------------------------------

Yeah I like that, more similar to the W3C approach for html documents.

The only wrinkle would be defining the document's default language, and using it for the alternatives.


Jody Garnett

On Wednesday, 22 May 2013 at 8:35 AM, Andrea Aime wrote:

Thinking about it, I would like to propose a further variant that is possibly easier on the user’s eyes, and easier to parse.

Just as we have already extended the content of some SLD elements (think geometry, which is no longer limited to a property name),
we could allow mixed content in Title and Abstract and accept either:

<Title>My title</Title>

or:

<Title>
    <value lang="en">xxx</value>
    <value lang="fr">yyy</value>
    ...
</Title>

which would parse directly into an InternationalString.

Opinions?

Cheers
Andrea

Hi all,
I think I can prepare a patch for this shortly using the last variant proposed by Andrea (I did some testing by patching the SLDParser, and everything seems to work well).
I would use the “free text” (not included in a value tag) as the default value if no locale is specified.

So, something like:

<Title>default title
<value lang="it">Titolo</value>
<value lang="en">Title</value>
...
</Title>

Also: do you think any other part (xsd, validators, parsers) would need to be touched to allow mixed content in Title and Abstract classes?

Mauro

2013/5/23 Jody Garnett <jody.garnett@anonymised.com…403…>

Yeah I like that, more similar to the W3C approach for html documents.

The only wrinkle would be defining the document's default language, and using it for the alternatives.


Jody Garnett

On Thu, May 23, 2013 at 11:48 AM, Mauro Bartolomeoli <
mauro.bartolomeoli@anonymised.com> wrote:

Hi all,
I think I can prepare a patch for this shortly using the last variant
proposed by Andrea (I did some testing by patching the SLDParser, and
everything seems to work well).
I would use the "free text" (not included in a value tag) as the default
value if no locale is specified.

So, something like:
<Title>default title
<value lang="it">Titolo</value>
<value lang="en">Title</value>
...
</Title>

Also: do you think any other part (xsd, validators, parsers) would need to
be touched to allow mixed content in Title and Abstract classes?

The XSDs in gt-xsd-sld, I believe.

Cheers
Andrea

-------------------------------------------------------

2013/6/9 Andrea Aime <andrea.aime@anonymised.com>

On Sun, Jun 9, 2013 at 4:14 PM, Jody Garnett <jody.garnett@anonymised.com> wrote:

Created a pull request for this:
https://github.com/geotools/geotools/pull/202

Updated both gt-styling (in main) and gt-xsd-sld to support the new
extension.

Checked it over, looks good. I may have missed it in the test cases: did
you support encoding on this one?

This is making me wonder... encoding is a general property of the XML
file, and should be handled by the SAX parser... or am I missing
something, and the client code should be doing something about it?
Have you seen this one:

http://stackoverflow.com/questions/12860115/not-able-to-parse-xml-file-containing-chinese-content

which I find weird... the XML declares the charset, so I thought the
parser was supposed to read it and handle it automatically?

From my experience, it's not the parser's duty to handle the encoding. This
is usually done by building a Reader using this weird syntax:

new InputStreamReader(<input_stream>, encoding)

but this assumes you already know the encoding of the stream you are going
to read.
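As a small Java illustration of why the charset has to be known up front: the same UTF-8 bytes decode correctly through an InputStreamReader given UTF-8 explicitly, and turn into mojibake when read as ISO-8859-1, which is the kind of mismatch a platform-default charset can cause when a data directory moves between systems. The class name ExplicitCharsetReader is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Reads a text stream with an explicit charset instead of the platform default. */
final class ExplicitCharsetReader {
    static String readAll(InputStream in, Charset charset) {
        // The charset argument replaces the platform default used by InputStreamReader(in)
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the UTF-8 bytes of "Questo è un esempio", decoding with UTF-8 gives the original text, while decoding with ISO-8859-1 gives "Questo Ã¨ un esempio".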

In the past I used a framework (http://cpdetector.sourceforge.net/) that is
able to guess the encoding of a stream and build a Reader with it, giving
automatic encoding handling. CPDetector is a pluggable system that uses several
heuristics to detect a stream's encoding.

I think GeoTools / GeoServer currently use the platform default
encoding in many cases (for example when reading SLD files and FreeMarker templates),
and this often causes issues when moving a data_dir to a
different platform (for example from Windows to Linux).

Maybe something similar to cpdetector could be used to avoid encoding
issues like the above.

Mauro

-------------------------------------------------------

On Mon, Jun 10, 2013 at 9:11 AM, Mauro Bartolomeoli <
mauro.bartolomeoli@anonymised.com> wrote:

From my experience, it's not the parser's duty to handle the encoding. This
is usually done by building a Reader using this weird syntax:

new InputStreamReader(<input_stream>, encoding)

but this assumes you already know the encoding of the stream you are going
to read.

In the past I used a framework (http://cpdetector.sourceforge.net/) that
is able to guess the encoding of a stream and build a Reader with it,
giving automatic encoding handling. CPDetector is a pluggable system that
uses several heuristics to detect a stream's encoding.

I think GeoTools / GeoServer currently use the platform default
encoding in many cases (for example when reading SLD files and FreeMarker
templates), and this often causes issues when moving a data_dir to a
different platform (for example from Windows to Linux).

Maybe something similar to cpdetector could be used to avoid encoding
issues like the above.

This totally makes sense, thank you.
Hmm... cpdetector is half a megabyte, a wee bit too large.
However, the XML declaration is just one line, and I believe it can
be read before reading the actual content.
Actually, see this one: it seems a parser can read the encoding if given
the right type of input:
http://stackoverflow.com/questions/3482494/howto-let-the-sax-parser-determine-the-encoding-from-the-xml-declaration
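A minimal Java sketch of that idea, assuming the JDK's built-in JAXP SAX parser: the InputSource is built from the raw byte stream, with no Reader and no explicit encoding, so the parser can take the charset from the byte order mark or the XML declaration. DeclaredEncodingParser is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Parses raw XML bytes and lets the parser choose the charset itself. */
final class DeclaredEncodingParser {
    /** Returns the concatenated character data of the document. */
    static String textOf(byte[] xmlBytes) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            // Byte stream only: no Reader and no setEncoding(), so the parser
            // detects the charset from the BOM or the XML declaration.
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return text.toString();
    }
}
```

The same code reads both an ISO-8859-1 and a UTF-8 document correctly as long as each declares its encoding; if the declaration does not match the real bytes, the content will still be decoded wrongly.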

Cheers
Andrea

-------------------------------------------------------

This totally makes sense, thank you.
Hmm... cpdetector is half a megabyte, a wee bit too large.

Yes, I know.

However, the XML declaration is just one line, and I believe it can
be read before reading the actual content.
Actually, see this one: it seems a parser can read the encoding if given
the right type of input:

http://stackoverflow.com/questions/3482494/howto-let-the-sax-parser-determine-the-encoding-from-the-xml-declaration

Fine, I think that makes the InputSource use the specified encoding to
build an internal reader.
The only caveat I found with this approach is that sometimes the declared
XML encoding may not correspond to the real file encoding, but I think
we can ignore this.
Another thought: this is surely the way to go for XML files (like SLD). For
other types of files (like FreeMarker templates) we should use
other techniques. I will investigate that.

Mauro

-------------------------------------------------------