I18n patch for XML requests handling
------------------------------------
Key: GEOS-323
URL: http://jira.codehaus.org/browse/GEOS-323
Project: GeoServer
Type: Improvement
Components: WFS
Versions: 1.2.4
Environment: Tested it (not too well) with Tomcat 5.5.7 on jsdk 1.5.0_01
Reporter: Artie Konin
Assigned to: Chris Holmes
Priority: Minor
Fix For: 1.3.0
Attachments: i18n_patch.zip
This is a small patch that allows GeoServer to detect the charset used
in incoming XML requests (POST only, of course). I modified the files from the
GEOS-258 attachment for GeoServer's needs (the work is based on a trunk snapshot
made a week ago or so). In addition to what was sent to Chris, this version also
contains a change in WfsDispatcher, mostly to make charset detection more
uniform across the code (this also allows handling of multibyte encoding
schemes, which was absent in the previous version).
Changes in GeoServer code are limited to the following 2 files:
org.vfny.geoserver.servlets.AbstractService.java
org.vfny.geoserver.wfs.servlets.WfsDispatcher.java
These files, with all modifications, are included in the attached archive, but I
think it would be good to summarize the changes here so they can be applied by
hand to possibly newer versions of these files (of course, using a diff tool may
prove to be a better solution).
org.vfny.geoserver.servlets.AbstractService.java
1. Added a few new imports:
import java.io.BufferedReader;
import org.vfny.geoserver.util.requests.XmlCharsetDetector;
2. Changed `doPost` method:
a) Commented out line
Reader xml = (requestXml != null) ? requestXml : request.getReader();
b) Added the following lines instead:
Reader xml;
if (null != requestXml) {
    xml = requestXml;
} else {
    /*
     * `getCharsetAwareReader` returns a reader which does not support
     * mark/reset. So it is a good idea to wrap it into BufferedReader.
     * In this case the below debug output will work.
     */
    xml = new BufferedReader(
            XmlCharsetDetector.getCharsetAwareReader(
                    request.getInputStream()));
}
3. Some diff tools may highlight other lines as different when comparing
against the attached file. That is because I removed trailing whitespace
from the lines with a regexp (/[ \t]+$/ replaced with an empty string). Such
artefacts often appear when editing files with the Far Manager internal
editor, and it seems that a few trailing spaces elsewhere in the code were
affected by this too. This note also applies to the other files.
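The comment in step 2 relies on BufferedReader supporting mark/reset, which is
what lets debug output read the start of the request and then rewind. Below is a
minimal, self-contained sketch of that idea. The names `PeekDemo` and `peek` are
hypothetical and not part of the patch, and a StringReader stands in for the
request body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class PeekDemo {
    /** Reads up to `limit` chars (e.g. for logging), then rewinds the reader. */
    static String peek(BufferedReader reader, int limit) throws IOException {
        // Ask for one char more than we read so the mark stays valid.
        reader.mark(limit + 1);
        char[] buf = new char[limit];
        int total = 0;
        int n;
        while (total < limit
                && (n = reader.read(buf, total, limit - total)) != -1) {
            total += n;
        }
        reader.reset();
        return new String(buf, 0, total);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the charset-aware reader wrapped in a BufferedReader.
        BufferedReader xml =
                new BufferedReader(new StringReader("<GetFeature/>"));
        System.out.println(peek(xml, 5));   // prints "<GetF"
        System.out.println(xml.readLine()); // prints "<GetFeature/>"
    }
}
```

Without the BufferedReader wrapper, a reader that does not support mark/reset
would throw an IOException on `mark`, so a debug preview would either fail or
consume part of the request.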
org.vfny.geoserver.wfs.servlets.WfsDispatcher.java
1. Imported two new classes:
import org.vfny.geoserver.util.requests.XmlCharsetDetector;
import org.vfny.geoserver.util.requests.EncodingInfo;
2. Commented out the code relating to the old detection algorithm (so it can be
quickly brought back in case of trouble with the new one), namely:
a) Some static declarations:
// private static final String DEFAULT_ENCODING = "UTF-8";
// private static final String ENCODING_HEADER_ARG = "Content-Type";
// private static final Pattern ENCODING_PATTERN =
// Pattern.compile("encoding\\s*\\=\\s*\"([^\"]+)\"");
b) Methods `guessRequestEncoding` and `getXmlEncoding`
c) The following code in `doPost`:
String req_enc = guessRequestEncoding(request);
BufferedReader disReader = new BufferedReader(
        new InputStreamReader(
                new FileInputStream(temp), req_enc));
BufferedReader requestReader = new BufferedReader(
        new InputStreamReader(
                new FileInputStream(temp), req_enc));
3. Added these lines to `doPost` before the aforementioned commented ones:
/*
 * To avoid running the charset detection process twice, we
 * remember the charset info after the first time and then use it
 * to create all subsequent readers with the same source.
 * `createReader` immediately creates a charset aware reader using
 * data from an existing `EncodingInfo` instance. It doesn't perform
 * any detection.
 */
EncodingInfo encInfo = new EncodingInfo();
BufferedReader disReader;
BufferedReader requestReader;
try {
    disReader = new BufferedReader(
            XmlCharsetDetector.getCharsetAwareReader(
                    new FileInputStream(temp), encInfo));
    requestReader = new BufferedReader(
            XmlCharsetDetector.createReader(
                    new FileInputStream(temp), encInfo));
} catch (Exception e) {
    /*
     * Any exception other than WfsException will "hang up" the
     * process - no client output, no log entries, only "Internal
     * server error". So this is a little trick to make detector's
     * exceptions "visible".
     */
    throw new WfsException(e);
}
It is worth noting that the HTTP `Content-Type` header is now ignored, which
slightly reduces the detection capabilities compared to the old algorithm.
It can be put back, and I can do this if necessary; I simply don't
think it is a reliable source of information. The XML document itself should
generally be enough of a hint.
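To illustrate why the document itself is usually enough of a hint, here is a
simplified sketch of detection from a byte order mark and the `encoding`
pseudo-attribute of the XML declaration. It is not the Xerces-derived code from
the attached archive: the class `XmlEncodingSniffer` and its methods are
hypothetical, and the sketch assumes ASCII-compatible encodings or UTF-16 with a
BOM (no UCS-4, no BOM-less UTF-16, no EBCDIC).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlEncodingSniffer {
    private static final int HEAD_SIZE = 256;
    private static final Pattern ENCODING = Pattern.compile(
            "encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private static boolean startsWith(byte[] head, int len, int... prefix) {
        if (len < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((head[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Guesses a charset name from the first `len` bytes of a document. */
    static String sniff(byte[] head, int len) {
        if (startsWith(head, len, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, len, 0xFE, 0xFF)
                || startsWith(head, len, 0xFF, 0xFE)) return "UTF-16";
        // The declaration is pure ASCII, so ISO-8859-1 decodes it safely.
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("?>");
        if (text.startsWith("<?xml") && end > 0) {
            Matcher m = ENCODING.matcher(text.substring(0, end));
            if (m.find()) return m.group(1);
        }
        return "UTF-8"; // XML default when nothing else is declared
    }

    /** Peeks at the head of the stream, then returns a decoding reader. */
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, HEAD_SIZE);
        byte[] head = new byte[HEAD_SIZE];
        int len = 0;
        int n;
        while (len < HEAD_SIZE
                && (n = pb.read(head, len, HEAD_SIZE - len)) != -1) {
            len += n;
        }
        String charset = sniff(head, len);
        // Drop a UTF-8 BOM; the UTF-16 decoder consumes its own BOM.
        int skip = startsWith(head, len, 0xEF, 0xBB, 0xBF) ? 3 : 0;
        pb.unread(head, skip, len - skip);
        return new InputStreamReader(pb, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(doc, doc.length)); // prints "ISO-8859-1"
        open(new ByteArrayInputStream(doc)).close();
    }
}
```

If the `Content-Type` header were put back, one option would be to consult it
only when neither a BOM nor a declaration is present, instead of falling
straight through to UTF-8.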
And now about the four new files, which I placed in a new `util` package (this
location can be changed as appropriate, of course).
1. `org.vfny.geoserver.util.requests.XmlCharsetDetector` is a container for
the detection methods themselves. It uses the other three classes to achieve
its goals. Note that at least one of its methods is taken from the Xerces
codebase.
2. `org.vfny.geoserver.util.requests.EncodingInfo` (should've named it
CharsetInfo, though) can hold information about the detected charset.
This data can then be used to form an appropriately encoded response,
as Gabriel suggested, though currently I have no idea how to do this.
3. `org.vfny.geoserver.util.requests.RewindableInputStream` is a legacy of the
original Xerces-J charset detection algorithm on which I based this work. I
don't know why they invented it (I wonder whether the same thing can be
achieved with BufferedInputStream somehow), but I didn't have time to
investigate this more closely and just copied it (with a few modifications).
This byte stream provides very limited mark/reset functionality.
4. `org.vfny.geoserver.util.requests.readers.UCSReader` is based upon
`org.apache.xerces.impl.io.UCSReader` and (in theory) should handle the
decoding of the ISO-10646-UCS-2 and ISO-10646-UCS-4 charsets, which are missing
from the standard JVM distribution. I don't know if these charsets are ever
used, but I thought it would be better to preserve this Xerces functionality.
I also wrote a basic Writer for UCS-4, but it is of no use currently and so is
not included. Maybe it will prove useful when the task of adapting responses
to requests is solved.
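For readers unfamiliar with UCS-4, the decoding that a reader like UCSReader has
to perform can be sketched as follows. This is only an illustration, not the
Xerces or patch implementation: the class `Ucs4Decoder` is hypothetical, and it
assumes big-endian byte order and a complete in-memory byte array.

```java
class Ucs4Decoder {
    /** Decodes big-endian UCS-4 (four bytes per code point) to a String. */
    static String decodeBigEndian(byte[] data) {
        if (data.length % 4 != 0) {
            throw new IllegalArgumentException(
                    "UCS-4 input length must be a multiple of 4");
        }
        StringBuilder out = new StringBuilder(data.length / 4);
        for (int i = 0; i < data.length; i += 4) {
            int codePoint = ((data[i] & 0xFF) << 24)
                    | ((data[i + 1] & 0xFF) << 16)
                    | ((data[i + 2] & 0xFF) << 8)
                    | (data[i + 3] & 0xFF);
            if (!Character.isValidCodePoint(codePoint)) {
                throw new IllegalArgumentException(
                        "Invalid code point at byte offset " + i);
            }
            // Emits a surrogate pair for code points above U+FFFF.
            out.appendCodePoint(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 0x3C, 0, 0, 0x04, 0x16};
        System.out.println(decodeBigEndian(data).length()); // prints 2
    }
}
```

A UCS-4 Writer would do the reverse: take each code point of the response text
and emit it as four bytes in the chosen byte order.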
OK, the whole thing needs some serious testing (including performance and
possible synchronization issues). I can only guarantee that it works for
the specific tasks I tested it with.
I'm always open to questions, though I do not always have Internet access.