Greetings again, all.
Sorry for my previous incomplete posting. It seems that pressing
F2 to save program code turns out to be a very bad habit when
editing mail messages with TheBat!
So I'll continue.
After spending some time trying to get the GeoServer logging system
to differentiate between individual loggers, I realized that this is too
complicated a matter for me and started putting good old
System.out.println() calls into every suspicious part of the source code.
Eventually I found the offending section in WfsDispatcher's
doPost() method.
It uses the FileWriter and then FileReader classes to save the incoming
XML into a temporary file and then read it back. Examining the Java API
specification, I learned that these two classes are suitable when
"the default character encoding and the default byte-buffer size
are acceptable". In my case that assumption about the default character
encoding led to sad results. It looks like the non-US-ASCII content of
my XML file was mangled during either file writing or reading,
or maybe even both. Strangely, this behaved similarly on both
Linux and my Windows machine, though on the latter the default encoding
is "windows-1251".
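To illustrate the kind of damage described above, here is a minimal sketch. It is not GeoServer code: the class name EncodingRoundTrip and the sample string are made up, it assumes a modern Java (java.nio.charset.StandardCharsets), and US-ASCII merely stands in for a platform default encoding that cannot represent the request's characters. FileWriter and FileReader perform the same encode and decode steps, only with whatever the platform default happens to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    /** Encodes text with one charset and decodes it with another, like a write-then-read of a temp file. */
    static String roundTrip(String text, Charset writeCs, Charset readCs) {
        byte[] onDisk = text.getBytes(writeCs); // what a Writer would put into the file
        return new String(onDisk, readCs);      // what a Reader would hand back
    }

    public static void main(String[] args) {
        String city = "\u041c\u043e\u0441\u043a\u0432\u0430"; // six Cyrillic letters, written as escapes

        // Same capable charset on both sides: the text survives.
        System.out.println(city.equals(
                roundTrip(city, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Charset that cannot represent Cyrillic: every letter is replaced on write.
        System.out.println(roundTrip(city, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII));

        // Mismatched charsets: each 2-byte UTF-8 sequence is read back as two characters.
        System.out.println(
                roundTrip(city, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length());
    }
}
```

Running main prints true, then ??????, then 12. Copying raw bytes to the temporary file and decoding once with a known encoding avoids both failure modes.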
So I returned to the Java documentation and was lucky enough to
find a way around the problem. The changes I made in doPost()'s code are
below:
Was:
    BufferedReader tempReader = new BufferedReader(request.getReader());
    // REVISIT: Should do more than sequence here
    // (In case we are running two GeoServers at once)
    // - Could we use response.getHandle() in the filename?
    // - ProcessID is traditional, I don't know how to find that in Java
    sequence++;
    temp = File.createTempFile("wfsdispatch" + sequence, "tmp");
    FileWriter out = new FileWriter(temp);
    int c;
    while ((c = tempReader.read()) != -1) {
        out.write(c);
    }
    tempReader.close();
    out.close();
    BufferedReader disReader = new BufferedReader(new FileReader(temp));
    BufferedReader requestReader = new BufferedReader(new FileReader(
            temp));
Became:
    InputStream is = new BufferedInputStream(request.getInputStream());
    // REVISIT: Should do more than sequence here
    // (In case we are running two GeoServers at once)
    // - Could we use response.getHandle() in the filename?
    // - ProcessID is traditional, I don't know how to find that in Java
    sequence++;
    temp = File.createTempFile("wfsdispatch" + sequence, "tmp");
    BufferedOutputStream out = new BufferedOutputStream(
            new FileOutputStream(temp));
    int c;
    while (-1 != (c = is.read())) {
        out.write(c);
    }
    is.close();
    out.flush();
    out.close();
    String req_enc = guessRequestEncoding(request);
    BufferedReader disReader = new BufferedReader(
            new InputStreamReader(
                    new FileInputStream(temp), req_enc));
    BufferedReader requestReader = new BufferedReader(
            new InputStreamReader(
                    new FileInputStream(temp), req_enc));
Here `guessRequestEncoding()` is a convenience method that resolves the
character encoding of the XML markup contained in the incoming
request:
    protected String guessRequestEncoding(HttpServletRequest request) {
        String def_enc = "UTF-8";
        String enc = getXmlEncoding();
        if (null == enc) {
            enc = request.getHeader("Content-Type");
            if (null == enc) {
                enc = def_enc;
            } else {
                if (-1 == enc.indexOf("=")) {
                    enc = def_enc;
                } else {
                    enc = enc.substring(enc.lastIndexOf("=") + 1).trim();
                }
            }
        }
        return enc;
    }

    protected String getXmlEncoding() {
        try {
            StringWriter sw = new StringWriter(60);
            BufferedReader in = new BufferedReader(new FileReader(temp));
            int c;
            while ((-1 != (c = in.read())) && (0x3E != c)) {
                sw.write(c);
            }
            in.close();
            Pattern p = Pattern.compile("encoding\\s*\\=\\s*\"([^\"]+)\"");
            Matcher m = p.matcher(sw.toString());
            if (m.find()) {
                return m.toMatchResult().group(1);
            } else {
                return null;
            }
        } catch (IOException e) {
            return null;
        }
    }
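The Content-Type fallback in `guessRequestEncoding()` takes whatever follows the last "=" in the header, so a header such as "text/xml; charset=UTF-8; boundary=x" or a quoted charset value would give the wrong result. A slightly more careful parse might look like the sketch below. The names ContentTypeCharset and charsetOf are invented for this illustration and are not part of GeoServer, and the sketch does not handle semicolons inside quoted parameter values. The Servlet API's own request.getCharacterEncoding() may be a simpler route to the same information.

```java
import java.util.Locale;

final class ContentTypeCharset {
    /** Returns the charset parameter of a Content-Type header value, or fallback if absent. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself (e.g. "text/xml"); parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue; // some other parameter, e.g. boundary
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes: charset="utf-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? fallback : value;
        }
        return fallback;
    }
}
```

With this helper, the else branch of `guessRequestEncoding()` could shrink to a single call such as charsetOf(request.getHeader("Content-Type"), def_enc).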
After the above changes everything works perfectly on both Linux and
Windows. I'm almost happy.
However, I have some notes concerning that code:
1. It looks like changing BufferedReader to BufferedInputStream at the
beginning was not really necessary, as BufferedReader itself doesn't
perform any codepage conversions. But on the other hand, streams are
just more reliable from my point of view. You get the data exactly
as it comes.
2. Both the `guessRequestEncoding()` and `getXmlEncoding()` functions
look pretty ugly even to me. But that is the best I can do with my
current level of Java knowledge.
3. Placing the "Content-Type" header check before reading the XML
declaration should be faster, but I think the encoding specified in
the declaration is simply more reliable.
4. Is there any less ugly way to extract the encoding info from the
incoming XML data? `getXmlEncoding()` looks like a pregnant mammoth.
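For what it's worth, one possible direction for point 4 is sketched below. It is only an illustration under stated assumptions: the class name XmlDeclSniffer, the method name declaredEncoding and the 256-byte limit are all made up, and it assumes a modern Java. It reads the raw bytes as ISO-8859-1 instead of relying on the platform default, accepts both quote styles, and only looks inside a leading XML declaration. It does not handle a byte order mark or UTF-16 input.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlDeclSniffer {
    // Matches encoding="..." or encoding='...' only before the first '>' of a leading <?xml ...?>.
    private static final Pattern ENCODING = Pattern.compile(
            "^\\s*<\\?xml[^>]*?\\bencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the encoding declared within the first len bytes, or null if there is none. */
    static String declaredEncoding(byte[] head, int len) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding cannot fail
        // and the ASCII-only declaration comes through unchanged.
        String prefix = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(2) : null;
    }

    /** Reads at most 256 bytes from the stream and looks for the declaration there. */
    static String declaredEncoding(InputStream in) throws IOException {
        byte[] head = new byte[256];
        int n = 0;
        int r;
        while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
            n += r;
        }
        return declaredEncoding(head, n);
    }
}
```

The stream variant could be called on a fresh FileInputStream over the temporary file, in place of the FileReader loop in `getXmlEncoding()`.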
-- Best regards,
Artie Konin mailto:a-thor@anonymised.com