Yes, character sets should have been unified from the beginning of the computer age…
Codepages are essentially an IBM PC / Windows thing, as the Wikipedia article describes. Fairly complete lists are available among its external links, e.g. http://msdn.microsoft.com/en-us/library/ms776446.aspx.
For us the important issue is being able to use the shapefiles correctly in different software packages. To my knowledge, the codepage file (.cpg) is read by all ESRI software and by many other programs as well, and it can easily be updated if needed…
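Because the .cpg file is just a small text file sitting next to the .shp/.dbf with the same base name, producing and reading it is straightforward. The sketch below assumes the file contains a charset name such as "UTF-8" (a value commonly seen in .cpg files); `CpgSidecar` and its methods are hypothetical names, not GeoServer or ESRI API. A .cpg holding a bare number like "1252" would make `Charset.forName` throw, so it would still need a codepage-to-charset mapping.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes/reads the .cpg sidecar of a shapefile.
final class CpgSidecar {

    /** Returns the .cpg path with the same base name as the given .shp/.dbf file. */
    static Path cpgPathFor(Path shapefile) {
        String name = shapefile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot >= 0 ? name.substring(0, dot) : name;
        return shapefile.resolveSibling(base + ".cpg");
    }

    /** Writes the charset's canonical name (e.g. "UTF-8") as the whole .cpg content. */
    static void writeCpg(Path shapefile, Charset charset) throws IOException {
        Files.write(cpgPathFor(shapefile), charset.name().getBytes(StandardCharsets.US_ASCII));
    }

    /** Reads the .cpg back; throws if the content is not a charset name Java knows. */
    static Charset readCpg(Path shapefile) throws IOException {
        byte[] raw = Files.readAllBytes(cpgPathFor(shapefile));
        return Charset.forName(new String(raw, StandardCharsets.US_ASCII).trim());
    }
}
```

Updating the encoding later is then just a matter of rewriting that one line of text.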
Best Regards
Andreas Oxenstierna
Direct phone 040-16 70 17 Mobile 0734-12 80 17 andreas.oxenstierna@anonymised.com
SWECO Position AB
Hans Michelsensgatan 2 Box 286 201 22 Malmö Phone 040-16 70 00 www.sweco.se
-----Original Message-----
From: Andrea Aime [mailto:aaime@anonymised.com]
Sent: 19 November 2008 10:14
To: Oxenstierna Andreas
Cc: Andrea Aime (JIRA); geoserver-devel@lists.sourceforge.net
Subject: Re: [Geoserver-devel] [jira] Created: (GEOS-2399) Need a way to specify the encoding of shapefiles generated with SHAPE-ZIP output format
Oxenstierna Andreas wrote:
> Great enhancement for all non-A-Z languages.
> How will the encoding be stored in the DBF file?
> ESRI has two ways of doing this: either storing the LDID in the DBF header
> or creating a text file (.cpg) which stores the codepage.
> See
> http://support.esri.com/index.cfm?fa=knowledgebase.techarticles.articleShow&d=26015
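The LDID option lives in byte 29 of the 32-byte dBASE header (the "language driver ID", also called the code page mark). A minimal sketch of reading it, assuming a partial lookup table taken from the common dBASE/FoxPro convention; that table is an assumption to verify against ESRI's own list, and `DbfLanguageDriver` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper: reads the LDID byte from a DBF header.
final class DbfLanguageDriver {

    // Offset of the language driver ID in the DBF file header.
    static final int LDID_OFFSET = 29;

    // Partial LDID -> Windows codepage table (assumed dBASE/FoxPro values, not exhaustive).
    static final Map<Integer, Integer> LDID_TO_CODEPAGE = Map.of(
            0x01, 437,   // U.S. MS-DOS
            0x02, 850,   // International MS-DOS
            0x03, 1252,  // Windows ANSI
            0xC8, 1250,  // Eastern European Windows
            0xC9, 1251); // Russian Windows

    /** Returns the raw LDID byte as an unsigned value (0-255). */
    static int readLdid(byte[] header) {
        if (header.length <= LDID_OFFSET) {
            throw new IllegalArgumentException("DBF header too short");
        }
        return header[LDID_OFFSET] & 0xFF;
    }

    /** Maps the LDID to a codepage; empty when it is 0 (unset) or not in the table. */
    static OptionalInt codePageOf(byte[] header) {
        Integer cp = LDID_TO_CODEPAGE.get(readLdid(header));
        return cp == null ? OptionalInt.empty() : OptionalInt.of(cp);
    }

    /** Reads the first 32 bytes (the fixed part of the header) of a .dbf file. */
    static byte[] readHeader(Path dbf) throws IOException {
        try (InputStream in = Files.newInputStream(dbf)) {
            return in.readNBytes(32);
        }
    }
}
```

Writing works the same way in reverse: set that single byte when the header is generated, which only helps for codepages that actually have an LDID value.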
Hmm, I'm not sure we can use either of these… In particular, Java has no notion of what a codepage is; it only knows about Locale and Charset, and both basically go with the standard encoding names such as ISO-8859-xx or the UTF-8/16/32 family.
For reading shapefiles with non-ASCII characters we already let the user specify the encoding that way, and for writing we would do the same, but I don't know how to turn a java.nio.Charset into a codepage number.
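For the most common cases the Charset's canonical name already carries the number, so a partial conversion can be sketched by parsing it. This assumes the Windows codepage identifiers from the MSDN list (65001 for UTF-8, 2859x for ISO-8859-x) and JDK canonical names such as "windows-1252" and "IBM437"; `CharsetToCodePage` is a hypothetical name, and anything it does not recognise comes back empty.

```java
import java.nio.charset.Charset;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: best-effort Charset -> Windows codepage number.
final class CharsetToCodePage {

    private static final Pattern WINDOWS = Pattern.compile("windows-(\\d+)");
    private static final Pattern IBM = Pattern.compile("(?:x-)?IBM(\\d+)");
    private static final Pattern ISO_8859 = Pattern.compile("ISO-8859-([1-9])");

    static OptionalInt codePageOf(Charset cs) {
        String name = cs.name(); // canonical name, e.g. "windows-1252", "IBM437", "UTF-8"
        if (name.equals("UTF-8")) {
            return OptionalInt.of(65001);
        }
        Matcher m = WINDOWS.matcher(name);
        if (m.matches()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        m = IBM.matcher(name);
        if (m.matches()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        m = ISO_8859.matcher(name);
        if (m.matches()) {
            // ISO-8859-1 is 28591, ISO-8859-2 is 28592, and so on up to 9.
            return OptionalInt.of(28590 + Integer.parseInt(m.group(1)));
        }
        return OptionalInt.empty(); // e.g. UTF-16: no single codepage number handled here
    }
}
```

Java also accepts aliases like "Cp1252", which resolve to the canonical "windows-1252", so user-supplied names in either form end up in the same branch.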
Quickly looking around with Google, I've found this library (http://cpdetector.sourceforge.net/) that does the opposite: it guesses the encoding from the file contents. It's called Code Page Detector, but it actually returns a java.nio.Charset.
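To give an idea of what content-based guessing involves, here is a much simpler heuristic than a real detector (this is not cpdetector's API or algorithm): try a strict UTF-8 decode and fall back to a caller-supplied charset if it fails. `EncodingGuess` is a hypothetical name; note that pure ASCII data always passes the UTF-8 check, so it cannot tell ASCII-only files apart.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: naive "is it UTF-8?" check, not a full encoding detector.
final class EncodingGuess {

    /** True if the bytes decode as UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns UTF-8 when the bytes are valid UTF-8, otherwise the given fallback. */
    static Charset guess(byte[] bytes, Charset fallback) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : fallback;
    }
}
```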
Looking further, I've found this post
(http://forums.sun.com/thread.jspa?messageID=10372122) where someone states that the codepage concept is not supported by Java, as it's something Windows-specific.
There was some discussion about codepage support in OGR, not sure how it turned out:
http://article.gmane.org/gmane.comp.gis.gdal.devel/8710
So it seems that to pull this off we'd first need to build a conversion table from codepages to encodings, provided that is even possible.
That seems like quite a bit of long, boring work…
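Part of such a table can be built by probing the JVM with naming patterns instead of listing every entry by hand. The sketch below assumes the same conventions as above (65001 for UTF-8, 28591-28599 for ISO-8859-1..9) and simply asks `Charset.isSupported` about "windows-N", "IBMN", "x-IBMN" and "CpN"; `CodePageToCharset` is a hypothetical name, and which IBM charsets are found depends on the JDK installation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: best-effort Windows codepage number -> Charset.
final class CodePageToCharset {

    static Optional<Charset> forCodePage(int codePage) {
        if (codePage == 65001) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (codePage >= 28591 && codePage <= 28599) {
            return lookup("ISO-8859-" + (codePage - 28590));
        }
        // Try the JDK naming patterns in order; the first supported name wins.
        String[] candidates = {
            "windows-" + codePage, "IBM" + codePage, "x-IBM" + codePage, "Cp" + codePage
        };
        for (String candidate : candidates) {
            Optional<Charset> cs = lookup(candidate);
            if (cs.isPresent()) {
                return cs;
            }
        }
        return Optional.empty(); // unknown to this JVM: the table needs a manual entry
    }

    private static Optional<Charset> lookup(String name) {
        return Charset.isSupported(name) ? Optional.of(Charset.forName(name)) : Optional.empty();
    }
}
```

The codepages that fall through to the empty result are the ones that would still need the hand-written part of the table.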
Cheers
Andrea
PS: more info about code pages here:
http://en.wikipedia.org/wiki/Code_page
--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.