Hi,
today my mail client sent me a notification about
http://jira.codehaus.org/browse/GEOS-3345, which is about updating
the EPSG database.
I've actually been working on that in my spare time for
the last couple of months.
The current EPSG database is held in an HSQL database and, as
the JIRA issue states, it's old: it does not contain the official
Google projection code, nor a few hundred other new codes
(7.1 contains over 4000 codes).
My first attempt was to upgrade the HSQL version, but it failed,
as HSQL does not support UTF-8 characters, and those are now used
in some fields the referencing subsystem relies on (axis orientation,
unit of measure and the like; supporting them also required various
changes in the referencing subsystem, which happily are
already done).
I then started looking into making an H2 version of the
database: H2 supports UTF-8 just fine and we're already using
it in other places. This resulted in the gt-epsg-h2 unsupported
module, which contains version 7.1 of the database.
The H2 version has two significant downsides, though:
- it's untested, though I've heard Jody tested it against
  uDig
- the H2 version of the database uses almost 40MB instead
  of the 9-10MB of the HSQL version. This is a known issue
  with H2: the author is working on a more compact storage
  engine, but the work is incomplete
So if we want to upgrade now, we'll have to live with
both of the above issues, and we're already in RC; that's
why I'm asking publicly on the list.
Other reasons why I haven't been pushing the H2 version
are that it's still not that great from other points of view,
though I have plans to address each of them:
- as with the HSQL version, if you kill the process while it's
  creating the database on disk you'll get a ruined database
  that will prevent GS from working at all. Subsequent restarts
  will just fail, and the only way out is to locate and wipe out
  the directory containing the EPSG database (which is in the
  current "temp" directory, a different place depending on the OS
  and user configuration)
- as with the HSQL version, the database is created
  by running SQL statements, which takes over 30 seconds on a
  relatively recent machine, making the above risk all the
  more likely
- there is no way to make the database work without unpacking
  it on the file system. H2 allows keeping the database in
  a ZIP file, but only outside of the classpath; I've also
  tested it, and it's very slow (5-10 times slower)
  if you have to scan the entire database, which is something
  we do often when we have to guess the official EPSG code
  from a random PRJ file.
Before proposing a switch to H2 I also wanted to solve
the first two of the above issues.
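For illustration only (this is not the plan mentioned above, just one common technique), the ruined-database problem could be avoided by building the database in a staging directory next to the final one and renaming it into place only once creation has completed. A process killed mid-creation then leaves behind only a stale staging directory, never a half-built database at the final location, so the 30-second window stops being dangerous even though it is no faster. `AtomicDbSetup`, `ensureDatabase` and `DatabaseBuilder` are hypothetical names, and the sketch assumes the staging and final directories sit on the same file system so that `Files.move` with `ATOMIC_MOVE` can work as a plain rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AtomicDbSetup {

    /** Hypothetical hook that fills a directory with the database files (e.g. by running the EPSG SQL scripts). */
    public interface DatabaseBuilder {
        void build(Path targetDir) throws IOException;
    }

    /**
     * Returns finalDir, building the database first if it is not there yet.
     * The build happens in a staging directory that is renamed into place only
     * after it completes, so an interrupted build never leaves a partial
     * database at finalDir.
     */
    public static Path ensureDatabase(Path finalDir, DatabaseBuilder builder) throws IOException {
        if (Files.isDirectory(finalDir)) {
            return finalDir; // complete database left by a previous run
        }
        Path parent = finalDir.toAbsolutePath().getParent();
        Files.createDirectories(parent);
        // Same parent directory, so the final move is a rename on one file system
        Path staging = Files.createTempDirectory(parent, finalDir.getFileName() + ".tmp");
        try {
            builder.build(staging);
            Files.move(staging, finalDir, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // Only still present if the build or the move failed
            if (Files.exists(staging)) {
                deleteRecursively(staging);
            }
        }
        return finalDir;
    }

    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Deepest entries first, so each directory is empty when deleted
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

If the JVM is killed outright, the `finally` block never runs and a `*.tmp` staging directory stays behind; unlike a half-built database it does not block the next startup, and it could be swept at boot.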
Long story short, it's not the kind of upgrade I would take
lightly.
I'm hesitant to do the upgrade now, but I'm not opposed
either. If we could at least solve the two critical database
creation issues above, we'd have enough of a reward to
make the risk acceptable.
Cheers
Andrea
--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.