[Geoserver-devel] Updating the EPSG database

Hi,
the mail client today dropped me a notification about
http://jira.codehaus.org/browse/GEOS-3345, that is, updating
the EPSG database.

I've actually been working on that in my spare time for
the last couple of months.

The current EPSG database is held in an HSQL database and, as
the Jira issue states, it's old: it lacks the official
Google projection code as well as a few hundred other new codes
(7.1 contains over 4000 codes).

The first attempt was to upgrade the HSQL version, but it failed,
as HSQL does not support UTF-8 chars and those are now used
in some fields the referencing subsystem uses (axis orientation,
unit of measure and the like, which also required various
changes in the referencing subsystem; happily those are
already done).

I've then started looking into making an H2 version of the
database; H2 supports UTF-8 just fine and we're already using
it in other places. This resulted in the unsupported gt-epsg-h2
module, which contains version 7.1 of the database.

The H2 version has two significant downsides though:
- untested, though I've heard Jody tested it against
   uDig
- the H2 version of the database uses almost 40MB instead
   of the 9-10 of the HSQL version. This is a known issue
   with H2; the author is working on a more compact storage
   engine, but the work is incomplete

So if we want to upgrade we're going to live with
both of the above issues, and we're already in RC, that's
why I'm publicly asking on the list.

Other reasons why I haven't been pushing on the H2 version
are that it's still not that great from other points of view,
though I have plans to address each of them:
- as with the HSQL version, if you kill the process while it's
   creating the database on disk you'll get a ruined database
   that will prevent GS from working at all. Subsequent restarts
   will just fail, and the only way out is to locate and wipe out
   the directory containing the EPSG database (which is in the
   current "temp", a different place depending on the OS and
   user configs)
- as with the HSQL version, the database creation is done
   by running SQL statements, and takes over 30 seconds on a
   relatively recent machine, making the above risk all the
   more likely.
- there is no way to make the database work without unpacking
   it on the file system. H2 allows keeping the database in
   a ZIP, but still only outside of the classpath. I've also
   tested it and it's very slow (5-10 times slower)
   if you have to scan the entire database,
   which is something we do often when we
   have to guess the official EPSG code out of a random
   PRJ file.

Before proposing a switch to H2 I also wanted to solve
the first two issues above.
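
For illustration, one common way to avoid the ruined-database problem described above is to build the database in a scratch directory and rename it into place only once it is complete. This is a minimal sketch in Python rather than GeoServer's Java, and it is an assumption about a possible fix, not the change actually made; `create_database_atomically` and the `build` callback are hypothetical names.

```python
import os
import shutil
import tempfile


def create_database_atomically(target_dir, build):
    """Build the database in a scratch directory, then rename it into place.

    `build` is a hypothetical callable that writes all database files into
    the directory it is given (for example by running the SQL script).
    """
    if os.path.isdir(target_dir):
        return target_dir  # a complete database is already in place

    parent = os.path.dirname(os.path.abspath(target_dir))
    os.makedirs(parent, exist_ok=True)
    # Scratch directory next to the target, so the rename stays on one file system.
    scratch = tempfile.mkdtemp(prefix="epsg-build-", dir=parent)
    try:
        build(scratch)
        os.rename(scratch, target_dir)  # single rename, atomic on POSIX file systems
    except BaseException:
        shutil.rmtree(scratch, ignore_errors=True)
        raise
    return target_dir
```

If the process is killed during `build`, the target directory simply does not exist yet, so the next start rebuilds it instead of failing on a partial database. A leftover scratch directory may remain and would need occasional cleanup.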

Long story short, it's not the kind of upgrade I would take
lightly.
I'm hesitant to do the upgrade now, but I'm not opposed
either. If we could solve the two critical database
creation issues above at least we'd have reward enough to
make the risk acceptable.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Hi Andrea:

The H2 version has two significant downsides though:
- untested, though I've heard Jody tested it against uDig

This is true; the transition was fine (although the database size and the delay when creating it the first time caused some trouble). Performance seems fine.

- the H2 version of the database uses almost 40MB instead
  of the 9-10 of the HSQL version. This is a known issue
  with H2, the author is working on a more compact storage
  engine, but the work is incomplete

So if we want to upgrade we're going to live with
both of the above issues, and we're already in RC, that's
why I'm publicly asking on the list.

I am not too worried about disk use once we are running (it is the delay that was tough). The actual script is about 800k, which is the same as for HSQL.

- as with the HSQL version, if you kill the process while it's
  creating the database on disk you'll get a ruined database
  that will prevent GS from working at all. Subsequent restarts
  will just fail, and the only way out is to locate and wipe out
  the directory containing the EPSG database (which is in the
  current "temp", a different place depending on the OS and
  user configs)

I thought I saw a "table check" and if that failed it would try creating the database again?
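
A "table check" of the kind mentioned here could look roughly like the sketch below. It uses Python's bundled `sqlite3` purely as a stand-in for HSQL or H2, and `is_usable`, `open_or_rebuild` and the probe table name are hypothetical; it is not the check that exists in the code base.

```python
import os
import sqlite3


def is_usable(path, probe_table):
    """Return True if the database file opens and the probe table can be read."""
    if not os.path.exists(path):
        return False
    conn = sqlite3.connect(path)
    try:
        conn.execute(f'SELECT 1 FROM "{probe_table}" LIMIT 1')
        return True
    except sqlite3.DatabaseError:
        # Covers both a corrupt file and a missing table.
        return False
    finally:
        conn.close()


def open_or_rebuild(path, script, probe_table):
    """Open the database, recreating it from `script` if the probe fails."""
    if not is_usable(path, probe_table):
        if os.path.exists(path):
            os.remove(path)  # wipe the ruined copy before rebuilding
        conn = sqlite3.connect(path)
        conn.executescript(script)
        conn.commit()
        conn.close()
    return sqlite3.connect(path)
```

A probe like this only catches an interrupted creation if the probed table is the last thing the script creates; otherwise a half-loaded database can still pass.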

Before proposing a switch to H2 I also wanted to solve
the first two issues above.

Long story short, it's not the kind of upgrade I would take lightly.
I'm hesitant to do the upgrade now, but I'm not opposed
either. If we could solve the two critical database
creation issues above at least we'd have reward enough to
make the risk acceptable.

The reward for uDig was killing the HSQL dependency. What about your idea of unzipping a copy of the database (rather than running the script, or executing out of a jar)?

Jody

On Wed, Aug 19, 2009 at 3:24 AM, Andrea Aime<aaime@anonymised.com> wrote:

The first attempt was to upgrade the HSQL version, but it failed,
as HSQL does not support UTF-8 chars and those are now used
in some fields the referencing subsystem uses (axis orientation,
unit of measure and the like, which also required various
changes in the referencing subsystem; happily those are
already done).

Would filtering the EPSG database dump through iconv before loading
it into HSQL be a temporary fix? Probably most of the UTF-8 characters are
just LATIN1 anyways. Then you could stick w/ HSQL, the Devil-we-know.

P
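
The iconv idea amounts to transcoding the dump from UTF-8 to ISO-8859-1 (on the command line, `iconv -f UTF-8 -t ISO-8859-1`). A rough Python equivalent is sketched below; `utf8_to_latin1` is a hypothetical helper, and it also reports any characters that do not fit in Latin-1, which tests the "most of them are just LATIN1" guess.

```python
def utf8_to_latin1(data: bytes):
    """Transcode UTF-8 bytes to ISO-8859-1 and report what does not fit.

    Returns the converted bytes and a sorted list of the characters that
    have no Latin-1 equivalent (those are replaced by "?").
    """
    text = data.decode("utf-8")
    # Latin-1 covers exactly the code points 0x00 to 0xFF.
    unmappable = sorted({ch for ch in text if ord(ch) > 0xFF})
    converted = text.encode("latin-1", errors="replace")
    return converted, unmappable


converted, unmappable = utf8_to_latin1("45\u00b0 north".encode("utf-8"))
# converted == b"45\xb0 north", unmappable == []
```

An empty `unmappable` list for the whole dump would mean the conversion is lossless.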

Paul Ramsey wrote:

Would filtering the EPSG database dump through iconv before loading
it into HSQL be a temporary fix? Probably most of the UTF-8 characters are
just LATIN1 anyways. Then you could stick w/ HSQL, the Devil-we-know.

Hmmm... I've tried and failed; no matter what I did I could not
get the ° character into HSQL properly. I could try again though...
I did that on a Sunday some time ago, and I was not exactly
focused on making the best effort to keep the old database.

Cheers
Andrea

Odd, it's in both Win1252 and ISO8859-1 (0xB0)
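
The byte value is easy to confirm with Python's standard codecs. The helper names below are illustrative only; the second function shows one way a degree sign can end up garbled (UTF-8 bytes read back as ISO-8859-1), which is not necessarily what happened inside HSQL.

```python
def encodings_of(ch):
    """Return the bytes used for `ch` in the three encodings discussed here."""
    return {codec: ch.encode(codec) for codec in ("cp1252", "iso-8859-1", "utf-8")}


def misread_as_latin1(ch):
    """Decode the UTF-8 bytes of `ch` as ISO-8859-1, a typical mix-up."""
    return ch.encode("utf-8").decode("iso-8859-1")


DEGREE = "\u00b0"  # the degree sign
table = encodings_of(DEGREE)
# table["cp1252"] and table["iso-8859-1"] are both b"\xb0";
# table["utf-8"] is the two-byte sequence b"\xc2\xb0".
garbled = misread_as_latin1(DEGREE)  # two characters: U+00C2 followed by U+00B0
```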

Paul Ramsey wrote:

Odd, it's in both Win1252 and ISO8859-1 (0xB0)

Btw, for the record, it was the procedure for updating the
database that was mangling its contents. It suggested
loading the data from the official scripts and then dumping
the database to SQL in order to get a significantly smaller
SQL file (using COPY instead of INSERT).
Too bad HSQL mangled all non-ASCII chars irreparably
when doing the backup (what an effective backup...)

I completely changed the procedure and this step is no longer in the
cards, so ° is happily sitting in the database.

Cheers
Andrea

Not sure if anything has moved on this, but I am generally -1 on any such upgrade while in release candidate stage. But +1 for the upgrade in 2.0.1.

2c,

-Justin

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

Not sure if anything has moved on this, but I am generally -1 on any such upgrade while in release candidate stage. But +1 for the upgrade in 2.0.1.

The HSQL-based EPSG database has been upgraded to 7.1, and the procedure
that unpacks the db on the disk has been changed to be a lot faster and
more reliable.
I'm working on H2 in my spare time; first I need the H2 core developer
to land a patch I've been working on to keep the database in the
classpath.
The idea is to have both options: if for any reason the database
fails to unpack we fall back to the slower but safer in-classpath
version (quite a bit slower, 3-5 times depending
on the test).
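
The two-option idea can be sketched as a simple fallback. This is a Python outline under stated assumptions, not the planned Java code: `unpack_to_disk` and `open_from_classpath` are hypothetical callables standing in for the fast on-disk path and the slower packaged copy.

```python
import logging


def open_with_fallback(unpack_to_disk, open_from_classpath):
    """Prefer the fast on-disk database, fall back to the packaged copy.

    Returns the opened database together with a label saying which path
    was used, so callers can log or report the slower mode.
    """
    try:
        return unpack_to_disk(), "disk"
    except OSError as exc:
        # Unpacking failed (no write permission, disk full, and so on).
        logging.warning("EPSG database unpack failed, using packaged copy: %s", exc)
        return open_from_classpath(), "classpath"
```

Returning the label makes the slower mode visible to whoever is diagnosing slow lookups.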

Cheers
Andrea
