[Geoserver-devel] GeoServer now depending on sleepycat embedded db? Now three embedded databases in the classpath

Hi,
today I've noticed this startup error:

22 mar 10:14:43 WARN [mortbay.log] - Nested in
org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'gwcFacade' defined in URL
[file:/home/aaime/devel/git-gs/src/gwc/target/classes/geowebcache-geoserver-context.xml]:
Cannot resolve reference to bean 'DiskQuotaStore' while setting
constructor argument; nested exception is
org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'DiskQuotaStore' defined in URL
[file:/home/aaime/devel/git-gs/src/gwc/target/classes/geowebcache-diskquota-context.xml]:
Invocation of init method failed; nested exception is
java.lang.NoClassDefFoundError: com/sleepycat/je/EnvironmentConfig:
java.lang.ClassNotFoundException: com.sleepycat.je.EnvironmentConfig
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
  at org.geowebcache.diskquota.storage.EntityStoreBuilder.buildEntityStore(EntityStoreBuilder.java:37)
  at org.geowebcache.diskquota.storage.BDBQuotaStore.configure(BDBQuotaStore.java:133)
  at org.geowebcache.diskquota.storage.BDBQuotaStore.startUp(BDBQuotaStore.java:97)
  at org.geowebcache.diskquota.storage.BDBQuotaStore.afterPropertiesSet(BDBQuotaStore.java:87)

The error per se was quickly solved by updating my Eclipse project.
But I'm wondering about this new dependency... it's one of the biggest
jars we have around, weighing 2.1 MB,
plus we already have two other embedded databases in the
classpath, h2 and hsql
(I should really look into moving GS to the h2 version of the EPSG
database btw), so why add another?
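
For anyone hitting the same trace: a ClassNotFoundException like the one above just means the jar is not visible to the class loader. A minimal sketch of a check for that, assuming nothing beyond the JDK (ClasspathProbe and isOnClasspath are made-up names for illustration, not part of GeoServer or GeoWebCache):

```java
// Hypothetical helper for illustration only; not GeoServer or GeoWebCache code.
final class ClasspathProbe {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above
        String name = "com.sleepycat.je.EnvironmentConfig";
        System.out.println(name + (isOnClasspath(name) ? " found" : " missing"));
    }
}
```

If it reports the class as missing inside the IDE but the Maven build works, the IDE project is probably just out of date with the pom.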

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 333 8128928

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

On Tue, 2011-03-22 at 10:23 +0100, Andrea Aime wrote:

> Hi,
> today I've noticed this startup error:
>
> 22 mar 10:14:43 WARN [mortbay.log] - Nested in
> org.springframework.beans.factory.BeanCreationException: Error
> The error per se was quickly solved by updating my Eclipse project.
> But I'm wondering about this new dependency... it's one of the biggest
> jars we have around, weighing 2.1 MB,
> plus we already have two other embedded databases in the
> classpath, h2 and hsql
> (I should really look into moving GS to the h2 version of the EPSG
> database btw), so why add another?

Because it's better suited to the task at hand for GWC: less overhead,
better performance, a direct object model on top of a kvp database, and
complete control over memory usage. Plus it supports HA configurations,
and we will use it next to support clustered environments with a single
jar, in-process database (remember having to disable the metastore and
get rid of the diskquota module if you wanted to deploy gwc
pseudo-clustered?). It's meant to complement hazelcast to
support actual clustering in the near future, while still being able to
enforce disk quota and cache layers with parameter filters (as in
caching non default styles) without the extra overhead of configuring an
out of process, possibly replicated database.

It's worth the 2M jar in my opinion, at least given those great plans for
the next versions of gwc.

Cheers,
Gabriel

--
Gabriel Roldan
groldan@anonymised.com
Expert service straight from the developers

On Tue, Mar 22, 2011 at 11:14 AM, Gabriel Roldán <groldan@anonymised.com> wrote:

> Because it's better suited to the task at hand for GWC: less overhead,
> better performance, a direct object model on top of a kvp database, and
> complete control over memory usage. Plus it supports HA configurations,
> and we will use it next to support clustered environments with a single
> jar, in-process database (remember having to disable the metastore and
> get rid of the diskquota module if you wanted to deploy gwc
> pseudo-clustered?). It's meant to complement hazelcast to
> support actual clustering in the near future, while still being able to
> enforce disk quota and cache layers with parameter filters (as in
> caching non default styles) without the extra overhead of configuring an
> out of process, possibly replicated database.
>
> It's worth the 2M jar in my opinion, at least given those great plans for
> the next versions of gwc.

I don't doubt the extra capabilities; still, having three separate databases
around worries me.
h2 usage is hard coded in the kml superoverlay system, hsql we use
for the EPSG database, and now sleepycat (aka Oracle Berkeley DB Java Edition)
in gwc... what a mess...

If the db is so great I would not mind migrating everything to it; what bothers
me is having that many dbs basically doing the same job in the classpath.

Cheers
Andrea

On Tue, 2011-03-22 at 11:50 +0100, Andrea Aime wrote:

> I don't doubt the extra capabilities; still, having three separate databases
> around worries me.
> h2 usage is hard coded in the kml superoverlay system, hsql we use
> for the EPSG database, and now sleepycat (aka Oracle Berkeley DB Java Edition)
> in gwc... what a mess...
>
> If the db is so great I would not mind migrating everything to it; what bothers
> me is having that many dbs basically doing the same job in the classpath.

I guess it depends on what you call the same job. Serving static
content, as in the EPSG database case, doesn't seem to be _the_ same
job as updating usage statistics every time a tile (out of potentially
many millions) is stored or served.
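
To make the difference in workload concrete, here is a rough sketch of the access pattern described above: one small keyed write per tile request, and point lookups by key. It uses a plain in-memory map as a stand-in for the key-value store, and it is not the actual GWC disk quota code; TileUsageStats, recordHit, hits and the page id format are all assumptions made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration; not GeoWebCache's actual disk quota code.
final class TileUsageStats {

    // Key: an assumed "layer/gridset/zoom" page id; value: hit counter.
    // The in-memory map stands in for an embedded key-value database.
    private final Map<String, LongAdder> hitsByPage = new ConcurrentHashMap<>();

    /** Called on every tile request: one small write keyed by page id. */
    void recordHit(String pageId) {
        hitsByPage.computeIfAbsent(pageId, k -> new LongAdder()).increment();
    }

    /** Point lookup by key, the only kind of read this sketch needs. */
    long hits(String pageId) {
        LongAdder counter = hitsByPage.get(pageId);
        return counter == null ? 0L : counter.sum();
    }

    public static void main(String[] args) {
        TileUsageStats stats = new TileUsageStats();
        // Simulate 1000 requests spread evenly over four zoom levels
        for (int i = 0; i < 1000; i++) {
            stats.recordHit("topp:states/EPSG:4326/" + (i % 4));
        }
        System.out.println(stats.hits("topp:states/EPSG:4326/0")); // prints 250
    }
}
```

Nothing here needs SQL or joins, whereas the EPSG case is read-only lookups against a fixed relational dataset.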

Not sure what you expect me to say other than we're all trying to do our
best with the resources we have. I won't be sarcastic and invite
you to join the gwc development team, because I know you're under high load
too.

It's a transitive dependency; if you don't want it, get rid of it, or
report back any problem and I'll be glad to tackle it as soon as
possible.

Getting rid of h2 is on my todo list GWC-wise. I can't really volunteer to
migrate the EPSG database. I may be able to take a look at the kml
superoverlay stuff sometime soon.

Cheers,
Gabriel

On Tue, Mar 22, 2011 at 12:07 PM, Gabriel Roldán <groldan@anonymised.com> wrote:

> I guess it depends on what you call the same job. Serving static
> content, as in the EPSG database case, doesn't seem to be _the_ same
> job as updating usage statistics every time a tile (out of potentially
> many millions) is stored or served.
>
> Not sure what you expect me to say other than we're all trying to do our
> best with the resources we have. I won't be sarcastic and invite
> you to join the gwc development team, because I know you're under high load
> too.

Sigh Gabriel, I did not expect you to say anything in particular, nor to do anything
right away.
I'm just saying that long term, having many embedded databases around is
messy and should imho be solved. I did not say we have to do it today.

Cheers
Andrea

On Tue, Mar 22, 2011 at 12:41 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

> Sigh Gabriel, I did not expect you to say anything in particular, nor to do anything
> right away.
> I'm just saying that long term, having many embedded databases around is
> messy and should imho be solved. I did not say we have to do it today.

Btw, I stand corrected on the duplicated work stuff: I did not know Berkeley
DB is actually a nosql database; it seems it does not even have a JDBC interface.

Its kvp structure could actually be amenable to use as storage for the
superoverlay stuff, but it could not be used to store the EPSG database unless
someone writes a new referencing authority from scratch.

Cheers
Andrea

On Tue, 2011-03-22 at 12:41 +0100, Andrea Aime wrote:

> Sigh Gabriel, I did not expect you to say anything in particular, nor to do anything
> right away.

Didn't mean to be annoying, I was just tired, it being 8am with no sleep, trying
to get all this in seamlessly.

> I'm just saying that long term, having many embedded databases around is
> messy and should imho be solved. I did not say we have to do it today.

Agreed, I don't like it much either. We should sit down sometime soon
and come up with a plan.

Cheers,
Gabriel
