[Geoserver-devel] GSIP 52 (dao refactor) review

Hi Justin (hey all),
here is the (in)famous file by file patch review :-p

General
--------------

* Most classes seem to miss the GeoServer license headers. This should
be addressed
  before committing
* How are Hibernate lazy initializations and proxies managed? As far
as I can see
* Following up on the synchronization blocks inside the catalog: yeah,
  we'd definitely need transactions if we want that to work fine also in memory.
  Hibernate catalog wise we could use an "open session in view"
approach and just
  have a servlet filter open the session and a transaction for us
without having to
  roll a catalog transaction api.
  The current setup, with transaction marked on the dao
implementations (via annotations)
  is not going to work fine, the synchronization in the catalog does
not give any guarantee
  that the changes are going to occur properly on the db. For example,
  if the network goes down during one of those synch blocks and just
one of the dao commands
  were sent away, well, bye bye, you'd get your database in an
inconsistent state.
  I'd say this one is the only really serious problem I've seen in the
review, and one that should
  be addressed with some urgency (if we go for a "open session in
view" approach it should not
  be too hard to address too).
* I see some code trying to handle a migration from file based to datbase based.
   Where is the migration happening? I see HibGeoServerLoader loading all of
   the information from the old catalog, but how is that stored back in the db?

Details details
--------------------

* The dbconfig pom seems to have a number of commented out dependencies.
  A quick cleanup seems in order.
* in AbstractHibDao.first(query, doWarn) the doWarn param is not really used
  to control exception reporting, an exception is always thrown
* In the same method there is some paranoid logging that was probably useful
  only to the developer during the building of the class? At least
it's odd to see it
  only there and not, for example, in list(query)
* HibPostLoadEventListener has an unused field, appContext
* FilterType uses XML encoding to store the filters. I'm wondering, what happens
  if we have an idfilter mixed with other filters? Official encoding
does not support
  that case, but we're good as long as we can round trip that case in
xml somehow.
  ECQL could be another candidate for filter storage (but it has problems with
  field names in other locales, which I guess is not the case for xml encoding)
* ClassMappings implementation looks odd to me.... why not use two fields and
  create a class mapping with a WORKSPACE(WorkspaceInfo.class,
WorkspaceInfoImpl.class)
  instead of using a raft of inner classes? Would seem simpler to
implement, easier to read,
  and use less bytecode in the end
* persistence.xml has a commented out block about the mapping file
* dbConfig own applicationContext.xml hard codes the references to a H2 database
  located in the data directory. Is this going to become configurable
so that one can
  point GeoServer to an external database without the need to unpack the jars,
  modify the files, pack them again and restart?
* in mappings.hbm.xml there is a ton of " <!--cache usage="read-write"/-->"
  What is going on there? :slight_smile:
  There is also a mappings.hbm.xml.orig which should not be there long term wise
* the github diff contains a "community/monitoring/GeoLiteCity.dat"
file that should
  not be there (btw, is that a geolocation database? :slight_smile: )
* The patch set contains a H2 dialect. As far as I can see here:
  http://opensource.atlassian.com/projects/hibernate/browse/HHH-3401
  in 3.3.2 the H2 dialect was finally fixed. Maybe this class is no
longer needed?

Closing up
----------------

Most of the feedback above is on secondary importance details, meaning the
overall setup is good :wink:

The current transaction management worries me, I'm not saying it has to be
improved right now but it should at least be put at some high priority
in the near
future, before we start advertising that we have a "rock solid" db storage for
catalog and configuration

Cheers
Andrea

-----------------------------------------------------
Ing. Andrea Aime
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------

Thanks for the review Andrea. Lots of great stuff. However much of the meat of your review has to do with the dbconfig module which is not actually included in this gsip. This gsip has only to do with a core refactoring of the catalog and config to even allow for a dbconfig to exists.

I apologize because i confused the issue by posting a link to the github repository which contains the dbconfig changes as well. That is also somewhat out of date. The patch that should be evaluated for this proposal is the one the jira issue.

http://jira.codehaus.org/secure/attachment/51804/GEOS-3806.patch

Again sorry for not making that clear. But it was not my intention to have the dbconfig module itself go through a formal review since as you have noted it is still rough around the edges. That said getting feedback now is great. And all that said I will attempt to address all your questions below. But for the terms of this proposal I ask that it only be evaluated based on the above patch. Which will then allow the dbconfig module to be added to community space and be incrementally improved upon based on all the feedback and conversation that has occurred thus far.

On Sun, Oct 24, 2010 at 1:46 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

Hi Justin (hey all),
here is the (in)famous file by file patch review :-p

General

  • Most classes seem to miss the GeoServer license headers. This should
    be addressed
    before committing

will do.

  • How are Hibernate lazy initializations and proxies managed? As far
    as I can see

Maybe you did not finish the thought here? Basically the way dbconfig does it (currently, and i am sure this will change) is that sessions are handled by a servlet filter using the OpenSessionInView pattern you mention. This allows us to use lazy initializations and have the ui work.

However, transactions are managed at the dao/facade level using springs hibernate facilities (LocalSessionFactoryBean) and declarative transaction management. Also the bean handles the opening of a session if one is not already open. This occurs only during system startup in which the dao is accessed but not as a result of a request.

  • Following up on the synchronization blocks inside the catalog: yeah,
    we’d definitely need transactions if we want that to work fine also in memory.
    Hibernate catalog wise we could use an “open session in view”
    approach and just
    have a servlet filter open the session and a transaction for us
    without having to
    roll a catalog transaction api.
    The current setup, with transaction marked on the dao
    implementations (via annotations)
    is not going to work fine, the synchronization in the catalog does
    not give any guarantee
    that the changes are going to occur properly on the db. For example,
    if the network goes down during one of those synch blocks and just
    one of the dao commands
    were sent away, well, bye bye, you’d get your database in an
    inconsistent state.
    I’d say this one is the only really serious problem I’ve seen in the
    review, and one that should
    be addressed with some urgency (if we go for a “open session in
    view” approach it should not
    be too hard to address too).

Agreed. I think that indeed moving transaction management up into the filter makes sense. Actually i had it implemented like that at one point but then moved to transaction mgmt around the dao itself to address issues during startup. But yeah, should be a relatively easy change since that filter is already in place.

  • I see some code trying to handle a migration from file based to datbase based.
    Where is the migration happening? I see HibGeoServerLoader loading all of
    the information from the old catalog, but how is that stored back in the db?

The way it works now is that the HibGeoserverLoader tracks whether or not an initial migration has happened, with a file marker. When it is not present the loader will actually trigger the classic loading of the file based config. Which will in turn call all the methods to populate the catalog and config, which does the initial population of the db.

On subsequent startups the file marker is recognized and no file based config is loaded. Which makes startup very quick as you can imagine :slight_smile:

Perhaps we want a better strategy for this. This was just a first cut that was relatively easy to pull off.

Details details

  • The dbconfig pom seems to have a number of commented out dependencies.
    A quick cleanup seems in order.
  • in AbstractHibDao.first(query, doWarn) the doWarn param is not really used
    to control exception reporting, an exception is always thrown
  • In the same method there is some paranoid logging that was probably useful
    only to the developer during the building of the class? At least
    it’s odd to see it only there and not, for example, in list(query)

I pretty took these as is and used them from the hibernate module that was there. But yeah, some cleanup is in order.

  • HibPostLoadEventListener has an unused field, appContext
  • FilterType uses XML encoding to store the filters. I’m wondering, what happens
    if we have an idfilter mixed with other filters? Official encoding
    does not support
    that case, but we’re good as long as we can round trip that case in
    xml somehow.
    ECQL could be another candidate for filter storage (but it has problems with
    field names in other locales, which I guess is not the case for xml encoding)

Have not tried that yet. But yeah if we can get the parser/encoder to round trip which i think it does or if it not will be possible then we should be good.

  • ClassMappings implementation looks odd to me… why not use two fields and
    create a class mapping with a WORKSPACE(WorkspaceInfo.class,
    WorkspaceInfoImpl.class)
    instead of using a raft of inner classes? Would seem simpler to
    implement, easier to read,
    and use less bytecode in the end

Yup, chalk this one up to inexperience with enums. For some reason i tried that but could not get the darn thing to compile so moved on. But going back it seems trivial… not sure what was up.

  • persistence.xml has a commented out block about the mapping file

In the end i decided to ditch going through jpa abstraction and work directly with the hibernate api. Since I am relatively new to hibernate working with jpa made it hard to figure out what was actually going on. So this file is unused.

  • dbConfig own applicationContext.xml hard codes the references to a H2 database
    located in the data directory. Is this going to become configurable
    so that one can
    point GeoServer to an external database without the need to unpack the jars,
    modify the files, pack them again and restart?

Indeed. The configuration you see is actually just the default. And the data source (which is a subclass) copies that out into the data directory so the user can edit the database parameters.

  • in mappings.hbm.xml there is a ton of " "
    What is going on there? :slight_smile:
    There is also a mappings.hbm.xml.orig which should not be there long term wise

The cache statements have been restored. They were commented out because i was having some issues with caching and had to do testing without a second level cache in play. The current implementation of dbconfig has them back. The orig mapping file is there for reference since i have made a number of changes to it.

  • the github diff contains a “community/monitoring/GeoLiteCity.dat”
    file that should
    not be there (btw, is that a geolocation database? :slight_smile: )

I will be sure not to commit that. and yes it is the maxmind free geolocation database that goes from ip to cite name.

Cool, that would be nice.

Closing up

Most of the feedback above is on secondary importance details, meaning the
overall setup is good :wink:

The current transaction management worries me, I’m not saying it has to be
improved right now but it should at least be put at some high priority
in the near
future, before we start advertising that we have a “rock solid” db storage for
catalog and configuration

Agreed, and as a community module this will undoubtedly be something that gets addressed before the dbconfig module can become an extension. And to clarify this proposal is in now way trying to push the dbconfig as “rock solid”. It needs work no doubt. This proposal is just to do the base work in the core to allow the module to worked on.

Thanks again for taking the time to do the in depth review.

Cheers
Andrea


Ing. Andrea Aime
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf



Nokia and AT&T present the 2010 Calling All Innovators-North America contest
Create new apps & games for the Nokia N8 for consumers in U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
http://p.sf.net/sfu/nokia-dev2dev


Geoserver-devel mailing list
Geoserver-devel@anonymised.comsts.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Sun, Oct 24, 2010 at 9:47 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Thanks for the review Andrea. Lots of great stuff. However much of the meat
of your review has to do with the dbconfig module which is not actually
included in this gsip. This gsip has only to do with a core refactoring of
the catalog and config to even allow for a dbconfig to exists.

Ah! Ok, I did not get that one :slight_smile:
Anyways, I already added my +1 to the GSIP page, the work seems to be heading
in the right direction.

Cheers
Andrea

-----------------------------------------------------
Ing. Andrea Aime
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------