[GeoNetwork-devel] Agenda for GeoNetwork hacking in Bolsena&release planning [SEC=UNCLASSIFIED]

Hi Francois,
In theory you could use the UUID as a primary key column, and it would be
cluster safe. I wouldn't recommend it, though: using a VARCHAR column as a
primary key is generally a bad idea.
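For illustration, here is a minimal Java sketch of that trade-off. The class and method names are made up for this example. Each node can mint a key from UUID.randomUUID() without talking to any other node, but the key ends up as a 36-character string instead of an 8-byte number.

```java
import java.util.UUID;

public class UuidKeyDemo {
    // Any node can call this on its own: randomUUID() consults no shared
    // counter, which is what makes a UUID key cluster safe.
    static String newTextKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String key = newTextKey();
        // The canonical text form is 36 characters (32 hex digits plus
        // 4 hyphens), so the column must be at least VARCHAR(36) ...
        System.out.println(key + " -> " + key.length() + " chars");
        // ... whereas a numeric surrogate key fits in 8 bytes (e.g. BIGINT).
        System.out.println("long key -> " + Long.BYTES + " bytes");
        // randomUUID() yields version 4 (random) UUIDs, so successive keys
        // are unordered, unlike values handed out by a counter.
        System.out.println("version " + UUID.fromString(key).version());
    }
}
```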

Also, the SerialFactory is being used to generate keys for the Groups, Users,
Metadata and Categories tables. Only the Metadata table has the UUID column.

A good discussion of the problem of key generation can be found in Fowler's
book "Patterns of Enterprise Application Architecture"; see the Identity
Field pattern.

Using a persistence framework is admittedly a heavyweight solution to the
problem. Another option would be to create a primary key table, although you
need to take some care when managing the database transaction: the key
generation query needs to be separated from the data insert query.
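A rough Java/JDBC sketch of the key table idea follows. Everything in it is an assumption for illustration: the KeyTable table, its name/nextId columns, the block size and the class itself. The point is that keys are reserved in a short transaction of their own, on a dedicated connection, and committed before the data insert runs. SELECT ... FOR UPDATE works on Oracle, PostgreSQL and MySQL (InnoDB), but the locking syntax may need adjusting for other databases.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a key-table generator. Assumed (hypothetical) table, one row per
 * keyed table:
 *   CREATE TABLE KeyTable (name VARCHAR(32) PRIMARY KEY, nextId BIGINT NOT NULL)
 */
public class KeyTableGenerator {
    private final Connection keyConnection; // used ONLY for key generation
    private final int blockSize;
    private final Map<String, long[]> blocks = new HashMap<>(); // table -> {next, limit}

    public KeyTableGenerator(Connection keyConnection, int blockSize) {
        this.keyConnection = keyConnection;
        this.blockSize = blockSize;
    }

    // Hands out keys from the current in-memory block, reserving a new block
    // from the database when the current one is used up.
    public synchronized long nextKey(String table) throws SQLException {
        long[] block = blocks.get(table);
        if (block == null || block[0] >= block[1]) {
            long start = reserveBlock(table);
            block = new long[] {start, start + blockSize};
            blocks.put(table, block);
        }
        return block[0]++;
    }

    // Reads and bumps the counter in one transaction; the row lock taken by
    // SELECT ... FOR UPDATE stops two nodes from reserving the same block.
    protected long reserveBlock(String table) throws SQLException {
        boolean oldAutoCommit = keyConnection.getAutoCommit();
        keyConnection.setAutoCommit(false);
        try {
            long start;
            try (PreparedStatement select = keyConnection.prepareStatement(
                    "SELECT nextId FROM KeyTable WHERE name = ? FOR UPDATE")) {
                select.setString(1, table);
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("no KeyTable row for " + table);
                    }
                    start = rs.getLong(1);
                }
            }
            try (PreparedStatement update = keyConnection.prepareStatement(
                    "UPDATE KeyTable SET nextId = ? WHERE name = ?")) {
                update.setLong(1, start + blockSize);
                update.setString(2, table);
                update.executeUpdate();
            }
            keyConnection.commit(); // release the lock before any data insert
            return start;
        } catch (SQLException e) {
            keyConnection.rollback();
            throw e;
        } finally {
            keyConnection.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Each node would create one generator with its own connection and call nextKey("Metadata") before running the INSERT on its normal connection. Reserving a block rather than a single key keeps the extra round trips down; the cost is gaps in the numbering when a node restarts.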

In general, projects tend to start out using direct SQL. It's easy to do and
you get the warm glow of control. The trouble is that projects grow in size
and complexity, and soon the 8 tables and 20 or so queries grow to 20+ tables
and any number of queries. It becomes difficult to find and manage all those
buried SQL queries, and maintenance becomes an issue.

You then find yourself writing code to manage the object/database disconnect.
Then you create caches for performance. I know: I've written persistence
frameworks before (I used to work for The Object People, creators of TOPLink).

GeoNetwork is not yet at the size where a persistence framework is a must, but
if it were up to me I'd be looking seriously at that option.

Regards,
Stephen Davies

-----Original Message-----
From: Francois-Xavier Prunayre
[mailto:francois-xavier.prunayre@anonymised.com]
Sent: Thursday, 12 June 2008 4:57
To: Davies Stephen
Cc: geonetwork-devel@lists.sourceforge.net; Jeroen.Ticheler@anonymised.com
Subject: Re: [GeoNetwork-devel] Agenda for GeoNetwork hacking in
Bolsena&release planning [SEC=UNCLASSIFIED]

Hi Stephen, one question on cluster deployment,

On Wed, 2008-06-11 at 15:03 +1000, Stephen.Davies@anonymised.com wrote:

As of version 2.2.0 the GeoNetwork application cannot be deployed to a
cluster. Existing deployments probably haven't grown to the size where
clustering is necessary, but if this were to happen, deployment to a cluster
will fail.
Secondly, and this is the real killer, the current mechanism of generating
unique primary keys in jeeves.util.SerialFactory will fail in a cluster due
to duplicate primary keys. The SerialFactory caches the max primary key
values for each table. In a cluster, multiple SerialFactory instances will
exist and are unaware of each other. The first node to insert a record will
succeed; the other nodes will fail.
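To make the failure mode concrete, here is a toy Java model. It is hypothetical: it mimics the behaviour described above rather than the actual jeeves.util.SerialFactory source, and a Set stands in for the table's primary-key index.

```java
import java.util.HashSet;
import java.util.Set;

public class ClusterKeyClash {
    // Hypothetical, simplified per-node key cache; NOT the real
    // jeeves.util.SerialFactory code, just the behaviour described above.
    static class NodeSerialCache {
        private int lastId;

        NodeSerialCache(int maxIdAtStartup) {
            lastId = maxIdAtStartup; // max(id) read once, then cached
        }

        int nextId() {
            return ++lastId; // never goes back to the database
        }
    }

    // Two nodes start against the same database; returns whether each
    // node's insert was accepted.
    static boolean[] simulateTwoNodes() {
        Set<Integer> metadataIds = new HashSet<>(); // stand-in for the PK index
        metadataIds.add(10); // existing row, so max(id) = 10 on both nodes
        NodeSerialCache nodeA = new NodeSerialCache(10);
        NodeSerialCache nodeB = new NodeSerialCache(10);
        boolean first = metadataIds.add(nodeA.nextId());  // id 11: accepted
        boolean second = metadataIds.add(nodeB.nextId()); // id 11 again: rejected
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] result = simulateTwoNodes();
        System.out.println("node A insert ok: " + result[0]); // prints true
        System.out.println("node B insert ok: " + result[1]); // prints false
    }
}
```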

Could we safely use the UUID generated by UUID.randomUUID() as the
primary key for the metadata? (We also use that UUID for direct access to
metadata in the web interface.)
Jeroen added a unique constraint on that field some weeks ago:
http://www.nabble.com/SF.net-SVN%3A-geonetwork%3A--1311--branches-2.2.x-gast-setup-sql-td17561379.html
(this has only been committed to branch 2.2.x, not to trunk. Should we align
trunk with that, Jeroen?).

If that is safe in a cluster environment, could we use the UUID as the
internal identifier for GeoNetwork instead of Jeeves's SerialFactory?

Geoscience Australia has deployed GeoNetwork using Oracle. The correct way to
deal with this in Oracle is to use a SEQUENCE. This requires generating
Oracle-specific SQL, something the project has avoided doing.

Cheers. Francois

Thanks for the info, Stephen.

On Thu, 2008-06-12 at 17:44 +1000, Stephen.Davies@anonymised.com wrote:

Also, the SerialFactory is being used to generate keys for the Groups, Users,
Metadata and Categories tables. Only the Metadata table has the UUID column.

One exception is the Sources table, which uses the UUID as its key. Maybe
all tables should use the same mechanism.

Francois