[GeoNetwork-devel] JPA database layer questions

Hi

I have started development work on the develop branch and have run into some issues with the JPA data layer, possibly due to my lack of experience with the new code.

If someone could provide some clarification, that would be really great. Thanks in advance:

1) What is the purpose of the *ListenerManager classes? Should each entity have one?

2) Are there any guidelines for defining relations between entities?

For example, I see that in the Metadata entity the categories are defined like this:

private Set _metadataCategories = new HashSet();

But others, such as metadata statuses or privileges, are commented out:

// private List _metadataStatus;
// private Set operations = new HashSet();

3) There are some identifier entities, such as MetadataRelationId, but it is not clear when they should be defined. They seem to be a “replacement” for modelling relationships in the entity classes.

What is the advantage of using this identifier-entity approach?

Are there any other implications? What about the foreign keys between entities, for example?

4) Are there any guidelines for building queries/specifications that need to join tables to filter the results?

I understand from the Metadata entity that filtering by category should be quite simple, as categories are part of the Metadata entity.

But it is not very clear how to, for example, retrieve metadata that has a specific status or that certain groups are allowed to access. I guess the entities need to be joined? Any example would be great.

Does this approach have any performance penalties? (related to points 2/3)

Regards,
Jose García


GeoCat Bridge for ArcGIS allows instant publishing of data and metadata on GeoServer and GeoNetwork. Visit http://geocat.net for details.


Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net

Comments in-line


On Tue, Apr 8, 2014 at 2:23 PM, Jose Garcia <jose.garcia@anonymised.com> wrote:

Hi

I have started development work on the develop branch and have run into some issues with the JPA data layer, possibly due to my lack of experience with the new code.

If someone could provide some clarification, that would be really great. Thanks in advance:

1) What is the purpose of the *ListenerManager classes? Should each entity have one?

These are not critical. They allow one to listen for JPA lifecycle events. I created them thinking that they would be very useful, but at the moment only the one for users is used. We should remove the ones that aren’t used and add them as needed.
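As an illustration only, a per-entity listener manager might look roughly like the plain-Java sketch below. It assumes a hypothetical `EntityListenerManager` and `LifecyclePhase`, which are not the real GeoNetwork classes. In a JPA application, callback methods annotated with `@PrePersist`, `@PostPersist`, `@PreRemove` or `@PostRemove` (on the entity itself or on a class registered with `@EntityListeners`) would be the place that calls `fire(...)`; that wiring is left out so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical lifecycle phases, mirroring the JPA callbacks
// @PrePersist, @PostPersist, @PreRemove and @PostRemove.
enum LifecyclePhase { PRE_SAVE, POST_SAVE, PRE_REMOVE, POST_REMOVE }

// Hypothetical manager, one instance per entity type (e.g. users).
// In a JPA setup, the entity callback methods would delegate to fire().
class EntityListenerManager<E> {
    private final Map<LifecyclePhase, List<Consumer<E>>> listeners =
            new EnumMap<>(LifecyclePhase.class);

    // Register a listener for one lifecycle phase.
    void addListener(LifecyclePhase phase, Consumer<E> listener) {
        listeners.computeIfAbsent(phase, p -> new ArrayList<Consumer<E>>())
                 .add(listener);
    }

    // Notify, in registration order, every listener registered for the phase.
    void fire(LifecyclePhase phase, E entity) {
        List<Consumer<E>> forPhase =
                listeners.getOrDefault(phase, Collections.<Consumer<E>>emptyList());
        for (Consumer<E> listener : forPhase) {
            listener.accept(entity);
        }
    }
}
```

Keeping one manager per entity type fits the suggestion above: a manager only needs to exist for entities where some code actually listens to lifecycle events.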

2) Are there any guidelines for defining relations between entities?

For example, I see that in the Metadata entity the categories are defined like this:

private Set _metadataCategories = new HashSet();

But others, such as metadata statuses or privileges, are commented out:

// private List _metadataStatus;
// private Set operations = new HashSet();

I have been trying to keep the model fairly loosely coupled. The main reason is to keep the implementation simple. It is possible to have lots of relationships, but then it gets trickier to know which objects are loaded when. Lazy loading is possible, but I think it makes things more complex, especially for new developers.
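A rough illustration of this trade-off, using hypothetical and heavily simplified classes rather than the real GeoNetwork entities: `Metadata` owns its categories directly, like the `_metadataCategories` field quoted above, while a loosely coupled `MetadataStatus` stores only the id of the metadata it belongs to. Nothing is loaded behind the scenes; code that needs statuses asks for them explicitly through a lookup, here a hypothetical in-memory `StatusLookup` standing in for a repository.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified entities (plain Java, no JPA annotations).
class MetadataCategory {
    final int id;
    final String name;
    MetadataCategory(int id, String name) { this.id = id; this.name = name; }
}

class Metadata {
    final int id;
    // Tightly coupled: categories are part of the entity itself.
    final Set<MetadataCategory> categories = new HashSet<>();
    Metadata(int id) { this.id = id; }
}

class MetadataStatus {
    // Loosely coupled: only the foreign-key value is stored, not a Metadata reference.
    final int metadataId;
    final int statusId;
    MetadataStatus(int metadataId, int statusId) {
        this.metadataId = metadataId;
        this.statusId = statusId;
    }
}

// Hypothetical stand-in for a repository method "find statuses by metadata id".
class StatusLookup {
    private final List<MetadataStatus> rows = new ArrayList<>();

    void add(MetadataStatus status) { rows.add(status); }

    // Explicit, on-demand lookup: the caller decides when statuses are loaded.
    List<MetadataStatus> findByMetadataId(int metadataId) {
        List<MetadataStatus> result = new ArrayList<>();
        for (MetadataStatus s : rows) {
            if (s.metadataId == metadataId) {
                result.add(s);
            }
        }
        return result;
    }
}
```

The cost is that the caller has to know which lookup to make; the benefit is that it is always obvious what has been loaded.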

3) There are some identifier entities, such as MetadataRelationId, but it is not clear when they should be defined. They seem to be a “replacement” for modelling relationships in the entity classes.

What is the advantage of using this identifier-entity approach?

Are there any other implications? What about the foreign keys between entities, for example?

Any time a row is identified by several columns (a composite key), you need an ID class. It is frequently easier to have a single primary-key column for the id and autogenerate it.

Most of the tables with ID classes are that way because that was the design in previous GeoNetwork versions.

4) Are there any guidelines for building queries/specifications that need to join tables to filter the results?

I understand from the Metadata entity that filtering by category should be quite simple, as categories are part of the Metadata entity.

But it is not very clear how to, for example, retrieve metadata that has a specific status or that certain groups are allowed to access. I guess the entities need to be joined? Any example would be great.

Does this approach have any performance penalties? (related to points 2/3)
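To show what such an ID class involves, here is a minimal plain-Java sketch assuming a hypothetical two-column key (`metadataId`, `relatedId`); the real `MetadataRelationId` may use different fields. JPA requires a composite primary-key class to be public, `Serializable`, to have a public no-arg constructor, and to implement `equals()` and `hashCode()`. In an entity it would be annotated `@Embeddable` and used through `@EmbeddedId` (or referenced with `@IdClass`); the annotations are omitted so the sketch compiles without a JPA dependency.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical composite key for a row identified by two columns.
// In a real JPA mapping this class and its no-arg constructor must be public.
class MetadataRelationKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private int metadataId;
    private int relatedId;

    // JPA instantiates key classes through a no-arg constructor.
    MetadataRelationKey() { }

    MetadataRelationKey(int metadataId, int relatedId) {
        this.metadataId = metadataId;
        this.relatedId = relatedId;
    }

    int getMetadataId() { return metadataId; }
    int getRelatedId() { return relatedId; }

    // equals/hashCode cover every key column, so two keys with the same
    // column values are treated as identifying the same row.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MetadataRelationKey)) return false;
        MetadataRelationKey other = (MetadataRelationKey) o;
        return metadataId == other.metadataId && relatedId == other.relatedId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(metadataId, relatedId);
    }
}
```

The simpler alternative described above replaces all of this with a single generated column, e.g. `@Id @GeneratedValue private int id;`, keeping the former key columns as ordinary columns.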

Because I have kept the tables loosely coupled, you do have to do joins for certain queries that would be easier if the model were richer. That is the main trade-off of my approach.

The queries with joins are usually done in the *RepositoryImpl classes. For example, I have several fairly complex queries in the org.fao.geonet.repository.statistic.MetadataStatisticsQueries class.
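To make such a join concrete without a database, the sketch below computes in memory what a query for "metadata with a given status that one of these groups may access" would return. It is a small hash join: the privilege rows are reduced to a set of metadata ids, which is then probed with the status rows. `StatusRow`, `OperationAllowedRow`, their fields and the SQL in the comment are hypothetical simplifications of the status and privilege tables; in the code base the same filter would be expressed as a database query in a *RepositoryImpl class.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified rows of the status and privilege tables.
class StatusRow {
    final int metadataId, statusId;
    StatusRow(int metadataId, int statusId) {
        this.metadataId = metadataId;
        this.statusId = statusId;
    }
}

class OperationAllowedRow {
    final int groupId, metadataId, operationId;
    OperationAllowedRow(int groupId, int metadataId, int operationId) {
        this.groupId = groupId;
        this.metadataId = metadataId;
        this.operationId = operationId;
    }
}

class JoinSketch {
    // Roughly equivalent to (illustrative table/column names):
    //   SELECT DISTINCT s.metadataId FROM status s
    //   JOIN operationallowed o ON o.metadataId = s.metadataId
    //   WHERE s.statusId = :status AND o.groupId IN (:groups) AND o.operationId = :op
    static Set<Integer> metadataWithStatusVisibleTo(List<StatusRow> statuses,
                                                    List<OperationAllowedRow> privileges,
                                                    int statusId,
                                                    Collection<Integer> groupIds,
                                                    int operationId) {
        // Build side: metadata ids that any of the groups may access.
        Set<Integer> allowed = new HashSet<>();
        for (OperationAllowedRow p : privileges) {
            if (groupIds.contains(p.groupId) && p.operationId == operationId) {
                allowed.add(p.metadataId);
            }
        }
        // Probe side: keep status rows with the wanted status and an allowed id.
        Set<Integer> result = new TreeSet<>();
        for (StatusRow s : statuses) {
            if (s.statusId == statusId && allowed.contains(s.metadataId)) {
                result.add(s.metadataId);
            }
        }
        return result;
    }
}
```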

However, because such queries are complex, I would only write them when it is known that not using a join would cause a performance problem.
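When a join is not known to be a bottleneck, a simpler pattern is two separate lookups: first fetch the matching metadata ids from one repository, then load the records for those ids from another. The sketch below assumes hypothetical in-memory `PrivilegeRepo` and `MetadataRepo` classes, each method standing in for one simple single-table query; it is not GeoNetwork's repository API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-ins for two repositories; each method
// represents one simple query against one table.
class PrivilegeRepo {
    private final Map<Integer, Set<Integer>> metadataIdsByGroup = new LinkedHashMap<>();

    void allow(int groupId, int metadataId) {
        metadataIdsByGroup.computeIfAbsent(groupId, g -> new TreeSet<Integer>()).add(metadataId);
    }

    // Query 1: ids of metadata that any of the groups may access.
    Set<Integer> findMetadataIdsForGroups(Collection<Integer> groupIds) {
        Set<Integer> ids = new TreeSet<>();
        for (Integer groupId : groupIds) {
            Set<Integer> forGroup = metadataIdsByGroup.get(groupId);
            if (forGroup != null) {
                ids.addAll(forGroup);
            }
        }
        return ids;
    }
}

class MetadataRepo {
    private final Map<Integer, String> titleById = new LinkedHashMap<>();

    void save(int id, String title) { titleById.put(id, title); }

    // Query 2: load records for a collection of ids (an "IN (...)" lookup).
    List<String> findTitlesByIds(Collection<Integer> ids) {
        List<String> titles = new ArrayList<>();
        for (Integer id : ids) {
            String title = titleById.get(id);
            if (title != null) {
                titles.add(title);
            }
        }
        return titles;
    }
}
```

The price is two round trips to the database and an id list held in application memory, which is usually acceptable for modest result sets; replacing it with a single join query is then an optimisation to make once it is shown to be needed.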

Jesse

Hi Jesse

Thanks a lot for all the information. Keeping the tables loosely coupled sounds fine; I just wanted some clarification so I know how to approach the development work.

Regards,
Jose García

On Tue, Apr 8, 2014 at 2:33 PM, Jesse Eichar <jesse.eichar@anonymised.com> wrote: