[GeoNetwork-devel] CSW harvester and metadata uuid

Hi

The CSW harvester uses gmd:fileIdentifier/gco:CharacterString from the full record response as the metadata uuid. If this value doesn’t exist, the CSW harvester discards the metadata record.

As gmd:fileIdentifier is not mandatory, I propose to extend the current code so that, if it is not declared, the harvester checks whether gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString is declared and uses that instead.
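
To make the proposal concrete, here is a minimal sketch of the fallback lookup. It is written in Python for brevity, not taken from the harvester's Java code, and the function name extract_uuid is only an illustration. It assumes the ISO 19139 gmd and gco namespace URIs.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces for the gmd: and gco: prefixes used in the XPaths.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Identifier of the metadata record itself.
FILE_ID = "gmd:fileIdentifier/gco:CharacterString"

# Identifier of the described resource, used here only as a fallback.
RESOURCE_CODE = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier/gmd:code/"
    "gco:CharacterString"
)


def extract_uuid(record):
    """Return the first non-empty identifier found, or None.

    `record` is the gmd:MD_Metadata element (hypothetical helper,
    not part of GeoNetwork).
    """
    for path in (FILE_ID, RESOURCE_CODE):
        text = record.findtext(path, default="", namespaces=NS).strip()
        if text:
            return text
    return None
```

A record for which both paths are empty still yields None, so the caller would still have to decide what to do with it.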

Any comments against this change, or does it seem OK to apply it?

Thanks and regards,
Jose García

Hi Jose,

Sounds OK to me, but the CSW code changed as part of svn rev 8223: it doesn't use hard-coded XPaths etc. in org/fao/geonet/kernel/harvest/harvester/csw/Harvester.java any more. Instead, the schema autodetect code from the SchemaManager is used to find the schema that the harvested record belongs to, so that other schemas and ISO profiles can be harvested. So now, when the harvester extracts the uuid from the record, the extract-uuid.xsl from the detected schema is used; hence you should be able to make the change you want in extract-uuid.xsl for your schema.
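
As a rough illustration of that idea (detect the schema first, then apply that schema's own uuid extraction), here is a small Python sketch. It is not the SchemaManager code: the lookup tables, the schema labels and the function names are assumptions made up for the example, and a single XPath per schema stands in for each schema's extract-uuid.xsl.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
CSW = "http://www.opengis.net/cat/csw/2.0.2"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"gmd": GMD, "gco": GCO, "csw": CSW, "dc": DC}

# Assumption for this sketch: the schema is recognised from the root element.
SCHEMA_BY_ROOT = {
    "{%s}MD_Metadata" % GMD: "iso19139",
    "{%s}Record" % CSW: "csw-record",
}

# Assumption: one XPath per schema plays the role of extract-uuid.xsl.
UUID_PATH_BY_SCHEMA = {
    "iso19139": "gmd:fileIdentifier/gco:CharacterString",
    "csw-record": "dc:identifier",
}


def detect_schema(record):
    """Return the schema label for a record, or None if it is unknown."""
    return SCHEMA_BY_ROOT.get(record.tag)


def extract_uuid_for_schema(record):
    """Extract the uuid using the rule registered for the detected schema."""
    schema = detect_schema(record)
    if schema is None:
        return None
    path = UUID_PATH_BY_SCHEMA[schema]
    text = record.findtext(path, default="", namespaces=NS).strip()
    return text or None
```

With this shape, changing how the uuid is found for one schema only touches that schema's entry, which is the point of moving the logic out of the harvester.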

Cheers,
Simon
________________________________________
From: jose garcia [josegar74@anonymised.com]
Sent: Saturday, 1 October 2011 12:19 AM
To: geonetwork-devel@lists.sourceforge.net
Subject: [GeoNetwork-devel] CSW harvester and metadata uuid

Hi

The CSW harvester uses gmd:fileIdentifier/gco:CharacterString from the full record response as the metadata uuid. If this value doesn't exist, the CSW harvester discards the metadata record.

As gmd:fileIdentifier is not mandatory, I propose to extend the current code so that, if it is not declared, the harvester checks whether gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString is declared and uses that instead.

Any comments against this change, or does it seem OK to apply it?

Thanks and regards,
Jose García

Hi Jose,

There is a difference between MD_Metadata/fileIdentifier and MD_Metadata/../identificationInfo/../identifier/../code. The former is the identifier for the metadata record, whereas the latter is the identifier for the resource that the metadata describes. These are two different things, so I would suggest that your proposal is logically incorrect.

What about creating a UUID for MD_Metadata/fileIdentifier if it doesn't exist? It doesn't matter that an optional element is used, does it?
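
A minimal sketch of what I mean, in Python and for illustration only (the function name ensure_file_identifier is made up, and this is not harvester code). It fills in a random UUID when gmd:fileIdentifier is missing or empty, and places a new element first because fileIdentifier is the first child of MD_Metadata in the ISO 19139 schema.

```python
import uuid
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}


def ensure_file_identifier(record):
    """Return the record's fileIdentifier, generating one if it is absent.

    `record` is the gmd:MD_Metadata element; it is modified in place.
    """
    existing = record.findtext(
        "gmd:fileIdentifier/gco:CharacterString", default="", namespaces=NS
    ).strip()
    if existing:
        return existing

    new_id = str(uuid.uuid4())

    # Reuse an empty gmd:fileIdentifier if there is one, else create it
    # as the first child of the record.
    holder = record.find("gmd:fileIdentifier", NS)
    if holder is None:
        holder = ET.Element("{%s}fileIdentifier" % GMD)
        record.insert(0, holder)

    value = holder.find("gco:CharacterString", NS)
    if value is None:
        value = ET.SubElement(holder, "{%s}CharacterString" % GCO)
    value.text = new_id
    return new_id
```

Note that the generated value is random, so it is only stable if the modified record is the copy that is kept.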

I hope this helps.

John Hockaday

_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

Hi John

Thanks for the clarification. The problem is that the harvester requires a uuid to decide whether to update or create the harvested metadata.

Maybe the option is just to handle this situation in the harvester, so that if a metadata record has no uuid, a reharvest always creates it.

Regards,
Jose García

On Mon, Oct 17, 2011 at 5:24 AM, Hockaday, John <John.Hockaday@anonymised.com> wrote:

Hi Jose,

There is a difference between MD_Metadata/fileIdentifier and MD_Metadata/…/identificationInfo/…/identifier/…/code. The former is the identifier for the metadata record, whereas the latter is the identifier for the resource that the metadata describes. These are two different things, so I would suggest that your proposal is logically incorrect.

What about creating a UUID for MD_Metadata/fileIdentifier if it doesn’t exist? It doesn’t matter that an optional element is used, does it?

I hope this helps.

John Hockaday


Hi, I'm reopening this discussion because I have a requirement related to the resource code.

Suppose you have one dataset (same MD_Metadata/…/identificationInfo/…/identifier/…/code), e.g. an NMA layer for roads, described in two or more catalogs with different metadata uuids. The metadata may be slightly different depending on the author, but the resource is the same. When harvesting, some users would like to be able to avoid introducing “duplicate” descriptions of the same dataset.

Do you think we could add an extra step that allows activating a filter on the resource code?
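
To show the kind of filter I have in mind, here is a small Python sketch. It is only an illustration: the function name and the (metadata uuid, resource code) tuples are assumptions for the example, not an existing harvester option.

```python
def filter_by_resource_code(harvested, known_codes):
    """Drop harvested records whose resource code is already known.

    harvested   -- list of (metadata_uuid, resource_code) tuples;
                   resource_code may be None when the record has no code.
    known_codes -- resource codes already present in the local catalog.
    Records without a resource code are always kept.
    """
    seen = set(known_codes)
    kept = []
    for metadata_uuid, resource_code in harvested:
        if resource_code is not None:
            if resource_code in seen:
                # Same dataset already described, skip this description.
                continue
            seen.add(resource_code)
        kept.append((metadata_uuid, resource_code))
    return kept
```

In this sketch the first description seen for a given resource code wins; which one should win in practice is a separate question.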

@Jose, which option did you choose to manage the case where no uuid is set in a record?

Thanks for your comments.

Francois

2011/10/1 Simon.Pigot@anonymised.com

Hi Jose,

Sounds OK to me, but the CSW code changed as part of svn rev 8223: it doesn’t use hard-coded XPaths etc. in org/fao/geonet/kernel/harvest/harvester/csw/Harvester.java any more. Instead, the schema autodetect code from the SchemaManager is used to find the schema that the harvested record belongs to, so that other schemas and ISO profiles can be harvested. So now, when the harvester extracts the uuid from the record, the extract-uuid.xsl from the detected schema is used; hence you should be able to make the change you want in extract-uuid.xsl for your schema.

Cheers,
Simon


Hi Francois

The extra step you mention seems OK to me.

About the option chosen for the uuid: if I'm not wrong, in the end no change was made, as there were some mails with concerns about using the code value as the uuid, since the code is indeed the dataset identifier and is not related to the metadata identifier.

I'm not sure there is a good solution for the missing uuid, except maybe discarding metadata without a uuid from the harvest. Another option is to let GeoNetwork create a uuid for these cases, but that will only work nicely for harvesters that clean all data on reharvest. AFAIK the CSW harvester, for example, uses the uuid to check whether a metadata record should be updated or created, so letting GeoNetwork create a uuid when the metadata has none is not a valid option, as the record would be added again on each reharvest.
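
A toy Python model of the problem, just to illustrate it (this is not the harvester logic; the catalog is a plain dict keyed by uuid and the function name is made up):

```python
import uuid


def harvest(catalog, remote_records, assign_random_uuid):
    """Simulate one harvest run against a local catalog.

    catalog            -- dict mapping uuid -> record content (updated in place)
    remote_records     -- list of (uuid_or_None, content) tuples
    assign_random_uuid -- if True, records without a uuid get a fresh
                          random one; if False, they are discarded
    """
    for remote_uuid, content in remote_records:
        if remote_uuid is None:
            if not assign_random_uuid:
                continue  # discard the record
            remote_uuid = str(uuid.uuid4())
        # Known uuid: update. Unknown uuid: create.
        catalog[remote_uuid] = content
    return catalog


remote = [("abc", "record with uuid"), (None, "record without uuid")]

catalog = {}
harvest(catalog, remote, assign_random_uuid=True)
harvest(catalog, remote, assign_random_uuid=True)
# "abc" is updated in place, but the record without a uuid is now
# stored twice, once per run, under two different random uuids.
duplicated_count = len(catalog)
```

After two runs the catalog holds three entries for two remote records, and it grows by one on every further run.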

Regards,
Jose García

On Fri, Jan 11, 2013 at 9:36 AM, Francois Prunayre <fx.prunayre@anonymised.com> wrote:

Hi, I'm reopening this discussion because I have a requirement related to the resource code.

Suppose you have one dataset (same MD_Metadata/…/identificationInfo/…/identifier/…/code), e.g. an NMA layer for roads, described in two or more catalogs with different metadata uuids. The metadata may be slightly different depending on the author, but the resource is the same. When harvesting, some users would like to be able to avoid introducing “duplicate” descriptions of the same dataset.

Do you think we could add an extra step that allows activating a filter on the resource code?

@Jose, which option did you choose to manage the case where no uuid is set in a record?

Thanks for your comments.

Francois

Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net