[GeoNetwork-users] "Duplicate" metadata, Datasets and Series [SEC=UNCLASSIFIED]

Hi Jeroen,

Please see my comments below:

Thanks.

John

-----Original Message-----
From: Jeroen Ticheler [mailto:Jeroen.Ticheler@anonymised.com]
Sent: Wednesday, 9 January 2008 6:17 PM
To: Hockaday John
Cc: geonetwork-users@lists.sourceforge.net
Subject: Re: [GeoNetwork-users] "Duplicate" metadata, Datasets and Series

Thanks John and others,
To me this is a useful discussion to stimulate thinking of how this
should work. My main concerns with taking a generic approach are
probably those:
- A usable GUI requires limited use of split-out objects, so that the
user is not forced to select objects in 20, 30, 50 places in a
metadata record

Wouldn't it be nice to have a button or similar control that presents the
xlink objects available on that instance of GN for each relevant type of
element? The current interface would gain an extra button or simple icon
that pops up another window for selecting the appropriate reusable object.
Another button/icon could be available on each appropriate element to store
its content as a reusable object, prompting the user to give it an 'id'. Or
maybe an icon could pop up if the user enters an 'id'.

Just some simple thoughts at the moment.

- A generic implementation is nice, but may double, triple, quadruple
(!?) the development time required. And who is willing to fund such
an extra cost if they only need to have two or three objects to be
split out?

Ah yes. This is where a tool that reads the XSDs to present the appropriate
components would be suitable. Something like CHIBA or . This would make
presentation very easy if the code was written to create the appropriate
configuration files.

- In metadata harvesting between two catalogues, how does one deal
with all the xlink parts of a metadata record? Should the client be
responsible for resolving all xlinks and recreate the full resource?
In that case, should the client be storing all objects locally or
should it store a compiled version locally? Or is it more reliable
for the server to offer the client a fully merged document so the
client does not have to worry about the different objects?
My idea on the last one is that we probably do not want to rely on
clients recompiling records, but just deal with the full document.
This means that xlinks in GeoNetwork will only be used internally to
split out objects and merge them back together.

Yes. I have thought about this many times without reaching a complete
solution, yet someone needs to work through it to resolve all the scenarios.

I agree with you that GN should resolve the xlink when it is accessed. This
includes for harvesting, indexing, presentation etc. If a metadata record is
harvested in its denormalised format then it is not necessary for the CSW to
contact the originating CSW that it harvested from each time it is accessed.
This would make presentation quicker and reduce the bandwidth use.

However, there is the problem that the xlink object (on the originating CSW)
may have been updated since harvesting. It would therefore be necessary for
the harvesting application to check if the original record has been updated.
This is also the case for any harvested record not including xlink objects.
Therefore, all GN (or an application that manages xlink) needs to do is mark
any metadata records to be 'changed' that refer to a changed xlink object.
The normal harvesting process would then harvest that virtually changed
metadata record.
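
A minimal sketch of that marking step in Python. The MetadataRecord class,
its fields and the sample hrefs are assumptions made for illustration; they
are not GeoNetwork data structures.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """A locally stored metadata record and the xlink targets it refers to."""
    file_identifier: str
    xlink_hrefs: set = field(default_factory=set)
    changed: bool = False


def mark_changed(records, changed_hrefs):
    """Flag every record that refers to at least one changed xlink object."""
    changed_hrefs = set(changed_hrefs)
    flagged = []
    for record in records:
        if record.xlink_hrefs & changed_hrefs:
            record.changed = True
            flagged.append(record.file_identifier)
    return flagged


# Hypothetical sample data: only the first record uses the changed object.
records = [
    MetadataRecord("uuid-1", {"contacts.xml#GADetails"}),
    MetadataRecord("uuid-2", {"srs.xml#WGS84"}),
    MetadataRecord("uuid-3"),
]
flagged = mark_changed(records, ["contacts.xml#GADetails"])
```

The normal "pull" harvest then only needs to look at the 'changed' flag; it
does not have to know anything about xlink.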

This is one reason why I don't like harvesting metadata. If there is a
"push" rather than a "pull" capability for harvesting then the metadata
records are likely to be more current. But most harvesting processes seem to
be "pull" at a certain time rather than "push".

I don't think that these problems are insurmountable but I do think that they
need to be thoroughly thought through to address all scenarios. It will
create more code but if the coding people can make use of existing XML and
internet technologies then that may reduce the effort.

OK, enough for now. Ciao,
Jeroen

On Jan 9, 2008, at 1:58 AM, John.Hockaday@anonymised.com wrote:

> Hi All,
>
> I hope my mistakes for this year are over. ;--)
>
> To answer Jeroen's question about which components should be reusable or
> available using xlink, I believe that any object that can have an xlink
> attribute according to the XSDs should be available. We just don't know what
> the GN clients want and so we should give them whatever is officially
> available. That way no-one can complain except maybe the code writers who
> have to implement this. ;--)
>
> Some examples of those elements that can have an xlink attribute are:
>
> gmd:contact
> gmd:contactInfo
> gmd:phone
> gmd:address
> gmd:locale
> gmd:spatialRepresentationInfo
> gmd:axisDimensionProperties
> gmd:cornerPoints
> gmd:referenceSystemInfo
> gmd:referenceSystemIdentifier
> ...
> Etc.
>
> There is no need to list the rest because that information can be obtained
> from the XSDs and is dependent on the XML implementation of the profile.
>
> How can the application resolve the xlinks?
> ------------------------------
>
> Here is an example of the use of an xlink:
>
> <gmd:contact
> xlink:href="http://asdd.ga.gov.au/asdd/work/ISOmetadata/GAOpenDaySeries.xml#GADetails"/>
>
> And here is where the contact information is found:
>
> <gmd:CI_ResponsibleParty id="GADetails">
> ...
> </gmd:CI_ResponsibleParty>
>
> So theoretically the application only has to download the XML document
> referenced in the xlink, extract the XML element that has an id value of
> "GADetails" and then insert it into the referencing element in the
> referencing XML document. In the example above that would be <gmd:contact>.
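
The download, extract and insert steps could be sketched in Python roughly as
follows. This is only an illustration: fetch() is a hypothetical stand-in for
the HTTP download, the two sample documents are invented, and the relative
href is shortened from the example above.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
GMD = "http://www.isotc211.org/2005/gmd"

# Invented sample documents; only the element names come from the thread.
REFERENCED_DOC = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}">
  <gmd:contact>
    <gmd:CI_ResponsibleParty id="GADetails">
      <gmd:organisationName>Example organisation</gmd:organisationName>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
</gmd:MD_Metadata>"""

REFERENCING_DOC = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:contact xlink:href="GAOpenDaySeries.xml#GADetails"/>
</gmd:MD_Metadata>"""


def fetch(url):
    """Hypothetical stand-in for an HTTP download; returns a parsed document."""
    documents = {"GAOpenDaySeries.xml": REFERENCED_DOC}
    return ET.fromstring(documents[url])


def resolve_xlinks(root):
    """Replace each xlink:href with a copy of the element it points to."""
    # Snapshot the elements first so the tree can be modified safely.
    for element in list(root.iter()):
        href = element.attrib.get(XLINK_HREF)
        if href is None:
            continue
        url, _, fragment = href.partition("#")
        target = fetch(url).find(f".//*[@id='{fragment}']")
        if target is None:
            continue  # leave an unresolved link untouched
        element.append(copy.deepcopy(target))
        del element.attrib[XLINK_HREF]
    return root


resolved = resolve_xlinks(ET.fromstring(REFERENCING_DOC))
contact = resolved.find(f"{{{GMD}}}contact")
```

After the call, <gmd:contact> holds a copy of the CI_ResponsibleParty element
and no longer carries the xlink:href attribute.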
>
> Of course the application has to check that the content of the referenced
> xlink matches the structure expected by the referencing object. This could
> be resolved by the validation process.
>
> In the above example the metadata record is directly available by a URL.
> However, I expect that the xlink references in GN should use a URL that
> calls GN and passes the fileIdentifier as a parameter. For example:
>
> http://hostname:portnumber/geonetworkService?fileIdentifier="uuidNumber"#GADetails
>
> The software should call up the XML document delivered by GeoNetwork,
> identified by the uuidNumber of the fileIdentifier, and then extract and
> deliver the element whose "id" attribute has the value "GADetails".
>
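
Splitting such a URL into the record identifier and the element id needs
nothing beyond the Python standard library. A small sketch follows; the
localhost address and the "abc-123" identifier are assumptions, and the
"geonetworkService" endpoint is only the placeholder used above, not a real
GeoNetwork service name.

```python
from urllib.parse import parse_qs, urlsplit


def split_xlink(href):
    """Return (fileIdentifier, element id) from a GeoNetwork-style xlink URL."""
    parts = urlsplit(href)
    identifiers = parse_qs(parts.query).get("fileIdentifier", [])
    # Tolerate the quoted form fileIdentifier="..." shown in the example URL.
    file_identifier = identifiers[0].strip('"') if identifiers else None
    return file_identifier, parts.fragment or None


href = "http://localhost:8080/geonetworkService?fileIdentifier=abc-123#GADetails"
file_identifier, element_id = split_xlink(href)
```

A URL without a fileIdentifier parameter or without a fragment yields None
in the corresponding position, so the caller can decide how to report it.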
> There are many questions to answer in this implementation, such as: what
> URIs are allowed? Do we allow URNs? If so, how are they resolved to an
> actual web service? I would suggest that we keep it simple to start with.
> Also, what happens if the called XML element has sub-elements that
> themselves have xlink references? I suggest that these should also be
> resolved at the time of access, so that xlink resolution is a cascading
> process. Another question is what the application should do if the id or
> uuidNumber does not exist. Etc.
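
One possible answer to the cascading and "does not exist" questions, sketched
in Python. Everything here is an assumption for illustration: the DOCUMENTS
store stands in for documents fetched by URL, the element names are
simplified and carry no gmd namespace, and leaving an unresolvable
xlink:href in place is only one of several reasonable policies.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XLINK_NS = 'xmlns:xlink="http://www.w3.org/1999/xlink"'

# Hypothetical in-memory store standing in for documents fetched by URL.
DOCUMENTS = {
    "record.xml": f'<record {XLINK_NS}><contact xlink:href="party.xml#p1"/></record>',
    "party.xml": (
        f'<parties {XLINK_NS}><party id="p1">'
        '<address xlink:href="address.xml#a1"/>'
        '<phone xlink:href="address.xml#missing"/>'
        '</party></parties>'
    ),
    "address.xml": '<addresses><address id="a1"><city>Example City</city></address></addresses>',
}


def lookup(href):
    """Return a copy of the element an href points to, or None if absent."""
    url, _, fragment = href.partition("#")
    source = DOCUMENTS.get(url)
    if source is None:
        return None
    target = ET.fromstring(source).find(f".//*[@id='{fragment}']")
    return copy.deepcopy(target) if target is not None else None


def resolve(element, active=()):
    """Resolve xlinks depth-first; `active` holds hrefs being expanded."""
    href = element.attrib.get(XLINK_HREF)
    if href is not None and href not in active:  # skip circular references
        target = lookup(href)
        if target is not None:
            del element.attrib[XLINK_HREF]
            element.append(target)
            active = active + (href,)
        # A missing document or id leaves the xlink:href in place.
    for child in element:
        resolve(child, active)
    return element


root = resolve(ET.fromstring(DOCUMENTS["record.xml"]))
```

Here the contact is expanded, the address inside it is expanded in turn, and
the phone link, whose id does not exist, is left as an xlink:href so that a
later validation step can report it.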
>
> How can the application determine what elements should be reusable?
> --------------------------------------------------------------------
>
> It's all in the attributes. If an element can have an "id" attribute and it
> also has a parent element that can have an "xlink" attribute then it should
> be storable as a reusable element. For example, gmd:CI_Date can have an
> "id" attribute and its parent "gmd:date" can have an "xlink" attribute.
> Therefore it should be storable as a reusable component.
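
The rule itself is small enough to write down directly. In this Python sketch
the two sets are a hand-written, hypothetical extract; a real tool would
derive them by processing the XSDs, which is not shown here. The "ex:" pair
is made up purely to show a rejection.

```python
def is_reusable(element, parent, can_have_id, can_have_xlink):
    """Apply the rule: the element takes an id and its parent takes an xlink."""
    return element in can_have_id and parent in can_have_xlink


# Hypothetical, hand-written extract; not generated from the real schemas.
CAN_HAVE_ID = {"gmd:CI_Date", "gmd:CI_ResponsibleParty"}
CAN_HAVE_XLINK = {"gmd:date", "gmd:contact"}

# (element, parent) pairs to test against the rule.
PAIRS = [
    ("gmd:CI_Date", "gmd:date"),
    ("gmd:CI_ResponsibleParty", "gmd:contact"),
    ("ex:PlainValue", "ex:wrapper"),
]

reusable = [
    element
    for element, parent in PAIRS
    if is_reusable(element, parent, CAN_HAVE_ID, CAN_HAVE_XLINK)
]
```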
>
> I expect that you immediately worry about some elements that may not be
> suitable as reusable elements, such as "gmd:MD_DataIdentification", because
> why would you want to have duplicate gmd:identificationInfo? The answer is
> "not likely", but it is possible that an organisation would want to have the
> exact same metadata record but with a different metadataStandardName. For
> example, I may want to show a metadata record in "ANZLIC Metadata Profile"
> metadataStandardName format and the same information in "Australian Marine
> Community Profile" metadataStandardName. The only things that would change
> are the fileIdentifiers, because they are unique, and some of the structure
> elements that are appropriate for the "Australian Marine Community Profile".
>
>
> I believe that there are better ways to present metadata information in two
> profiles, such as using XSLTs and doing on-the-fly transformations. However,
> it may be something that an organisation wants for its particular
> business rules, and who are we to prevent that?
>
> Configuration of reusable elements.
> ------------------------------------
>
> It is now logical to discuss which elements should be restricted from being
> reusable. This will be determined by the business rules of the organisation
> using GN. As shown in the previous section, if the organisation believes
> that there should not be duplicate "gmd:MD_DataIdentification"s then they can
> remove this element from some sort of configuration file that has been
> generated by processing the XSDs of the available profile. An organisation
> may want to make an element reusable in one profile and not reusable in
> another, so each profile should have a 'reusable elements' configuration
> file.
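
As an illustration of such a per-profile configuration, here is a Python
sketch. The JSON layout, the "exclude" key and the candidate list are all
assumptions; nothing like this exists in GN at the time of writing.

```python
import json

# Hypothetical candidates, as if generated by processing the XSDs.
CANDIDATES = ["gmd:CI_Date", "gmd:CI_ResponsibleParty", "gmd:MD_DataIdentification"]

# Hypothetical configuration: each profile lists the elements that the
# organisation's business rules remove from the reusable set.
CONFIG_TEXT = """
{
  "ANZLIC Metadata Profile": {"exclude": ["gmd:MD_DataIdentification"]},
  "Australian Marine Community Profile": {"exclude": []}
}
"""


def reusable_for(profile, config, candidates=CANDIDATES):
    """Return the candidates that remain reusable for the given profile."""
    excluded = set(config.get(profile, {}).get("exclude", []))
    return [name for name in candidates if name not in excluded]


config = json.loads(CONFIG_TEXT)
anzlic = reusable_for("ANZLIC Metadata Profile", config)
marine = reusable_for("Australian Marine Community Profile", config)
```

With this layout the same element can be reusable in one profile and not in
another, and a profile with no entry simply keeps every candidate.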
>
> I have probably made this reply too long but it is something that will have
> to be considered when xlink is implemented.
>
> Thanks.
>
>
> John
>
>> -----Original Message-----
>> From: Lagarde Pierre [mailto:P.Lagarde@anonymised.com]
>> Sent: Tuesday, 8 January 2008 7:16 PM
>> To: Jeroen Ticheler; Hockaday John; François Prunayre
>> Cc: geonetwork-users@lists.sourceforge.net
>> Subject: RE: [GeoNetwork-users] "Duplicate" metadata,
>> Datasets and Series [SEC=UNCLASSIFIED]
>>
>>
>> Hi all,
>>
>> To summarise, two types of links could be defined with regard to the
>> standards:
>>
>> 1 / the relation between two metadata records: child / parent
>> (dataset/series) or Service 19119 MD / Data 19139 MD. The
>> standards include a reference between these metadata files.
>>
>> 2 / the relation between a part of the metadata and a
>> repository (SRS, thesaurus, responsible parties, ...).
>> Currently these repositories are used with a "copy-paste"
>> approach: the element is copied into the XML file. This issue is
>> outside the scope of the standards and is more a matter of software
>> specification. A solution could be to use an ebXML
>> implementation or to work with a relational database. Indeed,
>> the xlink/xinclude solution seems to be more efficient in
>> the case of GN.
>>
>> And we could perhaps merge points 1) and 2) by using
>> xlink/xinclude to hold the reference between two
>> metadata files.
>>
>> So the "jump" towards implementing the xlink mechanism
>> seems mandatory to me. I could include the implementation in
>> the next version of Geosource, but the development will not
>> start before March 2008...
>>
>> Ciao,
>>
>> Pierre
>>
>> -----Original Message-----
>> From: geonetwork-users-bounces@lists.sourceforge.net
>> [mailto:geonetwork-users-bounces@lists.sourceforge.net] On
>> behalf of Jeroen Ticheler
>> Sent: Tuesday, 8 January 2008 07:54
>> To: John.Hockaday@anonymised.com; François Prunayre
>> Cc: geonetwork-users@lists.sourceforge.net
>> Subject: Re: [GeoNetwork-users] "Duplicate" metadata, Datasets
>> and Series [SEC=UNCLASSIFIED]
>>
>> Hi both,
>> Indeed xlink (or xinclude) could be what we use internally.
>> What goes out to others would be the fully merged metadata
>> document. In the editor, a system very much like the
>> prototype sub-template system could be used, or what BRGM
>> implemented for SRS in geosource (!?).
>> What we should work on is to define the main metadata objects
>> that we want to store as separate entities. Things like the
>> contacts, SRS, citations et cetera.
>> Do we want such implementation to be generic with respect to
>> these objects? And if so, how generic to ensure we can build
>> a usable user interface at the front end? Should we store the
>> duplicates as templates? Should they be stored in separate
>> tables, or simply be flagged as objects in the metadata table?
>> Once we have agreement on those aspects and possibly others,
>> we can define what the implementation should look like.
>> Implementation of the ebXML meta model seems to move us in
>> such a direction and vice versa :-)
>> Ciao, Jeroen
>>
>> On Jan 3, 2008, at 12:23 AM, John.Hockaday@anonymised.com wrote:
>>
>>> Hi Francois,
>>>
>>> Although it hasn't been implemented yet, the proper way to reuse
>>> duplicate information in metadata records is the xlink attribute.
>>>
>>> Although one can use the parentIdentifier to identify the
>>> fileIdentifier of the parent metadata record, I don't know of any way
>>> for the content of the parent metadata record to be included in the
>>> child metadata record other than xlink.
>>>
>>> It may be possible for some Australians to look at incorporating xlink
>>> into GeoNetwork. However we won't have that discussion until March
>>> 2008 and the outcome of that discussion may be that it won't be done.
>>>
>>> I hope that this helps.
>>>
>>>
>>> John
>>>
>>>> -----Original Message-----
>>>> From: geonetwork-users-bounces@lists.sourceforge.net
>>>> [mailto:geonetwork-users-bounces@lists.sourceforge.net] On Behalf Of
>>>> Francois-Xavier Prunayre
>>>> Sent: Wednesday, 2 January 2008 8:50 PM
>>>> To: geonetwork-users@lists.sourceforge.net
>>>> Subject: [GeoNetwork-users] "Duplicate" metadata, Datasets and Series
>>>>
>>>>
>>>> Hi list, I'm looking for some advice/comments on how to deal with
>>>> "duplicate" metadata between datasets and series.
>>>>
>>>> I have to deal with more and more datasets, and sometimes the metadata
>>>> are more or less the same between datasets (i.e. only the title, point
>>>> of contact and bbox are different). These metadata are edited by
>>>> different organisations and sometimes in different catalogues.
>>>>
>>>> It looks like we could set up controls in the metadata editing /
>>>> search tools between a "master" metadata record and its child
>>>> metadata records in order to make editing easy for the user, but it
>>>> seems quite difficult and complex to do so: some sections read-only,
>>>> some sections not ...
>>>>
>>>> Maybe a more generic approach could be to use the relationship between
>>>> series and datasets as defined in ISO and then set up an MD_Metadata
>>>> composed of a series which has 0..n MD_Metadata:
>>>> gmd:series/gmd:DS_Series/gmd:composedOf/gmd:DS_DataSet/gmd:has/gmd:MD_Metadata
>>>>
>>>> Then could we describe the dataset completely in the DS_Series
>>>> element, or have a gmd:MD_Metadata element pointing to another
>>>> metadata record using the uuid attribute? This uuid could be in the
>>>> local node or even in another remote node. Then how do we resolve the
>>>> uuid? Use of xlink:href could be better ...
>>>>
>>>> Is anybody using such a mechanism? Comments are welcome.
>>>> Thanks.
>>>>
>>>> Francois
>>>>
>>>> --------------------------------------------------------------
>>>> -----------
>>>> This SF.net email is sponsored by: Microsoft Defy all challenges.
>>>> Microsoft(R) Visual Studio 2005.
>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>> _______________________________________________
>>>> GeoNetwork-users mailing list
>>>> GeoNetwork-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/geonetwork-users
>>>> GeoNetwork OpenSource is maintained at
>>>> http://sourceforge.net/projects/geonetwork
>>>>
>>>
>>>
>>
>>
>>
>