RE: [Geonetwork-devel] Flexibility and validation of ISO 19139 XML document instances

Hi Jeroen,

Please see my comments to your comments below:

Thanks.

John

-----Original Message-----
From: Jeroen Ticheler [mailto:Jeroen.Ticheler@anonymised.com]
Sent: Tuesday, 23 May 2006 6:45 PM
To: Hockaday John
Cc: acarboni@anonymised.com; geonetwork-devel@lists.sourceforge.net
Subject: Re: [Geonetwork-devel] Flexibility and validation of
ISO 19139 XML document instances

Hi John,
Thanks for all the comments! Will for sure not reply to all ;-) but I
have added some below.
Jeroen

On May 23, 2006, at 5:43 AM, John.Hockaday@anonymised.com wrote:

> Hi Andrea,
>
> I am very pleased that you are taking the time to look at these
> issues.
>
> Please see my comments below:
>
> Thanks.
>
>
> John
>
>> -----Original Message-----
>> From: geonetwork-devel-admin@lists.sourceforge.net
>> [mailto:geonetwork-devel-admin@lists.sourceforge.net] On
>> Behalf Of Andrea Carboni
>> Sent: Saturday, 20 May 2006 12:32 AM
>> To: geonetwork-devel@lists.sourceforge.net
>> Subject: Re: [Geonetwork-devel] Validation of XML against ISO
>> 19139 XSDs and other ISO 19115 rules
>>
>>
>> Hi John,
>>
>> here are my answers and comments.
>> ...
>>> Discussion item 1:
>>> ==================
>>>
>>> To allow flexibility the ISO 19139 XSDs do not provide code
>>> lists. For example, each of the 24 code lists in section B.5 of
>>> ISO 19115 can be extended and supplied in different languages.
>>> Therefore, it is expected that the appropriate code lists will be
>>> defined in the profiles that implement ISO 19139.
>>
>> I don't understand this point. Actually, the 19139 defines many
>> code lists (I'm referring to the xml schema, I didn't read the
>> specification). For each codelist, the set of allowed values is
>> not enforced in the xml schema but simply declared in an external
>> xml file. These values should obviously be localized.
>>
>
> I have had a look at the XSDs and you are right. There are many
> code lists. By definition, a code list can be extended, so there
> needs to be some mechanism in GeoNetwork to allow extensions of
> these code lists to be read. They should have URIs to identify
> their schemaLocations. For example, our organisation may wish to
> have an extra MD_ScopeCode of 'modelRun'. This will be used for
> individual run parameters for our complex models (e.g. tsunami
> predictions). We will have to import the ISO 19139 code lists, add
> the extra 'modelRun' code list option and refer to the new code
> list from the "codeList" attribute of the MD_ScopeCode element.
>
> That is what I mean by flexibility.
>
> GeoNetwork needs to allow one to refer to local copies of these
> code lists for efficiency, using something like an OASIS Catalog
> file to locate the local copies for validation rather than
> accessing the code lists via the schemaLocation URI.

We'll indeed need to define whether such a check can be done by every
instance (and if yes, then how!?), or whether it depends on the
GeoNetwork node what code lists are offered for use in searches. Not
sure here, needs more discussion and thinking on my part to better
understand the implications. E.g. what happens when catalogs work
off-line or are poorly connected, so their specific schemas are
difficult for other nodes to reach while representing data?

An XML instance should contain the URI of any code list that it uses
in the 'codeSpace' attribute of the code. I think that the profile's
XSDs should identify the location of the code lists in the code list
catalogue XSD. A template (an XML file showing all the possible
elements for the profile) could be used to allow GeoNetwork to
identify what code lists are used for that profile. These code lists,
as well as the XSDs for the profiles, *must* be available on a public
website. I believe that the full URL of the code list should be in
the 'schemaLocation' element so that anyone who downloads the XML
document can validate it and/or find the code lists on the web. A
Catalog file could be used to locate a local copy of the code lists.
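As an illustration of what such a reference might look like in an
instance document (the element and attribute names are from ISO
19139, but the code list URL and the 'modelRun' value are
hypothetical extensions, not real ones):

```xml
<!-- Hypothetical example: an MD_ScopeCode drawn from an extended,
     organisation-specific code list. The URL and the 'modelRun'
     value are invented for illustration. -->
<gmd:hierarchyLevel>
  <gmd:MD_ScopeCode
      codeList="http://example.org/codelists/gmxCodelists.xml#MD_ScopeCode"
      codeListValue="modelRun">modelRun</gmd:MD_ScopeCode>
</gmd:hierarchyLevel>
```

A validator can then fetch (or locally resolve) the codeList URL to
check that 'modelRun' is an allowed value.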

A profile *must* provide XSDs and XSL or Schematron for the second
validation pass. Any validation *must* use these XML files, whether
they are accessed via the internet or as local copies of the XSDs
located through an OASIS Catalog file. GeoNetwork could use XML
configuration files that have been generated for that profile using
the XSDs.

One of the purposes of the Catalog files is to allow parsers to work
off-line. I have placed an example Catalog file at the following URL:
http://asdd.ga.gov.au/asdd/work/catalog.xml. If Catalog files are
*not* used, then most parsers will try to access the authoritative
XSDs via the schemaLocation. Hence, Catalog files make more efficient
use of bandwidth and therefore *should* allow faster validation.
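A minimal sketch of such an OASIS Catalog file might look like the
following (the remote URIs and local paths here are invented for
illustration, not taken from the example file above):

```xml
<!-- Hypothetical OASIS XML Catalog: maps remote schema and code
     list URIs to local copies so validation does not need to hit
     the network. All URIs and paths are invented. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Resolve ISO 19139 schema locations to a local directory -->
  <rewriteSystem systemIdStartString="http://www.isotc211.org/2005/gmd/"
                 rewritePrefix="file:///opt/schemas/iso19139/gmd/"/>
  <!-- Resolve a profile's code list file to a local copy -->
  <uri name="http://example.org/codelists/gmxCodelists.xml"
       uri="file:///opt/schemas/codelists/gmxCodelists.xml"/>
</catalog>
```

A catalog-aware parser consults these entries before attempting any
network access, which is what makes off-line validation possible.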

>
>>> Many of the conditional statements in ISO 19115 (shown in comment
>>> boxes in the UML diagrams) are not validated by the ISO 19139
>>> XSDs. Conditions such as: "hierarchyLevel" and
>>> "hierarchyLevelName" are mandatory if not equal to "dataset";
>>> "topicCategory" is only mandatory if hierarchyLevel equals
>>> "dataset" or "series"; "GeographicBoundingBox" and/or
>>> "GeographicDescription" are mandatory if hierarchyLevel equals
>>> "dataset". These conditions will need to be validated using
>>> Schematron or XSL. This does not appear to be available in
>>> GeoNetwork.
>>>
>>> GeoNetwork will need to provide a two-pass process to apply these
>>> rules. The first validation is against the ISO 19139 XSDs to
>>> prove compliance with this specification, and the second
>>> validation uses Schematron or XSLs to prove compliance with the
>>> conditional statements, code lists and profile extensions for
>>> ISO 19115. There will also be the need to convert from one
>>> profile to ISO 19139 format using some form of XSL.
>>
>> I agree.
>>
>> From a programming point of view a two phase validation
>> (against the schema
>> and against the rules) is very easy to achieve. The problem
>> here is the creation
>> of the stylesheet for the second pass.
>>
>
> The ANZLIC ISO 19115 metadata profile will be creating either
> Schematron or
> an XSL that will enforce these rules and maybe some other ANZLIC
> rules. I
> expect that I will be able to provide you with this XML once it is
> available.
> That may help.

Cool, that could be a good starting point! By default the base rules
can be included. How to go about the specific profiles in different
countries is something that needs good thinking and in my mind is
clearly something that will come up in a design for GN3. For GN2 I
think we'll need to work with separate schemas for each profile or
so (Andrea?)

Every profile should have its own XSDs and XSL or Schematron files to
allow validation of the XML document instances for that profile.
There is no other way of *proving* compliance with that profile.
Also, every profile *must* provide an XSL to translate XML from the
profile's format to ISO 19139 format. This will allow proof of
compliance with the ISO 19139 XSDs via validation.
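For instance, the "topicCategory is mandatory if hierarchyLevel
equals 'dataset' or 'series'" condition mentioned earlier could be
expressed as a Schematron rule roughly like this (a sketch only; the
namespace URIs are the usual ISO 19139 ones, but the pattern itself
is illustrative and not taken from the ANZLIC profile):

```xml
<!-- Illustrative Schematron sketch of one ISO 19115 condition:
     topicCategory is mandatory when hierarchyLevel equals
     "dataset" or "series". Not taken from any published profile. -->
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:ns prefix="gmd" uri="http://www.isotc211.org/2005/gmd"/>
  <sch:pattern name="topicCategory condition">
    <sch:rule context="gmd:MD_Metadata">
      <sch:assert test="not(gmd:hierarchyLevel/gmd:MD_ScopeCode
                            [@codeListValue='dataset' or
                             @codeListValue='series'])
                        or gmd:identificationInfo/*/gmd:topicCategory">
        topicCategory is mandatory when hierarchyLevel is
        'dataset' or 'series'.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Each such condition becomes one rule, and the whole set can be run as
the second pass after plain XSD validation.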

>
>> Anyway, even a simple validation could not succeed. If
>> geonetwork starts from a valid metadata it cannot guarantee that
>> after an edit operation the metadata is still valid. When you
>> create a node into the metadata you have to create all mandatory
>> subnodes. Some of them are simple nodes but others are mandatory
>> choices between several nodes. In this case the user must supply
>> more information and until that point the metadata is not valid.
>>
>
> I agree. There is no need to enforce the rules while the metadata
> is still being created. The user may not have yet entered the
> content for the mandatory elements, or they may be waiting for the
> information to come from some other source. They should be able to
> "save" the invalid XML so that they don't lose the work they have
> already done. They can then go back, once they have the right
> information, and add the content for those mandatory elements. Once
> they *think* that they have added all the content and they are
> ready to 'publish' the metadata record, then the two-pass
> validation process should be done. They should not be able to make
> their metadata public unless the metadata record passes the
> two-pass validation process.

This is a new "feature" that would need to be implemented: metadata
not validated => validation flag = false => privileges for viewing
cannot be set to true (or should be reset upon failing validation of
a previously valid record).

Sounds logical to me. ;-) A user will have to prove compliance by
validation before they can make their metadata public. However, there
are some cases where users still want to create metadata but not make
it public; it is only available for internal use. We have imported
XML metadata documents into our system when we acquire other
organisations' data.

A similar use is: we use this metadata in an internal search system
to allow our staff to discover and know that the data is on our
system. We *only* make public the metadata for data that we are
custodians of. We have 720 metadata records that we have made public
and 1231 metadata records *only* available internally, which document
what we have gathered from other sources.

>
>>
>> It will help for sure.
>>
>> Cheers,
>> Andrea
>>
>>
>>
>
>
> _______________________________________________
> Geonetwork-devel mailing list
> Geonetwork-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
> GeoNetwork OpenSource is maintained at
> http://sourceforge.net/projects/geonetwork