Hi Andrea,
I am very pleased that you are taking the time to look at these issues.
Please see my comments below:
Thanks.
John
-----Original Message-----
From: geonetwork-devel-admin@lists.sourceforge.net
[mailto:geonetwork-devel-admin@lists.sourceforge.net] On Behalf Of Andrea Carboni
Sent: Saturday, 20 May 2006 12:32 AM
To: geonetwork-devel@lists.sourceforge.net
Subject: Re: [Geonetwork-devel] Validation of XML against ISO 19139 XSDs
and other ISO 19115 rules

Hi John,
here are my answers and comments.
...
> Discussion item 1:
> ==================
>
> To allow flexibility the ISO 19139 XSDs do not provide code lists. For
> example, each of the 24 code lists in section B.5 of ISO 19115 can be
> extended and supplied in different languages. Therefore, it is expected
> that these appropriate code lists will be defined in the profiles that
> implement ISO 19139.

I don't understand this point. Actually, ISO 19139 defines many code lists
(I'm referring to the XML schema, I didn't read the specification).
For each code list, the set of allowed values is not enforced in the XML
schema but simply declared in an external XML file. These values should
obviously be localized.
I have had a look at the XSDs and you are right: there are many code lists.
By definition, a code list can be extended, so there needs to be some
mechanism in GeoNetwork to allow extensions of these code lists to be read.
They should have URIs to identify their schemaLocations. For example, our
organisation may wish to have an extra MD_ScopeCode of 'modelRun', to be
used for the individual run parameters of our complex models (e.g. tsunami
predictions). We would have to import the ISO 19139 code lists, add the
extra 'modelRun' option and refer to the new code list from the "codeList"
attribute of the MD_ScopeCode element.
That is what I mean by flexibility.
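Something along these lines, perhaps (the file name, URL and identifiers
below are purely illustrative, and I am sketching the gmx codelist
catalogue layout from memory):

  <gmx:codelistItem xmlns:gmx="http://www.isotc211.org/2005/gmx"
                    xmlns:gml="http://www.opengis.net/gml">
    <gmx:CodeListDictionary gml:id="MD_ScopeCode">
      <!-- ...the standard ISO 19115 entries, then our extension... -->
      <gmx:codeEntry>
        <gmx:CodeDefinition gml:id="MD_ScopeCode_modelRun">
          <gml:description>parameters of a single model run</gml:description>
          <gml:identifier codeSpace="OurOrg">modelRun</gml:identifier>
        </gmx:CodeDefinition>
      </gmx:codeEntry>
    </gmx:CodeListDictionary>
  </gmx:codelistItem>

A metadata record would then point its "codeList" attribute at the
extended list instead of the standard one:

  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode
        codeList="http://www.ourorg.example/codelists.xml#MD_ScopeCode"
        codeListValue="modelRun">modelRun</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>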
For efficiency, GeoNetwork needs to allow references to local copies of
these code lists, using something like an OASIS XML Catalog file to
resolve the references to the local copies during validation rather than
fetching each code list via its schemaLocation URI.
> Many of the conditional statements in ISO 19115 (shown in comment boxes
> in the UML diagrams) are not validated by the ISO 19139 XSDs. Conditions
> such as: "hierarchyLevel" and "hierarchyLevelName" are mandatory if not
> equal to "dataset"; "topicCategory" is only mandatory if hierarchyLevel
> equals "dataset" or "series"; "GeographicBoundingBox" and/or
> "GeographicDescription" are mandatory if hierarchyLevel equals
> "dataset". These conditions will need to be validated using Schematron
> or XSL. This does not appear to be available in GeoNetwork.
>
> GeoNetwork will need to provide a two-pass process to apply these rules.
> The first validation is against the ISO 19139 XSDs to prove compliance
> with that specification, and the second validation, using Schematron or
> XSLs, proves compliance with the conditional statements, code lists and
> profile extensions of ISO 19115. There will also be the need to convert
> from one profile to ISO 19139 format using some form of XSL.

I agree.
From a programming point of view a two-phase validation (against the
schema and against the rules) is very easy to achieve. The problem here
is the creation of the stylesheet for the second pass.
The ANZLIC ISO 19115 metadata profile work will produce either Schematron
rules or an XSL that enforces these conditions, and maybe some other
ANZLIC-specific rules as well. I expect I will be able to provide you with
this XML once it is available. That may help.
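To give a feel for the kind of rule involved, the hierarchyLevelName
condition might be expressed along these lines in Schematron (untested,
and assuming the gmd namespace and codeListValue attribute of the final
19139 schemas):

  <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:ns prefix="gmd" uri="http://www.isotc211.org/2005/gmd"/>
    <sch:pattern>
      <sch:rule context="gmd:MD_Metadata">
        <!-- hierarchyLevelName is mandatory unless the scope is
             'dataset' (an absent hierarchyLevel implies 'dataset') -->
        <sch:assert test="not(gmd:hierarchyLevel)
            or gmd:hierarchyLevel/gmd:MD_ScopeCode/@codeListValue = 'dataset'
            or gmd:hierarchyLevelName">
          hierarchyLevelName must be supplied when hierarchyLevel is
          not 'dataset'
        </sch:assert>
      </sch:rule>
    </sch:pattern>
  </sch:schema>

The other conditions would follow the same pattern, one assert each.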
Anyway, even a simple validation might not succeed. If GeoNetwork starts
from a valid metadata record it cannot guarantee that after an edit
operation the metadata is still valid. When you create a node in the
metadata you have to create all mandatory subnodes. Some of them are
simple nodes, but others are mandatory choices between several nodes. In
that case the user must supply more information, and until that point the
metadata is not valid.
I agree. There is no need to enforce the rules while the metadata is still
being created. The user may not yet have entered the content for the
mandatory elements, or they may be waiting for the information to come
from some other source. They should be able to "save" the invalid XML so
that they don't lose the work they have already done. They can then go
back, once they have the right information, and add the content for those
mandatory elements. Once they *think* they have added all the content and
are ready to 'publish' the metadata record, the two-pass validation should
be run. They should not be able to make their metadata public unless the
record passes the two-pass validation.
It will help for sure.
Cheers,
Andrea