[Geonetwork-devel] Flexibility and validation of ISO 19139 XML document instances

Hi Andrea,

I am very pleased that you are taking the time to look at these issues.

Please see my comments below:

Thanks.

John

-----Original Message-----
From: geonetwork-devel-admin@lists.sourceforge.net
[mailto:geonetwork-devel-admin@lists.sourceforge.net] On Behalf Of Andrea Carboni
Sent: Saturday, 20 May 2006 12:32 AM
To: geonetwork-devel@lists.sourceforge.net
Subject: Re: [Geonetwork-devel] Validation of XML against ISO 19139 XSDs and other ISO 19115 rules

Hi John,

here are my answers and comments.
...
> Discussion item 1:
> ==================
>
> To allow flexibility the ISO 19139 XSDs do not provide code lists. For
> example, each of the 24 code lists in section B.5 of ISO 19115 can be
> extended and supplied in different languages. Therefore, it is expected
> that these appropriate code lists will be defined in the profiles that
> implement ISO 19139.

I don't understand this point. Actually, ISO 19139 defines many code lists
(I'm referring to the XML schema; I didn't read the specification). For each
code list, the set of allowed values is not enforced in the XML schema but
simply declared in an external XML file. These values should obviously be
localized.

I have had a look at the XSDs and you are right. There are many code lists.
Part of the definition of a code list is that it can be extended, so there
needs to be some mechanism in GeoNetwork to allow extensions of these code
lists to be read. They should have URIs to identify their schemaLocations.
For example, our organisation may wish to have an extra MD_ScopeCode of
'modelRun', to be used for individual run parameters for our complex models
(e.g. tsunami predictions). We will have to import the ISO 19139 code lists,
add the extra 'modelRun' option and refer to the new code list from the
"codeList" attribute of the MD_ScopeCode element.

That is what I mean by flexibility.

GeoNetwork needs to allow one to refer to local copies of these code lists
for efficiency, and to use something like an OASIS catalog file to resolve
references to the local copies for validation rather than accessing the
code list via the schemaLocation URI.
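
An OASIS catalog entry for that could be as simple as the following sketch
(the file path and the published URI are invented for illustration):

  <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- resolve the published code list URI to a local copy so that
         validation doesn't depend on network access -->
    <uri name="http://www.example.org/anzlic/gmxCodelists.xml"
         uri="file:///usr/local/geonetwork/codelists/gmxCodelists.xml"/>
  </catalog>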

> Many of the conditional statements in ISO 19115 (shown in comment boxes in
> the UML diagrams) are not validated by the ISO 19139 XSDs. Conditions such
> as: "hierarchyLevel" and "hierarchyLevelName" are mandatory if not equal to
> "dataset"; "topicCategory" is only mandatory if hierarchyLevel equals
> "dataset" or "series"; "GeographicBoundingBox" and/or
> "GeographicDescription" are mandatory if hierarchyLevel equals "dataset".
> These conditions will need to be validated using Schematron or XSL. This
> does not appear to be available in GeoNetwork.
>
> GeoNetwork will need to provide a two-pass process to apply these rules.
> The first validation is against the ISO 19139 XSDs to prove compliance to
> this specification, and the second validation uses Schematron or XSLs to
> prove compliance of the conditional statements, code lists and profile
> extensions for ISO 19115. There will also be the need to convert from one
> profile to ISO 19139 format using some form of XSL.

I agree.

From a programming point of view, a two-phase validation (against the
schema and against the rules) is very easy to achieve. The problem here is
the creation of the stylesheet for the second pass.

The ANZLIC ISO 19115 metadata profile will be creating either Schematron
rules or an XSL that enforces these conditions, and maybe some other ANZLIC
rules. I expect that I will be able to provide you with this XML once it is
available. That may help.
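
As a rough sketch (from memory, so the XPaths and prefixes should be
double-checked), one of the conditional rules above might look like this in
Schematron:

  <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:ns prefix="gmd" uri="http://www.isotc211.org/2005/gmd"/>
    <sch:pattern id="topicCategory-condition">
      <sch:rule context="gmd:MD_Metadata">
        <!-- topicCategory is mandatory when hierarchyLevel is
             'dataset' or 'series' -->
        <sch:assert test="not(gmd:hierarchyLevel/gmd:MD_ScopeCode/@codeListValue = 'dataset'
                          or gmd:hierarchyLevel/gmd:MD_ScopeCode/@codeListValue = 'series')
                          or gmd:identificationInfo/*/gmd:topicCategory">
          topicCategory is mandatory when hierarchyLevel is 'dataset'
          or 'series'.
        </sch:assert>
      </sch:rule>
    </sch:pattern>
  </sch:schema>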

Anyway, even a simple validation might not succeed. If GeoNetwork starts
from valid metadata, it cannot guarantee that after an edit operation the
metadata is still valid. When you create a node in the metadata you have to
create all mandatory subnodes. Some of them are simple nodes, but others
are mandatory choices between several nodes. In that case the user must
supply more information, and until that point the metadata is not valid.

I agree. There is no need to enforce the rules while the metadata is still
being created. The user may not yet have entered the content for the
mandatory elements, or they may be waiting for the information to come from
some other source. They should be able to "save" the invalid XML so that
they don't lose the work they have already done. They can then go back,
once they have the right information, and add the content for those
mandatory elements. Once they *think* they have added all the content and
are ready to 'publish' the metadata record, the two-pass validation process
should be run. They should not be able to make their metadata public unless
the record passes that two-pass validation process.

It will help for sure.

Cheers,
Andrea

Hi John,
Thanks for all the comments! Will for sure not reply to all;-) but I have added some below.
Jeroen

On May 23, 2006, at 5:43 AM, John.Hockaday@anonymised.com wrote:

> Hi Andrea,
>
> I am very pleased that you are taking the time to look at these issues.
>
> Please see my comments below:
>
> Thanks.
>
> John
>
> -----Original Message-----
> From: geonetwork-devel-admin@lists.sourceforge.net
> [mailto:geonetwork-devel-admin@lists.sourceforge.net] On Behalf Of Andrea Carboni
> Sent: Saturday, 20 May 2006 12:32 AM
> To: geonetwork-devel@lists.sourceforge.net
> Subject: Re: [Geonetwork-devel] Validation of XML against ISO 19139 XSDs and other ISO 19115 rules
>
> Hi John,
>
> here are my answers and comments.
> ...

>> Discussion item 1:
>> ==================
>>
>> To allow flexibility the ISO 19139 XSDs do not provide code lists. For
>> example, each of the 24 code lists in section B.5 of ISO 19115 can be
>> extended and supplied in different languages. Therefore, it is expected
>> that these appropriate code lists will be defined in the profiles that
>> implement ISO 19139.

> I don't understand this point. Actually, ISO 19139 defines many code lists
> (I'm referring to the XML schema; I didn't read the specification). For each
> code list, the set of allowed values is not enforced in the XML schema but
> simply declared in an external XML file. These values should obviously be
> localized.

> I have had a look at the XSDs and you are right. There are many code lists.
> Part of the definition of a code list is that it can be extended, so there
> needs to be some mechanism in GeoNetwork to allow extensions of these code
> lists to be read. They should have URIs to identify their schemaLocations.
> For example, our organisation may wish to have an extra MD_ScopeCode of
> 'modelRun', to be used for individual run parameters for our complex models
> (e.g. tsunami predictions). We will have to import the ISO 19139 code lists,
> add the extra 'modelRun' option and refer to the new code list from the
> "codeList" attribute of the MD_ScopeCode element.
>
> That is what I mean by flexibility.
>
> GeoNetwork needs to allow one to refer to local copies of these code lists
> for efficiency, and to use something like an OASIS catalog file to resolve
> references to the local copies for validation rather than accessing the
> code list via the schemaLocation URI.

We'll indeed need to define whether such a check can be done by every instance (and if yes, then how!?), or whether it depends on the GeoNetwork node which code lists are offered for use in searches. Not sure here; this needs more discussion and thinking on my part to better understand the implications. E.g. what happens when catalogs work off-line or are poorly connected, so that their specific schemas are difficult for other nodes to reach when presenting data?

>> Many of the conditional statements in ISO 19115 (shown in comment boxes in
>> the UML diagrams) are not validated by the ISO 19139 XSDs. Conditions such
>> as: "hierarchyLevel" and "hierarchyLevelName" are mandatory if not equal to
>> "dataset"; "topicCategory" is only mandatory if hierarchyLevel equals
>> "dataset" or "series"; "GeographicBoundingBox" and/or
>> "GeographicDescription" are mandatory if hierarchyLevel equals "dataset".
>> These conditions will need to be validated using Schematron or XSL. This
>> does not appear to be available in GeoNetwork.
>>
>> GeoNetwork will need to provide a two-pass process to apply these rules.
>> The first validation is against the ISO 19139 XSDs to prove compliance to
>> this specification, and the second validation uses Schematron or XSLs to
>> prove compliance of the conditional statements, code lists and profile
>> extensions for ISO 19115. There will also be the need to convert from one
>> profile to ISO 19139 format using some form of XSL.

> I agree.
>
> From a programming point of view, a two-phase validation (against the
> schema and against the rules) is very easy to achieve. The problem here is
> the creation of the stylesheet for the second pass.

> The ANZLIC ISO 19115 metadata profile will be creating either Schematron
> rules or an XSL that enforces these conditions, and maybe some other ANZLIC
> rules. I expect that I will be able to provide you with this XML once it is
> available. That may help.

Cool, that could be a good starting point! By default the base rules can be included. How to go about the specific profiles in different countries is something that needs good thinking, and in my mind is clearly something that will come up in the design for GN3. For GN2 I think we'll need to work with separate schemas for each profile or so (Andrea?)

> Anyway, even a simple validation might not succeed. If GeoNetwork starts
> from valid metadata, it cannot guarantee that after an edit operation the
> metadata is still valid. When you create a node in the metadata you have to
> create all mandatory subnodes. Some of them are simple nodes, but others
> are mandatory choices between several nodes. In that case the user must
> supply more information, and until that point the metadata is not valid.

> I agree. There is no need to enforce the rules while the metadata is still
> being created. The user may not yet have entered the content for the
> mandatory elements, or they may be waiting for the information to come from
> some other source. They should be able to "save" the invalid XML so that
> they don't lose the work they have already done. They can then go back,
> once they have the right information, and add the content for those
> mandatory elements. Once they *think* they have added all the content and
> are ready to 'publish' the metadata record, the two-pass validation process
> should be run. They should not be able to make their metadata public unless
> the record passes that two-pass validation process.

This is a new "feature" that would need to be implemented: Metadata not validated => validation flag = false => privileges for viewing can not be set to true (or should be reset upon failing validation of a previously valid one).

> It will help for sure.
>
> Cheers,
> Andrea


Hi John,

please see my comments below:

>> I don't understand this point. Actually, ISO 19139 defines many code lists
>> (I'm referring to the XML schema; I didn't read the specification). For each
>> code list, the set of allowed values is not enforced in the XML schema but
>> simply declared in an external XML file. These values should obviously be
>> localized.

> I have had a look at the XSDs and you are right. There are many code lists.
> Part of the definition of a code list is that it can be extended, so there
> needs to be some mechanism in GeoNetwork to allow extensions of these code
> lists to be read. They should have URIs to identify their schemaLocations.
> For example, our organisation may wish to have an extra MD_ScopeCode of
> 'modelRun', to be used for individual run parameters for our complex models
> (e.g. tsunami predictions). We will have to import the ISO 19139 code lists,
> add the extra 'modelRun' option and refer to the new code list from the
> "codeList" attribute of the MD_ScopeCode element.
>
> That is what I mean by flexibility.
>
> GeoNetwork needs to allow one to refer to local copies of these code lists
> for efficiency, and to use something like an OASIS catalog file to resolve
> references to the local copies for validation rather than accessing the
> code list via the schemaLocation URI.

The idea is to take the codelist values directly from the file inside the 19139
schema (resources/Codelist/gmxCodelists.xml). What should be discussed is
how to address custom changes: just change that file or create another file
with only the extras (to avoid the cut'n'paste every time the schema is upgraded)?
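
If we go for a separate extras file, I would expect it to mirror the
structure of gmxCodelists.xml so that the two files can be merged when the
schema is loaded. Something along these lines (the identifiers and
codeSpace are invented, and the exact element structure should be checked
against the real file):

  <gmx:CT_CodelistCatalogue xmlns:gmx="http://www.isotc211.org/2005/gmx"
                            xmlns:gml="http://www.opengis.net/gml">
    <gmx:codelistItem>
      <!-- extra entries for MD_ScopeCode, merged with the base list -->
      <gmx:CodeListDictionary gml:id="MD_ScopeCode">
        <gml:description>extra scope codes added by this node</gml:description>
        <gml:identifier codeSpace="example">MD_ScopeCode</gml:identifier>
        <gmx:codeEntry>
          <gmx:CodeDefinition gml:id="MD_ScopeCode_modelRun">
            <gml:description>an individual run of a predictive model</gml:description>
            <gml:identifier codeSpace="example">modelRun</gml:identifier>
          </gmx:CodeDefinition>
        </gmx:codeEntry>
      </gmx:CodeListDictionary>
    </gmx:codelistItem>
  </gmx:CT_CodelistCatalogue>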

>> From a programming point of view, a two-phase validation (against the
>> schema and against the rules) is very easy to achieve. The problem here is
>> the creation of the stylesheet for the second pass.

> The ANZLIC ISO 19115 metadata profile will be creating either Schematron
> rules or an XSL that enforces these conditions, and maybe some other ANZLIC
> rules. I expect that I will be able to provide you with this XML once it is
> available. That may help.

Good.

>> Anyway, even a simple validation might not succeed. If GeoNetwork starts
>> from valid metadata, it cannot guarantee that after an edit operation the
>> metadata is still valid. When you create a node in the metadata you have to
>> create all mandatory subnodes. Some of them are simple nodes, but others
>> are mandatory choices between several nodes. In that case the user must
>> supply more information, and until that point the metadata is not valid.

> I agree. There is no need to enforce the rules while the metadata is still
> being created. The user may not yet have entered the content for the
> mandatory elements, or they may be waiting for the information to come from
> some other source. They should be able to "save" the invalid XML so that
> they don't lose the work they have already done. They can then go back,
> once they have the right information, and add the content for those
> mandatory elements. Once they *think* they have added all the content and
> are ready to 'publish' the metadata record, the two-pass validation process
> should be run. They should not be able to make their metadata public unless
> the record passes that two-pass validation process.

Ok.

Cheers,
Andrea