[GeoNetwork-devel] [GeoNetwork opensource Developer website] #1043: xml resolver should only have oasis catalogs for relevant schema when doing validation

#1043: xml resolver should only have oasis catalogs for relevant schema when doing
validation
---------------------+------------------------------------------------------
Reporter: simonp | Owner: geonetwork-devel@…
     Type: defect | Status: new
Priority: major | Milestone: v2.8.0 RC0
Component: General | Version: v2.8.0RC0
Keywords: |
---------------------+------------------------------------------------------
As an example: the codelist schematron uses the XSLT document() function
to retrieve the codelist XML document referred to by each codelist
element in the metadata, e.g. when processing this element:

<gmd:characterSet>
     <gmd:MD_CharacterSetCode
codeList="http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml#MD_CharacterSetCode"
codeListValue="utf8" />
</gmd:characterSet>

the codelist schematron will attempt to load the document
http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml to check that the
codeListValue attribute is set to something that actually exists. These
URLs should be mapped to a local file using an oasis catalog for each
schema. If you switch on the debugging output for the xml resolver in
GeoNetwork (in WEB-INF/log4j.cfg, log4j.logger.jeeves.xmlresolver = DEBUG)
you should see something like the following in your logs showing how these
URLs are resolved to local file paths in the codelist validation
schematron:
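Here is a minimal Python sketch of the lookup described above. It assumes a catalog that holds only uri entries, and it resolves each uri attribute against the catalog file's own directory. A real OASIS catalog resolver does more than this. The function names, the catalog contents and the oasis-catalog.xml file name are illustrative, not GeoNetwork's jeeves.xmlresolver code. The sketch shows how a codeList attribute splits into a document URL plus a # fragment, and how only the document part is mapped to a local file.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Namespace used by OASIS XML Catalog files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def load_uri_map(catalog_xml: str, catalog_path: str) -> dict[str, str]:
    """Map each <uri name="..."> to a local path relative to the catalog file."""
    catalog_dir = posixpath.dirname(catalog_path)
    root = ET.fromstring(catalog_xml)
    return {
        entry.get("name"): posixpath.join(catalog_dir, entry.get("uri"))
        for entry in root.iter(f"{{{CATALOG_NS}}}uri")
    }


def resolve_codelist(code_list: str, uri_map: dict[str, str]) -> tuple[str, str]:
    """Split codeList="doc#Dictionary" and map the doc part to a local file if known."""
    doc_url, fragment = urldefrag(code_list)
    # Unmapped URLs fall through unchanged, i.e. they would be fetched from the web.
    return uri_map.get(doc_url, doc_url), fragment


# Illustrative catalog content, not copied from the real iso19139.anzlic plugin.
catalog = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml"
       uri="schema/resources/Codelist/gmxCodelists.xml"/>
</catalog>"""
plugin = "/usr/local/jakarta/tomcat-geonetwork/data/config/schema_plugins/iso19139.anzlic"
uri_map = load_uri_map(catalog, plugin + "/oasis-catalog.xml")
print(resolve_codelist(
    "http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml#MD_CharacterSetCode", uri_map))
```

The fragment (MD_CharacterSetCode) is kept separately because it names the dictionary inside the codelist file. It is not part of the file location.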

2012-09-12 00:22:11,722 DEBUG [jeeves.xmlresolver] - Trying to resolve http://asdd.ga.gov.au/asdd/profileinfo/GAScopeCodeList.xml:file:/usr/local/jakarta/tomcat-geonetwork/data/config/schema_plugins/iso19139.anzlic/schematron-rules-iso-codeListValidation.xsl
2012-09-12 00:22:11,723 DEBUG [jeeves.xmlresolver] - Resolved as file:/usr/local/jakarta/tomcat-geonetwork/data/config/schema_plugins/iso19139.anzlic/schema/resources/Codelist/GAScopeCodeList.xml
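After the codelist document has been resolved, the schematron checks that the codeListValue actually exists in it. The Python sketch below approximates that check. It assumes the usual ISO 19139 gmx layout (gmx:CodeListDictionary with a gml:id, containing gmx:codeEntry/gmx:CodeDefinition/gml:identifier). The function names and the sample file are hypothetical, and the sketch is not the actual schematron rule.

```python
import xml.etree.ElementTree as ET

GMX = "http://www.isotc211.org/2005/gmx"
GML = "http://www.opengis.net/gml"
NS = {"gmx": GMX, "gml": GML}


def codelist_values(codelist_xml: str, dictionary_id: str) -> set[str]:
    """Return the gml:identifier of every code in the named CodeListDictionary."""
    root = ET.fromstring(codelist_xml)
    for dictionary in root.iter(f"{{{GMX}}}CodeListDictionary"):
        if dictionary.get(f"{{{GML}}}id") == dictionary_id:
            idents = dictionary.findall(
                "gmx:codeEntry/gmx:CodeDefinition/gml:identifier", NS)
            return {i.text.strip() for i in idents if i.text}
    return set()  # dictionary not present in this (possibly wrong) codelist file


def code_is_valid(codelist_xml: str, dictionary_id: str, value: str) -> bool:
    return value in codelist_values(codelist_xml, dictionary_id)


# Hypothetical, heavily trimmed codelist file with a single code.
sample = f"""<gmx:CT_CodelistCatalogue xmlns:gmx="{GMX}" xmlns:gml="{GML}">
  <gmx:codelistItem>
    <gmx:CodeListDictionary gml:id="MD_CharacterSetCode">
      <gmx:codeEntry>
        <gmx:CodeDefinition gml:id="MD_CharacterSetCode_utf8">
          <gml:identifier codeSpace="ISOTC211/19115">utf8</gml:identifier>
        </gmx:CodeDefinition>
      </gmx:codeEntry>
    </gmx:CodeListDictionary>
  </gmx:codelistItem>
</gmx:CT_CodelistCatalogue>"""

print(code_is_valid(sample, "MD_CharacterSetCode", "utf8"))
```

If the resolver hands back a codelist file that belongs to another schema, the dictionary or the value may be missing, and a perfectly good record fails the check.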

Basically the bug is that only the oasis catalog for the schema against
which a record is being validated should be loaded into the resolver when
validating. At present *all* oasis catalogs are loaded, which means the
resolver may end up resolving a URL (in this case a codelist URL) to a
file that belongs to a different schema, and thus the values may not be
found. I saw this when looking at it a little earlier: for example, the
log shows a codelist file for the iso19139.mcp-1.4 schema being used in
the validation of an iso19139.anzlic record:

2012-09-12 00:22:11,715 DEBUG [jeeves.xmlresolver] - Trying to resolve http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml:file:/usr/local/jakarta/tomcat-geonetwork/data/config/schema_plugins/iso19139.anzlic/schematron-rules-iso-codeListValidation.xsl
2012-09-12 00:22:11,716 DEBUG [jeeves.xmlresolver] - Resolved as file:/usr/local/jakarta/tomcat-geonetwork/data/config/schema_plugins/iso19139.mcp-1.4/schema/resources/Codelist/gmxCodelists.xml
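The Python sketch below models the difference between one resolver holding every schema's catalog and a resolver that consults only the record's own schema. It assumes that, when the same URL is mapped in several catalogs, the first catalog loaded wins. That is a simplification and not a description of the actual resolver library's precedence rules. The paths are shortened and the function names are hypothetical.

```python
# Each schema's catalog as an ordered (schema, {url: local path}) pair.
Catalogs = list[tuple[str, dict[str, str]]]


def resolve_merged(url: str, catalogs: Catalogs) -> str:
    """Singleton-style resolver: every catalog loaded, first match wins (assumed)."""
    for _schema, mapping in catalogs:
        if url in mapping:
            return mapping[url]
    return url


def resolve_for_schema(url: str, schema: str, catalogs: Catalogs) -> str:
    """Per-schema resolver: only the catalog of the record's own schema is consulted."""
    return dict(catalogs).get(schema, {}).get(url, url)


URL = "http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml"
catalogs: Catalogs = [
    ("iso19139.mcp-1.4", {URL: "iso19139.mcp-1.4/schema/resources/Codelist/gmxCodelists.xml"}),
    ("iso19139.anzlic", {URL: "iso19139.anzlic/schema/resources/Codelist/gmxCodelists.xml"}),
]
print(resolve_merged(URL, catalogs))                         # mcp-1.4 copy wins
print(resolve_for_schema(URL, "iso19139.anzlic", catalogs))  # anzlic copy
```

With the merged resolver, the iso19139.anzlic record picks up the mcp-1.4 codelist just because that catalog happened to be loaded first, which matches the log above.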

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1043>
GeoNetwork opensource Developer website <http://sourceforge.net/projects/geonetwork/>
GeoNetwork opensource is a standards based, Free and Open Source catalog application to manage spatially referenced resources through the web. It provides powerful metadata editing and search functions as well as an embedded interactive web map viewer. This website contains information related to the development of the software.


Comment(by simonp):

(XML)Resolver is a singleton that is initialized with the oasis catalogs.
It is passed to all the different XML features that can use it (eg.
SAXBuilder for parsing XML, Xerces validation stuff for validator and the
XSLT transformer (saxon)).

codelist URLs: codelist URLs used in metadata records that belong to a
profile should actually point to the codelist that is used for the
profile. eg. using http://www.isotc211.org/2005/resources/gmxCodelists.xml
(codelists for base iso19115/19139) for the codelist of a profile is
incorrect if the profile actually uses a different codelist to that
provided by the base standard.

As a result, rather than try to change the code, it seems better to
document that:

- codelist urls should point to the actual codelist being used by the
profile
- each oasis catalog entry should appear in only one oasis catalog file -
duplicates may cause odd behaviour (probably depending on which schema
gets loaded first)

So, at this stage, I think the fix for this bug will be:

- to tidy up the oasis catalogs of all schema plugins (remove duplicates)
- ensure profiles have URLs etc that point to the actual resources they
use (via update-fixed-info.xsl)
- document what to specify in the oasis catalog of a profile in the GN
schema plugin doco
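One way to carry out the first step (removing duplicates) is to scan all the plugin catalogs and report any uri name that is mapped more than once. The Python sketch below does that. The catalog contents are built by a small illustrative helper and do not reflect what any real schema plugin contains. Reading the actual oasis-catalog.xml files from disk is left out.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def duplicate_uri_names(catalogs: dict[str, str]) -> dict[str, list[str]]:
    """catalogs maps schema name -> catalog XML text.

    Returns every <uri name> claimed more than once, with the schemas claiming it.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for schema, xml_text in catalogs.items():
        for entry in ET.fromstring(xml_text).iter(f"{{{CATALOG_NS}}}uri"):
            owners[entry.get("name")].append(schema)
    return {name: sorted(s) for name, s in owners.items() if len(s) > 1}


def catalog(*names: str) -> str:
    """Build a tiny illustrative catalog mapping each name to a local codelist copy."""
    entries = "".join(
        f'<uri name="{n}" uri="schema/resources/Codelist/gmxCodelists.xml"/>' for n in names)
    return f'<catalog xmlns="{CATALOG_NS}">{entries}</catalog>'


ASDD = "http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml"
MCP = "http://bluenet3.antcrc.utas.edu.au/mcp-1.4/schema/resources/Codelist/gmxCodelists.xml"
print(duplicate_uri_names({
    "iso19139.anzlic": catalog(ASDD),
    "iso19139.mcp-1.4": catalog(ASDD, MCP),
}))
```

Each reported name then needs a decision about which single schema plugin should own the mapping.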

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1043#comment:1>

Hi Simon,

One of your statements in this bug fix is a bit wrong.

Please see below:

On Wed, 2012-09-12 at 15:32 +0000, GeoNetwork opensource Developer
website wrote:


Comment(by simonp):

<snip>

codelist URLs: codelist URLs used in metadata records that belong to a
profile should actually point to the codelist that is used for the
profile. eg. using http://www.isotc211.org/2005/resources/gmxCodelists.xml
(codelists for base iso19115/19139) for the codelist of a profile is
incorrect if the profile actually uses a different codelist to that
provided by the base standard.

The code lists are extensible and hence they can be extended without the
need to create a new profile. I know that ISO 19115 states that a new
profile is necessary if a code list is extended but that was not the
intent. The intent was to make sure that people made their code lists
accessible via the web so that the information for those extended code
lists can be found. That was more likely if a profile was developed.

I believe this has been changed in ISO 19115-1 and therefore what you
are doing will not be suitable in the future when ISO 19115-1 and ISO
19115-3 are published.

Also, the code list at URL
http://www.isotc211.org/2005/resources/gmxCodelists.xml is invalid. If
you use the code list at that URL there may be some validation errors
that people will not be able to explain because they expect the stated
resources for ISO 19139 to be valid.

As a result, rather than try to change the code, it seems better to
document that:

- codelist urls should point to the actual codelist being used by the
profile

This should not be done as explained above.

I hope that this helps.

John Hockaday

<snip>

John,

I understand that you don't need to create a profile if you extend a codelist. However, you should use a URL in your records that points to the extended codelist (and for performance reasons during schematron validation in GeoNetwork you should map that URL to a local filesystem path in the plugin schema). e.g. If you only wanted to extend the base codelist, then to do this in GeoNetwork, you could do the following to the iso19139 plugin schema:

- change the codelist URL in the update-fixed-info.xsl to your new publicly available codelist
- add a mapping to the iso19139 schema oasis catalog from this URL to a local copy of the codelist file
- add your new codelist values to the localized codelists file used by GeoNetwork

You would not need to create a new profile and plugin schema.
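In GeoNetwork, the first step is done in XSLT (update-fixed-info.xsl). The Python sketch below only illustrates the transformation that step applies to a record. Every codeList attribute pointing at the old document is repointed to the new one, and the #Dictionary fragment is kept. The new codelist URL and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def repoint_codelists(root: ET.Element, old_doc: str, new_doc: str) -> int:
    """Rewrite codeList="old_doc#X" to "new_doc#X" everywhere; return how many changed."""
    changed = 0
    for el in root.iter():
        code_list = el.get("codeList")
        if code_list is None:
            continue
        doc, fragment = urldefrag(code_list)
        if doc == old_doc:
            el.set("codeList", f"{new_doc}#{fragment}" if fragment else new_doc)
            changed += 1
    return changed


OLD = "http://www.isotc211.org/2005/resources/gmxCodelists.xml"
NEW = "https://example.org/codelists/extendedCodelists.xml"  # hypothetical extended codelist
record = ET.fromstring(
    '<gmd:characterSet xmlns:gmd="http://www.isotc211.org/2005/gmd">'
    f'<gmd:MD_CharacterSetCode codeList="{OLD}#MD_CharacterSetCode" codeListValue="utf8"/>'
    '</gmd:characterSet>')
print(repoint_codelists(record, OLD, NEW))
```

The matching catalog entry would then map NEW (without the fragment) to the local copy of the extended codelist.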

Also I take your point about the codelist at http://www.isotc211.org/2005/resources/gmxCodelists.xml - however I was only using it as an example of how some records in profiles that also purport to extend the base ISO codelists are still referring to this codelist (even if it is broken) - no doubt these records need fixing anyway which is the point I was trying to make!

The bug is all about using the correct paths for resources referred to in a metadata record. Presumably this is still consistent with future iterations of the standard (which I suspect will just have modified wording about when you need to create a profile)?

Cheers and thanks,
Simon

________________________________________
From: john.hockaday [john.hockaday@anonymised.com]
Sent: Thursday, 13 September 2012 9:11 AM
To: geonetwork-devel@lists.sourceforge.net; Pigot, Simon (CMAR, Hobart)
Subject: Re: [GeoNetwork-devel] [GeoNetwork opensource Developer website] #1043: xml resolver should only have oasis catalogs for relevant schema when doing validation

Hi Simon, John and Craig,

Apologies for any cross postings....

Following this discussion has allowed me to fix a schematron validation error (actually a failed codelist URI lookup)
with records from the mcp 1.4 schema plugin :-)

The XML resolver was trying to resolve this XML with the codeList URI
http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml#DP_TypeCode

            <mcp:DP_DataParameter>
              <mcp:parameterName>
                <mcp:DP_ParameterName>
                  <mcp:name>
                    <gco:CharacterString>temperature of the water column by expendable bathythermograph (XBT)</gco:CharacterString>
                  </mcp:name>
                  <mcp:type>
                    <mcp:DP_TypeCode codeList="http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml#DP_TypeCode" codeListValue="longName" />
                  </mcp:type>

and the DEBUG output of the XML resolver showed it resolving to:

http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml

which doesn't exist.

This particular URI I think was "made up" a couple of years back when the mcp "data parameters"
extension got introduced but it doesn't resolve to an XML document from the web.

So my fix was to add the following entry to the mcp-1.4 oasis-catalog.xml file:

<uri name="http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml"
      uri="schema/resources/Codelist/gmxCodelists.xml"/>

and now my existing production mcp records pass the codelist schematron
validation step :-)
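To find URIs like this one before they cause failures, you can list every codelist document a record refers to that has no matching uri name in the catalog. The Python sketch below does that. Namespace declarations are left out of the sample record for brevity, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def unmapped_codelists(record_xml: str, catalog_names: set[str]) -> list[str]:
    """List codelist documents referenced by a record that no catalog <uri name> covers."""
    docs = set()
    for el in ET.fromstring(record_xml).iter():
        code_list = el.get("codeList")
        if code_list:
            doc, _fragment = urldefrag(code_list)
            if doc not in catalog_names:
                docs.add(doc)  # would be fetched over the network, or fail
    return sorted(docs)


MCP_CODELISTS = "http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml"
record = ('<record><DP_TypeCode '
          f'codeList="{MCP_CODELISTS}#DP_TypeCode" codeListValue="longName"/></record>')
print(unmapped_codelists(record, set()))            # before the new catalog entry
print(unmapped_codelists(record, {MCP_CODELISTS}))  # after adding it
```

Running something like this over a sample of records would show whether any further catalog entries are needed.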

Craig,

Copying you as I think this particular URI occurs in thousands of mcp records that have "data parameters"
in the AODN MEST.

Will submit a GN bug ticket for this to add this extra entry to the mcp-1.4 oasis-catalog.xml.

Cheers,

Andrew

PS: There appears to be a small typo in the mcp-1.4 oasis catalog file: an unmatched single
quote ' just before the "http", as follows:

<uri name="'http://bluenet3.antcrc.utas.edu.au/mcp/schema/resources/Codelist/gmxCodelists.xml"
      uri="schema/resources/Codelist/gmxCodelists.xml"/>
          http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml

<uri name="'http://bluenet3.antcrc.utas.edu.au/mcp-1.4/schema/resources/Codelist/gmxCodelists.xml"
      uri="schema/resources/Codelist/gmxCodelists.xml"/>
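A stray character like that makes the entry silently useless, because the name can never match a real URL. The Python sketch below is a simple lint check for this: it flags any uri name that does not parse as an http(s) URL. The check itself and its function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def suspicious_uri_names(catalog_xml: str) -> list[str]:
    """Return <uri name> values that do not parse as http(s) URLs, e.g. a stray leading quote."""
    bad = []
    for entry in ET.fromstring(catalog_xml).iter(f"{{{CATALOG_NS}}}uri"):
        name = entry.get("name", "")
        if urlsplit(name).scheme not in ("http", "https"):
            bad.append(name)
    return bad


catalog = f"""<catalog xmlns="{CATALOG_NS}">
  <uri name="http://bluenet3.antcrc.utas.edu.au/mcp/resources/Codelist/gmxCodelists.xml"
       uri="schema/resources/Codelist/gmxCodelists.xml"/>
  <uri name="'http://bluenet3.antcrc.utas.edu.au/mcp-1.4/schema/resources/Codelist/gmxCodelists.xml"
       uri="schema/resources/Codelist/gmxCodelists.xml"/>
</catalog>"""
print(suspicious_uri_names(catalog))
```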

----- Original Message ----- From: <Simon.Pigot@anonymised.com>
To: <john.hockaday@anonymised.com>; <geonetwork-devel@lists.sourceforge.net>
Sent: Thursday, September 13, 2012 11:42 AM
Subject: Re: [GeoNetwork-devel] [GeoNetwork opensource Developer website] #1043: xml resolver should only have oasis catalogs for relevant schema when doing validation

_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

#1043: xml resolver should only have oasis catalogs for relevant schema when doing
validation
----------------------+-----------------------------------------------------
  Reporter: simonp | Owner: geonetwork-devel@…
      Type: defect | Status: closed
  Priority: major | Milestone: v2.8.0 RC0
Component: General | Version: v2.8.0RC0
Resolution: fixed | Keywords:
----------------------+-----------------------------------------------------
Changes (by simonp):

  * status: new => closed
  * resolution: => fixed

Comment:

Fixed in commit 4bd3d985c8d100c1db3318097882a5ce0905e38d

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/1043#comment:2>