[GeoNetwork-devel] OGC services (WMS, WFS) harvesting - what about missing required elements?

I am validating the ISO 19139 metadata generated by GN 2.4.2 harvesting of OGC services (WMS, WFS, etc.).
It would be nice if GN generated required elements such as <gmd:MD_Metadata><gmd:contact> (metadata contact person/organization) even when they are not specified in the service's GetCapabilities document.

I also came across some empty, non-mandatory elements that my schema validator (XMLSpy) chokes on. It would be nice if they were removed:
            <srv:containsOperations>
                <srv:SV_OperationMetadata>
                    <srv:operationName>
                        <gco:CharacterString/>
                    </srv:operationName>
                </srv:SV_OperationMetadata>
            </srv:containsOperations>

--
_______________________________
Wolfgang Grunberg
Arizona Geological Survey
wgrunberg@anonymised.com
520-770-3500
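A minimal sketch of the clean-up requested above, assuming the generated record can be post-processed outside GeoNetwork with Python's standard xml.etree.ElementTree. The helper names (`is_blank`, `strip_empty_operations`) are hypothetical. An element counts as blank only when neither it nor anything below it has attributes or non-whitespace text. The sketch does not check whether the srv schema requires at least one srv:containsOperations, so the result should still go through a schema validator.

```python
import xml.etree.ElementTree as ET

SRV = "http://www.isotc211.org/2005/srv"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"srv": SRV, "gco": GCO}


def is_blank(elem):
    """True if elem and all its descendants carry no attributes and no non-whitespace text."""
    if elem.attrib:
        return False  # e.g. codeList or gco:nilReason attributes carry meaning
    if elem.text and elem.text.strip():
        return False
    return all(is_blank(child) for child in elem)


def strip_empty_operations(root):
    """Remove blank srv:containsOperations elements; return how many were removed."""
    removed = 0
    for parent in list(root.iter()):  # snapshot, since the tree is mutated below
        for child in list(parent):
            if child.tag == f"{{{SRV}}}containsOperations" and is_blank(child):
                parent.remove(child)
                removed += 1
    return removed
```

Called on a parsed record, e.g. `strip_empty_operations(ET.fromstring(xml_text))`, it drops only fully empty operation blocks like the one above and leaves populated ones untouched.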

Hello Wolfgang

2009/10/28 Wolfgang Grunberg <wgrunberg@anonymised.com>:

I am validating the ISO 19139 metadata generated by GN 2.4.2 harvesting
of OGC services (WMS, WFS, etc.).
It would be nice if GN would generate required elements such as
<gmd:MD_Metadata><gmd:contact> (metadata contact person/organization)
even if it is not specified in the service's getCapability.

If this information is not in the capabilities document, I don't see
any benefit in adding empty tags here.
Maybe it's better to leave this record invalid and ask the service
provider to add this information?

I also came across some empty, non-mandatory elements that my schema
validator (XMLspy) chokes on. It would be nice if they were removed:
<srv:containsOperations>
<srv:SV_OperationMetadata>
<srv:operationName>
<gco:CharacterString/>
</srv:operationName>
</srv:SV_OperationMetadata>
</srv:containsOperations>

Could you provide the service URL or capabilities document for testing?

Thanks.

Francois


_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

gmd:contact is required, so the benefit is that the MD_Metadata record is schema valid. Just populate like this:

<gmd:contact gco:nilReason="missing"/>

steve

-- 
Stephen M. Richard
Section Chief, Geoinformatics
Arizona Geological Survey
416 W. Congress St., #100
Tucson, Arizona, 85701 USA

Phone: 
Office: (520) 209-4127
Reception: (520) 770-3500 
FAX: (520) 770-3505

email: steve.richard@anonymised.com
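Steve's suggestion can be sketched as a post-processing step, again assuming Python's xml.etree.ElementTree and a hypothetical helper name (`ensure_contact`). Because the ISO 19139 schema fixes the order of MD_Metadata children, the placeholder is inserted after any leading fileIdentifier, language, characterSet, parentIdentifier, hierarchyLevel and hierarchyLevelName elements rather than simply appended.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)  # keep the familiar prefixes when serializing
ET.register_namespace("gco", GCO)

# MD_Metadata children that the ISO 19139 schema places before gmd:contact.
BEFORE_CONTACT = {
    f"{{{GMD}}}{name}"
    for name in ("fileIdentifier", "language", "characterSet",
                 "parentIdentifier", "hierarchyLevel", "hierarchyLevelName")
}


def ensure_contact(md_metadata):
    """Add <gmd:contact gco:nilReason="missing"/> if the record has no gmd:contact.

    Returns True if a placeholder was added, False if a contact already existed.
    """
    if md_metadata.find(f"{{{GMD}}}contact") is not None:
        return False
    # Position just after the leading elements that must precede gmd:contact.
    index = 0
    for i, child in enumerate(md_metadata):
        if child.tag not in BEFORE_CONTACT:
            break
        index = i + 1
    placeholder = ET.Element(f"{{{GMD}}}contact", {f"{{{GCO}}}nilReason": "missing"})
    md_metadata.insert(index, placeholder)
    return True
```

The function is idempotent: running it on a record that already has a contact, real or nil, changes nothing.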

This discussion reminds me of the discussion about whether the “default” (= template-based) editor should be prevented from storing optional elements that cannot be empty if present.

Bottom line is that some feel that it is more important to store “what the user wants” instead of having GN take any action towards more valid metadata.

And “what the user wants” in the case of the “default” editor is taken to be, whatever is in the template that the user selected; and in the case of the metadata generated from OGCWxS, it is taken to be whatever the service provider chooses to put there.

For the case of the “default” editor, I advocate a name change to “template-based editor” or something similar that better describes its function; and while we're at it, rename the “advanced” editor to “free editor” or similar, along with some readily accessible help text about the different storage strategies employed by the two editors.

That of course doesn’t help the OGCWxS harvesting. Maybe we could add a setting so admin users can configure the behaviour? Or would that only increase confusion?

Kind regards
Heikki Doeleman
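Heikki's idea of an admin setting could look roughly like the sketch below. The policy names, the `HarvestRejected` exception and `apply_policy` are all hypothetical, not existing GeoNetwork settings. The REPAIR branch relies on gmd:contact having to sit directly before gmd:dateStamp in MD_Metadata, and it falls back to rejecting the record when there is no dateStamp to anchor on.

```python
import xml.etree.ElementTree as ET
from enum import Enum

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


class MissingContactPolicy(Enum):
    KEEP = "keep"      # store the record exactly as generated (may be schema-invalid)
    REPAIR = "repair"  # add <gmd:contact gco:nilReason="missing"/>
    REJECT = "reject"  # refuse the record so the service provider can fix it


class HarvestRejected(Exception):
    """Raised when a harvested record is refused under the configured policy."""


def apply_policy(md_metadata, policy):
    """Apply the configured policy to a gmd:MD_Metadata element and return it."""
    if md_metadata.find(f"{{{GMD}}}contact") is not None:
        return md_metadata  # contact present: nothing to decide
    if policy is MissingContactPolicy.KEEP:
        return md_metadata
    date_stamp = md_metadata.find(f"{{{GMD}}}dateStamp")
    if policy is MissingContactPolicy.REJECT or date_stamp is None:
        raise HarvestRejected("generated record has no gmd:contact")
    # REPAIR: with no existing contact, the placeholder goes directly before gmd:dateStamp.
    index = list(md_metadata).index(date_stamp)
    md_metadata.insert(index, ET.Element(f"{{{GMD}}}contact",
                                         {f"{{{GCO}}}nilReason": "missing"}))
    return md_metadata
```

Making the policy explicit keeps the choice visible to admins instead of hard-coding either the “repair” or the “reject” camp's preference into the harvester.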

On Thu, Oct 29, 2009 at 6:46 PM, Stephen M Richard <steve.richard@anonymised.com> wrote:

gmd:contact is required, so the benefit is that the MD_Metadata record is schema valid. Just populate like this:

<gmd:contact gco:nilReason="missing"/>

steve


I think that if an application advertises use of an XML schema (like http://schemas.opengis.net/iso/19139/20070417/gmd/gmd.xsd), then any documents it produces must be schema valid. This is (to some degree) a separate issue from how valid the metadata content is. No matter how valid the content in the rest of the document is, if the XML isn’t schema valid, software applications using that XML document may not function. Otherwise, why bother with the schema?

steve


Hi Francois,

Francois Prunayre wrote:

Hello Wolfgang

2009/10/28 Wolfgang Grunberg <wgrunberg@anonymised.com>:
  
I am validating the ISO 19139 metadata generated by GN 2.4.2 harvesting
of OGC services (WMS, WFS, etc.).
It would be nice if GN would generate required elements such as
<gmd:MD_Metadata><gmd:contact> (metadata contact person/organization)
even if it is not specified in the service's getCapability.
    
If this information is not in the capabilities document, I don't see
any benefit to add empty tags in here.
Maybe it's better to have this record invalid and ask the service
provider to add this information ?

  

I agree with you from a theoretical metadata perspective, and we will have to fix our own services' GetCapabilities docs.
However, if we want interoperable services, we will need smart clients that can deal with bad information the way web browsers deal with bad HTML and JavaScript. I would argue that GeoNetwork acts as a client when it “harvests” GetCapabilities from ArcGIS Server and GeoServer WMS and WFS services. Hence it would be nice if GeoNetwork would do its best to correct human/software omissions and errors when creating ISO 19139 service metadata by adding the “missing” attribute (gco:nilReason="missing"), etc. Otherwise, GeoNetwork would perpetuate schema errors, and it would be better off refusing to harvest those services.

  
I also came across some empty, non-mandatory elements that my schema
validator (XMLspy) chokes on. It would be nice if they were removed:
           <srv:containsOperations>
               <srv:SV_OperationMetadata>
                   <srv:operationName>
                       <gco:CharacterString/>
                   </srv:operationName>
               </srv:SV_OperationMetadata>
           </srv:containsOperations>
    

Could you provide the service url or capabilities document for test ?

  

Following is the WFS 1.1.0 test service that generated the empty srv:containsOperations element group.
http://proxy.azgs.az.gov/arcgis/services/GeologicMaps/AZStateGeo_FC/MapServer/WFSServer (ArcGIS Server)
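For anyone reproducing the test, a GetCapabilities request to an endpoint like this one uses the standard OGC key-value parameters service, version and request. The sketch below only builds the URL with Python's urllib.parse and does not fetch anything; the helper name `capabilities_url` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def capabilities_url(endpoint, service="WFS", version="1.1.0"):
    """Return a KVP GetCapabilities URL for an OGC service endpoint.

    Existing query parameters on the endpoint are kept; service, version and
    request are added or overwritten. Names are matched case-sensitively here,
    although OGC KVP parameter names are case-insensitive.
    """
    parts = urlsplit(endpoint)
    query = dict(parse_qsl(parts.query))
    query.update({"service": service, "version": version, "request": "GetCapabilities"})
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Keeping any existing query string matters for servers that carry configuration in the endpoint URL itself (for example a `map=` parameter).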

On Saturday, 31 October 2009 3:39 AM Wolfgang Grunberg wrote:

<snip>

I agree with you from a theoretical metadata perspective and
we will have to fix our own services getCapabilities docs.
However, if we want interoperable services, we will need
smart clients that can deal with bad information the way web
browsers deal with bad HTML and JavaScript.

I entirely disagree with this statement. In the past, HTML has been presented for the benefit of humans. XML for machine-to-machine processing is a different matter. If the XML is valid, then the application can reliably process the content. However, if a metadata editor has been slack and has not produced valid content, then the application should not interpret what it thinks the editor meant. The application should fail, and the editor should be prompted to correct the errors. After all, that is what standards are for: we want people to follow the standards so that applications can reliably process the content.

It is also impolite to assume what a person meant and try to resolve that content. If a person did not provide valid content then the application should reject the metadata record and ask for the editor to correct the error.

I would argue
that GeoNetwork acts as a client when it "harvests"
GetCapabilities from ArcGIS Server and GeoServer WMS and WFS
services. Hence it would be nice if GeoNetwork would do its
best to correct human/software omissions and errors when
creating ISO 19139 service metadata by adding the "missing"
attribute (gco:nilReason="missing"), etc. Otherwise,
GeoNetwork would perpetuate schema errors and it would be
better off refusing to harvest those services.

Yes. This is what GN should do. It should reject the harvested content and warn the editor that the content is invalid so that the editor can correct the content for the next harvest.

It is wrong for an application to change the content of a metadata record. If that occurred and the resulting metadata were used for some incorrect purpose, then the GN developers would be held responsible for any resulting lawsuits. However, if GN rejected invalid XML, then any resulting lawsuits would be the responsibility of the metadata editor and not the GN developers.

I strongly suggest that any invalid XML be reported and rejected. After all, that is the one big benefit of XML: the XML document instance can be validated against the XML Schema etc. to prove that it complies with the DSDL. If the application interprets the XML when it is invalid, this will perpetuate the problem of users creating incorrect content.

My two cents worth.

John

<snip>

The operation we are talking about is harvesting ISO 19115 metadata from OGC GetCapabilities XML. The OWS schema (http://schemas.opengis.net/ows/1.1.0/owsDataIdentification.xsd) makes all content optional, so from the point of view of schema validity, it’s not an error for a GetCapabilities doc to be missing contact information. The issue is whether to take such metadata and put nilReason="missing" in for the gmd:contact (or any other missing content), which is a completely accurate representation of the situation and is allowed by ISO 19139. This is not changing the content of a metadata record, and we’re still talking about data validation at the XML schema level.

Rejecting metadata based on content is an application profile decision, and cannot be done at the XML schema level. It would amount to requiring that the contact not be empty with a nilReason, a restriction of the ISO 19139 schema. It certainly makes sense from one point of view (I don’t want any metadata without contact information), but one could also take the approach that ‘any metadata is better than none; I’ll take what you give me, and note where things are missing’. Personally, I think the second approach is useful. Is GN in the business of developing profiles that prescribe content requirements under various XML schemas? There are some serious software development issues in this approach: enabling Schematron or other more sophisticated rule-based validation, giving the user a choice about which content validation rules to use... a slippery slope.

steve
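Steve's distinction between schema validity and profile rules can be sketched as two separate checks: a gmd:contact carrying only gco:nilReason="missing" passes the schema-level check, and only an opt-in profile rule rejects it. This is a hypothetical illustration in Python, not a Schematron implementation; `check_contact` and its `profile_requires_contact` flag are invented names.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"


def check_contact(md_metadata, profile_requires_contact=False):
    """Return a list of problems with gmd:contact (an empty list means none found).

    Schema level: at least one gmd:contact must be present; a nil placeholder
    such as <gmd:contact gco:nilReason="missing"/> satisfies this.
    Profile level (opt-in): at least one contact must hold a CI_ResponsibleParty.
    """
    problems = []
    contacts = md_metadata.findall(f"{{{GMD}}}contact")
    if not contacts:
        problems.append("schema: gmd:contact is required")
    elif profile_requires_contact and not any(
        c.find(f"{{{GMD}}}CI_ResponsibleParty") is not None for c in contacts
    ):
        problems.append("profile: gmd:contact has no CI_ResponsibleParty")
    return problems
```

Usage would be along the lines of `check_contact(ET.fromstring(xml_text), profile_requires_contact=True)`. Keeping the profile rule behind a flag is one way to offer a choice of content rules without changing what counts as schema-valid.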