[Geoserver-devel] [jira] Created: (GEOS-820) GML 2.1.2 schemas don't validate in Xerces

GML 2.1.2 schemas don't validate in Xerces
------------------------------------------

                 Key: GEOS-820
                 URL: http://jira.codehaus.org/browse/GEOS-820
             Project: GeoServer
          Issue Type: Bug
          Components: Configuration
    Affects Versions: 1.4.0-RC4, 1.3.4, 1.4.0-RC5, 1.4.0, 1.5.0-beta1
            Reporter: Saul Farber
         Assigned To: dblasby
            Priority: Minor
             Fix For: 1.4.0-RC5, 1.5.0-beta1
         Attachments: geometry.xsd.patch

The GML 2.1.2 schemas (as found here http://schemas.opengis.net/gml/2.1.2/) are not valid. They are *almost* valid, and of the many different validation engines out there (MSXML, the W3C validator, Xerces-J, etc.) only a small number catch the error. It's an obscure part of the XML Schema spec, but I asked once a long long time ago on the w3c-schema-dev list and they did conclude that the GML 2.1.2 spec is indeed invalid, and that Xerces-J does actually catch the error.

To solve this, I tweaked the geometry.xsd file just a tiny bit to make it valid. I don't think it actually changes the content-model at all, it just makes the schema valid.

I think fixing the schema is a good thing, and will help many people attempting to validate WFS responses with strict validators (for example, Xerces-J).

Patch to fix geometry.xsd is attached.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

Saul,

Most validating parsers will allow you to turn off the
schema-validity checks. It's a bit dangerous, but it seems to work:

parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
                false);

This is how the SLD validator gets around the issue. See
SLDValidator#validateSLD().
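
A minimal sketch of that workaround, assuming a Xerces-backed SAX parser obtained through JAXP (the parser bundled with the JDK is derived from Xerces). The class and method names are hypothetical, and this is not the code in SLDValidator#validateSLD(); only the feature URI comes from the snippet above.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

final class RelaxedSchemaChecking {
    // Xerces feature id quoted in the message above.
    static final String SCHEMA_FULL_CHECKING =
        "http://apache.org/xml/features/validation/schema-full-checking";

    // Hypothetical helper: builds a SAX reader with full schema checking off.
    static XMLReader newRelaxedReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XML Schema processing relies on namespaces
        XMLReader parser = factory.newSAXParser().getXMLReader();
        // Parsers that are not Xerces-based may reject this feature id
        // with a SAXNotRecognizedException.
        parser.setFeature(SCHEMA_FULL_CHECKING, false);
        return parser;
    }

    public static void main(String[] args) throws Exception {
        XMLReader parser = newRelaxedReader();
        System.out.println("schema-full-checking: "
            + parser.getFeature(SCHEMA_FULL_CHECKING));
    }
}
```

This only relaxes the checks made on the schema itself; whether documents are validated at all is controlled by separate parser settings.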

Hopefully this will help you,

dave

On 12/5/06, Saul Farber (JIRA) <jira@anonymised.com> wrote:

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Dave,

Yeah, that would definitely get me around any validity/parsing issues, but it doesn't get me around JAXB or people who actually really *want* to validate their documents.

My bug submission was motivated by someone who was "auto-sucking" our WFS collections out, using a schema-based-binding-engine to generate SQL tables and populating them with GetFeature calls.

What're your thoughts on the fix itself?

--saul

I think that the OGC has acknowledged there is a problem there and is fine with people making slight modifications to the schemas to make it work. I feel there was a discussion on wfs-dev a while ago. It's clearly a problem the OGC knows about; they just haven't gotten to fixing it in their own stuff, so it's OK for us to do the little fix. In my humble opinion, at least.

best regards,

Chris

--
Chris Holmes
The Open Planning Project
http://topp.openplans.org

geoserver-devel-bounces@lists.sourceforge.net wrote on 12/06/2006 09:30:11
AM:

I think that the OGC has acknowledged there is a problem there and is
fine with people doing slight modifications to the schemas to make it
work. I feel there was a discussion on wfs-dev awhile ago. It's
clearly a problem the OGC knows about, they just haven't gotten to
fixing it in their own stuff, so it's ok for us to do the little fix.
In my humble opinion, at least.

I was under the impression that they were just going to fix new versions
and expect people to upgrade. I didn't know they were still updating GML2.

They finally did fix GML3, even though it took at least three releases to
get it right.

Using cleverness to increase complexity is almost never a good idea,
Grasshopper. Cleverness should embrace simplicity. :)

Bryce

A known problem.

There is a logical answer of course:

GML 2.x is deprecated.

Move to GML 3.1.1 and WFS 1.1 ASAP instead of burning effort on this.

On the more general side, schema validation is problematic at run time due to performance reasons. A few salient points:

1) GML does not rely on the PSVI being richer.
2) Validation on insertion makes more sense, but you generally need something much stronger than schema validation to ensure the content meets the typical foreign-key constraints we are used to in DB-land
3) OASIS Catalogs can be used to find local (cached/hacked/simplified) copies of schemas - in practice this is the only workable way of using GML IMHO
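
As a rough illustration of point 3, the sketch below uses the javax.xml.catalog API (Java 9 and later) to map the published geometry.xsd URL to a local copy. The class name, the temporary catalog file and the placeholder local schema are all assumptions made for the example; a real setup would keep a hand-written catalog next to the cached or patched schemas.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;

final class LocalGmlCatalog {
    static final String REMOTE_GEOMETRY_XSD =
        "http://schemas.opengis.net/gml/2.1.2/geometry.xsd";

    // Writes a one-entry OASIS catalog and returns a resolver backed by it.
    static CatalogResolver resolverFor(Path localGeometryXsd) throws Exception {
        String catalogXml =
            "<?xml version=\"1.0\"?>\n"
          + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
          + "  <system systemId=\"" + REMOTE_GEOMETRY_XSD + "\"\n"
          + "          uri=\"" + localGeometryXsd.toUri() + "\"/>\n"
          + "</catalog>\n";
        Path catalogFile = Files.createTempFile("gml-catalog", ".xml");
        Files.write(catalogFile, catalogXml.getBytes(StandardCharsets.UTF_8));
        return CatalogManager.catalogResolver(
            CatalogFeatures.defaults(), catalogFile.toUri());
    }

    // Resolves the remote schema URL and returns the local system id it maps to.
    static String demo() throws Exception {
        Path dir = Files.createTempDirectory("gml-2.1.2");
        Path local = dir.resolve("geometry.xsd");
        // Placeholder content; the real file would be the cached or patched schema.
        Files.write(local, "<!-- local geometry.xsd goes here -->"
            .getBytes(StandardCharsets.UTF_8));
        return resolverFor(local)
            .resolveEntity(null, REMOTE_GEOMETRY_XSD)
            .getSystemId();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(REMOTE_GEOMETRY_XSD + " -> " + demo());
    }
}
```

CatalogResolver also implements EntityResolver and LSResourceResolver, so the same object can be handed to a SAX parser or a SchemaFactory.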

Rob

Good points Rob.

I think, however, that my original question has gotten a bit derailed. I'll give it one more shot.

Here's my summary of the emails so far:

1) GML 2.1.2 schemas have a bug in them. This bug causes xerces and XMLSpy (and possibly other parsers) to not properly validate any documents referencing geometry.xsd

2) GML 3.1.x does not have such a bug. The future (including WFS 1.1) has lots of GML 3.1.x in it.

3) There exists a very simple fix to the GML 2.1.2 schema bug which will allow GML 2.1.2-based documents (all WFS 1.0/SLD 1.0 requests) to validate and gain the advantages of validating: schema<->language binding tools, Eclipse auto-complete, general sanity checks, etc.

4) Chris is in the process of pushing OGC to "quickly" (1-2 months?) apply the fix from #3 to GML 2.1.2 via a corrigendum.

My question is:

Is there any reason that Geoserver should *not* apply the gml-2.1.2-fix to geometry.xsd?

The patch will not invalidate any existing XML documents, and will not make any documents which are currently invalid valid. The patch does *not* change the content-model of geometry.xsd in any way.

The patch *will* let people hit "validate" in their favorite XML editor without errors. It will let eclipse perform auto-completion for SLD, WFS and GML when using an XML editor with schema-awareness (WST, oXygen, etc.). It will let folks using xerces-based GML <-> language binding tools use these tools more effectively.
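
For reference, the kind of strict validation meant here can be sketched with the standard javax.xml.validation API. The inline schema is a toy stand-in invented for this example, not GML; in practice the schema source would point at a local (patched) copy of the GML schemas, and newSchema() is where a strict processor would reject a schema that is itself invalid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

final class StrictValidation {
    // Toy schema used only for illustration.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='coord'><xs:complexType><xs:sequence>"
      + "<xs:element name='X' type='xs:decimal'/>"
      + "<xs:element name='Y' type='xs:decimal'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compiles the schema, validates the document, and collects any problems.
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Throws SAXException if the schema itself is rejected.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // malformed XML: give up
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(TOY_XSD, "<coord><X>1</X><Y>2</Y></coord>"));
        System.out.println(validate(TOY_XSD, "<coord><X>1</X></coord>"));
    }
}
```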

In addition, there's no effort "burnt" on this (other than my time writing long-winded emails!). The patch exists, it works, and applying it takes approximately 30 seconds, most of which is waiting for the ssh login prompt to appear.

Anyone have a reason not to apply this fix?

--saul

I'm +1 on applying this fix. It's the sensible thing to do, people I've talked to at the OGC agree, and we're actively working with them to move this fix through the process. But there's no reason for us to not do sensible stuff that everyone acknowledges should be in OGC stuff.

Chris

Patch - no problems. +1

Your reasoning as to why it's worthwhile makes sense, but the explicit assumption I was missing, and I think we both agree on, is that people will have to validate against a hacked copy of the schemas in practice. This still requires a fair bit of knowledge of how to use the tools properly, and is again why GML 3.1.1 must be taken forward.

Rob
