>This said I am impressed with how much you actually did get, getting the
>pointTypes and pointMember stuff right as far as I can tell in the gml
>stuff. The point I'm trying to make here is not to say you've fallen
>short, it's to say that XMLSchema stuff is _nasty_. And most users are
>not going to want or know anything about any of this. But our current
>solution limits people; it tries to squeeze everyone into the things
>that we cover, instead of allowing them to express it as they want.
>
>
Let's clean up these bugs and see what we can see.
The goal is to:
- parse out minOccurs, maxOccurs, nillable if present
- determine attribute type or type+fragment
- determine attribute name (may be derived from type)
...
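
The first three goals in the list above could be sketched roughly as follows. This is a minimal illustration using only the JDK's DOM parser, not the GMLUtils code under discussion: `ElementInfo` is a hypothetical class, and falling back to the local part of the type when `name` is absent is only a guess at what "derived from type" means.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical holder for one xs:element declaration; not a GeoTools or GeoServer class.
final class ElementInfo {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    final String name;      // explicit name, or derived from the type (assumption)
    final String type;      // raw QName text such as "xs:decimal"; null if absent
    final int minOccurs;    // XML Schema default is 1
    final int maxOccurs;    // XML Schema default is 1; -1 stands for "unbounded"
    final boolean nillable; // XML Schema default is false

    private ElementInfo(String name, String type, int min, int max, boolean nillable) {
        this.name = name;
        this.type = type;
        this.minOccurs = min;
        this.maxOccurs = max;
        this.nillable = nillable;
    }

    // Parses a single xs:element fragment; the fragment must declare the xs namespace.
    static ElementInfo parse(String fragment) {
        Element e;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            e = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fragment)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("fragment is not well-formed XML", ex);
        }
        if (!XS.equals(e.getNamespaceURI()) || !"element".equals(e.getLocalName())) {
            throw new IllegalArgumentException("expected an xs:element");
        }
        int min = e.hasAttribute("minOccurs") ? Integer.parseInt(e.getAttribute("minOccurs")) : 1;
        String maxText = e.hasAttribute("maxOccurs") ? e.getAttribute("maxOccurs") : "1";
        int max = "unbounded".equals(maxText) ? -1 : Integer.parseInt(maxText);
        String nil = e.getAttribute("nillable"); // empty string when absent
        boolean nillable = "true".equals(nil) || "1".equals(nil);
        String type = e.hasAttribute("type") ? e.getAttribute("type") : null;
        String name;
        if (e.hasAttribute("name")) {
            name = e.getAttribute("name");
        } else if (type != null) {
            name = type.substring(type.indexOf(':') + 1); // drop the prefix, if any
        } else {
            throw new IllegalArgumentException("element has neither name nor type");
        }
        return new ElementInfo(name, type, min, max, nillable);
    }

    public static void main(String[] args) {
        ElementInfo info = parse("<xs:element xmlns:xs=\"" + XS
                + "\" name=\"length\" type=\"xs:decimal\" minOccurs=\"0\" nillable=\"true\"/>");
        System.out.println(info.name + " " + info.type + " " + info.minOccurs
                + ".." + info.maxOccurs + " nillable=" + info.nillable);
    }
}
```

Note that this only looks at attributes on the element itself; anonymous nested types, `ref`, and imports are exactly the cases it does not cover.
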
Can you do me a favour and place as many strange XMLSchemas into JUnit
tests as we possibly can? This will be the fastest way to clean up support.
Ok, I think you missed my point here, which is that to do XMLSchemas we
need to approach it head on, instead of me just thinking of ways to mess
things up, and attempting to back into a solution. We need a better
framework if we are to do this, and everything needs to turn into
FeatureTypes as a FeatureType is directly our representation of an xml
schema. There's also imports to worry about, where a schema is
contained in multiple documents. I may define cdf:length in
another schema file, and just do an import. There are groups, choices,
all, attribute groups. There's also nested types, features within
features, which we should also handle eventually. I don't have samples of
these lying around, and I'm not sure that it's worth it to come up with
them. And I don't think shoving this information into the DTO is the
answer, because it becomes more and more work going into a datastructure
that we can't use to validate with.
>Ok, so now you're saying, but a partial solution is better than none.
>And I completely agree. I think this code probably does have a place in
>geotools. But before it is there I would like unit tests with xerces
>validations against the gml schemas. I'm always getting my gml schemas
>wrong, and if people are to expand on this stuff in the future, and add
>even more functionality, we need really good tests to ensure that
>nothing breaks when someone decides to work on restriction bases.
>
>
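
As a rough idea of what a schema-validation check inside a unit test might look like: the sketch below uses the JAXP validation API bundled with the JDK rather than calling Xerces directly, and a tiny inline schema rather than the real GML schemas. `SchemaCheck` and the `length` element are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical test helper; stands in for validating against the real GML schemas.
final class SchemaCheck {

    // Illustrative schema: a decimal with at most seven digits in total
    // and at most two of them after the decimal point.
    static final String LENGTH_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"length\">"
            + "<xs:simpleType>"
            + "<xs:restriction base=\"xs:decimal\">"
            + "<xs:totalDigits value=\"7\"/>"
            + "<xs:fractionDigits value=\"2\"/>"
            + "</xs:restriction>"
            + "</xs:simpleType>"
            + "</xs:element>"
            + "</xs:schema>";

    // Returns true when the instance document validates against the schema text.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a bug in the test itself, so fail loudly.
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the instance violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(LENGTH_XSD, "<length>12345.67</length>"));
        System.out.println(isValid(LENGTH_XSD, "<length>123456.789</length>"));
    }
}
```

A real test would load the GML schemas and the generated schema from files and wrap calls like these in JUnit assertions.
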
Now that David has generalized the GMLUtils to handle more than just the
hardcoded xs/gml prefixes, we may be able to do this.
I am afraid I have not bothered David to get into the habit of writing
test cases, it is not the practice in GeoServer.
Since your module is welcoming David to gt2, we will need to introduce
David to the gt2 guidelines of unit test cases and 60% coverage.
Cool. Yeah, I get lax with geoserver, but that's because we have the cite
tests to fall back upon (though I would like to see more unit tests, and
during my travels when I'm less connected to the internet that's one of
the things I'll be working on). But geotools we _need_ tests for, since
it's a project on its own; we can't expect people to run geoserver cite
tests to see if they broke something (though I've had a few occasions
where the tests weren't good enough in geotools and people broke stuff for
me in geoserver).
>But I also think the direction we are taking things is wrong. This
>criticism is separate from the GMLUtils stuff (yes, I know it's renamed,
>and I like the factories, but it's easiest to just refer to it as
>GMLUtils). This has more to do with the AttributeTypeMetaData
>structures where we are currently keeping the 'extra' information from
>reading the schemas. For the most part I'm fine with the
>FeatureTypeMetaData, because it _actually_ is metadata, the name,
>abstract, etc. But minOccurs/maxOccurs, complex and whatnot are
>information about the attributeType. I do agree that there is some
>information that should be in a sort of metadata structure, for
>representations that are data format specific. The main one I'm
>thinking of here is prefixes. Geotools feature model should not care
>what prefix things represent themselves with. The min/maxOccurs should
>go in the actual AttributeType. This is also where this restriction
>base stuff should go. There's actually a hint of this with
>'fieldLength'.
>
I understand, Chris, and agree completely. I just did not have time last
milestone to push a decent split
between AttributeType and AttributeTypeMetaData. I am afraid I get hacky
occasionally.
What I would like to do is: clean up the GMLUtils stuff and use it as a
driver to clean up AttributeType/AttributeMetaData.
Agreed. Though I think I may want to blow away AttributeMetaData (see
below).
> This is actually not quite as it should be, because what
>does fieldLength mean for a geometry? But fieldLength is essentially a
>restrictionBase of maxLength. Our AttributeTypes should be able to
>represent these things, and parse and reject attributes that do not meet
>them. For some data formats this will be overkill – shapefiles can't
>specify that something should have 5 places before the decimal and two
>after. But gml certainly can, and so can a database. So instead of
>putting restriction base information inside of MetaData, which from
>where the work is going it seems like the thing that you would do, just
>record the structure so you could spit it out later, instead you should
>be able to put it in the attributeType. Yes, this will lead to geotools
>feature model being richer, which is what should happen. For something
>like prefix we may need another solution, but I think most everything
>should be in AttributeType.
>
>
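
To make the idea concrete, here is a minimal sketch of an attribute type that carries its own restrictions and rejects values that violate them. `TextAttributeSketch` is hypothetical and is not the GeoTools AttributeType API; `mandatory` stands in for minOccurs of at least 1, and `maxLength` for what fieldLength expresses today.

```java
// Hypothetical illustration only; not the GeoTools AttributeType interface.
final class TextAttributeSketch {
    final String name;
    final boolean mandatory; // stands in for minOccurs >= 1
    final int maxLength;     // a maxLength restriction, i.e. today's fieldLength

    TextAttributeSketch(String name, boolean mandatory, int maxLength) {
        this.name = name;
        this.mandatory = mandatory;
        this.maxLength = maxLength;
    }

    // Converts the value to text, or throws if it breaks a restriction.
    String parse(Object value) {
        if (value == null) {
            if (mandatory) {
                throw new IllegalArgumentException(name + " is mandatory");
            }
            return null;
        }
        String text = value.toString();
        if (text.length() > maxLength) {
            throw new IllegalArgumentException(
                    name + " is longer than " + maxLength + " characters");
        }
        return text;
    }

    public static void main(String[] args) {
        TextAttributeSketch road = new TextAttributeSketch("roadName", true, 10);
        System.out.println(road.parse("Main St"));
        try {
            road.parse("A very long road name");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the restriction lives in the type, the same object could both reject bad values and be written back out as a schema restriction, which a separate metadata structure cannot do on its own.
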
See above: agreed. I should have made my hacking more explicit. As long
as we are in confession time, I *really* want
to switch the relationship around so that AttributeType points to
AttributeTypeMetaData.
Ok, please explain to me why we need AttributeTypeMetaData? What is
contained in an attributeType that is appropriate for metadata? min and
max occurs should not be there, they do validation, determine what the
attribute should be. Restriction bases should not be, they are
information about what the attribute should be. AttributeTypes already
are metadata. I see the need for FeatureTypeMetaData, the title,
abstract, etc. But I don't see it for attributeType. The one thing I
conceded before was the prefix. But I don't think I'll give that any
more. I think AttributeType should have a namespace. Most will just
inherit from the featureType, but I think geometry ones should always be
http://www.opengis.net/gml. The prefix that they use should be set by the
gml producer. So the application keeping track of the namespaces and
using the producer should do the setting of the prefixes.
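
A minimal sketch of that split, assuming the attribute type knows only its namespace URI and local name while the producer owns the prefix mapping. `PrefixingProducerSketch` and its methods are hypothetical names, not existing GeoTools or GeoServer classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical producer-side helper: namespaces come from the types,
// prefixes are chosen by whoever configures the producer.
final class PrefixingProducerSketch {
    static final String GML_NS = "http://www.opengis.net/gml";

    private final Map<String, String> prefixByNamespace = new LinkedHashMap<>();

    // The application registers the prefix it wants for each namespace.
    void declarePrefix(String namespaceUri, String prefix) {
        prefixByNamespace.put(namespaceUri, prefix);
    }

    // Builds the prefixed name used on output for a namespaced attribute.
    String qualifiedName(String namespaceUri, String localName) {
        String prefix = prefixByNamespace.get(namespaceUri);
        if (prefix == null) {
            throw new IllegalStateException("no prefix declared for " + namespaceUri);
        }
        return prefix + ":" + localName;
    }

    public static void main(String[] args) {
        PrefixingProducerSketch producer = new PrefixingProducerSketch();
        producer.declarePrefix(GML_NS, "gml");
        System.out.println(producer.qualifiedName(GML_NS, "pointProperty"));
    }
}
```
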
>A while ago I recommended two routes to you. One was to take schema
>parsing seriously, the other was to lean on your guys's validation stuff
>more. You've gone down the route of schema parsing, but the problem is
>that it's not tied into anything, it doesn't _mean_ anything. If I have
>an attributeType called length, that is represented by decimals in the
>backend, then right now I can call it a decimal in my schema.xml file
>and no one will complain until I actually try to validate it. Now I
>felt this was fine when we were just passing a schema straight from the
>schema.xml file, putting all the onus on the user. But now that we sort
>of mess with it in a half assed way I feel less good about it. We don't
>_actually_ let the user do everything that one can with a schema (not
>even close, since you can do some crazy shit), but we also don't check
>the things that we do let him do. If this information were in the
>AttributeType then this would be easier to do. What we would do is have
>the user's defined FeatureType interact with the datastore. We would
>create the features with his defined FeatureType, or at least parse them
>on the way out. We could have an option for speed for him to not go
>through that additional step, but if we are controlling the data in the
>way that we are then we should be able to provide some checks.
>
>
Actually this is tied to something, the user interface. We needed this
information to have a decent ui, recognize pointPropertyType and so on.
I will understand if we need to back off from this approach; let's try
and fix those bugs you did find (remember we have no practical
experience writing strange XMLSchemas).
I like the user interface as it stands, and I actually don't think it's
really appropriate to introduce this level of complexity (that is all the
weird schema stuff one could do) into it. I like that you can specify
which attributes to include, if they're mandatory, etc. I think it's more
than sufficient for someone using a ui. But I don't think that a user who
wants to work with the files and put in their own gml schema should have
their schema not work just because we fail to parse it.
Back to your point, I am much more comfortable forcing AttributeType to
handle restrictions than generating validation tests. We both want
AttributeType to be more expressive; let's add functionality at the same
time.
Cool.
Providing checks as part of the validation stuff can be done, I am more
interested in pushing this sort of thing into gt2.
Talk more about "true" schema parsing?
True schema parsing includes weird imports, all kinds of restriction
bases, a bunch of different ways to describe the schema, and complex,
nested types. Doing it would involve working from the XMLSchema
specification:
http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/
http://www.w3.org/TR/xmlschema-1/
http://www.w3.org/TR/xmlschema-2/
There are full books written on just _using_ xml schemas, let alone
writing a parser for them. And the Canadian government is going to pay
good money for people to spend months writing one.
Once again I am sorry for confusing this issue with
AttributeTypeMetaData; this amounted to hacking my own class, since it
would have taken too much time to place changes into AttributeType.
Yeah, the problem with doing that is that the stuff put in
AttributeTypeMetaData can't _do_ anything. Or if it does it introduces
another level of complexity into geotools that is not needed.
>This also speaks against your thought to have non-fragmented schema.xml
>files, because if we give people that kind of free rein they're going
>to do all sorts of nasty stuff that we can't currently represent.
>
>
Sigh, makes me wish we had just given them a big text area. Right now
we are looking at four weeks all told vs a big old text area.
There may be some hope there - apparently while gml and xs are not
fixed, there is a name that is fixed that is separate from both the
prefix and where the XMLSchema file is located.
I actually sort of like the ui as it is. I might change min/max occurs to
simply 'mandatory', since we don't actually support more than 1 max
occurs. The text area right now sort of confuses me, I'm not sure what
should go in it. You could have the big text area as an alternative, but
my feeling is that if someone is using the ui they don't want to be
confused with all that complexity. The reason I've insisted on having
everything compatible with files is to allow people who know what they are
doing to hack in stuff that they want, like advanced schemas.
>And of course we can just put restrictions on people of what they can
>and can't do in the schema.xml files, but that really wasn't the point
>of keeping around the schema.xml files – anything that we can handle
>automatically we should just handle automatically. But we don't
>currently handle enough automatically to do away with user configurable
>stuff.
>
>Does this make sense? Am I being unreasonable? I just feel that if we
>are going to attempt to read schemas and put them into data structures
>we should 1) handle close to everything or leave a way for people to
>bypass it, and 2) actually do something with the datastructures, as they
>are useful, and could do something wonderful with validation, which is why
>people would add such extensions in the first place.
>
>
You are reasonable:
1) We need to quash bugs, I would like to handle everything (well half
of everything, we have an XML fragment escape hatch)
Ok, I guess I'm not yet convinced that we really have that escape hatch,
as I want that escape hatch for anything that anyone might put into a
schema.xml file that is valid according to the schema spec and that we
can't yet handle.
2) We need to give geotools2 some time to adjust, let these changes
ripple through the API before we can expect very much.
I would say that _we_ need to adjust geotools, we need to make our
intentions clear, and we have to start rolling it in and doing the work,
making sure people know we are doing the work and what it will mean for
them.
Remember that the XMLSchemas on their own are useful; they describe the
data to the end GML user. If you are desperate for 2) we could generate
some validation tests, but I would hate to see anything distract from
making a richer gt2 attributeType model.
No, I don't need validation tests. I just want users to be able to input
their own gml and not have us choke if we've never seen it before. I can
maybe be dissuaded from this, but I'd like geoserver to make things easy
for most users but also be able to handle very complex things, even more
complex than we may anticipate. If we really can't have it both ways I
will concede towards easier use, but I think we are too limited right now
in what we do handle in schemas.
Chris