My only suggestion is - send these emails to the geotools2 list. It is a
development list. I have been bad recently and have had a lot of
these private, productive development discussions/decisions away from
the group. I have tried to reform my ways by posting irc chats to Jira
bug tasks. Whenever you start off one of these threads (I am sure it
began as a short email), I never know if I should cross post it to devel
or not.
Yeah, cross post your replies somewhere. If I ever send you something
that I don't want to be seen on a mailing list I will tell you directly.
The main reason I did not post the original to geotools is that no one
there has _any_ idea what I would be talking about. So I wanted to be
sure that I had a clear picture from you guys before I actually tried to
introduce this stuff to them. I wanted to formulate my opinions on the
work done, and I'm still doing that. To roll stuff in to geotools we need
to explain exactly what's going on and the pros and cons of doing it one
way or another. I'm fine with some hacks for geoserver, but I want
everything going into geotools clean and thought about by more than one
person. So I wrote this email to you two to clarify my thoughts and get
feedback. But you're right, it should have at least gone to
geoserver-devel. I was planning on taking your reply and working on an
introduction of the issues and posting it to geotools. But let's go ahead
and continue on geoserver-devel. I'll post my reply here shortly, and we
can figure out our strategy for getting this stuff in geotools.
Chris
On Wed, 11 Feb 2004, Jody Garnett wrote:
cholmes@anonymised.com wrote:
>Ok, so I've finally got a chance to start to review the gml stuff you've
>been working on. And started to review other parts of the code base,
>and the ui. I'll write that up in a separate email. It's looking good,
>I got it to actually work for the first time, previously I could
>navigate but things would never save and reload. But it means now I can
>get into the more nitpicky stuff to make it really tight. I'll probably
>have some more design stuff (and I'm currently regretting the decision
>to go with the action/config/form stuff, we should have flipped it
>around when we had the chance, it's just confusing for someone looking
>at the package names as to what does what. But that's in the past,
>someday I'll try to flip it around).
>
>But I have spent some good time with the gml stuff. And I'm not sure
>we're necessarily on the right track. I tried to say a while ago that we
>should not try to do schema parsing, and we currently have a half
>solution to schema parsing, that falls short in a number of areas. It
>won't do full schema parsing, but it also won't allow users to have
>complete control over their schemas.
>
>The first place this falls short is with further restrictions to a
>schema. Which to me is the main reason a user would write their own
>schema. The current things that it reads in express very little that we
>don't already automatically do. And if that's the approach then I can
>understand why you guys are not very psyched about supporting this
>capability, and have attempted to squash it all into the current model.
> A simple case of this is given in the old schema tutorial. I want to
>limit my strings to a length of 10. I can make my postgis table say
>this (and now I can also add validation that says this). So I'd like to
>express this in my schema to users:
>
><xs:element name="BOROUGHNAM" nillable="true" minOccurs="0" maxOccurs="1">
>  <xs:simpleType>
>    <xs:restriction base="xs:string">
>      <xs:maxLength value="10"/>
>    </xs:restriction>
>  </xs:simpleType>
></xs:element>
>
>Currently this is read in and stored somehow, and then when I ask for it
>back in DescribeFeatureType I get:
>
>
><xs:element minOccurs="0" name="length" nillable="true" maxOccurs="1">
>Element[QName[xs:element] Attr[QName[name] length] Attr[QName[nillable]
>true] Attr[QName[minOccurs] 0] Attr[QName[maxOccurs] 1]]
></xs:element>
>
>
>
I would guess this is a bug!
>ie a bunch of garbage.
>
>
Looks like it.
>Other than maxLength the other constraining facets are length, minLength,
>pattern, enumeration, whiteSpace, (min|max)(Ex|In)clusive, totalDigits,
>fractionDigits.
>
>
All of this should be preserved in an XML fragment by the DTO objects.
>As a schema writer I also might want to add an xs:annotation, but our
>schema reading right now just ignores it, returning nothing as far as I
>can tell.
>
>I also might choose to write my schemas a little differently, and
>instead of saying:
>
><xs:element type="descType" minOccurs="0" name="btrn_bc_id"
>nillable="false" maxOccurs="1"/>
>
>I prefer to say:
>
><xs:simpleType name="descType">
> <xs:restriction base="xs:string"/>
> </xs:simpleType>
><xs:element type="descType" minOccurs="0" name="btrn_bc_id"
>nillable="false" maxOccurs="1"/>
>
>This currently leads to a null pointer exception.
>
>This said I am impressed with how much you actually did get, getting the
>pointTypes and pointMember stuff right as far as I can tell in the gml
>stuff. The point I'm trying to make here is not to say you've fallen
>short, it's to say that XMLSchema stuff is _nasty_. And most users are
>not going to want or know anything about any of this. But our current
>solution limits people; it tries to squeeze everyone into the things
>that we cover, instead of allowing them to express it as they want.
>
>
Let's clean up these bugs and see what we can see.
The goal is to:
- parse out minOccurs, maxOccurs, nillable if present
- determine attribute type or type+fragment
- determine attribute name (may be derived from type)
>Ok, so now you're saying, but a partial solution is better than none.
>And I completely agree. I think this code probably does have a place in
>geotools. But before it is there I would like unit tests with xerces
>validations against the gml schemas. I'm always getting my gml schemas
>wrong, and if people are to expand on this stuff in the future, and add
>even more functionality, we need really good tests to ensure that
>nothing breaks when someone decides to work on restriction bases.
>
>
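For reference, here is a minimal sketch of the first goal listed above (pulling minOccurs, maxOccurs and nillable off an xs:element), using only the JDK DOM parser. The class name ElementOccurs and its fields are made up for illustration and are not the GMLUtils API. The defaults follow XML Schema: minOccurs and maxOccurs default to 1, nillable defaults to false.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper for illustration; not part of GeoTools or GeoServer.
class ElementOccurs {
    final String name;
    final int minOccurs;
    final int maxOccurs; // Integer.MAX_VALUE stands in for "unbounded"
    final boolean nillable;

    ElementOccurs(String name, int minOccurs, int maxOccurs, boolean nillable) {
        this.name = name;
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
        this.nillable = nillable;
    }

    /** Parses a single xs:element declaration given as an XML string. */
    static ElementOccurs parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element e = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // getAttribute returns "" when the attribute is absent
            String min = e.getAttribute("minOccurs");
            String max = e.getAttribute("maxOccurs");
            String nil = e.getAttribute("nillable");
            int minOccurs = min.isEmpty() ? 1 : Integer.parseInt(min);
            int maxOccurs = max.isEmpty() ? 1
                    : "unbounded".equals(max) ? Integer.MAX_VALUE : Integer.parseInt(max);
            boolean nillable = "true".equals(nil) || "1".equals(nil);
            return new ElementOccurs(e.getAttribute("name"), minOccurs, maxOccurs, nillable);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse element declaration", ex);
        }
    }
}
```

A unit test could feed declarations like the BOROUGHNAM one through parse and check the four fields. The string passed in has to declare the xs prefix, since the parser is namespace aware.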
Now that David has generalized the GMLUtils to handle more than just the
hardcoded xs/gml prefixes, we may be able to do this.
I am afraid I have not bothered David to get into the habit of writing
test cases, it is not the practice in GeoServer.
Since your module is welcoming David to gt2, we will need to introduce
David to the gt2 guidelines of unit test cases and 60% coverage.
>But I also think the direction we are taking things is wrong. This
>criticism is separate from the GMLUtils stuff (yes, I know it's renamed,
>and I like the factories, but it's easiest to just refer to it as
>GMLUtils). This has more to do with the AttributeTypeMetaData
>structures where we are currently keeping the 'extra' information from
>reading the schemas. For the most part I'm fine with the
>FeatureTypeMetaData, because it _actually_ is metadata, the name,
>abstract, etc. But minOccurs/maxOccurs, complex and whatnot are
>information about the attributeType. I do agree that there is some
>information that should be in a sort of metadata structure, for
>representations that are data format specific. The main one I'm
>thinking of here is prefixes. Geotools feature model should not care
>what prefix things represent themselves with. The min/maxOccurs should
>go in the actual AttributeType. This is also where this restriction
>base stuff should go. There's actually a hint of this with
>'fieldLength'.
>
I understand, Chris, and agree completely. I just did not have time last
milestone to push a decent split between AttributeType and
AttributeTypeMetaData. I am afraid I get hacky occasionally.
What I would like to do is: clean up the GMLUtils stuff and use it as a
driver to clean up AttributeType/AttributeMetaData.
> This is actually not quite as it should be, because what
>does fieldLength mean for a geometry? But fieldLength is essentially a
>restrictionBase of maxLength. Our AttributeTypes should be able to
>represent these things, and parse and reject attributes that do not meet
>them. For some data formats this will be overkill – shapefiles can't
>specify that something should have 5 places before the decimal and two
>after. But gml certainly can, and so can a database. So instead of
>putting restriction base information inside of MetaData, which from
>where the work is going it seems like the thing that you would do, just
>record the structure so you could spit it out later, instead you should
>be able to put it in the attributeType. Yes, this will lead to geotools
>feature model being richer, which is what should happen. For something
>like prefix we may need another solution, but I think most everything
>should be in AttributeType.
>
>
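To make the idea concrete, here is a rough sketch of an attribute type that carries its own restrictions and rejects values that break them. Everything in it (the RestrictedAttribute class and its facet fields) is hypothetical and is not the current geotools AttributeType. It only covers maxLength, totalDigits and fractionDigits, with the digit counts read the way XML Schema defines them.

```java
import java.math.BigDecimal;

// Hypothetical sketch for illustration; not the geotools AttributeType interface.
class RestrictedAttribute {
    final String name;
    final Integer maxLength;      // null means "no restriction"
    final Integer totalDigits;    // null means "no restriction"
    final Integer fractionDigits; // null means "no restriction"

    RestrictedAttribute(String name, Integer maxLength,
                        Integer totalDigits, Integer fractionDigits) {
        this.name = name;
        this.maxLength = maxLength;
        this.totalDigits = totalDigits;
        this.fractionDigits = fractionDigits;
    }

    /** Returns true if the value satisfies every facet that is set. */
    boolean accepts(Object value) {
        if (maxLength != null && value instanceof String
                && ((String) value).length() > maxLength) {
            return false;
        }
        if (value instanceof BigDecimal) {
            BigDecimal d = ((BigDecimal) value).stripTrailingZeros();
            int scale = d.scale();
            // digits after the decimal point
            int fraction = Math.max(scale, 0);
            // digits needed to write the value without an exponent
            int total = scale < 0 ? d.precision() - scale
                                  : Math.max(d.precision(), scale);
            if (fractionDigits != null && fraction > fractionDigits) {
                return false;
            }
            if (totalDigits != null && total > totalDigits) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, "5 places before the decimal and two after" becomes totalDigits 7 and fractionDigits 2, and the old fieldLength is just maxLength on a string attribute. A geometry attribute would leave all three facets null.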
See above: agreed. I should have made my hacking more explicit. As long
as we are in confession time, I *really* want
to switch the relationship around so that AttributeType points to
AttributeTypeMetaData.
>The reason for this is that if we get the geotools FeatureType rich
>enough then it can perform validation for us. The
>TransactionFeatureHandler should _just_ have a featureType to validate.
>
>
Ah to dream.
> Jody, you were saying that it would need an AttributeTypeMetaData, and
>that it would handle stuff. I disagree – it should all be in one place,
>it's all involved in parsing an attribute and seeing if it is allowed in
>the FeatureType. Once we have a rich enough FeatureType we should be
>able to parse a feature against a FeatureType passed in to the
>FeatureHandler. The FeatureType will be parsed separately (or known as
>is the case with geoserver), and the feature coming in will have to
>conform to it. This solves the jira task of figuring out the
>appropriate type, and does much more.
>
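As an illustration of checking an incoming feature against a known type, the sketch below validates a map of attribute values against a small type description. SketchFeatureType is an invented name and does not reflect the geotools FeatureType or the TransactionFeatureHandler; it only checks nillable, the Java type of each value, and unknown attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration; not the geotools FeatureType.
class SketchFeatureType {
    private final Map<String, Class<?>> types = new LinkedHashMap<>();
    private final Map<String, Boolean> nillable = new LinkedHashMap<>();

    /** Registers an attribute with its expected Java type and nillable flag. */
    SketchFeatureType add(String name, Class<?> type, boolean isNillable) {
        types.put(name, type);
        nillable.put(name, isNillable);
        return this;
    }

    /** Returns a list of problems; an empty list means the feature conforms. */
    List<String> validate(Map<String, Object> feature) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> entry : types.entrySet()) {
            String name = entry.getKey();
            Object value = feature.get(name);
            if (value == null) {
                if (!nillable.get(name)) {
                    problems.add(name + " must not be null");
                }
            } else if (!entry.getValue().isInstance(value)) {
                problems.add(name + " should be " + entry.getValue().getSimpleName());
            }
        }
        for (String name : feature.keySet()) {
            if (!types.containsKey(name)) {
                problems.add(name + " is not part of the type");
            }
        }
        return problems;
    }
}
```

A handler built this way would only need the type to decide whether to accept a feature, and could report every problem at once instead of failing on the first.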
>A while ago I recommended two routes to you. One was to take schema
>parsing seriously, the other was to lean on your guys' validation stuff
>more. You've gone down the route of schema parsing, but the problem is
>that it's not tied into anything, it doesn't _mean_ anything. If I have
>an attributeType called length, that is represented by decimals in the
>backend, then right now I can call it a decimal in my schema.xml file
>and no one will complain until I actually try to validate it. Now I
>felt this was fine when we were just passing a schema straight from the
>schema.xml file, putting all the onus on the user. But now that we sort
>of mess with it in a half assed way I feel less good about it. We don't
>_actually_ let the user do everything that one can with a schema (not
>even close, since you can do some crazy shit), but we also don't check
>the things that we do let him do. If this information were in the
>AttributeType then this would be easier to do. What we would do is have
>the user's defined FeatureType interact with the datastore. We would
>create the features with his defined FeatureType, or at least parse them
>on the way out. We could have an option for speed for him to not go
>through that additional step, but if we are controlling the data in the
>way that we are then we should be able to provide some checks.
>
>
Actually this is tied to something, the user interface. We needed this
information to have a decent ui, recognize pointPropertyType and so on.
I will understand if we need to back off from this approach, let's try
and fix those bugs you did find (remember we have no practical
experience writing strange XMLSchemas).
Can you do me a favour and place as many strange XMLSchemas into JUnit
tests as we possibly can? This will be the fastest way to clean up support.
Back to your point, I am much more comfortable forcing AttributeType to
handle restrictions than generating validation tests. We both want
AttributeType to be more expressive, let's add functionality at the same
time.
>So I'd like to provide those checks if we are in fact controlling the
>user's data as we are now. The alternative, that I've thought through
>less thoroughly, is to read information in from the schemas but
>translate that into your guys' validation rules. You have things like
>length and minimums and whatnot I think? I don't know the validation
>stuff that well, but it seems like it could offer, or could be coded to
>offer, most all of the restrictions that are possible. I think true
>schema parsing is the way to eventually go, but if you insist on still
>turning the user's schema data into code then I really think we should
>actually do something with that data. Representing it in code isn't
>that great if we can't do anything with it (which I feel like is the
>case with AttributeTypeMetaData), and I don't want more effort to go
>into representing the things I've brought up without setting up data
>structures to actually do stuff with it. And I think those
>datastructures should be extensions to the core geotools classes, not
>weird metadata tacked on.
>
>
Providing checks as part of the validation stuff can be done, I am more
interested in pushing this sort of thing into gt2.
Talk more about "true" schema parsing? Once again I am sorry for
confusing this issue with AttributeTypeMetaData, this amounted
to hacking my own class since it would have taken too much time to place
changes into AttributeType.
>This also speaks against your thought to have non-fragmented schema.xml
>files, because if we give people that kind of free rein they're going
>to do all sorts of nasty stuff that we can't currently represent.
>
>
Sigh, makes me wish we had just given them a big text area. Right now
we are looking at four weeks all told vs a big old text area.
There may be some hope there - apparently while gml and xs are not
fixed, there is a name that is fixed that is separate from both the
prefix and where the XMLSchema file is located.
>And of course we can just put restrictions on people of what they can
>and can't do in the schema.xml files, but that really wasn't the point
>of keeping around the schema.xml files – anything that we can handle
>automatically we should just handle automatically. But we don't
>currently handle enough automatically to do away with user configurable
>stuff.
>
>Does this make sense? Am I being unreasonable? I just feel that if we
>are going to attempt to read schemas and put them into data structures
>we should 1) handle close to everything or leave a way for people to
>bypass it, and 2) actually do something with the datastructures, as they
>are useful, and could do some wonderful things with validation, which is why
>people would add such extensions in the first place.
>
>
You are reasonable:
1) We need to quash bugs, I would like to handle everything (well half
of everything, we have an XML fragment escape hatch)
2) We need to give geotools2 some time to adjust, let these changes
ripple through the API before we can expect very much.
Remember that the XMLSchemas on their own are useful, they describe the
data to the end GML user. If you are desperate for 2) we could generate
some validation tests, but I would hate to see anything distract from
making a richer gt2 attributeType model.
>Ok, more tomorrow, and please let me know if I'm being completely
>wrong-headed about this, I'm just worried that this path of trying to do
>schema processing is going to bite us in the ass at a later date. And
>apologies for errors in writing, this is too long and I'm too tired to
>proof read.
>
>
That is okay - really Chris, this is great feedback.
Remember where XMLSchema is concerned - we are always going to get
bitten at a later date. The pain we are feeling now is that GMLUtils has
moved the date closer. Most of this email has reminded me of all the
short cuts I made with AttributeTypeMetaData. Talking with David, he is
keen that you have found some bugs.
My only suggestion is - send these emails to the geotools2 list. It is a
development list. I have been bad recently and have had a lot of
these private, productive development discussions/decisions away from
the group. I have tried to reform my ways by posting irc chats to Jira
bug tasks. Whenever you start off one of these threads (I am sure it
began as a short email), I never know if I should cross post it to devel
or not.
If you originally did not cross post it for fear of a) filling up the
email list with pages of text or b) upsetting David or me with public
review please don't worry about such in the future.
a) The list is for public review b) we are all working on this together
- XMLSchema has been beating us up on and off for weeks now, getting the
whole list involved can only help.
Jody
--