[Geoserver-devel] thinking about literal, and the other sides of the coin...

We have been toying on the email list (from a bunch of separate problems) with the idea of creating a helper class, i.e. a converter, to allow our library to work with more "literal" types.

One of the things that appears on RobA's plan for GeoServer over the next while is a range of new types to consider: some of them (like time) we are talking about now, others are small beans that are known to us (Citation and so on), and others will be new ...

I was thinking we may wish to consider the larger picture:
- our SAX and DOM parsers need this utility class to allow correct handling of simple "literals" (i.e. plain text)
- our XML Transform system needs a similar system to write out a plain text literal
- our prototype XDO parser can also make use of the utility class for both reading and writing plain text literals
- our GTXML parser can also make use of the utility class for both reading and writing plain text literals

<aside>
We also have the following examples using XML fragments:
- our SAX parsers support Geometry (an example of content that is an XML fragment rather than a plain text literal)
- our DOM parser supports Geometry (an example of content that is an XML fragment rather than a plain text literal)
- GTXML provides an API allowing us to write a "binding" to get a bean produced from XML

(For handling XML fragments we should directly reuse the GTXML bindings: we can traverse the DOM to feed a small fragment into GTXML if needed, and the SAX parser should just be replaced.)
</aside>

If we combined all of these concerns, what would the interface look like? Something similar to the Hibernate "UserType", but instead defined with the methods needed to make our code turn over....

The easy case is what we need right now - a value that can be encoded in plain-text:

interface UserType<T> {
      Class<T> getType();
      String toLiteral( T object );
      T parseLiteral( String literal );
}

The idea is that when parsing, you have the target schema in mind, and the AttributeType has a class which you can use to look up UserType to parse the literal. The same thing works in reverse when transforming content into XML.
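To make the lookup idea concrete, here is a rough sketch of how it might hang together. It assumes parseLiteral returns T, and the names BooleanUserType and UserTypeRegistry are made up for illustration; nothing like them exists in the library yet. The class passed to parse or encode stands in for whatever class the AttributeType reports.

```java
import java.util.HashMap;
import java.util.Map;

// The interface from this message; parseLiteral returns T in this sketch.
interface UserType<T> {
    Class<T> getType();
    String toLiteral(T object);
    T parseLiteral(String literal);
}

// Hypothetical binding for a plain-text boolean literal.
class BooleanUserType implements UserType<Boolean> {
    public Class<Boolean> getType() { return Boolean.class; }

    public String toLiteral(Boolean object) { return object.toString(); }

    public Boolean parseLiteral(String literal) {
        String s = literal.trim();
        if (s.equals("true") || s.equals("1")) return Boolean.TRUE;
        if (s.equals("false") || s.equals("0")) return Boolean.FALSE;
        throw new IllegalArgumentException("Not a boolean literal: " + literal);
    }
}

// Hypothetical registry: one UserType per Java class.
class UserTypeRegistry {
    private final Map<Class<?>, UserType<?>> types = new HashMap<>();

    <T> void register(UserType<T> type) {
        types.put(type.getType(), type);
    }

    @SuppressWarnings("unchecked") // safe: entries are keyed by their own getType()
    <T> UserType<T> lookup(Class<T> binding) {
        UserType<T> type = (UserType<T>) types.get(binding);
        if (type == null) {
            throw new IllegalArgumentException("No UserType for " + binding.getName());
        }
        return type;
    }

    // Parsing: the class known from the target schema selects the converter.
    <T> T parse(Class<T> binding, String literal) {
        return lookup(binding).parseLiteral(literal);
    }

    // Encoding: the same lookup, used in reverse when writing XML.
    <T> String encode(Class<T> binding, T value) {
        return lookup(binding).toLiteral(value);
    }
}
```

A parser would call something like registry.parse(Boolean.class, text), and the transform side would call registry.encode(Boolean.class, value).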

If we did not have the AttributeType information we would need to:
- try them all, i.e. boolean canParse( String literal ); or
- look at the schema to make an informed guess .... which is exactly what GTXML does
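The "try them all" option could look roughly like the sketch below. ProbingUserType, the three example types and LiteralGuesser are hypothetical names for illustration only. Registration order acts as priority here, which is an assumption of this sketch rather than anything we have agreed on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the interface with the canParse method added.
interface ProbingUserType<T> {
    Class<T> getType();
    boolean canParse(String literal);
    T parseLiteral(String literal);
}

class IntegerType implements ProbingUserType<Integer> {
    public Class<Integer> getType() { return Integer.class; }
    // At most nine digits, so the value always fits in an int.
    public boolean canParse(String literal) { return literal.trim().matches("[+-]?\\d{1,9}"); }
    public Integer parseLiteral(String literal) { return Integer.valueOf(literal.trim()); }
}

class BooleanType implements ProbingUserType<Boolean> {
    public Class<Boolean> getType() { return Boolean.class; }
    public boolean canParse(String literal) {
        String s = literal.trim();
        return s.equals("true") || s.equals("false");
    }
    public Boolean parseLiteral(String literal) { return Boolean.valueOf(literal.trim()); }
}

// Fallback that accepts anything and keeps it as text.
class StringType implements ProbingUserType<String> {
    public Class<String> getType() { return String.class; }
    public boolean canParse(String literal) { return true; }
    public String parseLiteral(String literal) { return literal; }
}

class LiteralGuesser {
    private final List<ProbingUserType<?>> candidates = new ArrayList<>();

    void add(ProbingUserType<?> type) { candidates.add(type); }

    // Returns the value from the first candidate that accepts the literal.
    Object guess(String literal) {
        for (ProbingUserType<?> type : candidates) {
            if (type.canParse(literal)) {
                return type.parseLiteral(literal);
            }
        }
        throw new IllegalArgumentException("No type accepts: " + literal);
    }
}
```

The weakness is visible straight away: the caller only gets an Object back, and the result depends on the order the types were added in. That is a good argument for using the AttributeType or the schema whenever we have it.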

So here are my questions....
Q: can we settle down on the plain-text "literal" bindings for GTXML, ensure the API is simple and well documented, and start making use of it right away to patch our existing SAX and DOM parsers?

Q: you do have the concept in XML of allowing a field to contain many literals (separated by spaces). In a pure parsing scenario this would be a case where canParse( String literal ) is required, but for our use our attribute types have a single type of content right now.
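For the single-content-type case, a space-separated field needs no canParse at all: split on whitespace and hand each token to the one UserType. A minimal sketch, assuming parseLiteral returns T; DoubleUserType and ListLiterals are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// The interface from this message; parseLiteral returns T in this sketch.
interface UserType<T> {
    Class<T> getType();
    String toLiteral(T object);
    T parseLiteral(String literal);
}

// Hypothetical binding for a plain-text double.
class DoubleUserType implements UserType<Double> {
    public Class<Double> getType() { return Double.class; }
    public String toLiteral(Double object) { return object.toString(); }
    public Double parseLiteral(String literal) { return Double.valueOf(literal.trim()); }
}

class ListLiterals {
    // Splits on runs of whitespace and parses every token with the same type.
    static <T> List<T> parseList(UserType<T> itemType, String literal) {
        List<T> values = new ArrayList<>();
        String trimmed = literal.trim();
        if (trimmed.isEmpty()) {
            return values; // empty content gives an empty list
        }
        for (String token : trimmed.split("\\s+")) {
            values.add(itemType.parseLiteral(token));
        }
        return values;
    }

    // Writes the values back out separated by single spaces.
    static <T> String toListLiteral(UserType<T> itemType, List<T> values) {
        StringBuilder sb = new StringBuilder();
        for (T value : values) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(itemType.toLiteral(value));
        }
        return sb.toString();
    }
}
```

This only works for item types whose literals never contain whitespace themselves.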

Jody

One thing to emphasise:

One strategy to build the data model is to parse the schema. This is certainly a huge step forward from assuming the persistence layer can be turned into what the user wants directly. It does however run into some limitations, in that a GML schema is actually not fully expressive of the data model. For example, JTS implements a subset of the ISO geometry types, and we do not rely on parsing the GML schemas to understand how these behave.

The ability to plug in type-handling libraries provides more power, and significantly better performance, than parsing schemas for well-known types. One set of types we know we will have trouble with is those in ISO19139 - the XML encoding of ISO19115 metadata - which creates elements with xs:anyType that are supposed to be bound to concrete types using a separate "isoType" attribute. There are several approaches to this problem, but all will require some ability to inject type bindings during the parsing process, probably from a schema-mapping configuration.
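As a very rough illustration of what "injecting" a binding could mean, the sketch below keys converters by the value of the "isoType" attribute rather than by the schema type. Everything in it is hypothetical: the class name IsoTypeBindings, the placeholder key "example:Count" (not a real ISO19139 value), and the simplification that the element content is plain text rather than an XML fragment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical holder for bindings injected from configuration.
class IsoTypeBindings {
    private final Map<String, Function<String, Object>> parsers = new HashMap<>();

    // Called while loading a (hypothetical) schema-mapping configuration.
    void inject(String isoType, Function<String, Object> parser) {
        parsers.put(isoType, parser);
    }

    // Called by the parser when it meets an xs:anyType element:
    // the attribute value, not the schema, selects the converter.
    Object parse(String isoType, String text) {
        Function<String, Object> parser = parsers.get(isoType);
        if (parser == null) {
            return text; // no binding injected: leave the content as text
        }
        return parser.apply(text);
    }
}
```

A mapping loader might call bindings.inject("example:Count", s -> Integer.valueOf(s.trim())) before parsing starts. Whether an unbound isoType should fall back to text or fail is exactly the kind of detail the API discussion would need to settle.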

Hopefully, we can get the API right, then introduce trivial implementations to get over the immediate problems. Do we need a breakout on this issue to get it right? Can the FM branch hackers identify the details of a candidate API to discuss?

RobA

_______________________________________________
Geotools-devel mailing list
Geotools-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel