[Geoserver-devel] Question about FeatureType info.xml

Hello,

I am still working on SDE support in GeoServer, and I will provide more
information about the jar distribution and SDE views on this mailing
list soon.

I have the following questions about the featureType schema. They may
not be directly related to GeoServer; if they are not, I apologize for
posting them here.

i) Currently each featureType is stored in a separate directory. Windows
limits the number of subdirectories in a directory, so what is the
correct way to handle this on Windows? (I am expecting to have tens of
thousands of featureTypes.)

ii) Many of my featureTypes have similar data schemas. Can I create my
own schema base to include in my featureTypes? I am open to suggestions
and ideas. This problem has probably been dealt with many times, and I
would really appreciate it if someone could share their thoughts or
experience.

Thanks in advance for any help.

Regards.

Eddy

I am still working on SDE support in GeoServer, and I will provide more
information about the jar distribution and SDE views on this mailing
list soon.

Cool.

I have the following questions about the featureType schema. They may
not be directly related to GeoServer; if they are not, I apologize for
posting them here.

No worries, anything remotely related is fine to deal with here.

i) Currently each featureType is stored in a separate directory. Windows
limits the number of subdirectories in a directory, so what is the
correct way to handle this on Windows? (I am expecting to have tens of
thousands of featureTypes.)

Hmmm... We definitely did not anticipate this problem. Is it the total
number of sub directories that Windows limits? What I mean is, could you
have only 100 sub directories at the top level, with each of those
holding another 99 or so sub directories? Is the limit on the sub
directories at one level, or on all recursed sub directories? If it's the
former then I believe GeoServer should be able to handle it, just group
the featureTypes - I'm pretty positive the configuration reader is made
to recurse down. If it's the latter then we'll probably have to redo the
way we read configuration. I've been thinking a bit about getting away
from the directory structure and at least allowing the option for a flat
file. I believe this shouldn't be too hard, as it's all XML; we just have
to figure out how to specify each featureType. But with tens of thousands
of featureTypes we might actually feel the performance hit of DOM (as it
loads the whole tree in memory). I must admit I never anticipated
configuration files so big that you'd actually feel that hit from DOM.
There have also been thoughts to rewrite the configuration reading to use
something like Avalon (I think that's what it's called), tools from the
cocoon project, which may handle memory issues a little better.
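
To make the grouping idea concrete, here is a minimal sketch in Python
(chosen only for brevity). It assumes the limit applies per directory
level and that a reader recurses down looking for info.xml, as described
above. The bucket scheme, the bucket count and the function names are
illustrative assumptions, not GeoServer's actual configuration reader.

```python
import hashlib
import os


def bucket_for(type_name, buckets=100):
    """Map a featureType name to a stable two-digit bucket directory name."""
    digest = hashlib.md5(type_name.encode("utf-8")).hexdigest()
    return "%02d" % (int(digest, 16) % buckets)


def feature_type_dir(root, type_name, buckets=100):
    """Return root/<bucket>/<type_name>, so no single level grows too large."""
    return os.path.join(root, bucket_for(type_name, buckets), type_name)


def find_info_files(root):
    """Recurse below root and collect every info.xml, at whatever depth."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "info.xml" in filenames:
            found.append(os.path.join(dirpath, "info.xml"))
    return sorted(found)
```

With 100 buckets, tens of thousands of featureTypes would leave a few
hundred directories in each bucket on average, and a recursive reader
would still find them all.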

ii) Many of my featureTypes have similar data schemas. Can I create my
own schema base to include in my featureTypes? I am open to suggestions
and ideas. This problem has probably been dealt with many times, and I
would really appreciate it if someone could share their thoughts or
experience.

You should be able to. We actually had a lot of discussion (I felt maybe
too much) about how to handle schemas. We wanted to make it super easy
for most users - if you don't input any information then it's all just
generated for you from the structure of the data. So if you are fine with
having it all generated, with not having as much control over it, then
just don't create a schema.xml file (doing the defaults in the web admin
tool should do this for you). Whenever a describeFeatureType request is
made the appropriate XML schema is generated.

We also wanted to give people a decent level of control over some of the
basics, and wanted to centralize some of the configuration. So what we
did was allow the schema.xml to have some control over the featureType.
You can make attributes mandatory (set minOccurs=1), which means they are
always returned in a GetFeature response (even if not requested, since
they need to be there to match the schema). And you can also leave out
certain attributes, and they will then be essentially 'hidden', not
reported in any of the returned features. In 1.3 we're hoping to have
the ability to map different names to the various attributes.
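
As a rough illustration of those two rules (mandatory attributes are
always returned, unlisted attributes stay hidden), here is a small Python
sketch. The schema fragment, the type name roads_Type and the attribute
names are made up for the example, and the exact layout GeoServer expects
in schema.xml may differ.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema.xml content; "surface" is deliberately not listed.
SCHEMA_XML = """\
<xs:complexType name="roads_Type" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:element name="name" type="xs:string" minOccurs="1"/>
    <xs:element name="lanes" type="xs:int" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
"""


def exposed_attributes(schema_text):
    """Map each listed attribute name to True if it is mandatory."""
    root = ET.fromstring(schema_text)
    result = {}
    for el in root.iter(XS + "element"):
        # XML Schema treats a missing minOccurs as 1.
        result[el.get("name")] = int(el.get("minOccurs", "1")) >= 1
    return result


def attributes_to_return(schema_text, requested):
    """Mandatory attributes always; optional ones only when requested."""
    exposed = exposed_attributes(schema_text)
    return [name for name, mandatory in exposed.items()
            if mandatory or name in requested]
```

Here a request for "lanes" and "surface" would yield "name" and "lanes":
"name" because it is mandatory, and no "surface" because it is not listed
in the schema.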

We figured that would cover maybe 99% of the uses. And I insisted that we
allow people to have complete control over the schema.xml file if they
wanted to. I'm actually not positive that this completely worked out;
right now I'm wondering if GeoServer might still try to assert some
control over what appears in the output. But basically you can put in
whatever you want for the schema.xml file, and it will be returned as the
DescribeFeatureType response, with a slight bit of extra information,
which allows us to perform the appropriate namespacing. So the answer is,
if you really want your own schemaBase, I'm pretty sure you can have it.
I don't know all the exact XML statements to do it right, but basically I
think you would have it live somewhere online (it could be within
geoserver, in one of the featureType directories for example, though
you'd probably have to hard code the reference, since you would not get
geoserver's ability to resolve where the request is coming from). You
would have an import statement to import it into your schema.xml file,
and then I think have each featureType extend it in a different way.
There's a chance I may be wrong though: you may need the import at the
level of the complexType name=..., and that's where GeoServer does some
generation, so that the type and the substitution group match. If this
doesn't work for you and you need more control I can definitely work with
you to get it working as you'd like. You're actually probably one of the
more advanced users of GeoServer, so I don't think this problem has
actually been dealt with _many_ times. But I would say that if you just
need a schema that matches the features you're returning then just let
GeoServer generate it for you; that's what most users do. Do you have
good reasons not to do this? I believe there are good reasons, I'm just
interested in getting more actual user feedback, since we really did not
have much of an idea when we coded it up.
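
For what it's worth, here is a small Python sketch of the "shared base,
each featureType extends it" idea. The type names (CommonType,
roads_Type), the file name base.xsd and the attributes are invented for
illustration, and nothing here is checked against what GeoServer accepts
in schema.xml. It uses xs:include, which pulls in definitions from the
same target namespace; xs:import would be the statement to use if the
base lived in a different namespace.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical shared base schema, imagined as living at base.xsd.
BASE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CommonType">
    <xs:sequence>
      <xs:element name="id" type="xs:string"/>
      <xs:element name="updated" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Hypothetical per-featureType schema that extends the shared base.
ROADS_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="base.xsd"/>
  <xs:complexType name="roads_Type">
    <xs:complexContent>
      <xs:extension base="CommonType">
        <xs:sequence>
          <xs:element name="lanes" type="xs:int"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""


def element_names(type_name, schemas):
    """List a type's attributes, inherited ones first, then its own."""
    types = {}
    for text in schemas:
        for node in ET.fromstring(text).iter(XS + "complexType"):
            types[node.get("name")] = node
    node = types[type_name]
    ext = node.find(XS + "complexContent/" + XS + "extension")
    inherited = element_names(ext.get("base"), schemas) if ext is not None else []
    own = [el.get("name") for el in node.iter(XS + "element")]
    return inherited + own
```

Resolving roads_Type against both schemas gives the two shared attributes
followed by "lanes", which is the effect a common schema base is meant to
have: the shared part is written once and each featureType only adds its
own attributes.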

best regards,

Chris

Hi Eddy,

Chris has eloquently reviewed the design considerations behind geoserver's current thinking. There is some work being done to shake this paradigm up a little - with the idea that the feature you serve is best defined by the user's needs, not the database schema employed.

You might like to review the thinking at:

http://geotools.org/Community+Schema+Support+and+Complex+Types

and the progress we have made at:

https://www.seegrid.csiro.au/twiki/bin/view/Infosrvices/GeoserverMapping

(NB "We" is a consortium that includes GML and WFS specification working group member Simon Cox - so we are at the cutting edge of the standardisation of these approaches if you are interested in deeper collaboration)

You might also look at the thinking we have developed in respect of "Feature Type Catalogs" at:

https://www.seegrid.csiro.au/twiki/bin/view/Infosrvices/InformationViewpoint

It sounds to me like you might be running up against the sort of semantic scalability concerns that we have explored a bit, and we'd be very happy to have your feedback at the very least.

There might be a way of reconciling these concerns with the "ease of setup" issues Chris has raised, by automatically generating the implementation (tables) from the externally defined schemas. It wouldn't surprise me, though, if your application could use a single table with extension "slots" to hold the many related feature types, in which case the ability to pull out only the matching features for a particular schema is required. This would be supported by the current implementation under the "complex_sco" branch.

Cheers
Rob Atkinson

_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel