[Geoserver-devel] Creating new feature types from scratch (aka DataStore.createSchema())

Hi,
at OpenGeo we're looking into adding to GeoServer the ability
to create new feature types from a user provided description
by leveraging the data store createSchema(FeatureType) call.

With this mail I'm trying to provide some ideas on the how
and gather feedback.

The main idea is to have a GUI that allows one to compose a
feature type description and an equivalent REST call.
The GUI could be placed in the "new layer chooser".
Right now we have an
"add layer from" <datastore>
selection; I guess we could have a couple of radio buttons
instead, also providing a
"create layer into" <datastore>
option, or something like that.

The GUI would then offer the ability to add attributes,
each with a name, data type, nullability and length.

The REST API would allow one to POST a description of
the feature type to
/workspaces/<ws>/datastores/<ds>
The description would come in the same formats
supported for the feature type
(see http://docs.geoserver.org/2.0.x/en/user/extensions/rest/rest-config-api.html#feature-types)

This would make it quite a bit easier to start
from scratch and allow data editing.
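
As a rough illustration of the proposed REST call, the sketch below builds (but does not send) a POST to the path mentioned above. The base URL, the workspace and store names, and every element name inside the XML payload are assumptions made for this example; the mail only says the description would follow the existing feature type formats, it does not define the payload.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch only: builds the proposed "create feature type" request without sending it. */
class CreateFeatureTypeRequest {

    // Hypothetical payload: the element names are illustrative, not a confirmed format.
    static final String EXAMPLE_XML =
        "<featureType>"
      + "<name>roads</name>"
      + "<attributes>"
      + "<attribute><name>label</name><binding>java.lang.String</binding>"
      + "<nillable>true</nillable><length>40</length></attribute>"
      + "</attributes>"
      + "</featureType>";

    /** The POST target follows the path given in the mail: /workspaces/<ws>/datastores/<ds>. */
    static HttpRequest build(String restBase, String workspace, String store, String xml) {
        URI uri = URI.create(restBase + "/workspaces/" + workspace + "/datastores/" + store);
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "text/xml")
            .POST(HttpRequest.BodyPublishers.ofString(xml))
            .build();
    }

    public static void main(String[] args) {
        // Assumed base URL of a local GeoServer REST endpoint.
        HttpRequest request =
            build("http://localhost:8080/geoserver/rest", "topp", "myDataStore", EXAMPLE_XML);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Actually sending the request would additionally need an HttpClient and credentials, which are left out here on purpose.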

Two things are also rolling around in my mind. The first is how
to handle something as simple as human error.

Say I made a mistake in the structure of the feature type.
How do I go and correct it?
DataStore and DataAccess provide updateSchema(), though I'm not aware
of any datastore actually implementing this call. For
a datastore
DataStore is also missing a dropSchema() method, which sounds
like quite a strong limitation for this use case... I mean,
being able to create a feature type from the GUI and then having to go
to the database and issue a drop table manually seems backwards.
Dropping is also much simpler to implement than updating.
How do we go about adding this functionality though?
Shall we roll a new interface that contains only that method?
Or create DataAccess2, DataStore2 subinterfaces that contain
it (oh the horror)?
Or... but please be seated before reading this one... hijack
updateSchema and assume the user meant to delete the feature
type when we call updateSchema(Name, null)?
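
Of the options listed above, the "new interface that contains only that method" route could look roughly like the following sketch. The interface name SchemaDropper and the stand-in store classes are invented for illustration; none of this is existing GeoTools API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an opt-in, single-method capability interface. */
class CapabilityInterfaceSketch {

    /** Hypothetical interface holding only the drop operation. */
    interface SchemaDropper {
        void dropSchema(String typeName);
    }

    /** Stand-in store that opts in to dropping. */
    static class TableStore implements SchemaDropper {
        final List<String> tables = new ArrayList<>(List.of("roads", "rivers"));

        @Override
        public void dropSchema(String typeName) {
            if (!tables.remove(typeName)) {
                throw new IllegalArgumentException("No such type: " + typeName);
            }
        }
    }

    /** Stand-in store that does not support dropping. */
    static class ReadOnlyStore { }

    /** Returns true if the store supported the operation and the type was dropped. */
    static boolean dropIfSupported(Object store, String typeName) {
        if (store instanceof SchemaDropper) {
            ((SchemaDropper) store).dropSchema(typeName);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TableStore store = new TableStore();
        System.out.println(dropIfSupported(store, "roads"));
        System.out.println(dropIfSupported(new ReadOnlyStore(), "roads"));
    }
}
```

Compared with overloading updateSchema(Name, null), the caller can tell up front whether a store supports dropping at all, at the price of one more type to maintain.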

The second thing is, would it make things easier if we mark
a datastore as the "default" and then have all schema management
calls hit it?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Good stuff, this is something I have thought a little about over the last while. Here are some of my random thoughts:

* DataStore.deleteSchema()

I think I would prefer adding a deleteSchema rather than overload the meaning of updateSchema... even though in most cases that is how a datastore would implement updateSchema, first dropping then creating.

How about adding the deleteSchema() method to a few of the abstract data store base classes (like ContentDataStore) without explicitly adding it to the interface? Then of course we would have to call the method reflectively and bomb out in cases where it does not exist.

Then when we go to a new major GeoTools version we pull it up into the DataStore interface.
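
A minimal sketch of the reflective call described above, using plain java.lang.reflect. The two store classes are stand-ins written for this example, and the deleteSchema(String) signature is an assumption, since the thread does not fix the parameter type.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Sketch: call an optional deleteSchema(String) that is not declared on any interface. */
class ReflectiveSchemaDelete {

    /** Stand-in for a store whose base class already gained deleteSchema(). */
    public static class StoreWithDelete {
        public String deleted;

        public void deleteSchema(String typeName) {
            deleted = typeName;
        }
    }

    /** Stand-in for a store that has no such method. */
    public static class StoreWithoutDelete { }

    /** Looks the method up by name and "bombs out" when the store does not have it. */
    static void deleteSchema(Object store, String typeName) {
        Method method;
        try {
            method = store.getClass().getMethod("deleteSchema", String.class);
        } catch (NoSuchMethodException e) {
            throw new UnsupportedOperationException(
                store.getClass().getSimpleName() + " does not support deleteSchema", e);
        }
        try {
            method.invoke(store, typeName);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("deleteSchema failed for " + typeName, e);
        }
    }

    public static void main(String[] args) {
        StoreWithDelete store = new StoreWithDelete();
        deleteSchema(store, "roads");
        System.out.println("deleted: " + store.deleted);
    }
}
```

Once the method is pulled up into the interface, the lookup collapses into a normal call and the UnsupportedOperationException branch goes away.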

* Attribute length restrictions

If creating a new schema via an http ("rest") call we are limited to what AttributeTypeInfo allows us to represent, which leaves out length restrictions and other restrictions.

Perhaps we can just leave this as a limitation of the http interface. Or perhaps we modify AttributeTypeInfo and add a length field.

* Default datastore

I was thinking that it would be nice to add a concept of a default data store to a workspace, just like we have the concept of a default workspace.

And like workspaces we add a "symbolic link" called "default" which is accessible via http GET/POST/PUT. For example:

   GET http://…/workspaces/topp/datastores/default.xml

And when creating a new schema a user can:

   POST http://…/workspaces/topp/datastores/default/featuretypes/

Or they can explicitly specify a datastore:

   POST http://…/workspaces/topp/datastores/myDataStore/featuretypes/

Or perhaps just rely on all the server defaults:

   POST http://…/workspaces/default/datastores/default/featuretypes/
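
One way to read the "symbolic link" idea is as a small resolution step applied to the path segments before the request is handled. The sketch below assumes the configuration can be reduced to a default workspace name plus a map from workspace to default store name; the class and its fields are hypothetical, not GeoServer catalog API.

```java
import java.util.Map;

/** Sketch: resolve the reserved name "default" in workspace and datastore path segments. */
class DefaultStoreResolver {
    static final String DEFAULT = "default";

    final String defaultWorkspace;
    final Map<String, String> defaultStoreByWorkspace;

    DefaultStoreResolver(String defaultWorkspace, Map<String, String> defaultStoreByWorkspace) {
        this.defaultWorkspace = defaultWorkspace;
        this.defaultStoreByWorkspace = defaultStoreByWorkspace;
    }

    /** Returns {workspace, store} with every "default" segment replaced by the configured name. */
    String[] resolve(String workspace, String store) {
        String ws = DEFAULT.equals(workspace) ? defaultWorkspace : workspace;
        String ds = store;
        if (DEFAULT.equals(store)) {
            ds = defaultStoreByWorkspace.get(ws);
            if (ds == null) {
                throw new IllegalStateException("No default data store configured for workspace " + ws);
            }
        }
        return new String[] {ws, ds};
    }

    public static void main(String[] args) {
        DefaultStoreResolver resolver =
            new DefaultStoreResolver("topp", Map.of("topp", "myDataStore"));
        String[] resolved = resolver.resolve("default", "default");
        System.out.println(resolved[0] + "/" + resolved[1]);
    }
}
```

A side effect of this reading is that a store literally named "default" can no longer be addressed by its own name.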

-Justin

On 3/4/10 7:22 AM, Andrea Aime wrote:

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

Good stuff, this is something I have thought a little about over the last while. Here are some of my random thoughts:

* DataStore.deleteSchema()

I think I would prefer adding a deleteSchema rather than overload the meaning of updateSchema... even though in most cases that is how a datastore would implement updateSchema, first dropping then creating.

Ouch, wouldn't this result in losing all the data?
In database terms, I would be more in favour of an alter table
than a drop and create.
Shapefiles would have to create the new type with a different name,
copy over as much data as possible, destroy the old one and rename the new one.

How about adding the deleteSchema() method to a few of the abstract data store base classes (like ContentDataStore) without explicitly adding it to the interface. Then of course we would have to call the method reflectively and bomb out in cases where it does not exist.

Then when we go to a new major GeoTools version we pull it up into the DataStore interface.

Sounds like a good plan to me. I think that by covering ContentDataStore
and AbstractDataStore we would have the method in most of
the actual datastore implementations.
And yes, calling it reflectively allows stores that do not extend
any of the above (ArcSdeDataStore, PreGeneralizeDataStore)
to still expose such a method and have it called.

* Attribute length restrictions

If creating a new schema via an http ("rest") call we are stuck to representing what AttributeTypeInfo allows us to represent. Which leaves out length restrictions and other restrictions.

Perhaps we can just leave this as a limitation of the http interface. Or perhaps we modify AttributeTypeInfo and add a length field.

Not having the length for strings might end up being quite a limitation.
Afaik we default to 255, which might be way too much (e.g., the field
is supposed to contain the values 'A', 'B', 'C') or not enough (if
the field contains a narrative).
So I would rather go and add it as a new feature.

There is also the problem that we don't have a clear way to differentiate between VARCHAR(x) and TEXT. I guess past this point
people will just have to drop the GS facilities and use native
tools to create the types.
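
To make the length discussion concrete, here is a small sketch of how an optional length could drive the column type for a String attribute. The 255 default comes from the mail above; the UNBOUNDED marker used to select TEXT is a convention invented for this example, and the SQL type names are generic rather than tied to a particular database.

```java
/** Sketch: derive a SQL column type for a String attribute from an optional length. */
class StringColumnType {
    static final int DEFAULT_LENGTH = 255; // the default mentioned in the thread
    static final int UNBOUNDED = -1;       // hypothetical marker meaning "no length limit"

    static String sqlType(Integer length) {
        if (length == null) {
            return "VARCHAR(" + DEFAULT_LENGTH + ")"; // no length supplied by the user
        }
        if (length == UNBOUNDED) {
            return "TEXT";                            // narrative fields
        }
        if (length < 1) {
            throw new IllegalArgumentException("Invalid length: " + length);
        }
        return "VARCHAR(" + length + ")";
    }

    public static void main(String[] args) {
        System.out.println(sqlType(null));
        System.out.println(sqlType(1));
        System.out.println(sqlType(UNBOUNDED));
    }
}
```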

* Default datastore

I was thinking that it would be nice to add a concept of a default data store to a workspace, just like we have the concept of a default workspace.

And like workspaces we add a "symbolic link" called "default" which is accessible via http GET/POST/PUT. For example:

   GET http://…/workspaces/topp/datastores/default.xml

And when creating a new schema a user can:

   POST http://…/workspaces/topp/datastores/default/featuretypes/

Or they can explicitly specify a datastore:

   POST http://…/workspaces/topp/datastores/myDataStore/featuretypes/

Or perhaps just rely on all the server defaults:

   POST http://…/workspaces/default/datastores/default/featuretypes/

Scratch scratch. I'm having a hard time getting why the
default is important (I hesitantly mentioned it in my first
mail because it was something that was in the air about this work).

The reason I have for having the default workspace is to allow people
to avoid prefixing layer names in OWS requests.
What is the reason for having a default store? Avoiding having to remember
its name when doing REST calls? Building a client that does
not know anything about existing workspaces and stores and is still
able to create a layer?

On a different note, if we also have the ability to drop and
update, where do we exercise it?
If we have dropping I guess we can skip updating for the
moment and simplify the implementation a little.
But dropping... where? Maybe in the layer feature type section?
A button that allows one to kill the layer and also its underlying
resource for good?
Updating, if implemented, could go in the same panel.

Things could get a little confusing however the day we also
have remapping, that is, changing the name or the type of a
column for publishing. This would look a little like updating,
but we would not actually change the structure of the
underlying storage.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On 3/4/10 10:06 AM, Andrea Aime wrote:

Justin Deoliveira wrote:

Good stuff, this is something I have thought a little about over the
last while. Here are some of my random thoughts:

* DataStore.deleteSchema()

I think I would prefer adding a deleteSchema rather than overload the
meaning of updateSchema... even though in most cases that is how a
datastore would implement updateSchema, first dropping then creating.

Ouch, wouldn't this result in losing all the data?
In database terms, I would be more in favour of an alter table
than a drop and create.
Shapefiles would have to create the new type with a different name,
copy over as much data as possible, destroy the old one and rename the new one.

I think we are misunderstanding each other. All I am saying is that if the user wishes to update the schema with another attribute, or drop an attribute, etc... then the code ends up calling updateSchema(). If the user actually wishes to delete the schema then the code ends up calling deleteSchema(). If we want to be cautious we can call deleteSchema() only if a "purge" flag is set.
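
The "purge" idea can be sketched as follows: removing a layer always unpublishes it, but the underlying storage is only touched when the flag is set. The two sets are stand-ins for the catalog and the data store, and the method name removeLayer is made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: removing a layer only deletes the stored schema when a purge flag is set. */
class PurgeFlagSketch {
    // Stand-ins for the catalog (published layers) and the store (physical schemas).
    final Set<String> publishedLayers = new HashSet<>();
    final Set<String> storedSchemas = new HashSet<>();

    void removeLayer(String typeName, boolean purge) {
        publishedLayers.remove(typeName);   // always unpublish
        if (purge) {
            storedSchemas.remove(typeName); // stands in for a deleteSchema(typeName) call
        }
    }

    public static void main(String[] args) {
        PurgeFlagSketch sketch = new PurgeFlagSketch();
        sketch.publishedLayers.add("roads");
        sketch.storedSchemas.add("roads");
        sketch.removeLayer("roads", false);
        System.out.println("schema kept: " + sketch.storedSchemas.contains("roads"));
    }
}
```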

Scratch scratch. I'm having a hard time getting why the
default is important (I hesitantly mentioned it in my first
mail because it was something that was in the air about this work).

The reason I have for having the default workspace is to allow people
to avoid prefixing layer names in OWS requests.
What is the reason for having a default store? Avoiding having to remember
its name when doing REST calls? Building a client that does
not know anything about existing workspaces and stores and is still
able to create a layer?

Yeah, to make things easier on the client. Perhaps the client does not care where the new data layer ends up in the workspace / datastore hierarchy. Or maybe the admin does not want the client to know because they want the freedom to change it.

All the client wants is a place to create new layers. In this case it would be nice to not have to burden the client with (a) knowing ahead of time what the default workspace and datastores are or (b) having to parse through configuration to find out dynamically what they are.

Anyways, this is a hypothetical use case to be sure, so not a strong argument. It would be nice to have the input of a client side developer who would be building an application against this functionality.

On a different note, if we also have the ability to drop and
update, where do we exercise it?
If we have dropping I guess we can skip updating for the
moment and simplify the implementation a little.
But dropping... where? Maybe in the layer feature type section?
A button that allows one to kill the layer and also its underlying
resource for good?
Updating, if implemented, could go in the same panel.

Works for me.

Things could get a little confusing however the day we also
have remapping, that is, changing the name or the type of a
column for publishing. This would look a little like updating,
but we would not actually change the structure of the
underlying storage.

Agreed. But I just think we need to be explicit about the differences. Remapping is a publishing concern whereas updating is a resource management concern. I guess if we had the resource/publishing split it might be a bit clearer.

Cheers
Andrea

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

I would add two distinct methods to DataStore:
- updateSchema
- dropSchema

The added clarity would be worth it.

I would also consider extending the format of updateSchema so we can supply default expressions to the "new" columns.
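
A sketch of what supplying defaults for "new" columns could mean when existing rows are carried over to an updated schema. Rows are modelled as plain maps and the defaults are literal values rather than real expressions; both are simplifications made for this example, not a proposed updateSchema signature.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: carry a row over to a new attribute list, filling new columns from defaults. */
class SchemaMigration {

    static Map<String, Object> migrate(
            Map<String, Object> row, List<String> newAttributes, Map<String, Object> defaults) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String attribute : newAttributes) {
            if (row.containsKey(attribute)) {
                out.put(attribute, row.get(attribute));      // attribute survives the update
            } else {
                out.put(attribute, defaults.get(attribute)); // new column; null if no default
            }
        }
        return out; // attributes missing from newAttributes are dropped
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("name", "A1", "lanes", 2);
        Map<String, Object> defaults = Map.of("surface", "asphalt");
        System.out.println(migrate(row, List.of("name", "surface"), defaults));
    }
}
```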

Jody

On 04/03/2010, at 11:22 PM, Andrea Aime wrote:

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

On 3/4/10 4:29 PM, Jody Garnett wrote:

I would add two distinct methods to DataStore:
- updateSchema
- dropSchema

Might be nice to agree on "delete" or "remove" rather than "drop" as the latter makes the operation seem database specific.

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

On 3/4/10 4:29 PM, Jody Garnett wrote:

I would add two distinct methods to DataStore:
- updateSchema
- dropSchema

Might be nice to agree on "delete" or "remove" rather than "drop" as the latter makes the operation seem database specific.

Makes sense... Otherwise we would have called the first "alterSchema"
:-p

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Jody Garnett wrote:

I would also consider extending the format of updateSchema so we can supply default expressions to the "new" columns.

I can propose another possibility as well. In JDBC data stores
we already put quite a bit of extra information in the user data
of each attribute: the native srid, the native column type,
and so on.

We could do the same for updateSchema: establish a set of
conventions that updateSchema is supposed to follow to
force a specific database native type or a specific default value.
For example, what does it mean that the attribute is a String?
CHAR, VARCHAR, CLOB? What does it mean that the attribute is a Double?
Is it a DOUBLE, or a NUMBER(12,5)?

Generally speaking there is quite a bit of native information that
we don't capture in AttributeDescriptor. But we could put it in the
associated user data map.
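
The convention idea could be sketched like this. The Attribute class is a stand-in for an attribute descriptor with a user data map, and the two keys are hypothetical names chosen for the example, not existing GeoTools constants; the fallback SQL types are likewise illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: let user data hints override the column type derived from the Java binding. */
class NativeTypeHints {
    // Hypothetical convention keys, invented for this example.
    static final String NATIVE_TYPE = "example.nativeType";
    static final String DEFAULT_VALUE = "example.defaultValue";

    /** Stand-in for an attribute descriptor carrying a user data map. */
    static class Attribute {
        final String name;
        final Class<?> binding;
        final Map<Object, Object> userData = new HashMap<>();

        Attribute(String name, Class<?> binding) {
            this.name = name;
            this.binding = binding;
        }
    }

    static String columnDefinition(Attribute attribute) {
        Object nativeType = attribute.userData.get(NATIVE_TYPE);
        String type = nativeType != null ? nativeType.toString() : fallbackType(attribute.binding);
        StringBuilder sql = new StringBuilder(attribute.name).append(' ').append(type);
        Object defaultValue = attribute.userData.get(DEFAULT_VALUE);
        if (defaultValue != null) {
            sql.append(" DEFAULT ").append(defaultValue); // appended as-is, no quoting or escaping
        }
        return sql.toString();
    }

    /** Used when no native type hint is present. */
    static String fallbackType(Class<?> binding) {
        if (binding == String.class) return "VARCHAR(255)";
        if (binding == Double.class) return "DOUBLE PRECISION";
        if (binding == Integer.class) return "INTEGER";
        throw new IllegalArgumentException("Unsupported binding: " + binding.getName());
    }

    public static void main(String[] args) {
        Attribute price = new Attribute("price", Double.class);
        price.userData.put(NATIVE_TYPE, "NUMBER(12,5)");
        System.out.println(columnDefinition(price));
    }
}
```

A real implementation would have to quote or validate the default value instead of appending it verbatim.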

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

I would prefer "delete" to capture the nice destructive nature of the activity (opposite of new).

Remove sometimes implies removing an item from a list (opposite of add).

Jody

On 09/03/2010, at 1:17 PM, Justin Deoliveira wrote:

On 3/4/10 4:29 PM, Jody Garnett wrote:

I would add two distinct methods to DataStore:
- updateSchema
- dropSchema

Might be nice to agree on "delete" or "remove" rather than "drop" as the
latter makes the operation seem database specific.
