Hi Chris, hi all,
recently a question has come up: how do we create FIDs for tables that have no
primary key? For PostGIS I added OID mapping yesterday, which works
just fine, but it is used only for tables without a primary key mapping.
What should we do in general, for tables that have no primary key and databases
that do not support OIDs? My current solution is to use a FIDMapper
that generates a different UID (that is, unique identifier) every time it is
asked to.
Pros:
* we can generate a FID for features;
* the resulting features can be put into a hash map (that is, loaded into the
memory data store) without clashes
Cons:
* every time a feature is loaded, it gets a different UID (since I have no way
to really identify it)
* the ID looks funny...
What should we do: refuse to load features from a table because it does not
have a primary key, or adopt the above? Or make it configurable?
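
A minimal sketch of the trade-off described above, assuming a generator based on java.util.UUID. The class names VolatileFidGenerator and VolatileFidDemo are hypothetical and this is not the GeoTools FIDMapper interface; it only illustrates why such IDs never clash in a hash map and also never repeat between loads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, NOT the GeoTools FIDMapper interface.
final class VolatileFidGenerator {
    private final String typeName;

    VolatileFidGenerator(String typeName) {
        this.typeName = typeName;
    }

    /** Returns a fresh identifier on every call; nothing ties it to a row. */
    String nextFid() {
        return typeName + "." + UUID.randomUUID();
    }
}

final class VolatileFidDemo {
    /** Simulates loading rows from a key-less table, keyed by generated FID. */
    static Map<String, String> load(VolatileFidGenerator gen, String[] rows) {
        Map<String, String> features = new HashMap<>();
        for (String row : rows) {
            features.put(gen.nextFid(), row);
        }
        return features;
    }

    public static void main(String[] args) {
        VolatileFidGenerator gen = new VolatileFidGenerator("roads");
        String[] rows = {"Main St", "Main St", "High St"};
        Map<String, String> first = load(gen, rows);
        Map<String, String> second = load(gen, rows);
        // Pro: identical rows still get distinct keys, so nothing is overwritten.
        System.out.println("features in first load: " + first.size());
        // Con: an FID remembered from the first load finds nothing in the second.
        boolean anyShared = first.keySet().stream().anyMatch(second::containsKey);
        System.out.println("any FID shared between loads: " + anyShared);
    }
}
```

The second check is the whole problem in miniature: a client that stores an FID from one request cannot use it to find the same feature later.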
The semantics of a feature really require a persistent ID IMHO. Better to get used to the concept of reusable, interoperable data than to put effort into solving the wrong problems.
- so we should not create non-repeatable features either
A corollary is that you should not use OIDs on a transactional source, or on one that is not strictly controlled to guarantee that an ID will not be reused.
The good news is this makes a lot of stuff a lot more obvious - such as transaction semantics.
Rob
-------------------------------------------------------
This SF.Net email sponsored by Black Hat Briefings & Training.
Attend Black Hat Briefings & Training, Las Vegas July 24-29 - digital self defense, top technical experts, no vendor pitches, unmatched networking opportunities. Visit www.blackhat.com
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geoserver-devel
Part of the point of a feature ID is to be able to use it to refer back to the feature. If the table has no primary key and features get a different ID every time, then you can't refer back to them, which makes having an FID pointless. Furthermore, an FID that doesn't support referring back to the feature is just going to give us problems when users try to do something that should work, but doesn't.
I am in favour of not supporting tables without FIDs. There is nothing wrong with applying minimum requirements to the type of data GeoTools supports.
Sean
_______________________________________________
Geotools-devel mailing list
Geotools-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geotools-devel
> The semantics of a feature really require a persistent ID IMHO. Better to get used to the concept of reusable, interoperable data than to put effort into solving the wrong problems.
The other side of the coin is trying to ease setup for GeoServer. With this in mind I would rather see OIDs used for the FeatureID (when a unique key is unavailable), and transaction support disabled for such tables.
How likely is it that we will run into GIS systems in the wild that have been set up but lack a unique key? Most PostGIS tables are set up with a single shapefile import program. Could we ask this program to provide a unique key as part of the import process?
Shapefile is similarly troubled when it uses the row number as a FeatureID.
A couple of approaches are possible:
- support the shapefile idea of marking features as deleted, so we can preserve rowid = featureid
- port Andrea's recent fid_exp work for JDBC DataStores to be part of the core DataStore API?
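
The first approach can be sketched as follows, assuming an in-memory stand-in for a row-numbered file. TombstoneRowStore and the "row." prefix are hypothetical names, not shapefile or GeoTools API; the point is only that marking a row as deleted, instead of removing it, keeps every later row number (and so every FID) unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a row-numbered file such as a shapefile.
final class TombstoneRowStore {
    private static final String PREFIX = "row.";
    private final List<String> rows = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Appends a row and returns its FID, which is its 1-based row number. */
    String add(String value) {
        rows.add(value);
        deleted.add(Boolean.FALSE);
        return PREFIX + rows.size();
    }

    /** Marks the row as deleted without shifting the rows that follow it. */
    void delete(String fid) {
        deleted.set(index(fid), Boolean.TRUE);
    }

    /** Returns the row value, or null if the row has been marked deleted. */
    String get(String fid) {
        int i = index(fid);
        return deleted.get(i) ? null : rows.get(i);
    }

    private int index(String fid) {
        if (!fid.startsWith(PREFIX)) {
            throw new IllegalArgumentException("unknown FID: " + fid);
        }
        int i = Integer.parseInt(fid.substring(PREFIX.length())) - 1;
        if (i < 0 || i >= rows.size()) {
            throw new IllegalArgumentException("unknown FID: " + fid);
        }
        return i;
    }
}
```

If the row were physically removed instead, every later row would shift up by one and silently inherit its neighbour's FID.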
> Pros:
> * we can generate a FID for features;
> * the resulting features can be put into a hash map (that is, loaded into the
> memory data store) without clashes
We could just use null for the FID - the datastore really is not providing one. Applications like GeoServer can throw a fit and come up with some facility for allowing the user to specify a mapping during setup.
> Cons:
> * every time a feature is loaded, it gets a different UID (since I have no way
> to really identify it)
> * the ID looks funny...
> What should we do: refuse to load features from a table because it does not
> have a primary key, or adopt the above? Or make it configurable?
It does strike me that this is a datastore admin problem and should not be left to client applications, even if that means setting up an extra PostGIS table to persist FID mapping information. It would not be the first time: an extra PostGIS table already exists for feature locking information.
On Monday 28 June 2004 at 04:24, Rob Atkinson wrote:
> Jody Garnett wrote:
>> Rob Atkinson wrote:
>>> My 2c
>>>
>>> The semantics of a feature really require a persistent ID IMHO.
>>> Better to get used to the concept of reusable, interoperable data than
>>> to put effort into solving the wrong problems.
>>
>> The other side of the coin is trying to ease setup for GeoServer.
>> With this in mind I would rather see OIDs used for the FeatureID (when a
>> unique key is unavailable), and transaction support disabled for
>> such tables.
>>
>> How likely is it that we will run into GIS systems in the wild that
>> have been set up but lack a unique key? Most PostGIS tables are set up
>> with a single shapefile import program. Could we ask this program to
>> provide a unique key as part of the import process?
> There will be many copies of data sets behaving this way - and it's this
> culture of copying rather than serving as an authoritative source that
> is at the root of most of the data quality issues plaguing real GIS
> business systems - it's important that we make it possible to set up
> authoritative data sources, which must support persistent keys. Ergo,
> if you don't have a persistent key, you should not be
> maintaining the data - so disabling transactions makes sense.
> So, OIDs do match the actual feature semantics in this case - "I am a
> feature whose only identity is that I'm _currently_ at position 37 of a
> data store. Try to use me for anything other than an ephemeral display
> and you will enter a world of pain!" - the question is why you would want
> to serve such a feature, instead of simply returning a WMS response, since
> you can't safely remember it or derive any meaningful statistics from it.
> Maybe you could as part of an aggregate set - 35% of whale sightings
> occurred in man-made lakes - but you couldn't make any representations
> about a specific lake... unless you used a field as a key - in which
> case why not use it as the feature ID in the first place?
Rob, my question was at the datastore level. GeoTools is a library for
programmers building applications, not for end users. Your concerns are
valid, but they are expressed at the application level, the level where you
decide what you can do and what you can't do.
So, in my opinion, GeoTools should be able to work with tables with no
primary keys, and provide a way to disable this behaviour when the people
building the application think that handling tables without primary keys
is inappropriate, and also leave room for finer control, for example
allowing reading but not writing.
Forcing a behaviour deep in the library is not the proper way to educate
people in my opinion. Documentation and communication in general are.
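
A minimal sketch of what such a switch might look like, assuming a three-way policy chosen by the application. NoPrimaryKeyPolicy and TableAccess are hypothetical names invented for illustration; nothing here is GeoTools API.

```java
// Hypothetical names throughout; none of this is GeoTools API.
enum NoPrimaryKeyPolicy {
    REJECT,      // refuse to load the table at all
    READ_ONLY,   // load features, but never hand out a writer
    READ_WRITE   // load and allow writes, at the application's own risk
}

final class TableAccess {
    final boolean readable;
    final boolean writable;

    private TableAccess(boolean readable, boolean writable) {
        this.readable = readable;
        this.writable = writable;
    }

    /** Decides what the library allows for a table under the given policy. */
    static TableAccess resolve(boolean hasPrimaryKey, NoPrimaryKeyPolicy policy) {
        if (hasPrimaryKey) {
            // Tables with a real key are unaffected by the policy.
            return new TableAccess(true, true);
        }
        switch (policy) {
            case READ_ONLY:
                return new TableAccess(true, false);
            case READ_WRITE:
                return new TableAccess(true, true);
            case REJECT:
            default:
                return new TableAccess(false, false);
        }
    }
}
```

With a default of REJECT, the library stays strict unless an application explicitly opts in, which matches the "we don't recommend this, but here is how to enable it" stance.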
On Monday 28 June 2004 at 01:21, Sean Geoghegan wrote:
> Part of the point of a feature ID is to be able to use it to refer back
> to the feature. If the table has no primary key and features get a
> different ID every time, then you can't refer back to them, which makes
> having an FID pointless. Furthermore, an FID that doesn't support
> referring back to the feature is just going to give us problems
> when users try to do something that should work, but doesn't.
> I am in favour of not supporting tables without FIDs. There is nothing
> wrong with applying minimum requirements to the type of data GeoTools
> supports.
I agree, but... I think we should at least provide a way to let the library
read data without primary keys. An application that is written from scratch,
data stores included, is one thing; one that needs to work on a legacy
data store that may not have (ouch!) primary keys is another...
It should be something like: we don't recommend this, but if you really
need to, here is the command to enable it...
Best regards
Andrea Aime
In that case, maybe we should not allow FeatureStores and FeatureWriters to be created for these tables, and then use some hashing and/or random mechanism to create the feature ID?
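
A minimal sketch of the hashing variant, assuming the FID is derived from a SHA-256 digest of the row's attribute values. ContentHashFid and fidFor are hypothetical names, not GeoTools API. Unlike a random ID, the result is repeatable between loads as long as the row is unchanged, but two identical rows get the same FID and any edit changes it, which is one reason to pair it with read-only access.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the name and the FID layout are assumptions.
final class ContentHashFid {
    /** Derives a repeatable FID from the attribute values of one row. */
    static String fidFor(String typeName, Object... attributeValues) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
        for (Object value : attributeValues) {
            byte[] bytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
            // Length prefix so that ("ab", "c") and ("a", "bc") hash differently.
            digest.update((bytes.length + ":").getBytes(StandardCharsets.UTF_8));
            digest.update(bytes);
        }
        byte[] hash = digest.digest();
        StringBuilder hex = new StringBuilder();
        // The first 8 bytes keep the FID short; this raises the collision risk.
        for (int i = 0; i < 8; i++) {
            hex.append(String.format("%02x", hash[i] & 0xff));
        }
        return typeName + "." + hex;
    }
}
```

For example, fidFor("lakes", "Lake Eyre", 9500) returns the same string on every load, while changing either value produces a different FID.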