Jody, Sean and I have been having an off list discussion, but I figured it
was time to make the discussion more open, so it can be archived and input
can be given. We started talking about datasources and setting up a break
out irc discussion to talk about geotools datasources, but several threads
developed. If the others continue I shall post them, but I think we should
generally try to have our devel discussions on the devel list. I'm still
unsure if this should go to geoserver or geotools - geotools is nice because
there is the devel vs. users list that we don't yet have on geoserver.
Perhaps if users of geoserver don't want so much email about development we
could create a geoserver-users list. There just hasn't really been enough
traffic on the list to justify it. Jody and Sean, this also contains my
latest reply to the thread...
Some other issues that would be nice to address eventually are:
* Refactoring the SQL statement code from oracle and postgis into a
common class ( but where to put it??)
* Resolving the FID generation problem.Yes, I'd like to resolve that problem. I don't have any great ideas
right now though.Me neither. I need to think about it some more.
> Perhaps It would help if you explained the problem to me?
> Reading the code on Fids was confusing for me - what is the
> actual intent?
>
As for the general problem, the contract of generating fids is a hard one
to get exactly right. They aren't _required_ by gml, but they are very
useful. They need to be persistant, across shutting down and starting up
of the db and changes to the file, they can't start with digits, and they
need to be automatically generated during an insert statement. The ideal
candidate I know for fids is the postgresql oid column, as it's
automatically generated, unique and persistant. But the postgis guys say
that I'm not allowed to use it as it's going away in postgres in the
future. So I started looking for primary keys to use those, as they have
to be unique, but the problem is generation. It's also tricky for shape,
since the row number provides a decent fid, but if you are to modify the
file it can change the row numbers.
The current method for producing unique FIDs in the Oracle datasource
is to use an FIDSequence that is initialised with the max fid in the
database. When new features are added, the sequence is then queried for
the next fid, which is produced by incrementing the max fid in the
FIDSequence.>>This creates a sort of artificial, in code version of an oracle
SEQUENCE construct. However the main problems is that if some other
application adds data to the table, our in code generated fids will be
invalid.This of course is a better solution than the current postgis datasource,
as that datasource doesn't implement anything for inserts if primary keys
are declared. But most postgis installs don't declare primary keys, so
the oid is just used. Jody, did the postgis guys ever fix shp2pgsql to
use primary keys? They were going to declare a sequence or something, but
I'm not sure if they ever actually implemented. Doing that would make
moving to using primary keys more reasonable. But we still need to figure
out the problem of fids. Another problem with the current approach is
that it also assumes that the primary key column is a number, which will
not always be the case. We can tell users that their databases need to be
set up in a certain way to use geotools, but it's always best to have it
work right away with any existing install.
Oh yeah, the other thing we could do to make the FIDSequence sure to
return the correct fid is to do the SELECT max(fidcolumn) each time you
want the next key. And you could also lock the table before calling the
max, releasing after the insert, to ensure that no one else got it.
>Ideally, we should use a DB sequence construct, however this makes some
assumptions about that database that may not be inline with reality.
Right. Perhaps the best route is to check for each of our ideal
constructs, and go to the more hacky construct if the ideal is not found.
So for postgis we would first check for a primary key with a sequence for
it, if not available then check if primary key is a number, and do an
FIDSequence type thing, and if that's not there either then use the oid.
I haven't really devoted much time to coming up with a good solution to
this and the current solution is just a stop gap measure, however it is
probably sufficient for the beta at this stage. If you have any ideas on
this, I would be most interested.I'm in the same place, I spent a lot of time thinking about it before, but
haven't since then.
---slight jump, picks up from jody's reply to my email, and my reply to that.
>the oid is just used. Jody, did the postgis guys ever fix shp2pgsql to
>use primary keys? They were going to declare a sequence or something, but
>I'm not sure if they ever actually implemented.
>
It seems that they just use the row number in the shape file, so the
answer is probably no.
Hrm. They were definitely the ones that wanted me to change my datasource
to use primary keys, even though most users of postgis don't make primary
keys. Or wait, are you saying no they don't use primary keys, or no they
don't use a sequence? If they don't use a sequence what happens with
using the append flag? Won't you get duplicates on the primary key, as
rows will be repeated? Did they solve that problem? And if they didn't
implement a primary key declaration at all I'm inclined to use the oid
column. Could ask them again why using the oid is bad? Paul originally
said that pgsql is becoming oid independant, so they might not be around
forever, and that since they are invisible, therefor non-inuitive to non
pgsql users. At the time I just went ahead and did it, but he also said
shp2pgsql would be fixed up with primary keys. If that hasn't really
happened I'm thinking change it when oid's disappear, or when we know for
sure they are going to. As for the second concern, I don't think it
really matters if people don't know where their fids come from.
Also, Sean, did you ever look into using ROWID? As far as I can tell it's
like oid, but not scheduled to go away any time soon? I'm asking my
friend at oracle right now to be sure, but it looks promising. It's
generated for each row, is unique and autogenerated, and can easily be
queried into a 16 character string with ROWIDTOCHAR(ROWID). I'm still
trying to set up my oracle db, so I haven't checked it yet, what form it
is, but if it starts with a number we can just prepend the table name,
though I suspect it might already do that, as the wfs spec author uses
some oracle type db. If it works for oracledatasource then we won't need
to mess with fidsequence for it.
So is there just the two suggestions:
- a sequence backed by the Database
- a fid generated from the primary keyI am also wondering for the purposes of a WFS what happens when we are
trying to Insert and we get a failure or a rollback on the same
transaction. Do we rollback the sequence, or just have gaps in our fids?
I see no reason at all for us to not have gaps in our fids. That's not in
the contract at all. They just have to be unique and autogenerated.