[Geoserver-devel] Some concerns on FeatureStore.addFeatures()

Hi,
looking at the transaction code in Geoserver I found a
comment that prompted me to look at the WFS specification
about how inserted feature fids are returned.
The specification states:

The <InsertResult> element contains one or more feature identifiers of newly created feature instances. One <InsertResult> element is reported per <Insert> element in the request. The insert results are reported in the order in which the <Insert> operations were contained in the <Transaction> element. Additionally, they can be correlated using the handle attribute if it was specified.

Now, Geoserver uses FeatureStore.addFeatures(xxx) to add features
into the datastore, which returns a Set. The Set is usually
implementd using an HashSet, which is not insertion order preserving,
meaning that in the end we're not satisfying the WFS spec
(I guess no cite test makes sure proper fid insertion order is satisfied).

Uarg, this seems to me like a major issue. We can solve it easily
in Geoserver by switching back to the FeatureWriter API (getting
an append feature writer), but it suprise me a little since the
DataStore API was inspired by WFS no?

Cheers
Andrea

You are correct - the whole adding Features and assigning Fids workflow is messed up. See GeoAPI for a more realistic workflow that we have not had time to implement. Basically we should not return the "Set" until *commit* is called (since some added features may be removed by later operations in the transaction).

When I last looked poor GeoServer was adding Features one at a time in order to verify that it was successful - hense the ability to report things in order :frowning: I may be getting confused with the update process however.

We could change this to be a SortedSet - with the order being determined by insertion order. Or simply a unmodifiable List.

Basically we ran out of enthusiasm on this issue (everyone was stick of talking about it on email).

Jody

Hi,
looking at the transaction code in Geoserver I found a
comment that prompted me to look at the WFS specification
about how inserted feature fids are returned.
The specification states:

The <InsertResult> element contains one or more feature identifiers of newly created feature instances. One <InsertResult> element is reported per <Insert> element in the request. The insert results are reported in the order in which the <Insert> operations were contained in the <Transaction> element. Additionally, they can be correlated using the handle attribute if it was specified.

Now, Geoserver uses FeatureStore.addFeatures(xxx) to add features
into the datastore, which returns a Set. The Set is usually
implementd using an HashSet, which is not insertion order preserving,
meaning that in the end we're not satisfying the WFS spec
(I guess no cite test makes sure proper fid insertion order is satisfied).

Uarg, this seems to me like a major issue. We can solve it easily
in Geoserver by switching back to the FeatureWriter API (getting
an append feature writer), but it suprise me a little since the
DataStore API was inspired by WFS no?

Cheers
Andrea

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
  

Jody Garnett ha scritto:

You are correct - the whole adding Features and assigning Fids workflow is messed up. See GeoAPI for a more realistic workflow that we have not had time to implement. Basically we should not return the "Set" until *commit* is called (since some added features may be removed by later operations in the transaction).

I disagree on this approach, we're selling away the convenience for the most mainstream system (databases) just to handle the issue with two
other datastores. I would feel more comfortable having both, returing
fids right away if possible, and return all of them again at the
end of the transaction. In both cases, do preserve the insertion order,
since this is what the WFS spec requires us.

When I last looked poor GeoServer was adding Features one at a time in order to verify that it was successful - hense the ability to report things in order :frowning: I may be getting confused with the update process however.

Yup, update does change a feature at a time.

We could change this to be a SortedSet - with the order being determined by insertion order. Or simply a unmodifiable List.

SortedSet handle Comparables, or stuff that has a comparator anyways.
Insertion order is not an attribute of the FIDs, we can work around this
by inserting a larger bean that has insertion order too, but that would
mean playing with our non standard implementation of a SortedSet.
List is the most natural choice.

As usual, I'm not only concerned with the medium term issues, but also
about what's happening now. And what's happening now is that we're not
complaint with WFS spec _today_. The easiest fix I can think of is going
into the 2-3 places that are filling up HashSet instances with FIDS and
use a LinkedHashSet instead, and then go and modify the javadoc of
addFeatures stating that the Set returned must have an iteration order
equals to the insertion one. Anything against that?

Basically we ran out of enthusiasm on this issue (everyone was stick of talking about it on email).

The real issue is that changing this would require changing the datastore interface. Can we setup a wiki page with a list of changes
we deem important for the datastore API, and maybe have a voting section
behind each one, so that we can also gather a PMC opinion as a whole
on each one?
In the near future complex features will hit us, if we have to break
the API, let's do it once.

Cheers
Andrea

Andrea Aime wrote:

I disagree on this approach, we're selling away the convenience for the most mainstream system (databases) just to handle the issue with two
other datastores. I would feel more comfortable having both, returing
fids right away if possible, and return all of them again at the
end of the transaction. In both cases, do preserve the insertion order,
since this is what the WFS spec requires us.

This was not a question of convenience but of ability. Pretending we have FIDs when we have not sent the request away to an external WFS is causing a lot of pain. And some really bad hacks.

More later,
Jody

Jody Garnett ha scritto:

Andrea Aime wrote:

I disagree on this approach, we're selling away the convenience for the most mainstream system (databases) just to handle the issue with two
other datastores. I would feel more comfortable having both, returing
fids right away if possible, and return all of them again at the
end of the transaction. In both cases, do preserve the insertion order,
since this is what the WFS spec requires us.

This was not a question of convenience but of ability. Pretending we have FIDs when we have not sent the request away to an external WFS is causing a lot of pain. And some really bad hacks.

That's exactly ability that I was talking about. Sometimes you have to
code against the minimum common denominator because you don't know what
you're coding against (this is Geoserver case btw).

Other times, you need to leverage that extra functionality that only
the datastore you're playing against provides. Think for example about
applications that do hit a jdbc datastore but at the same time need
to alter other tables by direct jdbc access (using the same transaction), and need to references features somehow (so you need the
primary key of newly inserted features).
Now it's possible to write them if you're using the
FeatureWriter, or DataStore.addFeatures, if you don't care about order.
I don't want the API to change in a way that makes this impossible,
that's just my concern.
A programmer needing primary key does not give a damn about WFS limitations, he knows that the database generates the primary keys
on data insertion.

Cheers
Andrea