(forwarding this to devel, hope you don't mind Jody)
Ok, I guess I'll chime in with my own connection pool research of the past
day and a half, and attempt to answer Jody's question.
I spent the last day and a half implementing connection pooling with
postgis. Mostly just shamelessly ripping Sean's great code off - I think
sometime this week I'm going to at least move some stuff up to an
AbstractJDBCDataSource (any better name suggestions?) Not sure if I want
to commit it to geotools - maybe after they tag defaultcore. Well, I
guess we'll bring it up at the next irc. Most of the rationale for this
now is that PostgisDataSource is just getting too damn long. I'm going to
start with just the new transaction handling code that Sean wrote to deal
with the connection pooling, getConnection, getTransactionConnection,
closeTransactionConnection, finalizeTransactionConnection, and probably
getAutoCommit, setAutoCommit, commit, rollback, and setFeatures. All of
that code is exactly the same in postgis as in oracle, except for
using Connections instead of OracleConnections.
Sean, OracleConnections implement Connection, right? How much of the code
needs it to be an OracleConnection, and would it be reasonable to cast to
OracleConnections where needed? The other possibility is for your
getConnection methods to call super.getConnection() and then cast it
there. Would that work?
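Just so we're picturing the same thing, here's a rough sketch of what I
mean - the AbstractJDBCDataSource name is the one proposed above, but the
pool field, the subclass name, and getOracleConnection are all made up for
illustration, not real geotools API:

import java.sql.Connection;
import java.sql.SQLException;

import javax.sql.DataSource;

import oracle.jdbc.OracleConnection;

/**
 * Rough sketch only: shared connection handling pulled up into an abstract
 * superclass. The javax.sql.DataSource field is just a stand-in for
 * whatever pooling object we end up using.
 */
abstract class AbstractJDBCDataSource {

    /** Stand-in for the connection pool. */
    protected DataSource pool;

    /** Connection held open for the duration of a transaction, if any. */
    protected Connection transConn;

    /**
     * Shared logic: reuse the transaction connection if one is open,
     * otherwise grab a fresh connection from the pool.
     */
    protected Connection getConnection() throws SQLException {
        if (transConn != null) {
            return transConn;
        }
        return pool.getConnection();
    }
}

/**
 * Oracle subclass: since OracleConnection implements Connection, the
 * subclass can call the shared getConnection() and cast only where it
 * actually needs the Oracle-specific API (assuming the pool is backed by
 * the Oracle driver, so the cast can't fail).
 */
class SketchOracleDataSource extends AbstractJDBCDataSource {

    protected OracleConnection getOracleConnection() throws SQLException {
        return (OracleConnection) getConnection();
    }
}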
The postgresql jdbc3 driver had me confused for a bit, but I think
I've got most of it figured out. First I needed to download a new one,
and then I started using org.postgresql.jdbc3.Jdbc3PoolingDataSource,
because it sounds like the one to use, no? Well, it turns out that it
doesn't implement ConnectionPoolDataSource; it instead implements
DataSource but gives back pooled connections. Which is nice, but doesn't
work with the ConnectionPool code. So I looked at the source and
tried to make it work for me: it had private getPooledConnection
methods, just not public ones, so I changed things around to get it to
implement ConnectionPoolDataSource, since all the code to do it was there,
the public methods just weren't implemented. I didn't figure it out, so I went home,
but this morning I decided to look again to see if _anything_ actually
implemented ConnectionPoolDataSource, and found it. It's named
org.postgresql.jdbc2.optional.ConnectionPool (or
jdbc3.Jdbc3ConnectionPool). So that made me happy, though a bit pissed
that it's not named better, or even mentioned in their
documentation, which says that code to initialize a pooling
DataSource should use Jdbc3PoolingDataSource. So anyways, yesterday was
lame, but it's all fixed and working now.
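For the record, the wiring ends up looking roughly like this - the
server/database/user values are obviously made up, and the setter names are
just the standard bean setters, so double-check them against whatever
driver version you've got:

import java.sql.Connection;

import javax.sql.ConnectionPoolDataSource;
import javax.sql.PooledConnection;

import org.postgresql.jdbc3.Jdbc3ConnectionPool;

public class PostgisPoolSketch {

    public static void main(String[] args) throws Exception {
        // Jdbc3ConnectionPool (not Jdbc3PoolingDataSource!) is the class
        // that implements ConnectionPoolDataSource.
        Jdbc3ConnectionPool source = new Jdbc3ConnectionPool();
        source.setServerName("localhost");
        source.setDatabaseName("testdb");
        source.setUser("postgres");
        source.setPassword("secret");

        // This is the interface the geotools ConnectionPool code wants.
        ConnectionPoolDataSource cpds = source;

        // A PooledConnection hands out logical Connections; closing the
        // logical Connection returns it to the pool instead of dropping it.
        PooledConnection pooled = cpds.getPooledConnection();
        Connection conn = pooled.getConnection();
        // ... do work ...
        conn.close();
        pooled.close();
    }
}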
One limitation I wasn't able to find a workaround for, however, is
the charSet param, which you used to be able to include in the url or pass
in the props to the DriverManager, but which you can't seem to use with
connection pooling. The postgresql ConnectionPoolDataSource doesn't have a
nice setURL method like the oracle one does; you have to set each of the
params individually, and setting the charSet isn't an option. I sent an
email to the postgres-jdbc list yesterday asking if they had thought about
it/were planning to add it, and volunteering to code it up if they'd like,
but I haven't heard anything back from them yet. But
yes, that's pretty important to get in, as I know at least a few
international users make use of it. I could probably hack it in, but I
don't really want to fork the postgres jdbc.
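For anyone who hasn't used it, this is the kind of thing that works fine
over the plain DriverManager route but has no equivalent setter on the
pooling datasource (the encoding and connection values here are just
examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class CharSetSketch {

    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");

        // The old, non-pooled path: charSet rides along in the Properties
        // (or tacked onto the url as ?charSet=...).
        Properties props = new Properties();
        props.setProperty("user", "postgres");
        props.setProperty("password", "secret");
        props.setProperty("charSet", "SJIS"); // example encoding

        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/testdb", props);
        // ... the ConnectionPoolDataSource has setUser, setPassword, etc.,
        // but nothing to carry the charSet across ...
        conn.close();
    }
}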
The other thought I had while redoing stuff was multiple transactions
taking place at once. The way the geoserver code works right now makes it
so that two transactions happening at the same time have the potential to
mess things up, as they work on the same connection. The new pooling code
is an improvement, but still doesn't fix the problem in geoserver, since
only one datasource is constructed for each featureType, so that if two
transactions come in at the same time they will use the same connection.
This is fairly easily fixed, however, by having getTransactionDataSource
(in TypeInfo) construct a new datasource each time, and have
TransactionResponse handle it. That way each request would be sure to get
its own connection. Granted, this is only a problem when locking is not
used, so it's never been a high priority of mine to fix, but as it's much
easier to do so now we might as well.
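Something along these lines is what I mean - TypeInfo and the two getter
names are the real GeoServer ones, but the caching field and the
createDataSource() helper are just illustrative, and I'm assuming the
geotools DataSource interface at org.geotools.data.DataSource:

import org.geotools.data.DataSource;

/** Illustrative sketch only, not the actual TypeInfo code. */
public class TypeInfoSketch {

    /** Reads can keep sharing one datasource. */
    private DataSource readSource;

    public synchronized DataSource getDataSource() throws Exception {
        if (readSource == null) {
            readSource = createDataSource();
        }
        return readSource;
    }

    /**
     * Each transaction gets its own freshly constructed datasource, and
     * therefore its own connection, so two simultaneous transactions
     * can't step on each other.
     */
    public DataSource getTransactionDataSource() throws Exception {
        return createDataSource();
    }

    /** Hypothetical: build a datasource from the stored connection params,
     *  e.g. by going through DataSourceFinder with this type's params. */
    private DataSource createDataSource() throws Exception {
        throw new UnsupportedOperationException("sketch only");
    }
}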
We could even get rid of the getDataSource/getTransactionDataSource split,
and have getDataSource just return a new datasource each time. I'm a bit
hesitant to do this, however, as there is an expense in generating each
new datasource, namely FeatureType creation. In postgis it takes at least
3 different sql statements and has to process the results, which seems
kinda silly to do every single time when the schema doesn't change. You see
this right now in the PostgisTest, as it creates a new datasource each time,
so that almost half the lines in its log come from the schema
creation that happens each time in setUp().
Sean's static connection pool actually gave me an idea of doing the same
sort of thing for schemas, so that DataSources created during the same jvm
session could just look to see if someone else already went through the
trouble to make the schema. The other alternative is one that used to be
in postgis, namely a constructor that takes a schema as an argument. I
did away with it because it ended up being a hacky way to do
getProperties, as you could pass in an abbreviated schema (geoserver
worked this way before I got the Query object in geotools). But yes, if
we were to do that, and with the connection pooling, it seems like it
would be nice to just have one getDataSource() function in TypeInfo, as
geotools would be handling things well enough that GeoServer doesn't have
to micro-manage what's going on. Oh wait, actually the second option
wouldn't work by itself, as GeoServer requests things through the
DataSourceFinder, which can't use the schema constructor. But perhaps
the DataSourceFactory could have a Singleton holding on to schemas or
datasources?
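The static-cache idea would basically amount to something like this - the
class name and key format are made up, and the FeatureType import assumes
the usual org.geotools.feature package:

import java.util.HashMap;
import java.util.Map;

import org.geotools.feature.FeatureType;

/**
 * Rough sketch of a shared, per-JVM schema cache, so a new datasource can
 * check whether someone else already went to the trouble of building the
 * FeatureType before firing off the sql to build it again.
 */
public class SchemaCache {

    /** Keyed by something like "host/database/table". */
    private static final Map schemas = new HashMap();

    public static synchronized FeatureType getSchema(String key) {
        return (FeatureType) schemas.get(key);
    }

    public static synchronized void putSchema(String key, FeatureType schema) {
        schemas.put(key, schema);
    }
}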
Oh yeah, I've been having great fun trying to install an oracle database,
so that I can actually work with Sean's code, which would be nice if I'm
to do the AbstractJdbcDataSource - though I guess I can just start coding
it up and have Sean adjust it as he needs it. But the oracle install just
doesn't get past the first 5 files of installing a 1.1.8 jvm.
I am CCing Chris with this email and the question: How often are
singletons used in the geotools2 codebase? Is this something we should
look into? Or is this code just left echoing around my head from
immature JVMs of days past....
I'm pretty sure geoserver uses singletons more than geotools does. I
think IanS might have put some in for the new factories - but I think you
may not be required to use them. Keep in mind that I also don't know the
whole geotools code base; I only know the data/filter/gml half, so you
should probably ask the geotools-devel list. But I don't think we're
really targeting geotools2 for applets. I'm under the impression that
geotools1 was for applets, geotools2 isn't. But I forget if there are
people who actually try to keep things workable for applets or not; I feel
like it comes up every once in a while.
Ok, I'm going to go test the new postgis pooling with geoserver. I've got
it passing all tests in geotools, so we're well along. Though I think I'd
like to fix the charSet encoding before I put a release out.
Chris
> Hi Jody,
>
>I thought about using JNDI for the connection pooling, which would be fine for use in Geoserver, however we can't assume that every user of geotools will be using the appropriate environment for JNDI.
>
>However the connection pooling code I committed last week does make use of the ConnectionPoolDataSource interface and it should be usable across any JDBC based DataSource. Any feedback on the code would be appreciated.
>
>Sean
>
>