[GeoNetwork-users] connection pooling and reconnectTime setting

We are currently running GN 2.6.1 (Linux) with a Postgres/PostGIS backend. I posted a related issue over a year ago (http://osgeo-org.1803224.n2.nabble.com/Geonetwork-and-JNDI-connection-pooling-td4750597.html) which was solved at that time, but now has recurred in a different form.

I recently switched our GN instance over from using the shapefile spatial index to using the PostGIS one. This is because our catalogue is becoming very large, and my reading on the issue suggested that sticking with the shapefile approach was going to lead to scalability problems. I used the reference document from http://geonetwork-opensource.org/_static/foss4g2010/FOSS4G_Mastering_Advanced_GeoNetwork.pdf to set the application up. Not wanting to recreate my database from scratch, I extracted the relevant SQL from create-db.postgis.sql and just created the spatialindex table and indexes. I then altered WEB-INF/config.xml to look like:

<resource enabled="true">
    <name>main-db</name>
        <provider>jeeves.resources.dbms.DbmsPool</provider>
        <config>
            <user>username</user>
            <password>password</password>
            <driver>org.postgresql.Driver</driver>
            <url>jdbc:postgis://ourserver:5432/geonetwork</url>
            <poolSize>10</poolSize>
            <reconnectTime>3000</reconnectTime>
        </config>
</resource>

Note I changed the url to read jdbc:postgis rather than jdbc:postgresql. I'm not aware that I needed to change anything else to achieve the switch. When I restarted Tomcat, GN started up correctly and reported that the PostGIS database was being used for the spatial index. Queries all worked as expected, both within the GN GUI and via my application which uses CSW queries into GN. However, leaving GN idle for an hour or more (the length of time our firewall allows before deciding that Postgres connections in the connection pool are unused and therefore cutting them off) now causes GN to hang. This is because Jeeves still thinks a given connection is alive, even though the firewall has killed it. In the logs, I see the XML request issued, and no data ever coming back. Occasionally I will see messages like:

2011-07-04 12:39:50,062 ERROR [jeeves.dbmspool] - reconnecting: 10596659>=3000000 ms since last connection

Which I take to mean the reconnectTime parameter timeout is being used (3000000ms = 50 minutes). I sometimes see:

org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.

And "broken pipe" errors, which I take to indicate GN being unable to communicate with PostGIS.

Everything was working fine before the spatial indexing switch. I have read the release notes for GN 2.6.2 to 2.6.4 and don't see anything related to this kind of issue being solved in the later versions. We are happy to try upgrading if any of the experts think that might help.

I have tried altering the Tomcat (6.0.18) conf/server.xml to include lines like:

<Resource name="jdbc/Geonetwork"
                          type="javax.sql.DataSource"
                          initialSize="20"
                          maxActive="100"
                          maxIdle="4"
                          minIdle="0"
                          maxWait="60000"
                          validationQuery="SELECT 1"
                          removeAbandoned="true"
                          removeAbandonedTimeout="60"
                          password="password"
                          driverClassName="org.postgresql.Driver"
                          username="username"
                          url="jdbc:postgis://servername:5432/geonetwork"
                          />

to do a "tidy up" on the connection pool on a frequent basis by making a simple query and discarding connections which fail because of being killed by the firewall. This has worked for another application, but seems to make no difference here.

Does anyone have any insight into why the reconnectTime parameter in WEB-INF/config.xml doesn't appear to be working with spatial indexing in Postgres/PostGIS?

Grateful for any input,

Thanks,

David Herbert
British Antarctic Survey.
--
This message (and any attachments) is for the recipient only. NERC
is subject to the Freedom of Information Act 2000 and the contents
of this email and any reply you make may be disclosed by NERC unless
it is exempt from release under the Act. Any material supplied to
NERC may be stored in an electronic records management system.

Hi David,

The code that handles the connection pool actually has changed in 2.7 onwards. The old pool code in jeeves.resources.dbms.DbmsPool (which is what you see in 2.6.x and earlier) has been deprecated in favour of jeeves.resources.dbms.ApacheDBCPool which is based on the Apache Commons DB Connection Pool (this change is in config.xml). So now instead of jeeves handling the logic when a connection has been closed/needs to be reopened, it is the apache code that does this instead (hopefully in a more sophisticated and tolerant fashion).

You can even get some simple snapshot stats from the pool like connections in use, connections available etc in Administration->System Information (wow! :-))

I've been thinking of adding a few more config params though to tune it a little more so maybe this is a good opportunity for you to try it out and see whether it needs more tweaking to fit your situation?

Cheers and thanks for the excellent problem description!
Simon
________________________________________
From: Herbert, David J. [darb1@anonymised.com]
Sent: Monday, 4 July 2011 11:18 PM
To: geonetwork-users@lists.sourceforge.net
Subject: [GeoNetwork-users] connection pooling and reconnectTime setting

We are currently running GN 2.6.1 (Linux) with a Postgres/PostGIS backend. I posted a related issue over a year ago (http://osgeo-org.1803224.n2.nabble.com/Geonetwork-and-JNDI-connection-pooling-td4750597.html) which was solved at that time, but now has recurred in a different form.

I recently switched our GN instance over from using the shapefile spatial index to using the PostGIS one. This is because our catalogue is becoming very large, and my reading on the issue suggested that sticking with the shapefile approach was going to lead to scalability problems. I used the reference document from http://geonetwork-opensource.org/_static/foss4g2010/FOSS4G_Mastering_Advanced_GeoNetwork.pdf to set the application up. Not wanting to recreate my database from scratch, I extracted the relevant SQL from create-db.postgis.sql and just created the spatialindex table and indexes. I then altered WEB-INF/config.xml to look like:

<resource enabled="true">
    <name>main-db</name>
        <provider>jeeves.resources.dbms.DbmsPool</provider>
        <config>
            <user>username</user>
            <password>password</password>
            <driver>org.postgresql.Driver</driver>
            <url>jdbc:postgis://ourserver:5432/geonetwork</url>
            <poolSize>10</poolSize>
            <reconnectTime>3000</reconnectTime>
        </config>
</resource>

Note I changed the url to read jdbc:postgis rather than jdbc:postgresql. I'm not aware that I needed to change anything else to achieve the switch. When I restarted Tomcat, GN started up correctly and reported that the PostGIS database was being used for the spatial index. Queries all worked as expected, both within the GN GUI and via my application which uses CSW queries into GN. However, leaving GN idle for an hour or more (the length of time our firewall allows before deciding that Postgres connections in the connection pool are unused and therefore cutting them off) now causes GN to hang. This is because Jeeves still thinks a given connection is alive, even though the firewall has killed it. In the logs, I see the XML request issued, and no data ever coming back. Occasionally I will see messages like:

2011-07-04 12:39:50,062 ERROR [jeeves.dbmspool] - reconnecting: 10596659>=3000000 ms since last connection

Which I take to mean the reconnectTime parameter timeout is being used (3000000ms = 50 minutes). I sometimes see:

org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.

And "broken pipe" errors, which I take to indicate GN being unable to communicate with PostGIS.

Everything was working fine before the spatial indexing switch. I have read the release notes for GN 2.6.2 to 2.6.4 and don't see anything related to this kind of issue being solved in the later versions. We are happy to try upgrading if any of the experts think that might help.

I have tried altering the Tomcat (6.0.18) conf/server.xml to include lines like:

<Resource name="jdbc/Geonetwork"
                          type="javax.sql.DataSource"
                          initialSize="20"
                          maxActive="100"
                          maxIdle="4"
                          minIdle="0"
                          maxWait="60000"
                          validationQuery="SELECT 1"
                          removeAbandoned="true"
                          removeAbandonedTimeout="60"
                          password="password"
                          driverClassName="org.postgresql.Driver"
                          username="username"
                          url="jdbc:postgis://servername:5432/geonetwork"
                          />

to do a "tidy up" on the connection pool on a frequent basis by making a simple query and discarding connections which fail because of being killed by the firewall. This has worked for another application, but seems to make no difference here.

Does anyone have any insight into why the reconnectTime parameter in WEB-INF/config.xml doesn't appear to be working with spatial indexing in Postgres/PostGIS?

Grateful for any input,

Thanks,

David Herbert
British Antarctic Survey.
--
This message (and any attachments) is for the recipient only. NERC
is subject to the Freedom of Information Act 2000 and the contents
of this email and any reply you make may be disclosed by NERC unless
it is exempt from release under the Act. Any material supplied to
NERC may be stored in an electronic records management system.

Hi Simon,

Thanks for the input on this - I have discovered the development work you mention is detailed in:

http://trac.osgeo.org/geonetwork/wiki/DatabaseConnectionsForThreadsAndNewPool

However, I know we're only on 2.6.4 stable release at the moment. I'm unclear if 2.7 (which I imagine is just "trunk" from SVN at this point) is in a stable enough state for us to use in a production system? I suppose the only way to find out is to try it and see - I guess in the absence of a workaround for this problem for 2.6.x maybe it's the only way forward?

Thanks again,

David Herbert
British Antarctic Survey.

--
This message (and any attachments) is for the recipient only. NERC
is subject to the Freedom of Information Act 2000 and the contents
of this email and any reply you make may be disclosed by NERC unless
it is exempt from release under the Act. Any material supplied to
NERC may be stored in an electronic records management system.