[Geoserver-devel] monitoring and archiving request data

Hi all,

I wanted to poll the list and do some brainstorming about how to go about archiving request data in the monitoring extension.

As it stands now the extension simply persists request data (via Hibernate) into a single table called “REQUEST”. You can imagine that a busy server will fill this table up quite quickly, eventually getting to the point where operations start to perform poorly. So some sort of strategy to clear out the main table, and possibly archive existing information, will be needed.

My practical database admin experience is quite limited, so I am very eager to hear thoughts on this one. That said, I can think of a couple of different ways to skin this cat.

  1. no archiving

Sort of lame, but leave all facilities for archiving to the db admin. The main downside of this is that it makes it impossible for the monitoring extension to retrieve historical request information for reporting purposes. So we would basically be saying you can only do your own analysis on your archived data.

  2. automated archiving

Flipping the coin would be to attempt to archive request data transparently. I tried to google for information about people using Hibernate who need to archive in this manner but could not come up with much. One thought I had was to come up with a second mapping, creating a table “REQUEST_HISTORY”, and periodically move data from one table to the other. However, I am not sure how feasible this is going strictly through Hibernate. It seems kind of clunky, and trying to archive while the server is still under load could be problematic to synchronize. I guess we could only support an “offline” archiving…

Another thought I had that is less Hibernate-centric would be to use triggers. Basically, duplicate the REQUEST table creating REQUEST_HISTORY and set up a trigger on the main table so that any insert results in a duplicate row in the history table. Then an “archive” really just becomes clearing out the main request table. Obviously this would not be possible with straight Hibernate, so it would require some homegrown JDBC code, which gets tricky when you try to support different databases; messing with dialects is sort of a messy business.
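To make the trigger idea concrete, here is a minimal sketch using SQLite through Python's sqlite3 module, purely for illustration; the column names are made-up simplifications of the real REQUEST schema, and the actual extension would issue equivalent DDL over JDBC in the target database's dialect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REQUEST (id INTEGER PRIMARY KEY, path TEXT, start_time TEXT);
    -- duplicate structure for the archive
    CREATE TABLE REQUEST_HISTORY (id INTEGER PRIMARY KEY, path TEXT, start_time TEXT);
    -- every insert into REQUEST is mirrored into REQUEST_HISTORY
    CREATE TRIGGER request_archive AFTER INSERT ON REQUEST
    BEGIN
        INSERT INTO REQUEST_HISTORY (id, path, start_time)
        VALUES (NEW.id, NEW.path, NEW.start_time);
    END;
""")
conn.execute(
    "INSERT INTO REQUEST (path, start_time) VALUES ('wms?request=GetMap', '2010-08-30')"
)
# "archiving" is now just clearing the live table; the history keeps the row
conn.execute("DELETE FROM REQUEST")
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM REQUEST").fetchone()[0])          # 0
print(conn.execute("SELECT COUNT(*) FROM REQUEST_HISTORY").fetchone()[0])  # 1
```

Since there is no delete trigger, clearing REQUEST leaves REQUEST_HISTORY untouched, which is exactly the "archive = truncate the live table" behavior described above.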

  3. hybrid

The third solution would sort of be a hybrid. Basically we would not provide any archiving facilities, leaving that up to the admin. But we would support querying a REQUEST_HISTORY table (if it exists) in order to retrieve the historical information.
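The "if it exists" check is the only new moving part in the hybrid option. A sketch, again using sqlite3 for illustration (the real code would use JDBC's DatabaseMetaData.getTables(); the schema and function names here are hypothetical):

```python
import sqlite3


def history_table_exists(conn: sqlite3.Connection) -> bool:
    # SQLite keeps table names in sqlite_master; over JDBC the equivalent
    # check goes through DatabaseMetaData.getTables()
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        ("REQUEST_HISTORY",),
    ).fetchone()
    return row is not None


def query_requests(conn: sqlite3.Connection, since: str):
    # prefer the archive when the admin has created one, else the live table
    table = "REQUEST_HISTORY" if history_table_exists(conn) else "REQUEST"
    return conn.execute(
        f"SELECT * FROM {table} WHERE start_time >= ?", (since,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REQUEST (id INTEGER PRIMARY KEY, start_time TEXT)")
print(history_table_exists(conn))  # False
```

The nice property is that the extension itself stays read-only with respect to the archive: it never decides when or how rows get moved there.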

That's all I've got. Any feedback on this one would be much appreciated.

-Justin


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:


I would go for the second, possibly dodging Hibernate, which is normally
not well suited for mass data transfers (at least, it was not last time
I checked). The following:

-- start transaction --
INSERT INTO REQUEST_HISTORY SELECT * FROM REQUEST WHERE DATE < ?
DELETE FROM REQUEST WHERE DATE < ?
-- commit transaction --

should be well supported among databases + plain jdbc without the need
for a translation layer in the middle.
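A sketch of that copy-then-delete transaction, using SQLite via Python's sqlite3 in place of plain JDBC (the schema and cutoff column are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REQUEST (id INTEGER PRIMARY KEY, start_time TEXT);
    CREATE TABLE REQUEST_HISTORY (id INTEGER PRIMARY KEY, start_time TEXT);
    INSERT INTO REQUEST (start_time) VALUES ('2010-07-01'), ('2010-08-29');
""")


def archive_before(conn: sqlite3.Connection, cutoff: str) -> None:
    # copy then delete inside ONE transaction, so a crash between the two
    # statements can neither lose rows nor leave duplicates behind
    with conn:  # sqlite3 commits on clean exit, rolls back on exception
        conn.execute(
            "INSERT INTO REQUEST_HISTORY "
            "SELECT * FROM REQUEST WHERE start_time < ?",
            (cutoff,),
        )
        conn.execute("DELETE FROM REQUEST WHERE start_time < ?", (cutoff,))


archive_before(conn, "2010-08-01")
print(conn.execute("SELECT COUNT(*) FROM REQUEST").fetchone()[0])          # 1
print(conn.execute("SELECT COUNT(*) FROM REQUEST_HISTORY").fetchone()[0])  # 1
```

Both statements are plain ANSI-ish SQL with a single bind parameter, which is what makes this portable without dragging Hibernate dialects into it.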

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

  On 8/30/10 8:52 AM, Andrea Aime wrote:


I know that one big online advertiser just renames the old table, because it's a lot faster than copying millions of rows. A separate process then examines the history table and calculates aggregate values for the hits they don't need to store forever.

Unfortunately table renaming is an extension to the ANSI SQL standard, but I think it's pretty widely supported.
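The rotate-by-rename trick can be sketched like this; again SQLite via sqlite3 stands in for the real database, and the table/column names are illustrative:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REQUEST (id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("INSERT INTO REQUEST (path) VALUES ('wms?request=GetCapabilities')")

# rotate: an O(1) rename instead of copying millions of rows,
# then recreate an empty live table
archive = f"REQUEST_{date(2010, 8, 1):%Y_%m}"  # e.g. REQUEST_2010_08
conn.execute(f"ALTER TABLE REQUEST RENAME TO {archive}")
conn.execute("CREATE TABLE REQUEST (id INTEGER PRIMARY KEY, path TEXT)")

print(conn.execute(f"SELECT COUNT(*) FROM {archive}").fetchone()[0])  # 1
print(conn.execute("SELECT COUNT(*) FROM REQUEST").fetchone()[0])     # 0
```

A separate offline job can then walk the REQUEST_YYYY_MM tables, compute aggregates, and drop the raw rows it no longer needs.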

-Arne

Arne Kepp wrote:


Ah. This approach works if the target name contains the month or the
year, meaning you get multiple archive tables that then have to
be unioned somehow when one needs to access data crossing the border
between one archive and the next.
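Stitching the per-month archives back together is a UNION ALL built over whatever tables exist; a sketch (sqlite3 for illustration, hypothetical table names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
months = ("2010_07", "2010_08")
for month in months:
    conn.execute(
        f"CREATE TABLE REQUEST_{month} (id INTEGER PRIMARY KEY, path TEXT)"
    )
    conn.execute(f"INSERT INTO REQUEST_{month} (path) VALUES ('wms')")

# stitch the monthly archives back together for a cross-boundary query
sql = " UNION ALL ".join(f"SELECT * FROM REQUEST_{m}" for m in months)
rows = conn.execute(sql).fetchall()
print(len(rows))  # 2
```

The list of months would in practice be derived from a date range in the reporting query, mapped to the archive naming convention.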

Fully agree about computing more compact aggregates, btw.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.