[Geoserver-users] Geoserver in a clustered production environment

I got a question recently, and I thought it would be nice to share the
response with the list:

What special tricks do you use to run Geoserver at MASSGIS? There are
some things like load balancing, fail over, and log analysis that come
up in production but rarely get discussed on the development lists.

Here's what I wrote back:

We use a couple of tricks, and a lot of things are simply not finished
yet (crises seem to take priority).

1) We use a simple, free TCP-level load balancer called "balance"
http://freshmeat.net/projects/balance/ It's really simple, and does
what we expect. I'm also interested in
http://freshmeat.net/projects/haproxy/ but haven't gotten around to any
testing with it yet.

The load balancer runs on a re-purposed desktop machine (a P3-733 with
512MB of RAM), which is probably overpowered for what it's doing. Very
high TCP/IP load could swamp it, but a couple of good network cards
(with offload ASICs) would probably soak up the CPU load and leave the
machine to idle in peace.
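
For what it's worth, the balance invocation is about as simple as the
tool itself; ours is roughly the following (the node names here are
made up, ours are internal hostnames):

  # distribute port 80 across the three geoserver nodes (round-robin)
  balance 80 gs-node1:80 gs-node2:80 gs-node3:80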

2) On each individual load-balanced geoserver box we run squid on port
80, http-accelerating geoserver on port 8080. This leads to all kinds
of proxy issues inside of geoserver, and I'm often the developer who
spends time fixing PROXY_URL based geoserver issues. But it works
pretty well these days, and hasn't given us trouble in a while.
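
The squid side of that is only a handful of lines; from memory it looks
roughly like this (Squid 2.6-style accelerator directives; the hostname
is a placeholder for our public one):

  # listen on 80 as an accelerator, hand misses to tomcat on 8080
  http_port 80 accel defaultsite=geoserver.example.org
  cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=gs
  # only accelerate requests for our own site
  acl gs_site dstdomain geoserver.example.org
  http_access allow gs_site
  cache_peer_access gs allow gs_site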

3) Our geoserver data_dir is stored on a shared samba share that each
geoserver machine mounts and reads its config from. Actually, we cheat
and use the load-balancer as the samba server, but that's just an
implementation artifact...nothing structural there.
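
On each node the mount itself is nothing special; the fstab entry is
along these lines (share name and mount point are illustrative):

  # geoserver config/data shared from the balancer box via samba
  //balancer/geoserver_data  /opt/geoserver_data  cifs  guest,_netdev  0 0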

4) Failover: Occasionally a JVM will crash (usually due to geoserver's
use of JAI's native components to do rendering of rasters...I'm using a
rather bleeding-edge copy of the JAI libs from dev.java.net and the libs
are a bit flakey). This is bad, as squid isn't smart enough to close its
port 80 when port 8080 goes away, so we get these bad "hangs" where
balance keeps sending requests to squid, which then fails them. To
rectify this I wrote a little polling script that keeps an eye on the
running java process and on tomcat's response time (is tomcat taking
more than 5 seconds to respond to a directory-index request?) and shuts
down squid if anything is out of the ordinary.

Squid shutting down means port 80 closes, so balance won't send any
more requests to that node...effectively failing that node over and
letting the other two boxes pick up the load. When java comes back
(either the load works itself out or we manually restart java after the
crash), the script re-starts squid once tomcat is behaving normally
again.
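
A stripped-down sketch of the watchdog (something like cron drives the
real thing every minute or so; the paths, check URL and thresholds here
are just illustrative):

  #!/bin/sh
  # If the JVM is gone or tomcat is slow, stop squid so balance fails
  # this node over; bring squid back once tomcat answers quickly again.
  CHECK_URL="http://localhost:8080/geoserver/"

  healthy=yes
  pgrep -f org.apache.catalina.startup.Bootstrap > /dev/null || healthy=no
  # -T 5: give tomcat 5 seconds to answer, -t 1: only try once
  wget -q -T 5 -t 1 -O /dev/null "$CHECK_URL" || healthy=no

  if [ "$healthy" = "no" ]; then
      /etc/init.d/squid stop
  else
      # tomcat looks fine again; make sure squid is serving port 80
      /etc/init.d/squid status > /dev/null 2>&1 || /etc/init.d/squid start
  fi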

5) Log analysis: We don't do as good a job of this as we should.
Geoserver 1.6.x doesn't have access-logging in it, but geoserver-trunk
(1.7.x) has it now. My hope is to get a geoserver_access_log logger
that's Apache common-log compatible and separate from the usual logging
stream. Once that's up we'll just aggregate the logs from the three
machines (perl? bash with xargs? manually?) and run awstats or
something on them.
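
When we get there the aggregation step shouldn't be much more than this
(hostnames and paths are made up; awstats ships a logresolvemerge.pl
that can interleave the files properly by date):

  # pull each node's common-log file onto one box and merge them
  for n in gs-node1 gs-node2 gs-node3; do
      scp "$n:/var/log/geoserver/geoserver_access.log" "access-$n.log"
  done
  perl logresolvemerge.pl access-*.log > combined_access.log
  # then point awstats (or webalizer, etc.) at combined_access.log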

6) We played a lot with the tomcat thread/request pooling mechanisms.
This resulted in far fewer resources being consumed, and in failures
manifesting themselves more quickly. It's all covered in the tomcat
server.xml documentation.

7) Current issues:

a. If you use a tiled map then roughly one of every three tile requests
goes to each server. But the same request doesn't reliably go to the
same server (gah!), so the caches end up sub-optimally balanced.
Supposedly you can configure squid to ask its peer caches whether the
content is cached elsewhere, but I've not had much luck configuring
this (probably because I'm dense!). It might be much smarter to run one
BIG squid cache in front of the entire cluster.
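
For the record, the peer-cache config I've been fighting with looks
something like this on each node (hostnames made up; 3130 is squid's
default ICP port):

  # on gs-node1: ask the other two caches before hitting local tomcat;
  # proxy-only means we don't re-store copies fetched from a sibling
  icp_port 3130
  cache_peer gs-node2 sibling 80 3130 proxy-only
  cache_peer gs-node3 sibling 80 3130 proxy-only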

b. log analysis is handled sub-optimally right now. We really need to
get on with fixing this.

c. failures could be better scripted to fix themselves. A java crash
could be easily fixed by having the script restart java if it's off.
I'm wary of "feedback loops" or self-DOSing due to a structural/hardware
failure causing this to spin out of control. But it would sure save us
the effort/annoyance of having to restart stuff at weird times.

--saul

Saul Farber wrote:

> I got a question recently, and I thought it would be nice to share the
> response with the list:
>
> What special tricks do you use to run Geoserver at MASSGIS? There are
> some things like load balancing, fail over, and log analysis that come
> up in production but rarely get discussed on the development lists.
>
> Here's what I wrote back:
>
> We use a couple of tricks, and a lot of things are simply not finished
> yet (crises seem to take priority).

Hey, lots of interesting stuff in this mail, thanks for sharing.

> 2) On each individual load-balanced geoserver box we run squid on port
> 80, http-accelerating geoserver on port 8080. This leads to all kinds
> of proxy issues inside of geoserver, and I'm often the developer who
> spends time fixing PROXY_URL based geoserver issues. But it works
> pretty well these days, and hasn't given us trouble in a while.

Interesting. How are you dealing with changes in the data? Are you
accelerating everything or just the WMS calls?

> 3) Our geoserver data_dir is stored on a shared samba share that each
> geoserver machine mounts and reads its config from. Actually, we cheat
> and use the load-balancer as the samba server, but that's just an
> implementation artifact...nothing structural there.

What do you do when you need to change the config? Script to reload
all servers in one shot? Btw, how many servers are there? How is each
one configured?

> 4) Failover: Occasionally a JVM will crash (usually due to geoserver's
> use of JAI's native components to do rendering of rasters...I'm using a
> rather bleeding-edge copy of the JAI libs from dev.java.net and the libs
> are a bit flakey).

Hmm, are you sure that's the cause? We've had crashes of the JVM
hosting GeoServer, but they were due to changes in how glibc handles
double frees, and we learned that setting a certain env variable brings
back the old glibc behaviour:
export MALLOC_CHECK_=0

More info on how we set up GeoServer instances at TOPP here:
http://docs.codehaus.org/display/GEOSDOC/CentOS+(Red+Hat)+5.1+Install

> This is bad, as squid isn't smart enough to close its port 80 when port
> 8080 goes away, so we get these bad "hangs" where balance keeps sending
> requests to squid, which then fails them. To rectify this I wrote a
> little polling script that keeps an eye on the running java process and
> on tomcat's response time (is tomcat taking more than 5 seconds to
> respond to a directory-index request?) and shuts down squid if anything
> is out of the ordinary.

Well, if only WMS caching is needed, GeoWebCache could be used, with
the nice side effect that when GeoServer dies, GWC does too, making the
balancer skip that box.

> 5) Log analysis: We don't do as good a job of this as we should.
> Geoserver 1.6.x doesn't have access-logging in it, but geoserver-trunk
> (1.7.x) has it now. My hope is to get a geoserver_access_log logger
> that's Apache common-log compatible and separate from the usual logging
> stream. Once that's up we'll just aggregate the logs from the three
> machines (perl? bash with xargs? manually?) and run awstats or
> something on them.

1.6.2 has request logging on by default; have you checked it out (along
with the serious security fix we added in it; see our blog for more
details)?

> 6) We played a lot with the tomcat thread/request pooling mechanisms.
> This resulted in far fewer resources being consumed, and in failures
> manifesting themselves more quickly. It's all covered in the tomcat
> server.xml documentation.

Last time I did that I chose to limit threads to 30-50 and to allow a
wait queue of 100 (after that, Tomcat refuses to handle requests). What
about you?

> 7) Current issues:
>
> a. If you use a tiled map then roughly one of every three tile requests
> goes to each server. But the same request doesn't reliably go to the
> same server (gah!), so the caches end up sub-optimally balanced.
> Supposedly you can configure squid to ask its peer caches whether the
> content is cached elsewhere, but I've not had much luck configuring
> this (probably because I'm dense!). It might be much smarter to run one
> BIG squid cache in front of the entire cluster.

With the downside that it would be a single point of failure for the
whole cluster. If you have such a machine, it had better be very
redundant, even if not powerful.

> b. log analysis is handled sub-optimally right now. We really need to
> get on with fixing this.
>
> c. failures could be better scripted to fix themselves. A java crash
> could be easily fixed by having the script restart java if it's off.
> I'm wary of "feedback loops" or self-DOSing due to a structural/hardware
> failure causing this to spin out of control. But it would sure save us
> the effort/annoyance of having to restart stuff at weird times.

A smart script could use the same technique Ethernet uses to avoid
killing the network when a packet fails to be sent due to a collision:
exponentially increasing delays between subsequent attempts. After a
few attempts you just give up.
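
Something along these lines, as a sketch only (the start command and
check URL are whatever you normally use):

  # restart with exponentially increasing delays, then give up
  delay=10
  for attempt in 1 2 3 4 5; do
      /etc/init.d/tomcat start       # placeholder for the real command
      sleep 30                       # give the JVM a moment to come up
      wget -q -T 5 -t 1 -O /dev/null "http://localhost:8080/geoserver/" \
          && exit 0                  # it's back, we're done
      sleep $delay
      delay=$((delay * 2))           # 10s, 20s, 40s, 80s, ...
  done
  exit 1                             # still down: give up, page a human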

This mail is leaving me wondering about some numbers, though.
How many servers, how many requests served each day, and moreover,
what kind of requests?
Also, I've heard that hitting ArcSDE is quite a bit slower than hitting
a PostGIS db, but I don't know by how much. Do you have any figures? (No
need for accurate ones; approximate ones based on experience would be
more than enough, like 50% slower, 2 times slower, or 10 times.)

Cheers
Andrea

Andrea,

Very good questions, many of which don't really have answers. I'll try
and address each one.

>> 2) On each individual load-balanced geoserver box we run squid on port
>> 80, http-accelerating geoserver on port 8080. This leads to all kinds
>> of proxy issues inside of geoserver, and I'm often the developer who
>> spends time fixing PROXY_URL based geoserver issues. But it works
>> pretty well these days, and hasn't given us trouble in a while.

> Interesting. How are you dealing with changes in the data? Are you
> accelerating everything or just the WMS calls?

squid doesn't cache POST requests at all, so it's only GET requests,
and only those GET requests whose response sets a Cache-Control:
max-age header. Meaning only WMS requests involving layers that are
"cache-enabled" get cached.

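A quick way to check whether a given layer is actually getting the
header (the URL is a placeholder for a full GetMap request against a
cache-enabled layer):

  # -S prints the response headers; look for Cache-Control: max-age=...
  GETMAP_URL="http://localhost:8080/geoserver/wms?request=GetMap&layers=SOME_LAYER"
  wget -S -O /dev/null "$GETMAP_URL" 2>&1 | grep -i cache-control
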
>> 3) Our geoserver data_dir is stored on a shared samba share that each
>> geoserver machine mounts and reads its config from. Actually, we cheat
>> and use the load-balancer as the samba server, but that's just an
>> implementation artifact...nothing structural there.

> What do you do when you need to change the config? Script to reload
> all servers in one shot? Btw, how many servers are there? How is each
> one configured?

Yep, a little wget magic in a script that triggers a re-load on each
one after the config changes.
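
Roughly this shape; the reload path is a placeholder for whatever admin
request triggers a config re-load on your GeoServer version (and
credentials are handled however you normally do):

  # hit each node's "reload config" URL after the shared data_dir changes
  RELOAD_PATH="/geoserver/admin/reload-config"   # placeholder path
  for n in gs-node1 gs-node2 gs-node3; do
      wget -q -O /dev/null "http://$n:8080$RELOAD_PATH" \
          || echo "reload failed on $n" >&2
  done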

>>
>> 4) Failover: Occasionally a JVM will crash (usually due to geoserver's
>> use of JAI's native components to do rendering of rasters...I'm using a
>> rather bleeding-edge copy of the JAI libs from dev.java.net and the libs
>> are a bit flakey).

> Hmm, are you sure that's the cause? We've had crashes of the JVM
> hosting GeoServer, but they were due to changes in how glibc handles
> double frees, and we learned that setting a certain env variable brings
> back the old glibc behaviour:
> export MALLOC_CHECK_=0
>
> More info on how we set up GeoServer instances at TOPP here:
> http://docs.codehaus.org/display/GEOSDOC/CentOS+(Red+Hat)+5.1+Install

Hey, cool. I'll check that out.

>> This is bad, as squid isn't smart enough to close its port 80 when port
>> 8080 goes away, so we get these bad "hangs" where balance keeps sending
>> requests to squid, which then fails them. To rectify this I wrote a
>> little polling script that keeps an eye on the running java process and
>> on tomcat's response time (is tomcat taking more than 5 seconds to
>> respond to a directory-index request?) and shuts down squid if anything
>> is out of the ordinary.

> Well, if only WMS caching is needed, GeoWebCache could be used, with
> the nice side effect that when GeoServer dies, GWC does too, making the
> balancer skip that box.

Yeah, GeoWebCache is definitely on my list.

>> 5) Log analysis: We don't do as good a job of this as we should.
>> Geoserver 1.6.x doesn't have access-logging in it, but geoserver-trunk
>> (1.7.x) has it now. My hope is to get a geoserver_access_log logger
>> that's Apache common-log compatible and separate from the usual logging
>> stream. Once that's up we'll just aggregate the logs from the three
>> machines (perl? bash with xargs? manually?) and run awstats or
>> something on them.

> 1.6.2 has request logging on by default; have you checked it out (along
> with the serious security fix we added in it; see our blog for more
> details)?

Cool. I'll be glad to see 1.6.2's logging, as it will really help us
with our usage metrics. Security-wise we're not too vulnerable to the
WEB-INF visibility issue; everything important is stored outside the
WEB-INF directory.

>> 6) We played a lot with the tomcat thread/request pooling mechanisms.
>> This resulted in far fewer resources being consumed, and in failures
>> manifesting themselves more quickly. It's all covered in the tomcat
>> server.xml documentation.

> Last time I did that I chose to limit threads to 30-50 and to allow a
> wait queue of 100 (after that, Tomcat refuses to handle requests). What
> about you?

Very similar. I think our wait queue went to about 50, with 30 or so
threads.
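
For reference, that works out to a Connector element in server.xml
along these lines (the thread and queue numbers echo the ones above;
the other attributes are just illustrative Tomcat 5.5-era settings):

  <Connector port="8080" maxThreads="30" acceptCount="50"
             minSpareThreads="5" maxSpareThreads="15"
             connectionTimeout="20000" enableLookups="false" />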

>>
>> 7) Current issues:
>>
>> a. If you use a tiled map then roughly one of every three tile requests
>> goes to each server. But the same request doesn't reliably go to the
>> same server (gah!), so the caches end up sub-optimally balanced.
>> Supposedly you can configure squid to ask its peer caches whether the
>> content is cached elsewhere, but I've not had much luck configuring
>> this (probably because I'm dense!). It might be much smarter to run one
>> BIG squid cache in front of the entire cluster.

> With the downside that it would be a single point of failure for the
> whole cluster. If you have such a machine, it had better be very
> redundant, even if not powerful.

True, true. Need to think it through carefully. Getting that squid
distributed shared caching working would be a really sweet solution...

>> b. log analysis is handled sub-optimally right now. We really need to
>> get on with fixing this.
>>
>> c. failures could be better scripted to fix themselves. A java crash
>> could be easily fixed by having the script restart java if it's off.
>> I'm wary of "feedback loops" or self-DOSing due to a structural/hardware
>> failure causing this to spin out of control. But it would sure save us
>> the effort/annoyance of having to restart stuff at weird times.

> A smart script could use the same technique Ethernet uses to avoid
> killing the network when a packet fails to be sent due to a collision:
> exponentially increasing delays between subsequent attempts. After a
> few attempts you just give up.

Very good ideas.

> This mail is leaving me wondering about some numbers, though.
> How many servers, how many requests served each day, and moreover,
> what kind of requests?
> Also, I've heard that hitting ArcSDE is quite a bit slower than hitting
> a PostGIS db, but I don't know by how much. Do you have any figures?
> (No need for accurate ones; approximate ones based on experience would
> be more than enough, like 50% slower, 2 times slower, or 10 times.)

We have one backend ArcSDE server (a big, expensive HP box with dual
Opterons, 16GB of RAM, a huge storage array, battery backup and all
that).

We have three commodity-built geoserver machines, each a dual P4-Xeon
2.8GHz with 1.5GB of RAM.

Unfortunately we've never run our geoserver against PostGIS, so I have
no comparison at all. I've run OTHER geoserver instances against
PostGIS, but nothing similar in hardware or setup...so I don't have a
very good idea of the relative speed of ArcSDE.

Supposedly (heard at FOSS4G) ArcSDE 9.3 will support storing its
backend data in PostGIS geometry format...meaning that we could run our
ESRI-based front-end stuff via ArcSDE, back ArcSDE onto PostGIS, and
then run geoserver against PostGIS directly. We'll see how that works
if and when that software comes along!

> Cheers
> Andrea

>>> 7) Current issues:
>>>
>>> a. If you use a tiled map then roughly one of every three tile requests
>>> goes to each server. But the same request doesn't reliably go to the
>>> same server (gah!), so the caches end up sub-optimally balanced.
>>> Supposedly you can configure squid to ask its peer caches whether the
>>> content is cached elsewhere, but I've not had much luck configuring
>>> this (probably because I'm dense!). It might be much smarter to run
>>> one BIG squid cache in front of the entire cluster.

>> With the downside that it would be a single point of failure for the
>> whole cluster. If you have such a machine, it had better be very
>> redundant, even if not powerful.

> True, true. Need to think it through carefully. Getting that squid
> distributed shared caching working would be a really sweet solution...

One thing to investigate might be tweaking the JCS backend of
GeoWebCache. It can do a 'lateral cache':

http://jakarta.apache.org/jcs/LateralTCPAuxCache.html
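
From a quick read of that page, the wiring goes in JCS's cache.ccf and
looks roughly like this (hosts, port and region name are made up; check
the page above for the exact property names):

  # lateral TCP auxiliary: each node tells its peers about puts/removes
  jcs.auxiliary.LTCP=org.apache.jcs.auxiliary.lateral.socket.tcp.LateralTCPCacheFactory
  jcs.auxiliary.LTCP.attributes=org.apache.jcs.auxiliary.lateral.socket.tcp.TCPLateralCacheAttributes
  jcs.auxiliary.LTCP.attributes.TcpServers=gs-node2:1111,gs-node3:1111
  jcs.auxiliary.LTCP.attributes.TcpListenerPort=1111

  # attach the auxiliary to the tile cache region (region name is a guess)
  jcs.region.tilecache=LTCP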

C