[Geoserver-users] parallel processing

Hi, is there any chance of converting GeoServer to run with parallel processing?

Using something like http://www.jppf.org/

I have no clue what is needed to turn GeoServer into a grid-aware application.

Thanks in advance
Facundo.-
PS: sorry for my English... it is worse than my Spanish... :)

There is.

What you need is:
1. A common configuration
2. Something up front to distribute the load across the computers in the cluster

The recent "catalog proposal" is step one (you want all the GeoServer instances to read their configuration from a shared database). The second step is why we signed up for this Java Enterprise Edition idea: the details of load balancing are left up to your application server. Our job ends with providing a GeoServer WAR with enough configuration options declared so you can set it up; for example, we would need to let you provide the name of the database holding the shared configuration.

Jody


We have never looked into grid computing so far, and clustering, while
already possible, is clumsy.
The problem lies, as Jody noted, in the configuration: it is a set of
XML files sitting on disk, fully loaded into memory by GeoServer
on startup for performance reasons.

This means you can share the configuration using a network disk, but
you have to kick all of the GeoServer instances (force them to
reload the config) each time the files are modified. People often
write a network script to force each GeoServer instance to reload.
Now that I think about it, it would be possible to write a simple
polling thread in GeoServer that looks at the last modification
date of the configuration files and reloads them if they have
been modified since the last reload. By keeping the polling thread
out of the data serving path, the overhead would be minimal (not
even measurable, I think).
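
As a rough illustration of the polling idea, here is a minimal Java sketch.
It is not GeoServer code: the class name ConfigPoller, the reload callback
and the polling period are all assumptions made for the example, and the
real reload hook would have to be whatever GeoServer uses internally to
re-read its configuration.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of GeoServer): reloads when a watched file changes. */
final class ConfigPoller {
    private final File configFile;
    private final Runnable reloadAction; // assumed hook that re-reads the configuration
    private long lastSeen;

    ConfigPoller(File configFile, Runnable reloadAction) {
        this.configFile = configFile;
        this.reloadAction = reloadAction;
        this.lastSeen = configFile.lastModified();
    }

    /** Runs a single check; returns true if a reload was triggered. */
    synchronized boolean checkOnce() {
        long modified = configFile.lastModified();
        if (modified > lastSeen) {
            lastSeen = modified;
            reloadAction.run();
            return true;
        }
        return false;
    }

    /** Starts a daemon thread that polls outside the request-serving path. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "config-poller");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Keep polling even if one reload attempt fails.
                System.err.println("Config reload failed: " + e);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Each instance in the cluster would run its own poller against the shared
disk, so there is no need to notify the nodes explicitly. The timestamp
comparison relies on the network file system reporting modification times
reliably, which is worth verifying on the actual setup.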

Once you have this, you just need to put a load balancer in front
of a GeoServer cluster and you're gold.

Cheers
Andrea

One thing I eventually want to check out is Hadoop (http://lucene.apache.org/hadoop/), to figure out whether we can use MapReduce to generate huge numbers of tiles on large clusters. But yeah, the parallel processing stuff is great R&D material, just not high enough on the priority list right now. I wonder if we could find a university student who could investigate it for their master's, since it would be a very cool project.

Chris


Facundo Garat wrote:

Maybe, to overcome this problem, one GeoServer could be configured as the configuration master, and the other nodes in the cluster could ask the master for the configuration (acting as slaves).

Yes, this could be a solution, though there are already load balancing
solutions in the wild, such as Apache with its mod_proxy_balancer:
http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html
Some containers also provide load balancing themselves, such as Tomcat:
http://tomcat.apache.org/tomcat-5.0-doc/balancer-howto.html#Using%20the%20balancer%20webapp
though it seems to have been removed from Tomcat 6:
http://tomcat.apache.org/tomcat-6.0-doc/balancer-howto.html
(maybe it wasn't very efficient? Who knows...)

Coding one directly into a master GeoServer could make things easier
if you're not familiar with Apache, but writing a load balancer
that really delivers is no easy task (the code must be very carefully
crafted to keep the balancer from becoming a network bottleneck).
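
To show only the simplest part of that job, here is a minimal Java sketch
of round-robin backend selection. The class name RoundRobinSelector and the
backend addresses are made up for the example; it does no request
forwarding, health checking or connection handling, which is where the hard
work (and the bottleneck risk) actually lies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: picks backends in rotation, nothing more. */
final class RoundRobinSelector {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend required");
        }
        this.backends = new ArrayList<>(backends); // defensive copy
    }

    /** Returns the next backend; safe to call from many request threads. */
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(i);
    }
}
```

With two made-up nodes such as "http://node1:8080/geoserver" and
"http://node2:8080/geoserver", successive calls to pick() alternate between
them. An existing balancer like mod_proxy_balancer already covers this plus
everything the sketch leaves out.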

Or use some kind of replication of the configuration between the nodes in a cluster, as OSCache does for cached files.

Yeah, this could work too.

Cheers
Andrea