[Geoserver-devel] Proposal to enhance control flow module

Hi all,

I’d like to discuss an enhancement that I’ve been working on for the control flow module that would allow it to:

  1. Establish global limits for number of concurrent requests from a single IP address. A single IP address may only take up to n number of requests in parallel.

  2. Specify limits for the number of requests that a particular IP address can take: same as above, but specific to one IP.

  3. IP blacklist, which would reject requests coming from specific IP addresses.

Not sure if completely necessary, but I’ve rounded up some more details into a GSIP here:

http://geoserver.org/display/GEOS/GSIP+72+-+Control+Flow+Module+Enhancements

Feedback and comments really welcome.

Thanks


Juan Marín Otero

On Wed, Feb 15, 2012 at 10:58 PM, Juan Marín Otero <juan.marin.otero@anonymised.com> wrote:

Hi Juan,
thanks a lot for the detailed GSIP, it’s well laid out and provides a good explanation
of what has been done.

I’m overall quite happy about the improvements; there are just a few minor points that
may need amending.

The reason why the original module did not have IP address control is routers and
proxies.
When you have a big organization, with hundreds or thousands of people behind a single
public IP address, it gets difficult to use just the IP: you might be handling a single
user who is flooding you, or seeing the effect of 30 people working in parallel
against GeoServer. While you want to stop/limit the former, applying limits to the latter
might well make the application unusable for that particular organization.

Also, you might have reverse proxies local to the server that act as front ends to it;
if you just take the IP address of the connection, you’ll get that of the local proxy.
To address the above, there is the X-Forwarded-For header that reverse proxies
normally set to inform software that a proxy is in the middle:
http://en.wikipedia.org/wiki/X-Forwarded-For
If you look into the “monitoring” module you’ll see how the header is used.
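
As an illustration of the header handling described above, here is a minimal sketch in Java. It is not the code of the monitoring module or of the patch under discussion; the class name `ClientIpResolver` and its `resolve` method are hypothetical. It assumes the usual X-Forwarded-For layout, a comma-separated list whose left-most entry is the original client, and falls back to the connection's peer address when the header is missing or empty.

```java
/** Hypothetical helper, not GeoServer code: picks the address to throttle on. */
final class ClientIpResolver {

    private ClientIpResolver() {}

    /**
     * @param forwardedFor raw X-Forwarded-For header value, or null if absent
     * @param remoteAddr   peer address of the connection (what a servlet
     *                     container reports via ServletRequest#getRemoteAddr())
     * @return the left-most forwarded address if present, else remoteAddr
     */
    static String resolve(String forwardedFor, String remoteAddr) {
        if (forwardedFor != null) {
            // The header is a comma-separated list; the first entry is the
            // address seen by the first proxy, i.e. the original client.
            int comma = forwardedFor.indexOf(',');
            String first = (comma >= 0 ? forwardedFor.substring(0, comma) : forwardedFor).trim();
            if (!first.isEmpty()) {
                return first;
            }
        }
        return remoteAddr;
    }
}
```

Since any client can send this header itself, a real deployment would probably only honour it when the connection comes from a proxy it trusts; the sketch leaves that check out.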

Another detail in the GSIP that might need reworking is this syntax:
ip.address=<count>,<ip_addr>

Is it just me, or does the above not allow controlling more than one specific IP
address? The property files are, in the end, serialized maps, so a key can appear only once.
I guess the following one might do instead:
ip.<ip_addr>=count
(and you scan the whole property file contents to look for those).
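
A minimal sketch of the "scan the whole property file" idea, in Java. The class `IpLimitParser` is hypothetical and is not taken from the GSIP or the patch; it assumes that every key starting with `ip.` is a per-address rule of the form `ip.<ip_addr>=count` and that the value is a plain integer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical parser, not the GSIP implementation. */
final class IpLimitParser {

    /** Assumed key prefix for per-address rules. */
    static final String PREFIX = "ip.";

    private IpLimitParser() {}

    /** Collects every "ip.<ip_addr>=count" entry into an address -> limit map. */
    static Map<String, Integer> parse(Properties props) {
        Map<String, Integer> limits = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX) || key.length() == PREFIX.length()) {
                continue; // some other rule, not a per-address limit
            }
            String address = key.substring(PREFIX.length());
            // Throws NumberFormatException if the count is not an integer.
            limits.put(address, Integer.parseInt(props.getProperty(key).trim()));
        }
        return limits;
    }

    /** Convenience overload that parses property-file text directly. */
    static Map<String, Integer> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parse(props);
    }
}
```

Because each address becomes its own key, any number of addresses can be configured. One caveat of the properties format: a colon also separates key from value, so IPv6 addresses would need their colons escaped.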

One final note is that this is the first GSIP that is being proposed by a non-core
developer. As far as I know you don’t have commit access either, right?
If this is your first contribution you should also open a ticket in JIRA
and attach the full patch for review.

Anyway, the work looks good.
Normally we allow commit access directly for new community modules; since
you are modifying an extension I guess we can give you commit access anyway,
but you’ll have to ask for reviews before making commits to any core/extension
module.

Thanks again for the contribution and for approaching the community in such
a clear way, looking forward to seeing the GSIP applied.

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


> The reason why the original module did not have ip address control is due to routers
> and proxies.
> When you have a big organization, with hundreds or thousands of people behind a single
> public ip address, it gets difficult to use just the ip: you might be handling a user
> with a single ip that's flooding you, or seeing the effect of 30 people working in
> parallel against GeoServer. While you want to stop/limit the first, applying limits to
> the second might well make the application unusable for that particular organization.

Well I'd say that for a big public deployment it'd be better to
throttle or blacklist an organization rather than risk having it go
down for everyone.

But I agree that we should allow more granularity. User throttling
can help with that, I think. I realize that most people don't actually
log in to GeoServer now, but I think that's going to start to change
with the new security stuff that more easily integrates with LDAP and
single sign-on, and with things like GeoNode that put users more to
the fore. And I think if there's a benefit like more granular
throttling control, then admins will see an advantage to having users
log in.

Looking at the proposal, it says that per-user throttling is cookie-based. Is
that just because of the current state of the security system? Is this
proposal compatible with the security work Justin and Christian have
been doing? (Unfortunately there's not much public info thus far on
http://geoserver.org/display/GEOS/GSIP+71+-+New+Security+Subsystem, so
that may be more a question for Justin and Christian than for you, Juan.)
For example, if one uses a new single sign-on plugin in the security system
for users, will that work fine with the existing control flow user
throttling?

And how should per-IP and per-user throttling interact with one
another? For example, if you turn on throttling for both users and IPs, can
the users all from one IP get throttled less than if they weren't
logged in? Which one takes precedence, and is there any configuration
possible to say how they interact?

C

On Thu, Feb 16, 2012 at 2:46 PM, Chris Holmes <cholmes@anonymised.com.> wrote:

> Well I’d say that for a big public deployment it’d be better to
> throttle or blacklist an organization rather than risk having it go
> down for everyone.

Oh, I fully agree on this one. However, while you can set a limit like 6
concurrent requests tops per user with the current cookie-based mechanism
without posing serious problems, you should set a much larger number,
like 100, on the per-IP limits, to avoid choking large organisations working
behind a single proxy.

> But I agree that we should allow more granularity. User throttling
> can help with that, I think. I realize that most people don’t actually
> log in to GeoServer now, but I think that’s going to start to change
> with the new security stuff that more easily integrates with LDAP and
> single sign-on, and with things like GeoNode that put users more to
> the fore. And I think if there’s a benefit like more granular
> throttling control, then admins will see an advantage to having users
> log in.
>
> Looking at the proposal, it says that per-user throttling is cookie-based. Is
> that just because of the current state of the security system?

Nope, the current control flow system is cookie based and works
independently of authentication, while the proposal works solely
based on the IP.

> Is this proposal compatible with the security work Justin and Christian have
> been doing? (Unfortunately there’s not much public info thus far on
> http://geoserver.org/display/GEOS/GSIP+71+-+New+Security+Subsystem, so
> that may be more a question for Justin and Christian than for you, Juan.)
> For example, if one uses a new single sign-on plugin in the security system
> for users, will that work fine with the existing control flow user
> throttling?

Throttling based on the authenticated user would be yet another development.

> And how should per-IP and per-user throttling interact with one
> another? For example, if you turn on throttling for both users and IPs, can
> the users all from one IP get throttled less than if they weren’t
> logged in? Which one takes precedence, and is there any configuration
> possible to say how they interact?

The current system does not allow interactions: a request has to go
through all the queues that it is intercepted by.
To allow interactions I believe we would have to change the system or have
the flow controllers know about each other in some way.
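
To make the "go through all queues" behaviour concrete, here is a rough Java model. It is a sketch under stated assumptions, not the control flow module's code: `FlowSketch`, `KeyedLimiter` and `Request` are hypothetical names, and the limiters refuse a request with `tryAcquire()` instead of queueing it, purely to keep the example short (a blocking `acquire()` would be closer to a queue).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical model: a request must pass every limiter, each unaware of the others. */
final class FlowSketch {

    /** Minimal request model; both fields are assumed to be non-null. */
    static final class Request {
        final String ip;
        final String userCookie;

        Request(String ip, String userCookie) {
            this.ip = ip;
            this.userCookie = userCookie;
        }
    }

    /** One independent limiter: at most `limit` concurrent requests per key. */
    static final class KeyedLimiter {
        private final int limit;
        private final Function<Request, String> keyOf;
        private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

        KeyedLimiter(int limit, Function<Request, String> keyOf) {
            this.limit = limit;
            this.keyOf = keyOf;
        }

        boolean tryEnter(Request r) {
            return slots.computeIfAbsent(keyOf.apply(r), k -> new Semaphore(limit)).tryAcquire();
        }

        /** Must only be called for a request that previously entered. */
        void exit(Request r) {
            slots.get(keyOf.apply(r)).release();
        }
    }

    private final List<KeyedLimiter> limiters;

    FlowSketch(List<KeyedLimiter> limiters) {
        this.limiters = limiters;
    }

    /** Admits the request only if every limiter admits it; undoes partial entries. */
    boolean tryEnter(Request r) {
        List<KeyedLimiter> entered = new ArrayList<>();
        for (KeyedLimiter limiter : limiters) {
            if (!limiter.tryEnter(r)) {
                for (KeyedLimiter e : entered) {
                    e.exit(r);
                }
                return false;
            }
            entered.add(limiter);
        }
        return true;
    }

    /** Releases the request's slot in every limiter once it has completed. */
    void exit(Request r) {
        for (KeyedLimiter limiter : limiters) {
            limiter.exit(r);
        }
    }
}
```

With a per-user limiter keyed on the cookie and a per-IP limiter keyed on the address, the stricter of the two always wins for a given request. Letting logged-in users bypass or relax the per-IP limit would need a limiter that can see the other's decision, which this structure does not offer.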

Cheers
Andrea


Hi Andrea,

Thanks a lot for your feedback. I incorporated your suggestions (the change to the single IP address configuration, and X-Forwarded-For to account for proxied IP addresses) into the code and created a JIRA issue with a patch here:

http://jira.codehaus.org/browse/GEOS-4961

As for commit access, no, I don’t have commit rights. My original plan was to first submit a community module I had been working on before, but this took precedence on my to-do list.

I’m perfectly fine with submitting patches for review in the meantime.

Thanks,


Juan Marín Otero
