[SAC] Re-enable LDAP user creation

strk · May 8, 2016, 10:00am

The OSGeo Userid creation form is still offline:
https://www.osgeo.org/cgi-bin/ldap_create_user.py

It mentions being "under maintenance".
I guess it would be pretty frustrating for new users
to be left like that, with nothing to do.

How about adding at least an email address to send a registration
request to?

I understand the "maintenance" is the work that adds
email confirmation, but there has been no progress update on
the ticket for 4 days: #1665 (Add Confirmation Flow to LDAP Account Creation),
https://trac.osgeo.org/osgeo/ticket/1665
Frank: is that going to happen? How long will it take?

--strk;

Frank_Warmerdam · May 8, 2016, 10:28am

Strk,

It seems I missed some of the discussion in #1665, but I don't really
see what is hoped to be accomplished. If someone is willing to have a
human create the accounts, then they will also be willing to do email
confirmation, etc. Basically, we can't really stop humans that want
to spam things.

Best regards,
Frank


strk · May 8, 2016, 10:34am

On Sun, May 08, 2016 at 12:28:12PM +0200, Frank Warmerdam wrote:

> Strk,
>
> It seems I missed some of the discussion in #1665, but I don't really
> see what is hoped to be accomplished. If someone is willing to create
> the accounts with a human then they will also be willing to do email
> confirmation, etc. Basically, we can't really stop humans that want
> to spam things.

I don't know the details of what happened either.
Alex reported that users were still being created.
I'm not sure his query was correct, as he used the
"modify" timestamp rather than the "create" timestamp, and
I don't know when the captcha-based mechanism was
introduced.

To me, forcing accounts to be created by humans is good enough
for a start. Then we should aim at finding a way to detect
"dormant" users and remove them. For example, I found there
are accounts named gmail1 to gmail33, but only gmail1 to gmail8
have been found sending spam so far.

--strk;

Alex_M · May 8, 2016, 2:21pm

I wanted to be 100% positive that the sign-ups were being done by hand.
That part was not clear, hence my request to add an IP-based fail2ban rule
on the registration URL. I was also waiting for the spam blocking to be
enabled across Trac.

We do want email verification added: 1. because without a valid email
address a user cannot reset their password (even if they ask us to,
we can't verify who they are), and 2. because I think an additional step
might slow down registration enough to annoy spammers.

Sandro, my initial query was based on modify; the chart I added later to
the ticket was based on create.

If you think the new anti-spam measures are working, we can re-enable
it. Martin was working on a script to make it faster to remove spam
accounts once found. Is that in place so admins can use it without
having to ask Martin?

Thanks,
Alex


strk · May 8, 2016, 2:36pm

On Sun, May 08, 2016 at 07:21:45AM -0700, Alex Mandel wrote:

> We do want email verification added, 1. because without a valid email
> address a user can not reset their password, even if they ask us to
> because we can't verify who they are, 2. I think an additional step
> might actually slow the process of registration to annoy spammers.

Both are good points. +1 to email verification.

> If you think the new anti-spam measures are working and we can re-enable
> it.

The anti-spam plugin doesn't catch spam out of the box; it requires
configuration and Bayes training that needs to be done by each of
the project admins. I've done some of that for ossim, which was
still being hit these days, but I don't know how good that was (especially
as no candidate "ham" was reported).

> Martin was working on a script to make it faster to remove spam
> accounts once found, is that in place so admins can use it without
> having to ask Martin?

The command to remove accounts is documented on the wiki, but it takes
an LDAP administrator password to run.

--strk;

Alex_M · May 8, 2016, 8:20pm


I had forgotten: we can re-enable as soon as a fail2ban rule is in place
to prevent rapid registration from the same IP. Then keep adding
email verification on the to-do list for the next week or so.

I can at least confirm that no new accounts have been created since we
disabled the web form, so accounts aren't being created by some more
nefarious method.

Thanks,
Alex

strk · May 9, 2016, 1:05pm

On Sun, May 08, 2016 at 01:20:39PM -0700, Alex Mandel wrote:

> I had forgotten, we can re-enable as soon as a fail2ban rule is in place
> to prevent rapid registration from the same ip.

What counts as "rapid"?
I went through the Apache logs to get a sense of the current pattern.
The logs contain POSTs to the user creation script from Jan 17 to May 05.
The top 10 busiest days, ordered by number of requests:

  245 30/Apr
  122 27/Apr
  118 03/May
  115 02/May
   63 01/May
   43 14/Mar
   38 26/Apr
   36 18/Apr
   34 23/Feb
   34 06/Apr

The average for January, February and the initial portion of April was
around 20 new users per day, so it looks like the storm started on
April 27th with a 6x increase in the number of registered users
and reached a 12x increase on April 30th.

That day (April 30th) the 245 requests came from a total of 36 IP
addresses. The top 10 hitters among these IPs:

   93 103.233.118.38
   32 108.61.224.153
   26 180.151.246.4
   16 182.68.169.25
   11 104.156.228.177
   11 103.38.177.2
    4 151.236.19.24
    4 107.152.98.151
    4 106.78.50.229
    3 98.234.5.157

The 93 hits from 103.233.118.38 all occurred between 14:49 and 15:49,
so within a single hour.
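
For anyone wanting to reproduce this kind of tally, here is a minimal
Python sketch. It is not the procedure actually used for the numbers
above (that is not shown in this thread). It assumes the access log is
in Apache's common/combined format and that registrations are POSTs to
/cgi-bin/ldap_create_user.py. The names LOG_LINE and tally_posts are
made up for illustration.

```python
import re
from collections import Counter

# Assumed log format (Apache common/combined), for example:
# 1.2.3.4 - - [30/Apr/2016:14:49:02 +0000] "POST /cgi-bin/ldap_create_user.py HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>\d{2}/\w{3})/\d{4}:[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)'
)

SCRIPT_PATH = "/cgi-bin/ldap_create_user.py"


def tally_posts(lines, path=SCRIPT_PATH):
    """Return (per_day, per_day_ip) Counters of POST requests to `path`."""
    per_day = Counter()
    per_day_ip = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("method") != "POST":
            continue
        # Ignore any query string when comparing the request path.
        if m.group("path").split("?", 1)[0] != path:
            continue
        day, ip = m.group("day"), m.group("ip")
        per_day[day] += 1
        per_day_ip[(day, ip)] += 1
    return per_day, per_day_ip
```

Calling tally_posts(open("access.log")) and then per_day.most_common(10)
would give a "top 10 busy days" list; filtering per_day_ip on one day
gives the per-IP breakdown.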

The fail2ban solution will only ban the IP _after_ checking the log
file, so if we use a 1-hour window there could be ~100 new users
before the IP is banned. Maybe we could check every 5 minutes and ban
IPs from which more than 1 user was created. Do you think that's too
conservative?

--strk;

strk · May 9, 2016, 1:21pm

I've enabled a fail2ban rule that bans, for one hour, IPs from which
2 users are created within a 2-minute period, and re-enabled user creation.
The /etc/fail2ban directory on the "web" machine was put under a local
git repository, with the "master" branch being the default configuration
from Debian and the "web" branch being the configuration for the "web"
machine, so that the same git repository can eventually hold the
configuration of all machines (one branch per host).
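
To make the rule concrete, the sketch below simulates it in Python: it
flags an IP once 2 registration POSTs from it fall within 2 minutes.
This only models the logic. It is not the fail2ban jail installed on
"web" (whose filter and jail files are not shown here), and ips_to_ban
and its parameters are hypothetical names. In fail2ban terms the
thresholds would presumably map to the maxretry, findtime and bantime
options.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def ips_to_ban(events, max_hits=2, window=timedelta(minutes=2)):
    """Return {ip: time of the hit that would trigger the ban}.

    `events` is an iterable of (timestamp, ip) pairs sorted by timestamp,
    one pair per registration POST.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    banned = {}
    for ts, ip in events:
        if ip in banned:
            continue  # ban expiry (1 hour) is not modelled in this sketch
        hits = recent[ip]
        hits.append(ts)
        # Drop hits that have fallen out of the sliding window.
        while ts - hits[0] > window:
            hits.popleft()
        if len(hits) >= max_hits:
            banned[ip] = ts
    return banned
```

Feeding it the (time, IP) pairs of past POSTs shows which IPs a given
threshold would have caught, which helps when tuning the rule.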

Let's see what happens. Please keep an eye on user creation and
spam (remember there are a lot of fake users already, so it doesn't
really take a new registration to start spamming).

--strk;


strk · May 9, 2016, 1:32pm

Frank: I noticed that failing to solve the captcha still results
in a 200 HTTP response (at least that is how it looks in the Apache logs).
Is that right? Also, the error is not very user friendly (it's a
stack trace from the script). Can it be improved?

We found the cgi-bin directory to be managed in an SVN repository,
but only users in the LDAP "admin" group have access, and it looks like
neither Alex nor I are in that group. Would it be worth converting
that SVN repository to a git one for easier access?

--strk;


Alex_M · May 9, 2016, 2:34pm

I was thinking of a rate that would be practically impossible for humans
sitting in the same room behind a shared IP or on the same computer,
like more than 2-3 registrations in 30 seconds (it should take people
that long to fill out the form and click the box).

My chart of actual user creation matches your analysis of the logs.

Go ahead and re-enable the registration, and we'll just have to keep an
eye on it, and possibly adjust the rules. How long does the ban last?

Thanks,
Alex


strk · May 9, 2016, 2:38pm

On Mon, May 09, 2016 at 07:34:14AM -0700, Alex Mandel wrote:

> I was thinking of something that might be rather impossible for humans
> sitting in the same room over a shared ip or same computer.
> Like more than 2-3 in 30 seconds (should take people that long to fill
> out the form and click the box)

2-3 in 30 seconds never happened during the spam storm, as far as I
can tell, so it would block nothing.

> My chart of actual user creation matches your analysis of the logs.

Great.

> Go ahead and re-enable the registration, and we'll just have to keep an
> eye on it, and possibly adjust the rules.

It's enabled now, we got 5 new registered users, 2 of which have
the _same_ email (something else to disallow?).

See:

ldapsearch -x "(&(createTimestamp>=20160509000000Z))"
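
The timestamp in that filter is hard-coded to midnight UTC of May 9th.
A rolling cutoff ("created in the last N hours") could be computed as in
the following Python sketch. created_since_filter and ldapsearch_argv
are hypothetical helpers, and the -LLL flag plus the explicit
uid/createTimestamp attribute list are additions not present in the
command above.

```python
from datetime import datetime, timedelta, timezone


def created_since_filter(hours, now=None):
    """LDAP filter matching entries created in the last `hours` hours.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(hours=hours)).astimezone(timezone.utc)
    # LDAP Generalized Time in UTC: YYYYMMDDHHMMSSZ
    return "(&(createTimestamp>=%s))" % cutoff.strftime("%Y%m%d%H%M%SZ")


def ldapsearch_argv(hours, now=None):
    """Argument list for an ldapsearch call using the rolling filter."""
    # -x: simple bind, as in the command above; -LLL: plain LDIF output.
    # Server and base DN are left to the local ldap.conf, as above.
    return ["ldapsearch", "-x", "-LLL",
            created_since_filter(hours, now), "uid", "createTimestamp"]
```

The list returned by ldapsearch_argv(1) can be passed to subprocess.run
to list the accounts created in the last hour.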

> How long does the ban last?

The ban is 1 hour.

--strk;

Alex_M · May 9, 2016, 2:49pm

On 05/09/2016 07:38 AM, Sandro Santilli wrote:

> On Mon, May 09, 2016 at 07:34:14AM -0700, Alex Mandel wrote:
>
> > I was thinking of something that might be rather impossible for humans
> > sitting in the same room over a shared ip or same computer.
> > Like more than 2-3 in 30 seconds (should take people that long to fill
> > out the form and click the box)
>
> 2-3 in 30 seconds never happened during the spam storm, as far as I
> can tell, so it would block nothing.

Ya was wondering, thanks for checking.

> > My chart of actual user creation matches your analysis of the logs.
>
> Great.
>
> > Go ahead and re-enable the registration, and we'll just have to keep an
> > eye on it, and possibly adjust the rules.
>
> It's enabled now, we got 5 new registered users, 2 of which have
> the _same_ email (something else to disallow?).

Well, not until we have a password reset. I could see kicking back the
form with a message saying you've already registered.

> See:
>
> ldapsearch -x "(&(createTimestamp>=20160509000000Z))"

Ya, I'm wondering if we should run a daily or hourly report that emails
SAC, or at least the main admins, if more than x accounts have
been created in the last hour (maybe 20), since that would be a good sign
of bulk registration. This would use the ldapsearch above...

> > How long does the ban last?
>
> The ban is 1 hour.
>
> --strk;

Thanks,
Alex

strk · May 9, 2016, 3:08pm

On Mon, May 09, 2016 at 07:49:37AM -0700, Alex Mandel wrote:

> On 05/09/2016 07:38 AM, Sandro Santilli wrote:
>
> > It's enabled now, we got 5 new registered users, 2 of which have
> > the _same_ email (something else to disallow?).
>
> Well not until we have a password reset. I could see kicking back the
> form, saying - you've already registered.

We badly need the email confirmation thing, no doubt.
Frank: do you think you could work on that?

> > ldapsearch -x "(&(createTimestamp>=20160509000000Z))"
>
> Ya I'm wondering if we should run a daily report, or hourly, that emails
> SAC or at least the main admins if more than x number of accounts have
> been made in the last hour (maybe 20). Since that would be a good sign
> of bulk registration. This would use the ldapsearch above...

Another 2 accounts were just registered, and their uids don't
look sane at all to me:

kumartinkusingh08
ct7316944

I can't match those to the IP they came from, as I don't know how
to extract createTimestamp from LDAP, the Apache log does not contain
information about the name, and the creation script itself does not
write any log.

I'll look at creating a script to report the number of users created
in the last X hours, and then get it called to report to sac.

--strk;

strk · May 9, 2016, 3:28pm

On Mon, May 09, 2016 at 05:08:11PM +0200, Sandro Santilli wrote:

> I'll look at creating a script to report the number of users created
> in the last X hours, and then get it called to report to sac.

The scripts are ready and have been put in a git repository on the
tracsvn machine, in the /osgeo/tools/ldap directory.

Alex: you should have received an email report. If you like what you
see, I can have the script run hourly and report to an arbitrary list
of recipients.

Usage:
./ldap_user_creation_report.sh <hours> <maxusers> <email> [<email>...]
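
The script itself is not reproduced in this thread. As a rough
illustration of the kind of check it performs, the Python sketch below
counts the entries in ldapsearch LDIF output and produces a message only
when the count exceeds a threshold. That "more than maxusers" triggers
the report is an assumption based on the earlier suggestion, and the
function names count_entries and report are made up.

```python
def count_entries(ldif_text):
    """Count entries in LDIF text as printed by ldapsearch.

    Each entry starts with one "dn:" line (or "dn::" when base64-encoded);
    folded continuation lines start with a space and are not counted.
    """
    return sum(1 for line in ldif_text.splitlines() if line.startswith("dn:"))


def report(ldif_text, hours, maxusers):
    """Return an alert message, or None when the count is within bounds."""
    n = count_entries(ldif_text)
    if n <= maxusers:
        return None  # nothing unusual, no mail needed
    return (f"{n} LDAP users created in the last {hours} hours "
            f"(threshold: {maxusers})")
```

A cron job could run the search, pass its output through report() and
send mail only when the result is not None.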

--strk;

Alex_M · May 9, 2016, 3:51pm

On 05/09/2016 11:08 AM, Sandro Santilli wrote:


> I can't match those to the IP they came from as I don't know how
> to extract createTimestamp from LDAP, the apache log does not contain
> information about the name AND the script creator itself does not
> create any log.

ldapsearch -H ldaps://ldap.osgeo.org/ -b dc=osgeo,dc=org -x \
  "(&(createTimestamp>=20160401100000Z))" +

The + sign at the end requests the operational attributes, which
include createTimestamp.
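
With createTimestamp available, matching an account to an IP comes down
to comparing it with the POST times in the Apache log. A minimal Python
sketch of that matching follows. It assumes LDIF output with plain (not
base64) dn lines and timestamps of the form YYYYMMDDHHMMSSZ, assumes the
log times have already been parsed into UTC datetimes, and uses made-up
names (creation_times, candidate_ips) and an arbitrary 5-second
tolerance.

```python
from datetime import datetime, timedelta, timezone


def creation_times(ldif_text):
    """Map each entry's DN to its createTimestamp as a UTC datetime."""
    result = {}
    # LDIF separates entries with a blank line.
    for block in ldif_text.strip().split("\n\n"):
        attrs = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                attrs.setdefault(key, value)
        if "dn" in attrs and "createTimestamp" in attrs:
            stamp = datetime.strptime(attrs["createTimestamp"], "%Y%m%d%H%M%SZ")
            result[attrs["dn"]] = stamp.replace(tzinfo=timezone.utc)
    return result


def candidate_ips(created, posts, tolerance=timedelta(seconds=5)):
    """IPs whose POST time lies within `tolerance` of the creation time.

    `posts` is an iterable of (timestamp, ip) pairs from the access log.
    """
    return sorted({ip for ts, ip in posts if abs(ts - created) <= tolerance})
```

More than one candidate IP can come back when several registrations
happen within the tolerance, so the result is a hint rather than proof.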

Alex

strk · May 9, 2016, 4:07pm

On Mon, May 09, 2016 at 11:51:20AM -0400, Alex M wrote:

> ldapsearch -H ldaps://ldap.osgeo.org/ -b dc=osgeo,dc=org -x
> "(&(createTimestamp>=20160401100000Z))" +
>
> The + sign at the end dumps the createTimestamp

I've figured out other ways and documented them in:

SAC:LDAP - OSGeo

And I added a new section with details on the new cron job:

SAC:LDAP - OSGeo

--strk;

strk · May 9, 2016, 6:37pm

On Mon, May 09, 2016 at 05:08:11PM +0200, Sandro Santilli wrote:

> We badly need the email confirmation thing, no doubt.
> Frank: do you think you could work on that?

For the record, as mentioned in the other thread, at least one
of the suspicious users (ct7316944) was confirmed to be a spammer,
so I disabled the user registration form again.

I can't tell for sure whether the registrant is a bot or a human, but
the captcha by itself is clearly not preventing spammers from
registering.

--strk;