[GRASS5] grass.itc.it: up again - IP traffic restrictions

The main site 'grass.itc.it' is up again - the provider
has resolved the network problems.

Some changes:

As internet traffic was consuming nearly all the bandwidth of
our institute, we have activated "apache-throttle" to keep
the maximum traffic below a certain limit over a given period.

If you get a message:
  "503 - Service temporarily not available"
the server is overloaded. Please use a mirror site close to you.
The "throttle" parameters will probably have to be tuned in a few days.

The current policy is:
- allow 1 Mbit/4sec in general, otherwise "sleep" two seconds
- allow 500Kbit/2sec per IP, otherwise "sleep" two seconds
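The policy above can be modelled as two fixed-window bit counters, one global and one per client IP. This is a minimal Python sketch under stated assumptions, not how apache-throttle itself works: it assumes 1 Mbit = 1,000,000 bits and non-overlapping windows, and it simply delays any request that would exceed either budget by two seconds.

```python
from typing import Dict, Optional

# Assumptions: 1 Mbit = 1,000,000 bits; fixed, non-overlapping windows;
# a request that would exceed any budget is delayed by SLEEP_S seconds.
# This models the policy only; it is not apache-throttle's own algorithm.
SLEEP_S = 2.0


class WindowLimit:
    """Allow at most `limit_bits` within each window of `window_s` seconds."""

    def __init__(self, limit_bits: int, window_s: float) -> None:
        self.limit_bits = limit_bits
        self.window_s = window_s
        self.window_start: Optional[float] = None
        self.used_bits = 0

    def fits(self, now: float, bits: int) -> bool:
        # Start a fresh window on first use or once the current one expired.
        if self.window_start is None or now - self.window_start >= self.window_s:
            self.window_start = now
            self.used_bits = 0
        return self.used_bits + bits <= self.limit_bits

    def consume(self, bits: int) -> None:
        self.used_bits += bits


class Throttle:
    """One global budget plus a separate budget for each client IP."""

    def __init__(self, global_bits: int, global_window_s: float,
                 per_ip_bits: int, per_ip_window_s: float) -> None:
        self.global_limit = WindowLimit(global_bits, global_window_s)
        self.per_ip_bits = per_ip_bits
        self.per_ip_window_s = per_ip_window_s
        self.per_ip: Dict[str, WindowLimit] = {}

    def delay_for(self, now: float, ip: str, bits: int) -> float:
        """Return 0.0 if `bits` may be sent now, otherwise the sleep time."""
        ip_limit = self.per_ip.setdefault(
            ip, WindowLimit(self.per_ip_bits, self.per_ip_window_s))
        # Evaluate both budgets before charging either, so a delayed
        # request does not use up any allowance.
        ip_ok = ip_limit.fits(now, bits)
        global_ok = self.global_limit.fits(now, bits)
        if not (ip_ok and global_ok):
            return SLEEP_S
        ip_limit.consume(bits)
        self.global_limit.consume(bits)
        return 0.0


# Policy as announced: 1 Mbit per 4 s overall, 500 Kbit per 2 s per IP.
policy = Throttle(1_000_000, 4.0, 500_000, 2.0)
```

For example, `policy.delay_for(0.0, "10.0.0.1", 400_000)` returns `0.0`, while a further 200 Kbit from the same address half a second later returns `2.0`, because that client would pass 500 Kbit within its 2-second window.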

We have to see how to treat short peaks.

Mirroring with rsync - changes:

The mirror sites will also be reorganized to use cascaded mirroring:
rsync connections to grass.itc.it will only be allowed for a few
mirror sites, which will in turn provide rsync access to
other mirrors and third-party connections.
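On the rsync side, restricting the main site to a few tier-1 mirrors could look roughly like the following rsyncd.conf module. This is only a sketch: the module name `grass`, the path and the mirror host names are hypothetical placeholders, not the actual ITC-irst configuration.

```ini
# Hypothetical rsyncd.conf module on grass.itc.it.
# Only the listed tier-1 mirrors may connect; all other hosts are refused.
[grass]
    # placeholder path to the mirrored web tree
    path = /var/www/grass
    comment = GRASS web site (tier-1 mirrors only)
    read only = yes
    hosts allow = mirror1.example.org mirror2.example.org mirror3.example.org
    hosts deny = *
```

A tier-1 mirror would then pull with something like `rsync -av --delete grass.itc.it::grass /var/www/grass/` and offer its own copy to further mirrors and third parties in the same way.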

All changes are needed to reduce the network load here at ITC-irst.

Sorry for any inconvenience,

Markus Neteler

Markus,
thanks to you and ITC-irst
for the continuing efforts to host GRASS.

  Bernhard

On Thu, Jun 05, 2003 at 12:28:39PM +0200, Bernhard Reiter wrote:

> Markus,
> thanks to you and ITC-irst
> for the continuing efforts to host GRASS.

Cheers :-)
It's also interesting to learn about all this throttling etc.

On Wed, Jun 04, 2003 at 11:32:35AM +0200, Markus Neteler wrote:

[...]

> If you get a message:
> "503 - Service temporarily not available"
> the server is overloaded. Please use a mirror site close to you.

The mirrors page is now included. Soon we will have three
or four tier-1 mirror sites which read from grass.itc.it and
themselves provide rsync access.

> The "throttle" parameters will probably have to be tuned in a few days.
>
> The current policy is:
> - allow 1 Mbit/4sec in general, otherwise "sleep" two seconds

now: 2Mbit/4sec

> - allow 500Kbit/2sec per IP, otherwise "sleep" two seconds

left unchanged
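Expressed as average rates (assuming 1 Mbit = 1000 Kbit), the old and new settings compare as follows:

```latex
\begin{aligned}
\text{global, old:}\quad & \frac{1\ \text{Mbit}}{4\ \text{s}} = 250\ \text{Kbit/s} \\
\text{global, new:}\quad & \frac{2\ \text{Mbit}}{4\ \text{s}} = 500\ \text{Kbit/s} \\
\text{per IP:}\quad & \frac{500\ \text{Kbit}}{2\ \text{s}} = 250\ \text{Kbit/s}
\end{aligned}
```

So under the old setting a single client running at its per-IP ceiling could use up the whole global budget; after the change it takes two such clients.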

> We have to see how to treat short peaks.

So far more than 2000 people have seen "503-overload" during the last
24 hours. They may enjoy the mirror sites.

Note:
Currently the search engine doesn't work any more due to the new
password protection of the archives. I'll have to look into this.

Markus

On Thu, 5 Jun 2003, Markus Neteler wrote:

> Note:
> Currently the search engine doesn't work any more due to the new
> password protection of the archives. I'll have to look into this.

It is very useful the way all the lists are currently indexed on Google.
It would be a great pity to lose this (presumably the Google indexing
robot can't access the archive either now). But it may be OK as it is now
on gmane.org. However their archive doesn't go back as far.

I notice the link to the old developers' mailing list archive seems to
have been removed from the website (although it never worked for me). Has
this information just been lost forever? Was there much interesting /
useful reading on it?

Paul

On Fri, Jun 06, 2003 at 11:51:15AM +0100, Paul Kelly wrote:

> On Thu, 5 Jun 2003, Markus Neteler wrote:
>
> > Note:
> > Currently the search engine doesn't work any more due to the new
> > password protection of the archives. I'll have to look into this.
>
> It is very useful the way all the lists are currently indexed on Google.
> It would be a great pity to lose this (presumably the Google indexing
> robot can't access the archive either now).

In fact, I have actively removed the mailing list from the Google index
for spam protection reasons.
Meanwhile, htdig on grass.itc.it seems to work again (but only
for the HTML pages; I can't get the virtual directory /pipermail/
indexed, although I have generated a /pipermail/index.html).

Well, we have to agree on the archive policy:

a) keep external search engines out of our archives (current implementation)
b) allow Google et al. to index the archives (meanwhile, email addresses
   are no longer written as foo@bar.baf but as foo at bar.baf)
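The address rewriting mentioned in b) boils down to a plain text substitution. A rough Python sketch; the regular expression is a simplification that covers ordinary user@host.domain addresses (not the full RFC 5322 syntax), and this is not Mailman's own code:

```python
import re

# Simplified address pattern: local part, "@", dotted domain ending in a TLD.
ADDR_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def scramble(text: str) -> str:
    """Rewrite every foo@bar.baf in `text` as 'foo at bar.baf'."""
    return ADDR_RE.sub(r"\1 at \2", text)


print(scramble("Write to foo@bar.baf for details."))
# prints: Write to foo at bar.baf for details.
```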

> But it may be OK as it is now
> on gmane.org. However their archive doesn't go back as far.

The archive is completely accessible. It is just that htdig is hard to
configure to read the Mailman archives.

> I notice the link to the old developers' mailing list archive seems to
> have been removed from the website (although it never worked for me). Has
> this information just been lost forever? Was there much interesting /
> useful reading on it?

The (small) archive was on a Sun machine in Hannover, and they switched off
the machine before I could copy the files over to Italy. Unfortunately it
looks to be lost. Does anyone have a copy of the 'grass5' archives from
before 4/2001?

Markus

On Fri, Jun 06, 2003 at 08:33:41PM +0200, Markus Neteler wrote:

> On Fri, Jun 06, 2003 at 11:51:15AM +0100, Paul Kelly wrote:
> >
> > On Thu, 5 Jun 2003, Markus Neteler wrote:
> >
> > > Note:
> > > Currently the search engine doesn't work any more due to the new
> > > password protection of the archives. I'll have to look into this.
> >
> > It is very useful the way all the lists are currently indexed on Google.
> > It would be a great pity to lose this (presumably the Google indexing
> > robot can't access the archive either now).
>
> In fact, I have actively removed the mailing list from the Google index
> for spam protection reasons.
> Meanwhile, htdig on grass.itc.it seems to work again (but only
> for the HTML pages; I can't get the virtual directory /pipermail/
> indexed, although I have generated a /pipermail/index.html).
>
> Well, we have to agree on the archive policy:
>
> a) keep external search engines out of our archives (current implementation)

Not a real option.
This is among the biggest resources of the internet.

> b) allow Google et al. to index the archives (meanwhile, email addresses
>    are no longer written as foo@bar.baf but as foo at bar.baf)

On Tue, 10 Jun 2003, Bernhard Reiter wrote:

> On Fri, Jun 06, 2003 at 08:33:41PM +0200, Markus Neteler wrote:
> > On Fri, Jun 06, 2003 at 11:51:15AM +0100, Paul Kelly wrote:
> > >
> > > On Thu, 5 Jun 2003, Markus Neteler wrote:
> > >
> > > > Note:
> > > > Currently the search engine doesn't work any more due to the new
> > > > password protection of the archives. I'll have to look into this.
> > >
> > > It is very useful the way all the lists are currently indexed on Google.
> > > It would be a great pity to lose this (presumably the Google indexing
> > > robot can't access the archive either now).
> >
> > In fact, I have actively removed the mailing list from the Google index
> > for spam protection reasons.
> > Meanwhile, htdig on grass.itc.it seems to work again (but only
> > for the HTML pages; I can't get the virtual directory /pipermail/
> > indexed, although I have generated a /pipermail/index.html).
> >
> > Well, we have to agree on the archive policy:
> >
> > a) keep external search engines out of our archives (current implementation)
>
> Not a real option.
> This is among the biggest resources of the internet.

Please, Markus, can you remove the password protection from the developers'
mailing list so I can search it on Google again? As you said above, the
search at http://grass.itc.it/searchgrass.html doesn't work, and I can
confirm this; I can't get any results out of it at all. And recently it
seems to have been deleted from Google's cache, so I can't access it at
all.

There is so much stuff that is only documented in the mailing list, and I
think it is really vital that searching it works properly.

I noticed the original reason this thread started: someone complained
about all the GIS cafe spam e-mails that were arriving. This seemed to
stop after the complaints, so somebody must have been reading this list.

I wonder, does no one else have an opinion on this?

Paul

On Sun, Jun 29, 2003 at 07:23:56PM +0100, Paul Kelly wrote:

> On Tue, 10 Jun 2003, Bernhard Reiter wrote:
>
> > On Fri, Jun 06, 2003 at 08:33:41PM +0200, Markus Neteler wrote:
> > > On Fri, Jun 06, 2003 at 11:51:15AM +0100, Paul Kelly wrote:
> > > >
> > > > On Thu, 5 Jun 2003, Markus Neteler wrote:
> > > >
> > > > > Note:
> > > > > Currently the search engine doesn't work any more due to the new
> > > > > password protection of the archives. I'll have to look into this.
> > > >
> > > > It is very useful the way all the lists are currently indexed on Google.
> > > > It would be a great pity to lose this (presumably the Google indexing
> > > > robot can't access the archive either now).
> > >
> > > In fact, I have actively removed the mailing list from the Google index
> > > for spam protection reasons.
> > > Meanwhile, htdig on grass.itc.it seems to work again (but only
> > > for the HTML pages; I can't get the virtual directory /pipermail/
> > > indexed, although I have generated a /pipermail/index.html).
> > >
> > > Well, we have to agree on the archive policy:
> > >
> > > a) keep external search engines out of our archives (current implementation)
> >
> > Not a real option.
> > This is among the biggest resources of the internet.
>
> Please, Markus, can you remove the password protection from the developers'
> mailing list so I can search it on Google again? As you said above, the
> search at http://grass.itc.it/searchgrass.html doesn't work, and I can
> confirm this; I can't get any results out of it at all. And recently it
> seems to have been deleted from Google's cache, so I can't access it at
> all.

Another option (which I would prefer) is to get
http://grass.itc.it/searchgrass.html
a.k.a. "htdig" running so that it also reads the pipermail/ archives.

> There is so much stuff that is only documented in the mailing list, and I
> think it is really vital that searching it works properly.

Definitely.

> I noticed the original reason this thread started: someone complained
> about all the GIS cafe spam e-mails that were arriving. This seemed to
> stop after the complaints, so somebody must have been reading this list.

I receive tons of spam every day. The person who complained wrote to me
and to "weblist": he could demonstrate that his new email address had
been found by Google only in the GFZ/Potsdam user list archive,
and that he had received spam some days later. This at least indicates
that spam harvesters use Google.

> I wonder, does no one else have an opinion on this?

Surprisingly not (yet).

Is there an "htdig" expert who can give me a hand? I have already spent
a lot of time on it, but it refuses to index local directories (to bypass
the ".htaccess" password protection of the archives).
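One way ht://Dig can be told to read pages from disk rather than over HTTP is its `local_urls` attribute, which maps a URL prefix to a directory. A hedged sketch of an htdig.conf excerpt; the archive path is a placeholder (Mailman installations differ), and whether it resolves the problem described here is untested:

```conf
# Hypothetical htdig.conf excerpt.
start_url:          http://grass.itc.it/pipermail/
limit_urls_to:      http://grass.itc.it/
# Fetch pipermail pages straight from the filesystem (placeholder path),
# so the web server's .htaccess check is not involved while indexing.
local_urls:         http://grass.itc.it/pipermail/=/var/lib/mailman/archives/public/
# File to use when a URL names a directory, e.g. .../pipermail/
local_default_doc:  index.html
```

Search results would still link to the normal http:// URLs, so readers following them would still meet the password prompt.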

I also want the search engine back.

Markus

On Mon, Jun 30, 2003 at 09:08:27AM +0200, Markus Neteler wrote:

> On Sun, Jun 29, 2003 at 07:23:56PM +0100, Paul Kelly wrote:
> > On Tue, 10 Jun 2003, Bernhard Reiter wrote:
> >
> > > On Fri, Jun 06, 2003 at 08:33:41PM +0200, Markus Neteler wrote:
> > > > On Fri, Jun 06, 2003 at 11:51:15AM +0100, Paul Kelly wrote:
> > > > >
> > > > > On Thu, 5 Jun 2003, Markus Neteler wrote:
> > > > >
> > > > > > Note:
> > > > > > Currently the search engine doesn't work any more due to the new
> > > > > > password protection of the archives. I'll have to look into this.

Finally I got it working. Now http://grass.itc.it/searchgrass.html
is able to search the mailing lists, but you have to enter the public
password (once per session) when clicking on a mailing list message.
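The password prompt described here is what Apache's HTTP Basic authentication produces; browsers typically keep the credentials until they are closed, hence the once-per-session behaviour. A minimal .htaccess for the archive directory might look like this (the realm text and password-file path are hypothetical, and the server must permit `AllowOverride AuthConfig` for that directory):

```apache
# Hypothetical .htaccess in the pipermail archive directory.
AuthType Basic
AuthName "GRASS mailing list archives"
# Placeholder path; the file is created with the htpasswd utility.
AuthUserFile /etc/apache/grass-archive.htpasswd
Require valid-user
```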

I hope that this is an acceptable solution for all.

Markus

Hello Markus

On Mon, 30 Jun 2003, Markus Neteler wrote:

> Finally I got it working. Now http://grass.itc.it/searchgrass.html
> is able to search the mailing lists, but you have to enter the public
> password (once per session) when clicking on a mailing list message.
>
> I hope that this is an acceptable solution for all.

Yes, it works and it does the job quite well. However, I suppose we can't
use it when the webserver is overloaded? The mailing lists aren't included
on the mirror sites.

It will work for now but I think as Bernhard says we can't really keep the
mailing list archives secret from the rest of the internet. It means people
are less likely to come across GRASS while searching for the solution to a
problem.

Paul

On Mon, Jun 30, 2003 at 11:11:03AM +0100, Paul Kelly wrote:

> Hello Markus
>
> On Mon, 30 Jun 2003, Markus Neteler wrote:
>
> > Finally I got it working. Now http://grass.itc.it/searchgrass.html
> > is able to search the mailing lists, but you have to enter the public
> > password (once per session) when clicking on a mailing list message.
> >
> > I hope that this is an acceptable solution for all.
>
> Yes, it works and it does the job quite well. However, I suppose we can't
> use it when the webserver is overloaded?

That's true, but usually only for a few seconds up to a minute.

> The mailing lists aren't included on the mirror sites.

Right. But the mirror sites would also need a working search engine,
otherwise a copy wouldn't help.

> It will work for now but I think as Bernhard says we can't really keep the
> mailing list archives secret from the rest of the internet. It means people
> are less likely to come across GRASS while searching for the solution to a
> problem.

I almost agree, but on the other hand privacy (email addresses) must be
somewhat protected. Mailman isn't very clever at scrambling the email
addresses (or I don't know how to do that; hints are welcome).

Markus

On Mon, Jun 30, 2003 at 12:24:32PM +0200, Markus Neteler wrote:

> On Mon, Jun 30, 2003 at 11:11:03AM +0100, Paul Kelly wrote:
>
> > It will work for now but I think as Bernhard says we can't really keep the
> > mailing list archives secret from the rest of the internet. It means people
> > are less likely to come across GRASS while searching for the solution to a
> > problem.
>
> I almost agree, but on the other hand privacy (email addresses) must be
> somewhat protected. Mailman isn't very clever at scrambling the email
> addresses (or I don't know how to do that; hints are welcome).

Address scrambling does not help a lot in my opinion.
Besides, there are many places where my email addresses are published unscrambled.
A good (local) spam filter gives some protection,
but in the long run the internet community has to come up
with better solutions, which is outside the focus of this group.
You might want to support efforts like http://www.cauce.org/.

Bottom line: scrambling email addresses or protecting webpages
behind authentication does more damage than good, as far as I can see.

On Mon, Jun 30, 2003 at 04:50:35PM +0200, Bernhard Reiter wrote:

> On Mon, Jun 30, 2003 at 12:24:32PM +0200, Markus Neteler wrote:
> > On Mon, Jun 30, 2003 at 11:11:03AM +0100, Paul Kelly wrote:
>
> > > It will work for now but I think as Bernhard says we can't really keep the
> > > mailing list archives secret from the rest of the internet. It means people
> > > are less likely to come across GRASS while searching for the solution to a
> > > problem.
> >
> > I almost agree, but on the other hand privacy (email addresses) must be
> > somewhat protected. Mailman isn't very clever at scrambling the email
> > addresses (or I don't know how to do that; hints are welcome).
>
> Address scrambling does not help a lot in my opinion.
> Besides, there are many places where my email addresses are published unscrambled.

... e.g. in the FreeGIS mailing list archive.

> A good (local) spam filter gives some protection,

If you are able to set it up. If you are not root or otherwise able
to implement it, you face a problem.

The effort to block the archive from spiders is much less than
hundreds of people installing spam filters.

> but in the long run the internet community has to come up
> with better solutions, which is outside the focus of this group.
> You might want to support efforts like http://www.cauce.org/.
>
> Bottom line: scrambling email addresses or protecting webpages
> behind authentication does more damage than good, as far as I can see.

Why does authentication do more damage than good?

Markus

Bernhard Reiter wrote:

> On Mon, Jun 30, 2003 at 12:24:32PM +0200, Markus Neteler wrote:
> > On Mon, Jun 30, 2003 at 11:11:03AM +0100, Paul Kelly wrote:
>
> > > It will work for now but I think as Bernhard says we can't really keep the
> > > mailing list archives secret from the rest of the internet. It means people
> > > are less likely to come across GRASS while searching for the solution to a
> > > problem.
> >
> > I almost agree, but on the other hand privacy (email addresses) must be
> > somewhat protected. Mailman isn't very clever at scrambling the email
> > addresses (or I don't know how to do that; hints are welcome).
>
> Address scrambling does not help a lot in my opinion.
> Besides, there are many places where my email addresses are published unscrambled.
> A good (local) spam filter gives some protection,
> but in the long run the internet community has to come up
> with better solutions, which is outside the focus of this group.
> You might want to support efforts like http://www.cauce.org/.
>
> Bottom line: scrambling email addresses or protecting webpages
> behind authentication does more damage than good, as far as I can see.

I agree with Bernhard on this - I have my ncsu.edu email published in many
places on the web and I get little spam there (and in fact until recently
I did not get any at all). On the other hand, my hotmail address is not
published anywhere but I use it for on-line purchases and I get tons of
spam there every day. I get no spam at all on care2.com. So at least for me
it depends more on the filters and how I use the email than on publishing
the email address.

Anyway, it would be nice to keep the GRASS mailing list archive open.

Helena

On Mon, Jun 30, 2003 at 12:02:24PM -0400, Helena Mitasova wrote:
[...]

> I agree with Bernhard on this - I have my ncsu.edu email published in many
> places on the web and I get little spam there (and in fact until recently
> I did not get any at all). On the other hand, my hotmail address is not
> published anywhere but I use it for on-line purchases and I get tons of
> spam there every day. I get no spam at all on care2.com. So at least for me
> it depends more on the filters and how I use the email than on publishing
> the email address.

Well, I get more than 50 spam messages per day, not really exciting.

> Anyway, it would be nice to keep the GRASS mailing list archive open.

The GRASS mailing list archive *is* open.
But it is not open to automated email harvesters.

Markus

On Mon, Jun 30, 2003 at 06:01:57PM +0200, Markus Neteler wrote:

> On Mon, Jun 30, 2003 at 04:50:35PM +0200, Bernhard Reiter wrote:
> > On Mon, Jun 30, 2003 at 12:24:32PM +0200, Markus Neteler wrote:
> > > On Mon, Jun 30, 2003 at 11:11:03AM +0100, Paul Kelly wrote:
> >
> > > > It will work for now but I think as Bernhard says we can't really keep the
> > > > mailing list archives secret from the rest of the internet. It means people
> > > > are less likely to come across GRASS while searching for the solution to a
> > > > problem.
> > >
> > > I almost agree, but on the other hand privacy (email addresses) must be
> > > somewhat protected. Mailman isn't very clever at scrambling the email
> > > addresses (or I don't know how to do that; hints are welcome).
> >
> > Address scrambling does not help a lot in my opinion.
> > Besides, there are many places where my email addresses are published unscrambled.

> ... e.g. in the FreeGIS mailing list archive.

Yes.

> > A good (local) spam filter gives some protection,

> If you are able to set it up. If you are not root or otherwise able
> to implement it, you face a problem.

Unfortunately the only helpful suggestion here is to move
to an email provider which takes care of the account
and provides reasonable spam filters. Everybody can do that,
and it is necessary with or without the GRASS list being behind
authentication.

> The effort to block the archive from spiders is much less than
> hundreds of people installing spam filters.

> Why does authentication do more damage than good?

As the situation of those hundreds of people regarding spam does not
change much at all, the effort to block the spiders is extra work.
And it does damage, because search engines cannot easily index the archive.

On Mon, 30 Jun 2003, Helena Mitasova wrote:

> Bernhard Reiter wrote:
> >
> > On Mon, Jun 30, 2003 at 12:24:32PM +0200, Markus Neteler wrote:
> > > On Mon, Jun 30, 2003 at 11:11:03AM +0100, Paul Kelly wrote:
> >
> > > > It will work for now but I think as Bernhard says we can't really keep the
> > > > mailing list archives secret from the rest of the internet. It means people
> > > > are less likely to come across GRASS while searching for the solution to a
> > > > problem.
> > >
> > > I almost agree, but on the other hand privacy (email addresses) must be
> > > somewhat protected. Mailman isn't very clever at scrambling the email
> > > addresses (or I don't know how to do that; hints are welcome).
> >
> > Address scrambling does not help a lot in my opinion.
> > Besides, there are many places where my email addresses are published unscrambled.
> > A good (local) spam filter gives some protection,
> > but in the long run the internet community has to come up
> > with better solutions, which is outside the focus of this group.
> > You might want to support efforts like http://www.cauce.org/.
> >
> > Bottom line: scrambling email addresses or protecting webpages
> > behind authentication does more damage than good, as far as I can see.

> I agree with Bernhard on this - I have my ncsu.edu email published in many
> places on the web and I get little spam there (and in fact until recently
> I did not get any at all). On the other hand, my hotmail address is not
> published anywhere but I use it for on-line purchases and I get tons of
> spam there every day. I get no spam at all on care2.com. So at least for me
> it depends more on the filters and how I use the email than on publishing
> the email address.

I also agree with Bernhard: as things are, spam filtering is unfortunately
necessary for everyone everywhere, and email address mangling doesn't seem
to be the issue. I have the same experience as Helena: a work address
published freely and not much spam (about 50% spam before a filter was
implemented on the mail server), and a home address from a commercial ISP
with 50 spam (or more) to one real message, although this address is not
published anywhere. I use SpamAssassin and recommend it or similar tools to others.

There are in fact, as Bernhard says, two issues: 1) revealing email
addresses in an archive: I believe these can be mangled a bit by substituting
" at " for "@", but you don't have to be bright to see that this doesn't
have much life left, like other mangling schemes; 2) the important
indexing and searchability issue, which is one of the effective ways of
attracting attention. Searching on Google for "regularized splines" should
lead to GRASS, and does.
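The point that " at " mangling has little life left is easy to illustrate: one regular expression turns the mangled form back into a usable address. A rough Python sketch; the pattern is a simplification and also produces false positives such as turning "look at example.com" into an address, which a harvester would hardly mind:

```python
import re

# Undo the " at " mangling: "foo at bar.baf" -> "foo@bar.baf".
# Simplified pattern; it also "recovers" phrases like "look at example.com".
MANGLED_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+) at ([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b"
)


def unmangle(text: str) -> str:
    """Turn every 'user at host.domain' in `text` back into user@host.domain."""
    return MANGLED_RE.sub(r"\1@\2", text)


print(unmangle("please write to foo at bar.baf"))
# prints: please write to foo@bar.baf
```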

> Anyway, it would be nice to keep the GRASS mailing list archive open.

Including open for crawlers: we benefit from other projects' archives
being externally indexed (most of my answers to compiling bits of
libes/gis under MinGW came from various list archives), so we ought to at
least consider ourselves a resource also for people who aren't in GIS but
are facing similar development questions.

But it boils down to running it, and if it can't be run openly within the
constraints that apply, then it can't.

Roger

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand@nhh.no