[SAC] projectsVM: rsync reniced to 10

neteler · May 10, 2012, 4:52pm

Hi SAC,

FYI

since some openlayers rsync jobs almost killed the projVM, I have
generally reniced rsync to 10, i.e. modified the related value in
/etc/default/rsync
and restarted the daemon.

Now the machine is responsive again.
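
For readers who want to reproduce the change: a minimal sketch of the kind of setting meant here, assuming a Debian-style /etc/default/rsync in which the variable is called RSYNC_NICE. The variable name is an assumption and is not quoted in this thread.

```shell
# /etc/default/rsync (excerpt; Debian-style layout assumed)
# Nice level for the rsync daemon started by the init script.
RSYNC_NICE=10
```

On such a system the file is read by the init script for the rsync daemon, so the value would apply to the daemon only. rsync processes started from cron jobs would presumably keep their own priority unless they are wrapped in "nice" themselves.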

cheers
Markus

Martin_Spott · May 10, 2012, 5:56pm

On Thu, May 10, 2012 at 06:52:56PM +0200, Markus Neteler wrote:

since some openlayers rsync jobs almost killed the projVM, I have
generally reniced rsync to 10, i.e. modified the related value in
/etc/default/rsync

Rsync also knows about bandwidth limitation ("--bwlimit", as far as I
remember), according to my experience that's quite handy.
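
A minimal sketch of how the two ideas could be combined, assuming a plain directory-to-directory copy. The function name sync_limited and the limit of 2000 are purely illustrative and do not come from this thread.

```shell
#!/bin/sh
# Sketch: copy a directory tree at low CPU priority and capped throughput.
# sync_limited and the value 2000 are illustrative assumptions.
sync_limited() {
    src=$1
    dst=$2
    # nice lowers the CPU priority of the rsync process;
    # --bwlimit caps the transfer rate (units of 1024 bytes per second).
    nice -n 10 rsync -a --bwlimit=2000 "$src" "$dst"
}
```

It could be called as `sync_limited /path/to/source/ /path/to/target/`, where the trailing slash on the source tells rsync to copy the directory's contents instead of the directory itself.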

Cheers,
Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

neteler · May 14, 2012, 4:10pm

On Thu, May 10, 2012 at 6:52 PM, Markus Neteler <neteler@osgeo.org> wrote:

Hi SAC,

FYI

since some openlayers rsync jobs almost killed the projVM, I have
generally reniced rsync to 10, i.e. modified the related value in
/etc/default/rsync
and restarted the daemon.

The OpenLayers rsync jobs were again almost killing the
machine (about two hours ago). I would kindly invite
the OpenLayers people to review their cronjobs and
put "nice" in front of the command.

Thanks.

Markus

neteler · May 24, 2012, 5:29pm

Hi Tim,

according to
http://wiki.osgeo.org/wiki/Contacts#Software_Projects

you are the contact point for OpenLayers. Please subscribe
to the SAC list (or delegate) for important problems caused
by the OpenLayers project on the shared projectsVM server.

Each project using OSGeo infrastructure should have a person
reading this low traffic SAC list.

If an OpenLayers delegate is already subscribed here, please
ask him/her to read the emails and respond to requests.
Maybe you could help us to identify the person.

thanks
Markus

Tim_Schaub · May 25, 2012, 12:09am

Hi Markus and others,

Apologies for the hassle. I'm now subscribed to the SAC list. With
no real "SA" credentials to speak of, I welcome suggestions on how we
can manage resources on the shared machines without causing trouble
for others.

I've removed three uses of rsync in the script that updates the hosted
OpenLayers sites. I'm hoping this will alleviate future problems. I
see someone has added "nice" to our cron job.

# m h dom mon dow command
33 * * * * nice -n 18 /osgeo/openlayers/repos/openlayers/tools/update_dev_dir.sh

I'll briefly describe what our goals are below. Please let me know if
you can see better ways to accomplish this.

The openlayers.org website is based on content from two git
repositories. Everything under the /dev path is the result of
concatenating/minifying scripts and modifying examples from the main
library repository. Everything else comes from a separate repository.

Previously, the hosted resources were served out of svn checkouts -
with a checkout of the main library repo nested within the
checkout of the website repo (under the dev path). Because the lib
resources were modified in place, the checkouts were not always
updated reliably.

Recently I updated things to maintain clones of the two git repos, and
am using rsync to update copies of the website and library repos in
/osgeo/openlayers/sites/openlayers.org and
/osgeo/openlayers/sites/openlayers.org/dev respectively. The website
repo is updated very infrequently and the library repo is updated
frequently.

As mentioned above, I got rid of three uses of rsync (to update hosted
copies of the sandboxes, addins, and sphinx docs). Please suggest
alternatives if the way we're using it currently is still a problem
[1][2].

Thanks,
Tim

[1] https://github.com/openlayers/openlayers/blob/11084334653f1520c62aafc8cfe5896f133e929f/tools/update_dev_dir.sh#L15-16
[2] https://github.com/openlayers/openlayers/blob/11084334653f1520c62aafc8cfe5896f133e929f/tools/update_dev_dir.sh#L66-68

--
Tim Schaub
OpenGeo http://opengeo.org/
Expert service straight from the developers.

Martin_Spott · May 26, 2012, 8:21am

Hi Tim,

On Thu, May 24, 2012 at 06:09:57PM -0600, Tim Schaub wrote:

I've removed three uses of rsync in the script that updates the hosted
OpenLayers sites. I'm hoping this will alleviate future problems. I
see someone has added "nice" to our cron job.

# m h dom mon dow command
33 * * * * nice -n 18 /osgeo/openlayers/repos/openlayers/tools/update_dev_dir.sh

.... and set it to an hourly schedule instead of running the "rsync"
job every minute.
As far as I can tell, the issue we faced was triggered by "rsync" jobs
accumulating, because the individual jobs didn't finish within one
minute. Maybe the remote end was almost unavailable or at least very
slow. Thus, with every new "rsync" every minute, the remote end was
responding even slower than before, leading to a pile of pending
"rsync" jobs with no chance to recover.
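
One common way to keep such jobs from piling up is to let each run take a lock and skip the run if the previous one is still active. The following is a minimal sketch, assuming a POSIX shell. The lock path and the name run_exclusive are hypothetical and not part of the OpenLayers scripts.

```shell
#!/bin/sh
# Hypothetical lock location; any path writable by the cron user would do.
LOCKDIR="${TMPDIR:-/tmp}/update_dev_dir.lock"

run_exclusive() {
    # mkdir is atomic: it fails if the directory already exists,
    # i.e. if a previous run still holds the lock.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    # Run the job, then release the lock whether or not the job succeeded.
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return "$status"
}
```

A cron entry would then call something like `run_exclusive /path/to/job.sh`. Note that a job killed hard leaves a stale lock directory behind. Where it is available, flock(1) from util-linux avoids that problem, because the kernel releases the lock when the process exits.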

I'd say it's ok to run a "touch ${HOME}/.crond-running" every minute,
but running an "rsync" job which highly depends on network latency as
well as on the availability and performance of the remote end is rude -
especially in a case when the maintainer of the respective job doesn't
monitor the primary communication channel for the resources he's
using.

From my perspective I'd recommend to leave things in the current state
and to check early for upcoming trouble.

Cheers,
Martin.

Tim_Schaub · May 26, 2012, 11:03pm

Hi Martin,

On Sat, May 26, 2012 at 2:21 AM, Martin Spott <Martin.Spott@mgras.net> wrote:

Hi Tim,

On Thu, May 24, 2012 at 06:09:57PM -0600, Tim Schaub wrote:

I've removed three uses of rsync in the script that updates the hosted
OpenLayers sites. I'm hoping this will alleviate future problems. I
see someone has added "nice" to our cron job.

# m h dom mon dow command
33 * * * * nice -n 18 /osgeo/openlayers/repos/openlayers/tools/update_dev_dir.sh

.... and set it to an hourly schedule instead of running the "rsync"
job every minute.
As far as I can tell, the issue we faced was triggered by "rsync" jobs
accumulating, because the individual jobs didn't finish within one
minute. Maybe the remote end was almost unavailable or at least very
slow. Thus, with every new "rsync" every minute, the remote end was
responding even slower than before, leading to a pile of pending
"rsync" jobs with no chance to recover.

I'd say it's ok to run a "touch ${HOME}/.crond-running" every minute,
but running an "rsync" job which highly depends on network latency as
well as on the availability and performance of the remote end is rude -

Don't think we can roll back to see what the frequency of the cron job
really was before, but my recollection was that the job was running
every 15 minutes. At least this is how things were 6 months ago or
so. Can't say when this might have changed.

In addition, the rsync should only be happening when there are changes
[1]. So, less than once every fifteen minutes in general. And the
rsync is between local directories - so I'm not sure "highly depends
on network latency" is appropriate here. Though maybe you're
referring to the `git ls-remote ...`.
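
For illustration, a check of this kind can be sketched roughly as follows. This is not the content of update_dev_dir.sh; the helper names remote_head and needs_update and their arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: remote_head and needs_update are hypothetical helper names.

# Print the commit id that ref $2 points to in the remote repository $1.
remote_head() {
    git ls-remote "$1" "$2" | cut -f1
}

# Succeed (exit 0) only if the remote ref is known and differs from the
# HEAD of the local clone in directory $3.
needs_update() {
    url=$1
    ref=$2
    dir=$3
    remote=$(remote_head "$url" "$ref")
    # An empty answer means the remote was unreachable or the ref is missing;
    # treat that as "nothing to do" instead of triggering a sync.
    [ -n "$remote" ] || return 1
    [ "$remote" != "$(git -C "$dir" rev-parse HEAD)" ]
}
```

With a guard like this, the only step that touches the network is `git ls-remote`. The pull and the local rsync would run only when `needs_update` succeeds.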

Again, I appreciate the feedback. Please know that in setting it up,
I didn't think that synchronizing local directories less than every
fifteen minutes would be considered rude. It sounds like one way or
another, it ended up behaving differently than it was intended.

Thanks again for any suggestions on making additional changes to the
scripts that maintain the OpenLayers website.

Tim

[1] https://github.com/openlayers/openlayers/blob/master/tools/update_dev_dir.sh#L5-9

especially in a case when the maintainer of the respective job doesn't
monitor the primary communication channel for the resources he's
using.

From my perspective I'd recommend to leave things in the current state
and to check early for upcoming trouble.
