[SAC] Fair projectsVM usage (esp. Openlayers)

neteler · February 12, 2012, 8:26am

I would like to remind folks to set cron jobs "nice"ly, otherwise
the performance goes down (read: unresponsive):

top - 00:21:37 up 10 days, 3:53, 1 user, load average: 42.23, 36.57, 31.99 <<---!!
Tasks: 287 total, 2 running, 285 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.7%us, 1.3%sy, 22.1%ni, 18.2%id, 45.9%wa, 0.0%hi, 9.8%si, 0.0%st
Mem: 8198148k total, 8034800k used, 163348k free, 976752k buffers
Swap: 4096564k total, 32384k used, 4064180k free, 5006152k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23588 openlaye 39 0 144m 18m 1640 R 100 0.2 2044:46 perl
^^^^^^--!!

See the load average.

Solution: Put the word "nice" in front of your jobs to run them with nice
level 10 rather than 0.
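
For illustration, a minimal sketch of what prefixing a job with "nice" does. The crontab line in the comment uses a hypothetical script path, and the commands assume GNU coreutils, where "nice" without a command prints the current niceness and the default adjustment is 10.

```shell
# Hypothetical crontab entry (the script path is an assumption):
#   0 * * * *  nice /path/to/hourly_job.sh

# With no command, nice prints the niceness of the current shell.
base=$(nice)

# Run through nice, a command gets the default adjustment of +10
# (capped at 19), so the inner nice reports base + 10.
niced=$(nice nice)

echo "niceness: $base without nice, $niced with nice"
```

A larger adjustment can be requested explicitly, for example "nice -n 19" for the lowest priority.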

Yes, please do recheck your jobs. Thanks.

Markus

Hamish · February 12, 2012, 8:35am

Markus wrote:

I would like to remind folks to set
cron jobs "nice"ly, otherwise
the performance goes down (read: unresponsive):

top - 00:21:37 up 10 days, 3:53, 1 user,
load average: 42.23, 36.57, 31.99 <<---!!

fwiw I'm pretty sure the crazy load averages are due to bacula.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23588 openlaye 39 0 144m 18m 1640 R 100 0.2 2044:46 perl
^^^^^^--!!

See the load average.

Solution: Put the word "nice" in front of your jobs to run
them with nice level 10 rather than 0.

Yes, please do recheck your jobs. Thanks.

perl has been running at 100% for a couple days...

Hamish

neteler · February 12, 2012, 9:53am

On Sun, Feb 12, 2012 at 9:35 AM, Hamish <hamish_b@yahoo.com> wrote:

Markus wrote:

I would like to remind folks to set
cron jobs "nice"ly, otherwise
the performance goes down (read: unresponsive):

top - 00:21:37 up 10 days, 3:53, 1 user,
load average: 42.23, 36.57, 31.99 <<---!!

fwiw I'm pretty sure the crazy load averages are due to bacula.

I'm not so sure: it happened several times, and each time I checked
it was an openlayers job (sometimes mapserver), and renicing it
manually brought the load average down. The same happened today.
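
Renicing a job that is already running might look like the sketch below. A background sleep stands in for the runaway process; the stand-in and the target value of 10 are illustrative only, not taken from the server.

```shell
# Start a stand-in for the runaway job (sleep is used only for illustration).
sleep 30 &
pid=$!

# Raise its niceness to 10, which lowers its scheduling priority.
# util-linux renice treats -n as an absolute value, POSIX as an increment;
# for a process that starts at niceness 0 both give 10.
renice -n 10 -p "$pid" >/dev/null

# Read back the nice value (the NI column in top).
ni=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "PID $pid now runs at niceness $ni"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

An unprivileged user can raise the niceness of their own processes this way, but lowering it again generally requires root.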

perl has been running at 100% for a couple days...

All the more reason to "nice" it.
Remember: shared resources should be used properly.

Markus

christopher.schmidt · February 12, 2012, 12:22pm

On Feb 12, 2012, at 3:35 AM, ext Hamish wrote:

Markus wrote:

I would like to remind folks to set
cron jobs "nice"ly, otherwise
the performance goes down (read: unresponsive):

top - 00:21:37 up 10 days, 3:53, 1 user,
load average: 42.23, 36.57, 31.99 <<---!!

fwiw I'm pretty sure the crazy load averages are due to bacula.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23588 openlaye 39 0 144m 18m 1640 R 100 0.2 2044:46 perl
^^^^^^--!!

See the load average.

Solution: Put the word "nice" in front of your jobs to run
them with nice level 10 rather than 0.

Yes, please do recheck your jobs. Thanks.

perl has been running at 100% for a couple days...

Looks like naturaldocs got stuck in a loop. That script is
part of a cron job that runs every hour, so killing it if
it ever gets in that state is a perfectly reasonable response.
(I've gone ahead and done so now.)
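
Another way to make sure a stuck run gets killed, not proposed in this thread, is to bound the job's runtime with GNU coreutils "timeout". A minimal sketch, with a short sleep standing in for the job; the crontab line and its script path are hypothetical.

```shell
# Hypothetical crontab entry: give the hourly job at most 55 minutes.
#   0 * * * *  timeout 55m nice /path/to/update_dev_dir.sh

# Demonstration with a stand-in job: sleep would run for 5 seconds,
# but timeout sends it SIGTERM after 1 second.
rc=0
timeout 1 sleep 5 || rc=$?

# GNU timeout exits with status 124 when the time limit was hit.
if [ "$rc" -eq 124 ]; then
    echo "job exceeded its time limit and was terminated"
fi
```

With a limit shorter than the cron interval, a looping run is gone before the next one starts.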

-- Chris

Hamish
_______________________________________________
Sac mailing list
Sac@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/sac

Hamish · February 12, 2012, 12:30pm

christopher wrote:

Looks like naturaldocs got stuck in a loop. That script is
part of a cron job that runs every hour, so killing it if
it ever gets in that state is a perfectly reasonable
response.
(I've gone ahead and done so now.)

suggest to have the hourly cron job begin by 'pgrep -c'ing for itself
and pkill'ing the old one if it has not completed. &/or spamming the
job's owner when that happens. (will the new one just suffer the same
loop-fate as the previous?)

(& run it niced too)

Hamish

christopher.schmidt · February 12, 2012, 12:36pm

On Feb 12, 2012, at 7:30 AM, ext Hamish wrote:

christopher wrote:

Looks like naturaldocs got stuck in a loop. That script is
part of a cron job that runs every hour, so killing it if
it ever gets in that state is a perfectly reasonable
response.
(I've gone ahead and done so now.)

suggest to have the hourly cron job begin by 'pgrep -c'ing for itself
and pkill'ing the old one if it has not completed. &/or spamming the
job's owner when that happens. (will the new one just suffer the same
loop-fate as the previous?)

Since the job has been running every hour since that one started
without failing, it seems unlikely, otherwise we'd have a whole
stack of them. I've seen it fail in this way precisely twice in
the two years since we moved to the projects server; to be honest,
any software work at this point is probably overkill, but:

https://github.com/openlayers/openlayers/blob/master/tools/update_dev_dir.sh

patches welcome

-- Chris

(& run it niced too)

Hamish

Hamish · February 12, 2012, 12:58pm

christopher wrote:

Since the job has been running every hour since that one
started without failing, it seems unlikely, otherwise we'd have a
whole stack of them. I've seen it fail in this way precisely twice
in the two years since we moved to the projects server; to be
honest, any software work at this point is probably overkill, but:
https://github.com/openlayers/openlayers/blob/master/tools/update_dev_dir.sh

patches welcome

I'm all about the overkill...

.
.
.

####

user="`whoami`"
command="update_dev_dir.sh"
email="you@example.com"
subj="$command running overtime on `hostname`"

NUM_ALREADY=`pgrep -c -U $user -f "$command"`
if [ "$NUM_ALREADY" -gt 0 ] ; then
   pkill -U $user -f "$command"
   echo -e "$command on `hostname` went bad, `date`.\n" | \
      mail -s "$subj" "$email"
fi

####

renice +19 -p $$

.
.
.

(untested)

regards,
Hamish

Martin_Spott · February 12, 2012, 12:59pm

On Sun, Feb 12, 2012 at 12:35:41AM -0800, Hamish wrote:

Markus wrote:

> top - 00:21:37 up 10 days, 3:53, 1 user,
> load average: 42.23, 36.57, 31.99 <<---!!

fwiw I'm pretty sure the crazy load averages are due to bacula.

Why ?

Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

Hamish · February 12, 2012, 1:04pm

> Markus wrote:

> > top - 00:21:37 up 10 days, 3:53, 1 user,
> > load average: 42.23, 36.57, 31.99 <<---!!

Hamish:

> fwiw I'm pretty sure the crazy load averages are due to bacula.

Martin:

Why ?

correlation with the same time of day that it is running. (although
there may be another cron job set to go then as well..)

PgUp/PgDn in `less ~hamish/cpu_use.projects.log` and see.

regards,
Hamish

Hamish · February 12, 2012, 1:06pm

Hamish wrote:

NUM_ALREADY=`pgrep -c -U $user -f "$command"`
if [ "$NUM_ALREADY" -gt 0 ] ; then

Oops, the threshold needs to be some number greater than 0, because the
list includes the cron job that's running the test.

H
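
One way to sidestep the self-matching problem altogether is a pidfile rather than pgrep. This is a different technique from the one proposed above, sketched under assumptions: the pidfile location and the function name kill_stale_instance are hypothetical, and the echo stands in for the mail notification.

```shell
#!/bin/sh
# Pidfile guard: remember the PID of each run instead of matching names.
# The pidfile location is a hypothetical choice for this sketch.
pidfile="${TMPDIR:-/tmp}/update_dev_dir.pid"

kill_stale_instance() {
    # $1 = pidfile. Returns 0 (and prints a message) if an old run was killed.
    [ -f "$1" ] || return 1
    old_pid=$(cat "$1")
    # kill -0 sends no signal; it only tests whether the PID is still alive.
    if [ -n "$old_pid" ] && [ "$old_pid" != "$$" ] \
        && kill -0 "$old_pid" 2>/dev/null; then
        kill "$old_pid"
        echo "previous run (PID $old_pid) on $(hostname) was still active and was killed"
        return 0
    fi
    return 1
}

if msg=$(kill_stale_instance "$pidfile"); then
    # In the cron job this could be piped to mail -s "$subj" "$email" instead.
    echo "$msg"
fi

# Record our own PID so the next run can find us.
echo "$$" > "$pidfile"
```

PIDs can be reused, so a stricter version would also confirm that the recorded PID still belongs to the script before killing it.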

Martin_Spott · February 12, 2012, 1:30pm

On Sun, Feb 12, 2012 at 05:04:23AM -0800, Hamish wrote:

> > Markus wrote:
>
> > > top - 00:21:37 up 10 days, 3:53, 1 user,
> > > load average: 42.23, 36.57, 31.99 <<---!!

Hamish:
> > fwiw I'm pretty sure the crazy load averages are due to bacula.

Martin:
> Why ?

correlation with the same time of day that it is running. (although
there may be another cron job set to go then as well..)

Indeed. I'm maintaining a couple of Bacula setups on production
servers and none of them sees the CPU load increase that much
during backup and, what's probably most relevant, there's simply no
plausible explanation for it.

Bacula is just reading file metadata or complete files, the file system
operations are highly predictable, because Bacula simply walks the
directory tree. That's a really easy task for the OS. Therefore if the
load rises that much, I'd rather assume Bacula to be the victim and not
the cause.

Cheers,
Martin.

Martin_Spott · February 12, 2012, 1:34pm

On Sun, Feb 12, 2012 at 02:30:20PM +0100, Martin Spott wrote:

Bacula is just reading file metadata or complete files, the file system
operations are highly predictable, because Bacula simply walks the
directory tree. That's a really easy task for the OS.

... even though we're using ext3 on the OSGeo VMs

Martin.