On Sun, Feb 12, 2012 at 9:35 AM, Hamish <hamish_b@yahoo.com> wrote:
> Markus wrote:
> > I would like to remind folks to set cron jobs "nice"ly,
> > otherwise the performance goes down (read: unresponsive):
> >
> > top - 00:21:37 up 10 days, 3:53, 1 user,
> > load average: 42.23, 36.57, 31.99 <<---!!
>
> fwiw I'm pretty sure the crazy load averages are due to bacula.

Not sure: it happened several times (when I checked), it was
always an openlayers job (sometimes mapserver), and renicing
it manually brought down the load average. Also today.

> perl has been running at 100% for a couple days...

So even more a reason to "nice" it.
Remember: shared resources should be used properly.
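For reference, the manual renicing mentioned above is a one-liner. The first (commented) line shows it for the stuck perl job from the top output below; the rest is a runnable self-demo on the current shell:

```shell
# lower the stuck job's priority (positive niceness = lower priority):
#   renice -n 10 -p 23588
# self-demo: renice the current shell, then confirm the new niceness
renice -n 10 -p $$
ps -o ni= -p $$
```

Note that unprivileged users can only raise niceness (lower priority), never lower it back; that direction is exactly what a shared server wants.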
  PID USER     PR NI VIRT RES   SHR S %CPU %MEM   TIME+ COMMAND
23588 openlaye 39  0 144m 18m  1640 R  100  0.2 2044:46 perl
                                       ^^^--!!
See the load average.
Solution: Put the word "nice" in front of your jobs to run
them with nice level 10 rather than 0.
Yes, please do recheck your jobs. Thanks.
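A sketch of what the "nice" prefix looks like in practice (the crontab path is hypothetical; the last line is a runnable check):

```shell
# crontab entry: plain "nice" runs the job at niceness 10 instead of 0
#   0 * * * *  nice /osgeo/scripts/my_cron_job.sh
# check what niceness a "nice"-prefixed child process actually gets:
nice sh -c 'ps -o ni= -p $$'
```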
perl has been running at 100% for a couple days...
Looks like naturaldocs got stuck in a loop. That script is
part of a cron job that runs every hour, so killing it if it
ever gets into that state is a perfectly reasonable response.
(I've gone ahead and done so now.)
suggest to have the hourly cron job begin by 'pgrep -c'ing for itself
and pkill'ing the old one if it has not completed. &/or spamming the
job's owner when that happens. (will the new one just suffer the same
loop-fate as the previous?)
Since the job has been running every hour since that one started
without failing, it seems unlikely; otherwise we'd have a whole
stack of them. I've seen it fail in this way precisely twice in
the two years since we moved to the projects server; to be honest,
any software work at this point is probably overkill, but:
https://github.com/openlayers/openlayers/blob/master/tools/update_dev_dir.sh
patches welcome
I'm all about the overkill...
#### guard: if a previous run of this job is still going,
#### kill it and notify the owner
user="`whoami`"
command="update_dev_dir.sh"
email="you@example.com"
subj="$command running overtime on `hostname`"

# count matching processes; when this guard sits inside the script
# itself, the current run also matches, hence the -gt 1 test
NUM_ALREADY=`pgrep -c -U "$user" -f "$command"`
if [ "$NUM_ALREADY" -gt 1 ] ; then
    # -o: kill only the oldest match, i.e. the stuck previous
    # run, not ourselves
    pkill -o -U "$user" -f "$command"
    echo "$command on `hostname` went bad, `date`." | \
        mail -s "$subj" "$email"
fi
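As an aside (my suggestion, not part of the script above): flock(1) from util-linux makes the "is a previous run still going?" check atomic and avoids pgrep pattern-matching accidents:

```shell
# a crontab wrapper would look like (paths are examples):
#   0 * * * *  flock -n /tmp/update_dev_dir.lock /path/to/update_dev_dir.sh
# runnable demo: -n fails immediately if another process holds the lock
lock=/tmp/update_dev_dir.demo.lock
if flock -n "$lock" -c 'echo "lock acquired"'; then
    echo "job would run"
else
    echo "previous run still active, skipping"
fi
```

The trade-off versus the pgrep approach: flock skips the new run instead of killing the stuck old one, so a looping job would still need to be killed by hand (or by a timeout wrapper).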
On Sun, Feb 12, 2012 at 12:35:41AM -0800, Hamish wrote:
> Markus wrote:
> > top - 00:21:37 up 10 days, 3:53, 1 user,
> > load average: 42.23, 36.57, 31.99 <<---!!
>
> fwiw I'm pretty sure the crazy load averages are due to bacula.
Why?
Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------
On Sun, Feb 12, 2012 at 05:04:23AM -0800, Hamish wrote:
> > Markus wrote:
>
> > > top - 00:21:37 up 10 days, 3:53, 1 user,
> > > load average: 42.23, 36.57, 31.99 <<---!!
Hamish:
> > fwiw I'm pretty sure the crazy load averages are due to bacula.
Martin:
> Why?
correlation with the same time of day that it is running. (although
there may be another cron job set to go then as well..)
Indeed. I'm maintaining a couple of Bacula setups on production
servers and none of them is affected by CPU load increasing that
much during backup, and, what's probably most relevant, there's
simply no plausible explanation.

Bacula is just reading file metadata or complete files; the file
system operations are highly predictable, because Bacula simply
walks the directory tree. That's a really easy task for the OS.
Therefore, if the load rises that much, I'd rather assume Bacula
to be the victim and not the cause.
Cheers,
Martin.
On Sun, Feb 12, 2012 at 02:30:20PM +0100, Martin Spott wrote:
> Bacula is just reading file metadata or complete files; the file
> system operations are highly predictable, because Bacula simply
> walks the directory tree. That's a really easy task for the OS.
... even though we're using ext3 on the OSGeo VMs.
Martin.