[SAC] dsmc - Tivoli backup system on osgeo1

Frank_Warmerdam · November 19, 2009, 9:36pm

Folks,

I am increasingly convinced that the automated backup done by dsmc
on osgeo1 is responsible for the IO contention that leads to service
unavailability on a fairly frequent basis.

I would like to disable these backups for a couple of weeks to see if it helps
a lot. We have never used the backups, and most service data is backed up
by other mechanisms anyway.

Thoughts? If there are no objections raised here in a couple days I might
just go ahead and do it.

Also, does anyone know how to manage this service? Do I need to file a ticket
with peer1?

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam, warmerdam@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | Geospatial Programmer for Rent

hobu · November 19, 2009, 9:55pm

On Nov 19, 2009, at 3:36 PM, Frank Warmerdam wrote:

> Folks,
>
> I am increasingly convinced that the automated backup done by dsmc
> on osgeo1 is responsible for the IO contention that leads to service
> unavailability on a fairly frequent basis.
>
> I would like to disable these backups for a couple weeks to see if it helps
> a lot. We have never used the backups and most service data is backed up
> by other mechanisms anyways.
>
> Thoughts? If there are no objections raised here in a couple days I might
> just go ahead and do it.
>
> Also, does anyone know how to manage this service? Do I need to file a ticket
> with peer1?

+1, assuming we have documented by-hand or existing backup solutions for all of our services that currently assume Tivoli is doing their backup (LDAP? databases? htdocs directories? Trac config?)

Howard

Tyler_Mitchell1 · November 19, 2009, 10:27pm

On Thu, 19 Nov 2009 15:55:29 -0600
Howard Butler <hobu.inc@gmail.com> wrote:

> On Nov 19, 2009, at 3:36 PM, Frank Warmerdam wrote:
>
>> Folks,
>>
>> I am increasingly convinced that the automated backup done by dsmc
>> on osgeo1 is responsible for the IO contention that leads to service
>> unavailability on a fairly frequent basis.
>>
>> I would like to disable these backups for a couple weeks to see if
>> it helps a lot. We have never used the backups and most service
>> data is backed up by other mechanisms anyways.
>>
>> Thoughts? If there are no objections raised here in a couple days
>> I might just go ahead and do it.
>>
>> Also, does anyone know how to manage this service? Do I need to
>> file a ticket with peer1?
>
> + 1 Assuming we have documented by-hand or existing backup solutions
> for all of our services that are currently assuming tivoli's doing
> their backup (ldap?, databases?, htdocs directories?, trac config

Yes, you have to file a ticket to discuss this with them. It would be
good to let them know since I get a note from their automated
systems every time their backup system cannot be contacted from their
end.

Martin_Spott · November 23, 2009, 10:04pm

On Thu, Nov 19, 2009 at 04:36:12PM -0500, Frank Warmerdam wrote:

> I am increasingly convinced that the automated backup done by dsmc
> on osgeo1 is responsible for the IO contention that leads to service
> unavailability on a fairly frequent basis.

Today I installed 'iostat' on the 'osgeo1' machine. The next time you
suspect IO contention (when I'm not around), please run the following
command and paste roughly half a minute of its output into an email:

# ~> iostat -x 5

Thanks,
Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------
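Capturing the half minute can be automated. 'iostat' accepts an optional count after the interval, so 'iostat -x 5 6' stops on its own after six reports. Note that the first of those reports is an average since boot. The shell function below is a hypothetical helper and is not part of sysstat. It finds the %util column from the "Device" header line, because the column layout differs between sysstat versions. It then prints every device at or above a threshold. The 80% default is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and print each
# device whose %util is at or above THRESHOLD (first argument, default 80).
busy_devices() {
    awk -v t="${1:-80}" '
        /^Device/ {                        # header line of a device block
            col = 0
            for (i = 1; i <= NF; i++)      # locate the %util column by name
                if ($i == "%util") col = i
            next
        }
        NF == 0 { col = 0; next }          # a blank line ends the device block
        col > 0 && $col + 0 >= t + 0 { print $1, $col }
    '
}

# Example: six extended reports, five seconds apart (about 30 seconds).
#   iostat -x 5 6 | busy_devices 80
```

Keep the raw log as well. %util alone does not tell whether a device is saturated by one process or by many.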

Frank_Warmerdam · November 24, 2009, 7:46am

Martin Spott wrote:

> On Thu, Nov 19, 2009 at 04:36:12PM -0500, Frank Warmerdam wrote:
>
>> I am increasingly convinced that the automated backup done by dsmc
>> on osgeo1 is responsible for the IO contention that leads to service
>> unavailability on a fairly frequent basis.
>
> Today I have installed 'iostat' on the 'osgeo1' machine. The next time
> you suspect IO contention (when I'm not around), please paste the
> output of approx. half a minute into an EMail, running the command as:
>
> # ~> iostat -x 5

Martin,

A log covering part of a minute is attached. It was captured during a period
when the load average spiked to 23 or so, and "wait states" were around 65%
in the top report.

In this case there is no sign of dsmc. The first screen of the top report
looks like this:

top - 02:43:22 up 19 days, 5:38, 2 users, load average: 23.77, 22.19, 19.00
Tasks: 333 total, 1 running, 332 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.9% us, 2.6% sy, 0.0% ni, 21.9% id, 60.6% wa, 0.1% hi, 0.0% si
Mem: 2074860k total, 2050372k used, 24488k free, 5060k buffers
Swap: 2040244k total, 723380k used, 1316864k free, 453616k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
26106 apache    17   0 64836  45m 7516 S 25.8  2.2   0:15.01 httpd
26116 postgres  18   0 27032  11m  10m D  6.9  0.6   0:01.22 postmaster
26162 apache    16   0 40060  21m 6248 S  5.9  1.1   0:01.17 httpd
25202 apache    15   0 65620  42m 7656 D  4.6  2.1   0:11.93 httpd
26114 root      16   0  3252  540  188 S  4.6  0.0   0:00.74 gzip
22970 apache    16   0  228m 137m 7636 S  3.6  6.8   1:04.32 httpd
25971 postgres  17   0 26424  11m   9m D  3.6  0.5   0:03.70 postmaster
26149 postgres  16   0 26336  11m   9m S  3.3  0.5   0:01.87 postmaster
 2922 mysql     16   0  200m  93m 3344 S  2.6  4.6   3016:12 mysqld
25456 apache    15   0 76080  50m 7776 D  1.0  2.5   0:25.90 httpd
26159 postgres  15   0 26336  11m  10m D  1.0  0.5   0:00.61 postmaster
   67 root      15   0     0    0    0 S  0.7  0.0  33:18.59 kswapd0
25937 postgres  15   0 26380  11m   9m S  0.7  0.5   0:02.09 postmaster
26017 postgres  16   0 26768  11m  10m D  0.7  0.6   0:01.40 postmaster
26040 postgres  15   0 26380  11m   9m D  0.7  0.5   0:00.77 postmaster
26113 root      16   0  5020 1676 1296 S  0.7  0.1   0:00.11 pg_dump
26140 root      16   0  3188 1156  772 R  0.7  0.1   0:00.56 top
25252 postgres  16   0 26324 8440 7520 S  0.3  0.4   0:00.15 postmaster
25792 postgres  15   0 26336  10m  10m D  0.3  0.5   0:00.63 postmaster
25846 postgres  15   0 26952  11m  10m D  0.3  0.6   0:02.64 postmaster
25883 postgres  17   0 26336  10m 9512 D  0.3  0.5   0:00.22 postmaster
25993 postgres  15   0 26344  10m   9m D  0.3  0.5   0:00.45 postmaster
26003 postgres  17   0 26336  10m 9.9m D  0.3  0.5   0:00.35 postmaster
26020 apache    16   0 62312  43m 7664 S  0.3  2.2   0:05.19 httpd
26093 postgres  15   0 26336  11m  10m D  0.3  0.5   0:00.79 postmaster
26151 postgres  16   0 26340 8064 7068 D  0.3  0.4   0:00.08 postmaster

Best regards,

log (3.93 KB)
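Many of the entries in the listing above are in state D. Most of these are postmaster processes, plus a couple of httpd ones. The ps(1) man page describes state D as "uninterruptible sleep (usually IO)". Because top sorts by CPU usage by default, a tally of D-state processes can be a useful complement to the top screen. The function below is a hypothetical helper, not an existing tool. It reads "STAT COMMAND" pairs, such as the output of 'ps -eo stat=,comm=', and counts D-state processes per command name.

```shell
#!/bin/sh
# Hypothetical helper: count processes in uninterruptible sleep (STAT
# starting with "D") per command name, largest count first.
# Input: "STAT COMMAND" lines, e.g. from `ps -eo stat=,comm=`.
dstate_summary() {
    awk 'substr($1, 1, 1) == "D" { n[$2]++ }     # tally D-state processes
         END { for (c in n) print n[c], c }' | sort -rn
}

# Example: three samples, two seconds apart.
#   for i in 1 2 3; do ps -eo stat=,comm= | dstate_summary; echo; sleep 2; done
```

If dsmc is reading heavily, it should appear in this tally even though it barely registers in %CPU.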

Martin_Spott · November 24, 2009, 11:44am

Hi Frank, thanks for posting the report.

On Tue, Nov 24, 2009 at 02:46:34AM -0500, Frank Warmerdam wrote:

> In this case, there is no sign of dsmc.

... which doesn't come as a surprise, since the backup client doesn't
consume many CPU cycles while it is waiting for its read requests to
return, so it won't show up near the head of top's CPU-sorted list.

Are 'we' running RAID5/6 here, or just simple disk mirroring via the
3ware controller? The backup server is 'bu2atl.bu.peer1.net'; if this
resolves to a fixed IP address, we could try to throttle the backup
bandwidth via network traffic shaping.

Cheers,
Martin.
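Martin's traffic-shaping idea could be sketched with Linux 'tc' and an HTB qdisc. The idea is that capping what the backup client can send to the server also limits how fast it reads from disk. Several details below are assumptions:

- The interface name (eth0).
- The 1000mbit rate for all other traffic.
- The 10mbit cap.
- The address 192.0.2.10, a documentation placeholder. The real address of 'bu2atl.bu.peer1.net' would have to be looked up.

The helper is hypothetical. It only prints the commands so they can be reviewed before being run as root. It shapes outgoing traffic only, which is the direction a backup client sends its data.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) tc commands that cap outgoing
# traffic to one destination address using an HTB class.
#   $1 = network interface (assumed, e.g. eth0)
#   $2 = IPv4 address of the backup server (placeholder in the example)
#   $3 = rate cap in tc syntax (e.g. 10mbit)
shape_backup_cmds() {
    dev=$1 dst=$2 rate=$3
    printf '%s\n' \
        "tc qdisc add dev $dev root handle 1: htb default 10" \
        "tc class add dev $dev parent 1: classid 1:10 htb rate 1000mbit" \
        "tc class add dev $dev parent 1: classid 1:20 htb rate $rate ceil $rate" \
        "tc filter add dev $dev protocol ip parent 1: prio 1 u32 match ip dst $dst/32 flowid 1:20"
    # 1:10 is the default class for all other traffic;
    # 1:20 holds traffic to the backup server, capped at $rate.
}

# Review first, then apply as root:
#   shape_backup_cmds eth0 192.0.2.10 10mbit | sh
# Remove the shaping again with:
#   tc qdisc del dev eth0 root
```

This only helps if the backup server really sits behind a fixed address, as Martin notes.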