[SAC] OSGeo "tracsvn" VM down

Hi,
I wonder how to restart the "tracsvn.osgeo.osuosl.org" VM on
"osgeo3" from the command line.

We found that the VM was down, although the Ganeti Web Manager told me
it was running. It was clearly unavailable (no HTTP, no SSH, no ping).
I called an "immediate shutdown" of the VM in order to get into a
consistent state. After calling a subsequent "start" in the Ganeti Web
Manager I was told:

"Error checking bridges on destination node 'osgeo3.osuosl.bak': Error
60: SSL certificate problem: certificate has expired"

I do have access to the physical "osgeo3" machine but I have no idea
how to start KVM guests on Gentoo (many KVM support pages recommend
"virsh", but there's no such command on "osgeo3"). I found a nice Wiki
page:

  http://www.linux-kvm.org/page/KvmOnGentoo

.... which says:

"there are no standard Gentoo way to do this, you will need 3rd party
scripts/front-ends"

I tried to put the Ganeti utilities into a well-defined state (as a
trial run before rebooting the entire machine) by restarting
"/etc/init.d/ganeti" but ended up with:

" * Starting ganeti-masterd ...
  * exit code 1 [ !! ]"

Please help me start this VM; I don't want to reboot the entire host
just to start a single VM.

Thanks,
  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

A new ticket has been created and assigned an ID of [support.osuosl.org/25173].

---

From: Martin Spott <Martin.Spott@mgras.net>

I've worked around a Ganeti issue by following:

  https://code.google.com/p/ganeti/issues/detail?id=191#c8

Now, after creating a new "server.pem" certificate and copying it over
to "osgeo4", I'm able to restart "ganeti-masterd" as advertised.
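For anyone hitting the same "certificate has expired" error, here is a minimal sketch of how the expiry date of a PEM certificate could be checked before and after regenerating it. It assumes the "openssl" command line tool is installed on the node; the helper names are made up for illustration, and the certificate path has to be supplied by the caller, since its location on "osgeo3" is not stated in this thread.

```python
import ssl
import subprocess
from datetime import datetime, timezone


def parse_not_after(openssl_line):
    """Turn 'notAfter=May 23 12:00:00 2015 GMT' into an aware UTC datetime."""
    prefix = "notAfter="
    openssl_line = openssl_line.strip()
    if not openssl_line.startswith(prefix):
        raise ValueError("unexpected openssl output: %r" % openssl_line)
    # ssl.cert_time_to_seconds() understands the date format openssl prints.
    seconds = ssl.cert_time_to_seconds(openssl_line[len(prefix):])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_expired(openssl_line, now=None):
    """Return True if the notAfter date is at or before 'now' (default: current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_not_after(openssl_line) <= now


def read_not_after(pem_path):
    """Ask openssl for the end date of the certificate stored in pem_path."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", pem_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Calling is_expired(read_not_after(path)) with the path of the "server.pem" in question should then tell whether the certificate is past its end date. The parsing is kept separate from the openssl call so it can be tried on a pasted "notAfter=..." line without touching the node.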

And I'm able to compare the output of "gnt-instance list" on "osgeo3"
with the output of the Ganeti Web Manager. This is a little bit scary:

After starting the "tracsvn" VM from command line on "osgeo" using
"gnt-instance startup tracsvn.osgeo.osuosl.org", the "gnt-instance list"
command shows a running "tracsvn" and a halted "tracsvn2".
The Ganeti Web Manager instead shows a halted "tracsvn" and a running
"tracsvn2".
Now, suppose someone tried to destroy the obsolete "tracsvn2" VM via
the Web Manager: would we lose the production instance ?!?
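To make this kind of disagreement visible before anyone destroys anything, the two views could be compared mechanically. The following is only a sketch: it assumes the command line view was exported as "name:status" lines (for example via "gnt-instance list --no-headers --separator=: -o name,status"), and that the Web Manager states were typed in by hand. The function names, the sample data and the status strings are illustrative, not taken from the real cluster.

```python
def parse_instance_list(text, separator=":"):
    """Parse 'name<sep>status' lines into a {name: status} dictionary."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, status = line.partition(separator)
        states[name] = status
    return states


def find_mismatches(cli_states, web_states):
    """Return {name: (cli_status, web_status)} for every instance that differs.

    An instance known to only one side shows up with None on the other side.
    """
    mismatches = {}
    for name in sorted(set(cli_states) | set(web_states)):
        cli = cli_states.get(name)
        web = web_states.get(name)
        if cli != web:
            mismatches[name] = (cli, web)
    return mismatches


# Sample data mirroring the situation described above (status names assumed).
cli_output = """\
tracsvn:running
tracsvn2:ADMIN_down
"""
web_view = {
    "tracsvn": "ADMIN_down",
    "tracsvn2": "running",
}

for name, (cli, web) in find_mismatches(
        parse_instance_list(cli_output), web_view).items():
    print("%s: command line says %s, Web Manager says %s" % (name, cli, web))
```

Any instance reported by this comparison should probably not be modified or destroyed through either interface until the two views agree again.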

Anyway, the "tracsvn" VM is starting again, but requires entering the
root password on the console for manual filesystem check. Anybody here
to help ?

Thanks,
  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

On Sat, May 23, 2015 at 02:15:04PM -0700, Martin Spott via RT wrote:

> Anyway, the "tracsvn" VM is starting again, but requires entering the
> root password on the console for manual filesystem check. Anybody here
> to help ?

This is frustrating: instead of playing with the kids I spent the
better part of this Saturday getting the Trac/SVN VM running again,
and now I'm stuck because:
a) someone chose a filesystem which requires manual intervention after
   the machine crashes, and
b) we don't know the root password to perform this intervention.

Unhappy,
  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

Mission accomplished,

  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------
