[SAC] [OSGeo] #1810: AdhocVM not accessible

#1810: AdhocVM not accessible
Reporter: jmckenna | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone:
Component: Systems Admin | Keywords:
- cannot ssh into the AdhocVM (https://wiki.osgeo.org/wiki/AdhocVM)
  - not sure if this is related to the recent DDoS attacks
  - host: adhoc(dot)osgeo(dot)osuosl(dot)org

Comment (by martin):


Comment (by tomkralidis):

$ ssh tomkralidis@demo.pycsw.org
ssh: connect to host demo.pycsw.org port 22: Connection timed out
$ ssh tomkralidis@adhoc.osuosl.osgeo.org
ssh: Could not resolve hostname adhoc.osuosl.osgeo.org: Name or service not known
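
The two failures above are different in kind: the first is a TCP connection timeout, the second a DNS lookup failure. Note also that the second hostname has its domain labels in a different order than the host given in the ticket description, which may be why it does not resolve. A minimal sketch for telling the cases apart, assuming Python 3 on the client; the function name `classify_host` and its `resolver`/`connector` parameters are hypothetical and exist only so the check can be exercised without network access:

```python
import socket


def classify_host(host, port=22, timeout=5.0,
                  resolver=socket.getaddrinfo,
                  connector=socket.create_connection):
    """Classify reachability of host:port.

    Returns one of: 'dns-failure', 'timeout', 'refused',
    'unreachable', 'reachable'.
    """
    # Step 1: name resolution. This is the "Could not resolve hostname" case.
    try:
        resolver(host, port)
    except socket.gaierror:
        return "dns-failure"

    # Step 2: TCP connect. This is the "Connection timed out" case.
    try:
        conn = connector((host, port), timeout=timeout)
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Any other socket-level error (e.g. no route to host).
        return "unreachable"

    conn.close()
    return "reachable"
```

Calling `classify_host("demo.pycsw.org")` and `classify_host("adhoc.osuosl.osgeo.org")` from an affected client would show which stage fails for each name.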

Comment (by jmckenna):

As that was 3 days ago, and as this machine is critical for so many OSGeo
projects, maybe we need to spread out the administration of this VM, so it
isn't one person holding this back. I am sure Tom or I can help here.
Please let us know if we can help in any way.

Comment (by martin):

Surprisingly the "working on it"-comment seemingly didn't get through ....
well, here it is :-)

The issue is a little bit tricky. The easiest solution - for us - would be
to ask OSL to set up the same kernel/boot hack as they did for the
"webextra" VM. Last time I asked them they weren't too happy about it.

Alex, would you mind asking again ?

Comment (by jmckenna):

Thanks for the update Martin, as I am sure several projects were awaiting
this update. Thanks again for sharing this news. 'working on it' for 3
days, for a server hosting so many OSGeo projects, likely just needed an
update on status. Thanks again for this.

Comment (by martin):

I'm really sorry the intermediate update was lost - maybe I simply clicked
the wrong button.

I *do* have access to the "bare iron" and thus to the filesystems of the
VM as well. Anyhow, finding out why the boot loader doesn't work in this
setup (the initial setup of all of these VMs had a little design flaw)
proved to be more time-consuming than expected (and I need to respect the
constraints of my day-job).

Comment (by jmckenna):

I do appreciate the sarcasm, but, rest assured that all projects received
your 'working on it' message here in this ticket 3 days ago; my point is,
for future tickets, just give updates as you travel down the journey -
your message today about OSL was great, I am sure the OSGeo projects
appreciated the update. The 3 days of your effort so far was greatly
appreciated, just be sure to keep this ticket updated with your efforts.

Comment (by strk):

I think sharing the admin burden among at least 2 people is still a
good idea. 3 would be even better.

Comment (by wildintellect):

There are lots of people who have admin access to the VM, just not the bare metal
(which we are retiring).

Yes, we'll need to file a ticket with osuosl, if someone has an old email
about how it was managed last time that would help.

As an alternative, what if we clone this VM disk over to osgeo6 and run it
inside of a KVM+libvirt setup? We need a plan to move everything anyways
as osgeo4 needs to be retired, and has 1 failed disk right now.

Comment (by jmckenna):

Cloning this disk over to osgeo6 sounds better, as we're not delayed by an
external ticket in that case.

Comment (by jmckenna):

Can we have an update on this?

Comment (by martin):

I'll start working on it right now and hopefully will be able to provide a
proper solution for the other affected VMs as well.

Comment (by jmckenna):

thanks martin

Comment (by martin):

I give up on this, I simply don't understand how they're booting our VMs.
No matter how I'm installing a GRUB bootloader into the virtual disk, it
simply doesn't get loaded.

Thus, would anybody who is in touch with OSL please ask them to bolt a
kernel into the "adhoc" VM in the same way as they did for the "webextra"
VM ?

If not, then we might consider turning the "osgeo6" machine into a Xen
host and migrating a filesystem dump of the "adhoc" VM into a Xen guest.

Comment (by martin):

Well, it looks like I finally made it work. Please let me know what's

Comment (by strk):

Last time I contacted OSUOSL people (not sure it was for webextra)
I did so by joining #osuosl IRC channel on freenode.

Will do again, referencing this ticket.

Comment (by strk):

I actually just logged into adhoc, am I using the wrong IP ?

strk@adhoc:~$ hostname -f; date; /sbin/ifconfig eth0
Fri Oct 28 01:48:12 PDT 2016
eth0 Link encap:Ethernet HWaddr aa:00:00:ae:6b:fc
           inet addr: Bcast:
           inet6 addr: fe80::a800:ff:feae:6bfc/64 Scope:Link
           RX packets:205136 errors:0 dropped:0 overruns:0 frame:0
           TX packets:39667 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:40203361 (38.3 MiB) TX bytes:19318868 (18.4 MiB)

Comment (by martin):

Sandro, I can't see anything wrong with your findings.

As I finally wrote last night, I managed to get a proper boot loader
installed. As a side effect I learned how to cure the partitioning and
boot loader setups, which allows us to upgrade all our VMs to the latest Debian.

Next I'll apply the same procedure to the former "mail" VM, in preparation
for #1805.

Comment (by strk):

Great news, thank you Martin !
(I hadn't read your comment before writing my findings)

