[SAC] OSGeo Ganeti Cluster

​​OSGeo Admins,

I’d like to do several changes to your Ganeti cluster eventually to bring it up to a better-supported platform and a newer version of Ganeti. Unfortunately this is going to cause some downtime for each node, but I’m pretty sure I can do it without losing data and without downtime for certain VMs. Both of your nodes are currently running Gentoo, which we haven’t been maintaining other than for very important security issues that come up. Also, the cluster currently runs Ganeti 2.6.2, while the latest stable version is 2.15.2, which includes several improvements.

In summary, I’d like to:

  1. Install CentOS 7 as the OS for all of the nodes

  2. Switch to managing said nodes with Chef instead of Cfengine

  3. Upgrade Ganeti from 2.6.2 to 2.15.2 (or whatever is stable at the point we get to this)

This is going to need to be a multi-stage process unfortunately, but I’m hoping I only have to schedule one downtime per node. I’ve tested this process in a Vagrant environment and it seems to work.

Here are the actual steps I plan to do:

  1. Take osgeo3 down and reinstall its OS with CentOS 7 and retain its LVM data for VMs

  2. Install Ganeti 2.6.2 on osgeo3 using Chef so that the version stays the same throughout the whole cluster

  3. Re-add osgeo3 to the cluster using its previous configuration and start all the VMs back up

  4. Repeat steps 1 through 3 with osgeo4

  5. Upgrade Ganeti to 2.11.8 on all the nodes (I’ve found this to be safer than jumping from 2.6.2 directly to 2.15 as they made some major changes to the backend in those versions)

  6. Finally upgrade Ganeti to 2.15.2 or whatever is latest stable at the time.

So my questions to you are:

  1. Should any of the instances below be migrated to another node during its primary node’s downtime? If so, and they’re currently set to plain, we can convert them to DRBD and move them over; it will just take a short downtime (depending on how large the disk is).
  2. When could we start doing this? I was hoping to start within the next month or so but it can certainly be adjusted.
  3. How should we communicate in real-time if we need to? Via #osuosl on IRC? Other means?

Instance                   Primary_node       Status      Memory  DiskUsage  Disk_template
adhoc.osgeo.osuosl.org     osgeo4.osuosl.bak  running       4096      65536  plain
base.osgeo.osuosl.org      osgeo3.osuosl.bak  ADMIN_down       -       4096  plain
download.osgeo.osuosl.org  osgeo3.osuosl.bak  running       8192     158720  plain
mail.osgeo.osuosl.org      osgeo4.osuosl.bak  running       4096      75776  plain
projects.osgeo.osuosl.org  osgeo4.osuosl.bak  running      16384     208896  plain
qgis.osgeo.osuosl.org      osgeo4.osuosl.bak  running       6144     167936  plain
secure.osgeo.osuosl.org    osgeo3.osuosl.bak  running       4096      14464  drbd
tracsvn2.osgeo.osuosl.org  osgeo3.osuosl.bak  ADMIN_down       -      86016  plain
tracsvn.osgeo.osuosl.org   osgeo3.osuosl.bak  running       8192     106496  plain
web.osgeo.osuosl.org       osgeo3.osuosl.bak  running       4096      36864  plain
webextra.osgeo.osuosl.org  osgeo3.osuosl.bak  running       4096     126976  plain
wiki.osgeo.osuosl.org      osgeo3.osuosl.bak  running       4096      20480  plain
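
For anyone weighing question 1, here is a minimal Python sketch (not part of the original plan) that parses the instance listing above and totals the disk usage of the plain instances on each node, i.e. the instances that would need a DRBD conversion before they could move. The function names are hypothetical, and the numbers are summed in whatever unit the listing reports.

```python
from collections import defaultdict

# Rows copied from the instance listing in this email.
INSTANCE_LIST = """\
adhoc.osgeo.osuosl.org osgeo4.osuosl.bak running 4096 65536 plain
base.osgeo.osuosl.org osgeo3.osuosl.bak ADMIN_down - 4096 plain
download.osgeo.osuosl.org osgeo3.osuosl.bak running 8192 158720 plain
mail.osgeo.osuosl.org osgeo4.osuosl.bak running 4096 75776 plain
projects.osgeo.osuosl.org osgeo4.osuosl.bak running 16384 208896 plain
qgis.osgeo.osuosl.org osgeo4.osuosl.bak running 6144 167936 plain
secure.osgeo.osuosl.org osgeo3.osuosl.bak running 4096 14464 drbd
tracsvn2.osgeo.osuosl.org osgeo3.osuosl.bak ADMIN_down - 86016 plain
tracsvn.osgeo.osuosl.org osgeo3.osuosl.bak running 8192 106496 plain
web.osgeo.osuosl.org osgeo3.osuosl.bak running 4096 36864 plain
webextra.osgeo.osuosl.org osgeo3.osuosl.bak running 4096 126976 plain
wiki.osgeo.osuosl.org osgeo3.osuosl.bak running 4096 20480 plain
"""


def parse_instances(text):
    """Turn the whitespace-separated listing into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        name, node, status, memory, disk, template = line.split()
        rows.append({
            "name": name,
            "node": node,
            "status": status,
            # Stopped instances show "-" in the Memory column.
            "memory": None if memory == "-" else int(memory),
            "disk": int(disk),
            "template": template,
        })
    return rows


def plain_disk_by_node(rows):
    """Sum DiskUsage of instances using the plain template, per primary node."""
    totals = defaultdict(int)
    for row in rows:
        if row["template"] == "plain":
            totals[row["node"]] += row["disk"]
    return dict(totals)


if __name__ == "__main__":
    for node, total in sorted(plain_disk_by_node(parse_instances(INSTANCE_LIST)).items()):
        print(node, total)
```

The per-node totals give a rough idea of how much data would have to be synced to the other node if every plain instance were converted.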

​Thanks-​

Lance Albertson

Director
Oregon State University | Open Source Lab

Lance Albertson wrote:

> I'd like to do several changes to your Ganeti cluster eventually [...]

Did anybody respond to this proposal?

  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

On 10/21/2017 01:41 PM, Martin Spott wrote:

> Lance Albertson wrote:
>
>> I'd like to do several changes to your Ganeti cluster eventually [...]
>
> Did anybody respond to this proposal?
>
>   Martin.

No I don't think anyone has responded. Did he detail what he wants to
change?

-Alex

Hi All,

I’m not sure if you got my original email back in July but I’m finally ready to start scheduling this. I’d like to amend my plan below to the following:

Summary:

  1. Upgrade Ganeti from 2.6.2 to 2.15.2

  2. Install CentOS 7 as the OS for all of the nodes

  3. Switch to managing said nodes with Chef instead of Cfengine

Here are the actual steps I plan to do:

  1. Upgrade Ganeti to 2.15.2 on the current cluster from 2.6.2

  2. Migrate high priority instances from plain to drbd using --no-wait-for-sync [1]

  3. Fail over instances on osgeo3 to osgeo4

  4. Take osgeo3 down and reinstall its OS with CentOS 7 and retain its LVM data for VMs

  5. Re-add osgeo3 to the cluster using its previous configuration and start all the VMs back up

  6. Repeat steps 3 through 5 with osgeo4

​I’d like to go ahead with #1 and then schedule a time to do #2 after that’s completed.

Let me know!

[1] _The -t (--disk-template) option will change the disk template of the instance. Currently only conversions between the plain and drbd disk templates are supported, and the instance must be stopped before attempting the conversion. When changing from the plain to the drbd disk template, a new secondary node must be specified via the -n option. The option --no-wait-for-sync can be used when converting to the drbd template in order to make the instance available for startup before DRBD has finished resyncing._
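
The conversion described in footnote [1] can be sketched as a short command sequence. The Python helper below only builds the command lines and does not run them. The `-t`, `-n` and `--no-wait-for-sync` options come from the footnote; the `gnt-instance shutdown` and `gnt-instance startup` steps are an assumption about how the "instance must be stopped" requirement would be met, and the function name and the instance/node pairing are purely illustrative.

```python
import shlex


def plain_to_drbd_commands(instance, secondary_node, wait_for_sync=False):
    """Return the command lines for converting one instance from plain to drbd.

    Nothing is executed; the caller decides whether and where to run them.
    """
    modify = ["gnt-instance", "modify", "-t", "drbd", "-n", secondary_node]
    if not wait_for_sync:
        # Lets the instance start again before DRBD has finished resyncing.
        modify.append("--no-wait-for-sync")
    modify.append(instance)
    return [
        # Assumed step: the instance has to be stopped before the conversion.
        ["gnt-instance", "shutdown", instance],
        modify,
        # Assumed step: bring the instance back up afterwards.
        ["gnt-instance", "startup", instance],
    ]


if __name__ == "__main__":
    # Example pairing only: an instance on osgeo3 gets osgeo4 as its secondary.
    for cmd in plain_to_drbd_commands("web.osgeo.osuosl.org", "osgeo4.osuosl.bak"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the commands as data makes it easy to review the whole sequence for every high priority instance before anything is run.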

On Thu, Jul 27, 2017 at 1:46 PM, Lance Albertson <lance@osuosl.org> wrote:

​​OSGeo Admins,

I’d like to do several changes to your Ganeti cluster eventually [...]

Lance Albertson

Director
Oregon State University | Open Source Lab

Resending this as plain text because it got bounced the first time.

Lance,

We have plans to retire osgeo4, so it may not be worthwhile to upgrade it.

After we get the new hardware, would it be possible for you to do the upgrade on the new hardware we send, then move all the VMs on osgeo4 to the new hardware, and then chuck osgeo4 (or use it for whatever you want)?

I think that would be ideal if it's not too much trouble.

See our upcoming agenda items for reference.

https://wiki.osgeo.org/wiki/SAC_Meeting_2017-12-21

Alex please correct me if I misspoke.

Thanks,
Regina

From: Sac [mailto:sac-bounces@lists.osgeo.org] On Behalf Of Lance Albertson
Sent: Thursday, December 14, 2017 3:46 PM
To: sac@lists.osgeo.org
Cc: systems@osuosl.org; sysadmin@osgeo.org
Subject: Re: [SAC] OSGeo Ganeti Cluster

Hi All,

I'm not sure if you got my original email back in July but I'm finally ready to start scheduling this. [...]

On Thu, Dec 14, 2017 at 1:14 PM, Regina Obe <lr@pcorp.us> wrote:

> Resending this plain text cause got bounced first time.
>
> Lance,
>
> We have plans to retire osgeo4 so may not be worthwhile to upgrade that.
>
> After we get the new hardware, would it be possible for you to do the
> upgrade on the new hardware we send, and then move all the VMs on osgeo4 to
> the new hardware and then chuck osgeo4 (or use for whatever you want)?
>
> I think that would be ideal if it's not too much trouble.
>
> See our upcoming agenda items for reference.
>
> https://wiki.osgeo.org/wiki/SAC_Meeting_2017-12-21
>
> Alex please correct me if I misspoke.

​When do you expect the new server to arrive? I'd rather not wait months to
complete this although I don't mind waiting a few weeks. Rebuilding the
machine isn't that much work for me, most of the time is spent moving VMs
around.

Let me know how that meeting goes! Thanks-​

--
Lance Albertson
Director
Oregon State University | Open Source Lab

On 12/14/2017 02:29 PM, Lance Albertson wrote:

> When do you expect the new server to arrive? I'd rather not wait months to
> complete this although I don't mind waiting a few weeks. Rebuilding the
> machine isn't that much work for me, most of the time is spent moving VMs
> around.
>
> Let me know how that meeting goes! Thanks-

I would say I don't trust osgeo4.

I believe it has a failed drive in its RAID that we did not replace in
anticipation of moving to new hardware. Also because it already burned
through a couple of replacements, and the RAID rebuild times were
agonizing.

osgeo6 is already in, and is the replacement machine for osgeo4; we
just haven't finished moving everything off. osgeo6 does not run Ganeti
or KVM at this time. We have debated if it should.

I'm not sure we are using DRBD for any instances anymore. Would it be
simpler to remove Ganeti? Or is it possible to use other Ganeti machines
you have as the 2nd disks for the shuffle and upgrade?

The new machine we are discussing is osgeo7, a replacement for osgeo3.

Lance, what's the rack and PDU situation? If there is room we can order
it sooner. Last I knew we needed to get osgeo4 off and out before we
could add anything else. If there is room we can order sooner.

Alternate option, what would be the cost if we just want to buy in to
existing Ganeti VM services OSUOSL is running? We aren't 100% sure the
direction we are going with containers, virtualization, and cloud
services. So an OSUOSL offer of "cloud" virtualization might be an option.

Thanks,
Alex

On Thu, Dec 14, 2017 at 2:46 PM, Alex M <tech_dev@wildintellect.com> wrote:

> I would say I don't trust osgeo4.
>
> I believe it has a failed drive in its RAID that we did not replace in
> anticipation of moving to new hardware. Also because it already burned
> through a couple of replacements, and the RAID rebuild times were
> agonizing.

Right, I had forgotten that it's in a failed drive state.

> osgeo6 is already in, and is the replacement machine for osgeo4; we
> just haven't finished moving everything off. osgeo6 does not run Ganeti
> or KVM at this time. We have debated if it should.
>
> I'm not sure we are using DRBD for any instances anymore. Would it be
> simpler to remove Ganeti? Or is it possible to use other Ganeti machines
> you have as the 2nd disks for the shuffle and upgrade?

You can't mix Ganeti clusters, unfortunately, so we'd have to add a
completely new node.

> The new machine we are discussing is osgeo7, a replacement for osgeo3.

*nods*

> Lance, what's the rack and PDU situation? If there is room we can order
> it sooner. Last I knew we needed to get osgeo4 off and out before we
> could add anything else. If there is room we can order sooner.

We have plenty of room now so feel free to get that started.

> Alternate option, what would be the cost if we just want to buy in to
> existing Ganeti VM services OSUOSL is running? We aren't 100% sure the
> direction we are going with containers, virtualization, and cloud
> services. So an OSUOSL offer of "cloud" virtualization might be an option.

Our primary VM infrastructure is still based on Ganeti, however we've
been exploring using OpenStack as an alternative for more elastic needs.
We've been running an OpenStack cluster for the past several years on the
ppc64le platform, but we haven't created a cluster for x86 yet. I was
hoping we'd get something like that deployed sometime next year, but it
depends on various factors.

What exactly are your needs in the medium and long term? We could put you
on our primary Ganeti cluster but we have to be careful with any I/O
intensive VMs so they don't impact other users.

Thanks-

--
Lance Albertson
Director
Oregon State University | Open Source Lab

Any update from your last SAC meeting?

Thanks-

On Thu, Dec 14, 2017 at 4:41 PM, Lance Albertson <ramereth@osuosl.org> wrote:

[...]

Lance Albertson

Director
Oregon State University | Open Source Lab

Lance,

I’m afraid we are further behind on the new server than I thought. It seems more questions came out of the meeting than answers.

Any thoughts you have to add would be greatly appreciated.

One of the surprising outcomes for me was that I thought sticking with Ganeti was a done deal. It seems it is not, and libvirt is under consideration.

Do you have any thoughts on Ganeti versus libvirt, and on what we would be losing if we switched to libvirt? Are the image formats even compatible? I suspect they are not but haven’t done the research. I’m more concerned with OSUOSL being able to support us if we decide to go with libvirt, and with rebuilding our currently in-use VMs on libvirt.

Minutes from last meeting here: https://wiki.osgeo.org/wiki/SAC_Meeting_2017-12-21#Minutes

(transcript starts around 20:15 – 22:01ish http://irclogs.geoapt.com/osgeo-sac/%23osgeo-sac.2017-12-21.log )

To summarize the OSU-specific outcomes:

  1. We still need to pick out specs for the new server. As I recall, Alex is going to propose some options on the mailing list from https://www.siliconmechanics.com/ to fit in a $5000-ish budget.

  2. We are debating sticking with Ganeti or moving to something easier for us to manage, like libvirt. I’m concerned about having just one libvirt, and it doesn’t solve the problem we have of having just 1 Ganeti cluster we can trust, so I would just as soon stick with Ganeti, but I’m less knowledgeable about the differences between the 2. So I guess this means a hold-off for you on your plans unless you have any options we missed. :(

  3. On the existing Ganeti clusters we have to inventory what is easy to move off and what we are actually still using, because on a quick look I think a lot of things on those servers are not in use. I think Martin was in the middle of migrating stuff off because all those VMs are old Debian 5 or 6 and have to be rebuilt anyway, but I’m not confident we’ll have enough bandwidth in the next month or two to move everything off.

Thanks,

Regina

From: Sac [mailto:sac-bounces@lists.osgeo.org] On Behalf Of Lance Albertson
Sent: Wednesday, December 27, 2017 7:25 PM
To: tech@wildintellect.com
Cc: System Administration Committee Discussion/OSGeo sac@lists.osgeo.org; systems@osuosl.org; sysadmin@osgeo.org
Subject: Re: [SAC] OSGeo Ganeti Cluster

Any update from your last SAC meeting?

Thanks-


I would have never suggested libvirt if I didn't know the compatibility.
Both are managers on top of KVM, with the disks of the VMs being LVM
volumes. Moving a VM is a matter of copying the LVM volume and declaring
a config to use it as its disk.

The feature we'd be losing is DRBD, which is a multi-server hot-copy
failover system, which we don't really use because of performance issues
with load on some of our machines. Ganeti also isn't really designed for
fewer than 3 servers, and kind of expects those servers to all be roughly
the same.

You are correct that if we moved off Ganeti we'd remove OSUOSL from anything
but the hardware management. This came up as SAC has never really had
good access to the hosts osgeo3 and osgeo4, and OSUOSL doesn't always
have time to troubleshoot some of our unusual issues.

Thanks,
Alex

On 12/28/2017 10:15 AM, Regina Obe wrote:

Lance,

I'm afraid we are further behind on the new server than I thought. [...]

On Thu, Dec 28, 2017 at 11:47 AM, Alex Mandel <tech_dev@wildintellect.com>
wrote:

I would have never suggested libvirt if I didn't know the compatibility.
Both are managers on top of KVM, with the disks of the VMs being LVM
volumes. Moving a VM is a matter of copying the LVM volume and declaring
a config to use it as its disk.
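
As a rough illustration of the "copy the LVM volume, then declare a config" step described above, here is a minimal Python sketch that only builds the shell commands and does not run them. Nothing in it comes from the thread: the function name, the volume group and volume names, the destination host, and the sizes are all hypothetical, and `--os-variant generic` is an assumption to keep newer `virt-install` releases from refusing to run. The source VM is assumed to be shut down before the copy.

```python
import shlex


def lvm_migration_commands(vg, lv, size_gb, dest_host, vm_name, memory_mb, vcpus):
    """Return shell commands (as strings) for moving one LVM-backed KVM guest.

    All arguments are hypothetical placeholders; check the real volume group,
    logical volume name, and size with `lvs` on the source node first.
    """
    q = shlex.quote
    device = f"/dev/{vg}/{lv}"
    return [
        # 1. Create an empty logical volume of the same size on the destination.
        f"ssh {q(dest_host)} lvcreate -L {size_gb}G -n {q(lv)} {q(vg)}",
        # 2. Stream the source volume across block for block (VM must be off).
        f"dd if={q(device)} bs=4M | ssh {q(dest_host)} dd of={q(device)} bs=4M",
        # 3. Define a libvirt guest that boots from the copied volume.
        f"ssh {q(dest_host)} virt-install --import --name {q(vm_name)} "
        f"--memory {memory_mb} --vcpus {vcpus} --disk path={q(device)} "
        f"--os-variant generic --noautoconsole",
    ]


if __name__ == "__main__":
    for command in lvm_migration_commands(
        "vg0", "wiki.disk0", 20, "osgeo7.example.org", "wiki", 4096, 2
    ):
        print(command)
```

Printing the commands instead of executing them leaves room to review each one, which matters here because the `dd` step overwrites the destination volume.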

This is true and wouldn't be that difficult to do.

The feature we'd be losing is DRBD, which is a multi-server hot-copy
failover system, which we don't really use because of performance issues
with load on some of our machines. Ganeti also isn't really designed for
fewer than 3 servers, and kind of expects those servers to all be roughly
the same.

While I agree it works better with at least 3 servers, a cluster running
only two is still OK. I don't recall what issues you had in the past with
DRBD, but it hasn't been that much of a problem lately on our clusters.

You are correct, if we moved off Ganeti we'd remove OSUOSL from anything
but the hardware management. This came up as SAC has never really had
good access to the hosts osgeo3 and osgeo4, and OSUOSL doesn't always
have time to troubleshoot some of our unusual issues.

That's OK with us, we can do whatever makes the most sense for you and
your project. At the time, Ganeti was the best option, but now there are a
few other options available. We have been using OpenStack internally for
several years and are almost ready to open up a larger cluster for FOSS
projects. It's a bit more complicated to maintain, but it offers a lot more
flexibility in how you manage and access the VMs using a standard public
API.

What are your project's needs?

On 12/28/2017 10:15 AM, Regina Obe wrote:
> Lance,
>
> I'm afraid we are further behind on the new server than I thought. It seems
> more questions came out of the meeting than answers.
>
> Any thoughts you have to add would be greatly appreciated.
>
> One of the surprising outcomes for me was that I thought sticking with Ganeti
> was a done deal. It seems it is not, and libvirt is under consideration.
>
> Do you have any thoughts on Ganeti versus libvirt, and what we would be losing
> if we switched to libvirt? Are the image formats even compatible? I suspect
> they are not, but I haven't done the research. I'm more concerned with OSUOSL
> being able to support us if we decide to go with libvirt and rebuild our
> currently in-use VMs on libvirt.
>
> Minutes from the last meeting are here:
> https://wiki.osgeo.org/wiki/SAC_Meeting_2017-12-21#Minutes
>
> (The transcript starts around 20:15 – 22:01ish:
> http://irclogs.geoapt.com/osgeo-sac/%23osgeo-sac.2017-12-21.log )
>
> To summarize the OSU-specific outcomes:
>
> 1) We still need to pick out specs for the new server. As I recall, Alex is
> going to propose some options on the mailing list from
> https://www.siliconmechanics.com/ to fit a $5000-ish budget.
>
> 2) We are debating sticking with Ganeti or moving to something easier for
> us to manage, like libvirt. I'm concerned about having just one libvirt, and
> it doesn't solve the problem we have of having just 1 Ganeti cluster we can
> trust, so I would just as soon stick with Ganeti, but I'm less knowledgeable
> about the differences between the two. So I guess this means a hold-off for
> you on your plans unless you have any options we missed. :(
>
> 3) On the existing Ganeti clusters we have to inventory what is easy to
> move off and what we are actually still using, because on a quick look I
> think a lot of things on those servers are not in use. I think Martin was in
> the middle of migrating stuff off because all those VMs are old Debian 5 or 6
> and have to be rebuilt anyway, but I'm not confident we'll have enough
> bandwidth in the next month or two to move everything off.
>
> Thanks,
>
> Regina

--
Lance Albertson
Director
Oregon State University | Open Source Lab

Lance,

It seems no one has answered you so I’ll offer my assessment of the situation to hopefully get the ball rolling.

> That’s OK with us, we can do whatever makes the most sense for you and your project. At the time, Ganeti was the best option, but now there are a few other options available. We have been using OpenStack internally for several years and are almost ready to open up a larger cluster for FOSS projects. It’s a bit more complicated to maintain, but it offers a lot more flexibility in how you manage and access the VMs using a standard public API.
>
> What are your project’s needs?


  1. Our current VMs are very old. I think they are of Debian 6 vintage. These should be set up again as Debian 9ish.

So we have the issue of keeping these going while we move the services we still use on them off to new VMs/hardware.

Our newest is osgeo6, which runs Debian 8. That's fine, but as I recall everything on it runs on bare metal, so I fear it will be hard to upgrade when the time comes. I think we had plans to set up a Docker server on it. Not sure if it was done.

  2. We have our FOSS4G initiatives and many other initiatives going on around the year, and these often need a server of some sort or at least a website, and for those folks to be able to manage their own space as they need. Right now I don’t feel confident offering this on our existing systems, because they are either a) running too old an OS or b) overtaxed.

As has been done, we basically tell the new FOSS4G group: hey, you are on your own for everything. I feel really bad about that.

  3. We’ve got our OSGeo Live group – one of our initiatives – that produces ready-to-use software disks for conferences, workshops, etc. It would be nice if they had their own VM they could configure as they want, to do things like publish their workshop material in real time. They also have special needs like high bandwidth for hosting images – for which, as I recall, they rely on sourceforge.net – which of course has been in a downward spiral lately.

  4. Of particular urgency is our new website, whose release has been delayed for 6 months. For our immediate need, Alex will follow up with you to see if we can use OSUOSL cloud hosting for this for the time being, as we seem to have trouble making up our mind.

  5. We’ve got our Gitlab test initiative, which is going nowhere because we have no hardware to put it on that we don’t fear is too old or too taxed.

People have bandied about the idea of using Docker. Docker is fine for some things. Maybe it’s my lack of experience with it, but it seems suitable for things that don’t change, and not something I would feel comfortable handing to a project and saying: configure it as you like, we’ll manage the updates if you need us to.

Also, given that it relies a lot on the underlying OS, it seems you are stuck with that OS. So if a project comes to us and says our stuff works best on CentOS or FreeBSD or whatever, we are stuck.

So I really, really want a VM solution of some sort like Ganeti, OpenStack, or libvirt. If they all rely on the same KVM/LVM structure underneath, I don’t see why it much matters which one we go with, aside from manageability, as long as we can move images to another box while we are reconfiguring things. And if we are having this many arguments about it, I would much prefer a solution that OSUOSL has experience supporting or wants to support.

For picking a VM solution, the things we are concerned about are:

  1. How easy would it be for us to manage and provision a new one on our own, so we are not too dependent on OSUOSL for everything.

  2. Ease of upgrade

  3. Can we basically have console access as if “we were right there standing in front of it” if we screw up

  4. Ability to back up the image as needed when we fuss with it and need to do something like upgrade the OS.

  5. I would really love having some redundancy. This redundancy can be manual if it eases manageability. For example, if we need to upgrade a cluster, being able to move VMs to another.

Thanks,

Regina

Lance,

We're currently shopping for the new machine, and figuring out how we
want to migrate off the older hardware.

In the meantime could we request a VM on your ganeti cluster to explore
if we want to try hosting more things on your systems instead of our own?
We'd like a Debian 9, with 2 cpu, 4 GB Ram, 100 GB hard drive.

We can supply ssh keys for access to root to configure the VM once up.

Thanks,
Alex
OSGeo Sys Admin

On 12/14/2017 04:41 PM, Lance Albertson wrote:

On Thu, Dec 14, 2017 at 2:46 PM, Alex M <tech_dev@wildintellect.com> wrote:

I would say I don't trust osgeo4.

I believe it has a failed drive in its RAID, that we did not replace in
anticipation of moving to new hardware. Also because it already burned
through a couple of replacements, and the RAID rebuild times were
agonizing.

Right, I had forgotten that it's in a failed drive state.

Mind submitting a new ticket for that via support@osuosl.org? Also please include what hostname you want to give it.

Thanks!

On Thu, Jan 18, 2018 at 12:53 PM, Alex M <tech_dev@wildintellect.com> wrote:

Lance,

We’re currently shopping for the new machine, and figuring out how we
want to migrate off the older hardware.

In the meantime could we request a VM on your ganeti cluster to explore
if we want to try hosting more things on your systems instead of our own?
We’d like a Debian 9, with 2 cpu, 4 GB Ram, 100 GB hard drive.

We can supply ssh keys for access to root to configure the VM once up.

Thanks,
Alex
OSGeo Sys Admin

Lance Albertson

Director
Oregon State University | Open Source Lab

How hard would it be to also add more CPU slices to an existing VM?
I'm thinking about TracSVN, which according to Munin has "steal"
events ...

--strk;

On Thu, Jan 18, 2018 at 12:56:55PM -0800, Lance Albertson wrote:

Mind submitting a new ticket for that via support@osuosl.org? Also please
include what hostname you want to give it.

Thanks!

--
Lance Albertson
Director
Oregon State University | Open Source Lab

Sandro,

Are you asking about the TracSVN machine or in general? It's a simple
configuration change and reboot in Ganeti to change the CPU allocation,
assuming there are more CPUs to allocate. In this case I would say
hitting CPU steal indicates that services need to be moved to different
machines to lower the overall load. I was hoping that, now that the Drupal
site is no longer the main website, the overall load on the machine would
drop. However, the Wiki VM appears to be a suspect, with its noticeable
swap usage, which creates disk iowait, which leads to CPU load. Though if
you look at the CPU load chart on TracSVN, it's got plenty of CPU available.
So I think increasing the RAM on Wiki and modifying the kernel
swappiness might help. Let me check the configuration notes of osgeo3
and we can put in a request to increase the RAM allocation of the Wiki VM.
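
To put numbers on the steal and iowait figures discussed above without relying on Munin, one option is to sample the aggregate `cpu` line of `/proc/stat` twice inside a guest and compare the counters. The following is a minimal Python sketch, not something taken from the thread: the function names are made up for illustration, and it assumes a Linux kernel that reports at least eight counters on that line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    counters = [int(value) for value in fields[1:]]
    if len(counters) < 8:
        raise ValueError("kernel does not report a steal counter")
    return counters


def steal_and_iowait_percent(before, after):
    """Return (steal %, iowait %) of CPU time between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so only sum the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0, 0.0
    return 100.0 * deltas[7] / total, 100.0 * deltas[4] / total


if __name__ == "__main__":
    import time

    def read_cpu_line():
        with open("/proc/stat") as stat_file:
            return stat_file.readline()

    first = read_cpu_line()
    time.sleep(5)
    steal, iowait = steal_and_iowait_percent(first, read_cpu_line())
    print(f"steal {steal:.1f}%  iowait {iowait:.1f}%")
```

A high iowait share with a low steal share on the Wiki VM would be consistent with the swap theory, while a persistently high steal share on TracSVN would point at contention on the host rather than inside the guest.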

Thanks,
Alex

On 01/19/2018 10:55 AM, Sandro Santilli wrote:

How hard would it be to also add more CPU slices to an existing VM?
I'm thinking about TracSVN, which according to Munin has "steal"
events ...

--strk;

Hi there,

Just checking in on this again as I’d like to either 1) finish migrating these systems to Chef+CentOS 7 or 2) decommission them.

What’s the status of your migration? Keep in mind these machines have been running with a degraded array for a while now.

Thanks-

Lance Albertson

Director
Oregon State University | Open Source Lab

Lance,

Some things we have migrated off already, but there are some loose ends I am in the middle of inventorying.

We plan to migrate whatever is left to our new LXD container system on osgeo7 and reformat osgeo4 with Ubuntu 18.04 LTS so it can work as a spare for osgeo7.

Are we talking about both osgeo3 and osgeo4? I recall one being in bad shape (with the arrays) – I think it was osgeo4, with osgeo3 being more or less okay – but I’m still feeling my way through the setup of things so could be wrong.

Thanks,

Regina

From: Lance Albertson [mailto:ramereth@osuosl.org]
Sent: Monday, April 08, 2019 2:30 PM
To: tech@wildintellect.com
Cc: System Administration Committee Discussion/OSGeo sac@lists.osgeo.org; systems systems@osuosl.org; sysadmin@osgeo.org
Subject: Re: [SAC] OSGeo Ganeti Cluster

Hi there,

Just checking in on this again as I’d like to either 1) finish migrating these systems to Chef+CentOS 7 or 2) decommission them.

What’s the status of your migration? Keep in mind these machines have been running with a degraded array for a while now.

Thanks-

Lance Albertson

Director

Oregon State University | Open Source Lab

Lance,

Some things we have migrated off already, but there are some loose ends I am in the middle of inventorying.

Great!

We plan to migrate whatever is left to our new LXD container system on osgeo7 and reformat osgeo4 with Ubuntu 18.04 LTS so it can work as a spare for osgeo7.

When you’re ready to do that, let us know; we’ll need to change how the networking is set up on the machine and take it out of our management. In addition, I’d recommend we upgrade the BIOS/firmware on it before you start using it again. I can do that for you if you’d like when you’re ready.

Are we talking about both osgeo3 and osgeo4? I recall one being in bad shape (with the arrays) – I think it was osgeo4, with osgeo3 being more or less okay – but I’m still feeling my way through the setup of things so could be wrong.

I just checked and osgeo3 has no failed drives (currently). However, osgeo4 has one failed drive in a RAID6 and if you’d like to replace it, it seems like it’s going to be fairly cheap [1]. This should be an exact replacement. If you do plan on replacing it, you should buy 2-3 more just in case others fail.

Do you have any idea when you’ll be wrapping the migrations up?

Thanks-

[1] https://www.newegg.com/Product/Product.aspx?Item=N82E16822148538

Lance Albertson

Director
Oregon State University | Open Source Lab