[SAC] FOSS4G meet with OSUOSL update

Just some notes from when I had a chance to sit down with Justin from
OSUOSL.

1. OSL can give Martin (and others if we request) access to the host OS
of osgeo3 and osgeo4 via their VPN, which I think is the same way Martin
set up osgeo5/backup.

2. OSL has spare cycles on an existing set of build machines if we want
to give it a try. Buildbot for sure, and I think Jenkins too.

3. Justin thinks there's enough space on other OSL hardware to shift
osgeo4 VMs for approximately 1 month so we can redo osgeo4, i.e. switch
to software RAID 5, update the host OS, etc. We should schedule this
in advance based on SAC availability to do the required work.

4. OSL has an Xserve machine. We're going to try getting Larry (QGIS)
access to try some QGIS/osgeo4mac build stuff. Long term we might still
want to rack a Mac mini.

5. There is rack space for new machines, just some complicated PDU
balancing. This should get easier as OSL turns off some power hogs in
the near future.

I didn't get a chance to talk to Daniel who was also around the conf,
but I've been told he's the person to really talk to about Docker usage
and Mac build stuff.

Summary: we might be able to put off buying a machine for 6 months to a
year, but I wouldn't want to hold off longer than that, as our machines
are nearing 4 years in age.

More background:
http://wiki.osgeo.org/wiki/Infrastructure_Transition_Plan_2014

Thanks,
Alex

On 09/11/2014 11:10 AM, Alex Mandel wrote:

[...]

Some follow-up details and ideas after talking with more SAC members at
the conference.

1. OSUOSL did point out that if we switch to software RAID, it will be a
little more difficult to swap disks. I assume Martin S. knows what needs
to be done in order to tell the RAID to drop a disk and be ready for the
new one. Question: does software RAID allow hot-swapping disks?

2. I forgot to talk with OSUOSL about upgrading/changing the OS on
osgeo4. Since we are redoing the disks I suspect this will need to
happen. I will email them.

Proposal Work Plan:

Step 1: Offload osgeo4

1. Hotcopy mail from osgeo3 to osgeo4. Minimal downtime.
2. Get a new VM or Docker, or Docker in a VM (Debian 7), on the OSUOSL
Ganeti cluster. Invite projects to migrate their stuff from the Projects
VM over to the new spot.
3. Do something similar for adhoc.
4. Do something similar for qgis (I'll coordinate with them).

Step 2: Redo osgeo4
- Software RAID 5
- New Debian or CentOS
- Docker or Ganeti

Step 3: Move stuff back to our hardware

Step 4: Buy new hardware (sometime in the next year)
Invite projects to a new sac-announce list which sends out notices about
upcoming events. Announce a survey of project needs, and create a web
page that clearly states who and how to request services.

Tangents:
Set up a Cloudflare account, free tier, possibly one account per domain.
This should cut out some nefarious traffic.

Talk more with the board about a proposal to solicit bids for part-time
system support for OSGeo.
- 24 hour response team
- When not doing emergency response, work on a SAC list of management tasks
- OSGeo SAC members would be allowed to bid but not be allowed to vote
if they bid.
- Terms would run ~6-12 months before a rebid is needed.

I'll try to get this all cleaned up and into the wiki next week.

Thanks,
Alex

Hi, please excuse the late reply. I was so enthusiastic about the
GRASS/PostGIS stuff I did last weekend that I completely forgot about
the admin side of the coin ;-)

On Thu, Sep 11, 2014 at 11:10:01AM -0700, Alex Mandel wrote:

1. OSL can give Martin (and others if we request) access to the host OS
of osgeo3 and osgeo4 via their VPN. Which I think is the same way Martin
setup osgeo5/backup.

Sounds good, so we can detect and inspect issues without depending on
OSL's support.
Setting up the backup machine was a little bit more complex. I was
given access to some non-public network and then, as far as memory
serves, connected to a console-to-IP adapter using some special browser
plugin.

3. Justin thinks there's enough space on other OSL hardware to shift
osgeo4 vms for approximately 1 month so we can redo osgeo4. AKA switch
to software raid 5, update the host OS, etc... We should schedule this
in advance based on SAC availability to do required work.

I have to admit I have no idea what exactly they did to get 'our'
Ganeti setup running. I suspect they apply some sort of automated
install onto every machine in order to keep all affected systems
compatible with each other, right?
Therefore I also suspect they have a preferred host OS for good reasons
and I don't think it's advisable to question their choice. I also
didn't mean to question Ganeti in general; it was just that having such
limited access was a bit unfortunate.

On Sat, Sep 13, 2014 at 10:37:55AM -0700, Alex Mandel wrote:

1. OSUOSL did point out that if we switch to software raid, it will be a
little more difficult to swap disks. I assume Martin S. knows what needs
to be done in order to tell the raid to drop a disk and be ready for the
new one. Question, does software raid allow hot swapping disks?

Yes - but even though I've done this many times, I still always look at
the manual before removing a disk from the set ;-)
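
For reference, the usual disk swap on a Linux software RAID can be
sketched as below. This is only an illustration, assuming the array is
managed with mdadm; the device names (/dev/md0, /dev/sdb1, /dev/sdc1)
and the function name are placeholders, not the real osgeo4 layout. The
script only builds and prints the commands, it does not run them.

```python
import shlex


def md_disk_swap_commands(array, old_disk, new_disk):
    """Return the mdadm calls for swapping one member of an md array.

    All three arguments are placeholder device paths chosen by the caller.
    """
    return [
        # Mark the old member as faulty so md stops using it.
        ["mdadm", "--manage", array, "--fail", old_disk],
        # Detach the failed member from the array. Whether the disk can
        # then be pulled while powered depends on the controller and bay.
        ["mdadm", "--manage", array, "--remove", old_disk],
        # Add the replacement; md then starts rebuilding onto it.
        ["mdadm", "--manage", array, "--add", new_disk],
    ]


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for cmd in md_disk_swap_commands("/dev/md0", "/dev/sdb1", "/dev/sdc1"):
        print(shlex.join(cmd))
```

Rebuild progress after the last step shows up in /proc/mdstat.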

2. I forgot to talk with OSUOSL about upgrading/changing the OS on
osgeo4. Since we are redoing the disks I suspect this will need to
happen. I will email them.

See above.

Proposal Work Plan:

Step 1: Offload osgeo4

1. Hotcopy mail from osgeo3 to osgeo4.

.... "from osgeo4 to osgeo3" in order to offload osgeo4 !?

2. Get new VM or docker, or docker in a vm (Debian 7) on OSUOSL ganeti
cluster. Invite projects to migrate their stuff from projects over to
the new spot.

I think inviting projects to migrate their stuff is the most
unpredictable item in the entire plan :-)

Practically speaking I'd prefer a plan which allows changing from HW
RAID to SW RAID (and maybe updating the host OS, if required)
independent from any other migration plans. Thus, if OSL offers to run
our VM's on their cluster for a limited period, then I'd like to do
exactly this: Move our VM's off osgeo4, redo osgeo4 (using OSL's
favourite host OS and Ganeti) and move the VM's back to osgeo4. If we
expect the projects to migrate their stuff in the same period, then we
might fail to meet the schedule.

Aside from that, I think the migration to Docker ("Docker" used as a
synonym for Linux containers) is a different task. One of the main
benefits of using containers is to make better use of the specific
topology of the hardware, much better than virtualization can do.
Therefore I'd refrain from using containers inside a VM, because the
net benefit isn't convincing, and instead start using containers on new
hardware with no virtualization in place.

While we're at it: Are we talking about redoing osgeo4 only, not osgeo3
as well ?

Step 4: Buy new hardware (sometime in the next year)

See above.

Invite projects to a new sac-announce list which sends out notices about
upcoming events.

I'd say we should ask projects to subscribe to such a list. As far as
I understand, communication to projects was mostly done over private
channels in the past and, as a consequence, they expected the
continuation of private 'care' in case of upcoming changes. This
didn't work out as planned, as we know ;-)

- 24 hour response team

Hah, as far as I understand, the primary reason why *I* was invited
into the admin team was the fact that I don't live in any of the US
time zones :-)

Best regards,
  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

On 09/16/2014 06:01 AM, Martin Spott wrote:

Hi, please excuse the late reply. I was so enthusiastic about the
GRASS/PostGIS stuff I did last weekend that I completely forgot about
the admin side of the coin ;-)

On Thu, Sep 11, 2014 at 11:10:01AM -0700, Alex Mandel wrote:

1. OSL can give Martin (and others if we request) access to the host OS
of osgeo3 and osgeo4 via their VPN. Which I think is the same way Martin
setup osgeo5/backup.

Sounds good, so we can detect and inspect issues without depending on
OSL's support.
Setting up the backup machine was a little bit more complex. I was
given access to some non-public network and then, as far as memory
serves, connected to a console-to-IP adapter using some special browser
plugin.

It's the same deal: you have to VPN into the non-public network. The
console-to-IP adapter is only needed when the main host isn't booting;
otherwise you can ssh to the host.

3. Justin thinks there's enough space on other OSL hardware to shift
osgeo4 vms for approximately 1 month so we can redo osgeo4. AKA switch
to software raid 5, update the host OS, etc... We should schedule this
in advance based on SAC availability to do required work.

I have to admit I have no idea what exactly they did to get 'our'
Ganeti setup running. I suspect they apply some sort of automated
install onto every machine in order to keep all affected systems
compatible with each other, right?
Therefore I also suspect they have a preferred host OS for good reasons
and I don't think it's advisable to question their choice. I also
didn't mean to question Ganeti in general; it was just that having such
limited access was a bit unfortunate.

All I know is that the host OS is currently Gentoo and OSUOSL has
mentioned that they now standardize on CentOS. Yes, I too would assume
they use something to keep all the nodes the same, possibly Chef (Justin
said that's what they tend to use).

On Sat, Sep 13, 2014 at 10:37:55AM -0700, Alex Mandel wrote:

1. OSUOSL did point out that if we switch to software raid, it will be a
little more difficult to swap disks. I assume Martin S. knows what needs
to be done in order to tell the raid to drop a disk and be ready for the
new one. Question, does software raid allow hot swapping disks?

Yes - but even though I've done this many times, I still always look at
the manual before removing a disk from the set ;-)

2. I forgot to talk with OSUOSL about upgrading/changing the OS on
osgeo4. Since we are redoing the disks I suspect this will need to
happen. I will email them.

See above.

Proposal Work Plan:

Step 1: Offload osgeo4

1. Hotcopy mail from osgeo3 to osgeo4.

.... "from osgeo4 to osgeo3" in order to offload osgeo4 !?

Yes, to offload osgeo4 and have zero downtime, or less than 1 minute of
downtime, for Mail.

Long version: enable DRBD, wait a few hours for it to sync, fail over to
osgeo3, then remove DRBD. It's a trick to shuffle a non-DRBD instance
with close to zero downtime. It only works between nodes on the same
Ganeti cluster.
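
The DRBD shuffle described above could look roughly like the following
sequence of Ganeti calls. This is a sketch under assumptions, not the
exact procedure OSUOSL would run: the instance and node names are
placeholders, the function name is made up for illustration, and
depending on the Ganeti version the disk template conversions may
require the instance to be stopped briefly. The script only prints the
commands.

```python
import shlex


def drbd_shuffle_commands(instance, target_node):
    """Build the Ganeti calls to move a plain-disk instance to another node.

    `instance` and `target_node` are placeholder names supplied by the caller.
    """
    return [
        # Convert the plain disk to DRBD with target_node as the secondary;
        # the mirror then needs time to sync.
        ["gnt-instance", "modify", "-t", "drbd", "-n", target_node, instance],
        # Once the sync has finished, switch the instance to the secondary.
        ["gnt-instance", "failover", instance],
        # Convert back to a plain disk, dropping the DRBD mirror.
        ["gnt-instance", "modify", "-t", "plain", instance],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in drbd_shuffle_commands("mail.example.org", "node3.example.org"):
        print(shlex.join(cmd))
```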

2. Get new VM or docker, or docker in a vm (Debian 7) on OSUOSL ganeti
cluster. Invite projects to migrate their stuff from projects over to
the new spot.

I think inviting projects to migrate their stuff is the most
unpredictable item in the entire plan :-)

Agreed, see more below.

Practically speaking I'd prefer a plan which allows changing from HW
RAID to SW RAID (and maybe updating the host OS, if required)
independent from any other migration plans. Thus, if OSL offers to run
our VM's on their cluster for a limited period, then I'd like to do
exactly this: Move our VM's off osgeo4, redo osgeo4 (using OSL's
favourite host OS and Ganeti) and move the VM's back to osgeo4. If we
expect the projects to migrate their stuff in the same period, then we
might fail to meet the schedule.

For me there are 2 considerations:
1. Downtime - my proposal has almost 0 downtime by shuffling stuff
between running machines. Moving a VM between clusters requires downtime
of hours.

2. Cruft: Markus' biggest complaint about the current Projects VM is
that there seems to be too much stuff in one place to figure out what's
actually causing the troubles.

Aside from that, I think the migration to Docker ("Docker" used as a
synonym for Linux containers) is a different task. One of the main
benefits of using containers is to make better use of the specific
topology of the hardware, much better than virtualization can do.
Therefore I'd refrain from using containers inside a VM, because the
net benefit isn't convincing, and instead start using containers on new
hardware with no virtualization in place.

While we're at it: Are we talking about redoing osgeo4 only, not osgeo3
as well ?

Nope, just osgeo4; osgeo3 is running OK right now. Though after we redo
osgeo4 we could migrate all the VMs over to osgeo4 from osgeo3 so we can
upgrade the host OS on osgeo3 to match. Since the RAID 5 on osgeo3 seems
to perform OK, I'd rather not touch it.

Step 4: Buy new hardware (sometime in the next year)

See above.

I suggest Docker for the new hardware as OSUOSL is familiar with it and
it might be a better utilization of our hardware. It also means that
once a project writes a docker setup script, we know how to deploy it
anywhere.

Invite projects to a new sac-announce list which sends out notices about
upcoming events.

I'd say we should ask projects to subscribe to such a list. As far as
I understand, communication to projects was mostly done over private
channels in the past and, as a consequence, they expected the
continuation of private 'care' in case of upcoming changes. This
didn't work out as planned, as we know ;-)

Markus filed a ticket to make a Sac-announce list, Jachym (Board
Secretary) said he will forward a request for projects to join once I
send it to him.

- 24 hour response team

Hah, as far as I understand, the primary reason why *I* was invited
into the admin team was the fact that I don't live in any of the US
time zones :-)

Sure, the board just wants to explore ways to fund support services,
since you and I aren't always enough to respond to all emergencies.

Best regards,
  Martin.

Thanks,
Alex

On Tue, Sep 16, 2014 at 10:52:06AM -0700, Alex Mandel wrote:

On 09/16/2014 06:01 AM, Martin Spott wrote:

> Practically speaking I'd prefer a plan which allows changing from HW
> RAID to SW RAID (and maybe updating the host OS, if required)
> independent from any other migration plans. Thus, if OSL offers to run
> our VM's on their cluster for a limited period, then I'd like to do
> exactly this: Move our VM's off osgeo4, redo osgeo4 (using OSL's
> favourite host OS and Ganeti) and move the VM's back to osgeo4. If we
> expect the projects to migrate their stuff in the same period, then we
> might fail to meet the schedule.
>
For me there are 2 considerations:
1. Downtime - my proposal has almost 0 downtime by shuffling stuff
between running machines. Moving a VM between clusters requires downtime
of hours.

I agree that downtime is a little bit unfortunate for the main
communication relay, the "mail" VM, but for the other ones ? I mean,
if we announce a maintenance window let's say one week in advance, I'd
say this should be ok.
I can't imagine how to estimate the time required to move let's say all
projects to Docker, therefore I'd prefer not to mix host OS renovation
and migration to Docker.

2. Cruft: Markus' biggest complaint about the current Projects VM is
that there seems to be too much stuff in one place to figure out what's
actually causing the troubles.

Mmmmmh, I'd say we should try to solve one problem at a time.

- First there's the RAID which doesn't perform as planned and we're
trying to fix this issue by converting HW RAID into SW RAID.
- Second there's a lot of cruft on the "projects" VM which deserves a
cleanup.
- Third we don't have access to the host OS - which, from my
perspective, is the main reason why we've been unable to perform a
reasonable diagnosis of the current trouble. It's not the cruft on the
"projects" VM which prevents better diagnosis - at least in my
opinion.

I suggest Docker for the new hardware [...]

Agreed.

Cheers,
  Martin.

hi,

2014-09-16 19:52 GMT+02:00 Alex Mandel <tech_dev@wildintellect.com>:

On 09/16/2014 06:01 AM, Martin Spott wrote:

[ ....]

I'd say we should ask projects to subscribe to such a list. As far as
I understand, communication to projects was mostly done over private
channels in the past and, as a consequence, they expected the
continuation of private 'care' in case of upcoming changes. This
didn't work out as planned, as we know ;-)

Markus filed a ticket to make a Sac-announce list, Jachym (Board
Secretary) said he will forward a request for projects to join once I
send it to him.

we agreed on creating a questionnaire for the projects, in order to find
out what they need on the new servers, right?

anything else? (sorry if I forgot something, it was a long day)

thanks

Jachym

--
Jachym Cepicky
e-mail: jachym.cepicky gmail com
URL: http://les-ejk.cz
GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp

Give your code freedom with PyWPS - http://pywps.wald.intevation.org

Quick summary of the plan:

1. Hotcopy failover Mail from osgeo4 to osgeo3

2. Migrate Projects and Adhoc as-is to OSUOSL Ganeti hosts

3. Make a fresh Debian 7 VM for QGIS; they only have 2 sites now and
have people who can help with the migration. I will coordinate with them.

Once osgeo4 is clear, OSUOSL will disconnect the hardware RAID. Martin or
OSUOSL will reinstall the host OS with something newer, using software
RAID 5 across the disks.

We'll test the performance of the disks.

After clearing tests we will migrate all 4 VMs back.

If that sounds good I will start filing tickets with OSUOSL for the
tasks they are involved in.

Thanks,
Alex

On Thu, Sep 11, 2014 at 11:10 AM, Alex Mandel
<tech_dev@wildintellect.com> wrote:

Just some notes from when I had a chance to sit down with Justin from
OSUOSL.

3. Justin thinks there's enough space on other OSL hardware to shift
osgeo4 vms for approximately 1 month so we can redo osgeo4. AKA switch
to software raid 5, update the host OS, etc... We should schedule this
in advance based on SAC availability to do required work.

...

Summary: we might be able to put off buying a machine for 6 months to a
year, but I wouldn't want to hold off longer than that, as our machines
are nearing 4 years in age.

Will osgeo4 be on new infrastructure in six months?

From an observer perspective, I see the most limited resource to be
SAC members' very kindly donated volunteer hours. Making more SAC
work (migrate osgeo4-->OSUOSL-->osgeo4-->new infrastructure in 6
months) to delay hardware purchases doesn't seem like the best use of
SAC time. That being said, I'm not the one doing the work, so I leave it
to those doing the work to suggest purchasing infrastructure and
migrating osgeo4-->new infrastructure. I know that some of the work
needs to be done either way, that sometimes the 'simpler' route isn't
actually easier, and that sometimes things just need to get done
sooner than other plans allow.

Best regards, Eli

On 09/18/2014 10:12 AM, Eli Adam wrote:

[...]

If it performs well enough, then no additional migration is needed. We
have no firm plans on new hardware but do have a need to fix the
performance now. We also don't know what exactly we plan to do with the
new hardware when we get it.

If I had confidence that there would be volunteers to clean up and
migrate to fresh installs, then we would do so.

Thanks,
--
Alex Mandel
http://wildintellect.com

On Wed, Sep 17, 2014 at 10:00:48PM -0700, Alex Mandel wrote:

Quick summary of the plan:

Sounds good to me, I'd suggest starting the disk mirror for "mail"
right now and planning the migration to and from OSL infrastructure in
calendar week 41 and 42.
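
For anyone who doesn't think in week numbers: assuming ISO 8601 week
numbering is meant here (an assumption, and the helper name is just
illustrative), the dates for those two weeks can be worked out like
this:

```python
from datetime import date


def iso_week_range(year, week):
    """Return (Monday, Sunday) of the given ISO 8601 calendar week."""
    monday = date.fromisocalendar(year, week, 1)
    sunday = date.fromisocalendar(year, week, 7)
    return monday, sunday


if __name__ == "__main__":
    for week in (41, 42):
        start, end = iso_week_range(2014, week)
        print(f"week {week}: {start} to {end}")
```

Under that assumption, week 41 runs from 2014-10-06 to 2014-10-12 and
week 42 from 2014-10-13 to 2014-10-19.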

Cheers,
  Martin.

On 09/18/2014 11:50 AM, Martin Spott wrote:

On Wed, Sep 17, 2014 at 10:00:48PM -0700, Alex Mandel wrote:

Quick summary of the plan:

Sounds good to me, I'd suggest starting the disk mirror for "mail"
right now and planning the migration to and from OSL infrastructure in
calendar week 41 and 42.

Cheers,
  Martin.

I'll be unavailable week 41 and part of week 42. But as long as the team
is on board it should be possible. Maybe we should start week 42, I'll
check with OSUOSL on schedule.

Thanks,
Alex

On Thu, Sep 18, 2014 at 06:58:32PM -0700, Alex Mandel wrote:

On 09/18/2014 11:50 AM, Martin Spott wrote:

> Sounds good to me, I'd suggest starting the disk mirror for "mail"
> right now and planning the migration to and from OSL infrastructure in
> calendar week 41 and 42.

I'll be unavailable week 41 and part of week 42. But as long as the team
is on board it should be possible. Maybe we should start week 42, I'll
check with OSUOSL on schedule.

In 41/42 I can take care of some stuff, in 43 the usual rat race kicks
in again. That's just an offer, if you prefer starting in 42, that's
ok with me.

Cheers,
  Martin.