[SAC] Status report for April - System Contract I 2020

Below is a list of things I've been involved with in April, revolving around
core infrastructure:

1) Rebuilding Wiki. I built a new Debian 10 container with the latest
MediaWiki, PHP and Apache installed -- but I'm still exploring the database to
create a script to purge bad user accounts.
I also need to retest the data restore script I have set up and revisit some
plugins that did not come through (there is no upgrade path for them).

https://wiki.new.osgeo.org (this is running on OSGeo3)

I'm hoping to do the final migration sometime in late May.

Once this is done we can set up a dev instance of this new version to do the
LDAP integration. This we can contract out -- we have two proposals from
outside contractors for the LDAP work.

The services below are new or updated ones that use OSGeo LDAP for
authentication:

2) Server Monitoring -- To replace the old Munin monitoring, I put in place
Prometheus / Grafana (https://prometheus.io/docs/visualization/grafana/ ).
This lets me address recent complaints about speed etc. and know when a
container needs more resources.

https://prometheus.io/ - Prometheus is a monitoring tool that already has a
lot of what they call "exporters" available for it; most of the exporters
are written in Go. Some are geared to specific applications, and it seems it
would not take much effort to write our own if needed. Since Go binaries are
statically compiled, these work on old servers as well and can be installed
by just copying the binaries and setting up the service script.
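To make the exporter idea concrete, here is a minimal sketch of what an exporter does: serve plain-text metrics over HTTP for Prometheus to scrape. It is written in Python with only the standard library, purely for illustration (the exporters we actually run are the Go binaries). The metric name sketch_load_average and port 9999 are made up, and os.getloadavg() only works on Unix-like systems.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(load1: float, load5: float) -> str:
    """Render two gauge samples in the Prometheus text exposition format."""
    lines = [
        "# HELP sketch_load_average System load average (hypothetical metric).",
        "# TYPE sketch_load_average gauge",
        f'sketch_load_average{{window="1m"}} {load1}',
        f'sketch_load_average{{window="5m"}} {load5}',
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics, the path Prometheus scrapes by default."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        load1, load5, _load15 = os.getloadavg()  # Unix-only
        body = render_metrics(load1, load5).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9999) -> None:
    """Serve metrics forever; 9999 is an arbitrary example port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at host:9999 would pick these samples up; a service script like the ones in the prometheus-config repo would keep it running.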

Exporters run as services on each container / VM. I haven't installed any
of these on osgeo6 or osgeo5, but have many installed on the new LXD
containers.
I have committed the exporter service scripts to this repo, which so far
covers the node exporter (for OS monitoring) and the nginx exporter (for
nginx monitoring): https://git.osgeo.org/gitea/sac/prometheus-config

The nginx exporter is installed on all the nginx proxies (OSGeo7, OSGeo3,
OSGeo4) and on the download container.

There are 3 Prometheus servers. I have 3 instead of 1 to ease network
management: each Prometheus server is on the same private network as the
containers it pulls metrics from, so no additional whitelisting is needed.
OSGeo7 - nginx (collects all metrics from the OSGeo7 container exporters)
OSGeo3 - monitor (also runs Grafana; collects all metrics from the OSGeo3
containers)
OSGeo4 - osgeo4-nginx (collects metrics from the OSGeo4 containers)
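A sketch of the per-host split, assuming each Prometheus server only lists the containers on its own private network. It renders a prometheus.yml scrape_configs section; the job-name pattern is loosely modeled on the osgeo4_nginx_* job names that show up later in this thread, the target addresses are hypothetical, and 9100 / 9113 are the usual default ports of the node exporter and the nginx exporter.

```python
from typing import List

NODE_EXPORTER_PORT = 9100   # node_exporter's usual default port
NGINX_EXPORTER_PORT = 9113  # nginx-prometheus-exporter's usual default port


def scrape_config(prefix: str, node_targets: List[str],
                  nginx_targets: List[str]) -> str:
    """Render a scrape_configs YAML section for ONE Prometheus server.

    Only targets on that server's private network should be passed in,
    so no extra whitelisting is needed.
    """
    jobs = [
        (f"{prefix}_os_exporter", node_targets, NODE_EXPORTER_PORT),
        (f"{prefix}_web_exporter", nginx_targets, NGINX_EXPORTER_PORT),
    ]
    lines = ["scrape_configs:"]
    for job_name, hosts, port in jobs:
        if not hosts:
            continue  # nothing of this kind to scrape on this host
        lines.append(f"  - job_name: {job_name}")
        lines.append("    static_configs:")
        lines.append("      - targets:")
        lines.extend(f'          - "{host}:{port}"' for host in hosts)
    return "\n".join(lines) + "\n"
```

For example, scrape_config("osgeo4_nginx", ["10.0.4.10"], ["10.0.4.10"]) would produce the section for the OSGeo4 server; OSGeo7 and OSGeo3 would each get their own call with their own containers.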

Grafana is running on the OSGeo3 monitor container -> https://monitor.osgeo.org
(all OSGeo LDAP users can log in to see the metrics).
I whitelisted it on OSGeo7 and OSGeo4 so it can query the Prometheus servers
there for monitoring stats.
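Grafana's Prometheus data source talks to each Prometheus server over the Prometheus HTTP API, which is why that whitelisting is needed. Below is a minimal sketch of the same kind of instant query using only the Python standard library; the base URL in the docstring is hypothetical (9090 is Prometheus' default port). The up metric is 1 for every target whose last scrape succeeded and 0 otherwise, so it is a handy first check when a panel shows No Data.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_instant_vector(payload: str) -> Dict[Tuple[str, str], float]:
    """Map (job, instance) -> value; assumes resultType is 'vector'."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise RuntimeError(f"query failed: {data.get('error', 'unknown')}")
    result = {}
    for series in data["data"]["result"]:
        labels = series["metric"]
        _timestamp, value = series["value"]  # value arrives as a string
        key = (labels.get("job", ""), labels.get("instance", ""))
        result[key] = float(value)
    return result


def query(base_url: str, promql: str,
          timeout: float = 10.0) -> Dict[Tuple[str, str], float]:
    """Run an instant query, e.g. query("http://10.0.0.1:9090", "up")."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))
```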

3) https://repo.osgeo.org -- Jody Garnett spearheaded this effort. The
repo service is running Nexus repository management.
Feel free to log in and check it out; if you want to use it in some way for
your project, please put in a Trac ticket: https://trac.osgeo.org/osgeo/
Nexus supports the following kinds of repos: Maven, Docker Registry, Apt,
Yum, NuGet, R (CRAN), RubyGems, npm (and some other stuff I've never heard of).

At the moment the following projects are using it: GeoTools, GeoServer and
GeoNetwork for their Maven repos, and PostGIS / GEOS to manage the Docker
containers we use for drone bot regression testing.
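For projects pulling Maven artifacts from Nexus, the URLs follow the standard Maven repository layout (dots in the group id become path segments). A small sketch: the repository path https://repo.osgeo.org/repository/release/ and the coordinates in the tests are illustrative assumptions, not a list of what is actually hosted, and SNAPSHOT versions (which use timestamped file names) are not handled.

```python
def maven_artifact_url(repo_base: str, group_id: str, artifact_id: str,
                       version: str, packaging: str = "jar",
                       classifier: str = "") -> str:
    """Return the URL of a release artifact in a Maven-layout repository."""
    group_path = group_id.replace(".", "/")  # org.geotools -> org/geotools
    suffix = f"-{classifier}" if classifier else ""
    file_name = f"{artifact_id}-{version}{suffix}.{packaging}"
    return "/".join(
        [repo_base.rstrip("/"), group_path, artifact_id, version, file_name]
    )
```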

4) https://matrix.osgeo.org -- a container running Matrix (Synapse server,
https://matrix.org/docs/guides/installing-synapse ). It bridges with our
existing IRC channels and can be used for private chat rooms, for example
the GSoC ones.
It requires an OSGeo LDAP account to use. Talk to Sandro Santilli if you
have questions about how to use it.

5) nextcloud.osgeo.org - upgraded to 18.0.3 (Hub) from 15,
https://nextcloud.com/hub/ . There was some discussion, I think from the QGIS
group, about co-editing documents being slow. Our version uses Collabora for
online editing of LibreOffice / MS Office documents.
I am debating whether we should switch to the Community server version, which
uses OnlyOffice for document editing / collaboration (I have installed it on
the experimental https://nextcloud.gallery.osgeo.org ).
I'm waiting for feedback from the QGIS PSC to see if it's worthwhile to
switch. From a cursory play with both, OnlyOffice seems faster (but I have
no one to collaborate with to test the group editing features).
As far as editing goes:
Collabora seems to be better for LibreOffice documents. I had issues trying
to upgrade it, though: the PDF export no longer worked, so I had to revert.
OnlyOffice seems to be better for Microsoft documents (Word, Excel,
PowerPoint). It screwed up one of my LibreOffice .odt docs, and from what I
can tell it can't view LibreOffice drawings.

Thanks,
Regina

On Sat, May 02, 2020 at 11:46:54PM -0400, Regina Obe wrote:

> 2) Server Monitoring -- To replace the old munin monitoring, I put in place
> ...
> The nginx monitoring is installed on all the nginx proxies (OSGeo7, OSGeo3,
> OSGeo4) and download container
> ...
> Grafana is running on OSGeo3 monitor container -> https://monitor.osgeo.org
> (all OSGeo LDAP users can log in to see the metrics).

Where am I supposed to find the graphs of nginx metrics ?
When going to monitor.osgeo.org, and logging in via OSGeo UserID
credentials, and picking "Home" menu item from the "Dashboard"
menu entry (the 4 squares icon), I only see:

  - "Grafana metrics"
  - "Prometheus 2.0 Stats"

Clicking on each of them shows me a few graphs, but all empty (No
Data) except one under "Prometheus 2.0 Stats" called
"Scrape Duration". That graph shows 3 lines:

  - osgeo4-prometheus
  - osgeo4_nginx_os_exporter
  - osgeo4_nginx_web_exported

Where are the actual stats ?
Why can't I see them, if they are present ?

--strk;

_______________________________________________
Sac mailing list
Sac@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/sac

I wonder if it's because I set you up first and I screwed things up initially.

Can you get to this link?

https://monitor.osgeo.org/d/MsjffzSZz7/osgeo7-nginx-by-nginxinc?orgId=2&refresh=5s

Or this:

https://monitor.osgeo.org/d/rYdddlPWk7/node-exporter-full-osgeo7?orgId=2

I admit the navigation is very confusing. I haven't researched how to make it easier.

Thanks,
Regina

-----Original Message-----
From: Regina Obe [mailto:lr@pcorp.us]
Sent: Sunday, May 3, 2020 12:57 PM
To: 'System Administration Committee Discussion/OSGeo'
<sac@lists.osgeo.org>
Subject: RE: [SAC] Server Monitoring (was: Status report for April - System
Contract I 2020)


I usually get to the full list by going here:

https://monitor.osgeo.org/dashboards

I have put each LXD host's dashboards in a separate folder.

For some reason, in the menu they call the dashboards list Dashboards -> Manage.
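The same folder-per-host list can also be pulled from Grafana's HTTP API via GET /api/search, which returns dashboards (type dash-db) together with their folderTitle; dashboards in the General folder come back without one. A Python sketch, assuming an API key created by a Grafana admin (LDAP users normally log in through the browser instead); the dashboard titles in the tests are made up.

```python
import json
import urllib.request
from collections import defaultdict
from typing import Dict, List


def fetch_dashboards(base_url: str, api_key: str) -> List[dict]:
    """Call Grafana's search API, limited to dashboards (not folders)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/search?type=dash-db",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def group_by_folder(hits: List[dict]) -> Dict[str, List[str]]:
    """Group dashboard titles by folder; no folderTitle means 'General'."""
    folders: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        if hit.get("type") != "dash-db":
            continue  # skip folder entries if the type filter was not applied
        folders[hit.get("folderTitle", "General")].append(hit["title"])
    return {name: sorted(titles) for name, titles in folders.items()}
```

Something like group_by_folder(fetch_dashboards("https://monitor.osgeo.org", key)) would give one entry per LXD host folder.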

On Sun, May 03, 2020 at 01:26:25PM -0400, Regina Obe wrote:

> I usually get to the full list by going here
>
> https://monitor.osgeo.org/dashboards

Now I see them all on that page too (not sure what changed).

> For some reason on the Menu, they call dashboards - Manage

Right, while Dashboards - Home still doesn't show me all those
things. Maybe there's a way to "configure" a dashboard showing
selected stats from the different panels in a single view...

--strk;

> 1) Rebuilding Wiki. I built a new debian 10 container with latest
> Wikimedia, PHP, apache installed -- but still exploring the db and stuff to
> create a script to chuck bad user accounts.
> I also need to retest the data restore script I have setup and revisit some
> plugins that did not come thru (no upgrade path for them).
>
> https://wiki.new.osgeo.org (this is running on OSGeo3)
>
> I'm hoping to do the final migration sometime late May.
>
> Once this is done we can setup a dev of this new version to do the LDAP
> integration. This we can contract out -- we have two proposals from outside
> contractors for LDAP work.

I just wanted to be encouraging on this! Really glad to see it going ahead.

> 2) Server Monitoring
>
> Grafana is running on OSGeo3 monitor container -> https://monitor.osgeo.org
> (all OSGeo LDAP users can log in to see the metrics).
> I whitelisted it on OSGeo7 and OSGeo4 so it can query the Prometheus servers
> for monitor stats.

Thanks, I eventually managed to start a dashboard so I could find osgeo3 again quickly.

> 3) https://repo.osgeo.org -- Jody Garnett spear-headed this effort. The
> Repo service is running Nexus repository management.
> Feel free to login and check it out and if you want to use it in some way
> for your projects -- Please put in a trac ticket -
> https://trac.osgeo.org/osgeo/
> Nexus supports the following kinds of repos - Maven, Docker Registry, Apt,
> Yum, Nuget, R CPAN, RubyGems, npm (and some other stuff I've never heard of)
>
> At the moment the following projects are using it (GeoTools, GeoServer,
> GeoNetwork -- for their maven repo), (PostGIS / GEOS to manage docker
> containers we use for drone bot regression testing).

GeoNetwork just voted to use it; I'm going to try and connect it up this week.
GeoServer also plans to use Docker to help facilitate automated CITE testing.

> 4) https://matrix.osgeo.org -- container running Matrix (synapse server
> https://matrix.org/docs/guides/installing-synapse ) bridges with our
> existing IRC channels and can be used for private chat rooms like GSoc ones
> for example
> Requires an OSGeo LDAP account to use. Talk to Sandro Santilli if you have
> questions about how to use it.

I highly recommend this, the killer feature is the ability to see prior chat history.

On Mon, May 04, 2020 at 09:56:21PM -0700, Jody Garnett wrote:

> Grafana is running on OSGeo3 monitor container -> https://monitor.osgeo.org

> Thanks, I eventually managed to start a dashboard so I could find osgeo3
> again quickly.

Can you check if Dashboards can be shared with other users ?
Because I keep having NO dashboards in the "home" page for dashboards
and only see some of them in the "manage" page, where every dashboard
seems to be attached to exactly one of the hosts. I guess it must be
possible to have a single dashboard showing the "health" of all OSGeo
services instead ?

--strk;

Subject: Re: [OSGeo-Discuss] [Projects] Status report for April - System
Contract I 2020


I think everyone who is logged in has access to view all the dashboards,
and admins can edit them. So it's just the navigation that is a bit screwy.
I usually just favorite the ones I like, and then they show on my home page.

Do you want to volunteer to help me figure out the UI stuff? I'm more
focused on making sure we are collecting the metrics we need, and there are
a ton more we could be collecting that would probably be useful.

In the LDAP config there is also a way to segregate people based on LDAP
group and then apply finer-grained control over which dashboards they can view.

The Prometheus queries are all written in PromQL, a special time-series-focused
query language: https://prometheus.io/docs/prometheus/latest/querying/examples/
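As a taste of PromQL: rate(nginx_http_requests_total[5m]) gives nginx requests per second over the last 5 minutes (nginx_http_requests_total is a counter exposed by the nginx exporter). Roughly, rate() takes the increase of a counter over the window, treating any drop as a counter reset, and divides by the time span. The sketch below is a simplified Python version of that idea, not Prometheus' actual implementation; the real rate() also extrapolates to the window edges, so its numbers differ slightly.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Per-second increase of a counter from (unix_seconds, value) samples.

    Simplified version of PromQL rate(): no extrapolation to window edges.
    """
    if len(samples) < 2:
        return None  # need at least two samples, as rate() does
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter only goes down when it resets (e.g. an exporter or
        # nginx restart), so count the new value as growth from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```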

I don't have time to learn all that, so I've just been installing Prometheus
dashboards from the Grafana site that look useful and that query the metrics
we are currently collecting: https://grafana.com/grafana/dashboards
We can study some of the queries and merge the ones we like into a single
dashboard, which I think will satisfy your need for an all-encompassing
dashboard. If you look under the hood at the dashboards' JSON, you'll see
the data sources specified (corresponding to each of the Prometheus servers)
and the job, node, etc.
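When merging panels from the imported dashboards, one quick way to see what each panel queries is to walk the dashboard JSON and list every panel's data source and PromQL expression. A Python sketch, assuming the Grafana 5+ layout with a top-level panels list (collapsed rows keep their children in a nested panels list); the data source name in the tests is hypothetical, and since newer Grafana versions store datasource as an object rather than a name, the sketch just passes it through unchanged.

```python
from typing import Any, Dict, List, Tuple


def panel_queries(dashboard: Dict[str, Any]) -> List[Tuple[str, Any, str]]:
    """Return (panel title, data source, PromQL expr) for every query."""
    found: List[Tuple[str, Any, str]] = []

    def walk(panels: List[Dict[str, Any]]) -> None:
        for panel in panels:
            panel_ds = panel.get("datasource")
            for target in panel.get("targets", []):
                if "expr" in target:  # Prometheus targets carry an 'expr'
                    # A target-level datasource overrides the panel's.
                    ds = target.get("datasource", panel_ds)
                    found.append((panel.get("title", ""), ds, target["expr"]))
            walk(panel.get("panels", []))  # collapsed rows nest their panels

    walk(dashboard.get("panels", []))
    return found
```

Running it over the exported JSON of, say, a node exporter dashboard and an nginx dashboard shows which queries point at which Prometheus server, which is a starting point for one combined dashboard.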

Thanks,
Regina