[SAC] New Hardware, can we purchase now

I'm in an IRC meeting with Chris, and he recalls that the only outstanding
item before the hardware purchase was the disk size:

[15:17] <TemptorSent> From my reply to the mailing list a while back, the
pricing for larger drives: (+$212 for 4x10he or +$540 for 4x12he)
[15:19] <TemptorSent> That gives us practical double-redundant storage of
12-16TB and 16-20TB respectively, depending how we use it.

If that is all, can we just get the bigger disks and move forward with the
hardware purchase? Unless, of course, the purchase has already been made.

Thanks,
Regina

Apologies, I dropped the ball on many things while traveling for work...

My take: I was unclear on whether we really understood how we would
utilize the hardware for our needs, since there are a few new technologies
under discussion that we haven't used before. I was also in favor of small
savings, as we're over the line item, and that money could be used for
things like people hours, third-party hosting, spare parts, etc.

So a few questions:
1. If we get the Optane card, do we really need the SSDs? What would we
put on the SSDs that would benefit from them, considering the Optane card?
2. What caching tool will we use with the Optane? Something like
fscache/CacheFS that just caches everything accessed, or something
configured per site like Varnish/memcached, etc.?
3. Our storage growth is modest. Not that I consider the quoted 8 or 10 TB
drives unreliable, but the 2 and 4 TB models have a lot more reliability
data and take significantly less time to rebuild in a RAID configuration.
So how much storage do we really need for Downloads and the FOSS4G archives?
4. Do we know what we plan to put on the SSD drives vs. the spinning disks?

I think with the answers to these we'll be able to vote this week and order.

Thanks,
Alex

Hi Alex,
Answers inline below:
Take care,
   ~~~Chris~~~

On Mon, Mar 12, 2018 at 10:41 AM, Alex M <tech_dev@wildintellect.com> wrote:

So a few questions:
1. If we get the Optane card, do we really need the SSDs? What would we
put on the SSDs that would benefit from them, considering the Optane card?

The Optane is intended for caching frequently read data on very fast storage.
As a single unmirrored device, it is not recommended for write-caching of
important data, but it will serve quite well as temporary scratch space.

Mirrored SSDs are required for write caching to prevent a single device
failure from causing data loss. The write cache is very small compared to
the read cache, but its write-to-read ratio is much higher, which is what
drives the larger total DWPD*size rating. The SSDs can also provide fast
tablespace for databases as needed, which likewise see high write
amplification. The total allocated space should probably be 40-60% of the
device size to ensure long-term endurance, and the data stored on the SSDs
can be automatically backed up to the spinning rust on a regular basis for
improved redundancy.
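
To make the DWPD*size and 40-60% allocation points concrete, here is a rough
back-of-the-envelope sketch in Python; the device size, DWPD rating, and
daily write volume are illustrative assumptions, not figures from the quote
or from any datasheet:

# Illustrative endurance math only -- all inputs below are assumptions.
device_size_gb = 480     # hypothetical SATA SSD size
dwpd = 0.3               # hypothetical drive-writes-per-day rating
warranty_years = 5
daily_writes_gb = 200    # hypothetical sustained write-cache + database traffic

endurance_tbw = device_size_gb * dwpd * 365 * warranty_years / 1000
years_of_life = (endurance_tbw * 1000) / (daily_writes_gb * 365)
print(f"Rated endurance: ~{endurance_tbw:.0f} TBW")
print(f"Projected life at {daily_writes_gb} GB/day: ~{years_of_life:.1f} years")
# Allocating only 40-60% of the device leaves spare area, which lowers write
# amplification and stretches the effective endurance beyond this estimate.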

2. What caching tool will we use with the Optane? Something like
fscache/CacheFS that just caches everything accessed, or something
configured per site like Varnish/memcached, etc.?

We can do both if desired, allocating a large cache for the filesystem
(L2ARC in ZFS) as well as providing an explicit cache where useful. This
configuration can be modified at any time, as the system's operation does
not depend on the caching device being active.

3. Our storage growth is modest. Not that I consider the quoted 8 or 10 TB
drives unreliable, but the 2 and 4 TB models have a lot more reliability
data and take significantly less time to rebuild in a RAID configuration.
So how much storage do we really need for Downloads and the FOSS4G archives?

OSGeo-Live alone has a growth rate and retention policy that, from my quick
calculations, indicates needs on the order of 100 GB-1 TB over the next 5
years, not including any additional large datasets. Supporting the geodata
project would likely consume every bit of storage we throw at it and still
be thirsty for more in short order, so I would propose serving only the warm
data on the new server and re-purposing one of the older machines for bulk
cold storage and backups once services have been migrated successfully.
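
As a sanity check on that range, a quick worked calculation; the per-release
size and cadence below are assumptions for illustration only, not OSGeo-Live's
actual figures:

# Rough growth arithmetic; release size and cadence are assumed, not measured.
years = 5
releases_per_year = 2                 # assumed cadence
for release_size_gb in (10, 50):      # assumed size of one release's artifacts
    keep_everything = release_size_gb * releases_per_year * years
    keep_one_per_year = release_size_gb * years
    print(f"{release_size_gb} GB/release: keep all = ~{keep_everything} GB, "
          f"keep one per year = ~{keep_one_per_year} GB over {years} years")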

Remember, the usable capacity will approximately equal the total capacity
of a single drive in a doubly redundant configuration with 4 drives at
proper filesystem fill ratios. We'll gain some due to compression, but we
also want to provision for snapshots and backups of the SSD-based storage,
so 1x a single drive's size is a good SWAG. Resilver times for ZFS are based
on actual stored data, not disk size, and resilvering can be done online
with minimal degradation of service, so I believe that's a moot point.
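
For reference, the capacity arithmetic as a small Python sketch, using the
4-drive, double-redundancy layout and the drive sizes quoted earlier; the
fill ratios are assumptions:

# Capacity arithmetic for a 4-drive pool with two drives' worth of redundancy
# (raidz2-style). Drive sizes are from the quote above; fill ratios are assumed.
drives = 4
redundancy = 2
for drive_tb in (10, 12):
    raw_usable_tb = (drives - redundancy) * drive_tb
    for fill in (0.6, 0.8):           # assumed healthy filesystem fill ratios
        print(f"4x{drive_tb}TB at {fill:.0%} fill: ~{raw_usable_tb * fill:.0f} TB practical")
# Reserving roughly one drive's worth for snapshots and SSD backups then leaves
# about 1x a single drive of freely usable space, which is the SWAG above.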

4. Do we know what we plan to put on the SSD drives vs. the spinning disks?

See (1).

My overall response: I'm a little hesitant to implement so many new
technologies at the same time with only one person who knows them (Chris G).

My opinion
+1 on some use of ZFS, if we have a good guide
-1 on use of Funtoo; we've preferred Debian or Ubuntu for many years and
have more people comfortable with them
+1 on trying LXD
+1 on Optane
?0 on the SSD caching

1. What tool are we using to configure write-caching on the SSDs? I'd
rather not be making an overly complicated database configuration.

2. That seems a reasonable answer to me, though do we still need the
SSDs if we use the Optane for caching? It sounds to me like Optane or
SSD would suffice.

3. Disks - Yes, if we plan to archive OSGeo-Live, that would benefit from
larger disks. I'm -1 on storing data for the geodata committee unless they
can find large data that is not publicly hosted elsewhere, at which point I
would recommend we find partners to host the data, like GeoForAll members
or companies like Amazon/Google, etc. Keep in mind we also need to plan for
backup space. Note: I don't see the total usable disk size for backups in
the wiki; can someone figure that out and add it? We need to update
https://wiki.osgeo.org/wiki/SAC:Backups

New question: which disk are we installing the OS on, and therefore the
ZFS packages?

Thanks,
Alex

Just a note about Funtoo.

It's my understanding that we'd be using hardware from the Funtoo project, and the things we had outlined to run on the Funtoo hardware
were initiatives (experiments) that have been stalled because we have no hardware. So to me this is a supplement to our cloud server strategy.

Chris's comment from IRC (which I think he stated before Alex wrote his note, but which he had expressed previously on IRC):
"Services to bring up on the funtoo containers are nextcloud, weblate, gitlab, and drone, some or all of which we can migrate to our own hardware once it's running, but probably keeping the more compute intensive loads off our primary server."

So to me the key points are:
1) Funtoo will be providing hardware for this.
2) We will be learning how to set up these services, which are things we decided fit our needs and which few of us have much experience with anyway.
There is a whole set of intricacies associated with a service that are specific to the service and will be the same regardless of what hardware/OS we run it on.

3) Later, once we get our new hardware in place (if we can ever agree on a spec), we can move in the services we've decided are useful.
Some we may decide don't quite fit our needs.

BTW, I've added a couple more items to today's agenda (the Funtoo discussion, and also how we manage changes on servers using git, as we seem pretty loosey-goosey there):

https://wiki.osgeo.org/wiki/SAC_Meeting_2018-03-15#Agenda

Thanks,
Regina

On Thu, Mar 15, 2018 at 02:37:39AM -0400, Regina Obe wrote:

BTW I've added a couple more items to today's agenda (the Funtoo discussion, and also how we manage changes on servers using git as we seem pretty loosey goosey there)
https://wiki.osgeo.org/wiki/SAC_Meeting_2018-03-15#Agenda

Thanks a lot, fun and important points (respectively).

I hope we can discuss more on the mailing list too though,
as meetings are not accessible to everybody.

For the loosey-goosey git issue, for example, maybe Martin wants to
start a thread. I suggested using Ansible, but it takes a champion to
drive that change.

--strk;

Here's the latest quote with the modifications Chris suggested.

One question: any reason we can't just use the Optanes for both the read
and write caches?

Otherwise, unless there are other suggestions or clarifications, I will
send out another thread for an official vote to approve. Note the price
is about $1,000 more than originally budgeted.

Thanks,
Alex

On 03/14/2018 09:47 PM, Chris Giorgi wrote:

Further investigation into the chassis shows this is the base system sm is
using:
https://www.supermicro.com/products/system/1U/6019/SYS-6019P-MT.cfm
It has a full-height PCIe 3.0 x8 slot, as well as an M.2 PCIe 3.0 x4 slot
on the motherboard.
In light of this, I am changing my recommendation to the following; please
follow up with sm for pricing:
2 ea. Intel Optane 900p 280GB PCIe 3.0 x4 with U.2 interfaces, replacing the
SATA SSDs, connected to either a SuperMicro AOC-SLG3-2E4 or AOC-SLG3-2E4R
(depending on compatibility).
Then, a single M.2 SSD such as a 512GB Samsung 960 PRO in the motherboard
slot.

With this configuration, the Optanes supply a very fast mirrored write
cache (ZFS ZIL SLOG) while the M.2 card provides read caching (ZFS L2ARC),
with no further cache configuration needed.
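
To make the proposed layout concrete, here is a minimal sketch of the ZFS
commands it implies: one double-redundancy (raidz2) data vdev on the spinning
disks, the two Optanes as a mirrored SLOG, and the M.2 SSD as L2ARC. The
device names are hypothetical placeholders, and the script only prints the
commands for review rather than running them:

# Sketch of the pool layout described above; device names are hypothetical.
layout = [
    "zpool create tank raidz2 sda sdb sdc sdd",   # 4 spinning disks, double redundancy
    "zpool add tank log mirror nvme0n1 nvme1n1",  # mirrored Optane 900p write cache (SLOG)
    "zpool add tank cache nvme2n1",               # M.2 SSD read cache (L2ARC)
]
for cmd in layout:
    print(cmd)   # print only; run by hand once the real device names are known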

Let me know if that sounds more palatable.
   ~~~Chris~~~

On Wed, Mar 14, 2018 at 10:36 AM, Chris Giorgi <chrisgiorgi@gmail.com> wrote:

Alex,

Simply put, write caching requires redundant devices; read caching does not.

The write cache can be relatively small -- it only needs to handle
writes which have not yet been committed to disks. This allows sync
writes to finish as soon as the data hits the SSD, with the write to
disk being done async. Failure of the write cache device(s) may result
in data loss and corruption, so they MUST be redundant for
reliability.

The read cache should be large enough to hold all hot and much of the warm
data. It provides a second-level cache behind the in-memory block cache, so
that cache misses on evicted blocks can be serviced very quickly without
waiting for drives to seek. Failure of the read cache device degrades
performance but has no impact on data integrity.

  ~~~Chris~~~

Can someone confirm that the 4x PCIe slots aren't shared with the M.2 slot on the board and that 2 independent 4x slots are available?

If all 3 (SSD, Optanes) are on a single 4x bus, it kinda defeats the purpose.

Harrison

I'm not sure how we would go about fitting a third Optane device -- the
quote had HHHL PCIe cards listed, not the required U.2 devices, which go in
place of the Micron SATA SSDs.
The PCIe -> U.2 interface card provides 4 PCIe 3.0 lanes to each U.2
interface, which then connects by cable to the drives themselves.
The M.2 card slot on the board should be on its own set of lanes, as none
of the remaining PCIe slots on the board are occupied due to space
constraints.
The reason for using the more expensive (and faster) Optanes for the write
cache is that a write-cache failure can lead to data corruption, and they
have an order of magnitude more write endurance than a standard SSD.
The read cache can use a larger, cheaper (but still fast) SSD because it
sees much lower write amplification than the write cache and a failure
won't cause corruption.

   ~~~Chris~~~

I mean that, while the slot may be x16 electrically, do we know how it's wired on the board, with the onboard RAID, NIC, M.2, etc.? Supermicro has this documented somewhere; I'll see if I can dig it up.

While the life of the Optanes may be a great deal longer than an SSD's, is that material to how hard we'd actually push the ZFS write cache? (Since that's transactional on most implementations, you can minimize the write-amplification issues.)

My thinking is that first-generation drives that will last for 80 years of writes instead of 8 probably aren't worth the significantly increased cost unless we'll actually take advantage of the increased throughput, since we can always swap them for faster/larger units in the future without impacting the pool.

Harrison

Hi Harrison,
I understand what you mean regarding PCIe lanes, but I would be highly
surprised if Supermicro chose to share the lanes of the slot used for the
lone PCIe riser with any other devices, given the large number of PCIe
lanes provided by the controllers on dual Xeon processors. Of course, if
you can verify or refute that, it would be valuable. In any case, U.2 would
still be far faster than a SATA3 interconnect even if it were bridged and
only had four lanes :)

The write cache is responsible for logging every write before returning
from a synchronous write, so each DB transaction to insert, update, or
delete an entry may represent dozens or even hundreds of block writes to
the cache by the time it completes; these blocks are, however, reordered
into a transaction group for the write to the permanent storage pool.
Standard SSDs used in such an application either have to be grossly
oversized and/or replaced on a fairly frequent basis, and they also don't
perform nearly as well in general. The previous-generation SLC-flash-based
enterprise SSDs such as the Intel 3700 have been discontinued, and prices
are absurdly high for those units that are still available. SSDs also tend
to slow noticeably after prolonged use for frequent writes.

The expected useful life of this server is planned for 5-7 years
minimum, and one of the stated criteria is avoiding the need for
on-site modifications or service if at all possible. A rough estimate
is 2-5 years of service for SSDs of around twice the size, vs. 6-10
years on the Optanes, based on both DWPD and MTBF. Please take a look
at the specs and let me know if I missed something.

Thanks,
   ~~~Chris~~~

On Sat, Mar 31, 2018 at 4:05 PM, <harrison.grundy@astrodoggroup.com> wrote:

I mean that, while the slot may be x16 electrically, do we know how it's wired on the board, with the on-board RAID, NIC, M.2, etc.? Supermicro has this somewhere; I'll see if I can dig it up.

While the life of the Optanes may be a great deal longer than an SSD's, is it material to how hard we'd push the ZFS write cache? (Since that's transactional, on most implementations, you can minimize the write amplification issues.)

My thinking is that first generation drives that will last for 80 years of writes instead of 8 probably aren't worth the significantly increased cost unless we'll actually take advantage of the increased throughput, since we can always swap them later for faster/larger/etc units in the future without impacting the pool.

Harrison

  Original Message
From: chrisgiorgi@gmail.com
Sent: March 31, 2018 05:54
To: harrison.grundy@astrodoggroup.com
Cc: tech@wildintellect.com; sac@lists.osgeo.org
Subject: Re: [SAC] New Hardware, can we purchase now

I'm not sure how we would go about fitting a third Optane device --
the quote had HHHL PCIe cards listed, not the required U.2 devices,
which go in place of the Micron SATA SSDs.
The PCIe -> U.2 interface card provides 4 PCIe 3.0 lanes to each U.2
interface, which then connects by cable to the drives themselves.
The M.2 card slot on the board should be on its own set of lanes, as
none of the remaining PCIe slots on the board are occupied due to
space constraints.
The reason for using the more expensive (and faster) Optanes for the
write cache is that a write-cache failure can lead to data corruption,
and they have an order of magnitude more write endurance than a
standard SSD.
The read cache can use a larger, cheaper (but still fast) SSD because
it sees much lower write-amplification than the write cache, and a
failure won't cause corruption.

   ~~~Chris~~~

On Fri, Mar 30, 2018 at 11:53 AM, <harrison.grundy@astrodoggroup.com> wrote:

Can someone confirm that the 4x PCIe slots aren't shared with the M.2 slot on the board and that 2 independent 4x slots are available?

If all 3 (SSD, Optanes) are on a single 4x bus, it kinda defeats the purpose.

Harrison

Sent via the BlackBerry Hub for Android

  Original Message
From: tech_dev@wildintellect.com
Sent: March 31, 2018 02:21
To: sac@lists.osgeo.org
Reply-to: tech@wildintellect.com; sac@lists.osgeo.org
Cc: chrisgiorgi@gmail.com
Subject: Re: [SAC] New Hardware, can we purchase now

Here's the latest quote with the modifications Chris suggested.

One question, any reason we can't just use the Optanes for both read &
write caches?

Otherwise unless there are other suggestions or clarifications, I will
send out another thread for an official vote to approve. Note the price
is +$1,000 more than originally budgeted.

Thanks,
Alex

On 03/14/2018 09:47 PM, Chris Giorgi wrote:

Further investigation into the chassis shows this is the base SM is using:
https://www.supermicro.com/products/system/1U/6019/SYS-6019P-MT.cfm
It has a full-height PCIe 3.0 x8 slot, as well as an M.2 PCIe 3.0 x4
slot on the motherboard.
In light of this, I am changing my recommendation to the following;
please follow up with SM for pricing:
2ea. Intel Optane 900p 280GB PCIe 3.0 x4 with U.2 interfaces,
replacing SATA SSDs
...connected to either a SuperMicro AOC-SLG3-2E4 or AOC-SLG3-2E4R
(depending on compatibility)
Then, a single M.2 SSD such as a 512GB Samsung 960 PRO in the motherboard slot.

With this configuration, the Optanes supply a very fast mirrored write
cache (ZFS ZIL SLOG), while the M.2 card provides read caching (ZFS
L2ARC), and no further cache configuration is needed.
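
As a rough sketch of what that would look like at the pool level -- the pool
name "osgeo" and the device paths below are placeholders, not the real names:

    # Hypothetical provisioning sketch; verify the pool name and device
    # paths before ever running this against a live system.
    import subprocess

    DRY_RUN = True  # flip to False only after verifying everything

    def zpool(*args):
        # Print the intended zpool command; execute only when not a dry run.
        print("zpool", *args)
        if not DRY_RUN:
            subprocess.run(["zpool", *args], check=True)

    # Mirrored write cache (ZIL SLOG) on the two U.2 Optanes.
    zpool("add", "osgeo", "log", "mirror",
          "/dev/disk/by-id/nvme-optane-0", "/dev/disk/by-id/nvme-optane-1")

    # Read cache (L2ARC) on the M.2 SSD; a cache vdev needs no redundancy.
    zpool("add", "osgeo", "cache", "/dev/disk/by-id/nvme-m2-ssd")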

Let me know if that sounds more palatable.
   ~~~Chris~~~

On Wed, Mar 14, 2018 at 10:36 AM, Chris Giorgi <chrisgiorgi@gmail.com> wrote:

Alex,

Simply put, write caching requires redundant devices; read caching does not.

The write cache can be relatively small -- it only needs to handle
writes which have not yet been committed to disks. This allows sync
writes to finish as soon as the data hits the SSD, with the write to
disk being done async. Failure of the write cache device(s) may result
in data loss and corruption, so they MUST be redundant for
reliability.
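
As a rough sizing sketch (Python; the throughput figure and transaction-group
interval are assumptions, not measurements):

    # Rough SLOG sizing: the write cache only has to hold sync writes that
    # have not yet been flushed to disk in a transaction group.
    max_sync_write_mb_s = 300   # assumed worst-case sync-write throughput
    txg_seconds = 5             # typical ZFS transaction group interval
    headroom = 3                # keep a few txgs' worth of headroom

    slog_gb = max_sync_write_mb_s * txg_seconds * headroom / 1024
    print(f"SLOG space actually exercised: ~{slog_gb:.1f} GB")
    # What matters for the write cache device is low sync-write latency and
    # high write endurance, not capacity.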

The read cache should be large enough to handle all hot data and much of the warm
data. It provides a second level cache to the in-memory block cache,
so that cache-misses to evicted blocks can be serviced very quickly
without waiting for drives to seek. Failure of the read cache device
degrades performance, but has no impact on data integrity.

  ~~~Chris~~~

On Wed, Mar 14, 2018 at 9:05 AM, Alex M <tech_dev@wildintellect.com> wrote:

My overall response: I'm a little hesitant to implement so many new
technologies at the same time with only 1 person who knows them (Chris G).

My opinion
+1 on some use of ZFS, if we have a good guide
-1 on use of Funtoo; we've preferred Debian or Ubuntu for many years and
have more people comfortable with them.
+1 on trying LXD
+1 on Optane
?0 on the SSD caching

1. What tool are we using to configure write-caching on the SSDs? I'd
rather not be making an overly complicated database configuration.

2. That seems a reasonable answer to me, though do we still need the
SSDs if we use the Optane for caching? It sounds to me like Optane or
SSD would suffice.

3. Disks - Yes, if we plan to archive OSGeo Live, that would benefit from
larger disks. I'm a -1 on storing data for the geodata committee, unless
they can find large data that is not publicly hosted elsewhere. At which
point I would recommend we find partners to host the data like GeoForAll
members or companies like Amazon/Google etc... Keep in mind we also need
to plan for backup space. Note, I don't see the total usable disk size
of backup in the wiki; can someone figure that out and add it? We need
to update https://wiki.osgeo.org/wiki/SAC:Backups

New question, which disk are we installing the OS on, and therefore the
ZFS packages?

Thanks,
Alex

I'll do some digging and come up with a summary for the board. I suspect you're right on the layout, but I've had a few boards with really nasty PCIe channel layouts. (Sharing the disk controller and 4 port NIC on the same 4x bus is evil!)

On the writes, it depends a lot on how we expect load on the machine to be organized... a single database hitting the write cache can hurt on amplification, but a large group will generally be coalesced effectively as the deeper disk queue is flushed in a single operation. Since ZFS was originally designed for spinning rust, it's pretty clever about doing as much as it can in a single op.

Is there a target in terms of operations per second, database throughput, or something else we can use to calculate the actual load we plan to put on the drives?

On the speed side, since it's the write cache, you can disable the cache, secure-erase the device, and re-enable it to restore prior performance; when I've done SSD-backed ZFS caches, I usually just put that on a weekly crontab.

I can hop on IRC for a chat if you want to run through it quickly. Also, don't hold this up on my account; I'm an intermittent participant at best, and the concerns I've got are outweighed by how helpful the new machine would be, regardless of the answers to the above!

Harrison

Harrison,

From what munin shows, overall write latency is the primary bottleneck
for responsiveness, total device IOPS being next, and read latency
only occasionally showing a significant overall impact.

Most of the write loading appears to be non-sequential DB or
small-file writes, both of which are notoriously bad in terms of write
amplification, and which require a very fast write cache to maintain
good latency, proper write barriers, and reliable transactional
guarantees.

A reasonable goal under load is probably the ~80th percentile of writes
having on the order of 1 ms added latency worst case, the 95th < ~10 ms,
and the 99.9th+ < ~250 ms.
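
For illustration, a tiny sketch of checking measured write latencies against
those targets (Python; the sample values are invented):

    # Check a list of measured per-write latencies (ms) against the targets
    # above; the sample data is made up purely for illustration.
    def percentile(samples, pct):
        ordered = sorted(samples)
        idx = min(len(ordered) - 1, int(round(pct / 100.0 * (len(ordered) - 1))))
        return ordered[idx]

    targets = [(80, 1.0), (95, 10.0), (99.9, 250.0)]  # (percentile, max added ms)

    def meets_targets(latencies_ms):
        return all(percentile(latencies_ms, p) <= limit for p, limit in targets)

    samples = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 4.0, 8.0]
    print(meets_targets(samples))  # True with these made-up samples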

Considering this may be supporting a couple dozen Trac instances, several
WordPress sites, the wiki, Gitea, etc., and the databases backing all of
the above, the peak loading can get reasonably substantial in terms of
IOPS, even if sustained throughput isn't terribly high.

Yes, there are certainly viable ways to extend the performant life of
an SSD, but I'm not sure the administrative overhead or potential
hassles with premature device replacement are worth the savings of a
couple hundred dollars, not to mention the performance difference.

Please do ping me on IRC (TemptorSent on freenode) and we can discuss
the details further.

Take care,
   ~~~Chris~~~

To clarify, I was pondering 2 devices, not 3. The answer may be that you want
the 3 we've already selected so the read cache is separate and larger.

Please let me know if there are any other issues with the config before
we proceed.

Thanks,
Alex

Chris and Harrison,

Can you confirm that this quote is acceptable and we should move on to
voting?

https://drive.google.com/open?id=1M491x3mSl51K1o60Bksulf7KOCqkru55

Thanks,
Alex

On 04/02/2018 10:59 AM, Alex M wrote:

To clarify I was pondering 2 devices not 3. The answer may be you want
the 3 we've already selected so the read cache is separate and larger.

Please let me know if there are any other issues with the config before
we proceed.

Thanks,
Alex

On 03/30/2018 02:54 PM, Chris Giorgi wrote:

I'm not sure how we would go about fitting a third Optane device --
the quote had HHHL PCIe cards listed, not the required U.2 devices
which go in place of the micron sata ssds.
The PCIe -> U.2 interface card provides 4 PCIe 3.0 lanes to each U.2
interface, which then connects by cable to the drives themselves.
The M.2 card slot on the board should be on it's own set of lanes, as
none of the remaining PCIe slots on the board are occupied due to
space constraints.
The reason for using the more expensive (and faster) Optanes for the
write cache is that a write-cache failure can lead to data corruption,
and they have an order of magnitude more write endurance than a
standard SSD.
The read cache can use a larger, cheaper (but still fast) SSD because
it see much lower write-amplification than the write cache and a
failure won't cause corruption.

   ~~~Chris~~~

On Fri, Mar 30, 2018 at 11:53 AM, <harrison.grundy@astrodoggroup.com> wrote:

Can someone confirm that the 4x PCIe slots aren't shared with the M.2 slot on the board and that 2 independent 4x slots are available?

If all 3 (SSD, Optanes) are on a single 4x bus, it kinda defeats the purpose.

Harrison

Sent via the BlackBerry Hub for Android

  Original Message
From: tech_dev@wildintellect.com
Sent: March 31, 2018 02:21
To: sac@lists.osgeo.org
Reply-to: tech@wildintellect.com; sac@lists.osgeo.org
Cc: chrisgiorgi@gmail.com
Subject: Re: [SAC] New Hardware, can we purchase now

Here's the latest quote with the modifications Chris suggested.

One question, any reason we can't just use the Optanes for both read &
write caches?

Otherwise unless there are other suggestions or clarifications, I will
send out another thread for an official vote to approve. Note the price
is +$1,000 more than originally budgeted.

Thanks,
Alex

On 03/14/2018 09:47 PM, Chris Giorgi wrote:

Further investigation into the chassis shows this is the base sm is using:
https://www.supermicro.com/products/system/1U/6019/SYS-6019P-MT.cfm
It has a full-height PCIe 3.0 x8 port, as well as a M2 PCIe 3.0 x4
slot on the motherboard.
In light of this, I am changing my recommendation to the following,
please follow-up with sm for pricing:
2ea. Intel Optane 900p 280GB PCIe 3.0 x4 with U.2 interfaces,
replacing SATA SSDs
..connected to either a SuperMicro AOC-SLG3-2E4R or AOC-SLG3-2E4R
(Depending on compatibility)
Then, a single M.2 SSD such as a 512GB Samsung 960 PRO in the motherboard slot.

With this configuration, the Optanes supply a very fast mirrored write
cache (ZFS ZIL SLOG), while the M.2 card provides read caching (ZFS
L2ARC), and no further cache configuration needed.

Let me know if that sound more palatable.
   ~~~Chris~~~

On Wed, Mar 14, 2018 at 10:36 AM, Chris Giorgi <chrisgiorgi@gmail.com> wrote:

Alex,

Simply put, write caching requires redundant devices; read caching does not.

The write cache can be relatively small -- it only needs to handle
writes which have not yet been committed to disks. This allows sync
writes to finish as soon as the data hits the SSD, with the write to
disk being done async. Failure of the write cache device(s) may result
in data loss and corruption, so they MUST be redundant for
reliability.

The read cache should be large enough to handle all hot and much warm
data. It provides a second level cache to the in-memory block cache,
so that cache-misses to evicted blocks can be serviced very quickly
without waiting for drives to seek. Failure of the read cache device
degrades performance, but has no impact on data integrity.

  ~~~Chris~~~

On Wed, Mar 14, 2018 at 9:05 AM, Alex M <tech_dev@wildintellect.com> wrote:

My overall response, I'm a little hesitant to implement so many new
technologies at the same time with only 1 person who knows them (Chris G).

My opinion
+1 on some use of ZFS, if we have a good guide
-1 on use of Funtoo, We've prefered Debian or Ubuntu for many years and
have more people comfortable with them.
+1 on trying LXD
+1 on Optane
?0 on the SSD caching

1. What tool are we using to configure write-caching on the SSDs? I'd
rather not be making an overly complicated database configuration.

2. That seems a reasonable answer to me, though do we still need the
SSDs if we use the Optane for caching? It sounds to me like Optane or
SSD would suffice.

3. Disks - Yes if we plan to archive OSGeo Live that would benefit from
larger disks. I'm a -1 on storing data for the geodata committee, unless
they can find large data that is not publicly hosted elsewhere. At which
point I would recommend we find partners to host the data like GeoForAll
members or companies like Amazon/Google etc... Keep in mind we also need
to plan for backup space. Note, I don't see the total usable disk size
of backup in the wiki, can someone figure that out and add it. We need
to update https://wiki.osgeo.org/wiki/SAC:Backups

New question, which disk are we installing the OS on, and therefore the
ZFS packages?

Thanks,
Alex

On 03/13/2018 12:57 PM, Chris Giorgi wrote:

Hi Alex,
Answers inline below:
Take care,
   ~~~Chris~~~

On Mon, Mar 12, 2018 at 10:41 AM, Alex M <tech_dev@wildintellect.com> wrote:

On 03/02/2018 12:25 PM, Regina Obe wrote:

I'm in IRC meeting with Chris and he recalls the only outstanding thing
before hardware purchase was the disk size

[15:17] <TemptorSent> From my reply to the mailing list a while back, the
pricing for larger drives: (+$212 for 4x10he or +$540 for 4x12he)
[15:19] <TemptorSent> That gives us practical double-redundant storage of
12-16TB and 16-20TB respectively, depending how we use it.

If that is all, can we just get the bigger disk and move forward with the
hardware purchase. Unless of course the purchase has already been made.

Thanks,
Regina

Apologies, I dropped the ball on many things while traveling for work...

My take on this, I was unclear on if we really understood how we would
utilize the hardware for the needs, since there are a few new
technologies in discussion we haven't used before. Was also in favor of
small savings as we're over the line item, and that money could be used
for things like people hours or 3rd party hosting, spare parts, etc...

So a few questions:
1. If we get the optane card, do we really need the SSDs? What would we
put on the SSDs that would benefit from it, considering the Optane card?

The Optane is intended for caching frequently read data on very fast storage.
As a single unmirrored device, it is not recommended for write-caching of
important data, but will serve quite well for temporary scratch space.

Mirrored SSDs are required for write caching to prevent the failure of a
single device from causing data loss. The write cache is very small
compared to the read cache, but its write-to-read ratio is much higher,
necessitating a larger total DWPD*size rating. The SSDs can also provide
fast tablespace for databases as needed, which likewise see high write
amplification. The total allocated space should probably be 40-60% of the
device size to ensure long-term endurance. The data stored on the SSDs
can be automatically backed up to the spinning rust on a regular basis for
improved redundancy.
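
As a rough illustration of that DWPD*size point, a quick endurance check
might look like the sketch below; every figure in it (capacity, rated
DWPD, daily write volume, write amplification) is a made-up placeholder,
not a measurement of our workload:

    # Rough SSD write-endurance check -- all numbers are assumed examples.
    ssd_capacity_tb = 0.96   # hypothetical ~1 TB class SATA SSD
    rated_dwpd      = 1.0    # drive-writes-per-day the vendor rates over the warranty
    daily_writes_tb = 0.5    # hypothetical sync-write volume funneled through the cache
    write_amp       = 3.0    # hypothetical DB/filesystem write amplification

    budget_tb_per_day = ssd_capacity_tb * rated_dwpd
    load_tb_per_day   = daily_writes_tb * write_amp

    print(f"endurance budget: {budget_tb_per_day:.2f} TB/day")
    print(f"estimated load:   {load_tb_per_day:.2f} TB/day")
    if load_tb_per_day > budget_tb_per_day:
        print("-> need higher-DWPD devices, or keep allocation to 40-60% as above")
    else:
        print("-> within the rated endurance")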

2. What caching tool will we use with the Optane? Something like
fscache/CacheFS that just does everything accessed, or something
configured per site like varnish/memcache etc?

We can do both if we want: allocate a large cache for the filesystem
(L2ARC in ZFS) as well as provide explicit per-site caches where
desirable. This configuration can be modified at any time, as the
system's operation does not depend on the caching device being active.
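
To make the "modified at any time" point concrete: a ZFS cache (L2ARC)
device can be attached to or detached from a live pool with no downtime.
A minimal sketch, assuming a pool named "tank" and a placeholder device
path (it would need root and ZFS installed to actually run):

    # Attach/detach an L2ARC cache device on a live ZFS pool (sketch).
    # "tank" and the device path are placeholders, not our real names.
    import subprocess

    POOL = "tank"
    CACHE_DEV = "/dev/disk/by-id/nvme-EXAMPLE-CACHE"

    def run(*args):
        print("+", " ".join(args))
        subprocess.run(args, check=True)

    run("zpool", "add", POOL, "cache", CACHE_DEV)   # pool keeps serving I/O throughout
    run("zpool", "remove", POOL, CACHE_DEV)         # and the cache can be dropped later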

3. Our storage growth is modest, not that I don't consider the quoted 8
or 10 TB to be reliable, but the 2 and 4 TB models have a lot more
reliability data, and take significantly less time to rebuild in a Raid
configuration. So how much storage do we really need for Downloads and
Foss4g archives?

From my quick calculations, OSGeo-Live alone has a growth rate and
retention policy that points to needs on the order of 100GB-1TB over the
next 5 years, not including any additional large datasets. Supporting the
geodata project would likely consume every bit of storage we throw at it
and still be thirsty for more in short order, so I would propose serving
only the warm data on the new server and re-purposing one of the older
machines for bulk cold storage and backups once services have been
migrated successfully.

Remember, in a doubly redundant configuration with 4 drives, at proper
filesystem fill ratios the usable capacity will approximately equal the
total capacity of a single drive. We'll gain some from compression, but
we also want to provision for snapshots and for backups of the SSD-based
storage, so 1x a single drive's size is a good SWAG. Resilver times for
ZFS depend on the actual stored data, not the disk size, and resilvering
can be done online with minimal degradation of service, so I believe
that's a moot point.
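
A back-of-the-envelope check of that SWAG, using the 4x10TB option as
the example; the fill ratio and snapshot reserve below are illustrative
assumptions, not decisions:

    # Usable-capacity estimate for 4 drives with double redundancy (illustrative numbers).
    drives        = 4
    drive_tb      = 10.0   # the 4x10TB option quoted earlier
    parity_drives = 2      # double redundancy (e.g. RAIDZ2) costs two drives' worth

    raw_usable_tb = (drives - parity_drives) * drive_tb   # ~20 TB before overheads
    fill_ratio    = 0.70   # keep the pool well short of full
    snap_reserve  = 0.20   # headroom for snapshots + SSD-dataset backups

    practical_tb = raw_usable_tb * fill_ratio * (1 - snap_reserve)
    print(f"raw double-redundant space: {raw_usable_tb:.0f} TB")
    print(f"practical planning figure:  {practical_tb:.1f} TB (~one drive's worth)")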

4. Do we know what we plan to put on the SSD drives vs the Spinning Disks?

See (1).

I think with the answers to these we'll be able to vote this week and order.

Thanks,
Alex
_______________________________________________
Sac mailing list
Sac@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/sac


No, it appears this quote has the wrong Optane drives again:
"Optane: 2 x Intel 280GB 900P Series (3D XPoint, 10 DWPD) HHHL PCIe
3.0 x4 NVMe SSD"
is wrong; they should be the U.2 form-factor devices.

On Fri, Apr 13, 2018 at 8:39 AM, Alex M <tech_dev@wildintellect.com> wrote:

Chris and Harrison,

Can you confirm that this quote is acceptable and we should move on to
voting?

https://drive.google.com/open?id=1M491x3mSl51K1o60Bksulf7KOCqkru55

Thanks,
Alex

On 04/02/2018 10:59 AM, Alex M wrote:

To clarify, I was pondering 2 devices, not 3. The answer may be that you
want the 3 we've already selected so the read cache is separate and larger.

Please let me know if there are any other issues with the config before
we proceed.

Thanks,
Alex

On 03/30/2018 02:54 PM, Chris Giorgi wrote:

I'm not sure how we would go about fitting a third Optane device --
the quote had HHHL PCIe cards listed, not the required U.2 devices,
which go in place of the Micron SATA SSDs.
The PCIe -> U.2 interface card provides 4 PCIe 3.0 lanes to each U.2
interface, which then connects by cable to the drives themselves.
The M.2 slot on the board should be on its own set of lanes, as none
of the remaining PCIe slots on the board are occupied due to space
constraints.
The reason for using the more expensive (and faster) Optanes for the
write cache is that a write-cache failure can lead to data corruption,
and they have an order of magnitude more write endurance than a
standard SSD.
The read cache can use a larger, cheaper (but still fast) SSD because
it sees much lower write amplification than the write cache, and a
failure won't cause corruption.
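
For a quick sanity check on why the lane question matters (throughput
figures below are approximate vendor-class numbers, not measurements on
this board):

    # Why two Optanes behind one shared PCIe 3.0 x4 link would be throttled.
    lane_gbps = 0.985            # ~GB/s per PCIe 3.0 lane after 128b/130b encoding
    x4_gbps   = 4 * lane_gbps    # ~3.9 GB/s for a single x4 link
    optane_rd = 2.5              # ballpark sequential read, GB/s, for a 900P-class device

    print(f"one x4 link:       ~{x4_gbps:.1f} GB/s")
    print(f"two busy Optanes:  ~{2 * optane_rd:.1f} GB/s wanted")
    # -> sharing one x4 link caps both devices at ~3.9 GB/s combined,
    #    which is why each U.2 port needs its own four lanes.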

   ~~~Chris~~~

On Fri, Mar 30, 2018 at 11:53 AM, <harrison.grundy@astrodoggroup.com> wrote:

Can someone confirm that the 4x PCIe slots aren't shared with the M.2
slot on the board and that 2 independent 4x slots are available?

If all 3 devices (the M.2 SSD and the two Optanes) are on a single 4x
bus, it kinda defeats the purpose.

Harrison


  Original Message
From: tech_dev@wildintellect.com
Sent: March 31, 2018 02:21
To: sac@lists.osgeo.org
Reply-to: tech@wildintellect.com; sac@lists.osgeo.org
Cc: chrisgiorgi@gmail.com
Subject: Re: [SAC] New Hardware, can we purchase now

Here's the latest quote with the modifications Chris suggested.

One question: any reason we can't just use the Optanes for both read &
write caches?

Otherwise, unless there are other suggestions or clarifications, I will
send out another thread for an official vote to approve. Note the price
is $1,000 more than originally budgeted.

Thanks,
Alex

On 03/14/2018 09:47 PM, Chris Giorgi wrote:

Further investigation into the chassis shows this is the base SuperMicro
system being used:
https://www.supermicro.com/products/system/1U/6019/SYS-6019P-MT.cfm
It has a full-height PCIe 3.0 x8 slot, as well as an M.2 PCIe 3.0 x4
slot on the motherboard.
In light of this, I am changing my recommendation to the following;
please follow up with SuperMicro for pricing:
2 ea. Intel Optane 900P 280GB PCIe 3.0 x4 with U.2 interfaces, replacing
the SATA SSDs, connected to either a SuperMicro AOC-SLG3-2E4 or
AOC-SLG3-2E4R (depending on compatibility).
Then, a single M.2 SSD such as a 512GB Samsung 960 PRO in the motherboard slot.

With this configuration, the Optanes supply a very fast mirrored write
cache (ZFS ZIL SLOG), while the M.2 card provides read caching (ZFS
L2ARC), and no further cache configuration is needed.
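
To spell out what that might look like once the hardware is in hand,
here is a minimal sketch of the pool layout; the pool name and all
device paths are placeholders, and the exact layout would of course be
settled at install time:

    # Sketch: 4 spinning disks in RAIDZ2, the two U.2 Optanes as a mirrored
    # SLOG (sync write log), and the M.2 SSD as L2ARC (read cache).
    # Pool name and device paths are placeholders.
    import subprocess

    POOL   = "tank"
    HDDS   = ["/dev/disk/by-id/ata-HDD0", "/dev/disk/by-id/ata-HDD1",
              "/dev/disk/by-id/ata-HDD2", "/dev/disk/by-id/ata-HDD3"]
    OPTANE = ["/dev/disk/by-id/nvme-OPTANE0", "/dev/disk/by-id/nvme-OPTANE1"]
    M2_SSD = "/dev/disk/by-id/nvme-M2SSD0"

    def run(*args):
        print("+", " ".join(args))
        subprocess.run(args, check=True)

    run("zpool", "create", POOL, "raidz2", *HDDS)        # double-redundant bulk storage
    run("zpool", "add", POOL, "log", "mirror", *OPTANE)  # mirrored SLOG on the Optanes
    run("zpool", "add", POOL, "cache", M2_SSD)           # L2ARC on the M.2 SSD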

Let me know if that sounds more palatable.
   ~~~Chris~~~

_______________________________________________
Sac mailing list
Sac@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/sac