[Geoserver-devel] Adding MBTiles Support in GWC

Hi,

Some comments:

- If you plan to support also other projections than EPSG:3857, woudn't it be better to call it SQLite cache? If cache is not in EPSG:3857 then it is not MBTiles really. Actually I can see that you have been reading MapCache documentation about its SQLite cache http://mapserver.org/mapcache/caches.html. If the cache is not exactly MBTiles I think it would be good to evaluate if the same db schema than MapCache is using could suit also for GWC. Having some interoperability among two OSGeo project would not make harm unless there is something we can do obviously better.
- I wonder if WAL would suit for this kind of usage http://www.sqlite.org/draft/wal.html
- Replace might be good to be used instead of VACUUM as well
- Have you considered to use ATTACH/DETACH DATABASE http://sqlite.org/lang_attach.html http://sqlite.org/lang_detach.html as a possible alternative for your procedure in Replace operation? I am not sure but it might allow to keep connection to main database open all the time and swap the attached db on-the-fly. If you run service from attached_1, make new attachment as attached_2, swap requests to attached_2 and detach attached_1. Detach will fail during transactions so replacement should be instant instead of almost instant. But it looks like ATTACH and WAL do not suit together well.
- I wonder that if SQLite is completely seeded then GWC could open it as read-only and avoid possible corruption on shared file systems.

-Jukka Rahkonen-

-----Alkuperäinen viesti-----
Lähettäjä: Nuno Oliveira [mailto:nuno.oliveira@…1268…]
Lähetetty: 19. toukokuuta 2016 13:01
Vastaanottaja: Geoserver-devel
Aihe: [Geoserver-devel] Adding MBTiles Support in GWC

Hi all,
sorry for the cross posting.

We would like to add MBTiles support to GWC.
Follows a description of the work with the main issues\limitations.

I would like to have community feedback on this, by the way is there a better way to propose this work ?

* MBtiles and SQLitle *

MBtiles is a specification that describe how to store tiles in an SQLite database, this will allow us to store many tiles in a single SQLite file avoiding us file systems headaches:
https://github.com/mapbox/mbtiles-spec/blob/master/1.1/spec.md.

We can rely on GeoTools gt-mbtiles module for reading and writing MBTiles, this way most of the work of implementing this blobstore will be managing SQLite connections and SQLite files.

SQLite files cannot be managed as simple files. When connections to an SQLite database are open we should not delete, move or switch the associated file. Databases files can be filled with "empty space" after deleting an huge amount of data or can become fragmented after frequent inserts, updates or delete operations.

SQLite documentation warns us against putting databases files on a shared file system if multiple process need access to it (which is our case). Unless we can rely on a distributed lock mechanism SQLite databases files should not be used with shared stores.

* VACUUM and DiskQuota *

To remove the fragmented space (or the empty space), the VACUUM command needs to be executed. Although, performing a VACUUM command as a few
drawbacks:

    - During a VACUUM twice the size of the original database file is required in disk.
    - During the VACUUM operation no access to the database is allowed.
    - The VACUUM operation copies the whole database which can take minutes.

For these reasons the VACUUM command cannot be performed after each operation. When possible we will avoid creating fragmented space. For example, during a truncate operation we may prefer remove a whole SQLilte file instead of deleting part of is content. Another consequence of the fragmented space is that DiskQuota will not be compatible with this blobstore.

* MBTiles Granularity *

Reading and writing tiles on an SQLite database will be slower than writing on a file system but will allow us to avoid file system headaches. In order to limit the amount of contention on each single MBTiles file we will allow users to decide the granularity of the files so that instead of having a single file for each single layer we will allow users to have more granularity.

MBTiles force us to have at least a file per layer and format. If we want to support more CRSs we will also need a file for each CRSs. By configuration it will be possible to configure the granularity of the database files. By default we will have a granularity per layer, crs, format and zoom level. As an instance something like this could be offered:

    <blobstore>
       <file>/path/to/{grid}/{dim}/{tileset}/{z}/{x}-{y}.sqlite</file>
       <xcount>1000</xcount>
       <ycount>1000</ycount>
    </blobstore>

In this case we should include the {x}, {y} and {z} replacements in the template determining the file to use. In the previous example, tile
(z,x,y)=(15,3024,1534) would be stored in a file named /path/to/g/mytileset/15/3000-1000.sqlite3 and tile (5,2,8) would be stored in a file named /path/to/g/mytileset/5/0-0.sqlite3.

With more databases files we have more performance but we will have also more files to manage on the file system. In addition we can couple this with the in-memory cache in order to improve tile serving performance.

* Connection Pooling and Performance *

SQLite allow multiple readers but only allow one writer at the time which will block the entire database. At most only one connection should be open to each SQLite database, the total number of open connections is limited by the number of open files allowed by the OS (in linux this is controlled by the ulimit). A connection pool that will control the number of open connections and that will be responsible to manage the connections will be implemented.

* Replace Operation *

As said before, if the cache is running we cannot simply switch SQLite files, we need to make sure that all connections are closed. A replace operation will be created for this propose. The replace operation will first copy the new file side by side the old one, then block the requests to the old file, tear down the store, delete the old one, rename the new file to current one, reopen the new db file and start serving requests again. Should be almost instant. A REST entry point for this operation will be created (with the possibility to send the new file with the request).

Regards,

--

GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.

Nuno Miguel Carvalho Oliveira
@nmcoliveira
Software Engineer

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003
Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e -mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.

------------------------------------------------------------------------------
Mobile security can be enabling, not merely restricting. Employees who bring their own devices (BYOD) to work are irked by the imposition of MDM restrictions. Mobile Device Manager Plus allows you to control only the apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Thanks for thee feedback. My answers inline:

Le jeudi 19 mai 2016 à 12:03 +0000, Rahkonen Jukka (MML) a écrit :

Hi,

Some comments:

- If you plan to support also other projections than EPSG:3857, woudn't it be better to call it SQLite cache? If cache is not in EPSG:3857
then it is not MBTiles really. Actually I can see that you have been reading MapCache documentation about its SQLite cache
http://mapserver.org/mapcache/caches.html. If the cache is not exactly MBTiles I think it would be good to evaluate if the same db schema
than MapCache is using could suit also for GWC. Having some interoperability among two OSGeo project would not make harm unless there is
something we can do obviously better.

You are correct. Writing MBTiles, GeoPakage or similar specs is just a matter of selecting the correct schema and be aware of the spec
limitations. For now we only want MBTiles, with all the associated restrictions (projection and formats). The code related with SQLitle will
be generic and can be used by other SQLitle based specifications implementations. Basically we will have an abstract SQLitle blobstore.

In the future we may want to create an extended MBTiles bloblstore that will allow us to store different projections or different formats.
The only difference between mbtiles-extend and mbtiles will be the meta-data.

- I wonder if WAL would suit for this kind of usage http://www.sqlite.org/draft/wal.html

Using WAL allow writers a readers to run at the same time (still no write concurrency will be allowed). Some of WAL limitations makes me
avoid it (at least for now), particularly this one: There is an additional quasi-persistent "-wal" file and "-shm" shared memory file
associated with each database, which can make SQLite less appealing for use as an application file-format.

- Replace might be good to be used instead of VACUUM as well

Agree the goal will be to avoid the VACUUM operation and instead perform a REPLACE operation with a new database.

- Have you considered to use ATTACH/DETACH DATABASE http://sqlite.org/lang_attach.html http://sqlite.org/lang_detach.html as a possible
alternative for your procedure in Replace operation? I am not sure but it might allow to keep connection to main database open all the
time and swap the attached db on-the-fly. If you run service from attached_1, make new attachment as attached_2, swap requests to
attached_2 and detach attached_1. Detach will fail during transactions so replacement should be instant instead of almost instant. But it
looks like ATTACH and WAL do not suit together well.

We will allow the user to control the granularity/number of SQLitle files, when we need to write or read a tile the identification of which
file contains that tile must be instantaneous (at least simple as concatenating some values). Even using attach\detach operations at some
point we would have to rename the file to is original name. To take advantage of attach\detach operations we would need to use a more
complex (and less performant) naming strategy and I don't expect that the REPLACE operation will happen very frequently.

- I wonder that if SQLite is completely seeded then GWC could open it as read-only and avoid possible corruption on shared file systems.

For now SQLitle based blobstores should not be used in shared stores, but I agree that allowing a read-only mode (WAL needs to be taken in
consideration) will allow using seeded SQLitle files in a shared store.

-Jukka Rahkonen-

-----Alkuperäinen viesti-----
Lähettäjä: Nuno Oliveira [mailto:nuno.oliveira@anonymised.com]
Lähetetty: 19. toukokuuta 2016 13:01
Vastaanottaja: Geoserver-devel
Aihe: [Geoserver-devel] Adding MBTiles Support in GWC

Hi all,
sorry for the cross posting.

We would like to add MBTiles support to GWC.
Follows a description of the work with the main issues\limitations.

I would like to have community feedback on this, by the way is there a better way to propose this work ?

* MBtiles and SQLitle *

MBtiles is a specification that describe how to store tiles in an SQLite database, this will allow us to store many tiles in a single
SQLite file avoiding us file systems headaches:
https://github.com/mapbox/mbtiles-spec/blob/master/1.1/spec.md.

We can rely on GeoTools gt-mbtiles module for reading and writing MBTiles, this way most of the work of implementing this blobstore will
be managing SQLite connections and SQLite files.

SQLite files cannot be managed as simple files. When connections to an SQLite database are open we should not delete, move or switch the
associated file. Databases files can be filled with "empty space" after deleting an huge amount of data or can become fragmented after
frequent inserts, updates or delete operations.

SQLite documentation warns us against putting databases files on a shared file system if multiple process need access to it (which is our
case). Unless we can rely on a distributed lock mechanism SQLite databases files should not be used with shared stores.

* VACUUM and DiskQuota *

To remove the fragmented space (or the empty space), the VACUUM command needs to be executed. Although, performing a VACUUM command as a
few
drawbacks:

    - During a VACUUM twice the size of the original database file is required in disk.
    - During the VACUUM operation no access to the database is allowed.
    - The VACUUM operation copies the whole database which can take minutes.

For these reasons the VACUUM command cannot be performed after each operation. When possible we will avoid creating fragmented space. For
example, during a truncate operation we may prefer remove a whole SQLilte file instead of deleting part of is content. Another consequence
of the fragmented space is that DiskQuota will not be compatible with this blobstore.

* MBTiles Granularity *

Reading and writing tiles on an SQLite database will be slower than writing on a file system but will allow us to avoid file system
headaches. In order to limit the amount of contention on each single MBTiles file we will allow users to decide the granularity of the
files so that instead of having a single file for each single layer we will allow users to have more granularity.

MBTiles force us to have at least a file per layer and format. If we want to support more CRSs we will also need a file for each CRSs. By
configuration it will be possible to configure the granularity of the database files. By default we will have a granularity per layer,
crs, format and zoom level. As an instance something like this could be offered:

    <blobstore>
       <file>/path/to/{grid}/{dim}/{tileset}/{z}/{x}-{y}.sqlite</file>
       <xcount>1000</xcount>
       <ycount>1000</ycount>
    </blobstore>

In this case we should include the {x}, {y} and {z} replacements in the template determining the file to use. In the previous example,
tile
(z,x,y)=(15,3024,1534) would be stored in a file named /path/to/g/mytileset/15/3000-1000.sqlite3 and tile (5,2,8) would be stored in a
file named /path/to/g/mytileset/5/0-0.sqlite3.

With more databases files we have more performance but we will have also more files to manage on the file system. In addition we can
couple this with the in-memory cache in order to improve tile serving performance.

* Connection Pooling and Performance *

SQLite allow multiple readers but only allow one writer at the time which will block the entire database. At most only one connection
should be open to each SQLite database, the total number of open connections is limited by the number of open files allowed by the OS (in
linux this is controlled by the ulimit). A connection pool that will control the number of open connections and that will be responsible
to manage the connections will be implemented.

* Replace Operation *

As said before, if the cache is running we cannot simply switch SQLite files, we need to make sure that all connections are closed. A
replace operation will be created for this propose. The replace operation will first copy the new file side by side the old one, then
block the requests to the old file, tear down the store, delete the old one, rename the new file to current one, reopen the new db file
and start serving requests again. Should be almost instant. A REST entry point for this operation will be created (with the possibility to
send the new file with the request).

Regards,

--

GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.

Nuno Miguel Carvalho Oliveira
@nmcoliveira
Software Engineer

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003
Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente
riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso.
Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e -mail e di procedere
alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte,
distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal
D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be
confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New
Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either
dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not
the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that
has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent
messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail
transmission, viruses, etc.

------------------------------------------------------------------------------
Mobile security can be enabling, not merely restricting. Employees who bring their own devices (BYOD) to work are irked by the imposition
of MDM restrictions. Mobile Device Manager Plus allows you to control only the apps on BYO-devices by containerizing them, leaving
personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

--

GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.

Nuno Miguel Carvalho Oliveira
@nmcoliveira
Software Engineer

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003
Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono
da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate
nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e
-mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo
anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai
principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of
the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree
June, 30 2003, no.196 - Italy's New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying,
distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named
addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the
information in this message that has been received in error. The sender does not give any warranty or accept liability as the content,
accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which
arise as a result of e-mail transmission, viruses, etc.