[GRASS-user] To import from or to link to data stored in a PostgreSQL data base?

Dears,

the following concerns an update of an existing workflow, part of which
is GRASS GIS, that makes use of a large PostgreSQL data base which does
not reside locally.

The original data set consists of tens of thousands of (overlapping)
polygons. The data are required solely to build raster MASKs.
So importing the whole of it is overkill. Instead, the options, already
working, are to split the records into single tables or views, then
access these via GRASS to perform some analytics.

First instructions of the workflow are:

- read an (external) vector map
- set the computational region
- build a raster mask.
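
For illustration, the three steps could look like this inside a GRASS GIS session. This is only a minimal sketch, not the commands actually used: the connection string, the table name `polygon_00001` and the 100 m resolution are placeholders. By default the script just prints the commands (without shell quoting); set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the workflow): connection string, resolution.
PG_DSN="${PG_DSN:-PG:host=db.example.org dbname=gis user=grass}"
RES="${RES:-100}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it as well only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Build a raster MASK from one table or view; $1 is its (hypothetical) name.
build_mask() {
    # 1. read (link to) the external vector map
    run v.external input="$PG_DSN" layer="$1" output="$1" --overwrite
    # 2. set the computational region to the map extent, aligned to RES
    run g.region vector="$1" res="$RES" -a
    # 3. build the raster MASK
    run r.mask vector="$1" --overwrite
}

build_mask polygon_00001
```

Replacing `v.external` with `v.in.ogr` in step 1 gives the import variant; the other two steps stay the same.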

Building a MASK using a pseudo vector map that links to an
external table, stored in a PostgreSQL data base, is many times slower
than importing the vector of interest in GRASS GIS and then building a
MASK using the "native" GRASS GIS vector map.

Giacomo timed different options, using `v.external` as well as importing
the data using `v.in.ogr`. Specifically,

- building a MASK using one pseudo vector map (without and with a
  spatial-index), takes about 9 minutes (real time).

time r.mask vector=test_nogeoindex --o

real 8m40.306s
user 5m14.225s
sys 0m56.378s

and

time r.mask vector=test_geoindex --o

real 8m46.096s
user 5m15.693s
sys 0m56.346s

- building a MASK using a native GRASS GIS vector map, imported via a
  table or a view, takes about 0.4 seconds.

real 0m0.373s
user 0m0.191s
sys 0m0.111s

and

real 0m0.350s
user 0m0.179s
sys 0m0.115s

For the latter, building a view is way faster than a table (half a
minute for more than 20000 views, while it would take approximately an
hour to build single tables).
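
A view stores only a query definition and copies no rows, which fits the difference reported here. As a rough sketch of how one view per record might be generated, the function below emits one `CREATE VIEW` statement per id. The table name `polygons`, the key column `id` and the `mask_` prefix are hypothetical, as is the `DATABASE_URL` variable in the usage comment.

```shell
#!/bin/sh
# Hypothetical names: source table "polygons", integer key column "id".
SRC_TABLE="${SRC_TABLE:-polygons}"
KEY_COL="${KEY_COL:-id}"

# Emit one CREATE VIEW statement per integer id read from stdin.
make_view_sql() {
    while read -r id; do
        case "$id" in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        printf 'CREATE OR REPLACE VIEW mask_%s AS SELECT * FROM %s WHERE %s = %s;\n' \
            "$id" "$SRC_TABLE" "$KEY_COL" "$id"
    done
}

# Example: statements for ids 1 to 3. To apply them, pipe into psql, e.g.
#   seq 1 20000 | make_view_sql | psql "$DATABASE_URL"
printf '%s\n' 1 2 3 | make_view_sql
```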

The trade-off appears to be space vs time. If data are imported, more
disk space is required. If data are not imported, and `v.external` is
used, then `r.mask` takes too much time to build a raster MASK.

- Is it acceptable for `r.mask` to take so long in building a MASK based
  on an external vector map stored in a PostgreSQL data base?

- Is network connection a limiting factor here, since the PG data base
  is not local?

- Would anyone have any recommendations/considerations on this approach?

Thank you, Nikos

Hei Nikos,

What about using gdal_rasterize and then r.external output=MASK?

Cheers

Stefan
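
A sketch of what this suggestion might look like, under assumptions: the connection string, layer name and output path are placeholders, and the extent and resolution passed to `gdal_rasterize` are expected to match the current computational region (for example as reported by `g.region -g`). Cells outside the polygons are written as 0 and declared nodata, so GRASS should read them as NULL. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholders (assumptions): connection string, layer name, output path.
PG_DSN="${PG_DSN:-PG:host=db.example.org dbname=gis user=grass}"
LAYER="${LAYER:-polygon_00001}"
OUT_TIF="${OUT_TIF:-/tmp/mask_${LAYER}.tif}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it as well only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Rasterize the polygon layer outside GRASS, then link the result as MASK.
# Arguments: west south east north ewres nsres
mask_via_gdal() {
    run gdal_rasterize -burn 1 -ot Byte -a_nodata 0 -init 0 \
        -te "$1" "$2" "$3" "$4" -tr "$5" "$6" \
        -l "$LAYER" "$PG_DSN" "$OUT_TIF"
    run r.external input="$OUT_TIF" output=MASK --overwrite
}

# Example with a made-up 1000 x 1000 extent at 100 m resolution.
mask_via_gdal 0 0 1000 1000 100 100
```

With this route the polygon is rasterized by GDAL directly from the database, so no GRASS vector map (linked or imported) is needed.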


From: grass-user grass-user-bounces@lists.osgeo.org on behalf of Nikos Alexandris nik@nikosalexandris.net
Sent: Tuesday, May 8, 2018 4:01:14 PM
To: GRASS-GIS user mailing list
Cc: Giacomo.DELLI@ext.ec.europa.eu
Subject: [GRASS-user] To import from or to link to data stored in a PostgreSQL data base?


On Tue, May 8, 2018 at 4:01 PM, Nikos Alexandris <nik@nikosalexandris.net> wrote:


  • building a MASK using one pseudo vector map (without and with a
    spatial-index), takes about 9 minutes (real time).

time r.mask vector=test_nogeoindex --o

real 8m40.306s
user 5m14.225s
sys 0m56.378s

and

time r.mask vector=test_geoindex --o

real 8m46.096s
user 5m15.693s
sys 0m56.346s

  • building a MASK using a native GRASS GIS vector map, imported via a
    table or a view, takes about 0.4 seconds.

real 0m0.373s
user 0m0.191s
sys 0m0.111s

and

real 0m0.350s
user 0m0.179s
sys 0m0.115s

You need to include v.external/v.in.ogr in your timing in order to get real timings for creating a raster MASK from a vector stored in a remote database.

As a general rule of thumb, processing becomes faster if you create a local copy of the data to be processed, particularly if these data need to be accessed repeatedly.
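
One way to time the link or import step together with `r.mask` is sketched below. The connection string and table name are placeholders, and `timed` is a hypothetical helper that reports whole wall-clock seconds via `date +%s`. The commands are only printed unless `DRY_RUN=0` is set, so the printed timings are meaningful only in a real run.

```shell
#!/bin/sh
# Placeholders (assumptions): connection string and table name.
PG_DSN="${PG_DSN:-PG:host=db.example.org dbname=gis user=grass}"
TABLE="${TABLE:-polygon_00001}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it as well only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Report the wall-clock seconds taken by a command or shell function.
timed() {
    label="$1"; shift
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "$label: $((end - start)) s"
}

link_then_mask() {
    run v.external input="$PG_DSN" layer="$TABLE" output="${TABLE}_link" --overwrite
    run r.mask vector="${TABLE}_link" --overwrite
}

import_then_mask() {
    run v.in.ogr input="$PG_DSN" layer="$TABLE" output="${TABLE}_native" --overwrite
    run r.mask vector="${TABLE}_native" --overwrite
}

timed "link + mask" link_then_mask
timed "import + mask" import_then_mask
```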


  • Is it acceptable for r.mask to take so long in building a MASK based
    on an external vector map stored in a PostgreSQL data base?

  • Is network connection a limiting factor here, since the PG data base
    is not local?

You could check the network connection speed with some data transfer, ideally by transferring ordinary files/directories.

Markus M
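
A rough sketch of such a check: copy an ordinary file from the database host and convert size and elapsed time to a throughput figure. The host and paths in the usage comment are hypothetical, and `scp` stands in for whatever transfer tool is available between the two machines.

```shell
#!/bin/sh
# Convert BYTES transferred in SECONDS to MB/s (1 MB = 1,000,000 bytes).
throughput_mb_s() {
    awk -v b="$1" -v s="$2" \
        'BEGIN { if (s <= 0) { print "n/a"; exit } printf "%.1f\n", b / s / 1000000 }'
}

# Time the copy of one remote file and report the throughput.
# Hypothetical usage: measure_copy user@db.example.org:/tmp/test.bin /tmp/test.bin
measure_copy() {
    start=$(date +%s)
    scp -q "$1" "$2" || return 1
    end=$(date +%s)
    size=$(wc -c < "$2")
    echo "$(throughput_mb_s "$size" "$((end - start))") MB/s"
}

# Worked example: 500 MB copied in 40 s.
throughput_mb_s 500000000 40   # prints 12.5
```

Timing is in whole seconds, so the test file should be large enough for the copy to take well over a second.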


That's an interesting idea. Monday, when the JRC is open again, will be
the day to test that.

Thank you Stefan. Thank you also Markus for the tips.
I am positive that eventually we will find a balanced approach for the
workflow.

Cheers, Nikos

* Stefan Blumentrath <Stefan.Blumentrath@nina.no> [2018-05-08 18:31:49 +0000]:

Hei Nikos,
What about using gdal_rasterize and then r.external output=MASK?

Cheers
Stefan

Update: Stefan's suggestion is worth it. First tests show that we can
save a lot of time.

Thank you so much Stefan and community.

Nikos

* Nikos Alexandris <nik@nikosalexandris.net> [2018-05-10 14:58:59 +0200]:

That's an interesting idea. Monday, when the JRC is open again, will be
the day to test that.

Thank you Stefan. Thank you also Markus for the tips.
I am positive that eventually we will find a balanced approach for the
workflow.

Cheers, Nikos
