[GRASS-dev] Does multi-threading apply to r.series.lwr?

Before processing tens of thousands of Landsat8 TIRS bands, a script
tests the workflow described below on 74 scenes, which correspond to one
WRS2 tile.

The script takes as inputs:

<Landsat8Pool> path to directory with Landsat scenes
<ScenePattern> regex pattern to match a set of Landsat scene identifiers
<LandCover> land cover map
<grassdb> path to the GRASS GIS database
<Location> name for the GRASS GIS Location
<TargetMapset> name for the GRASS GIS Mapset to host maps to build a time series
<WindowSize> an odd integer, parameter for a split-window algorithm (SW)

Part 1 of the workflow derives Land Surface Temperature maps by:

1. Creating the target Location
2. Linking pseudo GRASS raster maps to Landsat8 GeoTIFF files
3. Exporting a TGIS-compliant list of maps and timestamps
4. Importing a land cover map required for the SW algorithm
5. Estimating Land Surface Temperature (LST) maps for given scenes (i.landsat8.swlst)
6. Creating a dedicated Mapset for LST maps
7. Copying LST maps into the "LST" Mapset
8. Removing initial LST maps from individual scene Mapsets

With a fairly strong CPU, producing one LST map (7771 rows × 7651
columns = 59,455,921 cells) takes ~34 minutes.

For the 74 Landsat8 input scenes, first trials took about 42 hours, running
grass-7.3.svn inside a Docker container that was, however, assigned only one CPU.
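For reference, the 42-hour figure follows directly from the per-map cost:

```python
# Back-of-the-envelope check of the single-CPU timing above.
minutes_per_map = 34
scenes = 74
total_hours = scenes * minutes_per_map / 60
print(round(total_hours, 1))  # 2516 minutes, i.e. ~41.9 hours
```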

Part 2 concerns building the time series by:

1. Creating an LST Spatio-Temporal Raster Data Set (STRDS)
2. Registering LST maps in TGIS' database
3. Smoothing the LST STRDS via Local Weighted Regression (r.series.lwr)
4. Timestamping the LWR-derived maps
5. Creating an STRDS for LWR maps
6. Registering LWR maps in TGIS' database

Part 2 took about 130 minutes for all steps; step 3 consumes
practically all of that time.

Overall, it took about 44 hours to build an LWR-smoothed LST STRDS.

With the concept proven, making good use of the cluster concerns steps 2, 5,
7 and 8 of Part 1: processes that can, and should, run in parallel on an
(admittedly heterogeneous) cluster of 912 cores.
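For illustration, the fan-out over scenes can be sketched as below; `run_scene` and `run_all` are placeholder names, since the actual GRASS calls live in the script:

```python
from concurrent.futures import ThreadPoolExecutor

def run_scene(scene_id):
    """Placeholder for the per-scene work of steps 2, 5, 7 and 8:
    link the scene's GeoTIFFs, run i.landsat8.swlst, then copy the
    LST map into the "LST" Mapset and remove the original.
    A real worker would launch GRASS via subprocess; threads suffice
    here because the heavy lifting happens in that external process."""
    return "LST_" + scene_id  # stands in for the produced map name

def run_all(scene_ids, workers=8):
    # One independent job per scene; on the cluster this maps
    # naturally to one scheduler job per scene instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_scene, scene_ids))
```

On a 912-core cluster the same pattern would be one batch job per scene rather than an in-process pool.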

A final concern is whether multi-threading applies to r.series.lwr (step 3 of
Part 2).

Thank you, Nikos

On Fri, Nov 3, 2017 at 7:08 PM, Nikos Alexandris <nik@nikosalexandris.net> wrote:


You could

  1. create different temporal chunks of the time series; each chunk will be processed by r.series.lwr, and all chunks can be processed in parallel.

  2. create different spatial chunks of the time series (tiling the computational region); each chunk will be processed by r.series.lwr, and all chunks can be processed in parallel.
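Since r.series.lwr (as far as I can tell) fits each cell's time series independently, spatial chunks need no overlap: tile the computational region, run one job per tile, and patch the outputs back together (e.g. with r.patch). A minimal sketch of the tile arithmetic, with hypothetical bounds:

```python
def tile_region(north, south, west, east, nrows, ncols):
    """Split the computational region into nrows x ncols tiles.
    Each tile's bounds would be passed to `g.region n=... s=... w=... e=...`
    before running r.series.lwr restricted to that tile."""
    dy = (north - south) / nrows
    dx = (east - west) / ncols
    tiles = []
    for i in range(nrows):
        for j in range(ncols):
            tiles.append({
                "n": north - i * dy,
                "s": north - (i + 1) * dy,
                "w": west + j * dx,
                "e": west + (j + 1) * dx,
            })
    return tiles
```

Each of the nrows × ncols jobs is independent, so all of them can be dispatched to the cluster at once.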

HTH,

Markus M



grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev


Markus Metz <markus.metz.giswork@gmail.com> [2017-11-03 22:08:39 +0100]:

You could
1. create different temporal chunks of the time series; each chunk will be
processed by r.series.lwr, and all chunks can be processed in parallel.
2. create different spatial chunks of the time series (tiling the
computational region); each chunk will be processed by r.series.lwr, and all
chunks can be processed in parallel.

#2 is fairly obvious and not difficult to realise.

I am statistically puzzled about #1 with regard to the degree of
over-determination (dod) and the maximum size of gaps to be interpolated
(maxgap), if the time series is split temporally.

I will proceed with #2 and set #1 aside as a future (study and learn) goal.

Much appreciated, it certainly helps,

N

On Tue, Nov 7, 2017 at 10:55 AM, Nikos Alexandris <nik@nikosalexandris.net> wrote:


I am statistically puzzled about #1 in what concerns the degree of
over-determination (dod) and the maximum size of gaps to be interpolated
(maxgap) if the time series is temporally split.

The degree of over-determination and the maximum gap size are important
global settings. I am using r.series.lwr with option 1 (temporal chunks),
setting an overlap of maxgap maps between the different temporal chunks
and discarding the overlap results for each chunk. This gives me seamless
temporal interpolation across the temporal chunks.
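The bookkeeping described above can be sketched as follows: each chunk is padded by maxgap maps on both sides, processed whole, and only its interior kept (chunk_size is a hypothetical parameter):

```python
def temporal_chunks(n_maps, chunk_size, maxgap):
    """Return (start, end, keep_start, keep_end) index ranges:
    process maps [start, end) with r.series.lwr, but keep only the
    results for [keep_start, keep_end) and discard the overlap.
    Padding each chunk by maxgap maps on both sides means every kept
    map has up to maxgap temporal neighbours available on either
    side, which should keep the interpolation seamless across chunks."""
    chunks = []
    for keep_start in range(0, n_maps, chunk_size):
        keep_end = min(keep_start + chunk_size, n_maps)
        start = max(keep_start - maxgap, 0)
        end = min(keep_end + maxgap, n_maps)
        chunks.append((start, end, keep_start, keep_end))
    return chunks
```

For example, 10 maps in chunks of 4 with maxgap=2 would be processed as the ranges [0,6), [2,10) and [6,10), keeping [0,4), [4,8) and [8,10) respectively.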

Markus M
