[GRASS-user] error while using i.segment.stats

Hi,

We are trying to run i.segment.stats on a map with 800000+ segments and, on two different laptops so far, we get:

bands=`g.list rast pat=IGUAZU_IMG_* sep=,`
RASTER_STATS=(min,max,range,mean,stddev,median,first_quart,third_quart,perc_90)
AREA_STATS=(area,perimeter,compact_circle,compact_square,fd)

i.segment.stats -rc \
                 map=segments_full_region \
                 rasters=$bands \
                 raster_statistics=$RASTER_STATS \
                 area_measures=$AREA_STATS \
                 vectormap=segs_stats_map \
                 processes=4
Calculating geometry statistics...
Calculating statistics for raster maps...
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 463, in _handle_results
    task = get()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 3 required positional arguments: 'module', 'code', and 'returncode'

Does it have to do with memory? I used the module a month ago with 500000+ segments and it worked just fine…

Any hints are more than welcome

Best,
Vero

Hi Vero,

On Mon, 2 Dec 2019 13:43:30 +0100,
Veronica Andreo <veroandreo@gmail.com> wrote:

> Does it have to do with memory? I used the module a month ago with
> 500000+ segments and it worked just fine...

I don't think memory is the issue, but I find the error message
pretty cryptic, so I wouldn't exclude it altogether. Could it be some
difference between Python 2 and 3 in the multiprocessing module? Would
it be possible for you to try running it in Python 2?

Maybe you could also try running it on a smaller subset of the
segmentation result?
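
One possible reading of the traceback, not confirmed anywhere in this thread: the TypeError is raised while the pool's result handler unpickles something sent back by a worker process, and the three argument names look like the constructor of an exception class. An exception whose `__init__` requires arguments but does not pass them on to `Exception.__init__` cannot be rebuilt by pickle, so whatever went wrong in the worker would be hidden behind this message. The sketch below reproduces that mechanism with a made-up class `ModuleFailed`; the class, its fields and the sample values are assumptions for illustration only.

```python
import pickle


class ModuleFailed(Exception):
    """Hypothetical exception with three required constructor arguments."""

    def __init__(self, module, code, returncode):
        # Nothing is forwarded to Exception.__init__, so self.args ends up
        # empty and pickle has no arguments to replay when rebuilding.
        super().__init__()
        self.module = module
        self.code = code
        self.returncode = returncode


def roundtrip(exc):
    """Pickle and unpickle an exception, as a process pool does with results."""
    return pickle.loads(pickle.dumps(exc))


def demo():
    """Return the error text produced when the round trip fails."""
    original = ModuleFailed("r.univar", "r.univar -e map=band1", 1)
    try:
        roundtrip(original)
    except TypeError as err:
        # The unpickling error replaces the worker's real failure.
        return str(err)
    return "round trip succeeded"


if __name__ == "__main__":
    print(demo())
```

If this is what happened here, the message says nothing about the real cause, and rerunning the failing step outside the pool might let the underlying error surface.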

Moritz

--
Département Géosciences, Environnement et Société
Université Libre de Bruxelles
Bureau: S.DB.6.138
CP 130/03
Av. F.D. Roosevelt 50
1050 Bruxelles
Belgique

tél. + 32 2 650.68.12 / 68.11 (secr.)
fax + 32 2 650.68.30

Hi Moritz,

Thanks for your answer :)

On smaller subsets (one per cutline tile), it works just fine.

Later today or tomorrow I will try running it with grass76.

Best,
Vero

On 3/12/19 13:26, Veronica Andreo wrote:

> In smaller subsets (per cutlines tiles), it works just fine.

With grass78 using Python 3? Then it is probably not a Python version issue.

What is the region size? When calculating the raster statistics, i.segment.stats uses r.univar with extended statistics (-e flag). If the region is too big, you are actually right that this might lead to memory errors when it tries to run 4 instances in parallel.

Unfortunately, r.univar does not have a memory option that would limit its memory usage. Maybe i.segment.stats should check the region size and the available memory, and bail out if there is not enough memory. I am not sure how to calculate the needed memory size, though (especially since r.univar's -t flag is also set).

You could also try to reduce parallelization, i.e. run it with two processes only. It will obviously be slower.
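
The trade-off between parallel instances and memory can be made concrete with a rough estimate. The sketch below assumes, as a simplification that is not taken from r.univar's source, that each parallel instance holds one 8-byte value per region cell in memory at the same time; the function name, `bytes_per_cell` and the 8 GiB budget are all hypothetical.

```python
def max_parallel_instances(cells, memory_budget_bytes, bytes_per_cell=8):
    """Return how many instances fit in the budget under the stated assumption.

    Assumes each instance needs cells * bytes_per_cell bytes at once.
    """
    if cells <= 0 or bytes_per_cell <= 0:
        raise ValueError("cells and bytes_per_cell must be positive")
    per_instance = cells * bytes_per_cell
    return memory_budget_bytes // per_instance


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical example: a 500-million-cell region on an 8 GiB machine.
    print(max_parallel_instances(500_000_000, 8 * gib))  # 2
```

The real footprint could be higher or lower than this; the point is only that it scales with both the cell count and the number of processes.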

Moritz

Hi Moritz,

Thanks for coming back to this topic :)

Here are the original region settings:

g.region -p
projection: 1 (UTM)
zone: -21
datum: wgs84
ellipsoid: wgs84
north: 7168182
south: 7157384
west: 737450
east: 750236
nsres: 0.5
ewres: 0.5
rows: 21596
cols: 25572
cells: 552252912

Because of the error we reduced it by ~half, but still got the same error…

[…]
rows: 20478
cols: 13553
cells: 277538334
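
The cell counts reported by g.region follow directly from the extent and the resolution, which makes it easy to check how much the reduction actually saved. A small sketch using only the numbers printed above (the function name is made up):

```python
def region_cells(north, south, east, west, nsres, ewres):
    """Return (rows, cols, cells) for a region given extent and resolution."""
    rows = round((north - south) / nsres)
    cols = round((east - west) / ewres)
    return rows, cols, rows * cols


if __name__ == "__main__":
    rows, cols, cells = region_cells(7168182, 7157384, 750236, 737450, 0.5, 0.5)
    print(rows, cols, cells)  # 21596 25572 552252912

    reduced_cells = 20478 * 13553  # rows * cols of the reduced region
    print(reduced_cells)  # 277538334
    print(round(reduced_cells / cells, 2))  # 0.5
```

The reduced region still has about 277 million cells, so any cost that grows with the number of cells is only halved.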

In the end, we solved it by extracting the stats per tile (irregular cutline tiles) and then patching the results together.
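
The thread does not say how the per-tile results were patched together. If each tile run had written its statistics to a text table (an assumption; the file names, the `|` separator and the column names used with this sketch are invented for illustration), the merge step could look roughly like this:

```python
import csv


def merge_tile_tables(tile_paths, out_path, delimiter="|"):
    """Concatenate per-tile tables that share one header into a single file.

    Returns the number of data rows written.
    """
    header = None
    written = 0
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file, delimiter=delimiter)
        for path in tile_paths:
            with open(path, newline="") as tile_file:
                reader = csv.reader(tile_file, delimiter=delimiter)
                tile_header = next(reader, None)
                if tile_header is None:
                    continue  # empty file, nothing to merge
                if header is None:
                    # The first non-empty table defines the output header.
                    header = tile_header
                    writer.writerow(header)
                elif tile_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    written += 1
    return written
```

A call such as `merge_tile_tables(["tile_1.csv", "tile_2.csv"], "all_tiles.csv")` would then produce one table. Segments that appear in more than one tile are not deduplicated here.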

best,
Vero
