[GRASS-dev] multiprocessing problem

Hi devs,

I wrote an addon to calculate Rao's Q diversity index [0]. I would like
to speed it up using multiprocessing, but with multiprocessing it takes
two to three times longer than a single process. Could someone look at
the code and explain to me what I'm doing wrong?
A colleague of mine suggested using Cython with the prange function [1],
but I would rather avoid that, since I would need to study it and learn
how to compile it in GRASS, and I have no time to spend on it.

thanks a lot
Luca

[0] https://github.com/lucadelu/grass-addons/tree/raoq/src/raster/r.raoq.area
[1] https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html

--
ciao
Luca

www.lucadelu.org

Hi Luca,

Just two brainstorming ideas:

- From a rapid glance at the code, it seems to me that you create a separate worker for each row in the raster. Correct? AFAIR, spawning workers creates quite a bit of overhead. Depending on the row-to-column ratio of your raster, maybe you would be better off sending larger chunks to the workers (see the sketch below)?

- Depending on the number of parallel jobs, disk access can quickly become the bottleneck on non-parallelized file systems, so it would be interesting to see if using fewer processes actually speeds things up. Then it is a question of finding the equilibrium.
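
For the first idea, a minimal (untested) sketch of what I mean; the names chunk_sum and parallel_raoq are just placeholders, not from your addon:

import numpy as np
from multiprocessing import Pool

def chunk_sum(args):
    # sum of absolute differences between the pixels in this chunk
    # and every pixel of the full (flattened) raster
    chunk, full = args
    return sum(np.sum(np.abs(y - full)) for y in chunk)

def parallel_raoq(array, nprocs=4):
    flat = array.flatten()
    # one chunk per process instead of one worker per pixel or row
    chunks = np.array_split(flat, nprocs)
    with Pool(nprocs) as pool:
        partial = pool.map(chunk_sum, [(c, flat) for c in chunks])
    return sum(partial) / flat.size ** 2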

Moritz


On Fri, 8 Apr 2022 at 09:14, Moritz Lennert
<mlennert@club.worldonline.be> wrote:

Hi Luca,

Hi Moritz,

Just two brainstorming ideas:

- From a rapid glance at the code, it seems to me that you create a separate worker for each row in the raster. Correct? AFAIR, spawning workers creates quite a bit of overhead. Depending on the row-to-column ratio of your raster, maybe you would be better off sending larger chunks to the workers?

right now I'm creating a worker for each pixel, to be checked against
all the other pixels. Yes, it could be an idea to send larger chunks; I
could split the array vertically according to the number of processors

- Depending on the number of parallel jobs, disk access can quickly become the bottleneck on non-parallelized file systems, so it would be interesting to see if using fewer processes actually speeds things up. Then it is a question of finding the equilibrium.

ok, this makes sense
thanks for your support

Moritz

--
ciao
Luca

www.lucadelu.org

Ciao Luca,

Yes, you could also consider looping over e.g. rows (maybe in combination with "np.apply_along_axis"), so you could more easily put the results back together into a map if needed at a later stage.
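
Roughly like this (untested; array is the numpy array from the script):

import numpy as np

# one scalar per row: the sum of absolute differences between the
# pixels of that row and every pixel of the raster
row_sums = np.apply_along_axis(
    lambda row: np.sum(np.abs(row[:, None] - array.ravel())), 1, array
)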

In addition, since you use multiprocessing.Manager, you may try to use multiprocessing.Array: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Array

E.g. here:
https://github.com/lucadelu/grass-addons/blob/5ca56bdb8b3394ebeed23aa5b3240bf6690e51bf/src/raster/r.raoq.area/r.raoq.area.py#L81

According to the post here: https://medium.com/analytics-vidhya/using-numpy-efficiently-between-processes-1bee17dcb01
multiprocessing.Array is needed to put the numpy array into shared memory and avoid pickling.

I have not tried or investigated myself, but maybe worth a try...
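
A rough sketch of the shared-memory idea (untested; the sizes and the row_sum worker are made up for illustration, and it relies on the fork start method, i.e. Linux):

import ctypes
import numpy as np
from multiprocessing import Array, Pool

rows, cols = 200, 200  # example size
# a shared, lock-free buffer, viewed as a numpy array
shared = Array(ctypes.c_double, rows * cols, lock=False)
array = np.frombuffer(shared, dtype=np.float64).reshape(rows, cols)

def row_sum(i):
    # workers inherit the shared buffer on fork instead of pickling it
    return np.sum(np.abs(array[i][:, None] - array.ravel()))

if __name__ == "__main__":
    array[:] = np.random.random((rows, cols))  # fill in place
    with Pool(4) as pool:
        totals = pool.map(row_sum, range(rows))
    print(sum(totals) / array.size ** 2)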

Cheers
Stefan


Hi Luca,

I would say the biggest problem is memory: I tried to run it and it consumes way too much. Maybe you could process the differences for each pixel (compute the sum) as they are computed, instead of collecting them all and doing it at the end. Otherwise you can significantly speed it up with a single core simply by using numpy in a better way:

import numpy as np

# absolute differences of each pixel against all pixels; array and
# number2 come from the existing script
vals = np.array([np.abs(y - array.flat) for y in array.flat])
...
out = np.sum(vals) / number2


On Fri, 8 Apr 2022 at 11:17, Stefan Blumentrath
<Stefan.Blumentrath@nina.no> wrote:

Ciao Luca,

Ciao Stefan,

Yes, you could also consider looping over e.g. rows (maybe in combination with "np.apply_along_axis"), so you could more easily put the results back together into a map if needed at a later stage.

In addition, since you use multiprocessing.Manager, you may try to use multiprocessing.Array: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Array

E.g. here:
https://github.com/lucadelu/grass-addons/blob/5ca56bdb8b3394ebeed23aa5b3240bf6690e51bf/src/raster/r.raoq.area/r.raoq.area.py#L81

According to the post here: https://medium.com/analytics-vidhya/using-numpy-efficiently-between-processes-1bee17dcb01
multiprocessing.Array is needed to put the numpy array into shared memory and avoid pickling.

I have not tried or investigated myself, but maybe worth a try...

Yes, I saw it but I hadn't tried it before. I tried it over the last
few days but didn't get any improvement; I will try again in the coming
days

Cheers
Stefan

--
ciao
Luca

www.lucadelu.org

On Fri, 8 Apr 2022 at 16:33, Anna Petrášová <kratochanna@gmail.com> wrote:

Hi Luca,

Hi Anna,

I would say the biggest problem is memory: I tried to run it and it consumes way too much. Maybe you could process the differences for each pixel (compute the sum) as they are computed, instead of collecting them all and doing it at the end. Otherwise you can significantly speed it up with a single core simply by using numpy in a better way:

vals = np.array([np.abs(y - array.flat) for y in array.flat])
...
out = np.sum(vals) / number2

yes, this works better than my solution, but when I increase the number
of pixels the process gets killed. I have 16 GB of RAM and I was not
able to process 80000 cells...
I tried a few different solutions but the result is always the same.

--
ciao
Luca

www.lucadelu.org

As I said, you can sum the values for each pixel so you don't store all the differences; that gets rid of the memory problem, but of course it will still be slow if it's not parallelized:

# one scalar per pixel instead of a full row of differences
vals = np.array([np.sum(np.abs(y - array.flat)) for y in array.flat])

Note that I didn't check thoroughly whether the computation itself is correct, i.e. whether you get the right value in terms of the index definition. One other idea is to avoid some of the computations, since you are in fact computing each distance twice (from pixel 1 to pixel 2 and vice versa). Also, do you actually need to compute this for the entire raster? Shouldn't this be more of a moving-window approach, where you restrict the distance computation to a window around each pixel?
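
Exploiting the symmetry could look roughly like this (an untested sketch; array comes from the script, and n ** 2 plays the role of number2):

import numpy as np

flat = array.flatten()
n = flat.size
# compare each pixel only against the pixels after it, then double:
# 2 * sum over i < j equals the full double sum over all pairs
total = sum(np.sum(np.abs(flat[i] - flat[i + 1:])) for i in range(n - 1))
out = 2 * total / n ** 2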

Anna
