I’m interested in applying for the “Parallelization of existing tools” GSoC idea. I’ve been getting familiar with the GRASS codebase and the OpenMP patterns used in modules like r.neighbors. I submitted a draft PR (#7044) attempting to parallelize r.resamp.stats. Since I’m still learning, I’d really appreciate feedback on whether my approach is on the right track.
While studying the existing parallelization in r.neighbors, I noticed that G_percent is called inside the parallel region, which seems related to issue #5776. I documented this in a comment on the issue. I’m looking at r.fill.stats and r.mfilter as potential next candidates. Could someone point me in the right direction on which tools the community would most benefit from parallelizing?
Some tools are listed in the GSoC topic, but we haven’t done a thorough analysis of whether they are necessarily good candidates based on their algorithms and libraries, so you would have to explore that. r.proj is a tool that I would like to see parallelized, but it has been tried before (you can probably dig through the GitHub history to find the previous attempt) and failed for some reason.
Regarding G_percent, you can check how other already parallelized tools use it and attempt to improve it.
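For anyone following the G_percent point, here is a minimal sketch of one way to report progress from inside an OpenMP loop without racing on the reporter’s internal state. It is an illustration under assumptions, not how r.neighbors or any GRASS tool does it: `report_progress()` is a hypothetical stand-in for G_percent, and `process_rows()` is a made-up placeholder for the per-row work.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for G_percent(); NOT the GRASS function.
 * It keeps static state, so it is not safe to call concurrently. */
static int last_reported = -1;

static void report_progress(long done, long total, int step)
{
    int pct = (int)(100 * done / total);

    pct -= pct % step; /* round down to a multiple of step */
    if (pct > last_reported) {
        last_reported = pct;
        fprintf(stderr, "%3d%%\n", pct);
    }
}

/* Placeholder per-row work: stores row * row in out[row].
 * Returns the last percentage that was reported. */
int process_rows(int nrows, long *out)
{
    long done = 0; /* shared counter of finished rows */

    assert(nrows > 0 && out != NULL);
    last_reported = -1;

#pragma omp parallel for schedule(static)
    for (int row = 0; row < nrows; row++) {
        long finished;

        out[row] = (long)row * row;

        /* Count completed rows atomically instead of passing the loop
         * index, which is not monotonic across threads. */
#pragma omp atomic capture
        finished = ++done;

        /* Serialize access to the reporter's static state. */
#pragma omp critical(progress)
        report_progress(finished, nrows, 10);
    }
    return last_reported;
}
```

Entering a critical section on every row is a serialization point, so a real tool would probably report less often (for example, only every N rows). The sketch only shows the two ingredients: an atomic completion counter and a reporter that is never entered by two threads at once.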
I dug into the r.proj history and found that the 2012 segfault was caused by the tile cache in readcell.c not being thread-safe. I’m thinking of starting a proof-of-concept. Would a per-thread cache or a mutex around get_block() be preferred? I’m also curious whether there are other tools on your radar besides r.proj, r.fill.stats, and r.mfilter.
Hi @annakrat, is there any update on PR #7044? Also, I am currently looking into parallelizing r.proj and would love some guidance on the best approach to tackle it. I’m really eager to make this a core part of my GSoC proposal and want to make sure I’m heading in a direction that aligns with the project’s overall goals.
Update on r.proj: I dug deep into the code and the 2012 segfault. The core blocker is that the raster library itself (the readcell.c tile cache, Rast_get_row()) is not thread-safe. This isn’t an r.proj-specific problem; it’s a foundational issue that affects any module trying to read raster rows from multiple threads. Parallelizing r.proj properly would first require making parts of the raster library thread-safe, which is a large design effort needing mentor alignment.
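To make the two options concrete, here is a rough sketch comparing a per-thread cache with a single shared cache guarded by a lock. Everything in it is hypothetical: `struct tile_cache`, `cache_get()`, and the synthetic tile contents are invented for illustration and do not mirror the actual layout or API in readcell.c.

```c
#include <assert.h>

#define TILE_LEN 16 /* cells per tile (illustrative) */
#define NSLOTS   4  /* slots in the direct-mapped cache */

/* Hypothetical tile cache; not the structure used by r.proj. */
struct tile_cache {
    int tile_id[NSLOTS];
    double data[NSLOTS][TILE_LEN];
};

static void cache_init(struct tile_cache *c)
{
    for (int s = 0; s < NSLOTS; s++)
        c->tile_id[s] = -1; /* mark every slot empty */
}

/* Returns the cached tile, "loading" it on a miss. A miss overwrites a
 * slot, so concurrent calls on the same cache are a data race. */
static const double *cache_get(struct tile_cache *c, int tile)
{
    int s = tile % NSLOTS;

    if (c->tile_id[s] != tile) {
        for (int i = 0; i < TILE_LEN; i++)
            c->data[s][i] = (double)(tile * TILE_LEN + i); /* synthetic */
        c->tile_id[s] = tile;
    }
    return c->data[s];
}

/* Option A: every thread owns a private cache, so no locking is needed. */
double sum_per_thread_cache(int ncells)
{
    double sum = 0.0;

#pragma omp parallel
    {
        struct tile_cache cache; /* declared in the region: one per thread */

        cache_init(&cache);
#pragma omp for reduction(+ : sum)
        for (int cell = 0; cell < ncells; cell++)
            sum += cache_get(&cache, cell / TILE_LEN)[cell % TILE_LEN];
    }
    return sum;
}

/* Option B: one shared cache; every lookup happens under a lock. */
double sum_locked_cache(int ncells)
{
    struct tile_cache shared;
    double sum = 0.0;

    cache_init(&shared);
#pragma omp parallel for reduction(+ : sum)
    for (int cell = 0; cell < ncells; cell++) {
        double v;

        /* Copy the value out while holding the lock: another thread may
         * evict the slot as soon as the lock is released. */
#pragma omp critical(tile_cache)
        {
            v = cache_get(&shared, cell / TILE_LEN)[cell % TILE_LEN];
        }
        sum += v;
    }
    return sum;
}
```

The trade-off this is meant to show: the per-thread variant avoids contention but multiplies memory use by the thread count and may load the same tile several times, while the locked variant keeps one cache but serializes every lookup and has to copy data out before releasing the lock. Which one suits r.proj would depend on measurements against the real access pattern.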
For my GSoC proposal, I’m shifting focus to modules where parallelization works within the current infrastructure, such as improving r.texture’s suboptimal schedule(static,1) ordered, and other modules that load data into memory before parallelizing the computation.
I’ve been actively contributing for about 2 months now (PRs #7044, #7005, and #7097, researching r.proj, and looking into r.texture). I’m really committed to GRASS and want to make sure my GSoC proposal is strong. What would you suggest I focus on to improve my chances: more PRs, a specific set of modules, or refining the proposal itself? Any guidance would mean a lot!
Sorry for the late reply… I will try to get to your PR. What would be valuable is a deep analysis of the r.proj problem; you could open an issue for that. It would help us decide on a good path forward and help strengthen your application.
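As a rough illustration of the “load into memory first” pattern mentioned here, the sketch below reads all rows serially, computes in parallel with no `ordered` clause, and then writes the results serially in row order. It is not taken from r.texture or any other GRASS module: `read_row()` is a hypothetical stand-in for a row reader such as Rast_get_row(), the “write” phase is just a checksum, and the 3-cell vertical sum is a made-up placeholder filter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a raster row reader; fills synthetic values. */
static void read_row(double *buf, int row, int ncols)
{
    for (int col = 0; col < ncols; col++)
        buf[col] = (double)(row + col);
}

/* Sums each cell with its upper and lower neighbour (edges clamped)
 * and returns a checksum of the output in place of writing a map. */
double filter_in_memory(int nrows, int ncols)
{
    size_t ncells = (size_t)nrows * (size_t)ncols;
    double *in = malloc(ncells * sizeof(double));
    double *out = malloc(ncells * sizeof(double));
    double checksum = 0.0;

    assert(in != NULL && out != NULL);

    /* Phase 1: serial reads, so the reader is never entered concurrently. */
    for (int row = 0; row < nrows; row++)
        read_row(in + (size_t)row * ncols, row, ncols);

    /* Phase 2: rows are independent once the input is in memory, so no
     * ordered clause is needed and threads never wait on each other. */
#pragma omp parallel for schedule(static)
    for (int row = 0; row < nrows; row++) {
        int up = row > 0 ? row - 1 : row;
        int down = row < nrows - 1 ? row + 1 : row;

        for (int col = 0; col < ncols; col++)
            out[(size_t)row * ncols + col] =
                in[(size_t)up * ncols + col] +
                in[(size_t)row * ncols + col] +
                in[(size_t)down * ncols + col];
    }

    /* Phase 3: serial "write" in row order (here: accumulate a checksum). */
    for (size_t i = 0; i < ncells; i++)
        checksum += out[i];

    free(in);
    free(out);
    return checksum;
}
```

The cost of this pattern is memory: both the input and the output live in RAM at once, so it only fits maps that are small relative to available memory, or it needs to be applied chunk by chunk.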