Hi mentors,
My name is Kaushik Raja, and I’m a CS + Crop Sciences student at UIUC. I’ve been diving into the r.proj implementation and I’m very interested in taking on the “Parallelization of existing tools” project for GSoC this year.
In digging through the code, I found two main hurdles for an OpenMP implementation:
- Thread-Safety of PROJ Contexts: Managing PJ_CONTEXT per thread to avoid race conditions during coordinate transformations.
- Caching Logic: The struct cache and get_block mechanisms in r.proj.h seem to require a refactor for concurrent access.
I’m currently debating between two architectural approaches for the proposal:
- A mutex-locked shared cache for memory efficiency.
- A tiled processing approach where threads operate on independent row-bands/tiles with local buffers to maximize cache locality and minimize synchronization overhead.
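To make the second option concrete, here is a minimal sketch of row-band partitioning with a per-band local buffer. It is not taken from r.proj: `band_range` and `process_in_bands` are hypothetical names, and the inner loop only writes a placeholder value where real code would project a row. The pragma is ignored when the code is compiled without OpenMP, so the sketch runs serially in that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: half-open row range [start, end) for one band.
 * Rows are split as evenly as possible; the first (nrows % nbands)
 * bands receive one extra row. */
static void band_range(int nrows, int nbands, int band, int *start, int *end)
{
    assert(nrows >= 0 && nbands > 0 && band >= 0 && band < nbands);
    int base = nrows / nbands;
    int extra = nrows % nbands;
    *start = band * base + (band < extra ? band : extra);
    *end = *start + base + (band < extra ? 1 : 0);
}

/* Process a row-major output grid band by band. Each band allocates its
 * own scratch row, so threads never share a mutable buffer.
 * Returns the number of bands that failed to allocate memory. */
static int process_in_bands(int *out, int nrows, int ncols, int nbands)
{
    assert(out && nrows >= 0 && ncols > 0 && nbands > 0);
    int failed = 0;

#pragma omp parallel for schedule(static) reduction(+ : failed)
    for (int band = 0; band < nbands; band++) {
        int start, end;
        band_range(nrows, nbands, band, &start, &end);

        int *local = malloc((size_t)ncols * sizeof *local);
        if (!local) {
            failed++;
            continue;
        }
        for (int row = start; row < end; row++) {
            /* Placeholder for "project one row": store the flat index. */
            for (int col = 0; col < ncols; col++)
                local[col] = row * ncols + col;
            memcpy(&out[(size_t)row * ncols], local,
                   (size_t)ncols * sizeof *local);
        }
        free(local);
    }
    return failed;
}
```

Because each band writes to a disjoint range of output rows, no synchronization is needed on `out`. The trade-off against a shared cache is memory: every band holds its own scratch storage.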
I’m already working with the codebase locally to establish baseline benchmarks for the serial implementation. I’m also planning to submit a draft PR soon with a small proof of concept to show some initial progress and get early feedback on the direction. I’d love to hear from the mentors on which architectural approach aligns better with the long-term goals of the raster library.
Thanks,
Kaushik Raja
Following up on my earlier post about parallelizing r.proj. I have a working proof-of-concept and a draft PR ready.
What I built:
The readcell tile cache is not thread-safe. I first prototyped wrapping it in mutexes, but that just serializes every read and gives no speedup. Instead, I pre-load the input raster into a flat contiguous RAM buffer before the parallel loop. Concurrent reads from a flat array that no thread writes to are thread-safe, so no locks are needed during projection.
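The idea can be sketched as follows, assuming the whole input is already in a row-major `float` array. This is an illustration and not the code in the PR: `resample_flat` is a hypothetical name, and the integer scaling that picks the source cell is only a stand-in for the real inverse coordinate transformation.

```c
#include <assert.h>
#include <stddef.h>

/* Nearest-neighbour resampling from a read-only flat input buffer.
 * The source cell is chosen by simple integer scaling, which stands in
 * for the projection step. `in` is never written, so threads can read
 * it concurrently without locks; each thread writes distinct rows of
 * `out`. */
static void resample_flat(const float *in, int in_rows, int in_cols,
                          float *out, int out_rows, int out_cols)
{
    assert(in && out);
    assert(in_rows > 0 && in_cols > 0 && out_rows > 0 && out_cols > 0);

#pragma omp parallel for schedule(static)
    for (int r = 0; r < out_rows; r++) {
        int sr = (int)((long long)r * in_rows / out_rows);
        for (int c = 0; c < out_cols; c++) {
            int sc = (int)((long long)c * in_cols / out_cols);
            out[(size_t)r * out_cols + c] = in[(size_t)sr * in_cols + sc];
        }
    }
}
```

The `const` qualifier on `in` documents the property the approach relies on: the buffer is fully populated before the parallel loop starts and stays unchanged while it runs.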
Benchmarks (10,000 × 10,000 raster, Apple M4, 8 cores):
- Serial: ~20.3s, 83% CPU
- Parallel (UTM → WebMercator): ~8.1s, 515% CPU
- Parallel (UTM → Lat/Lon): 9.1s, 572% CPU
That is roughly a 2.2x to 2.5x wall-time speedup across two independent projection pipelines.
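For reference, the speedup figures follow directly from the timings above. The helpers below are a small sketch with hypothetical names; the efficiency figure simply divides the speedup by the core count and assumes all 8 cores are equivalent.

```c
#include <assert.h>

/* Wall-time speedup: serial time divided by parallel time. */
static double speedup(double serial_s, double parallel_s)
{
    assert(serial_s > 0.0 && parallel_s > 0.0);
    return serial_s / parallel_s;
}

/* Parallel efficiency: speedup divided by the number of cores used. */
static double efficiency(double serial_s, double parallel_s, int ncores)
{
    assert(ncores > 0);
    return speedup(serial_s, parallel_s) / ncores;
}
```

With the posted numbers, `speedup(20.3, 8.1)` is about 2.51 and `speedup(20.3, 9.1)` is about 2.23, which corresponds to an efficiency of roughly 0.31 and 0.28 on 8 cores.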
I’ve added a memory safety check that warns users if the buffer would exceed 4GB and points them to g.region to reduce it.
Draft PR: [GSoC 2026 Draft POC] r.proj: OpenMP parallelization via RAM-resident buffer (OSGeo/grass pull request #7185, by krcoder123)
Two questions for mentors:
- Is the RAM-resident approach acceptable long-term, or are thread-local caches strongly preferred for the final implementation?
- Are there thread-safety concerns in GPJ_transform beyond PJ_CONTEXT that I should audit before the proposal deadline?
Thanks,
Kaushik Raja
GitHub: krcoder123
Have you thought about giving an option for raising the 4 GB limit to a user-defined memory limit? I do like referring to g.region rather than just crashing when the memory limit is exceeded.
Doug
On Mon, Mar 16, 2026 at 1:57 AM Kaushik_Raja <noreply@discourse.osgeo.org> wrote:
Hi Doug, thanks for the feedback. I’ve updated the PR to wire the existing memory parameter directly to the RAM buffer limit, replacing the hardcoded 4GB. Testing with memory=100 correctly triggers the warning, which reports that the map requires 381MB and exceeds the limit; with memory=8000 it runs cleanly at 632% CPU in 5.9 seconds. The warning now also references g.region as you suggested. The updated commit is in the PR.
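The size check can be sketched like this. It is an illustration rather than the code in the PR: the function names are hypothetical, and it assumes 4-byte cells and a limit expressed in units of 1024 × 1024 bytes. Under those assumptions a 10,000 × 10,000 raster comes to about 381.5, which matches the 381MB figure in the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_MIB (1024.0 * 1024.0)

/* Size of a flat rows x cols buffer, in units of 1024 * 1024 bytes. */
static double buffer_mib(int64_t rows, int64_t cols, size_t cell_bytes)
{
    assert(rows >= 0 && cols >= 0 && cell_bytes > 0);
    return (double)rows * (double)cols * (double)cell_bytes / BYTES_PER_MIB;
}

/* Returns 1 if the buffer fits within the limit, 0 otherwise. */
static int fits_in_limit(int64_t rows, int64_t cols, size_t cell_bytes,
                         double limit_mib)
{
    return buffer_mib(rows, cols, cell_bytes) <= limit_mib;
}
```

Doing the multiplication in `double` avoids integer overflow for very large regions, at the cost of exactness that does not matter for a threshold check.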
Happy to address any other feedback before I finalize the proposal.
Thanks,
Kaushik
I’ve completed a full draft of my proposal and would welcome any feedback before the deadline.
Two things I’d specifically value input on: whether the static globals fix in lib/proj/do_proj.c should be scoped within this project or treated as a separate library fix, and whether the thread-local cache design for Path B aligns with how the community would want this approached.
Thank you,
Kaushik
Thank you. I briefly looked at it and I don’t necessarily have any suggestions for improvements, but you might want to reframe it around parallelization in general as opposed to only r.proj. My expectation would be that parallelizing r.proj might not take you more than a month, especially with today’s AI.
Hi Anna, thank you for the feedback. I’ll expand the scope to include additional modules. Could you suggest which tools beyond r.proj would be most valuable to the community? I want to prioritize based on what matters most rather than picking randomly.
Also, for the additional modules, would you expect the same level of analysis as for r.proj, or is a higher-level overview fine for the secondary targets?
Hi @annakrat
I’ve updated the proposal and expanded the scope based on your feedback. After auditing the r.fill.stats source code, I found that row-level parallelism is blocked by a sliding ring buffer, but the column loop inside interpolate_row() is fully independent per cell, making it a clean target. r.fill.stats is now confirmed as the secondary module. I’ll identify an additional module during the bonding period after auditing the remaining candidates.
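The pattern described here, sequential rows with a parallel column loop, can be sketched as below. This is not the r.fill.stats code: `trailing_column_mean` is a hypothetical function that keeps the last `win` rows in a ring buffer and writes the per-column mean of the rows currently held. It only illustrates why the row loop stays serial while the cells within a row are independent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* For each row r, out[r][c] is the mean of in[r-win+1..r][c], clipped at
 * the top of the grid. Returns 0 on success, -1 if allocation fails. */
static int trailing_column_mean(const float *in, float *out,
                                int nrows, int ncols, int win)
{
    assert(in && out && nrows > 0 && ncols > 0 && win > 0);

    float *ring = malloc((size_t)win * ncols * sizeof *ring);
    if (!ring)
        return -1;

    /* Row loop is sequential: each step overwrites the oldest ring slot. */
    for (int r = 0; r < nrows; r++) {
        memcpy(&ring[(size_t)(r % win) * ncols], &in[(size_t)r * ncols],
               (size_t)ncols * sizeof *ring);
        int filled = (r + 1 < win) ? r + 1 : win;

        /* Column loop is parallel: each cell reads the ring and writes
         * only its own output cell. */
#pragma omp parallel for schedule(static)
        for (int c = 0; c < ncols; c++) {
            float sum = 0.0f;
            for (int k = 0; k < filled; k++)
                sum += ring[(size_t)k * ncols + c];
            out[(size_t)r * ncols + c] = sum / (float)filled;
        }
    }
    free(ring);
    return 0;
}
```

Opening a parallel region once per row adds overhead, so a fuller implementation might hoist the region outside the row loop and keep only the ring update serialized.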
Please let me know if there are any changes I should make before the deadline.
Thanks,
Kaushik