[GSoC 2026] Parallelizing r.proj: Architecture Questions on PJ_CONTEXT and Cache Thread-Safety

Hi mentors,

My name is Kaushik Raja, and I’m a CS + Crop Sciences student at UIUC. I’ve been diving into the r.proj implementation and I’m very interested in taking on the Parallelization of existing tools project for GSoC this year.

In digging through the code, I found two main hurdles for an OpenMP implementation:

  • Thread-Safety of PROJ Contexts: Managing PJ_CONTEXT per thread to avoid race conditions during coordinate transformations.

  • Caching Logic: The struct cache and get_block mechanisms in r.proj.h seem to require a refactor for concurrent access.
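For the first hurdle, the usual pattern is to give each thread its own context inside the parallel region. The sketch below uses stub types in place of PROJ's PJ_CONTEXT/PJ handles (the real calls would be proj_context_create(), proj_create_crs_to_crs(), and proj_trans()); it only illustrates the ownership shape, not the actual API usage in r.proj:

```c
#include <stdlib.h>

/* Stand-ins for PROJ's PJ_CONTEXT handle and transform call, just to
 * show the per-thread ownership pattern. */
typedef struct { int id; } Ctx;
static Ctx *ctx_create(void) { return calloc(1, sizeof(Ctx)); }
static void ctx_destroy(Ctx *c) { free(c); }
static double transform(Ctx *c, double x) { (void)c; return x * 2.0; }

void project_all(double *coords, int n)
{
    #pragma omp parallel
    {
        /* each thread creates and owns its own context, so the
         * library's internal state is never shared across threads */
        Ctx *ctx = ctx_create();

        #pragma omp for
        for (int i = 0; i < n; i++)
            coords[i] = transform(ctx, coords[i]);

        ctx_destroy(ctx);
    }
}
```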

I’m currently debating between two architectural approaches for the proposal:

  • A mutex-locked shared cache for memory efficiency.

  • A tiled processing approach where threads operate on independent row-bands/tiles with local buffers to maximize cache locality and minimize synchronization overhead.
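A minimal shape of the second option might look like this; project_cell() is a hypothetical stand-in for the real inverse-projection and resampling step:

```c
#include <stdlib.h>

/* Placeholder for the per-cell reprojection + resample work. */
static double project_cell(int row, int col)
{
    return (double)row * 1000.0 + col;
}

void process_rows(double *out, int nrows, int ncols)
{
    /* static scheduling hands each thread a contiguous row band, so
     * output writes are disjoint, cache-friendly, and lock-free */
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < nrows; row++)
        for (int col = 0; col < ncols; col++)
            out[(size_t)row * ncols + col] = project_cell(row, col);
}
```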

I’m already working with the codebase locally to establish baseline benchmarks for the serial implementation. I’m also planning to submit a draft PR soon with a small proof of concept to show some initial progress and get early feedback on the direction. I’d love to hear from the mentors on which architectural approach aligns better with the long-term goals of the raster library.

Thanks,

Kaushik Raja

Following up on my earlier post about parallelizing r.proj. I have a working proof-of-concept and a draft PR ready.

What I built:
The readcell tile cache is not thread-safe, so rather than wrapping it in mutexes (which I prototyped first; it just serializes everything and gives no speedup), I pre-load the input raster into a flat contiguous RAM buffer before the parallel loop. Reads from a flat array are inherently thread-safe, so no locks are needed during projection.
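A minimal sketch of that preload-then-read pattern; read_source_row() here is a stand-in for the serial readcell path, and sum_region() is just a placeholder parallel consumer, not the projection loop itself:

```c
#include <stdlib.h>

/* Stand-in for the serial row reader (the readcell path in GRASS). */
static void read_source_row(double *dst, int row, int ncols)
{
    for (int col = 0; col < ncols; col++)
        dst[col] = (double)(row * ncols + col);
}

/* Phase 1 (serial): preload the whole raster into one flat buffer. */
double *preload_raster(int nrows, int ncols)
{
    double *buf = malloc((size_t)nrows * ncols * sizeof(double));
    if (!buf)
        return NULL;
    for (int row = 0; row < nrows; row++)
        read_source_row(buf + (size_t)row * ncols, row, ncols);
    return buf;
}

/* Phase 2 (parallel): read-only access to the buffer needs no locks. */
double sum_region(const double *buf, int nrows, int ncols)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (int row = 0; row < nrows; row++)
        for (int col = 0; col < ncols; col++)
            total += buf[(size_t)row * ncols + col];
    return total;
}
```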


Benchmarks (10,000 × 10,000 raster, Apple M4, 8 cores):

  • Serial: ~20.3s, 83% CPU
  • Parallel (UTM → WebMercator): ~8.1s, 515% CPU
  • Parallel (UTM → Lat/Lon): 9.1s, 572% CPU

That's roughly a 2.5× wall-time speedup across two independent projection pipelines.

I’ve added a memory safety check that warns users if the buffer would exceed 4 GB and points them to g.region to shrink the region.
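The guard itself is simple arithmetic; a sketch of the check (the function name and shape are mine, not the PR's):

```c
#include <stdint.h>

/* Returns nonzero if an nrows x ncols buffer of cell_size-byte cells
 * would exceed limit_bytes. Dividing first avoids 64-bit overflow
 * when multiplying huge region dimensions. */
int buffer_exceeds_limit(uint64_t nrows, uint64_t ncols,
                         uint64_t cell_size, uint64_t limit_bytes)
{
    return nrows > limit_bytes / (ncols * cell_size);
}
```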
Draft PR: [GSoC 2026 Draft POC] r.proj: OpenMP parallelization via RAM-resident buffer by krcoder123 · Pull Request #7185 · OSGeo/grass · GitHub

Two questions for mentors:

  1. Is the RAM-resident approach acceptable long-term, or are thread-local caches strongly preferred for the final implementation?
  2. Are there thread-safety concerns in GPJ_transform beyond PJ_CONTEXT that I should audit before the proposal deadline?

Thanks,
Kaushik Raja
GitHub: krcoder123

Have you thought about giving an option for raising the 4 GB limit to a user-defined memory limit? I do like referring to g.region rather than just crashing when the memory limit is exceeded.

Doug


Hi Doug, thanks for the feedback. I’ve updated the PR to wire the existing memory parameter directly to the RAM buffer limit instead of the hardcoded 4 GB. Testing with memory=100 correctly triggers the warning (the map requires 381 MB, which exceeds the limit), and with memory=8000 it runs cleanly at 632% CPU in 5.9 seconds. The warning now also references g.region, as you suggested. The updated commit is in the PR.

Happy to address any other feedback before I finalize the proposal.

Thanks,

Kaushik

I’ve completed a full draft of my proposal and would welcome any feedback before the deadline.

Two things I’d specifically value input on: whether the static globals fix in lib/proj/do_proj.c should be scoped within this project or treated as a separate library fix, and whether the thread-local cache design for Path B aligns with how the community would want this approached.

Thank you,

Kaushik

Thank you, I briefly looked at it and I don’t necessarily have any suggestions for improvements, but you might want to reframe it around parallelization in general as opposed to only r.proj. My expectation would be that parallelizing r.proj might not take you more than a month, especially with today’s AI.

Hi Anna, thank you for the feedback. I’ll expand the scope to include additional modules. Could you suggest which tools beyond r.proj would be most valuable to the community? I want to prioritize based on what matters most rather than picking randomly.

Also, for the additional modules, would you expect the same level of analysis as r.proj, or is a higher level overview fine for the secondary targets?

Hi @annakrat

I’ve updated the proposal and expanded the scope based on your feedback. After auditing the r.fill.stats source code, I found that row-level parallelism is blocked by a sliding ring buffer, but the column loop inside interpolate_row() is fully independent per cell, making it a clean target. r.fill.stats is now confirmed as the secondary module. I’ll identify an additional module during the bonding period after auditing the remaining candidates.
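To illustrate the claim about the column loop, a sketch of what I have in mind; fill_cell() is a hypothetical stand-in for the per-cell interpolation, not the actual r.fill.stats code:

```c
/* The ring buffer advances serially row by row, but each cell within a
 * row is filled independently, so the column loop can be split across
 * threads. fill_cell() is a placeholder for the real interpolation. */
static double fill_cell(const double *window, int col)
{
    return window[col] * 0.5;   /* placeholder computation */
}

void interpolate_row_parallel(double *out, const double *window, int ncols)
{
    #pragma omp parallel for
    for (int col = 0; col < ncols; col++)
        out[col] = fill_cell(window, col);
}
```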

Please let me know if there are any changes I should make before the deadline.

Thanks,

Kaushik

Thanks for looking into it more. I am somewhat skeptical that column-level parallelization can provide any meaningful speedup, so these tools might need more work. Most of the low-hanging fruit has already been parallelized… You can look into r.geomorphon or r.param.scale, and we haven’t explored any vector tools yet.

Hi @annakrat

Thank you for the suggestions.

I’ve been looking into r.param.scale and r.geomorphon. r.param.scale uses a sequential sliding buffer, but I believe the right fix is the same RAM-preload pattern from r.proj: load the full raster into a contiguous array, then parallelize the outer row loop with per-thread computation buffers. This avoids re-reading each row’s neighborhood from disk multiple times. I’m working on a prototype and will share it and update my proposal before the deadline. Would r.param.scale be your preferred secondary target, or would r.geomorphon be higher priority?
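A sketch of what I mean by per-thread computation buffers: the raster is already preloaded into a flat array, the outer row loop is parallelized, and each thread fills its own window scratch buffer. A 3×3 mean stands in here for the real r.param.scale window computation:

```c
#include <stdlib.h>

void window_mean(double *out, const double *buf, int nrows, int ncols)
{
    #pragma omp parallel for
    for (int row = 1; row < nrows - 1; row++) {
        double win[9];  /* per-thread scratch, private to each iteration */
        for (int col = 1; col < ncols - 1; col++) {
            int k = 0;
            /* gather the 3x3 neighborhood from the preloaded raster */
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    win[k++] = buf[(size_t)(row + dr) * ncols + (col + dc)];
            double sum = 0.0;
            for (k = 0; k < 9; k++)
                sum += win[k];
            out[(size_t)row * ncols + col] = sum / 9.0;
        }
    }
}
```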

Thanks,

Kaushik

Hi @annakrat

I’ve updated the proposal based on your suggestions. r.param.scale is now confirmed as the secondary target. I audited the source, identified the sequential sliding buffer as the core blocker, applied the same RAM preload pattern from r.proj, and got 1.7x speedup on a 100M cell raster. I opened a draft PR for this: Gsoc parallel rparamscale by krcoder123 · Pull Request #7236 · OSGeo/grass · GitHub . r.geomorphon is the leading candidate for the third module and will be audited during the bonding period.

Updated proposal: GSoC 2026 GRASS Proposal - Kaushik Raja - Google Docs

Any suggestions for improvement before the submission deadline would be great!

Thanks,

Kaushik