[GSoC 2026] Parallelizing r.proj: Architecture Questions on PJ_CONTEXT and Cache Thread-Safety

Hi mentors,

My name is Kaushik Raja, and I’m a CS + Crop Sciences student at UIUC. I’ve been diving into the r.proj implementation and I’m very interested in taking on the Parallelization of existing tools project for GSoC this year.

In digging through the code, I found two main hurdles for an OpenMP implementation:

  • Thread-Safety of PROJ Contexts: Managing PJ_CONTEXT per thread to avoid race conditions during coordinate transformations.

  • Caching Logic: The struct cache and get_block mechanisms in r.proj.h seem to require a refactor for concurrent access.
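For the first hurdle, the usual pattern is to give each thread its own context inside the parallel region. The sketch below uses stub types in place of PROJ's PJ_CONTEXT/PJ handles (the real calls would be proj_context_create(), proj_create_crs_to_crs(), and proj_trans()); it only illustrates the ownership shape, not the actual API usage in r.proj:

```c
#include <stdlib.h>

/* Stand-ins for PROJ's PJ_CONTEXT handle and transform call, just to
 * show the per-thread ownership pattern. */
typedef struct { int id; } Ctx;
static Ctx *ctx_create(void) { return calloc(1, sizeof(Ctx)); }
static void ctx_destroy(Ctx *c) { free(c); }
static double transform(Ctx *c, double x) { (void)c; return x * 2.0; }

void project_all(double *coords, int n)
{
    #pragma omp parallel
    {
        /* each thread creates and owns its own context, so the
         * library's internal state is never shared across threads */
        Ctx *ctx = ctx_create();

        #pragma omp for
        for (int i = 0; i < n; i++)
            coords[i] = transform(ctx, coords[i]);

        ctx_destroy(ctx);
    }
}
```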

I’m currently debating between two architectural approaches for the proposal:

  • A mutex-locked shared cache for memory efficiency.

  • A tiled processing approach where threads operate on independent row-bands/tiles with local buffers to maximize cache locality and minimize synchronization overhead.
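A minimal shape of the second option might look like this; project_cell() is a hypothetical stand-in for the real inverse-projection and resampling step:

```c
#include <stdlib.h>

/* Placeholder for the per-cell reprojection + resample work. */
static double project_cell(int row, int col)
{
    return (double)row * 1000.0 + col;
}

void process_rows(double *out, int nrows, int ncols)
{
    /* static scheduling hands each thread a contiguous row band, so
     * output writes are disjoint, cache-friendly, and lock-free */
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < nrows; row++)
        for (int col = 0; col < ncols; col++)
            out[(size_t)row * ncols + col] = project_cell(row, col);
}
```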

I’m already working with the codebase locally to establish baseline benchmarks for the serial implementation. I’m also planning to submit a draft PR soon with a small proof of concept to show some initial progress and get early feedback on the direction. I’d love to hear from the mentors on which architectural approach aligns better with the long-term goals of the raster library.

Thanks,

Kaushik Raja

Following up on my earlier post about parallelizing r.proj. I have a working proof-of-concept and a draft PR ready.

What I built:
The readcell tile cache is not thread-safe, so rather than wrapping it in mutexes (which I prototyped first; it just serializes everything and gives no speedup), I pre-load the input raster into a flat contiguous RAM buffer before the parallel loop. Reads from a flat array are inherently thread-safe, so no locks are needed during projection.
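A minimal sketch of that preload-then-read pattern; read_source_row() here is a stand-in for the serial readcell path, and sum_region() is just a placeholder parallel consumer, not the projection loop itself:

```c
#include <stdlib.h>

/* Stand-in for the serial row reader (the readcell path in GRASS). */
static void read_source_row(double *dst, int row, int ncols)
{
    for (int col = 0; col < ncols; col++)
        dst[col] = (double)(row * ncols + col);
}

/* Phase 1 (serial): preload the whole raster into one flat buffer. */
double *preload_raster(int nrows, int ncols)
{
    double *buf = malloc((size_t)nrows * ncols * sizeof(double));
    if (!buf)
        return NULL;
    for (int row = 0; row < nrows; row++)
        read_source_row(buf + (size_t)row * ncols, row, ncols);
    return buf;
}

/* Phase 2 (parallel): read-only access to the buffer needs no locks. */
double sum_region(const double *buf, int nrows, int ncols)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (int row = 0; row < nrows; row++)
        for (int col = 0; col < ncols; col++)
            total += buf[(size_t)row * ncols + col];
    return total;
}
```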


Benchmarks (10,000 × 10,000 raster, Apple M4, 8 cores):

  • Serial: ~20.3s, 83% CPU
  • Parallel (UTM → WebMercator): ~8.1s, 515% CPU
  • Parallel (UTM → Lat/Lon): 9.1s, 572% CPU

That's roughly a 2.5× wall-time speedup across two independent projection pipelines.

I’ve added a memory safety check that warns users if the buffer would exceed 4 GB and points them to g.region to shrink the region.
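The guard itself is simple arithmetic; a sketch of the check (the function name and shape are mine, not the PR's):

```c
#include <stdint.h>

/* Returns nonzero if an nrows x ncols buffer of cell_size-byte cells
 * would exceed limit_bytes. Dividing first avoids 64-bit overflow
 * when multiplying huge region dimensions. */
int buffer_exceeds_limit(uint64_t nrows, uint64_t ncols,
                         uint64_t cell_size, uint64_t limit_bytes)
{
    return nrows > limit_bytes / (ncols * cell_size);
}
```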
Draft PR: [GSoC 2026 Draft POC] r.proj: OpenMP parallelization via RAM-resident buffer by krcoder123 · Pull Request #7185 · OSGeo/grass · GitHub

Two questions for mentors:

  1. Is the RAM-resident approach acceptable long-term, or are thread-local caches strongly preferred for the final implementation?
  2. Are there thread-safety concerns in GPJ_transform beyond PJ_CONTEXT that I should audit before the proposal deadline?

Thanks,
Kaushik Raja
GitHub: krcoder123

Have you thought about giving an option for raising the 4 GB limit to a user-defined memory limit? I do like referring to g.region rather than just crashing when the memory limit is exceeded.

Doug


Hi Doug, thanks for the feedback. I’ve updated the PR to wire the existing memory parameter directly to the RAM buffer limit instead of the hardcoded 4 GB. Testing with memory=100 correctly triggers the warning (the map requires 381 MB, which exceeds the limit), and with memory=8000 it runs cleanly at 632% CPU in 5.9 seconds. The warning now also references g.region, as you suggested. The updated commit is in the PR.

Happy to address any other feedback before I finalize the proposal.

Thanks,

Kaushik

I’ve completed a full draft of my proposal and would welcome any feedback before the deadline.

Two things I’d specifically value input on: whether the static globals fix in lib/proj/do_proj.c should be scoped within this project or treated as a separate library fix, and whether the thread-local cache design for Path B aligns with how the community would want this approached.

Thank you,

Kaushik

Thank you, I briefly looked at it and I don’t necessarily have any suggestions for improvements, but you might want to reframe it around parallelization in general as opposed to only r.proj. My expectation would be that parallelizing r.proj might not take you more than a month, especially with today’s AI.

Hi Anna, thank you for the feedback. I’ll expand the scope to include additional modules. Could you suggest which tools beyond r.proj would be most valuable to the community? I want to prioritize based on what matters most rather than picking randomly.

Also, for the additional modules, would you expect the same level of analysis as r.proj, or is a higher level overview fine for the secondary targets?

Hi @annakrat

I’ve updated the proposal and expanded the scope based on your feedback. After auditing the r.fill.stats source code, I found that row-level parallelism is blocked by a sliding ring buffer, but the column loop inside interpolate_row() is fully independent per cell, making it a clean target. r.fill.stats is now confirmed as the secondary module. I’ll identify an additional module during the bonding period after auditing the remaining candidates.
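To illustrate the claim about the column loop, a sketch of what I have in mind; fill_cell() is a hypothetical stand-in for the per-cell interpolation, not the actual r.fill.stats code:

```c
/* The ring buffer advances serially row by row, but each cell within a
 * row is filled independently, so the column loop can be split across
 * threads. fill_cell() is a placeholder for the real interpolation. */
static double fill_cell(const double *window, int col)
{
    return window[col] * 0.5;   /* placeholder computation */
}

void interpolate_row_parallel(double *out, const double *window, int ncols)
{
    #pragma omp parallel for
    for (int col = 0; col < ncols; col++)
        out[col] = fill_cell(window, col);
}
```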

Please let me know if there are any changes I should make before the deadline.

Thanks,

Kaushik

Thanks for looking into it more. I am somewhat skeptical that column-level parallelization can provide any meaningful speedup, so these tools might need more work. Most of the low-hanging fruit has already been parallelized… You can look into r.geomorphon or r.param.scale, and we haven’t explored any vector tools yet.

Hi @annakrat

Thank you for the suggestions.

I’ve been looking into r.param.scale and r.geomorphon. r.param.scale uses a sequential sliding buffer, but I believe the right fix is the same RAM-preload pattern from r.proj: load the full raster into a contiguous array, then parallelize the outer row loop with per-thread computation buffers. This avoids re-reading each row’s neighborhood from disk multiple times. I’m working on a prototype and will share it and update my proposal before the deadline. Would r.param.scale be your preferred secondary target, or would r.geomorphon be higher priority?
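A sketch of what I mean by per-thread computation buffers: the raster is already preloaded into a flat array, the outer row loop is parallelized, and each thread fills its own window scratch buffer. A 3×3 mean stands in here for the real r.param.scale window computation:

```c
#include <stdlib.h>

void window_mean(double *out, const double *buf, int nrows, int ncols)
{
    #pragma omp parallel for
    for (int row = 1; row < nrows - 1; row++) {
        double win[9];  /* per-thread scratch, private to each iteration */
        for (int col = 1; col < ncols - 1; col++) {
            int k = 0;
            /* gather the 3x3 neighborhood from the preloaded raster */
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    win[k++] = buf[(size_t)(row + dr) * ncols + (col + dc)];
            double sum = 0.0;
            for (k = 0; k < 9; k++)
                sum += win[k];
            out[(size_t)row * ncols + col] = sum / 9.0;
        }
    }
}
```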

Thanks,

Kaushik

Hi @annakrat

I’ve updated the proposal based on your suggestions. r.param.scale is now confirmed as the secondary target. I audited the source, identified the sequential sliding buffer as the core blocker, applied the same RAM preload pattern from r.proj, and got 1.7x speedup on a 100M cell raster. I opened a draft PR for this: Gsoc parallel rparamscale by krcoder123 · Pull Request #7236 · OSGeo/grass · GitHub . r.geomorphon is the leading candidate for the third module and will be audited during the bonding period.

Updated proposal: GSoC 2026 GRASS Proposal - Kaushik Raja - Google Docs

Any suggestions for improvement before the submission deadline would be great!

Thanks,

Kaushik