GSoC 2025: Final Report | Migrating and Updating the Existing CWL Runners
Organization: OSGeo (Open Source Geospatial Foundation), ZOO-Project
Contributor: Aryan Khare
Hello everyone,
As GSoC comes to an end, I’m pleased to share my final report on the work completed over the past three months. This journey has been full of learning, and I’m grateful for the support and guidance from the ZOO-Project community and mentors.
Project Description
This project focuses on migrating and updating the existing CWL runners within the ZOO-Project to create a centralized, modular, and AI-ready framework.
Key goals included:
- Unifying multiple runners (WES, Argo, Calrissian) under a single entry point.
- Reducing code duplication through reusable commons.
- Introducing abstract base classes for consistency.
- Standardizing service templates.
- Building automated CI/CD pipelines.
- Providing clear developer documentation.
This work improves maintainability, speeds up development, and makes the execution of geospatial and Earth Observation workflows more reliable.
State of the Project Before GSoC
- Scattered Repositories: Each runner lived in a separate repo with no single entry point.
- Code Duplication: Common functionalities (logging, validation, execution) were reimplemented in each runner.
- Difficult Onboarding: No templates or guides for adding new runners.
- No Centralized Testing: Separate test setups, inconsistent quality, and no automated CI/CD.
Benefits to the Community
- Improved Code Quality: Unit tests serve as living documentation.
- Enhanced Reliability: Automated checks catch regressions early.
- Faster Development Cycles: CI/CD enables rapid iteration.
- Increased Maintainability: Reusable commons reduce long-term maintenance costs.
Outcomes of the Project
1. Architecture & Organization
- Single entry point: zoo-cwl-runners/main_runner.py provides a unified CLI.
- Lazy imports enable fast tests without installing every runner's dependencies.
- Separation of concerns: runners, templates, and commons in their own repos.
- Repository interconnection strategy with zoo-cwl-runners as the hub.
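A minimal sketch of how a single entry point with lazy imports can work. The registry maps a runner name to a "module:attribute" string and only imports that module when the runner is actually selected, so tests of the CLI itself never pull in Argo, WES, or Calrissian dependencies. The registry contents, module paths, and class names are hypothetical placeholders, not the actual main_runner.py code.

```python
import argparse
import importlib
from typing import Dict, List, Optional

# Hypothetical registry: runner name -> "module:attribute". These module and
# class names are placeholders, not the real package layout. Nothing here is
# imported until a runner is actually requested.
RUNNERS: Dict[str, str] = {
    "argowf": "zoo_argowf_runner.runner:ArgoWFRunner",
    "wes": "zoo_wes_runner.runner:WESRunner",
    "calrissian": "zoo_calrissian_runner.runner:CalrissianRunner",
}


def resolve_runner(name: str, registry: Optional[Dict[str, str]] = None) -> type:
    """Import and return the class registered under `name` (lazy import)."""
    registry = RUNNERS if registry is None else registry
    if name not in registry:
        raise ValueError(f"unknown runner {name!r}; choose from {sorted(registry)}")
    module_name, _, attr = registry[name].partition(":")
    module = importlib.import_module(module_name)  # heavy deps load only here
    return getattr(module, attr)


def main(argv: Optional[List[str]] = None, registry: Optional[Dict[str, str]] = None) -> int:
    """Single CLI entry point: pick a runner by name, then hand off to it."""
    registry = RUNNERS if registry is None else registry
    parser = argparse.ArgumentParser(prog="main_runner")
    parser.add_argument("runner", choices=sorted(registry), help="CWL runner backend")
    args = parser.parse_args(argv)
    runner_cls = resolve_runner(args.runner, registry)
    print(f"using {runner_cls.__module__}.{runner_cls.__qualname__}")
    return 0
```

With this layout, main(["wes"]) would import only the WES module; the other runner packages do not need to be installed for that call.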
2. Common Libraries
- zoo-runner-common: BaseRunner, ZooConf, ZooInputs, ZooOutputs, ZooStub.
- zoo-template-common: ExecutionHandler, CustomStacIO, extensible hooks.
- Packaged with pyproject.toml, installable via pip.
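To illustrate the idea behind a shared abstract base class, here is a sketch in which validation and logging are written once in the base and each backend only implements its execution step. The constructor arguments mimic the conf/inputs/outputs dictionaries ZOO passes to Python services, but the method names (validate, run, execute) and the toy EchoRunner are assumptions for illustration, not the actual zoo-runner-common BaseRunner API.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseRunner(ABC):
    """Common plumbing shared by all runners; subclasses supply execution.

    Hypothetical sketch: method names are illustrative, not the real API.
    """

    def __init__(self, cwl: Dict[str, Any], conf: Dict[str, Any],
                 inputs: Dict[str, Any], outputs: Dict[str, Any]):
        self.cwl = cwl
        self.conf = conf
        self.inputs = inputs
        self.outputs = outputs
        self.log = logging.getLogger(type(self).__name__)

    def validate(self) -> None:
        """Shared validation, written once instead of once per runner."""
        for key in ("cwlVersion", "class"):
            if key not in self.cwl:
                raise ValueError(f"CWL document is missing {key!r}")

    def run(self) -> int:
        """Template method: validate, log, then delegate to the backend step."""
        self.validate()
        self.log.info("starting %s", type(self).__name__)
        return self.execute()

    @abstractmethod
    def execute(self) -> int:
        """Backend-specific execution (Argo, WES, Calrissian, ...)."""


class EchoRunner(BaseRunner):
    """Toy runner used only to show the subclassing contract."""

    def execute(self) -> int:
        # ZOO-style dict-of-dicts: each input/output carries a "value" key.
        message = self.inputs.get("message", {}).get("value")
        self.outputs["result"] = {"value": message}
        return 0
```

Because execute is abstract, forgetting to implement it fails at instantiation time rather than in the middle of a workflow run.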
3. Runners Refactor
- Removed duplicated base code.
- Preserved public APIs.
- Runners (ArgoWF, WES, Calrissian) now import from commons.
- Smoke-test ready without requiring clusters.
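One way to make a runner smoke-testable without a cluster is to isolate the single cluster-dependent call behind an injectable function. The sketch below assumes that design; SketchRunner, submit_to_cluster, and fake_submit are hypothetical names, and the real runners may structure this differently.

```python
from typing import Any, Callable, Dict

Workflow = Dict[str, Any]
Submitter = Callable[[Workflow], str]


def submit_to_cluster(workflow: Workflow) -> str:
    """Hypothetical stand-in for the real backend call (Argo API, WES endpoint, Kubernetes)."""
    raise RuntimeError("no cluster configured in this environment")


class SketchRunner:
    """Hypothetical runner whose only cluster-dependent step is the submitter."""

    def __init__(self, workflow: Workflow, submit: Submitter = submit_to_cluster):
        self.workflow = workflow
        self._submit = submit

    def execute(self) -> str:
        # Everything before submission runs identically in tests and production.
        if self.workflow.get("class") not in {"Workflow", "CommandLineTool"}:
            raise ValueError("expected a CWL Workflow or CommandLineTool")
        return self._submit(self.workflow)


def fake_submit(workflow: Workflow) -> str:
    """Test double: returns a predictable job id instead of contacting a cluster."""
    return f"fake-job-{workflow['id']}"
```

For example, SketchRunner({"class": "Workflow", "id": "demo"}, submit=fake_submit).execute() returns "fake-job-demo", exercising the runner's own logic on a laptop or in CI.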
4. Service Templates
- Updated to use commons (ExecutionHandler, ZooStub, CustomStacIO).
- Preserved cookiecutter layouts for flexibility.
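The "extensible hooks" pattern behind a shared handler can be sketched as a base class with no-op hooks that a template overrides selectively. The class ExecutionHandlerSketch, the driver run_service, and the hook names and signatures here are illustrative assumptions, not the exact zoo-template-common ExecutionHandler API.

```python
from typing import Any, Callable, Dict, List


class ExecutionHandlerSketch:
    """Hypothetical base: default hooks do nothing, templates override what they need."""

    def pre_execution_hook(self) -> None:
        pass

    def post_execution_hook(self, outputs: Dict[str, Any]) -> None:
        pass

    def get_pod_env_vars(self) -> Dict[str, str]:
        return {}


def run_service(handler: ExecutionHandlerSketch,
                execute: Callable[[Dict[str, str]], Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical driver showing where each hook fires around execution."""
    handler.pre_execution_hook()
    outputs = execute(handler.get_pod_env_vars())
    handler.post_execution_hook(outputs)
    return outputs


class StacAwareHandler(ExecutionHandlerSketch):
    """Template-specific override: records hook calls and injects an env var."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def pre_execution_hook(self) -> None:
        self.events.append("pre")

    def get_pod_env_vars(self) -> Dict[str, str]:
        return {"STAC_CATALOG": "catalog.json"}  # illustrative value only

    def post_execution_hook(self, outputs: Dict[str, Any]) -> None:
        self.events.append(f"post:{sorted(outputs)}")
```

A template only overrides the hooks it cares about; everything else falls back to the shared defaults, which is what keeps the cookiecutter layouts thin.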
5. CI/CD (GitHub Actions)
- Two jobs: unit tests and integration smoke tests.
- Pinned Python 3.10 for stability.
- Skipped Calrissian tests due to packaging issues.
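Skipping the Calrissian tests can be made conditional rather than permanent, so they start running automatically once the packaging issue is fixed. This sketch uses only the standard library (importlib.util.find_spec and unittest.skipUnless); the test class names are hypothetical. With pytest, pytest.importorskip("pycalrissian") achieves the same effect at module level.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """True if a top-level module can be found on the import path, without importing it."""
    return importlib.util.find_spec(name) is not None


HAS_PYCALRISSIAN = has_module("pycalrissian")


@unittest.skipUnless(HAS_PYCALRISSIAN, "pycalrissian unavailable (packaging issue)")
class CalrissianSmokeTest(unittest.TestCase):
    def test_import(self) -> None:
        import pycalrissian  # noqa: F401  (only runs when the package is present)


class AlwaysRunsTest(unittest.TestCase):
    def test_stdlib_available(self) -> None:
        self.assertTrue(has_module("json"))
```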
6. Testing Outcomes
- Unit tests: validate repo sanity, importability, structure.
- Integration tests: dummy CWL configs for ArgoWF & WES.
- Test hygiene: markers, selective runs, extensible for future E2E.
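As an illustration of a dummy CWL config used in integration smoke tests, here is a minimal CommandLineTool embedded as a Python dict, plus a structural check a test could assert on. The document and the helper structural_problems are assumptions for illustration, not the project's actual test fixtures. In a pytest suite such checks would typically carry a marker (for example @pytest.mark.integration, a hypothetical marker name) so they can be selected with pytest -m.

```python
from typing import Any, Dict, List

# Minimal CWL CommandLineTool, small enough to embed in a test (illustrative).
DUMMY_CWL: Dict[str, Any] = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "id": "echo-tool",
    "baseCommand": "echo",
    "inputs": {"message": {"type": "string", "inputBinding": {"position": 1}}},
    "outputs": {"out": {"type": "stdout"}},
}

REQUIRED_KEYS = ("cwlVersion", "class", "inputs", "outputs")


def structural_problems(doc: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the document looks sane."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in doc]
    if doc.get("class") not in {"CommandLineTool", "Workflow", "ExpressionTool"}:
        problems.append(f"unexpected class {doc.get('class')!r}")
    return problems
```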
7. Documentation & Onboarding
- READMEs for commons and runners.
- Developer Guide: “How to Implement a New CWL Runner for ZOO-Project”.
8. Dependency Stabilization
- Python 3.10 pinned.
- Lazy imports for heavy/optional deps.
- Documented rationale for requirements.
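The two stabilization ideas, a pinned interpreter and lazy optional dependencies, can be sketched as small guard functions that fail early with a clear message. This sketch assumes 3.10 is treated as the minimum supported version (CI pinning alone does not imply that); check_python, optional_import, and the install hint are hypothetical helpers, not project code.

```python
import importlib
import sys
from types import ModuleType
from typing import Tuple

REQUIRED_PYTHON = (3, 10)  # assumption: the CI pin doubles as the minimum version


def check_python(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> None:
    """Fail fast with a readable message instead of an obscure error later."""
    major, minor = version[0], version[1]
    if (major, minor) < REQUIRED_PYTHON:
        raise RuntimeError(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ required, got {major}.{minor}"
        )


def optional_import(name: str, install_hint: str = "") -> ModuleType:
    """Import a heavy/optional dependency only when the feature that needs it runs."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        hint = f" (try: pip install {install_hint})" if install_hint else ""
        raise RuntimeError(f"optional dependency {name!r} is not installed{hint}") from exc
```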
9. Developer Experience
- Modular, extensible APIs.
- Consistent workspace strategy (deps/ in CI, sibling repos locally).
- Works with editable installs (pip install -e).
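A sketch of the workspace strategy: look for a dependency repository under deps/ when running in CI and next to the current checkout when working locally. It relies on GitHub Actions setting the CI environment variable to "true"; the function find_repo and the fallback order are assumptions for illustration. The returned path can then be passed to pip install -e.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_repo(name: str, checkout: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the path of repo `name`, preferring deps/ under CI and siblings locally.

    Hypothetical helper: GitHub Actions sets CI=true; locally the repos are
    assumed to be cloned side by side.
    """
    env = os.environ if env is None else env
    candidates = [checkout / "deps" / name, checkout.parent / name]
    if env.get("CI", "").lower() != "true":
        candidates.reverse()  # locally, prefer the sibling clone you are editing
    for path in candidates:
        if path.is_dir():
            return path
    raise FileNotFoundError(f"{name} not found in: {', '.join(map(str, candidates))}")
```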
10. Governance & Maintenance
- Single source of truth with commons.
- CI as a safety net.
- Transparent documentation of limitations.
11. Items Deferred
- Calrissian CI integration (packaging issues).
- End-to-End workflow execution tests.
- PyPI publishing of commons.
12. Net Impact
- Maintainability ↑: fewer copies, shared commons.
- Extensibility ↑: easier to add runners/templates.
- Reliability ↑: CI with automated checks.
- Community Onboarding ↑: docs and guides in place.
Potential Future Work
- Calrissian Integration – resolve pycalrissian packaging issues, add Kubernetes-based mock tests.
- End-to-End Workflow Tests – deploy lightweight infra (Argo, WES mocks) for real workflow validation.
- PyPI Distribution – publish zoo-runner-common and zoo-template-common.
- Documentation Expansion – unified docs site with API references and diagrams.
- CI/CD Enhancements – coverage, linting, mypy, multi-version testing.
- New Runner/Template Development – e.g., Nextflow, Snakemake, GPU-aware templates.
- Community Adoption – merge upstream, present at OSGeo/OGC, promote adoption in EOEPCA projects.
- Scalability & Performance – benchmarking runners, optimizing resource logic, caching strategies.
What I Have Learned
Technical Skills
- Python packaging (pyproject.toml, setuptools, dependency management).
- Designed modular, reusable components (BaseRunner, ExecutionHandler).
- Built CI/CD with GitHub Actions, unit + integration tests.
Open Source Practices
- Managed a multi-repository ecosystem with zoo-cwl-runners as the hub.
- Wrote developer guides and documentation.
- Reduced duplication through shared commons.
Personal Growth
- Strengthened problem-solving and debugging skills.
- Improved communication with mentors via weekly reports.
- Gained confidence in contributing to large-scale, OGC-aligned projects.
Links
- Weekly Reports & Updates: View Weekly Reports
- Project Proposal: Google Docs Proposal
- Project Wiki: Project Wiki Home
- Final Report: View Final Report
Repositories Worked On
- [1] zoo-calrissian-runner
- [2] zoo-argowf-runner
- [3] zoo-wes-runner
- [4] zoo-service-template
- [5] eoepca-proc-service-template
- [6] eoepca-proc-service-template-wes
- [7] zoo-argo-wf-proc-service-template
- [8] zoo-runner-common
- [9] zoo-cwl-runners
- [10] zoo-template-common
Acknowledgements
I sincerely thank my mentors Gérald Fenoy and Chetan Mahajan for their continuous support and guidance, and the OSGeo/ZOO-Project community for their encouragement throughout this journey.
Thank you!