Add link checker to CI to catch broken manual URLs

Hi! While going through the time series tutorials, I noticed a few documentation links that weren’t being validated:

  • Several links were hardcoded to versioned GRASS manuals (grass83, grass84) and now return 301 redirects.
  • One link pointed to the wrong tool entirely (t.rast.univar linking to t.vect.univar).

These were fixed in #121, but they slipped through because the current CI appears to verify that the site builds successfully without checking the validity of links.

Would the project be open to adding a lightweight link-checking step to CI? A small GitHub Actions workflow using lychee (the Rust-based link checker) could automatically detect broken links, unexpected redirects, and similar issues on every PR. It would be self-contained, would require no changes to the existing CI setup, and could be configured to ignore known unstable URLs if needed.

If this seems useful, I’d be happy to put together a PR with the workflow.

@echoix any opinion on this? No strong opinion on my side either way.

It was run manually about one or two years ago on the main repo. The problem is that there are a lot of older references that aren’t always available, or that are hosted on smaller personal servers that are slower.

With these slower servers, and without caching the results, it’s not realistic to check every link on every run. And having failures on the build tasks because of flaky external servers is more frustrating than it should be. Then, when configuring that check, it needs to be set up correctly so that we don’t end up with failures in the PR CI run vs the deployed layout.

Is there a way to not have it fail? To have it run as a separate job that can be restarted without rebuilding docs? To have it run manually?

I also commented somewhere about lychee this morning, but I don’t remember where.

@echoix @annakrat Thank you for the reviews!

Thanks for the context, that makes sense.

  • A separate workflow (not part of the PR checks) that runs on a schedule (e.g., weekly) or can be triggered manually via workflow_dispatch would avoid blocking PRs due to flaky external servers.

  • lychee also supports caching results and provides a --max-retries option, which could help reduce false positives from transient network failures.
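
As a rough sketch of how the two points above could fit together (not a tested configuration): the file name, the cron time, the retry count, the cache age, and the file glob are all placeholders I picked for illustration and would need adjusting to this repo. It assumes the lycheeverse/lychee-action GitHub Action and its `fail` input, plus actions/cache to persist lychee's `.lycheecache` file between runs.

```yaml
# Hypothetical file: .github/workflows/links.yml
name: Check links

on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Mondays 06:00 UTC (arbitrary choice)
  workflow_dispatch:       # allows manual runs from the Actions tab

jobs:
  lychee:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Persist lychee's cache file so recently checked URLs
      # (including slow servers) are not re-requested on a re-run.
      - uses: actions/cache@v4
        with:
          path: .lycheecache
          key: cache-lychee-${{ github.sha }}
          restore-keys: cache-lychee-

      - uses: lycheeverse/lychee-action@v2
        with:
          # The glob is a placeholder; point it at the actual tutorial sources.
          args: --cache --max-cache-age 7d --max-retries 3 --no-progress './**/*.md'
          fail: false   # report broken links without marking the job as failed
```

Because this would be its own workflow, it would not touch the docs build, could be re-run on its own, and would never block a PR. Setting `fail: false` keeps the run green while the link report still appears in the job output; that could be flipped later if the results turn out to be stable enough.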

Would that approach be worth a PR to try out?