The carbon-accounting risk stack is mature. The next vector, ESG-grade safeguards, is the one Qatalyst has the right team and the right customers to own. This brief lays out the thesis, the homework, the v0.3 prototype I built, and the v0.4 path I'd take it down next.
For the last fifteen years the carbon-credit market spent its energy on one question: did the project actually remove the carbon it claims to remove? Registries, VVBs, the ICVCM, and the rating agencies were all built around answering it. That's the carbon-accounting stack, and as of 2026 it's roughly mature.
The next question — the one that's going to drive the next five years of buy-side workflow — is different: who's getting hurt, who's getting paid, and who's going to sue. That's the ESG-grade safeguards stack, and it doesn't really exist yet. Caroline Guyot named the social-license dimension at Qatalyst's launch. Sentinel is one answer to what the screening tool for it should look like.
Qatalyst is positioned to own this layer because of three things at once: a carbon-finance domain anchor (Martalena), enterprise customers who actually care (ENGIE, SC), and the workflow surface already built around the analyst. A competitor would struggle to replicate any one of those, let alone all three together.
Five candidate pain points went through five elimination rounds, one fresh lens each.
| CANDIDATE | ELIMINATION LENS | WHY IT FELL |
|---|---|---|
| Methodology drift alarm | Time-saved lens | Fires rarely; feature, not product. |
| PDD parsing | Qatalyst product-fit lens | You already ship this — "Automated First-Line Evaluation." |
| Rater reconciler | Buildability lens | Sylvera + Calyx data is paywalled; demo would feel fake. |
| IC memo drafter | Defensibility lens | Every B2B SaaS will ship this. Feature, not product. |
| ✓ Safeguards screen | Narrative lens | Fills the gap Caroline Guyot named. Real free data sources. Becomes ESG. |
What works today, end-to-end.
Sentinel is a Flask + Python engine — four parallel data-source calls fan out per screen, the score rolls up by a transparent rule, and an LLM drafts the IC-memo Safeguards section. End-to-end latency runs 5–30 seconds depending on synthesis. The Next.js showcase you're reading right now is a separate, static-deployable site.
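A minimal sketch of that fan-out and roll-up, not the Sentinel source. The four source functions, their names, the green/amber/red flag vocabulary, and the "worst flag wins" rule are all assumptions made for illustration; the brief only says the rule is transparent.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four data-source calls. Each takes a
# project name and returns a flag; real calls would hit external services.
def ngo_ledger(project):
    return "green"

def indigenous_overlap(project):
    return "amber"

def country_indicators(project):
    return "green"

def adverse_news(project):
    return "green"

SOURCES = {
    "ngo_ledger": ngo_ledger,
    "indigenous_overlap": indigenous_overlap,
    "country_indicators": country_indicators,
    "adverse_news": adverse_news,
}

SEVERITY = {"green": 0, "amber": 1, "red": 2}

def roll_up(flags):
    """Assumed rule: the overall verdict is the worst single flag."""
    return max(flags.values(), key=lambda flag: SEVERITY[flag])

def screen(project, sources=SOURCES):
    # Submit every source call at once so the slowest one sets the wait.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, project) for name, fn in sources.items()}
        flags = {name: future.result() for name, future in futures.items()}
    return {"project": project, "flags": flags, "verdict": roll_up(flags)}

result = screen("Example REDD+ project")
print(result["verdict"])  # amber
```

Returning the per-source flags alongside the verdict is what keeps the rule inspectable: an analyst can see which source drove the result.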
v0.3 shipped. v0.4 + v1.0 are ~6 weeks of focused engineering.
Where I'd want to push this in month one.
Three product moves I'd want to make in my first 90 days at Qatalyst, in order.
Month one — ride shotgun. Sit with one of your analysts during real DD on a real project, with Sentinel open in a tab. Learn where it adds time, where it adds friction, where it adds value, where it adds nothing.
Month two — close the news layer. The blind tests showed adverse-news retrieval is the engine's weakest link — major-outlet coverage of known-controversial projects sometimes returns zero hits. The fix is an LLM-per-article claim classifier replacing the keyword scorer, multilingual sources, and an in-memory cache so verdicts are reproducible. The work isn't hard; it's the highest-leverage thing left.
Month three — close the loop. Re-screen the portfolio weekly. When a project flips from green to amber, the analyst gets a notification with a diff. This is what turns Sentinel from a screen into a watcher.
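The weekly re-screen reduces to comparing two snapshots of verdicts. A rough sketch follows, assuming verdicts are stored as a simple mapping from a project ID to green/amber/red; the function name and the snapshot shape are hypothetical.

```python
SEVERITY = {"green": 0, "amber": 1, "red": 2}

def verdict_changes(previous, current):
    """Return projects whose verdict worsened between two screening runs."""
    changes = []
    for project_id, new in current.items():
        old = previous.get(project_id)
        # Newly added projects have no baseline, so they are not a "flip".
        if old is not None and SEVERITY[new] > SEVERITY[old]:
            changes.append({"project": project_id, "from": old, "to": new})
    return changes

last_week = {"proj-a": "green", "proj-b": "amber"}
this_week = {"proj-a": "amber", "proj-b": "amber", "proj-c": "green"}
print(verdict_changes(last_week, this_week))
# [{'project': 'proj-a', 'from': 'green', 'to': 'amber'}]
```

Each entry in the returned list is the payload a notification would carry; the per-source diff behind the flip would be attached the same way.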
Where the prototype falls short — said out loud.
The curated NGO ledger has four entries. No real curation pipeline yet — that's the second most important thing to build after deployment.
Indigenous-overlap detection is dark for uncurated projects. Native Land Digital deprecated its free tier in late 2025, so a production deployment would need the paid API key for global coverage.
Environmental + Governance are country-level today. A REDD+ project sitting in 0.001% of Brazil's land shouldn't be judged by Brazil's national forest trend. v0.4 wires project-polygon GFW GLAD alerts.
News retrieval is the weakest link. Blind tests showed major-outlet coverage of known-controversial projects sometimes returning zero hits because query construction was too strict. v0.3 ships cascading query variants; v0.4 replaces the keyword scorer with an LLM-per-article classifier.
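The cascading idea can be sketched as: try the strictest query first and loosen it only when nothing comes back. The variant templates, the function names, and the fake search backend below are illustrative assumptions, not the queries v0.3 actually issues.

```python
def build_variants(project_name, country):
    """Order queries from strictest to loosest (an assumed ordering)."""
    return [
        f'"{project_name}" {country} land rights complaint',
        f'"{project_name}" {country}',
        f'"{project_name}"',
        project_name,
    ]

def cascade_search(variants, search):
    """Return (query, hits) for the first variant that yields any hits."""
    for query in variants:
        hits = search(query)
        if hits:
            return query, hits
    return None, []

# Fake index standing in for a news API: only the bare quoted name matches.
FAKE_INDEX = {'"Example Forest Project"': ["Outlet A: boundary dispute reported"]}

def fake_search(query):
    return FAKE_INDEX.get(query, [])

variants = build_variants("Example Forest Project", "Country X")
query, hits = cascade_search(variants, fake_search)
print(query, len(hits))  # "Example Forest Project" 1
```

Returning the query that matched, along with the hits, lets the memo record how loose the search had to get before it found coverage.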
Latency runs 5–30 seconds. Data fetch is ~3s; the rest is LLM synthesis. There is no caching layer today. v0.4 adds per-project caching to make verdicts reproducible run-over-run.
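One way per-project caching could work, as a sketch under stated assumptions: key the stored verdict on the project ID plus a hash of the fetched evidence, so the LLM synthesis step only reruns when the evidence changes. The `VerdictCache` class, the evidence shape, and the `fake_synthesize` stand-in are hypothetical.

```python
import hashlib
import json

class VerdictCache:
    """In-memory cache keyed on project ID plus a hash of the evidence."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(project_id, evidence):
        # sort_keys makes the hash independent of dict insertion order.
        payload = json.dumps(evidence, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return f"{project_id}:{digest}"

    def get_or_compute(self, project_id, evidence, synthesize):
        key = self._key(project_id, evidence)
        if key not in self._store:
            self._store[key] = synthesize(evidence)
        return self._store[key]

calls = []

def fake_synthesize(evidence):
    # Stand-in for the slow, non-deterministic LLM synthesis step.
    calls.append(evidence)
    return f"verdict #{len(calls)}"

cache = VerdictCache()
evidence = {"news_hits": 2, "ngo_flags": 0}
first = cache.get_or_compute("proj-a", evidence, fake_synthesize)
second = cache.get_or_compute("proj-a", evidence, fake_synthesize)
print(first == second, len(calls))  # True 1
```

A dict only survives for the life of the process; verdicts that must reproduce across restarts would need the same keys written to disk or a database.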
The honest version: I'm a third-year Rotman Commerce student, not a carbon-finance veteran. What I am is someone who got obsessed with the buy-side analyst's day and wanted to find out what would happen if I took one corner of it seriously, end-to-end, with Claude as a coding collaborator.
What came out is a working tool — engine, test suite, standards-anchored framework, public seams — and a clear opinion about where it goes next. I think that combination — the artifact plus the opinion — is the most honest signal I can hand you about how I'd work inside Qatalyst.
Happy to walk through any part of this live, or to take it apart and ship the v0.4 version if I get the chance.