D.Challoner

§ Chapter III

Consulting

Hard problems,
patiently solved.

I work with a small number of teams each quarter on the infrastructure problems that matter most — reliability, scale, platform design, and increasingly, the engineering behind serious AI systems.

Availability: Q3 · Q4 2026
Engagement: 2–8 weeks
Remote: EU · US hours
Equity: Considered

§I

Practice

Five pillars, one point of view.

01

Authorization & Data Governance

Who can do what, where, and why — answered well.

My home turf. I help teams design authorization that holds up in regulated environments: attribute-based access control, path-aware policies, tamper-proof audit, and approval flows for both humans and agents. It draws on a decade of running access systems at Google scale.

Typical outcomes

  • AuthZ architecture that survives audits
  • Policy evaluation with sub-200ms latency budgets
  • Immutable audit trails that actually answer questions
  • Safe access paths for agents, not just humans

02

Site Reliability Engineering

SLOs, error budgets, and grown-up on-call.

I wrote the chapter on this. Literally — “Eliminating Toil” in The Site Reliability Workbook. I help teams make reliability a first-class concern: SLOs that map to user journeys, error-budget culture that keeps launches honest, and on-call you can actually sustain.

Typical outcomes

  • Service-level objectives tied to real user journeys
  • Observability that answers the 2am question
  • On-call rotations engineers don’t dread
  • Post-mortems that change behaviour

03

Service Mesh & Zero Trust

Connectivity without the chaos.

Istio, Envoy, Anthos: I co-wrote the reference account of how Google’s Corp Eng rolled out Anthos Service Mesh (ASM) internally. I help teams pick the mesh shape that fits their traffic, their org, and their on-call load, and a migration path that doesn’t break them in the process.

Typical outcomes

  • A mesh architecture matched to your actual needs
  • Zero-trust mTLS done without breaking everything
  • Progressive delivery that engineers trust
  • A migration plan with graceful rollbacks

04

Distributed Systems & Platform

Consensus, coordination, and correctness.

The problems that don’t appear in a single-node test suite. Consistency models, partition behaviour, leader election, back-pressure, idempotency. Plus the platform engineering around them: golden paths, cluster topology, and blast-radius control.

Typical outcomes

  • Architecture reviews that find the real risks
  • Correctness properties stated plainly and tested
  • Replay, reconciliation, and recovery paths
  • A platform team charter worth keeping

05

AI & Agentic Engineering

From demo to durable — when the user is an agent.

A working prototype is not a product — especially when the product talks back. I help teams build serious agentic systems: evals that correlate with user value, observability for non-deterministic systems, and the infrastructure question we’re all still answering: what do filesystems, identity, and access look like when the caller is an agent?

Typical outcomes

  • Evals that correlate with user value
  • Observability for non-deterministic systems
  • Authorization and audit for agent callers
  • Cost & latency budgets that hold under growth

§II

Enterprise roadmap

Selling into Big Enterprise?
Here’s the roadmap.

Most startups find the regulated-enterprise and federal markets opaque — a wall of acronyms sitting between them and ten-figure deal sizes. It isn’t opaque. It’s a staged climb with well-known rungs, and each rung is tractable if you plan for it. Here’s the ladder I help teams map against — and the insider-risk controls that live across every stage.

  1. Stage 00 · Foundations

    Unlocks: mid-market and large commercial customers. Most procurement reviews clear here.

    SOC 2 Type II · ISO 27001 · Evidence automation · VPAT / accessibility

    Table stakes. The goal is not just to pass; it’s to pass cheaply, and never have to scramble for it again.

  2. Stage 01 · Commercial federal

    Unlocks: civilian agencies, state and local government, and heavily regulated commercial customers.

    FedRAMP Moderate · StateRAMP · CJIS where relevant · Customer-held encryption (BYOK/HYOK) · US-only residency

    The first step that actually changes the product. Expect 12–18 months and a dedicated program lead.

  3. Stage 02 · DoD-adjacent

    Unlocks: DoD mission owners, intelligence community primes, and the defence industrial base.

    FedRAMP High · DoD IL4 / IL5 · CMMC 2.0 L2 / L3 · GovCloud isolation · Hardware root of trust

    Architecture starts to fork. Worth it if your deal sizes justify a second SKU.

  4. Stage 03 · Export-controlled

    Unlocks: primes, defence R&D, and regulated research; the top of the pyramid.

    ITAR boundary · EAR / dual-use controls · Nationality-aware access · Regional data residency (EU, UK, AUKUS) · Air-gapped / sovereign options

    Path-aware access control, not just endpoint checks, is the quiet load-bearing requirement here.

§III

Process

How an engagement unfolds.

Every engagement starts with a conversation and ends with something you keep. What happens in between is shaped to the problem.

  1. I · 30 min

    Listen

    A free call. You describe the problem; I ask a lot of questions. We both leave with a clearer picture of whether I’m useful.

  2. II · 1 week

    Scope

    A short written proposal: the problem as I understand it, the proposed engagement shape, deliverables, and a fixed price.

  3. III · 2–6 weeks

    Dive

    I work alongside your team — reading code, instrumenting systems, joining standups, writing design docs, shipping fixes.

  4. IV · 1 week

    Hand-off

    A durable artifact: a memo, a working system, a hiring brief, a playbook. The goal is that you don’t need me after I leave.

§IV

Case notes

The shape of the work.

A few representative engagements, shaped by years on Access SRE. Companies anonymised, details adjusted, shape preserved.

Case 01

Regulated-industry SaaS

6 weeks

Problem

Zero-trust authorization stalled at scale — policy latency made every request a liability.

Approach

Reshaped policy evaluation to be path-aware, with attribute inputs resolved at the edge and a tiered cache for device and location signals.

Result

Policy evaluation under 50ms at p99. Audit coverage held. Compliance team stopped blocking launches.

AuthZ · Zero Trust

Case 02

Legal + finance org

8 weeks

Problem

High-value research was slow, manual, and too expensive to repeat reliably.

Approach

Built grounded AI research workflows with clear review checkpoints, source capture, and failure boundaries that made the output auditable.

Result

Turnaround dropped from days to minutes, replacing expensive manual research loops with agentic systems people would actually use.

AI Eng · Reliability

Case 03

AI-native startup

4 weeks

Problem

Agentic system worked in demo, drifted in prod; no way to measure the drift.

Approach

Built a repo-grounded eval harness (see RepoGauge), wired evals into CI, and defined the authorization boundaries the agents had to respect.

Result

Regressions caught in PRs rather than by customers. Agent cost per task down 35%. Trust, slowly, restored.

AI Eng · Evals

§V

Engagement

Three shapes of working together.

Office hours

01

Weekly · ongoing · From $2.5k / mo

Two hours of my week, on tap. A standing call plus async Slack/email. The right shape for teams that need a thoughtful outside eye, not a deliverable.

  • Architecture review on demand
  • Hiring & rubric guidance
  • Code & design-doc reads

Focused engagement

02

2–8 weeks · Fixed-price

The most common shape. A defined problem with a defined outcome: a reliability rollout, a hardened AI pipeline, a platform blueprint. Scoped together before we start.

  • Named deliverable(s)
  • Code contributions where useful
  • Written artifact you keep

Advisory

03

Quarterly · Cash or equity

For founders in the early innings. A few hours a month at your side as you make the architectural decisions that are hard to unmake later.

  • Strategic technical direction
  • Interview loops & senior hires
  • Early-stage partnership

To begin

Tell me what’s on fire.

A single, honest paragraph is all it takes to start. I read every message and reply to most within two business days.