D.Challoner

§ Chapter II

About

A short account of the person on the other end of the email.

Engineer, manager, dad, builder. Spent the last stretch as a Staff Engineer and Uber Tech Lead at Google. Chapter author of The Site Reliability Workbook and Building Secure and Reliable Systems. Now independent and exploring how to make autonomous agentic engineering reliable.
Based in: Los Angeles
Status: Independent · 2026
From: Google - 2017
Focus: AuthZ · AI · SRE
David Challoner smiling with his child in a black-and-white illustrated portrait
Family Portrait

Fig. 1 — The author, off the clock. © DC

I spent nearly a decade at Google, including the last five years as an SRE “Uber” Tech Lead. My team ran the authorization and data-governance systems behind GCP’s Regulated Cloud, as well as the controls that governed privileged access inside Google. It was zero-trust infrastructure at a scale large enough to punish hand-waving. I stepped out in March 2026 to build independently.

Along the way I co-wrote the Anthos Service Mesh adoption story for Google Corp Eng, was the primary author of “Eliminating Toil” in The Site Reliability Workbook (O’Reilly, 2018), and contributed to Building Secure and Reliable Systems (Google, 2020). I’ve also filed a handful of patent disclosures, including one on path-aware access control.

Now that I’ve left, I’m building a few things in the open: JustAuth, RepoGauge, and ClawFS. Each one pokes at a different edge of the question: what does infrastructure look like when the primary user is an agent, not a human?

Off the keyboard I’m a dad, archer, hiker, long-time tinkerer, and the proud maintainer of several half-finished side projects.

Specialties

Authorization · Data Governance · Zero-Trust · Service Mesh · SRE · Agentic Systems.

Also

Rust · Python · Go · TypeScript. Writer, occasional speaker, patient debugger.

§II

Experience

A record of where the time went.

  1. 2026 —

    Independent · Building & advising

    Building JustAuth, RepoGauge, and ClawFS in the open. Taking on a small number of consulting engagements per quarter on authorization, reliability, service mesh, and agentic AI infrastructure.

    AuthZ · AI Eng. · SRE
  2. — March 2026

    Uber Tech Lead · Google — Regulated Cloud SRE

    Ran the authorization and data-governance systems that mediate how Googlers reach internal applications. Led the adoption of Anthos Service Mesh across Corp Eng to enforce consistent security policies over services deployed across cloud, corp data centers, and edge locations.

    AuthZ · Zero Trust · Anthos · Istio
  3. Earlier at Google

    Senior Site Reliability Engineer · Google

    A long tour across SRE teams. Primary author of “Eliminating Toil” in The Site Reliability Workbook (O’Reilly, 2018) and contributing author to Building Secure and Reliable Systems (Google, 2020).

    SRE · Reliability · Platform
  4. Before Google

    Software Engineer · Startups & infrastructure

    Early-career engineering in distributed systems and infrastructure. Learned what “good” operations look like largely by not having them yet.

    Distributed · Infra

§II.ii

Publications

Written, co-written, and quoted elsewhere.

  1. 2018

    The Site Reliability Workbook — Ch. 6 · O’Reilly · Primary author

    “Eliminating Toil” — the chapter on identifying and reducing the repetitive, predictable work that erodes SRE team capacity. Co-authored with John Looney, Vivek Rau, and others.

    Book · SRE
  2. 2020

    Building Secure and Reliable Systems · Google · Contributing author

    Google’s follow-up to the SRE Book, bringing the disciplines of reliability and security into one volume.

    Book · Security
  3. 2022

    Securing apps for Googlers using Anthos Service Mesh · Google Cloud Blog · Co-author

    With Anthony Bushong. How Corp Eng and Access SRE adopted Anthos Service Mesh to mediate Googler access across trust boundaries with minimal operational overhead.

    Service Mesh · Zero Trust
  4. 2024

    Access Control Based on Entire Request Path · Invention disclosure · TDCommons 6837

    A path-aware access-control mechanism: validates every device a request traverses, not just endpoints — preventing regulated data from leaving permissible regions.

    Patent · AuthZ
  5. 2026

    Climbing the Agentic Coding Ladder · LinkedIn · Essay

    A working field guide to autonomous coding agents: which rungs are solid, which are aspirational, and how to tell in practice.

    AI Eng.

§II.iii

Education

Somewhat formal training.

  1. 2004 — 2008

    Computer Science + Political Science · University of California, Santa Cruz

    Data structures and algorithms, storage systems (Ceph!), distributed systems. Also studied international trade, how democracies work, and which policies seem to promote the most prosperity.

§III

Principles

How I think about the work.

  1. 01

    Boring is a feature.

    The best infrastructure is the kind you can forget about. I reach for proven tools and earn the right to novelty.

  2. 02

    Reliability is a lens.

    SLOs before launches. Error budgets as real constraints. If you can’t measure it, you can’t protect it.

  3. 03

    Small surface area.

    Interfaces should be narrow, durable, and easy to reason about. Remove weight from the airframe.

  4. 04

    Write things down.

    Design docs, post-mortems, READMEs. The org’s memory outlives its people; the artifacts should be worth inheriting.

  5. 05

    Technology agnostic.

    I’ve seen pretty bad code in every language. Velocity and shipping confidently without regressions matter more than frameworks and languages.

  6. 06

    Respect the on-call.

    Be deliberate about what surfaces as a ticket vs. a page. Waking people up kills them slowly.