Creating an Operations Module
An operations document keeps the system itself healthy at runtime — cache eviction, log levels, observability, configuration, outbound mail. It is engineer-facing and steady-state: the day-to-day upkeep that does not require a deployment.
1. When to use this type
Create an Operations module when engineers/SREs need documented, repeatable runtime control over a deployed system: inspecting health, changing log levels, evicting caches, reading traces, configuring integrations.
| It IS operations when… | It is NOT operations when… |
|---|---|
It keeps the running system healthy |
It produces a business outcome → Procedures |
It is routine / steady-state |
It recovers from an exceptional event → Runbooks |
The audience is an engineer / SRE |
The audience is an operator learning a feature → User Guide |
It is runtime (no redeploy) |
It is build/release/infra provisioning → a Deployment module |
2. Standard structure
modules/operations/
├── nav.adoc
└── pages/
├── index.adoc # Overview + boundary with Procedures/Runbooks/Deployment
├── management-api-access.adoc # One page per operational concern
├── cache-management.adoc
├── observability.adoc
└── <integration>-configuration.adoc
The index.adoc should explicitly state the boundary: "this module keeps the system healthy;
for business outcomes see Procedures, for exceptional recovery see Runbooks, for build/deploy see
Deployment."
3. Page template
Operations pages vary more than procedures, but each should cover: purpose, prerequisites /
access, how to do it (commands/endpoints), how to verify, and cautions. Copy
templates/operations-template.adoc, or:
= <Operational concern>
:description: <one line — what runtime capability this covers>
== Overview
<What this controls and when an engineer needs it.>
== Access
<Authentication, endpoints, tunnels, required roles.>
== Procedure
<Commands / API calls / config, with expected output.>
== Verification
<How to confirm the change took effect (e.g. cluster-wide).>
== Cautions
<Blast radius, timing windows, what NOT to do in production.>
4. Flexibility
-
Library / SDK: usually no Operations module — there is no running system to keep healthy.
-
Single service: a few pages (health/management API, cache, observability) typically suffice.
-
Platform / multi-service: split by concern and consider a dedicated Observability or Configuration sub-area.
5. Quality checklist
-
Each page states access/prerequisites and the verification step.
-
Cautions cover blast radius and production timing.
-
The index draws the boundary with Procedures, Runbooks, and Deployment.
-
No business-outcome content (that is Procedures); no incident recovery (that is Runbooks).
6. Common pitfalls
|
7. Related
-
Documentation Architecture — where this type fits.
-
Runbooks Guide — the exceptional sibling.
-
Procedures Guide — the operator-facing sibling.