ADR-0005: Prefer eventual consistency over enforced process
| Status | Accepted |
| Date | 2026-04-24 |
| Deciders | |
| Related | |
1. Context
The Event and Membership system supports a small live-events operation: registration desks under time pressure, volunteers running stock rooms, timing systems that occasionally misfire, and admin that happens after the event when someone has time to sit down and catch up. The software is one tool among many; it does not own the workflow.
Two design philosophies are in tension:
- Enforced process. The software treats each formal workflow step as a precondition. If step N is skipped, step N+1 fails loudly. Data stays consistent at the cost of forcing operators to complete every step in order, even when the physical reality has already moved past that step.
- Eventual consistency. The software accepts out-of-sequence events, missing triggers, and informal handovers as part of reality. Each deliberate transition is idempotent; each trigger is a signal, not a gate. Data converges to truth when later events provide the missing context.
Enforced process looks tidy on paper but punishes the busiest people in the system. Eventual consistency is messier to document but matches how live events actually run. The decisions we have taken so far on the number-lifecycle state machine all point in the same direction; it is worth naming the pattern explicitly so that future features follow it consistently.
2. Decision
When designing new workflows that span operator action and data capture, prefer eventual consistency:
- Accept implicit transitions. If the system receives a signal that is only valid after a formal step, and that step was skipped, infer the step from the signal. Record the inference in the audit trail but do not reject the signal. (ADR-0002: IN_STOCK result row → implicit issuance.)
- Make stamps monotone. Timestamps that represent "most recent" should advance only. Re-imports, out-of-order processing, and replay must not corrupt the signal. (ADR-0004: last_used.)
- Make transitions idempotent. The same trigger firing twice should produce the same end state. A result row encountering an already-IN_USE number re-stamps last_used without writing a duplicate log entry.
- Separate audit from state. race_number_state_log captures what happened; race_number.state captures where the number is now. These are different questions; the state machine does not reconstruct history, and the audit log does not drive business rules.
- Reject only when a human must decide. ADR-0003 rejects UNFIT → IN_USE because the four possible real-world causes have four different resolutions. "A human must decide" is the bright line for rejection; "the process wasn’t followed" is not.
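The rules above can be sketched together in one small transition function. This is a minimal illustration, not the project's implementation: RaceNumber, LogEntry, apply_result_row, and NeedsHumanDecision are hypothetical names, and a plain list stands in for race_number_state_log. Only the state names and the last_used field come from this ADR.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


class NeedsHumanDecision(Exception):
    """Raised only when the system cannot choose a resolution by itself."""


@dataclass
class LogEntry:  # hypothetical stand-in for a race_number_state_log row
    number: int
    from_state: str
    to_state: str
    reason: str


@dataclass
class RaceNumber:  # hypothetical stand-in for a race_number row
    number: int
    state: str = "IN_STOCK"
    last_used: Optional[datetime] = None


def apply_result_row(rn: RaceNumber, event_time: datetime, log: List[LogEntry]) -> None:
    """Treat a result row as a signal that the number was used at event_time."""
    if rn.state == "UNFIT":
        # Reject only when a human must decide (ADR-0003).
        raise NeedsHumanDecision(f"number {rn.number} is UNFIT but has a result row")

    if rn.state in ("IN_STOCK", "ISSUED"):
        # From IN_STOCK the issuance step was skipped: infer it and say so in the log.
        reason = (
            "implicit issuance inferred from result row"
            if rn.state == "IN_STOCK"
            else "result row"
        )
        log.append(LogEntry(rn.number, rn.state, "IN_USE", reason))
        rn.state = "IN_USE"
    # Already IN_USE: idempotent, so no second log entry is written.

    # Monotone stamp: last_used may advance but never retreats.
    if rn.last_used is None or event_time > rn.last_used:
        rn.last_used = event_time
```

In this sketch, importing the same result twice and then replaying an older one leaves the number IN_USE with a single log entry and the newest last_used. The audit entry is written in the same function as the state change but is never read back to decide anything, which keeps audit and state separate.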
3. Consequences
3.1. Positive
- Operators can do their job without fighting the system. Registration desks, stock rooms, and admin desks all work at their own cadence.
- The system is resilient to sequencing errors: out-of-order events, missed triggers, duplicate imports, and replay all converge to the same end state.
- The audit log remains the canonical record of what happened; the state fields give a fast answer to where we are now. Each question has one right place to look.
- New features can be added incrementally without requiring ops to change their habits.
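The convergence claim in the list above can be checked mechanically on a toy model. The sketch below is an assumption-laden simplification: Snapshot, apply_result, and replay are hypothetical names, and a (state, last_used) tuple stands in for a race_number row. It replays the same three result rows, one of them a duplicate and one historical, in every possible order.

```python
from datetime import datetime
from itertools import permutations
from typing import Iterable, Optional, Tuple

# (state, last_used): a hypothetical, simplified view of a race_number row.
Snapshot = Tuple[str, Optional[datetime]]


def apply_result(snapshot: Snapshot, event_time: datetime) -> Snapshot:
    """Fold one result row into the snapshot: idempotent state, monotone stamp."""
    _, last_used = snapshot
    if last_used is None or event_time > last_used:
        last_used = event_time
    return ("IN_USE", last_used)


def replay(events: Iterable[datetime]) -> Snapshot:
    """Start from IN_STOCK and apply the events in the order given."""
    snapshot: Snapshot = ("IN_STOCK", None)
    for event_time in events:
        snapshot = apply_result(snapshot, event_time)
    return snapshot


# One historical re-import plus a duplicated current result.
events = [datetime(2025, 9, 1), datetime(2026, 4, 20), datetime(2026, 4, 20)]

# Collect the end state for every ordering; convergence means one distinct value.
end_states = {replay(order) for order in permutations(events)}
```

If the transition is idempotent and the stamp monotone, end_states holds exactly one snapshot regardless of arrival order. A check of this shape could serve as a regression test when a new trigger is added.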
3.2. Negative
- Mid-flight state can be misleading. A number that has been physically handed to a participant but not yet raced may still show as IN_STOCK in a stock-count query until the result lands. Downstream consumers that need point-in-time accuracy have to consult the audit log, not the state.
- There is a ceiling on data quality: if no trigger ever arrives (participant DNF, no result row, no return scan), the system will never reconcile. A number handed out and never seen again stays IN_STOCK indefinitely. Periodic stock-takes are the compensating control.
- Discipline around the audit log matters. If a caller bypasses the service and writes state directly, the log diverges from reality and the self-healing guarantee breaks. ADR-0001 is the structural mitigation.
- "The system will figure it out" is not an excuse for sloppy process. Self-healing lowers the cost of a missed step, but the formal process still exists and ops training should still teach it. The tolerant semantics are a safety net, not a design intent.
4. Self-healing examples
These are the concrete points where the pattern shows up, as of 2026-04-24:
- IN_STOCK → IN_USE via result import. Formal issuance skipped; the result row is enough evidence; state catches up. (ADR-0002)
- last_used advances, never retreats. Historical re-imports cannot corrupt the pick-list signal. (ADR-0004)
- IN_USE re-stamp is silent. Repeated result imports for the same (number, event) do not duplicate log rows; only the first transition writes.
- Return-scan tolerates any source state. A stock-take scan for a number already IN_STOCK is a no-op, not an error. Ops can run the scan without first checking the state.
- Implicit orphan re-linking. recordNumberAssignment accepts a number in ISSUED/IN_USE with person_id = NULL (orphan state from migration or partial returns) and re-links the person without changing state.
When adding a new workflow, consult this list: if the new feature has a similar "formal step may be skipped" or "event may arrive out of order" shape, apply the same pattern.
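The last two entries in the list, the tolerant return scan and orphan re-linking, might look roughly like this. It is a sketch under stated assumptions: RaceNumber, record_return_scan, and relink_if_orphan are hypothetical names, the log is a plain list of tuples, and relink_if_orphan models only the orphan branch of recordNumberAssignment, whose other behaviour this ADR does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RaceNumber:  # hypothetical in-memory stand-in for a race_number row
    number: int
    state: str
    person_id: Optional[int] = None


# (number, from_state, to_state): a hypothetical, minimal audit row.
LogRow = Tuple[int, str, str]


def record_return_scan(rn: RaceNumber, log: List[LogRow]) -> bool:
    """Accept a return scan from any source state; return True if state changed."""
    if rn.state == "IN_STOCK":
        return False  # already in stock: a no-op, not an error
    log.append((rn.number, rn.state, "IN_STOCK"))
    rn.state = "IN_STOCK"
    return True


def relink_if_orphan(rn: RaceNumber, person_id: int) -> bool:
    """Attach a person to an orphaned ISSUED/IN_USE number without changing state."""
    if rn.state in ("ISSUED", "IN_USE") and rn.person_id is None:
        rn.person_id = person_id  # state is deliberately left untouched
        return True
    return False  # not an orphan; other cases are outside this sketch
```

Both functions return a flag instead of raising, so a caller can run them without checking the current state first, which is the property the list describes.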
5. Alternatives Considered
5.1. Alternative A: Strict process enforcement
Every transition requires its formal predecessor. Skipping a step raises an exception; ops must run the missing step retroactively before the system will accept the newer event. Rejected because, in a live-events context, this pushes coordination cost onto the people least able to absorb it, generates low-quality retroactive data entry, and creates a perpetual "reconciliation backlog" that nobody wants to work through.
5.2. Alternative B: Strict process, with a parallel "fix-it-up" admin tool
Enforce strictly in the primary path; build a separate admin UI to patch inconsistent state after the fact. Rejected because the fix-it tool becomes the primary path (ops learn it’s easier than running the formal workflow), and we end up maintaining two codebases for what could be one self-healing path.