ADR-0005: Prefer eventual consistency over enforced process

Status	Accepted
Date	2026-04-24
Deciders	[email protected]
Related	ADR-0002, ADR-0004, Feature #473

1. Context

The Event and Membership system supports a small live-events operation: registration desks under time pressure, volunteers running stock rooms, timing systems that occasionally misfire, and admin that happens after the event when someone has time to sit down and catch up. The software is one tool among many; it does not own the workflow.

Two design philosophies are in tension:

Enforced process. The software treats each formal workflow step as a precondition. If step N is skipped, step N+1 fails loudly. Data stays consistent at the cost of forcing operators to complete every step in order — even when the physical reality has already moved past that step.
Eventual consistency. The software accepts out-of-sequence events, missing triggers, and informal handovers as part of reality. Each deliberate transition is idempotent; each trigger is a signal, not a gate. Data converges to truth when later events provide the missing context.

Enforced process looks tidy on paper but punishes the busiest people in the system. Eventual consistency is messier to document but matches how live events actually run. The decisions we’ve taken so far on the number-lifecycle state machine all point in the same direction; it’s worth naming the pattern explicitly so future features follow it consistently.

2. Decision

When designing new workflows that span operator action and data capture, prefer eventual consistency:

Accept implicit transitions. If the system receives a signal that is only valid after a formal step, and that step was skipped, infer the step from the signal. Record the inference in the audit trail but do not reject the signal. (ADR-0002: a result row for a number that is still IN_STOCK implies issuance.)
Make stamps monotone. Timestamps that represent "most recent" should only ever advance. Re-imports, out-of-order processing, and replay must not move them backwards. (ADR-0004: last_used.)
Make transitions idempotent. The same trigger firing twice should produce the same end state. A result row encountering an already-IN_USE number re-stamps last_used without writing a duplicate log entry.
Separate audit from state. race_number_state_log captures what happened; race_number.state captures where the number is now. These are different questions; the state machine does not reconstruct history, and the audit log does not drive business rules.
Reject only when a human must decide. ADR-0003 rejects UNFIT → IN_USE because the four possible real-world causes have four different resolutions. "A human must decide" is the bright line for rejection; "the process wasn’t followed" is not.
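The rules above can be sketched together in a few lines. This is a minimal illustration, not the service's actual code: the class and method names (NumberService, apply_result_row, NeedsHumanDecision) are hypothetical, the in-memory list stands in for race_number_state_log, and the state set is reduced to the states named in this ADR.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class State(Enum):
    IN_STOCK = "IN_STOCK"
    ISSUED = "ISSUED"
    IN_USE = "IN_USE"
    UNFIT = "UNFIT"


class NeedsHumanDecision(Exception):
    """Raised only when the system cannot choose a resolution by itself."""


@dataclass
class RaceNumber:
    number: int
    state: State = State.IN_STOCK          # "where the number is now"
    last_used: Optional[datetime] = None


@dataclass
class LogEntry:                            # hypothetical stand-in for a log row
    number: int
    from_state: State
    to_state: State
    reason: str


@dataclass
class NumberService:
    log: list[LogEntry] = field(default_factory=list)   # "what happened"

    def apply_result_row(self, n: RaceNumber, raced_at: datetime) -> None:
        if n.state is State.UNFIT:
            # Reject: a human has to work out which real-world cause applies.
            raise NeedsHumanDecision(f"number {n.number} is UNFIT")
        if n.state is not State.IN_USE:
            # Implicit transition: if issuance was skipped, infer it from the
            # result row and say so in the audit trail.
            reason = ("implicit issuance inferred from result row"
                      if n.state is State.IN_STOCK else "result row")
            self.log.append(LogEntry(n.number, n.state, State.IN_USE, reason))
            n.state = State.IN_USE
        # Already IN_USE: no new log entry, so a replay is idempotent.
        # Monotone stamp: last_used only ever moves forward.
        if n.last_used is None or raced_at > n.last_used:
            n.last_used = raced_at
```

Under these assumptions, importing the same result row twice leaves one log entry, and importing an older event afterwards leaves last_used untouched.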

3. Consequences

3.1. Positive

Operators can do their job without fighting the system. Registration desks, stock rooms, and admin desks all work at their own cadence.
The system is resilient to sequencing errors: out-of-order events, missed triggers, duplicate imports, and replay all converge to the same end state.
The audit log remains the canonical record of what happened; the state fields give a fast answer to where we are now. Each question has one right place to look.
New features can be added incrementally without requiring ops to change their habits.

3.2. Negative

Mid-flight state can be misleading. A number that has been physically handed to a participant but not yet raced may still show as IN_STOCK in a stock-count query until the result lands. Downstream consumers that need point-in-time accuracy have to consult the audit log, not the state.
There is a ceiling on data quality: if no trigger ever arrives (participant DNF, no result row, no return scan), the system will never reconcile. A number handed out without formal issuance and never seen again stays IN_STOCK indefinitely. Periodic stock-takes are the compensating control.
Discipline around the audit log matters. If a caller bypasses the service and writes state directly, the log diverges from reality and the self-healing guarantee breaks. ADR-0001 is the structural mitigation.
"The system will figure it out" is not an excuse for sloppy process. Self-healing lowers the cost of a missed step, but the formal process still exists and ops training should still teach it — the tolerant semantics are a safety net, not a design intent.

3.3. Neutral

Some decisions still require enforcement (e.g. DESTROYED is terminal; you cannot resurrect a disposed number via any trigger). Those are recorded in the state machine itself, not in this ADR.
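A terminal state of this kind can be expressed as a guard that runs before any tolerant handling. The sketch below is illustrative only: the function and exception names are hypothetical, and raising an error (rather than silently ignoring the trigger) is an assumption, since this ADR does not say which the state machine does.

```python
# Assumption: DESTROYED is the only terminal state named in this ADR.
TERMINAL_STATES = frozenset({"DESTROYED"})


class TerminalStateError(Exception):
    """Raised when a trigger targets a number that can no longer change."""


def guard_not_terminal(current_state: str) -> None:
    # Checked before any self-healing logic: tolerance applies to skipped
    # steps, not to numbers that have been disposed of.
    if current_state in TERMINAL_STATES:
        raise TerminalStateError(f"{current_state} is terminal")
```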

4. Self-healing examples

These are the concrete points where the pattern shows up, as of 2026-04-24:

IN_STOCK → IN_USE via result import. Formal issuance skipped; the result row is enough evidence; state catches up. (ADR-0002)
last_used advances, never retreats. Historical re-imports cannot corrupt the pick-list signal. (ADR-0004)
IN_USE re-stamp is silent. Repeated result imports for the same (number, event) do not duplicate log rows; only the first transition writes.
Return-scan tolerates any source state. A stock-take scan for a number already IN_STOCK is a no-op, not an error. Ops can run the scan without first checking the state.
Implicit orphan re-linking. recordNumberAssignment accepts a number in ISSUED/IN_USE with person_id = NULL (orphan state from migration or partial returns) and re-links the person without changing state.

When adding a new workflow, consult this list: if the new feature has a similar "formal step may be skipped" or "event may arrive out of order" shape, apply the same pattern.
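Two of the examples above, the tolerant return scan and orphan re-linking, have the same shape: check for the "already there" or "half there" case first and treat it as normal. The following is a minimal sketch under stated assumptions. record_return_scan and relink_orphan are hypothetical names; relink_orphan models only the orphan branch described for recordNumberAssignment, and terminal states such as DESTROYED are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaceNumber:
    number: int
    state: str                       # "IN_STOCK", "ISSUED" or "IN_USE" here
    person_id: Optional[int] = None


def record_return_scan(n: RaceNumber, log: list) -> bool:
    """Return True if the scan changed state, False if it was a no-op."""
    if n.state == "IN_STOCK":
        return False                 # already in stock: no-op, not an error
    log.append((n.number, n.state, "IN_STOCK"))
    n.state = "IN_STOCK"
    return True


def relink_orphan(n: RaceNumber, person_id: int) -> bool:
    """Re-link a person to an orphaned number without changing its state."""
    if n.state in ("ISSUED", "IN_USE") and n.person_id is None:
        n.person_id = person_id      # state is deliberately left untouched
        return True
    return False                     # not an orphan: other handling applies
```

Because the no-op case returns normally, ops can scan a whole shelf without first checking which numbers the system already believes are in stock.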

5. Alternatives Considered

5.1. Alternative A: Strict process enforcement

Every transition requires its formal predecessor. Skipping a step raises an exception; ops must run the missing step retroactively before the system will accept the newer event. Rejected because, in a live-events context, this pushes coordination cost onto the people least able to absorb it, generates low-quality retroactive data entry, and creates a perpetual "reconciliation backlog" that nobody wants to work through.

5.2. Alternative B: Strict process, with a parallel "fix-it-up" admin tool

Enforce strictly in the primary path; build a separate admin UI to patch inconsistent state after the fact. Rejected because the fix-it tool becomes the primary path (ops learn it’s easier than running the formal workflow), and we end up maintaining two codebases for what could be one self-healing path.

5.3. Alternative C: Document as ops guidance, leave code strict

Keep the code strict; tell ops "don’t skip steps". Rejected because software that fights its users loses. Operational reality does not change because we document a preference.

6. References

Design journal: design-journal/2026-03/number-tag-management.adoc (sessions 3 & 6 — state machine tolerances)
Concrete applications: ADR-0002, ADR-0003, ADR-0004
ADO: Feature #473 (Number & Tag Lifecycle)