Failure semantics (spec)
This document explains how failures propagate through the pipeline and into artifacts.
Where failures appear
Failures can surface in:
- the pipeline result (`final_status`)
- trace entries (`failure_artifact`)
- API responses (`error` object)
The representation is intentionally redundant: the verdict summary is for humans, the trace is for auditing and tooling.
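As a rough illustration of that redundancy, the sketch below mirrors a single failure into the three places listed above. Only `final_status`, `failure_artifact`, and the `error` object are named in this document; the fields inside them (`failure_class`, `message`), the status value `"failed"`, and the helper `surface_failure` are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FailureArtifact:
    # Hypothetical fields; the real schema is defined by the failure model spec.
    failure_class: str
    message: str
    recoverable: bool


def surface_failure(artifact: FailureArtifact) -> dict:
    """Mirror one failure into the pipeline result, the trace, and the API response."""
    return {
        "pipeline_result": {"final_status": "failed"},  # the value "failed" is assumed
        "trace_entry": {"failure_artifact": asdict(artifact)},
        "api_response": {
            "error": {
                "class": artifact.failure_class,
                "message": artifact.message,
            }
        },
    }


views = surface_failure(FailureArtifact("timeout", "stage exceeded deadline", True))
```

The trace entry carries the full artifact for tooling, while the API `error` object in this sketch exposes only a reduced view.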
Recoverable vs non-recoverable
A failure artifact includes `recoverable`:
- `recoverable: true` means the orchestrator MAY attempt a recovery strategy (e.g. retry)
- `recoverable: false` means the run MUST terminate (or move to a non-retry fallback path)
Recoverability is constrained by the failure class profile; see docs/spec/failure_model.md.
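One way an orchestrator might act on the flag is sketched below. This is an assumption-laden illustration, not the actual orchestrator: the profile table, the class names in it, the `attempts_left` budget, and the function `next_action` are all hypothetical, and the real constraints are those in docs/spec/failure_model.md.

```python
# Hypothetical profile table: which failure classes may ever be marked recoverable.
PROFILE_ALLOWS_RECOVERY = {
    "timeout": True,
    "validation_error": False,
}


def next_action(failure_class: str, recoverable: bool, attempts_left: int) -> str:
    """Return "retry" or "terminate" for a failure artifact."""
    allowed = PROFILE_ALLOWS_RECOVERY.get(failure_class, False)
    if recoverable and not allowed:
        # The artifact contradicts its class profile, so it is treated as invalid.
        raise ValueError(f"{failure_class!r} cannot be marked recoverable")
    if recoverable and attempts_left > 0:
        return "retry"  # MAY recover: retrying is permitted, not required
    return "terminate"  # MUST terminate (or hand off to a non-retry fallback)
```

Note that `recoverable: true` only permits recovery; in this sketch an exhausted retry budget still ends the run.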
Operational vs epistemic failures
- Operational failures: timeouts, resource exhaustion, validation errors, etc.
- Epistemic failures: the system cannot justify a confident decision (e.g. insufficient evidence)
Epistemic failures are not “bugs”; they are explicit outcomes that preserve honesty.
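The distinction can be made concrete with a small sketch in which an epistemic failure is returned as a normal result rather than raised as an exception. The `FailureKind` enum, the evidence threshold, the `failure_kind` and `reason` keys, and the status strings are hypothetical and serve only to illustrate the idea.

```python
from enum import Enum


class FailureKind(Enum):
    OPERATIONAL = "operational"
    EPISTEMIC = "epistemic"


def decide(evidence_count: int, min_evidence: int = 3) -> dict:
    """Return a verdict, or an explicit epistemic failure when evidence is thin."""
    if evidence_count < min_evidence:
        # Not an exception: the inability to decide is itself the reported outcome.
        return {
            "final_status": "failed",
            "failure_kind": FailureKind.EPISTEMIC.value,
            "reason": "insufficient evidence",
        }
    return {"final_status": "ok", "failure_kind": None, "reason": None}
```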
Trace replayability interaction
Some failures are replayable (safe to validate deterministically), others are not. If a failure is non-replayable, replay tooling MUST NOT claim deterministic reproducibility.
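A replay tool could enforce this rule with a guard along the following lines. The `replayable` key on the failure artifact and the returned claim strings are assumptions made for the sketch; this document does not define how replayability is recorded.

```python
def replay_claim(trace_entries: list) -> str:
    """Describe what a replay tool may claim about a trace."""
    for entry in trace_entries:
        artifact = entry.get("failure_artifact")
        # A missing flag is treated as non-replayable, the conservative choice.
        if artifact is not None and not artifact.get("replayable", False):
            return "best-effort replay; determinism not claimed"
    return "deterministic replay"
```

A single non-replayable failure anywhere in the trace is enough to withdraw the determinism claim for the whole replay.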
Contractual rule
If a failure artifact violates taxonomy or profiles, the system MUST fail fast rather than emit an invalid trace.
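A minimal sketch of the fail-fast behaviour, assuming validation runs before anything is written to the trace. The `TAXONOMY` table, its class names, the `failure_class` key, and the `InvalidFailureArtifact` exception are hypothetical stand-ins for the real taxonomy and profiles.

```python
class InvalidFailureArtifact(Exception):
    """Raised instead of writing a trace entry that breaks the contract."""


# Hypothetical taxonomy: failure class -> whether its profile permits recovery.
TAXONOMY = {"timeout": True, "insufficient_evidence": False}


def emit_failure(trace: list, artifact: dict) -> None:
    """Validate first, append second, so an invalid artifact never reaches the trace."""
    failure_class = artifact.get("failure_class")
    if failure_class not in TAXONOMY:
        raise InvalidFailureArtifact(f"unknown failure class: {failure_class!r}")
    if not isinstance(artifact.get("recoverable"), bool):
        raise InvalidFailureArtifact("recoverable must be true or false")
    if artifact["recoverable"] and not TAXONOMY[failure_class]:
        raise InvalidFailureArtifact(f"{failure_class!r} may not be recoverable")
    trace.append({"failure_artifact": artifact})
```

Because the append is the last statement, a rejected artifact leaves the trace exactly as it was.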