
Failure semantics (spec)

This document explains how failures propagate through the pipeline and into artifacts.

Where failures appear

Failures can surface in:

  • the pipeline result (final_status)
  • trace entries (failure_artifact)
  • API responses (error object)

The representation is intentionally redundant: the verdict summary is for humans, while the trace is for auditing and tooling.
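
As a rough illustration of this redundancy, the sketch below projects one failure onto the three surfaces listed above. Only final_status, failure_artifact, and the error object come from this document; the Failure class, its fields, the "failed" status value, and the dictionary layout are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    # Hypothetical in-memory form of a failure; field names are assumptions.
    failure_class: str
    message: str
    recoverable: bool


def project(failure: Failure) -> dict:
    """Render one failure on each surface where it can appear."""
    return {
        # Pipeline result: a coarse status ("failed" is an assumed value).
        "pipeline_result": {"final_status": "failed"},
        # Trace entry: the full artifact, intended for auditing and tooling.
        "trace_entry": {
            "failure_artifact": {
                "failure_class": failure.failure_class,
                "message": failure.message,
                "recoverable": failure.recoverable,
            }
        },
        # API response: an error object (shape assumed for illustration).
        "api_response": {
            "error": {"code": failure.failure_class, "message": failure.message}
        },
    }


views = project(Failure("timeout", "step exceeded its deadline", True))
```

The same failure_class value appears in both the trace entry and the API error, which is the kind of deliberate duplication described above.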

Recoverable vs non-recoverable

A failure artifact includes a recoverable field:

  • recoverable: true means the orchestrator MAY attempt a recovery strategy (e.g. retry)
  • recoverable: false means the run MUST terminate (or move to a non-retry fallback path)

Recoverability is constrained by the failure class profile; see docs/spec/failure_model.md.
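
A minimal sketch of how an orchestrator might act on the recoverable flag. The FailureArtifact class, the next_action function, and the attempt budget are hypothetical; this document only fixes the MAY/MUST semantics of the flag itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureArtifact:
    failure_class: str  # hypothetical field name
    recoverable: bool   # the flag described above


def next_action(failure: FailureArtifact, attempts: int, max_attempts: int = 3) -> str:
    """Return "retry" or "terminate" for a failed step.

    attempts is the number of tries already made; max_attempts is an
    assumed policy knob, not something this spec defines.
    """
    if not failure.recoverable:
        # recoverable: false -> the run MUST NOT be retried.
        return "terminate"
    if attempts < max_attempts:
        # recoverable: true -> a retry is permitted (MAY), not required.
        return "retry"
    # Budget exhausted: choosing not to recover is always allowed.
    return "terminate"
```

Because recoverable: true is only a permission, the orchestrator in this sketch is free to stop early; it is never free to retry a non-recoverable failure.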

Operational vs epistemic failures

  • Operational failures: timeouts, resource exhaustion, validation errors, etc.
  • Epistemic failures: the system cannot justify a confident decision (e.g. insufficient evidence)

Epistemic failures are not “bugs”; they are explicit outcomes that preserve honesty.
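
The distinction can be sketched as a small classification table. The failure class names below are taken from the examples in the two bullets; the enum, the table, and the function names are assumptions for illustration and do not reflect the actual taxonomy.

```python
from enum import Enum


class FailureKind(Enum):
    OPERATIONAL = "operational"
    EPISTEMIC = "epistemic"


# Hypothetical class names derived from the examples above.
FAILURE_KINDS = {
    "timeout": FailureKind.OPERATIONAL,
    "resource_exhaustion": FailureKind.OPERATIONAL,
    "validation_error": FailureKind.OPERATIONAL,
    "insufficient_evidence": FailureKind.EPISTEMIC,
}


def classify(failure_class: str) -> FailureKind:
    """Look up the kind of a failure class; unknown classes raise KeyError."""
    return FAILURE_KINDS[failure_class]


def is_explicit_outcome(failure_class: str) -> bool:
    """True when the failure is an honest 'cannot decide' result."""
    return classify(failure_class) is FailureKind.EPISTEMIC
```

Under this sketch, a caller could report epistemic failures as results rather than routing them to error handling meant for defects.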

Trace replayability interaction

Some failures are replayable (safe to validate deterministically); others are not. If a failure is non-replayable, replay tooling MUST NOT claim deterministic reproducibility.
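
One way replay tooling could honor this rule is to downgrade its claim as soon as any entry is non-replayable. The TraceEntry class, the replayable field, and the two claim labels are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TraceEntry:
    step: str
    replayable: bool  # hypothetical per-entry flag


def replay_claim(entries: Iterable[TraceEntry]) -> str:
    """Return the strongest reproducibility claim the trace supports."""
    if all(entry.replayable for entry in entries):
        return "deterministic"
    # At least one non-replayable failure: no determinism claim is made.
    return "best_effort"
```

A single non-replayable entry is enough to withdraw the deterministic claim for the whole trace in this sketch, which is the conservative reading of the rule.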

Contractual rule

If a failure artifact violates taxonomy or profiles, the system MUST fail fast rather than emit an invalid trace.
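
A minimal sketch of the fail-fast rule, assuming a profile table that records whether each failure class may be marked recoverable. The table contents, the exception type, and both function names are hypothetical; the real constraints live in docs/spec/failure_model.md.

```python
class InvalidFailureArtifact(Exception):
    """Raised instead of writing an invalid artifact to the trace."""


# Hypothetical profiles: failure class -> may it be marked recoverable?
PROFILES = {
    "timeout": True,
    "insufficient_evidence": False,
}


def validate_failure_artifact(artifact: dict) -> dict:
    """Return the artifact unchanged, or raise if it breaks the rules."""
    failure_class = artifact.get("failure_class")
    if failure_class not in PROFILES:
        raise InvalidFailureArtifact(f"unknown failure class: {failure_class!r}")
    recoverable = artifact.get("recoverable")
    if not isinstance(recoverable, bool):
        raise InvalidFailureArtifact("recoverable must be a boolean")
    if recoverable and not PROFILES[failure_class]:
        raise InvalidFailureArtifact(
            f"{failure_class} may not be marked recoverable"
        )
    return artifact


def emit_trace_entry(trace: list, artifact: dict) -> None:
    """Validate first, so an invalid artifact never reaches the trace."""
    trace.append({"failure_artifact": validate_failure_artifact(artifact)})
```

Validation runs before the append, so a rejected artifact leaves the trace exactly as it was.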