Concepts¶
This project exists to make LLM-backed document processing auditable. The core design choice is that a run must leave behind artifacts that can be inspected, diffed, and validated.
Pipeline as a contract¶
A run is an execution of the canonical pipeline (AuditableDocPipeline) over a context:
- a goal (
task_goal) - an input (
textor afile_path) - a stable identifier (
context_id)
The pipeline is canonical: its phase order and allowed transitions are fixed and versioned. Consumers should assume “same inputs + same config + same model settings” implies “same trace classification”, not “same model output”.
Phases¶
The canonical lifecycle is:
INIT → PLAN → EXECUTE → JUDGE → VERIFY → FINALIZE → DONE
Not every phase necessarily produces user-visible content today, but the lifecycle exists so traces have a stable semantic scaffold.
Artifacts and why they matter¶
A normal CLI run writes two primary artifacts:
result/final_result.json— a compact verdict summary (what you would read first)trace/run_trace.json— the trace (what you audit and validate)
The trace is designed to support:
- post-hoc debugging (“why did we decide PASS?”),
- regression detection (compare fingerprints across versions),
- replay validation (“does this trace satisfy the contract?”).
Determinism vs replayability¶
Determinism is treated as a classification, not a promise that LLM sampling will match across time.
The system marks traces:
- REPLAYABLE when
model_metadata.temperature == 0.0 - NON_REPLAYABLE otherwise
Replayability controls what the tooling is allowed to claim, not what the user is allowed to do.
Failure semantics (high-level)¶
Failures are represented by an immutable FailureArtifact:
- a machine-actionable class (
failure_class) - an operational/epistemic category
- recoverability (whether the orchestrator may attempt recovery)
See the formal taxonomy: docs/spec/failure_model.md.
Where to go next¶
- For hands-on usage:
docs/user/usage.md - For the formal contract:
docs/spec/read_this_first.md