Dataset Workflows
Dataset workflows are the bridge between built artifact state and store-backed serving state.
Dataset Workflow Map
flowchart TD
Build[Ingest build root] --> Validate[dataset validate]
Validate --> Verify[dataset verify]
Verify --> Publish[dataset publish]
Publish --> Pack[dataset pack]
Pack --> VerifyPack[dataset verify-pack]
This workflow map shows the main dataset lifecycle after ingest. Atlas keeps validation, verification, publication, and packaging as explicit steps so you can see which boundary each command crosses.
The Important Distinction
flowchart LR
BuildRoot[Build root] --> ValidateOps[validate and verify]
BuildRoot --> PublishOps[publish into store]
Store[Serving store] --> CatalogOps[catalog workflows]
This distinction diagram exists because “dataset state” can mean more than one thing in casual conversation. The build root and the serving store are related, but they are not interchangeable.
Atlas uses dataset commands both before and after publication:
- before publication, they validate or verify build output
- after publication, they help package or inspect durable dataset state
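Because the build root and the serving store are separate lifecycle stages, a wrapper script can refuse to publish when both paths resolve to the same directory. The sketch below is one way to express that check in shell; `assert_distinct_roots` is a hypothetical helper introduced here, not an Atlas command, and it assumes both directories already exist.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not an Atlas command): succeeds only when the two
# paths resolve to different directories.
assert_distinct_roots() {
  build_root="$1"
  store_root="$2"
  # Resolve each path physically; a missing directory fails the check.
  build_abs="$(cd "$build_root" && pwd -P)" || return 1
  store_abs="$(cd "$store_root" && pwd -P)" || return 1
  if [ "$build_abs" = "$store_abs" ]; then
    echo "build root and store root are the same directory: $build_abs" >&2
    return 1
  fi
}
```

A script could call `assert_distinct_roots "$BUILD_ROOT" "$STORE_ROOT"` before invoking `dataset publish`, so that a mistyped path fails early instead of mixing build output with serving state.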
Most Common Dataset Commands
- dataset validate
- dataset verify
- dataset publish
- dataset pack
- dataset verify-pack
Example Workflow
Validate and deeply verify a build root:
cargo run -p bijux-atlas --bin bijux-atlas -- dataset validate \
--root artifacts/getting-started/tiny-build \
--release 110 \
--species homo_sapiens \
--assembly GRCh38
cargo run -p bijux-atlas --bin bijux-atlas -- dataset verify \
--root artifacts/getting-started/tiny-build \
--release 110 \
--species homo_sapiens \
--assembly GRCh38 \
--deep
Publish into a store:
cargo run -p bijux-atlas --bin bijux-atlas -- dataset publish \
--source-root artifacts/getting-started/tiny-build \
--store-root artifacts/getting-started/tiny-store \
--release 110 \
--species homo_sapiens \
--assembly GRCh38
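The three commands above can be chained in one script so that publication only runs after validation and deep verification succeed. The following is a minimal sketch, not a checked-in Atlas script: the `atlas` wrapper, the `run_workflow` function, and the `DRY_RUN` switch are hypothetical names introduced here, and only the flags shown in the commands above are used. With `DRY_RUN` left at its default of `1`, the script prints the commands instead of running them.

```shell
#!/bin/sh
# Stop at the first failing command and on unset variables.
set -eu

# Values copied from the example commands on this page.
BUILD_ROOT="artifacts/getting-started/tiny-build"
STORE_ROOT="artifacts/getting-started/tiny-store"
RELEASE="110"
SPECIES="homo_sapiens"
ASSEMBLY="GRCh38"

# DRY_RUN is a hypothetical switch of this sketch, not an Atlas feature.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
atlas() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "cargo run -p bijux-atlas --bin bijux-atlas -- $*"
  else
    cargo run -p bijux-atlas --bin bijux-atlas -- "$@"
  fi
}

run_workflow() {
  atlas dataset validate --root "$BUILD_ROOT" \
    --release "$RELEASE" --species "$SPECIES" --assembly "$ASSEMBLY"
  atlas dataset verify --root "$BUILD_ROOT" \
    --release "$RELEASE" --species "$SPECIES" --assembly "$ASSEMBLY" --deep
  # With set -e active, this line is reached only if both checks succeeded.
  atlas dataset publish --source-root "$BUILD_ROOT" --store-root "$STORE_ROOT" \
    --release "$RELEASE" --species "$SPECIES" --assembly "$ASSEMBLY"
}

run_workflow
```

Setting `DRY_RUN=0` runs the commands for real. The stop-on-first-failure behavior relies on `set -e` and assumes the dataset commands exit non-zero when a check fails.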
When to Use Pack Operations
Use dataset pack and dataset verify-pack when you need a portable dataset bundle for transport, validation, or release handling outside the immediate build directory.
Workflow Advice
- do not skip validation before publication
- treat build roots and serving stores as different lifecycle stages
- use pack verification when moving dataset bundles across trust boundaries
When This Page Is Enough
- you are validating or publishing a dataset root
- you are packaging a dataset bundle for transport or review
- you need the dataset lifecycle without the deeper contract details
Purpose
This page explains how Atlas dataset workflows take a dataset from a build root into a serving store, and points readers to the canonical checked-in workflow or boundary for this topic.
Stability
This page is part of the canonical Atlas docs spine. Keep it aligned with the current repository behavior and adjacent contract pages.