Deep Dive Snakemake¶
A course-book and executable capstone that teaches Snakemake as a workflow engine—not merely a collection of rules and scripts. The objective is to enable the creation of workflows that feature explicit contracts, safe dynamic behavior, atomic outputs, reproducible execution, and built-in validation.
CI executes full confirmation runs including workflow execution and artifact validation.
What this is¶
Many Snakemake workflows work well in simple cases but break down at scale: implicit dependencies, checkpoint misuse, non-atomic outputs, configuration drift, or reproducibility failures across environments.
Deep Dive Snakemake provides a structured approach to robust design. It emphasizes a strict contract:
- Explicit inputs/outputs: every dependency and product is declared and enforced.
- Atomic publication: outputs are written safely with no partial artifacts.
- Dynamic safety: checkpoints and re-evaluation are used correctly, without races or surprises.
- Configuration discipline: validated schemas and modular composition.
- Reproducibility: profiles, manifests, and integrity checks for verifiable runs.
- Self-validation: wrapper-driven checks confirm correctness end-to-end.
This repository offers practical guidance toward genuine mastery of Snakemake semantics: understanding its guarantees, limitations, and patterns that ensure workflows remain reliable as complexity increases.
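As an illustration of the atomic publication item in the contract above, here is a minimal sketch of one common way to avoid partial artifacts from a script that a rule might call: write to a temporary file in the destination directory, then rename it into place. The helper name `atomic_write_text` is hypothetical and is not taken from the capstone; the sketch assumes the temporary file and the target live on the same filesystem, which is what makes the final rename atomic.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text to target so readers never observe a half-written file.

    Hypothetical helper for illustration; not part of Snakemake or the capstone.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_name, target)  # replaces any existing target in one step
    except BaseException:
        # On any failure, remove the temp file so no stray artifact is left behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The design choice is that a consumer sees either the previous complete file or the new complete file, never an intermediate state.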
What you get¶
1) The course-book¶
A compact, focused handbook with practical patterns, anti-patterns, and guidance:
- explicit inputs/outputs and safe writing
- checkpoints and dependency re-evaluation
- configuration + schema validation
- modular workflow composition
- publishing, manifests, and integrity checks
- execution profiles and reproducible runs
Read on the website: https://bijux.github.io/deep-dive-snakemake/
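To make the configuration and schema validation topic listed above concrete, the following dependency-free sketch checks a config dictionary against a tiny hand-written schema and reports every problem at once. The keys `samples_dir` and `threads` are assumptions invented for this example, not the capstone's real configuration. In an actual workflow, Snakemake's own `snakemake.utils.validate` with a JSON Schema file would likely be the more natural choice.

```python
from typing import Any

# Hypothetical schema for illustration only: key -> expected type.
REQUIRED_KEYS: dict[str, type] = {"samples_dir": str, "threads": int}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: list[str] = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing required key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly;
        # this sketch's schema has no boolean fields.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Unknown keys are reported too, which helps catch configuration drift.
    for key in sorted(set(config) - set(REQUIRED_KEYS)):
        problems.append(f"unknown key: {key}")
    return problems
```

Failing fast on a non-empty result, before any rule runs, keeps a mistyped key from silently changing workflow behavior.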
2) The executable capstone¶
snakemake-capstone/ is a complete end-to-end pipeline on toy FASTQ data that embodies the principles above, demonstrating:
- checkpoint-driven sample discovery
- per-sample processing stages
- summary and report generation
- versioned publish/v1/ outputs
- checksummed manifest and artifact sanity checks
- a Make-driven verification flow
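The checksummed manifest mentioned in the list above can be sketched roughly as follows. This is an assumption about the general technique, not the capstone's actual implementation: the function names and the choice of SHA-256 are illustrative. The idea is to record a digest per published file, then later recompute the digests and report anything missing, altered, or unexpected.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large artifacts do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style path relative to root to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return sorted paths that are missing, altered, or unexpected under root."""
    actual = build_manifest(root)
    return sorted(
        name
        for name in set(manifest) | set(actual)
        if manifest.get(name) != actual.get(name)
    )
```

An empty list from `verify_manifest` means the published tree matches the manifest exactly. Note that a manifest file stored inside `root` would itself be listed, so in practice it would need to be excluded or kept outside the tree.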
Quick start¶
Prerequisites:
- Python 3.11+
- make
From the repository root:
Preview the course book locally¶
Open the local URL displayed by MkDocs.
Run the capstone reference workflow¶
This executes formatting/linting/tests, a dry-run, full workflow execution, and artifact validation.
Successful completion confirms the workflow's contract on your system.
Repository layout¶
.
├── course-book/ # Course-book source (MkDocs)
├── mkdocs.yml # Documentation configuration
├── snakemake-capstone/ # End-to-end Snakemake reference workflow
│ ├── Snakefile
│ ├── config/
│ ├── workflow/
│ └── ...
├── .github/workflows/ # CI + Pages automation
├── Makefile # Unified entrypoints for local use and CI
├── LICENSE
└── README.md
Who this is for¶
- Engineers who maintain or inherit complex bioinformatics workflows and need them to be reliable.
- Users familiar with basic Snakemake but encountering issues with checkpoints, reproducibility, or scaling.
- Teams requiring workflows that are trustworthy in CI/CD and production environments.
This is not an introductory syntax tutorial. It focuses on workflow semantics and correctness engineering using Snakemake.
Contributing¶
Contributions that enhance correctness, clarity, or reproducibility are welcome (improvements to documentation, exercises, or capstone hardening).
- Fork and clone the repository.
- Implement a focused change (documentation or capstone).
- From the repository root, verify:
- Open a pull request against main.
License¶
MIT — see LICENSE. © 2025 Bijan Mousavi.