Deep Dive Snakemake¶

A course-book and executable capstone that teaches Snakemake as a workflow engine, not merely a collection of rules and scripts. The goal is to help you build workflows with explicit contracts, safe dynamic behavior, atomic outputs, reproducible execution, and built-in validation.

CI executes full confirmation runs including workflow execution and artifact validation.

What this is¶

Many Snakemake workflows work well in simple cases but break down at scale: implicit dependencies, checkpoint misuse, non-atomic outputs, configuration drift, or reproducibility failures across environments.

Deep Dive Snakemake provides a structured approach to robust design. It emphasizes a strict contract:

Explicit inputs/outputs: every dependency and product is declared and enforced.
Atomic publication: outputs are written safely with no partial artifacts.
Dynamic safety: checkpoints and re-evaluation are used correctly, without races or surprises.
Configuration discipline: validated schemas and modular composition.
Reproducibility: profiles, manifests, and integrity checks for verifiable runs.
Self-validation: wrapper-driven checks confirm correctness end-to-end.
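
As an illustration of the atomic publication item above, here is a minimal sketch in plain Python, assuming a POSIX-style filesystem. The helper name `atomic_write_text` is hypothetical and is not taken from the capstone; it only shows the common write-to-temp-then-rename pattern.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text so that readers never observe a partially written file.

    Hypothetical helper for illustration; not part of the capstone.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_name, target)
    except BaseException:
        # On any failure, remove the temp file and leave the target untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A script called from a rule could use such a helper for its final write, so an interrupted job leaves either the previous output or nothing, never a truncated file.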

This repository offers practical guidance toward mastering Snakemake semantics: its guarantees, its limitations, and the patterns that keep workflows reliable as complexity grows.

What you get¶

1) The course-book¶

A compact, focused handbook with practical patterns, anti-patterns, and guidance:

explicit inputs/outputs and safe writing
checkpoints and dependency re-evaluation
configuration + schema validation
modular workflow composition
publishing, manifests, and integrity checks
execution profiles and reproducible runs
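
To make the "configuration + schema validation" topic concrete, the following sketch fails fast on a malformed configuration using only the standard library. The keys `samples_dir`, `threads`, and `publish_version` are assumptions chosen for illustration, not the capstone's actual schema. In a real Snakefile, Snakemake's `snakemake.utils.validate` helper can apply a JSON Schema to `config`; this stand-in avoids that dependency.

```python
from typing import Any

# Hypothetical schema: key -> (expected type, required?)
SCHEMA: dict[str, tuple[type, bool]] = {
    "samples_dir": (str, True),
    "threads": (int, True),
    "publish_version": (str, False),
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the config is valid."""
    problems: list[str] = []
    for key, (expected, required) in SCHEMA.items():
        if key not in config:
            if required:
                problems.append(f"missing required key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Unknown keys are often typos, so report them instead of ignoring them.
    for key in config:
        if key not in SCHEMA:
            problems.append(f"unknown key: {key}")
    return problems
```

Collecting every problem before failing, rather than raising on the first one, lets a user fix a broken configuration in one pass.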

Read on the website: https://bijux.github.io/deep-dive-snakemake/

2) The executable capstone¶

snakemake-capstone/ is a complete end-to-end pipeline on toy FASTQ data that embodies the principles above, demonstrating:

checkpoint-driven sample discovery
per-sample processing stages
summary and report generation
versioned publish/v1/ outputs
checksummed manifest and artifact sanity checks
a Make-driven verification flow
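
The checksummed manifest idea in the list above can be sketched as follows. This is an illustration under stated assumptions, not the capstone's implementation: the function names are hypothetical, SHA-256 is an assumed choice of digest, and the manifest is a plain mapping from relative path to hex digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(publish_dir: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    return {
        path.relative_to(publish_dir).as_posix(): sha256_of(path)
        for path in sorted(publish_dir.rglob("*"))
        if path.is_file()
    }


def verify_manifest(publish_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest has changed."""
    current = build_manifest(publish_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

An empty list from `verify_manifest` means every recorded artifact is present and unchanged. Files added after the manifest was built are not reported by this sketch.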

Quick start¶

Prerequisites:

- Python 3.11+
- make

From the repository root:

Preview the course book locally¶

make venv
make docs-serve

Open the local URL displayed by MkDocs.

Run the capstone reference workflow¶

make capstone-confirm

This executes formatting/linting/tests, a dry-run, full workflow execution, and artifact validation.

Successful completion confirms the workflow's contract on your system.

Repository layout¶

.
├── course-book/              # Course-book source (MkDocs)
├── mkdocs.yml                # Documentation configuration
├── snakemake-capstone/       # End-to-end Snakemake reference workflow
│   ├── Snakefile
│   ├── config/
│   ├── workflow/
│   └── ...
├── .github/workflows/        # CI + Pages automation
├── Makefile                  # Unified entrypoints for local use and CI
├── LICENSE
└── README.md

Who this is for¶

Engineers who maintain or inherit complex bioinformatics workflows and need them to be reliable.
Users who know basic Snakemake but run into problems with checkpoints, reproducibility, or scaling.
Teams requiring workflows that are trustworthy in CI/CD and production environments.

This is not an introductory syntax tutorial. It focuses on workflow semantics and correctness engineering using Snakemake.

Contributing¶

Contributions that enhance correctness, clarity, or reproducibility are welcome (improvements to documentation, exercises, or capstone hardening).

1. Fork and clone the repository.
2. Implement a focused change (documentation or capstone).
3. From the repository root, verify:
   ```
   make capstone-confirm
   ```
4. Open a pull request against main.

License¶
