Module 2: First-Class Functions and Expressive Python
Progression Note
By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares for lazy iteration in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |
M02C01: Closures & Partials for Configurators – Pure Configurators
Core question:
How do closures and partial application create pure configurators that capture immutable config to produce reusable, deterministic variants of RAG pipelines without globals or mutable defaults?
This core introduces pure configurators in Python:
- Treat config as explicit, immutable data captured in closures or fixed via partials for predictable variants.
- Default to pure functions that depend only on captured config and inputs, preserving determinism.
- Isolate mutable state (if any) to thin edges, building on Module 1's purity.
This core builds on Module 1’s purity, immutability, and explicit dependencies by showing how to capture configuration as immutable values instead of leaking it through globals or mutable defaults.
We use the running project from m02-rag.md—extending the FuncPipe RAG Builder—to ground every concept. This project evolves across all 10 cores: start with a configurable but impure version using globals; end with pure, composable configurators.
Audience: Python developers from Module 1 with pure pipelines who now need configurable variants (e.g., different chunk sizes or rules) but face nondeterminism from globals or mutable defaults.
Outcome:
1. Spot globals or mutable defaults in config and explain why they break determinism.
2. Refactor a configurable impure function to pure using closures/partials.
3. Write a Hypothesis property providing strong evidence of equivalence to Module 1, including a shrinking example.
Runnability Note (Module 01 Snapshot vs Module 02 End-State)
This core includes two kinds of snippets:
1) Runnable against the end-of-Module-02 codebase (this checkout)
These use the real APIs in src/funcpipe_rag/ (e.g., RagConfig, make_rag_fn, full_rag_api_docs, iter_rag_core).
2) Hypothetical pre-refactor snippets (illustration only)
These are intentionally “bad” or “in-between” states used to teach refactoring. They are not meant to match a real snapshot 1:1. They are labeled as Hypothetical pre-refactor and are refactored into the real Module 02 API across this module.
If you want a real, runnable Module 01 codebase, use the module-01 git tag in a worktree:
make worktrees
- Module 01 path: history/worktrees/module-01/
- Import path for Module 01: history/worktrees/module-01/src/ (use PYTHONPATH when running examples there)
Module 01 uses the same import name (import funcpipe_rag), so run it from the Module 01 worktree (or set PYTHONPATH) to avoid mixing versions.
We refactor the hypothetical pre-refactor shapes into the real Module 02 API across this module.
1. Conceptual Foundation
1.1 The One-Sentence Rule
Use closures/partials for pure configurators with immutable capture only; avoid globals, env vars, or mutable defaults to preserve pipeline determinism and equivalence.
1.2 Closures & Partial Application in One Precise Sentence
Closures capture immutable config to produce pure customized functions; partial application fixes arguments for reusable variants—ensuring deterministic, composable behavior from explicit data only.
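A minimal sketch of the two mechanisms side by side. `ChunkConfig`, `chunk_text`, and `make_chunker` are hypothetical stand-ins invented for this illustration, not the funcpipe_rag API; the only point is that a closure and a partial over the same frozen config behave identically.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class ChunkConfig:
    """Hypothetical stand-in for a frozen config object."""
    chunk_size: int


def chunk_text(text: str, cfg: ChunkConfig) -> list[str]:
    """Pure: the result depends only on the two arguments."""
    return [text[i:i + cfg.chunk_size] for i in range(0, len(text), cfg.chunk_size)]


def make_chunker(cfg: ChunkConfig):
    """Closure: cfg is captured once and never rebound."""
    def chunker(text: str) -> list[str]:
        return chunk_text(text, cfg)
    return chunker


cfg = ChunkConfig(chunk_size=4)
via_closure = make_chunker(cfg)
via_partial = partial(chunk_text, cfg=cfg)  # partial: cfg fixed as a keyword argument

sample = "abcdefghij"
closure_result = via_closure(sample)
partial_result = via_partial(sample)
```

Both callables take only the text, so callers never see (or forget to pass) the config.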
1.3 Why This Matters Now
Without pure configurators, adding variability (e.g., different chunk sizes or cleaning rules) tends to introduce globals or mutable defaults. The same inputs can then yield different chunks depending on prior calls or shared state, which breaks the substitution reasoning and the tests established in Module 1.
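To make the failure mode concrete, here is a small sketch of a mutable default leaking state between calls, next to a tuple-based version that cannot leak. `clean_shared` and `clean_pure` are hypothetical names used only for this demonstration.

```python
from collections.abc import Callable
from typing import Optional

Rule = Callable[[str], str]


def clean_shared(text: str, extra: Optional[Rule] = None,
                 rules: list[Rule] = [str.strip]) -> str:
    """BAD: the default list is created once and shared by every call."""
    if extra is not None:
        rules.append(extra)  # mutates the shared default
    for rule in rules:
        text = rule(text)
    return text


def clean_pure(text: str, rules: tuple[Rule, ...] = (str.strip,)) -> str:
    """Tuple default: nothing a caller does can change it."""
    for rule in rules:
        text = rule(text)
    return text


first = clean_shared("  Hello ")
clean_shared("x", extra=str.lower)  # an unrelated call changes the default
second = clean_shared("  Hello ")   # same input, different output

pure_first = clean_pure("  Hello ")
clean_pure("x", rules=(str.strip, str.lower))
pure_second = clean_pure("  Hello ")
```

The impure version returns a different result for the same input after an unrelated call; the tuple-based version does not.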
1.4 Configurators as Values in 5 Lines
Configurators as first-class values enable dynamic variants:
from functools import partial
from collections.abc import Callable
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv

def chunk_doc(doc: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    text = doc.abstract
    step = env.chunk_size
    return [
        ChunkWithoutEmbedding(
            doc.doc_id,
            text[i:i + step],
            i,
            i + len(text[i:i + step]),
        )
        for i in range(0, len(text), step)
    ]

def make_env(chunk_size: int) -> RagEnv:
    return RagEnv(chunk_size)

variants: dict[str, Callable[[CleanDoc], list[ChunkWithoutEmbedding]]] = {
    "small": partial(chunk_doc, env=make_env(256)),
    "medium": partial(chunk_doc, env=make_env(512)),
    "large": partial(chunk_doc, env=make_env(1024)),
}

def run_variant(key: str, doc: CleanDoc) -> list[ChunkWithoutEmbedding]:
    return variants[key](doc)
Because the partial is pure (immutable env, no globals), we can safely store it in a dict, pass it around, and test it in isolation—just like data.
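As a rough illustration of "just like data": a `functools.partial` object exposes what it has fixed through its `func` and `keywords` attributes, so a table of variants can be inspected and tested without running a pipeline. `chunk_text` is a simplified stand-in that works on plain strings rather than on `CleanDoc`.

```python
from functools import partial


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Simplified stand-in for chunk_doc, operating on plain strings."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunkers = {
    "small": partial(chunk_text, chunk_size=2),
    "large": partial(chunk_text, chunk_size=4),
}

# Inspect the fixed configuration without calling anything.
fixed_sizes = {name: fn.keywords["chunk_size"] for name, fn in chunkers.items()}

# Exercise each variant in isolation on the same input.
outputs = {name: fn("abcdef") for name, fn in chunkers.items()}
```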
2. Mental Model: Globals vs Closures/Partials
2.1 One Picture
Globals (Hidden Deps)            Closures/Partials (Explicit Capture)
+-----------------------+        +------------------------------+
| global ENV / CFG      |        | make_rag_fn(env, cfg)        |
|           ↓           |        |              ↓               |
| rag(docs) → chunks    |        | rag_fn(docs) → chunks        |
| (hidden config)       |        | (fixed config)               |
+-----------------------+        +------------------------------+
  ↑ Flaky / Non-deterministic      ↑ Deterministic / Testable
2.2 Contract Table
| Aspect | Globals / Mutable Defaults | Closures / Partials |
|---|---|---|
| Dependencies | Hidden globals, env vars | Explicit, immutable capture |
| Determinism | Breaks (outputs vary) | Safe (same config = same) |
| Testing | Flaky (depends on history) | Local reasoning (properties) |
| Composability | Races / scattered config | Freedom from shared state |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |
Note on Globals Choice: For a fixed script with no variants, plain explicit parameters are enough; prefer configurators whenever the configuration is reused or varied.
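The last table row can be sketched as follows: a partial that captures a mutable list still changes behavior when that list is mutated, while a frozen dataclass holding a tuple rejects reassignment. `apply_rules` and `CleanSettings` are hypothetical names for this example only.

```python
from collections.abc import Callable, Sequence
from dataclasses import FrozenInstanceError, dataclass
from functools import partial

Rule = Callable[[str], str]


def apply_rules(text: str, rules: Sequence[Rule]) -> str:
    for rule in rules:
        text = rule(text)
    return text


@dataclass(frozen=True)
class CleanSettings:
    """Hypothetical frozen config: a tuple of rules, no setters."""
    rules: tuple[Rule, ...] = (str.strip, str.lower)


# Mutable capture: the partial holds a reference, not a snapshot.
mutable_rules = [str.strip]
risky = partial(apply_rules, rules=mutable_rules)
before = risky("  Hi ")
mutable_rules.append(str.upper)  # someone else edits the list
after = risky("  Hi ")           # behavior changed behind the partial's back

# Immutable capture: reassignment raises instead of silently succeeding.
settings = CleanSettings()
safe = partial(apply_rules, rules=settings.rules)
try:
    settings.rules = (str.upper,)
    reassignment_blocked = False
except FrozenInstanceError:
    reassignment_blocked = True
safe_result = safe("  Hi ")
```

Note that `frozen=True` only blocks attribute assignment; the fields themselves should also be immutable types (tuples, strings, numbers) for the guarantee to hold.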
3. Running Project: FuncPipe RAG Builder
Our running project (from m02-rag.md) is extending the pure RAG pipeline from Module 1 to add configurability.
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Configurable cleaning, chunking, etc., into Chunk list (dedup added later).
- Start: Hypothetical pre-refactor with globals (core1_start.py, illustration only).
- End (this core): Pure configurator with explicit capture, preserving equivalence.
3.1 Types (End-of-Module-02, Used by Configurators)
In the end-of-Module-02 codebase, cleaner configuration is represented as frozen data:
from funcpipe_rag import CleanConfig, make_cleaner
cfg = CleanConfig(rule_names=("strip", "lower", "collapse_ws"))
cleaner = make_cleaner(cfg)
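One way such a name-based config could be turned into a cleaner is sketched below. This is an assumption about the general shape, not the repo's implementation: `RULE_REGISTRY`, `SketchCleanConfig`, and `make_text_cleaner` are hypothetical, and the sketch cleans plain strings rather than documents.

```python
from collections.abc import Callable, Mapping
from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical registry, exposed as a read-only view so it cannot be edited at runtime.
RULE_REGISTRY: Mapping[str, Callable[[str], str]] = MappingProxyType({
    "strip": str.strip,
    "lower": str.lower,
    "collapse_ws": lambda s: " ".join(s.split()),
})


@dataclass(frozen=True)
class SketchCleanConfig:
    """Config is plain data: an ordered tuple of rule names."""
    rule_names: tuple[str, ...] = ("strip", "lower", "collapse_ws")


def make_text_cleaner(cfg: SketchCleanConfig) -> Callable[[str], str]:
    # Resolve names once, at configuration time; an unknown name raises KeyError here.
    rules = tuple(RULE_REGISTRY[name] for name in cfg.rule_names)

    def clean(text: str) -> str:
        for rule in rules:
            text = rule(text)
        return text

    return clean


cleaner_sketch = make_text_cleaner(SketchCleanConfig())
cleaned = cleaner_sketch("  Hello   WORLD  ")
```

Keeping names (not functions) in the config makes it easy to compare, hash, and serialize, at the cost of one lookup step when the cleaner is built.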
3.2 Impure Start (Anti-Pattern)
This is a hypothetical pre-refactor example used for contrast. It is intentionally not meant to match a real snapshot 1:1, and it is not intended to be run in the end-of-Module-02 checkout.
# core1_start.py (hypothetical pre-refactor; illustration only)
from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv
from funcpipe_rag import chunk_doc, embed_chunk, structural_dedup_chunks

GLOBAL_ENV = RagEnv(512)  # BAD: hidden dependency (breaks determinism contract)
MUTABLE_CFG = {"rules": [str.strip, str.lower]}  # BAD: shared mutable default (breaks determinism contract)

def impure_full_rag(docs: list[RawDoc], cfg: dict = MUTABLE_CFG) -> list[Chunk]:
    global GLOBAL_ENV

    def impure_cleaner(d: RawDoc) -> CleanDoc:
        abstract = d.abstract
        for r in cfg["rules"]:
            abstract = r(abstract)
        return CleanDoc(d.doc_id, d.title, abstract, d.categories)

    cleaned = [impure_cleaner(d) for d in docs]
    chunked = [c for doc in cleaned for c in chunk_doc(doc, GLOBAL_ENV)]
    embedded = [embed_chunk(c) for c in chunked]
    return structural_dedup_chunks(embedded)

# Usage: Non-deterministic due to globals
docs: list[RawDoc] = [RawDoc("cs-123", "Title", "Abstract text...", "cs.AI")]
chunks1 = impure_full_rag(docs)
chunks2 = impure_full_rag(docs)
# May differ if GLOBAL_ENV mutated externally
Smells: Globals (GLOBAL_ENV), mutable defaults (MUTABLE_CFG as shared dict).
Problem: impure_full_rag(docs) depends on hidden state; can't substitute without replaying globals.
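A stripped-down sketch of the same problem, using a plain string chunker instead of the RAG types (all names here are hypothetical): the call expression is identical both times, but its value depends on when it runs.

```python
CHUNK_SIZE = 4  # BAD: module-level config read at call time


def chunk_with_global(text: str) -> list[str]:
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]


text = "abcdefgh"
first_call = chunk_with_global(text)

CHUNK_SIZE = 2  # some other part of the program "reconfigures" the pipeline

second_call = chunk_with_global(text)

# The expression chunk_with_global(text) cannot be replaced by its value:
# the value is only known once the global's history is known as well.
substitution_is_safe = first_call == second_call
```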
4. Refactor to Pure: Explicit Capture
4.1 Pure Configurator
Pass all config explicitly; capture in closures/partials.
# Pure refactor: Explicit capture
from funcpipe_rag import CleanConfig, make_rag_fn
# `make_rag_fn` is the canonical configurator in this repo:
# it captures frozen config in a closure (no globals).
cfg = CleanConfig()
rag_fn = make_rag_fn(chunk_size=512, clean_cfg=cfg)
chunks1, obs1 = rag_fn(docs)
chunks2, obs2 = rag_fn(docs)
assert chunks1 == chunks2 and obs1 == obs2
Wins: No globals/mutables; explicit capture. Matches Module 1 when defaults used.
Note: Defaults are expressed as data (DEFAULT_RULES, DEFAULT_CLEAN_CONFIG) and captured immutably into RagConfig.
4.2 Before-and-After Refactoring Snippet
To cement the transition from globals to closures, here's an explicit mini-example showing the "ugly before" with a global and the "clean after" using a closure:
# Before: Ugly global config
from collections.abc import Callable  # needed for the return annotation of make_chunk_doc
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv

GLOBAL_ENV = RagEnv(chunk_size=512)  # BAD: hidden dependency (breaks determinism contract)

def chunk_doc(doc: CleanDoc) -> list[ChunkWithoutEmbedding]:
    # Implicitly uses global
    text = doc.abstract
    step = GLOBAL_ENV.chunk_size
    chunks: list[ChunkWithoutEmbedding] = []
    for start in range(0, len(text), step):
        segment = text[start:start + step]
        chunks.append(ChunkWithoutEmbedding(doc.doc_id, segment, start, start + len(segment)))
    return chunks

# After: Pure closure with explicit config
def make_chunk_doc(env: RagEnv) -> Callable[[CleanDoc], list[ChunkWithoutEmbedding]]:
    def chunk_doc(doc: CleanDoc) -> list[ChunkWithoutEmbedding]:
        text = doc.abstract
        step = env.chunk_size  # captured from the enclosing call, not from a global
        chunks: list[ChunkWithoutEmbedding] = []
        for start in range(0, len(text), step):
            segment = text[start:start + step]
            chunks.append(ChunkWithoutEmbedding(doc.doc_id, segment, start, start + len(segment)))
        return chunks
    return chunk_doc

# Usage: Deterministic and testable
chunk_fn = make_chunk_doc(RagEnv(chunk_size=512))
This refactor eliminates hidden dependencies, making the function pure and easier to test—same inputs always yield the same outputs.
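Once the chunker is a pure closure, its span contract can be checked directly. The sketch below uses a hypothetical `Span` tuple and `make_span_chunker` in place of `ChunkWithoutEmbedding` and `make_chunk_doc`, with the same start/end arithmetic as the snippet above.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for ChunkWithoutEmbedding."""
    doc_id: str
    text: str
    start: int
    end: int


def make_span_chunker(chunk_size: int):
    def chunk(doc_id: str, text: str) -> list[Span]:
        spans: list[Span] = []
        for start in range(0, len(text), chunk_size):
            segment = text[start:start + chunk_size]
            spans.append(Span(doc_id, segment, start, start + len(segment)))
        return spans
    return chunk


chunk_fn_sketch = make_span_chunker(4)
source = "abcdefghij"
spans = chunk_fn_sketch("cs-123", source)

# Contract checks: segments tile the source, and offsets agree with segment lengths.
reconstructed = "".join(s.text for s in spans)
offsets_consistent = all(s.end - s.start == len(s.text) for s in spans)
repeatable = chunk_fn_sketch("cs-123", source) == spans
```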
4.3 Pure Partial for Rules
# RulesConfig configurator (end-of-Module-02)
from funcpipe_rag import All, LenGt, RulesConfig, StartsWith, make_rag_fn

def keep_categories(prefix: str) -> RulesConfig:
    """Pure configurator: capture a prefix into an immutable RulesConfig."""
    return RulesConfig(keep_pred=StartsWith("categories", prefix))

cs_keep = keep_categories("cs.")
cs_long_keep = RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500))))

# Usage: Variant with filtering (RulesConfig is the canonical keep type in Module 02)
rag_filtered = make_rag_fn(chunk_size=512, clean_cfg=cfg, keep=cs_keep)
filtered_chunks, _ = rag_filtered(docs)
Wins: Fixes prefix explicitly; composable. Enables configurable filtering without globals.
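For intuition, predicates-as-data can be sketched with frozen dataclasses and a small interpreter. `FieldStartsWith`, `FieldLenGt`, `AllOf`, and `matches` are hypothetical stand-ins; how the real `StartsWith`, `LenGt`, and `All` are evaluated in funcpipe_rag is not shown in this core, so treat this as one plausible shape only.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldStartsWith:
    key: str
    prefix: str


@dataclass(frozen=True)
class FieldLenGt:
    key: str
    minimum: int


@dataclass(frozen=True)
class AllOf:
    preds: tuple


def matches(pred, record: Mapping[str, str]) -> bool:
    """Interpret a predicate description against one record."""
    if isinstance(pred, FieldStartsWith):
        return record[pred.key].startswith(pred.prefix)
    if isinstance(pred, FieldLenGt):
        return len(record[pred.key]) > pred.minimum
    if isinstance(pred, AllOf):
        return all(matches(p, record) for p in pred.preds)
    raise TypeError(f"unknown predicate: {pred!r}")


def keep_prefix(prefix: str) -> FieldStartsWith:
    """Pure configurator: the prefix becomes part of an immutable value."""
    return FieldStartsWith("categories", prefix)


cs_long = AllOf((keep_prefix("cs."), FieldLenGt("abstract", 5)))
records = [
    {"categories": "cs.AI", "abstract": "long enough"},
    {"categories": "cs.AI", "abstract": "short"},
    {"categories": "math.CO", "abstract": "long enough"},
]
kept = [r for r in records if matches(cs_long, r)]
```

Because the predicates are frozen values, two configurations built from the same arguments compare equal and can be used as dictionary keys.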
5. Equational Reasoning: Substitution Exercise
Hand Exercise: Replace expressions in make_rag_fn.
1. Inline env = RagEnv(512) → fixed value.
2. Substitute into partial → fixed call.
3. Result: rag_fn(docs) = fixed value for fixed inputs.
Bug Hunt: In impure version, substitution fails (depends on globals).
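The same substitution steps can be worked through on a toy configurator. This is a simplified sketch on strings (`make_chunker` and `chunk_text` are hypothetical), not a trace through `make_rag_fn` itself; each step replaces an expression with its definition and the value never changes.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def make_chunker(chunk_size: int):
    def chunker(text: str) -> list[str]:
        return chunk_text(text, chunk_size)
    return chunker


text = "abcdefgh"

# Step 0: the original expression.
step0 = make_chunker(4)(text)
# Step 1: inline the closure body with the captured chunk_size = 4.
step1 = chunk_text(text, 4)
# Step 2: inline chunk_text itself.
step2 = [text[i:i + 4] for i in range(0, len(text), 4)]
# Step 3: evaluate to a fixed value.
step3 = ["abcd", "efgh"]

all_steps_equal = step0 == step1 == step2 == step3
```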
6. Property-Based Testing: Providing Strong Evidence of Equivalence (Advanced, Optional)
Use Hypothesis to provide strong evidence that the refactor preserved behavior. Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
6.1 Custom Strategy (RAG Domain)
From tests/conftest.py (as in Module 1).
6.2 Equivalence Property
# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from tests.conftest import doc_list_strategy
from funcpipe_rag import (
    RagEnv,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
    CleanConfig,
    make_rag_fn,
)

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_configurator_parity(docs, chunk_size):
    rag_fn = make_rag_fn(chunk_size=chunk_size, clean_cfg=CleanConfig())
    new_chunks, _ = rag_fn(docs)
    old_chunks = baseline_full_rag(docs, RagEnv(chunk_size))
    assert new_chunks == old_chunks
Note: Property focuses on equivalence (same chunks); assumes no rules/taps.
6.3 Determinism Property
@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_configurator_deterministic(docs, chunk_size):
    rag_fn = make_rag_fn(chunk_size=chunk_size, clean_cfg=CleanConfig())
    assert rag_fn(docs) == rag_fn(docs)
6.4 Shrinking Demo: Catching a Bug
Bad refactor (uses global):
from collections.abc import Callable
from funcpipe_rag import (
    CleanConfig,
    RagEnv,
    embed_chunk,
    iter_chunk_doc,
    make_cleaner,  # was used below without being imported
    structural_dedup_chunks,
)

def bad_make_rag_fn(
    chunk_size: int, clean_cfg: CleanConfig
) -> Callable[[list], list]:
    global GLOBAL_ENV
    GLOBAL_ENV = RagEnv(chunk_size)  # BAD: config is written to a global instead of captured
    cleaner = make_cleaner(clean_cfg)

    def run(docs):
        # GLOBAL_ENV is looked up at call time, so later writes change the result
        embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(cleaner(d), GLOBAL_ENV)]
        return structural_dedup_chunks(embedded)

    return run
Property (swapped to bad_make_rag_fn):
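@given(
    docs=doc_list_strategy(),
    chunk_size1=st.integers(128, 1024),
    chunk_size2=st.integers(128, 1024),
)
def test_bad_configurator_env_sensitive(docs, chunk_size1, chunk_size2):
    # Assumes this test lives in the same module as bad_make_rag_fn, so both share GLOBAL_ENV.
    global GLOBAL_ENV
    rag_fn = bad_make_rag_fn(chunk_size2, CleanConfig())  # writes RagEnv(chunk_size2) to the global
    # Simulate an external write AFTER configuration. Writing before the call above
    # would be overwritten by bad_make_rag_fn and the property would pass.
    GLOBAL_ENV = RagEnv(chunk_size1)
    pure = make_rag_fn(chunk_size=chunk_size2, clean_cfg=CleanConfig())
    pure_chunks, _ = pure(docs)
    # Fails once chunk_size1 != chunk_size2 and some abstract is long enough
    # to be split differently by the two sizes (global sensitivity).
    assert rag_fn(docs) == pure_chunks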
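Illustrative Hypothesis failure trace (shape only; run the test to see the real output. The values below are placeholders: a real counterexample needs an abstract longer than the smaller chunk size, since a shorter abstract yields the same single chunk under both sizes):
Falsifying example: test_bad_configurator_env_sensitive(
    docs=[RawDoc(doc_id='a', title='', abstract='a', categories='')],
    chunk_size1=128,
    chunk_size2=129,
)
AssertionError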
- Shrinks to minimal doc; catches reliance on global because the test models external state changes. This deliberately reintroduces the global GLOBAL_ENV and shows how Hypothesis exposes the hidden dependency, by comparing against the pure make_rag_fn.
7. When Configurators Aren't Worth It
For one-off calls with a fixed config (e.g., a short script), a configurator adds little: pass the parameters explicitly. Reach for configurators when you need several variants or want to reuse a configured function.
8. Pre-Core Quiz
- f = partial(g, x=1); f() == f()? → Yes, if pure.
- Substitute closure call? → Safe (fixed capture).
- Global in configurator? → Hidden dep → impure.
- Mutable default? → Breaks determinism.
- Cache configured fn? → Safe if pure.
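The caching answer can be sketched with `functools.lru_cache`: a frozen dataclass is hashable, so the configurator itself can be memoized and equal configs share one configured function. `ChunkSettings` and `make_cached_chunker` are hypothetical names for this example.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ChunkSettings:
    """Frozen (and therefore hashable) config, usable as a cache key."""
    chunk_size: int


@lru_cache(maxsize=None)
def make_cached_chunker(settings: ChunkSettings):
    def chunker(text: str) -> tuple[str, ...]:
        size = settings.chunk_size
        return tuple(text[i:i + size] for i in range(0, len(text), size))
    return chunker


a = make_cached_chunker(ChunkSettings(4))
b = make_cached_chunker(ChunkSettings(4))  # equal config: cache hit
c = make_cached_chunker(ChunkSettings(8))  # different config: new function

same_object = a is b
distinct_variant = a is not c
```

This is only safe because the configured function is pure; caching a function that reads a global would freeze nothing and hide the bug.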
9. Post-Core Reflection & Exercise
Reflect: In your own code, find one configurable function that relies on a global or a mutable default. Refactor it into a pure configurator and add a Hypothesis equivalence property against the original.
Project Exercise: Apply to RAG; run properties on sample data.
Next: Core 2 – Expression-Oriented Python. (Builds on this configurator.)
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.