
Module 2: First-Class Functions and Expressive Python

Progression Note

By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |

M02C01: Closures & Partials for Configurators – Pure Configurators

Core question:
How do closures and partial application create pure configurators that capture immutable config to produce reusable, deterministic variants of RAG pipelines without globals or mutable defaults?

This core introduces pure configurators in Python:
- Treat config as explicit, immutable data captured in closures or fixed via partials for predictable variants.
- Default to pure functions that depend only on captured config and inputs, preserving determinism.
- Isolate mutable state (if any) to thin edges, building on Module 1's purity.

This core builds on Module 1’s purity, immutability, and explicit dependencies by showing how to capture configuration as immutable values instead of leaking it through globals or mutable defaults.

We use the running project from m02-rag.md—extending the FuncPipe RAG Builder—to ground every concept. This project evolves across all 10 cores: start with a configurable but impure version using globals; end with pure, composable configurators.

Audience: Python developers from Module 1 with pure pipelines who now need configurable variants (e.g., different chunk sizes or rules) but face nondeterminism from globals or mutable defaults.
Outcome:
1. Spot globals or mutable defaults in config and explain why they break determinism.
2. Refactor a configurable impure function to pure using closures/partials.
3. Write a Hypothesis property providing strong evidence of equivalence to Module 1, including a shrinking example.


Runnability Note (Module 01 Snapshot vs Module 02 End-State)

This core includes two kinds of snippets:

1) Runnable against the end-of-Module-02 codebase (this checkout)
These use the real APIs in src/funcpipe_rag/ (e.g., RagConfig, make_rag_fn, full_rag_api_docs, iter_rag_core).

2) Hypothetical pre-refactor snippets (illustration only)
These are intentionally “bad” or “in-between” states used to teach refactoring. They are not meant to match a real snapshot 1:1. They are labeled as Hypothetical pre-refactor and are refactored into the real Module 02 API across this module.

If you want a real, runnable Module 01 codebase, use the module-01 git tag in a worktree:

  • make worktrees
  • Module 01 path: history/worktrees/module-01/
  • Import path for Module 01: history/worktrees/module-01/src/ (use PYTHONPATH when running examples there)

Module 01 uses the same import name (import funcpipe_rag), so run it from the Module 01 worktree (or set PYTHONPATH) to avoid mixing versions.

We refactor the hypothetical pre-refactor shapes into the real Module 02 API across this module.

1. Conceptual Foundation

1.1 The One-Sentence Rule

Use closures/partials for pure configurators with immutable capture only; avoid globals, env vars, or mutable defaults to preserve pipeline determinism and equivalence.

1.2 Closures & Partial Application in One Precise Sentence

Closures capture immutable config to produce pure customized functions; partial application fixes arguments for reusable variants—ensuring deterministic, composable behavior from explicit data only.
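
A minimal sketch of the two mechanisms side by side. `Env`, `chunk_text`, and `make_chunker` are hypothetical stand-ins invented for illustration (they are not the funcpipe_rag API); the point is only that a closure and a `functools.partial` over the same frozen config behave identically.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Env:
    """Hypothetical stand-in for an immutable chunking config."""
    chunk_size: int


def chunk_text(text: str, env: Env) -> list[str]:
    """Pure: the output depends only on text and env."""
    step = env.chunk_size
    return [text[i:i + step] for i in range(0, len(text), step)]


def make_chunker(env: Env):
    """Closure-based configurator: env is captured, never looked up globally."""
    def chunker(text: str) -> list[str]:
        return chunk_text(text, env)
    return chunker


env = Env(chunk_size=4)
via_closure = make_chunker(env)             # config captured in a closure
via_partial = partial(chunk_text, env=env)  # config fixed as a keyword argument

result_closure = via_closure("abcdefghij")
result_partial = via_partial("abcdefghij")
print(result_closure)  # ['abcd', 'efgh', 'ij']
```

Either form yields a one-argument function whose behavior is fully determined by the captured `Env`.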

1.3 Why This Matters Now

Without pure configurators, adding variability (e.g., different chunk sizes or cleaning rules) introduces globals or mutable defaults, leading to nondeterministic outputs where the same inputs yield different chunks depending on prior calls or shared state—breaking Module 1's substitution and tests.
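
To make the failure mode concrete, here is a small sketch of a shared mutable default changing results between calls. The names `clean_bad`, `DEFAULT_RULES`, and `make_cleaner_sketch` are made up for this example and are not part of the project.

```python
# Anti-pattern: a module-level mutable list used as a default argument.
DEFAULT_RULES = [str.strip]


def clean_bad(text: str, rules: list = DEFAULT_RULES) -> str:
    """Impure in practice: every call shares the same list object."""
    for rule in rules:
        text = rule(text)
    return text


before = clean_bad("  Hello ")
DEFAULT_RULES.append(str.lower)  # some unrelated code "tweaks" the config
after = clean_bad("  Hello ")    # same input, different output


def make_cleaner_sketch(rules: tuple):
    """Pure configurator: captures an immutable tuple of rules."""
    def clean(text: str) -> str:
        for rule in rules:
            text = rule(text)
        return text
    return clean


clean_good = make_cleaner_sketch((str.strip,))
first = clean_good("  Hello ")
second = clean_good("  Hello ")

print(before, after)   # Hello hello
print(first, second)   # Hello Hello
```

The tuple cannot be appended to, so no later call site can change what `clean_good` does.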

1.4 Configurators as Values in 5 Lines

Configurators as first-class values enable dynamic variants:

from functools import partial
from collections.abc import Callable
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv


def chunk_doc(doc: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    text = doc.abstract
    step = env.chunk_size
    return [
        ChunkWithoutEmbedding(
            doc.doc_id,
            text[i: i + step],
            i,
            i + len(text[i: i + step]),
        )
        for i in range(0, len(text), step)
    ]


def make_env(chunk_size: int) -> RagEnv:
    return RagEnv(chunk_size)


variants: dict[str, Callable[[CleanDoc], list[ChunkWithoutEmbedding]]] = {
    "small": partial(chunk_doc, env=make_env(256)),
    "medium": partial(chunk_doc, env=make_env(512)),
    "large": partial(chunk_doc, env=make_env(1024)),
}


def run_variant(key: str, doc: CleanDoc) -> list[ChunkWithoutEmbedding]:
    return variants[key](doc)

Because the partial is pure (immutable env, no globals), we can safely store it in a dict, pass it around, and test it in isolation—just like data.
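
As a rough illustration of "just like data": a `functools.partial` object exposes what it captured through its `func`, `args`, and `keywords` attributes, so a variants table can be inspected as well as called. `chunk_text` here is a simplified, hypothetical chunker that takes the size directly rather than a `RagEnv`.

```python
from functools import partial


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Simplified stand-in chunker (not the funcpipe_rag chunk_doc)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


variants = {
    "small": partial(chunk_text, chunk_size=2),
    "large": partial(chunk_text, chunk_size=5),
}

# Inspect the captured config without calling anything.
captured = {name: fn.keywords["chunk_size"] for name, fn in variants.items()}

# Call each variant in isolation.
small_chunks = variants["small"]("abcdef")
large_chunks = variants["large"]("abcdef")

print(captured)      # {'small': 2, 'large': 5}
print(small_chunks)  # ['ab', 'cd', 'ef']
print(large_chunks)  # ['abcde', 'f']
```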


2. Mental Model: Globals vs Closures/Partials

2.1 One Picture

Globals (Hidden Deps)                   Closures/Partials (Explicit Capture)
+-----------------------+               +------------------------------+
| global ENV / CFG      |               |   make_rag_fn(env, cfg)      |
|        ↓              |               |        ↓                     |
| rag(docs) → chunks    |               |   rag_fn(docs) → chunks      |
| (hidden config)       |               |   (fixed config)             |
+-----------------------+               +------------------------------+
   ↑ Flaky / Non-deterministic             ↑ Deterministic / Testable

2.2 Contract Table

| Aspect | Globals / Mutable Defaults | Closures / Partials |
|---|---|---|
| Dependencies | Hidden globals, env vars | Explicit, immutable capture |
| Determinism | Breaks (outputs vary) | Safe (same config = same) |
| Testing | Flaky (depends on history) | Local reasoning (properties) |
| Composability | Races / scattered config | Freedom from shared state |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |

Note on Globals Choice: In the rare case of a fixed script with no variants, plain explicit parameters are enough and a configurator adds little. As soon as you need reuse or variants, prefer configurators.
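
The "frozen dataclasses" advice can be sketched as follows. `ChunkConfig` is a hypothetical config type written for this example (the real project types may differ); `dataclasses.replace` is the standard way to derive a variant without mutating the original.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ChunkConfig:
    """Hypothetical immutable config; a tuple keeps the rule list immutable too."""
    chunk_size: int = 512
    rule_names: tuple[str, ...] = ("strip", "lower")


base = ChunkConfig()

# Attempting to mutate a frozen instance raises instead of silently changing config.
try:
    base.chunk_size = 256
    mutated = True
except FrozenInstanceError:
    mutated = False

# Derive a variant: a new object, the original is untouched.
small = replace(base, chunk_size=256)

print(mutated)                            # False
print(base.chunk_size, small.chunk_size)  # 512 256
```

Because frozen dataclasses with the default `eq=True` are also hashable, such configs can serve as dict keys or cache keys.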


3. Running Project: FuncPipe RAG Builder

Our running project (from m02-rag.md) is extending the pure RAG pipeline from Module 1 to add configurability.
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Configurable cleaning, chunking, etc., into Chunk list (dedup added later).
- Start: Hypothetical pre-refactor with globals (core1_start.py, illustration only).
- End (this core): Pure configurator with explicit capture, preserving equivalence.

3.1 Types (End-of-Module-02, Used by Configurators)

In the end-of-Module-02 codebase, cleaner configuration is represented as frozen data:

from funcpipe_rag import CleanConfig, make_cleaner

cfg = CleanConfig(rule_names=("strip", "lower", "collapse_ws"))
cleaner = make_cleaner(cfg)

3.2 Impure Start (Anti-Pattern)

This is a hypothetical pre-refactor example used for contrast. It is intentionally not meant to match a real snapshot 1:1, and it is not intended to be run in the end-of-Module-02 checkout.

# core1_start.py (hypothetical pre-refactor; illustration only)
from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv
from funcpipe_rag import chunk_doc, embed_chunk, structural_dedup_chunks

GLOBAL_ENV = RagEnv(512)  # BAD: hidden dependency (breaks determinism contract)
MUTABLE_CFG = {"rules": [str.strip, str.lower]}  # BAD: shared mutable default (breaks determinism contract)


def impure_full_rag(docs: list[RawDoc], cfg: dict = MUTABLE_CFG) -> list[Chunk]:
    global GLOBAL_ENV

    def impure_cleaner(d: RawDoc) -> CleanDoc:
        abstract = d.abstract
        for r in cfg["rules"]:
            abstract = r(abstract)
        return CleanDoc(d.doc_id, d.title, abstract, d.categories)

    cleaned = [impure_cleaner(d) for d in docs]
    chunked = [c for doc in cleaned for c in chunk_doc(doc, GLOBAL_ENV)]
    embedded = [embed_chunk(c) for c in chunked]
    return structural_dedup_chunks(embedded)


# Usage: Non-deterministic due to globals
docs: list[RawDoc] = [RawDoc("cs-123", "Title", "Abstract text...", "cs.AI")]
chunks1 = impure_full_rag(docs)
chunks2 = impure_full_rag(docs)
# May differ if GLOBAL_ENV mutated externally

Smells: Globals (GLOBAL_ENV), mutable defaults (MUTABLE_CFG as shared dict).
Problem: impure_full_rag(docs) depends on hidden state; can't substitute without replaying globals.


4. Refactor to Pure: Explicit Capture

4.1 Pure Configurator

Pass all config explicitly; capture in closures/partials.

# Pure refactor: Explicit capture
from funcpipe_rag import CleanConfig, make_rag_fn

# `make_rag_fn` is the canonical configurator in this repo:
# it captures frozen config in a closure (no globals).
cfg = CleanConfig()
rag_fn = make_rag_fn(chunk_size=512, clean_cfg=cfg)

# `docs` is a list[RawDoc], e.g. the one built in 3.2
chunks1, obs1 = rag_fn(docs)
chunks2, obs2 = rag_fn(docs)
assert chunks1 == chunks2 and obs1 == obs2

Wins: No globals/mutables; explicit capture. Matches Module 1 when defaults used.
Note: Defaults are expressed as data (DEFAULT_RULES, DEFAULT_CLEAN_CONFIG) and captured immutably into RagConfig.

4.2 Before-and-After Refactoring Snippet

To cement the transition from globals to closures, here's an explicit mini-example showing the "ugly before" with a global and the "clean after" using a closure:

# Before: Ugly global config
from collections.abc import Callable  # needed for the return annotation in the "after" version
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv

GLOBAL_ENV = RagEnv(chunk_size=512)  # BAD: hidden dependency (breaks determinism contract)


def chunk_doc(doc: CleanDoc) -> list[ChunkWithoutEmbedding]:
    # Implicitly uses global
    text = doc.abstract
    step = GLOBAL_ENV.chunk_size
    chunks: list[ChunkWithoutEmbedding] = []
    for start in range(0, len(text), step):
        segment = text[start:start + step]
        chunks.append(ChunkWithoutEmbedding(doc.doc_id, segment, start, start + len(segment)))
    return chunks


# After: Pure closure with explicit config
def make_chunk_doc(env: RagEnv) -> Callable[[CleanDoc], list[ChunkWithoutEmbedding]]:
    def chunk_doc(doc: CleanDoc) -> list[ChunkWithoutEmbedding]:
        text = doc.abstract
        step = env.chunk_size
        chunks: list[ChunkWithoutEmbedding] = []
        for start in range(0, len(text), step):
            segment = text[start:start + step]
            chunks.append(ChunkWithoutEmbedding(doc.doc_id, segment, start, start + len(segment)))
        return chunks

    return chunk_doc


# Usage: Deterministic and testable
chunk_fn = make_chunk_doc(RagEnv(chunk_size=512))

This refactor eliminates hidden dependencies, making the function pure and easier to test—same inputs always yield the same outputs.

4.3 Pure Partial for Rules

# RulesConfig configurator (end-of-Module-02)
from funcpipe_rag import All, LenGt, RulesConfig, StartsWith, make_rag_fn


def keep_categories(prefix: str) -> RulesConfig:
    """Pure configurator: capture a prefix into an immutable RulesConfig."""
    return RulesConfig(keep_pred=StartsWith("categories", prefix))


cs_keep = keep_categories("cs.")
cs_long_keep = RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500))))

# Usage: Variant with filtering (RulesConfig is the canonical keep type in Module 02)
rag_filtered = make_rag_fn(chunk_size=512, clean_cfg=cfg, keep=cs_keep)
filtered_chunks, _ = rag_filtered(docs)

Wins: Fixes prefix explicitly; composable. Enables configurable filtering without globals.
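
One way such composable predicates could be modeled is as frozen, callable dataclasses. The sketch below uses hypothetical names (`Doc`, `FieldStartsWith`, `FieldLenGt`, `AllOf`) and is not the implementation of the project's `StartsWith`, `LenGt`, or `All`; it only illustrates how predicates-as-data compose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical minimal document."""
    categories: str
    abstract: str


@dataclass(frozen=True)
class FieldStartsWith:
    attr: str
    prefix: str

    def __call__(self, doc: Doc) -> bool:
        return getattr(doc, self.attr).startswith(self.prefix)


@dataclass(frozen=True)
class FieldLenGt:
    attr: str
    min_len: int

    def __call__(self, doc: Doc) -> bool:
        return len(getattr(doc, self.attr)) > self.min_len


@dataclass(frozen=True)
class AllOf:
    preds: tuple  # tuple of predicates, kept immutable

    def __call__(self, doc: Doc) -> bool:
        return all(pred(doc) for pred in self.preds)


docs = [Doc("cs.AI", "x" * 10), Doc("math.CO", "y" * 10), Doc("cs.LG", "z" * 3)]

cs_only = FieldStartsWith("categories", "cs.")
cs_long = AllOf((cs_only, FieldLenGt("abstract", 5)))

kept_cs = [d.categories for d in docs if cs_only(d)]
kept_cs_long = [d.categories for d in docs if cs_long(d)]

print(kept_cs)       # ['cs.AI', 'cs.LG']
print(kept_cs_long)  # ['cs.AI']
```

Since each predicate is plain immutable data, two predicates built from the same arguments compare equal, which keeps tests and caching straightforward.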


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Replace expressions in make_rag_fn.
1. Inline env = RagEnv(512) → fixed value.
2. Substitute into partial → fixed call.
3. Result: rag_fn(docs) = fixed value for fixed inputs.
Bug Hunt: In impure version, substitution fails (depends on globals).
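
The hand exercise can be mirrored in code, one substitution per line. This is a sketch with a hypothetical `chunk_text` helper; each step replaces an expression with an equal one, which is only valid because nothing depends on hidden state.

```python
from functools import partial


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Hypothetical pure chunker used only for this exercise."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


fixed = partial(chunk_text, chunk_size=3)

# Step 0: call the configured function.
step0 = fixed("abcdefg")
# Step 1: substitute the partial with the underlying call.
step1 = chunk_text("abcdefg", chunk_size=3)
# Step 2: inline the function body with the fixed values.
step2 = ["abcdefg"[i:i + 3] for i in range(0, 7, 3)]
# Step 3: evaluate to a literal.
step3 = ["abc", "def", "g"]

print(step0 == step1 == step2 == step3)  # True
```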


6. Property-Based Testing: Providing Strong Evidence of Equivalence (Advanced, Optional)

Use Hypothesis to provide strong evidence that the refactor preserved behavior. Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.

6.1 Custom Strategy (RAG Domain)

From tests/conftest.py (as in Module 1).

6.2 Equivalence Property

# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from tests.conftest import doc_list_strategy
from funcpipe_rag import (
    RagEnv,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
    CleanConfig,
    make_rag_fn,
)

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)


@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_configurator_parity(docs, chunk_size):
    rag_fn = make_rag_fn(chunk_size=chunk_size, clean_cfg=CleanConfig())
    new_chunks, _ = rag_fn(docs)
    old_chunks = baseline_full_rag(docs, RagEnv(chunk_size))
    assert new_chunks == old_chunks

Note: Property focuses on equivalence (same chunks); assumes no rules/taps.

6.3 Determinism Property

@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_configurator_deterministic(docs, chunk_size):
    rag_fn = make_rag_fn(chunk_size=chunk_size, clean_cfg=CleanConfig())
    assert rag_fn(docs) == rag_fn(docs)
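
If Hypothesis is not available, the same parity and determinism ideas can be approximated with a small exhaustive loop using only the standard library. This is a weaker check than a real property test (no random generation, no shrinking), and `baseline_chunks` and `make_chunker` are hypothetical stand-ins for the Module 1 baseline and the configurator.

```python
from itertools import product


def baseline_chunks(text: str, chunk_size: int) -> list[str]:
    """Reference implementation: explicit loop, config passed on every call."""
    out = []
    start = 0
    while start < len(text):
        out.append(text[start:start + chunk_size])
        start += chunk_size
    return out


def make_chunker(chunk_size: int):
    """Configurator under test: captures chunk_size in a closure."""
    def chunker(text: str) -> list[str]:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunker


mismatches = []
for length, size in product(range(0, 13), range(1, 7)):
    text = "abcdefghijkl"[:length]
    chunker = make_chunker(size)
    same_as_baseline = chunker(text) == baseline_chunks(text, size)
    deterministic = chunker(text) == chunker(text)
    lossless = "".join(chunker(text)) == text
    if not (same_as_baseline and deterministic and lossless):
        mismatches.append((length, size))

print(mismatches)  # []
```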

6.4 Shrinking Demo: Catching a Bug

Bad refactor (uses global):

from collections.abc import Callable
from funcpipe_rag import (
    CleanConfig,
    RagEnv,
    embed_chunk,
    iter_chunk_doc,
    make_cleaner,  # used below; missing from the original import list
    structural_dedup_chunks,
)

def bad_make_rag_fn(
    chunk_size: int, clean_cfg: CleanConfig
) -> Callable[[list], list]:
    global GLOBAL_ENV
    GLOBAL_ENV = RagEnv(chunk_size)
    cleaner = make_cleaner(clean_cfg)
    def run(docs):
        embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(cleaner(d), GLOBAL_ENV)]
        return structural_dedup_chunks(embedded)
    return run

Property (swapped to bad_make_rag_fn):

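@given(
    docs=doc_list_strategy(),
    chunk_size1=st.integers(128, 1024),
    chunk_size2=st.integers(128, 1024),
)
def test_bad_configurator_env_sensitive(docs, chunk_size1, chunk_size2):
    # Assumes this test lives in the same module as bad_make_rag_fn,
    # so both refer to the same GLOBAL_ENV.
    global GLOBAL_ENV
    rag_fn = bad_make_rag_fn(chunk_size2, CleanConfig())
    # Simulate external code mutating the global AFTER configuration.
    # (Setting it before the call would simply be overwritten by bad_make_rag_fn.)
    GLOBAL_ENV = RagEnv(chunk_size1)
    pure = make_rag_fn(chunk_size=chunk_size2, clean_cfg=CleanConfig())
    pure_chunks, _ = pure(docs)
    # Fails when chunk_size1 != chunk_size2 and some cleaned abstract is longer
    # than the smaller of the two sizes (global sensitivity).
    assert rag_fn(docs) == pure_chunks

Hypothesis failure trace (illustrative shape only; run to verify, since the exact shrunk values depend on doc_list_strategy):

Falsifying example: test_bad_configurator_env_sensitive(
    docs=[RawDoc(doc_id='a', title='', abstract=<129-character string>, categories='')],
    chunk_size1=128,
    chunk_size2=129,
)
AssertionError
  • Shrinks toward a minimal doc, but the abstract must stay longer than the smaller chunk size: a one-character abstract chunks identically at 128 and 129 and cannot expose the bug. The test catches the reliance on the global because it models an external state change between configuration and call. This deliberately reintroduces the global GLOBAL_ENV and shows how Hypothesis exposes the hidden dependency by comparing against the pure make_rag_fn.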
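
A self-contained sketch of the same hidden dependency, using stand-in types so it runs without the project installed. `Env`, `chunk_text`, `bad_make_chunker`, `good_make_chunker`, and `simulate_external_mutation` are all hypothetical names for this illustration. It shows two things: the global is read at call time, so a mutation after configuration changes the output, and the difference is only observable when the text is longer than the smaller chunk size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    chunk_size: int


def chunk_text(text: str, env: Env) -> list[str]:
    step = env.chunk_size
    return [text[i:i + step] for i in range(0, len(text), step)]


def bad_make_chunker(chunk_size: int):
    """Anti-pattern: stores config in a global that is read at call time."""
    global GLOBAL_ENV
    GLOBAL_ENV = Env(chunk_size)

    def run(text: str) -> list[str]:
        return chunk_text(text, GLOBAL_ENV)  # looked up on every call
    return run


def good_make_chunker(chunk_size: int):
    """Pure configurator: config lives in the closure."""
    env = Env(chunk_size)

    def run(text: str) -> list[str]:
        return chunk_text(text, env)
    return run


def simulate_external_mutation(chunk_size: int) -> None:
    """Models unrelated code overwriting the shared global."""
    global GLOBAL_ENV
    GLOBAL_ENV = Env(chunk_size)


bad = bad_make_chunker(129)
good = good_make_chunker(129)
long_text = "a" * 129
short_text = "a"

agree_before = bad(long_text) == good(long_text)

simulate_external_mutation(128)  # happens after configuration

agree_after_long = bad(long_text) == good(long_text)
agree_after_short = bad(short_text) == good(short_text)

print(agree_before, agree_after_long, agree_after_short)  # True False True
```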

7. When Configurators Aren't Worth It

For one-off calls with a fixed config (e.g., scripts), a configurator adds little: pass the parameters explicitly. Use configurators when you need variants or reuse.


8. Pre-Core Quiz

  1. f = partial(g, x=1); f() == f()? → Yes, if g is pure.
  2. Substitute closure call? → Safe (fixed capture).
  3. Global in configurator? → Hidden dep → impure.
  4. Mutable default? → Breaks determinism.
  5. Cache configured fn? → Safe if pure.
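
Quiz item 5 can be illustrated with `functools.lru_cache`. The sketch below caches the configurator itself, keyed by a frozen (and therefore hashable) config; `ChunkConfig` and `make_chunker` are hypothetical names. This is safe only because the configured function is pure, so sharing one instance between callers cannot leak state.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ChunkConfig:
    """Hypothetical frozen config; frozen + eq makes it hashable."""
    chunk_size: int


@lru_cache(maxsize=None)
def make_chunker(cfg: ChunkConfig):
    """Returns the same configured function for equal configs."""
    def chunker(text: str) -> list[str]:
        step = cfg.chunk_size
        return [text[i:i + step] for i in range(0, len(text), step)]
    return chunker


a = make_chunker(ChunkConfig(4))  # cache miss
b = make_chunker(ChunkConfig(4))  # cache hit: equal config, same function object
c = make_chunker(ChunkConfig(8))  # cache miss: different config

info = make_chunker.cache_info()
print(a is b, a is c)          # True False
print(info.hits, info.misses)  # 1 2
```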

9. Post-Core Reflection & Exercise

Reflect: In your code, find one configurable func (global/default). Refactor to pure configurator; add Hypothesis equiv.
Project Exercise: Apply to RAG; run properties on sample data.

Next: Core 2 – Expression-Oriented Python. (Builds on this configurator.)

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.