Module 2: First-Class Functions and Expressive Python
Progression Note
By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |
M02C04 – Designing FP-Friendly APIs (Small Arity, Explicit Dependencies, No Hidden Globals)
Core question:
How do you design APIs with ≤3 parameters, explicit config and dependencies, and no hidden globals—so pipelines from M02C01–M02C03 are composable, testable, and predictable?
This core introduces FP-friendly API design in Python:
- Craft functions as small, composable bricks with explicit interfaces (arity ≤3 for core public APIs).
- Group parameters into immutable config (domain settings) and dependencies (injected services).
- Build on M02C01 configurators, M02C02 expressions, and M02C03 laziness for streaming pipelines.
We extend the running project from m02-rag.md—the FuncPipe RAG Builder—evolving from a high-arity, global-dependent version to clean, composable APIs that preserve baseline equivalence.
Audience: Developers from M02C03 using lazy pipelines but facing high-arity functions or hidden globals that hinder testing and composition.
Outcome:
1. Identify high-arity or hidden dependencies in code and explain their impact on composability.
2. Refactor a high-arity function into a small-arity API with grouped config and dependencies.
3. Write Hypothesis properties proving equivalence and idempotence, with a shrinking example.
1. Conceptual Foundation
1.1 FP-Friendly API Design in One Precise Sentence
FP-friendly APIs limit core public functions to ≤3 parameters, group domain settings into immutable config and services into explicit dependencies, and avoid hidden globals—ensuring composability, testability, and equational reasoning.
1.2 The One-Sentence Rule
Core public APIs must have ≤3 parameters (inputs, config, deps), with all dependencies explicit and globals forbidden; bind config/deps at edges using M02C01 partials or factories.
1.3 Why This Matters Now
M02C03 introduced lazy pipelines, but high-arity APIs or hidden globals make them hard to partialize, test, or compose. FP-friendly design ensures pipelines snap together, leveraging M02C01 configurators for variants, M02C02 expressions for clarity, and M02C03 laziness for efficiency.
1.4 FP-Friendly APIs as Values in 5 Lines
Small-arity APIs enable dynamic composition:
from functools import partial
from funcpipe_rag import RagConfig, RagEnv, RulesConfig, StartsWith, get_deps, iter_rag_core
standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)
filtered_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
filtered_deps = get_deps(filtered_config)
rags: dict[str, object] = {
    "standard": partial(iter_rag_core, config=standard_config, deps=standard_deps),
    "filtered": partial(iter_rag_core, config=filtered_config, deps=filtered_deps),
}
Small-arity functions (inputs, config, deps), explicit config/deps, and no globals allow storage in dicts, composition with M02C01 partials, and testing as first-class values. For example, swapping keep in config creates variants without globals or high arity.
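Note: In real systems, the embedder may involve I/O (e.g., API calls). Injecting it through deps keeps that effect out of the core's own code: with a pure embedder (such as a test fake) the core is referentially transparent, and the real I/O is chosen at the edge.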
2. Mental Model: High-Arity Globals vs Small Explicit APIs
2.1 One Picture
High-Arity Globals (Messy)        Small Explicit APIs (Composable)
+-----------------------+         +------------------------------+
| def rag(docs, env,    |         | def iter_rag_core(docs,      |
| cleaner, keep, taps,  |         | config, deps)                |
| chunk_size, more...)  |         | -> Iterator[Chunk]           |
| # Uses GLOBAL_CFG     |         | # Config: env, keep          |
|                       |         | # Deps: cleaner, embed, taps |
+-----------------------+         +------------------------------+
  ↑ Hard to Test/Compose            ↑ Snaps into Partial/Flow
2.2 Contract Table
| Aspect | High-Arity Globals | Small Explicit APIs |
|---|---|---|
| Arity | >3 params | ≤3 (inputs, config, deps) |
| Dependencies | Hidden globals/env vars | Explicit config/deps structs |
| Composability | Hard (many args, globals) | Easy (partial, flow) |
| Testing | Mock globals, flaky | Inject fakes, deterministic |
| Boundaries | Mixed pure/effects | Pure core, effectful edges |
| Reasoning | Opaque (hidden state) | Equational (substitutable) |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |
Note on High-Arity Choice: Use higher arity only for legacy adapters, wrapping small-arity cores.
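The "Mutable Defaults in Partials" row can be made concrete with a minimal sketch. The names `MutableCfg`, `FrozenCfg`, and `count_chunks` are hypothetical and not part of funcpipe_rag; the point is only that a partial holds a reference to its bound config, so a mutable config lets later writes change the behavior of an already-built function.

```python
from dataclasses import dataclass, FrozenInstanceError
from functools import partial


@dataclass
class MutableCfg:  # hypothetical: mutable on purpose, to show the hazard
    chunk_size: int


@dataclass(frozen=True)
class FrozenCfg:  # hypothetical immutable counterpart
    chunk_size: int


def count_chunks(text: str, config) -> int:
    """Number of fixed-size chunks needed to cover text (ceiling division)."""
    return -(-len(text) // config.chunk_size)


mutable_cfg = MutableCfg(chunk_size=4)
leaky = partial(count_chunks, config=mutable_cfg)
before = leaky("abcdefgh")      # 8 characters, 4 per chunk
mutable_cfg.chunk_size = 2      # someone mutates the shared config later
after = leaky("abcdefgh")       # same call, different answer

frozen_cfg = FrozenCfg(chunk_size=4)
stable = partial(count_chunks, config=frozen_cfg)
try:
    frozen_cfg.chunk_size = 2   # rejected: frozen dataclasses are read-only
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

With the frozen variant, creating a new behavior means building a new config and a new partial, which is the swapping style used throughout this module.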
2.3 Common API Shapes Table
To lock in the arity rule, here are typical shapes:
| Shape | Meaning | Example |
|---|---|---|
| f(data) | Pure utility, no config/deps | hash(data) |
| f(data, config) | Domain-level core | chunk(data, ChunkConfig) |
| f(data, config, deps) | Cross-cutting deps present | iter_rag_core(docs, config, deps) |
Any other shape (e.g., f(docs, env, keep, cleaner, taps, ...)) should be treated as an anti-pattern and refactored, or confined to a thin legacy adapter around a small-arity core.
3. Running Project: FuncPipe RAG Builder
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Design small-arity, explicit APIs that compose lazily and match the baseline outputs.
- Start: High-arity, global-dependent version (core4_start.py).
- End: FP-friendly API with streaming core and edge materialization.
3.1 Types (Canonical, Used Throughout)
From src/funcpipe_rag/rag_types.py, src/funcpipe_rag/api/types.py, plus new config/deps:
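The definitions themselves are not reproduced here. As rough orientation, the following is a minimal sketch of what the config and deps structs could look like, inferred only from how they are constructed elsewhere in this module (`RagConfig(env=..., keep=...)`, `RulesConfig(keep_pred=...)`, `RagCoreDeps(cleaner=..., embedder=..., taps=...)`). The field types and defaults are assumptions; the real classes in the files above may differ.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RagEnv:            # stand-in: only the field this module relies on
    chunk_size: int


@dataclass(frozen=True)
class RulesConfig:       # stand-in: the default of None is an assumption
    keep_pred: object = None


@dataclass(frozen=True)
class RagConfig:         # domain settings, grouped and immutable
    env: RagEnv
    keep: RulesConfig = field(default_factory=RulesConfig)


@dataclass(frozen=True)
class RagCoreDeps:       # injected services, grouped and immutable
    cleaner: Callable
    embedder: Callable
    taps: object = None


# Toy services (str.strip, len) stand in for a real cleaner and embedder.
config = RagConfig(env=RagEnv(512))
deps = RagCoreDeps(cleaner=str.strip, embedder=len)
```

The split mirrors the rule from Section 1: values that describe the domain go in the config struct, callables that do work go in the deps struct.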
3.2 High-Arity Start (Anti-Pattern)
# core4_start.py: High-arity, global-dependent RAG
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc, embed_chunk, structural_dedup_chunks, gen_chunk_doc
from collections.abc import Callable, Sequence
GLOBAL_ENV = RagEnv(512)  # Hidden global
def high_arity_rag(
    docs: list[RawDoc],
    cleaner: Callable[[RawDoc], CleanDoc],
    keep: DocRule | None,
    taps: RagTaps | None,
    chunk_size: int = GLOBAL_ENV.chunk_size,
    debug: bool = False
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc
    kept_docs = [d for d in docs if rule(d)]
    if taps and taps.docs and debug:
        taps.docs(tuple(kept_docs))
    cleaned = [cleaner(d) for d in kept_docs]
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))
    chunk_we = [c for cd in cleaned for c in gen_chunk_doc(cd, RagEnv(chunk_size))]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks and debug:
        taps.chunks(tuple(chunks))
    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs
Smells:
- High arity (6 params: docs, cleaner, keep, taps, chunk_size, debug).
- Hidden global (GLOBAL_ENV).
- Mixed effects (taps with debug flag).
Problem: Hard to partialize, test, or reason about due to excessive params and global dependency.
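One concrete way the hidden global bites: a default such as `chunk_size: int = GLOBAL_ENV.chunk_size` is evaluated once, when the `def` statement runs, not on each call. The sketch below uses a hypothetical `Env` stand-in for `RagEnv` to show that rebinding the global later leaves the default untouched while code that reads the global in its body does change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:  # hypothetical stand-in for RagEnv
    chunk_size: int


GLOBAL_ENV = Env(512)


def size_from_default(text: str, chunk_size: int = GLOBAL_ENV.chunk_size) -> int:
    # The default was evaluated once, when `def` ran: it stays 512.
    return chunk_size


def size_from_body(text: str) -> int:
    # The body looks the global up on every call, so it tracks rebinding.
    return GLOBAL_ENV.chunk_size


GLOBAL_ENV = Env(128)  # the global is rebound later in the program

from_default = size_from_default("abc")
from_body = size_from_body("abc")
```

Two functions in the same module now disagree about the "current" chunk size, and neither signature reveals why. Passing the value through an explicit config removes the ambiguity.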
4. Refactor to FP-Friendly: Small Arity, Explicit Dependencies
Before turning to the RAG pipeline, here is a small before/after example of redesigning an API that depends on implicit context:
import os
import pandas as pd
from dataclasses import dataclass
# Before: Unfriendly API with implicit context
def foo(df: pd.DataFrame) -> pd.DataFrame:
    threshold = float(os.environ.get('THRESHOLD', '0.5'))  # Hidden env dep
    return df[df['value'] > threshold]  # Non-deterministic if env changes
@dataclass(frozen=True)
class FooConfig:
    threshold: float
# After: FP-Friendly with explicit deps
def foo(data: pd.DataFrame, *, config: FooConfig) -> pd.DataFrame:
    return data[data['value'] > config.threshold]  # Pure: Depends only on inputs
This makes the function testable (inject mock config) and composable—no surprises from environment variables.
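The environment variable does not disappear in the "after" version; it moves to the edge. The sketch below is one possible arrangement, using plain lists instead of pandas so it runs with the standard library alone. `keep_above` and `config_from_env` are hypothetical names introduced for illustration.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class FooConfig:
    threshold: float


def keep_above(values: list[float], *, config: FooConfig) -> list[float]:
    """Pure core: the result depends only on values and config."""
    return [v for v in values if v > config.threshold]


def config_from_env(environ: Mapping[str, str]) -> FooConfig:
    """Edge helper (hypothetical): the single place that reads the environment."""
    return FooConfig(threshold=float(environ.get("THRESHOLD", "0.5")))


# At the program edge one would pass os.environ; a plain dict works in tests.
strict = partial(keep_above, config=config_from_env({"THRESHOLD": "0.7"}))
defaulted = partial(keep_above, config=config_from_env({}))

high = strict([0.2, 0.6, 0.9])
low = defaulted([0.2, 0.6, 0.9])
```

Because `config_from_env` takes the mapping as an argument, even the edge helper can be tested without touching the real process environment.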
4.1 Streaming Core (Pure, Lazy)
A pure, lazy core with small arity, building on M02C03:
from funcpipe_rag import RagConfig, RagCoreDeps, iter_rag_core
chunks_iter = iter_rag_core(docs, config, deps)
Properties:
- Arity 3: docs, config, deps.
- Pure, fully lazy (generator-based, O(1) memory).
- No taps (effects deferred to edge).
- Explicit config/deps, no globals.
4.2 Post-Clean Streaming Sub-Core
To reuse core logic at the edge without duplicating the full pipeline:
from collections.abc import Iterator, Iterable, Callable
from funcpipe_rag import CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig
from funcpipe_rag import gen_chunk_doc
def iter_chunks_from_cleaned(
    cleaned: Iterable[CleanDoc],
    config: RagConfig,
    embed: Callable[[ChunkWithoutEmbedding], Chunk]
) -> Iterator[Chunk]:
    """Sub-core: lazy chunk and embed from cleaned docs (reuses M02C03 patterns)."""
    for cd in cleaned:
        for chunk in gen_chunk_doc(cd, config.env):
            yield embed(chunk)
Properties:
- Arity 3: cleaned, config, embed (sub-core, internal; embed injected for consistency). Here config is domain config and embed is a dependency; we still respect the “data, config, deps” ≤3-arity pattern even in internal sub-cores.
- Enables reuse in full_rag_api_docs (and full_rag_api) for lazy post-clean processing.
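To see what "lazy" buys in a sub-core of this shape, here is a self-contained sketch with stand-ins: strings play the role of cleaned docs, `gen_pieces` is a hypothetical replacement for `gen_chunk_doc` that slices by character count, and a recording embedder counts how often it is called. None of these stand-ins are the real funcpipe_rag functions.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def gen_pieces(text: str, chunk_size: int) -> Iterator[str]:
    """Hypothetical stand-in for gen_chunk_doc: fixed-size character slices."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def iter_chunks(
    cleaned: Iterable[str],
    chunk_size: int,
    embed: Callable[[str], tuple[str, int]],
) -> Iterator[tuple[str, int]]:
    """Same shape as the sub-core: data, config, one injected dependency."""
    for text in cleaned:
        for piece in gen_pieces(text, chunk_size):
            yield embed(piece)


calls: list[str] = []


def recording_embed(piece: str) -> tuple[str, int]:
    calls.append(piece)          # test-only bookkeeping, not part of the core
    return (piece, len(piece))


stream = iter_chunks(["abcdef", "ghij"], 2, recording_embed)
calls_before = len(calls)        # nothing has run yet: generators are lazy
first_two = list(islice(stream, 2))
calls_after = len(calls)
```

Only the two consumed chunks were embedded. With a real embedder that performs I/O, this is the difference between paying for a prefix and paying for the whole corpus.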
4.3 Public API (Edge, Materializes)
Wraps the core components, handles materialization and taps:
from funcpipe_rag import RagConfig, RagCoreDeps, full_rag_api_docs
# Canonical end-of-Module-02 API (implemented in `src/funcpipe_rag/api/core.py`)
chunks, obs = full_rag_api_docs(docs, config, deps)
Properties:
- Arity 3, explicit config/deps.
- Builds on M02C03 laziness internally (lazy post-clean via sub-core); materializes filter/clean at edge for taps/obs.
- Reuses core expressions via a private _tap helper and the iter_chunks_from_cleaned sub-core; taps are observational side effects isolated to the edge.
- Matches the baseline stage composition when config.keep = DEFAULT_RULES, deps.taps = None.
Note: iter_rag_core is the fully streaming core. full_rag_api_docs intentionally materializes intermediates for observations/taps; laziness applies post-clean. Dedup runs post-tap as it requires a global view. _tap is an internal helper (see src/funcpipe_rag/api/core.py), not a public API.
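The internal `_tap` helper is not shown in this core. As a rough illustration of the idea only, a tap helper at the edge could materialize a stage, hand a snapshot to an optional observer, and return the data unchanged. The name `tap_all` and its signature are assumptions and do not describe the actual helper in src/funcpipe_rag/api/core.py.

```python
from collections.abc import Callable, Iterable
from typing import Optional, TypeVar

T = TypeVar("T")


def tap_all(
    items: Iterable[T],
    handler: Optional[Callable[[tuple[T, ...]], None]],
) -> tuple[T, ...]:
    """Materialize items, show them to handler if one is set, return them unchanged."""
    snapshot = tuple(items)
    if handler is not None:
        handler(snapshot)
    return snapshot


seen: list[tuple[str, ...]] = []
kept = tap_all((d for d in ["a", "b", "c"] if d != "b"), seen.append)
untapped = tap_all(["x", "y"], None)
```

The handler receives an immutable tuple and its return value is ignored, so a tap can observe a stage without being able to alter what flows on to the next one.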
4.4 Configurator Tie-In (M02C01) and Swapping Examples
from functools import partial
from funcpipe_rag import (
    Chunk,
    ChunkWithoutEmbedding,
    RagConfig,
    RagCoreDeps,
    RagEnv,
    RulesConfig,
    StartsWith,
    full_rag_api_docs,
    get_deps,
)
# Standard variant
standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)
rag_fn = partial(full_rag_api_docs, config=standard_config, deps=standard_deps)
# Swapping config: Filter to CS docs
cs_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
cs_deps = get_deps(cs_config)
cs_rag_fn = partial(full_rag_api_docs, config=cs_config, deps=cs_deps)
# Swapping deps: Fake embedder for tests (no I/O)
def fake_embed(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Mock embedding
test_deps = RagCoreDeps(cleaner=standard_deps.cleaner, embedder=fake_embed, taps=None)
test_rag_fn = partial(full_rag_api_docs, config=standard_config, deps=test_deps)
Wins: Small arity enables easy partialization; config/deps allow clean swapping (e.g., rules via config, fakes via deps). Composes with M02C01 make_rag_fn.
5. Equational Reasoning: Substitution Exercise
Hand Exercise: Substitute expressions in iter_rag_core.
1. Inline rule = config.keep → fixed predicate.
2. Substitute into generator expression → filtered stream.
3. Result: Output stream is fixed for fixed docs, config, deps.
Bug Hunt: In high_arity_rag, GLOBAL_ENV breaks substitution (replacing reference changes behavior).
Example:
- High-arity: chunk_size = GLOBAL_ENV.chunk_size → depends on mutable global, substitution fails.
- Friendly: config.env.chunk_size → immutable, substitutable, behavior preserved.
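The hand exercise can be mirrored in a few lines of runnable code. This is a toy sketch: `Cfg`, `core`, and `is_cs` are hypothetical stand-ins, and the "chunking" is just a prefix slice. It shows that with an immutable config, the call, its hand-inlined form, and its partial application are interchangeable.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Cfg:  # hypothetical config: a keep predicate and a chunk size
    keep: Callable[[str], bool]
    chunk_size: int


def core(docs: Iterable[str], config: Cfg) -> Iterator[str]:
    rule = config.keep                        # step 1: name the fixed predicate
    kept = (d for d in docs if rule(d))       # step 2: filtered stream
    return (d[:config.chunk_size] for d in kept)


def is_cs(doc: str) -> bool:
    return doc.startswith("cs.")


docs = ["cs.AI paper", "math.CO paper", "cs.LG paper"]
cfg = Cfg(keep=is_cs, chunk_size=5)

via_core = list(core(docs, cfg))
# Hand-substituted form: config.keep -> is_cs, config.chunk_size -> 5.
inlined = [d[:5] for d in docs if is_cs(d)]
via_partial = list(partial(core, config=cfg)(docs))
```

All three expressions produce the same list. If `core` read a mutable global instead of `config`, the inlined form would only be valid at the instant it was written.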
6. Property-Based Testing: Proving Equivalence and Idempotence
Use Hypothesis to check, over many generated inputs, that the refactor preserves baseline behavior and avoids global bugs. A passing property is strong evidence rather than a formal proof.
6.1 Custom Strategy
From tests/conftest.py.
6.2 Equivalence Property (Core vs Baseline)
# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    clean_doc,
    embed_chunk,
    get_deps,
    iter_chunk_doc,
    RagConfig,
    structural_dedup_chunks,
    full_rag_api_docs,
    iter_rag_core,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice, tee
def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))  # Consistent iterables
    baseline = baseline_full_rag(list(docs1), env)
    chunks, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks == baseline
Note: Tests chunk equivalence to a baseline built from the pure stages.
6.3 Prefix Equivalence (Streaming Core)
@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    baseline = baseline_full_rag(list(docs1), env)
    core_iter = iter_rag_core(docs2, config, deps)
    assert list(islice(core_iter, k)) == baseline[:k]
Note: Verifies streaming core matches the baseline on finite prefixes (M02C03 tie-in).
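Prefix equivalence against a deduplicated baseline only holds if the streaming side also skips duplicates as it goes. The sketch below illustrates that relationship with toy data. `dedup_eager` and `dedup_stream` are hypothetical, and the assumption that deduplication keeps the first occurrence in order is mine, not a documented property of `structural_dedup_chunks`.

```python
from collections.abc import Hashable, Iterable, Iterator
from itertools import islice


def dedup_eager(items: Iterable[Hashable]) -> list[Hashable]:
    """Hypothetical baseline: first occurrence wins, order preserved."""
    return list(dict.fromkeys(items))


def dedup_stream(items: Iterable[Hashable]) -> Iterator[Hashable]:
    """Streaming counterpart: yields as it goes, remembering what it has seen."""
    seen: set[Hashable] = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


data = ["a", "b", "a", "c", "b", "d"]
baseline = dedup_eager(data)
# Every prefix of the stream equals the same-length prefix of the baseline,
# including k = 0 and k larger than the number of distinct items.
prefixes_match = all(
    list(islice(dedup_stream(data), k)) == baseline[:k]
    for k in range(len(data) + 2)
)
```

In this sketch the `seen` set grows with the number of distinct items, so a streaming dedup of this kind is lazy but not constant-memory.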
6.4 Idempotence Property
@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_idempotence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = full_rag_api_docs(docs1, config, deps)
    chunks2, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks1 == chunks2
Note: Verifies that repeated calls with the same inputs yield the same outputs (repeatability, which this module calls idempotence), catching global mutation bugs.
6.5 Shrinking Demo: Catching a Global Bug
Bad refactor with global mutation:
from funcpipe_rag import RawDoc, CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig, RagEnv
from funcpipe_rag import Observations, RagCoreDeps, eval_pred, full_rag_api_docs
from funcpipe_rag import gen_chunk_doc, structural_dedup_chunks
from collections.abc import Iterator, Iterable, Callable
def bad_full_rag_api(
    docs: Iterable[RawDoc],
    config: RagConfig,
    deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
    # Reuse the same GLOBAL_ENV from the high_arity_rag anti-pattern
    global GLOBAL_ENV
    # Bug: the new value depends on the previous global, so each call chunks differently
    GLOBAL_ENV = RagEnv(GLOBAL_ENV.chunk_size + 1)  # Mutates global
    docs_list = list(docs)
    kept_docs = [d for d in docs_list if eval_pred(d, config.keep.keep_pred)]
    cleaned = [deps.cleaner(d) for d in kept_docs]
    chunks_iter = (deps.embedder(c) for cd in cleaned for c in gen_chunk_doc(cd, GLOBAL_ENV))
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(
        total_docs=len(docs_list),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
    )
    return chunks, obs
Property testing the bad version:
@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_rag_idempotence(docs, env):
    global GLOBAL_ENV
    GLOBAL_ENV = env  # Reset the global so every example starts from the same state
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = bad_full_rag_api(docs1, config, deps)
    chunks2, _ = bad_full_rag_api(docs2, config, deps)
    assert chunks1 == chunks2
Failure Trace (Example):
Falsifying example: test_bad_rag_idempotence(
    docs=[RawDoc(doc_id='a', title='t', abstract='abc', categories='c')],
    env=RagEnv(chunk_size=128),
)
AssertionError
Analysis: The first call chunks with chunk_size + 1 and the second with chunk_size + 2, so the two results differ for any doc whose abstract is split differently under those two sizes. Hypothesis shrinks toward the smallest such doc and env. The trace above shows the shape of the report; the exact values depend on the strategies in tests/conftest.py.
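The same failure mode can be reproduced without Hypothesis or the RAG types. In the sketch below, a hidden module-level dict plays the role of the mutated global, and a smallest-first enumeration of inputs stands in for Hypothesis generation and shrinking. All names are hypothetical; this illustrates the repeatability check, not the Hypothesis API.

```python
_state = {"chunk_size": 1}  # hypothetical hidden global


def impure_chunks(text: str) -> list[str]:
    n = _state["chunk_size"]
    _state["chunk_size"] = n % 3 + 1   # hidden mutation: 1 -> 2 -> 3 -> 1 ...
    return [text[i:i + n] for i in range(0, len(text), n)]


def pure_chunks(text: str, chunk_size: int) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def first_unrepeatable(fn, max_len: int = 10):
    """Try inputs from shortest to longest; return the first where two calls disagree."""
    alphabet = "abcdefghij"
    for length in range(max_len + 1):
        text = alphabet[:length]
        if fn(text) != fn(text):
            return text
    return None


impure_failure = first_unrepeatable(impure_chunks)
pure_failure = first_unrepeatable(lambda t: pure_chunks(t, 2))
```

Very short inputs fit in a single chunk under any size, so they cannot expose the bug. The check only fails once the text is long enough for two chunk sizes to split it differently, which is why a minimal counterexample still needs some length.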
7. When FP-Friendly APIs Aren't Worth It
Use higher arity or globals only in:
- Legacy adapters (e.g., framework callbacks requiring fixed signatures).
- One-off scripts with no reuse.
Guardrails: Wrap such functions in thin adapters calling small-arity cores to isolate complexity.
Example:
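# Legacy adapter
def legacy_rag(docs, chunk_size, cleaner, keep, debug):
    # debug is accepted only for signature compatibility and is ignored here
    # keep is passed straight through, so it must already be a RulesConfig
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagCoreDeps(cleaner=cleaner, embedder=embed_chunk)
    return full_rag_api_docs(docs, config, deps)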
8. Pre-Core Quiz
- Why does def f(a, b, c, d, e) violate FP-friendly design?
  Answer: Arity >3, hard to partialize or compose.
- How to fix a function using GLOBAL_DB?
  Answer: Inject as dependency in deps.
- What’s wrong with def rag(docs, cleaner, env, keep, taps)?
  Answer: High arity (5); group env, keep into config, and cleaner, taps into deps.
- Why use RagConfig and RagCoreDeps structs?
  Answer: Encapsulate domain settings and services, reduce arity, clarify intent.
- Tool to prove refactor correctness?
  Answer: Hypothesis (equivalence, idempotence).
9. Post-Core Reflection & Exercise
Reflect: Find a function in your codebase with >3 params or hidden globals. Refactor it to use inputs, config, deps with arity ≤3. Add Hypothesis tests for equivalence and idempotence.
Project Exercise: Apply to RAG pipeline; run properties on arxiv_cs_abstracts_10k.csv.
- Did composability improve (easier partials)?
- Did tests catch global bugs?
- Did config/deps clarify domain logic?
Next: Core 5 – Boundary Design (Isolating I/O to Edges).
Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.