Module 2: First-Class Functions and Expressive Python¶
Progression Note¶
By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |
M02C05 – Boundary Design (Isolating I/O to Edges Only)¶
Core question:
How do you isolate all side effects (I/O, mutation, exceptions) to thin, explicit boundaries—so the core stays parametric over effects, composable, and equational while handling real-world I/O?
This core introduces boundary design in Python:
- Confine effects to thin implementations injected via protocols in deps, keeping the core parametric over pure or effectful functions.
- Use Result for explicit errors instead of exceptions.
- Build on M02C04's config/deps for injecting services (pure or effectful).
We extend the running project from m02-rag.md—the FuncPipe RAG Builder—evolving from a leaky version with scattered I/O to parametric core + injected boundaries that preserve baseline equivalence.
Audience: Developers coming from M02C04 who already use small-arity APIs but still have effects (e.g., file reads, exceptions) leaking into the core, which breaks parametricity.
Outcome:
1. Identify effect leaks (I/O, raises) in code and explain their impact on reasoning.
2. Refactor a leaky function into parametric core + thin boundary with injected deps.
3. Write Hypothesis properties proving parametricity (equivalence, idempotence), with a shrinking example.
Note: This core anticipates Module 7's Ports & Adapters—start isolating I/O now by wrapping any file/network calls in thin functions.
Result Preview: In this core we only care about where I/O happens, not about advanced error algebra. We define a minimal Result[T] = Ok[T] | Err with Err always carrying a str. In Module 4 we generalize this to a fully-typed Result[T, E] with laws and a richer API. For now, treat it as a way to handle errors without exceptions: check isinstance(res, Ok) to get the value or isinstance(res, Err) to get the error.
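A minimal sketch of the Result shape described above may help. The field names `value` and `error` and the helper `parse_port` are assumptions for illustration; the actual funcpipe_rag definitions may differ.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str  # Err always carries a str in this core


# Result[T] is either Ok[T] or Err.
Result = Union[Ok[T], Err]


def parse_port(text: str) -> Result[int]:
    """Hypothetical helper: report failure as a value instead of raising."""
    if text.isdecimal() and 0 < int(text) < 65536:
        return Ok(int(text))
    return Err(f"invalid port: {text!r}")


def describe(res: Result[int]) -> str:
    # Callers branch with isinstance rather than try/except.
    if isinstance(res, Ok):
        return f"port {res.value}"
    return f"error: {res.error}"
```

The caller sees every failure path in the return type, which is the only property this core relies on.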
1. Conceptual Foundation¶
1.1 Boundary Design in One Precise Sentence¶
Boundary design isolates side effects to thin implementations injected via protocols in deps—ensuring the core remains parametric over pure or effectful services, composable via M02C01–M02C04, while effects are testable and replaceable.
1.2 The One-Sentence Rule¶
Confine side effects to boundary implementations (e.g., FSReader); inject them as deps so the core stays parametric and testable—never hardcode effects, raises, or mutation in core functions.
1.3 Why This Matters Now¶
M02C04 gave small-arity APIs with explicit deps, but hardcoded effects in the core break parametricity, making reasoning conditional. Boundary design enforces parametricity, enabling full M02C01–M02C04 power in real systems with injectable I/O.
1.4 Boundaries as Values in 5 Lines¶
Treating boundaries as first-class values enables dynamic injection:
from dataclasses import dataclass
from collections.abc import Callable
from funcpipe_rag import FSReader, Ok, RawDoc, Result

@dataclass(frozen=True)
class FakeReader:
    docs: list[RawDoc]

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        _ = path
        return Ok(self.docs)

ReaderFn = Callable[[str], Result[list[RawDoc]]]
readers: dict[str, ReaderFn] = {
    "fake": FakeReader([RawDoc("test", "title", "abstract", "cat")]).read_docs,
    "real": FSReader().read_docs,
}
Thin boundaries (protocols), explicit injection via deps, and parametric core allow swapping implementations (pure fakes or effectful reals) without changing core logic. In practice, you may store boundary implementations in registries like readers, then inject the chosen implementation into RagBoundaryDeps.
Note: The core is parametric: it is pure if deps are pure (e.g., a fake embedder) and effectful if deps perform I/O. Iterators defer computation, so with effectful deps the effects happen when the iterator is consumed.
2. Mental Model: Leaky Effects vs Sealed Boundaries¶
2.1 One Picture¶
Leaky Effects (Chaotic)            Sealed Boundaries (Parametric)
+---------------------------+      +------------------------------+
| def rag(docs_path):       |      | def iter_rag_core(docs,      |
|     docs = open(...)      |      |     config, deps)            |
|     # I/O in core!        |      |     -> Iterator[Chunk]       |
|     return process(docs)  |      |     # Parametric over deps   |
+---------------------------+      +------------------------------+
   ↑ Flaky Reasoning                  ↑ Effects via Injected Deps
2.2 Contract Table¶
| Aspect | Leaky Effects | Sealed Boundaries |
|---|---|---|
| Parametricity | Hardcoded effects | Core parametric over deps |
| Dependencies | Hidden I/O globals | Explicit protocols in deps |
| Composability | Flaky (side effects) | Easy (pure flow/partial) |
| Testing | Mock globals, integration | Unit pure, fake deps |
| Boundaries | Scattered | Thin implementations |
| Reasoning | Opaque (hidden effects) | Equational (substitutable) |
Note on Leaky Choice: Use leaks only in trivial scripts; always seal for reuse.
3. Running Project: FuncPipe RAG Builder¶
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Isolate I/O (file loading) to boundaries, keeping core parametric.
- Start: Leaky version with I/O in core (core5_start.py).
- End: Injected boundaries, preserving equivalence.
3.1 Types (Canonical, Used Throughout)¶
Extend M02C04 with effect protocols in deps:
from funcpipe_rag import Err, Ok, Reader, Result
from funcpipe_rag import RagBoundaryDeps, RagConfig, RagCoreDeps
Note: M02C05 extends the deps with a reader for boundaries; core functions ignore the reader.
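The shape of these types can be sketched as follows. This is an illustration under assumptions, not the funcpipe_rag source: `SketchCoreDeps`, `SketchBoundaryDeps`, and `InMemoryReader` are hypothetical names, documents are simplified to strings, and only the field names `core`, `reader`, `cleaner`, `embedder`, and `taps` come from the calls shown in this core.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Union


@dataclass(frozen=True)
class Ok:
    value: list[str]  # assumed field name; docs simplified to strings


@dataclass(frozen=True)
class Err:
    error: str


class Reader(Protocol):
    """Effect protocol: anything with a matching read_docs method qualifies."""

    def read_docs(self, path: str) -> Union[Ok, Err]: ...


@dataclass(frozen=True)
class SketchCoreDeps:
    cleaner: Callable[[str], str]
    embedder: Callable[[str], tuple[float, ...]]
    taps: Optional[object] = None


@dataclass(frozen=True)
class SketchBoundaryDeps:
    core: SketchCoreDeps  # passed to core functions, which never see the reader
    reader: Reader        # used only at the edge


class InMemoryReader:
    """Pure stand-in that satisfies Reader without touching the file system."""

    def __init__(self, docs: list[str]) -> None:
        self._docs = list(docs)

    def read_docs(self, path: str) -> Union[Ok, Err]:
        return Ok(list(self._docs))


deps = SketchBoundaryDeps(
    core=SketchCoreDeps(cleaner=str.strip, embedder=lambda s: (float(len(s)),)),
    reader=InMemoryReader(["  boundary design  "]),
)
```

Because `Reader` is a structural protocol, `InMemoryReader` needs no inheritance to be accepted as the `reader` field.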
3.2 Leaky Start (Anti-Pattern)¶
# core5_start.py: Leaky RAG with I/O in core (anti-pattern; illustration only)
from funcpipe_rag import Observations, RagConfig, RagCoreDeps, RagEnv
from funcpipe_rag import Chunk, RawDoc, clean_doc, embed_chunk, iter_rag_core, structural_dedup_chunks
import csv

def leaky_full_rag_api(
    path: str,
    config: RagConfig,
    deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
    try:
        with open(path) as f:  # Leaky I/O in "core"!
            reader = csv.DictReader(f)
            docs = [RawDoc(**row) for row in reader]
    except Exception as e:
        raise ValueError(f"Load failed: {e}")  # Leaky exception
    chunks_iter = iter_rag_core(docs, config, deps)  # From M02C04
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(total_docs=len(docs), total_chunks=len(chunks))  # Simplified
    return chunks, obs
Smells:
- I/O (open) in API, not boundary.
- Exceptions for control flow.
- Mixed parametric/streaming with effects.
Problem: Breaks parametricity; hard to test without real files.
4. Refactor to Boundaries: Parametric Core + Injected Implementations¶
4.1 Streaming Core (Parametric over Deps)¶
Canonical M02C04 core (repeated for reference):
from funcpipe_rag import RagConfig, get_deps, iter_rag_core
deps = get_deps(config)
chunks_iter = iter_rag_core(docs, config, deps)
Properties:
- Arity 3: Parametric; pure if deps pure.
- Lazy: Builds on M02C03.
- Deps may be effectful (e.g., real embedder performs I/O).
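The interaction between laziness and effectful deps can be seen in a small stand-alone sketch. `iter_core` and `recording_embed` are hypothetical stand-ins for `iter_rag_core` and a real embedder; appending to a list plays the role of an effect such as a network call.

```python
from typing import Callable, Iterable, Iterator


def iter_core(docs: Iterable[str], embed: Callable[[str], int]) -> Iterator[int]:
    """Lazy core: nothing runs until the iterator is consumed."""
    for doc in docs:
        yield embed(doc.strip())


calls: list[str] = []


def recording_embed(text: str) -> int:
    calls.append(text)  # stands in for an effect performed by a dep
    return len(text)


stream = iter_core([" a ", " bb ", " ccc "], recording_embed)
calls_before = list(calls)     # building the stream triggered no effect
first = next(stream)           # consuming one item performs exactly one effect
calls_after_one = list(calls)
```

The effect count tracks consumption, not construction, which is why the note about effectful deps matters for streaming callers.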
4.2 Post-Clean Streaming Sub-Core¶
Internal sub-core:
from funcpipe_rag import iter_chunks_from_cleaned
chunks_iter = iter_chunks_from_cleaned(cleaned, config, deps.embedder)
Properties:
- Arity 3: Parametric, reusable.
4.3 I/O Boundary Implementations (Thin, Injected)¶
Explicit reader implementations:
from funcpipe_rag import FSReader, Ok, RawDoc, Result

class FakeReader:
    def __init__(self, docs: list[RawDoc]):
        self._docs = docs

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        _ = path
        return Ok(self._docs)
Properties:
- Thin: Single responsibility.
- Result: Explicit errors.
- Injected via deps.reader.
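For the effectful side, a thin file-system reader might look like the sketch below. `CsvReader` is a hypothetical class, not the actual `FSReader`; rows are kept as plain dicts, and the "Load failed" prefix is chosen only to match the error text checked in the boundary test in section 6.4.

```python
import csv
import os
import tempfile
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: list[dict[str, str]]  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


class CsvReader:
    """Hypothetical thin boundary: all file I/O and exception handling lives here."""

    def read_docs(self, path: str) -> Union[Ok, Err]:
        try:
            with open(path, newline="", encoding="utf-8") as f:
                return Ok(list(csv.DictReader(f)))
        except (OSError, csv.Error, UnicodeDecodeError) as exc:
            # Convert the exception into a value at the boundary.
            return Err(f"Load failed: {exc}")


# Usage: one successful read and one missing file, both returned as values.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("doc_id,title\ncs-1,Hello\n")
    loaded = CsvReader().read_docs(path)
    missing = CsvReader().read_docs(os.path.join(tmp, "absent.csv"))
```

The `try`/`except` appears exactly once, inside the boundary, so no caller further in needs to know that files can fail.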
4.4 Public API (Edge, Composes Boundaries)¶
Orchestrates implementation + core:
from funcpipe_rag import FSReader, RagBoundaryDeps, full_rag_api_docs, full_rag_api_path
chunks, obs = full_rag_api_docs(docs, config, deps)
boundary_deps = RagBoundaryDeps(core=deps, reader=FSReader())
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)
Properties:
- Arity 3: Effects in implementations (e.g., FSReader).
- Uses simple isinstance for Result handling.
- Matches the baseline stage composition on Ok.
Layers:
- Core (library, parametric, streaming): iter_rag_core.
- Sub-core (internal helper): iter_chunks_from_cleaned.
- Boundary/Edge (CLI/API, effectful): full_rag_api_path (path in, Result out).
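How the three layers fit together can be sketched in a few functions. The names `iter_core`, `api_docs`, and `api_path` are hypothetical simplifications of `iter_rag_core`, `full_rag_api_docs`, and `full_rag_api_path`; the real signatures take config and deps objects rather than bare callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union


@dataclass(frozen=True)
class Ok:
    value: list[str]  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok, Err]


def iter_core(docs: Iterable[str], clean: Callable[[str], str]) -> Iterator[str]:
    """Streaming core: no I/O of its own."""
    return (clean(d) for d in docs)


def api_docs(docs: list[str], clean: Callable[[str], str]) -> list[str]:
    """Docs-in edge: materializes the stream."""
    return list(iter_core(docs, clean))


def api_path(
    path: str, clean: Callable[[str], str], read: Callable[[str], Result]
) -> Result:
    """Path-in edge: the only layer that calls the reader."""
    res = read(path)
    if isinstance(res, Err):
        return res  # propagate the boundary error unchanged
    return Ok(api_docs(res.value, clean))
```

On `Ok`, `api_path` returns exactly what `api_docs` would return for the same documents, which is the equivalence the properties in section 6 check.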
4.5 Configurator Tie-In (M02C01)¶
from functools import partial
from funcpipe_rag import Chunk, ChunkWithoutEmbedding, DebugConfig, RagBoundaryDeps, RagConfig, RagCoreDeps, RagEnv
from funcpipe_rag import FSReader, Ok, RulesConfig, StartsWith, full_rag_api_path, get_deps, make_rag_fn

# FakeReader is the class defined in 4.3.

def fake_embedder(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Fake embedding

# Docs API (preferred): configure a docs -> (chunks, obs) callable
rag_docs_fn = make_rag_fn(chunk_size=512)

# Boundary API: configure boundary deps and call `full_rag_api_path`
config = RagConfig(env=RagEnv(512), debug=DebugConfig())
boundary_deps = RagBoundaryDeps(core=get_deps(config), reader=FSReader())
rag_path_fn = partial(full_rag_api_path, config=config, deps=boundary_deps)

# Fake boundary: swap reader/embedder for tests
keep_all_cs = RulesConfig(keep_pred=StartsWith("categories", "cs."))
test_config = RagConfig(env=RagEnv(512), keep=keep_all_cs)
fake_boundary_deps = RagBoundaryDeps(
    core=RagCoreDeps(cleaner=get_deps(test_config).cleaner, embedder=fake_embedder, taps=None),
    reader=FakeReader([]),
)
test_rag_path_fn = partial(full_rag_api_path, config=test_config, deps=fake_boundary_deps)
Wins: Implementations are injectable, and fakes make the core pure. This composes with the M02C01 configurators.
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Substitute in iter_rag_core.
1. Inline embedder = deps.embedder → fixed function.
2. Substitute into generator → parametric stream.
3. Result: Output fixed for fixed inputs/deps (parametric).
Bug Hunt: In the leaky version, open breaks substitution: the result depends on file-system state, so the call cannot be replaced by its value.
Example:
- Leaky: with open(...) → depends on FS, not substitutable.
- Sealed: deps.reader.read_docs(path) → injectable, substitutable with fake implementation.
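The substitution step can be carried out by hand on a toy version. `sealed_titles` and `fake_read` are hypothetical names, and Result handling is omitted to keep the substitution visible.

```python
from typing import Callable


def sealed_titles(path: str, read_docs: Callable[[str], list[str]]) -> list[str]:
    """Sealed function: the reader is a parameter, not a hardcoded open()."""
    return [doc.title() for doc in read_docs(path)]


def fake_read(path: str) -> list[str]:
    # Pure: the same list for every path.
    return ["boundary design", "lazy iteration"]


# Step 1: call through the injected reader.
via_reader = sealed_titles("any/path.csv", fake_read)

# Step 2: replace fake_read(path) with the value it returns and inline the body.
substituted = [doc.title() for doc in ["boundary design", "lazy iteration"]]
```

Both expressions evaluate to the same list, which is what "substitutable" means here; with a hardcoded `open(...)` there is no fixed value to put in place of the call.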
6. Property-Based Testing: Proving Parametricity (Advanced, Optional)¶
Use Hypothesis to check that the refactor preserves behavior with parametric deps.
6.1 Custom Strategy¶
From tests/conftest.py.
6.2 Core Equivalence Property¶
# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    RagConfig,
    RagEnv,
    RagCoreDeps,
    RagBoundaryDeps,
    Err,
    Ok,
    FSReader,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
    iter_rag_core,
    full_rag_api_path,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_core_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk)
    core_iter = iter_rag_core(iter(docs), config, deps)
    assert list(core_iter) == baseline_full_rag(docs, env)
Note: Tests parametric core equivalence to the baseline (no boundaries).
6.3 Prefix Equivalence (Streaming Core)¶
@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk)
    core_iter = iter_rag_core(iter(docs), config, deps)
    assert list(islice(core_iter, k)) == baseline_full_rag(docs, env)[:k]
Note: Verifies parametric core streaming matches the baseline.
6.4 Boundary Error Handling¶
def test_boundary_failure():
    config = RagConfig(env=RagEnv(512))
    deps = RagBoundaryDeps(RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None), FSReader())
    res = full_rag_api_path("nonexistent.csv", config, deps)
    assert isinstance(res, Err)
    assert "Load failed" in res.error
Note: Tests that the boundary implementation returns Err on an I/O error.
6.5 Idempotence Property (Boundary with Fake Implementation)¶
@given(env=env_strategy())
def test_rag_idempotence(env):
    from funcpipe_rag import Chunk, ChunkWithoutEmbedding, Ok, RawDoc, Result

    class FakeReader:
        def read_docs(self, path: str) -> Result[list[RawDoc]]:
            _ = path
            return Ok([])

    def fake_embedder(c: ChunkWithoutEmbedding) -> Chunk:
        return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)

    config = RagConfig(env=env)
    deps = RagBoundaryDeps(
        RagCoreDeps(cleaner=clean_doc, embedder=fake_embedder, taps=None),
        FakeReader(),
    )
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2
Note: Ensures no hidden state with faked implementations (pure deps).
6.6 Shrinking Demo: Catching a Leaky Bug¶
Bad reader with leaky state:
from funcpipe_rag import Ok, RawDoc, Result

class BadReader:
    counter = 0

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        BadReader.counter += 1  # Leaky mutation
        if BadReader.counter % 2 == 0:
            return Ok([])
        return Ok([RawDoc("cs-123", "Title", "Abstract", "cs.AI")])
Property:
@given(env=env_strategy())
def test_bad_rag_idempotence(env):
    config = RagConfig(env=env)
    deps = RagBoundaryDeps(RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None), BadReader())
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2
Failure Trace (Example):
Analysis: The leaky class-level counter makes two identical calls return different results, so the property fails; Hypothesis shrinks env to a minimal failing example.
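The same failure can be reproduced without Hypothesis in a plain-Python sketch. This is a simplified stand-in, not the property above and not its failure trace: `is_idempotent`, `CountingReader`, and `StableReader` are hypothetical names, and readers return bare lists instead of Result values.

```python
from typing import Callable


def is_idempotent(run: Callable[[], object]) -> bool:
    """Call run twice and compare; hidden state shows up as a mismatch."""
    return run() == run()


class CountingReader:
    """Leaky reader: a class-level counter changes what it returns."""

    counter = 0

    def read_docs(self, path: str) -> list[str]:
        CountingReader.counter += 1  # hidden mutation shared by all instances
        return [] if CountingReader.counter % 2 == 0 else ["cs-123"]


class StableReader:
    """Pure reader: same output for every call."""

    def read_docs(self, path: str) -> list[str]:
        return ["cs-123"]


leaky_ok = is_idempotent(lambda: CountingReader().read_docs("fake_path"))
stable_ok = is_idempotent(lambda: StableReader().read_docs("fake_path"))
```

Creating a fresh `CountingReader` for each call does not help, because the counter lives on the class; only removing the mutation makes the check pass.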
7. When Boundaries Aren't Worth It¶
Use leaks only in:
- Trivial one-off scripts (no reuse).
- Legacy wrappers around sealed cores.
Guardrails: Isolate leaks to <10 lines; always prefer boundaries for tests/reuse.
Example:
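One way such a script might look, as a sketch under assumptions rather than the original example: `count_words` and `main` are hypothetical names, and the file read is the only leak, kept to a couple of lines at the edge.

```python
def count_words(lines: list[str]) -> int:
    """Sealed core: pure and testable without any file."""
    return sum(len(line.split()) for line in lines)


def main(argv: list[str]) -> int:
    # Leak confined to the edge of a one-off script: read, compute, print.
    with open(argv[1], encoding="utf-8") as f:
        print(count_words(f.readlines()))
    return 0
```

Even here the counting logic stays separate from `open`, so promoting the script to a reusable tool later only means replacing `main`.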
8. Pre-Core Quiz¶
- open() in core? → Violates parametricity.
- raise ValueError? → Use Result.
- How to test I/O? → Fake implementation.
- Effects in generator? → Inject implementation.
- Prove parametricity? → Hypothesis idempotence.
9. Post-Core Reflection & Exercise¶
Reflect: Find a function with I/O or raises. Refactor to parametric core + implementation; inject fake. Add Hypothesis for equivalence/idempotence.
Project Exercise: Apply to RAG (e.g., load_docs as boundary); run properties.
- Did parametricity enable easier tests?
- Did fakes catch leaks?
- Did boundaries clarify effects?
Next: Core 6 – Configuration-as-Data.
Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.