Module 2: First-Class Functions and Expressive Python

Progression Note

By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares for lazy iteration in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |

M02C05 – Boundary Design (Isolating I/O to Edges Only)

Core question:
How do you isolate all side effects (I/O, mutation, exceptions) to thin, explicit boundaries—so the core stays parametric over effects, composable, and equational while handling real-world I/O?

This core introduces boundary design in Python:
- Confine effects to thin implementations injected via protocols in deps, keeping the core parametric over pure or effectful functions.
- Use Result for explicit errors instead of exceptions.
- Build on M02C04's config/deps for injecting services (pure or effectful).

We extend the running project from m02-rag.md—the FuncPipe RAG Builder—evolving from a leaky version with scattered I/O to parametric core + injected boundaries that preserve baseline equivalence.

Audience: Developers from M02C04 using small-arity APIs but with effects (e.g., file reads, exceptions) leaking into the core, breaking parametricity.
Outcome:
1. Identify effect leaks (I/O, raises) in code and explain their impact on reasoning.
2. Refactor a leaky function into parametric core + thin boundary with injected deps.
3. Write Hypothesis properties proving parametricity (equivalence, idempotence), with a shrinking example.

Note: This core anticipates Module 7's Ports & Adapters—start isolating I/O now by wrapping any file/network calls in thin functions.

Result Preview: In this core we only care about where I/O happens, not about advanced error algebra. We define a minimal Result[T] = Ok[T] | Err with Err always carrying a str. In Module 4 we generalize this to a fully-typed Result[T, E] with laws and a richer API. For now, treat it as a way to handle errors without exceptions: check isinstance(res, Ok) to get the value or isinstance(res, Err) to get the error.
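
To make the preview concrete, here is a minimal sketch of such a Result type. It is not the funcpipe_rag implementation: the field name `value` on Ok and the helpers `safe_div` and `describe` are assumptions made for illustration. The `error` field on Err matches the attribute the tests later in this core read.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T  # assumed field name; the text only shows Ok(...) being constructed


@dataclass(frozen=True)
class Err:
    error: str  # in this core, Err always carries a str


Result = Union[Ok[T], Err]  # minimal Result[T] = Ok[T] | Err


def safe_div(a: float, b: float) -> Result[float]:
    """Hypothetical helper: report failure as a value instead of raising."""
    if b == 0:
        return Err("division by zero")
    return Ok(a / b)


def describe(res: Result[float]) -> str:
    # Branch with isinstance: Ok exposes the value, Err exposes the message.
    if isinstance(res, Ok):
        return f"ok: {res.value}"
    return f"error: {res.error}"
```

Because both outcomes are ordinary values, a caller can pass them along, compare them in tests, or branch on them without a try/except block.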


1. Conceptual Foundation

1.1 Boundary Design in One Precise Sentence

Boundary design isolates side effects to thin implementations injected via protocols in deps—ensuring the core remains parametric over pure or effectful services, composable via M02C01–M02C04, while effects are testable and replaceable.

1.2 The One-Sentence Rule

Confine side effects to boundary implementations (e.g., FSReader); inject them as deps so the core stays parametric and testable—never hardcode effects, raises, or mutation in core functions.

1.3 Why This Matters Now

M02C04 gave small-arity APIs with explicit deps, but hardcoded effects in the core break parametricity, making reasoning conditional. Boundary design enforces parametricity, enabling full M02C01–M02C04 power in real systems with injectable I/O.

1.4 Boundaries as Values in 5 Lines

Treating boundaries as first-class values enables dynamic injection:

from dataclasses import dataclass
from collections.abc import Callable

from funcpipe_rag import FSReader, Ok, RawDoc, Result


@dataclass(frozen=True)
class FakeReader:
    docs: list[RawDoc]

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        _ = path
        return Ok(self.docs)


ReaderFn = Callable[[str], Result[list[RawDoc]]]
readers: dict[str, ReaderFn] = {
    "fake": FakeReader([RawDoc("test", "title", "abstract", "cat")]).read_docs,
    "real": FSReader().read_docs,
}

Thin boundaries (protocols), explicit injection via deps, and parametric core allow swapping implementations (pure fakes or effectful reals) without changing core logic. In practice, you may store boundary implementations in registries like readers, then inject the chosen implementation into RagBoundaryDeps.

Note: Core is parametric: pure if deps are pure (e.g., fake embedder), effectful if deps perform I/O. Iterators defer computation; if deps are effectful, consumption performs effects.
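
The point about deferred computation can be seen in a small sketch. The names `recording_embedder` and `iter_embedded` are hypothetical stand-ins, not part of funcpipe_rag; the `calls` list simply records when the injected function actually runs.

```python
from collections.abc import Callable, Iterable, Iterator

calls: list[str] = []  # records each time the injected "effect" runs

Embedded = tuple[str, int]


def recording_embedder(text: str) -> Embedded:
    # Stand-in for an effectful dep: appending to `calls` plays the role of I/O.
    calls.append(text)
    return (text, len(text))


def iter_embedded(
    texts: Iterable[str], embedder: Callable[[str], Embedded]
) -> Iterator[Embedded]:
    # Parametric over `embedder`: the generator itself performs no effects.
    for text in texts:
        yield embedder(text)


stream = iter_embedded(["a", "bb", "ccc"], recording_embedder)
calls_before = list(calls)      # building the iterator ran nothing
first = next(stream)            # consuming one item runs the embedder once
calls_after_one = list(calls)
rest = list(stream)             # draining the stream runs the remaining calls
```

Constructing the stream is free of effects; they happen only as items are pulled, which is why the place where an iterator is consumed matters when deps are effectful.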


2. Mental Model: Leaky Effects vs Sealed Boundaries

2.1 One Picture

Leaky Effects (Chaotic)                     Sealed Boundaries (Parametric)
+---------------------------+               +------------------------------+
| def rag(docs_path):       |               | def iter_rag_core(docs,      |
|     docs = open(...)      |               | config, deps)                |
|     # I/O in core!        |               | -> Iterator[Chunk]           |
|     return process(docs)  |               | # Parametric over deps       |
+---------------------------+               +------------------------------+
   ↑ Flaky Reasoning                          ↑ Effects via Injected Deps

2.2 Contract Table

| Aspect | Leaky Effects | Sealed Boundaries |
|---|---|---|
| Parametricity | Hardcoded effects | Core parametric over deps |
| Dependencies | Hidden I/O globals | Explicit protocols in deps |
| Composability | Flaky (side effects) | Easy (pure flow/partial) |
| Testing | Mock globals, integration | Unit pure, fake deps |
| Boundaries | Scattered | Thin implementations |
| Reasoning | Opaque (hidden effects) | Equational (substitutable) |

Note on Leaky Choice: Use leaks only in trivial scripts; always seal for reuse.


3. Running Project: FuncPipe RAG Builder

We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Isolate I/O (file loading) to boundaries, keeping core parametric.
- Start: Leaky version with I/O in core (core5_start.py).
- End: Injected boundaries, preserving equivalence.

3.1 Types (Canonical, Used Throughout)

Extend M02C04 with effect protocols in deps:

from funcpipe_rag import Err, Ok, Reader, Result
from funcpipe_rag import RagBoundaryDeps, RagConfig, RagCoreDeps

Note: M02C05 extends deps with reader for boundaries; core functions ignore reader.
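
For orientation, the shapes of these types might look roughly like the sketch below. This is an assumption inferred from how the names are used in this core (`cleaner`, `embedder`, `taps`, `core`, `reader`, `read_docs`), not the library's actual definitions; the `...Sketch` classes and `InMemoryReader` are hypothetical, and `Any` stands in for the real document and Result types.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any, Optional, Protocol


class Reader(Protocol):
    """Structural type: any object with a matching read_docs method qualifies."""

    def read_docs(self, path: str) -> Any: ...


@dataclass(frozen=True)
class CoreDepsSketch:
    cleaner: Callable[[Any], Any]
    embedder: Callable[[Any], Any]
    taps: Optional[Any] = None


@dataclass(frozen=True)
class BoundaryDepsSketch:
    core: CoreDepsSketch  # what core functions consume
    reader: Reader        # used only at the edge, never by the core


class InMemoryReader:
    # No inheritance needed: having read_docs is enough to satisfy Reader.
    def read_docs(self, path: str) -> Any:
        _ = path
        return ["doc-a", "doc-b"]


deps = BoundaryDepsSketch(
    core=CoreDepsSketch(cleaner=str.strip, embedder=len),
    reader=InMemoryReader(),
)
```

Keeping the reader on the outer deps object and the cleaner/embedder on the inner one means a core function can accept only the inner object and has no way to reach the file system.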

3.2 Leaky Start (Anti-Pattern)

# core5_start.py: Leaky RAG with I/O in core (anti-pattern; illustration only)
import csv

from funcpipe_rag import Observations, RagConfig, RagCoreDeps
from funcpipe_rag import Chunk, RawDoc, iter_rag_core, structural_dedup_chunks


def leaky_full_rag_api(
        path: str,
        config: RagConfig,
        deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
    try:
        with open(path) as f:  # Leaky I/O in "core"!
            reader = csv.DictReader(f)
            docs = [RawDoc(**row) for row in reader]
    except Exception as e:
        raise ValueError(f"Load failed: {e}")  # Leaky exception
    chunks_iter = iter_rag_core(docs, config, deps)  # From M02C04
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(total_docs=len(docs), total_chunks=len(chunks))  # Simplified
    return chunks, obs

Smells:
- I/O (open) in API, not boundary.
- Exceptions for control flow.
- Mixed parametric/streaming with effects.
Problem: Breaks parametricity; hard to test without real files.


4. Refactor to Boundaries: Parametric Core + Injected Implementations

4.1 Streaming Core (Parametric over Deps)

Canonical M02C04 core (repeated for reference):

from funcpipe_rag import RagConfig, get_deps, iter_rag_core

deps = get_deps(config)
chunks_iter = iter_rag_core(docs, config, deps)

Properties:
- Arity 3: Parametric; pure if deps pure.
- Lazy: Builds on M02C03.
- Deps may be effectful (e.g., real embedder performs I/O).

4.2 Post-Clean Streaming Sub-Core

Internal sub-core:

from funcpipe_rag import iter_chunks_from_cleaned

chunks_iter = iter_chunks_from_cleaned(cleaned, config, deps.embedder)

Properties:
- Arity 3: Parametric, reusable.

4.3 I/O Boundary Implementations (Thin, Injected)

Explicit reader implementations:

from funcpipe_rag import FSReader, Ok, RawDoc, Result


class FakeReader:
    def __init__(self, docs: list[RawDoc]):
        self._docs = docs

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        _ = path
        return Ok(self._docs)

Properties:
- Thin: Single responsibility.
- Result: Explicit errors.
- Injected via deps.reader.
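
The listing above shows only the fake. A file-backed counterpart could follow the same pattern, as in this sketch. `CsvFileReader` is a hypothetical class, not the library's FSReader, and it returns plain dict rows rather than RawDoc; the point is only that `open`, CSV parsing, and exception handling all live inside one thin method that returns a Result-style value.

```python
import csv
import os
import tempfile
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: list[dict[str, str]]  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


class CsvFileReader:
    """Hypothetical thin boundary: the only place that touches the file system."""

    def read_docs(self, path: str) -> Union[Ok, Err]:
        try:
            with open(path, newline="", encoding="utf-8") as f:
                return Ok(list(csv.DictReader(f)))
        except (OSError, csv.Error, UnicodeDecodeError) as e:
            # Convert the exception into a value at the boundary.
            return Err(f"Load failed: {e}")


# Usage: one readable file and one missing file.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "docs.csv")
    with open(good_path, "w", newline="", encoding="utf-8") as f:
        f.write("doc_id,title\ncs-1,Hello\n")
    ok_res = CsvFileReader().read_docs(good_path)
    err_res = CsvFileReader().read_docs(os.path.join(tmp, "missing.csv"))
```

The caller never sees an exception from this method: a missing file arrives as an Err value carrying a message.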

4.4 Public API (Edge, Composes Boundaries)

Orchestrates implementation + core:

from funcpipe_rag import FSReader, RagBoundaryDeps, full_rag_api_docs, full_rag_api_path

chunks, obs = full_rag_api_docs(docs, config, deps)
boundary_deps = RagBoundaryDeps(core=deps, reader=FSReader())
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)

Properties:
- Arity 3: Effects in implementations (e.g., FSReader).
- Uses simple isinstance for Result handling.
- Matches the baseline stage composition on Ok.

Layers:
- Core (library, parametric, streaming): iter_rag_core.
- Sub-core (internal helper): iter_chunks_from_cleaned.
- Boundary/Edge (CLI/API, effectful): full_rag_api_path (path in, Result out).
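
How an edge function can compose a reader with a core is sketched below. `run_from_path`, `iter_core`, and both reader classes are hypothetical stand-ins working on plain strings; the real full_rag_api_path may differ in its details. The sketch only illustrates the layering: read through the injected boundary, return early on Err, otherwise run the parametric core.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Protocol, Union


@dataclass(frozen=True)
class Ok:
    value: list[str]  # assumed field name


@dataclass(frozen=True)
class Err:
    error: str


class Reader(Protocol):
    def read_docs(self, path: str) -> Union[Ok, Err]: ...


def iter_core(docs: Iterable[str], chunk_size: int) -> Iterator[str]:
    """Stand-in core: lazily split each doc into fixed-size chunks, no I/O."""
    for doc in docs:
        for start in range(0, len(doc), chunk_size):
            yield doc[start:start + chunk_size]


def run_from_path(path: str, chunk_size: int, reader: Reader) -> Union[Ok, Err]:
    """Hypothetical edge function: effects can only enter through `reader`."""
    res = reader.read_docs(path)
    if isinstance(res, Err):
        return res  # pass the boundary error through unchanged
    return Ok(list(iter_core(res.value, chunk_size)))


class FakeReader:
    def __init__(self, docs: list[str]) -> None:
        self._docs = docs

    def read_docs(self, path: str) -> Union[Ok, Err]:
        _ = path
        return Ok(self._docs)


class FailingReader:
    def read_docs(self, path: str) -> Union[Ok, Err]:
        return Err(f"Load failed: {path}")


ok_res = run_from_path("unused.csv", 2, FakeReader(["abcde"]))
err_res = run_from_path("missing.csv", 2, FailingReader())
```

Both the success path and the failure path are exercised here without touching a real file, which is the testing benefit the layering is meant to provide.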

4.5 Configurator Tie-In (M02C01)

from functools import partial
from funcpipe_rag import Chunk, ChunkWithoutEmbedding, DebugConfig, RagBoundaryDeps, RagConfig, RagCoreDeps, RagEnv
from funcpipe_rag import FSReader, Ok, RulesConfig, StartsWith, full_rag_api_path, get_deps, make_rag_fn


def fake_embedder(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Fake embedding


# Docs API (preferred): configure a docs -> (chunks, obs) callable
rag_docs_fn = make_rag_fn(chunk_size=512)

# Boundary API: configure boundary deps and call `full_rag_api_path`
config = RagConfig(env=RagEnv(512), debug=DebugConfig())
boundary_deps = RagBoundaryDeps(core=get_deps(config), reader=FSReader())
rag_path_fn = partial(full_rag_api_path, config=config, deps=boundary_deps)

# Fake boundary: swap reader/embedder for tests
# (FakeReader is the class defined in section 4.3)
keep_all_cs = RulesConfig(keep_pred=StartsWith("categories", "cs."))
test_config = RagConfig(env=RagEnv(512), keep=keep_all_cs)
fake_boundary_deps = RagBoundaryDeps(
    core=RagCoreDeps(cleaner=get_deps(test_config).cleaner, embedder=fake_embedder, taps=None),
    reader=FakeReader([]),
)
test_rag_path_fn = partial(full_rag_api_path, config=test_config, deps=fake_boundary_deps)

Wins: Implementations injectable; fakes make core pure. Composes with M02C01.


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Substitute in iter_rag_core.
1. Inline embedder = deps.embedder → fixed function.
2. Substitute into generator → parametric stream.
3. Result: Output fixed for fixed inputs/deps (parametric).
Bug Hunt: In leaky version, open breaks substitution (effects change behavior).

Example:
- Leaky: with open(...) → depends on FS, not substitutable.
- Sealed: deps.reader.read_docs(path) → injectable, substitutable with fake implementation.
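
The hand exercise can be replayed on a toy core. In this sketch `iter_embed` and `fixed_embedder` are hypothetical stand-ins for iter_rag_core and deps.embedder: once the embedder is a fixed pure function, the generator call can be replaced by the expression obtained from inlining it, and the two agree.

```python
from collections.abc import Callable, Iterable, Iterator

Embedded = tuple[str, int]


def iter_embed(
    docs: Iterable[str], embedder: Callable[[str], Embedded]
) -> Iterator[Embedded]:
    # Toy parametric core: behavior depends only on its arguments.
    for doc in docs:
        yield embedder(doc)


def fixed_embedder(text: str) -> Embedded:
    return (text, len(text))


docs = ("ab", "cde")

# Steps 1 and 2 by machine: run the core with the fixed dep.
via_core = list(iter_embed(docs, fixed_embedder))

# Steps 1 and 2 by hand: inline the embedder body into the loop.
inlined = [(doc, len(doc)) for doc in docs]

# Step 3: the same inputs and deps give the same output on every run.
again = list(iter_embed(docs, fixed_embedder))
```

With a hardcoded `open(...)` inside the loop, no such inlining would be valid, because the value of the expression would depend on whatever the file system holds at call time.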


6. Property-Based Testing: Proving Parametricity (Advanced, Optional)

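Use Hypothesis to check that the refactor preserves behavior with parametric deps. A passing property is strong evidence across many generated inputs rather than a formal proof.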

6.1 Custom Strategy

The strategies used below, doc_list_strategy and env_strategy, come from tests/conftest.py.

6.2 Core Equivalence Property

# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    RagConfig,
    RagEnv,
    RagCoreDeps,
    RagBoundaryDeps,
    Err,
    Ok,
    FSReader,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
    iter_rag_core,
    full_rag_api_path,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_core_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk)
    core_iter = iter_rag_core(iter(docs), config, deps)
    assert list(core_iter) == baseline_full_rag(docs, env)

Note: Tests parametric core equivalence to the baseline (no boundaries).

6.3 Prefix Equivalence (Streaming Core)

@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk)
    core_iter = iter_rag_core(iter(docs), config, deps)
    assert list(islice(core_iter, k)) == baseline_full_rag(docs, env)[:k]

Note: Verifies parametric core streaming matches the baseline.

6.4 Boundary Error Handling

def test_boundary_failure():
    config = RagConfig(env=RagEnv(512))
    deps = RagBoundaryDeps(RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None), FSReader())
    res = full_rag_api_path("nonexistent.csv", config, deps)
    assert isinstance(res, Err)
    assert "Load failed" in res.error

Note: Tests boundary implementation returns Err on I/O error.

6.5 Idempotence Property (Boundary with Fake Implementation)

@given(env=env_strategy())
def test_rag_idempotence(env):
    from funcpipe_rag import Chunk, ChunkWithoutEmbedding, Ok, RawDoc, Result

    class FakeReader:
        def read_docs(self, path: str) -> Result[list[RawDoc]]:
            _ = path
            return Ok([])

    def fake_embedder(c: ChunkWithoutEmbedding) -> Chunk:
        return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)

    config = RagConfig(env=env)
    deps = RagBoundaryDeps(
        RagCoreDeps(cleaner=clean_doc, embedder=fake_embedder, taps=None),
        FakeReader(),
    )
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2

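Note: Calling the API twice with the same path, config, and deps must return equal results. This checks that no hidden state survives between calls when the implementations are fakes (pure deps).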

6.6 Shrinking Demo: Catching a Leaky Bug

Bad reader with leaky state:

from funcpipe_rag import Ok, RawDoc, Result


class BadReader:
    counter = 0

    def read_docs(self, path: str) -> Result[list[RawDoc]]:
        BadReader.counter += 1  # Leaky mutation
        if BadReader.counter % 2 == 0:
            return Ok([])
        return Ok([RawDoc("cs-123", "Title", "Abstract", "cs.AI")])

Property:

@given(env=env_strategy())
def test_bad_rag_idempotence(env):
    config = RagConfig(env=env)
    deps = RagBoundaryDeps(RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None), BadReader())
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2

Failure Trace (Example):

Falsifying example: test_bad_rag_idempotence(
    env=RagEnv(chunk_size=128),
)
AssertionError

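Analysis: Hypothesis shrinks the input to a minimal falsifying example. The failure exposes the class-level counter, which changes the reader's output between two otherwise identical calls.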
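
The same failure can be reproduced without Hypothesis, which helps when debugging what the property caught. The sketch below uses plain lists of strings instead of Result and RawDoc; `StatelessReader` and `same_twice` are hypothetical names introduced for the comparison.

```python
class BadReader:
    counter = 0  # class-level state shared across all calls and instances

    def read_docs(self, path: str) -> list[str]:
        _ = path
        BadReader.counter += 1  # hidden mutation on every call
        if BadReader.counter % 2 == 0:
            return []
        return ["cs-123"]


class StatelessReader:
    def read_docs(self, path: str) -> list[str]:
        _ = path
        return ["cs-123"]  # output depends on nothing but the call itself


def same_twice(reader) -> bool:
    """Return True if two identical calls produce equal results."""
    return reader.read_docs("fake_path") == reader.read_docs("fake_path")


bad_is_stable = same_twice(BadReader())          # odd call vs even call differ
good_is_stable = same_twice(StatelessReader())
```

Removing the counter, or moving any needed state into explicit arguments, is enough to make the two calls agree again.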


7. When Boundaries Aren't Worth It

Use leaks only in:
- Trivial one-off scripts (no reuse).
- Legacy wrappers around sealed cores.
Guardrails: Isolate leaks to <10 lines; always prefer boundaries for tests/reuse.

Example:

import json
# Trivial script
print(json.loads(open("data.json").read()))  # OK for one-off

8. Pre-Core Quiz

  1. open() in core? → Violates parametricity.
  2. raise ValueError? → Use Result.
  3. How to test I/O? → Fake implementation.
  4. Effects in generator? → Inject implementation.
  5. Prove parametricity? → Hypothesis idempotence.

9. Post-Core Reflection & Exercise

Reflect: Find a function with I/O or raises. Refactor to parametric core + implementation; inject fake. Add Hypothesis for equivalence/idempotence.
Project Exercise: Apply to RAG (e.g., load_docs as boundary); run properties.
- Did parametricity enable easier tests?
- Did fakes catch leaks?
- Did boundaries clarify effects?

Next: Core 6 – Configuration-as-Data.

Verify all patterns with Hypothesis: the examples in this core show how to detect impurities such as globals or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.