
Module 2: First-Class Functions and Expressive Python

Progression Note

By the end of Module 2, you'll be able to use first-class functions for configurability, write expression-oriented code, and add debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

| Module | Focus | Key Outcomes |
| --- | --- | --- |
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |

M02C04 – Designing FP-Friendly APIs (Small Arity, Explicit Dependencies, No Hidden Globals)

Core question:
How do you design APIs with ≤3 parameters, explicit config and dependencies, and no hidden globals—so pipelines from M02C01–M02C03 are composable, testable, and predictable?

This core introduces FP-friendly API design in Python:
- Craft functions as small, composable bricks with explicit interfaces (arity ≤3 for core public APIs).
- Group parameters into immutable config (domain settings) and dependencies (injected services).
- Build on M02C01 configurators, M02C02 expressions, and M02C03 laziness for streaming pipelines.

We extend the running project from m02-rag.md—the FuncPipe RAG Builder—evolving from a high-arity, global-dependent version to clean, composable APIs that preserve baseline equivalence.

Audience: Developers from M02C03 using lazy pipelines but facing high-arity functions or hidden globals that hinder testing and composition.
Outcome:
1. Identify high-arity or hidden dependencies in code and explain their impact on composability.
2. Refactor a high-arity function into a small-arity API with grouped config and dependencies.
3. Write Hypothesis properties proving equivalence and idempotence, with a shrinking example.


1. Conceptual Foundation

1.1 FP-Friendly API Design in One Precise Sentence

FP-friendly APIs limit core public functions to ≤3 parameters, group domain settings into immutable config and services into explicit dependencies, and avoid hidden globals—ensuring composability, testability, and equational reasoning.

1.2 The One-Sentence Rule

Core public APIs must have ≤3 parameters (inputs, config, deps), with all dependencies explicit and globals forbidden; bind config/deps at edges using M02C01 partials or factories.
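
As a minimal sketch of the rule, the toy names below (`ToyConfig`, `ToyDeps`, `shout_core`, `make_shout`) are hypothetical stand-ins, not FuncPipe APIs. They show one three-parameter core bound at the edge in both ways the rule mentions: with `functools.partial` and with a factory closure.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable


@dataclass(frozen=True)
class ToyConfig:  # hypothetical domain settings
    suffix: str


@dataclass(frozen=True)
class ToyDeps:  # hypothetical injected services
    normalize: Callable[[str], str]


def shout_core(words: list[str], config: ToyConfig, deps: ToyDeps) -> list[str]:
    """Core with arity 3 (inputs, config, deps); reads no globals."""
    return [deps.normalize(w) + config.suffix for w in words]


def make_shout(config: ToyConfig, deps: ToyDeps) -> Callable[[list[str]], list[str]]:
    """Factory: closes over config and deps, returns a one-argument function."""
    def shout(words: list[str]) -> list[str]:
        return shout_core(words, config, deps)
    return shout


config = ToyConfig(suffix="!")
deps = ToyDeps(normalize=str.upper)

# Edge: bind config and deps once, in either style.
via_partial = partial(shout_core, config=config, deps=deps)
via_factory = make_shout(config, deps)

partial_result = via_partial(["a", "b"])
factory_result = via_factory(["a", "b"])
```

Both bound functions take only the input data, so either can be handed to code that knows nothing about config or deps.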

1.3 Why This Matters Now

M02C03 introduced lazy pipelines, but high-arity APIs or hidden globals make them hard to partialize, test, or compose. FP-friendly design ensures pipelines snap together, leveraging M02C01 configurators for variants, M02C02 expressions for clarity, and M02C03 laziness for efficiency.

1.4 FP-Friendly APIs as Values in 5 Lines

Small-arity APIs enable dynamic composition:

from functools import partial
from funcpipe_rag import RagConfig, RagEnv, RulesConfig, StartsWith, get_deps, iter_rag_core

standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)

filtered_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
filtered_deps = get_deps(filtered_config)

rags: dict[str, object] = {
    "standard": partial(iter_rag_core, config=standard_config, deps=standard_deps),
    "filtered": partial(iter_rag_core, config=filtered_config, deps=filtered_deps),
}

Small-arity functions (inputs, config, deps), explicit config/deps, and no globals allow storage in dicts, composition with M02C01 partials, and testing as first-class values. For example, swapping keep in config creates variants without globals or high arity.
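
To make the "stored in dicts" point concrete without the FuncPipe package, here is a small sketch. `ToyFilterConfig` and `iter_kept` are hypothetical stand-ins for `RagConfig` and `iter_rag_core`; the only assumption carried over is that a variant is a core function with its config bound by `partial`.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class ToyFilterConfig:  # hypothetical stand-in for RagConfig
    prefix: str = ""    # an empty prefix keeps everything


def iter_kept(categories: Iterable[str], config: ToyFilterConfig) -> Iterator[str]:
    """Lazily keep categories that start with the configured prefix."""
    return (c for c in categories if c.startswith(config.prefix))


# Variants are plain values: look one up by name, then call it with data.
variants: dict[str, Callable[[Iterable[str]], Iterator[str]]] = {
    "standard": partial(iter_kept, config=ToyFilterConfig()),
    "filtered": partial(iter_kept, config=ToyFilterConfig(prefix="cs.")),
}

data = ["cs.AI", "math.CO", "cs.LG"]
standard = list(variants["standard"](data))
filtered = list(variants["filtered"](data))
```

Typing the dict values as callables (rather than `object`) also lets a type checker confirm that every variant accepts the same input.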

Note: In real systems, embed may involve I/O (e.g., API calls). Injecting it through deps keeps that effect out of the core's own code: for a fixed, deterministic embed the core stays referentially transparent, and tests can swap in a fake.


2. Mental Model: High-Arity Globals vs Small Explicit APIs

2.1 One Picture

High-Arity Globals (Messy)              Small Explicit APIs (Composable)
+-----------------------+               +------------------------------+
| def rag(docs, env,    |               | def iter_rag_core(docs,      |
| cleaner, keep, taps,  |               | config, deps)                |
| chunk_size, more...)  |               | -> Iterator[Chunk]           |
| # Uses GLOBAL_CFG     |               | # Config: env, keep          |
|                       |               | # Deps: cleaner, embed, taps |
+-----------------------+               +------------------------------+
   ↑ Hard to Test/Compose                ↑ Snaps into Partial/Flow

2.2 Contract Table

| Aspect | High-Arity Globals | Small Explicit APIs |
| --- | --- | --- |
| Arity | >3 params | ≤3 (inputs, config, deps) |
| Dependencies | Hidden globals/env vars | Explicit config/deps structs |
| Composability | Hard (many args, globals) | Easy (partial, flow) |
| Testing | Mock globals, flaky | Inject fakes, deterministic |
| Boundaries | Mixed pure/effects | Pure core, effectful edges |
| Reasoning | Opaque (hidden state) | Equational (substitutable) |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |

Note on High-Arity Choice: Use higher arity only for legacy adapters, wrapping small-arity cores.
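
The last row of the contract table (mutable values bound into partials) can be illustrated with a short sketch. `MutableCfg`, `FrozenCfg`, and `head` are hypothetical names invented for this example.

```python
from dataclasses import FrozenInstanceError, dataclass
from functools import partial


@dataclass
class MutableCfg:  # hypothetical: not frozen, so any holder can mutate it
    size: int


@dataclass(frozen=True)
class FrozenCfg:  # hypothetical: frozen, so fields cannot be reassigned
    size: int


def head(text: str, config) -> str:
    """Return the first config.size characters."""
    return text[: config.size]


mutable = MutableCfg(size=2)
leaky = partial(head, config=mutable)
before = leaky("abcdef")
mutable.size = 4  # a mutation elsewhere silently changes the bound function
after = leaky("abcdef")

frozen = FrozenCfg(size=2)
try:
    frozen.size = 4  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

The partial holds a reference to the config, not a copy, which is why freezing the config is what protects the bound function.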

2.3 Common API Shapes Table

To lock in the arity rule, here are typical shapes:

| Shape | Meaning | Example |
| --- | --- | --- |
| f(data) | Pure utility, no config/deps | hash(data) |
| f(data, config) | Domain-level core | chunk(data, ChunkConfig) |
| f(data, config, deps) | Cross-cutting deps present | iter_rag_core(docs, config, deps) |

Any other shape (e.g., f(docs, env, keep, cleaner, taps, ...)) must be considered an anti-pattern and refactored.
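
The three accepted shapes can be sketched side by side. All names here (`digest`, `chunk`, `iter_chunks`, `ToyChunkConfig`, `ToyCleanDeps`) are hypothetical toys, not the FuncPipe functions of the same flavor. The `f(data)` example uses `hashlib` because Python's built-in `hash()` of a string varies between interpreter runs unless `PYTHONHASHSEED` is fixed.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class ToyChunkConfig:  # hypothetical domain config
    size: int


@dataclass(frozen=True)
class ToyCleanDeps:  # hypothetical injected dependency
    clean: Callable[[str], str]


def digest(data: str) -> str:
    """f(data): pure utility with no config or deps."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def chunk(data: str, config: ToyChunkConfig) -> list[str]:
    """f(data, config): domain-level core."""
    return [data[i : i + config.size] for i in range(0, len(data), config.size)]


def iter_chunks(
    docs: Iterable[str], config: ToyChunkConfig, deps: ToyCleanDeps
) -> Iterator[str]:
    """f(data, config, deps): a dependency is injected alongside config."""
    for doc in docs:
        yield from chunk(deps.clean(doc), config)


pieces = chunk("abcdef", ToyChunkConfig(size=4))
streamed = list(
    iter_chunks([" ab ", "cdefg"], ToyChunkConfig(size=2), ToyCleanDeps(clean=str.strip))
)
```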


3. Running Project: FuncPipe RAG Builder

We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Design small-arity, explicit APIs that compose lazily and match the baseline outputs.
- Start: High-arity, global-dependent version (core4_start.py).
- End: FP-friendly API with streaming core and edge materialization.

3.1 Types (Canonical, Used Throughout)

From src/funcpipe_rag/rag_types.py, src/funcpipe_rag/api/types.py, plus new config/deps:

from funcpipe_rag import Observations, RagConfig, RagCoreDeps, RagEnv, RagTaps

3.2 High-Arity Start (Anti-Pattern)

# core4_start.py: High-arity, global-dependent RAG
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc, embed_chunk, structural_dedup_chunks, gen_chunk_doc
from collections.abc import Callable, Sequence

GLOBAL_ENV = RagEnv(512)  # Hidden global


def high_arity_rag(
        docs: list[RawDoc],
        cleaner: Callable[[RawDoc], CleanDoc],
        keep: DocRule | None,
        taps: RagTaps | None,
        chunk_size: int = GLOBAL_ENV.chunk_size,
        debug: bool = False
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc
    kept_docs = [d for d in docs if rule(d)]
    if taps and taps.docs and debug:
        taps.docs(tuple(kept_docs))
    cleaned = [cleaner(d) for d in kept_docs]
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))
    chunk_we = [c for cd in cleaned for c in gen_chunk_doc(cd, RagEnv(chunk_size))]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks and debug:
        taps.chunks(tuple(chunks))
    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

Smells:
- High arity (6 params: docs, cleaner, keep, taps, chunk_size, debug).
- Hidden global (GLOBAL_ENV).
- Mixed effects (taps interleaved with pure steps and inconsistently gated by debug: taps.cleaned fires even when debug is False).
Problem: Hard to partialize, test, or reason about due to excessive params and global dependency.
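
One further trap in this shape is worth a small sketch: a default such as `chunk_size: int = GLOBAL_ENV.chunk_size` is evaluated once, when the `def` statement runs. `ToyEnv` and `first_chunk` below are hypothetical stand-ins used only to show that behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEnv:  # hypothetical stand-in for RagEnv
    chunk_size: int


GLOBAL_ENV = ToyEnv(512)


def first_chunk(text: str, chunk_size: int = GLOBAL_ENV.chunk_size) -> str:
    # The default was evaluated when `def` ran; it is now just the int 512.
    return text[:chunk_size]


GLOBAL_ENV = ToyEnv(4)  # rebinding the global later does not touch the default

default_len = len(first_chunk("x" * 1000))
explicit_len = len(first_chunk("x" * 1000, chunk_size=GLOBAL_ENV.chunk_size))
```

A reader who sees `GLOBAL_ENV` updated would reasonably expect the default to follow it, which is part of what makes the hidden global hard to reason about.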


4. Refactor to FP-Friendly: Small Arity, Explicit Dependencies

Before turning to the RAG code, here is a concrete before/after example of redesigning an unfriendly API:

import os
import pandas as pd
from dataclasses import dataclass

# Before: Unfriendly API with implicit context
def foo(df: pd.DataFrame) -> pd.DataFrame:
    threshold = float(os.environ.get('THRESHOLD', '0.5'))  # Hidden env dep
    return df[df['value'] > threshold]  # Non-deterministic if env changes

# After: FP-friendly with explicit config
@dataclass(frozen=True)
class FooConfig:
    threshold: float

def foo(data: pd.DataFrame, *, config: FooConfig) -> pd.DataFrame:  # Replaces the version above
    return data[data['value'] > config.threshold]  # Pure: Depends only on inputs

This makes the function testable (pass any FooConfig; no environment patching is needed) and composable, with no surprises from environment variables.
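
The environment variable does not have to disappear; it can be read once at the edge and turned into config. The sketch below uses plain lists instead of pandas so it runs anywhere, and `ThresholdConfig`, `config_from_env`, and `keep_above` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ThresholdConfig:  # hypothetical, mirrors FooConfig
    threshold: float


def config_from_env(environ: Mapping[str, str]) -> ThresholdConfig:
    """Edge: the only place that looks at environment-style settings."""
    return ThresholdConfig(threshold=float(environ.get("THRESHOLD", "0.5")))


def keep_above(values: list[float], *, config: ThresholdConfig) -> list[float]:
    """Core: the result depends only on the arguments."""
    return [v for v in values if v > config.threshold]


# In production the edge would pass os.environ; tests pass a plain dict.
test_config = config_from_env({"THRESHOLD": "1.5"})
default_config = config_from_env({})
kept = keep_above([1.0, 2.0, 3.0], config=test_config)
```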

4.1 Streaming Core (Pure, Lazy)

A pure, lazy core with small arity, building on M02C03:

from funcpipe_rag import RagConfig, RagCoreDeps, iter_rag_core

chunks_iter = iter_rag_core(docs, config, deps)

Properties:
- Arity 3: docs, config, deps.
- Pure, fully lazy (generator-based, O(1) memory).
- No taps (effects deferred to edge).
- Explicit config/deps, no globals.
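
What "fully lazy" buys can be observed with a toy generator pipeline. This is a sketch, not the real `iter_rag_core`: `numbered_docs` and `iter_toy_core` are hypothetical, and the `consumed` list is a deliberate side effect added only to record which inputs were pulled.

```python
from itertools import islice
from typing import Iterable, Iterator

consumed: list[int] = []  # records which inputs were actually pulled


def numbered_docs(n: int) -> Iterator[int]:
    """Toy source that notes each document id as it is requested."""
    for i in range(n):
        consumed.append(i)
        yield i


def iter_toy_core(docs: Iterable[int], size: int) -> Iterator[tuple[int, int]]:
    """Hypothetical lazy core: each doc yields `size` (doc, offset) chunks."""
    for doc in docs:
        for offset in range(size):
            yield (doc, offset)


stream = iter_toy_core(numbered_docs(1_000_000), size=2)
first_three = list(islice(stream, 3))
```

Only two of the million source documents are touched to produce the first three chunks, because nothing upstream runs until a consumer asks for the next item.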

4.2 Post-Clean Streaming Sub-Core

To reuse core logic at the edge without duplicating the full pipeline:

from collections.abc import Iterator, Iterable, Callable
from funcpipe_rag import CleanDoc, Chunk, ChunkWithoutEmbedding, RagConfig
from funcpipe_rag import gen_chunk_doc


def iter_chunks_from_cleaned(
        cleaned: Iterable[CleanDoc],
        config: RagConfig,
        embed: Callable[[ChunkWithoutEmbedding], Chunk]
) -> Iterator[Chunk]:
    """Sub-core: lazy chunk and embed from cleaned docs (reuses M02C03 patterns)."""
    for cd in cleaned:
        for chunk in gen_chunk_doc(cd, config.env):
            yield embed(chunk)

Properties:
- Arity 3: cleaned, config, embed (internal sub-core). Here config is domain config and embed is an injected dependency, so the "data, config, deps" ≤3-arity pattern holds even for internal sub-cores.
- Enables reuse in full_rag_api_docs (and full_rag_api) for lazy post-clean processing.

4.3 Public API (Edge, Materializes)

Wraps the core components, handles materialization and taps:

from funcpipe_rag import RagConfig, RagCoreDeps, full_rag_api_docs

# Canonical end-of-Module-02 API (implemented in `src/funcpipe_rag/api/core.py`)
chunks, obs = full_rag_api_docs(docs, config, deps)

Properties:
- Arity 3, explicit config/deps.
- Builds on M02C03 laziness internally (lazy post-clean via sub-core); materializes filter/clean at edge for taps/obs.
- Reuses core expressions via a private _tap helper and the iter_chunks_from_cleaned sub-core; taps are observational side effects isolated to the edge.
- Matches the baseline stage composition when config.keep = DEFAULT_RULES, deps.taps = None.
Note: iter_rag_core is the fully streaming core. full_rag_api_docs intentionally materializes intermediates for observations/taps; laziness applies post-clean. Dedup runs post-tap as it requires a global view. _tap is an internal helper (see src/funcpipe_rag/api/core.py), not a public API.
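
The real `_tap` helper lives in src/funcpipe_rag/api/core.py and is not shown here, so the following is only a guess at the general shape of an edge function that materializes, taps, dedups, and reports observations. `ToyTaps`, `_tap`, and `toy_edge` are hypothetical and operate on plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

Tap = Callable[[tuple[str, ...]], None]


@dataclass(frozen=True)
class ToyTaps:  # hypothetical stand-in for RagTaps
    cleaned: Optional[Tap] = None


def _tap(items: tuple[str, ...], tap: Optional[Tap]) -> tuple[str, ...]:
    """Call the tap if one is present, then return the items unchanged."""
    if tap is not None:
        tap(items)
    return items


def toy_edge(
    docs: Iterable[str], taps: Optional[ToyTaps]
) -> tuple[list[str], dict[str, int]]:
    docs_t = tuple(docs)  # materialize once so counts and taps see the same data
    cleaned = _tap(tuple(d.strip() for d in docs_t), taps.cleaned if taps else None)
    chunks = list(dict.fromkeys(cleaned))  # order-preserving dedup over the full view
    obs = {"total_docs": len(docs_t), "total_chunks": len(chunks)}
    return chunks, obs


seen: list[tuple[str, ...]] = []
chunks, obs = toy_edge([" a", "b ", "a"], ToyTaps(cleaned=seen.append))
untapped_chunks, _ = toy_edge([" a", "b ", "a"], None)
```

Because the tap returns its input unchanged, removing it (passing `None`) cannot alter the chunks, which is the property that lets taps stay at the edge.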

4.4 Configurator Tie-In (M02C01) and Swapping Examples

from functools import partial
from funcpipe_rag import (
    Chunk,
    ChunkWithoutEmbedding,
    RagConfig,
    RagCoreDeps,
    RagEnv,
    RulesConfig,
    StartsWith,
    full_rag_api_docs,
    get_deps,
)

# Standard variant
standard_config = RagConfig(env=RagEnv(512))
standard_deps = get_deps(standard_config)
rag_fn = partial(full_rag_api_docs, config=standard_config, deps=standard_deps)

# Swapping config: Filter to CS docs
cs_config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=StartsWith("categories", "cs.")),
)
cs_deps = get_deps(cs_config)
cs_rag_fn = partial(full_rag_api_docs, config=cs_config, deps=cs_deps)


# Swapping deps: Fake embedder for tests (no I/O)
def fake_embed(c: ChunkWithoutEmbedding) -> Chunk:
    return Chunk(c.doc_id, c.text, c.start, c.end, (0.0,) * 16)  # Mock embedding


test_deps = RagCoreDeps(cleaner=standard_deps.cleaner, embedder=fake_embed, taps=None)
test_rag_fn = partial(full_rag_api_docs, config=standard_config, deps=test_deps)

Wins: Small arity enables easy partialization; config/deps allow clean swapping (e.g., rules via config, fakes via deps). Composes with M02C01 make_rag_fn.


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Substitute expressions in iter_rag_core.
1. Inline rule = config.keep → fixed predicate.
2. Substitute into generator expression → filtered stream.
3. Result: Output stream is fixed for fixed docs, config, deps.
Bug Hunt: In high_arity_rag, GLOBAL_ENV breaks substitution (replacing reference changes behavior).

Example:
- High-arity: chunk_size = GLOBAL_ENV.chunk_size → depends on mutable global, substitution fails.
- Friendly: config.env.chunk_size → immutable, substitutable, behavior preserved.
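
The hand exercise can be replayed in code on a toy version of the filter stage. `ToyKeepConfig`, `iter_kept`, and `is_cs` are hypothetical; the point is only that the function call and the hand-substituted expression agree when config is an immutable value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class ToyKeepConfig:  # hypothetical config holding a predicate
    keep: Callable[[str], bool]


def iter_kept(docs: Iterable[str], config: ToyKeepConfig) -> Iterator[str]:
    rule = config.keep
    return (d for d in docs if rule(d))


def is_cs(category: str) -> bool:
    return category.startswith("cs.")


docs = ("cs.AI", "math.CO", "cs.LG")
config = ToyKeepConfig(keep=is_cs)

via_call = list(iter_kept(docs, config))

# Step 1: inline `rule = config.keep`, which is `is_cs` for this config.
# Step 2: substitute the body of `is_cs` into the generator expression.
by_hand = [d for d in docs if d.startswith("cs.")]
```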


6. Property-Based Testing: Proving Equivalence and Idempotence

Use Hypothesis to check that the refactor preserves baseline behavior and avoids global bugs. Property tests give strong evidence across many generated inputs rather than a formal proof.

6.1 Custom Strategy

The strategies used below, doc_list_strategy and env_strategy, come from tests/conftest.py.
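
The contents of tests/conftest.py are not reproduced here, so the following is only a sketch of what such strategies might look like, written against toy types. `ToyDoc`, `ToyEnv`, `toy_doc_list_strategy`, `toy_env_strategy`, and `toy_chunks` are all hypothetical, and the size bounds are arbitrary assumptions. It requires the `hypothesis` package.

```python
from dataclasses import dataclass

from hypothesis import given, settings
import hypothesis.strategies as st


@dataclass(frozen=True)
class ToyDoc:  # hypothetical stand-in for RawDoc
    doc_id: str
    abstract: str


@dataclass(frozen=True)
class ToyEnv:  # hypothetical stand-in for RagEnv
    chunk_size: int


def toy_doc_list_strategy() -> st.SearchStrategy:
    """Lists of small documents; the bounds are arbitrary choices."""
    doc = st.builds(
        ToyDoc,
        doc_id=st.text(min_size=1, max_size=8),
        abstract=st.text(max_size=200),
    )
    return st.lists(doc, max_size=10)


def toy_env_strategy() -> st.SearchStrategy:
    """Chunk sizes start at 1 so slicing always makes progress."""
    return st.builds(ToyEnv, chunk_size=st.integers(min_value=1, max_value=64))


def toy_chunks(docs: list, env: ToyEnv) -> list:
    """Split each abstract into consecutive slices of env.chunk_size."""
    return [
        d.abstract[i : i + env.chunk_size]
        for d in docs
        for i in range(0, len(d.abstract), env.chunk_size)
    ]


@settings(max_examples=50)
@given(docs=toy_doc_list_strategy(), env=toy_env_strategy())
def test_chunks_rebuild_abstracts(docs, env):
    # Property: concatenating the chunks gives back the concatenated abstracts.
    assert "".join(toy_chunks(docs, env)) == "".join(d.abstract for d in docs)
```

Keeping generated documents small makes shrunk counterexamples easier to read, and bounding chunk_size from below avoids generating configs the chunker cannot handle.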

6.2 Equivalence Property (Core vs Baseline)

# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    clean_doc,
    embed_chunk,
    get_deps,
    iter_chunk_doc,
    RagConfig,
    structural_dedup_chunks,
    full_rag_api_docs,
    iter_rag_core,
)
from tests.conftest import doc_list_strategy, env_strategy
from itertools import islice, tee

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))  # Two independent iterators over the same docs
    baseline = baseline_full_rag(list(docs1), env)
    chunks, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks == baseline

Note: Tests chunk equivalence to a baseline built from the pure stages.
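
The `tee` call in these tests matters once the input is a one-shot iterator rather than a list. A short sketch, with a hypothetical `count_items` helper standing in for a pipeline that consumes its input:

```python
from itertools import tee
from typing import Iterable


def count_items(items: Iterable[str]) -> int:
    """Consume the iterable and count its elements."""
    return sum(1 for _ in items)


one_shot = iter(["a", "b", "c"])
first = count_items(one_shot)
second = count_items(one_shot)  # the iterator is already exhausted

left, right = tee(iter(["a", "b", "c"]))
left_count = count_items(left)
right_count = count_items(right)
```

Without `tee`, the second pipeline under comparison would see an empty input and the property would fail for a reason unrelated to the code being tested.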

6.3 Prefix Equivalence (Streaming Core)

@given(docs=doc_list_strategy(), env=env_strategy(), k=st.integers(0, 50))
def test_core_prefix_equivalence(docs, env, k):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    baseline = baseline_full_rag(list(docs1), env)
    core_iter = iter_rag_core(docs2, config, deps)
    assert list(islice(core_iter, k)) == baseline[:k]

Note: Verifies streaming core matches the baseline on finite prefixes (M02C03 tie-in).

6.4 Idempotence Property

@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_idempotence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = full_rag_api_docs(docs1, config, deps)
    chunks2, _ = full_rag_api_docs(docs2, config, deps)
    assert chunks1 == chunks2

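Note: Verifies that repeated calls with the same inputs yield the same outputs (strictly, this is determinism across calls, which this series labels idempotence), catching global mutation bugs.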

6.5 Shrinking Demo: Catching a Global Bug

Bad refactor with global mutation:

from funcpipe_rag import RawDoc, CleanDoc, Chunk, ChunkWithoutEmbedding, RagEnv
from funcpipe_rag import Observations, RagConfig, RagCoreDeps, eval_pred, full_rag_api_docs
from funcpipe_rag import gen_chunk_doc, structural_dedup_chunks
from collections.abc import Iterator, Iterable, Callable


def bad_full_rag_api(
        docs: Iterable[RawDoc],
        config: RagConfig,
        deps: RagCoreDeps
) -> tuple[list[Chunk], Observations]:
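    # Reuse the same GLOBAL_ENV from the high_arity_rag anti-pattern
    global GLOBAL_ENV
    # Bug: the new env is derived from the previous global value, so each call
    # sees a different chunk size. (Deriving it from config alone would be
    # deterministic, and the idempotence property would pass.)
    GLOBAL_ENV = RagEnv(GLOBAL_ENV.chunk_size + 1)  # Mutates global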
    docs_list = list(docs)
    kept_docs = [d for d in docs_list if eval_pred(d, config.keep.keep_pred)]
    cleaned = [deps.cleaner(d) for d in kept_docs]
    chunks_iter = (deps.embedder(c) for cd in cleaned for c in gen_chunk_doc(cd, GLOBAL_ENV))
    chunks = list(chunks_iter)
    chunks = structural_dedup_chunks(chunks)
    obs = Observations(total_docs=len(docs_list), total_chunks=len(chunks), kept_docs=len(kept_docs), cleaned_docs=len(cleaned))
    return chunks, obs

Property testing the bad version:

@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_rag_idempotence(docs, env):
    global GLOBAL_ENV
    GLOBAL_ENV = env
    config = RagConfig(env=env)
    deps = RagCoreDeps(cleaner=clean_doc, embedder=embed_chunk, taps=None)
    docs1, docs2 = tee(iter(docs))
    chunks1, _ = bad_full_rag_api(docs1, config, deps)
    chunks2, _ = bad_full_rag_api(docs2, config, deps)
    assert chunks1 == chunks2

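Failure Trace (illustrative; the values Hypothesis reports depend on the strategies in tests/conftest.py):

Falsifying example: test_bad_rag_idempotence(
    docs=[RawDoc(doc_id='a', title='t', abstract='abc', categories='c')],
    env=RagEnv(chunk_size=128),
)
AssertionError

Analysis: Hypothesis shrinks toward a minimal document on which the two calls chunk differently because GLOBAL_ENV changed between them. The difference is only visible when the text is long enough for the chunk size to matter, so a real run may report different values from those shown.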
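
The mechanism behind such a failure can be reproduced without Hypothesis or the RAG package. In this sketch, `ToyEnv`, `bad_chunks`, and `good_chunks` are hypothetical; `bad_chunks` derives each call's chunk size from the previous global value, while `good_chunks` takes the env as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEnv:  # hypothetical stand-in for RagEnv
    chunk_size: int


GLOBAL_ENV = ToyEnv(1)


def bad_chunks(text: str) -> list[str]:
    global GLOBAL_ENV
    # Bug: each call derives the next env from the previous global value.
    GLOBAL_ENV = ToyEnv(GLOBAL_ENV.chunk_size + 1)
    size = GLOBAL_ENV.chunk_size
    return [text[i : i + size] for i in range(0, len(text), size)]


def good_chunks(text: str, env: ToyEnv) -> list[str]:
    size = env.chunk_size
    return [text[i : i + size] for i in range(0, len(text), size)]


first = bad_chunks("abcdef")    # chunk size 2
second = bad_chunks("abcdef")   # chunk size 3
short_first = bad_chunks("ab")  # chunk size 4, text shorter than a chunk
short_second = bad_chunks("ab") # chunk size 5, same single chunk
stable = good_chunks("abcdef", ToyEnv(2)) == good_chunks("abcdef", ToyEnv(2))
```

The short input hides the bug because both calls return a single chunk, which is why a falsifying example needs text longer than the chunk sizes involved.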


7. When FP-Friendly APIs Aren't Worth It

Use higher arity or globals only in:
- Legacy adapters (e.g., framework callbacks requiring fixed signatures).
- One-off scripts with no reuse.
Guardrails: Wrap such functions in thin adapters calling small-arity cores to isolate complexity.

Example:

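# Legacy adapter: keeps the old high-arity signature and delegates to the small-arity API
from funcpipe_rag import RagConfig, RagCoreDeps, RagEnv, embed_chunk, full_rag_api_docs


def legacy_rag(docs, chunk_size, cleaner, keep, debug):
    # `keep` is passed through unchanged, so it must be what RagConfig expects (a RulesConfig).
    # `debug` is accepted only for signature compatibility and is ignored.
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagCoreDeps(cleaner=cleaner, embedder=embed_chunk, taps=None)
    return full_rag_api_docs(docs, config, deps)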

8. Pre-Core Quiz

  1. Why does def f(a, b, c, d, e) violate FP-friendly design?
    Answer: Arity >3, hard to partialize or compose.
  2. How do you fix a function that uses GLOBAL_DB?
    Answer: Inject it as a dependency in deps.
  3. What’s wrong with def rag(docs, cleaner, env, keep, taps)?
    Answer: High arity (5); group env and keep into config, and cleaner and taps into deps.
  4. Why use RagConfig and RagCoreDeps structs?
    Answer: Encapsulate domain settings and services, reduce arity, clarify intent.
  5. Which tool checks that a refactor preserves behavior?
    Answer: Hypothesis (equivalence and idempotence properties).

9. Post-Core Reflection & Exercise

Reflect: Find a function in your codebase with >3 params or hidden globals. Refactor it to use inputs, config, deps with arity ≤3. Add Hypothesis tests for equivalence and idempotence.
Project Exercise: Apply to RAG pipeline; run properties on arxiv_cs_abstracts_10k.csv.
- Did composability improve (easier partials)?
- Did tests catch global bugs?
- Did config/deps clarify domain logic?

Next: Core 5 – Boundary Design (Isolating I/O to Edges).

Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.