Module 2: First-Class Functions and Expressive Python
Progression Note
By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |
M02C09 – Debugging FP Code (Naming, Probing, and Tracing Intermediate Steps in Compositions)
Core question:
How do you debug pure, composed FP pipelines without scattering prints or breaking laziness—so "it works in my head but not in prod" becomes structured traces, reproducible probes, and self-documenting names that turn black-box flows into auditable glass?
This core introduces debugging FP code in Python:
- Treat debugging as explicit, sealed stages with meaningful names for self-documentation.
- Use tee for non-destructive tracing (effects isolated here).
- Use probe for pure assertions or boundary checks.
- Leverage structured logs and decision traces (from M02C08).
- Use Hypothesis to shrink failures to minimal counterexamples, and its verbose mode to show the examples it tries.
- No print in core transforms, no mutable debug flags, no unbounded eager materialization.
We extend the running project from m02-rag.md—the FuncPipe RAG Builder—evolving from opaque pipelines to debuggable ones with named stages, tees, and probes that preserve baseline equivalence.
Audience: Developers from M02C08 with DSL-driven pipelines but still debugging via prints or breakpoints, losing laziness and reproducibility.
Outcome:
1. Spot debug smells (prints in core, mutable flags, eager list()) and explain their impact on purity.
2. Refactor an opaque pipeline to include named functions, tee traces, and probe assertions.
3. Write Hypothesis properties with verbose tracing to debug failures, including a shrinking example.
1. Conceptual Foundation
1.1 Debugging FP Code in One Precise Sentence
Debugging FP code treats naming, probing (tee/probe stages), and tracing (structured logs + Hypothesis) as explicit, composable operations—so pipelines remain lazy and reproducible while revealing every intermediate step.
1.2 The One-Sentence Rule
Never smuggle ad-hoc prints or breakpoints into core transforms; isolate debug effects in explicit tee/probe stages that are switched on by boundary config, so debug behavior travels as sealed data.
1.3 Why This Matters Now
M02C08 gave us data-driven DSLs for rules, but opaque pipelines still hide bugs. Treating debugging as explicit stages makes pipelines auditable, building on M02C03 laziness, M02C05 boundaries, and M02C07 combinators for production-ready transparency.
1.4 Debugging as Values in 5 Lines
Because debug stages are first-class values, tracing can be configured dynamically:
```python
import json
import logging
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))  # Structured, truncated
            yield x
    return tracer
```
Because tee(stage) returns an ordinary function, tee stages can be stored in dicts, composed like any other function (M02C01), and bound with partial if needed. Tracing stays lazy, explicit, and configurable.
Note: Use structured JSON logs in production; all logging effects stay sealed in tee. Keep the trace_* flags off by default and enable them only for targeted runs. tee logs every element, so it is intentionally heavyweight and should not stay permanently hot in tight loops. Adapt the log shape (e.g., repr(x)[:100]) to your domain: repr can be large or noisy, and it is computed in full before the slice truncates it.
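The following minimal, self-contained sketch makes the laziness and "stages are values" claims concrete. It re-declares tee and identity locally and picks one from a dict keyed by a hypothetical trace_enabled flag. It also captures log records with an illustrative in-memory ListHandler. The handler and the "tee_demo" logger name are demo assumptions, not part of funcpipe_rag.

```python
import json
import logging
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
log = logging.getLogger("tee_demo")  # illustrative logger name

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))
            yield x
    return tracer

def identity(xs: Iterable[T]) -> Iterator[T]:
    yield from xs

class ListHandler(logging.Handler):
    """Collects log messages in memory so the demo can inspect them."""
    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())

handler = ListHandler()
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep demo output off the root logger

# Stages are plain values: select one by a (hypothetical) trace flag.
stages = {True: tee("nums"), False: identity}
trace_enabled = True

traced = stages[trace_enabled](iter([1, 2, 3]))
logged_before_pull = len(handler.messages)  # generator has not started yet
first = next(traced)                        # pulls one element, logs one line
logged_after_one = len(handler.messages)
rest = list(traced)                         # drains the remaining elements
```

Nothing is logged until an element is pulled, and each pulled element produces exactly one JSON line. Swapping in identity keeps the values and logs nothing.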
2. Mental Model: Print Hell vs Explicit Stages
2.1 One Picture
```text
Print Hell (Impure)             Explicit Stages (Sealed)
+--------------------------+    +--------------------------------------------+
| def rag(gen):            |    | flow(                                      |
|     for x in gen:        |    |     producer,                              |
|         print(x)         |    |     tee("input"),                          |
|         yield process(x) |    |     named("process", process),             |
+--------------------------+    |     probe("valid_chunk", assert_is_chunk), |
 ↑ Breaks Purity/Laziness       |     tee("output")                          |
                                | )()                                        |
                                +--------------------------------------------+
                                 ↑ Lazy, Auditable
```
2.2 Contract Table
| Aspect | Print Hell | Explicit Stages |
|---|---|---|
| Purity | Leaky effects | Sealed in tee |
| Laziness | Often eager | Yields through |
| Readability | Scattered prints | Named + probed stages |
| Reproducibility | Manual runs | Hypothesis verbose |
| Configurability | Global flags | Boundary config + identity |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |
Note on Print Choice: Use prints only in trivial scripts; always prefer explicit stages for pipelines.
3. Running Project: FuncPipe RAG Builder
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Add debugging to combinator pipelines without breaking laziness.
- Start: Opaque version (core9_start.py).
- End: Debuggable pipeline with named stages, tees, and probes.
3.1 Types (Canonical, Used Throughout)
Extend the canonical types with a debug config. The full, runnable definitions (with imports) ship in the funcpipe_rag package.
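The following is a rough sketch of the DebugConfig shape that later examples rely on. It assumes a frozen dataclass whose three flags (trace_docs, trace_chunks, probe_chunks, all used in this core) default to off. The real class in funcpipe_rag may carry more fields.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class DebugConfig:
    """Sketch only: granular debug switches, all off by default."""
    trace_docs: bool = False    # tee the raw docs right after the producer
    trace_chunks: bool = False  # tee chunks as they are produced
    probe_chunks: bool = False  # run invariant checks on chunks

base = DebugConfig()
# "Changing" a frozen config means building a new value; base is untouched.
tracing = replace(base, trace_docs=True, trace_chunks=True)

try:
    base.trace_docs = True  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the config keeps it from drifting mid-run. It also makes the config safe to share as a default or bind with partial.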
3.2 Flow Contract (Recall from M02C07)
Recall the flow contract from M02C07: the producer starts the stream, and each stage transforms the previous iterable lazily. The Protocol types only approximate this structure; in practice, check them with mypy.
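```python
from collections.abc import Callable, Iterable
from typing import Any, Protocol, TypeVar

T = TypeVar("T")
# Protocol type parameters need explicit variance for mypy:
# producers only emit values (covariant); stages consume T and emit U.
T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)
U_co = TypeVar("U_co", covariant=True)

class Producer(Protocol[T_co]):
    def __call__(self) -> Iterable[T_co]: ...

class Stage(Protocol[T_contra, U_co]):
    def __call__(self, xs: Iterable[T_contra]) -> Iterable[U_co]: ...

def flow(prod: Producer[T], *stages: Stage[Any, Any]) -> Callable[[], Iterable[Any]]:
    """Return a thunk that builds the chain when called; with generator stages, work happens only as the result is iterated."""
    def run() -> Iterable[Any]:
        data: Iterable[Any] = prod()
        for s in stages:
            data = s(data)  # chaining is lazy as long as each stage is a generator
        return data
    return run
```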
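The following short usage sketch exercises this contract with a toy producer and two generator stages. It re-declares flow with plain Callable types; double and keep_gt_4 are illustrative names. Building the flow runs nothing, and elements are pulled through all stages one at a time. The events list exists only to make evaluation order visible.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Any

def flow(prod: Callable[[], Iterable[Any]], *stages: Callable[[Iterable[Any]], Iterable[Any]]) -> Callable[[], Iterable[Any]]:
    # Same shape as the contract above, with plain Callable types for brevity.
    def run() -> Iterable[Any]:
        data = prod()
        for s in stages:
            data = s(data)
        return data
    return run

events: list[str] = []  # demo-only log of evaluation order

def producer() -> Iterator[int]:
    for n in range(1, 5):
        events.append(f"produce {n}")
        yield n

def double(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        events.append(f"double {x}")
        yield 2 * x

def keep_gt_4(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        if x > 4:
            yield x

pipeline = flow(producer, double, keep_gt_4)
stream = iter(pipeline())            # run() only chains generators
events_after_build = list(events)    # still empty: nothing has been pulled
first = next(stream)                 # pulls 1, 2, 3 through the chain until 6 passes
events_after_first = list(events)
rest = list(stream)
```

Producing and doubling interleave element by element, which is why a tee inserted anywhere in the chain observes values in the same lazy order.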
3.3 Opaque Start (Anti-Pattern)
```python
from funcpipe_rag import (
    DebugConfig,
    Observations,
    RagConfig,
    RagEnv,
    eval_pred,
    ffilter,
    flatmap,
    flow,
    fmap,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)

def opaque_run_core_on_docs(docs, chunk_size):
    config = RagConfig(env=RagEnv(chunk_size), debug=DebugConfig())
    deps = get_deps(config)
    pipeline = flow(
        lambda: docs,
        ffilter(lambda d: eval_pred(d, config.keep.keep_pred)),
        fmap(deps.cleaner),
        flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
        fmap(deps.embedder),
    )
    chunks = structural_dedup_chunks(pipeline())
    obs = Observations(total_docs=len(docs), total_chunks=len(chunks))
    return chunks, obs
```
Smells:
- Anonymous lambdas (e.g., lambda cd: gen_chunk_doc).
- No traces or probes.
- Hard to audit decisions.
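Naming stages fixes the first smell. The diagram and quiz in this core use a named("name", fn) helper that is never defined here. One hypothetical way to write it assumes its only job is to give a callable a readable __name__ for logs and instrumentation:

```python
import functools
from collections.abc import Callable
from typing import Any, TypeVar

R = TypeVar("R")

def named(name: str, fn: Callable[..., R]) -> Callable[..., R]:
    """Hypothetical helper: return a wrapper of fn whose __name__ is `name`.

    fn itself is left untouched; only the new wrapper carries the name.
    """
    @functools.wraps(fn)  # copies metadata and sets wrapper.__wrapped__ = fn
    def wrapper(*args: Any, **kwargs: Any) -> R:
        return fn(*args, **kwargs)

    wrapper.__name__ = name
    wrapper.__qualname__ = name
    return wrapper

clean = named("clean_text", lambda s: s.strip())
```

Wrapping the opaque version's lambdas this way gives tee and instrument_stage a meaningful name to log. A plain def with a descriptive name works equally well.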
4. Refactor to Debuggable: Naming + Tee + Probe
4.1 Debug Primitives (Explicit Stages)
```python
import json
import logging
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)

def tee(stage: str) -> Callable[[Iterable[T]], Iterator[T]]:
    """Log each element as structured JSON, then pass it through unchanged."""
    def tracer(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))  # Structured, truncated
            yield x
    return tracer

def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    """Run check_fn on each element; re-raise assertion failures tagged with the stage."""
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def identity(xs: Iterable[T]) -> Iterator[T]:
    """No-op stage: stands in for tee/probe when debugging is off."""
    yield from xs
```
Properties:
- Tee: Traces lazily, seals logs.
- Probe: Asserts lazily, raises on failure with stage.
- Identity: No-op for conditional debug.
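The following quick sketch shows probe's two properties, using a local copy of probe and a hypothetical check_positive invariant. The check runs only as elements are pulled, and a failure message names the stage.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")

def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def check_positive(x: int) -> None:  # hypothetical invariant for the demo
    assert x > 0, f"expected positive, got {x}"

checked = probe("positives", check_positive)(iter([3, 1, -2, 5]))
ok_prefix = [next(checked), next(checked)]  # the first two pass the check

try:
    next(checked)  # the third element violates the invariant
    error_message = ""
except AssertionError as e:
    error_message = str(e)
```

Python skips assert statements under `python -O`. If probes must stay active in optimized runs, a check function that raises AssertionError explicitly is the safer choice.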
Note: Tap functions such as tee must only observe. They may log, but they must never mutate or influence the data flowing through; a callback like lambda x: log.info(repr(x)) reads x and returns nothing. That keeps the data flow referentially transparent. tee and probe are concrete implementations of the tap idea from earlier cores: observe without altering.
4.2 Instrumentation Wrapper (Higher-Level)
```python
from funcpipe_rag import StageInstrumentation, instrument_stage

# Wrap an iterable stage with optional tracing/probing.
# `stage` is any existing stage function and `check_fn` a probe check (placeholders here).
wrapped = instrument_stage(
    stage,
    stage_name="stage_name",
    instrumentation=StageInstrumentation(trace=True, probe_fn=check_fn),
)
```
Properties:
- Safe: Uses getattr for name, no mutation.
- Composes: Adds trace/probe without rewriting.
- Lazy: Yields through.
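The real instrument_stage and StageInstrumentation live in funcpipe_rag and are not shown here. The following hypothetical sketch is consistent with the listed properties: getattr for the name, no mutation of the wrapped stage, and lazy pass-through. Its fields and signature are assumptions inferred from the call above, not the package's actual code.

```python
from __future__ import annotations

import json
import logging
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Any

log = logging.getLogger(__name__)

@dataclass(frozen=True)
class StageInstrumentation:
    """Hypothetical shape: what to add around a stage."""
    trace: bool = False
    probe_fn: Callable[[Any], None] | None = None

def instrument_stage(
    stage: Callable[[Iterable[Any]], Iterable[Any]],
    *,
    stage_name: str | None = None,
    instrumentation: StageInstrumentation = StageInstrumentation(),  # frozen, so a shared default is safe
) -> Callable[[Iterable[Any]], Iterator[Any]]:
    # Read the name without touching the wrapped stage.
    name = stage_name or getattr(stage, "__name__", "stage")

    def wrapped(xs: Iterable[Any]) -> Iterator[Any]:
        for x in stage(xs):  # observe the stage's outputs, one at a time
            if instrumentation.trace:
                log.info(json.dumps({"stage": name, "value": repr(x)[:100]}))
            if instrumentation.probe_fn is not None:
                try:
                    instrumentation.probe_fn(x)
                except AssertionError as e:
                    raise AssertionError(f"{name}: {e}") from e
            yield x

    wrapped.__name__ = f"instrumented_{name}"  # names the new wrapper, not the stage
    return wrapped

def double(xs: Iterable[int]) -> Iterator[int]:
    for x in xs:
        yield 2 * x

def check_below_5(x: int) -> None:
    assert x < 5, f"{x} >= 5"

probed = instrument_stage(double, instrumentation=StageInstrumentation(probe_fn=check_below_5))
try:
    list(probed([1, 2, 3]))  # 2 and 4 pass; 6 fails the probe
    failure = ""
except AssertionError as e:
    failure = str(e)

traced = instrument_stage(double, stage_name="doubled", instrumentation=StageInstrumentation(trace=True))
traced_out = list(traced([1, 2, 3]))
```

Because the wrapper observes the stage's outputs, wrapping a stage never requires rewriting it.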
4.2.1 Observability Without Breaking Laziness
To observe intermediates without materializing, use tee or probe in the flow. Here's a table of combinators for observability:
| Combinator | Use Case | Example |
|---|---|---|
| tee | Lazy tracing (logs) | tee("docs") |
| probe | Lazy assertions | probe("chunks", check_chunk) |
| identity | Conditional no-op | identity if not debug else tee |
| instrument_stage | Wrap existing stages with trace/probe | instrument_stage(ffilter(...), instrumentation=StageInstrumentation(trace=True)) |
Property test for tee transparency:
```python
from unittest.mock import patch

import hypothesis.strategies as st
from hypothesis import given

# Uses `log` and `tee` from the debug primitives in 4.1.

@given(xs=st.lists(st.integers()))
def test_tee_transparent(xs):
    # patch.object swaps log.info for a mock and restores it even if an assertion fails
    with patch.object(log, "info") as mock_info:
        out = list(tee("test")(iter(xs)))
    assert mock_info.call_count == len(xs)  # one log call per element (zero for [])
    assert out == xs  # tee doesn't alter values or order
```
4.3 Refactored Core (Debuggable Pipeline)
```python
from funcpipe_rag import DebugConfig, RagConfig, RagEnv, get_deps, iter_rag_core, structural_dedup_chunks

config = RagConfig(
    env=RagEnv(512),
    debug=DebugConfig(trace_docs=True, trace_chunks=True, probe_chunks=True),
)
deps = get_deps(config)
# `iter_rag_core` already wires `instrument_stage(...)` based on `config.debug`.
chunks = structural_dedup_chunks(iter_rag_core(docs, config, deps))
```
Properties:
- Conditional: Via config.debug (granular).
- Lazy: All stages yield through.
- Auditable: Structured logs carry stage names, and probes raise with context. instrument_stage traces and probes each stage's outputs; to trace the initial docs, insert a tee("docs") immediately after the producer.
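iter_rag_core's real wiring is not shown in this core. The following minimal sketch shows what granular, config-driven wiring might look like. It assumes a toy Debug dataclass, a toy word chunker, and a build_stages helper, all hypothetical. Each disabled flag falls back to identity, and the docs tee sits immediately after the producer.

```python
import json
import logging
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Any

log = logging.getLogger(__name__)
Stage = Callable[[Iterable[Any]], Iterable[Any]]

def tee(stage: str) -> Stage:
    def tracer(xs: Iterable[Any]) -> Iterator[Any]:
        for x in xs:
            log.info(json.dumps({"stage": stage, "value": repr(x)[:100]}))
            yield x
    tracer.__name__ = f"tee_{stage}"  # readable name on the freshly built tracer
    return tracer

def identity(xs: Iterable[Any]) -> Iterator[Any]:
    yield from xs

@dataclass(frozen=True)
class Debug:  # hypothetical stand-in for DebugConfig
    trace_docs: bool = False
    trace_chunks: bool = False

def to_chunks(docs: Iterable[str]) -> Iterator[str]:
    for doc in docs:
        yield from doc.split()  # toy chunker: one chunk per word

def build_stages(debug: Debug) -> list[Stage]:
    # Each flag picks tee or identity; the docs tee sits right after the producer.
    return [
        tee("docs") if debug.trace_docs else identity,
        to_chunks,
        tee("chunks") if debug.trace_chunks else identity,
    ]

def run(docs: list[str], debug: Debug) -> list[str]:
    data: Iterable[Any] = iter(docs)  # producer
    for stage in build_stages(debug):
        data = stage(data)
    return list(data)  # materialize only at the edge

stage_names = [s.__name__ for s in build_stages(Debug(trace_docs=True))]
```

Because the flags only choose between tee and identity, the values produced are the same for every config. Only the logs differ.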
4.4 Public API (Unchanged from M02C05–M02C08)
```python
from funcpipe_rag import full_rag_api_docs, full_rag_api_path, get_deps

chunks, obs = full_rag_api_docs(docs, config, get_deps(config))
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)
```
Properties:
- Keeps Result; boundaries unchanged.
4.5 Configurator Tie-In (M02C01)
```python
from funcpipe_rag import DebugConfig, make_rag_fn

debug_rag_fn = make_rag_fn(
    chunk_size=512,
    debug=DebugConfig(trace_docs=True, probe_chunks=True),
)
```
Wins: Debug config flows like data; composes with partial.
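make_rag_fn's internals are not shown here. The underlying idea can be sketched with functools.partial over a hypothetical run_rag(docs, *, chunk_size, debug) toy pipeline. Because the frozen config is an ordinary value, binding it once yields a reusable, preconfigured runner.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from functools import partial

@dataclass(frozen=True)
class Debug:  # hypothetical stand-in for DebugConfig
    probe_chunks: bool = False

def run_rag(docs: Iterable[str], *, chunk_size: int, debug: Debug) -> Iterator[str]:
    """Toy pipeline: split each doc into fixed-size character chunks, lazily."""
    for doc in docs:
        for start in range(0, len(doc), chunk_size):
            chunk = doc[start:start + chunk_size]
            if debug.probe_chunks:  # probe only when the config asks for it
                assert 0 < len(chunk) <= chunk_size, f"bad chunk length {len(chunk)}"
            yield chunk

# Bind configuration once; the result is an ordinary, reusable function value.
debug_rag_fn = partial(run_rag, chunk_size=4, debug=Debug(probe_chunks=True))
plain_rag_fn = partial(run_rag, chunk_size=4, debug=Debug())
```

The bound config is inspectable (debug_rag_fn.keywords). Swapping debug settings means building a new partial, never flipping a global flag.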
5. Equational Reasoning: Substitution Exercise
Hand Exercise: Substitute in tee/probe.
1. Inline tee("docs") if ... else identity → fixed stage.
2. Substitute into flow → parametric trace.
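3. Result: For a fixed (immutable) config, the pipeline is a fixed composition, and trace effects stay sealed in tee.
Bug Hunt: The opaque version has no separate debug stages to substitute, so substitution cannot reveal its intermediates.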
Example:
- Opaque: ffilter(partial(eval_pred)) → black box.
- Debug: instrument_stage with trace/probe → auditable, substitutable.
6. Property-Based Testing: Proving Debug Behaviour
Use Hypothesis's verbose mode to see the examples behind a failure.
6.1 Custom Strategy
The doc_list_strategy and env_strategy used in the properties below are defined in tests/conftest.py.
6.2 Debug Equivalence Property (No Debug)
```python
# tests/test_rag_api.py (debug flags don't affect values)
from dataclasses import replace

from hypothesis import given

from funcpipe_rag import DebugConfig, RagConfig, get_deps, iter_rag_core
from tests.conftest import doc_list_strategy, env_strategy

@given(docs=doc_list_strategy(), env=env_strategy())
def test_debug_flags_do_not_change_values(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    out1 = list(iter_rag_core(docs, config, deps))
    debug_cfg = replace(
        config,
        debug=DebugConfig(trace_docs=True, trace_chunks=True, probe_chunks=True),
    )
    out2 = list(iter_rag_core(docs, debug_cfg, get_deps(debug_cfg)))
    assert out1 == out2
```
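Note: The property compares a run with all debug flags off against a run with every flag on; equal outputs show that instrumentation never changes values. Keep this test at normal verbosity, and switch to verbose settings only when you need to trace a failure.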
6.3 Probe Property (Invariants)
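```python
from hypothesis import Verbosity, given, settings

from funcpipe_rag import DebugConfig, RagConfig, RagEnv, get_deps, iter_rag_core
from tests.conftest import doc_list_strategy

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy())
def test_probe_invariants(docs):
    config = RagConfig(env=RagEnv(512), debug=DebugConfig(probe_chunks=True))
    deps = get_deps(config)
    list(iter_rag_core(docs, config, deps))  # Consume fully: probes raise on failure
```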
Note: With verbose settings, Hypothesis prints the examples it tries, so the input behind a failure is visible. The probe's concrete invariant is assert isinstance(x, ChunkWithoutEmbedding); the test passes iff no probe assertion is raised.
6.4 Idempotence with Trace
```python
from hypothesis import Verbosity, given, settings

from funcpipe_rag import DebugConfig, Ok, RagBoundaryDeps, RagConfig, full_rag_api_path, get_deps
from tests.conftest import doc_list_strategy, env_strategy

class FakeReader:
    """In-memory reader: returns the same docs for any path."""
    def __init__(self, docs):
        self._docs = docs

    def read_docs(self, path):
        _ = path  # the fake ignores the path
        return Ok(self._docs)

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_debug_idempotence(docs, env):
    config = RagConfig(env=env, debug=DebugConfig(trace_chunks=True))
    deps = RagBoundaryDeps(core=get_deps(config), reader=FakeReader(docs))
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2
```
Note: Equal Results across repeated runs with tracing on show that tracing leaves no hidden state behind.
6.5 Shrinking Demo: Catching a Bug
Bad probe with state (violates referential transparency):
```python
from collections.abc import Callable, Iterable, Iterator
from typing import Any, TypeVar

from funcpipe_rag import ChunkWithoutEmbedding

T = TypeVar("T")

def bad_probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    counter = 0  # hidden mutable state shared by every run of this stage

    def checker(xs: Iterable[T]) -> Iterator[T]:
        nonlocal counter
        for x in xs:
            counter += 1
            if counter % 2 == 0:  # only every second element is checked
                try:
                    check_fn(x)
                except AssertionError as e:
                    raise AssertionError(f"{stage}: {e}") from e
            yield x

    return checker

def check_chunk_without_embedding(x: Any) -> None:
    assert isinstance(x, ChunkWithoutEmbedding), "Invalid chunk type"
    assert x.start == 0, "Expected first chunk only (demo invariant)"
```
Property (intentionally failing example):
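```python
import hypothesis.strategies as st
from hypothesis import Verbosity, given, settings

from funcpipe_rag import (
    DebugConfig, RagConfig, RagEnv, eval_pred, ffilter, flatmap, flow, fmap, gen_chunk_doc, get_deps,
)
from tests.conftest import doc_list_strategy

# bad_probe and check_chunk_without_embedding come from the previous block.

@settings(verbosity=Verbosity.verbose)
@given(docs=doc_list_strategy(), chunk_size=st.integers(128, 1024))
def test_bad_debug(docs, chunk_size):
    config = RagConfig(env=RagEnv(chunk_size), debug=DebugConfig(probe_chunks=True))
    deps = get_deps(config)
    pipeline = flow(
        lambda: docs,
        ffilter(lambda d: eval_pred(d, config.keep.keep_pred)),
        fmap(deps.cleaner),
        flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
        bad_probe("chunks", check_chunk_without_embedding),
        fmap(deps.embedder),
    )
    list(pipeline())  # Consume to trigger probe
```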
Failure Trace (Example):
```text
Falsifying example: test_bad_debug(
    docs=[RawDoc(...), RawDoc(...)],  # Minimal pair triggering even/odd bug
    chunk_size=128,
)
AssertionError: chunks: Expected first chunk only (demo invariant)
```
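Analysis: bad_probe checks only even-numbered elements, so this fails whenever the 2nd, 4th, ... chunk of a run is not a document's first chunk, while every odd-numbered chunk goes unchecked. The point is not the invariant itself, which is deliberately too strict. Stateful probes make failures depend on an element's position in the stream (and, if the stage object is reused, on earlier runs) rather than on the value alone.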
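The dependence on hidden state can be reproduced without Hypothesis. The following sketch uses local copies of probe and bad_probe plus a hypothetical check_nonneg invariant, and applies the same stage object twice to the same one-element input. The pure probe fails both times. The stateful one passes the first run and fails the second, because its counter survives between calls.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")

def probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    def checker(xs: Iterable[T]) -> Iterator[T]:
        for x in xs:
            try:
                check_fn(x)
            except AssertionError as e:
                raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def bad_probe(stage: str, check_fn: Callable[[T], None]) -> Callable[[Iterable[T]], Iterator[T]]:
    counter = 0  # survives across every call of checker
    def checker(xs: Iterable[T]) -> Iterator[T]:
        nonlocal counter
        for x in xs:
            counter += 1
            if counter % 2 == 0:
                try:
                    check_fn(x)
                except AssertionError as e:
                    raise AssertionError(f"{stage}: {e}") from e
            yield x
    return checker

def check_nonneg(x: int) -> None:  # hypothetical invariant for the demo
    assert x >= 0, f"negative value {x}"

def outcome(stage: Callable[[Iterable[int]], Iterator[int]], xs: list[int]) -> str:
    """Run a stage to completion and report "ok" or the assertion message."""
    try:
        list(stage(xs))
        return "ok"
    except AssertionError as e:
        return str(e)

pure = probe("nums", check_nonneg)
stateful = bad_probe("nums", check_nonneg)
pure_runs = [outcome(pure, [-1]), outcome(pure, [-1])]
stateful_runs = [outcome(stateful, [-1]), outcome(stateful, [-1])]
```

Same stage, same input, different verdict: that is the referential-transparency violation in its smallest form.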
7. When Debugging Stages Aren't Worth It
Use prints only in:
- Trivial one-step scripts.
- Legacy wrappers around stages.
Guardrails: Isolate to <5 lines; prefer stages for pipelines.
Example:
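The following hypothetical one-step script illustrates the case (count_words is an illustrative name). A single print at the entry point is acceptable here because there is no pipeline to keep lazy or pure.

```python
import sys

def count_words(text: str) -> int:
    """Pure core: no prints here."""
    return len(text.split())

def main(argv: list[str]) -> None:
    # The only effect lives at the script's edge: one isolated print.
    print(count_words(" ".join(argv)))

if __name__ == "__main__":
    main(sys.argv[1:])
```

As soon as a second transform appears, the print should move out into a tee stage.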
8. Pre-Core Quiz
- print in mapper? → Purity violation.
- list(gen) to inspect? → tee(stage).
- Anonymous lambda? → named("name", fn).
- Global debug flag? → Config + identity.
- Shrink failures? → Hypothesis verbose.
9. Post-Core Reflection & Exercise
Reflect: Find an opaque pipeline in your own code. Refactor it with named, tee, and probe, then add Hypothesis properties that run with verbose output.
Project Exercise: Apply to RAG (e.g., trace decisions); run verbose properties.
- Did traces clarify bugs?
- Did probes catch invariants?
- Did verbose shrink failures?
Next: Core 10 – Refactoring Imperative Scripts into FP-Friendly Modules and APIs.
Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.