Module 2: First-Class Functions and Expressive Python¶
Progression Note¶
By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |
M02C02: Expression-Oriented Python – Comprehensions, Conditional Expressions, No Control Flags¶
Core question:
How do you replace statement-heavy imperative code (loops + flags + breaks) with expressions, comprehensions, and data-driven conditionals—so control flow becomes explicit, composable, and easy to reason about?
This core introduces the expression-oriented mindset in Python:
- Treat core logic as value-producing expressions, not sequences of mutations.
- Default to comprehensions, conditional expressions, and built-ins (any, all, next) for control flow.
- Eliminate mutable control flags from core logic; keep them only at trivial edges, if at all.
We continue the running project from m02-rag.md, extending the FuncPipe RAG Builder:
- Baseline: a composition of the pure stages (clean → chunk → embed → dedup).
- Module 2 (Core 1): make_rag_fn(...) – closure-based configurators.
- This core: Replace imperative loops in the RAG core with expression-oriented code that is easier to configure, test, and prove equivalent to the baseline.
Audience: Developers who understand purity and configurators (Core 1) but still write loops driven by mutable flags and manual break statements.
Outcome:
- Spot control flags (found, valid, done) and explain why they obscure logic.
- Refactor a 10–20 line loop into comprehensions / any / next while preserving semantics.
- Write a Hypothesis property that proves equivalence to the baseline and exposes a real flag-based bug.
Runnability Note (Module 01 Snapshot vs Module 02 End-State)¶
Some “before” snippets in this core are hypothetical pre-refactor examples used for contrast. They are labeled accordingly and are not meant to exactly match a real snapshot. We refactor these shapes into the real Module 02 API as the module progresses.
For a real, runnable Module 01 codebase, use the module-01 tag worktree:
make worktrees
- Module 01 path: history/worktrees/module-01/
- Import path for Module 01: history/worktrees/module-01/src/
1. Conceptual Foundation¶
1.1 Expression-Oriented Python in One Precise Sentence¶
Expression-oriented programming treats control flow as compositions of value-producing expressions instead of stepwise mutation—so code reads as “data -> data” rather than “state -> state”.
1.2 The One-Sentence Rule¶
In core logic, do not use mutable flags (found, valid, done) or manual break/continue for control; use comprehensions, conditional expressions, and built-ins that return values. Flags and break may be acceptable inside encapsulated low-level helpers with pure signatures.
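As a minimal sketch of the rule, the block below contrasts a flag-based validity check with its all() equivalent, then feeds the result into a conditional expression. The names all_valid_flag, all_valid_expr, and label are hypothetical and exist only for this illustration.

```python
from collections.abc import Callable, Iterable


def all_valid_flag(xs: Iterable[int], ok: Callable[[int], bool]) -> bool:
    """Flag-based version: the answer lives in a mutable variable."""
    valid = True
    for x in xs:
        if not ok(x):
            valid = False
            break
    return valid


def all_valid_expr(xs: Iterable[int], ok: Callable[[int], bool]) -> bool:
    """Expression version: all() stops at the first failing element."""
    return all(ok(x) for x in xs)


def label(xs: list[int]) -> str:
    """A conditional expression replaces an if/else statement block."""
    return "accept" if all_valid_expr(xs, lambda x: x >= 0) else "reject"
```

Both versions agree on every input, including the empty list, where all() returns True just as the untouched flag does.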
1.3 Why This Matters Now¶
Core 1 gave you pure functions and closure-based configurators:
- make_rag_fn(...) -> Callable[[list[RawDoc]], tuple[list[Chunk], Observations]] is pure and deterministic.
- But the implementation of the RAG core can still be imperative:
  - Loops with flags, early breaks, scattered if blocks.
  - Harder to reason about, harder to transform, and easier to subtly break when adding new behaviors.
Expression-oriented code:
- Turns “do this, then maybe that” into “compute this value, then transform it”.
- Makes pipelines equational: each step is an expression you can substitute and test in isolation.
- Aligns perfectly with Core 1’s closure-based configurators: you configure expressions, not control-flow spaghetti.
Core 1 configures what RAG function we call (make_rag_fn); Core 2 refactors how that function is implemented internally (full_rag_api expressed as comprehensions instead of flags).
1.4 Expressions as Values in 5 Lines¶
We start with a simple, RAG-flavored predicate table:
from collections.abc import Callable

from funcpipe_rag import RawDoc

def has_long_abstract(d: RawDoc) -> bool:
    return len(d.abstract) >= 100

def is_cs_category(d: RawDoc) -> bool:
    return d.categories.startswith("cs.")

DocPred = Callable[[RawDoc], bool]

predicates: dict[str, DocPred] = {
    "long_abstract": has_long_abstract,
    "cs_only": is_cs_category,
}

def filter_docs(key: str, docs: list[RawDoc]) -> list[RawDoc]:
    return [d for d in docs if predicates[key](d)]
The key point:
- filter_docs is a single expression ([...]) mapping docs to docs.
- Control flow (“if this doc satisfies predicate P, keep it”) is encoded as data: predicates[key].
No flags, no break; everything is composable and easy to test.
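To make the "control flow as data" idea concrete without the project package, here is a self-contained sketch. Doc is a hypothetical stand-in for RawDoc with only the fields the predicates need, and filter_docs_all is an assumed extension that combines several named predicates with all().

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:  # hypothetical stand-in for RawDoc
    doc_id: str
    abstract: str
    categories: str


DocPred = Callable[[Doc], bool]

PREDICATES: dict[str, DocPred] = {
    "long_abstract": lambda d: len(d.abstract) >= 100,
    "cs_only": lambda d: d.categories.startswith("cs."),
}


def filter_docs_all(keys: list[str], docs: list[Doc]) -> list[Doc]:
    """Keep docs that satisfy every named predicate (no keys keeps all docs)."""
    return [d for d in docs if all(PREDICATES[k](d) for k in keys)]


docs = [
    Doc("a", "x" * 120, "cs.AI"),
    Doc("b", "short", "cs.LG"),
    Doc("c", "y" * 150, "math.CO"),
]
kept = filter_docs_all(["long_abstract", "cs_only"], docs)
```

Adding a new filtering behavior means adding a dictionary entry; the comprehension itself never changes.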
2. Mental Model: Imperative Flags vs Expressions¶
2.1 One Picture¶
Imperative Flags (Mutable) Expression-Oriented (Pure)
+-----------------------+ +------------------------------+
| found = False | | found = any(pred(x) |
| for x in xs: | | for x in xs) |
| if pred(x): | | |
| found = True | | # Single expression |
| break | | # No flags, no break |
+-----------------------+ +------------------------------+
↑ Scattered control ↑ Control is data
↑ Subtle state coupling ↑ Easy to compose / test
2.2 Contract Table¶
| Aspect | Imperative Flags | Expression-Oriented |
|---|---|---|
| Dependencies | Hidden in loop structure | Explicit in predicates and expressions |
| Control Flow | Flags + break/continue | Comprehensions, any/all/next, ternaries |
| Reasoning | Global: “what happens to found?” | Local: “what does this expression compute?” |
| Refactoring | Easy to introduce non-local bugs | Equational: refactor expression ↔ expression |
| Testing | Need to inspect loop behavior | Test expressions as pure functions |
# Imperative: flag + break to get first matching doc
first_long = None
for d in docs:
    if has_long_abstract(d):
        first_long = d
        break

# Expression-oriented: next() with default
first_long = next(
    (d for d in docs if has_long_abstract(d)),
    None,  # default if no doc matches
)
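One detail worth checking is that next() over a generator expression keeps the early-exit behavior of break. The sketch below records which elements the predicate inspects; first_match is a hypothetical helper, and is_even is deliberately impure only so the demo can observe the evaluation order.

```python
def first_match(xs, pred, default=None):
    """Return the first x with pred(x) true, or default if none matches."""
    return next((x for x in xs if pred(x)), default)


calls: list[int] = []  # records which elements the predicate inspected


def is_even(x: int) -> bool:
    calls.append(x)  # demo-only side effect to expose evaluation order
    return x % 2 == 0


found = first_match([1, 3, 4, 5, 6], is_even)
missing = first_match([1, 3], lambda x: x > 10, default=-1)
```

The predicate runs on 1, 3, and 4 and never sees 5 or 6. Without a default argument, next() raises StopIteration on an exhausted iterator, so supplying one keeps the expression total.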
While comprehensions promote expression-oriented code, prioritize readability: if a comprehension becomes nested or complex (e.g., 3+ layers), refactor it into named helper functions or consider a simple loop inside a trivial pure wrapper. Purity matters, but so does maintainability.
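The readability advice can be sketched with a toy example: a three-layer comprehension next to the same logic split across a named helper. All names here (long_words_per_doc_nested, long_words_in_doc, long_words_per_doc) are hypothetical and unrelated to the RAG codebase.

```python
def long_words_per_doc_nested(docs: list[list[str]], min_len: int) -> list[list[str]]:
    """Three layers in one expression: docs, lines, words. Correct but dense."""
    return [
        [w for line in lines for w in line.split() if len(w) >= min_len]
        for lines in docs
    ]


def long_words_in_doc(lines: list[str], min_len: int) -> list[str]:
    """Named helper: the per-document logic gets a name and its own tests."""
    return [w for line in lines for w in line.split() if len(w) >= min_len]


def long_words_per_doc(docs: list[list[str]], min_len: int) -> list[list[str]]:
    """The outer expression stays flat because the inner work is delegated."""
    return [long_words_in_doc(lines, min_len) for lines in docs]


sample = [["alpha be", "gamma"], ["x", "delta epsilon"]]
```

Both versions remain pure expressions; the second simply gives the inner step a name that can be tested in isolation.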
3. Running Project: FuncPipe RAG Builder¶
We continue the FuncPipe RAG Builder from m02-rag.md.
- Baseline: a pure stages composition (clean → chunk → embed → dedup).
- Module 2 Core 1: make_rag_fn(...) – closure-based configurators.
- This core: We refactor the internal implementation of the RAG API from imperative loops to expression-based code while preserving equivalence to the baseline.
3.1 Types (Canonical, Used Throughout)¶
We rely on the types defined in src/funcpipe_rag/rag_types.py and src/funcpipe_rag/api/types.py:
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv
These are pure data containers; expression orientation will sit on top of them.
4. Imperative Start: Loops and Flags¶
We begin with a hypothetical pre-refactor implementation of the extended RAG pipeline. It’s semantically correct, but filled with flags and stepwise loops, and it is not intended to be run as-is in the end-of-Module-02 checkout.
# core2_start.py (hypothetical pre-refactor; illustration only)
from collections.abc import Callable  # needed for the cleaner annotation

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc  # baseline stage
from funcpipe_rag import embed_chunk, structural_dedup_chunks

def imperative_full_rag_api(
    docs: list[RawDoc],
    env: RagEnv,
    cleaner: Callable[[RawDoc], CleanDoc],
    *,
    keep: DocRule | None = None,
    taps: RagTaps | None = None,
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc

    # 1) Filter docs using per-doc flag
    kept_docs: list[RawDoc] = []
    for d in docs:
        is_kept = rule(d)  # Flag; local, but unnecessary
        if is_kept:
            kept_docs.append(d)
    if taps and taps.docs:
        taps.docs(tuple(kept_docs))

    # 2) Clean docs using explicit accumulation
    cleaned: list[CleanDoc] = []
    for d in kept_docs:
        cd = cleaner(d)
        cleaned.append(cd)
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))

    # 3) Chunk each cleaned doc using index + while loop
    chunk_we: list[ChunkWithoutEmbedding] = []
    for cd in cleaned:
        text = cd.abstract
        i = 0
        while i < len(text):
            s = text[i:i + env.chunk_size]
            if s:
                chunk_we.append(
                    ChunkWithoutEmbedding(cd.doc_id, s, i, i + len(s))
                )
            i += env.chunk_size

    # 4) Embed chunks
    embedded: list[Chunk] = []
    for c in chunk_we:
        embedded.append(embed_chunk(c))

    # 5) Deduplicate structurally (baseline stage helper)
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks:
        taps.chunks(tuple(chunks))

    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs
Key points:
- This function is pure and deterministic.
- But control flow is encoded as:
  - Per-doc flags (is_kept),
  - Manual accumulation loops,
  - Explicit index management (i + while).
It works, but it doesn’t read as “data -> data” so much as “do X, then Y, then Z”.
5. Refactor to Expressions: Comprehensions & Conditionals¶
We now introduce a small helper and an expression-oriented RAG core.
5.1 Side-Effect Taps as an Expression Primitive¶
We define _tap as the only side-effect primitive allowed in this core:
from typing import TypeVar, Callable

T = TypeVar("T")

def _tap(xs: list[T], h: Callable[[tuple[T, ...]], None] | None) -> list[T]:
    """
    Observational tap: if h is provided, call h(tuple(xs)) for side effects,
    then return xs unchanged.

    Contract: For all xs and h, the *return value* of _tap(xs, h) equals xs.
    All value-level behavior of the pipeline is unchanged; only side effects differ.
    """
    if h:
        h(tuple(xs))
    return xs
This preserves the value semantics of the pipeline while allowing optional metrics/logging at the edges.
5.2 Expression-Oriented RAG Core¶
We now rewrite the RAG core in an expression style. This is an illustration-only refactor; the runnable end-of-Module-02 implementation lives in src/funcpipe_rag/api/core.py (full_rag_api_docs / full_rag_api).
# core2_refactor_demo.py (illustration only; not the canonical Module-02 API)
from collections.abc import Callable

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import embed_chunk, structural_dedup_chunks

def toy_gen_chunk_doc(cd: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    """
    Pure helper: chunk a cleaned document into fixed-size pieces.
    """
    text = cd.abstract
    return [
        ChunkWithoutEmbedding(cd.doc_id, chunk_text, start, start + len(chunk_text))
        for start in range(0, len(text), env.chunk_size)
        if (chunk_text := text[start:start + env.chunk_size])
    ]

def toy_full_rag_api(
    docs: list[RawDoc],
    env: RagEnv,
    cleaner: Callable[[RawDoc], CleanDoc],
    *,
    keep: DocRule | None = None,
    taps: RagTaps | None = None,
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc  # conditional expression

    kept_docs = _tap(
        [d for d in docs if rule(d)],  # filter
        taps.docs if taps else None,
    )
    cleaned = _tap(
        [cleaner(d) for d in kept_docs],  # map
        taps.cleaned if taps else None,
    )
    chunk_we = [
        c
        for cd in cleaned
        for c in toy_gen_chunk_doc(cd, env)  # flatMap
    ]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = _tap(
        structural_dedup_chunks(embedded),
        taps.chunks if taps else None,
    )

    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs
Properties:
- No mutable flags (is_kept, found_chunk, done).
- Control flow is now encoded as expressions:
  - Filtering: [d for d in docs if rule(d)]
  - Mapping: [cleaner(d) for d in kept_docs]
  - Chunk flattening: for cd in cleaned for c in toy_gen_chunk_doc(cd, env)
- _tap is the only place where side effects may occur, and it preserves the values.
This is now a direct “data -> data” description of the pipeline.
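The chunking step is where the index-and-while loop turns into range() plus a slice. The sketch below isolates that step on plain strings so the two shapes can be compared directly; chunk_spans_loop and chunk_spans_expr are hypothetical names that return (start, end, text) tuples instead of ChunkWithoutEmbedding, and both assume size is a positive integer.

```python
def chunk_spans_loop(text: str, size: int) -> list[tuple[int, int, str]]:
    """Index-and-while version, mirroring the imperative start."""
    out: list[tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        s = text[i:i + size]  # non-empty here because i < len(text)
        out.append((i, i + len(s), s))
        i += size
    return out


def chunk_spans_expr(text: str, size: int) -> list[tuple[int, int, str]]:
    """range() supplies the start offsets; the walrus names each slice once."""
    return [
        (start, start + len(s), s)
        for start in range(0, len(text), size)
        if (s := text[start:start + size])
    ]
```

With a size of zero the loop version would never terminate while range() raises ValueError, which is one more argument for letting range() own the index arithmetic.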
5.3 Expression Partial (Core 1 Tie-In)¶
from functools import partial
from funcpipe_rag import CleanConfig, make_rag_fn, any_doc
has_long_abstract = lambda d: len(d.abstract) >= 100
has_valid_doc = lambda d: any_doc(d) and has_long_abstract(d) # Logical and as expression
# In the end-of-Module-02 codebase, `make_rag_fn` captures frozen config.
rag_fn = make_rag_fn(chunk_size=512, clean_cfg=CleanConfig())
# Expression-oriented filtering still composes cleanly:
filtered_docs = [d for d in docs if has_valid_doc(d)]
chunks, obs = rag_fn(filtered_docs)
Wins: Data-driven filtering without flags that composes with Core 1. make_rag_fn is the canonical configurator that wraps this expression-based pipeline.
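Since this subsection is about partials, here is a minimal sketch of building predicates with functools.partial instead of hand-written lambdas. Doc and abstract_at_least are hypothetical stand-ins; the real RawDoc and make_rag_fn are not needed to see the pattern.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Doc:  # hypothetical stand-in for RawDoc
    doc_id: str
    abstract: str


def abstract_at_least(min_len: int, d: Doc) -> bool:
    """Generic predicate; the threshold comes first so partial can fix it."""
    return len(d.abstract) >= min_len


has_long_abstract = partial(abstract_at_least, 100)
has_any_abstract = partial(abstract_at_least, 1)

docs = [Doc("a", "x" * 120), Doc("b", "short"), Doc("c", "")]
long_ids = [d.doc_id for d in docs if has_long_abstract(d)]
nonempty_ids = [d.doc_id for d in docs if has_any_abstract(d)]
```

Each partial is an ordinary value, so it can be stored in a predicate table or passed as a keep rule without any extra wiring.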
6. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Substitute expressions for their values in full_rag_api.
1. Inline rule = keep if keep is not None else any_doc → the selected rule.
2. Substitute that rule into kept_docs → the filtered list.
3. Result: for fixed inputs, the entire call reduces to a fixed value.
Bug Hunt: In the imperative version, flags obscure the per-element logic, so each step has to be traced through loop state instead of substituted directly.
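The substitution steps can be replayed on a toy version of the filter. In this sketch, any_item and select are hypothetical miniatures of any_doc and the kept_docs expression, and any_item is assumed to accept every element.

```python
def any_item(_x: int) -> bool:
    """Toy analogue of any_doc: assumed to accept everything."""
    return True


def select(xs: list[int], keep=None) -> list[int]:
    rule = keep if keep is not None else any_item
    return [x for x in xs if rule(x)]


xs = [1, 2, 3, 4]

# Step 1: with keep=None, the conditional expression reduces to any_item.
step1 = [x for x in xs if any_item(x)]
# Step 2: any_item(x) is always True, so the filter clause drops out.
step2 = [x for x in xs]
# Step 3: copying a list element by element yields an equal list.
step3 = list(xs)
```

Every step replaces an expression with an equal one, so the chain select(xs) == step1 == step2 == step3 holds by construction.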
7. Property-Based Testing: Proving Equivalence (Advanced, Optional)¶
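Use Hypothesis to check that the refactor preserved behavior against the baseline. A passing property is strong evidence of equivalence over the generated inputs, not a formal proof.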
7.1 Custom Strategy (RAG Domain)¶
From tests/conftest.py.
7.2 Equivalence Property¶
# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st

from funcpipe_rag import (
    RagEnv,
    RagConfig,
    get_deps,
    full_rag_api_docs,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
)
from tests.conftest import doc_list_strategy, env_strategy

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    expressive, _ = full_rag_api_docs(docs, config, deps)
    assert expressive == baseline_full_rag(docs, env)
Note: Property checks chunk equivalence to a baseline built from the pure stages; obs is new in Module 2. Separately from chunk equivalence, you can add invariants on Observations—for example, that kept_docs equals the number of kept docs and total_chunks equals the number of produced chunks. These invariants live in your implementation, not in the property itself.
7.3 Shrinking Demo: Catching a Bug¶
Bad refactor (cumulative flag):
# (imports as in full_rag_api above)
from collections.abc import Callable

from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc, embed_chunk, structural_dedup_chunks

def bad_full_rag_api(docs: list[RawDoc], env: RagEnv, cleaner: Callable[[RawDoc], CleanDoc], *,
                     keep: DocRule | None = None, taps: RagTaps | None = None) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc
    valid = True  # Shared flag
    kept_docs = []
    for d in docs:
        valid = rule(d) and valid  # Cumulative poison on False
        if valid:
            kept_docs.append(d)
    # ... rest as expressive (toy_gen_chunk_doc is the chunking helper from Section 5.2)
    cleaned = [cleaner(d) for d in kept_docs]
    chunk_we = [c for cd in cleaned for c in toy_gen_chunk_doc(cd, env)]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = structural_dedup_chunks(embedded)
    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs
Property (swapped to bad_full_rag_api):
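@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_rag(docs, env):
    # Use a rule that can reject docs: the default any_doc is assumed to accept
    # everything, in which case the shared flag would never flip.
    rule = lambda d: len(d.abstract) >= 100
    baseline_chunks = baseline_full_rag([d for d in docs if rule(d)], env)
    bad_exp, bad_obs = bad_full_rag_api(docs, env, clean_doc, keep=rule)
    assert bad_exp == baseline_chunks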
Hypothesis failure trace (run to verify; example):
Falsifying example: test_bad_rag(
    docs=[RawDoc(... valid ...), RawDoc(... invalid ...), RawDoc(... valid ...)],
    env=RagEnv(chunk_size=128),
)
AssertionError
- Shrinks to mixed docs and catches the cumulative-flag bug. The invariant ‘each doc is kept iff rule(d) is True’ is silently broken once a single doc fails the rule, because valid remains False from then on and every later doc is dropped. If you accidentally regress full_rag_api into a flag-based bug like bad_full_rag_api, the same equivalence property will fail, as long as the rule under test can reject at least one doc, and Hypothesis will shrink to a minimal counterexample (mixed valid/invalid docs). This class of error cannot occur when your filtering is expressed as [d for d in docs if rule(d)]: there is no mutable accumulator to poison.
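The same bug can be reproduced without Hypothesis or the project package. The sketch below is a dependency-free approximation of the property using the standard random module: it offers no shrinking, and keep_with_flag, keep_with_expr, and find_counterexample are hypothetical names operating on plain integers.

```python
import random


def keep_with_flag(xs, rule):
    """Buggy filter: one failing element poisons the shared flag for good."""
    valid = True
    kept = []
    for x in xs:
        valid = rule(x) and valid
        if valid:
            kept.append(x)
    return kept


def keep_with_expr(xs, rule):
    """Per-element filter: each decision depends only on the element itself."""
    return [x for x in xs if rule(x)]


def find_counterexample(rule, trials=200, seed=0):
    """Return the first random list on which the two filters disagree, or None."""
    rng = random.Random(seed)
    samples = (
        [rng.randint(-5, 5) for _ in range(rng.randint(0, 6))]
        for _ in range(trials)
    )
    return next(
        (xs for xs in samples if keep_with_flag(xs, rule) != keep_with_expr(xs, rule)),
        None,
    )
```

The smallest disagreement has two elements: one that fails the rule followed by one that passes it, for example [-1, 2] with the rule x > 0. A rule that accepts everything never triggers the bug, which is why the property must exercise a rule that can reject.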
8. When Expressions Aren't Worth It¶
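Rarely. For hot paths that profiling has identified (e.g., large loops), keep an imperative implementation behind an expression-oriented API: a function with a pure signature whose loop and local state never leak to callers.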
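As one possible shape for "imperative behind an expression API", the sketch below shows an order-preserving dedup written with a local loop and a local set, next to an expression-only equivalent. Both names are hypothetical, and nothing here is a claim about which is faster; that is what profiling is for.

```python
from collections.abc import Hashable, Iterable
from typing import TypeVar

H = TypeVar("H", bound=Hashable)


def dedup_stable(xs: Iterable[H]) -> list[H]:
    """Order-preserving dedup with local mutation only.

    `seen` and `out` are created inside the call and only `out` is returned,
    so callers observe a pure function even though the body is imperative.
    """
    seen: set[H] = set()
    out: list[H] = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedup_stable_expr(xs: Iterable[H]) -> list[H]:
    """Expression-only equivalent: dicts preserve insertion order."""
    return list(dict.fromkeys(xs))
```

Callers compose either function the same way, so swapping the implementation after profiling does not touch the surrounding pipeline.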
9. Pre-Core Quiz¶
- Mutable found flag? → No mutable flags.
- break in loop → fix with? → next() or any().
- Deep if-else → fix with? → Ternary or dict.
- Loop with counter? → Comprehension.
- Prove imperative ≡ expression? → Hypothesis.
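For the "deep if-else" answer, here is a small sketch of both fixes: a chained conditional expression for range checks and a dictionary of handlers for dispatch on a key. The function names and the HANDLERS table are hypothetical.

```python
def size_bucket_chain(n: int) -> str:
    """Statement-style version with a reassigned label variable."""
    if n < 100:
        label = "short"
    elif n < 1000:
        label = "medium"
    else:
        label = "long"
    return label


def size_bucket_expr(n: int) -> str:
    """Chained conditional expression: one value, no reassignment."""
    return "short" if n < 100 else "medium" if n < 1000 else "long"


HANDLERS = {"upper": str.upper, "lower": str.lower, "title": str.title}


def apply_mode(mode: str, text: str) -> str:
    """Dict dispatch: unknown modes fall back to the identity function."""
    return HANDLERS.get(mode, lambda s: s)(text)
```

Two or three chained conditions read fine as an expression; beyond that, a lookup table or a named helper usually reads better.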
10. Post-Core Reflection & Exercise¶
Reflect: In your own code, find one loop that uses a flag. Refactor it into an expression and add a Hypothesis equivalence property.
Project Exercise: Apply the same refactor to the RAG core and run the properties on sample data.
Next: Core 3 – Intro Laziness & Generators. (Builds on these expressions.)
Verify all patterns with Hypothesis; the examples provided show how to detect impurities such as globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once you are comfortable.