Module 2: First-Class Functions and Expressive Python

Progression Note

By the end of Module 2, you will be able to use first-class functions for configurability, write expression-oriented code, and add debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

Module	Focus	Key Outcomes
1: Foundational FP Concepts	Purity, contracts, refactoring	Spot impurities, write pure functions, prove equivalence with Hypothesis
2: First-Class Functions & Expressive Python	Closures, partials, composable configurators	Configure pure pipelines without globals
3: Lazy Iteration & Generators	Streaming/lazy pipelines	Efficient data processing without materializing everything

M02C02: Expression-Oriented Python – Comprehensions, Conditional Expressions, No Control Flags

Core question:
How do you replace statement-heavy imperative code (loops + flags + breaks) with expressions, comprehensions, and data-driven conditionals—so control flow becomes explicit, composable, and easy to reason about?

This core introduces the expression-oriented mindset in Python:

Treat core logic as value-producing expressions, not sequences of mutations.
Default to comprehensions, conditional expressions, and built-ins (any, all, next) for control flow.
Eliminate mutable control flags from core logic—keep them only at trivial edges, if at all.

We continue the running project from m02-rag.md, extending the FuncPipe RAG Builder:

Baseline: a composition of the pure stages (clean → chunk → embed → dedup).
Module 2 (Core 1): make_rag_fn(...) – closure-based configurators.
This core: Replace imperative loops in the RAG core with expression-oriented code that is easier to configure, test, and prove equivalent to the baseline.

Audience: Developers who understand purity and configurators (Core 1) but still write loops like:

found = False
for x in xs:
    if pred(x):
        found = True
        break

Outcome:

Spot control flags (found, valid, done) and explain why they obscure logic.
Refactor a 10–20 line loop into comprehensions / any / next while preserving semantics.
Write a Hypothesis property that proves equivalence to the baseline and exposes a real flag-based bug.

Runnability Note (Module 01 Snapshot vs Module 02 End-State)

Some “before” snippets in this core are hypothetical pre-refactor examples used for contrast. They are labeled accordingly and are not meant to exactly match a real snapshot. We refactor these shapes into the real Module 02 API as the module progresses.

For a real, runnable Module 01 codebase, use the module-01 tag worktree:

make worktrees
Module 01 path: history/worktrees/module-01/
Import path for Module 01: history/worktrees/module-01/src/

1. Conceptual Foundation

1.1 Expression-Oriented Python in One Precise Sentence

Expression-oriented programming treats control flow as compositions of value-producing expressions instead of stepwise mutation—so code reads as “data -> data” rather than “state -> state”.

1.2 The One-Sentence Rule

In core logic, do not use mutable flags (found, valid, done) or manual break/continue for control; use comprehensions, conditional expressions, and built-ins that return values—flags and break may be acceptable inside encapsulated low-level helpers with pure signatures.
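
As a minimal sketch of this rule, the snippet below maps each common flag pattern to a built-in that returns a value. The sample list `xs` and the `is_even` predicate are made up for illustration.

```python
xs = [3, 7, 8, 11, 12]


def is_even(x: int) -> bool:
    return x % 2 == 0


# "found" flag + break  ->  any(), which also stops at the first match
found = any(is_even(x) for x in xs)

# "valid" flag that must hold for every element  ->  all()
all_positive = all(x > 0 for x in xs)

# "first match" variable + break  ->  next() with a default
first_even = next((x for x in xs if is_even(x)), None)

# if/else assignment  ->  conditional expression
label = "has evens" if found else "odds only"
```

Each name is bound exactly once, so there is no intermediate state to track while reading.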

1.3 Why This Matters Now

Core 1 gave you pure functions and closure-based configurators:

make_rag_fn(...) -> Callable[[list[RawDoc]], tuple[list[Chunk], Observations]] is pure and deterministic.
But the implementation of the RAG core can still be imperative:
Loops with flags, early breaks, scattered if blocks.
Harder to reason about, harder to transform, and easier to subtly break when adding new behaviors.

Expression-oriented code:

Turns “do this, then maybe that” into “compute this value, then transform it”.
Makes pipelines equational: each step is an expression you can substitute and test in isolation.
Aligns perfectly with Core 1’s closure-based configurators: you configure expressions, not control-flow spaghetti.

Core 1 configures what RAG function we call (make_rag_fn); Core 2 refactors how that function is implemented internally (full_rag_api expressed as comprehensions instead of flags).

1.4 Expressions as Values in 5 Lines

We start with a simple, RAG-flavored predicate table:

from collections.abc import Callable
from funcpipe_rag import RawDoc


def has_long_abstract(d: RawDoc) -> bool:
    return len(d.abstract) >= 100


def is_cs_category(d: RawDoc) -> bool:
    return d.categories.startswith("cs.")


DocPred = Callable[[RawDoc], bool]

predicates: dict[str, DocPred] = {
    "long_abstract": has_long_abstract,
    "cs_only": is_cs_category,
}


def filter_docs(key: str, docs: list[RawDoc]) -> list[RawDoc]:
    return [d for d in docs if predicates[key](d)]

The key point:

filter_docs is a single expression ([...]) mapping docs to docs.
Control flow (“if this doc satisfies predicate P, keep it”) is encoded as data: predicates[key].

No flags, no break; everything is composable and easy to test.
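
To see the predicate table in action without the project installed, here is a self-contained sketch. The `RawDoc` dataclass is a hypothetical stand-in with only the fields used above, and `filter_docs_all` is an illustrative variation (not part of `funcpipe_rag`) that combines several predicate keys with `all`.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDoc:  # hypothetical stand-in, not the real funcpipe_rag.RawDoc
    doc_id: str
    abstract: str
    categories: str


DocPred = Callable[[RawDoc], bool]

predicates: dict[str, DocPred] = {
    "long_abstract": lambda d: len(d.abstract) >= 100,
    "cs_only": lambda d: d.categories.startswith("cs."),
}


def filter_docs_all(keys: list[str], docs: list[RawDoc]) -> list[RawDoc]:
    # Keep a doc only if every selected predicate accepts it.
    return [d for d in docs if all(predicates[k](d) for k in keys)]


docs = [
    RawDoc("a", "x" * 120, "cs.AI"),
    RawDoc("b", "short", "cs.LG"),
    RawDoc("c", "y" * 150, "math.CO"),
]

kept_ids = [d.doc_id for d in filter_docs_all(["long_abstract", "cs_only"], docs)]
```

Only doc `"a"` satisfies both predicates. With an empty key list, `all` over no predicates is `True`, so every doc is kept.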

2. Mental Model: Imperative Flags vs Expressions

2.1 One Picture

Imperative Flags (Mutable)              Expression-Oriented (Pure)
+-----------------------+               +------------------------------+
| found = False         |               |   found = any(pred(x)        |
| for x in xs:          |               |               for x in xs)   |
|     if pred(x):       |               |                              |
|         found = True  |               |   # Single expression        |
|         break         |               |   # No flags, no break       |
+-----------------------+               +------------------------------+
   ↑ Scattered control                         ↑ Control is data
   ↑ Subtle state coupling                     ↑ Easy to compose / test

2.2 Contract Table

Aspect	Imperative Flags	Expression-Oriented
Dependencies	Hidden in loop structure	Explicit in predicates and expressions
Control Flow	Flags + `break`/`continue`	Comprehensions, `any`/`all`/`next`, ternaries
Reasoning	Global: “what happens to `found`?”	Local: “what does this expression compute?”
Refactoring	Easy to introduce non-local bugs	Equational: refactor expression ↔ expression
Testing	Need to inspect loop behavior	Test expressions as pure functions

# Imperative: flag + break to get first matching doc
first_long = None
for d in docs:
    if has_long_abstract(d):
        first_long = d
        break

# Expression-oriented: next() with default
first_long = next(
    (d for d in docs if has_long_abstract(d)),
    None,  # default if no doc matches
)

While comprehensions promote expression-oriented code, prioritize readability: if a comprehension becomes deeply nested or complex (e.g., 3+ layers), refactor it into named helper functions, or consider a simple loop inside a trivial pure wrapper. Purity matters, but so does maintainability.
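
A small sketch of that trade-off, using made-up `records` data: the dense comprehension filters, normalizes, and flattens in one expression, while the second version moves the inner layer into a named helper (`clean_tags`, a hypothetical name).

```python
records = [
    {"id": "a", "tags": [" ML ", "", "nlp"]},
    {"id": "b", "tags": []},
    {"id": "c", "tags": ["  ", "RAG"]},
]

# Dense version: filter, normalize, and flatten in one expression.
dense = [
    (r["id"], t.strip().lower())
    for r in records
    if r["tags"]
    for t in r["tags"]
    if t.strip()
]


def clean_tags(tags: list[str]) -> list[str]:
    """Normalize tags and drop blank ones."""
    return [t.strip().lower() for t in tags if t.strip()]


# Same result, but the inner layer has a name and can be tested alone.
readable = [(r["id"], t) for r in records for t in clean_tags(r["tags"])]
```

Both lists hold the same pairs; the second form keeps each comprehension to a single concern.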

3. Running Project: FuncPipe RAG Builder

We continue the FuncPipe RAG Builder from m02-rag.md.

Baseline: a pure stages composition (clean → chunk → embed → dedup).
Module 2 Core 1: make_rag_fn(...) – closure-based configurators.
This core: We refactor the internal implementation of the RAG API from imperative loops to expression-based code while preserving equivalence to the baseline.

3.1 Types (Canonical, Used Throughout)

We rely on the types defined in src/funcpipe_rag/rag_types.py and src/funcpipe_rag/api/types.py:

from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv

These are pure data containers; expression orientation will sit on top of them.

4. Imperative Start: Loops and Flags

We begin with a hypothetical pre-refactor implementation of the extended RAG pipeline. It’s semantically correct, but filled with flags and stepwise loops, and it is not intended to be run as-is in the end-of-Module-02 checkout.

# core2_start.py (hypothetical pre-refactor; illustration only)
from collections.abc import Callable  # needed for the `cleaner` annotation below

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc  # baseline stage
from funcpipe_rag import embed_chunk, structural_dedup_chunks


def imperative_full_rag_api(
        docs: list[RawDoc],
        env: RagEnv,
        cleaner: Callable[[RawDoc], CleanDoc],
        *,
        keep: DocRule | None = None,
        taps: RagTaps | None = None,
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc

    # 1) Filter docs using per-doc flag
    kept_docs: list[RawDoc] = []
    for d in docs:
        is_kept = rule(d)  # Flag; local, but unnecessary
        if is_kept:
            kept_docs.append(d)
    if taps and taps.docs:
        taps.docs(tuple(kept_docs))

    # 2) Clean docs using explicit accumulation
    cleaned: list[CleanDoc] = []
    for d in kept_docs:
        cd = cleaner(d)
        cleaned.append(cd)
    if taps and taps.cleaned:
        taps.cleaned(tuple(cleaned))

    # 3) Chunk each cleaned doc using index + while loop
    chunk_we: list[ChunkWithoutEmbedding] = []
    for cd in cleaned:
        text = cd.abstract
        i = 0
        while i < len(text):
            s = text[i:i + env.chunk_size]
            if s:
                chunk_we.append(
                    ChunkWithoutEmbedding(cd.doc_id, s, i, i + len(s))
                )
            i += env.chunk_size

    # 4) Embed chunks
    embedded: list[Chunk] = []
    for c in chunk_we:
        embedded.append(embed_chunk(c))

    # 5) Deduplicate structurally (baseline stage helper)
    chunks = structural_dedup_chunks(embedded)
    if taps and taps.chunks:
        taps.chunks(tuple(chunks))

    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

Key points:

This function is deterministic, and pure apart from the optional taps.
But control flow is encoded as:
Per-doc flags (is_kept),
Manual accumulation loops,
Explicit index management (i + while).

It works, but it doesn’t read as “data -> data” so much as “do X, then Y, then Z”.

5. Refactor to Expressions: Comprehensions & Conditionals

We now introduce a small helper and an expression-oriented RAG core.

5.1 Side-Effect Taps as an Expression Primitive

We define _tap as the only side-effect primitive allowed in this core:

from typing import TypeVar, Callable

T = TypeVar("T")

def _tap(xs: list[T], h: Callable[[tuple[T, ...]], None] | None) -> list[T]:
    """
    Observational tap: if h is provided, call h(tuple(xs)) for side effects,
    then return xs unchanged.

    Contract: For all xs and h, the *return value* of _tap(xs, h) equals xs.
    All value-level behavior of the pipeline is unchanged; only side effects differ.
    """
    if h is not None:  # explicit None check, matching `keep is not None` elsewhere
        h(tuple(xs))
    return xs

This preserves the value semantics of the pipeline while allowing optional metrics/logging at the edges.

5.2 Expression-Oriented RAG Core

We now rewrite the RAG core in an expression style. This is an illustration-only refactor; the runnable end-of-Module-02 implementation lives in src/funcpipe_rag/api/core.py (full_rag_api_docs / full_rag_api).

# core2_refactor_demo.py (illustration only; not the canonical Module-02 API)
from collections.abc import Callable

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import embed_chunk, structural_dedup_chunks


def toy_gen_chunk_doc(cd: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    """
    Pure helper: chunk a cleaned document into fixed-size pieces.
    """
    text = cd.abstract
    return [
        ChunkWithoutEmbedding(cd.doc_id, chunk_text, start, start + len(chunk_text))
        for start in range(0, len(text), env.chunk_size)
        if (chunk_text := text[start:start + env.chunk_size])
    ]


def toy_full_rag_api(
        docs: list[RawDoc],
        env: RagEnv,
        cleaner: Callable[[RawDoc], CleanDoc],
        *,
        keep: DocRule | None = None,
        taps: RagTaps | None = None,
) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc  # conditional expression

    kept_docs = _tap(
        [d for d in docs if rule(d)],  # filter
        taps.docs if taps else None,
    )

    cleaned = _tap(
        [cleaner(d) for d in kept_docs],  # map
        taps.cleaned if taps else None,
    )

    chunk_we = [
        c
        for cd in cleaned
        for c in toy_gen_chunk_doc(cd, env)  # flatMap
    ]

    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = _tap(
        structural_dedup_chunks(embedded),
        taps.chunks if taps else None,
    )

    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

Properties:

No mutable flags (is_kept, found_chunk, done).
Control flow is now encoded as expressions:
Filtering: [d for d in docs if rule(d)]
Mapping: [cleaner(d) for d in kept_docs]
Chunk flattening: for cd in cleaned for c in toy_gen_chunk_doc(cd, env)
_tap is the only place where side effects may occur, and it preserves the values.

This is now a direct “data -> data” description of the pipeline.
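
The chunking step is the least obvious part of the rewrite, so here is a sketch that compares the two shapes on plain strings. Both functions are simplified stand-ins that return `(start, end, text)` tuples instead of `ChunkWithoutEmbedding`, and they assume a positive chunk size.

```python
def chunk_spans_loop(text: str, size: int) -> list[tuple[int, int, str]]:
    # Imperative shape from section 4: manual index + while loop.
    out = []
    i = 0
    while i < len(text):
        s = text[i:i + size]
        if s:
            out.append((i, i + len(s), s))
        i += size
    return out


def chunk_spans_expr(text: str, size: int) -> list[tuple[int, int, str]]:
    # Expression shape from section 5.2: range() supplies the indices.
    return [
        (start, start + len(s), s)
        for start in range(0, len(text), size)
        if (s := text[start:start + size])
    ]


samples = ["", "a", "abcdef", "abcdefg"]
sizes = [1, 2, 3, 10]

# Collect every (text, size) pair on which the two versions disagree.
mismatches = [
    (t, n)
    for t in samples
    for n in sizes
    if chunk_spans_loop(t, n) != chunk_spans_expr(t, n)
]
```

An empty `mismatches` list on these samples is only a spot check; the Hypothesis property in section 7 explores far more inputs.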

5.3 Expression Partial (Core 1 Tie-In)

from functools import partial
from funcpipe_rag import CleanConfig, RawDoc, make_rag_fn, any_doc


def has_long_abstract(d: RawDoc) -> bool:
    return len(d.abstract) >= 100


def has_valid_doc(d: RawDoc) -> bool:
    return any_doc(d) and has_long_abstract(d)  # logical `and` as an expression


# In the end-of-Module-02 codebase, `make_rag_fn` captures frozen config.
rag_fn = make_rag_fn(chunk_size=512, clean_cfg=CleanConfig())
# Expression-oriented filtering still composes cleanly
# (`docs` is a list[RawDoc] supplied by the caller):
filtered_docs = [d for d in docs if has_valid_doc(d)]
chunks, obs = rag_fn(filtered_docs)

Wins: filtering is data-driven, needs no flags, and composes with Core 1. make_rag_fn is the canonical configurator that wraps this expression-based pipeline.

6. Equational Reasoning: Substitution Exercise

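Hand Exercise: Substitute expressions by hand in toy_full_rag_api for fixed inputs.
1. Inline rule = keep if keep is not None else any_doc: for a given keep, this reduces to one selected rule.
2. Substitute that rule into kept_docs = [d for d in docs if rule(d)] to get a concrete filtered list.
3. Repeat for each later stage. Result: with no taps, the entire call reduces to a fixed value for fixed inputs.
Bug Hunt: Try the same substitution on the imperative version. The flags get in the way, because what happens to one element is described through loop state rather than through that element alone.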
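
The substitution steps can be replayed on a toy filter. This sketch uses strings instead of docs, and its `any_doc` is a hypothetical stand-in that is assumed to accept every input.

```python
def any_doc(_doc: str) -> bool:  # hypothetical stand-in: accepts everything
    return True


def keep_docs(docs: list[str], keep=None) -> list[str]:
    rule = keep if keep is not None else any_doc
    return [d for d in docs if rule(d)]


docs = ["alpha", "", "beta"]

# Step 1: with keep=None, `rule` is any_doc.
step1 = [d for d in docs if any_doc(d)]
# Step 2: any_doc(d) is always True here, so the filter disappears.
step2 = [d for d in docs]
# Step 3: for this fixed input, the call is just a value.
step3 = ["alpha", "", "beta"]
```

Every step is an ordinary expression with the same value as `keep_docs(docs)`, which is what makes the refactor checkable by substitution.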

7. Property-Based Testing: Proving Equivalence (Advanced, Optional)

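Use Hypothesis to check that the refactor preserves behavior against the baseline. A passing property is strong evidence of equivalence over the generated inputs, not a formal proof.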

7.1 Custom Strategy (RAG Domain)

The strategies used below (doc_list_strategy, env_strategy) come from tests/conftest.py.

7.2 Equivalence Property

# tests/test_rag_api.py
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    RagEnv,
    RagConfig,
    get_deps,
    full_rag_api_docs,
    clean_doc,
    embed_chunk,
    iter_chunk_doc,
    structural_dedup_chunks,
)
from tests.conftest import doc_list_strategy, env_strategy

def baseline_full_rag(docs, env):
    embedded = [embed_chunk(c) for d in docs for c in iter_chunk_doc(clean_doc(d), env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_rag_equivalence(docs, env):
    config = RagConfig(env=env)
    deps = get_deps(config)
    expressive, _ = full_rag_api_docs(docs, config, deps)
    assert expressive == baseline_full_rag(docs, env)

Note: The property checks chunk equivalence against a baseline built from the pure stages; obs is new in Module 2, so it is not compared. Separately from chunk equivalence, you can add invariants on Observations, for example that kept_docs equals the number of kept docs and total_chunks equals the number of produced chunks. These invariants are guaranteed by your implementation, not checked by the equivalence property itself.
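
One way to express such invariants is a small checker that returns the names of any that fail. The `Obs` dataclass is a hypothetical, trimmed stand-in for Observations, and the chosen checks are illustrative assumptions based on the toy pipeline above, where every kept doc is cleaned exactly once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:  # hypothetical, trimmed stand-in for Observations
    total_docs: int
    kept_docs: int
    cleaned_docs: int
    total_chunks: int


def obs_violations(obs: Obs, n_docs: int, n_chunks: int) -> list[str]:
    # Return the names of broken invariants (empty list means all hold).
    checks = {
        "total_docs matches input": obs.total_docs == n_docs,
        "kept_docs <= total_docs": obs.kept_docs <= obs.total_docs,
        "cleaned_docs == kept_docs": obs.cleaned_docs == obs.kept_docs,
        "total_chunks matches output": obs.total_chunks == n_chunks,
    }
    return [name for name, ok in checks.items() if not ok]
```

Inside a property test, `assert obs_violations(obs, len(docs), len(chunks)) == []` would then report which invariant broke instead of a bare failure.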

7.3 Shrinking Demo: Catching a Bug

Bad refactor (cumulative flag):

# Imports mirror toy_full_rag_api above.
from collections.abc import Callable

from funcpipe_rag import RawDoc, CleanDoc, Chunk, RagEnv
from funcpipe_rag import DocRule, Observations, RagTaps
from funcpipe_rag import any_doc
from funcpipe_rag import clean_doc, embed_chunk, structural_dedup_chunks


def bad_full_rag_api(docs: list[RawDoc], env: RagEnv, cleaner: Callable[[RawDoc], CleanDoc], *,
                     keep: DocRule | None = None, taps: RagTaps | None = None) -> tuple[list[Chunk], Observations]:
    rule = keep if keep is not None else any_doc
    valid = True  # Shared flag
    kept_docs = []
    for d in docs:
        valid = rule(d) and valid  # Cumulative poison on False
        if valid:
            kept_docs.append(d)
    # ... rest as expressive
    cleaned = [cleaner(d) for d in kept_docs]
    chunk_we = [c for cd in cleaned for c in toy_gen_chunk_doc(cd, env)]
    embedded = [embed_chunk(c) for c in chunk_we]
    chunks = structural_dedup_chunks(embedded)
    obs = Observations(
        total_docs=len(docs),
        total_chunks=len(chunks),
        kept_docs=len(kept_docs),
        cleaned_docs=len(cleaned),
        sample_doc_ids=tuple(d.doc_id for d in kept_docs[:5]),
        sample_chunk_starts=tuple(c.start for c in chunks[:5]),
    )
    return chunks, obs

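Property (swapped to bad_full_rag_api). The default rule, any_doc, presumably accepts every doc, so with keep=None the flag would never flip. The test therefore passes a rule that can reject docs and filters the baseline input with the same rule (this assumes the strategy generates abstracts on both sides of the threshold):

def has_long_abstract(d):
    return len(d.abstract) >= 100

@given(docs=doc_list_strategy(), env=env_strategy())
def test_bad_rag(docs, env):
    expected = baseline_full_rag([d for d in docs if has_long_abstract(d)], env)
    bad_chunks, _ = bad_full_rag_api(docs, env, clean_doc, keep=has_long_abstract)
    assert bad_chunks == expected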

Hypothesis failure trace (run to verify; example):

Falsifying example: test_bad_rag(
    docs=[RawDoc(... valid ...), RawDoc(... invalid ...), RawDoc(... valid ...)], 
    env=RagEnv(chunk_size=128),
)
AssertionError

Hypothesis shrinks to a small list of mixed valid/invalid docs and exposes the cumulative-flag bug. The invariant ‘each doc is kept iff rule(d) is True’ is silently broken once a single doc fails the rule, because valid then stays False for every later doc. If you accidentally regress full_rag_api into a flag-based bug like bad_full_rag_api, the same equivalence property will fail in the same way. This class of error cannot occur when filtering is written as [d for d in docs if rule(d)]: there is no mutable accumulator to poison.
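
The same bug can be reproduced without Hypothesis. The sketch below models each doc as the boolean result of `rule(d)` and finds the smallest failing input by exhaustive search, which is a hand-rolled substitute for shrinking, not how Hypothesis works internally. The names `keep_expr` and `keep_poisoned` are hypothetical.

```python
from itertools import product


def keep_expr(flags: list[bool]) -> list[int]:
    # Correct filter: position i is kept iff flags[i] is True.
    return [i for i, ok in enumerate(flags) if ok]


def keep_poisoned(flags: list[bool]) -> list[int]:
    # Buggy filter mirroring bad_full_rag_api: one False poisons the rest.
    valid = True
    kept = []
    for i, ok in enumerate(flags):
        valid = ok and valid
        if valid:
            kept.append(i)
    return kept


# Enumerate inputs from shortest to longest and stop at the first mismatch.
minimal = next(
    list(flags)
    for n in range(4)
    for flags in product([False, True], repeat=n)
    if keep_expr(list(flags)) != keep_poisoned(list(flags))
)
```

The search stops at `[False, True]`: one rejected doc followed by one accepted doc is enough to make the two filters disagree.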

8. When Expressions Aren't Worth It

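Occasionally, a profiled hot path (for example, a very large loop) justifies an imperative implementation. Keep it behind a function with a pure, expression-style signature so that callers still see "data -> data".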
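
A minimal sketch of that pattern: `dedup_stable` is a hypothetical helper (not the project's `structural_dedup_chunks`) that loops and mutates a local set internally, while its signature stays pure.

```python
from collections.abc import Hashable, Iterable
from typing import TypeVar

H = TypeVar("H", bound=Hashable)


def dedup_stable(items: Iterable[H]) -> list[H]:
    """Return items with duplicates removed; the first occurrence wins."""
    seen: set[H] = set()  # local mutation only; never escapes the function
    out: list[H] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

Callers can treat `dedup_stable(xs)` as an ordinary expression: it returns a new list and leaves its input untouched, so the loop is an implementation detail.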

9. Pre-Core Quiz

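Mutable found flag in core logic? → Not allowed; replace it with an expression.
break in a loop: fix with? → next() or any().
Deep if-else: fix with? → Conditional expression (ternary) or dict lookup.
Loop with a counter? → Comprehension (over range or enumerate if the index is needed).
Prove imperative ≡ expression? → Hypothesis property.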
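
For the "deep if-else" answer, here is a short sketch of both replacements. The bucket thresholds and the `normalizers` table are made-up examples.

```python
from collections.abc import Callable


def size_bucket_stmt(n: int) -> str:
    # Statement form: three branches, three returns.
    if n < 100:
        return "short"
    elif n < 1000:
        return "medium"
    else:
        return "long"


def size_bucket_expr(n: int) -> str:
    # Same decision as one chained conditional expression.
    return "short" if n < 100 else "medium" if n < 1000 else "long"


# Dispatch on a key: a dict lookup replaces an if/elif chain on `mode`.
normalizers: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "lower": str.lower,
    "strip": str.strip,
}


def normalize(mode: str, text: str) -> str:
    # Unknown modes fall back to the identity function.
    return normalizers.get(mode, lambda s: s)(text)
```

Chained conditional expressions stay readable for two or three branches; beyond that, the dict form usually scales better because adding a case is a data change.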

10. Post-Core Reflection & Exercise

Reflect: In your own code, find one loop that uses a flag. Refactor it into an expression and add a Hypothesis equivalence property.
Project Exercise: Apply the same refactor to the RAG pipeline and run the properties on sample data.

Next: Core 3 – Intro to Laziness & Generators. (Builds on these expressions.)

Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.

Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.