Module 2: First-Class Functions and Expressive Python
Progression Note
By the end of Module 2, you'll master first-class functions for configurability, expression-oriented code, and debugging taps. This prepares you for lazy iteration in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: First-Class Functions & Expressive Python | Closures, partials, composable configurators | Configure pure pipelines without globals |
| 3: Lazy Iteration & Generators | Streaming/lazy pipelines | Efficient data processing without materializing everything |
M02C08 – Tiny Data-Driven DSLs (Using Frozen Data to Express Domain Rules)
Core question:
How do you replace sprawling if-else chains and hard-coded domain logic with tiny, composable data-driven DSLs—so rules become printable, testable, evolvable, and flow through M02C07 pipelines without scattering behaviour across the codebase?
This core introduces tiny data-driven DSLs in Python:
- Represent rules as frozen data (dataclasses with paths, operators, values).
- Use pure interpreters to evaluate rule data into behaviour.
- Compose via combinators like All/AnyOf/Not.
- Build on M02C06 config-as-data (rules as values) and M02C07 combinators (orchestration).
We extend the running project from m02-rag.md—the FuncPipe RAG Builder—evolving from hard-coded rules to data-driven DSLs that preserve baseline equivalence for the chunk sequence.
Audience: Developers coming from M02C07 who have combinator pipelines but still embed domain logic in if-else chains or scattered predicates.
Outcome:
1. Identify rule smells (if-else sprawl, mutable flags) and explain their impact on evolvability.
2. Refactor domain logic to frozen rule data + pure interpreter.
3. Write Hypothesis properties proving DSL equivalence, with a shrinking example.
1. Conceptual Foundation
1.1 Tiny Data-Driven DSLs in One Precise Sentence
Tiny data-driven DSLs represent domain rules as immutable data (frozen dataclasses with paths and operators) evaluated by pure interpreters—ensuring rules are composable, testable, and flow like config through M02C07 pipelines.
1.2 The One-Sentence Rule
Represent domain rules as frozen data with paths and operators evaluated by pure interpreters—never use if-else or mutable flags in core; pass rules like config.
1.3 Why This Matters Now
M02C07 gave combinators for pipelines, but hard-coded rules limit evolvability. Data-driven DSLs make rules data, enabling full M02C01–M02C07 power with printable, testable domain logic.
1.4 DSLs as Values in 5 Lines
Treating rules as first-class values enables dynamic rule selection:
from functools import partial
from funcpipe_rag import All, LenGt, Pred, StartsWith, eval_pred
rules: dict[str, Pred] = {
    "cs": StartsWith("categories", "cs."),
    "long": LenGt("abstract", 500),
}
keep_pred = All((rules["cs"], rules["long"]))
keep_fn = partial(eval_pred, pred=keep_pred) # RawDoc -> bool
Rule data, evaluated by pure functions, allows storage in dicts, composition with M02C01 partial, and testing as values—explicit and evolvable.
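The "rules as values" claim can be checked with nothing but the standard library. The following is a minimal sketch using stand-in dataclasses; the field names `path`, `prefix`, `n`, and `preds` are assumptions for illustration and may not match the real types in funcpipe_rag.

```python
from dataclasses import dataclass


# Stand-in rule types; field names are assumptions for illustration only.
@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple  # tuple of child rules


cs = StartsWith("categories", "cs.")
long_ = LenGt("abstract", 500)
keep = All((cs, long_))

# Structural equality: two rules built from the same data compare equal.
same = keep == All((StartsWith("categories", "cs."), LenGt("abstract", 500)))

# Hashable: frozen dataclasses can be set members or dict keys.
unique_rules = {cs, StartsWith("categories", "cs."), long_}

# Printable: the generated repr shows the whole rule tree.
printed = repr(keep)
print(printed)
```

Equality, hashing, and a readable repr all come from `@dataclass(frozen=True)` itself, which is why no extra code is needed to store, compare, or log rules.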
2. Mental Model: If-Else Sprawl vs Data-Driven DSLs
2.1 One Picture
If-Else Sprawl (Messy)                       Data-Driven DSLs (Clean)
+----------------------------------------+   +-------------------------------------------+
| if d.categories.startswith("cs."):     |   | cs_rule = StartsWith("categories", "cs.") |
|     if len(d.abstract) > 500:          |   | long_rule = LenGt("abstract", 500)        |
|         return True                    |   | rule = All((cs_rule, long_rule))          |
| ...                                    |   | eval_pred(d, rule)                        |
+----------------------------------------+   +-------------------------------------------+
↑ Hard-coded, rigid                          ↑ Data, composable
2.2 Contract Table
| Aspect | If-Else Sprawl | Data-Driven DSLs |
|---|---|---|
| Evolvability | Code changes | Data changes |
| Testability | Mock contexts | Generate rules |
| Readability | Nested branches | Linear data |
| Composability | Manual nesting | All/AnyOf/Not |
| Auditing | Trace execution | Print rule/decision |
| Mutable Defaults in Partials | Breaks Determinism | Use frozen dataclasses or immutable types for configs |
Note on If-Else Choice: Use if-else only for trivial logic; always prefer DSLs for domain rules.
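The "Print rule/decision" entry in the Auditing row can be made concrete. Below is a minimal sketch of a hypothetical `explain` helper (not part of funcpipe_rag as far as this text shows) that walks the rule tree and records each node's decision. The rule types are stand-ins and their field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDoc:  # stand-in with the four paths named in this core
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:  # field names are assumptions
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def explain(doc, pred, depth=0):
    """Evaluate `pred` and return (decision, indented trace lines)."""
    pad = "  " * depth
    if isinstance(pred, StartsWith):
        ok = getattr(doc, pred.path).startswith(pred.prefix)
        return ok, [f"{pad}StartsWith({pred.path!r}, {pred.prefix!r}) -> {ok}"]
    if isinstance(pred, LenGt):
        ok = len(getattr(doc, pred.path)) > pred.n
        return ok, [f"{pad}LenGt({pred.path!r}, {pred.n}) -> {ok}"]
    if isinstance(pred, All):
        # Evaluate every child (no short-circuit) so the audit trail is complete.
        children = [explain(doc, p, depth + 1) for p in pred.preds]
        ok = all(child_ok for child_ok, _ in children)
        lines = [f"{pad}All -> {ok}"]
        for _, child_lines in children:
            lines.extend(child_lines)
        return ok, lines
    raise TypeError(f"unsupported rule: {pred!r}")


doc = RawDoc("1", "t", "short abstract", "cs.AI")
rule = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
decision, trace = explain(doc, rule)
print("\n".join(trace))
```

Because the rule is data, the audit trail falls out of a second interpreter over the same structure; the hard-coded version would need logging statements threaded through every branch.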
3. Running Project: FuncPipe RAG Builder
We extend the FuncPipe RAG Builder from m02-rag.md:
- Dataset: 10k arXiv CS abstracts (arxiv_cs_abstracts_10k.csv).
- Goal: Turn hard-coded rules into data-driven DSL.
- Start: Hard-coded version (core8_start.py).
- End: DSL rules as data, preserving equivalence.
3.1 Types (Canonical, Used Throughout)
Use the project’s DSL types from src/funcpipe_rag/core/rules_pred.py (re-exported from funcpipe_rag):
from funcpipe_rag import All, CleanConfig, LenGt, RagConfig, RagEnv, RulesConfig, StartsWith
CS_RULE = StartsWith("categories", "cs.")
LONG_RULE = LenGt("abstract", 500)
KEEP_PRED = All((CS_RULE, LONG_RULE))
CS_LONG_RULES = RulesConfig(keep_pred=KEEP_PRED)
config = RagConfig(env=RagEnv(512), keep=CS_LONG_RULES, clean=CleanConfig())
Note: DEFAULT_RULES is RulesConfig(keep_pred=All(())) (no conditions ⇒ keep everything). Pass an explicit rules config like CS_LONG_RULES to actually filter.
3.2 Hard-Coded Start (Anti-Pattern)
from funcpipe_rag import RawDoc

def hard_keep(d: RawDoc) -> bool:
    # Hard-coded path ("categories") and values ("cs.", 500)
    return d.categories.startswith("cs.") and len(d.abstract) > 500
Smells:
- Hard-coded path and values (categories, "cs.").
- Conditions baked into code, which grow into if-else sprawl as rules are added.
- Magic numbers (500).
Problem: Hard to evolve and test; logic ends up scattered.
4. Refactor to DSL: Data-Driven Rules + Interpreter
4.1 DSL Data (Frozen, Composable)
Rule data in config (as defined in §3.1: CS_RULE, LONG_RULE, KEEP_PRED, CS_LONG_RULES).
Properties:
- Frozen: Immutable.
- Composable: All/AnyOf/Not.
- In config: Flows like data.
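What "frozen" buys in practice can be shown with the standard library alone. This is a small sketch with a stand-in `LenGt` (the field names are assumptions): in-place mutation is rejected, and a rule evolves by building a new value with `dataclasses.replace`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class LenGt:  # stand-in; field names are assumptions
    path: str
    n: int


rule = LenGt("abstract", 500)

try:
    rule.n = 100  # attempted in-place mutation
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# Evolving a rule means building a new value; the original is untouched.
stricter = replace(rule, n=1000)
```

Anything that already holds `rule` keeps seeing the threshold 500, so a pipeline configured with it cannot change behaviour behind the caller's back.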
4.1.1 Before-and-After Refactoring Snippet
To cement the transition, here is a mini-example contrasting the hard-coded "before" (the anti-pattern from §3.2) with the "after" that uses DSL data plus the interpreter:
from functools import partial
from funcpipe_rag import All, LenGt, RawDoc, StartsWith, eval_pred

# Before: hard-coded domain logic
def hard_keep(d: RawDoc) -> bool:
    return d.categories.startswith("cs.") and len(d.abstract) > 500

# After: data-driven DSL + pure interpreter (`eval_pred`)
KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
dsl_keep = partial(eval_pred, pred=KEEP_PRED)  # RawDoc -> bool

assert dsl_keep(RawDoc("id", "title", "x" * 501, "cs.AI")) is True
assert dsl_keep(RawDoc("id", "title", "short", "cs.AI")) is False
This refactor eliminates hard-coded logic, making the rules data that is easy to test, evolve, and compose—same inputs always yield the same outputs.
4.2 Pure Interpreter (Evaluates Data)
The project’s pure interpreter is funcpipe_rag.eval_pred (implemented in src/funcpipe_rag/core/rules_pred.py). It only supports the known RawDoc paths (doc_id, title, abstract, categories).
Properties:
- Pure: Deterministic, no effects.
- Tied to data: Evaluates rule structures.
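The real interpreter lives in the project source and is not reproduced here. For readers who want to see the shape of such a function, the following is a minimal sketch under stated assumptions: the rule field names, the `_field` helper, and the error types are all hypothetical stand-ins, not the project's implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RawDoc:
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


@dataclass(frozen=True)
class AnyOf:
    preds: tuple


@dataclass(frozen=True)
class Not:
    pred: object


Pred = Union[StartsWith, LenGt, All, AnyOf, Not]

_KNOWN_PATHS = ("doc_id", "title", "abstract", "categories")


def _field(doc: RawDoc, path: str) -> str:
    # Only the known RawDoc paths are accepted; anything else is an error.
    if path not in _KNOWN_PATHS:
        raise ValueError(f"unknown path: {path!r}")
    return getattr(doc, path)


def eval_pred(doc: RawDoc, pred: Pred) -> bool:
    """Pure: the result depends only on `doc` and `pred`."""
    if isinstance(pred, StartsWith):
        return _field(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(_field(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)  # All(()) is True
    if isinstance(pred, AnyOf):
        return any(eval_pred(doc, p) for p in pred.preds)  # AnyOf(()) is False
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unsupported rule: {pred!r}")
```

The branching has not disappeared; it has moved into one small function that dispatches on rule type, while the domain decisions (which path, which prefix, which threshold) stay in data.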
4.3 Refactored Core (Uses DSL)
Updated core with DSL (building on M02C07 combinators as implemented in src/funcpipe_rag/fp.py):
from funcpipe_rag import (
    All,
    LenGt,
    RagConfig,
    RagEnv,
    RulesConfig,
    StartsWith,
    eval_pred,
    ffilter,
    flatmap,
    flow,
    fmap,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)

config = RagConfig(
    env=RagEnv(512),
    keep=RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500)))),
)
deps = get_deps(config)

def keep_rule(d):
    # Interpret the rule data carried by config; no domain logic lives in the pipeline.
    return eval_pred(d, config.keep.keep_pred)

# `docs` is assumed to be an iterable of RawDoc loaded earlier.
pipeline = flow(
    lambda: docs,
    ffilter(keep_rule),
    fmap(deps.cleaner),
    flatmap(lambda cd: gen_chunk_doc(cd, config.env)),
    fmap(deps.embedder),
)
chunks = structural_dedup_chunks(pipeline())
Properties:
- Data-driven: Rules as data.
- Composable: Via M02C07 combinators.
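To see how a rule-driven filter slots into a combinator pipeline without the project installed, here is a self-contained sketch. The `flow`, `ffilter`, and `fmap` below are simplified stand-ins written for this example; the real M02C07 combinators may differ in signature and behaviour. `Doc` and `LenGt` are likewise hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:  # hypothetical stand-in for RawDoc
    doc_id: str
    abstract: str


@dataclass(frozen=True)
class LenGt:  # stand-in rule; field names are assumptions
    path: str
    n: int


def eval_pred(doc, pred):
    # Single-rule interpreter, enough for this sketch.
    return len(getattr(doc, pred.path)) > pred.n


def ffilter(keep):
    # Stage: lazily keep items for which `keep` is true.
    return lambda xs: (x for x in xs if keep(x))


def fmap(f):
    # Stage: lazily apply `f` to each item.
    return lambda xs: (f(x) for x in xs)


def flow(source, *stages):
    # Build a zero-argument pipeline: call the source, then thread through stages.
    def run():
        xs = source()
        for stage in stages:
            xs = stage(xs)
        return xs
    return run


docs = [Doc("a", "x" * 10), Doc("b", "x" * 3)]
rule = LenGt("abstract", 5)

pipeline = flow(
    lambda: docs,
    ffilter(lambda d: eval_pred(d, rule)),
    fmap(lambda d: d.doc_id),
)
result = list(pipeline())
```

Swapping `rule` for a different value changes what the pipeline keeps without touching any stage, which is the point of routing the rule through config.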
4.4 Public API (Unchanged from M02C05–M02C07)
from funcpipe_rag import full_rag_api_docs, full_rag_api_path, get_deps
# `docs`, `config`, and `boundary_deps` are assumed to be defined by the caller.
chunks, obs = full_rag_api_docs(docs, config, get_deps(config))
res = full_rag_api_path("arxiv_cs_abstracts_10k.csv", config, boundary_deps)
Properties:
- The path-based API still returns a Result; the boundary signatures are unchanged.
4.5 Configurator Tie-In (M02C01)
from funcpipe_rag import make_rag_fn
rag_fn = make_rag_fn(chunk_size=512, keep=CS_LONG_RULES) # docs -> (chunks, obs)
Wins: DSLs compose with M02C01 partial for variants. Note: RagConfig.keep defaults to DEFAULT_RULES (keep everything).
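5. Equational Reasoning: Substitution Exercise
Hand exercise: substitute the rule data into eval_pred.
1. Inline KEEP_PRED = All((CS_RULE, LONG_RULE)). Because the dataclasses are frozen, this is fixed data.
2. Substitute it into eval_pred(d, KEEP_PRED). The result is a bool that depends only on d.
3. Result: for fixed rule data, behaviour is fixed, so the call can be replaced by its expansion.
Bug hunt: in the hard-coded version, the condition is baked into the function body, so there is no rule value to substitute.
Example:
- Hard-coded: d.categories.startswith("cs.") → rigid; changing the rule means editing code.
- DSL: eval_pred(d, KEEP_PRED) → data-driven; a fake rule can be substituted in tests.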
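The substitution steps can also be written out as code and checked mechanically. This is a sketch with stand-in types and a stand-in `eval_pred` (field names are assumptions); each `step` function is one rewrite of the previous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDoc:
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc, pred):
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unsupported rule: {pred!r}")


CS_RULE = StartsWith("categories", "cs.")
LONG_RULE = LenGt("abstract", 500)
KEEP_PRED = All((CS_RULE, LONG_RULE))


def step0(d):  # the original call
    return eval_pred(d, KEEP_PRED)


def step1(d):  # unfold the All case with the inlined tuple
    return all(eval_pred(d, p) for p in (CS_RULE, LONG_RULE))


def step2(d):  # all() over two items is a conjunction
    return eval_pred(d, CS_RULE) and eval_pred(d, LONG_RULE)


def step3(d):  # unfold the two leaf cases
    return d.categories.startswith("cs.") and len(d.abstract) > 500


samples = [
    RawDoc("1", "t", "x" * 501, "cs.AI"),
    RawDoc("2", "t", "short", "cs.AI"),
    RawDoc("3", "t", "x" * 501, "math.CO"),
    RawDoc("4", "t", "short", "math.CO"),
]
agree = all(step0(d) == step1(d) == step2(d) == step3(d) for d in samples)
```

The last step is exactly the hard-coded predicate, which is why the equivalence property in the next section is expected to hold.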
6. Property-Based Testing: Proving DSL Behaviour
Use Hypothesis to check that the refactor preserves behaviour.
6.1 Custom Strategy
Reuse doc_list_strategy from tests/conftest.py. If conftest does not already provide one, add a raw_doc_strategy that generates single RawDoc values.
6.2 DSL Equivalence Property
# tests/test_rag_api.py (DSL equivalence)
from hypothesis import given
from funcpipe_rag import All, LenGt, RawDoc, StartsWith, eval_pred
from tests.conftest import doc_list_strategy

KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))

def hard_keep(d: RawDoc) -> bool:
    return d.categories.startswith("cs.") and len(d.abstract) > 500

@given(docs=doc_list_strategy())
def test_dsl_matches_hard_keep(docs):
    dsl_kept = [d for d in docs if eval_pred(d, KEEP_PRED)]
    hard_kept = [d for d in docs if hard_keep(d)]
    assert dsl_kept == hard_kept
Note: Checks that the DSL keeps exactly the documents the hard-coded predicate keeps.
6.3 DSL Rule Equality Property
from dataclasses import replace
# Continues the test module from §6.2 (given, doc_list_strategy, eval_pred, KEEP_PRED).

@given(docs=doc_list_strategy())
def test_equal_rules_equal_behaviour(docs):
    rules1 = KEEP_PRED
    rules2 = replace(rules1)  # equal but distinct copy
    out1 = [d for d in docs if eval_pred(d, rules1)]
    out2 = [d for d in docs if eval_pred(d, rules2)]
    assert out1 == out2
Note: Verifies that equal rule data yields equal behaviour.
6.4 DSL Algebraic Property
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import All, AnyOf, LenGt, Not, Pred, RawDoc, StartsWith, eval_pred
from tests.conftest import raw_doc_strategy

pred_strategy = st.recursive(
    st.one_of(
        st.builds(StartsWith, st.just("categories"), st.text(max_size=10)),
        st.builds(LenGt, st.just("abstract"), st.integers(min_value=0, max_value=1000)),
    ),
    lambda child: st.one_of(
        st.builds(All, st.tuples(child, child)),
        st.builds(AnyOf, st.tuples(child, child)),
        st.builds(Not, child),
    ),
    max_leaves=20,
)

@given(pred=pred_strategy, doc=raw_doc_strategy())
def test_dsl_double_negation(pred: Pred, doc: RawDoc):
    assert eval_pred(doc, pred) == eval_pred(doc, Not(Not(pred)))
Note: Verifies DSL algebraic properties (e.g., double negation) with generated contexts.
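Double negation is one law; De Morgan's law is another that a rule algebra with All/AnyOf/Not should satisfy. The sketch below checks it by plain enumeration over a small hand-picked sample instead of Hypothesis, so it runs with the standard library only. The types and `eval_pred` are stand-ins with assumed field names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RawDoc:
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class StartsWith:
    path: str
    prefix: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


@dataclass(frozen=True)
class AnyOf:
    preds: tuple


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc, pred):
    if isinstance(pred, StartsWith):
        return getattr(doc, pred.path).startswith(pred.prefix)
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, AnyOf):
        return any(eval_pred(doc, p) for p in pred.preds)
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unsupported rule: {pred!r}")


leaves = [
    StartsWith("categories", "cs."),
    StartsWith("categories", "math."),
    LenGt("abstract", 0),
    LenGt("abstract", 500),
]
docs = [
    RawDoc("1", "t", "", "cs.AI"),
    RawDoc("2", "t", "x" * 501, "cs.LG"),
    RawDoc("3", "t", "x" * 501, "math.CO"),
    RawDoc("4", "t", "short", "stat.ML"),
]


def de_morgan_holds(doc, a, b):
    # not (a and b)  ==  (not a) or (not b)
    lhs = eval_pred(doc, Not(All((a, b))))
    rhs = eval_pred(doc, AnyOf((Not(a), Not(b))))
    return lhs == rhs


checked = [de_morgan_holds(d, a, b) for d, a, b in product(docs, leaves, leaves)]
```

Enumeration only covers the listed samples; the Hypothesis strategy above explores nested rules and arbitrary documents, so the two approaches complement each other.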
6.5 Idempotence Property (DSL-Driven)
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import (
    All, LenGt, Ok, RagBoundaryDeps, RagConfig, RagEnv, RulesConfig, StartsWith,
    full_rag_api_path, get_deps,
)

class FakeReader:
    """Test double: returns preset docs and ignores the path."""
    def __init__(self, docs):
        self._docs = docs
    def read_docs(self, path):
        _ = path
        return Ok(self._docs)

@given(chunk_size=st.integers(128, 1024))
def test_rag_idempotence(chunk_size):
    keep = RulesConfig(keep_pred=All((StartsWith("categories", "cs."), LenGt("abstract", 500))))
    config = RagConfig(env=RagEnv(chunk_size), keep=keep)
    deps = RagBoundaryDeps(core=get_deps(config), reader=FakeReader([]))
    res1 = full_rag_api_path("fake_path", config, deps)
    res2 = full_rag_api_path("fake_path", config, deps)
    assert res1 == res2
Note: Ensures no hidden state with immutable DSL rules and faked deps (see tests/test_rag_api.py for a minimal FakeReader pattern).
6.6 Full Pipeline Equivalence Property
# tests/test_rag_api.py (baseline equivalence)
from hypothesis import given
from funcpipe_rag import (
    DEFAULT_RULES,
    RagConfig,
    clean_doc,
    embed_chunk,
    full_rag_api_docs,
    gen_chunk_doc,
    get_deps,
    structural_dedup_chunks,
)
from tests.conftest import doc_list_strategy, env_strategy

def _baseline_chunks(docs, env):
    cleaned = [clean_doc(d) for d in docs]
    embedded = [embed_chunk(c) for cd in cleaned for c in gen_chunk_doc(cd, env)]
    return structural_dedup_chunks(embedded)

@given(docs=doc_list_strategy(), env=env_strategy())
def test_full_rag_api_docs_matches_baseline(docs, env):
    config = RagConfig(env=env, keep=DEFAULT_RULES)
    deps = get_deps(config)
    chunks, obs = full_rag_api_docs(docs, config, deps)
    assert chunks == _baseline_chunks(docs, env)
    assert obs.total_docs == len(docs)
Note: Tests the full API matches a baseline built from the pure stages (with DEFAULT_RULES ⇒ keep everything).
6.7 Shrinking Demo: Catching a Leaky Bug
A bad wrapper that keeps the rule in mutable global state:
from funcpipe_rag import All, LenGt, Not, StartsWith, eval_pred

KEEP_PRED = All((StartsWith("categories", "cs."), LenGt("abstract", 500)))
MUTABLE_PRED = KEEP_PRED

def bad_keep(doc) -> bool:
    global MUTABLE_PRED
    MUTABLE_PRED = Not(MUTABLE_PRED)  # Leaky mutation: rebinds the global on every call
    return eval_pred(doc, MUTABLE_PRED)
Property:
from hypothesis import given
from tests.conftest import raw_doc_strategy

@given(doc=raw_doc_strategy())
def test_bad_dsl_is_not_idempotent(doc):
    global MUTABLE_PRED
    MUTABLE_PRED = KEEP_PRED  # reset before each example
    out1 = bad_keep(doc)
    out2 = bad_keep(doc)
    assert out1 == out2  # expected to fail: the second call sees a negated rule
Failure Trace (Example):
Falsifying example: test_bad_dsl_is_not_idempotent(
    doc=RawDoc(doc_id='1', title='t', abstract='...', categories='cs.AI'),
)
AssertionError
Analysis: Shrinks to a minimal RawDoc where toggling the global predicate flips the result; catches mutation bug.
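The usual fix is to stop reading the rule from shared state and pass it as an argument. Here is a self-contained sketch contrasting the two; `good_keep` is a hypothetical name, the types are stand-ins with assumed field names, and a one-element list plays the role of the mutable global.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDoc:
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class Not:
    pred: object


def eval_pred(doc, pred):
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, Not):
        return not eval_pred(doc, pred.pred)
    raise TypeError(f"unsupported rule: {pred!r}")


RULE = LenGt("abstract", 500)
_leaky = [RULE]  # mutable holder standing in for the global in the demo


def bad_keep(doc):
    _leaky[0] = Not(_leaky[0])  # hidden state change on every call
    return eval_pred(doc, _leaky[0])


def good_keep(doc, pred):
    return eval_pred(doc, pred)  # depends only on its arguments


doc = RawDoc("1", "t", "x" * 501, "cs.AI")
bad_results = (bad_keep(doc), bad_keep(doc))
good_results = (good_keep(doc, RULE), good_keep(doc, RULE))
```

With the rule passed in, the idempotence property holds by construction, since nothing the function reads can change between the two calls.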
7. When DSLs Aren't Worth It
Use if-else only in:
- Trivial one-rule logic.
- Legacy code wrapping DSLs.
Guardrails: Isolate to <5 lines; prefer DSLs for domain rules.
Example:
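One plausible shape for an acceptable branch, offered as a sketch only: a legacy entry point whose single `if` picks a default and then delegates to the DSL. The names `legacy_should_index` and `DEFAULT_KEEP` are hypothetical, and the types are stand-ins with assumed field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDoc:
    doc_id: str
    title: str
    abstract: str
    categories: str


@dataclass(frozen=True)
class LenGt:
    path: str
    n: int


@dataclass(frozen=True)
class All:
    preds: tuple


def eval_pred(doc, pred):
    if isinstance(pred, LenGt):
        return len(getattr(doc, pred.path)) > pred.n
    if isinstance(pred, All):
        return all(eval_pred(doc, p) for p in pred.preds)
    raise TypeError(f"unsupported rule: {pred!r}")


DEFAULT_KEEP = All(())  # no conditions: keep everything


def legacy_should_index(doc, pred=None):
    # One trivial branch for the default; the domain rule itself stays in data.
    if pred is None:
        pred = DEFAULT_KEEP
    return eval_pred(doc, pred)


doc = RawDoc("1", "t", "short", "cs.AI")
kept_by_default = legacy_should_index(doc)
kept_if_long = legacy_should_index(doc, LenGt("abstract", 500))
```

The branch is about plumbing (choosing a default), not about the domain, and it stays well under the five-line guardrail.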
8. Pre-Core Quiz
- If-else chain? → Hard-coded logic.
- Mutable rule? → frozen=True.
- Magic path? → LenGt("path", value).
- Global rule? → Pass as param.
- Prove rules? → Hypothesis recursive.
9. Post-Core Reflection & Exercise
Reflect: Find if-else domain logic. Refactor to frozen rule data + interpreter; add Hypothesis for equivalence/idempotence.
Project Exercise: Apply to RAG (e.g., keep as DSL); run properties.
- Did data reduce branches?
- Did interpreter enable tests?
- Did composability clarify logic?
Next: Core 9 – Debugging FP Code.
Verify all patterns with Hypothesis—examples provided show how to detect impurities like globals or non-determinism.
Further Reading: For more on closures in Python, see 'Fluent Python' by Luciano Ramalho. Explore toolz for advanced partials once comfortable.