Module 1: Foundational FP Concepts¶
Progression Note¶
By the end of Module 1, you'll master purity laws, write pure functions, and refactor impure code using Hypothesis. This builds the foundation for lazy streams in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: ... | ... | ... |
| ... | ... | ... |
M01C02: Pure Functions & Contracts – Inputs, Outputs, No Shared Mutation¶
Core question:
How do you constrain functions so that purity violations—hidden inputs, nondeterministic outputs, or shared mutation—are detectable early by signatures, properties, and optional runtime checks?
This core builds on Core 1's functional mindset by adding detectable contracts for purity:
- Make all inputs explicit (no globals, OS env, time, RNG).
- Aim for deterministic outputs (same args → same result).
- Never mutate shared state (return new values).
We continue the running project from Core 1: refactoring the FuncPipe RAG Builder to pure, now with contracts on stages like clean_doc, chunk_doc, and embed_chunk.
Audience: Developers who refactored to pure in Core 1 but still see flaky tests from hidden globals, nondeterministic RNG/time, or accidental argument mutation.
Outcome:
1. Write a pure function whose purity is checkable by Hypothesis in < 10 lines.
2. Explain why random.random() or os.getenv breaks purity—and how to fix it with explicit params.
3. Add properties detecting determinism + no input mutation whenever those are part of the contract.
4. Spot and fix three classic violations: hidden inputs, nondeterministic outputs, shared mutation.
1. Conceptual Foundation¶
1.1 The One-Sentence Rule¶
Make every input explicit, every output deterministic, and never mutate shared state—or mark the function impure and isolate it.
1.2 Pure Function Contract in One Precise Sentence¶
A pure function contract requires: all inputs explicit parameters, outputs deterministic, no shared state mutated, no side effects (except raising exceptions based on inputs)—detectable with high confidence by property tests and, to a lesser extent, type hints and runtime checks.
1.3 Why This Matters Now¶
Detectable contracts catch purity violations early, ensuring functions from Core 1 compose safely in pipelines (Core 4) and satisfy equational laws (Core 9); without detection, hidden state leads to flaky tests and races.
1.4 Contracts: Three Layers¶
Contracts go beyond purity to include preconditions (e.g., "amount >= 0"), postconditions (e.g., "sum of balances unchanged"), and invariants (e.g., "abstract has no double spaces"). Enforce them across three layers:
| Layer | Description | Examples | When to Use |
|---|---|---|---|
| Static (typing) | Shape and domain of data | Type hints like list[RawDoc] |
For structure/shape; encourages explicitness |
| Dynamic (asserts) | Runtime checks that raise errors | assert amount >= 0 |
For simple, enforceable conditions |
| Behavioral (Hypothesis) | Relational properties over inputs/outputs | f(g(x)) == g(f(x)) |
For subtle behaviors like determinism, no mutation, invariants |
Guidance: If it's about data shape/structure → use types. If it's a simple check → use asserts. If it's relational (e.g., output properties, no side effects) → use property tests.
2. Mental Model: Contract Violations vs Detection¶
2.1 One Picture¶
Leaky Contract (violations) Detected Contract
+---------------------------+ +---------------------------+
| hidden globals / time | | explicit params only |
| hidden RNG / OS env vars | | |
| mutates caller data | | returns new values |
| prints / logs / I/O | | |
| same args → different out | | same args → same out |
+---------------------------+ +---------------------------+
Hypothesis falsifies leaks; types + runtime checks catch them early.
2.2 Contract Table¶
| Clause | Violation Example | Detected By |
|---|---|---|
| Explicit inputs | Globals, os.getenv, datetime.now() |
Code review + discipline; types encourage explicitness |
| Deterministic outputs | random, time, external state |
Hypothesis determinism property |
| No shared mutation | list.sort(), dict.update() |
Hypothesis mutation check + deepcopy |
| No side effects | print, logging, I/O |
Manual review |
Note on Shared Mutation: Includes nested structures (aliasing); use deepcopy in properties for detection. Unseeded RNG violates determinism; seeded RNG is acceptable if the seed is an explicit argument. Note that Hypothesis cannot automatically detect side effects like printing; capture streams manually if needed.
In the rest of this core we turn those columns into concrete contracts on the RAG stages and one non-RAG function.
3. Running Project: Contracts on RAG Stages¶
Our running project (from module-01/funcpipe-rag-01/README.md) adds contracts to Core 1's pure stages.
- Goal: Ensure each stage is pure and detectable.
- Start: Core 1's pure functions.
- End (this core): Functions with Hypothesis properties for determinism, no mutation, and invariants. Semantics aligned with Core 1.
3.1 Types (Canonical)¶
These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here.
3.2 Impure Variants (Anti-Patterns in RAG)¶
Full code:
# Impure clean (hidden global)
from funcpipe_rag import RawDoc, CleanDoc
DEBUG = True
def impure_clean_doc(doc: RawDoc) -> CleanDoc:
abstract = " ".join(doc.abstract.strip().lower().split())
if DEBUG:
abstract += " (debug)"
return CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)
# Impure chunk (nondeterministic)
import random
from typing import List
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv
def impure_chunk_doc(doc: CleanDoc, env: RagEnv) -> List[ChunkWithoutEmbedding]:
text = doc.abstract
offset = random.randint(0, min(10, max(0, len(text) - 1))) # Hidden RNG; violates determinism
return [
ChunkWithoutEmbedding(doc.doc_id, text[i + offset:i + offset + env.chunk_size], i + offset,
i + offset + len(text[i + offset:i + offset + env.chunk_size]))
for i in range(0, len(text), env.chunk_size)
] # Arbitrary offset for demo; not production chunking logic
# Impure embed (mutates input)
import hashlib
from dataclasses import dataclass
from funcpipe_rag import Chunk
@dataclass # Mutable for demo
class MutableChunkWithoutEmbedding:
doc_id: str
text: str
start: int
end: int
def impure_embed_chunk(chunk: MutableChunkWithoutEmbedding) -> Chunk:
# Mutates text (shared mutation)
chunk.text = chunk.text.upper()
h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
step = 4
vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)
Smells: Hidden global (DEBUG), nondeterministic RNG (violates determinism as offset varies), shared mutation (chunk.text). Note: Used mutable class for mutation demo; canonical types remain frozen.
4. Correct Pattern: Detectable Contracts in RAG¶
4.1 Pure Core¶
The canonical implementations of clean_doc, chunk_doc, embed_chunk, and structural_dedup_chunks live in module-01/funcpipe-rag-01/src/funcpipe_rag/pipeline_stages.py (identical to Core 1). Each function accepts only explicit parameters, returns brand-new dataclasses, and never mutates inputs—exactly the contract we want to enforce in this core.
4.2 Non-RAG Example¶
Use your own domain code (e.g., accounting transfers) to practice adding explicit inputs, determinism, and invariants—the same discipline applied to the FuncPipe stages. Keep side effects isolated just as we do with rag_shell.
4.3 Impure Shell (Edge Only)¶
The shell from Core 1 remains; contracts focus on pure core.
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Replace expressions in clean_doc.
1. Inline abstract = " ".join(...) → normalized string.
2. Substitute into CleanDoc → fixed value.
Bug Hunt: In impure_clean_doc, the value of CleanDoc(..., abstract, ...) depends on the hidden global DEBUG, so you can’t safely replace the call with its value without also knowing that global state.
6. Property-Based Testing: Proving Equivalence (Advanced, Optional)¶
Use Hypothesis to prove contract compliance.
You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.
We saw the simplest determinism property in Core 1; now we’ll apply the same idea to the RAG stages.
To enhance the explanation, here's a short demo of falsifying an impure function using Hypothesis:
import random
from hypothesis import given
import hypothesis.strategies as st
def impure_random_add(x: int) -> int:
return x + random.randint(1, 10) # Non-deterministic
@given(st.integers())
def test_detect_impurity(x):
assert impure_random_add(x) == impure_random_add(x) # Falsifies due to randomness
# Hypothesis will quickly find differing outputs for the same x
This property test detects the impurity by showing outputs vary for identical inputs—run it to see Hypothesis in action.
6.1 Custom Strategy (RAG Domain)¶
From module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).
6.2 Contract Properties for RAG Stages¶
Full code:
# module-01/funcpipe-rag-01/tests/test_laws.py (excerpt)
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import clean_doc, chunk_doc, embed_chunk
from funcpipe_rag import ChunkWithoutEmbedding
from .conftest import raw_doc_strategy, env_strategy
@given(doc=raw_doc_strategy())
def test_clean_doc_deterministic(doc):
assert clean_doc(doc) == clean_doc(doc)
@given(doc=raw_doc_strategy())
def test_clean_doc_invariants(doc):
cleaned = clean_doc(doc)
# no double spaces
assert " " not in cleaned.abstract
# normalized whitespace and case
assert cleaned.abstract == " ".join(doc.abstract.strip().lower().split())
@given(doc=raw_doc_strategy(), env=env_strategy())
def test_chunk_doc_deterministic(doc, env):
cleaned = clean_doc(doc)
assert chunk_doc(cleaned, env) == chunk_doc(cleaned, env)
@given(doc=raw_doc_strategy(), env=env_strategy())
def test_chunk_doc_covers_cleaned(doc, env):
cleaned = clean_doc(doc)
assert "".join(c.text for c in chunk_doc(cleaned, env)) == cleaned.abstract
chunk_we_strategy = st.builds(
ChunkWithoutEmbedding,
doc_id=st.text(),
text=st.text(min_size=1),
start=st.integers(min_value=0, max_value=1000),
).map(
lambda c: ChunkWithoutEmbedding(
c.doc_id, c.text, c.start, c.start + len(c.text)
)
)
@given(chunk_we=chunk_we_strategy)
def test_embed_deterministic(chunk_we):
assert embed_chunk(chunk_we) == embed_chunk(chunk_we)
@given(chunk_we=chunk_we_strategy)
def test_embed_range_and_dimension(chunk_we):
emb = embed_chunk(chunk_we).embedding
assert len(emb) == 16
assert all(0.0 <= x <= 1.0 for x in emb)
import copy
@given(chunk_we=chunk_we_strategy)
def test_embed_does_not_mutate_input(chunk_we):
original = copy.deepcopy(chunk_we)
_ = embed_chunk(chunk_we)
assert chunk_we == original
These real-world properties encode determinism, invariants, no mutation, and pure behavior directly in the repository.
6.3 Shrinking Demo: Catching a Bug¶
Bad refactor (off-by-one in chunk_doc end, dropping last char):
Full code:
from typing import List
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv
def bad_chunk_doc(doc: CleanDoc, env: RagEnv) -> List[ChunkWithoutEmbedding]:
text = doc.abstract
return [
ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]) - 1)
# Off-by-one
for i in range(0, len(text), env.chunk_size)
]
Property:
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import CleanDoc, RagEnv
from .conftest import env_strategy
@given(
doc=st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(min_size=1),
categories=st.text()),
env=env_strategy(),
)
def test_bad_chunk_doc_index_invariant(doc: CleanDoc, env: RagEnv) -> None:
chunks = bad_chunk_doc(doc, env)
for c in chunks:
assert c.end - c.start == len(c.text) # Fails on off-by-one
@given(
doc=st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(min_size=1),
categories=st.text()),
env=env_strategy(),
)
def test_bad_chunk_doc_covers_abstract(doc: CleanDoc, env: RagEnv) -> None:
chunks = bad_chunk_doc(doc, env)
reconstructed = "".join(c.text for c in chunks)
assert reconstructed == doc.abstract # Fails on dropped chars
These two properties encode the “index” and “coverage” invariants we care about for chunks.
Hypothesis failure trace (run to verify; example for index_invariant):
Falsifying example: test_bad_chunk_doc_index_invariant(
doc=CleanDoc(doc_id='a', title='', abstract='a', categories=''),
env=RagEnv(chunk_size=128),
)
AssertionError
- Shrinks to minimal doc with abstract='a'; off-by-one makes end - start != len(text). For coverage, shrinks to doc with abstract length not multiple of chunk_size, dropping tail. Catches subtle bug via shrinking.
7. When Contracts Aren't Worth It¶
For ultra-hot paths, skip runtime checks; rely on properties in tests.
8. Pre-Core Quiz¶
- Hidden global in a function → violates which clause? → Explicit inputs
random.random()inside a function → violates? → Deterministic outputsdata.sort()on a param → violates? → No shared mutationprint()inside a function → violates? → No side effects- Tool that searches for counterexamples to determinism? → Hypothesis
9. Post-Core Reflection & Exercise¶
Reflect: In your code, find one function that violates a contract. Apply the recipe; add properties.
Project Exercise: Implement contracts on RAG stages; run properties on sample data.
All claims (e.g., referential transparency) are verifiable via the provided Hypothesis examples—run them to confirm.
Further Reading: For more on purity pitfalls, see 'Fluent Python' Chapter on Functions as Objects. Check free resources like Python.org's FP section or Codecademy's Advanced Python course for basics.
Next: Core 3 – Immutability & Value Semantics. (Builds on this RAG pure core.)