Module 1: Foundational FP Concepts¶
Progression Note¶
By the end of Module 1, you'll master purity laws, write pure functions, and refactor impure code using Hypothesis. This builds the foundation for lazy streams in Module 3. See the series progression map in the repo root for full details.
Here's a snippet from the progression map:
| Module | Focus | Key Outcomes |
|---|---|---|
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: ... | ... | ... |
| ... | ... | ... |
M01C05: Local FP Refactorings – Replace In-Place Loops with Small Pure Transforms¶
Core question:
How do you systematically refactor imperative, stateful code into small, pure, composable functions—so that hidden mutations vanish, equational reasoning becomes possible, and every change is locally verifiable?
This core builds on Core 1's mindset, Core 2's contracts, Core 3's immutability, and Core 4's composition by applying it locally: refactor one function at a time, turning loops, mutable accumulators, and in-place updates into pure transforms over immutable data.
We continue the running project from Core 1-4: refactoring the FuncPipe RAG Builder, now replacing imperative accumulators with pure equivalents.
Audience: Developers who love Core 4 pipelines but still have legacy functions full of for loops, while, list.append(), dict updates, and index juggling.
Outcome:
1. Refactor any stateful function into pure equivalent in < 20 lines.
2. Explain why manual accumulators hurt reasoning/testing vs comprehensions/folds.
3. Add Hypothesis properties proving refactor equivalence + new purity.
4. Spot and fix three classic refactor blockers: multiple accumulators, index-dependent logic, mutable defaults.
1. Conceptual Foundation¶
1.1 The One-Sentence Rule¶
Default to no mutation of inputs/shared state; keep loops index-free when possible; express work as map/filter/fold over immutable data.
1.2 Local Refactoring in One Precise Sentence¶
Local FP refactoring replaces stateful code by extracting pure functions, eliminating mutable accumulators/shared state, and preferring comprehensions/folds over index-heavy loops while preserving observable behavior.
1.3 Why This Matters Now¶
Local refactoring operationalizes Core 1-4's tools on real code, enabling typed pipelines (Core 6) and effect extraction (Core 7); without it, legacy state blocks progress.
1.4 Refactor Moves in This Core¶
- Loop → map
- Accumulator → comprehension
- In-place update → new object
1.5 Purity Spectrum Table¶
| Level | Description | Example |
|---|---|---|
| Fully Pure | Explicit inputs/outputs only | def add(x: int, y: int) -> int: return x + y |
| Semi-Pure | Observational taps (e.g., logging) | def add_with_log(x: int, y: int) -> int: log(f"Adding {x}+{y}"); return x + y |
| Impure | Globals/I/O/mutation | def read_file(path: str) -> str: ... |
Note on Semi-Pure: This is a pragmatic category for functions that are observationally transparent for business logic (i.e., they behave as if pure for callers) but are still impure in the strict FP sense due to side effects like logging. Allow this only at edges or in debugging taps.
2. Mental Model: Stateful Mess vs Pure Transform¶
2.1 One Picture¶
Stateful Loop Pure Transform
+----------------------------------+ +--------------------------------+
| results = [] | | results = [ |
| totals = {} | | f(x) for x in xs |
| for x in xs: | | if pred(x) |
| results.append(f(x)) | | ] |
| totals[x] = totals.get(x,0)+1| | totals = Counter(xs) |
| return results, totals | +--------------------------------+
+----------------------------------+ No mutation of inputs/shared state
2.2 Contract Table¶
| Clause | Violation Example | Detected By |
|---|---|---|
| No shared mutation | In-place list.sort() / append | Hypothesis + deepcopy equality |
| Equivalence | Old vs new disagree | Hypothesis golden-test property |
| No hidden state | Closure over mutable | Manual review + Hypothesis |
| Determinism | Random/state inside loop | Hypothesis determinism |
Note on Refactors: Safe only when old/new agree on all inputs—prove it. Note: Some stateful variants here are observationally pure (their mutation is confined to local variables), others like stateful_add_chunk are truly impure (shared mutable defaults). We treat both as refactor targets for clarity and safety.
3. Running Project: Local Refactorings in RAG¶
Our running project (from module-01/funcpipe-rag-01/README.md) refactors RAG stages locally to pure.
- Goal: Eliminate mutable accumulators/indexing.
- Start: Core 1-4's pure functions.
- End (this core): Refactored stages with equivalence properties. Semantics aligned with Core 1-4.
3.1 Types (Canonical)¶
These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here.
3.2 Stateful Variants (Anti-Patterns in RAG)¶
Full code:
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, RagEnv, Chunk
import hashlib
# Stateful clean (mutable accum)
def stateful_clean_doc(doc: RawDoc) -> CleanDoc:
abstract = ""
for word in doc.abstract.strip().lower().split():
abstract += word + " " # Mutable string accum
return CleanDoc(doc.doc_id, doc.title, abstract.strip(), doc.categories)
# Stateful chunk (multiple accumulators)
def stateful_chunk_doc(doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
text = doc.abstract
chunks: list[ChunkWithoutEmbedding] = []
starts: list[int] = []
i = 0
while i < len(text):
end = min(i + env.chunk_size, len(text))
chunks.append(ChunkWithoutEmbedding(doc.doc_id, text[i:end], i, end))
starts.append(i) # Conflated accum
i = end
return tuple(chunks) # Second accum unused; bug risk
# Stateful embed (mutable accum)
def stateful_embed_chunk(chunk: ChunkWithoutEmbedding) -> Chunk:
h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
step = 4
vec = []
for i in range(0, 64, step):
val = int(h[i:i + step], 16) / (16 ** step - 1)
vec.append(val) # Mutable accum
return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, tuple(vec))
# Stateful mutable default (classic blocker)
def stateful_add_chunk(chunks: list[Chunk] = []) -> list[Chunk]: # Mutable default
chunks.append(Chunk("id", "text", 0, 5, tuple(range(16))))
return chunks
Smells: Mutable accum (string/list), multiple accum (error-prone), manual indexing (off-by-one risk), mutable defaults (shared across calls).
3.3 RAG-Specific Refactors¶
RAG Example (clean_doc): Accumulator → Comprehension¶
Before: abstract = "" + loop concat.
After:
from funcpipe_rag import RawDoc, CleanDoc
def clean_doc(doc: RawDoc) -> CleanDoc:
abstract = " ".join(doc.abstract.strip().lower().split())
return CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)
RAG Example (chunk_doc): Loop → Map¶
module-01/funcpipe-rag-01/src/funcpipe_rag/pipeline_stages.py uses a single list comprehension—no manual indexes beyond the comprehension range, and no mutable accumulators:
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv
def chunk_doc(doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
text = doc.abstract
return tuple(
ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
for i in range(0, len(text), env.chunk_size)
)
Note: If your legacy function returns tuples, keep the same outer container type in the refactor until you’ve audited callers. Here we keep tuple for equivalence with the stateful variant; in real refactors, you’d likely keep the original type until you can safely change it.
RAG Example (embed_chunk): In-Place Update → New Object¶
Before: vec = [] + append.
After:
import hashlib
from funcpipe_rag import ChunkWithoutEmbedding, Chunk
def embed_chunk(chunk: ChunkWithoutEmbedding) -> Chunk:
h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
step = 4
vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)
4. Refactor to Pure: Small Transforms (Generic Examples)¶
4.1 Loop → Map¶
How to spot this smell: Manual iteration with mutable state (e.g., append to list) or index juggling (i=0; while i < len; i += step).
Before/After Diff (Generic Example):
Before (stateful loop):
def double_evens(xs: list[int]) -> list[int]:
res = []
for x in xs:
if x % 2 == 0:
res.append(x * 2)
return res
After (map + filter):
def double_evens(xs: list[int]) -> list[int]:
return list(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, xs)))
Process: Extract predicates/transforms to small functions; chain filter/map; materialize as list for caller compatibility (we treat it immutably by convention here); you can later switch to a tuple when you know callers don’t rely on mutation. In idiomatic Python you’d usually prefer a comprehension here: list(x * 2 for x in xs if x % 2 == 0), but we’re using map/filter to make the refactor move explicit. Note: In real migrations, keep the outer container type identical (list→list) until you’ve proven no caller depends on mutability; then you can separately refactor list→tuple.
4.2 Accumulator → Comprehension¶
How to spot this smell: Empty list/dict/string + loop appending/updating/concatenating.
Before/After Diff (Generic Example):
Before (mutable accum):
After (comprehension + sum):
Process: Replace accum init + update with generator expression + built-in aggregator (sum, ''.join, dict, set).
4.3 In-Place Update → New Object¶
How to spot this smell: Modifying inputs/arguments (e.g., xs.sort(), d[k] = v) or mutable defaults.
Before/After Diff (Generic Example):
Before (in-place update):
After (new object):
Process: Create new container from comprehension; avoid mutating inputs.
4.4 Non-RAG Example: API Handler Refactor¶
Before (stateful updates):
def handle_request(req: dict) -> dict:
if 'user' not in req:
req['error'] = 'no user' # Mutates input
return req
logs = []
logs.append(f"User: {req['user']}")
if 'action' in req:
logs.append(f"Action: {req['action']}")
req['logs'] = logs # Mutates
return req
After (new objects):
def handle_request(req: dict) -> dict:
if 'user' not in req:
return {**req, 'error': 'no user'}
logs = [f"User: {req['user']}"]
if 'action' in req:
logs += [f"Action: {req['action']}"]
return {**req, 'logs': logs}
Moves Applied: Accumulator → comprehension (logs as list literal + concat); In-place update → new object ({**req, ...}). Local mutation of short-lived variables (like logs inside a function) is usually fine; what we’re avoiding is mutation of inputs or shared state.
4.5 Non-RAG Example: CLI Data Cleaner Refactor¶
Before (mutable accum):
def clean_lines(lines: list[str]) -> list[str]:
cleaned = []
for line in lines:
stripped = line.strip()
if stripped:
cleaned.append(stripped.lower())
return cleaned
After (comprehension):
def clean_lines(lines: list[str]) -> list[str]:
return [line.strip().lower() for line in lines if line.strip()]
Moves Applied: Loop → map (implicit in comprehension); Accumulator → comprehension. Note: Here we keep list→list for equivalence; change to tuple later if callers allow.
4.6 Impure Shell (Edge Only)¶
The shell from Core 1 remains; refactoring focuses on core. Use 'with' for resource safety in impure boundaries—full details in Module 7 (inspired by PEP 343).
5. Equational Reasoning: Substitution Exercise¶
Hand Exercise: Replace expressions in chunk_doc.
1. Inline generator → tuple of chunks.
2. Substitute into full_rag → fixed value.
Bug Hunt: In stateful_chunk_doc, substitution fails (multiple accum risks).
6. Property-Based Testing: Proving Equivalence (Advanced, Optional)¶
Use Hypothesis to prove refactor equivalence.
You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.
To bridge theory and practice, here's a short demo of falsifying an impure function using Hypothesis:
import random
from hypothesis import given
import hypothesis.strategies as st
def impure_random_add(x: int) -> int:
return x + random.randint(1, 10) # Non-deterministic
@given(st.integers())
def test_detect_impurity(x):
assert impure_random_add(x) == impure_random_add(x) # Falsifies due to randomness
# Hypothesis will quickly find differing outputs for the same x
This property test detects the impurity by showing outputs vary for identical inputs—run it to see Hypothesis in action.
6.1 Mutation Detection Example¶
As mentioned in the contract table, use deepcopy to detect shared mutation:
from hypothesis import given
import hypothesis.strategies as st
from copy import deepcopy
def bad_increment(xs: list[int]) -> list[int]:
xs[0] += 1 # Mutates input
return xs
@given(st.lists(st.integers(), min_size=1))
def test_no_mutation(xs: list[int]) -> None:
original = deepcopy(xs)
_ = bad_increment(xs)
assert xs == original # Fails on mutation
6.2 Custom Strategy (RAG Domain)¶
From module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).
6.3 Equivalence Property¶
module-01/funcpipe-rag-01/tests/test_laws.py already encodes the key refactoring guarantees. For example:
from hypothesis import given
from funcpipe_rag import docs_to_embedded # impure_chunks from full_rag.py
from funcpipe_rag import impure_chunks
from .conftest import doc_list_strategy, env_strategy
@given(docs=doc_list_strategy(), env=env_strategy())
def test_refactor_preserves_structure(docs, env):
old = sorted(
(c["doc_id"], c["text"], c["start"], c["end"])
for c in impure_chunks(docs, env)
)
new = sorted(
(c.doc_id, c.text, c.start, c.end)
for c in docs_to_embedded(docs, env)
)
assert old == new
This property compares the legacy dictionary-based pipeline with the refactored pure pipeline and proves that every chunk (doc_id/text/start/end) matches exactly despite the internal rewrite. It’s the automated safety net for the refactors described in this core.
6.4 Shrinking Demo: Catching a Bug¶
Bad refactor (off-by-one in chunk_doc):
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv
def bad_chunk_doc(doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
text = doc.abstract
return tuple(
ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]) - 1)
# Off-by-one
for i in range(0, len(text), env.chunk_size)
)
Property:
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import CleanDoc, RagEnv
from .conftest import env_strategy
@given(
doc=st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(min_size=1),
categories=st.text()),
env=env_strategy(),
)
def test_bad_chunk_doc_equivalence(doc: CleanDoc, env: RagEnv) -> None:
assert stateful_chunk_doc(doc, env) == bad_chunk_doc(doc, env) # Fails on off-by-one
Hypothesis failure trace (run to verify; example):
Falsifying example: test_bad_chunk_doc_equivalence(
doc=CleanDoc(doc_id='a', title='', abstract='a', categories=''),
env=RagEnv(chunk_size=128),
)
AssertionError
- Shrinks to minimal doc with abstract='a'; off-by-one drops char, failing equivalence. Catches bug via shrinking.
7. When Local Refactoring Isn't Worth It¶
Rarely, for profiled hot paths (e.g., tight loops), keep imperative but wrap in pure API.
8. Pre-Core Quiz¶
- Mutable accumulator → violates? → No shared mutation
- Multiple temps → better as? → Tuple/frozen record
- Index juggling → better as? → Comprehension
- Deep recursion → mitigate? → Iteration or explicit stack
- Prove refactor safe? → Hypothesis equivalence
9. Post-Core Reflection & Exercise¶
Reflect: In your code, find one stateful function. Refactor to pure; add equivalence properties.
Project Exercise: Refactor RAG stages; run properties on sample data.
All claims (e.g., referential transparency) are verifiable via the provided Hypothesis examples—run them to confirm.
Further Reading: For more on purity pitfalls, see 'Fluent Python' Chapter on Functions as Objects. Check free resources like Python.org's FP section or Codecademy's Advanced Python course for basics.
Next: Core 6 – Small Combinator Libraries. (Builds on this RAG pure core.)