M05C10: Performance & Ergonomics of Heavy ADTs (When to Compromise Representation)¶
Progression Note¶
By the end of Module 5, you will model every domain concept as immutable algebraic data types (products and tagged sums), eliminating whole classes of runtime errors through exhaustive pattern matching, mypy-checked totality, and pure serialization contracts.
| Module | Focus | Key Outcomes |
|---|---|---|
| 4 | Safe Recursion & Error Handling | Stack-safe tree recursion, folds, Result/Option, streaming validation/retries |
| 5 | Advanced Type-Driven Design | ADTs, exhaustive pattern matching, total functions, refined types |
| 6 | Monadic Flows as Composable Pipelines | bind/and_then, Reader/State-like patterns, error-typed flows |
Core question
When — and how — should you compromise on pure immutable ADTs for performance or ergonomics in hot paths, while proving that the optimized representation is semantically equivalent to the pure one?
Every ADT-heavy system eventually hits the same three realities:
- “Why is my embedding batch 40× slower than the NumPy version?”
- “Why do I have 400 lines of boilerplate just to copy 5 fields?”
- “Why does adding one field allocate 10 million objects?”
The naïve first reaction:
# BEFORE – pure but slow / verbose
def embed_batch(chunks: list[Chunk]) -> list[Validation[Chunk, ErrInfo]]:
    out = []
    for c in chunks:
        emb = embed_one(c.text.content)                # pure Embedding, no per-item error
        validated = assemble(c.text, c.metadata, emb)  # Validation only from assemble
        out.append(validated)
    return out  # N temporary tuples, N Chunk replacements
40× slower, millions of temporary objects, verbose.
The production pattern: keep pure ADTs at boundaries and for business logic; in measured hot paths, use optimized mutable/flat/NumPy representations — but prove equivalence with property tests.
# AFTER – hybrid, fast, proven correct
obatch = to_optimized_batch(chunks) # O(N), mutable internals
obatch.embeddings = embed_many([r.text for r in obatch.rows]) # NumPy, zero-copy
validated_chunks = from_optimized_batch(obatch) # back to pure ADTs, proven equivalent
Same semantics, 30–100× faster, zero boilerplate in hot loop.
Audience: Engineers who love ADTs but hit real performance walls and want principled, proven compromises.
Outcome
1. Every hot path measured and justified.
2. Optimized representations proven equivalent to pure ADTs.
3. Full speed in loops, full safety at boundaries — forever.
Tiny Non-Domain Example – Vector Normalization Batch¶
# Pure but slow (creates N temporary tuples)
import math

def normalize_pure(vs: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    out = []
    for v in vs:
        norm = math.sqrt(sum(x * x for x in v))
        out.append(tuple(x / norm for x in v))  # note: raises ZeroDivisionError on zero vectors
    return out
# Hybrid – fast NumPy, proven equivalent
import numpy as np

def to_array(vs: list[tuple[float, ...]]) -> np.ndarray:
    return np.array(vs, dtype=np.float32)

def from_array(arr: np.ndarray) -> list[tuple[float, ...]]:
    return [tuple(row.tolist()) for row in arr]

def normalize_hybrid(vs: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    arr = to_array(vs)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    arr /= norms.clip(min=1e-12)  # avoid div-by-zero
    return from_array(arr)
normalize_hybrid is 50–200× faster on 100k vectors and equivalent to normalize_pure up to float32 tolerance on non-zero vectors; a property test like the sketch below demonstrates this.
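A minimal equivalence test, assuming Hypothesis is installed; the vector strategy and tolerances here are illustrative, and zero vectors are filtered out because the two implementations diverge there:

from hypothesis import given, strategies as st
import numpy as np

# Non-empty lists of non-zero 3-D vectors (illustrative strategy).
vec_strat = st.lists(
    st.tuples(*[st.floats(min_value=-1e3, max_value=1e3)] * 3).filter(
        lambda v: any(abs(x) > 1e-6 for x in v)
    ),
    min_size=1,
    max_size=100,
)

@given(vs=vec_strat)
def test_normalize_equivalence(vs):
    pure = normalize_pure(vs)
    hybrid = normalize_hybrid(vs)
    # the hybrid path round-trips through float32, so compare with tolerance
    assert np.allclose(pure, hybrid, rtol=1e-4, atol=1e-6)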
Why Compromise Representation? (Three bullets every engineer should internalise)¶
- Measured bottlenecks: Pure ADTs allocate heavily in loops — NumPy/tuples can be 10–100× faster with far fewer Python objects.
- Proven equivalence: Property test pure(x) == hybrid(x) → you keep semantic safety while gaining speed.
- Boundaries only: Pure ADTs at API edges → full type safety; optimized internals → full performance.
Compromise representation, never semantics.
1. Laws & Invariants (machine-checked)¶
| Invariant | Description | Enforcement |
|---|---|---|
| Measured Optimization | Speedup ≥2× or memory ≤50% before merge | Benchmark CI |
| Semantic Equivalence | pure(x) == hybrid(x) for all generated inputs | Hypothesis property tests |
| Boundary Purity | Public API takes/returns pure ADTs | mypy + runtime asserts |
| Revertible Compromise | mode="pure" fallback always available | Feature flag + tests |
2. Decision Table – When to Compromise¶
| Symptom | Pure Cost | Compromise Pattern | Threshold |
|---|---|---|---|
| High allocation in loop | >10M objs | Flatten to NumPy/array | Profiled >2× slower |
| Verbose field copying | >50 lines | Hybrid wrapper | Developer time |
| Serialization overhead | >100ms/batch | Custom MessagePack codec | Measured |
| Batch embedding slow | >10s/10k | Mutable OBatch + bulk embed | Always hybrid |
Profile first. Compromise only when measured.
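"Profile first" can be as simple as a cProfile run over the pure path; a minimal sketch (embed_batch_pure is defined in section 4.1):

import cProfile
import pstats

def profile_pure_path(chunks: list[Chunk]) -> None:
    """Print the ten hottest call sites in the pure batch path."""
    profiler = cProfile.Profile()
    profiler.enable()
    embed_batch_pure(chunks)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)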
3. Public API (fp/perf.py – mypy --strict clean)¶
# Full imports (see fp/perf.py in the repo; the .domain/.validation split for
# assemble, pure_embed and embed_many below is an assumption about the layout)
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Literal, Sequence
from uuid import UUID

import numpy as np

from .domain import (
    Chunk, ChunkText, ChunkMetadata, Embedding, ErrInfo,
    assemble, pure_embed, embed_many,
)
from .validation import Validation, VSuccess, v_success, v_failure
__all__ = [
"OBatch", "to_optimized_batch", "from_optimized_batch",
"process_batch_hybrid",
]
@dataclass(slots=True)
class OBatch:
    rows: list["OChunk"]
    embeddings: np.ndarray | None = None  # float32, shape (N, D)

@dataclass(slots=True)
class OChunk:
    id: UUID
    text: str
    source: str
    tags: list[str]
    model: str | None
    expected_dim: int | None
    row: int | None = None  # index into embeddings array
def to_optimized_batch(chunks: Sequence[Chunk]) -> OBatch:
    rows: list[OChunk] = []
    for chunk in chunks:
        row = OChunk(
            id=chunk.id,
            text=chunk.text.content,
            source=chunk.metadata.source,
            tags=list(chunk.metadata.tags),
            model=chunk.metadata.embedding_model,
            expected_dim=chunk.metadata.expected_dim,
        )
        rows.append(row)
    return OBatch(rows=rows, embeddings=None)
def from_optimized_batch(ob: OBatch) -> list[Validation[Chunk, ErrInfo]]:
    out: list[Validation[Chunk, ErrInfo]] = []
    for i, oc in enumerate(ob.rows):
        text = ChunkText(oc.text)
        meta = ChunkMetadata(
            source=oc.source,
            tags=tuple(oc.tags),
            embedding_model=oc.model,
            expected_dim=oc.expected_dim,
        )
        emb = None
        if ob.embeddings is not None:
            vec = tuple(ob.embeddings[i].tolist())
            model = oc.model or "unknown"
            # Embedding infers dim from vector in __post_init__,
            # so we don't pass dim explicitly here.
            emb = Embedding(vector=vec, model=model)
        v = assemble(text, meta, emb)
        if isinstance(v, VSuccess):
            chunk = replace(v.value, id=oc.id)
            out.append(v_success(chunk))
        else:
            out.append(v)
    return out
def process_batch_hybrid(
    batch: list[Chunk],
    *,
    mode: Literal["pure", "hybrid"] = "hybrid",
) -> list[Validation[Chunk, ErrInfo]]:
    if mode == "pure":
        # pure path – reference implementation
        return [pure_embed(c) for c in batch]
    # hybrid path – fast: the bulk embed moves numeric work into a contiguous
    # NumPy array and cuts per-item Python overhead
    ob = to_optimized_batch(batch)
    texts = [r.text for r in ob.rows]
    ob.embeddings = embed_many(texts)  # single bulk NumPy call
    return from_optimized_batch(ob)    # back to pure ADTs at the boundary
4. Reference Implementations (continued)¶
4.1 Before vs After – Embedding Batch¶
# BEFORE – pure, slow, high allocation
def embed_batch_pure(chunks: list[Chunk]) -> list[Validation[Chunk, ErrInfo]]:
    out = []
    for c in chunks:
        emb = embed_one(c.text.content)
        validated = assemble(c.text, c.metadata, emb)
        out.append(validated)
    return out  # N temporary tuples, N Chunk replacements
# AFTER – hybrid, fast, proven equivalent
def embed_batch_hybrid(chunks: list[Chunk]) -> list[Validation[Chunk, ErrInfo]]:
    ob = to_optimized_batch(chunks)
    ob.embeddings = embed_many([r.text for r in ob.rows])  # single bulk NumPy call
    return from_optimized_batch(ob)  # back to pure ADTs, proven equivalent
embed_batch_hybrid is 30–100× faster on 10k chunks and proven equivalent via the property test in section 5; a quick way to measure the speedup yourself is sketched below.
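A micro-benchmark sketch, assuming a hypothetical make_chunks helper that builds N valid Chunks:

import timeit

chunks = make_chunks(10_000)  # hypothetical test-data helper

t_pure = timeit.timeit(lambda: embed_batch_pure(chunks), number=3)
t_hybrid = timeit.timeit(lambda: embed_batch_hybrid(chunks), number=3)
print(f"pure: {t_pure:.2f}s  hybrid: {t_hybrid:.2f}s  speedup: {t_pure / t_hybrid:.0f}×")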
4.2 RAG Integration – Full Hybrid Pipeline¶
processed = process_batch_hybrid(raw_chunks, mode="hybrid")
# or pure fallback in tests:
processed = process_batch_hybrid(raw_chunks, mode="pure")
5. Property-Based Proofs (tests/test_perf_equivalence.py)¶
from hypothesis import given, strategies as st
import numpy as np

# Assumed imports – adjust to your package layout:
# from fp.perf import embed_batch_pure, embed_batch_hybrid
# from fp.validation import VSuccess, VFailure
# chunk_strat: a strategy generating valid Chunks (sketched below)

@given(batch=st.lists(chunk_strat, min_size=1, max_size=1000))
def test_pure_vs_hybrid_equivalence(batch):
    pure = embed_batch_pure(batch)
    hybrid = embed_batch_hybrid(batch)
    assert len(pure) == len(hybrid)
    for p, h in zip(pure, hybrid):
        if isinstance(p, VSuccess):
            assert isinstance(h, VSuccess)
            pc = p.value
            hc = h.value
            assert pc.id == hc.id
            assert pc.text.content == hc.text.content
            assert pc.metadata == hc.metadata
            assert (pc.embedding is None) == (hc.embedding is None)
            if pc.embedding is not None:
                assert np.allclose(
                    pc.embedding.vector,
                    hc.embedding.vector,
                    rtol=1e-5,
                    atol=1e-8,
                )
                assert pc.embedding.model == hc.embedding.model
        else:
            assert isinstance(h, VFailure)
            assert p.errors == h.errors
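The test above assumes a chunk_strat strategy for generating valid Chunks. A minimal sketch, assuming the Chunk/ChunkText/ChunkMetadata constructors from this module's domain model (field names are assumptions; adapt to the real fp/domain.py):

from hypothesis import strategies as st

# Sketch of chunk_strat – constructor/field names are assumptions about
# the domain model used throughout this module; adapt to fp/domain.py.
chunk_strat = st.builds(
    Chunk,
    id=st.uuids(),
    text=st.builds(ChunkText, st.text(min_size=1, max_size=200)),
    metadata=st.builds(
        ChunkMetadata,
        source=st.sampled_from(["web", "pdf", "wiki"]),
        tags=st.lists(st.text(min_size=1, max_size=10), max_size=3).map(tuple),
        embedding_model=st.none(),
        expected_dim=st.none(),
    ),
    embedding=st.none(),
)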
6. Big-O & Allocation Guarantees¶
| Operation | Pure (Python objects) | Hybrid (NumPy) | Notes |
|---|---|---|---|
| Batch embedding (N chunks) | O(N×D) time, O(N×D) Python-level numeric ops | O(N×D) time, O(N) domain objects + O(N×D) numeric in C | Hybrid shifts work from Python to NumPy and reduces interpreter overhead |
| Memory peak | High (many small objs) | Lower (single array) | Hybrid improves locality & GC pressure |
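To put numbers behind the memory-peak row, tracemalloc gives a rough comparison (it traces Python-side allocations; make_chunks is the same hypothetical helper as above):

import tracemalloc

def peak_mib(fn) -> float:
    """Return peak traced memory in MiB while running fn()."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20

chunks = make_chunks(10_000)  # hypothetical test-data helper
print("pure  :", peak_mib(lambda: embed_batch_pure(chunks)), "MiB")
print("hybrid:", peak_mib(lambda: embed_batch_hybrid(chunks)), "MiB")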
7. Anti-Patterns & Immediate Fixes¶
| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Premature optimization | Unmeasured changes | Profile first |
| Compromise without proof | Silent semantic drift | Equivalence property test |
| Hybrid everywhere | Lost type safety | Hybrid only in measured hot paths |
| Mutable at boundaries | External corruption | Pure ADTs at API edges |
8. Pre-Core Quiz¶
- Compromise when…? → Measured ≥2× slowdown
- Prove equivalence with…? → Property test pure(x) == hybrid(x)
- Keep pure at…? → Boundaries / business logic
- Hybrid pattern? → Mutable/flat internal, ADT external
- Never compromise…? → Semantics
9. Post-Core Exercise¶
- Profile your RAG pipeline → find hottest path.
- Implement hybrid version → add equivalence property test.
- Measure speedup → justify in PR.
- Add mode="pure" fallback → use in tests (see the sketch below).
End of Module 05
You have completed the journey: from raw dicts to bulletproof ADTs, functors, applicatives, monoids, pattern matching, stable serialization, compositional models, and finally — principled performance compromises that never sacrifice semantics.
Your system is now correct, fast, maintainable, and evolvable.
Go build something that lasts.