Skip to content

Module 1: Foundational FP Concepts

Progression Note

By the end of Module 1, you'll master purity laws, write pure functions, and refactor impure code using Hypothesis. This builds the foundation for lazy streams in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

Module Focus Key Outcomes
1: Foundational FP Concepts Purity, contracts, refactoring Spot impurities, write pure functions, prove equivalence with Hypothesis
2: ... ... ...
... ... ...

M01C03: Immutability & Value Semantics – Tuples, Frozensets, Frozen Dataclasses, Persistent Structures

Core question:
How do you make data truly immutable with value semantics—never changing after creation, equality by value, and (when possible) structural sharing for efficiency—so that sharing is safe, reasoning is local, and mutations are impossible?

This core builds on Core 1's functional mindset and Core 2's contracts by adding immutability as the data counterpart:
- Immutable containers (tuple, frozenset, frozen dataclasses).
- Value semantics (equality by contents, not identity).
- Structural sharing (persistent updates reuse unchanged parts—built-ins give shallow sharing via replace).

We continue the running project from Core 1/2: refactoring the FuncPipe RAG Builder, now enforcing immutability on types.

Audience: Developers who passed Core 2's contract checks but still see bugs from nested mutation, accidental rebinding, or cache invalidation due to mutable defaults.
Outcome:
1. Replace any mutable container with an immutable equivalent in < 10 lines.
2. Explain value semantics vs reference semantics—and why == on immutables is safe.
3. Add properties proving hash stability and absence of observable mutation for RAG data.
4. Spot and fix three classic immutability violations: mutable defaults, nested mutation, identity-based caching.


1. Conceptual Foundation

1.1 The One-Sentence Rule

Default to frozen, hashable, value-equality data; never expose mutable containers—or wrap them behind read-only views.

1.2 Value Semantics in One Precise Sentence

Value semantics means objects are compared by their contents (not identity) and never change after creation; structural sharing is an implementation strategy that makes updates efficient.

1.3 Why This Matters Now

Immutability enforces Core 2's no-mutation contract at the type level, enabling safe sharing in higher-order functions (Core 4) and equational proofs (Core 9); without it, nested mutations cause heisenbugs.

1.4 Bug Story: Mutating a Hash Key

Imagine caching embeddings by chunk:

from dataclasses import dataclass
import hashlib
from typing import Tuple, Dict

@dataclass(eq=True, unsafe_hash=True)  # unsafe_hash=True on a mutable class is almost always a bug factory; prefer frozen=True and immutable fields instead
class MutableChunkWithoutEmbedding:
    doc_id: str
    text: str
    start: int
    end: int

cache: Dict[MutableChunkWithoutEmbedding, Tuple[float, ...]] = {}
chunk = MutableChunkWithoutEmbedding("doc1", "text", 0, 4)
cache[chunk] = (0.1, 0.2)  # Cache hit
old_hash = hash(chunk)
chunk.text = "mutated"  # mutation after hashing
new_hash = hash(chunk)
assert old_hash != new_hash  # hash changed
assert cache.get(chunk) is None  # lookup fails → key “lost”

Problem: Mutation invalidates hash, breaking dict/set lookups. Immutability prevents this entirely. This is why we insist that anything you put in a set / dict key path be logically immutable: otherwise caches, memoization, and lookup-based optimizations silently rot.


2. Mental Model: Mutable Reference vs Immutable Value

2.1 One Picture

Mutable Reference Semantics                Immutable Value Semantics
+---------------------------+            +---------------------------+
| obj1 ──┐                  |            | obj1: [1,2,3]             |
|        ├─> [1,2,3]        |            | obj2: [1,2,3]             |
| obj2 ──┘                  |            | obj1 == obj2 (contents)   |
| mutate → [1,2,4]          |            | mutate → new object       |
| both see change!          |            | sharing safe              |
+---------------------------+            +---------------------------+

2.2 Contract Table

Clause Violation Example Detected By
No post-creation mutation list.append, dict.__setitem__ Hypothesis mutation check + deepcopy
Value equality is instead of == Hypothesis equality property
Hash stability Mutable in set/dict key Hypothesis rehash property
Structural sharing Full copy on update Manual inspection + benchmarks

Note on Nested Mutation: frozen=True is shallow; use deepcopy in properties to detect deep violations. In this series, ‘immutable’ always implicitly means ‘deeply immutable’ for any data that crosses module boundaries.

Note on Local Mutables: Local mutable temporaries are fine; publishing them as part of the API is not.

2.3 Shallow vs Deep Immutability

@dataclass(frozen=True) prevents rebinding attributes but not mutating nested mutables:

from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ShallowImmutable:
    items: List[int]  # Nested mutable

obj = ShallowImmutable([1, 2])
# obj.items = [3]  # Error: can't rebind
obj.items.append(3)  # Succeeds! Nested mutation
assert obj.items == [1, 2, 3]

For deep immutability, change list[int] to tuple[int, ...] and construct it immutably at the boundaries:

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DeepImmutable:
    items: Tuple[int, ...]

deep = DeepImmutable((1, 2))
# deep.items = (3,)      # Error: can't rebind frozen field
# deep.items += (3,)     # Also an error: attempts to rebind a frozen field

3. Running Project: Immutability in RAG Types

Our running project (from module-01/funcpipe-rag-01/README.md) enforces immutability on Core 1/2's types.
- Goal: Make data safe for sharing, hashing, caching.
- Start: Core 1/2's types.
- End (this core): Frozen types with properties for no mutation/hash stability. Semantics aligned with Core 1/2.

3.1 Types (Canonical)

These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here. All core RAG dataclasses in rag_types.py are frozen=True and contain only immutable fields. Because all RAG dataclasses are frozen=True and only contain immutable fields, their default hash implementations are safe to use as dict keys and set members.

3.2 Mutable Variants (Anti-Patterns in RAG)

Full code:

from dataclasses import dataclass
from typing import List
from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RawDoc, Chunk, RagEnv


# Mutable RawDoc (anti-pattern)
@dataclass  # Mutable
class MutableRawDoc:
    doc_id: str
    title: str
    abstract: str
    categories: str


# Mutable clean (nested mutation)
def mutable_clean_doc(doc: MutableRawDoc) -> MutableRawDoc:
    doc.abstract = " ".join(doc.abstract.strip().lower().split())
    return doc  # Mutates input


# Mutable chunk (mutable list output)
def mutable_chunk_doc(doc: CleanDoc, env: RagEnv) -> List[ChunkWithoutEmbedding]:  # Mutable list
    text = doc.abstract
    chunks = []  # Mutable accum (local OK, but returned mutable)
    for i in range(0, len(text), env.chunk_size):
        chunks.append(
            ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size])))
    return chunks  # Caller can mutate returned list


# Mutable embed (mutates input for demo)
import hashlib


@dataclass  # Mutable for demo
class MutableChunkWithoutEmbedding:
    doc_id: str
    text: str
    start: int
    end: int


def mutable_embed_chunk(chunk: MutableChunkWithoutEmbedding) -> Chunk:
    chunk.text += " mutated"  # Nested mutation
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)

Smells: Mutable fields (rebindable), nested mutation (abstract changed), mutable outputs (caller can append to chunks). Note: Local mutable accum is OK; problem is exposing mutable API.


4. Refactor to Immutable: Value Semantics in RAG

4.1 Immutable Core

Use frozen dataclasses, tuples for safe sharing. Full code:

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from typing import Tuple
import hashlib


# Immutable clean (returns new)
def clean_doc(doc: RawDoc) -> CleanDoc:
    abstract = " ".join(doc.abstract.strip().lower().split())
    return CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)  # New instance


# Immutable chunk (tuple-valued API with immutable elements)
def chunk_doc(doc: CleanDoc, env: RagEnv) -> Tuple[ChunkWithoutEmbedding, ...]:
    text = doc.abstract
    return tuple(
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    )


# Immutable embed (no mutation)
def embed_chunk(chunk: ChunkWithoutEmbedding) -> Chunk:
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)

Wins: New instances on update, immutable outputs (tuple prevents append), no mutation. Note: From this core onward, chunk_doc returns tuple instead of list to enforce immutability; update full_rag accordingly.

4.2 Persistent Update Example

Python's dataclasses support persistent updates via replace:

from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Config:
    host: str
    port: int
    tags: Tuple[str, ...]  # immutable

base = Config("localhost", 8000, ("api",))

# “Update” using structural sharing: new object reuses unchanged pieces
cfg_prod = replace(base, host="prod.example.com")

assert base.port == cfg_prod.port        # shared value
assert base.tags is cfg_prod.tags        # same tuple object, reused (this is an optimization, not a semantic guarantee)
assert base is not cfg_prod              # new Config instance

This is persistent update: you get a new value back, but unchanged substructures (like the tags tuple) are shared.

4.3 Impure Shell (Edge Only)

The shell from Core 1 remains; immutability focuses on core data.

4.4 Connection to Later Cores

Immutability enables referential transparency, making equational reasoning (Core 9) possible by ensuring expressions equal their values. It also unlocks safe concurrency (no races) and caching (stable hashes), key for pipelines (Core 4) and parallelism (Core 10).


5. Equational Reasoning: Substitution Exercise

Hand Exercise: Replace expressions in clean_doc.
1. Inline abstract = " ".join(...) → normalized string.
2. Substitute into CleanDoc → fixed value.
Bug Hunt: In mutable_clean_doc, substitution fails (mutates original doc).


6. Property-Based Testing: Proving Equivalence (Advanced, Optional)

Use Hypothesis to prove immutability.

You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.

6.1 Custom Strategy (RAG Domain)

From module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).

6.2 Immutability & Hash Properties for RAG Types

Full code:

# module-01/funcpipe-rag-01/tests/test_laws.py (excerpt)
from hypothesis import given
import hypothesis.strategies as st
from copy import deepcopy
from typing import List
from funcpipe_rag import clean_doc, chunk_doc, embed_chunk, full_rag
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from .conftest import raw_doc_strategy, env_strategy, doc_list_strategy


# Properties for clean_doc
@given(doc=raw_doc_strategy())
def test_clean_doc_no_mutation(doc: RawDoc) -> None:
    original = deepcopy(doc)
    clean_doc(doc)
    assert doc == original


# Properties for chunk_doc
@given(doc=st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(), categories=st.text()),
       env=env_strategy())
def test_chunk_doc_no_mutation(doc: CleanDoc, env: RagEnv) -> None:
    original = deepcopy(doc)
    chunk_doc(doc, env)
    assert doc == original


@given(doc=st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(), categories=st.text()),
       env=env_strategy())
def test_chunks_equal_imply_equal_hash(doc: CleanDoc, env: RagEnv) -> None:
    c1 = chunk_doc(doc, env)
    c2 = chunk_doc(doc, env)
    assert c1 == c2
    assert hash(c1) == hash(c2)


@given(doc=st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(), categories=st.text()),
       env=env_strategy())
def test_chunk_hash_stable(doc: CleanDoc, env: RagEnv) -> None:
    c = chunk_doc(doc, env)
    h = hash(c)
    assert hash(c) == h


# Properties for embed_chunk
chunk_we_strategy = st.builds(
    ChunkWithoutEmbedding,
    doc_id=st.text(min_size=1),
    text=st.text(min_size=1),
    start=st.integers(min_value=0, max_value=1000),
).map(
    lambda c: ChunkWithoutEmbedding(
        c.doc_id, c.text, c.start, c.start + len(c.text)
    )
)


@given(chunk_we=chunk_we_strategy)
def test_embed_chunk_no_mutation(chunk_we: ChunkWithoutEmbedding) -> None:
    original = deepcopy(chunk_we)
    embed_chunk(chunk_we)
    assert chunk_we == original


@given(chunk_we=chunk_we_strategy)
def test_embedded_equal_imply_equal_hash(chunk_we: ChunkWithoutEmbedding) -> None:
    e1 = embed_chunk(chunk_we)
    e2 = embed_chunk(chunk_we)
    assert e1 == e2
    assert hash(e1) == hash(e2)


@given(chunk_we=chunk_we_strategy)
def test_chunk_value_equality(chunk_we: ChunkWithoutEmbedding) -> None:
    e1 = embed_chunk(chunk_we)
    e2 = embed_chunk(chunk_we)
    assert e1 == e2
    assert e1 is not e2


# Composite property (full_rag)
@given(docs=doc_list_strategy(), env=env_strategy())
def test_full_rag_no_mutation(docs: List[RawDoc], env: RagEnv) -> None:
    original = deepcopy(docs)
    full_rag(docs, env)
    assert docs == original

Note: Properties enforce no mutation and hash stability across stages.

6.3 Shrinking Demo: Catching a Bug

Bad refactor (nested mutation in embed_chunk):

Full code:

import hashlib
from dataclasses import dataclass
from funcpipe_rag import Chunk


@dataclass  # Mutable for demo
class MutableChunkWithoutEmbedding:
    doc_id: str
    text: str
    start: int
    end: int


def bad_embed_chunk(chunk: MutableChunkWithoutEmbedding) -> Chunk:
    chunk.text += " mutated"  # Nested mutation
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)

Property:

from hypothesis import given
import hypothesis.strategies as st
from copy import deepcopy

@given(st.builds(MutableChunkWithoutEmbedding, doc_id=st.text(min_size=1), text=st.text(min_size=1), start=st.integers(min_value=0), end=st.integers(min_value=1)))
def test_bad_embed_chunk_no_mutation(chunk: MutableChunkWithoutEmbedding) -> None:
    original = deepcopy(chunk)
    bad_embed_chunk(chunk)
    assert chunk == original  # Fails on mutation

Hypothesis failure trace (run to verify; example):

Falsifying example: test_bad_embed_chunk_no_mutation(
    chunk=MutableChunkWithoutEmbedding(doc_id='a', text='a', start=0, end=1),
)
AssertionError
  • Shrinks to minimal chunk; mutation changes text, failing equality. Catches nested bug via shrinking.

7. When Immutability Isn't Worth It

Rarely, for profiled hot paths (e.g., array processing), use mutable internals but expose immutable API.


8. Pre-Core Quiz

  1. Mutable default argument → violates which clause? → No shared mutation
  2. Mutating list inside tuple → violates? → No shared mutation
  3. Using is instead of == on dataclasses → violates? → Value equality
  4. @dataclass(frozen=True) prevents nested list mutation? → No (shallow only)
  5. Tool to detect nested mutation in tests? → Hypothesis + deepcopy

9. Post-Core Reflection & Exercise

Reflect: In your code, find one mutable container. Freeze it; add immutability properties.
Project Exercise: Update RAG types to immutable; run properties on sample data.

All claims (e.g., referential transparency) are verifiable via the provided Hypothesis examples—run them to confirm.

Further Reading: For more on purity pitfalls, see 'Fluent Python' Chapter on Functions as Objects. Check free resources like Python.org's FP section or Codecademy's Advanced Python course for basics.

Next: Core 4 – Higher-Order Functions & Composition. (Builds on this RAG pure core.)