Module 1: Foundational FP Concepts¶

Progression Note¶

By the end of Module 1, you'll master purity laws, write pure functions, and refactor impure code while checking each refactor with Hypothesis. This builds the foundation for lazy streams in Module 3. See the series progression map in the repo root for full details.

Here's a snippet from the progression map:

| Module | Focus | Key Outcomes |
| --- | --- | --- |
| 1: Foundational FP Concepts | Purity, contracts, refactoring | Spot impurities, write pure functions, prove equivalence with Hypothesis |
| 2: ... | ... | ... |
| ... | ... | ... |

M01C08: Extracting Side Effects – Passing Dependencies Explicitly Instead of Touching Globals¶

Core question:
How do you eliminate hidden side effects (globals, env, time, RNG, I/O) by passing all dependencies explicitly—so that pure logic stays testable, composable, and free from “it works on my machine” bugs?

This core builds on Core 1's mindset, Core 2's contracts, Core 3's immutability, Core 4's composition, Core 5's refactorings, Core 6's combinators, and Core 7's typed pipelines by isolating impurities at the edges:
- Pass config, logger, DB, clock, RNG explicitly (or via frozen context).
- Pure core: referentially transparent logic only (no effects).
- Thin shell: effectful wrapper that uses context and delegates to pure core.
- Use with_context (Core 7) + frozen dataclasses for dependency bundles.
- Never touch os.getenv, datetime.now, or globals inside pure functions.
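
The bullets above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `with_context_sketch` is a hypothetical stand-in built on `functools.partial` (the real `with_context` from Core 7 may differ), and `GreetEnv`, `greeting_pure`, and `greet_shell` are illustrative names, not part of the project.

```python
from dataclasses import dataclass
from datetime import datetime
from functools import partial
from typing import Callable


@dataclass(frozen=True)
class GreetEnv:
    """Hypothetical dependency bundle: every effect is passed in."""
    log: Callable[[str], None]
    now: Callable[[], datetime]


def greeting_pure(hour: int, name: str) -> str:
    """Pure core: the result depends only on the arguments."""
    part_of_day = "morning" if hour < 12 else "afternoon"
    return f"Good {part_of_day}, {name}"


def greet_shell(env: GreetEnv, name: str) -> str:
    """Thin shell: reads the clock, delegates to the core, then logs."""
    message = greeting_pure(env.now().hour, name)
    env.log(message)
    return message


def with_context_sketch(env, fn):
    """Assumed behaviour of with_context: bind env as the first argument."""
    return partial(fn, env)


# A fake environment makes the shell fully deterministic in a test.
logged: list[str] = []
fake_env = GreetEnv(log=logged.append, now=lambda: datetime(2024, 1, 1, 9, 0))
greet = with_context_sketch(fake_env, greet_shell)
result = greet("Ada")
```

With the fake environment, the shell's output and its log are both plain values that a test can compare directly.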

We continue the running project from Core 1-7: refactoring the FuncPipe RAG Builder, now isolating effects.

Audience: Developers who mastered Core 7 typed pipelines but still see flaky tests from hidden globals, env vars, or time/RNG.
Outcome:
1. Refactor any function touching globals/env/time/RNG into a pure core + explicit dependency param in < 15 lines.
2. Bundle dependencies into frozen @dataclass contexts (one per layer) and inject via with_context.
3. Write tests (and optionally Hypothesis properties) proving determinism when dependencies are fixed.
4. Spot and fix three classic effect leaks: implicit print/logging, datetime.now(), random.random().
5. Add property tests showing pure core ≡ old impure version (with mocked effects).

1. Conceptual Foundation¶

1.1 The One-Sentence Rule¶

Never touch globals, env, time, or RNG directly; pass everything explicitly via frozen context bundles—one per layer.

1.2 Explicit Dependencies in One Precise Sentence¶

Explicit dependencies mean every effectful operation receives its capabilities via frozen context objects—so the pure core remains deterministic and testable, while thin shells handle I/O, logging, and state.
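
As a small illustration of that sentence, here is one way an environment-variable read might be pulled out of the logic. The variable name `MAX_RETRIES` and all three function names are hypothetical, chosen only for this sketch.

```python
import os
from collections.abc import Mapping


def max_retries_impure() -> int:
    # Hidden dependency: the result changes with the process environment.
    return int(os.getenv("MAX_RETRIES", "3"))


def max_retries_pure(environ: Mapping[str, str]) -> int:
    # Explicit dependency: the caller decides which mapping is consulted.
    return int(environ.get("MAX_RETRIES", "3"))


def max_retries_shell() -> int:
    # Edge of the program: the only place that touches os.environ.
    return max_retries_pure(os.environ)
```

A test can now call `max_retries_pure({"MAX_RETRIES": "5"})` without patching the process environment.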

1.3 Why This Matters Now¶

Explicit dependencies isolate effects, which is what makes equational reasoning (Core 9) and idempotence (Core 10) possible; without them, hidden state can change a function's result between two calls with identical arguments.

1.4 How This Relates to DI / Ports & Adapters / Clean Architecture¶

This approach aligns with known patterns:

Dependency Injection (DI): Passing Env bundles is manual DI—simple and zero-deps.
Ports & Adapters: Pure core is the domain; shells are adapters for effects (I/O, time).
Clean Architecture: Core is entities/use-cases (pure); shells are interfaces/infra (effects).

We keep it lightweight: frozen dataclasses + with_context instead of full DI frameworks.
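
To make the Ports & Adapters reading concrete, the sketch below treats the clock as a port with two adapters. `Clock`, `SystemClock`, `FixedClock`, `is_expired`, and `check_expiry` are hypothetical names for illustration; the project itself passes plain callables in frozen dataclasses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    """Port: anything that can report the current time."""
    def now(self) -> datetime: ...


class SystemClock:
    """Production adapter: reads the real wall clock (an effect)."""
    def now(self) -> datetime:
        return datetime.now(timezone.utc)


@dataclass(frozen=True)
class FixedClock:
    """Test adapter: always reports the same instant."""
    instant: datetime

    def now(self) -> datetime:
        return self.instant


def is_expired(now: datetime, deadline: datetime) -> bool:
    """Pure domain rule: compares two values, nothing else."""
    return now > deadline


def check_expiry(clock: Clock, deadline: datetime) -> bool:
    """Shell: asks the port for the time, then delegates to the pure rule."""
    return is_expired(clock.now(), deadline)
```

Swapping `SystemClock()` for `FixedClock(...)` is the whole test setup; no patching of `datetime` is needed.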

1.5 Purity Spectrum Table¶

| Level | Description | Example |
| --- | --- | --- |
| Fully Pure | Explicit inputs/outputs only | `def add(x: int, y: int) -> int: return x + y` |
| Semi-Pure | Observational taps (e.g., logging) | `def add_with_log(x: int, y: int) -> int: log(f"Adding {x}+{y}"); return x + y` |
| Impure | Globals/I/O/mutation | `def read_file(path: str) -> str: ...` |

In this core we'll start moving even logging out of the core and into explicit artifacts.
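
A minimal sketch of what "logging as an explicit artifact" can look like for the semi-pure `add_with_log` row. The names `add_pure` and `add_shell` are assumptions made for this example.

```python
from typing import Callable


def add_pure(x: int, y: int) -> tuple[int, list[str]]:
    # The log line is returned as data instead of being emitted.
    return x + y, [f"Adding {x}+{y}"]


def add_shell(x: int, y: int, log: Callable[[str], None]) -> int:
    total, messages = add_pure(x, y)
    for message in messages:
        log(message)  # the only effect, performed at the edge
    return total


# In a test, a list's append method serves as the logger.
captured: list[str] = []
total = add_shell(2, 3, captured.append)
```

The core moves from Semi-Pure to Fully Pure, and the log output becomes something a test can assert on.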

2. Mental Model: Hidden Effects vs Explicit Context¶

2.1 One Picture¶

Hidden Effects (globals)                   Explicit Context
+---------------------------+            +---------------------------+
| global LOG                |            | @dataclass(frozen=True)   |
| global CONFIG             |            | class CoreEnv:            |
| datetime.now()            |            |     log: Logger           |
| os.getenv("KEY")          |            |     cfg: Config           |
| random.random()           |            |                           |
| → Heisenbugs everywhere   |            | pure_core(cfg, data)      |
|                           |            | shell = with_context(env, |
|                           |            |         effectful_wrapper)|
+---------------------------+            +---------------------------+

2.2 Contract Table¶

| Clause | Violation Example | Detected By |
| --- | --- | --- |
| Explicit dependencies | `os.getenv`, `datetime.now()` | Tests with frozen context |
| No hidden prints | `print` inside pure logic | Code review + linter |
| Determinism when fixed | Same inputs+deps → different outputs (e.g., unseeded `random.shuffle`) | Tests with frozen context |
| Mockable effects | Direct DB calls | Unit tests with fake Env |
| Edge isolation | Effects in pipeline middle | Code review + linter |

Note on Contracts: Push effects to the edges; prove the core stays pure.

3. Running Project: Extracting Effects in RAG¶

Our running project (from module-01/funcpipe-rag-01/README.md) isolates effects in Core 7's typed pipelines.
- Goal: Push I/O, logging, time/RNG to edges.
- Start: Core 1-7's typed pure functions.
- End (this core): Pure core with explicit values; effects in shell. Semantics aligned with Core 1-7.

3.1 Types (Canonical)¶

These are defined in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_types.py (as in Core 1) and imported as needed. No redefinition here.

3.2 Effectful Variants (Anti-Patterns in RAG)¶

Full code:

from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
import hashlib
from datetime import datetime
import random
import logging

# Before refactors: implicit logging, time, RNG inside the pipeline
LOG = logging.getLogger("rag")


def effectful_clean_doc(doc: RawDoc) -> CleanDoc:
    abstract = " ".join(doc.abstract.strip().lower().split())
    LOG.info("Cleaned doc %s", doc.doc_id)
    return CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)


def effectful_chunk_doc(doc: CleanDoc, env: RagEnv) -> list[ChunkWithoutEmbedding]:
    text = doc.abstract
    chunks = [
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    ]
    random.shuffle(chunks)
    return chunks


def effectful_embed_chunk(chunk: ChunkWithoutEmbedding) -> Chunk:
    if datetime.now() > datetime(2025, 1, 1):
        raise ValueError("Expired")
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)

Smells: a module-level LOG (hidden logging), the unseeded global RNG (nondeterministic chunk order), and a wall-clock check (results change with the date, so tests turn flaky).

4. Refactor to Explicit: Pure Core + Shell in RAG¶

4.1 Pure Core¶

Pure logic; return values + artifacts (logs, etc.); no effects in core.

Full code:

# module-01/funcpipe-rag-01/src/funcpipe_rag/pipeline_stages.py (pure helpers)
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from datetime import datetime
import random
import hashlib
from funcpipe_rag import structural_dedup_chunks


def clean_doc_pure(doc: RawDoc) -> tuple[CleanDoc, list[str]]:
    abstract = " ".join(doc.abstract.strip().lower().split())
    cleaned = CleanDoc(doc.doc_id, doc.title, abstract, doc.categories)
    return cleaned, [f"Cleaned doc {doc.doc_id}"]


def chunk_doc_pure(seed: int, doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
    # The seed is an explicit input, so the shuffle is reproducible for a given (seed, doc, env).
    text = doc.abstract
    chunks = [
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    ]
    rng = random.Random(seed)
    rng.shuffle(chunks)
    return tuple(chunks)


def embed_chunk_pure(current_time: datetime, chunk: ChunkWithoutEmbedding) -> Chunk:
    # We still raise here; in later modules we'll model this as Result[Chunk, ExpiredError] instead.
    if current_time > datetime(2025, 1, 1):
        raise ValueError("Expired")
    h = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    step = 4
    vec = tuple(int(h[i:i + step], 16) / (16 ** step - 1) for i in range(0, 64, step))
    return Chunk(chunk.doc_id, chunk.text, chunk.start, chunk.end, vec)


def full_rag_pure(
    seed: int, current_time: datetime, docs: list[RawDoc], env: RagEnv
) -> tuple[tuple[Chunk, ...], list[str]]:
    cleaned_with_logs = [clean_doc_pure(doc) for doc in docs]
    cleaned = [clean for clean, _ in cleaned_with_logs]
    logs = [msg for _, messages in cleaned_with_logs for msg in messages]
    chunked = [chunk_doc_pure(seed, doc, env) for doc in cleaned]
    flattened = [chunk for doc_chunks in chunked for chunk in doc_chunks]
    embedded = [embed_chunk_pure(current_time, chunk) for chunk in flattened]
    # structural_dedup_chunks: pure helper that removes duplicate chunks; defined in Core 6
    deduped = structural_dedup_chunks(embedded)
    return tuple(deduped), logs
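
The seeded-shuffle idea in chunk_doc_pure can be tried in isolation. The sketch below is a simplification under stated assumptions: it chunks plain strings instead of the project's types, and `chunk_text` is a hypothetical name.

```python
import random


def chunk_text(seed: int, text: str, chunk_size: int) -> tuple[str, ...]:
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    rng = random.Random(seed)  # local RNG: the global random state is untouched
    rng.shuffle(chunks)
    return tuple(chunks)


# Same seed and same input give the same order on every call.
first = chunk_text(42, "abcdefghij", 2)
second = chunk_text(42, "abcdefghij", 2)
```

Because the generator is created from the seed inside the function, nothing outside the arguments can influence the result.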

4.2 Impure Shell (Edge Only)¶

Handle effects; delegate to pure core.

Full code:

# module-01/funcpipe-rag-01/src/funcpipe_rag/rag_shell.py (context bundle)
from dataclasses import dataclass
from typing import Callable
from funcpipe_rag import full_rag_pure
from funcpipe_rag import RawDoc, Chunk, RagEnv
from datetime import datetime


@dataclass(frozen=True)
class LogEnv:
    log: Callable[[str], None]


@dataclass(frozen=True)
class TimeEnv:
    now: Callable[[], datetime]


@dataclass(frozen=True)
class RandEnv:
    seed: int


@dataclass(frozen=True)
class RagCoreEnv:
    log_env: LogEnv
    time_env: TimeEnv
    rand_env: RandEnv


def full_rag_shell(env: RagCoreEnv, docs: list[RawDoc], rag_env: RagEnv) -> tuple[Chunk, ...]:
    chunks, logs = full_rag_pure(env.rand_env.seed, env.time_env.now(), docs, rag_env)
    for message in logs:
        env.log_env.log(message)
    return chunks

module-01/funcpipe-rag-01/src/funcpipe_rag/rag_shell.py remains the only effectful entry point, reading CSV input and writing JSONL output while calling full_rag_shell (which delegates into full_rag_pure).

Wins: no effects in the core, deterministic output when seed and time are fixed, semantics aligned with Core 1-7.

4.3 Real-World Integration¶

Frameworks (e.g., Django/Flask) often force globals (request, timezone.now()). Adapt by constructing Env from framework context:

Full code:

# Flask example: Wrap request + timezone into Env
from flask import request, current_app
from datetime import datetime, timezone
from funcpipe_rag import full_rag_shell, RagCoreEnv, LogEnv, TimeEnv, RandEnv
from funcpipe_rag import RawDoc, RagEnv, Chunk
from funcpipe_rag import with_context


def rag_entry(env: RagCoreEnv, docs: list[RawDoc], rag_env: RagEnv) -> tuple[Chunk, ...]:
    return full_rag_shell(env, docs, rag_env)


def flask_handler() -> tuple[Chunk, ...]:
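    env = RagCoreEnv(
        log_env=LogEnv(log=current_app.logger.info),
        # embed_chunk_pure compares against a naive datetime, so hand it naive UTC;
        # comparing an aware datetime with a naive one raises TypeError.
        time_env=TimeEnv(now=lambda: datetime.now(timezone.utc).replace(tzinfo=None)),
        rand_env=RandEnv(seed=42),
    )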
    body = request.json
    docs = [RawDoc(**d) for d in body["docs"]]

    # Bind env once so downstream call sites don't have to thread it through manually.
    full_rag = with_context(env, rag_entry)
    return full_rag(docs, RagEnv(chunk_size=512))

Wins: Framework globals → explicit Env; pure core stays isolated.

5. Equational Reasoning: Substitution Exercise¶

Hand Exercise: Replace expressions in full_rag_pure.
1. Inline clean_doc_pure(doc) → (CleanDoc, logs).
2. Substitute into chunk_doc_pure → tuple of chunks (seeded).
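Bug Hunt: In effectful_clean_doc, substitution fails: replacing the call with its return value silently drops the LOG.info side effect. effectful_chunk_doc and effectful_embed_chunk fail the same way because of the hidden RNG and clock.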
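
The substitution test can also be run mechanically. The sketch below uses simplified stand-ins that work on plain strings; `clean_pure`, `clean_effectful`, and the list `LOG` (which plays the role of the hidden global logger) are hypothetical names for this illustration only.

```python
LOG: list[str] = []  # stands in for the hidden global logger


def clean_pure(text: str) -> tuple[str, list[str]]:
    cleaned = " ".join(text.strip().lower().split())
    return cleaned, [f"Cleaned {len(cleaned)} chars"]


def clean_effectful(text: str) -> str:
    cleaned = " ".join(text.strip().lower().split())
    LOG.append(f"Cleaned {len(cleaned)} chars")  # hidden effect
    return cleaned


raw = "  Hello   World "

# Pure: replacing each call with its value changes nothing observable.
value = clean_pure(raw)
original = (clean_pure(raw), clean_pure(raw))
substituted = (value, value)

# Effectful: the same rewrite changes how many log entries are produced.
LOG.clear()
_ = (clean_effectful(raw), clean_effectful(raw))
entries_with_two_calls = len(LOG)

LOG.clear()
once = clean_effectful(raw)
_ = (once, once)
entries_after_substitution = len(LOG)
```

The returned strings agree in both versions; only the effectful one behaves differently after the rewrite, which is exactly what breaks equational reasoning.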

6. Property-Based Testing: Proving Equivalence (Advanced, Optional)¶

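Use Hypothesis to check that the refactored code behaves the same as the original; it searches for counterexamples rather than producing a proof.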

You can safely skip this on a first read and still follow later cores—come back when you want to mechanically verify your own refactors.

For side-effect extraction, a couple of simple tests with a fake Env are usually enough; Hypothesis is nice-to-have, not mandatory.

To bridge theory and practice, here's a simple Hypothesis example illustrating impurity detection:

import random
from hypothesis import given
import hypothesis.strategies as st

def impure_random_add(x: int) -> int:
    return x + random.randint(1, 10)  # Non-deterministic

@given(st.integers())
def test_detect_impurity(x):
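    assert impure_random_add(x) == impure_random_add(x)  # Can fail: each call draws a fresh random number

# The two calls draw separately from the global RNG, so their results can differ for the same x

This property test targets the impurity: the function can return different outputs for identical inputs. One caveat when you run it: by default Hypothesis seeds Python's global random module before each test case to keep failures reproducible, so the two draws come from a fixed sequence and the assertion fails only when those two consecutive draws differ.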
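
One way the impurity might be removed is to make the seed an explicit argument. This is a sketch under stated assumptions: `random_add_pure` is a hypothetical name, and a plain loop stands in for the property test so the example runs without Hypothesis installed.

```python
import random


def random_add_pure(rng_seed: int, x: int) -> int:
    # The seed is an input, so the "random" offset is a function of the arguments.
    rng = random.Random(rng_seed)
    return x + rng.randint(1, 10)


# Determinism check: same (seed, x) always gives the same result.
deterministic = all(
    random_add_pure(7, x) == random_add_pure(7, x) for x in range(-50, 50)
)
```

The same assertion written as a Hypothesis property over `st.integers()` for both arguments should hold for every generated pair.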

6.1 Custom Strategy (RAG Domain)¶

From module-01/funcpipe-rag-01/tests/conftest.py (as in Core 1).

6.2 Equivalence Property¶

Properties for stages (using the helpers in module-01/funcpipe-rag-01/src/funcpipe_rag/rag_shell.py):

Full code:

# module-01/funcpipe-rag-01/tests/test_laws.py (excerpt)
from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import clean_doc_pure, chunk_doc_pure, embed_chunk_pure, full_rag_pure
from funcpipe_rag import RawDoc, CleanDoc, ChunkWithoutEmbedding, Chunk, RagEnv
from funcpipe_rag import RagCoreEnv, LogEnv, TimeEnv, RandEnv, full_rag_shell
from .conftest import raw_doc_strategy, env_strategy, doc_list_strategy
from datetime import datetime

fixed_seed = 42
fixed_time = datetime(2024, 1, 1)


@given(raw_doc_strategy())
def test_clean_doc_pure_deterministic(doc: RawDoc) -> None:
    res1, logs1 = clean_doc_pure(doc)
    res2, logs2 = clean_doc_pure(doc)
    assert res1 == res2 and logs1 == logs2


@given(st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(), categories=st.text()),
       env_strategy())
def test_chunk_doc_pure_deterministic(doc: CleanDoc, env: RagEnv) -> None:
    assert chunk_doc_pure(fixed_seed, doc, env) == chunk_doc_pure(fixed_seed, doc, env)


@given(st.builds(ChunkWithoutEmbedding, doc_id=st.text(min_size=1), text=st.text(min_size=1),
                 start=st.integers(min_value=0), end=st.integers(min_value=1)))
def test_embed_chunk_pure_deterministic(chunk: ChunkWithoutEmbedding) -> None:
    assert embed_chunk_pure(fixed_time, chunk) == embed_chunk_pure(fixed_time, chunk)


@given(doc_list_strategy(), env_strategy())
def test_full_rag_shell_matches_pure(docs: list[RawDoc], env: RagEnv) -> None:
    messages: list[str] = []
    env_bundle = RagCoreEnv(
        log_env=LogEnv(log=messages.append),
        time_env=TimeEnv(now=lambda: fixed_time),
        rand_env=RandEnv(seed=fixed_seed),
    )
    shell_chunks = full_rag_shell(env_bundle, docs, env)
    pure_chunks, logs = full_rag_pure(fixed_seed, fixed_time, docs, env)
    assert shell_chunks == pure_chunks
    assert messages == logs

Note: These properties check determinism for a fixed seed and time, and that the shell returns exactly what the pure core returns when its effects are replaced by fakes.

6.3 Shrinking Demo: Catching a Bug¶

Bad refactor (hidden RNG in chunk):

from funcpipe_rag import CleanDoc, ChunkWithoutEmbedding, RagEnv
import random


def bad_chunk_doc(doc: CleanDoc, env: RagEnv) -> tuple[ChunkWithoutEmbedding, ...]:
    text = doc.abstract
    chunks = [
        ChunkWithoutEmbedding(doc.doc_id, text[i:i + env.chunk_size], i, i + len(text[i:i + env.chunk_size]))
        for i in range(0, len(text), env.chunk_size)
    ]
    random.shuffle(chunks)  # Hidden
    return tuple(chunks)

Property:

from hypothesis import given
import hypothesis.strategies as st
from funcpipe_rag import CleanDoc, RagEnv
from .conftest import env_strategy


@given(st.builds(CleanDoc, doc_id=st.text(min_size=1), title=st.text(), abstract=st.text(min_size=1),
                 categories=st.text()), env_strategy())
def test_bad_chunk_doc_deterministic(doc: CleanDoc, env: RagEnv) -> None:
    assert bad_chunk_doc(doc, env) == bad_chunk_doc(doc, env)  # Falsifies due to randomness

Hypothesis failure trace (run to verify; example):

Falsifying example: test_bad_chunk_doc_deterministic(
    doc=CleanDoc(doc_id='a', title='', abstract='ab', categories=''), 
    env=RagEnv(chunk_size=1),
)
AssertionError

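Hypothesis shrinks the input to a small document that still yields more than one chunk; two shuffles of the same chunks can then come back in different orders, so the equality check fails. Shrinking does not find the bug by itself; it reduces the failing input to the smallest case that still shows it.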

7. When Explicit Dependencies Aren't Worth It¶

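Rarely. For trivial scripts or measured hot paths, reading a global directly can be acceptable; keep such code at the edges and rely on property tests to catch any nondeterminism that leaks into the core.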

8. Pre-Core Quiz¶

datetime.now() inside pure func → violates? → Explicit dependencies
Global logger → violates? → No hidden prints
Same inputs+fixed env → same output? → Determinism
Direct DB call → fix with? → env.db
Tool to prove fixed-env determinism? → Hypothesis
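
For the "Direct DB call → env.db" item, a minimal sketch of the fix follows. All names here (`DbEnv`, `count_matching_pure`, `count_matching_shell`, the fake rows) are hypothetical; the running project does not define a database dependency in this core.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DbEnv:
    # Hypothetical port: maps a category to the titles stored for it.
    db: Callable[[str], list[str]]


def count_matching_pure(titles: list[str], keyword: str) -> int:
    # Pure core: case-insensitive substring count over plain data.
    needle = keyword.lower()
    return sum(1 for title in titles if needle in title.lower())


def count_matching_shell(env: DbEnv, category: str, keyword: str) -> int:
    # Shell: performs the lookup through env.db, then delegates.
    return count_matching_pure(env.db(category), keyword)


# A dict-backed fake replaces the real database in tests.
fake_rows = {"cs.AI": ["Pure Functions", "Impure Shells", "Lazy Streams"]}
fake_env = DbEnv(db=lambda category: fake_rows.get(category, []))
```

The shell never imports a database driver; production code would construct `DbEnv` with a real query function at the edge.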

9. Post-Core Reflection & Exercise¶

Reflect: In your code, find one function touching globals/env/time/random/print. Bundle into frozen Env; pull pure core; write shell; inject with with_context; add Hypothesis.
Project Exercise: Isolate effects in RAG; run properties on sample data.

All claims (e.g., referential transparency) are verifiable via the provided Hypothesis examples—run them to confirm.

Further Reading: For more on purity pitfalls, see the 'Functions as Objects' part of 'Fluent Python'. Readers who want the basics first can start with free resources such as the Functional Programming HOWTO on Python.org or Codecademy's Advanced Python course.

Next: Core 9 – Equational Reasoning and Local Rewrite Rules for Pure Code. (Builds on this RAG pure core.)