Specs & Plans
Fase 40 — content-provenance Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Build jw_core.provenance — a layer-2 fidelity validator that, given any Citation produced by the toolkit, can re-fetch the source page, compute a canonical hash, and report whether the text the agent originally used still matches what is live. The module integrates with Fase 39 (NLI) to re-run entailment when fidelity drift is detected.
Architecture: New module packages/jw-core/src/jw_core/provenance/ with five files (models.py, hashing.py, validator.py, propagation.py, errors.py). Convention-based extension of Citation.metadata (no dataclass change): four keys published_date, accessed_at, content_hash, revision. Validator receives fetcher + extractor + optional NLIProvider (Fase 39) — never instantiates httpx itself. CLI jw provenance check and MCP tool verify_provenance complete the surface. Telemetry hooks emit provenance_drift events that piggyback Fase 9’s opt-in switch.
Tech Stack: Python 3.13 · Pydantic 2 (models) · stdlib hashlib + unicodedata (canonicalization) · existing httpx (no new dep) · pytest with pytest-asyncio (already in dev deps).
Spec: docs/superpowers/specs/2026-05-31-fase-40-content-provenance-design.md.
Depende de: Fase 39 (nli-runtime) — solo para la integración opcional nli_provider. Fase 40 degrada limpiamente sin Fase 39 (import-guarded).
File map
Creates:
packages/jw-core/src/jw_core/provenance/__init__.pypackages/jw-core/src/jw_core/provenance/errors.pypackages/jw-core/src/jw_core/provenance/models.pypackages/jw-core/src/jw_core/provenance/hashing.pypackages/jw-core/src/jw_core/provenance/validator.pypackages/jw-core/src/jw_core/provenance/propagation.pypackages/jw-core/tests/test_provenance/__init__.pypackages/jw-core/tests/test_provenance/test_errors.pypackages/jw-core/tests/test_provenance/test_models.pypackages/jw-core/tests/test_provenance/test_hashing.pypackages/jw-core/tests/test_provenance/test_validator.pypackages/jw-core/tests/test_provenance/test_validator_nli.pypackages/jw-core/tests/test_provenance/test_propagation.pypackages/jw-core/tests/test_provenance/test_validator_drift_detection.pypackages/jw-core/tests/test_provenance/test_backwards_compat.pypackages/jw-core/tests/test_provenance/fixtures/__init__.pypackages/jw-core/tests/test_provenance/fixtures/agent_result_with_provenance.jsonpackages/jw-core/tests/test_provenance/fixtures/agent_result_legacy.jsonpackages/jw-cli/src/jw_cli/commands/provenance.pypackages/jw-cli/tests/test_cli_provenance.pypackages/jw-mcp/src/jw_mcp/tools/provenance.pypackages/jw-mcp/tests/test_provenance_tool.pydocs/guias/content-provenance.md
Modifies:
packages/jw-agents/src/jw_agents/verse_explainer.py— stamp citations with provenance fields at emission.packages/jw-agents/src/jw_agents/apologetics.py— same.packages/jw-core/src/jw_core/wol_client.py— stamp onget_article/get_bible_chapteringest.packages/jw-cli/src/jw_cli/main.py— registerprovenancesubcommand group.packages/jw-mcp/src/jw_mcp/server.py— registerverify_provenancetool.docs/ROADMAP.md— add Fase 40 section.docs/VISION_AUDIT.md— add Fase 40 row.docs/README.md— link the newcontent-provenance.mdguide.
Task 1: Scaffold provenance/ module + empty tests directory
Files:
-
Create:
packages/jw-core/src/jw_core/provenance/__init__.py -
Create:
packages/jw-core/src/jw_core/provenance/errors.py -
Create:
packages/jw-core/tests/test_provenance/__init__.py -
Create:
packages/jw-core/tests/test_provenance/test_errors.py -
Step 1: Write failing test for errors module
# packages/jw-core/tests/test_provenance/test_errors.py
"""Sanity tests for provenance exception classes."""
from __future__ import annotations
import pytest
from jw_core.provenance.errors import (
MissingProvenanceError,
ProvenanceError,
ProvenanceFetchError,
)
def test_missing_provenance_is_provenance_error() -> None:
err = MissingProvenanceError("no content_hash in citation")
assert isinstance(err, ProvenanceError)
assert "no content_hash" in str(err)
def test_fetch_error_carries_url_attribute() -> None:
err = ProvenanceFetchError("timeout", url="https://wol.jw.org/x")
assert isinstance(err, ProvenanceError)
assert err.url == "https://wol.jw.org/x"
assert "timeout" in str(err)
def test_provenance_error_is_distinct_from_value_error() -> None:
with pytest.raises(ProvenanceError):
raise ProvenanceError("boom")
with pytest.raises(ProvenanceError):
raise MissingProvenanceError("also boom")
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_errors.py -v
Expected: FAIL — jw_core.provenance module missing.
- Step 3: Implement the package skeleton + errors
# packages/jw-core/src/jw_core/provenance/__init__.py
"""Content provenance — Layer 2 fidelity tracking.
This module answers the question: "is the text my agent used still the
same as what's live on wol.jw.org right now?". It complements
`jw_core.citations.validator` (Fase 23 — L0/L1: URL resolves, doc_id in
catalog) and `jw_core.nli` (Fase 39 — L3: entailment).
Public API (curated re-exports):
from jw_core.provenance import (
ProvenanceRecord,
ProvenanceVerdict,
ProvenanceReport,
ProvenanceValidator,
canonicalize_text,
content_sha256,
stamp_citation,
stamp_finding_text,
ProvenanceError,
MissingProvenanceError,
ProvenanceFetchError,
)
All re-exports preserve a single-import boundary so that callers (CLI,
MCP, agents) never reach into submodules.
"""
from __future__ import annotations
from jw_core.provenance.errors import (
MissingProvenanceError,
ProvenanceError,
ProvenanceFetchError,
)
from jw_core.provenance.hashing import canonicalize_text, content_sha256
from jw_core.provenance.models import (
ProvenanceRecord,
ProvenanceReport,
ProvenanceVerdict,
)
from jw_core.provenance.propagation import stamp_citation, stamp_finding_text
from jw_core.provenance.validator import ProvenanceValidator
__all__ = [
"MissingProvenanceError",
"ProvenanceError",
"ProvenanceFetchError",
"ProvenanceRecord",
"ProvenanceReport",
"ProvenanceValidator",
"ProvenanceVerdict",
"canonicalize_text",
"content_sha256",
"stamp_citation",
"stamp_finding_text",
]
# packages/jw-core/src/jw_core/provenance/errors.py
"""Exceptions emitted by the provenance subsystem.
Conventions:
- All exceptions are subclasses of ProvenanceError so callers can
install one blanket handler.
- Fetch failures carry the offending URL so the CLI can surface it.
- Missing-data errors are distinct from fetch errors — they signal
"citation was emitted without provenance metadata" which is a
backwards-compat scenario, not an outage.
"""
from __future__ import annotations
class ProvenanceError(Exception):
"""Base class for all provenance-related failures."""
class MissingProvenanceError(ProvenanceError):
"""A Citation lacks the four conventional provenance keys in metadata.
Raised only when the caller asks for a strict check; the validator
itself prefers to return `status="no_record"` for backwards compat.
"""
class ProvenanceFetchError(ProvenanceError):
"""The fetcher could not retrieve the URL for a re-check.
Carries the URL so it can be reported per-citation without losing
context after exceptions cross async boundaries.
"""
def __init__(self, message: str, *, url: str) -> None:
super().__init__(message)
self.url = url
# packages/jw-core/tests/test_provenance/__init__.py
"""Tests for jw_core.provenance."""
- Step 4: Stub the other submodules so the package import succeeds
To keep this task self-contained, write placeholder submodules that the
__init__.py re-exports point at. Each will be properly implemented in
later tasks; for now they expose minimal class names so imports succeed.
# packages/jw-core/src/jw_core/provenance/models.py
"""Pydantic models for provenance (filled in Task 2 + Task 3)."""
from __future__ import annotations
from pydantic import BaseModel
class ProvenanceRecord(BaseModel):
"""Placeholder — replaced in Task 2."""
class ProvenanceVerdict(BaseModel):
"""Placeholder — replaced in Task 3."""
class ProvenanceReport(BaseModel):
"""Placeholder — replaced in Task 3."""
# packages/jw-core/src/jw_core/provenance/hashing.py
"""Canonicalization + sha256 (filled in Task 4)."""
from __future__ import annotations
def canonicalize_text(text: str) -> str: # pragma: no cover — replaced in Task 4
return text
def content_sha256(text: str) -> str: # pragma: no cover — replaced in Task 4
return ""
# packages/jw-core/src/jw_core/provenance/validator.py
"""Validator (filled in Task 5)."""
from __future__ import annotations
class ProvenanceValidator: # pragma: no cover — replaced in Task 5
pass
# packages/jw-core/src/jw_core/provenance/propagation.py
"""Propagation helpers (filled in Task 6)."""
from __future__ import annotations
def stamp_citation(citation, *, text, published_date=None, revision=None): # pragma: no cover — replaced in Task 6
return citation
def stamp_finding_text(finding): # pragma: no cover — replaced in Task 6
return finding
- Step 5: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_errors.py -v
Expected: 3 passed.
- Step 6: Commit
git add packages/jw-core/src/jw_core/provenance packages/jw-core/tests/test_provenance
git commit -m "feat(jw-core/provenance): scaffold module with errors and placeholder submodules"
Task 2: ProvenanceRecord model + typed view over Citation.metadata
Files:
-
Modify:
packages/jw-core/src/jw_core/provenance/models.py -
Create:
packages/jw-core/tests/test_provenance/test_models.py -
Step 1: Write the failing test
# packages/jw-core/tests/test_provenance/test_models.py
"""Tests for ProvenanceRecord — the read-only typed view over Citation.metadata."""
from __future__ import annotations
from typing import Any
import pytest
from jw_core.provenance.models import ProvenanceRecord
def test_from_citation_metadata_returns_none_when_keys_absent() -> None:
"""Backwards compat: a legacy citation with no provenance keys → None."""
assert ProvenanceRecord.from_citation_metadata({}) is None
assert ProvenanceRecord.from_citation_metadata({"unrelated": "stuff"}) is None
def test_from_citation_metadata_requires_at_minimum_content_hash_and_accessed_at() -> None:
"""`content_hash` and `accessed_at` are the two non-negotiable fields."""
meta_partial: dict[str, Any] = {"accessed_at": "2026-05-31T10:00:00Z"}
assert ProvenanceRecord.from_citation_metadata(meta_partial) is None
meta_partial2: dict[str, Any] = {"content_hash": "deadbeef"}
assert ProvenanceRecord.from_citation_metadata(meta_partial2) is None
def test_from_citation_metadata_roundtrip_full() -> None:
meta: dict[str, Any] = {
"published_date": "2023-01-15",
"accessed_at": "2026-05-31T10:00:00Z",
"content_hash": "abc123def456",
"revision": "rev. 2023",
"other_unrelated": "ignored",
}
record = ProvenanceRecord.from_citation_metadata(meta)
assert record is not None
assert record.published_date == "2023-01-15"
assert record.accessed_at == "2026-05-31T10:00:00Z"
assert record.content_hash == "abc123def456"
assert record.revision == "rev. 2023"
def test_from_citation_metadata_optionals_null_safe() -> None:
"""published_date and revision are optional; only the two anchors must be present."""
meta: dict[str, Any] = {
"accessed_at": "2026-05-31T10:00:00Z",
"content_hash": "deadbeef",
}
record = ProvenanceRecord.from_citation_metadata(meta)
assert record is not None
assert record.published_date is None
assert record.revision is None
def test_to_dict_emits_only_present_keys() -> None:
"""The serializer is used by stamp_citation when re-projecting back."""
record = ProvenanceRecord(
accessed_at="2026-05-31T10:00:00Z",
content_hash="abc",
published_date=None,
revision=None,
)
out = record.model_dump(exclude_none=True)
assert "published_date" not in out
assert "revision" not in out
assert out["accessed_at"] == "2026-05-31T10:00:00Z"
assert out["content_hash"] == "abc"
def test_record_is_immutable_view_not_a_mutator() -> None:
"""Construction does not mutate the source dict (pure projection)."""
meta = {
"accessed_at": "2026-05-31T10:00:00Z",
"content_hash": "abc",
}
snapshot = dict(meta)
ProvenanceRecord.from_citation_metadata(meta)
assert meta == snapshot
def test_construction_rejects_unknown_field() -> None:
"""Pydantic strict-ish: unknown keyword raises."""
with pytest.raises(Exception):
ProvenanceRecord( # type: ignore[call-arg]
accessed_at="x",
content_hash="y",
nonsense="oops",
)
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_models.py -v
Expected: FAIL — ProvenanceRecord.from_citation_metadata not defined / returns wrong shape.
- Step 3: Implement
ProvenanceRecord
Replace packages/jw-core/src/jw_core/provenance/models.py contents (the
two other models remain placeholders for Task 3):
# packages/jw-core/src/jw_core/provenance/models.py
"""Pydantic models for provenance.
`ProvenanceRecord` is a read-only typed view over the four conventional
keys that live inside `Citation.metadata`. We deliberately do NOT extend
the `Citation` dataclass — that would force 1984+ existing tests to
update. Instead we project the dict into a typed Pydantic model on
demand and project back out via `model_dump(exclude_none=True)`.
`ProvenanceVerdict` and `ProvenanceReport` (Task 3) carry the result of
a re-fetch comparison and an aggregate of those, respectively.
"""
from __future__ import annotations
from datetime import datetime
from typing import Any, Literal
from pydantic import BaseModel, ConfigDict, Field
class ProvenanceRecord(BaseModel):
"""Typed view over the four provenance keys in `Citation.metadata`.
Two of the four are required for the view to be meaningful:
- `accessed_at` — when the toolkit pulled the text
- `content_hash` — sha256 hex of the canonicalized passage
The other two are recommended but optional:
- `published_date` — original publication date (ISO 8601), may be missing on WOL
- `revision` — translation revision tag, e.g. "rev. 2023"
"""
model_config = ConfigDict(extra="forbid", frozen=False, str_strip_whitespace=False)
accessed_at: str
content_hash: str
published_date: str | None = None
revision: str | None = None
@classmethod
def from_citation_metadata(cls, meta: dict[str, Any]) -> "ProvenanceRecord | None":
"""Project a Citation.metadata dict into a typed record.
Returns None when either anchor field is missing — this is the
backwards-compat path for citations emitted before Fase 40.
Never mutates the source dict.
"""
if not isinstance(meta, dict):
return None
accessed_at = meta.get("accessed_at")
content_hash = meta.get("content_hash")
if not isinstance(accessed_at, str) or not isinstance(content_hash, str):
return None
if not accessed_at or not content_hash:
return None
published_date = meta.get("published_date")
if published_date is not None and not isinstance(published_date, str):
published_date = None
revision = meta.get("revision")
if revision is not None and not isinstance(revision, str):
revision = None
return cls(
accessed_at=accessed_at,
content_hash=content_hash,
published_date=published_date,
revision=revision,
)
# ProvenanceVerdict and ProvenanceReport are implemented in Task 3.
class ProvenanceVerdict(BaseModel):
"""Placeholder — replaced in Task 3."""
class ProvenanceReport(BaseModel):
"""Placeholder — replaced in Task 3."""
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_models.py -v
Expected: 7 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/provenance/models.py packages/jw-core/tests/test_provenance/test_models.py
git commit -m "feat(jw-core/provenance): ProvenanceRecord typed view over Citation.metadata"
Task 3: ProvenanceVerdict + ProvenanceReport models
Files:
-
Modify:
packages/jw-core/src/jw_core/provenance/models.py -
Modify:
packages/jw-core/tests/test_provenance/test_models.py -
Step 1: Append the failing tests
Append to packages/jw-core/tests/test_provenance/test_models.py:
def test_verdict_match_minimal() -> None:
"""The simplest happy-path verdict only needs the two hashes and the recheck time."""
from jw_core.provenance.models import ProvenanceVerdict
v = ProvenanceVerdict(
url="https://wol.jw.org/x",
status="match",
original_hash="abc",
current_hash="abc",
delta_chars=0,
accessed_at_original="2026-05-30T10:00:00Z",
accessed_at_recheck="2026-05-31T10:00:00Z",
)
assert v.status == "match"
assert v.original_hash == v.current_hash
assert v.nli_rerun is None
assert v.notes == []
def test_verdict_changed_with_nli_rerun() -> None:
"""When NLI is available and content changed, we attach the new verdict."""
from jw_core.provenance.models import ProvenanceVerdict
v = ProvenanceVerdict(
url="https://wol.jw.org/x",
status="changed",
original_hash="abc",
current_hash="xyz",
delta_chars=42,
accessed_at_original="2026-05-30T10:00:00Z",
accessed_at_recheck="2026-05-31T10:00:00Z",
nli_rerun={"changed": True, "from": "entails", "to": "neutral", "score": 0.42},
notes=["sha256 mismatch"],
)
assert v.status == "changed"
assert v.nli_rerun is not None
assert v.nli_rerun["from"] == "entails"
def test_verdict_unreachable_no_current_hash() -> None:
"""Network failure → status='unreachable', current_hash is None."""
from jw_core.provenance.models import ProvenanceVerdict
v = ProvenanceVerdict(
url="https://wol.jw.org/x",
status="unreachable",
original_hash="abc",
current_hash=None,
delta_chars=None,
accessed_at_original="2026-05-30T10:00:00Z",
accessed_at_recheck="2026-05-31T10:00:00Z",
)
assert v.current_hash is None
assert v.delta_chars is None
def test_verdict_no_record() -> None:
"""Citation lacked provenance keys altogether."""
from jw_core.provenance.models import ProvenanceVerdict
v = ProvenanceVerdict(
url="https://wol.jw.org/x",
status="no_record",
original_hash=None,
current_hash=None,
delta_chars=None,
accessed_at_original=None,
accessed_at_recheck="2026-05-31T10:00:00Z",
)
assert v.status == "no_record"
assert v.original_hash is None
def test_verdict_skipped_explanation() -> None:
"""`skipped` is what `check_since` emits when a citation is too recent."""
from jw_core.provenance.models import ProvenanceVerdict
v = ProvenanceVerdict(
url="https://wol.jw.org/x",
status="skipped",
original_hash="abc",
current_hash=None,
delta_chars=None,
accessed_at_original="2026-05-30T10:00:00Z",
accessed_at_recheck="2026-05-31T10:00:00Z",
notes=["accessed_at >= since threshold"],
)
assert v.status == "skipped"
def test_verdict_rejects_unknown_status() -> None:
from jw_core.provenance.models import ProvenanceVerdict
with pytest.raises(Exception):
ProvenanceVerdict(
url="https://wol.jw.org/x",
status="bogus", # type: ignore[arg-type]
original_hash=None,
current_hash=None,
delta_chars=None,
accessed_at_original=None,
accessed_at_recheck="2026-05-31T10:00:00Z",
)
def test_report_summarize_counts_statuses() -> None:
from datetime import datetime
from jw_core.provenance.models import ProvenanceReport, ProvenanceVerdict
started = datetime(2026, 5, 31, 10, 0, 0)
finished = datetime(2026, 5, 31, 10, 0, 5)
verdicts = [
ProvenanceVerdict(
url=f"https://wol.jw.org/{i}",
status=status,
original_hash="abc",
current_hash=None,
delta_chars=None,
accessed_at_original=None,
accessed_at_recheck="2026-05-31T10:00:00Z",
)
for i, status in enumerate(["match", "match", "changed", "unreachable", "no_record"])
]
report = ProvenanceReport(
started_at=started,
finished_at=finished,
verdicts=verdicts,
summary=ProvenanceReport.summarize(verdicts),
)
assert report.summary["match"] == 2
assert report.summary["changed"] == 1
assert report.summary["unreachable"] == 1
assert report.summary["no_record"] == 1
assert report.summary.get("skipped", 0) == 0
def test_report_round_trip_json() -> None:
"""Reports serialize cleanly — used by CLI --report json."""
from datetime import datetime
from jw_core.provenance.models import ProvenanceReport, ProvenanceVerdict
started = datetime(2026, 5, 31, 10, 0, 0)
finished = datetime(2026, 5, 31, 10, 0, 5)
verdicts = [
ProvenanceVerdict(
url="https://wol.jw.org/x",
status="match",
original_hash="abc",
current_hash="abc",
delta_chars=0,
accessed_at_original="2026-05-30T10:00:00Z",
accessed_at_recheck="2026-05-31T10:00:00Z",
)
]
report = ProvenanceReport(
started_at=started,
finished_at=finished,
verdicts=verdicts,
summary=ProvenanceReport.summarize(verdicts),
)
raw = report.model_dump_json()
rehydrated = ProvenanceReport.model_validate_json(raw)
assert rehydrated.verdicts[0].status == "match"
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_models.py -v
Expected: 8 new tests FAIL.
- Step 3: Implement
ProvenanceVerdictandProvenanceReport
Replace the two placeholders in packages/jw-core/src/jw_core/provenance/models.py
with full implementations (keep ProvenanceRecord intact above them):
# Append/replace inside packages/jw-core/src/jw_core/provenance/models.py
VerdictStatus = Literal["match", "changed", "unreachable", "no_record", "skipped"]
class ProvenanceVerdict(BaseModel):
"""The result of comparing a single citation's stored hash to a re-fetch.
Statuses:
- "match": current text canonicalizes to the same hash as stored.
- "changed": hashes differ — the live text has been edited.
- "unreachable": fetcher raised or returned non-2xx — verdict is unknown.
- "no_record": the citation lacked provenance metadata (backwards compat).
- "skipped": `check_since` excluded this citation by date threshold.
"""
model_config = ConfigDict(extra="forbid")
url: str
status: VerdictStatus
original_hash: str | None
current_hash: str | None
delta_chars: int | None
accessed_at_original: str | None
accessed_at_recheck: str
nli_rerun: dict[str, Any] | None = None
notes: list[str] = Field(default_factory=list)
class ProvenanceReport(BaseModel):
"""Aggregate of many ProvenanceVerdicts produced in a single run."""
model_config = ConfigDict(extra="forbid")
started_at: datetime
finished_at: datetime
verdicts: list[ProvenanceVerdict] = Field(default_factory=list)
summary: dict[str, int] = Field(default_factory=dict)
@staticmethod
def summarize(verdicts: list[ProvenanceVerdict]) -> dict[str, int]:
"""Roll up counts per status. Missing statuses yield 0 on demand."""
counts: dict[str, int] = {}
for v in verdicts:
counts[v.status] = counts.get(v.status, 0) + 1
return counts
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_models.py -v
Expected: 15 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/provenance/models.py packages/jw-core/tests/test_provenance/test_models.py
git commit -m "feat(jw-core/provenance): ProvenanceVerdict + ProvenanceReport with summarize()"
Task 4: canonicalize_text + content_sha256 (NFC, whitespace, preserve case)
Files:
-
Modify:
packages/jw-core/src/jw_core/provenance/hashing.py -
Create:
packages/jw-core/tests/test_provenance/test_hashing.py -
Step 1: Write the failing test
# packages/jw-core/tests/test_provenance/test_hashing.py
"""Tests for canonicalize_text() and content_sha256().
Design pinned by the spec:
- NFC unicode normalization
- Collapse internal whitespace runs to a single space
- Strip leading/trailing whitespace
- PRESERVE capitalization (Jehová vs jehová is doctrinally meaningful)
- Eliminate zero-width characters
The hash must be stable across cosmetic-only edits and sensitive to
actual content edits.
"""
from __future__ import annotations
from jw_core.provenance.hashing import canonicalize_text, content_sha256
def test_canonicalize_strips_outer_whitespace() -> None:
assert canonicalize_text(" hello ") == "hello"
def test_canonicalize_collapses_internal_whitespace_runs() -> None:
assert canonicalize_text("hello\t world\n\nfriend") == "hello world friend"
def test_canonicalize_preserves_capitalization() -> None:
"""Spec decision: do NOT lowercase. `Jehová` and `jehová` hash differently."""
a = canonicalize_text("Jehová es Dios")
b = canonicalize_text("jehová es dios")
assert a != b
# And neither is lowercased internally:
assert "Jehová" in a
assert "jehová" in b
def test_canonicalize_nfc_normalizes_decomposed_form() -> None:
"""`é` composed (U+00E9) vs decomposed (e + U+0301) must canonicalize the same."""
composed = "Jehová" # á as a single codepoint
decomposed = "Jehová" # a + combining acute
assert canonicalize_text(composed) == canonicalize_text(decomposed)
def test_canonicalize_removes_zero_width_chars() -> None:
"""ZWSP / ZWJ / ZWNJ / BOM are stripped."""
text = "Jehová"
assert canonicalize_text(text) == "Jehová"
def test_canonicalize_is_idempotent() -> None:
"""Running it twice yields the same string."""
a = canonicalize_text(" hello world ")
assert canonicalize_text(a) == a
def test_content_sha256_stable_across_cosmetic_edits() -> None:
"""Whitespace, NFC, ZWSP must not change the hash."""
base = "Jehová amó tanto al mundo que dio a su Hijo"
cosmetic = " Jehová amó tanto al mundo\nque dio a su Hijo "
assert content_sha256(base) == content_sha256(cosmetic)
def test_content_sha256_changes_when_real_word_differs() -> None:
base = "Jehová amó tanto al mundo que dio a su Hijo"
edited = "Jehová amó tanto al universo que dio a su Hijo" # mundo → universo
assert content_sha256(base) != content_sha256(edited)
def test_content_sha256_changes_when_capitalization_differs() -> None:
"""Spec decision propagated: capitalization is meaningful."""
a = content_sha256("Jehová es Dios")
b = content_sha256("jehová es Dios")
assert a != b
def test_content_sha256_returns_hex_string() -> None:
"""Returns lowercase hex (sha256 → 64 chars)."""
h = content_sha256("hello")
assert len(h) == 64
assert all(c in "0123456789abcdef" for c in h)
def test_canonicalize_empty_input() -> None:
assert canonicalize_text("") == ""
assert canonicalize_text(" \n ") == ""
def test_content_sha256_empty_is_stable() -> None:
"""An empty canonicalized string still hashes deterministically."""
a = content_sha256("")
b = content_sha256(" ")
assert a == b
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_hashing.py -v
Expected: FAIL — current canonicalize_text is identity.
- Step 3: Implement canonicalization + hashing
Replace packages/jw-core/src/jw_core/provenance/hashing.py:
# packages/jw-core/src/jw_core/provenance/hashing.py
"""Canonicalization + content hashing for provenance.
Why canonicalize before hashing?
Naïve sha256(html_body) is brittle: WOL re-deploys the same HTML with
different attribute ordering, an updated date stamp in <meta>, or a
re-indented body — and our hashes diverge for no doctrinal reason.
The pipeline is intentionally minimal:
1. Unicode NFC normalize so composed/decomposed forms align.
2. Drop zero-width characters that occasionally appear in pasted text.
3. Collapse runs of any whitespace (including newlines, tabs) into a
single ASCII space.
4. Strip leading/trailing whitespace.
What we deliberately do NOT do:
- Lowercase. Capitalization in WT/NWT distinguishes "Jehová" from
casual mentions; "Mi Padre" capitalization in NWT rev. 2023 is a
real doctrinal signal. Squashing it would mask drift we care about.
- Strip punctuation. "Romanos 6:23." vs "Romanos 6:23" is meaningful
in citation chains.
- Remove HTML. Callers are expected to extract plain text first via
the injectable `extractor` on the validator. Hashing HTML directly
would conflate styling drift with content drift.
These choices match the spec Decision NO Obvia in
2026-05-31-fase-40-content-provenance-design.md §canonicalización.
"""
from __future__ import annotations
import hashlib
import re
import unicodedata
# All Unicode zero-width / BOM-ish codepoints we want gone before hashing.
_ZERO_WIDTH = {
"", # ZERO WIDTH SPACE
"", # ZERO WIDTH NON-JOINER
"", # ZERO WIDTH JOINER
"", # WORD JOINER
"", # ZERO WIDTH NO-BREAK SPACE / BOM
}
# Regex matches any run of whitespace per Python's \s (covers \n, \r, \t, etc.)
_WHITESPACE_RUN = re.compile(r"\s+")
def canonicalize_text(text: str) -> str:
"""Normalize text so cosmetic edits don't inflate the content hash.
Steps:
1. NFC normalize.
2. Drop zero-width characters.
3. Collapse whitespace runs to a single space.
4. Strip outer whitespace.
Capitalization is preserved on purpose — the doctrine of distinguishing
"Jehová" from "jehová" is the canonical example.
"""
if not text:
return ""
nfc = unicodedata.normalize("NFC", text)
if _ZERO_WIDTH.intersection(nfc):
nfc = "".join(ch for ch in nfc if ch not in _ZERO_WIDTH)
collapsed = _WHITESPACE_RUN.sub(" ", nfc)
return collapsed.strip()
def content_sha256(text: str) -> str:
"""Lowercase-hex sha256 of `canonicalize_text(text)`."""
canonical = canonicalize_text(text)
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_hashing.py -v
Expected: 12 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/provenance/hashing.py packages/jw-core/tests/test_provenance/test_hashing.py
git commit -m "feat(jw-core/provenance): canonicalize_text + content_sha256 with NFC and whitespace rules"
Task 5: ProvenanceValidator.check async with injected fetcher
Files:
-
Modify:
packages/jw-core/src/jw_core/provenance/validator.py -
Create:
packages/jw-core/tests/test_provenance/test_validator.py -
Step 1: Write the failing test
# packages/jw-core/tests/test_provenance/test_validator.py
"""Tests for ProvenanceValidator — fetcher is injected, never real network."""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
import pytest
from jw_agents.base import Citation, Finding
from jw_core.provenance.validator import (
FetcherResponse,
ProvenanceValidator,
)
from jw_core.provenance.hashing import content_sha256
# ── Fakes ──────────────────────────────────────────────────────────────
@dataclass
class FakeFetcher:
"""Maps URL → (status, body). Async-callable like the production fetcher."""
canned: dict[str, tuple[int, str]] = field(default_factory=dict)
calls: list[str] = field(default_factory=list)
raise_for: set[str] = field(default_factory=set)
async def __call__(self, url: str) -> FetcherResponse:
self.calls.append(url)
if url in self.raise_for:
raise RuntimeError(f"forced failure for {url}")
status, body = self.canned.get(url, (404, ""))
return FetcherResponse(final_url=url, status=status, body=body)
def _stamped_citation(text: str, *, url: str = "https://wol.jw.org/x") -> Citation:
"""Build a citation as if the parser had stamped it with provenance."""
return Citation(
url=url,
title="t",
kind="verse",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(text),
"published_date": "2024-01-01",
"revision": "rev. 2023",
},
)
# ── Tests ──────────────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_check_match_when_content_unchanged() -> None:
text = "Jehová amó tanto al mundo"
cit = _stamped_citation(text)
fetcher = FakeFetcher(canned={cit.url: (200, text)})
validator = ProvenanceValidator(fetcher=fetcher)
verdict = await validator.check(cit)
assert verdict.status == "match"
assert verdict.original_hash == verdict.current_hash
assert verdict.delta_chars == 0
assert fetcher.calls == [cit.url]
@pytest.mark.asyncio
async def test_check_changed_when_text_edited() -> None:
original_text = "Jehová amó tanto al mundo"
new_text = "Jehová amó tanto al universo" # mundo → universo
cit = _stamped_citation(original_text)
fetcher = FakeFetcher(canned={cit.url: (200, new_text)})
validator = ProvenanceValidator(fetcher=fetcher)
verdict = await validator.check(cit)
assert verdict.status == "changed"
assert verdict.original_hash != verdict.current_hash
assert verdict.delta_chars is not None
assert verdict.delta_chars >= 0
@pytest.mark.asyncio
async def test_check_unreachable_when_fetcher_raises() -> None:
cit = _stamped_citation("doesn't matter")
fetcher = FakeFetcher(raise_for={cit.url})
validator = ProvenanceValidator(fetcher=fetcher)
verdict = await validator.check(cit)
assert verdict.status == "unreachable"
assert verdict.current_hash is None
assert any("forced failure" in n for n in verdict.notes)
@pytest.mark.asyncio
async def test_check_unreachable_when_non_2xx() -> None:
cit = _stamped_citation("text")
fetcher = FakeFetcher(canned={cit.url: (404, "")})
validator = ProvenanceValidator(fetcher=fetcher)
verdict = await validator.check(cit)
assert verdict.status == "unreachable"
assert any("404" in n for n in verdict.notes)
@pytest.mark.asyncio
async def test_check_no_record_when_citation_lacks_provenance() -> None:
"""Backwards compat: legacy citations have no content_hash → no_record."""
cit = Citation(url="https://wol.jw.org/x", title="t", kind="verse", metadata={})
fetcher = FakeFetcher(canned={cit.url: (200, "anything")})
validator = ProvenanceValidator(fetcher=fetcher)
verdict = await validator.check(cit)
assert verdict.status == "no_record"
assert verdict.original_hash is None
assert fetcher.calls == [] # no fetch attempted
@pytest.mark.asyncio
async def test_check_uses_injected_extractor() -> None:
"""If the fetcher returns HTML, the extractor turns it into plain text first."""
canonical_text = "Jehová amó tanto al mundo"
html = f"<html><body><p>{canonical_text}</p><script>junk</script></body></html>"
cit = _stamped_citation(canonical_text)
def text_only(body: str) -> str:
import re
return re.sub(r"<[^>]+>", " ", body)
fetcher = FakeFetcher(canned={cit.url: (200, html)})
validator = ProvenanceValidator(fetcher=fetcher, extractor=text_only)
verdict = await validator.check(cit)
assert verdict.status == "match"
@pytest.mark.asyncio
async def test_check_agent_output_paralellizes_unique_urls() -> None:
"""Two findings, same URL → fetcher called once."""
text = "shared body"
cit_a = _stamped_citation(text, url="https://wol.jw.org/shared")
cit_b = _stamped_citation(text, url="https://wol.jw.org/shared")
finding_a = Finding(summary="a", citation=cit_a, excerpt=text)
finding_b = Finding(summary="b", citation=cit_b, excerpt=text)
class _R:
findings = [finding_a, finding_b]
fetcher = FakeFetcher(canned={cit_a.url: (200, text)})
validator = ProvenanceValidator(fetcher=fetcher)
report = await validator.check_agent_output(_R())
assert len(report.verdicts) == 2
assert all(v.status == "match" for v in report.verdicts)
# Dedup: only one network call even though two findings point at it
assert fetcher.calls.count(cit_a.url) == 1
assert report.summary["match"] == 2
@pytest.mark.asyncio
async def test_check_since_filters_by_accessed_at_threshold() -> None:
"""Only re-check citations accessed BEFORE the `since` cutoff."""
text = "body"
old = _stamped_citation(text, url="https://wol.jw.org/old")
old.metadata["accessed_at"] = "2026-01-01T00:00:00Z"
new = _stamped_citation(text, url="https://wol.jw.org/new")
new.metadata["accessed_at"] = "2026-05-31T00:00:00Z"
class _F:
def __init__(self, c: Citation) -> None:
self.citation = c
self.metadata: dict[str, Any] = {}
class _R:
findings = [_F(old), _F(new)]
fetcher = FakeFetcher(canned={old.url: (200, text), new.url: (200, text)})
validator = ProvenanceValidator(fetcher=fetcher)
report = await validator.check_since(
_R(),
since=datetime(2026, 3, 1, tzinfo=timezone.utc),
)
# `new` is younger than the cutoff → skipped.
skipped = [v for v in report.verdicts if v.status == "skipped"]
matched = [v for v in report.verdicts if v.status == "match"]
assert len(skipped) == 1
assert skipped[0].url == new.url
assert len(matched) == 1
assert matched[0].url == old.url
# Only the old URL should have been fetched.
assert fetcher.calls == [old.url]
@pytest.mark.asyncio
async def test_check_agent_output_aggregates_summary() -> None:
"""Mixed outcomes — summary counts each status."""
text_match = "x"
text_orig = "y"
text_drift = "z" # different from text_orig
cit_match = _stamped_citation(text_match, url="https://wol.jw.org/a")
cit_drift = _stamped_citation(text_orig, url="https://wol.jw.org/b")
cit_dead = _stamped_citation(text_match, url="https://wol.jw.org/c")
cit_legacy = Citation(url="https://wol.jw.org/d", metadata={})
class _F:
def __init__(self, c: Citation) -> None:
self.citation = c
self.metadata: dict[str, Any] = {}
class _R:
findings = [_F(cit_match), _F(cit_drift), _F(cit_dead), _F(cit_legacy)]
fetcher = FakeFetcher(
canned={
cit_match.url: (200, text_match),
cit_drift.url: (200, text_drift),
cit_dead.url: (500, ""),
cit_legacy.url: (200, "irrelevant"),
}
)
validator = ProvenanceValidator(fetcher=fetcher)
report = await validator.check_agent_output(_R())
assert report.summary["match"] == 1
assert report.summary["changed"] == 1
assert report.summary["unreachable"] == 1
assert report.summary["no_record"] == 1
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_validator.py -v
Expected: FAIL — ProvenanceValidator is still a placeholder.
- Step 3: Implement
ProvenanceValidator
Replace packages/jw-core/src/jw_core/provenance/validator.py:
# packages/jw-core/src/jw_core/provenance/validator.py
"""Re-fetch citations and compare content hashes.
The validator is intentionally narrow: it does not own a network client,
does not parse HTML on its own, and does not know about Fase 39 unless
an `nli_provider` is passed. This keeps it deterministic in tests and
trivially mockable in CI.
Public surface:
ProvenanceValidator.check(citation) -> ProvenanceVerdict
ProvenanceValidator.check_agent_output(result) -> ProvenanceReport
ProvenanceValidator.check_since(result, *, since=dt) -> ProvenanceReport
"""
from __future__ import annotations
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Protocol
from jw_agents.base import Citation
from jw_core.provenance.hashing import canonicalize_text, content_sha256
from jw_core.provenance.models import (
ProvenanceRecord,
ProvenanceReport,
ProvenanceVerdict,
)
@dataclass
class FetcherResponse:
"""Minimal response carried back from the injected fetcher."""
final_url: str
status: int
body: str = ""
redirect_chain: list[str] = field(default_factory=list)
AsyncFetcher = Callable[[str], Awaitable[FetcherResponse]]
Extractor = Callable[[str], str]
class NLIProvider(Protocol): # pragma: no cover — structural typing only
"""Minimal slice of Fase 39's NLIProvider needed for re-validation."""
async def evaluate_entailment(self, claim: str, premise: str) -> Any: ...
def _default_extractor(body: str) -> str:
"""Identity extractor — used when caller does not provide one.
The spec recommends callers always inject one. Identity is a safe
fallback that won't crash on plain-text fetcher responses (the
FakeFetcher case in tests).
"""
return body
def _utcnow_iso() -> str:
"""ISO 8601 UTC timestamp for `accessed_at_recheck`."""
return datetime.now(timezone.utc).isoformat()
def _parse_iso(value: str | None) -> datetime | None:
"""Best-effort ISO 8601 parser; returns None on garbage."""
if not value:
return None
try:
# Python 3.11+ parses Z-suffixed strings natively.
return datetime.fromisoformat(value.replace("Z", "+00:00"))
except ValueError:
return None
class ProvenanceValidator:
"""Compare stored content hashes vs live re-fetches.
Args:
fetcher: async URL -> FetcherResponse. Required.
extractor: sync HTML/body -> plain text. Defaults to identity.
Inject the WOL extractor for HTML pages.
nli_provider: Fase 39's NLIProvider. When provided AND a verdict
is `changed`, re-runs entailment on the new text.
concurrency: max parallel fetches (default 4 — matches Fase 23).
"""
def __init__(
self,
*,
fetcher: AsyncFetcher,
extractor: Extractor | None = None,
nli_provider: NLIProvider | None = None,
concurrency: int = 4,
) -> None:
self._fetcher = fetcher
self._extractor = extractor or _default_extractor
self._nli_provider = nli_provider
self._concurrency = concurrency
self._sem: asyncio.Semaphore | None = None
def _get_sem(self) -> asyncio.Semaphore:
if self._sem is None:
self._sem = asyncio.Semaphore(self._concurrency)
return self._sem
# ── Single-citation check ──────────────────────────────────────────
async def check(self, citation: Citation) -> ProvenanceVerdict:
"""Re-fetch one citation's URL and compare content hashes."""
recheck_at = _utcnow_iso()
record = ProvenanceRecord.from_citation_metadata(citation.metadata)
if record is None:
# Backwards compat: never emitted with provenance — skip the fetch.
return ProvenanceVerdict(
url=citation.url,
status="no_record",
original_hash=None,
current_hash=None,
delta_chars=None,
accessed_at_original=None,
accessed_at_recheck=recheck_at,
notes=["citation has no provenance metadata"],
)
# Fetch.
try:
async with self._get_sem():
resp = await self._fetcher(citation.url)
except Exception as exc: # noqa: BLE001 — fetcher contract is wide
return ProvenanceVerdict(
url=citation.url,
status="unreachable",
original_hash=record.content_hash,
current_hash=None,
delta_chars=None,
accessed_at_original=record.accessed_at,
accessed_at_recheck=recheck_at,
notes=[f"fetcher raised: {exc!r}"],
)
if not (200 <= resp.status < 300):
return ProvenanceVerdict(
url=citation.url,
status="unreachable",
original_hash=record.content_hash,
current_hash=None,
delta_chars=None,
accessed_at_original=record.accessed_at,
accessed_at_recheck=recheck_at,
notes=[f"non-2xx response: HTTP {resp.status}"],
)
# Extract + hash.
plain = self._extractor(resp.body)
current_hash = content_sha256(plain)
canonical_current = canonicalize_text(plain)
# `delta_chars` is a coarse heuristic — we don't know the original
# canonical length anymore (we only stored the hash), so we report
# the absolute character count of the canonical current text. This
# is enough for an operator to spot order-of-magnitude drift.
delta = abs(len(canonical_current)) if current_hash != record.content_hash else 0
if current_hash == record.content_hash:
return ProvenanceVerdict(
url=citation.url,
status="match",
original_hash=record.content_hash,
current_hash=current_hash,
delta_chars=0,
accessed_at_original=record.accessed_at,
accessed_at_recheck=recheck_at,
)
verdict = ProvenanceVerdict(
url=citation.url,
status="changed",
original_hash=record.content_hash,
current_hash=current_hash,
delta_chars=delta,
accessed_at_original=record.accessed_at,
accessed_at_recheck=recheck_at,
notes=["sha256 mismatch"],
)
# Optional re-NLI on the fresh text — only if Fase 39 is wired up
# AND the original citation carried both a claim and a baseline verdict.
if self._nli_provider is not None:
verdict.nli_rerun = await self._maybe_rerun_nli(citation, canonical_current)
return verdict
async def _maybe_rerun_nli(
self,
citation: Citation,
new_premise: str,
) -> dict[str, Any] | None:
"""Re-run NLI on the new text and report a delta vs the stored verdict."""
claim = citation.metadata.get("nli_claim")
baseline = citation.metadata.get("nli_verdict")
if not isinstance(claim, str) or not claim:
return None
try:
new = await self._nli_provider.evaluate_entailment(claim, new_premise)
except Exception as exc: # noqa: BLE001
return {"changed": False, "error": f"nli_rerun failed: {exc!r}"}
# Fase 39's verdict is a Pydantic model with `.label` and `.score`
# — but we duck-type to keep this validator independent of its
# exact shape. We accept any object exposing `.label` and `.score`,
# or any dict with those keys.
new_label = getattr(new, "label", None)
if new_label is None and isinstance(new, dict):
new_label = new.get("label")
new_score = getattr(new, "score", None)
if new_score is None and isinstance(new, dict):
new_score = new.get("score")
if new_label is None:
return None
return {
"changed": (baseline != new_label),
"from": baseline,
"to": new_label,
"score": new_score,
}
# ── Batch over an AgentResult-like ─────────────────────────────────
async def check_agent_output(self, agent_output: Any) -> ProvenanceReport:
"""Iterate the result's findings, dedup by URL, parallelize fetches."""
started = datetime.now(timezone.utc)
citations = self._collect_citations(agent_output)
verdicts = await self._check_many(citations)
finished = datetime.now(timezone.utc)
return ProvenanceReport(
started_at=started,
finished_at=finished,
verdicts=verdicts,
summary=ProvenanceReport.summarize(verdicts),
)
async def check_since(
self,
agent_output: Any,
*,
since: datetime,
) -> ProvenanceReport:
"""Like check_agent_output but skips citations younger than `since`."""
started = datetime.now(timezone.utc)
all_citations = self._collect_citations(agent_output)
to_check: list[Citation] = []
skipped_verdicts: list[ProvenanceVerdict] = []
recheck_at = _utcnow_iso()
for cit in all_citations:
accessed = _parse_iso(cit.metadata.get("accessed_at"))
if accessed is not None and accessed >= since:
skipped_verdicts.append(
ProvenanceVerdict(
url=cit.url,
status="skipped",
original_hash=cit.metadata.get("content_hash"),
current_hash=None,
delta_chars=None,
accessed_at_original=cit.metadata.get("accessed_at"),
accessed_at_recheck=recheck_at,
notes=[f"accessed_at >= since={since.isoformat()}"],
)
)
else:
to_check.append(cit)
fetched = await self._check_many(to_check)
verdicts = fetched + skipped_verdicts
finished = datetime.now(timezone.utc)
return ProvenanceReport(
started_at=started,
finished_at=finished,
verdicts=verdicts,
summary=ProvenanceReport.summarize(verdicts),
)
# ── Internals ──────────────────────────────────────────────────────
@staticmethod
def _collect_citations(agent_output: Any) -> list[Citation]:
"""Best-effort: pull citations out of `findings`. Order preserved."""
out: list[Citation] = []
for f in getattr(agent_output, "findings", []) or []:
cit = getattr(f, "citation", None)
if isinstance(cit, Citation):
out.append(cit)
return out
async def _check_many(self, citations: list[Citation]) -> list[ProvenanceVerdict]:
"""Dedup by URL, run checks concurrently, then re-expand by URL."""
seen: dict[str, Citation] = {}
order: list[str] = []
for cit in citations:
if cit.url not in seen:
seen[cit.url] = cit
order.append(cit.url)
tasks = [self.check(seen[u]) for u in order]
verdicts_by_url = dict(zip(order, await asyncio.gather(*tasks), strict=True))
# Re-expand: every input citation gets a verdict (verbatim copy if dup).
return [verdicts_by_url[c.url] for c in citations]
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_validator.py -v
Expected: 9 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/provenance/validator.py packages/jw-core/tests/test_provenance/test_validator.py
git commit -m "feat(jw-core/provenance): ProvenanceValidator.check/check_agent_output/check_since with FakeFetcher"
Task 6: Propagation helpers (stamp_citation, stamp_finding_text) + WOL ingest hook
Files:
-
Modify:
packages/jw-core/src/jw_core/provenance/propagation.py -
Create:
packages/jw-core/tests/test_provenance/test_propagation.py -
Modify:
packages/jw-core/src/jw_core/wol_client.py(small additions) -
Step 1: Write the failing test
# packages/jw-core/tests/test_provenance/test_propagation.py
"""Tests for stamp_citation / stamp_finding_text propagation helpers."""
from __future__ import annotations
from jw_agents.base import Citation, Finding
from jw_core.provenance.hashing import content_sha256
from jw_core.provenance.propagation import stamp_citation, stamp_finding_text
def test_stamp_citation_writes_four_keys() -> None:
cit = Citation(url="https://wol.jw.org/x", title="t", kind="verse", metadata={})
text = "Jehová amó tanto al mundo"
stamped = stamp_citation(
cit,
text=text,
published_date="2024-01-15",
revision="rev. 2023",
)
assert stamped is cit # in-place mutation, returns same object
assert cit.metadata["content_hash"] == content_sha256(text)
assert cit.metadata["published_date"] == "2024-01-15"
assert cit.metadata["revision"] == "rev. 2023"
assert isinstance(cit.metadata["accessed_at"], str)
assert cit.metadata["accessed_at"].endswith(("+00:00", "Z"))
def test_stamp_citation_is_idempotent_for_same_text() -> None:
"""Re-stamping with the same text → hash unchanged; accessed_at refreshes."""
cit = Citation(url="https://wol.jw.org/x", metadata={})
text = "same body"
stamp_citation(cit, text=text)
h1 = cit.metadata["content_hash"]
a1 = cit.metadata["accessed_at"]
import time
time.sleep(0.001)
stamp_citation(cit, text=text)
h2 = cit.metadata["content_hash"]
a2 = cit.metadata["accessed_at"]
assert h1 == h2
# accessed_at allowed to differ (re-stamping refreshes the timestamp)
assert isinstance(a2, str)
def test_stamp_citation_different_text_changes_hash() -> None:
cit = Citation(url="https://wol.jw.org/x", metadata={})
stamp_citation(cit, text="version 1")
h1 = cit.metadata["content_hash"]
stamp_citation(cit, text="version 2")
h2 = cit.metadata["content_hash"]
assert h1 != h2
def test_stamp_citation_preserves_unrelated_metadata() -> None:
cit = Citation(url="https://wol.jw.org/x", metadata={"source": "wol", "lang": "es"})
stamp_citation(cit, text="body")
assert cit.metadata["source"] == "wol"
assert cit.metadata["lang"] == "es"
assert "content_hash" in cit.metadata
def test_stamp_citation_optional_fields_omitted_remain_absent() -> None:
"""Don't write keys for `published_date=None` / `revision=None`."""
cit = Citation(url="https://wol.jw.org/x", metadata={})
stamp_citation(cit, text="x")
assert "published_date" not in cit.metadata
assert "revision" not in cit.metadata
def test_stamp_finding_text_uses_excerpt_as_default_text() -> None:
cit = Citation(url="https://wol.jw.org/x", metadata={})
finding = Finding(summary="s", citation=cit, excerpt="the excerpt body")
stamp_finding_text(finding)
assert cit.metadata["content_hash"] == content_sha256("the excerpt body")
def test_stamp_finding_text_no_op_when_excerpt_empty() -> None:
"""Findings without text shouldn't lie about their provenance."""
cit = Citation(url="https://wol.jw.org/x", metadata={})
finding = Finding(summary="s", citation=cit, excerpt="")
stamp_finding_text(finding)
assert "content_hash" not in cit.metadata
def test_stamp_finding_text_passes_through_published_date_kwargs() -> None:
"""Caller can override the auto-detected fields when known."""
cit = Citation(url="https://wol.jw.org/x", metadata={})
finding = Finding(summary="s", citation=cit, excerpt="hello")
stamp_finding_text(finding, published_date="2024-01-01", revision="rev. 2023")
assert cit.metadata["published_date"] == "2024-01-01"
assert cit.metadata["revision"] == "rev. 2023"
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_propagation.py -v
Expected: FAIL — current placeholder is identity.
- Step 3: Implement propagation helpers
Replace packages/jw-core/src/jw_core/provenance/propagation.py:
# packages/jw-core/src/jw_core/provenance/propagation.py
"""Stamp citations with the four conventional provenance keys.
This module is the hand-off point between text acquisition (WOL parser,
JWPUB indexer, agent body extraction) and provenance bookkeeping. Each
acquisition site calls `stamp_citation(...)` once with the canonical
text it just resolved — from that moment forward, the citation carries
a hash and timestamp that the validator can later verify.
Design rules:
- In-place mutation. The Citation dataclass is passed by reference.
- Idempotent for same text. Same input twice → same hash. Accessed_at
refreshes (the timestamp is a "last touch" record, not a "first seen").
- Never write keys for fields the caller didn't provide. Absent fields
stay absent so a downstream `ProvenanceRecord` round-trip stays clean.
- No I/O. This is pure CPU work.
"""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any
from jw_agents.base import Citation, Finding
from jw_core.provenance.hashing import content_sha256
def _utcnow_iso() -> str:
return datetime.now(timezone.utc).isoformat()
def stamp_citation(
citation: Citation,
*,
text: str,
published_date: str | None = None,
revision: str | None = None,
) -> Citation:
"""Write the four provenance keys into `citation.metadata` in place.
Always written:
- `content_hash` = sha256(canonicalize_text(text))
- `accessed_at` = ISO 8601 UTC now
Only written when not None:
- `published_date`
- `revision`
Returns the same citation object for fluent chaining.
"""
meta: dict[str, Any] = citation.metadata
meta["content_hash"] = content_sha256(text)
meta["accessed_at"] = _utcnow_iso()
if published_date is not None:
meta["published_date"] = published_date
if revision is not None:
meta["revision"] = revision
return citation
def stamp_finding_text(
finding: Finding,
*,
text: str | None = None,
published_date: str | None = None,
revision: str | None = None,
) -> Finding:
"""Convenience: stamp the finding's citation using `finding.excerpt` by default.
When `finding.excerpt` is empty AND no explicit `text` is given,
this is a no-op — we won't fabricate a hash over the empty string
and pretend that's provenance.
"""
effective = text if text is not None else (finding.excerpt or "")
if not effective:
return finding
stamp_citation(
finding.citation,
text=effective,
published_date=published_date,
revision=revision,
)
return finding
- Step 4: Wire
stamp_citationintoWOLClient.get_articleandget_bible_chapter
In packages/jw-core/src/jw_core/wol_client.py, locate the spots where
each method returns its parsed result (article + citation, chapter +
citation). Add an import at the top of the file:
from jw_core.provenance.propagation import stamp_citation
Inside get_article, after the parsed body is available and the
Citation is constructed but before it’s returned, stamp it. The exact
existing variable names may differ; the pattern is:
# After: citation = Citation(url=..., title=..., kind="article", metadata={...})
# And after the parser produced `body_text: str` and `published_date: str | None`:
stamp_citation(
citation,
text=body_text,
published_date=published_date,
)
return article # or whatever the original return value is
Inside get_bible_chapter, do the same. For chapters the canonical
“text” is the joined verse bodies; the revision tag is the NWT manifest
code the client already has access to:
# After the chapter dataclass is built and `chapter_text: str` exists:
stamp_citation(
chapter.citation,
text=chapter_text,
published_date=None, # WOL Bible chapters have no per-chapter publish date
revision=manifest.revision_tag, # e.g. "rev. 2023" — read from the WOL manifest
)
return chapter
If manifest.revision_tag does not exist as an attribute on the
current WOL manifest object, pass revision=None for now — the field
is optional and the Fase 40 spec accepts that. Don’t invent a value.
- Step 5: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_propagation.py -v
Expected: 8 passed.
Then run the WOL client tests to confirm no regression:
Run: uv run pytest packages/jw-core/tests/test_wol_client.py -v
Expected: all existing tests still pass; new metadata keys are additive.
- Step 6: Commit
git add packages/jw-core/src/jw_core/provenance/propagation.py \
packages/jw-core/tests/test_provenance/test_propagation.py \
packages/jw-core/src/jw_core/wol_client.py
git commit -m "feat(jw-core/provenance): stamp_citation + WOLClient ingest integration"
Task 7: Integrate stamping in verse_explainer and apologetics agents
Files:
-
Modify:
packages/jw-agents/src/jw_agents/verse_explainer.py -
Modify:
packages/jw-agents/src/jw_agents/apologetics.py -
Create:
packages/jw-core/tests/test_provenance/fixtures/agent_result_with_provenance.json -
Step 1: Write a verification test that asserts the stamped output
We extend test_propagation.py with an integration-style test that
constructs a verse_explainer (or any agent), runs it on a mocked
client, and asserts every emitted Citation carries the four keys.
Append to packages/jw-core/tests/test_provenance/test_propagation.py:
def test_verse_explainer_stamps_findings_through_excerpt() -> None:
"""End-to-end-ish: a finding emitted from an agent body has provenance."""
from jw_agents.base import Citation, Finding
# Simulate what the agent does after extracting verse text:
cit = Citation(url="https://wol.jw.org/es/wol/b/r4/lp-s/nwt/E/2024/43/3", metadata={})
finding = Finding(
summary="Juan 3:16 muestra el amor de Dios.",
citation=cit,
excerpt="Porque Dios amó tanto al mundo que dio a su Hijo unigénito",
)
stamp_finding_text(finding, published_date=None, revision="rev. 2023")
record = finding.citation.metadata
assert "content_hash" in record
assert "accessed_at" in record
assert record["revision"] == "rev. 2023"
- Step 2: Run test to verify it fails (or passes — depends on Task 6)
Run: uv run pytest packages/jw-core/tests/test_provenance/test_propagation.py::test_verse_explainer_stamps_findings_through_excerpt -v
Expected: pass — confirms the helper works for agent-emitted findings.
- Step 3: Patch
verse_explainer.pyto stamp at emission
Open packages/jw-agents/src/jw_agents/verse_explainer.py. Find every
spot where a Finding(...) is constructed and appended to the result.
Wherever the finding’s excerpt is the canonical verse text, stamp it
immediately after construction:
from jw_core.provenance.propagation import stamp_finding_text
# ... existing code ...
finding = Finding(
summary=summary,
citation=citation,
excerpt=excerpt,
metadata={"source": "verse_text"},
)
stamp_finding_text(
finding,
published_date=None, # verse-level publish date not available
revision=resolved_revision, # comes from WOLClient.get_bible_chapter result
)
result.findings.append(finding)
If resolved_revision is not exposed by the current verse_explainer
flow, pass revision=None. The field is optional per the spec — don’t
guess.
- Step 4: Patch
apologetics.pyto stamp at emission
Open packages/jw-agents/src/jw_agents/apologetics.py. Same treatment:
wherever Finding(...) is appended to the result, stamp it. The
apologetics agent typically produces findings of three sources
(topic_index, verse_text, study_aid). Each gets the same single-
line stamp call:
from jw_core.provenance.propagation import stamp_finding_text
# At each Finding(...) construction site:
stamp_finding_text(finding, published_date=None, revision=None)
result.findings.append(finding)
Pass actual published_date / revision when the upstream parser
exposes them. Otherwise None.
- Step 5: Create a golden fixture for downstream tests
// packages/jw-core/tests/test_provenance/fixtures/agent_result_with_provenance.json
{
"query": "Juan 3:16",
"agent_name": "verse_explainer",
"warnings": [],
"metadata": {"language": "es"},
"findings": [
{
"summary": "Juan 3:16 muestra el amor de Dios.",
"excerpt": "Porque Dios amó tanto al mundo que dio a su Hijo unigénito",
"metadata": {"source": "verse_text"},
"citation": {
"url": "https://wol.jw.org/es/wol/b/r4/lp-s/nwt/E/2024/43/3",
"title": "Juan 3",
"kind": "verse",
"metadata": {
"accessed_at": "2026-05-30T10:00:00+00:00",
"content_hash": "PLACEHOLDER_HASH_REPLACE_VIA_SCRIPT",
"published_date": null,
"revision": "rev. 2023"
}
}
}
]
}
Then write a small helper at the top of the JSON fixture or in
conftest.py (next to the fixtures dir) that, on first load, replaces
PLACEHOLDER_HASH_REPLACE_VIA_SCRIPT with the canonical hash of the
excerpt. Simplest: do this lazily in the CLI test (Task 9) and in the
backwards-compat test (Task 12) rather than committing a real hash that
could go stale if the canonicalization rules ever change.
- Step 6: Run regression suite
Run: uv run pytest packages/jw-agents/tests -v
Expected: all existing agent tests pass; the only change is additional
keys in Citation.metadata which are transparent to consumers.
- Step 7: Commit
git add packages/jw-agents/src/jw_agents/verse_explainer.py \
packages/jw-agents/src/jw_agents/apologetics.py \
packages/jw-core/tests/test_provenance/test_propagation.py \
packages/jw-core/tests/test_provenance/fixtures/agent_result_with_provenance.json
git commit -m "feat(jw-agents): stamp provenance on verse_explainer + apologetics findings"
Task 8: NLI re-validation hook (Fase 39 integration, import-guarded)
Files:
-
Create:
packages/jw-core/tests/test_provenance/test_validator_nli.py -
Modify:
packages/jw-core/src/jw_core/provenance/validator.py(verify the hook from Task 5 works end-to-end) -
Step 1: Write the failing test
# packages/jw-core/tests/test_provenance/test_validator_nli.py
"""When content drifts AND an NLIProvider is wired, the validator re-runs NLI."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Any
import pytest
from jw_agents.base import Citation
from jw_core.provenance.hashing import content_sha256
from jw_core.provenance.validator import (
FetcherResponse,
ProvenanceValidator,
)
@dataclass
class _NLIVerdict:
"""Mirror of Fase 39's NLIVerdict shape — duck-typed by the validator."""
label: str
score: float
class FakeNLIProvider:
"""Returns a pre-canned verdict regardless of input."""
def __init__(self, label: str, score: float) -> None:
self.label = label
self.score = score
self.calls: list[tuple[str, str]] = []
async def evaluate_entailment(self, claim: str, premise: str) -> Any:
self.calls.append((claim, premise))
return _NLIVerdict(label=self.label, score=self.score)
class FakeFetcher:
def __init__(self, body: str) -> None:
self._body = body
async def __call__(self, url: str) -> FetcherResponse:
return FetcherResponse(final_url=url, status=200, body=self._body)
@pytest.mark.asyncio
async def test_nli_rerun_attached_on_changed_when_provider_present() -> None:
"""Hash mismatch + NLI provider → verdict.nli_rerun populated."""
original_text = "Jesús es el Hijo de Dios"
new_text = "Jesús es Dios mismo"
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(original_text),
"nli_claim": "Jesus is the Son of God",
"nli_verdict": "entails", # baseline verdict at original time
},
)
provider = FakeNLIProvider(label="neutral", score=0.42)
validator = ProvenanceValidator(fetcher=FakeFetcher(new_text), nli_provider=provider)
verdict = await validator.check(cit)
assert verdict.status == "changed"
assert verdict.nli_rerun is not None
assert verdict.nli_rerun["changed"] is True
assert verdict.nli_rerun["from"] == "entails"
assert verdict.nli_rerun["to"] == "neutral"
assert verdict.nli_rerun["score"] == pytest.approx(0.42)
assert len(provider.calls) == 1
@pytest.mark.asyncio
async def test_nli_rerun_changed_false_when_label_matches() -> None:
"""If NLI still says 'entails' even though content changed, mark unchanged verdict."""
original_text = "x"
new_text = "y" # hash differs
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(original_text),
"nli_claim": "claim",
"nli_verdict": "entails",
},
)
provider = FakeNLIProvider(label="entails", score=0.91)
validator = ProvenanceValidator(fetcher=FakeFetcher(new_text), nli_provider=provider)
verdict = await validator.check(cit)
assert verdict.status == "changed"
assert verdict.nli_rerun is not None
assert verdict.nli_rerun["changed"] is False
assert verdict.nli_rerun["to"] == "entails"
@pytest.mark.asyncio
async def test_nli_rerun_skipped_when_no_claim_in_metadata() -> None:
"""Without a baseline claim, we can't re-run NLI — nli_rerun stays None."""
original_text = "x"
new_text = "y"
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(original_text),
# No 'nli_claim' key.
},
)
provider = FakeNLIProvider(label="entails", score=1.0)
validator = ProvenanceValidator(fetcher=FakeFetcher(new_text), nli_provider=provider)
verdict = await validator.check(cit)
assert verdict.status == "changed"
assert verdict.nli_rerun is None
assert provider.calls == []
@pytest.mark.asyncio
async def test_nli_rerun_never_runs_when_status_is_match() -> None:
"""No drift → no NLI re-run."""
text = "stable text"
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(text),
"nli_claim": "claim",
"nli_verdict": "entails",
},
)
provider = FakeNLIProvider(label="contradicts", score=0.99)
validator = ProvenanceValidator(fetcher=FakeFetcher(text), nli_provider=provider)
verdict = await validator.check(cit)
assert verdict.status == "match"
assert verdict.nli_rerun is None
assert provider.calls == []
@pytest.mark.asyncio
async def test_nli_rerun_error_captured_when_provider_raises() -> None:
"""A misbehaving provider must not crash the whole validator."""
new_text = "different"
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256("original"),
"nli_claim": "claim",
"nli_verdict": "entails",
},
)
class _BoomProvider:
async def evaluate_entailment(self, claim: str, premise: str) -> Any:
raise RuntimeError("boom")
validator = ProvenanceValidator(fetcher=FakeFetcher(new_text), nli_provider=_BoomProvider())
verdict = await validator.check(cit)
assert verdict.status == "changed"
assert verdict.nli_rerun is not None
assert "boom" in verdict.nli_rerun.get("error", "")
- Step 2: Run test to verify it passes (or fails)
Run: uv run pytest packages/jw-core/tests/test_provenance/test_validator_nli.py -v
Expected: 5 passed — the hook was already implemented in Task 5. If a
test fails it pinpoints a missing edge case in _maybe_rerun_nli; fix
that method directly and re-run.
- Step 3: Verify import-guard — Fase 39 absent must not crash
Confirm by inspection that validator.py does NOT import from
jw_core.nli at module level. It only accepts an NLIProvider
Protocol, which is structural — so missing Fase 39 (e.g. older
deployments) keeps the validator usable, just without re-NLI.
If a stray from jw_core.nli import ... was added during development,
delete it. The only legitimate reference is NLIProvider defined
locally in validator.py as a typing.Protocol.
- Step 4: Commit
git add packages/jw-core/tests/test_provenance/test_validator_nli.py
git commit -m "test(jw-core/provenance): NLI re-run on changed verdict (import-guarded)"
Task 9: CLI jw provenance check
Files:
-
Create:
packages/jw-cli/src/jw_cli/commands/provenance.py -
Create:
packages/jw-cli/tests/test_cli_provenance.py -
Modify:
packages/jw-cli/src/jw_cli/main.py -
Step 1: Write the failing test
# packages/jw-cli/tests/test_cli_provenance.py
"""End-to-end tests for `jw provenance check`."""
from __future__ import annotations
import json
from datetime import datetime, timezone
from pathlib import Path
import pytest
from typer.testing import CliRunner
from jw_cli.main import app
from jw_core.provenance.hashing import content_sha256
runner = CliRunner()
def _write_agent_result(tmp_path: Path, *, body_text: str, accessed_at: str) -> Path:
"""Write a minimal AgentResult JSON with provenance fields filled in."""
result = {
"query": "Juan 3:16",
"agent_name": "verse_explainer",
"warnings": [],
"metadata": {"language": "es"},
"findings": [
{
"summary": "Juan 3:16 muestra el amor de Dios.",
"excerpt": body_text,
"metadata": {"source": "verse_text"},
"citation": {
"url": "https://wol.jw.org/x",
"title": "Juan 3",
"kind": "verse",
"metadata": {
"accessed_at": accessed_at,
"content_hash": content_sha256(body_text),
"published_date": None,
"revision": "rev. 2023",
},
},
}
],
}
p = tmp_path / "result.json"
p.write_text(json.dumps(result), encoding="utf-8")
return p
def test_provenance_check_help() -> None:
out = runner.invoke(app, ["provenance", "check", "--help"])
assert out.exit_code == 0
assert "agent-output" in out.stdout.lower() or "agent_output" in out.stdout.lower()
def test_provenance_check_reports_match_with_fake_fetcher(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""With JW_PROVENANCE_FETCHER=fake the CLI uses a stub that returns the original body."""
body = "Porque Dios amó tanto al mundo"
result_path = _write_agent_result(tmp_path, body_text=body, accessed_at="2026-05-30T10:00:00Z")
# The CLI consults this env var to pick a fetcher; "fake" means
# "echo the stored excerpt back" — used for deterministic CI runs.
monkeypatch.setenv("JW_PROVENANCE_FETCHER", "fake")
out = runner.invoke(
app,
["provenance", "check", "--agent-output", str(result_path), "--report", "json"],
)
assert out.exit_code == 0, out.stdout
data = json.loads(out.stdout.strip().splitlines()[-1])
assert data["summary"]["match"] == 1
def test_provenance_check_exit_2_on_change(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""When the stub returns *different* text, exit code 2 surfaces drift."""
body = "original"
result_path = _write_agent_result(tmp_path, body_text=body, accessed_at="2026-05-30T10:00:00Z")
# The "fake-drift" fetcher returns a body that always differs.
monkeypatch.setenv("JW_PROVENANCE_FETCHER", "fake-drift")
out = runner.invoke(
app,
["provenance", "check", "--agent-output", str(result_path), "--report", "json"],
)
assert out.exit_code == 2
data = json.loads(out.stdout.strip().splitlines()[-1])
assert data["summary"]["changed"] == 1
def test_provenance_check_since_filters_recent(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""--since 2026-05-31 means the 2026-05-30 citation IS rechecked (it's older)."""
body = "text"
result_path = _write_agent_result(tmp_path, body_text=body, accessed_at="2026-05-30T10:00:00Z")
monkeypatch.setenv("JW_PROVENANCE_FETCHER", "fake")
out = runner.invoke(
app,
[
"provenance",
"check",
"--agent-output",
str(result_path),
"--since",
"2026-05-31",
"--report",
"json",
],
)
assert out.exit_code == 0
data = json.loads(out.stdout.strip().splitlines()[-1])
# 2026-05-30 < 2026-05-31 → eligible for re-check, not skipped.
assert data["summary"].get("match") == 1
assert data["summary"].get("skipped", 0) == 0
def test_provenance_check_markdown_report(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""--report md emits a legible markdown table."""
body = "text"
result_path = _write_agent_result(tmp_path, body_text=body, accessed_at="2026-05-30T10:00:00Z")
monkeypatch.setenv("JW_PROVENANCE_FETCHER", "fake")
out_path = tmp_path / "out.md"
out = runner.invoke(
app,
[
"provenance",
"check",
"--agent-output",
str(result_path),
"--report",
"md",
"--out",
str(out_path),
],
)
assert out.exit_code == 0
body_md = out_path.read_text(encoding="utf-8")
assert "| URL |" in body_md or "URL" in body_md
assert "match" in body_md
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-cli/tests/test_cli_provenance.py -v
Expected: FAIL — provenance subcommand not registered.
- Step 3: Implement the CLI command
# packages/jw-cli/src/jw_cli/commands/provenance.py
"""CLI subcommand: `jw provenance ...`.
Currently exposes:
jw provenance check --agent-output <file> [--since DATE] [--report json|md] [--out FILE]
The fetcher used at runtime is chosen by env var JW_PROVENANCE_FETCHER:
- unset / "httpx" → uses jw_core's WOLClient httpx-backed fetcher (live network).
- "fake" → echoes the citation's stored excerpt back (CI-friendly).
- "fake-drift" → returns a sentinel string that always differs (test-only).
Exit codes:
0 — every verdict is `match` or `skipped` or `no_record`.
2 — at least one verdict is `changed`.
3 — at least one verdict is `unreachable` AND no `changed`.
"""
from __future__ import annotations
import asyncio
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
import typer
from jw_agents.base import Citation, Finding
from jw_core.provenance.validator import (
FetcherResponse,
ProvenanceValidator,
)
app = typer.Typer(help="Content provenance checks.")
# ── Internal: hydrate AgentResult JSON into Citation objects ────────────
def _load_citations(path: Path) -> list[Citation]:
raw = json.loads(path.read_text(encoding="utf-8"))
cits: list[Citation] = []
for f in raw.get("findings", []):
cit_raw = f.get("citation") or {}
cits.append(
Citation(
url=cit_raw.get("url", ""),
title=cit_raw.get("title", ""),
kind=cit_raw.get("kind", ""),
metadata=dict(cit_raw.get("metadata") or {}),
)
)
return cits
def _wrap_as_result(citations: list[Citation]) -> Any:
"""Wrap a list of citations into a minimal AgentResult-like object."""
findings = [Finding(summary="", citation=c, excerpt="") for c in citations]
class _R:
pass
r = _R()
r.findings = findings # type: ignore[attr-defined]
return r
# ── Fetcher selection ──────────────────────────────────────────────────
def _select_fetcher() -> Any:
choice = os.environ.get("JW_PROVENANCE_FETCHER", "httpx").lower()
if choice == "fake":
return _FakeEchoFetcher()
if choice == "fake-drift":
return _FakeDriftFetcher()
return _HttpxFetcher()
class _FakeEchoFetcher:
"""Returns the stored excerpt-equivalent text — used in tests for `match` paths.
Since the validator only knows the original hash, we synthesize a
body from the citation's metadata: if `content_hash` matches what we
re-canonicalize from a body, hashes line up. The trick: we look up
the original `excerpt` by reverse-mapping from the JSON file via a
sidecar dict the CLI builds before invoking. For now we just echo
the URL as the body — Task 9 tests stage citations whose hash equals
`content_sha256(body)` for a body the test code controls.
To keep this self-contained, the fake remembers the AgentResult that
was loaded so it can serve the right excerpt.
"""
excerpts: dict[str, str] = {}
async def __call__(self, url: str) -> FetcherResponse:
body = _FakeEchoFetcher.excerpts.get(url, "")
return FetcherResponse(final_url=url, status=200, body=body)
class _FakeDriftFetcher:
async def __call__(self, url: str) -> FetcherResponse:
return FetcherResponse(
final_url=url, status=200, body="DRIFT_SENTINEL_TEXT"
)
class _HttpxFetcher:
"""Real-network fetcher backed by httpx. Only constructed when chosen."""
async def __call__(self, url: str) -> FetcherResponse:
import httpx
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
resp = await client.get(url, headers={"User-Agent": "jw-cli/provenance"})
return FetcherResponse(
final_url=str(resp.url),
status=resp.status_code,
body=resp.text,
redirect_chain=[str(h.url) for h in resp.history],
)
# ── Reporters ──────────────────────────────────────────────────────────
def _render_markdown(report) -> str:
lines = [
"# Provenance report",
"",
f"- started_at: {report.started_at.isoformat()}",
f"- finished_at: {report.finished_at.isoformat()}",
f"- summary: {report.summary}",
"",
"| URL | Status | Original hash | Current hash | Delta chars | Accessed (orig) | Accessed (recheck) |",
"|-----|--------|---------------|--------------|-------------|-----------------|--------------------|",
]
for v in report.verdicts:
lines.append(
f"| {v.url} | {v.status} | {v.original_hash or '-'} | {v.current_hash or '-'} "
f"| {v.delta_chars if v.delta_chars is not None else '-'} "
f"| {v.accessed_at_original or '-'} | {v.accessed_at_recheck} |"
)
return "\n".join(lines) + "\n"
def _exit_code(report) -> int:
if report.summary.get("changed", 0) > 0:
return 2
if report.summary.get("unreachable", 0) > 0:
return 3
return 0
# ── Command ────────────────────────────────────────────────────────────
@app.command("check")
def check_cmd(
agent_output: Path = typer.Option(..., "--agent-output", help="Path to an AgentResult JSON file."),
since: str | None = typer.Option(None, "--since", help="ISO date — only re-check citations accessed before this date."),
report: str = typer.Option("json", "--report", help="Output format: json or md."),
out: Path | None = typer.Option(None, "--out", help="Optional output file path (default stdout)."),
) -> None:
"""Re-check that every citation's content_hash still matches the live source."""
citations = _load_citations(agent_output)
# Prime the fake echo fetcher with the excerpts from the loaded JSON,
# keyed by URL, so it can correctly return the body whose hash equals
# what's stored in citation.metadata.content_hash.
raw = json.loads(agent_output.read_text(encoding="utf-8"))
excerpts: dict[str, str] = {}
for f in raw.get("findings", []):
cit = f.get("citation") or {}
url = cit.get("url")
excerpt = f.get("excerpt") or ""
if url and excerpt:
excerpts[url] = excerpt
_FakeEchoFetcher.excerpts.update(excerpts)
fetcher = _select_fetcher()
validator = ProvenanceValidator(fetcher=fetcher)
wrapped = _wrap_as_result(citations)
async def run() -> Any:
if since is None:
return await validator.check_agent_output(wrapped)
try:
cutoff = datetime.fromisoformat(since)
if cutoff.tzinfo is None:
cutoff = cutoff.replace(tzinfo=timezone.utc)
except ValueError as exc:
raise typer.BadParameter(f"--since must be ISO 8601: {exc}") from exc
return await validator.check_since(wrapped, since=cutoff)
result = asyncio.run(run())
if report == "md":
payload = _render_markdown(result)
else:
payload = result.model_dump_json()
if out is not None:
out.write_text(payload, encoding="utf-8")
else:
typer.echo(payload)
raise typer.Exit(code=_exit_code(result))
- Step 4: Register the subcommand
Edit packages/jw-cli/src/jw_cli/main.py. Find the existing app = typer.Typer(...) and the imports / app.add_typer(...) block. Add:
from jw_cli.commands.provenance import app as provenance_app
app.add_typer(provenance_app, name="provenance", help="Content provenance checks (Fase 40).")
- Step 5: Run test to verify it passes
Run: uv run pytest packages/jw-cli/tests/test_cli_provenance.py -v
Expected: 5 passed.
- Step 6: Commit
git add packages/jw-cli/src/jw_cli/commands/provenance.py \
packages/jw-cli/tests/test_cli_provenance.py \
packages/jw-cli/src/jw_cli/main.py
git commit -m "feat(jw-cli): jw provenance check subcommand"
Task 10: MCP tool verify_provenance
Files:
-
Create:
packages/jw-mcp/src/jw_mcp/tools/provenance.py -
Create:
packages/jw-mcp/tests/test_provenance_tool.py -
Modify:
packages/jw-mcp/src/jw_mcp/server.py -
Step 1: Write the failing test
# packages/jw-mcp/tests/test_provenance_tool.py
"""Tests for the `verify_provenance` MCP tool."""
from __future__ import annotations
import json
from typing import Any
import pytest
from jw_core.provenance.hashing import content_sha256
from jw_mcp.tools.provenance import verify_provenance
def _build_agent_output(body: str, *, url: str = "https://wol.jw.org/x") -> dict[str, Any]:
return {
"query": "q",
"agent_name": "verse_explainer",
"warnings": [],
"metadata": {},
"findings": [
{
"summary": "s",
"excerpt": body,
"metadata": {"source": "verse_text"},
"citation": {
"url": url,
"title": "t",
"kind": "verse",
"metadata": {
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(body),
"published_date": None,
"revision": "rev. 2023",
},
},
}
],
}
class _FakeFetcher:
def __init__(self, body: str) -> None:
self._body = body
async def __call__(self, url: str):
from jw_core.provenance.validator import FetcherResponse
return FetcherResponse(final_url=url, status=200, body=self._body)
@pytest.mark.asyncio
async def test_verify_provenance_returns_dict_with_summary() -> None:
body = "stable text"
agent_output = _build_agent_output(body)
out = await verify_provenance(
agent_output,
since=None,
with_nli=False,
fetcher=_FakeFetcher(body),
)
assert isinstance(out, dict)
assert out["summary"]["match"] == 1
@pytest.mark.asyncio
async def test_verify_provenance_changed_in_summary() -> None:
body_orig = "x"
body_new = "y"
agent_output = _build_agent_output(body_orig)
out = await verify_provenance(
agent_output,
since=None,
with_nli=False,
fetcher=_FakeFetcher(body_new),
)
assert out["summary"]["changed"] == 1
@pytest.mark.asyncio
async def test_verify_provenance_since_filters() -> None:
body = "x"
agent_output = _build_agent_output(body)
out = await verify_provenance(
agent_output,
since="2024-01-01", # everything is younger → all skipped? NO — accessed_at is 2026-05-30, older than? No, NEWER.
with_nli=False,
fetcher=_FakeFetcher(body),
)
# accessed_at=2026-05-30 >= since=2024-01-01 → skipped
assert out["summary"].get("skipped", 0) == 1
@pytest.mark.asyncio
async def test_verify_provenance_with_nli_flag_without_provider_no_op() -> None:
"""`with_nli=True` with no NLI configured → still works, just no nli_rerun."""
body_orig = "x"
body_new = "y"
agent_output = _build_agent_output(body_orig)
out = await verify_provenance(
agent_output,
since=None,
with_nli=True,
fetcher=_FakeFetcher(body_new),
)
verdicts = out["verdicts"]
assert verdicts[0]["status"] == "changed"
assert verdicts[0].get("nli_rerun") is None
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-mcp/tests/test_provenance_tool.py -v
Expected: FAIL — verify_provenance module missing.
- Step 3: Implement the MCP tool
# packages/jw-mcp/src/jw_mcp/tools/provenance.py
"""MCP tool: verify_provenance.
Exposed via FastMCP from server.py. Accepts a serialized AgentResult
(dict) and optionally re-runs NLI on drifted citations.
The fetcher kwarg is internal — production callers will receive the
default httpx-backed fetcher injected by server.py. Tests pass a stub.
"""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any
from jw_agents.base import Citation, Finding
from jw_core.provenance.validator import (
FetcherResponse,
ProvenanceValidator,
)
def _hydrate(agent_output: dict[str, Any]) -> Any:
"""Convert a JSON-serialized AgentResult into a Citation-bearing wrapper."""
findings: list[Finding] = []
for f in agent_output.get("findings", []) or []:
cit_raw = f.get("citation") or {}
cit = Citation(
url=cit_raw.get("url", ""),
title=cit_raw.get("title", ""),
kind=cit_raw.get("kind", ""),
metadata=dict(cit_raw.get("metadata") or {}),
)
findings.append(
Finding(
summary=f.get("summary", ""),
citation=cit,
excerpt=f.get("excerpt", "") or "",
metadata=dict(f.get("metadata") or {}),
)
)
class _R:
pass
r = _R()
r.findings = findings # type: ignore[attr-defined]
return r
async def verify_provenance(
agent_output: dict[str, Any],
*,
since: str | None = None,
with_nli: bool = False,
fetcher: Any | None = None,
nli_provider: Any | None = None,
) -> dict[str, Any]:
"""Re-check that each citation's content_hash still matches the live page.
Args:
agent_output: serialized AgentResult dict (`AgentResult.to_dict()` shape).
since: optional ISO date; only re-check citations accessed earlier.
with_nli: hint that the caller wants NLI re-validation. If no
`nli_provider` is wired, this is a no-op silent fall-through.
fetcher: injectable for tests; default constructed by server.py.
nli_provider: injectable NLIProvider from Fase 39.
Returns:
A `ProvenanceReport.model_dump(mode="json")` dict.
"""
if fetcher is None:
# Lazy import: only when production path is taken.
from jw_cli.commands.provenance import _HttpxFetcher # type: ignore[import-not-found]
fetcher = _HttpxFetcher()
effective_nli = nli_provider if with_nli else None
validator = ProvenanceValidator(fetcher=fetcher, nli_provider=effective_nli)
wrapped = _hydrate(agent_output)
if since is None:
report = await validator.check_agent_output(wrapped)
else:
cutoff = datetime.fromisoformat(since)
if cutoff.tzinfo is None:
cutoff = cutoff.replace(tzinfo=timezone.utc)
report = await validator.check_since(wrapped, since=cutoff)
return report.model_dump(mode="json")
- Step 4: Register the tool in
server.py
In packages/jw-mcp/src/jw_mcp/server.py, near other @mcp.tool()
registrations, add:
from jw_mcp.tools.provenance import verify_provenance as _verify_provenance_impl
@mcp.tool()
async def verify_provenance(
agent_output: dict,
since: str | None = None,
with_nli: bool = False,
) -> dict:
"""Re-check that each citation's content_hash still matches the live page.
Returns a ProvenanceReport dict. Network-bound: respects WOLClient
throttle. Pass `since='2026-01-01'` to skip recently-accessed
citations. Pass `with_nli=True` to re-run entailment on drifted
text when Fase 39 is configured server-side.
"""
return await _verify_provenance_impl(
agent_output,
since=since,
with_nli=with_nli,
)
- Step 5: Run test to verify it passes
Run: uv run pytest packages/jw-mcp/tests/test_provenance_tool.py -v
Expected: 4 passed.
- Step 6: Commit
git add packages/jw-mcp/src/jw_mcp/tools/provenance.py \
packages/jw-mcp/tests/test_provenance_tool.py \
packages/jw-mcp/src/jw_mcp/server.py
git commit -m "feat(jw-mcp): verify_provenance tool"
Task 11: Drift-detection regression test (telemetry hook)
Files:
-
Create:
packages/jw-core/tests/test_provenance/test_validator_drift_detection.py -
Modify:
packages/jw-core/src/jw_core/provenance/validator.py(telemetry side-effect) -
Step 1: Write the failing test
# packages/jw-core/tests/test_provenance/test_validator_drift_detection.py
"""When a citation drifts, validator emits a `provenance_drift` telemetry event.
Mirrors the Fase 9 opt-in pattern: nothing is written unless
JW_TELEMETRY_ENABLED is set. CI runs default-off, so this test sets the
env var explicitly and points at a tmp path.
"""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
import pytest
from jw_agents.base import Citation
from jw_core.provenance.hashing import content_sha256
from jw_core.provenance.validator import (
FetcherResponse,
ProvenanceValidator,
)
class _Fake:
def __init__(self, body: str) -> None:
self._body = body
async def __call__(self, url: str) -> FetcherResponse:
return FetcherResponse(final_url=url, status=200, body=self._body)
@pytest.mark.asyncio
async def test_drift_writes_provenance_drift_event_when_telemetry_on(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
telemetry_path = tmp_path / "telemetry.json"
monkeypatch.setenv("JW_TELEMETRY_ENABLED", "1")
monkeypatch.setenv("JW_TELEMETRY_PATH", str(telemetry_path))
# Reset the telemetry singleton so it re-reads the env var.
import jw_core.telemetry as tel
tel._singleton = None # noqa: SLF001
body_orig = "Original Jehová body"
body_new = "Edited Jehová body"
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(body_orig),
},
)
validator = ProvenanceValidator(fetcher=_Fake(body_new))
verdict = await validator.check(cit)
assert verdict.status == "changed"
# Telemetry file should now contain a provenance_drift event.
assert telemetry_path.exists()
data = json.loads(telemetry_path.read_text(encoding="utf-8"))
events: list[dict[str, Any]] = data.get("drift_events", []) + data.get("provenance_events", [])
assert any(
e.get("kind") == "provenance_drift" or e.get("endpoint") == "provenance_drift"
for e in events
)
@pytest.mark.asyncio
async def test_no_telemetry_when_disabled(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
telemetry_path = tmp_path / "telemetry.json"
monkeypatch.delenv("JW_TELEMETRY_ENABLED", raising=False)
monkeypatch.setenv("JW_TELEMETRY_PATH", str(telemetry_path))
import jw_core.telemetry as tel
tel._singleton = None # noqa: SLF001
cit = Citation(
url="https://wol.jw.org/x",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256("x"),
},
)
validator = ProvenanceValidator(fetcher=_Fake("y"))
await validator.check(cit)
assert not telemetry_path.exists()
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/test_provenance/test_validator_drift_detection.py -v
Expected: FAIL — the validator does not yet write a telemetry event.
- Step 3: Add the telemetry side-effect to the validator
Open packages/jw-core/src/jw_core/provenance/validator.py. Inside
the check() method, after determining status="changed" and just
before returning the verdict, append a telemetry record:
# Emit a provenance_drift telemetry event (opt-in via JW_TELEMETRY_ENABLED).
try:
from jw_core.telemetry import get_telemetry
tel = get_telemetry()
if tel.enabled:
state = tel._state.setdefault("provenance_events", []) # noqa: SLF001
state.append(
{
"kind": "provenance_drift",
"url": citation.url,
"delta_chars": delta,
"original_accessed_at": record.accessed_at,
"ts": __import__("time").time(),
}
)
tel._save() # noqa: SLF001
except Exception: # noqa: BLE001 — telemetry must never break the validator
pass
Place that block between the verdict = ProvenanceVerdict(...)
construction and the optional _maybe_rerun_nli call.
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_validator_drift_detection.py -v
Expected: 2 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/provenance/validator.py \
packages/jw-core/tests/test_provenance/test_validator_drift_detection.py
git commit -m "feat(jw-core/provenance): opt-in provenance_drift telemetry event"
Task 12: Backwards-compat test — legacy AgentResults still work
Files:
-
Create:
packages/jw-core/tests/test_provenance/test_backwards_compat.py -
Create:
packages/jw-core/tests/test_provenance/fixtures/agent_result_legacy.json -
Step 1: Write the legacy fixture (no provenance keys)
// packages/jw-core/tests/test_provenance/fixtures/agent_result_legacy.json
{
"query": "Juan 3:16",
"agent_name": "verse_explainer",
"warnings": [],
"metadata": {"language": "es"},
"findings": [
{
"summary": "Juan 3:16 muestra el amor de Dios.",
"excerpt": "Porque Dios amó tanto al mundo",
"metadata": {"source": "verse_text"},
"citation": {
"url": "https://wol.jw.org/x",
"title": "Juan 3",
"kind": "verse",
"metadata": {}
}
},
{
"summary": "Otro hallazgo legacy.",
"excerpt": "another body",
"metadata": {"source": "topic_index"},
"citation": {
"url": "https://wol.jw.org/y",
"title": "Topic",
"kind": "article",
"metadata": {"language": "es"}
}
}
]
}
- Step 2: Write the failing test
# packages/jw-core/tests/test_provenance/test_backwards_compat.py
"""Backwards compat: legacy AgentResults (pre-Fase 40) must still process cleanly.
The validator MUST NOT crash on citations lacking provenance keys.
Every such citation gets verdict `no_record` and the fetcher is NOT
called for them (no wasted network).
"""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
import pytest
from jw_agents.base import Citation, Finding
from jw_core.provenance.validator import (
FetcherResponse,
ProvenanceValidator,
)
FIXTURE = Path(__file__).parent / "fixtures" / "agent_result_legacy.json"
class _CountingFetcher:
def __init__(self) -> None:
self.calls: list[str] = []
async def __call__(self, url: str) -> FetcherResponse:
self.calls.append(url)
return FetcherResponse(final_url=url, status=200, body="should not be hashed")
def _hydrate(raw: dict[str, Any]):
findings: list[Finding] = []
for f in raw["findings"]:
cit = Citation(
url=f["citation"]["url"],
title=f["citation"].get("title", ""),
kind=f["citation"].get("kind", ""),
metadata=dict(f["citation"].get("metadata") or {}),
)
findings.append(
Finding(
summary=f["summary"],
citation=cit,
excerpt=f.get("excerpt", ""),
)
)
class _R:
pass
r = _R()
r.findings = findings # type: ignore[attr-defined]
return r
@pytest.mark.asyncio
async def test_legacy_result_yields_only_no_record_verdicts() -> None:
raw = json.loads(FIXTURE.read_text(encoding="utf-8"))
wrapped = _hydrate(raw)
fetcher = _CountingFetcher()
validator = ProvenanceValidator(fetcher=fetcher)
report = await validator.check_agent_output(wrapped)
assert len(report.verdicts) == 2
assert all(v.status == "no_record" for v in report.verdicts)
# Legacy URLs should NEVER be fetched.
assert fetcher.calls == []
assert report.summary["no_record"] == 2
@pytest.mark.asyncio
async def test_mixed_legacy_and_new_findings_coexist() -> None:
"""Half-and-half result: legacy citations skip, new ones get checked."""
from jw_core.provenance.hashing import content_sha256
body_new = "new finding body"
findings = [
Finding(
summary="legacy",
citation=Citation(url="https://wol.jw.org/legacy", metadata={}),
excerpt="",
),
Finding(
summary="new",
citation=Citation(
url="https://wol.jw.org/new",
metadata={
"accessed_at": "2026-05-30T10:00:00Z",
"content_hash": content_sha256(body_new),
},
),
excerpt=body_new,
),
]
class _R:
pass
r = _R()
r.findings = findings # type: ignore[attr-defined]
class _F:
calls: list[str] = []
async def __call__(self, url: str) -> FetcherResponse:
_F.calls.append(url)
return FetcherResponse(final_url=url, status=200, body=body_new)
fetcher = _F()
validator = ProvenanceValidator(fetcher=fetcher)
report = await validator.check_agent_output(r)
statuses = [v.status for v in report.verdicts]
assert "no_record" in statuses
assert "match" in statuses
# Only the new citation was fetched.
assert fetcher.calls == ["https://wol.jw.org/new"]
def test_provenance_record_from_legacy_metadata_returns_none() -> None:
"""Unit-level confirmation: legacy metadata dict can't fool the projection."""
from jw_core.provenance.models import ProvenanceRecord
assert ProvenanceRecord.from_citation_metadata({"source": "wol"}) is None
assert ProvenanceRecord.from_citation_metadata({}) is None
- Step 3: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/test_provenance/test_backwards_compat.py -v
Expected: 3 passed — the validator from Task 5 already implements the
short-circuit, this task just locks it in.
- Step 4: Commit
git add packages/jw-core/tests/test_provenance/test_backwards_compat.py \
packages/jw-core/tests/test_provenance/fixtures/agent_result_legacy.json
git commit -m "test(jw-core/provenance): backwards-compat for citations without provenance keys"
Task 13: Documentation — guide, ROADMAP, VISION_AUDIT, layered taxonomy
Files:
-
Create:
docs/guias/content-provenance.md -
Modify:
docs/ROADMAP.md -
Modify:
docs/VISION_AUDIT.md -
Modify:
docs/README.md -
Step 1: Write the user-facing guide
<!-- docs/guias/content-provenance.md -->
# Content provenance (Fase 40)
> **Estado:** Estable desde Fase 40 (2026-05-31). Reemplaza ningún
> mecanismo previo; complementa Fase 23 (validación de URL) y Fase 39
> (NLI runtime).
## Qué resuelve
`wol.jw.org` cambia. Artículos se reescriben, NWT publica revisiones,
párrafos se reordenan. Una `Citation` que apuntaba a un texto concreto
el martes puede quedar **huérfana** el viernes — la URL sigue
resolviendo (Fase 23 ✓, L0), el `doc_id` sigue en el catálogo (Fase 23 ✓,
L1), pero el **texto** ya no es el que el agente usó. Sin Fase 40, esto
ocurre en silencio.
Fase 40 añade tres datos pequeños a cada `Citation.metadata`:
| Clave | Tipo | Significado |
|------------------|---------------|----------------------------------------------------------------|
| `published_date` | `str \| None` | Fecha original de publicación del artículo (ISO 8601). |
| `accessed_at` | `str` | Cuándo descargó el texto el toolkit (ISO 8601 UTC). |
| `content_hash` | `str` | sha256 hex del texto **canonicalizado** (NFC + whitespace). |
| `revision` | `str \| None` | Etiqueta de revisión, ej. `"rev. 2023"` para NWT. |
Cualquier momento posterior, `provenance_check(citation)` puede:
1. Re-fetchar la URL.
2. Re-canonicalizar el texto.
3. Comparar con el `content_hash` original.
4. Si está integrado con Fase 39, re-correr NLI sobre el texto nuevo.
## La taxonomía de capas
Fase 40 ocupa una capa concreta — **L2: fidelidad de contenido** — dentro
de un esquema de cuatro:
| Capa | Pregunta | Fase | Modo |
|------|-------------------------------------------------------------------------|-------|-----------------|
| L0 | ¿La URL existe y responde 200? | 23 | live HTTP |
| L1 | ¿El `doc_id`/`pub_code` está en MepsCatalog? | 23 | offline catalog |
| L2 | ¿El **contenido** sigue siendo el mismo que el agente usó? | **40**| hash + re-fetch |
| L3 | ¿La afirmación se desprende del passage actual? | 39 | NLI semántico |
Las cuatro capas son **ortogonales**: una URL puede resolver (L0 ✓),
estar en catálogo (L1 ✓), tener fidelidad rota (L2 ✗), y por ende
entailment incierto (L3 ?). Fase 40 es la primera capa que ataca el
texto en sí, no su envoltorio.
## Uso desde CLI
```bash
# Re-chequear todas las citas de un resultado de agente:
jw provenance check --agent-output result.json
# Solo lo que se accedió antes del 2026-01-01 (típico cron mensual):
jw provenance check --agent-output result.json --since 2026-01-01
# Reporte legible en Markdown:
jw provenance check --agent-output result.json --report md --out drift.md
# Con re-validación NLI cuando Fase 39 está configurado:
JW_NLI_PROVIDER=deberta jw provenance check --agent-output result.json --with-nli
Códigos de salida:
0— todo match (o no_record).2— hubo al menos unchanged. Investigar.3— hubo al menos ununreachable. Red caída o URL muerta.
Uso desde MCP
@mcp.tool()
async def verify_provenance(
agent_output: dict,
since: str | None = None,
with_nli: bool = False,
) -> dict:
"""Re-check that each citation's content_hash still matches the live page."""
Devuelve un ProvenanceReport serializado. La invocación es
network-bound (respeta el throttle del WOLClient).
Uso programático
from jw_core.provenance import ProvenanceValidator
from jw_agents.verse_explainer import VerseExplainer
result = await VerseExplainer(client).run("Juan 3:16", language="es")
validator = ProvenanceValidator(fetcher=my_fetcher)
report = await validator.check_agent_output(result)
if report.summary.get("changed", 0):
print("Drift detectado:")
for v in report.verdicts:
if v.status == "changed":
print(f" {v.url} — {v.delta_chars} chars de delta")
Backwards compatibility
Los AgentResult emitidos antes de Fase 40 no llevan las claves de
provenance. ProvenanceValidator los detecta y devuelve verdict
no_record sin llamar al fetcher — cero coste, cero falsos positivos.
Telemetría opt-in
Cuando JW_TELEMETRY_ENABLED=1, cada changed se registra como un
evento provenance_drift en ~/.jw-agent-toolkit/telemetry.json. Nada
sale de tu máquina. Inspeccionable con Telemetry.report().
Tests
.venv/bin/python -m pytest packages/jw-core/tests/test_provenance -v
.venv/bin/python -m pytest packages/jw-cli/tests/test_cli_provenance.py -v
.venv/bin/python -m pytest packages/jw-mcp/tests/test_provenance_tool.py -v
- [ ] **Step 2: Append a row to `docs/VISION_AUDIT.md`**
Locate the audit table and insert a new row (preserving table format):
```markdown
| Fase 40 | content-provenance | L2 fidelidad de contenido (hash + re-fetch) | Estable | `packages/jw-core/src/jw_core/provenance/` · `jw provenance check` · MCP `verify_provenance` |
If the column names differ in the existing table, adapt the row to match
exactly. The mandatory information to surface: phase number, slug,
one-line description, status (Estable), and the integration points.
- Step 3: Append a section to
docs/ROADMAP.md
## Fase 40 — content-provenance
- **Estado**: Estable (2026-05-31).
- **Spec**: `docs/superpowers/specs/2026-05-31-fase-40-content-provenance-design.md`.
- **Plan**: `docs/superpowers/plans/2026-05-31-fase-40-content-provenance-plan.md`.
- **Guía**: `docs/guias/content-provenance.md`.
Añade trazabilidad reproducible al passage citado por cada agente.
Cuatro claves convencionales en `Citation.metadata`
(`published_date`, `accessed_at`, `content_hash`, `revision`) +
`ProvenanceValidator` que re-fetcha y compara hashes. Integra con Fase
39 para re-correr NLI al detectar cambio. CLI `jw provenance check` +
MCP `verify_provenance`. Telemetría opt-in via Fase 9.
Encaja en la taxonomía de cuatro capas L0–L3 — Fase 40 ocupa L2
(fidelidad de contenido), complementando L0/L1 (Fase 23) y L3 (Fase 39).
- Step 4: Link the new guide from
docs/README.md
Find the “Guías” section (or equivalent) and add:
- [Content provenance (Fase 40)](guias/content-provenance.md) — trazabilidad reproducible del texto citado.
- Step 5: Validate markdown locally
Run any existing docs lint (markdownlint or similar) if configured:
ls docs/ && (cd docs && find . -name "*.md" | head)
No formal validator? Verify by grep -F "Fase 40" docs/ROADMAP.md docs/VISION_AUDIT.md docs/README.md and confirm all three contain the
new entries.
- Step 6: Commit
git add docs/guias/content-provenance.md docs/ROADMAP.md docs/VISION_AUDIT.md docs/README.md
git commit -m "docs(provenance): guide, ROADMAP, VISION_AUDIT row, L0-L3 taxonomy explained"
Task 14: Final audit — full suite, smoke CLI, no regressions
Files: none (verification only).
- Step 1: Sync workspace from a clean state
Run:
uv sync --all-packages
Expected: no errors. uv pip list | grep -E "jw-core|jw-cli|jw-mcp|jw-agents" shows all packages installed.
- Step 2: Run the focused provenance suite first
Run:
.venv/bin/python -m pytest packages/jw-core/tests/test_provenance -v
.venv/bin/python -m pytest packages/jw-cli/tests/test_cli_provenance.py -v
.venv/bin/python -m pytest packages/jw-mcp/tests/test_provenance_tool.py -v
Expected: every test passes. Note the total count for the commit message.
- Step 3: Run the full repo suite to confirm no regressions
Run:
.venv/bin/python -m pytest -q
Expected: all 1984+ existing tests still pass; the new provenance tests
account for the delta. If any unrelated test fails, stop and triage —
the most likely culprits are changes to WOLClient.get_article /
get_bible_chapter in Task 6 (the stamp_citation integration) or
the agent emission changes in Task 7. Roll back the offending integration
or null-pass the optional fields until the suite is green.
- Step 4: Smoke the CLI on the fixture
Run:
JW_PROVENANCE_FETCHER=fake \
uv run jw provenance check \
--agent-output packages/jw-core/tests/test_provenance/fixtures/agent_result_with_provenance.json \
--report md \
--out /tmp/drift.md
echo "exit=$?"
cat /tmp/drift.md
Expected: exit code 0 (or 2 if the fake produces drift), the report
contains a table row per finding, and the summary shows match/changed/etc.
- Step 5: Confirm telemetry opt-in works
Run:
JW_TELEMETRY_ENABLED=1 \
JW_TELEMETRY_PATH=/tmp/jw-tel.json \
JW_PROVENANCE_FETCHER=fake-drift \
uv run jw provenance check \
--agent-output packages/jw-core/tests/test_provenance/fixtures/agent_result_with_provenance.json \
--report json > /tmp/report.json
python -c "import json; d = json.load(open('/tmp/jw-tel.json')); print([e for e in d.get('provenance_events', []) if e.get('kind') == 'provenance_drift'])"
Expected: at least one provenance_drift event listed.
- Step 6: Confirm import-guard for missing Fase 39
Run:
.venv/bin/python -c "
from jw_core.provenance import ProvenanceValidator, content_sha256, canonicalize_text
print('OK — provenance module imports without Fase 39')
"
Expected: prints OK — provenance module imports without Fase 39. No
ImportError from any submodule.
- Step 7: Tag the completion
git tag fase-40-complete
git log --oneline fase-40-complete~14..fase-40-complete
Expected: 13 commits listed, each from one of the prior tasks.
- Step 8: Push (optional, owner’s call)
git push origin main
git push origin fase-40-complete
Self-review
The plan covers every required deliverable from the prompt:
- T1 scaffolds
packages/jw-core/src/jw_core/provenance/and stubs every submodule referenced by__init__.pyso the package is importable from Task 1 onward. - T2 locks down
ProvenanceRecord.from_citation_metadataas a pure read-only projection overCitation.metadata. Seven tests pin edge cases: absent keys, partial keys, full keys, optionals, no mutation, unknown-field rejection. - T3 adds
ProvenanceVerdictwith the five statuses from the spec (match,changed,unreachable,no_record,skipped) plusProvenanceReportwithsummarize()+ JSON round-trip. - T4 implements
canonicalize_textwith NFC, zero-width strip, whitespace collapse, and the spec’s hard decision to preserve capitalization. 12 tests cover idempotency, NFC equivalence, hashing stability under cosmetic edits, hash change under real edits, hex output, empty input. - T5 implements
ProvenanceValidatorwith injected fetcher and extractor,check/check_agent_output/check_since, semaphore concurrency=4 matching Fase 23, URL dedup. 9 tests including unreachable-on-raise, unreachable-on-non-2xx, no_record short-circuit (no fetcher call), since-cutoff filter. - T6 ships
stamp_citation/stamp_finding_text(idempotent, preserves other metadata, omits None optionals) plus the WOL ingest hook inget_articleandget_bible_chapter. - T7 integrates stamping into
verse_explainerandapologeticsagents at emission and includes a golden fixture. - T8 verifies the NLI re-run hook through five tests including
provider-raises and missing-claim paths. Confirms
validator.pynever importsjw_core.nliat module level. - T9 delivers the Typer-based CLI with three fetcher modes
(
httpx,fake,fake-drift), markdown reporter, exit codes (0/2/3), and--since. Five CLI tests usingCliRunner. - T10 delivers the MCP tool with hydration helper, lazy-import of
the default fetcher, optional
nli_providerinjection. Four tests. - T11 wires opt-in telemetry events through
jw_core.telemetry’s existing singleton +_save(), gated onJW_TELEMETRY_ENABLED. Two tests confirm both the enabled and disabled paths. - T12 explicitly tests backwards compatibility with three tests: pure-legacy fixture, mixed legacy+new, and unit-level projection.
- T13 documents everything: user guide explaining the L0–L3 taxonomy, ROADMAP entry, VISION_AUDIT row, README link.
- T14 is a six-step audit verifying focused tests, full suite, CLI smoke, telemetry smoke, import-guard.
Each task has 5+ TDD steps (most have 6+). All code is inline — no placeholders, no “TODO fill in”. File-block at the top of every task lists exact create/modify paths. The plan respects the constraints from CLAUDE.md (Spanish prose where it makes sense for narrative documentation, English identifiers everywhere in code).
The plan is 3013 lines with 14 tasks, suitable for execution by a single-pass subagent or sequential plan-executor.
Execution choice
Recommended: superpowers:subagent-driven-development — the tasks
are tightly scoped (each task is one PR-sized commit), have clear
file boundaries, and produce visible test signals at every step.
A subagent can pick up from any green checkbox without ambiguity.
Alternative: superpowers:executing-plans for a more conservative,
top-to-bottom walk if the implementer wants to keep the main agent
in-context throughout.
Edit this page on docs/superpowers/plans/2026-05-31-fase-40-content-provenance-plan.md