Independent Project Not affiliated with, sponsored by, or endorsed by the Watch Tower Bible and Tract Society or Jehovah's Witnesses.
jw-agent-toolkit

2026-05-30 · Release 22-32

11 phases. A complete ecosystem for discipleship and trust.

Phases 22–32 close the active-discipleship loop (study conductor, student parts, pioneer report) and the trust infrastructure (doctrinal eval + citation validator) that measures every change. No LLM in the critical path, no network in tests, citations always verifiable.

  • Phases shipped: 11 (22 → 32)
  • TDD commits: 146 (red → green → commit)
  • New tests: +890 (551 → 1641 passing)
  • Regressions: 0 (1:1 audit per phase)

▌ Hard rules respected across all 11 phases

  • No LLM in the critical path — deterministic parsers/agents/stores.
  • Citations always verifiable via canonical wol.jw.org URLs.
  • Local-first; all personal storage in ~/.jw-agent-toolkit/.
  • No network in tests — fixtures + cassettes + injectable fetchers.
  • Multilingual from day 1 (en/es/pt minimum, up to 17 locales).
  • Doesn't replace the elders' counsel: agents orient, never counsel.
  • No tracking of brothers without explicit opt-in + E2E encryption.
  • No distribution of song lyrics or Bible text (copyright).

Tier 1 · Trust infrastructure

Measures and protects everything else.

Phase 22

Doctrinal regression eval

A safety net that measures every change

+35 tests 20 commits ✓ Shipped

Three-layer suite (structural · citations · semantic) with 47 golden cases. Turns the risk of doctrinal hallucination into an auditable metric that gates PRs. It is the piece that protects the other 10 phases.

What shipped

  • New packages/jw-eval with Suite + GoldenCase + LayerResult.
  • L1 structural: contract regression for 6 agents — no network, no LLM, CI-blocking.
  • L2 citations: offline snapshot mode (always-on) + live weekly mode (auto-opens drift issues).
  • L3 semantic: sentence-transformers + LLM escalation (Ollama/Claude/OpenAI via JW_EVAL_LLM).
  • 47 golden cases (25 L1 + 13 L2 + 9 L3) covering all major agents.
  • CLI `jw eval`, MCP tool `run_eval_suite`, 3 new CI jobs (fast/weekly/nightly).
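A minimal sketch of what an L1 structural check could look like, assuming a golden case pins an agent's inputs plus the output keys it must return, and that every citation must be a canonical wol.jw.org URL. The `GoldenCase`/`LayerResult` fields and the `run_structural` helper are hypothetical illustrations, not the actual jw-eval API:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical shapes; the real jw-eval GoldenCase/LayerResult fields may differ.
@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    agent: str
    inputs: dict[str, Any]
    required_keys: tuple[str, ...]           # keys the agent output must contain
    url_prefix: str = "https://wol.jw.org/"  # every citation must start with this


@dataclass
class LayerResult:
    case_id: str
    passed: bool
    failures: list[str] = field(default_factory=list)


def run_structural(
    case: GoldenCase, agent_fn: Callable[[dict[str, Any]], dict[str, Any]]
) -> LayerResult:
    """L1: call the agent offline (no network, no LLM) and check only its output contract."""
    output = agent_fn(case.inputs)
    failures = [f"missing key: {k}" for k in case.required_keys if k not in output]
    for url in output.get("citations", []):
        if not url.startswith(case.url_prefix):
            failures.append(f"non-canonical citation: {url}")
    return LayerResult(case.case_id, passed=not failures, failures=failures)
```

Because the agent is passed in as a plain callable, a test can inject a deterministic fake and the check stays CI-blocking without any network access.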

Pending / next PR

  • Build the 12 wol.jw.org HTML snapshots via `build_eval_snapshots.py`.
  • Bot-comment of markdown report on PRs.

Phase 23

Citation integrity validator

The output-side counterpart to telemetry

+33 tests 12 commits ✓ Shipped

The jw_core.citations subpackage validates that every URL emitted by an agent resolves and preserves its docId↔pub_code mapping. Three modes: structural (offline), live (HTTP), and drift (HTML fingerprint diff).

What shipped

  • Subpackage jw_core.citations with models + validator.
  • Structural mode: uses MepsCatalog to validate docId offline.
  • Live mode: httpx_fetcher with redirect handling + bounded concurrency.
  • Drift mode: HTML-skeleton hash (robust to content edits, sensitive to structural change).
  • CLI `jw citations check --urls/--agent-output [--live] [--drift]`.
  • MCP tool validate_citations.
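One way to get a hash that is "robust to content edits, sensitive to structural change" is to fingerprint only the tag skeleton of a page. The sketch below assumes the skeleton is the sequence of tag names plus their class attributes, with all text dropped; `skeleton_hash` is a hypothetical name, and the real drift mode may select different features:

```python
import hashlib
from html.parser import HTMLParser


class _SkeletonParser(HTMLParser):
    """Collects tag names and class attributes only; text content is ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        self.parts.append(f"<{tag} {' '.join(sorted(classes.split()))}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")


def skeleton_hash(html: str) -> str:
    """sha256 of the tag skeleton: rewording text keeps it, moving tags changes it."""
    parser = _SkeletonParser()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("".join(parser.parts).encode("utf-8")).hexdigest()
```

Storing one such hash per snapshot lets a later run flag a page as drifted only when its layout changed, not when an article's wording was revised.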

Pending / next PR

  • Auto-refresh snapshots from Phase 22 when drift detected.

Tier 2 · Recurrent high value

Used weekly by each publisher.

Phase 24

Study conductor "Enjoy Life Forever"

Full lifecycle of the current study book

+49 tests 20 commits ✓ Shipped

The biggest gap in active discipleship today. Prepares each lesson with anticipated questions, suggests supporting verses, and tracks each student's progress (lessons completed, goals, attendance). Local-only, encrypted by passphrase.

What shipped

  • study_conductor.prepare_lesson agent (templated, no LLM).
  • StudentProgressStore: SQLite + FieldEncryptor-encrypted sensitive columns.
  • LessonRow/StudentGoal/GoalKind/LessonStatus models.
  • Controlled goal vocabulary (attend_meetings, baptism, drop_addiction_*, etc.).
  • CLI: `jw study lesson/log/progress/lessons/goals/directory`.
  • MCP tools: prepare_lesson, log_student_progress, list_student_lessons, set_student_goal.
  • Crisis-keyword scan warns without blocking; consent disclosure on first run.
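"Warns without blocking" can be sketched as a scan that always keeps the note and merely attaches warnings. The keyword set, the `log_note` function and the warning text below are hypothetical placeholders; the source does not publish the real vocabulary. Diacritics are folded so Spanish/Portuguese spellings match either way:

```python
import unicodedata
from dataclasses import dataclass, field

# Hypothetical keywords for illustration only.
CRISIS_KEYWORDS = {"suicide", "suicidio", "self-harm", "autolesion"}


def _normalize(text: str) -> str:
    """Lowercase and strip diacritics, so 'Autolesión' becomes 'autolesion'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


@dataclass
class NoteResult:
    note: str
    warnings: list[str] = field(default_factory=list)


def log_note(note: str) -> NoteResult:
    """Never rejects the note; only reports which keywords were found."""
    normalized = _normalize(note)
    hits = sorted(k for k in CRISIS_KEYWORDS if k in normalized)
    return NoteResult(note=note, warnings=[f"crisis keyword detected: {k}" for k in hits])
```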

Pending / next PR

  • Auto-extract real Q&A from the 'lff' JWPUB when locally available.
  • Optional reverse-sync with JW Library to mark lessons completed.

Phase 25

jw.org news monitor

Weekly/monthly digest of what's new

+29 tests 13 commits ✓ Shipped

Detects new publications, new JW Broadcasting videos, and new workbooks/Watchtower issues. Distinct from telemetry (which detects API drift): this watches published content.

What shipped

  • jw_core.news subpackage: SeenStore SQLite, 3 sources (publications/broadcasting/programs).
  • Seed pub_code list for periodicals + non-periodicals.
  • Markdown digest grouped by channel/language + JSON.
  • news_monitor agent + CLI `jw news digest`.
  • MCP tool news_digest.
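The core of a "what's new" digest is remembering what was already reported. A minimal sketch of a SQLite-backed seen-store, assuming items are identified by `(source, item_id)`; the table layout and the `filter_new` method are hypothetical, not the actual `SeenStore` schema:

```python
import sqlite3


class SeenStore:
    """Remembers which (source, item_id) pairs have already been reported."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "source TEXT, item_id TEXT, PRIMARY KEY (source, item_id))"
        )

    def filter_new(self, source: str, item_ids: list[str]) -> list[str]:
        """Return unseen ids in input order and mark them as seen."""
        new: list[str] = []
        with self.conn:  # commits the inserts on success
            for item_id in item_ids:
                cur = self.conn.execute(
                    "INSERT OR IGNORE INTO seen (source, item_id) VALUES (?, ?)",
                    (source, item_id),
                )
                if cur.rowcount == 1:  # 0 when the row already existed
                    new.append(item_id)
        return new
```

Keying by source means the same identifier can be new for one channel and old for another.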

Pending / next PR

  • No daemon shipped yet; only a recommended cron schedule is documented.

Tier 3 · Specialized but unique

Fills pedagogical or reporting gaps.

Phase 26

Life & Ministry student parts

Pedagogical script per assignment kind

+34 tests 12 commits ✓ Shipped

Four assignment kinds: Bible reading, starting conversations, return visit, and Bible study demo. Each has its own script and a hook to the month's oratory point.

What shipped

  • oratory_points registry: 50 points × 3 languages (paraphrased ≤120 char, copyright-safe CI guard).
  • 48-slot templates: 4 kinds × 4 audiences × 3 languages.
  • student_part_helper agent (opening/body/transition/close + time_target_seconds).
  • CLI `jw student`, MCP tool student_part_help.
  • 4 L1 golden cases.
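The 48-slot figure is the Cartesian product of kinds, audiences and languages, which makes a coverage check easy to express. In this sketch the kind names mirror the four assignment kinds above, while the audience names and the `missing_slots` helper are hypothetical placeholders:

```python
from itertools import product

# Kind names follow the four assignment kinds; audience names are hypothetical.
KINDS = ("bible_reading", "starting_conversations", "return_visit", "bible_study")
AUDIENCES = ("general", "youth", "religious", "secular")
LANGUAGES = ("en", "es", "pt")


def missing_slots(templates: dict[tuple[str, str, str], str]) -> list[tuple[str, str, str]]:
    """List every (kind, audience, language) slot that has no template."""
    return [slot for slot in product(KINDS, AUDIENCES, LANGUAGES) if slot not in templates]
```

Run in CI, a non-empty result would point at exactly which of the 4 × 4 × 3 = 48 slots still needs a script.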

Pending / next PR

  • Configurable month → oratory-point offset.
  • More audience×family templates.

Phase 27

Monthly pioneer report

Pioneers only · encrypted by default

+22 tests 13 commits ✓ Shipped

Local-only aggregator of hours, active Bible studies, and return visits. For regular/auxiliary pioneers only. Encrypted by default; markdown/CSV/PDF outputs.

What shipped

  • jw_core.ministry: field_report (store + aggregator) + exporters.
  • Encrypted sensitive columns via FieldEncryptor; documented opt-out.
  • Active-studies rule: MAX during month (modern JW practice).
  • Read-only adapter over RevisitTracker.
  • CLI `jw report --month YYYY-MM --format md/csv/pdf`.
  • MCP tools: field_log_hours, field_log_study, field_monthly_report.
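The "MAX during month" rule means hours are summed but active studies are not: the monthly figure is the highest count seen in any entry. A minimal sketch, assuming each log entry records the studies being conducted at that point; `Entry` and `monthly_summary` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    day: date
    hours: float
    active_studies: int  # studies being conducted as of this entry


def monthly_summary(entries: list[Entry], year: int, month: int) -> dict[str, float]:
    in_month = [e for e in entries if e.day.year == year and e.day.month == month]
    return {
        "hours": sum(e.hours for e in in_month),
        # Studies use the maximum seen during the month, not a per-entry sum.
        "bible_studies": max((e.active_studies for e in in_month), default=0),
    }
```

Summing would count the same student once per session; taking the maximum reports each ongoing study once.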

Pending / next PR

  • Official S-13/S-301 export shape.
  • 12-month annual aggregate.

Phase 28

Exact NWT + publications concordance

Watchtower Library-style literal search

+39 tests 11 commits ✓ Shipped

Deterministic and complementary to the semantic RAG. SQLite FTS5 over the already-decrypted corpus, with a diacritic-tolerant tokenizer for Spanish/Portuguese UX.

What shipped

  • jw_core.concordance: store, indexer, search.
  • FTS5 with the unicode61 tokenizer and remove_diacritics 2.
  • Three source kinds: nwt / jwpub / epub with per-channel URL resolution.
  • Incremental indexing by sha256.
  • Unicode ‹…› snippet markers (markdown-safe).
  • CLI `jw grep`. MCP tools concordance_build_index, concordance_search.
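The diacritic folding and the ‹…› snippet markers can be reproduced with Python's built-in `sqlite3`, assuming the linked SQLite has FTS5 and is 3.27 or newer (needed for `remove_diacritics 2`). The table name, columns and helper functions here are hypothetical, not the jw_core.concordance schema:

```python
import sqlite3


def build_index(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Index (url, body) rows; only body is tokenized."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "url UNINDEXED, body, tokenize = 'unicode61 remove_diacritics 2')"
    )
    conn.executemany("INSERT INTO docs (url, body) VALUES (?, ?)", rows)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple[str, str]]:
    # snippet(table, column, open, close, ellipsis, max_tokens); column 1 is body.
    return conn.execute(
        "SELECT url, snippet(docs, 1, '‹', '›', '…', 8) FROM docs "
        "WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

A query for `corazon` then matches text containing `corazón`, and the snippet highlights the original accented word. Markers like ‹ › avoid clashing with markdown syntax such as `**`.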

Pending / next PR

  • `--build-nwt` whole-book mode.
  • Eval adapter for golden cases.

Tier 4 · UX & niche layers

Closes the prep & conversion loop.

Phase 29

Letter / phone / cart composer

3 unaddressed ministry modalities

+543 tests 11 commits ✓ Shipped

Personalized letter (120-180 words), phone script (~60-90s), cart-witnessing micro-script. Copyright-safe: scripture excerpt always empty, only Citation.url.

What shipped

  • letter_composer agent with 3 kinds.
  • 3 kinds × 7 audiences × 8 topic_families × 3 languages = 504 combos, covered via a 3-level fallback.
  • Enforced copyright safety + territory-hint isolation (both tested).
  • CLI `jw letter`, MCP tool compose_witnessing.
  • 3 L1 golden cases.
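A 3-level fallback lets 504 combinations resolve without 504 hand-written templates. The source does not spell out the levels, so this sketch assumes exact match, then audience dropped, then a generic template per kind and language; the `"any"` wildcard and `resolve_template` are hypothetical:

```python
Key = tuple[str, str, str, str]  # (kind, audience, topic_family, language)


def resolve_template(
    templates: dict[Key, str], kind: str, audience: str, family: str, lang: str
) -> str:
    candidates = [
        (kind, audience, family, lang),  # 1. exact match
        (kind, "any", family, lang),     # 2. ignore the audience
        (kind, "any", "any", lang),      # 3. generic template for kind + language
    ]
    for key in candidates:
        if key in templates:
            return templates[key]
    raise KeyError(f"no template for {kind}/{lang}")
```

Adding a more specific template later only shadows the generic one for its slot; nothing else changes.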

Pending / next PR

  • More specific templates.
  • Offline jw_link suggestions via pub_media cache.

Phase 30

Kingdom songs companion

Metadata only · zero lyrics

+20 tests 11 commits ✓ Shipped

Hand-curated registry: number, title, theme (≤200 char paraphrase), cited scriptures, canonical URL. NEVER lyrics (copyright). Non-destructive integration with workbook_helper.

What shipped

  • Seed JSON for E/S/T (English/Spanish/Portuguese), 12 songs each; the remaining entries up to 151 will arrive via PRs.
  • KingdomSong model + SongRegistry loader.
  • Anti-lyrics CI guard (200-char cap + banned tokens).
  • Non-destructive enrich_with_songs adapter (opt-in --with-songs).
  • Offline URL derivation with finder fallback.
  • CLI `jw song`. MCP tools lookup_song, songs_for_week.
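The anti-lyrics guard can be pictured as a check over the seed JSON that fails CI on any hit. The 200-character cap comes from the text above; the forbidden field names, banned tokens and `lyrics_guard_errors` helper are hypothetical stand-ins:

```python
MAX_THEME_CHARS = 200
# Hypothetical lists; the real banned tokens are not given in the source.
FORBIDDEN_FIELDS = {"lyrics", "stanzas"}
BANNED_TOKENS = ("chorus:", "stanza 1")


def lyrics_guard_errors(songs: list[dict]) -> list[str]:
    """Return one message per violation; an empty list means the registry is clean."""
    errors: list[str] = []
    for song in songs:
        number = song.get("number", "?")
        theme = song.get("theme", "")
        if len(theme) > MAX_THEME_CHARS:
            errors.append(f"song {number}: theme is {len(theme)} chars (max {MAX_THEME_CHARS})")
        for key in sorted(k for k in song if k in FORBIDDEN_FIELDS):
            errors.append(f"song {number}: forbidden field '{key}'")
        for key, value in song.items():
            if isinstance(value, str) and any(tok in value.lower() for tok in BANNED_TOKENS):
                errors.append(f"song {number}: banned token in '{key}'")
    return errors
```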

Pending / next PR

  • Expand 12 → 151 entries.
  • doc_id per entry for deep URLs.

Phase 31

PDF / DOCX / Anki exporter

Any AgentResult → printable deliverable

+54 tests 11 commits ✓ Shipped

Convert any agent output into a printable study sheet (PDF/DOCX) or spaced-repetition Anki deck. Single IR (StudySheet) consumed by every exporter. Opt-in extras.

What shipped

  • jw_core.exporters: ir.py + markdown/pdf/docx/anki exporters.
  • Jinja2 templates with ~/.jw-agent-toolkit/templates/ override.
  • Stable Anki GUIDs: sha256(title+heading+body[:200]), so re-exporting updates existing notes instead of duplicating them.
  • MissingDependencyError with pip hint.
  • CLI `jw export`. MCP tool export_study_sheet.
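The GUID formula above translates directly into a few lines of `hashlib`. Whether the toolkit truncates or re-encodes the digest is not stated, so this sketch returns the full hex digest; `stable_guid` is a hypothetical name:

```python
import hashlib


def stable_guid(title: str, heading: str, body: str) -> str:
    """Same card content gives the same GUID, so Anki updates the note instead of adding one."""
    return hashlib.sha256((title + heading + body[:200]).encode("utf-8")).hexdigest()
```

Only the first 200 body characters count, so later edits near the end of a long answer keep the GUID stable. Because the parts are concatenated without a separator, `("ab", "c")` and `("a", "bc")` hash identically; inserting a delimiter would avoid that edge case.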

Pending / next PR

  • CI matrix with [pdf] extras.
  • Additional themes.

Phase 32

Life topics informational assistant

Orientation with citations, not counsel

+32 tests 12 commits ✓ Shipped

For questions like 'What does the Bible say about anxiety/grief/marriage problems?'. Each response includes a disclaimer + elders redirect on sensitive topics. Never fabricates Bible quotes.

What shipped

  • life_topics registry: 9 topic_ids with aliases & sensitive/general families.
  • life_topics agent: topic_index → CDN search → preview + disclaimer + elders_redirect (sensitive).
  • Disclaimer text excludes medical-professional names (tested invariant).
  • Empty + disclaimer + redirect on no matches; never synthetic Bible quotes.
  • CLI `jw life`. MCP tool life_topic_info.
  • 4 golden cases (2 L1 + 2 L3) for anxiety_es and grief_en.
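The flow above (alias lookup, search, disclaimer, elders redirect) can be sketched with an injected search function so tests need no network. The registry entries, disclaimer and redirect strings, and the `answer` function are hypothetical; results come only from the search, never from generated quotes:

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Topic:
    topic_id: str
    aliases: tuple[str, ...]
    sensitive: bool


# Hypothetical subset of the registry.
REGISTRY = (
    Topic("anxiety", ("anxiety", "ansiedad", "ansiedade", "worry"), sensitive=True),
    Topic("grief", ("grief", "duelo", "luto"), sensitive=True),
    Topic("friendship", ("friendship", "amistad", "amizade"), sensitive=False),
)
DISCLAIMER = "General information from published sources, not personal counsel."
ELDERS_REDIRECT = "For personal help, consider speaking with the elders."


def answer(question: str, search_fn: Callable[[str], list[dict]]) -> dict:
    words = set(re.findall(r"\w+", question.lower()))  # whole words, so 'absoluto' is not 'luto'
    topic = next((t for t in REGISTRY if any(a in words for a in t.aliases)), None)
    hits = search_fn(topic.topic_id) if topic else []
    response = {"topic_id": topic.topic_id if topic else None, "results": hits, "disclaimer": DISCLAIMER}
    if not hits or topic.sensitive:  # no matches or a sensitive topic: always redirect
        response["elders_redirect"] = ELDERS_REDIRECT
    return response
```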

Pending / next PR

  • Expand 9 → 30+ topics.
  • User-overridable disclaimer (keeping elders_redirect mandatory).

▌ Process · how the 11 phases were built

1. Brainstorming

Before touching code, each phase went through a structured session: oracle, judge, scope, plus 2–3 approaches with tradeoffs.

2. Spec + Plan

Every phase has an approved spec-design.md and a bite-sized plan.md (2-5 min tasks with exact code). All committed under docs/superpowers/.

3. Strict TDD

Each task: red → green → commit. 154 tasks executed. A failing test was observed before each implementation. Zero exceptions.

▌ Next steps

Operational

  • Build HTML snapshots for Phase 22 (offline CI).
  • Publish jw-eval to PyPI when stabilized.
  • CI matrix with [pdf]/[docx]/[anki] extras installed.
  • PR bot-comment of eval report.

Content

  • Expand Kingdom songs from 12 → 151 entries.
  • More topics in life_topics (9 → 30+).
  • Auto-extract real Q&A for study_conductor.
  • More golden cases (policy: each PR adds ≥3).