Independent project. Not affiliated with, sponsored by, or endorsed by the Watch Tower Bible and Tract Society or Jehovah's Witnesses.
jw-agent-toolkit

Release F65-F76 · 9 new phases + post-MVPs closed

Agentic · multimodal · predictive · family voice

Nine phases open four new layers on top of an already stable toolkit: verifiable agentic orchestration (F65-F67), end-to-end multimodal (F68-F71), diachronic analysis (F72) and consented family voice (F76). All critical decisions stay deterministic. The LLM only enters when an NLI critic is verifying its output. Every verdict carries its canonical URL on wol.jw.org.

Phases

9

New tests

481+

New MCP tools

15+

Phase 65

meta-orchestrator

Planner → executor → critic over the 12 existing agents

✅ Shipped 🧪 ~65 tests T1 Agentic
Technical guide →

Agentic orchestrator that decomposes a goal into a tool DAG, runs it in topological order and critiques the result with F39 NLI before returning. Reuses Plugin SDK F41 + env-driven LLM/NLI factories (Anthropic + Ollama + Fake). Post-MVP shipped: Mermaid DAG export and deterministic replay of saved plans.
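
A minimal sketch of running a tool DAG in topological order, using the standard library's graphlib. The plan shape, the step names and the run_tool stand-in are assumptions made for illustration; the toolkit's real planner output and adapters are not shown in these notes.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each step maps to the set of steps it depends on.
plan = {
    "search": set(),
    "explain": {"search"},
    "summarize": {"search", "explain"},
}


def run_tool(step: str, inputs: dict[str, str]) -> str:
    """Stand-in for a real tool adapter; it only records what it consumed."""
    return f"{step}({', '.join(sorted(inputs))})"


def execute(plan: dict[str, set[str]]) -> dict[str, str]:
    """Run every step after its dependencies and collect the outputs."""
    results: dict[str, str] = {}
    # static_order() raises graphlib.CycleError if the plan is not a DAG.
    for step in TopologicalSorter(plan).static_order():
        inputs = {dep: results[dep] for dep in plan[step]}
        results[step] = run_tool(step, inputs)
    return results


results = execute(plan)
```

A critic pass (NLI in the toolkit's case) would then inspect results before anything is returned to the caller.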

What was delivered

  • LLM planner with JSON-schema validation + critique loop + opt-in replan.
  • 12 real adapters in builtin_tools.py (no more placeholders).
  • LLM factory: Anthropic + Ollama + Fake with graceful degradation.
  • NLI factory wrapping get_default_nli_provider() from F39.
  • F43 tracing via opt-in MetaOrchestrator.tracer=.
  • Opt-in persistence with --save-plan / --save-result JSON.
  • Mermaid export: plan_to_mermaid() + result_to_mermaid() (post-MVP).
  • Plan replay: MetaOrchestrator.run_plan(plan) + CLI jw meta replay (post-MVP).
  • CLI jw meta {tools,plan,run,replay} + flag --mermaid + jw plan-sunday.
  • 3 new MCP tools.
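
To illustrate what a Mermaid DAG export can look like, here is a small sketch that renders a dependency mapping as a Mermaid flowchart. The function name sketch_plan_to_mermaid and the plan shape are hypothetical; the signature of the toolkit's own plan_to_mermaid() is not documented here, and node identifiers are assumed to be Mermaid-safe.

```python
def sketch_plan_to_mermaid(plan: dict[str, set[str]]) -> str:
    """Render a step -> dependencies mapping as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for step in sorted(plan):
        deps = sorted(plan[step])
        if not deps:
            # Emit root steps on their own so isolated nodes still appear.
            lines.append(f"    {step}")
        for dep in deps:
            lines.append(f"    {dep} --> {step}")
    return "\n".join(lines)


diagram = sketch_plan_to_mermaid({"search": set(), "explain": {"search"}})
print(diagram)
```

Sorting steps and dependencies keeps the output stable between runs, which makes the diagram easy to diff.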

Phase 66

conversation-sparring

Conversational sparring partner for ministry practice

✅ Shipped 🧪 ~65 tests T1 Agentic
Technical guide →

6 personas (atheist, jw_student, biblical_scholar, evangelical, agnostic, returning) in 3 languages (es/en/pt) with F61 memory and opt-in F39 NLI to verify answer coherence. Full voice mode via F34 ASR/TTS. Post-MVP shipped: cross-process SQLite persistence.

What was delivered

  • 6 personas × 3 languages in TOML (18 files) with multi-language resolution.
  • Voice mode: jw spar voice-turn — ASR → LLM → TTS, audio never leaves disk.
  • Golden conversations in fixtures/conversations/*.jsonl + deterministic FakeSparLLM.
  • spar.session tool in builtin_tools for use from the F65 meta-orchestrator.
  • Markdown export of transcripts via jw spar show/close --export.
  • Cross-process SQLite in jw_agents/spar/persistence.py (post-MVP).
  • Opt-in autosave via JW_SPAR_PERSIST=1 (post-MVP).
  • CLI jw spar {personas,start,turn,show,close,voice-turn}.
  • 4 MCP tools.
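
The opt-in autosave can be pictured with a small sqlite3 sketch. Only the JW_SPAR_PERSIST=1 switch comes from the notes above; the table schema and the helper names (open_store, save_turn, load_turns) are assumptions, not the contents of jw_agents/spar/persistence.py.

```python
import os
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a store with one row per conversation turn."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns ("
        "session_id TEXT, idx INTEGER, role TEXT, text TEXT, "
        "PRIMARY KEY (session_id, idx))"
    )
    return conn


def save_turn(conn: sqlite3.Connection, session_id: str, role: str, text: str) -> None:
    """Append a turn, but only when persistence has been opted into."""
    if os.environ.get("JW_SPAR_PERSIST") != "1":
        return
    (next_idx,) = conn.execute(
        "SELECT COUNT(*) FROM turns WHERE session_id = ?", (session_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO turns VALUES (?, ?, ?, ?)", (session_id, next_idx, role, text)
    )
    conn.commit()


def load_turns(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Return (role, text) pairs in the order they were saved."""
    rows = conn.execute(
        "SELECT role, text FROM turns WHERE session_id = ? ORDER BY idx", (session_id,)
    )
    return rows.fetchall()


os.environ["JW_SPAR_PERSIST"] = "1"  # opt in for this demo
conn = open_store(":memory:")  # a file path would allow cross-process reuse
save_turn(conn, "s1", "user", "hello")
save_turn(conn, "s1", "persona", "hi")
turns = load_turns(conn, "s1")
```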

Phase 67

doctrinal-reasoner

Verifiable chain-of-thought with ReAct + F39 NLI

✅ Shipped 🧪 ~59 tests T1 Agentic
Technical guide →

Structured reasoner that runs ReAct (Reason → Act → Observe) with an exportable proof tree. Toxic framing reformulator (12 patterns es/en/pt). Jinja2 planner with JSON-schema validation. NLI modes off/warn/reject. Post-MVP shipped: real tool dispatcher + 10-question golden set.

What was delivered

  • Toxic-framing reformulator (12 patterns es/en/pt).
  • Multi-language Jinja2 planner + JSON-schema validation.
  • ReAct executor with F39 NLI (off/warn/reject) and truncation.
  • Deterministic prose summary in three languages.
  • Real tool dispatcher: verse_explainer/research_topic/apologetics/life_topics (post-MVP).
  • Opt-in activation via use_real_dispatcher=True.
  • 10-question multi-step golden set in tests/reasoner/fixtures/golden.jsonl (post-MVP).
  • CLI jw reason {ask,languages} + MCP doctrinal_reason.
  • Wired into F65 as reason.doctrinal tool.
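
The three NLI modes can be sketched as a gate applied to each reasoning step. The StepCheck type, the check_step function and the (premise, hypothesis) -> label callable are hypothetical; only the off/warn/reject mode names and the entails/neutral/contradicts labels appear in these notes.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed NLI interface: (premise, hypothesis) -> "entails" | "neutral" | "contradicts".
NLI = Callable[[str, str], str]


@dataclass
class StepCheck:
    accepted: bool
    warnings: list[str] = field(default_factory=list)


def check_step(claim: str, evidence: str, nli: NLI, mode: str = "warn") -> StepCheck:
    """Accept, flag or reject a claim depending on the NLI mode."""
    if mode not in {"off", "warn", "reject"}:
        raise ValueError(f"unknown NLI mode: {mode}")
    if mode == "off":
        return StepCheck(accepted=True)  # the NLI model is never called
    label = nli(evidence, claim)
    if label == "entails":
        return StepCheck(accepted=True)
    note = f"NLI label '{label}' for claim: {claim}"
    # warn keeps the step but records the problem; reject drops the step.
    return StepCheck(accepted=(mode == "warn"), warnings=[note])


def fake_nli(premise: str, hypothesis: str) -> str:
    """Toy stand-in: substring match counts as entailment."""
    return "entails" if hypothesis.lower() in premise.lower() else "neutral"


warned = check_step("grace is earned", "Grace is a gift.", fake_nli, mode="warn")
rejected = check_step("grace is earned", "Grace is a gift.", fake_nli, mode="reject")
```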

Phase 68

talk-lab

Multimodal speech coach, local-first

✅ Shipped 🧪 ~70 tests T2 Multimodal
Technical guide →

End-to-end analysis of Theocratic Ministry School parts: WhisperX F64 for transcription and diarization, prosody features (optional librosa, with a numpy fallback) and 6 pedagogical counsel points. Audio never leaves disk. Post-MVP shipped: SVG timeline and F31 PDF export.

What was delivered

  • WhisperX F64 diarized ASR + prosody features (speech_rate, pitch, pauses, fillers).
  • TOML catalog es/en/pt with applies_by_kind for the 6 initial counsel points.
  • Local SQLite history with cross-session compare.
  • Horizontal SVG timeline report_to_svg with score-colored bars (post-MVP).
  • F31 PDF export wrapper talklab_to_studysheet + export_talk_lab_pdf (post-MVP).
  • CLI jw talklab {analyze,history,compare,counsel-points} + --svg --pdf flags.
  • 3 new MCP tools.
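
As a rough picture of the prosody features, the sketch below derives speech rate, pause count and filler count from word-level timestamps. The Word type, the filler list and the 0.5 s pause threshold are assumptions; pitch is left out because it needs the audio signal itself rather than the transcript.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


FILLERS = {"um", "uh", "eh"}  # hypothetical filler list


def prosody_summary(words: list[Word], pause_threshold: float = 0.5) -> dict[str, float]:
    """Compute words per minute, long pauses and filler words."""
    if not words:
        return {"speech_rate_wpm": 0.0, "pauses": 0, "fillers": 0}
    duration = words[-1].end - words[0].start
    # A pause is a silent gap between consecutive words at or above the threshold.
    pauses = sum(
        1 for prev, nxt in zip(words, words[1:]) if nxt.start - prev.end >= pause_threshold
    )
    fillers = sum(1 for w in words if w.text.lower().strip(".,") in FILLERS)
    rate = len(words) / duration * 60 if duration > 0 else 0.0
    return {"speech_rate_wpm": rate, "pauses": pauses, "fillers": fillers}


summary = prosody_summary(
    [Word("Um", 0.0, 0.3), Word("welcome", 0.4, 0.9), Word("everyone", 1.6, 2.0)]
)
```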

Phase 69

broadcasting-visual-index

Frame-level multimodal search over JW Broadcasting videos

✅ Shipped 🧪 ~43 tests T2 Multimodal
Technical guide →

Frame sampler + VLM captioning + CLIP embedding + RRF fusion to locate visual moments inside Broadcasting videos. Hybrid FTS + vector + RRF retrieval. Post-MVP shipped: OCR on frames reusing the F70 pytesseract adapter.

What was delivered

  • Pydantic VisualFrame with caption + ocr_text + embedding_id + transcript_concurrent.
  • Deterministic frame sampler by interval + change detection.
  • VLM captioning via provider Protocol + CLIP encoder Protocol.
  • Hybrid FTS + vector + RRF search with thumb_path + deep_link.
  • Frame OCR: enrich_frames_with_ocr() reuses F70 (post-MVP).
  • Language routing español → spa, etc.
  • CLI jw broadcasting visual {index,search} + MCP tools.
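
Reciprocal rank fusion (RRF) merges ranked lists by summing 1 / (k + rank) for every list in which an item appears. The sketch below is a generic version of that formula, not the toolkit's code; the function name, the frame identifiers and k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists; items ranked high in several lists score best."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fts_hits = ["frame_12", "frame_07", "frame_30"]     # e.g. full-text search order
vector_hits = ["frame_07", "frame_44", "frame_12"]  # e.g. embedding search order
fused = rrf_fuse([fts_hits, vector_hits])
```

Because RRF only uses ranks, the FTS and vector scores never need to be put on a common scale.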

Phase 70

image-quote-verifier

Visual defense against fake quotes in memes/screenshots

✅ Shipped 🧪 ~58 tests T2 Multimodal
Technical guide →

VLM + OCR + RAG + F39 NLI pipeline that extracts the quote attributed to a JW publication, searches the original text in the corpus and emits an entails/neutral/contradicts verdict with a visual fingerprint (apparent era, layout consistency, anomalies). Post-MVP shipped: real F33 RAG + F39 NLI wire-up.

What was delivered

  • Pipeline load_image → ocr → cleanup → extract_quote → fingerprint → retrieve → NLI.
  • VisualFingerprint with apparent_era + apparent_publication + layout_consistency + anomalies.
  • Verdict synthesis (entails/neutral/contradicts/unverifiable) with reasoning.
  • default_rag_retriever() via env JW_IMAGE_QUOTE_STORE_PATH (post-MVP).
  • default_nli() adapter over F39 get_default_nli_provider() (post-MVP).
  • Engine accepts use_real_defaults=True with graceful degradation.
  • Robust Tier 1 paragraph classifier (paragraph wins over question).

Phase 71

book-camera

Camera for physical books: OCR + classification + actions

✅ Shipped 🧪 ~30 tests T2 Multimodal
Technical guide →

Point the phone at a physical book (Watchtower, Insight on the Scriptures, Bible) and the toolkit OCRs, classifies (bible verse / study question / Watchtower paragraph / plain text) and proposes contextual actions: read_aloud (F76), open_in_jw_library, open_in_wol, show_answer. Post-MVP shipped: REST endpoints for mobile apps.

What was delivered

  • Procedural classifier over OCR cleanup (no LLM in the critical path).
  • Suggested actions: read_aloud · open_in_jw_library · open_in_wol · show_answer.
  • Integrates with F76 voice-clone for read_aloud with consented family voice.
  • REST endpoints jw_mcp/rest/book_camera.py: POST /api/v1/book_camera/{analyze,tts,rag_answer} (post-MVP).
  • /tts enforces the F76 license gate (consent + license + non-commercial).
  • /rag_answer delegates to jw_agents.research_topic when available.
  • CLI jw book-camera {analyze,kinds} + MCP book_camera_analyze.
  • Wired into F65 as book_camera.analyze.
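
A procedural classifier of this kind can be approximated with a few ordered regex rules. The patterns, the rule order and the kind labels below are illustrative assumptions over English text, not the toolkit's actual rules.

```python
import re

# Hypothetical heuristics applied to OCR text after cleanup.
VERSE_START = re.compile(r"^\s*(?:[1-3]\s)?[A-Z][a-z]+\.?\s\d{1,3}:\d{1,3}")
QUESTION_START = re.compile(r"^\s*\d{1,2}(?:\s?[-,]\s?\d{1,2})?\.\s")
PARAGRAPH_START = re.compile(r"^\s*\d{1,2}\s+[A-Z]")


def classify(text: str) -> str:
    """Return one of: bible_verse, study_question, watchtower_paragraph, plain_text."""
    if VERSE_START.match(text):
        return "bible_verse"  # starts with a reference such as "John 3:16"
    if QUESTION_START.match(text) and text.rstrip().endswith("?"):
        return "study_question"  # numbered like "1, 2." and phrased as a question
    if PARAGRAPH_START.match(text):
        return "watchtower_paragraph"  # bare paragraph number, then a sentence
    return "plain_text"


kinds = [
    classify("John 3:16 For God loved the world so much"),
    classify("1, 2. What did Jesus teach about prayer?"),
    classify("3 Jesus taught his followers to pray with sincerity."),
    classify("Meeting starts at seven."),
]
```

Keeping the rules deterministic means the same photo always yields the same kind, with no LLM call in the critical path.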

Phase 72

doctrinal-drift

Diachronic analysis with temporal embeddings + DBSCAN cosine

✅ Shipped 🧪 ~43 tests T3 Classical / predictive ML
Technical guide →

Detects the evolution of doctrinal understanding across the 20th and 21st centuries by partitioning text by era (13 decades) and applying DBSCAN-style clustering with cosine distance (pure numpy). Significance is graded minor/moderate/major. The Prov 4:18 trilingual note is ALWAYS injected. Post-MVP shipped: F49 Second Brain wire-up + SVG timeline.

What was delivered

  • partition_by_era + dbscan_cluster cosine + cluster_alignment + significance.
  • Explanatory Prov 4:18 trilingual note ALWAYS injected into the report.
  • Embedding-agnostic (any compatible provider).
  • chunks_from_brain() adapter: reads Publication nodes from F49 Second Brain (post-MVP).
  • Year extraction: explicit (year/pub_year) or derived from published_date (post-MVP).
  • SVG drift timeline with era-colored markers + significance arrows + Prov 4:18 footer (post-MVP).
  • CLI jw drift {analyze,note,eras} + --svg flag + MCP drift_analyze.
  • Wired into F65 as drift.analyze.
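
The two building blocks, decade partitioning and DBSCAN-style clustering on cosine distance, can be sketched in pure numpy as follows. The function names, eps and min_samples are assumptions rather than the toolkit's partition_by_era / dbscan_cluster signatures, and embeddings are assumed to be non-zero vectors.

```python
import numpy as np


def era_of(year: int) -> str:
    """Label a year with its decade, e.g. 1954 -> '1950s'."""
    return f"{year // 10 * 10}s"


def cosine_distance_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def dbscan_cosine(x: np.ndarray, eps: float = 0.2, min_samples: int = 2) -> np.ndarray:
    """Return one cluster label per row; -1 marks noise."""
    dist = cosine_distance_matrix(x)
    # Each neighbourhood includes the point itself (distance 0).
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    labels = np.full(len(x), -1)
    visited = np.zeros(len(x), dtype=bool)
    cluster = 0
    for i in range(len(x)):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; stays noise unless a cluster claims it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
        cluster += 1
    return labels


vectors = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99], [-1.0, -1.0]])
labels = dbscan_cosine(vectors)
```

Running the clustering once per era and comparing how clusters line up between decades is the kind of alignment step a drift report would build on.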

Phase 76

family-voice-clone

TTS with consented family voice + 3-layer license gate

✅ Shipped 🧪 ~48 tests T4 Voice / accessibility
Technical guide →

Lets you clone a family member's voice (with signed consent) for personal, non-commercial uses, such as reading a bedtime story to a child in grandpa's voice. 3-layer license gate: forbidden-names deny list (branch/broadcasting/president/governing_body/warwick), active consent (not revoked + not expired), non-commercial text (5 regex patterns). Post-MVP shipped: opt-in Fernet weights encryption.

What was delivered

  • Pydantic models: VoiceProfile + ConsentRecord + TrainingSample + TrainResult.
  • 3-layer license gate: name deny list + active consent + non-commercial regex.
  • Per-profile JSON registry with JW_VOICECLONE_ROOT env override.
  • Deterministic FakeVoiceProvider for tests (no heavy models).
  • Opt-in audit hook emit_trace=fn F43-compatible.
  • Opt-in Fernet encryption in encryption.py + JW_VOICE_KEY (post-MVP).
  • encrypt_weights / decrypt_to_tempfile / generate_key (post-MVP).
  • CLI jw voiceclone {register-from-consent,list,show,say,revoke,delete}.
  • 3 MCP tools voice_clone_{list,synthesize,audit}.
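
The three layers of the license gate can be sketched as sequential checks that fail closed. The deny-list terms come from the description above; the SketchConsent type, the gate function, the reason strings and the two commercial patterns are hypothetical, since the toolkit's five real regexes are not listed in these notes.

```python
import re
from dataclasses import dataclass
from datetime import date

FORBIDDEN_NAMES = ("branch", "broadcasting", "president", "governing_body", "warwick")

# Illustrative patterns only; the real five regexes are not shown in the notes.
COMMERCIAL_PATTERNS = [re.compile(p, re.I) for p in (r"\bbuy now\b", r"\bdiscount\b")]


@dataclass
class SketchConsent:
    revoked: bool
    expires_on: date


def gate(profile_name: str, consent: SketchConsent, text: str, today: date) -> tuple[bool, str]:
    """Return (allowed, reason); the first failing layer decides."""
    name = profile_name.lower()
    if any(term in name for term in FORBIDDEN_NAMES):
        return False, "forbidden_name"  # layer 1: name deny list
    if consent.revoked or consent.expires_on < today:
        return False, "consent_inactive"  # layer 2: consent must be active
    if any(p.search(text) for p in COMMERCIAL_PATTERNS):
        return False, "commercial_text"  # layer 3: non-commercial text only
    return True, "ok"


today = date(2025, 1, 1)
active = SketchConsent(revoked=False, expires_on=date(2026, 1, 1))
allowed = gate("grandpa", active, "Once upon a time...", today)
blocked = gate("grandpa", active, "Buy now and save!", today)
```

Passing today in explicitly keeps the gate deterministic and easy to test.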