Specs y planes
Fase 68 — talk-lab Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Build a multimodal coach-of-public-speaking tool (CLI jw talklab analyze recording.wav --kind bible_reading --language es) that loads a local audio recording, transcribes it with WhisperX (F64) including word-level timestamps, extracts prosody features (pitch, intensity, pauses, speech rate, filler words) with librosa, scores the talk against each of ~50 “counsel points” of the JW Theocratic Ministry School manual (catalog es/en/pt), and returns a TalkLabReport with timeline + top-3 strengths + top-3 focus areas, optionally exported to PDF via F31 exporter. Local-first: audio never leaves the disk.
Architecture: New subpackage packages/jw-core/src/jw_core/talk_lab/ with: Pydantic models (ProsodyFeatures, WordTiming, TranscriptSegment, CounselPointResult, TalkLabReport), audio loader reusing F34 normalizers, WhisperX adapter (F64 reuse) with graceful degradation to plain Whisper, prosody extractor (librosa+stdlib), 50 counsel-point scorers (prosodic = heuristics, linguistic = heuristics, audience = LLM judge opt-in), filler detector (es/en/pt), report builder, optional SessionHistory SQLite for longitudinal tracking. CLI adds jw talklab {analyze,compare,history,counsel-points}. MCP adds 3 tools.
Tech Stack: Python 3.13 · Pydantic v2 · librosa (audio prosody) · numpy · stdlib tomllib (counsel catalog TOML loader) · whisperx (F64, optional) · jw_finetune.synth.provider.LLMProvider (LLM judge opt-in) · jw_core.exporters (F31 PDF/DOCX) · pytest. Optional extras: [talk-lab] for librosa+numpy heavy deps.
Spec: docs/superpowers/specs/2026-06-11-fase-68-talk-lab-design.md
File map
Creates:
packages/jw-core/src/jw_core/talk_lab/__init__.pypackages/jw-core/src/jw_core/talk_lab/models.pypackages/jw-core/src/jw_core/talk_lab/audio_loader.pypackages/jw-core/src/jw_core/talk_lab/prosody.pypackages/jw-core/src/jw_core/talk_lab/filler.pypackages/jw-core/src/jw_core/talk_lab/transcriber.pypackages/jw-core/src/jw_core/talk_lab/counsel_points/__init__.pypackages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_en.tomlpackages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_es.tomlpackages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_pt.tomlpackages/jw-core/src/jw_core/talk_lab/counsel_points/applies_by_kind.tomlpackages/jw-core/src/jw_core/talk_lab/counsel_points/loader.pypackages/jw-core/src/jw_core/talk_lab/scorers/__init__.pypackages/jw-core/src/jw_core/talk_lab/scorers/prosodic.pypackages/jw-core/src/jw_core/talk_lab/scorers/linguistic.pypackages/jw-core/src/jw_core/talk_lab/scorers/audience_llm.pypackages/jw-core/src/jw_core/talk_lab/report.pypackages/jw-core/src/jw_core/talk_lab/history.pypackages/jw-core/src/jw_core/talk_lab/engine.pypackages/jw-core/tests/talk_lab/__init__.pypackages/jw-core/tests/talk_lab/test_models.pypackages/jw-core/tests/talk_lab/test_audio_loader.pypackages/jw-core/tests/talk_lab/test_prosody.pypackages/jw-core/tests/talk_lab/test_filler.pypackages/jw-core/tests/talk_lab/test_catalog.pypackages/jw-core/tests/talk_lab/test_scorers_prosodic.pypackages/jw-core/tests/talk_lab/test_scorers_linguistic.pypackages/jw-core/tests/talk_lab/test_scorers_audience.pypackages/jw-core/tests/talk_lab/test_report.pypackages/jw-core/tests/talk_lab/test_history.pypackages/jw-core/tests/talk_lab/test_engine.pypackages/jw-core/tests/talk_lab/fixtures/__init__.pypackages/jw-core/tests/talk_lab/fixtures/recordings/golden_30s_clear_es.wav(sample)packages/jw-core/tests/talk_lab/fixtures/recordings/golden_30s_filler_heavy_es.wavpackages/jw-core/tests/talk_lab/fixtures/expected_reports/golden_30s_clear_es.expected.jsonpackages/jw-cli/src/jw_cli/commands/talklab.pydocs/guias/talk-lab.md
Modifies:
packages/jw-core/pyproject.toml— add optional[talk-lab]extra withlibrosa>=0.10+numpy>=1.24.packages/jw-cli/src/jw_cli/main.py— registertalklabsubcommand.packages/jw-mcp/src/jw_mcp/server.py— expose 3 MCP tools.docs/ROADMAP.md— add Fase 68.docs/README.md— link new guide.
Task 1: Pydantic models
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/__init__.py -
Create:
packages/jw-core/src/jw_core/talk_lab/models.py -
Create:
packages/jw-core/tests/talk_lab/__init__.py -
Create:
packages/jw-core/tests/talk_lab/test_models.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_models.py
"""Pydantic models for talk_lab."""
from __future__ import annotations
import pytest
from jw_core.talk_lab.models import (
ProsodyFeatures,
WordTiming,
TranscriptSegment,
CounselPointResult,
TalkLabReport,
)
def test_prosody_round_trip() -> None:
p = ProsodyFeatures(
duration_s=30.0,
speech_rate_wpm=140.0,
pitch_mean_hz=180.0,
pitch_range_hz=80.0,
intensity_mean_db=-22.0,
pause_count=5,
pause_total_s=3.5,
pause_avg_s=0.7,
filler_count=2,
filler_per_minute=4.0,
)
dumped = p.model_dump()
rehydrated = ProsodyFeatures.model_validate(dumped)
assert rehydrated.speech_rate_wpm == 140.0
def test_prosody_rejects_negative_durations() -> None:
with pytest.raises(ValueError):
ProsodyFeatures(
duration_s=-1.0,
speech_rate_wpm=140.0,
pitch_mean_hz=180.0,
pitch_range_hz=80.0,
intensity_mean_db=-22.0,
pause_count=0,
pause_total_s=0.0,
pause_avg_s=0.0,
filler_count=0,
filler_per_minute=0.0,
)
def test_word_timing_rejects_inverted_window() -> None:
with pytest.raises(ValueError):
WordTiming(word="hello", start_s=1.0, end_s=0.5, confidence=0.9)
def test_counsel_score_in_range() -> None:
c = CounselPointResult(
point_id="cp-01",
title="Pronunciation",
title_localized="Pronunciación",
score=2,
)
assert c.applies is True
assert c.score == 2
def test_counsel_score_rejects_out_of_range() -> None:
with pytest.raises(ValueError):
CounselPointResult(
point_id="cp-01",
title="x",
title_localized="x",
score=5, # > 3
)
def test_report_round_trip() -> None:
p = ProsodyFeatures(
duration_s=10.0, speech_rate_wpm=120.0, pitch_mean_hz=150.0,
pitch_range_hz=50.0, intensity_mean_db=-18.0, pause_count=1,
pause_total_s=0.5, pause_avg_s=0.5, filler_count=0, filler_per_minute=0.0,
)
rpt = TalkLabReport(
recording_path="/tmp/x.wav",
part_kind="bible_reading",
language="es",
duration_s=10.0,
transcript=[],
prosody=p,
counsel_results=[],
summary_top_3=[],
summary_focus_3=[],
)
dumped = rpt.model_dump()
rehydrated = TalkLabReport.model_validate(dumped)
assert rehydrated.language == "es"
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_models.py -v
Expected: FAIL.
- Step 3: Implement models
# packages/jw-core/src/jw_core/talk_lab/__init__.py
"""jw_core.talk_lab — coach-of-public-speaking toolkit (Fase 68)."""
from __future__ import annotations
from jw_core.talk_lab.models import (
ProsodyFeatures,
WordTiming,
TranscriptSegment,
CounselPointResult,
TalkLabReport,
PartKind,
CounselScore,
)
__all__ = [
"ProsodyFeatures",
"WordTiming",
"TranscriptSegment",
"CounselPointResult",
"TalkLabReport",
"PartKind",
"CounselScore",
]
# packages/jw-core/src/jw_core/talk_lab/models.py
"""Pydantic models for talk_lab."""
from __future__ import annotations
from typing import Literal
from pydantic import BaseModel, Field, model_validator
CounselScore = Literal[0, 1, 2, 3]
PartKind = Literal[
"bible_reading",
"initial_call",
"return_visit",
"bible_study",
"public_talk",
"watchtower_comment",
"other",
]
class ProsodyFeatures(BaseModel):
duration_s: float = Field(ge=0)
speech_rate_wpm: float = Field(ge=0)
pitch_mean_hz: float = Field(ge=0)
pitch_range_hz: float = Field(ge=0)
intensity_mean_db: float
pause_count: int = Field(ge=0)
pause_total_s: float = Field(ge=0)
pause_avg_s: float = Field(ge=0)
filler_count: int = Field(ge=0)
filler_per_minute: float = Field(ge=0)
pitch_contour_path: str | None = None
class WordTiming(BaseModel):
word: str
start_s: float = Field(ge=0)
end_s: float = Field(ge=0)
confidence: float = Field(ge=0, le=1)
@model_validator(mode="after")
def _validate_window(self) -> "WordTiming":
if self.end_s < self.start_s:
raise ValueError(f"end_s ({self.end_s}) < start_s ({self.start_s})")
return self
class TranscriptSegment(BaseModel):
speaker: str
text: str
start_s: float = Field(ge=0)
end_s: float = Field(ge=0)
words: list[WordTiming] = Field(default_factory=list)
class CounselPointResult(BaseModel):
point_id: str
title: str
title_localized: str
score: CounselScore
evidence: list[str] = Field(default_factory=list)
suggestion: str = ""
applies: bool = True
class TalkLabReport(BaseModel):
recording_path: str
part_kind: PartKind
language: Literal["en", "es", "pt"]
duration_s: float = Field(ge=0)
transcript: list[TranscriptSegment]
prosody: ProsodyFeatures
counsel_results: list[CounselPointResult]
summary_top_3: list[str]
summary_focus_3: list[str]
trace_path: str | None = None
score_history_path: str | None = None
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_models.py -v
Expected: 6 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/__init__.py packages/jw-core/src/jw_core/talk_lab/models.py packages/jw-core/tests/talk_lab
git commit -m "feat(jw-core): scaffold talk_lab package with Pydantic models"
Task 2: Audio loader (resample + normalize)
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/audio_loader.py -
Create:
packages/jw-core/tests/talk_lab/test_audio_loader.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_audio_loader.py
"""Audio loader tests."""
from __future__ import annotations
import wave
from pathlib import Path
import pytest
from jw_core.talk_lab.audio_loader import load_audio_mono16k, AudioLoadError
def _write_pcm_wav(path: Path, sample_rate: int, samples: list[int]) -> None:
with wave.open(str(path), "wb") as w:
w.setnchannels(1)
w.setsampwidth(2)
w.setframerate(sample_rate)
for s in samples:
w.writeframes(int(s).to_bytes(2, "little", signed=True))
def test_load_audio_resamples_44k_to_16k(tmp_path: Path) -> None:
p = tmp_path / "x.wav"
_write_pcm_wav(p, 44100, [0] * 4410) # 0.1s silence
audio, sr = load_audio_mono16k(str(p))
assert sr == 16000
assert 0.09 < len(audio) / sr < 0.11
def test_load_audio_normalizes_to_neg1_pos1(tmp_path: Path) -> None:
p = tmp_path / "x.wav"
_write_pcm_wav(p, 16000, [32767, -32768] * 1000)
audio, sr = load_audio_mono16k(str(p))
assert audio.max() <= 1.0
assert audio.min() >= -1.0
def test_load_audio_missing_raises(tmp_path: Path) -> None:
with pytest.raises(AudioLoadError):
load_audio_mono16k(str(tmp_path / "missing.wav"))
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_audio_loader.py -v
Expected: FAIL.
- Step 3: Implement audio loader
# packages/jw-core/src/jw_core/talk_lab/audio_loader.py
"""Audio loader: read WAV, resample to 16kHz mono, normalize to [-1, 1]."""
from __future__ import annotations
import logging
import wave
from pathlib import Path
import numpy as np
logger = logging.getLogger(__name__)
class AudioLoadError(RuntimeError):
"""Raised when the audio cannot be loaded."""
def load_audio_mono16k(path: str) -> tuple[np.ndarray, int]:
"""Load `path` as float32 mono at 16kHz, normalized to [-1, 1]."""
p = Path(path)
if not p.exists():
raise AudioLoadError(f"not found: {p}")
try:
with wave.open(str(p), "rb") as w:
n_channels = w.getnchannels()
sample_width = w.getsampwidth()
framerate = w.getframerate()
n_frames = w.getnframes()
raw = w.readframes(n_frames)
except wave.Error as exc:
raise AudioLoadError(f"wave.Error: {exc}") from exc
if sample_width != 2:
raise AudioLoadError(f"only 16-bit PCM supported (got {sample_width*8}-bit)")
samples = np.frombuffer(raw, dtype=np.int16)
if n_channels > 1:
samples = samples.reshape(-1, n_channels).mean(axis=1).astype(np.int16)
audio_f32 = samples.astype(np.float32) / 32768.0
if framerate != 16000:
try:
from scipy.signal import resample_poly # type: ignore
ratio_num, ratio_den = 16000, framerate
audio_f32 = resample_poly(audio_f32, ratio_num, ratio_den).astype(np.float32)
except ImportError:
# crude linear resample fallback
new_len = int(len(audio_f32) * 16000 / framerate)
old_x = np.linspace(0, 1, len(audio_f32), endpoint=False)
new_x = np.linspace(0, 1, new_len, endpoint=False)
audio_f32 = np.interp(new_x, old_x, audio_f32).astype(np.float32)
return audio_f32, 16000
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_audio_loader.py -v
Expected: 3 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/audio_loader.py packages/jw-core/tests/talk_lab/test_audio_loader.py
git commit -m "feat(talk_lab): audio loader with 16kHz mono normalization"
Task 3: Prosody feature extractor
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/prosody.py -
Create:
packages/jw-core/tests/talk_lab/test_prosody.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_prosody.py
"""Prosody extractor tests with synthetic audio."""
from __future__ import annotations
import numpy as np
import pytest
from jw_core.talk_lab.prosody import extract_prosody
def _synth_silence(duration_s: float, sr: int = 16000) -> np.ndarray:
return np.zeros(int(duration_s * sr), dtype=np.float32)
def _synth_tone(duration_s: float, freq_hz: float, sr: int = 16000) -> np.ndarray:
t = np.linspace(0, duration_s, int(duration_s * sr), endpoint=False)
return (0.3 * np.sin(2 * np.pi * freq_hz * t)).astype(np.float32)
def test_prosody_silence_has_zero_speech_rate() -> None:
audio = _synth_silence(3.0)
p = extract_prosody(audio, sr=16000, word_count=0)
assert p.speech_rate_wpm == 0.0
assert p.duration_s == pytest.approx(3.0)
def test_prosody_pitch_detected_on_tone() -> None:
audio = _synth_tone(2.0, freq_hz=200.0)
p = extract_prosody(audio, sr=16000, word_count=4)
# Allow some tolerance because the extractor is naive without librosa
assert 100.0 < p.pitch_mean_hz < 400.0 or p.pitch_mean_hz == 0.0
def test_prosody_speech_rate_computed() -> None:
audio = _synth_tone(60.0, freq_hz=200.0)
p = extract_prosody(audio, sr=16000, word_count=140)
assert p.speech_rate_wpm == pytest.approx(140.0, rel=0.01)
def test_prosody_pause_detection_basic() -> None:
# 1s tone + 0.5s silence + 1s tone
a = _synth_tone(1.0, 200.0)
b = _synth_silence(0.5)
c = _synth_tone(1.0, 200.0)
audio = np.concatenate([a, b, c])
p = extract_prosody(audio, sr=16000, word_count=5)
assert p.pause_count >= 1
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_prosody.py -v
Expected: FAIL.
- Step 3: Implement prosody extractor
# packages/jw-core/src/jw_core/talk_lab/prosody.py
"""Prosody feature extractor.
Uses librosa when available, falls back to numpy-only heuristics otherwise.
Returns a `ProsodyFeatures` Pydantic model.
"""
from __future__ import annotations
import logging
import numpy as np
from jw_core.talk_lab.models import ProsodyFeatures
logger = logging.getLogger(__name__)
_PAUSE_RMS_THRESHOLD = 0.005 # below this rms = silence
_PAUSE_FRAME_MS = 25
_PAUSE_MIN_DURATION_S = 0.30
def _frame_rms(audio: np.ndarray, frame_size: int) -> np.ndarray:
n_frames = len(audio) // frame_size
if n_frames == 0:
return np.array([], dtype=np.float32)
trimmed = audio[: n_frames * frame_size].reshape(n_frames, frame_size)
return np.sqrt(np.mean(trimmed.astype(np.float64) ** 2, axis=1)).astype(np.float32)
def _detect_pauses(rms: np.ndarray, sr: int, frame_size: int) -> tuple[int, float, float]:
if rms.size == 0:
return (0, 0.0, 0.0)
silence_mask = rms < _PAUSE_RMS_THRESHOLD
if not silence_mask.any():
return (0, 0.0, 0.0)
frame_dur = frame_size / sr
pauses: list[float] = []
current = 0
for is_sil in silence_mask:
if is_sil:
current += 1
else:
if current * frame_dur >= _PAUSE_MIN_DURATION_S:
pauses.append(current * frame_dur)
current = 0
if current * frame_dur >= _PAUSE_MIN_DURATION_S:
pauses.append(current * frame_dur)
return (len(pauses), float(sum(pauses)), float(np.mean(pauses)) if pauses else 0.0)
def _estimate_pitch(audio: np.ndarray, sr: int) -> tuple[float, float]:
"""Very crude autocorrelation pitch tracker over voiced frames."""
try:
import librosa # type: ignore
# Use librosa.yin if available
f0 = librosa.yin(audio, fmin=80.0, fmax=400.0, sr=sr)
voiced = f0[np.isfinite(f0) & (f0 > 0)]
if voiced.size == 0:
return (0.0, 0.0)
return (float(np.mean(voiced)), float(np.percentile(voiced, 95) - np.percentile(voiced, 5)))
except Exception:
# numpy fallback: zero-crossing rate over windows → very coarse
if audio.size < sr * 0.05:
return (0.0, 0.0)
window = 1024
crossings_per_frame: list[int] = []
for i in range(0, len(audio) - window, window):
seg = audio[i : i + window]
crossings = int(np.sum(np.diff(np.sign(seg)) != 0))
crossings_per_frame.append(crossings)
if not crossings_per_frame:
return (0.0, 0.0)
rate = float(np.mean(crossings_per_frame)) * (sr / window) / 2.0
# Cap at human range
if rate < 60 or rate > 500:
return (0.0, 0.0)
return (rate, max(rate * 0.4, 0.0))
def extract_prosody(
audio: np.ndarray,
*,
sr: int = 16000,
word_count: int,
filler_count: int = 0,
) -> ProsodyFeatures:
"""Extract a `ProsodyFeatures` from an audio array."""
duration_s = float(len(audio) / sr)
frame_size = int(sr * _PAUSE_FRAME_MS / 1000)
rms = _frame_rms(audio, frame_size)
intensity_db = (
20.0 * float(np.log10(max(np.sqrt(np.mean(audio.astype(np.float64) ** 2) + 1e-12), 1e-6)))
if audio.size else -120.0
)
pause_count, pause_total, pause_avg = _detect_pauses(rms, sr, frame_size)
speech_rate_wpm = (word_count / duration_s) * 60.0 if duration_s > 0 else 0.0
pitch_mean, pitch_range = _estimate_pitch(audio, sr)
filler_per_minute = (filler_count / duration_s) * 60.0 if duration_s > 0 else 0.0
return ProsodyFeatures(
duration_s=duration_s,
speech_rate_wpm=speech_rate_wpm,
pitch_mean_hz=pitch_mean,
pitch_range_hz=pitch_range,
intensity_mean_db=intensity_db,
pause_count=pause_count,
pause_total_s=pause_total,
pause_avg_s=pause_avg,
filler_count=filler_count,
filler_per_minute=filler_per_minute,
)
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_prosody.py -v
Expected: 4 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/prosody.py packages/jw-core/tests/talk_lab/test_prosody.py
git commit -m "feat(talk_lab): prosody extractor (librosa with numpy fallback)"
Task 4: Filler-word detector (es/en/pt)
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/filler.py -
Create:
packages/jw-core/tests/talk_lab/test_filler.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_filler.py
"""Filler-word detector tests."""
from __future__ import annotations
import pytest
from jw_core.talk_lab.filler import count_fillers
def test_count_fillers_en() -> None:
text = "um, like, you know, uh, this is, um, important."
n = count_fillers(text, language="en")
# um, like, you know, uh, um = 5
assert n == 5
def test_count_fillers_es() -> None:
text = "Eh, pues, este, o sea, bueno, vale… continuamos."
n = count_fillers(text, language="es")
assert n == 6
def test_count_fillers_pt() -> None:
text = "É, tipo assim, então, né, vamos lá."
n = count_fillers(text, language="pt")
assert n >= 4
def test_count_fillers_word_boundary() -> None:
# "this is the umpire" should NOT count "um"
assert count_fillers("the umpire", language="en") == 0
def test_count_fillers_case_insensitive() -> None:
assert count_fillers("UM, ok", language="en") == 1
def test_count_fillers_unknown_language_falls_back() -> None:
n = count_fillers("um like", language="fr") # falls back to en
assert n == 2
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_filler.py -v
Expected: FAIL.
- Step 3: Implement filler detector
# packages/jw-core/src/jw_core/talk_lab/filler.py
"""Filler-word detector for es/en/pt with word-boundary matching."""
from __future__ import annotations
import re
_FILLERS: dict[str, list[str]] = {
"en": ["um", "uh", "uhh", "like", "you know", "i mean", "so", "right"],
"es": ["este", "esto", "o sea", "eh", "eeh", "pues", "bueno", "vale"],
"pt": ["é", "tipo", "tipo assim", "então", "né", "pra você ver"],
}
def _compile_pattern(words: list[str]) -> re.Pattern[str]:
# Sort by length so longer alternations like "you know" win over "you".
sorted_words = sorted(words, key=len, reverse=True)
escaped = [re.escape(w) for w in sorted_words]
return re.compile(rf"(?<![\w]){'|'.join(escaped)}(?![\w])", re.IGNORECASE)
_CACHE: dict[str, re.Pattern[str]] = {lang: _compile_pattern(words) for lang, words in _FILLERS.items()}
def count_fillers(text: str, *, language: str = "es") -> int:
"""Return the count of filler words/phrases in `text` for `language`."""
pattern = _CACHE.get(language) or _CACHE["en"]
return len(pattern.findall(text or ""))
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_filler.py -v
Expected: 6 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/filler.py packages/jw-core/tests/talk_lab/test_filler.py
git commit -m "feat(talk_lab): filler-word detector (es/en/pt) with word-boundary regex"
Task 5: Counsel-point catalog loader (TOML)
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/counsel_points/__init__.py -
Create:
packages/jw-core/src/jw_core/talk_lab/counsel_points/loader.py -
Create:
packages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_es.toml(minimal subset of 6 points for v1; extend in follow-up) -
Create:
packages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_en.toml -
Create:
packages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_pt.toml -
Create:
packages/jw-core/src/jw_core/talk_lab/counsel_points/applies_by_kind.toml -
Create:
packages/jw-core/tests/talk_lab/test_catalog.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_catalog.py
"""Counsel-point catalog loader tests."""
from __future__ import annotations
import pytest
from jw_core.talk_lab.counsel_points.loader import (
load_catalog,
applies_to,
CounselPointDefinition,
)
def test_load_catalog_es() -> None:
points = load_catalog("es")
assert any(p.id == "cp-01" for p in points)
p1 = next(p for p in points if p.id == "cp-01")
assert isinstance(p1, CounselPointDefinition)
assert p1.title_localized != ""
def test_load_catalog_en_has_same_ids() -> None:
es_ids = {p.id for p in load_catalog("es")}
en_ids = {p.id for p in load_catalog("en")}
assert es_ids == en_ids
def test_applies_to_filters_by_kind() -> None:
bible_reading_points = applies_to("bible_reading")
assert isinstance(bible_reading_points, set)
assert "cp-01" in bible_reading_points
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_catalog.py -v
Expected: FAIL.
- Step 3: Write the catalog TOML files (initial 6 points)
packages/jw-core/src/jw_core/talk_lab/counsel_points/catalog_en.toml:
[[points]]
id = "cp-01"
title = "Clear Pronunciation"
title_localized = "Clear Pronunciation"
category = "prosodic"
scorer = "score_pronunciation"
short_description = "Each word should be intelligible."
[[points]]
id = "cp-02"
title = "Speech Rate"
title_localized = "Speech Rate"
category = "prosodic"
scorer = "score_speech_rate"
short_description = "120-150 wpm for teaching."
[[points]]
id = "cp-03"
title = "Use of Pauses"
title_localized = "Use of Pauses"
category = "prosodic"
scorer = "score_pause_use"
short_description = "Pauses between thoughts to let ideas land."
[[points]]
id = "cp-04"
title = "Filler Words"
title_localized = "Filler Words"
category = "prosodic"
scorer = "score_filler_use"
short_description = "Minimize um/uh/like."
[[points]]
id = "cp-05"
title = "Use of Scripture"
title_localized = "Use of Scripture"
category = "linguistic"
scorer = "score_scripture_use"
short_description = "Cite Scripture and tie it to the point."
[[points]]
id = "cp-06"
title = "Audience Warmth"
title_localized = "Audience Warmth"
category = "audience"
scorer = "score_audience_warmth"
short_description = "Warmth shown to listeners."
catalog_es.toml: same IDs/categories/scorers but with title_localized translated.
[[points]]
id = "cp-01"
title = "Clear Pronunciation"
title_localized = "Pronunciación clara"
category = "prosodic"
scorer = "score_pronunciation"
short_description = "Cada palabra debe ser entendible."
[[points]]
id = "cp-02"
title = "Speech Rate"
title_localized = "Velocidad del habla"
category = "prosodic"
scorer = "score_speech_rate"
short_description = "120-150 ppm para enseñar."
[[points]]
id = "cp-03"
title = "Use of Pauses"
title_localized = "Uso de pausas"
category = "prosodic"
scorer = "score_pause_use"
short_description = "Pausas entre ideas para que se asienten."
[[points]]
id = "cp-04"
title = "Filler Words"
title_localized = "Muletillas"
category = "prosodic"
scorer = "score_filler_use"
short_description = "Reduce este/o sea/pues."
[[points]]
id = "cp-05"
title = "Use of Scripture"
title_localized = "Uso de la Escritura"
category = "linguistic"
scorer = "score_scripture_use"
short_description = "Cita la Biblia y conéctala al punto."
[[points]]
id = "cp-06"
title = "Audience Warmth"
title_localized = "Calidez hacia el auditorio"
category = "audience"
scorer = "score_audience_warmth"
short_description = "Calidez hacia los oyentes."
catalog_pt.toml: same with Portuguese translations.
applies_by_kind.toml:
[bible_reading]
points = ["cp-01", "cp-02", "cp-03", "cp-04", "cp-05"]
[initial_call]
points = ["cp-01", "cp-02", "cp-03", "cp-04", "cp-05", "cp-06"]
[return_visit]
points = ["cp-01", "cp-02", "cp-03", "cp-04", "cp-05", "cp-06"]
[bible_study]
points = ["cp-01", "cp-02", "cp-03", "cp-04", "cp-05", "cp-06"]
[public_talk]
points = ["cp-01", "cp-02", "cp-03", "cp-04", "cp-05", "cp-06"]
[watchtower_comment]
points = ["cp-01", "cp-02", "cp-03"]
[other]
points = ["cp-01", "cp-02", "cp-03"]
NOTE: this MVP catalog has 6 points. The Fase 68 design budget calls for ~50; subsequent commits expand the catalog one category at a time.
- Step 4: Implement loader
# packages/jw-core/src/jw_core/talk_lab/counsel_points/__init__.py
"""Counsel-point catalog (loader + TOML data)."""
# packages/jw-core/src/jw_core/talk_lab/counsel_points/loader.py
"""Load TOML catalog of counsel points and the applies-by-kind table."""
from __future__ import annotations
import tomllib
from functools import lru_cache
from pathlib import Path
from typing import Literal
from pydantic import BaseModel
_HERE = Path(__file__).parent
_LANG_FILES = {"en": "catalog_en.toml", "es": "catalog_es.toml", "pt": "catalog_pt.toml"}
_APPLIES_FILE = "applies_by_kind.toml"
class CounselPointDefinition(BaseModel):
id: str
title: str
title_localized: str
category: Literal["prosodic", "linguistic", "audience"]
scorer: str
short_description: str = ""
@lru_cache
def load_catalog(language: str) -> list[CounselPointDefinition]:
"""Return the counsel points for a language (fallback to en)."""
fname = _LANG_FILES.get(language) or _LANG_FILES["en"]
with (_HERE / fname).open("rb") as f:
data = tomllib.load(f)
return [CounselPointDefinition(**entry) for entry in data.get("points", [])]
@lru_cache
def _applies_by_kind() -> dict[str, set[str]]:
with (_HERE / _APPLIES_FILE).open("rb") as f:
data = tomllib.load(f)
return {kind: set(spec["points"]) for kind, spec in data.items()}
def applies_to(part_kind: str) -> set[str]:
"""Set of point ids that apply to a given `part_kind`."""
return _applies_by_kind().get(part_kind, set())
- Step 5: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_catalog.py -v
Expected: 3 passed.
- Step 6: Commit
git add packages/jw-core/src/jw_core/talk_lab/counsel_points packages/jw-core/tests/talk_lab/test_catalog.py
git commit -m "feat(talk_lab): counsel-point catalog (6-point MVP, es/en/pt + applies_by_kind)"
Task 6: Prosodic scorers
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/scorers/__init__.py -
Create:
packages/jw-core/src/jw_core/talk_lab/scorers/prosodic.py -
Create:
packages/jw-core/tests/talk_lab/test_scorers_prosodic.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_scorers_prosodic.py
"""Prosodic-only scorer tests."""
from __future__ import annotations
import pytest
from jw_core.talk_lab.models import ProsodyFeatures, TranscriptSegment, WordTiming
from jw_core.talk_lab.scorers.prosodic import (
score_pronunciation,
score_speech_rate,
score_pause_use,
score_filler_use,
)
def _features(**overrides) -> ProsodyFeatures:
base = dict(
duration_s=60.0, speech_rate_wpm=130.0, pitch_mean_hz=180.0,
pitch_range_hz=80.0, intensity_mean_db=-20.0, pause_count=10,
pause_total_s=10.0, pause_avg_s=1.0, filler_count=2, filler_per_minute=2.0,
)
base.update(overrides)
return ProsodyFeatures(**base)
def _transcript_with_avg_confidence(c: float) -> list[TranscriptSegment]:
words = [WordTiming(word="w", start_s=0, end_s=0.5, confidence=c)]
return [TranscriptSegment(speaker="A", text="hi", start_s=0, end_s=1, words=words)]
def test_pronunciation_high_confidence_score_3() -> None:
transcript = _transcript_with_avg_confidence(0.92)
r = score_pronunciation(_features(), transcript, language="en")
assert r.score == 3
def test_pronunciation_low_confidence_score_0() -> None:
transcript = _transcript_with_avg_confidence(0.45)
r = score_pronunciation(_features(), transcript, language="en")
assert r.score == 0
def test_speech_rate_ideal_3() -> None:
r = score_speech_rate(_features(speech_rate_wpm=135.0), language="en")
assert r.score == 3
def test_speech_rate_too_fast_0() -> None:
r = score_speech_rate(_features(speech_rate_wpm=220.0), language="en")
assert r.score == 0
def test_speech_rate_too_slow_1() -> None:
r = score_speech_rate(_features(speech_rate_wpm=70.0), language="en")
assert r.score <= 1
def test_pause_use_ideal_3() -> None:
# pause_total/duration = 12/60 = 0.20 → ideal
r = score_pause_use(_features(pause_total_s=12.0, duration_s=60.0), language="en")
assert r.score == 3
def test_filler_use_low_score_3() -> None:
r = score_filler_use(_features(filler_per_minute=1.5), language="en")
assert r.score == 3
def test_filler_use_high_score_0() -> None:
r = score_filler_use(_features(filler_per_minute=8.0), language="en")
assert r.score == 0
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_scorers_prosodic.py -v
Expected: FAIL.
- Step 3: Implement prosodic scorers
# packages/jw-core/src/jw_core/talk_lab/scorers/__init__.py
"""Scorers — pure functions over features + transcripts."""
# packages/jw-core/src/jw_core/talk_lab/scorers/prosodic.py
"""Prosodic counsel-point scorers (purely heuristic, no LLM)."""
from __future__ import annotations
from jw_core.talk_lab.models import (
CounselPointResult,
ProsodyFeatures,
TranscriptSegment,
)
_LOC_TITLES: dict[str, dict[str, str]] = {
"cp-01": {"en": "Clear Pronunciation", "es": "Pronunciación clara", "pt": "Pronúncia clara"},
"cp-02": {"en": "Speech Rate", "es": "Velocidad del habla", "pt": "Velocidade da fala"},
"cp-03": {"en": "Use of Pauses", "es": "Uso de pausas", "pt": "Uso de pausas"},
"cp-04": {"en": "Filler Words", "es": "Muletillas", "pt": "Vícios de linguagem"},
}
def _loc(point_id: str, language: str) -> str:
return _LOC_TITLES.get(point_id, {}).get(language, _LOC_TITLES.get(point_id, {}).get("en", ""))
def score_pronunciation(
features: ProsodyFeatures,
transcript: list[TranscriptSegment],
*,
language: str = "en",
) -> CounselPointResult:
confidences = [w.confidence for s in transcript for w in s.words]
if not confidences:
return CounselPointResult(
point_id="cp-01",
title="Clear Pronunciation",
title_localized=_loc("cp-01", language),
score=0,
evidence=["no word-level transcript available"],
suggestion="Re-run transcription with word-level timestamps enabled (WhisperX).",
)
avg_conf = sum(confidences) / len(confidences)
if avg_conf >= 0.85:
score, suggestion = 3, "Pronunciation is clear and confident."
elif avg_conf >= 0.70:
score, suggestion = 2, "Pronunciation is mostly clear; slow down slightly on harder words."
elif avg_conf >= 0.55:
score, suggestion = 1, "Several words are unclear; record again in a quieter environment."
else:
score, suggestion = 0, "Pronunciation needs significant work."
return CounselPointResult(
point_id="cp-01",
title="Clear Pronunciation",
title_localized=_loc("cp-01", language),
score=score,
evidence=[f"avg word confidence: {avg_conf:.2f}"],
suggestion=suggestion,
)
def score_speech_rate(features: ProsodyFeatures, *, language: str = "en") -> CounselPointResult:
wpm = features.speech_rate_wpm
if 120 <= wpm <= 150:
score, suggestion = 3, "Speech rate is in the ideal teaching range."
elif 100 <= wpm < 120 or 150 < wpm <= 175:
score, suggestion = 2, "Speech rate is acceptable; adjust slightly for clarity."
elif 80 <= wpm < 100 or 175 < wpm <= 200:
score, suggestion = 1, "Speech rate is off-target; slow down or speed up."
else:
score, suggestion = 0, "Speech rate is far from ideal; reread the counsel."
return CounselPointResult(
point_id="cp-02",
title="Speech Rate",
title_localized=_loc("cp-02", language),
score=score,
evidence=[f"{wpm:.0f} wpm"],
suggestion=suggestion,
)
def score_pause_use(features: ProsodyFeatures, *, language: str = "en") -> CounselPointResult:
if features.duration_s <= 0:
return CounselPointResult(
point_id="cp-03",
title="Use of Pauses",
title_localized=_loc("cp-03", language),
score=0,
evidence=["zero duration"],
)
pause_ratio = features.pause_total_s / features.duration_s
if 0.15 <= pause_ratio <= 0.25:
score, suggestion = 3, "Pauses are well placed; ideas land."
elif 0.08 <= pause_ratio < 0.15 or 0.25 < pause_ratio <= 0.35:
score, suggestion = 2, "Pauses are present; refine for emphasis."
elif 0.03 <= pause_ratio < 0.08 or 0.35 < pause_ratio <= 0.45:
score, suggestion = 1, "Pauses are too few or too many."
else:
score, suggestion = 0, "Pause use needs work."
return CounselPointResult(
point_id="cp-03",
title="Use of Pauses",
title_localized=_loc("cp-03", language),
score=score,
evidence=[f"pause ratio: {pause_ratio:.2f}"],
suggestion=suggestion,
)
def score_filler_use(features: ProsodyFeatures, *, language: str = "en") -> CounselPointResult:
fpm = features.filler_per_minute
if fpm < 2:
score, suggestion = 3, "Filler words are minimal."
elif fpm < 4:
score, suggestion = 2, "Some filler words; aware of them."
elif fpm < 6:
score, suggestion = 1, "Filler words are noticeable; slow down to replace with silence."
else:
score, suggestion = 0, "Filler words are very frequent; deliberate practice needed."
return CounselPointResult(
point_id="cp-04",
title="Filler Words",
title_localized=_loc("cp-04", language),
score=score,
evidence=[f"{fpm:.1f} fillers/min"],
suggestion=suggestion,
)
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_scorers_prosodic.py -v
Expected: 8 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/scorers/__init__.py packages/jw-core/src/jw_core/talk_lab/scorers/prosodic.py packages/jw-core/tests/talk_lab/test_scorers_prosodic.py
git commit -m "feat(talk_lab): prosodic counsel-point scorers (pronunciation, rate, pauses, fillers)"
Task 7: Linguistic scorer + LLM-judge stub for audience
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/scorers/linguistic.py -
Create:
packages/jw-core/src/jw_core/talk_lab/scorers/audience_llm.py -
Create:
packages/jw-core/tests/talk_lab/test_scorers_linguistic.py -
Create:
packages/jw-core/tests/talk_lab/test_scorers_audience.py -
Step 1: Write the failing tests
# packages/jw-core/tests/talk_lab/test_scorers_linguistic.py
from __future__ import annotations
from jw_core.talk_lab.models import TranscriptSegment, WordTiming
from jw_core.talk_lab.scorers.linguistic import score_scripture_use
def _ts(text: str) -> list[TranscriptSegment]:
return [TranscriptSegment(speaker="A", text=text, start_s=0, end_s=1)]
def test_scripture_use_high_with_explicit_reference() -> None:
transcript = _ts("As Juan 3:16 makes clear, this principle...")
r = score_scripture_use(transcript, language="es")
assert r.score >= 2
def test_scripture_use_low_without_any_ref() -> None:
transcript = _ts("Just talk no scriptures here at all.")
r = score_scripture_use(transcript, language="es")
assert r.score == 0
# packages/jw-core/tests/talk_lab/test_scorers_audience.py
from __future__ import annotations
import pytest
from jw_core.talk_lab.models import TranscriptSegment
from jw_core.talk_lab.scorers.audience_llm import score_audience_warmth
class FakeLLM:
def __init__(self, text: str) -> None:
self._text = text
async def acomplete(self, prompt: str) -> str:
return self._text
def _ts(text: str) -> list[TranscriptSegment]:
return [TranscriptSegment(speaker="A", text=text, start_s=0, end_s=1)]
@pytest.mark.asyncio
async def test_audience_warmth_with_fake_llm_returning_3() -> None:
r = await score_audience_warmth(_ts("Hello dear friends, thank you for being here."), llm=FakeLLM("3"), language="en")
assert r.score == 3
@pytest.mark.asyncio
async def test_audience_warmth_without_llm_fallback_heuristic() -> None:
# No LLM provider → heuristic counts warmth words
r = await score_audience_warmth(_ts("dear friends, thank you, brothers"), llm=None, language="en")
assert r.score >= 1
- Step 2: Run tests to verify they fail
Run: uv run pytest packages/jw-core/tests/talk_lab/test_scorers_linguistic.py packages/jw-core/tests/talk_lab/test_scorers_audience.py -v
Expected: FAIL.
- Step 3: Implement linguistic + audience scorers
# packages/jw-core/src/jw_core/talk_lab/scorers/linguistic.py
"""Linguistic counsel-point scorers (heuristic, no LLM)."""
from __future__ import annotations
from jw_core.parsers.reference import parse_all_references
from jw_core.talk_lab.models import CounselPointResult, TranscriptSegment
def score_scripture_use(
transcript: list[TranscriptSegment],
*,
language: str = "es",
) -> CounselPointResult:
text = " ".join(s.text for s in transcript)
refs = parse_all_references(text) if text else []
n = len(refs)
if n >= 3:
score, suggestion = 3, "Multiple Scriptures cited and connected to the points."
elif n == 2:
score, suggestion = 2, "Couple of Scriptures cited; tie them more explicitly to the points."
elif n == 1:
score, suggestion = 1, "One Scripture; consider adding a second to reinforce the teaching."
else:
score, suggestion = 0, "No Scriptures detected; add at least one to ground the teaching."
title_loc = {"en": "Use of Scripture", "es": "Uso de la Escritura", "pt": "Uso da Escritura"}
return CounselPointResult(
point_id="cp-05",
title="Use of Scripture",
title_localized=title_loc.get(language, "Use of Scripture"),
score=score,
evidence=[f"{n} Scriptures parsed"],
suggestion=suggestion,
)
# packages/jw-core/src/jw_core/talk_lab/scorers/audience_llm.py
"""Audience scorers (LLM judge opt-in, heuristic fallback)."""
from __future__ import annotations
from typing import Protocol
from jw_core.talk_lab.models import CounselPointResult, TranscriptSegment
class LLMLike(Protocol):
async def acomplete(self, prompt: str) -> str: ...
_WARMTH_WORDS = {
"en": ["dear", "thank you", "friends", "brothers", "sisters", "appreciate", "welcome"],
"es": ["queridos", "gracias", "amigos", "hermanos", "hermanas", "aprecio", "bienvenidos"],
"pt": ["queridos", "obrigado", "amigos", "irmãos", "irmãs", "aprecio", "bem-vindos"],
}
async def score_audience_warmth(
transcript: list[TranscriptSegment],
*,
llm: LLMLike | None = None,
language: str = "es",
) -> CounselPointResult:
text = " ".join(s.text for s in transcript)
title_loc = {"en": "Audience Warmth", "es": "Calidez hacia el auditorio", "pt": "Calor hacia o auditório"}
if llm is None:
words = _WARMTH_WORDS.get(language, _WARMTH_WORDS["en"])
hits = sum(1 for w in words if w.lower() in text.lower())
if hits >= 3:
score, suggestion = 3, "Warmth is consistently expressed."
elif hits == 2:
score, suggestion = 2, "Some warmth shown; consider naming the audience explicitly."
elif hits == 1:
score, suggestion = 1, "Warmth is minimal; greet the audience and acknowledge them."
else:
score, suggestion = 0, "Warmth is missing; add a personal opener."
return CounselPointResult(
point_id="cp-06",
title="Audience Warmth",
title_localized=title_loc.get(language, "Audience Warmth"),
score=score,
evidence=[f"{hits} warmth markers"],
suggestion=suggestion,
)
prompt = (
f"Score the audience warmth of this talk from 0 to 3.\n"
f"0 = cold; 3 = warm.\n"
f"Talk: {text}\n"
f"Respond with a single digit only."
)
raw = (await llm.acomplete(prompt)).strip()
try:
score = int(raw[0])
if score not in (0, 1, 2, 3):
score = 0
except (ValueError, IndexError):
score = 0
return CounselPointResult(
point_id="cp-06",
title="Audience Warmth",
title_localized=title_loc.get(language, "Audience Warmth"),
score=score, # type: ignore[arg-type]
evidence=[f"LLM judge: {raw!r}"],
)
- Step 4: Run tests to verify they pass
Run: uv run pytest packages/jw-core/tests/talk_lab/test_scorers_linguistic.py packages/jw-core/tests/talk_lab/test_scorers_audience.py -v
Expected: 4 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/scorers/linguistic.py packages/jw-core/src/jw_core/talk_lab/scorers/audience_llm.py packages/jw-core/tests/talk_lab/test_scorers_linguistic.py packages/jw-core/tests/talk_lab/test_scorers_audience.py
git commit -m "feat(talk_lab): linguistic scripture-use + LLM/heuristic audience-warmth scorers"
Task 8: Transcriber adapter (WhisperX with degradation)
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/transcriber.py -
Step 1: Implement transcriber (no test now — integration via engine)
# packages/jw-core/src/jw_core/talk_lab/transcriber.py
"""WhisperX-based transcriber with graceful fallback.
If WhisperX (F64) isn't available, returns an empty transcript so the
report still renders prosody-only counsel points.
"""
from __future__ import annotations
import logging
import numpy as np
from jw_core.talk_lab.models import TranscriptSegment, WordTiming
logger = logging.getLogger(__name__)
def transcribe(audio: np.ndarray, *, sr: int = 16000, language: str = "es") -> list[TranscriptSegment]:
"""Return word-level transcript. Empty list on failure or missing dep."""
try:
from jw_core.audio.asr_providers.whisperx import WhisperXProvider # type: ignore
except Exception as exc: # noqa: BLE001
logger.info("talk_lab: WhisperX not available (%s); using empty transcript", exc)
return []
try:
provider = WhisperXProvider(language=language)
result = provider.transcribe(audio, sample_rate=sr, word_timestamps=True)
segments: list[TranscriptSegment] = []
for seg in result.segments:
words = [
WordTiming(word=w.word, start_s=w.start, end_s=w.end, confidence=w.confidence)
for w in (seg.words or [])
]
segments.append(
TranscriptSegment(
speaker=seg.speaker or "A",
text=seg.text,
start_s=seg.start,
end_s=seg.end,
words=words,
)
)
return segments
except Exception as exc: # noqa: BLE001
logger.warning("talk_lab: WhisperX transcribe failed (%s); empty transcript", exc)
return []
- Step 2: Commit (no test yet)
git add packages/jw-core/src/jw_core/talk_lab/transcriber.py
git commit -m "feat(talk_lab): WhisperX transcriber adapter (F64) with graceful fallback"
Task 9: Report builder + summary top-3 / focus-3
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/report.py -
Create:
packages/jw-core/tests/talk_lab/test_report.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_report.py
from __future__ import annotations
from jw_core.talk_lab.models import (
CounselPointResult, ProsodyFeatures, TalkLabReport,
)
from jw_core.talk_lab.report import build_report, pick_top_focus
def _cp(point_id: str, score: int) -> CounselPointResult:
return CounselPointResult(
point_id=point_id, title=point_id, title_localized=point_id, score=score, # type: ignore[arg-type]
)
def test_pick_top_focus_picks_3_high_and_3_low() -> None:
results = [
_cp("a", 3), _cp("b", 3), _cp("c", 2),
_cp("d", 1), _cp("e", 0), _cp("f", 1),
]
top, focus = pick_top_focus(results)
assert len(top) == 3
assert len(focus) == 3
assert "a" in top and "b" in top
assert "e" in focus
def test_build_report_smoke() -> None:
prosody = ProsodyFeatures(
duration_s=60.0, speech_rate_wpm=135.0, pitch_mean_hz=180.0,
pitch_range_hz=80.0, intensity_mean_db=-20.0, pause_count=8,
pause_total_s=12.0, pause_avg_s=1.5, filler_count=1, filler_per_minute=1.0,
)
rpt = build_report(
recording_path="/tmp/x.wav",
part_kind="bible_reading",
language="es",
transcript=[],
prosody=prosody,
counsel_results=[_cp("a", 3), _cp("b", 0)],
)
assert isinstance(rpt, TalkLabReport)
assert rpt.duration_s == 60.0
assert len(rpt.summary_top_3) == 1
assert len(rpt.summary_focus_3) == 1
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_report.py -v
Expected: FAIL.
- Step 3: Implement report builder
# packages/jw-core/src/jw_core/talk_lab/report.py
"""Report builder for talk_lab."""
from __future__ import annotations
from jw_core.talk_lab.models import (
CounselPointResult,
ProsodyFeatures,
TalkLabReport,
TranscriptSegment,
PartKind,
)
def pick_top_focus(results: list[CounselPointResult]) -> tuple[list[str], list[str]]:
"""Return (top_3, focus_3) lists of point_id strings."""
by_score = sorted(results, key=lambda r: r.score, reverse=True)
top = [r.point_id for r in by_score[:3] if r.score >= 2]
focus = [r.point_id for r in by_score[::-1][:3] if r.score <= 1]
return top, focus
def build_report(
*,
recording_path: str,
part_kind: PartKind,
language: str,
transcript: list[TranscriptSegment],
prosody: ProsodyFeatures,
counsel_results: list[CounselPointResult],
trace_path: str | None = None,
) -> TalkLabReport:
top, focus = pick_top_focus(counsel_results)
return TalkLabReport(
recording_path=recording_path,
part_kind=part_kind,
language=language, # type: ignore[arg-type]
duration_s=prosody.duration_s,
transcript=transcript,
prosody=prosody,
counsel_results=counsel_results,
summary_top_3=top,
summary_focus_3=focus,
trace_path=trace_path,
)
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_report.py -v
Expected: 2 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/report.py packages/jw-core/tests/talk_lab/test_report.py
git commit -m "feat(talk_lab): report builder + top-3 / focus-3 picker"
Task 10: SessionHistory SQLite (opt-in tracking)
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/history.py -
Create:
packages/jw-core/tests/talk_lab/test_history.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_history.py
from __future__ import annotations
from pathlib import Path
from jw_core.talk_lab.history import SessionHistory
def test_session_history_round_trip(tmp_path: Path) -> None:
h = SessionHistory(tmp_path / "history.sqlite")
h.track(
recording_hash="abc",
report_id="r1",
scores={"cp-01": 3, "cp-02": 2},
part_kind="bible_reading",
language="es",
)
rows = h.list()
assert len(rows) == 1
assert rows[0].report_id == "r1"
assert rows[0].scores["cp-01"] == 3
def test_session_history_compare_returns_deltas(tmp_path: Path) -> None:
h = SessionHistory(tmp_path / "history.sqlite")
h.track(recording_hash="a", report_id="r1", scores={"cp-01": 1}, part_kind="bible_reading", language="es")
h.track(recording_hash="b", report_id="r2", scores={"cp-01": 3}, part_kind="bible_reading", language="es")
deltas = h.compare("r1", "r2")
assert deltas["cp-01"] == 2
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_history.py -v
Expected: FAIL.
- Step 3: Implement history
# packages/jw-core/src/jw_core/talk_lab/history.py
"""Session history (opt-in longitudinal tracking)."""
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from pydantic import BaseModel
class HistoryRow(BaseModel):
report_id: str
recording_hash: str
part_kind: str
language: str
scores: dict[str, int]
timestamp: str
class SessionHistory:
"""SQLite-backed tracker for talk_lab reports (local-only)."""
def __init__(self, path: Path | str) -> None:
self._path = Path(path)
self._path.parent.mkdir(parents=True, exist_ok=True)
self._conn = sqlite3.connect(self._path)
self._init_schema()
def _init_schema(self) -> None:
self._conn.execute(
"""
CREATE TABLE IF NOT EXISTS reports (
report_id TEXT PRIMARY KEY,
recording_hash TEXT NOT NULL,
part_kind TEXT NOT NULL,
language TEXT NOT NULL,
scores_json TEXT NOT NULL,
timestamp TEXT DEFAULT CURRENT_TIMESTAMP
)
"""
)
self._conn.commit()
def track(
self,
*,
recording_hash: str,
report_id: str,
scores: dict[str, int],
part_kind: str,
language: str,
) -> None:
self._conn.execute(
"INSERT OR REPLACE INTO reports (report_id, recording_hash, part_kind, language, scores_json) "
"VALUES (?, ?, ?, ?, ?)",
(report_id, recording_hash, part_kind, language, json.dumps(scores)),
)
self._conn.commit()
def list(self) -> list[HistoryRow]:
cur = self._conn.execute(
"SELECT report_id, recording_hash, part_kind, language, scores_json, timestamp FROM reports "
"ORDER BY timestamp DESC"
)
rows = []
for r in cur:
rows.append(
HistoryRow(
report_id=r[0],
recording_hash=r[1],
part_kind=r[2],
language=r[3],
scores=json.loads(r[4]),
timestamp=r[5],
)
)
return rows
def compare(self, report_id_a: str, report_id_b: str) -> dict[str, int]:
cur = self._conn.execute(
"SELECT report_id, scores_json FROM reports WHERE report_id IN (?, ?)",
(report_id_a, report_id_b),
)
a, b = {}, {}
for rid, sj in cur:
d = json.loads(sj)
if rid == report_id_a:
a = d
else:
b = d
return {pid: b.get(pid, 0) - a.get(pid, 0) for pid in {*a, *b}}
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_history.py -v
Expected: 2 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/history.py packages/jw-core/tests/talk_lab/test_history.py
git commit -m "feat(talk_lab): SessionHistory SQLite for opt-in longitudinal tracking"
Task 11: Engine — analyze_recording end-to-end
Files:
-
Create:
packages/jw-core/src/jw_core/talk_lab/engine.py -
Create:
packages/jw-core/tests/talk_lab/test_engine.py -
Step 1: Write the failing test
# packages/jw-core/tests/talk_lab/test_engine.py
from __future__ import annotations
import wave
from pathlib import Path
import pytest
from jw_core.talk_lab.engine import analyze_recording, TalkLabConfig
from jw_core.talk_lab.models import TalkLabReport
def _write_silent_wav(path: Path, duration_s: float = 5.0, sr: int = 16000) -> None:
with wave.open(str(path), "wb") as w:
w.setnchannels(1)
w.setsampwidth(2)
w.setframerate(sr)
w.writeframes(b"\x00\x00" * int(duration_s * sr))
@pytest.mark.asyncio
async def test_analyze_recording_silence_produces_valid_report(tmp_path: Path) -> None:
wav = tmp_path / "x.wav"
_write_silent_wav(wav, duration_s=2.0)
rpt = await analyze_recording(
recording_path=str(wav),
config=TalkLabConfig(part_kind="bible_reading", language="es", llm_judge=False),
)
assert isinstance(rpt, TalkLabReport)
assert rpt.language == "es"
assert rpt.duration_s == pytest.approx(2.0, abs=0.1)
# No transcript → pronunciation score should be 0
assert any(r.point_id == "cp-01" and r.score == 0 for r in rpt.counsel_results)
@pytest.mark.asyncio
async def test_analyze_recording_returns_top_and_focus(tmp_path: Path) -> None:
wav = tmp_path / "x.wav"
_write_silent_wav(wav, duration_s=2.0)
rpt = await analyze_recording(
recording_path=str(wav),
config=TalkLabConfig(part_kind="bible_reading", language="es", llm_judge=False),
)
# summary lists may be empty if everything is mid-tier, but they must exist
assert isinstance(rpt.summary_top_3, list)
assert isinstance(rpt.summary_focus_3, list)
- Step 2: Run test to verify it fails
Run: uv run pytest packages/jw-core/tests/talk_lab/test_engine.py -v
Expected: FAIL.
- Step 3: Implement engine
# packages/jw-core/src/jw_core/talk_lab/engine.py
"""End-to-end engine: load → transcribe → prosody → score → report."""
from __future__ import annotations
import logging
from pydantic import BaseModel
from jw_core.talk_lab.audio_loader import load_audio_mono16k
from jw_core.talk_lab.counsel_points.loader import applies_to, load_catalog
from jw_core.talk_lab.filler import count_fillers
from jw_core.talk_lab.models import (
CounselPointResult,
PartKind,
TalkLabReport,
TranscriptSegment,
)
from jw_core.talk_lab.prosody import extract_prosody
from jw_core.talk_lab.report import build_report
from jw_core.talk_lab.scorers.audience_llm import score_audience_warmth
from jw_core.talk_lab.scorers.linguistic import score_scripture_use
from jw_core.talk_lab.scorers.prosodic import (
score_filler_use,
score_pause_use,
score_pronunciation,
score_speech_rate,
)
from jw_core.talk_lab.transcriber import transcribe
logger = logging.getLogger(__name__)
class TalkLabConfig(BaseModel):
part_kind: PartKind
language: str = "es"
llm_judge: bool = False
async def analyze_recording(
*,
recording_path: str,
config: TalkLabConfig,
) -> TalkLabReport:
audio, sr = load_audio_mono16k(recording_path)
transcript: list[TranscriptSegment] = transcribe(audio, sr=sr, language=config.language)
text = " ".join(s.text for s in transcript)
word_count = sum(len(s.words) or len(s.text.split()) for s in transcript)
filler_count = count_fillers(text, language=config.language)
prosody = extract_prosody(audio, sr=sr, word_count=word_count, filler_count=filler_count)
catalog = load_catalog(config.language)
applicable = applies_to(config.part_kind)
counsel_results: list[CounselPointResult] = []
for point in catalog:
if point.id not in applicable:
counsel_results.append(
CounselPointResult(
point_id=point.id,
title=point.title,
title_localized=point.title_localized,
score=0,
applies=False,
)
)
continue
if point.scorer == "score_pronunciation":
r = score_pronunciation(prosody, transcript, language=config.language)
elif point.scorer == "score_speech_rate":
r = score_speech_rate(prosody, language=config.language)
elif point.scorer == "score_pause_use":
r = score_pause_use(prosody, language=config.language)
elif point.scorer == "score_filler_use":
r = score_filler_use(prosody, language=config.language)
elif point.scorer == "score_scripture_use":
r = score_scripture_use(transcript, language=config.language)
elif point.scorer == "score_audience_warmth":
llm = None
if config.llm_judge:
try:
from jw_finetune.synth.provider import build_provider_from_env # type: ignore
llm = build_provider_from_env(scope="talklab")
except Exception as exc: # noqa: BLE001
logger.warning("talk_lab: LLM judge requested but provider unavailable: %s", exc)
r = await score_audience_warmth(transcript, llm=llm, language=config.language)
else:
r = CounselPointResult(
point_id=point.id,
title=point.title,
title_localized=point.title_localized,
score=0,
evidence=[f"unknown scorer: {point.scorer}"],
)
counsel_results.append(r)
return build_report(
recording_path=recording_path,
part_kind=config.part_kind,
language=config.language,
transcript=transcript,
prosody=prosody,
counsel_results=counsel_results,
)
- Step 4: Run test to verify it passes
Run: uv run pytest packages/jw-core/tests/talk_lab/test_engine.py -v
Expected: 2 passed.
- Step 5: Commit
git add packages/jw-core/src/jw_core/talk_lab/engine.py packages/jw-core/tests/talk_lab/test_engine.py
git commit -m "feat(talk_lab): analyze_recording engine wiring load→transcribe→prosody→score→report"
Task 12: CLI + MCP wire-up + guide
Files:
-
Create:
packages/jw-cli/src/jw_cli/commands/talklab.py -
Modify:
packages/jw-cli/src/jw_cli/main.py -
Modify:
packages/jw-mcp/src/jw_mcp/server.py -
Create:
docs/guias/talk-lab.md -
Modify:
docs/ROADMAP.md -
Modify:
docs/README.md -
Step 1: CLI module
# packages/jw-cli/src/jw_cli/commands/talklab.py
"""`jw talklab` CLI."""
from __future__ import annotations
import asyncio
import hashlib
import json
from pathlib import Path
import typer
from rich.console import Console
from rich.table import Table
from jw_core.talk_lab.counsel_points.loader import applies_to, load_catalog
from jw_core.talk_lab.engine import TalkLabConfig, analyze_recording
from jw_core.talk_lab.history import SessionHistory
app = typer.Typer(help="Talk-lab — coach of public speaking.")
console = Console()
@app.command("analyze")
def cmd_analyze(
recording: str = typer.Argument(..., help="Path to .wav recording"),
kind: str = typer.Option("bible_reading", "--kind", "-k"),
language: str = typer.Option("es", "--language", "-l"),
llm_judge: bool = typer.Option(False, "--llm-judge"),
track_history: bool = typer.Option(False, "--track-history"),
export_md: str | None = typer.Option(None, "--export"),
) -> None:
"""Analyze a recording and print TalkLabReport."""
cfg = TalkLabConfig(part_kind=kind, language=language, llm_judge=llm_judge) # type: ignore[arg-type]
rpt = asyncio.run(analyze_recording(recording_path=recording, config=cfg))
console.print_json(rpt.model_dump_json())
if track_history:
h = SessionHistory(Path("~/.jw-agent-toolkit/talklab/history.sqlite").expanduser())
scores = {r.point_id: r.score for r in rpt.counsel_results if r.applies}
h.track(
recording_hash=hashlib.sha256(Path(recording).read_bytes()).hexdigest()[:16],
report_id=hashlib.sha256(rpt.model_dump_json().encode()).hexdigest()[:12],
scores=scores,
part_kind=rpt.part_kind,
language=rpt.language,
)
console.print("[dim]tracked to local history.sqlite[/]")
if export_md:
Path(export_md).write_text(_markdown(rpt))
console.print(f"[dim]exported to {export_md}[/]")
@app.command("history")
def cmd_history() -> None:
"""Show local TalkLab history."""
h = SessionHistory(Path("~/.jw-agent-toolkit/talklab/history.sqlite").expanduser())
table = Table(title="TalkLab history")
table.add_column("Report")
table.add_column("Kind")
table.add_column("Lang")
table.add_column("Top scores")
for row in h.list():
top = ", ".join(f"{pid}={s}" for pid, s in sorted(row.scores.items(), key=lambda kv: -kv[1])[:3])
table.add_row(row.report_id, row.part_kind, row.language, top)
console.print(table)
@app.command("counsel-points")
def cmd_counsel_points(
language: str = typer.Option("es", "--language", "-l"),
kind: str | None = typer.Option(None, "--kind", "-k"),
) -> None:
"""List counsel points (optionally filtered by kind)."""
catalog = load_catalog(language)
applicable = applies_to(kind) if kind else {p.id for p in catalog}
table = Table(title=f"Counsel points ({language}{', kind=' + kind if kind else ''})")
table.add_column("ID")
table.add_column("Title")
table.add_column("Category")
table.add_column("Applies")
for p in catalog:
table.add_row(p.id, p.title_localized, p.category, "yes" if p.id in applicable else "no")
console.print(table)
def _markdown(report) -> str:
lines = [
f"# TalkLab report — {report.part_kind}",
f"- Language: {report.language}",
f"- Duration: {report.duration_s:.1f}s",
"",
"## Prosody",
f"- Speech rate: {report.prosody.speech_rate_wpm:.0f} wpm",
f"- Pause count: {report.prosody.pause_count} (total {report.prosody.pause_total_s:.1f}s)",
f"- Fillers/min: {report.prosody.filler_per_minute:.1f}",
"",
"## Top 3 strengths",
*[f"- {pid}" for pid in report.summary_top_3],
"",
"## 3 focus areas",
*[f"- {pid}" for pid in report.summary_focus_3],
"",
"## All counsel points",
]
for r in report.counsel_results:
if not r.applies:
continue
lines.append(f"- **{r.point_id} {r.title_localized}**: {r.score}/3 — {r.suggestion}")
return "\n".join(lines)
- Step 2: Register subcommand in
main.py
from jw_cli.commands import talklab as _talklab_cmd
app.add_typer(_talklab_cmd.app, name="talklab")
- Step 3: MCP tools in
server.py
@mcp.tool
async def talklab_analyze(
recording_path: str,
part_kind: str = "bible_reading",
language: str = "es",
llm_judge: bool = False,
) -> dict:
"""Analyze a recording with talk-lab."""
from jw_core.talk_lab.engine import TalkLabConfig, analyze_recording
rpt = await analyze_recording(
recording_path=recording_path,
config=TalkLabConfig(part_kind=part_kind, language=language, llm_judge=llm_judge), # type: ignore[arg-type]
)
return rpt.model_dump()
@mcp.tool
async def talklab_list_counsel_points(
part_kind: str | None = None,
language: str = "es",
) -> dict:
"""List counsel points for a language and optional part_kind."""
from jw_core.talk_lab.counsel_points.loader import applies_to, load_catalog
catalog = load_catalog(language)
applicable = applies_to(part_kind) if part_kind else {p.id for p in catalog}
return {"points": [p.model_dump() | {"applies": p.id in applicable} for p in catalog]}
@mcp.tool
async def talklab_compare(report_id_a: str, report_id_b: str) -> dict:
"""Compare two tracked reports."""
from pathlib import Path
from jw_core.talk_lab.history import SessionHistory
h = SessionHistory(Path("~/.jw-agent-toolkit/talklab/history.sqlite").expanduser())
return {"deltas": h.compare(report_id_a, report_id_b)}
- Step 4: Add
[talk-lab]extra inpyproject.toml
[project.optional-dependencies]
"talk-lab" = ["librosa>=0.10", "numpy>=1.24", "scipy>=1.11"]
- Step 5: Add guide stub
docs/guias/talk-lab.md:
# Talk-lab (Fase 68)
> Coach de oratoria multimodal sobre tus propias grabaciones.
## Quick start
\`\`\`bash
jw talklab analyze recording.wav --kind bible_reading --language es
# Con LLM judge para counsel points de auditorio
jw talklab analyze recording.wav --llm-judge
# Tracking longitudinal (opt-in)
jw talklab analyze recording.wav --track-history
jw talklab history
# Exportar markdown
jw talklab analyze recording.wav --export report.md
\`\`\`
## CLI
| Comando | Descripción |
|-------------------------------|------------------------------------------|
| `jw talklab analyze` | Analizar grabación |
| `jw talklab history` | Ver historia local |
| `jw talklab counsel-points` | Listar counsel points por kind |
## MCP
| Tool | Descripción |
|--------------------------------|--------------------------------------|
| `talklab_analyze` | Analyze recording |
| `talklab_list_counsel_points` | List counsel points |
| `talklab_compare` | Compare two reports |
## Privacidad
El audio nunca sale del disco. El historial es local y opt-in. Nada se
sube a cloud salvo que actives `--llm-judge` (que solo envía texto
transcripción al LLM, no audio).
## Counsel points (MVP)
6 puntos en v0.68. Roadmap: extender a 50.
| ID | Título | Categoría |
|-------|----------------------|------------|
| cp-01 | Pronunciación clara | prosodic |
| cp-02 | Velocidad del habla | prosodic |
| cp-03 | Uso de pausas | prosodic |
| cp-04 | Muletillas | prosodic |
| cp-05 | Uso de Escritura | linguistic |
| cp-06 | Calidez al auditorio | audience |
- Step 6: ROADMAP + README
Add Fase 68 section to docs/ROADMAP.md and link from docs/README.md (in “Guías por tema”).
- Step 7: Run full test suite
uv run pytest packages/jw-core/tests/talk_lab -v
uv run pytest
Expected: 30+ passed for talk_lab, ≥1917 total.
- Step 8: Commit
git add packages/jw-cli/src/jw_cli/commands/talklab.py packages/jw-cli/src/jw_cli/main.py packages/jw-mcp/src/jw_mcp/server.py packages/jw-core/pyproject.toml docs/guias/talk-lab.md docs/ROADMAP.md docs/README.md
git commit -m "feat(talklab): CLI + 3 MCP tools + extra [talk-lab] + guide for Fase 68"
Acceptance checklist
- Pydantic models, audio loader, prosody extractor, filler detector, catalog loader, 6 scorers, transcriber, report builder, history, engine — all green.
- CLI
jw talklab analyzeproduces JSON. - CLI
jw talklab counsel-pointslists 6 points. - CLI
jw talklab historyreads SQLite. - 3 MCP tools (
talklab_analyze,talklab_list_counsel_points,talklab_compare) are exposed. -
pyproject.tomldeclares[talk-lab]extra. - Guide
docs/guias/talk-lab.mdexists and is linked. - ROADMAP has Fase 68 section.
- Full repo suite passes (≥1917 total).
Follow-ups (out of scope for this plan)
- Expand counsel-point catalog from 6 → ~50 (one category per follow-up).
- Add ASCII timeline / SVG export to
report.py. - Wire
jw talklab compareCLI (only MCP tool exists in MVP). - Add F31 PDF export wrapper for TalkLabReport.
- Integrate F65 meta-orchestrator tool
talklab.analyze. - Add Cloud STT provider (Deepgram) via Plugin SDK F41.
Editar esta página en docs/superpowers/plans/2026-06-11-fase-68-talk-lab-plan.md