# AI Pipeline
This page covers the application's AI pipeline — the runtime services that power transcription, extraction, review, and assessment. For the offline evaluation CLI, see E2E Eval Pipeline.
## Pipeline Stages
```
Audio Recording
       │
       ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│Transcription│───▶│ Extraction  │───▶│   Review    │    │ Assessment  │
│ audio→text  │    │ text→prices │    │ QA scoring  │    │ obs→prices  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                ▲
                                                        PriceObservations
                                                        (from any source)
```
Transcription → Extraction → Review form the data capture path (per-call). Assessment is a separate workflow that operates on accumulated PriceObservation records across multiple calls/sources for a MAG.
## Provider Matrix
| Stage | Providers | Default | Env Var |
|---|---|---|---|
| Transcription | AssemblyAI, Azure Speech, Whisper, ElevenLabs Scribe | AssemblyAI | TRANSCRIPTION_PROVIDER |
| Extraction | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | EXTRACTION_PROVIDER |
| Review | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | REVIEW_PROVIDER |
| Assessment | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | ASSESSMENT_PROVIDER |
All providers implement a common interface per stage (e.g., IExtractionService). Switching provider requires only the environment variable — no code changes.
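As a minimal sketch of this pattern (class and method names here are illustrative, not the actual implementations), each stage pairs an interface with a factory keyed off its env var:

```typescript
// Hypothetical sketch of the per-stage provider pattern. Each stage defines a
// common interface; a factory picks the implementation from the stage's env var.
interface IExtractionService {
  // Returns structured price data (a plain string here for brevity).
  extract(transcript: string): Promise<string>;
}

class GeminiExtractionService implements IExtractionService {
  async extract(transcript: string): Promise<string> {
    return `gemini:${transcript.length}`; // real code would call the Gemini API
  }
}

class ClaudeExtractionService implements IExtractionService {
  async extract(transcript: string): Promise<string> {
    return `claude:${transcript.length}`;
  }
}

// In the app, `provider` would come from process.env.EXTRACTION_PROVIDER;
// the default is Gemini, per the matrix above.
function createExtractionService(provider: string = "gemini"): IExtractionService {
  switch (provider.toLowerCase()) {
    case "claude":
      return new ClaudeExtractionService();
    case "gemini":
    default:
      return new GeminiExtractionService();
  }
}
```

Because callers only see the interface, swapping `EXTRACTION_PROVIDER` changes behavior without touching call sites.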
## API Keys
| Provider | Env Var |
|---|---|
| Gemini | GEMINI_API_KEY |
| Anthropic (Claude) | ANTHROPIC_API_KEY |
| OpenAI (GPT) | OPENAI_API_KEY |
| AssemblyAI | ASSEMBLYAI_API_KEY |
| Azure Speech | AZURE_SPEECH_API_KEY + AZURE_SPEECH_REGION |
| ElevenLabs | ELEVENLABS_API_KEY |
## Prompt Templates
Prompts live in prompts/*.yaml and are loaded at runtime by src/lib/prompts.ts.
| File | Stage | Purpose |
|---|---|---|
| `extraction.yaml` | Extraction | Transcript → structured price data points |
| `review.yaml` | Review | Extraction quality scoring with confidence |
| `review-v2.yaml` | Review | Adversarial review prompt (calibration variant) |
| `review-judge.yaml` | Review | Judge-style review for calibration |
| `assessment.yaml` | Assessment | Observations + statistics → assessed prices |
| `assessment-naive.yaml` | Assessment | Simplified assessment (baseline comparison) |
| `assessment-review.yaml` | Assessment | Post-assessment data-grounding validation (checks rationale claims against observations) |
Each template has:

- `system_prompt` — role and instructions for the LLM
- `user_prompt_template` — Mustache-style template with `{{variable}}` placeholders
- Optional `thinking_budget` — controls extended thinking (0 = disabled; assessment uses 0)
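Putting those fields together, a template file might look like the following (the field names come from the conventions above; the contents are purely illustrative, not the real prompts):

```yaml
system_prompt: |
  You are a price-reporting analyst. Extract every price data point
  from the call transcript as structured JSON.
user_prompt_template: |
  Commodity context: {{extraction_context}}
  Transcript:
  {{transcript}}
thinking_budget: 0   # 0 disables extended thinking
```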
Prompt hashing (computePromptHash()) tracks which prompt version produced each result.
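A minimal sketch of such a hash (the real computePromptHash() in src/lib/prompts.ts may differ in inputs and encoding):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of prompt-version hashing: hash the canonical template
// text so each stored result records exactly which prompt produced it.
function computePromptHash(systemPrompt: string, userPromptTemplate: string): string {
  return createHash("sha256")
    .update(systemPrompt)
    .update("\n---\n") // separator so field boundaries cannot collide
    .update(userPromptTemplate)
    .digest("hex")
    .slice(0, 12); // a short prefix is enough to identify a version
}
```

Any edit to either field yields a new hash, so results produced by different prompt versions are never conflated.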
## Commodity Configuration
configs/commodities.yaml is the single source of truth for commodity-specific data:
```yaml
commodities:
  cobalt:
    methodology: |
      ## Cobalt Assessment Methodology
      ...data hierarchy, rationale writing guide...
    extraction_context:
      commodity_type: base metal
      common_terms: [alloy grade, standard grade, MB free market]
      price_units: [$ per lb, $ per tonne]
      typical_ranges:
        "$ per lb": { min: 10, max: 25 }
    assessment_guidance: |
      ...cobalt-specific assessment rules...
    patterns:
      mags: [cobalt]
      markets: [cobalt]
```
Resolution: MAG code / market name → pattern match → commodity config. A default_methodology provides fallback text when no commodity pattern matches.
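That resolution step could be sketched as follows (types, case handling, and the substring-match rule are assumptions; only the pattern-then-fallback flow comes from the description above):

```typescript
// Hypothetical sketch of commodity resolution: match the MAG code or market
// name against each commodity's `patterns`, else fall back to the default.
interface CommodityConfig {
  methodology: string;
  patterns: { mags: string[]; markets: string[] };
}

function resolveCommodity(
  configs: Record<string, CommodityConfig>,
  magCode: string,
  marketName: string,
  defaultMethodology: string,
): { name: string | null; methodology: string } {
  const mag = magCode.toLowerCase();
  const market = marketName.toLowerCase();
  for (const [name, cfg] of Object.entries(configs)) {
    const magHit = cfg.patterns.mags.some((p) => mag.includes(p));
    const marketHit = cfg.patterns.markets.some((p) => market.includes(p));
    if (magHit || marketHit) return { name, methodology: cfg.methodology };
  }
  return { name: null, methodology: defaultMethodology };
}
```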
## Supported Commodities
| Commodity | MAG Code | Markets | Key Features |
|---|---|---|---|
| Cobalt | COBALT-LON | 2 | Base metal, $/lb ranges, alloy + standard grade |
| SYP Lumber | LBR-SYP | 205 | Derived pricing, benchmark + formula markets |
| SA Vegoils | SA-VEGOILS | 5 | Forward curve (M1–M6 tenors), contract months |
## Assessment Pipeline
The assessment workflow is the most complex of the four stages. It runs per MAG, per assessment period:
- Observation gathering — query `PriceObservation` records for the MAG's markets within the assessment period; per-assessment exclusion overrides (`AssessmentObservationExclusion`) are resolved via `observation-resolver.ts`, taking priority over the global `discarded` flag
- Statistics calculation (deterministic TypeScript, not LLM):
    - Weighted mean, median, range per market
    - Cross-market spreads and directional indicators
    - Related market context (sibling MAG prices, coherence checks)
- LLM assessment — the prompt includes methodology, observations, and pre-calculated statistics. The LLM reasons about the assessed price range, confidence, rationale, and flags. It does not perform arithmetic.
- Derive strategy (SYP Lumber) — for formula-based markets, TypeScript applies the formula to the LLM's benchmark assessment rather than asking the LLM to assess each market individually
- Validation — a two-part check between the assess and persist nodes: (a) a deterministic zero-data guard clamps any market with 0 observations to no-data status; (b) a fast LLM review (`assessment-review.yaml`) audits rationale claims against observations to catch phantom data and price disconnects. Derived (formula) markets are excluded from LLM review; corrections are surgical only (number/type swaps and line strikes; rationale structure is never altered); up to two review passes run, with early stop if the first pass is clean
- Result persistence — an `AiAssessmentResult` record with per-market JSON results and overall summary
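The statistics step can be sketched in plain TypeScript (the weighting scheme and field names are assumptions; the point is that aggregates are computed deterministically so the LLM never does arithmetic):

```typescript
// Hypothetical sketch of the deterministic statistics step.
interface Obs { price: number; weight: number; }

function marketStats(obs: Obs[]): { weightedMean: number; median: number; min: number; max: number } | null {
  if (obs.length === 0) return null; // zero-data markets are clamped later by the validation guard
  const totalWeight = obs.reduce((s, o) => s + o.weight, 0);
  const weightedMean = obs.reduce((s, o) => s + o.price * o.weight, 0) / totalWeight;
  const prices = obs.map((o) => o.price).sort((a, b) => a - b);
  const mid = Math.floor(prices.length / 2);
  const median = prices.length % 2 ? prices[mid] : (prices[mid - 1] + prices[mid]) / 2;
  return { weightedMean, median, min: prices[0], max: prices[prices.length - 1] };
}
```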
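The derive strategy is similarly deterministic. As a sketch only (the actual SYP formulas are not shown here; an additive differential is assumed purely for illustration), a formula market's range is computed from the LLM's benchmark assessment:

```typescript
// Hypothetical sketch of a derive strategy for a formula market: TypeScript
// applies the market's formula to the benchmark range the LLM assessed.
interface Assessed { low: number; high: number; }

function deriveFormulaMarket(benchmark: Assessed, differential: number): Assessed {
  return { low: benchmark.low + differential, high: benchmark.high + differential };
}
```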
## Key Service Files
| Path | Purpose |
|---|---|
| `src/services/assessment/` | Assessment service (Gemini, Anthropic, OpenAI implementations) |
| `src/services/assessment/observation-resolver.ts` | Resolves effective observation include/exclude status with per-assessment overrides |
| `src/services/assessment/derive/` | Pluggable derive strategy for formula markets |
| `src/services/extraction/` | Extraction service + storage |
| `src/services/transcription/` | Transcription service (provider factory) |
| `src/services/review/` | Review service |
| `src/services/evaluation/` | Result comparison and evaluators |
| `src/services/pipeline/graphs/` | LangGraph workflow definitions |
| `src/services/market-resolution/` | Market name → database ID resolution |
| `src/services/audio-generation/` | TTS pipeline for test audio |
| `src/lib/prompts.ts` | Prompt loader with hash tracking |
| `src/lib/ai-config.ts` | Model configuration and provider selection |
## LangGraph Orchestration
Pipeline steps are implemented as LangGraph graphs in src/services/pipeline/graphs/:
- `assessment-graph.ts` — the primary assessment workflow graph
- Individual step graphs for transcription, extraction, review
- `full-pipeline-graph.ts` — end-to-end orchestration
Each graph node calls abstracted service classes (not inline AI code), enabling provider swapping and isolated testing. Logfire instrumentation is built into each service for tracing.
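The node-to-service pattern could be sketched as follows (state shape, names, and the injection style are illustrative assumptions, not the real graph definitions):

```typescript
// Hypothetical sketch: a graph node is a thin async function over typed state
// that delegates to an injected service, so providers can be swapped and
// nodes tested in isolation with a stub.
interface AssessmentState {
  magCode: string;
  observations: number[];
  result?: string;
}

interface AssessmentService {
  assess(magCode: string, observations: number[]): Promise<string>;
}

// Node factory: the service is injected, so tests can pass a stub.
function makeAssessNode(service: AssessmentService) {
  return async (state: AssessmentState): Promise<Partial<AssessmentState>> => {
    const result = await service.assess(state.magCode, state.observations);
    return { result }; // LangGraph-style partial state update
  };
}
```

Because the node holds no AI code itself, a unit test can exercise the graph wiring with a stub service and never touch a provider API.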