AI Pipeline

This page covers the application's AI pipeline — the runtime services that power transcription, extraction, review, and assessment. For the offline evaluation CLI, see E2E Eval Pipeline.

Pipeline Stages

Audio Recording
       │
       ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│Transcription│───▶│ Extraction  │───▶│   Review    │    │ Assessment  │
│ audio→text  │    │ text→prices │    │  QA scoring │    │ obs→prices  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                ▲
                                                                │
                                                        PriceObservations
                                                        (from any source)

Transcription → Extraction → Review form the data capture path (per-call). Assessment is a separate workflow that operates on accumulated PriceObservation records across multiple calls/sources for a MAG.

Provider Matrix

| Stage | Providers | Default | Env Var |
|-------|-----------|---------|---------|
| Transcription | AssemblyAI, Azure Speech, Whisper, ElevenLabs Scribe | AssemblyAI | TRANSCRIPTION_PROVIDER |
| Extraction | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | EXTRACTION_PROVIDER |
| Review | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | REVIEW_PROVIDER |
| Assessment | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | ASSESSMENT_PROVIDER |

All providers implement a common interface per stage (e.g., IExtractionService). Switching providers requires only changing the environment variable; no code changes are needed.
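The per-stage interface plus env-var selection can be sketched as a small factory. The class and function names below are illustrative, not the real ones from src/services/; the app would pass process.env.EXTRACTION_PROVIDER as the argument.

```typescript
// Sketch of the provider-selection pattern: callers only see the stage
// interface; the env var alone picks the concrete implementation.
interface IExtractionService {
  extract(transcript: string): Promise<string>;
}

// Stub implementations standing in for the real Gemini/Claude services.
class GeminiExtractionService implements IExtractionService {
  async extract(transcript: string): Promise<string> {
    return `gemini:${transcript.length} chars`;
  }
}

class ClaudeExtractionService implements IExtractionService {
  async extract(transcript: string): Promise<string> {
    return `claude:${transcript.length} chars`;
  }
}

// In the app this would be called as:
//   createExtractionService(process.env.EXTRACTION_PROVIDER ?? "gemini")
function createExtractionService(provider: string): IExtractionService {
  switch (provider) {
    case "gemini":
      return new GeminiExtractionService();
    case "claude":
      return new ClaudeExtractionService();
    default:
      throw new Error(`Unknown EXTRACTION_PROVIDER: ${provider}`);
  }
}
```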

API Keys

| Provider | Env Var |
|----------|---------|
| Gemini | GEMINI_API_KEY |
| Anthropic (Claude) | ANTHROPIC_API_KEY |
| OpenAI (GPT) | OPENAI_API_KEY |
| AssemblyAI | ASSEMBLYAI_API_KEY |
| Azure Speech | AZURE_SPEECH_API_KEY + AZURE_SPEECH_REGION |
| ElevenLabs | ELEVENLABS_API_KEY |
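Putting the two tables together, a minimal local configuration might look like the fragment below. The provider values ("assemblyai", "gemini") are assumptions about the accepted identifiers; only the variable names come from the tables above.

```shell
# Example .env fragment (illustrative; keys redacted, provider values assumed)
TRANSCRIPTION_PROVIDER=assemblyai
EXTRACTION_PROVIDER=gemini
REVIEW_PROVIDER=gemini
ASSESSMENT_PROVIDER=gemini

GEMINI_API_KEY=your-key-here
ASSEMBLYAI_API_KEY=your-key-here
```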

Prompt Templates

Prompts live in prompts/*.yaml and are loaded at runtime by src/lib/prompts.ts.

| File | Stage | Purpose |
|------|-------|---------|
| extraction.yaml | Extraction | Transcript → structured price data points |
| review.yaml | Review | Extraction quality scoring with confidence |
| review-v2.yaml | Review | Adversarial review prompt (calibration variant) |
| review-judge.yaml | Review | Judge-style review for calibration |
| assessment.yaml | Assessment | Observations + statistics → assessed prices |
| assessment-naive.yaml | Assessment | Simplified assessment (baseline comparison) |
| assessment-review.yaml | Assessment | Post-assessment data-grounding validation (checks rationale claims against observations) |

Each template has:

  • system_prompt — role and instructions for the LLM
  • user_prompt_template — Mustache-style template with {{variable}} placeholders
  • Optional thinking_budget — controls extended thinking (0 = disabled; assessment uses 0)

Prompt hashing (computePromptHash()) tracks which prompt version produced each result.
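One plausible shape for that hash, sketched below: digest the canonical template content so each stored result records the prompt version that produced it. The real computePromptHash() may use a different algorithm, field order, or length; this only illustrates the idea.

```typescript
import { createHash } from "node:crypto";

// Hash the prompt fields that define a "version". Any change to either
// field produces a different hash, so results can be traced to a prompt.
function computePromptHashSketch(
  systemPrompt: string,
  userPromptTemplate: string,
): string {
  return createHash("sha256")
    .update(systemPrompt)
    .update("\n---\n") // separator so field boundaries can't collide
    .update(userPromptTemplate)
    .digest("hex")
    .slice(0, 12); // a short prefix is enough to identify a version
}
```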

Commodity Configuration

configs/commodities.yaml is the single source of truth for commodity-specific data:

commodities:
  cobalt:
    methodology: |
      ## Cobalt Assessment Methodology
      ...data hierarchy, rationale writing guide...
    extraction_context:
      commodity_type: base metal
      common_terms: [alloy grade, standard grade, MB free market]
      price_units: [$ per lb, $ per tonne]
      typical_ranges:
        "$ per lb": { min: 10, max: 25 }
    assessment_guidance: |
      ...cobalt-specific assessment rules...
    patterns:
      mags: [cobalt]
      markets: [cobalt]

Resolution: MAG code / market name → pattern match → commodity config. A default_methodology provides fallback text when no commodity pattern matches.
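The resolution chain above might be sketched as follows. The config shape mirrors configs/commodities.yaml, but the matching rule (case-insensitive substring match on the mags patterns) is an assumption about how pattern matching works.

```typescript
// Illustrative subset of a commodity entry from configs/commodities.yaml.
interface CommodityConfig {
  methodology: string;
  patterns: { mags: string[]; markets: string[] };
}

// MAG code → pattern match → commodity config, with default_methodology
// as the fallback when nothing matches.
function resolveCommodity(
  magCode: string,
  commodities: Record<string, CommodityConfig>,
  defaultMethodology: string,
): { name: string | null; methodology: string } {
  const needle = magCode.toLowerCase();
  for (const [name, cfg] of Object.entries(commodities)) {
    // e.g. pattern "cobalt" matches MAG code "COBALT-LON"
    if (cfg.patterns.mags.some((p) => needle.includes(p.toLowerCase()))) {
      return { name, methodology: cfg.methodology };
    }
  }
  return { name: null, methodology: defaultMethodology };
}
```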

Supported Commodities

| Commodity | MAG Code | Markets | Key Features |
|-----------|----------|---------|--------------|
| Cobalt | COBALT-LON | 2 | Base metal, $/lb ranges, alloy + standard grade |
| SYP Lumber | LBR-SYP | 205 | Derived pricing, benchmark + formula markets |
| SA Vegoils | SA-VEGOILS | 5 | Forward curve (M1–M6 tenors), contract months |

Assessment Pipeline

The assessment workflow is the most complex pipeline. It runs per-MAG, per-assessment-period:

  1. Observation gathering — query PriceObservation records for the MAG's markets within the assessment period; per-assessment exclusion overrides (AssessmentObservationExclusion) are resolved via observation-resolver.ts, taking priority over the global discarded flag
  2. Statistics calculation (deterministic TypeScript, not LLM):
    • Weighted mean, median, range per market
    • Cross-market spreads and directional indicators
    • Related market context (sibling MAG prices, coherence checks)
  3. LLM assessment — prompt includes methodology, observations, and pre-calculated statistics. The LLM reasons about the assessed price range, confidence, rationale, and flags. It does not perform arithmetic.
  4. Derive strategy (SYP Lumber) — for formula-based markets, TypeScript applies the formula to the LLM's benchmark assessment rather than asking the LLM to assess each market individually
  5. Validation — a two-part check between the assess and persist nodes:
    • Deterministic zero-data guard — clamps any market with 0 observations to no-data status
    • Fast LLM review (assessment-review.yaml) — audits rationale claims against observations to catch phantom data and price disconnects; derived (formula) markets are excluded from LLM review. Corrections are surgical only (number/type swaps and line strikes; rationale structure is never altered). Up to two review passes run, with an early stop if the first pass is clean.
  6. Result persistence — an AiAssessmentResult record with per-market JSON results and an overall summary
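The deterministic statistics from step 2 can be sketched as below. The observation shape and weighting scheme are assumptions; the point is that these numbers are computed in TypeScript, not by the LLM.

```typescript
// Per-market statistics computed deterministically before the LLM call.
interface Observation {
  price: number;
  weight: number; // e.g. source reliability or recency weighting (assumed)
}

function marketStats(obs: Observation[]) {
  if (obs.length === 0) return null; // no-data case, handled by validation
  const totalWeight = obs.reduce((s, o) => s + o.weight, 0);
  const weightedMean =
    obs.reduce((s, o) => s + o.price * o.weight, 0) / totalWeight;
  const prices = obs.map((o) => o.price).sort((a, b) => a - b);
  const mid = Math.floor(prices.length / 2);
  const median =
    prices.length % 2 ? prices[mid] : (prices[mid - 1] + prices[mid]) / 2;
  return { weightedMean, median, low: prices[0], high: prices[prices.length - 1] };
}
```

The LLM then receives these pre-computed values in its prompt and reasons over them without performing arithmetic itself.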

Key Service Files

| Path | Purpose |
|------|---------|
| src/services/assessment/ | Assessment service (Gemini, Anthropic, OpenAI implementations) |
| src/services/assessment/observation-resolver.ts | Resolves effective observation include/exclude status with per-assessment overrides |
| src/services/assessment/derive/ | Pluggable derive strategy for formula markets |
| src/services/extraction/ | Extraction service + storage |
| src/services/transcription/ | Transcription service (provider factory) |
| src/services/review/ | Review service |
| src/services/evaluation/ | Result comparison and evaluators |
| src/services/pipeline/graphs/ | LangGraph workflow definitions |
| src/services/market-resolution/ | Market name → database ID resolution |
| src/services/audio-generation/ | TTS pipeline for test audio |
| src/lib/prompts.ts | Prompt loader with hash tracking |
| src/lib/ai-config.ts | Model configuration and provider selection |

LangGraph Orchestration

Pipeline steps are implemented as LangGraph graphs in src/services/pipeline/graphs/:

  • assessment-graph.ts — the primary assessment workflow graph
  • Individual step graphs for transcription, extraction, review
  • full-pipeline-graph.ts — end-to-end orchestration

Each graph node calls abstracted service classes (not inline AI code), enabling provider swapping and isolated testing. Logfire instrumentation is built into each service for tracing.
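The node-calls-service pattern can be sketched without the LangGraph dependency. The names below are illustrative; the real graphs wire nodes through LangGraph, but the testing benefit is the same: a node takes a service interface, so a stub can be injected in place of the LLM-backed implementation.

```typescript
// A stage service behind an interface, as the graphs consume it.
interface IReviewService {
  review(extraction: string): Promise<{ score: number }>;
}

// Graph state flowing between nodes (illustrative shape).
type State = { extraction: string; reviewScore?: number };

// A node is an async state transformer closed over its service —
// no inline AI code, so providers can be swapped or faked.
function makeReviewNode(service: IReviewService) {
  return async (state: State): Promise<State> => {
    const { score } = await service.review(state.extraction);
    return { ...state, reviewScore: score };
  };
}

// In tests, a stub stands in for the LLM-backed review service.
const stubReview: IReviewService = {
  review: async () => ({ score: 0.9 }),
};
```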