AI Pipeline

This page covers the application's AI pipeline — the runtime services that power transcription, extraction, review, and assessment. For the offline evaluation CLI, see E2E Eval Pipeline.

Pipeline Stages

Audio Recording
       │
       ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│Transcription│───▶│ Extraction  │───▶│   Review    │    │ Assessment  │
│ audio→text  │    │ text→prices │    │  QA scoring │    │ obs→prices  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                ▲
                                                                │
                                                        PriceObservations
                                                        (from any source)

Transcription → Extraction → Review form the data capture path (per-call). Assessment is a separate workflow that operates on accumulated PriceObservation records across multiple calls/sources for a MAG.

Provider Matrix

| Stage | Providers | Default | Env Var |
|-------|-----------|---------|---------|
| Transcription | AssemblyAI, Azure Speech, Whisper, ElevenLabs Scribe | AssemblyAI | TRANSCRIPTION_PROVIDER |
| Extraction | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | EXTRACTION_PROVIDER |
| Review | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | REVIEW_PROVIDER |
| Assessment | Gemini Flash, Claude Haiku, GPT-5.2 | Gemini | ASSESSMENT_PROVIDER |

All providers implement a common interface per stage (e.g., IExtractionService). Switching providers requires only changing the environment variable; no code changes are needed.
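The per-stage interface plus env-var selection can be sketched as a small factory. The class and function names below are illustrative, not the real ones from src/services/; the app would pass process.env.EXTRACTION_PROVIDER as the argument.

```typescript
// Sketch of the provider-selection pattern: callers only see the stage
// interface; the env var alone picks the concrete implementation.
interface IExtractionService {
  extract(transcript: string): Promise<string>;
}

// Stub implementations standing in for the real Gemini/Claude services.
class GeminiExtractionService implements IExtractionService {
  async extract(transcript: string): Promise<string> {
    return `gemini:${transcript.length} chars`;
  }
}

class ClaudeExtractionService implements IExtractionService {
  async extract(transcript: string): Promise<string> {
    return `claude:${transcript.length} chars`;
  }
}

// In the app this would be called as:
//   createExtractionService(process.env.EXTRACTION_PROVIDER ?? "gemini")
function createExtractionService(provider: string): IExtractionService {
  switch (provider) {
    case "gemini":
      return new GeminiExtractionService();
    case "claude":
      return new ClaudeExtractionService();
    default:
      throw new Error(`Unknown EXTRACTION_PROVIDER: ${provider}`);
  }
}
```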

API Keys

| Provider | Env Var |
|----------|---------|
| Gemini | GEMINI_API_KEY |
| Anthropic (Claude) | ANTHROPIC_API_KEY |
| OpenAI (GPT) | OPENAI_API_KEY |
| AssemblyAI | ASSEMBLYAI_API_KEY |
| Azure Speech | AZURE_SPEECH_API_KEY + AZURE_SPEECH_REGION |
| ElevenLabs | ELEVENLABS_API_KEY |
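Putting the two tables together, a minimal local configuration might look like the fragment below. The provider values ("assemblyai", "gemini") are assumptions about the accepted identifiers; only the variable names come from the tables above.

```shell
# Example .env fragment (illustrative; keys redacted, provider values assumed)
TRANSCRIPTION_PROVIDER=assemblyai
EXTRACTION_PROVIDER=gemini
REVIEW_PROVIDER=gemini
ASSESSMENT_PROVIDER=gemini

GEMINI_API_KEY=your-key-here
ASSEMBLYAI_API_KEY=your-key-here
```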

Prompt Templates

Prompts live in prompts/*.yaml and are loaded at runtime by src/lib/prompts.ts.

| File | Stage | Purpose |
|------|-------|---------|
| extraction.yaml | Extraction | Transcript → structured price data points |
| review.yaml | Review | Extraction quality scoring with confidence |
| review-v2.yaml | Review | Adversarial review prompt (calibration variant) |
| review-judge.yaml | Review | Judge-style review for calibration |
| assessment.yaml | Assessment | Observations + statistics → assessed prices |
| assessment-naive.yaml | Assessment | Simplified assessment (baseline comparison) |
| assessment-review.yaml | Assessment | Post-assessment data-grounding validation (checks rationale claims against observations) |

Each template has:

  • system_prompt — role and instructions for the LLM
  • user_prompt_template — Mustache-style template with {{variable}} placeholders
  • Optional thinking_budget — controls extended thinking (0 = disabled; assessment uses 0)

Prompt hashing (computePromptHash()) tracks which prompt version produced each result.
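One plausible shape for that hash, sketched below: digest the canonical template content so each stored result records the prompt version that produced it. The real computePromptHash() may use a different algorithm, field order, or length; this only illustrates the idea.

```typescript
import { createHash } from "node:crypto";

// Hash the prompt fields that define a "version". Any change to either
// field produces a different hash, so results can be traced to a prompt.
function computePromptHashSketch(
  systemPrompt: string,
  userPromptTemplate: string,
): string {
  return createHash("sha256")
    .update(systemPrompt)
    .update("\n---\n") // separator so field boundaries can't collide
    .update(userPromptTemplate)
    .digest("hex")
    .slice(0, 12); // a short prefix is enough to identify a version
}
```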

Commodity Configuration

configs/commodities.yaml is the single source of truth for commodity-specific data:

commodities:
  cobalt:
    methodology: |
      ## Cobalt Assessment Methodology
      ...data hierarchy, rationale writing guide...
    extraction_context:
      commodity_type: base metal
      common_terms: [alloy grade, standard grade, MB free market]
      price_units: [$ per lb, $ per tonne]
      typical_ranges:
        "$ per lb": { min: 10, max: 25 }
    assessment_guidance: |
      ...cobalt-specific assessment rules...
    patterns:
      mags: [cobalt]
      markets: [cobalt]

Resolution: MAG code / market name → pattern match → commodity config. A default_methodology provides fallback text when no commodity pattern matches.
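The resolution chain above might be sketched as follows. The config shape mirrors configs/commodities.yaml, but the matching rule (case-insensitive substring match on the mags patterns) is an assumption about how pattern matching works.

```typescript
// Illustrative subset of a commodity entry from configs/commodities.yaml.
interface CommodityConfig {
  methodology: string;
  patterns: { mags: string[]; markets: string[] };
}

// MAG code → pattern match → commodity config, with default_methodology
// as the fallback when nothing matches.
function resolveCommodity(
  magCode: string,
  commodities: Record<string, CommodityConfig>,
  defaultMethodology: string,
): { name: string | null; methodology: string } {
  const needle = magCode.toLowerCase();
  for (const [name, cfg] of Object.entries(commodities)) {
    // e.g. pattern "cobalt" matches MAG code "COBALT-LON"
    if (cfg.patterns.mags.some((p) => needle.includes(p.toLowerCase()))) {
      return { name, methodology: cfg.methodology };
    }
  }
  return { name: null, methodology: defaultMethodology };
}
```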

Supported Commodities

| Commodity | MAG Code | Markets | Key Features |
|-----------|----------|---------|--------------|
| Cobalt | COBALT-LON | 2 | Base metal, $/lb ranges, alloy + standard grade |
| SYP Lumber | LBR-SYP | 205 | Derived pricing, benchmark + formula markets |
| SA Vegoils | SA-VEGOILS | 5 | Forward curve (M1–M6 tenors), contract months |

Assessment Pipeline

The assessment workflow is the most complex pipeline. It runs per-MAG, per-assessment-period:

  1. Observation gathering — query PriceObservation records for the MAG's markets within the assessment period; per-assessment exclusion overrides (AssessmentObservationExclusion) are resolved via observation-resolver.ts, taking priority over the global discarded flag
  2. Statistics calculation (deterministic TypeScript, not LLM):
    • Weighted mean, median, range per market
    • Cross-market spreads and directional indicators
    • Related market context (sibling MAG prices, coherence checks)
  3. LLM assessment — prompt includes methodology, observations, and pre-calculated statistics. The LLM reasons about the assessed price range, confidence, rationale, and flags. It does not perform arithmetic.
  4. Derive strategy (SYP Lumber) — for formula-based markets, TypeScript applies the formula to the LLM's benchmark assessment rather than asking the LLM to assess each market individually
  5. Validation — a two-part check between the assess and persist nodes:
    • Deterministic zero-data guard — clamps any market with 0 observations to no-data status
    • Fast LLM review (assessment-review.yaml) — audits rationale claims against observations to catch phantom data and price disconnects; derived (formula) markets are excluded from LLM review. Corrections are surgical only (number/type swaps and line strikes; rationale structure is never altered). Up to two review passes run, with an early stop if the first pass is clean.
  6. Result persistence — an AiAssessmentResult record with per-market JSON results and an overall summary
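The deterministic statistics from step 2 can be sketched as below. The observation shape and weighting scheme are assumptions; the point is that these numbers are computed in TypeScript, not by the LLM.

```typescript
// Per-market statistics computed deterministically before the LLM call.
interface Observation {
  price: number;
  weight: number; // e.g. source reliability or recency weighting (assumed)
}

function marketStats(obs: Observation[]) {
  if (obs.length === 0) return null; // no-data case, handled by validation
  const totalWeight = obs.reduce((s, o) => s + o.weight, 0);
  const weightedMean =
    obs.reduce((s, o) => s + o.price * o.weight, 0) / totalWeight;
  const prices = obs.map((o) => o.price).sort((a, b) => a - b);
  const mid = Math.floor(prices.length / 2);
  const median =
    prices.length % 2 ? prices[mid] : (prices[mid - 1] + prices[mid]) / 2;
  return { weightedMean, median, low: prices[0], high: prices[prices.length - 1] };
}
```

The LLM then receives these pre-computed values in its prompt and reasons over them without performing arithmetic itself.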

Key Service Files

| Path | Purpose |
|------|---------|
| src/services/assessment/ | Assessment service (Gemini, Anthropic, OpenAI implementations) |
| src/services/assessment/observation-resolver.ts | Resolves effective observation include/exclude status with per-assessment overrides |
| src/services/assessment/derive/ | Pluggable derive strategy for formula markets |
| src/services/extraction/ | Extraction service + storage |
| src/services/transcription/ | Transcription service (provider factory) |
| src/services/review/ | Review service |
| src/services/evaluation/ | Result comparison and evaluators |
| src/services/pipeline/graphs/ | LangGraph workflow definitions |
| src/services/market-resolution/ | Market name → database ID resolution |
| src/services/audio-generation/ | TTS pipeline for test audio |
| src/lib/prompts.ts | Prompt loader with hash tracking |
| src/lib/ai-config.ts | Model configuration and provider selection |

LangGraph Orchestration

Pipeline steps are implemented as LangGraph graphs in src/services/pipeline/graphs/:

  • assessment-graph.ts — the primary assessment workflow graph
  • Individual step graphs for transcription, extraction, review
  • full-pipeline-graph.ts — end-to-end orchestration

Each graph node calls abstracted service classes (not inline AI code), enabling provider swapping and isolated testing. Logfire instrumentation is built into each service for tracing.
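The node-calls-service pattern can be sketched without the LangGraph dependency. The names below are illustrative; the real graphs wire nodes through LangGraph, but the testing benefit is the same: a node takes a service interface, so a stub can be injected in place of the LLM-backed implementation.

```typescript
// A stage service behind an interface, as the graphs consume it.
interface IReviewService {
  review(extraction: string): Promise<{ score: number }>;
}

// Graph state flowing between nodes (illustrative shape).
type State = { extraction: string; reviewScore?: number };

// A node is an async state transformer closed over its service —
// no inline AI code, so providers can be swapped or faked.
function makeReviewNode(service: IReviewService) {
  return async (state: State): Promise<State> => {
    const { score } = await service.review(state.extraction);
    return { ...state, reviewScore: score };
  };
}

// In tests, a stub stands in for the LLM-backed review service.
const stubReview: IReviewService = {
  review: async () => ({ score: 0.9 }),
};
```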