same prompt, same tiers (nano → mini → base) — four DIFFERENT uncertainty metrics decide when to escalate.
rows = metric, columns = tier. shared per-tier work (1 main answer + N samples + logprobs) feeds all four scores.
free-form medical Q&A; LLM-as-judge grades each row's committed answer against a reference when available.
Open the live Grafana dashboard \u2192
Cumulative savings, escalation rate, calibration, residency \u2014 updated in real time.
\u2197
Batch run — sweep N random questions through the full grid; reports cumulative cost & tier distribution.
Audit log — GDPR Art. 30 record. Prompt-hash only, no PHI. Refreshes after each run.
| time | prompt sha | router | score | final tier | chain → regions | cost | latency |
|---|