med-routing · cascade demo

same prompt, same tiers (nano → mini → base) — four DIFFERENT uncertainty metrics decide when to escalate. rows = metric, columns = tier. shared per-tier work (1 main answer + N samples + logprobs) feeds all four scores. free-form medical Q&A; LLM-as-judge grades each row's committed answer against a reference when available.
load random: tip: ⌘/ctrl+enter to run
Batch run — sweep N random questions through the full grid; reports cumulative cost & tier distribution.

      
Audit log — GDPR Art. 30 record. Prompt-hash only, no PHI. Refreshes after each run.
time prompt sha router score final tier chain → regions cost latency