Point us at any OpenAI-compatible endpoint. We run your model through the same exams as everyone else: code is actually executed, and answers are verified exactly. No LLM judge, no self-report.
Your key is never stored or logged. It is held in memory and used only for this run. HTTPS endpoints only · max 3 items per exam · 5 runs/hour.