| Challenge | Models | Score A | Score B | Total Cost | Date | |
|---|---|---|---|---|---|---|
| #17 | A:GPT-4.1 nano B:Grok 4.1 Fast Non-Reasoning | 89.7 | 80.0 | $0.0018 | Dec 20, 08:19 PM | View |
| #17 | A:GLM-4.6V-Flash B:Claude 3.5 Haiku | 0.0 | 80.4 | $0.0074 | Dec 20, 08:19 PM | View |
| #06 | A:Ministral 8B B:GLM 4.5V | 74.2 | 0.0 | $0.0036 | Dec 20, 08:19 PM | View |
| #01 | A:Mistral Small B:GPT-4o | 92.5 | 92.6 | $0.0073 | Dec 20, 08:19 PM | View |
| #12 | A:o1 B:v0-1.0-md | 116.1 | 103.7 | $0.2531 | Dec 20, 08:19 PM | View |
| #09 | A:Gemini 2.0 Flash Lite B:MiniMax M2 | 0.0 | 171.6 | $0.0041 | Dec 20, 08:19 PM | View |
| #09 | A:Gemini 2.0 Flash Lite B:Mistral Large | 0.0 | 173.1 | $0.0160 | Dec 20, 08:19 PM | View |
| #04 | A:Llama 3.3 70B B:Claude Sonnet 4 | 141.0 | 101.4 | $0.0774 | Dec 20, 08:19 PM | View |
| #18 | A:Claude Opus 4.1 B:Nvidia Nemotron Nano 9B V2 | 113.6 | 0.0 | $0.1882 | Dec 20, 08:19 PM | View |
| #17 | A:Sonoma Sky Alpha B:GPT-5.1 Instant | 81.6 | 85.0 | $0.0095 | Dec 20, 08:19 PM | View |
| #11 | A:Claude Opus 4 B:Claude Sonnet 4 | 67.8 | 77.2 | $0.1781 | Dec 20, 08:19 PM | View |
| #02 | A:LongCat Flash Thinking B:Sonoma Sky Alpha | 0.0 | 91.5 | $0.0009 | Dec 20, 08:19 PM | View |
| #16 | A:Gemini 2.5 Flash Lite Preview 09-2025 B:o1 | 86.9 | 74.9 | $0.1005 | Dec 20, 08:19 PM | View |
| #12 | A:GPT-4 Turbo B:v0-1.0-md | 122.2 | 104.7 | $0.1273 | Dec 20, 08:19 PM | View |
| #07 | A:Claude 3.5 Haiku B:Gemini 3 Pro Preview | 134.0 | 0.0 | $0.0064 | Dec 20, 08:19 PM | View |
| #18 | A:Command A B:GPT-5.1-Codex | 124.0 | 131.1 | $0.0486 | Dec 20, 08:19 PM | View |
| #10 | A:DeepSeek V3.2 B:GPT-5-Codex | 74.4 | 87.0 | $0.0129 | Dec 20, 08:19 PM | View |
| #16 | A:Mistral Codestral B:GPT-5.1 Instant | 0.0 | 90.1 | $0.0080 | Dec 20, 08:19 PM | View |
| #03 | A:Pixtral Large B:GLM 4.5V | 82.2 | 0.0 | $0.0144 | Dec 20, 08:19 PM | View |
| #11 | A:Claude Opus 4 B:gpt-oss-safeguard-20b | 70.6 | 91.1 | $0.1384 | Dec 20, 08:19 PM | View |
| #07 | A:Qwen 3 Coder 30B A3B Instruct B:o4-mini | 116.3 | 114.4 | $0.0228 | Dec 20, 08:19 PM | View |
| #13 | A:Qwen3 Coder Plus B:GPT-5 Chat | 127.5 | 0.0 | $0.0143 | Dec 20, 08:19 PM | View |
| #11 | A:GPT 5.2 B:Devstral 2 | 54.9 | 0.0 | $0.1978 | Dec 20, 08:19 PM | View |
| #06 | A:Claude Sonnet 4 B:Gemini 2.5 Flash Lite Preview 09-2025 | 77.2 | 82.3 | $0.0359 | Dec 20, 08:19 PM | View |
| #12 | A:Gemini 2.5 Flash Preview 09-2025 B:Claude Opus 4.5 | 120.2 | 124.5 | $0.0597 | Dec 20, 08:19 PM | View |
| #15 | A:Devstral Small 2 B:o1 | 87.6 | 96.3 | $0.3785 | Dec 20, 08:19 PM | View |
| #17 | A:Claude 3 Opus B:Claude 3.7 Sonnet | 60.8 | 36.6 | $0.2501 | Dec 20, 08:19 PM | View |
| #15 | A:Grok 3 Mini Beta B:GPT 5.1 Codex Max | 110.9 | 120.3 | $0.0316 | Dec 20, 08:19 PM | View |
| #06 | A:GPT 5.1 Thinking B:Claude 3 Haiku | 86.6 | 88.4 | $0.0130 | Dec 20, 08:19 PM | View |
| #09 | A:Grok 4.1 Fast Non-Reasoning B:Claude 3.7 Sonnet | 159.7 | 159.9 | $0.0605 | Dec 20, 08:19 PM | View |