| Challenge | Model A | Model B | Score A | Score B | Total Cost | Date |
|---|---|---|---|---|---|---|
| #11 | Claude Sonnet 4.5 | o3-mini | 77.2 | 83.4 | $0.0373 | Dec 20, 09:01 PM |
| #18 | Claude Haiku 4.5 | o3 Pro | 126.0 | 0.0 | $0.1323 | Dec 20, 09:01 PM |
| #08 | GPT-4.1 | Kimi K2 Thinking Turbo | 91.0 | 78.3 | $0.0274 | Dec 20, 09:01 PM |
| #02 | Sonoma Sky Alpha | Ministral 8B | 89.2 | 90.7 | $0.0013 | Dec 20, 09:01 PM |
| #05 | Grok 4 | GPT 5.1 Thinking | 63.5 | 87.0 | $0.0393 | Dec 20, 09:01 PM |
| #07 | LongCat Flash Thinking | Claude 3 Opus | 0.0 | 116.1 | $0.1296 | Dec 20, 09:01 PM |
| #02 | GPT-5 Chat | gpt-oss-120b | 0.0 | 91.8 | $0.0065 | Dec 20, 09:01 PM |
| #03 | Mistral Nemo | Grok 3 Beta | 0.0 | 85.8 | $0.0161 | Dec 20, 09:01 PM |
| #13 | GLM-4.6V | Kimi K2 Thinking Turbo | 0.0 | 128.6 | $0.0219 | Dec 20, 09:01 PM |
| #18 | Mistral Nemo | Grok 3 Fast Beta | 0.0 | 120.8 | $0.0544 | Dec 20, 09:01 PM |
| #04 | Mercury Coder Small Beta | v0-1.5-md | 141.6 | 92.3 | $0.0949 | Dec 20, 09:01 PM |
| #04 | Kimi K2 | Mistral Small | 98.4 | 137.3 | $0.0098 | Dec 20, 09:01 PM |
| #08 | Codex Mini | Gemini 2.0 Flash Lite | 75.4 | 0.0 | $0.0170 | Dec 20, 09:01 PM |
| #10 | Claude 3.7 Sonnet | Pixtral Large | 82.1 | 76.1 | $0.0386 | Dec 20, 09:01 PM |
| #01 | Claude Opus 4.5 | Pixtral Large | 86.9 | 87.9 | $0.0349 | Dec 20, 09:01 PM |
| #17 | MiniMax M2 | Mercury Coder Small Beta | 76.8 | 90.1 | $0.0029 | Dec 20, 09:01 PM |
| #18 | Gemini 2.5 Flash Lite | gpt-oss-120b | 130.0 | 136.5 | $0.0025 | Dec 20, 09:01 PM |
| #01 | Qwen 3 Coder 30B A3B Instruct | v0-1.0-md | 89.0 | 80.5 | $0.0563 | Dec 20, 09:01 PM |
| #14 | Sonoma Sky Alpha | GPT-4.1 mini | 85.4 | 83.0 | $0.0031 | Dec 20, 09:01 PM |
| #02 | Claude 3.5 Sonnet (2024-06-20) | Grok 4.1 Fast Non-Reasoning | 85.1 | 83.5 | $0.0253 | Dec 20, 09:01 PM |
| #10 | Claude 3.5 Haiku | GPT-4 Turbo | 85.1 | 64.8 | $0.0850 | Dec 20, 09:01 PM |
| #09 | gpt-oss-safeguard-20b | Mistral Large | 186.3 | 170.2 | $0.0177 | Dec 20, 09:01 PM |
| #11 | Grok 4 Fast Non-Reasoning | GPT 5.1 Codex Max | 82.9 | 0.0 | $0.0119 | Dec 20, 09:01 PM |
| #18 | DeepSeek V3.2 Thinking | GPT-4.1 nano | 0.0 | 121.3 | $0.0008 | Dec 20, 09:01 PM |
| #10 | GPT 5.2 | Mistral Nemo | 51.5 | 0.0 | $0.2376 | Dec 20, 09:01 PM |
| #18 | Grok Code Fast 1 | o4-mini | 134.3 | 130.9 | $0.0140 | Dec 20, 09:01 PM |
| #14 | Gemini 2.5 Pro | Mistral Small | 79.8 | 90.2 | $0.0093 | Dec 20, 09:01 PM |
| #11 | Codex Mini | Claude Opus 4 | 33.7 | 67.9 | $0.1923 | Dec 20, 09:01 PM |
| #12 | GLM 4.5 | GPT-5 pro | 108.0 | 0.0 | $0.0071 | Dec 20, 09:01 PM |
| #10 | Pixtral Large | Grok 4 | 79.7 | 65.5 | $0.0361 | Dec 20, 09:01 PM |