| Model | Average CR⬆️ | AGIEval Mean (Min, Max) | AGIEval CR | MMLU-Pro Mean (Min, Max) | MMLU-Pro CR | Math Mean (Min, Max) | Math CR | #Params (B) |
|---|---|---|---|---|---|---|---|---|
| meta-llama/Llama-3.1-70B-Instruct | 72.39 | 72.43 (65.34, 74.66) | 81.79 | 66.63 (55.16, 70.68) | 73.19 | 65.88 (64.58, 67.86) | 62.18 | 0 |
| mistralai/Mistral-Large-Instruct-2407 | 71.93 | 68.78 (61.41, 74.49) | 75.77 | 65.1 (50.28, 69.23) | 72.31 | 71.04 (69.66, 72.72) | 67.71 | 0 |
| meta-llama/Meta-Llama-3-70B-Instruct | 69.11 | 69.71 (60.77, 71.2) | 83.13 | 58.75 (49.3, 63.16) | 75.24 | 51.29 (49.66, 54.2) | 48.96 | 0 |
| 01-ai/Yi-1.5-34B-Chat | 58.43 | 63.89 (50.85, 70.98) | 69.95 | 49.91 (36.47, 55.76) | 57.31 | 53.46 (51.7, 54.42) | 48.04 | 0 |
| meta-llama/Llama-3.1-8B-Instruct | 52.74 | 54.59 (44.62, 59.66) | 62.54 | 45.3 (32.34, 51.94) | 52.79 | 49.21 (46.88, 51.18) | 42.9 | 0 |
| mistralai/Mistral-Nemo-Instruct-2407 | 49.46 | 51.57 (38.46, 63.8) | 58.7 | 40.63 (31.49, 47.65) | 51.43 | 42.91 (40.72, 45.22) | 38.26 | 0 |
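
The Average CR column appears to be the unweighted mean of the three per-benchmark CR columns (AGIEval CR, MMLU-Pro CR, Math CR). That reading is inferred from the numbers in the table, not from anything the leaderboard states here, so treat it as an assumption. A minimal Python sketch that checks it against the rows above; the helper names `average_cr` and `check_rows` are made up for illustration:

```python
# Values copied from the table above:
# model -> (reported Average CR, (AGIEval CR, MMLU-Pro CR, Math CR))
rows = {
    "meta-llama/Llama-3.1-70B-Instruct": (72.39, (81.79, 73.19, 62.18)),
    "mistralai/Mistral-Large-Instruct-2407": (71.93, (75.77, 72.31, 67.71)),
    "meta-llama/Meta-Llama-3-70B-Instruct": (69.11, (83.13, 75.24, 48.96)),
    "01-ai/Yi-1.5-34B-Chat": (58.43, (69.95, 57.31, 48.04)),
    "meta-llama/Llama-3.1-8B-Instruct": (52.74, (62.54, 52.79, 42.9)),
    "mistralai/Mistral-Nemo-Instruct-2407": (49.46, (58.7, 51.43, 38.26)),
}


def average_cr(crs):
    """Unweighted mean of the per-benchmark CR values (assumed definition)."""
    return sum(crs) / len(crs)


def check_rows(rows, tol=0.006):
    """Return models whose reported Average CR is not within tol of the mean.

    tol allows for the table's rounding to two decimal places.
    """
    return [
        model
        for model, (reported, crs) in rows.items()
        if abs(average_cr(crs) - reported) > tol
    ]


mismatches = check_rows(rows)
print(mismatches)
```

For the six rows above the list comes back empty, so every reported Average CR matches the mean of its three CR columns to within rounding.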
Created March 23, 2025 16:45
https://huggingface.co/spaces/nvidia/llm-robustness-leaderboard parsed to md table