The original Qwen3.6-35B-A3B benchmark table, extended with scores for Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, gpt-5.4-high, and gemini-3.1-pro-preview. All scores are percentages (for SWE-bench, the share of problems resolved). Empty cells (—) mark scores a vendor has not published or that are not applicable. Cross-vendor numbers are not directly comparable, since each vendor runs its own evaluation harness; see the Methodology Flags at the bottom.
| Benchmark | Qwen3.5-27B | Qwen3.5-35B-A3B | Gemma4-31B | Gemma4-26B-A4B | Qwen3.6-35B-A3B | Opus 4.7 | Opus 4.6 | Sonnet 4.6 | gpt-5.4-high | gemini-3.1-pro-preview |
|---|---|---|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 70.0 | 52.0 | 17.4 | 73.4 | 87.6 | 80.8 | 79.6 | 77.2* | 80.6 |
| SWE-bench Multilingual | 69.3 | 60.3 | 51.7 | 17.3 | 67.2 | — | 77.8 | — | — | — |
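
If you want to consume this table programmatically, here is a minimal Python sketch, assuming the table is kept verbatim as a string. It treats the — placeholder as a missing value and strips footnote markers such as the * on gpt-5.4-high's 77.2 so the number stays usable. `TABLE` and `parse_markdown_table` are illustrative names, not part of any published tooling.

```python
# Minimal sketch: parse the markdown table above into records.
# "—" cells become None (unpublished / not applicable); footnote
# markers such as the "*" on 77.2* are stripped before conversion.
TABLE = """\
| Benchmark | Qwen3.5-27B | Qwen3.5-35B-A3B | Gemma4-31B | Gemma4-26B-A4B | Qwen3.6-35B-A3B | Opus 4.7 | Opus 4.6 | Sonnet 4.6 | gpt-5.4-high | gemini-3.1-pro-preview |
|---|---|---|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 70.0 | 52.0 | 17.4 | 73.4 | 87.6 | 80.8 | 79.6 | 77.2* | 80.6 |
| SWE-bench Multilingual | 69.3 | 60.3 | 51.7 | 17.3 | 67.2 | — | 77.8 | — | — | — |
"""

def parse_markdown_table(md: str) -> list[dict[str, object]]:
    # Split into lines and drop the leading/trailing pipes on each row.
    lines = [ln.strip().strip("|") for ln in md.strip().splitlines()]
    header = [c.strip() for c in lines[0].split("|")]
    rows: list[dict[str, object]] = []
    for raw in lines[2:]:  # lines[1] is the |---| separator row
        cells = [c.strip() for c in raw.split("|")]
        row: dict[str, object] = {header[0]: cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            if cell == "—":
                row[name] = None                     # score not published / not applicable
            else:
                row[name] = float(cell.rstrip("*"))  # drop footnote marker, keep the score
        rows.append(row)
    return rows

if __name__ == "__main__":
    for row in parse_markdown_table(TABLE):
        print(row["Benchmark"], row["Qwen3.6-35B-A3B"], row["Opus 4.7"])
```

Keeping missing cells as `None` rather than 0.0 matters here: averaging or ranking across columns with unpublished scores would silently penalize models that simply were not evaluated, which is exactly the cross-harness comparison the caption warns against.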