Alibaba's Qwen3.7-Max Surpasses Claude Opus 4.6 in Global Coding Rankings

IT Home reported on May 26, 2026, that Code Arena, an authoritative third-party programming leaderboard, updated its rankings on May 25. Alibaba's flagship model, Qwen3.7-Max, scored 1541, placing Alibaba second among large-model vendors worldwide, behind only the Claude series.

According to the latest leaderboard data, the specific model version qwen3.7-max-20260517 ranked fourth overall, behind claude-opus-4-7-thinking, claude-opus-4-7, and claude-opus-4-6-thinking. Notably, Qwen3.7-Max now scores higher on coding than claude-opus-4-6, as well as other prominent Chinese models including glm-5.1 and kimi-k2.6.

Code Arena is recognized as one of the most authoritative evaluation leaderboards for code-generating Large Language Models (LLMs) globally. Unlike academic multiple-choice tests, it evaluates models on their practical ability to generate, debug, and refactor complex code. To ensure fairness and prevent "gaming" the system, the platform relies on blind testing by random users, so models cannot anticipate test questions in advance and must rely on genuine coding competence.

Beyond coding, Qwen3.7-Max also placed 10th in the recently released Design Arena rankings. Design Arena, along with its image counterpart Image Arena and LMArena, uses real-user blind testing and is often referred to as the "Olympics of the AI Industry," representing one of the highest standards of recognition in the current AI landscape.
