2026/05/21/alibabas-proprietary-qwen3-7-max-can-run-for-35

Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

May 21, 2026, 11:53 PM·VentureBeat

EDITOR BRIEF

Alibaba’s Qwen Team released Qwen3.7-Max, a proprietary AI model that it says can sustain about 35 hours of continuous autonomous execution for complex agentic tasks. Unlike earlier Qwen releases, the model is not open source and is available through Alibaba-controlled access points, positioning it closer to paid frontier offerings from OpenAI and Google.

CONTEXT

The release shows how leading labs are shifting from chatbots toward long-running agents that can plan, execute, and recover over extended workflows. Alibaba’s move away from open source may improve monetization, but China-based endpoints could limit adoption among Western enterprises with strict compliance, security, or data residency requirements.

ARTICLE

The AI industry has fully entered the "agent era," a paradigm where AI models do far more than generate text — they now actively plan, execute, and course-correct complex tasks over days rather than seconds. Thus, it's perhaps unsurprising to see Chinese e-commerce giant Alibaba's famed Qwen Team of AI researchers release a model capable of performing autonomous agentic AI work over multiple days: that model has arrived in the form of Qwen3.7-Max which the company reports in a blog post achieved "~35 hours of continuous autonomous execution" — albeit, in a proprietary, not open source format, as prior Qwen Team releases were.This is also to be expected — it's what many analysts and industry experts feared in the wake of the departure of several key Qwen Team leaders earlier this year. But it makes sense for Alibaba financially, at least in the short term: training AI models, especially ones as powerful as Qwen3.7-Max, is expensive, and giving them away essentially for free, as open source models are, does not immediately help recoup any costs. In that sense, Alibaba is simply aligning its efforts with American AI giants like OpenAI and Google by offering the latest and greatest models only through paid APIs and subscription or paid web plan bundles, and slightly less performant ones through open source. Still, the arrival of Qwen3.7-Max offers further optionality to enterprises and individual users, and more competition for American AI labs — rarely a bad thing for consumers at all budget levels. Yet, the fact that the model is only accessible from Chinese-based endpoints means it may be limited in its appeal to American and European enterprises seeking to maximize compliance and security posturing when fulfilling government contracts, or even just attempting to comply with all relevant state, local, and national data sovereignty regulations. The marathon AI eraTo understand why Qwen3.7-Max is a departure from previous models, one must look at how it was trained and how it operates in practice. Language models typically degrade when forced to maintain a single train of thought over thousands of conversational turns; they forget instructions, hallucinate variables, or simply get stuck in logical loops. Qwen3.7-Max was specifically designed as a "versatile agent foundation" capable of "long-horizon reasoning" to overcome this exact bottleneck.The starkest demonstration of this capability is an autonomous engineering task detailed by the Qwen team. The model was given access to an isolated server equipped with a T-Head ZW-M890 PPU—a hardware architecture the model had never encountered during its training. Its task was to optimize an attention kernel. Over the course of 35 straight hours, Qwen3.7-Max operated entirely autonomously. It executed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved the code to achieve a 10.0x geometric mean speedup. By comparison, Chinese competitor models like z.ai's GLM-5.1 and Moonshot's Kimi K2.6 capped out at 7.3x and 5.0x speedups respectively, often voluntarily terminating their sessions when they failed to make progress. However, both are available open source. This endurance is achieved through what Alibaba calls "environment scaling". Just as early LLMs grew smarter by ingesting more diverse text, Qwen3.7-Max was trained across a vast, scaled array of dynamic agentic environments. It is capable of simulating a one-year lifecycle of a startup in the "YC-Bench" evaluation, navigating hundreds of decision-making rounds encompassing personnel management and contract screening. In this simulation, the model managed to generate $2.08 million in virtual revenue, nearly doubling the performance of the prior generation, Qwen3.6-Plus. Furthermore, the model has built-in reward-hacking self-monitoring, autonomously detecting when it attempts to cheat a training environment and adding heuristic rules to correct its own behavior.A brain for any scaffoldFrom a product perspective, Qwen3.7-Max is designed to be the cognitive engine for modern software development and enterprise automation. The model offers a massive 1-million-token context window and a 64K maximum output limit, providing immense overhead for processing sprawling codebases or lengthy technical documents.One of its most compelling features is "cross-harness generalization". Rather than being hardcoded to work best within a specific proprietary interface, Qwen3.7-Max is built to act as a drop-in intelligence layer for diverse agent frameworks. It supports the Anthropic API protocol natively, allowing developers to plug it directly into existing tools like Claude Code or OpenClaw.The benchmark data provided by Alibaba indicates that this generalized approach has paid massive dividends. On the Apex Math Reasoning benchmark, Qwen3.7-Max scored 44.5, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3

COMMENTS

Discussion

> geekhaus:~$ next read?

Why prompt debt, retrieval debt, and evaluation debt are quietly reshaping enterprise AI risk

VentureBeat

Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

EDITOR BRIEF

CONTEXT

ARTICLE

COMMENTS

Discussion

Why prompt debt, retrieval debt, and evaluation debt are quietly reshaping enterprise AI risk

AI agents are quietly generating chaos engineering failures enterprises don’t track yet

Valid certificates, stolen accounts: how attackers broke npm's last trust signal

EDITOR BRIEF

CONTEXT

ARTICLE

COMMENTS

Discussion

Next read recommendations

Why prompt debt, retrieval debt, and evaluation debt are quietly reshaping enterprise AI risk

AI agents are quietly generating chaos engineering failures enterprises don’t track yet

Valid certificates, stolen accounts: how attackers broke npm's last trust signal