GEEK HAUS
Back to feed
2026/06/02/alibabas-qwen3-7-plus-supports-text-video-and

Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of $0.4/$1.6 per 1M token — but it's proprietary

·VentureBeat
read original

EDITOR BRIEF

Alibaba released Qwen3.7-Plus, a multimodal LLM that can process text, images, video, and screenshots at $0.40 per 1M input tokens and $1.60 per 1M output tokens. The model is 60% cheaper than Qwen3.7-Max and adds capabilities beyond text, but it is available only through proprietary APIs and Qwen Chat rather than as open source.

CONTEXT

The release highlights Alibaba’s push to compete on both performance and price in enterprise AI, especially for multimodal workloads. But the closed licensing marks a strategic break from Qwen’s open model reputation, potentially frustrating developers and companies that built around open source Qwen releases.

ARTICLE

Alibaba this week released Qwen3.7-Plus, the latest AI large language model (LLM) in its globally beloved and increasingly expansive Qwen family, boasting more multimodal capabilities and a 60% lower cost than the prior, text-only Qwen3.7-Max model released just weeks ago. However, like its immediate predecessor Qwen3.7-Plus is available only under a "closed" commercial license via proprietary application programming interfaces (API) and Qwen Chat. That marks a big departure from the Qwen strategy to date, which was focused mainly on releasing powerful,near state-of-the-art open source models. Those enterprises and users who relied on the open source Qwen models — among them, U.S. giants such as Airbnb — will no doubt be disappointed to see that Alibaba is going closed for its newer releases.Still, the model is worth a look because of its low cost and high performance on multimodal tasks like creating enterprise-grade visuals or analyzing video, imagery and screenshots, which Qwen3.7-Max cannot do (it's text-only). It is among the cheaper powerful AI models available now, coming in price-wise just above Chinese rival's new MiniMax-M3's limited-time discount pricing. VentureBeat Frontier AI Model API Pricing SnapshotModelInputOutputTotal CostSourceMiMo-V2.5 Flash$0.10$0.30$0.40Xiaomi MiModeepseek-v4-flash$0.14$0.28$0.42DeepSeekdeepseek-v4-pro$0.435$0.87$1.305DeepSeekMiniMax-M3$0.30$1.20$1.50MiniMaxQwen3.7-Plus$0.40$1.60$2.00Alibaba CloudGemini 3.1 Flash-Lite$0.25$1.50$1.75GoogleMiMo-V2.5$0.40$2.00$2.40Xiaomi MiMoGrok 4.3 low context$1.25$2.50$3.75xAIGLM-5$1.00$3.20$4.20Z.aiKimi-K2.6$0.95$4.00$4.95Moonshot/KimiGLM-5.1$1.40$4.40$5.80Z.aiGrok 4.3 high context$2.50$5.00$7.50xAIQwen3.7-Max$2.50$7.50$10.00Alibaba CloudGemini 3.5 Flash$1.50$9.00$10.50GoogleGemini 3.1 Pro Preview ≤200K$2.00$12.00$14.00GoogleGPT-5.4$2.50$15.00$17.50OpenAIGemini 3.1 Pro Preview >200K$4.00$18.00$22.00GoogleClaude Opus 4.8$5.00$25.00$30.00AnthropicGPT-5.5$5.00$30.00$35.00OpenAIMaintaining continuity during complex tool execution loops For technical decision-makers deploying autonomous agents, the primary bottleneck has rarely been initial model intelligence. Instead, it is state decay—the tendency of an agent framework to lose its analytical trajectory over multi-step, long-horizon tasks. Qwen3.7-Plus addresses this architectural vulnerability through a combined approach to context management and reasoning state preservation. The model ships with a 1-million token context window and allocates up to 256K tokens specifically for internal chain-of-thought processing. To contextualize this capacity, imagine an automated cloud migration agent: it can ingest an entire codebase, map out the dependencies, and spend thousands of tokens quietly evaluating edge cases before executing a single line of bash script.Crucially, the API exposes a parameter called 'preserve_thinking.' Across Alibaba's ecosystem, the capability serves as a standardized architectural bridge rather than a tiered perk. Alibaba introduced the feature during the prior Qwen 3.6 generation, integrating it into both the open-weight Qwen3.6-27B and the proprietary Max models. At its core, the parameter operates at the API and template level to retain internal blocks across continuous conversational turns.This structural continuity solves a critical bottleneck for developers engineering long-horizon tasks. By keeping these internal logic loops intact, the feature prevents the model from dropping its context or needlessly recomputing its cached history midway through an operation. When a model executes complex, multi-step agentic coding assignments, this retention allows the system to hold onto its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.Alibaba remains far from alone in recognizing this technical necessity, as the underlying concept now dictates the architecture of nearly all major artificial intelligence laboratories. Anthropic deploys this exact capability under the moniker "Extended Thinking" for its advanced models, including its latest Claude Opus 4.8. This framework requires developers to feed unmodified thinking blocks directly back into the API on subsequent turns to maintain an unbroken chain of reasoning. OpenAI tackles the same challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the OpenAI ecosystem, developers must return specific reasoning items generated alongside previous function calls, ensuring the model explicitly remembers the rationale behind its tool executions. Ultimately, preserve_thinking simply represents Alibaba's terminology for what has rapidly become the undisputed table stakes for modern multi-turn reasoning.Benchmarks show a competitive, yet sub state-of-the-art modelOn raw capability metrics, this deep-thinking architecture translates to structural gains across multimodal and agentic benchmarks. However, it

COMMENTS

Discussion

> geekhaus:~$ next read?

Next read recommendations