Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of $0.4/$1.6 per 1M token — but it's proprietary
편집자 요약
Alibaba가 텍스트뿐 아니라 이미지, 비디오, 스크린샷 입력을 처리하는 Qwen3.7-Plus를 공개했습니다. 가격은 100만 토큰당 입력 $0.40, 출력 $1.60으로, 최근 출시된 텍스트 전용 Qwen3.7-Max보다 약 60% 낮습니다. 다만 모델은 Qwen Chat과 독점 API를 통한 폐쇄형 상용 라이선스로만 제공돼, 기존 오픈소스 Qwen 생태계와는 다른 방향을 보입니다.
맥락
Alibaba의 이번 행보는 frontier급 모델에서 개방형 배포보다 API 기반 수익화와 사용 통제가 우선순위로 부상하고 있음을 보여줍니다. 동시에 중국 AI 업체들이 저가 고성능 모델 경쟁을 강화하면서, 기업 고객은 비용 절감과 벤더 종속성 사이의 균형을 더 신중히 따져야 하는 국면에 들어섰습니다.
본문
Alibaba this week released Qwen3.7-Plus, the latest AI large language model (LLM) in its globally beloved and increasingly expansive Qwen family, boasting more multimodal capabilities and a 60% lower cost than the prior, text-only Qwen3.7-Max model released just weeks ago. However, like its immediate predecessor Qwen3.7-Plus is available only under a "closed" commercial license via proprietary application programming interfaces (API) and Qwen Chat. That marks a big departure from the Qwen strategy to date, which was focused mainly on releasing powerful,near state-of-the-art open source models. Those enterprises and users who relied on the open source Qwen models — among them, U.S. giants such as Airbnb — will no doubt be disappointed to see that Alibaba is going closed for its newer releases.Still, the model is worth a look because of its low cost and high performance on multimodal tasks like creating enterprise-grade visuals or analyzing video, imagery and screenshots, which Qwen3.7-Max cannot do (it's text-only). It is among the cheaper powerful AI models available now, coming in price-wise just above Chinese rival's new MiniMax-M3's limited-time discount pricing. VentureBeat Frontier AI Model API Pricing SnapshotModelInputOutputTotal CostSourceMiMo-V2.5 Flash$0.10$0.30$0.40Xiaomi MiModeepseek-v4-flash$0.14$0.28$0.42DeepSeekdeepseek-v4-pro$0.435$0.87$1.305DeepSeekMiniMax-M3$0.30$1.20$1.50MiniMaxQwen3.7-Plus$0.40$1.60$2.00Alibaba CloudGemini 3.1 Flash-Lite$0.25$1.50$1.75GoogleMiMo-V2.5$0.40$2.00$2.40Xiaomi MiMoGrok 4.3 low context$1.25$2.50$3.75xAIGLM-5$1.00$3.20$4.20Z.aiKimi-K2.6$0.95$4.00$4.95Moonshot/KimiGLM-5.1$1.40$4.40$5.80Z.aiGrok 4.3 high context$2.50$5.00$7.50xAIQwen3.7-Max$2.50$7.50$10.00Alibaba CloudGemini 3.5 Flash$1.50$9.00$10.50GoogleGemini 3.1 Pro Preview ≤200K$2.00$12.00$14.00GoogleGPT-5.4$2.50$15.00$17.50OpenAIGemini 3.1 Pro Preview >200K$4.00$18.00$22.00GoogleClaude Opus 4.8$5.00$25.00$30.00AnthropicGPT-5.5$5.00$30.00$35.00OpenAIMaintaining continuity during complex tool execution loops For technical decision-makers deploying autonomous agents, the primary bottleneck has rarely been initial model intelligence. Instead, it is state decay—the tendency of an agent framework to lose its analytical trajectory over multi-step, long-horizon tasks. Qwen3.7-Plus addresses this architectural vulnerability through a combined approach to context management and reasoning state preservation. The model ships with a 1-million token context window and allocates up to 256K tokens specifically for internal chain-of-thought processing. To contextualize this capacity, imagine an automated cloud migration agent: it can ingest an entire codebase, map out the dependencies, and spend thousands of tokens quietly evaluating edge cases before executing a single line of bash script.Crucially, the API exposes a parameter called 'preserve_thinking.' Across Alibaba's ecosystem, the capability serves as a standardized architectural bridge rather than a tiered perk. Alibaba introduced the feature during the prior Qwen 3.6 generation, integrating it into both the open-weight Qwen3.6-27B and the proprietary Max models. At its core, the parameter operates at the API and template level to retain internal blocks across continuous conversational turns.This structural continuity solves a critical bottleneck for developers engineering long-horizon tasks. By keeping these internal logic loops intact, the feature prevents the model from dropping its context or needlessly recomputing its cached history midway through an operation. When a model executes complex, multi-step agentic coding assignments, this retention allows the system to hold onto its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.Alibaba remains far from alone in recognizing this technical necessity, as the underlying concept now dictates the architecture of nearly all major artificial intelligence laboratories. Anthropic deploys this exact capability under the moniker "Extended Thinking" for its advanced models, including its latest Claude Opus 4.8. This framework requires developers to feed unmodified thinking blocks directly back into the API on subsequent turns to maintain an unbroken chain of reasoning. OpenAI tackles the same challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the OpenAI ecosystem, developers must return specific reasoning items generated alongside previous function calls, ensuring the model explicitly remembers the rationale behind its tool executions. Ultimately, preserve_thinking simply represents Alibaba's terminology for what has rapidly become the undisputed table stakes for modern multi-turn reasoning.Benchmarks show a competitive, yet sub state-of-the-art modelOn raw capability metrics, this deep-thinking architecture translates to structural gains across multimodal and agentic benchmarks. However, it
댓글
토론
다음 읽을거리 추천

Perplexity AI unveils hybrid local-cloud inference system at Computex 2026

MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost
