Perplexity AI unveils hybrid local-cloud inference system at Computex 2026
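Editor's summary
At the Intel keynote at Computex 2026, Perplexity AI demonstrated a hybrid local-cloud inference system that splits AI work in real time between the user's device and frontier models in the cloud. CEO Aravind Srinivas explained that local models running on Intel Core Ultra Series 3 keep sensitive information on the device and automatically decide to send only the tasks that need stronger reasoning to cloud models. The product has not launched yet; the company said it will make the feature available within a few weeks.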
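Context
The core of this announcement is not on-device AI execution itself, but inference orchestration that automatically chooses between local and cloud on a per-task basis. Competition in PC AI to optimize privacy, cost, latency, and model performance at the same time is intensifying, and partnerships between chipmakers such as Intel and AI service companies are expected to become more important.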
Body
Perplexity AI, the fast-growing search startup now valued at $20 billion, unveiled what it calls the first hybrid local-server inference orchestrator at Computex 2026 on Monday night, demonstrating software that autonomously decides, in real time and mid-task, which AI workloads stay on a user's device and which get routed to frontier models in the cloud.

CEO Aravind Srinivas demonstrated the system onstage alongside Intel CEO Lip-Bu Tan during Intel's keynote address, using Perplexity's "Personal Computer" agent to process confidential deal materials. In the demonstration, local models running on Intel Core Ultra Series 3 determined which information should remain on the device and which information could be sent to cloud-based models. Srinivas said the approach balances intelligence, accuracy, privacy, and cost.

The key claim is not that a model can run locally; dozens of tools already do that. It is that Perplexity's system makes the routing decision itself, task by task, without requiring the user to choose in advance. Sensitive data like financial records or health information stays on the local machine; the heavier reasoning tasks that require frontier-scale models get sent to the cloud. One task, multiple execution locations, automatic orchestration.

"No product has done this before," a Perplexity spokesperson said in an email to VentureBeat. The product is not yet available to users; according to the company, the hybrid inference feature will launch in the coming weeks.

Perplexity's road from cloud-only agents to on-device AI orchestration

To understand why the Computex demonstration matters, it helps to trace the product arc Perplexity has been building since early this year.

On February 25, Perplexity launched Computer, a multi-model AI agent that orchestrates 19 different AI models to complete complex, long-running tasks on behalf of users. The system ran entirely in the cloud, breaking goals into subtasks and routing each to whichever model (Claude, Gemini, GPT, Grok, or others) was best suited for the job. Perplexity Computer unified every current AI capability into a single system, functioning as a general-purpose digital worker that operates the same interfaces a user does.

Then, in March, Perplexity introduced Personal Computer at its inaugural Ask 2026 developer conference. That product launched as a new Mac app with support for a hybrid local-cloud AI agent, which Perplexity described as a "personal orchestrator" that hybridizes local and server environments for security and productivity. Personal Computer could access the Mac's file system and native Mac apps to create and execute entire workflows, with files created in a secure sandbox and all actions auditable and reversible.

What Srinivas demonstrated at Computex extends this architecture in a fundamental way. Previously, even the Personal Computer product divided labor along relatively clear lines: local file access on the device, heavy computation on Perplexity's servers.

The new hybrid inference orchestrator gives the system itself the ability to reason about where each piece of a task should execute: not just which model to use, but which physical location should process it. The system reportedly asks for user permission before sending sensitive tasks to the cloud, a design choice that addresses one of the central anxieties enterprises have about agentic AI: data governance.

Why Nvidia's RTX Spark and Intel's new silicon make the timing strategic

The timing of the demonstration is not coincidental. Computex 2026 has been dominated by a single theme: on-device AI. Just hours before the Intel keynote, Nvidia CEO Jensen Huang unveiled the RTX Spark, a new Arm-based superchip that the company positions as the foundation for a new generation of AI-native Windows PCs.

At full strength, the RTX Spark Superchip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth, enough power and memory for AI agents and 120-billion-parameter models with context lengths stretching to a million tokens. RTX Spark systems will begin arriving in the fall.

Intel, not to be outdone, used its keynote to showcase Xeon 6+ processors with 288 efficiency cores built on 18A technology for the data center, and positioned its Core Ultra Series 3 as the client silicon that makes hybrid inference possible on the PC.

Perplexity's hybrid orchestrator sits at the intersection of both strategies. If the system performs as advertised, it creates a direct economic incentive for users, and eventually enterprises, to invest in more powerful local silicon. The more capable the on-device chip, the more inference can run locally, reducing cloud costs and improving latency for sensitive workloads. That dynamic benefits Nvidia, Intel, and every other chipmaker competing for AI PC sockets.

The implications extend well beyond chip economics. "As chips become more powerfu