Pinterest cut AI costs 90% by gutting a frontier model's vision layer

May 29, 2026, 4:24 PM · VentureBeat

Editor's summary

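Pinterest concluded that, at 620 million monthly active users, calling a frontier model for every image recommendation was not sustainable in terms of cost or latency. CTO Matt Madrigal's team says it removed the vision encoder from Qwen3-VL and rebuilt it around the company's own multimodal embeddings and image metadata, lowering costs by 90% and raising accuracy by 30%.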

Context

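This case shows that a large-scale consumer service can improve cost and performance at the same time by redesigning a model around its own data and domain embeddings rather than using a general-purpose large model as-is. In particular, fine-tuning Apache-licensed open-source models on internal data and replacing some of their layers is likely to spread as a practical AI architecture for reducing inference latency and cloud costs.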

Article

At 620 million monthly users, calling a frontier model for every image recommendation isn't a strategy; it's a bill. Pinterest CTO Matt Madrigal solved it by gutting Qwen3-VL's vision layer and rebuilding it with proprietary embeddings, cutting costs 90% and boosting accuracy 30%.

Madrigal’s team has been investing heavily in customizing open-source models “foundationally in-house.”

“If you've got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size,” Madrigal explained in a recent VB Beyond the Pilot podcast.

How Pinterest customized Qwen for visual discovery

Pinterest, which has around 620 million monthly active users, has long applied open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP. The company fine-tuned its own Pin CLIP on the latter, incorporating proprietary visual embeddings and image metadata.

Pinterest’s conversational shopping assistant, Navigator 1, was built on Qwen3-VL and customized in “pretty significant” ways. Madrigal’s team essentially “ripped out” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings. This has allowed them to capture metadata around pins and images that can then be precomputed offline and regularly retrained on new information to deliver personalized experiences.

“Open-source models, especially with open Apache licenses where you can truly tweak a lot of open weights and customize for unique use cases: that's where we've found open source to be so powerful for us,” Madrigal said.

Bringing their own embeddings allows his team to gain context around metadata, pins, and images; notably, the model also performs better at runtime and inference. Without these embeddings, devs would have to call the encoder on each returned image at runtime, one at a time. That results in latency that is “20 times worse” from an inference perspective, Madrigal said.
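The difference between encoding images during a request and looking up embeddings that were computed offline can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `ToyEncoder`, the pin IDs, and the byte-statistics "embedding" are hypothetical stand-ins, not Pinterest's pipeline or Qwen3-VL's encoder. It only counts encoder calls and does not attempt to reproduce the latency figure quoted above.

```python
from typing import Dict, List

Vector = List[float]


class ToyEncoder:
    """Hypothetical stand-in for an expensive vision encoder; counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, image_bytes: bytes) -> Vector:
        self.calls += 1
        # Toy "embedding": two cheap statistics of the bytes, not a real model.
        n = max(len(image_bytes), 1)
        return [sum(image_bytes) / n, float(len(image_bytes))]


def precompute_embeddings(encoder: ToyEncoder, catalog: Dict[str, bytes]) -> Dict[str, Vector]:
    """Offline job: encode every pin once and store the result by pin ID."""
    return {pin_id: encoder.encode(image) for pin_id, image in catalog.items()}


def embed_at_request_time(encoder: ToyEncoder, catalog: Dict[str, bytes], pin_ids: List[str]) -> List[Vector]:
    """Baseline: encode each returned image during the request, one at a time."""
    return [encoder.encode(catalog[pin_id]) for pin_id in pin_ids]


def lookup_embeddings(store: Dict[str, Vector], pin_ids: List[str]) -> List[Vector]:
    """Precomputed path: one dictionary lookup per pin, no encoder call."""
    return [store[pin_id] for pin_id in pin_ids]


# Hypothetical catalog and three requests that each return two pins.
catalog = {"pin-1": b"\x01\x02\x03", "pin-2": b"\x10\x20", "pin-3": b"\xff"}
requests = [["pin-1", "pin-2"], ["pin-2", "pin-3"], ["pin-1", "pin-3"]]

# Baseline: the encoder runs inside every request.
runtime_encoder = ToyEncoder()
for pin_ids in requests:
    embed_at_request_time(runtime_encoder, catalog, pin_ids)

# Precomputed: the encoder runs once per pin offline, never while serving.
offline_encoder = ToyEncoder()
store = precompute_embeddings(offline_encoder, catalog)
calls_before_serving = offline_encoder.calls
for pin_ids in requests:
    lookup_embeddings(store, pin_ids)
calls_while_serving = offline_encoder.calls - calls_before_serving

print("encoder calls while serving (runtime encoding):", runtime_encoder.calls)
print("encoder calls while serving (precomputed store):", calls_while_serving)
```

In this toy setup the encoder cost in the baseline grows with traffic, while in the precomputed path it grows only with catalog size, and the store can be rebuilt whenever the embeddings are retrained.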
“If it's something that's going to be critical for our end users, that's going to drive engagement, that will have to scale to over 600 million monthly active users, we're going to either probably build it or we're going to leverage open source and customize the heck out of it,” he said.

How a taste graph captures evolving interests

To guide users from inspiration to purchase, Madrigal's team built a "taste graph": a dynamic representation of what individual users actually like, not just what they click on.

“It's this representation of billions of people's evolving tastes,” he said.

People go to Google or other search engines when they have a clear picture of what they want; Pinterest is for when they’re still in the discovery phase, Madrigal said. Pinterest’s goal is to encourage “lateral exploration” and turn discovery into intent (that is, clicking through ads or making purchases).

Under the hood, the architecture combines a graph structure with representation learning. User embeddings capture a user’s evolving tastes and are constantly updated based on activity, new content, and other signals.

“It's not a social graph,” Madrigal said. “It's much more of a preference graph: What's going to inspire you? What are you trying to do next?”

For instance, one user may be into mid-century modern designs; another may prefer a Nantucket aesthetic. Those preferences are captured in user embeddings, and the taste graph surfaces specific, relevant products as a result.

“You go from the upper funnel, inspiration discovery, all the way through lower funnel intent,” Madrigal said.

Listen to the full podcast to hear more about how Pinterest uses sandboxes to encourage creativity in a way that is secure and contained; why a continuous feedback loop can prevent visual AI slop; and the importance of constant benchmarking to gauge user engagement, performance, latency, and other factors.
You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
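
As a rough illustration of the taste-graph idea described earlier (user embeddings that shift with activity and are then used to surface relevant products), here is a minimal sketch. It covers only the embedding side, not the graph structure, and every detail is an assumption: the exponential-moving-average update, the cosine-similarity ranking, the two-dimensional vectors, and the product names are all hypothetical choices made for the example. The article does not say how Pinterest actually updates or queries its user embeddings.

```python
import math
from typing import Dict, List

Vector = List[float]


def update_user_embedding(user_vec: Vector, item_vec: Vector, rate: float = 0.5) -> Vector:
    """Move the user vector toward an item the user engaged with.

    Exponential moving average; an assumed update rule, not Pinterest's.
    """
    return [(1 - rate) * u + rate * i for u, i in zip(user_vec, item_vec)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_products(user_vec: Vector, products: Dict[str, Vector]) -> List[str]:
    """Order product names from most to least similar to the user vector."""
    return sorted(products, key=lambda name: cosine(user_vec, products[name]), reverse=True)


# Hypothetical 2-D item embeddings:
# axis 0 ~ "mid-century modern", axis 1 ~ "Nantucket aesthetic".
products = {
    "walnut sideboard": [1.0, 0.0],
    "rope-handled basket": [0.0, 1.0],
    "linen armchair": [0.6, 0.6],
}

# A new user engages with two mostly mid-century pins.
user = [0.0, 0.0]
for engaged_pin in ([1.0, 0.0], [0.9, 0.1]):
    user = update_user_embedding(user, engaged_pin)

ranking = rank_products(user, products)
print(ranking)
```

With these toy numbers the user vector drifts toward the mid-century axis, so the mid-century item ranks first and the Nantucket-style item last; further engagement with different pins would shift the ranking over time.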
