Geek Haus
Sequoia Capital

Standard Intelligence bets raw computer-use video, not language models, is the path to general AI agents

Summary

Standard Intelligence is training agents directly in pixel space, using video of computer activity to predict mouse movements, clicks, and keystrokes. The company argues that large-scale video pre-training can unlock more general computer agents than language-model wrappers or hand-built workflows.

Insight

The approach reflects a broader shift toward scaling raw action data rather than relying solely on text and tool orchestration. If successful, Standard Intelligence could reshape agent development by making computer-use datasets and efficient video encoders central infrastructure for AI systems.
