
Study finds transformers can share key-value projections to cut KV cache memory with limited language model quality loss

Jun 4, 2026, 11:11 PM·arxiv.org

EDITOR BRIEF

An arXiv paper systematically tests transformer attention variants that share or collapse the query, key, and value projections, evaluating them on synthetic, vision, and language tasks. The strongest variant, which uses a single shared projection for keys and values, performs close to standard QKV attention and halves the KV cache size at the cost of a 3.1% increase in perplexity on language modeling.
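To make the idea concrete, here is a minimal sketch of single-head attention in which keys and values come from one projection, so only one tensor per token needs to be cached. The brief does not give the paper's exact formulation, so the function name `shared_kv_attention`, the weight names `w_q` and `w_kv`, and the omission of causal masking and multiple heads are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def shared_kv_attention(x, w_q, w_kv):
    """Single-head attention with one projection used as both keys and values.

    Assumed form (not taken from the paper): K = V = x @ w_kv.
    Returns the attention output and the single tensor that would be cached.
    """
    q = matmul(x, w_q)
    kv = matmul(x, w_kv)                      # serves as keys AND values
    d = len(kv[0])
    kv_t = [list(col) for col in zip(*kv)]    # transpose for q @ kv^T
    scores = matmul(q, kv_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, kv), kv


# Hypothetical toy sizes: 4 tokens, model width 8.
random.seed(0)
n_tokens, d_model = 4, 8
x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]
w_q = [[random.gauss(0, 0.3) for _ in range(d_model)] for _ in range(d_model)]
w_kv = [[random.gauss(0, 0.3) for _ in range(d_model)] for _ in range(d_model)]

out, cache = shared_kv_attention(x, w_q, w_kv)
```

A standard layer would cache two tensors of this shape per token (keys and values); here `cache` is the only one, which is where the 50% figure in the brief would come from under this reading.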

CONTEXT

The findings suggest that part of the standard transformer attention design may be overparameterized, which matters most for inference-heavy deployments. Because projection sharing stacks with GQA and MQA (grouped-query and multi-query attention), it could make on-device inference more practical by cutting memory use without a redesign of the whole model.
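A rough back-of-the-envelope calculation shows how these savings might combine. The model dimensions below are hypothetical, not from the paper, and the sketch assumes that the savings from shared key-value projections and from fewer KV heads simply multiply, which the brief implies by saying the techniques stack but does not spell out.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len,
                   bytes_per_value=2, shared_kv=False):
    """Estimate KV cache size for one sequence.

    Standard attention caches two tensors per layer (keys and values);
    with a shared key-value projection (assumed), only one is cached.
    """
    tensors_per_layer = 1 if shared_kv else 2
    return tensors_per_layer * layers * kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model: 32 layers, 32 heads of width 128, 4096-token context, fp16.
base = dict(layers=32, head_dim=128, seq_len=4096, bytes_per_value=2)

mha = kv_cache_bytes(kv_heads=32, **base)                         # standard multi-head
mha_shared = kv_cache_bytes(kv_heads=32, shared_kv=True, **base)  # shared KV only
gqa = kv_cache_bytes(kv_heads=8, **base)                          # GQA with 8 KV heads
gqa_shared = kv_cache_bytes(kv_heads=8, shared_kv=True, **base)   # GQA plus shared KV

for name, size in [("MHA", mha), ("MHA + shared KV", mha_shared),
                   ("GQA", gqa), ("GQA + shared KV", gqa_shared)]:
    print(f"{name:16s} {size / GIB:.2f} GiB")
# MHA              2.00 GiB
# MHA + shared KV  1.00 GiB
# GQA              0.50 GiB
# GQA + shared KV  0.25 GiB
```

Under these assumptions, sharing halves the cache regardless of how many KV heads remain, so it compounds with whatever GQA or MQA has already saved.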

ARTICLE

Do transformers need three projections? Systematic study of QKV variants
