GEEK HAUS
2026/06/04

Study finds transformers can share key-value projections to cut KV cache memory with limited language model quality loss

arxiv.org

EDITOR BRIEF

An arXiv paper systematically tests transformer attention variants that share or collapse the query, key, and value projections, evaluating them on synthetic, vision, and language tasks. The strongest variant, shared key-value projections, performs close to standard QKV attention and reduces KV cache size by 50%, with a 3.1% increase in language-modeling perplexity.
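A minimal sketch of where the 50% saving could come from. It assumes that "shared key-value projections" means a single learned matrix (called `w_kv` here, a hypothetical name) whose output serves as both the key and the value. It covers one decoding-time attention head in plain Python and omits the output projection, multiple heads, and batching. It is an illustration, not the paper's implementation. Each new token appends one vector to the cache rather than a separate key and value.

```python
import math
import random


def random_matrix(rng, rows, cols):
    """Return a rows x cols matrix with entries drawn from N(0, 1/cols)."""
    std = 1 / math.sqrt(cols)
    return [[rng.gauss(0, std) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedKVHead:
    """One decoding-time attention head with a single key/value projection.

    Assumption: keys and values are the same vector, x @ W_kv, so the cache
    holds one vector per past token instead of a key plus a value.
    """

    def __init__(self, d_model, d_head, seed=0):
        rng = random.Random(seed)
        self.w_q = random_matrix(rng, d_head, d_model)
        self.w_kv = random_matrix(rng, d_head, d_model)  # replaces W_K and W_V
        self.scale = 1 / math.sqrt(d_head)
        self.cache = []  # one d_head-sized vector per past token

    def step(self, x):
        """Attend from the newest token x to itself and all cached tokens."""
        q = matvec(self.w_q, x)
        self.cache.append(matvec(self.w_kv, x))
        # Cached vectors play the key role when scoring...
        scores = [self.scale * sum(qi * ki for qi, ki in zip(q, k)) for k in self.cache]
        weights = softmax(scores)
        # ...and the value role when mixing.
        d_head = len(q)
        return [sum(w * v[j] for w, v in zip(weights, self.cache)) for j in range(d_head)]


if __name__ == "__main__":
    head = SharedKVHead(d_model=8, d_head=4)
    rng = random.Random(1)
    for _ in range(5):
        head.step([rng.gauss(0, 1) for _ in range(8)])
    # Standard QKV attention would hold 5 keys plus 5 values at this point.
    print(len(head.cache))  # 5
```

The design choice being illustrated is that the cached tensor does double duty: it is dotted with the query to produce attention scores and then averaged with those scores to produce the output, so no second per-token tensor is stored.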

CONTEXT

The findings suggest that part of the standard transformer attention design may be overparameterized, which matters most for inference-heavy deployments. Because projection sharing can be combined with grouped-query attention (GQA) and multi-query attention (MQA), it could become a practical route to on-device inference, sharply reducing memory use without redesigning the whole model.
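To see why the savings stack, here is a back-of-the-envelope sketch. The model dimensions (32 layers, 32 query heads, 128-dimensional heads, a 4,096-token context, 2-byte fp16 elements) are hypothetical and not taken from the paper. The sketch also assumes that shared key-value projections cache one tensor per KV head instead of two, consistent with the reported 50% reduction. GQA and MQA shrink the number of KV heads, and sharing then halves whatever remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheConfig:
    """Hypothetical decoder dimensions used only to illustrate the arithmetic."""
    n_layers: int
    n_kv_heads: int            # = query heads for MHA, fewer for GQA, 1 for MQA
    head_dim: int
    seq_len: int
    bytes_per_elem: int = 2    # fp16 / bf16
    shared_kv: bool = False    # True: cache one tensor per KV head, not K and V


def kv_cache_bytes(cfg: KVCacheConfig) -> int:
    """Bytes needed to cache attention state for a single sequence."""
    tensors_per_head = 1 if cfg.shared_kv else 2
    return (cfg.n_layers * cfg.seq_len * cfg.n_kv_heads * cfg.head_dim
            * tensors_per_head * cfg.bytes_per_elem)


if __name__ == "__main__":
    dims = dict(n_layers=32, head_dim=128, seq_len=4096)
    variants = {
        "MHA (32 KV heads)": 32,
        "GQA (8 KV heads)": 8,
        "MQA (1 KV head)": 1,
    }
    for name, kv_heads in variants.items():
        for shared in (False, True):
            cfg = KVCacheConfig(n_kv_heads=kv_heads, shared_kv=shared, **dims)
            label = name + (" + shared KV" if shared else "")
            print(f"{label:<32} {kv_cache_bytes(cfg) / 2**20:6.0f} MiB")
```

With these made-up dimensions the sketch reports 2048 MiB for MHA and 1024 MiB with sharing, 512 and 256 MiB for GQA, and 64 and 32 MiB for MQA. The reductions multiply because head reduction and projection sharing act on different factors of the same product. Real deployments add allocator and framework overheads that this arithmetic ignores.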

ARTICLE

Do transformers need three projections? Systematic study of QKV variants
