
Study tests whether Transformers need three QKV projections, finds that sharing keys and values cuts the KV cache by up to 50%

June 4, 2026, 11:11 PM · arxiv.org

Editor's summary

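This arXiv paper systematically evaluates QKV variants that share some of the query, key, and value projections in Transformer attention. In experiments on synthetic tasks, vision, and language modeling, some of the sharing schemes matched or outperformed standard QKV. Notably, the Q-K=V variant (a separate query projection, with keys and values sharing one) cut the KV cache by 50% in language modeling at the cost of a 3.1% perplexity degradation.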
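To make the Q-K=V idea concrete, here is a minimal single-head sketch in plain Python, assuming queries keep their own projection while keys and values come from one shared projection. The class `SharedKVAttention`, its method `cached_floats`, and the weights `W_q` and `W_kv` are hypothetical names for illustration, not the paper's code; multiple heads and the output projection are omitted. Because a single cached vector per token serves as both key and value, the cache holds half as many numbers as standard attention, which caches a separate key and value.

```python
import math


def matvec(W, x):
    # W is a list of rows (d_out x d_in); x is a vector of length d_in.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedKVAttention:
    """Hypothetical single-head Q-K=V attention for token-by-token decoding."""

    def __init__(self, W_q, W_kv):
        self.W_q = W_q    # query projection, kept separate
        self.W_kv = W_kv  # one projection shared by keys and values (assumption)
        self.cache = []   # one cached vector per past token, instead of a K and a V

    def step(self, x):
        q = matvec(self.W_q, x)
        kv = matvec(self.W_kv, x)
        self.cache.append(kv)
        # Scaled dot-product scores of the query against every cached key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in self.cache]
        weights = softmax(scores)
        # The same cached vectors are reused as values.
        return [sum(w * v[j] for w, v in zip(weights, self.cache))
                for j in range(len(kv))]

    def cached_floats(self):
        # Numbers held in the cache; separate K and V caches would hold twice as many.
        return sum(len(v) for v in self.cache)


attn = SharedKVAttention([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
for token in ([1.0, 1.0], [0.0, 1.0], [1.0, 0.0]):
    attn.step(token)
print(attn.cached_floats())  # 6 numbers cached; separate K and V caches would hold 12
```

After `T` tokens of width `d`, this cache holds `T * d` numbers versus `2 * T * d` for separate key and value caches, which is where the 50% figure comes from. Whether model quality holds up under the tying (the reported 3.1% perplexity cost) is the empirical question the paper addresses, not something this sketch shows.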

Context

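These results suggest that Transformer inference costs can be reduced not only through quantization or pruning but also through weight tying inside attention. Combined with GQA/MQA, Q-K=V raises the cache reduction to 87.5 to 96.9%, which could offer a practical way to relieve memory bottlenecks in edge-device and on-device AI deployment.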
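The combined savings can be reproduced with simple arithmetic. Relative to standard multi-head attention (MHA), GQA/MQA reduce the number of key/value heads (MQA keeps a single one), and sharing keys and values halves what each remaining head caches, so the saved fraction is `1 - n_kv / (2 * n_q)`. This sketch assumes the two effects multiply this way and that head dimension and layer count cancel out. The function name `kv_cache_reduction` and the 16-query-head setup are hypothetical, chosen only because KV-head ratios of 4 and 16 reproduce the quoted 87.5% and 96.9% endpoints; the paper's actual configurations may differ.

```python
def kv_cache_reduction(n_q_heads: int, n_kv_heads: int, shared_kv: bool) -> float:
    """Fraction of KV cache saved relative to standard multi-head attention.

    MHA caches a key and a value for every query head. GQA/MQA reduce the
    number of KV heads; sharing K and V (Q-K=V) halves what each KV head stores.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    baseline = 2 * n_q_heads                      # K and V per query head
    cached = (1 if shared_kv else 2) * n_kv_heads
    return 1 - cached / baseline


# Hypothetical configuration: 16 query heads (not taken from the paper).
print(kv_cache_reduction(16, 16, shared_kv=True))  # Q-K=V alone: 0.5
print(kv_cache_reduction(16, 4, shared_kv=True))   # plus 4-group GQA: 0.875
print(kv_cache_reduction(16, 1, shared_kv=True))   # plus MQA: 0.96875
```

Under this accounting the reduction depends only on the ratio of query heads to KV heads, so any model with the same ratio would see the same percentage.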

Article

Do transformers need three projections? Systematic study of QKV variants
