Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Voita, Elena; Talbot, David; Moiseev, Fédor; Sennrich, Rico; Titov, Ivan

doi:10.18653/v1/p19-1580

Shared by NobleBlocks on Jan 1, 2019 • 12:00 AM UTC

Abstract

Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. W...
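To make "the contribution made by individual attention heads" concrete, here is a minimal sketch of one encoder self-attention layer. The model dimension is split into `num_heads` heads. Each head runs scaled dot-product attention on its slice. The head outputs are concatenated and projected by `wo`. A head can be switched off by zeroing its output before that projection. This is not the authors' method, and their pruning procedure is not reproduced here. The helper names (`multi_head_self_attention`, `head_ablation_effects`), the toy random weights, and the "mean absolute change in layer output" score are illustrative assumptions. The paper judges heads by their effect on translation quality, not by this proxy.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads, head_mask=None):
    """x: seq_len x d_model; wq/wk/wv/wo: d_model x d_model.

    head_mask (hypothetical ablation hook): one multiplier per head;
    0.0 zeroes that head's output before the output projection.
    """
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    if head_mask is None:
        head_mask = [1.0] * num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    concat = [[0.0] * d_model for _ in x]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [row[lo:hi] for row in q]
        kh = [row[lo:hi] for row in k]
        vh = [row[lo:hi] for row in v]
        # Scaled dot-product scores: (qh @ kh^T) / sqrt(d_head).
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head)
                   for kj in kh] for qi in qh]
        weights = [softmax(r) for r in scores]
        out = matmul(weights, vh)
        # Write this head's (possibly masked) output into its slice.
        for i in range(len(x)):
            for j in range(d_head):
                concat[i][lo + j] = head_mask[h] * out[i][j]
    return matmul(concat, wo)


def head_ablation_effects(x, wq, wk, wv, wo, num_heads):
    """Mean absolute change in layer output when each head is zeroed alone."""
    full = multi_head_self_attention(x, wq, wk, wv, wo, num_heads)
    effects = []
    for h in range(num_heads):
        mask = [1.0] * num_heads
        mask[h] = 0.0
        ablated = multi_head_self_attention(x, wq, wk, wv, wo, num_heads, mask)
        diff = sum(abs(a - b) for ra, rb in zip(full, ablated)
                   for a, b in zip(ra, rb))
        effects.append(diff / (len(full) * len(full[0])))
    return effects


if __name__ == "__main__":
    random.seed(0)

    def rand_matrix(n, m):
        return [[random.gauss(0.0, 0.3) for _ in range(m)] for _ in range(n)]

    x = rand_matrix(5, 8)  # 5 tokens, d_model = 8
    wq, wk, wv, wo = (rand_matrix(8, 8) for _ in range(4))
    print(head_ablation_effects(x, wq, wk, wv, wo, num_heads=4))
```

The output projection is linear, so the full layer output equals the sum of the outputs obtained with one head active at a time. This is why zeroing a single head isolates that head's direct contribution to the layer.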

Subject

Computer science

Machine translation

Encoder
