Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. W...