👷 Readings in Digital Typography, Scientific Visualization, Information Retrieval and Machine Learning

[Michal Štefánik] Attention semantics: What attention heads actually know and why should we care (29. 10. 2020)


Presentation slides (Google Docs, animated)

Attention semantics
Presentation slides (without animations) for the 2020-10-29 talk by Michal Štefánik

Attention semantics: What attention heads actually know and why should we care
Video recording for the 2020-10-29 talk by Michal Štefánik

Abstract

Transformer models can give the impression of a magical black box, as huge neural models quite often do. One might think that their architecture must be the result of systematic, incremental development that has converged on an optimal package, one that it is not a good idea to tamper with.

It can then be all the more surprising that their main inner components, the attention heads, can perform distinct NLP tasks well on their own, or that removing some of the heads of a pre-trained model can even improve accuracy on end tasks.
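To make the head-removal idea concrete, here is a minimal sketch (my illustration, not part of the talk materials) assuming the Hugging Face transformers library and an arbitrary, hypothetical choice of heads; in practice the heads to remove would be selected by an importance measure such as the gradient-based scores of "Are Sixteen Heads Really Better than One?" listed in the Literature below.

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    # Hypothetical selection: remove heads 2 and 5 in layer 0 and head 7 in layer 3.
    # prune_heads() drops the corresponding slices of the query/key/value and output
    # projections, so the pruned model is smaller while keeping the same hidden size.
    model.prune_heads({0: [2, 5], 3: [7]})

    inputs = tokenizer("Attention heads can be surprisingly redundant.",
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    print(outputs.last_hidden_state.shape)  # still (1, sequence_length, 768)

The finding surveyed in the talk is that, for many end tasks, such pruning barely hurts, and occasionally even helps, downstream accuracy.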

In this presentation, we'll briefly survey the research that aims to understand the functionality of individual heads, and how successful it has been. Along the way, we'll highlight several more or less intuitive, yet interesting observations about Transformers' inner parts. Finally, we'll outline some of the consequences that interpreting and/or sensibly managing particular attention heads might have.

The related research will be supplemented with the author's own ongoing experiments, which aim to utilize attention in a more accurate and interpretable Information Retrieval system.

Literature

  1. Are Sixteen Heads Really Better than One? https://arxiv.org/pdf/1905.10650.pdf
  2. Looking for Grammar in all the Right Places https://aletheap.github.io/posts/2020/07/looking-for-grammar/
  3. Head Pruning in Transformer Models: https://towardsdatascience.com/head-pruning-in-transformer-models-ec222ca9ece7
  4. Big Bird: Transformers for Longer Sequences: https://arxiv.org/pdf/2007.14062v1.pdf