Neural networks have advanced the state of the art in natural language processing (Wu et al., 2019; Joulin et al., 2017). Their performance has made their use ubiquitous, but their lack of interpretability has been a long-standing issue (Li et al., 2016). Attention-based models offer a way to improve performance without sacrificing interpretability.

Attention mechanisms were first introduced for machine translation (Bahdanau et al., 2014; Luong et al., 2015), and have since been extended to text classification (Yang et al., 2016), natural language inference (Chen et al., 2016) and language modeling (Salton et al., 2017). Self-attention and transformer architectures (Vaswani et al., 2017) are now the state of the art in language understanding (Devlin et al., 2018), extractive summarization (Liu, 2019), semantic role labeling (Strubell et al., 2018) and machine translation for low-resource languages (Rikters, 2018; Rikters et al., 2018).

In this short survey, we first explain attention mechanisms and compare their interpretability in machine translation and text classification. Then, we explore self-attention and highlight the limits of its interpretability. Finally, we discuss alternatives to attention for model interpretability and the challenges raised in the literature, and outline suggestions for future work.
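For concreteness, the sketch below illustrates the scaled dot-product attention of Vaswani et al. (2017) in plain NumPy. The attention weights returned alongside the output are the quantities whose interpretability this survey discusses. The function name and toy dimensions are illustrative only and are not drawn from any of the works cited above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # alignment scores, shape (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention distribution over the keys
    return weights @ V, weights                     # weighted sum of values, plus weights for inspection

# Toy example: 2 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(Q, K, V)
print(attn)  # each row sums to 1; these weights are what attention-based interpretability analyses inspect
```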