Representations and Sequence Models
This database covers the pre-Transformer lineage: distributed word representations, recurrent sequence modeling, encoder-decoder translation, attention, and contextual language representations.
Word and Contextual Representations
| Year | Paper | Topic | Note |
|---|---|---|---|
| 2013 | Efficient Estimation of Word Representations in Vector Space | Word2Vec | Skip-gram and CBOW: dense word vectors learned by predicting words from their local context windows. |
| 2014 | GloVe: Global Vectors for Word Representation | GloVe | Word vectors fit to global word-word co-occurrence counts with a weighted least-squares objective. |
| 2018 | Deep Contextualized Word Representations | ELMo | Context-dependent word vectors computed from the internal layers of a pretrained deep bidirectional LSTM language model. |
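The table contrasts local prediction (Word2Vec) with global co-occurrence statistics (GloVe). A minimal sketch of one loss term from each makes the difference concrete, assuming plain Python lists as vectors; all function names are hypothetical. The skip-gram term uses negative sampling, which comes from the follow-up Word2Vec paper ("Distributed Representations of Words and Phrases and their Compositionality") rather than the hierarchical softmax used in the paper listed. The GloVe weighting uses the paper's reported defaults, x_max = 100 and alpha = 3/4. No training loop is shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors (plain lists)."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one observed (center, context) pair.

    center:    input vector of the center word
    context:   output vector of the observed neighboring word
    negatives: output vectors of k words drawn from a noise distribution
    (hypothetical helper; a sketch, not the reference implementation)
    """
    # Reward a high score for the observed pair ...
    loss = -math.log(sigmoid(dot(context, center)))
    # ... and a low score for each sampled noise word.
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(X_ij): down-weights rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_term(w_i, w_j, b_i, b_j, x_ij):
    """One weighted least-squares term of the GloVe objective (requires x_ij > 0).

    w_j and b_j play the role of GloVe's separate context vector and context bias.
    """
    residual = dot(w_i, w_j) + b_i + b_j - math.log(x_ij)
    return glove_weight(x_ij) * residual ** 2
```

Summing `sgns_loss` over (center, context) pairs taken from a sliding window, or summing `glove_term` over the nonzero entries of a co-occurrence matrix, gives the respective training objectives. The first only ever sees one local window at a time; the second is fit directly to corpus-wide counts.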
Recurrent Sequence Modeling and Attention
| Year | Paper | Topic | Note |
|---|---|---|---|
| 1997 | Long Short-Term Memory | LSTM | Gated memory cell that mitigates vanishing gradients and carries information across long time lags. |
| 2014 | Sequence to Sequence Learning with Neural Networks | Seq2Seq | Encoder-decoder mapping between variable-length sequences: an LSTM encoder summarizes the source in a fixed-length vector that conditions an LSTM decoder. |
| 2014 | Neural Machine Translation by Jointly Learning to Align and Translate | Attention | Additive attention: at each decoding step the decoder computes a soft alignment over all encoder states, removing the fixed-length bottleneck. |
| 2015 | Effective Approaches to Attention-based Neural Machine Translation | Attention variants | Compares global and local attention, and dot, general, and concat scoring functions, for NMT. |
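A minimal sketch of the scoring functions behind the two attention papers, assuming plain Python lists as vectors; all names are hypothetical. In Bahdanau et al. the query is the previous decoder state and the score is a small feed-forward (additive) network; Luong et al.'s global attention scores the current decoder state against each encoder state with a "dot" or "general" bilinear form (their "concat" score and local attention are omitted here). In both cases a softmax over encoder positions gives the alignment weights, and the context vector is the weighted sum of encoder states.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_score(query, key, W, U, v):
    """Bahdanau-style additive score: v^T tanh(W query + U key)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W, query), matvec(U, key))]
    return dot(v, hidden)


def dot_score(query, key):
    """Luong 'dot' score: query^T key."""
    return dot(query, key)


def general_score(query, key, W_a):
    """Luong 'general' score: query^T W_a key."""
    return dot(query, matvec(W_a, key))


def attend(query, encoder_states, score_fn):
    """Score each encoder state, normalize with softmax, return (weights, context)."""
    weights = softmax([score_fn(query, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context
```

Scoring functions that need parameters can be bound with a lambda, for example `attend(s_prev, states, lambda q, h: additive_score(q, h, W, U, v))`. Keeping `attend` independent of the score is the point of the comparison: the papers differ mainly in which decoder state serves as the query and how it is scored, while the softmax-and-weighted-sum step is shared.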
Reading Path
| Step | Read |
|---|---|
| 1 | Word2Vec and GloVe for static word embeddings. |
| 2 | LSTM and Seq2Seq for recurrent sequence modeling. |
| 3 | Bahdanau and Luong attention as the bridge to Transformers. |
| 4 | ELMo for contextual representations before BERT-style pre-training. |