Representations and Sequence Models

This database covers the pre-Transformer lineage: distributed word representations, recurrent sequence modeling, encoder-decoder translation, attention, and contextual language representations.

Word and Contextual Representations

| Year | Paper | Topic | Note |
| --- | --- | --- | --- |
| 2013 | Efficient Estimation of Word Representations in Vector Space | Word2Vec | Efficient dense word representations from local prediction tasks. |
| 2014 | GloVe: Global Vectors for Word Representation | GloVe | Word vectors from global co-occurrence statistics. |
| 2018 | Deep Contextualized Word Representations | ELMo | Context-dependent word representations from deep bidirectional language models. |
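
As a rough illustration of what "global co-occurrence statistics" means in the GloVe entry, the sketch below counts how often word pairs appear within a fixed window across a toy corpus. The function name `cooccurrence_counts`, the window size, and the example sentence are assumptions made for illustration. The counts here are unweighted, whereas GloVe down-weights distant pairs and then fits vectors to the log counts, so this covers only the counting step and is not the paper's full method.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count symmetric word-pair co-occurrences within `window` positions.

    Hypothetical helper for illustration; counts are unweighted.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        # Scan only to the right, then record both orderings so the
        # resulting table is symmetric.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            context = tokens[j]
            counts[(center, context)] += 1
            counts[(context, center)] += 1
    return counts


# Toy corpus (an assumption, not data from any paper).
tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("the", "sat")])
```

Pairs that never fall inside the same window simply have a count of zero, which is why a `Counter` is convenient here.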

Recurrent Sequence Modeling and Attention

| Year | Paper | Topic | Note |
| --- | --- | --- | --- |
| 1997 | Long Short-Term Memory | LSTM | Gated recurrent memory for long dependencies. |
| 2014 | Sequence to Sequence Learning with Neural Networks | Seq2Seq | Encoder-decoder mapping between variable-length sequences. |
| 2014 | Neural Machine Translation by Jointly Learning to Align and Translate | Attention | Dynamic alignment over encoder states during decoding. |
| 2015 | Effective Approaches to Attention-based Neural Machine Translation | Attention variants | Systematizes global and local attention variants for NMT. |
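
The idea of "dynamic alignment over encoder states" can be sketched in a few lines. The example below uses the simple dot-product score (one of the global attention scores discussed in the 2015 paper) rather than the additive score of the 2014 paper, because it needs no learned parameters. The function names and the toy vectors are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Hypothetical helper: scores each encoder state by its dot product
    with the current decoder state, normalizes the scores, and forms
    the context vector as the weighted sum of encoder states.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Toy inputs (assumptions): three encoder states, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_states)
print(weights, context)
```

The weights are recomputed at every decoding step, so the decoder can attend to different source positions for each output token instead of relying on a single fixed summary vector.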

Reading Path

| Step | Read |
| --- | --- |
| 1 | Word2Vec and GloVe for static word embeddings. |
| 2 | LSTM and Seq2Seq for recurrent sequence modeling. |
| 3 | Bahdanau attention and Luong attention for the bridge to Transformers. |
| 4 | ELMo for contextual representations before BERT-style pre-training. |