Representations and Sequence Models

This database covers the pre-Transformer lineage: distributed word representations, recurrent sequence modeling, encoder-decoder translation, attention, and contextual language representations.

Word and Contextual Representations

Year	Paper	Topic	Note
2013	Efficient Estimation of Word Representations in Vector Space	Word2Vec	Dense word vectors learned efficiently by predicting words from their local context (CBOW and skip-gram).
2014	GloVe: Global Vectors for Word Representation	GloVe	Word vectors from global co-occurrence statistics.
2018	Deep Contextualized Word Representations	ELMo	Context-dependent word representations from deep bidirectional language models.
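
The "local prediction" and "global co-occurrence" notes above can be made concrete with a small sketch. It is only an illustration, not code from either paper: the function names, the toy sentence, and the window size are assumptions. The first function lists the (center, context) pairs a skip-gram style model would train on, and the second aggregates the same pairs into corpus-wide counts, the kind of statistic GloVe starts from. The counts here are unweighted; the GloVe paper additionally down-weights more distant pairs.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Hypothetical helper: these are the local prediction targets a
    skip-gram style model would be trained on.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cooccurrence_counts(tokens, window=2):
    """Aggregate the same pairs into global, unweighted co-occurrence counts."""
    return Counter(skipgram_pairs(tokens, window))


# Toy input, assumed purely for illustration.
tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
counts = cooccurrence_counts(tokens, window=1)
```

Both views read the same windows. The difference is whether the model consumes the pairs one at a time or fits vectors to the aggregated counts.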

Recurrent Sequence Modeling and Attention

Year	Paper	Topic	Note
1997	Long Short-Term Memory	LSTM	Gated recurrent memory cell for learning long-range dependencies.
2014	Sequence to Sequence Learning with Neural Networks	Seq2Seq	Encoder-decoder mapping between variable-length sequences.
2014	Neural Machine Translation by Jointly Learning to Align and Translate	Attention	Dynamic alignment over encoder states during decoding.
2015	Effective Approaches to Attention-based Neural Machine Translation	Attention variants	Systematizes global and local attention variants for NMT.
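
As a rough illustration of "dynamic alignment over encoder states", the sketch below performs one attention step: it scores every encoder state against the current decoder state, normalizes the scores with a softmax, and returns the weighted sum as a context vector. It uses dot-product scoring, one of the scoring functions in the 2015 paper, because it needs no learned parameters; the 2014 paper scores with a small learned feed-forward network instead. The function names and toy vectors are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_state, encoder_states):
    """One attention step with dot-product scoring.

    Returns the alignment weights (one per encoder state) and the
    context vector (the weighted sum of the encoder states).
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Toy vectors, assumed purely for illustration.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

The weights are recomputed at every decoding step, so the decoder can focus on different source positions for each output token rather than relying on a single fixed summary of the input.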

Reading Path

Step	Read
1	Word2Vec and GloVe for static word embeddings.
2	LSTM and Seq2Seq for recurrent sequence modeling.
3	Bahdanau attention (2014) and Luong attention (2015), listed above by paper title, for the bridge to Transformers.
4	ELMo for contextual representations before BERT-style pre-training.