NLP and Sequence Modeling

This database tracks the shift from dense word representations and recurrent models to attention-based architectures, pre-training, and large language models.

Focused Databases

| Database | Scope |
|---|---|
| Representations and Sequence Models | Word2Vec, GloVe, LSTMs, Seq2Seq, attention, ELMo, and other early neural language representations. |
| Transformers | Core architecture, pre-training, efficient attention, LLM scaling, adaptation, vision, and multimodal Transformers. |
| State Space Models | S4, S5, Mamba, Mamba-2, Mamba-3, and linear-time alternatives to attention. |

Milestone Map

| Stage | Key Papers | Primary Database |
|---|---|---|
| Dense word representations | Word2Vec, GloVe, ELMo (contextual) | Representations and Sequence Models |
| Recurrent sequence modeling | LSTM, Seq2Seq, attention-based neural machine translation | Representations and Sequence Models |
| Transformer pre-training | GPT-1, BERT, XLNet, RoBERTa, T5, BART, ELECTRA, GPT-2, GPT-3 | Transformers |
| Efficient long-context modeling | Transformer-XL, Longformer, Big Bird, FlashAttention, S4, Mamba | Transformers and State Space Models |

Suggested Path

| Step | Read |
|---|---|
| 1 | Representations and Sequence Models for Word2Vec, LSTM, Seq2Seq, and Bahdanau attention. |
| 2 | Transformers for the architecture shift and modern LLM lineage. |
| 3 | State Space Models for the linear-time sequence modeling branch. |