NLP and Sequence Modeling

This database tracks the shift from dense word representations and recurrent models to attention-based architectures, pre-training, and large language models, along with state space models as a linear-time alternative to attention.

Focused Databases

Database | Scope
Representations and Sequence Models | Word2Vec, GloVe, LSTMs, Seq2Seq, attention, ELMo, and early neural language representations.
Transformers | Core architecture, pre-training, efficient attention, LLM scaling, adaptation, and vision and multimodal Transformers.
State Space Models | S4, S5, Mamba, Mamba-2, Mamba-3, and linear-time alternatives to attention.
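
To make the "linear-time" contrast concrete, here is a toy sketch in plain Python. It is not S4 or Mamba: those models add structured state parameterizations, discretization of a continuous-time system, and (in Mamba's case) input-dependent parameters. The function names `attention` and `diagonal_ssm` and the fixed parameters `a`, `b`, `c` are hypothetical. Unmasked single-head attention scores every query against every key, so its work grows with the square of the sequence length n; a diagonal linear recurrence carries a fixed-size state forward one step at a time, so its work grows linearly with n.

```python
import math


def attention(q, k, v):
    """Unmasked single-head scaled dot-product attention.

    q, k, v are lists of n vectors. Every query is scored against
    every key, so the loop does O(n^2) score computations.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        scores = [sum(q[i][j] * k[t][j] for j in range(d)) / math.sqrt(d) for t in range(n)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(weights[t] * v[t][j] for t in range(n)) / z for j in range(len(v[0]))])
    return out


def diagonal_ssm(x, a, b, c):
    """Toy diagonal linear recurrence over a scalar input sequence x.

    h_t = a * h_{t-1} + b * x_t   (elementwise; a is a diagonal state matrix)
    y_t = sum(c * h_t)
    One pass over the sequence with a fixed-size state: O(n) steps.
    """
    h = [0.0] * len(a)
    ys = []
    for x_t in x:
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    # Impulse response of a one-dimensional state that decays by half each step.
    print(diagonal_ssm([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25, 0.125]
    # Identical keys give uniform weights, so each output is the mean of v.
    print(attention([[1.0], [1.0]], [[0.0], [0.0]], [[2.0], [4.0]]))  # [[3.0], [3.0]]
```

The recurrence also needs only constant memory per generated step, whereas attention must keep (or recompute) keys and values for the whole prefix.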

Milestone Map

Stage | Key Papers | Primary Database
Dense word representations | Word2Vec, GloVe, ELMo | Representations and Sequence Models
Recurrent sequence modeling | LSTM, Seq2Seq, attention-based neural machine translation | Representations and Sequence Models
Transformer pre-training | GPT-1, BERT, XLNet, RoBERTa, T5, BART, ELECTRA, GPT-2, GPT-3 | Transformers
Efficient long-context modeling | Transformer-XL, Longformer, Big Bird, FlashAttention, S4, Mamba | Transformers and State Space Models

Suggested Path

Step | Read
1 | Representations and Sequence Models for Word2Vec, LSTM, Seq2Seq, and Bahdanau attention.
2 | Transformers for the shift from recurrence to attention and the modern LLM lineage.
3 | State Space Models for the linear-time sequence modeling branch.