State Space Models

State Space Models (SSMs) are sequence models based on a hidden state that is updated over time. In deep learning, they are studied as an alternative to attention-heavy Transformers, especially when long context, streaming inference, or memory efficiency matter.
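The hidden-state update can be sketched in a few lines. The snippet below is a minimal illustration, not any paper's layer: it assumes a diagonal, time-invariant SSM with scalar inputs, and the names `ssm_scan`, `a`, `b`, and `c` are chosen here for illustration.

```python
def ssm_scan(xs, a, b, c):
    """Run a toy diagonal linear SSM over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over state dimensions)
    y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)  # hidden state starts at zero
    ys = []
    for x in xs:
        # Update every state dimension from the previous state and the input.
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        # Read out a scalar output from the current state.
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys, h


ys, final_state = ssm_scan(
    [1.0, 0.0, 0.0, 2.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 0.0]
)
```

The state `h` has the same size no matter how long `xs` is, which is the property the rest of this page builds on.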

Problem Map

Problem	Why It Matters	What SSMs Try to Do
Transformer attention scales poorly with length	Full attention costs O(n^2) compute in sequence length n, and the KV cache grows with every token processed.	Use recurrent state updates that scale linearly in n and keep a fixed-size state.
Long-context inference is expensive	Agentic workflows, codebases, books, audio, and genomics can require very long sequences.	Maintain a compressed history instead of attending over every token.
Older linear models lost quality	Many efficient models struggled with content-based reasoning and discrete language.	Make state updates input-dependent so the model can selectively remember or forget.
Linear inference can underuse hardware	Theoretical O(n) does not automatically mean high GPU utilization.	Redesign the recurrence and kernels around practical inference throughput.
State tracking is hard	Some linear models fail tasks requiring precise updates to latent variables or symbolic state.	Use richer state dynamics, complex-valued updates, and MIMO formulations.
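The memory side of the first two rows can be made concrete with rough arithmetic. The sketch below is a back-of-the-envelope count under simplifying assumptions: one key and one value vector of width `d_model` per token per layer (no grouped-query attention or cache compression), and one `d_state`-sized state per channel per layer for the SSM. The function names and the sizes used are hypothetical.

```python
def kv_cache_floats(n_tokens, n_layers, d_model):
    # One key vector and one value vector of width d_model
    # per token per layer: grows linearly with n_tokens.
    return 2 * n_tokens * n_layers * d_model


def ssm_state_floats(n_layers, d_model, d_state):
    # One d_state-sized state per channel per layer:
    # does not depend on how many tokens have been seen.
    return n_layers * d_model * d_state


# Hypothetical model sizes, for illustration only.
n_layers, d_model, d_state = 24, 1024, 16
for n_tokens in (1_000, 100_000):
    kv = kv_cache_floats(n_tokens, n_layers, d_model)
    ssm = ssm_state_floats(n_layers, d_model, d_state)
    print(f"{n_tokens:>7} tokens: KV cache {kv:,} floats, SSM state {ssm:,} floats")
```

Under these assumptions the cache size is proportional to context length while the SSM state stays fixed, which is the trade the "compressed history" column refers to.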

Background SSMs

Year	Paper	Topic	Note
2021	Efficiently Modeling Long Sequences with Structured State Spaces	S4	Structured SSM layer for long-range sequence modeling.
2022	Diagonal State Spaces are as Effective as Structured State Spaces	DSS	Simpler diagonal SSMs can match S4 on long-range tasks.
2022	Simplified State Space Layers for Sequence Modeling	S5	Multi-input, multi-output SSM using efficient parallel scans.
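The parallel-scan idea mentioned for S5 rests on the fact that steps of a linear recurrence compose associatively. The sketch below is a toy scalar version, not S5's implementation: each step `h -> a*h + b` is stored as a pair `(a, b)`, and a doubling (Hillis-Steele style) scan combines pairs in about log2(n) rounds. The function names are chosen here for illustration.

```python
def combine(left, right):
    """Compose two affine steps h -> a*h + b, with `left` applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(pairs):
    """Reference: run the recurrence one step at a time from h = 0."""
    h, out = 0.0, []
    for a, b in pairs:
        h = a * h + b
        out.append(h)
    return out


def doubling_scan(pairs):
    """Inclusive scan over `combine` in about log2(n) rounds."""
    elems = list(pairs)
    step = 1
    while step < len(elems):
        # Within a round, every position is updated independently,
        # so on parallel hardware these updates could run at once.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # Starting from h = 0, the state equals the offset term of each composite.
    return [b for _, b in elems]


pairs = [(0.5, 1.0), (0.5, 2.0), (2.0, 0.0), (0.25, 4.0), (1.0, 1.0)]
seq = sequential_scan(pairs)
par = doubling_scan(pairs)
```

Here the work runs in a Python loop, so nothing is actually parallel; the point is only that the result matches the sequential recurrence while the dependency depth drops from n to roughly log2(n).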

Mamba Family

Year	Paper	Topic	Problem Addressed
2023	Mamba: Linear-Time Sequence Modeling with Selective State Spaces (code)	Mamba / selective SSM	Makes SSM parameters input-dependent to recover content-aware reasoning while keeping linear sequence scaling.
2024	Transformers are SSMs (PMLR)	Mamba-2 / state space duality	Connects SSMs and attention through structured semiseparable matrices; makes Mamba-style layers faster by letting them use matrix-multiplication-based algorithms similar to attention.
2026	Mamba-3: Improved Sequence Modeling using State Space Principles (project)	Mamba-3 / inference-first SSM	Targets the quality-efficiency gap: better state tracking, retrieval, and decode hardware utilization with expressive recurrence, complex state updates, and MIMO SSMs.
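The SSM-attention connection in the Mamba-2 row can be illustrated with a small numerical check. The sketch below is a scalar-state toy, not the SSD algorithm: it computes the same outputs once with a recurrence and once by multiplying the input with a lower-triangular matrix whose entry (t, s) is `c[t] * a[s+1] * ... * a[t] * b[s]`. All names and values are illustrative assumptions.

```python
def recurrent_form(xs, a, b, c):
    """Outputs from the step-by-step recurrence h_t = a_t * h_{t-1} + b_t * x_t."""
    h, ys = 0.0, []
    for x, at, bt, ct in zip(xs, a, b, c):
        h = at * h + bt * x
        ys.append(ct * h)
    return ys


def matrix_form(xs, a, b, c):
    """Same outputs as y = M x, with M lower-triangular (causal)."""
    n = len(xs)
    M = [[0.0] * n for _ in range(n)]
    for t in range(n):
        decay = 1.0  # running product a[s+1] * ... * a[t]
        for s in range(t, -1, -1):
            M[t][s] = c[t] * decay * b[s]
            decay *= a[s]
    ys = [sum(M[t][s] * xs[s] for s in range(n)) for t in range(n)]
    return ys, M


xs = [1.0, 2.0, 3.0, -1.0]
a = [0.9, 0.5, 0.8, 0.7]
b = [1.0, 2.0, 0.5, 1.0]
c = [1.0, -1.0, 2.0, 0.5]
ys_rec = recurrent_form(xs, a, b, c)
ys_mat, M = matrix_form(xs, a, b, c)
```

The recurrent form needs O(n) work and a constant-size state, while the matrix form is quadratic but maps onto matrix multiplication; having both views of one computation is what allows an implementation to mix them.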

Key Ideas

Idea	Meaning
State	A compact memory vector updated as the sequence is processed.
Selectivity	The model decides what to propagate or forget based on the current input.
Linear scaling	Compute grows as O(n) in sequence length n because no n-by-n attention matrix is formed.
Constant-memory decoding	In autoregressive inference, the model can update a fixed-size state instead of expanding a KV cache.
State Space Duality	Mamba-2 shows that a class of SSMs and a form of masked attention can be written as the same structured (semiseparable) matrix transformation.
MIMO SSM	Mamba-3 processes vector-valued inputs/outputs in the recurrence to improve expressivity and hardware utilization.
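Selectivity and constant-memory decoding can be shown together in one toy step function. This is a simplified scalar sketch, not Mamba's layer: only the step size `delta` depends on the input here (Mamba also makes B and C input-dependent), and the parameter names and default values are assumptions chosen for illustration.

```python
import math


def softplus(z):
    """Smooth positive function used to keep the step size above zero."""
    return math.log1p(math.exp(z))


def step_size(x, w_delta=2.0, b_delta=-1.0):
    """Input-dependent step size: this is where selectivity enters."""
    return softplus(w_delta * x + b_delta)


def selective_step(h, x, A=-1.0, B=1.0, C=1.0):
    """Advance the state by one token and return (new_state, output)."""
    delta = step_size(x)
    a_bar = math.exp(delta * A)  # retention factor in (0, 1) because A < 0
    b_bar = delta * B            # how strongly the current input is written
    h_new = a_bar * h + b_bar * x
    return h_new, C * h_new


# Decoding loop: only the scalar state `h` is carried between tokens.
h = 0.0
outputs = []
for x in [1.0, 0.0, 0.0, 3.0]:
    h, y = selective_step(h, x)
    outputs.append(y)
```

With these illustrative weights, a larger input produces a larger `delta`, so the old state is retained less and the new input is written more strongly; the loop never stores past tokens, only `h`.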

Reading Path

Step	Read
1	S4 for the structured state-space foundation.
2	S5 for the MIMO simplification perspective.
3	Mamba for selective state spaces and language modeling.
4	Mamba-2 for the SSM-attention bridge and faster SSD layer.
5	Mamba-3 for inference-first design, richer state tracking, and MIMO recurrence.
