State Space Models

State Space Models (SSMs) are sequence models that summarize the input seen so far in a hidden state, which is updated one step at a time. In deep learning, they are studied as an alternative to attention-heavy Transformers, especially when long context, streaming inference, or memory efficiency matters.
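
As a rough illustration of "a hidden state that is updated over time", the sketch below runs a linear recurrence h_t = a * h_{t-1} + b * x_t with readout y_t = c . h_t. It assumes a diagonal transition (one decay factor per state channel) and scalar inputs; the function name `ssm_scan` and all parameter values are hypothetical and not taken from any of the papers listed here.

```python
from typing import List, Sequence


def ssm_scan(a: Sequence[float], b: Sequence[float], c: Sequence[float],
             xs: Sequence[float]) -> List[float]:
    """Run h_t = a * h_{t-1} + b * x_t, y_t = c . h_t with a diagonal transition."""
    h = [0.0] * len(a)  # hidden state, one entry per state channel
    ys = []
    for x in xs:
        # Each state channel decays by its own factor and absorbs the new input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Readout: project the state back to a scalar output.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# Impulse input: the outputs show how each channel's memory of x_0 fades.
ys = ssm_scan(a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 1.0], xs=[1.0, 0.0, 0.0])
print(ys)
```

The state `h` has the same size at every step, which is the property the rest of this page builds on.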

Problem Map

| Problem | Why It Matters | What SSMs Try to Do |
| --- | --- | --- |
| Transformer attention scales poorly with length | Full attention has O(n^2) attention cost and a growing KV cache. | Use recurrent/state updates with linear scaling and compact state. |
| Long-context inference is expensive | Agentic workflows, codebases, books, audio, and genomics can require very long sequences. | Maintain a compressed history instead of attending over every token. |
| Older linear models lost quality | Many efficient models struggled with content-based reasoning and discrete language. | Make state updates input-dependent so the model can selectively remember or forget. |
| Linear inference can underuse hardware | Theoretical O(n) does not automatically mean high GPU utilization. | Redesign the recurrence and kernels around practical inference throughput. |
| State tracking is hard | Some linear models fail tasks requiring precise updates to latent variables or symbolic state. | Use richer state dynamics, complex-valued updates, and MIMO formulations. |
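
To make the "growing KV cache" versus "compact state" contrast concrete, here is a back-of-the-envelope count of stored values during decoding. It is only a sketch: the model shape is hypothetical, the KV count assumes plain multi-head attention with no cache compression, and the SSM count assumes `d_state` state entries per channel per layer.

```python
def kv_cache_values(n_tokens: int, n_layers: int, d_model: int) -> int:
    """Values held in a KV cache: one key and one value vector per token per layer."""
    return 2 * n_tokens * n_layers * d_model


def ssm_state_values(n_layers: int, d_model: int, d_state: int) -> int:
    """Values held in a recurrent state: d_state entries per channel per layer."""
    return n_layers * d_model * d_state


# Hypothetical model shape, chosen only to make the arithmetic concrete.
N_LAYERS, D_MODEL, D_STATE = 24, 1024, 16

for n_tokens in (1_000, 100_000):
    kv = kv_cache_values(n_tokens, N_LAYERS, D_MODEL)
    ssm = ssm_state_values(N_LAYERS, D_MODEL, D_STATE)
    print(f"{n_tokens:>7} tokens: KV cache {kv:,} values, SSM state {ssm:,} values")
```

Under these assumptions the cache grows linearly with the number of tokens, while the recurrent state count does not depend on sequence length at all.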

Background: Pre-Mamba SSMs

| Year | Paper | Topic | Note |
| --- | --- | --- | --- |
| 2021 | Efficiently Modeling Long Sequences with Structured State Spaces | S4 | Structured SSM layer for long-range sequence modeling. |
| 2022 | Diagonal State Spaces are as Effective as Structured State Spaces | DSS | Simpler diagonal SSMs can match S4 on long-range tasks. |
| 2022 | Simplified State Space Layers for Sequence Modeling | S5 | Multi-input, multi-output SSM using efficient parallel scans. |
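
The "efficient parallel scans" mentioned for S5 rely on the fact that steps of a linear recurrence compose associatively. The sketch below is not S5's implementation; it is a toy, scalar-state illustration in which each step h -> a * h + b is stored as a pair (a, b), and a doubling scan reproduces the sequential result in about log2(n) rounds. The names `combine`, `sequential_scan`, and `doubling_scan` are hypothetical.

```python
from typing import List, Tuple

Elem = Tuple[float, float]  # (a, b) represents the map h -> a * h + b


def combine(first: Elem, second: Elem) -> Elem:
    """Compose two affine maps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(elems: List[Elem]) -> List[float]:
    """Reference: step through the recurrence one element at a time."""
    h, out = 0.0, []
    for a, b in elems:
        h = a * h + b
        out.append(h)
    return out


def doubling_scan(elems: List[Elem]) -> List[float]:
    """Inclusive scan in ceil(log2(n)) rounds; updates within a round are independent."""
    cur = list(elems)
    step = 1
    while step < len(cur):
        # Position i absorbs the prefix that ends `step` positions earlier.
        cur = [combine(cur[i - step], cur[i]) if i >= step else cur[i]
               for i in range(len(cur))]
        step *= 2
    # With h_0 = 0, the state equals the offset b of each composed prefix map.
    return [b for _, b in cur]


elems = [(0.5, 1.0), (0.9, 2.0), (0.1, -1.0), (0.7, 0.5), (1.0, 3.0)]
print(sequential_scan(elems))
print(doubling_scan(elems))
```

In this pure-Python version each round is still executed serially; the point is only that the work inside a round has no dependencies, which is what lets an accelerator run it in parallel.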

Mamba Family

| Year | Paper | Topic | Problem Addressed |
| --- | --- | --- | --- |
| 2023 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces (code) | Mamba / selective SSM | Makes SSM parameters input-dependent to recover content-aware reasoning while keeping linear sequence scaling. |
| 2024 | Transformers are SSMs (PMLR) | Mamba-2 / state space duality | Connects SSMs and attention through structured semiseparable matrices; makes Mamba-style layers faster and more Transformer-like. |
| 2026 | Mamba-3: Improved Sequence Modeling using State Space Principles (project) | Mamba-3 / inference-first SSM | Targets the quality-efficiency gap: better state tracking, retrieval, and decode hardware utilization with expressive recurrence, complex state updates, and MIMO SSMs. |
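
The SSM-attention connection in the Mamba-2 row can be illustrated on a toy case. Assuming a scalar decay a_t per step and per-step vectors B_t and C_t, the recurrence h_t = a_t * h_{t-1} + B_t * x_t, y_t = C_t . h_t gives the same outputs as multiplying the input by a lower-triangular matrix with entries (C_i . B_j) * a_{j+1} * ... * a_i. This is a sketch of the idea only, not the block-decomposed algorithm from the paper; function names and values are hypothetical.

```python
from typing import List, Sequence


def recurrent_form(a: Sequence[float], B: Sequence[Sequence[float]],
                   C: Sequence[Sequence[float]], xs: Sequence[float]) -> List[float]:
    """Linear-time view: carry a fixed-size state forward through the sequence."""
    h = [0.0] * len(B[0])
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, xs):
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def matrix_form(a: Sequence[float], B: Sequence[Sequence[float]],
                C: Sequence[Sequence[float]], xs: Sequence[float]) -> List[float]:
    """Quadratic view: a causally masked, decay-weighted score matrix applied to xs."""
    ys = []
    for i in range(len(xs)):
        total = 0.0
        for j in range(i + 1):  # causal mask: only positions j <= i contribute
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            score = sum(c * b for c, b in zip(C[i], B[j]))  # plays the role of q_i . k_j
            total += score * decay * xs[j]
        ys.append(total)
    return ys


a = [0.9, 0.5, 0.8, 0.7]
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
C = [[1.0, 1.0], [1.0, -1.0], [0.5, 2.0], [0.0, 1.0]]
xs = [1.0, 2.0, -1.0, 0.5]
print(recurrent_form(a, B, C, xs))
print(matrix_form(a, B, C, xs))
```

Here C_i and B_j play roles similar to queries and keys, and the running product of decays takes the place of the causal mask.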

Key Ideas

| Idea | Meaning |
| --- | --- |
| State | A compact memory vector updated as the sequence is processed. |
| Selectivity | The model decides what to propagate or forget based on the current input. |
| Linear scaling | Sequence processing avoids the O(n^2) attention matrix. |
| Constant-memory decoding | In autoregressive inference, the model can update a fixed-size state instead of expanding a KV cache. |
| State Space Duality | Mamba-2 shows structural links between SSMs and attention-like computations. |
| MIMO SSM | Mamba-3 processes vector-valued inputs/outputs in the recurrence to improve expressivity and hardware utilization. |
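
Selectivity and constant-memory decoding can be sketched together in a single toy cell. The version below is loosely modeled on the selective-SSM idea of an input-dependent step size: a large step forgets the old state and absorbs the current input, while a small step keeps the state nearly unchanged. It is a simplified assumption-laden sketch (scalar input, fixed B and C, hypothetical class and parameter names), not the Mamba layer itself.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)); always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class SelectiveCell:
    """Toy selective SSM: the step size delta depends on the current input."""

    def __init__(self, A: Sequence[float], B: Sequence[float],
                 C: Sequence[float], w_delta: float) -> None:
        self.A, self.B, self.C = list(A), list(B), list(C)  # A entries are negative
        self.w_delta = w_delta
        self.h: List[float] = [0.0] * len(self.A)  # fixed-size state

    def step(self, x: float) -> float:
        delta = softplus(self.w_delta * x)  # input-dependent step size
        for i in range(len(self.h)):
            decay = math.exp(delta * self.A[i])  # near 1 keeps, near 0 forgets
            self.h[i] = decay * self.h[i] + delta * self.B[i] * x
        return sum(c * h for c, h in zip(self.C, self.h))


cell = SelectiveCell(A=[-1.0, -0.5], B=[1.0, 1.0], C=[1.0, 1.0], w_delta=10.0)
cell.step(2.0)    # large delta: old state is overwritten by this input
kept = list(cell.h)
cell.step(-2.0)   # tiny delta: the state is left almost untouched
print(kept, cell.h)
```

Decoding with this cell only ever stores `cell.h`, so memory use stays the same no matter how many tokens have been processed.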

Reading Path

| Step | Read |
| --- | --- |
| 1 | S4 for the structured state-space foundation. |
| 2 | S5 for the MIMO simplification perspective. |
| 3 | Mamba for selective state spaces and language modeling. |
| 4 | Mamba-2 for the SSM-attention bridge and faster SSD layer. |
| 5 | Mamba-3 for inference-first design, richer state tracking, and MIMO recurrence. |