Generalization and Scaling

This database collects papers that examine why modern neural networks generalize despite overparameterization, and how performance scales with model size, data, and compute.

Generalization and Overparameterization

Year	Paper	Topic	Note
2016	Understanding Deep Learning Requires Rethinking Generalization	Generalization puzzle	Shows large networks can fit random labels.
2018	The Lottery Ticket Hypothesis	Sparse subnetworks	Dense networks contain trainable sparse winning tickets.
2018	Neural Tangent Kernel	Infinite-width theory	Connects wide neural networks to kernel dynamics.
2019	Deep Double Descent	Double descent	Test error can fall a second time once models grow past the interpolation threshold.
2020	What Neural Networks Memorize and Why	Memorization	Estimates which training examples are memorized and how they affect accuracy on long-tail data.

Scaling Laws

Year	Paper	Topic	Note
2020	Scaling Laws for Neural Language Models (OpenAI)	Scaling laws	Loss falls as a power law in model size, dataset size, and compute.
2020	Scaling Laws for Autoregressive Generative Modeling	Generative scaling	Extends scaling laws to images, video, multimodal data, and math.
2022	Training Compute-Optimal Large Language Models	Chinchilla	Compute-optimal balance between parameters and tokens.
2022	Scaling Laws for Reward Model Overoptimization	Alignment scaling	Studies how over-optimizing against a proxy reward model degrades true reward.
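
The "compute-optimal balance between parameters and tokens" in the Chinchilla entry can be illustrated with a short sketch. It rests on two assumptions that are not stated in this list: the common approximation that training cost is about 6 FLOPs per parameter per token, and the often-quoted rule of thumb of roughly 20 training tokens per parameter. The function name and constants are hypothetical, and the paper itself fits these quantities empirically rather than fixing them.

```python
import math
from typing import Tuple

# Assumptions (not from the list above): training FLOPs C ~ 6 * N * D,
# and a compute-optimal ratio of roughly 20 training tokens per parameter.
FLOPS_PER_PARAM_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(compute_flops: float) -> Tuple[float, float]:
    """Return (parameters, tokens) for a FLOP budget under the assumptions above."""
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Substitute D = r * N into C = 6 * N * D and solve for N.
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    budget = 5.88e23  # hypothetical training budget in FLOPs
    n, d = compute_optimal_allocation(budget)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Under these assumptions, a budget of 5.88e23 FLOPs maps to roughly 70 billion parameters and 1.4 trillion tokens. Changing either constant shifts the split, so the output is best read as an order-of-magnitude guide.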

Reading Path

Step	Read
1	Rethinking Generalization and Lottery Ticket for overparameterized networks.
2	NTK and Deep Double Descent for theory and interpolation behavior.
3	Scaling Laws and Chinchilla for modern large-model training.
4	Reward model overoptimization for scaling issues in alignment.