Architecture and Trainability
This database tracks architectural ideas that changed what could be trained. The central theme is signal flow: how information and gradients move through very deep networks.
Depth and Skip Connections
| Year | Paper | Topic | Note |
|---|---|---|---|
| 2015 | Highway Networks | Gated skip paths | Early architecture for training very deep networks: learned gates mix each layer's transform with its unchanged input. |
| 2015 | Deep Residual Learning for Image Recognition | ResNet / skip connections | Kaiming He et al. introduced residual blocks of the form x + F(x), which made networks with more than 100 layers practical to train. |
| 2016 | Identity Mappings in Deep Residual Networks | Pre-activation ResNet | Shows that a clean identity skip path, with normalization and activation moved before the weight layers, lets signals propagate directly in both the forward and backward pass. |
| 2016 | Deep Networks with Stochastic Depth | Stochastic depth | Randomly drops residual layers during training to regularize very deep nets. |
| 2016 | Densely Connected Convolutional Networks | DenseNet | Within a dense block, each layer receives the concatenated feature maps of all preceding layers, encouraging feature reuse and shortening gradient paths. |
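
The signal-flow argument behind these entries can be illustrated with a scalar toy model. The sketch below is not taken from any of the papers: the names `plain_layer`, `residual_layer`, `through_depth`, and `stochastic_depth_layer` are hypothetical, each "layer" is just `tanh(w * h)`, and a single fixed survival probability stands in for a per-layer schedule. It compares how the end-to-end derivative behaves with and without an identity path, and shows one common form of the stochastic-depth rule (drop the residual branch at random during training, scale it by the survival probability at inference).

```python
import math
import random

def plain_layer(h, w):
    """Toy scalar layer h -> tanh(w * h). Returns (output, d_output/d_h)."""
    a = math.tanh(w * h)
    return a, w * (1.0 - a * a)

def residual_layer(h, w):
    """Residual form h -> h + tanh(w * h); the identity path adds 1 to the derivative."""
    a, da = plain_layer(h, w)
    return h + a, 1.0 + da

def through_depth(layer, h0, w, depth):
    """Apply the same toy layer `depth` times; return (final activation, d_out/d_h0)."""
    h, grad = h0, 1.0
    for _ in range(depth):
        h, dh = layer(h, w)
        grad *= dh  # chain rule: multiply the per-layer derivatives
    return h, grad

def stochastic_depth_layer(h, w, survival_prob, training, rng):
    """Residual layer whose branch is randomly skipped during training.

    Assumption: one fixed survival probability, chosen for illustration.
    """
    a, _ = plain_layer(h, w)
    if training:
        keep = rng.random() < survival_prob
        return h + a if keep else h  # a dropped branch leaves only the identity path
    return h + survival_prob * a  # inference: scale the branch by its survival probability

if __name__ == "__main__":
    _, g_plain = through_depth(plain_layer, 1.0, 0.5, 50)
    _, g_res = through_depth(residual_layer, 1.0, 0.5, 50)
    print("plain gradient through 50 layers:   ", g_plain)
    print("residual gradient through 50 layers:", g_res)
    rng = random.Random(0)
    print("stochastic depth (train):", stochastic_depth_layer(1.0, 0.5, 0.8, True, rng))
    print("stochastic depth (eval): ", stochastic_depth_layer(1.0, 0.5, 0.8, False, rng))
```

In this toy setting each plain-layer derivative is at most `w`, so the product shrinks geometrically with depth, while each residual-layer derivative is at least 1 for positive `w`. Real networks are matrix-valued and include normalization, so this is only an intuition aid.
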
General Architectural Templates
| Year | Paper | Topic | Note |
|---|---|---|---|
| 2014 | Sequence to Sequence Learning with Neural Networks | Seq2Seq | Encoder-decoder neural sequence modeling. |
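| 2015 | Neural Machine Translation by Jointly Learning to Align and Translate | Attention | At each output step the decoder computes soft alignment weights over all encoder states instead of relying on a single fixed-length vector. |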
| 2017 | Attention Is All You Need | Transformer | Replaces recurrence with self-attention, so all sequence positions can be processed in parallel during training. |
| 2018 | Neural Ordinary Differential Equations | Neural ODEs | Continuous-depth view of residual transformations. |
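
The attention entries in this table share one core operation: score a query against a set of keys, normalize the scores, and return a weighted average of the values. A minimal sketch follows, assuming the scaled dot-product scoring used in the Transformer (the 2015 alignment model uses a learned additive score instead). The function names `softmax` and `attention` are illustrative, and the inputs are plain Python lists, not batched tensors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    query:  list of d floats
    keys:   list of n key vectors, each of length d
    values: list of n value vectors, each of the same length
    Returns (context vector, attention weights).
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim_v = len(values[0])
    # The context is the weight-averaged value vector.
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim_v)
    ]
    return context, weights

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    context, weights = attention([2.0, 0.0], keys, values)
    print("weights:", weights)
    print("context:", context)
```

Because every query attends to all keys independently, the same computation can be run for all positions at once, which is what makes the Transformer parallel across the sequence.
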
Reading Path
| Step | Read |
|---|---|
| 1 | Highway Networks, then ResNet. |
| 2 | Identity Mappings to understand why skip connections work. |
| 3 | Stochastic Depth and DenseNet for variants of deep signal flow. |
| 4 | Seq2Seq, attention, and Transformer for the sequence-modeling architectural shift. |
| 5 | Neural ODEs for the continuous-depth interpretation of residual networks. |
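
The continuous-depth reading in step 5 rests on a simple observation: a residual update `h + dt * f(h)` is one forward-Euler step of the differential equation dh/dt = f(h). The sketch below illustrates only that correspondence. It assumes a fixed-step Euler integrator and a hand-picked dynamics function `f(h) = -h` with a known exact solution; the helper names `residual_stack` and `decay` are hypothetical, and the Neural ODE paper itself relies on general-purpose ODE solvers, not this loop.

```python
import math

def decay(h):
    """Hand-picked dynamics f(h) = -h, whose exact solution is h0 * exp(-t)."""
    return -h

def residual_stack(h0, f, total_time, num_layers):
    """Apply `num_layers` residual updates h <- h + dt * f(h), with dt = total_time / num_layers.

    Each update is one forward-Euler step, so more layers means a finer discretization.
    """
    dt = total_time / num_layers
    h = h0
    for _ in range(num_layers):
        h = h + dt * f(h)
    return h

if __name__ == "__main__":
    exact = math.exp(-1.0)
    for depth in (10, 100, 1000):
        approx = residual_stack(1.0, decay, 1.0, depth)
        print(depth, "layers: error =", abs(approx - exact))
```

As the number of layers grows and the step size shrinks, the stacked residual updates approach the exact solution of the ODE, which is the sense in which a residual network can be read as a discretized continuous transformation.
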