Optimizers and schedulers

Optimizer theory and learning-rate scheduling for deep neural networks, from sparse gradients to AdamW and cosine annealing

10 items in this folder.

  • Sparse gradients (Mar 03, 2026) · gradients, sparse-gradients, optimizers
  • AdaGrad (Mar 03, 2026) · optimizers, adagrad
  • RMSProp (Mar 03, 2026) · optimizers, rmsprop
  • Adam (Mar 03, 2026) · optimizers, adam
  • AdamW (Mar 03, 2026) · optimizers, adamw, weight-decay
  • Choosing an optimizer (Mar 15, 2026) · optimizer-selection, adaptive-optimizers
  • Learning rate scheduling (Mar 03, 2026) · learning-rate, lr-scheduling
  • Step and Exponential decay (Mar 15, 2026) · learning-rate, lr-scheduling, step-decay, exponential-decay
  • Cosine annealing (Apr 13, 2026) · learning-rate, lr-scheduling, cosine-annealing
  • Choosing a LR scheduler (Mar 15, 2026) · learning-rate, lr-scheduling, scheduler-selection
