Authorship
This website is written and maintained by Giuseppe Alfieri. The overwhelming majority of the material on this site (the editorial structure, the mathematical derivations as presented, the pedagogical sequencing, the callouts and tables, the historical framings, the original explanations, the synthesis across topics, and the running notation) is original work by the author.
A small fraction of the early material on backpropagation and on the contrast between MSE and cross-entropy losses is adapted from, or inspired by, Michael Nielsen’s Neural Networks and Deep Learning (Determination Press, 2015), which is acknowledged in full below. Other sources are cited inline within the notes where their work is referenced.
The point of this page
Two facts need to be visible at the top.
- Most of this site is original. The reader is not looking at a paraphrase of someone else’s textbook. The bulk of the content has been written from scratch by Giuseppe Alfieri, including derivations, structure, callouts, tables, and the choices of what to emphasize and how to connect topics.
- The original content is protected. The license and usage terms are stated separately in License. The protection is intentional: producing this material took years of work, and the author retains the rights described there.
Acknowledged third-party sources
Michael Nielsen, Neural Networks and Deep Learning
A subset of the early material on this site is adapted from or inspired by:
- Author: Michael A. Nielsen
- Title: Neural Networks and Deep Learning
- Publisher: Determination Press, 2015
- Original website: neuralnetworksanddeeplearning.com
The following material in the notes is adapted from Nielsen:
- The introductory framing of the backpropagation algorithm.
- The toy example contrasting MSE and cross-entropy losses on a saturating sigmoid neuron.
- A small number of notational conventions for layered networks.
Where Nielsen’s material has been adapted, credit is given to the original author. The licensing status of adapted material is summarized in License.
Other cited works
Throughout the notes, individual papers, books, and historical references are cited inline. These references are citations, not redistributions: the cited works remain entirely the property of their respective authors and publishers, and this site does not claim any rights over them. Citations are provided for scholarly attribution and for guiding the reader to primary sources.
A non-exhaustive list of works cited inline across the site:
| Domain | Examples of cited works |
|---|---|
| Foundations of NNs | Rosenblatt (1958), Minsky and Papert (1969), Rumelhart, Hinton and Williams (1986), Linnainmaa (1970), Werbos (1974) |
| Recurrent and gated architectures | Hochreiter and Schmidhuber (1997), Gers, Schmidhuber and Cummins (2000) |
| Attention and Transformers | Bahdanau, Cho and Bengio (2014), Luong, Pham and Manning (2015), Vaswani et al. (2017), Devlin et al. (2018), Brown et al. (2020), Dao et al. (2022) |
| State Space Models | Gu et al. (2021-2023), Gu and Dao (2023), Dao and Gu (2024) |
| Optimization and scaling | Polyak (1964), Nesterov (1983), Sutskever et al. (2013), Kingma and Ba (2014), Loshchilov and Hutter (2017, 2019), Kaplan et al. (2020), Hoffmann et al. (2022) |
| Vision and CNNs | LeCun et al. (1989), Krizhevsky, Sutskever and Hinton (2012), He et al. (2015, 2016) |
| Universal approximation | Cybenko (1989), Hornik (1991), Telgarsky (2016), Eldan and Shamir (2016) |
| Explainability | Selvaraju et al. (2017, Grad-CAM), Kim et al. (2018, CAV) |
Where third-party images or diagrams are embedded, attribution is provided at the point of use. Licensing and usage terms are summarized in License.