Read the Overview First

This note is part of the Micrograd series. Before continuing, read the Micrograd overview: it explains how these Markdown notes are organized, how the code carries across notes, and how to read or reproduce the material step by step.

Rebuilding the Micrograd neuron in PyTorch

From scalar values to tensors

Micrograd is a scalar-valued autograd engine: each Value object stores one scalar number and one scalar gradient.

PyTorch, instead, is built around tensors. To reproduce the same neuron example as directly as possible, scalar-valued tensors are used. Each tensor below contains a single scalar element.
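As a small illustration of what "scalar-valued tensor" means here, the sketch below creates a one-element tensor and inspects it. The variable name t is chosen only for this example.

```python
import torch

# A one-element tensor: a single dimension of length 1.
t = torch.tensor([2.0])

print(t.shape)    # the shape has one dimension of length 1
print(t.numel())  # total number of elements in the tensor
print(t.item())   # the single element as a plain Python float
```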

Precision and gradient tracking

Python floats are double-precision (64-bit) numbers, which corresponds to float64 in PyTorch. The Micrograd Value class stores its data as Python floats, so the PyTorch version is cast to double precision with .double() to use the same floating-point type.

By default:

torch.tensor([2.0]).dtype

evaluates to torch.float32, because PyTorch creates 32-bit tensors from Python floats by default. After calling .double():

torch.tensor([2.0]).double().dtype

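the tensor uses torch.float64, the same floating-point type as the Python floats in the Micrograd implementation. Note that .double() widens a tensor that was already created in float32, so a value that float32 cannot represent exactly, such as the bias b used in this note, keeps its float32 rounding. The other values used here (2.0, 0.0, -3.0, 1.0) are exact in float32.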
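The effect of casting after creation can be checked with a short sketch. It compares the cast used in this note with passing dtype=torch.float64 at creation time; the names b_py, b_cast, and b_direct are introduced here only for illustration, and the sketch assumes PyTorch's default dtype has not been changed.

```python
import torch

b_py = 6.8813735870195432  # the bias value used in this note, as a Python float

# Created as float32 first, then widened: the float32 rounding is kept.
b_cast = torch.tensor([b_py]).double()

# Created directly as float64: the Python float is stored unchanged.
b_direct = torch.tensor([b_py], dtype=torch.float64)

print(b_cast.dtype, b_direct.dtype)  # both tensors report float64
print(b_cast.item() == b_py)         # False: the value was rounded to float32
print(b_direct.item() == b_py)       # True: no rounding took place
```

Both tensors have the same dtype, but only the second holds exactly the same number as the Python float.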

Building the same neuron

The same scalar neuron is rebuilt by representing x1, x2, w1, w2, and b as PyTorch tensors containing a single scalar value.

The operations then produce intermediate tensors. In particular, n holds the pre-activation value x1*w1 + x2*w2 + b, and o is the result of applying tanh to n.

The tensor o plays the same role as the output Value object in Micrograd.

Tensor values and gradients

In Micrograd, a Value stores the scalar value in .data and the gradient in .grad.

PyTorch tensors also expose .data and .grad. Since o is a tensor containing a single scalar, .item() extracts that scalar as a plain Python number.

Finally, o.backward() asks PyTorch to run automatic backpropagation from the output tensor o, just as o.backward() did in the custom Value class.

import torch

# Leaf tensors: created as float32, cast to float64, then flagged for gradient tracking.
x1 = torch.tensor([2.0]).double()               ; x1.requires_grad = True
x2 = torch.tensor([0.0]).double()               ; x2.requires_grad = True
w1 = torch.tensor([-3.0]).double()              ; w1.requires_grad = True
w2 = torch.tensor([1.0]).double()               ; w2.requires_grad = True
b = torch.tensor([6.8813735870195432]).double() ; b.requires_grad  = True

# Forward pass: weighted sum plus bias, then the tanh activation.
n = x1*w1 + x2*w2 + b
o = torch.tanh(n)

print(o.data.item())

# Backward pass: fills .grad on every leaf tensor that requires gradients.
o.backward()

print('---')
print('x2', x2.grad.item())
print('w2', w2.grad.item())
print('x1', x1.grad.item())
print('w1', w1.grad.item())

Output:

0.7071066904050358
---
x2 0.5000001283844369
w2 0.0
x1 -1.5000003851533106
w1 1.0000002567688737
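The gradients above can be cross-checked by hand with the chain rule: since o = tanh(n), the local derivative is do/dn = 1 - tanh(n)^2, and each input's gradient is that factor times the value it is multiplied by. The sketch below is one way to run that check; the helper leaf and the dictionary expected are names introduced here for illustration and are not part of the original code.

```python
import math
import torch

def leaf(v):
    # Same construction as in the note: float32 first, then cast to float64.
    t = torch.tensor([v]).double()
    t.requires_grad = True
    return t

x1, x2, w1, w2 = leaf(2.0), leaf(0.0), leaf(-3.0), leaf(1.0)
b = leaf(6.8813735870195432)

o = torch.tanh(x1*w1 + x2*w2 + b)
o.backward()

# Chain rule by hand: do/dn = 1 - tanh(n)^2, dn/dx1 = w1, dn/dw1 = x1, and so on.
n_val = x1.item()*w1.item() + x2.item()*w2.item() + b.item()
local = 1.0 - math.tanh(n_val)**2
expected = {
    "x1": w1.item() * local,
    "w1": x1.item() * local,
    "x2": w2.item() * local,
    "w2": x2.item() * local,
}

# Print the autograd result next to the hand-computed value.
for name, t in [("x1", x1), ("w1", w1), ("x2", x2), ("w2", w2)]:
    print(name, t.grad.item(), expected[name])
```

The two columns should agree up to floating-point rounding.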

Why requires_grad = True is needed

PyTorch builds an autograd graph for operations that involve tensors with requires_grad=True.

In a real PyTorch neural network, weights and biases are usually stored as nn.Parameter objects, and nn.Parameter has requires_grad=True by default.
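A minimal sketch of that default, using a hand-made nn.Parameter and a built-in nn.Linear layer. The layer size (2 inputs, 1 output) is chosen here only to mirror the two-input neuron of this note.

```python
import torch
from torch import nn

# Wrapping a tensor in nn.Parameter turns gradient tracking on automatically.
w = nn.Parameter(torch.tensor([-3.0, 1.0]))
print(w.requires_grad)

# Built-in layers store their weights and biases as parameters.
layer = nn.Linear(in_features=2, out_features=1)
for name, p in layer.named_parameters():
    print(name, p.requires_grad)
```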

In this note, however, the neuron is built manually from raw tensors, which have requires_grad=False by default. The tensors x1, x2, w1, w2, and b are leaf tensors because they are not produced by earlier operations. Setting requires_grad = True tells PyTorch to track operations on these tensors and to accumulate their gradients in the .grad fields during the backward pass.
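The difference between a leaf tensor and a tensor produced by an operation can be inspected through the is_leaf and grad_fn attributes. The following is a small sketch with illustrative names a and c.

```python
import torch

a = torch.tensor([2.0])
print(a.requires_grad)   # False: tracking is off unless requested

a.requires_grad = True
c = a * 3.0              # produced by an operation, so not a leaf

print(a.is_leaf, a.grad_fn)          # True None
print(c.is_leaf, c.grad_fn is None)  # False False

c.backward()
print(a.grad)            # tensor([3.])
```

Only the leaf a receives a value in .grad; the intermediate tensor c records the operation that created it in grad_fn.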

In a normal training setup, gradients are usually needed for parameters such as weights and biases, while raw input data often does not require gradients. In this demonstration, gradients are enabled for both the inputs and the parameters so the PyTorch results can be compared directly with Micrograd.
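For contrast, the sketch below follows the more typical arrangement, where only the parameter tracks gradients and the input is plain data. It uses two-element tensors x and w, which are assumptions of this example rather than code from the note.

```python
import torch

x = torch.tensor([2.0, 0.0], dtype=torch.float64)                       # input data
w = torch.tensor([-3.0, 1.0], dtype=torch.float64, requires_grad=True)  # parameter

o = torch.tanh((x * w).sum())
o.backward()

print(x.grad)  # None: no gradient was requested for the input
print(w.grad)  # populated by the backward pass
```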

Micrograd as a scalar special case

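The conceptual structure is the same in Micrograd and PyTorch: build a computation graph during the forward pass, then call backward() on the output to populate gradients. Micrograd does this for individual scalars, while PyTorch does it for tensors of any shape, so the scalar engine can be seen as a special case of the tensor one.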
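The graph that PyTorch records during the forward pass can be inspected through grad_fn and its next_functions attribute. The sketch below walks that graph for a one-weight version of the neuron; the helper walk is hypothetical, and the node class names it prints are PyTorch internals that may differ between versions.

```python
import torch

x = torch.tensor([2.0], dtype=torch.float64, requires_grad=True)
w = torch.tensor([-3.0], dtype=torch.float64, requires_grad=True)
o = torch.tanh(x * w)

def walk(node, depth=0, out=None):
    # Collect the class name of each backward node, depth first.
    out = [] if out is None else out
    out.append("  " * depth + type(node).__name__)
    for parent, _ in node.next_functions:
        if parent is not None:  # None marks an input that needs no gradient
            walk(parent, depth + 1, out)
    return out

names = walk(o.grad_fn)
print("\n".join(names))
```

The printed tree starts at the tanh node and ends at the nodes that accumulate gradients into the leaf tensors, which is the order backward() follows.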

Same idea, larger engine

With scalar-valued tensors, PyTorch reproduces the same computation that was built manually in Micrograd.

The difference is that PyTorch is designed for tensor operations, not only scalar operations. This makes it significantly more efficient in practice, because a single tensor operation processes many elements at once and can run in parallel on optimized CPU or GPU kernels.
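As a sketch of the tensor-oriented style, the same neuron can be written with one input vector and one weight vector instead of four separate scalars. This version is an assumption of this example, not the note's original code, and it creates the tensors directly in float64, so its output differs slightly from the numbers printed earlier.

```python
import torch

x = torch.tensor([2.0, 0.0], dtype=torch.float64, requires_grad=True)
w = torch.tensor([-3.0, 1.0], dtype=torch.float64, requires_grad=True)
b = torch.tensor(6.8813735870195432, dtype=torch.float64, requires_grad=True)

# One dot product replaces the separate x1*w1 and x2*w2 terms.
o = torch.tanh(torch.dot(w, x) + b)
o.backward()

print(o.item())
print(x.grad)  # gradients for both inputs in one tensor
print(w.grad)  # gradients for both weights in one tensor
```

The same code works unchanged for vectors of any matching length, which is where the tensor formulation starts to pay off.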