The Local Receptive Field (LRF) of a neuron within a feature map delineates the exact sub-volume of the input (i.e., the raw image or the feature maps of the previous layer) that directly drives its activation. It defines the restricted spatial context from which a single unit can extract information.
Etymology
- Receptive field is a term borrowed from neurophysiology: cells in the visual cortex activate only when a stimulus appears in a specific, restricted region of the retina.
- It is Local because, unlike in an MLP, where each neuron of a layer is connected to all the neurons of the previous layer (i.e., full connectivity), a CNN neuron is connected only to a small, localized topological neighborhood of the previous layer.
Let $d$ be the spatial dimensionality of the data:
- $d = 1$: Signals (e.g., audio, text)
- $d = 2$: Images or spectrograms
- $d = 3$: Videos or volumetric data (e.g., CT, MRI scans)
- $d > 3$: Multi-dimensional tensors (e.g., scientific data)
A Local Receptive Field defines a spatial window of shape $k_1 \times k_2 \times \dots \times k_d$. However, it is crucial to understand that CNNs employ dense, global connectivity across the channel dimension ($C_{l-1}$, the number of channels in the previous layer). Therefore, the actual volume of activations from the previous layer $l-1$ to which a single neuron in layer $l$ is sensitive has the shape:
$$k_1 \times k_2 \times \dots \times k_d \times C_{l-1}$$
The 2D Case ($d = 2$)
If the spatial window is $k \times k$ (i.e., $k_1 = k_2 = k$), the neuron does not just look at a flat square:
- If the input is an RGB image ($C_{l-1} = 3$), the neuron observes a $k \times k \times 3$ volume of raw pixel values.
- If the input is a hidden layer with 64 feature maps ($C_{l-1} = 64$), the neuron observes a $k \times k \times 64$ volume of abstract feature activations.
The concept remains the same; only the nature of the observed units (raw pixels vs. abstract features) changes when moving from the input layer to deeper hidden layers.
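The shape rule above (a spatial window extended across every input channel) can be sketched in plain Python. This is only an illustration: the function names and the 3 x 3 window are assumptions chosen for the example, not values taken from the text.

```python
from math import prod

def lrf_volume_shape(window, in_channels):
    """Shape of the input sub-volume seen by one neuron:
    the spatial window followed by the full channel dimension."""
    return tuple(window) + (in_channels,)

def lrf_num_inputs(window, in_channels):
    """Number of individual activations inside that sub-volume."""
    return prod(window) * in_channels

window = (3, 3)  # hypothetical 2D spatial window

# Input layer: RGB image with 3 channels.
rgb_shape = lrf_volume_shape(window, 3)
rgb_count = lrf_num_inputs(window, 3)

# Hidden layer: 64 feature maps.
hidden_shape = lrf_volume_shape(window, 64)
hidden_count = lrf_num_inputs(window, 64)

print(rgb_shape, rgb_count)        # (3, 3, 3) 27
print(hidden_shape, hidden_count)  # (3, 3, 64) 576
```

The spatial window is identical in both cases; only the channel depth, and therefore the number of observed values, changes.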
Isotropic vs. Anisotropic LRFs
If the window has the same extent $k$ along all $d$ spatial dimensions ($k_1 = k_2 = \dots = k_d = k$), the spatial LRF is an isotropic hypercube of size $k^d$ (e.g., $k \times k$ in 2D).
While isotropic spatial kernels are the most common in standard CNNs, some domains benefit from anisotropic (non-square / non-cubic) LRFs:
- Text / OCR: Tall and narrow windows (height greater than width) capture vertical strokes efficiently.
- Audio / Spectrograms: Wider windows (width greater than height) capture distinct frequency bands over time.
- Medical Imaging: Anisotropic 3D kernels match the uneven physical slice spacing in CT/MRI volumes.
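
To make the difference between window shapes concrete, here is a minimal sketch that cuts a tall window and a wide window out of a small 2D grid. The helper names, the 6 x 6 grid, and the 5 x 1 and 1 x 5 window sizes are all hypothetical choices for illustration.

```python
def extract_window(grid, top, left, height, width):
    """Return the height x width spatial window of a 2D grid
    whose top-left corner sits at (top, left)."""
    return [row[left:left + width] for row in grid[top:top + height]]

def is_isotropic(window_shape):
    """True when every spatial extent of the window is equal."""
    return len(set(window_shape)) == 1

# Toy 6 x 6 single-channel input; each cell holds its row-major index.
grid = [[r * 6 + c for c in range(6)] for r in range(6)]

tall = extract_window(grid, top=0, left=2, height=5, width=1)  # 5 x 1 window
wide = extract_window(grid, top=2, left=0, height=1, width=5)  # 1 x 5 window

print(tall)  # [[2], [8], [14], [20], [26]]
print(wide)  # [[12, 13, 14, 15, 16]]
print(is_isotropic((3, 3)), is_isotropic((5, 1)))  # True False
```

Both windows cover five positions, but the tall one spans five rows of a single column while the wide one spans five columns of a single row.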
LRF vs. Dense Connectivity
By restricting the neuron’s vision to a local window, CNNs directly address the Loss of Spatial Prior inherent to Multi-Layer Perceptrons. Instead of flattening the image into a vector, which discards the explicit neighborhood structure among pixels, the local window preserves the spatial arrangement. Edges, textures, and geometric patterns remain available to the neuron as local patterns.
| | MLP Neuron | CNN Neuron |
|---|---|---|
| 🔗 Connectivity | Fully connected | Local (connected only to the LRF) |
| 👁️ Receptive Field | Global (sees the entire flattened input) | Limited (sees only $k^d \times C_{l-1}$ activations in the case of an isotropic LRF of side $k$ in $d$ spatial dimensions with $C_{l-1}$ input channels) |
| Geometry | Destroyed (Loss of spatial prior) | Preserved (Spatial arrangement intact) |
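
The contrast in the table can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical 32 x 32 RGB input and a 3 x 3 window; the function names are illustrative only.

```python
from math import prod

def inputs_per_mlp_neuron(spatial_shape, channels):
    """A fully connected neuron reads every value of the flattened input."""
    return prod(spatial_shape) * channels

def inputs_per_cnn_neuron(window, channels):
    """A convolutional neuron reads only its local window, across all channels."""
    return prod(window) * channels

def flat_index(row, col, width):
    """Position of pixel (row, col) after row-major flattening of one channel."""
    return row * width + col

height, width, channels = 32, 32, 3  # hypothetical RGB input
window = (3, 3)                      # hypothetical isotropic LRF

mlp_inputs = inputs_per_mlp_neuron((height, width), channels)
cnn_inputs = inputs_per_cnn_neuron(window, channels)

# Vertically adjacent pixels end up `width` positions apart in the flat vector.
gap = flat_index(11, 10, width) - flat_index(10, 10, width)

print(mlp_inputs, cnn_inputs, gap)  # 3072 27 32
```

Under these assumptions the MLP neuron reads 3072 values with no built-in notion of adjacency (two vertically neighboring pixels sit 32 positions apart after flattening), while the CNN neuron reads 27 values that are adjacent by construction.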

Final Remark
A neuron does not “see” the entire input, but only the local portion defined by the LRF.