Deep Learning with CNNs consists of using convolutional filters to extract features from an image, in a way similar to traditional Image Processing.
DL vs. Image Processing Domain
There is a fundamental difference between DL and the domain of Image Processing. In DL:
Filter weights are not handcrafted (manually designed), but are learned automatically by the network during training.
Instead of using a single filter, the network applies many filters per layer.
Each filter extracts different features, corresponding to different representations of the data: this allows the network to learn multiple mappings simultaneously.
The Missing Piece 🧩: Looking Farther Ahead
Important
To obtain features at higher levels of abstraction, local convolution alone is not enough.
It is necessary to look from farther away, that is, to change scale. A crucial step is still missing to achieve a true hierarchy of representations: the scale must change.
To extract new features, different from those just acquired, it is essential to “zoom out” on the data. For example, edges are learned by observing small portions of the input image (patches of pixels) because within these patches, the only discernible structure is precisely the edges.
Question
But how can more abstract, higher-level details be learned?
Answer: downsampling
It is necessary to reduce the resolution, that is, to perform downsampling: it is precisely this change of scale that enables the network to capture increasingly complex patterns in the data.
By reducing the resolution of the convolutional output, subsequent layers can capture larger and more abstract structures in the data.
The Role of Downsampling
In practice, downsampling:
- Reduces the resolution of the convolutional output
- Expands the network’s field of view
- Allows subsequent layers to “see farther”
- Lays the groundwork for extracting higher-level features
Important
Downsampling therefore represents the fourth key ingredient of CNNs:
- Local Receptive Fields,
- Parameter sharing,
- Multiple feature maps,
- Downsampling (scale reduction for building hierarchical abstractions).
