Strided convolution is a variant of the standard (vanilla) convolution introduced earlier, and a technique that enables downsampling (reduction of spatial resolution).
The term strided implies that the convolutional filter is not shifted by a single pixel at a time, as in vanilla convolution, but is instead shifted by $s$ pixels at a time in both directions, where $s$ is the stride.
If $s = 1$ (default): the filter moves 1 pixel at a time (horizontally and vertically in the 2D case).
If $s = 2$: the local receptive field shifts by 2 pixels at a time in both directions
- this produces a smaller output
- equivalent to downsampling by a factor of 2 (the output-size formula below makes this precise)
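For reference, the output size follows the standard convolution-arithmetic formula (the symbols $i$, $k$, $p$, $s$ for input size, kernel size, padding, and stride are our notation, not from the original notes):

$$
o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1
$$

For example, with $i = 32$, $k = 3$, $p = 1$, $s = 2$: $o = \lfloor 31/2 \rfloor + 1 = 16$, i.e. the resolution is halved.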
Important
Strided convolution (like pooling) implements downsampling; however, unlike pooling, it does so in a learnable fashion, because convolutional filters have learnable weights.
*(Figure: convolution animations for four cases: no padding with stride 1, no padding with stride 2, padding with stride 1, and padding with stride 2.)*
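A minimal shape check (a sketch assuming PyTorch, which the original notes do not specify) reproduces the four cases above with a 3x3 kernel on a small 6x6 input:

```python
# Sketch: output shapes for the four padding/stride combinations above.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 6, 6)  # (batch, channels, height, width)

for padding in (0, 1):
    for stride in (1, 2):
        conv = nn.Conv2d(in_channels=1, out_channels=1,
                         kernel_size=3, stride=stride, padding=padding)
        out = conv(x)
        print(f"padding={padding}, stride={stride} -> {tuple(out.shape[2:])}")

# padding=0, stride=1 -> (4, 4)
# padding=0, stride=2 -> (2, 2)   half the resolution
# padding=1, stride=1 -> (6, 6)
# padding=1, stride=2 -> (3, 3)   half the resolution
```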
PROS
- Learnable: it performs data-adaptive downsampling.
- Achieves two things at once (see the sketch after this list):
  - Extracts relevant features (like a convolution)
  - Reduces resolution (like pooling)
- Since convolutional filters are learnable feature extractors, those used in strided convolution can also learn meaningful features during training.
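A sketch of the "two things at once" point (again assuming PyTorch): a single strided convolution yields the same output shape as a convolution followed by pooling, but in one learnable layer that extracts features and downsamples simultaneously.

```python
# Sketch: one strided convolution replaces a conv + pooling pair.
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

conv_then_pool = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # features: 32x32 -> 32x32
    nn.MaxPool2d(kernel_size=2),                  # downsample: 32x32 -> 16x16
)
strided_conv = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)

print(conv_then_pool(x).shape)  # torch.Size([1, 32, 16, 16])
print(strided_conv(x).shape)    # torch.Size([1, 32, 16, 16])
```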
CONS
- Learnable: since the filters have weights, training also produces gradient components for the weights of the strided convolution, and this makes gradient propagation more difficult.
Strided Convolution vs Pooling
During backpropagation (right-to-left through the network):
- When passing through a max pooling layer, which has no learnable weights, there is no extra contribution to the gradient (no overhead from new gradient components): the gradient flows smoothly and only encounters convolutional layers, where backpropagation may eventually stall due to dying-ReLU or vanishing-gradient issues.
- With strided convolution, instead:
  - The layer introduces new gradient components
  - These can exacerbate the vanishing gradient problem, unlike pooling, which does not contribute in this way (a parameter-count sketch follows this list)
- Strided convolution is also computationally more expensive than pooling.
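The parameter-count difference behind this comparison can be checked directly (a sketch assuming PyTorch): pooling has no learnable weights, so it adds no gradient components during backpropagation, while a strided convolution does.

```python
# Sketch: pooling has zero parameters; a strided convolution has many.
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)
strided = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)

print(sum(p.numel() for p in pool.parameters()))     # 0: nothing to update
print(sum(p.numel() for p in strided.parameters()))  # 4640 = 16*32*3*3 + 32
```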
📌 In summary
Strided convolution implements downsampling in a smart and adaptive way, but it introduces greater computational complexity and may hinder gradient flow, contributing to the vanishing gradient phenomenon.