Strided convolution is a variant of the standard (vanilla) convolution introduced earlier, and a technique that enables downsampling (reduction of spatial resolution).
The term strided implies that the convolutional filter is not shifted by a single pixel at a time, as in vanilla convolution, but is instead shifted by $s$ pixels at a time in both directions, where $s$ is the stride.
If $s = 1$ (default): the filter moves 1 pixel at a time (horizontally and vertically in the 2D case).
If $s = 2$: the local receptive field shifts by 2 pixels at a time in both directions
- this produces a smaller output
- equivalent to downsampling by a factor of 2 (the output-size formula below makes this precise)
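For reference, the output size follows the standard convolution-arithmetic formula (the symbols $i$, $k$, $p$, $s$ for input size, kernel size, padding, and stride are our notation, not from the original notes):

$$
o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1
$$

For example, with $i = 32$, $k = 3$, $p = 1$, $s = 2$: $o = \lfloor 31/2 \rfloor + 1 = 16$, i.e. the resolution is halved.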
Important
Strided convolution (like pooling) implements downsampling; however, unlike pooling, it does so in a learnable fashion, because convolutional filters have learnable weights.
*(Figure: convolution animations for four cases: no padding with stride 1, no padding with stride 2, padding with stride 1, and padding with stride 2.)*
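A minimal shape check (a sketch assuming PyTorch, which the original notes do not specify) reproduces the four cases above with a 3x3 kernel on a small 6x6 input:

```python
# Sketch: output shapes for the four padding/stride combinations above.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 6, 6)  # (batch, channels, height, width)

for padding in (0, 1):
    for stride in (1, 2):
        conv = nn.Conv2d(in_channels=1, out_channels=1,
                         kernel_size=3, stride=stride, padding=padding)
        out = conv(x)
        print(f"padding={padding}, stride={stride} -> {tuple(out.shape[2:])}")

# padding=0, stride=1 -> (4, 4)
# padding=0, stride=2 -> (2, 2)   half the resolution
# padding=1, stride=1 -> (6, 6)
# padding=1, stride=2 -> (3, 3)   half the resolution
```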
PROS
- Learnable: it performs data-adaptive downsampling.
- Achieves two things at once (see the sketch after this list):
  - Extracts relevant features (like a convolution)
  - Reduces resolution (like pooling)
- Since convolutional filters are learnable feature extractors, those used in strided convolution can also learn meaningful features during training.
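A sketch of the "two things at once" point (again assuming PyTorch): a single strided convolution yields the same output shape as a convolution followed by pooling, but in one learnable layer that extracts features and downsamples simultaneously.

```python
# Sketch: one strided convolution replaces a conv + pooling pair.
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

conv_then_pool = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # features: 32x32 -> 32x32
    nn.MaxPool2d(kernel_size=2),                  # downsample: 32x32 -> 16x16
)
strided_conv = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)

print(conv_then_pool(x).shape)  # torch.Size([1, 32, 16, 16])
print(strided_conv(x).shape)    # torch.Size([1, 32, 16, 16])
```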
CONS
- Learnable: since the filters have weights, training also produces gradient components for the weights of the strided convolution, and this makes gradient propagation more difficult.
Strided Convolution vs Pooling
During backpropagation (right-to-left through the network):
- When passing through a max pooling layer, which has no learnable weights, there is no extra contribution to the gradient (no overhead from new gradient components): the gradient flows smoothly and only encounters convolutional layers, where backpropagation may eventually stall due to dying-ReLU or vanishing-gradient issues.
- With strided convolution, instead:
  - The layer introduces new gradient components
  - These can exacerbate the vanishing gradient problem, unlike pooling, which does not contribute in this way (a parameter-count sketch follows this list)
- Strided convolution is also computationally more expensive than pooling.
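The parameter-count difference behind this comparison can be checked directly (a sketch assuming PyTorch): pooling has no learnable weights, so it adds no gradient components during backpropagation, while a strided convolution does.

```python
# Sketch: pooling has zero parameters; a strided convolution has many.
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)
strided = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)

print(sum(p.numel() for p in pool.parameters()))     # 0: nothing to update
print(sum(p.numel() for p in strided.parameters()))  # 4640 = 16*32*3*3 + 32
```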
📌 In summary
Strided convolution implements downsampling in a smart and adaptive way, but it introduces greater computational complexity and may hinder gradient flow, contributing to the vanishing gradient phenomenon.