Important
Discrete convolution can be viewed as multiplication by a matrix, but the matrix has several entries constrained to be equal to other entries.
For example, for univariate (1D) discrete convolution, each row of the matrix is constrained to be equal to the row above, shifted by one element.
The Matrix Representation of 1D ‘Valid’ Discrete Convolution
Let’s consider a simple 1D signal and a kernel:
- Input Signal: $x = [x_1, x_2, x_3, x_4, x_5]$
- Kernel: $k = [k_1, k_2, k_3]$

A ‘valid’ convolution (implying no padding or kernel flipping) calculates the output only at positions where the kernel fully overlaps the signal:

$$y_i = \sum_{j=1}^{3} x_{i+j-1}\, k_j, \qquad i = 1, 2, 3$$

This operation can be expressed as a single matrix multiplication ($y = Kx$) by constructing a corresponding Toeplitz matrix $K$, as follows:

$$K = \begin{bmatrix} k_1 & k_2 & k_3 & 0 & 0 \\ 0 & k_1 & k_2 & k_3 & 0 \\ 0 & 0 & k_1 & k_2 & k_3 \end{bmatrix}, \qquad Kx = \begin{bmatrix} k_1 x_1 + k_2 x_2 + k_3 x_3 \\ k_1 x_2 + k_2 x_3 + k_3 x_4 \\ k_1 x_3 + k_2 x_4 + k_3 x_5 \end{bmatrix}$$
This shows how the sliding operation of convolution can be regarded as a single matrix multiplication.
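To check this equivalence numerically, the sketch below builds $K$ for concrete (illustrative) values and compares $Kx$ with NumPy's `np.correlate`, which slides the kernel without flipping it, matching the ‘valid’ operation above:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])   # input signal (5 elements, illustrative)
k = np.array([10., 20., 30.])        # kernel (3 elements, illustrative)

# Toeplitz matrix K: each row holds the kernel, shifted one position right.
K = np.array([
    [10., 20., 30.,  0.,  0.],
    [ 0., 10., 20., 30.,  0.],
    [ 0.,  0., 10., 20., 30.],
])

y_matrix = K @ x                             # convolution as matrix multiplication
y_slide  = np.correlate(x, k, mode="valid")  # sliding 'valid' correlation, no flip
assert np.allclose(y_matrix, y_slide)
print(y_matrix)  # [140. 200. 260.]
```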
Toeplitz matrix
Looking at the matrix $K$ above, the following pattern can be observed:
- the second row is exactly the first row shifted one position to the right.
- the third row is the second row shifted one position to the right.
This special structure, with constant values along each diagonal, is precisely what defines a Toeplitz matrix.
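For larger inputs the matrix need not be written by hand: SciPy's `scipy.linalg.toeplitz` builds a Toeplitz matrix from its first column and first row, which is enough to generate the pattern above. A minimal sketch, assuming the same 5-element input and 3-element kernel:

```python
import numpy as np
from scipy.linalg import toeplitz

k = np.array([10., 20., 30.])   # kernel (illustrative values)
n, m = 5, k.size                # input length, kernel length
rows = n - m + 1                # number of 'valid' output positions

# First column: k[0] followed by zeros; first row: the kernel padded with zeros.
K = toeplitz(np.r_[k[0], np.zeros(rows - 1)],
             np.r_[k, np.zeros(n - m)])
print(K)
# [[10. 20. 30.  0.  0.]
#  [ 0. 10. 20. 30.  0.]
#  [ 0.  0. 10. 20. 30.]]
```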
2D case
In the 2D case, convolution corresponds to multiplication by a doubly block circulant matrix.
The underlying principle remains the same, but the matrix structure becomes more elaborate in order to manage the 2D nature of the image.
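To make this concrete, the sketch below builds such a matrix for a small image and checks it against `scipy.signal.correlate2d` (which, like the 1D example, slides the kernel without flipping it). Strictly speaking, the ‘valid’ case shown here yields a doubly block *Toeplitz* matrix; the circulant form arises when the convolution wraps around the image borders. All sizes are illustrative:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
H, W = 4, 4                      # image size (illustrative)
h, w = 2, 2                      # kernel size (illustrative)
img = rng.standard_normal((H, W))
ker = rng.standard_normal((h, w))

out_h, out_w = H - h + 1, W - w + 1   # 'valid' output size: 3x3

# Build M so that M @ img.ravel() equals the 'valid' correlation.
# Rows come in out_w-sized blocks; each block repeats shifted by W columns,
# which is the block structure referred to above.
M = np.zeros((out_h * out_w, H * W))
for i in range(out_h):
    for j in range(out_w):
        for di in range(h):
            for dj in range(w):
                M[i * out_w + j, (i + di) * W + (j + dj)] = ker[di, dj]

y_matrix = (M @ img.ravel()).reshape(out_h, out_w)
y_slide = correlate2d(img, ker, mode="valid")
assert np.allclose(y_matrix, y_slide)
```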
In addition to these equality constraints between elements, convolution usually corresponds to a sparse matrix (a matrix whose elements are mostly zero).
This is because the kernel is generally much smaller than the input image.
Looking again at the Toeplitz matrix $K$ above, notice how many zeros are present: this is because the kernel (3 elements) is smaller than the input (5 elements).
In a real case, with an image of millions of pixels and a small kernel (say, $3 \times 3$), the matrix would be enormous but almost entirely filled with zeros.
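The fraction of nonzero entries can be read off directly: each row of the matrix holds only the kernel's entries, so the density is the kernel size divided by the input size. A rough illustration (the $3 \times 3$ kernel size is an assumption for the example):

```python
# Each row of the convolution matrix holds only the h*w kernel entries
# out of H*W columns, so the fraction of nonzeros is (h*w) / (H*W).
h = w = 3                          # 3x3 kernel (illustrative)
for H in (10, 100, 1000):          # square H x H images
    print(f"{H}x{H} image: nonzero fraction = {(h * w) / (H * H):.0e}")
# 10x10 image: nonzero fraction = 9e-02
# 100x100 image: nonzero fraction = 9e-04
# 1000x1000 image: nonzero fraction = 9e-06
```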
Note
Any neural network algorithm that works with matrix multiplication and does not depend on specific properties of the matrix structure should work with convolution, without requiring any further changes to the neural network.
Typical convolutional neural networks do make use of further specializations in order to deal with large inputs efficiently, but these are not strictly necessary from a theoretical perspective.