An equation for the error in the output layer, $\delta^L$

The components of $\delta^L$ are given by
\begin{equation}
  \delta^L_j = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j).
  \label{eq:BP1}
\end{equation}
This is a very natural expression:
- the first term on the right, $\partial C / \partial a^L_j$, just measures how fast the cost is changing as a function of the $j$-th output activation. If, for example, $C$ doesn't depend much on a particular output neuron $j$, then $\delta^L_j$ will be small, which is what we'd expect.
- the second term on the right, $\sigma'(z^L_j)$, measures how fast the activation function $\sigma$ is changing at $z^L_j$ (the example just after this list makes this concrete for the sigmoid).
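For concreteness, suppose the activation function is the sigmoid $\sigma(z) = 1/(1 + e^{-z})$ (an assumption about the particular network; nothing in Eq.~(\ref{eq:BP1}) depends on this choice). Then the second term has an especially convenient form, computable directly from the activation itself:
\[
  \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr).
\]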
Notice that everything in Eq.~(\ref{eq:BP1}) is easily computed. In particular, we compute $z^L_j$ while computing the behavior of the network, and it's only a small additional overhead to compute $\sigma'(z^L_j)$. The exact form of $\partial C / \partial a^L_j$ will, of course, depend on the form of the cost function. However, provided the cost function is known there should be little trouble computing $\partial C / \partial a^L_j$. For example, if we're using the quadratic cost function then $C = \frac{1}{2} \sum_j (y_j - a^L_j)^2$, and so $\partial C / \partial a^L_j = a^L_j - y_j$, which obviously is easily computable.
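To spell out that last step: only the $k = j$ term of the sum depends on $a^L_j$, so
\[
  \frac{\partial C}{\partial a^L_j}
  = \frac{\partial}{\partial a^L_j}\,\frac{1}{2}\sum_k \bigl(y_k - a^L_k\bigr)^2
  = -\bigl(y_j - a^L_j\bigr)
  = a^L_j - y_j.
\]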
Equation~(\ref{eq:BP1}) is a componentwise expression for $\delta^L$. It's a perfectly good expression, but not the matrix-based form we want for backpropagation. However, it's easy to rewrite the equation in a matrix-based form, as
\begin{equation}
  \delta^L = \nabla_a C \odot \sigma'(z^L).
  \label{eq:BP1a}
\end{equation}
Here, $\nabla_a C$ is defined to be a vector whose components are the partial derivatives $\partial C / \partial a^L_j$. You can think of $\nabla_a C$ as expressing the rate of change of $C$ with respect to the output activations. It's easy to see that Equations (\ref{eq:BP1a}) and (\ref{eq:BP1}) are equivalent, and for that reason from now on we'll use (\ref{eq:BP1}) interchangeably to refer to both equations. As an example, in the case of the quadratic cost we have $\nabla_a C = (a^L - y)$, and so the fully matrix-based form of (\ref{eq:BP1}) becomes
\[
  \delta^L = (a^L - y) \odot \sigma'(z^L).
\]
As you can see, everything in this expression has a nice vector form, and is easily computed using a library such as Numpy.
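As a minimal sketch of that computation (assuming a sigmoid activation and the quadratic cost, with hypothetical arrays `a_L`, `y`, and `z_L` holding the output activations, desired outputs, and weighted inputs):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation, applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z)).
    return sigmoid(z) * (1.0 - sigmoid(z))

def output_error(a_L, y, z_L):
    # delta^L = (a^L - y) elementwise-multiplied by sigma'(z^L):
    # the gradient of the quadratic cost w.r.t. the output activations,
    # Hadamard-multiplied by the derivative of the activation function.
    return (a_L - y) * sigmoid_prime(z_L)

# Hypothetical example: a layer of three output neurons.
z_L = np.array([0.5, -1.2, 2.0])   # weighted inputs to the output layer
a_L = sigmoid(z_L)                 # output activations
y = np.array([1.0, 0.0, 0.0])      # desired output
print(output_error(a_L, y, z_L))
```

The elementwise `*` plays the role of the Hadamard product $\odot$ above, so no explicit loop over components is needed.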
Interpretation
