An equation for the error in the output layer, $\delta^L$

The components of $\delta^L$ are given by
\begin{equation}
  \delta^L_j = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j).
  \label{eq:BP1}
\end{equation}
This is a very natural expression:
- the first term on the right, $\partial C / \partial a^L_j$, just measures how fast the cost is changing as a function of the $j$-th output activation. If, for example, $C$ doesn't depend much on a particular output neuron $j$, then $\delta^L_j$ will be small, which is what we'd expect.
- the second term on the right, $\sigma'(z^L_j)$, measures how fast the activation function $\sigma$ is changing at $z^L_j$ (the example just after this list makes this concrete for the sigmoid).
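For concreteness, suppose the activation function is the sigmoid $\sigma(z) = 1/(1 + e^{-z})$ (an assumption about the particular network; nothing in Eq.~(\ref{eq:BP1}) depends on this choice). Then the second term has an especially convenient form, computable directly from the activation itself:
\[
  \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr).
\]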
Notice that everything in Eq.~(\ref{eq:BP1}) is easily computed. In particular, we compute $z^L_j$ while computing the behavior of the network, and it's only a small additional overhead to compute $\sigma'(z^L_j)$. The exact form of $\partial C / \partial a^L_j$ will, of course, depend on the form of the cost function. However, provided the cost function is known there should be little trouble computing $\partial C / \partial a^L_j$. For example, if we're using the quadratic cost function then $C = \frac{1}{2} \sum_j (y_j - a^L_j)^2$, and so $\partial C / \partial a^L_j = a^L_j - y_j$, which obviously is easily computable.
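To spell out that last step: only the $k = j$ term of the sum depends on $a^L_j$, so
\[
  \frac{\partial C}{\partial a^L_j}
  = \frac{\partial}{\partial a^L_j}\,\frac{1}{2}\sum_k \bigl(y_k - a^L_k\bigr)^2
  = -\bigl(y_j - a^L_j\bigr)
  = a^L_j - y_j.
\]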
Equation~(\ref{eq:BP1}) is a componentwise expression for $\delta^L$. It's a perfectly good expression, but not the matrix-based form we want for backpropagation. However, it's easy to rewrite the equation in a matrix-based form, as
\begin{equation}
  \delta^L = \nabla_a C \odot \sigma'(z^L).
  \label{eq:BP1a}
\end{equation}
Here, $\nabla_a C$ is defined to be a vector whose components are the partial derivatives $\partial C / \partial a^L_j$. You can think of $\nabla_a C$ as expressing the rate of change of $C$ with respect to the output activations. It's easy to see that Equations (\ref{eq:BP1a}) and (\ref{eq:BP1}) are equivalent, and for that reason from now on we'll use (\ref{eq:BP1}) interchangeably to refer to both equations. As an example, in the case of the quadratic cost we have $\nabla_a C = (a^L - y)$, and so the fully matrix-based form of (\ref{eq:BP1}) becomes
\[
  \delta^L = (a^L - y) \odot \sigma'(z^L).
\]
As you can see, everything in this expression has a nice vector form, and is easily computed using a library such as Numpy.
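As a minimal sketch of that computation (assuming a sigmoid activation and the quadratic cost, with hypothetical arrays `a_L`, `y`, and `z_L` holding the output activations, desired outputs, and weighted inputs):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation, applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z)).
    return sigmoid(z) * (1.0 - sigmoid(z))

def output_error(a_L, y, z_L):
    # delta^L = (a^L - y) elementwise-multiplied by sigma'(z^L):
    # the gradient of the quadratic cost w.r.t. the output activations,
    # Hadamard-multiplied by the derivative of the activation function.
    return (a_L - y) * sigmoid_prime(z_L)

# Hypothetical example: a layer of three output neurons.
z_L = np.array([0.5, -1.2, 2.0])   # weighted inputs to the output layer
a_L = sigmoid(z_L)                 # output activations
y = np.array([1.0, 0.0, 0.0])      # desired output
print(output_error(a_L, y, z_L))
```

The elementwise `*` plays the role of the Hadamard product $\odot$ above, so no explicit loop over components is needed.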
Interpretation
