How to calculate the total error from an arbitrary hidden layer in neural network backpropagation?

I’m following the linked tutorial, and so far everything makes sense to me. I am now trying to reason about how these formulas extend to any number of hidden layers, specifically how to proceed when you have the partial derivative of the total error with respect to a weight. In the above post, $\frac{\partial E_{total}}{\partial w_1}$ is just the sum of the partials of $E_{o1}$ and $E_{o2}$ with respect to $w_1$. Let’s say that there is an extra layer between the $h$ nodes and the $o$ nodes, with nodes $j_1$ and $j_2$. Is

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{j1}}{\partial w_1} + \frac{\partial E_{j2}}{\partial w_1},$$

or is it more complicated? Something like:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{o1}}{\partial w_1} + \frac{\partial E_{o2}}{\partial w_1} = \left(\frac{\partial E_{j1}}{\partial w_1} + \frac{\partial E_{j2}}{\partial w_1}\right) + \left(\frac{\partial E_{j1}}{\partial w_1} + \frac{\partial E_{j2}}{\partial w_1}\right)$$


The short answer is yes: $\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{j1}}{\partial w_1} + \frac{\partial E_{j2}}{\partial w_1}$.

The long answer. The key formula is the chain rule, as D.W. mentioned:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$$

What’s good about it is that all three factors on the right-hand side are local information. No matter what the next or previous layers are, for the sigmoid activation used in the tutorial:

$$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1}(1 - out_{h1}), \qquad \frac{\partial net_{h1}}{\partial w_1} = i_1.$$

Hence, the calculation at the current node depends only on the forward messages from its direct neighbors to the left and the backward messages from its direct neighbors to the right.

$out_{h1}$ and $i_1$ are known from the forward pass, and $\frac{\partial E_{total}}{\partial out_{h1}}$ is the total backward message.
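As a minimal sketch of this locality (hypothetical numbers, assuming the sigmoid activation used in the tutorial), the gradient for $w_1$ needs only the two forward quantities and the incoming backward message:

```python
# Sketch: local gradient at hidden node h1, assuming a sigmoid activation.
# The three inputs are exactly the "local information" from the text;
# the numeric values below are hypothetical, not from the tutorial.

def local_grad_w1(i1, out_h1, dE_dout_h1):
    """Chain rule: dE/dw1 = dE/dout_h1 * dout_h1/dnet_h1 * dnet_h1/dw1."""
    dout_dnet = out_h1 * (1.0 - out_h1)  # sigmoid derivative: local to h1
    dnet_dw1 = i1                        # net_h1 = w1*i1 + ...: local to h1
    return dE_dout_h1 * dout_dnet * dnet_dw1

# i1 and out_h1 come from the forward pass; dE_dout_h1 is the
# total backward message arriving at h1.
print(local_grad_w1(0.05, 0.59327, 0.03635))
```

No matter how many layers sit to the left or right, this function never changes; only the value of the backward message does.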

In the architecture from the post, the node $h_1$ has two direct neighbors to the right, $o_1$ and $o_2$, and that explains the sum $\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$. In your example, the neighbors are $j_1$ and $j_2$, so it will be the sum $\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{j1}}{\partial out_{h1}} + \frac{\partial E_{j2}}{\partial out_{h1}}$. If $h_1$ has even more connections, all of them will pass a backward message, and the messages are added up.
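The summing of backward messages can be checked numerically. Below is a hedged sketch with a tiny hypothetical setup (weights $v_1, v_2$ from $h_1$ to $j_1, j_2$, squared-error terms at each $j$ node as in the tutorial; all values invented for illustration), comparing the summed messages against a finite-difference gradient:

```python
import math

# Sketch: h1 feeds two right neighbors j1, j2 via hypothetical weights
# v1, v2; the error at each j node is (out_j - target)^2 / 2, as in the
# tutorial. The backward message into h1 is the SUM of the per-neighbor
# messages dE_jk/dout_h1.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

v1, v2 = 0.40, 0.45   # hypothetical weights h1 -> j1, h1 -> j2
t1, t2 = 0.01, 0.99   # hypothetical targets at j1, j2

def E_total(out_h1):
    out_j1 = sigmoid(v1 * out_h1)
    out_j2 = sigmoid(v2 * out_h1)
    return 0.5 * (out_j1 - t1) ** 2 + 0.5 * (out_j2 - t2) ** 2

def backward_message(out_h1):
    # Each neighbor jk sends dE_jk/dout_h1; h1 just adds the messages up.
    total = 0.0
    for v, t in [(v1, t1), (v2, t2)]:
        out_j = sigmoid(v * out_h1)
        total += (out_j - t) * out_j * (1.0 - out_j) * v
    return total

out_h1 = 0.59
analytic = backward_message(out_h1)
eps = 1e-6
numeric = (E_total(out_h1 + eps) - E_total(out_h1 - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # prints True
```

Adding a third neighbor $j_3$ would only append one more term to the loop; nothing else in the computation at $h_1$ changes.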

Source: Link, Question Author: Matthew, Answer Author: Maxim
