# How to calculate the total error from an arbitrary hidden layer in neural network backpropagation?

I’m following the tutorial over at https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ and so far everything makes sense to me. I am now trying to reason about how these formulas extend to any number of hidden layers; specifically, how to proceed when you have the partial derivative of the total error with respect to a weight. In the above post, $\frac{\partial E_{total}}{\partial w_1}$ is just the sum of the partials of $E_{o_1}$ and $E_{o_2}$ with respect to $w_1$. Let’s say that there is an extra layer between the $h$ nodes and the $o$ nodes, with nodes $j_1$ and $j_2$. Is $\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{j_1}}{\partial w_1} + \frac{\partial E_{j_2}}{\partial w_1}$, or is it more complicated? Something like: $\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{o_1}}{\partial w_1} + \frac{\partial E_{o_2}}{\partial w_1} = (\frac{\partial E_{j_1}}{\partial w_1} + \frac{\partial E_{j_2}}{\partial w_1}) + (\frac{\partial E_{j_1}}{\partial w_1} + \frac{\partial E_{j_2}}{\partial w_1})$

The short answer is yes, $\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{j_1}}{\partial w_1} + \frac{\partial E_{j_2}}{\partial w_1}$.

The long answer. The key formula is the chain rule, as D.W. mentioned:

$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h_1}} \cdot \frac{\partial out_{h_1}}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial w_1}$

What’s good about it is that all three factors on the right-hand side are local information. No matter what the next or previous layers are (for the logistic activation used in the post),

$\frac{\partial out_{h_1}}{\partial net_{h_1}} = out_{h_1} (1 - out_{h_1})$

$\frac{\partial net_{h_1}}{\partial w_1} = i_1$

Hence, the calculation at the current node depends only on forward messages from its direct neighbors to the left and backward messages from its direct neighbors to the right:

$out_{h_1}$ and $i_1$ are known from the forward pass, and $\frac{\partial E_{total}}{\partial out_{h_1}}$ is the total backward message.
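Putting the three factors together, a minimal sketch in Python (not the tutorial's code; the names and numeric values are illustrative):

```python
# Chain-rule product of the three local factors at h1,
# assuming the logistic activation used in the post.
def local_grad_w1(backward_msg, out_h1, i1):
    """dE_total/dw1 as a product of three local quantities."""
    d_out_d_net = out_h1 * (1.0 - out_h1)  # dout_h1/dnet_h1: logistic derivative
    d_net_d_w1 = i1                        # dnet_h1/dw1: the input feeding w1
    return backward_msg * d_out_d_net * d_net_d_w1

# backward_msg plays the role of dE_total/dout_h1 (the total backward message);
# out_h1 and i1 are remembered from the forward pass (made-up values here).
print(local_grad_w1(backward_msg=0.5, out_h1=0.6, i1=0.1))
```

Note that nothing in this function cares how many layers sit before or after $h_1$; that is exactly the locality property described above.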

In the architecture from the post, the node $h_1$ has two direct neighbors to the right, $o_1$ and $o_2$, and that explains the sum $\frac{\partial E_{total}}{\partial out_{h_1}} = \frac{\partial E_{o_1}}{\partial out_{h_1}} + \frac{\partial E_{o_2}}{\partial out_{h_1}}$. In your example, the neighbors are $j_1$ and $j_2$, so it will be the sum $\frac{\partial E_{total}}{\partial out_{h_1}} = \frac{\partial E_{j_1}}{\partial out_{h_1}} + \frac{\partial E_{j_2}}{\partial out_{h_1}}$. If $h_1$ has even more connections, each of them passes its own backward message, and all of these messages are added up.
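To make the generalization concrete, a small sketch (illustrative names and values, not from the tutorial): the total backward message at $h_1$ is just the sum of the messages sent back by each right-hand neighbor, however many there are.

```python
# Total backward message at h1: one term per direct neighbor to the right.
def total_backward_msg(msgs_from_right_neighbors):
    # each entry is that neighbor's contribution to dE_total/dout_h1
    return sum(msgs_from_right_neighbors)

# two neighbors (o1, o2 in the post, or j1, j2 in the question):
two = total_backward_msg([0.055, -0.019])
# a wider layer simply adds more terms to the same sum:
three = total_backward_msg([0.055, -0.019, 0.010])
print(two, three)
```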