Fully connected (at least layer-to-layer, with more than 2 hidden layers) backprop networks are universal learners. Unfortunately, they are often slow to learn and tend to over-fit or generalize awkwardly.
From fooling around with these networks, I have observed that pruning some of the edges (so that their weight is zero and cannot change) tends to make the networks learn faster and generalize better. Is there a reason for this? Is it only because of a decrease in the dimensionality of the weight search space, or is there a more subtle reason?
Also, is the better generalization an artifact of the ‘natural’ problems I am looking at?
Fewer nodes/edges (or edges with fixed weights) means that there are fewer parameters whose values need to be found, and this typically reduces the time to learn. Also, with fewer parameters, the space of functions the neural network can express has fewer dimensions, so the network is restricted to simpler, more general models. It is thus less capable of over-fitting the data, and hence the models will seem more general.
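To make the parameter-count argument concrete, here is a minimal NumPy sketch of a pruned network. It is an illustration under stated assumptions, not the setup from the question: the toy data, the layer sizes, the 50% random mask, and the names `mask1`, `mask2`, `forward`, and `loss` are all hypothetical. Pruned edges are held at zero by multiplying each gradient by a fixed binary mask, so only the unmasked weights are ever searched over.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical, for illustration only).
X = rng.normal(size=(64, 4))
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)

n_in, n_hidden, n_out = 4, 8, 1

# Binary masks: 1 = trainable edge, 0 = pruned edge (weight fixed at zero).
# Assumption: roughly half of the input-to-hidden edges are pruned at random.
mask1 = (rng.random((n_in, n_hidden)) < 0.5).astype(float)
mask2 = np.ones((n_hidden, n_out))

# Initialise the weights and zero out the pruned edges.
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden)) * mask1
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out)) * mask2


def forward(X, W1, W2):
    """One tanh hidden layer followed by a linear output layer."""
    h = np.tanh(X @ W1)
    return h, h @ W2


def loss(X, y, W1, W2):
    """Mean squared error of the network on (X, y)."""
    _, out = forward(X, W1, W2)
    return float(np.mean((out - y) ** 2))


initial_loss = loss(X, y, W1, W2)

lr = 0.05
for _ in range(500):
    h, out = forward(X, W1, W2)
    err = 2 * (out - y) / len(X)             # dLoss/dOut for MSE
    grad_W2 = h.T @ err
    grad_h = err @ W2.T
    grad_W1 = X.T @ (grad_h * (1 - h ** 2))  # backprop through tanh
    # Masking the gradient keeps pruned weights at exactly zero.
    W1 -= lr * grad_W1 * mask1
    W2 -= lr * grad_W2 * mask2

final_loss = loss(X, y, W1, W2)

n_dense = W1.size + W2.size                    # parameters if fully connected
n_trainable = int(mask1.sum() + mask2.sum())   # parameters actually searched

print("dense parameters:    ", n_dense)
print("trainable parameters:", n_trainable)
print("loss before/after:   ", initial_loss, final_loss)
```

The masked network has the same shape as the dense one but searches a lower-dimensional weight space. Whether that helps generalization on a given problem still has to be checked on held-out data.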