Fully connected (at least layer to layer with more than 2 hidden layers) backprop networks
Question
Fully connected backprop networks (fully connected at least layer to layer, with more than 2 hidden layers) are universal learners. Unfortunately, they are often slow to learn and tend to over-fit or generalize awkwardly.
From fooling around with these networks, I have observed that pruning some of
the edges (so that their weight is zero and impossible to change) tends to make
the networks learn faster and generalize better. Is there a reason for this? Is
it only because of a decrease in the dimensionality of the weight search
space, or is there a more subtle reason?
Also, is the better generalization an artifact of the 'natural' problems I am
looking at?
Explanation / Answer
Fewer nodes/edges (or edges with fixed weights) means there are fewer
parameters whose values need to be found, and this typically reduces training
time. Also, with fewer parameters, the space of functions the network can
express has fewer dimensions, so the network can only represent simpler, more
general models. It is thus less capable of over-fitting the data, and hence
the models it learns will seem more general.
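A minimal sketch (assuming PyTorch) of what "pruning so the weight is zero and impossible to change" might look like in practice: a fixed binary mask is applied to each weight matrix after every optimizer step, so the pruned edges stay at exactly zero. The layer sizes, 50% sparsity level, and random data here are illustrative assumptions, not taken from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fully connected network with two hidden layers.
net = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Randomly "prune" 50% of the edges in each Linear layer by building a
# fixed 0/1 mask with the same shape as the weight matrix.
masks = {}
for name, module in net.named_modules():
    if isinstance(module, nn.Linear):
        mask = (torch.rand_like(module.weight) > 0.5).float()
        masks[name] = mask
        with torch.no_grad():
            module.weight.mul_(mask)  # zero the pruned edges up front

opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Illustrative random data.
x = torch.randn(64, 10)
y = torch.randn(64, 1)

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
    # Re-apply the masks so the pruned weights remain exactly zero and
    # are effectively impossible to change.
    with torch.no_grad():
        for name, module in net.named_modules():
            if isinstance(module, nn.Linear):
                module.weight.mul_(masks[name])
```

With the masks in place, only about half the weights are trainable in practice, which illustrates the point above: the effective parameter count (and so the dimensionality of the weight search space) is reduced, even though the network's layer shapes are unchanged.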