Fully connected (at least layer to layer with more than 2 hidden layers) backprop networks
Question
Fully connected backprop networks (fully connected at least layer to layer, with more than 2 hidden layers) are universal learners. Unfortunately, they are often slow to learn and tend to over-fit or generalize awkwardly.
From fooling around with these networks, I have observed that pruning some of
the edges (so that their weight is zero and impossible to change) tends to make
the networks learn faster and generalize better. Is there a reason for this? Is
it only because of a decrease in the dimensionality of the weight search
space, or is there a more subtle reason?
Also, is the better generalization an artifact of the 'natural' problems I am
looking at?
Explanation / Answer
Fewer nodes/edges (or edges with fixed weights) means there are fewer
parameters whose values need to be found, and this typically reduces training
time. Also, with fewer parameters, the space of functions the network can
express has fewer dimensions, so the network can only represent simpler, more
general models. It is thus less capable of over-fitting the data, and hence
the models it learns will seem more general.
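A minimal sketch (assuming PyTorch) of what "pruning so the weight is zero and impossible to change" might look like in practice: a fixed binary mask is applied to each weight matrix after every optimizer step, so the pruned edges stay at exactly zero. The layer sizes, 50% sparsity level, and random data here are illustrative assumptions, not taken from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fully connected network with two hidden layers.
net = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Randomly "prune" 50% of the edges in each Linear layer by building a
# fixed 0/1 mask with the same shape as the weight matrix.
masks = {}
for name, module in net.named_modules():
    if isinstance(module, nn.Linear):
        mask = (torch.rand_like(module.weight) > 0.5).float()
        masks[name] = mask
        with torch.no_grad():
            module.weight.mul_(mask)  # zero the pruned edges up front

opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Illustrative random data.
x = torch.randn(64, 10)
y = torch.randn(64, 1)

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
    # Re-apply the masks so the pruned weights remain exactly zero and
    # are effectively impossible to change.
    with torch.no_grad():
        for name, module in net.named_modules():
            if isinstance(module, nn.Linear):
                module.weight.mul_(masks[name])
```

With the masks in place, only about half the weights are trainable in practice, which illustrates the point above: the effective parameter count (and so the dimensionality of the weight search space) is reduced, even though the network's layer shapes are unchanged.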