## Neural Computation

January 1993, Vol. 5, No. 1, Pages 140-153
(doi: 10.1162/neco.1993.5.1.140)
© 1993 Massachusetts Institute of Technology
Statistical Theory of Learning Curves under Entropic Loss Criterion
Article PDF (610.43 KB)
Abstract

The present paper elucidates a universal property of learning curves, which shows how the generalization error, training error, and the complexity of the underlying stochastic machine are related and how the behavior of a stochastic machine is improved as the number of training examples increases. The error is measured by the entropic loss. It is proved that the generalization error converges to H0, the entropy of the conditional distribution of the true machine, as H0 + m*/(2t), while the training error converges as H0 - m*/(2t), where t is the number of examples and m* shows the complexity of the network. When the model is faithful, implying that the true machine is in the model, m* is reduced to m, the number of modifiable parameters. This is a universal law because it holds for any regular machine irrespective of its structure under the maximum likelihood estimator. Similar relations are obtained for the Bayes and Gibbs learning algorithms. These learning curves show the relation among the accuracy of learning, the complexity of a model, and the number of training examples.