Paper

Peer-reviewed
December 2001

Upper bound of the expected training error of neural network regression for a Gaussian noise sequence

NEURAL NETWORKS
  • K Hagiwara
  • T Hayasaka
  • N Toda
  • S Usui
  • K Kuno

Volume
14
Issue
10
Start page
1419
End page
1429
Language
English
Publication type
Research paper (academic journal)
DOI
10.1016/S0893-6080(01)00122-8
Publisher
PERGAMON-ELSEVIER SCIENCE LTD

In neural network regression problems, often formulated as additive noise models, the Network Information Criterion (NIC) has been proposed as a general model selection criterion for determining the optimal network size with high generalization performance. Although NIC has been derived using asymptotic expansion, it has been pointed out that this technique cannot be applied when the target function lies in the family of assumed networks and the family is not minimal for representing the true target function, i.e. the overrealizable case, in which NIC reduces to the well-known AIC (Akaike Information Criterion) and related criteria, depending on the loss function. Because NIC is an unbiased estimator of the generalization error based on the training error, the expectations of these errors must be derived for neural networks in such cases. This paper gives upper bounds on the expectations of the training errors with respect to the distribution of training data, which we call the expected training error, for some types of networks under the squared-error loss. In the overrealizable case, because the errors are determined by how well the networks fit the noise components contained in the data, the target data set is taken to be a Gaussian noise sequence. For radial basis function networks and 3-layered neural networks with a bell-shaped activation function in the hidden layer, the expected training error is bounded above by σ_*^2 − 2nσ_*^2 log T / T, where σ_*^2 is the variance of the noise, n is the number of basis functions or hidden units, and T is the number of data. Furthermore, for 3-layered neural networks with a sigmoidal activation function in the hidden layer, we obtained an upper bound of σ_*^2 − O(log T / T) when n > 2. If the number of data is large enough, these bounds on the expected training error are smaller than σ_*^2 − N(n)σ_*^2 / T as evaluated by NIC, where N(n) is the total number of network parameters. (C) 2001 Elsevier Science Ltd. All rights reserved.
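
The bounds quoted in the abstract can be illustrated numerically. The following is a minimal sketch, not code from the paper: it fits a small Gaussian RBF network to a pure Gaussian noise sequence with a generic least-squares routine and compares the average training error with the two quantities above. The model parameterization, the parameter count N(n) = 3n, and the use of scipy.optimize.least_squares are assumptions made for illustration; note also that the paper's bound concerns the globally optimal fit, which a local optimizer may not reach.

# Hypothetical simulation sketch (not from the paper): fit an RBF network
# with n Gaussian basis functions to pure Gaussian noise and compare the
# Monte Carlo average of the training error with the quoted bounds.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
T = 200          # number of data points
n = 3            # number of basis functions (hidden units)
sigma2 = 1.0     # noise variance sigma_*^2
trials = 50      # noise realizations to average over

def rbf_model(params, x):
    # Sum of n Gaussian bumps; params = [w_1, c_1, s_1, ..., w_n, c_n, s_n]
    # (illustrative parameter layout, not the paper's notation).
    out = np.zeros_like(x)
    for i in range(n):
        w, c, s = params[3 * i: 3 * i + 3]
        out += w * np.exp(-((x - c) ** 2) / (2.0 * s ** 2 + 1e-12))
    return out

x = np.linspace(0.0, 1.0, T)    # fixed input design (assumption)
train_errs = []
for _ in range(trials):
    y = rng.normal(0.0, np.sqrt(sigma2), size=T)   # target = Gaussian noise
    p0 = rng.uniform(-1.0, 1.0, size=3 * n)        # random initialization
    res = least_squares(lambda p: rbf_model(p, x) - y, p0)
    train_errs.append(np.mean(res.fun ** 2))       # squared-error training loss

emp = np.mean(train_errs)
bound_paper = sigma2 - 2 * n * sigma2 * np.log(T) / T   # bound from the abstract
N_n = 3 * n                                             # total parameter count (assumed)
value_nic = sigma2 - N_n * sigma2 / T                   # NIC-style evaluation
print(f"empirical expected training error ~ {emp:.4f}")
print(f"abstract's upper bound            ~ {bound_paper:.4f}")
print(f"NIC-style value                   ~ {value_nic:.4f}")

With T = 200 and n = 3, the abstract's bound (about 0.84 σ_*^2) lies well below the NIC-style value (about 0.96 σ_*^2), which is the gap the paper highlights; the empirical average from local optimization may sit above the bound because the bound refers to the global minimum of the training error.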

Links
DOI
https://doi.org/10.1016/S0893-6080(01)00122-8
DBLP
https://dblp.uni-trier.de/rec/journals/nn/HagiwaraHTUK01
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000172609200006&DestApp=WOS_CPL
URL
http://dblp.uni-trier.de/db/journals/nn/nn14.html#journals/nn/HagiwaraHTUK01
IDs
  • DOI : 10.1016/S0893-6080(01)00122-8
  • ISSN : 0893-6080
  • DBLP ID : journals/nn/HagiwaraHTUK01
  • Web of Science ID : WOS:000172609200006
