December 2001
Upper bound of the expected training error of neural network regression for a Gaussian noise sequence
NEURAL NETWORKS
- Volume
- 14
- Issue
- 10
- Start page
- 1419
- End page
- 1429
- Language
- English
- Publication type
- Research paper (academic journal)
- DOI
- 10.1016/S0893-6080(01)00122-8
- Publisher
- PERGAMON-ELSEVIER SCIENCE LTD
In neural network regression problems, often referred to as additive noise models, the NIC (Network Information Criterion) has been proposed as a general model-selection criterion for determining the optimal network size with high generalization performance. Although NIC was derived using asymptotic expansion, it has been pointed out that this technique cannot be applied when the target function belongs to the assumed family of networks and the family is not minimal for representing the target true function, i.e. the overrealizable case, in which NIC reduces to the well-known AIC (Akaike Information Criterion) and other criteria depending on the loss function. Because NIC is an unbiased estimator of the generalization error based on the training error, the expectations of these errors must be derived for neural networks in such cases. This paper gives upper bounds on the expectations of training errors with respect to the distribution of training data, which we call the expected training error, for some types of networks under the squared-error loss. In the overrealizable case, because the errors are determined by how well the networks fit the noise components included in the data, the target data set is taken to be a Gaussian noise sequence. For radial basis function networks and 3-layered neural networks with a bell-shaped activation function in the hidden layer, the expected training error is bounded above by σ*² − 2nσ*² log T / T, where σ*² is the variance of the noise, n is the number of basis functions or hidden units, and T is the number of data. Furthermore, for 3-layered neural networks with a sigmoidal activation function in the hidden layer, we obtained the upper bound σ*² − O(log T / T) when n > 2. If the number of data is large enough, these bounds on the expected training error are smaller than σ*² − N(n)σ*²/T as evaluated in NIC, where N(n) is the number of all network parameters.
(C) 2001 Elsevier Science Ltd. All rights reserved.
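The comparison in the abstract can be checked numerically: the paper's bound σ*² − 2nσ*² log T / T falls below the NIC-evaluated value σ*² − N(n)σ*²/T whenever 2n log T > N(n), which holds for large T. A minimal sketch, where the parameter count N(n) = 3n is a hypothetical choice for illustration only (the true count depends on the network architecture):

```python
import math

def bound_paper(sigma2, n, T):
    # Paper's upper bound for RBF / bell-shaped 3-layer networks:
    # sigma*^2 - 2 n sigma*^2 log T / T
    return sigma2 - 2 * n * sigma2 * math.log(T) / T

def bound_nic(sigma2, N_params, T):
    # Expected training error as evaluated in NIC:
    # sigma*^2 - N(n) sigma*^2 / T
    return sigma2 - N_params * sigma2 / T

sigma2, n, T = 1.0, 5, 10_000
N_params = 3 * n  # hypothetical parameter count N(n); illustration only

# For T large enough, the paper's bound is the smaller of the two.
print(bound_paper(sigma2, n, T) < bound_nic(sigma2, N_params, T))  # → True
```

Here 2n log T ≈ 92 far exceeds N(n) = 15, so the log T / T term dominates and the paper's bound is tighter, matching the abstract's claim for sufficiently many data.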
- Links
-
- DOI
- https://doi.org/10.1016/S0893-6080(01)00122-8
- DBLP
- https://dblp.uni-trier.de/rec/journals/nn/HagiwaraHTUK01
- Web of Science
- https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000172609200006&DestApp=WOS_CPL
- URL
- http://dblp.uni-trier.de/db/journals/nn/nn14.html#journals/nn/HagiwaraHTUK01
- IDs
-
- DOI : 10.1016/S0893-6080(01)00122-8
- ISSN : 0893-6080
- DBLP ID : journals/nn/HagiwaraHTUK01
- Web of Science ID : WOS:000172609200006