MISC

Peer-reviewed
2010

The Coding Divergence for Measuring the Complexity of Separating Two Sets.

PROCEEDINGS OF 2ND ASIAN CONFERENCE ON MACHINE LEARNING (ACML2010)
  • Mahito Sugiyama
  • Akihiro Yamamoto

Volume
13
Start page
127
End page
143
Language
English
Publication type
Publisher
MICROTOME PUBLISHING

In this paper we integrate two essential processes, the discretization of continuous data and the learning of a model that explains them, towards fully computational machine learning from continuous data. Discretization is fundamental to machine learning and data mining, since every continuous datum, e.g., a real-valued datum obtained by observation in the real world, must be discretized and converted from analog (continuous) to digital (discrete) form to be stored in databases. However, most machine learning methods pay no attention to this situation: they use digital data in actual applications on a computer, whereas they assume analog data (usually vectors of real numbers) in theory. To bridge the gap, we propose a novel measure of the difference between two sets of data, called the coding divergence, and computationally unify the two processes of discretization and learning. Discretization of continuous data is realized by a topological mapping (in the mathematical sense) from the d-dimensional Euclidean space R^d into the Cantor space Σ^ω, and the simplest model is learned in the Cantor space, which corresponds to the minimum open set separating the two given sets of data. Furthermore, we construct a classifier using the divergence and experimentally demonstrate its robust performance. Our contribution is not only the introduction of a new measure from the computational point of view, but also the triggering of more interaction between experimental science and machine learning.
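The idea in the abstract can be illustrated with a small one-dimensional sketch. It is not the authors' algorithm or their exact definition. It makes several assumptions: data are already scaled to [0, 1], the embedding into the Cantor space is the plain binary expansion, the separating open set is built greedily from the shortest binary prefixes that cover points of one set and no point of the other, and the score is the total prefix length normalized by set size, summed in both directions. All function names are hypothetical.

```python
def binary_prefix(x, k):
    """First k binary digits of x in [0, 1] as a string (assumed embedding)."""
    # Clamp so that x == 1.0 falls into the last cell instead of overflowing.
    n = min(int(x * 2**k), 2**k - 1)
    return format(n, f"0{k}b")


def separating_prefixes(xs, ys, max_level=32):
    """Greedily collect prefixes that cover points of xs but no point of ys.

    Returns a set of binary strings, or None if some point of xs cannot be
    separated from ys within max_level digits.
    """
    remaining = list(xs)
    prefixes = set()
    for k in range(1, max_level + 1):
        y_codes = {binary_prefix(y, k) for y in ys}
        unresolved = []
        for x in remaining:
            p = binary_prefix(x, k)
            if p in y_codes:
                unresolved.append(x)  # needs a finer discretization level
            else:
                prefixes.add(p)       # this cell contains no point of ys
        remaining = unresolved
        if not remaining:
            return prefixes
    return None


def divergence_sketch(xs, ys, max_level=32):
    """Illustrative score: normalized total prefix length, both directions."""
    px = separating_prefixes(xs, ys, max_level)
    py = separating_prefixes(ys, xs, max_level)
    if px is None or py is None:
        return float("inf")  # the sets cannot be separated at this resolution
    return sum(map(len, px)) / len(xs) + sum(map(len, py)) / len(ys)


if __name__ == "__main__":
    print(divergence_sketch([0.1, 0.2], [0.8, 0.9]))  # well-separated sets
    print(divergence_sketch([0.1, 0.4], [0.3, 0.9]))  # interleaved sets
```

In this sketch, sets that are far apart are separated by short prefixes and receive a small score, while interleaved sets need longer prefixes and receive a larger one. Sets that share a point are never separated and score infinity.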

Links
DBLP
https://dblp.uni-trier.de/rec/journals/jmlr/SugiyamaY10a
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000392007300009&DestApp=WOS_CPL
URL
http://proceedings.mlr.press/v13/sugiyama10b.html
URL
https://dblp.uni-trier.de/conf/acml/2010
URL
https://dblp.uni-trier.de/db/journals/jmlr/jmlrp13.html#SugiyamaY10a
Identifiers
  • ISSN : 1938-7288
  • DBLP ID : journals/jmlr/SugiyamaY10a
  • Web of Science ID : WOS:000392007300009
