Exploring overlapping clusters using dynamic re-scaling and sampling

KNOWLEDGE AND INFORMATION SYSTEMS

Mei Kobayashi
Masaki Aono

Volume: 10
Issue: 3
First page: 295
Last page: 313
Language: English
Publication type: Research paper (academic journal)
DOI: 10.1007/s10115-006-0005-y
Publisher: SPRINGER LONDON LTD

Until recently, the aim of most text-mining work has been to understand major topics and clusters. Minor topics and clusters have been relatively neglected even though they may represent important information on rare events. We present a novel method for exploring overlapping clusters of heterogeneous sizes, which is based on vector space modeling, covariance matrix analysis, random sampling, and dynamic re-weighting of document vectors in massive databases. Our system addresses a combination of difficult issues in database analysis, such as synonymy and polysemy, identification of minor clusters, accommodation of cluster overlap, automatic labeling of clusters based on their document contents, and the user-controlled trade-off between speed of computation and quality of results. We conducted implementation studies with news articles from the Reuters and LA Times TREC data sets and artificially generated data with a known cluster structure to demonstrate the effectiveness of our system.
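
As a rough illustration of how covariance matrix analysis, random sampling, and re-weighting of document vectors might fit together, the following is a minimal sketch under stated assumptions. It is not the authors' published algorithm: the function name `explore_directions`, the `damping` and `sample_size` parameters, and the toy data in the usage note are all hypothetical, and the re-weighting rule (shrinking each document's component along an already-found direction) is one simple choice among many.

```python
import numpy as np


def explore_directions(docs, n_directions=2, sample_size=None, damping=0.1, seed=0):
    """Find dominant term-space directions one at a time (illustrative only).

    docs         : (n_docs, n_terms) array of document vectors.
    sample_size  : hypothetical knob; number of documents sampled per pass
                   (None means use all documents).
    damping      : hypothetical knob; fraction of a found direction that is
                   kept in the document vectors after re-weighting.
    Returns an (n_directions, n_terms) array of unit vectors.
    """
    rng = np.random.default_rng(seed)
    residual = np.asarray(docs, dtype=float).copy()
    found = []
    for _ in range(n_directions):
        n_docs = residual.shape[0]
        k = n_docs if sample_size is None else min(sample_size, n_docs)
        # Random sampling: estimate the covariance from a subset of documents.
        rows = rng.choice(n_docs, size=k, replace=False)
        cov = np.cov(residual[rows], rowvar=False)
        # eigh returns eigenvalues in ascending order; take the top eigenvector.
        _, eigvecs = np.linalg.eigh(cov)
        direction = eigvecs[:, -1]
        found.append(direction)
        # Re-weighting: shrink every document's component along the direction
        # just found, so weaker (minor-cluster) structure can surface next.
        proj = residual @ direction
        residual = residual - (1.0 - damping) * np.outer(proj, direction)
    return np.array(found)
```

For example, with 30 documents using only terms 0 and 1, 30 using only terms 4 and 5, and 5 using only terms 2 and 3, calling `explore_directions(docs, n_directions=2)` first picks up the contrast between the two large groups. After that direction is damped, the second direction is dominated by the terms of the small group. Lowering `sample_size` reduces the cost of each covariance estimate at the price of a noisier result, which loosely mirrors the speed versus quality trade-off mentioned above.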

Links

DOI: https://doi.org/10.1007/s10115-006-0005-y
Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000241961800002&DestApp=WOS_CPL

Identifiers

DOI : 10.1007/s10115-006-0005-y
ISSN : 0219-1377
Web of Science ID : WOS:000241961800002
