Papers

Peer-reviewed
2012

Extraction of topic evolutions from references in scientific articles and its GPU acceleration

ACM International Conference Proceeding Series
  • Tomonari Masada
  • Atsuhiro Takasu

First page
1522
Last page
1526
Language
English
Publishing type
Research paper (international conference proceedings)
DOI
10.1145/2396761.2398465
Publisher
ACM

This paper provides a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics. Recent trends in text mining point to a high demand for exploiting metadata. Especially, exploitation of reference relationships among documents induced by hyperlinking Web pages, citing scientific articles, tumblring blog posts, retweeting tweets, etc., is put in the foreground of the effort for an effective mining. We focus on scholarly activities and propose a topic model for obtaining a corpus-wide view on how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) by introducing a corpus-wide topic transition probability matrix, which models reference relationships as transitions among topics. Our approximated variational inference updates LDA posteriors and topic transition posteriors alternately. The main issue is execution time amounting to O(MK^2), where K is the number of topics and M is that of links in citation network. Therefore, we accelerate the inference with Nvidia CUDA compatible GPUs. We compare the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F). We also present topic evolution examples our method gives. © 2012 ACM.
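
To illustrate where an O(MK^2) cost can come from, the following is a minimal sketch and not the inference procedure of the paper. It assumes that each document already has a vector of K topic proportions and that each citation link contributes the product of the cited document's and the citing document's proportions to a K-by-K count table. The function name `transition_matrix`, the `(citing, cited)` link format, and the cited-to-citing direction of the rows are all assumptions made for this example.

```python
def transition_matrix(theta, links):
    """Estimate a K x K topic transition matrix from citation links.

    theta: list of per-document topic proportion lists, each of length K
           (hypothetical input; assumed to come from a fitted topic model).
    links: list of (citing, cited) document index pairs (assumed format).
    Rows index the topic of the cited document, columns the topic of the
    citing document; each nonzero row is normalized to sum to 1.
    """
    K = len(theta[0])
    counts = [[0.0] * K for _ in range(K)]
    for citing, cited in links:          # M links in the citation network
        for i in range(K):               # topic in the cited document
            for j in range(K):           # topic in the citing document
                counts[i][j] += theta[cited][i] * theta[citing][j]
    result = []
    for row in counts:
        total = sum(row)
        # Leave all-zero rows untouched to avoid dividing by zero.
        result.append([v / total for v in row] if total > 0 else row[:])
    return result
```

The two inner loops run K times each for every one of the M links, which gives M * K * K multiply-add operations per pass. Each (link, i, j) term is independent of the others, which is the kind of workload that maps well onto GPU threads.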

Link information
DOI
https://doi.org/10.1145/2396761.2398465
DBLP
https://dblp.uni-trier.de/rec/conf/cikm/MasadaT12
URL
http://doi.acm.org/10.1145/2396761.2398465
URL
http://dblp.uni-trier.de/db/conf/cikm/cikm2012.html#conf/cikm/MasadaT12
ID information
  • DOI : 10.1145/2396761.2398465
  • DBLP ID : conf/cikm/MasadaT12
  • SCOPUS ID : 84871036347
