2012
Extraction of topic evolutions from references in scientific articles and its GPU acceleration
ACM International Conference Proceeding Series
- First page
- 1522
- Last page
- 1526
- Language
- English
- Publishing type
- Research paper (international conference proceedings)
- DOI
- 10.1145/2396761.2398465
- Publisher
- ACM
This paper provides a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics. Recent trends in text mining point to a high demand for exploiting metadata. In particular, the exploitation of reference relationships among documents, induced by hyperlinking Web pages, citing scientific articles, tumblring blog posts, retweeting tweets, etc., is at the forefront of efforts toward effective mining. We focus on scholarly activities and propose a topic model for obtaining a corpus-wide view of how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) by introducing a corpus-wide topic transition probability matrix, which models reference relationships as transitions among topics. Our approximate variational inference updates LDA posteriors and topic transition posteriors alternately. The main issue is the execution time, which amounts to O(MK^2), where K is the number of topics and M is the number of links in the citation network. Therefore, we accelerate the inference with NVIDIA CUDA-compatible GPUs. We compare the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F). We also present examples of the topic evolutions our method extracts. © 2012 ACM.
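The O(MK^2) cost quoted in the abstract can be illustrated with a minimal sketch. This is not the TERESA inference itself, whose update equations are not given in this record. It only assumes that each citation link is scored against every ordered pair of topics. The names `accumulate_transition_counts`, `normalize_rows`, `theta`, and `R`, the weighting formula, and the source-to-target direction of a link are all assumptions made for illustration.

```python
def accumulate_transition_counts(links, theta, R):
    """Accumulate soft topic-to-topic counts over citation links.

    links: list of (source_doc, target_doc) index pairs, length M
    theta: per-document topic proportions, theta[d][k] (assumed given)
    R:     K x K row-stochastic topic transition matrix (hypothetical)
    """
    K = len(R)
    counts = [[0.0] * K for _ in range(K)]
    for src, dst in links:  # loop over the M links
        # Assumed weight of "topic k in src leads to topic k2 in dst".
        # Building this table is the K*K work done for every link.
        w = [[theta[src][k] * R[k][k2] * theta[dst][k2] for k2 in range(K)]
             for k in range(K)]
        z = sum(map(sum, w))
        if z == 0.0:
            continue  # the link carries no mass under the current estimates
        for k in range(K):
            for k2 in range(K):
                counts[k][k2] += w[k][k2] / z  # each link adds total mass 1
    return counts


def normalize_rows(counts):
    """Convert accumulated counts into a row-stochastic matrix."""
    out = []
    for row in counts:
        s = sum(row)
        if s > 0:
            out.append([c / s for c in row])
        else:
            out.append([1.0 / len(row)] * len(row))  # uniform fallback
    return out


if __name__ == "__main__":
    theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 documents, K = 2
    R = [[0.5, 0.5], [0.5, 0.5]]
    links = [(0, 1), (1, 2)]                       # M = 2
    counts = accumulate_transition_counts(links, theta, R)
    print(normalize_rows(counts))
```

In this sketch every link contributes an independent K x K table, so the work for different links can be done in parallel and summed afterwards. That independence is the kind of structure that makes such a computation a plausible candidate for GPU acceleration.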
- Link information
- ID information
- DOI : 10.1145/2396761.2398465
- DBLP ID : conf/cikm/MasadaT12
- SCOPUS ID : 84871036347