論文

査読有り
2017年12月

An extension of the FFT-based algorithm for the match-count problem to weighted scores

IEEJ Transactions on Electrical and Electronic Engineering
  • Kensuke Baba

12
開始ページ
S97
終了ページ
S100
記述言語
英語
掲載種別
研究論文(学術雑誌)
DOI
10.1002/tee.22554
出版者・発行元
WILEY

© 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc. The match-count problem on strings is the basic problem of counting the matches of characters between two strings for every possible alignment. The problem is classically computed in O(σ n log m) time using a fast Fourier transform (FFT) for two strings of lengths m and n (m ≤ n) over an alphabet of size σ. This paper extends the target of this FFT-based algorithm to a weighted version of the problem, which computes the sum of similarities between characters instead of the number of matches. The algorithm extended in this paper can solve the weighted match-count problem in O(dn log m) time by mapping characters to numerical vectors of dimensionality d. This paper also evaluates the usefulness of the extended algorithm by applying it to plagiarism detection in documents. The experimental results show that the proposed algorithm is applicable to general vector representation of words and that the obtained plagiarism detection method can extremely reduce the processing time with a slight decrease of accuracy from the method based on the normal match-count problem.

リンク情報
DOI
https://doi.org/10.1002/tee.22554
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000417572600013&DestApp=WOS_CPL
Scopus
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85037539476&origin=inward
Scopus Citedby
https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85037539476&origin=inward
ID情報
  • DOI : 10.1002/tee.22554
  • ISSN : 1931-4973
  • eISSN : 1931-4981
  • ORCIDのPut Code : 46202282
  • SCOPUS ID : 85037539476
  • Web of Science ID : WOS:000417572600013

エクスポート
BibTeX RIS