Misc.

May 13, 1998

Reduction of Expanded Search Terms for Fuzzy English-text Retrieval

IEICE technical report. Data engineering
  • OHTA Manabu
  • TAKASU Atsuhiro
  • ADACHI Jun

Volume: 98
Number: 42
First page: 63
Last page: 70
Language: Japanese
Publishing type:
Publisher: The Institute of Electronics, Information and Communication Engineers

OCR misrecognition is a serious problem when OCR-recognized text is used for retrieval in digital libraries. We have proposed fuzzy retrieval methods that assume errors remain in the recognized text, since correcting them manually is too costly. For each input query term, the proposed methods generate multiple search terms by referring to confusion matrices, which store all characters likely to be misrecognized and the probability of each misrecognition. These methods can improve recall without decreasing precision, but in English fuzzy retrieval they occasionally generate a few million search terms, which becomes a bottleneck for retrieval speed. This paper therefore presents a method that reduces the number of generated search terms while maintaining sufficient retrieval effectiveness, by restricting the number of errors included in the expanded search terms.
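
The idea of expanding a query term through a confusion table while capping the number of errors can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CONFUSIONS` table, its probabilities, and the function name `expand_term` are all hypothetical, and each variant's score is simplified to the product of the misrecognition probabilities it uses.

```python
from itertools import combinations, product

# Hypothetical confusion table: for each correct character, the strings an OCR
# engine might output instead, with made-up illustrative probabilities.
CONFUSIONS = {
    "l": [("1", 0.05), ("I", 0.03)],
    "o": [("0", 0.04)],
    "m": [("rn", 0.02)],
}


def expand_term(term, confusions, max_errors):
    """Return {variant: score} for variants containing at most max_errors errors."""
    # Positions in the term that have at least one confusable alternative.
    positions = [i for i, ch in enumerate(term) if ch in confusions]
    variants = {}
    for k in range(min(max_errors, len(positions)) + 1):
        # Choose which k positions are misrecognized ...
        for chosen in combinations(positions, k):
            # ... and which alternative appears at each chosen position.
            for subs in product(*(confusions[term[i]] for i in chosen)):
                chars = list(term)
                score = 1.0
                for i, (alt, p) in zip(chosen, subs):
                    chars[i] = alt
                    score *= p
                variant = "".join(chars)
                # Keep the best score if two substitutions collide.
                variants[variant] = max(variants.get(variant, 0.0), score)
    return variants


if __name__ == "__main__":
    for limit in (1, 3):
        expanded = expand_term("model", CONFUSIONS, limit)
        print(limit, len(expanded), sorted(expanded))
```

Without a cap, the number of variants grows as the product of the alternatives at every position, which is how long terms can reach millions of expansions. Capping the error count keeps only the variants with few substitutions, which are also the most probable ones under this simplified scoring.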

Link information
CiNii Articles
http://ci.nii.ac.jp/naid/110003188816
CiNii Books
http://ci.nii.ac.jp/ncid/AN10012921
ID information
  • CiNii Articles ID : 110003188816
  • CiNii Books ID : AN10012921
