May 13, 1998
Reduction of Expanded Search Terms for Fuzzy English-text Retrieval
IEICE technical report. Data engineering
- Volume : 98
- Number : 42
- First page : 63
- Last page : 70
- Language : Japanese
- Publishing type
- Publisher : The Institute of Electronics, Information and Communication Engineers
OCR misrecognition is a serious problem when OCR-recognized text is used for retrieval in digital libraries. We have previously proposed fuzzy retrieval methods that assume errors remain in the recognized text, since correcting them manually is too costly. For each input query term, these methods generate multiple search terms by referring to confusion matrices, which store all characters likely to be misrecognized and the probability of each misrecognition. The methods can improve recall without decreasing precision, but in English fuzzy retrieval they occasionally generate a few million search terms, which becomes a bottleneck for retrieval speed. This paper therefore presents a method that reduces the number of generated search terms while maintaining sufficient retrieval effectiveness, by restricting the number of errors allowed in each expanded search term.
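The idea in the abstract, expanding a query term through a confusion table while capping the number of errors per variant, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `CONFUSIONS` table, its probabilities, and the function name `expand_term` are hypothetical, and the weight attached to each variant is simply the product of the substitution probabilities, which is a simplification.

```python
from itertools import combinations, product

# Hypothetical confusion table (illustrative values, not from the paper):
# correct character -> list of (OCR output, probability of that misrecognition).
CONFUSIONS = {
    "l": [("1", 0.05), ("I", 0.03)],
    "o": [("0", 0.04)],
    "m": [("rn", 0.02)],
}


def expand_term(term, confusions, max_errors):
    """Return {variant: weight} for all variants of `term` that contain
    at most `max_errors` misrecognized positions.

    The weight is the product of the substitution probabilities used,
    with 1.0 for the unchanged term (a simplification for illustration).
    """
    variants = {term: 1.0}
    # Positions whose character has at least one known misrecognition.
    positions = [i for i, ch in enumerate(term) if ch in confusions]
    for k in range(1, min(max_errors, len(positions)) + 1):
        # Choose which k positions are misrecognized ...
        for chosen in combinations(positions, k):
            options = [confusions[term[i]] for i in chosen]
            # ... and which substitution is applied at each of them.
            for combo in product(*options):
                chars = list(term)
                weight = 1.0
                for i, (sub, prob) in zip(chosen, combo):
                    chars[i] = sub
                    weight *= prob
                variants["".join(chars)] = weight
    return variants


restricted = expand_term("room", CONFUSIONS, max_errors=1)
unrestricted = expand_term("room", CONFUSIONS, max_errors=len("room"))
print(len(restricted), len(unrestricted))
```

With this toy table the cap on errors already shrinks the expansion for a four-letter word; for longer terms and a full confusion matrix the unrestricted count grows combinatorially, which is the bottleneck the abstract describes.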
- Link information
- CiNii Articles : http://ci.nii.ac.jp/naid/110003188816
- CiNii Books : http://ci.nii.ac.jp/ncid/AN10012921
- ID information
- CiNii Articles ID : 110003188816
- CiNii Books ID : AN10012921