The Re-Pair algorithm proposed by Larsson and Moffat in 1999 is a simple grammar-based compression method that achieves an extremely high compression ratio. However, Re-Pair is an offine and very space consuming algorithm. Thus, to apply it to a v...
OHAMA Iku, KIDA Takuya, ARIMURA Hiroki, ABE Toshihisa
IEICE technical report 110(476) 9-16 Mar 2011
We discuss the geographical clustering problem on human activity logs. In this paper, we formulate the problem as a segmentation problem for 2-dimensional sequence data, and propose two clustering methods, named LS-linHMM and X-linHMM. LS-linHMM i...
In this paper, we discuss about a method to preserve both search efficiency and compression ratio by combining VF coding with arithmetic coding. A VF code, we discuss here, is a source coding scheme that parses an input text into variable-length b...
IEICE technical report. Theoretical foundations of Computing 109(108) 1-8 Jun 2009
We discuss the problem of pattern matching on a VF coded text. We derive a KMP-type pattern matching algorithm from the collage system, which is a general framework to capture the essence of compressed pattern matching. The algorithm runs on a VF ...
Technical report of IEICE. PRMU 108(363) 25-30 Dec 2008
We can touch a numerous number of images on the Internet in this days. Almost all images are stored in compressed form. The JPEG format is one of the most popular image formats. For a JPEG image, we usually need to decode it into a bitmap image in...
Web browsers play an important role in information-gathering on the Internet. However, searching and browsing documents are still time-consuming jobs for many users. In this paper we present an algorithm for keyword extraction at the current Web p...
In some intelligent application of text retrieval, it is required to do a search just through particular parts of target text data with consideration for some properties that the text have. Namely, the texts are followed by additional information,...
Recently many libraries provide their patrons with OPAC (Online Public Access Catalog) system for retrieving their materials. However, in libraries with a long history for example, it often happens that a patron can not find very old materials by ...
A space-efficient linear-time approximation algorithm for the grammar-based compression problem, which requests for a given string to find a smallest context-free grammar deriving the string, is presented. The algorithm consumes only O(g_* log g_*...
IEICE technical report. Theoretical foundations of Computing 103(622) 61-68 Jan 2004
A variable length don't care (VLDC) pattern is a string pattern including gaps that can match any strings. Let P be a VLDC pattern and P be a set of strings that match to P. Given P and a string T, the problem we address is to answer whether T inc...