DOI: 10.5593/SGEM2014/B21/S7.004

ANALYSIS OF METHODS AND SYSTEMS FOR FUZZY DUPLICATE DETECTION

E. Sharapova
Wednesday 1 October 2014 by Libadmin2014

References: 14th International Multidisciplinary Scientific GeoConference SGEM 2014, www.sgem.org, SGEM2014 Conference Proceedings, ISBN 978-619-7105-10-0 / ISSN 1314-2704, June 19-25, 2014, Book 2, Vol. 1, pp. 27-34

ABSTRACT
This paper discusses the problem of detecting fuzzy duplicate texts. We present the basic methods and algorithms for detecting full and fuzzy text duplicates, including the Wagner-Fischer, Masek-Paterson, and Ukkonen algorithms, suffix trees, Bitap, n-grams, shingles, signature hashing, the I-Match signature, and TF*IDF. The characteristics of the fuzzy duplicate detection algorithms are shown. We review existing duplicate text detection systems for Russian and English and describe their advantages and disadvantages. Existing systems check for text duplication either only in an internal database or only on the Internet. For better search results, a system must work with both types of sources.

Keywords: text, duplicate, method, system, fuzzy duplicate.
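
As an illustration of two of the method families named in the abstract, the sketch below shows a Wagner-Fischer style edit distance and a word-shingle comparison scored with the Jaccard coefficient. It is a minimal sketch, not the paper's implementation: the function names, the shingle width of 3, the lowercase whitespace tokenization, and the choice of Jaccard as the similarity measure are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic programme.

    Only two rows of the table are kept, so memory is O(len(b)).
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def shingles(text: str, w: int = 3) -> set:
    """Set of overlapping w-word shingles (assumed tokenization: lowercase, split on whitespace)."""
    words = text.lower().split()
    if len(words) < w:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    s1 = shingles("the quick brown fox jumps")
    s2 = shingles("the quick brown fox leaps")
    print(jaccard(s1, s2))
```

Edit distance compares texts character by character and costs time proportional to the product of their lengths, so it suits short strings. Shingle sets are usually cheaper to compare across large collections, especially when the shingles are hashed.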
