SciELO - Scientific Electronic Library Online

 



Revista Ingenierías Universidad de Medellín

Print version ISSN 1692-3324 · Online version ISSN 2248-4094

Abstract

AMON, Iván; MORENO, Francisco and ECHEVERRI, Jaime. PHONETIC ALGORITHM TO DETECT DUPLICATE TEXT STRINGS IN SPANISH. Rev. ing. univ. Medellín [online]. 2012, vol.11, n.20, pp.127-138. ISSN 1692-3324.

Data that should be written identically often are not, owing to misspellings and typos, variations in word order, and the use of prefixes and suffixes, among other causes. Phonetic techniques for duplicate detection are not designed for the Spanish language, which makes it difficult to identify and correct problems such as spelling errors in texts written in Spanish. In this paper we propose an algorithm called PhoneticSpanish to detect duplicate text strings that takes into account the spelling errors that occur in Spanish. The proposed algorithm was compared with nine duplicate-detection techniques. The results were satisfactory: the algorithm performed better than the other techniques, which points to opportunities for improved analysis of information in Spanish.

Keywords: Data cleansing; data quality; detection of duplicates; similarity functions; phonetic algorithms.
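The abstract does not spell out the rules PhoneticSpanish uses. As a rough illustration of how a phonetic technique flags duplicates in general, the following sketch reduces each string to a key using a few well-known Spanish sound-alike correspondences (b/v, silent h, seseo, yeísmo, soft c and g) and groups strings that share a key. The rule list and the names `RULES`, `phonetic_key`, and `group_duplicates` are hypothetical and are not the authors' algorithm.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical Spanish sound-alike rules, applied in order (NOT PhoneticSpanish).
RULES = [
    (r"ch", "C"),          # protect "ch" so the "c" and "h" rules below skip it
    (r"ll", "y"),          # "ll" and "y" usually sound alike (yeismo)
    (r"qu(?=[ei])", "k"),  # "que"/"qui" -> "ke"/"ki"
    (r"g(?=[ei])", "j"),   # "ge"/"gi" sound like "je"/"ji"
    (r"gu(?=[ei])", "g"),  # "gue"/"gui" -> hard "ge"/"gi" (must run after the rule above)
    (r"c(?=[ei])", "s"),   # soft "c" merges with "s" (seseo)
    (r"z", "s"),           # "z" merges with "s" (seseo)
    (r"c", "k"),           # any remaining lowercase "c" is hard
    (r"v", "b"),           # "b" and "v" are pronounced alike
    (r"h", ""),            # "h" is silent
]


def strip_accents(text: str) -> str:
    """Remove diacritics (á -> a, ü -> u) but keep ñ, which is a separate letter."""
    out = []
    for ch in text:
        if ch == "ñ":
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def phonetic_key(text: str) -> str:
    """Map a string to a key shared by spellings that sound alike in Spanish."""
    s = strip_accents(unicodedata.normalize("NFC", text.lower()))
    s = re.sub(r"[^a-zñ ]+", "", s)            # keep only letters and spaces
    s = re.sub(r"\s+", " ", s).strip()         # normalize whitespace
    for pattern, replacement in RULES:
        s = re.sub(pattern, replacement, s)
    return re.sub(r"(.)\1+", r"\1", s)         # collapse doubled letters ("rr" -> "r")


def group_duplicates(strings):
    """Return groups of two or more input strings that share a phonetic key."""
    groups = defaultdict(list)
    for s in strings:
        groups[phonetic_key(s)].append(s)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    print(group_duplicates(["Vaca", "Baca", "Hernández", "Ernandes", "Perro"]))
```

Exact key matching is the simplest design choice; a pipeline of the kind the keywords suggest would more likely compare keys with a similarity function (for example, an edit distance) so that typos that change the sound are also tolerated.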

Abstract in Spanish · Text in Spanish · PDF (Spanish)

 

All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.