SciELO - Scientific Electronic Library Online

 

Ingeniería y competitividad

Print version ISSN 0123-3033; online version ISSN 2027-8284

Abstract

AMEZQUITA, Juan C and ESLAVA, Hermes J. Supervised Learning for data cleaning in the coherence and completeness dimensions. Ing. compet. [online]. 2022, vol.24, n.2, e21011361. Epub May 26, 2022. ISSN 0123-3033. https://doi.org/10.25100/iyc.v24i2.11361.

Information has become an asset for companies because most strategic business decisions are based on data analysis; however, these analyses do not always yield the best results because of low information quality. Data quality has several evaluation dimensions, which makes achieving an adequate level of quality a complex task. One of the main activities before any type of analysis is data pre-processing. This activity is among the most time-consuming, yet the expected levels of quality are not always reached, nor are the evaluation dimensions with the most significant impact always covered. This work presents the use of machine learning as a tool to clean data in the completeness and coherence dimensions; it is validated on a data set provided by a government entity in charge of protecting children's rights at the national level. The work proceeds through the selection of the information processing tools, the descriptive analysis of the data, the identification of the specific problems to which machine learning techniques will be applied to improve data quality, the experimentation with and evaluation of different models, and finally the implementation of the best-performing model. Among the results, the completeness dimension improved, with null data decreasing by 4.9%. In the coherence dimension, 2.6% of the records were identified as containing contradictions, thus validating machine learning for data cleaning.
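
The abstract does not name the models or the data schema used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy set of categorical records (the field names `age_group`, `region`, and `program` and all values are hypothetical), and it swaps in a simple k-nearest-neighbour majority vote as the supervised learner. Completeness is measured as the share of non-null values in a field, null values are filled with the learner's prediction, and a record is flagged as potentially incoherent when its recorded value disagrees with a leave-one-out prediction.

```python
from collections import Counter


def completeness(records, field):
    """Share of records whose `field` is not None."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is not None) / len(records)


def knn_predict(train, features, target, query, k=3):
    """Majority vote among the k training records closest to `query`.

    Distance is the number of categorical features that differ (Hamming).
    """
    def distance(a, b):
        return sum(1 for f in features if a.get(f) != b.get(f))

    nearest = sorted(train, key=lambda r: distance(r, query))[:k]
    votes = Counter(r[target] for r in nearest)
    return votes.most_common(1)[0][0]


def impute(records, features, target, k=3):
    """Return copies of `records` with missing `target` values predicted."""
    train = [r for r in records if r.get(target) is not None]
    cleaned = []
    for r in records:
        row = dict(r)
        if row.get(target) is None and train:
            row[target] = knn_predict(train, features, target, row, k)
        cleaned.append(row)
    return cleaned


def flag_contradictions(records, features, target, k=3):
    """Indices of labelled records whose value disagrees with a
    leave-one-out prediction made from the other labelled records."""
    labelled = [(i, r) for i, r in enumerate(records) if r.get(target) is not None]
    flagged = []
    for i, r in labelled:
        others = [o for j, o in labelled if j != i]
        if others and knn_predict(others, features, target, r, k) != r[target]:
            flagged.append(i)
    return flagged


# Hypothetical toy data; not taken from the data set described in the abstract.
FEATURES = ["age_group", "region"]
TARGET = "program"
records = [
    {"age_group": "0-5", "region": "north", "program": "nutrition"},
    {"age_group": "0-5", "region": "north", "program": "nutrition"},
    {"age_group": "0-5", "region": "north", "program": "nutrition"},
    {"age_group": "6-12", "region": "south", "program": "education"},
    {"age_group": "6-12", "region": "south", "program": "education"},
    {"age_group": "6-12", "region": "south", "program": "education"},
    {"age_group": "0-5", "region": "north", "program": None},
    {"age_group": "6-12", "region": "south", "program": "nutrition"},
]

before = completeness(records, TARGET)
cleaned = impute(records, FEATURES, TARGET)
after = completeness(cleaned, TARGET)
suspects = flag_contradictions(records, FEATURES, TARGET)

print(f"completeness before: {before:.3f}, after: {after:.3f}")
print(f"records flagged as possible contradictions: {suspects}")
```

In a real pipeline the flagged records would more plausibly be routed to review than overwritten automatically, and the learner would be chosen by comparing several candidate models, as the abstract describes.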

Keywords: Quality; Data; Machine learning; Completeness; Coherence.
