SciELO - Scientific Electronic Library Online

 
Revista Facultad de Ingeniería

Print version ISSN 0121-1129

Abstract

TORRES-DOMINGUEZ, Omar et al. Anomalies detection for big data. Rev. Fac. ing. [online]. 2019, vol.28, n.50, pp.62-76. ISSN 0121-1129.  https://doi.org/10.19053/01211129.v28.n50.2019.8793.

The development of the digital age has led to a considerable increase in data volumes. These large volumes of data are called big data because they exceed the processing capacity of conventional database systems. Several sectors see opportunities and applications in detecting anomalies in big data problems. Data mining techniques can be very useful for this type of analysis because they allow patterns and relationships to be extracted from large amounts of data. Processing and analyzing these data volumes requires tools built for that scale, such as Apache Spark and Hadoop. These tools do not include specific algorithms for detecting anomalies. The general objective of this work is to develop a new neighborhood-based algorithm for detecting anomalies in big data problems. Based on a comparative study, the KNNW algorithm was selected for its results as the basis for a big data variant. The big data algorithm was implemented in Apache Spark using the MapReduce parallel programming paradigm. Different experiments were then performed to analyze the behavior of the algorithm under different configurations. In these experiments, the execution times and the quality of the results were compared between the sequential variant and the big data variant. The big data variant obtained better results, with a significant difference. The resulting variant, KNNW-Big Data, can process large volumes of data.
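
The neighborhood-based idea described in the abstract can be illustrated with a small sketch. It assumes that KNNW refers to the k-nearest-neighbor weight score, in which a point's anomaly score is the sum of its distances to its k nearest neighbors; the abstract does not define the score, so this is an assumption. The partitioned version only imitates a map step and a reduce step in plain Python. It is not the authors' Apache Spark implementation, and the function names and the partitioning scheme are hypothetical.

```python
import functools
import heapq
import math


def knn_weight_scores(points, k):
    """Sequential sketch: score = sum of distances to the k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        scores.append(sum(heapq.nsmallest(k, dists)))
    return scores


def map_partition(indexed_points, partition, k):
    """Map step (hypothetical): for every point, keep its k smallest
    distances to the points held by one partition."""
    out = {}
    for i, p in indexed_points:
        dists = [math.dist(p, q) for j, q in partition if j != i]
        out[i] = heapq.nsmallest(k, dists)
    return out


def reduce_candidates(a, b, k):
    """Reduce step (hypothetical): merge two candidate lists per point
    and keep only the k smallest distances overall."""
    return {i: heapq.nsmallest(k, a[i] + b[i]) for i in a}


def knn_weight_scores_partitioned(points, k, n_partitions):
    """Split the data into partitions, map over them, then reduce."""
    indexed = list(enumerate(points))
    partitions = [indexed[p::n_partitions] for p in range(n_partitions)]
    partial = [map_partition(indexed, part, k) for part in partitions]
    merged = functools.reduce(
        lambda a, b: reduce_candidates(a, b, k), partial
    )
    return [sum(merged[i]) for i in range(len(points))]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(knn_weight_scores(data, k=2))
    print(knn_weight_scores_partitioned(data, k=2, n_partitions=2))
```

Because the k smallest distances overall are always contained in the union of the k smallest distances per partition, the partitioned sketch yields the same scores as the sequential one. In this toy data set the isolated point (10, 10) receives the highest score.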

Keywords: big data; data mining; detecting anomalies; MapReduce.
