SciELO - Scientific Electronic Library Online

 


TecnoLógicas

Print version ISSN 0123-7799; online version ISSN 2256-5337

Abstract

LOPEZ-TRUJILLO, Sebastián and TORRES-MADRONERO, Maria C. Comparison of Text Summarization Algorithms for Processing Editorials and News in Spanish. TecnoL. [online]. 2021, vol.24, n.51, pp.120-132. Epub 04-Oct-2021. ISSN 0123-7799. https://doi.org/10.22430/22565337.1816.

Language is affected not only by grammatical rules but also by the context and socio-cultural differences. Therefore, automatic text summarization, an area of interest in natural language processing (NLP), faces challenges such as identifying essential fragments according to the context and establishing the type of text under analysis. Previous literature has described several automatic summarization methods; however, no studies so far have examined their effectiveness in specific contexts and Spanish texts. In this paper, we compare three automatic summarization algorithms using news articles and editorials in Spanish. The three algorithms are extractive methods that estimate the importance of a phrase or word based on similarity or word frequency metrics. A document database was built with 33 editorials and 27 news articles, and three summaries of each text were manually extracted employing the three algorithms. The algorithms were quantitatively compared using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric. We analyzed the algorithms’ potential to identify the main components of a text. In the case of editorials, the automatic summary should include a problem and the author’s opinion. Regarding news articles, the summary should describe the temporal and spatial characteristics of an event. In terms of word reduction percentage and accuracy, the method based on the similarity matrix produced the best results and can achieve a 70 % reduction in both cases (i.e., news and editorials). However, semantics and context should be incorporated into the algorithms to improve their performance in terms of accuracy and sensitivity.
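To make the quantities in the abstract concrete, the following is a minimal Python sketch of a word-frequency extractive summarizer, a ROUGE-1 recall score, and the word reduction percentage. It is an illustration only and not the authors' implementation: the abstract does not name the three algorithms it compares, and the function names (`frequency_summary`, `rouge1_recall`, `word_reduction`), the naive sentence splitter, and the stopword handling are all assumptions made here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; the regex word class also matches accented Spanish letters."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_summary(text, n_sentences=1, stopwords=frozenset()):
    """Keep the n sentences with the highest average word frequency, in original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for w in tokenize(text) if w not in stopwords)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in stopwords]
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


def rouge1_recall(candidate, reference):
    """Share of reference unigrams (counted with multiplicity) that the candidate covers."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


def word_reduction(original, summary):
    """Percentage of words removed when going from the original text to the summary."""
    n = len(tokenize(original))
    return 100.0 * (1 - len(tokenize(summary)) / n) if n else 0.0
```

In this sketch, a summary that keeps 3 of 10 words corresponds to a 70 % word reduction. A similarity-matrix method would instead score each sentence by its similarity to the other sentences rather than by raw word counts.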

Keywords: Natural language processing; Recall-Oriented Understudy for Gisting Evaluation; Text Analysis; Text Mining; Automatic Summarization.
