Revista científica
Print version ISSN 0124-2253; On-line version ISSN 2344-8350
Abstract
GARCIA-CHICANGANA, David-Santiago et al. Multi-Client Document Classification Service Based on Machine Learning Techniques and Elasticsearch. Rev. Cient. [online]. 2022, n.43, pp.64-79. Epub 18-Feb-2022. ISSN 0124-2253. https://doi.org/10.14483/23448350.18352.
This paper presents a document classification service that allows multi-client (multi-tenant) document management systems to provide greater confidence and credibility in the document types assigned to documents uploaded by users. The research followed the phases of CRISP-DM and evaluated two document representation models (bag of words with cumulative n-grams, and BERT, recently proposed by Google) and five machine learning techniques (multilayer perceptron, random forests, k-nearest neighbors, decision trees, and naïve Bayes). The experiments used data from two organizations. The best results were obtained by multilayer perceptron, random forests, and k-nearest neighbors, which performed very similarly in overall accuracy and per-class recall. The results are not conclusive as to whether the service can be offered to multiple clients with a single model, since this also depends on each client's documents and document types. The service is therefore built on a microservices architecture that allows each organization to create its own model, monitor its performance in production, and update it when performance is not adequate.
Keywords: CRISP-DM; data analytics; document management system; k-nearest neighbors; multilayer perceptron; random forests; trigrams.
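
The abstract mentions a bag-of-words representation with cumulative n-grams and one model per organization. The following is a minimal sketch of that idea, not the authors' implementation. It assumes whitespace tokenization, n-grams up to trigrams, and cosine similarity for k-nearest neighbors; the names `cumulative_ngrams`, `TenantKnnClassifier`, and `org_a`, as well as the sample texts, are hypothetical.

```python
from collections import Counter
from math import sqrt


def cumulative_ngrams(text, max_n=3):
    """Count every n-gram for n = 1..max_n (assumed meaning of 'cumulative')."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    bag = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TenantKnnClassifier:
    """Hypothetical k-nearest-neighbors model owned by one organization."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (feature bag, document type)

    def fit(self, texts, labels):
        self.examples = [(cumulative_ngrams(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text):
        # Assumes fit() was called with at least one example.
        query = cumulative_ngrams(text)
        ranked = sorted(self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# One model per tenant, so each organization can be retrained independently.
models = {
    "org_a": TenantKnnClassifier(k=1).fit(
        ["invoice total amount due", "employment contract between parties"],
        ["invoice", "contract"],
    ),
}

predicted_type = models["org_a"].predict("amount due on this invoice")
```

Keying the models by tenant mirrors the abstract's conclusion that a single shared model may not suit every client; replacing one entry in `models` would correspond to an organization updating its own model.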