Machine Translation of a Training Set for Semantic Extraction of Relations

Peña-Torres, Jefferson A.; Bucheli, Víctor; Gutiérrez de Piñérez Reyes, Raúl E.

doi:10.19053/0121053x.n39.2022.13436

Cuadernos de Lingüística Hispánica

Print version ISSN 0121-053X; online version ISSN 2346-1829

Abstract

PEÑA-TORRES, Jefferson A.; BUCHELI, Víctor and GUTIÉRREZ DE PIÑÉREZ REYES, Raúl E. Machine Translation of a Training Set for Semantic Extraction of Relations. Cuad. linguist. hisp. [online]. 2022, n.39, pp.1-. Epub 03-Mar-2023. ISSN 0121-053X. https://doi.org/10.19053/0121053x.n39.2022.13436.

Machine translation (MT) can be used to derive annotated corpora from existing English corpora, and the resulting data can be applied to a range of natural language processing (NLP) tasks. Because more resources and data sets are available for training NLP models in English than in other languages, this paper explores the use of MT to automate NLP tasks in Spanish. The article describes a dataset for generic relation extraction (reACE) and the construction of a semantic relation extraction (ER) model for Spanish, trained on samples translated from English into Spanish. The results show that the MT step requires pre-editing the English corpus in order to avoid translation and post-editing errors and to preserve the original corpus annotations. The Spanish ER models achieve precision, recall, and F-measure values comparable to those obtained by the English model, which suggests that machine translation is a useful tool for performing NLP tasks in Spanish.
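
The evaluation measures reported for the ER models are conventionally computed as precision, recall, and F-measure over predicted versus gold relations. The following is a minimal sketch of that computation, not the authors' evaluation code: the triple representation, the function name `precision_recall_f1`, the exact-match criterion, and the example entities and relation labels are all assumptions made for illustration.

```python
from typing import Set, Tuple

# A relation is represented here as (head entity, relation type, tail entity).
# This triple format is an assumption for illustration, not the reACE schema.
Relation = Tuple[str, str, str]


def precision_recall_f1(
    gold: Set[Relation], predicted: Set[Relation]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact matching of relation triples."""
    true_positives = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted relations for a translated Spanish sample.
gold = {
    ("Ana", "EMP-ORG", "Banco Central"),
    ("Cali", "PHYS", "Colombia"),
}
predicted = {
    ("Ana", "EMP-ORG", "Banco Central"),
    ("Ana", "PER-SOC", "Cali"),
}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Under this exact-match assumption, a relation whose annotation is lost or shifted during translation counts as both a missed gold relation and a spurious prediction, which is one reason preserving the original annotations matters for the reported scores.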

Keywords: computational linguistics; machine translation; corpus linguistics; relation extraction.
