Abstract of Towards the enrichment of terminological resources by scientific corpora analysis

Izabella Thomas, Iana Atanassova

  • The research presented in this paper explores the possibility of enriching terminological databases through the analysis of recent scientific publications. Our main concern is to evaluate how useful automatic term extraction can be to a human expert. For our experiment, we constructed two corpora of recent scientific papers in two different sub-domains of the biomedical sciences. We then proceeded in three steps: automatic extraction and ranking of terms from the corpora of scientific papers; evaluation of the overlap between the candidate terms (CTs) extracted from the corpora and those present in the multidisciplinary terminology portal TermSciences; and evaluation by domain experts of the three sets of the top 200 CTs extracted from the different corpora. To extract terms we used the Sensunique Platform, a web-based platform for building terminological resources. Our results show that only about 10% of the extracted CTs are present in the TermSciences resource, which means that many of the extracted CTs, if validated, could be used to enrich the terminological database. Furthermore, the expert evaluation of the top 200 terms for each sub-corpus shows that about 75% of these CTs are correct terms in their respective domains. This result validates our ranking algorithm.
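
The two measurements reported in the abstract, the share of candidate terms already present in a reference resource and the share of top-ranked candidates judged correct by experts, can be illustrated with a small Python sketch. This is not the Sensunique Platform's implementation or the authors' evaluation code: the function names, the lowercase/whitespace normalization, and the toy term lists are all assumptions made for illustration.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic matching rule)."""
    return " ".join(term.lower().split())


def overlap_ratio(candidates, reference):
    """Fraction of distinct candidate terms that also appear in the reference resource."""
    cand = {normalize(t) for t in candidates}
    ref = {normalize(t) for t in reference}
    if not cand:
        return 0.0
    return len(cand & ref) / len(cand)


def precision_at_k(ranked, validated, k):
    """Fraction of the top-k ranked candidates that experts marked as correct."""
    top = ranked[:k]
    if not top:
        return 0.0
    valid = {normalize(t) for t in validated}
    hits = sum(1 for t in top if normalize(t) in valid)
    return hits / len(top)


# Toy data, invented for illustration only (not taken from the paper).
ranked_candidates = ["gene expression", "Stem Cell", "protein folding", "cell line"]
reference_terms = ["stem cell", "immune response"]  # stand-in for a terminology resource
expert_validated = ["gene expression", "stem cell", "protein folding"]

print(overlap_ratio(ranked_candidates, reference_terms))        # 0.25
print(precision_at_k(ranked_candidates, expert_validated, k=4)) # 0.75
```

In the setting described above, `k` would be 200 and the reference list would come from TermSciences; how terms were actually matched against that resource is not stated in the abstract, so exact string matching after normalization is only one plausible choice.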

