Karst Exploration: Extracting Terms and Definitions from Karst Domain Corpus

Senja Pollak; Andraž Repar; Matej Martinc; Vid Podpečan

Affiliation (all authors): Jožef Stefan Institute, Slovenia
Published in: Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal / Iztok Kosem (ed.), Tanara Zingano Kuhn (ed.), Margarita Correia (ed.), José Pedro Ferreira (ed.), Maarten Jansen (ed.), Isabel Pereira (ed.), Jelena Kallas (ed.), Miloš Jakubíček (ed.), Simon Krek (ed.), Carole Tiberius (ed.), 2019, pp. 934-956
Language: English
Abstract
In this paper, we present the extraction of specialized knowledge from a corpus of karstology literature. Domain terms are extracted by comparing the domain corpus to a reference corpus, and several heuristics to improve the extraction process are proposed (filtering based on nested terms, stopwords and fuzzy matching). We also use a word embedding model to extend the list of terms, and evaluate the potential of the approach from a term extraction perspective, as well as in terms of semantic relatedness. This step is followed by an automated term alignment and analysis of the Slovene and English karst terminology in terms of cognates. Finally, the corpus is used for extracting domain definitions, as well as triplets, where the latter can be considered as a potential resource for complementary knowledge-rich context extraction and visualization.