Collecting Collocations for the Albanian Language

Besim Kabashi

University of Erlangen-Nuremberg, Erlangen, Germany
Published in: Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal / Iztok Kosem (ed.), Tanara Zingano Kuhn (ed.), Margarita Correia (ed.), José Pedro Ferreira (ed.), Maarten Jansen (ed.), Isabel Pereira (ed.), Jelena Kallas (ed.), Miloš Jakubíček (ed.), Simon Krek (ed.), Carole Tiberius (ed.), 2019, pp. 478-489
Language: English
Abstract
- This paper describes the collection of data from different sources to build a collocation data set, with the aim of compiling the first contemporary collocation dictionary for the Albanian language. The work is based (1) on the analysis of empirical data, i.e. linguistic corpora, using computational methods and tools, and (2) on traditional dictionaries. As empirical data we use the AlCo (Albanian Text Corpus), the AlCoPress 2017-2019, and n-grams extracted from both, together with association measures such as log-likelihood and the Dice coefficient, using the IMS Open Corpus Workbench (CWB) and the web version of the Corpus Query Processor (CQPweb). Despite the considerable support these tools provide, an unsupervised, automated compilation of a high-quality collocation dictionary, like those created by lexicographers, seems to be impossible without human intervention. To complete the data collection we therefore also use lexical information extracted from traditional dictionaries. The primary goal is to create a language resource that can also be used, among other things, for Natural Language Processing purposes. The work is still in progress and will change before its final version.
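The two association measures named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation, which relies on CWB and CQPweb; the function names and the toy token list are assumptions made for illustration only. The sketch scores adjacent word pairs with the Dice coefficient and with the log-likelihood ratio computed on a 2x2 contingency table.

```python
import math
from collections import Counter


def dice(f_xy, f_x, f_y):
    """Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    return 2 * f_xy / (f_x + f_y)


def log_likelihood(f_xy, f_x, f_y, n):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    f_xy: frequency of the pair (x, y)
    f_x:  frequency of x in first position
    f_y:  frequency of y in second position
    n:    total number of pairs
    """
    observed = [
        f_xy,                  # x followed by y
        f_x - f_xy,            # x followed by something else
        f_y - f_xy,            # something else followed by y
        n - f_x - f_y + f_xy,  # neither x first nor y second
    ]
    expected = [
        f_x * f_y / n,
        f_x * (n - f_y) / n,
        (n - f_x) * f_y / n,
        (n - f_x) * (n - f_y) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


def score_bigrams(tokens):
    """Return {(x, y): (dice, log_likelihood)} for all adjacent pairs."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    pair_freq = Counter(pairs)
    first_freq = Counter(x for x, _ in pairs)
    second_freq = Counter(y for _, y in pairs)
    return {
        (x, y): (
            dice(f_xy, first_freq[x], second_freq[y]),
            log_likelihood(f_xy, first_freq[x], second_freq[y], n),
        )
        for (x, y), f_xy in pair_freq.items()
    }


if __name__ == "__main__":
    # Toy, invented token sequence; a real run would read corpus tokens.
    toy_tokens = "a b a b c d".split()
    ranked = sorted(score_bigrams(toy_tokens).items(),
                    key=lambda item: item[1][1], reverse=True)
    for pair, (d, ll) in ranked:
        print(pair, round(d, 3), round(ll, 3))
```

Ranking by either score only yields collocation candidates; as the abstract notes, the resulting lists still need review before they can enter a dictionary.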