Collecting Collocations for the Albanian Language

    [1] University of Erlangen-Nuremberg, Kreisfreie Stadt Erlangen, Germany

  • Published in: Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal / Iztok Kosem (ed.), Tanara Zingano Kuhn (ed.), Margarita Correia (ed.), José Pedro Ferreira (ed.), Maarten Jansen (ed.), Isabel Pereira (ed.), Jelena Kallas (ed.), Miloš Jakubíček (ed.), Simon Krek (ed.), Carole Tiberius (ed.), 2019, pp. 478-489
  • Language: English
  • Abstract
    • This paper describes the collection of data from different sources to build a collocation data set, with the aim of compiling the first contemporary collocation dictionary for the Albanian language. The work is based (1) on the analysis of empirical data, i.e., linguistic corpora, using computational methods and tools, and (2) on traditional dictionaries. As empirical data we use AlCo (Albanian Text Corpus) and AlCoPress 2017-2019, together with n-grams extracted from both, and association measures such as log-likelihood and the Dice coefficient, computed with the IMS Open Corpus Workbench (CWB) and the web version of the Corpus Query Processor (CQPweb). Despite the considerable support these tools provide, a fully automated, unsupervised compilation of a collocation dictionary of the quality that lexicographers achieve seems impossible without manual intervention. To complete the data collection, we therefore also use lexical information extracted from traditional dictionaries. The primary goal is to create a language resource that can also be used, among other things, for Natural Language Processing purposes. The work is still in progress and will change before its final version.
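The abstract names two association measures, log-likelihood and the Dice coefficient, that were applied with CWB and CQPweb to rank collocation candidates. The following is a minimal, hypothetical Python sketch of how both measures can be computed for adjacent word pairs from a 2×2 contingency table. It is not the paper's pipeline or the CWB/CQPweb implementation. The helper names (`bigram_tables`, `dice`, `log_likelihood`, `score_pairs`) and the toy Albanian tokens are assumptions made for illustration. Restricting the counts to adjacent pairs is also an assumed simplification.

```python
import math
from collections import Counter


def bigram_tables(tokens):
    """Count adjacent word pairs and their marginal frequencies (hypothetical helper)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    first = Counter()   # f(x): how often x is the first word of a pair
    second = Counter()  # f(y): how often y is the second word of a pair
    for (x, y), f in pairs.items():
        first[x] += f
        second[y] += f
    n = sum(pairs.values())  # total number of pair tokens
    return pairs, first, second, n


def dice(f_xy, f_x, f_y):
    """Dice coefficient: 2 * f(x,y) / (f(x) + f(y))."""
    return 2.0 * f_xy / (f_x + f_y)


def log_likelihood(f_xy, f_x, f_y, n):
    """Log-likelihood ratio G2 over the 2x2 contingency table of (x, y)."""
    observed = [
        f_xy,                  # x followed by y
        f_x - f_xy,            # x followed by another word
        f_y - f_xy,            # another word followed by y
        n - f_x - f_y + f_xy,  # neither x first nor y second
    ]
    row = [observed[0] + observed[1], observed[2] + observed[3]]
    col = [observed[0] + observed[2], observed[1] + observed[3]]
    # Expected counts under independence, in the same cell order as observed.
    expected = [row[i] * col[j] / n for i in (0, 1) for j in (0, 1)]
    # Cells with an observed count of 0 contribute nothing (0 * log 0 = 0).
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def score_pairs(tokens):
    """Return {(x, y): (dice, g2)} for every adjacent pair in tokens."""
    pairs, first, second, n = bigram_tables(tokens)
    return {
        (x, y): (dice(f, first[x], second[y]),
                 log_likelihood(f, first[x], second[y], n))
        for (x, y), f in pairs.items()
    }


if __name__ == "__main__":
    # Toy input (hypothetical), not drawn from AlCo or AlCoPress.
    tokens = "marr vendim marr pjesë marr vendim".split()
    ranked = sorted(score_pairs(tokens).items(), key=lambda item: -item[1][1])
    for pair, (d, g2) in ranked:
        print(pair, f"Dice={d:.2f}", f"G2={g2:.2f}")
```

The two measures behave differently. Dice is high when the two words seldom occur apart. Log-likelihood grows with frequency, so it favors pairs that are both frequent and strongly associated. Corpus tools commonly count co-occurrences within a wider window around the node word rather than only adjacent pairs.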

