POS-tagging a bilingual parallel corpus: methods and challenges

    1. [1] Universidade de Santiago de Compostela, Santiago de Compostela, Spain

  • Published in: Research in Corpus Linguistics (RiCL), ISSN-e 2243-4712, No. 5, 2017, pp. 35-46
  • Language: English
  • Abstract
    • This paper reviews the author’s experiences of tokenizing and POS tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts. This is part of an ongoing process of annotating the corpus for part-of-speech information. This study discusses the specific problems encountered so far. On the one hand, tagging performance degrades significantly when applied to fictional data and, on the other, pre-existing annotation schemes are all language specific. To further improve accuracy during post-editing, the author has developed a common tagset and identified major error patterns.
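
The idea of a common tagset over language-specific annotation schemes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the German tags follow STTS-style labels, the Spanish tags follow EAGLES-style labels, the coarse target categories are hypothetical, and none of this is taken from the PaGeS tagset described in the paper.

```python
# Hypothetical mapping tables. Tag names are illustrative assumptions,
# not the common tagset developed for the PaGeS Corpus.
GERMAN_TO_COMMON = {
    "NN": "NOUN",     # common noun (STTS-style)
    "NE": "PROPN",    # proper noun
    "VVFIN": "VERB",  # finite full verb
    "ART": "DET",     # article
    "ADJA": "ADJ",    # attributive adjective
    "APPR": "ADP",    # preposition
}

# EAGLES-style Spanish tags encode category and subtype in the first two
# characters, so the lookup here uses that prefix only.
SPANISH_PREFIX_TO_COMMON = {
    "NC": "NOUN",
    "NP": "PROPN",
    "VM": "VERB",
    "DA": "DET",
    "AQ": "ADJ",
    "SP": "ADP",
}


def to_common_tag(tag: str, lang: str) -> str:
    """Map a language-specific tag to a coarse common tag ("X" if unknown)."""
    if lang == "de":
        return GERMAN_TO_COMMON.get(tag, "X")
    if lang == "es":
        return SPANISH_PREFIX_TO_COMMON.get(tag[:2], "X")
    raise ValueError(f"unsupported language: {lang}")


def map_sentence(tagged, lang):
    """Convert a list of (token, tag) pairs to (token, common_tag) pairs."""
    return [(token, to_common_tag(tag, lang)) for token, tag in tagged]


if __name__ == "__main__":
    german = [("Das", "ART"), ("Haus", "NN")]
    spanish = [("La", "DA0FS0"), ("casa", "NCFS000")]
    print(map_sentence(german, "de"))
    print(map_sentence(spanish, "es"))
```

Mapping both sides of an aligned sentence pair to the same coarse categories makes tags comparable across languages; tags that fall through to "X" are natural candidates for manual review during post-editing.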

