Adding Compound Splitting and Analysis to a Semantic Tagger of Modern Standard Finnish – On the Way to FiSTComp

    1. [1] University of Eastern Finland

      Kuopio, Finland

  • Published in: Human Language Technologies – The Baltic Perspective: Proceedings of the Ninth International Conference Baltic HLT 2020 / edited by Andrius Utka, Jurgita Vaičenonienė, Jolanta Kovalevskaitė, Danguolė Kalinauskaitė, 2024, ISBN 978-1-64368-116-0, pp. 150-157
  • Language: English
  • Abstract
    • This study continues work in progress on implementing a full-text lexical semantic tagger for Finnish, FiST. The tagger is based on a 46,226-lexeme semantic lexicon of Finnish that was published in 2016 [1]. Kettunen [2], [3] describes the basic working version of FiST. FiST is built from freely available components: the first implementation uses Omorfi and FinnPos for morphological analysis and disambiguation of Finnish words. The current paper describes work on compound splitting for semantic tagging and its effect on the lexical coverage of the tagger. We try out two different approaches to morphological analysis and disambiguation of words for an improved version of FiST, FiSTComp: FinnPos [4] and the Turku Dependency Parser [5], [6], UD1. Both tools disambiguate the morphological interpretations of words and mark compound boundaries, but the details and granularity of constituent decomposition vary. Our results with two-, three- and four-part compounds show that analysing compounds through their constituents with UD1 may improve the lexical coverage of the tagger by about 6.6 percentage points at best. Although we are able to make progress on the basic problems of compound splitting, the results are still preliminary, and further work is needed because compounds are a complex phenomenon.
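
The idea of measuring lexical coverage with and without compound splitting can be illustrated with a minimal sketch. This is not the FiST or FiSTComp implementation: the toy lexicon, the tag labels, the `#` boundary marker, and the function names `tag_lemma` and `coverage` are all assumptions made for illustration, and the rule that every constituent must be in the lexicon is one possible policy rather than the one reported in the paper.

```python
from typing import Dict, List, Optional

# Toy lexicon: lemma -> semantic tag. Entries and tag labels are invented.
LEXICON: Dict[str, str] = {
    "kirja": "TAG_BOOK",
    "kauppa": "TAG_SHOP",
    "talo": "TAG_BUILDING",
}

# Assumed compound boundary marker; real analysers use their own formats.
BOUNDARY = "#"


def tag_lemma(lemma: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Return tags for a lemma, falling back to its compound constituents."""
    whole = lemma.replace(BOUNDARY, "")
    if whole in lexicon:
        return [lexicon[whole]]
    if BOUNDARY not in lemma:
        return None
    parts = lemma.split(BOUNDARY)
    # Assumed policy: accept the compound only if every constituent is known.
    if all(part in lexicon for part in parts):
        return [lexicon[part] for part in parts]
    return None


def coverage(lemmas: List[str], lexicon: Dict[str, str], split: bool) -> float:
    """Percentage of lemmas that receive at least one tag."""
    if not lemmas:
        return 0.0
    hits = 0
    for lemma in lemmas:
        if split:
            tagged = tag_lemma(lemma, lexicon) is not None
        else:
            tagged = lemma.replace(BOUNDARY, "") in lexicon
        hits += int(tagged)
    return 100.0 * hits / len(lemmas)


# Hypothetical analyser output with compound boundaries already marked.
analysed = ["kirja", "kirja#kauppa", "talo", "auto#talli"]
baseline = coverage(analysed, LEXICON, split=False)
with_split = coverage(analysed, LEXICON, split=True)
gain = with_split - baseline  # difference in percentage points
```

In this toy input, `kirja#kauppa` is missing from the lexicon as a whole word but is covered through its two constituents, whereas `auto#talli` stays uncovered because neither part is listed. The gain is expressed as a difference in percentage points, the same unit the abstract uses for its coverage improvement.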

