Comparison of Pre-trained vs Custom-trained Word Embedding Models for Word Sense Disambiguation

    1. [1] University of Lahore, Pakistan

    2. [2] University of Central Punjab, Pakistan

  • Published in: ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, ISSN-e 2255-2863, Vol. 12, No. 1, 2023
  • Language: English
  • Abstract
    • The prime objective of word sense disambiguation (WSD) is to develop machines that can automatically recognize the actual meaning (sense) of an ambiguous word in a sentence. WSD can benefit various NLP and HCI tasks. Researchers have explored a wide variety of methods to resolve this sense ambiguity; however, their focus has largely been on English and a few other well-resourced languages. Urdu, with more than 300 million users and a large amount of electronic text available on the web, remains unexplored. In recent years, word embedding methods have proven extremely successful for a variety of Natural Language Processing tasks. This study evaluates, compares, and applies a variety of word embedding approaches to Urdu WSD (both Lexical Sample and All-Words), including pre-trained models (Word2Vec, GloVe, and FastText) as well as custom-trained models (Word2Vec, GloVe, and FastText trained on the Ur-Mono corpus). Two benchmark corpora are used for the evaluation in this study: (1) the UAW-WSD-18 corpus and (2) the ULS-WSD-18 corpus. For the Urdu All-Words WSD task, the top results (Accuracy=60.07 and F1=0.45) were achieved using pre-trained FastText. For the Lexical Sample WSD task, the best results (Accuracy=70.93 and F1=0.60) were achieved using the custom-trained GloVe word embedding method.
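
    The comparison the abstract describes (pre-trained versus custom-trained embeddings applied to WSD) can be illustrated with a minimal Python sketch. The snippet below is not the paper's pipeline; it assumes gensim, a hypothetical corpus file ur_mono_corpus.txt (one whitespace-tokenized sentence per line, in the spirit of a monolingual corpus like Ur-Mono), and a hypothetical sense inventory mapping sense labels to gloss tokens. It trains a custom FastText model, then disambiguates a target word by cosine similarity between an averaged context vector and averaged sense-gloss vectors.

        import numpy as np
        from gensim.models import FastText

        # Custom training on a monolingual corpus (hypothetical path;
        # one whitespace-tokenized sentence per line).
        with open("ur_mono_corpus.txt", encoding="utf-8") as f:
            sentences = [line.split() for line in f]

        model = FastText(sentences=sentences, vector_size=100, window=5,
                         min_count=2, epochs=5)

        def avg_vector(tokens, wv):
            # Mean of the embeddings of the tokens present in the vocabulary.
            vecs = [wv[t] for t in tokens if t in wv]
            return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

        def disambiguate(context_tokens, sense_glosses, wv):
            # Pick the sense whose averaged gloss vector is most
            # cosine-similar to the averaged context vector.
            ctx = avg_vector(context_tokens, wv)
            best_sense, best_score = None, -1.0
            for sense, gloss_tokens in sense_glosses.items():
                g = avg_vector(gloss_tokens, wv)
                denom = np.linalg.norm(ctx) * np.linalg.norm(g)
                score = float(ctx @ g / denom) if denom else 0.0
                if score > best_score:
                    best_sense, best_score = sense, score
            return best_sense

    Swapping the custom model for published pre-trained vectors (for example, loading Facebook's fastText vectors with gensim.models.fasttext.load_facebook_vectors) leaves the disambiguation step unchanged, which is what makes a pre-trained versus custom-trained comparison of the kind the study reports straightforward to set up.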

