Comparing research trends through author-provided keywords with machine extracted terms: A ML algorithm approach using publications data on neurological disorders

Priya Tiwari; Saloni Chaudhary; Debasis Majhi; Bhaskar Mukherjee

Tiwari, Priya [1]; Chaudhary, Saloni [1]; Majhi, Debasis [1]; Mukherjee, Bhaskar [1]
[1] Banaras Hindu University, India
Source: Iberoamerican Journal of Science Measurement and Communication, ISSN 2709-7595, e-ISSN 2709-3158, Vol. 3, No. 1, 2023 (Issue dedicated to: Iberoamerican Journal of Science Measurement and Communication)
Language: English
Parallel titles:
- English
Abstract
- Objective. This study aimed to identify the primary research areas, countries, and organizations involved in publications on neurological disorders by analyzing human-assigned keywords. These results were then compared with terms extracted from the titles and abstracts of the publications by unsupervised and machine-algorithm-based techniques, in order to identify the deficiencies of each approach. The comparison shows how far machine-derived terms from titles and abstracts can substitute for the human-assigned keywords of scientific research articles.
  
  Design/Methodology/Approach. Significant research areas on neurological disorders were identified from the author-provided keywords of publications downloaded from Web of Science and PubMed. These results were then compared with the terms extracted from titles and abstracts by an unsupervised tool, VOSviewer, and by machine-algorithm-based techniques, namely YAKE and CountVectorizer.
  
  Results/Discussion. We observed that the post-COVID-19 era witnessed more research on various neurological disorders, but authors still chose more generic than specific terms for their keyword lists. The unsupervised extraction tool, VOSviewer, identified many extraneous and insignificant terms along with significant ones. However, our self-developed machine learning algorithm using CountVectorizer and YAKE gave precise results, provided that additional stop words were added to the stop-word list of the NLTK toolkit.
  
  Conclusion. Although author-provided keywords play a vital role, because authors assign them in a broad sense to increase readability, these concept terms lacked the specificity needed for in-depth analysis. We suggest that the ML algorithm, being more compatible with unstructured data, is a valid alternative to author-generated keywords for more accurate results.
  
  Originality/Value. To our knowledge, this is the first study to compare author-provided keywords with machine-extracted terms on real datasets, which may be an essential lead in the machine learning domain. Replicating these techniques on large datasets from different fields may yield a valuable knowledge resource for experts and stakeholders.
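The count-based extraction and the comparison with author keywords described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library as a simplified stand-in for CountVectorizer (stop words are removed first, then unigrams and bigrams are counted), and it does not cover YAKE or VOSviewer. The stop-word sets, the function names `tokenize`, `extract_terms`, and `jaccard`, and the use of Jaccard overlap as a comparison measure are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stand-in for a general English stop-word list
# (the abstract mentions the NLTK list, which is far larger).
BASE_STOP_WORDS = {"an", "and", "the", "of", "in", "on", "for", "with",
                   "to", "is", "are", "was", "were", "this", "that", "we", "by"}

# Assumption: hypothetical domain-specific additions, i.e. generic scholarly
# words that would otherwise crowd out disorder-specific terms.
EXTRA_STOP_WORDS = {"study", "results", "patients", "using", "analysis"}


def tokenize(text):
    """Lowercase the text and keep alphabetic (optionally hyphenated) tokens
    that are at least two characters long."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if len(t) > 1]


def extract_terms(docs, stop_words, max_n=2, top_k=10):
    """Return the top_k most frequent n-grams (n = 1..max_n) across docs.

    Stop words are dropped before n-grams are formed, so a bigram may join
    two words that were separated by a stop word in the original text.
    """
    counts = Counter()
    for doc in docs:
        tokens = [t for t in tokenize(doc) if t not in stop_words]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [term for term, _ in counts.most_common(top_k)]


def jaccard(terms_a, terms_b):
    """Jaccard overlap between two term lists, ignoring case and padding."""
    a = {t.lower().strip() for t in terms_a}
    b = {t.lower().strip() for t in terms_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Made-up titles, used only to exercise the functions.
    docs = [
        "Deep brain stimulation in Parkinson disease: a study of motor outcomes",
        "Results of deep brain stimulation for patients with Parkinson disease",
    ]
    author_keywords = ["Parkinson disease", "deep brain stimulation"]
    machine_terms = extract_terms(docs, BASE_STOP_WORDS | EXTRA_STOP_WORDS)
    print(machine_terms)
    print(jaccard(author_keywords, machine_terms))
```

Running `extract_terms` once with `BASE_STOP_WORDS` alone and once with the extended set shows the effect the abstract reports: generic words such as "study" appear among the extracted terms until they are added to the stop-word list.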