Introduction

The accelerated pace at which new scientific knowledge is produced and communicated makes it challenging for scholars to keep up with the latest advances, even in their own research areas.(1) The growth in the generation and dissemination of scientific information also poses a barrier to the discovery and assessment of relevant findings by editors, research managers and decision makers, limiting and/or delaying the impact of scientific outcomes on the definition of evidence-based policies (Rogers, 2010).
Some elements in the evaluation of scientific production necessarily require the intervention of human experts. These include weighing the relevance of the problem at stake and the in-depth appraisal of the potential impact of the proposed solutions. Language technologies, however, can facilitate the assessment of other aspects of scientific communication. The verifiability of the claims included in articles, their effectiveness with respect to their communication objectives, and their reliability in terms of the evidence provided are some areas in which natural language processing (NLP) tools can make a contribution.
Different quality aspects need to be considered when assessing the argumentative structure of a research paper, including its logical, rhetorical and dialectical dimensions (Wachsmuth et al., 2017). This generally involves not only identifying the information that the authors provide about what they do in their work and the conclusions that they draw from it (the claims), but also considering the motivations/justifications for the proposed intervention and the evidence that they offer to support their assertions (the premises).
The automatic identification of arguments, their components and their relations in texts is known as "argument mining" or "argumentation mining" (Lawrence and Reed, 2020). The steps involved in the automatic extraction of arguments from texts, including the identification of claims and premises and the prediction of the argumentative structure (how they are linked together), are not substantially different from other text-mining tasks to which supervised learning methods are generally applied (e.g., text segmentation, sequence labeling and entity linking) (Lippi and Torroni, 2016). However, state-of-the-art results for these tasks are currently obtained by means of parameter-rich neural architectures that require large amounts of annotated data. The identification of argumentative units and relations in scientific texts, in particular, has been found to be especially challenging, even for humans, due to the inherent complexity of scientific discourse (Stab and Gurevych, 2014). The scarcity of annotated corpora, therefore, can hinder the advance of argumentation mining in this important domain.
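To make this framing concrete, the following is a minimal sketch, not taken from the thesis, of argumentative-component identification cast as supervised sentence classification. The sentences, the label set and the simple lexical baseline are all illustrative placeholders for the neural architectures discussed later.

```python
# A minimal sketch of framing argumentative-component identification as
# supervised sentence classification. Sentences and labels are invented;
# real systems typically rely on neural encoders rather than TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: each sentence of an abstract receives a component label.
sentences = [
    "We propose a new model for relation extraction.",
    "Previous approaches fail to capture long-range context.",
    "Our method improves F1 on two benchmark datasets.",
    "The corpus is publicly available.",
]
labels = ["CLAIM", "PREMISE", "PREMISE", "OTHER"]

# A simple lexical baseline: TF-IDF features + logistic regression.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

print(clf.predict(["We introduce an annotation scheme for abstracts."]))
```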
Objectives

In the first part of this thesis we address the identification of argumentative components in scientific abstracts and the relations between them. The main goals of this part include:
i. To propose a new annotation scheme for the argumentative structure of scientific abstracts that can help bridge an existing gap between several overlapping research areas, including argument mining in scientific texts, rhetorical analysis of scientific discourse, and full-fledged discourse parsing;

ii. To apply and evaluate the proposed scheme in the annotation of abstracts in two scientific disciplines, computational linguistics and biomedicine, and to make the resulting corpus available as a contribution to the research community;

iii. To use the newly created corpus to train and evaluate machine-learning models aimed at predicting the argumentative structure of abstracts, both at the sentence and intra-sentence levels;

iv. To explore the possibility of adapting models trained on texts from one scientific discipline to predict the argumentative structure of abstracts in another discipline;(2)

v. To explore the potential benefits of leveraging annotations available for related tasks, in particular discourse parsing and rhetorical classification of sentences, for mining arguments in scientific texts;

vi. To evaluate two specific transfer-learning approaches in the context of our tasks and domains: supplementary training on intermediate tasks and multi-task learning;

vii. To investigate whether benefits can be obtained, in particular for the identification of argumentative components in scientific texts, by implementing a rhetorical-complexity-aware pipeline that allows sentence-level and intra-sentence-level tasks to be addressed individually.
In the second part of the thesis we analyze the practical usability of the gold annotations, and of the predictions obtained with the models developed in the first part, for the automatic assessment of argumentative-quality dimensions. This includes:
viii. To analyze whether features obtained from the argumentative structure of scientific abstracts can contribute to predicting scores that reflect argumentative-quality dimensions of the abstracts and/or the full papers;

ix. To explore the potential benefits of incorporating annotation-confidence information into the training process of models aimed at predicting quality scores.(3)

Contributions and outline

In the first part of the thesis we focus on the prediction of the argumentative structure of scientific abstracts:
- In Chapter 2 we contextualize our work within the fields of argument mining and the analysis of the rhetorical and discourse structure of scientific texts.
- In Chapter 3 we present the SciARG corpus of scientific abstracts. We describe our proposed annotation scheme and its application to the annotation of 225 abstracts in computational linguistics (SciARG-CL). We describe the annotation process and evaluate inter-annotator agreement.
- In Chapter 4 we use the SciARG-CL corpus to train and evaluate BERT-based (Devlin et al., 2019) models aimed at predicting the argumentative structure of the abstracts. We consider models trained for each task independently, as well as jointly, in multi-task settings. In particular, we investigate the possibility of leveraging existing discourse-level annotations by considering, as an intermediate task, the prediction of discourse relations between sentences before fine-tuning the models on our target tasks (a sketch of this transfer strategy is given after this list). Based on this first series of experiments, we argue that different methods should be applied for the identification of argumentative components at the sentence and intra-sentence levels, and that the rhetorical complexity of sentences should be considered when deciding which method(s) to apply in each case.
- In Chapter 5 we explore different ways of determining the rhetorical complexity of sentences and conduct experiments to identify the inner rhetorical and argumentative structures of sentences (an illustration of such intra-sentence segmentation also follows this list). We again investigate the possibility of leveraging existing annotations for these tasks; in this case, we consider annotations that describe the rhetorical role of both sentences and intra-sentence segments in scientific abstracts, as well as annotations that establish discourse relations within sentences.
- In Chapter 6 we investigate the adaptability of the proposed annotation scheme to a scientific discipline different from the one for which it was originally developed, by extending the SciARG corpus with 285 abstracts in biomedicine (SciARG-BIO). We analyze similarities and differences between the new annotations and those in SciARG-CL, and report the results of experiments conducted with them. In particular, we examine whether language models fine-tuned with annotations in computational linguistics can be plugged directly into an architecture used to predict the argumentative structure of biomedical abstracts, without further fine-tuning.
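As mentioned in the outline of Chapter 4 above, one of the transfer-learning approaches we evaluate is supplementary training on an intermediate task. The sketch below illustrates the general pattern under stated assumptions: the checkpoint name, the label counts and the use of the Hugging Face transformers API are illustrative choices, not the exact experimental setup of the thesis.

```python
# Hedged sketch of intermediate-task transfer: a BERT encoder is first
# fine-tuned to predict discourse relations between sentence pairs, then
# re-used (with a fresh head) for an argumentative target task.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Step 1: intermediate task, e.g. discourse-relation classification
# (8 relation labels is a hypothetical choice).
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=8)
# ... fine-tune on sentence pairs annotated with discourse relations
# (e.g., with transformers.Trainer), then save the adapted encoder:
model.save_pretrained("bert-intermediate-discourse")

# Step 2: target task. Reloading from the intermediate checkpoint keeps the
# adapted encoder weights; the classification head is re-initialized because
# the label set changes.
target_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-intermediate-discourse",
    num_labels=4,                  # hypothetical argumentative labels
    ignore_mismatched_sizes=True,  # allow a fresh head for the new labels
)
inputs = tokenizer("We propose X.", "It outperforms Y.", return_tensors="pt")
logits = target_model(**inputs).logits  # scores over the target labels
```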
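The intra-sentence analysis of Chapter 5 can similarly be pictured as BIO-style segment labeling. The toy example below, with invented tokens and hypothetical segment types, shows how token-level tags can be decoded into labeled argumentative segments:

```python
# Toy illustration of intra-sentence argumentative segmentation as BIO
# token labeling; the segment types (CLAIM/PREMISE) are hypothetical.
tokens = ["We", "propose", "X", "because", "it", "reduces", "cost", "."]
tags = ["B-CLAIM", "I-CLAIM", "I-CLAIM",
        "B-PREMISE", "I-PREMISE", "I-PREMISE", "I-PREMISE", "O"]

def decode_segments(tokens, tags):
    """Group BIO-tagged tokens into (label, text) segments."""
    segments, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                segments.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:  # an "O" tag closes any open segment
            if current:
                segments.append((label, " ".join(current)))
            current, label = [], None
    if current:
        segments.append((label, " ".join(current)))
    return segments

print(decode_segments(tokens, tags))
# [('CLAIM', 'We propose X'), ('PREMISE', 'because it reduces cost')]
```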
In the second part of the thesis we explore the use of the argumentative units and relations predicted with the methods described in the first part in a downstream application. In particular, we analyze whether they can help predict argumentative-quality dimensions of the texts.
- In Chapter 7 we review related work and antecedents in the area of argumentative quality assessment.
- In Chapter 8 we conduct experiments aimed at predicting quality scores assigned by annotators to the abstracts included in the SciARG-CL corpus.
- In Chapter 9 we use scores assigned by reviewers to manuscripts included in the ACL, CoNLL and ICLR sections of the PeerRead dataset (Kang et al., 2018) to investigate whether a set of features extracted from the argumentative structure of scientific abstracts can serve as predictors of specific argumentative-quality dimensions of the corresponding papers.
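As a simple illustration of the kind of experiments described in Chapters 8 and 9, structural features derived from an abstract's (predicted) argumentation can feed a standard regressor over quality scores. The sketch below uses invented feature values and scores; it is not the actual feature set or data used in the thesis.

```python
# Minimal sketch: predicting an argumentative-quality score from structural
# features of an abstract's argumentation. All numbers are invented.
import numpy as np
from sklearn.linear_model import Ridge

# One row per abstract: [n_claims, n_premises, n_support_relations, tree_depth]
X = np.array([
    [1, 3, 3, 2],
    [2, 2, 1, 1],
    [1, 5, 4, 3],
    [3, 1, 0, 1],
])
# Hypothetical reviewer scores for one argumentative-quality dimension.
y = np.array([4.0, 3.0, 4.5, 2.5])

reg = Ridge(alpha=1.0).fit(X, y)
print(reg.predict([[2, 4, 3, 2]]))  # predicted score for a new abstract
```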
Finally, in Chapter 10 we summarize the main conclusions of our work and describe potential follow-ups.
(1) In its most recent overview of scientific and scholarly publishing, from 2018, the International Association of Scientific, Technical and Medical Publishers (STM) indicates that "the number of articles published each year and the number of journals have both grown steadily for over two centuries, by about 3% and 3.5% per year respectively. However, growth has accelerated to 4% per year for articles and over 5% for journals in recent years" (Johnson et al., 2018).
(2) This, in turn, would shed some light on whether the argumentative structure of the abstracts encoded in these models is tied to the scientific discipline on which they are trained.
(3) A task with high levels of subjectivity, where mixed levels of reliability can be obtained for the annotations.
References

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Johnson, R., Watkinson, A., and Mabe, M. (2018). The STM report: An overview of scientific and scholarly publishing. International Association of Scientific, Technical and Medical Publishers.
Kang, D., Ammar, W., Dalvi, B., van Zuylen, M., Kohlmeier, S., Hovy, E., and Schwartz, R. (2018). A dataset of peer reviews (PeerRead): Collection, insights and NLP applications. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1647–1661, New Orleans, Louisiana. Association for Computational Linguistics.
Lawrence, J. and Reed, C. (2020). Argument mining: A survey. Computational Linguistics, 45(4):765–818.
Lippi, M. and Torroni, P. (2016). Argumentation mining: State of the art and emerging trends. ACM Trans. Internet Technol., 16(2):10:1–10:25.
Rogers, E. M. (2010). Diffusion of innovations. Simon and Schuster.
Stab, C. and Gurevych, I. (2014). Annotating argument components and relations in persuasive essays. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1501–1510, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.
Wachsmuth, H., Naderi, N., Hou, Y., Bilu, Y., Prabhakaran, V., Thijm, T. A., Hirst, G., and Stein, B. (2017). Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 176–187, Valencia, Spain. Association for Computational Linguistics.