Comment on “Comparing the Performance of College Chemistry Students with ChatGPT for Calculations Involving Acids and Bases”

Joshua Schrier ^[1]
1. [1] Fordham University, United States
Published in: Journal of Chemical Education, ISSN 0021-9584, Vol. 101, No. 5, 2024, pp. 1782-1784
Language: English
Abstract
- In a recent paper in this Journal (J. Chem. Educ. 2023, 100, 3934−3944), Clark et al. evaluated the performance of the GPT-3.5 large language model (LLM) on ten undergraduate pH calculation problems. They reported that GPT-3.5 gave especially poor results for salt and titration problems, returning the correct results only 10% and 0% of the time, respectively, and that, despite a correct application of heuristics, the LLM made mathematical errors and used flawed strategies. However, these problems are partially mitigated using the more advanced GPT-4 model and entirely corrected using simple prompting and calculator tool use patterns demonstrated herein.
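
The calculator tool use pattern mentioned in the abstract can be illustrated with a minimal sketch: the language model is asked only to set up the arithmetic expression, and a deterministic evaluator computes the number. The code below is not taken from the paper. The function name `calculator`, the set of allowed functions, and the worked example (0.10 M acetic acid with a textbook Ka of 1.8e-5, using the weak-acid approximation [H+] ≈ sqrt(Ka·C)) are all assumptions chosen for illustration.

```python
import ast
import math
import operator

# Hypothetical calculator tool (not the paper's implementation): it evaluates
# a plain arithmetic expression string that a model might emit.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}
# Only these named functions may be called inside an expression.
_FUNCS = {"log10": math.log10, "sqrt": math.sqrt}


def calculator(expression: str) -> float:
    """Evaluate arithmetic with numbers, + - * / **, log10 and sqrt only."""

    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
        ):
            return _FUNCS[node.func.id](*[ev(arg) for arg in node.args])
        # Anything else (names, attributes, other calls) is rejected.
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


# Illustrative values (assumed, not from the paper): 0.10 M acetic acid,
# Ka = 1.8e-5, weak-acid approximation pH = -log10(sqrt(Ka * C)).
ph = calculator("-log10(sqrt(1.8e-5 * 0.10))")
print(round(ph, 2))  # 2.87
```

In this sketch the model's job ends at producing the string `"-log10(sqrt(1.8e-5 * 0.10))"`; the logarithm and square root are never computed by the model itself, which is the step where the abstract reports mathematical errors.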