Summary of Comment on “Comparing the Performance of College Chemistry Students with ChatGPT for Calculations Involving Acids and Bases”

In a recent paper in this Journal (J. Chem. Educ. 2023, 100, 3934–3944), Clark et al. evaluated the performance of the GPT-3.5 large language model (LLM) on ten undergraduate pH calculation problems. They reported that GPT-3.5 gave especially poor results for salt and titration problems, returning correct results only 10% and 0% of the time, respectively, and that, despite correctly applying heuristics, the LLM made mathematical errors and used flawed strategies. However, these problems are partially mitigated by using the more advanced GPT-4 model and entirely corrected by the simple prompting and calculator tool-use patterns demonstrated herein.
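As a rough illustration of what a calculator tool-use pattern can look like, the sketch below has the model emit an arithmetic expression as text and leaves the numerical evaluation to ordinary code. It is not the prompts or tooling used in the Comment: the `calculator` function, the set of allowed operations, and the example acid (Ka = 1.8e-5 at 0.10 M) are all assumptions chosen for illustration, and the expression uses the common weak-acid approximation [H+] ≈ sqrt(Ka·C), which holds only when dissociation is small relative to C.

```python
import ast
import math
import operator

# Hypothetical whitelist of operations the "calculator tool" will evaluate.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCS = {"sqrt": math.sqrt, "log10": math.log10}


def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""

    def ev(node):
        # Numeric literals such as 0.10 or 1.8e-5 (booleans are rejected).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return float(node.value)
        # Whitelisted binary operators: +, -, *, /, **
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        # Unary minus and plus
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        # Whitelisted function calls: sqrt(...), log10(...)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*(ev(arg) for arg in node.args))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


# The model would supply only the expression; the arithmetic is done here.
# Illustrative weak-acid case (assumed values): Ka = 1.8e-5, C = 0.10 M.
ph = calculator("-log10(sqrt(1.8e-5 * 0.10))")
print(round(ph, 2))
```

The point of the design is the division of labor: the model chooses the strategy and writes down the expression, while the numerical step, where the reported mathematical errors arose, is handled deterministically.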
