AI versus Faculty: A Comparative Study of Narrative Feedback on Medical Students' Written Mental Status Exams
Abstract
Objective: The authors evaluated narrative feedback generated by ChatGPT and faculty members for medical students' Mental Status Exam (MSE) write-ups. The study compared feedback quality and usefulness and assessed whether students and academic psychiatrists could identify the feedback source.
Methods: Medical students (N=164) wrote MSEs and received blinded feedback from either a faculty member (for low-scoring write-ups, n=43) or ChatGPT (for high-scoring write-ups, n=121). Students rated the feedback's quality and usefulness and guessed its origin. Three academic psychiatrists also conducted a blinded evaluation, rating both feedback types for the low-scoring MSEs, choosing the superior version, and guessing the source.
Results: Students rated AI feedback quality significantly higher than faculty feedback (mean=4.22 vs. 3.50). Academic psychiatrists preferred the AI-generated feedback in 93% of cases. Only 29% of students receiving AI feedback correctly identified its source. Psychiatrists correctly identified AI feedback only 23% of the time and misattributed faculty feedback as AI-generated 71% of the time.
Conclusions: AI-generated feedback was perceived as high-quality by students and preferred by expert raters. The difficulty in distinguishing AI from faculty feedback suggests generative AI can produce feedback comparable or superior to human experts, offering a scalable tool to support medical education and reduce faculty workload.
Article Details
How to Cite
S. CLEAVES, Elle et al.
AI versus Faculty: A Comparative Study of Narrative Feedback on Medical Students' Written Mental Status Exams.
Medical Research Archives, [S.l.], v. 14, n. 3, apr. 2026.
ISSN 2375-1924.
Available at: <https://esmed.org/MRA/mra/article/view/7347>. Date accessed: 06 apr. 2026.
doi: https://doi.org/10.18103/mra.v14i3.7347.
Keywords
Artificial Intelligence, Medical Education, Feedback, Mental Status Examination, Psychiatry
Section
Research Articles
The Medical Research Archives grants authors the right to publish and reproduce the unrevised contribution in whole or in part at any time and in any form for any scholarly non-commercial purpose with the condition that all publications of the contribution include a full citation to the journal as published by the Medical Research Archives.