
License and Use

Open Access

Citations

3

Altmetrics

Analysis of institutional authors

Garcia Sanchez, Francisco Javier (Author)


March 19, 2025

Application of Artificial Intelligence as an Aid for the Correction of the Objective Structured Clinical Examination (OSCE)

Published in: Applied Sciences-Basel, 15(3): 1153, 2025-02-01. DOI: 10.3390/app15031153

Authors:

Luordo, Davide; Torres Arrese, Marta; Tristan Calvo, Cristina; Shani Shani, Kirti Dayal; Rodriguez Cruz, Luis Miguel; Garcia Sanchez, Francisco Javier; Lagares Gomez-Abascal, Alfonso; Rubio Garcia, Rafael; Delgado Jimenez, Juan; Perez Carreras, Mercedes; Diez Lobato, Ramiro; Granizo Martinez, Juan Jose; Tung-Chen, Yale; Villena Garrido, Ma Victoria

Affiliations

12 Octubre Univ Hosp, Madrid 28041, Spain - Author
Alcorcon Fdn Hosp, Alcorcon 28922, Spain - Author
Autonomous Univ Madrid, Dept Med, Madrid 29040, Spain - Author
Infanta Cristina Univ Hosp, Madrid 28981, Spain - Author
Univ Complutense Madrid, Dept Med, Madrid 28040, Spain - Author

Abstract

The assessment of clinical competencies is essential in medical training, and the Objective Structured Clinical Examination (OSCE) is a key tool in this process. Multiple studies have explored the usefulness of artificial intelligence (AI) in medical education. This study explored the use of the GPT-4 AI model to grade clinical reports written by students during the OSCE at the Teaching Unit of the 12 de Octubre and Infanta Cristina University Hospitals, part of the Faculty of Medicine at the Complutense University of Madrid, comparing its results with those of human graders. Ninety-six (96) students participated, and their reports were evaluated by two experts, an inexperienced grader, and the AI using a checklist designed during the OSCE planning by the teaching team. The results show a significant correlation between the AI and human graders (ICC = 0.77 for single measures and 0.91 for average measures). The AI was more stringent, assigning scores that were on average 3.51 points lower (t = -15.358, p < 0.001); its correction was considerably faster, completing the analysis in only 24 min compared to the 2-4 h required by human graders. These results suggest that AI could be a promising tool to enhance efficiency and objectivity in OSCE grading.
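The two ICC values quoted above (single measures and average measures) are commonly related by the Spearman-Brown formula. The sketch below is only an illustration under that assumption: the function name `average_measures_icc` is hypothetical, and the number of ratings averaged (`k`) is not stated in this record, so the values tried are assumptions rather than the authors' method.

```python
def average_measures_icc(single_icc: float, k: int) -> float:
    """Spearman-Brown step-up: reliability of the mean of k ratings,
    given the reliability of a single rating (assumed relation)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_icc / (1 + (k - 1) * single_icc)


if __name__ == "__main__":
    # k is an assumption; the record does not say how many ratings were averaged.
    for k in (2, 3, 4):
        print(k, round(average_measures_icc(0.77, k), 3))
```

Under this assumption, a single-measures ICC of 0.77 with k = 3 gives approximately 0.91, in line with the reported average-measures value.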

Keywords

AI in healthcare; AI-assisted grading; Artificial intelligence; Clinical competency assessment; Digital OSCE evaluation; Human-AI comparison in grading; Medical education; Medical report evaluation; Objective structured clinical examination (OSCE)

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in the journal Applied Sciences-Basel. According to the WoS (JCR) agency, the journal has progressed and achieved good impact in recent years, becoming a reference in its field. In 2025, the year the work was published, it ranked 50/179 in the category Engineering, Multidisciplinary, which places it in Q2 (second quartile). The journal is also ranked in quartile Q2 by the Scopus (SJR) agency in the category Engineering (Miscellaneous).
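The step from a rank of 50/179 to Q2 can be checked with a short calculation. This is a minimal sketch, assuming the usual convention that the quartile follows from rank divided by the number of journals in the category (Q1 up to 0.25, Q2 up to 0.50, and so on); the function name `jcr_quartile` is hypothetical.

```python
def jcr_quartile(rank: int, total: int) -> str:
    """Map a journal's rank within its category to a quartile label,
    using the rank/total fraction (assumed convention)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    fraction = rank / total
    if fraction <= 0.25:
        return "Q1"
    if fraction <= 0.50:
        return "Q2"
    if fraction <= 0.75:
        return "Q3"
    return "Q4"


if __name__ == "__main__":
    # 50/179 is about 0.279, which falls in the second quartile.
    print(jcr_quartile(50, 179))
```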


Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2026-04-05:

  • Academic use, as evidenced by the Altmetric indicator for saves in the personal reference manager Mendeley, gives a total of: 42.
  • Use of this contribution in bookmarks, code forks, additions to favorites lists for repeated reading, and general views suggests that the publication is being used as a basis for current work, which may be an indicator of future, more formal academic citations. This is supported by the "Capture" indicator, which yields a total of: 42 (PlumX).

For dissemination aimed at more general audiences, other broader scores include:

  • The Total Score from Altmetric: 7.
  • The number of mentions on the social network X (formerly Twitter): 10 (Altmetric).

It is essential to present evidence supporting full alignment with institutional principles and guidelines on Open Science and the Conservation and Dissemination of Intellectual Heritage. A clear example of this is:

  • The work has been submitted to a journal whose editorial policy allows Open Access publication.
  • Assignment of a Handle/URN as an identifier within the deposit in the Institutional Repository: http://hdl.handle.net/20.500.14352/122244

Project objectives

The contribution pursues the following objectives: to analyze the applicability of the GPT-4 artificial intelligence model for grading clinical reports in the OSCE; to compare the results of automated grading with the evaluation performed by human experts and an inexperienced grader; to determine the correlation between the scores assigned by the AI and by the human graders, evidenced by an ICC of 0.77 for single measures and 0.91 for average measures; to assess the stringency of the AI system, which assigned scores on average 3.51 points lower (t = -15.358, p < 0.001); and to assess time efficiency, noting that automated grading took 24 minutes compared with 2-4 hours for human graders.

Most relevant results

The study evaluated the application of the GPT-4 artificial intelligence model to grading clinical reports in the OSCE, comparing its results with those of human graders. A significant correlation was observed between the AI and the experts, with an intraclass correlation coefficient (ICC) of 0.77 for single measures and 0.91 for average measures. The AI was more stringent, assigning scores on average 3.51 points lower (t = -15.358, p < 0.001). In addition, grading by the AI was substantially faster, taking 24 minutes compared with the 2-4 hours needed by human graders. These results show the effectiveness and speed of using AI in clinical assessment.

Awards linked to the item

The funding for this study was provided by the Spanish Society for Medical Education (SEDEM) through Grant 3/2024.