TY - JOUR
T1 - Standard Setting Methods for Pass/Fail Decisions on High-Stakes Objective Structured Clinical Examinations
T2 - A Validity Study
AU - Yousuf, Naveed
AU - Violato, Claudio
AU - Zuberi, Rukhsana W.
N1 - Publisher Copyright:
Copyright © 2015, Taylor & Francis Group, LLC.
PY - 2015/7/3
Y1 - 2015/7/3
N2 - Construct: Authentic standard setting methods will demonstrate high convergent validity evidence of their outcomes, that is, cutoff scores and pass/fail decisions, with most other methods when compared with each other. Background: The objective structured clinical examination (OSCE) was established for valid, reliable, and objective assessment of clinical skills in health professions education. Various standard setting methods have been proposed to identify objective, reliable, and valid cutoff scores on OSCEs. These methods may identify different cutoff scores for the same examinations. Identification of valid and reliable cutoff scores for OSCEs remains an important issue and a challenge. Approach: Thirty OSCE stations administered at least twice in the years 2010–2012 to 393 medical students in Years 2 and 3 at Aga Khan University are included. Psychometric properties of the scores are determined. Cutoff scores and pass/fail decisions of Wijnen, Cohen, Mean–1.5SD, Mean–1SD, Angoff, borderline group and borderline regression (BL-R) methods are compared with each other and with three variants of cluster analysis using repeated measures analysis of variance and Cohen's kappa. Results: The mean psychometric indices on the 30 OSCE stations are reliability coefficient = 0.76 (SD = 0.12); standard error of measurement = 5.66 (SD = 1.38); coefficient of determination = 0.47 (SD = 0.19), and intergrade discrimination = 7.19 (SD = 1.89). BL-R and Wijnen methods show the highest convergent validity evidence among other methods on the defined criteria. Angoff and Mean–1.5SD demonstrated least convergent validity evidence. The three cluster variants showed substantial convergent validity with borderline methods. Conclusions: Although there was a high level of convergent validity of Wijnen method, it lacks the theoretical strength to be used for competency-based assessments. The BL-R method is found to show the highest convergent validity evidences for OSCEs with other standard setting methods used in the present study. We also found that cluster analysis using mean method can be used for quality assurance of borderline methods. These findings should be further confirmed by studies in other settings.
AB - Construct: Authentic standard setting methods will demonstrate high convergent validity evidence of their outcomes, that is, cutoff scores and pass/fail decisions, with most other methods when compared with each other. Background: The objective structured clinical examination (OSCE) was established for valid, reliable, and objective assessment of clinical skills in health professions education. Various standard setting methods have been proposed to identify objective, reliable, and valid cutoff scores on OSCEs. These methods may identify different cutoff scores for the same examinations. Identification of valid and reliable cutoff scores for OSCEs remains an important issue and a challenge. Approach: Thirty OSCE stations administered at least twice in the years 2010–2012 to 393 medical students in Years 2 and 3 at Aga Khan University are included. Psychometric properties of the scores are determined. Cutoff scores and pass/fail decisions of Wijnen, Cohen, Mean–1.5SD, Mean–1SD, Angoff, borderline group and borderline regression (BL-R) methods are compared with each other and with three variants of cluster analysis using repeated measures analysis of variance and Cohen's kappa. Results: The mean psychometric indices on the 30 OSCE stations are reliability coefficient = 0.76 (SD = 0.12); standard error of measurement = 5.66 (SD = 1.38); coefficient of determination = 0.47 (SD = 0.19), and intergrade discrimination = 7.19 (SD = 1.89). BL-R and Wijnen methods show the highest convergent validity evidence among other methods on the defined criteria. Angoff and Mean–1.5SD demonstrated least convergent validity evidence. The three cluster variants showed substantial convergent validity with borderline methods. Conclusions: Although there was a high level of convergent validity of Wijnen method, it lacks the theoretical strength to be used for competency-based assessments. The BL-R method is found to show the highest convergent validity evidences for OSCEs with other standard setting methods used in the present study. We also found that cluster analysis using mean method can be used for quality assurance of borderline methods. These findings should be further confirmed by studies in other settings.
KW - OSCE
KW - clinical skills assessment
KW - psychometrics
KW - standard setting
KW - validity
UR - http://www.scopus.com/inward/record.url?scp=84938525724&partnerID=8YFLogxK
U2 - 10.1080/10401334.2015.1044749
DO - 10.1080/10401334.2015.1044749
M3 - Article
C2 - 26158330
AN - SCOPUS:84938525724
SN - 1040-1334
VL - 27
SP - 280
EP - 291
JO - Teaching and Learning in Medicine
JF - Teaching and Learning in Medicine
IS - 3
ER -