Construct: Authentic standard setting methods will demonstrate high convergent validity evidence of their outcomes, that is, cutoff scores and pass/fail decisions, with most other methods when compared with each other. Background: The objective structured clinical examination (OSCE) was established for valid, reliable, and objective assessment of clinical skills in health professions education. Various standard setting methods have been proposed to identify objective, reliable, and valid cutoff scores on OSCEs. These methods may identify different cutoff scores for the same examinations. Identification of valid and reliable cutoff scores for OSCEs remains an important issue and a challenge. Approach: Thirty OSCE stations administered at least twice in the years 2010–2012 to 393 medical students in Years 2 and 3 at Aga Khan University are included. Psychometric properties of the scores are determined. Cutoff scores and pass/fail decisions of Wijnen, Cohen, Mean–1.5SD, Mean–1SD, Angoff, borderline group and borderline regression (BL-R) methods are compared with each other and with three variants of cluster analysis using repeated measures analysis of variance and Cohen's kappa. Results: The mean psychometric indices on the 30 OSCE stations are reliability coefficient = 0.76 (SD = 0.12); standard error of measurement = 5.66 (SD = 1.38); coefficient of determination = 0.47 (SD = 0.19), and intergrade discrimination = 7.19 (SD = 1.89). BL-R and Wijnen methods show the highest convergent validity evidence among other methods on the defined criteria. Angoff and Mean–1.5SD demonstrated least convergent validity evidence. The three cluster variants showed substantial convergent validity with borderline methods. Conclusions: Although there was a high level of convergent validity of Wijnen method, it lacks the theoretical strength to be used for competency-based assessments. The BL-R method is found to show the highest convergent validity evidences for OSCEs with other standard setting methods used in the present study. We also found that cluster analysis using mean method can be used for quality assurance of borderline methods. These findings should be further confirmed by studies in other settings.
- clinical skills assessment
- standard setting