TY - JOUR
T1 - Assessing the quality of ChatGPT's responses to questions related to radiofrequency ablation for varicose veins
AU - Anees, Muhammad
AU - Shaikh, Fareed Ahmed
AU - Shaikh, Hafsah
AU - Siddiqui, Nadeem Ahmed
AU - Rehman, Zia Ur
N1 - Publisher Copyright:
© 2024
PY - 2024
Y1 - 2024
N2 - Objective: This study aimed to evaluate the accuracy and reproducibility of information provided by ChatGPT in response to frequently asked questions about radiofrequency ablation (RFA) for varicose veins. Methods: This cross-sectional study was conducted at The Aga Khan University Hospital, Karachi, Pakistan. A set of 18 frequently asked questions regarding RFA for varicose veins was compiled from credible online sources and presented to ChatGPT twice, separately, using the new chat option. Twelve experienced vascular surgeons (with >2 years of experience and ≥20 RFA procedures performed annually) independently evaluated the accuracy of the responses using a 4-point Likert scale and assessed their reproducibility. Results: Most evaluators were male (n = 10/12 [83.3%]), with an average of 12.3 ± 6.2 years of experience as a vascular surgeon. Six evaluators (50.0%) were from the UK, followed by three from Saudi Arabia (25.0%), two from Pakistan (16.7%), and one from the United States (8.3%). Among the 216 accuracy grades, most responses were graded as comprehensive (n = 87/216 [40.3%]) or accurate but insufficient (n = 70/216 [32.4%]), whereas only 17.1% (n = 37/216) were graded as a mixture of accurate and inaccurate information and 10.2% (n = 22/216) as entirely inaccurate. Overall, 89.8% of the responses (n = 194/216) were deemed reproducible. Of the total responses, 70.4% (n = 152/216) were classified as good quality and reproducible. The remaining responses were poor quality, with 19.4% reproducible (n = 42/216) and 10.2% nonreproducible (n = 22/216). Inter-rater agreement among the vascular surgeons was nonsignificant for overall responses (Fleiss' kappa, −0.028; P = .131). Conclusions: ChatGPT provided generally accurate and reproducible information on RFA for varicose veins; however, variability in response quality and limited inter-rater reliability highlight the need for further improvement. Although it has the potential to enhance patient education and support healthcare decision-making, improvements in its training, validation, transparency, and mechanisms to address inaccurate or incomplete information are essential.
AB - Objective: This study aimed to evaluate the accuracy and reproducibility of information provided by ChatGPT in response to frequently asked questions about radiofrequency ablation (RFA) for varicose veins. Methods: This cross-sectional study was conducted at The Aga Khan University Hospital, Karachi, Pakistan. A set of 18 frequently asked questions regarding RFA for varicose veins was compiled from credible online sources and presented to ChatGPT twice, separately, using the new chat option. Twelve experienced vascular surgeons (with >2 years of experience and ≥20 RFA procedures performed annually) independently evaluated the accuracy of the responses using a 4-point Likert scale and assessed their reproducibility. Results: Most evaluators were male (n = 10/12 [83.3%]), with an average of 12.3 ± 6.2 years of experience as a vascular surgeon. Six evaluators (50.0%) were from the UK, followed by three from Saudi Arabia (25.0%), two from Pakistan (16.7%), and one from the United States (8.3%). Among the 216 accuracy grades, most responses were graded as comprehensive (n = 87/216 [40.3%]) or accurate but insufficient (n = 70/216 [32.4%]), whereas only 17.1% (n = 37/216) were graded as a mixture of accurate and inaccurate information and 10.2% (n = 22/216) as entirely inaccurate. Overall, 89.8% of the responses (n = 194/216) were deemed reproducible. Of the total responses, 70.4% (n = 152/216) were classified as good quality and reproducible. The remaining responses were poor quality, with 19.4% reproducible (n = 42/216) and 10.2% nonreproducible (n = 22/216). Inter-rater agreement among the vascular surgeons was nonsignificant for overall responses (Fleiss' kappa, −0.028; P = .131). Conclusions: ChatGPT provided generally accurate and reproducible information on RFA for varicose veins; however, variability in response quality and limited inter-rater reliability highlight the need for further improvement. Although it has the potential to enhance patient education and support healthcare decision-making, improvements in its training, validation, transparency, and mechanisms to address inaccurate or incomplete information are essential.
KW - Artificial intelligence
KW - ChatGPT
KW - Large language model
KW - Radiofrequency ablation
KW - Varicose veins
UR - http://www.scopus.com/inward/record.url?scp=85206668298&partnerID=8YFLogxK
U2 - 10.1016/j.jvsv.2024.101985
DO - 10.1016/j.jvsv.2024.101985
M3 - Article
C2 - 39332626
AN - SCOPUS:85206668298
SN - 2213-333X
JO - Journal of Vascular Surgery: Venous and Lymphatic Disorders
JF - Journal of Vascular Surgery: Venous and Lymphatic Disorders
M1 - 101985
ER -