OBJECTIVES: To evaluate the performance of Claude 3.5 Sonnet, a novel multimodal large language model, in interpreting image-based ophthalmology case questions.
METHODS: A total of 174 image-based ophthalmology questions from a comprehensive ophthalmology education platform were analyzed by Claude 3.5 Sonnet. Each question was presented in both multiple-choice and open-ended formats. Questions were categorized into six subspecialties: retina and uveitis; external eye and cornea; orbit and oculoplastics; neuro-ophthalmology; glaucoma and cataract; and strabismus, pediatric ophthalmology, and genetics. Responses were evaluated by two board-certified ophthalmologists.
RESULTS: Claude 3.5 Sonnet achieved an overall accuracy of 89.65% on multiple-choice questions and a comparable 87.93% on open-ended questions, with no statistically significant difference between formats (p=0.72). Performance varied slightly across subspecialties, with the highest accuracy in external eye and cornea cases (95.65% in both formats) and lower accuracy in strabismus, pediatric ophthalmology, and genetics (87.50% multiple-choice, 84.38% open-ended).
DISCUSSION AND CONCLUSION: Claude 3.5 Sonnet showed strong capabilities in interpreting image-based ophthalmology questions across all subspecialties, with consistent performance across question formats. These findings suggest potential applications in ophthalmology education and board examination preparation; however, its utility in real-world clinical scenarios requires further validation.
Keywords: Artificial intelligence, Claude 3.5 Sonnet, ophthalmology board examinations