OBJECTIVES: The increasing use of large language models, such as ChatGPT, in academic writing has raised significant ethical concerns within the academic community. This study explores the potential challenges posed by the ability of artificial intelligence (AI) to produce realistic, evidence-based academic texts and investigates whether these challenges can be effectively controlled.
METHODS: Sets of three original ophthalmology articles were provided as input to ChatGPT-4o, which generated one introduction section per set; in total, 50 introduction texts were synthesized from 150 original articles. These AI-generated texts were analyzed with four AI detectors (GPTZero, Writer, CorrectorApp, and ZeroGPT) and a plagiarism detector. In addition, the ability of the AI detectors to differentiate between original and AI-generated texts was evaluated.
RESULTS: AI-detection probabilities differed significantly between original and AI-generated texts for all detectors (p<0.001). GPTZero distinguished original from AI-generated texts with 100% sensitivity and 96% specificity, outperforming all other AI detectors. However, paraphrasing the AI-generated texts significantly reduced the detection accuracy of GPTZero (p<0.001).
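As an illustrative sketch only (not part of the study's methods), sensitivity and specificity of a binary AI detector follow directly from confusion-matrix counts; the counts below are hypothetical values chosen to be consistent with the reported rates for GPTZero:

```python
def sensitivity(tp: int, fn: int) -> float:
    # True-positive rate: AI-generated texts correctly flagged as AI.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # True-negative rate: original texts correctly classified as human-written.
    return tn / (tn + fp)

# Hypothetical counts: all 50 AI-generated texts flagged (sensitivity 100%);
# 144 of 150 original texts classified as human (specificity 96%).
print(sensitivity(tp=50, fn=0))    # 1.0
print(specificity(tn=144, fp=6))   # 0.96
```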
DISCUSSION AND CONCLUSION: ChatGPT-4o synthesized new texts with referenced citations within seconds and bypassed the plagiarism detector. The AI detectors, however, fell short of absolute accuracy and occasionally misclassified original texts as AI-generated. Even for the most accurate AI detector, a simple paraphrasing method significantly compromised prediction accuracy, underscoring the need for improved detection strategies and ethical oversight.