
AI Writes Scientific Papers That Sound Great—but Aren’t Accurate


First came the students, who wanted help with their homework and essays. Now, ChatGPT is luring scientists, who are under pressure to publish papers in reputable scientific journals.

AI is already disrupting the archaic world of scientific publishing. When Melissa Kacena, vice chair of orthopaedic surgery at Indiana University School of Medicine, reviews articles submitted for publication in journals, she now knows to look out for ones that might have been written by the AI program. “I have a rule of thumb now that if I pull up 10 random references cited in the paper, and if more than one isn’t accurate, then I reject the paper,” she says.

But despite the pitfalls, there is also promise. Writing review articles, for example, is a task well suited to AI: it involves sifting through the existing research on a subject, analyzing the results, reaching a conclusion about the state of the science on the topic, and providing some new insight. ChatGPT can do all of those things well.

Kacena decided to see who is better at writing review articles: people or ChatGPT. For her study, published in Current Osteoporosis Reports, she divided nine students and the AI program into three groups, each assigned a different topic. In one group, the students wrote the review articles themselves; in another, ChatGPT wrote articles on the same topics; and in the last, each student was given a ChatGPT account and told to work with the AI program to write the articles. That allowed her to compare articles written by people, by AI, and by a combination of the two. Faculty colleagues and the students fact-checked each article, and she compared the three types on measures like accuracy, ease of reading, and use of appropriate language.


The results were eye-opening. The articles written by ChatGPT were easy to read and were even better written than the students'. But up to 70% of the cited references were inaccurate: they were either incoherently merged from several different studies or completely fictitious. The AI versions were also more likely to be plagiarized.

“ChatGPT was pretty convincing with some of the phony statements it made, to be honest,” says Kacena. “It used the proper syntax and integrated them with proper statements in a paragraph, so sometimes there were no warning bells. It was only because the faculty members had a good understanding of the data, or because the students fact checked everything, that they were detected.”

There were some advantages to the AI-generated articles. The algorithm was faster and more efficient in processing all the required data, and in general, ChatGPT used better grammar than the students. But it couldn't always read the room: the AI tended to use flowery language that wasn't always appropriate for scientific journals (unless the students had told ChatGPT to write it from the perspective of a graduate-level science student).


That reflects a truth about the use of AI: it's only as good as the information it receives. While ChatGPT isn’t quite ready to author scientific journal articles, with the proper programming and training, it could improve and become a useful tool for researchers. “Right now it’s not great by itself, but it can be made to work,” says Kacena. For example, when queried, the algorithm was good at recommending ways to summarize data in figures and graphical depictions. “The advice it gave on those was spot on, and exactly what I would have done,” she says.

The more feedback the students provided on ChatGPT's work, the better its output became—and that represents its greatest promise. In the study, some students found that when they worked together with ChatGPT to write the article, the program continued to improve and provide better results if they told it what it was doing right and what was less helpful. That means problems like questionable references and plagiarism could potentially be fixed. ChatGPT could be instructed, for example, not to merge references, to treat each scientific journal article as its own separate reference, and to limit copying of consecutive words to avoid plagiarism.
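The consecutive-word limit mentioned above is essentially the logic behind common plagiarism checks. As a rough illustration (a generic sketch, not the tooling used in the study), a long run of identical consecutive words shared between a draft and a source is a standard heuristic flag for verbatim copying:

```python
def shared_ngrams(draft: str, source: str, n: int = 5) -> set:
    """Return runs of n consecutive words that appear in both texts.

    A long shared run (say, five or more words) is a common heuristic
    signal of verbatim copying rather than independent phrasing.
    """
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(draft) & ngrams(source)

# Hypothetical sentences, for illustration only.
draft = "bone density declines sharply after menopause in most patients studied here"
source = "studies show bone density declines sharply after menopause in most patients"
overlaps = shared_ngrams(draft, source, n=5)
```

Here the two sentences share a nine-word run, so five distinct five-word windows overlap; a writing assistant could be told to rephrase whenever such a run exceeds a chosen threshold.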

With more input and some fixes, Kacena believes that AI could help researchers smooth out the writing process and even gain scientific insights. “I think ChatGPT is here to stay, and figuring out how to make it better, and how to use it in an ethical and conscientious and scientifically sound manner, is going to be really important,” she says.
